GraphDB-SE Plug-in API

Overview

The GraphDB Plug-in API is a framework and a set of public classes and interfaces, which allow developers to extend GraphDB in many useful ways. These extensions are bundled into plug-ins, which GraphDB discovers during its initialisation phase, and then uses to delegate parts of its query processing tasks. The plug-ins are given low-level access to GraphDB repository data, which enables them to do their job efficiently. The plug-ins are discovered via the Java service discovery mechanism, which enables dynamic addition/removal of plug-ins from the system without having to recompile GraphDB or change any configuration files.
This section covers the plug-in capabilities that the framework provides, introducing them mostly by example.

Description of a GraphDB plug-in

A GraphDB plug-in is a Java class that implements the com.ontotext.trree.sdk.Plugin interface. All public classes and interfaces of the plug-in API are located in this Java package, i.e. com.ontotext.trree.sdk, so the package name is omitted for the rest of this section. Here is what the Plugin interface looks like in abbreviated form:
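The sketch below is reconstructed from the methods described in this section; the setter names for the Statements, Entities and SystemOptions instances are assumptions, so consult the SDK javadoc for exact signatures:

    public interface Service {
        // the unique name of the service (plug-in)
        String getName();
    }

    public interface Plugin extends Service {
        // injected during the configuration phase
        void setLogger(Logger logger);
        void setDataDir(File dataDir);
        void setStatements(Statements statements);   // assumed setter name
        void setEntities(Entities entities);         // assumed setter name
        void setOptions(SystemOptions options);      // assumed setter name

        // life-cycle hooks
        void initialize();
        void shutdown();
    }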

Being derived from the Service interface means that plug-ins are automatically discovered at run-time, provided that the following conditions also hold:

  • The plug-in class is located somewhere in the classpath;
  • It is mentioned in a META-INF/services/com.ontotext.trree.sdk.Plugin file in the classpath or in a jar that is in the classpath. The plug-in's fully qualified class name should be written in such a file on a separate line.

The only method introduced by the Service interface is getName(), which provides the plug-in's (service's) name. This name should be unique within a particular GraphDB repository and serves as a plug-in identifier, which can be used at any time to retrieve a reference to the plug-in instance. The rest of the base Plugin methods are described further in the following sections.

There are a lot more functions (interfaces) that a plug-in could implement, but these are all optional and are declared in separate interfaces. Implementing any such complementary interface is the means to announce to the system what this particular plug-in can do in addition to its mandatory plug-in responsibilities. It is then automatically used as appropriate.

The life-cycle of a plug-in

A plug-in's life-cycle is separated into several phases:

  • Discovery - this phase is executed at repository initialisation. GraphDB searches for all plug-in services registered in META-INF/services/com.ontotext.trree.sdk.Plugin service registry files and constructs a single instance of each plug-in found.
  • Configuration - every plug-in instance discovered and constructed during the previous phase is then configured. During this phase, plug-ins are injected with a Logger object, which they use for logging (setLogger(Logger logger)), and the path to their own data directory (setDataDir(File dataDir)), which they create, if needed, and then use to store their data. If a plug-in doesn't need to store anything to disk, it can skip the creation of its data directory. However, if it does use one, it is guaranteed that this directory will be unique and available only to the particular plug-in it was assigned to. The plug-ins are also injected with Statements and Entities instances (see below), and a SystemOptions instance, which gives them access to the system-wide configuration options and settings.
  • Initialisation - after a plug-in has been configured the framework calls its initialize() method so it gets the chance to do whatever initialisation work it needs to do. It is important at this point that the plug-in has received all its configuration and low-level access to the repository data (see Statements and Entities below).
  • Request - the plug-in participates in request processing. This phase is optional for plug-ins. It is divided into several sub-phases and each plug-in can choose to participate in any or none of them. The request phase covers not only the evaluation of SPARQL queries, but also SPARQL/Update requests and getStatements calls. Here are the sub-phases of the request phase:
    • Pre-processing - plug-ins are given the chance to modify the request before it is processed. In this phase they could also initialise a context object, which will be visible till the end of the request processing (see below);
    • Pattern interpretation - plug-ins can choose to provide results for requested statement patterns (see below);
    • Post-processing - before the request results are returned to the client, plug-ins are given a chance to modify them, filter them out or even insert new results (see below);
  • Shutdown - during repository shutdown, each plug-in is prompted to execute its own shutdown routines, free resources, flush data to disk, etc. This should be done in the shutdown() method.

Repository Internals (Statements and Entities)

In order to enable efficient request processing plug-ins are given low-level access to the repository data and internals. This is done through the Statements and Entities interfaces.

The Entities interface represents a set of RDF objects (URIs, blank nodes and literals). All such objects are termed entities and are given unique long identifiers. The Entities instance is responsible for resolving those objects from their identifiers and, inversely, for looking up the identifier of a given entity. Most plug-ins process entities using their identifiers, because dealing with integer identifiers is a lot more efficient than working with the actual RDF entities they represent. The Entities interface is the single entry point available to plug-ins for entity management. It supports the addition of new entities, entity replacement, look-up of entity type and properties, resolving entities, listening for entity change events, etc.

It is possible in a GraphDB repository to declare two RDF objects to be equivalent (e.g. by using owl:sameAs). In order to provide a way to use such declarations, the Entities interface assigns a class identifier to each entity. For newly created entities this class identifier is the same as the entity identifier. When two entities are declared equivalent, one of them adopts the class identifier of the other, and thus they become members of the same equivalence class. The Entities interface exposes the entity class identifier so that plug-ins can determine which entities are equivalent.
Entities within an Entities instance have a certain scope. There are three entity scopes:

  • Default - entities stored in this scope are persisted to disk and can be used in statements that are also physically stored on disk. These entities have non-zero, positive identifiers and are often referred to as physical entities.
  • System - system entities have negative identifiers and are not persisted to disk. They can be used, e.g., for system (or magic) predicates. They are available throughout the whole repository lifetime, but after a restart they disappear and need to be re-created, should one need them.
  • Request - entities stored in request scope, like system entities, are not persisted on disk and have negative identifiers. However, they only live in the scope of a particular request. They are not visible to other concurrent requests and disappear immediately after the request processing has finished. This scope is useful for temporary entities like literal values that are not expected to occur often (e.g. numerical values) and don't appear inside a physical statement.

The Statements interface represents a set of RDF statements where statement means a quadruple of subject, predicate, object and context RDF entity identifiers. Statements can be added, removed and searched for. Additionally, a plug-in can subscribe to receive statement event notifications:

  • transaction started;
  • statement added;
  • statement deleted;
  • transaction completed.

An important abstract class, which is related to GraphDB internals, is StatementIterator. It has a method - boolean next() - which attempts to scroll the iterator onto the next available statement and returns true only if it succeeds. In the case of success its subject, predicate, object and context fields are initialised with the respective components of the next statement. Furthermore, some properties of each statement are available via the following methods:

  • boolean isReadOnly() - returns true if the statement is in the Axioms part of the rule-file or is imported at initialisation;
  • boolean isExplicit() - returns true if the statement is explicitly asserted;
  • boolean isImplicit() - returns true if the statement is produced by the inferencer (raw statements can be both explicit and implicit).

Here is a brief example, which puts Statements, Entities and StatementIterator together, in order to output all literals that are related to a given URI:

Putting Statements, Entities and StatementIterator to work
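A minimal sketch, assuming Statements.get(subject, predicate, object, context) returns a StatementIterator over matching statements (with 0 acting as a wildcard), Entities.get(id) resolves an identifier to an org.openrdf.model.Value, and iterators are released with close(); the method and parameter names here are illustrative:

    import org.openrdf.model.Literal;
    import org.openrdf.model.Value;

    // print all literals appearing as objects of statements
    // whose subject is the entity with identifier subjectId
    void outputLiterals(Statements statements, Entities entities, long subjectId) {
        StatementIterator iter = statements.get(subjectId, 0, 0, 0);
        while (iter.next()) {
            // resolve the object identifier to its RDF value
            Value value = entities.get(iter.object);
            if (value instanceof Literal) {
                System.out.println(value.stringValue());
            }
        }
        iter.close();
    }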

Getting to know these interfaces should be sufficient for a plug-in developer to make full use of GraphDB repository data.

Request-Processing Phases

As already mentioned, a plug-in's interaction with each of the request-processing phases is optional. The plug-in declares if it plans to participate in any phase by implementing the appropriate interface.

Pre-processing

A plug-in willing to participate in request pre-processing should implement the Preprocessor interface. It looks like this:

Preprocessor.java
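In sketch form, matching the behaviour described below:

    public interface Preprocessor {
        // receives the request object; may modify it and return a
        // context object (or null) that is passed back to the plug-in
        // in every other method during request processing
        RequestContext preprocess(Request request);
    }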

The preprocess() method receives the request object and returns a RequestContext instance. The Request instance passed as the parameter is a different class instance, depending on the type of the request (e.g. SPARQL/Update or "get statements"). The plug-in changes the request object in the necessary way, and initialises and returns its context object, which is passed back to it in every other method during the request processing phase. The returned request context may be null, and whatever it is, it is only visible to the plug-in that initialises it. It can be used to store data visible for (and only for) this whole request, e.g. to pass data relating to two different statement patterns recognised by the plug-in. The request context gives further request-processing phases access to the Request object reference. Plug-ins that opt to skip this phase do not have a request context and are not able to access the original Request object.

Pattern Interpretation

This is one of the most important phases in the lifetime of a plug-in. In fact most plug-ins need to participate in exactly this phase. This is the point where request statement patterns need to get evaluated and statement results are returned. For example, consider the following SPARQL query:

Simple SPARQL query
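A minimal query containing a single statement pattern (the projection shown here is illustrative):

    SELECT ?s ?o
    WHERE {
        ?s <http://example/predicate> ?o .
    }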

There is just one statement pattern inside this query: ?s <http://example/predicate> ?o. All plug-ins that have implemented the PatternInterpreter interface (thus declaring that they intend to participate in the pattern interpretation phase) are asked if they can interpret this pattern. The first one to accept it and return results for it will be used. If no plug-in interprets the pattern it will be looked up using the repository's physical statements, i.e. the ones persisted on the disk.

Here is the PatternInterpreter interface:

PatternInterpreter.java
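In sketch form (the return type of estimate() is an assumption):

    public interface PatternInterpreter {
        // estimate the number of results interpret() would return;
        // components are entity identifiers, 0 (unbound), or the
        // special values Entities.BOUND / Entities.UNBOUND
        double estimate(long subject, long predicate, long object, long context,
                        Statements statements, Entities entities,
                        RequestContext requestContext);

        // return an iterator over the results for the pattern,
        // or null to refuse to interpret it
        StatementIterator interpret(long subject, long predicate, long object, long context,
                                    Statements statements, Entities entities,
                                    RequestContext requestContext);
    }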

The estimate() and interpret() methods take the same arguments and are used in the following way:

  • given a statement pattern (e.g. the one in the SPARQL query above), all plug-ins that implement PatternInterpreter are asked to interpret() the pattern. The subject, predicate, object and context values are either the identifiers of the values in the pattern, or 0 if the component is an unbound variable. The statements and entities objects represent, respectively, the statements and entities that are available for this particular request. For instance, if the query contains any FROM <http://some/graph> clauses, the statements object will only provide access to statements in the defined named graphs. Similarly, the entities object contains entities that might be valid only for this particular request. The plug-in's interpret() method must return a StatementIterator if it intends to interpret this pattern, or null if it refuses.
  • in case the plug-in signals that it will interpret the given pattern (returns a non-null value), GraphDB's query optimiser calls the plug-in's estimate() method, in order to get an estimate of how many results will be returned by the StatementIterator returned by interpret(). This estimate need not be precise, but the better it is, the more efficient the resulting optimisation will likely be. There is a slight difference in the values that are passed to estimate(). The statement components (e.g. subject) might not only be entity identifiers, but can also be set to two special values:
    • Entities.BOUND - the pattern component is said to be bound, but its particular binding is not yet known;
    • Entities.UNBOUND - the pattern component will not be bound.
      These values should be treated as hints to the estimate() method to provide a better approximation of the result set size, although its precise value cannot be determined before the query is actually run.
  • after the query has been optimised the interpret() method of the plug-in might be called again should any variable become bound due to the pattern reordering applied by the optimiser. Plug-ins should be prepared to expect different combinations of bound and unbound statement pattern components, and return appropriate iterators.

The requestContext parameter is the value returned by the preprocess() method, if one exists, or null otherwise.

The plug-in framework also supports the interpretation of an extended type of list pattern. Consider the following SPARQL query:

Simple SPARQL query
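An illustrative form, in which the object position holds an RDF list:

    SELECT ?s
    WHERE {
        ?s <http://example/predicate> (?o1 ?o2) .
    }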

If a plug-in wants to handle such list patterns it has to implement an interface very similar to the PatternInterpreter interface - ListPatternInterpreter:

ListPatternInterpreter.java
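In sketch form, mirroring PatternInterpreter:

    public interface ListPatternInterpreter {
        // as in PatternInterpreter, but with the objects passed as an array
        double estimate(long subject, long predicate, long[] objects, long context,
                        Statements statements, Entities entities,
                        RequestContext requestContext);

        StatementIterator interpret(long subject, long predicate, long[] objects, long context,
                                    Statements statements, Entities entities,
                                    RequestContext requestContext);
    }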

It differs only in that multiple objects are passed as an array of long values, instead of a single long object. The semantics of both methods are equivalent to those in the basic pattern interpretation case.

Post-processing

There are cases when a plug-in would like to modify or otherwise filter the final results of a request. This is where the Postprocessor interface comes into play:

Postprocessor.java
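A sketch matching the behaviour described below (the flush() return type is an assumption):

    public interface Postprocessor {
        // called for each binding set about to be returned to the client;
        // may modify it, or return null to remove it from the result set
        BindingSet postprocess(BindingSet bindingSet, RequestContext requestContext);

        // called after all results have been processed; may return
        // additional binding sets to be appended to the result set
        Iterator<BindingSet> flush(RequestContext requestContext);
    }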

The postprocess() method is called for each binding set that is to be returned to the repository client. This method may modify the binding set and return it, or alternatively return null, in which case the binding set is removed from the result set. After a binding set is processed by a plug-in, the possibly modified binding set is passed to the next plug-in having post-processing functionality enabled. After the binding set is processed by all plug-ins (in the case where no plug-in deletes it), it is returned to the client. Finally, after all results are processed and returned, each plug-in's flush() method is called to introduce new binding set results in the result set. These in turn are finally returned to the client.

Update processing

As well as query/read processing, plug-ins are able to process update operations for statement patterns containing specific predicates. In order to intercept updates, a plug-in must implement the UpdateInterpreter interface. During initialisation, getPredicatesToListenFor() is called once by the framework, so that the plug-in can indicate which predicates it is interested in.

From then onwards, the plug-in framework will filter updates for statements using these predicates and notify the plug-in. Filtered updates are not processed further by GraphDB, so if the insert or delete operation should be persisted, then the plug-in must handle this by using the Statements object passed to it.

UpdateInterpreter.java
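In sketch form (the exact parameters of interpretUpdate() are assumptions):

    public interface UpdateInterpreter {
        // identifiers of the predicates the plug-in wants to intercept;
        // called once by the framework during initialisation
        long[] getPredicatesToListenFor();

        // called for each inserted or deleted statement that uses one of
        // those predicates; such statements are not processed further by
        // GraphDB, so the plug-in must persist them itself if needed
        void interpretUpdate(long subject, long predicate, long object, long context,
                             boolean isAddition,
                             Statements statements, Entities entities);
    }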

Putting It All Together: An example Plug-in

The example plug-in has two responsibilities:

  • it interprets patterns like ?s <http://example.com/time> ?o and binds their object component to a literal containing the repository's local date and time;
  • if a FROM <http://example.com/time> clause is detected in the query, the result is a single binding set in which all projected variables are bound to a literal containing the repository local date and time.

For the first part, it is clear that the plug-in implements the PatternInterpreter interface. A date/time literal is stored as a request-scope entity to avoid cluttering the repository with extra literals.

For the second requirement, the plug-in must first take part in the pre-processing phase, in order to inspect the query and detect the FROM clause. Then the plug-in must hook into the post-processing phase where, if the pre-processing phase has detected the desired FROM clause, it deletes all query results (in postprocess()) and returns (in flush()) a single result containing the binding set specified by the requirements. Again, request-scoped literals are created.

The plug-in implementation extends the PluginBase class that provides a default implementation of the Plugin methods:

Example plugin
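A sketch of the basic implementation, assuming PluginBase exposes the injected Entities instance via getEntities() and that Entities.put(Value, Scope) registers an entity and returns its identifier:

    import org.openrdf.model.impl.URIImpl;

    public class ExamplePlugin extends PluginBase {
        protected static final String TIME_PREDICATE = "http://example.com/time";
        protected long timePredicateId;

        @Override
        public String getName() {
            return "exampleplugin";
        }

        @Override
        public void initialize() {
            // register the magic predicate as a system-scope entity
            timePredicateId = getEntities().put(new URIImpl(TIME_PREDICATE),
                                                Entities.Scope.SYSTEM);
        }
    }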

In this basic implementation the plug-in name is defined and during initialisation a single system-scope predicate is registered. It is important not to forget to register the plug-in in the META-INF/services/com.ontotext.trree.sdk.Plugin file in the classpath.

The next step is to implement the first of the plug-in's requirements - the pattern interpretation part:

Example plug-in
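A sketch of the pattern interpretation part; StatementIterator.create() is assumed to be a factory helper producing a single-statement iterator, and Entities.put() is assumed as above:

    import java.util.Date;
    import org.openrdf.model.impl.LiteralImpl;

    public class ExamplePlugin extends PluginBase implements PatternInterpreter {
        @Override
        public StatementIterator interpret(long subject, long predicate, long object,
                long context, Statements statements, Entities entities,
                RequestContext requestContext) {
            // ignore patterns with a predicate other than the time predicate
            if (predicate != timePredicateId)
                return null;
            // create a request-scope literal holding the local date/time
            long literalId = entities.put(new LiteralImpl(new Date().toString()),
                                          Entities.Scope.REQUEST);
            // return a single statement with the literal in the object position
            return StatementIterator.create(subject, predicate, literalId, 0);
        }

        @Override
        public double estimate(long subject, long predicate, long object, long context,
                Statements statements, Entities entities, RequestContext requestContext) {
            return 1; // the result set size is always exactly one
        }
    }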

The interpret() method only processes patterns with a predicate matching the desired predicate identifier. Further on, it simply creates a new date/time literal (in the request scope) and places its identifier in the object position of the returned single result. The estimate() method always returns 1, because this is the exact size of the result set.

Finally to implement the second requirement concerning the interpretation of the FROM clause:

Example plug-in, pre- and post-processing
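A sketch of the pre- and post-processing parts; QueryRequest and the methods called on it (getDataset(), getBindingNames()), as well as the TimeContext class, are assumptions made for illustration:

    import java.util.Collections;
    import java.util.Date;
    import java.util.Iterator;
    import org.openrdf.model.Literal;
    import org.openrdf.model.impl.LiteralImpl;
    import org.openrdf.model.impl.URIImpl;
    import org.openrdf.query.BindingSet;
    import org.openrdf.query.Dataset;
    import org.openrdf.query.impl.MapBindingSet;

    public class ExamplePlugin extends PluginBase
            implements PatternInterpreter, Preprocessor, Postprocessor {

        // interpret() and estimate() from the previous sketch are unchanged

        // custom request context holding the single binding set to return
        private static class TimeContext implements RequestContext {
            private Request request;
            BindingSet result;

            public Request getRequest() { return request; }
            public void setRequest(Request request) { this.request = request; }
        }

        @Override
        public RequestContext preprocess(Request request) {
            // participate only in queries with FROM <http://example.com/time>
            if (request instanceof QueryRequest) {
                QueryRequest query = (QueryRequest) request;
                Dataset dataset = query.getDataset();
                if (dataset != null && dataset.getDefaultGraphs()
                        .contains(new URIImpl(TIME_PREDICATE))) {
                    TimeContext context = new TimeContext();
                    context.setRequest(request);
                    // bind a date/time literal to every projected variable
                    Literal time = new LiteralImpl(new Date().toString());
                    MapBindingSet result = new MapBindingSet();
                    for (String name : query.getBindingNames())
                        result.addBinding(name, time);
                    context.result = result;
                    return context;
                }
            }
            return null;
        }

        @Override
        public BindingSet postprocess(BindingSet bindingSet, RequestContext requestContext) {
            // when our FROM clause was detected, drop all regular results
            return requestContext instanceof TimeContext ? null : bindingSet;
        }

        @Override
        public Iterator<BindingSet> flush(RequestContext requestContext) {
            // return the single prepared binding set, or nothing
            if (requestContext instanceof TimeContext)
                return Collections.singletonList(
                        ((TimeContext) requestContext).result).iterator();
            return Collections.<BindingSet>emptyList().iterator();
        }
    }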

The plug-in provides a custom implementation of the RequestContext interface, which can hold a reference to the desired single BindingSet with the date/time literal bound to every variable name in the query projection. The postprocess() method filters out all results if the requestContext is non-null (i.e. if the FROM clause was detected by preprocess()). Finally, flush() returns a singleton iterator containing the desired binding set if required, and returns nothing otherwise.

Making a Plug-in Configurable

Plug-ins often need to be configured. There are two ways for GraphDB plug-ins to receive their configuration. The first is to define magic system predicates that can be used to pass configuration values to the plug-in through a query at run-time. This approach is appropriate whenever the configuration changes from one plug-in usage scenario to another, i.e. when there are no globally valid parameters for the plug-in. However, in many cases the plug-in behaviour has to be configured "globally", and for this the plug-in framework provides a suitable mechanism through the Configurable interface.

A plug-in implements the Configurable interface to announce its configuration parameters to the system. This allows it to read parameter values during initialisation from the repository configuration and have them merged with all other repository parameters (accessible through the SystemOptions instance passed during the configuration phase).

This is the Configurable interface:

Configurable.java
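In sketch form:

    public interface Configurable {
        // enumerate the names of the plug-in's configuration parameters
        String[] getParameters();
    }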

The plug-in needs to enumerate its configuration parameter names. The example plug-in is extended with the ability to define the name of the special predicate it uses. The parameter is called predicate-uri and it accepts a URI value.

Example plug-in, configuration
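A sketch of the configuration part; getOptions() and its getParameter() method are assumptions for reading the merged repository parameters:

    public String[] getParameters() {
        return new String[] { "predicate-uri" };
    }

    @Override
    public void initialize() {
        // read the configured predicate URI, falling back to the default
        String predicateUri = getOptions().getParameter("predicate-uri",
                                                        "http://example.com/time");
        timePredicateId = getEntities().put(new URIImpl(predicateUri),
                                            Entities.Scope.SYSTEM);
    }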

Now that the plug-in parameter has been declared, it can be configured either by adding the http://www.ontotext.com/trree/owlim#predicate-uri parameter to the GraphDB configuration, or by setting a Java system property using the -Dpredicate-uri parameter for the JVM running GraphDB.

There is also a special kind of configuration parameter, called a "memory" parameter. Memory parameters configure the amount of memory available for the plug-in to use. If a plug-in has such parameters, it implements the MemoryConfigurable interface:

MemoryConfigurable
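In sketch form, matching the two methods described below:

    public interface MemoryConfigurable {
        // enumerate the names of the plug-in's memory parameters
        String[] getMemoryParameters();

        // called once per memory parameter with its configured value in bytes
        void setMemoryParameter(String name, long bytes);
    }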

The getMemoryParameters() method enumerates the names of the plug-in's memory parameters in a similar way to Configurable.getParameters(). During the configuration phase, the plug-in's setMemoryParameter() method is called once for each such parameter with its respective configured value in bytes. The parameters defined as memory parameters can be given values like "1g" or "300M", but such values are interpreted and converted to bytes.

A special property of the memory parameters is that they can be configured as a group. GraphDB accepts a parameter called cache-memory, which accumulates the values of a group of other parameters: tuple-index-memory, fts-memory and predicate-memory. Declaring a memory parameter automatically adds it to the group of parameters accumulated by cache-memory. The benefit of this approach is that, if cache-memory is configured to some amount and any of the grouped memory parameters is left unconfigured (unknown), the amount configured for cache-memory is divided among all unknown memory parameters, thus providing the user with a simple way to control the memory requirements of many plug-ins using a single parameter. For instance, suppose cache-memory is configured to "100m", tuple-index-memory to "20m", no predicate lists are configured (which automatically disables the predicate-memory parameter), and 4 memory parameters declared by several plug-ins are not explicitly configured. The effect of such a setup is that the remaining 80M (100M - 20M) is divided among the 4 memory parameters and each of them is set to 20M. This value is then reported to the plug-ins in bytes using their setMemoryParameter() method.

Accessing other plug-ins

Plug-ins are able to make use of the functionality of other plug-ins. For example, the Lucene-based full-text search plug-in can make use of the rank values provided by the RDFRank plug-in, to facilitate query result scoring and ordering. This is not a matter of re-using program code (e.g. in a jar with common classes); rather, it is about re-using data. The mechanism to do this allows plug-ins to obtain references to other plug-in objects by knowing their names. To achieve this, they only need to implement the PluginDependency interface:

PluginDependency.java
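In sketch form (the setter name is an assumption):

    public interface PluginDependency {
        // called during the configuration phase
        void setLocator(PluginLocator locator);
    }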

They are then injected with an instance of the PluginLocator interface (during the configuration phase), which does the actual plug-in discovery for them:

PluginLocator.java
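In sketch form (the method name is an assumption):

    public interface PluginLocator {
        // returns the plug-in instance registered under the given
        // name, or null if no such plug-in exists
        Plugin locate(String name);
    }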

Having a reference to another plug-in is all that is needed to call its methods directly and make use of its services.
