The GraphDB Plug-in API is a framework and a set of public classes and interfaces, which allow developers to extend GraphDB in many useful ways. These extensions are bundled into plug-ins, which GraphDB discovers during its initialisation phase, and then uses to delegate parts of its query processing tasks. The plug-ins are given low-level access to GraphDB repository data, which enables them to do their job efficiently. The plug-ins are discovered via the Java service discovery mechanism, which enables dynamic addition/removal of plug-ins from the system without having to recompile GraphDB or change any configuration files.
A GraphDB plug-in is a Java class that implements the com.ontotext.trree.sdk.Plugin interface. All public classes and interfaces of the plug-in API are located in this package (com.ontotext.trree.sdk), so the package name is omitted in the rest of this section. Here is what the Plugin interface looks like in abbreviated form:
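// Abbreviated sketch; the exact set of methods and their signatures may differ between GraphDB versions
public interface Plugin extends Service {
    // the directory in which the plug-in may persist its data
    void setDataDir(File dataDir);
    // the logger the plug-in should use for its messages
    void setLogger(Logger logger);
    // called once, after discovery and configuration, so the plug-in can prepare its internal state
    void initialize();
    // fingerprint handling, used by GraphDB to detect changes in the data a plug-in maintains
    void setFingerprint(long fingerprint);
    long getFingerprint();
    // called when the repository is shut down
    void shutdown();
}

// The Service super-interface declares only the plug-in's name
public interface Service {
    String getName();
}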
Being derived from the Service interface means that plug-ins are automatically discovered at run-time, provided that the following conditions also hold: the plug-in class is available on the classpath, and it is listed in a META-INF/services/com.ontotext.trree.sdk.Plugin file on the classpath, with the fully qualified class name on a line of its own.
The only method introduced by the Service interface is getName(), which provides the plug-in's (service's) name. This name should be unique within a particular GraphDB repository and serves as a plug-in identifier, which can be used at any time to retrieve a reference to the plug-in instance. The rest of the base Plugin methods are described further in the following sections.
There are many more functions that a plug-in can perform, but these are all optional and are declared in separate interfaces. Implementing any such complementary interface is the means to announce to the system what this particular plug-in can do in addition to its mandatory plug-in responsibilities. It is then automatically used as appropriate.
A plug-in's life-cycle is separated into several phases: discovery, when GraphDB locates the plug-in through the Java service discovery mechanism; configuration, when the plug-in receives its configuration parameters; initialisation, when it prepares its internal state; request processing, which itself comprises the pre-processing, pattern interpretation and post-processing phases described below; and shutdown, when the repository is stopped.
In order to enable efficient request processing, plug-ins are given low-level access to the repository data and internals. This is done through the Statements and Entities interfaces.
The Entities interface represents a set of RDF objects (URIs, blank nodes and literals). All such objects are termed entities and are given unique long identifiers. The Entities instance is responsible for resolving those objects from their identifiers and, inversely, for looking up the identifier of a given entity. Most plug-ins process entities using their identifiers, because dealing with integer identifiers is much more efficient than working with the actual RDF entities they represent. The Entities interface is the single entry point available to plug-ins for entity management. It supports the addition of new entities, entity replacement, look-up of entity type and properties, resolving entities, listening for entity change events, etc.

It is possible in a GraphDB repository to declare two RDF objects to be equivalent (e.g. by using owl:sameAs). In order to provide a way to use such declarations, the Entities interface assigns a class identifier to each entity. For newly created entities this class identifier is the same as the entity identifier. When two entities are declared equivalent, one of them adopts the class identifier of the other, and thus they become members of the same equivalence class. The Entities interface exposes the entity class identifier so that plug-ins can determine which entities are equivalent.
The Statements interface represents a set of RDF statements, where a statement is a quadruple of subject, predicate, object and context RDF entity identifiers. Statements can be added, removed and searched for. Additionally, a plug-in can subscribe to receive statement event notifications.
An important abstract class related to GraphDB internals is StatementIterator. It has a method, boolean next(), which attempts to scroll the iterator onto the next available statement and returns true only if it succeeds. In the case of success, its subject, predicate, object and context fields are initialised with the respective components of the next statement. Furthermore, some properties of each statement are available via additional methods of the iterator. In outline, the class looks roughly like this:
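// Outline only; convenience methods and per-statement property accessors are omitted
public abstract class StatementIterator {
    // the components of the current statement, set by a successful call to next()
    public long subject;
    public long predicate;
    public long object;
    public long context;

    // moves to the next statement; returns true if one is available
    public abstract boolean next();

    // releases any resources held by the iterator
    public abstract void close();
}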
Here is a brief example, which puts Statements, Entities and StatementIterator together, in order to output all literals that are related to a given URI:
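// Fragment only: "statements" and "entities" are assumed to be the Statements and Entities
// instances passed to the plug-in; resolve(), get(), getType() and the Sesame URIImpl class
// reflect typical versions of the API and may differ
long uriId = entities.resolve(new URIImpl("http://example.com/uri"));

// fetch all statements with this entity in subject position (0 acts as a wildcard)
StatementIterator iter = statements.get(uriId, 0, 0, 0);
while (iter.next()) {
    // output only literal objects
    if (entities.getType(iter.object) == Entities.Type.LITERAL) {
        Value literal = entities.get(iter.object);
        System.out.println(literal.stringValue());
    }
}
iter.close();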
Getting to know these interfaces should be sufficient for a plug-in developer to make full use of GraphDB repository data.
As already mentioned, a plug-in's interaction with each of the request-processing phases is optional. The plug-in declares if it plans to participate in any phase by implementing the appropriate interface.
A plug-in willing to participate in request pre-processing should implement the Preprocessor interface. It looks like this:
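// As described below: preprocess() receives the request and returns a plug-in specific context (or null)
public interface Preprocessor {
    RequestContext preprocess(Request request);
}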
The preprocess() method receives the request object and returns a RequestContext instance. The Request instance passed as a parameter is of a different class depending on the type of the request (e.g. SPARQL/Update or "get statements"). The plug-in modifies the request object as necessary, then initialises and returns its context object, which is passed back to it in every other method during the request processing phase. The returned request context may be null, and in any case it is only visible to the plug-in that initialised it. It can be used to store data visible for (and only for) the whole request, e.g. to pass data relating to two different statement patterns recognised by the plug-in. The request context also gives later request-processing phases access to the Request object reference. Plug-ins that opt to skip this phase do not have a request context and cannot access the original Request object.
This is one of the most important phases in the lifetime of a plug-in. In fact, most plug-ins need to participate in exactly this phase. This is the point where the statement patterns in a request are evaluated and statement results are returned. For example, consider a SPARQL query along the following lines:
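SELECT * WHERE {
    ?s <http://example/predicate> ?o
}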
There is just one statement pattern inside this query: ?s <http://example/predicate> ?o. All plug-ins that have implemented the PatternInterpreter interface (thus declaring that they intend to participate in the pattern interpretation phase) are asked whether they can interpret this pattern. The first one to accept it and return results for it will be used. If no plug-in interprets the pattern, it is looked up using the repository's physical statements, i.e. the ones persisted on disk.
Here is the PatternInterpreter interface:
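// Approximate declaration; the parameter list may differ slightly between versions
public interface PatternInterpreter {
    // an estimate of the number of results the plug-in would return for this pattern
    double estimate(long subject, long predicate, long object, long context,
            Statements statements, Entities entities, RequestContext requestContext);

    // returns an iterator over the results for the pattern, or null if the plug-in does not handle it
    StatementIterator interpret(long subject, long predicate, long object, long context,
            Statements statements, Entities entities, RequestContext requestContext);
}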
The estimate() and interpret() methods take the same arguments and are used in the following way: interpret() evaluates the pattern and returns a StatementIterator over the matching statements (or null if the plug-in does not handle the pattern), while estimate() returns an approximation of the number of results, which the query engine uses to decide the order in which patterns are evaluated.
The requestContext parameter is the value returned by the preprocess() method, if one exists, or null otherwise.
The plug-in framework also supports the interpretation of an extended type of list pattern. Consider a SPARQL query along the following lines:
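SELECT * WHERE {
    ?s <http://example/predicate> (?o1 ?o2)
}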
If a plug-in wants to handle such list patterns it has to implement an interface very similar to the PatternInterpreter interface - ListPatternInterpreter:
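// Approximate declaration; it differs from PatternInterpreter only in taking an array of objects
public interface ListPatternInterpreter {
    double estimate(long subject, long predicate, long[] objects, long context,
            Statements statements, Entities entities, RequestContext requestContext);

    StatementIterator interpret(long subject, long predicate, long[] objects, long context,
            Statements statements, Entities entities, RequestContext requestContext);
}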
It differs only in that multiple objects are passed as an array of long values, instead of a single long object. The semantics of both methods are equivalent to those in the basic pattern interpretation case.
There are cases when a plug-in would like to modify or otherwise filter the final results of a request. This is where the Postprocessor interface comes into play:
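// Approximate declaration; some versions declare additional methods
public interface Postprocessor {
    // called for each binding set about to be returned; may modify it, or return null to drop it
    BindingSet postprocess(BindingSet bindingSet, RequestContext requestContext);

    // called after all results have been processed; may contribute additional binding sets
    Iterator<BindingSet> flush(RequestContext requestContext);
}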
The postprocess() method is called for each binding set that is to be returned to the repository client. This method may modify the binding set and return it, or return null, in which case the binding set is removed from the result set. After a binding set is processed by one plug-in, the possibly modified binding set is passed to the next plug-in with post-processing functionality. After the binding set has been processed by all such plug-ins (assuming none of them deletes it), it is returned to the client. Finally, after all results have been processed and returned, each plug-in's flush() method is called to introduce new binding set results into the result set. These, in turn, are also returned to the client.
As well as query/read processing, plug-ins are able to process update operations for statement patterns containing specific predicates. In order to intercept updates, a plug-in must implement the UpdateInterpreter interface. During initialisation, the getPredicatesToListenFor() method is called once by the framework, so that the plug-in can indicate which predicates it is interested in.
From then onwards, the plug-in framework will filter updates for statements using these predicates and notify the plug-in. Filtered updates are not processed further by GraphDB, so if the insert or delete operation should be persisted, then the plug-in must handle this by using the Statements object passed to it.
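In abbreviated form, the interface looks roughly like this:

// Approximate declaration; the exact signature of the update callback varies between versions
public interface UpdateInterpreter {
    // the predicates for which the plug-in wants to intercept updates
    long[] getPredicatesToListenFor();

    // called for each added or removed statement whose predicate is among those returned above
    void interpretUpdate(long subject, long predicate, long object, long context,
            boolean isAddition, boolean isExplicit,
            Statements statements, Entities entities);
}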
The example plug-in has two responsibilities: first, it interprets statement patterns that use a special predicate and binds the object position to a literal containing the current date/time; second, when a query contains a specific FROM clause, it discards the normal results and returns a single result in which every projected variable is bound to a literal containing the current date/time.
For the first part, it is clear that the plug-in implements the PatternInterpreter interface. A date/time literal is stored as a request-scope entity to avoid cluttering the repository with extra literals.
For the second requirement, the plug-in must first take part in the pre-processing phase in order to inspect the query and detect the FROM clause. Then the plug-in must hook into the post-processing phase where, if the pre-processing phase has detected the desired FROM clause, it deletes all query results (in postprocess()) and returns (in flush()) a single result containing the binding set specified by the requirements. Again, request-scoped literals are created.
The plug-in implementation extends the PluginBase class, which provides a default implementation of the Plugin methods. In outline, it could look roughly like this:
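// Sketch of the example plug-in; getEntities(), Entities.Scope.SYSTEM and the Sesame URIImpl class
// are assumptions that may be named differently in a particular GraphDB version
public class ExamplePlugin extends PluginBase {
    // hypothetical URI of the plug-in's special predicate
    private static final String PREDICATE_URI = "http://example.com/now";

    long predicateId;

    @Override
    public String getName() {
        return "example";
    }

    @Override
    public void initialize() {
        // register the special predicate as a system-scope entity and remember its identifier
        predicateId = getEntities().put(new URIImpl(PREDICATE_URI), Entities.Scope.SYSTEM);
    }
}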
In this basic implementation the plug-in name is defined and during initialisation a single system-scope predicate is registered. It is important not to forget to register the plug-in in the META-INF/services/com.ontotext.trree.sdk.Plugin file in the classpath.
The next step is to implement the first of the plug-in's requirements - the pattern interpretation part - which could look roughly like this:
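// Sketch; entities.put(), Entities.Scope.REQUEST and StatementIterator.create() are approximations
// of the actual API
@Override
public StatementIterator interpret(long subject, long predicate, long object, long context,
        Statements statements, Entities entities, RequestContext requestContext) {
    // decline patterns that do not use the plug-in's special predicate
    if (predicate != predicateId)
        return null;

    // create a request-scope literal holding the current date/time
    long literalId = entities.put(new LiteralImpl(new Date().toString()), Entities.Scope.REQUEST);

    // return a single statement with the new literal in object position
    return StatementIterator.create(subject, predicateId, literalId, 0);
}

@Override
public double estimate(long subject, long predicate, long object, long context,
        Statements statements, Entities entities, RequestContext requestContext) {
    // the plug-in always returns exactly one result
    return 1;
}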
The interpret() method only processes patterns with a predicate matching the desired predicate identifier. Further on, it simply creates a new date/time literal (in the request scope) and places its identifier in the object position of the returned single result. The estimate() method always returns 1, because this is the exact size of the result set.
Finally, the second requirement - interpretation of the FROM clause - could be implemented roughly as follows:
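// Sketch; QueryRequest, getDataset(), getTupleExpr(), MapBindingSet and the exact shape of
// RequestContext are assumptions based on the Sesame types typically used by the plug-in API
private static class TimeRequestContext implements RequestContext {
    private Request request;
    BindingSet result;

    @Override
    public Request getRequest() {
        return request;
    }

    @Override
    public void setRequest(Request request) {
        this.request = request;
    }
}

@Override
public RequestContext preprocess(Request request) {
    if (request instanceof QueryRequest) {
        QueryRequest queryRequest = (QueryRequest) request;
        Dataset dataset = queryRequest.getDataset();
        // hypothetical graph URI that triggers the special behaviour
        if (dataset != null && dataset.getDefaultGraphs().contains(new URIImpl("http://example.com/time"))) {
            // prepare a single binding set with every projected variable bound to the current date/time
            MapBindingSet result = new MapBindingSet();
            LiteralImpl now = new LiteralImpl(new Date().toString());
            for (String name : queryRequest.getTupleExpr().getBindingNames()) {
                result.addBinding(name, now);
            }
            TimeRequestContext context = new TimeRequestContext();
            context.setRequest(request);
            context.result = result;
            return context;
        }
    }
    return null;
}

@Override
public BindingSet postprocess(BindingSet bindingSet, RequestContext requestContext) {
    // when the special FROM clause was detected, drop all regular results
    return requestContext != null ? null : bindingSet;
}

@Override
public Iterator<BindingSet> flush(RequestContext requestContext) {
    // emit the single prepared result, or nothing
    if (requestContext != null) {
        return Collections.singleton(((TimeRequestContext) requestContext).result).iterator();
    }
    return Collections.<BindingSet>emptyList().iterator();
}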
The plug-in provides a custom implementation of the RequestContext interface, which holds a reference to the desired single BindingSet with the date/time literal bound to every variable name in the query projection. The postprocess() method filters out all results if the requestContext is non-null (i.e. if the FROM clause was detected by preprocess()). Finally, flush() returns a singleton iterator containing the desired binding set when the FROM clause was detected, and nothing otherwise.
Plug-ins often need to be configured. There are two ways for GraphDB plug-ins to receive their configuration. The first is to define magic system predicates that can be used to pass configuration values to the plug-in through a query at run-time. This approach is appropriate whenever the configuration changes from one plug-in usage scenario to another, i.e. when there are no globally valid parameters for the plug-in. In many other cases, however, the plug-in behaviour has to be configured globally, and for this the plug-in framework provides a suitable mechanism through the Configurable interface.
A plug-in implements the Configurable interface to announce its configuration parameters to the system. This allows it to read parameter values during initialisation from the repository configuration and have them merged with all other repository parameters (accessible through the SystemOptions instance passed during the configuration phase).
This is the Configurable interface:
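// Approximate declaration; the method name getParameters() is an assumption,
// and some versions declare additional methods
public interface Configurable {
    // the names of the configuration parameters the plug-in understands
    String[] getParameters();
}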
The plug-in needs to enumerate its configuration parameter names. The example plug-in is extended with the ability to define the name of the special predicate it uses. The parameter is called predicate-uri and it accepts a URI value:
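// Sketch for the example plug-in, assuming the Configurable interface shown above
@Override
public String[] getParameters() {
    // the single configuration parameter understood by the example plug-in
    return new String[] { "predicate-uri" };
}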
Now that the plug-in parameter has been declared, it can be configured either by adding the http://www.ontotext.com/trree/owlim#predicate-uri parameter to the GraphDB configuration, or by setting a Java system property with the -Dpredicate-uri parameter for the JVM running GraphDB.
There is also a special kind of configuration parameter, called "memory" parameters. These are used to configure the amount of memory available to the plug-in. If a plug-in has such parameters, it uses the MemoryConfigurable interface:
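// Approximate declaration; exact signatures may differ between versions
public interface MemoryConfigurable {
    // the names of the plug-in's memory parameters
    String[] getMemoryParameters();

    // called once for each memory parameter with the configured amount in bytes
    void setMemoryParameter(String name, long bytes);
}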
The getMemoryParameters() method enumerates the names of the plug-in's memory parameters in a similar way to Configurable.getParameters(). During the configuration phase, the plug-in's setMemoryParameter() method is called once for each such parameter with its respective configured value in bytes. The parameters defined as memory parameters can be given values like "1g" or "300M", but such values are interpreted and converted to bytes.
A special property of the memory parameters is that they can be configured as a group. GraphDB accepts a parameter called cache-memory, which accumulates the values of a group of other parameters: tuple-index-memory, fts-memory and predicate-memory. Declaring a memory parameter automatically adds it to the group of parameters accumulated by cache-memory. The benefit of this approach is that, if cache-memory is configured to some amount and any of the grouped memory parameters is left unconfigured, the amount configured for cache-memory is divided among all unconfigured memory parameters, giving the user a simple way to control the memory requirements of many plug-ins with a single parameter. For instance, suppose cache-memory is configured to "100m", tuple-index-memory to "20m", there are no predicate lists configured (which automatically disables the predicate-memory parameter), and several plug-ins declare 4 memory parameters that are not explicitly configured. The effect of such a setup is that the remaining 80M (100M - 20M) are divided among the 4 memory parameters, so each of them is set to 20M. This value is then reported to the plug-ins in bytes through their setMemoryParameter() method.
Plug-ins are able to make use of the functionality of other plug-ins. For example, the Lucene-based full-text search plug-in can make use of the rank values provided by the RDFRank plug-in, to facilitate query result scoring and ordering. This is not a matter of re-using program code (e.g. in a jar with common classes), rather it is about re-using data. The mechanism to do this allows plug-ins to obtain references to other plug-in objects by knowing their names. To achieve this they only need to implement the PluginDependency interface:
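// Approximate declaration
public interface PluginDependency {
    // called during the configuration phase with the locator used to look up other plug-ins
    void setLocator(PluginLocator locator);
}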
They are then injected with an instance of the PluginLocator interface (during the configuration phase), which does the actual plug-in discovery for them:
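// Approximate declaration; the name of the look-up method is an assumption
public interface PluginLocator {
    // retrieves a plug-in instance by its name, or null if no such plug-in is available
    Plugin locate(String name);
}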
Having a reference to another plug-in is all that is needed to call its methods directly and make use of its services.