The GraphDB Connectors provide extremely fast keyword and faceted (aggregation) searches of the kind typically implemented by an external component or service, with the additional benefit that the index stays automatically up to date with the GraphDB repository data.
The Connectors provide synchronisation at the entity level, where an entity is defined as having a unique identifier (URI) and a set of properties and property values. In terms of RDF, this corresponds to a set of triples that have the same subject. In addition to simple properties (defined by a single triple), the Connectors support property chains. A property chain is defined as a sequence of triples where each triple's object is the subject of the subsequent triple.
The main features of the GraphDB Connectors are:
- maintaining an index that is always in sync with the data stored in GraphDB
- multiple independent instances per repository
- the entities for synchronisation are defined by:
  - a list of fields (on the Solr side) and property chains (on the GraphDB side), the values of which are to be synchronised
  - a list of the rdf:type of the entities for synchronisation
  - a list of languages for synchronisation (the default is all languages)
  - additional filtering by property and value
- full-text search using native Solr queries
- snippet extraction: highlighting of search terms in the search result
- faceted search
- sorting by any preconfigured field
- paging of results using offset and limit
All interaction with the Solr GraphDB Connector is done through SPARQL queries.
There are three types of SPARQL queries:
- INSERT for creating and deleting connectors.
- SELECT for listing connectors and querying connector configuration parameters.
- INSERT/SELECT for storing and querying data as part of the normal GraphDB data workflow.
In general, INSERT adds or modifies data and SELECT queries existing data.
Each connector implementation defines its own URI prefix to distinguish it from other connectors. For the Solr GraphDB Connector this is http://www.ontotext.com/connectors/solr#. Each command or predicate that is executed by the connector uses this prefix, e.g. <http://www.ontotext.com/connectors/solr#createConnector> for creating a connector for Solr.
Individual instances of a connector are distinguished by unique names that are also URIs. They have their own prefix to avoid clashing with any of the command predicates. For Solr, the instance prefix is http://www.ontotext.com/connectors/solr/instance#.
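As a rough sketch of the management queries described above, the following declares the command prefix and lists the existing connectors. The prefix labels (: and inst:) are arbitrary choices. The listConnectors and dropConnector predicates are not named in this section; they are assumed here from the general GraphDB Connectors vocabulary and should be checked against your GraphDB version.

```sparql
# Prefix label is arbitrary; the namespace URI is the one given above.
PREFIX : <http://www.ontotext.com/connectors/solr#>

# Assumed predicate: listConnectors (not named in this section).
# Binds each connector instance URI together with its configuration string.
SELECT ?connector ?config WHERE {
  ?connector :listConnectors ?config .
}
```

Deleting a connector would then be an INSERT against the instance URI, again under the same assumptions:

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

# Assumed predicate: dropConnector. my_index is a placeholder connector name.
INSERT DATA {
  inst:my_index :dropConnector "" .
}
```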
Creating a connector is done by sending a SPARQL query with the following configuration data:
- the name of the connector (e.g. my_index)
- classes to synchronise
- properties to synchronise
The configuration data must be provided as a JSON string representation and passed together with the create command.
What we recommend:
Use the GraphDB Connectors management interface provided by the GraphDB Workbench. It lets you create the configuration easily and then create the connector directly or copy the configuration and execute it elsewhere.
The create command is triggered by a SPARQL INSERT with the createConnector predicate, e.g. this creates a connector called model_en that will synchronise the entities of type Model.
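A minimal sketch of such a create command is given here under several assumptions. The Model class URI is a placeholder, the single prefLabel field and its skos:prefLabel property chain are illustrative, and the JSON keys other than "solrUrl" (copyConfigsFrom, languages, types, fields, fieldName, propertyChain) are assumed from the general connector configuration format rather than taken from this section.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

# Assumptions: the class URI http://example.com/ontology#Model is hypothetical,
# and the field definition is only one plausible choice for a label field.
# The JSON configuration is passed as a string literal to createConnector.
INSERT DATA {
  inst:model_en :createConnector '''
{
  "solrUrl": "http://localhost:8983/solr",
  "copyConfigsFrom": "template_en",
  "languages": ["en"],
  "types": ["http://example.com/ontology#Model"],
  "fields": [
    {
      "fieldName": "prefLabel",
      "propertyChain": ["http://www.w3.org/2004/02/skos/core#prefLabel"]
    }
  ]
}
''' .
}
```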
Note that we use template_en as a core to copy the config from and we specify the needed languages with languages.
The above command creates a new Solr connector that connects to the Solr instance accessible at port 8983 on the localhost as specified by the "solrUrl" key.
Once a connector has been created, it is possible to query data from it through SPARQL. For each matching abstract document, the connector returns the document's subject. In its simplest form, querying is achieved by using a SELECT and providing the Solr query as the object of the :query predicate:
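A simple query of this kind might look like the sketch below. It assumes a connector instance named model_en with an indexed field called prefLabel, and it uses the :entities predicate (assumed from the connector vocabulary, not named in this section) to bind the matching subjects.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

# The object of :query is a native Solr query string.
# Assumed: the :entities predicate and the prefLabel field name.
SELECT ?entity WHERE {
  ?search a inst:model_en ;
          :query "prefLabel:Messi" ;
          :entities ?entity .
}
```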
The result will bind ?entity to all models that mention Messi in their preferred label.
The bound ?entity can be used in other SPARQL triples in order to build complex queries that fetch additional data from GraphDB. For example, to see the actual model labels in the matching models as well as their types:
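One way to write such a combined query is sketched here. It assumes that the model labels are stored with skos:prefLabel, which this section does not state explicitly, and that the connector is named model_en.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?entity ?label ?type WHERE {
  # Full-text part, answered by the connector (assumed :entities predicate).
  ?search a inst:model_en ;
          :query "prefLabel:Messi" ;
          :entities ?entity .
  # Ordinary triple patterns, answered from the GraphDB repository.
  # Assumed: labels are stored with skos:prefLabel.
  ?entity skos:prefLabel ?label ;
          a ?type .
}
```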
It is possible to access the match score returned by Solr with the :score predicate. As each entity has its own score, the predicate must come at the entity level, for example:
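A sketch of a scored query follows. The Solr query string is an assumption: it requires Messi and treats leather as an optional term, using standard Lucene query syntax, so that documents containing both terms rank higher.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?entity ?score WHERE {
  # "+" marks a required clause; the second clause is optional and only boosts the score.
  ?search a inst:model_en ;
          :query "+prefLabel:Messi prefLabel:leather" ;
          :entities ?entity .
  # :score is attached to the entity, not to the search node.
  ?entity :score ?score .
}
ORDER BY DESC(?score)
```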
The result will again contain models featuring Messi, but those that also mention "leather" will have a higher match score.
We can use the same connector to query facets too:
It is important to specify the fields we want to facet by using the facetFields predicate. Its value must be a simple comma-delimited list of field names. In order to get the faceted results, we have to use the facets predicate and as each facet has three components (name, value and count), the facets predicate will bind a blank node, which in turn can be used to access the individual values for each component through the predicates facetName, facetValue, and facetCount.
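The facet predicates described above could be combined as in the following sketch. The field names brand and category are hypothetical and stand for whatever fields were defined when the connector was created; the match-all query "*:*" is standard Solr syntax.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?facetName ?facetValue ?facetCount WHERE {
  # Hypothetical field names in :facetFields; replace with fields from your connector.
  ?search a inst:model_en ;
          :query "*:*" ;
          :facetFields "brand,category" ;
          :facets _:f .
  # Each facet is a blank node carrying name, value and count.
  _:f :facetName ?facetName ;
      :facetValue ?facetValue ;
      :facetCount ?facetCount .
}
```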
The resulting bindings look like those in the table below:
Limit and offset are supported on the Solr side of the query. This is achieved through the predicates limit and offset. Consider this example, in which we specify an offset of 1 and a limit of 1:
The result will skip the first match and contain at most one model.
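A paged query with an offset of 1 and a limit of 1 might be written as follows. Passing the values as plain string literals is an assumption about the accepted literal form.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?entity WHERE {
  # Paging is applied by Solr before the results reach SPARQL.
  # Assumed: offset and limit values are given as string literals.
  ?search a inst:model_en ;
          :query "prefLabel:Messi" ;
          :offset "1" ;
          :limit "1" ;
          :entities ?entity .
}
```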
Snippet extraction is used to extract highlighted snippets of text that match the query. The snippets are accessed through the dedicated predicate :snippets. It binds a blank node, which in turn provides the actual snippets via the predicates :snippetField and :snippetText. The predicate :snippets must be attached to the entity, as each entity has a different set of snippets. For example, in a search for Messi in all fields:
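Such a snippet query can be sketched as below. It assumes that an unqualified Solr query term is matched against a default search field that covers all indexed fields, which depends on the Solr core configuration.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?entity ?snippetField ?snippetText WHERE {
  # Assumed: "Messi" without a field prefix searches the core's default field.
  ?search a inst:model_en ;
          :query "Messi" ;
          :entities ?entity .
  # Snippets hang off the entity via a blank node.
  ?entity :snippets _:s .
  _:s :snippetField ?snippetField ;
      :snippetText ?snippetText .
}
```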
The query will return the matching models mentioning Messi as well as the respective matching fields and snippets, e.g.
It is possible to tweak how the snippets are collected/composed by using the following option predicates:
- :snippetSize sets the maximum size of the extracted text fragment, 250 by default.
- :snippetSpanOpen sets the text to insert before the highlighted text, <em> by default.
- :snippetSpanClose sets the text to insert after the highlighted text, </em> by default.
The option predicates are set on the connector instance, much like the :query predicate.
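The snippet options listed above could be applied as in this sketch. The chosen values (a size of 100 and <b> tags) are arbitrary, and writing the size as a string literal is an assumption about the accepted literal form.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?entity ?snippetField ?snippetText WHERE {
  # Option predicates sit on the search node, next to :query.
  # Assumed: option values are passed as string literals.
  ?search a inst:model_en ;
          :query "Messi" ;
          :snippetSize "100" ;
          :snippetSpanOpen "<b>" ;
          :snippetSpanClose "</b>" ;
          :entities ?entity .
  ?entity :snippets _:s .
  _:s :snippetField ?snippetField ;
      :snippetText ?snippetText .
}
```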
You can get the total number of hits by using the :totalHits predicate, e.g. for the Messi in prefLabel search:
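A total-hits query might look like the following sketch, again assuming the model_en connector and the prefLabel field.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?totalHits WHERE {
  # :totalHits is attached to the search node and binds a single count.
  ?search a inst:model_en ;
          :query "prefLabel:Messi" ;
          :totalHits ?totalHits .
}
```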
There are 116 models mentioning Messi.
If we change the creation parameters for the connector by using a different language list and Solr template, we can support multiple languages. We have defined two additional connectors, model_de and model_ru, for German and Russian respectively.
To illustrate that the language settings have kicked in, we can search for all models that mention both Messi (Месси in Russian) and the word leather (кожа in Russian) and request the matching snippets so that we can verify the correct words have matched:
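A Russian-language search of this kind is sketched below. It assumes that unqualified terms are matched against the core's default search field and uses the standard Solr AND operator to require both words.

```sparql
PREFIX : <http://www.ontotext.com/connectors/solr#>
PREFIX inst: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?entity ?snippetField ?snippetText WHERE {
  # Query the Russian connector; both terms must match.
  # Assumed: terms without a field prefix search the default field.
  ?search a inst:model_ru ;
          :query "Месси AND кожа" ;
          :entities ?entity .
  # Snippets show which word forms were actually matched.
  ?entity :snippets _:s .
  _:s :snippetField ?snippetField ;
      :snippetText ?snippetText .
}
```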
The requested word "кож*а*" is the base (dictionary) form of the word, while the results highlight the word "кож*и*", which is the same word but in the genitive case. Matching the two forms of the words would not have been possible without proper language-specific support.