{htmlcomment}
WARNING: DO NOT EDIT THIS ARTICLE. IT HAS BEEN AUTOMATICALLY GENERATED FROM A TEMPLATE.
{htmlcomment}
{toc:maxLevel=2}
h1. Overview
The GraphDB Connectors provide extremely fast normal and facet (aggregation) searches that are typically implemented by an external component or service such as Elasticsearch, but have the additional benefit of staying automatically up-to-date with the GraphDB repository data.
The Connectors provide synchronisation at the _entity_ level, where an entity is defined as having a unique identifier (a URI) and a set of properties and property values. In terms of RDF, this corresponds to a set of triples that have the same subject. In addition to simple properties (defined by a single triple), the Connectors support _property chains_. A property chain is defined as a sequence of triples where each triple's object is the subject of the following triple.
h1. Features
The main features of the GraphDB Connectors are:
* maintaining an index that is always in sync with the data stored in GraphDB;
* multiple independent instances per repository;
* the entities for synchronisation are defined by:
** a list of fields (on the Elasticsearch side) and property chains (on the GraphDB side) whose values will be synchronised;
** a list of rdf:type's of the entities for synchronisation;
** a list of languages for synchronisation (the default is all languages);
** additional filtering by property and value.
* full-text search using native Elasticsearch queries;
* snippet extraction: highlighting of search terms in the search result;
* faceted search;
* sorting by any preconfigured field;
* paging of results using _offset_ and _limit_;
* custom mapping of RDF types to Elasticsearch types;
Each feature is described in detail below.
h1. Sample data
All examples use the following sample data, which describes five fictitious wines: Yoyowine, Franvino, Noirette, Blanquito and Rozova as well as the grape varieties required to make these wines. The minimum required ruleset level in GraphDB is RDFS.
{noformat}
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://www.ontotext.com/example/wine#> .
:RedWine rdfs:subClassOf :Wine .
:WhiteWine rdfs:subClassOf :Wine .
:RoseWine rdfs:subClassOf :Wine .
:Merlo
rdf:type :Grape ;
rdfs:label "Merlo" .
:CabernetSauvignon
rdf:type :Grape ;
rdfs:label "Cabernet Sauvignon" .
:CabernetFranc
rdf:type :Grape ;
rdfs:label "Cabernet Franc" .
:PinotNoir
rdf:type :Grape ;
rdfs:label "Pinot Noir" .
:Chardonnay
rdf:type :Grape ;
rdfs:label "Chardonnay" .
:Yoyowine
rdf:type :RedWine ;
:madeFromGrape :CabernetSauvignon ;
:hasSugar "dry" ;
:hasYear "2013"^^xsd:integer .
:Franvino
rdf:type :RedWine ;
:madeFromGrape :Merlo ;
:madeFromGrape :CabernetFranc ;
:hasSugar "dry" ;
:hasYear "2012"^^xsd:integer .
:Noirette
rdf:type :RedWine ;
:madeFromGrape :PinotNoir ;
:hasSugar "medium" ;
:hasYear "2012"^^xsd:integer .
:Blanquito
rdf:type :WhiteWine ;
:madeFromGrape :Chardonnay ;
:hasSugar "dry" ;
:hasYear "2012"^^xsd:integer .
:Rozova
rdf:type :RoseWine ;
:madeFromGrape :PinotNoir ;
:hasSugar "medium" ;
:hasYear "2013"^^xsd:integer .
{noformat}
h1. Usage
All interactions with the Elasticsearch GraphDB Connector shall be done through SPARQL queries.
There are three types of SPARQL queries:
* INSERT for creating and deleting connectors;
* SELECT for listing connectors and querying connector configuration parameters;
* INSERT/SELECT for storing and querying data as part of the normal GraphDB data workflow.
In general this corresponds to _INSERT adds or modifies data_ and _SELECT queries existing data_.
Each connector implementation defines its own URI prefix to distinguish it from other connectors. For the Elasticsearch GraphDB Connector, this is *http://www.ontotext.com/connectors/elasticsearch#*. Each command or predicate executed by the connector uses this prefix, e.g. <http://www.ontotext.com/connectors/elasticsearch#createConnector> to create a connector for Elasticsearch.
Individual instances of a connector are distinguished by unique names that are also URIs. They have their own prefix to avoid clashing with any of the command predicates. For Elasticsearch, the instance prefix is http://www.ontotext.com/connectors/elasticsearch/instance#.
h2. Using a connector with a GraphDB cluster
This release introduces support for Elasticsearch connectors in a GraphDB cluster. The connectors require a transactional entity pool, which is off by default. Please refer to [GraphDB Entity Pool] to enable the transactional entity pool.
h2. Creating a connector
Creating a connector is done by sending a SPARQL query with the following configuration data:
* the name of the connector (e.g. my_index);
* classes to synchronise;
* properties to synchronise.
The configuration data has to be provided as a JSON string representation and passed together with the create command.
{tip:title=What we recommend}
Use the GraphDB Connectors management interface provided by the GraphDB Workbench: it lets you create the configuration easily and then either create the connector directly or copy the configuration and execute it elsewhere.
{tip}
The create command is triggered by a SPARQL *INSERT* with the *createConnector* predicate, e.g. this will create a connector called *my_index* that will synchronise the wines from the sample data above:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9300",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "grape",
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
]
},
{
"fieldName": "sugar",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasSugar"
]
},
{
"fieldName": "year",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasYear"
]
}
]
}
''' .
}
{noformat}
The above command will create a new Elasticsearch connector that will connect to the Elasticsearch instance accessible at port 9300 on the localhost, as specified by the "elasticsearchNode" key.
The "types" key defines the RDF type of the entities to synchronise and in the example it is only entities of the type <http://www.ontotext.com/example/wine#Wine> (and its subtypes). The "fields" key defines the mapping from RDF to Elasticsearch. The basic building block is the property chain, i.e. a sequence of RDF properties where the object of each property is the subject of the following property. In the example we map three bits of information - the wine's grape, sugar content, and year. Each chain is assigned a short and convenient field name: "grape", "sugar", and "year". The field names are later used in the queries.
Grape is an example of a property chain composed of more than one property. First we take the wine's madeFromGrape property, the object of which is an instance of type Grape, and then we take the rdfs:label of this instance. Sugar and year are both composed of a single property that links the value directly to the wine.
h4. Schema and index management
By default GraphDB will manage (create, delete or update if needed) the Elasticsearch index and the mapping (schema). This makes it easier to use Elasticsearch as everything will be done automatically. This behaviour can be changed by the following options:
* _manageIndex_: if true, GraphDB will manage the index. True by default.
* _manageSchema_: if true, GraphDB will manage the schema. True by default.
Note that if either of the options is set to false, you will be responsible for creating, updating or removing the index/schema, and the connector will not function correctly if Elasticsearch is misconfigured.
h5. Using a non-managed schema
The present version of the connectors provides no support for changing some advanced options, such as stopwords, on a per field basis. The recommended way to do that for now is to manage the mapping yourself and tell the connector to just sync the object values in the appropriate fields. Here is an example:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9200",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "grape",
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
]
},
{
"fieldName": "sugar",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasSugar"
]
},
{
"fieldName": "year",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasYear"
]
}
],
"manageSchema": "false"
}
''' .
}
{noformat}
This will create the same connector as above but it expects fields with the specified field names to be already present in the index mapping, as well as some internal GraphDB fields. For this example you must have the following fields:
|| field name || Elasticsearch config ||
| _graphdb_id | "type":"long", "index":"not_analyzed", "store":"yes" |
| _chains | "type":"long", "index":"not_analyzed", "store":"no" |
| grape | "type":"string", "index":"analyzed", "store":"yes" |
| sugar | "type":"string", "index":"analyzed", "store":"yes" |
| year | "type":"integer", "index":"analyzed", "store":"yes" |
_graphdb_id and _chains are used internally by GraphDB and are always required.
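For reference, the sketch below shows what a hand-created mapping covering these fields might look like when the index is created (e.g. via a PUT request to the index). It is only an illustration: the document type name *wine* is an assumption and the exact attribute values accepted depend on your Elasticsearch version.
{noformat}
{
    "mappings": {
        "wine": {
            "properties": {
                "_graphdb_id": { "type": "long", "index": "not_analyzed", "store": "yes" },
                "_chains": { "type": "long", "index": "not_analyzed", "store": "no" },
                "grape": { "type": "string", "index": "analyzed", "store": "yes" },
                "sugar": { "type": "string", "index": "analyzed", "store": "yes" },
                "year": { "type": "integer", "store": "yes" }
            }
        }
    }
}
{noformat}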
h2. Dropping a connector
Dropping a connector removes all references to its external store from GraphDB as well as the Elasticsearch index associated with it. Dropping a connector is achieved through a SPARQL INSERT query with the following parameter:
* Name of the connector
The drop command is triggered by a SPARQL *INSERT* with the *dropConnector* predicate, e.g. this will remove the connector *:my_index*:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :dropConnector "" .
}
{noformat}
h2. Listing available connectors
Listing connectors returns all previously created connectors. It is a *SELECT* query with the *listConnectors* predicate:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT ?cntUri ?cntStr {
?cntUri :listConnectors ?cntStr .
}
{noformat}
*?cntUri* will be bound to the prefixed URI of the connector that was used during creation, e.g. <http://www.ontotext.com/connectors/elasticsearch/instance#my_index>, while *?cntStr* will be bound to a string, representing the part after the prefix, e.g. "my_index".
h2. Status check
The internal state of each connector can be queried using a *SELECT* query and the *connectorStatus* predicate:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT ?cntUri ?cntStatus {
?cntUri :connectorStatus ?cntStatus .
}
{noformat}
*?cntUri* will be bound to the connector prefixed URI, while *?cntStatus* will be bound to a string representation of the status of the connector represented by this URI. The status is key-value based.
h2. Adding, updating and deleting data
From the user's point of view all synchronisation will happen transparently without using any additional predicates or naming a specific store explicitly, i.e. the user should simply execute standard SPARQL INSERT/DELETE queries. This is achieved by intercepting all changes in the plugin and determining which abstract documents need to be updated.
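For example, adding a new wine with a plain SPARQL update is enough for it to be indexed by the my_index connector from above; no connector-specific syntax is involved. The wine :Grandiosa below is made up for illustration:
{noformat}
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://www.ontotext.com/example/wine#>
INSERT DATA {
    :Grandiosa
        rdf:type :RoseWine ;
        :madeFromGrape :Merlo ;
        :hasSugar "medium" ;
        :hasYear "2014"^^xsd:integer .
}
{noformat}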
h2. Querying data
Once a connector has been created it will be possible to query data from it through SPARQL. For each matching abstract document, the connector returns the document's subject. In its simplest form querying is achieved by using a *SELECT* and providing the Elasticsearch query as the object of the *:query* predicate:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
}
{noformat}
The result will bind ?entity to the two wines made from grapes that have "cabernet" in their name, namely :Yoyowine and :Franvino.
Note that you must use the field names you chose when you created the connector. It is perfectly valid to have field names identical to the property URIs but then you are responsible for escaping any special characters according to what Elasticsearch expects.
First we get an instance of the requested connector by using the RDF notation "X a Y" (= X rdf:type Y), where X is a variable and Y is a connector. X will be bound to an instance of this connector. Then we assign a query to that instance by using the system predicate *:query*. Finally we request the matching entities through the *:entities* predicate.
It is also possible to provide per-query search options by using one or more option predicates. The option predicates are described in detail below.
h4. Raw queries
If you want to access an Elasticsearch query parameter that is not exposed through a special predicate, you can do it with a raw query. Instead of providing a full-text query in the :query part, you specify raw Elasticsearch parameters. For example, if you want to boost some parts of your full-text query as described [here|http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html], you can use the following query:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query '''
{
"query" : {
"bool" : {
"should" : [ {
"query_string" : {
"query" : "<full-text-query-not-boosted>"
}
}, {
"query_string" : {
"query" : "<full-text-query-boosted>",
"boost" : 4.0
}
} ]
}
}
}
''' ;
:entities ?entity .
}
{noformat}
h3. Combining Elasticsearch results with GraphDB data
The bound ?entity can be used in other SPARQL triples in order to build complex queries that fetch additional data from GraphDB. For example, to see the actual grapes in the matching wines as well as the year they were made:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
SELECT ?entity ?grape ?year {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
?entity wine:madeFromGrape ?grape .
?entity wine:hasYear ?year
}
{noformat}
The result will look like this:
|| ?entity || ?grape || ?year ||
| :Yoyowine | :CabernetSauvignon | 2013 |
| :Franvino | :Merlo | 2012 |
| :Franvino | :CabernetFranc | 2012 |
Note that :Franvino is returned twice because it is made from two different grapes, both of which are returned.
h3. Entity match score
It is possible to access the match score returned by Elasticsearch with the *:score* predicate. As each entity has its own score, the predicate must come at the entity level. For example:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity ?score {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
?entity :score ?score
}
{noformat}
The result will look like this but the actual score might be different as it depends on the specific Elasticsearch version:
|| ?entity || ?score ||
| :Yoyowine | 0.9442660212516785 |
| :Franvino | 0.7554128170013428 |
h2. Basic faceting
Consider the sample wine data and the my_index connector described previously. We can use the same connector to query facets too:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?facetName ?facetValue ?facetCount WHERE {
# note empty query is allowed and will just match all documents, hence no :query
?r a inst:my_index ;
:facetFields "year,sugar" ;
:facets _:f .
_:f :facetName ?facetName .
_:f :facetValue ?facetValue .
_:f :facetCount ?facetCount .
}
{noformat}
It is important to specify the fields we want to facet by using the *facetFields* predicate. Its value must be a simple comma-delimited list of field names. In order to get the faceted results, we have to use the *facets* predicate and as each facet has three components (name, value and count), the facets predicate binds a blank node, which in turn can be used to access the individual values for each component through the predicates *facetName*, *facetValue*, and *facetCount*.
The resulting bindings will look like in the table below:
|| facetName || facetValue || facetCount ||
| year | 2012 | 3 |
| year | 2013 | 2 |
| sugar | dry | 3 |
| sugar | medium | 2 |
We can easily see that there are three wines produced in 2012 and two in 2013. We also see that three of the wines are dry, while two are medium. However, it is not necessarily true that the three wines produced in 2012 are the same as the three dry wines as each facet is computed independently.
h2. Advanced faceting and aggregations
While basic faceting allows for simple counting of documents based on the discrete values of a particular field, there are more complex faceted or aggregation searches in Elasticsearch. The connector provides a mapping from Elasticsearch results to RDF results but no mechanism for specifying the queries other than executing a [raw query|#Raw queries].
h3. Supported Elasticsearch facets and aggregations
The Elasticsearch connector supports mapping of range, interval and pivot facets. Please refer to the Elasticsearch documentation for more information.
h3. RDF mapping of the results
The results are accessed through the predicate :aggregations (much like the basic facets are accessed through :facets). The predicate will bind multiple blank nodes that each contain a single aggregation bucket. The individual bucket items can be accessed through these predicates:
|| predicate || meaning || Elasticsearch counterpart ||
| :name | Bucket name | getName() |
| :key | Key or value associated with the bucket | getValue() or getKey() |
| :count | Count of documents in the bucket | getDocCount(), getValue() |
| :from | Start of range | getFrom(), getFromAsDate() |
| :to | End of range (RangeFacet) | getTo(), getToAsDate() |
| :min | Minimum value | getMin(), getValue() |
| :max | Maximum value | getMax(), getValue() |
| :sum | Sum value | getSum(), getValue() |
| :avg | Average value | getAvg(), getValue() |
| :sum_of_squares | Sum of squares value | getSumOfSquares() |
| :variance | Variance value | getVariance() |
| :std_deviation | Standard deviation value | getStdDeviation() |
| :parent | Sub-aggregations: points to the parent (upper level) blank node | |
| :level | Sub-aggregations: level number where 1 is the uppermost level and the following levels are 2, 3 and so on | |
| :levelName | Sub-aggregations: level name | getKey() or getValue() |
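As an illustration, the sketch below combines a raw query requesting a terms aggregation on the year field with the :aggregations predicate to read the buckets back. Treat the raw JSON body as an assumption, since the exact raw request format accepted by the connector may differ between versions:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?name ?key ?count {
    ?search a inst:my_index ;
        :query '''
{
    "aggs" : {
        "years" : {
            "terms" : { "field" : "year" }
        }
    }
}
''' ;
        :aggregations ?bucket .
    ?bucket :name ?name ;
        :key ?key ;
        :count ?count .
}
{noformat}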
{anchor:sorting}
h2. Sorting
It is possible to sort the entities returned by a connector query according to one or more fields. In order to be able to use a certain field for sorting, you have to specify this at the time of creating the connector instance. Sorting is achieved by the *orderBy* predicate, the value of which must be a comma-delimited list of fields. Each field may be prefixed with a minus to indicate sorting in descending order. For example:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query "year:2013" ;
:orderBy "-sugar" ;
:entities ?entity .
}
{noformat}
The result will contain wines produced in 2013 sorted according to their sugar content in descending order:
|| entity ||
| Rozova |
| Yoyowine |
By default, entities are sorted according to their matching score in descending order.
Note that if you join the entity from the connector query to other triples stored in GraphDB, GraphDB might scramble the order. To remedy this, use ORDER BY from SPARQL.
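The following sketch, based on the sample data, shows the combination: the connector still sorts on its side, while SPARQL's ORDER BY guarantees the final ordering after the join:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
SELECT ?entity ?sugar {
    ?search a inst:my_index ;
        :query "year:2013" ;
        :orderBy "-sugar" ;
        :entities ?entity .
    ?entity wine:hasSugar ?sugar .
}
ORDER BY DESC(?sugar)
{noformat}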
h2. Limit and offset
Limit and offset are supported on the Elasticsearch side of the query. This is achieved through the predicates *limit* and *offset*. Consider this example in which we specify an offset of 1 and a limit of 1:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query "sugar:dry" ;
:offset "1" ;
:limit "1" ;
:entities ?entity .
}
{noformat}
The result will contain a single wine, Franvino, as it would be second in the list if we executed the query without the limit and offset:
|| entity ||
| Yoyowine |
| *Franvino* |
| Blanquito |
Note that the specific order in which GraphDB returns the results depends on how Elasticsearch returns the matches, unless sorting is specified.
h2. Snippet extraction
Snippet extraction is used to extract highlighted snippets of text that match the query. The snippets are accessed through the dedicated predicate *:snippets*, which binds a blank node that in turn provides the actual snippets via the predicates *:snippetField* and *:snippetText*. The predicate :snippets must be attached to the entity, as each entity has a different set of snippets. For example, in a search for Cabernet:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity ?snippetField ?snippetText {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
?entity :snippets _:s .
_:s :snippetField ?snippetField ;
:snippetText ?snippetText .
}
{noformat}
The query will return the two wines made from Cabernet Sauvignon or Cabernet Franc grapes as well as the respective matching fields and snippets:
|| ?entity || ?snippetField || ?snippetText ||
| :Yoyowine | grape | <em>Cabernet</em> Sauvignon |
| :Franvino | grape | <em>Cabernet</em> Franc |
Note that the actual snippets might be somewhat different as this depends on the specific Elasticsearch implementation.
It is possible to tweak how the snippets are collected/composed by using the following option predicates:
* *:snippetSize* sets the maximum size of the extracted text fragment, 250 by default.
* *:snippetSpanOpen* text to insert before the highlighted text, <em> by default.
* *:snippetSpanClose* text to insert after the highlighted text, </em> by default.
The option predicates are set on the connector instance, much like the :query predicate.
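For example, the query below shrinks the fragments and switches the highlighting tags to <b>...</b>. Passing the snippet size as a string literal is an assumption here, mirroring how :offset and :limit are written elsewhere in this document:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity ?snippetText {
    ?search a inst:my_index ;
        :query "grape:cabernet" ;
        :snippetSize "100" ;
        :snippetSpanOpen "<b>" ;
        :snippetSpanClose "</b>" ;
        :entities ?entity .
    ?entity :snippets _:s .
    _:s :snippetText ?snippetText .
}
{noformat}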
h2. Total hits
You can get the total number of hits by using the *:totalHits* predicate, e.g. for the connector :my_index and a query that would retrieve all wines made in 2012:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?totalHits {
?r a inst:my_index ;
:query "year:2012" ;
:totalHits ?totalHits .
}
{noformat}
As there are three wines made in 2012, the value 3 (of type xsd:long) will be bound to ?totalHits.
h1. Creation parameters
The creation parameters define how a connector instance is created by the :createConnector predicate. There are some required parameters and some that are optional. All parameters are provided together in a JSON object, where the parameter names are the object keys. Parameter values may be simple JSON values such as a string or a boolean, or they can be lists or objects.
All of the creation parameters can also be set conveniently from the Create Connector user interface in the GraphDB Workbench without any knowledge of JSON.
h3. Elasticsearch instance to sync to: elasticsearchNode (string)
Since Elasticsearch is a third-party service, you have to specify the node where it is running. The node value has the form *hostname.domain:port*. There is no default value.
h3. Types of entities to sync: types (list of URI)
The RDF types of entities to sync are specified as a list of URIs. At least one type URI must be provided.
h3. What exactly to sync: fields (list of field object)
The fields define exactly what parts of each entity will be synchronised as well as the specific details on the connector side. The field is the smallest synchronisation unit and it maps a property chain from GraphDB to a field in Elasticsearch. The fields are specified as a list of field objects. At least one field object must be provided. Each field object has further keys that specify details.
h4. Name of the field: fieldName (string)
The name of the field defines the mapping on the connector side. It is specified by the key fieldName with a string value. The field name is used at query time to refer to the field. There are few restrictions on the allowed characters in a field name but to avoid unnecessary escaping (which depends on how Elasticsearch parses its queries) we recommend keeping the field names simple.
h4. Property chain to map: propertyChain (list of URI)
The property chain (propertyChain) defines the mapping on the GraphDB side. A property chain is defined as a sequence of triples where the entity URI is the subject of the first triple, its object is the subject of the next triple and so on. In this model, a property chain with a single element corresponds to a direct property defined by a single triple. Property chains are specified as a list of URIs and at least one URI must be provided. If you need to store the entity URI in the connector, you may map it by defining a property chain with a single special URI: $self. Only one field per connector may use the $self notation.
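A minimal sketch of a field definition using $self (the field name documentUri is an arbitrary choice):
{noformat}
{
    "fieldName": "documentUri",
    "propertyChain": [
        "$self"
    ]
}
{noformat}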
h4. The default value: defaultValue (string)
The default value (defaultValue) provides a means of specifying a default value for the field when the property chain has no matching values in GraphDB. The default value can be a plain literal, a literal with a datatype (xsd: prefix supported), a literal with a language tag or a URI. This parameter has no default value.
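For instance, the sugar field from the sample connector could fall back to an illustrative value when a wine has no :hasSugar triple:
{noformat}
{
    "fieldName": "sugar",
    "propertyChain": [
        "http://www.ontotext.com/example/wine#hasSugar"
    ],
    "defaultValue": "unknown"
}
{noformat}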
h4. Indexing the field: indexed (boolean)
Fields are indexed by default but that can be changed by using the Boolean option "indexed". True by default.
If true this option corresponds to "index" = "analyzed" or "not_analyzed". If false it corresponds to "index" = "no".
h4. Storing the field: stored (boolean)
Fields are stored in Elasticsearch by default but that can be changed by using the Boolean option "stored". Stored fields are required for retrieving snippets. True by default.
This option corresponds to the property "store" in the Elasticsearch mapping.
h4. Skipping the analyser: analyzed (boolean)
When literal fields are indexed in Elasticsearch, they will be analysed according to the analyser settings. Should you require that a given field is not analysed, you may set "analyzed" to false. This option has no effect for URIs (they are never analysed). True by default.
If true this option corresponds to "index" = "analyzed" in the Elasticsearch schema. If false it corresponds to "index" = "not_analyzed".
h4. Multivalued fields: multivalued (boolean)
RDF properties and synchronised fields may have more than one value. If "multivalued" is set to true, all values will be synchronised to Elasticsearch. If set to false, only a single value will be synchronised. True by default.
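Putting the per-field Boolean options together, a field definition might look like the sketch below; the values shown simply restate the defaults, except for "analyzed", which is switched off:
{noformat}
{
    "fieldName": "grape",
    "propertyChain": [
        "http://www.ontotext.com/example/wine#madeFromGrape",
        "http://www.w3.org/2000/01/rdf-schema#label"
    ],
    "indexed": true,
    "stored": true,
    "analyzed": false,
    "multivalued": true
}
{noformat}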
h3. Automatic datatype mapping
The connector will map different types of RDF values to different types of Elasticsearch values according to the basic type of the RDF value (URI or literal) and the datatype of literals. The autodetection will use the following mapping:
|| RDF value || RDF datatype || Elasticsearch type ||
| URI | n/a | string, indexed = not_analyzed |
| literal | none | string |
| literal | xsd:boolean | boolean |
| literal | xsd:double | double |
| literal | xsd:float | float |
| literal | xsd:long | long |
| literal | xsd:int | integer |
| literal | xsd:dateTime | date, format = date_optional_time |
| literal | xsd:date | date, format = date_optional_time |
Note that for any given field the automatic mapping will use the first value it sees. This will work fine for clean datasets but might lead to problems if your dataset has non-normalised data, e.g. the first value has no datatype but other values have one.
h4. Manual datatype mapping: datatype (string)
The mapping can be overridden through the property "datatype", which can be specified per field. The value of "datatype" may be any of the xsd: types supported by the automatic mapping or a native Elasticsearch type prefixed by native:, e.g. both xsd:long and native:long will map to the long type in Elasticsearch.
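For example, to force the year field to the Elasticsearch long type regardless of what the first synchronised value looks like:
{noformat}
{
    "fieldName": "year",
    "propertyChain": [
        "http://www.ontotext.com/example/wine#hasYear"
    ],
    "datatype": "native:long"
}
{noformat}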
h3. Literals in what language: languages (list of string)
RDF data is often multilingual but you may want to map only some of the languages represented in the literal values. This can be done by specifying a list of language ranges that will be matched to the language tags of literals according to RFC 4647, Section 3.3.1. Basic Filtering. In addition an empty range can be used to include literals that have no language tag. The list of language ranges will map all existing literals that have matching language tags.
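For example, to synchronise only English and German literals plus literals without a language tag, the top-level configuration could contain the following entry (the particular language ranges are illustrative):
{noformat}
"languages": ["en", "de", ""]
{noformat}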
h3. Elasticsearch index extra settings: indexCreateSettings (string)
This option will be passed directly to Elasticsearch when creating the index. It can be in JSON, YAML or properties format.
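A sketch of how this might be passed, assuming the JSON format; since the value is itself a string inside the configuration JSON, the inner quotes must be escaped (number_of_shards and number_of_replicas are standard Elasticsearch index settings):
{noformat}
"indexCreateSettings": "{\"index\": {\"number_of_shards\": 3, \"number_of_replicas\": 1}}"
{noformat}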
h1. Advanced filtering and fine tuning
h3. entityFilter (string)
The _entityFilter_ parameter is used to fine-tune the set of entities and/or individual values for the configured fields, based on the field value. Entities and field values will be synchronised to Elasticsearch if, and only if, they pass the filter. The entity filter is similar to a FILTER() inside a SPARQL query but not exactly the same. Each configured field can be referred to in the entity filter by prefixing it with a "?", much like referring to a variable in SPARQL. Several operators are supported:
|| Operator || Meaning || Example ||
| ?var in (_value1_, _value2_, ...) | Tests if the field _var_'s value is one of the specified values. Values that do not match will be treated as if they were not present in the repository. | ?status in ("active", "new") |
| ?var not in (_value1_, _value2_, ...) | The negated version of the in-operator. | ?status not in ("archived") |
| bound(?var) | Tests if the field _var_ has a valid value. This can be used to make the field compulsory. | bound(?name) |
| _expr1_ \|\| _expr2_ | Logical disjunction of expressions _expr1_ and _expr2_. | bound(?name) \|\| bound(?company) |
| _expr1_ && _expr2_ | Logical conjunction of expressions _expr1_ and _expr2_. | bound(?status) && ?status in ("active", "new") |
| !_expr_ | Logical negation of expression _expr_. | !bound(?company) |
| ( expr ) | Grouping of expressions | (bound(?name) \|\| bound(?company)) && bound(?address) |
In addition to the operators there are some constructions that can be used to write filters based not on the values but on values related to them:
h4. Accessing the previous element in chain
The construction *parent(?var)* can be used to go to a previous level in a property chain. It can be applied recursively as many times as needed, e.g. *parent(parent(parent(?var)))* will go back in the chain three times. The effective value of *parent(?var)* can be used with the *in* or *not in* operator like this: parent(?company) in (<urn:a>, <urn:b>).
h4. Accessing an element beyond the chain
The construction *?var -> _uri_* (alternatively *?var o _uri_* or just *?var _uri_*) can be used to access additional values that are accessible through the property _uri_. In essence this construction corresponds to the triple pattern ?value _uri_ ?effectiveValue, where ?value is a value bound by the field _var_. The effective value of *?var -> _uri_* can be used with the *in* or *not in* operator like this: ?company -> rdf:type in (<urn:c>, <urn:d>). It can be combined with *parent()* like this: parent(?company) -> rdf:type.
The URI parameter can be a full URI within < > or the special string _rdf:type_ (alternatively just _type_), which will be expanded to http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
h4. Filtering by RDF context
The construction *context(?var)* can be used to access the RDF context of a field's value. The typical use case is to sync only explicit values: context(?a) not in (<http://www.ontotext.com/implicit>). The construction can be combined with *parent()* like this: context(parent(?a)) in (<urn:a>).
h4. Entity filters and default values
Entity filters can be combined with default values in order to get more flexible behaviour.
A typical use-case for an entity filter is having soft deletes, i.e. instead of deleting an entity it is marked as deleted by the presence of a specific value for a given property.
h3. Basic entity filter example
For example, if we create a connector like this:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9200",
"types": ["http://www.ontotext.com/example#gadget"],
"fields": [
{
"fieldName": "name",
"propertyChain": ["http://www.ontotext.com/example#name"]
},
{
"fieldName": "city",
"propertyChain": ["http://www.ontotext.com/example#city"]
}
],
"entityFilter":"bound(?city) && ?city in (\\"London\\")"
}
''' .
}
{noformat}
and then insert some entities:
{noformat}
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://www.ontotext.com/example#> .
# the entity below will be synchronised because it has a matching value for city: ?city in ("London")
:alpha
rdf:type :gadget ;
:name "John Synced" ;
:city "London" .
# the entity below will not be synchronised because it is lacking the property completely: bound(?city)
:beta
rdf:type :gadget ;
:name "Peter Syncfree" .
# the entity below will not be synchronised - different city value:
# ?city in ("London") will remove the value "Liverpool" so bound(?city) will be false
:gamma
rdf:type :gadget ;
:name "Mary Syncless" ;
:city "Liverpool" .
{noformat}
We could create the connector with the following field definition to specify a default value for _city_:
{noformat}
...
{
"fieldName": "city",
"propertyChain": ["http://www.ontotext.com/example#city"],
"defaultValue": "London"
}
...
}
{noformat}
The default value will be used for :beta as it has no value for city in the repository. As the value is "London", the entity will be synchronised.
h3. Advanced entity filter example
Sometimes data represented in RDF is not well suited to map directly to non-RDF. For example, if we have news articles and they can be tagged with different concepts (locations, persons, events, etc.), one possible way to model that is a single property :taggedWith. Consider the following RDF data:
{noformat}
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://www.ontotext.com/example2#> .
:Berlin
rdf:type :Location ;
rdfs:label "Berlin" .
:Mozart
rdf:type :Person ;
rdfs:label "Wolfgang Amadeus Mozart" .
:Einstein
rdf:type :Person ;
rdfs:label "Albert Einstein" .
:Cannes-FF
rdf:type :Event ;
rdfs:label "Cannes Film Festival" .
:Article1
rdf:type :Article ;
rdfs:comment "An article about a film about Einstein's life while he was a professor in Berlin." ;
:taggedWith :Berlin ;
:taggedWith :Einstein ;
:taggedWith :Cannes-FF .
:Article2
rdf:type :Article ;
rdfs:comment "An article about Berlin." ;
:taggedWith :Berlin .
:Article3
rdf:type :Article ;
rdfs:comment "An article about Mozart's life." ;
:taggedWith :Mozart .
:Article4
rdf:type :Article ;
rdfs:comment "An article about classical music in Berlin." ;
:taggedWith :Berlin ;
:taggedWith :Mozart .
:Article5
rdf:type :Article ;
rdfs:comment "A boring article that has no tags." .
:Article6
rdf:type :Article ;
rdfs:comment "An article about the Cannes Film Festival in 2013." ;
:taggedWith :Cannes-FF .
{noformat}
Now, if we want to map this data to Elasticsearch such that the property *:taggedWith _x_* is mapped to separate fields *taggedWithPerson* and *taggedWithLocation* according to the type of _x_ (we are not interested in events), we can map :taggedWith twice to different fields and then use an entity filter to get the desired values:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9200",
"types": ["http://www.ontotext.com/example2#Article"],
"fields": [
{
"fieldName": "comment",
"propertyChain": ["http://www.w3.org/2000/01/rdf-schema#comment"]
},
{
"fieldName": "taggedWithPerson",
"propertyChain": ["http://www.ontotext.com/example2#taggedWith"]
},
{
"fieldName": "taggedWithLocation",
"propertyChain": ["http://www.ontotext.com/example2#taggedWith"]
}
],
"entityFilter": "?taggedWithPerson type in (<http://www.ontotext.com/example2#Person>) && ?taggedWithLocation type in (<http://www.ontotext.com/example2#Location>)"
}
''' .
}
{noformat}
Note: *type* is the short way to write <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.
The six articles in the RDF data above will be mapped as such:
|| Article URI || Entity mapped? || Value in taggedWithPerson || Value in taggedWithLocation || Explanation ||
| :Article1 | yes | :Einstein | :Berlin | :taggedWith has the values :Einstein, :Berlin and :Cannes-FF. The filter leaves only the correct values in the respective fields. The value :Cannes-FF is ignored as it does not match the filter. |
| :Article2 | yes | | :Berlin | :taggedWith has the value :Berlin. After the filter is applied, only taggedWithLocation is populated. |
| :Article3 | yes | :Mozart | | :taggedWith has the value :Mozart. After the filter is applied, only taggedWithPerson is populated |
| :Article4 | yes | :Mozart | :Berlin | :taggedWith has the values :Berlin and :Mozart. The filter leaves only the correct values in the respective fields. |
| :Article5 | yes | | | :taggedWith has no values. The filter is not relevant. |
| :Article6 | yes | | | :taggedWith has the value :Cannes-FF. The filter removes it as it does not match. |
This can be checked by issuing a faceted search for taggedWithLocation and taggedWithPerson:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?facetName ?facetValue ?facetCount {
?search a inst:my_index ;
:facetFields "taggedWithLocation,taggedWithPerson" ;
:facets _:f .
_:f :facetName ?facetName ;
:facetValue ?facetValue ;
:facetCount ?facetCount .
}
{noformat}
If the filter was applied you should get only :Berlin for taggedWithLocation and only :Einstein and :Mozart for taggedWithPerson:
|| ?facetName || ?facetValue || ?facetCount ||
| taggedWithLocation | http://www.ontotext.com/example2#Berlin | 3 |
| taggedWithPerson | http://www.ontotext.com/example2#Mozart | 2 |
| taggedWithPerson| http://www.ontotext.com/example2#Einstein | 1 |
h1. Overview of connector predicates
The following diagram shows a summary of all predicates that can administer (create, drop, check status) connector instances or issue queries and retrieve results. It can be used as a quick reference of what a particular predicate needs to be attached to. For example, to retrieve entities you need to use :entities on a search instance and to retrieve snippets you need to use :snippets on an entity. Variables that are bound as a result of a query are shown in green, blank helper nodes are shown in blue, literals in red, and URIs in orange. The predicates are represented by labelled arrows.
{plantuml}
left to right direction
skinparam activity {
BackgroundColor<<BNode>> #D1E0FF
BackgroundColor<<Var>> #D1FFD1
BackgroundColor<<URI>> #FFCC80
BackgroundColor #FFE3E3
}
partition "Instance level" {
"instance URI" <<URI>> -->[:createConnector] "JSON params"
"instance URI" -->[:dropConnector] "dummy value"
"instance URI" -->[:repairConnector] "dummy value"
"instance URI" -->[:connectorStatus] "?status" <<Var>>
"_:search" <<BNode>> -->[rdf:type] "instance URI"
}
partition "Search level: query and options" {
"_:search" -->[:query] "query value"
"_:search" -->[:limit] "limit value"
"_:search" -->[:offset] "offset value"
"_:search" -->[:orderBy] "order by expression"
"_:search" -->[:facetFields] "field name list"
"_:search" -->[:snippetSize] "snippet size value"
"_:search" -->[:snippetSpanOpen] "string"
"_:search" -->[:snippetSpanClose] "string"
}
partition "Search level: results"
"_:search" -->[:entities] "?entity" <<Var>>
"_:search" -->[:totalHits] "?totalHits" <<Var>>
"_:search" -->[:facets] "_:facet" <<BNode>>
"_:search" -->[:aggregations] "_:aggregation" <<BNode>>
}
partition "Entity level" {
"?entity" -->[:score] "?score" <<Var>>
"?entity" -->[:snippets] "_:snippet" <<BNode>>
}
partition "Snippet level" {
"_:snippet" -->[:snippetField] "?snippetField" <<Var>>
"_:snippet" -->[:snippetText] "?snippetText" <<Var>>
}
partition "Facet level" {
"_:facet" -->[:facetName] "?facetName" <<Var>>
"_:facet" -->[:facetValue] "?facetValue" <<Var>>
"_:facet" -->[:facetCount] "?facetCount" <<Var>>
}
partition "Aggregation level" {
"_:aggregation" -->[:name] "?aggrName" <<Var>>
"_:aggregation" -->[:key] "?aggrKey" <<Var>>
"_:aggregation" -->[:count] "?aggrCount" <<Var>>
"_:aggregation" -->[:from] "?aggrFrom" <<Var>>
"_:aggregation" -->[:to] "?aggrTo" <<Var>>
"_:aggregation" -->[:min] "?aggrMin" <<Var>>
"_:aggregation" -->[:max] "?aggrMax" <<Var>>
"_:aggregation" -->[:sum] "?aggrSum" <<Var>>
"_:aggregation" -->[:avg] "?aggrAvg" <<Var>>
"_:aggregation" -->[:sum_of_squares] "?aggrSumSq" <<Var>>
"_:aggregation" -->[:std_deviation] "?aggrStdDev" <<Var>>
"_:aggregation" -->[:variance] "?aggrVar" <<Var>>
"_:aggregation" -->[:parent] "?aggrParent" <<Var>>
"_:aggregation" -->[:level] "?aggrLevel" <<Var>>
"_:aggregation" -->[:levelName] "?aggrLevelName" <<Var>>
}
{plantuml}
h1. Caveats
h2. Order of control
Even though SPARQL per se is not sensitive to the order of triple patterns, the connectors expect to receive certain predicates before others so that queries can be executed properly. In particular, predicates that specify the query or query options need to come before any predicates that fetch results.
The diagram in [#Overview of connector predicates] provides a quick overview of the predicates.
h1. Migrating from a pre-6.2 version
GraphDB prior to 6.2 shipped with a version of the connectors that had different options and slightly different behaviour. Most existing connector instances will be automatically migrated to the new settings but in some cases it is not possible to continue using the same queries. It is recommended to review the connector configuration after the upgrade and, if necessary, recreate it with adjusted parameters.
h2. Changes in field configuration and synchronisation
Prior to 6.2, a single field in the config could produce up to three individual fields on the Elasticsearch side, based on the field options. For example, for the field "firstName":
|| field || note ||
| firstName | produced if option "index" was true; used explicitly in queries |
| _facet_firstName | produced if option "facet" was true; used implicitly for facet search |
| _sort_firstName | produced if option "sort" was true; used implicitly for ordering connector results |
The current version always produces a single Elasticsearch field per field definition in the configuration. This means you are responsible for creating all appropriate fields based on your needs. See more under [#Creation parameters].
h2. The option manageExternalIndex
Prior to 6.2 the option _manageExternalIndex_ could be used to control the management of both the schema and the index. In the current implementation there are separate options, _manageSchema_ and _manageIndex_. See [#Schema and index management] for more information.
WARNING: DO NOT EDIT THIS ARTICLE. IT HAS BEEN AUTOMATICALLY GENERATED FROM A TEMPLATE.
{htmlcomment}
{toc:maxLevel=2}
h1. Overview
The GraphDB Connectors provide extremely fast normal and facet (aggregation) searches that are typically implemented by an external component or service such as Elasticsearch, but have the additional benefit to stay automatically up-to-date with the GraphDB repository data.
The Connectors provide synchronisation at the _entity_ level, where an entity is defined as having a unique identifier (a URI) and a set of properties and property values. In terms of RDF, this corresponds to a set of triples that have the same subject. In addition to simple properties (defined by a single triple), the Connectors support _property chains_. A property chain is defined as a sequence of triples where each triple's object is the subject of the following triple.
h1. Features
The main features of the GraphDB Connectors are:
* maintaining an index that is always in sync with the data stored in GraphDB;
* multiple independent instances per repository;
* the entities for synchronisation are defined by:
** a list of fields (on the Elasticsearch side) and property chains (on the GraphDB side) whose values will be synchronised;
** a list of rdf:type's of the entities for synchronisation;
** a list of languages for synchronisation (the default is all languages);
** additional filtering by property and value.
* full-text search using native Elasticsearch queries;
* snippet extraction: highlighting of search terms in the search result;
* faceted search;
* sorting by any preconfigured field;
* paging of results using _offset_ and _limit_;
* custom mapping of RDF types to Elasticsearch types;
Each feature is described in detail below.
h1. Sample data
All examples use the following sample data, which describes five fictitious wines: Yoyowine, Franvino, Noirette, Blanquito and Rozova as well as the grape varieties required to make these wines. The minimum required ruleset level in GraphDB is RDFS.
{noformat}
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://www.ontotext.com/example/wine#> .
:RedWine rdfs:subClassOf :Wine .
:WhiteWine rdfs:subClassOf :Wine .
:RoseWine rdfs:subClassOf :Wine .
:Merlo
rdf:type :Grape ;
rdfs:label "Merlo" .
:CabernetSauvignon
rdf:type :Grape ;
rdfs:label "Cabernet Sauvignon" .
:CabernetFranc
rdf:type :Grape ;
rdfs:label "Cabernet Franc" .
:PinotNoir
rdf:type :Grape ;
rdfs:label "Pinot Noir" .
:Chardonnay
rdf:type :Grape ;
rdfs:label "Chardonnay" .
:Yoyowine
rdf:type :RedWine ;
:madeFromGrape :CabernetSauvignon ;
:hasSugar "dry" ;
:hasYear "2013"^^xsd:integer .
:Franvino
rdf:type :RedWine ;
:madeFromGrape :Merlo ;
:madeFromGrape :CabernetFranc ;
:hasSugar "dry" ;
:hasYear "2012"^^xsd:integer .
:Noirette
rdf:type :RedWine ;
:madeFromGrape :PinotNoir ;
:hasSugar "medium" ;
:hasYear "2012"^^xsd:integer .
:Blanquito
rdf:type :WhiteWine ;
:madeFromGrape :Chardonnay ;
:hasSugar "dry" ;
:hasYear "2012"^^xsd:integer .
:Rozova
rdf:type :RoseWine ;
:madeFromGrape :PinotNoir ;
:hasSugar "medium" ;
:hasYear "2013"^^xsd:integer .
{noformat}
h1. Usage
All interactions with the Elasticsearch GraphDB Connector shall be done through SPARQL queries.
There are three types of SPARQL queries:
* INSERT for creating and deleting connectors;
* SELECT for listing connectors and querying connector configuration parameters;
* INSERT/SELECT for storing and querying data as part of the normal GraphDB data workflow.
In general this corresponds to _INSERT adds or modifies data_ and _SELECT queries existing data_.
Each connector implementation defines its own URI prefix to distinguish it from other connectors. For the Elasticsearch GraphDB Connector, this is *http://www.ontotext.com/connectors/elasticsearch#*. Each command or predicate executed by the connector uses this prefix, e.g. <http://www.ontotext.com/connectors/elasticsearch##createConnector> to create a connector for Elasticsearch.
Individual instances of a connector are distinguished by unique names that are also URIs. They have their own prefix to avoid clashing with any of the command predicates. For Elasticsearch, the instance prefix is http://www.ontotext.com/connectors/elasticsearch/instance#.
h2. Using a connector with a GraphDB cluster
This release introduces support for Elasticsearch connectors in a GraphDB cluster. The connectors require a transactional entity pool, which is off by default. Please, refer to [GraphDB Entity Pool] to enable the transactional entity pool.
h2. Creating a connector
Creating a connector is done by sending a SPARQL query with the following configuration data:
* the name of the connector (e.g. my_index);
* classes to synchronise;
* properties to synchronise.
The configuration data has to be provided as a JSON string representation and passed together with the create command.
{tip:title=What we recommend}
Use the GraphDB Connectors management interface provided by the GraphDB Workbench as it will let you create the configuration easily and then create the connector directly or copy the configuration and execute it elsewhere.
{tip}
The create command is triggered by a SPARQL *INSERT* with the *createConnector* predicate, e.g. this will create a connector called *my_index* that will synchronise the wines from the sample data above:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9300",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "grape",
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
]
},
{
"fieldName": "sugar",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasSugar"
],
},
{
"fieldName": "year",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasYear"
]
}
]
}
''' .
}
{noformat}
The above command will create a new Elasticsearch connector that will connect to the Elasticsearch instance accessible at port 9300 on the localhost as specified by the "elasticsearchUrl" key.
The "types" key defines the RDF type of the entities to synchronise and in the example it is only entities of the type <http://www.ontotext.com/example/wine#Wine> (and its subtypes). The "fields" key defines the mapping from RDF to Elasticsearch. The basic building block is the property chain, i.e. a sequence of RDF properties where the object of each property is the subject of the following property. In the example we map three bits of information - the wine's grape, sugar content, and year. Each chain is assigned a short and convenient field name: "grape", "sugar", and "year". The field names are later used in the queries.
Grape is an example of a property chain composed of more than one property. First we take the wine's madeFromGrape property, the object of which is an instance of type Grape, and then we take the rdfs:label of this instance. Sugar and year are both composed of a single property that links the value directly to the wine.
h4. Schema and core management
By default GraphDB will manage (create, delete or update if needed) the Solr core and the Solr schema. This makes it easier to use Solr as everything will be done automatically. This behaviour can be changed by the following options:
* _manageIndex_: if true, GraphDB will manage the index. True by default.
* _manageSchema_: if true, GraphDB will manage the schema. True by default.
Note that if either of the options is set to false you will be responsible for creating, updating or removing the core/schema and the connector will not function correctly if you misconfigured Elasticsearch.
h5. Using a non-managed schema
The present version of the connectors provides no support for changing some advanced options, such as stopwords, on a per field basis. The recommended way to do that for now is to manage the mapping yourself and tell the connector to just sync the object values in the appropriate fields. Here is an example:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9200",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "grape",
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
]
},
{
"fieldName": "sugar",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasSugar"
]
},
{
"fieldName": "year",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasYear"
]
}
],
"manageSchema": "false"
}
''' .
}
{noformat}
this will create the same connector as above but it expects fields with the specified fieldnames to be already present in the index mapping, as well as some internal GraphDB fields. For the example you must have the following fields:
|| field name || Elasticsearch config ||
| _graphdb_id | "type":"long", "index":"not_analyzed", "store":"yes" |
| _chains | "type":"long", "index":"not_analyzed", "store":"no" |
| grape | "type":"string", "index":"analyzed", "store":"yes" |
| sugar | "type":"string", "index":"analyzed", "store":"yes" |
| year | "type":"integer", "index":"analyzed", "store":"yes" |
_graphdb_id and _chains are used internally by GraphDB and are always required.
h2. Dropping a connector
Dropping a connector removes all references to its external store from GraphDB as well as the Sorl core associated with it. Dropping a connector is achieved through a SPARQL INSERT query with the following parameter:
* Name of the connector
The drop command is triggered by a SPARQL *INSERT* with the *dropConnector* predicate, e.g. this will remove the connector *:my_index*:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :dropConnector "" .
}
{noformat}
h2. Listing available connectors
Listing connectors returns all previously created connectors. It is a *SELECT* query with the *listConnectors* predicate:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT ?cntUri ?cntStr {
?cntUri :listConnectors ?cntStr .
}
{noformat}
*?cntUri* will be bound to the prefixed URI of the connector that was used during creation, e.g. <http://www.ontotext.com/connectors/elasticsearch/instance#my_index>, while *?cntStr* will be bound to a string, representing the part after the prefix, e.g. "my_index".
h2. Status check
The internal state of each connector can be queried using a *SELECT* query and the *connectorStatus* predicate:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
SELECT ?cntUri ?cntStatus {
?cntUri :connectorStatus ?cntStatus .
}
{noformat}
*?cntUri* will be bound to the connector prefixed URI, while *?cntStatus* will be bound to a string representation of the status of the connector represented by this URI. The status is key-value based.
h2. Adding, updating and deleting data
From the user's point of view all synchronisation will happen transparently without using any additional predicates or naming a specific store explicitly, i.e. the user should simply execute standard SPARQL INSERT/DELETE queries. This is achieved by intercepting all changes in the plugin and determining which abstract documents need to be updated.
h2. Querying data
Once a connector has been created it will be possible to query data from it through SPARQL. For each matching abstract document, the connector returns the document's subject. In its simplest form querying is achieved by using a *SELECT* and providing the Elasticsearch query as the object of the *:query* predicate:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
}
{noformat}
The result will bind ?entity to the two wines made from grapes that have "cabernet" in their name, namely :Yoyowine and :Franvino.
Note that you must use the field names you chose when you created the connector. It is perfectly valid to have field names identical to the property URIs but then you responsible for escaping any special characters according to what Elasticsearch expects.
First we get an instance of the requested connector by using the RDF notation "X a Y" (= X rdf:type Y), where X is a variable and Y is a connector. X will be bound to an instance of this connector. Then we assign a query to that instance by using the system predicate *:query*. Finally we request the matching entities through the *:entities* predicate.
It is also possible to provide per-query search options by using one or more option predicates. The option predicates are described in detail below.
h3. Raw queries
If you want to access an Elasticsearch query parameter that is not exposed through a special predicate, you can do it with a raw query. Instead of providing a full-text query in the :query part, you specify raw Elasticsearch parameters. For example, if you want to boost some parts of your full-text query as described [here|http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html], you can use the following query:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query '''
{
"query" : {
"bool" : {
"should" : [ {
"query_string" : {
"query" : "<full-text-query-not-boosted>"
}
}, {
"query_string" : {
"query" : "<full-text-query-boosted>",
"boost" : 4.0
}
} ]
}
}
}
''' ;
:entities ?entity .
}
{noformat}
h3. Combining Elasticsearch results with GraphDB data
The bound ?entity can be used in other SPARQL triples in order to build complex queries that fetch additional data from GraphDB. For example, to see the actual grapes in the matching wines as well as the year they were made:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
SELECT ?entity ?grape ?year {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
?entity wine:madeFromGrape ?grape .
?entity wine:hasYear ?year
}
{noformat}
The result will look like this:
|| ?entity || ?grape || ?year ||
| :Yoyowine | :CabernetSauvignon | 2013 |
| :Franvino | :Merlo | 2012 |
| :Franvino | :CabernetFranc | 2012 |
Note that :Franvino is returned twice because it is made from two different grapes, both of which are returned.
h3. Entity match score
It is possible to access the match score returned by Elasticsearch with the *:score* predicate. As each entity has its own score, the predicate must come at the entity level. For example:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity ?score {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
?entity :score ?score
}
{noformat}
The result will look like this but the actual score might be different as it depends on the specific Elasticsearch version:
|| ?entity || ?score ||
| :Yoyowine | 0.9442660212516785 |
| :Franvino | 0.7554128170013428 |
h2. Basic faceting
Consider the sample wine data and the my_index connector described previously. We can use the same connector to query facets too:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?facetName ?facetValue ?facetCount WHERE {
# note empty query is allowed and will just match all documents, hence no :query
?r a inst:my_index ;
:facetFields "year,sugar" ;
:facets _:f .
_:f :facetName ?facetName .
_:f :facetValue ?facetValue .
_:f :facetCount ?facetCount .
}
{noformat}
It is important to specify the fields we want to facet on by using the *facetFields* predicate. Its value must be a simple comma-delimited list of field names. In order to get the faceted results, we have to use the *facets* predicate. As each facet has three components (name, value and count), the *facets* predicate binds a blank node, which in turn can be used to access the individual values for each component through the predicates *facetName*, *facetValue* and *facetCount*.
The resulting bindings will look like in the table below:
|| facetName || facetValue || facetCount ||
| year | 2012 | 3 |
| year | 2013 | 2 |
| sugar | dry | 3 |
| sugar | medium | 2 |
We can easily see that there are three wines produced in 2012 and two in 2013. We also see that three of the wines are dry, while two are medium. However, it is not necessarily true that the three wines produced in 2012 are the same as the three dry wines as each facet is computed independently.
h2. Advanced faceting and aggregations
While basic faceting allows for simple counting of documents based on the discrete values of a particular field, there are more complex faceted or aggregation searches in Elasticsearch. The connector provides a mapping from Elasticsearch results to RDF results but no mechanism for specifying the queries other than executing a [raw query|#Raw queries].
h3. Supported Elasticsearch facets and aggregations
The Elasticsearch connector supports mapping of range, interval and pivot facets. Please refer to the Elasticsearch documentation for more information.
h3. RDF mapping of the results
The results are accessed through the predicate :aggregations (much like the basic facets are accessed through :facets). The predicate will bind multiple blank nodes that each contain a single aggregation bucket. The individual bucket items can be accessed through these predicates:
|| predicate || meaning || Elasticsearch counterpart ||
| :name | Bucket name | getName() |
| :key | Key or value associated with the bucket | getValue() or getKey() |
| :count | Count of documents in the bucket | getDocCount(), getValue() |
| :from | Start of range | getFrom(), getFromAsDate() |
| :to | End of range (RangeFacet) | getTo(), getToAsDate() |
| :min | Minimum value | getMin(), getValue() |
| :max | Maximum value | getMax(), getValue() |
| :sum | Sum value | getSum(), getValue() |
| :avg | Average value | getAvg(), getValue() |
| :sum_of_squares | Sum of squares value | getSumOfSquares() |
| :variance | Variance value | getVariance() |
| :std_deviation | Standard deviation value | getStdDeviation() |
| :parent | Sub-aggregations: points to the parent (upper level) blank node | |
| :level | Sub-aggregations: level number where 1 is the uppermost level and the following levels are 2, 3 and so on | |
| :levelName | Sub-aggregations: level name | getKey() or getValue() |
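For illustration, here is a hedged sketch of how a raw aggregation request could be combined with the result predicates above. It assumes a terms aggregation over the year field and that the connector accepts a raw Elasticsearch request body in *:query*; the exact body may differ between Elasticsearch versions:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?name ?key ?count {
?search a inst:my_index ;
:query '''
{
"aggs" : {
"years" : {
"terms" : { "field" : "year" }
}
}
}
''' ;
:aggregations _:a .
_:a :name ?name ;
:key ?key ;
:count ?count .
}
{noformat}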
{anchor:sorting}
h2. Sorting
It is possible to sort the entities returned by a connector query according to one or more fields. In order to be able to use a certain field for sorting, you have to specify this at the time of creating the connector instance. Sorting is achieved by the *orderBy* predicate, whose value must be a comma-delimited list of fields. Each field may be prefixed with a minus to indicate sorting in descending order. For example:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query "year:2013" ;
:orderBy "-sugar" ;
:entities ?entity .
}
{noformat}
The result will contain wines produced in 2013 sorted according to their sugar content in descending order:
|| entity ||
| Rozova |
| Yoyowine |
By default, entities are sorted according to their matching score in descending order.
Note that if you join the entity from the connector query to other triples stored in GraphDB, GraphDB might scramble the order. To remedy this, use ORDER BY from SPARQL.
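For example, a sketch that joins the connector results to the repository data and then imposes a deterministic order with SPARQL:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX wine: <http://www.ontotext.com/example/wine#>
SELECT ?entity ?year {
?search a inst:my_index ;
:query "sugar:dry" ;
:entities ?entity .
?entity wine:hasYear ?year
}
ORDER BY DESC(?year)
{noformat}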
h2. Limit and offset
Limit and offset are supported on the Elasticsearch side of the query. This is achieved through the predicates *limit* and *offset*. Consider this example in which we specify an offset of 1 and a limit of 1:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity {
?search a inst:my_index ;
:query "sugar:dry" ;
:offset "1" ;
:limit "1" ;
:entities ?entity .
}
{noformat}
The result will contain a single wine, Franvino, as it would be second in the list if we executed the query without the limit and offset:
|| entity ||
| Yoyowine |
| *Franvino* |
| Blanquito |
Note that the specific order in which GraphDB returns the results depends on how Elasticsearch returns the matches, unless sorting is specified.
h2. Snippet extraction
Snippet extraction is used to extract highlighted snippets of text that match the query. The snippets are accessed through the dedicated predicate *:snippets*, which binds a blank node that in turn provides the actual snippets via the predicates *:snippetField* and *:snippetText*. The predicate :snippets must be attached to the entity, as each entity has a different set of snippets. For example, in a search for Cabernet:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity ?snippetField ?snippetText {
?search a inst:my_index ;
:query "grape:cabernet" ;
:entities ?entity .
?entity :snippets _:s .
_:s :snippetField ?snippetField ;
:snippetText ?snippetText .
}
{noformat}
The query will return the two wines made from Cabernet Sauvignon or Cabernet Franc grapes as well as the respective matching fields and snippets:
|| ?entity || ?snippetField || ?snippetText ||
| :Yoyowine | grape | <em>Cabernet</em> Sauvignon |
| :Franvino | grape | <em>Cabernet</em> Franc |
Note that the actual snippets might be somewhat different as this depends on the specific Elasticsearch implementation.
It is possible to tweak how the snippets are collected/composed by using the following option predicates:
* *:snippetSize* sets the maximum size of the extracted text fragment, 250 by default.
* *:snippetSpanOpen* text to insert before the highlighted text, <em> by default.
* *:snippetSpanClose* text to insert after the highlighted text, </em> by default.
The option predicates are set on the connector instance, much like the :query predicate.
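For example, a sketch of the Cabernet search above with shorter snippets and <b> tags instead of the default <em>:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?entity ?snippetText {
?search a inst:my_index ;
:query "grape:cabernet" ;
:snippetSize "100" ;
:snippetSpanOpen "<b>" ;
:snippetSpanClose "</b>" ;
:entities ?entity .
?entity :snippets _:s .
_:s :snippetText ?snippetText .
}
{noformat}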
h2. Total hits
You can get the total number of hits by using the *:totalHits* predicate, e.g. for the connector :my_index and a query that would retrieve all wines made in 2012:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?totalHits {
?r a inst:my_index ;
:query "year:2012" ;
:totalHits ?totalHits .
}
{noformat}
As there are three wines made in 2012, the value 3 (of type xsd:long) will be bound to ?totalHits.
h1. Creation parameters
The creation parameters define how a connector instance is created by the :createConnector predicate. There are some required parameters and some that are optional. All parameters are provided together in a JSON object, where the parameter names are the object keys. Parameter values may be simple JSON values such as a string or a boolean, or they can be lists or objects.
All of the creation parameters can also be set conveniently from the Create Connector user interface in the GraphDB Workbench without any knowledge of JSON.
h3. Elasticsearch instance to sync to: elasticsearchNode (string)
Since Elasticsearch is a third-party service, you have to specify the node where it is running. The node value has the form *hostname.domain:port*. There is no default value.
h3. Types of entities to sync: types (list of URI)
The RDF types of entities to sync are specified as a list of URIs. At least one type URI must be provided.
h3. What exactly to sync: fields (list of field object)
The fields define exactly what parts of each entity will be synchronised as well as the specific details on the connector side. The field is the smallest synchronisation unit and it maps a property chain from GraphDB to a field in Elasticsearch. The fields are specified as a list of field objects. At least one field object must be provided. Each field object has further keys that specify details.
h4. Name of the field: fieldName (string)
The name of the field defines the mapping on the connector side. It is specified by the key fieldName with a string value. The field name is used at query time to refer to the field. There are few restrictions on the allowed characters in a field name but to avoid unnecessary escaping (which depends on how Elasticsearch parses its queries), we recommend keeping field names simple.
h4. Property chain to map: propertyChain (list of URI)
The property chain (propertyChain) defines the mapping on the GraphDB side. A property chain is defined as a sequence of triples where the entity URI is the subject of the first triple, its object is the subject of the next triple and so on. In this model, a property chain with a single element corresponds to a direct property defined by a single triple. Property chains are specified as a list of URIs and at least one URI must be provided. If you need to store the entity URI in the connector, you may map it by defining a property chain with a single special URI: $self. Only one field per connector may use the $self notation.
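For example, a hedged field definition (the field name entityId is an arbitrary choice made here for illustration) that stores the entity URI itself:
{noformat}
{
"fieldName": "entityId",
"propertyChain": ["$self"]
}
{noformat}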
h4. The default value: defaultValue (string)
The default value (defaultValue) provides a means of specifying a default value for the field when the property chain has no matching values in GraphDB. The default value can be a plain literal, a literal with a datatype (xsd: prefix supported), a literal with language or a URI. The parameter has no default value.
h4. Indexing the field: indexed (boolean)
Fields are indexed by default but that can be changed by using the Boolean option "indexed". True by default.
If true this option corresponds to "index" = "analyzed" or "not_analyzed". If false it corresponds to "index" = "no".
h4. Storing the field: stored (boolean)
Fields are stored in Elasticsearch by default but that can be changed by using the Boolean option "stored". Stored fields are required for retrieving snippets. True by default.
This option corresponds to the property "store" in the Elasticsearch mapping.
h4. Skipping the analyser: analyzed (boolean)
When literal fields are indexed in Elasticsearch, they will be analysed according to the analyser settings. Should you require that a given field is not analysed, you may set the Boolean option "analyzed" to false. This option has no effect for URIs (they are never analysed). True by default.
If true this option corresponds to "index" = "analyzed" in the Elasticsearch schema. If false it corresponds to "index" = "not_analyzed".
h4. Multivalued fields: multivalued (boolean)
RDF properties and synchronised fields may have more than one value. If "multivalued" is set to true, all values will be synchronised to Elasticsearch. If set to false, only a single value will be synchronised. True by default.
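As a sketch, here is a single field definition combining the options above; all values except "analyzed" are the defaults, and "analyzed": false keeps the sugar values as exact, non-analysed strings:
{noformat}
{
"fieldName": "sugar",
"propertyChain": ["http://www.ontotext.com/example/wine#hasSugar"],
"indexed": true,
"stored": true,
"analyzed": false,
"multivalued": false
}
{noformat}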
h3. Automatic datatype mapping
The connector will map different types of RDF values to different types of Elasticsearch values according to the basic type of the RDF value (URI or literal) and the datatype of literals. The autodetection will use the following mapping:
|| RDF value || RDF datatype || Elasticsearch type ||
| URI | n/a | string, indexed = not_analyzed |
| literal | none | string |
| literal | xsd:boolean | boolean |
| literal | xsd:double | double |
| literal | xsd:float | float |
| literal | xsd:long | long |
| literal | xsd:int | integer |
| literal | xsd:dateTime | date, format = date_optional_time |
| literal | xsd:date | date, format = date_optional_time |
Note that for any given field the automatic mapping will use the first value it sees. This will work fine for clean datasets but might lead to problems if your dataset has non-normalised data, e.g. the first value has no datatype but other values have one.
h4. Manual datatype mapping: datatype (string)
The mapping can be overridden through the property "datatype", which can be specified per field. The value of "datatype" may be any of the xsd: types supported by the automatic mapping or a native Elasticsearch type prefixed by native:, e.g. both xsd:long and native:long will map to the long type in Elasticsearch.
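For example, a hedged field definition that forces the year values into the Elasticsearch long type regardless of how the first encountered literal is typed; native:long would work equally well here:
{noformat}
{
"fieldName": "year",
"propertyChain": ["http://www.ontotext.com/example/wine#hasYear"],
"datatype": "xsd:long"
}
{noformat}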
h3. Literals in what language: languages (list of string)
RDF data is often multilingual but you may want to map only some of the languages represented in the literal values. This can be done by specifying a list of language ranges that will be matched to the language tags of literals according to RFC 4647, Section 3.3.1 (Basic Filtering). In addition, an empty range can be used to include literals that have no language tag. The list of language ranges will map all existing literals that have matching language tags.
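For example, a snippet (to be placed among the other creation parameters) that would synchronise only English literals and literals without a language tag; the ranges shown are an illustrative assumption:
{noformat}
...
"languages": ["en", ""],
...
{noformat}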
h3. Elasticsearch index extra settings: indexCreateSettings (string)
This option will be passed directly to Elasticsearch when creating the index. It can be in JSON, YAML or properties format.
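For example, a hedged snippet that passes a standard Elasticsearch index setting in the properties format; adjust the value to your deployment:
{noformat}
...
"indexCreateSettings": "index.number_of_shards: 2",
...
{noformat}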
h1. Advanced filtering and fine tuning
h3. entityFilter (string)
The _entityFilter_ parameter is used to fine-tune the set of entities and/or individual values for the configured fields, based on the field value. Entities and field values will be synchronised to Elasticsearch if, and only if, they pass the filter. The entity filter is similar to a FILTER() inside a SPARQL query but not exactly the same. Each configured field can be referred to in the entity filter by prefixing it with a "?", much like referring to a variable in SPARQL. Several operators are supported:
|| Operator || Meaning || Example ||
| ?var in (_value1_, _value2_, ...) | Tests if the field _var_'s value is one of the specified values. Values that do not match will be treated as if they were not present in the repository. | ?status in ("active", "new") |
| ?var not in (_value1_, _value2_, ...) | The negated version of the in-operator. | ?status not in ("archived") |
| bound(?var) | Tests if the field _var_ has a valid value. This can be used to make the field compulsory. | bound(?name) |
| _expr1_ \|\| _expr2_ | Logical disjunction of expressions _expr1_ and _expr2_. | bound(?name) \|\| bound(?company) |
| _expr1_ && _expr2_ | Logical conjunction of expressions _expr1_ and _expr2_. | bound(?status) && ?status in ("active", "new") |
| !_expr_ | Logical negation of expression _expr_. | !bound(?company) |
| ( expr ) | Grouping of expressions | (bound(?name) \|\| bound(?company)) && bound(?address) |
In addition to the operators there are some constructions that can be used to write filters based not on the values but on values related to them:
h4. Accessing the previous element in chain
The construction *parent(?var)* can be used to go to a previous level in a property chain. It can be applied recursively as many times as needed, e.g. *parent(parent(parent(?var)))* will go back in the chain three times. The effective value of *parent(?var)* can be used with the *in* or *not in* operator like this: parent(?company) in (<urn:a>, <urn:b>).
h4. Accessing an element beyond the chain
The construction *?var -> _uri_* (alternatively *?var o _uri_* or just *?var _uri_*) can be used to access additional values that are accessible through the property _uri_. In essence this construction corresponds to the triple pattern ?value _uri_ ?effectiveValue, where ?value is a value bound by the field _var_. The effective value of *?var -> _uri_* can be used with the *in* or *not in* operator like this: ?company -> rdf:type in (<urn:c>, <urn:d>). It can be combined with *parent()* like this: parent(?company) -> rdf:type.
The URI parameter can be a full URI within < > or the special string _rdf:type_ (alternatively just _type_), which will be expanded to http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
h4. Filtering by RDF context
The construction *context(?var)* can be used to access the RDF context of a field's value. The typical use case is to sync only explicit values: context(?a) not in (<http://www.ontotext.com/implicit>). The construction can be combined with *parent()* like this: context(parent(?a)) in (<urn:a>).
h4. Entity filters and default values
Entity filters can be combined with default values in order to get more flexible behaviour.
A typical use-case for an entity filter is having soft deletes, i.e. instead of deleting an entity it is marked as deleted by the presence of a specific value for a given property.
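A minimal sketch of such a soft-delete filter, assuming a hypothetical property http://www.ontotext.com/example#deleted that marks deleted entities; only entities without a value for that property will be synchronised:
{noformat}
...
"fields": [
...
{
"fieldName": "deleted",
"propertyChain": ["http://www.ontotext.com/example#deleted"]
}
],
"entityFilter": "!bound(?deleted)"
...
{noformat}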
h3. Basic entity filter example
For example, if we create a connector like this:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9200",
"types": ["http://www.ontotext.com/example#gadget"],
"fields": [
{
"fieldName": "name",
"propertyChain": ["http://www.ontotext.com/example#name"]
},
{
"fieldName": "city",
"propertyChain": ["http://www.ontotext.com/example#city"]
}
],
"entityFilter":"bound(?city) && ?city in (\\"London\\")"
}
''' .
}
{noformat}
and then insert some entities:
{noformat}
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://www.ontotext.com/example#> .
# the entity below will be synchronised because it has a matching value for city: ?city in ("London")
:alpha
rdf:type :gadget ;
:name "John Synced" ;
:city "London" .
# the entity below will not be synchronised because it is lacking the property completely: bound(?city)
:beta
rdf:type :gadget ;
:name "Peter Syncfree" .
# the entity below will not be synchronised - different city value:
# ?city in ("London") will remove the value "Liverpool" so bound(?city) will be false
:gamma
rdf:type :gadget ;
:name "Mary Syncless" ;
:city "Liverpool" .
{noformat}
We could create the following index to specify a default value for _city_:
{noformat}
...
{
"fieldName": "city",
"propertyChain": ["http://www.ontotext.com/example#city"],
"defaultValue": "London"
}
...
}
{noformat}
The default value will be used for :beta as it has no value for city in the repository. As the value is "London", the entity will be synchronised.
h3. Advanced entity filter example
Sometimes data represented in RDF is not well suited to map directly to non-RDF. For example, if we have news articles and they can be tagged with different concepts (locations, persons, events, etc.), one possible way to model that is a single property :taggedWith. Consider the following RDF data:
{noformat}
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://www.ontotext.com/example2#> .
:Berlin
rdf:type :Location ;
rdfs:label "Berlin" .
:Mozart
rdf:type :Person ;
rdfs:label "Wolfgang Amadeus Mozart" .
:Einstein
rdf:type :Person ;
rdfs:label "Albert Einstein" .
:Cannes-FF
rdf:type :Event ;
rdfs:label "Cannes Film Festival" .
:Article1
rdf:type :Article ;
rdfs:comment "An article about a film about Einstein's life while he was a professor in Berlin." ;
:taggedWith :Berlin ;
:taggedWith :Einstein ;
:taggedWith :Cannes-FF .
:Article2
rdf:type :Article ;
rdfs:comment "An article about Berlin." ;
:taggedWith :Berlin .
:Article3
rdf:type :Article ;
rdfs:comment "An article about Mozart's life." ;
:taggedWith :Mozart .
:Article4
rdf:type :Article ;
rdfs:comment "An article about classical music in Berlin." ;
:taggedWith :Berlin ;
:taggedWith :Mozart .
:Article5
rdf:type :Article ;
rdfs:comment "A boring article that has no tags." .
:Article6
rdf:type :Article ;
rdfs:comment "An article about the Cannes Film Festival in 2013." ;
:taggedWith :Cannes-FF .
{noformat}
Now, if we want to map this data to Elasticsearch such that the property *:taggedWith _x_* is mapped to separate fields *taggedWithPerson* and *taggedWithLocation* according to the type of _x_ (we are not interested in events), we can map :taggedWith twice to different fields and then use an entity filter to get the desired values:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
inst:my_index :createConnector '''
{
"elasticsearchNode": "localhost:9200",
"types": ["http://www.ontotext.com/example2#Article"],
"fields": [
{
"fieldName": "comment",
"propertyChain": ["http://www.w3.org/2000/01/rdf-schema#comment"]
},
{
"fieldName": "taggedWithPerson",
"propertyChain": ["http://www.ontotext.com/example2#taggedWith"]
},
{
"fieldName": "taggedWithLocation",
"propertyChain": ["http://www.ontotext.com/example2#taggedWith"]
}
],
"entityFilter": "?taggedWithPerson type in (<http://www.ontotext.com/example2#Person>) && ?taggedWithLocation type in (<http://www.ontotext.com/example2#Location>)"
}
''' .
}
{noformat}
Note: *type* is the short way to write <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.
The six articles in the RDF data above will be mapped as such:
|| Article URI || Entity mapped? || Value in taggedWithPerson || Value in taggedWithLocation || Explanation ||
| :Article1 | yes | :Einstein | :Berlin | :taggedWith has the values :Einstein, :Berlin and :Cannes-FF. The filter leaves only the correct values in the respective fields. The value :Cannes-FF is ignored as it does not match the filter. |
| :Article2 | yes | | :Berlin | :taggedWith has the value :Berlin. After the filter is applied, only taggedWithLocation is populated. |
| :Article3 | yes | :Mozart | | :taggedWith has the value :Mozart. After the filter is applied, only taggedWithPerson is populated. |
| :Article4 | yes | :Mozart | :Berlin | :taggedWith has the values :Berlin and :Mozart. The filter leaves only the correct values in the respective fields. |
| :Article5 | yes | | | :taggedWith has no values. The filter is not relevant. |
| :Article6 | yes | | | :taggedWith has the value :Cannes-FF. The filter removes it as it does not match. |
This can be checked by issuing a faceted search for taggedWithLocation and taggedWithPerson:
{noformat}
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
SELECT ?facetName ?facetValue ?facetCount {
?search a inst:my_index ;
:facetFields "taggedWithLocation,taggedWithPerson" ;
:facets _:f .
_:f :facetName ?facetName ;
:facetValue ?facetValue ;
:facetCount ?facetCount .
}
{noformat}
If the filter was applied you should get only :Berlin for taggedWithLocation and only :Einstein and :Mozart for taggedWithPerson:
|| ?facetName || ?facetValue || ?facetCount ||
| taggedWithLocation | http://www.ontotext.com/example2#Berlin | 3 |
| taggedWithPerson | http://www.ontotext.com/example2#Mozart | 2 |
| taggedWithPerson | http://www.ontotext.com/example2#Einstein | 1 |
h1. Overview of connector predicates
The following diagram shows a summary of all predicates that can administer (create, drop, check status) connector instances or issue queries and retrieve results. It can be used as a quick reference for what a particular predicate needs to be attached to. For example, to retrieve entities you need to use :entities on a search instance and to retrieve snippets you need to use :snippets on an entity. Variables that are bound as a result of a query are shown in green, blank helper nodes are shown in blue, literals in red, and URIs in orange. The predicates are represented by labelled arrows.
{plantuml}
left to right direction
skinparam activity {
BackgroundColor<<BNode>> #D1E0FF
BackgroundColor<<Var>> #D1FFD1
BackgroundColor<<URI>> #FFCC80
BackgroundColor #FFE3E3
}
partition "Instance level" {
"instance URI" <<URI>> -->[:createConnector] "JSON params"
"instance URI" -->[:dropConnector] "dummy value"
"instance URI" -->[:repairConnector] "dummy value"
"instance URI" -->[:connectorStatus] "?status" <<Var>>
"_:search" <<BNode>> -->[rdf:type] "instance URI"
}
partition "Search level: query and options" {
"_:search" -->[:query] "query value"
"_:search" -->[:limit] "limit value"
"_:search" -->[:offset] "offset value"
"_:search" -->[:orderBy] "order by expression"
"_:search" -->[:facetFields] "field name list"
"_:search" -->[:snippetSize] "snippet size value"
"_:search" -->[:snippetSpanOpen] "string"
"_:search" -->[:snippetSpanClose] "string"
}
partition "Search level: results"
"_:search" -->[:entities] "?entity" <<Var>>
"_:search" -->[:totalHits] "?totalHits" <<Var>>
"_:search" -->[:facets] "_:facet" <<BNode>>
"_:search" -->[:aggregations] "_:aggregation" <<BNode>>
}
partition "Entity level" {
"?entity" -->[:score] "?score" <<Var>>
"?entity" -->[:snippets] "_:snippet" <<BNode>>
}
partition "Snippet level" {
"_:snippet" -->[:snippetField] "?snippetField" <<Var>>
"_:snippet" -->[:snippetText] "?snippetText" <<Var>>
}
partition "Facet level" {
"_:facet" -->[:facetName] "?facetName" <<Var>>
"_:facet" -->[:facetValue] "?facetValue" <<Var>>
"_:facet" -->[:facetCount] "?facetCount" <<Var>>
}
partition "Aggregation level" {
"_:aggregation" -->[:name] "?aggrName" <<Var>>
"_:aggregation" -->[:key] "?aggrKey" <<Var>>
"_:aggregation" -->[:count] "?aggrCount" <<Var>>
"_:aggregation" -->[:from] "?aggrFrom" <<Var>>
"_:aggregation" -->[:to] "?aggrTo" <<Var>>
"_:aggregation" -->[:min] "?aggrMin" <<Var>>
"_:aggregation" -->[:max] "?aggrMax" <<Var>>
"_:aggregation" -->[:sum] "?aggrSum" <<Var>>
"_:aggregation" -->[:avg] "?aggrAvg" <<Var>>
"_:aggregation" -->[:sum_of_squares] "?aggrSumSq" <<Var>>
"_:aggregation" -->[:std_deviation] "?aggrStdDev" <<Var>>
"_:aggregation" -->[:variance] "?aggrVar" <<Var>>
"_:aggregation" -->[:parent] "?aggrParent" <<Var>>
"_:aggregation" -->[:level] "?aggrLevel" <<Var>>
"_:aggregation" -->[:levelName] "?aggrLevelName" <<Var>>
}
{plantuml}
h1. Caveats
h2. Order of control
Even though SPARQL per se is not sensitive to the order of triple patterns, the connectors expect to receive certain predicates before others so that queries can be executed properly. In particular, predicates that specify the query or query options need to come before any predicates that fetch results.
The diagram in [#Overview of connector predicates] provides a quick overview of the predicates.
h1. Migrating from a pre-6.2 version
GraphDB prior to 6.2 shipped with versions of the connectors that had different options and slightly different behaviour. Most existing connector instances will be automatically migrated to the new settings but in some cases it is not possible to continue using the same queries. It is recommended to review the connector configuration after the upgrade and, if necessary, recreate it with adjusted parameters.
h2. Changes in field configuration and synchronisation
Prior to 6.2, a single field in the config could produce up to three individual fields on the Elasticsearch side, based on the field options. For example, for the field "firstName":
|| field || note ||
| firstName | produced if option "index" was true; used explicitly in queries |
| _facet_firstName | produced if option "facet" was true; used implicitly for facet search |
| _sort_firstName | produced if option "sort" was true; used implicitly for ordering connector results |
The current version always produces a single Elasticsearch field per field definition in the configuration. This means you are responsible for creating all appropriate fields based on your needs. See more under [#Creation parameters].
h2. The option manageExternalIndex
Prior to 6.2 the option _manageExternalIndex_ could be used to control the management of both the schema and the index. In the current implementation there are separate options, _manageSchema_ and _manageIndex_. See [#Schema and index management] for more information.