Elasticsearch GraphDB Connector

Version 1 by Pavel Mihaylov
on Aug 03, 2015 16:47.

compared with
Version 2 by Pavel Mihaylov
on Oct 14, 2015 19:04.

h3. Third-party component versions

This version of the Elasticsearch GraphDB Connector uses Elasticsearch version 1.7.2.

h2. Creating a connector instance
See [#Copy fields] for defining multiple fields with the same property chain.

See [#Multiple property chains per field] for defining a field whose values are populated from more than one property chain.

h4. defaultValue (string), optional, specifies a default value for the field

{note}

h3. Multiple property chains per field

Sometimes you have to work with data models that define the same concept (in terms of what you want to index in Elasticsearch) with more than one property chain, e.g. the concept of "name" could be defined as a single canonical name, multiple historical names and some unofficial names. If you want to index those together as a single field in Elasticsearch, you can define a field with multiple property chains.

Fields with multiple property chains are defined as a set of separate _virtual_ fields that will be merged into a single _physical_ field when indexed. Virtual fields are distinguished by the suffix {nf}/xyz{nf}, where xyz is any alphanumeric sequence of convenience. For example, we can define the fields *name/1* and *name/2* like this:

{div:style=width: 70em}{noformat}
{
    ...
    "fields": [
        {
            "fieldName": "name/1",
            "propertyChain": [
                "http://www.ontotext.com/example#canonicalName"
            ]
        },
        {
            "fieldName": "name/2",
            "propertyChain": [
                "http://www.ontotext.com/example#historicalName"
            ]
        },
        ...
    ],
    ...
}
{noformat}{div}

The values of the fields *name/1* and *name/2* will be merged and synchronised to the field *name* in Elasticsearch.

{note}
You cannot mix suffixed and unsuffixed fields with the same name, e.g. if you define *myField/new* and *myField/old* you cannot have a field called just *myField*.
{note}
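The merging rule and the naming restriction above can be illustrated with a minimal Python sketch. It is only a model of the described behaviour, not the connector's implementation; the helper names {nf}physical_name{nf} and {nf}merge_virtual_fields{nf} and the sample values are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def physical_name(field_name: str) -> str:
    """Strip an optional '/suffix' from a (virtual) field name."""
    return field_name.split("/", 1)[0]


def merge_virtual_fields(values_by_field: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge the values of virtual fields into their physical fields.

    Hypothetical helper: it models the rule described in the text and
    is not part of the connector.
    """
    suffixed = {physical_name(f) for f in values_by_field if "/" in f}
    unsuffixed = {f for f in values_by_field if "/" not in f}
    clash = suffixed & unsuffixed
    if clash:
        # Mirrors the note: myField/new and myField cannot coexist.
        raise ValueError(
            "suffixed and unsuffixed fields share a name: %s" % sorted(clash)
        )

    merged = defaultdict(list)
    for field, values in values_by_field.items():
        merged[physical_name(field)].extend(values)
    return dict(merged)
```

For instance, calling {nf}merge_virtual_fields{nf} with values for *name/1* and *name/2* yields a single *name* entry holding all of them, while passing both *myField/new* and *myField* raises an error.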

h4. Filters and fields with multiple property chains

Filters can be used with fields defined with multiple property chains. Both the physical field values and the individual virtual field values are available:
* Physical fields are specified without the suffix, e.g. ?myField.
* Virtual fields are specified with the suffix, e.g. ?myField/2 or ?myField/alt.

{note:title=Limitation}
Physical fields cannot be combined with parent() because their values come from different property chains. If you need to filter on the same parent level, you can rewrite {nf}parent(?myField) in (<urn:x>, <urn:y>){nf} as {nf}parent(?myField/1) in (<urn:x>, <urn:y>) || parent(?myField/2) in (<urn:x>, <urn:y>) || parent(?myField/3) ...{nf}, and surround the result with parentheses if it is part of a bigger expression.
{note}
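Writing the expanded filter by hand gets tedious when a field has many virtual fields. The following Python sketch builds the rewritten expression as a string. The function name {nf}expand_parent_filter{nf} and its parameters are hypothetical and exist only for this illustration; the output follows the rewrite pattern described in the note above.

```python
from typing import Sequence


def expand_parent_filter(field: str, suffixes: Sequence[str], uris: Sequence[str]) -> str:
    """Build the per-virtual-field form of parent(?field) in (...).

    Hypothetical helper: it only assembles the filter text and does
    not talk to GraphDB.
    """
    if not suffixes:
        raise ValueError("at least one virtual field suffix is required")
    uri_list = ", ".join("<%s>" % uri for uri in uris)
    clauses = [
        "parent(?%s/%s) in (%s)" % (field, suffix, uri_list)
        for suffix in suffixes
    ]
    # Parenthesise so the result can be embedded in a bigger expression.
    return "(" + " || ".join(clauses) + ")"
```

The result is always wrapped in parentheses, so it can be combined with other conditions using && or || without changing its meaning.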

h1. Datatype mapping

| ( expr ) | Grouping of expressions | {nf}(bound(?name) || bound(?company)) && bound(?address){nf} |

{note}
* *?var in (...)* filters the values of ?var and leaves only the matching values, i.e. it will modify the actual data that will be synchronised to Elasticsearch
* *bound(?var)* checks if there is any valid value left after filtering operators like *?var in (...)* have been applied
{note}
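The interaction between the two operators can be modelled with a short Python sketch. This is a simplified illustration of the semantics described in the note, not the connector's code; {nf}filter_in{nf} and {nf}bound{nf} are names chosen for this example.

```python
from typing import Iterable, List


def filter_in(values: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Model of '?var in (...)': keep only the values that match."""
    allowed_set = set(allowed)
    return [value for value in values if value in allowed_set]


def bound(values: Iterable[str]) -> bool:
    """Model of 'bound(?var)': true if at least one value is left."""
    return any(True for _ in values)
```

In this model, filtering the values urn:a and urn:c against (urn:a, urn:b) leaves only urn:a, so a subsequent bound check succeeds; if no value survives the filter, the bound check fails.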

In addition to the operators, some constructions let you write filters based not on the values of a field but on values related to them:

h4. Accessing the previous element in the chain

The construction *parent(?var)* is used for going to a previous level in a property chain. It can be applied recursively as many times as needed, e.g., *parent(parent(parent(?var)))* goes back in the chain three times. The effective value of *parent(?var)* can be used with the *in* or *not in* operator like this: {nf}parent(?company) in (<urn:a>, <urn:b>){nf}, or in the *bound* operator like this: {nf}parent(bound(?var)){nf}.

h4. Accessing an element beyond the chain

The construction *?var -> _uri_* (alternatively *?var o _uri_* or just *?var _uri_*) is used to access additional values that are accessible through the property _uri_. In essence, this construction corresponds to the triple pattern _value_ _uri_ ?effectiveValue, where _value_ is a value bound by the field _var_. The effective value of ?var -> _uri_ can be used with the *in* or *not in* operator like this: {nf}?company -> rdf:type in (<urn:c>, <urn:d>){nf}. It can be combined with parent() like this: {nf}parent(?company) -> rdf:type in (<urn:c>, <urn:d>){nf}. The same construction can be applied to the *bound* operator like this: {nf}bound(?company -> <urn:hasBranch>){nf}, or even combined with parent() like this: {nf}bound(parent(?company) -> <urn:hasGroup>){nf}.

The URI parameter can be a full URI within < > or the special string _rdf:type_ (alternatively just _type_), which will be expanded to http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
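The expansion rule for the URI parameter can be sketched in Python as follows. This is an illustrative model under the rules stated above, not the connector's parser; the function name {nf}expand_uri_parameter{nf} is an assumption made for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand_uri_parameter(token: str) -> str:
    """Resolve a URI parameter to a full URI string.

    Hypothetical helper: accepts a full URI within < > or the special
    strings 'rdf:type' and 'type'.
    """
    token = token.strip()
    if token in ("rdf:type", "type"):
        return RDF_TYPE
    if len(token) > 2 and token.startswith("<") and token.endswith(">"):
        return token[1:-1]
    raise ValueError("expected a URI within < > or rdf:type, got %r" % token)
```

Any other token, such as a prefixed name other than rdf:type, is rejected in this sketch because the text only mentions the two accepted forms.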
The diagram in [#Overview of connector predicates] provides a quick overview of the predicates.

h1. Upgrading from previous versions

No special procedures are required for upgrading from:
* GraphDB 6.2 / Elasticsearch Connector 4.0
* GraphDB 6.3 / Elasticsearch Connector 4.1
* GraphDB 6.4 / Elasticsearch Connector 4.1

h3. Migrating from a pre-6.2 version of GraphDB

GraphDB prior to 6.2 shipped with version 3.x of the Elasticsearch GraphDB Connector, which had different options and slightly different behaviour and internals. Unfortunately, it is not possible to migrate existing connector instances automatically. To prevent any data loss, the Elasticsearch GraphDB Connector will not initialise if it detects an existing connector in the old format. The recommended way to migrate your existing instances is:

# back up the INSERT statement used to create the connector instance;
# drop the connector;