
{toc}
h1. Full-Text Indexing and Search

Full-text search (FTS) concerns retrieving text documents from a large collection by keywords or, more generally, by tokens (represented as sequences of characters). Formally, the query represents an unordered set of tokens and the result is a set of documents relevant to the query. In a simple FTS implementation, relevance is Boolean: a document is relevant to the query if it contains all the query tokens, and not relevant otherwise. More advanced FTS implementations deal with a degree of relevance of the document to the query, usually based on a measure of how frequently each token appears in the document, normalized against how frequently it appears in the entire document collection. Such implementations return an ordered list of documents, where the most relevant documents come first.
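The difference between Boolean and graded relevance can be illustrated with a minimal Python sketch. It is not how OWLIM or Lucene compute relevance: the whitespace tokenizer, the TF-IDF style weighting and all function names are assumptions chosen only to make the two notions concrete.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: tokens are lower-cased, whitespace-separated words.
    return text.lower().split()


def boolean_match(query_tokens, doc_tokens):
    # Boolean relevance: the document must contain every query token.
    return set(query_tokens) <= set(doc_tokens)


def tf_idf_score(query_tokens, doc_tokens, all_docs):
    # Graded relevance: frequency of each token in the document, weighted
    # down for tokens that are common across the whole collection.
    counts = Counter(doc_tokens)
    score = 0.0
    for token in set(query_tokens):
        tf = counts[token] / len(doc_tokens)
        df = sum(1 for d in all_docs if token in d)
        if df:
            score += tf * math.log(len(all_docs) / df)
    return score


def search(query, documents):
    # Returns the indices of matching documents, most relevant first.
    docs = [tokenize(d) for d in documents]
    q = tokenize(query)
    hits = [i for i, d in enumerate(docs) if boolean_match(q, d)]
    return sorted(hits, key=lambda i: tf_idf_score(q, docs[i], docs), reverse=True)
```

A Boolean-only implementation would stop after computing {{hits}}; the sort step is what turns the unordered result set into an ordered list.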
FTS and structured queries, like those in database management systems (DBMS), are different information access methods based on a different query syntax and semantics, where the results are also displayed in a different form. FTS and databases usually require different types of indices too. The ability to combine these two types of information access methods is very useful for a wide range of applications. Many relational DBMS support some sort of FTS (which is integrated into the SQL syntax) and maintain additional indices that allow efficient evaluation of FTS constraints. Typically, relational DBMS allow the user to define a query, which requires specific tokens to appear in a specific column of a specific table. In SPARQL there is no standard way for the specification of FTS constraints. In general, there is neither a well defined nor widely accepted concept for FTS in RDF data. Nevertheless, some semantic repository vendors offer some sort of FTS in their engines. This section documents the FTS supported by OWLIM-SE.
Two approaches are implemented in OWLIM-SE: a proprietary implementation called 'Node Search' and a Lucene-based implementation called 'RDF Search'. The two approaches are collectively referred to in this guide as 'full-text indexing'. Both enable OWLIM to perform complex queries against character data, which significantly speeds up the query process. When choosing between them, consider their functional differences, which are outlined in the table below. Furthermore, there can be considerable differences between the indexing and search speeds of the two FTS implementations. Performance-conscious users are therefore advised to benchmark both methods using a dataset and queries representative of the intended application.
\\
| || Node Search || RDF Search |
|| FTS query form | List of tokens | List of tokens (with Lucene query extensions) ||
|| Result form | Unordered set of nodes | Ordered list of URIs ||
|| Textual Representation | For literals: the string value. \\
For URIs and B-nodes: tokenized URL | Concatenation of the text representations of the nodes from the molecule (1-step neighbourhood in the graph) of the URI ||
|| Relevance | Boolean, based on presence of the query tokens in the text | Vector-space model, reflecting the degree of relevance of the text and the RDF rank of the URI ||
|| Implementation | Proprietary full-text indexing and search implementation | The Lucene engine is integrated and used for indexing and search ||
\\
Node Search (with the parameter *ftsLiteralsOnly* set to *true*) provides functionality similar to typical FTS implementations in relational DBMS. RDF Search, however, is a novel information retrieval concept that allows efficient extraction of RDF resources from huge datasets, where ordering of the results by relevance is crucial.

h2. Node Search -- Proprietary Full-Text Search

The parameters for OWLIM's full-text index control when/if the index is to be created, the index cache size, and whether literals only or all types of nodes should be indexed. See the parameters *ftsIndexPolicy*, *fts-memory* and *ftsLiteralsOnly* in the [configuration section|OWLIM-SE Configuration].
The following example configures the database engine to create a 20 megabyte cache for the full-text index on start up that indexes all literals and URIs:
{noformat}owlim:ftsIndexPolicy "onStartup" ;
owlim:fts-memory "20m" ;
owlim:ftsLiteralsOnly "false"
{noformat}Full-text search patterns are embedded in SPARQL and SeRQL queries by adding extra statement patterns that use special system predicates:
{noformat}<String:> <Algorithm predicate> <Binding> .
{noformat}Each of the elements of this triple is explained below:
* {{<String:>}} the search string - a list of tokens separated by colons ':', whose use is determined by the choice of predicate, see below;
* {{<Algorithm predicate>}} specifies the search method, i.e. how the tokens in the search string are to be used, see below;
* {{<Binding>}} the variable containing the result, i.e. the values (URIs or literals) that match with the given search string and method.


|| Predicate || Description ||
| {{fts:exactMatch}} | Matches literals that contain all tokens considering the case. For example, searching for {{<United:States>}} will match "The president of the United States", but not "United Statesless", "united states" or "notUnited notStates.". |
| {{fts:matchIgnoreCase}} | Similar to the above but ignores case. {{<United:States>}} will match "The president of the United States", "united states" but not "United Statesless" or "notUnited notStates." |
| {{fts:prefixMatch}} | Matches tokens that begin with the given search tokens considering the case. For example, {{<United:States>}} will match "The president of the United States" and "United Statesless" but not "notUnited notStates" or "united states." |
| {{fts:prefixMatchIgnoreCase}} | Similar to the above but ignores case. For example, {{<United:States>}} will match "The president of the United States", "United Statesless", "united states" but not "notUnited notStates". |

The namespace prefix {{fts}} in the above table denotes {{<[http://www.ontotext.com/owlim/fts#]>}}
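The matching rules in the predicate table can be mimicked with a short Python sketch. This is only an illustration of the documented behaviour, not OWLIM's implementation: the tokenizer (maximal runs of ASCII letters and digits) and the function name {{node_search}} are assumptions.

```python
import re


def tokenize(text):
    # Assumption: tokens are maximal runs of ASCII letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def node_search(search_string, literal, prefix=False, ignore_case=False):
    # The search string is a colon-separated token list, e.g. "United:States".
    wanted = [t for t in search_string.split(":") if t]
    found = tokenize(literal)
    if ignore_case:
        wanted = [t.lower() for t in wanted]
        found = [t.lower() for t in found]
    if prefix:
        # Every search token must be the beginning of some token in the text.
        return all(any(f.startswith(w) for f in found) for w in wanted)
    # Every search token must appear as a whole token in the text.
    return all(w in found for w in wanted)
```

With these assumptions, {{node_search("United:States", text)}} corresponds to {{fts:exactMatch}}, adding {{ignore_case=True}} to {{fts:matchIgnoreCase}}, {{prefix=True}} to {{fts:prefixMatch}}, and both flags to {{fts:prefixMatchIgnoreCase}}.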
There follow some query examples for Node search in SPARQL and SeRQL:
* *Example 1:* Get all values that contain a token that matches exactly with 'abstract'
SPARQL query:
{noformat}PREFIX fts: <http://www.ontotext.com/owlim/fts#>
SELECT ?label
WHERE { <abstract:> fts:exactMatch ?label . }
{noformat}SeRQL query:
{noformat}SELECT L
FROM {<abstract:abstract>} fts:exactMatch {L}
USING NAMESPACE
fts = <http://www.ontotext.com/owlim/fts#>
{noformat}Note that in SeRQL, *abstract:* is not a valid URI, so *abstract:abstract* is used instead, which works the same and also conforms with what the parser expects.
* *Example 2:* Get all values that contain both tokens 'Remorselessness' and 'books' using case-insensitive search (SPARQL):
{noformat}PREFIX fts: <http://www.ontotext.com/owlim/fts#>
SELECT ?label
WHERE { <Remorselessness:books> fts:matchIgnoreCase ?label. }
{noformat}The corresponding SeRQL query is omitted due to its similarity with the above SPARQL query.
* *Example 3:* Find everything that has a label that starts with "3d" regardless of the language or the case (SPARQL):
{noformat}PREFIX fts: <http://www.ontotext.com/owlim/fts#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
?X rdfs:label ?label .
<3d:> fts:prefixMatchIgnoreCase ?label. }
{noformat}This query cannot be expressed in SeRQL using full-text search predicates, because the SeRQL parser won't accept a URI starting with a digit.
The above example is hard to formulate without a full text search capability. For example, the trivial query below won't match an entry with the label {{"3d"@en}}, because this literal is an {{rdf:PlainLiteral}} and not the same as {{"3d"}}, which is an {{xsd:string}}, i.e. the data types are different.
{noformat}PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x
WHERE { ?x rdfs:label "3d" }
{noformat}


h2. RDF Search - Full-Text Search using Lucene

[Apache Lucene|http://lucene.apache.org] is a high-performance, full-featured text search engine written entirely in Java. OWLIM-SE supports full text search capabilities using Lucene with a variety of indexing options and the ability to simultaneously use multiple, differently configured indices in the same query.
{note}
The classpath must include lucene-core-3*.jar (included with the OWLIM distribution) in order for the Lucene-based full-text search to function correctly. This can be substituted with the full Lucene jar file that can be downloaded from the [Apache Lucene download page|http://www.apache.org/dyn/closer.cgi/lucene/java/].
{note}
In order to use Lucene full-text search in OWLIM-SE a Lucene index must first be computed. Before being created, each index can be parameterised in a number of ways using SPARQL ASK queries. This provides the ability to:
* select what kinds of nodes are indexed (URIs/literals/blank-nodes)
* select what is included in the 'molecule' (explained below) associated with each node
* select literals with certain language tags
* choose the size of the RDF 'molecule' to index
* choose whether to boost the relevance of nodes using RDF Rank values
* select alternative analysers
* select alternative scorers

In order to use the indexing behaviour of Lucene, a text document must be created for each node in the RDF graph to be indexed. This text document is called the 'RDF molecule' and is made up of other nodes reachable via the predicates that connect nodes to each other. Once a molecule has been created for each node, Lucene creates an index over these molecules. During search (query answering) Lucene identifies the matching molecules and OWLIM uses the associated nodes as variable substitutions when evaluating the enclosing SPARQL query.
The scope of an RDF molecule includes the starting node and its neighbouring nodes that are reachable via the specified number of predicate arcs. What type of nodes are indexed and what type of nodes are included in the molecule can be specified for each Lucene index. Furthermore, the size of the molecule can be controlled by specifying the number of allowed traversals of predicate arcs starting from the molecule centre (the node being indexed). Note that blank nodes themselves are never included in the molecule. If a blank node is encountered the search is extended via any predicate to the next nearest entity and so on. Therefore even when the molecule size is 1, entities reachable via several intermediate predicates can still be included in the molecule if all the intermediate entities are blank nodes.
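The traversal rule described above, including the special handling of blank nodes, can be sketched in Python as follows. This is an illustrative model only, not OWLIM's code. It assumes that the graph is a list of (subject, predicate, object) string triples, that only outgoing arcs are followed, and that blank nodes are written with a {{_:}} prefix; the name {{build_molecule}} is hypothetical.

```python
from collections import deque


def is_blank(node):
    # Assumption: blank nodes are written with the usual "_:" prefix.
    return node.startswith("_:")


def build_molecule(triples, centre, size, include_centre=True):
    # Adjacency over outgoing arcs only (an assumption of this sketch).
    out = {}
    for s, _p, o in triples:
        out.setdefault(s, []).append(o)

    # best[node] = fewest hops needed to reach node from the centre,
    # where stepping onto a blank node does not use up a hop.
    best = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        for nxt in out.get(node, []):
            cost = best[node] if is_blank(nxt) else best[node] + 1
            if cost > size or cost >= best.get(nxt, size + 1):
                continue
            best[nxt] = cost
            if is_blank(nxt):
                queue.appendleft(nxt)
            else:
                queue.append(nxt)

    # Blank nodes are traversed but never become part of the molecule.
    return [n for n in best
            if not is_blank(n) and (n != centre or include_centre)]
```

With a molecule size of 1, a node that is only reachable through a chain of blank nodes still ends up in the molecule, because the intermediate blank nodes do not count as hops.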
The parameters are described in more detail as follows:

|| Parameter | *Exclude* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#exclude] ||
|| Description | Provides a regular expression to identify nodes that will be excluded from the molecule. Note that the regular expression will be applied case-sensitively to literals and URI local names. \\
The example given below will cause matching URIs (e.g. <[http://example.com/uri#helloWorld]> ) and literals (e.g. "hello world\!") not to be included. ||
|| Default | <none> ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:exclude luc:setParam "hello.*" \} ||
\\
|| Parameter | *Exclude entities* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#excludeEntities] ||
|| Description | A comma/semi-colon/white-space separated list of entities that will NOT be included in an RDF molecule. \\
The example below will include any URI in a molecule, except the two listed. ||
|| Default | <none> ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:excludeEntities luc:setParam "http://www.w3.org/2000/01/rdf-schema#Class http://www.example.com/dummy#E1" \} ||
\\
|| Parameter | *Exclude predicates* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#excludePredicates] ||
|| Description | A comma/semi-colon/white-space separated list of properties that will NOT be traversed in order to build an RDF molecule. \\
The example below will prevent any entities being added to an RDF molecule if they can only be reached via the two given properties. ||
|| Default | <none> ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:excludePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#subClassOf http://www.example.com/dummy#p1" \} ||
\\
|| Parameter | *Include* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#include] ||
|| Description | Indicates what kinds of nodes are to be included in the molecule. The value can be a list of values from: uri, literal, centre (the plural forms are also allowed: uris, literals, centres). The value of _centre_ causes the node for which the molecule is built to be added to the molecule (provided it is not a blank node). This can be useful, for example, when indexing URI nodes with molecules that contain only literals, but the local part of the URI should also be searchable. ||
|| Default | "literals" ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:include luc:setParam "literal uri" . \} ||
\\
|| Parameter | *Include entities* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#includeEntities] ||
|| Description | A comma/semi-colon/white-space separated list of entities that can be included in an RDF molecule. \\
Any other entities will be ignored. The example below will build molecules that only contain the two entities. ||
|| Default | <none> ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:includeEntities luc:setParam "http://www.w3.org/2000/01/rdf-schema#Class http://www.example.com/dummy#E1" \} ||
\\
|| Parameter | *Include predicates* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#includePredicates] ||
|| Description | A comma/semi-colon/white-space separated list of properties that can be traversed in order to build an RDF molecule. \\
The example below will allow any entities to be added to an RDF molecule, but only if they can be reached via the two given properties. ||
|| Default | <none> ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:includePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#subClassOf http://www.example.com/dummy#p1" \} ||
\\
|| Parameter | *Index* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#index] ||
|| Description | Indicates what kinds of nodes are to be indexed. The value can be a list of values from: uri, literal, bnode (the plural forms are also allowed: uris, literals, bnodes). ||
|| Default | "literals" ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:index luc:setParam "literals, bnodes" . \} ||
\\
|| Parameter | *Language(s)* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#languages] ||
|| Description | A comma separated list of language tags. Only literals with the indicated language tags will be included in the index. To include literals that have no language tag, use the special value 'none'. ||
|| Default | "" (which is used to indicate that literals with any language tag are used, including those with no language tag) ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:languages luc:setParam "en,fr,none" . \} ||
\\
|| Parameter | *Molecule size* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#moleculeSize] ||
|| Description | Set the size of the molecule associated with each entity. A value of zero indicates that only the entity itself should be indexed. A value of 1 indicates that the molecule will contain all entities reachable by a single 'hop' via any predicate (predicates not included in the molecule). Note that blank nodes themselves are never included in the molecule. If a blank node is encountered the search is extended via any predicate to the next nearest entity and so on. Therefore even when the molecule size is 1, entities reachable via several intermediate predicates can still be included in the molecule if all the intermediate entities are blank nodes. Molecule sizes of 2 and upwards are allowed, but with large datasets it can take a very long time to create the index. ||
|| Default | 0 ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:moleculeSize luc:setParam "1" . \} ||
\\
|| Parameter | *Use RDF rank* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#useRDFRank] ||
|| Description | Indicates whether the RDF weights (if they have been computed already) associated with each entity should be used as boosting factors when computing the relevance to a given Lucene query. Allowable values are 'no', 'yes' and 'squared'. This last value indicates that the square of the RDF Rank value is to be used. ||
|| Default | "no" ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:useRDFRank luc:setParam "yes" . \} ||
\\
|| Parameter | *Set alternative analyser* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#analyzer] ||
|| Description | Used to set an alternative analyser for processing text to produce terms to index. By default, this parameter has no value and the default analyser used is: \\
org.apache.lucene.analysis.standard.StandardAnalyzer \\
An alternative analyser must be derived from: \\
org.apache.lucene.analysis.Analyzer \\
To use an alternative analyser, use this parameter to identify the name of a Java factory class that can instantiate it. The factory class must be available on the Java virtual machine's classpath and must implement this interface: \\
com.ontotext.trree.plugin.lucene.AnalyzerFactory ||
|| Default | <none> ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:analyzer luc:setParam "com.ex.MyAnalyserFactory" . \} ||
\\
|| Parameter | *Set alternative scorer* ||
|| Predicate | [http://www.ontotext.com/owlim/lucene#scorer] ||
|| Description | Used to set an alternative scorer that provides boosting values that adjust the relevance (and hence the ordering) of results to a Lucene query. By default, this parameter has no value and no additional scoring takes place. However, if the useRDFRank parameter is set to 'yes' or 'squared', then the RDF Rank scores are used (see section 10.1). \\
An alternative scorer must implement this interface: \\
com.ontotext.trree.plugin.Scorer \\
In order to use an alternative scorer, use this parameter to identify the name of a Java factory class that can instantiate it. The factory class must be available on the Java virtual machine's classpath and must implement this interface: \\
com.ontotext.trree.plugin.ScorerFactory ||
|| Default | <none> ||
|| Example | PREFIX luc: <[http://www.ontotext.com/owlim/lucene#]> \\
ASK \{ luc:scorer luc:setParam "com.ex.MyScorerFactory" . \} ||
\\
Once the parameters for an index have been set, the index is created and named using a SPARQL ASK query of this form, where the index name appears as the subject in the query statement pattern:
{noformat}PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:myIndex luc:createIndex "true" . }
{noformat}The index name must have the {{[http://www.ontotext.com/owlim/lucene#]}} namespace and the local part can contain only alphanumeric characters and underscores.
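When indices are configured from application code, the ASK queries shown above can be generated programmatically. The following Python sketch only builds the query strings; how they are sent to the repository is left out. The helper names {{set_param_query}} and {{create_index_query}} are hypothetical, and the index name check assumes that 'alphanumeric' means ASCII letters and digits.

```python
import re

LUC = "http://www.ontotext.com/owlim/lucene#"
_INDEX_NAME = re.compile(r"[A-Za-z0-9_]+")  # assumption: ASCII alphanumerics


def _escape(value):
    # Escape backslashes and double quotes for use inside a SPARQL string.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def set_param_query(param, value):
    # ASK query that sets one index parameter, e.g. ("moleculeSize", "1").
    return ('PREFIX luc: <%s>\n'
            'ASK { luc:%s luc:setParam "%s" . }' % (LUC, param, _escape(value)))


def create_index_query(index_name):
    # ASK query that creates and names the index with the current parameters.
    if not _INDEX_NAME.fullmatch(index_name):
        raise ValueError("index name may contain only alphanumerics and underscores")
    return ('PREFIX luc: <%s>\n'
            'ASK { luc:%s luc:createIndex "true" . }' % (LUC, index_name))


# Parameters are set first, then the index is created.
queries = [
    set_param_query("index", "uris"),
    set_param_query("moleculeSize", "1"),
    create_index_query("myIndex"),
]
```

Rejecting invalid index names before the query is sent avoids a round trip to the repository for a request that cannot succeed.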
Creating an index can take some time, although usually no more than a few minutes when the molecule size is 1 or less. During this process, for each node in the repository its surrounding molecule is computed. Then each such molecule is converted into a single string document (by concatenating the textual representation of all the nodes in the molecule) and this document is indexed by Lucene. If RDF Rank weights are used (or an alternative scorer is specified) then the computed values are stored in Lucene's index as a boosting factor that will later on influence the selection order.
To use a custom Lucene index in a SPARQL query use the index's name as the predicate in a statement pattern, with the Lucene query as the object using the full [Lucene query|http://lucene.apache.org/java/3_0_0/queryparsersyntax.html] vocabulary.

The following query will produce bindings for {{?s}} from entities in the repository, where the RDF molecule associated with that entity (for the given index) contains terms that begin with "United". Furthermore, the bindings will be ordered by relevance (with any boosting factor):
{noformat}PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?s
WHERE { ?s luc:myIndex "United*" . }
{noformat}

The Lucene score for a bound entity for a particular query can be exposed using a special predicate:
{noformat}http://www.ontotext.com/owlim/lucene#score{noformat}
This can be useful when Lucene query results should be ordered in a manner based on, but different from, the original Lucene score. For example, the following query will order results by a combination of the Lucene score and some ontology defined importance value:
{noformat}
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
PREFIX ex: <http://www.example.com/myontology#>
SELECT * {
?node luc:myIndex "lucene query string" .
?node ex:importance ?importance .
?node luc:score ?score .
} ORDER BY ( ?score + ?importance )
{noformat}
The {{luc:score}} predicate will work only on bound variables. There is no problem disambiguating multiple indices because each variable will be bound from exactly one Lucene index and hence its score.

The combination of ranking RDF molecules together with full-text search provides a powerful mechanism for querying/analysing datasets even when the schema is not known. This allows for keyword-based search over both literals and URIs with the results ordered by importance/interconnectedness. For an example of this kind of 'RDF Search', see [FactForge|http://factforge.net].

h1. Geo-spatial Extensions

OWLIM-SE has special support for 2-dimensional geo-spatial data that uses the [WGS84 Geo Positioning RDF vocabulary (World Geodetic System 1984)|http://www.w3.org/2003/01/geo/wgs84_pos]. Special indices can be used for this data that permit the efficient evaluation of special query forms and extension functions that allow:
* locations to be found that are within a certain distance of a point, i.e. within the specified circle on the surface of the sphere (Earth), using the nearby(...) construction;
* locations that are within rectangles and polygons, where the vertices are defined using spherical polar coordinates, using the within(...) construction.

{note}
The following jar files (included with the OWLIM distribution) must be on the classpath in order for the geo-spatial extensions to function correctly: jsi-1*.jar, log4j-1*.jar, sil-0*.jar, trove4j-2*.jar
{note}

The WGS84 ontology can be found at: [http://www.w3.org/2003/01/geo/wgs84_pos] and contains several classes and predicates:

|| Element || Description ||
| {{SpatialThing}} | Class used to represent anything with spatial extent, i.e. size, shape or position. |
| {{Point}} | Class used to represent a point (relative to Earth) defined using latitude, longitude (and altitude). \\
subClassOf [http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing] |
| {{location}} | The relation between a thing and where it is. \\
range SpatialThing \\
subPropertyOf [http://xmlns.com/foaf/0.1/based_near] |
| {{lat}} | The WGS84 latitude of a SpatialThing (decimal degrees). \\
domain [http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing] |
| {{long}} | The WGS84 longitude of a SpatialThing (decimal degrees). \\
domain [http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing] |
| {{lat_long}} | A comma-separated representation of a latitude, longitude coordinate. |
| {{alt}} | The WGS84 altitude of a SpatialThing (decimal meters above the local reference ellipsoid). \\
domain [http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing] |

Before the geo-spatial extensions can be used, the geo-spatial index must be built. This is achieved using a special predicate as follows:
{noformat}PREFIX ontogeo: <http://www.ontotext.com/owlim/geo#>
ASK { _:b1 ontogeo:createIndex _:b2. }
{noformat}If the indexing is successful, the above query will return true, otherwise false. Information about the indexing process and any errors can be found in the log. Note that this query will return false if there is no geo-spatial data in the repository, i.e. no statements describing resources with latitude and longitude properties.

h2. Geo-spatial query syntax

The special syntax used to query geo-spatial data makes use of SPARQL's [RDF Collections syntax|http://www.w3.org/TR/rdf-sparql-query/#collections]. This syntax uses round brackets as a shorthand for the statements connecting a list of values using {{rdf:first}} and {{rdf:rest}} predicates with terminating {{rdf:nil}}. Statement patterns that use one of the special geo-spatial predicates supported by OWLIM-SE are treated differently by the query engine. The following special syntax is supported when evaluating SPARQL queries (the descriptions all use the namespace {{omgeo: <[http://www.ontotext.com/owlim/geo#]>}}):
\\
|| Construct | *Nearby (lat long distance)* ||
|| Syntax | ?point omgeo:nearby(?lat ?long ?distance) ||
|| Description | This statement pattern will evaluate to true if the following constraints hold: \\
* ?point geo:lat ?plat . \\
* ?point geo:long ?plong . \\
* Shortest great circle distance from (?plat, ?plong) to (?lat, ?long) <= ?distance \\
\\
Such a construction will use the geo-spatial indices to find bindings for ?point that lie within the defined circle. \\
Constants are allowed for any of *?lat ?long ?distance*, where latitude and longitude are specified in decimal degrees and distance is specified in either kilometres ('km' suffix) or miles ('mi' suffix). If the units are not specified, then 'km' is assumed. ||
|| Restrictions | Latitude is limited to the range \-90 (South) to \+90 (North) \\
Longitude is limited to the range \-180 (West) to \+180 (East) ||
|| Examples | Find the names of airports that are within 50 miles of Seoul: \\
{noformat}PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
SELECT distinct ?airport
WHERE {
?base geo-ont:name "Seoul" .
?base geo-pos:lat ?latBase .
?base geo-pos:long ?longBase .
?link omgeo:nearby(?latBase ?longBase "50mi") .
?link geo-ont:name ?airport .
?link geo-ont:featureCode geo-ont:S.AIRP .
}
{noformat} ||
\\
\\
|| Construct | *Within (rectangle)* ||
|| Syntax | ?point omgeo:within(?lat{~}1~ ?long{~}1~ ?lat{~}2~ ?long{~}2~) ||
|| Description | This statement pattern is used to test/find points that lie within the rectangle specified by diagonally opposite corners *?lat1 ?long1 and ?lat2 ?long2*. The corners of the rectangle must be either constants or bound values. \\
It will evaluate to true if the following constraints hold: \\
* ?point geo:lat ?plat . \\
* ?point geo:long ?plong . \\
* ?lat{~}1~ <= ?plat <= ?lat{~}2~ \\
* ?long{~}1~ <= ?plong <= ?long{~}2~ \\
\\
Note that the corners must be specified most westerly and southerly (first) and most northerly and easterly (second). Proper account is taken for rectangles that cross the \+/-180 degree meridian. \\
Constants are allowed for any of ?lat{~}1~ ?long{~}1~ ?lat{~}2~ ?long{~}2~, where latitude and longitude are specified in decimal degrees. If *?point* is unbound then bindings for all points within the rectangle will be produced. ||
|| Restrictions | Latitude is limited to the range \-90 (South) to \+90 (North) \\
Longitude is limited to the range \-180 (West) to \+180 (East) \\
Rectangle vertices must be specified in the order lower-left followed by upper-right ||
|| Examples | Find tunnels lying within a rectangle enclosing Tirol, Austria: \\
{noformat}PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
SELECT ?feature ?lat ?long
WHERE {
?link omgeo:within(45.85 9.15 48.61 13.18) .
?link geo-ont:featureCode geo-ont:R.TNL .
?link geo-ont:name ?feature .
?link geo-pos:lat ?lat .
?link geo-pos:long ?long .
}
{noformat} ||
\\
|| Construct | *Within (polygon)* ||
|| Syntax | ?point omgeo:within(?lat{~}1~ ?long{~}1~ ... ?lat{~}n~ ?long{~}n~) ||
|| Description | This statement pattern is used to test/find points that lie within the polygon whose vertices are specified by three or more latitude/longitude pairs. The values for the vertices must be either constants or bound values. \\
It will evaluate to true if the following constraints hold: \\
* ?point geo:lat ?plat . \\
* ?point geo:long ?plong . \\
* the position ?plat ?plong is enclosed by the polygon \\
The polygon is closed automatically if the first and last vertices do not coincide. The vertices must be constants or bound values. Coordinates are specified in decimal degrees. If *?point* is unbound then bindings for all points within the polygon will be produced. ||
|| Restrictions | Latitude is limited to the range \-90 (South) to \+90 (North) \\
Longitude is limited to the range \-180 (West) to \+180 (East) ||
|| Examples | Find caves in the sides of cliffs lying within a polygon approximating the shape of England: \\
{noformat}PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
SELECT ?feature ?lat ?long
WHERE {
?link omgeo:within( "51.45" "-2.59"
"54.99" "-3.06"
"55.81" "-2.03"
"52.74" "1.68"
"51.17" "1.41" ) .
?link geo-ont:featureCode geo-ont:S.CAVE .
?link geo-ont:name ?feature .
?link geo-pos:lat ?lat .
?link geo-pos:long ?long .
}
{noformat} ||


h2. Extension query functions

At present there is just one SPARQL extension function.

|| Function | *Distance function* ||
|| Syntax | double omgeo:distance(?lat{~}1~, ?long{~}1~, ?lat{~}2~, ?long{~}2~) ||
|| Description | This SPARQL extension function computes the distance between two points in kilometres and can be used in FILTER and ORDER BY clauses. ||
|| Restrictions | Latitude is limited to the range \-90 (South) to \+90 (North) \\
Longitude is limited to the range \-180 (West) to \+180 (East) ||
|| Examples | Find all the airports within 80 miles of Bournemouth and filter out those that are more than 80 kilometres from Brize Norton, order the results with the closest to Brize Norton first: \\
{noformat}PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>

SELECT distinct ?airport_name
WHERE {
?a1 geo-ont:name "Bournemouth" .
?a1 geo-pos:lat ?lat1 .
?a1 geo-pos:long ?long1 .
?airport omgeo:nearby(?lat1 ?long1 "80mi" ) .
?airport geo-ont:name ?airport_name .
?airport geo-ont:featureCode geo-ont:S.AIRP .
?airport geo-pos:lat ?lat2 .
?airport geo-pos:long ?long2 .
?a2 geo-ont:name "Brize Norton" .
?a2 geo-pos:lat ?lat3 .
?a2 geo-pos:long ?long3 .
FILTER( omgeo:distance(?lat2, ?long2, ?lat3, ?long3) < 80)
}
ORDER BY ASC( omgeo:distance(?lat2, ?long2, ?lat3, ?long3) )
{noformat} ||
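For readers who want to cross-check results, the great circle distance and the distance units can be approximated with the Python sketch below. It assumes the haversine formula on a sphere with the radius of 6371.009 km quoted in the implementation details of this guide; the formula actually used by OWLIM-SE is not stated here, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.009  # spherical Earth radius quoted in this guide
KM_PER_MILE = 1.609344      # statute mile


def great_circle_km(lat1, long1, lat2, long2):
    # Haversine formula on a sphere; all inputs in decimal degrees.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(long2 - long1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def parse_distance_km(text):
    # "50mi" is converted to kilometres; "80km" and "80" are kilometres.
    text = text.strip()
    if text.endswith("mi"):
        return float(text[:-2]) * KM_PER_MILE
    if text.endswith("km"):
        return float(text[:-2])
    return float(text)


def nearby(plat, plong, lat, long, distance):
    # True if (plat, plong) lies within the circle around (lat, long).
    return great_circle_km(plat, plong, lat, long) <= parse_distance_km(distance)
```

Here {{great_circle_km}} plays the role of {{omgeo:distance}} and {{nearby}} the role of the {{omgeo:nearby}} test, without any index support.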


h2. Implementation details

Knowledge of the implementation's algorithms and assumptions will allow users to make the best use of the OWLIM-SE geo-spatial extensions. The following points are significant and can affect the expected behaviour during query answering:
* Spherical Earth -- the current implementation treats the Earth as a perfect sphere with a radius of 6371.009km;
* Only 2-Dimensional points are supported, i.e. there is no special handling of geo:alt (metres above the reference surface of the Earth);
* All latitude and longitude values must be specified using decimal degrees, where East and North are positive and \-90 <= latitude <= \+90 and \-180 <= longitude <= \+180;
* Distances must be in units of kilometres (suffix 'km') or statute miles (suffix 'mi'). If the suffix is omitted, kilometres are assumed;
* {{omgeo:within( rectangle )}} construct uses a 'rectangle' whose edges are lines of latitude and longitude, so the north-south distance is constant and the rectangle described forms a band around the Earth that starts and stops at the given longitudes;
* {{omgeo:within( polygon )}} joins vertices with straight lines on a cylindrical projection of the Earth tangential to the equator. A straight line starting at the point under test and continuing East out of the polygon is examined to see how many polygon edges it intersects. If the number of intersections is even then the point is outside the polygon, if the number of intersections is odd, the point is inside the polygon. With the current algorithm, the order of vertices is not relevant (clockwise or anticlockwise);
* {{omgeo:within()}} may not work correctly when the region (polygon or rectangle) spans the \+/-180 meridian;
* {{omgeo:nearby()}} uses the great circle distance between points.
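
The even/odd intersection rule described for {{omgeo:within( polygon )}} can be sketched as follows. This is a generic ray-casting test written for illustration, not the OWLIM-SE source code; it treats longitude and latitude as plain x and y coordinates, and it does not handle polygons that span the \+/-180 meridian. The class and method names are hypothetical.

```java
final class PolygonSketch {
    /**
     * Even-odd test: cast a ray from (lat, lon) towards the East and count edge crossings.
     * lats[i] and lons[i] are the polygon vertices in decimal degrees, in either winding order.
     */
    static boolean inside(double lat, double lon, double[] lats, double[] lons) {
        boolean inside = false;
        int n = lats.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Does the edge from vertex j to vertex i straddle the latitude of the test point?
            boolean straddles = (lats[i] > lat) != (lats[j] > lat);
            if (straddles) {
                // Longitude at which the edge crosses the test latitude.
                double crossLon = lons[j]
                        + (lat - lats[j]) * (lons[i] - lons[j]) / (lats[i] - lats[j]);
                if (crossLon > lon) {
                    inside = !inside; // one more crossing to the East of the point
                }
            }
        }
        return inside;
    }
}
```

Because only the parity of the crossing count matters, listing the vertices clockwise or anticlockwise gives the same answer.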

h1. RDF Rank

RDF Rank is an algorithm that identifies the more important or more popular entities in the repository by examining their interconnectedness. The popularity of entities can then be used to order query results in a similar way to internet search engines, such as how Google orders search results using PageRank [http://en.wikipedia.org/wiki/PageRank].
The RDF Rank component computes a numerical weighting for all the nodes in the entire RDF graph stored in the repository, including URIs, blank nodes and literals. The weights are floating point numbers with values between 0 and 1 that can be interpreted as a measure of a node's relevance/popularity.
Since the values range from 0 to 1, the weights can be used for sorting a result set (the lexicographical order works fine even if the rank literals are interpreted as plain strings). Here is an example SPARQL query that uses RDF rank for sorting results by their popularity:
{noformat}PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
PREFIX opencyc-en: <http://sw.opencyc.org/2008/06/10/concept/en/>
SELECT * WHERE {
?Person a opencyc-en:Entertainer .
?Person rank:hasRDFRank ?rank .
}
ORDER BY DESC(?rank) LIMIT 100
{noformat}As seen in the example query, RDF Rank weights are made available via a special system predicate. Triple patterns with the predicate {{[http://www.ontotext.com/owlim/RDFRank#hasRDFRank]}} are handled specially by OWLIM, where the object of the statement pattern is bound to a literal containing the RDF Rank of the subject.
In order to use this mechanism the RDF ranks for the whole repository must be computed in advance. This is done by executing a series of SPARQL ASK queries to parameterise the weighting algorithm, followed by a query that triggers the computation itself.
\\
|| Parameter | *Maximum iterations* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFRank#maxIterations]}} ||
|| Description | Sets the maximum number of iterations of the algorithm over all entities in the repository. ||
|| Default | 20 ||
|| Example | PREFIX rank: <[http://www.ontotext.com/owlim/RDFRank#]> \\
ASK \{ rank:maxIterations rank:setParam "16" . \} ||

\\
|| Parameter | *Epsilon* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFRank#epsilon]}} ||
|| Description | Used to terminate the weighting algorithm early when the total change of all RDF Rank scores has fallen below this value. ||
|| Default | 0.01 ||
|| Example | PREFIX rank: <[http://www.ontotext.com/owlim/RDFRank#]> \\
ASK \{ rank:epsilon rank:setParam "0.05" . \} ||
\\
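The roles of the two parameters can be illustrated with a generic PageRank-style iteration. The exact weighting algorithm used by RDF Rank is not specified in this guide, so the sketch below is only an assumption-laden illustration: the damping value of 0.85, the handling of nodes without outgoing links and the final scaling are choices made for the example, and all names are hypothetical.

```java
import java.util.Arrays;

final class RankSketch {
    /**
     * links[i] lists the nodes that node i points to. Returns one score per node, scaled so
     * that the largest score is 1.0. The loop stops after maxIterations, or earlier when the
     * total change of all scores falls below epsilon.
     */
    static double[] rank(int[][] links, int maxIterations, double epsilon) {
        int n = links.length;
        double damping = 0.85; // conventional PageRank value, an assumption for this sketch
        double[] score = new double[n];
        Arrays.fill(score, 1.0 / n);
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    // Node without outgoing links: spread its score evenly over all nodes.
                    for (int k = 0; k < n; k++) next[k] += damping * score[i] / n;
                } else {
                    for (int target : links[i]) {
                        next[target] += damping * score[i] / links[i].length;
                    }
                }
            }
            double change = 0.0;
            for (int i = 0; i < n; i++) change += Math.abs(next[i] - score[i]);
            score = next;
            if (change < epsilon) break; // early termination controlled by epsilon
        }
        double max = 0.0;
        for (double s : score) max = Math.max(max, s);
        for (int i = 0; i < n; i++) score[i] /= max; // scale into the range (0, 1]
        return score;
    }
}
```

A larger epsilon or a smaller iteration limit trades accuracy of the scores for a shorter computation.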
To trigger the computation of the RDF Rank weights, use the following query:
{noformat}PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { _:b1 rank:compute _:b2. }
{noformat}The computed weights can be exported to an external file using a query of this form:
{noformat}PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { _:b1 rank:export "/home/user1/rdf_ranks.txt" . }
{noformat}The query will return true if the export was successful, false otherwise. If the export failed then an error message will be recorded in the log file.
\\
Lastly, when using [RDF Priming|OWLIM-SE Advanced Features#RDF Priming], the RDF Rank weights can be used as the initial activation values. To set this up, use the following query:
{noformat}PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { _:b1 rank:ranksAsWeights _:b2 . }
{noformat}

h1. RDF Priming

RDF Priming is a technique that selects a subset of available statements for use as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. RDF Priming is a scalable and customisable implementation of the popular connectionist method on top of RDF graphs that allows for the "priming" of large datasets with respect to concepts relevant to the context and to the query. It is controlled using SPARQL ASK queries. This section provides an overview of the mechanism and explains the necessary SPARQL queries used to manage and set up RDF Priming.

h2. RDF Priming Configuration

To enable RDF Priming over the repository, the {{repository-type}} configuration parameter should be set to {{weighted-file-repository}}.
The current implementation of RDF Priming does not persist activation values, which means that they are only available at runtime and are lost when the repository is shut down. However, they can be exported and imported using the special query directives shown below. Another side effect is that the activation values are global, because they are stored within the shared Entity pool.
The initialization and management of the RDF Priming module is achieved by performing SPARQL ASK queries.

h2. Controlling RDF Priming

RDF Priming is controlled using SPARQL ASK queries, which allows all the parameters and default values to be set. These queries use special system predicates, which are described below:
\\
|| Function | *Enable Activation Spreading* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#enableSpreading]}} ||
|| Description | Used to enable or disable the RDF Priming module. The Object value of the statement pattern should be a Literal whose value is either "true" or "false" ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{_:b1 prim:enableSpreading "true".\} ||
\\
|| Function | *Set Activation Decay* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#decayActivations]}} ||
|| Description | Used to alter all the activation values for the nodes in the RDF graph by multiplying them by a factor specified as a Literal in the Object position of the Statement pattern of the query. The following example will reset all the activation values to zero by multiplying them by "0.0" ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{_:b1 prim:decayActivations "0.0".\} ||
\\
|| Function | *Trigger Activation Spreading Cycle* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#spreadActivation]}} ||
|| Description | Used to trigger an Activation spreading cycle that starts from the nodes that were scheduled for activation for this round. No special values are required for the Subject or Object part of the statement pattern -- blank nodes suffice ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{_:b1 prim:spreadActivation \_:b2.\} ||
\\
|| Function | *Set Statement Weight* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#assignWeight]}} ||
|| Description | Used to set a non-default weight factor for statements with a specific predicate. The Subject of the Statement pattern is the predicate to which the new value should be set. The Object of the pattern is the new weight value as a Literal. The example query sets 0.5 as a weight factor to all the rdfs:subClassOf statements ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
PREFIX rdfs: <[http://www.w3.org/2000/01/rdf-schema#]> \\
ASK \{ rdfs:subClassOf prim:assignWeight "0.5" . \} ||
\\
|| Function | *Schedule Nodes for Activation* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#activateNode]}} ||
|| Description | Used to schedule the nodes specified as Subject or Object of the statement pattern for activation. Scheduling for activation can also be performed by evaluating an ASK query with variables in the body, in which case the nodes bound to the variables used in the query will be scheduled for activation. The behaviour of such an ASK query is altered, so that all the solutions are exhausted before returning the query result. This could take a long time, since LIMIT and OFFSET are not available in this case. The first example activates two nodes *gossip:hasTrack* and *prel:hasChild* and the second example activates many nodes identifying people (and their names) that have an album called "American Life". ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
PREFIX gossip: <[http://www.ontotext.com/rascalli/2008/04/gossipdb.owl#]> \\
PREFIX prel: <[http://proton.semanticweb.org/2007/10/proton_rel#]> \\
ASK \{ gossip:hasTrack prim:activateNode prel:hasChild \} \\
\\
PREFIX gossip: <[http://www.ontotext.com/rascalli/2008/04/gossipdb.owl#]> \\
PREFIX onto: <[http://www.ontotext.com#]> \\
ASK \{ \\
?person gossip:hasAlbum ?album . \\
?album gossip:name "American Life" . \\
?person gossip:name ?name \} ||
\\
The following URIs are used in conjunction with the {{<[http://www.ontotext.com/owlim/RDFPriming#setParam]>}} predicate to change the parameters of the RDF Priming module. In general, the name of the parameter is the Subject of the statement pattern and the new value is passed as its Object.
\\
|| Parameter | *Activation Threshold* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#activationThreshold]}} ||
|| Description | During activation spreading, activations accumulate in nodes and can grow indefinitely. The activationThreshold parameter allows the user to cap those values at a given threshold. The default value of this parameter is {{1.0}}, which means that all values greater than {{1.0}} are set to {{1.0}}. The cap is applied on every iteration of the process, which guarantees that no activation larger than the parameter value will be encountered. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:activationThreshold prim:setParam "0.9" . \} ||
\\
|| Parameter | *Decay Factor* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#decayFactor]}} ||
|| Description | Used during spreading activation to control how much of a node's activation level is transferred to the nodes that it affects. The following example query sets the decayFactor to "0.55" ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:decayFactor prim:setParam "0.55" . \} ||
\\
|| Parameter | *Default Activation Value* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#defaultActivation]}} ||
|| Description | Sets the default activation value for all nodes in the repository. If the default activation is not preset then the default activation for all repository nodes is 0. This does not affect the activation origin nodes, whose activation values are set by using {{[http://www.ontotext.com/owlim/RDFPriming#initialActivation]}} ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:defaultActivation prim:setParam "0.4" . \} ||
\\
|| Parameter | *Default Weight* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#defaultWeight]}} ||
|| Description | Edges in the RDF graph can be given weights that are multiplied by the source node activation in order to compute the activation that is spread across the edge to the destination node (see {{assignWeight}}). If the predicate of the edge is not given any specific weight (via {{assignWeight}}) then the edge weight is assumed to be 1/3 (one third). This default weight can be changed by using the defaultWeight parameter. Any floating point value in the range [0,1] can be used. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:defaultWeight prim:setParam "0.2" . \} ||
\\
|| Function | *Export Activation Values* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#exportActivations]}} ||
|| Description | Is used to export activation values for a set of nodes. The values are stored in a file identified by the URL given as the Object of the statement pattern. The format of the data in the file is simply one line per URI followed by a tab character and the floating-point value of its activation value. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:exportActivations prim:setParam "file:///D/work/my_activations.txt" . \} ||
\\
|| Parameter | *Filter Threshold* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#filterThreshold]}} ||
|| Description | Sets the new filter threshold value used to decide when a statement is visible depending on the activation level of its subject, predicate and object. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:filterThreshold prim:setParam "0.50" . \} ||
\\
|| Parameter | *Firing Threshold* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#firingThreshold]}} ||
|| Description | Sets the threshold above which a node will activate its neighbours ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:firingThreshold prim:setParam "0.25" . \} ||
\\
|| Function | *Import Activation Values* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#importActivations]}} ||
|| Description | Is used to import activation values for a set of nodes. The values are loaded from a file identified by the URL given as the Object of the statement pattern. The format of the data in the file is simply one line per URI followed by a tab character and the floating-point value of its activation value. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:importActivations prim:setParam "file:///D/work/my_activations.txt" . \} ||
\\
|| Parameter | *Initial Activation Value* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#initialActivation]}} ||
|| Description | Sets the initial activation value for each of the nodes from which the activation process starts. The nodes that are scheduled for activation will receive that amount at the beginning of the spreading activation process. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:initialActivation prim:setParam "0.66" . \} ||
\\
|| Parameter | *Maximum Nodes Fired Per Cycle* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#maxNodesFiredPerCycle]}} ||
|| Description | Sets the number of nodes that should fire activations during one spreading activation cycle. The default value is 100000. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:maxNodesFiredPerCycle prim:setParam "10000" . \} ||
\\
|| Parameter | *Number of Cycles* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#cycles]}} ||
|| Description | Sets the number of activation spreading cycles to perform when the process is initiated. ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:cycles prim:setParam "4" . \} ||
\\
|| Parameter | *Number of Worker Threads* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFPriming#workerThreads]}} ||
|| Description | Sets the number of worker threads that will perform the spreading activation (the default is 2). ||
|| Example | PREFIX prim: <[http://www.ontotext.com/owlim/RDFPriming#]> \\
ASK \{ prim:workerThreads prim:setParam "4" . \} ||
\\
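To show how these parameters could interact, here is a deliberately simplified spreading-activation loop. It is a sketch under stated assumptions, not the OWLIM-SE algorithm: it assumes that a node fires on every cycle in which its activation exceeds the firing threshold, that the amount sent along an edge is source activation times edge weight times decay factor, and that values are capped at the activation threshold after each cycle. All names are hypothetical.

```java
final class SpreadingSketch {
    /**
     * edges[k] = {source, target} and weights[k] is the weight of edge k.
     * activation[] holds the starting values, e.g. the initial activation for the
     * scheduled nodes and the default activation for all other nodes.
     */
    static double[] spread(int[][] edges, double[] weights, double[] activation, int cycles,
                           double decayFactor, double firingThreshold,
                           double activationThreshold) {
        double[] current = activation.clone();
        for (int c = 0; c < cycles; c++) {
            double[] next = current.clone();
            for (int k = 0; k < edges.length; k++) {
                int src = edges[k][0];
                int dst = edges[k][1];
                // Only nodes above the firing threshold activate their neighbours.
                if (current[src] > firingThreshold) {
                    next[dst] += current[src] * weights[k] * decayFactor;
                }
            }
            // Cap every value at the activation threshold after each cycle.
            for (int i = 0; i < next.length; i++) {
                next[i] = Math.min(next[i], activationThreshold);
            }
            current = next;
        }
        return current;
    }
}
```

With a chain 0 -> 1 -> 2, edge weights of 0.5, a decay factor of 0.55 and a firing threshold of 0.25, node 1 needs two cycles before it accumulates enough activation to fire, so node 2 stays at zero until the third cycle. This illustrates why the number of cycles and the thresholds need to be tuned together.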

h2. RDF Priming Example

The following example uses data from DBpedia [http://dbpedia.org/About], which was imported into OWLIM-SE with the RDF Priming mode enabled. The management queries are evaluated through the Sesame console application for convenience. The initial step is to evaluate a demo query that retrieves all the instances of the {{dbpedia:V8}} concept:
{noformat}SELECT *
WHERE {?x <http://dbpedia.org/property/class> <http://dbpedia.org/resource/V8>. }
{noformat}The above query returns the following results:
{noformat}?x
------------------------------------
dbpedia3:Jaguar_AJ-V8_engine
dbpedia3:BMW_M62
dbpedia3:BMW_N62
dbpedia3:Chrysler_Flathead_engine
dbpedia3:Duramax_V8_engine
dbpedia3:Ford_385_engine
dbpedia3:Ford_MEL_engine
dbpedia3:Ford_Power_Stroke_engine
dbpedia3:Ford_Y-block_engine
dbpedia3:Ford_Yamaha_V8_engine
dbpedia3:GM_Premium_V_engine
dbpedia3:Lincoln_Y-block_V8_engine
dbpedia3:Mercedes-Benz_M113_engine
dbpedia3:Nissan_VH_engine
dbpedia3:Nissan_VK_engine
dbpedia3:BMW_N63
dbpedia3:Toyota_UR_engine
dbpedia3:Toyota_UZ_engine
{noformat}As can be seen, the query returns many engines from different manufacturers. The RDF Priming module can be used to reduce the number of results returned by this query by targeting the query to specific parts of the global RDF graph, i.e. the parts of the graph that have been activated.
The following text shows an example of setting up and configuring the RDF Priming module for the purpose of making the example query return a smaller set of more specific results. It is assumed that a SPARQL endpoint is available that is connected to a running repository instance.
Enable the RDF Priming module:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b1 onto:enableSpreading "true" . }
{noformat}Change the default decay factor:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:decayFactor onto:setParam "0.55" . }
{noformat}Change the firing threshold parameter:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:firingThreshold onto:setParam "0.25" . }
{noformat}Change the filter threshold:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:filterThreshold onto:setParam "0.60" . }
{noformat}The initial Activation Level is changed to reflect the specifics of the data set:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:initialActivation onto:setParam "0.66" . }
{noformat}Adjust the Weight factors for a specific predicate so that it activates the relevant sub-set of the RDF graph, in this case the {{rdfs:subClassOf}} predicate:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK { rdfs:subClassOf onto:assignWeight "0.5" . }
{noformat}The next step alters the Weight Factor of the {{rdf:type}} predicate so that it does not propagate activations to the classes from the activated instances. This is a useful technique when there are a lot of instances and a very large classification taxonomy which should not be broadly activated (as is the case with the DBpedia dataset).
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK { rdf:type onto:assignWeight "0.1" . }
{noformat}If the example query is executed at this stage, it will return no results, because the RDF graph has no activated nodes at all. Therefore the next step is to activate two particular nodes, the Ford Motor Company {{dbpedia3:Ford_Motor_Company}} and one of the cars they build {{dbpedia3:1955_Ford}}, which came out of the factory with a very nice V8 engine:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX dbpedia3: <http://dbpedia.org/resource/>
ASK { dbpedia3:1955_Ford onto:activateNode dbpedia3:Ford_Motor_Company }
{noformat}Finally, tell the RDF Priming module to spread the activations from these two nodes:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b0 onto:spreadActivation _:b1 . }
{noformat}This will normally take 8-10 seconds after which the example query can be re-evaluated with the following results:
{noformat}?x
------------------------------------
dbpedia3:Jaguar_AJ-V8_engine
dbpedia3:BMW_M62
dbpedia3:Ford_385_engine
dbpedia3:Ford_MEL_engine
dbpedia3:Ford_Y-block_engine
{noformat}As can be seen, the result set is smaller and most of the engines retrieved are made by Ford. However, there is an engine made by Jaguar, which is most probably there because Ford owned Jaguar for some time in the past, so the two manufacturers are related to each other in the graph. A similar indirect link may explain the BMW engine as well, since Ford acquired Land Rover from BMW. Of course, these remarks are a free interpretation of the results.
Finally, disable the RDF Priming module:
{noformat}PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b1 onto:enableSpreading "false" . }
{noformat}to return to the normal operating mode.
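
When the same setup has to be repeated, for example after a restart, the control queries can be generated programmatically. The sketch below only builds the query strings used in this example; how they are sent to the repository (Sesame console, SPARQL endpoint or a repository connection) is left out. The class and method names are hypothetical, and the {{prim}} prefix stands in for the {{onto}} prefix used above.

```java
import java.util.ArrayList;
import java.util.List;

final class PrimingQueries {
    static final String PREFIX = "PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>\n";

    /** ASK query that sets a named RDF Priming parameter. */
    static String setParam(String name, String value) {
        return PREFIX + "ASK { prim:" + name + " prim:setParam \"" + value + "\" . }";
    }

    static String enableSpreading(boolean enabled) {
        return PREFIX + "ASK { _:b1 prim:enableSpreading \"" + enabled + "\" . }";
    }

    static String assignWeight(String predicateIri, String weight) {
        return PREFIX + "ASK { <" + predicateIri + "> prim:assignWeight \"" + weight + "\" . }";
    }

    static String activateNodes(String firstIri, String secondIri) {
        return PREFIX + "ASK { <" + firstIri + "> prim:activateNode <" + secondIri + "> }";
    }

    static String spreadActivation() {
        return PREFIX + "ASK { _:b0 prim:spreadActivation _:b1 . }";
    }

    /** The queries of the DBpedia example, in the order in which they are evaluated. */
    static List<String> exampleSetup() {
        List<String> queries = new ArrayList<>();
        queries.add(enableSpreading(true));
        queries.add(setParam("decayFactor", "0.55"));
        queries.add(setParam("firingThreshold", "0.25"));
        queries.add(setParam("filterThreshold", "0.60"));
        queries.add(setParam("initialActivation", "0.66"));
        queries.add(assignWeight("http://www.w3.org/2000/01/rdf-schema#subClassOf", "0.5"));
        queries.add(assignWeight("http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "0.1"));
        queries.add(activateNodes("http://dbpedia.org/resource/1955_Ford",
                "http://dbpedia.org/resource/Ford_Motor_Company"));
        queries.add(spreadActivation());
        return queries;
    }
}
```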

h1. Local Notifications

Notifications are a publish/subscribe mechanism for registering and receiving events from an OWLIM-SE repository whenever triples matching a certain graph pattern are inserted or removed. The Sesame API provides such a mechanism, where a RepositoryConnectionListener can be notified of changes to a NotifyingRepositoryConnection. However, the OWLIM-SE notifications API works at a lower level and uses the internal raw entity IDs for subject, predicate and object instead of Java objects. The benefit of this is that much higher performance is possible. The downside is that the client must do a separate lookup to get the actual entity values and, because of this, the notification mechanism only works when the client is running inside the same JVM as the repository instance. See the next section for the remote notification mechanism.
The user of the notifications API registers for notifications by providing a SPARQL query. The SPARQL query is interpreted as a plain graph pattern by ignoring all the more complicated SPARQL constructs like FILTER, OPTIONAL, DISTINCT, LIMIT, ORDER BY, etc. Therefore the SPARQL query is interpreted as a complex graph pattern involving triple patterns combined by means of joins and unions at any level. The order of the triple patterns is not significant.
Here is an example of how to register for notifications based on a given SPARQL query:
{code}AbstractRepository rep =
    ((OwlimSchemaRepository) owlimSail).getRepository();
EntityPool ent = ((OwlimSchemaRepository) owlimSail).getEntities();
String query = "SELECT * WHERE { ?s rdf:type ?o }";
SPARQLQueryListener listener =
    new SPARQLQueryListener(query, rep, ent) {
        // called for each incoming statement that contributes to a solution of the query;
        // the arguments are internal entity IDs, not Java value objects
        public void notifyMatch(int subj, int pred, int obj, int context) {
            System.out.println("Notification on subject: " + subj);
        }
    };
rep.addListener(listener); // start receiving notifications
...
rep.removeListener(listener); // stop receiving notifications
{code}In the example code, the caller would be asynchronously notified about incoming statements matching the pattern {{?s rdf:type ?o}}. In general, notifications will be sent for all incoming triples that contribute to a solution of the query. The integer parameters in the {{notifyMatch}} method can be mapped to values using the {{EntityPool}} object. Furthermore, any statements inferred from newly inserted statements will also be subject to handling by the notification mechanism, i.e. new implicit statements will also be notified to clients when the requested triple pattern matches.
The subscriber should not rely on any particular order or distinctness of the statement notifications. Duplicate statements might be delivered in response to a graph pattern subscription, and the delivery order is not even bound to the chronological order in which the statements were inserted into the underlying triple store.
The purpose of the notification services is to enable the efficient and timely discovery of newly added RDF data. Therefore it should be treated as a mechanism for giving the client a hint that certain new data is available and not as an asynchronous SPARQL evaluation engine.
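
Since duplicates may be delivered, a client that needs each statement only once has to filter them itself. The following is a minimal sketch of such a filter, keyed on the four integer IDs passed to {{notifyMatch}}; the class is hypothetical and not part of the OWLIM-SE API. It is synchronised because notifications arrive asynchronously.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NotificationDeduplicator {
    // Each entry is the (subject, predicate, object, context) ID tuple of a seen statement.
    private final Set<List<Integer>> seen = new HashSet<>();

    /** Returns true only the first time a given tuple of internal IDs is observed. */
    synchronized boolean firstSeen(int subj, int pred, int obj, int context) {
        return seen.add(Arrays.asList(subj, pred, obj, context));
    }

    /** Forgets all recorded tuples, e.g. to bound memory use between batches. */
    synchronized void reset() {
        seen.clear();
    }
}
```

A listener would call {{firstSeen(subj, pred, obj, context)}} at the start of {{notifyMatch}} and return immediately when the result is false. The set grows with every distinct statement, so a long-running client should call {{reset()}} periodically or use a bounded cache instead.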

h1. Remote notifications

OWLIM's remote notification mechanism provides filtered statement add/remove and transaction begin/end notifications for a local or a remote OWLIM-SE repository. Subscribers for this mechanism use patterns of subject, predicate and object (with wildcards) to filter the statement notifications. JMX is used internally as a transport mechanism.

h2. Using remote notifications

Registering and deregistering for notifications is achieved through the {{NotifyingOwlimConnection}} class, which wraps a {{RepositoryConnection}} object connected to an OWLIM repository and provides an API to add/remove notification listeners of type {{RepositoryNotificationsListener}}. Here is a simple example of the API usage:
{code}RepositoryConnection conn = null;
// initialize repository connection to OWLIM ...

RepositoryNotificationsListener listener = new RepositoryNotificationsListener() {
@Override
public void addStatement(Resource subject, URI predicate,
Value object, Resource context, boolean isExplicit, long tid) {
System.out.println("Added: " + subject + " " + predicate + " " + object);
}
@Override
public void removeStatement(Resource subject, URI predicate,
Value object, Resource context, boolean isExplicit, long tid) {
System.out.println("Removed: " + subject + " " + predicate + " " + object);
}
@Override
public void transactionStarted(long tid) {
System.out.println("Started transaction " + tid);
}
@Override
public void transactionComplete(long tid) {
System.out.println("Finished transaction " + tid);
}
};

NotifyingOwlimConnection nConn = new NotifyingOwlimConnection(conn);
URIImpl ex = new URIImpl("http://example.com/");

// subscribe for statements with 'ex' as subject
nConn.subscribe(listener, ex, null, null);

// note that this could be any other connection to the same repository
conn.add(ex, ex, ex);
conn.commit();
// statement added should have been printed out

// stop listening for this pattern
nConn.unsubscribe(listener);
{code}

{info}
Note that the transactionStarted() and transactionComplete() events are not bound to any statement, they are dispatched to all subscribers, no matter what they are subscribed for. This means that pairs of start/complete events can be detected by the client without receiving any statement notifications in between.
{info}

The above example will work when the OWLIM repository is initialized in the same JVM that runs the example (local repository). If a remote repository is used (e.g. HTTPRepository) the notifying repository connection should be initialized differently:
{code}NotifyingOwlimConnection nConn =
new NotifyingOwlimConnection(conn, host, port);
{code}where {{host}} (String) and {{port}} (int) are the host name of the remote machine where the repository resides and the port number of the JMX service in the repository JVM. The other part of the example remains valid for the remote case. The repository connection used to initialize a {{NotifyingOwlimConnection}} instance could be a {{ReplicationClusterConnection}} in which case notifications will work with an OWLIM-Enterprise master node (transparently to the user) - no changes on the client side are required.
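
The wildcard filtering used by subscriptions can be pictured with the following stand-alone sketch. It assumes, as the {{subscribe(listener, ex, null, null)}} call in the example suggests, that {{null}} acts as a wildcard; plain strings stand in for the Sesame {{Resource}}, {{URI}} and {{Value}} types, and the class itself is hypothetical rather than part of the OWLIM API.

```java
final class StatementPattern {
    // A null component matches any value in that position.
    private final String subject;
    private final String predicate;
    private final String object;

    StatementPattern(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    /** Returns true when every non-null component equals the corresponding statement part. */
    boolean matches(String s, String p, String o) {
        return (subject == null || subject.equals(s))
                && (predicate == null || predicate.equals(p))
                && (object == null || object.equals(o));
    }
}
```

A pattern with all three components set to {{null}} matches every statement, while a fully specified pattern matches exactly one.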

h2. Remote Notification Configuration

For remote notifications, where the subscriber and the repository are running in different JVM instances (possibly on different hosts), a JMX remote service should be configured in the repository JVM. This is done by adding the following parameters to the JVM command line:
{noformat}-Dcom.sun.management.jmxremote.port=1717
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
{noformat}If the repository is running inside a servlet container, then these parameters must be passed to the JVM that runs the container and OWLIM. For Tomcat, this can be done using the {{JAVA_OPTS}} or {{CATALINA_OPTS}} environment variable.
The port number used should be exactly the port number that is passed to the NotifyingOwlimConnection constructor (as in the example above). One should make sure that the specified port (e.g. 1717) is accessible remotely, i.e. no firewalls or NAT redirection prevent access to it.
In an OWLIM-Enterprise cluster setup, all the worker nodes should have their JMX configured properly in order to enable notifications for the whole cluster. The master node assumes that each worker is exposing its JMX service on port 1717 but this can be overridden when nodes are added to the cluster (the third parameter to addClusterNode() operation is the JMX service port of that node) or by editing the {{cluster.properties}} configuration file and adding the following parameter:
{noformat}jmxport<N> = <PORTN>
{noformat}where N is the consecutive number of the node we want to configure and PORTN is the port number of that node's JMX service. Cluster workers should also have their {{com.sun.management.jmxremote.\*}} JVM parameters properly configured. OWLIM-Enterprise cluster master nodes will therefore be controlled and emit notifications using the same JMX port number.

h1. Query modifiers/extensions


h2. Managing Explicit and Implicit Statements

In order to control whether only explicit or only implicit statements are considered during SPARQL query evaluation, some special context identifiers can be used with the {{FROM}} and {{FROM NAMED}} SPARQL constructs. The following table gives details:
\\
|| Clause || Behaviour ||
| {{FROM <[http://www.ontotext.com/explicit]>}} | The default graph (used in triple patterns with such scope) will include only explicit statements (with or without a context) |
| {{FROM <[http://www.ontotext.com/implicit]>}} | The default graph (used in triple patterns with such scope) will include only inferred statements |
| {{FROM NAMED <[http://www.ontotext.com/explicit]>}} | The named graph (used in triple patterns with such scope, e.g. \{ GRAPH ?g \{ ?s ?p ?o \} \}) will include only explicit statements (with or without a context) |
| {{FROM NAMED <[http://www.ontotext.com/implicit]>}} | The named graph (used in triple patterns with such scope, e.g. \{ GRAPH ?g \{ ?s ?p ?o \} \}) will include only inferred statements |

Effectively, statements behave as though they have a context of {{[http://www.ontotext.com/implicit]}} or {{[http://www.ontotext.com/explicit]}} independent of whether they have an actual context or not. Various combinations of {{FROM}} and {{FROM NAMED}} are allowed in alignment with SPARQL semantics.

h2. Accessing internal identifiers for entities

Internally, OWLIM uses integer identifiers (IDs) to index all entities (URIs, blank nodes and literals). Statement indices are made up of these IDs and a large data structure is used to map from ID to entity value and back. There are occasions, e.g. when interfacing to application infrastructure, when having access to these internal IDs can improve the efficiency of data structures external to OWLIM by allowing them to be indexed by an integer value rather than a full URI.

This section introduces a special OWLIM predicate and function that provide access to these internal IDs. The datatype of internal IDs is <[http://www.w3.org/2001/XMLSchema#long]>.


|| Predicate | <[http://www.ontotext.com/owlim/entity#id]> ||
|| Description | Map between entity and internal ID ||
|| Example | Select all entities and their IDs: \\
\\
PREFIX ent: <[http://www.ontotext.com/owlim/entity#]> \\
SELECT * WHERE \{ \\
?s ent:id ?id \\
\} ORDER BY ?id ||

|| Function | <[http://www.ontotext.com/owlim/entity#id]> ||
|| Description | Return an entity's internal ID ||
|| Example | Select all statements and order them by the internal ID of the object values: \\
\\
PREFIX ent: <[http://www.ontotext.com/owlim/entity#]> \\
SELECT * WHERE \{ \\
?s ?p ?o . \\
\} order by ent:id(?o) ||

h3. Examples

* Enumerate all the entities and bind the nodes to ?s and their IDs to ?id, order by ?id:
{noformat}
SELECT * WHERE {
?s <http://www.ontotext.com/owlim/entity#id> ?id
} ORDER BY ?id
{noformat}

* Enumerate all non-literals and bind the nodes to ?s and their IDs to ?id, order by ?id:
{noformat}
SELECT * WHERE {
?s <http://www.ontotext.com/owlim/entity#id> ?id .
FILTER (!isLiteral(?s)) .
} ORDER BY ?id
{noformat}

* Find the internal IDs of subjects of statements with specific predicate and object values:
{noformat}
SELECT * WHERE {
?s <http://test.org#Pred1> "A literal".
?s <http://www.ontotext.com/owlim/entity#id> ?id .
} ORDER BY ?id
{noformat}

* Find all statements where the object has the given internal ID, using an explicit, untyped literal as the ID (the "115" used as the object in the second statement pattern):
{noformat}
SELECT * WHERE {
?s ?p ?o.
?o <http://www.ontotext.com/owlim/entity#id> "115" .
}
{noformat}

* As above, but using an xsd:long datatype for the constant within a FILTER condition:
{noformat}
SELECT * WHERE {
?s ?p ?o.
?o <http://www.ontotext.com/owlim/entity#id> ?id .
FILTER (?id="115"^^<http://www.w3.org/2001/XMLSchema#long>) .
} ORDER BY ?o
{noformat}

* Find the internal IDs of subject and object entities for all statements:
{noformat}
SELECT * WHERE {
?s ?p ?o.
?s <http://www.ontotext.com/owlim/entity#id> ?ids.
?o <http://www.ontotext.com/owlim/entity#id> ?ido.
}
{noformat}

* Retrieve all statements where the ID of the subject is equal to "115"^^xsd:long, by providing an internal ID value within a filter expression:
{noformat}
SELECT * WHERE {
?s ?p ?o.
FILTER ((<http://www.ontotext.com/owlim/entity#id>(?s)) = "115"^^<http://www.w3.org/2001/XMLSchema#long>).
}
{noformat}

* Retrieve all statements where the string-ised ID of the subject is equal to "115", by providing an internal ID value within a filter expression:
{noformat}
SELECT * WHERE {
?s ?p ?o.
FILTER (str( <http://www.ontotext.com/owlim/entity#id>(?s) ) = "115").
}
{noformat}


h2. Other special query behaviour

OWLIM-SE recognises several more special graph URIs that can be used to control query evaluation.
\\
|| Clause || Behaviour ||
| {{FROM/FROM NAMED <[http://www.ontotext.com/disable-sameAs]>}} | Switches off the enumeration of the equivalence classes produced by {{owl:sameAs}} during triple pattern matching. Enumeration is the default behaviour; when this graph is specified, the solutions that arise only from enumerating equivalent nodes are excluded. The purpose is to reduce the results to those that are valid for a single representative of each class (this is a rough description and not fully explanatory). For example, suppose that the triple {{test:Inst rdf:type test:SomeClass}} matches a pattern and that {{test:Inst}} is {{owl:sameAs}} {{test:Inst2}}. By default, two triples match the pattern, one for {{test:Inst}} and another for {{test:Inst2}}. Using the above system graph in {{FROM/FROM NAMED}} clauses excludes such redundancies. Be aware that if the query filters on the textual representation of a node, this modifier may skip some valid solutions, because not all the nodes within an equivalence class are matched against such a {{FILTER}}. |
| {{FROM/FROM NAMED <[http://www.ontotext.com/count]>}} | Causes the query to return a single result in which every variable binding in the projection is replaced with a plain literal holding the total number of solutions of the query, i.e. the equivalent of COUNT\(*) in SQL. In the case of a CONSTRUCT query whose projection contains three variables (?subject, ?predicate, ?object), the subject and the predicate are bound to {{<[http://www.ontotext.com/]>}} and the object holds the literal value, because a statement cannot have a literal in the subject or predicate position. |
| {{FROM/FROM NAMED <[http://www.ontotext.com/skip-redundant-implicit]>}} | Excludes an implicit statement when an explicit one exists within a specific context (even the default one). Initially implemented to filter out the redundant rows that arise when the context part is not taken into account, which leads to 'duplicate' results. |
| {{FROM <[http://www.ontotext.com/distinct]>}} | Using this special graph name in {{DESCRIBE}} and {{CONSTRUCT}} queries will cause only distinct triples to be returned. This is useful when several resources are being described, where the same triple can be returned more than once, i.e. when describing its subject and its object. |
| {{FROM <[http://www.ontotext.com/owlim/cluster/control-query]>}} | Identifies the query to the OWLIM-Enterprise cluster master node as needing to be routed to all worker nodes. |
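
The two sketches below illustrate how the special graphs in the table above might be used. Both assume that the special graph URI can appear on its own in a {{FROM}} clause. The {{test:}} namespace ({{http://test.org#}}) and the class {{test:SomeClass}} are hypothetical and serve only as illustration.

Counting the solutions of a query; according to the table, the single result should bind {{?s}} to a plain literal holding the total number of solutions:
{noformat}
# Return the number of solutions instead of the solutions themselves
SELECT ?s
FROM <http://www.ontotext.com/count>
WHERE {
?s ?p ?o .
}
{noformat}

Listing the instances of a class without enumerating {{owl:sameAs}} equivalents, so that each equivalence class should contribute only a single representative:
{noformat}
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Hypothetical namespace, used for illustration only
PREFIX test: <http://test.org#>
SELECT ?inst
FROM <http://www.ontotext.com/disable-sameAs>
WHERE {
?inst rdf:type test:SomeClass .
}
{noformat}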