OWLIM-SE Experimental Features

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

RDF Priming

RDF Priming is a technique that selects a subset of available statements for use as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. RDF Priming is a scalable and customisable implementation of the popular connectionist method on top of RDF graphs that allows for the "priming" of large datasets with respect to concepts relevant to the context and to the query. It is controlled using SPARQL ASK queries. This section provides an overview of the mechanism and explains the necessary SPARQL queries used to manage and set up RDF Priming.

RDF Priming Configuration

To enable RDF Priming over the repository, the repository-type configuration parameter should be set to weighted-file-repository.
The current implementation of RDF Priming does not store activation values, which means that they are only available at runtime and are lost when the repository is shut down. However, they can be exported and imported using the special query directives shown below. Another side effect is that the activation values are global, because they are stored within the shared Entity pool.
The initialization and management of the RDF Priming module is achieved by performing SPARQL ASK queries.

Controlling RDF Priming

RDF Priming is controlled using SPARQL ASK queries, which allows all the parameters and default values to be set. These queries use special system predicates, which are described below:

Function Enable Activation Spreading
Predicate http://www.ontotext.com/owlim/RDFPriming#enableSpreading
Description Used to enable or disable the RDF Priming module. The Object value of the statement pattern should be a Literal whose value is either "true" or "false"
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK {_:b1 prim:enableSpreading "true".}


Function Set Activation Decay
Predicate http://www.ontotext.com/owlim/RDFPriming#decayActivations
Description Used to alter all the activation values for the nodes in the RDF graph by multiplying them by a factor specified as a Literal in the Object position of the Statement pattern of the query. The following example will reset all the activation values to zero by multiplying them by "0.0"
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK {_:b1 prim:decayActivations "0.0".}


Function Trigger Activation Spreading Cycle
Predicate http://www.ontotext.com/owlim/RDFPriming#spreadActivation
Description Used to trigger an Activation spreading cycle that starts from the nodes that were scheduled for activation for this round. No special values are required for the Subject or Object part of the statement pattern – blank nodes suffice
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK {_:b1 prim:spreadActivation _:b2.}


Function Set Statement Weight
Predicate http://www.ontotext.com/owlim/RDFPriming#assignWeight
Description Used to set a non-default weight factor for statements with a specific predicate. The Subject of the Statement pattern is the predicate to which the new value should be set. The Object of the pattern is the new weight value as a Literal. The example query sets 0.5 as a weight factor to all the rdfs:subClassOf statements
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK { rdfs:subClassOf prim:assignWeight "0.5" . }


Function Schedule Nodes for Activation
Predicate http://www.ontotext.com/owlim/RDFPriming#activateNode
Description Used to schedule the nodes specified as Subject or Object of the statement pattern for activation. Scheduling for activation can also be performed by evaluating an ASK query with variables in the body, in which case the nodes bound to the variables used in the query will be scheduled for activation. The behaviour of such an ASK query is altered, so that all the solutions are exhausted before returning the query result. This could take a long time, since LIMIT and OFFSET are not available in this case. The first example activates two nodes gossip:hasTrack and prel:hasChild and the second example activates many nodes identifying people (and their names) that have an album called "American Life".
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX gossip: <http://www.ontotext.com/rascalli/2008/04/gossipdb.owl#>
PREFIX prel: <http://proton.semanticweb.org/2007/10/proton_rel#>
ASK { gossip:hasTrack prim:activateNode prel:hasChild }

PREFIX gossip: <http://www.ontotext.com/rascalli/2008/04/gossipdb.owl#>
PREFIX onto: <http://www.ontotext.com#>
ASK {
?person gossip:hasAlbum ?album .
?album gossip:name "American Life" .
?person gossip:name ?name }


The following URIs are used in conjunction with the <http://www.ontotext.com/owlim/RDFPriming#setParam> predicate to change the parameters of the RDF Priming module. In general, the names of the parameters are Subjects of the statement pattern and the new values are passed as its Object.
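
As a minimal sketch of this pattern, the Python helper below assembles a parameter-setting ASK query as a string. The name set_param_query is hypothetical and not part of OWLIM; it only illustrates the shape used by the examples in this section (parameter name as Subject, prim:setParam as predicate, new value as a literal Object) and does not send anything to a repository.

```python
PRIM_NS = "http://www.ontotext.com/owlim/RDFPriming#"


def set_param_query(param: str, value) -> str:
    """Build an ASK query that sets the RDF Priming parameter `param`.

    Hypothetical helper for illustration: the parameter name is the Subject,
    prim:setParam is the predicate and the value is a plain literal Object.
    """
    return (
        f"PREFIX prim: <{PRIM_NS}>\n"
        f'ASK {{ prim:{param} prim:setParam "{value}" . }}'
    )


if __name__ == "__main__":
    # Prints a query of the same form as the decayFactor example in this section.
    print(set_param_query("decayFactor", 0.55))
```

The resulting string can be submitted through any SPARQL client connected to the repository.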

Parameter Activation Threshold
Predicate http://www.ontotext.com/owlim/RDFPriming#activationThreshold
Description During activation spreading, activations are accumulated in nodes and can grow indefinitely. The activationThreshold allows the user to trim those values to a certain threshold. The default value of this parameter is 1.0, which means that all values bigger than 1.0 are set to 1.0 on every iteration. This parameter is applied on every iteration of the process and guarantees that no activations larger than the parameter value will be encountered.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:activationThreshold prim:setParam "0.9" . }


Parameter Decay Factor
Predicate http://www.ontotext.com/owlim/RDFPriming#decayFactor
Description Used during spreading activation to control how much of a node's activation level is transferred to the nodes that it affects. The following example query sets the new decayFactor to "0.55"
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:decayFactor prim:setParam "0.55" . }


Parameter Default Activation Value
Predicate http://www.ontotext.com/owlim/RDFPriming#defaultActivation
Description Sets the default activation value for all nodes in the repository. If the default activation is not preset then the default activation for all repository nodes is 0. This does not affect the activation origin nodes, whose activation values are set by using http://www.ontotext.com/owlim/RDFPriming#initialActivation
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:defaultActivation prim:setParam "0.4" . }


Parameter Default Weight
Predicate http://www.ontotext.com/owlim/RDFPriming#defaultWeight
Description Edges in the RDF graph can be given weights that are multiplied by the source node activation in order to compute the activation that is spread across the edge to the destination node (see assignWeight). If the predicate of the edge is not given any specific weight (via assignWeight) then the edge weight is assumed to be 1/3 (one third). This default weight can be changed by using the defaultWeight parameter. Any floating point value in the range [0,1] can be used.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:defaultWeight prim:setParam "0.2" . }
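
To make the interaction of weights and thresholds more concrete, here is a toy Python sketch of a single spreading cycle. The exact update rule used by OWLIM is not given in this document, so the rule below is an assumption for illustration only: a node fires when its activation is above the firing threshold, each outgoing edge passes on (source activation × edge weight × decay factor), and accumulated values are trimmed to the activation threshold. The function name spread_once and its signature are hypothetical.

```python
from collections import defaultdict


def spread_once(edges, activation, weights=None, default_weight=1 / 3,
                decay_factor=0.55, firing_threshold=0.25,
                activation_threshold=1.0):
    """Run one simplified spreading cycle and return a new activation map.

    edges: iterable of (subject, predicate, object) triples.
    activation: dict mapping node -> activation before the cycle.
    weights: optional dict mapping predicate -> weight (cf. assignWeight).
    The update rule is an assumption, not the documented OWLIM algorithm.
    """
    weights = weights or {}
    new = defaultdict(float, activation)
    for s, p, o in edges:
        source = activation.get(s, 0.0)
        if source <= firing_threshold:
            continue  # only nodes above the firing threshold spread activation
        w = weights.get(p, default_weight)  # fall back to the default weight
        new[o] += source * w * decay_factor
    # Trim accumulated values so none exceeds the activation threshold.
    return {node: min(a, activation_threshold) for node, a in new.items()}


if __name__ == "__main__":
    edges = [("a", "rdfs:subClassOf", "b"), ("a", "ex:related", "c")]
    print(spread_once(edges, {"a": 0.66}, weights={"rdfs:subClassOf": 0.5}))
```

In this sketch, the edge with an assigned weight of 0.5 passes on more activation than the edge that falls back to the default weight of one third.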


Function Export Activation Values
Predicate http://www.ontotext.com/owlim/RDFPriming#exportActivations
Description Is used to export activation values for a set of nodes. The values are stored in a file identified by the URL given as the Object of the statement pattern. The format of the data in the file is simply one line per URI followed by a tab character and the floating-point value of its activation value.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:exportActivations prim:setParam "file:///D/work/my_activations.txt" . }


Parameter Filter Threshold
Predicate http://www.ontotext.com/owlim/RDFPriming#filterThreshold
Description Sets the new filter threshold value used to decide when a statement is visible depending on the activation level of its subject, predicate and object.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:filterThreshold prim:setParam "0.50" . }


Parameter Firing Threshold
Predicate http://www.ontotext.com/owlim/RDFPriming#firingThreshold
Description Sets the threshold above which a node will activate its neighbours
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:firingThreshold prim:setParam "0.25" . }


Function Import Activation Values
Predicate http://www.ontotext.com/owlim/RDFPriming#importActivations
Description Is used to import activation values for a set of nodes. The values are loaded from a file identified by the URL given as the Object of the statement pattern. The format of the data in the file is simply one line per URI followed by a tab character and the floating-point value of its activation value.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:importActivations prim:setParam "file:///D/work/my_activations.txt" . }
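
The file format described for export and import (one URI per line, a tab character, then the floating-point activation value) can be read and written with a few lines of Python. This is a sketch under that description only; details such as whether URIs are wrapped in angle brackets or how blank lines are handled are not stated here, and the function names are hypothetical.

```python
def parse_activations(text: str) -> dict:
    """Parse 'URI<TAB>activation' lines into a dict mapping URI -> float."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no data and are skipped
        uri, value = line.rsplit("\t", 1)  # split on the last tab in the line
        result[uri] = float(value)
    return result


def format_activations(activations: dict) -> str:
    """Serialise a URI -> activation mapping, one 'URI<TAB>value' per line."""
    return "".join(f"{uri}\t{value}\n" for uri, value in activations.items())


if __name__ == "__main__":
    sample = (
        "http://dbpedia.org/resource/Ford_Motor_Company\t0.66\n"
        "http://dbpedia.org/resource/1955_Ford\t0.4\n"
    )
    print(parse_activations(sample))
```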


Parameter Initial Activation Value
Predicate http://www.ontotext.com/owlim/RDFPriming#initialActivation
Description Sets the initial activation value for each of the nodes from which the activation process starts. The nodes that are scheduled for activation will receive that amount at the beginning of the spreading activation process.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:initialActivation prim:setParam "0.66" . }


Parameter Maximum Nodes Fired Per Cycle
Predicate http://www.ontotext.com/owlim/RDFPriming#maxNodesFiredPerCycle
Description Sets the number of nodes that should fire activations during one spreading activation cycle. The default value is 100000.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:maxNodesFiredPerCycle prim:setParam "10000" . }


Parameter Number of Cycles
Predicate http://www.ontotext.com/owlim/RDFPriming#cycles
Description Sets the number of activation spreading cycles to perform when the process is initiated.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:cycles prim:setParam "4" . }


Parameter Number of Worker Threads
Predicate http://www.ontotext.com/owlim/RDFPriming#workerThreads
Description Sets the number of worker threads that will perform the spreading activation (the default is 2).
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:workerThreads prim:setParam "4" . }


RDF Priming Example

The following example uses data from DBpedia (http://dbpedia.org/About) that was imported into OWLIM-SE with the RDF Priming mode enabled. The management queries are evaluated through the Sesame console application for convenience. The initial step is to evaluate a demo query that retrieves all the instances of the dbpedia:V8 concept:

SELECT *
WHERE {?x <http://dbpedia.org/property/class> <http://dbpedia.org/resource/V8>. }

The above query returns the following results:

?x
------------------------------------
dbpedia3:Jaguar_AJ-V8_engine
dbpedia3:BMW_M62
dbpedia3:BMW_N62
dbpedia3:Chrysler_Flathead_engine
dbpedia3:Duramax_V8_engine
dbpedia3:Ford_385_engine
dbpedia3:Ford_MEL_engine
dbpedia3:Ford_Power_Stroke_engine
dbpedia3:Ford_Y-block_engine
dbpedia3:Ford_Yamaha_V8_engine
dbpedia3:GM_Premium_V_engine
dbpedia3:Lincoln_Y-block_V8_engine
dbpedia3:Mercedes-Benz_M113_engine
dbpedia3:Nissan_VH_engine
dbpedia3:Nissan_VK_engine
dbpedia3:BMW_N63
dbpedia3:Toyota_UR_engine
dbpedia3:Toyota_UZ_engine

As can be seen, the query returns many engines from different manufacturers. The RDF Priming module can be used to reduce the number of results returned by this query by targeting the query to specific parts of the global RDF graph, i.e. the parts of the graph that have been activated.
The following text shows an example of setting up and configuring the RDF Priming module for the purpose of making the example query return a smaller set of more specific results. It is assumed that a SPARQL endpoint is available that is connected to a running repository instance.
Enable the RDF Priming module:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b1 onto:enableSpreading "true" . }

Change the default decay factor:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:decayFactor onto:setParam "0.55" . }

Change the firing threshold parameter:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:firingThreshold onto:setParam "0.25" . }

Change the filter threshold:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:filterThreshold onto:setParam "0.60" . }

The initial Activation Level is changed to reflect the specifics of the data set:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:initialActivation onto:setParam "0.66" . }

Adjust the Weight factors for a specific predicate so that it activates the relevant sub-set of the RDF graph, in this case the rdfs:subClassOf predicate:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK { rdfs:subClassOf onto:assignWeight "0.5" . }

The next step alters the Weight Factor of the rdf:type predicate so that it does not propagate activations to the classes from the activated instances. This is a useful technique when there are a lot of instances and a very large classification taxonomy which should not be broadly activated (as is the case with the DBpedia dataset).

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK { rdf:type onto:assignWeight "0.1" . }

If the example query is executed at this stage, it will return no results, because the RDF graph has no activated nodes at all. Therefore the next step is to activate two particular nodes, the Ford Motor Company dbpedia3:Ford_Motor_Company and one of the cars they build dbpedia3:1955_Ford, which came out of the factory with a very nice V8 engine:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX dbpedia3: <http://dbpedia.org/resource/>
ASK { dbpedia3:1955_Ford onto:activateNode dbpedia3:Ford_Motor_Company }

Finally, tell the RDF Priming module to spread the activations from these two nodes:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b0 onto:spreadActivation _:b1 . }

This will normally take 8-10 seconds after which the example query can be re-evaluated with the following results:

?x
------------------------------------
dbpedia3:Jaguar_AJ-V8_engine
dbpedia3:BMW_M62
dbpedia3:Ford_385_engine
dbpedia3:Ford_MEL_engine
dbpedia3:Ford_Y-block_engine

As can be seen, the result set is smaller and most of the engines retrieved are made by Ford. However, there is an engine made by Jaguar which is most probably there because Ford owned Jaguar for some time in the past, so both manufacturers are somehow related to each other. This might also be the case for the other non-Ford engines returned, since BMW also owned Jaguar for some time. Of course, these remarks are a free interpretation of the results.
Finally, disable the RDF Priming module:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b1 onto:enableSpreading "false" . }

to return to the normal operating mode.
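
For scripted use, the management queries from this walkthrough can be collected in order. The Python sketch below does this and shows how a standard SPARQL Protocol GET URL could be formed for each one. Nothing is sent over the network; the endpoint URL is a placeholder assumption and must be replaced with the address of a real repository, and the names SETUP_QUERIES, DISABLE_QUERY and request_url are hypothetical.

```python
from urllib.parse import urlencode

ONTO = "PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>\n"
RDFS = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
RDF = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
DBP = "PREFIX dbpedia3: <http://dbpedia.org/resource/>\n"

# The walkthrough's management queries, in the order they are evaluated.
SETUP_QUERIES = [
    ONTO + 'ASK { _:b1 onto:enableSpreading "true" . }',
    ONTO + 'ASK { onto:decayFactor onto:setParam "0.55" . }',
    ONTO + 'ASK { onto:firingThreshold onto:setParam "0.25" . }',
    ONTO + 'ASK { onto:filterThreshold onto:setParam "0.60" . }',
    ONTO + 'ASK { onto:initialActivation onto:setParam "0.66" . }',
    ONTO + RDFS + 'ASK { rdfs:subClassOf onto:assignWeight "0.5" . }',
    ONTO + RDF + 'ASK { rdf:type onto:assignWeight "0.1" . }',
    ONTO + DBP + "ASK { dbpedia3:1955_Ford onto:activateNode dbpedia3:Ford_Motor_Company }",
    ONTO + "ASK { _:b0 onto:spreadActivation _:b1 . }",
]

# Evaluated last, to return to the normal operating mode.
DISABLE_QUERY = ONTO + 'ASK { _:b1 onto:enableSpreading "false" . }'


def request_url(endpoint: str, query: str) -> str:
    """Return a SPARQL Protocol GET URL for `query` (no request is made)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Placeholder endpoint: an assumption, not a value from this document.
    endpoint = "http://localhost:8080/openrdf-sesame/repositories/priming-demo"
    for q in SETUP_QUERIES:
        print(request_url(endpoint, q))
```

The order matters: parameters and weights are set before nodes are scheduled, and spreadActivation comes last.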

Nested Repositories

Nested repositories is a technique for sharing RDF data between multiple OWLIM-SE repositories. It is most useful when several logically independent repositories need to make use of a large (reference) dataset, e.g. a combination of one or more LOD datasets such as geonames, DBpedia, MusicBrainz, etc., but where each repository adds its own specific data. This mechanism allows the data in the common repository to be logically included, or 'nested', within other repositories that extend it. RDF data in the common repository is combined with data in each child repository for inference purposes. Changes in the common repository are reflected across all child repositories and inferences are maintained to be logically consistent.

Results for queries against a child repository are computed from the contents of the child repository as well as the nested repository. The following diagram illustrates the nested repositories concept:



When two child repositories extend the same nested repository, they remain logically separate. Only changes made to the common nested repository will affect any child repositories.

Definition: A repository that is nested into another repository (possibly into more than one other repository) is called a parent repository.

Definition: A repository that nests a parent repository is called a child repository.
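
The visibility rules described above can be modelled with plain Python sets. This is a toy model for illustration only, with hypothetical class names; it ignores inference and says nothing about how OWLIM-SE stores or indexes data.

```python
class ParentRepo:
    """Toy parent (shared) repository: just a set of statements."""

    def __init__(self):
        self.statements = set()


class ChildRepo:
    """Toy child repository that nests a parent repository."""

    def __init__(self, parent: ParentRepo):
        self.parent = parent
        self.statements = set()

    def visible(self) -> set:
        # Queries against a child see the parent's data plus the child's own.
        return self.parent.statements | self.statements


if __name__ == "__main__":
    parent = ParentRepo()
    child_a, child_b = ChildRepo(parent), ChildRepo(parent)
    parent.statements.add(("ex:Paris", "rdf:type", "ex:City"))
    child_a.statements.add(("ex:Paris", "ex:rating", "5"))
    print(len(child_a.visible()), len(child_b.visible()))
```

In this model, a statement added to one child is invisible to its sibling, while a statement added to the parent becomes visible in both.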

Inference, indexing and queries

A child repository will ignore any value for its ruleset parameter and automatically use the same rule-set as its parent repository. Child repositories compute inferences based on the union of the explicit data stored in the child and parent repositories. Changes to either parent or child will cause the set of inferred statements in the child to be updated. However, the child repository must be initialised (running) when updates to the parent repository take place. If this is not the case, the child can become logically inconsistent.

When a parent repository is updated, it will, before its transaction is committed, in turn update every connected child repository with a set of statement insert/delete operations. When a child repository is updated, any new resources are recorded in the parent's dictionary so that the same resource in sibling child repositories will be indexed using the same internal identifier.
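
The role of the shared dictionary can be pictured with a small Python sketch. It is a conceptual illustration under the assumption that identifiers are simple integers assigned on first use; the class and method names are hypothetical and do not reflect the actual OWLIM-SE data structures.

```python
class ParentDictionary:
    """Toy stand-in for the parent's resource dictionary."""

    def __init__(self):
        self._ids = {}

    def id_for(self, resource: str) -> int:
        # The first request for a resource assigns the next integer;
        # later requests from any child return the same identifier.
        return self._ids.setdefault(resource, len(self._ids) + 1)


class Child:
    """Toy child repository that resolves resources via the parent."""

    def __init__(self, dictionary: ParentDictionary):
        self.dictionary = dictionary
        self.index = set()

    def add_resource(self, resource: str) -> int:
        rid = self.dictionary.id_for(resource)
        self.index.add(rid)  # the child indexes by the shared identifier
        return rid


if __name__ == "__main__":
    shared = ParentDictionary()
    first, second = Child(shared), Child(shared)
    print(first.add_resource("http://example.org/a"),
          second.add_resource("http://example.org/a"))
```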

A current limitation of the implementation is that no updates using the owl:sameAs predicate are permitted.

Queries executed on a child repository should perform almost as well as queries executed against a repository containing all the data (from both parent and child repositories).

Configuration

Both parent and child repositories must be deployed using Tomcat and they must be deployed to the same instance on the same machine (same JVM).

Repositories that are configured to use the nesting mechanism must be created using specific Sesame SAIL types:

  • owlim:ParentSail is used for parent (shared) repositories
  • owlim:ChildSail is used for child repositories that extend parent repositories

(Where the owlim namespace above maps to http://www.ontotext.com/trree/owlim#)

Additionally, the following configuration parameters are also used:

  • owlim:id is used in the parent configuration to provide a nesting name
  • owlim:parent-id is used in child repository configurations to identify the parent repository

Once created, a child repository must not be reconfigured to use a different parent repository as this will lead to inconsistent data.

When setting up several OWLIM instances to run in the same Java Virtual Machine, i.e. the JVM used to host Tomcat, make sure that the configured memory settings take into account the other repositories, e.g. if setting up 3 OWLIM instances, configure them as though they each had only one third of the total Java heap space available.
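
The rule of thumb above amounts to a simple division. The Python sketch below spells it out, assuming an equal split between instances; the function name and the heap figure are illustrative assumptions, and a real deployment may warrant an uneven split.

```python
def heap_per_instance_mb(total_heap_mb: int, instances: int) -> int:
    """Return the heap budget (in MB) to assume for each OWLIM instance.

    Assumes the JVM heap is shared equally, as in the one-third example.
    """
    if instances < 1:
        raise ValueError("at least one instance is required")
    return total_heap_mb // instances


if __name__ == "__main__":
    # Illustrative figure only: a 6144 MB heap shared by 3 instances.
    print(heap_per_instance_mb(6144, 3))
```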

Initialisation and shutdown

The correct initialisation sequence is to start the parent repository followed by each of its children.

As long as no further updates occur, the shutdown sequence is not defined. However, it is suggested that the children be shut down first, followed by the parent.
