GraphDB-SE Experimental Features

Current version by reneta.popova on Aug 27, 2014 14:50.

h1. RDF Priming

RDF Priming is a technique that selects a subset of available statements for use as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. RDF Priming is a scalable and customisable implementation of this popular connectionist method on top of RDF graphs, which allows for the "priming" of large datasets with respect to concepts relevant to the context and to the query. It is controlled using SPARQL ASK queries. This section provides an overview of the mechanism and explains the SPARQL queries used to set up and manage RDF Priming.
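
The following Python sketch is a toy model of how spreading activation can combine the parameters described in this section: per-predicate edge weights (default 1/3), a firing threshold, an activation threshold that caps values on every iteration, and a decay factor. It is not the GraphDB-SE implementation. It assumes that activation flows only from the subject to the object of each statement, the firing threshold of 0.25 is just the illustrative value from the example query, and the function names {{spread}} and {{decay}} are hypothetical.

```python
from collections import defaultdict

DEFAULT_WEIGHT = 1.0 / 3.0    # documented default edge weight
ACTIVATION_THRESHOLD = 1.0    # documented default cap for node activations
FIRING_THRESHOLD = 0.25       # illustrative value only (no default is documented)


def spread(statements, activations, weights, iterations=1,
           default_weight=DEFAULT_WEIGHT,
           firing_threshold=FIRING_THRESHOLD,
           activation_threshold=ACTIVATION_THRESHOLD):
    """Toy spreading activation over (subject, predicate, object) triples.

    Assumption: activation flows from subject to object only.
    """
    current = defaultdict(float, activations)
    for _ in range(iterations):
        incoming = defaultdict(float)
        for s, p, o in statements:
            # Only nodes above the firing threshold activate their neighbours.
            if current[s] > firing_threshold:
                incoming[o] += current[s] * weights.get(p, default_weight)
        # Activations accumulate in the destination nodes.
        for node, delta in incoming.items():
            current[node] += delta
        # The activation threshold trims values on every iteration.
        for node in list(current):
            current[node] = min(current[node], activation_threshold)
    return dict(current)


def decay(activations, factor):
    """Multiply every activation by the same factor ("0.0" resets them all)."""
    return {node: value * factor for node, value in activations.items()}
```

With {{rdf:type}} weighted at 0.1, as in the DBpedia example on this page, a class node receives only a tenth of a firing instance's activation per iteration, so large taxonomies stay mostly inactive.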

h2. RDF Priming Configuration

To enable RDF Priming over the repository, set the {{repository-type}} configuration parameter to {{weighted-file-repository}}.
The current implementation of RDF Priming does not persist activation values, which means that they are only available at runtime and are lost when the repository is shut down. However, they can be exported and imported using the special query directives shown below. Another side effect of the implementation is that activation values are global, because they are stored within the shared Entity pool.
The initialisation and management of the RDF Priming module is achieved by performing SPARQL ASK queries.
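
Because every control operation is an ordinary SPARQL ASK query, it can be issued through any SPARQL client. The sketch below builds some of the control queries from the tables in this section and encodes them for an HTTP GET against a Sesame-style {{/repositories/<id>?query=...}} endpoint. The namespace IRI, server URL and repository ID are hypothetical placeholders (the real priming namespace is not reproduced on this page), and the helper names are invented for illustration.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: substitute the real priming namespace,
# server URL and repository ID for your installation.
PRIM_NS = "http://example.org/priming#"
SERVER = "http://localhost:8080/openrdf-sesame"
REPO_ID = "priming-demo"


def _prefix() -> str:
    return f"PREFIX prim: <{PRIM_NS}>\n"


def enable_spreading(enabled: bool) -> str:
    """ASK query that switches the RDF Priming module on or off."""
    value = "true" if enabled else "false"
    return _prefix() + f'ASK {{ _:b1 prim:enableSpreading "{value}" . }}'


def set_param(name: str, value: float) -> str:
    """ASK query that sets a parameter: the name is the Subject, the value the Object."""
    return _prefix() + f'ASK {{ prim:{name} prim:setParam "{value}" . }}'


def decay_activations(factor: float) -> str:
    """ASK query that multiplies every activation value by the given factor."""
    return _prefix() + f'ASK {{ _:b1 prim:decayActivations "{factor}" . }}'


def query_url(query: str) -> str:
    """URL for evaluating a query with an HTTP GET (Sesame-style endpoint assumed)."""
    return f"{SERVER}/repositories/{REPO_ID}?" + urlencode({"query": query})
```

For example, {{query_url(set_param("firingThreshold", 0.25))}} yields a URL that can be fetched with {{urllib.request.urlopen}} or any other HTTP client; whether additional headers are needed (for instance an {{Accept}} header for boolean results) depends on the deployment.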

h2. Controlling RDF Priming
|| Function | *Enable Activation Spreading* ||
|| Predicate | {{[]}} ||
|| Description | Used to enable or disable the RDF Priming module. The Object of the statement pattern should be a Literal whose value is either "true" or "false". ||
|| Example | PREFIX prim: <[]> \\
ASK \{_:b1 prim:enableSpreading "true".\} ||
|| Function | *Set Activation Decay* ||
|| Predicate | {{[]}} ||
|| Description | Used to alter all the activation values for the nodes in the RDF graph by multiplying them by a factor specified as a Literal in the Object position of the statement pattern. The following example resets all the activation values to zero by multiplying them by "0.0". ||
|| Example | PREFIX prim: <[]> \\
ASK \{_:b1 prim:decayActivations "0.0".\} ||
|| Function | *Set Statement Weight* ||
|| Predicate | {{[]}} ||
|| Description | Used to set a non-default weight factor for statements with a specific predicate. The Subject of the statement pattern is the predicate whose weight should be set, and the Object is the new weight value as a Literal. The example query sets 0.5 as the weight factor for all rdfs:subClassOf statements. ||
|| Example | PREFIX prim: <[]> \\
PREFIX rdfs: <[]> \\
ASK \{ rdfs:subClassOf prim:assignWeight "0.5" . \} ||
The following URIs are used in conjunction with the {{<[]>}} predicate to change the parameters of the RDF Priming module. In general, the parameter names are the Subjects of the statement pattern and the new values are passed as its Object.
|| Parameter | *Activation Threshold* ||
|| Predicate | {{[]}} ||
|| Description | During activation spreading, activations accumulate in nodes and can grow indefinitely. The activationThreshold parameter trims those values to a certain threshold. The default value of this parameter is {{1.0}}, which means that all values greater than {{1.0}} are set to {{1.0}} on every iteration. Because the parameter is applied on every iteration of the process, no activation larger than its value will ever be encountered. ||
|| Example | PREFIX prim: <[]> \\
ASK \{ prim:activationThreshold prim:setParam "0.9" . \} ||
|| Parameter | *Default Activation Value* ||
|| Predicate | {{[]}} ||
|| Description | Sets the default activation value for all nodes in the repository. If the default activation is not set, then the default activation for all repository nodes is 0. This does not affect the activation origin nodes, whose activation values are set by using {{[]}} ||
|| Example | PREFIX prim: <[]> \\
ASK \{ prim:defaultActivation prim:setParam "0.4" . \} ||
|| Parameter | *Default Weight* ||
|| Predicate | {{[]}} ||
|| Description | Edges in the RDF graph can be given weights, which are multiplied by the source node activation in order to compute the activation that is spread across the edge to the destination node (see {{assignWeight}}). If the predicate of the edge is not given any specific weight (via {{assignWeight}}), then the edge weight is assumed to be 1/3 (one third). This default weight can be changed by using the defaultWeight parameter. Any floating-point value in the range [0,1] can be used. ||
|| Example | PREFIX prim: <[]> \\
ASK \{ prim:defaultWeight prim:setParam "0.2" . \} ||
|| Function | *Export Activation Values* ||
|| Predicate | {{[]}} ||
|| Description | Used to export activation values for a set of nodes. The values are stored in a file identified by the URL given as the Object of the statement pattern. The file format is simply one line per node: the URI, followed by a tab character and the floating-point activation value. ||
|| Example | PREFIX prim: <[]> \\
ASK \{ prim:exportActivations prim:setParam "file:///D/work/my_activations.txt" . \} ||
|| Parameter | *Firing Threshold* ||
|| Predicate | {{[]}} ||
|| Description | Sets the threshold above which a node will activate its neighbours. ||
|| Example | PREFIX prim: <[]> \\
ASK \{ prim:firingThreshold prim:setParam "0.25" . \} ||
|| Function | *Import Activation Values* ||
|| Predicate | {{[]}} ||
|| Description | Used to import activation values for a set of nodes. The values are loaded from a file identified by the URL given as the Object of the statement pattern. The file format is simply one line per node: the URI, followed by a tab character and the floating-point activation value. ||
|| Example | PREFIX prim: <[]> \\
ASK \{ prim:importActivations prim:setParam "file:///D/work/my_activations.txt" . \} ||
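
Since activation values are lost when the repository is shut down, exporting and re-importing them is the way to preserve them between sessions. A minimal sketch of producing and parsing the tab-separated format described in the table, assuming UTF-8 text, one {{URI<TAB>value}} pair per line and no header; the exact number formatting written by GraphDB-SE is not specified on this page, and the function names are hypothetical.

```python
def format_activations(activations: dict) -> str:
    """Serialise node activations as one 'URI<TAB>value' line per node."""
    return "".join(f"{uri}\t{value}\n" for uri, value in activations.items())


def parse_activations(text: str) -> dict:
    """Parse 'URI<TAB>value' lines back into a dict, skipping blank lines."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        uri, value = line.split("\t", 1)
        result[uri] = float(value)
    return result
```

To work with an actual file, pass the text through {{pathlib.Path.read_text}} / {{write_text}} with {{encoding="utf-8"}}.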
{noformat}PREFIX onto: <>
PREFIX rdfs: <>
ASK { rdfs:subClassOf onto:assignWeight "0.5" . }
{noformat}The next step alters the weight factor of the {{rdf:type}} predicate so that activations are not broadly propagated from the activated instances to their classes. This is a useful technique when there are many instances and a very large classification taxonomy that should not be broadly activated (as is the case with the DBpedia dataset).
{noformat}PREFIX onto: <>
PREFIX rdfs: <>
PREFIX rdf: <>
ASK { rdf:type onto:assignWeight "0.1" . }
{noformat}If the example query is executed at this stage, it returns no results, because the RDF graph has no activated nodes at all. Therefore, the next step is to activate two particular nodes: the Ford Motor Company ({{dbpedia3:Ford_Motor_Company}}) and one of the cars it built ({{dbpedia3:1955_Ford}}), which came out of the factory with a very nice V8 engine:
{noformat}PREFIX onto: <>
PREFIX dbpedia3: <>
{noformat}As can be seen, the result set is smaller and most of the engines retrieved are made by Ford. However, there is an engine made by Jaguar, which is most probably there because Ford owned Jaguar for some time, so both manufacturers are related to each other. Ownership links of this kind may also explain the other non-Ford engines returned; for example, BMW owned Land Rover before selling it to Ford. Of course, these remarks are a free interpretation of the results.
Finally, disable the RDF Priming module:
{noformat}PREFIX onto: <>
ASK { _:b1 onto:enableSpreading "false" . }
{noformat}
h1. Nested Repositories

*Nested repositories* is a technique for sharing RDF data between multiple GraphDB-SE repositories. It is most useful when several logically independent repositories need to make use of a large (reference) dataset, e.g. a combination of one or more LOD datasets such as GeoNames, DBpedia and MusicBrainz, but each repository adds its own specific data. This mechanism allows the data in the common repository to be logically included, or 'nested', within other repositories that extend it. RDF data in the common repository is combined with data in each child repository for inference purposes. Changes in the common repository are reflected across all child repositories, and inferences are maintained so that they remain logically consistent.

Results for queries against a child repository are computed from the contents of the child repository, as well as the nested repository. The following diagram illustrates the nested repositories concept:

h2. Inference, indexing and queries

A child repository ignores any value for its ruleset parameter and automatically uses the same ruleset as its parent repository. Child repositories compute inferences based on the union of the explicit data stored in the child and parent repositories. Changes to either parent or child cause the set of inferred statements in the child to be updated. However, the child repository must be initialised (running) when updates to the parent repository take place; otherwise, the child can become logically inconsistent.
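
To illustrate why a child must reason over the union of both explicit datasets, here is a toy forward-chaining sketch in Python. It applies only two RDFS rules (subclass transitivity and type inheritance), whereas a real child repository uses whatever ruleset its parent is configured with; the prefixed names and function names are illustrative assumptions.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(explicit):
    """Naive fixpoint closure for two RDFS rules over (s, p, o) triples."""
    closure = set(explicit)
    while True:
        new = set()
        for s, p, o in closure:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in closure:
                if p2 == SUBCLASS_OF and s2 == o:
                    # rdfs11: A subClassOf B, B subClassOf C => A subClassOf C
                    new.add((s, SUBCLASS_OF, o2))
                elif p2 == RDF_TYPE and o2 == s:
                    # rdfs9: x type A, A subClassOf B => x type B
                    new.add((s2, RDF_TYPE, o))
        if new <= closure:
            return closure
        closure |= new


def child_closure(parent_explicit, child_explicit):
    """A child reasons over the union of its own and its parent's explicit data."""
    return infer(set(parent_explicit) | set(child_explicit))
```

If the class hierarchy lives in the parent and the instance data in the child, neither repository alone yields the inferred type; only the union does.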

When a parent repository is updated, it in turn updates every connected child repository with a set of statement insert/delete operations before its own transaction is committed. When a child repository is updated, any new resources are recorded in the parent's dictionary, so that the same resource is indexed with the same internal identifier in all sibling child repositories.
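
The update flow and the shared dictionary can be modelled with two small classes. This is a hypothetical sketch, not GraphDB-SE code: the class and method names are invented, IDs are assigned sequentially, and the child only keeps a copy of the parent's statements where a real child would also recompute its inferences.

```python
class ParentRepository:
    """Toy parent: owns the shared dictionary and pushes updates to children."""

    def __init__(self):
        self.dictionary = {}      # resource -> internal ID, shared with all children
        self.statements = set()   # explicit statements as ID triples
        self.children = []

    def id_for(self, resource):
        # Allocate a new ID only the first time a resource is seen.
        return self.dictionary.setdefault(resource, len(self.dictionary) + 1)

    def encode(self, triple):
        return tuple(self.id_for(term) for term in triple)

    def commit(self, inserts=(), deletes=()):
        ins = {self.encode(t) for t in inserts}
        dels = {self.encode(t) for t in deletes}
        # Before committing, forward the same insert/delete operations to every child.
        for child in self.children:
            child.apply_parent_delta(ins, dels)
        self.statements = (self.statements - dels) | ins


class ChildRepository:
    """Toy child: stores its own data but takes IDs from the parent's dictionary."""

    def __init__(self, parent):
        self.parent = parent
        self.statements = set()    # the child's own explicit statements
        self.parent_view = set()   # the child's copy of the parent's statements
        parent.children.append(self)

    def add(self, triple):
        # New resources are recorded in the parent's dictionary.
        self.statements.add(self.parent.encode(triple))

    def apply_parent_delta(self, ins, dels):
        # A real child would also update its inferred statements here.
        self.parent_view = (self.parent_view - dels) | ins
```

A child that is created (or not running) after the parent has committed misses the delta and keeps a stale view, which mirrors the consistency requirement for child repositories described earlier in this section.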

A current limitation of the implementation is that no updates using the owl:sameAs predicate are permitted.
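
Applications that write to nested repositories may prefer to reject such updates before sending them. A minimal client-side guard, assuming statements are available as (subject, predicate, object) tuples of full IRIs; the function and exception names are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class UnsupportedUpdate(ValueError):
    """Raised when an update batch contains a statement that cannot be applied."""


def check_update(statements):
    """Return the batch as a list, or raise if any statement uses owl:sameAs."""
    batch = list(statements)
    for s, p, o in batch:
        if p == OWL_SAME_AS:
            raise UnsupportedUpdate(f"owl:sameAs updates are not permitted: {s} {o}")
    return batch
```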

Repositories that are configured to use the nesting mechanism must be created using specific Sesame SAIL types:

* {{owlim:ParentSail}} is used for parent (shared) repositories;
* {{owlim:ChildSail}} is used for child repositories that extend parent repositories.

(Where the {{owlim}} namespace above maps to {{[]}}.)

Additionally, the following configuration parameters are used:

* {{owlim:id}} is used in the parent configuration to provide a nesting name;
* {{owlim:parent-id}} is used in child repository configurations to identify the parent repository.

Once created, a child repository must not be reconfigured to use a different parent repository, as this leads to inconsistent data.
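
These configuration rules lend themselves to a pre-flight check. The sketch below represents each repository configuration as a plain Python dictionary (a simplification, since real configurations are RDF files) and reports parents without an {{owlim:id}}, children whose {{owlim:parent-id}} matches no parent, and children whose parent has changed since a previous configuration. The function name and data layout are assumptions.

```python
PARENT = "owlim:ParentSail"
CHILD = "owlim:ChildSail"


def check_nesting(configs, previous_parents=None):
    """Return a list of problems found in a set of nesting configurations.

    configs: repository name -> {"sailType": ..., "owlim:id": ..., "owlim:parent-id": ...}
    previous_parents: repository name -> parent-id used by an earlier configuration
    """
    previous_parents = previous_parents or {}
    problems = []
    parent_ids = set()
    # First pass: collect the nesting names declared by parent repositories.
    for name, cfg in configs.items():
        if cfg.get("sailType") == PARENT:
            nesting_id = cfg.get("owlim:id")
            if nesting_id:
                parent_ids.add(nesting_id)
            else:
                problems.append(f"{name}: parent has no owlim:id")
    # Second pass: every child must point at a known parent and keep it.
    for name, cfg in configs.items():
        if cfg.get("sailType") == CHILD:
            parent_id = cfg.get("owlim:parent-id")
            if parent_id not in parent_ids:
                problems.append(f"{name}: unknown owlim:parent-id {parent_id}")
            old = previous_parents.get(name)
            if old is not None and old != parent_id:
                problems.append(f"{name}: parent changed from {old} to {parent_id}")
    return problems
```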