RDF Priming is a technique that selects a subset of available statements for use as an input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. RDF Priming is a scalable and customisable implementation of the popular connectionist method on top of RDF graphs, which allows for the "priming" of large datasets with respect to concepts relevant to the context and to the query. It is controlled using SPARQL ASK queries. This section provides an overview of the mechanism and explains the necessary SPARQL queries used to manage and set up RDF Priming.
To enable RDF Priming over the repository, the repository-type configuration parameter should be set to weighted-file-repository.
RDF Priming is controlled using SPARQL ASK queries, which allow all the parameters and default values to be set. These queries use special system predicates, which are described below:
The following example uses data from DBpedia (http://dbpedia.org/About) that was imported into GraphDB-SE with the RDF Priming mode enabled. The management queries are evaluated through the Sesame console application for convenience. The initial step is to evaluate a demo query that retrieves all the instances of the dbpedia:V8 concept:
The above query returns the following results:
As can be seen, the query returns many engines from different manufacturers. The RDF Priming module can be used to reduce the number of results returned by this query by targeting the query to specific parts of the global RDF graph, i.e. the parts of the graph that have been activated.
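To make the parameters used in the following steps more concrete, here is a minimal Python sketch of spreading activation over a toy graph. It is not the GraphDB-SE implementation: the function names, the tiny data set, the undirected spreading, and the exact way decay, weights, and thresholds are combined are all assumptions chosen for illustration only.

```python
from collections import defaultdict

# Toy data set (hypothetical identifiers, not real DBpedia resources).
TRIPLES = [
    ("ford_v8", "manufacturer", "Ford"),
    ("jaguar_v8", "manufacturer", "Jaguar"),
    ("bmw_v8", "manufacturer", "BMW"),
    ("Ford", "owned", "Jaguar"),
    ("ford_v8", "type", "V8"),
    ("jaguar_v8", "type", "V8"),
    ("bmw_v8", "type", "V8"),
]


def spread_activation(triples, seeds, weights, decay=0.5,
                      firing_threshold=0.2, default_weight=1.0, max_cycles=10):
    """Return {node: activation} after spreading from the seed nodes.

    Assumed model: a node fires once, when its activation reaches the
    firing threshold, and passes activation * predicate weight * decay
    to every neighbour (edges are treated as undirected).
    """
    neighbours = defaultdict(list)
    for s, p, o in triples:
        neighbours[s].append((o, p))
        neighbours[o].append((s, p))

    activation = defaultdict(float, seeds)
    fired = set()
    for _ in range(max_cycles):
        # Snapshot the nodes that fire in this cycle so the result does not
        # depend on iteration order.
        ready = {n: a for n, a in activation.items()
                 if a >= firing_threshold and n not in fired}
        if not ready:
            break
        fired.update(ready)
        for node, level in ready.items():
            for other, predicate in neighbours[node]:
                weight = weights.get(predicate, default_weight)
                activation[other] += level * weight * decay
    return dict(activation)


def filter_results(results, activation, filter_threshold):
    """Keep only results whose activation reaches the filter threshold."""
    return [r for r in results if activation.get(r, 0.0) >= filter_threshold]


# All V8 instances, as an unprimed query would return them.
V8_ENGINES = sorted(s for s, p, o in TRIPLES if p == "type" and o == "V8")

# Weight 0 on "type" keeps instances from activating their classes.
ACTIVATION = spread_activation(TRIPLES, {"Ford": 1.0},
                               {"type": 0.0, "owned": 0.5})
PRIMED_V8 = filter_results(V8_ENGINES, ACTIVATION, filter_threshold=0.1)
```

In this toy model the Ford engine and, more weakly, the Jaguar engine end up above the filter threshold, while the BMW engine is never reached. Setting the weight of the "type" predicate to zero prevents the shared V8 class from activating every engine.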
Change the default decay factor:
Change the firing threshold parameter:
Change the filter threshold:
The initial Activation Level is changed to reflect the specifics of the data set:
Adjust the Weight factors for a specific predicate so that it activates the relevant sub-set of the RDF graph, in this case the rdfs:subClassOf predicate:
The next step alters the Weight Factor of the rdf:type predicate so that it does not propagate activations to the classes from the activated instances. This is a useful technique when there are a lot of instances and a very large classification taxonomy, which should not be broadly activated (as is the case with the DBpedia dataset).
If the example query is executed at this stage, it returns no results, because the RDF graph has no activated nodes at all. Therefore, the next step is to activate two particular nodes: the Ford Motor Company (dbpedia3:Ford_Motor_Company) and one of the cars it built (dbpedia3:1955_Ford), which came out of the factory with a V8 engine:
Finally, tell the RDF Priming module to spread the activations from these two nodes:
This normally takes 8-10 seconds, after which the example query can be re-evaluated with the following results:
As can be seen, the result set is smaller and most of the engines retrieved are made by Ford. However, there is an engine made by Jaguar, which is most probably there because Ford owned Jaguar for some time in the past, so both manufacturers are somehow related to each other. This might also be the case for the other non-Ford engines returned, since BMW also owned Jaguar for some time. Of course, these remarks are a free interpretation of the results.
When priming is no longer needed, disable RDF Priming to return to the normal operating mode.
Nested repositories is a technique for sharing RDF data between multiple GraphDB-SE repositories. It is most useful when several logically independent repositories need to make use of a large (reference) dataset, e.g. a combination of one or more LOD datasets such as geonames, DBpedia, MusicBrainz, etc., but where each repository adds its own specific data. This mechanism allows the data in the common repository to be logically included, or 'nested', within other repositories that extend it. RDF data in the common repository is combined with data in each child repository for inference purposes. Changes in the common repository are reflected across all child repositories and inferences are maintained to be logically consistent.
Results for queries against a child repository are computed from the contents of the child repository, as well as the nested repository. The following diagram illustrates the nested repositories concept:
When two child repositories extend the same nested repository, they remain logically separate: changes made to one child are not visible in the other. Only changes made to the common nested repository affect all of its child repositories.
Definition: A repository that is nested into another repository (possibly into more than one other repository) is called a parent repository.
Definition: A repository that nests a parent repository is called a child repository.
A child repository ignores any value for its ruleset parameter and automatically uses the same rule-set as its parent repository. Child repositories compute inferences based on the union of the explicit data stored in the child and parent repository. Changes to either parent or child cause the set of inferred statements in the child to be updated. However, the child repository must be initialised (running) when updates to the parent repository take place. If this is not the case then the child can become logically inconsistent.
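The relationship between parent data, child data, and inference can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not GraphDB-SE code: the `Repo` class and `infer` function are hypothetical, the rule-set is reduced to two RDFS-style rules, and inferences are recomputed at query time rather than maintained on update.

```python
def infer(explicit):
    """Toy rule-set: subClassOf transitivity and type propagation.

    Returns only the inferred statements, not the explicit ones.
    """
    closure = set(explicit)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                if b == c and p2 == "subClassOf" and p1 in ("subClassOf", "type"):
                    new.add((a, p1, d))
        if not new <= closure:
            closure |= new
            changed = True
    return closure - set(explicit)


class Repo:
    """A repository with optional parent (hypothetical helper class)."""

    def __init__(self, parent=None):
        self.explicit = set()
        self.parent = parent

    def visible_explicit(self):
        # A child sees its own explicit data plus the parent's explicit data.
        if self.parent is None:
            return set(self.explicit)
        return self.explicit | self.parent.explicit

    def query(self):
        # Inference runs over the union, so parent changes affect the child.
        base = self.visible_explicit()
        return base | infer(base)


parent = Repo()
parent.explicit.add(("Car", "subClassOf", "Vehicle"))

child_a = Repo(parent)
child_a.explicit.add(("mustang", "type", "Car"))

child_b = Repo(parent)
child_b.explicit.add(("beetle", "type", "Car"))
```

Here each child infers that its own car is a Vehicle by combining its data with the parent's class hierarchy, yet neither child sees the other's car. Adding a statement to `parent.explicit` changes the answers of both children.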
When a parent repository is updated, then before its transaction is committed it in turn updates every connected child repository by a set of statement insert/delete operations. When a child repository is updated, any new resources are recorded in the parent's dictionary in order that the same resource in sibling child repositories is indexed using the same internal identifier.
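The update flow described above can be sketched as follows. This is an illustrative model only: the `Dictionary`, `ParentRepo`, and `ChildRepo` classes and their methods are hypothetical names, and the real engine's transaction handling and storage layout are not shown in this document.

```python
class Dictionary:
    """Maps resources to internal integer identifiers."""

    def __init__(self):
        self._ids = {}

    def id_for(self, resource):
        # A new identifier is assigned only the first time a resource is seen.
        return self._ids.setdefault(resource, len(self._ids) + 1)


class ParentRepo:
    def __init__(self):
        self.dictionary = Dictionary()
        self.children = []
        self.statements = set()

    def encode(self, triple):
        return tuple(self.dictionary.id_for(r) for r in triple)

    def commit(self, inserts=(), deletes=()):
        enc_ins = [self.encode(t) for t in inserts]
        enc_del = [self.encode(t) for t in deletes]
        # Connected children are updated before the parent's change is final.
        for child in self.children:
            child.apply_parent_update(enc_ins, enc_del)
        self.statements.difference_update(enc_del)
        self.statements.update(enc_ins)


class ChildRepo:
    def __init__(self, parent):
        self.parent = parent
        self.own = set()
        self.from_parent = set()
        parent.children.append(self)

    def add(self, triple):
        # New resources are registered in the parent's dictionary, so sibling
        # children end up using the same identifier for the same resource.
        self.own.add(self.parent.encode(triple))

    def apply_parent_update(self, inserts, deletes):
        self.from_parent.difference_update(deletes)
        self.from_parent.update(inserts)


parent = ParentRepo()
child_a = ChildRepo(parent)
child_b = ChildRepo(parent)
child_a.add(("ex:mustang", "ex:madeBy", "ex:Ford"))
child_b.add(("ex:fiesta", "ex:madeBy", "ex:Ford"))
parent.commit(inserts=[("ex:Ford", "rdf:type", "ex:Company")])
```

Because both children encode resources through the parent's dictionary, `ex:Ford` has the same identifier in each of them, which is what allows parent statements pushed to the children to join correctly with child data.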
Queries executed on a child repository should perform almost as well as queries executed against a repository containing all the data (from both parent and child repositories).
Both parent and child repositories must be deployed using Tomcat, and they must be deployed to the same instance on the same machine (same JVM).
Repositories that are configured to use the nesting mechanism must be created using specific Sesame SAIL types:
(Where the owlim namespace above maps to http://www.ontotext.com/trree/owlim#.)
Additionally, the following configuration parameters are also used:
Once created, a child repository must not be reconfigured to use a different parent repository as this leads to inconsistent data.
The correct initialisation sequence is to start the parent repository followed by each of its children.
As long as no further updates occur, the shutdown sequence is not defined. However, it is suggested that the children be shut down first, followed by the parent.