There are two principal strategies for rule-based inference, called 'forward-chaining' and 'backward-chaining'. They can be briefly explained as follows:
Both of these strategies have their advantages and disadvantages, which have been well studied in the history of knowledge representation (KR) and expert systems. Attempts to overcome the weak points have led to the development of various hybrid strategies (involving partial forward- and backward-chaining), which have proven efficient in many contexts.
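As a rough illustration of the two strategies, the following Python sketch applies a single hypothetical rule (transitivity of a sub-class relation) in both styles. The facts and function names are assumptions made for illustration only and say nothing about how GraphDB implements reasoning.

```python
# Illustrative facts: (subclass, superclass) pairs. All names are hypothetical.
FACTS = {("Dog", "Mammal"), ("Mammal", "Animal"), ("Animal", "LivingThing")}


def forward_chain(facts):
    """Forward-chaining: start from the known facts and apply the rule
    (a sub b) and (b sub c) => (a sub c) until nothing new is derived."""
    closure = set(facts)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def backward_chain(facts, goal, visited=None):
    """Backward-chaining: start from the goal (sub, sup) and search for facts
    that support it, following intermediate classes only on demand."""
    visited = visited if visited is not None else set()
    sub, sup = goal
    if goal in facts:
        return True
    visited.add(sub)  # guards against cycles in the facts
    return any(
        backward_chain(facts, (mid, sup), visited)
        for (s, mid) in facts
        if s == sub and mid not in visited
    )


materialised = forward_chain(FACTS)
```

Forward-chaining pays the cost up front and answers later look-ups from `materialised`; backward-chaining stores nothing extra and does the work each time a goal is asked.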
RDFS inference is achieved via a set of axiomatic triples and entailment rules. These rules allow the full set of valid inferences using RDFS semantics to be determined. Herman ter Horst defines RDFS extensions for more general rule support and a fragment of OWL, which is more expressive than DLP and fully compatible with RDFS. First, he defines R-entailment, which extends RDFS-entailment in the following ways:
GraphDB uses a notation almost identical to R-Entailment defined by Horst. GraphDB-SE performs reasoning based on forward-chaining of entailment rules defined using RDF triple patterns with variables. GraphDB-SE's reasoning strategy is one of 'materialisation', which is introduced in the GraphDB Primer in the Reasoning Strategies topic.
The rule format and the enforced semantics are analogous to R-entailment (see the Rule-Based Inference topic), with the following differences:
GraphDB-SE can be configured via "rule-sets" – sets of axiomatic triples, consistency checks and entailment rules, which determine the applied semantics. The implementation of GraphDB-SE relies on a compile stage, during which the rules are compiled into Java source code, which is then further compiled, using the Java compiler, and merged together with the inference engine.
A rule-set file has three sections named Prefices, Axioms, and Rules. All sections are mandatory and must appear sequentially in this order. Comments are allowed anywhere and follow the Java convention, i.e. "/* ... */" for block comments and "//" for end of line comments.
For historical reasons, the way in which terms (variables, URIs and literals) are written differs from Turtle and SPARQL:
See the examples below. Please be careful when writing terms. If you make a mistake, GraphDB will fail to start. You can see the syntax error in the error log, which can be viewed, for example, with Sesame at http://<server>/openrdf-sesame/system/logging/overview.view
This section defines abbreviations for the namespaces used in the rest of the file. The syntax is:
A typical prefices section might look like this:
This section is used to assert 'axiomatic triples', which are usually used to describe the meta-level primitives used to define the schema, such as rdf:type, rdfs:Class, etc. This section contains a list of the (variable free) triples, one per line. For example, the RDF axiomatic triples are defined thus:
This section is used to define entailment rules and consistency checks, which share a similar format. Each definition consists of premises and corollaries that are RDF statements defined with subject, predicate, object and optional context components. The subject, predicate and object can each be a variable, blank node, literal, full URI or the short name for a URI. If given, the context must be a full URI or a short name for a URI. Variables are alpha-numeric and must begin with a letter.
The syntax of a rule definition is as follows:
Each premise and each consequence is written on a separate line. The following example helps to illustrate the possibilities:
The symbols p, x, y, z and a are variables. The second rule contains two constraints that reduce the number of bindings for each premise, i.e. they 'filter out' those statements where the constraint does not hold.
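The way premises bind variables and constraints filter bindings can be sketched in Python. This is a minimal illustration, not GraphDB's rule engine: the `Var` class, the `match` and `apply_rule` helpers and the sample data are all hypothetical, and constraints are checked once after all premises have matched rather than premise by premise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A rule variable; anything that is not a Var is treated as a constant."""
    name: str


def match(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` equals `triple`.
    Returns the extended bindings, or None if the pattern does not match."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, Var):
            if result.setdefault(term.name, value) != value:
                return None
        elif term != value:
            return None
    return result


def apply_rule(premises, constraints, consequences, triples):
    """One pass of a rule over `triples`. `constraints` is a list of
    (name_a, name_b) pairs of variables that must be bound to different values."""
    candidates = [{}]
    for premise in premises:
        candidates = [
            extended
            for bindings in candidates
            for triple in triples
            if (extended := match(premise, triple, bindings)) is not None
        ]
    inferred = set()
    for bindings in candidates:
        # Constraints filter out bindings where the inequality does not hold.
        if all(bindings[a] != bindings[b] for a, b in constraints):
            for consequence in consequences:
                inferred.add(tuple(
                    bindings[t.name] if isinstance(t, Var) else t
                    for t in consequence
                ))
    return inferred


p, x, y, z = Var("p"), Var("x"), Var("y"), Var("z")
triples = {
    ("partOf", "rdf:type", "owl:TransitiveProperty"),
    ("a", "partOf", "b"),
    ("b", "partOf", "c"),
    ("c", "partOf", "c"),
}
result = apply_rule(
    premises=[(p, "rdf:type", "owl:TransitiveProperty"), (x, p, y), (y, p, z)],
    constraints=[("x", "y"), ("y", "z")],
    consequences=[(x, p, z)],
    triples=triples,
)
# result == {("a", "partOf", "c")}
```

Without the two constraints, the reflexive statement `("c", "partOf", "c")` would also produce bindings that only re-derive statements already present.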
As can be seen, the last two variants are identical apart from the rotation of variables y and z, so one of these variants is not needed. The use of the [Cut] operator above tells the rule compiler to eliminate this last variant, i.e. the one beginning with the premise x p z.
The RIF rules that implement prp-spo2 use a relation (unrelated to the input or generated triples) called _checkChain. The GraphDB implementation maps this relation to the 'invisible' context of the same name with the addition of [Context <onto:_checkChain>] to certain statement patterns. Generated statements with this context can only be used for bindings to rule premises when the exact same context is specified in the rule premise. The generated statements with this context will not be used for any other rules.
Consistency checks are used to ensure that the data model is in a consistent state and are applied whenever an update transaction is committed. The syntax is similar to that of rules, except that Consistency replaces the Id tag that introduces normal rules. In addition, consistency checks have no consequences; they indicate an inconsistency whenever their premises can be satisfied, e.g.
If any consistency check fails when a transaction is committed and consistency checking is switched on (it is off by default; see the configuration section), then:
A GraphDB repository uses the configured rule-set to compute all inferred statements at load time. To some extent, this process increases the processing cost and time taken to load a repository with a large amount of data. However, it has the desirable advantage that subsequent query evaluation can proceed extremely quickly.
GraphDB stores both explicit statements and implicit statements, i.e. those inferred (materialised) from the explicit statements. It follows that when explicit statements are removed from the repository, any implicit statements that rely on the removed statements must also be removed.
Special care is taken when retracting owl:sameAs statements, so that the algorithm still works correctly when equivalence classes are modified.
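The dependency between explicit and implicit statements can be illustrated with a deliberately naive Python sketch that recomputes every inference after each change. This is not the retraction algorithm GraphDB uses; the `Store` class and the single transitive relation over (subject, object) pairs are assumptions chosen to keep the example short.

```python
def closure(explicit):
    """Materialise a hypothetical transitive relation over (s, o) pairs."""
    inferred = set(explicit)
    changed = True
    while changed:
        new = {(a, d) for (a, b) in inferred for (c, d) in inferred if b == c}
        changed = not new <= inferred
        inferred |= new
    return inferred


class Store:
    """Keeps explicit statements apart from implicit (inferred) ones."""

    def __init__(self):
        self.explicit = set()
        self.implicit = set()

    def _rematerialise(self):
        # A statement that is both asserted and inferable counts as explicit.
        self.implicit = closure(self.explicit) - self.explicit

    def add(self, statement):
        self.explicit.add(statement)
        self._rematerialise()

    def remove(self, statement):
        # Only explicit statements are retracted directly; implicit ones
        # disappear once no remaining explicit statements support them.
        self.explicit.discard(statement)
        self._rematerialise()


store = Store()
store.add(("a", "b"))
store.add(("b", "c"))   # ("a", "c") is now implicit
store.remove(("b", "c"))  # ("a", "c") loses its support and is dropped
```

Recomputing from scratch is simple to reason about but scales poorly, which is why a production store needs a more targeted retraction strategy.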
In situations when fast statement retraction is required, but it is also necessary to update schemas, a special statement pattern can be used. By including an insert for a statement with the following form in the update:
GraphDB will use the smooth-delete algorithm, but will also traverse read-only statements and allow them to be deleted/inserted. Such transactions are likely to be much more computationally expensive, but are intended for the occasional, offline update to otherwise read-only schemas. The advantage is that fast-delete can still be used, but a repository export and import is not required when making a modification to a schema.
For any transaction that includes an insert of the above special predicate/statement:
Schema statements can be inserted or deleted using SPARQL Update as follows:
Statements are inferred only when you insert new statements. So if you reconnect to a repository with a different rule-set, the new rule-set does not take effect immediately. However, you can trigger reinference with an Update statement like this one:
This removes all inferred statements and reinfers from scratch. If a statement is both explicitly inserted and inferred, it is not removed; see Managing Explicit and Implicit Statements.
There are a number of pre-defined rule-sets provided with GraphDB-SE, which cover various well known knowledge representation formalisms. The following table gives the details:
The implementation of OWL2 QL is non-conformant with the W3C OWL2 profiles recommendation, as shown in Table 3:
GraphDB has an internal rule compiler that can be configured with a custom set of inference rules and axioms. The user may define a custom rule-set (see 'The Rule Language') in a .pie file (e.g. MySemantics.pie). The easiest way to create a custom rule-set is to start by modifying one of the .pie files that were used to build the precompiled rule-sets. All of these are provided as part of the GraphDB-SE distribution.
There are several features in the RDFS and OWL specifications that result in rather inefficient entailment rules and axioms, which can have a significant impact on the performance of a reasoning engine. Such examples are:
Although the above inferences are important for formal semantics completeness, users rarely execute queries that seek such statements. Moreover, these inferences generate so many inferred statements that performance and scalability can be severely degraded.
The following optimisations are enacted:
The performance of a GraphDB-SE repository is greatly improved by a specific optimisation that handles owl:sameAs statements efficiently. owl:sameAs declares that two different URIs identify the same resource. Most often, it is used to align identifiers of the same real-world entity used in different data sets. For example, in DBpedia the URI of Vienna is http://dbpedia.org/page/Vienna, while in Geonames it is http://sws.geonames.org/2761369. DBpedia contains the statement
that declares the two URIs as equivalent.
owl:sameAs is probably the most important OWL predicate when it comes to integrating data from different data sources and interlinking RDF datasets. However, its semantics causes an explosion in the number of inferred statements. Following the formal definition of OWL (OWL2 RL, to be more specific), whenever two URIs are declared equivalent, all statements that involve one of the URIs should be "replicated" using the other URI in the same position. For instance, in Geonames the city of Vienna is defined as part of http://www.geonames.org/2761367 (the first-order administrative division in Austria with the same name), which, in turn, is part of Austria http://www.geonames.org/2782113:
Since gno:parentFeature is a transitive relationship, it will be inferred that the city of Vienna is also part of Austria:
Due to the semantics of owl:sameAs, it follows from (S1) that statements (S2) and (S4) should also hold for Vienna's DBpedia URI:
These implicit statements must hold no matter which of the equivalent URIs is used in a query. When we consider that Austria also has an equivalent URI in DBpedia:
we should also infer that:
In the above example, we had two alignment statements (S1 and S7), two statements carrying specific factual knowledge (S2 and S3), one statement inferred due to a transitive property (S4), and seven statements inferred as a result of owl:sameAs expansion (S5, S6, S8, S9, S10, as well as the inverse statements of S1 and S7).
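The count can be checked with a small Python sketch that expands the two factual statements by brute force. The Vienna and Geonames URIs are the ones quoted in the text; `AUSTRIA_DB` is a placeholder, because the DBpedia URI for Austria is not given here, and the helper functions are hypothetical.

```python
from itertools import product

VIENNA_DB = "http://dbpedia.org/page/Vienna"
VIENNA_GEO = "http://sws.geonames.org/2761369"
REGION = "http://www.geonames.org/2761367"
AUSTRIA_GEO = "http://www.geonames.org/2782113"
AUSTRIA_DB = "urn:example:dbpedia-austria"  # placeholder, not a real DBpedia URI

# The two alignment statements and the two factual gno:parentFeature statements.
same_as = [(VIENNA_DB, VIENNA_GEO), (AUSTRIA_DB, AUSTRIA_GEO)]
parent_feature = {(VIENNA_GEO, REGION), (REGION, AUSTRIA_GEO)}


def equivalents(uri):
    """The URI itself plus any URI directly aligned with it."""
    result = {uri}
    for a, b in same_as:
        if uri in (a, b):
            result |= {a, b}
    return result


def expand(statements):
    """Apply transitivity and owl:sameAs replication until a fixpoint."""
    closed = set(statements)
    while True:
        new = {(a, d) for (a, b) in closed for (c, d) in closed if b == c}
        new |= {pair for (s, o) in closed
                for pair in product(equivalents(s), equivalents(o))}
        if new <= closed:
            return closed
        closed |= new


expanded = expand(parent_feature)
print(len(expanded), len(expanded - parent_feature))  # prints "8 6"
```

The six inferred gno:parentFeature statements are the one produced by transitivity plus five produced by replication; adding the two inverse owl:sameAs statements gives the seven statements attributed to owl:sameAs expansion.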
Furthermore, owl:sameAs is an equivalence relation (transitive, reflexive, and symmetric). Thus, for a set of N equivalent URIs, N² (N squared) owl:sameAs statements should be considered.
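The quadratic growth is easy to confirm with a few lines of Python. The function name and the `urn:example:` URIs are made up for this illustration.

```python
from itertools import product


def same_as_closure(cluster):
    """All owl:sameAs pairs implied by one equivalence class (reflexive,
    symmetric and transitive), as (subject, object) tuples."""
    return set(product(cluster, repeat=2))


for n in (2, 10, 100):
    cluster = [f"urn:example:{i}" for i in range(n)]
    print(n, len(same_as_closure(cluster)))  # prints 2 4, 10 100, 100 10000
```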
GraphDB handles owl:sameAs in a special manner, avoiding both problems. It neither explodes the statement indices nor stores a quadratic number of owl:sameAs statements per cluster (equivalence class). Instead, each owl:sameAs cluster is represented by a single super-node, and all statements are recorded against a selected representative of each cluster. During query evaluation, GraphDB uses a kind of backward chaining by enumerating equivalent URIs, guaranteeing completeness of inference and query results. Special care is taken to ensure that this optimisation does not hinder the ability to distinguish between explicit and implicit statements.
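The general idea of one representative per cluster with expansion at query time can be sketched with a union-find structure. This is an illustration under stated assumptions, not GraphDB's implementation: the class names and short URIs are hypothetical, and for simplicity the sketch keeps statements as given and compares representatives at query time instead of rewriting stored statements onto the representative.

```python
class SameAsIndex:
    """Union-find over URIs: each equivalence class has one representative."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def members(self, uri):
        root = self.find(uri)
        return {u for u in list(self.parent) if self.find(u) == root}


class Store:
    def __init__(self):
        self.index = SameAsIndex()
        self.statements = set()  # non-sameAs statements only

    def add(self, s, p, o):
        if p == "owl:sameAs":
            self.index.union(s, o)  # merge clusters; nothing is replicated
        else:
            self.statements.add((s, p, o))

    def query_objects(self, s, p):
        """Objects of (s, p, ?o), expanded over equivalent URIs at query time."""
        root = self.index.find(s)
        results = set()
        for s2, p2, o2 in self.statements:
            if p2 == p and self.index.find(s2) == root:
                results |= self.index.members(o2)
        return results


store = Store()
store.add("dbpedia:Vienna", "owl:sameAs", "geo:Vienna")
store.add("dbpedia:Austria", "owl:sameAs", "geo:Austria")
store.add("geo:Vienna", "gno:parentFeature", "geo:Austria")
answers = store.query_objects("dbpedia:Vienna", "gno:parentFeature")
# answers == {"geo:Austria", "dbpedia:Austria"}
```

Only one gno:parentFeature statement is stored, yet a query using either Vienna URI returns both Austria URIs, which is the trade-off the paragraph above describes: less storage in exchange for some enumeration work per query.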