GraphDB-Lite Reasoner

Version 1 by barry.bishop on Mar 23, 2012 15:12; current version by Gergana Petkova on Sep 18, 2014 12:46.

{toc}

h1. Rule-Based Inference

Forward-chaining and backward-chaining each have advantages and disadvantages, which have been well studied in the history of knowledge representation (KR) and expert systems. Attempts to overcome their weak points have led to the development of various hybrid strategies (involving partial forward\- and backward-chaining), which have proven efficient in many contexts.

Reasoning and materialisation are discussed in some detail in the [GraphDB Primer|Primer Background Knowledge].

h1. GraphDB-Lite Logical Formalism

RDFS inference is achieved via a set of axiomatic triples and entailment rules. These rules allow the full set of valid inferences under RDFS semantics to be determined. Herman ter Horst \[21\] defines RDFS extensions for more general rule support and a fragment of OWL that is more expressive than DLP and fully compatible with RDFS. First, he defines R-entailment, which extends RDFS-entailment in the following ways:
* it can operate on the basis of any set of rules R (i.e. it allows the standard set of rules that defines the semantics of RDFS to be extended or replaced);
* it operates over so-called generalised RDF graphs, where blank nodes can appear as predicates (a possibility disallowed in RDF);
* rules without premises are used to declare axiomatic statements;
* rules without consequences are used to detect inconsistency (integrity constraints).

GraphDB uses a notation almost identical to the R-entailment defined by Horst. One major difference is that consistency checking rules are not supported. GraphDB performs reasoning based on forward-chaining of entailment rules defined using RDF triple patterns with variables. GraphDB's reasoning strategy is 'total materialisation', which is introduced in the Reasoning Strategies topic of the GraphDB Primer.

h2. Rule Format and Semantics

The rule format and the enforced semantics are analogous to R-entailment (see the Rule-Based Inference topic and \[21\]), with the following differences:
* Free variables in the head (without a binding in the body) are treated as blank nodes. This feature must be used with extreme care, because it is easy to create custom rule-sets that recursively infer an infinite number of statements, making the semantics intractable;
* Variable inequality constraints can be specified in addition to the triple patterns (they can be placed after any premise or consequence). This leads to lower complexity compared to R-entailment;
* The *\[Cut\]* operator can be associated with rule premises. This is an optimisation that tells the rule compiler not to generate a variant of the rule with the identified rule premise as the first triple pattern;
* Axiomatic triples can be provided as a set of statements, rather than being modelled as rules with empty bodies.
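The following Python sketch shows how the first two differences could affect the application of a single rule to a set of triples. It is only an illustration: GraphDB compiles rules into Java code. The names {{Rule}}, {{match}} and {{apply_rule}}, the convention that bare identifiers such as {{x}} are variables while prefixed names such as {{rdf:type}} are constants, and the choice to check inequality constraints only after all premises have been matched are all assumptions.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    # Assumption: bare identifiers ("x", "p") are variables; prefixed names ("rdf:type") are constants.
    return term.isidentifier()


@dataclass
class Rule:
    premises: list[Triple]
    consequences: list[Triple]
    # Pairs of body variables that must be bound to different values, e.g. ("x", "y") for x != y.
    inequalities: list[tuple[str, str]] = field(default_factory=list)


_blank_ids = itertools.count(1)


def match(pattern: Triple, triple: Triple, binding: dict) -> Optional[dict]:
    """Extend `binding` so that `pattern` matches `triple`; return None on a clash."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def apply_rule(rule: Rule, triples: set[Triple]) -> set[Triple]:
    """Return every statement produced by one application of `rule` to `triples`."""
    bindings = [{}]
    for premise in rule.premises:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(premise, t, b)) is not None]
    inferred = set()
    for b in bindings:
        # Variable inequality constraints prune bindings.
        if any(b[u] == b[v] for u, v in rule.inequalities):
            continue
        # Free head variables (no binding in the body) become fresh blank nodes,
        # shared by all consequences of this single rule firing.
        b = dict(b)
        for term in (t for c in rule.consequences for t in c):
            if is_var(term) and term not in b:
                b[term] = f"_:b{next(_blank_ids)}"
        for c in rule.consequences:
            inferred.add(tuple(b[t] if is_var(t) else t for t in c))
    return inferred
```

For instance, a rule with premise {{x p y}}, consequence {{x ex:linkedTo y}} and the constraint x != y derives nothing from {{ex:a ex:knows ex:a}}. A head such as {{x ex:hasParent b}}, {{b rdf:type ex:Person}} for the body {{x rdf:type ex:Person}} shows why free head variables need care: every fresh blank node is itself a person, so applying the rule until nothing new appears would never terminate.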

GraphDB can be configured via "rule-sets" (sets of axiomatic triples and entailment rules) that determine the applied semantics. The implementation of GraphDB relies on a compile stage, during which the rules are compiled into Java source code that is then compiled by the Java compiler and merged with the inference engine.

h2. The Rule Language

h3. Axioms

h3. Rules

As can be seen, the last two variants are identical apart from the rotation of variables *y* and *z*, so one of these variants is not needed. The use of the *\[Cut\]* operator above tells the rule compiler to eliminate this last variant, i.e. the one beginning with the premise *x p z*.
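A small Python sketch can make the effect of *\[Cut\]* concrete. It assumes, for illustration only, that the compiler builds one variant per premise by moving that premise to the front and keeping the others in their original order. It uses a hypothetical functional-property rule ({{p rdf:type owl:FunctionalProperty}}, {{x p y}}, {{x p z}}) whose last two premises differ only in the names of {{y}} and {{z}}. Renaming variables in order of first appearance shows that the variants starting with {{x p y}} and {{x p z}} are the same join pattern, so one of them can be dropped by marking {{x p z}} with *\[Cut\]*.

```python
Triple = tuple[str, str, str]


def rule_variants(premises: list[Triple], cut: set[int]) -> list[list[Triple]]:
    """One premise ordering per premise, each starting with that premise.

    Premises whose index is in `cut` are never used as the leading pattern.
    """
    variants = []
    for i, first in enumerate(premises):
        if i in cut:
            continue
        rest = [p for j, p in enumerate(premises) if j != i]
        variants.append([first] + rest)
    return variants


def canonical(variant: list[Triple]) -> list[Triple]:
    """Rename variables (bare identifiers) in order of first appearance."""
    names: dict[str, str] = {}
    renamed = []
    for triple in variant:
        renamed.append(tuple(
            names.setdefault(t, f"v{len(names)}") if t.isidentifier() else t
            for t in triple))
    return renamed


# Hypothetical rule: p rdf:type owl:FunctionalProperty, x p y, x p z [Cut]
premises = [("p", "rdf:type", "owl:FunctionalProperty"), ("x", "p", "y"), ("x", "p", "z")]

all_variants = rule_variants(premises, cut=set())
print(len(all_variants))                                         # 3
print(canonical(all_variants[1]) == canonical(all_variants[2]))  # True
print(len(rule_variants(premises, cut={2})))                     # 2
```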

h2. Materialisation

A GraphDB repository uses the configured rule-set to compute all inferred statements at load time. This increases the processing cost and the time taken to load a repository with a large amount of data, but it has the advantage that subsequent query evaluation can proceed very quickly.
Apart from a number of optimisations, the approach taken by GraphDB is one of 'total materialisation', where the inference rules are applied repeatedly to the asserted (explicit) statements until no further inferred (implicit) statements are produced.
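A minimal Python sketch of total materialisation follows. It assumes a rule-set reduced to two RDFS entailment rules (rdfs9 and rdfs11) written directly as Python functions. It uses naive fixpoint iteration: every rule is re-applied to the whole closure until a round adds nothing. GraphDB's own optimisations are not modelled.

```python
Triple = tuple[str, str, str]


def rdfs9(triples: set[Triple]) -> set[Triple]:
    # x rdfs:subClassOf y, z rdf:type x  =>  z rdf:type y
    return {(z, "rdf:type", y)
            for (x, p, y) in triples if p == "rdfs:subClassOf"
            for (z, q, c) in triples if q == "rdf:type" and c == x}


def rdfs11(triples: set[Triple]) -> set[Triple]:
    # x rdfs:subClassOf y, y rdfs:subClassOf z  =>  x rdfs:subClassOf z
    return {(x, "rdfs:subClassOf", z)
            for (x, p, y) in triples if p == "rdfs:subClassOf"
            for (y2, q, z) in triples if q == "rdfs:subClassOf" and y2 == y}


def materialise(explicit: set[Triple], rules=(rdfs9, rdfs11)) -> set[Triple]:
    """Apply every rule repeatedly until no new (implicit) statement is produced."""
    closure = set(explicit)
    while True:
        new = set().union(*(rule(closure) for rule in rules)) - closure
        if not new:
            return closure
        closure |= new
```

For the explicit statements {{ex:Dog rdfs:subClassOf ex:Mammal}}, {{ex:Mammal rdfs:subClassOf ex:Animal}} and {{ex:rex rdf:type ex:Dog}}, {{materialise}} adds three implicit statements: {{ex:Dog rdfs:subClassOf ex:Animal}}, {{ex:rex rdf:type ex:Mammal}} and {{ex:rex rdf:type ex:Animal}}.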

h2. Retraction of assertions

GraphDB stores both explicit and implicit statements, the latter being those inferred (materialised) from the explicit statements. It follows that when explicit statements are removed from the repository, any implicit statements that rely on the removed statements must also be removed.
This is achieved by re-computing the full closure (minimal model), i.e. applying the entailment rules to all remaining explicit statements and computing the inferences. This approach guarantees correctness but does not scale: the computation becomes slower and more expensive as the number of explicit statements and the complexity of the entailment rule-set grow.
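The asymmetry between adding and removing statements can be shown with a small Python sketch. {{Repository}} and {{closure}} are hypothetical names, and the rule-set is cut down to rdfs9 and rdfs11. Because these rules are monotonic, a new statement can be added by extending the existing closure. A removal, however, has to recompute the closure from the remaining explicit statements, since some implicit statements may have been derived from the statement being deleted.

```python
Triple = tuple[str, str, str]


def closure(statements: set[Triple]) -> set[Triple]:
    """Naive fixpoint over rdfs9 (type propagation) and rdfs11 (subClassOf transitivity)."""
    result = set(statements)
    while True:
        sub = [(x, y) for (x, p, y) in result if p == "rdfs:subClassOf"]
        new = {(z, "rdf:type", y) for (x, y) in sub
               for (z, q, c) in result if q == "rdf:type" and c == x}
        new |= {(x, "rdfs:subClassOf", z) for (x, y) in sub
                for (y2, z) in sub if y2 == y}
        new -= result
        if not new:
            return result
        result |= new


class Repository:
    def __init__(self) -> None:
        self.explicit: set[Triple] = set()
        self.store: set[Triple] = set()  # explicit + implicit statements

    def add(self, triple: Triple) -> None:
        # Monotonic rules: extending the current closure is enough.
        self.explicit.add(triple)
        self.store = closure(self.store | {triple})

    def remove(self, triple: Triple) -> None:
        # Deleting only `triple` would leave statements derived from it behind,
        # so the full closure is recomputed from the remaining explicit statements.
        self.explicit.discard(triple)
        self.store = closure(self.explicit)
```

After adding {{ex:Dog rdfs:subClassOf ex:Mammal}}, {{ex:Mammal rdfs:subClassOf ex:Animal}} and {{ex:rex rdf:type ex:Dog}}, removing the first statement also removes {{ex:rex rdf:type ex:Animal}}. Deleting that single statement from {{store}} would have left the derived statement in place.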

h1. Predefined Rule-Sets

GraphDB-Lite provides a number of predefined rule-sets that cover various well-known knowledge representation formalisms. The following table gives the details:
\\
|| Rule set || Description ||
| empty | No reasoning at all, i.e. GraphDB operates as a plain RDF store |
| rdfs | Supports the standard model-theoretic RDFS semantics |
| owl-horst | OWL dialect close to OWL Horst -- essentially {{pD\*}} |
| owl-max | RDFS and the part of OWL Lite that can be captured in rules (deriving functional and inverse functional properties, all-different, subclass by union/enumeration, min/max cardinality constraints, etc.) |
| owl2-ql | The OWL2 QL profile -- a fragment of OWL2 Full designed so that sound and complete query answering is LOGSPACE with respect to the size of the data. This OWL2 profile is based on DL-Lite{~}R~, a variant of DL-Lite that does not require the unique name assumption. |
| owl2-rl-reduced | The OWL2 RL profile -- an expressive fragment of OWL2 Full that is amenable to implementation on rule engines, but without the {{prp-key}} rule for efficiency reasons. |

h2. OWL2 RL non-conformance

The GraphDB-Lite reasoner does not support the quad patterns that are available in GraphDB-SE. Therefore, it manages the auxiliary ternary predicates (associated with the expansion of LIST structures in the bodies of the [OWL2 RL entailment rules|http://www.w3.org/TR/owl-profiles/#Reasoning_in_OWL_2_RL_and_RDF_Graphs_using_Rules]) using blank nodes. This is much less efficient than in GraphDB-SE, so the OWL2 RL rule-set is provided in two versions. {{owl2-rl-conf}} contains all the rules, but is not recommended for datasets larger than a few tens of thousands of statements. {{owl2-rl-reduced}} is identical, except that the {{prp-key}} rule has been removed. This makes it suitable for datasets of several hundred thousand statements.

h2. OWL2 QL non-conformance

The implementation of OWL2 QL is non-conformant with the W3C OWL2 profiles recommendation \[31\], as shown in Table 3:
\\
|| Conformant behaviour || Implemented behaviour ||
p owl:propertyDisjointWith q \\
Which is more likely to be useful for query answering. |
| For each class C in the knowledge base, infer the existence of an anonymous class that is the union of a list of classes containing only C. | Not supported. Even if this infinite expansion were possible in a forward-chaining rule-based implementation, the resulting statements would be of no use during query evaluation. |
| If a is an instance of C1, b is an instance of C2, and C1 and C2 are disjoint, then infer: \\
a owl:differentFrom b | Impractical for knowledge bases with many members of pairs of disjoint classes, e.g. WordNet. Instead, this is implemented as a consistency check: |

h1. Custom Rule-Sets

GraphDB has an internal rule compiler that can be used to configure a custom set of inference rules and axioms. The user may define a custom rule-set (see section 7.1.2) in a .pie file (e.g. MySemantics.pie). The easiest way to create a custom rule-set is to start by modifying one of the .pie files that were used to build the precompiled rule-sets. All of these are provided as part of the GraphDB-Lite distribution.
If the code generation or compilation cannot be completed successfully, a Java exception is thrown with an indication of the problem. It states either the Id of the rule or the complete line from the source file where the problem is located; line numbers are not reported, because line information is not preserved during the parsing of the rule file.
The user should specify the custom rule-set via the {{ruleset}} configuration parameter. The value of the {{ruleset}} parameter is interpreted as a filename, and '.pie' is appended when not present. This file is processed to create Java source code that is compiled using the compiler from the Java Development Kit (JDK). The compiler is invoked using the mechanism provided by JDK version 1.6 (or later). Therefore, a prerequisite for using custom rule-sets is that the application runs on the Java Virtual Machine (JVM) of a JDK version 1.6 (or later). If compilation succeeds, the class is loaded dynamically and instantiated for further use by GraphDB during inference. The intermediate files are created in the folder pointed to by the {{java.io.tmpdir}} system property. The JVM should have sufficient rights to read from and write to this directory.
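The file-name rule for the {{ruleset}} parameter can be expressed in a few lines. This Python sketch only mirrors the behaviour described in the previous paragraph. The function name {{ruleset_file}} is hypothetical, and the sketch does not attempt to reproduce how GraphDB distinguishes predefined rule-set names from custom files.

```python
def ruleset_file(value: str) -> str:
    """Interpret a `ruleset` parameter value as a .pie file name.

    '.pie' is appended only when the value does not already end with it.
    """
    return value if value.endswith(".pie") else value + ".pie"


for value in ["MySemantics", "MySemantics.pie", "/opt/rules/custom"]:
    print(value, "->", ruleset_file(value))
```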
In environments that use custom class-loader schemes, such as OSGi frameworks, the default method for locating .class files does not work when compiling rule files. To get around this, the rule compiler uses the values of two Java system properties, \-Dowlim-lite.X.Y.jar.file and \-Dopenrdf-model.jar.file, for the exact locations of the GraphDB and OpenRDF model libraries that should be used during the compilation process. The values should contain the exact file system path to both files.

{info}An important issue with custom rule-sets is that GraphDB requires at least one of the following RDF(S) resources to be mentioned within an axiomatic triple or a rule: {{rdf:type}}, {{rdfs:range}}, {{rdfs:domain}}, {{rdfs:subClassOf}}, {{rdfs:Class}} and {{rdfs:subPropertyOf}}. These are used in specific implementations related to basic RDF support and are hard-coded into GraphDB.
{info}
{info}Due to some of the optimisation techniques used in GraphDB, a custom rule-set requires at least one of its rules to derive the following triple patterns: {{x rdf:type y}}, {{x rdfs:subClassOf y}} and {{x rdfs:subPropertyOf y}} (the actual variable names do not matter).
{info}
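Both requirements can be checked before a custom rule-set is handed to the compiler. The Python sketch below is a hypothetical pre-flight check, not part of GraphDB. It represents a rule as a pair of premise and consequence lists and treats bare identifiers as variables. It reads the second requirement as "for each of the three patterns, some rule must derive it", which is one possible interpretation of the wording in the second note.

```python
Triple = tuple[str, str, str]
Rule = tuple[list[Triple], list[Triple]]  # (premises, consequences)

REQUIRED_RESOURCES = {"rdf:type", "rdfs:range", "rdfs:domain",
                      "rdfs:subClassOf", "rdfs:Class", "rdfs:subPropertyOf"}
REQUIRED_PATTERNS = ["rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"]


def is_var(term: str) -> bool:
    return term.isidentifier()  # assumption: "x", "y" are variables


def check_ruleset(axioms: list[Triple], rules: list[Rule]) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    mentioned = {term for triple in axioms for term in triple}
    for premises, consequences in rules:
        mentioned |= {term for triple in premises + consequences for term in triple}
    if not mentioned & REQUIRED_RESOURCES:
        problems.append("no axiom or rule mentions " + ", ".join(sorted(REQUIRED_RESOURCES)))
    for predicate in REQUIRED_PATTERNS:
        # Some rule must conclude a triple of the form <variable> <predicate> <variable>.
        derived = any(is_var(s) and p == predicate and is_var(o)
                      for _, consequences in rules for (s, p, o) in consequences)
        if not derived:
            problems.append(f"no rule derives 'x {predicate} y'")
    return problems
```

A rule-set containing only rdfs9 ({{x rdfs:subClassOf y}}, {{z rdf:type x}} => {{z rdf:type y}}) would be reported as missing rules for the {{rdfs:subClassOf}} and {{rdfs:subPropertyOf}} patterns.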
h1. Performance Optimisations in RDFS and OWL Support

There are several features in the RDFS and OWL specifications that result in rather inefficient entailment rules and axioms, which can have a significant impact on the performance of a reasoning engine. Such examples are:
* The consequence {{X rdf:type rdfs:Resource}} for each URI node in the RDF graph;
* The system should be able to infer that URIs are classes and properties if they appear in schema-defining statements like {{X rdfs:subClassOf Y}} and {{X rdfs:subPropertyOf Y}};
* The individual equality property in OWL is reflexive, i.e. the statement {{X owl:sameAs X}} holds for every OWL individual;
* All OWL classes are sub-classes of {{owl:Thing}} and for all individuals {{X rdf:type owl:Thing}} should hold;
* A class is inferred as being an {{rdfs:Class}} whenever an instance of the class is defined with {{I rdf:type C}}.

Although the above inferences are correct and important for the completeness of the formal semantics, users rarely execute queries whose results are affected by the existence of such statements. Moreover, these inferences generate so many inferred statements that performance and scalability can be severely degraded.
For this reason, optimised versions of the standard rule-sets are provided. These have '-optimized' appended to the rule-set name, e.g. {{owl-horst-optimized}}, and use the optimisations listed in Table 4.
\\
|| Optimisation || Affects ||
| Remove axiomatic triples | *<any> <any> <rdfs:Resource>* \\
*<rdfs:Resource> <any> <any>* |
| Remove rule conclusions | *<any> <any> <rdfs:Resource>* |
| Remove rule constraints | *\[Constraint <variable> \!= <rdfs:Resource>\]* |

These optimisations were previously achieved using the {{partialRDFS}} parameter, but are now achieved by using a pre-optimised built-in rule-set; see the {{ruleset}} parameter in the [configuration section|GraphDB-Lite Configuration] for a complete list.
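The three optimisations in Table 4 amount to filtering a rule-set. The Python sketch below applies them to a hypothetical in-memory representation: axioms are triples, and rules are dictionaries with premises, consequences and {{(variable, value)}} inequality constraints. Dropping a rule once all of its conclusions have been removed is an additional assumption, made because GraphDB does not support rules without consequences.

```python
RESOURCE = "rdfs:Resource"


def optimise(axioms, rules):
    """Apply the Table 4 optimisations and return new (axioms, rules) lists."""
    # Remove axiomatic triples <any> <any> <rdfs:Resource> and <rdfs:Resource> <any> <any>.
    kept_axioms = [(s, p, o) for (s, p, o) in axioms if RESOURCE not in (s, o)]
    kept_rules = []
    for rule in rules:
        # Remove rule conclusions <any> <any> <rdfs:Resource>.
        consequences = [c for c in rule["consequences"] if c[2] != RESOURCE]
        if not consequences:
            continue  # assumption: a rule with no conclusions left is dropped
        # Remove constraints of the form <variable> != <rdfs:Resource>.
        constraints = [(v, val) for (v, val) in rule["constraints"] if val != RESOURCE]
        kept_rules.append({**rule, "consequences": consequences, "constraints": constraints})
    return kept_axioms, kept_rules
```

The input lists and rule dictionaries are left unchanged, so an unoptimised and an optimised rule-set can be compared side by side.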