There are two principle strategies for rule-based inference called 'forward-chaining' and 'backward-chaining'. They can be briefly explained as follows:
Both of these strategies have their advantages and disadvantages, which have been well studied in the history of KR and expert systems. Attempts to overcome the weak points have led to the development of various hybrid strategies (involving partial forward- and backward-chaining) which have proven efficient in many contexts
Reasoning and materialization are discussed in some detail in the OWLIM Primer .
RDFS inference is achieved via a set of axiomatic triples and entailment rules. These rules allow the full set of valid inferences using RDFS semantics to be determined. Herman ter Horst in  defines RDFS extensions for more general rule support and a fragment of OWL, which is more expressive than DLP and fully compatible with RDFS. First, he defines R-entailment, which extends RDFS-entailment in the following ways:
OWLIM uses a notation almost identical to R-Entailment defined by Horst. One major difference is that two forms of consistency checking rules are permitted. The first form is the same as defined by R-entailment, i.e. a rule without a consequence indicates inconsistency when its premises are satisfied. The second form has consequences that identify statements that must exist when the premises are true. OWLIM-SE performs reasoning based on forward-chaining of entailment rules defined using RDF triple patterns with variables. OWLIM-SE's reasoning strategy is 'total materialisation', which is introduced in the OWLIM Primer in the Reasoning Strategies topic
The rule format and the semantics enforced is analogous to R-entailment (see the Rule-Based Inference topic on page and ) with the following differences:
OWLIM-SE can be configured via "rule-sets" – sets of axiomatic triples, consistency checks and entailment rules - that determine the applied semantics. The implementation of OWLIM-SE relies on a compile stage, during which the rules are compiled into Java source code that is then further compiled using the Java compiler and merged together with the inference engine.
A rule-set file has three sections named Prefices, Axioms, and Rules. All sections are mandatory and must appear sequentially in this order. Comments are allowed anywhere and follow the Java convention, i.e. "/* ... */" for block comments and "//" for end of line comments.
This section defines abbreviations for the namespaces used in the rest of the file. The syntax is:
A typical prefices section might look like this:
This section is used to assert 'axiomatic triples', which are usually used to describe the meta-level primitives used to define the schema, such as rdf:type, rdfs:Class, etc. This section contains a list of the (variable free) triples, one per line. For example, the RDF axiomatic triples are defined thus:
This section is used to define entailment rules and consistency checks, which share a similar format. Each definition consists of premises and corollaries that are RDF statements defined with subject, predicate, object and optional context components. The subject, predicate and object can each be a variable, blank node, literal, full URI or the short name for a URI. If given, the context must be a full URI or a short name for a URI. Variables are alpha-numeric and must begin with a letter.
The syntax of a rule definition is as follows:
Where each premise and consequence is on a separate line. The following example helps to illustrate the possibilities:
The symbols p, x, y, z and a are variables. The second rule contains two constraints that reduce the number of bindings for each premise, i.e. they 'filter out' those statements where the constraint does not hold.
As can be seen, the last two variants are identical apart from the rotation of variables y and z, so one of these variants is not needed. The use of the [Cut] operator above tells the rule compiler to eliminate this last variant, i.e. the one beginning with the premise x p z.
The RIF rules that implement prp-spo2 use a relation (unrelated to the input or generated triples) called _checkChain. The OWLIM implementation maps this relation to the 'invisible' context of the same name with the addition of [Context <onto:_checkChain>] to certain statement patterns. Generated statements with this context can only used for bindings to rule premises when the exact same context is specified in the rule premise. The generated statements with this context will not be used for any other rules.
Consistency checks are used to ensure that the data model is in a consistent state and are applied whenever an update transaction is committed. The syntax is similar to that of rules, except that Consistency replaces the Id tag that introduces normal rules and furthermore consistency checks do not need to have any consequences. Consistency checks that have no consequences will indicate an inconsistency whenever their premises can be satisfied, e.g.
These inconsistency checks will output an error message to standard output whenever their premises are satisfied. No other action will be taken (no exception is thrown and the behaviour of the repository is not changed). The second consistency check describes a contradiction, which are typically expressed by inconsistency checks without consequences.
Consistency checks can have multiple consequences, but the semantics will remain the same – when all the premises are satisfied and one of the consequences is not found in the repository, then the data is inconsistent and an error message is written to standard output. The error message will include the statements that caused the inconsistency. The mechanism of inconsistency checking is switched off by default. It can be switched on by using the boolean parameter check-for-inconsistencies, see section 8.5.
An OWLIM repository will use the configured rule-set to compute all inferred statements at load time. To some extent, this process increases processing cost and time taken to load a repository with a large amount of data. However, it has the desirable advantage that subsequent query evaluation can proceed extremely quickly.
OWLIM stores explicit and implicit statements, i.e. those statements inferred (materialized) from the explicit statements. It follows therefore, that when explicit statements are removed from the repository, any implicit statements that rely on the removed statement must also be removed.
Special care is taken is taken when retracting owl:sameAs statements, so that the algorithm still works correctly when modifying equivalence classes.
In situations when fast statement retraction is required, but it is also necessary to update schemas, a special statement pattern can be used. By including a statement with the following form in the update:
where ?subject and ?object can be anything, OWLIM will use the smooth-delete algorithm, but will also traverse read-only statements and allow them to be deleted/inserted. Such transactions are likely to be be much more computationally expensive to achieve, but are intended for the occasional, offline update to otherwise read-only schemas. The advantage is that fast-delete can still be used, but a repository export and import is not required when making a modification to a schema.
For any transaction that includes the above special predicate:
There are a number of pre-defined rule-sets provided with OWLIM-SE that cover various well known knowledge representation formalisms. The following table gives the details:
The implementation of OWL2 QL is non-conformant with the W3C OWL2 profiles recommendation  as shown in Table 3:
OWLIM has an internal rule compiler that can be configured with a custom set of inference rules and axioms. The user may define a custom rule-set (see 'The Rule Language' on page ) in a .pie file (e.g. MySemantics.pie). The easiest way to create a custom rule-set is to start modifying one of the .pie files that were used to build the precompiled rule-sets. All of these are provided as part of the OWLIM-SE distribution.
There are several features in the RDFS and OWL specifications that result in rather inefficient entailment rules and axioms, which can have a significant impact on the performance of a reasoning engine. Such examples are:
Although the above inferences are correct and important for the completeness of the formal semantics, users rarely execute queries whose results are affected by the existence of such statements. Moreover, these inferences generate so many inferred statements that performance and scalability can be severely degraded.
These optimization were previously achieved using the partialRDFS parameter, but are now achieved by using a previously optimized built-in rule-set, see the ruleset parameter in the configuration section for a complete list.
The performance of a OWLIM-SE repository is greatly improved with a specific optimisation that allows it to handle owl:sameAs statements efficiently. owl:sameAs is an OWL predicate that declares that two different URIs identify one and the same resource. Most often, it is used to align different identifiers of the same real-world entity used in different data sources. For example, in DBPedia, the URI of Vienna is http://dbpedia.org/page/Vienna, while in Geonames it is http://sws.geonames.org/2761369/. DBpedia contains the statement
which declares that the two URIs are equivalent. owl:sameAs is probably the most important OWL predicate when it comes to merging data from different data sources.
Since gno:parentFeature is a transitive relationship, it will be inferred that the city of Vienna is also part of Austria:
Due to the semantics of owl:sameAs from (S1) it should also be inferred that statements (S2) and (S4) also hold for Vienna's DBpedia URI:
These implicit statements must hold no matter which one of the equivalent URIs is used, i.e. if a query is evaluated, the same results will be returned. When we consider that Austria, too, has an equivalent URI in DBpedia:
we should also infer that:
In the above example, we had two alignment statements (S1 and S7), two statements carrying specific factual knowledge (S2 and S3), one statement inferred due to a transitive property (S4), and seven statements inferred as a result of owl:sameAs alignment (S5, S7, S8, S9, S10, as well as the inverse statements of S1 and S7). As we see, inference without owl:sameAs inflated the dataset by 25% (one new statement on top of 4 explicit), while the presence of the owl:sameAs statements increased the full closure by 175% (7 new statements). Considering that Vienna has a URI also in UMBEL, which is also declared equivalent to the one in DBpedia, the addition of one more explicit statement for this alignment, will cause the inference of 4 new implicit statements (duplicates of S1, S5, S6, and S8). Although this is a small example, it provides a indication about the performance implications of using owl:sameAs alignment statements from Linked Open Data. Also, because owl:sameAs is a transitive, reflexive, and symmetric relationship, for a set of N equivalent URIs N2 (N squared) owl:sameAs statements will be generated for each pair of URIs (in reality there are not that many examples of large owl:sameAs equivalence classes). Thus, although owl:sameAs is useful for interlinking RDF datasets, its semantics causes considerable inflation of the number of implicit statements that should be considered during inference and query evaluation (either through forward- or through backward-chaining).
Skip to end of metadata Go to start of metadata