GraphDB-SE Reasoner

Skip to end of metadata
Go to start of metadata
GraphDB Documentation
This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

Next versions

GraphDB 6.5
GraphBD 6.6
GraphBD 7.0
GraphDB 7.1

Previous versions

GraphDB 6.3
GraphDB 6.2
GraphDB 6.0 & 6.1

OWLIM 5.4
OWLIM 5.3
OWLIM 5.2
OWLIM 5.1
OWLIM 5.0
OWLIM 4.4
OWLIM 4.3
OWLIM 4.2
OWLIM 4.1
OWLIM 4.0

Rule-Based Inference

There are two principle strategies for rule-based inference called 'forward-chaining' and 'backward-chaining'. They can be briefly explained as follows:

  • Forward-chaining: involves applying the inference rules to the known facts (explicit statements) to generate new facts. The rules can then be re-applied to the combination of original facts and inferred facts to produce yet more new facts. The process is iterative and continues until no new facts can be generated. This kind of reasoning can have diverse objectives, for instance: to compute the inferred closure, to answer a particular query, to infer a particular sort of knowledge (e.g. the class taxonomy), etc. The advantage of this approach is that when all inferences have been computed query answering can proceed extremely quickly. The disadvantages come from greater initialisation costs (inference computed at load time) and space/memory usage (especially when the number of inferred facts is very large);
  • Backward-chaining: involves starting with a fact to be proved or a query to be answered. Typically, the reasoner examines the knowledge base to see if the fact to be proved is present and if not it examines the rule set to see which rules could be used to prove it. For the latter case, a check is made to see what other 'supporting' facts would need to be present to 'fire' these rules. The reasoner searches for proofs of each of these 'supporting' facts in the same way and iteratively maps out a search tree. The process terminates when either all of the leaves of the tree have proofs or no new candidate solutions can be found. Query processing is similar, but only stops when all search paths have been explored. The purpose in query answering is to find not just one, but all possible substitutions in the query expression. The advantages of this approach are that there are no inferencing costs at start-up and minimal space requirements. The disadvantage is that inference must be done each and every time a query is answered and for complex search graphs this can be computationally expensive and slow.

Both of these strategies have their advantages and disadvantages, which have been well studied in the history of KR and expert systems. Attempts to overcome the weak points have led to the development of various hybrid strategies (involving partial forward- and backward-chaining), which have proven efficient in many contexts.

Reasoning and materialisation are discussed in some detail in the GraphDB Primer. The handling of explicit and inferred statements is explained in Managing Explicit and Implicit Statements

GraphDB-SE's Logical Formalism

RDFS inference is achieved via a set of axiomatic triples and entailment rules. These rules allow the full set of valid inferences using RDFS semantics to be determined. Herman ter Horst in [25] defines RDFS extensions for more general rule support and a fragment of OWL, which is more expressive than DLP and fully compatible with RDFS. First, he defines R-entailment, which extends RDFS-entailment in the following ways:

  • it can operate on the basis of any set of rules R (i.e. allows for extension or replacement of the standard set, defining the semantics of RDFS);
  • it operates over so-called generalized RDF graphs, where blank nodes can appear as predicates (a possibility disallowed in RDF);
  • rules without premises are used to declare axiomatic statements;
  • rules without consequences are used to detect inconsistencies (integrity constraints).

GraphDB uses a notation almost identical to R-Entailment defined by Horst. GraphDB-SE performs reasoning based on forward-chaining of entailment rules defined using RDF triple patterns with variables. GraphDB-SE's reasoning strategy is one of 'materialisation', which is introduced in the GraphDB Primer in the Reasoning Strategies topic.

Rule Format and Semantics

The rule format and the semantics enforced is analogous to R-entailment (see the Rule-Based Inference topic on page and [25]) with the following differences:

  • Free variables in the head (without binding in the body) are treated as blank nodes. This feature must be used with extreme care, because custom rule-sets can easily be created, which recursively infer an infinite number of statements making the semantics intractable;
  • Variable inequality constraints can be specified in addition to the triple patterns (they can be placed after any premise or consequence). This leads to lower complexity compared to R-entailment;
  • the [cut] operator can be associated with rule premises. This is an optimisation that tells the rule compiler not to generate a variant of the rule with the identified rule premise as the first triple pattern;
  • Context can be used for both rule premises and rule consequences allowing more expressive constructions that utilise 'intermediate' statements contained within the given context URI;
  • Consistency checking rules do not have consequences and will indicate an inconsistency when the premises are satisfied;
  • Axiomatic triples can be provided as a set of statements, although these are not modelled as rules with empty bodies.

GraphDB-SE can be configured via "rule-sets" – sets of axiomatic triples, consistency checks and entailment rules, which determine the applied semantics. The implementation of GraphDB-SE relies on a compile stage, during which the rules are compiled into Java source code, which is then further compiled, using the Java compiler, and merged together with the inference engine.

The Rule Language

A rule-set file has three sections named Prefices, Axioms, and Rules. All sections are mandatory and must appear sequentially in this order. Comments are allowed anywhere and follow the Java convention, i.e. "/* ... */" for block comments and "//" for end of line comments.

For historic reasons, the way in which terms (variables, URLs and literals) are written differs from Turtle and SPARQL:

  • URLs in Prefices are written without angle brackets
  • variables are written without ? or $ and can include multiple alphanumeric chars
  • URLs are written in brackets, no matter if they are use prefix or are spelt in full
  • datatype URLs are written without brackets, eg
    a <owl:maxQualifiedCardinality> "1"^^xsd:nonNegativeInteger

See the examples below. Please be careful when writing terms. If you make a mistake, OWLIM will fail to start. You can see the syntax error in the error log, which can be viewed e.g. with Sesame at http://<server>/openrdf-sesame/system/logging/overview.view

Prefices

This section defines abbreviations for the namespaces used in the rest of the file. The syntax is:

shortname : URI

A typical prefices section might look like this:

Prefices
{
    rdf  :  http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs :  http://www.w3.org/2000/01/rdf-schema#
    owl  :  http://www.w3.org/2002/07/owl#
    xsd  :  http://www.w3.org/2001/XMLSchema#
}

Axioms

This section is used to assert 'axiomatic triples', which are usually used to describe the meta-level primitives used to define the schema, such as rdf:type, rdfs:Class, etc. This section contains a list of the (variable free) triples, one per line. For example, the RDF axiomatic triples are defined thus:

Axioms
{
    // RDF axiomatic triples
    <rdf:type>      <rdf:type> <rdf:Property>
    <rdf:subject>   <rdf:type> <rdf:Property>
    <rdf:predicate> <rdf:type> <rdf:Property>
    <rdf:object>    <rdf:type> <rdf:Property>
    <rdf:first>     <rdf:type> <rdf:Property>
    <rdf:rest>      <rdf:type> <rdf:Property>
    <rdf:value>     <rdf:type> <rdf:Property>
    <rdf:nil>       <rdf:type> <rdf:List>
}
Axiomatic statements are considered to be inferred for the purpose of query-answering, because they are a result a semantic interpretation defined by the chosen rule-set.

Rules

This section is used to define entailment rules and consistency checks, which share a similar format. Each definition consists of premises and corollaries that are RDF statements defined with subject, predicate, object and optional context components. The subject, predicate and object can each be a variable, blank node, literal, full URI or the short name for a URI. If given, the context must be a full URI or a short name for a URI. Variables are alpha-numeric and must begin with a letter.
If the context is provided, then the statements produced as rule consequences are not 'visible' during normal query answering. Instead, they can only be used as input to this or other rules and then only when the rule premise explicitly uses the given context. See the example below.
Furthermore, inequality constraints can be used to state that the values of the variables in a statement must not be equal to a specific full URI (or its short name) or to the value of another variable within the same rule. The behaviour of an inequality constraint depends on whether it is placed in the body or head of a rule. If it is placed in the body of a rule, then the whole rule will not 'fire' if the constraint fails, i.e. the constraint can be next to any statement pattern in the body of a rule with the same behaviour (the constraint does not have to be placed next to the variables it references). If the constraint is in the head, then its location is significant, because a constraint that does not hold will prevent only the statement it is adjacent to from being inferred.

Entailment rules

The syntax of a rule definition is as follows:

Id: <rule_name>
    <premises> <optional_constraints>
    -------------------------------
    <consequences> <optional_constraints>

Where each premise and consequence is on a separate line. The following example helps to illustrate the possibilities:

Rules
{
Id: rdf1_rdfs4a_4b
    x  a  y
    -------------------------------
    x  <rdf:type>  <rdfs:Resource>
    a  <rdf:type>  <rdfs:Resource>
    y  <rdf:type>  <rdfs:Resource>

Id: rdfs2
    x  a  y                 [Constraint a != <rdf:type>]
    a  <rdfs:domain>  z     [Constraint z != <rdfs:Resource>]
    -------------------------------
    x  <rdf:type>  z

Id: owl_FunctProp
    p  <rdf:type>  <owl:FunctionalProperty>
    x  p  y       [Constraint y != z, p != <rdf:type>]
    x  p  z       [Constraint z != y] [Cut]
    -------------------------------
    y  <owl:sameAs>  z
}

The symbols p, x, y, z and a are variables. The second rule contains two constraints that reduce the number of bindings for each premise, i.e. they 'filter out' those statements where the constraint does not hold.
In a forward chaining inference step, a rule is interpreted as meaning that for all possible ways of satisfying the premises, the bindings for the variables are used to populate the consequences of the rule. This generates new statements that will manifest themselves in the repository, for example, by being returned as query results.
The last rule contains an example of using the [Cut] operator, which is an optimisation hint for the rule compiler. When rules are compiled, a different variant of the rule is created for each premise, so that each premise occurs as the first triple pattern in one of the variants. This is done so that incoming statements can be efficiently matched to appropriate inferences rules. However, when a rule contains two or more premises that match identical triples patterns, but using different variable names, then the extra variant(s) are redundant and better efficiency can be achieved by simply not creating the extra rule variant(s). In the above example, rule owl_FunctProp would by default be compiled in to three variants:

p  <rdf:type>  <owl:FunctionalProperty>
x  p  y
x  p  z
-------------------------------
y  <owl:sameAs>  z


x  p  y
p  <rdf:type>  <owl:FunctionalProperty>
x  p  z
-------------------------------
y  <owl:sameAs>  z


x  p  z
p  <rdf:type>  <owl:FunctionalProperty>
x  p  y
-------------------------------
y  <owl:sameAs>  z

As can be seen, the last two variants are identical apart from the rotation of variables y and z, so one of these variants is not needed. The use of the [Cut] operator above tells the rule compiler to eliminate this last variant, i.e. the one beginning with the premise x p z.
The use of context in rule bodies and rule heads is also best explained by example. The following three rules implement the OWL2-RL property chain rule (prp-spo2) [31] and are inspired by the Rule Interchange Format (RIF) implementation [27]:

Id: prp-spo2_1
    p <owl:propertyChainAxiom> pc
    start pc last                [Context <onto:_checkChain>]
    ----------------------------
    start p last

Id: prp-spo2_2
    pc <rdf:first> p
    pc <rdf:rest> t              [Constraint t != <rdf:nil>]
    start p next
    next t last                  [Context <onto:_checkChain>]
    ----------------------------
    start pc last                [Context <onto:_checkChain>]

Id: prp-spo2_3
    pc <rdf:first> p
    pc <rdf:rest> <rdf:nil>
    start p last
    ----------------------------
    start pc last                [Context <onto:_checkChain>]

The RIF rules that implement prp-spo2 use a relation (unrelated to the input or generated triples) called _checkChain. The GraphDB implementation maps this relation to the 'invisible' context of the same name with the addition of [Context <onto:_checkChain>] to certain statement patterns. Generated statements with this context can only be used for bindings to rule premises when the exact same context is specified in the rule premise. The generated statements with this context will not be used for any other rules.

Consistency checks

Consistency checks are used to ensure that the data model is in a consistent state and are applied whenever an update transaction is committed. The syntax is similar to that of rules, except that Consistency replaces the Id tag that introduces normal rules. Also consistency checks do not have any consequences and will indicate an inconsistency, whenever their premises can be satisfied, e.g.

Consistency: something_can_not_be_nothing
    x rdf:type owl:Nothing
    -------------------------------


Consistency: both_sameAs_and_differentFrom_is_forbidden
    x owl:sameAs y
    x owl:differentFrom y
    -------------------------------

In case of any consistency check(s) failure, when a transaction is committed and consistency checking is switched on (it is off by default - see the configuration section), then:

  • A message is logged with the details of what consistency checks failed;
  • An exception is thrown with the same details;
  • The whole transaction is rolled back.

Materialisation

A GraphDB repository uses the configured rule-set to compute all inferred statements at load time. To some extent, this process increases the processing cost and time taken to load a repository with a large amount of data. However, it has the desirable advantage that subsequent query evaluation can proceed extremely quickly.
Apart from a number of optimisations, the approach taken by GraphDB is one of 'total materialisation', where the inference rules are applied repeatedly to the asserted (explicit) statements until no further inferred (implicit) statements are produced.

Retraction of assertions

GraphDB stores explicit and implicit statements, i.e. those statements inferred (materialised) from the explicit statements. It follows therefore, that when explicit statements are removed from the repository, any implicit statements that rely on the removed statement must also be removed.
In previous versions of GraphDB, this was achieved with a re-computation of the full closure (minimal model), i.e. applying the entailment rules to all explicit statements and computing the inferences. This approach guarantees correctness, but does not scale - the computation is increasingly slow and computationally expensive in proportion to the number of explicit statements and the complexity of the entailment rule-set.
Removal of explicit statements is now achieved in a more efficient manner, by invalidating only those inferred statements that can no longer be derived in any way.
One approach is to maintain track information for every statement, typically the list of statements that can be inferred from this statement. The list is built up during inference as the rules are applied and the statements inferred by the rules are added to the lists of all statements that triggered the inferences. The drawback of this technique is that track information inflates more rapidly than the inferred closure - in the case of large datasets up to 90% of the storage is required just to store the track information.
Another approach is to perform backward chaining. Backward chaining does not require track information, since it essentially re-computes the tracks as required. Instead, a flag for each statement is used, so that the algorithm can detect when a statement has been previously visited and thus avoid an infinite recursion. The algorithm used in GraphDB-SE works as follows:

  1. Apply a 'visited' flag to all statements ('false' by default);
  2. Store the statements to be deleted in the list L;
  3. For each statement in L that is not visited yet, mark it as visited and apply the forward chaining rules. Statements marked as visited become invisible, which is why the statement must be first marked and then used for forward chaining;
  4. If there are no more unvisited statements in L then END;
  5. Store all inferred statements in the list L1;
  6. For each element in L1 check the following:
    • If the statement is purely implicit statement (a statement can be both explicit and implicit and if so then it is not considered purely implicit), then mark it as deleted (prevent it from being returned by the iterators) and check whether it is supported by other statements. The isSupported() method uses queries that contain the premises of the rules and the variables of the rules are preliminarily bound using the statement in question. That is to say the isSupported() method starts from the projection of the query and then checks whether the query will return results (at least one), i.e. this method performs backward chaining.
    • If a result is returned by any query (every rule is represented by a query) in isSupported(), then this statement can be still derived from other statements in the repository, so it must not be deleted (its status is returned to 'inferred').
    • If all queries return no results, then this statement can no longer be derived from any other statements, so its status remains 'deleted' and the number of statements counter is updated.
  7. L := L1 and GOTO 3.

Special care is taken when retracting owl:sameAs statements, so that the algorithm still works correctly, when modifying equivalence classes.

One consequence of this algorithm is that deletion can still have poor performance when deleting schema statements, due to the (probably) large number of implicit statements inferred from them.
The forward chaining part of the algorithm terminates as soon as it detects that a statement is read-only, because if it cannot be deleted, there is no need to look for statements derived from it. For this reason, performance can be greatly improved when all schema statements are made read-only by importing ontologies (and OWL/RDFS vocabularies) using the imports repository parameter.

Schema update transactions

In situations when fast statement retraction is required, but it is also necessary to update schemas, a special statement pattern can be used. By including an insert for a statement with the following form in the update:

[] <http://www.ontotext.com/owlim/system#schemaTransaction> []

GraphDB will use the smooth-delete algorithm, but will also traverse read-only statements and allow them to be deleted/inserted. Such transactions are likely to be be much more computationally expensive to achieve, but are intended for the occasional, offline update to otherwise read-only schemas. The advantage is that fast-delete can still be used, but a repository export and import is not required when making a modification to a schema.

For any transaction that includes an insert of the above special predicate/statement:

  • Read-only (explicit or inferred) statements can be deleted;
  • New explicit statements are marked as read-only;
  • New inferred statements are marked:
    • Read-only if all the premises that fired the rule are read-only;
    • Normal otherwise.

Schema statements can be inserted or deleted using SPARQL Update as follows:

DELETE {
  [[schema statements to delete]]
}
INSERT {
  [] <http://www.ontotext.com/owlim/system#schemaTransaction> [] .
  [[schema statements to insert]]
}
WHERE { }

Reinferring

Statements are inferred only when you insert new statements. So if reconnect to a repository with a different rule set, it does not take effect immediately. However, you can cause reinference with an Update statement like this one:

INSERT DATA { [] <http://www.ontotext.com/owlim/system#reinfer> [] }

This removes all inferred statements and reinfers from scratch. If a statement is both explicitly inserted and inferred, it is not removed, see Managing Explicit and Implicit Statements.

Predefined Rule-Sets

There are a number of pre-defined rule-sets provided with GraphDB-SE, which cover various well known knowledge representation formalisms. The following table gives the details:

Rule set Description
empty No reasoning at all, i.e. GraphDB operates as a plain RDF store
rdfs Supports the standard model-theoretic RDFS semantics
owl-horst OWL dialect close to OWL Horst – essentially pD*
owl-max RDFS and that part of OWL Lite that can be captured in rules (deriving functional and inverse functional properties, all-different, subclass by union/enumeration; min/max cardinality constraints, etc.)
owl2-ql The OWL2 QL profile – a fragment of OWL2 Full designed so that sound and complete query answering is LOGSPACE with respect to the size of the data. This OWL2 profile is based on DL-LiteR, a variant of DL-Lite that does not require the unique name assumption.
owl2-rl The OWL2 RL profile – an expressive fragment of OWL2 Full that is amenable for implementation on rule-engines.
Note that all rule-sets do not support data-type reasoning, which is the main reason that OWL-Horst is not the same as pD*. The rule-set to be used for a specific repository is defined through the ruleset parameter. There are optimised versions of all rule-sets that avoid some little used inferences.

OWL2 QL non-conformance

The implementation of OWL2 QL is non-conformant with the W3C OWL2 profiles recommendation [31] as shown in Table 3:

Conformant behaviour Implemented behaviour
Given a list of disjoint (data or object) properties and an entity that is related with these properties to objects {a, b, c, d,...} then infer an owl:AllDifferent restriction on an anonymous list of these objects. For each pair {p, q} (p != q) of disjoint (data or object) properties then infer the triple:
p owl:propertyDisjointWith q
Which is more likely to be useful for query answering.
For each class C in the knowledge base infer the existence of an anonymous class that is the union of a list of classes containing only C. Not supported. Even if this infinite expansion were possible in a forward-chaining rule-based implementation, the resulting statements are of no use during query evaluation.
If a instance of C1, and b instance of C2, and C1 and C2 disjoint then infer:
a owl:differentFrom b
Impractical for knowledge bases with many members of pairs of disjoint classes, e.g. Wordnet. Instead this is implemented as a consistency check:
If x instance of C1 and C2, and C1 and C2 disjoint then inconsistent.

Custom Rule-Sets

GraphDB has an internal rule compiler that can be configured with a custom set of inference rules and axioms. The user may define a custom rule-set (see 'The Rule Language' on page ) in a .pie file (e.g. MySemantics.pie). The easiest way to create a custom rule-set is to start modifying one of the .pie files that were used to build the precompiled rule-sets. All of these are provided as part of the GraphDB-SE distribution.
If the code generation or compilation cannot be completed successfully, a Java exception is thrown with an indication of the problem. It will state either the Id of the rule or the complete line from the source file where the problem is located. Line information is not preserved during the parsing of the rule file.
The user should specify the custom rule-set via the ruleset configuration parameter. The value of the ruleset parameter is interpreted as a filename and '.pie' is appended when not present. This file is processed to create Java source code that is compiled using the compiler from the Java Development Kit (JDK). The compiler is invoked using the mechanism provided by the JDK version 1.6 (or later). Therefore, a prerequisite for using custom rule-sets is that the Java Virtual Machine (JVM) from a JDK version 1.6 (or later) is used to run the application. If all goes well, the class is loaded dynamically and instantiated for further use by GraphDB-SE during inference. The intermediate files are created in the folder that is pointed by the java.io.tmpdir system property. The JVM should have sufficient rights to read and write to this directory.

Performance Optimisations in RDFS and OWL Support

There are several features in the RDFS and OWL specifications that result in rather inefficient entailment rules and axioms, which can have a significant impact on the performance of a reasoning engine. Such examples are:

  • The consequence X rdf:type rdfs:Resource for each URI node in the RDF graph;
  • The system should be able to infer that URIs are classes and properties, if they appear in schema-defining statements like X rdfs:subClassOf Y and X rdfs:subPropertyOf Y;
  • The individual equality property in OWL is reflexive, i.e. the statement X owl:sameAs X holds for every OWL individual;
  • All OWL classes are sub-classes of owl:Thing and for all individuals X rdf:type owl:Thing should hold;
  • C is inferred as being a rdfs:Class whenever an instance of the class is defined: I rdf:type C.

Although the above inferences are important for formal semantics completeness, users rarely execute queries that seek such statements. Moreover, these inferences generate so many inferred statements that performance and scalability can be severely degraded.
For this reason, optimised versions of the standard rule-sets are provided. These have '-optimized' appended to the rule-set name, e.g. owl-horst-optimized. (Note: previously these optimization were achieved using the partialRDFS parameter.)

The following optimisations are enacted:

Optimization Affects patterns
Remove axiomatic triples
any any <rdfs:Resource>
<rdfs:Resource> any any
any <rdfs:domain> <rdf:Property>
any <rdfs:range> <rdf:Property>
<owl:sameAs> <rdf:type> <owl:SymmetricProperty>
<owl:sameAs> <rdf:type> <owl:TransitiveProperty>
Remove rule conclusions
any any <rdfs:Resource>
Remove rule constraints
[Constraint var != <rdfs:Resource>]

sameAs Optimisation

The performance of a GraphDB-SE repository is greatly improved by a specific optimisation that handles owl:sameAs statements efficiently. owl:sameAs declares that two different URIs identify the same resource. Most often, it is used to align identifiers of the same real-world entity used in different data sets. For example, in DBPedia the URI of Vienna is http://dbpedia.org/page/Vienna, while in Geonames it is http://sws.geonames.org/2761369. DBpedia contains the statement

(S1) dbpedia:Vienna owl:sameAs geonames:2761369

that declares the two URIs as equivalent.

owl:sameAs is probably the most important OWL predicate when it comes to integrating data from different data sources and interlinking RDF datasets. However, its semantics causes explosion of the number of inferred statements. Following the formal definition of OWL (OWL2 RL, to be more specific), whenever two URIs are declared equivalent, all statements that involve one of the URIs should be "replicated" using the other URI in the same position. For instance, in Geonames the city of Vienna is defined as part of http://www.geonames.org/2761367 (the first-order administrative division in Austria with the same name), which in turn, is part of Austria http://www.geonames.org/2782113:

(S2) geonames:2761369 gno:parentFeature geonames:2761367
(S3) geonames:2761367 gno:parentFeature geonames:2782113

Since gno:parentFeature is a transitive relationship, it will be inferred that the city of Vienna is also part of Austria:

(S4) geonames:2761369 gno:parentFeature geonames:2782113

Due to the semantics of owl:sameAs from (S1) it should also be inferred that statements (S2) and (S4) also hold for Vienna's DBpedia URI:

(S5) dbpedia:Vienna gno:parentFeature geonames:2761367
(S6) dbpedia:Vienna gno:parentFeature geonames:2782113

These implicit statements must hold no matter which of the equivalent URIs is used in a query. When we consider that Austria also has an equivalent URI in DBpedia:

(S7) geonames:2782113 owl:sameAs dbpedia:Austria

we should also infer that:

(S8) dbpedia:Vienna gno:parentFeature dbpedia:Austria
(S9) geonames:2761369 gno:parentFeature dbpedia:Austria
(S10) geonames:2761367 gno:parentFeature dbpedia:Austria

In the above example, we had two alignment statements (S1 and S7), two statements carrying specific factual knowledge (S2 and S3), one statement inferred due to a transitive property (S4), and seven statements inferred as a result of owl:sameAs expansion (S5, S7, S8, S9, S10, as well as the inverse statements of S1 and S7).
As we see, inference without owl:sameAs increased the dataset by 25% (one new statement on top of 4 explicit), while owl:sameAs inference increased the full closure by 175% (7 new statements). But Vienna also has a URI in UMBEL: if we add an owl:sameAs for this alignment, it will cause the inference of 4 new implicit statements (duplicates of S1, S5, S6, and S8). Although this is a small example, it provides a indication about the performance implications of using owl:sameAs alignment statements from Linked Open Data.

Furthermore, owl:sameAs is an equivalence relation (transitive, reflexive, and symmetric). Thus for a set of N equivalent URIs, N2 (N squared) owl:sameAs statements should be considered.

GraphDB handles owl:sameAs in a special manner, avoiding both problems. It does not explode the statement indices, nor stores a quadratic number of owl:sameAs statements per cluster (equivalence class). Instead, each owl:sameAs cluster is represented by a single super-node, and all statements are recorded against a selected representative of each cluster. During query evaluation, GraphDB uses a kind of backward chaining by enumerating equivalent URIs, guaranteeing completeness of inference and query results. Special care is taken to ensure that this optimisation does not hinder the ability to distinguish between explicit and implicit statements.

This occurs even when the 'empty' (no inference) rule-set is selected. The owl:sameAs optimisation can be disabled completely using the disable-sameAs configuration parameter, see GraphDB-SE Configuration for details.
Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.