GraphDB-SE Reasoner

* The individual equality property in OWL is reflexive, i.e. the statement *X owl:sameAs X* holds for every OWL individual;
* All OWL classes are sub-classes of *owl:Thing* and for all individuals *X rdf:type owl:Thing* should hold;
* A class *C* is inferred as being a *rdfs:Class* whenever an instance of the class is defined: *I rdf:type C*.

Although the above inferences are important for the completeness of the formal semantics, users rarely execute queries that seek such statements. Moreover, these inferences generate so many inferred statements that performance and scalability can be severely degraded.
For this reason, optimised versions of the standard rule-sets are provided. These have '-optimized' appended to the rule-set name, e.g. *owl-horst-optimized*; see the *ruleset* parameter in the [configuration section|GraphDB-SE Configuration] for a complete list. (Note: previously these optimisations were achieved using the *partialRDFS* parameter.)

The following optimisations are enacted:
|| Optimization || Affects patterns ||
| Remove axiomatic triples | {noformat}any any <rdfs:Resource>
<rdfs:Resource> any any
any <rdfs:domain> <rdf:Property>
any <rdfs:range> <rdf:Property>
<owl:sameAs> <rdf:type> <owl:SymmetricProperty>
<owl:sameAs> <rdf:type> <owl:TransitiveProperty>{noformat} |
| Remove rule conclusions | {noformat}any any <rdfs:Resource>{noformat}|
| Remove rule constraints | {noformat}[Constraint var != <rdfs:Resource>]{noformat} |
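
To make the "Remove axiomatic triples" row concrete, the following is a minimal sketch of how the listed patterns could be matched against a set of triples. It is only an illustration: the names {{ANY}}, {{PATTERNS}}, {{matches}} and {{drop_axiomatic}} are hypothetical, the sample triples are made up for the example, and nothing here reflects how the optimised rule-sets are actually implemented.

```python
# Hypothetical wildcard standing in for "any" in the table above.
ANY = None

# The six patterns from the "Remove axiomatic triples" row.
PATTERNS = [
    (ANY, ANY, "rdfs:Resource"),
    ("rdfs:Resource", ANY, ANY),
    (ANY, "rdfs:domain", "rdf:Property"),
    (ANY, "rdfs:range", "rdf:Property"),
    ("owl:sameAs", "rdf:type", "owl:SymmetricProperty"),
    ("owl:sameAs", "rdf:type", "owl:TransitiveProperty"),
]


def matches(triple, pattern):
    """True if every non-wildcard position of the pattern equals the triple."""
    return all(p is ANY or p == t for t, p in zip(triple, pattern))


def drop_axiomatic(triples):
    """Keep only the triples that match none of the patterns."""
    return [t for t in triples if not any(matches(t, p) for p in PATTERNS)]


# Illustrative input only; not taken from any real rule-set.
sample = [
    ("rdf:type", "rdf:type", "rdfs:Resource"),
    ("rdfs:subClassOf", "rdfs:domain", "rdfs:Class"),
    ("owl:sameAs", "rdf:type", "owl:TransitiveProperty"),
]
kept = drop_axiomatic(sample)
```

In this sample, only the {{rdfs:subClassOf}} triple is kept, because its object is not {{rdf:Property}} and it matches none of the other patterns.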


h1. sameAs Optimisation

The performance of a GraphDB-SE repository is greatly improved by a specific optimisation that handles {{owl:sameAs}} statements efficiently. {{owl:sameAs}} declares that two different URIs identify the same resource. Most often, it is used to align identifiers of the same real-world entity used in different data sets. For example, in DBpedia the URI of Vienna is {{http://dbpedia.org/page/Vienna}}, while in Geonames it is {{http://sws.geonames.org/2761369}}. DBpedia contains the statement
{noformat}(S1) dbpedia:Vienna owl:sameAs geonames:2761369{noformat}
which declares that the two URIs are equivalent. {{owl:sameAs}} is probably the most important OWL predicate when it comes to integrating data from different data sources.

Following the formal definition of OWL (OWL2 RL, to be more specific), whenever two URIs are declared equivalent, all statements that involve one of the URIs should be "replicated" using the other URI in the same position. For instance, in Geonames, the city of Vienna is defined as part of {{http://www.geonames.org/2761367}} (the first-order administrative division in Austria with the same name), which in turn is part of Austria {{http://www.geonames.org/2782113}}:
{noformat}(S2) geonames:2761369 gno:parentFeature geonames:2761367
(S3) geonames:2761367 gno:parentFeature geonames:2782113{noformat}
Since {{gno:parentFeature}} is a transitive relationship, it will be inferred that the city of Vienna is also part of Austria:
{noformat}(S4) geonames:2761369 gno:parentFeature geonames:2782113{noformat}
Due to the semantics of {{owl:sameAs}}, from (S1) it should also be inferred that statements (S2) and (S4) also hold for Vienna's DBpedia URI:
{noformat}(S5) dbpedia:Vienna gno:parentFeature geonames:2761367
(S6) dbpedia:Vienna gno:parentFeature geonames:2782113{noformat}
These implicit statements must hold no matter which of the equivalent URIs is used in a query. When we consider that Austria also has an equivalent URI in DBpedia:
{noformat}(S7) geonames:2782113 owl:sameAs dbpedia:Austria{noformat}
we should also infer that:
{noformat}(S8) dbpedia:Vienna gno:parentFeature dbpedia:Austria
(S9) geonames:2761369 gno:parentFeature dbpedia:Austria
(S10) geonames:2761367 gno:parentFeature dbpedia:Austria{noformat}

In the above example, we had two alignment statements (S1 and S7), two statements carrying specific factual knowledge (S2 and S3), one statement inferred due to a transitive property (S4), and seven statements inferred as a result of {{owl:sameAs}} expansion (S5, S6, S8, S9, S10, as well as the inverse statements of S1 and S7).
As we see, inference without {{owl:sameAs}} increased the dataset by 25% (one new statement on top of 4 explicit), while {{owl:sameAs}} inference added a further 175% (7 new statements on top of the same 4 explicit ones). But Vienna also has a URI in UMBEL: if we add an {{owl:sameAs}} statement for this alignment, it will cause the inference of 4 new implicit statements (duplicates of S1, S5, S6, and S8).
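
The counts above can be checked with a small forward-chaining sketch. It is a naive illustration under stated assumptions, not how GraphDB computes its closure: only transitivity of {{gno:parentFeature}}, symmetry of {{owl:sameAs}} and replacement of equivalent URIs are modelled, reflexive {{owl:sameAs}} statements are left out (as in the count above), and the function name {{closure}} is hypothetical.

```python
from itertools import product

SAME_AS = "owl:sameAs"
PARENT = "gno:parentFeature"  # treated as transitive, as in the text

# The four explicit statements of the example.
explicit = {
    ("dbpedia:Vienna", SAME_AS, "geonames:2761369"),    # S1
    ("geonames:2761369", PARENT, "geonames:2761367"),   # S2
    ("geonames:2761367", PARENT, "geonames:2782113"),   # S3
    ("geonames:2782113", SAME_AS, "dbpedia:Austria"),   # S7
}


def closure(statements, use_same_as):
    """Naive fixpoint: apply the rules until no new statement appears."""
    facts = set(statements)
    while True:
        new = set()
        # Transitivity of gno:parentFeature.
        for (a, p, b), (c, q, d) in product(facts, repeat=2):
            if p == q == PARENT and b == c:
                new.add((a, PARENT, d))
        if use_same_as:
            for (a, p, b) in facts:
                if p != SAME_AS:
                    continue
                new.add((b, SAME_AS, a))  # symmetry
                # Replace a with b in every non-sameAs statement.
                for (s, q, o) in facts:
                    if q == SAME_AS:
                        continue
                    if s == a:
                        new.add((b, q, o))
                    if o == a:
                        new.add((s, q, b))
        if new <= facts:
            return facts
        facts |= new


base = closure(explicit, use_same_as=False)
full = closure(explicit, use_same_as=True)

from_transitivity = len(base) - len(explicit)  # S4
from_same_as = len(full) - len(base)           # S5, S6, S8, S9, S10 and two inverses
print(from_transitivity, from_same_as)
```

Relative to the 4 explicit statements, the two differences correspond to the 25% and 175% figures quoted above.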

Although this is a small example, it provides an indication of the performance implications of using {{owl:sameAs}} alignment statements from Linked Open Data. Since {{owl:sameAs}} is an equivalence relation (transitive, reflexive, and symmetric), for a set of N equivalent URIs, N{^}2^ (N squared) {{owl:sameAs}} statements will be generated. Although {{owl:sameAs}} is useful for interlinking RDF datasets, its semantics cause considerable inflation of the number of implicit statements that should be considered during inference and query evaluation (either through forward- or backward-chaining).
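
The quadratic growth can be illustrated with a short sketch that enumerates every {{owl:sameAs}} statement implied within one equivalence class. The function name {{same_as_closure}} and the {{ex:}} URIs are made up for the illustration.

```python
from itertools import product


def same_as_closure(uris):
    """All owl:sameAs statements implied among mutually equivalent URIs.

    Reflexivity, symmetry and transitivity together mean that every
    ordered pair of URIs in the class is related, including (x, x).
    """
    return {(a, "owl:sameAs", b) for a, b in product(uris, repeat=2)}


for n in (2, 3, 10, 100):
    uris = [f"ex:uri{i}" for i in range(n)]
    # N - 1 explicit statements are enough to link N URIs in a chain,
    # but the implied set has N * N members.
    print(n, n - 1, len(same_as_closure(uris)))
```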

To overcome this problem, GraphDB handles {{owl:sameAs}} in a specific manner. In its indices, each set of equivalent URIs (equivalence class with respect to {{owl:sameAs}}) is represented by a single super-node. GraphDB does not inflate the statement indices, and at the same time retains the ability to enumerate all statements that should be inferred during retrieval. Special care is taken to ensure that this optimisation does not hinder the ability to distinguish explicit from implicit statements.

The handling of {{owl:sameAs}} is technically a kind of backward chaining that occurs at query time, when equivalent URIs are enumerated and substituted into query results.
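
The super-node idea and the query-time substitution can be sketched with a union-find structure. This is an illustrative assumption about one way such an index could work, not GraphDB's actual implementation: the class {{SameAsIndex}} and its methods are hypothetical, and the sketch does not track which statements are explicit and which are implicit.

```python
from itertools import product

SAME_AS = "owl:sameAs"
PARENT = "gno:parentFeature"


class SameAsIndex:
    """Stores statements once per equivalence class (hypothetical sketch)."""

    def __init__(self):
        self.parent = {}         # union-find forest over URIs
        self.statements = set()  # statements over class representatives only

    def find(self, uri):
        """Return the representative (super-node) of the URI's class."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def add(self, s, p, o):
        if p == SAME_AS:
            rs, ro = self.find(s), self.find(o)
            if rs != ro:
                self.parent[ro] = rs  # merge the two classes
                # Re-map stored statements onto the new representative.
                self.statements = {
                    (self.find(a), q, self.find(b))
                    for a, q, b in self.statements
                }
        else:
            self.statements.add((self.find(s), p, self.find(o)))

    def members(self, rep):
        """All URIs in the class represented by rep."""
        return sorted(u for u in list(self.parent) if self.find(u) == rep)

    def query(self, predicate):
        """Enumerate statements, substituting every equivalent URI."""
        for s, p, o in sorted(self.statements):
            if p == predicate:
                for es, eo in product(self.members(s), self.members(o)):
                    yield (es, p, eo)


idx = SameAsIndex()
idx.add("dbpedia:Vienna", SAME_AS, "geonames:2761369")    # S1
idx.add("geonames:2761369", PARENT, "geonames:2761367")   # S2
idx.add("geonames:2761367", PARENT, "geonames:2782113")   # S3
idx.add("geonames:2761369", PARENT, "geonames:2782113")   # S4, assumed already inferred
idx.add("geonames:2782113", SAME_AS, "dbpedia:Austria")   # S7

stored = len(idx.statements)
expanded = set(idx.query(PARENT))
print(stored, len(expanded))
```

In this sketch the index holds three {{gno:parentFeature}} statements over super-nodes, while the query enumerates all eight statements S2 to S6 and S8 to S10.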