GraphDB 6 includes a useful new feature that allows you to debug rule performance.
To debug rule performance effectively, you first need to understand how GraphDB rules are executed (evaluated). The description below complements the one in GraphDB-SE Reasoner; in particular, see Entailment rules.
To enable rule profiling, start GraphDB with the following Java option:
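We believe the option is named enable-debug-rules (an assumption; verify it against the documentation of your GraphDB version):

    -Denable-debug-rules=true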
This enables the collection of rule statistics (various counters). Please note that this slows down rule execution (the leading premise checking part) by 10-30%.
When rule profiling is enabled, the counters are written to the log for every rule, rule variant, and premise.
Consider this rule for example:
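A sketch in GraphDB's .pie rule syntax, reconstructed from the description that follows (the exact premise order may differ):

    Id: ptop_PropRestr
      t <rdf:type> <ptop:PropRestr>
      t <ptop:premise> p
      t <ptop:restriction> r
      t <ptop:conclusion> q
      x p y
      x r y
      ----------------
      x q y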
This is a conjunction of two properties. The specific p, r, and q are declared with the axiomatic (A-Box) triples involving t. Whenever both the premise p and the restriction r hold between two resources, the rule infers the conclusion q between the same resources, i.e. p & r => q.
The corresponding log for variant 4 of this rule may look like the following:
The log records detailed information about each rule and premise, which is indispensable when you are trying to understand which rule is taking too much time. However, this level of detail can make the log hard to grasp.
We have developed the script rule-stats.pl, which aggregates the log into a TSV file with one row per rule and per rule variant (plus a total row per rule), including the time spent and the inference speed. You can run the script like this:
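For example (the file names are illustrative):

    perl rule-stats.pl graphdb.log > rule-stats.tsv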
We'll give an example of using the Excel format to investigate where time is spent during rule execution.
Open the result in Excel, set a filter "ver=T" (to first look at rules as a whole, not rule variants), and sort descending by total "time" (the third column). You can download the Excel file and perhaps use it as a template for your own investigations.
Now focus on rules that spend substantial time, and whose speed is significantly lower than average (highlighted in red above). Let's pick PropRestr and look at its variants by filtering on "rule=PropRestr" and "ver<>T":
The unproductive variant ptop_PropRestr_5 turns out to be very similar to the productive variant ptop_PropRestr_4 (see Log File above):
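A sketch of the difference (the engine generates one variant per leading premise, reordering the remaining premises accordingly):

    ptop_PropRestr_4: leading premise x p y, then t <ptop:premise> p, ...
    ptop_PropRestr_5: leading premise x r y, then t <ptop:restriction> r, ...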
But the function of these premises in the rule is the same, so it is little wonder that the variant ptop_PropRestr_5 (which is checked after variant 4) is unproductive.
Presumably, performance would be improved if we make the two premises use the same axiomatic property ptop:premise (emphasizing that they have the same role), and introduce a [Cut]:
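A sketch of the reworked rule; [Cut] suppresses the rule variant led by the premise it follows:

    Id: ptop_PropRestr
      t <rdf:type> <ptop:PropRestr>
      t <ptop:premise> p
      t <ptop:premise> r
      t <ptop:conclusion> q
      x p y
      x r y [Cut]
      ----------------
      x q y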
The cut eliminates the rule variant with x r y as leading premise. It's legitimate to do this, since the two variants are the same, up to substitution p<->r.
Introducing a cut in the original version of the rule would not be legitimate: there, ptop:premise and ptop:restriction are distinct properties, so the two variants are not symmetric. The cut would omit some potential inferences (in the case above, 238 triples), changing the semantics of the rule.
We wrote "Presumably, performance would be improved" because rule execution is often non-intuitive. You are therefore advised to keep a detailed speed history and to compare performance after each change.
In the previous section we provided a detailed example of poring over the Log File and Excel Format and optimizing a rule set. In this section we provide more general advice about optimizing rule sets.
The complexity of the rule set has a large effect on loading performance, the number of inferred statements, and the overall size of the repository after inferencing. The complexity of the standard rule sets increases roughly in this order: empty (no inference), RDFS, OWL-Horst, OWL-Max, and the OWL2 profiles (QL and RL).
OWL RL and OWL QL do a lot of heavy work that is often not required by applications.
Check the "expansion ratio" (total/explicit statements) for your dataset, and have an idea whether that is what you expect. If your rule set infers say 4x more statements over a large number of explicit statements, that will take time, no matter how you try to optimize the rules.
The number of rules and their complexity affect inferencing performance, even for rules that never infer any new statements. This is because every incoming statement is passed through every variant of every rule to check whether something can be inferred. This often results in many checks and joins, even if the rule never fires.
So start with a minimal rule set, then add only the additional rules that you require. The default ruleset (owl-horst-optimized) works for many people, but you could even consider starting from RDFS. E.g. if you need owl:SymmetricProperty and owl:inverseOf on top of RDFS, you could copy only these rules from OWL-Horst to RDFS and leave the rest aside, as in the sketch below.
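The two OWL-Horst rules in question look roughly like this (paraphrased; copy the exact rules from the built-in .pie file):

    Id: owl_invOf
      p <owl:inverseOf> q
      x p y
      ----------------
      y q x

    Id: owl_SymmetricProperty
      p <rdf:type> <owl:SymmetricProperty>
      x p y
      ----------------
      y p x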
Conversely, you can start with a bigger standard ruleset, then remove the rules that you don't need.
To deploy a custom rule set, set the ruleset configuration parameter to the full pathname of your custom .pie file.
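For example, in the Turtle repository configuration template (assuming the owlim configuration namespace; the path is illustrative):

    owlim:ruleset "/path/to/custom.pie" ;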
Be careful when you write custom rules:
Avoid inserting explicit statements in a named graph if the same statements are inferable. GraphDB always stores inferred statements in the default graph, so this will lead to duplicated statements, which will increase the repository size and slow down query answering.
People often use owl:equivalentProperty, owl:equivalentClass (and less often rdfs:subPropertyOf, rdfs:subClassOf) to map ontologies. But every such assertion means that many more statements are inferred (owl:equivalentProperty works as a pair of rdfs:subPropertyOf, and owl:equivalentClass works as a pair of rdfs:subClassOf).
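For example, the DC ontology declares (paraphrasing the DCMI Terms declarations):

    dcterms:created rdfs:subPropertyOf dcterms:date .
    dcterms:date    rdfs:subPropertyOf dc:date .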
This means that every dcterms:created statement will expand to 3 statements (the explicit one, plus inferred dcterms:date and dc:date). So do not load the DC ontology unless you really need these inferred dc:date statements.
Inverse properties (e.g. :p owl:inverseOf :q) offer some convenience in querying, but are never necessary:
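Instead of querying the inferred inverse statements, query the original property in the opposite direction, or use a SPARQL inverse property path. A sketch with the illustrative properties :p and :q:

    # instead of { ?x :q ?y }, query :p in the opposite direction:
    SELECT * WHERE { ?y :p ?x }
    # or use an inverse property path, which needs no inverse reasoning:
    SELECT * WHERE { ?x ^:p ?y }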
If an ontology defines inverses but you skip inverse reasoning, you should check which of the two properties is used in a particular data set, and write your queries carefully.
A chain of N transitive relations (e.g. rdfs:subClassOf) causes GraphDB to infer and store a further (N²-N)/2 statements. If the relationship is also symmetric (e.g. in a family ontology with a predicate such as relatedTo), there will be N²-N inferred statements.
Consider removing the transitivity and/or symmetry of relations that form long chains. Or, if you must have them, consider the implementation of TransitiveProperty Through Step Property described below, which can be faster than the standard implementation of owl:TransitiveProperty.
While OWL2 has very powerful class constructs, its property constructs are quite weak. Some widely used OWL2 property constructs can be implemented faster with specialized rules.
See this draft for some ideas and clear illustrations: http://vladimiralexiev.github.io/pres/extending-owl2/. Below we describe three of these ideas.
Consider 2-place PropChain instead of general owl:propertyChainAxiom.
owl:propertyChainAxiom needs to use intermediate nodes and edges in order to unroll the rdf:List representing the chain. Since most chains found in practice are 2-place chains (and a chain of any length can be implemented as a sequence of 2-place chains), consider a rule like this:
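A sketch, following the axiomatic-triple pattern of ptop_PropRestr above:

    Id: ptop_PropChain
      t <rdf:type> <ptop:PropChain>
      t <ptop:premise1> p1
      t <ptop:premise2> p2
      t <ptop:conclusion> q
      x p1 y
      y p2 z
      ----------------
      x q z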
It's used with axiomatic triples like this:
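A hypothetical example that declares :grandparentOf as a chain of two :parentOf steps:

    :ParentChain a ptop:PropChain ;
      ptop:premise1   :parentOf ;
      ptop:premise2   :parentOf ;
      ptop:conclusion :grandparentOf .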
ptop:transitiveOver has been part of Ontotext's PROTON ontology since 2008. It is defined like this:
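In rule form (a sketch):

    Id: ptop_transitiveOver
      p <ptop:transitiveOver> q
      x p y
      y q z
      ----------------
      x p z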
It is a specialized PropChain in which premise1 and the conclusion coincide: it allows you to chain p with q on the right, yielding p again. For example, the inferencing of types along the class hierarchy can be expressed as:
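That is, a single axiomatic triple:

    <rdf:type> <ptop:transitiveOver> <rdfs:subClassOf>

Whenever x rdf:type y and y rdfs:subClassOf z hold, this infers x rdf:type z.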
owl:TransitiveProperty is widely used and is usually implemented like this:
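A sketch of that standard rule:

    Id: owl_TransitiveProperty
      p <rdf:type> <owl:TransitiveProperty>
      x p y
      y p z
      ----------------
      x p z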
You may recognize this as a self-chain, thus a specialization of ptop:transitiveOver:
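A sketch of the correspondence, as a rule that rewrites the OWL declaration into the PROTON one:

    p <rdf:type> <owl:TransitiveProperty>
    ----------------
    p <ptop:transitiveOver> p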
Most transitive properties are the transitive closure of a basic "step" property. For example, skos:broaderTransitive is based on skos:broader and is implemented as:
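Paraphrasing the SKOS vocabulary declarations:

    skos:broader rdfs:subPropertyOf skos:broaderTransitive .
    skos:broaderTransitive a owl:TransitiveProperty .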
Now consider a chain of N skos:broader links between two nodes. The owl_TransitiveProperty rule has to consider every way of splitting the chain, so it infers the same closure statement between the two nodes multiple times (once per split point), leading to quadratic inference complexity.
This can be optimized by looking for the step property s and extending the chain only at the right end:
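One possible formulation as a sketch (here the step property s is recognized by being a subproperty of the transitive property q; an explicit marker predicate would work as well):

    Id: TransitiveProperty_by_step
      q <rdf:type> <owl:TransitiveProperty>
      s <rdfs:subPropertyOf> q
      x q y
      y s z
      ----------------
      x q z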
However, this won't make the same inferences as owl_TransitiveProperty if someone inserts the transitive property explicitly, which is a bad practice anyway (see Avoid Duplicate Statements).
It is more robust to declare the step and transitive properties together using ptop:transitiveOver, e.g.:
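For SKOS this would be (a sketch):

    skos:broaderTransitive ptop:transitiveOver skos:broader .
    skos:broader rdfs:subPropertyOf skos:broaderTransitive .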