h1. The TRREE Engine
GraphDB is implemented on top of the TRREE engine. TRREE \[39\] stands for 'Triple Reasoning and Rule Entailment Engine'. It performs reasoning by forward-chaining entailment rules over RDF triple patterns with variables. Its reasoning strategy is total materialisation (see section 3.1.7), although various optimisations are used, as described in the following sections.
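As an illustration of forward-chaining with total materialisation, the following minimal Python sketch applies rules over triple patterns until no new statement can be inferred. It is not TRREE's implementation: the function names, the {{?x}} variable syntax and the two RDFS-style example rules are assumptions chosen for this example, and none of the optimisations mentioned above are modelled.

```python
def is_var(term):
    # Assumption for this sketch: variables are strings starting with "?".
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on mismatch."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def solutions(body, triples):
    """Return every variable binding that satisfies all patterns in the rule body."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

def materialise(triples, rules):
    """Apply the rules repeatedly until a fixpoint is reached (total materialisation)."""
    closure = set(triples)
    while True:
        inferred = set()
        for body, head in rules:
            for b in solutions(body, closure):
                inferred.add(tuple(b.get(term, term) for term in head))
        if inferred <= closure:      # nothing new: the closure is complete
            return closure
        closure |= inferred

# Two RDFS-style rules, written as (body patterns, head pattern).
rules = [
    ([("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")],
     ("?x", "rdf:type", "?d")),
    ([("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
     ("?a", "rdfs:subClassOf", "?c")),
]

explicit = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}

closure = materialise(explicit, rules)
```

In this sketch the closure contains the three explicit statements plus the inferred ones, for example that {{ex:rex}} is an {{ex:Animal}}. Because everything is inferred at load time, a query only has to read from the closure.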
The semantics used is based on R-entailment \[37\] with the following differences:
* Free variables in the head of a rule (without a binding in the body) are treated as blank nodes. This feature can be considered 'syntactic sugar';
* Variable inequality constraints can be specified in the body of the rules, in addition to the triple patterns. This leads to lower complexity as compared to R-entailment;
* The {{{_}cut{_}}} operator can be associated with rule premises; the TRREE compiler interprets it like the {{_\!_}} operator in Prolog;
* Two types of inconsistency checks are supported. Checks without any consequences indicate a consistency violation if the body can be satisfied. Consistency checks with consequences indicate a consistency violation if the inferred statements do not exist in the repository;
* Axioms can be provided as a set of statements, although those are not modelled as rules with empty bodies.
Further details of the rule language can be found in the corresponding user guides.
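Some of the differences listed above can be sketched in a few lines of Python. The example below is a simplified illustration, not the TRREE rule language or its compiler: the helper names ({{solutions}}, {{fire}}), the {{not_equal}} argument and the {{ex:}} vocabulary are all hypothetical. It shows an inequality constraint in a rule body, a free head variable treated as a fresh blank node, and the two kinds of consistency check.

```python
import itertools

def is_var(term):
    # Assumption for this sketch: variables are strings starting with "?".
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on mismatch."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def solutions(body, triples, not_equal=()):
    """Bindings that satisfy the body patterns and every inequality constraint."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return [b for b in bindings if all(b[x] != b[y] for x, y in not_equal)]

_fresh = itertools.count(1)  # source of blank node identifiers

def fire(body, head, triples, not_equal=()):
    """One application pass of a rule over `triples`.

    Head variables that have no binding in the body are treated as blank nodes.
    """
    inferred = set()
    for b in solutions(body, triples, not_equal):
        for pattern in head:
            for term in pattern:
                if is_var(term) and term not in b:
                    b[term] = f"_:b{next(_fresh)}"
            inferred.add(tuple(b.get(term, term) for term in pattern))
    return inferred

data = {
    ("ex:alice", "ex:hasParent", "ex:carol"),
    ("ex:bob", "ex:hasParent", "ex:carol"),
    ("ex:carol", "rdf:type", "ex:Person"),
}

# Inequality constraint: nobody is inferred to be their own sibling.
sibling_body = [("?x", "ex:hasParent", "?p"), ("?y", "ex:hasParent", "?p")]
sibling_head = [("?x", "ex:siblingOf", "?y")]
siblings = fire(sibling_body, sibling_head, data, not_equal=[("?x", "?y")])

# Free head variable: ?parent is not bound in the body, so it becomes a blank node.
invented = fire([("?x", "rdf:type", "ex:Person")],
                [("?x", "ex:hasParent", "?parent")], data)

# Check without consequences: a violation exists if the body can be satisfied.
self_parent = solutions([("?x", "ex:hasParent", "?x")], data)

# Check with consequences: a violation exists if an inferred statement is missing.
missing = fire([("?x", "ex:hasParent", "?p")],
               [("?p", "rdf:type", "ex:Person")], data) - data
```

Here {{siblings}} holds only the two statements relating {{ex:alice}} and {{ex:bob}}; without the constraint the same rule would also relate each of them to themselves. Both {{self_parent}} and {{missing}} are empty, so neither check reports a violation for this data.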
The TRREE can be configured via the _ruleset_ parameter, which identifies a file containing the entailment rules, consistency checks and axiomatic triples. The implementation of TRREE relies on a compile stage, during which custom rule-sets are compiled into Java code that is further compiled and merged into the inference engine.
The edition of TRREE used in GraphDB-Lite is referred to as 'SwiftTRREE' and performs reasoning and query evaluation in-memory. The edition of TRREE used in GraphDB-SE is referred to as 'BigTRREE' and utilises data structures backed by the file-system. These data structures are organised to allow query optimisations that dramatically improve performance with large datasets. For example, in one of the standard tests GraphDB-SE evaluates queries against 7 million statements three times faster than GraphDB-Lite, although it takes two to three times longer to load the data initially.
h1. Comparison of GraphDB-Lite and GraphDB-SE
The two GraphDB editions -- GraphDB-Lite and GraphDB-SE -- are identical in terms of usage and integration except for a few minor differences in some configuration parameters. They are based on different editions of the TRREE engine, but share the same inference and semantics (rule compiler, etc.).
GraphDB-Lite is designed for medium data volumes and prototyping. Its key features are:
* reasoning and query evaluation performed in main memory;
* persistence strategy that assures data preservation and consistency;
* extremely fast loading of data (including inference and storage).
GraphDB-SE is suitable for massive volumes of data and heavy query loads. It is designed as an enterprise-grade semantic repository system. It features:
* file-based indices (enabling it to scale to billions of statements even on desktop machines);
* inference and query optimisations (ensuring fast query evaluation).
\\
|| Parameter || GraphDB-Lite || GraphDB-SE ||
| *Scale* | 10 MSt, using 1.6 GB RAM \\
*100 MSt*, using 16 GB RAM | 130 MSt, using 1.6 GB RAM \\
*1068 MSt*, using 12 GB RAM |
| *Processing speed* \\
(load+infer+store) | 30 KSt/s on notebook \\
*200 KSt/s* on server | 5 KSt/s on notebook \\
*60 KSt/s* on server |
| *Query optimisation* | No | Yes |
| *Persistence* | Back-up in N-Triples | Binary data files and indices |
| *Efficient owl:sameAs* | No | Yes |
| *Advanced features* | None | RDF Rank \\
Full-text search \\
Geo-spatial extension |
| *License and Availability* | Free-for-use | Commercial \\
Research and evaluation copies provided for free |
*{_}Table 2 - Comparison between GraphDB-Lite and GraphDB-SE{_}*
h1. Supported Semantics
GraphDB offers several pre-defined semantics by way of standard rule sets (files), but can also be configured to use custom rule sets with semantics better tuned to the particular domain. The required semantics can be specified through the _ruleset_ parameter for each specific repository instance. Applications that do not need the complexity of the most expressive supported semantics can choose one of the less complex rule sets, which will result in faster inference.
h2. Pre-defined Rule Sets
The pre-defined rule-sets are layered such that each one extends the preceding one. The following list is ordered by increasing expressivity:
* *empty*: no reasoning, i.e. GraphDB operates as a plain RDF store;
* *rdfs*: supports standard RDFS semantics;
* *owl-horst*: OWL dialect close to OWL Horst; the differences are discussed below;
* *owl-max*: a combination of most of OWL Lite with RDFS;
* *owl2-rl*: fully conformant with the OWL2 RL profile \[44\], except for D-entailment, i.e. reasoning about data types.
h2. Custom Rule-Sets
GraphDB has an internal rule compiler that can be used to configure the TRREE with a custom set of inference rules and axioms. The user may define a custom rule-set in a \*.pie file (e.g. MySemantics.pie). The easiest way to do this is to start by modifying one of the .pie files that were used to build the pre-compiled rule-sets -- all pre-defined .pie files are included in the distribution. The syntax of the .pie files is easy to follow.
h2. OWL Compliance
Regarding OWL compliance, GraphDB supports several OWL-like dialects: OWL Horst \[37\] (*owl-horst*), OWL Max (*owl-max*), which covers most of OWL Lite and RDFS, OWL2 QL (*owl2-ql*) and OWL2 RL (*owl2-rl*).
With the *owl-max* rule-set GraphDB supports the following semantics:
* full RDFS semantics without constraints or limitations, apart from the entailment related to typed literals (known as D-entailment). For instance, meta-classes (and any arbitrary mixture of class, property, and individual) can be combined with the supported OWL semantics;
* most of OWL Lite;
* all of OWL DLP.
The differences between OWL Horst \[37\] and the OWL dialects supported by GraphDB (*owl-horst* and *owl-max*) can be summarised as follows:
* GraphDB does not provide the extended support for typed literals, introduced with the D-entailment extension of the RDFS semantics. Although such support is conceptually clear and easy to implement, it is our understanding that the performance penalty is too high for most applications. One can easily implement the rules defined for this purpose by ter Horst and add them to a custom rule-set;
* There are no inconsistency rules by default;
* A few more OWL primitives are supported by GraphDB (rule-set *owl-max*). These are listed in the GraphDB User Guides;
* There is extended support for schema-level (T-Box) reasoning in GraphDB.
Even though the concrete rules pre-defined in GraphDB differ from those defined in OWL Horst, the complexity and decidability results reported for R-entailment are relevant for TRREE and GraphDB. More precisely, the rules in the *owl-horst* rule-set do not introduce new blank nodes, which means that R-entailment with respect to them takes polynomial time. In KR terms, this means that the *owl-horst* inference within GraphDB is tractable.
Inference using *owl-horst* is of lower complexity than other formalisms that combine DL formalisms with rules. In addition, it places no constraints on meta-modelling.
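The intuition behind the polynomial bound can be illustrated with a small Python sketch. When no rule introduces new nodes, every inferred triple is built from terms already present, so a dataset with _n_ distinct terms can never grow beyond _n_^3^ statements. The code below is only an illustration under that assumption: the helper names are hypothetical, and the single rule is a simplified transitive-property rule in the style of OWL Horst rather than a rule copied from the *owl-horst* rule-set.

```python
def is_var(term):
    # Assumption for this sketch: variables are strings starting with "?".
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on mismatch."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def materialise(triples, rules):
    """Forward-chain until no rule produces a statement that is not already present."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            bindings = [{}]
            for pattern in body:
                bindings = [b2 for b in bindings for t in closure
                            if (b2 := match(pattern, t, b)) is not None]
            # Every head variable is bound in the body, so no new node is created.
            new = {tuple(b[term] if is_var(term) else term for term in head)
                   for b in bindings} - closure
            if new:
                closure |= new
                changed = True
    return closure

def terms(triples):
    """All distinct subjects, predicates and objects used in `triples`."""
    return {term for triple in triples for term in triple}

transitive_rule = (
    [("?p", "rdf:type", "owl:TransitiveProperty"),
     ("?x", "?p", "?y"),
     ("?y", "?p", "?z")],
    ("?x", "?p", "?z"),
)

explicit = {
    ("ex:partOf", "rdf:type", "owl:TransitiveProperty"),
    ("ex:a", "ex:partOf", "ex:b"),
    ("ex:b", "ex:partOf", "ex:c"),
    ("ex:c", "ex:partOf", "ex:d"),
}

closure = materialise(explicit, [transitive_rule])
n = len(terms(explicit))
```

The closure uses exactly the same terms as the explicit statements, and its size stays below {{n ** 3}}. A rule with a free head variable would break this bound, because each firing could add a new blank node.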
The correctness of the support for OWL semantics (for those primitives that are supported) is checked against the normative Positive\- and Negative-entailment OWL test cases \[7\]. These tests are provided in the GraphDB distribution and documented in the GraphDB user guides.