
GraphDB is a high-performance semantic repository, implemented in Java and packaged as a Storage and Inference Layer (SAIL) for the Sesame RDF database. This section describes the various editions of GraphDB.

{toc}

GraphDB is based on Ontotext's Triple Reasoning and Rule Entailment Engine (TRREE) -- a native RDF rule-entailment engine. The supported semantics can be configured through the definition of rule-sets. The most expressive pre-defined rule-set combines unconstrained RDFS and OWL-Lite. Custom rule-sets allow tuning for optimal performance and expressivity. GraphDB supports RDFS (section 3.1.2), OWL DLP (section 3.1.5.1), OWL Horst (section 3.1.5.2), most of OWL Lite (section 3.1.5.4) and OWL2 RL (section 3.1.5.3).
The three editions of GraphDB are GraphDB-Lite, GraphDB-SE (standard edition) and GraphDB-Enterprise (cluster configuration). With GraphDB-Lite, reasoning and query evaluation are performed in-memory, while, at the same time, a reliable persistence strategy assures data preservation, consistency, and integrity. GraphDB-SE is the high-performance 'enterprise' edition that scales to massive quantities of data. Typically, GraphDB-Lite can manage millions of explicit statements on desktop hardware, whereas GraphDB-SE can manage billions of statements and multiple simultaneous user sessions. GraphDB-Enterprise is an enterprise grade cluster management component that uses a collection of GraphDB instances to provide a resilient, high-performance semantic database.
The key differences between the editions of GraphDB are discussed in section 4.5 and in the GraphDB presentation \[28\]. The results from a number of benchmarks, as well as further performance evaluation and analysis information, are available on the Web site [http://www.ontotext.com/products/ontotext-graphdb/|http://www.ontotext.com/products/ontotext-graphdb/].

h1. Advantages of GraphDB

One of the main advantages of GraphDB-Lite is the in-memory reasoning implementation: the full content of the repository is loaded and maintained in main memory, which allows for efficient retrieval and query answering. Although the reasoning is handled in-memory, the GraphDB-Lite SAIL offers a relatively comprehensive persistence and backup strategy.
The persistence of GraphDB-Lite is implemented by writing to files in N-Triples format. The repository can be split into several files, all but one of which are read-only; the writable file is both the source from which triples are loaded and the target where new statements are stored. This backup strategy ensures that newly asserted triples are not lost in cases of power failure or abnormal termination. Although relatively simple, this strategy has proven to be very efficient and reliable over the years \[22\].
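
The file layout described above can be illustrated with a small sketch. It is not GraphDB code: the class name {{FileBackedStore}}, the helper {{parse_line}} and the file names are hypothetical, and the parser only handles simple one-line statements without escaping. It shows several read-only files plus one writable file that is both loaded at start-up and appended to (and flushed to disk) for every new statement.

```python
import os
import tempfile


def parse_line(line):
    """Split one simple N-Triples line into (subject, predicate, object).

    Simplification: no escape handling; comments and blank lines are skipped.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    body = line[:-1].strip() if line.endswith(".") else line
    s, p, o = body.split(None, 2)
    return (s, p, o)


class FileBackedStore:
    """Hypothetical in-memory store persisted as N-Triples files."""

    def __init__(self, read_only_paths, writable_path):
        self.writable_path = writable_path
        self.triples = set()
        # The writable file is a load source as well as the write target.
        for path in list(read_only_paths) + [writable_path]:
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    for line in f:
                        triple = parse_line(line)
                        if triple is not None:
                            self.triples.add(triple)

    def add(self, s, p, o):
        """Append a new statement to the writable file, then keep it in memory."""
        triple = (s, p, o)
        if triple in self.triples:
            return False
        with open(self.writable_path, "a", encoding="utf-8") as f:
            f.write(f"{s} {p} {o} .\n")
            f.flush()
            os.fsync(f.fileno())  # push the statement to disk before returning
        self.triples.add(triple)
        return True


# Usage: one read-only file, one writable file, then a simulated restart.
with tempfile.TemporaryDirectory() as d:
    base = os.path.join(d, "base.nt")
    delta = os.path.join(d, "delta.nt")
    with open(base, "w", encoding="utf-8") as f:
        f.write("<urn:a> <urn:p> <urn:b> .\n")
    store = FileBackedStore([base], delta)
    store.add("<urn:b>", "<urn:p>", "<urn:c>")
    reopened = FileBackedStore([base], delta)
    reloaded_count = len(reopened.triples)
```

Appending and syncing each statement before acknowledging it is one simple way to obtain the durability property described above; the actual write policy used by GraphDB-Lite is not specified here.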

h1. Limitations of GraphDB

The limitations of GraphDB are related to its reasoning strategy. In general, the expressivity of the language supported cannot be extended in the Description Logic direction, because the semantics must be able to be captured in (Horn) rules. The total materialisation strategy has drawbacks when changes to the explicitly asserted statements occur frequently. For expressive semantics and certain ontologies, the number of implicit statements can grow quickly, with the expected degradation in performance. GraphDB-SE has a number of optimisations to reduce this problem, e.g. special handling of _owl:sameAs_. Removing explicit statements can adversely affect performance if the full closure needs to be recomputed. Again, GraphDB-SE uses special techniques to avoid this situation. Another limitation of GraphDB-Lite is that the volume of data it can process is limited by the size of the computer's main memory. Considering currently available commodity hardware, GraphDB-Lite can handle millions of statements on desktop machines and more than ten million on entry-level servers.
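
To make the growth of implicit statements concrete, the following sketch naively materialises _owl:sameAs_ (reflexive, symmetric and transitive) over a chain of equivalent resources. It is an illustration only: the function names are hypothetical and this is the unoptimised behaviour, not the special handling used by GraphDB-SE. A chain of n resources linked by n-1 explicit statements closes to n*n statements.

```python
def same_as_closure(pairs):
    """Naive closure of owl:sameAs pairs under reflexivity, symmetry, transitivity."""
    closure = set(pairs)
    for a, b in pairs:
        closure.add((a, a))
        closure.add((b, b))
    while True:
        new = set()
        for a, b in closure:
            new.add((b, a))  # symmetry
        for a, b in closure:
            for c, d in closure:
                if b == c:
                    new.add((a, d))  # transitivity
        if new <= closure:  # fixpoint reached: nothing new was derived
            return closure
        closure |= new


def chain(n):
    """n resources :r0 ... :r(n-1), each declared the same as the next one."""
    return {(f":r{i}", f":r{i + 1}") for i in range(n - 1)}


explicit = chain(10)
closure = same_as_closure(explicit)
print(len(explicit), "explicit ->", len(closure), "materialised")
```

Every other statement that mentions one of the equivalent resources is replicated in the same way, which is why equivalence handling is a natural target for optimisation.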

h1. GraphDB Interoperability and Architecture

OWLIM version 3.X is packaged as a Storage and Inference Layer (SAIL) for Sesame version 2.x and makes extensive use of the features and infrastructure of Sesame, especially the RDF model, RDF parsers and query engines.
Inference is performed by the TRREE engine \[39\], where the explicit and inferred statements are stored in highly-optimised data structures that are kept in-memory for query evaluation and further inference. The inferred closure is updated through inference at the end of each transaction that modifies the repository.

!owlim_usage.png!

*{_}Figure 5 - OWLIM Usage and Relationship to Sesame and ORDI{_}*

GraphDB implements the Sesame SAIL interface so that it can be integrated with the rest of the Sesame framework, e.g. the query engines and the web UI. A user application can be designed to use GraphDB directly through the Sesame SAIL API or via the higher-level functional interfaces. When a GraphDB repository is exposed using the Sesame HTTP Server, users can manage the repository through the Sesame Workbench Web application, or with other tools integrated with Sesame, e.g. ontology editors like Protégé and TopBraid Composer.
The easiest way for developers to integrate their applications with GraphDB is to use it with the Sesame framework as a set of libraries. The installation and configuration of GraphDB are discussed in the quick start and user guides. More information on the various aspects of the Sesame specifications, its architecture and implementations can be found in section 3.2.

h1. The TRREE Engine

GraphDB is implemented on top of the TRREE engine. TRREE \[39\] stands for 'Triple Reasoning and Rule Entailment Engine'. The TRREE performs reasoning based on forward-chaining of entailment rules over RDF triple patterns with variables. TRREE's reasoning strategy is total materialisation, see section 3.1.7, although various optimisations are used as described in the following sections.
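
As a rough illustration of forward-chaining with total materialisation, the sketch below applies rules made of triple patterns with variables until no new statements can be derived. It is a minimal model, not the TRREE implementation: the rule representation, the function names and the two RDFS-style rules are assumptions chosen for the example, and none of the optimisations mentioned above are included.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    out = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if out.setdefault(p, t) != t:  # variable already bound elsewhere
                return None
        elif p != t:
            return None
    return out


def solutions(body, triples, binding=None):
    """Yield every variable binding that satisfies all patterns in body."""
    binding = binding or {}
    if not body:
        yield binding
        return
    for triple in triples:
        extended = match(body[0], triple, binding)
        if extended is not None:
            yield from solutions(body[1:], triples, extended)


def materialise(explicit, rules):
    """Apply the rules repeatedly until a fixpoint (the full closure) is reached."""
    closure = set(explicit)
    while True:
        inferred = set()
        for body, head in rules:
            for b in solutions(body, closure):
                inferred.add(tuple(b.get(term, term) for term in head))
        if inferred <= closure:
            return closure
        closure |= inferred


# Two illustrative rules: subclass transitivity and type inheritance.
rules = [
    ([("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
     ("?a", "rdfs:subClassOf", "?c")),
    ([("?x", "rdf:type", "?a"), ("?a", "rdfs:subClassOf", "?b")],
     ("?x", "rdf:type", "?b")),
]
explicit = {
    (":Dog", "rdfs:subClassOf", ":Mammal"),
    (":Mammal", "rdfs:subClassOf", ":Animal"),
    (":rex", "rdf:type", ":Dog"),
}
closure = materialise(explicit, rules)
```

Because all consequences are stored up front, a query such as "is :rex an :Animal?" becomes a plain lookup in the closure, at the cost of extra work whenever the explicit statements change.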

The semantics used is based on R-entailment \[37\] with the following differences:
* Free variables in the head of a rule (without a binding in the body) are treated as blank nodes. This feature can be considered 'syntactic sugar';
* Variable inequality constraints can be specified in the body of the rules, in addition to the triple patterns. This leads to lower complexity as compared to R-entailment;
* The {{{_}cut{_}}} operator can be associated with rule premises, and the TRREE compiler interprets it like the {{_\!_}} operator in Prolog;
* Two types of inconsistency checks are supported. Checks without any consequences indicate a consistency violation if the body can be satisfied. Consistency checks with consequences indicate a consistency violation if the inferred statements do not exist in the repository;
* Axioms can be provided as a set of statements, although those are not modelled as rules with empty bodies.
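
Two of the points above, inequality constraints in a rule body and consistency checks without consequences, can be sketched as follows. This is not the TRREE rule syntax: the data, the hand-written rule over a hypothetical {{:hasMother}} property and the helper names are assumptions made for illustration.

```python
def bind(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    out = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if out.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return out


def solutions(body, neq, triples):
    """Bindings that satisfy every pattern in body and every inequality in neq."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := bind(pattern, t, b)) is not None]
    return [b for b in bindings if all(b[x] != b[y] for x, y in neq)]


data = {
    (":tom", ":hasMother", ":ann"),
    (":tom", ":hasMother", ":anna"),
    (":ann", "owl:differentFrom", ":anna"),
}

# Rule with an inequality constraint: two distinct mothers of the same
# person are inferred to be the same individual. Without ?y != ?z the rule
# would also produce trivial statements such as ":ann owl:sameAs :ann".
rule_body = [("?x", ":hasMother", "?y"), ("?x", ":hasMother", "?z")]
inferred = {(b["?y"], "owl:sameAs", b["?z"])
            for b in solutions(rule_body, [("?y", "?z")], data)}

# Consistency check without consequences: a violation is reported whenever
# the body can be satisfied.
check_body = [("?a", "owl:sameAs", "?b"), ("?a", "owl:differentFrom", "?b")]
violations = solutions(check_body, [], data | inferred)
```

Here the inequality constraint prunes bindings before any statement is produced, and the check reports the pair that is asserted to be both the same and different.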

Further details of the rule language can be found in the corresponding user guides.
The TRREE can be configured via the _rule-sets_ parameter, which identifies a file containing the entailment rules, consistency checks and axiomatic triples. The implementation of TRREE relies on a compile stage, during which custom rule-sets are compiled into Java code that is further compiled and merged into the inference engine.

The edition of TRREE used in GraphDB-Lite is referred to as 'SwiftTRREE' and performs reasoning and query evaluation in-memory. The edition of TRREE used in GraphDB-SE is referred to as 'BigTRREE' and utilises data structures backed by the file-system. These data structures are organised to allow query optimisations that dramatically improve performance with large datasets, e.g. with one of the standard tests GraphDB-SE evaluates queries against 7 million statements three times faster than GraphDB-Lite, although it takes between two and three times more time to initially load the data.

h1. Comparison of GraphDB-Lite and GraphDB-SE

The two GraphDB editions -- GraphDB-Lite and GraphDB-SE -- are identical in terms of usage and integration except for a few minor differences in some configuration parameters. The editions differ in which version of the TRREE engine they are based upon, but share the same inference and semantics (rule-compiler, etc).
GraphDB-Lite is designed for medium data volumes and prototyping. Its key features are:
* reasoning and query evaluation performed in main memory;
* persistence strategy that assures data preservation and consistency;
* extremely fast loading of data (including inference and storage).

GraphDB-SE is suitable for massive volumes of data and heavy query loads. It is designed as an enterprise-grade semantic repository system. It features:
* file-based indices, which enable it to scale to billions of statements even on desktop machines;
* inference and query optimisations, which ensure fast query evaluation.

\\
|| Parameter || GraphDB-Lite || GraphDB-SE ||
| *Scale* | 10 MSt, using 1.6 GB RAM \\
*100 MSt*, using 16 GB RAM | 130 MSt, using 1.6 GB RAM \\
*1068 MSt*, using 12 GB RAM |
| *Processing speed* \\
(load+infer+store) | 30 KSt/s on notebook \\
*200 KSt/s* on server | 5 KSt/s on notebook \\
*60 KSt/s* on server |
| *Query optimisation* | No | Yes |
| *Persistence* | Back-up in N-Triples | Binary data files and indices |
| *Efficient owl:sameAs* | No | Yes |
| *Advanced features* | None | RDF Rank \\
Full-text search \\
Geo-spatial extension |
| *License and Availability* | Free-for-use | Commercial \\
Research and evaluation copies provided for free |
*{_}Table 2 - Comparison between GraphDB-Lite and GraphDB-SE{_}*

h1. Supported Semantics

GraphDB offers several predefined semantics by way of standard rule sets (files), but can also be configured to use custom rule sets with semantics better tuned to the particular domain. The required semantics can be specified through the _ruleset_ for each specific repository instance. Applications that do not need the complexity of the most expressive supported semantics can choose one of the less complex rule sets, which will result in faster inference.

h2. Pre-defined Rule Sets

The pre-defined rule-sets are layered such that each one extends the preceding one. The following list is ordered by increasing expressivity:
* *empty*: no reasoning, i.e. GraphDB operates as a plain RDF store;
* *rdfs*: supports standard RDFS semantics;
* *owl-horst*: OWL dialect close to OWL Horst; the differences are discussed below;
* *owl-max*: a combination of most of OWL-Lite with RDFS;
* *owl2-rl*: Fully conformant OWL2 RL profile \[44\] except for D-Entailment, i.e. reasoning about data types.

h2. Custom Rule-Sets

GraphDB has an internal rule compiler that can be used to configure the TRREE with a custom set of inference rules and axioms. The user may define a custom rule-set in a \*.pie file (e.g. MySemantics.pie). The easiest way to do this is to start modifying one of the .pie files that were used to build the pre-compiled rule-sets -- all pre-defined .pie files are included in the distribution. The syntax of the .pie files is easy to follow.

h2. OWL Compliance

Regarding OWL compliance, GraphDB supports several OWL-like dialects: OWL Horst \[37\] (*owl-horst*), OWL Max (*owl-max*) that covers most of OWL-Lite and RDFS, OWL2 QL (*owl2-ql*) and OWL2 RL (*owl2-rl*).

With the *owl-max* rule-set GraphDB supports the following semantics:
* full RDFS semantics without constraints or limitations, apart from the entailment related to typed literals (known as D-entailment). For instance, meta-classes (and any arbitrary mixture of class, property, and individual) can be combined with the supported OWL semantics;
* most of OWL-Lite;
* all of OWL DLP.

The differences between OWL Horst \[37\], and the OWL dialects supported by GraphDB (*owl-horst* and *owl-max*) can be summarised as follows:
* GraphDB does not provide the extended support for typed literals, introduced with the D-entailment extension of the RDFS semantics. Although such support is conceptually clear and easy to implement, it is our understanding that the performance penalty is too high for most applications. One can easily implement the rules defined for this purpose by ter Horst and add them to a custom rule-set;
* There are no inconsistency rules by default;
* A few more OWL primitives are supported by GraphDB (rule-set *owl-max*). These are listed in the GraphDB User Guides;
* There is extended support for schema-level (T-Box) reasoning in GraphDB.

Even though the concrete rules pre-defined in GraphDB differ from those defined in OWL Horst, the complexity and decidability results reported for R-entailment are relevant for TRREE and GraphDB. To put it more precisely, the rules in the *owl-horst* rule-set do not introduce new B-Nodes, which means that R-entailment with respect to them takes polynomial time. In KR terms, this means that the *owl-horst* inference within GraphDB is tractable.
Inference using *owl-horst* is of a lesser complexity compared to other formalisms that combine DL formalisms with rules. In addition, it puts no constraints with respect to meta-modelling.

The correctness of the support for OWL semantics (for those primitives that are supported) is checked against the normative Positive\- and Negative-entailment OWL test cases \[7\]. These tests are provided in the GraphDB distribution and documented in the GraphDB user guides.