GraphDB-Lite Indexing Specifics

Version 1 by barry.bishop
on May 24, 2011 18:45.

compared with
Version 2 by Dimitar Manov
on Jul 17, 2014 17:41.

{toc}

h1. Comparison of GraphDB-SE and GraphDB-Lite

The major differences between GraphDB-SE and GraphDB-Lite are their performance and scalability. Both GraphDB editions deliver identical functionality for RDF storage, inference and query answering, and both implement Sesame's SAIL APIs, as discussed in section 4. This guarantees that all essential functions of a semantic repository are supported by GraphDB in a standard, consistent, and interoperable manner.
Compared to GraphDB-SE, GraphDB-Lite does not scale as well in terms of the volume of data that can be managed; the upper limit is typically some tens of millions of statements. However, GraphDB-Lite can perform faster inferencing and query answering because it holds all data in memory. This is not always the case, because GraphDB-SE has several optimisations that GraphDB-Lite does not, i.e. special owl:sameAs handling and various query optimisations.
Furthermore, GraphDB-SE has a range of advanced features that are not included in GraphDB-Lite, i.e. RDF Ranking, RDF Priming, RDF Search, Node Search, notifications.

h1. Persistence Strategy

GraphDB-Lite stores the repository contents in a binary file in the storage folder when it is shut down. The format is such that it can be quickly read back into memory when GraphDB-Lite is restarted, i.e. the synchronisation of the in-memory contents of the repository with a persistent binary storage file occurs only at initialisation and shutdown.
Furthermore, new statements that are added to the repository are also stored in N-Triples format in an external file (see the new-triples-file configuration parameter in section 8.4). In the event of abnormal termination, the contents of this external file are added to the repository immediately after the repository is restored from the binary file.
{warning}It is vitally important to shut down repository connections properly to ensure that the repository contents are written to the file system on shutdown.{warning}
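The interplay between the binary file and the external N-Triples file can be illustrated with a toy model. The sketch below is not GraphDB-Lite code: the class {{PersistenceSketch}} and all of its members are hypothetical, the "files" are plain in-memory collections, and it assumes that the external file is emptied after a clean shutdown and that new statements reach it immediately.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of snapshot-plus-journal persistence; not GraphDB-Lite code.
class PersistenceSketch {
    // Stand-in for the binary storage file written at shutdown.
    private final Set<String> snapshot = new LinkedHashSet<>();
    // Stand-in for the external N-Triples file of newly added statements.
    private final List<String> journal = new ArrayList<>();
    // In-memory repository contents.
    private final Set<String> memory = new LinkedHashSet<>();

    // Adds a statement in memory and records it in the journal.
    void add(String nTriplesLine) {
        if (memory.add(nTriplesLine)) {
            journal.add(nTriplesLine);
        }
    }

    // Clean shutdown: write the full contents to the snapshot.
    // Assumption: the journal is no longer needed afterwards and is emptied.
    void shutDown() {
        snapshot.clear();
        snapshot.addAll(memory);
        journal.clear();
        memory.clear();
    }

    // Abnormal termination: memory is lost, snapshot and journal survive.
    void crash() {
        memory.clear();
    }

    // Initialisation: restore from the snapshot, then replay the journal.
    void initialise() {
        memory.clear();
        memory.addAll(snapshot);
        memory.addAll(journal);
    }

    int size() {
        return memory.size();
    }

    boolean contains(String nTriplesLine) {
        return memory.contains(nTriplesLine);
    }

    public static void main(String[] args) {
        PersistenceSketch repo = new PersistenceSketch();
        repo.initialise();
        repo.add("<urn:a> <urn:p> <urn:b> .");
        repo.shutDown();   // statement ends up in the snapshot
        repo.initialise();
        repo.add("<urn:b> <urn:p> <urn:c> .");
        repo.crash();      // statement survives only in the journal
        repo.initialise();
        System.out.println("Statements after recovery: " + repo.size());
    }
}
```

In this model a statement added after the last clean shutdown survives a crash only because it was written to the journal, which is the role the external N-Triples file plays in the description above.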
| {{void setAutoCommit(boolean autoCommit)}} | Enables or disables auto-commit mode for the connection. |

GraphDB supports the so-called 'read committed' transaction isolation level, well known from relational database management systems. It guarantees that changes will not impact query evaluation before the entire transaction they are part of is successfully committed. It does not guarantee that execution of a single transaction is performed against a single state of the data in the repository. Regarding concurrency:
* multiple update/modification/write transactions can be initiated and stay open simultaneously, i.e. one transaction does not need to be committed in order to allow another transaction to complete;
* update transactions are processed internally in sequence, i.e. GraphDB processes the commits one after another;
* update transactions do not block read requests in any way, i.e. hundreds of SPARQL queries can be evaluated in parallel (the processing is properly multi-threaded) while update transactions are being handled on separate threads.
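The behaviour listed above can be sketched with a small in-memory store. This is only an illustration under stated assumptions: {{ReadCommittedSketch}}, {{Transaction}}, {{begin}} and the other names are hypothetical and do not come from GraphDB or Sesame, and the copy-on-commit approach is chosen for brevity rather than because GraphDB works this way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy store showing 'read committed' visibility; not GraphDB code.
class ReadCommittedSketch {
    // Committed state. It is replaced as a whole on commit, so a reader
    // never observes part of a transaction.
    private volatile Set<String> committed = Collections.emptySet();
    // Commits are applied one after another.
    private final Object commitLock = new Object();

    final class Transaction {
        private final List<String> pending = new ArrayList<>();

        // Buffers a change; it is invisible to readers until commit().
        void add(String statement) {
            pending.add(statement);
        }

        void commit() {
            synchronized (commitLock) {
                Set<String> next = new HashSet<>(committed);
                next.addAll(pending);
                committed = Collections.unmodifiableSet(next);
            }
            pending.clear();
        }
    }

    // Several transactions may be open at the same time.
    Transaction begin() {
        return new Transaction();
    }

    // Reads take no lock and therefore are never blocked by writers.
    boolean contains(String statement) {
        return committed.contains(statement);
    }

    int size() {
        return committed.size();
    }

    public static void main(String[] args) {
        ReadCommittedSketch store = new ReadCommittedSketch();
        Transaction t1 = store.begin();
        Transaction t2 = store.begin();
        t1.add("x");
        t2.add("y");
        System.out.println("x visible before commit: " + store.contains("x"));
        t2.commit(); // t2 completes although t1 is still open
        t1.commit();
        System.out.println("committed statements: " + store.size());
    }
}
```

Because a reader consults the committed state each time it calls {{contains}}, two reads made by the same client may see different states if a commit happens in between, which mirrors the caveat about a single state of the data.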

One should note that GraphDB performs materialization, making sure that all the statements which can be inferred from the current state of the repository are indexed and persisted. When the commit method completes, all reasoning activities related to the changes in the data introduced by the corresponding transaction will have already been performed.
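As a rough illustration of materialization at commit time, the following sketch computes the closure of a single rule (transitivity of a sub-class relation) before {{commit}} returns, and keeps explicit and inferred statements apart. The class {{MaterialisationSketch}}, its methods and the string encoding of statements are hypothetical; GraphDB applies a configurable rule set and its internals are not shown here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of commit-time materialization with one rule; not GraphDB code.
class MaterialisationSketch {
    private static final String SEP = " subClassOf ";
    private final Set<String> explicit = new LinkedHashSet<>();
    private final Set<String> implicit = new LinkedHashSet<>();

    private static String edge(String sub, String sup) {
        return sub + SEP + sup;
    }

    // Adds explicit {sub, super} pairs and computes the full closure
    // before returning, so no reasoning is left pending after commit.
    void commit(String[][] newEdges) {
        for (String[] e : newEdges) {
            explicit.add(edge(e[0], e[1]));
            implicit.remove(edge(e[0], e[1])); // an asserted statement is explicit
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            Set<String> all = new HashSet<>(explicit);
            all.addAll(implicit);
            for (String ab : all) {
                String[] x = ab.split(SEP);
                for (String bc : all) {
                    String[] y = bc.split(SEP);
                    // Rule: A subClassOf B and B subClassOf C imply A subClassOf C.
                    if (x[1].equals(y[0])) {
                        String ac = edge(x[0], y[1]);
                        if (!all.contains(ac) && implicit.add(ac)) {
                            changed = true;
                        }
                    }
                }
            }
        }
    }

    Set<String> explicitStatements() {
        return Collections.unmodifiableSet(explicit);
    }

    Set<String> implicitStatements() {
        return Collections.unmodifiableSet(implicit);
    }

    public static void main(String[] args) {
        MaterialisationSketch repo = new MaterialisationSketch();
        repo.commit(new String[][] {{"A", "B"}, {"B", "C"}});
        System.out.println("explicit: " + repo.explicitStatements());
        System.out.println("implicit: " + repo.implicitStatements());
    }
}
```

The cost of inference is paid inside {{commit}}, which is why queries issued afterwards can read inferred statements directly.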

{note}
h1. Handling of Explicit and Implicit Statements

As already described, GraphDB-Lite applies the inference rules at load time in order to compute the full closure. Therefore a repository will contain some statements that are explicitly asserted and other statements that exist through implication. In most cases clients will not be concerned with the difference; however, there are some scenarios when it is useful to work with only explicit or only implicit statements. The following sections describe how these two groups of statements can be isolated during programmatic statement retrieval using the Sesame API and during (SPARQL) query evaluation.
The usual technique for retrieving statements is to use the RepositoryConnection method:
{code}RepositoryResult<Statement> getStatements(
h1. Multi-threading

GraphDB-Lite features thread-safe techniques for managing the internal storage of data structures, where several 'worker' threads do inferencing in parallel. The number of worker threads is controlled via the num.threads.run=n system property, see section 8.4. Each of the allocated threads operates on jobs of committed explicit statements. Each thread computes the increment to the inferred closure from applying the rule set to these new statements and the existing statements in the repository. The number of statements collected in a job is controlled via the jobsize configuration parameter. When a transaction is committed, the caller is blocked until all worker threads have finished their job units.
There are other threads running simultaneously apart from the main application thread and the worker threads mentioned above. If persistence is not switched off, then a persistence thread wakes up every few seconds and scans for new explicit statements. Any new statements found are added to the persistence file. There is also an additional thread that is spawned (by Sesame) during the parsing of RDF at load time. This should also be considered when deciding how many worker threads to allocate for inference.
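The worker-thread arrangement can be approximated with a standard Java thread pool. This is a sketch under assumptions, not the GraphDB-Lite implementation: {{WorkerPoolSketch}}, {{commit}} and {{infer}} are hypothetical names, the default of two threads is chosen only for the example, and {{infer}} merely counts statements instead of applying a rule set. Only the num.threads.run property name and the idea of a job size are taken from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of job-based parallel inference; not GraphDB-Lite code.
class WorkerPoolSketch {

    // Splits the committed statements into jobs of at most jobSize,
    // processes the jobs on worker threads, and blocks the caller
    // until every job has finished. Returns the number processed.
    static int commit(List<String> statements, int jobSize) {
        if (jobSize <= 0) {
            throw new IllegalArgumentException("jobSize must be positive");
        }
        // Assumption: fall back to 2 threads when the property is not set.
        int threads = Integer.getInteger("num.threads.run", 2);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < statements.size(); i += jobSize) {
                int end = Math.min(i + jobSize, statements.size());
                List<String> job = statements.subList(i, end);
                Callable<Integer> task = () -> infer(job);
                results.add(pool.submit(task));
            }
            int processed = 0;
            for (Future<Integer> result : results) {
                processed += result.get(); // blocks until the job is done
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("commit interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("inference job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for applying the rule set to one job.
    static int infer(List<String> job) {
        return job.size();
    }

    public static void main(String[] args) {
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            statements.add("statement-" + i);
        }
        System.out.println("processed: " + commit(statements, 3));
    }
}
```

When sizing the pool, the persistence thread and the parser thread described above also compete for CPU time, so the number of worker threads need not equal the number of available cores.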