OWLIM-Lite Indexing Specifics

Current version by barry.bishop on May 24, 2011 18:45.

{toc}

h1. Comparison of OWLIM-SE and OWLIM-Lite

The major differences between OWLIM-SE and OWLIM-Lite are their performance and scalability. Both OWLIM editions deliver identical functionality for RDF storage, inference and query answering, and both implement Sesame's SAIL APIs, as discussed in section 4. This guarantees that all essential functions of a semantic repository are supported by OWLIM in a standard, consistent, and interoperable manner.
Compared to OWLIM-SE, OWLIM-Lite does not scale as well in terms of the volume of data that can be managed; the upper limit is typically some tens of millions of statements. OWLIM-Lite can nevertheless perform faster inferencing and query answering, because it holds all data in memory. This is not always the case, however, because OWLIM-SE has several optimisations that OWLIM-Lite does not, namely special owl:sameAs handling and various query optimisations.
Furthermore, OWLIM-SE has a range of advanced features that are not included in OWLIM-Lite, namely RDF Ranking, RDF Priming, RDF Search, Node Search and notifications.

h1. Persistence Strategy

OWLIM-Lite stores the repository contents in a binary file in the storage folder when it is shut down. The format is such that it can be quickly read back into memory when OWLIM-Lite is restarted, i.e. the in-memory contents of the repository are synchronised with the persistent binary storage file only at initialisation and shutdown.
Furthermore, new statements that are added to the repository are also stored in N-Triples format in an external file (see the new-triples-file configuration parameter in section 8.4). In the event of abnormal termination, the contents of this external file are added to the repository immediately after the repository is restored from the binary file.
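The recovery order described above (restore the shutdown snapshot first, then replay the file of new statements) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name, the file names, the plain-text snapshot standing in for the binary format, and the fact that each statement is appended to the journal immediately. It does not reproduce OWLIM-Lite's actual storage code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration only; not OWLIM-Lite's storage code or file format. */
final class SnapshotAndJournalSketch {
    private final Path snapshot; // stands in for the binary storage file
    private final Path journal;  // stands in for the N-Triples file of new statements
    private final Set<String> statements = new LinkedHashSet<>();

    SnapshotAndJournalSketch(Path snapshot, Path journal) throws IOException {
        this.snapshot = snapshot;
        this.journal = journal;
        // Initialisation: restore the snapshot, then replay any journal
        // left behind by an abnormal termination.
        if (Files.exists(snapshot)) {
            statements.addAll(Files.readAllLines(snapshot, StandardCharsets.UTF_8));
        }
        if (Files.exists(journal)) {
            statements.addAll(Files.readAllLines(journal, StandardCharsets.UTF_8));
        }
    }

    /** Adds one N-Triples line in memory and appends it to the journal if it is new. */
    void add(String nTriplesLine) throws IOException {
        if (statements.add(nTriplesLine)) {
            Files.write(journal, Collections.singletonList(nTriplesLine), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    /** Clean shutdown: write the full contents to the snapshot and discard the journal. */
    void shutDown() throws IOException {
        Files.write(snapshot, statements, StandardCharsets.UTF_8);
        Files.deleteIfExists(journal);
    }

    int size() {
        return statements.size();
    }

    /** Runs a clean shutdown followed by a simulated crash; returns the recovered statement count. */
    static int recoveredCountAfterSimulatedCrash() {
        try {
            Path dir = Files.createTempDirectory("persistence-sketch");
            Path snap = dir.resolve("storage.snapshot");
            Path log = dir.resolve("new-triples.nt");

            SnapshotAndJournalSketch first = new SnapshotAndJournalSketch(snap, log);
            first.add("<urn:a> <urn:p> <urn:b> .");
            first.shutDown(); // statement 1 is now in the snapshot

            SnapshotAndJournalSketch second = new SnapshotAndJournalSketch(snap, log);
            second.add("<urn:c> <urn:p> <urn:d> ."); // only in memory and in the journal
            // No shutDown() here: this simulates abnormal termination.

            return new SnapshotAndJournalSketch(snap, log).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("recovered statements: " + recoveredCountAfterSimulatedCrash());
    }
}
```

In this sketch the second statement survives only because it reached the journal before the simulated crash; a statement that had not yet been written there would be lost.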
{warning}It is vitally important to shut down repository connections properly to ensure that the repository contents are written to the file-system on shutdown.{warning}

h1. Handling of Explicit and Implicit Statements

As already described, OWLIM-Lite applies the inference rules at load time in order to compute the full closure. Therefore a repository will contain some statements that are explicitly asserted and other statements that exist through implication. In most cases clients will not be concerned with the difference; however, there are some scenarios where it is useful to work with only explicit or only implicit statements. The following sections describe how these two groups of statements can be isolated during programmatic statement retrieval using the Sesame API and during (SPARQL) query evaluation.
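To make the distinction concrete, the following toy store computes a closure when statements are added and remembers which statements were asserted and which were only inferred. It is a hypothetical sketch, not OWLIM-Lite or the Sesame API: the class, its method names, the single transitivity rule for {{subClassOf}}, and the whitespace-free string terms are all assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy store; it is not OWLIM-Lite or the Sesame API. */
final class ExplicitImplicitSketch {
    // Maps each statement "s p o" to true if explicitly asserted, false if only inferred.
    private final Map<String, Boolean> explicitFlag = new LinkedHashMap<>();

    /** Asserts a statement and recomputes the closure straight away (load-time inference). */
    void add(String s, String p, String o) {
        explicitFlag.put(s + " " + p + " " + o, true);
        computeClosure();
    }

    // Single illustrative rule: subClassOf is transitive. Loops until nothing new is derived.
    private void computeClosure() {
        boolean changed = true;
        while (changed) {
            changed = false;
            List<String[]> subClassStatements = new ArrayList<>();
            for (String statement : explicitFlag.keySet()) {
                String[] terms = statement.split(" ");
                if (terms[1].equals("subClassOf")) {
                    subClassStatements.add(terms);
                }
            }
            for (String[] a : subClassStatements) {
                for (String[] b : subClassStatements) {
                    if (a[2].equals(b[0])) {
                        String inferred = a[0] + " subClassOf " + b[2];
                        // Never downgrade a statement that is already explicit.
                        if (explicitFlag.putIfAbsent(inferred, false) == null) {
                            changed = true;
                        }
                    }
                }
            }
        }
    }

    /** Returns explicit statements, plus inferred ones when includeInferred is true. */
    List<String> getStatements(boolean includeInferred) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : explicitFlag.entrySet()) {
            if (e.getValue() || includeInferred) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Returns only the statements that exist through implication. */
    List<String> getImplicitOnly() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : explicitFlag.entrySet()) {
            if (!e.getValue()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ExplicitImplicitSketch store = new ExplicitImplicitSketch();
        store.add("A", "subClassOf", "B");
        store.add("B", "subClassOf", "C");
        System.out.println("all:      " + store.getStatements(true));
        System.out.println("explicit: " + store.getStatements(false));
        System.out.println("implicit: " + store.getImplicitOnly());
    }
}
```

Keeping the flag next to each statement means the two groups can be separated at retrieval time without re-running inference.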
The usual technique for retrieving statements is to use the RepositoryConnection method:
{code}RepositoryResult<Statement> getStatements(
{code}

h1. Multi-threading

OWLIM-Lite features thread-safe techniques for managing the internal storage of data structures, where several 'worker' threads do inferencing in parallel. The number of worker threads is controlled via the num.threads.run=n system property, see section 8.4. Each of the allocated threads operates on jobs of committed explicit statements. Each thread computes the increment to the inferred closure from applying the rule set to these new statements and the existing statements in the repository. The number of statements collected into each job is controlled via the jobsize configuration parameter. When a transaction is committed, the caller is blocked until all worker threads have finished their job units.
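The commit-time behaviour described above can be approximated with a standard Java thread pool. This is a sketch under stated assumptions, not OWLIM-Lite's scheduler: the class and method names are hypothetical, the {{infer}} step is a placeholder that only counts statements, and reading {{num.threads.run}} with {{Integer.getInteger}} is just one way an application might pick up that system property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of jobs, worker threads and a blocking commit. */
final class ParallelCommitSketch {

    /**
     * Splits the committed statements into jobs of at most jobSize, runs them on
     * numThreads workers and returns only when every job has finished.
     * The result lists how many statements each job handled, in job order.
     */
    static List<Integer> commit(List<String> explicitStatements, int numThreads, int jobSize) {
        if (numThreads < 1 || jobSize < 1) {
            throw new IllegalArgumentException("numThreads and jobSize must be positive");
        }
        ExecutorService workers = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int from = 0; from < explicitStatements.size(); from += jobSize) {
                int to = Math.min(from + jobSize, explicitStatements.size());
                List<String> job = explicitStatements.subList(from, to);
                jobs.add(() -> infer(job));
            }
            List<Integer> handled = new ArrayList<>();
            // invokeAll returns once all jobs are complete, so the caller blocks here.
            for (Future<Integer> result : workers.invokeAll(jobs)) {
                handled.add(result.get());
            }
            return handled;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    // Placeholder for applying the rule set to one job; it only reports the job's size.
    private static int infer(List<String> job) {
        return job.size();
    }

    public static void main(String[] args) {
        // Assumption: fall back to one worker thread when the property is not set.
        int numThreads = Integer.getInteger("num.threads.run", 1);
        List<String> committed = Arrays.asList("s1", "s2", "s3", "s4", "s5");
        System.out.println(commit(committed, numThreads, 2));
    }
}
```

A larger job size means fewer hand-offs between threads but coarser load balancing; the sketch makes no claim about which values suit a given data set.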
There are other threads running simultaneously apart from the main application thread and the worker threads mentioned above. If persistence is not switched off, then a persistence thread wakes up every few seconds and scans for new explicit statements. Any new statements found are added to the persistence file. There is also an additional thread that is spawned (by Sesame) during the parsing of RDF at load time. This should also be considered when deciding how many worker threads to allocate for inference.