GraphDB-Lite Indexing Specifics

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

Comparison of GraphDB-SE and GraphDB-Lite

The major differences between GraphDB-SE and GraphDB-Lite are their performance and scalability. Both GraphDB editions deliver identical functionality for RDF storage, inference and query answering, and they both implement Sesame's SAIL APIs, as discussed in section 4. This guarantees that all essential functions of a semantic repository are supported by GraphDB in a standard, consistent, and interoperable manner.
Compared to GraphDB-SE, GraphDB-Lite does not scale as well in terms of the volume of data that can be managed; the upper limit is typically some tens of millions of statements. Because it holds all data in memory, GraphDB-Lite can often perform inferencing and query answering faster. This is not always the case, however, because GraphDB-SE has several optimisations that GraphDB-Lite lacks, such as special owl:sameAs handling and various query optimisations.
Furthermore, GraphDB-SE has a range of advanced features that are not included in GraphDB-Lite, including RDF Ranking, RDF Priming, RDF Search, Node Search and notifications.

Persistence Strategy

GraphDB-Lite stores the repository contents in a binary file in the storage folder when it is shut down. The format is such that it can be quickly read back into memory when GraphDB-Lite is restarted. In other words, the in-memory contents of the repository are synchronised with the persistent binary storage file only at initialisation and shutdown.
Furthermore, new statements that are added to the repository are also stored in N-Triples format in an external file (see the new-triples-file configuration parameter in section 8.4). In the event of abnormal termination, the contents of this external file are added to the repository immediately after the repository is restored from the binary file.

It is vitally important to shut down repository connections properly to ensure that the repository contents are written to the file system on shutdown.
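
The recovery sequence described above (restore from the storage file, then replay the externally logged statements) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyPersistentStore`, the file names `snapshot.txt` and `new-triples.nt`, and the plain-text snapshot are all hypothetical and are not GraphDB-Lite's actual classes or file formats. The model also appends to the log immediately and clears it after a clean shutdown, both of which are simplifications made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model (hypothetical, not GraphDB code): one N-Triples line per statement.
class ToyPersistentStore {
    private final Path snapshotFile;   // stands in for the binary storage file
    private final Path newTriplesFile; // stands in for the external N-Triples file
    private final Set<String> statements = new LinkedHashSet<>();

    ToyPersistentStore(Path storageDir) {
        try {
            Files.createDirectories(storageDir);
            snapshotFile = storageDir.resolve("snapshot.txt");
            newTriplesFile = storageDir.resolve("new-triples.nt");
            // Step 1: restore the state written at the last clean shutdown.
            if (Files.exists(snapshotFile)) {
                statements.addAll(Files.readAllLines(snapshotFile, StandardCharsets.UTF_8));
            }
            // Step 2: replay statements logged after that snapshot (abnormal termination).
            if (Files.exists(newTriplesFile)) {
                statements.addAll(Files.readAllLines(newTriplesFile, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Adds a statement in memory and appends it to the external log file.
    void add(String nTriplesLine) {
        if (!statements.add(nTriplesLine)) {
            return; // already present, nothing new to log
        }
        try {
            Files.write(newTriplesFile, Collections.singletonList(nTriplesLine),
                    StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean contains(String nTriplesLine) {
        return statements.contains(nTriplesLine);
    }

    // Clean shutdown: write the full snapshot, then clear the log (assumption of this sketch).
    void shutDown() {
        try {
            Files.write(snapshotFile, statements, StandardCharsets.UTF_8);
            Files.deleteIfExists(newTriplesFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Creating a store, adding a statement and skipping `shutDown()` simulates an abnormal termination: the next store opened on the same folder still sees the statement because it is replayed from the log file.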

Transaction Support

Transaction support is exposed via Sesame's RepositoryConnection interface. The three methods of this interface that give the client control over when updates are committed to the repository are shown below:

Method | Effect
void commit() | Commits all updates that have been performed through this connection since the last commit or rollback operation.
void rollback() | Rolls back all updates that have been performed through this connection object since the last commit or rollback operation.
void setAutoCommit(boolean autoCommit) | Enables or disables auto-commit mode for the connection.

GraphDB supports the so-called 'read committed' transaction isolation level, well known from relational database management systems. It guarantees that changes will not affect query evaluation until the entire transaction they are part of has been successfully committed. It does not guarantee that execution of a single transaction is performed against a single state of the data in the repository. Regarding concurrency:

  • multiple update/modification/write transactions can be initiated and stay open simultaneously, i.e. one transaction does not need to be committed in order to allow another transaction to complete;
  • update transactions are processed internally in sequence, i.e. GraphDB processes the commits one after another;
  • update transactions do not block read requests in any way, i.e. hundreds of SPARQL queries can be evaluated in parallel (the processing is properly multi-threaded), while update transactions are being handled on separate threads.

Note that GraphDB performs materialisation, making sure that all the statements that can be inferred from the current state of the repository are indexed and persisted. When the commit method completes, all reasoning activities related to the data changes introduced by the corresponding transaction will already have been performed.

An uncommitted transaction will not affect the 'view' of the repository through any connection, including the connection used to make the modification. This is perhaps not in keeping with most relational database implementations. However, committing a modification to a semantic repository involves considerably more work, specifically the computation of the changes to the inferred closure resulting from the addition or removal of explicit statements. This computation is carried out only at the point where the transaction is committed and so, to be consistent, neither the inferred statements nor the modified statements related to the transaction are 'visible' before then.
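
The visibility rule above can be made concrete with a toy in-memory model. This is a minimal sketch, not GraphDB's implementation: `ToyRepository` and `ToyConnection` are hypothetical names, statements are plain strings, and inference is left out entirely. It shows only that pending changes stay invisible to every connection, including the one that made them, until `commit()` is called, and that commits are applied one at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not GraphDB code): statements are plain strings.
class ToyRepository {
    // Committed state; replaced as a whole on commit so readers never see a partial update.
    private volatile Set<String> committed = Collections.emptySet();

    ToyConnection getConnection() {
        return new ToyConnection(this);
    }

    Set<String> committedView() {
        return committed;
    }

    // Commits are processed in sequence: only one transaction is applied at a time.
    synchronized void apply(List<String> additions) {
        Set<String> next = new HashSet<>(committed);
        next.addAll(additions);
        committed = Collections.unmodifiableSet(next);
    }
}

class ToyConnection {
    private final ToyRepository repository;
    private final List<String> pending = new ArrayList<>();

    ToyConnection(ToyRepository repository) {
        this.repository = repository;
    }

    // Buffers the statement; nothing becomes visible yet.
    void add(String statement) {
        pending.add(statement);
    }

    // Reads consult committed data only, even on the connection holding pending changes.
    boolean hasStatement(String statement) {
        return repository.committedView().contains(statement);
    }

    void commit() {
        repository.apply(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }
}
```

In this model, a statement added through one connection is reported as absent by `hasStatement` on both that connection and any other until `commit()` returns, and a `rollback()` simply discards the buffered additions.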

Handling of Explicit and Implicit Statements

As already described, GraphDB-Lite applies the inference rules at load time in order to compute the full closure. Therefore, a repository will contain some statements that are explicitly asserted and other statements that exist through implication. In most cases clients will not be concerned with the difference. However, there are some scenarios in which it is useful to work with only explicit or only implicit statements. The following sections describe how these two groups of statements can be isolated during programmatic statement retrieval using the Sesame API and during (SPARQL) query evaluation.
The usual technique for retrieving statements is to use the RepositoryConnection method:

    RepositoryResult<Statement> getStatements(Resource subj, URI pred, Value obj, boolean includeInferred, Resource... contexts)

The method retrieves statements by 'triple pattern', where any or all of the subject, predicate and object parameters can be null to indicate 'wild cards'.
To retrieve explicit and implicit statements, the includeInferred parameter must be set to true. To retrieve only explicit statements, the includeInferred parameter must be set to false.
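
The combined effect of null wild cards and the includeInferred flag can be sketched with a toy store. The classes below are hypothetical stand-ins written for this example; they use plain strings rather than Sesame's model types and only mirror the parameter semantics described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical): a statement plus a flag recording whether it was asserted or inferred.
class ToyStatement {
    final String subject;
    final String predicate;
    final String object;
    final boolean explicit;

    ToyStatement(String subject, String predicate, String object, boolean explicit) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.explicit = explicit;
    }
}

class ToyStore {
    private final List<ToyStatement> statements = new ArrayList<>();

    void add(String subject, String predicate, String object, boolean explicit) {
        statements.add(new ToyStatement(subject, predicate, object, explicit));
    }

    // A null subject, predicate or object matches anything.
    // includeInferred=false keeps only explicitly asserted statements.
    List<ToyStatement> getStatements(String subject, String predicate, String object,
                                     boolean includeInferred) {
        List<ToyStatement> result = new ArrayList<>();
        for (ToyStatement st : statements) {
            boolean matches = (subject == null || subject.equals(st.subject))
                    && (predicate == null || predicate.equals(st.predicate))
                    && (object == null || object.equals(st.object));
            if (matches && (includeInferred || st.explicit)) {
                result.add(st);
            }
        }
        return result;
    }
}
```

For example, if the store holds an explicit statement that `ex:Rex` has type `ex:Dog` and an inferred statement that `ex:Rex` has type `ex:Animal`, calling `getStatements("ex:Rex", null, null, true)` returns both, while passing `false` returns only the explicit one.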

Multi-threading

GraphDB-Lite features thread-safe techniques for managing the internal storage of data structures, where several 'worker' threads do inferencing in parallel. The number of worker threads is controlled via the num.threads.run=n system property (see section 8.4). Each of the allocated threads operates on jobs of committed explicit statements. Each thread computes the increment to the inferred closure from applying the rule set to these new statements and the existing statements in the repository. The number of statements collected into each job is controlled via the jobsize configuration parameter. When a transaction is committed, the caller is blocked until all worker threads have finished their job units.
There are other threads running simultaneously apart from the main application thread and the worker threads mentioned above. If persistence is not switched off, then a persistence thread wakes up every few seconds and scans for new explicit statements. Any new statements found are added to the persistence file. There is also an additional thread that is spawned (by Sesame) during the parsing of RDF at load time. This should also be considered when deciding how many worker threads to allocate for inference.
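
The job-based scheme described above (split the committed statements into jobs, hand the jobs to a fixed number of worker threads, and block the caller until all jobs are done) can be sketched with the standard Java executor framework. This is an illustration only: `ToyInferencer` is a hypothetical name, its `numThreads` and `jobSize` arguments merely play the roles of num.threads.run and jobsize, and the "inference" is a placeholder string transformation rather than rule evaluation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model (hypothetical, not GraphDB code).
class ToyInferencer {

    // Splits the newly committed statements into jobs of at most jobSize statements.
    static List<List<String>> partition(List<String> statements, int jobSize) {
        if (jobSize <= 0) {
            throw new IllegalArgumentException("jobSize must be positive");
        }
        List<List<String>> jobs = new ArrayList<>();
        for (int i = 0; i < statements.size(); i += jobSize) {
            int end = Math.min(i + jobSize, statements.size());
            jobs.add(new ArrayList<>(statements.subList(i, end)));
        }
        return jobs;
    }

    // Runs all jobs on numThreads workers and returns only when every job has finished.
    static List<String> commit(List<String> statements, int numThreads, int jobSize) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (List<String> job : partition(statements, jobSize)) {
                tasks.add(() -> {
                    List<String> inferred = new ArrayList<>();
                    for (String statement : job) {
                        // Placeholder for applying the rule set to the statement.
                        inferred.add("inferred-from(" + statement + ")");
                    }
                    return inferred;
                });
            }
            List<String> all = new ArrayList<>();
            // invokeAll blocks the caller until all tasks are complete.
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                all.addAll(future.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker job failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With five statements and a job size of two, `partition` yields three jobs, and `commit` returns one placeholder result per statement after all workers have finished. In a real deployment the persistence thread and the RDF parsing thread mentioned above also compete for processor time, which is why they matter when choosing the worker count.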
