GraphDB-SE Indexing Specifics

Version 4 by reneta.popova
on Aug 25, 2014 15:14.

Compared to GraphDB-Lite, GraphDB-SE adds features and improves many aspects of performance, in most cases through special indexing strategies that allow more efficient retrieval. Functionally, the differences fall into two groups:
* Do the same better: the feature does not let the user do anything new with GraphDB-SE; rather, it makes an existing capability work better in specific circumstances. This is the case with [predicate lists|GraphDB-SE Indexing Specifics#Predicate Lists] and the [owl:sameAs optimisation|GraphDB-SE Reasoner].
* Do more: the feature delivers a new type of functionality that is not available in GraphDB-Lite. Examples are [RDF ranking|GraphDB-SE RDF Rank] and [RDF priming|GraphDB-SE Experimental Features]. [Full-text search features|GraphDB-SE Full-text Search] are also available in GraphDB-SE, although these could be seen as an enhancement, since similar but less powerful behaviour is available through regular expression constraints (which disregard tokenisation and therefore deliver different results).
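
To illustrate why regular expression constraints deliver different results from tokenised full-text search, here is a minimal Java sketch. The class and method names and the simple split-on-non-letters tokeniser are hypothetical stand-ins chosen for illustration; they are not the GraphDB-SE full-text implementation, which applies its own analysis rules.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

public class RegexVsTokens {

    // Regex-style matching: finds the term as a character sequence anywhere
    // in the literal, ignoring word boundaries (similar in spirit to a
    // case-insensitive SPARQL regex() filter). Pattern.quote treats the term literally.
    public static boolean regexMatches(String literal, String term) {
        return Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE)
                .matcher(literal)
                .find();
    }

    // Token-style matching: lower-cases the literal, splits it into words on
    // any run of non-letter/non-digit characters, and compares whole tokens.
    // This toy tokeniser is an assumption made for illustration only.
    public static boolean tokenMatches(String literal, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return Arrays.stream(literal.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .anyMatch(needle::equals);
    }

    public static void main(String[] args) {
        String literal = "Quartz counter tops";
        System.out.println(regexMatches(literal, "art"));     // true: "art" occurs inside "quartz"
        System.out.println(tokenMatches(literal, "art"));     // false: "art" is not a word here
        System.out.println(tokenMatches(literal, "counter")); // true: whole-word match
    }
}
```

With the literal "Quartz counter tops", the regular expression for "art" matches inside "quartz", while the token-based check does not, because "art" is not one of the literal's words.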

In the 'do more' category, GraphDB-SE delivers functionality that is not exposed through the Sesame API; typically, this is achieved with special-purpose system predicates. Be aware that using the 'do more' features affects compatibility with other semantic repositories.

h1. Persistence Strategy

GraphDB-SE stores all of its data (statements, indexes, entity pool, etc.) in files in the configured storage directory, usually called 'storage'. The content and names of these files are not defined and are subject to change between versions. In general, the index structures used in GraphDB-SE are chosen and optimised to allow for efficient:
* handling of billions of statements under reasonable RAM constraints
* query optimisation
* transaction management

GraphDB-SE maintains two main indices on statements for use in inference and query evaluation: the predicate-object-subject (POS) index and the predicate-subject-object (PSO) index. Many other data structures are used to enable efficient manipulation of RDF data, but they are not listed here, since these internal mechanisms cannot be configured.
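
The benefit of keeping the same statements in two orders can be shown with a toy in-memory model. This Java sketch assumes that every entity has already been mapped to a numeric ID (hypothetically, by something like the entity pool) and stores each statement once per index, permuted into POS or PSO order inside a sorted set. A pattern with the predicate and object bound then becomes one contiguous range of POS, and a pattern with the subject and predicate bound becomes one contiguous range of PSO. It is a simplification for illustration, not GraphDB-SE's actual storage format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PosPsoSketch {

    // Lexicographic order over fixed-length long[] keys.
    private static final Comparator<long[]> LEX = (a, b) -> {
        for (int i = 0; i < a.length; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    };

    // Both indices hold the same statements, with components permuted.
    private final NavigableSet<long[]> pos = new TreeSet<>(LEX); // keys are (p, o, s)
    private final NavigableSet<long[]> pso = new TreeSet<>(LEX); // keys are (p, s, o)

    public void add(long s, long p, long o) {
        pos.add(new long[] {p, o, s});
        pso.add(new long[] {p, s, o});
    }

    // Pattern (?s, p, o): all keys starting with (p, o) form one range of POS.
    public List<Long> subjects(long p, long o) {
        List<Long> out = new ArrayList<>();
        for (long[] k : pos.subSet(new long[] {p, o, Long.MIN_VALUE}, true,
                                   new long[] {p, o, Long.MAX_VALUE}, true)) {
            out.add(k[2]);
        }
        return out;
    }

    // Pattern (s, p, ?o): all keys starting with (p, s) form one range of PSO.
    public List<Long> objects(long s, long p) {
        List<Long> out = new ArrayList<>();
        for (long[] k : pso.subSet(new long[] {p, s, Long.MIN_VALUE}, true,
                                   new long[] {p, s, Long.MAX_VALUE}, true)) {
            out.add(k[2]);
        }
        return out;
    }

    public static void main(String[] args) {
        PosPsoSketch idx = new PosPsoSketch();
        // Hypothetical IDs: 1 = :alice, 2 = :bob, 10 = :knows, 20 = :carol
        idx.add(1, 10, 20);
        idx.add(2, 10, 20);
        idx.add(1, 10, 2);
        System.out.println(idx.subjects(10, 20)); // [1, 2]
        System.out.println(idx.objects(1, 10));   // [2, 20]
    }
}
```

In this toy model a pattern with the predicate unbound, such as (s, ?p, ?o), is not a single range in either index and would need a scan over predicates; retrieval patterns of that kind are where optional extra indices can pay off.
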
The following subsections describe several indexing options that deliver considerable advantages for specific datasets, retrieval patterns and query loads. Most of these are switched off by default, so the user must switch them on as necessary. Unless otherwise stated, GraphDB-SE allows indices to be switched on and off for an already populated repository: shut down the repository before changing its configuration, and the next time the repository is started, GraphDB-SE will create or remove the corresponding index. If the repository already holds a large volume of data, switching on a new index can cause considerable delays during initialisation, since this is the time required to build the new index.
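
The restart-time behaviour described above amounts to comparing the indices requested in the configuration with those found in storage. The Java sketch below shows that reconciliation step under stated assumptions: the class, method and index names (such as `extra-index-A`) are hypothetical and do not correspond to actual GraphDB-SE configuration parameters or internals.

```java
import java.util.Set;
import java.util.TreeSet;

public class IndexReconciler {

    // Outcome of comparing configured indices with those found in storage.
    public static final class Plan {
        private final Set<String> toBuild;
        private final Set<String> toDrop;

        Plan(Set<String> toBuild, Set<String> toDrop) {
            this.toBuild = toBuild;
            this.toDrop = toDrop;
        }

        public Set<String> toBuild() { return toBuild; }
        public Set<String> toDrop() { return toDrop; }
    }

    // Run once at startup, before the repository accepts requests.
    public static Plan plan(Set<String> configured, Set<String> onDisk) {
        Set<String> build = new TreeSet<>(configured);
        build.removeAll(onDisk);     // switched on since the last run: must be built
        Set<String> drop = new TreeSet<>(onDisk);
        drop.removeAll(configured);  // switched off since the last run: can be removed
        return new Plan(build, drop);
    }

    public static void main(String[] args) {
        // Hypothetical index names, for illustration only.
        Set<String> onDisk = Set.of("extra-index-A");
        Set<String> configured = Set.of("extra-index-B");
        Plan p = plan(configured, onDisk);
        System.out.println("build: " + p.toBuild()); // build: [extra-index-B]
        System.out.println("drop: " + p.toDrop());   // drop: [extra-index-A]
    }
}
```

Any index that appears only in the configuration has to be built from the statements already stored, which is why the initialisation delay grows with the size of the repository.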