GraphDB-SE Indexing Specifics

Compared with Version 5 by Reneta Popova on Aug 25, 2014 16:11.

* update transactions do not block read requests in any way, i.e. hundreds of SPARQL queries can be evaluated in parallel (the processing is properly multi-threaded) while update transactions are being handled on separate threads.

One should note that GraphDB performs materialisation, making sure that all the statements that can be inferred from the current state of the repository are indexed and persisted (except for those compressed due to the *owl:sameAs* optimisation, described in section 7.5). When the commit method completes, all reasoning activities related to the changes in the data introduced by the corresponding transaction will have already been performed.

{note}An uncommitted transaction will not affect the 'view' of the repository through any connection, +including the connection used to do the modification+. This is perhaps not in keeping with most relational database implementations. However, committing a modification to a semantic repository involves considerably more work, specifically the computation of the changes to the inferred closure resulting from the addition or removal of explicit statements. This computation is only carried out at the point where the transaction is committed and so, to be consistent, neither the inferred statements nor the modified statements related to the transaction are 'visible' until then.{note}
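
The visibility rule in the note above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class {{CommitVisibilitySketch}}, its methods and the single symmetric rule for {{:marriedTo}} are hypothetical and are not part of the GraphDB or Sesame API. It models additions only and recomputes the whole closure at commit, which a real engine would not need to do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of commit-time visibility; hypothetical, not the GraphDB or Sesame API. */
final class CommitVisibilitySketch {
    private final Set<String> explicit = new HashSet<>();    // committed explicit statements
    private Set<String> visible = new HashSet<>();           // committed explicit + inferred statements
    private final List<String> pending = new ArrayList<>();  // uncommitted additions

    /** Queues a statement; nothing becomes visible until commit(). */
    void add(String subject, String predicate, String object) {
        pending.add(subject + " " + predicate + " " + object);
    }

    /** Reads only ever see the state produced by the last commit. */
    boolean isVisible(String subject, String predicate, String object) {
        return visible.contains(subject + " " + predicate + " " + object);
    }

    /** Applies pending additions and recomputes the inferred closure in one step. */
    void commit() {
        explicit.addAll(pending);
        pending.clear();
        visible = closure(explicit);
    }

    /** Stand-in for reasoning: an assumed rule that :marriedTo is symmetric. */
    private static Set<String> closure(Set<String> statements) {
        Set<String> result = new HashSet<>(statements);
        for (String statement : statements) {
            String[] t = statement.split(" ");
            if (t[1].equals(":marriedTo")) {
                result.add(t[2] + " " + t[1] + " " + t[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CommitVisibilitySketch repo = new CommitVisibilitySketch();
        repo.add(":alice", ":marriedTo", ":bob");
        System.out.println("before commit: " + repo.isVisible(":alice", ":marriedTo", ":bob"));
        repo.commit();
        System.out.println("after commit (inferred): " + repo.isVisible(":bob", ":marriedTo", ":alice"));
    }
}
```

In this model the same object that queued the addition still cannot see it before {{commit()}}, mirroring the behaviour described for the modifying connection.
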
Index compression is controlled using a single configuration parameter called {{index-compression-ratio}}, whose default value is {{\-1}}, indicating no compression. To create a repository that uses ZIP compression, set this parameter to a value between 10 and 50 percent (inclusive). Once the repository has been created, the compression ratio cannot be changed.

The value for this parameter indicates the attempted compression ratio for pages: the smaller the value, the more compression is attempted. Pages that cannot be compressed below the requested size are stored uncompressed. Therefore, setting this value too low will not save any disk space and will simply add to the processing overhead. Typically, a value of 30% gives good performance with significant disk-space reduction, i.e. around 70% less disk space used for each index. The total disk space requirements are typically reduced by around half when using index compression at 30%.
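
The "store uncompressed if the target is missed" policy described above can be sketched as follows. This is an illustration only: the class {{PageCompressionSketch}}, the 4096-byte page size and the use of {{java.util.zip.Deflater}} are assumptions made for the example and do not describe GraphDB's actual page format or implementation.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

/** Hypothetical illustration of a ratio-bounded page store; not GraphDB code. */
final class PageCompressionSketch {

    /** Outcome of storing one page. */
    static final class StoredPage {
        final boolean compressed;
        final byte[] bytes;

        StoredPage(boolean compressed, byte[] bytes) {
            this.compressed = compressed;
            this.bytes = bytes;
        }
    }

    /**
     * Tries to DEFLATE a page into at most ratioPercent of its original size.
     * If the compressed form does not fit, the page is kept uncompressed and
     * the compression attempt was pure overhead.
     */
    static StoredPage store(byte[] page, int ratioPercent) {
        if (ratioPercent < 10 || ratioPercent > 50) {
            throw new IllegalArgumentException("ratio must be between 10 and 50");
        }
        int targetSize = page.length * ratioPercent / 100;
        byte[] buffer = new byte[targetSize];
        Deflater deflater = new Deflater();
        deflater.setInput(page);
        deflater.finish();
        int written = deflater.deflate(buffer);
        boolean fits = deflater.finished(); // true only if all output fit in the buffer
        deflater.end();
        return fits
                ? new StoredPage(true, Arrays.copyOf(buffer, written))
                : new StoredPage(false, page);
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[4096];   // highly compressible (all zeros)
        byte[] noise = new byte[4096];        // effectively incompressible
        new Random(42).nextBytes(noise);
        System.out.println("repetitive compressed: " + store(repetitive, 30).compressed);
        System.out.println("noise compressed: " + store(noise, 30).compressed);
    }
}
```

With a very low ratio, more pages fall into the second branch, which is why an overly aggressive setting costs processing time without saving space.
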

h1. Literal Index

A literal index is built automatically to allow faster look-ups of numeric and date/time object values. The index is used during query evaluation only if a query or a subquery (e.g. union) has a filter that is composed of a conjunction of literal constraints using comparisons and equality (but no negation or inequality), e.g. FILTER(?x = 100 && ?y <= 5 && ?start > "2001-01-01"^^xsd:date)
Other patterns will not use the index in this version of GraphDB, i.e. no attempt is made to re-write filters into usable patterns.
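
As a conceptual aid, the sketch below shows why comparison and equality constraints map naturally onto an ordered index: each one becomes a single contiguous range scan. It is an assumption-laden illustration, not a description of GraphDB's literal index; the class {{LiteralRangeSketch}} and its methods are hypothetical and use a plain {{java.util.TreeMap}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical ordered index over numeric object values; not GraphDB's structure. */
final class LiteralRangeSketch {
    // Numeric value -> subjects that have this value as their object.
    private final NavigableMap<Double, List<String>> byValue = new TreeMap<>();

    void index(String subject, double value) {
        byValue.computeIfAbsent(value, k -> new ArrayList<>()).add(subject);
    }

    /** Subjects whose value v satisfies low <= v <= high, found with one range scan. */
    List<String> between(double low, double high) {
        List<String> result = new ArrayList<>();
        for (List<String> subjects : byValue.subMap(low, true, high, true).values()) {
            result.addAll(subjects);
        }
        return result;
    }

    public static void main(String[] args) {
        LiteralRangeSketch idx = new LiteralRangeSketch();
        idx.index(":a", 3);
        idx.index(":b", 5);
        idx.index(":c", 100);
        idx.index(":d", 7);
        // Analogue of ?y <= 5: a range with an open lower bound.
        System.out.println(idx.between(Double.NEGATIVE_INFINITY, 5));
        // Analogue of ?x = 100: a degenerate range.
        System.out.println(idx.between(100, 100));
    }
}
```

An inequality such as {{?x != 100}} would correspond to two disjoint ranges rather than one, which is one plausible reason such constraints are harder to serve from an ordered index.
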

The method retrieves statements by 'triple pattern', where any or all of the subject, predicate and object parameters can be *null* to indicate 'wild cards'.
To retrieve explicit and implicit statements, the *includeInferred* parameter must be set to *true*. To retrieve only explicit statements, the *includeInferred* parameter must be set to *false*.
However, the Sesame API does not provide a means of retrieving implicit statements only. To allow clients to do this, GraphDB-SE allows the use of the special 'implicit' pseudo-graph (section 10.7.1) with this API, which can be passed as the context parameter. The following example shows how only implicit statements can be retrieved:

{code}RepositoryResult<Statement> statements =