OWLIM-SE Release notes

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

New features and significant bug-fixes/updates for the last few versions are recorded here.

Version 4.4

Transaction management and isolation mechanisms have been completely refactored. The previous strategy used very lazy writing of modified database pages, such that dirty pages were only flushed to disk when further updates occurred and no more memory was available. While extremely fast, this approach had the drawback of a considerable recovery time, because the transaction log had to be replayed after an abnormal termination.
The new mechanism uses two modes: 'bulk-loading', which behaves similarly to previous versions, and 'normal', where database modifications are flushed to disk as part of the commit operation.
Some side-effects of this change are:

  • The special flush predicate used to force pages to be written to disk is no longer required and has been removed
  • There is a new parameter to control the transaction 'mode' called transaction-mode
  • The database-recovery-policy configuration parameter is also no longer required and has been removed
    It is important to upgrade from a previous version of OWLIM only if the repository has been shut down cleanly.

SPARQL 1.1 Graph Store HTTP Protocol is now supported according to the W3C Working Draft of 12 May 2011. This provides a REST interface for managing collections of graphs, which can be identified either directly or indirectly.
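As a rough illustration of indirect graph identification, the following Python sketch builds request URLs in which the graph is named in the query string. The service URL and graph IRI are hypothetical, and the `graph` and `default` parameter names are taken from the W3C Graph Store HTTP Protocol documents; the exact endpoint path exposed by an OWLIM installation is an assumption and should be checked against the deployment.

```python
from urllib.parse import quote


def indirect_graph_url(service_url, graph_iri=None):
    """Build a request URL that identifies a graph indirectly.

    With no graph IRI the default graph is addressed; otherwise the
    graph IRI is percent-encoded and passed as the 'graph' parameter.
    """
    if graph_iri is None:
        return service_url + "?default"
    return service_url + "?graph=" + quote(graph_iri, safe="")


# Hypothetical service URL, for illustration only.
SERVICE = "http://localhost:8080/rdf-graphs/service"

print(indirect_graph_url(SERVICE, "http://example.org/graphs/staff"))
print(indirect_graph_url(SERVICE))
```

With direct identification, by contrast, the request URL is itself the IRI of the graph, so no query parameter is needed.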

Index compression can now be used to reduce disk storage requirements by using zip compression on database pages. This feature is off by default, but can be switched on when creating a new repository. The configuration parameter index-compression-ratio can be set to -1 (the default value, indicating no compression) or to a value in the range [10-50] indicating the desired percentage reduction in page sizes. Any pages that cannot be compressed by the specified amount are stored uncompressed. Therefore, a compression ratio that is too aggressive will bring few benefits. Experiments have shown that for large datasets a value of about 30% is close to optimal.
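The rule that a page is stored compressed only when it shrinks by at least the requested percentage can be sketched as follows. This is a minimal illustration, not OWLIM's implementation: the function name `store_page` is hypothetical, and Python's `zlib` (DEFLATE) stands in for whatever compression OWLIM applies to its pages.

```python
import os
import zlib


def store_page(page, ratio_percent):
    """Return (is_compressed, stored_bytes) for one database page.

    ratio_percent mirrors index-compression-ratio: -1 disables
    compression, 10..50 is the minimum required size reduction.
    """
    if ratio_percent == -1:
        return False, page
    if not 10 <= ratio_percent <= 50:
        raise ValueError("ratio must be -1 or in the range 10..50")
    packed = zlib.compress(page)
    # Largest compressed size that still meets the requested reduction.
    limit = len(page) * (100 - ratio_percent) / 100
    if len(packed) <= limit:
        return True, packed
    # The page did not shrink enough, so it is kept uncompressed.
    return False, page


repetitive = bytes(4096)          # compresses very well
random_like = os.urandom(4096)    # effectively incompressible
print(store_page(repetitive, 30)[0])
print(store_page(random_like, 30)[0])
```

The sketch also shows why an overly aggressive ratio helps little: the higher the required reduction, the more pages fall through to the uncompressed branch.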

Entity compression is a modification that reduces the storage requirements for the lookup table that maps between internal identifiers and resources. This is transparent to the user and happens automatically upon upgrading to OWLIM 4.4.

A new literal index is created automatically for numeric and date-time datatypes. The index is used during query evaluation only if a query or a subquery (e.g. union) has a filter that consists of a conjunction of literal constraints, e.g. FILTER(?x >= 3 && ?y <= 5 && ?start > "2001-01-01"^^xsd:date)
Other patterns, including those that use negation, will not use the index in this version of OWLIM.
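To see why a conjunction of range constraints suits an index over literal values, consider the sketch below. It assumes a simple sorted in-memory structure, which is not a description of OWLIM's literal index; the class name `SortedLiteralIndex` and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedLiteralIndex:
    """Toy index over (numeric value, subject) pairs, sorted by value."""

    def __init__(self, pairs):
        self._entries = sorted(pairs)
        self._keys = [value for value, _ in self._entries]

    def range(self, low, high):
        """Return subjects whose value v satisfies low <= v <= high."""
        start = bisect_left(self._keys, low)
        end = bisect_right(self._keys, high)
        return [subject for _, subject in self._entries[start:end]]


index = SortedLiteralIndex([(1, "a"), (3, "b"), (4, "c"), (5, "d"), (9, "e")])
# Equivalent in spirit to FILTER(?x >= 3 && ?x <= 5)
print(index.range(3, 5))
```

A conjunction of bounds maps onto one contiguous slice of the sorted values, whereas a negated constraint would select everything outside a slice, which is one plausible reason such patterns are not served by the index.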

The Lucene-based full-text search index can now be updated incrementally, so that only those resources specified using a special predicate in a control query are re-indexed. This technique can avoid the more expensive approach of rebuilding the whole index frequently.

Incremental RDF Rank allows the RDF rank for specific resources to be (re-)computed as directed by the user. Again, this technique can avoid the more expensive approach of rebuilding all RDF Rank values frequently.

The geo-spatial index has been updated to support 40-bit resource identifiers (if they are used). Due to compatibility issues, the geo-spatial index MUST be recreated after upgrading to OWLIM 4.4.

The getting-started and LUBM benchmark applications have been restructured so that they now work with remote repositories.
TODO check LUBM

OWLIM also includes the following maintenance updates and fixes:
TODO build JIRA release notes

Version 4.3

Further contributions to the Sesame framework from Ontotext and Fluid Operations mean that Sesame version 2.6 is included with this version of OWLIM. The following new features are available:

  • SPARQL 1.1 Federation support that allows queries to pull together data from any number of distributed SPARQL endpoints
  • A new SPARQL repository type to wrap SPARQL endpoints
  • Improvements to the parser for controlling the level of literal/data-type validation and the handling of errors
  • Many other fixes for compliance with the latest revised SPARQL 1.1 working drafts

OWLIM now has a plug-in API that allows users to build software components that alter the behaviour of OWLIM. This mechanism can be used to add new features or to improve performance in certain scenarios.

OWLIM also includes the following maintenance updates and fixes:

  • OWLIM-205 - Validate literal languages and do not allow invalid language tags to enter the repository
  • OWLIM-273 - Potential thread leak in QueryModelConverter
  • OWLIM-390 - Counting statements using Sesame API gives strange results.
  • OWLIM-419 - Make RepositoryConnection.exportStatements obey the time limit
  • OWLIM-426 - Unable to permanently remove predefined namespace definitions
  • OWLIM-428 - Explicit axioms don't show up as explicit if they have been inferred before by other axioms
  • OWLIM-463 - Clear transaction log in replication cluster if it cannot be initialized
  • OWLIM-466 - SesameConnectionImpl.getStatements must return quads, not trips (breaks workbench explore)
  • OWLIM-470 - Query with Union and optional returns wrong results
  • OWLIM-471 - Can not access new repository when FTS switched on (divide by zero or lockfile locked)
  • OWLIM-473 - onto:explicit pseudo-graph does not prevent implicit statements as input for query answering
  • OWLIM-475 - Repackaged console.sh in openrdf-console.zip has lost its execute attribute
  • OWLIM-476 - Neither of the slf4j jars (api or jdk14) are needed in the war files
  • OWLIM-483 - Lost solutions to queries with FROM <...> clause
  • OWLIM-485 - Repository with many transactions fails to get restored
  • OWLIM-488 - Incorrect behaviour of FROM and FROM NAMED in SPARQL queries
  • OWLIM-489 - Predicate list indices do not log statistics
  • OWLIM-490 - User-supplied Dataset object on query not properly handled
  • OWLIM-491 - Query rewriting in MainQuery.convertToOptimizedForm() converts OR to AND in filters when converting the condition to disjunctive normal form
  • OWLIM-495 - Blank node contexts ignored by getStatements()
  • OWLIM-501 - Lucene and OPTIONAL query bug
  • OWLIM-502 - The database restorer deletes the pso and pos files after second unsuccessful restore
  • OWLIM-457 - Validate data-type values at load time
  • OWLIM-497 - Update getting-started and add timestamps
  • OWLIM-356 - Optimized rule set is not compatible with the rule compiler.
  • OWLIM-480 - Make use of the com.ontotext.trree.collections for the predicate map in order to reuse the file header and the common interface

Version 4.2

Ontotext have continued to invest in the Sesame project and are pleased to announce the inclusion of Sesame version 2.5 with this version of OWLIM. The benefits include:

  • SPARQL 1.1 Update - this extension of SPARQL provides a much more powerful method to modify RDF databases without the requirement for developers to use frameworks and APIs.
  • SPARQL 1.1 Query conformance has been updated to the May 2011 working draft, i.e. all the remaining behaviour has been implemented along with all the new SPARQL filter functions.
  • The SPARQL protocol has also been updated to the January 2010 working draft.
  • A new binary RDF serialization format. This format has been derived from the existing binary tuple results format. Its main features are reduced parsing overhead and minimal memory requirements.

As well as integration with the new Sesame APIs and modifications for optimising SPARQL Update, there have also been a number of bug fixes in this version of OWLIM-SE:

  • OWLIM-396 - A RuntimeException is thrown in clearNamespaces() in SailConnection
  • OWLIM-404 - HashEntityPool fails to store/read its entity index table if its size is more than ~500M
  • OWLIM-408 - Getting of default namespace doesn't work
  • OWLIM-440 - Can not create geo-spatial index when using OWLIM-SE with Tomcat
  • OWLIM-443 - Repository fails to start - entity pool error
  • OWLIM-445 - disable-sameAs causing query evaluation to lose bindings
  • OWLIM-446 - Query.setIncludeInferred() is ignored
  • OWLIM-447 - License file can not be specified - default evaluation license is always used.
  • OWLIM-449 - Wrong conversion from int to long in com.ontotext.trree.plugin.lucene.LuceneIterator
  • OWLIM-452 - Multiple wrong results are returned for a CONSTRUCT query
  • OWLIM-454 - EntityStorageVersion3 fails to restore if a long entity has negative size.
  • OWLIM-455 - Cannot put any more statements in AVL tree after ~3.1B statements added during 3.5-to-4.0 conversion
  • OWLIM-305 - Rationalise OWLIM vocabulary

Version 4.1

This maintenance release includes Sesame 2.4.2, which fixes several important bugs in SPARQL 1.1 Query support.

Also included are some fixes to OWLIM-SE:

  • Unexpected binding returned in a SPARQL query with union within an optional expression
  • FILTER in OPTIONAL patterns returns incorrect results
  • Aggregate SPARQL query fails with IndexOutOfBoundsException
  • Default and named graphs set in a SPARQL query are ignored by the Jena connector

Version 4.0

  • BigOWLIM has been renamed to OWLIM-SE (standard edition). This new brand name better reflects the role of this software component at the heart of the OWLIM family. This component is a fully-featured, standalone semantic repository as well as the engine behind the worker nodes in an OWLIM-Enterprise cluster. Its younger sibling, OWLIM-Lite, is the lighter-weight free-for-use version.
  • Easy to deploy WAR files: The distribution now includes openrdf-sesame and openrdf-workbench Web applications pre-configured with OWLIM and ready to deploy. This makes installing OWLIM as a server and creating/administrating OWLIM repositories trivially simple. The WAR files can be found in the sesame_owlim directory of the distribution ZIP file. See 'easy install' in the installation section.
  • SPARQL 1.1 Query: Ontotext has invested significant development resources in the Sesame project in order to bring SPARQL 1.1 support to Sesame and OWLIM. This release includes SPARQL 1.1 Query, but without federation support for the moment. SPARQL 1.1 Update support will be included in the next release. The new features include:
    • Aggregates
    • Subqueries
    • Negation
    • Expressions in the SELECT clause
    • Property Paths
    • Assignment
    • A short form for CONSTRUCT
    • An expanded set of functions and operators
  • The SPARQL 1.1 specification has not yet become a W3C recommendation and continues to evolve. The following known issues apply to this release of OWLIM and Sesame:
    • fn:concat is not supported. This was added to the working draft in May, just after the Sesame 2.4.0 release was finalised. It will likely be included in the next Sesame/OWLIM release.
    • Federation is not yet supported. This will be implemented in a version of Sesame and OWLIM due later this year.
    • There are some problems with complex expressions in the SELECT clause. This should be fixed in the next release of Sesame/OWLIM.
    • Empty IN() and NOT IN() clauses will cause an exception - will be fixed in the next release.
    • Using the aggregate function SUM() will cause an exception if there are no bindings over which to do the summation - will be fixed in the next release.
  • Wider entity IDs: For very large datasets that contain more than 2^32 unique entities (URIs, blank nodes and literals), OWLIM can be switched into a new mode that uses 40-bit IDs, thus allowing over 1 trillion unique entities.
  • Access to internal entity IDs: For some applications, especially where RDF URIs are used to index external data, e.g. in some legacy system or some other type of storage, a special predicate and function can be used to find the internal ID used to index an entity or to find an entity based on its ID. This allows for more efficient indexing in external systems, where an integer index can be used instead of a URI string.
  • Internal performance analytics: It is now possible to monitor the internal behaviour of OWLIM indices using a JMX interface. Statistics can be accessed that show the cache behaviour (number of hits, misses, reads, writes, etc) and these can be helpful when diagnosing any performance problems when loading datasets or when evaluating certain queries. The statistics can give some indication on how to fine-tune memory allocation between the various caches/indices.
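The entity limits behind the wider-ID mode in the list above follow directly from the ID width. The short calculation below is plain arithmetic; the helper name `max_entities` is only illustrative.

```python
def max_entities(id_bits):
    """Number of distinct identifiers representable in id_bits bits."""
    return 2 ** id_bits


print(max_entities(32))  # 4294967296, about 4.3 billion
print(max_entities(40))  # 1099511627776, about 1.1 trillion
print(max_entities(40) // max_entities(32))  # 256 times as many IDs
```

A 40-bit identifier occupies five bytes instead of four, so the larger ID space comes with somewhat larger index entries.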

Version 3.5

This release includes many bug fixes, several new features and updates:

  • Remote notifications: A new mechanism to complement the existing high-performance 'in-process' notification mechanism. This new mechanism allows clients to subscribe to given statement patterns on remote BigOWLIM repository instances.
  • Schema editing: Read-only schemas loaded at database initialisation time allow very fast deletion of (instance) statements by using the 'fact-retraction' method that computes the necessary inferred statements to delete. A new mechanism is provided with this release that allows 'read-only' schema statements to be modified when necessary.
  • Configuration spreadsheet tool: The memory calculator from previous versions has been updated to estimate appropriate BigOWLIM configurations for the specified hardware, dataset characteristics and selected features.
  • Query optimisations: Several improvements have been made to query optimisation, including the special case when using ORDER BY with LIMIT/OFFSET.
  • Online documentation: As well as the PDF format user guides included in the OWLIM distribution zip files, the latest documentation for all editions of OWLIM is now available online.
  • Storage files updated automatically: There are minor differences in storage file formats between versions. Versions of files back to 3.1 are now detected and updated automatically.
  • owl:sameAs optimisation can be disabled: The owl:sameAs optimisation can now be switched off using the disable-sameAs configuration parameter. This update might be useful when using the empty or rdfs rulesets.
  • Lucene-based full-text search enhancements: Even more fine-grained control over what to include in the indexed RDF molecule. Separate include/exclude lists are now supported for both predicates traversed and entities visited.
  • All OWLIM plug-ins available with Jena interface: All the BigOWLIM advanced features are now fully supported when using BigOWLIM with the Jena framework. This includes RDF Rank, RDF Search, Node Search, RDF Priming and Geo-spatial extensions.

Version 3.4

This release includes many bug fixes, several new features and updates:

  • Jena adapter: Applications which use the Jena framework http://jena.sourceforge.net/ or Jena-compliant RDF stores can seamlessly switch to BigOWLIM to take advantage of efficient loading and high-performance reasoning. At the same time, Jena's ARQ engine allows BigOWLIM to handle the latest SPARQL 1.1 extensions (e.g. aggregates). The adapter is still a beta version and has not been rigorously tested for conformance yet, but can be used with Joseki to make queries and has successfully passed the 100 Million BSBM SPARQL benchmark.
  • Geo-spatial extensions: Applications can efficiently make queries involving constraints such as "nearby point" and "within region". Special-purpose indices allow such constraints to be evaluated very efficiently on top of large volumes of location-related data. For example, finding airports within 50 miles of London in the GeoNames dataset becomes 500 times faster when compared to the same query evaluated without the special indices.
  • Rule engine enhancements: The rule engine now supports the use of context as part of rule premises and consequences. This allows for more efficient processing of certain RDFS/OWL constructions, particularly those rules using RDF lists. All included rule-sets have been upgraded to make use of this new expressiveness. As a result, there is now just a single rule-set for OWL2-RL, where in the last version there was a 'conformant' and a 'reduced' version. The new rule engine has led to an improvement in LUBM loading performance of around 22%.
  • OWL2-QL: This OWL2 profile http://www.w3.org/TR/owl2-profiles/#OWL_2_QL is based on DL-Lite_R, a variant of DL-Lite that does not require the unique name assumption. It is designed to be amenable to implementation on relational databases, due to its suitability for re-writing queries to SQL. This release includes a rule-set for this profile in order to expand the range of standard rule-sets and to give users more flexibility when choosing a balance between complexity of inference and scalability.
  • Enhanced Lucene-based full text search: More flexibility is enabled for using Lucene full text search. Users can create multiple customised indices and can decide to include URIs or literals, select literals by language tags, and use custom analyzers and scorers. Any number of custom indices can be used within the same query.
  • Auto-restore: A configurable policy parameter can be used to specify how the user wishes the repository to start after an abnormal termination. By default, the database restorer tool will be run automatically to return the database to the state prior to the stop event, i.e. to the state after the last committed transaction.
  • Simplified 'implicit-only' statement retrieval: When using the Sesame API to return statements, the 'implicit' pseudo-graph is now used. This is simpler and more consistent with query processing than the old method of invoking RepositoryConnection.getStatements() twice.
  • Documentation: The distribution package includes two new guides: Replication Cluster Quick Start Guide that has details on installing and configuring a cluster and Performance Tuning Guide that brings together all information for optimising loading time, inference and query processing.
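To make the "nearby point" constraint from the geo-spatial item above concrete, the sketch below performs the check by brute force using the haversine great-circle distance. It is the kind of per-point test that a special-purpose index is meant to avoid running over every location; it is not how BigOWLIM evaluates such queries. The coordinates are approximate and given for illustration only, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def nearby(centre, points, max_miles):
    """Names of points within max_miles of centre (linear scan, no index)."""
    lat0, lon0 = centre
    return [name for name, (lat, lon) in points.items()
            if haversine_miles(lat0, lon0, lat, lon) <= max_miles]


LONDON = (51.5074, -0.1278)
AIRPORTS = {  # approximate coordinates
    "Heathrow": (51.4700, -0.4543),
    "Paris CDG": (49.0097, 2.5479),
}
print(nearby(LONDON, AIRPORTS, 50))
```

The linear scan touches every candidate point, which is why the cost grows with the size of the dataset when no spatial index is available.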

Version 3.3

BigOWLIM 3.3 consolidates a number of advanced new features, some of which have been available in previous versions as prototype implementations. The most important differences, compared to previous versions of BigOWLIM are:

  • Replication cluster: brings resilience, failover and horizontally scalable parallel query processing. A master node component is included that can manage a cluster of worker nodes (standard BigOWLIM instances) to synchronise updates, cater for node failure, dynamically add/remove worker nodes and distribute query requests. Such a setup allows for massive concurrent query performance, where the number of queries processed per second scales almost linearly with the number of worker nodes.
  • OWL2-RL: Support for this expressive OWL2 profile http://www.w3.org/TR/owl2-profiles/#OWL_2_RL that is amenable to implementation on rule engines, but without costly data-type reasoning.
  • High performance retraction: BigOWLIM uses the policy of total materialisation of inferred knowledge. This has the advantage that all inferred statements are computed at load time, thus allowing query processing to proceed extremely quickly. While BigOWLIM has always been very fast at processing incremental statement insertions (which are monotonic by necessity), fast statement retractions are now also possible, despite not utilising any truth maintenance mechanism. When statements are deleted, a combination of forward and backward chaining is used to update the inferred closure. This is slower than statement insertion, but still fast enough to support hundreds of updates per hour even in a repository holding billions of statements.
  • Full text search: Two different, but complementary full text search mechanisms are provided, one proprietary and the other based on Lucene. Text search queries can be embedded in SPARQL queries for powerful, hybrid query expressions.
  • Consistency checks: These are used to ensure the consistency of the repository. The checks use a syntax similar to entailment rules, but are used to signal a consistency violation when the necessary conditions are met. Consistency checks are used in some of the standard rule sets.
  • RDF Rank: A technique that identifies the more important or more popular entities in the repository by examining how many connections (predicates) link each node with any other node. The popularity of entities can then be used to order query results, in a similar way to how internet search engines rank pages, e.g. Google's PageRank.
  • RDF Priming: Allows subsets of statements to be selected as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. It allows the 'priming' of large datasets with respect to concepts relevant to the context and to the query.
  • Notification mechanism: A publish/subscribe mechanism for registering and receiving events from a BigOWLIM repository whenever triples matching a certain graph pattern are inserted or removed. This allows clients to react to events in the update stream and avoid polling the repository.
  • Documentation: improved user documentation with new quick start guide.
  • partialRDFS: this flag has been deprecated and its optimisations are now made available through extra rule-set options.
  • Ontology imports: importing ontologies can be achieved using a URL as well as a local pathname.
  • Better JDK integration: custom rule-sets require the Java compiler, but it is no longer necessary to have the tools.jar in the classpath.
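The publish/subscribe idea behind the notification mechanism in the list above can be sketched as follows. This is a simplified, in-memory illustration and not the BigOWLIM API: the class `TripleNotifier` is hypothetical, and a subscription here is a single triple pattern with `None` as a wildcard rather than a full graph pattern.

```python
class TripleNotifier:
    """Toy publish/subscribe hub for triple insertions and removals."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, pattern, callback):
        """Register callback for triples matching (s, p, o); None matches anything."""
        self._subscriptions.append((pattern, callback))

    def _publish(self, event, triple):
        for pattern, callback in self._subscriptions:
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                callback(event, triple)

    def add(self, triple):
        self._publish("added", triple)

    def remove(self, triple):
        self._publish("removed", triple)


events = []
hub = TripleNotifier()
# Subscribe to every statement with the predicate ex:worksFor.
hub.subscribe((None, "ex:worksFor", None), lambda e, t: events.append((e, t)))

hub.add(("ex:alice", "ex:worksFor", "ex:acme"))
hub.add(("ex:alice", "ex:knows", "ex:bob"))      # does not match, ignored
hub.remove(("ex:alice", "ex:worksFor", "ex:acme"))
print(events)
```

Because the subscriber is called only when a matching statement is inserted or removed, the client has no need to poll the repository for changes.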
