OWLIM-SE Release notes

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

New features and significant bug-fixes/updates for the last few versions are recorded here.

Version 4.0

  • BigOWLIM has been renamed to OWLIM-SE (standard edition). This new brand name better reflects the role of this software component at the heart of the OWLIM family. This component is a fully-featured, standalone semantic repository as well as the engine behind the worker nodes in an OWLIM-Enterprise cluster. Its younger sibling, OWLIM-Lite, is the lighter-weight free-for-use version.
  • Easy to deploy WAR files: The distribution now includes openrdf-sesame and openrdf-workbench Web applications pre-configured with OWLIM and ready to deploy. This makes it simple to install OWLIM as a server and to create and administer OWLIM repositories. The WAR files can be found in the sesame_owlim directory of the distribution ZIP file. See 'easy install' in the installation section.
  • SPARQL 1.1 Query: Ontotext has invested significant development resources in the Sesame project in order to bring SPARQL 1.1 support to Sesame and OWLIM. This release includes SPARQL 1.1 Query, but without federation support for the moment. SPARQL 1.1 Update support will be included in the next release. The new features include:
    • Aggregates
    • Subqueries
    • Negation
    • Expressions in the SELECT clause
    • Property Paths
    • Assignment
    • A short form for CONSTRUCT
    • An expanded set of functions and operators
  • The SPARQL 1.1 specification has not yet become a W3C recommendation and continues to evolve. The following known issues apply to this release of OWLIM and Sesame:
    • fn:concat is not supported. This was added to the working draft in May, just after the Sesame 2.4.0 release was finalised. It will likely be included in the next Sesame/OWLIM release.
    • Federation is not yet supported. This will be implemented in a version of Sesame and OWLIM later this year.
    • There are some problems with complex expressions in the SELECT clause. This should be fixed in the next release of Sesame/OWLIM.
    • Empty IN() and NOT IN() clauses will cause an exception. This will be fixed in the next release.
    • Using the aggregate function SUM() will cause an exception if there are no bindings over which to do the summation. This will be fixed in the next release.
  • Wider entity IDs: For very large datasets that contain more than 2^32 unique entities (URIs, blank nodes and literals), OWLIM can be switched into a new mode that uses 40-bit IDs, thus allowing over 1 trillion unique entities.
  • Access to internal entity IDs: Some applications use RDF URIs to index external data, e.g. in a legacy system or some other type of storage. For these, a special predicate and function can be used to find the internal ID used to index an entity, or to find an entity based on its ID. This allows for more efficient indexing in external systems, where an integer index can be used instead of a URI string.
  • Internal performance analytics: It is now possible to monitor the internal behaviour of OWLIM indices using a JMX interface. Statistics can be accessed that show the cache behaviour (number of hits, misses, reads, writes, etc.) and these can be helpful when diagnosing performance problems when loading datasets or evaluating certain queries. The statistics can give some indication of how to fine-tune memory allocation between the various caches/indices.
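
The figures in the 'Wider entity IDs' item above can be checked with a few lines of arithmetic. This is only a sketch of the ID-space sizes; the function name id_space is made up for illustration and says nothing about how OWLIM stores or allocates IDs internally.

```python
def id_space(bits: int) -> int:
    """Number of distinct values that fit in an unsigned ID of the given width."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (32, 40):
        # Thousands separators make the jump from billions to trillions visible.
        print(f"{bits}-bit IDs: {id_space(bits):,} distinct values")
```

A 32-bit ID space holds roughly 4.29 billion values, while a 40-bit space holds roughly 1.1 trillion, which matches the "over 1 trillion unique entities" figure.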

Version 3.5

This release includes many bug fixes, several new features and updates:

  • Remote notifications: A new mechanism to complement the existing high-performance 'in-process' notification mechanism. This new mechanism allows clients to subscribe to given statement patterns on remote BigOWLIM repository instances.
  • Schema editing: Read-only schemas loaded at database initialisation time allow very fast deletion of (instance) statements by using the 'fact-retraction' method that computes the necessary inferred statements to delete. A new mechanism is provided with this release that allows 'read-only' schema statements to be modified when necessary.
  • Configuration spreadsheet tool: The memory calculator from previous versions has been updated to estimate appropriate BigOWLIM configurations for the specified hardware, dataset characteristics and selected features.
  • Query optimisations: Several improvements have been made to query optimisation, including the special case when using ORDER BY with LIMIT/OFFSET.
  • Online documentation: As well as the PDF format user guides included in the OWLIM distribution zip files, the latest documentation for all editions of OWLIM is now available online.
  • Storage files updated automatically: There are minor differences in storage file formats between versions. Versions of files back to 3.1 are now detected and updated automatically.
  • owl:sameAs optimisation can be disabled: The owl:sameAs optimisation can now be switched off using the disable-sameAs configuration parameter. This update might be useful when using the empty or rdfs rulesets.
  • Lucene-based full-text search enhancements: Even more fine-grained control over what to include in the indexed RDF molecule. Separate include/exclude lists are now supported for both predicates traversed and entities visited.
  • All OWLIM plug-ins available with Jena interface: All the BigOWLIM advanced features are now fully supported when using BigOWLIM with the Jena framework. This includes RDF Rank, RDF Search, Node Search, RDF Priming and Geo-spatial extensions.
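
The 'Query optimisations' item above mentions the special case of ORDER BY combined with LIMIT/OFFSET. The release notes do not say how BigOWLIM handles it, so the following is only a generic sketch of why the case is worth optimising: when only the first few rows are needed, a bounded heap selection avoids sorting the entire result set. The function name top_k and the sample rows are hypothetical.

```python
import heapq


def top_k(rows, key, limit, offset=0):
    """Return the rows an ORDER BY key LIMIT limit OFFSET offset would yield.

    Only the offset + limit smallest rows are selected and ordered,
    instead of sorting every row and slicing afterwards.
    """
    smallest = heapq.nsmallest(offset + limit, rows, key=key)
    return smallest[offset:]


if __name__ == "__main__":
    # Hypothetical (name, population) result rows.
    rows = [("d", 40), ("a", 10), ("e", 50), ("b", 20), ("c", 30)]
    print(top_k(rows, key=lambda r: r[1], limit=2, offset=1))
```

The result is the same as a full sort followed by a slice, but the work is bounded by offset + limit rather than by the total number of rows.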

Version 3.4

This release includes many bug fixes, several new features and updates:

  • Jena adapter: Applications which use the Jena framework (http://jena.sourceforge.net/) or Jena-compliant RDF stores can seamlessly switch to BigOWLIM to take advantage of efficient loading and high-performance reasoning. At the same time, Jena's ARQ engine allows BigOWLIM to handle the latest SPARQL 1.1 extensions (e.g. aggregates). The adapter is still a beta version and has not been rigorously tested for conformance yet, but can be used with Joseki to make queries and has successfully passed the 100 Million BSBM SPARQL benchmark.
  • Geo-spatial extensions: Applications can efficiently make queries involving constraints such as "nearby point" and "within region". Special-purpose indices allow such constraints to be evaluated very efficiently on top of large volumes of location-related data. For example, finding airports within 50 miles of London in the GeoNames dataset becomes 500 times faster than the same query evaluated without the special indices.
  • Rule engine enhancements: The rule engine now supports the use of context as part of rule premises and consequences. This allows for more efficient processing of certain RDFS/OWL constructions, particularly those rules using RDF lists. All included rule-sets have been upgraded to make use of this new expressiveness. As a result, there is now just a single rule-set for OWL2-RL, where in the last version there was a 'conformant' and a 'reduced' version. The new rule engine has led to an improvement in LUBM loading performance of around 22%.
  • OWL2-QL: This OWL2 profile (http://www.w3.org/TR/owl2-profiles/#OWL_2_QL) is based on DL-Lite_R, a variant of DL-Lite that does not require the unique name assumption. It is designed to be amenable to implementation on relational databases, due to its suitability for re-writing queries to SQL. This release includes a rule-set for this profile in order to expand the range of standard rule-sets and to give users more flexibility when choosing a balance between complexity of inference and scalability.
  • Enhanced Lucene-based full text search: More flexibility is enabled for using Lucene full text search. Users can create multiple customised indices and can decide to include URIs or literals, select literals by language tags, and use custom analyzers and scorers. Any number of custom indices can be used within the same query.
  • Auto-restore: A configurable policy parameter can be used to specify how the user wishes the repository to start after an abnormal termination. By default, the database restorer tool will be run automatically to return the database to the state prior to the stop event, i.e. to the state after the last committed transaction.
  • Simplified 'implicit-only' statement retrieval: When using the Sesame API to return statements, the 'implicit' pseudo-graph is now used. This is simpler and more consistent with query processing than the old method of invoking RepositoryConnection.getStatements() twice.
  • Documentation: The distribution package includes two new guides: Replication Cluster Quick Start Guide that has details on installing and configuring a cluster and Performance Tuning Guide that brings together all information for optimising loading time, inference and query processing.
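
To make the "nearby point" constraint from the 'Geo-spatial extensions' item above concrete, here is a minimal sketch of the unindexed baseline: a great-circle (haversine) distance is computed for every candidate point and compared against the radius. This is not BigOWLIM's implementation, and the special-purpose indices exist precisely to avoid such a full scan. The function names and the approximate coordinates are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby(points, centre, radius_miles):
    """Names of all points within radius_miles of centre (brute-force scan)."""
    lat0, lon0 = centre
    return [name for name, (lat, lon) in points.items()
            if haversine_miles(lat0, lon0, lat, lon) <= radius_miles]


# Approximate, illustrative coordinates (latitude, longitude).
LONDON = (51.5074, -0.1278)
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Paris CDG": (49.0097, 2.5479),
}

if __name__ == "__main__":
    print(nearby(AIRPORTS, LONDON, 50))
```

Every candidate is visited once, so the cost grows linearly with the number of locations; a spatial index lets most candidates be skipped without computing a distance at all.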

Version 3.3

BigOWLIM 3.3 consolidates a number of advanced new features, some of which have been available in previous versions as prototype implementations. The most important differences, compared to previous versions of BigOWLIM are:

  • Replication cluster: brings resilience, failover and horizontally scalable parallel query processing. A master node component is included that can manage a cluster of worker nodes (standard BigOWLIM instances) to synchronise updates, cater for node failure, dynamically add/remove worker nodes and distribute query requests. Such a setup allows for massive concurrent query performance, where the number of queries processed per second scales almost linearly with the number of worker nodes.
  • OWL2-RL: Support for this expressive OWL2 profile (http://www.w3.org/TR/owl2-profiles/#OWL_2_RL), which is amenable to implementation on rule engines, but without costly data-type reasoning.
  • High performance retraction: BigOWLIM uses the policy of total materialisation of inferred knowledge. This has the advantage that all inferred statements are computed at load time, thus allowing query processing to proceed extremely quickly. While BigOWLIM has always been very fast at processing incremental statement insertions (which are monotonic by necessity), fast statement retractions are now also possible, despite not utilising any truth maintenance mechanism. When statements are deleted, a combination of forward and backward chaining is used to update the inferred closure. This is slower than statement insertion, but still fast enough to support hundreds of updates per hour even in a repository holding billions of statements.
  • Full text search: Two different, but complementary full text search mechanisms are provided, one proprietary and the other based on Lucene. Text search queries can be embedded in SPARQL queries for powerful, hybrid query expressions.
  • Consistency checks: Are used to ensure the consistency of the repository. The checks use a syntax similar to entailment rules, but are used to signal a consistency violation when the necessary conditions are met. Consistency checks are used in some of the standard rule sets.
  • RDF Rank: A technique that identifies the more important or more popular entities in the repository by examining how many connections (predicates) link each node with any other node. The popularity of entities can then be used to order query results, in a similar way to how internet search engines rank pages (for example, Google's PageRank).
  • RDF Priming: Allows subsets of statements to be selected as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. It allows the 'priming' of large datasets with respect to concepts relevant to the context and to the query.
  • Notification mechanism: A publish/subscribe mechanism for registering and receiving events from a BigOWLIM repository whenever triples matching a certain graph pattern are inserted or removed. This allows clients to react to events in the update stream and avoid polling the repository.
  • Documentation: improved user documentation with new quick start guide.
  • partialRDFS: this flag has been deprecated and the optimizations made available with extra rule-set options.
  • Ontology imports: importing ontologies can be achieved using a URL as well as a local pathname.
  • Better JDK integration: custom rule-sets require the Java compiler, but it is no longer necessary to have the tools.jar in the classpath.
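
The 'High performance retraction' item above relies on total materialisation: all inferred statements are computed up front so that queries only need lookups. The sketch below illustrates that idea with two RDFS-style rules (subclass transitivity and type inheritance) applied by forward chaining until nothing new is produced. Deletion is handled here by recomputing the closure from the remaining explicit statements, which is the naive approach and NOT the combined forward/backward chaining that BigOWLIM uses. All names and sample data are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialise(explicit):
    """Forward-chain to a fixpoint and return explicit plus inferred triples.

    Rules applied, for every (c, rdfs:subClassOf, d):
      (x, rdf:type, c)        => (x, rdf:type, d)
      (b, rdfs:subClassOf, c) => (b, rdfs:subClassOf, d)
    """
    closure = set(explicit)
    while True:
        inferred = {
            (s, p, d)
            for (s, p, o) in closure if p in (RDF_TYPE, SUBCLASS)
            for (c, p2, d) in closure if p2 == SUBCLASS and c == o
        }
        if inferred <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= inferred


def retract_naive(explicit, statement):
    """Delete one explicit triple and rebuild the whole closure from scratch."""
    remaining = set(explicit) - {statement}
    return remaining, materialise(remaining)


if __name__ == "__main__":
    explicit = {
        ("ex:Dog", SUBCLASS, "ex:Mammal"),
        ("ex:Mammal", SUBCLASS, "ex:Animal"),
        ("ex:rex", RDF_TYPE, "ex:Dog"),
    }
    closure = materialise(explicit)
    print(("ex:rex", RDF_TYPE, "ex:Animal") in closure)

    _, after = retract_naive(explicit, ("ex:rex", RDF_TYPE, "ex:Dog"))
    print(("ex:rex", RDF_TYPE, "ex:Animal") in after)
```

Insertion is monotonic, so new statements can only add to the closure. Deletion may invalidate inferred statements, and rebuilding everything as above becomes impractical for large repositories, which is why a targeted retraction method matters.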

