BigOWLIM Release notes

Skip to end of metadata
Go to start of metadata

OWLIM Documentation

All versions

OWLIM 5.1 (latest)
OWLIM 3.5 (this version)

New features and significant bug-fixes/updates for the last few versions are recorded here.

Version 3.5

This release includes many bug fixes, several new features and updates:

  • Remote notifications: A new mechanism to complement the existing high-performance 'in-process' notification mechanism. This new mechanism allows clients to subscribe for the given statement patterns to remote BigOWLIM repository instances.
  • Schema editing: Read-only schemas loaded at database initialisation time allow very fast deletion of (instance) statements by using the 'fact-retraction' method that computes the necessary inferred statements to delete. A new mechanism is provided with this release that allows 'read-only' schema statements to be modified when necessary.
  • Configuration spreadsheet tool: The memory calculator from previous versions has been updated to estimate appropriate BigOWLIM configurations for the specified hardware, dataset characteristics and selected features.
  • Query optimisations: Several improvements have been made to query optimisation, including the special case when using ORDER BY with LIMIT/OFFSET.
  • Online documentation: As well as the PDF format user guides included in the OWLIM distribution zip files, the latest documentation for all editions of OWLIM is now available online.
  • Storage files updated automatically: There are minor differences in storage file formats between versions. Versions of files back to 3.1 are now detected and updated automatically.
  • owl:sameAs optimisation can be disabled: The owl:sameAs optimisation can now be switched off using the disable-sameAs configuration parameter. This update might be useful when using the empty or rdfs rulesets.
  • Lucene-base full-text search enhancements: Even more fine-grained control over what to include in the indexed RDF molecule. Separate include/exclude lists are now supported for both predicates traversed and entities visited.
  • All OWLIM plug-ins available with Jena interface: All the BigOWLIM advanced features are now fully supported when using BigOWLIM with the Jena framework. This includes RDF Rank, RDF Search, Node Search, RDF Priming and Geo-spatial extensions.

Version 3.4

This release includes many bug fixes, several new features and updates:

  • Jena adapter: Applications which use the Jena framework or Jena-compliant RDF stores can seamlessly switch to BigOWLIM to take advantage of efficient loading and high-performance reasoning. At the same time, Jena's ARQ engine allows BigOWLIM to handle the latest SPARQL 1.1 extensions (e.g. aggregates). The adapter is still a beta version and has not been rigorously tested for conformance yet, but can be used with Joseki to make queries and has successfully passed the 100 Million BSBM SPARQL benchmark.
  • Geo-spatial extensions: Applications can efficiently make queries involving constraints such as "nearby point" and "within region". Special-purpose indices allow such constraints to be evaluated very efficiently on top of large volumes of location-related data – for example, finding airports within 50 miles of London in the GeoNames dataset becomes 500 faster when compared to the same query evaluated without the special indices.
  • Rule engine enhancements: The rule-engine now supports the ability to use context as part of rule premises and consequences. This allows for more efficient processing of certain RDFS/OWL constructions, particularly those rules using RDF lists. All included rule-sets have been upgraded to make use of this new expressiveness. As a result, there is now just a single rule-set for OWL2-RL, where in the last version there was a 'conformant' and a 'reduced' version. The new rule engine has lead to an improvement in LUBM loading performance of around 22%
  • OWL2-QL: This OWL2 profile is based on DL-LiteR, a variant of DL-Lite that does not require the unique name assumption. It is designed to be amenable to implementation on relational databases, due to its suitability for re-writing queries to SQL. This release includes a rule-set for this profile in order to expand the range of standard rule-sets and to give users more flexibility when choosing a balance between complexity of inference and scalability.
  • Enhanced Lucene-based full text search: More flexibility is enabled for using Lucene full text search. Users can create multiple customised indices and can decide to include URIs or literals, select literals by language tags, and use custom analyzers and scorers. Any number of custom indices can be used within the same query.
  • Auto-restore: A configurable policy parameter can be used to specify how the user wishes the repository to start after an abnormal termination. By default, the database restorer tool will be run automatically to return the database to the state prior to the stop event, i.e. to the state after the last committed transaction.
  • Simplified 'implicit-only' statement retrieval: When using the Sesame API to return statements, the 'implicit' pseudo-graph is now used. This is simpler and more consistent with query processing than the old method of invoking RepositoryConnection.getStatements() twice.
  • Documentation: The distribution package includes two new guides: Replication Cluster Quick Start Guide that has details on installing and configuring a cluster and Performance Tuning Guide that brings together all information for optimising loading time, inference and query processing.

Version 3.3

BigOWLIM 3.3 consolidates a number of advanced new features, some of which have been available in previous versions as prototype implementations. The most important differences, compared to previous versions of BigOWLIM are:

  • Replication cluster: brings resilience, failover and horizontally scalable parallel query processing. A master node component is included that can manage a cluster of worker nodes (standard BigOWLIM instances) to synchronise updates, cater for node failure, dynamically add/remove worker nodes and distributed query requests. Such a setup allows for massive concurrent query performance where query the number of queries processed per second scales almost linearly with the number of worker nodes.
  • OWL2-RL: Support for this expressive OWL2 profile that is amenable for implementation on rule-engines, but without costly data-type reasoning.
  • High performance retraction: BigOWLIM uses the policy of total materialisation of inferred knowledge. This has the advantage that all inferred statements are computed at load time, thus allowing query processing to proceed extremely quickly. While BigOWLIM has always been very fast at processing incremental statement insertions (which are monotonic by necessity), fast statement retractions are now also possible, despite not utilising any truth maintenance mechanism. When statements are deleted, a combination of forward and backward chaining is used to update the inferred closure. This is slower than statement insertion, but still fast enough to support hundreds of updates per hour even in a repository holding billions of statements.
  • Full text search: Two different, but complimentary full text search mechanisms are provided, one proprietary and the other based on Lucene. Text search queries can be embedded in SPARQL queries for powerful, hybrid query expressions.
  • Consistency checks: Are used to ensure the consistency of the repository. The checks use a syntax similar to entailment rules, but are used to signal a consistency violation when the necessary conditions are met. Consistency checks are used in some of the standard rule sets.
  • RDF Rank: Is a technique that identifies the more important or more popular entities in the repository by examining how many connections (predicates) link each node with any other node. The popularity of entities can then be used to order query results in a similar way to internet search engines, such as Google's PageRank.
  • RDF Priming: Allows subsets of statements to be selected as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. It allows the 'priming' of large datasets with respect to concepts relevant to the context and to the query.
  • Notification mechanism: A publish/subscribe mechanism for registering and receiving events from a BigOWLIM repository whenever triples matching a certain graph pattern are inserted or removed. This allows clients to react to events in the update stream and avoid polling the repository.
  • Documentation: improved user documentation with new quick start guide.
  • partialRDFS: this flag has been deprecated and the optimizations made available with extra rule-set options.
  • Ontology imports: importing ontologies can be achieved using a URL as well as a local pathname.
  • Better JDK integration: custom rule-sets require the Java compiler, but it is no longer necessary to have the tools.jar in the classpath.

Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.