New features and significant bug-fixes/updates for the last few releases are recorded here. Full version numbers are given as:
major.minor.build_number
e.g. 5.3.5928, where the major number is 5, the minor number is 3 and the build number is 5928. Releases with the same major and minor version numbers do not contain any new features; the only difference is that releases with later build numbers contain fixes for bugs discovered since the previous release. New or significantly changed features are released with a higher major or minor version number.
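The numbering scheme above can be sketched in a few lines of Python. This is only an illustration: the names Version, parse_version and same_feature_set are hypothetical and are not part of OWLIM.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical holder for a major.minor.build_number version."""
    major: int
    minor: int
    build: int


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.build_number' string such as '5.3.5928'."""
    major, minor, build = (int(part) for part in text.split("."))
    return Version(major, minor, build)


def same_feature_set(a: Version, b: Version) -> bool:
    """Releases sharing major and minor numbers differ only in bug fixes."""
    return (a.major, a.minor) == (b.major, b.minor)


older = parse_version("5.3.5928")
newer = parse_version("5.3.6011")
print(same_feature_set(older, newer))  # True: same features, later fixes
print(newer > older)                   # True: tuples compare field by field
```

Because the fields are compared in order, a higher build number only matters once the major and minor numbers are equal.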
FIX: Storage index no longer grows unexpectedly after executing DROP ALL or RepositoryConnection.clear()
FIX: Incremental retraction may cause invalid inferred statements within contexts to remain after a deletion
FIX: A memory leak may be triggered if the logger level is DEBUG or finer on some of the internal components
Version 5.3 (build 6011)
FIX: Inconsistent data-type indexes may lead to invalid or partial query results
FIX: Rebuilding predicate lists can lead to an infinite loop
Version 5.3 (build 5928)
FIX: Reduced contention for parallel queries on the shared configuration data structures
FIX: Numerous changes to reduce memory consumption and memory leaks
FIX: Apparent corruption of predicate statistics used during query optimisation
FIX: File handle leak when incrementally updating a Lucene index when no changes have occurred
FIX: Removing an instance's membership of a class that is a member of an owl:intersectionOf set did not remove its membership of the intersection itself
FIX: Incorrect computation in complexity estimation that can lead to a suboptimal query plan
Version 5.3 (build 5849)
This maintenance release addresses a number of significant issues:
FIX: Consistency violation could cause calling thread to deadlock
FIX: SPARQL updates should not be influenced by the query-timeout and query-limit-results parameters
FIX: Some long-running queries do not appear in the JMX MBean and/or cannot be terminated
Version 5.3 (build 5777)
Improvement: Small transaction logs are processed in memory, so that a series of small updates are processed more quickly.
Improvement: Better values for statistics are used for query optimisation for certain statement patterns.
FIX: Use of sesame:directType predicate returns extra incorrect matches during query answering.
FIX: Query-timeout no longer applies to backup operations
FIX: Occasional NullPointerException when query evaluation exceeds time-out setting
FIX: Performance degradation after a large number of inserts and deletes due to incorrect predicate statistics that affect query-optimisation.
FIX: Spurious exception trace when re-initialising a Lucene FTS index.
FIX: Certain configurations of Lucene FTS index cannot be serialised.
FIX: A bug prevented owl:sameAs statements from being visible during query answering.
Version 5.3
This is a maintenance release that includes Sesame 2.6.10 and the following significant updates:
New standalone, ready-to-run OWLIM-Workbench, which combines Sesame and OWLIM with a Jetty server, and includes an administration interface built using the Forest framework. This framework was developed internally at Ontotext and is used to build services such as FactForge (http://factforge.net/) and LinkedLifeData. The OWLIM-Workbench provides a superset of the features currently provided by the bundled Sesame Workbench, including:
Security: Easier to configure HTTP authentication, administration of user accounts and repository level access control
Repository management and configuration editing
Data loading functionality
Query execution using SPARQL syntax highlighting and results inspection
Data exploration and export
System information reporting
'Nested repositories' is an experimental OWLIM-SE feature for 'stacking' or 'sharing' repositories, suitable for situations in which a large collection of reference data (perhaps some collection of linked open datasets) is shared by multiple OWLIM instances. Rather than storing the large reference data in each specialised repository, it can just be referenced and will be used for query answering and for computing inferences with local data in the specialised repository.
Monitoring and control functions have been expanded with the ability to terminate a runaway update transaction. SPARQL allows the construction of updates that can cause arbitrarily large numbers of new statements to be inserted. In such a case, it is possible to instruct OWLIM to abort a long-running update and roll back to the previous state. Better query logging is also provided for examining execution plans and on-the-fly optimisations.
A performance degradation when loading very large datasets has been fixed. After a few billion statements load performance started to drop and by around 6 billion statements performance was four times slower than it should be. After the fix the data loading speed at 20 billion statements is around 50% slower than with an empty database.
It is now possible to put a global limit on the number of query results per query. Any queries that generate more results will have the remainder truncated. This feature can be useful for any public-facing SPARQL endpoints.
The Jena adapter layer for OWLIM-SE has been updated to version 2.7.3. This is also reflected in the TopBraid Composer plug-in.
Consistency checks in OWLIM-SE and OWLIM-Enterprise are now strictly enforced. If consistency checking is enabled, data that causes an inconsistency will not be allowed and an update transaction containing an inconsistency will abort and roll back to the previous database state. For example, when using the OWL2-RL ruleset, an attempt to declare an individual as being a member of two disjoint classes will trigger a rollback.
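The all-or-nothing behaviour of a failed consistency check can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not OWLIM's implementation: ToyStore, InconsistencyError and the ex: names are hypothetical, and only class disjointness is checked.

```python
class InconsistencyError(Exception):
    """Raised when a transaction would violate a disjointness constraint."""


class ToyStore:
    """Hypothetical store holding (individual, class) memberships."""

    def __init__(self, disjoint_pairs):
        # Each pair names two classes that may not share a member.
        self.disjoint = {frozenset(pair) for pair in disjoint_pairs}
        self.memberships = set()  # committed (individual, class) pairs

    def commit(self, additions):
        """Apply all additions, or none if the result is inconsistent."""
        candidate = self.memberships | set(additions)
        classes_of = {}
        for individual, cls in candidate:
            classes_of.setdefault(individual, set()).add(cls)
        for individual, classes in classes_of.items():
            for pair in self.disjoint:
                if pair <= classes:
                    # The candidate state is discarded, so the committed
                    # data stays exactly as it was before the transaction.
                    raise InconsistencyError(
                        f"{individual} belongs to disjoint classes {sorted(pair)}"
                    )
        self.memberships = candidate


store = ToyStore([("ex:Cat", "ex:Dog")])
store.commit([("ex:felix", "ex:Cat")])
try:
    # One bad statement aborts the whole transaction, including ex:rex.
    store.commit([("ex:rex", "ex:Dog"), ("ex:felix", "ex:Dog")])
except InconsistencyError as err:
    print("rolled back:", err)
print(sorted(store.memberships))
```

Note that the valid statement about ex:rex is also discarded, because the transaction is rejected as a whole rather than statement by statement.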
The full set of updates for this release includes:
New Feature
OWLIM-444 - Nested/virtual repositories
OWLIM-527 - Allow for forced termination of an update transaction
Improvement
OWLIM-887 - MD5 snapshot is not needed for OWLIM-SE
OWLIM-888 - Improve logging of query execution plan and JMX query monitoring
OWLIM-902 - Globally limit the number of results for queries
OWLIM-908 - Improve query logging.
Bug
OWLIM-559 - Builtin ruleset works differently if used as precompiled "owl-max(-optimized)" and through distribution Builtin_Rules.pie
OWLIM-820 - LUBM fails with external ruleset
OWLIM-822 - DELETE query with a wildcard predicate takes excessive time
OWLIM-856 - Sesame server stops responding after period of use
OWLIM-857 - OwlimSchemaRepository and SailImpl do not implement NotifyingSail.
OWLIM-858 - DESCRIBE query causes SailConnectionImpl.evaluate() to throw RuntimeException
OWLIM-860 - Query-timeout causes out of memory error
OWLIM-862 - Lucene lock problem when using full-text search incremental update
OWLIM-863 - ASK query matches statements in default graph when includeInferred=false
OWLIM-864 - Memory leak with simple ASK query
OWLIM-865 - Predicate list index causes some query results to be lost
OWLIM-880 - Performance degradation after a large number of inserts and deletes
OWLIM-883 - Rebuilding context index failed
OWLIM-904 - Plug-in 'preprocess()' called twice within single request session
OWLIM-911 - Accessing internal identifiers returns same value for all
OWLIM-913 - Remove number of pages in POS/PSO from worker signature
Task
OWLIM-850 - Upgrade OWLIM-SE to Jena v2.7.3
OWLIM-866 - Add sparql update functionality to getting-started
OWLIM-876 - Verify optional indices are up-to-date and rebuild if necessary
OWLIM-877 - Log full version number at start-up
OWLIM-899 - Update TopBraid Composer plug-in (bundle and connector)
OWLIM-900 - Allow plug-ins to force the rollback of a transaction
OWLIM-905 - Force a rollback when a consistency check fails
Version 5.2 (build 5563)
Fix to prevent the query optimiser choosing a sub-optimal query plan after a long sequence of insert and delete modifications. Fragmentation of storage pages was causing errors in the complexity computations.
Fix to prevent concurrent modification exceptions when namespaces are being updated.
Version 5.2 (build 5512)
Fix to prevent a memory leak due to connection references kept by the PluginManager. This also can cause a performance degradation over time.
Fix to dataset management that was causing explicit triples from the default (nameless) graph to be included as input to query execution when the query uses FROM or FROM NAMED and the includeInferred parameter is set to false.
Version 5.2 (build 5497)
Fix to prevent org.apache.lucene.store.LockObtainFailedException when incrementally updating a Lucene index. The index's configuration was being incorrectly serialised causing unpredictable behaviour.
Fix for missing query results when the optional predicate-lists index is switched on. With this index enabled, statements with certain predicates were being ignored.
Version 5.2 (build 5479)
Fix for an out of memory error that can be caused when using the query-timeout parameter.
Version 5.2 (build 5331)
Fix the known problem that prevents custom rule files being compiled when using Java 1.7
Fix to avoid the stack overflow problem when optimising certain SPARQL queries that use the MINUS operator.
Known problems
When programmatically copying statement objects between two separate OWLIM instances, a problem with the caching of resource values can cause "java.lang.RuntimeException: Cannot retrieve URI with ID of...". If this problem occurs, statement objects retrieved from one OWLIM instance should be processed using this code fragment before being inserted into the second OWLIM instance.
Version 5.2 (build 5316)
This is a maintenance release that includes Sesame 2.6.8 (change log 2.6.7, change log 2.6.8). Note that 2.6.7 is NOT backward compatible with 2.6.6 due to a couple of minor changes to interfaces. The following significant updates have been made:
Support for the N-Quads RDF format
Changes to the Plug-in SDK
Add transaction begin/end information to Statements.Listener interface
Allow for pre-processor plug-ins to modify the query inside their request
StatementIterator has new methods for testing read-only, explicit and implicit status
Improvements to the getting-started application to allow it to load very large RDF files without the need to break them into smaller pieces
Improved locking (less contention) in the entity pool and when using RDF Rank
Cache/index statistics are now always collected over JMX (the attribute to switch this on/off has been removed)
Known problems:
Custom rule-sets will not compile when using Java 1.7 - this will be fixed in the near future with an interim update
The full set of updates for this release includes:
Bug
OWLIM-359 - Support different file formats in "imports" parameter
OWLIM-495 - Blank node contexts ignored by getStatements()
OWLIM-696 - Context parameter ignored when reading statements using HTTP protocol
OWLIM-767 - Improve thread synchronisation for RDF rank plug-in
OWLIM-782 - INSERT update hangs and consumes huge amount of memory
OWLIM-784 - Poor query performance due to query optimisation problems when using the Sesame's QueryJoinOptimizer.
OWLIM-804 - Owlim SE runs a query with an optional and a property path 10 times slower than Owlim Lite.
OWLIM-807 - Using Predicate Lists has an adverse effect on query performance - incorrect complexity estimate
OWLIM-813 - Slow deletion of statements using DELETE WHERE
OWLIM-815 - Lucene FTS functional tests failing when executed against remote repository
OWLIM-825 - Query timeout does not apply on certain queries
Improvement
OWLIM-590 - Improve efficiency of RDFRank recomputation
OWLIM-706 - Collect and export statistics over JMX unconditionally
New Feature
OWLIM-697 - Add support for the NQuads RDF format
OWLIM-582 - Allow for Preprocessor plug-ins to modify the query inside their request
OWLIM-774 - Create a read-only system named graph that will be used in the UniversalConverter to separate the schema statements from the other explicit statements.
Task
OWLIM-496 - Axiomatic statements should behave as inferred statements during query answering
OWLIM-788 - Add transaction information to plug-in SDK callback interface
OWLIM-796 - Add more methods to plug-in SDK StatementIterator to test for explicit and implicit attributes
OWLIM-816 - Share BufferPool instances that manage ByteBuffers of the same size
OWLIM-830 - Improve loading behaviour of getting started to handle huge RDF files
Version 5.1 (build 5208)
Fix the known problem in build 5183. An incompatibility between OWLIM and Sesame query optimisation (QueryJoinOptimizer) causes poor performance in certain circumstances. The use of QueryJoinOptimizer has been restricted to sub-select optimisation only.
Version 5.1 (build 5183)
This is a maintenance release that includes Sesame 2.6.6 and many fixes from interim releases made since 5.0. Repositories created with version 5.0 are binary compatible with 5.1, i.e. the OWLIM software can be updated and used with existing storage files created with version 5.0.
The following improvements have been made:
Axiomatic statements now treated as inferred statements during query answering
License file can now be set using an environment variable: OWLIM_LICENSE_FILE
Username and password parameters added to GettingStarted when using remote repositories with HTTP authentication
Fixed an integer overflow bug in the compression module that resulted in an exception whenever the overlay file grew beyond 2G.
Fixed a bug that can cause a large increase in query time when using plug-ins. It occurs when a plug-in triple pattern is used in combination with an ordinary one that has a very large collection size and is placed before the plug-in triple pattern.
Fixed a bug that can cause incorrect query optimisation and/or a NullPointerException at query time. This problem can occur when estimating the number of matching triples for patterns containing a predicate for which there are no asserted statements in the repository.
Workaround to avoid an unnecessary BottomUpJoinIteration (sub-select intersection) when a sub-select is joined with an ordinary statement pattern or a join of such patterns.
System statements are filtered out from getContextIDs()
Resolved memory leaks when updates are mixed with queries involving unbound predicate variables, which caused all unused indexes to be kept locked.
External bindings are now applied before query optimisation. This speeds up such queries by avoiding filters in 'AfterOptionals'
Collected Namespaces were not properly persisted on Windows
Added transactional handling of changes in the properties file: fingerprints, namespaces, geometry, etc.
Rolled back transactions do not close transaction log files in a timely manner, leading to "too many open files" error when many rollbacks occur in sequence. Clean-up code has been relocated to ensure that it is called immediately.
Equivalence class updates do not close temporary files in a timely manner, leading to "too many open files" error when many transactions containing owl:sameAs statements are committed in sequence. Temporary files now removed immediately after the transaction completes.
Rebuild of predicate lists fails on Windows - the old files were locked and not deleted.
Repository lockfile is not released after failed initialisation. The new behaviour is to remove the lockfile if initialisation has failed and the lock file did not exist at the start of initialisation.
Schema updates should not allow removal of inferred statements.
Suboptimal query plan when using geospatial index
System contexts (from ternary relations in rule files) are visible to the Sesame workbench when browsing contexts
Namespaces lost when instance terminated
Known problems
The improvements in both Sesame and OWLIM for better optimisation of sub-queries have unfortunately caused a regression in query performance when using property paths with optional elements. This issue is being urgently addressed and a fix will be available very soon via an interim 5.1 release (later build number).
If you experience this problem, it might be possible to re-write such queries using a UNION, e.g.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.com#>
SELECT * WHERE {
{ ex:PersonA foaf:knows/foaf:name ?name }
UNION
{ ex:PersonA foaf:knows/foaf:knows/foaf:name ?name }
# with more UNIONs for longer paths as necessary
}
Version 5.0
This version of OWLIM-SE is not backwardly compatible with any previous version. This means that images created with OWLIM 4.3 and before will not work correctly with OWLIM 5.0 and must be re-created. There have been a great many modifications to the storage files, indexing structures, etc, and upgrade mechanisms have proven too complex and probably slower than re-loading the database anyway. Please do not attempt to upgrade to OWLIM 5.0 unless you drop and recreate all databases. A migration tool is provided in the root directory of the distribution zip file for copying databases created with older versions of OWLIM. It is called convert.sh/.cmd depending on the operating system. Executing this script without parameters will cause it to output information on how to use it.
Transaction management and isolation mechanisms have been completely refactored. The previous strategy used very lazy writing of modified database pages, such that dirty pages were only flushed to disk when further updates occur and no more memory is available. While extremely fast, the problem with this approach is that there is a considerable recovery time associated with replaying the transaction log after an abnormal termination. The new mechanism uses two modes: 'bulk-loading' (fast) with similar behaviour to previous versions and 'normal' (safe) where database modifications are flushed to disk as part of the commit operation. When running in safe mode, database recovery is instant and there is a significant improvement in concurrency between updates and queries. Some related changes are:
There is a new parameter to control the transaction 'mode' called transaction-mode - see the configuration section
The database-recovery-policy configuration parameter is no longer required and has been removed
The special flush predicate http://www.ontotext.com/flush used to force pages to be written to disk is no longer required and has been removed. Statements using this predicate will be treated like any other statement.
The special reinfer predicate http://www.ontotext.com/owlim/system#reinfer used to force a re-computation of all inferences has been removed. Statements using this predicate will be treated like any other statement.
In fast transaction mode, the isolation constraint can be relaxed in order to improve concurrency behaviour when strict read isolation is not a requirement - this is controlled by a new parameter transaction-isolation that only has an effect in fast mode, see the configuration section
No recovery mechanisms are in place when running in fast mode - therefore administrators must treat an abnormal termination during bulk-loading as a fatal event and must restart the loading procedure
New context indices can be used to improve query performance when data is modelled using many named graphs. These are switched on and off using a single configuration parameter enable-context-index - see the configuration section
The SPARQL 1.1 Graph Store HTTP Protocol is now supported according to the W3C Working Draft from the 12th May 2011. This provides a REST interface for managing collections of graphs, using either directly or indirectly named graphs.
Sesame 2.6.5 with many bug-fixes and updates to bring SPARQL 1.1 Query support up to the latest W3C Working Draft from the 5th January 2012.
Significant reduction in disk-space requirements is achieved with the following modifications:
Index compression can now be used to reduce disk storage requirements by using zip compression on database pages. This feature is off by default, but can be switched on when creating a new repository. The configuration parameter index-compression-ratio can be set to -1 (the default value indicating no compression) or a value in the range [10-50] indicating the desired percentage reduction in page sizes. Any pages that cannot be compressed by the specified amount are stored uncompressed. Therefore a compression ratio that is too aggressive will not bring many benefits. Experiments have shown that for large datasets a value of about 30% is close to optimal.
Restructuring of the triple indices has also led to a reduction in disk-space requirements of around 18% independent of the compression functionality
Entity compression is a modification that reduces the storage requirements for the lookup table that maps between internal identifiers and resources. This is transparent to the user and happens automatically. More disk space reductions are apparent using this version.
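The threshold rule behind index-compression-ratio (a page is kept compressed only if it shrinks by at least the requested percentage) can be sketched as follows. This is an illustration under assumptions, not OWLIM's storage code: store_page and PAGE_SIZE are hypothetical, and Python's zlib stands in for the zip compression mentioned above.

```python
import os
import zlib
from typing import Tuple

PAGE_SIZE = 4096  # hypothetical page size, not taken from OWLIM


def store_page(page: bytes, ratio_percent: int) -> Tuple[bool, bytes]:
    """Return (was_compressed, stored_bytes) for one page.

    ratio_percent mirrors index-compression-ratio: -1 disables compression,
    otherwise the page must shrink by at least that percentage.
    """
    if ratio_percent == -1:
        return False, page
    if not 10 <= ratio_percent <= 50:
        raise ValueError("ratio must be -1 or in the range 10 to 50")
    packed = zlib.compress(page)
    limit = len(page) * (100 - ratio_percent) / 100
    if len(packed) <= limit:
        return True, packed
    # The target reduction was not reached: keep the page uncompressed.
    return False, page


repetitive = b"<http://example.com/resource> " * 100
random_like = os.urandom(PAGE_SIZE)

print(store_page(repetitive, 30)[0])   # True: highly repetitive data shrinks well
print(store_page(random_like, 30)[0])  # False: random bytes do not compress
```

The sketch also shows why an overly aggressive ratio brings little benefit: the higher the target, the more pages miss it and are stored at full size.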
A new literal index is created automatically for numeric and date/time data-types. The index is used during query evaluation only if a query or a subquery (e.g. union) has a filter that is comprised of a conjunction of literal constraints, e.g. FILTER(?x >= 3 && ?y <= 5 && ?start > "2001-01-01"^^xsd:date). Other patterns, including those that use negation, will not use the index for this version of OWLIM.
All control queries now use SPARQL Update syntax (used mostly to control the Lucene-based full-text search, RDF Rank and geo-spatial plug-ins). This has a number of advantages, namely:
No special control query pseudo-graph is required by the Replication Cluster master in order to identify control queries that must be pushed to all worker nodes
SPARQL Updates use the corresponding SPARQL update protocol, so they can be automatically processed by load-balancers that examine URL patterns
It is more consistent with the SPARQL language, since these 'control queries' cause a change of state in OWLIM
Incremental updating of the Lucene-based full-text search index is now possible, either for specific resources or for all un-indexed resources. Using this technique can avoid the more expensive approach of rebuilding the whole index frequently.
Incremental RDF Rank allows the RDF rank for specific resources to be (re-)computed as directed by the user. This technique can avoid the more expensive approach of rebuilding all RDF Rank values frequently.
As well as the cache/index statistics, performance analysis data is now provided about currently executing queries including: how many results have been returned so far, how long it has been executing, average time to return each result, etc. This facility is provided by a JMX management bean and also allows an administrator to force an early termination of long-running or resource-heavy queries.
The Geo-spatial index has been updated to support 40-bit resource identifiers.
The getting started application has been restructured so that it now works with remote repositories.
OWLIM also includes the following maintenance updates and fixes:
Bugs
OWLIM-710 Wrong collection size for triple pattern when predicate is variable
OWLIM-699 PredicateList causes duplicate rdf:type statements to be shown
OWLIM-679 Nested OPTIONALs might result in incorrect bindings
OWLIM-674 Missing statement after more than 24 hours of randomly killing test
OWLIM-672 Lexvo dataset broke OWLIM
OWLIM-669 Sesame workbench no longer provides OWLIM options when creating a new repository
OWLIM-652 Race condition in HashEntityPool when reading stored entity
OWLIM-646 No statistics for page usage from collection classes
OWLIM-641 Can not restore geo-spatial index
OWLIM-633 Statement not persisted after shutdown.
OWLIM-631 In read-only mode calling clear() causes a hang
OWLIM-624 Entering the license file in Sesame Workbench requires backslashes to be escaped (twice)
OWLIM-614 Query timeout omits queries with some features of SPARQL 1.1
OWLIM-610 Functional test 'TestRestorePredicates' does not use a separate repository path
OWLIM-601 Inferred statements are different depending on the order of loading explicit statements
OWLIM-591 Query problem with variable binding
OWLIM-588 Missing backup file causes infinite Database restore process
OWLIM-587 Inconsistent bnode values in INSERT WHERE update
OWLIM-574 Some configuration parameters can not be set using RDF description (Turtle config file)
OWLIM-572 Conformance tests hang with CPU at 100%
OWLIM-571 LUBM 100 hangs with CPU at 100%
OWLIM-544 After initialisation, AVLRepository's numberOfStatements and numberOfExplicitStatements are set to 0
OWLIM-541 Custom Analyzer not used by the QueryParser
OWLIM-532 OwlimSchemaRepository doesn't set the entity pool in read-only mode
OWLIM-529 Performance regression for a query between 4.2 and 4.3
OWLIM-526 Lucene plug-in throws NPE during Lucene query in case the query cannot be parsed or the index is corrupted
OWLIM-517 External bindings are not properly handled for Lucene (and possibly other) triple patterns
OWLIM-512 The "query-timeout" parameter is interpreted in milliseconds and it should be interpreted as seconds
OWLIM-510 Basic operations are very slow with more complex rule-sets
OWLIM-509 NullPointerException when programmatically setting the SPARQL dataset
OWLIM-501 Lucene / OPTIONAL query bug
OWLIM-494 Lockfile is not removed after failed initialisation
OWLIM-468 owl:equivalentProperty semantics do not hold with OWL-Horst
OWLIM-558 Missing entities in Date Literals index
New features and improvements
OWLIM-647 Have the entity pool hashmap use the top 32 bits of the entity hash for indexing
OWLIM-637 Experiment with multithreaded flush in page cache
OWLIM-602 Use a prebuild, fixed size, Trees on pages that needs to be modified
OWLIM-540 Upgrade lucene version of lucene plugin
OWLIM-535 Don't empty caches on flush
OWLIM-484 Add xsd:float datatype to RDF Rank results
OWLIM-453 Investigate cause of slowdown in new HashEntityPool
OWLIM-431 Allow read-only property to be set by repository id
OWLIM-418 ShutDown should not wait for readers to complete
OWLIM-636 Improve software license validation
OWLIM-583 Extend plug-in API's PatternInterpreter with a light-weight query method that tells the SDK if a pattern can be interpreted
OWLIM-556 Drop the "reinfer" functionality
OWLIM-302 Improve performance when using ORDER and LIMIT
Other
OWLIM-634 Add support for LZ compression
OWLIM-608 Experiment with various SPARQL queries against loaded data.
OWLIM-607 Prepare and load just the useful parts of dbpedia.
OWLIM-606 Explore dbpedia data for relations that can be well-optimised by the literals index
OWLIM-41 JUnit tests for checking dataset and graph scope in the new inner query model
OWLIM-676 Check the behaviour of DROP DEFAULT
OWLIM-630 Make "read.only.mode" a proper configuration parameter
OWLIM-629 Make Getting Started more robust
OWLIM-628 Add check for failure in FAST mode the previous instance
OWLIM-627 Remove the database restorer and backup file
OWLIM-626 Check that optional indices are created in a 'transactional' way
OWLIM-617 Remove RDF(S) axiomatic triples for container membership properties
OWLIM-611 Re-use OWLIMSailSchema constants in OWLIMSchemaRepository
OWLIM-575 Remove unused configuration parameters from OWLIM-SE and OWLIM-Lite
OWLIM-573 Add more configuration parameters to OWLIM-Lite and OWLIM-SE Sesame Workbench templates
OWLIM-565 Make sure the Entity Pool is transaction-safe
OWLIM-534 Geo-spatial index must use entity IDs wider than 32-bit
OWLIM-522 Refactor LUBM test drivers
OWLIM-422 Reformulate control queries to use SPARQL 1.1 Update syntax
OWLIM-405 Datatype index for improved performance of data range queries
OWLIM-403 Implement a better transaction isolation mechanism (allow 1 update in parallel with multiple reads)
Known problems
The behaviour of the 'include inferred' checkbox in the Sesame Workbench is unpredictable when using OWLIM repositories.
Version 4.3
Further contributions to the Sesame framework from Ontotext and Fluid Operations mean that Sesame version 2.6 is included with this version of OWLIM. The following new features are available:
SPARQL 1.1 Federation support that allows queries to pull together data from any number of distributed SPARQL endpoints
A new SPARQL repository type to wrap SPARQL endpoints
Improvements to the parser for controlling the level of literal/data-type validation and the handling of errors
Many other fixes for compliance with the latest revised SPARQL 1.1 working drafts
OWLIM now has a plug-in API that allows users to build software components that alter the behaviour of OWLIM. This mechanism can be used to add new features or to improve performance in certain scenarios.
OWLIM also includes the following maintenance updates and fixes:
OWLIM-205 - Validate literal languages and do not allow invalid language tags to enter the repository
OWLIM-273 - Potential thread leak in QueryModelConverter
OWLIM-390 - Counting statements using Sesame API gives strange results.
OWLIM-419 - Make RepositoryConnection.exportStatements obey the time limit
OWLIM-426 - Unable to permanently remove predefined namespace definitions
OWLIM-428 - Explicit axioms don't show up as explicit if they have been inferred before by other axioms
OWLIM-463 - Clear transaction log in replication cluster if it cannot be initialized
OWLIM-466 - SesameConnectionImpl.getStatements must return quads, not trips (breaks workbench explore)
OWLIM-470 - Query with Union and optional returns wrong results
OWLIM-471 - Can not access new repository when FTS switched on (divide by zero or lockfile locked)
OWLIM-473 - onto:explicit pseudo-graph does not prevent implicit statements as input for query answering
OWLIM-475 - Repackaged console.sh in openrdf-console.zip has lost its execute attribute
OWLIM-476 - Neither of the slf4j jars (api or jdk14) are needed in the war files
OWLIM-483 - Lost solutions to queries with FROM <...> clause
OWLIM-485 - Repository with many transactions fails to get restored
OWLIM-488 - Incorrect behaviour of FROM and FROM NAMED in SPARQL queries
OWLIM-489 - Predicate list indices do not log statistics
OWLIM-490 - User-supplied Dataset object on query not properly handled
OWLIM-491 - Query rewriting in MainQuery.convertToOptimizedForm() converts OR to AND in filters when converting the condition to disjunctive normal form
OWLIM-495 - Blank node contexts ignored by getStatements()
OWLIM-501 - Lucene and OPTIONAL query bug
OWLIM-502 - The database restorer deletes the pso and pos files after second unsuccessful restore
OWLIM-457 - Validate data-type values at load time
OWLIM-497 - Update getting-started and add timestamps
OWLIM-356 - Optimized rule set is not compatible with the rule compiler.
OWLIM-480 - Make use of the com.ontotext.trree.collections for the predicate map in order to reuse the file header and the common interface
Version 4.2
Ontotext have continued to invest in the Sesame project and are pleased to announce the inclusion of Sesame version 2.5 with this version of OWLIM. The benefits include:
SPARQL 1.1 Query conformance has been updated to the May 2011 working draft, i.e. all the remaining behaviour has been implemented along with all the new SPARQL filter functions.
A new binary RDF serialization format. This format has been derived from the existing binary tuple results format. Its main features are reduced parsing overhead and minimal memory requirements.
As well as integration with the new Sesame APIs and modifications for optimising SPARQL Update, there have also been a number of bug fixes in this version of OWLIM-SE:
OWLIM-396 - A RuntimeException is thrown in clearNamespaces() in SailConnection
OWLIM-404 - HashEntityPool fails to store/read its entity index table if its size is more than ~500M
OWLIM-408 - Getting of default namespace doesn't work
OWLIM-440 - Can not create geo-spatial index when using OWLIM-SE with Tomcat
OWLIM-443 - Repository fails to start - entity pool error
OWLIM-445 - disable-sameAs causing query evaluation to lose bindings
OWLIM-446 - Query.setIncludeInferred() is ignored
OWLIM-447 - License file can not be specified - default evaluation license is always used.
OWLIM-449 - Wrong conversion from int to long in com.ontotext.trree.plugin.lucene.LuceneIterator
OWLIM-452 - Multiple wrong results are returned for a CONSTRUCT query
OWLIM-454 - EntityStorageVersion3 fails to restore if a long entity has negative size.
OWLIM-455 - Cannot put any more statements in AVL tree after ~3.1B statements added during 3.5-to-4.0 conversion
OWLIM-305 - Rationalise OWLIM vocabulary
Version 4.1
This maintenance release includes Sesame 2.4.2, which fixes several important bugs in SPARQL 1.1 Query support:
Unexpected binding returned in a SPARQL query with UNION within an OPTIONAL expression
FILTER in OPTIONAL patterns returns incorrect results
Aggregate SPARQL query fails with IndexOutOfBoundsException
Default and named graphs set in a SPARQL query are ignored by the Jena connector
Version 4.0
BigOWLIM has been renamed to OWLIM-SE (standard edition). This new brand name better reflects the role of this software component at the heart of the OWLIM family. This component is a fully-featured, standalone semantic repository as well as the engine behind the worker nodes in an OWLIM-Enterprise cluster. Its younger sibling, OWLIM-Lite, is the lighter-weight free-for-use version.
Easy to deploy WAR files: The distribution now includes openrdf-sesame and openrdf-workbench Web applications pre-configured with OWLIM and ready to deploy. This makes installing OWLIM as a server and creating/administrating OWLIM repositories trivially simple. The WAR files can be found in the sesame_owlim directory of the distribution ZIP file. See 'easy install' in the installation section.
SPARQL 1.1 Query: Ontotext has invested significant development resources in the Sesame project in order to bring SPARQL 1.1 support to Sesame and OWLIM. This release includes SPARQL 1.1 Query, but without federation support for the moment. SPARQL 1.1 Update support will be included in the next release. The new features include:
Aggregates
Subqueries
Negation
Expressions in the SELECT clause
Property Paths
Assignment
A short form for CONSTRUCT
An expanded set of functions and operators
The SPARQL 1.1 specification has not yet become a W3C recommendation and continues to evolve. The following known issues apply to this release of OWLIM and Sesame:
fn:concat is not supported. This was added to the working draft in May, just after the Sesame 2.4.0 release was finalised. It will likely be included in the next Sesame/OWLIM release.
Federation is not yet supported. This will be implemented in a version of Sesame and OWLIM due later this year.
There are some problems with complex expressions in the SELECT clause. This should be fixed in the next release of Sesame/OWLIM.
Empty IN() and NOT IN() clauses will cause an exception - will be fixed in the next release.
Using the aggregate function SUM() will cause an exception if there are no bindings over which to do the summation - will be fixed in the next release.
Wider entity IDs: For very large datasets that contain more than 2^32 unique entities (URIs, blank nodes and literals), OWLIM can be switched into a new mode that uses 40-bit IDs, thus allowing over 1 trillion unique entities.
Access to internal entity IDs: For some applications, especially where RDF URIs are used to index external data (e.g. in a legacy system or some other type of storage), a special predicate and function can be used to find the internal ID used to index an entity or to find an entity based on its ID. This allows for more efficient indexing in external systems, where an integer index can be used instead of a URI string.
Internal performance analytics: It is now possible to monitor the internal behaviour of OWLIM indices using a JMX interface. Statistics can be accessed that show the cache behaviour (number of hits, misses, reads, writes, etc) and these can be helpful when diagnosing any performance problems when loading datasets or when evaluating certain queries. The statistics can give some indication on how to fine-tune memory allocation between the various caches/indices.
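As a quick check on the figures quoted for wider entity IDs in this release, the following sketch computes how many distinct IDs fit in 32 and 40 bits. It is plain arithmetic only: the function name is illustrative and nothing here reflects how OWLIM actually allocates IDs internally.

```python
def id_capacity(bits: int) -> int:
    """Number of distinct values representable with the given number of bits."""
    return 2 ** bits


capacity_32 = id_capacity(32)  # 4,294,967,296 (about 4.3 billion)
capacity_40 = id_capacity(40)  # 1,099,511,627,776 (just over 1 trillion)

print(f"32-bit IDs: {capacity_32:,}")
print(f"40-bit IDs: {capacity_40:,}")
print(f"Growth factor: {capacity_40 // capacity_32}x")
```

The eight extra bits multiply the ID space by 256, which is where the "over 1 trillion" figure comes from.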
Version 3.5
This release includes many bug fixes, several new features and updates:
Remote notifications: A new mechanism to complement the existing high-performance 'in-process' notification mechanism. This new mechanism allows clients to subscribe to notifications for given statement patterns on remote BigOWLIM repository instances.
Schema editing: Read-only schemas loaded at database initialisation time allow very fast deletion of (instance) statements by using the 'fact-retraction' method that computes the necessary inferred statements to delete. A new mechanism is provided with this release that allows 'read-only' schema statements to be modified when necessary.
Configuration spreadsheet tool: The memory calculator from previous versions has been updated to estimate appropriate BigOWLIM configurations for the specified hardware, dataset characteristics and selected features.
Query optimisations: Several improvements have been made to query optimisation, including the special case when using ORDER BY with LIMIT/OFFSET.
Storage files updated automatically: There are minor differences in storage file formats between versions. Versions of files back to 3.1 are now detected and updated automatically.
owl:sameAs optimisation can be disabled: The owl:sameAs optimisation can now be switched off using the disable-sameAs configuration parameter. This update might be useful when using the empty or rdfs rulesets.
Lucene-based full-text search enhancements: Even more fine-grained control over what to include in the indexed RDF molecule. Separate include/exclude lists are now supported for both predicates traversed and entities visited.
All OWLIM plug-ins available with Jena interface: All the BigOWLIM advanced features are now fully supported when using BigOWLIM with the Jena framework. This includes RDF Rank, RDF Search, Node Search, RDF Priming and Geo-spatial extensions.
Version 3.4
This release includes many bug fixes, several new features and updates:
Jena adapter: Applications which use the Jena framework http://jena.sourceforge.net/ or Jena-compliant RDF stores can seamlessly switch to BigOWLIM to take advantage of efficient loading and high-performance reasoning. At the same time, Jena's ARQ engine allows BigOWLIM to handle the latest SPARQL 1.1 extensions (e.g. aggregates). The adapter is still a beta version and has not been rigorously tested for conformance yet, but can be used with Joseki to make queries and has successfully passed the 100 Million BSBM SPARQL benchmark.
Geo-spatial extensions: Applications can efficiently make queries involving constraints such as "nearby point" and "within region". Special-purpose indices allow such constraints to be evaluated very efficiently on top of large volumes of location-related data. For example, finding airports within 50 miles of London in the GeoNames dataset becomes 500 times faster when compared to the same query evaluated without the special indices.
Rule engine enhancements: The rule-engine now supports the ability to use context as part of rule premises and consequences. This allows for more efficient processing of certain RDFS/OWL constructions, particularly those rules using RDF lists. All included rule-sets have been upgraded to make use of this new expressiveness. As a result, there is now just a single rule-set for OWL2-RL, where in the last version there was a 'conformant' and a 'reduced' version. The new rule engine has led to an improvement in LUBM loading performance of around 22%.
OWL2-QL: This OWL2 profile http://www.w3.org/TR/owl2-profiles/#OWL_2_QL is based on DL-LiteR, a variant of DL-Lite that does not require the unique name assumption. It is designed to be amenable to implementation on relational databases, due to its suitability for re-writing queries to SQL. This release includes a rule-set for this profile in order to expand the range of standard rule-sets and to give users more flexibility when choosing a balance between complexity of inference and scalability.
Enhanced Lucene-based full text search: More flexibility is enabled for using Lucene full text search. Users can create multiple customised indices and can decide to include URIs or literals, select literals by language tags, and use custom analyzers and scorers. Any number of custom indices can be used within the same query.
Auto-restore: A configurable policy parameter can be used to specify how the user wishes the repository to start after an abnormal termination. By default, the database restorer tool will be run automatically to return the database to the state prior to the stop event, i.e. to the state after the last committed transaction.
Simplified 'implicit-only' statement retrieval: When using the Sesame API to return statements, the 'implicit' pseudo-graph is now used. This is simpler and more consistent with query processing than the old method of invoking RepositoryConnection.getStatements() twice.
Documentation: The distribution package includes two new guides: Replication Cluster Quick Start Guide that has details on installing and configuring a cluster and Performance Tuning Guide that brings together all information for optimising loading time, inference and query processing.
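To make the "nearby point" constraint from the geo-spatial extensions concrete, here is a minimal sketch of the brute-force evaluation that a special-purpose index is meant to avoid: every candidate is tested against the radius in turn. This is not OWLIM code. The function names, the haversine formula as the distance measure, and the approximate airport coordinates are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby(points, centre, radius_miles):
    """Brute-force 'nearby point' filter: checks every candidate in turn."""
    lat0, lon0 = centre
    return [name for name, (lat, lon) in points.items()
            if haversine_miles(lat0, lon0, lat, lon) <= radius_miles]


# Approximate coordinates, for illustration only.
london = (51.5074, -0.1278)
airports = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Paris CDG": (49.0097, 2.5479),
}

result = nearby(airports, london, 50)
print(result)
```

A linear scan like this costs one distance computation per candidate, which is why an index that prunes distant points early matters on datasets the size of GeoNames.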
Version 3.3
BigOWLIM 3.3 consolidates a number of advanced new features, some of which have been available in previous versions as prototype implementations. The most important differences, compared to previous versions of BigOWLIM are:
Replication cluster: brings resilience, failover and horizontally scalable parallel query processing. A master node component is included that can manage a cluster of worker nodes (standard BigOWLIM instances) to synchronise updates, cater for node failure, dynamically add/remove worker nodes and distribute query requests. Such a setup allows for massive concurrent query performance where the number of queries processed per second scales almost linearly with the number of worker nodes.
OWL2-RL: Support for this expressive OWL2 profile http://www.w3.org/TR/owl2-profiles/#OWL_2_RL that is amenable for implementation on rule-engines, but without costly data-type reasoning.
High performance retraction: BigOWLIM uses the policy of total materialisation of inferred knowledge. This has the advantage that all inferred statements are computed at load time, thus allowing query processing to proceed extremely quickly. While BigOWLIM has always been very fast at processing incremental statement insertions (which are monotonic by necessity), fast statement retractions are now also possible, despite not utilising any truth maintenance mechanism. When statements are deleted, a combination of forward and backward chaining is used to update the inferred closure. This is slower than statement insertion, but still fast enough to support hundreds of updates per hour even in a repository holding billions of statements.
Full text search: Two different, but complementary full text search mechanisms are provided, one proprietary and the other based on Lucene. Text search queries can be embedded in SPARQL queries for powerful, hybrid query expressions.
Consistency checks: Are used to ensure the consistency of the repository. The checks use a syntax similar to entailment rules, but are used to signal a consistency violation when the necessary conditions are met. Consistency checks are used in some of the standard rule sets.
RDF Rank: Is a technique that identifies the more important or more popular entities in the repository by examining how many connections (predicates) link each node with any other node. The popularity of entities can then be used to order query results in a similar way to how internet search engines rank pages, e.g. Google's PageRank.
RDF Priming: Allows subsets of statements to be selected as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science. It allows the 'priming' of large datasets with respect to concepts relevant to the context and to the query.
Notification mechanism: A publish/subscribe mechanism for registering and receiving events from a BigOWLIM repository whenever triples matching a certain graph pattern are inserted or removed. This allows clients to react to events in the update stream and avoid polling the repository.
Documentation: improved user documentation with new quick start guide.
partialRDFS: this flag has been deprecated and its optimizations are now made available through extra rule-set options.
Ontology imports: importing ontologies can be achieved using a URL as well as a local pathname.
Better JDK integration: custom rule-sets require the Java compiler, but it is no longer necessary to have the tools.jar in the classpath.
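The publish/subscribe idea behind the notification mechanism in this release can be sketched as follows. This is a hypothetical in-memory illustration, not the BigOWLIM API: the class and method names are invented, and a single triple pattern with wildcards stands in for the graph patterns that the real mechanism accepts.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]
# None in any position acts as a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]
Callback = Callable[[str, Triple], None]


class NotificationHub:
    """Hypothetical in-memory publish/subscribe hub keyed on statement patterns."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[Pattern, Callback]] = []

    def subscribe(self, pattern: Pattern, callback: Callback) -> None:
        """Register a callback for statements matching the pattern."""
        self._subscribers.append((pattern, callback))

    @staticmethod
    def _matches(pattern: Pattern, triple: Triple) -> bool:
        return all(p is None or p == t for p, t in zip(pattern, triple))

    def publish(self, event: str, triple: Triple) -> None:
        """Push an 'added' or 'removed' event to every matching subscriber."""
        for pattern, callback in self._subscribers:
            if self._matches(pattern, triple):
                callback(event, triple)


received = []
hub = NotificationHub()
hub.subscribe((None, "rdf:type", "ex:Airport"),
              lambda event, triple: received.append((event, triple)))

hub.publish("added", ("ex:LHR", "rdf:type", "ex:Airport"))
hub.publish("added", ("ex:LHR", "rdfs:label", "Heathrow"))  # no match, ignored
hub.publish("removed", ("ex:LHR", "rdf:type", "ex:Airport"))
print(received)
```

The subscriber is called only for the two statements that match its pattern, so the client reacts to changes as they happen rather than polling the repository.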