This section describes day-to-day administration tasks for an OWLIM-SE repository. Most of these standard operations can be performed with the Sesame software; some of that information is repeated here and tailored to the specifics of OWLIM-SE.
- Preserving Context when Exporting and Importing Data
- Loading large amounts of data
- Dumping a large repository to RDF
- Migration of data from incompatible versions
- Modifying a repository's configuration after it has been created
- Restoring the repository after a crash
- Flush to disk without shutting down the repository
- Exception thrown during query answering or initialisation
- How can the rule set be changed
- Backing up and restoring the repository
To preserve contexts across export and import, a context-aware RDF file format must be used, e.g. TriG. After serialising the repository to a file in this format (this can be done through the Sesame workbench Web application), the file can be imported with the following steps:
- Go to [Add]
- Choose Data format: TriG
- Choose RDF Data File: e.g. export.trig
- Clear the context text field (it will have been set to the URL of the file). If this is not cleared then all the imported RDF statements will be given a context of <file://export.trig> or similar.
The TriX format (an XML-based context-aware RDF serialisation) can also be used.
In general, RDF data can be loaded into a given Sesame repository using the 'load' command in the Sesame console application or directly through the workbench Web application. However, neither of these approaches will work when loading a very large number of triples, e.g. a billion statements. A common solution is to convert the RDF data into a line-based RDF format (e.g. N-Triples) and then split it into many smaller files (e.g. using the Linux command 'split'). Each file can then be uploaded separately using either the console or the workbench application.
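Where the 'split' command is not available, the same chunking can be done in a few lines of Java. The following is a minimal sketch, not part of OWLIM-SE or Sesame: the class name `NTriplesSplitter`, the `chunk-NNNNN.nt` naming scheme and the UTF-8 encoding are assumptions made for illustration. It relies only on the fact that N-Triples holds one statement per line, so cutting at line boundaries keeps every chunk well formed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a line-based RDF file (e.g. N-Triples) into smaller files. */
final class NTriplesSplitter {

    /**
     * Copies consecutive runs of at most linesPerChunk lines from input into
     * numbered files inside outputDir and returns the chunk paths in order.
     */
    static List<Path> split(Path input, Path outputDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        Files.createDirectories(outputDir);
        List<Path> chunks = new ArrayList<Path>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            int linesInChunk = 0;
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Start a new chunk file for the first line and whenever the current one is full.
                    if (writer == null || linesInChunk == linesPerChunk) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path chunk = outputDir.resolve(String.format("chunk-%05d.nt", chunks.size()));
                        chunks.add(chunk);
                        writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        linesInChunk = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NTriplesSplitter <input.nt> <output-dir> <lines-per-chunk>
        List<Path> chunks = split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("Wrote " + chunks.size() + " chunk file(s)");
    }
}
```

One caveat: blank node labels are scoped to a single file, so statements that share a blank node but land in different chunks will refer to different nodes after loading. If the data contains blank nodes, it may be safer to replace them with URIs before splitting.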
The Sesame OpenRDF workbench Web application has an export function that can be used to export the contents of moderately sized repositories. However, using it with large repositories (a hundred million statements or more) causes problems, usually timeouts in the Servlet container (Tomcat) hosting the application. Also, the workbench is not available when OWLIM-SE is used without Tomcat.
Therefore, a more straightforward approach for exporting RDF data from large repositories is to do this programmatically. The Sesame RepositoryConnection.getStatements() method can be called with the includeInferred flag set to false (in order not to serialise the inferred statements). The returned iterator can then be used to visit every explicit statement in the repository, and one of the Sesame RDF writer implementations can be used to output the statements in the chosen format. If the data will be re-imported, the N-Triples format is recommended, because it can easily be broken into large 'chunks' that can be inserted and committed separately. The following code snippet shows how an export can be achieved using this approach:
The basic procedure is to export the RDF data from the old version of OWLIM-SE and then reload it into a new repository instance that uses the new version of OWLIM-SE. Exporting is straightforward when using the Sesame workbench: click the 'Export' button, choose the format and click 'download'. To import into a new repository, click 'add', select a format, specify the file and base URI, then click 'Upload'.
If not using the Sesame workbench, the export must be done programmatically using the RepositoryConnection.getStatements() API, because the Sesame console does not have an export function. NOTE: It should be possible to export only the explicit statements, as the inferred statements will be recomputed at load time. Fortunately, the Sesame console application does have a 'load' function and this can be used to reload the exported statements.
Once created, the repository configuration is maintained in the Sesame SYSTEM repository. There is no easy generic way of changing this configuration, but there are several possibilities.
Firstly, OWLIM-SE allows most of the configuration parameters to be overridden at runtime by specifying the parameter values as JVM options. For example, to change the cache-memory configuration parameter, pass the -Dcache-memory=1g option to the JVM that hosts the application using OWLIM-SE.
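JVM options of the form -Dname=value are visible to Java code as system properties. The sketch below is not OWLIM-SE code; it only illustrates that mechanism and the override precedence described above, and can be used to check that an option actually reached the hosting JVM. The class `ParameterOverride`, the method `effectiveValue` and the configured value `512m` are hypothetical.

```java
/** Hypothetical illustration of how a -D option overrides a configured parameter value. */
final class ParameterOverride {

    /** Returns the JVM-level override (-Dname=value) if one is set, otherwise the configured value. */
    static String effectiveValue(String name, String configuredValue) {
        return System.getProperty(name, configuredValue);
    }

    public static void main(String[] args) {
        // Run with: java -Dcache-memory=1g ParameterOverride
        // "512m" stands in for the value from the repository configuration (an assumption).
        System.out.println("cache-memory = " + effectiveValue("cache-memory", "512m"));
    }
}
```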
The second approach is to modify the SYSTEM repository directly. Take care not to corrupt the SYSTEM repository or damage any other repository configuration while doing so. There are no dedicated tools for this, but it is probably easiest (if the current configuration is known) to remove the repository configuration using the Sesame console (drop <repository_id>) or the Sesame workbench Web application. The configuration in the TTL file can then be modified and added back to the SYSTEM repository (keeping the same repository id).
An OWLIM-SE repository image can become corrupted if OWLIM-SE (or the Java application that hosts it) crashes. In particular, the repository gets corrupted if OWLIM-SE is interrupted after a successful commit but before flushing the update to disk.
By default, if OWLIM-SE detects that the last shutdown was not normal, it will attempt to restore its internal state back to the end of the last successfully committed transaction. Therefore, users will not normally need to invoke a restore process manually. However, if the database-recovery-policy parameter has been changed to anything other than recover, then no attempt will be made to restore any lost data. In this situation, the user may want to invoke a restore process manually as follows.
If an abnormal termination occurs, it is advisable to execute a database restore as soon as possible. This can also help if OWLIM-SE fails to load a repository during startup for some reason. The OWLIM-SE jar contains a utility for restoring database images. It can be executed in the root folder of the OWLIM-SE distribution as follows:
java -cp lib/*:ext/* com.ontotext.trree.DatabaseRestorer STORAGE_FOLDER_PATH ENTITY_INDEX_SIZE RULE_SET
The parameters are:
|STORAGE_FOLDER_PATH|The full path to the repository storage directory.|
|ENTITY_INDEX_SIZE|The value of the entity-index-size parameter in the TTL repository configuration file.|
|RULE_SET|The value of the ruleset parameter in the TTL repository configuration file.|
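If the restore has to be triggered from a Java program (for example from a maintenance tool), the same command line can be assembled with ProcessBuilder. This is a sketch under stated assumptions: the class `RestoreLauncher` and its methods are hypothetical, and only the `com.ontotext.trree.DatabaseRestorer` class name, the classpath entries and the three parameters come from the command shown above. The meaning of the restorer's exit code is not documented here, so the sketch simply passes it on.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper that launches the restore utility as a separate JVM. */
final class RestoreLauncher {

    /** Builds the command line for the restore utility. */
    static List<String> buildCommand(String storageFolderPath, String entityIndexSize, String ruleSet) {
        // The JVM expands the "*" classpath wildcards itself; no shell is involved.
        String classpath = "lib/*" + File.pathSeparator + "ext/*";
        return Arrays.asList(
                "java", "-cp", classpath,
                "com.ontotext.trree.DatabaseRestorer",
                storageFolderPath, entityIndexSize, ruleSet);
    }

    /** Runs the command with the distribution root as working directory and returns the exit code. */
    static int run(File distributionRoot, List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(distributionRoot); // lib/ and ext/ are resolved relative to this folder
        builder.inheritIO();                 // show the utility's output on this console
        return builder.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("Usage: RestoreLauncher DISTRIBUTION_ROOT STORAGE_FOLDER_PATH ENTITY_INDEX_SIZE RULE_SET");
            System.exit(2);
        }
        System.exit(run(new File(args[0]), buildCommand(args[1], args[2], args[3])));
    }
}
```

The repository should not be open in any other process while the restore runs.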
To ensure that all committed statements are written to disk storage, a transaction can be committed containing a statement with the special predicate <http://www.ontotext.com/owlim/system#flush>, e.g.
This statement can be committed with other statements as a single transaction or separately in its own transaction. All repository data is flushed to disk after the transaction is committed.
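One way to commit such a statement is a SPARQL Update request. The sketch below only builds the request text. It assumes that the repository version in use accepts SPARQL 1.1 Update and that the subject and object of the flush statement are arbitrary; the class `FlushUpdate` and the `urn:example:` URIs are hypothetical. Only the predicate is taken from the text above.

```java
/** Hypothetical helper that builds a SPARQL Update request carrying the flush statement. */
final class FlushUpdate {

    /** The special predicate named in the documentation. */
    static final String FLUSH_PREDICATE = "http://www.ontotext.com/owlim/system#flush";

    /** Returns an INSERT DATA request with a single statement that uses the flush predicate. */
    static String build(String subjectUri, String objectUri) {
        // Subject and object are placeholders (assumed to be arbitrary).
        return "INSERT DATA { <" + subjectUri + "> <" + FLUSH_PREDICATE + "> <" + objectUri + "> }";
    }

    public static void main(String[] args) {
        System.out.println(build("urn:example:flush-subject", "urn:example:flush-object"));
    }
}
```

With Sesame, the resulting string could be executed through RepositoryConnection.prepareUpdate(), or the same statement could be added with RepositoryConnection.add() followed by commit().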
If the Lucene jar file is not on the classpath then the following exception will be thrown:
The Lucene.jar file must be added to the Java classpath, even when not using full-text search.
Changing the rule-set can be achieved in the same way as changing any other configuration parameter, except that it is necessary to re-compute the inferred statements with the new rule-set. This does not happen automatically, but can be forced by committing a transaction containing a statement with the special re-infer predicate (the subject and object can be anything), e.g.
There is no facility at present to back up a repository while it is running. Therefore, to back up a repository, it must be shut down gracefully and a copy of its storage directory (and any sub-directories) taken.
If the Sesame RepositoryManager class is being used, the storage directory will usually have a path that ends with 'repositories/<repository_id>/owlim-storage', but this can be overridden with the storage-folder configuration parameter.
To restore a repository from a backup, make sure the repository is not running and then replace the entire contents of the storage directory (and any sub-directories) with the backup. Then restart the repository and check the log file to ensure a successful startup.
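The copy step can be done with any file-level tool. As an illustration, the following sketch copies a storage directory recursively using the standard java.nio.file API. The class `StorageBackup` is hypothetical and is not part of OWLIM-SE; the sketch assumes the repository has already been shut down and that the target directory is new or empty.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper that copies a repository storage directory, including sub-directories. */
final class StorageBackup {

    /** Recursively copies sourceDir to targetDir; fails rather than overwrite an existing file. */
    static void copyTree(final Path sourceDir, final Path targetDir) throws IOException {
        Files.walkFileTree(sourceDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                // Recreate the directory structure under the target.
                Files.createDirectories(targetDir.resolve(sourceDir.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                // Without REPLACE_EXISTING this throws if the target file already exists.
                Files.copy(file, targetDir.resolve(sourceDir.relativize(file)),
                        StandardCopyOption.COPY_ATTRIBUTES);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StorageBackup <storage-directory> <backup-directory>
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For a restore, the same method can be run in the opposite direction after the storage directory has been emptied.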