This section gives an overview of configuring an OWLIM-SE repository. It also covers the contents of the OWLIM-SE distribution and describes the included 'getting-started' application, which serves as an example of integrating OWLIM-SE into other systems. For a detailed step-by-step guide to installing and setting up OWLIM-SE, see the installation section.
The OWLIM-SE distribution zip file includes the following folders:
The distribution contains the following files in the root directory of the zip file:
The OWLIM-SE distribution comes with a sample application that can be used as a template for building applications that interact with an OWLIM-SE repository. The source code of this application performs a sequence of typical operations: initialisation of the repository, uploading statements, executing queries and obtaining results, deleting statements, etc. This application template comes with:
An easy way to set up an application to use OWLIM-SE is to copy the getting-started folder and modify the contents as necessary.
To run the program, use the example.cmd / example.sh script. This script requires that the JAVA_HOME environment variable has been set. Alternatively, it can be set directly by editing the setvars.cmd / setvars.sh script in the root folder of the OWLIM-SE software distribution. If the program is modified to use a custom rule set, then JAVA_HOME must point to the Java Runtime Environment (JRE) of a Java Development Kit (JDK) version 1.6 or later. This is so that the new mechanism for locating the Java compiler can be used.
The queries are always evaluated, but the results are output only if the showresults parameter is set to true.
The application can also be run against a remote repository exposed using the Sesame HTTP server. In this case, the url parameter specifies the Sesame endpoint and the repository parameter specifies the repository to use on this server. Using url and repository overrides the config parameter.
Due to its range of functions, the getting-started application makes a useful bulk-loading tool. It can load a single file or traverse a whole directory structure, loading any RDF file it finds. If the files are very large, it automatically inserts commit operations at suitable moments, so it is not necessary to convert and split large files into smaller ones. For example:
This loads all RDF files located in /home/me/wordnet/ and its subdirectories into the repository called my_repo at the Sesame endpoint http://192.168.1.31:8080/openrdf-sesame, secured using HTTP authentication with the above credentials. If an error occurs, the application outputs a message and continues with the next RDF file.
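The directory traversal and continue-on-error behaviour described above can be sketched in Java as follows. This is an illustration only, not the getting-started source: the class RdfFileFinder, the Loader callback and the list of file extensions are all assumptions made for the example, and the real application may detect RDF files differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RdfFileFinder {
    // Assumed set of extensions treated as RDF; not taken from the OWLIM-SE tool.
    private static final Set<String> RDF_EXTENSIONS = new HashSet<>(
            Arrays.asList("rdf", "rdfs", "owl", "ttl", "nt", "n3", "trig", "trix"));

    /** Hypothetical callback standing in for "add this file to the repository". */
    interface Loader {
        void load(Path file) throws Exception;
    }

    /** True if the file name ends in one of the assumed RDF extensions (case-insensitive). */
    static boolean looksLikeRdf(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && RDF_EXTENSIONS.contains(name.substring(dot + 1));
    }

    /** Walks root recursively and returns the candidate RDF files in sorted order. */
    static List<Path> find(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(RdfFileFinder::looksLikeRdf)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Loads each file in turn; a failure is reported and the next file is tried. */
    static int loadAll(List<Path> files, Loader loader) {
        int loaded = 0;
        for (Path file : files) {
            try {
                loader.load(file);
                loaded++;
            } catch (Exception e) {
                System.err.println("Failed to load " + file + ": " + e.getMessage());
            }
        }
        return loaded;
    }
}
```

A caller would pass the result of find(...) to loadAll(...) together with a Loader that adds the file to a repository connection and commits at intervals.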
The export features of getting-started allow a reasonable back-up of an OWLIM database to be made.
In this example the TriG file format is used, because it preserves named-graph names (it is a quad format).
Wordnet is the most popular lexical knowledge base, developed at Princeton University. It encodes the meanings of about 150,000 English words. The meanings of the words are defined by word-senses, which relate a word to a lexical concept. Lexical concepts are called synsets, i.e. synonym sets; about 115,000 of them appear in Wordnet v.2.0. Numerous lexical semantic relations are formally modelled, e.g.
A standard RDF/OWL representation of Wordnet is available at http://www.w3.org/TR/wordnet-rdf/. It contains about 1.9 million explicit statements (the Full variant), expressed in a fragment of OWL-Lite that further entails 6.3 million implicit statements.
Sesame 2.0 keeps repository configurations, expressed in RDF, in a SYSTEM repository. A new repository can be configured simply by inserting an appropriate graph into the SYSTEM repository. The getting-started application uses the Turtle format for convenience, and also because the Sesame console application uses Turtle for its template files when creating repositories.
Often it is desirable to ensure that a repository starts with a predefined set of RDF statements, usually one or more schema graphs. This is possible by using the owlim:imports property. After start-up, these files are parsed and their contents are permanently added to the repository. The complete set of configuration parameters, with their descriptions, default values and allowed values, is listed below.
What follows is a short description of the properties specific to OWLIM-SE that are used to set up the repository. For more information about the Sesame 2 configuration schema, refer to the Sesame documentation.
In short, the configuration is an RDF graph whose root node is of rdf:type rep:Repository. The root node must be connected through the rep:repositoryID property to a literal that contains the human-readable name of the repository, and through the rep:repositoryImpl property to a node that describes the configuration. The type of the repository is defined with the rep:repositoryType property, whose value must be openrdf:SailRepository so that custom Sail implementations (such as OWLIM-SE) can be used in Sesame 2.0. A node that specifies the Sail implementation to be instantiated must then be connected with the sr:sailImpl property. To instantiate OWLIM-SE, this last node must have a sail:sailType property with the value owlim:Sail. The Sesame framework then locates the correct SailFactory on the application classpath and uses it to instantiate the Java implementation class.
All properties used to specify OWLIM-SE's configuration parameters use the owlim: prefix and the local names match up with the parameters listed below, e.g. the value of the ruleset parameter can be specified using the http://www.ontotext.com/trree/owlim#ruleset property.
There follows an example configuration (in Turtle RDF format) of a Sesame 2 repository that uses an OWLIM-SE sail implementation:
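As a minimal sketch of the graph structure just described, the following Java helper renders a Turtle configuration from two values. It is not the configuration shipped with the distribution: the class name RepositoryConfigTemplate, the repository ID and the ruleset value passed in by the caller are illustrative assumptions, and only the ruleset parameter is shown. The property names and prefix URIs follow the Sesame 2 configuration vocabulary and the owlim: namespace mentioned above.

```java
final class RepositoryConfigTemplate {
    /**
     * Renders a minimal repository configuration as Turtle.
     * No escaping is performed, so the arguments must not contain quotes or backslashes.
     */
    static String render(String repositoryId, String ruleset) {
        return String.join("\n",
            "@prefix rep: <http://www.openrdf.org/config/repository#> .",
            "@prefix sr: <http://www.openrdf.org/config/repository/sail#> .",
            "@prefix sail: <http://www.openrdf.org/config/sail#> .",
            "@prefix owlim: <http://www.ontotext.com/trree/owlim#> .",
            "",
            // Root node: a repository identified by its ID.
            "[] a rep:Repository ;",
            "   rep:repositoryID \"" + repositoryId + "\" ;",
            // Repository implementation: a SailRepository wrapping a custom Sail.
            "   rep:repositoryImpl [",
            "      rep:repositoryType \"openrdf:SailRepository\" ;",
            "      sr:sailImpl [",
            // The Sail type tells Sesame which SailFactory to look up.
            "         sail:sailType \"owlim:Sail\" ;",
            // OWLIM-SE parameters use the owlim: prefix; only one is shown here.
            "         owlim:ruleset \"" + ruleset + "\"",
            "      ]",
            "   ] .",
            "");
    }
}
```

For instance, RepositoryConfigTemplate.render("my_repo", "owl-horst-optimized") produces a template for a repository named my_repo; the ruleset name used here is an assumed example value and should be checked against the parameter list.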
Apart from the I/O buffers used for caching, OWLIM-SE keeps the indexes for the nodes of the RDF graph in memory. This is a design decision that improves the overall performance of the repository. Each I/O buffer (page) is exactly 64 KB and the indexing information per node in the graph is 12 bytes, so memory requirements per repository vary with the dataset. To simplify calculating the amount of Java heap memory required for an OWLIM-SE repository, an Excel spreadsheet, owlim-se-configurator.xls, is included in the distribution.
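The two figures above lend themselves to a quick back-of-the-envelope calculation. The sketch below uses only those two constants; the class and method names are hypothetical, and the result is a rough lower bound rather than a substitute for the spreadsheet, which takes more factors into account.

```java
final class IndexMemoryEstimate {
    static final long PAGE_BYTES = 64L * 1024;  // size of one I/O buffer (page), from the text
    static final long BYTES_PER_NODE = 12L;     // indexing information per graph node, from the text

    /** Memory needed for the in-memory node index alone, in bytes. */
    static long nodeIndexBytes(long nodeCount) {
        return Math.multiplyExact(nodeCount, BYTES_PER_NODE);
    }

    /** Number of whole 64 KB pages that fit in a cache of the given size. */
    static long pagesThatFit(long cacheBytes) {
        return cacheBytes / PAGE_BYTES;
    }
}
```

For example, 100 million nodes need about 1.2 GB for the node index, and a 256 MB cache holds 4096 pages.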
There are several components in OWLIM that make use of caching (e.g. FTS indices, predicate list, tuple indices). In different situations certain caches will need more memory than others. OWLIM allows for the configuration of both the total cache memory to be used by a repository and all the separate per-module caches.
The following parameters control the amount of memory assigned to each of the different caches:
All parameters can be specified in bytes, kilobytes, megabytes or gigabytes by appending a unit specifier to the integer value. With no unit specifier, the value is interpreted as bytes; otherwise use k or K for kilobytes, m or M for megabytes, and g or G for gigabytes (all base 2).
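To make the unit rules concrete, here is a minimal sketch of a parser that follows them. It is not OWLIM's own parsing code; the class name MemorySize and its handling of malformed or negative input are assumptions made for the example.

```java
final class MemorySize {
    /** Converts values such as "1024", "64k", "512M" or "2g" to a byte count (base 2). */
    static long parseBytes(String text) {
        String s = text.trim();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (Character.toLowerCase(s.charAt(s.length() - 1))) {
            case 'k': multiplier = 1L << 10; break;  // kilobytes
            case 'm': multiplier = 1L << 20; break;  // megabytes
            case 'g': multiplier = 1L << 30; break;  // gigabytes
            default:  multiplier = 1L;       break;  // no unit specifier: bytes
        }
        String digits = multiplier == 1L ? s : s.substring(0, s.length() - 1);
        // Long.parseLong rejects fractions, blanks and stray characters.
        long value = Long.parseLong(digits);
        if (value < 0) {
            throw new IllegalArgumentException("negative size: " + text);
        }
        // multiplyExact fails loudly on overflow instead of wrapping around.
        return Math.multiplyExact(value, multiplier);
    }
}
```

With this sketch, "64k" yields 65536 bytes and "2g" yields 2147483648 bytes.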
The general rule of thumb is:
However, if any of the modules that use the cache (e.g. full-text search) are turned off, they are excluded from the above equation.
Almost all OWLIM parameters can be set both in the TTL configuration file and from the command line using the Java -D<param.name>=<value> command line option to set system properties. When a parameter is set simultaneously using both methods, the system property overrides the value in the configuration file. Some OWLIM parameters can only be set using system properties.
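The precedence rule (system property first, then configuration file) can be sketched as below. This illustrates the lookup order only; it is not how OWLIM-SE is implemented. ParameterResolver and the parameter name used in the usage note are hypothetical, and the configuration-file values are assumed to have been read into a map beforehand.

```java
import java.util.Map;

final class ParameterResolver {
    /**
     * Returns the effective value of a parameter:
     * the -D system property if set, else the configuration file value, else the default.
     */
    static String resolve(String name, Map<String, String> configFileValues, String defaultValue) {
        String fromSystem = System.getProperty(name);
        if (fromSystem != null) {
            return fromSystem;  // the system property overrides the configuration file
        }
        String fromConfig = configFileValues.get(name);
        return fromConfig != null ? fromConfig : defaultValue;
    }
}
```

If the JVM is started with -Dexample.param=512m and the configuration file sets the same (hypothetical) parameter to 256m, resolve("example.param", ...) returns 512m.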
OWLIM-SE can provide information about its internal state and behaviour, including trend information. Such information can be useful for 'debugging' certain situations, such as understanding why load performance changes over time or with particular data sets.
Statistics are kept for the main index data structures and include information such as cache hits/misses and file reads/writes. This information can be used to fine-tune OWLIM-SE's memory configuration.
The analytical information is published via a Java Management Extensions (JMX) management bean (MBean). A JMX endpoint is configured by using special system properties when starting the Java virtual machine (JVM) in which OWLIM-SE is running. For example, the following command line parameters configure the JMX server endpoint to listen on port 8089, without authentication and without using a secure socket layer (SSL):
If using OWLIM-SE with Tomcat, then these parameters must be passed to Tomcat's JVM by setting either the JAVA_OPTS or CATALINA_OPTS environment variable, e.g.
For some Linux distributions, you can also edit the file /etc/default/tomcat6 and set JAVA_OPTS there.
Once OWLIM-SE is loaded, use any compliant JMX client, e.g. jconsole that is part of the Java development kit, to access the JMX interface on the configured port. Statistics will be maintained for all the major indices. For each index, there will be a SortedCollectionStatistics MBean published that shows the cache and file I/O values updated in real-time.
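Besides jconsole, the MBeans can be read programmatically with the standard javax.management API. The sketch below connects to a JMX endpoint on port 8089 and lists matching MBean names. The class MBeanLister is hypothetical, and filtering on the text "SortedCollectionStatistics" assumes that this string appears in the published object names, which has not been verified here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class MBeanLister {
    /** Conventional RMI-based JMX service URL for a JVM exposing a jmxremote port. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Sorted names of all registered MBeans whose object name contains the fragment. */
    static List<String> namesContaining(MBeanServerConnection conn, String fragment) {
        List<String> names = new ArrayList<>();
        try {
            // queryNames(null, null) returns every registered MBean name.
            for (ObjectName name : conn.queryNames(null, null)) {
                String text = name.getCanonicalName();
                if (text.contains(fragment)) {
                    names.add(text);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Assumes a JVM on localhost with its JMX endpoint on port 8089 and no authentication.
        JMXServiceURL url = new JMXServiceURL(serviceUrl("localhost", 8089));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            for (String name : namesContaining(conn, "SortedCollectionStatistics")) {
                System.out.println(name);
            }
        }
    }
}
```

Individual values can then be read with MBeanServerConnection.getAttribute once the attribute names exposed by each bean are known.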
The following information is displayed for each index:
Ideally, the system should be configured to keep the number of cache misses to a minimum. If the ratio of hits to misses is low then consider increasing the memory available to the index (if other factors permit this).
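One simple way to track this is to compute a hit rate from the hit and miss counters. The sketch below uses hits divided by total lookups, a closely related measure to the hits-to-misses ratio mentioned above. The class name is hypothetical, and the text gives no specific threshold for what counts as low.

```java
final class CacheStats {
    /** Fraction of lookups served from the cache, in the range 0.0 to 1.0. */
    static double hitRate(long hits, long misses) {
        long total = hits + misses;
        // With no lookups yet there is nothing to measure; report 0.0 rather than divide by zero.
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```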
Page swaps will tend to occur much more often during large scale data loading. Page discards will occur more frequently during query evaluation.
Information is also available via the JMX interface that provides details about executing queries (or more accurately query result iterators). These statistics are provided through a SailIterationMonitor MBean, with one for each repository instance. Each bean instance is named using the storage directory of the repository it relates to.
For every executing query the SailIterationMonitor maintains a TrackRecord object that provides the following information:
The collection of these objects grows with each executing or executed query; however, older objects in the CLOSED state expire and are removed from the collection as the query result iterators are finalized.
There is a single operation available with this MBean that allows an administrator to request that a query terminates early. The operation is called requestStop and requires the trackId of the query that is to be stopped. If this operation is invoked on a running query, the isRequestedToStop flag is set to true and the next call to hasNext() will return false.
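The stop behaviour described above is a form of cooperative cancellation: the flag is only checked when the consumer asks for the next result. The following sketch shows the pattern with a plain java.util.Iterator wrapper. It is not OWLIM-SE's implementation; StoppableIterator is a hypothetical class, and only the names requestStop and isRequestedToStop are borrowed from the text.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class StoppableIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    // volatile so that a stop requested from a management thread is seen by the consumer thread
    private volatile boolean requestedToStop;

    StoppableIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    /** Asks the iteration to end early; takes effect on the next hasNext() call. */
    void requestStop() {
        requestedToStop = true;
    }

    boolean isRequestedToStop() {
        return requestedToStop;
    }

    @Override
    public boolean hasNext() {
        return !requestedToStop && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return delegate.next();
    }
}
```

Because the check happens in hasNext(), a result that is already being consumed is not interrupted; the iteration simply reports that no further results are available.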