This section gives an overview of configuring an OWLIM-SE repository. It also covers the contents of the OWLIM-SE distribution and describes the included 'getting-started' application, which serves as an example of integrating OWLIM-SE into other systems. For a detailed step-by-step guide to installing and setting up OWLIM-SE, see the [installation section|OWLIM-SE Installation].

{toc}

h1. Contents of the Distribution Package

The OWLIM-SE distribution zip file includes the following folders:
\\
|| Folder || Contents ||
| {{doc}} | User guide, quick start guide and the OWLIM primer |
| {{ext}} | All required third party libraries. The Sesame 2 installation can be downloaded separately from [http://www.openrdf.org/|http://www.openrdf.org/]. The folder also contains a copy of the Lehigh University Benchmark library (lubm.jar), the JUnit library (junit.jar) necessary for executing inference tests, the simple logging framework for Java jar files and the Lucene full text search jar |
| {{getting-started}} | An example application that uses OWLIM, with all the necessary auxiliary files and folders, see section 8.2 |
| {{lib}} | Contains the binary executable version of OWLIM-SE as a JAR (Java library) file |
| {{lubm}} | Scripts and configuration files to run the LUBM \[16\] benchmarks documented in section 11. |
| {{templates}} | Contains an OWLIM-SE repository template file (.ttl) used by the Sesame 2 framework for creating new repositories. |

The distribution contains the following files in the root directory of the zip file:
\\
|| File || Description ||
| {{\*.pie}} | Rule files containing definitions of the built-in rule-sets, see section 7.1.2. |
| {{OWLIM-SE_license_agreement_xx.pdf}} | The license under which OWLIM-SE is published. |
| {{owlim-se-configurator.xls}} | A useful memory requirement calculator and configuration tool that can be used to calculate the correct Java heap size, memory allocation and various other configuration parameters. Command line and turtle configurations are generated. Instructions for using this spreadsheet are given on the first page. |
| {{setvars.cmd}} \\ {{setvars.sh}} | Scripts (Windows and Linux) that define several environment variables used by the scripts that run the test cases and the getting-started application. They should be edited for each installation, as they determine the Java virtual machine to be started and the paths to all the relevant JAR files, including those of OWLIM-SE and Sesame |

h1. Getting-Started Application

The OWLIM-SE distribution comes with a sample application that can be used as a template for building applications that interact with an OWLIM-SE repository. The source code of this application performs a sequence of typical operations: initialisation of the repository, uploading statements, executing queries and obtaining results, deleting statements, etc. This application template comes with:
* Source code and compiled class files
* Sample ontology and data files
* Sesame repository template file
* Scripts which invoke the application

An easy way to set up an application to use OWLIM-SE is to copy the {{getting-started}} folder and modify the contents as necessary.
There follows a short description of how the getting-started application is organised and what the sample code does. The easiest way to understand it is to read the source code of the {{GettingStarted}} class, located in the {{src}} folder, which is extensively commented.
The program accepts the following command line parameters:

|| Parameter || Description || Default ||
| {{context}} | If not specified, statements loaded are given the context of the file URL from which the statements were loaded. If specified ({{context=URI}}), then all statements loaded are given this URI for the context. If an empty context is used ({{context=}}) then all statements loaded have no context (default graph) | <none> |
| {{config}} | Specifies the repository description file to use to create a repository. Configuration options specified in this file are explained in section&nbsp;8.5. | ./owlim.ttl |
| {{flush}} | Indicates whether repository updates should be flushed to disk after every commit | false |
| {{preload}} | Specifies the folder or file containing RDF data that is loaded automatically when the program starts. If the parameter value specifies a folder then it is searched recursively for all files that contain RDF data. | ./preload |
| {{queryfile}} | Specifies the file containing queries that are to be executed. The files can contain queries in any format supported by Sesame. | ./queries/sample.sparql |
| {{repository}} | The repository ID identifying the repository described in the configuration file specified by the {{config}} parameter. | owlim |
| {{showresults}} | Specifies whether the results from queries will be displayed or not. | true |
| {{showstats}} | Indicates whether to show initialisation statistics after loading the selected data files | false |
| {{updates}} | Specifies whether the statement insertion and deletion step is performed. | false |
| {{url}} | Indicates whether to connect to the given remote repository URL - overrides {{config}} and {{repository}} parameters | <none> |

To run the program, use the {{example.cmd}} / {{example.sh}} script. This script requires that the {{JAVA_HOME}} environment variable has been set. Alternatively, it can be set directly by editing the {{setvars.cmd}} / {{setvars.sh}} script in the root folder of the OWLIM-SE software distribution. If the program is modified to use a custom rule set, then {{JAVA_HOME}} must point to the Java Runtime Environment (JRE) of a Java Development Kit (JDK) version 1.6 or later. This is so that the new mechanism for locating the Java compiler can be used.
With the example set up, OWLIM-SE loads two ontologies at start up as specified by the {{imports}} parameter in the repository configuration file, i.e. {{owlim.ttl}}. These ontologies are {{./ontology/owl.rdfs}} and {{./ontology/example.rdfs}}. The sample program then loads any other ontologies that it finds in the {{preload}} folder. When start up is complete, the program outputs some statistics and lists the namespaces found.
The next step is to load the specified query file and to execute the queries that it contains. Some example query files are included in the {{queries}} folder. A file can contain several queries, where each query starts with an identifier enclosed in {{\^\[}} and {{\]}} on a single line; everything between two subsequent query identifiers is treated as a SeRQL or SPARQL query and is evaluated against the contents of the prepared repository. The {{\#}} sign introduces a single-line comment, so each line starting with {{\#}} is ignored. Syntax overview:
{noformat}#some comment
^[queryid1]
<query line1>
<query line2>
...
<query lineN>
#some other comment
^[nextqueryid]
<query line1>
...
<EOF>
{noformat}The queries are always evaluated, but the results are output only if the {{showresults}} parameter is set to {{true}}.
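
The query file layout described above can be illustrated with a minimal sketch. The {{QueryFileParser}} class and its {{parse}} method are hypothetical names introduced here for illustration; they are not part of the OWLIM-SE distribution, and the real {{GettingStarted}} class may handle edge cases (such as leading whitespace) differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of OWLIM-SE) that splits a query file into named queries. */
final class QueryFileParser {

    /** Returns a map from query identifier to query text, in file order. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> queries = new LinkedHashMap<>();
        String currentId = null;
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("#")) {
                continue; // single-line comment, ignored
            }
            String trimmed = line.trim();
            if (line.startsWith("^[") && trimmed.endsWith("]")) {
                // A new identifier closes the previous query, if any.
                if (currentId != null) {
                    queries.put(currentId, body.toString().trim());
                }
                currentId = trimmed.substring(2, trimmed.length() - 1);
                body.setLength(0);
            } else if (currentId != null) {
                body.append(line).append('\n');
            }
        }
        if (currentId != null) {
            queries.put(currentId, body.toString().trim());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#list some statements",
                "^[all]",
                "SELECT * WHERE {",
                "  ?s ?p ?o",
                "} LIMIT 10",
                "#a second query",
                "^[ask]",
                "ASK { ?s ?p ?o }");
        for (Map.Entry<String, String> e : parse(lines).entrySet()) {
            System.out.println("[" + e.getKey() + "]");
            System.out.println(e.getValue());
        }
    }
}
```

In this sketch, lines that appear before the first identifier are discarded; whether the sample application does the same is an assumption.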
Furthermore, the sample application updates the contents of the repository by inserting a statement using {{RepositoryConnection.addStatement()}} and committing the transaction. The program then fetches some statements from the repository using a direct call to {{RepositoryConnection.getStatements()}}. The set of retrieved statements should contain the newly added statement, since it matches the given pattern. The statement is then removed in a separate transaction.
The application can also be run against a remote repository exposed using the Sesame servlet container. In this case, the {{url}} parameter is used to specify the Sesame endpoint, which causes the {{config}} and {{repository}} parameters to be ignored. To use this, the following extra Apache and Sesame jar files (not provided with the OWLIM distribution) must be included in the classpath:
{noformat}commons-logging-1.1.1.jar
commons-codec-1.3.jar
commons-httpclient-3.1.jar
sesame-http-client-2.5.0.jar
{noformat}There are Windows and Linux scripts (namely {{example.cmd}} and {{example.sh}}) prepared for convenience that execute the sample application using its default configuration and values.

h2. Wordnet example

[Wordnet|http://wordnet.princeton.edu/] is the most popular lexical knowledge base, developed at Princeton University. It encodes the meanings of about 150,000 English words. The meanings of the words are defined by word-senses, which relate a word to a lexical concept. Lexical concepts are called _synsets_, i.e. synonym sets -- about 115,000 of those appear in Wordnet&nbsp;v.2.0. Numerous lexical semantic relations are formally modelled, e.g.
* Hyponymy (subsumption from a more-general term)
* Antonymy (negation, a term with the opposite meaning)
* Causation and entailment (for verbs)

A standard RDF/OWL representation of Wordnet is available at [http://www.w3.org/TR/wordnet-rdf/|http://www.w3.org/TR/wordnet-rdf/]. It contains about 1.9 million explicit statements (the Full variant), expressed in a fragment of OWL-Lite that further entails 6.3 million implicit statements.
To configure the getting started example program to use the Wordnet data sets and run the included Wordnet queries, one should download the archive of the full version from [http://www.w3.org/2006/03/wn/wn20/download/wn20full.zip|http://www.w3.org/2006/03/wn/wn20/download/wn20full.zip], extract it into a folder, e.g. {{./preload/wordnet}} and provide a path to this folder using the {{preload}} command line parameter when starting the program. Some sample Wordnet queries are provided in the {{wordnet.serql}} and {{wordnet.sparql}} query files. These can be specified on the command line using the {{queryfile}} command line parameter.

h1. Configuration

Sesame 2.0 keeps repository configurations, expressed in RDF, in a SYSTEM repository. A new repository can be configured simply by inserting an appropriate graph into the SYSTEM repository. The getting-started application uses the Turtle format for convenience and also because the Sesame console application uses the Turtle format for template files when creating repositories.
The diagram below gives a graphical illustration of an RDF graph that describes a repository configuration:

!sesame_owlim_config.png!

Often it is desirable to ensure that a repository starts with a predefined set of RDF statements, usually one or more schema graphs. This is possible by using the {{owlim:imports}} property. After start up, these files are parsed and their contents are permanently added to the repository. The complete set of configuration parameters, with their descriptions, defaults and allowed values, is listed in the Configuration Parameters section below. What follows is a short description of the properties that are used to set up an OWLIM-SE repository. For more information about the Sesame 2 configuration schema, refer to the Sesame documentation&nbsp;\[9\].
In short, the configuration is an RDF graph whose root node is of {{rdf:type rep:Repository}}. The root node must be connected through the {{rep:repositoryID}} property to a literal that contains the human readable name of the repository, and via the {{rep:repositoryImpl}} property to a node that describes the configuration. The type of the repository is defined via the {{rep:repositoryType}} property, whose value must be {{openrdf:SailRepository}} to allow custom sail implementations (such as OWLIM-SE) to be used in Sesame 2.0. A node that specifies the Sail implementation to be instantiated must then be connected with the {{sr:sailImpl}} property. To instantiate OWLIM-SE, this last node must have a property {{sail:sailType}} with the value {{owlim:Sail}} -- the Sesame framework will locate the correct {{SailFactory}} within the application {{classpath}} and use it to instantiate the Java implementation class.
Namespaces corresponding to the prefixes used in the above paragraph are:
{noformat}rep: <http://www.openrdf.org/config/repository#>
sr: <http://www.openrdf.org/config/repository/sail#>
sail: <http://www.openrdf.org/config/sail#>
owlim: <http://www.ontotext.com/trree/owlim#>
{noformat}All properties used to specify OWLIM-SE's configuration parameters use the {{owlim:}} prefix and the local names match up with the parameters listed below, e.g. the value of the {{ruleset}} parameter can be specified using the {{[http://www.ontotext.com/trree/owlim#ruleset]}} property.
Many of the OWLIM specific configuration parameters can be set via the Java Virtual Machine (JVM) system properties passed as command line parameters when starting the JVM. Values for configuration parameters that are given on the command line take precedence over those present in the repository configuration. For instance, the {{ruleset}} parameter can be set from the command line by using:
{noformat}-Druleset=owl-max
{noformat}
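
The precedence rule described above (a JVM system property overrides the repository configuration, which in turn overrides the built-in default) can be sketched as follows. The {{ParameterResolver}} class is a hypothetical illustration and does not reflect OWLIM-SE's actual internal code; only {{System.getProperty}} is a real Java API here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the precedence rule; not OWLIM-SE's actual code. */
final class ParameterResolver {

    /**
     * Returns the JVM system property if it is set, otherwise the value from
     * the repository configuration, otherwise the supplied default.
     */
    static String resolve(String name, Map<String, String> repositoryConfig, String defaultValue) {
        String fromCommandLine = System.getProperty(name);
        if (fromCommandLine != null) {
            return fromCommandLine;
        }
        String fromConfig = repositoryConfig.get(name);
        return fromConfig != null ? fromConfig : defaultValue;
    }

    public static void main(String[] args) {
        // Values as they might appear in the repository configuration file.
        Map<String, String> config = new HashMap<>();
        config.put("ruleset", "owl-horst-optimized");

        // Started with -Druleset=owl-max, this prints the command line value;
        // started without it, the value from the configuration is printed.
        System.out.println(resolve("ruleset", config, "owl-horst"));
    }
}
```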

h2. Sample Configuration

There follows an example configuration (in Turtle RDF format) of a Sesame 2 repository that uses an OWLIM-SE sail implementation:

{noformat}@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix owlim: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
    rep:repositoryID "owlim" ;
    rdfs:label "OWLIM Getting Started" ;
    rep:repositoryImpl [
        rep:repositoryType "openrdf:SailRepository" ;
        sr:sailImpl [
            sail:sailType "owlim:Sail" ;
            owlim:ruleset "owl-horst-optimized" ;
            owlim:base-URL "http://example.org/owlim#" ;
            owlim:imports "./ontology/my_ontology.rdf" ;
            owlim:defaultNS "http://www.my-organisation.org/ontology#" ;
            owlim:entity-index-size "5000000" ;
            owlim:cache-memory "4G" ;
            owlim:storage-folder "storage" ;
            owlim:repository-type "file-repository" ;
        ]
    ].
{noformat}

h1. Memory Requirements

Apart from the I/O buffers used for caching, OWLIM-SE keeps the indexes of the nodes in the RDF graph in memory. This is a design decision intended to improve the overall performance of the repository. Each I/O buffer (page) is exactly 64kb and the indexing information per node in the graph is 12 bytes, so memory requirements per repository vary with the dataset. To ease the calculation of the amount of Java heap memory required for an OWLIM-SE repository, an Excel spreadsheet is included in the distribution -- {{owlim-se-configurator.xls}}.
The page cache is organized in two sets of buffers, read-only and dirty. Each page is first loaded into the read-only cache. When this gets full, a page (if dirty) is moved to the dirty cache, where it can later be written to storage.

h2. Cache Memory Configuration

There are several components in OWLIM that make use of caching (e.g. FTS indices, predicate list, tuple indices). In different situations certain caches will need more memory than others. OWLIM allows for the configuration of both the total cache memory to be used by a repository and all the separate per-module caches.

h2. Parameters

The following parameters control the amount of memory assigned to each of the different caches:

|| Parameter || Unit || Default || Deprecated Equivalent || Description ||
| cache-memory | bytes | none | | The amount of memory to be distributed among different caches |
| tuple-index-memory | bytes | 80M | cache-size | Memory used for PSO and POS caches |
| predicate-memory | bytes | 80M | | Memory used for predicate list cache |
| fts-memory | bytes | 20M | tokenIndexCacheSize | Memory used for full-text index cache (node search) |

All parameters can be specified in bytes, kilobytes, megabytes or gigabytes by appending a unit specifier to the integer value: k or K for kilobytes, m or M for megabytes and g or G for gigabytes. When no unit specifier is given, the value is interpreted as bytes. All units are base 2, i.e. 1k = 1024 bytes.
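
As an illustration of the unit rules above, the following minimal sketch converts a memory parameter value to bytes. The {{MemorySize}} class and its {{parseBytes}} method are hypothetical names used for this example only; OWLIM-SE's own parsing code may differ in how it reports malformed values.

```java
/** Hypothetical helper (not part of OWLIM-SE) for memory values such as "80M" or "4G". */
final class MemorySize {

    /** Converts a value with an optional k/K, m/M or g/G suffix (base 2) to bytes. */
    static long parseBytes(String value) {
        String v = value.trim();
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty memory size");
        }
        long multiplier;
        switch (Character.toLowerCase(v.charAt(v.length() - 1))) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default:  multiplier = 1L; break; // no suffix: plain bytes
        }
        String digits = multiplier == 1L ? v : v.substring(0, v.length() - 1);
        // multiplyExact guards against silent overflow for very large values.
        return Math.multiplyExact(Long.parseLong(digits), multiplier);
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("80M")); // 83886080
        System.out.println(parseBytes("4G"));  // 4294967296
        System.out.println(parseBytes("512")); // 512
    }
}
```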

h2. Memory Distribution

The general rule of thumb is:
{noformat}cache-memory = tuple-index-memory + predicate-memory + fts-memory
{noformat}However, if one of the modules that uses the cache (e.g. full text search) is turned off, its memory is excluded from the above equation.
Furthermore, if cache-memory is explicitly configured and some of the other memory parameters are omitted, the missing values are resolved by uniformly distributing the remaining memory after all the explicitly configured memory parameters are subtracted. For example if cache-memory = 100M, fts-memory = 10M and the other memory parameters are missing, then they are implicitly assigned (100M - 10M) / 2 = 45M each.
If cache-memory is not specified, then all the missing memory parameters are assigned their default values.
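
The distribution rules above can be sketched in a few lines. The {{CacheMemoryDistribution}} class is a hypothetical illustration, not OWLIM-SE's implementation. A module that is turned off is modelled simply by leaving it out of the {{defaults}} map, and clamping a negative remainder to zero is an assumption, since the behaviour when the explicit values exceed cache-memory is not described here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the cache memory distribution rules. */
final class CacheMemoryDistribution {

    /**
     * @param cacheMemory total cache memory in bytes, or null if not configured
     * @param explicit    explicitly configured per-module values in bytes
     * @param defaults    default value in bytes for every active module
     */
    static Map<String, Long> distribute(Long cacheMemory, Map<String, Long> explicit,
                                        Map<String, Long> defaults) {
        long explicitTotal = 0;
        int missing = 0;
        for (String name : defaults.keySet()) {
            Long value = explicit.get(name);
            if (value != null) {
                explicitTotal += value;
            } else {
                missing++;
            }
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : defaults.entrySet()) {
            Long value = explicit.get(e.getKey());
            if (value != null) {
                result.put(e.getKey(), value);
            } else if (cacheMemory != null) {
                // Share the remainder uniformly; clamping at zero is an assumption.
                result.put(e.getKey(), Math.max(0L, cacheMemory - explicitTotal) / missing);
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Map<String, Long> defaults = new LinkedHashMap<>();
        defaults.put("tuple-index-memory", 80 * mb);
        defaults.put("predicate-memory", 80 * mb);
        defaults.put("fts-memory", 20 * mb);
        Map<String, Long> explicit = new HashMap<>();
        explicit.put("fts-memory", 10 * mb);

        // cache-memory = 100M, fts-memory = 10M: the other two get 45M each.
        for (Map.Entry<String, Long> e : distribute(100 * mb, explicit, defaults).entrySet()) {
            System.out.println(e.getKey() + " = " + (e.getValue() / mb) + "M");
        }
    }
}
```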

h2. Backward Compatibility

The old-style parameters that were used to configure cache memory in terms of number of pages are still accepted, although deprecated. When both old and new parameters are used together, the value of the new parameter overrides the value of the old one.

h1. Configuration Parameters

Almost all OWLIM parameters can be set both in the TTL configuration file and from the command line using the Java {{\-D<param.name>=<value>}} command line option to set system properties. When a parameter is set simultaneously using both methods, the system property overrides the value in the configuration file. Some OWLIM parameters can only be set using system properties.
The full list of OWLIM parameters is given below:

|| Parameter || TTL || Java \-D || Description ||
| *base-URL* | X | X | _default_ *<none>*, specifies the default namespace for the main persistence file. Non-empty namespaces are recommended, because their use guarantees the uniqueness of the anonymous nodes that may appear within the repository. |
| *build-pcsot* | X | X | _default_ *false*, by default OWLIM will not build _PSOCT_ and _POSCT_ indices (pred-subj-obj-context-tripleset and pred-obj-subj-context-tripleset); if set to *true*, this parameter specifies that the _PCSOT_ index will be built, which will speed up extracting all statements from a context. |
| *build-ptsoc* | X | X | _default_ *false*, similar to *build-pcsot*, but speeds up retrieving all statements from a tripleset. |
| *cache-memory* | X | X | _default_ <_none_>, specifies the total amount of memory to be given to all types of cache. |
| *cache-size* (DEPRECATED) | X | X | _default_ *4000*, works with file repositories only and defines the number of 20k pages in the cache of each of its sorted collections; because there are at least two collections (_PSO_ and _POS_, but _PCSOT_ and _PTSOC_ may be defined as well), the total number of pages is two, three or four times as big. |
| *check-for-inconsistencies* | X | X | _default_ *false*, turns on or off the mechanism for consistency checking; consistency checks are defined in the rule file and are applied at the end of every transaction if this parameter is *true*. |
| *database-recovery-policy* | X | X | _default_ *recover*, specifies the recovery strategy when the repository is initialised and it is detected that the previous instance did not shut down cleanly. The possible values are: \\
* stop -- log a message and stop. In this situation a repository must be restored from a backup before it can be re-run; \\
* run -- log a message, but continue initialisation. The repository can be used, but some data may have been lost; \\
* recover -- log a message and automatically run the database restorer tool to recover data, then continue initialisation. |
| *debug.level* | | X | _default_ *0*, defines the level of detail of OWLIM output used in *QueryModelConverter*, *SailConnectionImpl* and *HashEntityPool* as follows: \\
* *QueryModelConverter*: \\
** *debug.level* > 2 : Outputs the query optimization time. \\
** *debug.level* > 3 : Outputs the query plan. \\
* SailConnectionImpl: \\
** *debug.level* > 0 : Outputs "Owlim evaluation strategy" or "Sesame evaluation strategy" when evaluating a query. \\
** *debug.level* > 2 : ThreadPool outputs when a worker thread starts and stops. \\
* HashEntityPool: \\
** *debug.level* > 1 : If version number is less than the current one, outputs "Older Entity storage version found: X, recent one is: Y". |
| *defaultNS* | X | | _default_ *<empty>*, default namespaces corresponding to each imported schema file, separated by semicolons (;). The number of namespaces must be equal to the number of schema files from the *imports* parameter. \\
Example: \\
\\
*owlim:defaultNS "http://www.w3.org/2002/07/owl#;http://example.org/owlim#";* \\
Note: This parameter cannot be set via a command line argument |
| *disable-sameAs* | X | X | _default_ *false*, enables or disables the {{owl:sameAs}} optimisation |
| *enable-query-cache* (NOT USED) | X | X | _default_ *true*, enables or disables the query cache, from which the most recent similar query (to the one to be executed) is retrieved to speed up query compilation. |
| *enable-optimization* | X | X | _default_ *true*, enables or disables query optimization. |
| *enablePredicateList* | | X | _default_ *false*, enables or disables mappings from an entity (subject or object) to its predicates; switching this on can drastically speed up queries that use wildcard predicate patterns. |
| *enableSmoothDelete* | X | X | _default_ *true*, enables or disables the smooth delete behaviour, see section&nbsp;7.1.4. |
| *entity-index-size* | X | X | _default_ *1000000*, defines the number of entity hash table index entries; the bigger the size, the fewer the collisions in the hash table and the faster the entity retrieval; the entity hash table does not rehash so its index size is constant throughout the life of the repository. |
| *entity-id-size* | X | X | _default_ *32*, possible values are *32* and *40*; defines the bit size of internal IDs used to index entities (URIs, blank nodes and literals). For most cases, this parameter can be left at its default value. However, if very large datasets are used that contain more than 2^32^ entities, then this parameter should be set to *40*. Be aware that this can only be set when instantiating a new repository; converting an existing repository between 32 and 40-bit entity widths is not possible. |
| *fts-memory* | X | X | _default_ *20m*, specifies the amount of memory to be used for the _FTS_ index cache. |
| *ftsIndexPolicy* | X | X | _default_ *never*, possible values are *onCommit*, *onStartup*, *onShutdown*, *never*; turns on and off the default mechanism for full text search used via the built-in predicates described in section&nbsp;10.1; if _FTS_ is turned on then, depending on the value, it determines when the indexing will take place: *onCommit* \- at the end of each transaction, *onStartup* -- at initialisation, *onShutdown* \- on repository shutdown. |
| *ftsLiteralsOnly* | X | X | _default_ *false*, if the Node search (full-text search) mechanism is enabled, this parameter specifies whether only literals will be indexed (value of *true*, enough in 90% of the cases) or everything (value of *false*). |
| *hash-table-size* (NOT USED) | X | X | _default_ *10000000*, formerly used to limit the size of the hash table for entities, but now the hash table can grow without limitations. |
| *imports* | X | | _default_ *none*, a list of schema files that will be imported at start up. All the statements found in these files will be loaded into the repository and will be treated as read-only. The serialization format is assumed to be RDF/XML, unless the file has a *.NT* extension. Example: \\
\\
*owlim:imports* \\
*"./ont/owl.rdfs;./ont/ex.rdfs"* \\
\\
Schema files can be either a local path name, e.g. *./ontology/myfile.rdf* \\
or a URL, e.g. *[http://www.w3.org/2002/07/owl.rdf]* \\
If this parameter is used then the default namespace for each imported schema file _must_ be provided using the *defaultNS* parameter. \\
Note: This parameter cannot be set via a command line argument. |
| *in-memory-literal-properties* | X | X | _default_ *false*, turns on and off caching of the literal languages and data-types; if caching is on and the entity pool is restored from persistence, but no such cache is available on disk, the cache is created after the entity pool initialization. |
| *inferencer-threads* | X | X | _default_ *1*, if 1 then the default inferencers are used (generated by the RuleCompiler) otherwise multithreaded inferencers are used (generated by the RuleCompilerMT); the number corresponds to the number of threads used for inference. |
| *journaling* | X | X | _default_ *true*, turns on and off transaction persistence on disk; from the persisted transaction the repository can be restored after an unexpected shutdown. |
| *owlim-license* | X | X | _default_ *<none>*, specifies the license file for both OWLIM-SE and OWLIM-Enterprise worker nodes. |
| *partialRDFS* (DEPRECATED) | X | X | _default_ *false*, if *true* then trivial RDFS inferences are switched off, such as 'every URI is type ' or 'every class is a sub-class of rdfs:Resource', etc. Deprecated in favour of explicit choice of rule-set, see *ruleset* below. However, if this flag is set to *true* then the chosen *ruleset* will effectively have *\-optimized* appended to it if not present, e.g. *ruleset=owl-horst* and *partialRDFS=true* is the same as selecting *ruleset=owl-horst-optimized* |
| *predicate-list-cache-size* (DEPRECATED) | X | X | _default_ *1000*, defines the number of 20k pages used for caching the predicate lists (these are lists of predicates per entity used as subject or object, i.e. subjects and objects are mapped to a list of predicates; this can speed up queries that use wildcard predicate patterns). |
| *predicate-memory* | X | X | _default_ *80m*, specifies the amount of memory to be used for predicate lists cache. |
| *query-timeout* | X | X | _default_ *\-1* (infinity), sets the number of seconds after which the evaluation of a query will be terminated; negative values stand for infinity. |
| *rebuild.rules* | | X | _default_ *true*, where *true* instructs the rule compiler to write the generated inference classes to the *src* directory of the Java project (the *RuleCompiler* must be started with this parameter set to *true*); this parameter is *false* when used by the *RuntimeInferencerCompiler*; used in *RuleCompiler* and *RuleCompilerMT*. |
| *repository-fragments* | X | X | _default_ *1*, works with a file repository only. If it is set to 1 then the default AVLRepository will be used, otherwise a fragmented repository will be used; this can speed up statement storage and retrieval. This parameter controls the number of pieces that the repository indexes are sliced into, where each slice is stored in a separate folder. |
| *repository-type* | X | X | _default_ *file-repository*, available values are: \\
*file-repository* \\
*weighted-file-repository* |
| *ruleset* | X | X | _default_ *owl-horst*, built-in rule sets are *empty*, *rdfs*, *owl-horst*, *owl-max* and *owl2-rl* and their optimized counterparts *rdfs-optimized*, *owl-horst-optimized*, *owl-max-optimized* and *owl2-rl-optimized*. A custom rule set is chosen by setting the path to its rule file (.pie). |
| *storage-folder* | X | X | _default_ *none*, specifies the folder where the index files will be stored. |
| *tokenIndexCacheSize* (DEPRECATED) | X | X | _default_ *1000*, defines the number of pages to be used by the default full text search; this is now deprecated and replaced by *fts-memory* which specifies the memory available for _FTS_, so the number of pages is automatically calculated. |
| *tokenization-regex* | X | X | _default_ *\[\p\{L\}\d_\]\+*, defines the rule for splitting strings into tokens; the tokens themselves, not the strings, are stored in the _FTS_ index, so it is important to define a suitable tokenization for the given application (e.g. some applications parse numbers intensively, while others handle personal names, which start with a capital letter and may contain hyphens and apostrophes) |
| *tuple-index-memory* | X | X | _default_ *80m*, specifies the amount of memory to be used for statement storage cache. |
| *useShutdownHooks* | X | X | _default_ *true*, if _true_ then the method *OwlimSchemaRepository.shutdown()* is called when the Java Virtual Machine (JVM) exits (running OWLIM under Tomcat requires this parameter to be *true*, otherwise it cannot be guaranteed that the *shutdown()* method will be called at all). |
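
To illustrate the effect of the *tokenization-regex* parameter listed in the table above, the following sketch applies the default expression to a string that contains personal names. It assumes that each match of the expression becomes one token; this reading, the {{TokenizationDemo}} class and the alternative pattern are assumptions made for the example rather than a description of OWLIM-SE's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demonstration of how a tokenization regex splits a string. */
final class TokenizationDemo {

    /** The documented default: runs of letters, digits and underscores. */
    static final Pattern DEFAULT_TOKEN = Pattern.compile("[\\p{L}\\d_]+");

    /** An illustrative alternative that keeps hyphens and apostrophes inside tokens. */
    static final Pattern NAME_TOKEN = Pattern.compile("[\\p{L}\\d_'-]+");

    /** Collects every match of the pattern as one token (an assumed interpretation). */
    static List<String> tokenize(String text, Pattern tokenPattern) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = tokenPattern.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Jean-Luc O'Neil, born 1975";
        System.out.println(tokenize(text, DEFAULT_TOKEN)); // [Jean, Luc, O, Neil, born, 1975]
        System.out.println(tokenize(text, NAME_TOKEN));    // [Jean-Luc, O'Neil, born, 1975]
    }
}
```

With the default expression, hyphenated and apostrophised names are broken into several tokens, which is why a custom expression may suit applications that search for personal names.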

h1. Performance analytics

OWLIM-SE can provide information about its internal state and behaviour, including trend information. Such information can be useful for 'debugging' certain situations, such as understanding why load performance changes over time or with particular data sets.

Statistics are kept for the main index data structures and include information such as cache hits/misses and file reads/writes. This information can be used to fine tune OWLIM-SE's memory configuration.

The analytical information is published via a Java Management Extensions ([JMX|http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html]) management bean (MBean). A JMX endpoint is configured by using special system properties when starting the Java virtual machine (JVM) in which OWLIM-SE is running. For example, the following command line parameters will set the JMX server endpoint to listen on port 8089, not require authentication and not use a secure socket layer:

{noformat}
-Dcom.sun.management.jmxremote.port=8089
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
{noformat}

If using OWLIM-SE with Tomcat, then these parameters must be passed to Tomcat's JVM by setting either the {{JAVA_OPTS}} or {{CATALINA_OPTS}} environment variable, e.g.

{noformat}
set JAVA_OPTS="-Dcom.sun.management.jmxremote.port=8089
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false"
{noformat}

Once OWLIM-SE is loaded, use any compliant JMX client, e.g. {{jconsole}} that is part of the Java development kit, to access the JMX interface on the configured port. The entry point for accessing analytical information will be advertised via the com.ontotext.AVLRepositoryManager MBean, which has a single attribute called 'CollectionsStatisticsFlag'. By default, this flag is set to {{false}}. After setting this value to {{true}}, statistics will be maintained for all the major indices. For each index, there will be a SortedCollectionStatistics MBean published that shows the cache and file I/O values updated in real-time.

The following information is displayed for each index:

|| CacheHits | The number of operations that completed without accessing the storage system. ||
|| CacheMisses | The number of operations that completed that needed to access the storage system. ||
|| PageDiscards | The number of times a non-dirty page's memory was reused to read in another page. ||
|| PageSwaps | The number of times a page was written to disk, so its memory could be used to load another page. ||
|| Reads | The total number of times an index was searched for a statement or range of statements ||
|| Writes | The total number of times a statement was added to a collection ||

Ideally, the system should be configured to keep the number of cache misses to a minimum. If the ratio of hits to misses is low then consider increasing the memory available to the index (if other factors permit this).
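
One simple way to quantify this is to compute the fraction of operations served from the cache, using the CacheHits and CacheMisses values read from the MBean. The {{CacheStatistics}} class below is a hypothetical helper for such a calculation; no particular threshold is implied, since none is given here.

```java
/** Hypothetical helper for interpreting the CacheHits and CacheMisses counters. */
final class CacheStatistics {

    /** Fraction of operations served from the cache, or 0 if nothing was recorded. */
    static double hitRate(long cacheHits, long cacheMisses) {
        long total = cacheHits + cacheMisses;
        return total == 0 ? 0.0 : (double) cacheHits / total;
    }

    public static void main(String[] args) {
        // Example counters, as they might be copied from a JMX client.
        System.out.println(hitRate(9000, 1000)); // 0.9
    }
}
```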

Page swaps will tend to occur much more often during large scale data loading. Page discards will occur more frequently during query evaluation.