This section gives an overview of configuring a GraphDB-SE repository. It also describes the contents of the GraphDB-SE distribution and the 'getting-started' application included with it, which serves as an example of integrating GraphDB-SE into other systems. For a detailed step-by-step guide to installing and setting up GraphDB-SE, see the installation section.
Contents of the Distribution Package
The GraphDB-SE distribution zip file includes the following folders:
The distribution contains the following files in the root directory of the zip file:
The GraphDB distribution comes with a sample application that can be used as a template for building applications that interact with a GraphDB repository. The source code of this application performs a sequence of typical operations: initialisation of the repository, uploading statements, executing queries and obtaining results, deleting statements, etc. This application template comes with:
An easy way to set up an application to use GraphDB is to copy the getting-started folder and modify the contents as necessary.
To run the program, use the example.cmd / example.sh script. This script requires the JAVA_HOME environment variable to be set; alternatively, JAVA_HOME can be set directly by editing the setvars.cmd / setvars.sh script in the root folder of the GraphDB-SE distribution. If the program is modified to use a custom rule set, JAVA_HOME must point to the Java Runtime Environment (JRE) of a Java Development Kit (JDK) version 1.6 or later, so that the Java compiler can be located using the mechanism introduced in Java 1.6.
Queries are read from the file given by the queryfile parameter, which has the following format:
#some comment
^[queryid1]
<query line1>
<query line2>
...
<query lineN>
#some other comment
^[nextqueryid]
<query line1>
...
<EOF>
The queries are always evaluated, but the results are output only if the showresults parameter is set to true.
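The query file format can be illustrated with a minimal Python sketch that splits such a file into named queries. It assumes that a line of the form ^[id] starts a new query and that lines beginning with # are comments; parse_query_file is a hypothetical helper, not the parser used by the getting-started application.

```python
import re
from typing import Dict, List, Optional

# Hypothetical parser for the query file format; assumes "^[id]" starts a query
# and lines beginning with "#" are comments.
HEADER = re.compile(r"^\^\[(?P<id>[^\]]+)\]\s*$")


def parse_query_file(text: str) -> Dict[str, str]:
    """Split a query file into {query id: query text}."""
    queries: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A header line opens a new, initially empty query.
            current = match.group("id")
            queries[current] = []
        elif not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        elif current is not None:
            queries[current].append(raw)  # body line of the current query
    return {qid: "\n".join(lines) for qid, lines in queries.items()}
```

For example, a file containing two queries, ^[q1] and ^[q2], yields a dictionary with the keys "q1" and "q2" in file order.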
The application can also be run against a remote repository exposed by the Sesame HTTP server. In this case, the url parameter specifies the Sesame endpoint and the repository parameter specifies the repository to use on that server. When url and repository are given, they override the config parameter.
Bulk data loading
Because of its range of functions, the getting-started application makes a useful bulk-loading tool. It can load a single file or traverse a whole directory structure, loading every RDF file it finds. If the files are very large, it automatically inserts commit operations at suitable points, so there is no need to convert and split large files into smaller ones. For example:
./example.sh url=http://192.168.1.31:8080/openrdf-sesame repository=my_repo preload=/home/me/wordnet/ username=me password=secret queryfile=none
This command loads all RDF files located in /home/me/wordnet/ and its subdirectories into the repository called my_repo at the Sesame endpoint http://192.168.1.31:8080/openrdf-sesame, which is secured using HTTP authentication with the given credentials. If an error occurs, the application outputs a message and continues with the next RDF file.
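The traversal and error handling described above can be sketched in Python as follows. This is a minimal illustration, not the application's code: the set of file extensions treated as RDF is an assumption, load_all and find_rdf_files are hypothetical helpers, and the automatic commit batching for large files is not modelled.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Assumed list of RDF file extensions; the real application decides for itself.
RDF_EXTENSIONS = {".rdf", ".owl", ".ttl", ".nt", ".n3", ".trig", ".trix", ".nq"}


def find_rdf_files(root: str) -> List[Path]:
    """Return the file itself, or every RDF-looking file below a directory."""
    path = Path(root)
    if path.is_file():
        return [path]
    return sorted(
        p for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() in RDF_EXTENSIONS
    )


def load_all(root: str, load: Callable[[Path], None]) -> Tuple[int, List[Path]]:
    """Load each file with `load`; on error, report it and continue with the next file."""
    loaded, failed = 0, []
    for rdf_file in find_rdf_files(root):
        try:
            load(rdf_file)
            loaded += 1
        except Exception as exc:
            print(f"Failed to load {rdf_file}: {exc}")
            failed.append(rdf_file)
    return loaded, failed
```

Passing a loader that raises on a malformed file leaves the remaining files unaffected, which mirrors the behaviour described for the preload option.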
Making a back-up
The export features of the getting-started application can be used to make a reasonable back-up of a GraphDB database, for example:
./example.sh queryfile=none url=http://192.168.1.31:8080/openrdf-sesame repository=my_repo preload= exportfile=backup.trig exportformat=trig exporttype=explicit
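In this example the TriG file format is used because it is a quad format and therefore preserves named-graph names. Setting exporttype=explicit restricts the export to explicitly asserted statements; the implicit statements are inferred again when the back-up is loaded into a repository that uses the same rule set.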
WordNet is the most popular lexical knowledge base; it was developed at Princeton University. It encodes the meanings of about 150,000 English words. Word meanings are defined by word senses, which relate a word to a lexical concept. Lexical concepts are called synsets (synonym sets); WordNet 2.0 contains about 115,000 of them. Numerous lexical semantic relations are formally modelled, e.g.
A standard RDF/OWL representation of WordNet is available at http://www.w3.org/TR/wordnet-rdf/. Its Full variant contains about 1.9 million explicit statements, expressed in a fragment of OWL Lite that entails a further 6.3 million implicit statements.
Sesame 2.0 stores repository configurations as RDF in the SYSTEM repository, so a new repository can be configured simply by inserting an appropriate graph into the SYSTEM repository. The getting-started application uses the Turtle format for this, both for convenience and because the Sesame console application also uses Turtle for the template files it uses when creating repositories.
It is often desirable for a repository to start with a predefined set of RDF statements, usually one or more schema graphs. This is possible with the owlim:imports property: at start-up, the listed files are parsed and their contents are permanently added to the repository.
The complete set of configuration parameters, with their descriptions and their default and allowed values, is listed below. What follows is a short description of the GraphDB-SE-specific properties used to set up the repository. For more information about the Sesame 2 configuration schema, refer to the Sesame documentation.
In short, the configuration is an RDF graph. Its root node is of rdf:type rep:Repository and must be connected through the rep:repositoryID property to a literal that contains the name of the repository. The root node must also be connected via the rep:repositoryImpl property to a node that describes the configuration. The repository type is defined on that node via the rep:repositoryType property, and its value must be openrdf:SailRepository so that custom Sail implementations (such as GraphDB-SE) can be used in Sesame 2.0. The same node must be connected with the sr:sailImpl property to a node that specifies the Sail implementation to be instantiated. To instantiate GraphDB-SE, this last node must have the property sail:sailType with the value owlim:Sail; the Sesame framework then locates the correct SailFactory on the application classpath and uses it to instantiate the Java implementation class.
The namespace prefixes used above are:
rep: <http://www.openrdf.org/config/repository#>
sr: <http://www.openrdf.org/config/repository/sail#>
sail: <http://www.openrdf.org/config/sail#>
owlim: <http://www.ontotext.com/trree/owlim#>
All properties that specify GraphDB-SE configuration parameters use the owlim: prefix, and their local names match the parameter names listed below; for example, the value of the ruleset parameter is specified using the http://www.ontotext.com/trree/owlim#ruleset property.
The following is an example configuration, in Turtle format, of a Sesame 2 repository that uses the GraphDB-SE Sail implementation:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix owlim: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
   rep:repositoryID "owlim" ;
   rdfs:label "GraphDB Getting Started" ;
   rep:repositoryImpl [
      rep:repositoryType "openrdf:SailRepository" ;
      sr:sailImpl [
         sail:sailType "owlim:Sail" ;
         owlim:ruleset "owl-horst-optimized" ;
         owlim:base-URL "http://example.org/owlim#" ;
         owlim:imports "./ontology/my_ontology.rdf" ;
         owlim:defaultNS "http://www.my-organisation.org/ontology#" ;
         owlim:entity-index-size "5000000" ;
         owlim:cache-memory "4G" ;
         owlim:storage-folder "storage" ;
         owlim:repository-type "file-repository" ;
      ]
   ].
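Because every owlim: local name is simply a configuration parameter name, a configuration with the same structure as the example can be generated from a parameter dictionary. The following Python function is a minimal sketch of that idea; repository_config and turtle_literal are hypothetical helpers, not part of GraphDB-SE or Sesame, and the output is only as valid as the parameter names passed in.

```python
from typing import Dict

PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n"
    "@prefix rep: <http://www.openrdf.org/config/repository#>.\n"
    "@prefix sr: <http://www.openrdf.org/config/repository/sail#>.\n"
    "@prefix sail: <http://www.openrdf.org/config/sail#>.\n"
    "@prefix owlim: <http://www.ontotext.com/trree/owlim#>.\n"
)


def turtle_literal(value: str) -> str:
    """Quote a value as a Turtle string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def repository_config(repo_id: str, label: str, params: Dict[str, str]) -> str:
    """Render a repository configuration (hypothetical helper).

    Each key in `params` becomes an owlim:<key> property on the Sail node.
    """
    owlim_lines = "".join(
        f"      owlim:{name} {turtle_literal(value)} ;\n"
        for name, value in params.items()
    )
    return (
        PREFIXES + "\n"
        "[] a rep:Repository ;\n"
        f"  rep:repositoryID {turtle_literal(repo_id)} ;\n"
        f"  rdfs:label {turtle_literal(label)} ;\n"
        "  rep:repositoryImpl [\n"
        '    rep:repositoryType "openrdf:SailRepository" ;\n'
        "    sr:sailImpl [\n"
        '      sail:sailType "owlim:Sail" ;\n'
        + owlim_lines
        + "    ]\n"
        "  ].\n"
    )
```

For instance, repository_config("owlim", "GraphDB Getting Started", {"ruleset": "owl-horst-optimized", "cache-memory": "4G"}) produces the root node, the SailRepository node and the owlim:Sail node described above, with one owlim: property per parameter.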
Apart from the I/O buffers used for caching, GraphDB-SE keeps the index of the nodes in the RDF graph in memory. This design decision improves the overall performance of the repository. Each I/O buffer (page) is exactly 64 KB and the index information per graph node is 12 bytes, so the memory requirements of a repository depend on the dataset. To help calculate the amount of Java heap memory required for a GraphDB-SE repository, the distribution includes an Excel spreadsheet, owlim-se-configurator.xls.
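The two figures above allow a rough back-of-the-envelope estimate. The Python sketch below assumes that 64 KB means 65,536 bytes and that the memory needed is just the node index plus the page cache; it ignores JVM overhead and every other data structure, so the spreadsheet remains the tool to use for real sizing. estimate_memory is a hypothetical helper.

```python
PAGE_SIZE = 64 * 1024   # one I/O buffer (page); assumes 64 KB = 65,536 bytes
BYTES_PER_NODE = 12     # in-memory index information per RDF graph node


def estimate_memory(num_nodes: int, cache_bytes: int) -> dict:
    """Rough lower bound: node index plus page cache (hypothetical helper).

    Ignores JVM overhead and all other memory consumers.
    """
    node_index = num_nodes * BYTES_PER_NODE
    return {
        "node_index_bytes": node_index,
        "cache_pages": cache_bytes // PAGE_SIZE,
        "total_bytes": node_index + cache_bytes,
    }
```

For a hypothetical dataset of 10 million nodes and a 4 GB cache, the node index takes 120,000,000 bytes (about 114 MB), the cache holds 65,536 pages, and the total is 4,414,967,296 bytes.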
Cache Memory Configuration
There are several components in GraphDB that make use of caching (e.g. full-text search (FTS) indices, the predicate list and tuple indices). In different situations, certain caches need more memory than others. GraphDB allows the configuration of both the total cache memory used by a repository and each of the separate per-module caches.
The following parameters control the amount of memory assigned to each of the different caches:
Each of these parameters can be specified in bytes, kilobytes, megabytes or gigabytes by appending a unit specifier to the integer value. Without a unit specifier, the value is interpreted as bytes; otherwise use k or K for kilobytes, m or M for megabytes and g or G for gigabytes (all base 2, so 1k = 1024 bytes).
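The unit rules above can be expressed as a small Python parser. This is a sketch of the documented convention, not GraphDB's own parsing code; parse_memory is a hypothetical helper, and it rejects anything other than an integer optionally followed by a single unit letter.

```python
import re

# Base-2 multipliers for the documented unit specifiers (case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE = re.compile(r"(\d+)([kKmMgG]?)")


def parse_memory(value: str) -> int:
    """Convert a value such as "4G", "512m" or "65536" into bytes."""
    match = _SIZE.fullmatch(value.strip())
    if not match:
        raise ValueError(f"invalid memory size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]
```

With this convention, the owlim:cache-memory value "4G" from the example configuration corresponds to 4,294,967,296 bytes.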
The general rule of thumb is:
cache-memory = tuple-index-memory + predicate-memory + fts-memory
However, if a module that uses the cache (e.g. full-text search) is turned off, its term is excluded from the above equation.
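The rule of thumb, including the exclusion of disabled modules, amounts to a filtered sum. The Python sketch below works on byte values. The 3 GB / 512 MB / 512 MB split in the usage note is purely illustrative and not a recommendation, and total_cache_memory is a hypothetical helper.

```python
from typing import Dict, Iterable


def total_cache_memory(module_bytes: Dict[str, int], disabled: Iterable[str] = ()) -> int:
    """cache-memory = sum of the per-module caches, leaving out disabled modules."""
    skip = set(disabled)
    return sum(size for name, size in module_bytes.items() if name not in skip)
```

With tuple-index-memory = 3 GB, predicate-memory = 512 MB and fts-memory = 512 MB, the total is 4 GB (4,294,967,296 bytes); with full-text search turned off, fts-memory drops out and the total becomes 3.5 GB (3,758,096,384 bytes).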
Almost all GraphDB parameters can be set both in the Turtle (.ttl) configuration file, which populates the SYSTEM repository, and on the command line, using the Java -D<param.name>=<value> option to set system properties. When a parameter is set using both methods, the system property overrides the value in the configuration file. Some GraphDB parameters can only be set using system properties.
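The precedence rule can be illustrated with a short Python sketch that collects -D options from a command line and resolves a parameter against the configuration file values. It assumes that the system property has exactly the same name as the configuration parameter; parse_d_options and effective_parameter are hypothetical helpers, not part of GraphDB.

```python
from typing import Dict, Iterable, Optional


def parse_d_options(args: Iterable[str]) -> Dict[str, str]:
    """Collect -D<name>=<value> options; other JVM arguments are ignored."""
    props: Dict[str, str] = {}
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            name, _, value = arg[2:].partition("=")
            props[name] = value
    return props


def effective_parameter(name: str, config: Dict[str, str],
                        system_properties: Dict[str, str]) -> Optional[str]:
    """A system property, if present, overrides the configuration file value."""
    if name in system_properties:
        return system_properties[name]
    return config.get(name)
```

For example, with cache-memory set to "4G" in the configuration file and -Dcache-memory=6G on the command line, the effective value is "6G", while parameters not given as system properties keep their configuration file values.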
Once a repository is created, some parameters can be changed, either by modifying the SYSTEM repository or by overriding values with Java system properties; both methods require a restart of the JVM or Tomcat. Other parameters cannot be changed after a repository has been created: changing them either has no effect (once the relevant data structures are built, their structure cannot be changed) or causes inconsistencies (these parameters affect the reasoner). The following table explains the variations used in the table of configuration parameters:
The following table lists all GraphDB configuration parameters: