This section gives an overview of configuring a GraphDB-Lite repository. It also describes the contents of the GraphDB-Lite distribution and the 'getting-started' application included with it. This sample application serves as an example of how to integrate GraphDB-Lite into other systems. For a detailed step-by-step guide to installing and setting up GraphDB-Lite, see the installation section.
The GraphDB-Lite distribution zip file includes the following folders:
| Folder | Contents |
|---|---|
| doc | User guides and GraphDB primer |
| ext | All required third-party libraries. The Sesame 2 installation can be downloaded separately from http://www.openrdf.org/. The folder also contains a copy of the Lehigh University Benchmark library (lubm.jar), the JUnit library (junit.jar) necessary for executing inference tests, the simple logging framework for Java JAR files and the Lucene full-text search JAR |
| getting-started | An example application that uses GraphDB, with all the necessary auxiliary files and folders (see section 8.2) |
| lib | Contains the binary executable version of GraphDB-Lite as a JAR (Java library) file |
| lubm | Scripts and configuration files to run the LUBM [12] benchmarks documented in section 10 |
| templates | Contains a GraphDB-Lite repository template file (.ttl) used by the Sesame 2 framework for creating new repositories |
The distribution contains the following files in the root directory of the zip file:
| File | Description |
|---|---|
| *.pie | Rule files containing definitions of the built-in rule-sets. |
| GraphDB-Lite_license.pdf | The license under which GraphDB-Lite is published. |
| setvars.cmd, setvars.sh | Scripts (Windows and Linux) that define several environment variables used by the scripts that run the test cases and the getting-started application. They should be edited for each installation, as they determine the Java virtual machine to be started and the paths to all the relevant JAR files, including those of GraphDB-Lite and Sesame. |
Getting-Started Application
The GraphDB distribution comes with a sample application that can be used as a template for building applications that interact with a GraphDB repository. The source code of this application performs a sequence of typical operations: initialisation of the repository, uploading statements, executing queries and obtaining results, deleting statements, etc. This application template comes with:
Source code and compiled class files;
Sample ontology and data files;
Sesame repository template file;
Scripts, which invoke the application.
An easy way to set up an application to use GraphDB is to copy the getting-started folder and modify the contents as necessary.
What follows is a short description of how the getting-started application is organised and what the sample code does. The easiest way to understand it well is to read the source code of the GettingStarted class, located in the src folder; the code is extensively commented. The program accepts a number of parameters, as described below.
Repository/communication parameters:
| Parameter | Description | Default |
|---|---|---|
| config | Specifies the repository description file to use to create a repository. Configuration options specified in this file are explained in following sections. This parameter is ignored if the url parameter is used. | ./owlim.ttl |
| url | Used in conjunction with the repository parameter, this URL specifies the remote Sesame server and will have the form http://<hostname>:<port>/openrdf-sesame/. This parameter overrides the config parameter. | |
| repository | The repository ID, used in conjunction with the url parameter, identifies the repository on the remote Sesame server. | |
| username | Specifies the username for HTTP authentication (if enabled at the server). | |
| password | Specifies the password for HTTP authentication (if enabled at the server). | |
Export parameters:
| Parameter | Description | Default |
|---|---|---|
| exportfile | Dump the repository contents to the given filename. | |
| exporttype | Export all/explicit/implicit statements. | explicit |
| exportformat | The RDF format: N-Triples, N3, Turtle, RDF/XML, TriG, TriX. | |
Data loading parameters:
| Parameter | Description | Default |
|---|---|---|
| context | If not specified, statements loaded are given the context of the file URL from which the statements were loaded. If specified (context=URI), then all statements loaded are given this URI for the context. If an empty context is used (context=) then all statements loaded have no context (default graph). | |
| preload | Specifies the folder or file containing RDF data that is loaded automatically when the program starts. If the parameter value specifies a folder then it is searched recursively for all files that contain RDF data. | ./preload |
| verify | Verify the integrity of the RDF data during parsing. | true |
| stoponerror | Whether the parser should stop immediately if it finds an error in the data. | true |
| preservebnodes | Whether the parser should preserve bnode identifiers specified in the source. | true |
| datatypehandling | The data-type handling method, one of: ignore (allow any data-type values), verify (validate data-type representations) or normalize (convert all data-type values to their canonical form). | verify |
| chunksize | The number of statements to parse/load before inserting a commit instruction. | 500000 |
Query and miscellaneous parameters:
| Parameter | Description | Default |
|---|---|---|
| queryfile | Specifies the file containing queries that are to be executed. The files can contain queries in any format supported by Sesame and can include SPARQL updates. | ./queries/sample.sparql |
| showresults | Specifies whether the results from queries will be displayed or not. | true |
| showstats | Indicates whether to show initialisation statistics after loading the selected data files. | false |
| updates | Specifies whether the statement insertion and deletion step is performed. | false |
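The three cases described for the context parameter can be sketched as follows. This is only an illustration of the rule as stated in the table, not the code of the GettingStarted class; the class name ContextResolver and the use of a Map for the parsed name=value parameters are assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper illustrating the documented behaviour of "context".
final class ContextResolver {

    /**
     * Returns the context (named graph) for statements loaded from fileUrl.
     * An empty Optional stands for "no context", i.e. the default graph.
     */
    static Optional<String> resolve(Map<String, String> params, String fileUrl) {
        if (!params.containsKey("context")) {
            return Optional.of(fileUrl);   // parameter absent: use the file URL
        }
        String value = params.get("context");
        if (value.isEmpty()) {
            return Optional.empty();       // "context=": default graph
        }
        return Optional.of(value);         // "context=URI": use the given URI
    }
}
```

With an empty parameter map, resolve(params, "file:/data/a.rdf") yields the file URL; after adding context with an empty value, it yields an empty Optional.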
To run the program, use the example.cmd / example.sh script. This script requires the JAVA_HOME environment variable to be set, either in the environment or directly by editing the setvars.cmd / setvars.sh script in the root folder of the GraphDB software distribution. If the program is modified to use a custom rule set, then JAVA_HOME must point to the Java Runtime Environment (JRE) of a Java Development Kit (JDK) version 1.6 or later, so that the new mechanism for locating the Java compiler can be used.
With the example set up, GraphDB loads the example ontology at start up as specified by the imports parameter in the repository configuration file, i.e. owlim.ttl. This ontology is ./ontology/example.rdfs. The sample program then loads any other ontologies that it finds in the preload folder. When start up is complete, the program outputs some statistics and lists the namespaces found.
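A recursive search of the preload location, as described above, might look like the sketch below. It uses only the standard java.nio.file API. The class name PreloadScanner and the list of file extensions are assumptions made for the example; the real application may recognise RDF files differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects candidate RDF files under a file or folder.
final class PreloadScanner {

    // Assumed extensions; not taken from the GraphDB-Lite sources.
    private static final List<String> RDF_EXTENSIONS = Arrays.asList(
            ".rdf", ".rdfs", ".owl", ".nt", ".n3", ".ttl", ".trig", ".trix");

    /** Walks the given file or folder recursively and returns matching files in sorted order. */
    static List<Path> findRdfFiles(Path preload) {
        try (Stream<Path> paths = Files.walk(preload)) {
            return paths.filter(Files::isRegularFile)
                        .filter(PreloadScanner::hasRdfExtension)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean hasRdfExtension(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return RDF_EXTENSIONS.stream().anyMatch(name::endsWith);
    }
}
```

Because Files.walk also accepts a regular file as its starting point, the same method covers both forms of the preload parameter (a single file or a folder).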
The next step is to load the specified query file and to execute the queries that it contains. Some example query files are included in the queries folder. A file can contain several queries, where each query starts with an identifier enclosed between ^[ and ] on a single line; everything between two consecutive query identifiers is treated as a SeRQL or SPARQL query and is evaluated against the contents of the prepared repository. The # sign can also be used for single-line comments, so each line starting with # is ignored. In outline, a query file is a sequence of identifier lines, each followed by the text of one query.
The queries are always evaluated, but the results are output only if the showresults parameter is set to true.
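The query-file layout described above can be read with a few lines of Java. The sketch below is one possible reading of that description (identifier lines start with ^[ and end with ], lines starting with # are skipped); it is not the parser used by the sample application, and the class name QueryFileParser is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the query-file layout described in the text.
final class QueryFileParser {

    /** Splits query-file text into an ordered map of identifier to query text. */
    static Map<String, String> parse(String text) {
        Map<String, String> queries = new LinkedHashMap<>();
        String currentId = null;
        StringBuilder body = new StringBuilder();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("#")) {
                continue;                                  // single-line comment
            }
            if (trimmed.startsWith("^[") && trimmed.endsWith("]")) {
                if (currentId != null) {
                    queries.put(currentId, body.toString().trim());
                }
                // Strip the leading "^[" and the trailing "]".
                currentId = trimmed.substring(2, trimmed.length() - 1);
                body.setLength(0);
            } else if (currentId != null) {
                body.append(line).append('\n');            // part of the current query
            }
        }
        if (currentId != null) {
            queries.put(currentId, body.toString().trim());
        }
        return queries;
    }
}
```

Text that appears before the first identifier line is ignored in this sketch, since the description does not say how such text is treated.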
Furthermore, if the updates parameter is set to true, the sample application updates the contents of the repository by inserting a statement using RepositoryConnection.add() and committing the transaction. The program then fetches some statements from the repository with a direct call to RepositoryConnection.getStatements(). The set of retrieved statements should contain the newly added statement, since it matches the given pattern. The statement is then removed in a separate transaction.
The application can also be run against a remote repository exposed using the Sesame HTTP server. In this case, the url parameter is used to specify the Sesame endpoint and the repository parameter is used to specify the repository to use on this server. The use of url and repository overrides the config parameter.
Bulk data loading
Due to its range of functions, the getting-started application makes a useful bulk-loading tool. It can load a single file or traverse a whole directory structure, loading any RDF file it finds. If the files are very large, it automatically inserts commit operations at suitable moments, so it is not necessary to convert and split large files into smaller ones.
For example, running it with preload=/home/me/wordnet/, url=http://192.168.1.31:8080/openrdf-sesame and repository=my_repo, together with the username and password parameters, loads all RDF files located in /home/me/wordnet/ and its sub-directories into the repository called my_repo at the Sesame endpoint http://192.168.1.31:8080/openrdf-sesame, which is secured using HTTP authentication. If an error occurs, the application outputs a message and continues with the next RDF file.
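The automatic commit behaviour during bulk loading, controlled by the chunksize parameter, can be sketched as below. The Sink interface is a hypothetical stand-in for a repository connection, so that the example stays self-contained; it is not part of the Sesame API.

```java
// Hypothetical sketch of committing after every chunkSize statements.
final class ChunkedLoader {

    /** Stand-in for a repository connection (an assumption for this example). */
    interface Sink {
        void add(String statement);
        void commit();
    }

    /** Adds all statements, committing after each full chunk; returns the number of commits. */
    static int load(Iterable<String> statements, int chunkSize, Sink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int pending = 0;
        int commits = 0;
        for (String statement : statements) {
            sink.add(statement);
            pending++;
            if (pending == chunkSize) {
                sink.commit();             // chunk is full: commit and start a new one
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.commit();                 // commit the final, partially filled chunk
            commits++;
        }
        return commits;
    }
}
```

Committing in chunks bounds the amount of uncommitted work held at any one time, which is why very large input files do not need to be split beforehand.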
Making a back-up
The export features of the getting-started application (the exportfile, exporttype and exportformat parameters) allow a reasonable back-up of a GraphDB database to be made.
The TriG file format is well suited to this purpose, because it is a quad format and therefore preserves named-graph names.
Wordnet example
Wordnet is a widely used lexical knowledge base, developed at Princeton University. It encodes the meanings of about 150,000 English words. The meanings of the words are defined by word-senses, which relate a word to a lexical concept. Lexical concepts are called synsets, i.e. synonym sets; about 115,000 of those appear in Wordnet v.2.0. Numerous lexical semantic relations are formally modelled, for example:
Hyponymy (subsumption from a more-general term)
Antonymy (negation, a term with the opposite meaning)
Causation and entailment (for verbs)
A standard RDF/OWL representation of Wordnet is available at http://www.w3.org/TR/wordnet-rdf/. It contains about 1.9 million explicit statements (the Full variant), expressed in a fragment of OWL-Lite that further entails 6.3 million implicit statements.
To configure the getting started example program to use the Wordnet data sets and run the included Wordnet queries, one should download the archive of the full version from http://www.w3.org/2006/03/wn/wn20/download/wn20full.zip, extract it into a folder, e.g. ./preload/wordnet and provide a path to this folder using the preload command line parameter when starting the program. Some sample Wordnet queries are provided in the wordnet.serql and wordnet.sparql query files. These can be specified on the command line using the queryfile command line parameter.
Configuration
Sesame 2.0 keeps repository configurations in RDF, in a SYSTEM repository. A new repository can be configured simply by inserting an appropriate graph into the SYSTEM repository. The getting-started application uses the Turtle format for convenience, and also because the Sesame console application uses the Turtle format for template files when creating repositories.
The diagram below gives a graphical illustration of an RDF graph that describes a repository configuration:
Often it is desirable to ensure that a repository starts with a predefined set of RDF statements, usually one or more schema graphs. This is possible by using the owlim:imports property. After start up, these files are parsed and their contents are permanently added to the repository. The complete set of configuration parameters, with their descriptions, defaults and allowed values, is listed below.
What follows is a short description of those properties specific to GraphDB-Lite that are used to set up the repository. For more information about the Sesame 2 configuration schema, refer to the Sesame documentation [5]. In short, the configuration is an RDF graph whose root node is of rdf:type rep:Repository. The root node must be connected through the rep:repositoryID property to a literal that contains the human-readable name of the repository, and via the rep:repositoryImpl property to a node that describes the configuration. The type of the repository is defined via the rep:repositoryType property, and its value must be openrdf:SailRepository to allow custom sail implementations (such as GraphDB-Lite) to be used in Sesame 2.0. A node that specifies the Sail implementation to be instantiated must then be connected with the sr:sailImpl property. To instantiate GraphDB-Lite, this last node must have a property sail:sailType with the value swiftowlim:Sail. The Sesame framework will then locate the correct SailFactory within the application classpath and use it to instantiate the Java implementation class.
Namespaces corresponding to the prefixes used in the above paragraph are:
All properties used to specify GraphDB-Lite's configuration parameters use the owlim: prefix and the local names match up with the parameters listed below, e.g. the value of the ruleset parameter can be specified using the http://www.ontotext.com/trree/owlim#ruleset property.
Many of the GraphDB specific configuration parameters can be set via the Java Virtual Machine (JVM) system properties passed as command line parameters when starting the JVM. Values for configuration parameters that are given on the command line take precedence over those present in the repository configuration. For instance, the ruleset parameter can be set from the command line by using:
-Druleset=owl-max
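The precedence rule described above (a JVM system property wins over the value in the repository configuration) can be illustrated with the standard System.getProperty call. The class name ParameterLookup and the Map holding values read from the configuration file are assumptions made for the example; this is not GraphDB-Lite's own lookup code.

```java
import java.util.Map;

// Hypothetical illustration of "command line overrides configuration file".
final class ParameterLookup {

    /**
     * Resolves a parameter: a system property (-Dname=value) takes precedence,
     * then the value from the repository configuration, then the default.
     */
    static String resolve(String name, Map<String, String> configValues, String defaultValue) {
        String fromCommandLine = System.getProperty(name);
        if (fromCommandLine != null) {
            return fromCommandLine;
        }
        return configValues.getOrDefault(name, defaultValue);
    }
}
```

For instance, with ruleset set to rdfs in the configuration map, starting the JVM with -Druleset=owl-max makes resolve("ruleset", config, "owl-horst") return owl-max.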
Sample Configuration
There follows an example configuration (in Turtle RDF format) of a Sesame 2 repository that uses a GraphDB-Lite sail implementation:
Almost all GraphDB parameters can be set both in the TTL configuration file and from the command line, using the Java -D<param.name>=<value> command line option to set system properties. When a parameter is set simultaneously using both methods, the system property overrides the value in the configuration file. Some GraphDB parameters can only be set using system properties.
The full list of GraphDB parameters is given below:
Parameter
TTL
Java -D
Description
base-URL
X
X
Default: <none>. Specifies the default namespace for the main persistence file. Non-empty namespaces are recommended, because their use guarantees the uniqueness of the anonymous nodes that may appear within the repository.
contexts-index-size
X
Default: 1000. Specifies the size of the index for storing contexts.
debug.level
X
Default: 0. Defines the level of detail for debug output used for various implementation classes. Valid values are from 0 to 4 inclusive, where 0 is no debug output and 4 is the most verbose.
defaultNS
X
Default: <empty>. Default namespaces corresponding to each imported schema file, separated by semicolons (;). The number of namespaces must be equal to the number of schema files given in the imports parameter.
Example:
owlim:defaultNS "http://www.w3.org/2002/07/owl#;http://example.org/owlim#";
Note: This parameter cannot be set via a command line argument.
entity-index-size
X
X
Default: 1000000. Defines the number of entity hash table index entries. The bigger the size, the fewer the collisions in the hash table and the faster the entity retrieval. The entity hash table does not rehash, so its index size is constant throughout the life of the repository. The minimum value is 10000.
imports
X
Default: none. A list of schema files that will be imported at start up. All the statements found in these files will be loaded into the repository and will be treated as read-only. The serialization format is assumed to be RDF/XML, unless the file has a .NT extension. Example:
owlim:imports "./ont/owl.rdfs;./ont/ex.rdfs"
Schema files can be either a local path name, e.g. ./ontology/myfile.rdf, or a URL, e.g. http://www.w3.org/2002/07/owl.rdf
If this parameter is used then the default namespace for each imported schema file must be provided using the defaultNS parameter.
Note: This parameter cannot be set via a command line argument.
jobsize
X
X
Default: 1000. The number of statements processed by each worker thread during a commit operation.
new-triples-file
X
Default: <none>. The file used to store new triples on each commit. The filename used is a concatenation of storage-folder and this parameter.
noPersist
X
Default: false. If true, then changes to the repository are not persisted.
num.threads.run
X
Default: <number_of_processors>. Sets the number of worker threads used for parallel inferencing.
optimize.rules
X
Default: false. Used by the RuleCompiler to select the rule optimisation method. If true, a method is used that orders the premises so that those with the smallest number of free variables come first. If false, the more complex rule optimisation technique is used.
partialRDFS (DEPRECATED)
X
X
Default: false. If true, then trivial RDFS inferences are switched off, such as 'every URI is type ' or 'every class is a sub-class of rdfs:Resource', etc. Deprecated in favour of an explicit choice of rule-set; see ruleset below. However, if this flag is set to true, then the chosen rule-set will effectively have -optimized appended to it, if not present, e.g. ruleset=owl-horst and partialRDFS=true is the same as selecting ruleset=owl-horst-optimized.
predicate-index-size
X
Default: 1000. The size of the index used to store predicates.
repository-type
X
X
Default: in-memory-repository. Available values are: in-memory-repository
ruleset
X
X
Default: owl-horst. Built-in rule sets are empty, rdfs, owl-horst, owl-max, owl2-rl-conf and owl2-rl-reduced, and their optimised counterparts rdfs-optimized, owl-horst-optimized, owl-max-optimized and owl2-rl-reduced-optimized (note that there is no optimised version of owl2-rl-conf). A custom rule set is chosen by setting the path to its rule file (.pie).
storage-folder
X
X
Default: owlim-storage. Specifies the folder where the repository contents will be persisted on shutdown (and reloaded on start up).
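The relationship between the imports and defaultNS parameters listed above (both are semicolon-separated, and the number of namespaces must equal the number of schema files) can be checked with a small sketch like the following. The class name SchemaImports is hypothetical, and the code only illustrates the stated rule; it is not how GraphDB-Lite itself validates its configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical check of the imports/defaultNS pairing rule.
final class SchemaImports {

    /** Pairs each schema file with its default namespace, in the order given. */
    static Map<String, String> pair(String imports, String defaultNS) {
        String[] files = imports.split(";");
        String[] namespaces = defaultNS.split(";");
        if (files.length != namespaces.length) {
            throw new IllegalArgumentException(
                    "imports lists " + files.length + " file(s) but defaultNS lists "
                            + namespaces.length + " namespace(s)");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < files.length; i++) {
            result.put(files[i].trim(), namespaces[i].trim());
        }
        return result;
    }
}
```

Using the example values from the table, ./ont/owl.rdfs is paired with http://www.w3.org/2002/07/owl# and ./ont/ex.rdfs with http://example.org/owlim#.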