GraphDB FAQ

Current version by Nikola Petrov, Apr 24, 2015 19:28.

Another approach is to download the full version of Sesame and configure GraphDB-SE as a plug-in. This method uses the Sesame HTTP server hosted in Tomcat (or similar) and in this way you can use Sesame together with GraphDB as a server application, accessed via the standard Sesame APIs.

Sesame version 2.2 onwards includes the Sesame Workbench - a convenient Web Application for managing repositories, importing/exporting RDF data, executing queries, etc.

For more information please check the "doc" folder of the GraphDB-SE archive.
h5. How can I retrieve my repository configurations from the Sesame SYSTEM repository?

When using a LocalRepositoryManager, Sesame stores the configuration data for repositories in its own 'SYSTEM' repository. A Tomcat instance will do the same and you will see 'SYSTEM' in the list of repositories that the instance is managing. To see what configuration data is stored for a GraphDB-SE repository, connect to the SYSTEM repository and execute the following query:

{noformat}
PREFIX sys: <http://www.openrdf.org/config/repository#>
PREFIX sail: <http://www.openrdf.org/config/repository/sail#>

SELECT ?id ?type ?param ?value
WHERE {
  ?rep sys:repositoryID ?id .
  ?rep sys:repositoryImpl ?impl .
  ?impl sys:repositoryType ?type .
  OPTIONAL {
    ?impl sail:sailImpl ?sail .
    ?sail ?param ?value .
  }
  # FILTER( ?id = "specific_repository_id" ) .
}
ORDER BY ?id ?param
{noformat}

For GraphDB-Enterprise worker repositories the query is slightly different:

{noformat}
PREFIX sys: <http://www.openrdf.org/config/repository#>
PREFIX sail: <http://www.openrdf.org/config/repository/sail#>

SELECT ?id ?type ?param ?value
WHERE {
  ?rep sys:repositoryID ?id .
  ?rep sys:repositoryImpl ?delegate .
  ?delegate sys:delegate ?impl .
  ?impl sys:repositoryType ?type .
  OPTIONAL {
    ?impl sail:sailImpl ?sail .
    ?sail ?param ?value .
  }
  # FILTER( ?id = "specific_repository_id" ) .
}
ORDER BY ?id ?param
{noformat}


This will return the repository ID and type, followed by name-value pairs of configuration data for SAIL repositories, including the SAIL type - "owlim:Sail" for GraphDB-SE and "swiftowlim:Sail" for GraphDB-Lite. GraphDB-Enterprise master nodes are not SAIL repositories and have the type "owlim:ReplicationCluster".

There is no easy generic way of changing the configuration - it is stored in the SYSTEM repository created and maintained by Sesame. However, GraphDB allows these parameters to be overridden by specifying the parameter values as JVM options. For instance, by passing the \-Dcache-memory=1g option to the JVM, GraphDB-SE will read it and use its value to override whatever was configured in the .ttl file. This is convenient for temporary set-ups that require easy and fast configuration changes, e.g. for experimental purposes.
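A minimal sketch of passing such an override when GraphDB runs inside Tomcat - the CATALINA_OPTS variable (typically set in setenv.sh) is the standard mechanism Tomcat's startup scripts use to add JVM options; the cache-memory value is just the example from above:

```shell
# Append the override to whatever CATALINA_OPTS already contains;
# Tomcat's startup scripts hand it to the JVM (assumes a Tomcat deployment).
export CATALINA_OPTS="$CATALINA_OPTS -Dcache-memory=1g"
echo "$CATALINA_OPTS"
```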

Changing the configuration in the SYSTEM repository is trickier, because the configurations are usually structured using blank node identifiers, which are always unique, so attempting to modify a statement with a blank node by using the same blank node identifier will fail. However, this can be achieved with SPARQL UPDATE using a DELETE-INSERT-WHERE command as follows (please note that this is valid for GraphDB-SE repositories):

{noformat}
PREFIX sys: <http://www.openrdf.org/config/repository#>
PREFIX sail: <http://www.openrdf.org/config/repository/sail#>
PREFIX onto: <http://www.ontotext.com/trree/owlim#>

DELETE { GRAPH ?g { ?sail ?param ?old_value } }
INSERT { GRAPH ?g { ?sail ?param ?new_value } }
WHERE {
  GRAPH ?g { ?rep sys:repositoryID ?id . }
  GRAPH ?g { ?rep sys:repositoryImpl ?impl . }
  GRAPH ?g { ?impl sys:repositoryType ?type . }
  GRAPH ?g { ?impl sail:sailImpl ?sail . }
  GRAPH ?g { ?sail ?param ?old_value . }
  FILTER( ?id = "repo_id" ) .
  FILTER( ?param = onto:enable-context-index ) .
  BIND( "true" AS ?new_value ) .
}
{noformat}

For GraphDB-Enterprise worker repositories, the SPARQL update is slightly different:

{noformat}
PREFIX sys: <http://www.openrdf.org/config/repository#>
PREFIX sail: <http://www.openrdf.org/config/repository/sail#>
PREFIX onto: <http://www.ontotext.com/trree/owlim#>

DELETE { GRAPH ?g { ?sail ?param ?old_value } }
INSERT { GRAPH ?g { ?sail ?param ?new_value } }
WHERE {
  GRAPH ?g { ?rep sys:repositoryID ?id . }
  GRAPH ?g { ?rep sys:repositoryImpl ?delegate . }
  GRAPH ?g { ?delegate sys:repositoryType ?type . }
  GRAPH ?g { ?delegate sys:delegate ?impl . }
  GRAPH ?g { ?impl sail:sailImpl ?sail . }
  GRAPH ?g { ?sail ?param ?old_value . }
  FILTER( ?id = "repo_id" ) .
  FILTER( ?param = onto:enable-context-index ) .
  BIND( "true" AS ?new_value ) .
}
{noformat}


Modify the last three lines of the update command to specify the repository ID, the parameter, and the new value, then execute it against the SYSTEM repository. In this example, the enable-context-index parameter is changed, but note that some parameters cannot be changed once the repository has been created, e.g. the rule-set (in the case of GraphDB-SE).

If you receive a stack trace containing the following:

bq. Invocation of init method failed; nested exception is java.io.IOException: Unable to create logging directory /usr/share/tomcat6/.aduna/openrdf-sesame/logs

then this indicates that Tomcat does not have write permission to its data directory (where it stores configuration, logs, and actual repository data). To fix this, log in as root on the server machine and do the following:
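For example, assuming Tomcat 6 runs as the user tomcat6 with its home at /usr/share/tomcat6 (the path from the stack trace above), the following sketch creates the data directory and hands it over to the Tomcat user - adjust the user and path to match your installation:

```shell
# Assumed user/group and home directory -- taken from the stack trace above;
# substitute the values for your own installation.
TOMCAT_USER=tomcat6
TOMCAT_HOME=/usr/share/tomcat6

mkdir -p "$TOMCAT_HOME/.aduna"                                # create the data directory
chown -R "$TOMCAT_USER":"$TOMCAT_USER" "$TOMCAT_HOME/.aduna"  # give Tomcat write access
```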
h5. How do I use custom rule files?

To use custom rule files, GraphDB must be running in a JVM that has access to the Java compiler. The easiest way to do this is to use the Java runtime from a Java Development Kit (JDK).
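A quick way to verify that the runtime in use is a JDK (assuming a Unix shell; 'javac' is the compiler, which ships only with JDKs, not with plain JREs):

```shell
# Check whether the Java installation on the PATH includes a compiler.
if command -v javac >/dev/null 2>&1; then
  echo "Java compiler found"
else
  echo "No javac on PATH - install a JDK and use its java binary"
fi
```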

h5. I am getting a SocketTimeout when uploading large files

This can happen when the import takes longer than the configured connection timeout of your application server. In the case of Tomcat, you can change/add the connectionTimeout attribute of the Connector element in *conf/server.xml* like this:
{code}
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="40000"
           redirectPort="8443" />
{code}
or set the *disableUploadTimeout* attribute to false, which makes Tomcat use a separate (typically longer) timeout, *connectionUploadTimeout*, while data is being uploaded. More information can be found in Tomcat's documentation [here|http://tomcat.apache.org/tomcat-7.0-doc/config/http.html#Standard_Implementation].

h5. Slow Tomcat startup time on Linux

If you see the following in the logs:
{code}
Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [5172] milliseconds.
{code}
then try adding
{code}
-Djava.security.egd=file:/dev/./urandom
{code}
to your *CATALINA_OPTS* in *setenv.sh*.
See [this|http://wiki.apache.org/tomcat/HowTo/FasterStartUp#Entropy_Source] page for more information on the implications of this parameter.

h1. Backup/restore/import/export

h5. How do I preserve contexts when exporting and importing data?

In order to preserve the context (named graph) when exporting/importing the whole database, a context-aware RDF file format must be used, e.g. TriG. After serialising the repository to a file with this format (this can be done through the Sesame workbench Web application), the file can be imported with the following steps:
* Go to *\[Add\]*
* Choose Data format: *TriG*
* Choose RDF Data File: e.g. *export.trig*
* Clear the context text field (it will have been set to the URL of the file). If this is not cleared, then all the imported RDF statements will be given a context of <[file://export.trig]> or similar.
* Upload

A snapshot can also be exported over the Sesame HTTP API, e.g. with curl (assuming a local Sesame server on port 8080 and a repository called 'my_repo'):

{noformat}
curl -H "Accept: application/x-trig" \
  "http://localhost:8080/openrdf-sesame/repositories/my_repo/statements?infer=false" > export.trig
{noformat}

This method will stream a snapshot of the database's explicit statements into the 'export.trig' file.

A backup can also be done programmatically using the Sesame API. See the [RepositoryConnection.exportStatements()|http://openrdf.callimachus.net/sesame/2.6/apidocs/index.html] method and the example in the next question.

If it is possible to shut down the repository, then a backup can be effected by copying the GraphDB storage directory (and any sub-directories). See the [installation section|GraphDB-SE Installation] for information about where GraphDB storage folders are located. To restore a repository from a backup, make sure the repository is not running and then replace the entire contents of the storage directory (and any sub-directories) with the backup. Then restart the repository and check the log file to ensure a successful start-up.
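The cold backup described above can be sketched as follows. The storage directory used here is a stand-in created for the example - substitute the real GraphDB storage folder from the installation section, and make sure the repository is shut down first:

```shell
# Cold-backup sketch. STORAGE_DIR is a stand-in created just for this
# example -- replace it with the real GraphDB storage folder.
STORAGE_DIR=$(mktemp -d)                       # stand-in storage folder
echo "demo" > "$STORAGE_DIR/entities.index"    # stand-in data file
tar -czf /tmp/graphdb-backup.tar.gz -C "$STORAGE_DIR" .
# To restore: stop the repository, then unpack the archive back into place:
#   tar -xzf /tmp/graphdb-backup.tar.gz -C "$STORAGE_DIR"
```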

GraphDB-Enterprise has an additional online backup feature that copies a binary database image from a worker node to the cluster master - see the [GraphDB-Enterprise user guide|GraphDB-Enterprise Administration].
h5. How do I dump the contents of a large repository to RDF?

The Sesame openRDF workbench Web application has an export function that can be used to export the contents of moderately sized repositories. However, using it with large repositories (a hundred million statements or more) causes problems - usually time-outs in the Servlet container (Tomcat) hosting the application. Also, the workbench cannot be used when running GraphDB-SE without Tomcat.

Therefore, a more straightforward approach for exporting RDF data from large repositories is to do it programmatically. The Sesame {{RepositoryConnection.getStatements()}} method can be called with the {{includeInferred}} flag set to {{false}} (in order not to serialise the inferred statements). The returned iterator can then be used to visit every explicit statement in the repository, and one of the Sesame RDF writer implementations can be used to output the statements in the chosen format. If the data will be re-imported, the N-Triples format is recommended, because it can easily be broken into large 'chunks' that can be inserted and committed separately. The following code snippet shows how an export can be achieved using this approach:

{code:borderStyle=solid}java.io.OutputStream out = ...;
RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, out);
writer.startRDF();
RepositoryResult<Statement> statements =
    repositoryConnection.getStatements(null, null, null, false);
while (statements.hasNext()) {
    writer.handleStatement(statements.next());
}
statements.close();
writer.endRDF();
out.close();
{code}
h5. How do I upgrade GraphDB-SE to a new version?

While most versions of GraphDB are backward compatible, some major version number increases use such different data structures that disk images can no longer be automatically upgraded to the latest version.

The basic procedure is to export the RDF data from the old version of GraphDB-SE and then reload it into a new repository instance that uses the new version of GraphDB-SE. Exporting is straightforward when using the Sesame workbench - simply click the 'Export' button, choose the format, and click 'download'. To import into a new repository, click 'add', select a format, specify the file and base URI, then click 'Upload'.

If not using the Sesame workbench, the export must be done programmatically using the {{RepositoryConnection.getStatements()}} API, because the Sesame console does not have an export function.
NOTE: It should be possible to export only the explicit statements, as the inferred statements will be recomputed at load time. Fortunately, the Sesame console application does have a 'load' function and this can be used to reload the exported statements.

h5. How can I load a large RDF/XML file without getting an "entity expansion limit exceeded" error?

The built-in Java XML parser will abort with an error such as:

bq. Parser has reached the entity expansion limit "64,000" set by the Application.

when it generates more than a specified number of 'entities'. The default limit for the built-in Java XML parser is 64,000; however, it can be configured using a Java system property. To increase the limit, pass the following to the JVM in which GraphDB/Sesame is running - the actual value can be increased as necessary. Don't forget that if running in Tomcat, this must be passed to the Tomcat instance using the CATALINA_OPTS environment variable.
{noformat}
-DentityExpansionLimit=1000000
{noformat}


h5. How can I upgrade to a new version of GraphDB-SE without exporting and reimporting all my data?

If the new version can upgrade the existing storage files in place, simply configure the new version to use the old repository's storage directory and restart. There is a good chance that it will take quite a long time to initialise as the storage files are modified, but it should be quicker than re-importing all the data.

h5. How do I load large amounts of data into GraphDB-SE or GraphDB-Enterprise?

In general, RDF data can be loaded into a given Sesame repository using the 'load' command in the Sesame console application or directly through the workbench Web application. However, neither of these approaches will work with a very large number of triples, e.g. a billion statements. A common solution is to convert the RDF data into a line-based RDF format (e.g. N-Triples) and then split it into many smaller files (e.g. using the Linux command 'split'). Each file can then be uploaded separately using either the console or the workbench application.
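The split step above can be sketched as follows - the tiny data.nt file is a stand-in created for the example; in practice you would split a real N-Triples export into chunks of, say, a million lines. Because N-Triples is line-based, splitting on line boundaries never breaks a triple:

```shell
# Create a tiny stand-in N-Triples file (three triples, one per line).
printf '<urn:s> <urn:p> "1" .\n<urn:s> <urn:p> "2" .\n<urn:s> <urn:p> "3" .\n' > data.nt

# Split on line boundaries; in practice use something like -l 1000000.
split -l 2 data.nt chunk_

ls chunk_*    # each chunk_* file can now be loaded separately
```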

h1. Developers

# Install ant 1.8 from [http://ant.apache.org/]
# Install maven 2.2.1 - just download it from [http://maven.apache.org/], install it in a convenient location, and make sure that mvn.bat is on your PATH so you can run it from the command line. No additional configuration is required.
# Check out the desired Sesame branch from SVN - 2.6 is used in these instructions [http://repo.aduna-software.org/svn/org.openrdf/sesame/branches/2.6]
# Open a command line and go to the core subdirectory of your branch working directory.