GraphDB FAQ

h5. How do I preserve contexts when exporting and importing data?

In order to preserve the context (named graph) when exporting/importing the whole database, a context-aware RDF file format must be used, e.g. TriG. After serialising the repository to a file with this format (this can be done through the Sesame workbench Web application), the file can be imported with the following steps:
* Go to *\[Add\]*
* Choose Data format: *TriG*
* Choose RDF Data File: e.g. *export.trig*
* Clear the context text field (it will have been set to the URL of the file). If this is not cleared, then all the imported RDF statements will be given a context of <[file://export.trig]> or similar.
* Upload

{noformat}

This method will stream a snapshot of the database's explicit statements into the 'export.trig' file.

A backup can also be done programmatically using the Sesame API. See the [RepositoryConnection.exportStatements()|http://www.openrdf.org/doc/sesame2/api/org/openrdf/repository/RepositoryConnection.html#exportStatements(org.openrdf.model.Resource,%20org.openrdf.model.URI,%20org.openrdf.model.Value,%20boolean,%20org.openrdf.rio.RDFHandler,%20org.openrdf.model.Resource...)] method and the example in the next question.

If it is possible to shut down the repository, then a backup can be made by copying the GraphDB storage directory (and any sub-directories). See the [installation section|GraphDB-SE Installation] for information about where GraphDB storage folders are located. To restore a repository from a backup, make sure the repository is not running and then replace the entire contents of the storage directory (and any sub-directories) with the backup. Then restart the repository and check the log file to ensure a successful startup.
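
The directory copy described above can also be scripted. The following is a minimal sketch using only the standard Java library; the class name {{StorageBackup}} and the method {{copyTree}} are hypothetical names chosen for illustration, and the source and target paths must be supplied by the caller. It assumes the repository has already been shut down.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper (not part of GraphDB or Sesame): copies a directory tree. */
class StorageBackup {

    /**
     * Recursively copies sourceDir, including all sub-directories, into targetDir.
     * Assumption: the repository using sourceDir is not running while this executes.
     */
    static void copyTree(final Path sourceDir, final Path targetDir) throws IOException {
        Files.walkFileTree(sourceDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Recreate each directory at the same relative position in the target.
                Files.createDirectories(targetDir.resolve(sourceDir.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                // Copy the file, overwriting any older copy left in the target.
                Files.copy(file, targetDir.resolve(sourceDir.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Calling {{StorageBackup.copyTree(storageDir, backupDir)}} produces the backup. For a restore, the storage directory should be emptied first, so that no files from the newer state are mixed with the backup.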

GraphDB-Enterprise has an additional online backup feature that copies a binary database image from a worker node to the cluster master - see the [GraphDB-Enterprise user guide|GraphDB-Enterprise Administration].
h5. How do I dump the contents of a large repository to RDF?

The Sesame openRDF workbench Web application has an export function that can be used to export the contents of moderately sized repositories. However, using this with large repositories (a hundred million statements or more) causes problems, usually time-outs in the Servlet container (Tomcat) hosting the application. Also, the workbench cannot be used when running GraphDB-SE without Tomcat.
Therefore, a more straightforward approach for exporting RDF data from repositories is to do this programmatically. The Sesame {{RepositoryConnection.getStatements()}} method can be called with the {{includeInferred}} flag set to {{false}} (in order not to serialise the inferred statements). Then the returned iterator can be used to visit every explicit statement in the repository and one of the Sesame RDF writer implementations can be used to output the statements in the chosen format. If the data will be re-imported, the N-Triples format is recommended, because this can easily be broken into large 'chunks' that can be inserted and committed separately. The following code snippet shows how an export can be achieved using this approach:

{code:borderStyle=solid}java.io.OutputStream out = ...;
RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, out);
While most versions of GraphDB are backward compatible, some major version number increases use such different data structures that disk images can no longer be automatically updated to the latest version.

The basic procedure is to export the RDF data from the old version of GraphDB-SE and then reload it into a new repository instance that uses the new version of GraphDB-SE. Exporting is straightforward when using the Sesame workbench: simply click the 'Export' button, choose the format, and click 'download'. To import into a new repository, click 'add', select a format, specify the file and base URI, then click 'Upload'.
If not using the Sesame workbench, the export must be done programmatically using the {{RepositoryConnection.getStatements()}} API, because the Sesame console does not have an export function.
NOTE: It is sufficient to export only the explicit statements, because the inferred statements will be recomputed at load time. Fortunately, the Sesame console application does have a 'load' function and this can be used to reload the exported statements.

h5. How can I load a large RDF/XML file without getting an "entity expansion limit exceeded" error?

The XML parser generates an error similar to the following when it generates more than a specified number of 'entities':

bq. Parser has reached the entity expansion limit "64,000" set by the Application.

The default limit for the built-in Java XML parser is 64,000; however, it can be configured using a Java system property. To increase the limit, pass the following to the JVM in which GraphDB/Sesame is running. Note that the actual value can be increased as necessary. If running in Tomcat, this must be passed to the Tomcat instance using the CATALINA_OPTS environment variable.
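
To confirm that the setting actually reached the JVM, a small diagnostic along the following lines may help. This is only a sketch: the class name {{EntityLimitCheck}} is hypothetical, and the two property names checked ({{jdk.xml.entityExpansionLimit}} and the legacy {{entityExpansionLimit}}) are the JAXP names assumed here, since the exact option is not reproduced in this section. With the legacy name, the JVM option would take the form {{-DentityExpansionLimit=<value>}}.

```java
/** Hypothetical diagnostic: reports the entity expansion limit set via system properties. */
class EntityLimitCheck {

    // Assumed property names: the JDK-specific name and the legacy JAXP name.
    static final String JDK_PROPERTY = "jdk.xml.entityExpansionLimit";
    static final String LEGACY_PROPERTY = "entityExpansionLimit";

    /**
     * Returns the limit configured through either system property, or null
     * if neither property is set to a valid integer (the parser default applies).
     */
    static Integer configuredLimit() {
        for (String name : new String[] {JDK_PROPERTY, LEGACY_PROPERTY}) {
            String value = System.getProperty(name);
            if (value != null) {
                try {
                    return Integer.valueOf(value.trim());
                } catch (NumberFormatException e) {
                    // Not a number: fall through and try the next property name.
                }
            }
        }
        return null;
    }
}
```

Logging the result of {{EntityLimitCheck.configuredLimit()}} at start-up shows whether the option passed on the command line (or through CATALINA_OPTS) is visible inside the running JVM.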
{noformat}


h5. How can I upgrade to a new version of GraphDB-SE without exporting and reimporting all my data?

There is a good chance that it will take quite a long time to initialise as the storage files are modified, but it should be quicker than re-importing all the data.

h5. How do I load large amounts of data into GraphDB-SE or GraphDB-Enterprise?

In general, RDF data can be loaded into a given Sesame repository using the 'load' command in the Sesame console application or directly through the workbench web application. However, neither of these approaches will work when loading a very large number of triples, e.g. a billion statements. A common solution is to convert the RDF data into a line-based RDF format (e.g. N-Triples) and then split it into many smaller files (e.g. using the Linux command 'split'). This allows each file to be uploaded separately using either the console or the workbench application.
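
Where the 'split' command is not available, the same splitting can be sketched in plain Java. The example below relies on N-Triples holding one statement per line, so cutting the file at line boundaries never breaks a statement. The class name {{NTriplesSplitter}}, the method {{split}} and the chunk file naming are hypothetical choices made for illustration only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a line-based RDF file (e.g. N-Triples) into smaller files. */
class NTriplesSplitter {

    /**
     * Writes consecutive runs of at most linesPerChunk lines from input into files
     * named chunk-00000.nt, chunk-00001.nt, ... inside outputDir.
     * Returns the chunk files in the order they were written.
     */
    static List<Path> split(Path input, Path outputDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        Files.createDirectories(outputDir);
        List<Path> chunks = new ArrayList<>();
        BufferedWriter writer = null;
        int linesInChunk = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Start a new chunk file for the first line and whenever the current one is full.
                if (writer == null || linesInChunk == linesPerChunk) {
                    if (writer != null) {
                        writer.close();
                    }
                    Path chunk = outputDir.resolve(String.format("chunk-%05d.nt", chunks.size()));
                    writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                    chunks.add(chunk);
                    linesInChunk = 0;
                }
                writer.write(line);
                writer.newLine();
                linesInChunk++;
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return chunks;
    }
}
```

The input is streamed line by line, so memory use does not grow with the size of the file. A suitable value for {{linesPerChunk}} depends on the installation and is not prescribed here.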

h1. Developers