The LoadRDF tool resides in the loadrdf/ folder of the GraphDB-SE and GraphDB-Enterprise distributions and is used for fast loading of large data sets.
As input on the command line, the LoadRDF tool accepts a standard config file in Turtle format, the mode, and a list of files to load.
The mode specifies the way the data is loaded in the repository:
The LoadRDF tool accepts Java command line options, passed with -D:
The following options can tune the behaviour of the ParallelLoader when you use the parallel loading mode:
Because the LoadRDF tool uses the ParallelLoader, loadrdf can also be used programmatically. For example, you can write your own Java tool that uses the ParallelLoader internally.
One option is to give the ParallelLoader an InputStream as a parameter. The ParallelLoader parses the stream internally and fills buffers of statements for the next stage (resolving), which prepares the resolved statements for the following stage (sorting); finally, the sorted statements are asynchronously loaded into the PSO and POS indexes.
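The staged, asynchronous design described above can be illustrated with a small producer-consumer pipeline. The sketch below uses only the Java standard library and mimics the parse → resolve → sort → load structure; all class and method names in it are hypothetical and are not the actual ParallelLoader API.

```java
import java.util.List;
import java.util.concurrent.*;

// Illustrative sketch of a staged loading pipeline: parse -> resolve -> sort -> load.
// Hypothetical names only; this is not the real GraphDB ParallelLoader API.
public class PipelineSketch {
    static final String POISON = "__END__"; // sentinel marking end of a stage's input

    public static List<String> run(List<String> rawStatements) throws Exception {
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> resolved = new ArrayBlockingQueue<>(16);
        CopyOnWriteArrayList<String> store = new CopyOnWriteArrayList<>(); // stands in for PSO/POS

        ExecutorService pool = Executors.newFixedThreadPool(3);

        // Stage 1: "parse" raw input and fill the first buffer of statements.
        pool.submit(() -> {
            for (String s : rawStatements) parsed.put(s.trim());
            parsed.put(POISON);
            return null;
        });

        // Stage 2: "resolve" each statement (here: map it to an internal form).
        pool.submit(() -> {
            String s;
            while (!(s = parsed.take()).equals(POISON)) resolved.put("id:" + s);
            resolved.put(POISON);
            return null;
        });

        // Stage 3: collect and sort the resolved statements, then "load" them
        // into the store asynchronously (real indexes are written sorted).
        Future<?> loader = pool.submit(() -> {
            java.util.ArrayList<String> batch = new java.util.ArrayList<>();
            String s;
            while (!(s = resolved.take()).equals(POISON)) batch.add(s);
            java.util.Collections.sort(batch);
            store.addAll(batch);
            return null;
        });

        loader.get();
        pool.shutdown();
        return store;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(List.of("s2 p o", "s1 p o"))); // [id:s1 p o, id:s2 p o]
    }
}
```

Each stage runs on its own thread and communicates through bounded buffers, which is what lets parsing and resolving overlap with the index writes.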
Another way to use the ParallelLoader is to supply an Iterator&lt;Statement&gt; (parsed from other sources, or possibly generated) instead of an InputStream or a File. Both constructors must be supplied with the context into which the statements will be loaded. Only statements without a context of their own go into the specified one: if you use a format that supports contexts (TriG, TriX, N-Quads), statements that already carry a context keep it, rather than being placed in the one you have specified.
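The context rule above can be sketched as follows. This is a self-contained illustration: the Statement record and the helper method are hypothetical stand-ins, not the real RDF4J/GraphDB types or the ParallelLoader's actual internals.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the context rule: a statement that already carries a context keeps
// it; only statements without one fall back to the context supplied to the
// loader. Statement is a hypothetical stand-in type, not the real API.
public class ContextRule {
    record Statement(String subject, String predicate, String object, String context) {}

    // Hypothetical helper mirroring how a default context would be assigned.
    static Statement withDefaultContext(Statement st, String defaultContext) {
        return st.context() == null
                ? new Statement(st.subject(), st.predicate(), st.object(), defaultContext)
                : st;
    }

    public static List<String> contextsAfterLoad(List<Statement> input, String defaultContext) {
        return input.stream()
                .map(st -> withDefaultContext(st, defaultContext).context())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Statement> data = List.of(
                new Statement("s1", "p", "o", null),         // no context -> gets the default
                new Statement("s2", "p", "o", "urn:graph")); // keeps its own context
        System.out.println(contextsAfterLoad(data, "urn:default"));
        // prints [urn:default, urn:graph]
    }
}
```

This matches the behavior you see when loading a TriG, TriX, or N-Quads file with a context argument: only the context-less quads end up in the graph you asked for.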
The ParallelLoader accepts two -D command line options used for testing purposes, which measure the overhead of parsing and resolving versus loading the data into the repository:
If either of these options is specified, a descriptive message is printed on the console.