
{excerpt}Every Turtle/RDF file *must* be validated before committing to SVN{excerpt}.
Otherwise the automatic repository deployment (refresh) scripts fail and you waste your colleagues' time.
{toc}

h1. Validation

h2. RIOT
[Jena ARQ RIOT|http://openjena.org/wiki/RIOT] is an RDF conversion/validation tool. It's Java-based, so it runs on both Linux and Windows.
The current download link (Jan 2017) is https://jena.apache.org/download/.

Older installation:
- [Download ARQ|http://sourceforge.net/projects/jena/files/ARQ/] ([ARQ-2.8.8|http://downloads.sourceforge.net/project/jena/ARQ/ARQ-2.8.8/arq-2.8.8.zip?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fjena%2Ffiles%2FARQ%2FARQ-2.8.8%2F&ts=1324486745&use_mirror=ignum] is current as of 2011-04-21)
- unzip to a path that includes *no spaces* (eg on Windows: c:\prog\ARQ-2.8.8)
- On Linux it's easier:
-- add ARQ-2.8.8/bin to your path
- On Windows you need to jump through more hoops:
-- add c:\prog\ARQ-2.8.8\bat to your path
-- write a batch file riot.bat in that bat directory:
{code}
@echo off
set ARQROOT=c:\prog\ARQ-2.8.8
call %ARQROOT%\bat\make_classpath.bat %ARQROOT%
set JVM_ARGS=-Xmx1200M -server
java %JVM_ARGS% -cp %CP% riotcmd.riot %*
{code}

Newer installation:
- Just get apache-jena-2.11.1: it already includes the riot launcher scripts, shell (Linux) and batch (Windows)
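On Linux, putting the bundled scripts on your path could look like the snippet below. The install location {{$HOME/prog/apache-jena-2.11.1}} is only an assumption, so adjust it to wherever you unpacked the archive; setting {{JENA_HOME}} is optional, since the Jena launcher scripts look at it but can also locate themselves.

{code:bash}
# e.g. in ~/.bashrc; the path below is an assumed install location
export JENA_HOME="$HOME/prog/apache-jena-2.11.1"
# make riot (and the other Jena command-line tools) available everywhere
export PATH="$JENA_HOME/bin:$PATH"
{code}

After opening a new shell, {{riot --validate myfile.ttl}} should work from any directory.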

Call it like this
{code}riot --validate myfile.ttl{code}
- Unfortunately it reports only the first error. rdfparse (another Jena tool) also reports only the first error
- If there are no errors, you'll see no output
- TODO: integrate this as an SVN commit hook, or as an Emacs vc-before-checkin-hook
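The commit-hook TODO could start from something like the sketch below. It has not been tried on our repository and rests on assumptions: {{svnlook}} and {{riot}} are on the hook's {{PATH}} (hooks run with an almost empty environment, so you may need to set it), the script is installed as {{hooks/pre-commit}} and made executable, and, as noted above, {{riot --validate}} prints nothing for a valid file. The function names are hypothetical. Because riot stops at the first error in a file, each changed file is checked separately, so at least every invalid file gets reported.

{code:bash}
#!/bin/sh
# Sketch of an SVN pre-commit hook (hypothetical; install as hooks/pre-commit).
# svn calls it as: pre-commit REPOS TXN. A non-zero exit rejects the commit,
# and whatever the hook writes to stderr is shown to the committer.

# Validate one Turtle file with riot. riot prints nothing for a valid file,
# so any output (or a non-zero exit status) counts as a failure.
validate_ttl() {
    out=$(riot --validate "$1" 2>&1 </dev/null)
    status=$?
    if [ "$status" -ne 0 ] || [ -n "$out" ]; then
        printf '%s\n' "$out" >&2
        return 1
    fi
    return 0
}

# Check every added or modified *.ttl file in transaction $2 of repository $1.
check_txn() {
    repos=$1
    txn=$2
    tmpdir=$(mktemp -d) || return 1
    failed=0
    # `svnlook changed` prints 4 status columns, then the path; skip deletions.
    svnlook changed -t "$txn" "$repos" | grep -v '^D' | cut -c5- \
        | grep '\.ttl$' > "$tmpdir/list"
    while IFS= read -r path; do
        # Extract the version being committed; keep the .ttl extension so
        # riot picks the Turtle parser.
        svnlook cat -t "$txn" "$repos" "$path" > "$tmpdir/check.ttl"
        if ! validate_ttl "$tmpdir/check.ttl"; then
            echo "Invalid Turtle, fix before committing: $path" >&2
            failed=1
        fi
    done < "$tmpdir/list"
    rm -rf "$tmpdir"
    return "$failed"
}

# Only act when svn invokes the hook with REPOS and TXN.
if [ "$#" -eq 2 ]; then
    check_txn "$1" "$2"
    exit $?
fi
{code}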

h3. Emacs integration
Here's how to integrate RIOT with the Emacs 'compile' command:
# Get and install n3-mode.el (it's rather primitive but still useful for Turtle editing), then
{code}
(autoload 'n3-mode "n3-mode" "Major mode for OWL or N3 files" t)
(add-to-list 'auto-mode-alist '("\\.ttl$" . n3-mode))
(add-to-list 'auto-mode-alist '("\\.nt$" . n3-mode))
(add-to-list 'auto-mode-alist '("\\.owl$" . n3-mode))
{code}
# Get and install [smart-compile.el|http://sourceforge.jp/projects/macwiki/svn/view/zenitani/elisp/smart-compile.el?view=co&root=macwiki], then
{code}
(autoload 'smart-compile "smart-compile" nil t)
(global-set-key "\C-cc" 'smart-compile)
(eval-after-load "smart-compile"
'(add-to-list 'smart-compile-alist '(n3-mode . "riot --validate %f")))
{code}
# Add regexps to recognize RIOT's error messages:
{code}
(require 'compile) ; defines the compilation-error-regexp variables used below
(add-to-list 'compilation-error-regexp-alist 'riot1)
(add-to-list 'compilation-error-regexp-alist 'riot2)
(add-to-list 'compilation-error-regexp-alist-alist
'(riot1 "riot --validate \\(.*\\)" 1 nil nil 0)) ; file: INFO
(add-to-list 'compilation-error-regexp-alist-alist
'(riot2 "ERROR \\[line: \\([0-9]+\\), col: \\([0-9]+\\)" nil 1 2)) ; line, col
{code}
# When editing a TTL file, invoke compilation with "C-c c". It jumps automatically to the first error, eg:
!emacs-compile-riot-turtle.png!

h2. Online Validator
[http://www.rdfabout.com/demo/validator/]
Use this for a one-off validation job.
TODO: we can easily automate calling this with wget

h2. [Validating RDF Parser (VRP)|http://139.91.183.30:9090/RDF/VRP/]
ICS-FORTH Validating RDF Parser: a tool for analyzing, validating and processing RDF schemas and resource descriptions, with SVG visualization.
- Tried to [install|http://139.91.183.30:9090/RDF/VRP/Install.html] it
- converted susana.ttl to susana.rdf
{noformat}any23 -f xmlrdf susana.ttl > susana.rdf{noformat}
- runVRP.bat: fixed VRP_HOME, removed JAVA_HOME, fixed command line:
{noformat}"%JAVA_HOME%\bin\java" -mx1000m -classpath "%JAVA_HOME%;..."{noformat}
- ran it, trying various options
- specified susana-browseSchema.svg as one of the outputs.
Had to add namespaces:
{code:xml}
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
{code}
and it shows a page with some control buttons, but no content
- it was never able to produce schemaVisualization.svg
- turns out it's buggy:
{noformat}
Couldn't repair and continue parse
gr.forth.ics.vrp.corevrp.RDF_Error:
java.lang.NullPointerException
d = 13 dec
Exception in thread "AWT-EventQueue-0" java.lang.Error: Symbol recycling detected (fix your scanner).
at java_cup.runtime.lr_parser.parse(lr_parser.java:545)
at gr.forth.ics.vrp.corevrp.model.Model.fetch_all_ns(Model.java:621)
at gr.forth.ics.vrp.corevrp.model.Model.fetch_all(Model.java:583)
{noformat}

h3. [RDFSuite|http://139.91.183.30:9090/RDF/index.html]
VRP is part of RDFSuite, which includes:
- Validating RDF Parser (VRP): The First RDF Parser supporting semantic validation of both resource descriptions and schemas
- RDF Schema Specific DataBase (RSSDB): The First RDF Store using schema knowledge to automatically generate an Object-Relational (SQL3) representation of RDF metadata and load resource descriptions.
- RDF Query Language (RQL): The First Declarative Language for uniformly querying RDF schemas and resource descriptions.
- RDF Update Language (RUL): The First Declarative Language for uniformly updating resource descriptions

But it dates from 2002-2003 and, given the above experience, I won't try it.

h2. Eyeball
Jena Eyeball [http://jena.sourceforge.net/Eyeball/]
- RDF's open-world assumption means "anything goes", so e.g. a misspelt property name is not considered a mistake.
- Eyeball tries to address this with many configurable checks over RDF data
- Windows download: [http://sourceforge.net/projects/jena/files/Eyeball/Eyeball%202.3/eyeball-2.3.zip/download]

See {jira:RS-1071} for a first attempt and some advice. Once we apply it successfully, we should move that information here.

h2. SHARK

- SHARK: A Test-driven Framework for Design and Evolution of Ontologies (ESWC 2018)
- source: https://github.com/gcpdev/shark
- Uses Travis CI and RDFUnit
- Web-based interface where users can
-- choose between 20 different pre-defined guidelines
-- run their own custom SHACL tests.
- REST service: https://app.swaggerhub.com/apis/gcpdev/SHARK/0.1

h2. Further Links

Sebastian Hellmann [mailto:hellmann@informatik.uni-leipzig.de] on the OA mailing list:
I convert/validate with
- Rapper: http://librdf.org/raptor/rapper.html
- rdf.sh: https://github.com/seebi/rdf.sh (Note the RDF diff function)
- Jena CLI: http://jena.sourceforge.net/tools.html
- Pellet: http://clarkparsia.com/pellet
- Pellint: http://weblog.clarkparsia.com/2008/07/02/pellint-an-ontology-repair-tool/

The SHARK paper mentions:
- methodologies: Gruninger and Fox, Methontology, On-To-Knowledge, DILIGENT and NeOn
- eval frameworks: OntoClean, OntoQA, Unit Tests, OQuaRE, NeOn Guidelines
- eval tools: ODEClean, ODEval, AEON, Eyeball, Moki, OQuaRE, OntoCheck, XD-Analyzer, OOPS

h1. Converting to Turtle
Comparison of several tools for converting RDF to Turtle.
The tools below are ordered by preference; you can also compare their outputs yourself:
{attachments:patterns=.*ttl,.*rdf}

h2. rdfcat
Can concatenate several files, URLs, or {{-}} (stdin).
Based on Jena RIOT.
{noformat}rdfcat -x 2354.rdf -out ttl > 2354.ttl{noformat}
Provides the best results.
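To convert a whole batch of files, a small loop around the same command is enough. This sketch assumes {{rdfcat}} is on your path and exits with a non-zero status when a conversion fails (not verified for every Jena version); {{rdf_to_ttl}} is just an illustrative name.

{code:bash}
#!/bin/sh
# Hypothetical helper: convert each RDF/XML file to Turtle with rdfcat,
# writing foo.rdf -> foo.ttl next to the input.
rdf_to_ttl() {
    rc=0
    for f in "$@"; do
        out="${f%.rdf}.ttl"
        if rdfcat -x "$f" -out ttl > "$out"; then
            echo "converted $f -> $out"
        else
            echo "conversion failed: $f" >&2
            rm -f "$out"    # don't leave a partial .ttl behind
            rc=1
        fi
    done
    return "$rc"
}

# Usage: rdf_to_ttl *.rdf
{code}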

h2. rdfcopy
Also based on Jena RIOT.
{noformat}rdfcopy file:2354.rdf RDF/XML TURTLE > 2354.ttl{noformat}
Requires you to always specify the URL scheme, even for a local file.
Almost as good as rdfcat, but adds {nf}@base <file:2354.rdf>{nf} at the top, which may not be desired.

h2. rdf2rdf
[http://www.l3s.de/~minack/rdf2rdf/] ([rdf2rdf-1.0.1-2.3.1.jar|http://www.l3s.de/~minack/rdf2rdf/downloads/rdf2rdf-1.0.1-2.3.1.jar])
May be the easiest to use.
Based on Sesame (OpenRDF) RIO 2.3.1
{noformat}rdf2rdf 2354.rdf .ttl{noformat}
Does not group statements by subject and scatters prefix declarations throughout the file, so the resulting Turtle is hard to read.

h2. any23
any23 ("Anything To Triples") converts RDFa, microformats and various RDF formats to Turtle and other RDF formats.

Local use:
- [install|http://any23.apache.org/download.html]
- make a batch file like this:
{code:title=any23.bat}
@echo off
java -cp c:\prog\any23-0.6.1\any23-core\target\any23-core-0.6.1-jar-with-dependencies.jar -Xmx256M org.deri.any23.cli.Rover %*
{code}
- invoke like this:
{noformat}any23 rover 2354.rdf -o 2354.ttl{noformat}
- Output is based on Sesame RIO, so it gives the same result as rdf2rdf

Web use:
- call their web service with wget (curl works similarly):
{code}wget -q --post-file=myfile.ttl --output-document=myfile.nt --header=Content-Type:text/turtle http://any23.org/any23/nt{code}
- NOTICE: the file must be valid; otherwise the site crashes.
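Since the service is said to crash on invalid input, it seems worth running riot first. The sketch below does that and then posts with curl instead of wget; {{--data-binary}} sends the file unchanged, whereas plain {{--data @file}} would strip the newlines. The endpoint is copied from the wget call and may no longer be online, riot and curl are assumed to be on your path, and {{ttl_to_nt_any23}} is a hypothetical name.

{code:bash}
#!/bin/sh
# Hypothetical helper: validate a Turtle file with riot, then convert it to
# N-Triples with the any23 web service (foo.ttl -> foo.nt).
ttl_to_nt_any23() {
    in=$1
    out="${in%.ttl}.nt"
    # riot prints nothing for a valid file; refuse to send anything else.
    msgs=$(riot --validate "$in" 2>&1)
    if [ $? -ne 0 ] || [ -n "$msgs" ]; then
        printf 'not sending invalid file %s:\n%s\n' "$in" "$msgs" >&2
        return 1
    fi
    # -f: exit non-zero on HTTP errors; --data-binary: POST the file as-is
    curl -s -f --data-binary @"$in" \
         -H 'Content-Type: text/turtle' \
         -o "$out" http://any23.org/any23/nt
}

# Usage: ttl_to_nt_any23 myfile.ttl
{code}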

Update 20170227:
The [Apache Any23|http://any23.apache.org] project management committee are pleased to announce the release of Any23 2.0, which marks a major milestone for the project.
Anything To Triples (any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents.
[Release notes|https://github.com/apache/any23/blob/any23-2.0/RELEASE-NOTES.txt], [downloads|http://any23.apache.org/download.html]. [Maven artifacts|http://search.maven.org/#search|ga|1|g%3A%22org.apache.any23%22] (Maven Central), [DOAP|https://s.apache.org/any23doap] machine-readable description. Please report any issues to our [community mailing lists|http://any23.apache.org/mail-lists.html].


h2. rdf-convert based on Sesame (rdf4j)
Written by the main developer of Sesame (rdf4j):
[https://bitbucket.org/jeenbroekstra/rdf-syntax-convert].
Note: this tool has not been evaluated against the other conversion tools.
Binary: TODO. (Mitac: I couldn't build it due to a missing dependency on com.github.jsonld-java)

h2. Raptor and Raptor2 converters (librdf.org)
- url: http://librdf.org/raptor/
- download: http://download.librdf.org/source/raptor2-2.0.15.tar.gz
- part of the Redland RDF libraries (librdf.org), written in C, faster than the Java-based converters above
- raptor1 has some bugs, notably running out of memory (OOM) on large files. On the other hand, it can be installed via the Linux package manager (apt-get, etc.), e.g. {{sudo apt-get install raptor-utils}}. Note that the command-line tool is called 'rapper', not 'raptor'.
- raptor2 claims to fix these problems, but Mitac has not tried it yet.
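For reference, typical {{rapper}} invocations for checking and converting are wrapped in two hypothetical helper functions below. This is a sketch assuming raptor2's {{rapper}} options ({{-i}} input syntax, {{-o}} output syntax, {{-c}} count the triples instead of printing them, {{-q}} fewer informational messages) and assuming rapper exits non-zero when parsing fails; check {{rapper --help}} for your version.

{code:bash}
#!/bin/sh
# Hypothetical helpers around raptor's rapper command (assumed on PATH).

# Syntax-check a Turtle file: parse it and only count the triples.
rapper_check_ttl() {
    rapper -q -i turtle -c "$1"
}

# Convert RDF/XML to Turtle: foo.rdf -> foo.ttl
rapper_rdf_to_ttl() {
    out="${1%.rdf}.ttl"
    if ! rapper -q -i rdfxml -o turtle "$1" > "$out"; then
        rm -f "$out"    # don't leave a partial .ttl behind
        return 1
    fi
}

# Usage: rapper_check_ttl myfile.ttl
#        rapper_rdf_to_ttl 2354.rdf
{code}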