Every Turtle/RDF file must be validated before committing to SVN.
Otherwise the automatic repository deployment (refresh) scripts fail and you waste your colleagues' time.
Validation
RIOT
Jena ARQ RIOT is an RDF conversion/validation tool. It's Java based so it runs on Linux or Windows.
The current download link (Jan 2017) is https://jena.apache.org/download/.
Older installation:
- Download ARQ (ARQ-2.8.8 is current as of 2011-04-21)
- unzip to a path that includes no spaces (eg on Windows: c:\prog\ARQ-2.8.8)
- On Linux it's easier:
  - add ARQ-2.8.8/bin to your path
- On Windows you need to jump through more hoops:
  - add c:\prog\ARQ-2.8.8\bat to your path
  - write a batch file riot.bat in the same dir:
Newer installation:
- Just get apache-jena-2.11.1; it includes the required files: a shell script (Linux) and a batch file (Windows)
Call it like this
- Unfortunately it returns only the first error. rdfparse (another Jena tool) also returns only the first error
- If there are no errors, you'll see no output
- TODO: integrate this as a SVN commit hook, or emacs vc-before-checkin-hook
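Because RIOT stops at the first error in each file and prints nothing for a valid one, it helps to wrap it in a loop that checks several files and names those that fail. The following is a minimal sketch, not a tested hook. It assumes `riot` is on the path and accepts `--validate` (as in current Apache Jena releases); the function name `validate_ttl` and the `RIOT` override variable are hypothetical. It treats any output or a non-zero exit status as a failure.

```shell
#!/bin/sh
# Validate every Turtle file given as an argument with RIOT.
# RIOT is a hypothetical override variable; it defaults to "riot".
validate_ttl() {
    failed=0
    for f in "$@"; do
        # A valid file is expected to produce no output at all
        out=$("${RIOT:-riot}" --validate "$f" 2>&1)
        status=$?
        if [ "$status" -ne 0 ] || [ -n "$out" ]; then
            echo "INVALID: $f"
            if [ -n "$out" ]; then
                printf '%s\n' "$out"
            fi
            failed=1
        fi
    done
    # 0 if all files passed, 1 if at least one failed
    return "$failed"
}
```

For example, `validate_ttl *.ttl || echo "fix the files above before committing"`. The same function could serve as the core of a commit hook.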
Emacs integration
Here's how to integrate RIOT with the Emacs 'compile' command:
- Get and install n3-mode.el (it's rather primitive but still useful for Turtle editing), then
- Get and install smart-compile.el, then
- Add regexps to recognize RIOT's error messages:
- When editing a TTL file, invoke compilation with "C-c c". It jumps automatically to the first error, eg:
Online Validator
http://www.rdfabout.com/demo/validator/
Use this for a one-off validation job.
TODO: we can easily automate calling this with wget
Validating RDF Parser (VRP)
ICS-FORTH Validating RDF Parser: a tool for analyzing, validating and processing RDF schemas and resource descriptions. SVG visualization.
- Tried installing it
- converted susana.ttl to susana.rdf
any23 -f xmlrdf susana.ttl > susana.rdf
- runVRP.bat: fixed VRP_HOME, removed JAVA_HOME, fixed command line:
"%JAVA_HOME%\bin\java" -mx1000m -classpath "%JAVA_HOME%;..."
- ran it, trying various options
- specified susana-browseSchema.svg as one of the outputs.
Had to add namespaces; it then shows a page with some control buttons, but no content
- it was never able to produce schemaVisualization.svg
- turns out it's buggy:
Couldn't repair and continue parse
gr.forth.ics.vrp.corevrp.RDF_Error: java.lang.NullPointerException
d = 13 dec
Exception in thread "AWT-EventQueue-0" java.lang.Error: Symbol recycling detected (fix your scanner).
    at java_cup.runtime.lr_parser.parse(lr_parser.java:545)
    at gr.forth.ics.vrp.corevrp.model.Model.fetch_all_ns(Model.java:621)
    at gr.forth.ics.vrp.corevrp.model.Model.fetch_all(Model.java:583)
RDFSuite
VRP is part of RDFSuite, which includes:
- Validating RDF Parser (VRP): The First RDF Parser supporting semantic validation of both resource descriptions and schemas
- RDF Schema Specific DataBase (RSSDB): The First RDF Store using schema knowledge to automatically generate an Object-Relational (SQL3) representation of RDF metadata and load resource descriptions.
- RDF Query Language (RQL): The First Declarative Language for uniformly querying RDF schemas and resource descriptions.
- RDF Update Language (RUL): The First Declarative Language for uniformly updating resource descriptions
But it dates from 2002-2003 and, given the above experience, I won't try it.
Eyeball
Jena Eyeball http://jena.sourceforge.net/Eyeball/
- RDF's open world assumption means "anything goes" so eg a misspelt prop name is not considered a mistake.
- Eyeball tries to overcome this. It includes a lot of configurable checks over RDF data
- Windows download: http://sourceforge.net/projects/jena/files/Eyeball/Eyeball%202.3/eyeball-2.3.zip/download
See RS-1071 for a first attempt and some advice. Once we apply it successfully, we should move the info here.
SHARK
- SHARK: A Test-driven Framework for Design and Evolution of Ontologies (ESWC 2018)
- source: https://github.com/gcpdev/shark
- Uses Travis CI and RDFunit
- Web-based interface where users can
  - choose between 20 different pre-defined guidelines
  - run their own custom SHACL tests
- REST service: https://app.swaggerhub.com/apis/gcpdev/SHARK/0.1
Further Links
Sebastian Hellmann hellmann@informatik.uni-leipzig.de on OA mlist:
I convert/validate with
- Rapper: http://librdf.org/raptor/rapper.html
- rdf.sh: https://github.com/seebi/rdf.sh (Note the RDF diff function)
- Jena CLI: http://jena.sourceforge.net/tools.html
- Pellet: http://clarkparsia.com/pellet
- Pellint: http://weblog.clarkparsia.com/2008/07/02/pellint-an-ontology-repair-tool/
SHARK paper:
- methodologies: Gruninger and Fox, Methontology, On-To-Knowledge, DILIGENT and Neon
- eval frameworks: OntoClean, OntoQA, Unit Tests, OQuaRE, Neon Guidelines
- eval tools: ODEClean, ODEval, AEON, Eyeball, Moki, OQuare, OntoCheck, XD-Analyzer, OOPS
Converting to Turtle
Comparison of several tools for converting RDF to Turtle.
The tools are ordered below by preference, or you can compare the results yourself.
rdfcat
Can concatenate several files, URLs or - (stdin).
Based on Jena RIOT.
rdfcat -x 2354.rdf -out ttl > 2354.ttl
Provides the best results.
rdfcopy
Also based on Jena RIOT.
rdfcopy file:2354.rdf RDF/XML TURTLE > 2354.ttl
Requires you to always specify the URL scheme, even for a local file.
Almost as good as rdfcat, but it adds @base <file:2354.rdf> at the top, which may not be desired.
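To compare the two Jena-based tools on your own data, the commands above can be run side by side. A minimal sketch, assuming both `rdfcat` and `rdfcopy` are on the path and the input is an RDF/XML file with a `.rdf` extension; the function name `compare_converters` and the output file names are made up for illustration.

```shell
#!/bin/sh
# Convert one RDF/XML file with rdfcat and rdfcopy and write the results
# next to the input so they can be compared (hypothetical file naming).
compare_converters() {
    src=$1
    base=${src%.rdf}
    rdfcat -x "$src" -out ttl > "$base.rdfcat.ttl" || return 1
    # rdfcopy needs a URL scheme even for a local file
    rdfcopy "file:$src" RDF/XML TURTLE > "$base.rdfcopy.ttl" || return 1
    # Line counts give a first impression; use diff for the details
    wc -l "$base.rdfcat.ttl" "$base.rdfcopy.ttl"
}
```

For example, run `compare_converters 2354.rdf` and then `diff 2354.rdfcat.ttl 2354.rdfcopy.ttl`. Going by the notes above, the main difference to expect is the extra @base line from rdfcopy.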
rdf2rdf
http://www.l3s.de/~minack/rdf2rdf/ (rdf2rdf-1.0.1-2.3.1.jar)
May be the easiest to use.
Based on Sesame (OpenRDF) RIO 2.3.1
rdf2rdf 2354.rdf .ttl
Does not group all statements per Subject and spreads prefixes throughout the file, so the Turtle is hard to understand
any23
"any23" stands for "anything to triples" and converts from RDFa, microformats and RDF formats to Turtle and other RDF formats.
Local use:
- install
- make a batch file like this:
any23.bat
- invoke like this:
any23 rover 2354.rdf -o 2354.ttl
- Output is based on Sesame RIO, so it gives the same result as rdf2rdf
Web use
- call their web service with the wget program (curl works similarly):
- NOTICE: the file must be valid. Otherwise the site crashes
Update 20170227:
The Apache Any23 project management committee are pleased to announce the release of Any23 2.0 which marks a major milestone for the project.
Anything To Triples (any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents.
Release notes, downloads. Maven artifacts (Maven Central), DOAP machine-readable description. Please report any issues to our community mailing lists.
rdf-convert based on Sesame (rdf4j)
By the main developer of Sesame (rdf4j)
https://bitbucket.org/jeenbroekstra/rdf-syntax-convert
Note: this tool has not been evaluated against the other conversion tools.
Binary: TODO. (Mitac: I couldn't build it due to a missing dep for com.github.jsonld-java)
librdf-based converters: Raptor and Raptor2
- url: http://librdf.org/raptor/
- download: http://download.librdf.org/source/raptor2-2.0.15.tar.gz
- based on librdf, written in C, so faster than the Java-based converters
- raptor1 has some bugs, notably running out of memory (OOM) on large files. On the other hand, it can be installed via the corresponding Linux package manager (apt-get, etc.). BTW, the tool is invoked as 'rapper', not 'raptor'.
- raptor2 claims to have the above problems fixed, but Mitac has not tried it yet.
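For completeness, a conversion with rapper looks much like the other tools. A minimal sketch, assuming `rapper` is installed and on the path: `-i` and `-o` select the input and output syntaxes and `-q` suppresses informational messages. The function name `rdfxml_to_turtle` and the `RAPPER` override variable are hypothetical, and the sketch relies on rapper returning a non-zero exit status when parsing fails.

```shell
#!/bin/sh
# Convert an RDF/XML file to Turtle with rapper (Raptor's command-line tool).
# Usage: rdfxml_to_turtle input.rdf output.ttl
# RAPPER is a hypothetical override variable; it defaults to "rapper".
rdfxml_to_turtle() {
    src=$1
    dst=$2
    # Write to a temporary file first so a failed run leaves no partial output
    if "${RAPPER:-rapper}" -q -i rdfxml -o turtle "$src" > "$dst.tmp"; then
        mv "$dst.tmp" "$dst"
    else
        rm -f "$dst.tmp"
        echo "conversion failed: $src" >&2
        return 1
    fi
}
```

For example, `rdfxml_to_turtle 2354.rdf 2354.ttl`. To only check a file without converting it, `rapper -c -i turtle file.ttl` counts the triples and prints no serialization.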