
{jira:RS-1370}
{toc}
{attachments}

To: ECRM, CRM SIG. Date: 12/18/2012
Subject: ECRM simplification; RDFS+inverses CRM version

h1. ecrm-simplify.xq
I wrote an XQuery script [^ecrm-simplify.xq] that simplifies the ecrm-current.owl file by removing some OWL constructs.
It always keeps owl:inverseOf and owl:SymmetricProperty (self-inverses): they are an innate part of CRM.
Parameter keep= gives a comma-separated list of other features to keep:
- transitive: owl:TransitiveProperty
- restriction: owl:Restriction (blank-node subClassOf)
- functional: owl:FunctionalProperty, owl:InverseFunctionalProperty
- disjoint: owl:disjointWith

So this script can make "ECRM profiles", which use only a subset of OWL constructs.
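
A minimal sketch of how a keep= value could select a profile, written in Python rather than XQuery. The names FEATURES, ALWAYS_KEPT, parse_keep and constructs_to_drop are made up for illustration and are not taken from ecrm-simplify.xq; only the feature names and the OWL constructs come from the list above.

```python
# Feature name accepted by keep= -> OWL constructs it covers (from the list above).
FEATURES = {
    "transitive": {"owl:TransitiveProperty"},
    "restriction": {"owl:Restriction"},
    "functional": {"owl:FunctionalProperty", "owl:InverseFunctionalProperty"},
    "disjoint": {"owl:disjointWith"},
}

# Never removed, whatever keep= says: they are an innate part of CRM.
ALWAYS_KEPT = {"owl:inverseOf", "owl:SymmetricProperty"}


def parse_keep(keep):
    """Split a comma-separated keep= value into a set of feature names."""
    names = {name.strip() for name in keep.split(",") if name.strip()}
    unknown = names - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return names


def constructs_to_drop(keep):
    """Return the OWL constructs a profile with this keep= value would remove."""
    kept = parse_keep(keep)
    dropped = set()
    for name, constructs in FEATURES.items():
        if name not in kept:
            dropped |= constructs
    return dropped


if __name__ == "__main__":
    # A profile that keeps only transitivity on top of the always-kept inverses.
    print(sorted(constructs_to_drop("transitive")))
```

An empty keep= value drops all four optional features and leaves only the inverses and symmetric properties.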

h1. Save Space in RS
{jira:RS-1279}
In ResearchSpace we use [^ecrm-simplified.owl]. Why did we need this?
Because the ECRM owl:Restrictions add 20-25% [more statements|Repository Volumetrics] that are useless for us.
(I'd appreciate pointers to any system that uses them).

The CRM class hierarchy is quite deep, and each of these restrictions is propagated through it.

We have these counts of rdf:type statements for 115k British Museum objects:
|| rdf:type || Statements || Share ||
| *\_:nodeXX* | *34675445* | *41.2%* |
| *owl:Thing* | *5423480* | *6.4%* |
| *CRM classes* | *43325284* | *51.5%* |
| crm:E1_CRM_Entity | 5423480 | |
| crm:E77_Persistent_Item | 3869228 | |
| crm:E70_Thing | 3800112 | |
| crm:E72_Legal_Object | 3688546 | |
| crm:E71_Man-Made_Thing | 3681210 | |
| crm:E28_Conceptual_Object | 2932619 | |
| ... | | |

So we’ll save about 34.7M statements. (Next I’ll be looking at a way to eliminate the useless owl:Thing statements).
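
The arithmetic behind that saving can be checked with a few lines of Python. The counts are copied from the table above; treating the \_:nodeXX row as the restriction blank nodes follows the reasoning in this section, and the helper names are illustrative only.

```python
# rdf:type statement counts for 115k British Museum objects (from the table above).
TYPE_COUNTS = {
    "_:nodeXX": 34_675_445,   # blank nodes, assumed here to be the owl:Restriction types
    "owl:Thing": 5_423_480,
    "CRM classes": 43_325_284,
}


def saved_statements(dropped):
    """Total rdf:type statements removed if the given table rows are eliminated."""
    return sum(TYPE_COUNTS[name] for name in dropped)


def in_millions(count):
    """Express a statement count in millions, rounded to one decimal."""
    return round(count / 1_000_000, 1)


if __name__ == "__main__":
    # Saving from dropping the restrictions alone.
    print(in_millions(saved_statements(["_:nodeXX"])))
    # Saving if the owl:Thing statements are eliminated as well.
    print(in_millions(saved_statements(["_:nodeXX", "owl:Thing"])))
```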
In ResearchSpace we use “transitive” (with a few added statements that I believe were forgotten) but not restriction, functional or disjoint.

h1. Avoid Ontological Overcommitment
Apart from the practical considerations above, ECRM makes "ontological overcommitments" beyond the CRM standard.
E.g. it says (in Manchester syntax):
{noformat}
Class: ecrm:E72_Legal_Object
SubClassOf:
ecrm:P104_is_subject_to some ecrm:E30_Right,
ecrm:P105_right_held_by some ecrm:E39_Actor,
ecrm:E70_Thing
{noformat}
This means that Legal_Object is not only a subclass of Thing (as per the CRM standard), but that there must also be some Right and some Actor that it is related to.
However, such objects may not be present in cases of missing information, so what good is that assertion?

h1. RDFS+inverses
Last summer Martin Doerr asked whether I could extend the CRM RDFS version to include inverses, since they are an innate part of CRM.
Such a version is attached here ([^ecrm-inverses.owl]), made from the ECRM version with my simplification script.
(The xmlns and xml:base at the beginning should be changed to the agreed CRM namespace.)

h1. CRM Unification
The CRM community will benefit *greatly* from a single RDF definition of CRM.
This will not only eliminate duplication of effort by the different maintainers, but will also remove doubt in the community about which version to use.
The unified spelling of property and class names adopted by the CRM SIG on Nov 20 is an important step in that direction:
- I am glad the ECRM spelling was agreed, since it'd have been quite hard for us to change the ResearchSpace code base to use a different spelling
- the new RDFS release [http://cidoc-crm.org/rdfs/cidoc_crm_v5.0.4_official_release.rdfs] uses that spelling

If the two groups want to collaborate, maybe the best approach is:
- maintain the unified ontology in Protégé
- export an OWL version from Protégé
- use the simplification script to make “application profiles” as described above

Below are some next steps. Who can help?
# diff the two versions to find any discrepancies
-- This can be done with the Protégé plugin. I started doing it once, but dropped it after 10 min…
-- There is OWL Patch: [http://owl.cs.manchester.ac.uk/patch/]
However, I tried it from the command line and got this:
{noformat}
> curl "http://owl.cs.manchester.ac.uk/patch/diff.php?a=http://cidoc-crm.org/rdfs/cidoc_crm_v5.0.4_official_release.rdfs&b=http://erlangen-crm.org/onto/ecrm/ecrm_current.owl"
Sorry, can't diff!
{noformat}
# add an official namespace, something like [http://cidoc-crm.org/ns] or [http://cidoc-crm.org/ns/]
ResearchSpace will be willing to adopt this namespace *if* agreed by Jan 15.
After that date we’ll be loading 1.5M BM objects (estimated 2.076B statements)
# take the multilingual labels from RDFS.
The ECRM guys prefer the prop/class numbers to be included in the labels, so this needs to be argued
# take the longer rdfs:comments from ECRM.
Optionally, split into skos:scopeNote vs skos:example
# take skos:notation from my email 8-Aug-2012. I posted a script that generates statements like:
P91i_is_unit_of rdfs:label "P91 is unit of"; skos:notation "P91i"
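
For the last step, a rough Python sketch of how such statements could be derived from the local names. This is not the script from my email; NAME_RE, label_and_notation and to_statement are illustrative names. It assumes every name has the form letter, number, optional "i" for an inverse property, underscore, words. Names that deviate from that pattern are rejected rather than guessed at.

```python
import re

# Assumed pattern: "E" or "P", a number, an optional "i" (inverse), "_", then the words.
NAME_RE = re.compile(r"^([EP]\d+)(i?)_(.+)$")


def label_and_notation(local_name):
    """Derive (rdfs:label, skos:notation) from a CRM class or property local name."""
    match = NAME_RE.match(local_name)
    if match is None:
        raise ValueError(f"unexpected name: {local_name!r}")
    number, inverse, words = match.groups()
    # The label keeps the number but drops the "i"; the notation keeps the "i".
    label = f"{number} {words.replace('_', ' ')}"
    notation = number + inverse
    return label, notation


def to_statement(local_name):
    """Format one name in the same shape as the example statement above."""
    label, notation = label_and_notation(local_name)
    return f'{local_name} rdfs:label "{label}"; skos:notation "{notation}"'


if __name__ == "__main__":
    print(to_statement("P91i_is_unit_of"))
```

Note that this puts the number into the label, which is the ECRM preference mentioned above; if the labels are taken from RDFS without numbers, only the skos:notation part would be needed.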