BM extensions to CRM
Developed by Seme4 as part of mapping the BM collection data to RDF using the CRM schema
D2 - Commentary on the mapping process- from BM to CIDOC-CRM.pdf, Hugh Glaser, SotonU
- very informative description of the mapping, shows the complexity of museum data.
- detailed description of the developed CRM extension ontology (BMX).
- [^bm-extensions.ttl]: BMX ontology definition. It's RDFS only, not OWL
- http://crm.rkbexplorer.com: live links to CRM & BMX classes
- http://collection.britishmuseum.org/: live links to BM objects
(was http://bm2.rkbexplorer.com during development)
- instructive usage guide for the CRM
D1 - BM to ResearchSpace- Conversion Process and Schema.pdf, Ian Millard, Seme4. "Recommended process for converting data from a Collections Management System in the Cultural Heritage domain into Open Linked Data"
- demonstrates the feasibility of converting the entire Collections data from the British Museum into Linked Data
- describes the developed ETL tool "makeRDF": driven by config files, uses XPath, similar to XSLT but simpler
- nice intro to CRM
- recommended URI allocation:
http://collection.britishmuseum.org/object/PPA1818832 : artifact (non-information resource)
http://collection.britishmuseum.org/object/PPA1818832.html : HTML representation (information resource for humans)
http://collection.britishmuseum.org/object/PPA1818832.rdf : RDF representation (information resource for computers)
http://collection.britishmuseum.org/object/PPA1818832/production : info about production (who created it, when and where)
- 14 GB of XML exported from BM's collection system (Merlin). The export took several weeks (SSL says it would have been faster if they had known better)
- Merlin is a C/C++ system with a proprietary OO database
- BM has 2M objects, 560k of them have images. The CRM RDF has maybe 100 triples per object
- Asserting (loading) the RDF takes between 12 and 24 hours, depending on the triple store employed
- BM's Merlin model is based on the SPECTRUM standard for Museum Documentation. The functionality of other museum collection systems is also based on SPECTRUM
- CRM is a very rich ontology, but still not rich enough to capture collection data from different institutions
- BMX shows that extensions are needed
- the BM should drive towards unification and vigorously promote such extensions for CRM standardization, rather than relying on thesaurus mapping (mapping profiles) and co-reference, which make for more difficult reasoning
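The URI allocation recommended in D1 (listed above) follows a simple pattern: one base URI for the artifact, with suffixes for its representations and sub-resources. A minimal sketch of that pattern is below. The helper name `object_uris` and the `ObjectUris` type are hypothetical, introduced only for illustration; they are not part of makeRDF.

```python
from dataclasses import dataclass

# Base taken from the example URIs in D1.
BASE = "http://collection.britishmuseum.org/object/"


@dataclass(frozen=True)
class ObjectUris:
    """The four URIs allocated to one museum object (hypothetical type)."""
    thing: str       # the artifact itself (non-information resource)
    html: str        # HTML representation, for humans
    rdf: str         # RDF representation, for computers
    production: str  # sub-resource describing the production event


def object_uris(object_id: str) -> ObjectUris:
    """Derive all URIs for an object from its identifier."""
    thing = BASE + object_id
    return ObjectUris(
        thing=thing,
        html=thing + ".html",
        rdf=thing + ".rdf",
        production=thing + "/production",
    )


uris = object_uris("PPA1818832")
print(uris.thing)
print(uris.production)
```

Deriving every URI from the artifact URI keeps the non-information resource and its representations in step, which is the point of the recommended scheme.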
BMX itself can be improved. Some examples (mostly from the area of measurements, eg width, height):
- Rather than using BM-specific URIs for units, it'd be better to map during conversion to ontologies that are more established in this area: NASA QUDT (Quantities, Units, Dimensions and Data Types) or OASIS UoM (Units of Measure)
- CRM uses a single number (the range of P90_has_value is E60_Number) while BMX extends this to an interval (PX.min_value and PX.max_value)
- Because min/max are declared as sub-properties of P90_has_value, you end up with P90_has_value having two values
- I think it's better to compute P90_has_value during conversion, e.g. to be the average of min and max (or equal to min if max is undefined)
- See [How to represent imprecision] for further discussion
- CRM says that extensions should define properties as either sub-properties, or "long-cut" paths of existing CRM properties.
But BMX defines various properties that are neither sub-properties, nor long-cuts.
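Returning to the measurement interval point above: the suggestion to compute a single P90_has_value during conversion could be sketched as follows. This is only an illustration of the averaging rule proposed in these notes. The function names are hypothetical, and the property names are used as plain dictionary keys rather than as real RDF terms.

```python
from typing import Dict, Optional


def p90_value(min_value: float, max_value: Optional[float] = None) -> float:
    """Single value for P90_has_value: the average of min and max,
    or min alone when max is undefined."""
    if max_value is None:
        return min_value
    return (min_value + max_value) / 2


def dimension_values(min_value: float,
                     max_value: Optional[float] = None) -> Dict[str, float]:
    """Property/value pairs to emit for one dimension (hypothetical helper)."""
    values = {
        "P90_has_value": p90_value(min_value, max_value),
        "PX.min_value": min_value,
    }
    if max_value is not None:
        values["PX.max_value"] = max_value
    return values


print(dimension_values(10.0, 12.0))
print(dimension_values(10.0))
```

Note that this covers only the value computation. As long as PX.min_value and PX.max_value remain declared as sub-properties of P90_has_value, RDFS inference would still yield extra P90_has_value values, so the sub-property declarations would need revisiting as well.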