
{excerpt}BM extensions to CRM{excerpt}
Developed by Seme4 as part of mapping the BM collection data to RDF using the CRM schema

h1. BMX Description
(i) [^D2 - Commentary on the mapping process- from BM to CIDOC-CRM.pdf], Hugh Glaser, SotonU
- very informative description of the mapping, shows the complexity of museum data.
- detailed description of the developed CRM extension ontology (BMX).
- [^bm-extensions.ttl]: BMX ontology definition. It's RDFS only, not OWL
- []: live links to CRM & BMX classes
- []: live links to BM objects
(was [] during development)
- instructive usage guide for the CRM

[^D1 - BM to ResearchSpace- Conversion Process and Schema.pdf], Ian Millard, Seme4. "Recommended process for converting data from a Collections Management System in the Cultural Heritage domain into Open Linked Data"
- demonstrates the feasibility of converting the entire Collections data of the British Museum into Linked Data
- describes the developed ETL tool "makeRDF": driven by config files, uses XPath, similar to XSLT but simpler
- nice intro to CRM
- recommended URI allocation:
[] : artifact (non-information resource)
[] : html representation (information resource for humans)
[] : rdf representation (information resource for computers)
[] : info about production (who when where created it)
- 14 GB of XML was exported from BM's collection system (Merlin). The export took several weeks (SSL says it would have been faster if they had known better)
- Merlin is a C/C+\+ system with a proprietary OO database
- BM has 2M objects, 560k of which have images. The CRM RDF has maybe 100 triples per object, so on the order of 200M triples in total
- RDF takes between 12 and 24 hours to assert depending on the triple store employed
- BM's Merlin model is based on the SPECTRUM standard for Museum Documentation. The functionality of other museum collection systems is also based on SPECTRUM
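
The config-driven XPath approach attributed to makeRDF above can be illustrated with a minimal sketch. This is not makeRDF's actual config format or code: the config keys, the {{ex:}} predicates, the base URI and the sample XML are all hypothetical, and the XPath used is only the limited subset supported by Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical config (NOT the real makeRDF format): each entry maps a
# relative XPath inside a record to a made-up predicate name.
CONFIG = {
    "record_path": "./object",                  # where records live in the export
    "id_path": "id",                            # element holding the record id
    "base_uri": "http://example.org/object/",   # placeholder URI prefix
    "fields": {
        "title": "ex:title",
        "material": "ex:material",
    },
}

def make_triples(xml_text, config):
    """Return (subject, predicate, literal) tuples for every configured field."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.findall(config["record_path"]):
        subject = config["base_uri"] + record.findtext(config["id_path"]).strip()
        for path, predicate in config["fields"].items():
            # A field may repeat (e.g. several materials); emit one triple each.
            for node in record.findall(path):
                if node.text and node.text.strip():
                    triples.append((subject, predicate, node.text.strip()))
    return triples

# Invented sample data, standing in for a Merlin XML export.
SAMPLE = """<export>
  <object><id>1</id><title>Bowl</title><material>clay</material><material>glaze</material></object>
  <object><id>2</id><title>Coin</title></object>
</export>"""

triples = make_triples(SAMPLE, CONFIG)
for t in triples:
    print(t)
```

Keeping the mapping in data rather than in code is what makes such a tool simpler than XSLT for this job: adding a field means adding one config line.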

h1. Need for CRM extensions
- CRM is a very rich ontology, but still not rich enough to capture collection data from different institutions
- BMX shows that extensions are needed
- the BM should drive towards unification and vigorously promote such extensions for CRM standardization, rather than relying on thesaurus mapping (mapping profiles) and co-reference, which make for more difficult reasoning

h1. BMX problems
BMX itself can be improved. Some examples (mostly from the area of measurements, e.g. width, height):
- Rather than using BM-specific URIs for units, it'd be better to map during conversion to ontologies that are more established in this area: NASA QUDT (Quantities, Units, and Dimensions) or OASIS UoM (Units of Measure)
- CRM uses a single number (the range of P90_has_value is E60_Number), while BMX extends this to an interval (PX.min_value and PX.max_value)
-- Because min/max are declared as sub-properties of P90_has_value, you end up with P90_has_value having two values
-- I think it's better to compute P90_has_value during conversion, e.g. to be the average of min and max (or equal to min if max is undefined)
-- See [Imprecise Begin-End (and general)] for further discussion
- CRM says that extensions should define properties as either sub-properties or "long-cut" paths of existing CRM properties. But BMX defines various properties that are neither sub-properties nor long-cuts.
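
The suggestion above to compute a single P90_has_value from the BMX interval during conversion could look roughly like the following. This is a sketch of the proposed rule only (average of min and max, or min if max is undefined); the function name is hypothetical, and rejecting an inverted interval is an added assumption, not something BMX or CRM prescribes.

```python
from typing import Optional

def single_p90_value(min_value: float, max_value: Optional[float] = None) -> float:
    """Collapse a (min, max) measurement interval into one number.

    Rule from the text: use the average of min and max, or just min
    when max is undefined.
    """
    if max_value is None:
        return min_value
    if max_value < min_value:
        # Assumption: treat an inverted interval as a data error.
        raise ValueError("max_value must not be smaller than min_value")
    return (min_value + max_value) / 2

print(single_p90_value(10, 20))   # interval with both bounds
print(single_p90_value(12.5))     # only the minimum is known
```

The converter would then emit this one number as P90_has_value and keep PX.min_value and PX.max_value alongside it, which only avoids the double value if min/max are no longer declared as sub-properties of P90_has_value.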