- Problem: Can CRM be Used as a Data Schema
- Problem1: How to display a complex CIDOC CRM object network?
- Problem2: How to edit a complex CIDOC CRM object network?
- Possible solutions
(Vlado) CIDOC CRM is a complex beast: 86 classes and 137 properties, which can be combined in a huge number of ways. It's no coincidence that the CIDOC CRM#Full representation is split into 34 screens (subject areas).
- For a while now I've had the uneasy feeling that one can't easily build a system using CIDOC CRM as its data model. And I don't mean import/export, I mean view/edit of complex CIDOC CRM networks.
(Mike) I have the same view. The role of the CIDOC CRM is to allow mappings between different systems by reference to the common model. Taking this view, ResearchSpace should define or adopt an appropriate schema, for instance LIDO or CDWA Lite, on which to base the RS tools. With the help of the CIDOC CRM we should be able to produce mappings from the schemas of contributing systems to the RS schema "relatively easily". The complexity will depend on how elaborate the two schemas in question are and which elements are actually to be mapped. For instance, any collections management system will contain information to do with the management of the collection, such as "the last time it was checked that the object was in place". This sort of information could, no doubt, be expressed in the CIDOC CRM, but I'm guessing it is not of interest to ResearchSpace.
- (Aside: there may be a tendency on the part of the ResearchSpace customer to say "we don't know what we don't want at this stage so let's take everything". That way lies madness; however, we've sometimes satisfied this by taking everything and holding it, but only making a selection of elements generally visible. That way, if the central schema changes, it's relatively easy to expose the new information. I also hope that RDF and linked data offer us better ways of managing this sort of thing - just add the appropriate elements into the ontology to bring new data elements into play - but I have much to learn!)
- There would be a case for a tool which could help create such mappings but I think that is outside of the scope of the current project. It would be good to agree how such mappings are defined (XSLT, RDF, etc) either adopting a standard if we can find one or proposing such a standard.
- this is task "02 UI/Navigation> UI framework for displaying objects (read/only)"
- How to display individual fields? How to group them in tabs/groups/sections? How to display multiple occurrences? Where to stop navigating the network?
- Screen description/generation? Leverage ITT generation experience.
- ECS2 used schema/screen descriptions in Excel
- EMS21 used schema/screen descriptions in a custom tool (SRAD)
- XForms uses XSD schemas and XML/XPath screen descriptions, focusing on logical GUI description (binding to data, required/constraint/notApplicable). Visuals are tacked on with "appearance" attributes and CSS
- but how do you do it for RDF data and RDFS/OWL schema?
- this is task "12 Data Entry/Input Editing>UI framework for editing objects (read/write)"
- it's an even harder task than View, so it must leverage the previous one. One way to leverage:
- Generate the views in HTML+RDFa, binding back to the RDF
- Then RDFAuthor (from OntoWiki) or VIE (from IKS and Apache Stanbol) "magically" make the view editable with JS (see Data Entry Tools).
We have to try it
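To make the "generate views in HTML+RDFa" idea concrete, here is a minimal sketch of what such a generator might emit. It is an assumption-laden illustration, not the RS design: the `render_rdfa` helper, the placeholder namespace URIs and the sample object are all invented here. Only the RDFa attributes themselves (`prefix`, `about`, `typeof`, `property`) are standard.

```python
from html import escape

# Placeholder namespace URIs, for illustration only (not the real CRM namespace).
PREFIXES = {
    "crm": "http://example.org/crm/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def render_rdfa(subject, rdf_type, fields):
    """Render one resource as an HTML fragment annotated with RDFa.

    fields is a list of (label, property CURIE, literal value) tuples.
    """
    prefix_attr = " ".join(f"{p}: {uri}" for p, uri in PREFIXES.items())
    rows = [
        f"  <div><label>{escape(label)}</label> "
        f'<span property="{escape(prop)}">{escape(value)}</span></div>'
        for label, prop, value in fields
    ]
    return (
        f'<div prefix="{escape(prefix_attr)}" about="{escape(subject)}" '
        f'typeof="{escape(rdf_type)}">\n' + "\n".join(rows) + "\n</div>"
    )


print(render_rdfa(
    "http://example.org/object/1",           # hypothetical object URI
    "crm:E24_Physical_Man_Made_Thing",
    [("Title", "rdfs:label", "Self-portrait")],
))
```

Because every displayed value carries its subject and property, a client-side editor has enough information to map an edited `<span>` back to the triple it came from, which is what the "binding back to the RDF" above relies on.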
Eg think about this simple case: how to change the Painter of a Painting.
- According to object_production@crmg and mark_inscription@crmg, it may be represented by fragments like this
CRM Image and Production
E38 Image=>E36 Visual Item --> P65B is shown by --> E24 Physical Man-Made Thing --> P108B was produced by --> E12 Production=>E7 Activity --> P14 carried out by --> E39 Actor<=E21 Person
(maybe I am wrong, since E24 could be eg a T-Shirt showing a reproduction, not the original painting... that's not the point)
- we want the painter to be selected from a thesaurus (not a random string or URI)
- so the painter is both a crm:E21_Person, and a skos:Concept from a particular thesaurus (eg skos:inScheme ulan:scheme)
- and it should have an extra property or parent indicating he's a Painter. Eg it "P127 has broader term" ulan:painters
- therefore we need a rule like this:
To change the target of crm:P108B_was_produced_by
- look at the source (an E24 Physical Man-Made Thing). If it "P65 shows" an "E38 Image" (it's a painting) then:
- invoke a widget that searches for skos:Concept's under ulan:painters (i.e. ulan:painters "P127B has narrower term" ?concept)
My head hurts!
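The context-dependent rule above can be sketched in plain Python over a toy set of triples. This is only an illustration under stated assumptions: the `ex:` resources, the individual ULAN members, `ulan:sculptors` and all helper names are invented here, the `skos:Concept` typing is omitted for brevity, and a real implementation would query the RDF store rather than a Python set.

```python
# Toy graph as a set of (subject, predicate, object) triples.
# All "ex:" identifiers and the individual "ulan:" members are made up.
GRAPH = {
    ("ex:painting1", "rdf:type", "crm:E24_Physical_Man_Made_Thing"),
    ("ex:painting1", "crm:P65_shows", "ex:image1"),
    ("ex:image1", "rdf:type", "crm:E38_Image"),
    ("ex:painting1", "crm:P108B_was_produced_by", "ex:production1"),
    ("ex:production1", "crm:P14_carried_out_by", "ulan:rembrandt"),
    ("ulan:rembrandt", "crm:P127_has_broader_term", "ulan:painters"),
    ("ulan:vermeer", "crm:P127_has_broader_term", "ulan:painters"),
    ("ulan:rodin", "crm:P127_has_broader_term", "ulan:sculptors"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def is_painting(graph, thing):
    """True if the thing 'P65 shows' something typed as E38 Image."""
    return any(
        "crm:E38_Image" in objects(graph, shown, "rdf:type")
        for shown in objects(graph, thing, "crm:P65_shows")
    )


def narrower_concepts(graph, broader):
    """Direct children only: ?c crm:P127_has_broader_term broader."""
    return {
        s for (s, p, o) in graph
        if p == "crm:P127_has_broader_term" and o == broader
    }


def painter_candidates(graph, thing):
    """Choose the search scope of the picker widget from the source's context."""
    if is_painting(graph, thing):
        return narrower_concepts(graph, "ulan:painters")
    return None  # no rule matched; a real UI would need some fallback


print(sorted(painter_candidates(GRAPH, "ex:painting1")))
```

Even in this toy form the rule has to inspect two hops of context (`P65 shows`, then the type of the shown item) before it can decide which widget to open, which is the complexity being complained about.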
- or maybe we specialize like so (Not that I think this is a good idea!). "rso" stands for "ResearchSpace Ontology"
- rso:Painter is a sub-class of crm:E21_Person, and a skos:Concept having crm:P127_has_broader_term ulan:painters
- rso:Painting is a sub-class of crm:E24_Physical_Man_Made_Thing, having property crm:P65_shows to class crm:E38_Image
- rso:was_painted_by is a sub-property of crm:P108B_was_produced_by with domain rso:Painting and range rso:Painter
which simplifies the rule to "preserve object class":
To change the target of rso:was_painted_by
- invoke a widget that searches for rso:Painter
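With the specialised classes, the rule no longer needs to look at the source's context: it can be driven by the declared range of the property. The sketch below assumes the `rso` terms exactly as listed above; the instance data and the helper names `range_of` and `candidates` are hypothetical, and no subclass inference is attempted.

```python
# Schema triples for the hypothetical "rso" ontology described above.
SCHEMA = {
    ("rso:Painter", "rdfs:subClassOf", "crm:E21_Person"),
    ("rso:Painting", "rdfs:subClassOf", "crm:E24_Physical_Man_Made_Thing"),
    ("rso:was_painted_by", "rdfs:subPropertyOf", "crm:P108B_was_produced_by"),
    ("rso:was_painted_by", "rdfs:domain", "rso:Painting"),
    ("rso:was_painted_by", "rdfs:range", "rso:Painter"),
}

# Made-up instance data for illustration.
DATA = {
    ("ulan:rembrandt", "rdf:type", "rso:Painter"),
    ("ulan:vermeer", "rdf:type", "rso:Painter"),
    ("ulan:rodin", "rdf:type", "crm:E21_Person"),
}


def range_of(schema, prop):
    """Declared rdfs:range of a property, or None if the schema has none."""
    for s, p, o in schema:
        if s == prop and p == "rdfs:range":
            return o
    return None


def candidates(schema, data, prop):
    """Generic rule: offer every direct instance of the property's range."""
    target_class = range_of(schema, prop)
    return {s for (s, p, o) in data if p == "rdf:type" and o == target_class}


print(sorted(candidates(SCHEMA, DATA, "rso:was_painted_by")))
```

The same generic function serves every property that has a declared range, so the per-case rule logic moves out of the UI code and into the ontology, at the cost of multiplying classes and properties.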
The CRM Introduction is important since it explains the purpose, applicability, modeling principles, and what is "CRM-compatible". Some interesting quotes:
- "Objectives of the CIDOC CRM: ... enable information exchange and integration between heterogeneous sources of cultural heritage information"
- "it is not optimised to implementation-specific storage and processing aspects"
- "Users of the CRM should be aware that the definition of data entry systems requires support of community-specific terminology, guidance to what should be documented and in which sequence, and application-specific consistency controls. The CRM does not provide such notions."
- CRM Compatibility: "CRM leaves room both for extensions, needed to capture the full richness of cultural information, and for simplifications, required for reasons of economy" (what I call in the plan "CRM application profile")
- "An information system is export-compatible if it is possible to export all user data from this information system into an import-compatible data structure. This capability is the recommended kind of CRM-compatibility for local information systems" (i.e. data entry systems)
- "An information system is import-compatible if it is possible to import data encoded in a CRM-compatible form and to access the data in a manner equivalent to and homogeneous with all generic data of this system that fall under the same concepts. This capability is considered as the normal kind of CRM compatibility for integrated access systems that physically copy source data in a data warehouse style"
RS is both a warehouse (imports objects from a bunch of museums) and a data entry system (edit objects, create annotations). How to support both?
These concerns are confirmed by Martin Doerr on Thu 10/6/2011 5:31 PM at the CRM SIG mailing list:
Please note that what the user finds in a user interface is explicitly not the concern of the CRM. ONLY because such concerns have been excluded, the CRM could ever be standardized. Your GUI has to provide the adequate "filters". The CRM has NEVER been recommended as a data entry form!
This comment was not solicited; it was given on an unrelated topic, so I guess these concerns are not new to Martin.
The complications illustrated above suggest that we need to build on a more specific data model. Below we describe some options.
(Vlado) use a more concrete XSD schema or ontology such as LIDO
(Dominic) Use LIDO for data representation (of search results) and for data entry purposes.
In this system different object/content types would be defined with a core set of fields derived from the LIDO specification. Specialisations could be added to a separate category, and/or a project could configure the default representations/data entry forms to conform to its particular requirements. This could be linked to a more detailed CRM representation, which could also try to represent an 'object', or could be dependent on where the user jumped to the CRM representation from the LIDO screen. The CRM view could be more of a browsing interface, allowing users to explore aspects of the object and jump back into full LIDO representations from a particular point of the CRM browse.
(Mariana) Although more complex, CIDOC-CRM gives more expressive power, and there is a lot of support around it, including a large international community. As a compromise we can use the approach of reference layers and interlink LIDO with CIDOC-CRM and thus allow flexible usage of the two schemata.
The FRs for CRM Search are all based on CIDOC-CRM, so if we are to consider implementing FRs we had better have representations with CIDOC-CRM.
(Vlado) So the idea is to use both a more concrete schema and CRM, and somehow interlink the two
- The FRs for CRM Search are a similar idea
- Mariana has experience mapping various dataset-specific relations (eg DBpedia, FreeBase, etc) to a common upper-level (PROTON) for FactForge
Define a sort of "application profile" for CRM, i.e. define what CRM graph patterns (subset of all possible CRM patterns) we will use.
But what mechanism to use for this?
Mitac was thinking along the lines of an Entity API
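One conceivable mechanism, offered only as a sketch and not as something the team has agreed on, is to write the profile down as a whitelist of (domain class, property, range class) patterns and check instance data against it. The two patterns, the `ex:` data and the `outside_profile` helper are invented for illustration; subclass reasoning is deliberately left out.

```python
# Hypothetical profile: the only CRM patterns the application would support.
PROFILE = {
    ("crm:E24_Physical_Man_Made_Thing", "crm:P108B_was_produced_by", "crm:E12_Production"),
    ("crm:E12_Production", "crm:P14_carried_out_by", "crm:E21_Person"),
}

# Made-up instance data; the last triple deliberately falls outside the profile.
GRAPH = {
    ("ex:obj1", "rdf:type", "crm:E24_Physical_Man_Made_Thing"),
    ("ex:prod1", "rdf:type", "crm:E12_Production"),
    ("ex:person1", "rdf:type", "crm:E21_Person"),
    ("ex:obj1", "crm:P108B_was_produced_by", "ex:prod1"),
    ("ex:prod1", "crm:P14_carried_out_by", "ex:person1"),
    ("ex:obj1", "crm:P14_carried_out_by", "ex:person1"),
}


def types_of(graph, node):
    """Asserted rdf:type values of a node (no inference)."""
    return {o for (s, p, o) in graph if s == node and p == "rdf:type"}


def outside_profile(graph, profile):
    """Return the non-typing triples that match no allowed pattern."""
    rejected = []
    for s, p, o in sorted(graph):
        if p == "rdf:type":
            continue
        allowed = any(
            prop == p and dom in types_of(graph, s) and rng in types_of(graph, o)
            for (dom, prop, rng) in profile
        )
        if not allowed:
            rejected.append((s, p, o))
    return rejected


print(outside_profile(GRAPH, PROFILE))
```

The same pattern list could in principle drive screen generation as well as validation, since each pattern names a field the UI would have to know how to display.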
(Mike Selway) Delineate parts of the graph using some sort of "macro" approach.
But how to recognize the "macro regions"?
(Vlado) RS3 may provide editing only of atomic values, but no reconfiguration of the RDF graph.
Whether this limitation can work should be confirmed by use case elaboration.
Please note that the RDF structure is predetermined by the original XML (and data conversion rules).
- If a value (eg Author) is missing in the original XML data but should be editable: it should be (should have been) present as a blank XML element. Then it will create a blank RDF node that can be edited (TODO: think through the mechanics of this)
- Unfortunately we don't have an XSD schema for the Rembrandt data, else we could think about tacking a UI on top of such a schema
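As a first pass at the mechanics of the blank-element idea, the sketch below converts a tiny XML record into triples so that an empty element still yields a node with an empty value, and editing then touches only that atomic value. Everything here is assumed for illustration: the XML shape, the `ex:` property names, the blank-node labels and both helper functions. The real conversion rules may look quite different.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical source record: <author/> is present but blank.
XML = "<object id='obj1'><title>Self-portrait</title><author/></object>"


def convert(xml_text):
    """Give every child element its own node; empty elements become empty slots."""
    root = ET.fromstring(xml_text)
    subject = "ex:" + root.get("id")
    blank_ids = count(1)
    triples = []
    for child in root:
        node = f"_:b{next(blank_ids)}"
        value = (child.text or "").strip()
        triples.append((subject, "ex:" + child.tag, node))
        triples.append((node, "ex:value", value))  # "" marks a slot awaiting input
    return triples


def set_value(triples, node, new_value):
    """Edit one atomic value; the shape of the graph is left untouched."""
    return [
        (s, p, new_value) if (s == node and p == "ex:value") else (s, p, o)
        for (s, p, o) in triples
    ]


triples = convert(XML)
edited = set_value(triples, "_:b2", "Rembrandt")
print(edited)
```

Note that `set_value` never adds or removes triples, which matches the proposed restriction to editing atomic values without reconfiguring the graph.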