
Some Rembrandt data rework, related to harmonization and improvements
Way of working
- Vlado makes spec; changes to susana.ttl and diff
- Matthew makes changes to Migration
- Jana makes changes to RForm templates
- Mitac makes changes to EntityAPI (hopefully not many will be needed)
A lot of these are changes marked WILL (then DID) in Rembrandt Mapping Review. Some were investigated in RS-323@jira
Color Status
In the next section we use the color-coded {status} macro to indicate where we are at.
When you're done with your part, please edit (in wiki mode!) and change "colour="
Gray: not applicable
Red: not yet done
Yellow: just about done, needs someone else's attention
Green: done
I have not added a box for EntityAPI (Mitac's part) since hopefully these will be few
Changes
Remove part/1
BM data doesn't have parts. For harmonization and simplification:
- get rid of part/1, and put all its properties directly on the object. I've been assured this won't constitute lying about its production, creator, material.
- treat part/2 (the frame) as an accessory (less important) part. Keep its URI as is, no need to change.
- has_number_of_parts: output 1 if there is a frame; no property if there is no frame (Matthew please take note)
- get rid of rso:P46_has_main_part.
- Keep rso:P46_has_other_part: needed by properties.txt.
- We could compute this (as rso:P46_has_proper_part) from the standard property, then maybe use it to resolve the [FR BUG]:
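The flattening described above can be sketched in Python, assuming triples are held as plain (subject, predicate, object) tuples. The function name, the sample URIs and the unprefixed has_number_of_parts name are illustrative assumptions, not the actual Migration code.

```python
MAIN_PART = "rso:P46_has_main_part"
OTHER_PART = "rso:P46_has_other_part"
NUM_PARTS = "has_number_of_parts"  # written as in the spec; its prefix is not stated there


def flatten_main_part(triples, obj, main_part):
    """Move all properties of main_part onto obj and drop the main-part link."""
    result = set()
    for s, p, o in triples:
        if (s, p, o) == (obj, MAIN_PART, main_part):
            continue  # drop the object -> part/1 link
        if s == obj and p == NUM_PARTS:
            continue  # the count is recomputed below
        if s == main_part:
            s = obj  # re-home part/1's properties on the object itself
        result.add((s, p, o))
    # Output 1 only if an accessory part (the frame) remains; otherwise no property.
    if any(s == obj and p == OTHER_PART for s, p, o in result):
        result.add((obj, NUM_PARTS, "1"))
    return result


# Hypothetical sample data: a painting with part/1 (main) and part/2 (frame).
data = {
    ("obj/1", MAIN_PART, "obj/1/part/1"),
    ("obj/1", OTHER_PART, "obj/1/part/2"),
    ("obj/1/part/1", "crm:P45_consists_of", "material/oak"),
    ("obj/1/part/2", "rso:P2_has_object_type", "rkd-object/frame"),
}
out = flatten_main_part(data, "obj/1", "obj/1/part/1")
```

The frame (part/2) keeps its URI and its own properties; only part/1 disappears as a node.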
Update properties.txt
RS-92 uses a file of "business meaningful properties" to collect a complete Museum Object. Based on BM mapping and Rembrandt Changes, update this list
spec diff mig rforms
rdfs:label vs crm:P3_has_note
Following Martin's recommendation, BM will use rdfs:label for the main label of every node, and crm:P3_has_note only for auxiliary notes. This is useful, since we may decide to skip their P3_has_note that duplicate structured data in a label, eg "Width :: 23.0"
However, this is not yet adopted by the CRM SIG, and it's unclear whether we're getting rid of P3_has_note altogether. So unless Jana says otherwise, we'll keep P3_has_note for Rembrandt
spec diff mig rforms
Searchability and Display Fields
We cannot display search results including sub-objects (eg a Document or Related drawing that lacks most fields). That's why they should not be searchable, which we've accomplished by introducing E22_Museum_Object and marking only top-level objects with that class.
- Rethink where we find the Display fields, so we can display both Rembrandt and BM objects
- BM data doesn't include E22_Museum_Object
- If possible we should formulate a different criterion for searchability, but I don't think CRM has such a notion of "top-level or independent object"
- If not, we should add such a class to BM data
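To illustrate the current criterion and why BM data falls through it, here is a small Python sketch over (subject, predicate, object) tuples. The rso: prefix on E22_Museum_Object and all sample URIs are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"
TOP_LEVEL_CLASS = "rso:E22_Museum_Object"  # assumed prefix; the text gives only the local name


def searchable(triples):
    """Return the subjects explicitly marked as top-level museum objects."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == TOP_LEVEL_CLASS}


# Hypothetical Rembrandt data: the object is marked, its sub-object is not.
rembrandt = {
    ("rem/obj/1", RDF_TYPE, TOP_LEVEL_CLASS),
    ("rem/obj/1/doc/1", RDF_TYPE, "crm:E31_Document"),
}
# Hypothetical BM data: only the standard CRM class is present.
bm = {("bm/obj/7", RDF_TYPE, "crm:E22_Man-Made_Object")}

rembrandt_hits = searchable(rembrandt)
bm_hits = searchable(bm)
```

With this criterion the BM object is not found at all, which is why either a different criterion or an added class is needed.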
Disentangle P2_has_type by introducing sub-properties
When two thesauri are mapped to P2_has_type, selecting New value in data annotation doesn't work since it cannot determine which thesaurus to use. Therefore sub-properties should be introduced. P2_has_type can be used as-is only if the node has a single "type".
We replace P2_has_type with the following:
- Object:
- rso:P2_has_object_type (rkd-object)
- rso:P2_has_object_shape (rkd-shape)
- Image: (RS-627@jira "Cannot determine thesaurus for image type")
- rso:P129_has_iconclass (rst-iconclass)
- rso:P129_has_keyword (rkd-keywords)
TODO Vlado: add clause "P129_is_about E55_Type" to FR2_has_type, else we can't search by IconClass/Keywords
- Frame: rso:P2_has_object_type (rkd-object)
- File:
- rso:P2_has_object_status (rkd-objectstatus)
- rso:P2_has_area_captured (rkd-area_captured: FRONT/BACK and OVERALL/DETAIL).
Note: Jana, we cannot split rso:P2_has_area_captured to two separate fields for <file.spec.overall_detail> vs <file.spec.front_back>,
since if you look in thesauri-all.ttl, rkd-area_captured has many tangled values, eg "whole (front)". So we leave 2 instances of this property, and leave it to the user to put two "compatible" values in them.
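The replacement table above can be read as a lookup from (node kind, thesaurus) to sub-property. The Python sketch below encodes exactly the pairs listed; the function name and the fallback to plain crm:P2_has_type for unlisted pairs are assumptions for illustration.

```python
# (node kind, thesaurus) -> sub-property, as listed in the spec above.
SUBPROPERTY = {
    ("Object", "rkd-object"): "rso:P2_has_object_type",
    ("Object", "rkd-shape"): "rso:P2_has_object_shape",
    ("Image", "rst-iconclass"): "rso:P129_has_iconclass",
    ("Image", "rkd-keywords"): "rso:P129_has_keyword",
    ("Frame", "rkd-object"): "rso:P2_has_object_type",
    ("File", "rkd-objectstatus"): "rso:P2_has_object_status",
    ("File", "rkd-area_captured"): "rso:P2_has_area_captured",
}


def disentangled_property(node_kind, thesaurus):
    """Pick the sub-property for a value, given its node kind and thesaurus.

    Assumption: pairs not in the table keep plain crm:P2_has_type, which is
    only safe when the node has a single "type".
    """
    return SUBPROPERTY.get((node_kind, thesaurus), "crm:P2_has_type")
```

Because each sub-property is tied to one thesaurus, a new value proposed for it can only come from that thesaurus.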
Single-out Object Type
We display the object type (painting, coin, etc) as one of the display fields ([RS-690]).
For this reason it needs to be singled out amongst all other E55_Type's. How can we do that?
- We could leave it as the only field per node mapped to P2_has_type.
But then to distinguish, two functions need to use explicit triples only (while of course, FRs use inferred triples):
- "Fetch complete object data" for display
- "Propose new value"
- We could map it to rso:P2_has_object_type (and a similar bmo: extension property), and filter by thesaurus (rkd-object and bm-thes-object respectively)
I like the second approach better since it doesn't rely on a particular query mode (and we may change that tomorrow).
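A rough Python sketch of the second approach, assuming triples as (subject, predicate, object) tuples and a dict standing in for skos:inScheme. The bmo: property name, the function name and the sample URIs are hypothetical.

```python
# Dedicated property -> thesaurus that its values must belong to.
OBJECT_TYPE_SCHEME = {
    "rso:P2_has_object_type": "rkd-object",
    "bmo:P2_has_object_type": "bm-thes-object",  # hypothetical name for the bmo: extension property
}


def display_object_types(triples, in_scheme, obj):
    """Return the object types of obj to show as a display field."""
    return sorted(
        o
        for s, p, o in triples
        if s == obj
        and p in OBJECT_TYPE_SCHEME
        and in_scheme.get(o) == OBJECT_TYPE_SCHEME[p]
    )


# Hypothetical sample: one object type and one shape on the same object.
triples = {
    ("obj/1", "rso:P2_has_object_type", "rkd-object/painting"),
    ("obj/1", "rso:P2_has_object_shape", "rkd-shape/rectangle"),
}
in_scheme = {
    "rkd-object/painting": "rkd-object",
    "rkd-shape/rectangle": "rkd-shape",
}
shown = display_object_types(triples, in_scheme, "obj/1")
```

The filter depends only on the property and the thesaurus, not on whether the triples were fetched in explicit or inferred mode.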
Mitac & Jana, please give your opinion here.
- Mitac: I probably don't fully understand the problem: crm:P2_has_type is not a chain property, only rso:PS2_has_type is. Thus when we fetch P2 for an object it will return its direct types (painting, coin). Both the explicit and the implicit triples match. So, we can simply show P2 while when searching we should search by PS2.
- Vlado: PS2 will be removed in this iteration, to be replaced by FR2. Anyway, we're discussing the direct properties here, not the property chains. We disentangled all "type" properties, but will you single out the "object type" to show it amongst the "search result display fields"?
- Joshan: I agree with the second solution also. The thesaurus that the term belongs to defines what the semantics is (not just the predicate of P2_has_type). Could this not be translated into a rule whereby if the ConceptScheme of a term used by P2_has_type is an 'object type thesaurus' (this may differ from org to org) then assert an rso:PS2_has_type. As I mentioned to Vladimir (and as he has mentioned on many occasions), there seems to be an issue of modelling using CRM & providing display/search functionality. How do we add extra ontology vocab to allow display & search? I was suggesting:
- MODELLING: perform pure CRM modelling to model the dataset (irrespective of display/search purposes)
- ENGINEERING: use rules (like the one above) to assert research space ontology entities to allow display/search functionality
- Doing this would allow a clean separation between modelling and application requirements by using rules. I'm aware this may be an ideal and too simplistic, but could it solve the majority of issues like this?
- Vlado: whether we implement the "singling-out" with a rule or a query is a second decision. We're still discussing how to represent the "object type" field.
- BTW Josh you will have to do the same disentangling as we did (eg by defining BMO extension properties of P2), else data annotation won't work over BM data. See the detailed explanation below.
Jana, can you explain here why one property should not have multiple values from several thesauri?
- Jana: The reason was a design decision that I have never been too happy with - to recognize thesaurus for new values from the existing values.
- That is, if we need to propose a new value for has_current_keeper, for example, we fetch existing values, get the thesaurus they belong to and present the user with autocomplete component that searches values exactly from that thesaurus.
- Keep in mind that for Stage 4 this won't work as we will have to implement entering objects from scratch. As a long term solution, we need some meta information about what thesauri are suitable for each property.
- As a short term improvement, we may suggest values from multiple thesauri in the new value autocomplete - if needed - similar to the way it will be done in the search sentence.
- Vlado: I know all this, but I wanted an in-depth answer to my question above.
What's the problem for P2_has_type to have value1 from thes1 and value2 from thes2? Can't the user propose a value new2 for value1 from thes1, and separately for thes2?
Now that I asked the question explicitly, I can answer it myself. There's no problem to propose a new value, but two new values will get tangled. The system cannot know whether new1 applies to value1 or value2, since the association is by rdf:subject and rdf:predicate, but not by thesaurus.
- Jana, please comment which alternative above you prefer.
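The thesaurus detection Jana describes, and the way it breaks when one property mixes thesauri, can be sketched in Python as follows. The function name and sample URIs are hypothetical, and a plain dict stands in for skos:inScheme; this is not the actual RForms code.

```python
def thesaurus_for_new_value(triples, in_scheme, subject, predicate):
    """Infer the autocomplete thesaurus from the existing values of a property.

    Returns None when the existing values come from zero or several
    thesauri, since the choice is then undetermined.
    """
    schemes = {
        in_scheme[o]
        for s, p, o in triples
        if s == subject and p == predicate and o in in_scheme
    }
    return schemes.pop() if len(schemes) == 1 else None


in_scheme = {
    "rkd-object/painting": "rkd-object",
    "rkd-shape/rectangle": "rkd-shape",
}

# Tangled: both values sit on the same (subject, predicate) pair.
tangled = {
    ("obj/1", "crm:P2_has_type", "rkd-object/painting"),
    ("obj/1", "crm:P2_has_type", "rkd-shape/rectangle"),
}
# Disentangled: one sub-property per thesaurus.
disentangled = {
    ("obj/1", "rso:P2_has_object_type", "rkd-object/painting"),
    ("obj/1", "rso:P2_has_object_shape", "rkd-shape/rectangle"),
}

ambiguous = thesaurus_for_new_value(tangled, in_scheme, "obj/1", "crm:P2_has_type")
resolved = thesaurus_for_new_value(disentangled, in_scheme, "obj/1", "rso:P2_has_object_type")
```

In the tangled case the key (subject, predicate) is shared by both values, so neither the thesaurus nor the value being replaced can be determined; with sub-properties each key maps to one thesaurus.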
Unchangeable P2_has_type
Jana, please note that the following P2_has_type properties are unchangeable (fixed).
They don't come from a thesaurus, i.e. don't have skos:inScheme, so a new value cannot be proposed.
(This is for information only, no change)
Business-specific sub-properties
Maria RS-273@jira: we planned initially that data in a record will be grouped into sections: Basics, Parts, Exhibitions, Auctions, Collections, etc.
But RForms cannot create different sections (lists) based on P2_has_type of a node: it can distinguish only based on relation.
- Make business-specific sub-properties
- rso:P12i_was_present_at_exhibition (sub-property of crm:P12i_was_present_at, inverse rso:P12_exhibited)
- rso:P12i_was_present_at_research (sub-property of crm:P12i_was_present_at, inverse rso:P12_researched)
- rso:P24i_changed_ownership_through_auction (inverse rso:P12_auctioned)
- change P11 to P14 for auction house:
Maria/Jana to specify whether more sub-properties are needed
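Grouping by relation, as opposed to by the P2_has_type of the target node, can be sketched in Python like this. The section names come from the list in RS-273; the mapping of properties to sections, the "Basics" default and the sample URIs are assumptions for illustration.

```python
from collections import defaultdict

# Business-specific sub-property -> record section (assumed mapping).
SECTION_BY_PROPERTY = {
    "rso:P12i_was_present_at_exhibition": "Exhibitions",
    "rso:P24i_changed_ownership_through_auction": "Auctions",
}


def group_into_sections(triples, obj):
    """Group the values of obj into sections, keyed only on the relation used."""
    sections = defaultdict(list)
    for s, p, o in triples:
        if s == obj:
            # Assumption: anything without a dedicated sub-property goes to Basics.
            sections[SECTION_BY_PROPERTY.get(p, "Basics")].append(o)
    return dict(sections)


# Hypothetical record.
record = [
    ("obj/1", "rso:P12i_was_present_at_exhibition", "exhibition/1"),
    ("obj/1", "rso:P24i_changed_ownership_through_auction", "auction/1"),
    ("obj/1", "rso:P2_has_object_type", "rkd-object/painting"),
]
sections = group_into_sections(record, "obj/1")
```

With only the generic crm:P12i_was_present_at, exhibitions and research events would land in the same list, which is the limitation the sub-properties work around.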
Thesaurus changes
- add IconClass code as SKOS Notation:
(currently will not be used by ResearchSpace but could be)
- Verify that Rembrandt and BM thesauri satisfy BMX Issues#Thesaurus Requirements, and make appropriate changes
Rembrandt thesauri:
- thesauri.ttl
- thesauri-all.ttl
- thesauri-disposition.ttl
- thesauri-extracted.ttl
- thesauri-place.ttl
- rkd-places: replace P89_falls_within with P88i_forms_part_of, else FR won't work (this is a bug)
BM thesauri: RS-700@jira
Diffs
- You can view these with TortoiseUDiff (part of TortoiseSVN)
- Below we show the most important parts as images, taken from the same program
- I personally prefer to view Tortoise>Diff with Araxis Merge, which shows word/char-level changes
Setup: rclick> TortoiseSVN> Settings> External Programs> Diff viewer> External
"C:\Program Files (x86)\Araxis Merge v6.5\compare.exe" /max /wait /title1:%bname /title2:%yname %base %mine
rso.ttl
susana.ttl
thesauri.ttl
etc
thesauri-place.ttl
etc