
Some Rembrandt data rework, related to harmonization and improvements

Way of working

  • Vlado makes spec; changes to susana.ttl and diff
  • Matthew makes changes to Migration
  • Jana makes changes to RForm templates
  • Mitac makes changes to EntityAPI (hopefully not many will be needed)

A lot of these are changes marked WILL (then DID) in Rembrandt Mapping Review.
Some were investigated in RS-323.

Color Status

In the next section we use the color-coded {status} macro to indicate where we are at.
When you're done with your part, please edit the page (in wiki mode!) and change the "colour=" parameter of the macro.

Gray: not applicable
Red: not yet done
Yellow: just about done, needs someone else's attention
Green: done

I have not added a box for EntityAPI (Mitac's part) since hopefully these will be few

Changes

Remove part/1

BM data doesn't have parts. For harmonization and simplification:

  • get rid of part/1, and put all its properties directly on the object.
    I've been assured this won't constitute lying about its production, creator, material
  • treat part/2 (the frame) as an accessory (less important) part.
    Keep its URI as is, no need to change.
  • has_number_of_parts: output 1 if there is a frame; output no property if there is no frame (Matthew, please take note)
  • get rid of rso:P46_has_main_part.
    • Keep rso:P46_has_other_part: needed by properties.txt.
    • We could compute this (as rso:P46_has_proper_part) from the standard property, then maybe use it to resolve the FR BUG:
spec diff mig rforms
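
The flattening described above can be sketched as follows. This is only an illustration, not the actual Migration code: triples are plain tuples, the object URIs and material values are made up, and has_number_of_parts is written without a prefix, as in the list above.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples,
# and every URI and value below is made up for illustration.

def flatten_main_part(triples, obj):
    """Move part/1 properties onto the object and recompute has_number_of_parts."""
    main_part = obj + "/part/1"
    frame = obj + "/part/2"
    out = []
    for s, p, o in triples:
        if p in ("rso:P46_has_main_part", "has_number_of_parts"):
            continue  # dropped; the part count is recomputed below
        if s == main_part:
            s = obj  # the property now sits directly on the object
        out.append((s, p, o))
    # The frame keeps its URI and stays linked as an accessory part.
    if (obj, "rso:P46_has_other_part", frame) in out:
        out.append((obj, "has_number_of_parts", "1"))
    return out


before = [
    ("obj/1", "rso:P46_has_main_part", "obj/1/part/1"),
    ("obj/1", "rso:P46_has_other_part", "obj/1/part/2"),
    ("obj/1", "has_number_of_parts", "2"),
    ("obj/1/part/1", "crm:P45_consists_of", "material/canvas"),
    ("obj/1/part/2", "crm:P45_consists_of", "material/wood"),
]
after = flatten_main_part(before, "obj/1")
for triple in after:
    print(triple)
```

An object without a frame ends up with no has_number_of_parts triple at all, which matches the note for Matthew.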

Update properties.txt

RS-92 uses a file of "business meaningful properties" to collect a complete Museum Object.

  • RS-934
  • Based on BM mapping and Rembrandt Changes, update this list
    RS-680
spec diff mig rforms

rdfs:label vs crm:P3_has_note

Following Martin's recommendation, BM will use rdfs:label for the main label of every node, and crm:P3_has_note only for auxiliary notes. This is useful, since we may decide to skip those P3_has_note values that merely duplicate structured data in a label, eg "Width :: 23.0".

However, this is not yet adopted by the CRM SIG, and it's unclear whether we're getting rid of P3_has_note altogether. So unless Jana says otherwise, we'll keep P3_has_note for Rembrandt.

spec diff mig rforms

Searchability and Display Fields

We cannot display search results including sub-objects (eg a Document or Related drawing that lacks most fields). That's why they should not be searchable, which we've accomplished by introducing E22_Museum_Object and marking only top-level objects with that class.

  • Rethink where we find the Display fields, so we can display both Rembrandt and BM objects
    RS-690
  • BM data doesn't include E22_Museum_Object, so we need a different criterion.

The FR rules internally use rso:FC70_Thing. We define this criterion for searchability (FR Implementation#FC70_Thing for RS):

  • rso:E22_Museum_Object, OR
  • crm:E22_Man-Made_Object, and the current keeper or owner is the BM
spec diff mig FR rules
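
A minimal sketch of the searchability criterion above, assuming the types and the keepers/owners of a node have already been fetched into sets. BM_URI and the other keeper URI are placeholders, not real identifiers, and the real check lives in the FR rules rather than in application code.

```python
# Sketch only: BM_URI is a placeholder, not the real BM identifier.
BM_URI = "bm:the-british-museum"


def is_searchable(types, keepers_and_owners):
    """Apply the two-branch criterion for rso:FC70_Thing."""
    if "rso:E22_Museum_Object" in types:
        return True  # Rembrandt top-level objects
    # BM objects: plain CRM class plus BM as current keeper or owner
    return "crm:E22_Man-Made_Object" in types and BM_URI in keepers_and_owners


print(is_searchable({"rso:E22_Museum_Object"}, set()))
print(is_searchable({"crm:E22_Man-Made_Object"}, {BM_URI}))
print(is_searchable({"crm:E22_Man-Made_Object"}, {"rkd:some-keeper"}))
```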

Disentangle RKD types

When two thesauri are mapped to P2_has_type, selecting New value in data annotation doesn't work, since the system cannot determine which thesaurus to use. Therefore sub-properties should be introduced. P2_has_type can be used as-is only if the node has a single "type".
We replace P2_has_type with the following:

  • Object:
    • rso:P2_has_object_type (rkd-object)
    • rso:P2_has_object_shape (rkd-shape)
  • Image: RS-627
    • rso:P129_has_iconclass (rst-iconclass)
    • rso:P129_has_keyword (rkd-keywords)
    • Added clause "P129_is_about E55_Type" to FR2_has_type, so we can search by IconClass/Keywords
  • Frame: rso:P2_has_object_type (rkd-object)
  • File:
    • rso:P2_has_object_status (rkd-objectstatus)
    • rso:P2_has_area_captured (rkd-area_captured: FRONT/BACK and OVERALL/DETAIL).
      Note: Jana, we cannot split rso:P2_has_area_captured to two separate fields for <file.spec.overall_detail> vs <file.spec.front_back>,
      since if you look in thesauri-all.ttl, rkd-area_captured has many tangled values, eg "whole (front)". So we leave 2 instances of this property, and leave it to the user to put two "compatible" values in them.
spec diff mig rforms
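
The replacement list above amounts to a lookup from (node kind, thesaurus) to sub-property. The sketch below only restates that list as a table; the function names are hypothetical, and falling back to crm:P2_has_type for unlisted pairs is an assumption.

```python
# Sketch only: the table restates the list above; helper names are hypothetical.
SUBPROPERTY = {
    ("Object", "rkd-object"): "rso:P2_has_object_type",
    ("Object", "rkd-shape"): "rso:P2_has_object_shape",
    ("Image", "rst-iconclass"): "rso:P129_has_iconclass",
    ("Image", "rkd-keywords"): "rso:P129_has_keyword",
    ("Frame", "rkd-object"): "rso:P2_has_object_type",
    ("File", "rkd-objectstatus"): "rso:P2_has_object_status",
    ("File", "rkd-area_captured"): "rso:P2_has_area_captured",
}


def disentangled_property(node_kind, thesaurus):
    """Pick the sub-property for a value; unlisted pairs keep P2_has_type (assumption)."""
    return SUBPROPERTY.get((node_kind, thesaurus), "crm:P2_has_type")


def thesaurus_for(node_kind, prop):
    """Reverse lookup: which thesaurus should 'New value' search for this property?"""
    for (kind, thesaurus), mapped in SUBPROPERTY.items():
        if kind == node_kind and mapped == prop:
            return thesaurus
    return None  # not determined by the property alone


print(disentangled_property("Object", "rkd-shape"))
print(thesaurus_for("File", "rso:P2_has_area_captured"))
```

Because each (node kind, sub-property) pair maps to exactly one thesaurus, the reverse lookup is unambiguous, which is the point of the disentangling.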

Disentangle BM Types

RS-716
77 occurrences of P2_has_type in config.xml. indicates conflict

N term: comments
22 thesauri/production/authoring..writing: production type
1 thesauri/production/retail: activity type
2 thesauri/production/{mus_object_production_person_association}, thesauri/production-place/{mus_object_production_place_association}: production probably/unlikely
13 dimension/circumference currency curvature depth diameter die-axis height length percentage thickness volume weight width
8 identifier: bigno cmcatno codexid grcatno otherid prn regno serialno
6 thesauri/(x12541,x5827,x6411,x6596,x6622): part type (bell, case, dial...)
5 find/[CEF]: Discovery type
4 acquisition/association/{bm_acq_name_ass} (Acquired From, On Loan From, Acquired Through, Motivated By)
3 thesauri/authority/{bm_as_(name,place,authority)_ass}: production P17_was_motivated_by person,place,authority
3 thesauri/association/(associatedwith,namedinscription): depicted/named place/person (P65_shows_visual_item/P138_represents)
2 thesauri/association/{bm_as_title_ass}: title type
2 thesauri/production/AJ: production by Related Group (Circle)
2 inscription/lettering type (IR, LE)
1 modification/RP (repaired)
1 aspect/{mus_obj_parts} (E25_Man-Made_Feature)
1 acquisition type (Treasure Trove)
1 thesauri/{mus_object_name_th_i}: object type
1 thesauri/{bm_ware_th_i}: ware
1 thesauri/{bm_escapement_th_i}: clock escapement

Josh, you need to introduce 3 subprops of P2_has_type for the last 3 thesauri, eg
bmo:PX_object_type, bmo:PX_ware, bmo:PX_escapement.
Else we'll have trouble displaying the object type (see next section):
if we fetch the superprop P2, we'll get not only BM type&ware&escapement, but also RKD type&shape (see prev section)
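
To illustrate the problem with fetching the superprop, here is a small sketch that mimics rdfs:subPropertyOf inference over plain tuples. The object URIs and type values are made up, and the single-level lookup is a simplification of what the repository does.

```python
# Sketch only: one level of sub-property inference over made-up data.
SUBPROPERTY_OF = {
    "bmo:PX_object_type": "crm:P2_has_type",
    "bmo:PX_ware": "crm:P2_has_type",
    "bmo:PX_escapement": "crm:P2_has_type",
    "rso:P2_has_object_type": "crm:P2_has_type",
    "rso:P2_has_object_shape": "crm:P2_has_type",
}

triples = [
    ("bm/obj/1", "bmo:PX_object_type", "bm-type/clock"),
    ("bm/obj/1", "bmo:PX_ware", "bm-ware/some-ware"),
    ("bm/obj/1", "bmo:PX_escapement", "bm-escapement/some-escapement"),
    ("rkd/obj/1", "rso:P2_has_object_type", "rkd-object/painting"),
    ("rkd/obj/1", "rso:P2_has_object_shape", "rkd-shape/rectangle"),
]


def fetch(triples, subject, prop):
    """Return values of prop for subject, including values of its sub-properties."""
    return [
        o for s, p, o in triples
        if s == subject and (p == prop or SUBPROPERTY_OF.get(p) == prop)
    ]


# The superprop mixes object type, ware and escapement:
print(fetch(triples, "bm/obj/1", "crm:P2_has_type"))
# The sub-property singles out the object type:
print(fetch(triples, "bm/obj/1", "bmo:PX_object_type"))
```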

Single-out Object Type

We display the object type (painting, coin, etc) as one of the display fields: RS-690.
For this reason it needs to be singled out amongst all other E55_Type's. How can we do that?

  1. We could leave it as the only field per node mapped to P2_has_type.
    But then to distinguish, two functions need to use explicit triples only (while of course, FRs use inferred triples):
    • "Fetch complete object data" for display
    • "Propose new value"
  2. We could map it to rso:P2_has_object_type (and a similar bmo: extension property), and filter by thesaurus (rkd-object and bm-thes-object respectively)

I like the second approach better since it doesn't rely on a particular query mode (and we may change that tomorrow).
Mitac & Jana, please give your opinion here.

  • Mitac: I probably don't fully understand the problem: crm:P2_has_type is not a chain property, only rso:PS2_has_type is. Thus when we fetch P2 for an object it will return its direct types (painting, coin). Both the explicit and the implicit triples match. So, we can simply show P2 while when searching we should search by PS2.
    • Vlado: PS2 will be removed in this iteration, to be replaced by FR2. Anyway, we're discussing the direct properties here, not the property chains. We disentangled all "type" properties, but will you single out the "object type" to show it amongst the "search result display fields"?
  • Joshan: I agree with the second solution also.  The thesaurus that the term belongs to defines what the semantics is (not just the predicate of P2_has_type).  Could this not be translated into a rule whereby if the ConceptScheme of a term used by P2_has_type is an 'object type thesaurus' (this may differ from org to org) then assert an rso:PS2_has_type.  As I mentioned to Vladimir (and as he has mentioned on many occasions), there seems to be an issue of modelling using CRM & providing display/search functionality.  How do we add extra ontology vocab to allow display & search?  I was suggesting:
    1. MODELLING: perform pure CRM modelling to model the dataset (irrelevant of display/search purposes)
    2. ENGINEERING: use rules (like the one above) to assert research space ontology entities to allow display/search functionality
  • Doing this would allow a clean separation between modelling and application requirements by using rules.  I'm aware this may be an ideal and too simplistic, but could it solve the majority of issues like this?
    • Vlado: whether we implement the "singling-out" with a rule or a query is a second decision. We're still discussing how to represent the "object type" field.
    • BTW Josh you will have to do the same disentangling as we did (eg by defining BMO extension properties of P2), else data annotation won't work over BM data. See the detailed explanation below.

Let's explain here why one property should not have multiple values from several thesauri.

  • Jana: The reason was a design decision that I have never been too happy with - to recognize thesaurus for new values from the existing values.
    • That is, if we need to propose a new value for has_current_keeper, for example, we fetch existing values, get the thesaurus they belong to and present the user with autocomplete component that searches values exactly from that thesaurus.
    • Keep in mind that for Stage 4 this won't work as we will have to implement entering objects from scratch. As a long term solution, we need some meta information about what thesauri are suitable for each property.
    • As a short term improvement, we may suggest values from multiple thesauri in the new value autocomplete - if needed - similar to the way it will be done in the search sentence.
  • Vlado: I know all this, but I wanted an in-depth answer to my question above.
    What's the problem for P2_has_type to have value1 from thes1 and value2 from thes2? Can't the user propose a value new2 for value1 from thes1, and separately for thes2?
    Now that I asked the question explicitly, I can answer it myself. There's no problem proposing a new value, but two new values will get tangled. The system cannot know whether new1 applies to value1 or value2, since the association is by rdf:subject and rdf:predicate, but not by thesaurus.
  • Jana: "value new2 for value1" sounds fine if we replace the value. When we propose an addition (or create data from scratch) this won't work. Otherwise, we link new1-value1 in an annotation as new/old values. The new values are not shown in rforms anyway until they get accepted and replace the old values.
    • Introducing P2_has_object_type seems fine to me. Not because of the new value problem above (we may run into it at many other places) but because we clearly put some specific semantics there - like planning for different rforms templates per type or searching by type.
    • I prefer P2_has_object_type
spec diff mig rforms
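
The thesaurus detection that Jana describes (take the thesaurus of the existing values and search only there) can be sketched roughly as below. The function name, the in_scheme dictionary standing in for skos:inScheme, and the sample values are all hypothetical.

```python
# Sketch only: in_scheme stands in for skos:inScheme lookups; values are made up.
in_scheme = {
    "painting": "rkd-object",
    "rectangle": "rkd-shape",
}


def thesaurus_for_new_value(existing_values, in_scheme):
    """Return the single thesaurus of the existing values, or None if undetermined."""
    schemes = {in_scheme[v] for v in existing_values if v in in_scheme}
    if len(schemes) == 1:
        return schemes.pop()
    return None  # no thesaurus at all, or several tangled together


# Tangled: P2_has_type holds values from two thesauri.
print(thesaurus_for_new_value(["painting", "rectangle"], in_scheme))
# Disentangled: rso:P2_has_object_type holds values from one thesaurus only.
print(thesaurus_for_new_value(["painting"], in_scheme))
```

A value that has no thesaurus at all also yields None, so no autocomplete can be offered for it.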

Unchangeable P2_has_type

Jana, please note that the following P2_has_type properties are unchangeable (fixed).
They don't come from a thesaurus, i.e. don't have skos:inScheme, so a new value cannot be proposed.

(This is for information only, no change)

Business-specific sub-properties

Maria RS-273 :
We planned initially that data in a record will be grouped into sections: Basics, Parts, Exhibitions, Auctions, Collections, etc.
But RForms cannot create different sections (lists) based on P2_has_type of a node: it can distinguish only based on relation.

  • Make business-specific sub-properties
    • rso:P12i_was_present_at_exhibition (sub-property of crm:P12i_was_present_at, inverse rso:P12_exhibited)
    • rso:P12i_was_present_at_research (sub-property of crm:P12i_was_present_at, inverse rso:P12_researched)
    • rso:P24i_changed_ownership_through_auction (inverse rso:P12_auctioned)
  • change P11 to P14 for auction house:
  • Maria/Jana may specify more sub-properties, if needed
spec diff mig rforms
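
Since RForms can distinguish only by relation, the sub-properties above are what lets a record be split into sections. A rough sketch, assuming a hypothetical mapping from sub-property to section name; the "Research" section name and the "Basics" fallback are assumptions, and the event URIs are made up.

```python
# Sketch only: section names for research and the fallback are assumptions.
SECTION_BY_RELATION = {
    "rso:P12i_was_present_at_exhibition": "Exhibitions",
    "rso:P12i_was_present_at_research": "Research",
    "rso:P24i_changed_ownership_through_auction": "Auctions",
}


def group_into_sections(triples, obj):
    """Group the values of an object's properties into display sections by relation."""
    sections = {}
    for s, p, o in triples:
        if s != obj:
            continue
        section = SECTION_BY_RELATION.get(p, "Basics")
        sections.setdefault(section, []).append(o)
    return sections


record = [
    ("obj/1", "rso:P12i_was_present_at_exhibition", "event/exhibition-1"),
    ("obj/1", "rso:P24i_changed_ownership_through_auction", "event/auction-1"),
    ("obj/1", "rso:P2_has_object_type", "rkd-object/painting"),
]
print(group_into_sections(record, "obj/1"))
```

With the generic crm:P12i_was_present_at, exhibition and research events would land in the same list, which is the limitation described above.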

Thesaurus changes

  • add IconClass code as SKOS Notation:

    (currently will not be used by ResearchSpace but could be)

  • Verify that Rembrandt and BM thesauri satisfy BMX Issues#Thesaurus Requirements, and make appropriate changes

Rembrandt thesauri:

  • thesauri.ttl
  • thesauri-all.ttl
  • thesauri-disposition.ttl
  • thesauri-extracted.ttl
  • thesauri-place.ttl
    • rkd-places: replace P89_falls_within with P88i_forms_part_of, else FR won't work (this is a bug)
spec diff mig rforms
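
The rkd-places fix is a plain predicate rename. A minimal sketch over tuples, assuming both properties carry the crm: prefix and using made-up place URIs; the real change is made in thesauri-place.ttl itself.

```python
# Sketch only: the crm: prefix and the place URIs are assumptions.
OLD = "crm:P89_falls_within"
NEW = "crm:P88i_forms_part_of"


def fix_place_hierarchy(triples):
    """Rename P89_falls_within to P88i_forms_part_of, leaving other triples untouched."""
    return [(s, NEW if p == OLD else p, o) for s, p, o in triples]


places = [
    ("rkd-places/city-1", "crm:P89_falls_within", "rkd-places/country-1"),
    ("rkd-places/city-1", "rdfs:label", "Some city"),
]
fixed = fix_place_hierarchy(places)
for triple in fixed:
    print(triple)
```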

BM thesauri: RS-700

Diffs

Name                Size   Creator           Creation Date        Comment
rso.ttl.patch       5 kB   Vladimir Alexiev  May 25, 2012 23:06
susana.ttl.patch    10 kB  Vladimir Alexiev  May 25, 2012 23:06
thesauri.ttl.patch  3 kB   Vladimir Alexiev  May 25, 2012 23:07
  • You can view these with TortoiseUDiff (part of TortoiseSVN)
  • Below we show the most important parts as images, taken from the same program
  • I personally prefer to view Tortoise>Diff with Araxis Merge, which shows word/char-level changes
    Setup: rclick> TortoiseSVN> Settings> External Programs> Diff viewer> External
    "C:\Program Files (x86)\Araxis Merge v6.5\compare.exe" /max /wait /title1:%bname /title2:%yname %base %mine

rso.ttl
(diff image)

susana.ttl
(diff image)

thesauri.ttl
(diff image)
etc

thesauri-place.ttl
(diff image)
etc

  1. May 28, 2012

    In susana.ttl: crm:P2_has_object_type - should be rso:P2_has_object_type?

    1. Jun 08, 2012

      Fixed crm:P2_has_object_type to rso:P2_has_object_type (twice)