Some Rembrandt data rework, related to harmonization and improvements

Way of working

  • Vlado makes spec; changes to susana.ttl and diff
  • Matthew makes changes to Migration
  • Jana makes changes to RForm templates
  • Mitac makes changes to EntityAPI (hopefully not many will be needed)

A lot of these are changes marked WILL (then DID) in Rembrandt Mapping Review. Some were investigated in RS-323@jira

Color Status

In the next section we use the color-coded {status} macro to indicate where we are at.
When you're done with your part, please edit (in wiki mode!) and change "colour="

Gray: not applicable
Red: not yet done
Yellow: just about done, needs someone else's attention
Green: done

I have not added a box for EntityAPI (Mitac's part) since hopefully these will be few

Changes

Remove part/1

BM data doesn't have parts. For harmonization and simplification:

  • get rid of part/1, and put all its properties directly on the object.
    I've been assured this won't constitute lying about its production, creator, material
  • treat part/2 (the frame) as an accessory (less important) part.
    Keep its URI as is, no need to change.
  • has_number_of_parts: output 1 if there is a frame; omit the property if there is no frame (Matthew, please take note)
  • get rid of rso:P46_has_main_part
    • Keep rso:P46_has_other_part: needed by properties.txt.
    • We could compute this (as rso:P46_has_proper_part) from the standard property, then maybe use it to resolve the [FR BUG]:
spec diff mig rforms
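
The flattening described above could be sketched roughly as follows, in Python over plain (subject, predicate, object) tuples. This is only an illustration, not the Migration code: the `flatten_main_part` helper, the sample URIs and the unprefixed `has_number_of_parts` string are assumptions.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples.
MAIN_PART = "rso:P46_has_main_part"
OTHER_PART = "rso:P46_has_other_part"
NUM_PARTS = "has_number_of_parts"  # prefix omitted, as in the text above


def flatten_main_part(triples):
    """Move the properties of part/1 onto the object and recompute the part count."""
    # Map each main part (part/1) to the object that owns it.
    owner = {o: s for s, p, o in triples if p == MAIN_PART}
    result = []
    for s, p, o in triples:
        if p in (MAIN_PART, NUM_PARTS):
            continue  # drop the part/1 link; the count is recomputed below
        s = owner.get(s, s)  # re-attach part/1 properties to the object
        if (s, p, o) not in result:
            result.append((s, p, o))
    # Output 1 if the object has a frame (an "other part"), else no property.
    for obj in sorted({s for s, p, o in result if p == OTHER_PART}):
        result.append((obj, NUM_PARTS, 1))
    return result


# Hypothetical sample data: a painting on canvas with a frame.
sample = [
    ("obj/1", MAIN_PART, "obj/1/part/1"),
    ("obj/1", OTHER_PART, "obj/1/part/2"),
    ("obj/1/part/1", "crm:P45_consists_of", "material/canvas"),
]
for triple in flatten_main_part(sample):
    print(triple)
```

The frame (part/2) keeps its URI and its own properties; only the properties of part/1 change subject.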

Update properties.txt

RS-92 uses a file of "business meaningful properties" to collect a complete Museum Object. Based on BM mapping and Rembrandt Changes, update this list

spec diff mig rforms

rdfs:label vs crm:P3_has_note

Following Martin's recommendation, BM will use rdfs:label for the main label of every node, and crm:P3_has_note only for auxiliary notes. This is useful, since we may decide to skip those of their P3_has_note values that merely duplicate structured data in label form, e.g. "Width :: 23.0".

However, this is not yet adopted by the CRM SIG, and it's unclear whether we're getting rid of P3_has_note altogether. So unless Jana says otherwise, we'll keep P3_has_note for Rembrandt.

spec diff mig rforms

Searchability and Display Fields

We cannot display search results that include sub-objects (e.g. a Document or Related drawing that lacks most fields). That's why sub-objects should not be searchable, which we've accomplished by introducing E22_Museum_Object and marking only top-level objects with that class.

  • Rethink where we find the Display fields, so we can display both Rembrandt and BM objects
  • BM data doesn't include E22_Museum_Object
    • If possible we should formulate a different criterion for searchability, but I don't think CRM has such a notion of "top-level or independent object"
    • If not, we should add such a class to BM data
spec diff mig rforms
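
As a rough illustration of the current searchability criterion, the sketch below keeps only the nodes explicitly typed as E22_Museum_Object. The `searchable_objects` helper and the sample URIs are assumptions, and the class name is written without a prefix, as in the text.

```python
MUSEUM_OBJECT = "E22_Museum_Object"  # prefix omitted, as in the text above


def searchable_objects(triples):
    """Return the subjects explicitly typed as E22_Museum_Object, sorted."""
    return sorted({s for s, p, o in triples if p == "rdf:type" and o == MUSEUM_OBJECT})


# Hypothetical data: a top-level object and a related drawing (a sub-object).
sample = [
    ("obj/1", "rdf:type", MUSEUM_OBJECT),
    ("obj/1/drawing/1", "rdf:type", "crm:E22_Man-Made_Object"),
]
print(searchable_objects(sample))
```

With BM data as it stands, this filter would return nothing, which is why either a different criterion or an added class is needed.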

Disentangle P2_has_type by introducing sub-properties

When two thesauri are mapped to P2_has_type, selecting New value in data annotation doesn't work since it cannot determine which thesaurus to use. Therefore sub-properties should be introduced. P2_has_type can be used as-is only if the node has a single "type".
We replace P2_has_type with the following:

  • Object:
    • rso:P2_has_object_type (rkd-object)
    • rso:P2_has_object_shape (rkd-shape)
  • Image: (RS-627@jira "Cannot determine thesaurus for image type")
    • rso:P129_has_iconclass (rst-iconclass)
    • rso:P129_has_keyword (rkd-keywords)
    • TODO Vlado: add clause "P129_is_about E55_Type" to FR2_has_type, else we can't search by IconClass/Keywords
  • Frame: rso:P2_has_object_type (rkd-object)
  • File:
    • rso:P2_has_object_status (rkd-objectstatus)
    • rso:P2_has_area_captured (rkd-area_captured: FRONT/BACK and OVERALL/DETAIL).
      Note: Jana, we cannot split rso:P2_has_area_captured to two separate fields for <file.spec.overall_detail> vs <file.spec.front_back>,
      since if you look in thesauri-all.ttl, rkd-area_captured has many tangled values, eg "whole (front)". So we leave 2 instances of this property, and leave it to the user to put two "compatible" values in them.
spec diff mig rforms
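
The replacement list above amounts to a lookup from (node kind, thesaurus) to sub-property. A minimal sketch, assuming the node kinds can be named by plain strings; the helper names `sub_property` and `thesaurus_for` are hypothetical:

```python
# (node kind, thesaurus) -> sub-property, copied from the list above.
SUB_PROPERTIES = {
    ("Object", "rkd-object"): "rso:P2_has_object_type",
    ("Object", "rkd-shape"): "rso:P2_has_object_shape",
    ("Image", "rst-iconclass"): "rso:P129_has_iconclass",
    ("Image", "rkd-keywords"): "rso:P129_has_keyword",
    ("Frame", "rkd-object"): "rso:P2_has_object_type",
    ("File", "rkd-objectstatus"): "rso:P2_has_object_status",
    ("File", "rkd-area_captured"): "rso:P2_has_area_captured",
}


def sub_property(node_kind, thesaurus):
    """Which sub-property replaces P2_has_type for this node kind and thesaurus."""
    return SUB_PROPERTIES[(node_kind, thesaurus)]


def thesaurus_for(node_kind, prop):
    """Inverse lookup: the single thesaurus to search when proposing a new value."""
    matches = [t for (k, t), p in SUB_PROPERTIES.items() if k == node_kind and p == prop]
    if len(matches) != 1:
        raise ValueError(f"cannot determine thesaurus for {node_kind} / {prop}")
    return matches[0]


print(sub_property("Image", "rkd-keywords"))
print(thesaurus_for("File", "rso:P2_has_area_captured"))
```

Because each sub-property maps back to exactly one thesaurus per node kind, the "New value" lookup no longer has to guess.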

Single-out Object Type

We display the object type (painting, coin, etc) as one of the display fields ([RS-690]).
For this reason it needs to be singled out amongst all other E55_Type's. How can we do that?

  1. We could leave it as the only field per node mapped to P2_has_type.
    But then to distinguish, two functions need to use explicit triples only (while of course, FRs use inferred triples):
    • "Fetch complete object data" for display
    • "Propose new value"
  2. We could map it to rso:P2_has_object_type (and a similar bmo: extension property), and filter by thesaurus (rkd-object and bm-thes-object respectively)

I like the second approach better since it doesn't rely on a particular query mode (and we may change that tomorrow).
Mitac & Jana, please give your opinion here.

  • Mitac: I probably don't fully understand the problem: crm:P2_has_type is not a chain property, only rso:PS2_has_type is. Thus when we fetch P2 for an object it will return its direct types (painting, coin). Both the explicit and the implicit triples match. So we can simply show P2 for display, and search by PS2 when searching.
  • Joshan: I also agree with the second solution. The thesaurus that the term belongs to defines what the semantics is (not just the P2_has_type predicate). Could this not be translated into a rule whereby, if the ConceptScheme of a term used by P2_has_type is an 'object type thesaurus' (this may differ from org to org), we assert an rso:PS2_has_type? As I mentioned to Vladimir (and as he has mentioned on many occasions), there seems to be an issue in combining modelling using CRM with providing display/search functionality. How do we add extra ontology vocabulary to allow display and search? I was suggesting:
    1. MODELLING: perform pure CRM modelling to model the dataset (irrespective of display/search purposes)
    2. ENGINEERING: use rules (like the one above) to assert ResearchSpace ontology entities to allow display/search functionality

Doing this would allow a clean separation between modelling and application requirements by using rules. I'm aware this may be idealistic and too simplistic, but could it solve the majority of issues like this?
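
The rule Joshan describes could look roughly like the sketch below: derive an application-level triple whenever a P2_has_type value belongs to an object type thesaurus. The asserted property is a parameter, defaulting to rso:P2_has_object_type from option 2; the function name, the sample terms and the use of plain tuples instead of a rule engine are assumptions.

```python
# Thesauri treated as "object type thesauri"; this set may differ per organisation.
OBJECT_TYPE_SCHEMES = {"rkd-object", "bm-thes-object"}


def assert_object_types(triples, target="rso:P2_has_object_type"):
    """Infer `target` triples for P2_has_type values in an object type thesaurus."""
    object_type_terms = {
        s for s, p, o in triples if p == "skos:inScheme" and o in OBJECT_TYPE_SCHEMES
    }
    return [
        (s, target, o)
        for s, p, o in triples
        if p == "crm:P2_has_type" and o in object_type_terms
    ]


# Hypothetical data: one object with a type term and a shape term.
sample = [
    ("obj/1", "crm:P2_has_type", "term/painting"),
    ("obj/1", "crm:P2_has_type", "term/rectangular"),
    ("term/painting", "skos:inScheme", "rkd-object"),
    ("term/rectangular", "skos:inScheme", "rkd-shape"),
]
print(assert_object_types(sample))
```

The source data stays pure CRM (only crm:P2_has_type), and the extra triples are derived on the engineering side.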

Jana, can you explain here why one property should not have multiple values from several thesauri?

Jana: The reason was a design decision that I have never been too happy with: to infer the thesaurus for new values from the existing values.

That is, if we need to propose a new value for has_current_keeper, for example, we fetch the existing values, get the thesaurus they belong to, and present the user with an autocomplete component that searches only that thesaurus.

Keep in mind that for Stage 4 this won't work as we will have to implement entering objects from scratch. As a long term solution, we need some meta information about what thesauri are suitable for each property.

As a short term improvement, we may suggest values from multiple thesauri in the new value autocomplete - if needed - similar to the way it will be done in the search sentence.
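
To make the three variants concrete, here is a small sketch of how the autocomplete could pick its thesauri: from the existing values (current behaviour), from several thesauri at once (short-term improvement), and from per-property metadata when there are no existing values (long-term solution). The function, the `property_thesauri` table and the sample thesaurus name are hypothetical.

```python
def thesauri_for_new_value(prop, existing_values, in_scheme, property_thesauri):
    """Pick the thesauri the autocomplete should search for a new value of `prop`."""
    # Current behaviour: infer the thesaurus from the existing values.
    # Collecting a set also covers suggesting values from several thesauri.
    found = {in_scheme[v] for v in existing_values if v in in_scheme}
    if found:
        return sorted(found)
    # Long-term idea: fall back to per-property metadata, e.g. for a new, empty object.
    return sorted(property_thesauri.get(prop, ()))


# Hypothetical data: term -> thesaurus (skos:inScheme), and property -> thesauri.
in_scheme = {"term/rijksmuseum": "rkd-keeper"}
property_thesauri = {"has_current_keeper": ["rkd-keeper"]}

print(thesauri_for_new_value("has_current_keeper", ["term/rijksmuseum"], in_scheme, property_thesauri))
print(thesauri_for_new_value("has_current_keeper", [], in_scheme, property_thesauri))
```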

spec diff mig rforms

Unchangeable P2_has_type

Jana, please note that the following P2_has_type properties are unchangeable (fixed).
They don't come from a thesaurus, i.e. don't have skos:inScheme, so a new value cannot be proposed.

(This is for information only, no change)

Business-specific sub-properties

Maria RS-273@jira: we planned initially that data in a record will be grouped into sections: Basics, Parts, Exhibitions, Auctions, Collections, etc.
But RForms cannot create different sections (lists) based on P2_has_type of a node: it can distinguish only based on relation.

  • Make business-specific sub-properties
    • rso:P12i_was_present_at_exhibition (sub-property of crm:P12i_was_present_at, inverse rso:P12_exhibited)
    • rso:P12i_was_present_at_research (sub-property of crm:P12i_was_present_at, inverse rso:P12_researched)
    • rso:P24i_changed_ownership_through_auction (inverse rso:P12_auctioned)
  • change P11 to P14 for auction house:
  • Maria/Jana to specify whether more sub-properties are needed
spec diff mig rforms
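
Since RForms can only distinguish by relation, the business-specific sub-properties let a record be split into sections with a plain lookup on the predicate. A minimal sketch, in which the "Research" section name, the default "Basics" section and the `group_into_sections` helper are assumptions:

```python
from collections import defaultdict

# Section names other than Exhibitions and Auctions are assumptions.
SECTION_BY_PROPERTY = {
    "rso:P12i_was_present_at_exhibition": "Exhibitions",
    "rso:P12i_was_present_at_research": "Research",
    "rso:P24i_changed_ownership_through_auction": "Auctions",
}
DEFAULT_SECTION = "Basics"


def group_into_sections(obj, triples):
    """Group the values of `obj` by section, keyed on the relation only."""
    sections = defaultdict(list)
    for s, p, o in triples:
        if s == obj:
            sections[SECTION_BY_PROPERTY.get(p, DEFAULT_SECTION)].append(o)
    return dict(sections)


# Hypothetical record.
sample = [
    ("obj/1", "rso:P12i_was_present_at_exhibition", "exhibition/1"),
    ("obj/1", "rso:P24i_changed_ownership_through_auction", "auction/1"),
    ("obj/1", "rdfs:label", "Self-portrait"),
]
print(group_into_sections("obj/1", sample))
```

With only crm:P12i_was_present_at, the same lookup would need the P2_has_type of the target node, which is exactly what RForms cannot use.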

Thesaurus changes

  • add IconClass code as SKOS Notation:

    (currently will not be used by ResearchSpace but could be)

  • Verify that Rembrandt and BM thesauri satisfy BMX Issues#Thesaurus Requirements, and make appropriate changes

Rembrandt thesauri:

  • thesauri.ttl
  • thesauri-all.ttl
  • thesauri-disposition.ttl
  • thesauri-extracted.ttl
  • thesauri-place.ttl
    • rkd-places: replace P89_falls_within with P88i_forms_part_of, else FR won't work (this is a bug)
spec diff mig rforms
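
The rkd-places fix is a plain predicate rename. A sketch over (subject, predicate, object) tuples, assuming both properties carry the crm: prefix (the text gives them unprefixed); the helper name and sample places are hypothetical:

```python
OLD = "crm:P89_falls_within"     # prefix is an assumption
NEW = "crm:P88i_forms_part_of"   # prefix is an assumption


def fix_place_hierarchy(triples):
    """Replace P89_falls_within with P88i_forms_part_of, leaving other triples as is."""
    return [(s, NEW if p == OLD else p, o) for s, p, o in triples]


# Hypothetical rkd-places data.
sample = [
    ("place/amsterdam", OLD, "place/netherlands"),
    ("place/amsterdam", "skos:inScheme", "rkd-places"),
]
print(fix_place_hierarchy(sample))
```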

BM thesauri: RS-700@jira

Diffs

  • You can view these with TortoiseUDiff (part of TortoiseSVN)
  • Below we show the most important parts as images, taken from the same program
  • I personally prefer to view Tortoise>Diff with Araxis Merge, which shows word/char-level changes
    Setup: rclick> TortoiseSVN> Settings> External Programs> Diff viewer> External
    "C:\Program Files (x86)\Araxis Merge v6.5\compare.exe" /max /wait /title1:%bname /title2:%yname %base %mine

rso.ttl

susana.ttl

thesauri.ttl

etc

thesauri-place.ttl

etc
