
RS-680

Name                 Type                   Size    Creator           Creation Date
BM-properties.xls    Microsoft Excel Sheet  58 kB   Vladimir Alexiev  Mar 08, 2013 18:13
json.log             Text File              470 kB  Vladimir Alexiev  Dec 19, 2012 18:27
properties.txt       Text File              2 kB    Vladimir Alexiev  Nov 19, 2012 15:07
superproperties.txt  Text File              12 kB   Vladimir Alexiev  Nov 09, 2012 17:23
BM-properties.pl     File                   0.9 kB  Vladimir Alexiev  Nov 09, 2012 17:22
BM-properties.png    PNG File               23 kB   Vladimir Alexiev  Nov 09, 2012 17:22

Intro

RS uses a set of "Business Properties" (currently about 100) for two crucial tasks:

  • Fetch Complete Museum Object
  • FTS indexing (molecules), using a custom process
    RS-977

Considerations

This is a subset of the more than 300 properties in the ontologies that we use:

  • CIDOC CRM has 264 properties (125 inverse pairs, 14 literal properties); and 86 classes
  • RSO has 35 properties; and 7 classes
  • 25 properties; and 4 classes
  • We also use a few properties from external ontologies: SKOS (thesauri), OAC (annotation), BIBO (bibliography), QUDT (units)

Considerations when making the list:

  • For each property we must decide whether we want it, one of its superproperties, or both (see Complete Museum Object#Remove Inferred Superproperty)
  • We should use superproperties whenever appropriate, to keep RForms simpler (more abstract)
  • All appropriate CRM properties have owl:inverseOf and we use inverse inference, so we can use properties either as explicitly stated, or in the opposite direction.
  • Leak Avoidance

Process, Tools, Files

Leak Avoidance

It is very important to avoid "leaks": property paths that go from one object to another (or many others).

  • Such leaks embed the data (or fulltext) of other objects into the root object, and are very undesirable.
  • Direction is important: eg we should follow P14_carried_out_by (to get the painting's author) but never its inverse P14i_performed (that would leak into all objects by the same author).
  • We must "comb" all properties so that they point from the object towards the periphery
  • Leaks are caused by properties that can form a loop (P14/P14i is an example of a trivial loop)
  • Subproperty inference should also be taken into consideration
    • superproperties.txt: list of immediate superproperties for each business property
    • TODO Vlado: figure out a query (using rdfs:subPropertyOf) to find loops automatically.
  • We cannot remove all looping properties from properties.txt (see examples below), so a key strategy is to cut off traversal at object collections and thesaurus terms:
    RS-1139
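
The loop check from the TODO above can be sketched offline in Python. This is only an illustration, not the SPARQL query the TODO asks for. The SUPER and INVERSE tables are hand-written samples (the real lists live in properties.txt and superproperties.txt), and the function names are hypothetical. A pair (p, q) is reported when something implied by p is the inverse of something implied by q, so following p and then q can arrive at a sibling object.

```python
# Sample data only: property -> immediate superproperties (rdfs:subPropertyOf).
SUPER = {
    "P14_carried_out_by": ["P11_had_participant"],
    "P11_had_participant": ["P12_occurred_in_the_presence_of"],
}
# Sample owl:inverseOf pairs, stated in one direction only.
INVERSE = {
    "P14_carried_out_by": "P14i_performed",
    "P11_had_participant": "P11i_participated_in",
    "P12_occurred_in_the_presence_of": "P12i_was_present_at",
    "P46_is_composed_of": "P46i_forms_part_of",
}


def ancestors(prop, supers):
    """Return prop together with all of its transitive superproperties."""
    seen, stack = set(), [prop]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(supers.get(p, []))
    return seen


def find_loops(business_props, supers, inverse):
    """Return sorted pairs of business properties that can form a loop."""
    inv = dict(inverse)
    inv.update({v: k for k, v in inverse.items()})  # make the table symmetric
    loops = set()
    for p in business_props:
        p_up = ancestors(p, supers)
        for q in business_props:
            q_up = ancestors(q, supers)
            # p implies a, q implies the inverse of a: p then q can loop back
            if any(inv.get(a) in q_up for a in p_up):
                loops.add(tuple(sorted((p, q))))
    return sorted(loops)


print(find_loops(["P11_had_participant", "P12i_was_present_at"], SUPER, INVERSE))
```

The second sample pair is the one behind the "Object Present at Another's Acquisition" leak: P11_had_participant is a subproperty of P12_occurred_in_the_presence_of, whose inverse is P12i_was_present_at. A reported pair is only a candidate; whether it actually leaks depends on where traversal is cut off.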

Examples of Looping Properties

  • crm:P138i_has_representation: <obj> P138i_has_representation <image>.
    crm:P138_represents: <obj> P65_shows_visual_item/P138_represents <person> or <place>
    • <image> leads to only 1 object, and we cut off at <person> or <place>
  • P12i_was_present_at: <obj> P12i_was_present_at <event> (eg exhibition, research)
    P11_had_participant: BM Association Mapping#Acquired Through (intermediary or contributor)
    • We cut off at BM (skos:Concept)
  • crm:P46_is_composed_of: <obj> P46_is_composed_of <part> (bell, case, dial…)
    crm:P46i_forms_part_of: <obj> P46i_forms_part_of Series/Exhibition/Collection
    • We cut off at Series/Collection (E78_Collection) or Exhibition (skos:Concept)
      RS-1138

Examples of Leaks

Examples of potential leaks, or leaks we had in the past:

Object Present at Another's Acquisition

obj1 – P12i_was_present_at -> obj1/acquisition – P11_had_participant<P12_occurred_in_the_presence_of
-> seller/buyer=BM – P12i_was_present_at -> obj2/acquisition

Resolve by cutting off at BM (skos:Concept). See more at Investigating FTS Molecules
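
The effect of the cut-off can be illustrated with a small Python sketch. It assumes a toy graph held in plain dicts; the node names, the EDGES/TYPES tables and the collect function are hypothetical and are not the RS implementation. It also assumes that a cut-off node is itself kept in the result but never expanded.

```python
from collections import deque

# Toy graph (assumption): node -> list of (property, target) edges to follow.
EDGES = {
    "obj1": [("P12i_was_present_at", "obj1/acquisition")],
    "obj1/acquisition": [("P11_had_participant", "BM")],
    "BM": [("P12i_was_present_at", "obj2/acquisition")],
    "obj2/acquisition": [("P12_occurred_in_the_presence_of", "obj2")],
}
# Toy type assertions (assumption): BM is a thesaurus term.
TYPES = {"BM": {"skos:Concept"}}
CUT_OFF_TYPES = {"skos:Concept", "crm:E78_Collection"}


def collect(root, edges, types, cut_off_types):
    """Breadth-first walk from root; cut-off nodes are kept but not expanded."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node != root and types.get(node, set()) & cut_off_types:
            continue  # stop here: do not follow edges out of a cut-off node
        for _prop, target in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


with_cut_off = collect("obj1", EDGES, TYPES, CUT_OFF_TYPES)
without_cut_off = collect("obj1", EDGES, TYPES, set())
print(sorted(with_cut_off))
print(sorted(without_cut_off))
```

With the cut-off, the walk stops at BM and never reaches obj2/acquisition; with an empty cut-off set, the same walk leaks into obj2.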

Object Part Of Collection

  • obj2 – P46i_forms_part_of -> BM_Collection.
    obj1/acquisition – P110_augmented -> BM_Collection.
    BM_Collection – P110i_was_augmented_by<P12i_was_present_at -> obj1/acquisition.
  • obj2 – FR12_was_present_at -> obj1/acquisition

Resolve by removing P46i_forms_part_of from FRs (cannot exclude by type E78_Collection in FR rules). See more at FR Implementation-old#BUG

Shows Features Of

RS-1160

  • RKD uses P130i:
    # <artistiek> relation to other artistic object
    <obj/2926> crm:P130_shows_features_of <obj/2926/related/1>.
    
  • BM uses P130:
    <obj> P130i_features_are_also_found_on <obj/original>.
    <obj/original> produced/took_place_at <place>
    

I initially thought that following both relations could create a loop/leak.
But because the referred object has only a few properties (and no relations of its own), there can be no leak.

bibo:Document not marked as skos:Concept

RS-700 #3
Bibliography references are not declared skos:Concept, so we spill over from an object to all objects referenced by the same bib.

  • Inference: <obj> P70i_is_documented_in <bibN> => <obj> P67i_is_referred_to_by <bibN> => <bibN> P67_refers_to <obj>
  • We chase P67 because: <obj> P128_carries <obj/concept/1>. <obj/concept/1> P67_refers_to <place>
  • Can be fixed by adding:
    PREFIX bibo: <http://purl.org/ontology/bibo/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    insert {?d a skos:Concept;
      skos:inScheme <http://collection.britishmuseum.org/id/bibliography>
    } where {?d a bibo:Document}
    

Shared Image

RS-772 #6
Out of 958035 total images, 57813 (6%) are shared between objects. This is expected.

Fix by cutting off at rso:E22_Museum_Object

Shadow Object with Shared Images

RS-1375
RFC2637 is an "Album: 238 photographs taken in Tibet".

  • It has 304 associated images:
    select (count(*) as ?c) {
      <http://collection.britishmuseum.org/id/object/RFC2637> crm:P138i_has_representation ?i}
    
  • 229 other objects (I guess individual photographs) share images with that album:
    select (count(distinct ?e) as ?c) {
      <http://collection.britishmuseum.org/id/object/RFC2637> crm:P138i_has_representation ?i.
      ?e crm:P138i_has_representation ?i.
      FILTER(?e != <http://collection.britishmuseum.org/id/object/RFC2637>)}
    

This is a more insidious case than Shared Image above:

  • RFC2637 is not amongst the 115k objects submitted by Josh
  • So it's only a "shadow of an object": it has image data but no other data
  • from the domain of bmo:PX_has_main_representation it is inferred as crm:E22_Man-Made_Object
  • but because it doesn't have P52_has_current_owner id:the-british-museum, we don't infer it as rso:E22_Museum_Object, so we have no way of knowing to cut off at this object
    select * {<http://collection.britishmuseum.org/id/object/RFC2637> a ?t}
    

As a result, each of the 229 photographs leaks through the Album to each other photograph.

  • the JSON result for "Load Complete Object" is 450k (json.log) instead of typical 20k
  • the FTS molecule is 199.5k, same for each photograph

Possible Solutions

  1. adding P52_has_current_owner to "shadow objects" is a bad idea because they would become searchable even though they have no data.
    • (Note: assets_*.trig use the same named graphs as real objects, so there'll be no duplicate statements)
  2. cutting off at crm:E22_Man-Made_Object doesn't work because parts and related objects are also marked E22:
    RKD: <obj/2926> crm:P130i_features_are_also_found_on <obj/2926/related/1>.
      <obj/2926/related/1> a crm:E22_Man-Made_Object.
    RKD: rso:P46_has_other_part <obj/2926/part/2>.
      <obj/2926/part/2> a crm:E22_Man-Made_Object;
    BM: <obj/part/M> a E22_Man-Made_Object; P46i_forms_part_of <obj>.
    
  3. the best solution is for Josh not to emit image assets for objects that are not included in the 115k
  4. we can hot-fix it by executing this update (in a SystemTransaction):
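    # remove image links from subjects that are not owned by the BM
    delete {?e bmo:PX_has_main_representation ?i}
    where {?e bmo:PX_has_main_representation ?i.
      filter not exists {?e crm:P52_has_current_owner id:the-british-museum}};
    delete {?e crm:P138i_has_representation ?i}
    where {?e crm:P138i_has_representation ?i.
      filter not exists {?e crm:P52_has_current_owner id:the-british-museum}};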
    

    NOTE: this kills RKD images, which don't have such owner.
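
The statements that such a hot-fix would remove can be previewed offline before it is run. The Python sketch below assumes the relevant triples have been exported as (subject, predicate, object) tuples. The sample subjects and image names are shorthand invented for illustration, and triples_to_delete is a hypothetical helper, not part of RS.

```python
OWNER = "crm:P52_has_current_owner"
BM = "id:the-british-museum"
IMAGE_PROPS = {"bmo:PX_has_main_representation", "crm:P138i_has_representation"}

# Sample triples (assumption): one real BM object, one shadow object, one RKD object.
TRIPLES = [
    ("obj/real", OWNER, BM),
    ("obj/real", "crm:P138i_has_representation", "img/1"),
    ("obj/RFC2637", "bmo:PX_has_main_representation", "img/2"),  # shadow object
    ("obj/2926", "crm:P138i_has_representation", "img/3"),  # RKD object, no BM owner
]


def triples_to_delete(triples):
    """Return image statements whose subject is not owned by the BM."""
    owned = {s for s, p, o in triples if p == OWNER and o == BM}
    return [(s, p, o) for s, p, o in triples
            if p in IMAGE_PROPS and s not in owned]


for triple in triples_to_delete(TRIPLES):
    print(triple)
```

In this sample the RKD statement is listed alongside the shadow object's, which is the side effect described in the NOTE above.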
