
Questions, issues and considerations on BMX and BM CRM mapping

Attachments (name, type, size, creator, creation date):
  • bm_ontology-clean.pl (File, 0.7 kB), Vladimir Alexiev, Nov 19, 2012 15:11
  • builtin_RdfsRules-optimized-with-in... (File, 4 kB), Vladimir Alexiev, Sep 17, 2012 10:01
  • aspects.csv (Microsoft Excel Sheet, 3 kB), joshan.mahmud, Sep 05, 2012 16:23
  • BM-data-pretty.pl (File, 1.0 kB), Vladimir Alexiev, Aug 17, 2012 20:59
  • test.rdf (File, 2 kB), Vladimir Alexiev, May 10, 2012 10:59
  • test.ttl (File, 1 kB), Vladimir Alexiev, May 10, 2012 10:59

Need for extensions

CRM is a very rich ontology, but still not rich enough to capture collection data from different institutions. BMX shows that extensions are needed.

  • Museums should drive towards unification and vigorously promote such extensions for CRM standardization; otherwise effective interoperability cannot happen.
  • Thesaurus mapping (mapping profiles) and co-reference can correlate individuals, but cannot harmonize data structures.

This page outlines BMX issues that in many cases stem from CRM and have to be addressed by any mapping.
It is not intended to bash BMX, but to contribute towards a better mapping that can lead to better interoperability between museum collections mapped to CRM.
The topics have been developed in more detail in Ontology Patterns, and below we link to appropriate sub-pages.

Extension principles

A compatibility condition from the CRM Specification introduction is: "all properties of the extension are either subsumed by CRM properties, or are part of a path for which a CRM property is a shortcut"

  • So extensions should be sub-classes of CRM entities, or sub-properties or long-cuts of CRM properties
  • The purpose of this principle is to allow applications that speak only pure CRM to still query a repository that has CRM extensions

BMX uses some extensions that go against the principle (properties that are neither sub-properties, nor long-cuts)
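The compatibility condition can be checked mechanically. The Python sketch below assumes the extension ontology has been reduced to (subject, predicate, object) tuples of prefixed names (no RDF parser involved); the triples are hypothetical, built from property names mentioned on this page. It only tests the sub-property half of the condition; long-cuts would need a separate check.

```python
# Hypothetical extension declarations as (subject, predicate, object) tuples.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

EXTENSION_TRIPLES = [
    ("bmx:PX.has_copyright", SUB_PROPERTY_OF, "crm:P3F.has_note"),
    ("bmx:PX.min_value", SUB_PROPERTY_OF, "crm:P90F.has_value"),
    ("bmx:PX.max_value", SUB_PROPERTY_OF, "crm:P90F.has_value"),
    ("bmx:PX.time-span_earliest", "rdf:type", "rdf:Property"),  # no CRM super-property
]


def is_crm(term):
    """Treat anything in the crm: namespace as a CRM term."""
    return term.startswith("crm:")


def crm_ancestors(prop, triples):
    """All CRM properties reachable from prop by following rdfs:subPropertyOf."""
    parents = {}
    for s, p, o in triples:
        if p == SUB_PROPERTY_OF:
            parents.setdefault(s, set()).add(o)
    seen, stack = set(), [prop]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return {p for p in seen if is_crm(p)}


def violations(triples):
    """Extension properties that have no CRM super-property at all."""
    declared = {s for s, _, _ in triples if not is_crm(s)}
    return sorted(p for p in declared if not crm_ancestors(p, triples))


print(violations(EXTENSION_TRIPLES))  # ['bmx:PX.time-span_earliest']
```

With these inputs it flags bmx:PX.time-span_earliest, the kind of property discussed under Imprecise Dates.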

BM Response
A better construction of the BMX entities will be addressed, i.e.:

  • Properties will be sub-properties of CRM properties
  • Classes will be sub-classes of CRM classes
  • The extensions ontology will be asserted into the store

RDF-related

Prefixes

  • Non-standard prefix
    All CRM entities and properties have prefix http://collection.britishmuseum.org/id/crm.
    But this doesn't match any of the proposed standardized RDF representations (CIDOC RDFS, Erlangen CRM, BloodyByte OWL2).
    CIDOC RDFS doesn't define a specific prefix/base, but Martin said (31 May 2012 in Crete) they will.

BM Response
Agreed to move the data to Erlangen CRM (current), as that seems a more open and shared way of utilising CIDOC CRM. Jonathan Whistson Cloud (BM) will make contact with Erlangen.

  • the endpoint doesn't return defined prefixes, but auto-generated ones, eg
    @prefix ns4: <crm/bm-extensions/>.

    This makes understanding harder (a small nuisance)

BM Response
This was due to the old RDF store we used. The move to OWLIM will solve this problem.

  • this prefix is not good because of the hash:
    @prefix crmx: <http://erlangen-crm.org/current/#> .

    It leads to these property URIs:

    <http://erlangen-crm.org/current/#P82a_begin_of_the_begin>
    <http://erlangen-crm.org/current/#P82b_end_of_the_end>
    

    The hash (fragment) says to look for such elements in the file <http://erlangen-crm.org/current/>, and you won't find any.
    Use these props with the standard prefix, and let's hope the Erlangen people will add them soon (following my email):

    crm:P82a_begin_of_the_begin, crm:P82b_end_of_the_end
  • remove crmx: and other unused prefixes:
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix quantity: <http://data.nasa.gov/qudt/owl/quantity#> .
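The fragment problem can be illustrated with Python's standard urllib.parse: expanding a prefixed name is plain string concatenation, and a client strips the fragment before fetching the document. This is only a sketch of URI mechanics; it does not contact the Erlangen server.

```python
from urllib.parse import urldefrag

CRMX = "http://erlangen-crm.org/current/#"  # the problematic prefix above
CRM = "http://erlangen-crm.org/current/"    # the standard Erlangen CRM prefix


def expand(prefix_iri, local):
    """Expand a prefixed name as Turtle and SPARQL do: by concatenation."""
    return prefix_iri + local


bad = expand(CRMX, "P82a_begin_of_the_begin")
good = expand(CRM, "P82a_begin_of_the_begin")

# Dereferencing drops the fragment, so only the ontology document itself is fetched.
doc, frag = urldefrag(bad)
print(bad)   # http://erlangen-crm.org/current/#P82a_begin_of_the_begin
print(doc)   # http://erlangen-crm.org/current/
print(good)  # http://erlangen-crm.org/current/P82a_begin_of_the_begin
```

Note also that crmx:P82a_begin_of_the_begin and crm:P82a_begin_of_the_begin are different URIs, so data using one will not match queries using the other.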
    

Can't use Prefixed Names

Property names (eg P30F.transferred_custody_of, PX.time-span_earliest) and class names (eg E22.Man-Made_Object) contain ".", which is invalid in prefixed names. This makes it impossible to use prefixes in SPARQL and in the returned Turtle, since the local part of a prefixed name can include only alphanumerics, "_" and "-".

  • Let me repeat: applications cannot use prefixed names while querying the BM endpoint
  • Erlangen CRM doesn't have this problem
  • Below we use the prefixes bmx: and crm: for brevity, despite this being syntactically incorrect
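A minimal check for whether a URI can be abbreviated, implementing a simplified, ASCII-only version of the rule stated above (the real Turtle and SPARQL grammars also admit some Unicode ranges and differ in details). The BM namespace with a trailing "/" is an assumption.

```python
import re

# Simplified rule: letters, digits, "_" and "-" only; must not start with a digit or "-".
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

BMX_CRM = "http://collection.britishmuseum.org/id/crm/"  # assumed trailing "/"
ECRM = "http://erlangen-crm.org/current/"


def can_abbreviate(uri, namespace):
    """True if uri can be written as prefix:local under the simplified rule."""
    if not uri.startswith(namespace):
        return False
    return LOCAL_NAME.fullmatch(uri[len(namespace):]) is not None


print(can_abbreviate(BMX_CRM + "E22.Man-Made_Object", BMX_CRM))  # False: contains "."
print(can_abbreviate(ECRM + "E22_Man-Made_Object", ECRM))        # True
```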

BM Response
Agreed that this is a non-standard way to name these; the move to Erlangen CRM should fix this, and such names will be removed from the mapping.

Reasoning

The old BM endpoint used a repository that doesn't implement any reasoning. This means that only a subset of the declared CRM properties can be navigated.
The new BM endpoint (OWLIM) implements OWL2 RL and custom rules. The rules file builtin_RdfsRules-optimized-with-inverse-and-transitive.pie is based on RdfsRules-optimized of OWLIM5 and implements the following reasoning:

  • RDFS (subclass/subproperty/domain/range)
  • inverse properties
  • transitive properties

So now you don't need to emit (state) inverse properties; you can rely on owl:inverseOf inferencing. Eg remove these:

  • P10i_contains
  • P108_has_produced
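To see why emitting inverse properties becomes redundant, here is a minimal forward-chaining sketch in Python of two of the rule groups listed above (inverse and transitive properties). It is not the OWLIM .pie syntax or engine; the schema facts are illustrative, and P10_falls_within is treated as transitive only for this example.

```python
# Illustrative schema: inverse pairs and transitive properties (not the real ruleset).
INVERSE_OF = {
    "crm:P108_has_produced": "crm:P108i_was_produced_by",
    "crm:P10_falls_within": "crm:P10i_contains",
}
TRANSITIVE = {"crm:P10_falls_within"}  # treated as transitive for this example only


def closure(triples):
    """Forward-chain owl:inverseOf and owl:TransitiveProperty until a fixpoint."""
    inverse = dict(INVERSE_OF)
    inverse.update({v: k for k, v in INVERSE_OF.items()})  # inverseOf works both ways
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in TRANSITIVE:
                new.update((s, p, o2) for s2, p2, o2 in facts if s2 == o and p2 == p)
        if new <= facts:
            return facts
        facts |= new


asserted = {
    ("production/1", "crm:P108_has_produced", "object/1"),
    ("period/a", "crm:P10_falls_within", "period/b"),
    ("period/b", "crm:P10_falls_within", "period/c"),
}
inferred = closure(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

Only the forward properties are asserted; the inverses (eg P108i_was_produced_by, P10i_contains) come out of inference.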

BM Response
Could you point us to where the OWLIM documentation describes how to install the .pie files? Thanks!

RDF bogosity

I think I found the reason for the blank nodes:

  • in new mapping: Print object date (see Dates below)
  • in old mapping:
    • <bmx:time-span_earliest_int>
    • <E54.Dimension>
    • the thesaurus entry below

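The RDF/XML below is bogus: xsd:boolean is used as a property attribute, so a parser creates a blank node carrying an xsd:boolean property instead of a boolean literal: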

<rdf:Description rdf:about="http://collection.britishmuseum.org/id/thesauri/subject">
    <csw:hierarchyRoot xsd:boolean="true" />

BM Response
E54.Dimension was a syntax error in the config which has been removed.

In RDF/XML, the datatype of a literal is specified with the rdf:datatype attribute:

<rdf:Description rdf:about="http://collection.britishmuseum.org/id/thesauri/subject">
  <csw:hierarchyRoot rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</csw:hierarchyRoot>
</rdf:Description>

The RDFer application has been altered to do this.
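As a sketch of how a generator could emit such a typed literal, here is a Python version using the standard xml.etree.ElementTree. It is not RDFer's actual code, and the csw: namespace URI is a placeholder, since the real one is not given on this page.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
CSW = "http://example.org/csw#"  # placeholder: the real csw: namespace is not given here

ET.register_namespace("rdf", RDF)
ET.register_namespace("csw", CSW)


def add_typed_literal(parent, prop, value, datatype):
    """Append <prop rdf:datatype="datatype">value</prop>, i.e. a typed literal object."""
    el = ET.SubElement(parent, prop, {f"{{{RDF}}}datatype": datatype})
    el.text = value
    return el


desc = ET.Element(
    f"{{{RDF}}}Description",
    {f"{{{RDF}}}about": "http://collection.britishmuseum.org/id/thesauri/subject"},
)
add_typed_literal(desc, f"{{{CSW}}}hierarchyRoot", "true", XSD + "boolean")
xml_out = ET.tostring(desc, encoding="unicode")
print(xml_out)
```

Building the element tree rather than concatenating strings keeps the output well-formed and escapes literal values automatically.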

Various Problems

Problems of the old mapping found in one of 684 files (PrintsAndDrawings_133.rdf). Please check for the new mapping too:

  • PPA1000/1/type/creation: relative URI, but there's no base on top
  • <P43F.has_dimension> and <E54.Dimension> are written without prefix crm: . RIOT says
    WARN  {W136} Relative URIs are not permitted in RDF: specifically <P43F.has_dimension>
    WARN  {W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
    

    This is now fixed.

  • some events that don't seem connected to anything, eg "Great Reform Bill"
    <event/ccd37989fc90ac0c52028efad0e65bda>
  • uses rdfs:text "1832", but RDFS defines no such property

BM Response

  • all URIs should now be fully qualified.
  • E54.Dimension has been fixed to use the prefix of crm
  • rdfs:text is no longer used
  • Event URIs no longer use an encoded name, but the textual name itself (URI-encoded)
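A minimal sketch of building event URIs from textual names with percent-encoding. The base URI is an assumption modelled on the other BM URIs on this page; the exact scheme used by RDFer is not documented here.

```python
from urllib.parse import quote

# Assumed base for event URIs (hypothetical; modelled on .../id/object/...).
EVENT_BASE = "http://collection.britishmuseum.org/id/event/"


def event_uri(name):
    """Build an event URI from its textual name, percent-encoding unsafe characters."""
    # safe="" also encodes "/", so a name can never introduce extra path segments.
    return EVENT_BASE + quote(name.strip(), safe="")


print(event_uri("Great Reform Bill"))
# http://collection.britishmuseum.org/id/event/Great%20Reform%20Bill
```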

Extensions

Imprecise Dates

Non-standard properties

Time Spans in CRM have the standard properties P81 & P82, which range over Time Primitive (defined by CIDOC RDFS as Literal).
BMX defines its own properties bmx:PX.time-span_{earliest,latest} and bmx:time-span_{earliest,latest}_int that are not related to either.
Thus an application speaking only CRM will not understand anything about dates from the BM endpoint.
In contrast, we raised the issue of Date Imprecision at the CRM SIG, and M.Doerr responded by proposing more properties: P81a_end_of_the_begin, P81b_begin_of_the_end, P82a_begin_of_the_begin and P82b_end_of_the_end, which can capture precisely the inner (P81a/b) and outer (P82a/b) bounds of a time span.
See Imprecise Begin-End (and general) for details.

BM Response
The representation of dates proposed by Martin Doerr will be adopted, and the properties P81a/b and P82a/b will be used to represent ranges. Dominic to confirm this usage.

Date<>String+Integer

Dates are not represented with the standard xsd:date, but with a string and an integer:

  • plain Literal, eg bmx:PX.time-span_latest "31 Dec 1979"
  • xsd:integer, eg bmx:time-span_latest_int 19791231

The standard xsd:date supports years BC (negative), so there is no reason for that.
Furthermore, the range of these properties is not defined.
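Both current forms map directly onto the lexical form of xsd:date. A minimal Python sketch, assuming the integer is always yyyymmdd and the string always "DD Mon YYYY" (month abbreviations parsed in the default C locale):

```python
from datetime import date, datetime


def from_display(text):
    """'31 Dec 1979' -> '1979-12-31' (lexical form of an xsd:date)."""
    return datetime.strptime(text, "%d %b %Y").date().isoformat()


def from_int(value):
    """19791231 -> '1979-12-31', assuming the integer is yyyymmdd."""
    return date(value // 10000, value // 100 % 100, value % 100).isoformat()


print(from_display("31 Dec 1979"))  # 1979-12-31
print(from_int(19791231))           # 1979-12-31
```

Python's datetime only covers years 1 to 9999, so BC dates need separate handling even though xsd:date itself allows negative years.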

BM Response
As a general rule, xsd datatypes should be used more to type literals.

Peter Main has given this description regarding the dates:

The situation is as follows:

All dates (production or acquisition) are stored internally in Merlin in the same way:

1) As a text date which the user enters, and an algorithm transforms on-the-fly to:
2) A date-range, which is a pair of numeric fields corresponding to a begin-date and an end-date.

Each range field is a signed integer representing either:

The number of DAYS after 01 Jan 0001 (if positive)
The number of YEARS before 1BC (if negative)

So, from the user’s data entry, three fields are created for each of Production Date and Acquisition Date.

The text date is exported as it stands (i.e. as the user originally entered it) to the XML which is used to generate the RDF.
The begin-date and end-date are both ‘rendered’ by Merlin as text versions. These fields are exported alongside the text date.
Production and Acquisition date-ranges, although stored internally in the same way, are rendered differently in the XML output.

Acquisition date-ranges are rendered as D2 M3 Y4, e.g. ‘23 May 2012’.
Production date-ranges (where only information to the nearest year is deemed to be relevant) are rendered as Y4. So the date above would be rendered as ‘2012’.

Here are some examples:

Acquisition Date:
Text date = ‘1984’, earliest date = ‘01 Jan 1984’, latest date = ‘31 Dec 1984’
Text date = ‘1970-1980’, earliest date = ‘01 Jan 1970’, latest date = ‘31 Dec 1980’.
Text date = ‘23 Mar 2012 – 25 Mar 2012’, earliest date = ‘23 Mar 2012’, latest date = ‘25 Mar 2012’.

Production date:

Text date = ‘13thC’, earliest date = ‘1200’, latest date = ‘1299’
Text date = ‘12thC BC’, earliest date = ‘1199 BC’, latest date = ‘1100 BC’
Text date = ‘1100 BC – 100 AD’, earliest date = ‘1100 BC’, latest date = ‘0100’

The way the date ranges are rendered, and hence the form they take in the XML export, could probably be changed if necessary.
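The integer range fields could be converted straight to XSD lexical forms instead of going through the rendered text. A Python sketch under two assumptions that need confirming against Merlin: a non-negative value n counts days after 01 Jan 0001 (so n == 0 is that day), and a negative value -k encodes the year k BC at year precision. BC years follow XSD 1.1 / ISO 8601, where year 0000 is 1 BC (XSD 1.0 had no year zero, so check what the store expects).

```python
from datetime import date, timedelta

EPOCH = date(1, 1, 1)


def merlin_to_xsd(n):
    """Convert one Merlin range value to (lexical form, datatype).

    ASSUMPTIONS (to be confirmed against Merlin):
    - n >= 0 counts days after 01 Jan 0001, so n == 0 is 0001-01-01;
    - n < 0 encodes the year abs(n) BC, at year precision only.
    BC years use the XSD 1.1 convention: 0000 is 1 BC, -0001 is 2 BC.
    """
    if n >= 0:
        return (EPOCH + timedelta(days=n)).isoformat(), "xsd:date"
    year = n + 1  # astronomical year: 1 BC -> 0, 1199 BC -> -1198
    lexical = "0000" if year == 0 else f"-{-year:04d}"
    return lexical, "xsd:gYear"


print(merlin_to_xsd(date(2012, 5, 23).toordinal() - 1))  # ('2012-05-23', 'xsd:date')
print(merlin_to_xsd(-1199))                              # ('-1198', 'xsd:gYear')
```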

Blank nodes

time-span_{earliest,latest}_int point to blank nodes, eg

<object/EAF119787/acquisition>
  <crm/bm-extensions/PX.time-span_latest_int> [19791231]

But the proper representation is with a literal (xsd:integer):

<object/EAF119787/acquisition>
  <crm/bm-extensions/PX.time-span_latest_int>
    19791231 # same as  "19791231"^^xsd:integer

BM Response
Perhaps the time-span can be its own resource, with the time-span elements attached to that resource.

Vlado: yes, that's how the Rembrandt mapping does it

Incomplete dates not captured properly

Dates in a historic context are very often Incomplete: yyyy or yyyy-mm instead of full yyyy-mm-dd. This is related to, but different from Imprecise dates.
IMHO this should be modeled as follows (date vs gYearMonth vs gYear):

  • Capture the original properties with types xsd:gYear, xsd:gYearMonth, xsd:date; use them to print
  • Add completing properties of type xsd:date; use them for indexing/search
    • The completion rules should depend on the property and degree of incompleteness.
      Eg P82a_begin_of_the_begin should be completed to yyyy-01-01 (beg-year) or yyyy-mm-01 (beg-month),
      while P82_at_some_time_within should be completed to

BMX's approach:

  • bmx:PX.time-span_{earliest,latest} capture some degree of incompleteness by supporting formats DD MMM YYYY, [Y+] BC and [Y+] (not certain what the last two mean). But the representation is non-standard, and yyyy-mm cannot be captured
  • bmx:PX.time-span_{earliest,latest}_int serve as indexing/search properties, but the rules for their completion are not specified
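Completion rules of the kind proposed above can be stated precisely. A Python sketch: the "begin" rule (first day of the year or month) is the one given for P82a_begin_of_the_begin; the "end" rule (last day) is an assumed symmetric rule, eg for P82b_end_of_the_end. P82_at_some_time_within is not covered, since its rule is left open above, and only positive four-digit years are handled.

```python
import calendar
import re

PARTIAL = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def complete(lexical, bound):
    """Complete a gYear / gYearMonth / date lexical value to a full xsd:date.

    bound="begin": first day of the period (rule for P82a_begin_of_the_begin)
    bound="end":   last day of the period (ASSUMED rule, eg for P82b_end_of_the_end)
    """
    if bound not in ("begin", "end"):
        raise ValueError("bound must be 'begin' or 'end'")
    m = PARTIAL.fullmatch(lexical)
    if m is None:
        raise ValueError(f"unsupported value: {lexical!r}")
    year, month, day = m.groups()
    if day is not None:  # already a full date
        return lexical
    if bound == "begin":
        return f"{year}-{month or '01'}-01"
    last_month = int(month) if month else 12
    last_day = calendar.monthrange(int(year), last_month)[1]  # handles leap years
    return f"{year}-{last_month:02d}-{last_day:02d}"


for value in ("1979", "2012-02", "1979-12-31"):
    print(value, complete(value, "begin"), complete(value, "end"))
```

The original gYear / gYearMonth values would still be kept for display; only the completed dates go to indexing.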

BM Response
Will be investigated further as all dates coming out of the collections system should be consistent.

Inconsistent Naming

time-span_{earliest,latest}_int is missing prefix "PX."

Numbers

Museum objects often need to represent imprecise numeric dimensions.
BMX adds two properties bmx:PX.min_value, bmx:PX.max_value that are both sub-properties of crm:P90F.has_value.

  • If a dimension is precise, only min_value is asserted, which infers has_value, but max_value remains undefined
  • If a dimension is imprecise, both min_value and max_value are asserted, which infers has_value as both

This lets an application query for has_value. However, since has_value is inferred with multiple values, it cannot capture a single "most likely" or "expected" dimension.
I have argued on the CRM SIG that min & max should be standardized (eg in a sub-class Linear Dimension), and that value be allowed to be independent. Value can be computed during conversion, e.g. as the average of min and max, or equal to min if max is undefined.
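The conversion-time computation suggested above is small enough to state as code. A Python sketch (function and parameter names are illustrative):

```python
from typing import Optional


def expected_value(min_value: float, max_value: Optional[float] = None) -> float:
    """Single has_value for a dimension: the midpoint of min and max,
    or min itself when max is undefined (a precise dimension)."""
    if max_value is None:
        return min_value
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")
    return (min_value + max_value) / 2


print(expected_value(30.0))        # 30.0
print(expected_value(10.0, 20.0))  # 15.0
```

If min and max are always both emitted, with min == max for precise dimensions, the midpoint rule still returns the precise value unchanged.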

BM Response
If a value is precise, can the min and max values both be set to the same value? Then a min and a max would always be present, for both precise and imprecise dimensions. To check with Martin Doerr.

Vlado: see Imprecise Dimension#Proposal Linear Dimension

CRM Modeling

BMX uses some lax CRM modeling practices. The CRM is very expressive and quite specific on how one should model facts about a situation.
Actually I'm not thrilled about being CRM-orthodox all the time, because that creates complex representations that are less efficient, difficult to display and difficult to consume.
But lest simplifications sacrifice interoperability, museums must agree on such simplifications in some sort of "application profiles".

Acquisition

BMX models an acquisition (object entering a collection) as E8.Acquisition

  • it carries appropriate properties, eg P22F.transferred_title_to
  • but an acquisition is also a Transfer of Custody (unless the owner simultaneously loans the object to another keeper), and indeed many such events in BM carry appropriate properties (eg P30F.transferred_custody_of)
  • The two properties above have domain/range declarations that infer both types Acquisition and Transfer of Custody under RDFS semantics
  • Therefore the declaration of type Acquisition (only) is unnecessary (under RDFS) and incomplete (under RDF)
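The domain-based type inference described above is the RDFS domain rule (rdfs2): from (s p o) and (p rdfs:domain C), infer (s rdf:type C). A minimal Python sketch with the two properties above, in Erlangen CRM naming; the triples are illustrative.

```python
# rdfs:domain declarations of the two properties discussed above.
DOMAIN = {
    "crm:P22_transferred_title_to": "crm:E8_Acquisition",
    "crm:P30_transferred_custody_of": "crm:E10_Transfer_of_Custody",
}


def inferred_types(triples):
    """Apply rdfs2: (s p o) and (p rdfs:domain C) => (s rdf:type C)."""
    types = {}
    for s, p, o in triples:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
    return types


acquisition = [
    ("object/EAF119787/acquisition", "crm:P22_transferred_title_to", "id:the-british-museum"),
    ("object/EAF119787/acquisition", "crm:P30_transferred_custody_of", "object/EAF119787"),
]
print(inferred_types(acquisition))
```

Both types follow from the properties alone, so an explicit rdf:type E8_Acquisition adds nothing under RDFS.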

Furthermore, there's a lot more going on in such an event. Assume that <event> moved <obj> from <collection1> at <place1> to <collection2> at <place2>, where it currently resides. Here's how we model it in RS (Collections):

# <obj> exit from <collection1>
<event> crm:P112_diminished <collection1>. # domain E80_Part_Removal
<event> crm:P28_custody_surrendered_by <collection1>. # domain E10_Transfer_of_Custody
<event> crm:P23_transferred_title_from <collection1>. # domain E8_Acquisition
<obj> crm:P113i_was_removed_by <event>. # range crm:E80_Part_Removal
<obj> crm:P30i_custody_transferred_through <event>. # range crm:E10_Transfer_of_Custody
<obj> crm:P24i_changed_ownership_through <event>. # range crm:E8_Acquisition

# object entry in <collection2>
<event> crm:P110_augmented <collection2>. # domain E79_Part_Addition
<event> crm:P29_custody_received_by <collection2>. # domain E10_Transfer_of_Custody
<event> crm:P22_transferred_title_to <collection2>. # domain E8_Acquisition
<obj> crm:P111i_was_added_by <event>. # range crm:E79_Part_Addition
<obj> crm:P30i_custody_transferred_through <event>. # range crm:E10_Transfer_of_Custody
<obj> crm:P24i_changed_ownership_through <event>. # range crm:E8_Acquisition

# Historic facts: custody->keeper, acquisition(title)->owner, place->location
<obj> crm:P49_has_former_or_current_keeper <collection1>.
<obj> crm:P51_has_former_or_current_owner <collection1>.
<obj> crm:P53_has_former_or_current_location <place1>.

# Current facts
<obj> crm:P46i_forms_part_of <collection2>.
<obj> crm:P50_has_current_keeper <collection2>.
<obj> crm:P52_has_current_owner <collection2>.
<obj> crm:P55_has_current_location <place2>.
<obj> crm:P54_has_current_permanent_location <place2>.

BM Response
It may be better to review the Rembrandt data, as well as the Yale Center for British Art's data, as examples of what data is to be mapped, and to bend BM's modelling around them.

  • Some people may find my idea of identifying collections with the agents who own them too avant-garde. It affects:
    <object> crm:P46i_forms_part_of id:the-british-museum;
    <acquisition> crm:P112_diminished <person-institution/132369> ;
      crm:P110_augmented id:the-british-museum .
    
  • if you specify the types of Acquisition then you should add E80_Part_Removal, conditionalized on whether you know the old owner.
    Better yet, leave it to RDFS to infer all applicable types.
    Currently you have these but not E80:
    <object/PPA41937/acquisition>
      a crm:E8_Acquisition , crm:E10_Transfer_of_Custody , crm:E79_Part_Addition
    
  • is bmo:PX_donated_by a sub-property of crm:P23_transferred_title_from?
    If so, why do you need it, given that you know the acquisition type is Donation:
    crm:P2_has_type <thesauri/acquisition/d>

Too much info captured as Notes

BMX captures too much info as sub-properties of has_note, eg

PX.has_copyright subPropertyOf P3F.has_note

But the proper CRM way is to use E30.Right, and its own P3F.has_note.
By declaring its own sub-properties, BMX doesn't give a CRM application a chance to find, eg, all Rights over objects.

Note: the use of notes as redundant display properties (eg "height :: 30 :: cm") is a good practice, similar to LIDO's displayWraps. This is especially useful for thesaurus values.

BM Response
Review of this is required for each instance of has_note...

Root Objects vs Sub-objects; Searchability

  • BMX defines both root Museum Objects and their parts as Man-Made_Object. But this makes it hard to search for root objects only, which is a typical use case
  • Sub-objects must not have the link "View Collections Online Record"
    The html representation of an object (eg http://collection.britishmuseum.org/description/object/EAF119787) has the link View Collections Online Record, which works nicely.
    But sub-objects (eg http://collection.britishmuseum.org/description/object/EAF119787/acquisition) also have such a link, whose URL has an empty objectid= parameter and gives the error
    "No object details were found. We are sorry but a problem occurred trying to retrieve this information, please refresh the page and try again. If the problem persists please contact us at web@britishmuseum.org."
  • for RS we defined the following nodes to be searchable, i.e. rso:FC70_Thing:
    • rso:E22_Museum_Object, OR
    • crm:E22_Man-Made_Object, and the current keeper or owner is the BM
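The RS searchability rule above, written as a Python predicate. The in-memory layout (a set of rdf:type values plus a dict of property values per node) is hypothetical; in the store this would be a rule or a SPARQL query.

```python
BM = "id:the-british-museum"


def is_searchable(types, props):
    """True if a node counts as rso:FC70_Thing under the RS rule above.

    types: set of the node's rdf:type values
    props: dict mapping property -> set of values (hypothetical layout)
    """
    if "rso:E22_Museum_Object" in types:
        return True
    if "crm:E22_Man-Made_Object" not in types:
        return False
    holders = props.get("crm:P50_has_current_keeper", set()) | props.get(
        "crm:P52_has_current_owner", set()
    )
    return BM in holders


print(is_searchable({"crm:E22_Man-Made_Object"}, {"crm:P52_has_current_owner": {BM}}))  # True
print(is_searchable({"crm:E22_Man-Made_Object"}, {}))                                   # False
```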

Property Types vs Sub-Properties

The CRM allows "typing" of some key properties. Eg P3.1_has_type is a type for P3_has_note, and allows you to introduce notes with specific meaning.
Since RDF does not normally allow you to define properties on properties (P3.1_has_type has another property as its domain), CRM recommends to use sub-properties for that purpose. And indeed, BMX massively uses extension sub-properties.
But as I argue in my paper Property Types and Annotations, this approach is not very convenient if the specific relations are numerous and come from a thesaurus. RDF schemas are flexible, but a thesaurus is more flexible still. In such case, eg for a Search use case, it's better to let the user select (or multi-select) from a list.

  • My paper describes 3 alternatives for representing Property Types in RDF.

Indeed, the BMX mapping of mus_object_production_place_association includes a long Switch that converts data (flexibility) to schema (fixedness):

  • if 'A' then PX.attributed_in, if 'CF' then PX.claimed_to_be_from, ...

Images

  • Josh: what predicate are you using to associate an object to a digital asset (such as a jpg / photo) of the object?
    I saw you used crm:P138i_has_representation but I remember that Martin Doerr had some issue with this (the entity being the thing that was in the picture – not the object I believe)
  • Vlado: yes, P138i (see details below). This passed Rembrandt Mapping Review by Martin with no comment.
  • Let's discuss the Rembrandt and BM mappings on skype, using these diagrams as reference, and come to an agreement:
    image_objects_carriers@crmg, document_references@crmg

Rembrandt Images

rso:P3_has_image_file rdfs:subPropertyOf crm:P3_has_note; rdfs:domain crm:E38_Image.
rso:E38_Main_Image rdfs:subClassOf crm:E38_Image .

<obj/2926> crm:P70i_is_documented_in <obj/2926/document/1>.
<obj/2926/document/1> a crm:E31_Document; crm:P138i_has_representation <obj/2926/file/6>.
<obj/2926/file/6> a crm:E38_Image; rso:P3_has_image_file "mh0147_back_nl_2002.jpg".
  • we have info about physical documents (eg X-Ray photos) with a location and keeper, that's why we created E31_Document.
    If there is no such info, I think it's better to skip <obj/2926/document> and go with the <obj/2926/file> (E38_Image) directly
  • The image filename/URL is captured as rso:P3_has_image_file, a sub-property of P3_has_note.
  • "Main images" (to be used as thumbnail in search results) are marked with rso:E38_Main_Image, an extension sub-class of crm:E38_Image

BM Images

Related tasks:

  • RS-696@jira: description how to find the image URL
  • RS-771@jira: repeats the next section
  • RS-772@jira: task for Josh to fix the problems described in the next section

Image Representation

Lines marked #! are problems listed by Vlado

<object/GAA5991>
  crm:P67i_is_referred_to_by <object/GAA5991-/document/6115> ; #! not needed: this super-property is inferred
  crm:P70i_is_documented_in <object/GAA5991-/document/6115> .

<object/GAA5991-/document/6115> a crm:E31_Document ;
  #!!!! This whole node is parasitic (has no useful info), so skip it
  crm:P49_has_former_or_current_keeper id:the-british-museum ; #! if you want to say "BM owns the image digital asset", move it to the next node.
    #! However, you say this with P105_right_held_by. Be mindful of domains and ranges! Only Physical objects have keepers
  crm:P3_has_note "Black-figured dinos..." ; #! this is duplicated from OBJECT.bmo:PX_physical_description, remove it
  crm:P138i_has_representation <object/GAA5991-/file/6115> ;
  crm:P67i_is_referred_to_by <object/GAA5991-/file/6115> . #! not needed: this super-property is inferred

<object/GAA5991-/file/6115> a crm:E38_Image ;
  #! why "-" in the URI? Also, I'd use "image" instead of "file" because it is an image
  crm:P48_has_preferred_identifier <object/GAA5991-/file/6115/id> ;
  #! add: crm:P138_represents <object/GAA5991>;
  #! add: crm:P105_right_held_by id:the-british-museum;
  bmo:has_image_url "http://www.britishmuseum.org/collectionimages/AN00006/AN00006115_001_l.jpg" .

<object/GAA5991-/file/6115/id>
  a <file:/C:/my/Onto/proj/ResearchSpace/data/BM-data/new/E42_Identifier> ; #! you missed the prefix here
  #! add P2_has_type
  rdfs:label "6115" ;
  crm:P3_has_note "Asset ID :: 6115" . #! shouldn't this be one level up (in the file/image node)?

Discussion:

  • Josh: the creation of the document is actually symbolic of the asset record in our DAM system, so I think it could remain: if we were to assert any further information about the asset, we could add it to this E31_Document node
    • Vlado: Assert it in the file/image node. You don't need a parasitic intermediate node. I used it in the case of Rembrandt since it's a physical document with info where it is stored, etc
  • Josh re Main Image: since you're using a ResearchSpace ontology to link images to objects & to mark the main image, this becomes difficult for us, as we are not in ResearchSpace (yet), and likewise for other organisations.
    • Vlado: As we come across various use cases, we should not be afraid to define some generally-useful extensions for museums to use.
      • We already have examples:
        P82a/b for imprecise dates
        min_value/max_value for imprecise numbers
        MuseumObject or somesuch to indicate a searchable collection object
        MainImage to indicate what to use as thumbnail
      • If "RSO" and "BMO" sound too project/institution-specific, let's call this "CRMX" (CRM extended)
  • Josh: Therefore could this be an area which could also be a rule? Could we say:
    if x crm:P3_has_note y AND y rdf:type E38_Image AND [some indicator of main - for us it will be an ordering/sequence number of '1'] => x rso:P3_has_main_image_file y
    • Vlado: how do you represent ordering/sequence/preference in CRM? You'd be using your own extension, and from that point of view it is not any better than the construct we used in RSO
  • Josh: Since there is no simple solution using CRM, we will have to use RSO here, and have used E38_Main_Image for images which have the sequence no. 1 or 0. The default is 999999 (from Merlin).
    We have also used rso:P3_has_image_file and specified the URL to the image on COL. Some objects relate to multiple images, all of which have a sequence no. of 999999. In that case, none of them will have rso:E38_Main_Image.
    We have removed the <obj/document> resource and have:
    <obj> <crm:P138i_has_representation> <obj/image/1>.
    <obj/image/1> a <crm:E38_Image>;
                  <rso:P3_has_image_file> "http://www.britishmuseum.org...." #! Is this OK as it is not a file but a URL?
                 (if sequence no == 0 OR 1)
                  a <rso:E38_Main_Image>;
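Josh's main-image rule, as a Python sketch that emits Turtle lines in the direct object-to-image shape (no intermediate document node). Function names and example URLs are hypothetical; prefixes are assumed to be declared elsewhere.

```python
MAIN_SEQUENCE_NUMBERS = {0, 1}  # per Josh: 0 or 1 marks the main image; Merlin default is 999999


def image_turtle(obj, images):
    """Turtle lines linking obj to its images; images is a list of (url, sequence_no)."""
    lines = []
    for url, seq in images:
        lines.append(f"<{obj}> crm:P138i_has_representation <{url}> .")
        lines.append(f"<{url}> a crm:E38_Image .")
        if seq in MAIN_SEQUENCE_NUMBERS:
            lines.append(f"<{url}> a rso:E38_Main_Image .")
    return lines


for line in image_turtle("obj/1", [("http://example.org/a.jpg", 1), ("http://example.org/b.jpg", 999999)]):
    print(line)
```

An object whose images all carry the default sequence number gets no rso:E38_Main_Image at all, matching the behaviour described above.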
    

Image RDF

  • Josh provided image_out.rar (285 MB), which contains 367k RDF files that link BM objects to image URLs
  • Mitac put it at the server: external: 93.123.21.123:10024, internal: 192.168.130.198:22, user search0ssh, password (ask Mitac), dir /nidata/researchspace/trunk/data/josh
  • Because of the large number of files in one dir, this RAR is quite unwieldy (slow to process)
    • Josh: Totally agree with this. We have found a better way of dealing with this: the RDFer application now creates named graphs and outputs the RDF as TriG (as suggested by Barry1). Therefore we can chunk the raw export into much larger chunks, which should get rid of this issue. We will still have multiple files, but more likely of ~40MB each

Image Access

  • SPARQL 1.0 query
    PREFIX bmo: <http://collection.britishmuseum.org/id/ontology/>
    PREFIX crm: <http://erlangen-crm.org/current/>
    select ?obj ?image {
      ?obj crm:P70i_is_documented_in ?doc.
      ?doc crm:P138i_has_representation ?file.
      ?file bmo:has_image_url ?image}
    
  • SPARQL 1.1 using property paths:
    PREFIX bmo: <http://collection.britishmuseum.org/id/ontology/>
    PREFIX crm: <http://erlangen-crm.org/current/>
    select ?obj ?image {
      ?obj crm:P70i_is_documented_in / crm:P138i_has_representation / bmo:has_image_url ?image}
    
  • BM images will be used remotely (from the BM Collection Online site), so we need full URLs. Ergo bmo:has_image_url and not rso:P3_has_image_file
  • These remote images can only be displayed; the following functionality is deferred to RS3.5:
    • Image Annotation: relies on images served from the IIP Server
    • "Related" tab in Nuxeo: relies on images to be represented as Nuxeo objects
  • There is no "main image" indication yet, so we'll display the first (random) image per object as the thumbnail

Josh5: The Image RDF has changed after speaking with Vlado: the image resource is now the online jpg itself:

    <http://collection.britishmuseum.org/id/object/PDO11441> crm:P138i_has_representation <http://www.britishmuseum.org/collectionimages/AN00563/AN00563759_001_l.jpg>.
    <http://www.britishmuseum.org/collectionimages/AN00563/AN00563759_001_l.jpg> crm:P105_right_held_by thesIdentifier:the-british-museum;
                                                                                 crm:P3_has_note """The Libyan Sibyl, study for a pendentive; seated to front, her right arm resting on an unseen surface. c.1738
Black and white chalk, on two conjoined sheets of grey paper""";
                                                                                 crm:P48_has_preferred_identifier <http://www.britishmuseum.org/collectionimages/AN00563/AN00563759_001_l.jpg/id>;
                                                                                 a crm:E38_Image.
    <http://www.britishmuseum.org/collectionimages/AN00563/AN00563759_001_l.jpg/id> crm:P2_has_type <http://collection.britishmuseum.org/id/thesauri/identifier/assetid>;
                                                                                    crm:P3_has_note "Asset ID :: 563759";
                                                                                    a crm:E42_Identifier;
                                                                                    rdfs:label "563759".

Asset ID is also stored as an Identifier.

Image Renditions in Varying Size

BM Response:
We have recently altered the way in which we display the images in different sizes. The "l" image will remain, but we will be getting rid of the m & s files, as we have a new way to display the images that saves space, e.g.:

More Problems in the New Mapping

Problems of the new mapping, found in one file P&D_133.ttl, converted from P&D_133.rdf using any23:

Labels

Martin recommends using rdfs:label as a good Linked Open Data practice, but:

  • the CRM SIG needs to make a recommendation on this
  • are we abolishing P3_has_note altogether? Eg why do you use it for crm:E54_Dimension, instead of rdfs:label?
  • shouldn't we also replace crm:P90_has_value with rdf:value?

I'll try to raise the above to the CRM SIG.

Currently decided principles:

  • rdfs:label: primary string label of a node, eg object title, artist name
  • skos:prefLabel: primary string label of a SKOS thesaurus node. (This is a sub-property, so it infers rdfs:label). Note, this should only be used for skos:Concept, and NOT skos:ConceptScheme.
  • crm:P3_has_note: any additional label/note
  • bmo:PX_display_wrap: used for redundant textual representation of structured RDF and can be ignored.

Current status:

  • BM uses rdfs:label
    • Why use bmo:PX_inscription_content and not simply rdfs:label?
  • Rembrandt uses P3_has_note

Dates

  • P82/a/b: remove the blank node, use just the literal year. And xsd:gYear should be attached as a type, not as a property!
    <object/PPA41937/acquisition/date> a crm:E52_Time-Span ;
      crm:P82_at_some_time_within _:node16qlkcjibx31086583 .
    _:node16qlkcjibx31086583 xsd:gYear "2003" .
    

    Should be this (but I don't know the RDF/XML syntax)

    <object/PPA41937/acquisition/date> a crm:E52_Time-Span ;
      crm:P82_at_some_time_within "2003"^^xsd:gYear.
    

    BM Response
    The RDF/XML way in which to do this is:

<crm:P82_at_some_time_within rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">2003</crm:P82_at_some_time_within>

The RDFer has been altered to do this.

  • you emit all of P82, P82a, P82b. However, this way we don't know which to display.
    If only a year is present in the data, emit only P82. As explained in Rembrandt Mapping Review, that's enough for searching.
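The emission rule above, as a Python sketch producing Turtle lines. It assumes earliest/latest arrive as lexical strings that are either plain years or full yyyy-mm-dd dates; for year ranges, plain years are completed to 01-01 and 12-31 (an assumed completion rule).

```python
import re

YEAR = re.compile(r"-?\d{4}")  # a plain (possibly BC) year, as in xsd:gYear


def time_span_triples(span, earliest, latest):
    """Turtle lines for a time-span, following the rule suggested above."""
    if earliest == latest and YEAR.fullmatch(earliest):
        # Only a year is known: P82 alone, typed as xsd:gYear, is enough.
        return [f'<{span}> crm:P82_at_some_time_within "{earliest}"^^xsd:gYear .']
    # Otherwise emit the outer bounds; plain years are completed to full dates.
    begin = f"{earliest}-01-01" if YEAR.fullmatch(earliest) else earliest
    end = f"{latest}-12-31" if YEAR.fullmatch(latest) else latest
    return [
        f'<{span}> crm:P82a_begin_of_the_begin "{begin}"^^xsd:date .',
        f'<{span}> crm:P82b_end_of_the_end "{end}"^^xsd:date .',
    ]


for line in time_span_triples("object/PPA41937/acquisition/date", "2003", "2003"):
    print(line)
```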

Dimensions

  • You should add a type: "106.00"^^xsd:double.
    Note: 106.00 means xsd:decimal, but I think we want xsd:double
  • Display wrap of dimension
    <object/PPA41937/height> rdfs:label "Dimension :: 278.00 ::"
    • I think you need to concretize: "Height :: 278.00 ::"
    • Better yet if you can emit one note for both dimensions, at the object level:
      <object/PPA41937> P3_has_note "Dimension :: 278.00 x 85.00 cm ::"
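A Python sketch that emits both suggestions at once: dimension values typed as xsd:double and a single object-level display note. The <object/PPA41937/width> URI is hypothetical (only height appears in the data above), and "cm" is taken from the example note.

```python
def dimension_turtle(obj, height, width, unit="cm"):
    """Turtle lines: two dimensions typed as xsd:double, plus one object-level display note."""
    lines = [
        f'<{obj}/{name}> crm:P90_has_value "{value:.2f}"^^xsd:double .'
        for name, value in (("height", height), ("width", width))
    ]
    lines.append(f'<{obj}> crm:P3_has_note "Dimension :: {height:.2f} x {width:.2f} {unit} ::" .')
    return lines


for line in dimension_turtle("object/PPA41937", 278, 85):
    print(line)
```

Writing the typed literal explicitly avoids the Turtle shorthand 278.00, which would be read as xsd:decimal.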

Identifiers, sameAs considered harmful

  • why do you need to have two URIs for each object, related by sameAs? Isn't it enough to have the codex_id as an identifier property?
    Our current queries return all objects twice, so it'll be easier for us if you don't use this sameAs approach
  • you got owl:sameAs twice
    owl:sameAs <codex/763479> ;
    owl:sameAs <codex/763479> ;
    
  • I think you should make "PPA41937" the crm:P48_has_preferred_identifier, and not merely a crm:P1_is_identified_by.
    • Note: "prn" below means "Public Reference Number"
      crm:P2_has_type thesIdentifier:prn

Inscription

  • Why do you need this whole business with
    <object/PPA41937/section/1> a crm:E53_Place
    • Unless you really have inscription position info, skip this altogether:
      crm:P56_bears_feature, crm:E25_Man-Made_Feature, section/1, crm:E53_Place
    • and just go with:
      crm:P65_shows_visual_item, crm:E43_Inscription
  • Why do you need bmo:PX_inscription_content? Just use rdfs:label.

Subject

  • I'm not sure P129 below is appropriate:
    bmo:PX_physical_description """A still life with jug, vase and mug before a landscape.  1986
      crm:P129_is_about thes:x13409 ;
      crm:P3_has_note "Subject :: still-life ::" ;
    
    • I think "still-life" is a type of print. This one is really about the "jug, vase and mug"
    • So examine that thesaurus and maybe make a "bmo:has_print_type subPropertyOf crm:P2_has_type"
    • But if the thesaurus has other values that describe "aboutness", go with P129

Series

Currently you have:

<http://collection.britishmuseum.org/id/object/PPA44216> a crm:E22_Man-Made_Object ;
	crm:P148i_is_component_of <http://collection.britishmuseum.org/id/series/Mus-e-Grotesque> ;
	crm:P3_has_note "Component of series :: Musee Grotesque ::" .
<http://collection.britishmuseum.org/id/series/Mus-e-Grotesque> a bmo:EX_Series ;
	crm:P148_has_component <http://collection.britishmuseum.org/id/object/PPA44216> ;
	rdfs:label "Musee Grotesque" .
bmo:EX_Series rdf:type owl:Class ;
  rdfs:subClassOf crm:E89_Propositional_Object ;
  • no need to state the inverse
  • P148i_is_component_of is used between E89_Propositional_Objects. The object is an E22_Man-Made_Object, so you should use P46i_forms_part_of instead.
    • By RDFS semantics, the object is inferred to be E89 in addition to E22, which is bad!
  • Change EX_Series to be subClassOf E78_Collection. That's not exactly a subclass of E22, but close enough: E18 is a common ancestor in the hierarchy below, and since E22 is also a subclass of E24, E24 is an even closer shared parent. Check out class-hierarchy@crm:
    E18 - - - - Physical Thing
    E19 - - - - - Physical Object
    E22 - - - - - - Man-Made Object
    E24 - - - - - Physical Man-Made Thing
    E78 - - - - - - Collection
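To check how close E78_Collection is to E22_Man-Made_Object, one can compute their nearest shared superclasses. The SUPERCLASSES table is a hand-copied excerpt of the CRM 5.0.x hierarchy as I understand it (other superclasses such as E71 and E72 are omitted, and E22 is assumed to have both E19 and E24 as parents); verify it against the specification before relying on it.

```python
from collections import deque

# Excerpt of the CRM class hierarchy: class -> direct superclasses.
# Assumption: copied by hand from CRM 5.0.x; other superclasses omitted.
SUPERCLASSES = {
    "E22_Man-Made_Object": ["E19_Physical_Object", "E24_Physical_Man-Made_Thing"],
    "E19_Physical_Object": ["E18_Physical_Thing"],
    "E24_Physical_Man-Made_Thing": ["E18_Physical_Thing"],
    "E78_Collection": ["E24_Physical_Man-Made_Thing"],
    "E18_Physical_Thing": [],
}


def ancestors(cls: str) -> set:
    """cls plus all of its transitive superclasses (breadth-first)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        for parent in SUPERCLASSES.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def nearest_common_ancestors(a: str, b: str) -> set:
    """Shared superclasses that are not above another shared superclass."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(d != c and c in ancestors(d) for d in common)}


print(nearest_common_ancestors("E22_Man-Made_Object", "E78_Collection"))
```

The multiple-inheritance edge E22 to E24 is what makes E24, rather than E18, the nearest shared parent.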
    

Published Type

What in the world is this? Why is the publisher of a print considered a "type creator", and what "type" are we talking about here?

<http://collection.britishmuseum.org/id/object/PPA44216> a crm:E22_Man-Made_Object ;
  crm:P2_has_type <http://collection.britishmuseum.org/id/object/PPA44216/type> ;
  crm:P3_has_note "Published by :: Martinet, Aaron ::" .
<http://collection.britishmuseum.org/id/object/PPA44216/type> a crm:E55_Type ;
  crm:P94i_was_created_by <http://collection.britishmuseum.org/id/object/PPA44216/type/creation> .
<http://collection.britishmuseum.org/id/object/PPA44216/type/creation> a crm:E83_Type_Creation ;
  crm:P94_has_created <http://collection.britishmuseum.org/id/object/PPA44216/type> ;
  bmo:PX_published_by <http://collection.britishmuseum.org/id/person-institution/37449> ;
  crm:P3_has_note "Published by :: Martinet, Aaron ::" .

BM Response
Yes, it is quite strange. This is a model which Seme4 put together. As I understand it, it works as follows:
- The 'type' (id/PPA44216/type) being referred to is the type of object it is (bowl, jug, painting etc), referring not to the specific type itself but to the concept of the type (I think it is essentially part of the design (event?), when the decision was made).
- The creation of the type (id/PPA44216/type/creation) is the instance of Type_Creation, i.e. the point at which the decision was made about what type of object it was going to be (hence referring to the above 'type' resource).
- I believe the 'type' here refers to the publication itself and the 'type/creation' is the decision to publish the object (as it is a print).
I agree this is not straightforward, but I presume that this was constructed with Martin Doerr, so I did not want to change it.

Vlado: The CRM says "E83 Type Creation: comprises activities formally defining new types of items.
It is typically a rigorous scholarly or scientific process that ensures a type is exhaustively described and appropriately named."
Any way you slice it, publishing a print is not a Type Creation. So kill this crazy construction.

Bibliography

Currently you have:

<http://collection.britishmuseum.org/id/object/PPA44216> a crm:E22_Man-Made_Object ;
  crm:P67i_is_referred_to_by <http://collection.britishmuseum.org/id/bibliography/350> ;
  crm:P3_has_note "Bibliograpic reference :: De Vinck 10531 ::".

<http://collection.britishmuseum.org/id/bibliography/350> a bibo:Book ;
  #! The range of P67i infers type E89_Propositional_Object, but the properties below are non-CRM.
  dc:title "Collection De Vinck.  Un siècle d'histoire de France par l'estampe.  Inventaire analytique." ;
  dcterms:date "1909-1967" ;
  <http://collection.britishmuseum.org/id/crm/bm-extensions/PX.published_in>
    [a <http://collection.britishmuseum.org/id/crm/E53.Place>; rdfs:label "Paris"];
  dcterms:publisher
    [a foaf:Organisation ; rdfs:label "Imprimerie Nationale"] ;
  dcterms:isPartOf [a bibo:Collection], [a bibo:Series] .
  • use P70i_is_documented_in instead of P67i_is_referred_to_by.
    P70i leads to E31_Document, while P67i leads to the less specific E89_Propositional_Object
  • You must use CRM instead of DC, BIBO and FOAF!!
  • PX.published_in uses a property and class from the old namespace and a blank node, and is not CRM
  • Use Time-span and structured dates
  • The dcterms:isPartOf as it stands is highly uninformative and useless

So I'd use this:

<http://collection.britishmuseum.org/id/object/PPA44216> a crm:E22_Man-Made_Object ;
  crm:P70i_is_documented_in <http://collection.britishmuseum.org/id/bibliography/350> ;
  crm:P3_has_note "Bibliographic reference :: De Vinck 10531 ::".

<http://collection.britishmuseum.org/id/bibliography/350> a crm:E31_Document;
  rdfs:label "Collection De Vinck.  Un siècle d'histoire de France par l'estampe.  Inventaire analytique." ;
  crm:P94i_was_created_by <http://collection.britishmuseum.org/id/bibliography/350/creation>.

<http://collection.britishmuseum.org/id/bibliography/350/creation> a crm:E65_Creation;
  # Production is for material things; Creation is for conceptual things
  crm:P4_has_time-span <http://collection.britishmuseum.org/id/bibliography/350/creation/date>;
  bmo:PX_published_in <http://collection.britishmuseum.org/id/bibliography/350/creation/place>;
  bmo:PX_published_by <http://collection.britishmuseum.org/id/bibliography/350/creation/publisher>;
  crm:P2_has_type <http://collection.britishmuseum.org/id/thesaurus/production/published>.

<http://collection.britishmuseum.org/id/bibliography/350/creation/date> a crm:E52_Time-Span;
  crm:P82a_begin_of_the_begin "1909"^^xsd:gYear;
  crm:P82b_end_of_the_end "1967"^^xsd:gYear.

<http://collection.britishmuseum.org/id/bibliography/350/creation/place> a crm:E53_Place;
  # It is better to find the right term from <http://collection.britishmuseum.org/id/thesaurus/place>
  rdfs:label "Paris".

<http://collection.britishmuseum.org/id/bibliography/350/creation/publisher> a crm:E40_Legal_Body;
  rdfs:label "Imprimerie Nationale".

BM Response
I have made these changes, but I am not too sure what to do with these fields:

  • Journal: text field (using a P2_has_type + PX_Journal label?)
  • Book: text field (using a P2_has_type + PX_Book label?)
  • ISBN/ISSN: text field (E42_Identifier or just a PX_ISBN label?)
  • Other text fields: Edition, Volume, Issue, Page Nos

GUID URIs

Production

This is a complex topic, so it has its own section

BM Merlin Fields

Here is a list of all the production related "fields" in Merlin (BM's database).

  • General
    • Production Date
    • Production Period/Culture (thesaurus)
    • Production Technique (thesaurus)
    • Production Place (thesaurus + association code)
    • Production Person (thesaurus + association code)
    • Ethnic group, object made by ethnic group (thesaurus)
    • School, production instance School of (thesaurus)
    • Object State, object produced in state (text)
  • Associated Place
    • Made For Place (thesaurus + associated code)
    • Natural Source place (thesaurus + associated code)
  • Associated People
    • Made For Person (thesaurus + associated code)
    • Authorised/patronised by Person (thesaurus + associated code)

BM Merlin Example

In our collection database an object <obj> is represented as follows (example data given):

Production People
  made by  G & Co   Comment:"dish"

Production Places
  made in  Ghana    Comment:"pendant"
  made in  London   Comment:"dish"

Production Dates
  19thC   (1800-1850)  Comment:"pendant"
  1874    (1874-1874)  Comment:"dish"

Production Technique
  lost-wax cast  (Technique Thesaurus)
  repoussé       (Technique Thesaurus)
  incised        (Technique Thesaurus)
  gilded         (Technique Thesaurus)

BM Merlin Issues

  • Some of the fields are qualified by an association code (eg Place / Person).
  • Some fields just have a thesaurus term (eg Ethnic Group, Period/Culture)
  • Some are literals (Production Date, Object State)
  • There can be multiple fields of each type.

When there are multiple fields, Merlin does not relate them to each other as a ‘Production instance’, so the different aspects of a production cannot be brought together into a single event.
Nor should this be done by correlating association codes: Julia explained that an ‘engraved_by’ Production Person and an ‘engraved_in’ Production Place do NOT imply that they were done together.
These fields are all separate from each other (i.e. who made the obj, where it was made, when it was made and the techniques used). A human could read this and probably piece together which bits were made by whom, where and when by using the comments, but the comment field is not standardised: it is a free-text field and is used in different ways.

BM CRM Representation

Use diagram object_production@crmg for reference

  • Seme4 put everything under a single Production instance
  • Vlado argued that this presents logical inconsistencies, eg the single Production event taking place at two different places and/or dates
  • Josh then modelled each field as a separate Production instance: object/production/1, object/production/2... This is a true interpretation of what BM's database is saying
  • Vlado objected: you've scattered the production properties to too many nodes, so one cannot make intelligent connections
  • After the above explanations about the nature of BM's database, it was agreed that each field should have its own E12_Production instance, eg
<obj> P108i_was_produced_by <obj/production/1>, <obj/production/2>, <obj/production/3>, <obj/production/4>.
<obj/production/1>  P14_carried_out_by  <people_id_G&Co>
<obj/production/2>  P7_took_place_at    <place_id_Ghana>
<obj/production/3>  P7_took_place_at    <place_id_London>
<obj/production/4>  P4_has_time-span    <obj/production/4/date>
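A sketch of the agreed "one E12_Production per Merlin field" rule, replaying the example above. The field kinds and the FIELD_PROPERTY table are hypothetical names for illustration; a real converter would derive them from the Merlin field definitions.

```python
# Hypothetical mapping from a Merlin field kind to the CRM property it feeds.
FIELD_PROPERTY = {
    "person": "crm:P14_carried_out_by",
    "place": "crm:P7_took_place_at",
    "date": "crm:P4_has_time-span",
    "technique": "crm:P32_used_general_technique",
}


def production_triples(obj: str, fields: list) -> list:
    """Give every Merlin field its own E12_Production node; nothing is merged."""
    triples = []
    for n, (kind, value) in enumerate(fields, start=1):
        prod = f"{obj}/production/{n}"
        triples.append((obj, "crm:P108i_was_produced_by", prod))
        triples.append((prod, "a", "crm:E12_Production"))
        if kind == "date":
            # Dates go through their own E52_Time-Span node.
            span = f"{prod}/date"
            triples.append((prod, FIELD_PROPERTY[kind], span))
            triples.append((span, "crm:P82_at_some_time_within", value))
        else:
            triples.append((prod, FIELD_PROPERTY[kind], value))
    return triples


fields = [("person", "people_id_G&Co"), ("place", "place_id_Ghana"),
          ("place", "place_id_London"), ("date", "1874")]
for t in production_triples("obj", fields):
    print(*t)
```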

If the time comes when we know the missing bits of information, we can use sameAs / seeAlso to relate production instances together, i.e.

  • If we learn that in a single instance, G&Co made the <obj> in 1874 in Ghana
    <obj/production/1>  - resource for made by G&Co
    <obj/production/2>  - resource for made in Ghana
    <obj/production/4>  - resource for made in 1874 (has the date within it)
    

    Then we link all of these together with sameAs/seeAlso, since they describe the same production.
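When such links are added later, working out which production nodes end up as one event is a union-find problem. This sketch assumes the links are owl:sameAs, which an OWL reasoner treats as identity; rdfs:seeAlso carries no such semantics and would not merge anything by itself. The function name is hypothetical.

```python
def merge_groups(nodes: list, same_as: list) -> list:
    """Group nodes connected (directly or transitively) by sameAs pairs."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


nodes = [f"obj/production/{i}" for i in range(1, 5)]
# G&Co, Ghana and 1874 turn out to belong to one production event.
links = [("obj/production/1", "obj/production/2"),
         ("obj/production/2", "obj/production/4")]
print(merge_groups(nodes, links))
```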

Dual Representation as Term and Sub-Property

"Associated code" (eg "engravedIn" or "engravedBy") is mapped to both:

  1. a property P32_used_general_technique linking to a thesaurus term (E55_Type), eg:
    <thes/association/engravedBy> a skos:Concept; skos:prefLabel "engraved".
    <thes/association/engravedIn> a skos:Concept; skos:prefLabel "engraved".
    
    <production/1> crm:P32_used_general_technique <thes/association/engravedBy>.
    <production/2> crm:P32_used_general_technique <thes/association/engravedIn>.
    
  2. an extension sub-property linking to the field's value, eg:
    bmo:engravedBy rdfs:subPropertyOf crm:P14_carried_out_by.
    bmo:engravedIn rdfs:subPropertyOf crm:P7_took_place_at.
    
    <production/1> bmo:engravedBy <thes/person1235>.
    <production/2> bmo:engravedIn <thes/place1234>.
    

This dual representation allows an application that has no clue about the BMO extensions to display the nature of each Production node.
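The sub-property half pays off through the RDFS sub-property rule (rdfs7): from bmo:engravedBy rdfs:subPropertyOf crm:P14_carried_out_by, a store with RDFS inference also yields the plain P14 triple. This sketch applies that single rule in plain Python (in practice OWLIM would do it); the SUPER_PROPERTY table and function names are hypothetical.

```python
# Hypothetical table: association code -> CRM property it specializes.
SUPER_PROPERTY = {
    "engravedBy": "crm:P14_carried_out_by",
    "engravedIn": "crm:P7_took_place_at",
}


def dual_representation(production: str, code: str, value: str) -> list:
    """Emit both the extension sub-property and the P32 thesaurus term."""
    return [
        (f"bmo:{code}", "rdfs:subPropertyOf", SUPER_PROPERTY[code]),
        (production, f"bmo:{code}", value),
        (production, "crm:P32_used_general_technique", f"thes/association/{code}"),
    ]


def apply_rdfs7(triples: list) -> set:
    """One pass of rdfs7: (p subPropertyOf q) and (s p o) entail (s q o).

    Not transitive; enough when sub-properties are only one level deep.
    """
    supers = {}
    for s, p, o in triples:
        if p == "rdfs:subPropertyOf":
            supers.setdefault(s, set()).add(o)
    inferred = set(triples)
    for s, p, o in triples:
        for q in supers.get(p, ()):
            inferred.add((s, q, o))
    return inferred


data = dual_representation("production/1", "engravedBy", "thes/person1235")
closed = apply_rdfs7(data)
# A CRM-only application now finds the actor through plain P14:
print(("production/1", "crm:P14_carried_out_by", "thes/person1235") in closed)  # True
```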

Joshan: I have attached a file to this page which outlines most of the basic association codes (for objects only, not the association codes that exist for biographical and place records). In the second tab of the Excel file I have begun to outline the potential harmonisation (and hierarchical structure) of the association codes as concepts.

Specific CRM Constructs

Each field is modeled using a specific CRM construct, as proposed below

Merlin Field | CRM Construct Example
Production Date
  <production> crm:P4_has_time-span <production/date>.
  <production/date> crm:P82a_begin_of_the_begin "1630"^^xsd:gYear.
Production Period/Culture
  <production> crm:P10_falls_within <thes/period12455>;
     crm:P3_has_note "Production Period/Culture :: Early Christian".
  <thes/period12455> a skos:Concept, crm:E4_Period;
    skos:inScheme <thes/period-culture>; skos:prefLabel "Early Christian". #! create this once
Production Technique
  <production> crm:P32_used_general_technique <thes/technique4314>.
Production Person
  bmo:engravedBy rdfs:subPropertyOf crm:P14_carried_out_by.
  <production> bmo:engravedBy <thes/person1235>; crm:P32_used_general_technique <thes/association/engravedBy>.
Production Place
  bmo:engravedIn rdfs:subPropertyOf crm:P7_took_place_at.
  <production> bmo:engravedIn <thes/place1415>; crm:P32_used_general_technique <thes/association/engravedIn>.
Made by ethnic group
  <production> crm:P14_carried_out_by <thes/ethnicGroup1253>.
School of
  bmo:schoolOf rdfs:subPropertyOf crm:P14_carried_out_by.
  <production> bmo:schoolOf <thes/schoolOf13541>.
Object State
  <production> crm:P3_has_note "object state".
Made For Place
  ??? This means "made for use in place", but I can't find a good CRM mapping
Natural Source place
  ??? What does this mean?
Made For Person
  bmo:madeForCoronationOf rdfs:subPropertyOf crm:P17_was_motivated_by.
  <thes/madeFor/coronation> a skos:Concept; skos:prefLabel "For coronation".
  <production> bmo:madeForCoronationOf <thes/person12134>; crm:P17_was_motivated_by <thes/madeFor/coronation>.
Authorised/patronised by Person
  bmo:commissionedBy rdfs:subPropertyOf crm:P17_was_motivated_by.
  <thes/authorisedBy/commissionedBy> a skos:Concept; skos:prefLabel "Commissioned by".
  <production> bmo:commissionedBy <thes/person12134>; crm:P17_was_motivated_by <thes/authorisedBy/commissionedBy>.

Period/Culture

Rather than defining a specific Period/Culture for every production instance:

<object/YCA75313/production/3>
  crm:P10_falls_within   <object/YCA75313/production/3/periodculture>.
<object/YCA75313/production/3/periodculture>
  crm:P10i_contains   <object/YCA75313/production/3> ;
  crm:P2_has_type   thes:x14625 ;

The better/simpler way (shown in the previous section) is to say that the production P10_falls_within the period/culture thesaurus term (provided it is defined as a Period):

<object/YCA75313/production/3>
  crm:P3_has_note "Production Period / Culture :: Late Christian Period";
  crm:P10_falls_within thes:x14625 .

Looking at the class hierarchy, a production is a period (Production-Modification-Activity-Event-Period), and the thesaurus term is another Period, so P10 can be used directly.

  • "<!-- Define the period culture as a Period -->": Josh, you don't need to do this while processing each object, since the periods are defined in the thesaurus itself, right?

Outstanding Issues

  • Verify the thinking in Dual Representation as Term and Sub-Property by looking at the association thesauri.
    While ok to associate "engraved by" with P32_used_general_technique, this won't work for codes such as "master craftsman", "understudy", etc.
  • You don't have any crm:P7_took_place_at in P&D_133.rdf, is that normal? You guys don't know where prints & drawings are made?
  • There are some remnants of non-ECRM names: crm:P2F_has_type (note the "F")

Mixing Techniques and Parts

BMX mixes together Techniques (eg engraved_in <place>) with Parts (eg dust_cap_made_in <place>).
The proper way is to model the separate events separately (Material and Medium-Technique):

  • for Technique (engraving; see diagram material_technique@crmg)
    <obj> crm:P108i_was_produced_by <obj-production>.
    <obj-production> crm:P32_used_general_technique <thesaurus-technique-engraving>.
    <obj-production> crm:P7_took_place_at <place>.
    
    • or if we need to express several production events (eg casting, engraving, etc)
      <obj> crm:P108i_was_produced_by <obj-production>.
      <obj-production> crm:P9_consists_of <obj-production-N>.
      <obj-production-N> crm:P32_used_general_technique <thesaurus-technique-engraving>.
      <obj-production-N> crm:P7_took_place_at <place>.
      
  • for Part (dust cap)
    <obj> crm:P46_is_composed_of <obj-part-N>.
    <obj-part-N> crm:P2_has_type <thesaurus-object-dust_cap>.
    <obj> crm:P108i_was_produced_by <obj-production>.
    <obj-part-N> crm:P108i_was_produced_by <obj-part-N-production>.
    <obj-production> crm:P9_consists_of <obj-part-N-production>.
    <obj-part-N-production> crm:P7_took_place_at <place>.
    

Over-stated P82, P82a, P82b

When you state this:

<http://collection.britishmuseum.org/id/object/PPA44216/production/1/date> a crm:E52_Time-Span ;
	crm:P82_at_some_time_within "1814" ;
	crm:P82a_begin_of_the_begin "1814"^^xsd:gYear ;
	crm:P82b_end_of_the_end "1814"^^xsd:gYear ;

RS will print "1814-1814".
If there was a single date in the source, emit a single date (P82), not two of the same (P82a=P82b)
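A sketch of a check that could catch this before publication, assuming the converter's output is available as (subject, predicate, object) tuples with prefixed names; the function name is hypothetical.

```python
def overstated_spans(triples: list) -> list:
    """Time-span nodes whose P82a and P82b carry the same literal."""
    begin, end = {}, {}
    for s, p, o in triples:
        if p == "crm:P82a_begin_of_the_begin":
            begin[s] = o
        elif p == "crm:P82b_end_of_the_end":
            end[s] = o
    return sorted(s for s in begin if end.get(s) == begin[s])


span = "object/PPA44216/production/1/date"
triples = [
    (span, "crm:P82_at_some_time_within", '"1814"'),
    (span, "crm:P82a_begin_of_the_begin", '"1814"^^xsd:gYear'),
    (span, "crm:P82b_end_of_the_end", '"1814"^^xsd:gYear'),
]
print(overstated_spans(triples))  # ['object/PPA44216/production/1/date']
```

Nodes it reports should keep only P82.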

Thesauri

Thesaurus Requirements

Thesauri should be represented with both SKOS and CRM:

  • CRM classes are required, so the terms can be linked to from other CRM entities
    • eg CRM P88i_forms_part_of is needed for FRs
  • SKOS gives a uniformity that is used in two RS functionalities.
    In particular, skos:inScheme is critical so we can find all terms in the same thesaurus as a given term:
    1. Data Annotation: the user can propose a new value from the same thesaurus
    2. FR Search: after selecting a term:
      • The user is restricted to FRs applicable to that thesaurus
      • Alternate terms can be selected only from the same thesaurus (or "compatible" thesauri that apply to the same FR)
  • Requirements for terms
    • each term is a skos:Concept and one appropriate CRM class:
      E21_Person, E53_Place, E40_Legal_Body, E4_Period, E57_Material, E58_Measurement_Unit, E74_Group; E39_Actor, E55_Type.
      • Use the last two if nothing more specific applies.
      • Distinguish between E21_Person and E40_Legal_Body if you can (if you cannot tell person from organization, only then use E39_Actor)
    • if the thesaurus is hierarchical, use skos:broader and the appropriate CRM relation:
      P88i_forms_part_of (place), P107i_is_current_or_former_member_of (person/organization, if applicable), P9i_forms_part_of (period), P127_has_broader_term (material/type)
  • Requirements for thesauri
    • Each thesaurus is represented as a separate skos:ConceptScheme URI (object, material, ware...)
    • Each term must have exactly one skos:inScheme property
    • It's not critical but is helpful if the term URI starts with the scheme URI, as in Rembrandt
    • The rdfs:label of the ConceptScheme must not be "Concept Scheme" but specific, eg "BM Period/Culture", "BM Material", etc
  • Use ECRM URIs (not CRM SIG URIs) for classes and properties to be consistent
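The term and thesaurus requirements above can be checked mechanically. This is a sketch over (subject, predicate, object) tuples with prefixed names; the "bm:" names in the example and the function name are hypothetical, and ALLOWED_CRM is just the class list above.

```python
from collections import defaultdict

# CRM classes allowed for terms, per the list above (extend as needed,
# eg E56_Language, which is a subclass of E55_Type).
ALLOWED_CRM = {
    "crm:E21_Person", "crm:E53_Place", "crm:E40_Legal_Body", "crm:E4_Period",
    "crm:E57_Material", "crm:E58_Measurement_Unit", "crm:E74_Group",
    "crm:E39_Actor", "crm:E55_Type",
}


def check_thesaurus(triples: list) -> list:
    """Return human-readable problems found in thesaurus triples."""
    types = defaultdict(set)
    schemes = defaultdict(list)
    labels = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        elif p == "skos:inScheme":
            schemes[s].append(o)
        elif p == "rdfs:label":
            labels[s] = o
    problems = []
    for s, classes in sorted(types.items()):
        if "skos:ConceptScheme" in classes:
            if labels.get(s, "Concept Scheme") == "Concept Scheme":
                problems.append(f"{s}: scheme needs a specific rdfs:label")
        if "skos:Concept" in classes:
            if len(schemes[s]) != 1:
                problems.append(f"{s}: {len(schemes[s])} skos:inScheme values, expected 1")
            crm = classes & ALLOWED_CRM
            if len(crm) != 1:
                problems.append(f"{s}: {len(crm)} CRM classes, expected 1")
    return problems


triples = [
    ("bm:period", "rdf:type", "skos:ConceptScheme"),
    ("bm:period", "rdfs:label", "BM Period/Culture"),
    ("bm:t1", "rdf:type", "skos:Concept"),
    ("bm:t1", "rdf:type", "crm:E4_Period"),
    ("bm:t1", "skos:inScheme", "bm:period"),
    ("bm:t2", "rdf:type", "skos:Concept"),
    ("bm:t2", "rdf:type", "crm:E55_Type"),
    ("bm:t2", "skos:inScheme", "bm:period"),
    ("bm:t2", "skos:inScheme", "bm:material"),
]
print(check_thesaurus(triples))
```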

Notes:

  • Martin is against using SKOS for Persons and Places.
  • Dominic will take Martin Doerr's viewpoint as the domain expert.
  • TODO Vlado: I'll post a big discussion [skos-Concept vs Person-Place] on this topic
  • Josh: I don't have a strong enough argument either way as I see both reasons - I agree with you that I'd consider a person to be a concept too. A Person can also be an Artist which is a Concept so I believe they can (not necessarily they would) be a concept themselves.
  • Josh: Is it wise to remove E55_Type from E4_Period? These terms are still part of our thesaurus and are still part of our terminology...
    • Vlado: skos:inScheme indicates they're part of your thesauri, and in fact indicates the particular scheme. They are all skos:Concept, but they are NOT all E55_Type!
      That was the gist of Martin's argument: that People and Places are individuals, not E55_Types.
      But somehow he inferred they are also not skos:Concepts, and I fail to see how.

Rembrandt Thesauri

As an example, here is how the Rembrandt mapping represents thesaurus terms:

rkd-documentation:action_photograph a crm:E55_Type, skos:Concept;
  skos:inScheme rkd-documentation:; crm:P2_has_type rkd-documentation:;
  rdfs:label "action photograph"@en, "actiefoto"@nl.
rst-iconclass:_98B_HOMER a crm:E55_Type, skos:Concept;
  skos:inScheme rst-iconclass:; crm:P2_has_type rst-iconclass:;
  rdfs:label "(story of) Homer"@en.
rkd-artist:Willem_de_Vries a crm:E21_Person, skos:Concept;
  skos:inScheme rkd-artist:; crm:P2_has_type rkd-artist:;
  crm:P131_is_identified_by rkd-artist:Willem_de_Vries-name.
  rkd-artist:Willem_de_Vries-name crm:P3_has_note "Willem de Vries".
rkd-institution:Boissiere_C_de a crm:E78_Collection, crm:E39_Actor;
  crm:P52_has_current_owner rkd-institution:Boissiere_C_de;
  crm:P50_has_current_keeper rkd-institution:Boissiere_C_de;
  crm:P109_has_current_or_former_curator rkd-institution:Boissiere_C_de;
  crm:P2_has_type rkd-type_where:particuliere-collectie;
  crm:P74_has_current_or_former_residence rkd-type_where:parijs;
  crm:P55_has_current_location rkd-type_where:parijs;
  crm:P131_is_identified_by rkd-institution:Boissiere_C_de-name.
  rkd-institution:Boissiere_C_de-name crm:P3_has_note "Boissiere, C. de".
rkd-plaats:zaankant-regio a crm:E53_Place, skos:Concept ;
 skos:inScheme rkd-plaats:; crm:P2_has_type rkd-plaats: ;
 skos:exactMatch rkd-plaats:zaanstreek-regio ;
 skos:broader rkd-plaats:noord-holland-prov ; crm:P88i_forms_part_of rkd-plaats:noord-holland-prov ;
 crm:P87_is_identified_by rkd-plaats:zaankant-regio-name .
 rkd-plaats:zaankant-regio-name crm:P3_has_note "Zaankant (regio)"@nl .

I'm not satisfied with keeping the same data in two redundant ways. TODO: define rules that infer SKOS from CRM properties.
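One possible shape for such rules, sketched in plain Python rather than as OWLIM rules: derive skos:broader from the CRM hierarchy properties listed under Thesaurus Requirements, and skos:inScheme from crm:P2_has_type when the type is itself a skos:ConceptScheme (as in the Rembrandt data, where both are stated). The rule set, the function name, and the assumption that rkd-plaats: is typed as a ConceptScheme are mine, not an agreed design.

```python
# CRM properties that, per the Thesaurus Requirements, mirror skos:broader.
BROADER_FROM = {
    "crm:P88i_forms_part_of",                    # place
    "crm:P9i_forms_part_of",                     # period
    "crm:P127_has_broader_term",                 # material/type
    "crm:P107i_is_current_or_former_member_of",  # person/organization, if applicable
}


def infer_skos(triples: set) -> set:
    """Return only the new SKOS triples entailed by the CRM ones."""
    schemes = {s for s, p, o in triples
               if p == "rdf:type" and o == "skos:ConceptScheme"}
    inferred = set()
    for s, p, o in triples:
        if p in BROADER_FROM:
            inferred.add((s, "skos:broader", o))
        elif p == "crm:P2_has_type" and o in schemes:
            # P2 to a ConceptScheme plays the role of skos:inScheme;
            # P2 to an ordinary type (eg particuliere-collectie) is skipped.
            inferred.add((s, "skos:inScheme", o))
    return inferred - triples


crm_only = {
    ("rkd-plaats:", "rdf:type", "skos:ConceptScheme"),
    ("rkd-plaats:zaankant-regio", "crm:P2_has_type", "rkd-plaats:"),
    ("rkd-plaats:zaankant-regio", "crm:P88i_forms_part_of", "rkd-plaats:noord-holland-prov"),
}
for t in sorted(infer_skos(crm_only)):
    print(*t)
```

With rules like these, only the CRM half would need to be emitted.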

BM Thesauri

Currently you use:

  • SKOS for Place
    • You must also mark them as E53_Place
    • You must add P88i_forms_part_of in addition to skos:broader
  • FOAF and XFOAF for person/organization
    • You mark them with E39.Actor, but must use the ECRM name E39_Actor
    • You use foaf:Name, but that doesn't account for preferred/alternate names...

Other specifics:

  • The rdfs:label of a ConceptScheme should not be "Concept Scheme" but specific: "BM Period/Culture", "BM Person/Period", "BM Object", "BM Material", etc
    This will be shown to the user, and the "BM" prefix indicates the source (to distinguish eg from "RKD Material")
    • I'd put the same in dc:title and dc:description
  • It's critical that different kinds of terms use different ConceptSchemes. (From thesaurus-config it seems so, but I haven't double checked).

Nitpicking

  • Better skip skos:narrower and rely on OWLIM inverseOf inferencing (else you should also emit P9 / P88 for consistency)
  • Instead of "modernandarchaic" issue two triples: "modern" and "archaic"

Please give here a complete list of the BM thesauri (ConceptScheme URIs) with rdfs:label, so I can add them to a "meta thesaurus".

Missing Places

RS-718 A lot of major places are missing. It blocks one of our tasks.

  • eg Paris <thesauri/x18074> is missing though it is referenced through many skos:broader
  • from thesaurusAndplace_0.rdf, thesaurusAndplace_1.rdf of 7/12/20115
  • from old endpoint http://collection.britishmuseum.org/Sparql (checked for Paris)
  • in fact I wasn't able to find ANY of the major cities listed in thesauri-sameAs.ttl (in NL):
    amsterdam antwerpen berlijn den-haag edinburgh frankfurt-am-main kassel-hessen kopenhagen leiden londen madrid-spanje new-york-city parijs stockholm wenen

Units

Rather than using BM-specific URIs for units, it'd be better to map during conversion to ontologies that are more established in this area: NASA QUDT (Quantities, Units, and Dimensions).
Eg the RS mapping uses <http://qudt.org/vocab/unit#Centimeter>.

BM Response
All units/dimensions have been altered to use QUDT, apart from 2 units which are BM-specific:

  • % of rim
  • oclock

You now use a mix of thesauri for units:

@prefix thesUnit: <thesauri/units/> .
@prefix unit: <http://qudt.org/vocab/unit#> .

If you use QUDT, do it all the way and get rid of the thesUnit prefix.

CSW Ontology

BM Thesaurus uses a CSW ontology, and Josh and Vlado don't know what it does.
This is not a problem; it would just be nice to learn what it does.

@prefix csw: <http://semantic-web.at/ontologies/csw.owl#> .

But we get 404 Not Found:
– googled "csw.owl" and found a few mentions in mailing lists, but can't find the ontology itself
http://www.deq.pl/pub/csw.owl also doesn't exist
– Google even finds it in http://collection.britishmuseum.org/id/thesauri/object
http://sparql.tw.rpi.edu refers to http://thedatahub.org/dataset/geological-survey-of-austria-thesaurus/post.ttl
that mentions it but doesn't use it
– googling "csw site:http://semantic-web.at" doesn't uncover anything useful,
just blog links to http://www.corporate-semantic-web.de/

Problems 201207

  • MAYBE remove that stuff about added/removed/enlarged/diminished. Let's discuss
    <http://erlangen-crm.org/current/P111i_was_added_by> <http://collection.britishmuseum.org/id/object/YCA80389/acquisition>;
     <http://erlangen-crm.org/current/P113i_was_removed_by> <http://collection.britishmuseum.org/id/object/YCA80389/acquisition>;
    

    BM Response - Done!

  • this doesn't always hold:
    <http://erlangen-crm.org/current/P49_has_former_or_current_keeper> <http://collection.britishmuseum.org/id/person-institution/53456>;
     <http://erlangen-crm.org/current/P50_has_current_keeper> <http://collection.britishmuseum.org/id/the-british-museum>;
     <http://erlangen-crm.org/current/P46i_forms_part_of> <http://collection.britishmuseum.org/id/the-british-museum>;
    

    BM Response - as with above this has been refined

  • if you have two dimensions of the same type, you should still emit two separate Dimensions, eg
    <http://collection.britishmuseum.org/id/object/YCA80389/diameter>
      <http://collection.britishmuseum.org/id/ontology/PX_display_wrap>
        "Dimension Diameter :: 1.30cm :: bead A", "Dimension Diameter :: 1.40cm :: projected - bead B";
      <http://erlangen-crm.org/current/P90_has_value>
        "1.30"^^xsd:double, "1.40"^^xsd:double;
    

    The two displayWraps are ok, but the two values are silly.
    Also: you should emit the note (eg "projected - bead B") as P3_has_note
    BM Response
    Each dimension will have a /number suffix, i.e. /diameter/1, /diameter/2 etc.
    The note has also been added as a P3_has_note

  • remove the inverse P108_has_produced
    BM Response - Done!
  • in this PX_display_wrap "Made by ethnic group :: Copper Inuit :: Inuinnait" I don't understand where the middle part comes from
    BM Response
    This is the textual value of the ethnic group thesaurus term. Since thesauri are in a different export and will be SKOS concepts, the link from the object to the term is emitted here, and the text of the term is output as a display_wrap.
  • If this is inscription type, then say "Inscription type" instead of "Inscription note".
    If this is actually a note, then emit P3_has_note and not display_wrap:
    <http://collection.britishmuseum.org/id/object/JCF22970/inscription/1>
         <http://collection.britishmuseum.org/id/ontology/PX_display_wrap> "Inscription note :: :: Artist's inscription";
    

    BM Response
    The text has come from the inscription note field - it is just free text which someone has entered as a comment.

  • use rdfs:label here, no need for sub-property
    <http://collection.britishmuseum.org/id/object/JCF22970/inscription/1/translation>
       <http://collection.britishmuseum.org/id/ontology/PX_has_content> "Portrait of Gordon Marsh";
    

    BM Response - Done!

  • you lose precision here (you have yyyy-mm-dd but you emit only yyyy).
    Print the whole string in P82, and if possible emit P82a,b with higher precision.
    <http://collection.britishmuseum.org/id/object/JCF22970/production/1/date> <http://collection.britishmuseum.org/id/ontology/PX_display_wrap> "Production date :: 1946 :: 26 February";
       <http://erlangen-crm.org/current/P82_at_some_time_within> "1946"^^xsd:string;
       <http://erlangen-crm.org/current/P82a_begin_of_the_begin> "1946"^^xsd:gYear;
       <http://erlangen-crm.org/current/P82b_end_of_the_end> "1946"^^xsd:gYear;
    

    BM Response
    I have standardised the dates for Production & Acquisition based upon Peter Main's description:

As Peter Main has said:
Acquisition date-ranges are rendered as D2 M3 Y4, e.g. ‘23 May 2012’.
Production date-ranges (where only information to the nearest year is deemed to be relevant) are rendered as Y4. So the date above would be rendered as ‘2012’.

Here are some examples:

Acquisition Date:
Text date = ‘1984’ , earliest date = ’01 Jan 2012’, latest date = ’31 Dec 2012’
Text date = ‘1970-1980’, earliest date = ’01 Jan 1970’, latest date = ’31 Dec 1980’.
Text date = ’23 Mar 2012 – 25 Mar 2012’, earliest date = ’23 Mar 2012’, latest date = ’25 Mar 2012’.

Production date:

Text date = ‘13thC’, earliest date = ‘1200’, latest date = ‘1299’
Text date = ‘12thC BC’, earliest date =’1199 BC’, latest date = ‘1100 BC’
Text date = ‘1100 BC – 100 AD’, earliest date = ‘1100 BC’, latest date = ‘0100’
Therefore I have attempted to format the dates as follows:
- Production Dates
– Date Text: xsd:string (P82)
– Date Earliest: xsd:gYear (P82a) e.g. -0100 (100 BC), 2010 (2010AD)
– Date Latest: xsd:gYear (P82b) e.g. -0100 (100 BC), 2010 (2010AD)

- Acquisition Date
– Date Text: xsd:string (P82)
– Date Earliest: xsd:date (P82a) e.g. 2010-03-25 (25 Mar 2010)
– Date Latest: xsd:date (P82b) e.g. 2010-03-25 (25 Mar 2010)
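A sketch of the conversions described above, assuming Merlin's text dates look like the examples ('13thC', '12thC BC', '25 Mar 2010'); the function names are hypothetical, and mixed ranges such as '1100 BC – 100 AD' are not handled. Note the year-numbering caveat in gyear().

```python
import re
from datetime import datetime


def gyear(year: int) -> str:
    """xsd:gYear lexical form; BC years are negative, eg 100 BC -> "-0100".

    Follows the convention on this page (XSD 1.0 style, no year zero).
    Under XSD 1.1, "-0100" would instead denote 101 BC.
    """
    if year == 0:
        raise ValueError("there is no year 0 in this convention")
    return f"-{-year:04d}" if year < 0 else f"{year:04d}"


CENTURY = re.compile(r"^(\d+)(?:st|nd|rd|th)C( BC)?$")


def century_range(text: str) -> tuple:
    """'13thC' -> (1200, 1299); '12thC BC' -> (-1199, -1100), as in the examples.

    For the 1st century (AD or BC) this rule reaches year 0, which gyear() rejects.
    """
    m = CENTURY.match(text.strip())
    if not m:
        raise ValueError(f"not a century: {text!r}")
    n = int(m.group(1))
    if m.group(2):
        return -(n * 100 - 1), -((n - 1) * 100)
    return (n - 1) * 100, n * 100 - 1


def acquisition_date(text: str) -> str:
    """'25 Mar 2010' -> '2010-03-25' (the xsd:date lexical form)."""
    return datetime.strptime(text, "%d %b %Y").date().isoformat()


print([gyear(y) for y in century_range("12thC BC")])  # ['-1199', '-1100']
print(acquisition_date("25 Mar 2010"))               # 2010-03-25
```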

  • Kill the E11.Modification:
    <http://collection.britishmuseum.org/id/object/JCF22970/production/4>
       a <http://erlangen-crm.org/current/E11.Modification>,
         <http://erlangen-crm.org/current/E12_Production>.
    

    BM Response - Done!

  • If there is no Acquisition Date, do not produce this time-span node.
  • There's no need to convert cm->m (or mm->m). If it's recorded as cm, you should emit it as cm, because that displays better
    <http://collection.britishmuseum.org/id/object/CGR307834/diameter>
        <http://collection.britishmuseum.org/id/ontology/PX_display_wrap> "Dimension Diameter :: 24.00mm ::";
        <http://erlangen-crm.org/current/P90_has_value> "0.024"^^xsd:double;
        <http://erlangen-crm.org/current/P91_has_unit> <http://qudt.org/vocab/unit#Meter>;
    

    BM Response - This was done because the QUDT ontology had no unit for millimetres, only Meter, so the conversion was made. The same is true of milligrams and millilitres.

Vlado2: You should use the latest version: http://www.linkedmodel.org/catalog/qudt/1.1/index.html, download all Vocabularies>Turtle and Schemas>Turtle
Then in OVG_units-qudt-(v1.1).ttl you will find unit:Millimeter and unit:Centimeter; eg unit:Centimeter has this data:

@prefix qudt:    <http://qudt.org/schema/qudt#> .
@prefix unit:    <http://qudt.org/vocab/unit#> .
unit:Centimeter
      rdf:type qudt:DerivedUnit , qudt:LengthUnit ;
      rdfs:label "Centimeter"^^xsd:string ;
      qudt:abbreviation "cm"^^xsd:string ;
      qudt:code "2016"^^xsd:string ;
      qudt:conversionMultiplier "0.01"^^xsd:double ;
      qudt:conversionOffset "0.0"^^xsd:double ;
      qudt:literal "centimeter"^^xsd:string ;
      qudt:symbol "cm"^^xsd:string ;
      qudt:uneceCommonCode "CMT"^^xsd:string ;
      skos:exactMatch <http://dbpedia.org/resource/Centimetre> .
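A minimal sketch of emitting dimensions in the recorded unit, and deriving metres only when a consumer needs to compare values, via the unit's qudt:conversionMultiplier. The lookup table is an assumption: only Centimeter (0.01) is quoted above, so verify the other entries (and any milligram/millilitre units) against OVG_units-qudt-(v1.1).ttl.

```python
QUDT_UNIT = "http://qudt.org/vocab/unit#"
CRM = "http://erlangen-crm.org/current/"

# Hypothetical table: Merlin unit abbreviation -> (QUDT 1.1 local name,
# qudt:conversionMultiplier). Only Centimeter's multiplier is quoted above.
UNITS = {
    "mm": ("Millimeter", 0.001),
    "cm": ("Centimeter", 0.01),
    "m": ("Meter", 1.0),
}


def dimension_triples(dimension_uri, value, unit):
    """Emit P90/P91 in the unit the value was recorded in (no conversion)."""
    local_name, _ = UNITS[unit]
    return [
        f'<{dimension_uri}> <{CRM}P90_has_value> "{value}"^^xsd:double .',
        f"<{dimension_uri}> <{CRM}P91_has_unit> <{QUDT_UNIT}{local_name}> .",
    ]


def to_metres(value, unit):
    """Derive an SI value on demand, using the unit's conversion multiplier."""
    _, multiplier = UNITS[unit]
    return value * multiplier
```

For the diameter example above, `dimension_triples(".../CGR307834/diameter", 24.0, "mm")` keeps 24.0 with unit:Millimeter, and `to_metres` yields the 0.024 m value only if it is actually needed.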
  • the URIs for these thesauri are mighty convoluted.
    <http://collection.britishmuseum.org/id/thesauri/inscription-script/latin>;
      <http://collection.britishmuseum.org/id/thesauri/script/language/latin>;
    

    I'd suggest:

    <http://collection.britishmuseum.org/id/thesauri/script/latin>;
      <http://collection.britishmuseum.org/id/thesauri/language/latin>;
    
  • you should not emit term definitions in-line.
    And make sure term definitions match the URIs used in the object. These are different from the ones above:
    <http://collection.britishmuseum.org/id/thesauri/inscription-language/latin> a <http://erlangen-crm.org/current/E56_Language>,
          <http://www.w3.org/2004/02/skos/core#Concept>;
          <http://www.w3.org/2004/02/skos/core#inScheme> <http://collection.britishmuseum.org/id/thesauri/inscription-language>;
           <http://www.w3.org/2004/02/skos/core#prefLabel> "Latin".
        <http://collection.britishmuseum.org/id/thesauri/inscription-script/latin> a <http://erlangen-crm.org/current/E55_Type>,
           <http://www.w3.org/2004/02/skos/core#Concept>;
           <http://www.w3.org/2004/02/skos/core#inScheme> <http://collection.britishmuseum.org/id/thesauri/inscription-script>;
           <http://www.w3.org/2004/02/skos/core#prefLabel> "Latin".
    

    BM Response - These are generated in-line because both language and script are free-text fields, not controlled vocabularies, so anything can be entered in them. This data belongs only with the object and therefore can only be produced with the object, but URIs are created from it in order to produce structured data.

  • no need to describe an identifier type in-line: that's what the meta-thesaurus is for. Eg
    <http://collection.britishmuseum.org/id/object/YCA80389/prn>
       <http://erlangen-crm.org/current/P2_has_type> <http://collection.britishmuseum.org/id/thesauri/identifier/prn>; # ok
       rdfs:comment "This is the internal collections database's identifier for this object (record)"; # no need
    

    BM Response - Done!

  • do Events live in-line in objects, or in a thesaurus?
    <http://collection.britishmuseum.org/id/thesauri/event/Asia-Pacific-War-1941-1945>
        a <http://erlangen-crm.org/current/E5_Event>;
        rdfs:label "Asia-Pacific War (1941-1945)".
    

    If in thesaurus, don't emit them in-line
    BM Response - Done! - This is the same as 2 points above - the Title field in Merlin is free text and not from a centralised vocabulary therefore the data only belongs with the object. Thus this structured data is generated with the object data in-line.
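Free-text values can still become thesaurus terms without being emitted inline in every object graph. A minimal sketch, assuming the shorter URI scheme suggested above (thesauri/script/latin, thesauri/language/latin): terms are minted from the text and collected in one place, so each definition is written once to its own graph. The slug rule and the `TermCollector` class are hypothetical, and the slug will merge labels that differ only in case or punctuation.

```python
import re

THES = "http://collection.britishmuseum.org/id/thesauri/"


def slug(text):
    """Hypothetical slug rule: lower-case, keep letters, digits and '&', join the rest with '-'."""
    return re.sub(r"[^a-z0-9&]+", "-", text.strip().lower()).strip("-")


def turtle_string(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


class TermCollector:
    """Collect terms minted from free-text fields so each is emitted once,
    in a separate thesaurus graph, instead of inline in every object graph."""

    def __init__(self):
        self.terms = {}  # term URI -> (scheme URI, CRM class, label)

    def term(self, scheme, label, crm_class="crm:E55_Type"):
        uri = f"{THES}{scheme}/{slug(label)}"
        # setdefault keeps the first definition seen for a URI
        self.terms.setdefault(uri, (f"{THES}{scheme}", crm_class, label.strip()))
        return uri

    def to_turtle(self):
        blocks = []
        for uri, (scheme_uri, crm_class, label) in sorted(self.terms.items()):
            blocks.append(
                f"<{uri}> a {crm_class}, skos:Concept;\n"
                f"    skos:inScheme <{scheme_uri}>;\n"
                f'    skos:prefLabel "{turtle_string(label)}".'
            )
        return "\n".join(blocks)
```

The object graph would then only reference the returned URI (for example from the inscription), while `to_turtle()` is written once to the thesaurus graph.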

P62 vs P67,P129,P139

  • unfortunately P129's domain is conceptual so you can't use it directly at the object:
    <object/CGR307834>
          PX_display_wrap "Subject :: classical deity ::", "Subject :: emperor/empress ::";
          P129_is_about <thesauri/x116470>
    
    • You need an intermediate conceptual node:
      • either a fake one like <object/CGR307834/concept>
      • or somehow reuse a meaningful one like <object/CGR307834/image> or <object/CGR307834/inscription/2>.
      • POSTED ISSUE TO CRM SIG: "P62 needs a parent and a sibling"
    • Josh: I agree with the above and that P62 needs a parent, but since it doesn't have one for the moment, I believe your first option of creating the parasitic concept node is most applicable (particularly as that is how we have done it for associated people/places/events). Therefore, would it be appropriate to do:
      <obj> P128_carries <obj/concept/M>.
      <obj/concept/M> a crm:E73_Information_Object.
      <obj/concept/M> P67_refers_to <thes/x1234>.
      <obj/concept/M> P2_has_type <thes/association/subjectof>.
      
    • Vlado: Use the more specific P129_is_about instead of P67_refers_to. You started from P129, so why give it up and settle for P67?
  • Here neither P62_depicts is correct (that's for physical objects), nor P67_refers_to (that's too generic). Use P138_represents.
    <http://collection.britishmuseum.org/id/object/CGR307834/inscription/3>
        <http://collection.britishmuseum.org/id/ontology/PX_display_wrap> "Refers to :: Nerva ::";
        <http://erlangen-crm.org/current/P62_depicts> <http://collection.britishmuseum.org/id/person-institution/140712>;
        <http://erlangen-crm.org/current/P67_refers_to> <http://collection.britishmuseum.org/id/person-institution/140712>;
    

    BM Response - Done!
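A minimal sketch of generating the concept-node pattern that is eventually agreed further down this page for BM "subjectof" (P128_carries, E73_Information_Object and P129_is_about, with no P2_has_type). Numbering the concept nodes from 1 per object is an assumption; Merlin's own association numbering (e.g. concept/4) could be used instead.

```python
def subject_triples(obj, terms):
    """Emit the 'subjectof' pattern for an object and its subject terms:
    one E73_Information_Object per subject, carried by the object and
    linked to the term with P129_is_about (no P2_has_type)."""
    lines = []
    for n, term in enumerate(terms, start=1):
        concept = f"{obj}/concept/{n}"
        lines.append(f"<{obj}> crm:P128_carries <{concept}>.")
        lines.append(f"<{concept}> a crm:E73_Information_Object;")
        lines.append(f"    crm:P129_is_about <{term}>.")
    return "\n".join(lines)
```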

Person / Biographical Dates

Vladimir has pointed out that P4_has_time-span cannot be used between <person-institution> and <person-institution/date> to associate dates, because <person-institution> is not a Period.

Within Merlin, one can associate a number of different dates (ranges) with a biographical record, but each is qualified only by a comment. For example:

  • Display Name: Shah 'Abbas I
    • First Date: 1571
    • Last Date: 1629
    • First Date: 1587
    • Last Date: 1629
    • Details: ruled; AH 995-AH 1038

Here it can be seen that there are 2 pairs of dates associated with Shah 'Abbas I. On inspection by a human, it seems that the first range is the birth and death dates and the second is the dates of reign.

Vlado has suggested that the date pair with the longest range could be the birth/death dates (and therefore modelled as E63_Beginning_of_Existence/E64_End_of_Existence), and the shorter range as an E7_Activity. However, this only works when there are 2 date pairs. If there is only one pair, it may NOT be (although it often is) the birth/death date, e.g.:

  • Display Name: Shah Muhammad Khudabanda
    • First Date: 1578
    • Last Date: 1587
    • Details: ruled; AH 985-AH 996

The only date associated with Shah Muhammad is his reign and NOT his life span. The only thing that qualifies this is the free-text description.

Therefore I suggest that all dates are modelled as some E7_Activity, joined to the Actor through P12i_was_present_at:

<person-institution> a E39_Actor.
<person-institution> crm:P12i_was_present_at <person-institution/activity/M>.

<person-institution/activity/M> a E7_Activity.
<person-institution/activity/M> crm:P3_has_note "Details Text".
<person-institution/activity/M> crm:P4_has_time-span <person-institution/activity/M/date>.

<person-institution/activity/M/date> a E52_Time-Span.
...

UPDATE: I spoke with Jonathan Whitson Cloud in the Documentation team, and he agreed that if there is no text in the Details field, the dates can be assumed to be birth/death dates. Therefore I will model birth/death dates as follows:

<person-institution> a E39_Actor.
<person-institution> crm:P92i_was_brought_into_existence_by <person-institution/birth>.
<person-institution> crm:P93i_was_taken_out_of_existence_by <person-institution/death>.

<person-institution/birth> a E63_Beginning_of_Existence.
<person-institution/birth> crm:P4_has_time-span <person-institution/birth/date>.
<person-institution/birth/date> a E52_Time-Span.

<person-institution/death> a E64_End_of_Existence.
<person-institution/death> crm:P4_has_time-span <person-institution/death/date>.
<person-institution/death/date> a E52_Time-Span.

Vlado: right, but use the specific stuff for birth & death: E67/P98 & E69/P100
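A minimal sketch combining the UPDATE rule (empty Details means birth/death) with Vlado's suggestion to use E67_Birth/E69_Death. It assumes each Merlin date pair arrives as two AD years plus the Details text, and it types the actor as E21_Person in the birth/death case because E67/E69 apply to persons. Using outer bounds (P82a/P82b) for the activity is a choice made here; later in this page P81a/b is suggested as possibly more accurate for reigns.

```python
def bio_dates(actor, first, last, details, n=1):
    """Map one Merlin date pair (AD years) plus its Details text to Turtle.

    Empty Details -> birth/death (E67_Birth, E69_Death), so the actor is
    typed as E21_Person; otherwise an E7_Activity the actor was present at.
    """
    def span(node, begin, end):
        return [
            f"<{node}> a crm:E52_Time-Span;",
            f'    crm:P82a_begin_of_the_begin "{begin:04d}"^^xsd:gYear;',
            f'    crm:P82b_end_of_the_end "{end:04d}"^^xsd:gYear.',
        ]

    if not details.strip():
        lines = [
            f"<{actor}> a crm:E21_Person;",
            f"    crm:P98i_was_born <{actor}/birth>;",
            f"    crm:P100i_died_in <{actor}/death>.",
            f"<{actor}/birth> a crm:E67_Birth; crm:P4_has_time-span <{actor}/birth/date>.",
            f"<{actor}/death> a crm:E69_Death; crm:P4_has_time-span <{actor}/death/date>.",
        ]
        # Birth and death are instantaneous: each gets a single-year span.
        lines += span(f"{actor}/birth/date", first, first)
        lines += span(f"{actor}/death/date", last, last)
    else:
        act = f"{actor}/activity/{n}"
        note = details.replace("\\", "\\\\").replace('"', '\\"')
        lines = [
            f"<{actor}> crm:P12i_was_present_at <{act}>.",
            f'<{act}> a crm:E7_Activity; crm:P3_has_note "{note}";',
            f"    crm:P4_has_time-span <{act}/date>.",
        ]
        lines += span(f"{act}/date", first, last)
    return "\n".join(lines)
```

For Shah 'Abbas I the function would be called once per pair: the 1571-1629 pair (no Details) becomes birth/death, and the 1587-1629 pair ("ruled; ...") becomes an activity.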

20120809 bm_ontology

  • Check & justify all props that are not subprops of CRM.
    • Eg why bmo:PX_denomination is subprop of rdfs:label but not P3?
    • Eg bmo:PX_field_of_activity_of_the_agent should be subprop of P3, like the rest of them
      BM Response
      All properties are now sub-properties of a CRM property. The only ones that remain are: bmo:probably, bmo:unlikely, bmo:property.

– Eg bmo:PX_appellation_in_use is not subprop, and that is wrong. Extensions can define long paths, but not shortcuts.
You should use something like this:

<id> a E41_Appellation; P37i_was_assigned_by <id/start>; P38i_was_deassigned_by <id/end>.
<id/start> a E15_Identifier_Assignment; P4_has_time-span <id/start/date>.
<id/end> a E15_Identifier_Assignment; P4_has_time-span <id/end/date>.

– Except that E15 is applicable to E42_Identifier but not to other E41_Appellations; see apellation@crmg
– If BM needs to apply E15 to other appellations (eg E44_Place_Appellation), write to CRM SIG
Josh3: Will compose an email to the SIG.

BM Response
I can see the logic in emailing the SIG about this, but there is a property that relates E15 and E41: P142i_was_used_in.
I'm not sure if this works semantically, but looking at the scope note: "This property associates the event of assigning an instance of E42 Identifier to an entity, with the instances of E41 Appellation that were used as elements of the identifier." If we were to use it here, we would not be assigning an E42_Identifier (we could also assert someone's name as an Identifier, but I don't think that's necessary when the names are like "King John" or "Caesar"); we would just be saying there was an Identifier_Assignment event and attaching a time-span to it:

Vlado: P142 is not appropriate. As the scope note and as the prop name "P142 used constituent" state, this is used to describe compound identifiers.
P37i and P38i are appropriate. But as I said, the problem is that E15 is not applicable to E41 (which plagues all of P37, P38 and P142).

Josh: Yes, understood about the identifiers (as we're not assigning any identifiers). I shall email SIG about this and see what they say.

# in the BM ontology
bmo:PX_named_at rdfs:subPropertyOf crm:P142i_was_used_in.
bmo:PX_unnamed_at rdfs:subPropertyOf crm:P142i_was_used_in.

<id> a E41_Appellation; bmo:PX_named_at <id/start>; bmo:PX_unnamed_at <id/end>.
<id/start> a E15_Identifier_Assignment; P4_has_time-span <id/start/date>.
<id/end> a E15_Identifier_Assignment; P4_has_time-span <id/end/date>.
  • Every property must have rdfs:label (in addition to rdfs:comment).
    BM Response Done

I can generate these with a script.
Eg

bmo:PX_denomination rdfs:label "denomination".
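The script itself is not shown here; a minimal hypothetical equivalent derives each label from the property's local name by dropping the prefix and the two-letter PX_/EX_ marker, then replacing underscores with spaces. Generated labels would still need a human check.

```python
import re


def label_for(prop):
    """Derive an rdfs:label from a property CURIE: drop the namespace prefix,
    a two-letter PX_/EX_ marker, and turn underscores into spaces."""
    local = prop.split(":", 1)[-1]
    local = re.sub(r"^[A-Z]{2}_", "", local)
    return local.replace("_", " ")


def label_triples(props):
    """One Turtle rdfs:label statement per property CURIE."""
    return [f'{p} rdfs:label "{label_for(p)}".' for p in props]
```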
  • Define domain and range wherever possible (they also serve as great documentation).
    • This includes range of data properties, but be careful not to overspecify (and for subprop of P3 you don't need to state the range).
    • You can skip domain/range for a subProperty only if they are the same as for the super-property (if not, they must be subclasses).
      BM Response Done

– Eg bmo:PX_type_series: what classes does this apply to?
BM Response
In Merlin there is a text field called "Type Series". It is almost a specialisation of the type of object something is, e.g. PRN: BCE187811, Object Type/Name: brooch (from the Object Thesaurus), Type Series: Hull & Hawkes, Type III. Here the record says the object is a brooch, but also that it is of type 'Hull & Hawkes' and 'Type III', which are specific groupings of brooches (but of other object types too). Therefore type series is a sub-prop of P2_has_type and each type series is created as an E55_Type. I have now added skos:Concept, changed rdfs:label to skos:prefLabel and also added it to a ConceptScheme series-type (as Julia in our Documentation Department said that if she were to create a controlled vocabulary for this, she would create a new thesaurus).

Vlado: Got it. Be careful how you model it. These things are dependent, so skos:broader should be used.

  • "brooch" is an independent Concept (term)
  • "Hull & Hawkes" is actually a ConceptScheme. It's not a term because you would not use it alone to tag an object.
    If it is too fixed to model it as a ConceptScheme, then model it as a higher-level Concept.
  • "Type III" on its own means nothing. It's a lower-level term.

Assuming some made-up URIs:

<object/brooch> a skos:Concept; skos:inScheme <object>.
<object/brooch/hull-&-hawkes> a skos:ConceptScheme.
<object/brooch/hull-&-hawkes/type-iii> a skos:Concept; skos:inScheme <object/brooch/hull-&-hawkes>; skos:broaderMatch <object/brooch>.

Josh: There are several issues with this:

  1. 'Hull & Hawkes' is being used as its own term here. The object has a field called 'type series' and one of the type series is 'Hull & Hawkes'.
  2. Asserting that Type III is a broader match of object/brooch may also be dangerous, as according to Julia it may be a broader match of terms in multiple thesauri.
  3. There is no automated way to assert the above: as mentioned, this is a text field with no proper standardisation of what people enter. Hence the text has been made part of a URI and turned into a Type. This is one of the things I've kept from Seme4's mapping, as it seems to be the best way to do it at the moment. Perhaps the above could be done if/when type series becomes a thesaurus in its own right?

Vlado2: Got it: you understand better than me what is dependent and what is independent. Glad you won't be using broaderMatch as that may cause some complications in RS.

  • But there are still a couple of concerns:
    • If it's free text, then map it to free text (P3) unless you're certain it's been used consistently in Merlin
    • If you model it as thesauri terms, you should do this in flatauthorities and emit it once. Don't inline it in objects else you'll get duplicate statements in different graphs
    • be careful how you assign URIs, eg <object/type-iii> is quite probably not unique: <object/hull-&-hawkes/type-iii> is better
  • Josh2: As previously mentioned, the only reason anything is defined inline with objects is that the data lives with the object record and we're generating a resource out of it on the fly. This is true of series-type, collection, title etc., which are usually free-text fields. The suggestion was that although there is no standardised way to use the text field, if people use the same text string, they generally mean the same thing. I think Seme4/Peter/MDoerr decided that it was better to generate this data as structured data...
  • Josh3: I have found a way in which to separately generate inline resources outside of the object graphs so this should be resolved.
  • bmo:EX_Series cannot be subclass of E78_Collection, since that gathers E18_Physical_Things
    BM Response Agreed and done! I've actually modelled this the same as EX_Bibliographic_Series (sub-class of Propositional Object) as it is a similar sort of class (but not the same)
  • Instead of EX_Aggregation, I think you should use crm:E78_Collection (the semantics, i.e. scope notes, are pretty close).
    At least define it as subclass of E78, not of E19_Physical_Object
    BM Response DONE! I actually did this as part of this cleaning up just before I got to this line.
  • Vlado: you don't need PX_published_by (SubPropertyOf P14_carried_out_by). As shown in BM Association Mapping#Produced By Specific Process, you can use this:
    <obj> P108i_was_produced_by <obj/production/M>.
    <obj/production/M> a E12_Production; P2_has_type <production/publication>; P14_carried_out_by <publisher-institution>.
    

    Josh: This has already been done so not sure where this came from...
    Vlado2: it's present in bm_ontology. And as you can see in BM-properties.xls: PX_published_by and PX_published_in are used in bibliography-config.xml (1 occurrence each)

Other bm_ontology notes

  • The previous bm_ontology had 280 properties. Given that all of CRM has 150, that showed an inappropriate level of modeling.
  • After BM Association Mapping it is much smaller (7 classes, 30 props), which is a good thing.
  • OMN is better for reading, but to load into OWLIM we need it in Turtle. Here's how to convert it:
    curl --data-urlencode ontology@bm_ontology.omn --data format=Turtle http://owl.cs.manchester.ac.uk/converter/convert > bm_ontology-unclean.ttl
  • The Manchester converter puts in some junk, so here's a script, bm_ontology-clean.pl, to clean it up:
    perl bm_ontology-clean.pl bm_ontology-unclean.ttl > bm_ontology.ttl

    Josh: We have incorporated this into our process...

20120816 Configs and Ontology

I extracted the used classes and properties to BM-properties.xls.
Examining it brought a nice catch. I focused on every class/prop that's used rarely, or that I haven't used previously.
I also posted a couple of questions/issues to CRM SIG.
I've highlighted in red the classes/props I don't like, and put a comment. But I believe I've reflected all of them below.

Objects

config.xml

  • PX_display_wrap" value="Inscription note :: {bm_inscription_note}"
    This display_wrap is not needed since it just copies the P3 below it
  • Languages, Scripts, Collections and Series should be defined in a thesaurus, and not inlined in object data.
    Else you'll get duplicate facts in different named graphs.
    Search for: id/thesauri/language, id/thesauri/script, id/collection, id/thesauri/series-type
    Josh3: this has been done (see above)
  • Inlined "The British Museum: Gallery {bm_loc}".
    Better to make these into a thesaurus. Else you can't search for eg "all objects in that Gallery"
    Josh3: This has been left as extracting it seems quite complex.
  • bmo:PX_has_transliteration
    The only place the CRM spec speaks of "transliteration" is in P139_has_alternative_form. But that's applicable to E41_Appellation, not to Inscription (which is Linguistic Object).
    I wrote to the CRM SIG to change this
    8/16/2012: ISSUE P139 vs P130; E41 Appellation vs E33 Linguistic Object; Translation vs Transliteration
    So until that is fixed, a subprop of P3_has_note should do.
  • bmo:PX_school_of: instead, use the pattern BM Association Mapping#Influenced By (for S: School of/style of) or BM Association Mapping#Produced By Closely Related Group (for AG: Office/studio of, AJ: Circle/School of)
    Josh3: DONE!
  • crm:P108_was_produced_by : this is a typo (twice!)
    Josh3: DONE!

Aspects

We haven't discussed aspects, so let's discuss
1 - what is the list of aspects
2 - whether it's most appropriate to model as E25_Man-Made_Feature and P56_bears_feature (probably so)
3 - given P56_bears_feature, the inverse P56i_is_found_on is not needed
4 - whether they have any relation to parts like dust_cap_made_at, and how do you model those
5 - whether they have any relation to inscription position such as "obverse", "reverse"

Josh4: According to Peter:
1 - Fixed List (see attached - aspects.csv)
2 - Used for 3 purposes (see Type column on attached spreadsheet). “view” = a ‘view’ of the object, rather than a physical part. “part” is a physical part. “language” is used to store Chinese translations of some fields. Ignore any data associated with the “Chinese” aspect. Views and Parts should probably be mapped differently to reflect the difference in meaning. However, the use of parts is now becoming defunct, therefore no object records should have an aspect as part; this is to be clarified by Jonathan Whitson Cloud. Structurally it may still exist in the object records, however, just with no actual information.
3 - Done.
4 - See 2 above
5 - Yes. Same idea as aspects of type “view”. Logically, an inscription on the obverse of the coin could have been stored within the “obverse” aspect, but it’s actually stored within aspect “1”, with the inscription position field set to “obverse”. I think the only field that uses the different “view”-type aspects is Description. Weird, I know, but there are historical reasons.

As a result I have modelled the aspects (views only) as E55_Types (present in flatauthorities) and done a P2_has_type from the aspect. Views will be modelled as features. Jonathan Whitson Cloud is away at present so he cannot confirm that we're no longer using part. See below about connection with Inscription.

Vlado4: just to clarify, view aspects are modeled as E55_Type of a feature (E26 Physical Feature or maybe better E25 Man-Made Feature) on the object? Sounds right.
Josh5: Yep, this is what I have done.
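A minimal sketch of the approach described above: keep only "view" aspects from aspects.csv and model each as a feature on the object, typed by an E55_Type term. The column names ("Aspect", "Type"), the feature URI and the thesauri/aspect/ scheme are assumptions for illustration, not taken from the actual spreadsheet or mapping.

```python
import csv
import io


def view_aspects(csv_text):
    """Return the names of aspects whose Type is 'view'.

    'part' aspects (becoming defunct) and 'language' aspects such as
    Chinese are skipped. Column names are assumptions about aspects.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Aspect"].strip() for row in reader
            if row["Type"].strip().lower() == "view"]


def view_feature(obj, aspect):
    """Model a view aspect as a feature on the object, typed by an E55_Type term."""
    slug = aspect.strip().lower().replace(" ", "-")
    feature = f"{obj}/feature/{slug}"
    return "\n".join([
        f"<{obj}> crm:P56_bears_feature <{feature}>.",
        f"<{feature}> a crm:E25_Man-Made_Feature;",
        f"    crm:P2_has_type <thesauri/aspect/{slug}>.",
    ])
```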

Collections

"an object may be in an aggregation of objects".
Can you give me a produced file (RDF example)?

  • Not sure why you need both P46i_forms_part_of and P46_is_composed_of
  • This is obviously a typo (remove E35_Title). Sadly, there's no type-check in semweb so any URL goes:
    <type value="http://erlangen-crm.org/current/E35_Title/E78_Collection

    Josh3:DONE!
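Since nothing in the RDF toolchain rejects a malformed class URI, a cheap check in the mapping pipeline can catch typos like the one above. A minimal sketch, assuming ECRM class URIs follow the pattern E<number>_<Name>; `known_classes` is a hypothetical parameter that in practice would be filled from the ECRM OWL file.

```python
import re

CRM = "http://erlangen-crm.org/current/"
# ECRM class local names look like E78_Collection: 'E', digits, '_', then a
# name made of letters, digits, '_' and '-' (no '/' or '.').
CRM_CLASS = re.compile(r"^E\d+_[A-Za-z][A-Za-z0-9_-]*$")


def suspicious_types(type_uris, known_classes=None):
    """Return CRM type URIs that are malformed, or missing from known_classes
    (a set of local names, e.g. read from the ECRM OWL file) when it is given."""
    bad = []
    for uri in type_uris:
        if not uri.startswith(CRM):
            continue  # this sketch only checks CRM class URIs
        local = uri[len(CRM):]
        if not CRM_CLASS.match(local):
            bad.append(uri)
        elif known_classes is not None and local not in known_classes:
            bad.append(uri)
    return bad
```

The same check would also flag the E11.Modification URI seen earlier on this page.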

Biography

biography-config.xml:

  • I firmly believe you should use only CRM here, not FOAF/XFOAF/RDA. Eg
    Nationality: P107i_is_current_or_former_member_of <id/thesauri/nationality/Spanish> (already done)
    Gender: P107i_is_current_or_former_member_of <id/thesauri/nationality/Male>

Josh3: I've decided to do this slightly differently, as it doesn't seem semantically accurate to model Male and Nationalities in the same ConceptScheme.
Vlado3: my mistake, I meant <id/thesauri/gender/Male>
Josh3: Therefore, I have created a sub-property of P2_has_type: bmo:PX_gender and male & female are E55_Types.
Vlado3: I see what you mean: Gender being binary (most of the time) seems more "type-y" than Nationality.
I still think it's slightly better to model with a group:
P107i_is_current_or_former_member_of <id/thesauri/gender/Male>
Josh4: I'm still not fully convinced of this semantically, although I can see the logic. I presume there is no issue with keeping it as a P2_has_type until we can figure this out semantically/philosophically...
Vlado4: Leave gender as E55_Type.
Josh5: Done.

  • this is not proper xsd:date, so it'll fail to parse:
    P81b_begin_of_the_end> "01 Jan 1623"^^xsd:date
  • You use rdfs:label to store the "time-span string" for "birth/death", which is appropriate:
    <person-institution/143765/birth/date> rdfs:label "1706 fl."^^xsd:string.
    But you use P82 for the same purpose for "activity", which is inappropriate (P82 should be an xsd:date, gYearMonth or gYear)
    <person-institution/144526/activity/1/date> P82_at_some_time_within "1859 -"^^xsd:string
    Josh3: There seems to be some inconsistency in the format of dates in the biographical records. Therefore I have typed them as xsd:string; once the data has been cleaned up in Merlin, I think it would be better to give them a proper datatype.
  • What is the benefit of using both straight and reverse name forms?
    rdfs:label "Ange Cappel" vs P3_has_note "Cappel, Ange"
    Use just straight
    Josh3: This is just how it has been put into Merlin (or extracted as)
    Vlado3: so rdfs:label is fed from one field and P3_has_note from another: but what's the reason for this?
    Josh4: It's just the way in which it has been put into one of the notes.
    Vlado4: If you have 2 fields in Merlin that are supposed to contain the name, better use only 1 of them to feed both rdfs:label and P3_has_note, in order to ensure consistency
    Josh5: I think there is some confusion here. rdfs:label contains: {bm_bi_names}, and P3_has_note contains: {bm_bi_title}{bm_bi_names}{bm_bi_ntype} (this is an amalgamation of the title, name and the name type). Since this is not of huge semantic importance, I think it would be best to put these into a display wrap, as that is primarily what is being done here.
    Vlado5: So where does "Cappel, Ange" come from? I think you use something other than {bm_bi_names} in P3
  • bmo:PX_appellation_in_use : replace with Identifier Assignment (assigned/deassigned), as discussed
  • This is wrong:
    <.../birth/date> crm:P82a_begin_of_the_begin <bm_bi_fdate_earliest>;
      crm:P81a_end_of_the_begin <bm_bi_fdate_latest>.
    <.../death/date> P81b_begin_of_the_end <bm_bi_ldate_earliest>;
      P82b_end_of_the_end <bm_bi_ldate_latest>.
    
    • P82a/b (outer bounds) describe an interval that the period is contained within.
      P81a/b (inner bounds) describe an interval during which the period was certainly happening.
    • The names of these properties may suggest a story like this:
      "The guy was conceived at begin_of_the_begin, and (presumably 9m later) was finally born at end_of_the_begin.
      He got old and sick at begin_of_the_end, and was dead at end_of_the_end".
    • But you are not describing his whole life period, you are describing separately his birth and death
    • Birth and death are instantaneous events, so P81a/b don't apply: you should use only P82a/b
    • On the other hand, P81a/b is probably accurate for <activity/N/date> (reign), if you check that's the actual semantics of the 4 dates
  • OPTIONAL: when you can recognize a person-institution to be a Person (eg because it has Gender)
    then it's better to be more specific: use crm:E21_Person and crm:E67_Birth/crm:E69_Death (classes dedicated to Person),
    instead of crm:E39_Actor and crm:E63_Beginning_of_Existence/crm:E64_End_of_Existence
    • The URIs you use (<birth> and <death>) suggest you believe it's a person
      I like these URIs, don't change them
      Josh3: Person assertion done.
      Josh4: P82a/b usage has also been altered.
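The bound semantics above can be turned into a sanity check run over generated time-spans. A minimal sketch, assuming the bounds have already been parsed into `datetime.date` values (gYear values would first need widening to January 1 / December 31): inner bounds must lie within the outer bounds, each pair must be ordered, and birth/death spans should carry no P81a/P81b at all.

```python
from datetime import date


def check_time_span(p82a, p82b, p81a=None, p81b=None):
    """Check that the inner bounds (P81a/P81b, when present) lie within
    the outer bounds (P82a/P82b), and that each pair is ordered."""
    problems = []
    if p82a > p82b:
        problems.append("P82a is after P82b")
    for name, d in (("P81a", p81a), ("P81b", p81b)):
        if d is not None and not (p82a <= d <= p82b):
            problems.append(f"{name} is outside P82a..P82b")
    if p81a is not None and p81b is not None and p81a > p81b:
        problems.append("P81a is after P81b")
    return problems


def check_birth_or_death(p82a, p82b, p81a=None, p81b=None):
    """Birth and death are instantaneous, so only P82a/P82b should be used."""
    problems = check_time_span(p82a, p82b, p81a, p81b)
    if p81a is not None or p81b is not None:
        problems.append("birth/death should not carry P81a/P81b")
    return problems
```

For a reign, for example, `check_time_span(date(1571, 1, 1), date(1629, 12, 31), date(1587, 1, 1), date(1629, 1, 1))` returns an empty list.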

Bibliography

bibliography-config.xml

  • bmo:PX_author: instead, use a Creation event having P2 "Authored"
  • bmo:PX_published_by, bmo:PX_published_in: use a separate Creation event having P2 "Published"
    Josh3: DONE!

Josh2: PX_published_by, PX_published_in was used in bibliography-config: not the publication of the object, but of the document the bibliographic record is about
Vlado3: if we use one pattern for publishing an object, we should use a similar pattern for publishing a book.
Difference (as you pointed out): objects are physical since the data is about the only (or a rare) copy. Books are conceptual since we don't know (or care) how many copies. So it's better to use Creation (not Production) for book publication. FRBR/FRBRoo has a lot to say about this (4 levels).

Now it's like this (right?):

<bib> P94i_was_created_by <bib/authoring>, <bib/publication>.
<bib/authoring> a E65_Creation; P2_has_type <production/authoring>; P14_carried_out_by <author>.
<bib/publication> a E65_Creation; P2_has_type <production/publication>; P14_carried_out_by <publisher>.

Josh4: Right!!

Thesaurus

  • given skos:broader, no need to use skos:narrower
    Josh3: This was done for the PoolParty application, which needs both narrower and broader to be asserted, so I will keep this as we may still use it.
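One way to satisfy both points is to treat skos:broader as the single source of truth in the mapping and derive skos:narrower only for the PoolParty export. A minimal sketch, assuming the broader statements are available as (term, broader term) pairs; alternatively, the SKOS RDF schema declares skos:narrower as owl:inverseOf skos:broader, so an OWL-aware store can infer it.

```python
def narrower_from_broader(broader_pairs):
    """Given (term, broader_term) pairs read from skos:broader statements,
    return the inverse skos:narrower statements as Turtle lines."""
    return [f"<{parent}> skos:narrower <{child}>."
            for child, parent in sorted(set(broader_pairs))]
```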

20120817 Objects

Errors found while inspecting sample objects.
20120801: AES, AOA, ASIA, CM, GR, ME, PD, PE.
A lot of the errors are common, so the sections below just say in which object I found it.
For next iteration, please send me DIFFERENT objects, so I can check different situations.
I use BM-data-pretty.pl to turn the files from ugly long-line TRIG to nice prefixed TTL (and save from subdirs to one dir).
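BM-data-pretty.pl itself is not shown here; a minimal Python sketch of the same idea compacts full IRIs to prefixed names and resolves collection URIs against an @base, producing the <object/...> style used throughout this page. The prefix table is an assumption, and the regex is naive: it does not handle TRIG graph blocks or literals that contain angle brackets. Local names containing '/' are left as <relative> or full IRIs, since Turtle would require escaping them in prefixed names.

```python
import re

# Hypothetical prefix table; the actual one in BM-data-pretty.pl is not shown here.
PREFIXES = {
    "crm": "http://erlangen-crm.org/current/",
    "bmo": "http://collection.britishmuseum.org/id/ontology/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
BASE = "http://collection.britishmuseum.org/id/"

# Conservative subset of Turtle's PN_LOCAL: no '/', '#' etc. and no trailing '.'
SAFE_LOCAL = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*(?<!\.)$")


def compact(uri):
    """Turn a full URI into prefix:local, a <relative> IRI under BASE, or leave it."""
    for prefix, ns in PREFIXES.items():
        if uri.startswith(ns) and SAFE_LOCAL.match(uri[len(ns):]):
            return f"{prefix}:{uri[len(ns):]}"
    if uri.startswith(BASE):
        return f"<{uri[len(BASE):]}>"
    return f"<{uri}>"


def header():
    """The @base and @prefix lines that make the compacted output valid Turtle."""
    lines = [f"@base <{BASE}> ."]
    lines += [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    return "\n".join(lines)


def prettify(line):
    """Compact every <...> IRI in one line (naive: ignores literals containing '<')."""
    return re.sub(r"<([^<>\s]+)>", lambda m: compact(m.group(1)), line)
```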

Josh3: I have not changed the display_wraps yet: although they increase the number of triples, I have pushed this down the priority list and will do it later.

AES/YCA80389

  • the two PX_display_wrap for acquisition/custody state the wrong direction:
    "Acquisition (Custody Transfer) :: The British Museum to Kudlak, Adam ::",
    "Acquisition (from-to) :: The British Museum to Kudlak, Adam ::",
    "Acquisition (from-to) :: The British Museum to Egypt Exploration Society ::".
    Josh3:Done!
  • id/YCA80389/find:
    shouldn't this be id/object/YCA80389/find to match other event URIs (eg id/object/YCA80389/production/1)
    Josh3:Done!
  • I think you should skip these PX_display_wrap since they duplicate PX_display_wrap at a higher level:
    <YCA80389/find> bmo:PX_display_wrap "Found (in) :: Qasr Ibrim ::";
    <object/YCA80389/production/1/date> bmo:PX_display_wrap "Production date :: 1550-1700AD ::";
    
  • this is wrong: P49 should be at <object/YCA80389>, not at the acquisition
    
    <object/YCA80389/acquisition>
      crm:P49_has_former_or_current_keeper <person-institution/53456>;
    
    Josh3:Done!
  • invalid date syntax, won't parse
    crm:P82a_begin_of_the_begin "01 Jan 2008"^^xsd:date;

    Josh3:DONE!

AOA/ENA122547

  • I think you should skip these display wraps since they duplicate ones at a lower level (at the acquisition)
    <object/ENA122547> bmo:PX_display_wrap "Acquisition (Custody Transfer) :: The British Museum to Kudlak, Adam ::",
        "Acquisition (from-to) :: The British Museum to Kudlak, Adam ::",
    
  • this P2 is really redundant, since "making" means the same as "production"
    <object/ENA122547/production/5> crm:P2_has_type <thesauri/production/making>;
    Josh3: Noted, but it comes from an association code; if that were changed or renamed, this may be affected, so I will leave it for the moment.

Remove it unless it comes from a thesaurus

  • you map "Ethnic Group (Made by) :: Copper Inuit :: Inuinnait" to crm:P14_carried_out_by <thesauri/x83492>.
    This would be ok if scheme <id/thesauri/ethname> was mapped to Group.
    However, you have mapped this to E55_Type in thesaurusandplace_0.trig.
  • "Copper Inuit :: Inuinnait" suggests to me that "Copper Inuit" has a broader group "Inuinnait".
    But I didn't see skos:broader (and P107i_is_current_or_former_member_of) in the thesaurus.
  • Josh3: Agreed about changing E55_Type to Group which I can do.
    But the display wrap is generated by: "Ethnic Group (Made by) :: {bm_eth_name} :: {bm_eth_com}"
    • bm_eth_name is the name of the term
    • bm_eth_com is a comment (free text) in the object record - thus it has been put with the display wrap (probably not important enough for the P3_has_note)
  • Vlado3: if Inuinnait is free text, ok

ASIA/JCF22970

  • <event/Asia-Pacific-War-1941-1945>: Great URI!
    • "a crm:E7_Activity": no, it's an Event
    • if these events are shared between objects, you'd better define them in a separate thesaurus.
      Else you'll get duplicate facts (a, label) in different named graphs
      Josh3:This has been done...
  • here you got carried away:
    <object/JCF22970/concept/4> bmo:PX_display_wrap "Subject :: soldier ::";
      crm:P2_has_type <thesauri/association/subjectof>;
      crm:P67_refers_to <thesauri/x13394>;
      a crm:E73_Information_Object.
    

    "Soldier" is a concept and "Subject" means it's depicted by the object, so you don't need a fake concept:

    <object/JCF22970> bmo:PX_display_wrap "Subject :: soldier ::";
      crm:P62_depicts <thesauri/x13394>
    
    • Josh3: I thought this was the issue being discussed with the SIG: that P62 has no super-property and does not relate to a visual item. This is similar to how we had it before (using P129 instead, but P62 seems better).
    • Vlado3: To map this correctly, we should ask ourselves what is the semantics of BM "subjectof".
      • From the name, maybe "subjectof" maps to P129, which has this scope note:
        P129 is about (P129i is subject of): "documents that an E89 Propositional Object has as subject an instance of E1 CRM Entity"
      • There's another candidate, also subprop of P67 refers to:
        P138 represents: "relationship between an E36 Visual Item and the entity that it visually represents."
      • I'd say the semantics of P129 and P138 are the same (to link to the primary subject of a work), but P138 is applicable only to Visual items.
      • The crucial question is: is it safe to assume that every object (having "subjectof") is associated with a Visual Item? (To answer this, read the scope note of Visual Item)
      • If NO then you need a fake concept:
        <object/JCF22970> P128_carries <object/JCF22970/concept/4>.
        <object/JCF22970/concept/4> a crm:E73_Information_Object;
          bmo:PX_display_wrap "Subject :: soldier ::";
          crm:P129_is_about <thesauri/x13394>.
        

        But you shouldn't put P2 there:

        crm:P2_has_type <thesauri/association/subjectof>;
        

        Either leave it out (129 already means "is subject of"), or put it in EX_Association

      • If YES then you can skip the fake concept. You don't need P138 and E36 Visual Item either, since you can use the shortcut. See scope note below, and image_objects_carriers@crmg as example:
        P62: "a shortcut of the more fully developed path from E24 Physical Man-Made Thing through P65 shows visual item (is shown by), E36 Visual Item, P138 represents (has representation) to E1 CRM Entity"
        Josh4: The answer to this is NO: the use of a subject (and therefore BM's concept of "subjectof") is the object having a relation to a concept through some abstract means, not just showing it visually. Hence I originally created the fake concept, and I am now using P129_is_about from the concept to the subject term. I am no longer using EX_Association or P2 for this.
        Vlado4: understood and agreed
  • remove this bmo:PX_display_wrap since it just duplicates the P3_has_note
    Josh4: Display wraps have been moved to main object node only.
    <object/JCF22970/inscription/1> bmo:PX_display_wrap "Inscription note :: Artist's inscription";
      crm:P3_has_note "Artist's inscription";
    
  • here emit P82/a/b with full precision as xsd:date:
    <object/JCF22970/production/1/date> bmo:PX_display_wrap "Production date :: 1946 :: 26 February";
      crm:P82_at_some_time_within "1946"^^xsd:string;
      crm:P82a_begin_of_the_begin "1946"^^xsd:gYear;
      crm:P82b_end_of_the_end "1946"^^xsd:gYear;
    

    Josh4: Dates are now xsd:string for convenience.
    Vlado4: Please note that P82's range is E61_Time_Primitive, which says "instances for time that should be implemented with appropriate validation, precision and interval logic to express date ranges relevant to cultural documentation". Although CRM gives examples in various formats, I think for RDF this means valid XSD gYear, gYearMonth or date types.
    In Martin's definition P82a&b are subprops of P82. This allows an application to consume only P82 and not bother with P82a&b. But it also means all of P82, P82a, P82b should have valid XSD format. Therefore I don't think xsd:string is an appropriate format for these properties.
    Josh5: I can see the logic of what you say, but then there is the issue of how we should present this data. Currently we do the following with each of the Production/Acquisition dates:

    date_text     -> P82 (as xsd:string as the format can be 12th Mar 1966 OR 12thC OR 300BC - 100BC etc...)
    date_earliest -> P82a (as xsd:date as it would be just a single date)
    date_latest   -> P82b (as xsd:date as it would be just a single date)
    

    I think the main issue is that the date_text stored for the production/acquisition can be a range, NOT just a single time primitive. The range is represented in the earliest/latest fields, hence they are xsd:dates. Would it therefore be better to get rid of P82 and just use rdfs:label on the Time-Span? If we use P82a/b, are we obliged to use P82 as well?
    Vlado5: You are not obliged to use P82: it will be inferred from P82a and P82b, therefore will have 2 values (date_earliest and date_latest). And yes, put the string in rdfs:label

  • instead of ext prop, map this according to BM Association Mapping#Influenced By
    <object/JCF22970/production/6> bmo:PX_display_wrap "School of :: Nihonga School ::";
      bmo:PX_school_of <thesauri/x18989>;
    

    Josh4: Done. Not sure why this was left out...

  • I'm now thinking this is too complicated:
    <object/JCF22970/inscription/1>
      bmo:PX_has_transliteration "Gaadon Maasu kun";
      rdfs:label "賀亜頓・真須君";
      crm:P73_has_translation <object/JCF22970/inscription/1/translation>.
    <object/JCF22970/inscription/1/translation> a crm:E33_Linguistic_Object;
      rdfs:label "Portrait of Gordon Marsh".
    

    Just because CRM has P73 doesn't mean we need to use it. Why not use multi-valued rdfs:label (the normal semweb way)?

    <object/JCF22970/inscription/1>
      bmo:PX_has_transliteration "Gaadon Maasu kun";
      rdfs:label "賀亜頓・真須君"@ja;
      rdfs:label "Portrait of Gordon Marsh"@en.
    
    • I guess you cannot say @ja, since you don't have a correlation from inscription language to RDF language codes.
      Then you can leave the original string without a language tag, and just say @en about the translation (since you know it's in English).
    • I think you should also state the title in EN, not just as a display wrap.
      Instead of
      <object/JCF22970/title/1> bmo:PX_display_wrap "Title translation :: Portrait of Gordon Marsh ::";
        rdfs:label "Gaadon Maasu kun (賀亜頓・真須君)".
      

      do this

      <object/JCF22970/title/1>
        rdfs:label "Gaadon Maasu kun (賀亜頓・真須君)";
        rdfs:label "Portrait of Gordon Marsh"@en.
      

      Josh4: Done.
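Emitting a multi-valued rdfs:label (original string untagged, translation tagged @en) mostly comes down to quoting literals correctly. A minimal sketch, assuming labels arrive as (text, language-or-None) pairs; `turtle_literal` and `labels_turtle` are hypothetical helper names, and the tag check only follows the Turtle LANGTAG grammar, not full BCP 47 validation.

```python
import re
from typing import Iterable, Optional, Tuple

# Turtle LANGTAG production: letters, then optional "-" subtags of letters/digits
LANGTAG = re.compile(r"^[a-zA-Z]+(-[a-zA-Z0-9]+)*$")

def turtle_literal(text: str, lang: Optional[str] = None) -> str:
    """Hypothetical helper: quote text as a Turtle string, optionally language-tagged."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    if lang is None:
        # no reliable inscription-language -> RDF code mapping: leave untagged
        return f'"{escaped}"'
    if not LANGTAG.match(lang):
        raise ValueError(f"not a valid language tag: {lang!r}")
    return f'"{escaped}"@{lang}'

def labels_turtle(subject: str, labels: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Hypothetical helper: one rdfs:label predicate with several values."""
    values = ",\n    ".join(turtle_literal(text, lang) for text, lang in labels)
    return f"{subject}\n  rdfs:label {values}."
```

For the title above, `labels_turtle("<object/JCF22970/title/1>", [("Gaadon Maasu kun (賀亜頓・真須君)", None), ("Portrait of Gordon Marsh", "en")])` yields one rdfs:label with two values, the second tagged @en.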

CM/CGR307834

  • you should give P2_has_type to the aspects:
    <object/CGR307834/reverse> a crm:E25_Man-Made_Feature;
      crm:P2_has_type <thesaurus/aspect/reverse>.

PD/PPA357889

  • note to self: this is ok: image/1 is used to carry the association (IP: Portrait of)
    <object/PPA357889>
      crm:P62_depicts <person-institution/110644>;
      crm:P65_shows_visual_item <object/PPA357889/image/1>.
    <object/PPA357889/image/1> crm:P138_represents <person-institution/110644>;
      crm:P2_has_type <thesauri/association/IP>; a crm:E38_Image.
    

20120827 UAT Objects

Inspecting sample objects

  • 20120824: test records for COL UAT (testrecordsforcol.trig)

CGR266697

  • no need for both, skip the free text (PX_denomination) and remove from BMO
    bmo:PX_currency <thesauri/currency/as>;
      bmo:PX_denomination "as";
    

    Josh4: Amending as suggested below

  • here you both divided by 1000 and used unit:Millimeter, so the value is wrong:
    <object/CGR266697/diameter/1> bmo:PX_display_wrap "Dimension Diameter :: 21.00mm ::";
      crm:P90_has_value "0.021"^^xsd:double;
      crm:P91_has_unit unit:Millimeter;
    

    Josh4: I forgot to remove the function that performs the division. It remains for millilitres and milligrams, since those units are not in the QUDT 1.1 ontology.

  • please remind me: are inscription locations free text?
    Below it's feasible to attach the inscription to the feature (/reverse) since the text "reverse" matches (is not really free).
    It is overkill to create a feature just to attach the inscription location to something, but when such a feature already exists, I think it'd be nice to use it.
    I guess this is a very common pattern for coins?
    <object/CGR266697> crm:P56_bears_feature <object/CGR266697/obverse>, <object/CGR266697/reverse>.
    <object/CGR266697/reverse> bmo:PX_physical_description "Crude Minerva advancing left with shield.";
      a crm:E25_Man-Made_Feature.
    <object/CGR266697/inscription/2>
      bmo:PX_inscription_position "reverse";
    

    Josh4: As per the note about Aspects above, this could have been done, but the Merlin Aspects section has really been used to insert multiple descriptions about an object. The sample RDF above could be generated if curators had entered it in Merlin that way; since they haven't, I suggest we model what Merlin is saying and rectify these issues at the source. Therefore I will leave this as it is.

  • You need to state a label for the feature, else we can't display what it is. Eg:
    <object/CGR266697/reverse> rdfs:label "reverse";
      a crm:E25_Man-Made_Feature.
    

    Josh4: Done.
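One way to avoid the 0.021 + unit:Millimeter mismatch is to scale the number only when the emitted unit differs from the source unit, and to keep both facts in one table. A minimal sketch; the table, the function name `dimension_value`, and every QUDT unit name except unit:Millimeter (used above) are assumptions, as is converting millilitres and milligrams to litres and grams per Josh's note about QUDT 1.1.

```python
from typing import Dict, Tuple

# Hypothetical table: Merlin unit text -> (unit emitted as P91, divisor for the value).
# The divisor must be 1 whenever the emitted unit equals the source unit.
UNITS: Dict[str, Tuple[str, int]] = {
    "mm": ("unit:Millimeter", 1),   # keep 21.00 mm as 21.0, never 0.021
    "ml": ("unit:Liter", 1000),     # assumption: no millilitre unit, so convert
    "mg": ("unit:Gram", 1000),      # assumption: no milligram unit, so convert
}

def dimension_value(value_text: str, unit_text: str) -> Tuple[float, str]:
    """Hypothetical helper: return (P90_has_value, P91_has_unit) for one dimension.
    Raises KeyError for a unit missing from the table, so it is not silently mis-scaled."""
    unit, divisor = UNITS[unit_text.strip().lower()]
    return float(value_text) / divisor, unit
```

Keeping value and unit in one lookup makes a "divided but still millimetres" combination impossible to express.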

CBA266435

  • Ok, I see now what PX_denomination is for: it's a number ("100") in this Banknote:
    <object/CBA266435> bmo:PX_currency <thesauri/currency/tongyuan>;
      bmo:PX_denomination "100";
    
    • Can you check in the DB what the different values of PX_denomination are?
    • There is a note about the worth of the unit (tongyuan), which should be emitted as a note, not only as display_wrap.
      Josh4: The Denomination comment is now a P3_has_note in addition to the display_wrap.
      (And there's a typo in "Demomination")
      "Demomination :: tongyuan,100 :: 100 x worth-10 copper dollars"
    • I think this is better modeled as a dimension:
      <object/CBA266435> crm:P43_has_dimension <object/CBA266435/currency/1>.
      <object/CBA266435/currency/1> a crm:E54_Dimension;
        crm:P2_has_type <thesauri/dimension/currency>;
        crm:P90_has_value 100;
        crm:P91_has_unit <thesauri/currency/tongyuan>;
        crm:P3_has_note "100 x worth-10 copper dollars".
          # or could put "worth-10 copper dollars" on the unit, but only if it's consistent across objects
      
    • You need to change the term's type to a subtype:
      <thesauri/currency/tongyuan> a crm:E58_Measurement_Unit.
    • For the coin described above (CGR266697) you use PX_denomination differently: it just repeats PX_currency. But you shouldn't use PX_denomination in two different ways, else how do you define its semantics?
      For coins I think you should use the same pattern, but skip P90_has_value since it's always 1.
      Then we can kill PX_denomination.
    • It makes sense to keep the currency also attached directly to the object, if we want to be able to search by currency (since there's no search by dimension):
      <object/CBA266435> bmo:PX_currency <thesauri/currency/tongyuan>.

      If there is no such need, you could kill PX_currency.

    • Josh4: Happy to change currency to a Dimension; this makes sense, and it would eradicate PX_denomination. However, note that the denomination field for the above object is text:
      "tongyuan, 100". I just have a function which parses the text into denomination and currency. I shall now modify the function so that the denomination must be a number; if it isn't, nothing is returned for it. This should make sure that the two do not carry the same information.
      Vlado4: couldn't really understand how you handle "if it does not have a number", but I'm sure you'll do it right.
      Josh5: using .NET Parse functions.
      Josh4: Currencies are now E58 and not E55. I have also created a thesauri/dimension/currency concept, which is an E55_Type.
  • map this to an Identifier:
    "Serial number :: ... :: Zhi 9856";
    Josh4: Done.
  • note to self: from the data it is correct to treat Production Authority "I: Issuer" (motivator) differently from "M: Moneyer" (creator):
    Shun Yee Savings Bank didn't print the money, but it was printed for the bank
    "Authority Assocication I :: Shun Yee Savings Bank ::",
    <object/CBA266435/production/5> crm:P17_was_motivated_by <person-institution/173589>;
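The currency-as-dimension pattern discussed for CBA266435 (and the coin variant without P90) can be sketched as follows. This assumes the Merlin field looks like "tongyuan, 100" or just "as", and that the currency URI is built from the text before the comma; `parse_denomination` and `currency_dimension` are hypothetical names, not the actual .NET function.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

def parse_denomination(field: str) -> Tuple[str, Optional[Decimal]]:
    """Hypothetical helper: split "tongyuan, 100" into ("tongyuan", 100).
    The value is None when there is no comma or the part after it is not a finite number."""
    head, sep, tail = field.rpartition(",")
    currency = head.strip() if sep else field.strip()
    value = None
    if sep:
        try:
            number = Decimal(tail.strip())
            if number.is_finite():
                value = number
        except InvalidOperation:
            pass
    return currency, value

def currency_dimension(obj: str, n: int, field: str, note: Optional[str] = None) -> str:
    """Hypothetical helper: emit the E54_Dimension node for one denomination field."""
    currency, value = parse_denomination(field)
    dim = f"<object/{obj}/currency/{n}>"
    lines = [
        f"<object/{obj}> crm:P43_has_dimension {dim}.",
        f"{dim} a crm:E54_Dimension;",
        "  crm:P2_has_type <thesauri/dimension/currency>;",
    ]
    if value is not None:  # no number parsed (e.g. coins, where it would be 1): skip P90
        lines.append(f"  crm:P90_has_value {value};")
    # assumption: currency text is already a URI-safe term name
    lines.append(f"  crm:P91_has_unit <thesauri/currency/{currency}>")
    if note:
        escaped = note.replace("\\", "\\\\").replace('"', '\\"')
        lines[-1] += ";"
        lines.append(f'  crm:P3_has_note "{escaped}"')
    lines[-1] += "."
    return "\n".join(lines)
```

With this, the same property is never used in two senses: a number always goes to P90, and the currency always goes to P91.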
    

PPA263540

  • here you have a precise date
    <object/PPA263540/acquisition/date> bmo:PX_display_wrap "Acquisition date :: 1923 :: 6 March";
      crm:P3_has_note "6 March";
      crm:P82_at_some_time_within "1923"^^xsd:string;
      crm:P82a_begin_of_the_begin "1923-01-01"^^xsd:date;
      crm:P82b_end_of_the_end "1923-12-31"^^xsd:date;
    

    If you can extract the month-day (or it's not free text), put it in P82a&b.
    Put the full date in crm:P3_has_note, not just the day.
    Josh: The date is in the comment field so this is not a standardised thing to do...

  • these two display wraps are contradictory. Remove the first, and add "after" in the second
    <object/PPA263540/production/3> bmo:PX_display_wrap "Production (By) :: Bol, Hans ::",
        "Production (Inflenced by) :: Bol, Hans ::";
    

    The association bears out that it's "Influenced by" (more specifically "AT: After"):

    <object/PPA263540/production/3/association> bmo:PX_property crm:P15_was_influenced_by;
      crm:P2_has_type <thesauri/production/AT>;
    
    • the main display wrap should also say "... (Influenced) :: ... : after"
      <object/PPA263540>  bmo:PX_display_wrap
          "Production (By) :: Bol, Hans ::",
      

      Josh4: There was an error in the way the display wraps were working. This has been amended, and all display wraps are now on the object / aspect. In addition, the consolidation of association codes (such as Association Codes by Specific Process) now means the text is more generic, so adding the 'after' bit is a little more complicated; since this only affects the display_wrap, I shall leave it for the moment.

  • How do you know this is "printing"?
    And shouldn't you make an Association for "attributed" (using <thesaurus/likelihood/attributed>)?
    <object/PPA263540/production/4> bmo:PX_display_wrap "Production (By) :: Collaert, Adriaen :: attributed";
      crm:P14_carried_out_by <person-institution/23263>;
      crm:P2_has_type <thesauri/production/printing>;
      a crm:E12_Production.
    

    Josh4: Because it is!!! The record in Merlin is as follows:

    Production: Person
    	Association	AT (After)
    	Name	Bol, Hans (Hans Bol)
    
    	Association	PM (Print made by)
    	Name	Collaert, Adriaen (Adriaen Collaert)
    	Comment	attributed
    
    	Association	Z (Published by)
    	Name	Galle, Philips (Philips Galle)
    

    Adriaen Collaert is the printer, hence the production node has type printing. The comment "attributed" was added by someone; perhaps they used an incorrect association code, but that would be a data issue. There is already an association code for attributed (Production Person, A: Attributed to), but it wasn't used here.
    Vlado4: The reason I asked is because the display_wrap says "Production (By) :: Collaert, Adriaen".
    If you cannot generate the more specific "Production (Print made by)" (as you explain above), then leave out the part in parentheses altogether.
    Josh5: Since I am putting the display_wrap on the object node ONLY, I must perform the logic of the association codes in two different places which is cumbersome - hence there has been some simplification of the logic to put in the display wrap. Since this is just a display wrap, does it matter as the structured data has the detail?
    Vlado5: the display wrap should not be misleading. "Production (By) :: Bol, Hans" is misleading because his association is "After". So remove the part "(By)"

  • there is no data about the Series:
    "Component of series :: Venationis, piscationis, et aucupii typi ::",

Josh4: The series is now generated outside the object graph, in its own graph. The object is linked to it with P46i_forms_part_of:
<http://collection.britishmuseum.org/id/series/Venationis-piscationis-et-aucupii-typi>
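On the precise acquisition date of PPA263540 ("1923" plus the comment "6 March"): since the day-month is free text, a conservative option is to use it only when it matches a strict "day month" pattern and otherwise keep the whole-year bounds. A minimal sketch; `precise_bounds` is a hypothetical name, and %B matches month names of the current locale (English under Python's default C locale).

```python
from datetime import datetime
from typing import Optional, Tuple

def precise_bounds(year: str, comment: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: combine a year with a "6 March"-style comment into
    identical P82a/P82b xsd:date values. Returns None when the comment is anything
    else (or an impossible date), so the caller falls back to 1 Jan / 31 Dec."""
    try:
        day = datetime.strptime(f"{comment.strip()} {year.strip()}", "%d %B %Y").date()
    except ValueError:
        return None
    return day.isoformat(), day.isoformat()
```

The full date ("6 March 1923") would still go into crm:P3_has_note as suggested above.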

COC231583

  • the dimensions should be on the object, not on an aspect (especially the weight)
    crm:P43_has_dimension <object/COC231583/reverse/diameter/1>, <object/COC231583/reverse/weight/1>;
    

    Josh4: Although this is true I've discussed this with Peter and this is more of a data issue which should be rectified at the Merlin level.

PPA138043

  • carries title? Should be P102_has_title
    crm:P128_carries <object/PPA138043/title/1>;
    

    Josh4: Done.

  • complete P82a/b to xsd:date (1/1 resp 12/31), same as you do for a single year
    <object/PPA138043/production/1/date> bmo:PX_display_wrap "Production date :: 1780-1795 ::";
      crm:P82a_begin_of_the_begin "1780"^^xsd:gYear;
      crm:P82b_end_of_the_end "1795"^^xsd:gYear;
    

    Josh4: Both production dates and Acquisition dates are xsd:date now as per Peter Main's description.
    Acquisition date-ranges are rendered as D2 M3 Y4, e.g. ‘23 May 2012’.
    Production date-ranges (where only information to the nearest year is deemed to be relevant) are rendered as Y4. So the date above would be rendered as ‘2012’.
    Vlado4: Ok re precision.
    But you should use proper XSD formats (xsd:gYear, xsd:date) for P82/a/b (see note about E61_Time_Primitive above), and use xsd:string only for notes.
    Josh5: see my above comment of what to do with the date_text field which was originally being put with P82.
    Vlado5: yes, put date_text in rdfs:label
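Completing Y4 production dates to full xsd:date bounds (1 January of the first year for P82a, 31 December of the last year for P82b) is mechanical. A minimal sketch, assuming date_text is either "YYYY" or "YYYY-YYYY"; `year_bounds` is a hypothetical name, and anything else ("12thC", "300BC - 100BC") returns None and stays only in rdfs:label.

```python
import re
from datetime import date
from typing import Optional, Tuple

# "1946" or "1780-1795"; other shapes are not expanded
YEARS = re.compile(r"^\s*(\d{4})\s*(?:-\s*(\d{4})\s*)?$")

def year_bounds(text: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: year or year range -> (P82a, P82b) as ISO dates."""
    m = YEARS.match(text)
    if not m:
        return None
    first = int(m.group(1))
    last = int(m.group(2) or m.group(1))
    if first < 1 or last < first:  # datetime.date has no year 0; reject reversed ranges
        return None
    return date(first, 1, 1).isoformat(), date(last, 12, 31).isoformat()
```

For the record above, `year_bounds("1780-1795")` gives ("1780-01-01", "1795-12-31").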

20120912 Objects

Josh sent 1000 records per department (AES, AOA, ASIA, CM, GR, ME, PD, PE) in RDF_1000_sample.rar

  • when you use xsd:string for dates, the Date search finds nothing (see above "Vlado4" for a bit more detail).
    RS-1021
    Josh5: Does this also affect the dates used in biographical records, since xsd:string is being used there for P82a/b?
    Vlado5: Yes, P82/a/b everywhere should be pure dates, with proper XSD type
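Putting the agreed rules together for a Time-Span: the free text goes to rdfs:label, P82a/P82b get typed xsd:date values, and P82 itself is not emitted because it is inferred from its sub-properties. A minimal sketch; `time_span_turtle` is a hypothetical name, and it assumes date_earliest/date_latest are already ISO dates. Python's `date` only covers years 1 to 9999, so BC dates would need separate handling.

```python
from datetime import date

def turtle_string(text: str) -> str:
    """Quote text as a plain Turtle string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def time_span_turtle(node: str, date_text: str, earliest: str, latest: str) -> str:
    """Hypothetical helper: emit a Time-Span with typed P82a/P82b and a text label.
    date.fromisoformat raises ValueError on malformed input, so bad data fails
    loudly instead of being written as xsd:string."""
    begin = date.fromisoformat(earliest)
    end = date.fromisoformat(latest)
    if begin > end:
        raise ValueError(f"{node}: earliest {begin} is after latest {end}")
    return "\n".join([
        node,
        f"  rdfs:label {turtle_string(date_text)};",
        f'  crm:P82a_begin_of_the_begin "{begin.isoformat()}"^^xsd:date;',
        f'  crm:P82b_end_of_the_end "{end.isoformat()}"^^xsd:date.',
    ])
```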

Thesauri breakage

The thesauri seem screwed

  • they are much smaller, e.g.
    new thesaurusandplace.trig 350k vs old thesaurusandplace_*.trig 72M
    new biography 3.1Mb vs old biography_* 605Mb
    • Are they a subset somehow? If so, have you ensured closure (e.g. over broader)?
  • thesaurusandplace seems to include only strange location names like this
    idThes:x203088 a crm:E55_Type,
    skos:inScheme idThes:location;
    skos:prefLabel "KEB/Mezz/cp64/dr8";
    • shouldn't this be crm:E53_Place, not crm:E55_Type?
    • at the end there are some like this:
      idThes:x203150 a crm:E57_Material,
      skos:prefLabel "bogus MOS Unit Test - MuseumTerminologyServiceApiTest::CreateTermTest() 03/09/2012 17:53".
  • I've committed to SVN
    only these two: inline-thesauri, manual_assertions.ttl,
    but left the other thesauri at their old versions.
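If the thesauri are now exported as subsets, closure over skos:broader can be checked by walking upwards from every exported concept and listing ancestors that were left out. A minimal sketch, assuming the (narrower, broader) pairs can be taken from the full Merlin export; `missing_broader` is a hypothetical name.

```python
from typing import Dict, Iterable, Set, Tuple

def missing_broader(subset: Set[str], broader: Iterable[Tuple[str, str]]) -> Set[str]:
    """Hypothetical helper: concepts reachable from the subset via skos:broader
    (transitively) that are not themselves in the subset.
    broader: (narrower, broader) pairs from the full thesaurus."""
    parents: Dict[str, Set[str]] = {}
    for child, parent in broader:
        parents.setdefault(child, set()).add(parent)
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = list(subset)
    while stack:  # depth-first walk up the hierarchy
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        for parent in parents.get(concept, ()):
            if parent not in subset:
                missing.add(parent)
            stack.append(parent)
    return missing
```

An empty result means the subset is closed over broader.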

P50 and/or P52

  • Top-level objects (MOs) must be marked rso:FC70_Thing to be searchable. As stated in FR Implementation-old#FC70_Thing for RS, the rule that implements this for BM objects is "E22 and the current keeper (P50) or owner (P52) is the BM".
    • count objects in trig files by Josh
      > grep -rP "^<http://collection.britishmuseum.org/id/object/\w+/graph>" . | wc -l
      8000
      
    • count in repository:
      select * where {?o a rso:FC70_Thing}
      5525
      
    • BM data doesn't have P50 or P52 consistently.
  • P52=BM but no P50=BM
    <object/BCB186972>
      crm:P50_has_current_keeper <thesauri/department/H>;
      crm:P52_has_current_owner id:the-british-museum;
    <object/BCB186972/acquisition> crm:P22_transferred_title_to id:the-british-museum;
      crm:P29_custody_received_by id:the-british-museum;
    

    This is marked as FC70_Thing because of P52. However, "P29=BM" in the acquisition is not reflected as P50 on the object.
    I think: every time you state "P50=some BM department", also state "P50=BM": if a department is the keeper, then by implication BM is also keeper.

  • neither P52 nor P50
    <object/CEM307064> 
      bmo:PX_physical_description "Gold coin.";
      # has neither keeper nor owner
    <object/CEM307064/acquisition> crm:P4_has_time-span <object/CEM307064/acquisition/date>;
    <object/CEM307064/acquisition/date> crm:P82_at_some_time_within "1837"^^xsd:string;
    

    Here the acquisition info is very poor. But I think you should emit P50 and P52 by default; and suppress them only if there is acquisition info explicitly saying someone else is keeper and/or owner.
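The two suggestions above (a BM department as keeper implies BM as keeper; default P50/P52 to BM unless acquisition data says otherwise), together with the FC70_Thing rule, can be sketched over plain (subject, predicate, object) strings. This is only an illustration for checking counts, not the RS implementation; `custody_defaults`, `is_fc70_thing` and the `acquired_by_other` flag are hypothetical, and nodes are written as plain strings without angle brackets.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
BM = "id:the-british-museum"
P50 = "crm:P50_has_current_keeper"
P52 = "crm:P52_has_current_owner"
DEPARTMENT = "thesauri/department/"  # prefix of department nodes, as in the data above

def custody_defaults(obj: str, triples: Set[Triple], acquired_by_other: bool = False) -> Set[Triple]:
    """Hypothetical helper: the extra P50/P52 triples to emit for obj."""
    keepers = {o for s, p, o in triples if s == obj and p == P50}
    owners = {o for s, p, o in triples if s == obj and p == P52}
    extra: Set[Triple] = set()
    if BM not in keepers and any(k.startswith(DEPARTMENT) for k in keepers):
        extra.add((obj, P50, BM))  # a BM department as keeper implies BM as keeper
    if not keepers and not owners and not acquired_by_other:
        extra |= {(obj, P50, BM), (obj, P52, BM)}  # default unless data says otherwise
    return extra

def is_fc70_thing(obj: str, triples: Set[Triple]) -> bool:
    """The rule quoted above: E22 and the current keeper or owner is BM."""
    is_e22 = (obj, "rdf:type", "crm:E22_Man-Made_Object") in triples
    return is_e22 and ((obj, P50, BM) in triples or (obj, P52, BM) in triples)
```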

20121023 config review

  • commented out in config BUT PRESENT in ontology:
    bmo:EX_Exhibition, bmo:PX_exhibits
    Josh: Actually they should exist, but in the inline thesaurus, which I have now added. They were commented out of the config as part of the process of not asserting things in the object graph: things which live outside the object (like the definition of an exhibition) are now asserted outside it.
    The commenting out was done, but the actual definition of the Exhibition was missing. This has now been added in inline-thesauri.xml.
  • wrong name:
    bmo:commemorates (correct is bmo:PX_commemorates).
    Josh:Done.

Note to self: We mapped 'IC' to a subprop since the other BM Association Mapping#Associated Events use straight CRM props (no EX_Association)

  • wrong prefix (correct is bmo:):
    crm:PX_display_wrap (biography-config.xml)
    crm:PX_max_value crm:PX_min_value (config.xml)
    Josh:Done.
  • maybe superfluous? thesaurus-config.xml:

    If Merlin consistently states both, then you can skip skos:narrower since it'll be inferred as inverse of skos:broader
    Josh: I did this because of PoolParty, which we may use again at some point and which requires both broader and narrower to be asserted...

  • XML syntax error. thesaurus-config.xml:
    </triple>>

    Josh:Done.
    I wonder how that goes through RDFer. Please XML-validate all configs.
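A likely reason "</triple>>" goes through: a lone ">" is legal in XML character data, so any XML parser accepts it as text after the closing tag. A check that catches this class of typo, assuming the RDFer configs are data-only (no mixed content), is to parse each file and flag non-whitespace text between elements. A minimal sketch with the standard library; `lint_config` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

def lint_config(xml: Union[str, bytes], name: str) -> List[str]:
    """Hypothetical helper: report well-formedness errors, plus stray text after
    closing tags (assumes configs never legitimately contain mixed content)."""
    try:
        root = ET.fromstring(xml)
    except ET.ParseError as err:
        return [f"{name}: not well-formed: {err}"]
    problems = []
    for elem in root.iter():
        # ElementTree stores text that follows an element's end tag in .tail
        if elem.tail and elem.tail.strip():
            problems.append(f"{name}: stray text {elem.tail.strip()!r} after </{elem.tag}>")
    return problems
```

Running it over every *-config.xml file (for example with `pathlib.Path.glob` and `read_bytes()`) and failing the build on any non-empty result would cover both plain syntax errors and this kind of stray character.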

Copyright

Looking at this config:

It'd produce something disconnected:

<object/PRN> bmo:PX_has_copyright "Some copyright statement";
  crm:P104_is_subject_to <object/PRN/copyright>.

Looking at reproduction_rights@crmg you want something like this:

<object/PRN> 
  crm:P104_is_subject_to <object/PRN/copyright>;
  crm:P105_right_held_by <id/the-british-museum>. # optional, if you're certain it's BM  
<object/PRN/copyright> a crm:E30_Right;
  crm:P3_has_note "Some copyright statement";
  crm:P75i_is_possessed_by <id/the-british-museum>. # optional, if you're certain it's BM  

Josh: I'm leaving out the P105 & P75 as there is a complicated process to calculate whether the BM actually own the copyright (something I'm currently working on in another project) so perhaps we can add this at a later date.
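Until the ownership calculation is ready, the pattern can be emitted with the holder left optional, so P105/P75i are added only when it is known. A minimal sketch; `copyright_triples` is a hypothetical name, and the output is simplified one-triple-per-line Turtle.

```python
from typing import List, Optional

def copyright_triples(obj_path: str, statement: str, holder: Optional[str] = None) -> List[str]:
    """Hypothetical helper: model a copyright statement as an E30_Right node
    (reproduction_rights@crmg pattern) instead of a bare PX_has_copyright string."""
    obj, right = f"<{obj_path}>", f"<{obj_path}/copyright>"
    note = '"' + statement.replace("\\", "\\\\").replace('"', '\\"') + '"'
    triples = [
        f"{obj} crm:P104_is_subject_to {right} .",
        f"{right} a crm:E30_Right .",
        f"{right} crm:P3_has_note {note} .",
    ]
    if holder is not None:  # only once it is established who actually holds the right
        triples += [
            f"{obj} crm:P105_right_held_by {holder} .",
            f"{right} crm:P75i_is_possessed_by {holder} .",
        ]
    return triples
```

The statement text lives on the Right node, so the object and its copyright are no longer disconnected.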

20121106 Series and Exhibition should be E78_Collection

RS-1138

  • When collecting the complete object, we cut off graph traversal at E78_Collection and skos:Concept:
    RS-1139
  • So the type E78_Collection is very important, without it we may spill over into other objects of that collection.

Similarly, Exhibition should be skos:Concept. Else we'll spill from an object over to all objects that were at the same Exhibition.
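Why the type matters can be shown with a small traversal that follows outgoing links from an object but does not expand nodes typed E78_Collection or skos:Concept. This is a sketch of the idea only, not the RS code; `collect_object` is a hypothetical name and nodes are plain strings.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
# Nodes of these types are linked to, but never expanded
STOP_TYPES = {"crm:E78_Collection", "skos:Concept"}

def collect_object(root: str, triples: Set[Triple]) -> Set[Triple]:
    """Hypothetical helper: breadth-first collection of the triples reachable
    from root via outgoing links, stopping at collections and concepts."""
    outgoing: Dict[str, List[Tuple[str, str]]] = {}
    types: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    result: Set[Triple] = set()
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for p, o in outgoing.get(node, []):
            result.add((node, p, o))
            if p == "rdf:type" or o in seen:
                continue
            seen.add(o)
            if types.get(o, set()) & STOP_TYPES:
                continue  # keep the link, but do not spill into the collection/concept
            queue.append(o)
    return result
```

If the series node loses its E78_Collection type, the same traversal walks through it into every other object of the series.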

Series

  1. Maybe you need to use a modifier on {mus_title} to ensure it's a valid URI?
  2. P46i is about physical things, so you can't use it with E89
  3. The proper type is E78_Collection (a collection of physical things).
  4. Instead of a subclass EX_Series, better use straight E78_Collection, plus P2_has_type:
    each object has many rdf:type due to inheritance, but only one P2_has_type

Josh:
#Have altered /series to now be E78_Collection. I believe there was a reason for having the series as a propositional object, but it was not documented, so this makes sense.
#I have used a P2_has_type to <thesauri/event/exhibition> which is a skos:Concept in the scheme: <thesauri/event> and has a skos:broader of <thesauri/event/normal> which is used for associated event triples.
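For points 1, 3 and 4 of the Series list, one possible modifier on {mus_title} collapses punctuation and spaces into hyphens and percent-encodes what remains; it happens to produce the shape of the series URI quoted earlier, but it is not necessarily what RDFer does. A minimal sketch; `series_slug`, `series_turtle` and the type URI <thesauri/collection/series> are hypothetical.

```python
import re
from urllib.parse import quote

def series_slug(title: str) -> str:
    """Hypothetical modifier: runs of non-word characters become one hyphen,
    then non-ASCII letters are percent-encoded (UTF-8)."""
    slug = re.sub(r"\W+", "-", title).strip("-")
    return quote(slug, safe="-_")

def series_turtle(obj: str, title: str) -> str:
    """Hypothetical helper: a plain E78_Collection with P2_has_type, no EX_Series."""
    series = f"<series/{series_slug(title)}>"
    label = title.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        f"{series} a crm:E78_Collection;",
        "  crm:P2_has_type <thesauri/collection/series>;",  # hypothetical type URI
        f'  skos:prefLabel "{label}".',
        f"<object/{obj}> crm:P46i_forms_part_of {series}.",
    ])
```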

Collection

Isn't this duplicated?

  • flat-config.xml has <!-- COLLECTION NAME -->
  • inline-thesauri-config.xml has <!-- COLLECTION-->
    Josh: I have kept the flat-config one, as that is complete.

Exhibition

This is way too complicated. Use either Event or Collection, but not both; and you need neither of the extension classes EX_Exhibition and EX_Aggregation.

  1. Variant1 (I prefer this):
    <id/exhibition/{mus_auth_flat_term}> a crm:E5_Event, skos:Concept;
      skos:prefLabel "{mus_auth_flat_term}";
      crm:P2_has_type <id/thesauri/event/exhibition>; skos:inScheme <id/thesauri/event/exhibition>.
    <obj> crm:P12i_was_present_at <id/exhibition/{mus_auth_flat_term}>.
    
  2. Variant2:
    <id/exhibition/{mus_auth_flat_term}> a crm:E78_Collection;
      skos:prefLabel "{mus_auth_flat_term}";
      crm:P2_has_type <id/thesauri/collection/exhibition>.
    <obj> crm:P46i_forms_part_of <id/exhibition/{mus_auth_flat_term}>.
    
  3. Do you need to use a modifier on {mus_auth_flat_term}?
  4. Remove EX_Exhibition, EX_Aggregation and PX_exhibits from bm_ontology.
  5. remove "/objects" from the URI
  6. (cosmetic) call this section of flat-config.xml "EXHIBITIONS", and maybe move to inline-config
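Variant1 with a URI-safe term can be sketched as below. Percent-encoding the whole term is an assumption chosen as the simplest answer to point 3 (a hyphenating slug would also work); `exhibition_turtle` is a hypothetical name.

```python
from urllib.parse import quote

def exhibition_turtle(obj: str, term: str) -> str:
    """Hypothetical helper for Variant1: the exhibition is an E5_Event that is also a
    skos:Concept, and the object was present at it (no EX_ classes, no "/objects")."""
    ex = f"<id/exhibition/{quote(term, safe='')}>"  # percent-encode the term for the URI
    label = term.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        f"{ex} a crm:E5_Event, skos:Concept;",
        f'  skos:prefLabel "{label}";',
        "  crm:P2_has_type <id/thesauri/event/exhibition>;",
        "  skos:inScheme <id/thesauri/event/exhibition>.",
        f"{obj} crm:P12i_was_present_at {ex}.",
    ])
```

Because the exhibition node is typed skos:Concept, graph traversal from one object stops there instead of reaching every other object shown at the same exhibition.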

Josh:

  • I have gone with Variant1. I believe there was a reason for talking about the objects of an exhibition as an aggregation and associating that aggregation with the event, but it makes little difference at the moment. The trade-off is that with this modelling one cannot talk about the objects in an exhibition as a whole.

Vlado: and please use skos:prefLabel instead of rdfs:label throughout. You may ask "but Collections are not skos:Concept":

  • skos:prefLabel is intentionally defined so it's not limited to skos:Concepts but can apply to anything
  • our RForms use skos:prefLabel for "aboutness" objects. Otherwise they would have to distinguish between the case "object is a term" (e.g. Person) and "object is not a term" (e.g. Collection).
  • skos:prefLabel implies rdfs:label, so you lose nothing
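"skos:prefLabel implies rdfs:label" is just the RDFS sub-property rule applied to skos:prefLabel rdfs:subPropertyOf rdfs:label (declared in the SKOS reference). The repository's RDFS reasoning does this; the sketch below only illustrates the single rule involved. `infer_superproperties` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
# From the SKOS reference: skos:prefLabel rdfs:subPropertyOf rdfs:label
SUB_PROPERTY_OF: Dict[str, str] = {"skos:prefLabel": "rdfs:label"}

def infer_superproperties(triples: Set[Triple], sub_of: Dict[str, str]) -> Set[Triple]:
    """Hypothetical helper: for every (s, sub, o) also add (s, super, o), following
    chains of sub-properties (assumes an acyclic, single-parent hierarchy)."""
    result = set(triples)
    for s, p, o in triples:
        while p in sub_of:
            p = sub_of[p]
            result.add((s, p, o))
    return result
```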

20130407 P128 vs P65; P129 vs P138

Why have you used P129_is_about (in a sub-property) off the visual node (linked with P65_shows_visual_item), rather than off a concept linked with P128_carries as in the British Museum mapping? In the BM mapping, depicts and represents center around place, actor and ethnic group, and P128 is used for the subject authority. The RKD mapping uses the P129 sub-property off P65.

Isn't this a big difference in approach?

It isn't.
P65 (and E38_Image) is more specific than P128 (and concept = E73_Information_Object): image_objects_carriers@crmg.
So whenever you know the Aboutness is due to a Visual item, you should use P65 not the more vague P128.

Search for these prop numbers (esp P65) above and you'll find more (gory) details.

  • A detailed explanation is around "If YES then you can skip the fake concept."
  • BM data DOES use P65-P138 when it can be established the Aboutness is Visual, e.g. in the case of "IP: Portrait of" (PD/PPA357889):
    <object/PPA357889>
      crm:P62_depicts <person-institution/110644>;
      crm:P65_shows_visual_item <object/PPA357889/image/1>.
    <object/PPA357889/image/1> crm:P138_represents <person-institution/110644>;
      crm:P2_has_type <thesauri/association/IP>; a crm:E38_Image.
    

All the RKD objects are paintings, so we make an E38_Image node instead of a fake concept = E73_Information_Object node.
In particular, IconClass is an iconography thesaurus about Paintings, so P138_represents is the exact way to link to these terms.
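The decision described in this section reduces to one question: is the aboutness known to be visual? A minimal sketch of the two resulting patterns; `aboutness_triples` and its parameters are hypothetical, and the node URIs follow the image/N and concept/N conventions used in the examples above.

```python
from typing import List

def aboutness_triples(obj: str, n: int, subject: str, visual: bool) -> List[str]:
    """Hypothetical helper. visual=True (e.g. "IP: Portrait of", RKD paintings):
    P65 -> E38_Image -> P138, plus the P62 shortcut. visual=False: the vaguer
    P128 -> E73_Information_Object -> P129 ("fake concept")."""
    node = f"<{obj}>"
    if visual:
        image = f"<{obj}/image/{n}>"
        return [
            f"{node} crm:P65_shows_visual_item {image} .",
            f"{image} a crm:E38_Image .",
            f"{image} crm:P138_represents {subject} .",
            f"{node} crm:P62_depicts {subject} .",  # shortcut of P65 + P138
        ]
    concept = f"<{obj}/concept/{n}>"
    return [
        f"{node} crm:P128_carries {concept} .",
        f"{concept} a crm:E73_Information_Object .",
        f"{concept} crm:P129_is_about {subject} .",
    ]
```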
