
Notes from May-Jul 2013
RS-1874

Intro

Lec: (Our mapping) is same as Dominic's manual (to best of our understanding)
Vlado: Does it comply with BM's latest changes to modeling Association codes (esp re Acquisition, Production)? BM Association Mapping v2. Dominic's document probably reflects this, but these are recent changes and I haven't checked.

YCBA uses the following systems:

  • Bedework = calendaring
  • TMS = art collections
  • Drupal = website, exhibitions etc
  • Orbis = books etc

Getting Yale On-board

Lec: If there is something else you need to get this to work with Research Space please let me know
Vlado: Once it's compliant, we should:

  • try loading the data
  • load your thesauri. Complete Getty or only the subset used by your objects? How about Broader terms?
  • implement Image Annotation over your DeepZoom images (we use IIP Image for RKD images, while you use IIIF)
  • coreference some of your terms to enable cross-collection search
  • create RForms for your objects

See RS Plan 3.7#Get Yale on-board with ResearchSpace for details. As of 08-Jul-2013, this iteration is under planning and the exact scope and start are not clear. I think we can get Yale on board before mid-Sep (the Getty meeting), but it depends on the exact scope.

It appears that Yale is not the bottleneck: starting the RS3.7 iteration is. So please let's not rush this review process!

Legend

Please don't use color or strikethrough, use the following symbols for easier tracking (easiest to edit them in Wiki Markup mode):

  • open issue
  • resolved issue
  • issue under discussion

Eyeball

Lec: I am reviewing for any typos, missing types...

  • Have you tried Eyeball? See here: RDF Validation and Conversion#Eyeball
    RS-1071
    Lec: We tried Eyeball but had no luck: we could not install it after a number of tries, so we have to contact the dev community. TBD.

General Problems

  • STRONGLY Suggest to have 1 URI per object, not 3 sameAs URIs
    Lec: BM has multiple, followed their lead
    Vlado: RS currently cannot work with these sameAs (eg would return results in triplicate). BM puts sameAs in separate files that we don't load
    Lec: we still need to figure this out (don't want to publish two data sets one for RS and one for the world) - not pressing low priority
    Vlado: this is high priority since RS cannot work with 3 sameAs URLs. And there's no good reason to publish your objects under 3 URLs: please explain why you want to do this
  • don't emit prefixes you don't need: lccn, oclc, ycba_aat, etc etc
  • crm:PX_* (e.g. crm:PX_display_wrap) is wrong, should be bmo:PX_*
    Lec: fixed with http://collection.britishart.yale.edu/id/ontology/PX_display_wrap
    Vlado: Please use bmo:PX_* and not ycba:PX_*: don't create a second property with the same purpose.
  • Same holds about classes: use bmo:EX_Association not ycba:EX_Association: don't define your own class for the same purpose.

Pubby

These issues are not about the mapping, but how RDF is presented by Pubby.
If you switch to Forest / OwlimWorkbench, they'll go away (and probably be replaced by different issues)

  • low prio Pubby prefixes are not set up: shows "?:..."
    Lec: does same for BM, will try to fix

Link to download Turtle file

I'm now getting actual RDF and Turtle files for the sample set on box.com.

Better to fix this on Pubby & VUFind:

Inverse links considered harmful

I cannot examine a URI like http://collection.britishart.yale.edu/id/page/thesauri/department because pubby tries to show "is P51_current_keeper of" all objects and that takes forever.
Is there a way to switch these inverse links off?

Thesauri

Whole Getty or Parts

postponed

Currently you emit thesaurus data together with the object, and only the terms used in the object:

  • This way you miss Broader terms, so eg a search for "Animal" or "Mammal" won't find objects indexed with a narrower term (see FR Transitivity)
    • Lec: regarding Places, why don't we use Geographic Coordinates (included by Yale) and search by a bounding box?
    • Vlado: RS currently doesn't have bounding box search, because BM Places thesaurus doesn't include Geographic Coordinates.
      RS has place name search, that uses the place hierarchy.
  • This way you repeat data about the same term in many objects
  • If you start emitting objects in separate graphs like BM does (to be able to easily replace/delete), the term data will be duplicated in each of these object graphs
    As you can see already happens about YCBA itself:
    <http://vocab.getty.edu/resource/ulan/subject/500303557> a crm:E74_Group , skos:Concept ;
      skos:prefLabel "Yale Center for British Art" ;
      skos:inScheme <http://collection.britishart.yale.edu/id/thesauri/institution> ;
      a crm:E74_Group , skos:Concept ;
      skos:prefLabel "Yale Center for British Art" ;
      skos:inScheme <http://collection.britishart.yale.edu/id/thesauri/institution> .
    
  • In several cases these in-object ad-hoc terms don't satisfy Thesaurus Requirements (see next section)

My strong recommendation is to export the complete Getty thesauri

  • we shouldn't wait for Getty to do an official mapping, since it'll take a few months for TGN and ULAN, and it won't satisfy the requirement to publish as CRM (see next section)
  • I can do this mapping. I'll also be involved in the Getty's mapping, so that's a good synergy
  • Getty's committed to publish as LOD, so hopefully they won't object, as long as we mark our export as Unofficial
  • use a separate thesaurus export config, like BM does

Lec: I export terms within objects because they are present in the LIDO XML.
But to export the complete thesauri, I would need to get someone to export from RDBMS, and we cannot do this right now

Decision:

  • For the time being we'll stay without Broader terms
  • Dominic and Vlado to try to expedite the Getty export by Getty

Thesaurus Requirements

You must comply with BMX Issues#Thesaurus requirements
Each term should be both a CRM entity of appropriate type, and skos:Concept.

  • You have this for some (eg Agents) but not others (eg Title Type).

Meta-Thesaurus

Each thesaurus (ConceptScheme) used by Yale should be described in Meta-Thesaurus and FR Names#YCBA Thesauri (this section will be merged to the rest of the table)
This applies to both Getty and YCBA-local thesauri.

Each ConceptScheme determines a CRM type (rso:hasRange), which in turn determines which FRs (searches) are applicable to it. The mechanism is described in Meta-Thesaurus and FR Names#Thesaurus to FR Compatibility. Consequently RS needs each term to be attached to a single ConceptScheme.

The CRM types are shown on the diagram at the end of that page, and listed below. For each CRM type we give the relevant ConceptSchemes. While ULAN and TGN map to one type, AAT comprises several Facets (top-level URLs) that map to different types:

  • crm:E39_Actor: ULAN, YCBA People
  • crm:E53_Place: TGN, YCBA Places
  • crm:E4_Period: AAT Styles and Periods Facet (see Period/Culture)
  • crm:E57_Material: AAT Materials Facet
  • rso:E55_Technique: AAT Activities Facet (includes Processes and Techniques)
  • crm:E55_Type: all the rest of AAT: Objects, Associated Concepts, Physical Attributes, Agents (these are kinds of agents, not specific agents!), Brand Names (see Brand Names)
  • crm:E58_Measurement_Unit (not searchable): YCBA Units
  • crm:E31_Document (not searchable): YCBA Bibliography

Therefore:

  • Export AAT terms to different concept schemes. Use the actual AAT Facet as concept scheme (I may be recommending something similar to Getty as well), but isolate in a prefix definition. Eg:
    @prefix aat_materials:  <http://vocab.getty.edu/aat/300264091> . # Materials Facet
    @prefix aat_activities: <http://vocab.getty.edu/aat/300264090> . # Activities Facet, includes Processes and Techniques
    @prefix aat_periods:    <http://vocab.getty.edu/aat/300264088> . # Styles and Periods Facet
    @prefix aat:            <http://vocab.getty.edu/aat/300000000> . # all the rest of AAT
    
    <http://vocab.getty.edu/aat/300014078> skos:inScheme aat_materials:  . # canvas: Materials facet
    <http://vocab.getty.edu/aat/300230058> skos:inScheme aat_activities: . # oil gilding: Activities facet
    <http://vocab.getty.edu/aat/300111159> skos:inScheme aat_periods:    . # British (modern): Styles and Periods facet
    <http://vocab.getty.edu/aat/300033618> skos:inScheme aat:            . # paintings (visual works): Objects facet
    <http://vocab.getty.edu/aat/300250148> skos:inScheme aat:            . # horses (animals) [prefLabel=Equus caballus (species)]: Agents facet
    
    • You already do this for Technique but use a Yale URL for the scheme (may be ok for now, but need to change in the future)
      aat:230058 a skos:Concept; skos:prefLabel "oil gilding";
        skos:inScheme <yale/thes/technique>.
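The facet-to-scheme assignment above can be sketched as a simple lookup. This is only an illustrative Python sketch, not RS or Yale conversion code: the names `scheme_for_facet` and `in_scheme_triple` are made up here, and it assumes the converter already knows which AAT facet each term belongs to. The facet URLs and CRM types are the ones listed above.

```python
# Catch-all scheme for "all the rest of AAT" (see the prefix list above).
AAT_ROOT = "http://vocab.getty.edu/aat/300000000"

# Facets that get their own concept scheme, and the CRM type RS derives from it.
FACET_TO_CRM_TYPE = {
    "http://vocab.getty.edu/aat/300264091": "crm:E57_Material",   # Materials Facet
    "http://vocab.getty.edu/aat/300264090": "rso:E55_Technique",  # Activities Facet
    "http://vocab.getty.edu/aat/300264088": "crm:E4_Period",      # Styles and Periods Facet
}


def scheme_for_facet(facet_url):
    """Return (concept scheme URL, CRM type) for the facet a term sits in.

    Hypothetical helper: any facet not listed above falls back to the
    catch-all AAT scheme, typed as crm:E55_Type.
    """
    if facet_url in FACET_TO_CRM_TYPE:
        return facet_url, FACET_TO_CRM_TYPE[facet_url]
    return AAT_ROOT, "crm:E55_Type"


def in_scheme_triple(term_url, facet_url):
    """Render the skos:inScheme statement for one term as a Turtle line."""
    scheme, _crm_type = scheme_for_facet(facet_url)
    return f"<{term_url}> skos:inScheme <{scheme}> ."


# canvas sits in the Materials Facet
print(in_scheme_triple("http://vocab.getty.edu/aat/300014078",
                       "http://vocab.getty.edu/aat/300264091"))
```

Keeping the mapping in one table means each term ends up in exactly one ConceptScheme, which is what RS needs.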
      

Limitation:

  • If a painting has a Subject selected from Materials, Activities or Styles then currently it cannot be found (since a term from these facets does not invoke the is/has/about FR). But such Subject is not very likely
  • Future: extend RS to handle AAT as one ConceptScheme that includes a number of facets (hierarchies) for object type, material, technique, etc

Brand Names

Yale to check whether they use any terms from the Brand Names Facet, in particular:

Vladimir: I think having a Brand Names Facet is logically inconsistent. These are Techniques, Materials etc; so they should be put in the corresponding facet, and have a flag to mark them as Brand Name. I'll talk to Getty

Emmanuelle:  yes, we are using Formica (TM) as a material that went in the making of the table in Damien Hirst's installation http://collection.britishart.yale.edu/id/page/object/4908

This is the path of the 'Formica (TM)' term that we indexed: Formica (TM) / <plastic by production method> / plastic / organic material / <materials by composition> / materials / Materials / MATERIALS FACET / Art & Architecture Thesaurus

As Vladimir explains above, in our 15+ year old copy of the AAT there is no <brand name materials> facet. The term is in the Material facet.  The current online AAT has a <brand name materials> facet, but it still does not include Formica (TM) (it's still in the Materials facet)

Searchable/Taggable Thesauri

Emmanuelle: It would be helpful to briefly go over the definitions for searchable and taggable.

  • Searchable is a thesaurus that can be used in FR search. The list of FRs is Meta-Thesaurus and FR Names#FR Names Table and the detailed definitions are FR Implementation. Examples:
    • BM Object is searchable using FR2_has_type "is/has/about" because it's mapped to P2_has_type of the object
    • BM Ware and BM Currency are searchable using the same FR because they are sub-properties of P2_has_type
    • BM Aspect is not searchable because it's P2_has_type of E55_Type of a E25 Man-Made Feature on the object (side of coin)
    • If IPTC code is similar to "subject", it should be searchable
    • BM Unit and BM Dimension are not searchable because they are attributes of a Dimension of the object, and there's no FR defined for dimensions
    • BM Place is searchable, even in a hierarchical way
    • BM Place Type (town, village) and BM Place Name Type (modern, archaic) are not searchable because FRs don't reach into the properties of a place
  • Taggable: whether the thesaurus is "interesting enough" to be used as a source of tags. Tags are general categories to be used for categorization of research questions and comments. See Tags Spec

Term Distribution

  • Yale: 99% of Yale terms come from TGN, AAT, and ULAN.
    1% of terms come from ODNB, IconClass, YCBA Local terms (Frames, ...)
  • Yale: example of a lesser known person (Elihu Yale) who's found in ODNB, VIAF, DBPedia but not ULAN:
    http://www.oxforddnb.com/view/article/30183
    http://viaf.org/viaf/46310522/
    http://dbpedia.org/page/Elihu_Yale
    • Vlado: such "local heroes" are a typical pattern for any museum. BM People also has "local heroes" that are not found in ULAN.
  • Lec: will it be helpful if we make connections to ODNB, VIAF, DBPedia?
    • Vlado: yes, assuming you can easily export such term data according to Thesaurus Requirements. If you source it from these external sources, you'd need to make the same SKOS & CRM mapping as for the rest, and register in the Meta-Thesaurus.
      If these are indeed less than 1%, I'd source them from a single thesaurus YCBA Local.

There will be a meeting at Getty in September 2013, with 1/2 day discussion on Vocabularies

Term Code Discrepancy

Your LIDO has some AAT codes that are truncated compared to the original. Eg object/7:

The original codes are aat:300015050 and aat:300014078. I have seen the same for "oil gilding": "230058" vs aat:300230058.

You map "canvas" to a local term (instead of AAT), and skip "oil paint" altogether:

<http://collection.britishart.yale.edu/id/thesauri/14078> a crm:E57_Material , skos:Concept ;
  skos:prefLabel "canvas" ;
  skos:inScheme <http://collection.britishart.yale.edu/id/thesauri/material> .

Don't know how this happened but it is crazy. Can you fix it in the conversion script?

  • Emmanuelle: YCBA LIDO has some AAT codes that are truncated compared to the original because this is how the AAT codes were 15+ years ago when they were loaded in TMS.  Unfortunately the vendor never updated the TMS thesaurus manager.
  • Vladimir: The best way is to fix at the source (TMS). Think you need to speak to them about an upgrade. If they can't do it soon enough, then fix in the conversion.
  • Emmanuelle: If we want to fix this problem in the conversion script, then it will be helpful to know that AAT terms always start with 30 and are always 9 digits.  So if the old AAT code for 'oil paint' was 15050, the current one should be 300015050.  The same behavior happens in the TGN with the 70 prefix. 
  • Lec: We can catch and add the prefixes, but this cannot be done in RDFer (limitations of the digit functions). Let me see if our move to XSLT is done, and if COBOAT can fill this in. Back in touch with you soon
  • Ken: All the more reason to leave those old things out for now. Also worth bringing up at Getty in September. 
  • Vladimir: At this stage Yale will be emitting its own version of terms (embedded in object data), so Lec you don't need to fix it. It matters only to your students when they do coreferencing to BM thesauri. But longer term it needs to be fixed.
  • Emmanuelle: One thing to be aware of, however, is that the current AAT is somewhat different from our 15 year old copy and some terms valid back then have been decommissioned today. 
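If the fix ends up in the conversion, the rule Emmanuelle describes (AAT codes start with 30 and are 9 digits) could look roughly like the Python sketch below. The function names `expand_code` and `expand_aat` are hypothetical. Only the AAT length is stated above, so for TGN the caller would have to supply the total length together with the 70 prefix. Non-numeric or non-positive codes (the "-1" and 0 cases from TMS) are rejected rather than padded.

```python
def expand_code(old_code, prefix, total_digits):
    """Rebuild a full Getty ID from a truncated TMS code.

    The prefix (eg "30" for AAT) is restored and the gap is zero-filled.
    A code that already has the full length and prefix is returned unchanged.
    """
    digits = str(old_code).strip()
    if not digits.isdigit() or int(digits) == 0:
        # "-1" (local term) and 0 (no concept) are not Getty codes
        raise ValueError(f"not a usable code: {old_code!r}")
    if len(digits) == total_digits and digits.startswith(prefix):
        return digits
    room = total_digits - len(prefix)
    if len(digits) > room:
        raise ValueError(f"code too long to expand: {old_code!r}")
    return prefix + digits.zfill(room)


def expand_aat(old_code):
    """AAT: always 9 digits, always starting with 30."""
    return expand_code(old_code, prefix="30", total_digits=9)


print(expand_aat("15050"))   # oil paint
print(expand_aat("230058"))  # oil gilding
```

This does not address Emmanuelle's last point: a correctly expanded code may still refer to a term that has since been decommissioned in the current AAT.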

Local Terms

don't export a term as "-1", eg http://collection.britishart.yale.edu/id/thesauri/AAT/-1

  • Lec: Emmanuelle, please have some students go through TMS, I exclude now anything that has -1
  • Lec: Emmanuelle, there are cases where subjects are TGN where conceptID=0, I will try to ignore. Example: object/34
  • Vlado: in chat Emmanuelle said "-1" come from Yale local terms. Then emit them as such, don't look them up in AAT.
    You absolutely must emit the local terms, else you'll be missing important data.

Agents

  • remove crm:E55_Type: a Group is not a Type
    <thesauri/nationality/British> a crm:E55_Type , crm:E74_Group , skos:Concept ;
  • SKOS says one prefLabel (per language). If you don't have a flag in TMS, call the first one prefLabel and the rest altLabel
    <person-institution/142> a crm:E21_Person , skos:Concept ;
    	skos:inScheme ycba:person-institution ;
    	skos:prefLabel "Robert Smirke I" , "Robert Smirke R. A." , "Robert Smirk" , "Robert I Smirke" , "Robert Smirke" ;
    

    Lec: Awaiting Emmanuelle confirmation if subjectActor will have multiple names, currently not in LIDO
    Emmanuelle: yes a fair number of our subjectActor have alternate names in addition to their preferred names.
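The "first one is prefLabel, the rest altLabel" rule can be sketched as follows. This is an illustrative Python sketch under the assumption that the names arrive as a plain list in TMS order; `split_labels` and `label_lines` are made-up names, not part of any existing converter.

```python
def split_labels(names):
    """Return (prefLabel, [altLabels]): first distinct name wins, rest are alternates."""
    distinct = []
    for name in names:
        name = name.strip()
        if name and name not in distinct:
            distinct.append(name)
    if not distinct:
        return None, []
    return distinct[0], distinct[1:]


def quote(text):
    """Quote a string as a Turtle literal (escapes backslash and double quote only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def label_lines(names):
    """Render the labels as Turtle predicate-object lines (subject not included)."""
    pref, alts = split_labels(names)
    if pref is None:
        return []
    lines = [f"skos:prefLabel {quote(pref)} ;"]
    if alts:
        lines.append("skos:altLabel " + " , ".join(quote(a) for a in alts) + " ;")
    return lines


for line in label_lines(["Robert Smirke I", "Robert Smirke R. A.", "Robert Smirk",
                         "Robert I Smirke", "Robert Smirke"]):
    print(line)
```

If TMS does carry a "preferred" flag, sort the flagged name to the front before calling this, rather than relying on the stored order.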

Life Dates

you don't have any date (P82_at_some_time_within) for <person-institution/142/birth/date>. This makes all the following statements useless, so kill them.
Lec: we now have P82

<person-institution/142>
	crm:P92i_was_brought_into_existence_by <person-institution/142/birth> .
<person-institution/142/birth> a crm:E63_Beginning_of_Existence ;
	crm:P4_has_time-span <person-institution/142/birth/date> .
  • Same for death

This is wrong

<person-institution/6046/birth> a crm:E67_Birth ;
	crm:P82_at_some_time_within <person-institution/6046/birth/date> .
<person-institution/6046/birth/date> a crm:E52_Time-Span ;
	rdfs:label "1609" ;
	crm:P82a_begin_of_the_begin "1609"^^xsd:gYear .
<person-institution/6046/death> a crm:E69_Death ;
	crm:P82_at_some_time_within <person-institution/6046/death/date> .
<person-institution/6046/death/date> a crm:E52_Time-Span ;
	rdfs:label "1672" ;
	crm:P82b_end_of_the_end "1672"^^xsd:gYear .
  • The relation to E52 must be P4 not P82
  • You don't use P82a vs P82b depending on the nature of the event (birth or death). When you have a single event date, just use P82

Should be:

<person-institution/6046/birth> a crm:E67_Birth ;
	crm:P4_has_time-span <person-institution/6046/birth/date> .
<person-institution/6046/birth/date> a crm:E52_Time-Span ;
	rdfs:label "1609" ;
	crm:P82_at_some_time_within "1609"^^xsd:gYear .
<person-institution/6046/death> a crm:E69_Death ;
	crm:P4_has_time-span <person-institution/6046/death/date> .
<person-institution/6046/death/date> a crm:E52_Time-Span ;
	rdfs:label "1672" ;
	crm:P82_at_some_time_within "1672"^^xsd:gYear .

Dates Variety

Lec: we have more variety in Person dates. Emmanuelle provided examples, Vlado provided Turtle code:

Lec: LIDO may not contain the correct data in the earliestDate and latestDate in vitalRecord for all dates; most of the examples above are only in Display, so I recommend we ignore them for the time being
Vlado: ok, but first consider the Turtle snippets above

Person vs Group vs Institution

If you can distinguish in LIDO different kinds of Actors (Person/ Group (informal)/ Legal Body (institution)) then use specific subclasses and subprops. The number of dashes below shows class nesting:

Actor | Begin | begin prop | End | end prop
E39_Actor | E63_Beginning_of_Existence | P92i_was_brought_into_existence_by | E64_End_of_Existence | P93i_was_taken_out_of_existence_by
-E21_Person | -E67_Birth | P98i_was_born | -E69_Death | P100i_died_in
-E74_Group | -E66_Formation | P95i_was_formed_by | -E68_Dissolution | P99i_was_dissolved_by
--E40_Legal_Body | -E66_Formation | P95i_was_formed_by | -E68_Dissolution | P99i_was_dissolved_by
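To make the table concrete, here is a minimal Python sketch of how a converter might pick the specific classes and properties once the actor kind is known. The table contents are copied from above; the names `LIFECYCLE` and `lifecycle_triples` are assumptions for illustration, and the "/birth" and "/death" URL suffixes are reused for all actor kinds.

```python
# actor class -> (begin class, begin prop, end class, end prop), from the table above
LIFECYCLE = {
    "E39_Actor": ("E63_Beginning_of_Existence", "P92i_was_brought_into_existence_by",
                  "E64_End_of_Existence", "P93i_was_taken_out_of_existence_by"),
    "E21_Person": ("E67_Birth", "P98i_was_born", "E69_Death", "P100i_died_in"),
    "E74_Group": ("E66_Formation", "P95i_was_formed_by",
                  "E68_Dissolution", "P99i_was_dissolved_by"),
    "E40_Legal_Body": ("E66_Formation", "P95i_was_formed_by",
                       "E68_Dissolution", "P99i_was_dissolved_by"),
}


def lifecycle_triples(actor_uri, actor_class):
    """Return Turtle lines linking an actor to its begin and end events.

    Unknown actor classes fall back to the generic E39_Actor row.
    """
    begin_cls, begin_prop, end_cls, end_prop = LIFECYCLE.get(
        actor_class, LIFECYCLE["E39_Actor"])
    return [
        f"<{actor_uri}> crm:{begin_prop} <{actor_uri}/birth> .",
        f"<{actor_uri}/birth> a crm:{begin_cls} .",
        f"<{actor_uri}> crm:{end_prop} <{actor_uri}/death> .",
        f"<{actor_uri}/death> a crm:{end_cls} .",
    ]


for line in lifecycle_triples("person-institution/142", "E21_Person"):
    print(line)
```

A real converter should emit an event only when it also has a date for it, since an event without a time-span carries no information.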

URLs: strictly speaking "/birth" and "/death" are correct only for E21_Person, but it's good enough to also use for other actors

Lec: we are taking into account both person and groups
Vlado: consider the extended version above. You don't have to distinguish E40_Legal_Body: you can stick with E74_Group

Thesaurus URIs

  • Use more logical URIs that reflect the nature of the resource or type, and don't reflect their genesis in existing systems:
    <thesauri/event/exhibition_history> -> <thesauri/event/exhibition> (an exhibition is NOT "exhibition history")
    <event/some-exhibition/TMS/exhibition_history> -> <event/some-exhibition/identifier> (an identifier is NOT "exhibition history")
    <thesauri/identifier/TMS/exhibition_history> -> <thesauri/identifier/exhibition> (doesn't matter your system is called TMS)
    

    Lec: this may need further discussion, we may have other types of events with IDs from other systems, however made changes per suggestion
    Vlado: you have a point. If you have 2 exhibition IDs then you need to add the system acronym

Exhibition URIs

  • We need to decide on the URI for exhibitions. Originally we had a short identifier; BM suggested using the title, but this does not always work well, eg see: ObjectID 34
  • Vlado: Yes, pretty long titles in http://collection.britishart.yale.edu/id/page/object/34.
    Exhibition :: An American's Passion for British Art - Paul Mellon's Legacy, 2007-2008
    Exhibition :: Great British Paintings from American Collections: Holbein to Hockney, Thursday, September 27, 2001 - Sunday, December 30, 2001
    Exhibition :: J. M. W. Turner - A Selection of Paintings from the Collection of Mr. and Mrs. Paul Mellon, 1968-1969
    
  • RS doesn't care what the URI is
  • Lec: updated with standard ID based URIs

Getty URIs

Bibliography

Objects

Titles

Title Types

  • Why do you need these duplicate types?
    crm:P2_has_type <thesaurus/title/Alternate-title> , <thesaurus/title/alternate> .
  • I'm not sure what "Repository title" is. But if it means Preferred, then this is also unnecessary duplication:
    crm:P2_has_type <thesaurus/title/Repository-title> , <thesaurus/title/preferred> .
  • Emmanuelle: the capitalized title types talk about the purpose of the titles, not their ranking. Here are all title types possible: Alternate, Collective, Creator's, Exhibited, Foreign language, Former, Inscribed, Repository, Verso.
  • Emmanuelle: The lowercase title attributes (alternate and preferred) talk about the ranking/preference of the titles.
  • Can an object have different Repository title and Preferred title?
    Emmanuelle: no, all Repository titles are always the preferred ones. But the alternate titles are not all of the type Alternate.

Vladimir:

  • CRM has no notion of "preferred title" (unlike P48_has_preferred_identifier)
  • RS prints the titles in order, together with the title type
  • Luckily Lec also emits rdfs:label equal to the preferred title, so we use that in result lists

I propose to merge the two sets of values because preferred/alternate is already covered by Repository/all-the-rest:

  • Emit "Preferred" instead of "Repository"
  • Emit the titles in order, the Preferred one first
  • Don't emit "alternate"
  • shorten the term URL a bit since the thesaurus URL already says "title":
     <thesaurus/title/ForeignLanguage>
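The merge proposed above could be sketched like this in Python. It is only an illustration: `title_type_url` and `order_titles` are hypothetical names, titles are assumed to arrive as (label, purpose) pairs using the capitalized purpose types Emmanuelle listed, and the CamelCase rule for the URL (dropping spaces and apostrophes) is an assumption generalized from the single ForeignLanguage example.

```python
def title_type_url(purpose):
    """Map a TMS title purpose to the shortened thesaurus URL.

    "Repository" becomes "Preferred"; the lowercase ranking values
    (alternate/preferred) are not emitted at all.
    """
    name = "Preferred" if purpose == "Repository" else purpose
    words = name.replace("'", "").split()
    slug = "".join(w[:1].upper() + w[1:] for w in words)
    return f"<thesaurus/title/{slug}>"


def order_titles(titles):
    """Put the Repository (ie Preferred) title first, keep the rest in original order."""
    # sorted() is stable, so non-Repository titles keep their relative order
    return sorted(titles, key=lambda title: title[1] != "Repository")


titles = [("Former name", "Former"), ("Malvolio Dancing", "Repository")]
for number, (label, purpose) in enumerate(order_titles(titles), start=1):
    print(f"<object/19850/title/{number}> a crm:E35_Title ;")
    print(f'  rdfs:label "{label}" ;')
    print(f"  crm:P2_has_type {title_type_url(purpose)} .")
```

Because the Preferred title is always numbered first, `<title/1>` can double as the "primary" title and no separate `<title/primary>` node is needed.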

Duplicate Titles

  • these two titles are duplicated. Keep just one of them: I suggest <title/1> for uniformity with the alternate title(s)
    <object/19850/title/1> a crm:E35_Title ;
      rdfs:label "Malvolio Dancing" ;
      crm:P2_has_type <thesaurus/title/Repository-title> , <thesaurus/title/preferred> .
    <object/19850/title/primary> a crm:E35_Title ;
      rdfs:label "Malvolio Dancing" ;
      crm:P2_has_type <thesaurus/title/Repository-title> , <thesaurus/title/preferred> .
    

Title Language

  • (optional) Indicate the title language:
    <object/19850/title/1> a crm:E35_Title ;
      rdfs:label "Malvolio Dancing"@en ;
      crm:P72_has_language <thesaurus/language/english> .
    
  • Emmanuelle: we indicate the language of the titles only if they are in foreign language <thesaurus/title/ForeignLanguage-title>, and probably not consistently. All the other titles are understood as being in American English, the official language of the YCBA.
  • Vladimir: fair enough! So indicate language only for that type, and say it's translation of the Preferred one:
    <object/19850/title/N> a crm:E35_Title ;
      rdfs:label "Malvolio Danse"@fr ;
      crm:P2_has_type <thesaurus/title/ForeignLanguage> ;
      crm:P72_has_language <thesaurus/language/french> ;
      crm:P73i_is_translation_of <object/19850/title/1> .
    

    (Actually it's more likely this is the original title, so you may want to use P73_has_translation instead of P73i)

Related Resources

List all URLs closely related to the object: web pages, LIDO XML, etc.
Eg for http://collection.britishart.yale.edu/id/object/5005 "Mrs. Abington as Miss Prue in Love for Love by William Congreve" this includes:

Representing Related Resources

An important question is how to represent these related resources and how to link them to the object. CRM doesn't have specific classes for "web page" or "XML record" but E31_Document is appropriate: "identifiable immaterial items that make propositions about reality. These propositions may be expressed in text, graphics, images, audiograms, videograms or by other similar means. Documentation databases are regarded as a special case of E31 Document." (Therefore a single XML record is also E31_Document). See document_references@crmg, reference@crmg

It's also nice to include the media type of these documents (dc:format).

  • for web pages that's "text/html"
  • for LIDO we use "text/xml". There's no registration for LIDO specifically, so we follow RFC 3023: XML Media Types: "If an XML document – that is, the unprocessed, source XML document – is readable by casual users, text/xml is preferable. Application/xml is preferable when the XML MIME entity is unreadable by casual users."

Representation:

<http://collection.britishart.yale.edu/id/object/5005> P70i_is_documented_in
  <http://collections.britishart.yale.edu/vufind/Record/1669236>,
  <http://collections.britishart.yale.edu/oaicatmuseum/OAIHandler?verb=GetRecord&identifier=oai:tms.ycba.yale.edu:7&metadataPrefix=lido>,
  <http://discover.odai.yale.edu/ydc/Record/1669236>,
  <http://www.google.com/culturalinstitute/asset-viewer/mrs-abington-as-miss-prue-in-love-for-love-by-william-congreve/tQHBb0Q2MZF2uQ>.
<http://collections.britishart.yale.edu/vufind/Record/1669236>
  a E31_Document; dc:format "text/html"; P2_has_type <thes/document/home-page>.
<http://collections.britishart.yale.edu/oaicatmuseum/OAIHandler?verb=GetRecord&identifier=oai:tms.ycba.yale.edu:7&metadataPrefix=lido>
  a E31_Document; dc:format "text/xml"; P2_has_type <thes/document/lido-xml>.
<http://discover.odai.yale.edu/ydc/Record/1669236>
  a E31_Document; dc:format "text/html"; P2_has_type <thes/document/ydc-page>.
<http://www.google.com/culturalinstitute/asset-viewer/mrs-abington-as-miss-prue-in-love-for-love-by-william-congreve/tQHBb0Q2MZF2uQ>
  a E31_Document; dc:format "text/html"; P2_has_type <thes/document/google-art-page>.

Optionally, you could add Creation records to state who created the above documents.

We already use E31_Document for Bibliography. For symmetry, we should add P2_has_type:

<http://collection.britishart.yale.edu/id/object/80>
  crm:P70i_is_documented_in <http://collection.britishart.yale.edu/id/bibliography/1075>.
<http://collection.britishart.yale.edu/id/bibliography/1075> a crm:E31_Document ;
  crm:P2_has_type <thes/document/bibliography> ;
  rdfs:label "David Lee, Ladies of the Knight, Arts Review, Vol. 47, May 1995, pp. 26-29, N1 A792 + (A & A)" .

The type terms mentioned above:

<thes/document/home-page> a E55_Type, skos:Concept; skos:inScheme <thes/document>;
   skos:prefLabel "Home page (VUFind record)".
<thes/document/lido-xml> a E55_Type, skos:Concept; skos:inScheme <thes/document>;
   skos:prefLabel "LIDO XML record".
<thes/document/ydc-page> a E55_Type, skos:Concept; skos:inScheme <thes/document>;
   skos:prefLabel "Yale Digital Collections Center page".
<thes/document/google-art-page> a E55_Type, skos:Concept; skos:inScheme <thes/document>;
   skos:prefLabel "Google Cultural Institute (Google Art) page".
<thes/document/bibliography> a E55_Type, skos:Concept; skos:inScheme <thes/document>;
   skos:prefLabel "Bibliography".

<thes/document> a skos:ConceptScheme; skos:prefLabel "Document Type".

Images

Image Metadata

Yale keeps numerous image assets, and Yale ODAI provides extensive metadata about the images:

Description: I haven't read the documentation but here's what I see.

path/field | eg | description
X/derivatives/Y | | X ranges over Image Views (0..M, 0 is Main View), Y ranges over Image Sizes (1,2,3,6,7)
./formatId | 1 | size id
./formatShort | sm | size name
./label | Screen small | size label
./url | http://deliver.odai.yale.edu/content/id/482f519c-eebf-4596-819c-4c8197c4d3e5/format/1 | logical URL (request this)
./source | http://b02.deliver.odai.yale.edu/48/2f/482f519c-eebf-4596-819c-4c8197c4d3e5/ba-obj-7-0001-pub-sm.jpg | physical URL (redirects to it)
./bucketDNS | b02.deliver.odai.yale.edu | physical server
./bucketName | b02.deliver.odai.yale.edu | physical server
./bucketPath | 48/2f/482f519c-eebf-4596-819c-4c8197c4d3e5/ba-obj-7-0001-pub-sm.jpg | physical path
./contentId | 482f519c-eebf-4596-819c-4c8197c4d3e5 | GUID
./filename | ba-obj-7-0001-pub-sm.jpg | file name
./unitAccessOnly | false | Only the Yale unit that created the image should have access
./cas | false | Login through CAS is required (campus-only access)
./captcha | false | protected by captcha? Only size 5 are
./format | image/jpeg | image format
./sizeBytes | 33169 | file size
./pixelsX | 249 | width
./pixelsY | 186 | height
X/metadata | | describes Image View X
./caption | cropped to image, recto, unframed | enumerates Flags of the Image View
./source | Yale Center for British Art | credits
./imageCredit | Digital Image: Yale Center for British Art | credits
./webStatement | http://hdl.handle.net/10079/gb5mkww | redirects to http://britishart.yale.edu/collections/using-collections/image-use
./usageTerms | http://hdl.handle.net/10079/gb5mkww | always same
./imageCopyrightNotice | | always empty
./imageCopyrightMarked | false | always false
./assetId | d70ae2b604a64bd24809441a5d24233a8d406925 | another GUID
X/contentId | 482f519c-eebf-4596-819c-4c8197c4d3e5 | GUID, same as above

Questions:

  • Are there other formats that we care about?
    • Lec: we only care about image/jpeg, image/tiff, image/jp2. In the longer future maybe pdf/a, mp3, mp4, 3D formats, TBD as they will have different viewers
  • What are unitAccessOnly and cas, and do we care?
    • Lec: Proxy, CAS, login + session ticket. We do care as Linked Data may not always be Open, we can have some LOD and some LD. I can imagine in the long run giving access to all data, and those without access will only see LOD. For now you can ignore.
    • Vlado: Then these flags should be used to filter the dataset.
      If you publish something out, it becomes LOD even if your intent is for some of it to be non-open LD

Image URLs

ODAI has several URLs that redirect to the physical URL:

  1. Using object id:
    http://deliver.odai.yale.edu/content/repository/YCBA/object/<objectId>/type/2/format/<Y>

    eg http://deliver.odai.yale.edu/content/repository/YCBA/object/7/type/2/format/2

    • Unfortunately such redirect is set only for the Main View (X=0) (I tried varying "type" but got nowhere).
      It's not suitable as a permanent URL, since if YCBA decides to remove one view from public access, all others after it in the sequence are promoted (decremented)
  2. Using repository name and filename:
    http://deliver.odai.yale.edu/content/repository/YCBA/id/<filename1>/format/<Y>

    where filename1 is "filename" with "formatShort" chopped off and extension replaced with ".tif"
    eg http://deliver.odai.yale.edu/content/repository/YCBA/id/ba-obj-7-0001-pub.tif/format/2

  3. Using ODAI GUID: 
    RS-1920
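The first two URL patterns can be built mechanically. The Python sketch below is illustrative only: the function names are made up, and it assumes (from the single example above) that "formatShort" is appended to the file stem with a hyphen. It just builds strings and does not check that the redirect actually exists.

```python
import os

BASE = "http://deliver.odai.yale.edu/content/repository/YCBA"


def master_filename(filename, format_short):
    """Derive filename1: drop the "-<formatShort>" suffix and switch the extension to .tif."""
    stem, _extension = os.path.splitext(filename)
    suffix = "-" + format_short
    if stem.endswith(suffix):
        stem = stem[:-len(suffix)]
    return stem + ".tif"


def url_by_object(object_id, size):
    """Pattern 1: by object id (works for the Main View only)."""
    return f"{BASE}/object/{object_id}/type/2/format/{size}"


def url_by_filename(filename, format_short, size):
    """Pattern 2: by repository name and filename1."""
    return f"{BASE}/id/{master_filename(filename, format_short)}/format/{size}"


print(url_by_object(7, 2))
print(url_by_filename("ba-obj-7-0001-pub-sm.jpg", "sm", 2))
```

Pattern 2 is the safer choice for anything other than the Main View, since it does not depend on the position of the view in the sequence.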

In the sections below we use these image aliases for brevity. In RDF the actual http URL should be used, not a "made up" node with P1_is_identified_by pointing to the actual URL

Image Formats

YCBA keeps images in many sizes. There are over 15 of these sizes or "formats", and they include video, 3D models, etc.
The ones I've encountered for images are listed below ("width" is just an example):

url suffix | format code | format label | width | file type | use in RS
format/1 | sm | Screen small | 250 | jpeg | result list thumbnail (or could use 2)
format/2 | med | Screen medium | 480 | jpeg | lightbox, object view, data basket preview
format/3 | large | Screen large | 1920 | jpeg |
format/6 | print-lg | Print large | 3000 | tiff |
format/7 | JPEG2000 | Zoom (JPEG 2000) | 4279 | jp2 | This is the max available size. We don't use this URL for annotation since it serves the whole Deep Zoom Image

Deep Zoom Image

Many YCBA objects have Deep Zoom images (JPEG2000), sometimes even several per object.
Eg Miss Prue http://collections.britishart.yale.edu/vufind/Record/1669236 has:

This is an IIPMooViewer client using a Djatoka Adore IIIF server
(May 2013: IIP Server has beta support for the IIIF Image API)