
This data comes from the AdLib collection management software and is better structured.


Introduction

  • See Rembrandt for a description of this project
  • We received sample XML files (a record list and 2 individual records about 2 paintings). We don't have an XSD schema
  • Tags are in Dutch but there are English comments; Excel can translate automatically
  • <object_number_RKDtechnical> includes English versions for many of the Dutch fields; and most of the values it adds are bilingual

Database Schemas

Older schema (2009_12_02): matches closely the XML data

(Attachment not available on this page: "2009_12_02 Datastructuur RKDtechnical.pdf")

Newer schema (2011_06_01): doesn't match the XML data, and doesn't look entirely precise

(Attachment not available on this page: "2011_06_01 Datastructuur RKDtechnical voorstel.pdf")

Adlib Correspondence

20110917: Adlib is the vendor of the base collection management system

  1. We're starting work on the www.ResearchSpace.org project for the British Museum. One of the tasks is to convert the Rembrandt project database to CIDOC CRM and import it to ResearchSpace. Dominic Oldman (IS Development Manager of BM) sent us some sample records from Rembrandt (attached), and I noticed they are exported from your system: the root is <adlibXML>.
  2. Are you involved in the Rembrandt project, or do they only use your system?
    • We are involved in the Rembrandt project. We made the UI for the web application, and RKD have used our software to create the RKD technical database. The RKD technical database was created by the RKD themselves and has been used as the basis for the Rembrandt database. It is currently hosted at the RKD. On our side, a programmer who has since left worked on this. Ilya has taken over his responsibilities in this project.
  3. Could you send us the schema of the Rembrandt XML export?
    • The RKD can send you the data dictionary of the database. Then you can also see if all the tags have translations. Furthermore you can probably access the data through the Adlib API online. The RKD must open their system to allow you access. Using the API you can query the live system and get data out of it (rather than exporting). Technical information on the API can be found on api.adlibsoft.com
  4. Would it be possible to export using English tags? (We don't speak Dutch.) There are English XML comments in the file (very useful), but the conversion will be more robust if the tags are in English.
    • Sure, but that’s up to the RKD to decide.
  5. Does AdLib have conversion/export to some standard format (e.g. LIDO) that can be used by Rembrandt? It would be of higher value to BM if we develop an import from a standard format.
    • We do support LIDO for object information and have a standard output format for our Adlib Museum System. But as I have already explained, the Rembrandt database has been created by the RKD itself, so it does not map to LIDO using our standard XSLT transformation. Again, it is up to the RKD to change this.

RKD Correspondence

RKD is the technical partner of the Rembrandt project

9/26/2011 Jan Teuben: Last week Adlib forwarded me their email reply to your questions, and I'm curious whether I could help you if you still have questions regarding the Rembrandt Database project. Let me know if I can help.

Unanswered questions below are marked as
The most urgent need at present is to get the thesauri, otherwise we will have to make up values
Next priority is to get the complete data set

Data Stability and Schema

  1. Has the export changed (or will it change) significantly from the attached?
    • Right now the export is the same, but it will change several times in the future. Let me explain in more detail: the project website is still in beta and is hosted at Adlib on an old copy of our Adlib application (RKDtechnical). We want to move the website and connect it to the newest version of RKDtechnical. We also want to release version 1.0 later this year, and more versions will follow. Together with the website, the Adlib application we build here at RKD is continuously changing, although major changes aren't desirable because external partners are also working with this application.
      After moving the website, much of the export will change, since I have made many changes in the past six months.
    • Vladimir: We've started intensive work on mapping this to CRM, so Dominic and I need to figure out some strategy regarding future changes
    • Wietske 14 Oct: we have not yet responded to Vladimir's e-mail of October 6, as we are afraid that some of the answers will take a lot of time not only from us, but in the end also from you. This is because our database and website are still undergoing changes, and the information we give you today will be different from what we can give you in a few months' time. We have discussed this with Dominic and yesterday sent him an overview of our current status and planning. We will wait to see what comes out of this.
    • Vladimir: I understand the risk, but unfortunately we cannot wait since it's very important for us to ground our technical work on some specific data. The CRM is too abstract to use as a data model for RS, without having an "application profile" over specific data. So don't worry about the upcoming changes, it's up to Dominic and me to figure out some way to accommodate it in RS.
      That's also the reason why I'm in a hurry to get clarifications on the various questions.
  2. Does the older schema (2009_12_02) represent the current XML data, and the newer schema (2011_06_01) your plans for changes?
  3. Can you send me a current export of the same record priref=2926 "De badende Suzanna" ?
    • Please find attached (badendesuzanna_adlibxml.xml)
    • Thanks! I compared the new file you sent against the old one, and the differences are minor (see section Record Versions).
      Also thanks for the other sample 32162 "Portret van Herman Doomer"!
  4. Do you have a schema of the Rembrandt XML export and can you send it?
    • Please find attached. We use multiple schemas; I'm sending you the schemas for page settings and global settings. I assumed the schemas for filtering and search aren't necessary.
      GlobalSettings.xml GlobalSettings.xsd PageSettings.xml PageSettings.xsd
    • Thanks for these (GlobalSettings.xml especially is useful since I can see which IIP servers are used, etc).
      But I meant XSD for the data (object record) export, defining all elements and their nesting
  5. If not, we can figure out all tags by looking at a complete set of export files (which we don't have yet), but if the schema is somewhat complicated (e.g. uses xsd:choice) we can't infer that from the data alone
    • Send a complete set of export files, and we can figure out all tags on our own
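If only export files are available, the tag inventory can be recovered mechanically. The following is a minimal Python sketch (standard library only) that walks a set of export files and records, for every element path, the largest number of times that element occurs under a single parent; a value above 1 marks a repeating element. As noted above, it only sees what occurs in the samples, so optional elements missing from every file and constraints such as xsd:choice stay invisible. The function name and the sample fragment are illustrative, and the recordList/record nesting is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable


def infer_structure(xml_texts: Iterable[str]) -> Dict[str, int]:
    """Map each element path to the max number of same-tag siblings seen.

    A value > 1 means the element repeats under its parent at least once.
    Comments are ignored: the default parser does not keep them.
    """
    max_occurs: Dict[str, int] = {}
    for text in xml_texts:
        root = ET.fromstring(text)
        max_occurs.setdefault(root.tag, 1)
        stack = [(root, root.tag)]
        while stack:
            elem, path = stack.pop()
            # Count same-tag children of this particular parent
            for tag, n in Counter(child.tag for child in elem).items():
                child_path = f"{path}/{tag}"
                max_occurs[child_path] = max(max_occurs.get(child_path, 0), n)
            stack.extend((child, f"{path}/{child.tag}") for child in elem)
    return max_occurs


# Made-up fragment, for illustration only
sample = ("<adlibXML><recordList><record>"
          "<priref>2926</priref><collectie/><collectie/>"
          "</record></recordList></adlibXML>")
for path, n in sorted(infer_structure([sample]).items()):
    print(n, path)
```

Run over a complete export, this would give a first draft of the tag list (with single vs. repeating elements) for the mapping spreadsheet.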

Multi-linguality

  1. Can we assume that all text inside <object_number_RKDtechnical> is in EN,
    except explicitly tagged values such as <value lang="nl-NL" invariant="false">paneel (eikenhout)</value> (i.e. "panel (oak)")?
    • In the future, yes; right now, no.
      This is a bit of a difficult issue for us. We want to make our data available in at least English and Dutch. Some of our partners will enter data in German or French. The problem is that there are some issues with multilingualism inside the Adlib application and also within our own applications: only RKDtechnical is somewhat multilingual, while other RKD applications such as Artists, Persons and Images aren't.
  2. Can we assume that all text outside <object_number_RKDtechnical> is in NL?
    The tag names there are in Dutch.
    • In the future, yes; right now, no.
    • RDF allows marking a string with a language tag (e.g. "Badende Susanna"@nl), or leaving it untagged.
      Since there are no lang tags on free text fields in your XML, we have two options:
      • Assume EN and NL as outlined above, and as we see it in Susanna.
        That's what we'll do unless Dominic decides otherwise
        • Pro: will allow us to display only one of the fields, in the user's preferred language
        • Cons: the assumption could be wrong
      • OR not put any language
        • Cons: will have to display both language variants, even if they are exact translations. Or pick one at random, and risk the user not being able to read it
  3. Austin Nevin said that you've added EN translations to the NL part, is that true?
    Have you done it with explicitly tagged values, e.g. <value lang="nl-NL">... and <value lang="en-US"> ...
    • My colleague Wietske has added EN translations to the NL parts in the adlibxml example of record priref=2926 "De badende Suzanna". She sent these files to Dominic on the 17th of June. See forwarded email. But these files and the translations (between <!-- engelse vertaling -->EN translation) are just for some extra context.
    • Thanks, we already had these and the <!-- EN translations --> were very useful. My question was if there are additional EN variants of free text fields, but you've answered that above
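The default-language rule discussed in items 1 and 2 could be applied mechanically when converting to RDF. Below is a minimal Python sketch under these assumptions: text inside <object_number_RKDtechnical> is English, text elsewhere is Dutch, an explicit lang attribute on an element wins, the region suffix is dropped (en-US becomes en, which is still an open internal question), and the codes "0"/"1" seen in a later export mean en/nl. It emits Turtle-style literals such as "Badende Susanna"@nl; all names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

RKD_TECHNICAL = "object_number_RKDtechnical"
# Assumed interpretation of non-standard lang codes seen in one export
LANG_CODE_MAP = {"0": "en", "1": "nl"}


def turtle_literal(text: str, lang: Optional[str]) -> str:
    """Quote a string as a Turtle literal; lang=None gives an untagged one."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def tagged_literals(record: ET.Element, strip_region: bool = True
                    ) -> Iterator[Tuple[str, str]]:
    """Yield (element tag, language-tagged literal) for every text leaf."""
    def walk(elem: ET.Element, default_lang: str) -> Iterator[Tuple[str, str]]:
        if elem.tag == RKD_TECHNICAL:
            default_lang = "en"  # assumption: the RKDtechnical part is English
        lang = elem.get("lang") or default_lang  # explicit attribute wins
        lang = LANG_CODE_MAP.get(lang, lang)
        if strip_region:
            lang = lang.split("-")[0]  # "nl-NL" -> "nl"
        text = (elem.text or "").strip()
        if text and len(elem) == 0:
            yield elem.tag, turtle_literal(text, lang)
        for child in elem:
            yield from walk(child, default_lang)

    yield from walk(record, "nl")  # assumption: everything else is Dutch


# Made-up fragment, for illustration only
record = ET.fromstring(
    "<record><benaming_kunstwerk>Badende Susanna</benaming_kunstwerk>"
    "<object_number_RKDtechnical><support>"
    '<value lang="nl-NL" invariant="false">paneel (eikenhout)</value>'
    '<value lang="en-US" invariant="false">panel (oak)</value>'
    "</support></object_number_RKDtechnical></record>")
for tag, literal in tagged_literals(record):
    print(tag, literal)
# benaming_kunstwerk "Badende Susanna"@nl
# value "paneel (eikenhout)"@nl
# value "panel (oak)"@en
```

Passing None to turtle_literal corresponds to the second option above (no language at all), so switching strategies would only touch the walk function.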

Data Structure

  1. Is your extension inside <object_number_RKDtechnical>?
    Does everything outside this element come from Adlib's standard system?
  2. What should we do when the same piece of data is in both parts (inside and outside <object_number_RKDtechnical>)?
    If the data is always consistent, then we can combine (when multiple values are appropriate) or pick one (when a single value is needed). E.g.:
    • The reason that some information is repeated in the export, has to do with the fact that for The Rembrandt Database we combine data from two databases (RKDimages and RKDtechnical). RKDimages is a database for very elaborate art historical data and RKDtechnical is a database for metadata on technical documentation on works of art. To identify a specific work of art to which the technical documentation is related, RKDtechnical also contains brief art historical data, which is a duplication with some information in RKDimages. The two were developed more or less separately in the past, but we want to integrate them in the near future. The brief art historical data in RKDtechnical will be replaced by the much more elaborate data in RKDimages. In the export, you can easily tell which is which, because the RKDtechnical fields are all in English, whereas the RKDimages fields are still only in Dutch. So in the end the values in the Dutch fields will prevail. So if the data in the two fields is not exactly the same, it's best to pick the Dutch fields.
    • This answers my question re single-value fields:
      <hoogte> prevails over <object.size.height>.
    • (No need to say that we also strive to make RKDimages multilingual, but this has large implications for our institution and will take some more time...).
    • Regarding multi-value fields, I think the decision is also clear:
      we'll combine <benaming_kunstwerk> with <title> because one gives the @nl string, and the other the @en string.
  3. If it is useful for you, we could give you a list of all fields in RKDimages which are duplicated in RKDtechnical?
    • No need: we see them, but I'd be obliged if you verify the mapping (in the last column) after we complete it
  4. Regarding <value> with @lang attribute:
    • What is @invariant?
    • Guess we must map such values to a thesaurus?
    • (internal question: should we strip trailing "-US" from lang leaving only "en"?)
  5. Please XML-encode all text. The last XML file you sent (badendesuzanna-new.xml) includes bare "&" characters and similar, which makes it not well-formed and will give us trouble
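The decision in item 2 (the Dutch RKDimages field prevails for single-value fields; both language variants are kept for multi-value fields) can be expressed as a small merge step. This is a minimal Python sketch, assuming a record has already been flattened into a dict of tag to list of values; only the two field pairs named in the answers are listed, and the rest of the pairing table would come from the mapping spreadsheet. Function names and values are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# (Dutch RKDimages field, English RKDtechnical field); only the pairs named above
SINGLE_VALUE_PAIRS = [("hoogte", "object.size.height")]
MULTI_VALUE_PAIRS = [("benaming_kunstwerk", "title")]


def pick_single(fields: Dict[str, List[str]], dutch: str, english: str) -> Optional[str]:
    """The Dutch (RKDimages) value prevails; fall back to the English field."""
    values = fields.get(dutch) or fields.get(english) or []
    return values[0] if values else None


def combine_multi(fields: Dict[str, List[str]], dutch: str,
                  english: str) -> List[Tuple[str, str]]:
    """Keep both variants, each tagged with its language."""
    return ([(v, "nl") for v in fields.get(dutch, [])]
            + [(v, "en") for v in fields.get(english, [])])


def merge_duplicates(fields: Dict[str, List[str]]) -> dict:
    merged: dict = {}
    for dutch, english in SINGLE_VALUE_PAIRS:
        merged[dutch] = pick_single(fields, dutch, english)
    for dutch, english in MULTI_VALUE_PAIRS:
        merged[dutch] = combine_multi(fields, dutch, english)
    return merged


# Illustrative values only
record = {"hoogte": ["47"], "object.size.height": ["48"],
          "benaming_kunstwerk": ["Badende Susanna"], "title": ["Susanna Bathing"]}
print(merge_duplicates(record))
# {'hoogte': '47', 'benaming_kunstwerk': [('Badende Susanna', 'nl'), ('Susanna Bathing', 'en')]}
```

Keying the result on the Dutch tag follows the statement that the Dutch fields will prevail once the art historical part of RKDtechnical is replaced by RKDimages.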

Image URLs

  1. Regarding the painting: where is the filename (or URL) to the main painting image on the IIP server? Will we get the images?
  2. Regarding the xrays and other research images (eg <file.image>= mh0147_front_nl_2002_001.tif):
    are they available in IIP? Or will we get them separately?
  3. Please give me a complete URL to an image on IIP as an example.
    I can see in GlobalSettings.xml that you use two IIP servers (one in NL, another in UK).
    I would guess the URL is made out of these parts:

RKD Thesauri

  1. Which thesauri do you use?
    • We assume RKD locations, people, concepts, artworks, IconClass, but would like you to confirm explicitly
    • Are these the same well-known RKD thesauri shown in red on the attached picture? Can you tell from the numbers? E.g. People (=RKDartists) having 331,455
      (Attached picture not available: "eCulture data cloud.png")
  2. Can you urgently send us the thesauri used by the Rembrandt project?
    Preferably in SKOS if you have them in that format.
  3. Is it possible to make an export with the ID's (codes/URIs) of the controlled fields, in addition to the text label?
  4. If not: can you list for each controlled field:
    the thesaurus it came from, which branch (parent), whether it’s immediate child or descendant of any depth?
    We need this info so we can lookup and store the ID.
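The branch and depth information matters because the same label can occur in several places in a thesaurus. Given that information, the ID lookup might look like the minimal Python sketch below, assuming each concept is known by an ID, a label and the ID of its parent. The Concept type, the IDs and the labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Concept:
    id: str
    label: str
    parent: Optional[str] = None  # ID of the broader concept, None at the top


def find_concept_id(concepts: Dict[str, Concept], label: str, branch_id: str,
                    any_depth: bool = True) -> Optional[str]:
    """ID of the concept with this label under branch_id, or None.

    any_depth=False accepts only immediate children of the branch;
    any_depth=True accepts descendants at any depth.
    """
    for concept in concepts.values():
        if concept.label.casefold() != label.casefold():
            continue
        parent = concept.parent
        while parent is not None:
            if parent == branch_id:
                return concept.id
            if not any_depth:
                break
            broader = concepts.get(parent)
            parent = broader.parent if broader else None
    return None


# Hypothetical IDs and labels
concepts = {c.id: c for c in [
    Concept("t1", "drager"),
    Concept("t2", "paneel", parent="t1"),
    Concept("t3", "eikenhout", parent="t2"),
]}
print(find_concept_id(concepts, "eikenhout", "t1"))                   # t3
print(find_concept_id(concepts, "eikenhout", "t1", any_depth=False))  # None
```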

Thesauri description

  • RKDartists (an elaborate thesaurus of artist names and other persons in the 'art historical scene', with information on name variants, life dates, dates & places of activity, references to publications, etc.)

In RKDimages and RKDtechnical

  • geographical terms (cities, countries)
  • institution names (museums, laboratories, etc)
  • whereabouts type (e.g. museum, private collection, church, etc.)
  • object type (e.g. painting, sculpture, drawing, etc.)
  • shape (e.g. vertical rectangle, oval, etc.)
  • support (e.g. panel, canvas, paper, etc.)
  • technique (e.g. oil paint, pen and brown ink, pencil, etc.)
  • qualification attribution (e.g. after, possibly, studio of, etc.)
    (Please note that due to the duplication between parts of RKDtechnical and RKDimages described above, there is also duplication in some of the controlled vocabularies we use. For instance, both databases use controlled vocabularies of institution names, but these are not entirely identical. We intend to have everything integrated in Spring 2012.)

In RKDtechnical:

  • person names (researchers, conservators, etc. Will be integrated with RKDartists in due time)
  • research type (e.g. x-radiography, normal light studies, dendrochronology, etc.)
  • analytical techniques (=techniques applied on paint samples. Will be integrated with "research types" shortly)
  • equipment (=specific cameras, microscopes or other type of equipment used, with specifications)
  • computer hardware (=hardware that was used to create certain documentation, such as scanners)
  • software (=software that was used to create certain documentation)
  • research reason / objective (e.g. conservation, publication, exhibition, etc.)
  • object status (e.g. before treatment, during treatment, after treatment, etc.)
  • area captured (=part of the painting/paint sample captured in an image, e.g. back overall, front upper right corner, etc.)
  • magnification (for images taken with a light- or stereomicroscope)
  • document type (e.g. slide 35 mm, digital-born color photograph, research report, etc.)
  • documentation whereabouts (=location where the documentation is kept within an institution, e.g. conservation studio, library, archive, etc.)
  • documentation number type (=numbering system which is used for certain groups of documentation within an institution, e.g. inventory number, registration number, negative number, etc.)
  • reason for sampling (=reason why a paint sample was taken, e.g. conservation, attribution, etc.)
  • sample type (e.g. cross-section, dispersed sample, varnish sample, etc.)
  • location/area type (for paint samples, e.g. flesh, foliage, sky, etc.)
  • location/area color (for paint samples, e.g. red, brown, etc.)
  • paint defects (for paint samples, e.g. smalt discoloration, saponification of lead white, etc.)
  • paint layer function (for paint samples, e.g. ground, surface paint layer, varnish, etc.)
  • field (for images of paint samples, e.g. bright field, dark field, etc)
  • light (for images of paint samples, e.g. normal light, uv, etc)

Specific Field Questions

  1. <vervaardigd_plaats_land> country/place of making
    How about <toeschrijving>.<land_plaats_anoniem> attribution.city_or_country ?
  2. <iconclass> IconClass classification
    Is IconClass available as an RDF thesaurus? Can we get it?
  3. <plaats> depicted location
    Are you sure that's depicted in the painting? Geertruidenberg is in the province of North Brabant in the southern Netherlands
  4. <oorspronkelijke_lijst> original frame or not
    What is this "x"? I guess the field is mandatory and the user didn't know what to enter
  5. <toeschrijving> attribution
    Are these statements about the creator (artist)?
  6. <kwalificatie> attribution qualification
    What is this "of"?
  7. <land_plaats_anoniem> city or country if anonymous
    Is this used only if <naam>="Anoniem"?
  8. <positie_signatuur_ref> position of signature
    Is this some key? How to decode?
  9. <artistiek_verband> type of artistic relation; (excel translation) artistic context
    "Drawn by" does not describe the relation between the reproduction and the original; it simply means de Poorter has drawn the reproduction. What type of relation does this represent?
  10. <collectie_afdrukken> collection print
    What is this "x"? I guess the field is mandatory and the user didn't know what to enter?
  11. <inv.nr._bruikleengever> (MT) inventory number of loan giver
    Is this inv.no of the Rijksmuseum?
    • Wietske:
      'bruikleen' means 'loan' (a work of art lent by one institution to another on a temporary basis)
      'bruikleengever' means 'lender' (the institution lending the work of art to another institution, the borrower)
  12. <bruikleen_naam> (MT) loan name
    Is Rijksmuseum the owner, while Mauritshuis is the curator (care taker)?
    You say "temporary loan", but it seems from the data that it's been like this since 1816
  13. <object_record_number>
    Is this an auxiliary ID that is not used elsewhere?
  14. <title.other_older>
    Combine with <andere_benaming>?
    Notice they are different: one is "Batseba (heette ten onrechte)" i.e. "Bathsheba (was wrong name)", the other "Susanna and the elder"
  15. <whereabouts.name>
    Is this the <collectienaam> of the last effective <collectie>? (<collectie> is multiple while <whereabouts.name> looks single)
    If so, how exactly do we define "last effective"? Options:
    • last in XML order (that's what we'll do if there is no answer)
    • empty <einddatum_in_collectie>
    • largest or empty <einddatum_in_collectie>
  16. <whereabouts.city>
    Is this the <plaats_collectie_verblijfplaats> of the last effective <collectie>?
  17. <link_documentation_record.lref>
    Does this relate to any record ID, or some other key?
  18. <reference_image.front>
    What is this "x"? Only if front_back=FRONT?
  19. <reference_image.back>
    What is this "x"? Only if front_back=BACK?
  20. <file.application> file.application
    Dominic: HTML-escaped fragment including a Flash and a link with URL and many other attributes. Can we trust this HTML and embed it directly into the GUI? Is Flash allowed?
  21. <sample.location.vert>, <sample.location.hor>
    Are these in cm? Are they used only if <object.shape> is Rectangle?
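For item 15 (<whereabouts.name>), the three candidate definitions of the "last effective" <collectie> can give different answers for the same record. The minimal Python sketch below implements all three so they can be compared on real data. It assumes each <collectie> has been reduced to its <collectienaam> and <einddatum_in_collectie> (empty when absent), and that dates are ISO-8601 strings (YYYY-MM-DD) so they compare correctly as text; the real date format is not confirmed. Function names and sample values are hypothetical.

```python
from typing import List, Optional, TypedDict


class Collectie(TypedDict):
    collectienaam: str
    einddatum_in_collectie: str  # "" when empty; assumed ISO-8601


def last_in_xml_order(items: List[Collectie]) -> Optional[Collectie]:
    """Option 1: the last <collectie> in document order."""
    return items[-1] if items else None


def last_open_ended(items: List[Collectie]) -> Optional[Collectie]:
    """Option 2: the (last) <collectie> with an empty end date."""
    open_items = [c for c in items if not c["einddatum_in_collectie"]]
    return open_items[-1] if open_items else None


def latest_or_open(items: List[Collectie]) -> Optional[Collectie]:
    """Option 3: an empty end date counts as later than any real date."""
    if not items:
        return None
    return max(items, key=lambda c: (c["einddatum_in_collectie"] == "",
                                     c["einddatum_in_collectie"]))


history: List[Collectie] = [
    {"collectienaam": "A", "einddatum_in_collectie": "1816-01-01"},
    {"collectienaam": "B", "einddatum_in_collectie": ""},
    {"collectienaam": "C", "einddatum_in_collectie": "1900-01-01"},
]
for option in (last_in_xml_order, last_open_ended, latest_or_open):
    print(option.__name__, option(history)["collectienaam"])
```

With this made-up ordering, option 1 picks C while options 2 and 3 pick B, which is why the definition needs to be agreed before mapping.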

Sample Record Analysis

  • We are working with a sample record about Susanna Bathing, an important painting created by Rembrandt in 1636 that is in the Mauritshuis, The Hague. Susanna, who is just about to bathe, is accosted by two elders (barely visible at the back right) who are hiding in the shrubbery and try to force Susanna to give herself to them
  • In 1647 Rembrandt created Susanna and the Elders, which is in the Gemäldegalerie der Staatlichen Museen, Berlin. Here the two old men step forward, and their indecent intentions are obvious. This is an apocryphal biblical story (text added to the Book of Daniel) that was painted by at least 7 other classic masters (Van Dyck, Tintoretto, etc)
  • In 1654 Rembrandt created Bathing Bathsheba (Batsheba, Batseba), which is in the Musée du Louvre, Paris. It is based on a similar biblical story about King David raping another bathing woman. These paintings are sometimes confused, and looking at the similarities between them it is easy to imagine why. E.g. our record about Susanna Bathing says:
    • <andere_benaming> other/former title: "Bathsheba (was wrong name)"
    • a collection remark: "the Hague 18-05-1768 (Lugt 1683), nr. 12: '(FR) A Bathsheba, with a bath, raped by David'...

Without going too deep into it, you get the idea that art attribution and research is a complicated and sometimes confusing affair

Record Versions

  • badendesuzanna-new.xml (received from Jan Teuben@RKD, 10/4/2011; original name was "badendesuzanna_adlib.xml")
    Mariana: This file does not open; it gives "XML Parsing Error: not well-formed"
  • badendesuzanna-old.xml (received from Dominic; original name was "rdb-samplerecord_English translations.xml")

What I did:

  • New: split lines per tag (it was a single huge line) and indented properly
  • Old: untabified (tab -> 2 spaces)
    • Split some elements (all with value "RKD", etc) to 3 lines, so they compare better to New
    • Split one line between tags
  • compared with Araxis Merge
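The normalization and comparison steps above could also be scripted. This is a minimal Python sketch (Python 3.9+ for ET.indent) that uses the standard difflib module in place of Araxis Merge: both versions are re-indented the same way and then diffed line by line. Note the assumptions: the default parser drops comments, so the English translation comments would not show up as differences, and a file that is not well-formed (such as one with a bare "&") has to be repaired before it parses. The function names and the sample records are illustrative.

```python
import difflib
import xml.etree.ElementTree as ET
from typing import List


def normalize(xml_text: str) -> List[str]:
    """Parse, re-indent with 2 spaces, and split into one line per tag."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")  # replaces whitespace-only text and tails
    return ET.tostring(root, encoding="unicode").splitlines()


def diff_versions(old_text: str, new_text: str) -> List[str]:
    return list(difflib.unified_diff(normalize(old_text), normalize(new_text),
                                     fromfile="old", tofile="new", lineterm=""))


# Made-up records: tab-indented "old" vs single-line "new" with one changed value
old = ("<record>\n\t<priref>2926</priref>\n"
       "\t<file.spec.front_back>FRONT</file.spec.front_back>\n</record>")
new = ("<record><priref>2926</priref>"
       "<file.spec.front_back>front</file.spec.front_back></record>")
print("\n".join(diff_versions(old, new)))
```

Because both inputs go through the same indentation, layout differences (tabs, one huge line) disappear and only content changes remain in the diff.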

Differences:

  • New: no English translations of the Dutch tags. But these are in our excel anyway
  • New: content is not properly escaped, eg
    • "list & lijden": should be &amp;
    • The complex HTML excerpt in <file.application>. It doesn't have a single root: has a sequence of elements (first is <a>, last is <object>)
  • New: includes more <value lang> variants
    • Example1 (old had just "FRONT"). "0/1" are not proper languages, interpret as 0->en, 1->nl
    • Example2 (old had just "RKD")
    • happens for the following elements
      <object.size.unit>
      <doc.size.unit>
      <file.image.location>
      <file.application.location>
      <file.spec.object_status>
      <file.spec.overall_detail>
      <file.spec.front_back>
      <sample.location.vert.start>
      <sample.location.hor.start>
    • All of these are thesaurus values to be mapped to URI, so it doesn't change the mapping
    • Now we have not just the code but also the titles
  • Old: <plaats_tentoonstelling> was a number (eg 490: a bug we noticed earlier). New: now is a proper name, eg Berlijn (but is that from a thesaurus?)
  • Old: <instelling_tentoonstelling> was empty; New: it is present (e.g. Gemäldegalerie)
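Until the export itself is XML-encoded, the bare "&" problem listed above could be worked around with a pre-processing step. This is a minimal Python sketch, assuming every "&" that does not already start an entity or character reference is meant as literal text. It does not help with HTML entities that XML does not define (such as &nbsp;), nor with the HTML fragment in <file.application>, which still needs proper escaping at the source. The names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# "&" not already followed by a name or numeric reference and ";"
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Turn stray "&" into "&amp;", leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


broken = "<record><t>list & lijden</t><u>a &amp; b</u></record>"
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("original does not parse:", err)
fixed = escape_bare_ampersands(broken)
print(ET.fromstring(fixed).find("t").text)  # list & lijden
```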

Full Sample Record

I've simplified the XML sample record to a table for easier comprehension, by using (all hail!) Emacs and these commands:

  • M-% query-replace: replace </.*?> with nothing (remove closing tags)
  • C-u C-x C-o my-delete-blank-lines: remove empty lines
  • M-% query-replace: replace >< with > < (add a space here: <empty-tag/><!-- English comment>)
  • M-= query-replace-regexp: replace \^(TAB*) with \,(format "%dTAB%s" (length \1) (make-string (length \1) ?-)) (replace leading tabs with N, the tag level, followed by N dashes, to conserve space while still showing the hierarchy)
  • M-= query-replace-regexp: replace >(.*)(<\!-\- (.*) \-->) with > \3TAB\1 (move the English comment right after the tag and put the element content in a new column)
  • M-= query-replace-regexp: replace >([^ |]) with >TAB\1 (put the remaining element contents in a new column)

rdb-sample.xls
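Outside Emacs, roughly the same flattening (one row per element with its nesting level, dashes showing the hierarchy, the tag, the English comment that belongs to it, and its text content) could be produced with the Python standard library. This is a minimal sketch, assuming each English comment directly follows the element it describes, or comes first inside a container element; the output is tab-separated so it can be pasted into Excel. The function name and the fragment are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List


def xml_to_rows(xml_text: str) -> List[List[str]]:
    """Rows of [level, dashes, tag, English comment, text content]."""
    # Keep comments in the tree (the default parser drops them)
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    rows: List[List[str]] = []

    def walk(elem: ET.Element, level: int) -> List[str]:
        row = [str(level), "-" * level, elem.tag, "", (elem.text or "").strip()]
        rows.append(row)
        previous = row  # a comment describes the element just before it
        for child in elem:
            if child.tag is ET.Comment:
                previous[3] = (child.text or "").strip()
            else:
                previous = walk(child, level + 1)
        return row

    walk(root, 0)
    return rows


# Made-up fragment, for illustration only
sample = ("<record><priref>2926</priref><!-- record number -->"
          "<titel><tekst>Badende Susanna</tekst><!-- text --></titel></record>")
for row in xml_to_rows(sample):
    print("\t".join(row))
```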

Reduced Sample Record

I've reduced the sample by leaving only 1 instance of each element. In many cases I merged elements so the remaining one has all possible sub-elements (which may lead to nonsensical data, e.g. begindatum_in_collectie > einddatum_in_collectie, i.e. a start date after the end date).
This can be used as the frame on which to discuss and later describe the CRM mapping
rdb-sample-reduced.xls
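The reduction described above (one instance per element, with repeated instances merged so the survivor carries every sub-element seen) can be sketched in a few lines of Python. This is a minimal sketch under these assumptions: the first non-blank text wins, the attributes of the first instance win, and, as with the manual reduction, a merged element may combine values that never occurred together. Function names and the sample record are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def merge(a: ET.Element, b: ET.Element) -> ET.Element:
    """Combine two same-tag elements: a's attributes and text win,
    and the children of both are kept (deduplicated later)."""
    merged = ET.Element(a.tag, {**b.attrib, **a.attrib})
    merged.text = a.text if (a.text or "").strip() else b.text
    merged.extend(list(a) + list(b))
    return merged


def reduce_element(elem: ET.Element) -> ET.Element:
    """Copy of elem in which each child tag occurs only once, recursively."""
    reduced = ET.Element(elem.tag, dict(elem.attrib))
    reduced.text = elem.text
    by_tag: Dict[str, ET.Element] = {}  # insertion order = first occurrence
    for child in elem:
        by_tag[child.tag] = merge(by_tag[child.tag], child) if child.tag in by_tag else child
    for child in by_tag.values():
        reduced.append(reduce_element(child))
    return reduced


# Made-up record with two <collectie> instances
src = ("<record><collectie><collectienaam>A</collectienaam></collectie>"
       "<collectie><einddatum_in_collectie>1816</einddatum_in_collectie></collectie>"
       "<priref>2926</priref></record>")
print(ET.tostring(reduce_element(ET.fromstring(src)), encoding="unicode"))
```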
