
{excerpt}from AdLib collection management software, with additions by RKD{excerpt}
h1. Introduction
Important attachments and links:
- [Rembrandt]: description of this project
- [^]: XMLs of all 11 Rembrandt paintings
- [^]: JPG images of all paintings (323 images, of which 45% are missing); see [Documentation, Files, Images#Number of Images]
- [\\ontonas\all-onto\Projects\ResearchSpace\data\Rembrandt-data\images\]: 18 TIFs (deep-zoom pyramid images) for Susanna, 1.2 GB
- [Rembrandt XML]: issues and problems with the XML, description of new XML (not for RS3.1)
- [Rembrandt thesauri]: thesauri used by the data

- We received XML exports for 11 paintings
- Tags are in Dutch but there are English comments; Excel can translate them automatically
- <object_number_RKDtechnical> includes English versions of many of the Dutch fields, and most of the values it adds are bilingual

h2. Data Sources
- website: []
- painting list (there are 12): []
- one painting: []
- click "Debug tools: Adlib XML" at the very top to get the XML
- it uses the AdLib API to get the data from a URL like this:
"Grouped" is important. Examples:
-- [de-badende-suzanna|]
-- [portret-van-herman-doomer|]
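If we query the API directly instead of using the debug link, building the query URL could look like the sketch below. The endpoint, the database name and the parameter names (database, search, xmltype=grouped) are assumptions, not taken from the RKD site; check them against the Adlib API documentation before relying on them.

```python
from urllib.parse import urlencode


def adlib_query_url(base_url: str, database: str, search: str,
                    xmltype: str = "grouped") -> str:
    """Build a query URL for an Adlib API endpoint (hypothetical helper).

    base_url and database are placeholders: the real endpoint is not recorded
    on this page. xmltype="grouped" is assumed to select the "grouped" XML
    format mentioned above.
    """
    params = {"database": database, "search": search, "xmltype": xmltype}
    return f"{base_url}?{urlencode(params)}"
```

For example, with a placeholder endpoint and database, adlib_query_url("https://adlib.example.org/wwwopac.ashx", "rkdtechnical", "priref=2926") percent-encodes the search expression as priref%3D2926.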

h2. Database Schemas

We don't have an XSD schema.
[^2009_12_02 Datastructuur RKDtechnical.pdf]: closely matches the XML data

[^2011_06_01 Datastructuur RKDtechnical voorstel.pdf]: New database schema (2011_06_01). RKD are restructuring the database. This schema doesn't match the XML data, and doesn't look entirely precise yet. It's a risk for the data migration, but we can't wait for the new schema, so we'll proceed with the old one.

h1. Adlib Correspondence

20110917: Adlib is the vendor of the base collection management system
- Bert Degenhart Drenth []: CEO
- Ilya Gorbylev []: dev who currently supports Rembrandt

# We're starting work on the project for the British Museum. One of the tasks is to convert the Rembrandt project database to CIDOC CRM and import it into ResearchSpace. Dominic Oldman (IS Development Manager of BM) sent us some sample records from Rembrandt (attached), and I noticed they are exported from your system: the root is <adlibXML>.
# Are you involved in the Rembrandt project, or do they only use your system?
#- We are involved in the Rembrandt project. We made the UI for the web application, and RKD used our software to create the RKD technical database. The RKD technical database was created by the RKD themselves and has been used as the basis for the Rembrandt database. It is currently hosted at the RKD. On our side, the programmer who worked on this has since left; Ilya has taken over his responsibilities in this project.
# Could you send us the schema of the Rembrandt XML export?
#- The RKD can send you the data dictionary of the database; from it you can also see whether all the tags have translations. You can probably also access the data through the Adlib API online, but the RKD must open their system to give you access. Using the API you can query the live system and get data out of it (rather than exporting). Technical information on the API can be found on
# Would it be possible to export using English tags? (We don't speak Dutch.) There are English XML comments in the file (very useful), but the conversion will be more robust if the tags are in English.
#- Sure, but that’s up to the RKD to decide.
# Does AdLib have a conversion/export to some standard format (e.g. LIDO) that could be used for Rembrandt? It would be of higher value to BM if we develop an import from a standard format.
#- We do support LIDO for object information and have a standard output format for our Adlib Museum System. But as I have already explained, the Rembrandt database was created by the RKD itself, so it does not map to LIDO using our standard XSLT transformation. Again, changing this is up to the RKD.

h1. RKD Correspondence

RKD is the technical partner of the Rembrandt project
- *[]*: your e-mail will be read by several colleagues at RKD working on the project and whoever reads it first, can answer the question.
- Wietske Donkersloot []: Mellon Fellow, Project Manager. T: \+31 70-333 9719. On maternity leave from 4-Nov-2011 to 28-Feb-2012
- Sytske Weidema []: replaces Wietske as project manager
- Willem ter Velde: will send the data exports
- Jan Teuben []: application and database manager at RKD, IT manager of Rembrandt Database project (but his time is the most limited)
- Bert Warmelink []
- Reinier van 't Zelfde []

9/26/2011 Jan Teuben: Last week Adlib forwarded me their email reply to your questions. I'm curious whether I can help you if you still have questions about the Rembrandt Database project. Let me know if I can help.

Unanswered questions below are marked as (?)
(?) *The most urgent need at present is to get the thesauri, else we need to make up values*
(?) *Next priority is to get the complete data set*

h2. Data Stability and Schema

# Has the export changed (or will it change) significantly from the attached?
#- Right now the export is the same, but it will change multiple times in the future. Let me explain in more detail: the project website is still in beta and is hosted at Adlib on an old copy of our Adlib application (RKDtechnical). We want to move the website and connect it to the newest version of RKDtechnical. We also want to release version 1.0 later this year, and more versions will follow. Together with the website, the Adlib application we build here at RKD is continuously changing, although major changes aren't desirable, because external partners also work with this application.
Once the website is moved, much of the export will already change, since I have made many changes over the past six months.
#- (!) Vladimir: We've started intensive work on mapping this to CRM, so Dominic and I need to figure out some strategy regarding future changes
#- Wietske 14 Oct: we have not yet responded to Vladimir's e-mail of October 6, as we are afraid that some of the answers will take a lot of time not only from us, but in the end also from you. This is because our database and website are still undergoing changes, and the information we give you today will differ from what we can give you in a few months' time. We discussed this with Dominic and sent him an overview of our current status and planning yesterday. We will wait to see what comes out of this.
#- Vladimir: I understand the risk, but unfortunately we cannot wait, since it's very important for us to ground our technical work on specific data. The CRM is too abstract to use as a data model for RS without an "application profile" over specific data. So don't worry about the upcoming changes; it's up to Dominic and me to figure out a way to accommodate them in RS.
That's also the reason why I'm in a hurry to get clarifications on the various questions.
# (?) Does the older schema (2009_12_02) represent the current XML data, and the newer schema (2011_06_01) your plans for changes?
# Can you send me a current export of the same record priref=2926 "De badende Suzanna" ?
#- Please find attached (badendesuzanna_adlibxml.xml)
#- Thanks\! I compared the new file you sent against the old one, and the differences are minor (see section [Record Versions]).
Also thanks for the other sample 32162 "Portret van Herman Doomer"\!
# Do you have a schema of the Rembrandt XML export and can you send it?
#- Please find attached. We use multiple schemas; I'm sending you the schemas for page settings and global settings. I assumed the schemas for filtering and search aren't necessary.
GlobalSettings.xml GlobalSettings.xsd PageSettings.xml PageSettings.xsd
#- Thanks for these (GlobalSettings.xml is especially useful, since I can see which IIP servers are used, etc.).
But I meant an XSD for the data (object record) export, defining all elements and their nesting.
# If not: we can figure out all tags by looking at a complete set of export files (which we don’t yet have), but if the schema is a bit complicated (e.g. xsd:choice) we can't figure it out
#- (?) send a complete set of export files, and we can figure out all tags on our own

h2. Multi-linguality

# Can we assume that all text inside <object_number_RKDtechnical> is in EN,
except explicitly tagged values such as <value lang="nl-NL" invariant="false">paneel (eikenhout)</value>?
#- In the future, yes; right now, no.
This is a bit of a difficult issue for us. We want to make our data available in at least English and Dutch. Some of our partners will enter data in German or French. The problem is that there are some issues with multilingualism inside the Adlib application and also within our own applications: only RKDtechnical is to some extent multilingual, while other RKD applications like Artists, Persons and Images aren't.
# Can we assume that all text outside <object_number_RKDtechnical> is in NL?
The tag names there are in Dutch.
#- In the future, yes; right now, no.
#- RDF allows marking a string with a language tag (e.g. "Badende Susanna"@nl) or leaving it unmarked.
Since there are no lang tags on free text fields in your XML, we have two options:
#-- Assume EN and NL as outlined above, and as we see it in Susanna.
(!) That's what we'll do unless Dominic decides otherwise
#--- Pro: will allow us to display only one of the fields, in the user's preferred language
#--- Cons: the assumption could be wrong
#-- OR not put any language
#--- Cons: will have to display both language variants, even if they are exact translations. Or pick one at random, and risk the user not being able to read it
# Austin Nevin said that you've added EN translations to the NL part. Is that true?
Have you done it with explicitly tagged values, e.g. <value lang="nl-NL">... and <value lang="en-US">...?
#- My colleague Wietske has added EN translations to the NL parts in the adlibxml example of record priref=2926 "De badende Suzanna". She sent these files to Dominic on the 17th of June; see the forwarded email. But these files and the translations (the EN text after each <\!-\- engelse vertaling --> comment, Dutch for "English translation") are just for extra context.
#- Thanks, we already had these and the <\!-\- EN translations --> were very useful. My question was if there are additional EN variants of free text fields, but you've answered that above
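A minimal sketch of the language-assignment rule we plan to use (EN inside <object_number_RKDtechnical>, NL outside, explicit lang attributes win). Treating lang="neutral" as "no language tag", ignoring @invariant, and reducing "en-US" to "en" are assumptions, since those questions are still open. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RKD_TECHNICAL = "object_number_RKDtechnical"


def normalize_lang(code: str) -> str:
    """Reduce e.g. "en-US" to "en" (assumption: still an open internal question)."""
    return code.split("-")[0].lower()


def tagged_literals(elem, inside_rkd=False):
    """Yield (tag, text, lang) for every element with non-empty text.

    Assumed rules: an explicit lang attribute wins ("neutral" means no tag);
    otherwise text inside <object_number_RKDtechnical> is "en", outside is "nl".
    """
    inside_rkd = inside_rkd or elem.tag == RKD_TECHNICAL
    text = (elem.text or "").strip()
    if text:
        lang_attr = elem.get("lang")
        if lang_attr is not None:
            lang = None if lang_attr == "neutral" else normalize_lang(lang_attr)
        else:
            lang = "en" if inside_rkd else "nl"
        yield elem.tag, text, lang
    for child in elem:
        yield from tagged_literals(child, inside_rkd)


def turtle_literal(text, lang):
    """Render a Turtle string literal, language-tagged when lang is given."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'
```

For example, turtle_literal("De badende Suzanna", "nl") returns "De badende Suzanna"@nl, so the GUI can show only the variant in the user's preferred language.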

h2. Data Structure

# (?) Is your extension inside <object_number_RKDtechnical>?
Does everything outside this element come from Adlib's standard system?
# What to do when the same piece of data is in both parts (inside and outside <object_number_RKDtechnical>) ?
If the data is always consistent, then we can combine (when multiple values are appropriate) or pick one (when a single value is needed). E.g.:
<title> combine with <benaming_kunstwerk>
<title.other_older> combine with <andere_benaming>
<object.size.height> use this or <hoogte> (makes no difference)
<object.size.width> use this or <breedte> (makes no difference)
<object.size.unit> use this or <eenheid> (makes no difference)
#- The reason that some information is repeated in the export, has to do with the fact that for The Rembrandt Database we combine data from two databases (RKDimages and RKDtechnical). RKDimages is a database for very elaborate art historical data and RKDtechnical is a database for metadata on technical documentation on works of art. To identify a specific work of art to which the technical documentation is related, RKDtechnical also contains brief art historical data, which is a duplication with some information in RKDimages. The two were developed more or less separately in the past, but we want to integrate them in the near future. The brief art historical data in RKDtechnical will be replaced by the much more elaborate data in RKDimages. In the export, you can easily tell which is which, because the RKDtechnical fields are all in English, whereas the RKDimages fields are still only in Dutch. So in the end the values in the Dutch fields will prevail. So if the data in the two fields is not exactly the same, it's best to *pick the Dutch fields*.
#- This answers my question re single-value fields:
<hoogte> prevails over <object.size.height>.
#- (Needless to say, we also strive to make RKDimages multilingual, but this has large implications for our institution and will take more time.)
#- Regarding multi-value fields, I think the decision is also clear:
we'll combine <benaming_kunstwerk> with <title> because one gives the @nl string, and the other the @en string.
# If it is useful for you, we could give you a list of all fields in RKDimages which are duplicated in RKDtechnical?
#- No need: we see them, but I'd be obliged if you would verify the mapping (in the last column) once we complete it
# (?) Regarding <value> with @lang attribute:
#- What is @invariant?
#- Guess we must map such values to a thesaurus?
#- (internal question: should we strip trailing "-US" from lang leaving only "en"?)
# (?) *Please XML-encode all text*. The last XML you sent ([^badendesuzanna-new.xml]) includes bare "&" and similar, and will give us trouble
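A sketch of the reconciliation rules agreed above: for single-value fields the Dutch (RKDimages) value prevails; for multi-value fields both are kept, tagged @nl and @en. The field table is copied from question 2 of this section; treating <title.other_older> as multi-value is an assumption, since that pairing is still open under Specific Field Questions. The second helper is an assumed stopgap (not agreed with RKD) that escapes bare "&" until the export is properly XML-encoded. The record dict and function names are hypothetical.

```python
import re

# RKDtechnical (English) tag -> duplicated RKDimages (Dutch) tag
DUPLICATED_FIELDS = {
    "title": "benaming_kunstwerk",
    "title.other_older": "andere_benaming",
    "object.size.height": "hoogte",
    "object.size.width": "breedte",
    "object.size.unit": "eenheid",
}
# Fields where both values are kept (assumed for title.other_older)
MULTI_VALUE = {"title", "title.other_older"}


def reconcile(record: dict, english_tag: str) -> list:
    """Return [(value, lang)] for one duplicated field pair of a record.

    Multi-value fields keep both values (Dutch as "nl", English as "en");
    single-value fields keep the Dutch value and fall back to the English one.
    """
    en_value = record.get(english_tag)
    nl_value = record.get(DUPLICATED_FIELDS[english_tag])
    if english_tag in MULTI_VALUE:
        return [(v, lang) for v, lang in ((nl_value, "nl"), (en_value, "en")) if v]
    value = nl_value or en_value
    return [(value, None)] if value else []


# "&" that does not already start a named, decimal or hex character reference
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Stopgap: escape bare "&" so the export can be parsed as XML."""
    return _BARE_AMP.sub("&amp;", xml_text)
```

Keeping the duplicate table as data means the RKD can check it directly when they verify the mapping.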

h2. Image URLs

# (/) Where is the filename (or URL) to the main painting image on the IIP server?
#- Rembrandt uses two IIP servers: "RKD" in NL and "NGL" at the National Gallery, London:
{code}<imageServer locationId="RKD"
filesPath="D:/" />{code}
#- Which server to use is specified with <file.image.location>:
{code}<value lang="neutral">RKD</value>{code}
But "RKD" is marked isDefault, so it is used even when <file.image.location> is not specified.
#- The image name is specified in <file.image>. The AFTER_TREATMENT OVERALL FRONT image is the default (main) image used in thumbnails:
{code}<value lang="neutral">AFTER_TREATMENT</value>
<value lang="neutral">OVERALL</value>
<value lang="neutral">FRONT</value>{code}
# (/) Example of complete URL to an image on IIP:
#- It is formed as a concatenation of:
{noformat}imageServer/@url "?FIF=" imageServer/@filesPath file.image "&HEI=" Height "&CVT=" Format{noformat}
where Height and Format are the desired thumbnail size and file format (note that IIP converts TIF to JPEG)
# (/) Regarding the X-rays and other research images (e.g. <file.image>= mh0147_front_nl_2002_001.tif): are they available in IIP?
#- They are available in exactly the same way.
# Will we get the deep-zoom images?
#- Wietske: I think the first priority is the thesauri and the other data, but we could also provide you with sample images for the database records we have sent you.
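A sketch of the IIP URL recipe from this section, with server selection by <file.image.location> and the isDefault fallback. The server URLs are placeholders (the real imageServer/@url values are in GlobalSettings.xml), and the stage/view/side keys used to find the AFTER_TREATMENT OVERALL FRONT image are hypothetical names for whichever fields carry those values.

```python
from dataclasses import dataclass


@dataclass
class ImageServer:
    location_id: str   # imageServer/@locationId, e.g. "RKD" or "NGL"
    url: str           # imageServer/@url
    files_path: str    # imageServer/@filesPath, e.g. "D:/"
    is_default: bool = False


def pick_server(servers, location=None):
    """Server named by <file.image.location>, or the isDefault one if absent."""
    if location is None:
        matches = [s for s in servers if s.is_default]
    else:
        matches = [s for s in servers if s.location_id == location]
    if not matches:
        raise LookupError(f"no image server for location {location!r}")
    return matches[0]


def iip_url(server, file_image, height, fmt="jpeg"):
    # imageServer/@url "?FIF=" imageServer/@filesPath file.image "&HEI=" Height "&CVT=" Format
    return f"{server.url}?FIF={server.files_path}{file_image}&HEI={height}&CVT={fmt}"


def main_image(images):
    """File of the AFTER_TREATMENT / OVERALL / FRONT image (hypothetical keys)."""
    for image in images:
        key = (image.get("stage"), image.get("view"), image.get("side"))
        if key == ("AFTER_TREATMENT", "OVERALL", "FRONT"):
            return image["file"]
    return None
```

With a default RKD server at the placeholder URL https://iip.example.org/iipsrv.fcgi and filesPath D:/, iip_url(server, "mh0147_front_nl_2002_001.tif", 200) gives https://iip.example.org/iipsrv.fcgi?FIF=D:/mh0147_front_nl_2002_001.tif&HEI=200&CVT=jpeg. An unknown location raises instead of silently falling back, so data errors surface early.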

h2. Specific Field Questions

# (?) <vervaardigd_plaats_land> country/place of making
How about <toeschrijving>.<land_plaats_anoniem> attribution.city_or_country ?
# (?) <iconclass> IconClass classification
Is IconClass available as an RDF thesaurus? Can we get it?
# (?) <plaats> depicted location
Are you sure that's what is depicted in the painting? Geertruidenberg is in the province of North Brabant in the southern Netherlands
# (?) <oorspronkelijke_lijst> original frame or not
What is this "x"? I guess the field is mandatory and the user didn't know what to enter
# (?) <toeschrijving> attribution
Are these statements about the creator (artist)?
# (?) <kwalificatie> attribution qualification
What is this "of"?
# (?) <land_plaats_anoniem> city or country if anonymous
Is this used only if <naam>="Anoniem"?
# (?) <positie_signatuur_ref> position of signature
Is this some key? How to decode?
# (?) <artistiek_verband> type of artistic relation; (excel translation) artistic context
"Drawn by" does not describe the relation between the reproduction and the original; it simply means de Poorter has drawn the reproduction. What type of relation does this represent?
# (?) <collectie_afdrukken> collection print
What is this "x"? I guess the field is mandatory and the user didn't know what to enter?
# (?) <> (MT) inventory number of loan giver
Is this of the Rijksmuseum?
#- Wietske:
'bruikleen' means 'loan' (a work of art lent by one institution to another on a temporary basis)
'bruikleengever' means 'lender' (the institution lending the work of art to another institution, the borrower)
# (?) <bruikleen_naam> (MT) loan name
Is the Rijksmuseum the owner, while the Mauritshuis is the curator (caretaker)?
You say "temporary loan", but it seems from the data that it's been like this since 1816
# (?) <object_record_number>
Is this an auxiliary ID that is not used elsewhere?
# (?) <title.other_older>
Combine with <andere_benaming>?
Notice they are different: one is "Batseba (heette ten onrechte)" i.e. "Bathsheba (was wrong name)", the other "Susanna and the elder"
# (?) <>
Is this the <collectienaam> of the last effective <collectie>? (<collectie> is multiple, while <> looks single)
If so, how exactly do we define "last effective"? Options:
#- last in XML order. (!) that's what we'll do if no answer
#- empty <einddatum_in_collectie>
#- largest or empty <einddatum_in_collectie>
# (?) <>
Is this <plaats_collectie\_verblijfplaats> of the last effective <collectie>?
# (?) <link_documentation_record.lref>
Does this relate to any record ID, or some other key?
# (?) <reference_image.front>
What is this "x"? Only if front_back=FRONT?
# (?) <reference_image.back>
What is this "x"? Only if front_back=BACK?
# (?) <file.application> file.application
Dominic: this is an HTML-escaped fragment including a Flash object and a link with a URL and many other attributes. Can we trust this HTML and embed it directly into the GUI? Is Flash allowed?
# (?) <sample.location.vert>, <sample.location.hor>
Are these in cm? Are they used only if <object.shape> is Rectangle?
# (?) What is meant by <begindatum_lijst> and <einddatum_lijst>? The period of creation?
# (?) What is meant by <bronnen> (sources)? Bibliographical references?
## (?) What kinds of references can there be? <standaardbron> (reference publication) is apparently just one option.
Furthermore, the value of this item gives the name of the author and his life span. What does that mean?
Is all the data like this?
# (?) What is meant by collection (<collectie>)? In the example, what do the begin and end dates in the collection (begindatum, einddatum) mean?
# (?) What is meant by <plaats> depicted location?
# (?) Is there actually an owner of the painting? For instance, is the current collection of the painting considered its owner, or are all collections a matter of custody and loans?
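The three candidate definitions of the "last effective" <collectie> can be sketched as follows. The Collectie record type is hypothetical, and comparing <einddatum_in_collectie> as strings assumes ISO-style dates (YYYY or YYYY-MM-DD); dates such as 18-05-1768 would need parsing first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Collectie:
    collectienaam: str
    begindatum: Optional[str] = None   # <begindatum_in_collectie>
    einddatum: Optional[str] = None    # <einddatum_in_collectie>


def last_in_xml_order(collecties: List[Collectie]) -> Optional[Collectie]:
    """Option 1 (our default if there is no answer): last <collectie> in the XML."""
    return collecties[-1] if collecties else None


def last_open_ended(collecties: List[Collectie]) -> Optional[Collectie]:
    """Option 2: the last <collectie> with an empty end date."""
    open_ended = [c for c in collecties if not c.einddatum]
    return open_ended[-1] if open_ended else None


def largest_or_open_ended(collecties: List[Collectie]) -> Optional[Collectie]:
    """Option 3: an open-ended <collectie> if any, else the largest end date.

    Assumes ISO-style date strings, so string order equals chronological order.
    """
    return last_open_ended(collecties) or max(
        collecties, key=lambda c: c.einddatum, default=None)
```

The options only disagree when the XML order is not chronological or no collection is open-ended, which is what the question to RKD needs to settle.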

h1. Sample Record Analysis

- We are working with a sample record about [Susanna Bathing|], an important painting created by Rembrandt in 1636 [that is at the Mauritshuis, The Hague|]. Susanna, who is just about to bathe, is accosted by two elders (barely visible at the back right) who are hiding in the shrubbery and try to force Susanna to give herself to them. The owner of the painting is the Rijksmuseum, Amsterdam. They currently have the following related [painting|] on display.
- In 1647 Rembrandt created [Susanna and the Elders|], which is in the Gemäldegalerie der Staatlichen Museen, Berlin. Here the two old men step forward, and their indecent intentions are obvious. This is an apocryphal biblical story (text added to the Book of Daniel) that was painted by at least 7 other classic masters (Van Dyck, Tintoretto, etc.).
- In 1654 Rembrandt created [Bathing Bathsheba|] (Batsheba, Batseba), which is at the Musée du Louvre, Paris. It is based on a similar biblical story about King David raping another bathing woman. These paintings are sometimes confused, and looking at the similarities between them it is easy to imagine why. E.g., our record about Susanna Bathing says:
-- <andere_benaming> other/former title: "Bathsheba (was wrong name)"
-- a collection remark: "the Hague 18-05-1768 (Lugt 1683), nr. 12: '(FR) A Bathsheba, with a bath, raped by David'...

!|height=300! !|height=300! !|height=300!

Without going too deep into it, you get the idea that art attribution and research is a complicated and sometimes confusing affair.

h2. Full Sample Record

I've simplified the XML sample record to a table for easier comprehension, by using (all hail\!) Emacs and these commands:
|| key || command || from || to || explanation ||
| M-= | query-replace-regexp | {nf}</.*?>{nf} | | remove closing tags |
| C-u C-x C-o | my-delete-blank-lines | | | remove empty lines |
| M-% | query-replace | >< | > < | add a space here: <empty-tag/><\!-\- English comment> |
| M-= | query-replace-regexp | {nf}\^(TAB*){nf} | {nf}\,(format "%dTAB%s" (length \1) (make-string (length \1) ?-)){nf} | replace leading tabs with N (tag level) and leading dashes (to conserve space while still showing the hierarchy) |
| M-= | query-replace-regexp | {nf}>(.*)(<\!-\- (.*) \-\->){nf} | {nf}> \3TAB\1{nf} | move English comment right after the tag, put element content in new column |
| M-= | query-replace-regexp | {nf}>([^ \|]){nf} | {nf}>TAB\1{nf} | put remaining element contents in new column |
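An alternative to the Emacs recipe, sketched with Python's ElementTree (this is not what was used to produce the table): parse with comments kept and attach each English comment to the element just before it. Like the recipe, it assumes each comment directly follows the element it describes; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text: str) -> list:
    """Return one (level, dashed_tag, english_comment, text) row per element."""
    # Keep comments in the tree; by default ElementTree drops them (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    rows = []

    def walk(elem, level):
        rows.append([level, "-" * level + elem.tag, "", (elem.text or "").strip()])
        previous = None  # index in rows of the previous sibling element
        for child in elem:
            if child.tag is ET.Comment:
                # A comment describes the element just before it
                if previous is not None:
                    rows[previous][2] = (child.text or "").strip()
                continue
            previous = len(rows)
            walk(child, level + 1)

    walk(root, 0)
    return [tuple(row) for row in rows]
```

Each row can be written out tab-separated with "\t".join(map(str, row)) and pasted into Excel. Working on the parsed tree avoids depending on the one-element-per-line layout the regexps assume.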

h2. Reduced Sample Record

I've reduced the sample by leaving only one instance of each element. In many cases I merged elements so that the remaining one has all possible sub-elements (which may lead to nonsensical data, e.g. <begindatum_in_collectie> later than <einddatum_in_collectie>).
This can be used as the frame on which to discuss, and later describe, the CRM mapping.
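The reduction can be sketched the same way: keep the first child of each tag and fold the sub-elements of later duplicates into it, recursively. This is an assumed reconstruction of the manual merging, not the procedure actually used, and like it, it may combine values that never occur together. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def reduce_sample(elem):
    """Keep one child per tag; fold the sub-elements of later duplicates into it."""
    kept = {}
    for child in list(elem):
        first = kept.setdefault(child.tag, child)
        if first is child:
            continue
        first.extend(list(child))  # move the duplicate's sub-elements
        if not (first.text or "").strip() and child.text:
            first.text = child.text  # keep some text if the kept one had none
        elem.remove(child)
    for child in elem:
        reduce_sample(child)  # duplicates created by the merge are reduced here
    return elem
```

For example, two <collectie> elements with different date sub-elements become one <collectie> holding one of each.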