Current version by Vladimir Alexiev on Mar 12, 2012 08:58.

{excerpt}from AdLib collection management software, with additions by RKD{excerpt}
{toc}
{attachments:sortBy=name}

h1. Introduction
- We got sample XMLs (a record list and 2 individual records about 2 paintings). We don't have an XSD schema.
Important attachments and links:
- [Rembrandt]: description of this project
- [^Rembrandt-xml.zip]: XMLs of all 11 Rembrandt paintings
- [^images-jpg.zip]: JPG images of all paintings (323 of which 45% are missing), see [Documentation, Files, Images#Number of Images]
- [\\ontonas\all-onto\Projects\ResearchSpace\data\Rembrandt-data\images\susanna-tif.zip]: 18 TIFs (deep-zoom pyramid images) for Susanna, 1.2 GB
- [Rembrandt XML]: issues and problems with the XML, description of new XML (not for RS3.1)
- [Rembrandt thesauri]: thesauri used by the data

Notes:
- We got XMLs for 11 paintings
- tags are in Dutch but there are English comments; Excel can translate automatically
- <object_number_RKDtechnical> includes English versions for many of the Dutch fields; and most of the values it adds are bilingual

h2. Data Sources
- website: [http://rkd.adlibsoft.com/rembrandt-demo]
- painting list (there are 12): [http://rkd.adlibsoft.com/rembrandt-demo/explore-paintings]
- one painting: [http://rkd.adlibsoft.com/rembrandt-demo/painting/de-badende-suzanna]
- click on "Debug tools: Adlib XML" at the very top and you get the XML.
!scr2.jpg|width=900!
- it uses the AdLib API to get the data from a URL like this:
[http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx?database=RDB-RKDimages&search=priref=???&xmltype=Grouped]
"Grouped" is important! Examples:
-- [de-badende-suzanna|http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx?database=RDB-RKDimages&search=priref=2926&xmltype=Grouped]
-- [portret-van-herman-doomer|http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx?database=RDB-RKDimages&search=priref=32162&xmltype=Grouped]
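The record URLs above all follow one pattern, so they can be generated from a priref. The following is a minimal sketch, assuming the backend URL and parameters stay as in the examples; the function name {{adlib_record_url}} is made up for illustration.

```python
from urllib.parse import urlencode

# Backend endpoint copied from the example URLs above.
BACKEND = "http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx"


def adlib_record_url(priref, database="RDB-RKDimages"):
    """Build the URL of the Grouped Adlib XML for one record (hypothetical helper)."""
    params = {
        "database": database,
        "search": f"priref={priref}",
        "xmltype": "Grouped",  # "Grouped" is important, as noted above
    }
    # safe="=" keeps "priref=2926" unescaped, matching the example URLs
    return BACKEND + "?" + urlencode(params, safe="=")


print(adlib_record_url(2926))   # de-badende-suzanna
print(adlib_record_url(32162))  # portret-van-herman-doomer
```

Fetching the XML itself (e.g. with {{urllib.request.urlopen}}) is left out, since the demo server may not stay available.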

h2. Database Schemas
Older schema (2009_12_02): matches closely the XML data
{viewpdf:name=2009_12_02 Datastructuur RKDtechnical.pdf}
Newer schema (2011_06_01): doesn't match the XML data, and doesn't look entirely precise
{viewpdf:name=2011_06_01 Datastructuur RKDtechnical voorstel.pdf}

Don't have an XSD schema.
[^2009_12_02 Datastructuur RKDtechnical.pdf]: matches closely the XML data
!RKDtechnical-2009_12_02.png!

[^2011_06_01 Datastructuur RKDtechnical voorstel.pdf]: New database schema (2011_06_01). RKD are restructuring the database. This schema doesn't match the XML data, and doesn't look entirely precise yet. It's a risk for the data migration, but we can't wait for the new schema, so we'll proceed with the old one.

h1. Adlib Correspondence

20110917: Adlib is vendor of the base collection management system
- Bert Degenhart Drenth [mailto:bdd@adlibsoft.com]: CEO
- Ilya Gorbylev [mailto:i.Gorbylev@adlibsoft.com]: dev who currently supports Rembrandt

# We're starting work on the www.ResearchSpace.org project for the British Museum. One of the tasks is to convert the Rembrandt project database to CIDOC CRM and import it to ResearchSpace. Dominic Oldman (IS Development Manager of BM) sent us some sample records from Rembrandt (attached), and I noticed they are exported from your system: the root is <adlibXML>.
# Are you involved in the Rembrandt project, or do they only use your system?

h1. RKD Correspondence

RKD is the Rembrandt project technical partner
- *[mailto:rembrandtdatabase@rkd.nl]*: your e-mail will be read by several colleagues at RKD working on the project and whoever reads it first, can answer the question.
- Wietske Donkersloot [mailto:donkersloot@rkd.nl]: Mellon Fellow, Project Manager. T: \+31 70-333 9719. On maternity leave from 4-Nov-2011 to 28-Feb-2012
- Sytske Weidema [mailto:weidema@rkd.nl]: replaces Wietske as project manager
- Willem ter Velde: will send the data exports
- Jan Teuben [mailto:teuben@rkd.nl]: application and database manager at RKD, and IT manager of the Rembrandt Database project (but his time is the most limited)
- Bert Warmelink [mailto:warmelink@rkd.nl]
- Reinier van 't Zelfde [mailto:zelfde@rkd.nl]

9/26/2011 Jan Teuben: Last week I got an email reply from Adlib, about your questions for Adlib, forwarded and I'm curious if I could maybe help you if you still have questions regarding the Rembrandt Database project. Let me know if I could help.

h2. Data Stability and Schema

# Has the export changed (or will it change) significantly from the attached?
#- Right now the export is the same, but it will change in the future multiple times. Let me explain in more detail: right now the project website is still in beta stage and hosted at Adlib at an old copy of our Adlib application (RKDtechnical). We want to move the website and connect it to the newest version of RKDtechnical. We also want to release version 1.0 later this year and in the future more versions will follow. Together with the website, the Adlib application we build here at RKD is continuously changing, although major changes aren't desirable, because external partners are also working with this application.
#- Wietske 14 Oct: we have not yet responded to Vladimirs e-mail of October 6, as we are afraid that some of the answers will not only take a lot of time from us, but also from you in the end. This has to do with the fact that our database and website are still undergoing changes, and the information we give you today will be different from what we can give you in a few months time. We have discussed this with Dominic and sent him an overview of our current status and planning yesterday. We will await what comes out of this.
#- Vladimir: I understand the risk, but unfortunately we cannot wait since it's very important for us to ground our technical work on some specific data. The CRM is too abstract to use as a data model for RS, without having an "application profile" over specific data. So don't worry about the upcoming changes, it's up to Dominic and me to figure out some way to accommodate it in RS.
That's also the reason why I'm in a hurry to get clarifications on the various questions.
# (?) Does the older schema (2009_12_02) represent the current XML data, and the newer schema (2011_06_01) your plans for changes?
# Can you send me a current export of the same record priref=2926 "De badende Suzanna" ?
#- Please find attached (badendesuzanna_adlibxml.xml)
#- Thanks\! I compared the new file you sent against the old one, and the differences are minor (see sec [Record Versions]).
Also thanks for the other sample 32162 "Portret van Herman Doomer"\!
# Do you have a schema of the Rembrandt XML export and can you send it?
#- Please find attached. We use multiple schemas, I send you the schemas for page settings and global settings. Schemas for filtering and search aren't necessary I assumed
GlobalSettings.xml GlobalSettings.xsd PageSettings.xml PageSettings.xsd
#- Thanks for these (GlobalSettings.xml especially is useful since I can see which IIP servers are used, etc).
But I meant XSD for the data (object record) export, defining all elements and their nesting
# If not: we can figure out all tags by looking at a complete set of export files (which we don’t yet have), but if the schema is a bit complicated (e.g. xsd:choice) we can't figure it out
#- (?) send a complete set of export files, and we can figure out all tags on our own

h2. Multi-linguality

# Can we assume that all text inside <object_number_RKDtechnical> is in EN,
except explicitly tagged values such as <value lang="nl-NL" invariant="false">paneel (eikenhout)
The tag names there are in Dutch.
#- In the future, yes; right now, no.
#- RDF allows to mark a string with a language (e.g. "Badende Susanna"@nl), or not mark it at all.
Since there are no lang tags on free text fields in your XML, we have two options:
#-- Assume EN and NL as outlined above, and as we see it in Susanna.
#--- Pro: will allow us to display only one of the fields, in the user's preferred language
#--- Cons: the assumption could be wrong
#-- OR not put any language
#--- Cons: will have to display both language variants, even if they are exact translations. Or pick one at random, and risk the user not being able to read it
# Austin Nevin said that you've added EN translations to the NL part, is that true?
Have you done it with explicitly tagged values, e.g. <value lang="nl-NL">... and <value lang="en-US"> ...
#- My colleague Wietske has added EN translations to the NL parts in the adlibxml example of record priref=2926 "De badende Suzanna". She send these files on the 17th of June to Dominic. See forwarded email. But these files and the translations (between <\!-\- engelse vertaling -->EN translation) are just for some extra context.
#- Thanks, we already had these and the <\!-\- EN translations --> were very useful. My question was if there are additional EN variants of free text fields, but you've answered that above

h2. Data Structure

# (?) Is your extension inside <object_number_RKDtechnical>?
Does everything outside this element come from Adlib's standard system?
#- The reason that some information is repeated in the export, has to do with the fact that for The Rembrandt Database we combine data from two databases (RKDimages and RKDtechnical). RKDimages is a database for very elaborate art historical data and RKDtechnical is a database for metadata on technical documentation on works of art. To identify a specific work of art to which the technical documentation is related, RKDtechnical also contains brief art historical data, which is a duplication with some information in RKDimages. The two were developed more or less separately in the past, but we want to integrate them in the near future. The brief art historical data in RKDtechnical will be replaced by the much more elaborate data in RKDimages. In the export, you can easily tell which is which, because the RKDtechnical fields are all in English, whereas the RKDimages fields are still only in Dutch. So in the end the values in the Dutch fields will prevail. So if the data in the two fields is not exactly the same, it's best to *pick the Dutch fields*.
#- This answers my question re single-value fields:
<hoogte> prevails over <object.size.height>.
#- (No need to say that we also strive to make RKDimages multilingual, but this has large implications for our institution and will take some more time...).
#- Regarding multi-value fields, I think the decision is also clear:
we'll combine <benaming_kunstwerk> with <title> because one gives the @nl string, and the other the @en string.
#- No need: we see them, but I'd be obliged if you verify the mapping (in the last column) after we complete it
# (?) Regarding <value> with @lang attribute:
#- What is @invariant?
#- Guess we must map such values to a thesaurus?
#- (internal question: should we strip trailing "-US" from lang leaving only "en"?)

h2. Image URLs

# (/) Where is the filename (or URL) to the main painting image on the IIP server?
#- Rembrandt uses two IIP servers: "RKD" in NL, "NGL" at the National Gallery, London:
{code:title=GlobalSettings.xml}
<imageServers>
<imageServer locationId="RKD"
url="http://rembrandtdatabase.adlibsoft.com/IIPImageServer/IIPImageServer.exe"
isDefault="true"
filesPath="D:/rembrandtdatabase.adlibsoft.com/Images/" />
...
{code}
#- Which server to use is specified with <file.image.location>:
{code}<file.image.location>
<value lang="neutral">RKD{code}
But "RKD" has isDefault="true", so it is used even when <file.image.location> is not specified.
#- The image name is specified in <file.image>. The AFTER_TREATMENT OVERALL FRONT image is the default (main) image used in thumbnails.
{code:title=badendesuzanna-new.xml}
<object_number_RKDtechnical>
<link_research_record>
<link_documentation_record>
<link_file_record>
<file.image>mh0147_front_nl_2002.tif</file.image>
<file.spec.object_status>
<value lang="neutral">AFTER_TREATMENT</value>
<file.spec.overall_detail>
<value lang="neutral">OVERALL</value>
<file.spec.front_back>
<value lang="neutral">FRONT</value>
{code}
# (/) Example of complete URL to an image on IIP:
[http://rembrandtdatabase.adlibsoft.com/IIPImageServer/IIPImageServer.exe?FIF=D:/rembrandtdatabase.adlibsoft.com/Images/mh0147_front_nl_2002.tif&HEI=300&CVT=JPEG]
#- It is formed as a concatenation of:
{noformat}imageServer/@url "?FIF=" imageServer/@filesPath file.image "&HEI=" Height "&CVT=" Format{noformat}
where Height and Format are the desired thumbnail size and file format (note that IIP converts TIF to JPEG)
# (/) Regarding the xrays and other research images (eg <file.image>= mh0147_front_nl_2002_001.tif): are they available in IIP?
They are available in exactly the same way
# Will we get the deep-zoom images?
#- Wietske: I think the priority is first with the thesauri and the other data, but we could also provide you with sample images concerning the database records that we have send you
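The URL concatenation described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: only the "RKD" server settings are known from GlobalSettings.xml, so only that server is listed, and the names {{IMAGE_SERVERS}} and {{iip_image_url}} are made up for illustration.

```python
# Server settings copied from the GlobalSettings.xml excerpt above.
# Only "RKD" is listed because the "NGL" settings are not shown on this page.
IMAGE_SERVERS = {
    "RKD": {
        "url": "http://rembrandtdatabase.adlibsoft.com/IIPImageServer/IIPImageServer.exe",
        "filesPath": "D:/rembrandtdatabase.adlibsoft.com/Images/",
    },
}
DEFAULT_LOCATION = "RKD"  # the server marked isDefault="true"


def iip_image_url(file_image, location=None, height=300, fmt="JPEG"):
    """Concatenate url, "?FIF=", filesPath, file name, "&HEI=", height, "&CVT=", format."""
    server = IMAGE_SERVERS[location or DEFAULT_LOCATION]
    return (
        f"{server['url']}?FIF={server['filesPath']}{file_image}"
        f"&HEI={height}&CVT={fmt}"
    )


print(iip_image_url("mh0147_front_nl_2002.tif"))
```

With the default height and format this should reproduce the example URL given above. A missing <file.image.location> maps to {{location=None}}, which falls back to the default server.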

h2. RKD Thesauri
# Which thesauri do you use?
#- We assume RKD locations, people, concepts, artworks, IconClass, but would like you to confirm explicitly
#- (?) Are these the same well-known RKD thesauri shown in red on the attached picture? Can you tell from the numbers? E.g. People (=RKDartists) having 331,455
!eCulture data cloud.png|width=400!
# (?) Can you *urgently send us the thesauri* used by the Rembrandt project?
Preferably in SKOS if you have them in that format.
# (?) Is it possible to make an export with the IDs (codes/URIs) of the controlled fields, in addition to the text label?
# If not: can you list for each controlled field:
the thesaurus it came from, which branch (parent), and whether the value is an immediate child or a descendant at any depth?
We need this info so we can look up and store the ID.

h2. Thesauri description
- RKDartists (an elaborate thesaurus of artist names and other persons in the 'art historical scene', with information on name variants, life dates, dates & places of activity, references to publications, etc.)

In RKDimages and RKDtechnical
- geographical terms (cities, countries)
- institutions names (museums, laboratories, etc)
- whereabouts type (e.g. museum, private collection, church, etc.)
- object type (e.g. painting, sculpture, drawing, etc.)
- shape (e.g. vertical rectangle, oval, etc.)
- support (e.g. panel, canvas, paper, etc.)
- technique (e.g. oil paint, pen and brown ink, pencil, etc.)
- qualification attribution (e.g. after, possibly, studio of, etc.)
(Please note that due to the duplication between parts of RKDtechnical and RKDimages as described above, there is also a duplication in some of the controlled vocabularies we use. Both databases use for instance controlled vocabularies of institutions names, but they are not all the way identical. We intend to have everything integrated in Spring 2012.)

In RKDtechnical:
- persons names (researchers, conservators, etc. Will be integrated with RKDartists in due time)
- research type (e.g. x-radiography, normal light studies, dendrochronology, etc.)
- analytical techniques (=techniques applied on paint samples. Will be integrated with "research types" shortly)
- equipment (=specific cameras, microscopes or other type of equipment used, with specifications)
- computer hardware (=hardware that was used to create certain documentation, such as scanners)
- software (=software that was used to create certain documentation)
- research reason / objective (e.g. conservation, publication, exhibition, etc.)
- object status (e.g. before treatment, during treatment, after treatment, etc.)
- area captured (=part of the painting/paint sample captured in an image, e.g. back overall, front upper right corner, etc.)
- magnification (for images taken with a light- or stereomicroscope)
- document type (e.g. slide 35 mm, digital-born color photograph, research report, etc.)
- documentation whereabouts (=location where the documentation is kept within an institution, e.g. conservation studio, library, archive, etc.)
- documentation number type (=numbering system which is used for certain groups of documentation within an institution, e.g. inventory number, registration number, negative number, etc.)
- reason for sampling (=reason why a paint sample was taken, e.g. conservation, attribution, etc.)
- sample type (e.g. cross-section, dispersed sample, varnish sample, etc.)
- location/area type (for paint samples, e.g. flesh, foliage, sky, etc.)
- location/area color (for paint samples, e.g. red, brown, etc.)
- paint defects (for paint samples, e.g. smalt discoloration, saponification of lead white, etc.)
- paint layer function (for paint samples, e.g. ground, surface paint layer, varnish, etc.)
- field (for images of paint samples, e.g. bright field, dark field, etc)
- light (for images of paint samples, e.g. normal light, uv, etc)

h2. Specific Field Questions

# (?) <vervaardigd_plaats_land> country/place of making
How about <toeschrijving>.<land_plaats_anoniem> attribution.city_or_country ?
# (?) <inv.nr._bruikleengever> (MT) inventory number of loan giver
Is this inv.no of the Rijksmuseum?
#- Wietske:
'bruikleen' means 'loan' (a work of art lend from one to another institution on a temporary basis)
'bruikleengever' means 'lender' (the institution lending the work of art to another institution, the borrower)
# (?) <bruikleen_naam> (MT) loan name
Is Rijksmuseum the owner, while Mauritshuis is the curator (caretaker)?
You say "temporary loan", but it seems from the data that it's been like this since 1816
# (?) <object_record_number>
Is this an auxiliary ID that is not used elsewhere?
# (?) <title.other_older>
Combine with <andere_benaming>?
Notice they are different: one is "Batseba (heette ten onrechte)" i.e. "Bathsheba (was wrong name)", the other "Susanna and the elder"
# (?) <whereabouts.name>
#- largest or empty <einddatum_in_collectie>
# (?) <whereabouts.city>
Is this <plaats_collectie\_ verblijfplaats> of the last effective <collectie>?
# (?) <link_documentation_record.lref>
Does this relate to any record ID, or some other key?
# (?) <sample.location.vert>, <sample.location.hor>
Are these in CM? Are they used only if <object.shape> is Rectangle?
# (?) What is meant by begindatum_lijst and end datum_lijst? Period of creation?
# (?) What is meant by bronnen? Bibliographical references?
## (?) What kinds of references can there be? standaardbron (reference publication) is apparently just one option.
Further, the value of this item gives the name of the author and his life span. What does that mean?
Is all the data like this?
# (?) What is meant by collection? (in the example: begindatum, enddatum in collection ???)
# (?) What is meant by <plaats> depicted location?
# (?) Is there actually an owner of the painting? For instance, is the current collection of the painting considered its owner? Or are we in a situation of custody and loaning for all collections?


h1. Sample Record Analysis

- We are working with a sample record about [Susanna Bathing|http://www.artbible.info/art/large/815.html], an important painting created by Rembrandt in 1636 [that is at Mauritshuis, The Hague|http://www.mauritshuis.nl/index.aspx?chapterid=2345&contentID=18308&ViewPage=1&SchilderijSsOtName=titel&SchilderijSsOv=%25susanna%25]. Susanna, who is just about to bathe, is accosted by two elders (barely visible at the back right) who are hiding in the shrubbery and try to force Susanna to give herself to them. The owner of the painting is Rijksmuseum, Amsterdam. They currently have the following related [painting|http://www.rijksmuseum.nl/zoeken/search.jsp?lang=en&focus=assets&query=Die+Badende+Susanna&x=0&y=0] on place.
- In 1637 Rembrandt created [Susanna and the Elders|http://www.artbible.info/art/large/355.html] that is at the Gemäldegalerie der Staatlichen Museen, Berlin. Here the two old men step forward, and their indecent intentions are obvious. This is an apocryphal biblical story (text added to the Book of Daniel) that was painted by at least 7 other classic masters (Van Dyck, Tintoretto, etc).
- In 1654 Rembrandt created [Bathing Bathsheba|http://www.artbible.info/art/large/412.html] (Batsheba, Batseba) that is at Musée du Louvre, Paris. It is based on a similar biblical story about King David raping another bathing woman. These paintings are sometimes confused, and looking at the similarities in the paintings it is easy to imagine why. E.g. our record about Susanna Bathing says:
-- <andere_benaming> other/former title: "Bathsheba (was wrong name)"
-- a collection remark: "the Hague 18-05-1768 (Lugt 1683), nr. 12: '(FR) A Bathsheba, with a bath, raped by David'...

!http://static.artbible.info/large/rembrandt_suzanna_bad.jpg|height=300! !http://static.artbible.info/large/rembrandt_susanna.jpg|height=300! !http://static.artbible.info/large/rembrandt_bathseba_brief.jpg|height=300!

Not to go too deep into it, but you get the idea that art attribution and research is a complicated and sometimes confusing affair.

h2. Record Versions
- [^badendesuzanna-new.xml] (got from Jan Teuben@RKD, 10/4/2011, original name was "badendesuzanna_adlib.xml")
Mariana: This file does not open, gives XML Parsing Error: not well-formed
- [^badendesuzanna-old.xml] (got from Dominic, original name was "rdb-samplerecord_English translations.xml")
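The Differences section below notes that the new file contains unescaped content such as a bare "&", which may be one reason it fails to parse. As a hedged sketch (not the procedure actually used here), bare ampersands could be repaired before parsing like this; the sample string is a made-up fragment and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Match "&" that does not start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace bare "&" with "&amp;", leaving existing entities untouched."""
    return BARE_AMP.sub("&amp;", xml_text)


# Made-up fragment: one bare ampersand and one that is already escaped.
broken = "<title>list & lijden &amp; more</title>"
fixed = escape_bare_ampersands(broken)
root = ET.fromstring(fixed)

print(fixed)
print(root.text)
```

This does not help with the embedded HTML in <file.application>, which would need escaping of "<" and ">" or a CDATA wrapper.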

What I did:
- New: split lines per tag (it was a single huge line) and indented properly
- Old: untabified (tab -> 2 spaces)
-- Split some elements (all with value "RKD", etc) to 3 lines, so they compare better to New
-- Split one line between tags
- compared with Araxis Merge

Differences:
- New: no English translations of the Dutch tags. But these are in our excel anyway
- New: content is not properly escaped, eg
-- "list & lijden": should be &amp;
-- The complex HTML excerpt in <file.application>. It doesn't have a single root: has a sequence of elements (first is <a>, last is <object>)
{code}<a title="View mh0147_x06 on Scribd" href=...</object>{code}
- New: includes more <value lang> variants
-- Example1 (old had just "FRONT"). "0/1" are not proper languages, interpret as 0->en, 1->nl
{code}
<value lang="neutral">FRONT</value>
<value lang="0">front</value>
<value lang="1">voorzijde</value>
{code}
-- Example2 (old had just "RKD")
{code}
<value lang="neutral">RKD</value>
<value lang="0">Rijksdienst voor Kunsthistorische Documentatie</value>
<value lang="1">Rijksdienst voor Kunsthistorische Documentatie</value>
{code}
-- happens for the following elements
<object.size.unit>
<doc.size.unit>
<file.image.location>
<file.application.location>
<file.spec.object_status>
<file.spec.overall_detail>
<file.spec.front_back>
<sample.location.vert.start>
<sample.location.hor.start>
-- All of these are thesaurus values to be mapped to URI, so it doesn't change the mapping
-- Now we have not just the code but also the titles
- Old: <plaats_tentoonstelling> was a number (eg 490: a bug we noticed earlier). New: now is a proper name, eg Berlijn (but is that from a thesaurus?)
- Old: <instelling_tentoonstelling> was empty, now is present (eg Gemäldegalerie)

h2. Full Sample Record

| M-% | query-replace | >< | > < | add a space here: <empty-tag/><\!-\- English comment> |
| M-= | query-replace-regexp | {nf}\^(TAB*){nf} | {nf}\,(format "%dTAB%s" (length \1) (make-string (length \1) ?-)){nf} | replace leading tabs with N (tag level) and leading dashes (to conserve space while still showing the hierarchy) |
| M-= | query-replace-regexp | {nf}>(.*)(<\!-\- (.*) \-->){nf} | {nf}> \3TAB\1{nf} | move English comment right after the tag, put element content in new column |
| M-= | query-replace-regexp | {nf}>([^ \|]){nf} | {nf}>TAB\1{nf} | put remaining element contents in new column |

[^rdb-sample.xls]

This can be used as the frame on which to discuss and later describe the CRM mapping
[^rdb-sample-reduced.xls]


{viewxls:name=rdb-sample-reduced.xls}