Current by Vladimir Alexiev
on Mar 12, 2012 08:58.

{excerpt}comes from AdLib collection management software and is better structured{excerpt}
{excerpt}from AdLib collection management software, with additions by RKD{excerpt}
{toc}
{attachments:sortBy=name}

h1. Introduction
Important attachments and links:
- [Rembrandt]: description of this project
- [^Rembrandt-xml.zip]: XMLs of all 11 Rembrandt paintings
- [^images-jpg.zip]: JPG images of all paintings (323 images, of which 45% are missing), see [Documentation, Files, Images#Number of Images]
- [\\ontonas\all-onto\Projects\ResearchSpace\data\Rembrandt-data\images\susanna-tif.zip]: 18 TIFs (deep-zoom pyramid images) for Susanna, 1.2 GB
- [Rembrandt XML]: issues and problems with the XML, description of new XML (not for RS3.1)
- [Rembrandt thesauri]: thesauri used by the data

- see [Rembrandt] for a description of this project
- We got sample XMLs (a record list, 2 individual records about 2 paintings). Don't have an XSD schema
Notes:
- We got XMLs for 11 paintings
- tags are in Dutch but there are English comments; Excel can translate automatically
- <object_number_RKDtechnical> includes English versions for many of the Dutch fields; and most of the values it adds are bilingual

h2. Data Sources
- website: [http://rkd.adlibsoft.com/rembrandt-demo]
- painting list (there are 12): [http://rkd.adlibsoft.com/rembrandt-demo/explore-paintings]
- one painting: [http://rkd.adlibsoft.com/rembrandt-demo/painting/de-badende-suzanna]
- click on "Debug tools: Adlib XML" at the very top and you get the XML.
!scr2.jpg|width=900!
- it uses the AdLib API to get the data from a URL like this:
[http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx?database=RDB-RKDimages&search=priref=???&xmltype=Grouped]
"Grouped" is important! Examples:
-- [de-badende-suzanna|http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx?database=RDB-RKDimages&search=priref=2926&xmltype=Grouped]
-- [portret-van-herman-doomer|http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx?database=RDB-RKDimages&search=priref=32162&xmltype=Grouped]
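
The URL pattern above can be assembled programmatically. The following is a minimal Python sketch, not part of the Adlib API itself: the function name `record_url` and the constant `BACKEND` are made up for illustration, and the only inputs taken from this page are the backend address, the database name, and the `xmltype=Grouped` parameter.

```python
from urllib.parse import urlencode

# Backend address as used in the example links above.
BACKEND = "http://rkd.adlibsoft.com/rembrandt-backend/wwwopac.ashx"


def record_url(priref, database="RDB-RKDimages", backend=BACKEND):
    """Build the wwwopac URL for one record, always requesting Grouped XML.

    record_url is a hypothetical helper name, not an Adlib function.
    """
    query = urlencode(
        {
            "database": database,
            "search": f"priref={priref}",
            "xmltype": "Grouped",
        },
        safe="=",  # keep the "=" inside "priref=..." unescaped, as in the links above
    )
    return f"{backend}?{query}"


print(record_url(2926))
```

Hard-coding `xmltype=Grouped` inside the helper keeps it from being forgotten, since the grouped form is the one this page relies on.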

Jan 19 Oct 2011: Adlib API functionality: [http://api.adlibsoft.com/site/] gives you a lot of information about our records and thesauri.
For example this url shows the complete XML for record number 49524:
[http://test2.adlibsoft.com/rembrandt-backend-newdesign/wwwopac.ashx?database=RDB-RKDimages&search=priref=49524&xmltype=Grouped] !scr2.jpg|thumbnail,border=1!
For the painting page http://test2.adlibsoft.com/rembrandt-newdesign/painting/portret-van-herman-doomer we can get the XML from the Debug tools section as shown above.

h2. Database Schemas

Older schema (2009_12_02): matches closely the XML data
{viewpdf:name=2009_12_02 Datastructuur RKDtechnical.pdf}
Newer schema (2011_06_01): doesn't match the XML data, and doesn't look entirely precise
{viewpdf:name=2011_06_01 Datastructuur RKDtechnical voorstel.pdf}
Don't have an XSD schema.
[^2009_12_02 Datastructuur RKDtechnical.pdf]: matches closely the XML data
!RKDtechnical-2009_12_02.png!

[^2011_06_01 Datastructuur RKDtechnical voorstel.pdf]: New database schema (2011_06_01). RKD are restructuring the database. This schema doesn't match the XML data, and doesn't look entirely precise yet. It's a risk for the data migration, but we can't wait for the new schema, so we'll proceed with the old one.

h1. Adlib Correspondence

RKD is the Rembrandt project technical partner
- *[mailto:rembrandtdatabase@rkd.nl]*: your e-mail will be read by several colleagues at RKD working on the project, and whoever reads it first can answer the question.
- Wietske Donkersloot [mailto:donkersloot@rkd.nl]: Mellon Fellow, Project Manager. T: \+31 70-333 9719. On maternity leave from 4-Nov-2011 to 28-Feb-2012
- Sytske Weidema [mailto:weidema@rkd.nl]: replaces Wietske as project manager
- Willem ter Velde: will send the data exports
- Jan Teuben [mailto:teuben@rkd.nl]: application and database manager at RKD, IT manager of the Rembrandt Database project (but his time is the most limited)
- Bert Warmelink [mailto:warmelink@rkd.nl]
- Reinier van 't Zelfde [mailto:zelfde@rkd.nl]

9/26/2011 Jan Teuben: Last week I got an email reply from Adlib, about your questions for Adlib, forwarded and I'm curious if I could maybe help you if you still have questions regarding the Rembrandt Database project. Let me know if I could help.
# Austin Nevin said that you've added EN translations to the NL part, is that true?
Have you done it with explicitly tagged values, e.g. <value lang="nl-NL">... and <value lang="en-US"> ...
#- My colleague Wietske has added EN translations to the NL parts in the adlibxml example of record priref=2926 "De badende Suzanna". She sent these files to Dominic on the 17th of June. See forwarded email. But these files and the translations (between <\!-\- engelse vertaling \-->EN translation) are just for some extra context.
#- Thanks, we already had these and the <\!-\- EN translations \--> were very useful. My question was if there are additional EN variants of free text fields, but you've answered that above

h2. Image URLs

# (/) Where is the filename (or URL) to the main painting image on the IIP server?
#- Rembrandt uses two IIP servers: "RKD" in NL, "NGL" at National Gallery of London:
{code:title=GlobalSettings.xml}
<imageServers>
<imageServer locationId="RKD"
url="http://rembrandtdatabase.adlibsoft.com/IIPImageServer/IIPImageServer.exe"
isDefault="true"
filesPath="D:/rembrandtdatabase.adlibsoft.com/Images/" />
...
{code}
#- Which server to use is specified with <file.image.location>:
{code}<file.image.location>
  <value lang="neutral">RKD</value>
</file.image.location>{code}
The "RKD" server has isDefault="true", so it is used even when <file.image.location> is not specified.
#- The image name is specified in <file.image>. The AFTER_TREATMENT OVERALL FRONT image is the default (main) image used in thumbnails.
{code:title=badendesuzanna-new.xml}
<object_number_RKDtechnical>
<link_research_record>
<link_documentation_record>
<link_file_record>
<file.image>mh0147_front_nl_2002.tif</file.image>
<file.spec.object_status>
<value lang="neutral">AFTER_TREATMENT</value>
<file.spec.overall_detail>
<value lang="neutral">OVERALL</value>
<file.spec.front_back>
<value lang="neutral">FRONT</value>
{code}
# (/) Example of complete URL to an image on IIP:
[http://rembrandtdatabase.adlibsoft.com/IIPImageServer/IIPImageServer.exe?FIF=D:/rembrandtdatabase.adlibsoft.com/Images/mh0147_front_nl_2002.tif&HEI=300&CVT=JPEG]
#- It is formed as a concatenation of:
{noformat}imageServer/@url "?FIF=" imageServer/@filesPath file.image "&HEI=" Height "&CVT=" Format{noformat}
where Height and Format are the desired thumbnail size and file format (note that IIP converts TIF to JPEG)
# (/) Regarding the xrays and other research images (eg <file.image>= mh0147_front_nl_2002_001.tif): are they available in IIP?
They are available in exactly the same way
# Will we get the deep-zoom images?
#- Wietske: I think the priority is first with the thesauri and the other data, but we could also provide you with sample images concerning the database records that we have sent you
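
The concatenation rule and the default-server behaviour described above can be sketched in Python. This is a minimal sketch, not the code Adlib uses: `ImageServer`, `pick_server` and `iip_url` are hypothetical names, and only the RKD server is listed because the NGL entry is elided in the GlobalSettings.xml excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageServer:
    """One <imageServer> entry from GlobalSettings.xml (hypothetical class)."""
    location_id: str
    url: str
    files_path: str
    is_default: bool = False


# Only the RKD entry is shown on this page; the NGL entry is not reproduced here.
SERVERS = (
    ImageServer(
        location_id="RKD",
        url="http://rembrandtdatabase.adlibsoft.com/IIPImageServer/IIPImageServer.exe",
        files_path="D:/rembrandtdatabase.adlibsoft.com/Images/",
        is_default=True,
    ),
)


def pick_server(location=None, servers=SERVERS):
    """Return the server named by <file.image.location>, or the default one."""
    if location:
        for server in servers:
            if server.location_id == location:
                return server
        raise KeyError(f"unknown image server location: {location}")
    return next(server for server in servers if server.is_default)


def iip_url(file_image, location=None, height=300, fmt="JPEG"):
    """Concatenate url, "?FIF=", filesPath, file.image, "&HEI=", height, "&CVT=", format."""
    server = pick_server(location)
    return f"{server.url}?FIF={server.files_path}{file_image}&HEI={height}&CVT={fmt}"


print(iip_url("mh0147_front_nl_2002.tif"))
```

Called with the file name from badendesuzanna-new.xml and the default height of 300, the function reproduces the example URL given above.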

h2. RKD Thesauri

# Which thesauri do you use?
#- We assume RKD locations, people, concepts, artworks, IconClass, but would like you to confirm explicitly
#- (?) Are these the same well-known RKD thesauri shown in red on the attached picture? Can you tell from the numbers? E.g. People (=RKDartists) having 331,455
!eCulture data cloud.png|width=400!
# (?) Can you *urgently send us the thesauri* used by the Rembrandt project?
Preferably in SKOS if you have them in that format.
# (?) Is it possible to make an export with the IDs (codes/URIs) of the controlled fields, in addition to the text labels?
# If not, can you list for each controlled field:
the thesaurus it came from, which branch (parent) it sits under, and whether it is an immediate child or a descendant at any depth?
We need this information so that we can look up and store the ID.

h2. Thesauri description

- RKDartists (an elaborate thesaurus of artist names and other persons in the 'art historical scene', with information on name variants, life dates, dates & places of activity, references to publications, etc.)

In RKDimages and RKDtechnical
- geographical terms (cities, countries)
- institutions names (museums, laboratories, etc)
- whereabouts type (e.g. museum, private collection, church, etc.)
- object type (e.g. painting, sculpture, drawing, etc.)
- shape (e.g. vertical rectangle, oval, etc.)
- support (e.g. panel, canvas, paper, etc.)
- technique (e.g. oil paint, pen and brown ink, pencil, etc.)
- qualification attribution (e.g. after, possibly, studio of, etc.)
(Please note that due to the duplication between parts of RKDtechnical and RKDimages as described above, there is also a duplication in some of the controlled vocabularies we use. Both databases use for instance controlled vocabularies of institutions names, but they are not all the way identical. We intend to have everything integrated in Spring 2012.)

In RKDtechnical:
- persons names (researchers, conservators, etc. Will be integrated with RKDartists in due time)
- research type (e.g. x-radiography, normal light studies, dendrochronology, etc.)
- analytical techniques (=techniques applied on paint samples. Will be integrated with "research types" shortly)
- equipment (=specific cameras, microscopes or other type of equipment used, with specifications)
- computer hardware (=hardware that was used to create certain documentation, such as scanners)
- software (=software that was used to create certain documentation)
- research reason / objective (e.g. conservation, publication, exhibition, etc.)
- object status (e.g. before treatment, during treatment, after treatment, etc.)
- area captured (=part of the painting/paint sample captured in an image, e.g. back overall, front upper right corner, etc.)
- magnification (for images taken with a light\- or stereomicroscope)
- document type (e.g. slide 35 mm, digital-born color photograph, research report, etc.)
- documentation whereabouts (=location where the documentation is kept within an institution, e.g. conservation studio, library, archive, etc.)
- documentation number type (=numbering system which is used for certain groups of documentation within an institution, e.g. inventory number, registration number, negative number, etc.)
- reason for sampling (=reason why a paint sample was taken, e.g. conservation, attribution, etc.)
- sample type (e.g. cross-section, dispersed sample, varnish sample, etc.)
- location/area type (for paint samples, e.g. flesh, foliage, sky, etc.)
- location/area color (for paint samples, e.g. red, brown, etc.)
- paint defects (for paint samples, e.g. smalt discoloration, saponification of lead white, etc.)
- paint layer function (for paint samples, e.g. ground, surface paint layer, varnish, etc.)
- field (for images of paint samples, e.g. bright field, dark field, etc)
- light (for images of paint samples, e.g. normal light, uv, etc)

h2. Specific Field Questions

Not to go too deep into it, but you get the idea that art attribution and research is a complicated and sometimes confusing affair

h2. Record Versions

- [^badendesuzanna-new.xml] (got from Jan Teuben@RKD, 10/4/2011, original name was "badendesuzanna_adlib.xml")
Mariana: This file does not open, gives XML Parsing Error: not well-formed
- [^badendesuzanna-old.xml] (got from Dominic, original name was "rdb-samplerecord_English translations.xml")
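
A "not well-formed" error like the one reported for the new file can be located with a short check. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `first_xml_error` is hypothetical, and the sample string is only an illustration, not an excerpt from the real file.

```python
import xml.etree.ElementTree as ET


def first_xml_error(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # position of the first problem the parser hit
        return line, column, str(err)
    return None


# Illustrative input: an unescaped ampersand makes the document not well-formed.
print(first_xml_error("<a>list & lijden</a>"))
print(first_xml_error("<a>list &amp; lijden</a>"))
```

The parser stops at the first problem, so after each repair the check has to be run again until it returns None.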

What I did:
- New: split lines per tag (it was a single huge line) and indented properly
- Old: untabified (tab \-> 2 spaces)
-- Split some elements (all with value "RKD", etc) to 3 lines, so they compare better to New
-- Split one line between tags
- compared with Araxis Merge

Differences:
- New: no English translations of the Dutch tags. But these are in our excel anyway
- New: content is not properly escaped, eg
-- "list & lijden": the ampersand should be escaped as &amp;
-- The complex HTML excerpt in <file.application>. It doesn't have a single root: has a sequence of elements (first is <a>, last is <object>)
{code}<a title="View mh0147_x06 on Scribd" href=...</object>{code}
- New: includes more <value lang> variants
-- Example1 (old had just "FRONT"). "0" and "1" are not proper language codes; we interpret them as 0->en, 1->nl
{code}
<value lang="neutral">FRONT</value>
<value lang="0">front</value>
<value lang="1">voorzijde</value>
{code}
-- Example2 (old had just "RKD")
{code}
<value lang="neutral">RKD</value>
<value lang="0">Rijksdienst voor Kunsthistorische Documentatie</value>
<value lang="1">Rijksdienst voor Kunsthistorische Documentatie</value>
{code}
-- This happens for the following elements:
--- <object.size.unit>
--- <doc.size.unit>
--- <file.image.location>
--- <file.application.location>
--- <file.spec.object_status>
--- <file.spec.overall_detail>
--- <file.spec.front_back>
--- <sample.location.vert.start>
--- <sample.location.hor.start>
-- All of these are thesaurus values to be mapped to URIs, so the extra variants don't change the mapping
-- We now have not just the codes but also the titles
- Old: <plaats_tentoonstelling> was a number (eg 490: a bug we noticed earlier). New: now is a proper name, eg Berlijn (but is that from a thesaurus?)
- Old: <instelling_tentoonstelling> was empty, now is present (eg Gemäldegalerie)
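
Two of the differences listed above (unescaped ampersands and the "0"/"1" language codes) could be normalized before mapping. The following Python sketch is one possible approach under stated assumptions: the helper names are hypothetical, the 0->en / 1->nl mapping is our interpretation rather than something RKD confirmed, and the unescaped HTML excerpt in <file.application> is not handled.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not already start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# Assumption: "0" means English and "1" means Dutch, as interpreted above.
LANG_CODES = {"0": "en", "1": "nl"}


def escape_bare_ampersands(text):
    """Escape stray ampersands in raw XML text, leaving existing entities alone."""
    return BARE_AMP.sub("&amp;", text)


def normalise_langs(root):
    """Rewrite lang="0"/"1" on <value> elements to "en"/"nl", in place."""
    for value in root.iter("value"):
        lang = value.get("lang")
        if lang in LANG_CODES:
            value.set("lang", LANG_CODES[lang])
    return root


sample = (
    '<file.spec.front_back>'
    '<value lang="neutral">FRONT</value>'
    '<value lang="0">front</value>'
    '<value lang="1">voorzijde</value>'
    '</file.spec.front_back>'
)
root = normalise_langs(ET.fromstring(escape_bare_ampersands(sample)))
print([(v.get("lang"), v.text) for v in root.iter("value")])
```

The ampersand repair runs on the raw text because the file cannot be parsed until it is well-formed; the language rewrite runs on the parsed tree.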

h2. Full Sample Record

|| Key || Emacs command || Regexp || Replacement || Purpose ||
| M-= | query-replace-regexp | {nf}\^(TAB*){nf} | {nf}\,(format "%dTAB%s" (length \1) (make-string (length \1) ?-)){nf} | replace leading tabs with N (tag level) and leading dashes (to conserve space while still showing the hierarchy) |
| M-= | query-replace-regexp | {nf}>(.*)(<\!*{*}-\- (.-*) \->){nf} | {nf}> \3TAB\1{nf} | move English comment right after the tag, put element content in new column |
| M-= | query-replace-regexp | {nf}>([^ &#124;&#124;&#124;&#124;&#124;&#124;&#124;\||]){nf} | {nf}>TAB\1{nf} | put remaining element contents in new column |
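
For readers who do not use Emacs, the first replacement in the table (leading tabs become a level number, a tab, and one dash per level) can be approximated in Python. This is a sketch of that one step only; the function name `number_levels` is hypothetical and the other two replacements are not reproduced.

```python
def number_levels(text):
    """Replace each line's leading tabs with "<depth><TAB><one dash per level>"."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip("\t")
        depth = len(line) - len(stripped)  # number of leading tabs = tag level
        out.append(f"{depth}\t{'-' * depth}{stripped}")
    return "\n".join(out)


# Illustrative input, not taken from the sample record.
print(number_levels("<a>\n\t<b>\n\t\t<c>x</c>"))
```

The level number goes into its own spreadsheet column, while the dashes keep the hierarchy visible in the tag column.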

[^rdb-sample.xls]


h2. Reduced Sample Record

[^rdb-sample-reduced.xls]


{viewxls:name=rdb-sample-reduced.xls}