Documents and Files about a Painting
Get XMLs
The XMLs of the paintings were obtained as described in Rembrandt data#Data Sources (RS-57 in Jira).
- "Zelfportret" http://rkd.adlibsoft.com/rembrandt-demo/painting/zelfportret returns an error: "Sorry! An error occured while you were using this site. The page you are looking for might have been removed, had its name changes, or is temporarily unavailable."
- "Borststuk van een man met een gevederde baret" http://rkd.adlibsoft.com/rembrandt-demo/painting/borststuk-van-een-man-met-een-gevederde-baret has "Zelfportret" as its Other/former title, but it is not to be confused with "Zelfportret".
- renumbered the files and saved them under names that include the most important words from the title (see the tables below)
- put each XML element on its own line
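Putting each element on its own line lets the later grep and sed steps work line by line. A minimal sketch of that step, assuming `><` only ever occurs between two tags; `split_elements` is a hypothetical helper, not necessarily how these files were produced:

```bash
# Hypothetical helper: insert a newline between every "><" pair so that each
# XML element starts on its own line. Reads the given files, or stdin if none.
# Caveat: it would also split a "><" that appears inside text or CDATA.
split_elements() {
  awk '{ gsub(/></, ">\n<"); print }' "$@"
}

# Example (hypothetical file names): split_elements raw.xml > 02_Aristoteles.xml
```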
File Kinds
The 11 Rembrandt XMLs include files (<link_file_record>) of the following kinds:
kind | tag | given as | ext | handling |
---|---|---|---|---|
image | <file.image> | file name | tif | Original TIFs are 50-200 MB each. Download as JPG (smallest rendition), store and serve through Nuxeo (in a popup window or as a block element) |
link | <file.application> | absolute link | | Save the link, open in a popup window |
html excerpt | <file.application> | object embed | swf | Save the HTML excerpt and serve it (in a popup window or as a block element). It uses ScribdViewer.swf to render, so the browser must support Flash. E.g. Mh0146-Letter-1877-Hopman.htm |
We can cross-check that these are all the tags and file extensions: both of these (bash) commands return nothing:
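As a sketch of one such check (hypothetical, not necessarily the commands meant here), the function below prints every `<file.image>` value that does not end in `.tif`, so empty output means every image value has the expected extension. It assumes an image value never contains `<`; `odd_image_values` is a hypothetical name.

```bash
# Hypothetical helper: print every <file.image> value that does not end in
# ".tif" (case-insensitive). No output means all image values look like TIFs.
# Reads the given XML files, or stdin if none are given.
odd_image_values() {
  grep -hoE '<file\.image>[^<]*' "$@" |
    sed 's/^<file\.image>//' |
    grep -viE '\.tif$'
}

# Example: odd_image_values *.xml
```

A value with no extension at all, like the NicolaesTulp one listed under Number of Files, would be printed by this check.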
Number of Files
We can count the different kinds of files with commands like these:
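A sketch of such a count that prints one row per XML file in the same layout as the table below. It assumes each element sits on its own line and that a `<file.application>` value starting with `http` is a link while any other one is an HTML excerpt; `count_kinds` is a hypothetical name.

```bash
# Hypothetical helper: for each XML file, print "file | images | links | html |".
# Assumption: a <file.application> value starting with "http" is a link,
# any other <file.application> value is an HTML excerpt.
count_kinds() {
  for f in "$@"; do
    images=$(( $(grep -o '<file\.image>' "$f" | wc -l) ))
    apps=$(( $(grep -o '<file\.application>' "$f" | wc -l) ))
    links=$(( $(grep -o '<file\.application>http' "$f" | wc -l) ))
    printf '%s | %d | %d | %d |\n' "$f" "$images" "$links" "$((apps - links))"
  done
}

# Example: count_kinds *.xml
```

This counts elements, so a `<file.image>` holding two names counts once.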
We can cross-check with this command:
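One way such a cross-check could work, sketched under assumptions: per file, compare the number of `<link_file_record>` elements (assumed to carry no attributes) with the number of file names actually found, counting each `.tif` inside a `<file.image>` value plus each `<file.application>`. A value with no extension, a record with no `<file.image>`, or two names in one element all make the numbers disagree. `cross_check` is a hypothetical name.

```bash
# Hypothetical helper: print only the XML files where the number of records
# differs from the number of file names found in them.
cross_check() {
  for f in "$@"; do
    records=$(( $(grep -o '<link_file_record>' "$f" | wc -l) ))
    tifs=$(( $(grep -oE '<file\.image>[^<]*' "$f" | grep -oi '\.tif' | wc -l) ))
    apps=$(( $(grep -o '<file\.application>' "$f" | wc -l) ))
    if [ "$records" -ne $((tifs + apps)) ]; then
      echo "$f: records=$records tifs=$tifs applications=$apps"
    fi
  done
}

# Example: cross_check *.xml
```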
That's how I caught these problems (accounted for in the Data Migration Spec):
- 08_NicolaesTulp.xml: <file.image>mh0146_front_nldetail_1997_038</file.image>
Missing file extension
- 06_Flora.xml: <file.image>N-4930-00-000096-017-PYR.tif / N-4930-00-000096-018-PYR.tif</file.image>
Some bright mind put two images in one element
XML file | images | links | html excerpts |
---|---|---|---|
02_Aristoteles.xml | 57 | 5 | |
03_Batseba.xml | 12 | 5 | |
04_HermanDoomer.xml | 8 | 6 | |
05_BadendeSusana.xml | 42 | 1 | 2 |
06_Flora.xml | 35 | | |
07_man_met_baret.xml | 18 | 2 | |
08_NicolaesTulp.xml | 64 | 4 | |
09_man_in_orientaalse.xml | 10 | 2 | |
10_oude_vrouw.xml | 3 | | |
11_Andromeda.xml | 46 | | |
12_lachende_man.xml | 28 | 2 | |
Number of Images
- extract XML file names and images within each
- extract image names only
- check for duplicates
Only N-4930-00-000052-PYR.tif is duplicated: it is mentioned twice in Flora.
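The extraction and duplicate check above can be sketched as follows, assuming each element sits on its own line; the function names are hypothetical. Splitting a value on " / " keeps an element that holds two names (as in Flora) from hiding an image.

```bash
# Hypothetical helpers, assuming one element per line.
# list_images: one "xml-file<TAB>image" row per image name; a value holding
# several names separated by " / " becomes several rows.
list_images() {
  for f in "$@"; do
    grep -oE '<file\.image>[^<]*' "$f" |
      awk -v xml="$f" '{
        sub(/^<file\.image>/, "")
        n = split($0, names, " / ")
        for (i = 1; i <= n; i++) print xml "\t" names[i]
      }'
  done
}

# image_names: the image names only (second column).
image_names() { list_images "$@" | cut -f2; }

# duplicate_images: names that occur more than once across the given files.
duplicate_images() { image_names "$@" | sort | uniq -d; }

# Example: duplicate_images *.xml
```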
- get all images at Height=400 pixels (0images-get.bat):
- Out of 323 images, 147 (45.5%) are missing. Each of these files is only 17 bytes and contains this text:
- Save the missing image names in 0images-missing.txt
- Make a full table of files, images, and status (MISSING or not)
- Convert it to Excel (0images-files.xlsx) and add a pivot chart (hint: add Status to both Legend Fields and Values!)
- To check what the data migration has produced:
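The missing-image steps in the list above (finding the 17-byte placeholders, saving their names, building the status table) can be sketched like this. It assumes the renditions were downloaded into one folder as `<name>.jpg` and that the status table starts from a tab-separated list of XML file and image name; the function names and `0images-list.tsv` / `0images-files.tsv` are hypothetical.

```bash
# Hypothetical helper: print the names of downloaded .jpg files that are
# exactly 17 bytes, i.e. the placeholder returned instead of an image.
missing_images() {
  find "${1:-.}" -type f -name '*.jpg' -size 17c | sed 's|.*/||' | sort
}

# image_status LIST MISSING: LIST has "xml-file<TAB>image.tif" rows, MISSING
# has one .jpg name per line; prints each row with MISSING or OK appended.
image_status() {
  awk -F '\t' -v OFS='\t' '
    FILENAME == ARGV[1] { missing[$0] = 1; next }
    {
      jpg = $2
      sub(/\.[Tt][Ii][Ff]$/, ".jpg", jpg)
      print $1, $2, ((jpg in missing) ? "MISSING" : "OK")
    }' "$2" "$1"
}

# Example:
#   missing_images jpg > 0images-missing.txt
#   image_status 0images-list.tsv 0images-missing.txt > 0images-files.tsv
```

The tab-separated output can be opened in Excel for the pivot chart.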
Main Image
The main image (shown in search results) has these qualifiers:
- file.spec.overall_detail/value[@lang="neutral"]="OVERALL"
- file.spec.front_back/value[@lang="neutral"]="FRONT"
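A sketch of the query these qualifiers suggest, listing every image whose record carries both of them. It assumes each element sits on its own line, that the `file.spec.*` qualifiers belong to the same `<link_file_record>` as the `<file.image>` they describe, and that the neutral value is written as `<value lang="neutral">`; `main_image_candidates` is a hypothetical name.

```bash
# Hypothetical helper: print "xml-file<TAB>image" for every record whose
# neutral qualifiers are overall_detail = OVERALL and front_back = FRONT.
# Records without a <file.image> are skipped.
main_image_candidates() {
  awk '
    /<link_file_record>/           { image = ""; overall = 0; front = 0; spec = "" }
    /<file\.image>/                { image = $0
                                     sub(/.*<file\.image>/, "", image)
                                     sub(/<\/file\.image>.*/, "", image) }
    /<file\.spec\.overall_detail>/ { spec = "overall" }
    /<file\.spec\.front_back>/     { spec = "front" }
    spec == "overall" && /lang="neutral"[^>]*>OVERALL</ { overall = 1 }
    spec == "front" && /lang="neutral"[^>]*>FRONT</     { front = 1 }
    /<\/file\.spec\./              { spec = "" }
    /<\/link_file_record>/         { if (overall && front && image != "") print FILENAME "\t" image }
  ' "$@"
}

# Example: main_image_candidates *.xml
```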
Unfortunately there are many candidate images per painting, as shown in the table below.
So the main image cannot be determined from a query.
The main image (shown in bold below) has to be marked manually, using rdf:type rso:E38_Main_Image.
(In 2 cases we downloaded better images from the web.)
XML file | candidate main images (.jpg) |
---|---|
02_Aristoteles | DT219367 177463 241722 223525 272638 DP147597 |
03_Batseba | DT226609 23941 115937 DT225592 263034 DP145915 |
04_HermanDoomer | DP145921 73943 206843 DT2102 DT230089 mh0147_front_uv_1982 |
05_BadendeSusana | mh0147_front_nl_2002 mh0147_front_uv_1985 mh0147_front_rl_1982 |
06_Flora | N-4930-00-000052-PYR N-4930-00-000094-PYR N-4930-00-000096-027-PYR N-4930-00-000096-029-PYR N-4930-00-000096-032-PYR N-4930-00-000096-035-PYR N-4930-00-000052-PYR |
07_man_met_baret | mh0149_front_nl_2010 mh0149_front_nl_1999_007 |
08_NicolaesTulp | mh0146_front_nl_1998 mh0146_front_nlsimulation_1998 mh0146_damages002_1877 (one with no <file.image>) |
09_man_in_orientaalse | DT509 47208 47208_1941 143570 263045 DP121368 207750 |
10_oude_vrouw | mh0610_front_nl_2008 (Downloaded), mh0610_front_uv_2008 |
11_Andromeda | mh0707_irr_2001 (Downloaded), mh0707_back_nl_2001_001 |
12_lachende_man | mh0598_front_nl_1998_001 mh0598_front_eer_1970_001 mh0598_irp_1970 mh0598_front_rl_1998_001 mh0598_front_nl_1998_007 mh0598_front_uv_1998_001 |
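Once the main image has been picked by hand, the marking itself can be generated. A minimal sketch that turns a list of chosen image names into Turtle triples; the `img:` prefix and the use of the image name as the local name are hypothetical, and `@prefix` declarations for `img:`, `rso:` and `rdf:` (http://www.w3.org/1999/02/22-rdf-syntax-ns#) still have to be added before loading.

```bash
# Hypothetical helper: read one chosen image name per line from stdin and
# print a Turtle triple marking it as the main image. Blank lines are skipped;
# the "|| [ -n ... ]" also handles a last line without a trailing newline.
main_image_triples() {
  while IFS= read -r name || [ -n "$name" ]; do
    if [ -n "$name" ]; then
      printf 'img:%s rdf:type rso:E38_Main_Image .\n' "$name"
    fi
  done
}

# Example (hypothetical file name): main_image_triples < main-images.txt
```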