Documents and Files about a Painting

Get XMLs

The XMLs of the paintings are obtained according to Rembrandt data#Data Sources (RS-57@jira)

File Kinds

The 11 Rembrandt XMLs include files (<link_file_record>) of the following kinds:

| kind | tag | given as | ext | handling |
| --- | --- | --- | --- | --- |
| image | <file.image> | file name | tif | Original TIFs are 50-200 MB in size. Download as JPG (smallest rendition), store and serve through Nuxeo (in a popup window or as a block element) |
| link | <file.application> | absolute link | pdf | Save the link, open in a popup window |
| html excerpt | <file.application> | object embed | swf | Save the HTML excerpt, serve (in a popup window or as a block element). Uses ScribdViewer.swf to render, so the browser must support Flash. E.g. Mh0146-Letter-1877-Hopman.htm |

We can cross-check that these are all the tags and file extensions: both of these (bash) commands return nothing:
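A check of this kind can also be sketched in Python. This is only a minimal sketch, not the commands referred to above. It assumes that file references are direct children of <link_file_record> named file.<kind>, and that file.spec.* elements are qualifiers rather than files. The names EXPECTED and unexpected_files are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Expected tag -> extensions, taken from the table of file kinds.
EXPECTED = {
    "file.image": (".tif",),
    "file.application": (".pdf", ".swf"),
}


def unexpected_files(xml_text):
    """Return (tag, text) pairs that do not match the table of file kinds."""
    problems = []
    root = ET.fromstring(xml_text)
    for record in root.iter("link_file_record"):
        for child in record:
            # Assumption: only direct children named file.<kind> are file
            # references; file.spec.* elements are skipped as qualifiers.
            if not child.tag.startswith("file.") or child.tag.startswith("file.spec."):
                continue
            # Serialize the element so that embedded markup (such as an
            # object embed) is searched as well as plain text.
            content = ET.tostring(child, encoding="unicode").lower()
            extensions = EXPECTED.get(child.tag)
            if extensions is None or not any(ext in content for ext in extensions):
                problems.append((child.tag, (child.text or "").strip()))
    return problems
```

An empty list plays the role of "returns nothing"; anything else is a tag or extension that the table does not cover.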

Number of Files

We can count the different files with commands like these
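As a rough illustration, the per-file counts could be produced along these lines in Python. This is a sketch under assumptions: a <file.application> whose content mentions ".swf" is counted as an html excerpt, and any other <file.application> as a link. The function name count_file_kinds is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_file_kinds(xml_text):
    """Count images, links and html excerpts in one painting XML."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for record in root.iter("link_file_record"):
        counts["images"] += sum(1 for _ in record.iter("file.image"))
        for app in record.iter("file.application"):
            # Assumption: html excerpts are recognised by the .swf viewer
            # they embed; everything else is treated as a plain link.
            content = ET.tostring(app, encoding="unicode").lower()
            counts["html" if ".swf" in content else "links"] += 1
    return dict(counts)
```

Kinds that do not occur in a file are simply absent from the result, which corresponds to an empty cell in the table of counts. Note that an element holding two image names is still counted once, so this count alone does not catch that problem.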

We can cross-check with this command:

That's how I caught these problems (accounted for in Data Migration Spec):

  • 08_NicolaesTulp.xml: <file.image>mh0146_front_nldetail_1997_038</file.image>
    The file extension is missing
  • 06_Flora.xml: <file.image>N-4930-00-000096-017-PYR.tif / N-4930-00-000096-018-PYR.tif</file.image>
    Some bright mind put two images in one element

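| painting | images | links | html |
| --- | --- | --- | --- |
| 02_Aristoteles.xml | 57 | | 5 |
| 03_Batseba.xml | 12 | | 5 |
| 04_HermanDoomer.xml | 8 | | 6 |
| 05_BadendeSusana.xml | 42 | 1 | 2 |
| 06_Flora.xml | 35 | | |
| 07_man_met_baret.xml | 18 | | 2 |
| 08_NicolaesTulp.xml | 64 | | 4 |
| 09_man_in_orientaalse.xml | 10 | | 2 |
| 10_oude_vrouw.xml | 3 | | |
| 11_Andromeda.xml | 46 | | |
| 12_lachende_man.xml | 28 | | 2 |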

Number of Images

  • extract XML file names and images within each
  • extract image names only
  • check for duplicates

    Only N-4930-00-000052-PYR.tif: mentioned twice in Flora

  • get all images at size Height=400 pixels (0images-get.bat):
  • Out of 323 images, 147 images (45%) are missing. Each is 17 bytes and includes this text:
  • Save missing image names in 0images-missing.txt
  • Make full table of files, images, and status (MISSING or not)
  • Convert to Excel (0images-files.xlsx), add a pivot chart (hint: add Status to both Legend Fields and Values!)
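The bookkeeping steps above (duplicates, missing images, status table) can be sketched in Python as follows. This is a minimal sketch, not the scripts actually used. It assumes the downloaded files sit in one folder and that a missing image is recognisable purely by its 17-byte size. The names PLACEHOLDER_SIZE, duplicate_names, missing_images and status_rows are made up, and so is the "OK" status label.

```python
from collections import Counter
from pathlib import Path

# Size in bytes of the placeholder returned for a missing image (see above).
PLACEHOLDER_SIZE = 17


def duplicate_names(names):
    """Return the image names that are mentioned more than once."""
    return sorted(name for name, n in Counter(names).items() if n > 1)


def missing_images(folder):
    """Return the stems of downloaded files that are only a placeholder."""
    return sorted(
        p.stem
        for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_size == PLACEHOLDER_SIZE
    )


def status_rows(images_by_xml, missing_stems):
    """Build (xml file, image, status) rows for the full table."""
    rows = []
    for xml_name, images in images_by_xml.items():
        for image in images:
            # Compare by stem, since the XML names end in .tif while the
            # downloaded renditions are JPGs. "OK" is an assumed label.
            status = "MISSING" if Path(image).stem in missing_stems else "OK"
            rows.append((xml_name, image, status))
    return rows
```

The rows can be written out as CSV and opened in Excel for the pivot chart.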

  • To check what the data migration has produced:

Main Image

The main image (shown in search results) has these qualifiers:

  • file.spec.overall_detail/value[@lang="neutral"]="OVERALL"
  • file.spec.front_back/value[@lang="neutral"]="FRONT"

Unfortunately, there are many candidate images per painting, as shown in the table below.
So the main image cannot be determined from a query.
The main image (shown in bold below) has to be marked manually, using rdf:type rso:E38_Main_Image.
(In 2 cases we downloaded better images from the web.)

| .xml file | main .jpg images |
| --- | --- |
| 02_Aristoteles | DT219367, 177463, 241722, 223525, 272638, DP147597 |
| 03_Batseba | DT226609, 23941, 115937, DT225592, 263034, DP145915 |
| 04_HermanDoomer | DP145921, 73943, 206843, DT2102, DT230089, mh0147_front_uv_1982 |
| 05_BadendeSusana | mh0147_front_nl_2002, mh0147_front_uv_1985, mh0147_front_rl_1982 |
| 06_Flora | N-4930-00-000052-PYR, N-4930-00-000094-PYR, N-4930-00-000096-027-PYR, N-4930-00-000096-029-PYR, N-4930-00-000096-032-PYR, N-4930-00-000096-035-PYR, N-4930-00-000052-PYR |
| 07_man_met_baret | mh0149_front_nl_2010, mh0149_front_nl_1999_007 |
| 08_NicolaesTulp | mh0146_front_nl_1998, mh0146_front_nlsimulation_1998, mh0146_damages002_1877 (one with no <file.image>) |
| 09_man_in_orientaalse | DT509, 47208, 47208_1941, 143570, 263045, DP121368, 207750 |
| 10_oude_vrouw | mh0610_front_nl_2008 (Downloaded), mh0610_front_uv_2008 |
| 11_Andromeda | mh0707_irr_2001 (Downloaded), mh0707_back_nl_2001_001 |
| 12_lachende_man | mh0598_front_nl_1998_001, mh0598_front_eer_1970_001, mh0598_irp_1970, mh0598_front_rl_1998_001, mh0598_front_nl_1998_007, mh0598_front_uv_1998_001 |
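To see why a query alone is not enough, the two qualifiers can be evaluated per file record, which typically yields several candidates. The Python below is a sketch under assumptions: it takes the file.spec.overall_detail and file.spec.front_back elements to sit inside the same <link_file_record> as the <file.image>. The function names neutral_value and main_image_candidates are hypothetical.

```python
import xml.etree.ElementTree as ET


def neutral_value(record, tag):
    """Return the text of <tag>/value[@lang="neutral"] inside a record, if any."""
    for spec in record.iter(tag):
        for value in spec.iter("value"):
            if value.get("lang") == "neutral":
                return (value.text or "").strip()
    return None


def main_image_candidates(xml_text):
    """List the images qualified as both OVERALL and FRONT."""
    root = ET.fromstring(xml_text)
    candidates = []
    for record in root.iter("link_file_record"):
        # Assumption: the qualifiers live in the same record as the image.
        if (neutral_value(record, "file.spec.overall_detail") == "OVERALL"
                and neutral_value(record, "file.spec.front_back") == "FRONT"):
            candidates.extend(
                (img.text or "").strip() for img in record.iter("file.image")
            )
    return candidates
```

When this returns more than one name for a painting, the choice still has to be made by hand and recorded with rdf:type rso:E38_Main_Image.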