
Specification for migrating Rembrandt data to CRM


In RS3.1 we don't migrate the following fields:

  • Various Remarks, since they map to Annotations and we have still not decided how to do it (see Alternatives in Property Types and Annotations)
    • By way of example, we have mapped <toeschrijving> Attribution, which is the most complex case.
      It includes author (source), qualification, remark, date; while others have only Remark
  • Group <link_sample_record> (12 fields), since that maps to Image annotation
  • Several fields with unknown meaning, typically "x" (see susana.ttl for details):
    <positie_signatuur_ref>, <reference_image.front>, <reference_image.back>
  • Key fields that are outside the corresponding record (so cannot be correlated reliably), and whose relation is unclear:
    <link_research_record>.<link_documentation_record.lref>, <link_research_record>.<link_documentation_record.lref>
  • File locations that are missing in some records, "NGL" in one, and "RKD" in all others (we'll use only our own IIP server):
    <file.image.location>, <file.application.location>

Special Selection

<collectie> Collection

The last record (CURRENT COLLECTION) has special treatment (see susana.ttl). In 05_Susanna, the records are in chronological order.

  • If <begindatum_in_collectie> are not in chronological order, throw an exception
  • If <einddatum_in_collectie> are not in chronological order, throw an exception
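
The two checks above could be sketched as follows. This is a minimal sketch, assuming the dates have already been normalized to ISO form (so that string comparison matches date order) and that empty dates are skipped; the function name check_chronological is hypothetical.

```python
def check_chronological(dates, field):
    """Raise ValueError if the non-empty dates are not in non-decreasing order."""
    present = [d for d in dates if d]  # assumption: missing/empty dates are skipped
    for earlier, later in zip(present, present[1:]):
        if later < earlier:  # ISO-formatted strings compare like dates
            raise ValueError(
                f"<{field}> not in chronological order: {earlier} > {later}"
            )
```

It would be called once per field, e.g. once with all <begindatum_in_collectie> values of an object and once with all <einddatum_in_collectie> values.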

<toeschrijving> Attribution

The first <toeschrijving> record says only "Rembrandt". The second has more data, and the third is bogus (checked in XMLs 02..07). Therefore:

  • If there is more than one <toeschrijving> record, use the second one; otherwise use the only one.
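
As a small illustration of this rule, assuming the <toeschrijving> records of one object are available as a list in document order (the function name select_attribution is hypothetical):

```python
def select_attribution(records):
    """Return the <toeschrijving> record to migrate, or None if there are none."""
    if not records:
        return None
    # More than one record: the first says only "Rembrandt", so take the second.
    return records[1] if len(records) > 1 else records[0]
```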

Special Field Handling

Duplicate fields

Many fields are duplicate: NL tag at outer level, and EN tag within <object_number_RKDtechnical>.
These pairs are listed in the comments column of Rembrandt data#Reduced Sample Record, and in comments in susana.ttl.
They are handled in two different ways, described below.


For multi-value fields (denoted "," in the comment): emit both, eg


For single-value fields (denoted "=" in the comment): emit only the NL field, ignoring EN, eg:

(here we deal with two fields at once)

For thesaurus fields, coordinate with Maria whether to use the NL or EN field, eg

  • we prefer bilingual thesaurus (RKDtechnical)
  • RKD prefers Dutch thesaurus (RKDimages) since it's more authoritative (RKD are in the process of cleaning and merging thesauri)

Text fields

  • Language tags: for a free text field coming from a NL tag, emit @nl. For an EN tag, emit @en
  • If a text field includes quotes or newlines, emit it as a long (triple-quoted) Turtle string, eg
  • This includes the following fields (because of quote):
    <file.application>, <literature>, <literatuur>, <research.reason_objective><value>
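
The text-field rules above might be implemented along these lines. This is only a sketch: the function name turtle_literal is hypothetical, and the escaping of a quote that is followed by another quote (or by the end of the text) is an added precaution so that the text cannot be confused with the closing triple quote.

```python
import re

def turtle_literal(text, lang):
    """Format text as a Turtle literal with a language tag ("nl" or "en")."""
    escaped = text.replace("\\", "\\\\")  # backslashes must always be escaped
    if any(ch in text for ch in '"\n\r'):
        # Long (triple-quoted) form: quotes and newlines may appear raw, but a
        # quote followed by another quote or by the end of the text is escaped.
        escaped = re.sub(r'"(?="|\Z)', r'\\"', escaped)
        return f'"""{escaped}"""@{lang}'
    return f'"{escaped}"@{lang}'
```

For instance, turtle_literal("Rembrandt", "nl") yields "Rembrandt"@nl, while a value containing a newline is wrapped in triple quotes.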


Extract <datering>

The painting's date is present in two fields, which are not always consistent and which may include text:

file | <datering> | <date>
02_Aristoteles | 1653 gedateerd | 1653 (dated)
03_Batseba | 1643 | 1643 (dated)
04_HermanDoomer | 1640 gedateerd | 1640 (dated)
05_BadendeSusana | 1636 | 1636 (dated)
06_Flora | 1635 gedateerd | 1635
07_man_met_baret | rond het midden of in de tweede helft van de jaren 1630 | ca. 1635-1640
08_NicolaesTulp | 1632 gedateerd | 1632 (dated)
09_man_in_orientaalse | 1632 gedateerd | 1632 (dated)
10_oude_vrouw | na ca. 1631 | ca. 1660
11_Andromeda | 1630/1631 | 1630/1631
12_lachende_man | 1629/1630 | 1629/1630

To extract a useful date:

  • use <datering> and ignore <date> (this is an arbitrary decision)
  • look for numbers (digit sequences) in <datering>
  • emit as "YYYY"^^xsd:gYear
  • handle 1 or 2 dates (P82 vs P82a&P82b)
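
The extraction steps above can be sketched as follows. The function name datering_to_dates is hypothetical, the properties are given only by the short names used in this page (P82, P82a, P82b), and raising an error for more than two numbers is an assumption, since the spec does not cover that case.

```python
import re

def datering_to_dates(datering):
    """Map <datering> text to (property, literal) pairs; [] if it holds no number."""
    years = re.findall(r"\d+", datering)  # digit sequences, in order of appearance
    literals = [f'"{year}"^^xsd:gYear' for year in years]
    if not literals:
        return []  # nothing to emit
    if len(literals) == 1:
        return [("P82", literals[0])]
    if len(literals) == 2:
        return [("P82a", literals[0]), ("P82b", literals[1])]
    # Assumption: more than two numbers is unexpected and should be reviewed.
    raise ValueError(f"more than two numbers in <datering>: {datering!r}")
```

With the sample values from the table, "1653 gedateerd" gives a single P82 date of 1653, and "1630/1631" gives P82a 1630 and P82b 1631.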

Other Dates

For other date fields

  • Replace "/" with "-" (eg "1758/05/23" is not a valid xsd:date lexical value)
  • Assign types xsd:date vs xsd:gYearMonth vs xsd:gYear depending on the date form (yyyy-mm-dd vs yyyy-mm vs yyyy)
    (More elaborate handling of date vs gYearMonth vs gYear is not for RS3.1)
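
A possible shape for this normalization, assuming that a value matching none of the three forms should be reported rather than emitted (that choice, and the name typed_date, are not from the spec):

```python
import re

# Date forms and the XSD datatype each one maps to.
_DATE_TYPES = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "xsd:date"),
    (re.compile(r"\d{4}-\d{2}"), "xsd:gYearMonth"),
    (re.compile(r"\d{4}"), "xsd:gYear"),
]

def typed_date(value):
    """Return a typed Turtle literal for a date field value."""
    normalized = value.strip().replace("/", "-")
    for pattern, datatype in _DATE_TYPES:
        if pattern.fullmatch(normalized):
            return f'"{normalized}"^^{datatype}'
    raise ValueError(f"unrecognized date form: {value!r}")
```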

P82 vs P82a&P82b

Several fields can contain one or two dates

  • if there is one date, emit P82
  • if there are two dates, emit P82a (first date) and P82b (second date)

This applies to:

  • <datering>: painting (this single field can hold 2 dates, see Extract <datering>)
  • <begindatum_lijst>, <einddatum_lijst>: frame
  • <begindatum_tentoonstelling>, <einddatum_tentoonstelling>: exhibition
  • <begindatum_veiling>, <einddatum_veiling>: auction
  • <research.date_begin>, <research.date_end>: research


  • emit integer fields, eg <bedrag>=157, as a bare number (eg 157), which is equivalent to "157"^^xsd:integer
  • emit floating-point fields, eg <hoogte>=47,2, as eg "47.2"^^xsd:double
    • convert comma to dot
  • emit keys as string, even if they are numeric, eg
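
The three rules above could look like this in code. The function names are hypothetical, and the key formatter assumes that keys contain no quotes or backslashes.

```python
def emit_integer(value):
    """Bare Turtle integer, eg 157 (same as "157"^^xsd:integer)."""
    return str(int(value))  # int() also rejects non-numeric content

def emit_double(value):
    """Typed double with the decimal comma converted to a dot."""
    normalized = value.strip().replace(",", ".")
    float(normalized)  # validation only: raises ValueError if not a number
    return f'"{normalized}"^^xsd:double'

def emit_key(value):
    """Keys stay plain strings even when they look numeric."""
    return f'"{value}"'
```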

Trim white space

Trim leading/trailing white-space from all fields. Best to use a parser option for this.
Useful for:

  • Space before image file (06_Flora.xml)
  • empty field (just a newline)

Missing or Empty fields

Missing or empty fields MUST NOT emit any RDF. This includes:

  • missing elements
  • totally empty elements:
  • elements having only whitespace:

    This is very important, otherwise invalid or inconsistent TTL will result
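
If the parser has no trimming option, trimming and the empty-field rule can be combined in one accessor. The sketch below uses Python's xml.etree.ElementTree; the function name field_text and the sample record are illustrative only.

```python
import xml.etree.ElementTree as ET

def field_text(record, tag):
    """Return the trimmed text of child <tag>, or None if the element is
    missing, totally empty, or contains only whitespace."""
    element = record.find(tag)
    if element is None or element.text is None:
        return None
    text = element.text.strip()
    return text or None
```

Callers then emit RDF only when the result is not None, for example:

```python
record = ET.fromstring("<record><a> x.tif</a><b></b><c>\n</c></record>")
values = {tag: field_text(record, tag) for tag in ("a", "b", "c", "d")}
```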

Missing Frame

If there aren't any Frame fields (<begindatum_lijst>, <einddatum_lijst>, <naam_lijstenmaker>, <lijstmateriaal>) then:

  • don't emit any statements related to part/2:
  • crm:P57_has_number_of_parts should be 1 not 2
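
A short sketch of the frame check, assuming the object's fields are held in a dict of tag to trimmed text in which missing or empty fields are absent or None (the names FRAME_FIELDS and number_of_parts are hypothetical):

```python
FRAME_FIELDS = (
    "begindatum_lijst",
    "einddatum_lijst",
    "naam_lijstenmaker",
    "lijstmateriaal",
)

def number_of_parts(fields):
    """Return 2 (painting and frame) if any Frame field has a value, else 1."""
    has_frame = any(fields.get(tag) for tag in FRAME_FIELDS)
    return 2 if has_frame else 1
```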

Fake Values

Treat the following values as missing (i.e. don't emit)

  • <naam_koper> = "-"
  • <sample.name_number> = "x" (for RS3.3)

The following fields always have empty or fake value, so are simply ignored:

  • <collectie_afdrukken> <oorspronkelijke_lijst> <reference_image.back> <reference_image.front>
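
These rules can be kept in two small lookup tables so that new fake values are easy to add. A sketch with hypothetical names:

```python
# Values that mean "no data" for a specific field.
FAKE_VALUES = {
    "naam_koper": {"-"},
    "sample.name_number": {"x"},
}

# Fields that are always empty or fake and are never migrated.
IGNORED_FIELDS = {
    "collectie_afdrukken",
    "oorspronkelijke_lijst",
    "reference_image.back",
    "reference_image.front",
}

def should_emit(tag, value):
    """Return True if the field value should produce RDF."""
    if tag in IGNORED_FIELDS or value is None:
        return False
    value = value.strip()
    return bool(value) and value not in FAKE_VALUES.get(tag, set())
```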


The XMLs include references to various files, see Documentation, Files, Images#File types for details.
They are handled according to the following decision table ("content" means to check if element content starts with this):

source tag | content | target property | extra actions (and justification)
<file.image> | | rso:P3_has_image_file | steps 1-4:
  1. Split content on " / "
    06_Flora.xml: <file.image>N-4930-00-000096-017-PYR.tif / N-4930-00-000096-018-PYR.tif</file.image>
    Some bright mind put two images in one element
  2. Replace ".tiff?" with ".jpg"
  3. If no ".tif", append ".jpg"
    07_NicolaesTulp.xml: <file.image>mh0146_front_nldetail_1997_038</file.image>
    Missing file ext
  4. Remove any path name (.*/)
    06_Flora.xml: <file.image>/pics/pyramids/careofthecollection/For%20Conservation/Rembrandt%20Project/N-4930-00-000052-PYR.tif</file.image>
<file.application> | &lt | rso:P3_has_html | Decode HTML entities lt gt amp
<file.application> | http | rso:P3_has_url |
<file.application> | | | throw exception, printing the content
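
The decision table might translate into code roughly as below. The function names are hypothetical. Two details are assumptions: the extension match is case-sensitive, and html.unescape decodes all HTML entities, which is more than the lt, gt and amp listed in the table.

```python
import html
import re

def image_file_names(content):
    """Apply steps 1-4 to <file.image> content; returns a list of .jpg names."""
    names = []
    for name in content.split(" / "):  # 1. several images in one element
        name = name.strip()
        if re.search(r"\.tiff?$", name):
            name = re.sub(r"\.tiff?$", ".jpg", name)  # 2. .tif/.tiff -> .jpg
        else:
            name += ".jpg"  # 3. no .tif extension: append .jpg
        names.append(re.sub(r".*/", "", name))  # 4. remove any path name
    return names

def file_application(content):
    """Return (target property, value) for <file.application> content."""
    if content.startswith("&lt"):
        return ("rso:P3_has_html", html.unescape(content))
    if content.startswith("http"):
        return ("rso:P3_has_url", content)
    raise ValueError(f"unexpected <file.application> content: {content!r}")
```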


Some XML elements can repeat, which results in several nodes. We use counters to generate the URIs for these nodes.
The counters are reset to 1 at the start of every object and are incremented across the whole object, regardless of nesting:

  • object: obj/priref (root of these below)
  • parts: part/n (1=painting, 2=frame)
  • <andere_benaming>, <title.other_older>: title/other/n (both XML elements use one counter)
  • <artistiek>: related/n
  • <literatuur>, <bronnen>: reference/n (both XML elements use one counter)
  • <collectie>: collection/n
  • <tentoonstellingen>: exhibition/n
  • <veiling>: acquisition/n
  • <link_research_record>: research/n
  • <link_documentation_record>: document/n
  • <link_file_record>: file/n
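
One way to keep these counters is a small per-object helper that is created anew for each object, which resets every counter. The class name and the way the node path is appended to obj/priref are assumptions made for illustration.

```python
from collections import defaultdict

# XML element -> node path; elements sharing a path share one counter.
COUNTER_KEYS = {
    "andere_benaming": "title/other",
    "title.other_older": "title/other",
    "artistiek": "related",
    "literatuur": "reference",
    "bronnen": "reference",
    "collectie": "collection",
    "tentoonstellingen": "exhibition",
    "veiling": "acquisition",
    "link_research_record": "research",
    "link_documentation_record": "document",
    "link_file_record": "file",
}

class NodeCounters:
    """Generates node URIs for one object; counters start at 1."""

    def __init__(self, priref):
        self.base = f"obj/{priref}"  # assumption: nodes hang under the object URI
        self.counts = defaultdict(int)

    def next_uri(self, tag):
        key = COUNTER_KEYS[tag]
        self.counts[key] += 1  # counted per object, regardless of nesting
        return f"{self.base}/{key}/{self.counts[key]}"
```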


See Thesaurus Lookup function for details!

  • For thesaurus fields, call these:
    • LookupInThesaurusByLabel (String field, String label)
      for fields with simple content (eg <drager>)
    • LookupInthesaurusByLabels (String field, StringWithLang[] labels)
      for fields with <value> elements (eg <>). These include multiple labels with language tags
  • for <iconclass_code>, generate the URI yourself
    Remove spaces, replace "(...)" with "_..._", prepend "_" and rst-iconclass: namespace
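
The <iconclass_code> rule can be sketched as a short function. The name iconclass_uri is hypothetical, and the result is returned as a prefixed name; characters other than spaces and parentheses are passed through unchanged, which may need further handling.

```python
import re

def iconclass_uri(code):
    """Build the rst-iconclass: prefixed name for an <iconclass_code> value."""
    local = code.replace(" ", "")  # remove spaces
    local = re.sub(r"\(([^)]*)\)", r"_\1_", local)  # "(...)" -> "_..._"
    return "rst-iconclass:_" + local  # prepend "_" and the namespace
```

For an illustrative input such as "25F23(LION)", this gives rst-iconclass:_25F23_LION_.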