
RKD Thesauri, with area where used and examples

Name Size Creator Creation Date Comment  
PNG File eCulture data cloud.png 266 kB Vladimir Alexiev Oct 06, 2011 16:26    
PNG File Geographical terms.png 68 kB Maria Todorova Nov 18, 2011 16:14    
Text File Lookup_thesauri_property_file.txt 0.6 kB Maria Todorova Nov 23, 2011 17:21    
File Thesauri.rar 451 kB Mariana Damova Nov 30, 2011 12:07 The thesauri well-formed  

RKD Thesauri

Thesauri description

This table lists the Rembrandt thesauri, with a description of each thesaurus, the area in which it is used, whether a sample of the thesaurus has been provided, and questions/comments. The thesauri named Locations, IconClass, Artworks, People and Concepts are not provided by RKD but are taken from the eCulture data cloud (see PNG below).

Thesaurus Description Area ? Questions/Comments
Locations       Not used at the moment from RKD (see the explanation in geographical terms)
IconClass       Are you using IconClass thesaurus? Could you provide it to us? Yes, we are using IconClass in the RKDimages database. However, this is probably not exactly the same version as the one that you find in the eCulture Data Cloud. We will send the vocabulary that we are currently using for our Rembrandt data entry to you. WD, 04-11-2011
Artworks       Are you using Artworks thesaurus? Could you provide it to us? I think, but I'm not sure, that this thesaurus is a thesaurus of works of art in our RKDimages database. It has been mapped for the eCulture data cloud as part of the Multimedian project some years ago (http://e-culture.multimedian.nl/). No one from our team was involved in this and we know very little about it I'm afraid. But if you have specific questions about it, we could maybe try to find someone who could answer those. WD, 04-11-2011
Concepts       Maria, what is this?
RKDartists (an elaborate thesaurus of artist names and other persons in the 'art historical scene', with information on name variants, life dates, dates & places of activity, references to publications, etc.) RKDArtists   Is the RKDartists thesaurus part of the People thesaurus? I think the database RKDartists was referred to as "People" when it was mapped for the Multimedian Project. Since then, the database has been worked on on a daily basis (adding new records, but also cleaning up of old records), so I think the RKDartists & thesaurus as we use it today, is different from the People thesaurus in the e-culture cloud. For your information: in RKDimages we also use a couple of separate vocabularies with names of individual people (such as private collectors), which we would like to merge with RKDartists in the end. But this is a very time consuming task and I don't see this happening in the next year or even the year after that. We will just send you what we use, that will make this issue more clear I think. WD 04-11-2011.
geographical terms cities, countries  

Y
Is Geographical terms thesaurus the same as Locations thesaurus? Same here as above. Since the mapping to the e-culture data cloud a lot of work has been done to clean up this thesaurus. We will send you the version that we use at the moment. WD, 04-11-2011
See Vocabulary GEOGRAPHICAL THESAURUS (Dutch).xml
institution names museums, laboratories, etc RKDImages, RKDTechnical    
whereabouts type museum, private collection, church, etc. RKDImages, RKDTechnical
Y
see Vocabularies RKDtechnical ALL TERM TYPES.xml
object type e.g. painting, sculpture, drawing, etc. RKDImages, RKDTechnical
Y
see Vocabularies RKDtechnical ALL TERM TYPES.xml
shape e.g. vertical rectangle, oval, etc. RKDImages, RKDTechnical
Y

see Vocabularies RKDtechnical ALL TERM TYPES.xml
support The type of material which a work of art has been made on, such as canvas, wood or stone, 208 terms, English and Dutch. This is a very simple vocabulary, with no hierarchical or other relations and no scope notes. You will see that we have not only translated the terms relevant for Rembrandt paintings, but also terms relevant for other types of objects (drawings, prints, etc.) RKDImages, RKDTechnical
Y

Y
see Vocabulary SUPPORT RKDimages(Dutch)
In "Vocabulary Support(English).csv" (RKDTechnical) in column term.type there are records with type "Object type", "Technique" - are these errors, or should we treat all values as Support? One term can have multiple term types. In this case, treat all terms as "Support". WD, 04-11-2011
Are you storing all thesauri for RKDImages (or RKDTechnical) in one table in the DB and distinguishing them by the values in term.type? If so, you could provide them to us without separating them by category. In RKDtechnical, we are storing all vocabularies in three different "tables" in the database: 1 for geographical terms, 1 for persons & institutions names and 1 for all other terms/concepts. For all of these, we could send you one csv file. However, one term can belong to multiple term types. In the csv file, you will only see one of those. If we send them to you separately, you will find that some terms are in several csv files. So this gives more information we think. Do you agree? In RKDimages all vocabularies are in several tables, so we need to send those as separate files anyway. WD, 04-11-2011.
see Vocabularies RKDtechnical ALL TERM TYPES.xml
Technique e.g. oil paint, pen and brown ink, pencil, etc. RKDImages, RKDTechnical
Y

see Vocabularies RKDtechnical ALL TERM TYPES.xml
qualification attribution e.g. after, possibly, studio of, etc. RKDImages, RKDTechnical
Y

see Vocabularies RKDtechnical ALL TERM TYPES.xml
In RKDtechnical:        
persons names researchers, conservators, etc. (Will be integrated with RKDartists in due time) RKDTechnical   Is Persons names thesaurus part of People thesaurus? No, at this moment it is separate. We would like to merge it, but this will not happen soon. WD, 04-11-2011
research type e.g. x-radiography, normal light studies, dendrochronology, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
analytical techniques techniques applied on paint samples. (Will be integrated with "research types" shortly) RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
equipment specific cameras, microscopes or other type of equipment used, with specifications RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
computer hardware hardware that was used to create certain documentation, such as scanners RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
software software that was used to create certain documentation RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
research reason / objective e.g. conservation, publication, exhibition, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
object status e.g. before treatment, during treatment, after treatment, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
area captured part of the painting/paint sample captured in an image, e.g. back overall, front upper right corner, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
magnification for images taken with a light- or stereomicroscope RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
document type e.g. slide 35 mm, digital-born color photograph, research report, etc. The type of written document or image created when researching or treating a work of art, 119 terms, English, Dutch and German. This vocabulary has a hierarchical structure. Other relations or scope notes are not yet included (but will be at some point). RKDTechnical Y
In "Vocabulary DOCUMENTATION Type (English).csv":
1. In column term.type there are records with type "Object type", "Technique" and "Area captured" - are these errors, or should we treat all values as Document types? One term can have multiple term types. In this case, treat all terms as "Document types". WD, 04-11-2011
2. In column term.status there are values with status "candidate" - how should we treat these values (to show them to the user or not)? Please, provide if there are other statuses. Please ignore these values, they don't mean anything at the moment. WD, 04-11-2011
3. We'll use broader_term but ignore narrower_term, since one row can have multiple narrower terms. Is this true? Yes! WD, 04-11-2011

see Vocabularies RKDtechnical ALL TERM TYPES.xml
documentation whereabouts location where the documentation is kept within an institution, e.g. conservation studio, library, archive, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
documentation number type numbering system which is used for certain groups of documentation within an institution, e.g. inventory number, registration number, negative number, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
reason for sampling reason why a paint sample was taken, e.g. conservation, attribution, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
sample type e.g. cross-section, dispersed sample, varnish sample, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
location/area type for paint samples, e.g. flesh, foliage, sky, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
location/area color for paint samples, e.g. red, brown, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
paint defects for paint samples, e.g. smalt discoloration, saponification of lead white, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
paint layer function for paint samples, e.g. ground, surface paint layer, varnish, etc. RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
field for images of paint samples, e.g. bright field, dark field, etc RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml
light for images of paint samples, e.g. normal light, uv, etc RKDTechnical Y see Vocabularies RKDtechnical ALL TERM TYPES.xml

Please note that due to the duplication between parts of RKDtechnical and RKDimages as described above, there is also a duplication in some of the controlled vocabularies we use. For instance, both databases use controlled vocabularies of institution names, but these are not entirely identical. We intend to have everything integrated in Spring 2012.

Data to Thesauri Mapping

The table below maps the fields in Susanna to the thesauri provided for the Rembrandt project. The first column shows the thesaurus name.

Thesaurus names that are struck through have not been received from RKD; those fields will be treated as free text until further decisions are made. Questions and comments are in the last column, "Questions/Comments".

Thesaurus URI of the thes. value Example Label <tag> translation Questions/Comments
Artworks     <benaming_kunstwerk> object title This is free text
Artworks     <andere_benaming> other/former title This is free text
rkd-plaats rkd-plaats:amsterdam "Amsterdam" <vervaardigd_plaats_land> country/place of making  
rkd-shape rkd-shape:vertical-rectangle "staande rechthoek"@nl <vorm> shape  
rkd-support rkd-support:panel--oak "paneel (eikenhout)"@nl <drager> support  
rkd-technique rkd-technique:oil-paint "olieverf"@nl <materiaal> medium/technique  
rst-iconclass rst-iconclass:_71P412
"Susanna bathing, usually in or near a fountain and sometimes accompanied by two female servants"@en.
<iconclass_code> iconclass code Thesaurus created by Ontotext
rkd-keywords rkd-keywords:oude_testament_apocriefen "oude testament & apocriefen"@nl <RKD_algemene_trefwoorden> RKD keywords Do you support KeyWords thesaurus? Could you provide it to us? We will provide you with the vocabulary "RKD algemene trefwoorden". WD, 04-11-2011
Thesaurus created by Ontotext
rkd-plaats rkd-plaats:geertruidenberg "Geertruidenberg" <plaats> depicted location  
rkd-frame rkd-frame:wood--gold_plated
"hout, gestoken en verguld"@nl
<lijstmateriaal> material of frame Do you use Frame materials thesaurus (Support thesaurus for frames)? Yes, will send it to you. WD, 04-11-2011
Thesaurus created by Ontotext
RKDartists     <naam_lijstenmaker> name of frame maker This is free text. WD 04-11-2011
rkd-artists rkd-artists:Rembrandt crm:P3_has_note "Rembrandt"
<naam> artist name Thesaurus created by Ontotext
rkd-plaats rkd-plaats:amsterdam "Amsterdam" <land_plaats_anoniem> city or country if anonymous  
rkd-artists rkd-artists:WillemDePoorter
crm:P3_has_note  "Poorter, Willem de"
<kunstenaar_art_verb> artist of related object Thesaurus created by Ontotext
rkd-collection rkd-collection:Mauritshuis
crm:P3_has_note "Koninklijk Kabinet van Schilderijen Mauritshuis"@nl
<collectienaam> collection name Royal Cabinet of Paintings Mauritshuis
Thesaurus created by Ontotext
 
      <bruikleen_naam> (MT) loan name Could the owner in <bruikleen_naam> be individual? In this case which thesauri do we use? Please ignore this field, we will no longer use it. WD, 04-11-2011
rkd-plaats rkd-plaats:amsterdam "Amsterdam" <bruikleen_plaats> (MT) loan place Should <bruikleen_plaats> be ignored, given that <bruikleen_naam> is ignored? If not, what does it mean?
rkd-type_where rkd-type-where:private-collection "particuliere collectie"@nl <soort_collectie_verblijfplaats> collection type  
rkd-plaats rkd-plaats:den-haag "Den Haag"
<plaats_collectie_verblijfplaats> collection location  
?auction house     <veilinghuis_zc_nw> auction house Are the auction house names (<veilinghuis_zc_nw>) supported in the Institution names thesaurus, or is this field not linked to a thesaurus and free text? This is a separate vocabulary, will send it to you. WD, 04-11-2011
Maria: Free text due to missing thesaurus
rkd-plaats rkd-plaats:antwerpen "Antwerpen" <veilingplaats_zc_nw> auction location  
?People     <inbrenger> seller Are the sellers' names (<inbrenger>) supported in any of the People/Persons Names/RKDArtists thesauri, or is this field free text? Free text. WD 04-11-2011
?People     <naam_koper> buyer Are the buyers' names (<naam_koper>) supported in any of the People/Persons Names/RKDArtists thesauri, or is this field free text? Free text. WD 04-11-2011
rst-currency rst-currency:HFL "HFL" <munt> monetary unit Is there any existing thesaurus for currencies supported? Should we use it for <munt> field, or it is free text. There is a vocabulary, will send it to you. WD, 04-11-2011
Maria: Thesaurus created by Ontotext
institutions names     <instelling_tentoonstelling/> institution where exhibition takes place Maria: Free text due to missing thesaurus
rkd-plaats rkd-plaats:berlijn "Berlijn" <plaats_tentoonstelling> place of exhibition  
Artworks     <title>   This is free text. This field in RKDtechnical is duplicated in RKDimages, <benaming_kunstwerk> (object title). We will use the value from RKDimages. WD, 04-11-2011
Artworks     <title.other_older>   This is free text. This field in RKDtechnical is duplicated in RKDimages, <andere_benaming> (other/former title). We will use the value from RKDimages. WD, 04-11-2011
shape     <object.shape>   This is free text. This field in RKDtechnical is duplicated in RKDimages, <vorm> (shape). We will use the value from RKDimages. WD, 04-11-2011
unit unit:Centimeter "cm"@nl <object.size.unit>   There is no Measurement Units thesaurus? Are you supporting such thesaurus? We plan to use QUDT thesaurus. We don't use a thesaurus for this field, but a drop down list of mm, cm, inch, pixels. However, the field is duplicated in RKDimages <eenheid> (unit) - not listed above, probably because it is free text - and we will use that value and not the one from RKDtechnical. WD, 04-11-2011
Maria: Thesaurus created by Ontotext
rkd-support rkd-support:panel--oak "panel (oak)"@en <object.support>   This field in RKDtechnical is duplicated in RKDimages, <drager> (support). We will use the value from RKDimages. WD, 04-11-2011
rkd-technique rkd-technique:oil-paint
"oil paint"@en <object.technique>   This field in RKDtechnical is duplicated in RKDimages, <materiaal> (medium/technique). We will use the value from RKDimages. WD, 04-11-2011
geographical terms     <whereabouts.city>   This is free text. This field in RKDtechnical is duplicated in RKDimages, <plaats_collectie_verblijfplaats> (collection location). We will use the value from RKDimages. WD, 04-11-2011
institutions names     <whereabouts.name>   This is free text. This field in RKDtechnical is duplicated in RKDimages, <collectienaam> (collection name). We will use the value from RKDimages. WD, 04-11-2011
RKDartists     <attribution.name>   This is free text. This field in RKDtechnical is duplicated in RKDimages, <naam> (artist name). We will use the value from RKDimages. WD, 04-11-2011
rkd-res_type rkd-res_type:x-radiography "X-radiography"@en
<research.type>    
rkd-person rkd-person:A_B_de_Vries
crm:P3_has_note  "Vries, A.B. de" <research.researcher>   Maria: Thesaurus created by Ontotext
rkd-documentation rkd-documentation:X-ray_film
"X-ray film"@en
<doc.type>   Maria: Thesaurus created by Ontotext
rkd-person rkd-person:P_Noble
crm:P3_has_note  "Noble, P."
<doc.creator>   Maria: Thesaurus created by Ontotext
unit unit:Centimeter
"cm"@nl
<doc.size.unit>   There is no Measurement Units thesaurus? Are you supporting such thesaurus? We plan to use QUDT thesaurus. See above
Maria: Thesaurus created by Ontotext
institutions names     <doc.whereabouts>   Maria: Free text due to missing thesaurus
rkd-whereabouts_location     <doc.whereabouts.location>   Maria: Free Text: In the rkd-sample-reduced the value is "restauratieatelier", but I can't find such in the thesaurus.
institutions names     <file.image.location>   Maria: Free text due to missing thesaurus
rkd-objectstatus rkd-objectstatus:after-treatment
"after treatment"@en
<file.spec.object_status>    
rkd-area_captured rkd-area_captured:overall
"overall"@en
<file.spec.overall_detail>    
rkd-area_captured rkd-area_captured:FRONT
"front"@en
<file.spec.front_back>   Maria: Thesaurus value created in rkd-area_captured by Ontotext
rkd-area_captured     <reference_image.front>    
rkd-area_captured     <reference_image.back>    
institutions names     <file.application.location> file.application.location Vlado: not right, this says "RKD" (an abbreviation)
Maria: Free text due to missing thesaurus
rkd-sam_type rkd-sam-type:cross-section
"cross-section"@en
<sample.type>    
rkd-an_techn rkd-an_techn:analytical-microscopy
"analytical microscopy"@en
<sample.analytical_technique>    
institutions names     <sample.whereabouts>   Maria: Free text due to missing thesaurus
rkd-whereabouts_location     <sample.whereabouts.location>   Maria: Free Text: In the rkd-sample-reduced the value is "restauratieatelier", but I can't find such in the thesaurus.
rkd-area_captured rkd-area_captured:FROM_BOTTOM "from bottom"@en
<sample.location.vert.start>   Maria: Thesaurus value created in rkd-area_captured by Ontotext
rkd-area_captured rkd-area_captured:FROM_LEFT
"from left"@en
<sample.location.hor.start>   Maria: Thesaurus value created in rkd-area_captured by Ontotext.

Thesaurus Migration

Migration from Vocabularies RKDtechnical ALL TERM TYPES.xml

This XML file contains most of the RKDTechnical thesauri: support, technique, reason for sampling, sample type, location/area color, document type, etc. (see the Thesauri description section above), in English, German and Dutch. The values in the file can be split into separate thesauri based on the <term.type> field, which is different for each thesaurus (TECHNIQUE, OBJECT_TYPE, SOFTWARE, etc.). Some terms, such as "cardboard", have more than one <term.type> ("SUPPORT", "TECHNIQUE" and "OBJECT_TYPE"). This means that such terms must participate in more than one thesaurus.
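A minimal Python sketch of splitting terms by <term.type>, so that a term such as "cardboard" ends up in every thesaurus it belongs to. The <recordList>/<record> wrapper and the sample records are assumptions made for illustration; only the <term> and <term.type> tag names come from this page, and the real file carries more structure (for example language-specific <text> children).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the <recordList>/<record> layout is an assumption.
SAMPLE_XML = """
<recordList>
  <record>
    <term>cardboard</term>
    <term.type>SUPPORT</term.type>
    <term.type>TECHNIQUE</term.type>
    <term.type>OBJECT_TYPE</term.type>
  </record>
  <record>
    <term>oil paint</term>
    <term.type>TECHNIQUE</term.type>
  </record>
</recordList>
"""


def split_by_term_type(xml_text):
    """Return {term_type: [term, ...]}; a term is listed under every type it carries."""
    thesauri = defaultdict(list)
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        # Compare tags directly so the dot in "term.type" needs no path syntax.
        terms = [child.text.strip() for child in record if child.tag == "term"]
        types = [child.text.strip() for child in record if child.tag == "term.type"]
        for term in terms:
            for term_type in types:
                thesauri[term_type].append(term)
    return dict(thesauri)


thesauri = split_by_term_type(SAMPLE_XML)
for term_type, terms in sorted(thesauri.items()):
    print(term_type, terms)
```

Whether each group becomes its own thesaurus or a skos:inScheme link on a shared concept is the design decision discussed under <term.type> below; the grouping step is the same in both cases.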

The following tags should be migrated from RKDtechnical ALL TERM TYPES. If the listed tags have sub-tags, these should also be migrated.

  • <priref> - primary reference
    (Vlado's comment on <priref>: Lookup from Susanna to the thesaurus is by label.
    It's nicer to generate the URI from the EN label (or NL if EN is missing), see URI Scheme - Proposal; but is the label unique within the thesaurus? So we may ignore this?)
  • <broader_term> (Relations - skos:broader, crm:P127 has broader term) - A term may or may not have a broader term. Since a term can have only one broader term but several narrower terms, we will ignore the narrower terms.
  • <term.type> - It is a design decision whether all terms are split by <term.type> and migrated into different thesauri (in which case one term could be present in more than one thesaurus), or whether the mapping "data field" - "thesaurus" is done by <term.type>. (Vlado's comment: skos:inScheme (I hope!))
  • <term> - the thesaurus term in English, German and Dutch (some terms are only in Dutch, others in English and Dutch)

- <term.status> will not be migrated, because RKD considers the values not meaningful for us. (Please ignore these values, they don't mean anything at the moment. WD, 04-11-2011)

Example:

Migration from Support RKD Images (Dutch).xml

This thesaurus holds values for different support materials, in Dutch only. There are 213 terms in Support RKD Images (Dutch).xml and 208 terms in SUPPORT technical (see Vocabularies RKDtechnical ALL TERM TYPES.xml, <term.type> = SUPPORT). I made an informal comparison of the values in both thesauri, and all the values that I checked from SUPPORT technical are also present in SUPPORT images. One example of a value from Images that does not exist in technical is "ceramiek". RKD Images has 2 values, "ceramiek" and "keramiek", where "ceramiek" is <gebruikt_voor> (used for) of "keramiek". RKD technical has only "keramiek".
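The comparison described above was done by hand; a full comparison could be sketched in Python as a set difference over the Dutch labels. The term lists below are a small hypothetical sample, not the real 213 and 208 terms, and loading the labels from the two XML files is left out.

```python
def compare_vocabularies(images_terms, technical_terms):
    """Report which labels occur in only one vocabulary and which are shared."""
    images, technical = set(images_terms), set(technical_terms)
    return {
        "only_in_images": sorted(images - technical),
        "only_in_technical": sorted(technical - images),
        "shared": sorted(images & technical),
    }


# Hypothetical sample labels; the real lists come from the two XML exports.
images_terms = ["keramiek", "ceramiek", "paneel (eikenhout)"]
technical_terms = ["keramiek", "paneel (eikenhout)"]

diff = compare_vocabularies(images_terms, technical_terms)
print(diff["only_in_images"])      # labels that would be lost if only technical is migrated
print(diff["only_in_technical"])   # labels missing from RKD Images
```

Labels are compared as exact strings here, so differences in case or spelling variants would show up as missing terms.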

! (Maria) My proposal is initially not to migrate the values from Support RKD Images (Dutch).xml, and to map all data fields that use it to the terms in SUPPORT technical. The disadvantage is that 5 terms will be lost. The advantage is that RKDTechnical is multilingual, while RKD Images is only in Dutch.

! (Quotation from Wietske e-mail from 4.11.2011) For the Rembrandt Database, we will not use the "duplicated" thesauri, because we do not want to present the brief object information in RKDtechnical, but the much more elaborate object information in RKDimages. So in fact, all duplicated thesauri from RKDtechnical could be ignored for the mapping. However, you might like to have them, because they contain the translations in English.

If a decision to migrate from RKD Images is taken, the following tags should be migrated:

  • <priref>
  • <broader_term> (Relations - skos:broader, crm: P127_has_broader_term)
  • <term>

Example:

Migration from Vocabulary GEOGRAPHICAL THESAURUS (Dutch).xml

The XML contains detailed information about geographical places: the name of the place, broader and narrower places, equivalent names, other names (used for), a detailed description and a lot of system information such as the date of import into the thesaurus, the modification date, who imported the value, who modified it, etc. The values in the thesaurus are only in Dutch. (Task: Maria to send a letter to RKD, requesting the thesaurus in English.)

  • <priref> - primary reference
  • <ruimere_term> (Relations - skos:broader, crm:P89 Falls within) - broader term; the broader term also exists as a term. Since a term can have only one broader term but several narrower terms, we will ignore the narrower terms.
  • <equivalente_term> - equivalent term, also exists as a term. A term may or may not have equivalent terms.
  • <term> - the name of the place
  • <gebruik> - use (alternative name); the names listed here exist as terms in the thesaurus. <gebruik> and <gebruikt_voor> cannot occur together in the same record.
  • <gebruikt_voor> - used for (also known as); the names listed here exist as terms in the thesaurus and are alternative names. A term may or may not have other names (<gebruikt_voor>).

! Our proposal is to create one URI, with a different label for each term, for all <equivalente_term>, <gebruikt_voor> and <gebruik> terms, instead of creating a relation of type skos:exactMatch (CRM has no appropriate relation for equivalent terms). See the graph example below.
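One way the "one URI, many labels" proposal could be sketched in Python is to group terms linked through any of the three tags with a small union-find, so that each group can later receive a single URI. The record layout (a dict per record with list-valued fields) and the sample values are assumptions for illustration; choosing which label becomes the preferred one is left open.

```python
def group_equivalent_terms(records):
    """Group terms linked by equivalente_term, gebruik or gebruikt_voor."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        find(record["term"])  # register terms that have no links at all
        for field in ("equivalente_term", "gebruik", "gebruikt_voor"):
            for other in record.get(field, []):
                union(record["term"], other)

    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records; the field names follow the tags listed above.
records = [
    {"term": "keramiek", "gebruikt_voor": ["ceramiek"]},
    {"term": "ceramiek", "gebruik": ["keramiek"]},
    {"term": "Den Haag", "equivalente_term": ["'s-Gravenhage"]},
    {"term": "Amsterdam"},
]

groups = group_equivalent_terms(records)
for group in groups:
    print(group)  # each group would share one URI, one label per member
```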

Example:

Example: Graph with values showing ruimere term, equivalent term, gebruikt voor and gebruik.

Input formats

RKD provided samples in several formats and is awaiting our decision on which format we want.

  • XML: includes all possible info, but is more complicated. RKD needs about 30 sec. to generate a thesaurus in this format.
  • CSV. Vlado thought this would be best for us, since we already have a similar conversion of BM thesauri (bm-csv2ttl.pl). But it has the following defects:
    • RKD cannot provide all languages in one CSV (WD 04-11-2011).
      We could process in several passes (the first creates nodes, the others add labels), or the unix "join" command (part of GNU textutils) could help
    • If a column has several values, only one is present. But are there viable/useful multi-valued columns??
      I've seen this for "narrower", but we'll use "broader" instead. I haven't seen multiple "broader" (i.e. multi-parents)
      It may happen for "equivalent" and "related", but do we need them?
    • RKD needs about 5 min. to generate a thesaurus in this format, because they have to do this for each data language separately
  • XLS: manual compilation from CSV, more effort for RKD (it takes about 15 min), harder to read by us...
  • DAT: simple line-oriented format. Each line starts with a 2-letter code identifying the field (eg te=term, bt=broader_term).
    Big problem: does not indicate the language (but I guess they come in a fixed order). Eg:
    bt photograph (print)
    bt Foto (Print)
    bt foto (afdruk)
    te black and white photograph
    te Schwarz / Weiss Foto (Print)
    te zwart-witfoto (afdruk)
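A minimal Python sketch of reading the DAT lines above. It assumes that repeated lines with the same code always come in the fixed order English, German, Dutch; this order is only a guess based on the sample and has not been confirmed by RKD. Only the two field codes named above are mapped, and how records are separated in a full DAT file is not covered.

```python
LANGUAGES = ("en", "de", "nl")  # assumed fixed order, not confirmed by RKD
FIELD_NAMES = {"te": "term", "bt": "broader_term"}  # only the codes named above


def parse_dat_record(lines):
    """Return {field: {language: value}} for the lines of one DAT record."""
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        code, _, value = line.partition(" ")
        field = FIELD_NAMES.get(code, code)  # unknown codes keep their 2-letter name
        values = record.setdefault(field, {})
        if len(values) >= len(LANGUAGES):
            raise ValueError(f"more than {len(LANGUAGES)} values for field {code!r}")
        # The n-th occurrence of a code is taken to be the n-th language.
        values[LANGUAGES[len(values)]] = value
    return record


sample = """
bt photograph (print)
bt Foto (Print)
bt foto (afdruk)
te black and white photograph
te Schwarz / Weiss Foto (Print)
te zwart-witfoto (afdruk)
""".splitlines()

record = parse_dat_record(sample)
print(record["term"]["nl"])
print(record["broader_term"]["en"])
```

If a term is missing in one language, the positional assumption silently assigns the wrong language, which is the "big problem" noted above.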
    

Vlado to RKD: our tech guys will confirm the selected input format and document here

Conversion approach

Proposed approaches and estimates (which tool, how many p/d):

  • Vlado: I made a simple Perl script (bm-csv2ttl.pl) to convert BM thesauri from Merlin CSV to RDF Turtle in SKOS as part of the RS demo (before we won it).
    See BM Thesauri for description and attachment. Uses module Text::CSV::auto.
    Took me 1-2d for 3 thesauri. It may be easy to adapt for this task.
  • Mitac: Java (with opencsv.jar), 1-2d for the first 2 samples. It might be easier to produce N-Triples instead of Turtle
  • SSL: We suggest normalising each incoming dataset first, and then supporting a single conversion from the normalised format into RDF.  To start off we'll convert the incoming data into the BM format and re-use Vlado's Perl script.  That will give us working code and converted data.  If we have time, it would be good to choose a recognised standard to normalise to instead of the BM format.  We might need to maintain/modify the Perl a bit, but that should be straightforward.  We'd like to put 4 days effort into this.
  • Kalin?? (I guess it's overkill to ask 3 estimates)

Output SKOS format

Vlado: I propose the following format (https://svn.ontotext.com/svn/researchspace/data/thesauri.ttl)

Important Notes

  • "rkd-object:;" is valid Turtle syntax (empty local name) and is tested in OWLIM. Could also be written as "rkd:object;", doesn't matter
  • The thesauri were made up during the mapping of Susanna (sample painting) and need to be adjusted to real data.
    Eg the above puts frame material (wood--gold_plated) and painting technique (oil_paint) in one skos:Scheme, but in the RKD data, SUPPORT is a separate thesaurus
  • Maria: devise a URI scheme and propose it to RKD for clearance (it makes sense to use http://rkd.nl as prefix, since it's their data)
  • Mariana: load SKOS ontology with appropriate reasoning, so when we assert skos:broader, it infers broaderTransitive
    (and maybe narrower and narrowerTransitive: will be useful for the UI to build a thesaurus tree)
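To illustrate what the reasoner is expected to infer, here is a plain-Python sketch that derives the broaderTransitive and narrower relations from asserted skos:broader links. It is not how OWLIM does it; it only shows the expected result for a single-parent hierarchy. The "Egypte" entry is a hypothetical sample value.

```python
def broader_transitive(broader):
    """broader maps concept -> its one broader concept (or None).
    Returns concept -> list of all ancestors, nearest first."""
    closure = {}
    for concept in broader:
        ancestors, seen = [], {concept}
        parent = broader.get(concept)
        while parent is not None and parent not in seen:  # 'seen' guards against cycles
            ancestors.append(parent)
            seen.add(parent)
            parent = broader.get(parent)
        closure[concept] = ancestors
    return closure


def narrower(broader):
    """Invert the broader links; useful for building a thesaurus tree in the UI."""
    children = {}
    for concept, parent in broader.items():
        if parent is not None:
            children.setdefault(parent, []).append(concept)
    return {parent: sorted(kids) for parent, kids in children.items()}


# "Egypte" is a hypothetical sample value.
broader = {"Wereld": None, "Afrika": "Wereld", "Egypte": "Afrika"}

closure = broader_transitive(broader)
tree = narrower(broader)
print(closure["Egypte"])
print(tree["Wereld"])
```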

URI Scheme - Proposal

For each thesaurus value we propose generating the following URI:

http://rkd.nl/thesaurus/thesaurus_name/thesaurus_value, where:

  • http://rkd.nl is the prefix
  • thesaurus - giving information that it is thesaurus URI
  • thesaurus_name - is the name of the thesaurus
  • thesaurus_value - is the concrete value from the thesaurus. URI is formed from EN or NL label:
    • Space " " is replaced with underscore "_"
    • Brackets "(...)" are replaced with two hyphens "--" (hopefully reflecting hierarchical values)

Example: URI of value "panel (birch wood)" in "support" thesaurus:
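A minimal Python sketch of how the rules above could be applied. The exact bracket handling is an assumption modelled on the existing value rkd-support:panel--oak for "panel (oak)": the opening bracket, together with the space before it, becomes "--" and the closing bracket is dropped. The function name thesaurus_uri is hypothetical, and the scheme itself is still a proposal awaiting clearance from RKD.

```python
import re

BASE = "http://rkd.nl/thesaurus"  # prefix from the proposal above


def thesaurus_uri(thesaurus_name, label):
    """Build a URI from an EN (or NL) label under the proposed scheme."""
    value = label.strip()
    value = re.sub(r"\s*\(\s*", "--", value)  # " (" -> "--" (assumed handling)
    value = re.sub(r"\s*\)", "", value)       # closing bracket is dropped (assumed)
    value = re.sub(r"\s+", "_", value)        # remaining spaces -> "_"
    return f"{BASE}/{thesaurus_name}/{value}"


uri = thesaurus_uri("support", "panel (birch wood)")
print(uri)
```

The sketch does not escape other characters that are unsafe in URIs, and it does not check whether two labels in one thesaurus collapse to the same URI, which is the uniqueness question raised for <priref> above.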

Questions

  1. Which thesauri do you use?
    • We assume RKD locations, people, concepts, artworks, IconClass, but would like you to confirm explicitly
    • Are these the same well-known RKD thesauri shown in red on the attached picture? Can you tell from the numbers? E.g. People (=RKDartists) having 331,455 Please see my comments above under "Thesauri description". WD, 04-11-2011
  2. Can you urgently send us the thesauri used by the Rembrandt project (within 1 week)? Don't wait to translate them all. Multilingual is not critical for us: even if it's only in Dutch, we'll implement some language-fallback in the UI. Send us the large thesauri like "People", "Locations" etc. Preferably in SKOS if you have them in that format. 2 sample thesauri are received. We will send others. WD, 04-11-2011
  3. Is it possible to make an export with the ID's (codes/URIs) of the controlled fields, in addition to the text label? Yes, as you have seen in the sample thesauri. I hope that is what you meant. WD, 04-11-2011
  4. If not: can you list for each controlled field:
    the thesaurus it came from, which branch (parent), whether it’s immediate child or descendant of any depth? We need this info so we can lookup and store the ID.
  5. Please, is it possible for you to send us thesauri in different languages together in one csv?
  6. Please confirm that you agree with the URI format proposed by Vladimir (see the section above).

Thesaurus Properties

There is a properties file to map data fields contents to their thesaurus type and language counterparts

https://svn.ontotext.com/svn/researchspace/trunk/entity-api/src/resources/thes.properties

Thesaurus parsing script

The thesaurus parsing Python script is no longer an attachment on this page; it is now at

https://svn.ontotext.com/svn/researchspace/trunk/parseThes/src/parse_thes.py

An example of use:

Validating the output

A useful validator is listed below, but it has a size limit: thesauri-place.ttl has to be chopped up to validate.

http://www.rdfabout.com/demo/validator/
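Chopping a large Turtle file for the validator could be sketched in Python as below. The sketch assumes that every subject block is separated from the next by a blank line and that no string literal contains a blank line; whether the generated files follow that layout is not confirmed here. The @prefix lines are repeated in each chunk so that every chunk can be validated on its own. The sample data and the rkd-support prefix URI are hypothetical.

```python
def chunk_turtle(text, blocks_per_chunk):
    """Split Turtle text into chunks of blank-line-separated blocks,
    repeating the @prefix/@base header in every chunk."""
    prefixes, blocks, current = [], [], []
    for line in text.splitlines():
        if line.startswith(("@prefix", "@base")):
            prefixes.append(line)
        elif line.strip():
            current.append(line)
        elif current:  # a blank line closes the current block
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))

    header = "\n".join(prefixes)
    chunks = []
    for start in range(0, len(blocks), blocks_per_chunk):
        body = "\n\n".join(blocks[start:start + blocks_per_chunk])
        chunks.append(f"{header}\n\n{body}\n")
    return chunks


# Hypothetical sample data for illustration.
SAMPLE_TTL = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rkd-support: <http://rkd.nl/thesaurus/support/> .

rkd-support:panel a skos:Concept ;
    skos:prefLabel "panel"@en .

rkd-support:panel--oak a skos:Concept ;
    skos:prefLabel "panel (oak)"@en ;
    skos:broader rkd-support:panel .

rkd-support:canvas a skos:Concept ;
    skos:prefLabel "canvas"@en .
"""

chunks = chunk_turtle(SAMPLE_TTL, blocks_per_chunk=2)
print(len(chunks))
print(chunks[1])
```

A chunk may reference concepts defined in another chunk (for example through skos:broader); that is still valid Turtle, so syntax validation per chunk remains meaningful.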

Thesaurus Turtle files

The thesaurus turtle files (generated by the above) are in svn

https://svn.ontotext.com/svn/researchspace/trunk/data/thesauri-all.ttl
https://svn.ontotext.com/svn/researchspace/trunk/data/thesauri-place.ttl

These incorporate the resolution of issues RS-170, RS-167, RS-95 and RS-171

  1. Nov 18, 2011

    tna

    From Matthew posing as Mike Stapleton (I am getting an account here soon).

    In the "Vocabulary GEOGRAPHICAL THESAURUS.xml" some of the broader terms seem to be referenced in a case-insensitive way: for example, the term "Afrika" has a broader term <ruimere_term>wereld</ruimere_term>, whereas the term in the other record is <term>Wereld</term>. Also, in the same thesaurus file, <term>Cardiff College of Art</term> has a broader term <ruimere_term>Cardiff (Wales): opleiding</ruimere_term>. There is no such term in the file, but there is a <term>Cardiff (Wales)</term>. According to the dictionary I am using, "opleiding" means lead-up, breeding or training. Is the latter the broader term, and is the colon-separated text irrelevant?

    In the "Vocabulary DOCUMENTATION TYPE.xml" file.

    The broader term of "research data (digital born)", "tekst of grafische weergave (digital born)", is missing.
    The broader term of "research report (digital born)", "tekst of grafische weergave (digital born)", is missing.

    The following are top level items without broader terms or narrower terms.

    id 1635
    en-US unknown
    de-DE unbekannt
    nl-NL onbekend
    AREA_CAPTURED
    OBJECTSTATUS
    SHAPE
    DOC_TYPE
    LOC_SAMPLE

    id 1797
    en-US Imaging plates Fuji BAS 2000, 20 x 40 cm
    de-DE Bildspeicherplatte Fuji BAS 2000, 20 x 40 cm
    nl-NL Imaging plates Fuji BAS 2000, 20 x 40 cm
    DOC_TYPE

    id 1817
    en-US negatives 7.5" x 9.5"
    de-DE negatives 7.5" x 9.5"
    nl-NL negatives 7.5" x 9.5"
    DOC_TYPE

    id 1818
    de-DE color transparencies 6.25" x 4.5"
    DOC_TYPE

    Is this all to be expected and what would be the best thing to do with orphan terms without a broader term and terms with a broader term not in the thesaurus?

    Matthew

  2. Nov 18, 2011

    tna

    Matthew here again.

    In the example ttl you have rkd-object and rkd-material. Do these map from this section of the thesaurus xml files?

    <term.type option="DOC_TYPE" value="DOC_TYPE">
                    <text language="0">Document type</text>
                    <text language="1">Type documentatie</text>
                    <text language="3">Dokumentart</text>
                </term.type>
                <term.type>Document type</term.type>

    Do the rkd-XXXX names come from the term type of the records? If so, the above record might be

    rkd-DOC_TYPE

    Sorry if this is naïve but I can't see where else this would map from.

  3. Nov 18, 2011

    tna

    The supplementary question: if there is more than one term.type, how does this map? Do we have a separate rkd-XXX triple for each type?

    I will check that narrower terms share the same types as broader terms, if not we might end up with orphans.

  4. Nov 18, 2011

    tna

    I have uploaded 4 ttl files that are my preliminary cut of converting the thesaurus xml files.

    Damning criticism welcomed.

    Matthew