
BM thesauri, with the term count for each one

See BMX Issues#Thesaurus Requirements.

BM Thesauri

Name Size Creator Creation Date Comment  
File BM-thesauri.rar 20.77 MB Vladimir Alexiev Jan 10, 2013 18:01    

Some of the BM thesauri are shown below. All are described in Meta-Thesaurus and FR Names#Meta-thesaurus table.

Thesaurus Terms
Escape 88
Ethname 3333
MatCult 1440
Material 2010
Object 5802
Places 45521
School 475
Subject 1429
Technique 594
Treatment 719
Ware 659

Counts include both authorised and unauthorised terms. Unauthorised terms are not in the main hierarchy, so they carry less semantic content, although they may have scope notes.


Name Size Creator Creation Date Comment  
ZIP Archive 788 kB Vladimir Alexiev Jul 20, 2011 09:01 material, object, place in Turtle  
ZIP Archive 1.07 MB Vladimir Alexiev Jul 20, 2011 09:01 material, object, place in Merlin TTL  
File 2 kB Vladimir Alexiev Jul 20, 2011 09:01 convert from CSV to Turtle  
XML File thesaurus-place-china-merlin.xml 1.74 MB Vladimir Alexiev Jul 18, 2011 12:37 XML (Merlin): places in China  
File thesaurus-place-china.rdf 5 kB Vladimir Alexiev Jul 18, 2011 12:37 RDF: places in China  

Conversion to SKOS

Vlado wrote a simple Perl script to convert BM thesauri from Merlin CSV to RDF Turtle in SKOS.
Each term maps to a simple SKOS fragment (places have some extra info).
The mapping is described in the next section.
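As a rough illustration of what such a fragment might look like, here is a minimal Python sketch (not Vlado's Perl script). It uses the field and property names listed under Merlin CSV format. The row values, the id x0001 and the synonyms are made up, and emitting use_for values as string literals is an assumption. The id:thesauri/... notation follows this page; strict Turtle would need the slash escaped or a full IRI.

```python
def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslash, quote, CR and LF."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\r", "\\r")
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


def term_fragment(row):
    """Render one row (dict of field name -> string) as a Turtle-style fragment."""
    lines = [
        f"id:thesauri/{row['id']}",
        # Thesaurus name is lowercased, as in the mapping for the name field.
        f"  skos:inScheme id:thesauri/{row['name'].lower()} ;",
        f"  skos:prefLabel {turtle_literal(row['term'])} ;",
    ]
    # Places carry an extra name type ('modern' or 'archaic').
    if row.get("name_type"):
        lines.append(f"  crm:PX.place_name_type {turtle_literal(row['name_type'])} ;")
    # Synonyms are semicolon-separated; one triple per synonym.
    # Assumption: each synonym is emitted as a plain string literal.
    for synonym in row.get("use_for", "").split(";"):
        if synonym.strip():
            lines.append(f"  crm:PX.use_for {turtle_literal(synonym.strip())} ;")
    # Close the fragment: the last predicate line ends with '.' instead of ';'.
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)


# Hypothetical row: the id and the synonyms are invented for this example.
example_row = {
    "name": "Place",
    "id": "x0001",
    "term": "Messina",
    "name_type": "modern",
    "use_for": "alt label A;alt label B",
}
fragment = term_fragment(example_row)
print(fragment)
```

Escaping the literals matters because labels and notes may contain quotes or line breaks.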

Merlin CSV format

Comma-separated. The last two fields may contain \n and \r and are enclosed in double quotes ("). Fields may contain any UTF-8 text (eg Chinese characters).
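Quoted fields with embedded line breaks are easy to mis-parse when splitting on newlines, so a CSV parser is the safer choice. The sketch below uses Python's standard csv module. It assumes (this page does not confirm it) that the file has no header row and that the columns come in the order of the field list below. The sample record is invented.

```python
import csv
import io

# Assumption: column order follows the field list on this page, no header row.
FIELDS = ["name", "id", "term", "term_discrim", "name_type", "use_for",
          "r_terms", "b_term", "calc_current_hcode", "scope_note"]


def read_rows(stream):
    """Yield one dict per CSV record; quoted fields may span several lines."""
    for record in csv.reader(stream):
        yield dict(zip(FIELDS, record))


# Invented sample record; the quoted last field contains a comma and \r\n.
sample = ",".join(["OBJECT", "x0001", "yoke", "harness", "", "", "", "", "",
                   '"hypothetical note,\r\nsecond line"']) + "\r\n"

# newline="" stops Python from translating \r and \n inside quoted fields.
# For a real file: open(path, newline="", encoding="utf-8").
rows = list(read_rows(io.StringIO(sample, newline="")))
print(rows[0]["scope_note"])
```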

Fields (prefix (bm|mus)auth_thes):

  • name: thesaurus type, eg OBJECT, MATERIAL, Place. Maps to lowercase:
    skos:inScheme id:thesauri/\L$name\E ;
  • id: code, eg x6202. Maps to the row's URI
  • term: label. Maps to
    skos:prefLabel "$term" ;
  • term_discrim: discriminator
    • if 'deleted': skip the row
    • otherwise it distinguishes one sense of the term from another
      • eg "yoke (harness)" vs "yoke (clothing)" vs just "yoke" (which is a load-carrier)
      • eg "Messina (Transvaal)" vs "Messina (province - Sicily)" vs "Messina (city - Sicily)"
    • References to a term from other fields use the syntax "term (discriminator)", eg "burin (stone tool)"
  • name_type (for Place only): 'modern', 'archaic'. Maps to
    crm:PX.place_name_type "$name_type" ;
  • use_for: synonyms. Semicolon-separated, maps to multiple triples:
    crm:PX.use_for $use_for ;
  • r_terms: related terms. Not used for now
  • b_term: broader terms that create the hierarchy
    • Poly-hierarchical (i.e. a term may have several parents):
      "shark-rattle" is a "fish-lure";
      "fish-gorge" is a "fishing equipment";
      "throwing-weapon" is a "hunting equipment;weapon"
    • Match each string against the terms to find its $parent_id. Unfortunately parent terms don't always come before the child term, so 2 passes are needed
    • Semicolon-separated. Maps to multiple triples:
      skos:broader id:thesauri/$parent_id ;
  • calc_current_hcode: hierarchical codes
    • Eg "throwing-weapon" has two hcodes: 2.1.4.F;2.5.D.RE.
    • All its sub-items also have two hcodes with matching last component (here *):
    • Uses special chars whose meaning is unknown to me: ":", "<", ">", newline
  • scope_note: some are a description, others a random note
    • uses special chars: \r \n
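
The two-pass lookup for b_term can be sketched as follows. This is a minimal Python illustration, not the actual Perl script. The ids x1 to x3 are made up, and scoping the lookup to a single thesaurus and silently ignoring unmatched parent strings are assumptions.

```python
def term_key(row):
    """Build the reference string: 'term (discriminator)' or the bare term."""
    discrim = row.get("term_discrim", "")
    return f"{row['term']} ({discrim})" if discrim else row["term"]


def resolve_broader(rows):
    """Return {id: [parent ids]} for all rows that are not marked 'deleted'."""
    live = [r for r in rows if r.get("term_discrim") != "deleted"]
    # Pass 1: index every term by (thesaurus, reference string).
    # Assumption: broader terms are looked up within the same thesaurus.
    index = {(r["name"].lower(), term_key(r)): r["id"] for r in live}
    # Pass 2: resolve each semicolon-separated broader term to its id.
    broader = {}
    for r in live:
        scheme = r["name"].lower()
        labels = [s.strip() for s in r.get("b_term", "").split(";") if s.strip()]
        # Assumption: strings with no matching term are silently ignored.
        broader[r["id"]] = [index[(scheme, s)] for s in labels
                            if (scheme, s) in index]
    return broader


# The child row comes before its parents, which is why one pass is not enough.
# Terms are taken from the example above; the ids are hypothetical.
rows = [
    {"name": "OBJECT", "id": "x1", "term": "throwing-weapon",
     "term_discrim": "", "b_term": "hunting equipment;weapon"},
    {"name": "OBJECT", "id": "x2", "term": "hunting equipment",
     "term_discrim": "", "b_term": ""},
    {"name": "OBJECT", "id": "x3", "term": "weapon",
     "term_discrim": "", "b_term": ""},
]
parents = resolve_broader(rows)
for parent_id in parents["x1"]:
    print(f"  skos:broader id:thesauri/{parent_id} ;")
```

A real conversion would probably want to log the unmatched strings rather than drop them, since they point to typos or deleted parents in the source data.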