BM thesauri, with term counts for each one
See BMX Issues#Thesaurus Requirements.
BM Thesauri
Some of the BM thesauri are shown below. All are described in Meta-Thesaurus and FR Names#Meta-thesaurus table.
Thesaurus | Terms |
---|---|
Escape | 88 |
Ethname | 3333 |
MatCult | 1440 |
Material | 2010 |
Object | 5802 |
Places | 45521 |
School | 475 |
Subject | 1429 |
Technique | 594 |
Treatment | 719 |
Ware | 659 |
Counts include both authorised and unauthorised terms. Unauthorised terms are not in the main hierarchy, so they carry less semantic content, although they may have scope notes.
Obsolete
Conversion to SKOS
Vlado wrote a simple Perl script bm-csv2ttl.pl to convert BM thesauri from Merlin CSV to RDF Turtle in SKOS.
Each term maps to a simple SKOS fragment (places have some extra info). The mapping is described in the next section.
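To make the shape of such a fragment concrete, here is a minimal Python sketch (not the original Perl script) that renders one row using the predicates listed under Merlin CSV format. The function names, the dictionary keys and the sample values are assumptions made for illustration. Writing use_for and name_type as quoted literals is also an assumption, and the output follows this page's id:thesauri/... notation without being validated as Turtle.

```python
def turtle_literal(text):
    """Quote a string as a literal, escaping backslash, quote, CR and LF."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\r", "\\r")
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


def term_to_turtle(row, parent_ids=()):
    """Render one thesaurus row (a dict of field values) as a SKOS fragment.

    parent_ids are the already-resolved ids of the row's broader terms.
    """
    props = [
        # Thesaurus type, lowercased (the \L$name\E of the Perl mapping).
        f"skos:inScheme id:thesauri/{row['name'].lower()}",
        f"skos:prefLabel {turtle_literal(row['term'])}",
    ]
    # name_type is only filled in for places.
    if row.get("name_type"):
        props.append(f"crm:PX.place_name_type {turtle_literal(row['name_type'])}")
    # use_for is semicolon-separated: one triple per synonym.
    for synonym in (row.get("use_for") or "").split(";"):
        if synonym.strip():
            props.append(f"crm:PX.use_for {turtle_literal(synonym.strip())}")
    for parent_id in parent_ids:
        props.append(f"skos:broader id:thesauri/{parent_id}")
    return f"id:thesauri/{row['id']}\n  " + " ;\n  ".join(props) + " .\n"


# Made-up sample row; the ids and synonyms are placeholders.
sample = {"id": "x6202", "name": "OBJECT", "term": "yoke", "use_for": "a; b"}
print(term_to_turtle(sample, parent_ids=["x1"]))
```

Building the predicate list first and joining it at the end keeps the ";" and "." punctuation correct regardless of which optional fields are present.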
Merlin CSV format
Comma-separated. The last two fields may include \n and \r and are enclosed in double quotes ("). Values may contain any UTF-8 characters (eg Chinese characters).
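Quoted fields with embedded line breaks are easy to mis-parse when a file is read line by line. The following Python sketch (an illustration, not the original Perl script) shows one way to read such data with the standard csv module. The function names and the sample data are assumptions, and the sample uses fewer columns than a real Merlin export.

```python
import csv
import io


def read_records(stream):
    """Return all CSV records from a text stream as lists of strings.

    The stream must be opened with newline="" so that \r and \n inside
    quoted fields reach the csv parser unchanged.
    """
    return list(csv.reader(stream))


def read_merlin_csv(path):
    """Read a UTF-8 CSV file from disk (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_records(handle)


# Made-up sample: the last field of the first record spans two lines.
SAMPLE = (
    'OBJECT,x1,yoke,,"load-carrier\r\nsecond line"\r\n'
    'PLACE,x2,北京,,"a note"\r\n'
)

rows = read_records(io.StringIO(SAMPLE, newline=""))
print(len(rows))        # two records, although the text has three lines
print(repr(rows[0][4]))
```

Passing newline="" is what the csv module documentation recommends for file objects; without it, line breaks inside quoted fields may be translated or split incorrectly.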
Fields (prefix (bm|mus)auth_thes):
- name: thesaurus type, eg OBJECT, MATERIAL, Place. Maps to lowercase:
  skos:inScheme id:thesauri/\L$name\E ;
- id: code, eg x6202. Maps to the row's URI
  id:thesauri/$id
- term: label. Maps to
  skos:prefLabel "$term" ;
- term_discrim: discriminator
  - if 'deleted': skip row
  - else adds a different sense to term
  - eg: "yoke (harness)" vs "yoke (clothing)" vs just "yoke" (which is a load-carrier).
  - eg "Messina (Transvaal)" vs "Messina (province - Sicily)" vs "Messina (city - Sicily)"
  - References use this syntax (eg "burin (stone tool)")
- name_type (for Place only): 'modern', 'archaic'. Maps to
  crm:PX.place_name_type "$name_type" ;
- use_for: synonyms. Semicolon-separated, maps to multiple triples:
  crm:PX.use_for $use_for ;
- r_terms: related terms. Not used for now
- b_term: broader terms that create the hierarchy
  - Poly-hierarchical (i.e. multiple inheritance):
    "shark-rattle" is a "fish-lure"
    "fish-gorge" is a "fishing equipment";
    "throwing-weapon" is a "hunting equipment;weapon"
  - match the string to find its $parent_id. Unfortunately parent terms don't always come before the child term, so 2 passes are needed
  - semi-colon separated. Maps to multiple triples
    skos:broader id:thesauri/$parent_id ;
- calc_current_hcode: hierarchical codes
  - Eg "throwing-weapon" has two hcodes: 2.1.4.F;2.5.D.RE.
  - All its sub-items also have two hcodes with matching last component (here *):
    2.1.4.F.*;2.5.D.RE.*
  - Uses special chars unknown to me: :, <, >, newline
- scope_note: some are a description, others a random note
  - uses special chars: \r \n
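The two-pass lookup needed for b_term can be sketched as follows. This is a hedged Python illustration rather than the logic of bm-csv2ttl.pl: the function names, the dictionary keys and the ids x1 to x4 are made up, and the sketch assumes that all rows belong to one thesaurus so that labels do not collide.

```python
def display_label(row):
    """Return 'term (discriminator)' if a discriminator is set, else 'term'."""
    discrim = row.get("term_discrim") or ""
    return f"{row['term']} ({discrim})" if discrim else row["term"]


def resolve_broader(rows):
    """Map each row id to the ids of its broader terms, in two passes."""
    # Rows whose discriminator is 'deleted' are skipped entirely.
    live = [r for r in rows if r.get("term_discrim") != "deleted"]

    # Pass 1: index every label, because a parent may come after its child.
    id_by_label = {display_label(r): r["id"] for r in live}

    # Pass 2: split b_term on ';' and look up each parent label.
    broader = {}
    for r in live:
        labels = [b.strip() for b in (r.get("b_term") or "").split(";")]
        # Labels that match no row are dropped silently in this sketch.
        broader[r["id"]] = [id_by_label[b] for b in labels if b in id_by_label]
    return broader


# Made-up rows: the child is listed before both of its parents.
rows = [
    {"id": "x1", "term": "throwing-weapon", "b_term": "hunting equipment;weapon"},
    {"id": "x2", "term": "hunting equipment", "b_term": ""},
    {"id": "x3", "term": "weapon", "b_term": ""},
    {"id": "x4", "term": "old term", "term_discrim": "deleted", "b_term": "weapon"},
]
print(resolve_broader(rows))
```

Each id in a returned list would yield one skos:broader triple. A real converter would probably want to report unresolved parent labels instead of dropping them.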