
Established thesauri and thesaurus conversion tools

Example

An excellent example of using thesauri is Semantic Annotation of Image Collections by eCulture. It uses ULAN, WordNet, AAT and IconClass for semantic tagging of paintings. To describe the subject matter of a painting, the framework uses an Agent-Action-Object-Recipient approach, eg:

Agent: "woman" (WordNet)
Action: "give" (WordNet)
Object: "flower" (WordNet)
Recipient: "Chagall, Marc" (ULAN)

Agent: "Derain, Andre" (ULAN)
Action: "smoke" (WordNet)
Object: "pipes(smoking equipment)" (AAT)
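
The two annotations above can be held in a small data structure. The following is only a sketch to make the shape of the data concrete; the class and field names (Term, SubjectAnnotation, obj) are made up for illustration and are not taken from the eCulture framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    label: str      # the term as it appears in the annotation
    thesaurus: str  # source vocabulary, eg "WordNet", "ULAN", "AAT"


@dataclass(frozen=True)
class SubjectAnnotation:
    agent: Term
    action: Term
    obj: Term                          # the "Object" slot
    recipient: Optional[Term] = None   # not every annotation has a recipient

    def thesauri(self):
        """Return the set of vocabularies this annotation draws on."""
        slots = [self.agent, self.action, self.obj, self.recipient]
        return {t.thesaurus for t in slots if t is not None}


# The two examples from the text above.
giving = SubjectAnnotation(
    agent=Term("woman", "WordNet"),
    action=Term("give", "WordNet"),
    obj=Term("flower", "WordNet"),
    recipient=Term("Chagall, Marc", "ULAN"),
)
smoking = SubjectAnnotation(
    agent=Term("Derain, Andre", "ULAN"),
    action=Term("smoke", "WordNet"),
    obj=Term("pipes(smoking equipment)", "AAT"),
)

print(sorted(giving.thesauri()))   # ['ULAN', 'WordNet']
print(sorted(smoking.thesauri()))  # ['AAT', 'ULAN', 'WordNet']
```

Keeping the thesaurus name next to each label makes it easy to see that one annotation can mix terms from several vocabularies.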

SKOS

The preferred way of representing vocabularies in RDF is the Simple Knowledge Organization System (SKOS); see the W3C SKOS Primer. (The examples below assume the ULAN thesaurus is represented as RDF using a ulan: prefix.)

  • Each term is a Concept in the respective scheme (thesaurus), eg "ulan:Rembrandt a skos:Concept; skos:inScheme ulan:".
  • The term is in an appropriate hierarchy, eg "ulan:Rembrandt skos:broader ulan:classicalPainter".
  • We can use this to find appropriate terms for auto-complete or data entry. Eg assuming "ulan:classicalPainter skos:broader ulan:painter", a query that follows skos:broader transitively should find individual painters as leaf-level descendants of ulan:painter.
  • In order to use thesauri URIs in CRM networks, we should enrich them with the respective CRM class, eg "ulan:Rembrandt a crm:E21_Person". This will let us link Rembrandt to CRM entities and apply CRM properties to it (see E21 Person@crm).
  • SKOS provides a dedicated property "skos:exactMatch" to map concepts with equivalent meaning. Our offer says we'll use owl:sameAs for Terminology Mapping, to benefit from OWLIM's sameAs optimization. But the SKOS Primer says:

    Note on skos:exactMatch vs. owl:sameAs: SKOS intentionally does not use owl:sameAs. When two resources are linked with owl:sameAs they are considered to be the same resource, and triples involving these resources are merged. This does not fit what is needed in most SKOS applications. Eg if we say "ex1:animal owl:sameAs ex2:animals", the triples of both concepts are merged onto one resource, which then carries the preferred label of each.

    This would make ex:animal inconsistent, as a concept cannot possess two different preferred labels in the same language.

    • See discussion on Pedantic Web mailing list
    • My answer is: I don't care if we violate some minor SKOS nitpicking. It's more important to state that ulan:Rembrandt and viaf:Rembrandt are the same (owl:sameAs) person (crm:E21_Person), so that painting data using either thesaurus will be interlinked.
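
Two points from this section can be sketched in a few lines of Python: the transitive skos:broader lookup used for auto-complete, and the effect of owl:sameAs merging on preferred labels. This is a toy in-memory model, not how OWLIM or any real triple store works. The function names are made up for illustration, the "animal"/"animals" labels are assumed for the example, and the merge handles only direct sameAs pairs. In a SPARQL 1.1 store the lookup would typically be a property path such as skos:broader+.

```python
# Toy triple store: a set of (subject, predicate, object) strings.
# The ulan: identifiers are the hypothetical ones used in the text above;
# the two English labels are assumed for the sake of the example.
TRIPLES = {
    ("ulan:Rembrandt", "skos:broader", "ulan:classicalPainter"),
    ("ulan:classicalPainter", "skos:broader", "ulan:painter"),
    ("ex1:animal", "skos:prefLabel", "animal@en"),
    ("ex2:animals", "skos:prefLabel", "animals@en"),
    ("ex1:animal", "owl:sameAs", "ex2:animals"),
}


def narrower_transitive(triples, root):
    """All concepts reachable from root by following skos:broader backwards."""
    found, frontier = set(), [root]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == "skos:broader" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def leaves(triples, root):
    """Descendants of root that have no narrower concept of their own."""
    parents = {o for s, p, o in triples if p == "skos:broader"}
    return {c for c in narrower_transitive(triples, root) if c not in parents}


def merge_same_as(triples):
    """Naive owl:sameAs merge: rewrite each alias to the subject it equals."""
    canon = {o: s for s, p, o in triples if p == "owl:sameAs"}
    return {
        (canon.get(s, s), p, canon.get(o, o))
        for s, p, o in triples
        if p != "owl:sameAs"
    }


def pref_labels(triples, concept):
    """Preferred labels attached to a concept."""
    return {o for s, p, o in triples if s == concept and p == "skos:prefLabel"}


print(leaves(TRIPLES, "ulan:painter"))
print(sorted(pref_labels(merge_same_as(TRIPLES), "ex1:animal")))
```

After the merge, ex1:animal carries two preferred labels in the same language, which is exactly the inconsistency the SKOS Primer warns about. Whether that matters in practice is the trade-off discussed above.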

Skosify

Skosify is an open source (MIT license) tool that can be used to convert vocabularies expressed as RDFS/OWL into SKOS.
It can also be used to improve, enrich and validate existing SKOS vocabularies.
Low maturity (says MarinD, Onto's CTO).

GettyConvert

GettyConvert ("E-culture Getty vocabulary conversion suite") is a set of schemas and Prolog scripts for converting the Getty Thesauri (XML files) into RDF.

  • From a cursory inspection, I don't think it produces SKOS
  • The destination is the eCulture SWI-Prolog semweb server (ClioPatria), but likely the RDF files can be used in any repository
  • Can be very useful to figure out how to deal with the details of the Getty thesauri

Specific Thesauri

Cultural Heritage LOD

Cultural Heritage has its own LOD cloud.

WordNet

By Princeton University.
125k English words with various relations: synonym, hypernym/hyponym (more general/more specific), meronym (substance/part inclusion), etc.
Canonical conversion to RDF by W3C, created around 2005.
Integrated in Ontotext's FactForge.

Getty Thesauri

The Getty Research Institute develops the following important thesauri.
Size is expressed as number of triples (as stated in GettyConvert):

  • Thesaurus of Geographic Names (TGN): 6,476,448
    Why not simply use GeoNames? Because TGN has historic places (a time dimension).
    Vlado: actually I've seen this in the BM Thesaurus, but I guess it's also true of TGN
  • Union List of Artist Names (ULAN): 1,502,089
    Lots of details about the artists: places (requires TGN), nationalities, roles, relations (associatedWith, with subtypes such as teacherOf, patronWas...)
  • Art and Architecture Thesaurus (AAT): 249,162
    Object types, materials, techniques, etc
  • Cultural Object Name Authority (CONA) (in development)

They are licensed, not freely available

BM Thesauri

See separate page

VIAF (People)

VIAF (Virtual International Authority File)

  • joint project of several national libraries plus selected regional and trans-national library agencies. (total 20 libraries)
  • implemented and hosted by OCLC
  • advice from Europeana scientific advisor: "should you want to access the VIAF data, you can send a request to hickey@oclc.org. They are quite willing to share it, even though not openly on the web at the moment"
  • includes all sorts of authors, including historic ones.
    Eg Methodius Saint, Apostle of the Slavs ca. 825-884
    • shows co-authors, eg
      Cyril, Saint, Apostle of the Slavs, ca. 827-869.
  • links to WorldCat, eg:
    Works: 729 works in 1,092 publications in 24 languages and 9,222 library holdings
  • Includes Publication Timeline, search by categories (e.g. "Christian saints, Slavic")
  • Related people:
    Cyril Saint, Apostle of the Slavs ca. 827-869
    Dinekov, Petŭr Nikolov 1910- Editor
    Popruzhenko, M. G. (Mikhail Georgievich) b. 1866 Editor
    Kliment Ohridski d. 916
    Grivec, František
    Romanski, St (Stoi︠a︡n) 1882-1959 Compiler
    Grivec, Franc 1878-
    Angelov, Boni︠u︡ St Editor
  • Related organizations:
    Catholic Church
    Orthodox Eastern Church

Here's an example of Rembrandt, correlated between 20+ libraries!

Lexicon of Greek Names (people, places, professions)

http://clas-lgpn2.classics.ox.ac.uk

Google Places

Pleiades (ancient places)

http://pleiades.stoa.org/
A community-built gazetteer and graph of ancient places. 34k places

IconClass (painting subject)

Subject classification codes for paintings. Eg "gods> Greek gods> Olympians> Zeus"
