Established thesauri and thesaurus conversion tools
Example
An excellent example of using thesauri is Semantic Annotation of Image Collections by eCulture. It uses ULAN, WordNet, AAT and IconClass for semantic tagging of paintings. To describe the subject matter of a painting, the framework uses an Agent-Action-Object-Recipient approach, eg:
Agent: "woman" (WordNet) Action: "give" (WordNet) Object: "flower" (WordNet) Recipient: "Chagall, Marc" (ULAN)
Agent: "Derain, Andre" (ULAN) Action: "smoke" (WordNet) Object: "pipes(smoking equipment)" (AAT)
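The Agent-Action-Object-Recipient pattern above can be sketched as a small data structure. This is only an illustration: the class and field names (Term, SubjectAnnotation) are assumptions, not part of the eCulture framework, and terms are identified by label and source thesaurus rather than by real URIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """A thesaurus term: a label plus the vocabulary it comes from."""
    label: str
    thesaurus: str  # eg "WordNet", "ULAN", "AAT"


@dataclass(frozen=True)
class SubjectAnnotation:
    """One Agent-Action-Object-Recipient statement (hypothetical model)."""
    agent: Term
    action: Term
    obj: Optional[Term] = None
    recipient: Optional[Term] = None

    def thesauri_used(self) -> set:
        """Return the names of all thesauri referenced by this annotation."""
        slots = (self.agent, self.action, self.obj, self.recipient)
        return {t.thesaurus for t in slots if t is not None}


# The two examples from the text
giving = SubjectAnnotation(
    agent=Term("woman", "WordNet"),
    action=Term("give", "WordNet"),
    obj=Term("flower", "WordNet"),
    recipient=Term("Chagall, Marc", "ULAN"),
)
smoking = SubjectAnnotation(
    agent=Term("Derain, Andre", "ULAN"),
    action=Term("smoke", "WordNet"),
    obj=Term("pipes(smoking equipment)", "AAT"),
)
```

Object and Recipient are optional here because the second example has no recipient.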
SKOS
The preferred way of representing vocabularies in RDF is the Simple Knowledge Organization System (SKOS); see the W3C SKOS Primer. (The examples below assume the ULAN thesaurus is represented as RDF using a ulan: prefix.)
- Each term is a Concept in the respective scheme (thesaurus), eg "ulan:Rembrandt a skos:Concept; skos:inScheme ulan:"
- The term is in an appropriate hierarchy, eg "ulan:Rembrandt skos:broader ulan:classicalPainter".
- We can use this to find appropriate terms for auto-complete or data-entry. Eg assuming "ulan:classicalPainter skos:broader ulan:painter", a query that follows skos:broader transitively (eg the SPARQL 1.1 property path "skos:broader+") should find individual painters as leaf-level descendants of ulan:painter.
- In order to use thesaurus URIs in CRM networks, we should enrich them with the respective CRM class, eg "ulan:Rembrandt a crm:E21_Person". This will let us link Rembrandt to CRM entities and apply CRM properties to it (see E21 Person@crm).
- SKOS provides a dedicated property "skos:exactMatch" to map concepts with equivalent meaning. Our offer says we'll use owl:sameAs for Terminology Mapping, to benefit from OWLIM's sameAs optimization. But the SKOS Primer says:
Note on skos:exactMatch vs. owl:sameAs: SKOS intentionally does not use owl:sameAs. When two resources are linked with owl:sameAs they are considered to be the same resource, and triples involving these resources are merged. This does not fit what is needed in most SKOS applications. Eg if we say "ex1:animal owl:sameAs ex2:animals", the merged resource would carry the skos:prefLabel of both concepts.
This would make ex1:animal inconsistent, as a concept cannot possess two different preferred labels in the same language.
- See discussion on Pedantic Web mailing list
- My answer is: I don't care if we violate some minor SKOS nitpicking
It's more important to state that ulan:Rembrandt and viaf:Rembrandt are the same (owl:sameAs) person (crm:E21_Person), so that painting data using either thesaurus will be interlinked.
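Two points from the list above can be sketched in a few lines of Python, using plain tuples as triples instead of a real RDF store. The data is hypothetical: ulan:classicalPainter and ulan:painter are the assumed concepts from the text, while ulan:Vermeer and the ex1:/ex2: labels are added purely for illustration, and the function names are not from any library. The first function finds leaf-level descendants by following skos:broader transitively; the other two show how an owl:sameAs merge can leave one resource with two preferred labels in the same language.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); not real ULAN data
TRIPLES = [
    ("ulan:Rembrandt", "skos:broader", "ulan:classicalPainter"),
    ("ulan:Vermeer", "skos:broader", "ulan:classicalPainter"),
    ("ulan:classicalPainter", "skos:broader", "ulan:painter"),
    ("ex1:animal", "skos:prefLabel", ("animal", "en")),
    ("ex2:animals", "skos:prefLabel", ("animals", "en")),
]


def leaf_descendants(triples, root):
    """Return concepts below `root` (via skos:broader) that have no narrower concept."""
    narrower = defaultdict(set)
    for s, p, o in triples:
        if p == "skos:broader":
            narrower[o].add(s)
    leaves, stack, seen = set(), [root], {root}
    while stack:
        node = stack.pop()
        children = narrower.get(node, set())
        if not children and node != root:
            leaves.add(node)
        for child in children:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return leaves


def merge_same_as(triples, keep, drop):
    """Rewrite every occurrence of `drop` to `keep`, as an owl:sameAs merge would."""
    def swap(x):
        return keep if x == drop else x
    return [(swap(s), p, swap(o)) for s, p, o in triples]


def pref_label_conflicts(triples):
    """Return {(concept, lang): labels} for concepts with several prefLabels per language."""
    labels = defaultdict(set)
    for s, p, o in triples:
        if p == "skos:prefLabel":
            text, lang = o
            labels[(s, lang)].add(text)
    return {key: vals for key, vals in labels.items() if len(vals) > 1}


painters = leaf_descendants(TRIPLES, "ulan:painter")
merged = merge_same_as(TRIPLES, keep="ex1:animal", drop="ex2:animals")
conflicts = pref_label_conflicts(merged)
```

In a repository that supports SPARQL 1.1, the first function would more likely be a property-path query than application code; the sketch only makes the logic explicit.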
Skosify
Skosify is an open source (MIT license) tool that can be used to convert vocabularies expressed as RDFS/OWL into SKOS.
It can also be used to improve, enrich and validate existing SKOS vocabularies.
Low maturity (says MarinD, Onto's CTO).
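To make the idea of an RDFS/OWL-to-SKOS conversion concrete, here is a minimal sketch of the kind of mapping such a tool performs. It is not Skosify's actual algorithm or API: the mapping tables, the function name to_skos, and the ex: sample data are all assumptions chosen for illustration.

```python
# Assumed predicate mapping from RDFS/OWL terms to SKOS terms (illustrative only)
PREDICATE_MAP = {
    "rdfs:subClassOf": "skos:broader",
    "rdfs:label": "skos:prefLabel",
    "rdfs:comment": "skos:note",
}
TYPE_MAP = {"owl:Class": "skos:Concept", "rdfs:Class": "skos:Concept"}


def to_skos(triples, scheme):
    """Map class-hierarchy triples to SKOS and attach every concept to `scheme`."""
    out = []
    for s, p, o in triples:
        if p == "rdf:type" and o in TYPE_MAP:
            out.append((s, "rdf:type", TYPE_MAP[o]))
            out.append((s, "skos:inScheme", scheme))
        elif p in PREDICATE_MAP:
            out.append((s, PREDICATE_MAP[p], o))
        # Anything else is dropped in this sketch; a real tool would handle more cases
    return out


# Hypothetical input vocabulary
owl_triples = [
    ("ex:Painter", "rdf:type", "owl:Class"),
    ("ex:Painter", "rdfs:label", "painter"),
    ("ex:ClassicalPainter", "rdf:type", "owl:Class"),
    ("ex:ClassicalPainter", "rdfs:subClassOf", "ex:Painter"),
]
skos_triples = to_skos(owl_triples, scheme="ex:scheme")
```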
GettyConvert
GettyConvert ("E-culture Getty vocabulary conversion suite") is a set of schemas and Prolog scripts for converting the Getty Thesauri (XML files) into RDF.
- From a cursory inspection, I don't think it produces SKOS
- The destination is the eCulture SWI-Prolog semweb server (ClioPatria), but likely the RDF files can be used in any repository
- Can be very useful to figure out how to deal with the details of the Getty thesauri
Specific Thesauri
Cultural Heritage LOD
Cultural Heritage has its own LOD cloud:
WordNet
By Princeton University.
125k English words with various relations: synonym, hypernym/hyponym (more general/more specific), meronym (substance/part inclusion), etc.
Canonical conversion to RDF by W3C, created around 2005.
Integrated in Ontotext's FactForge.
Getty Thesauri
The Getty Research Institute develops the following important thesauri.
Size is expressed as number of triples (as stated in GettyConvert):
- Thesaurus of Geographic Names (TGN): 6,476,448
Why not simply use GeoNames? Because TGN has historic places (a time dimension).
Vlado: actually I've seen this in the BM Thesaurus, but I guess it's also true of TGN
- Union List of Artist Names (ULAN): 1,502,089
Lots of details about the artists: places (requires TGN), nationalities, roles, relations (associatedWith, with subtypes such as teacherOf, patronWas...)
- Art and Architecture Thesaurus (AAT): 249,162
Object types, materials, techniques, etc
- Cultural Object Name Authority (CONA) (in development)
They are licensed, not freely available
BM Thesauri
See separate page
VIAF (People)
VIAF (Virtual International Authority File)
- joint project of several national libraries plus selected regional and trans-national library agencies. (total 20 libraries)
- implemented and hosted by OCLC
- advice from Europeana scientific advisor: "should you want to access the VIAF data, you can send a request to hickey@oclc.org. They are quite willing to share it, even though not openly on the web at the moment"
- includes all sorts of authors including historic.
Eg Methodius Saint, Apostle of the Slavs ca. 825-884
- shows co-authors, eg
Cyril, Saint, Apostle of the Slavs, ca. 827-869.
- links to WorldCat, eg:
Works: 729 works in 1,092 publications in 24 languages and 9,222 library holdings
- Includes Publication Timeline, search by categories (e.g. "Christian saints, Slavic")
- Related people:
Cyril Saint, Apostle of the Slavs ca. 827-869
Dinekov, Petŭr Nikolov 1910- Editor
Popruzhenko, M. G. (Mikhail Georgievich) b. 1866 Editor
Kliment Ohridski d. 916
Grivec, František
Romanski, St (Stoi︠a︡n) 1882-1959 Compiler
Grivec, Franc 1878-
Angelov, Boni︠u︡ St Editor
- Related organizations:
Catholic Church
Orthodox Eastern Church
Here's an example of Rembrandt, correlated between 20+ libraries!
Lexicon of Greek Names (people, places, professions)
http://clas-lgpn2.classics.ox.ac.uk
Google Places
Pleiades (ancient places)
http://pleiades.stoa.org/
A community-built gazetteer and graph of ancient places. 34k places
IconClass (painting subject)
Subject classification codes for paintings. Eg "gods > Greek gods > Olympians > Zeus"
- learn about it: http://www.iconclass.nl
- browse the classification: http://www.iconclass.org