- Thesaurus Alignment for Linked Data Publishing (abstract):
Ahsan Morshed, Caterina Caracciolo, Gudrun Johannsen, Johannes Keizer. DC 2011
As part of the publication of the AGROVOC thesaurus as Linked Data (LD), AGROVOC is now mapped with six well-known thesauri in the agricultural domain, i.e., EUROVOC, NALT, GEMET, STW, LCSH, RAMEAU. To find matching candidates, known matching algorithms discussed in the literature and available from public API were used. Results were evaluated by a domain expert, and almost total precision obtained. The candidate matches that were confirmed have already been added to the LD version of AGROVOC. Moreover, the owners of two of the thesauri mapped with AGROVOC have included in their data the mapping we identified. From this work, we conclude that we achieved our goal to enhance the Linked Data version of AGROVOC with reliable links to other thesauri, following a procedure that is fully replicable.
The basis of Dominic's spec
- Semantic Problems of Thesaurus Mapping (abstract, PDF)
Martin Doerr, Journal of Digital Information, 2001
- Effective Terminology Support for Distributed Digital Collections
Martin Doerr, DELOS Workshop, 1998
- Vlado: I think I'll convert the spec to Confluence, since I can't read the images in viewdoc, and it'll be easier to discuss there.
- Compare SILK features to spec requirements
- Try out SILK on Smithsonian data and compare to USC results
Idea for extra functionality: ability to map a smaller local thesaurus to a large remote thesaurus (e.g. VIAF) by invoking a search API.
Rationale: sometimes a user may not be able to load the large thesaurus locally, because of either size or licensing (e.g. Getty ULAN).
Difficulty: it's a completely different data matching problem, because you rely on the remote API to provide reasonable matches.
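A minimal sketch of this idea, under stated assumptions: the remote search API is represented by a plain callable passed in by the caller, so nothing here reflects the actual VIAF or ULAN interfaces. All names (`Candidate`, `Match`, `map_to_remote`, `fake_search`) and the 0.85 threshold are hypothetical, chosen only for illustration. The sketch sends each local label to the remote search, re-scores whatever candidates come back with a simple string similarity, and keeps the best one if it clears the threshold. It also makes the difficulty visible: the matcher only ever sees what the remote search chose to return.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    """One hit returned by the (hypothetical) remote search."""
    remote_id: str
    label: str


class Match(NamedTuple):
    """Best remote candidate for a local label, or None fields if no match."""
    local_label: str
    remote_id: Optional[str]
    remote_label: Optional[str]
    score: float


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def map_to_remote(
    local_labels: Iterable[str],
    search: Callable[[str], Iterable[Candidate]],
    threshold: float = 0.85,  # assumed cut-off, to be tuned on real data
) -> List[Match]:
    """Query the remote search once per local label and keep the best hit."""
    matches = []
    for label in local_labels:
        best, best_score = None, 0.0
        for cand in search(label):
            score = similarity(label, cand.label)
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            matches.append(Match(label, best.remote_id, best.label, best_score))
        else:
            # Either the remote returned nothing, or nothing was close enough.
            matches.append(Match(label, None, None, best_score))
    return matches


def fake_search(query: str) -> List[Candidate]:
    """Stand-in for a remote search API: naive token lookup in a tiny index."""
    index = [
        Candidate("r1", "Rembrandt van Rijn"),
        Candidate("r2", "Vincent van Gogh"),
    ]
    tokens = query.casefold().split()
    return [c for c in index if any(t in c.label.casefold() for t in tokens)]


if __name__ == "__main__":
    for m in map_to_remote(["Rembrandt van Rijn", "Unknown Artist"], fake_search):
        print(m)
```

In practice `fake_search` would be replaced by a thin wrapper around the real remote endpoint, and unmatched labels would go to a human reviewer rather than being discarded.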