h1. Bibliography
h2. Amalgame
VUA's alignment server, implemented in SWI-Prolog on top of ClioPatria
h2. Concepts in Context (2010) workshop
h2. CultureGraph-Metafacture
h2. Dagobert Soergel
h2. Evaluation
h2. Multilingualism and Terminology (Linked Heritage Seminar 20130418)
h2. S-Match
h2. SILK
h2. string-matching
h1. Specific Papers

h2. [Comparison of schema matching evaluations|]
Hong-Hai Do, Sergey Melnik, Erhard Rahm, International Workshop Web and Databases,
Lecture Notes in Computer Science, vol. 2593, Springer, Berlin, 2003

h2. [Thesaurus Alignment for Linked Data Publishing|]
Ahsan Morshed, Caterina Caracciolo, Gudrun Johannsen, Johannes Keizer (DC 2011)

[Abstract|]: As part of the publication of the AGROVOC thesaurus as Linked Data (LD), AGROVOC is now mapped with six well-known thesauri in the agricultural domain, i.e., EUROVOC, NALT, GEMET, STW, LCSH, RAMEAU. To find matching candidates, known matching algorithms discussed in the literature and available from public API were used. Results were evaluated by a domain expert, and almost total precision obtained. The candidate matches that were confirmed have already been added to the LD version of AGROVOC. Moreover, the owners of two of the thesauri mapped with AGROVOC have included in their data the mapping we identified. From this work, we conclude that we achieved our goal to enhance the Linked Data version of AGROVOC with reliable links to other thesauri, following a procedure that is fully replicable.

- This paper is the basis of Dominic's spec
- Lists matching systems, especially those evaluated by [OAEI|] (Ontology Alignment Evaluation Initiative). Open question: since OAEI targets ontology alignment, does it also include studies of instance matching?
-- COMA++, RiMOM, FALCON-AO, S-Match
-- S-Match uses WordNet for synonyms
- Describes AGROVOC, EUROVOC, NALT, GEMET, STW, LCSH, RAMEAU with 1 para each
- Considered all possible pairs for matching. This is computationally intensive; more advanced approaches use blocking/windowing (creating subsets/clusters and comparing only within them)
- Validation was done manually by a multilingual expert and took 40 person-days. He reviewed all 30k matches, exported to Excel, and verified each one in each thesaurus, also looking at the hierarchical context
- See [AGROVOC to NALT Analysis] for a detailed analysis
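
The blocking idea mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the blocking key (first three characters of the normalized label), the function names, and the sample concept IDs and labels are all assumptions made up for the example.

```python
from collections import defaultdict
from itertools import product


def normalize(label):
    """Lowercase and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def blocking_key(label):
    """Hypothetical blocking key: first three characters of the normalized label."""
    return normalize(label)[:3]


def candidate_pairs(source, target):
    """Yield (source_id, target_id) pairs whose labels share a blocking key.

    `source` and `target` map concept IDs to preferred labels.
    """
    blocks = defaultdict(list)
    for tid, label in target.items():
        blocks[blocking_key(label)].append(tid)
    for sid, label in source.items():
        for tid in blocks.get(blocking_key(label), []):
            yield sid, tid


# Made-up sample data, for illustration only.
source = {"a1": "Maize", "a2": "Rice", "a3": "Soil erosion"}
target = {"n1": "maize", "n2": "Rice flour", "n3": "Wheat"}

all_pairs = list(product(source, target))        # exhaustive comparison
blocked = list(candidate_pairs(source, target))  # only pairs within a block

print(len(all_pairs), len(blocked))
```

Only the blocked pairs would then be passed to the more expensive matcher. The trade-off is recall: a true match whose labels fall into different blocks (e.g. a synonym with a different prefix) is never compared, so real systems usually combine several keys.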

h2. [Semantic Problems of Thesaurus Mapping|]
Martin Doerr, Journal of Digital Information, 2001. [PDF|^Semantic Problems of Thesaurus Mapping (JODI 2001).pdf]
[Abstract|]: With networked information access to heterogeneous data sources, the problem of terminology provision and interoperability of controlled vocabulary schemes such as thesauri becomes increasingly urgent. Solutions are needed to improve the performance of full-text retrieval systems and to guide the design of controlled terminology schemes for use in structured data, including metadata. Thesauri are created in different languages, with different scope and points of view and at different levels of abstraction and detail, to accommodate access to a specific group of collections. In any wider search accessing distributed collections, the user would like to start with familiar terminology and let the system find out the correspondences to other terminologies in order to retrieve equivalent results from all addressed collections.
This paper investigates possible semantic differences that may hinder the unambiguous mapping and transition from one thesaurus to another. It focusses on the differences of meaning of terms and their relations as intended by their creators for indexing and querying a specific collection, in contrast to methods investigating the statistical relevance of terms for objects in a collection. It develops a notion of optimal mapping, paying particular attention to the intellectual quality of mappings between terms from different vocabularies and to problems of polysemy. Proposals are made to limit the vagueness introduced by the transition from one vocabulary to another. The paper shows ways in which thesaurus creators can improve their methodology to meet the challenges of networked access of distributed collections created under varying conditions. For system implementers, the discussion will lead to a better understanding of the complexity of the problem.

h2. [Effective Terminology Support for Distributed Digital Collections|]
Martin Doerr, DELOS Workshop, 1998

h2. [Method for Estimating the Precision of Placename Matching|]
Martin Doerr, Manos Papagelis, IEEE Trans. Knowl. Data Eng. 19(8): 1089-1101 (2007)
- [alt pdf|]
- [NKOS 2004 presentation|], [direct download|]

Estimates the precision of matching against a place gazetteer, based on the probability that the same name identifies 1, 2, 3, etc. places (i.e. place polysemy).
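
To make the polysemy idea concrete, here is a toy calculation. It is not the estimator from the paper: it assumes, purely for illustration, that a name-only match picks one of the k places sharing a name uniformly at random, so the expected precision is the sum over k of P(k)/k. The gazetteer contents and place IDs are made up.

```python
from collections import Counter


def polysemy_distribution(name_to_places):
    """Return {k: fraction of names that identify exactly k places}."""
    counts = Counter(len(places) for places in name_to_places.values())
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}


def expected_precision(distribution):
    """Expected precision of a name-only match.

    Simplifying assumption (not from the paper): for a name shared by
    k places, the match hits the intended place with probability 1/k.
    """
    return sum(p / k for k, p in distribution.items())


# Made-up gazetteer: place name -> set of distinct place IDs.
gazetteer = {
    "Springfield": {"p1", "p2", "p3", "p4"},
    "Paris": {"p5", "p6"},
    "Heraklion": {"p7"},
    "Delft": {"p8"},
}

dist = polysemy_distribution(gazetteer)
precision = expected_precision(dist)
print(dist, precision)
```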

h2. [Matching unstructured vocabularies using a background ontology|]
Aleksovski, Z., Klein, M., ten Kate, W., van Harmelen, F. (EKAW 2006)

Abstract. Existing ontology matching algorithms use a combination of lexical and structural correspondence between source and target ontologies. We present a realistic case-study where both types of overlap are low: matching two unstructured lists of vocabulary used to describe patients at Intensive Care Units in two different hospitals. We show that indeed existing matchers fail on our data. We then discuss the use of background knowledge in ontology matching problems. In particular, we discuss the case where the source and the target ontology are of poor semantics, such as flat lists, and where the background knowledge is of rich semantics, providing extensive descriptions of the properties of the concepts involved. We evaluate our results against a Gold Standard set of matches that we obtained from human experts.
- Uses hierarchical information from the background ontology for the matching
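
A rough sketch of how matching through a background ontology might work, assuming two steps: anchor each flat-list term to a background concept by label, then derive a relation from the background hierarchy. This is an illustration under those assumptions, not the authors' implementation; the function names, the `bg:` identifiers and the tiny hierarchy are hypothetical.

```python
def ancestors(concept, broader):
    """Return all transitive broader concepts of `concept` in the background hierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def anchor(term, background_labels):
    """Anchor a flat-list term to a background concept by exact normalized label."""
    return background_labels.get(term.strip().lower())


def match(source_term, target_term, background_labels, broader):
    """Relate two terms via the background ontology, or return None if no relation is found."""
    s = anchor(source_term, background_labels)
    t = anchor(target_term, background_labels)
    if s is None or t is None:
        return None  # at least one term could not be anchored
    if s == t:
        return "equivalent"
    if t in ancestors(s, broader):
        return "narrower"  # source term is narrower than target term
    if s in ancestors(t, broader):
        return "broader"   # source term is broader than target term
    return None


# Made-up background knowledge, for illustration only.
background_labels = {
    "heart": "bg:Heart",
    "aorta": "bg:Aorta",
    "cardiovascular system": "bg:CardioSys",
}
broader = {"bg:Heart": ["bg:CardioSys"], "bg:Aorta": ["bg:CardioSys"]}

print(match("Aorta", "Cardiovascular system", background_labels, broader))
```

The point of the design is that the two input lists need no structure of their own: all hierarchical reasoning happens inside the background ontology, and the quality of the result depends on how many terms can be anchored.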