The Large Knowledge Base (LKB) gazetteer consists of a set of lists containing concepts (Persons, Locations, Organizations, etc.) loaded directly from the semantic repository that you use as background knowledge, rather than from a predetermined, flat set of gazetteer lists. As a result, annotated entities are linked to specific instances in the semantic repository. The LKB gazetteer is part of the GATE distribution and provides an efficient representation of very large vocabularies, as well as query-based selective loading from RDF databases.
The Large Knowledge Base (LKB) Gazetteer loads collections of identifiers and labels and uses them for gazetteer lookup. It populates its dictionaries through custom implementations of the "DictionaryFeeder" interface, so arbitrary data sources can serve as dictionary sources.
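The feeder idea can be illustrated with a minimal Java sketch. The real "DictionaryFeeder" signature is not shown on this page, so everything below is an assumption made for illustration: `SketchFeeder`, `MapFeeder`, `FeederDemo`, and the example URI are hypothetical names and are not part of the GATE API. The sketch only shows the general pattern of a data source pushing (label, identifier) pairs into a dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical demo class; not part of GATE.
class FeederDemo {
    // Collects everything a feeder emits as "label -> identifier" strings.
    static List<String> collect(SketchFeeder feeder) {
        List<String> entries = new ArrayList<>();
        feeder.feed((label, id) -> entries.add(label + " -> " + id));
        return entries;
    }

    public static void main(String[] args) {
        SketchFeeder feeder =
            new MapFeeder(Map.of("London", "http://example.org/id/London"));
        System.out.println(collect(feeder));
    }
}

// Hypothetical stand-in for a dictionary feeder.
// This is NOT the real DictionaryFeeder signature.
interface SketchFeeder {
    // Pushes (label, identifier) pairs into the supplied sink.
    void feed(BiConsumer<String, String> sink);
}

// Feeds entries from an in-memory map. A real feeder might instead
// read from an RDF repository, a file, or a database.
class MapFeeder implements SketchFeeder {
    private final Map<String, String> labelToId;

    MapFeeder(Map<String, String> labelToId) {
        this.labelToId = labelToId;
    }

    @Override
    public void feed(BiConsumer<String, String> sink) {
        labelToId.forEach(sink);
    }
}
```

Because the dictionary only sees the pairs passed to the sink, the data source can be swapped without changing the lookup side.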
The LKB Gazetteer supports a Static dictionary, loaded at component initialization, and Dynamic dictionaries, loaded during Document processing.
- Static Dictionary highlights:
  - Can hold huge amounts of data (millions of entries).
  - Can be serialized for faster loading on subsequent runs.
  - Can be shared in memory by multiple Gazetteer instances working in parallel.
- Dynamic Dictionaries highlights:
  - Are populated during Document processing and can hold only the entries that are needed.
  - Can be loaded with up-to-the-minute data or Document-dependent data.
  - Are disposed of when Document processing ends.
The Gazetteer can use only the Static dictionary, only Dynamic dictionaries, or both simultaneously.
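The interplay of the two dictionary kinds can be sketched as follows. This is a simplified illustration and not the LKB Gazetteer implementation: the class `TwoTierLookup`, its method names, exact-string matching, and the choice to consult the dynamic dictionary before the static one are all assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical class; not part of GATE.
class TwoTierLookup {
    // Static entries: loaded once and kept for the component's lifetime.
    private final Map<String, String> staticDict;
    // Dynamic entries: filled per document and discarded afterwards.
    private final Map<String, String> dynamicDict = new HashMap<>();

    TwoTierLookup(Map<String, String> staticDict) {
        this.staticDict = Map.copyOf(staticDict);
    }

    // Loads document-specific entries before a document is processed.
    void beginDocument(Map<String, String> documentEntries) {
        dynamicDict.clear();
        dynamicDict.putAll(documentEntries);
    }

    // Disposes of the dynamic entries once the document is done.
    void endDocument() {
        dynamicDict.clear();
    }

    // Dynamic entries are checked first (an assumption of this sketch),
    // then the static dictionary.
    Optional<String> lookup(String label) {
        String id = dynamicDict.get(label);
        if (id == null) {
            id = staticDict.get(label);
        }
        return Optional.ofNullable(id);
    }

    public static void main(String[] args) {
        TwoTierLookup gaz =
            new TwoTierLookup(Map.of("Paris", "urn:example:Paris"));
        gaz.beginDocument(Map.of("ACME", "urn:example:ACME"));
        System.out.println(gaz.lookup("ACME").isPresent());
        gaz.endDocument();
        System.out.println(gaz.lookup("ACME").isPresent());
    }
}
```

Constructing the object with an empty static map, or never calling `beginDocument`, corresponds to the dynamic-only and static-only configurations respectively.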
This processing resource (PR) is not compatible with the older openrdf jars that appear by default in the lib folder of GATE 5.0, and therefore not with any component that depends on them, such as the Ontology LR or the OntoRootGazetteer. To use this PR, remove the openrdf jars and do not load those plugins. This issue is corrected in later versions.
The LKB gazetteer uses a number of configuration files, including the set of SPARQL queries to be run against the ontology.