As a semantic annotation platform, KIM uses ontologies and knowledge bases. It can also generate and store new knowledge. This section explains the roles of the different data sources related to KIM. It also clarifies which are mandatory and which are subject to configuration, extension or customization.
KIM is based on the PROTON ontology, developed in the scope of the Semantically Enabled Knowledge Technologies (SEKT) project. KIM depends solely on the System module of PROTON, which is further extended by KIMSO (KIM System ontology). The other related ontologies, KIMLO (KIM Lexical ontology) and the Top and Upper modules of PROTON, are part of the distribution. KIM makes use of them, but they can also be replaced, changed, or extended.
PROTON is a lightweight upper-level ontology that defines about 300 classes and 100 properties, covering most of the upper-level concepts necessary for semantic annotation, indexing, and retrieval. It is separated into three modules:
- System module: contains a few meta-level primitives (5 classes and 5 properties). It introduces the notion of 'entity' that may have aliases. The primitives on this level are usually a few elements that have to be hard-coded in the ontology-based applications. This module can be considered as an application ontology.
- Top module: the highest, most general, conceptual level, consisting of about 20 classes. They ensure a good balance of utility, domain independence, and ease of understanding and usage. The top layer is usually the best level for establishing alignment to other ontologies and schemata.
- Upper module: over 200 general classes of entities that often appear in multiple domains (e.g. various sorts of organizations, a comprehensive range of locations, etc.).
The diagram below demonstrates the dependencies between the different modules of PROTON and the KIM specific modules. The strongest dependency is placed at the bottom of the diagram.
The diagram also illustrates potential extension/customization paths. You can easily extend or substitute (partially or completely) the PROTON Top and Upper modules with any other Top and Upper ontologies (represented here by the Custom Top Ontology and Custom Upper Ontology). However, we expect the PROTON Top and Upper modules to be efficient, consistent, and generic enough to satisfy the needs of the majority of use cases.
Application Ontology and Domain Ontology are designators of the respective specific application and/or domain ontologies that will be mapped to PROTON as its extensions, depending on the requirements of each particular use case.
To summarize, in order to be recognized by KIM, any extensions to the ontology must be integrated with PROTON and KIMSO.
To integrate an ontology extension, complete two steps:
- Make the new classes subclasses of protons:Entity
- Set the new classes' visibility level
Make sure that the new classes inherit at least http://proton.semanticweb.org/2006/05/protons#Entity, directly or indirectly.
We recommend that, if applicable, your classes inherit one of these PROTON Top classes:
By default, the ontology extension modules (*.owl) are located in the <KIM_HOME>/context/default/kb/owl folder. You can place them somewhere else if you prefer, as long as you include them in the OWLIM configuration file correctly.
All classes should have the "label" attribute set. Check this in particular when you use a third-party schema designer to create the ontology extension.
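The two checks described above (every new class inherits protons:Entity directly or indirectly, and every class has a label) can be sketched as a small pre-flight check. This is only an illustration under stated assumptions: the ontology is represented as plain (subject, predicate, object) tuples, the "label" attribute is assumed to be rdfs:label, and the `http://example.org/space#` namespace and its classes are hypothetical. It is not part of KIM.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # assumption: "label" means rdfs:label
PROTON_ENTITY = "http://proton.semanticweb.org/2006/05/protons#Entity"


def reaches_entity(cls, triples):
    """Return True if cls is protons:Entity or a direct/indirect subclass of it."""
    parents = {}
    for s, p, o in triples:
        if p == RDFS_SUBCLASS_OF:
            parents.setdefault(s, set()).add(o)
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current == PROTON_ENTITY:
            return True
        if current in seen:
            continue  # guards against subclass cycles
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def check_extension(new_classes, triples):
    """Map each problematic class to a list of the checks it fails."""
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    problems = {}
    for cls in new_classes:
        issues = []
        if not reaches_entity(cls, triples):
            issues.append("does not inherit protons:Entity")
        if cls not in labelled:
            issues.append("has no label")
        if issues:
            problems[cls] = issues
    return problems


# Hypothetical extension classes, for illustration only.
SPACE = "http://example.org/space#"
MACHINE, ROBOT, DROID, PROBE = (SPACE + n for n in ("Machine", "Robot", "Droid", "Probe"))
example_triples = [
    (MACHINE, RDFS_SUBCLASS_OF, PROTON_ENTITY),
    (MACHINE, RDFS_LABEL, "Machine"),
    (ROBOT, RDFS_SUBCLASS_OF, MACHINE),   # indirect subclass of protons:Entity
    (ROBOT, RDFS_LABEL, "Robot"),
    (DROID, RDFS_LABEL, "Droid"),         # labelled, but not connected to Entity
    (PROBE, RDFS_SUBCLASS_OF, ROBOT),     # connected, but has no label
]
example_problems = check_extension([MACHINE, ROBOT, DROID, PROBE], example_triples)
```

In this example only the Droid and Probe classes are reported, each with the single check it fails.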
Here is an example extension module that defines "Robot" as a class and also defines the relation "createdBy":
If you have an existing ontology in OWL, you can alternatively create a separate mapping file that contains only rdfs:subClassOf statements. Then add both your ontology and the mapping file to the KIM Server installation. For an example, see the step-by-step DBPedia ontology integration case study.
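As a rough sketch of what such a mapping file could contain, the snippet below renders a dictionary of class-to-superclass pairs as N-Triples lines. The `http://example.org/legacy#` classes are hypothetical and the pairs are not taken from the case study; only the rdfs:subClassOf property and the protons:Entity URI are real identifiers.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
PROTON_ENTITY = "http://proton.semanticweb.org/2006/05/protons#Entity"


def mapping_ntriples(mapping):
    """Render {child_class_uri: parent_class_uri} as N-Triples text, one statement per line."""
    lines = [
        f"<{child}> <{RDFS_SUBCLASS_OF}> <{parent}> ."
        for child, parent in sorted(mapping.items())
    ]
    return "\n".join(lines) + "\n"


# Hypothetical classes from an existing ontology, for illustration only.
LEGACY = "http://example.org/legacy#"
example_mapping = {
    LEGACY + "Robot": PROTON_ENTITY,
    LEGACY + "Droid": LEGACY + "Robot",
}
mapping_text = mapping_ntriples(example_mapping)
```

Writing `mapping_text` to a file yields a mapping module that leaves the original ontology untouched, which is the point of keeping the subclass statements separate.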
All new classes that should be visible in the class hierarchy in the web interface and in the Structure/Patterns screens must be declared "visible" in the <KIM_HOME>/context/default/kb/visibility.nt file like this:
Example: visibility.nt (addition)
Information extraction (IE) can be enhanced by modeling a set of predefined entities in the knowledge base. KIM uses such an entity when it meets the following requirements:
- it is of a type that is a subclass of protons:Entity
- it has at least one alias (label)
- it is generated by a Trusted source
The choice of these entities is usually driven by two aspects:
Entities of high importance in a domain are likely to appear often, so their correct extraction is crucial. Important entities also tend to accumulate aliases because they are referenced frequently. These alternative names are likely to be hard to classify and identify.
- Existing lists
If you customize KIM for a particular domain in which you will use semantic annotation, lists of instances of the domain-specific classes are often easy to obtain. Examples are all robot brands and models in a robot domain, or all location names in a geographically restricted domain (like South-African Politics). These lists can be transformed into entities and modeled as an extension of the instance base or knowledge base associated with KIM.
Entities usually have more than one name (e.g. R2D2 and R-Two D-Two). In KIM this is modeled by the helper classes Alias and MainAlias, as well as the respective relations hasAlias and hasMainAlias. The MainAlias is the official or most popular name and is used for entity representation in the KIM Web UI or the client applications.
The alias definitions and the individual declaration should be in the same place.
Here is an example of modeling R2D2's class and names:
Example: in.space.n3 (part 1)
The URIs of the labels, like wkb:Robot_R2D2.1, don't need to be in that exact format, ending in .<number>. They only need to be unique.
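To illustrate the uniqueness requirement, the following sketch builds alias statements for an entity from a list of names, deriving each alias URI by appending a running number. Several details are assumptions made for illustration: the `http://example.org/kb#` namespace is hypothetical, the Alias, MainAlias, hasAlias and hasMainAlias terms are assumed to live in the protons namespace, and the alias text is assumed to be attached with rdfs:label. Compare against the alias statements in your own knowledge base before relying on it.

```python
PROTONS = "http://proton.semanticweb.org/2006/05/protons#"  # assumed namespace of the alias terms
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"   # assumption: alias text is an rdfs:label


def alias_triples(entity_uri, names):
    """Build alias statements; the first name is treated as the main alias."""
    triples = []
    for index, name in enumerate(names, start=1):
        alias_uri = f"{entity_uri}.{index}"  # any scheme works as long as the URI is unique
        if index == 1:
            triples.append((entity_uri, PROTONS + "hasMainAlias", alias_uri))
            triples.append((alias_uri, RDF_TYPE, PROTONS + "MainAlias"))
        else:
            triples.append((entity_uri, PROTONS + "hasAlias", alias_uri))
            triples.append((alias_uri, RDF_TYPE, PROTONS + "Alias"))
        triples.append((alias_uri, RDFS_LABEL, name))
    return triples


# Hypothetical knowledge base namespace, for illustration only.
R2D2 = "http://example.org/kb#Robot_R2D2"
r2d2_aliases = alias_triples(R2D2, ["R2D2", "R-Two D-Two"])
alias_uris = {s for s, p, _ in r2d2_aliases if p == RDFS_LABEL}
```

The numbering suffix is a convenience only; any other scheme that yields distinct URIs would satisfy the same requirement.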
- Relation modeling
Entities may have relations to other entities. These relations must be defined in the ontology with respect to domain and range. An example of such a relation is assigning a creator to a robot:
Example: in.space.n3 (part 2)
- Source (generatedBy)
During the phrase-lookup (gazetteer) phase of the default IE module, KIM annotates each mention of a subset of the entities in the knowledge base. By default, the dictionary of pre-defined named entities that are recognized in texts consists of those entities in the knowledge base that:
- are of a type that is a subclass of protons:Entity
- have at least one alias
- are marked as Trusted
To mark an entity as trusted, make sure that the semantic repository contains statements similar to the following one:
in.space.n3 (part 3)
The only requirement for an entity to be marked as trusted is that it is generated by a source of type protons:Trusted. Some trusted sources are defined at the top of wkb.nt. You can also define your own.
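The three conditions for the default lookup dictionary can be summarized in a short sketch that filters a set of (subject, predicate, object) tuples. It is an illustration only, not how KIM implements the selection. The `http://example.org/` namespaces and instances are hypothetical, the alias and generatedBy properties are assumed to live in the protons namespace, and a type equal to protons:Entity itself is treated as acceptable.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
PROTONS = "http://proton.semanticweb.org/2006/05/protons#"
ALIAS_PROPERTIES = {PROTONS + "hasAlias", PROTONS + "hasMainAlias"}  # assumed namespace


def gazetteer_candidates(triples):
    """Return the subjects that satisfy all three lookup conditions."""
    by_predicate = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        by_predicate[p][s].add(o)

    types = by_predicate[RDF_TYPE]
    parents = by_predicate[RDFS_SUBCLASS_OF]
    generated_by = by_predicate[PROTONS + "generatedBy"]
    trusted_sources = {s for s, ts in types.items() if PROTONS + "Trusted" in ts}

    def under_entity(cls):
        # Walk rdfs:subClassOf upwards; protons:Entity itself counts (assumption).
        seen, stack = set(), [cls]
        while stack:
            current = stack.pop()
            if current == PROTONS + "Entity":
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return False

    result = set()
    for entity, entity_types in types.items():
        typed = any(under_entity(t) for t in entity_types)
        has_alias = any(entity in by_predicate[p] for p in ALIAS_PROPERTIES)
        trusted = bool(generated_by.get(entity, set()) & trusted_sources)
        if typed and has_alias and trusted:
            result.add(entity)
    return result


# Hypothetical data, for illustration only.
KB, SPACE = "http://example.org/kb#", "http://example.org/space#"
example_kb = [
    (SPACE + "Robot", RDFS_SUBCLASS_OF, PROTONS + "Entity"),
    (KB + "MySource", RDF_TYPE, PROTONS + "Trusted"),
    (KB + "Robot_R2D2", RDF_TYPE, SPACE + "Robot"),
    (KB + "Robot_R2D2", PROTONS + "hasMainAlias", KB + "Robot_R2D2.1"),
    (KB + "Robot_R2D2", PROTONS + "generatedBy", KB + "MySource"),
    (KB + "Robot_C3PO", RDF_TYPE, SPACE + "Robot"),
    (KB + "Robot_C3PO", PROTONS + "hasAlias", KB + "Robot_C3PO.1"),  # no trusted source
]
candidates = gazetteer_candidates(example_kb)
```

In this data only the R2D2 instance qualifies; the C3PO instance has a type and an alias but no trusted source, so it would be left out of the dictionary.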
If you extend the KIM ontology prior to using KIM to annotate a document, we recommend that you add the new RDF files to the initial load list of the embedded OWLIM.
- Place all RDF files created above in <KIM_HOME>/context/default/kb or any sub-folder.
- Append the files to the initial load list in <KIM_HOME>/config/owlim.ttl, giving each RDF file's path relative to <KIM_HOME>/context/default. See the following example configuration:
For details on configuration options, see the OWLIM documentation. When adding ontologies with more than 1 million statements, you will need to edit the database configuration to get good performance. The OWLIM product page on our website also offers presentations, free white papers, and use cases that demonstrate the capabilities of the semantic repository.
Finally, reset KIM to its initial state: stop the server if it is running, including Apache Tomcat, and remove the whole <KIM_HOME>/context/default/populated directory. This removes all annotated documents.
If you need to extend the ontology of a KIM Server without losing existing documents, don't change the initial load list of the embedded BigOWLIM, but import the new RDF using the RDF Import Tool. Afterwards, stop the KIM Server and Tomcat, delete the whole <KIM_HOME>/context/default/populated/cache directory and start the server again. Naturally, in that case, the ontology changes will not be reflected in the previously annotated documents.