Adding instance data and facts to the knowledge base
Information extraction (IE) can be improved by modeling a set of predefined entities in the knowledge base. KIM uses such an entity only if it meets all of the following requirements:
- it is of a type that is a subclass of protons:Entity
- it has at least one alias (label)
- it is generated by a Trusted source
The choice of these entities is usually driven by two aspects:
- Importance
Entities of high importance in a domain are likely to appear often, so their correct extraction is crucial. Important entities also tend to acquire several aliases because they are referenced so frequently, and these alternative names are likely to be hard to identify and classify.
- Existing lists
If you want to customize KIM for a particular domain in which you will use semantic annotation, it will usually be easy to obtain lists of instances of the domain-specific classes. Examples are all robot brands and models in a robot domain, or all location names in a geographically restricted domain (such as South African politics). These lists can be transformed into entities and modeled as an extension of the instance base (knowledge base) associated with KIM.
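As a rough illustration of turning an existing list into entities, the following Python sketch converts instance names into N3-style type statements. The `inspace:` prefix, the `Robot` class name, the `Class_Name` naming pattern, and both helper functions are assumptions made for this example only; they are not part of KIM.

```python
import re


def to_local_name(class_name, instance_name):
    """Build a URI-safe local name such as 'Robot_R2D2' (naming pattern is an assumption)."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", instance_name.strip()).strip("_")
    return f"{class_name}_{cleaned}"


def list_to_type_statements(class_prefix, class_name, instance_names):
    """Return one N3-style rdf:type statement per distinct instance name."""
    statements = []
    seen = set()
    for name in instance_names:
        local = to_local_name(class_name, name)
        if local in seen:  # skip duplicates coming from the source list
            continue
        seen.add(local)
        statements.append(f"wkb:{local} rdf:type {class_prefix}:{class_name} .")
    return statements


# Hypothetical list of robot names; note the duplicate entry.
robots = ["R2D2", "C-3PO", "R2D2"]
type_statements = list_to_type_statements("inspace", "Robot", robots)
print("\n".join(type_statements))
```

The generated statements still need aliases and a trusted source before the entities qualify for lookup.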
- Alias modeling
Entities usually have more than one name (e.g. R2D2 and R-Two D-Two). In KIM this is modeled by the helper classes Alias and MainAlias, together with the respective relations hasAlias and hasMainAlias. The MainAlias is the official or most popular name; it is used to represent the entity in the KIM Web UI and in client applications.
Keep the alias definitions in the same place as the declaration of the individual they belong to.
Here is an example of modeling R2D2's class and names:
Example: in.space.n3 (part 1)
The URIs of the labels, like wkb:Robot_R2D2.1, don't need to be in that exact format, ending in .<number>. They only need to be unique.
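The following Python sketch shows one way to generate alias statements with unique label URIs. It is an independent illustration, not the content of in.space.n3. The class and relation names come from the text above, but placing them under the `protons:` prefix, storing the name text with `rdfs:label`, and treating the first name as the main alias are assumptions.

```python
def alias_statements(entity, names):
    """Return N3-style statements for an entity's aliases.

    The first name is treated as the main alias (an assumption of this sketch).
    Label URIs use the '<entity>.<number>' pattern, but any unique URI would do.
    """
    if not names:
        raise ValueError("an entity needs at least one alias")
    statements = []
    for number, name in enumerate(names, start=1):
        label = f"{entity}.{number}"
        is_main = number == 1
        alias_class = "protons:MainAlias" if is_main else "protons:Alias"
        relation = "protons:hasMainAlias" if is_main else "protons:hasAlias"
        # Escape backslashes and double quotes so the literal stays well-formed.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        statements.append(f'{label} rdf:type {alias_class} ; rdfs:label "{escaped}" .')
        statements.append(f"{entity} {relation} {label} .")
    return statements


r2d2_aliases = alias_statements("wkb:Robot_R2D2", ["R2D2", "R-Two D-Two"])
print("\n".join(r2d2_aliases))
```

Numbering the labels per entity is only a convenient way to guarantee uniqueness.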
- Relation modeling
Entities may have relations to other entities. Each such relation must be defined in the ontology, together with its domain and range. An example of such a relation is assigning a creator to a robot:
Example: in.space.n3 (part 2)
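To make the role of domain and range concrete, the Python sketch below checks a relation statement against a small, hypothetical ontology fragment. All names under `inspace:` (including `hasCreator`, `Person`, and `Engineer`) are invented for the example. The check is a simple consistency test written for illustration; it is not how KIM or an RDFS reasoner processes domain and range.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all of its ancestors in the class hierarchy."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(subclass_of.get(current, ()))
    return found


def relation_is_valid(subject_type, relation, object_type, relations, subclass_of):
    """Check a statement against the declared domain and range of its relation."""
    if relation not in relations:
        return False  # the relation is not defined in the ontology
    domain, range_ = relations[relation]
    return (domain in superclasses(subject_type, subclass_of)
            and range_ in superclasses(object_type, subclass_of))


# Hypothetical ontology fragment for the robot domain.
subclass_of = {"inspace:Engineer": ["inspace:Person"]}
relations = {"inspace:hasCreator": ("inspace:Robot", "inspace:Person")}

# A robot created by an engineer fits the declared domain and range.
print(relation_is_valid("inspace:Robot", "inspace:hasCreator",
                        "inspace:Engineer", relations, subclass_of))
```

Swapping subject and object, or using a relation that is not declared, makes the check fail.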
- Source (generatedBy)
During the phrase-lookup (gazetteer) phase of the default IE module, KIM annotates each mention of a subset of the entities in the knowledge base. By default, the dictionary of pre-defined named entities that are recognized in texts consists of those entities in the knowledge base that:
- are of a type that is a subclass of protons:Entity
- have at least one alias
- are marked as Trusted
To mark an entity as trusted, make sure that the semantic repository contains statements similar to the following:
Example: in.space.n3 (part 3)
The only requirement for an entity to be marked as trusted is that it is generated by a source of type protons:Trusted. Some trusted sources are defined at the top of wkb.nt. You can also define your own.
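The three default conditions can be summarized in a small Python sketch that works on triples held in memory. It only illustrates the rules described here and is not KIM's implementation. The `protons:` prefix on hasAlias, hasMainAlias, and generatedBy, the `inspace:Robot` class, and the `wkb:ExampleSource` source are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def objects(subject, predicate, triples):
    """Return all objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def is_subclass(cls, ancestor, triples):
    """True if cls is the ancestor itself or reaches it through rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(objects(current, SUBCLASS_OF, triples))
    return False


def in_gazetteer_dictionary(entity, triples):
    """Apply the three default conditions: entity type, alias, trusted source."""
    typed = any(is_subclass(t, "protons:Entity", triples)
                for t in objects(entity, RDF_TYPE, triples))
    has_alias = bool(objects(entity, "protons:hasAlias", triples)
                     or objects(entity, "protons:hasMainAlias", triples))
    # Only sources typed directly as protons:Trusted are accepted in this sketch.
    trusted = any("protons:Trusted" in objects(source, RDF_TYPE, triples)
                  for source in objects(entity, "protons:generatedBy", triples))
    return typed and has_alias and trusted


# Hypothetical data: R2D2 has a trusted source, C3PO has no source at all.
triples = [
    ("inspace:Robot", SUBCLASS_OF, "protons:Entity"),
    ("wkb:ExampleSource", RDF_TYPE, "protons:Trusted"),
    ("wkb:Robot_R2D2", RDF_TYPE, "inspace:Robot"),
    ("wkb:Robot_R2D2", "protons:hasMainAlias", "wkb:Robot_R2D2.1"),
    ("wkb:Robot_R2D2", "protons:generatedBy", "wkb:ExampleSource"),
    ("wkb:Robot_C3PO", RDF_TYPE, "inspace:Robot"),
    ("wkb:Robot_C3PO", "protons:hasAlias", "wkb:Robot_C3PO.1"),
]
print(in_gazetteer_dictionary("wkb:Robot_R2D2", triples))  # True
print(in_gazetteer_dictionary("wkb:Robot_C3PO", triples))  # False
```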