KIM can be customized in multiple ways to suit different semantic annotation and search needs. One way to do this is to change the text analysis pipeline to find new types of entities and facts, and use the conceptual models and instance bases relevant to a certain domain.
The case study below describes how to adopt a third-party ontology (DBpedia) in KIM, incorporate it into the default IE pipeline, and make the pipeline aware of the knowledge base for this newly mapped ontology.
The resources in the KIM default IE pipeline depend on the KIM default ontology, PROTON, which is the formal structure of the KIM knowledge base. Due to the complexity of the IE process, adding a new ontology to KIM and making KIM aware of it is not a one-step process.
If PROTON and the new ontology cover very similar domains, you only have to align them. If they are completely different, you have to go through all the stages of integrating an ontology in KIM. In most cases, the task is a mixture of both: part of the new ontology becomes usable just by aligning it to PROTON, and the rest requires making the processing resources aware of it (adding it to lists, rules, etc.).
In order to integrate a small subset of the DBpedia ontology in KIM, you have to incorporate it in the KIM default IE pipeline. To this end, you will use:
- KIM 3 with the default pipeline - from the installation
- the labels model:
  - the labels model is set in install.properties: com.ontotext.kim.KIMConstants.ENTITY_DESCR = Labels
  - you have to load the entity labels from the World Knowledge Base (wkb), wkb-labels.nt.zip, into the OWLIM semantic repository
- a DBpedia extract
Below you will see an extract from the DBpedia ontology. For simplicity, in the rest of the case study, we will call it the DBpedia ontology.
Let's say you want to include the classes highlighted in blue in the IE process.
These three major classes (Person, Organization, Location) are central to adopting a third-party ontology and extending the default IE pipeline. In this process, you first create Lookup annotations and later transform them into annotations of these types. Many of the processing resources in the default IE pipeline then use these annotations to form other annotations.
The first step is to import the DBpedia ontology into KIM.
- Create a sub-folder in the KIM context folder. It will be used as storage for all the RDF data for this task.
For example, create <KIM_HOME>/context/default/kb/dbpedia/.
We recommend this location but you can put your RDF data anywhere in the KIM context folder.
- Put dbpedia_3.5.1.owl, containing the DBpedia taxonomy, in <KIM_HOME>/context/default/kb/dbpedia/
- Put dbpedia_instances.nt, containing the actual objects description, in <KIM_HOME>/context/default/kb/dbpedia/
- Include the new files in the import section of the OWLIM configuration file (<KIM_HOME>/config/owlim.ttl):
Do not forget to add the corresponding namespaces in the defaultNS section.
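The import step above can be sketched as a small helper that prepares the two values to paste into owlim.ttl. This is only an illustration under stated assumptions: it assumes that the imports and defaultNS parameters are semicolon-separated lists with one namespace per imported file, and the relative paths and namespace URIs shown are illustrative guesses, not values taken from a KIM installation.

```python
# Sketch: build the values for the OWLIM `imports` and `defaultNS` parameters.
# Assumption: both are semicolon-separated lists, one namespace per imported file,
# listed in the same order.

def owlim_import_values(files_with_ns):
    """Return (imports, default_ns) strings from (path, namespace) pairs."""
    paths = [path for path, _ in files_with_ns]
    namespaces = [ns for _, ns in files_with_ns]
    return ";".join(paths), ";".join(namespaces)

# Illustrative assumptions: adjust paths and namespaces to your installation.
DBPEDIA_FILES = [
    ("kb/dbpedia/dbpedia_3.5.1.owl", "http://dbpedia.org/ontology/"),
    ("kb/dbpedia/dbpedia_instances.nt", "http://dbpedia.org/resource/"),
]

imports_value, default_ns_value = owlim_import_values(DBPEDIA_FILES)
print("imports   =", imports_value)
print("defaultNS =", default_ns_value)
```

Building both lists from the same pairs keeps the files and their namespaces in matching order, which is easy to get wrong when editing the two values by hand.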
Now we have a running KIM with DBpedia loaded, but the new ontology is isolated from the rest of the knowledge base and has little effect on the IE process. We need to map it to PROTON.
Create a file called dbpedia_proton.nt
To map the DBpedia ontology to PROTON, you can use either equivalence or subsumption.
Generally, subsumption is preferred, because equivalence can have unwanted side effects: equivalent classes share all their instances, so a query for DBpedia people would also return PROTON people.
Subclass the three major classes from DBpedia to the corresponding three major classes in PROTON.
This means, for example, that a query to the knowledge base for PROTON people returns people from both DBpedia and PROTON, whereas a query for DBpedia people returns only DBpedia people.
- Create a file called dbpedia_proton.nt in which you will store all the RDF data that aligns DBpedia ontology to PROTON ontology.
- Put this file in the dbpedia kb folder.
- Include it in the import section of the OWLIM configuration file.
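As a rough sketch of what dbpedia_proton.nt could contain, the snippet below prints one rdfs:subClassOf statement per major class in N-Triples syntax. The namespace URIs and the DBpedia class names (Organisation, Place) are assumptions made for illustration; check them against the ontology files you actually loaded.

```python
# Sketch: generate subsumption statements that align DBpedia classes to PROTON.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
DBPEDIA_ONT = "http://dbpedia.org/ontology/"  # assumed DBpedia ontology namespace
PROTON_TOP = "http://proton.semanticweb.org/2006/05/protont#"  # assumed PROTON namespace

# DBpedia class name -> PROTON class name (assumed local names).
CLASS_MAPPING = {
    "Person": "Person",
    "Organisation": "Organization",
    "Place": "Location",
}

def subclass_triple(sub_uri, super_uri):
    """Return one N-Triples line stating that sub_uri is a subclass of super_uri."""
    return f"<{sub_uri}> <{RDFS_SUBCLASS_OF}> <{super_uri}> ."

mapping_lines = [
    subclass_triple(DBPEDIA_ONT + dbp, PROTON_TOP + proton)
    for dbp, proton in CLASS_MAPPING.items()
]

for line in mapping_lines:
    print(line)
```

The output can be redirected into dbpedia_proton.nt. Because only subClassOf is stated in one direction, PROTON classes do not become subclasses of the DBpedia ones.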
Create a file called dbpedia_kim.nt
Add a descriptive statement for each DBpedia instance, so that KIM can differentiate them from the other instances in the KB.
- Add a statement for each of the new DBpedia entities, in order to be able to differentiate them.
- Create a new file dbpedia_kim.nt in <KIM_HOME>/context/default/kb/dbpedia and put these statements there.
- Add the definition of the DBpedia source:
Create a file called dbpedia_labels.nt
Labels are used for entity recognition in text, so they are quite important for the IE process.
The KIM model relies on the use of rdfs:label and protons:mainLabel. Therefore, we advise you to set sensible values for label and mainLabel for each instance in your KB.
Set a label and a mainLabel for each DBpedia entity.
- Set a mainLabel for each entity:
- Set labels for each entity (several options):
  - Manually set a label from the foaf:name property for the entities from DBpedia:
  - Use inference rules to generate the same statements.
    Rules are better suited to more complex scenarios. We recommend using explicit statements when possible.
    For example, if you want all foaf:name properties to exist also as labels, you can add the following rule to <KIM_HOME>/context/default/kb/KIMRules.pie:
  - Tell OWLIM that these two properties are the same by stating:
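The explicit-statement option can be sketched as follows: given an entity and the names known for it (for example, its foaf:name values), emit one protons:mainLabel statement and one rdfs:label statement per name. The full URI used for protons:mainLabel and the sample names are assumptions for illustration only.

```python
# Sketch: produce label statements for dbpedia_labels.nt in N-Triples syntax.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed expansion of the protons: prefix; verify it against your PROTON files.
PROTONS_MAIN_LABEL = "http://proton.semanticweb.org/2006/05/protons#mainLabel"

def nt_literal(text):
    """Quote a string as an N-Triples literal, escaping backslash, quote, newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def label_triples(entity_uri, names):
    """First name becomes the mainLabel; every name becomes an rdfs:label."""
    if not names:
        return []
    triples = [f"<{entity_uri}> <{PROTONS_MAIN_LABEL}> {nt_literal(names[0])} ."]
    triples += [f"<{entity_uri}> <{RDFS_LABEL}> {nt_literal(n)} ." for n in names]
    return triples

# Hypothetical input: an entity URI with two known names.
aristotle_lines = label_triples(
    "http://dbpedia.org/resource/Aristotle", ["Aristotle", "Aristoteles"]
)
for line in aristotle_lines:
    print(line)
```

Choosing the first name as the mainLabel is a simplification; in practice you would pick whichever name reads best as the display label.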
KIM has a mechanism to control the visibility of different parts of the taxonomy in the Web UI. It is done through a property.
Make your new classes from DBpedia visible in the Web UI.
- Add statements such as:
- Append those statements to <KIM_HOME>/context/default/kb/visibility.nt.
There is little point in making visible the classes that can be subclassed to a PROTON class, for example Philosopher, which is a Person. They can be queried through their parents. Other classes that cannot be mapped directly and are subclassed to protons:Entity should be displayed, so that you can choose them when making queries.
Set up a gazetteer, in order to recognize the newly added entities in texts.
Now we want to recognize our newly added entities in texts. Matching them with the KIM default Large Knowledge Base (LKB) gazetteer won't be of much use, because the objects in the DBpedia taxonomy tree have a lot of subclasses and KIM works best with a flatter, more horizontal structure. We can look up all the DBpedia entities with the default gazetteer, but they will take the form of Lookup annotations with an unfamiliar class. For example, if we find Aristotle, it will be a Lookup annotation with these features:
Therefore, for this particular case of DBpedia, we will create a different gazetteer for each major class. These gazetteers will create lookups in separate annotation sets. Then, with the help of JAPE grammars, these lookups will be used to create Person, Location, Organization annotations in the default annotation set.
- Create a different gazetteer for each of the classes (Person, Location, Organization)
- Generate the gazetteer lists by using a query to the semantic repository. The path to the directory containing the query is set in the FeedSetupPath parameter. This directory should contain a query.txt file with a query, in either SeRQL or SPARQL, that returns instances with their labels and class. The one below is an example SPARQL query for Person.
This query returns the most specific (lowest) class of each entity, which is the class originally asserted for it. The other types are inferred.
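To make the idea of the "lowest class" concrete, here is a plain Python sketch (not the repository query itself) that picks the most specific class from a set of asserted and inferred types. It assumes a single-parent hierarchy given as a child-to-parent mapping; the class names are illustrative.

```python
# Sketch: select the most specific type(s) of an entity from all of its types.

def ancestors(cls, parent_of):
    """Return all superclasses of cls by following parent_of upward."""
    seen = set()
    while cls in parent_of and parent_of[cls] not in seen:
        cls = parent_of[cls]
        seen.add(cls)
    return seen

def most_specific(types, parent_of):
    """Keep only the types that are not a superclass of another type in the set."""
    covered = set()
    for t in types:
        covered |= ancestors(t, parent_of)
    return sorted(set(types) - covered)

# Assumed toy hierarchy: Philosopher is a Person, Person is an Entity.
PARENT_OF = {"Philosopher": "Person", "Person": "Entity"}

# With inference switched on, the repository reports all three types.
aristotle_types = {"Philosopher", "Person", "Entity"}
lowest = most_specific(aristotle_types, PARENT_OF)
print(lowest)
```

A query that filters out every type having a more specific sibling type on the same instance achieves the same effect inside the repository.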
This is how the Person gazetteer is set up. The others are set up in the same way:
- Run them to match the mentions in text with the corresponding entries from the gazetteer lists. This automatically creates Lookup annotations.
- Transform the Lookup annotations into Person, Location, Organization annotations, with the help of JAPE rules.
Transform the Lookup annotations into Person, Location, Organization annotations.
After we have created Lookup annotations for the new entities, we need grammars to convert those Lookups into meaningful types.
- Add a JAPE transducer for each gazetteer.
Each transducer must use the corresponding annotation set as its inputAS (Person for dbpedia_person.jape, and so on) and the default annotation set as its outputAS.
The grammars in these transducers should take the Lookup annotations from the corresponding annotation set and create more meaningful annotations in the default annotation set.
Example: for Person, the grammar looks like this:
For the other types the approach is the same.
- Name the JAPE rules dbpedia_person.jape, dbpedia_location.jape and dbpedia_organization.jape, and put them in <KIM_HOME>/context/default/grammars/dbpedia.
As a result, we get a document, visible through the KIM web interface, in which Aristotle is recognized as an entity. Below is the page http://www.iep.utm.edu/aristotl/ annotated with Aristotle from DBpedia:
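The effect of these transducers can be modelled in a few lines of plain Python. This is not JAPE syntax and not KIM code, just a sketch of the data flow: Lookup annotations are read from a named annotation set and re-created as typed annotations for the default set. The dictionary layout and the feature names class and inst are assumptions made for illustration.

```python
# Sketch: model of "Lookup in a named set -> typed annotation in the default set".

def lookups_to_entities(named_sets, set_name, entity_type):
    """Create one entity_type annotation per Lookup found in named_sets[set_name]."""
    results = []
    for ann in named_sets.get(set_name, []):
        if ann["type"] != "Lookup":
            continue  # ignore anything that is not a gazetteer lookup
        results.append({
            "type": entity_type,
            "start": ann["start"],
            "end": ann["end"],
            "features": dict(ann["features"]),  # copy class/instance features
        })
    return results

# Hypothetical output of the Person gazetteer for a text starting with "Aristotle".
named_sets = {
    "Person": [
        {"type": "Lookup", "start": 0, "end": 9,
         "features": {"class": "http://dbpedia.org/ontology/Philosopher",
                      "inst": "http://dbpedia.org/resource/Aristotle"}},
    ],
}

default_set = lookups_to_entities(named_sets, "Person", "Person")
print(default_set)
```

Keeping one input set per major class means the target type is decided by where the Lookup came from, not by its (possibly very specific) DBpedia class.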