KIM can be [customized in multiple ways|http://ontotext.com/kim/tailorKIM.html] to suit different [semantic annotation|http://ontotext.com/kim/semanticannotation.html] and search needs. One way to do this is to change the text analysis pipeline to find new types of entities and facts, and use the conceptual models and instance bases relevant to a certain domain.

The case study below describes the methods of adopting a third-party ontology (DBpedia) in KIM, incorporating it in the default IE pipeline, and making the pipeline aware of the knowledge base for this new mapped ontology.

h2. Some background and description of the task

The resources in the KIM default IE pipeline depend on the KIM default ontology, the [PROTON ontology|http://proton.semanticweb.org/], which provides the formal structure of the KIM knowledge base. Because the IE process is complex, adding a new ontology to KIM and making KIM aware of it is not a one-step process.
If PROTON and the new ontology cover very similar domains, you only have to align them. If they are completely different, you have to go through all the stages of integrating an ontology in KIM. In most cases, the task is a mixture of both: part of the new ontology can be used just by aligning it to PROTON, and the rest by making the processing resources aware of it (adding it to lists, rules, etc.).

h2. Prerequisites

In order to integrate a small subset of the DBpedia ontology in KIM, you have to incorporate it in the KIM default IE pipeline. To this end, you will use:

* KIM 3 with the default pipeline - from the installation
* the labels model:
** the labels model is set at {{install.properties}} \- {{com.ontotext.kim.KIMConstants.ENTITY_DESCR = Labels}}.
** you have to load the entity labels from the World Knowledge Base (wkb) - ([_Files for reuse^wkb-labels.nt.zip]) - in the OWLIM semantic repository.
* a DBpedia extract
** a small subset of the original DBpedia ([http://dbpedia.org]) ontology - [_Files for reuse^dbpedia-ontology.zip]

Below you will see an extract from the DBpedia ontology. For simplicity, in the rest of the case study, we will call it the DBpedia ontology.

| !_Images for reuse^dbpedia_taxonomy.png! | !_Images for reuse^dbpedia_taxonomuy_organisation.png! | !_Images for reuse^dbpedia_taxonomy_person.png! | !_Images for reuse^dbpedia_place.png! |

h2. Ontology integration steps

Let's say you want to include the classes highlighted in blue in the IE process.

These three major classes (Person, Organization, Location) are very important for adopting a third-party ontology and extending the default IE pipeline. In this process you first create Lookup annotations and later transform them into annotations of these types. Finally, many of the processing resources of the default IE pipeline use these annotations to form other annotations.

h3. Importing the DBpedia ontology in KIM

!_Images for reuse^task.png!
*The first step: import the DBpedia ontology in KIM.*

!_Images for reuse^to_do.png!
* Create a sub-folder in the KIM context folder. It will be used as storage for all the RDF data for this task.
_For example_, create *<KIM_HOME>/context/default/kb/dbpedia/*.

(!) We recommend this location but you can put your RDF data anywhere in the KIM context folder.

* Put *dbpedia_3.5.1.owl*, containing the DBpedia taxonomy, in *<KIM_HOME>/context/default/kb/dbpedia/*
* Put *dbpedia_instances.nt*, containing the actual objects description, in *<KIM_HOME>/context/default/kb/dbpedia/*
* Include the new files in the _import_ section of the OWLIM configuration (*<KIM_HOME>/config/owlim.ttl*):
{code:text}............

kb/wkb.nt;
kb/wkb-labels.nt;
kb/wkbx.nt;
kb/dbpedia/dbpedia_3.5.1.owl;
kb/dbpedia/dbpedia_instances.nt;" ;

owlim:defaultNS
"http://www.w3.org/2002/07/owl#;
............
{code}

(!) Do not forget to add the corresponding namespaces in the {{defaultNS}} section.

Now we have a running KIM with DBpedia loaded, but the new ontology is isolated from the rest of the knowledge base and has little effect on the IE process. We need to map it to PROTON.

h3. Mapping the DBpedia ontology to PROTON

*Create a file called dbpedia_proton.nt*

To map the DBpedia ontology to PROTON, you can use either equivalence or subsumption.

(/) Generally subsumption is preferred, because equivalence sometimes has awkward side effects.

!_Images for reuse^task.png!
*Subclass the three major classes from DBpedia to the corresponding three major classes in PROTON.*

(!) This means, for example, that when you make a query to the knowledge base about People from PROTON, the result will be both people from DBpedia and PROTON. On the other hand, if you search for DBpedia people, the result will be only DBpedia people.
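
The effect of subsumption on queries can be illustrated with a small, self-contained sketch. It does not use KIM or OWLIM; the instance {{wkb:SomePerson}} and the prefixed names are hypothetical placeholders, and the real inference is done by the semantic repository.

```python
# Toy model of rdfs:subClassOf inference (an illustration only, not KIM code).
# Assumption: each class has at most one direct superclass.
SUBCLASS_OF = {"dbpedia:Person": "protont:Person"}

# Asserted types; "wkb:SomePerson" is a hypothetical PROTON-typed instance.
ASSERTED_TYPE = {
    "dbpedia:Aristotle": "dbpedia:Person",
    "wkb:SomePerson": "protont:Person",
}


def self_and_ancestors(cls):
    """Return the class itself plus every class it is subsumed by."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def instances_of(target):
    """Return all instances whose asserted or inferred type is `target`."""
    return {
        entity
        for entity, cls in ASSERTED_TYPE.items()
        if target in self_and_ancestors(cls)
    }


if __name__ == "__main__":
    print(sorted(instances_of("protont:Person")))
    print(sorted(instances_of("dbpedia:Person")))
```

Querying the PROTON class returns both instances, while querying the DBpedia class returns only the DBpedia instance.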

!_Images for reuse^to_do.png!
* Create a file called *dbpedia_proton.nt* in which you will store all the RDF data that aligns DBpedia ontology to PROTON ontology.
* Put this file in the *dbpedia kb* folder.
{code:xml}
<http://dbpedia.org/ontology/Place> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://proton.semanticweb.org/2006/05/protont#Location> .
<http://dbpedia.org/ontology/Person> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://proton.semanticweb.org/2006/05/protont#Person> .
<http://dbpedia.org/ontology/Organisation> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://proton.semanticweb.org/2006/05/protont#Organization> .
{code}
* Include it in the import section of the *Owlim config* file.
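
If you prefer to generate the mapping file rather than type it by hand, a minimal Python sketch could look like the one below. The function names and the output path are assumptions made for illustration; only the three class pairs come from the statements above.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
PROTONT = "http://proton.semanticweb.org/2006/05/protont#"
DBPEDIA_ONT = "http://dbpedia.org/ontology/"

# DBpedia class -> PROTON class, exactly as in the statements above.
CLASS_MAP = {
    "Place": "Location",
    "Person": "Person",
    "Organisation": "Organization",
}


def mapping_triples(class_map=CLASS_MAP):
    """Return one N-Triples rdfs:subClassOf statement per mapped class."""
    return [
        f"<{DBPEDIA_ONT}{src}> <{RDFS_SUBCLASS_OF}> <{PROTONT}{dst}> ."
        for src, dst in class_map.items()
    ]


def write_mapping(path, class_map=CLASS_MAP):
    """Write the statements to `path` (for example dbpedia_proton.nt)."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(mapping_triples(class_map)) + "\n")


if __name__ == "__main__":
    print("\n".join(mapping_triples()))
```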

h3. Enrich the DBpedia instances with information usable for KIM

*Create a file called dbpedia_kim.nt*

!_Images for reuse^task.png!
*Add a descriptive statement for each DBpedia instance, so that KIM can differentiate them from the other instances in the KB.*

!_Images for reuse^to_do.png!
* Add a statement for each of the new DBpedia entities, in order to be able to differentiate them.
{code:xml}
<http://dbpedia.org/resource/Aristotle> <http://proton.semanticweb.org/2006/05/protons#generatedBy> <http://dbpedia.org/page/DBpedia> .
{code}
* Create a new file *dbpedia_kim.nt* in *<KIM_HOME>/context/default/kb/dbpedia* and put these statements there.
* Add the definition of the DBpedia source:
{code:xml}
<http://dbpedia.org/page/DBpedia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://proton.semanticweb.org/2006/05/protons#Trusted> .
{code}
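
Writing one {{generatedBy}} statement per instance by hand does not scale, so the statements can be derived from the instance file. The sketch below is one possible approach, not part of KIM: it assumes that every DBpedia instance appears as a subject with a {{http://dbpedia.org/resource/}} URI in *dbpedia_instances.nt*, and the function name is hypothetical.

```python
import re

GENERATED_BY = "http://proton.semanticweb.org/2006/05/protons#generatedBy"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TRUSTED = "http://proton.semanticweb.org/2006/05/protons#Trusted"
SOURCE = "http://dbpedia.org/page/DBpedia"

# Assumption: instances are subjects whose URI starts with the DBpedia
# resource namespace.
SUBJECT_RE = re.compile(r"^\s*<(http://dbpedia\.org/resource/[^>]+)>")


def kim_statements(ntriples_lines):
    """Return the source definition plus one generatedBy line per subject."""
    subjects = []
    seen = set()
    for line in ntriples_lines:
        match = SUBJECT_RE.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            subjects.append(match.group(1))
    statements = [f"<{SOURCE}> <{RDF_TYPE}> <{TRUSTED}> ."]
    statements += [f"<{s}> <{GENERATED_BY}> <{SOURCE}> ." for s in subjects]
    return statements
```

For example, passing the lines of *dbpedia_instances.nt* (read with {{open(...)}}) and writing the returned list to *dbpedia_kim.nt* would produce both kinds of statements shown above.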

h3. Managing labels

*Create a file called dbpedia_labels.nt*

Labels are used for entity recognition in text, so they are quite important for the IE process.
The KIM model relies on the use of {{rdfs:label}} and {{protons:mainLabel}}. Therefore, we advise you to set sensible values for {{label}} and {{mainLabel}} for each instance in your KB.

!_Images for reuse^task.png!
*Set a* {{label}} *and* {{mainLabel}} *for each DBpedia entity.*
!_Images for reuse^to_do.png!
* Set a mainLabel for each entity:
{code:xml}
<http://dbpedia.org/resource/Aristotle> <http://proton.semanticweb.org/2006/05/protons#mainLabel> "Aristotle" .
{code}
* Set labels for each entity (several options):
** Manually copy the value of the {{foaf:name}} property of each DBpedia entity into a label:
{code:xml}
<http://dbpedia.org/resource/Aristotle> <http://www.w3.org/2000/01/rdf-schema#label> "Aristotélēs" .
{code}
** Use inference rules to generate the same statements.

(!) Rules are better suited to more complex scenarios. We recommend using explicit statements when possible.

For example, if you want all {{foaf:name}} properties to exist also as labels, you can add the following rule to *<KIM_HOME>/context/default/kb/KIMRules.pie*:
{code:xml}
e <protons:generatedBy> <http://dbpedia.org/page/DBpedia>
e <foaf:name> name
----------------------
e <rdfs:label> name
{code}
* Alternatively, tell OWLIM that the DBpedia {{name}} property and {{rdfs:label}} are the same by stating:
{code:xml}
<http://dbpedia.org/property/name> <http://www.w3.org/2002/07/owl#sameAs> <http://www.w3.org/2000/01/rdf-schema#label> .
{code}
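
The explicit-statement option can also be scripted. The following sketch derives label statements from {{foaf:name}} triples in an N-Triples file. It rests on several assumptions: names are stored under {{http://xmlns.com/foaf/0.1/name}}, the first name found for an entity is a reasonable {{mainLabel}}, and language tags can be dropped. The function name is hypothetical.

```python
import re

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MAIN_LABEL = "http://proton.semanticweb.org/2006/05/protons#mainLabel"

# Captures the subject URI and the quoted literal of a foaf:name triple.
# Any language tag or datatype after the literal is ignored (assumption).
NAME_RE = re.compile(
    r'^\s*<([^>]+)>\s+<' + re.escape(FOAF_NAME) + r'>\s+("(?:[^"\\]|\\.)*")'
)


def label_statements(ntriples_lines):
    """Return mainLabel and rdfs:label statements built from foaf:name."""
    statements = []
    has_main_label = set()
    for line in ntriples_lines:
        match = NAME_RE.match(line)
        if not match:
            continue
        subject, literal = match.groups()
        if subject not in has_main_label:
            # Assumption: the first name seen becomes the main label.
            has_main_label.add(subject)
            statements.append(f"<{subject}> <{MAIN_LABEL}> {literal} .")
        statements.append(f"<{subject}> <{RDFS_LABEL}> {literal} .")
    return statements
```

The output can be saved as *dbpedia_labels.nt* next to the other DBpedia files. Review the generated main labels before loading them, since the first {{foaf:name}} value is not always the best display name.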

h3. Setting the visibility of classes

KIM has a mechanism to control which parts of the taxonomy are visible in the Web UI. It is controlled through a property.

!_Images for reuse^task.png!
*Make your new classes from DBpedia visible in the Web UI.*
!_Images for reuse^to_do.png!
* Add statements such as:
{code:xml}
<http://dbpedia.org/ontology/Philosopher> <http://www.ontotext.com/kim/2006/05/kimso#visibilityLevel1> "" .
{code}
* Append those statements to *<KIM_HOME>/context/default/kb/visibility.nt*.

(!) It makes little sense to make visible the classes that can be subclassed to PROTON classes: for example, {{Philosopher}} is a {{Person}}, so it can be queried through its parent. Classes that cannot be mapped directly, and are subclassed to {{protons:Entity}}, should be made visible so that you can choose them when making queries.
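
When many classes need to be made visible, the statements can be generated and appended in one go. This is only a sketch: the function names are hypothetical, and it assumes all classes live in the {{http://dbpedia.org/ontology/}} namespace and use the same {{visibilityLevel1}} property with an empty literal, as in the example above.

```python
VISIBILITY = "http://www.ontotext.com/kim/2006/05/kimso#visibilityLevel1"
DBPEDIA_ONT = "http://dbpedia.org/ontology/"


def visibility_statements(class_names):
    """Return one visibility statement per DBpedia class name."""
    return [f'<{DBPEDIA_ONT}{name}> <{VISIBILITY}> "" .' for name in class_names]


def append_visibility(path, class_names):
    """Append the statements to an existing file such as visibility.nt."""
    with open(path, "a", encoding="utf-8") as out:
        for line in visibility_statements(class_names):
            out.write(line + "\n")


if __name__ == "__main__":
    print("\n".join(visibility_statements(["Philosopher"])))
```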

h3. Setting up the gazetteers

!_Images for reuse^task.png!
*Set up a gazetteer, in order to recognize the newly added entities in texts.*

Now we want to recognize our newly added entities in texts. Matching them with the KIM default Large Knowledge Base (LKB) gazetteer won't be of any use, because the classes in the DBpedia taxonomy tree have many subclasses and KIM works best with a flatter structure. We can look up all the DBpedia entities with the default gazetteer, but they will come out as {{Lookup}} annotations with a class unfamiliar to the pipeline. For example, if we find {{Aristotle}}, it will be a Lookup annotation with these features:
!_Images for reuse^Aristotle_annotation.png!

Therefore, for this particular case of DBpedia, we will create a separate gazetteer for each major class. These gazetteers will create lookups in separate annotation sets. Then, with the help of Jape grammars, these lookups will be used to create Person, Location and Organization annotations in the default annotation set.

!_Images for reuse^to_do.png!
* Create a different gazetteer for each of the classes (Person, Location, Organization)
* Generate the gazetteer lists by querying the semantic repository. The path to the directory containing the query is set in the {{FeedSetupPath}} parameter. This directory should contain a {{query.txt}} file with a SeRQL or SPARQL query that returns instances with their labels and classes. Below is an example SPARQL query for Person.
{code}
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX protont: <http://proton.semanticweb.org/2006/05/protont#>
PREFIX protons: <http://proton.semanticweb.org/2006/05/protons#>

SELECT ?la ?entity ?cl
WHERE {
  ?entity a ?cl ;
          rdfs:label ?la ;
          protons:generatedBy <http://dbpedia.org/page/DBpedia> .
  ?cl rdfs:subClassOf protont:Person .
  OPTIONAL {
    ?sc rdfs:subClassOf ?cl .
    ?entity a ?sc .
    FILTER (?cl != ?sc)
  }
  FILTER (!bound(?sc) && isURI(?cl))
}
{code}

(/) This query returns the most specific class of each entity, which is the class originally asserted for it. The other types are inferred.

This is how the Person gazetteer is set up. The others are set up the same way:
!_Images for reuse^gaz_person_setup.png!
* Run them to match the mentions in the text against the entries in the gazetteer lists. This automatically creates Lookup annotations.
* Transform the Lookup annotations into Person, Location, Organization annotations, with the help of Jape rules.
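
Since the three gazetteers differ only in the PROTON class they query for, their {{query.txt}} files can be produced from one template. The sketch below assumes one feed directory per gazetteer; the directory names ({{dbpedia_person}} and so on) and the function names are hypothetical, so point each gazetteer's {{FeedSetupPath}} at whatever directory you actually use.

```python
from pathlib import Path
from string import Template

# Same query as the Person example above, with the PROTON class as a parameter.
QUERY_TEMPLATE = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX protont: <http://proton.semanticweb.org/2006/05/protont#>
PREFIX protons: <http://proton.semanticweb.org/2006/05/protons#>

SELECT ?la ?entity ?cl
WHERE {
  ?entity a ?cl ;
          rdfs:label ?la ;
          protons:generatedBy <http://dbpedia.org/page/DBpedia> .
  ?cl rdfs:subClassOf protont:$proton_class .
  OPTIONAL {
    ?sc rdfs:subClassOf ?cl .
    ?entity a ?sc .
    FILTER (?cl != ?sc)
  }
  FILTER (!bound(?sc) && isURI(?cl))
}
""")

# Hypothetical feed directory name -> PROTON class.
GAZETTEERS = {
    "dbpedia_person": "Person",
    "dbpedia_location": "Location",
    "dbpedia_organization": "Organization",
}


def build_query(proton_class):
    """Return the SPARQL query text for one PROTON class."""
    return QUERY_TEMPLATE.substitute(proton_class=proton_class)


def write_queries(base_dir):
    """Create one feed directory per gazetteer, each with a query.txt."""
    for dir_name, proton_class in GAZETTEERS.items():
        feed_dir = Path(base_dir) / dir_name
        feed_dir.mkdir(parents=True, exist_ok=True)
        (feed_dir / "query.txt").write_text(
            build_query(proton_class), encoding="utf-8"
        )
```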

h3. Setting up the Jape grammars

!_Images for reuse^task.png!
*Transform the Lookup annotations into Person, Location, Organization annotations.*

After we have created Lookup annotations for the new entities, we need grammars to convert those Lookups into meaningful types.

!_Images for reuse^to_do.png!
* Add a Jape transducer for each gazetteer.

(x) Each transducer must have the corresponding annotation set as its {{inputAS}} (Person for {{dbpedia_person.jape}}, etc.) and the default annotation set as its {{outputAS}}.
The grammars in these transducers should take the Lookup annotations from the corresponding annotation set and create more meaningful annotations in the default annotation set.

_Example:_ For Person, the grammar looks like this:
!_Images for reuse^jape_dbpedia_person.png!

For the other types the approach is the same.
* Name the Jape grammar files *dbpedia_person.jape*, *dbpedia_location.jape* and *dbpedia_organization.jape*, and put them in *<KIM_HOME>/context/default/grammars/dbpedia*.

h2. The result

As a result, we get a document, visible through the KIM web interface, in which Aristotle is recognized as an entity. Below is the page [http://www.iep.utm.edu/aristotl/] annotated with Aristotle from DBpedia:

!_Images for reuse^Aristotle_search.png!