
The resources in the KIM default IE pipeline depend on the KIM default ontology - the PROTON ontology.
When you want to add a completely different conceptual model, you have to go through all the stages of mapping an ontology to PROTON.

Prerequisites

In order to integrate a small subset of a new ontology in KIM, you have to incorporate it in the KIM default IE pipeline. To this end, you will use:

  • KIM 3 with the default pipeline - from the installation
  • the labels model:
    • the labels model is set in install.properties: com.ontotext.kim.KIMConstants.ENTITY_DESCR = Labels.
    • you have to load the entity labels from the World Knowledge Base (wkb), available as wkb-labels.nt.zip under "Files for reuse", into the OWLIM semantic repository.
  • an ontology extract
    • a small subset of the original ontology - .zip file

Ontology integration steps

Three major classes (Person, Organization, Location) are central to adopting a third-party ontology and extending the default IE pipeline. In this process you create Lookup annotations and later transform them into annotations of these types. Many of the processing resources of the default IE pipeline then use them to form other annotations.

Importing an ontology in KIM

The first step: import the DBpedia ontology in KIM.

  • Create a sub-folder in the KIM context folder. It will be used as storage for all the RDF data for this task.
    For example, create <KIM_HOME>/context/default/kb/dbpedia/.

We recommend this location but you can put your RDF data anywhere in the KIM context folder.

  • Put dbpedia_3.5.1.owl, containing the DBpedia taxonomy, in <KIM_HOME>/context/default/kb/dbpedia/
  • Put dbpedia_instances.nt, containing the actual objects description, in <KIM_HOME>/context/default/kb/dbpedia/
  • Include the new files in the import section of owlim (<KIM_HOME>/config/owlim.ttl):

Do not forget to add the corresponding namespaces in the defaultNS section.
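
As a rough illustration of the bookkeeping involved, the Python sketch below builds the two configuration values side by side. It assumes that the OWLIM imports and defaultNS parameters are semicolon-separated lists with one namespace per imported file. The relative file paths are illustrative assumptions, not values taken from a KIM installation.

```python
def build_import_settings(entries):
    """Build the semicolon-separated imports and defaultNS values.

    entries is a list of (file_path, namespace) pairs, one per imported file,
    so both resulting lists always have the same number of items.
    """
    for path, namespace in entries:
        if ";" in path or ";" in namespace:
            raise ValueError("paths and namespaces must not contain ';'")
    imports = ";".join(path for path, _ in entries)
    default_ns = ";".join(namespace for _, namespace in entries)
    return imports, default_ns


# Illustrative values only: adjust the paths to your own installation.
ENTRIES = [
    ("kb/dbpedia/dbpedia_3.5.1.owl", "http://dbpedia.org/ontology/"),
    ("kb/dbpedia/dbpedia_instances.nt", "http://dbpedia.org/resource/"),
]

imports, default_ns = build_import_settings(ENTRIES)
print(imports)
print(default_ns)
```

Keeping each file next to its namespace makes it harder to add a file and forget the matching namespace entry.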

Now we have a running KIM with DBpedia loaded, but the DBpedia data is still isolated from the KIM model and has little effect on the IE process. We need to map it to PROTON.

Mapping the DBpedia ontology to PROTON

Create a file called dbpedia_proton.nt

To map the DBpedia ontology to PROTON, you can use either equivalence or subsumption.

Subsumption is generally preferred, because equivalence works in both directions: every instance of the PROTON class would also become an instance of the DBpedia class, which can have awkward side effects.

Make each of the three major DBpedia classes a subclass of the corresponding major class in PROTON.

This means, for example, that when you make a query to the knowledge base about People from PROTON, the result will be both people from DBpedia and PROTON. On the other hand, if you search for DBpedia people, the result will be only DBpedia people.


  • Create a file called dbpedia_proton.nt in which you will store all the RDF data that aligns the DBpedia ontology with the PROTON ontology.
  • Put this file in the dbpedia kb folder.
  • Include it in the import section of the OWLIM config file.
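
The subsumption mapping can be sketched as a small Python script that prints N-Triples lines suitable for dbpedia_proton.nt. The PROTON Top namespace URI and the choice of DBpedia class names (Person, Organisation, Place) are assumptions; check them against the ontology versions you actually load.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
DBPEDIA_ONTOLOGY = "http://dbpedia.org/ontology/"
# Assumed namespace of the PROTON Top module.
PROTON_TOP = "http://proton.semanticweb.org/2005/04/protont#"

# DBpedia class -> PROTON class (assumed class names on both sides).
CLASS_MAPPING = {
    "Person": "Person",
    "Organisation": "Organization",
    "Place": "Location",
}


def subclass_triples(mapping):
    """Return one rdfs:subClassOf N-Triples line per mapped class."""
    return [
        f"<{DBPEDIA_ONTOLOGY}{sub}> <{RDFS_SUBCLASS_OF}> <{PROTON_TOP}{sup}> ."
        for sub, sup in mapping.items()
    ]


triples = subclass_triples(CLASS_MAPPING)
print("\n".join(triples))
```

Because the DBpedia classes are the subclasses, a query for the PROTON class also returns DBpedia instances, but not the other way round.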

Enrich the DBpedia instances with information usable for KIM

Create a file called dbpedia_kim.nt

Add a descriptive statement for each DBpedia instance, so that KIM can differentiate them from the other instances in the KB.


  • Add a statement for each of the new DBpedia entities, in order to be able to differentiate them.
  • Create a new file dbpedia_kim.nt in <KIM_HOME>/context/default/kb/dbpedia and put these statements there.
  • Put the definition of the DBpedia source:
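
One way to produce such statements is sketched below in Python. It is only a sketch under stated assumptions: the generatedBy property and the Trusted class in the PROTON System namespace are assumed to be what KIM expects, and the source URI is a hypothetical placeholder, so substitute the values used in your KIM installation.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed namespace of the PROTON System module.
PROTON_SYS = "http://proton.semanticweb.org/2005/04/protons#"
GENERATED_BY = PROTON_SYS + "generatedBy"  # assumed property
SOURCE_CLASS = PROTON_SYS + "Trusted"      # assumed class for the source
# Hypothetical URI identifying DBpedia as an entity source.
DBPEDIA_SOURCE = "http://example.org/kim/source#DBpedia"


def source_triples(entity_uris, source_uri=DBPEDIA_SOURCE):
    """Return the source definition plus one statement per entity."""
    lines = [f"<{source_uri}> <{RDF_TYPE}> <{SOURCE_CLASS}> ."]
    lines.extend(
        f"<{uri}> <{GENERATED_BY}> <{source_uri}> ." for uri in entity_uris
    )
    return lines


lines = source_triples(["http://dbpedia.org/resource/Aristotle"])
print("\n".join(lines))
```

The output can be saved as dbpedia_kim.nt once the property, class and source URI have been confirmed.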

Managing labels

Create a file called dbpedia_labels.nt

Labels are used for entity recognition in text, so they are quite important for the IE process.
The KIM model relies on the use of rdfs:label and protons:mainLabel. Therefore, we advise you to set sensible values for label and mainLabel for each instance in your KB.

Set a label and a mainLabel for each DBpedia entity.

  • Set a mainLabel to each entity:
  • Set labels to each entity (several options):
    • Manually set a label from the foaf:name property to the entities from DBpedia:
    • Use inference rules to generate the same statements.

Rules are better suited to more complex scenarios. We recommend using explicit statements when possible.
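
The explicit-statement approach can be sketched in Python as follows. The script turns a table of names (for example, values taken from foaf:name) into rdfs:label and protons:mainLabel lines for dbpedia_labels.nt. The PROTON System namespace URI is an assumption, and using the first name as the mainLabel is an illustrative choice rather than a KIM requirement.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed namespace of the PROTON System module.
MAIN_LABEL = "http://proton.semanticweb.org/2005/04/protons#mainLabel"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        code = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif code < 0x20 or code > 0x7E:
            # Escape control and non-ASCII characters numerically.
            out.append(f"\\u{code:04X}" if code <= 0xFFFF else f"\\U{code:08X}")
        else:
            out.append(ch)
    return "".join(out)


def label_triples(names_by_entity):
    """Return mainLabel and label lines; the first name becomes the mainLabel."""
    lines = []
    for uri, names in names_by_entity.items():
        if not names:
            continue
        lines.append(f'<{uri}> <{MAIN_LABEL}> "{escape_literal(names[0])}" .')
        for name in names:
            lines.append(f'<{uri}> <{RDFS_LABEL}> "{escape_literal(name)}" .')
    return lines


labels = label_triples({"http://dbpedia.org/resource/Aristotle": ["Aristotle"]})
print("\n".join(labels))
```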

For example, if you want all foaf:name properties to exist also as labels, you can add the following rule to <KIM_HOME>/context/default/kb/KIMRules.pie:

  • Tell OWLIM that these two properties are the same by stating:

Setting the visibility of classes

KIM has a mechanism to control the visibility of different parts of the taxonomy in the Web UI. It is done through a property.

Make your new classes from DBpedia visible in the Web UI.

  • Add statements such as:
  • Append those statements to <KIM_HOME>/context/default/kb/visibility.nt.

There is little point in making visible the classes that can be subclassed to PROTON classes, such as Philosopher, which is a Person: they can be queried through their parents. Other classes that cannot be mapped directly and are subclassed to protons:Entity should be displayed, so that you can choose them when making queries.

Setting up the gazetteers

Set up a gazetteer, in order to recognize the newly added entities in texts.

Now we want to recognize our newly added entities in texts. Matching them with the KIM default Large Knowledge Base (LKB) gazetteer won't be of much use, because the DBpedia taxonomy is deep, with many subclasses, and KIM works best with a flatter structure. We can look up all the DBpedia entities with the default gazetteer, but they will take the form of Lookup annotations with an unfamiliar class. For example, if we find Aristotle, it will be a Lookup annotation with the features shown in this figure:
[Figure: Aristotle_annotation.png (image not available)]

Therefore, for this particular case of DBpedia, we will create a different gazetteer for each major class. These gazetteers will create lookups in separate annotation sets. Then, with the help of JAPE grammars, these lookups will be used to create Person, Location and Organization annotations in the default annotation set.


  • Create a different gazetteer for each of the classes (Person, Location, Organization).
  • Generate the gazetteer lists by using a query to the semantic repository. The path to the directory containing the query is set in the FeedSetupPath parameter. This directory should contain a query.txt file with a SeRQL or SPARQL query that returns instances with their labels and class. The one below is an example SPARQL query for Person.

This query returns the lowest (most specific) class of each entity, which is the class originally asserted for it. The other classes are inferred.
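
To make the idea of the lowest class concrete, here is a minimal Python sketch that picks the most specific classes from the full set of asserted and inferred types of an entity. The function name, the toy hierarchy and the class names are illustrative assumptions; the sketch does not reproduce how KIM or OWLIM compute this.

```python
def most_specific_classes(types, superclasses):
    """Return the types that are not a superclass of another type.

    types is the set of all classes of one entity (asserted and inferred);
    superclasses maps each class to the set of its direct superclasses.
    """
    def ancestors(cls):
        seen = set()
        stack = list(superclasses.get(cls, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(superclasses.get(parent, ()))
        return seen

    implied = set()
    for cls in types:
        implied |= ancestors(cls)
    return set(types) - implied


# Toy hierarchy for illustration only.
HIERARCHY = {
    "Philosopher": {"Person"},
    "Person": {"Agent"},
    "Agent": {"Entity"},
}

lowest = most_specific_classes({"Philosopher", "Person", "Agent", "Entity"}, HIERARCHY)
print(lowest)
```

For an entity typed as Philosopher, the inferred types Person, Agent and Entity are dropped and only Philosopher remains.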

This is how the Person gazetteer is set up. The others look identical:
[Figure: gaz_person_setup.png (image not available)]

  • Run them to match the mentions in the text against the entries in the gazetteer lists. This automatically creates Lookup annotations.
  • Transform the Lookup annotations into Person, Location and Organization annotations with the help of JAPE rules.

Setting up the JAPE grammars

Transform the Lookup annotations into Person, Location, Organization annotations.

After we have created Lookup annotations for the new entities, we need grammars to convert those Lookups into meaningful types.


  • Add a JAPE transducer for each gazetteer.

Each transducer must use the corresponding annotation set as its inputAS (Person for dbpedia_person.jape, and so on) and the default annotation set as its outputAS.
The grammars in these transducers should take the Lookup annotations from the corresponding annotation set and create more meaningful annotations in the default annotation set.

Example: For Person, the grammar looks like this:
[Figure: jape_dbpedia_person.png (image not available)]

For the other types the approach is the same.

  • Name the JAPE rules dbpedia_person.jape, dbpedia_location.jape and dbpedia_organization.jape, and put them in <KIM_HOME>/context/default/grammars/dbpedia.

The result

As a result, we get a document, visible through the KIM web interface, in which Aristotle is recognized as an entity. Below is the page http://www.iep.utm.edu/aristotl/ annotated with Aristotle from DBpedia:

[Figure: Aristotle_search.png (image not available)]
