
KIM can be [customized in multiple ways|http://ontotext.com/kim/tailorKIM.html] to suit different [semantic annotation|http://ontotext.com/kim/semanticannotation.html] and search needs. One way to do this is to change the text analysis pipeline so that it finds new types of entities and facts and uses the conceptual models and instance bases relevant to a particular domain.

The example below shows how to replace the KIM information extraction (IE) pipeline with a custom saved GATE application (in this case, [ANNIE|http://gate.ac.uk/sale/tao/splitch6.html#chap:annie]).

h2. Some background and description of the task

By default, KIM uses a GATE pipeline with customized processing resources, configured for the common task of annotating news articles. Because this pipeline is an ordinary GATE application, you can replace the whole annotation functionality in KIM with an arbitrary GATE application ([saved in a .gapp file|http://gate.ac.uk/sale/tao/splitch3.html#sec:developer:savestate]).

The process of annotating documents can be divided into three steps:

*Step 1: Create* *[GATE annotations|http://gate.ac.uk/sale/tao/splitch5.html#sec:corpora:dags]* *using a pipeline of processing resources* (tokenizer, POS tagger, gazetteers, etc.).

*Step 2: Assign a "class" feature to the entity annotations that do not have one.*

The *Instance Generator* (IG) processing resource recognizes the following annotation types and assigns ontology classes to them: Person, Location, Organization, KeyPhrase, Date, and Time. The assigned classes come from the PROTON ontology ({{[http://proton.semanticweb.org/2006/05/protont#Person]}}, etc.).

If you use *custom annotation types* or other ontology classes, you should assign a *"class" feature* to all of your annotations.
For instance, for annotations of type _"ArchaeologicalSite"_ you could set the feature {{class = "http://proton.semanticweb.org/2006/05/protont#Location"}}. If a more specific class exists in another ontology, you may use it instead of the general {{.../protont#Location}}, for example {{class = "http://example.com/exampleOntology#ArchaeologicalSite"}}. In most cases, the simplest way to add a *"class" feature* to a given annotation type is to write a JAPE grammar that matches that type and to run it in a JAPE transducer placed immediately before the Instance Generator.
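For illustration, once the feature is set, an _ArchaeologicalSite_ annotation might look roughly like this in GATE's XML document format. This is a sketch only. The annotation ID, the node IDs, and the example ontology URI are hypothetical, and the exact serialization can differ between GATE versions.

{code:xml}
<!-- Hypothetical annotation spanning the text between nodes 0 and 10 -->
<Annotation Id="42" Type="ArchaeologicalSite" StartNode="0" EndNode="10">
  <!-- The "class" feature that the Instance Generator uses -->
  <Feature>
    <Name className="java.lang.String">class</Name>
    <Value className="java.lang.String">http://example.com/exampleOntology#ArchaeologicalSite</Value>
  </Feature>
</Annotation>
{code}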

*Step 3: Link classified annotations to the respective named entities in the KIM Knowledge Base. Create new entities if needed.*

The IG processing resource uses the annotation features to create instances in the KB.

For example, if "Kim Basinger" is annotated with class {{protont:Person}}, the IG links the annotation to an instance with class {{protont:Person}} and main label "Kim Basinger". If no such instance exists in the KB, the IG creates it.
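Expressed as RDF/XML, the statements for such a new instance could look roughly like the sketch below. The instance URI is hypothetical. {{rdfs:label}} is an assumption that stands in for whatever property KIM actually uses to record the main label, which may be a different PROTON property.

{code:xml}
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:protont="http://proton.semanticweb.org/2006/05/protont#">
  <!-- Hypothetical instance URI; KIM generates its own identifiers -->
  <!-- The typed element states: instance rdf:type protont:Person -->
  <protont:Person rdf:about="http://example.com/kb#Person_KimBasinger">
    <!-- Assumption: rdfs:label stands in for KIM's main-label property -->
    <rdfs:label>Kim Basinger</rdfs:label>
  </protont:Person>
</rdf:RDF>
{code}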

We are going to integrate the ANNIE pipeline as the first step of the annotation process. For Steps 2 and 3, we have to add the Instance Generator to the {{.gapp}} file that comes with the ANNIE plug-in.

h2. Prerequisites

To follow these steps, you should be familiar with semantic annotation, GATE processing resources, and saving GATE applications. We assume that you have a clean installation of KIM.

h2. Adapting a GATE pipeline for KIM

!_Images for reuse^task.png!
*Replace the KIM default pipeline with ANNIE with defaults.*
!_Images for reuse^to_do.png!
* Copy the *ANNIE_with_defaults.gapp* file from the folder *KIM/plugins/ANNIE* to *KIM/context/default/resources*.
(/) The *KIM/context/default/resources* folder is where KIM keeps its GATE applications. The default KIM pipeline is described in the *IE.gapp* file.
* Open *KIM/config/nerc.properties* and replace this line:

{code}
com.ontotext.kim.KIMConstants.IE_APP=IE.gapp
{code}
with:
{code}
com.ontotext.kim.KIMConstants.IE_APP=ANNIE_with_defaults.gapp
{code}

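The GATE application file stores the paths to its resources relative to its own location. The file now lives in *KIM/context/default/resources* instead of *KIM/plugins/ANNIE*, so the plug-in path must be updated.

* Open the copied *ANNIE_with_defaults.gapp* file and change the ANNIE plug-in path in the {{<urlList>}} element so that it points back to *KIM/plugins/ANNIE*: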

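{code:xml}
<urlList class="gate.util.persistence.CollectionPersistence">
  <localList>
    <!-- $relpath$ is resolved relative to this .gapp file:
         KIM/context/default/resources + ../../../ = KIM/, so this is KIM/plugins/ANNIE -->
    <gate.util.persistence.PersistenceManager-URLHolder>
      <urlString>$relpath$../../../plugins/ANNIE</urlString>
    </gate.util.persistence.PersistenceManager-URLHolder>
  </localList>
  <!-- ... -->
</urlList>
{code}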

If you start KIM now and populate some documents, the *Facets* screen shows no entities associated with them. However, if you start GATE Developer (by running the kim script with the "gate" parameter), you can annotate documents and see annotations. The difference arises because KIM works with instance data, which is either created by the Instance Generator or supplied by the default KIM gazetteer. When an annotation has no representative instance URI, KIM does not create the {{"document_X mentions entity_Y"}} relation. Therefore, you have to customize the pipeline to include the IG.

!_Images for reuse^task.png!
*Customize the pipeline to include the Instance Generator PR.*

!_Images for reuse^to_do.png!
* Add the Instance Generator PR to the *ANNIE_with_defaults.gapp* file. Place the following resource at the end of the PR list:

{code:xml}
<!-- ... the existing ANNIE PRs ... -->
    <!-- Instance Generator: appended as the last PR in the list -->
    <gate.util.persistence.LanguageAnalyserPersistence>
      <resourceType>com.ontotext.kim.gate.KIMInstanceGeneratorWrapper</resourceType>
      <resourceName>Instance Generator_0003D</resourceName>
      <initParams class="gate.util.persistence.MapPersistence">
        <mapType>gate.util.SimpleFeatureMapImpl</mapType>
        <localMap>
          <entry>
            <string>staticDictSerializationPath</string>
            <gate.util.persistence.PersistenceManager-URLHolder>
              <!-- Relative to this .gapp file: KIM/context/default/populated/cache -->
              <urlString>$relpath$../populated/cache</urlString>
            </gate.util.persistence.PersistenceManager-URLHolder>
          </entry>
          <entry>
            <string>inputIGIdentityModel</string>
            <null/>
          </entry>
        </localMap>
      </initParams>
      <features class="gate.util.persistence.MapPersistence">
        <mapType>gate.util.SimpleFeatureMapImpl</mapType>
        <localMap/>
      </features>
      <runtimeParams class="gate.util.persistence.MapPersistence">
        <mapType>gate.util.SimpleFeatureMapImpl</mapType>
        <localMap>
          <entry>
            <string>document</string>
            <null/>
          </entry>
          <entry>
            <string>inputASName</string>
            <null/>
          </entry>
        </localMap>
      </runtimeParams>
    </gate.util.persistence.LanguageAnalyserPersistence>

  </localList>
  <collectionType>java.util.ArrayList</collectionType>
</prList>
<!-- ... -->
{code}
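If you have custom annotation types (see Step 2), the JAPE transducer that sets their "class" feature goes just before the Instance Generator entry above. A possible {{<prList>}} entry is sketched below, modeled on the IG entry. It assumes GATE's standard {{gate.creole.Transducer}} class with its {{grammarURL}} and {{encoding}} init parameters. The resource name and the grammar file {{assign-class.jape}} are hypothetical. The safest way to get an exact entry for your GATE version is to add the transducer in GATE Developer and save the application.

{code:xml}
<!-- Hypothetical JAPE transducer, placed immediately before the Instance Generator -->
<gate.util.persistence.LanguageAnalyserPersistence>
  <resourceType>gate.creole.Transducer</resourceType>
  <resourceName>Class Assigner</resourceName>
  <initParams class="gate.util.persistence.MapPersistence">
    <mapType>gate.util.SimpleFeatureMapImpl</mapType>
    <localMap>
      <entry>
        <string>grammarURL</string>
        <gate.util.persistence.PersistenceManager-URLHolder>
          <!-- Hypothetical grammar stored next to the .gapp file -->
          <urlString>$relpath$assign-class.jape</urlString>
        </gate.util.persistence.PersistenceManager-URLHolder>
      </entry>
      <entry>
        <string>encoding</string>
        <string>UTF-8</string>
      </entry>
    </localMap>
  </initParams>
  <features class="gate.util.persistence.MapPersistence">
    <mapType>gate.util.SimpleFeatureMapImpl</mapType>
    <localMap/>
  </features>
  <runtimeParams class="gate.util.persistence.MapPersistence">
    <mapType>gate.util.SimpleFeatureMapImpl</mapType>
    <localMap>
      <entry>
        <string>document</string>
        <null/>
      </entry>
    </localMap>
  </runtimeParams>
</gate.util.persistence.LanguageAnalyserPersistence>
{code}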

The IG is declared in the *creole.xml* file in *KIM/config*, so GATE has to load that directory as a CREOLE plug-in.
* Register it by adding the code below to the {{<urlList>}} in your .gapp file:

{code:xml}
<gate.util.persistence.PersistenceManager-URLHolder>
<urlString>$relpath$../../../config</urlString>
</gate.util.persistence.PersistenceManager-URLHolder>
{code}
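After both path edits, the {{<urlList>}} section should contain at least the two entries below. This is a sketch assembled from the fragments above. If your file lists further plug-ins, leave those entries as they are.

{code:xml}
<urlList class="gate.util.persistence.CollectionPersistence">
  <localList>
    <!-- KIM/plugins/ANNIE, resolved relative to KIM/context/default/resources -->
    <gate.util.persistence.PersistenceManager-URLHolder>
      <urlString>$relpath$../../../plugins/ANNIE</urlString>
    </gate.util.persistence.PersistenceManager-URLHolder>
    <!-- KIM/config, whose creole.xml declares the Instance Generator -->
    <gate.util.persistence.PersistenceManager-URLHolder>
      <urlString>$relpath$../../../config</urlString>
    </gate.util.persistence.PersistenceManager-URLHolder>
  </localList>
  <collectionType>java.util.ArrayList</collectionType>
</urlList>
{code}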

* You can now start KIM. If all the steps were completed correctly, the startup messages should include lines like these (the paths depend on where KIM is installed):

{code}
[INFO] Loading G:\KIM\context\default\resources\ANNIE_with_defaults.gapp
CREOLE plugin loaded: file:/G:/KIM/config/
CREOLE plugin loaded: file:/G:/KIM/plugins/ANNIE/
{code}

(/) If GATE cannot find the plug-ins referenced in your pipeline (for example, because the ANNIE path set in the first task is wrong), KIM may throw the following exception at startup:

{code}
Could not reload creole directory file:/G:/KIM/context/default/resources/
gate.creole.ResourceInstantiationException: Couldn't get resource data for gate.creole.annotdelete.AnnotationDeletePR.

You may need first to load the plugin that contains your resource.
For example, to create a gate.creole.tokeniser.DefaultTokeniser you need first to load the ANNIE plugin.
......
{code}

h2. The result

With this custom pipeline, you can annotate any corpus of documents and explore the information extraction results. Because the default KIM pipeline is itself based on ANNIE, you may not notice any significant difference in the extraction. What matters is that you can take any GATE pipeline, add the Instance Generator, and use the rest of the KIM system for your needs.