KIM can be customized in multiple ways to suit different semantic annotation and search needs. One way to do this is to change the text analysis pipeline to find new types of entities and facts, and use the conceptual models and instance bases relevant to a certain domain.
The example below describes how to replace the KIM IE pipeline with a custom GATE stored application (in this case ANNIE).
By default, KIM uses a GATE pipeline with customized processing resources, configured for the common task of annotating news articles. Because this pipeline is itself a GATE application, the whole annotation functionality in KIM can be replaced with an arbitrary GATE application (saved in a .gapp file).
The process of annotating documents can be divided into three steps:
Step 1: Create GATE annotations using a pipeline of processing resources (tokenizer, POS tagger, gazetteers, etc.).
Step 2: Assign "class" to the entity annotations that do not have this feature.
The Instance Generator (IG) processing resource recognizes the following types of annotations and assigns ontology classes to them: Person, Location, Organization, KeyPhrase, Date and Time. The assigned classes come from the PROTON ontology (http://proton.semanticweb.org/2006/05/protont#Person, etc.).
If you use custom annotation types or other ontology classes, the IG does not classify them for you, so you should assign a "class" feature to all such annotations yourself.
For instance, if you have annotations of the type "ArchaeologicalSite", you can give them the feature class = "http://proton.semanticweb.org/2006/05/protont#Location". If another ontology offers a more appropriate class, you may use it instead of the more general .../protont#Location, for example class = "http://example.com/exampleOntology#ArchaeologicalSite".
In most cases, the simplest way to add a "class" feature to a given annotation type is to create a JAPE grammar that matches that type and to place the JAPE transducer right before the Instance Generator in the pipeline.
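The class assignment rule of Step 2 can be illustrated with a small model. The sketch below is plain Python, not GATE or JAPE code: annotations are represented as dictionaries, and the names `assign_class`, `DEFAULT_CLASSES` and `CUSTOM_CLASSES` are assumptions made for illustration only. It lists just the two PROTON class URIs quoted above; the other recognized types would be mapped in the same way.

```python
PROTON_T = "http://proton.semanticweb.org/2006/05/protont#"

# Defaults in the spirit of the Instance Generator. Only the two class URIs
# quoted in the text are listed here.
DEFAULT_CLASSES = {
    "Person": PROTON_T + "Person",
    "Location": PROTON_T + "Location",
}

# Hypothetical rule for a custom annotation type. It plays the role of the
# JAPE grammar that runs right before the Instance Generator.
CUSTOM_CLASSES = {
    "ArchaeologicalSite": "http://example.com/exampleOntology#ArchaeologicalSite",
}


def assign_class(annotation, custom=CUSTOM_CLASSES, defaults=DEFAULT_CLASSES):
    """Add a "class" feature unless the annotation already has one."""
    features = annotation.setdefault("features", {})
    if "class" in features:
        return annotation  # an existing class is left untouched
    # Custom rules take precedence over the default mapping.
    class_uri = custom.get(annotation["type"], defaults.get(annotation["type"]))
    if class_uri is not None:
        features["class"] = class_uri
    return annotation


site = assign_class({"type": "ArchaeologicalSite", "features": {}})
print(site["features"]["class"])
# prints http://example.com/exampleOntology#ArchaeologicalSite
```

Annotations of a type that appears in neither table are returned without a "class" feature, which mirrors the situation the text warns about: without a class, the later steps have nothing to work with.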
Step 3: Link classified annotations to the respective named entities in the KIM Knowledge Base. Create new entities if needed.
The IG processing resource uses the annotation features to create instances in the KB.
For example, if "Kim Basinger" is annotated with class protont:Person, the IG will generate an instance for her with class protont:Person and main label "Kim Basinger". If such statements do not exist in the KB, they will be created.
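The "link or create" behaviour of Step 3 can be sketched in the same spirit. The `KnowledgeBase` class below is a toy stand-in written for this illustration, not the KIM API, and the instance URI scheme is invented. How the real IG matches annotations against existing entities is not covered here; the sketch simply uses the pair of class and main label as the lookup key.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class KnowledgeBase:
    """Toy stand-in for the KIM Knowledge Base (hypothetical)."""
    # Maps (class URI, main label) to an instance URI.
    instances: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def link(self, class_uri, label):
        """Return the instance for this class and label, creating it if absent."""
        key = (class_uri, label)
        if key not in self.instances:
            # The instance URI scheme is invented for this sketch.
            self.instances[key] = "http://example.com/kb#instance_%d" % next(self._ids)
        return self.instances[key]


PERSON = "http://proton.semanticweb.org/2006/05/protont#Person"
kb = KnowledgeBase()
first = kb.link(PERSON, "Kim Basinger")   # not in the KB yet: created
second = kb.link(PERSON, "Kim Basinger")  # already known: reused
print(first == second, len(kb.instances))  # True 1
```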
We are going to integrate the ANNIE pipeline as the first step of the annotation process. For the purpose of Steps 2 and 3, we have to include the Instance Generator in the .gapp file that comes with the ANNIE plug-in.
To follow the steps below, you should be familiar with semantic annotation, GATE processing resources and storing GATE applications. We assume that you have a clean installation of KIM.
Task 1: Replace the default KIM pipeline with ANNIE with defaults.
- Copy the ANNIE_with_defaults.gapp file from the folder KIM/plugins/ANNIE to KIM/context/default/resources.
The KIM/context/default/resources folder is the place where we put the GATE applications in KIM. The default KIM pipeline is described in the IE.gapp file.
- Open KIM/config/nerc.properties and replace this line:
The GATE application file holds relative paths to the resources used in it.
- Open the copied ANNIE_with_defaults.gapp file and fix the relative path:
If you start KIM now and populate some documents, you'll notice that no entities are associated with the documents in the Facets screen. However, if you start GATE Developer (by running the kim script with the parameter "gate"), you can annotate documents and observe some annotations. The reason is that KIM relies on instance data, which is either created by the Instance Generator or supplied by the default KIM gazetteer. If an annotation does not carry a representative URI, the "document_X mentions entity_Y" relation is not created. Therefore, you have to customize the pipeline to include the IG.
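The file operations of this task can be scripted and sanity-checked before KIM is started. The sketch below copies the .gapp file to the folder named above and then lists relative references that point to nothing on disk. It rests on one assumption that you should verify against your own file: that the saved application marks relative locations with a `$relpath$` prefix. The function names are hypothetical, and the check only detects targets that do not exist; a path that exists but points to the wrong folder will pass.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import unquote

# Assumed marker for paths stored relative to the .gapp file's own folder.
RELPATH = "$relpath$"


def copy_annie_gapp(kim_home):
    """Copy ANNIE_with_defaults.gapp into KIM/context/default/resources."""
    kim_home = Path(kim_home)
    source = kim_home / "plugins" / "ANNIE" / "ANNIE_with_defaults.gapp"
    target_dir = kim_home / "context" / "default" / "resources"
    return Path(shutil.copy(source, target_dir))


def unresolved_relpaths(gapp_file):
    """Return the relative references in the file whose target does not exist."""
    gapp_file = Path(gapp_file)
    missing = []
    for element in ET.parse(gapp_file).iter():
        text = (element.text or "").strip()
        if text.startswith(RELPATH):
            # Strip the marker and undo URL escaping such as %20.
            relative = unquote(text[len(RELPATH):])
            if not (gapp_file.parent / relative).exists():
                missing.append(text)
    return missing
```

A typical use would be to call `copy_annie_gapp` with the path of your KIM installation, edit the relative path in the copy as described above, and then confirm that `unresolved_relpaths` returns an empty list for the copied file.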
Task 2: Customize the pipeline to include the Instance Generator PR.
- Add the Instance Generator PR to the ANNIE_with_defaults.gapp file. Place the following resource at the end of the PR list:
The IG is defined in the creole.xml file in KIM/config.
- Include a link to that folder by adding the code below to the <urlList> in your .gapp application file:
- You can now start KIM. If all the steps were completed properly, you'll see this in the startup messages:
If GATE is unable to find the files for the plug-ins mentioned in your pipeline (Task 1), you may get the following exception when you start KIM:
With this custom pipeline you are free to annotate any corpus of documents and explore the information extraction results. Since the KIM default pipeline is based on ANNIE, you may not notice any significant difference in the extraction. However, it is good to know that you can take any GATE pipeline, add the Instance Generator and utilize the rest of the KIM system for your needs.