
This section discusses how to configure the KIM Information Extraction (IE) Modules.


h2. Overview

KIM supports multiple GATE applications and has dynamic runtime switching of the IE modules.

Loading multiple IE modules simultaneously on one running KIM server allows different IE processing, depending on the types of incoming documents. This feature allows a KIM client to handle multilingualism or domain-dependent processing without the need to set up multiple KIM servers. Another advantage is that the extracted entities are added to the same semantic repository, even though they were extracted by different IE modules.

!_Images for reuse^KIM_IE_modules.png|width=800px!


h2. Configuration


To configure the KIM IE modules:
* Edit the file *./config/nerc.properties* and set the parameter
{{com.ontotext.kim.KIMConstants.IE_APP}}
to a comma-separated list of the IE module {{*.gapp}} file names. The first item on the list is the default module.
* Place all IE module {{*.gapp}} files in the */context/default/resources* folder.

Within one KIM installation, you can configure several IE modules that can be changed at any time. To configure the different modules, use the [SemanticAnnotationAPI|http://javadoc.ontotext.com/kim-3.7-javadoc/com/ontotext/kim/client/semanticannotation/SemanticAnnotationAPI.html].

Example:

Suppose the following three IE modules are placed in the */context/default/resources* folder:
* {{IE.gapp}}
* {{Custom.gapp}}
* {{test.gapp}}

{{IE.gapp}} is a standard module that comes with KIM.

To make another module the default, place it first on the list. The order of the remaining modules does not matter, so the configuration parameter may look like this:
{{com.ontotext.kim.KIMConstants.IE_APP=IE.gapp, Custom.gapp}}

{note}File names are case-sensitive, so spell them exactly the same way in the configuration and in the API calls at runtime.{note}
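
The rules above (comma-separated list, first item is the default, case-sensitive names) can be illustrated with a minimal Java sketch. It is not KIM code: the class {{IeModuleList}} and its methods are hypothetical, and trimming the whitespace around each name is an assumption based on the space after the comma in the example value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the KIM API.
final class IeModuleList {

    /** Splits a comma-separated IE_APP value into file names (trimming is an assumption). */
    static List<String> parse(String propertyValue) {
        List<String> modules = new ArrayList<>();
        for (String item : propertyValue.split(",")) {
            String name = item.trim();
            if (!name.isEmpty()) {
                modules.add(name);
            }
        }
        return modules;
    }

    /** The first item on the list is the default module. */
    static String defaultModule(List<String> modules) {
        if (modules.isEmpty()) {
            throw new IllegalArgumentException("no IE modules configured");
        }
        return modules.get(0);
    }

    /** Case-sensitive check: "custom.gapp" does not match "Custom.gapp". */
    static boolean isConfigured(List<String> modules, String requested) {
        return modules.contains(requested);
    }

    public static void main(String[] args) {
        List<String> modules = parse("IE.gapp, Custom.gapp");
        System.out.println("default module: " + defaultModule(modules));
        System.out.println("Custom.gapp configured: " + isConfigured(modules, "Custom.gapp"));
        System.out.println("custom.gapp configured: " + isConfigured(modules, "custom.gapp"));
    }
}
```

A check like {{isConfigured}} on the client side can catch a misspelled module name before a request is sent to the server.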

To select between the different GATE applications, see the [example|03. Annotating documents and texts].


h2. Parallel annotation

The KIM Server supports annotation of multiple documents in parallel. As a result, the performance of semantic annotation increases with the number of CPU cores available to the server.

To use this feature, you need to configure it and make sure that the server receives multiple annotation requests in parallel. In terms of the KIM API, this means that the execute method of the [SemanticAnnotationAPI|http://javadoc.ontotext.com/kim-3.7-javadoc/com/ontotext/kim/client/semanticannotation/SemanticAnnotationAPI.html] allows several executions at the same time, issued by one or more KIM server clients. Note that simple clients like the [Populater Tool|Population], which annotate documents one by one (always waiting for one document to complete before sending the next), will not benefit from the parallel annotation capabilities.
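
A client that sends requests in parallel might be structured as in the following Java sketch. It uses only the standard {{java.util.concurrent}} classes. The {{annotate}} function is a hypothetical placeholder for whatever call the client makes to the KIM server, and the sketch assumes that call is safe to invoke from several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical client-side helper, not part of the KIM API.
final class ParallelAnnotationClient {

    /**
     * Submits every document to a fixed-size thread pool so that up to
     * `threads` annotation requests are in flight at the same time.
     * Results are returned in the same order as the input documents.
     */
    static <R> List<R> annotateAll(List<String> documents,
                                   Function<String, R> annotate,
                                   int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (String document : documents) {
                Callable<R> task = () -> annotate.apply(document);
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // waits for this request to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while annotating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("annotation request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real annotation call: returns the text length.
        List<Integer> lengths = annotateAll(List.of("first text", "second text"), String::length, 2);
        System.out.println(lengths);
    }
}
```

In contrast to a one-by-one client, this pattern keeps several requests outstanding, which is what lets the server use more than one pipeline.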

h3. Parallel annotation configuration

The number of pipelines is configured in the file *./config/nerc.properties*. The option format is:
{code:java}
com.ontotext.kim.semanticannotation.PARALLEL_NERCS=<maximum number of parallel executions allowed>
{code}
Examples:
{code:java}
com.ontotext.kim.semanticannotation.PARALLEL_NERCS=1
#will disable parallel annotation

com.ontotext.kim.semanticannotation.PARALLEL_NERCS=auto
#will limit the number of parallel executions to the number of CPU cores available in the system

com.ontotext.kim.semanticannotation.PARALLEL_NERCS=3
#will process up to 3 annotation executions at the same time
{code}
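
The three kinds of values can be read as in the following Java sketch. This is an illustration of the documented semantics only, not the server's actual parsing code. The class {{ParallelNercsSetting}} is hypothetical, and rejecting values below 1 is an assumption.

```java
// Hypothetical illustration of the PARALLEL_NERCS semantics, not KIM code.
final class ParallelNercsSetting {

    /**
     * Returns the maximum number of parallel executions for a setting value:
     * "auto" maps to the number of CPU cores, a number is used as given,
     * and 1 effectively disables parallel annotation.
     */
    static int resolve(String value, int cpuCores) {
        String trimmed = value.trim();
        if (trimmed.equalsIgnoreCase("auto")) {
            return cpuCores;
        }
        int limit = Integer.parseInt(trimmed);
        if (limit < 1) {
            // Assumption: values below 1 are treated as invalid here.
            throw new IllegalArgumentException("PARALLEL_NERCS must be at least 1: " + value);
        }
        return limit;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("auto -> " + resolve("auto", cores));
        System.out.println("1    -> " + resolve("1", cores));
        System.out.println("3    -> " + resolve("3", cores));
    }
}
```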


h2. Extending the information extraction

After you have extended the ontology with domain-specific classes and have enriched the knowledge base with instances, there are three options for configuring the IE process:

* You can use both the default IE module and the grammars. If you have followed the above steps correctly, you can rely on the module and grammars to recognize entities correctly.
* You can leave the default IE module as it is, but edit some of the grammars to provide specific recognition rules. (We do not recommend this.)
* You can create a new IE module {{*.gapp}} and develop grammars to cover the desired domain. Replace the default IE module with the new IE module. (This is a complex task and we recommend it to advanced GATE users only. It is not discussed here.)

(!) The first IE module on the list is used by default.
{tip:title=Tips}
* Increasing the number of allowed parallel executions increases the memory requirements of the server. Therefore, we do not recommend setting this number above the number of CPU cores in the system, as doing so only wastes memory. If you allow parallel annotation, make sure you [increase the allotted memory to the server|Configuring remote connection].
* Parallel annotation leads to loading multiple documents in KIM simultaneously. Documents with more than 20 pages will take up to 1 GB of memory. If you intend to annotate large documents in parallel, keep in mind that the KIM server will require a lot of memory.

{noformat}<required memory> =
<memory in minimal requirements> +
<number of parallel executions allowed> * <memory required for processing 1 document>{noformat}
* Even though the regular [Populater Tool|Population] does not take advantage of parallel annotation, you can start several separate Populater instances on different subsets of documents at the same time. Because these instances work in parallel, this approach takes less time than running a single Populater Tool instance for all documents.{tip}
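
The memory formula from the tips can be worked through with a small Java sketch. The figure of 1 GB per document is the upper bound given above for documents with more than 20 pages. The 2048 MB used for the minimal requirements is a placeholder assumption, not a documented KIM value, and the class {{KimMemoryEstimate}} is hypothetical.

```java
// Hypothetical estimate helper, not part of KIM.
final class KimMemoryEstimate {

    /** required = minimal requirements + parallel executions * memory per document (all in MB). */
    static long requiredMb(long minimalMb, int parallelExecutions, long perDocumentMb) {
        return minimalMb + (long) parallelExecutions * perDocumentMb;
    }

    public static void main(String[] args) {
        long minimalMb = 2048;     // assumption: placeholder for the minimal requirements
        int parallelExecutions = 4;
        long perDocumentMb = 1024; // up to 1 GB for a document of more than 20 pages
        System.out.println(requiredMb(minimalMb, parallelExecutions, perDocumentMb) + " MB");
    }
}
```

With these example numbers, the estimate is 2048 + 4 * 1024 = 6144 MB.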