This section discusses how to configure the KIM Information Extraction (IE) Modules.
KIM supports multiple GATE applications and dynamic switching between IE modules at runtime.
Loading multiple IE modules simultaneously on one running KIM server allows different IE processing, depending on the type of incoming documents. This feature allows a KIM client to handle multilingual or domain-dependent processing without setting up multiple KIM servers. Another advantage is that the extracted entities are added to the same semantic repository, even though they were extracted by different IE modules.
To set up the configuration of the KIM IE modules, you need to:
- Configure all desired GATE applications: edit the file ./config/nerc.properties, where the parameter
must be a comma-separated list of the names of the IE module *.gapp files. The first item on the list will be the default.
- Place all IE module *.gapp files in the /context/default/resources folder.
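The ordering rule described above can be sketched in a few lines. This is only an illustration of how a comma-separated module list resolves to a default, not KIM's actual parsing code: the function name `parse_ie_modules` and the two custom module names are hypothetical, and only IE.gapp is taken from the text.

```python
from typing import List, Tuple


def parse_ie_modules(value: str) -> Tuple[str, List[str]]:
    """Split a comma-separated list of .gapp file names.

    Returns (default_module, all_modules); the first item is the default.
    """
    # Strip surrounding whitespace and drop empty entries such as a trailing comma.
    modules = [name.strip() for name in value.split(",") if name.strip()]
    if not modules:
        raise ValueError("at least one IE module must be listed")
    return modules[0], modules


# Hypothetical value: IE_custom_a.gapp and IE_custom_b.gapp are made-up names.
default, modules = parse_ie_modules("IE.gapp, IE_custom_a.gapp, IE_custom_b.gapp")
print(default)  # IE.gapp
```

Moving a different file name to the front of the string changes the default; the order of the remaining names has no effect on the result.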
Within one KIM installation, you can configure several IE modules that can be changed at any time. To configure the different modules, use the SemanticAnnotationAPI.
Let's say you have the following three IE modules placed in the /context/default/resources folder
IE.gapp is a standard module that comes with KIM.
If you want to select another module to be the default, place it first on the list. The order of the remaining modules does not matter, so the configuration parameter may look like this:
File names are case-sensitive, so they must be spelled exactly the same way in the configuration and in the API calls at runtime.
To select between the different GATE applications, see the example.
The KIM Server supports annotation of multiple documents in parallel. This way, the performance of semantic annotation increases with the number of CPU cores available to the server.
To use this feature, you need to configure it and make sure that the server receives multiple annotation requests in parallel. In terms of the KIM API, this means that the SemanticAnnotationAPI execute method allows several executions at the same time by one or more KIM server clients. Note that simple clients like the Populater Tool, which annotate documents one by one (always waiting for one document to complete before sending the next), will not benefit from the parallel annotation capabilities.
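The difference between a one-by-one client and a client that keeps several requests in flight can be sketched as follows. This is a generic illustration of the request pattern only: `annotate` is a hypothetical stand-in for a blocking annotation call and does not contact a KIM server, and the worker count of 4 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def annotate(document: str) -> str:
    """Hypothetical stand-in for one blocking annotation request."""
    time.sleep(0.05)  # simulates the time the server spends on one document
    return f"annotated:{document}"


def annotate_sequentially(documents):
    # One request at a time: the next document is sent only after the
    # previous one has completed, so server-side parallelism is unused.
    return [annotate(d) for d in documents]


def annotate_in_parallel(documents, max_workers):
    # Up to max_workers requests are in flight at once; map() returns
    # the results in the same order as the input documents.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(annotate, documents))


documents = [f"doc{i}" for i in range(8)]
results = annotate_in_parallel(documents, max_workers=4)
```

Both functions return the same results; only the parallel variant gives the server more than one document to work on at the same time.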
Parallel annotation configuration
The number of pipelines is configured in the file config/nerc.properties. The option format is:
Extending the information extraction
After you have extended the ontology with domain-specific classes and have enriched the knowledge base with instances, there are three options for configuring the IE process:
- You can use both the default IE module and the grammars. If you have followed the above steps correctly, you can rely on the module and grammars to recognize entities correctly.
- You can leave the default IE module as it is, but edit some of the grammars to provide specific recognition rules. (We do not recommend this.)
- You can create a new IE module *.gapp and develop grammars to cover the desired domain. Replace the default IE module with the new IE module. (This is a complex task and we recommend it only for advanced GATE users. It is not discussed here.)
The first IE module on the list is used by default.
Increasing the number of allowed parallel executions increases the memory requirements of the server. Therefore, we do not recommend setting this number above the number of CPU cores in the system; doing so only wastes memory. If you allow parallel annotation, make sure you increase the memory allotted to the server:
<required memory> = <memory in minimal requirements> + <number of parallel executions allowed> * <memory required for processing 1 document>
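As a worked example of the formula, the sketch below caps the requested number of parallel executions at the number of CPU cores and then computes the resulting memory estimate. All figures (1024 MB minimal requirement, 256 MB per document, 4 cores) are hypothetical and chosen only to make the arithmetic easy to follow; measure the real values for your own installation.

```python
import os


def capped_parallel_executions(requested, cpu_cores=None):
    """Limit the number of parallel executions to the number of CPU cores."""
    cores = cpu_cores or os.cpu_count() or 1
    return min(requested, cores)


def required_memory_mb(minimal_mb, parallel_executions, per_document_mb):
    """required = minimal requirements + parallel executions * memory per document."""
    return minimal_mb + parallel_executions * per_document_mb


# Hypothetical figures: 8 executions requested on a 4-core machine.
executions = capped_parallel_executions(requested=8, cpu_cores=4)
print(executions)                                  # 4
print(required_memory_mb(1024, executions, 256))   # 2048
```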