
The configuration file <KIM_HOME>\config\populater.xml is used to set up the environment for the population process. The parameters are grouped by function, as follows:

Document source parameters determine where and how to get raw data for document creation:
INPUT_STORAGE_URL This is a file-path to the folder containing files with document components (body and metadata).
INPUT_SUBFOLDERS (true/false flag) This determines whether the sub-folders of the storage folder are also checked for files.
INPUT_SOURCE_CLASS This is a fully qualified class name. It points to a class implementing the DocumentSource interface. The class collects file information and packs together files related to one document. Then it lets the populater process them package by package. There are two classes currently available:
  • com.ontotext.kim.util.datastore.DocumentSourceBulk - the standard implementation
  • com.ontotext.astrazeneca.populater.DocumentSourceQueue - implements input queue handling
Auxiliary parameters need to be defined if INPUT_SOURCE_CLASS is set to DocumentSourceQueue:
INPUT_QUEUE_URL This is the file-path to the folder containing request files. This folder can be the same as the storage folder.
INPUT_DOC_EXT This is a comma-separated list of file extensions. Files with these extensions are searched for and used as document bodies. The list is ordered by priority, decreasing from left to right. If more than one file is found for a document (with different extensions), only the file with the highest-priority extension is selected.
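
The priority rule for INPUT_DOC_EXT can be illustrated with a minimal Java sketch. It is not the populater's actual code: the class ExtensionPriority and its methods are hypothetical, and it assumes that extensions are listed without a leading dot and are matched case-insensitively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of KIM: illustrates the priority rule only.
class ExtensionPriority {

    // Parses a comma-separated INPUT_DOC_EXT value into an ordered list.
    // Assumption: entries have no leading dot; whitespace and case are ignored.
    static List<String> parseExtensions(String inputDocExt) {
        List<String> result = new ArrayList<>();
        for (String ext : inputDocExt.split(",")) {
            String cleaned = ext.trim().toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    // Returns the candidate whose extension comes first in the priority list,
    // or an empty Optional if no candidate has a listed extension.
    static Optional<String> selectBody(List<String> priorities, List<String> candidateFiles) {
        for (String ext : priorities) {
            for (String file : candidateFiles) {
                if (file.toLowerCase(Locale.ROOT).endsWith("." + ext)) {
                    return Optional.of(file);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> priorities = parseExtensions("html, txt,xml");
        List<String> candidates = Arrays.asList("report.txt", "report.xml", "report.html");
        // "html" is leftmost in the list, so report.html is chosen.
        System.out.println(selectBody(priorities, candidates).orElse("none")); // prints report.html
    }
}
```

Iterating over the extensions in the outer loop keeps the left-to-right priority explicit: the first extension that matches any candidate wins, regardless of the order in which the files were found.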

Request file extensions and metadata file extensions are defined as constants in the DocumentSource classes.

KIM server connection parameters determine how a running instance of the KIM server can be reached:
RMI_HOST This is the URL (as a string) of the host where a KIM server is running.
RMI_PORT This is the (integer) port number of the KIM server.
Document processing parameters determine how the documents will be processed:
POLLING_INTERVAL This is an integer setting the time (in seconds) that the populater must wait before checking for new files.
  • In the standard "bulk mode", the populater stops after all initially found documents have been processed, and this parameter has no effect.
  • In "queue mode", after all available files have been processed, the populater enters a loop waiting for new files. In this mode the polling interval sets the time before the populater checks for new elements in the queue.
CONTENT_ENCODING This is the (string) character encoding used in the input files (e.g. UTF-8).
SKIP_PROCESSED (true/false flag) This option uses the list of all successfully processed files kept by the populater. If you want the populater to skip such files the next time it runs, set this parameter to true. This is especially useful if a "bulk mode" population failed in the middle of the process.
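
The interaction between the two modes, POLLING_INTERVAL and SKIP_PROCESSED can be sketched in Java as follows. This is an illustration under stated assumptions, not KIM code: the class PopulationLoop, the scanner supplier and the maxPasses bound are hypothetical, and the sketch identifies files by name only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch, not KIM code: shows how the settings above could interact.
class PopulationLoop {
    private final boolean skipProcessed;        // mirrors SKIP_PROCESSED
    private final boolean queueMode;            // true = "queue mode", false = "bulk mode"
    private final long pollingIntervalSeconds;  // mirrors POLLING_INTERVAL
    private final Set<String> processed;        // files recorded as successfully processed

    PopulationLoop(boolean skipProcessed, boolean queueMode,
                   long pollingIntervalSeconds, Set<String> previouslyProcessed) {
        this.skipProcessed = skipProcessed;
        this.queueMode = queueMode;
        this.pollingIntervalSeconds = pollingIntervalSeconds;
        this.processed = new HashSet<>(previouslyProcessed);
    }

    // Handles one batch of found files and returns the ones actually processed.
    List<String> processBatch(List<String> found) {
        List<String> handled = new ArrayList<>();
        for (String file : found) {
            if (skipProcessed && processed.contains(file)) {
                continue; // already done in an earlier run or pass
            }
            // Real document processing would happen here.
            processed.add(file);
            handled.add(file);
        }
        return handled;
    }

    // Scans and processes; maxPasses is an artificial bound so the sketch terminates.
    int run(Supplier<List<String>> scanner, int maxPasses) {
        int total = 0;
        for (int pass = 0; pass < maxPasses; pass++) {
            total += processBatch(scanner.get()).size();
            if (!queueMode) {
                break; // bulk mode: stop after the initially found documents
            }
            try {
                Thread.sleep(pollingIntervalSeconds * 1000L); // wait before polling again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A bulk run that failed after a.txt and b.txt is resumed with skipping enabled.
        Set<String> earlier = new HashSet<>(Arrays.asList("a.txt", "b.txt"));
        List<String> folder = Arrays.asList("a.txt", "b.txt", "c.txt");
        PopulationLoop resumed = new PopulationLoop(true, false, 0, earlier);
        System.out.println(resumed.run(() -> folder, 1)); // prints 1 (only c.txt is processed)
    }
}
```

In the resumed bulk run only the file that was not yet recorded as processed is handled, which is the situation the SKIP_PROCESSED description refers to.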

You should keep the default values of the following parameters. If changed, they seriously affect the behavior of the KIM Server.

STORE_LIMITED (true/false flag) This determines if KIM stores a serialized version of the original document in the document repository (affects only Lucene).
PRESERVE_CONTENT (true/false flag) This determines if KIM preserves the original document content. If it is set to true, the original content is preserved as a document feature.
MARKUP_AWARE (true/false flag) This determines if KIM extracts the markup in the document and stores it as document annotations.
Statistics system parameters:
STAT_SAVE (true/false flag) This determines if the statistics of the population process are saved.
STAT_URL This is the file-path to the folder where the statistics file will be created and saved.
STAT_INTERVAL This is an integer that determines how many documents must be processed before a snapshot of the statistics is taken and stored.
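
The effect of STAT_SAVE and STAT_INTERVAL can be illustrated with a small Java sketch. It is only an assumption about how such a counter might work: the class StatisticsTracker is hypothetical, and an in-memory list stands in for the statistics file written under STAT_URL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not KIM code: takes a snapshot every STAT_INTERVAL documents.
class StatisticsTracker {
    private final boolean statSave;   // mirrors STAT_SAVE
    private final int statInterval;   // mirrors STAT_INTERVAL
    private int processedCount = 0;
    // Stand-in for snapshots that would be written to a file under STAT_URL.
    private final List<Integer> snapshots = new ArrayList<>();

    StatisticsTracker(boolean statSave, int statInterval) {
        if (statInterval <= 0) {
            throw new IllegalArgumentException("STAT_INTERVAL must be positive");
        }
        this.statSave = statSave;
        this.statInterval = statInterval;
    }

    // Called once per processed document; records a snapshot at each interval boundary.
    void documentProcessed() {
        processedCount++;
        if (statSave && processedCount % statInterval == 0) {
            snapshots.add(processedCount);
        }
    }

    List<Integer> getSnapshots() {
        return snapshots;
    }

    public static void main(String[] args) {
        StatisticsTracker tracker = new StatisticsTracker(true, 100);
        for (int i = 0; i < 250; i++) {
            tracker.documentProcessed();
        }
        System.out.println(tracker.getSnapshots()); // prints [100, 200]
    }
}
```

With an interval of 100 and 250 processed documents, two snapshots are taken. A smaller interval gives finer-grained statistics at the cost of more frequent writes.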

To ensure proper interpretation when internal path concatenations are performed, all file-path parameters must use "/" as the file separator. You can also use a placeholder, which is replaced at runtime with the current installation path. The only exceptions are the auxiliary parameters (for example, INPUT_QUEUE_URL cannot use the placeholder).
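
When preparing values for the file-path parameters, a small Java helper like the one below may be useful for checking them. It is a sketch under assumptions, not the populater's own logic: the class PathNormalizer is hypothetical, and the token "[PLACEHOLDER]" is a made-up stand-in for the real placeholder, which is passed in as an argument.

```java
// Hypothetical helper, not part of KIM: prepares file-path values that use "/".
class PathNormalizer {

    // Expands the placeholder token and converts backslashes to "/".
    // The placeholder text is supplied by the caller; "[PLACEHOLDER]" below is made up.
    static String normalize(String rawPath, String placeholder, String installPath) {
        String expanded = rawPath.replace(placeholder, installPath);
        return expanded.replace('\\', '/');
    }

    // Joins a folder and a file name with exactly one "/" between them.
    static String join(String folder, String name) {
        String base = folder.endsWith("/") ? folder : folder + "/";
        return base + name;
    }

    public static void main(String[] args) {
        String folder = normalize("[PLACEHOLDER]\\population\\input", "[PLACEHOLDER]", "C:\\kim");
        System.out.println(folder);                  // prints C:/kim/population/input
        System.out.println(join(folder, "doc.txt")); // prints C:/kim/population/input/doc.txt
    }
}
```

Using one separator throughout means a simple string concatenation of folder and file name yields a valid path, which is why mixed separators can cause problems.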
