The configuration file <KIM_HOME>\config\populater.xml is used to set up the environment for the population process. The parameters are grouped by function as follows:
Document source parameters | determine where and how to get raw data for document creation |
---|---|
INPUT_STORAGE_URL | This is a file-path to the folder containing files with document components (body and metadata). |
INPUT_SUBFOLDERS | (true/false flag) This determines whether the sub-folders of the storage folder are also checked for files. |
INPUT_SOURCE_CLASS | This is a fully qualified class name. It points to a class implementing the DocumentSource interface. The class collects file information and packs together the files related to one document. It then lets the populater process them package by package. There are two classes currently available: |
Auxiliary parameters | must be defined if INPUT_SOURCE_CLASS is set to DocumentSourceQueue |
INPUT_QUEUE_URL | This is the file path to the folder containing the request files. The folder can be the same as the storage folder. |
INPUT_DOC_EXT | This is a comma-separated list of file extensions. Files with these extensions are searched for and used as document bodies. The list is prioritized, with priority decreasing from left to right. If more than one file (with different extensions) is found for a document, only the file with the highest-priority extension is selected. |
The request file extensions and metadata file extensions are defined as constants in the DocumentSource classes.
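
The left-to-right priority rule for INPUT_DOC_EXT can be illustrated with a minimal Python sketch. It is not the populater's actual code: the function name `pick_document_body` and the sample file names are hypothetical, and case-insensitive extension matching is an assumption made for the illustration.

```python
from pathlib import PurePosixPath


def pick_document_body(candidates, input_doc_ext):
    """Return the candidate file whose extension is leftmost in input_doc_ext.

    candidates    -- file names found for ONE document (hypothetical input)
    input_doc_ext -- comma-separated extension list, highest priority first
    Returns None when no candidate has a listed extension.
    """
    # Normalize the configured list: trim spaces, drop dots, ignore empty items.
    priorities = [
        ext.strip().lower().lstrip(".")
        for ext in input_doc_ext.split(",")
        if ext.strip()
    ]
    best, best_rank = None, len(priorities)
    for name in candidates:
        ext = PurePosixPath(name).suffix.lower().lstrip(".")
        if ext in priorities:
            rank = priorities.index(ext)  # smaller index = higher priority
            if rank < best_rank:
                best, best_rank = name, rank
    return best
```

With the list `html, txt`, a document that has both `report.txt` and `report.html` would be represented by `report.html`, because `html` appears first.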
KIM server connection parameters | determine how a running instance of KIM server can be reached |
---|---|
RMI_HOST | This is the URL (as a string) of the host where a KIM server is running. |
RMI_PORT | This is the (integer) port number of the KIM server. |
Document processing parameters | determine how the documents will be processed |
POLLING_INTERVAL | This is an integer setting the time (in seconds) that the populater must wait before checking for new files. |
CONTENT_ENCODING | This is the (string) character encoding used in the input files (e.g. UTF-8). |
SKIP_PROCESSED | (true/false flag) This option uses the list of all successfully processed files kept by the populater. If you want the populater to skip such files the next time it runs, set this parameter to true. This is especially useful if a "bulk mode" population failed in the middle of the process. |
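
The effect of SKIP_PROCESSED can be sketched in Python as a simple filter over the files found in the input storage. This is only an illustration under assumptions: the function name `files_to_process` is hypothetical, and how the populater actually stores its list of processed files is not described here, so the list is passed in as a plain collection.

```python
def files_to_process(found_files, processed_files, skip_processed):
    """Return the files that a run would still need to handle.

    found_files     -- files discovered in the input storage, in order
    processed_files -- files recorded as successfully processed earlier
    skip_processed  -- mirrors the SKIP_PROCESSED flag
    """
    if not skip_processed:
        # Flag is false: every discovered file is processed again.
        return list(found_files)
    done = set(processed_files)
    # Flag is true: keep only files that are not in the processed list.
    return [name for name in found_files if name not in done]
```

After an interrupted bulk run, restarting with the flag set to true would resume with the files that had not yet been recorded as processed.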
You should keep the default values of the following parameters. Changing them seriously affects the behavior of the KIM Server.
STORE_LIMITED | (true/false flag) This determines if KIM stores a serialized version of the original document in the document repository (affects only Lucene). |
PRESERVE_CONTENT | (true/false flag) This determines if KIM preserves the original document content. If it is set to true, the original content is preserved as a document feature. |
MARKUP_AWARE | (true/false flag) This determines if KIM extracts the markup in the document and stores it as document annotations. |
Statistics system parameters | |
---|---|
STAT_SAVE | (true/false flag) This determines if the statistics of the population process are saved. |
STAT_URL | This is the file-path to the folder where the statistics file will be created and saved. |
STAT_INTERVAL | This is an integer that determines how many documents must be processed before a snapshot of the statistics is taken and stored. |
To ensure proper interpretation when internal path concatenations are performed, all file-path parameters must use "/" as the file separator. You can also use a placeholder, which is replaced at runtime with the current installation path. The only exceptions are the auxiliary parameters (for example, INPUT_QUEUE_URL cannot use the placeholder).
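
The path rules above can be sketched as a small Python helper that substitutes the installation path and then enforces "/" as the separator. This is a sketch under assumptions, not KIM's implementation: the function name `resolve_path` is hypothetical, and the placeholder token is deliberately passed in as an argument rather than assumed here.

```python
def resolve_path(value, placeholder, install_path):
    """Expand the placeholder and normalize separators in a file-path value.

    value        -- the raw parameter value from the configuration
    placeholder  -- the placeholder token (supplied by the caller)
    install_path -- the current installation path used at runtime
    """
    # Replace the placeholder with the installation path, if it occurs.
    resolved = value.replace(placeholder, install_path)
    # Use "/" everywhere so that later path concatenations stay consistent.
    return resolved.replace("\\", "/")
```

Normalizing after the substitution matters on Windows, where the installation path itself may contain backslashes.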