
Oracle COREDB is deprecated. It still works in KIM 3, but with some known minor issues. Furthermore, Ontotext does not provide support for Oracle COREDB to users who are evaluating the KIM Platform. Instead, consider selecting [a different document repository configuration].

The COREDB module was designed to use a Relational Database Management System (RDBMS) for document storage. This section describes how to configure the document repository to use the COREDB module over Oracle Database. The current version of Oracle COREDB was developed and tested extensively with the 10g2 release of the Oracle™ RDBMS. We recommend that you always use its latest patch version, because any known Oracle issues also affect KIM.
For small document sets and for evaluation purposes, we also support the Oracle Express database, which is free to download and use.

Prerequisites

Setting up KIM to use the COREDB module over Oracle as the document repository enables the CORE interface in the Web User Interface (UI) and the CORE-specific methods in the Application Programming Interface (API). Before you configure KIM, prepare the database as follows:

  1. Create a dedicated tablespace with unlimited quota and a database user (DB-user) from the Oracle Administration interface.
  2. Assign the tablespace to the newly created DB-user. In order to be able to create and use the COREDB-specific schema objects, the DB-user needs to have, as a minimum, the following roles: CTXAPP, RESOURCE, and CONNECT. It is also possible to grant the Database Administrator (DBA) role to the DB-user.
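
The two steps above can also be carried out with plain SQL instead of the Administration interface. The following is a minimal sketch that only builds the statements as strings. The tablespace name, datafile name, sizes, user name, and password are hypothetical placeholders, not values required by KIM. Review the output with your database administrator before running it.

```python
def oracle_setup_statements(tablespace, datafile, user, password):
    """Build Oracle SQL for steps 1 and 2 (all argument values are illustrative)."""
    return [
        # Step 1: a dedicated tablespace; the SIZE/NEXT values are assumptions.
        f"CREATE TABLESPACE {tablespace} DATAFILE '{datafile}' "
        "SIZE 500M AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED",
        # Steps 1 and 2: the DB-user, with the tablespace assigned as its
        # default and an unlimited quota on it.
        f'CREATE USER {user} IDENTIFIED BY "{password}" '
        f"DEFAULT TABLESPACE {tablespace} QUOTA UNLIMITED ON {tablespace}",
        # Step 2: the minimum roles listed above.
        f"GRANT CONNECT, RESOURCE, CTXAPP TO {user}",
    ]


if __name__ == "__main__":
    for statement in oracle_setup_statements(
        "kim_coredb", "kim_coredb01.dbf", "kim_user", "change_me"
    ):
        print(statement + ";")
```

Paste the printed statements into an SQL client connected with administrative privileges.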

Basic configuration

After the database requirements are met, review and set the following configuration parameters, located in KIM/config/document.repository.properties:

  • com.ontotext.kim.KIMConstants.DOCUMENT_REPOSITORY_TYPE = coredb
      This setting selects Oracle COREDB as the document repository implementation. In most cases, you also need to set the CORE_INDEX_ADDON option to none.

   Changing the document repository does NOT move any documents from the old repository to the new one, so documents already populated in the old repository become inaccessible. They are not deleted, however, so you can return to your old document repository at any time.

  • com.ontotext.kim.KIMConstants.COREDB_CONNECTION_STRING = jdbc:oracle:thin:@//<server-address>:1521/<oracle-sid>, where:
    • <server-address> must be replaced by the IP address or the name of the server
    • <oracle-sid> must be replaced by the service name of the Oracle database
        The connection string is used to connect to a database running on a server and accessible at a port with the settings provided by the administrator. Since only Oracle databases are currently supported, you should use the above connection string notation.
  • com.ontotext.kim.KIMConstants.COREDB_USER
  • com.ontotext.kim.KIMConstants.COREDB_PASS
      These parameters set the user name and the password to connect. The default tablespace assigned to the user will be used for data storage.
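
As an illustration of how the parameters above fit together, the sketch below assembles the connection string from its parts and returns the four lines as they would appear in document.repository.properties. The host name, service name, user name, and password in the usage example are hypothetical. The port defaults to 1521, as in the notation above.

```python
def coredb_properties(server_address, service_name, user, password, port=1521):
    """Return the basic COREDB settings as properties-file lines."""
    prefix = "com.ontotext.kim.KIMConstants."
    # Same notation as above: jdbc:oracle:thin:@//<server-address>:1521/<oracle-sid>
    url = f"jdbc:oracle:thin:@//{server_address}:{port}/{service_name}"
    return [
        prefix + "DOCUMENT_REPOSITORY_TYPE = coredb",
        prefix + "COREDB_CONNECTION_STRING = " + url,
        prefix + "COREDB_USER = " + user,
        prefix + "COREDB_PASS = " + password,
    ]


if __name__ == "__main__":
    # Hypothetical values; replace them with your own server and credentials.
    for line in coredb_properties("dbhost.example.com", "kimdb", "kim_user", "change_me"):
        print(line)
```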

Do not execute kim(.bat) reindex on a tablespace with existing COREDB data. This will delete all data.

Run kim(.bat) reindex from an open console window in the bin folder of the installation. This creates the database schema and preloads the entities. The process may take up to 30 minutes. After KIM initializes, populate some documents and try the CORE interface.

Performance configuration

To set up the performance configuration, use the following parameters:

  • com.ontotext.kim.KIMConstants.DOCUMENT_REPOSITORY_SYNCHRONIZE_COUNT=100
  • com.ontotext.kim.KIMConstants.DOCUMENT_REPOSITORY_OPTIMIZE_COUNT=100000

  These parameters determine the number of documents after which the Full Text Search (FTS) indices are automatically synchronized or optimized. The CoreDbAPI provides methods that honor these parameters, ignore them, or explicitly force a sync or optimize operation.

   Note that sync simply updates the index so that the content of the new entries is included, while optimize rewrites it from scratch, which may take a long time.
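
To make the effect of the two counters concrete, here is a small simulation. It assumes that each operation is triggered every time the number of stored documents reaches a multiple of the corresponding count. This is only one reading of the description above, and the actual CoreDbAPI logic may differ. The function name is hypothetical.

```python
SYNCHRONIZE_COUNT = 100      # DOCUMENT_REPOSITORY_SYNCHRONIZE_COUNT
OPTIMIZE_COUNT = 100_000     # DOCUMENT_REPOSITORY_OPTIMIZE_COUNT


def maintenance_actions(documents_stored,
                        sync_count=SYNCHRONIZE_COUNT,
                        optimize_count=OPTIMIZE_COUNT):
    """Return the index operations assumed to fire after this many documents."""
    actions = []
    if documents_stored <= 0:
        return actions
    if documents_stored % sync_count == 0:
        actions.append("sync")       # cheap: adds the new entries to the index
    if documents_stored % optimize_count == 0:
        actions.append("optimize")   # expensive: rewrites the index
    return actions


if __name__ == "__main__":
    for n in (99, 100, 200, 100_000):
        print(n, maintenance_actions(n))
```

Under this assumption, lowering the synchronize count makes new documents searchable sooner at the cost of more frequent index updates, while the optimize count controls how often the expensive rewrite happens.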

Initialization configuration

There are additional options that affect the document repository, located in KIM/config/document.repository.rebuild. These options take effect only when the repository is initialized, i.e. when KIM is started by startKIM_RebuildIndex or when the changes are applied from the COREDB administration console.

  • com.ontotext.kim.KIMConstants.COREDB_EXTENDED_QUERY=true
      This enables the full capabilities of Oracle Enterprise Server 10g2. It will greatly enhance the performance of the semantic queries over the document repository. The default is: true. If you use Oracle Standard Edition or Oracle Express, set to: false.
  • com.ontotext.kim.KIMConstants.COREDB_XML_DOC_BODY=false
      This determines whether the documents will be serialized as XML documents in Oracle. It enables sectioning support in queries. The default is: false. If the semantic annotation will produce document sectioning metadata, set to: true.
  • com.ontotext.kim.KIMConstants.DOCUMENT_FEAT_LIST=
      This sets the list of document feature names. Names must contain only letters and numbers. Letters are uppercased internally, so they are not case-sensitive. Commas are treated as delimiters. Equal feature names are merged into one name. The retrieved feature structure will be used as default in the document population process. If this parameter is empty, then the default set of document features is used. This set contains all features available for the documents in the KIM demonstration corpus.
  • com.ontotext.kim.coredb.base.DbPopulate.COUNT_ENTITIES=true
      This determines whether the number of entities is calculated prior to their population in CORE. It helps to better estimate the time the entity population will take, but also slows down the process. The default is: true. We recommend that you set COUNT_ENTITIES to: false, if one of the following is true:
    • the number of entities may be more than 10,000,000
    • the majority of the entities do not have a label and will not be populated in CORE
  • com.ontotext.kim.KIMConstants.COREDB_EXTRA_TAB_SPACES=
      This is a comma-separated list of additional tablespaces that are available. The CORE module can utilize effectively up to 3 additional tablespaces. Using multiple tablespaces gives you better parallelism of execution and makes it easier to manage large databases. The default configuration of the KIM server provides only one tablespace.
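
The rules for DOCUMENT_FEAT_LIST described above (comma delimiters, letters and numbers only, internal uppercasing, merging of equal names) can be sketched as a small validator that you can run on a value before putting it in the configuration file. This is an illustration of the stated rules, not KIM's own parser. Trimming whitespace around names is an additional assumption, and the function name is hypothetical.

```python
import re

# After uppercasing, a valid name contains only letters and digits.
_VALID_NAME = re.compile(r"[A-Z0-9]+")


def normalize_feature_list(raw):
    """Return the distinct, uppercased feature names from a DOCUMENT_FEAT_LIST value."""
    names = []
    for part in raw.split(","):            # commas are delimiters
        name = part.strip().upper()        # assumption: surrounding spaces are ignored
        if not name:
            continue                       # skip empty entries such as "a,,b"
        if not _VALID_NAME.fullmatch(name):
            raise ValueError(f"invalid feature name: {part!r}")
        if name not in names:              # equal names are merged into one
            names.append(name)
    return names


if __name__ == "__main__":
    print(normalize_feature_list("title, Author,TITLE,date2"))
```

An empty result corresponds to an empty parameter, in which case the default set of document features applies.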

After editing the initialization configuration, you need to either invoke the administration console to apply the changes, or rebuild the repository, which deletes all data.

Upgrading from KIM 2.4

KIM 3 automatically upgrades a tablespace created by Oracle COREDB in KIM 2.4 and keeps all data. Simply configure KIM 3 to use the existing tablespace and start KIM. You will see status messages about "recovering" the tablespace.
Note that after the upgrade you can no longer use the tablespace or any of its data with KIM 2.4. Upgrading the Oracle COREDB tablespace is only one part of upgrading an existing KIM 2.4 installation to KIM 3. Contact support for details.
