Document metadata

For every populated document, the populater loads the accompanying metadata and stores it as document features. The populater looks for the metadata in a file with the same name as the document, but with the .xml extension.
For example, it will check for the metadata of the document "CompanyReport2003.html" in the file "CompanyReport2003.xml".
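The naming rule can be sketched in a few lines of Java. This is only an illustration of the convention described above, not the populater's actual code; the class and method names are hypothetical, and appending ".xml" to a name that has no extension is an assumption.

```java
// Hypothetical helper, not part of the KIM API: derives the name of the
// metadata file that is expected to accompany a document.
final class MetadataFileName {

    // Expects a bare file name (no directory part).
    static String forDocument(String documentFileName) {
        int dot = documentFileName.lastIndexOf('.');
        // Assumption: a name without an extension simply gets ".xml" appended.
        String base = dot > 0 ? documentFileName.substring(0, dot) : documentFileName;
        return base + ".xml";
    }
}
```

With this sketch, MetadataFileName.forDocument("CompanyReport2003.html") yields "CompanyReport2003.xml".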

By default, KIM recognizes the following feature types: TITLE, SUBTITLE, AUTHORS, TIMESTAMP, SUBJECT, SOURCE, URL, ORIGIN.

Adding arbitrary elements to the .xml file will NOT add features to the document. To customize the feature set, edit the document repository configuration in <KIM_HOME>config/document.repository.rebuild and add your custom features to the com.ontotext.kim.KIMConstants.DOCUMENT_FEAT_LIST option. Like all options in document.repository.rebuild, the document feature list takes effect only when you rebuild your document storage.

Restricting the feature schema in this way allows better indexing and a more consistent user interface. The document features are available immediately after the document is loaded, so they can be used in the semantic annotation pipeline.

Notes:
  • All feature names (keys) will be converted to uppercase automatically.
  • If you want to develop a customized KIM application, make sure you configure the feature list before populating any documents.

The TIMESTAMP feature will be parsed as a date. Typically, this is the date on which the document was created. The document dates are used in the Timelines section of the Web UI. Furthermore, developers can create queries that return documents from a specific time interval. See the Java RMI API or Web Service API for details.

A variety of date formats are recognized. For best results, use one of the following sample formats:

  • Mon Jan 06 00:43:52 EET 2003
  • 2006-01-06
  • Oct 20 00:00:00 EET 1950
  • 06-01-2006
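To check a TIMESTAMP value before populating, the sample formats above can be approximated with standard Java date patterns. The sketch below is not KIM's own parser: the class name and the pattern list are assumptions, and "06-01-2006" is read as day-month-year, which the samples do not confirm.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of the KIM API.
final class TimestampParser {

    // Assumed patterns, one per sample format listed above.
    private static final String[] PATTERNS = {
        "EEE MMM dd HH:mm:ss zzz yyyy", // Mon Jan 06 00:43:52 EET 2003
        "MMM dd HH:mm:ss zzz yyyy",     // Oct 20 00:00:00 EET 1950
        "yyyy-MM-dd",                   // 2006-01-06
        "dd-MM-yyyy"                    // 06-01-2006 (assumed day-month-year)
    };

    // Tries each pattern in turn; returns empty if none matches the whole string.
    static Optional<Date> parse(String text) {
        String trimmed = text.trim();
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject impossible dates such as day 2006
            ParsePosition position = new ParsePosition(0);
            Date date = format.parse(trimmed, position);
            if (date != null && position.getIndex() == trimmed.length()) {
                return Optional.of(date);
            }
        }
        return Optional.empty();
    }
}
```

Dates without a time zone are interpreted in the JVM's default time zone. A value that yields an empty result here is a good candidate for rewriting into one of the sample formats.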

If the TIMESTAMP feature is missing or contains no date, the document date is set to midnight at the start of the day on which the document is processed, in the current time zone.
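The fallback value can be reproduced with java.time, for example to predict which date an undated document will receive. This is a minimal sketch of the rule as stated, not KIM's implementation; the class and method names are hypothetical, and the clock parameter exists only to make the behaviour testable.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.ZonedDateTime;

// Hypothetical helper, not part of the KIM API.
final class DefaultDocumentDate {

    // Returns the start of the current day in the clock's time zone.
    static ZonedDateTime forProcessingTime(Clock clock) {
        return LocalDate.now(clock).atStartOfDay(clock.getZone());
    }
}
```

Calling DefaultDocumentDate.forProcessingTime(Clock.systemDefaultZone()) gives the start of today in the JVM's default time zone.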
