Document metadata
For every document it populates, the populater loads the accompanying metadata and stores it as document features. It looks for the metadata in a file with the same name as the document, but with an .xml extension.
For example, it will check for the metadata of the document "CompanyReport2003.html" in the file "CompanyReport2003.xml".
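The naming rule can be sketched in a few lines of Python. This is only an illustration, not KIM code: the function name `metadata_path_for` is hypothetical, and the sketch assumes the metadata file sits in the same directory as the document.

```python
from pathlib import PurePath


def metadata_path_for(document_path):
    """Return the path where the metadata file would be expected.

    Assumption: the metadata file lives next to the document and differs
    only in its extension, which is replaced by ".xml".
    """
    return str(PurePath(document_path).with_suffix(".xml"))


print(metadata_path_for("CompanyReport2003.html"))
```

For the document above, the sketch yields "CompanyReport2003.xml".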
By default, KIM recognizes the following feature types: TITLE, SUBTITLE, AUTHORS, TIMESTAMP, SUBJECT, SOURCE, URL, ORIGIN.
Restricting the feature schema in this way allows better indexing and a more consistent user interface. The document features are available immediately after the document is loaded, so they can be used in the semantic annotation pipeline.
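As a rough illustration of a fixed feature schema, the Python sketch below separates the default feature types from any others before a document is passed on. The function `split_features` is hypothetical, and the sketch assumes exact, upper-case feature names. It does not show what KIM itself does with features outside the default list, which this section does not specify.

```python
# Default feature types listed in this section.
RECOGNIZED_FEATURES = (
    "TITLE", "SUBTITLE", "AUTHORS", "TIMESTAMP",
    "SUBJECT", "SOURCE", "URL", "ORIGIN",
)


def split_features(features):
    """Split a feature dict into (recognized, other).

    Assumption: names are compared exactly, in upper case.
    """
    recognized = {k: v for k, v in features.items() if k in RECOGNIZED_FEATURES}
    other = {k: v for k, v in features.items() if k not in RECOGNIZED_FEATURES}
    return recognized, other


recognized, other = split_features(
    {"TITLE": "Company Report 2003", "TIMESTAMP": "2006-01-06", "REVIEWER": "n/a"}
)
print(sorted(recognized), sorted(other))
```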
Notes:
The TIMESTAMP feature will be parsed as a date. Typically, this is the date on which the document was created. Document dates are used in the Timelines section of the Web UI. Furthermore, developers can create queries that return documents from a specific time interval. See the Java RMI API or Web Service API for details.
A variety of date formats are recognized. For best results, use one of the sample date formats:
- Mon Jan 06 00:43:52 EET 2003
- 2006-01-06
- Oct 20 00:00:00 EET 1950
- 06-01-2006
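To check that a TIMESTAMP value matches one of the sample formats before loading, something like the following Python sketch could be used. It is not KIM's parser: `parse_timestamp`, `PATTERNS`, and the small `ZONE_OFFSETS` table are hypothetical, the sketch assumes that "06-01-2006" is day-first, and it relies on English month and weekday names (the default C locale).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zone table: only the abbreviations this sketch knows about.
ZONE_OFFSETS = {
    "EET": timedelta(hours=2),
    "UTC": timedelta(0),
    "GMT": timedelta(0),
}

# One pattern per sample format, with the zone token already removed.
PATTERNS = [
    "%a %b %d %H:%M:%S %Y",  # Mon Jan 06 00:43:52 EET 2003
    "%b %d %H:%M:%S %Y",     # Oct 20 00:00:00 EET 1950
    "%Y-%m-%d",              # 2006-01-06
    "%d-%m-%Y",              # 06-01-2006 (assumed day-month-year)
]


def parse_timestamp(text):
    """Return a datetime for a sample-format string, or None if none match."""
    tzinfo = None
    kept = []
    for token in text.split():
        if token in ZONE_OFFSETS:
            # Remember the zone and drop the token before pattern matching.
            tzinfo = timezone(ZONE_OFFSETS[token], token)
        else:
            kept.append(token)
    cleaned = " ".join(kept)
    for pattern in PATTERNS:
        try:
            return datetime.strptime(cleaned, pattern).replace(tzinfo=tzinfo)
        except ValueError:
            continue
    return None


for sample in ("Mon Jan 06 00:43:52 EET 2003", "2006-01-06",
               "Oct 20 00:00:00 EET 1950", "06-01-2006"):
    print(sample, "->", parse_timestamp(sample))
```

The zone token is handled separately because Python's `%Z` directive only accepts a few zone names when parsing.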
If the TIMESTAMP feature is missing or contains no date, the document date is set to midnight, in the current timezone, on the day the document is processed.
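The fallback date can be illustrated with a short Python sketch. The function `default_document_date` is hypothetical, and "current timezone" is assumed here to mean the local timezone of the machine that processes the document.

```python
from datetime import datetime


def default_document_date(now=None):
    """Return midnight at the start of the processing day.

    Assumption: when no moment is supplied, the processing time is the
    current local time of the machine running this code.
    """
    if now is None:
        now = datetime.now().astimezone()  # timezone-aware local time
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


print(default_document_date())
```

Passing an explicit `now` value makes the behaviour easy to check: a document processed at 15:30 on a given day receives that day at 00:00 as its date.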