KIM is best suited for recognizing new named entities and relations in news and other informative texts. The "new" named entities are those not present in the knowledge base, but recognized during the semantic annotation process based on their context.
By default, KIM supports only the English language.
If you want to analyze documents in other languages, keep in mind that only the gazetteer works for any language. It finds mentions of entities that are written the same way in all languages and are already in the knowledge base, such as people, organizations, locations, and amounts of money.
By default, semantic annotation is enabled.
To disable it, leave the com.ontotext.kim.KIMConstants.IE_APP parameter empty in <KIM_HOME>config/nerc.properties.
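As a rough sketch, the relevant line could look like the fragment below once the value has been cleared. It assumes that nerc.properties uses the standard Java key=value properties syntax and that every other setting in the file is left unchanged. The comment lines are illustrative and not part of the shipped file.

```properties
# <KIM_HOME>config/nerc.properties
# Assumption: an empty value after "=" is what "leave the parameter empty" means.
com.ontotext.kim.KIMConstants.IE_APP=
```

To turn semantic annotation back on, restore the value that was originally set for this parameter.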
Entities recognized by default:
Examples with context:
Nicolas Sarkozy, as in "He is a friend of President Nicolas Sarkozy"
Barclays Capital, as in "Nomura lost the US division to Barclays Capital ..."
Capitol Hill, as in "... the executives asked to attend a Capitol Hill summit ..."
$850 billion, as in "Obama looking at $850 billion jolt to the economy ..."
Adobe Systems Incorporated acquires Omniture in 9/2009, as in "Adobe Systems Inc ( ADBE.O ) announced a $1.8 billion (1.0 billion pounds) deal to buy business software maker Omniture Inc ( OMTR.O )..."
* By accuracy, we mean the F1-score, the standard measure for evaluating recognition performance.
** The F1-score for locations has been achieved by using a dictionary of known locations. This is valid because location names do not depend on the topic of the analyzed documents. Nevertheless, the extraction of previously unknown named entities is supported and necessary even for locations, as in the example above.
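For readers unfamiliar with the measure named in the footnote, the following is a minimal sketch of how an F1-score is computed from counts of correct, spurious, and missed annotations. The function name and the numbers in the usage line are hypothetical and do not reflect KIM's actual evaluation results or tooling.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 when undefined)."""
    predicted = true_positives + false_positives   # everything the system annotated
    actual = true_positives + false_negatives      # everything it should have annotated
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted         # share of annotations that are correct
    recall = true_positives / actual               # share of expected entities that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct, 20 spurious, 20 missed annotations.
print(round(f1_score(80, 20, 20), 3))
```

Because it is a harmonic mean, the score stays low unless both precision and recall are high, which is why it is commonly preferred over plain accuracy for entity recognition.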
Relations recognized by default:
Examples with context:
Job position, held by a person at a company
as in "Tim Geithner, the current president of the Federal Reserve Bank of New York who is expected to be appointed President-elect Barack Obama's Treasury secretary today ..."
Location of an organization
Document features recognized by default:
Depending on the domain of the document set, KIM also recognizes mentions that are considered document features, or metadata.
As the default document corpus consists of international news, KIM recognizes the following document features:
key phrases (words or phrases considered characteristic of the selected document)
key entities (named entities that are statistically rare in the document set, and therefore considered especially specific for the selected document)
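The idea of "statistically rare in the document set" can be illustrated with a small sketch. The source does not describe how KIM actually scores key entities, so the code below swaps in a plain smoothed inverse document frequency as an assumption. The function name, parameters, and sample data are all hypothetical.

```python
import math
from collections import Counter


def key_entities(doc_entities, corpus_entities, top_n=3):
    """Rank one document's entities by how rare they are across the corpus."""
    n_docs = len(corpus_entities)
    # Count in how many documents each entity appears at least once.
    doc_freq = Counter()
    for entities in corpus_entities:
        doc_freq.update(set(entities))

    def idf(entity):
        # Smoothed inverse document frequency: rarer entities score higher.
        return math.log((1 + n_docs) / (1 + doc_freq[entity])) + 1.0

    # Sort by descending rarity; break ties alphabetically for stable output.
    ranked = sorted(set(doc_entities), key=lambda e: (-idf(e), e))
    return ranked[:top_n]


# Hypothetical document set: each document is reduced to its set of entities.
corpus = [
    {"Barack Obama", "Omniture"},
    {"Barack Obama", "Capitol Hill"},
    {"Barack Obama"},
]
print(key_entities(corpus[0], corpus, top_n=1))
```

In this toy corpus an entity that appears in every document ranks below one that appears in a single document, which matches the intuition behind key entities even if the production scoring differs.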
KIM will work best if the populated documents have the following metadata: