One of the major improvements in KIM is that some of its components can be used on a standalone basis. The CORE Java component is one of them: it can be used outside the KIM Server. With the CORE capabilities you can answer queries such as "Who are the 10 people most frequently mentioned in a document set who also appear together with the entity 'Federal Reserve'?". The same kind of query works for any other named entity, or for a set of entities.
More complex queries, such as "Which are the 20 locations mentioned together with companies from the software industry?", are also supported.
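To make the idea of such a query concrete, here is a minimal in-memory sketch in Java. It is not the RdfCore API and it does not use RDF. The class `CoOccurrenceSketch`, the method `topCoOccurring`, and the sample entities are hypothetical. Ranking by the number of documents in which two entities appear together is an assumption about what "most frequently mentioned" means here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration of a co-occurrence query. Hypothetical; NOT the RdfCore API. */
final class CoOccurrenceSketch {

    /**
     * Returns up to {@code limit} entities of type {@code wantedType} that are
     * mentioned in the same document as {@code anchor}, ordered by the number
     * of such documents (highest first, ties broken alphabetically).
     *
     * @param documents   one set of mentioned entity names per document
     * @param entityTypes entity name to type, e.g. "Alice Adams" to "Person"
     */
    static List<Map.Entry<String, Integer>> topCoOccurring(
            List<Set<String>> documents,
            Map<String, String> entityTypes,
            String anchor,
            String wantedType,
            int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> mentions : documents) {
            if (!mentions.contains(anchor)) {
                continue; // only documents that mention the anchor entity count
            }
            for (String entity : mentions) {
                if (!entity.equals(anchor) && wantedType.equals(entityTypes.get(entity))) {
                    counts.merge(entity, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, String> types = Map.of(
                "Federal Reserve", "Organization",
                "Alice Adams", "Person",
                "Bob Brown", "Person",
                "New York", "Location");
        List<Set<String>> docs = List.of(
                Set.of("Federal Reserve", "Alice Adams", "New York"),
                Set.of("Federal Reserve", "Alice Adams", "Bob Brown"),
                Set.of("Bob Brown", "New York"));
        System.out.println(topCoOccurring(docs, types, "Federal Reserve", "Person", 10));
    }
}
```

The sketch scans every document for each query, which would not scale to a large collection. A precomputed index avoids that rescanning.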
In KIM, the CORE capabilities are provided by keeping an index of the entities that are mentioned in the documents. The index is stored as RDF triples in an RDF database.
To use this functionality, we recommend setting up a KIM server and loading the documents into it. If that is not an option, you can embed the so-called RdfCore module directly in your existing GATE-based application.
Once RdfCore is set up, you can feed it GATE documents, as many times as needed:
... then run queries like this ...
... or write SPARQL queries against the index by hand, if you are familiar with the language and your application does not need to generate such queries on the fly from user input. You can search for documents as well as for entities. There are more query examples in the KIM Developer's guide.
The most basic way to set up RdfCore is the following:
The complexity comes from how extensively RdfCore can be customized. The table below gives a short overview of all components.
|Component||Function||Default implementation||Possible customization|
|SemanticRepositoryAPI||Abstraction over a triple store.||SemanticRepositoryAPIImpl works with a (Big)OWLIM RDF database, behind a Sesame 2 interface.||With minor tweaks, triple stores other than OWLIM can be supported.|
|DocumentRepositoryAPI||Abstraction over document storage and/or additional indexing.||None: document storage and additional indexing are disabled by default.||In KIM, documents are stored as compressed GATE documents and are indexed in Mimir, using new IndexStorePair(new FileStore(...), new FederatedIndexController(...)). Storage into GATE document stores can also be implemented.|
|EntityAPI||Storage of named entity molecules in a triple store.||KIMEntityAPIImpl - straightforward implementation, respecting the vocabulary modeling schema.||Not needed.|
|LabelsModel||Limited abstraction over a vocabulary modeling schema. Defines how the canonical and alternative labels of a named entity are modelled in RDF.||DirectLabelsModel: the canonical label is a literal connected via the protons:mainLabel predicate; alternative labels are rdfs:label values.||Support for SKOS labelling (skos:prefLabel, skos:altLabel) can easily be implemented.|
|RdfCoreDefinition||Defines what triples are stored when a document is indexed and what RDF queries are generated, based on input CORE queries.||AgrDocumentCoreDef - defines co-occurrence without regard for in-document context; takes into consideration SKOS vocabulary relations, like skos:broader and skos:exactMatch; enables ranking.||Can be customized to consider co-occurrence only in the same paragraph or sentence. Can be tweaked to generate more efficient or more descriptive indexes.|
|RdfCore||Orchestrates all components. Controls synchronization between indexing and querying. Responsible for ranking.||RdfCoreRanked||Modify synchronization behavior. The base RdfCore implementation disables ranking but performs better.|
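
The difference between the two labelling schemes named in the LabelsModel row can be illustrated with a small Java sketch. It is not the actual LabelsModel interface, whose methods are not shown here. The names `LabelSchemeSketch`, `LabelScheme`, and `labelTriples`, and the `ex:` prefix, are hypothetical. Only the four predicates are taken from the table. The sketch emits Turtle-style statements with prefixed names and assumes the prefixes are declared elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of two label schemes; NOT the real LabelsModel API. */
final class LabelSchemeSketch {

    /** Which predicates carry the canonical and the alternative labels. */
    enum LabelScheme {
        DIRECT("protons:mainLabel", "rdfs:label"), // predicates named for DirectLabelsModel
        SKOS("skos:prefLabel", "skos:altLabel");   // predicates named for SKOS labelling

        final String canonicalPredicate;
        final String alternativePredicate;

        LabelScheme(String canonicalPredicate, String alternativePredicate) {
            this.canonicalPredicate = canonicalPredicate;
            this.alternativePredicate = alternativePredicate;
        }
    }

    /** Quotes a string as a plain literal, escaping backslashes and double quotes. */
    static String literal(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds one statement for the canonical label and one per alternative label. */
    static List<String> labelTriples(
            String entity, String canonical, List<String> alternatives, LabelScheme scheme) {
        List<String> triples = new ArrayList<>();
        triples.add(entity + " " + scheme.canonicalPredicate + " " + literal(canonical) + " .");
        for (String alternative : alternatives) {
            triples.add(entity + " " + scheme.alternativePredicate + " " + literal(alternative) + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        // "ex:FederalReserve" is a made-up identifier used only for this example.
        List<String> alternatives = List.of("The Fed");
        for (LabelScheme scheme : LabelScheme.values()) {
            labelTriples("ex:FederalReserve", "Federal Reserve", alternatives, scheme)
                    .forEach(System.out::println);
        }
    }
}
```

Keeping the predicate choice in one place means that switching vocabularies changes only the scheme. The code that stores or queries labels stays the same.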