To run these examples, make sure that the default corpus is populated as described in the Installation guide.
Searching for entities offers more possibilities than matching a part of the entity name. Document results can be limited only by the keywords they contain and by whether they fall within a particular time interval. Entity search, in contrast, lets you apply some semantics. As the examples show, you can find all entities of a specific class, or choose the most semantically relevant entities in a group of documents.
When you search for entities using the same CoreDbQuery as the one used for document search, the entity search takes all document search parameters into account. These parameters limit the document set, and only the entities that match the requirements within that set are returned.
CoreDbQuery inherits the restriction fields of DocumentQuery, which limit the document set:
- CooccurringEntities - sets some entities that are required to appear in the document
- KeywordExpression - a boolean full-text search expression
- TimeInterval - sets the time interval the document must fit into
If you don't need to restrict the document set, you should leave those fields empty.
The following fields do not limit the document set and matter only for the entity search:
- ClassURI - sets the class of the searched entities
- AliasKeywordAndCompareStyle - sets the alias filter for the entities
- MaxResultLength - sets how many entities to return (if there are more entities than MaxResultLength, only the most popular ones are returned)
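The split between the two groups of fields can be pictured as a two-stage filter. The sketch below is a hypothetical in-memory model, not the CORE API: the class EntitySearchSketch, its Doc type and the topEntities method are invented for illustration, the keyword test is a plain substring match rather than a boolean expression, and the alias filter is left out. It only shows the idea that the document restrictions are applied first and the entity-only parameters afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model; none of these types belong to the CORE API. */
final class EntitySearchSketch {

    /** A document with its text, a timestamp and the URIs of the entities it mentions. */
    static final class Doc {
        final String text;
        final long timestamp;
        final Set<String> entityUris;

        Doc(String text, long timestamp, String... entityUris) {
            this.text = text;
            this.timestamp = timestamp;
            this.entityUris = new HashSet<>(Arrays.asList(entityUris));
        }
    }

    /**
     * Stage 1 restricts the document set (keyword, time interval).
     * Stage 2 counts the entities of one class in that set and keeps the most popular ones.
     * A null keyword or a null class URI means "no restriction".
     */
    static List<String> topEntities(List<Doc> docs, Map<String, String> classOfEntity,
                                    String keyword, long from, long to,
                                    String classUri, int maxResultLength) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc doc : docs) {
            boolean keywordOk = keyword == null || doc.text.contains(keyword);
            boolean timeOk = doc.timestamp >= from && doc.timestamp <= to;
            if (!keywordOk || !timeOk) {
                continue; // the document is outside the restricted set
            }
            for (String uri : doc.entityUris) {
                if (classUri == null || classUri.equals(classOfEntity.get(uri))) {
                    counts.merge(uri, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Most popular first; ties are broken by URI so the order is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(maxResultLength, ranked.size())));
    }
}
```

In this model, leaving the keyword empty (null) and choosing a wide interval corresponds to not restricting the document set at all, while classUri and maxResultLength only affect which entities are reported.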
The following example is more complex than the one before. Here we add co-occurring entities as a restriction for the document set, so the document set gets recalculated.
Let's say that in the previous step some entities have been found and the document set has been limited by keyword and time. Now these can be passed as parameters to the CoreDbQuery and CoreAPI.getEntities(). The final result will show you all Organizations that occur in those docIds documents which fit in the given time interval and contain all of the cooccurringEntityURIs entities.
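The selection logic of this step can be sketched in plain Java. Everything in the block is an assumption made for illustration: CooccurrenceSketch, Doc and organizationsIn are hypothetical names, not CORE classes or methods, and the sketch reads the restriction as "a document must contain every co-occurring entity". An entity of the requested class is reported even if it is itself one of the co-occurring entities.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model; it does not call the CORE API. */
final class CooccurrenceSketch {

    /** A document with an id, a timestamp and the URIs of the entities it mentions. */
    static final class Doc {
        final String id;
        final long timestamp;
        final Set<String> entityUris;

        Doc(String id, long timestamp, String... entityUris) {
            this.id = id;
            this.timestamp = timestamp;
            this.entityUris = new HashSet<>(Arrays.asList(entityUris));
        }
    }

    /**
     * Returns the entities of the given class found in documents that
     * (1) are listed in docIds, (2) fall inside [from, to], and
     * (3) mention every URI in cooccurringEntityUris.
     */
    static Set<String> entitiesOfClass(List<Doc> docs, Set<String> docIds,
                                       long from, long to,
                                       Set<String> cooccurringEntityUris,
                                       Map<String, String> classOfEntity,
                                       String classUri) {
        Set<String> result = new TreeSet<>(); // sorted for stable output
        for (Doc doc : docs) {
            boolean selected = docIds.contains(doc.id)
                    && doc.timestamp >= from && doc.timestamp <= to
                    && doc.entityUris.containsAll(cooccurringEntityUris);
            if (!selected) {
                continue; // the recalculated document set excludes this document
            }
            for (String uri : doc.entityUris) {
                if (classUri.equals(classOfEntity.get(uri))) {
                    result.add(uri);
                }
            }
        }
        return result;
    }
}
```

Adding a URI to cooccurringEntityUris can only shrink the selected document set in this model, which is why the set of returned entities has to be computed again after the restriction changes.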
The CORE module can calculate different kinds of statistics and trends thanks to the extra metadata it stores. Each document has a timestamp that binds it to a concrete moment in time. This lets you follow the occurrence of an entity in documents over a period of time.
Generally, three types of timelines can be calculated:
- Documents Timelines: This option gives you just the document distribution over time. You can still restrict the documents on which the calculation is performed by co-occurring entities and keywords.
- Selected Entities Timelines: This option calculates trends for specific entities over a time period. You can still restrict the documents on which the calculation is performed by co-occurring entities and keywords.
- Most Popular Entities Timelines: This option calculates trends for the most popular entities of a given class over a time period. You can still restrict the documents on which the calculation is performed by co-occurring entities and keywords.
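To make the idea of a timeline concrete, here is a minimal sketch that counts documents per day. It is a hypothetical in-memory model: TimelineSketch, Doc, documentsTimeline and entityTimeline are illustrative names and not part of the CORE API, and daily buckets are an assumption (the actual granularity is not specified here).

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory model of timeline calculation; it does not call the CORE API. */
final class TimelineSketch {

    /** A document bound to a date, with the URIs of the entities it mentions. */
    static final class Doc {
        final LocalDate date;
        final Set<String> entityUris;

        Doc(LocalDate date, String... entityUris) {
            this.date = date;
            this.entityUris = new HashSet<>(Arrays.asList(entityUris));
        }
    }

    /** Documents timeline: number of documents per day, ordered by date. */
    static Map<LocalDate, Integer> documentsTimeline(List<Doc> docs) {
        Map<LocalDate, Integer> timeline = new TreeMap<>();
        for (Doc doc : docs) {
            timeline.merge(doc.date, 1, Integer::sum);
        }
        return timeline;
    }

    /** Selected-entity timeline: number of documents per day that mention the entity. */
    static Map<LocalDate, Integer> entityTimeline(List<Doc> docs, String entityUri) {
        Map<LocalDate, Integer> timeline = new TreeMap<>();
        for (Doc doc : docs) {
            if (doc.entityUris.contains(entityUri)) {
                timeline.merge(doc.date, 1, Integer::sum);
            }
        }
        return timeline;
    }
}
```

Days without matching documents are simply absent from the returned map. In this model, a timeline for the most popular entities of a class could be built by first ranking the entities by document count and then calling entityTimeline for each of the top ones.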