(!) In order to run these examples, make sure that the default corpus is populated as described in the [Installation guide].

{code:java}// ----------------------------------------------------------------------------------
// connect to the KIMService (deployed on a specific host and port)
KIMService serviceKim = GetService.from();
System.out.println("KIM Server connected.");
// obtain the CoreAPI and TimelinesAPI components
CoreAPI core = serviceKim.getCoreDbAPI();
TimelinesAPI apiTimelines = core.getTimelinesAPI();
System.out.println("CoreAPI and TimelinesAPI obtained successfully.");
// ----------------------------------------------------------------------------------
{code}

h2. Searching for entities and documents using {{CoreDbQuery}}

Searching for entities offers more possibilities than searching for part of an entity name. Document results can only be restricted by the keywords they contain and by whether they fit into a particular time interval, but an entity search lets you apply some semantics: as the examples below show, you can find all entities of a specific class, or choose the most semantically relevant entities within a group of documents.

(/) When you search for entities using the same {{CoreDbQuery}} instance as the one used for document search, the entity search takes all document search parameters into account. They limit the document set, and only the entities matching the requirements within that set are returned.

{{CoreDbQuery}} inherits the restriction fields of {{DocumentQuery}}, which limit the document set:
* {{CooccurringEntities}} - sets entities that are required to appear in the documents
* {{KeywordExpression}} - a boolean full-text search expression
* {{TimeInterval}} - sets the time interval the documents must fit into

If you don't need to restrict the document set, you should leave those fields empty.

The following fields have no impact on the document set limitation and are important only for the entity search:
* {{ClassURI}} - sets the class of the searched entities
* {{AliasKeywordAndCompareStyle}} - sets the alias filter for the entities
* {{MaxResultLength}} - sets how many entities to return (if more entities match than {{MaxResultLength}}, only the most popular ones are returned)
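
The "most popular" cut-off can be pictured as a simple frequency ranking. The sketch below is plain Java that only illustrates the idea (it is not KIM's internal implementation; the entity names and counts are made up):

{code:java}import java.util.*;

public class TopEntities {
    // return the top-n entity URIs by occurrence count, most frequent first
    static List<String> top(final Map<String, Integer> counts, int n) {
        List<String> uris = new ArrayList<String>(counts.keySet());
        Collections.sort(uris, new Comparator<String>() {
            public int compare(String a, String b) {
                return counts.get(b) - counts.get(a);
            }
        });
        return uris.subList(0, Math.min(n, uris.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("wkb#Person_A", 12);
        counts.put("wkb#Person_B", 40);
        counts.put("wkb#Person_C", 7);
        // with a result limit of 2, only the two most frequent entities survive
        System.out.println(top(counts, 2)); // [wkb#Person_B, wkb#Person_A]
    }
}
{code}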

{code:java}// ----

// create a new CoreDbQuery
CoreDbQuery query = new CoreDbQuery();

// set the entities to be searched for to be Persons
query.setClassURI("http://proton.semanticweb.org/2006/05/protont#Person");

// which alias has a word that starts with "bush"
query.setAliasKeywordAndCompareStyle("bush", CompareStyleConstants.COMPARE_STYLE_STARTS_WITH);

// in documents containing words starting with "oil"
query.setKeywordRestriction("oil*");

// and return the first 5 results
query.setMaxResultLength(5);

// ----

Set ret = new HashSet();
ResultSetIterator rsIt = core.getEntities(query);
System.out.println("===========================================");
System.out.println("CoreSearch Entity Result");
System.out.println("===========================================");
while (rsIt.hasNext()) {
    CoreEntity coreEnt = rsIt.next();
    ret.add(coreEnt.getEntityUri());
    System.out.println("entity : " + coreEnt);
}
System.out.println("===========================================");
// ----

ret = new HashSet();
DocumentQueryResult resDocs = core.getDocumentIds(query);
System.out.println("===========================================");
System.out.println("CoreSearch Document Result");
System.out.println("===========================================");
for (DocumentQueryResultRow row : resDocs) {
    String docId = String.valueOf(row.getDocumentId());
    ret.add(docId);
    System.out.println("doc : " + docId);
}
System.out.println("===========================================");
{code}

h2. Limited document set with co-occurring entities

The following example is more complex than the previous one. Here we add co-occurring entities as a restriction for the document set, so the document set gets recalculated.

Let's say that in the previous step some entities were found and the document set was limited by keyword/time restrictions. These can now be passed as parameters to the {{CoreDbQuery}} and {{CoreAPI.getEntities()}}. The final result shows all Organizations occurring in the selected documents that fit the given time interval and co-occur with the {{cooccurringEntityURIs}} entities.
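
The {{cooccurringEntityURIs}} parameter used below is just a set of entity URI strings, typically collected from a previous entity search. A minimal stand-alone sketch of building it (the URI is an example from the default corpus's wkb namespace):

{code:java}import java.util.HashSet;
import java.util.Set;

public class CooccurringSet {
    public static void main(String[] args) {
        // collect entity URIs (e.g. gathered from a previous entity search) into a set
        Set<String> cooccurringEntityURIs = new HashSet<String>();
        cooccurringEntityURIs.add("http://www.ontotext.com/kim/2006/05/wkb#Continent_T.4");
        System.out.println(cooccurringEntityURIs.size() + " co-occurring entity URI(s)");
    }
}
{code}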

{code:java}Calendar cal = Calendar.getInstance();
// ----
// create a new CoreDbQuery
CoreDbQuery query = new CoreDbQuery();
// set the entities to be searched for to be Organizations
query.setClassURI("http://proton.semanticweb.org/2006/05/protont#Organization");
// that co-occur with the given ones (cooccurringEntityURIs is a Set of entity URI strings)
query.setCooccurringEntities(cooccurringEntityURIs);
// in the selected time period
cal.set(2001, 0, 1);
query.setTimeIntervalStartDate(cal.getTime());
cal.set(2009, 11, 31);
query.setTimeIntervalEndDate(cal.getTime());
// ----

ResultSetIterator rsIt = core.getEntities(query);
System.out.println("===========================================");
System.out.println("CoreSearch Entity Result");
System.out.println("===========================================");
while (rsIt.hasNext()) {
    CoreEntity coreEnt = rsIt.next();
    System.out.println("entity : " + coreEnt);
}
System.out.println("===========================================");
{code}
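
Note that {{java.util.Calendar}} months are zero-based, so {{cal.set(2001, 0, 1)}} above means January 1, 2001 and {{cal.set(2009, 11, 31)}} means December 31, 2009. A quick stand-alone check:

{code:java}import java.text.SimpleDateFormat;
import java.util.Calendar;

public class CalendarMonths {
    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        Calendar cal = Calendar.getInstance();
        cal.set(2001, 0, 1);   // month 0 = January
        System.out.println(fmt.format(cal.getTime())); // 2001-01-01
        cal.set(2009, 11, 31); // month 11 = December
        System.out.println(fmt.format(cal.getTime())); // 2009-12-31
    }
}
{code}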

h2. Requesting timelines for documents and entities

Thanks to the extra metadata it stores, the CORE module can calculate different kinds of statistics and trends. Each document is bound to a concrete moment in time via its timestamp, so you can follow the occurrence of an entity in documents over a period of time.
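
To illustrate the underlying idea (this is a plain-Java sketch with made-up timestamps, not KIM's actual implementation), a timeline amounts to bucketing document timestamps by a time unit and counting the documents per bucket:

{code:java}import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Map;
import java.util.TreeMap;

public class TimelineSketch {
    public static void main(String[] args) {
        // hypothetical document timestamps
        Date[] docDates = {
            new GregorianCalendar(2001, 0, 5).getTime(),   // Jan 2001
            new GregorianCalendar(2001, 0, 20).getTime(),  // Jan 2001
            new GregorianCalendar(2001, 2, 3).getTime()    // Mar 2001
        };
        // bucket by month granularity and count the documents per time point
        SimpleDateFormat month = new SimpleDateFormat("yyyy-MM");
        Map<String, Integer> timeline = new TreeMap<String, Integer>();
        for (Date d : docDates) {
            String point = month.format(d);
            Integer c = timeline.get(point);
            timeline.put(point, c == null ? 1 : c + 1);
        }
        System.out.println(timeline); // {2001-01=2, 2001-03=1}
    }
}
{code}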

Generally three types of timelines can be calculated:

* *Documents Timelines* : This option gives you just the document distribution over time. You can still restrict the documents on which the calculation is performed by co-occurring entities and keywords.

{code:java}Calendar cal = Calendar.getInstance();
// create a new TimelineQuery
TimelineQuery query = new TimelineQuery();
// documents in selected time period
cal.set(2001, 0, 1);
query.setTimeIntervalStartDate(cal.getTime());
cal.set(2009, 11, 31);
query.setTimeIntervalEndDate(cal.getTime());

// by given time units
query.setTimeGranularity(TimelineQuery.GRANULARITY_MONTH);
// optionally keywords and cooccurring entities can be set
// to restrict the calculation document set further e.g.:
query.setKeywordRestriction("important");
// query.setCooccurringEntities(coreEntURIs);

DocumentTimeLine res = apiTimelines.getDocsTimeLine(query);
System.out.println("===========================================");
System.out.println("Document Timelines");
System.out.println("===========================================");
Date prevPoint = null;
for (Date timePoint : res.getTimePoints()) {
    System.out.println(timePoint + " : [" + res.getValue(timePoint) + " docs]");
    if (prevPoint != null) {
        // the two assertions below require a JUnit dependency;
        // uncomment them to check that the result is consistent
        // assertEquals(1, timePoint.compareTo(prevPoint));
        // assertTrue("0 at " + timePoint, res.getValue(timePoint) > 0);
    }
    prevPoint = timePoint;
}
System.out.println("===========================================");
{code}

* *Selected Entities Timelines* : This option calculates trends for specific entities over a time period. You can still restrict the documents on which the calculation is performed by co-occurring entities and keywords.

{code:java}Set<String> selectedEntityURIs = Collections.singleton("http://www.ontotext.com/kim/2006/05/wkb#Continent_T.4");

Calendar cal = new GregorianCalendar();
// create a new TimelineQuery
TimelineQuery query = new TimelineQuery();
cal.set(2001, 0, 1);
query.setTimeIntervalStartDate(cal.getTime());
cal.set(2009, 11, 31);
query.setTimeIntervalEndDate(cal.getTime());
// by given time units
query.setTimeGranularity(TimelineQuery.GRANULARITY_DAY);
query.setTimeLinesEntities(selectedEntityURIs);

EntityTimeLine res = apiTimelines.getTimeLine(query);
Collection<Date> timePoints = res.getTimePoints();

// the assertion below requires a JUnit dependency;
// uncomment it to check that the result is consistent
// assertTrue("timePoints.size() = " + timePoints.size(), timePoints.size() > 5);
Iterator<String> entSeries = query.getTimeLinesEntities().iterator();
System.out.println("===========================================");
System.out.println("Entity Timelines");
System.out.println("===========================================");
while (entSeries.hasNext()) {
    String coreEnt = entSeries.next();
    System.out.println(coreEnt);
    Integer[] series = res.getSeriesForEntity(new URIImpl(coreEnt));

    // the two assertions below require a JUnit dependency;
    // uncomment them to check that the result is consistent
    // assertTrue("0 series for " + coreEnt, sumOf(series) > 0);
    // assertEquals(timePoints.size(), series.length);
    Iterator<Date> pointIterator = timePoints.iterator();
    for (int i = 0; i < series.length; i++) {
        Date timePoint = pointIterator.next();
        System.out.println(timePoint + " : [" + res.getValue(timePoint, new URIImpl(coreEnt)) + " hits]");
    }
}
System.out.println("===========================================");
{code}

* *Most Popular Entities Timelines* : This option calculates trends for the most popular entities of a given class over a time period. You can still restrict the documents on which the calculation is performed by co-occurring entities and keywords.

{code:java}Set<String> selectedEntityURIs = Collections.singleton("http://www.ontotext.com/kim/2006/05/wkb#Continent_T.4");

Calendar cal = new GregorianCalendar();
// create a new TimelineQuery
TimelineQuery query = new TimelineQuery();
cal.set(2001, 0, 1);
query.setTimeIntervalStartDate(cal.getTime());
cal.set(2009, 11, 31);
query.setTimeIntervalEndDate(cal.getTime());
// by given time units
query.setTimeGranularity(TimelineQuery.GRANULARITY_DAY);
query.setTimeLinesEntities(selectedEntityURIs);

EntityTimeLine normalizedRes = apiTimelines.getTimeLineOnDocumentOccurrence(query);
Iterator<Date> normalizedTimePoints = normalizedRes.getTimePoints().iterator();
Iterator<String> normEntSeries = query.getTimeLinesEntities().iterator();
System.out.println("===========================================");
System.out.println("Normalized Entity Timelines");
System.out.println("===========================================");
while (normEntSeries.hasNext()) {
    String coreEnt = normEntSeries.next();
    System.out.println(coreEnt);
    Integer[] series = normalizedRes.getSeriesForEntity(new URIImpl(coreEnt));
    for (int i = 0; i < series.length; i++) {
        Date timePoint = normalizedTimePoints.next();
        System.out.println(timePoint + " : [" + normalizedRes.getValue(timePoint, new URIImpl(coreEnt)) + " hits]");
    }
    // reset the time point iterator for the next entity's series
    normalizedTimePoints = normalizedRes.getTimePoints().iterator();
}
System.out.println("===========================================");
{code}