This section describes how to annotate documents with CES (Concept Extraction Service).
Annotating a document means attaching metadata to words or phrases in unstructured text.
A mention is a slice of text with attached metadata features. Mentions always have:
A mention is usually (but not necessarily) associated with a concept. A concept is a real-world entity that has been recognized; a mention is a reference to that concept in the text.
For example, annotating the text "Hello London" will yield a mention similar to the one below. This single mention carries offsets into the original text and is associated with the concept http://dbpedia.org/resource/London
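To make the role of offsets concrete, here is a minimal Python sketch. It is not the service's actual output: the key names `startOffset`, `endOffset` and `concept`, and the convention that the end offset is exclusive, are assumptions made for illustration only.

```python
text = "Hello London"

# Hypothetical mention. The key names and the exclusive end offset
# are assumptions, not the documented CES output format.
mention = {
    "startOffset": 6,
    "endOffset": 12,
    "concept": "http://dbpedia.org/resource/London",
}

# The offsets identify the slice of the original text that the mention covers.
covered = text[mention["startOffset"]:mention["endOffset"]]
print(covered)  # London
```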
All URLs in this document are of the form http://worker-base/endpoint, where http://worker-base is the host:port/context of a deployed CES worker and endpoint is the specific worker call. For example, if your worker is deployed at http://192.168.0.1/extractor-web and this guide mentions http://worker-base/extract, then the URL to query is http://192.168.0.1/extractor-web/extract
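The substitution described above can be sketched in Python as follows. The helper name `endpoint_url` is hypothetical and not part of CES; it only joins the worker base and the endpoint with a single slash.

```python
def endpoint_url(worker_base: str, endpoint: str) -> str:
    """Join a worker base URL and an endpoint name with exactly one slash."""
    return worker_base.rstrip("/") + "/" + endpoint.lstrip("/")


# Worker base taken from the example in this guide.
url = endpoint_url("http://192.168.0.1/extractor-web", "extract")
print(url)  # http://192.168.0.1/extractor-web/extract
```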
Annotation requests go to http://worker-base/extract. There are two ways to invoke annotation:
It is also advisable to specify an Accept header with the desired output MIME type. The default will usually be application/vnd.ontotext.ces+json; see output formats for more.
If the Accept header is not specified, the simple mentions JSON format is returned (application/vnd.ontotext.ces+json).
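As a rough sketch, a request with an explicit Accept header could be prepared in Python like this. The use of HTTP POST with the document text as a plain-text body is an assumption, since the invocation methods are not spelled out here, and `build_extract_request` is a hypothetical helper. The request is only constructed, not sent.

```python
import urllib.request

CES_JSON = "application/vnd.ontotext.ces+json"


def build_extract_request(worker_base: str, text: str,
                          accept: str = CES_JSON) -> urllib.request.Request:
    """Prepare (but do not send) an annotation request for the /extract endpoint."""
    return urllib.request.Request(
        url=worker_base.rstrip("/") + "/extract",
        data=text.encode("utf-8"),  # assumption: raw document text as the body
        headers={
            "Accept": accept,  # desired output MIME type
            "Content-Type": "text/plain; charset=utf-8",  # assumption
        },
        method="POST",  # assumption
    )


request = build_extract_request("http://192.168.0.1/extractor-web", "Hello London")
# To actually send it: urllib.request.urlopen(request)
```

Passing a different `accept` value requests another output format, provided the worker supports it.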
Mention features can vary widely depending on the subsystem that generated the mention. Most mentions, however, will have
Other returned features may include confidence (how sure the annotator feels about this mention), ambiguityRank, etc.
Other features are database and type dependent. For example, locations such as London can have a featClass, featCode, countryCode, etc., giving more information about the concept.
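Because the set of features differs from mention to mention, client code should not assume that any optional feature is present. The Python sketch below reads features defensively. The feature map is hypothetical: the keys are the feature names mentioned above, but the values and the flat dictionary layout are illustrative assumptions.

```python
# Hypothetical feature map for a location mention; values are illustrative only.
features = {
    "confidence": 0.9,
    "featClass": "P",
    "countryCode": "GB",
}

# dict.get returns a fallback instead of raising KeyError when a feature is absent.
country = features.get("countryCode", "unknown")
feat_code = features.get("featCode", "unknown")  # not present in this mention

print(country, feat_code)  # GB unknown
```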