Concept Extraction Plug-in (CES)

compared with
Version 5 by reneta.popova
on Sep 16, 2014 16:21.

See [http://en.wikipedia.org/wiki/Base64#URL_applications] \\
and [http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html]\\ \\
\\ \\
\\
{note} |
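The two links above describe the URL-safe Base64 variant, which replaces {{+}} and {{/}} with {{-}} and {{_}} so that the encoded text can be placed in a URL. The sketch below illustrates only that encoding, using the Python standard library. The function names and the sample bytes are hypothetical and are not part of the plug-in.

```python
import base64


def encode_url_safe(raw: bytes) -> str:
    """Encode bytes with the URL-safe Base64 alphabet ('-' and '_')."""
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_url_safe(text: str) -> bytes:
    """Decode text that was produced with the URL-safe Base64 alphabet."""
    return base64.urlsafe_b64decode(text.encode("ascii"))


if __name__ == "__main__":
    sample = b"\xfb\xff"  # hypothetical bytes that would yield '+' and '/' in standard Base64
    print(base64.b64encode(sample).decode("ascii"))  # standard alphabet
    print(encode_url_safe(sample))                   # URL-safe alphabet
```

The Apache Commons Codec class linked above provides the equivalent functionality for Java code.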
h2. Re-training

Not defined yet. Re-training depends on the pipeline, specifically on whether it contains re-trainable machine learning components.


h3. Start

* First-time initialisation *DOES NOT* include gazetteer cache loading - see [Reload dictionary|#Reloaddictionary].
* On subsequent starts, it loads the cache from the file system.

To start *ALL* registered concept extraction pipelines, use the following query:


{code}

To start a specific pipeline, include its named graph in the query. See an example with the default pipeline:

{code:lang=xml}
h3. Stop

To stop *ALL* concept extraction pipelines use:


{code:language=html/xml}
INSERT DATA {
{code}

To stop a specific pipeline, use the named graph of the pipeline:
{code:lang=xml}
INSERT DATA {
h3. Reload dictionary

Reload dictionary cleans the gazetteer cache from the file system and loads it again from the repository.

{note}In case the concept extraction service is not started, this SPARQL update operation will *NOT* schedule a dictionary reload (unlike before).{note}
The following query initiates a dictionary reload on all running pipelines. To target a particular pipeline, use a named graph as shown in the Start/Stop sections of this page.

{code:language=html/xml}
{code}

h3. Add/remove gazetteer configuration

Add/remove gazetteer configuration registers template queries for the different entity types via INSERT/DELETE DATA.
* <[http://www.ontotext.com/owlim/ces#gazetteerConfig]> is a special (interpretable) predicate that denotes a gazetteer template query entry;
* Each gazetteer configuration should be added in a separate named graph (per domain), e.g. the default pipeline uses <[http://www.ontotext.com/owlim/ces#default]>;
* The template queries are also executed for all sub-classes of the defined class;
* The configuration is stored as regular triples in the repository and is loaded on concept extraction initialisation.

Example configuration that shows how to load all rdfs:labels of all Agents, Locations and EconomicConcepts into the gazetteer dictionary:
{code:language=html/xml}
INSERT DATA { GRAPH <http://www.ontotext.com/owlim/ces#default> {
{code}

{info}Adding/removing gazetteer configuration does not take full effect immediately. For example, after a new template query is added, the CES plug-in starts to listen for entities of the corresponding type. However, it does not load already existing entities of that type. To load them, trigger a dictionary reload.{info}

h2. FAQ
h3. How to deploy a pipeline?

Just unpack your pipeline package into _$\{info.aduna.platform.appdata.basedir\}/repositories/$\{repository.name\}/storage/ces/pipelines/_ and it will be discovered automatically. You can confirm that it has been discovered by finding the MBean called _PipelineManager_ and checking its _AvailablePipelines_ property, which lists the URIs of all deployed pipelines. Note that a deployed pipeline is not active until you start it.
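The deployment directory above is built from two placeholders, the application data base directory and the repository name. The following minimal Python sketch only shows how the path is assembled. The function name and the sample values ({{/var/appdata}}, {{my-repo}}) are hypothetical and must be replaced with the values of your own installation.

```python
from pathlib import PurePosixPath


def pipelines_dir(appdata_basedir: str, repository_name: str) -> PurePosixPath:
    """Return the directory into which pipeline packages are unpacked.

    appdata_basedir stands for ${info.aduna.platform.appdata.basedir},
    repository_name stands for ${repository.name}.
    """
    return (
        PurePosixPath(appdata_basedir)
        / "repositories"
        / repository_name
        / "storage"
        / "ces"
        / "pipelines"
    )


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(pipelines_dir("/var/appdata", "my-repo"))
```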

h3. How to preserve annotation sets?