
# Download [extractor-web.war|http://maven.ontotext.com/content/repositories/publishing-releases/com/ontotext/ces/extractor-web/1.0.1/extractor-web-1.0.1.war]
# Start your web application container.
# Deploy the WAR you just downloaded. In Tomcat, move it to the {{webapps}} sub-directory and it is picked up automatically.
# Go to [http://localhost:8080/extractor-web/apidocs] for live documentation. {note}Due to [Swagger|https://helloreverb.com/developers/swagger] limitations, the most important endpoint, {{/extract}}, cannot have live documentation. It is documented separately [here|https://confluence.ontotext.com/display/SWS/Annotate+content].{note}
h1. High-availability setup
The high-availability setup consists of several components that communicate through RESTful calls. Each component has its own role in the environment. The modules are briefly described below:
* GraphDB with EUF plugin -- the *GraphDB* module maintains a semantic database containing the RDF data used within the system. Its *EUF* plugin (EUF stands for _Entity Update Feed_) publishes a notification for every entity (concept) in the database that has been modified in any way (added, removed, edited).
* Concept Extraction API Coordinator -- the *Coordinator* module accepts annotation requests and dispatches them to a group of Concept Extraction *Workers* (see below). The Coordinator also communicates with the semantic database to track changes that require an update of each Worker's Dynamic Gazetteer.
* Concept Extraction API Worker -- a *Worker* module evaluates annotation requests. It maintains a pool of GATE pipeline instances, used for text analysis and concept extraction.
h2. Installing GraphDB and the EUF plugin
Information about installing and using the GraphDB semantic database can be found on the official [GraphDB documentation page|http://graphdb.ontotext.com/display/GraphDB6/Home].
In order to install the Entity Update Feed plugin, check the [CES Components|CES Components#GraphDBandEUFplugin] page.
(!) OPTIONAL: to activate the EUF plugin, insert a single arbitrary statement that has {{rdfs:label}} as its predicate.
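One possible way to insert such a statement is a SPARQL update sent with {{curl}}. This is only a sketch under several assumptions: the repository accepts updates at {{<repository-url>/statements}} (as in the Sesame REST protocol), {{curl}} is installed, and the subject IRI {{http://example.org/euf-activation}} is a made-up placeholder. The function names {{activation_update}} and {{activate_euf}} are hypothetical helpers, not part of GraphDB or CES.

{code:language=bash|title=activate-euf.sh (sketch)}
#!/bin/bash
# Print a SPARQL update that inserts one statement with rdfs:label as predicate.
# The subject IRI and the label text are placeholders chosen for illustration.
activation_update() {
  cat <<'EOF'
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT DATA { <http://example.org/euf-activation> rdfs:label "EUF activation" }
EOF
}

# Send the update to the repository.
# $1 = repository URL, e.g. http://sparql.endpoint.be:8080/graphdb/repositories/my-repo
# Assumption: updates are accepted at <repository URL>/statements.
activate_euf() {
  curl --fail --silent --show-error \
       -X POST "$1/statements" \
       --data-urlencode "update=$(activation_update)"
}

# Example call (commented out so that sourcing this file has no side effects):
# activate_euf "http://sparql.endpoint.be:8080/graphdb/repositories/my-repo"
{code}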
h2. Setting up a Coordinator
We will be using *Apache Tomcat* as a web application container for this example.
# Download the [Coordinator web application|http://maven.ontotext.com/content/repositories/publishing-releases/com/ontotext/ces/coordinator/1.0.1/coordinator-1.0.1.war] from our Nexus instance
# Add the coordinator-specific parameters to the Tomcat setup -- use the {{<tomcat-home>/bin/setenv.sh}} file; example:
{code:language=bash|title=coordinator setenv.sh}
#!/bin/bash
# general options -- name, storage directory location, URL (required)
export GENERAL_OPTS="-Dcoordinator.name=master -Dcoordinator.stateDirectory=/path/to/storage/dir/coordinator -Dcoordinator.baseUrl=http://the.base.url:7070/coordinator"
# sparql endpoint options -- location
export ENDPOINT_OPTS="-Dcoordinator.sparql.endpoint=http://sparql.endpoint.be:8080/graphdb/repositories/my-repo"
# VM options -- heap size, etc
export JVM_OPTS="-XX:+UseConcMarkSweepGC -XX:+TieredCompilation -Xmx1g"
export CATALINA_OPTS="$GENERAL_OPTS $ENDPOINT_OPTS $JVM_OPTS"
{code}
# Deploy the coordinator web application in Tomcat's {{webapps}} directory
# (Re-)start the Tomcat instance
(!) More information about all Coordinator configuration parameters can be found [here|CES Components#Coordinator].
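The deployment and restart steps above could be scripted roughly as follows. This is a minimal sketch, not an official installer: {{deploy_war}} and {{restart_tomcat}} are hypothetical helper names, all paths are placeholders, and it assumes a standard Tomcat layout with {{bin/startup.sh}} and {{bin/shutdown.sh}}.

{code:language=bash|title=deploy-coordinator.sh (sketch)}
#!/bin/bash
# Copy a WAR file into Tomcat's webapps directory.
# $1 = path to the downloaded WAR
# $2 = Tomcat home directory
# $3 = optional target file name (defaults to the original file name)
deploy_war() {
  local war="$1" tomcat_home="$2"
  local target="${3:-$(basename "$war")}"
  [ -f "$war" ] || { echo "WAR not found: $war" >&2; return 1; }
  [ -d "$tomcat_home/webapps" ] || { echo "No webapps directory in $tomcat_home" >&2; return 1; }
  cp "$war" "$tomcat_home/webapps/$target"
}

# Restart Tomcat with its standard scripts so that setenv.sh is read again.
# $1 = Tomcat home directory
restart_tomcat() {
  "$1/bin/shutdown.sh" || true   # ignore the error if Tomcat was not running
  "$1/bin/startup.sh"
}

# Example call (commented out so that sourcing this file has no side effects):
# deploy_war /path/to/coordinator-1.0.1.war /path/to/tomcat coordinator.war \
#   && restart_tomcat /path/to/tomcat
{code}

Tomcat derives the context path from the WAR file name, so copying the file as {{coordinator.war}} gives the {{/coordinator}} path used in the {{coordinator.baseUrl}} example.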
h2. Setting up a Worker node
We will be using *Apache Tomcat* as a web application container for this example.
# Download the [CES Worker web application|http://maven.ontotext.com/content/repositories/publishing-releases/com/ontotext/ces/extractor-web/1.0.1/extractor-web-1.0.1.war] from our Nexus instance
# Add the worker-specific parameters to the Tomcat setup -- use the {{<tomcat-home>/bin/setenv.sh}} file; example:
{code:language=bash|title=worker setenv.sh}
#!/bin/bash
# garbage collection and compiler
export J_OPTS="-XX:+UseConcMarkSweepGC -XX:+TieredCompilation -Xmx4g"
# worker options -- path to pipeline, size of pipeline pool
export W_OPTS="-Dworker.name=st-worker -Dgate.app.location=file:/path/to/pipeline/application.xgapp -Dpipeline-pool-max-size=2"
export CATALINA_OPTS="$J_OPTS $W_OPTS"
{code}
# Deploy the {{extractor-web.war}} in Tomcat's {{webapps}} directory
# (Re-)start the Tomcat instance
(!) More information about all Worker configuration parameters can be found [here|CES Components#Worker].
h2. Adding a Worker node into the Coordinator
(!) Assumptions:
* the *coordinator* instance is located at {{http://coordinator.url:7070/coordinator}}
* the *worker* instance is located at {{http://worker.url:6060/worker}}
* the *worker* instance has a pipeline pool with size 2
Using a REST client, execute the following request to the *coordinator* instance ({{http://coordinator.url:7070}}):
{code}
POST /coordinator/workers
Content-Type: application/json

[{"capacity":2, "url":"http://worker.url:6060/worker"}]
{code}
* {{capacity}} is the number of pipeline instances in the worker pool.
* {{url}} is the location of the worker instance.
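As a minimal sketch, the same request can be sent from a shell with {{curl}}, assuming it is installed. The function names {{worker_payload}} and {{register_worker}} are hypothetical helpers introduced for illustration; the URLs and the capacity are the assumed values listed above.

{code:language=bash|title=register-worker.sh (sketch)}
#!/bin/bash
# Build the JSON body for POST /coordinator/workers.
# $1 = capacity (size of the worker's pipeline pool), $2 = worker URL
worker_payload() {
  printf '[{"capacity":%d, "url":"%s"}]' "$1" "$2"
}

# Send the registration request to the coordinator.
# $1 = coordinator base URL, $2 = capacity, $3 = worker URL
register_worker() {
  curl --fail --silent --show-error \
       -X POST "$1/workers" \
       -H "Content-Type: application/json" \
       --data "$(worker_payload "$2" "$3")"
}

# Example call (commented out so that sourcing this file has no side effects):
# register_worker "http://coordinator.url:7070/coordinator" 2 "http://worker.url:6060/worker"
{code}

With {{--fail}}, {{curl}} exits with a non-zero status on an HTTP error response, which makes a rejected registration visible when the call is part of a larger script.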
(!) Instead of a REST client, you can use the Coordinator's Swagger documentation endpoint, located at {{http://coordinator.url:7070/coordinator/apidocs}}, and its *POST* or *PUT* requests.