Skip to end of metadata
Go to start of metadata


This page provides information about all the components required to build a resilient Concept Extraction API with dynamically updated Gazetteer dictionaries. In case you only need to be able to extract named entities from text with a static dictionary and don't care about high availability, you can do it with a single worker.




  • (OPTIONAL) - an optional sub-directory within worker persistence directory, points to
    by default


  • (REQUIRED) - full path to the *.xgapp file to load, including the filename and extension. It should
    start with file:/, otherwise it will be interpreted as relative to the application context
  • -Dpipeline-pool-max-size (OPTIONAL, default = 1) - the maximum number of Gate pooled applications. In other words,
    the number of simultaneous annotations this worker will support

Recommended JVM settings

  • GC: -XX:+UseConcMarkSweepGC -verbose:gc -verbose:sizes -Xloggc:/path/to/logs/gc.log
    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
    -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M
  • Compiler: -XX:+TieredCompilation
  • -Xmx: dependent on the pipeline in use, each pipeline package should state how much memory it requires



All timeouts are in milliseconds unless specified otherwise.


  • (OPTIONAL) - the name of this coordinator. Used for suffix for the directory under home in which
    the coordinator persists its state
  • -Dcoordinator.stateDirectory (OPTIONAL, default = <home>/.coordinator) - set the directory for Coordinator's state files.
  • -Dcoordinator.baseUrl (REQUIRED) - the base address of this coordinator. Needed to be able to give workers URLs that point
    back to the coordinator


  • -Dcoordinator.sparql.endpoint (REQUIRED) - the remote SPARQL endpoint URL, including repository. Usually in the form
  • -Dcoordinator.sparql.connectionTimeout (OPTIONAL, default = 10000) - establish connection to the SPARQL endpoint timeout
  • -Dcoordinator.sparql.socketTimeout (OPTIONAL, default = 600000) - socket timeout for SPARQL queries


  • -Dcoordinator.worker.connectionTimeout (OPTIONAL, default = 10000) - establishing connection to a worker timeout
  • -Dcoordinator.worker.socketTimeout (OPTIONAL, default = 10000) - socket timeout for worker communication
  • -Dcoordinator.worker.retries (OPTIONAL, default = 2)
  • -Dcoordinator.worker.retryDelay (OPTIONAL, default = 2000)
  • -Dcoordinator.worker.retryDelayMult (OPTIONAL, default = 2.0)

Updates (dictionaries)

  • -Dcoordinator.updates.checkDelay (OPTIONAL, default = 10000) - initial delay before the first check for updates
  • -Dcoordinator.updates.checkRate (OPTIONAL, default = 600000) - interval between checks for updates
  • -Dcoordinator.updates.maxWorkersToVerify (OPTIONAL, default = 2) - a change will first be verified on a single workers
    before being propagated to all workers. This specified the maximum number of workers to attempt to change before giving up
  • -Dcoordinator.updates.verificationTimeout (OPTIONAL, default = 1800000) - the maximum time to wait for update verification

Updates (models)


  • -Dcoordinator.annotation.freeWorkerTimeout (OPTIONAL, default = 30000) - the maximum time to wait for free worker to
    become available for annotation
  • -Dcoordinator.annotation.connectionTimeout (OPTIONAL, default = 10000) - establish connection to a worker for annotation timeout
  • -Dcoordinator.annotation.socketTimeout (OPTIONAL, default = 60000) - socket timeout for annotation to a worker

Watchdog / heartbeat checker

  • -Dcoordinator.watchdog.checkDelay (OPTIONAL, default = 60000) - initial delay before the first heartbeat check
  • -Dcoordinator.watchdog.checkRate (OPTIONAL, default = 60000) - interval between heartbeat checks


All files relative to ~/.coordinator/[${}]/ , that is ~/.coordinator if is unset and
~/.coordinator/<>/ if it is set

  • workers.json - persisted workers list and configuration
  • sparql-update-history.json - the update history for SparqlUpdatesManager
  • models.json - latest known models for ModelUpdatesManager

JVM settings

  • GC: -XX:+UseConcMarkSweepGC -verbose:gc -verbose:sizes -Xloggc:/path/to/logs/gc.log
    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
    -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M
  • Compiler: -XX:+TieredCompilation
  • -Xmx: depends on the pipeline, each pipeline should come with memory requirements

GraphDB and EUF plug-in

This is the semantic database you are going to need to enable the dynamic dictionary updates functionality. In case you don't already have GraphDB, go get it here. Official 6.0 documentation.

EUF stands for 'Entity Updates Feed'. This plug-in publishes entity update feeds which are consumed by the Coordinator.


To install the EUF plug-in in GraphDB

  1. Provide the following Java parameter to GraphDB on startup
  2. Unpack the EUF plug-in in your plugins home (prior to starting GraphDB)
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.