OWLIM-Enterprise Basics

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1


An OWLIM-Enterprise cluster is organised as one or more master nodes that manage one or more worker nodes. Fail-over and load-balancing between worker nodes are automatic. Multiple (stand-by) master nodes ensure continuous cluster operation even in the event of a master node failure. The cluster deployment can be modified while running, which allows worker nodes to be added during peak times, or released for maintenance, backup, etc. Such a setup guarantees a high-performance, always-on service that can handle millions of query requests per hour.

Description

A cluster is made up of two basic node types: masters and workers. The cluster master manages and distributes atomic requests (query evaluations and update transactions) to a set of workers. A master node is responsible for:

  • ensuring that all worker nodes are synchronised
  • propagating updates (insert and delete tasks) across all workers and checking updates for inconsistencies
  • load balancing read requests (query execution tasks) between all available worker nodes
  • providing a uniform entry point for client software, where the client interacts with the cluster as though it is just a normal Sesame openRDF repository
  • providing a JMX management interface for monitoring and administrating the cluster
  • automatic cluster re-configuration in the event of failure of one or more worker nodes
  • user-directed dynamic configuration of the cluster to add/remove worker nodes

Worker nodes are based on OWLIM-SE and are set up using the Sesame HTTP server (a Web application running inside a servlet container, e.g. Apache Tomcat). Master nodes do not store or manage any RDF data themselves (although it is possible to run a master and a worker instance on the same physical machine). From an external point of view, a master behaves exactly like any other Sesame/OWLIM repository that is exposed via the Sesame HTTP server, but utilises worker nodes (also exposed via a Sesame HTTP server) to store and manage RDF data. In this way, query throughput can be increased by having worker nodes answer queries in parallel.

OWLIM-Enterprise therefore has the following advantages over a stand-alone OWLIM-SE instance:

  • Improved parallel query execution performance
  • Resilience to hardware failures of individual nodes in the cluster
  • Superior monitoring and control functions

Master Node

A master node of an OWLIM-Enterprise cluster implements the Sesame Repository interfaces and appears to clients like any other OWLIM instance. However, it does not store any RDF data itself; rather, its function is to route query and update requests to a set of OWLIM-SE instances (nodes).

A cluster can contain more than one master node. In this case, every master monitors the health of all the workers and can distribute query execution tasks between them, effectively allowing a cluster to have multiple entry points for queries. However, only one master at any given time can be responsible for propagating updates across the workers and this node is said to be in 'normal' (also called 'read/write') mode. All other master nodes are in the 'hot-standby' (also called 'read-only') mode. Switching the modes of master nodes can be achieved using the JMX management interface.

The master node that is currently in 'normal' mode will respond to problems with the worker nodes as soon as they are detected, such as failure to respond to status queries, HTTP errors, an invalid fingerprint, etc. This typically takes just a few seconds. If a worker is lost completely, then it simply does not take part in cluster operations and the cluster master remains in 'normal' mode, even though it will show the worker in the OFF state and will send a JMX notification that this has occurred.

If the 'normal' master node fails and does not respond to HTTP requests, the cluster will process only read requests (e.g. query evaluations) via any hot-standby masters until the main master is running again or a standby master is switched into 'normal' mode.

Worker nodes

Worker nodes are OWLIM-SE repositories configured with identical rule-sets, hosted in the openrdf-sesame Web application running in a Java servlet container, such as Tomcat. They are accessible to the master node via HTTP through the SPARQL endpoint exported by the Sesame service.

Read requests (query evaluations)

Every read request (SPARQL query) is passed to one of the available worker nodes. The node is chosen based on runtime statistics: the number of queries currently running on that node and its average query evaluation time. The available nodes are organised in a priority queue, which is rearranged after every read operation. If an error occurs (time-out, lost connection, etc.), the request is resent to another available node, and so on, until it is successfully processed.

Updates

Updates are handled in the following way: each update is first logged. Then a probe node is selected and the request is sent to it. If the update succeeds, any control queries are evaluated against the probe node in order to check its consistency with respect to the data it holds. If this control suite passes, the update is forwarded to the rest of the nodes in the cluster. Any nodes that fail to process the update are disabled and scheduled for recovery replication.

The master node can be configured with a set of control SPARQL queries, which may be only CONSTRUCT or ASK queries. The CONSTRUCT queries, when evaluated, check for the presence of the statements generated from the query. If any of these statements are missing, then a condition is raised and the control suite is considered to have failed. An example of a control query could be the check for proper values respecting the rdfs:range of an arbitrary property, e.g.

An example CONSTRUCT control query


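A sketch of such a CONSTRUCT control query (the prefix declarations are standard; no application-specific names are needed) could be:

```sparql
# Construct, for every property with a declared rdfs:range, the type
# statement that each of its values should carry. If any constructed
# statement is missing from the repository, the control suite fails.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT { ?value rdf:type ?class }
WHERE {
    ?property rdfs:range ?class .
    ?subject  ?property  ?value .
}
```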
The above query is somewhat redundant if the rule-set includes such an inference rule, which is the case with the built-in rule-sets currently distributed with OWLIM (such as "rdfs", "owl-horst" and "owl-max").
The ASK queries raise conditions when they return true (i.e. there is a sub-graph matching the statement patterns of the query). An example of such a control query is a check for common members of mutually disjoint classes, e.g.

An example ASK control query


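A sketch of such an ASK control query, using only standard vocabulary, could be:

```sparql
# Returns true - and so raises a condition - if any individual is a
# member of two classes that are declared mutually disjoint.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

ASK {
    ?class1 owl:disjointWith ?class2 .
    ?member rdf:type ?class1 .
    ?member rdf:type ?class2 .
}
```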
The set of control queries is evaluated against the node on which the update was probed; if these queries pass on that node, then the update is considered 'safe' and is forwarded to the remaining nodes in the cluster.
The file containing the control queries can be specified using the <http://www.ontotext.com/trree/owlim#queriesFile> configuration parameter of the master node. The file format is the same as the one used in the GettingStarted demo application included with the OWLIM distribution.

Limitations

Special care must be taken when executing updates on an OWLIM cluster: an update must be completely deterministic in terms of the change of state it causes in the repository.

For example, the following update will cause a synchronisation error, because the result of the update will be different for each worker node:

PREFIX ex:<http://example.com#/>
INSERT { ex:a ex:p ?timestamp }
WHERE { BIND( now() as ?timestamp ) }

That is, the value of ?timestamp will be set according to the clock and time of execution at each worker. After propagating this update to all worker nodes, the master will detect a difference in each worker node's signature (which will not match any previous state in its logs) and will trigger a full replication between the worker that first received the update and all other worker nodes.
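A deterministic version of the same update computes the timestamp once on the client side and sends it as a concrete literal, so that every worker applies exactly the same change of state (the literal value below is illustrative):

```sparql
PREFIX ex:  <http://example.com#/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# The timestamp is a constant supplied by the client, not evaluated
# per worker, so all workers end up with identical repository states.
INSERT DATA { ex:a ex:p "2013-05-01T12:00:00Z"^^xsd:dateTime }
```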

Synchronisation and replication

During normal operations, the master node keeps the cluster synchronised by processing updates as described above. If a worker node becomes out of sync with the rest of the cluster, perhaps because it was offline or unreachable due to network problems, then the master node will detect this condition. If the auto-replication attribute is set to false, a notification is sent and the cluster master switches into read-only mode, i.e. it will not process any more updates. If desired, the startReplication operation can be used to start replication manually; when it completes, the worker node and the cluster return to normal status.

If the auto-replication attribute is set to true, or if replication is started manually, then the master will attempt to bring the problem worker node to the same state as the other worker nodes using one of the following replication methods:

  • incremental replication - if the master recognises that the signature of the out-of-sync worker corresponds to a known state reached some way back in the transaction log, this replication technique simply replays the missing transactions to the problem worker node until it reaches the same state as the other worker nodes. During this time, the cluster remains in read-write mode and is able to process updates.
  • full replication - if incremental replication is not possible, e.g. if the state of the problem worker is not recognised by the master, then a random up-to-date worker node is used to replicate to the out-of-date worker node. During this time, both nodes are unavailable: the cluster will not accept update requests (read-only mode) and, if no other workers are available, it will not accept queries either. This kind of replication involves shutting down both workers and initiating a binary transfer of the good worker's storage folder directly to the storage folder of the bad worker. When replication is complete, both nodes are returned to the [ON] status and the cluster resumes processing updates.