View Source

The following sections cover typical administrative tasks relating to the management of an OWLIM-Enterprise cluster instance.

{toc}

h1. Adding and removing worker nodes

After a master node has been instantiated, worker nodes can be added using a JMX client application, e.g. jconsole. From the MBeans tab, select the bean associated with the master node to be modified. Each bean is named: {{ReplicationCluster/ClusterInfo/<repository_id>}}
Worker nodes are added using the {{addClusterNode}} operation, which takes the following parameters:
* repository URL
* replication port, which at the moment must be the servlet container's port number plus 10, e.g. 8090 for [http://192.168.1.25:8080/openrdf-sesame/repositories/]
* JMX port of the repository container (this is used for the cluster notifications and can safely be left as 0 if notifications are not necessary)
* {{readable}} flag, indicating whether the node should be used for queries (true) or only for updates (false)

Worker nodes are removed using the {{removeClusterNode}} operation, which requires only the repository URL.
If the master node has the {{AutoReplication}} attribute set to {{true}}, then replication will begin as soon as a worker node is added.
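
For scripted administration, the same operations can be invoked programmatically through the standard JMX API. The following is a minimal sketch only: the host name, JMX port, repository id and the exact {{ObjectName}} pattern are assumptions and should be checked against the bean shown in jconsole, as should the operation's parameter types.

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ManageWorkers {
    public static void main(String[] args) throws Exception {
        // Connect to the JMX agent of the master node's servlet container.
        // Host, port and repository id below are placeholders.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://192.168.1.25:8089/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();

            // The bean is named ReplicationCluster/ClusterInfo/<repository_id>;
            // the exact ObjectName pattern may differ - check the MBeans tab in jconsole.
            ObjectName master = new ObjectName(
                    "ReplicationCluster:name=ClusterInfo/my-repository");

            // addClusterNode(repository URL, replication port, JMX port, readable)
            // Parameter types are assumed - verify the operation signature in jconsole.
            mbsc.invoke(master, "addClusterNode",
                    new Object[] {
                        "http://192.168.1.25:8080/openrdf-sesame/repositories/worker1",
                        8090,   // servlet container port + 10
                        0,      // JMX port, 0 = no cluster notifications
                        true }, // readable: node answers queries as well as receiving updates
                    new String[] { "java.lang.String", "int", "int", "boolean" });

            // removeClusterNode(repository URL)
            mbsc.invoke(master, "removeClusterNode",
                    new Object[] {
                        "http://192.168.1.25:8080/openrdf-sesame/repositories/worker1" },
                    new String[] { "java.lang.String" });
        } finally {
            connector.close();
        }
    }
}
{code}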

If a master node is assigned two worker nodes of unknown, but different, status, it will not be able to decide which of the nodes is correct. In this situation, remove both workers, clear the master's transaction log, and add the workers back, adding the 'correct' worker first.

h1. Detecting cluster problems

If a master node fails completely, the most immediate effect is that clients of the repository will start getting errors when trying to query or update the repository. To avoid this, use at least one additional master node and the Client Failover API.
The {{Status}} attribute of the master node indicates the cluster health according to the following table:
\\
|| Value || Meaning ||
| 0 | Available \\
Indicates that all worker nodes are synchronised and no problems have been detected. |
| 1 | Needs attention \\
A problem has prevented the cluster from synchronising the attached worker nodes. When this happens, no updates can be accepted ({{IsWritable}} becomes false) until all attached worker nodes are synchronised. If the {{AutoReplication}} attribute is true then this will happen automatically and fairly quickly. Otherwise manual replication must be initiated. When all workers are in synch, the {{Status}} attribute will return to 0. |
| 2 | Not available \\
Indicates that no workers are available for processing requests. |
_Description of possible values of the Status attribute._
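
A simple monitoring check can read the {{Status}} and {{IsWritable}} attributes over JMX. The sketch below makes the same assumptions as the earlier example about host, port and bean name, which should be adapted to the actual deployment.

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClusterHealthCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host, port and bean name.
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://master-host:8089/jmxrmi"));
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName master = new ObjectName(
                    "ReplicationCluster:name=ClusterInfo/my-repository");

            int status = ((Number) mbsc.getAttribute(master, "Status")).intValue();
            boolean writable = Boolean.TRUE.equals(mbsc.getAttribute(master, "IsWritable"));

            switch (status) {
                case 0: System.out.println("Available: all workers synchronised"); break;
                case 1: System.out.println("Needs attention, IsWritable=" + writable); break;
                case 2: System.out.println("Not available: no workers can process requests"); break;
                default: System.out.println("Unexpected Status value: " + status);
            }
        } finally {
            connector.close();
        }
    }
}
{code}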

h1. Manual Replication

The master node will automatically initiate replication if {{AutoReplication}} is set to {{true}} and it detects any of the necessary conditions, such as an inconsistent state of a node with respect to the most recent update available (comparing the node's fingerprint, number of successful updates, number of statements, etc.). However, automatic replication can be disabled, in which case replication must be initiated manually using the {{startReplication}} operation.
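
A manual replication cycle can be triggered from jconsole or programmatically, for example as in the sketch below (host, port and bean name are again placeholders):

{code:java}
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class TriggerReplication {
    public static void main(String[] args) throws Exception {
        // Placeholder host, port and bean name.
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://master-host:8089/jmxrmi"));
        try {
            ObjectName master = new ObjectName(
                    "ReplicationCluster:name=ClusterInfo/my-repository");
            // Start a replication cycle manually (no parameters).
            connector.getMBeanServerConnection().invoke(
                    master, "startReplication", new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}
{code}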


h1. Online backup and restore

The deep-replication mechanism is also present in the master node and is the basis for replication to a remote cluster (see below), as well as for online backup and restore.

Backups are made by copying a worker node's image to a specified path on the master node machine. A restore takes a local image on the master node and replicates it to all the worker nodes. The first step in preparing a backup operation is to ensure that the replication parameters for the master node have been set up - there are no default values for these. In the {{ClusterInfo}} MBean, set the following parameters: {{MasterReplicationPort}} should be set to an unused port number and {{MasterUrl}} should be set to the actual Sesame server URL that is used to remotely access the cluster master.
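
These attributes can be set from jconsole or programmatically. The sketch below assumes an example replication port (8095), placeholder host names and a placeholder bean name; substitute the values appropriate to the actual deployment.

{code:java}
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ConfigureMasterReplication {
    public static void main(String[] args) throws Exception {
        // Placeholder host, ports, URL and bean name.
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://master-host:8089/jmxrmi"));
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName master = new ObjectName(
                    "ReplicationCluster:name=ClusterInfo/my-repository");

            // An unused port number on the master machine.
            mbsc.setAttribute(master, new Attribute("MasterReplicationPort", 8095));
            // The Sesame server URL used to remotely access the cluster master.
            mbsc.setAttribute(master, new Attribute("MasterUrl",
                    "http://master-host:8080/openrdf-sesame"));
        } finally {
            connector.close();
        }
    }
}
{code}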

To start a backup, invoke the {{backup}} operation. This takes a single parameter, which is the path to a directory on the master node where the image will be stored. Ensure that the server (Tomcat) has sufficient rights to this directory. *If this directory already exists then any files in it will be deleted*. Two notification messages will be sent: one to say that the backup operation has started and another to say whether it has completed successfully.

To restore the cluster state from a backup image, invoke the {{restoreFromImage}} operation. This takes a single parameter: the path to the directory where the image is stored. Once started, the master will copy this image to each of the worker nodes in turn.
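
A minimal sketch of both operations, assuming the same placeholder host and bean name as the earlier examples and an example backup directory:

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BackupAndRestore {
    public static void main(String[] args) throws Exception {
        // Placeholder host, port, bean name and backup directory.
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://master-host:8089/jmxrmi"));
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName master = new ObjectName(
                    "ReplicationCluster:name=ClusterInfo/my-repository");

            // Store a worker image in a directory on the master machine.
            // Any existing files in this directory will be deleted.
            mbsc.invoke(master, "backup",
                    new Object[] { "/var/backups/owlim/latest" },
                    new String[] { "java.lang.String" });

            // Later: restore the cluster state from a stored image; the master
            // copies the image to each worker node in turn.
            mbsc.invoke(master, "restoreFromImage",
                    new Object[] { "/var/backups/owlim/latest" },
                    new String[] { "java.lang.String" });
        } finally {
            connector.close();
        }
    }
}
{code}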

h1. Remote replication

It is sometimes desirable to maintain two clusters for the purpose of disaster recovery. When the network link between two data-centres is fast and reliable, the OWLIM instances in both data-centres can be connected into a single OWLIM cluster. This is the preferred approach.

However, in situations where the network link between the two data-centres is poor (slow, unreliable, prone to transient failures, etc.), a better approach for keeping the two data-centres (i.e. two OWLIM clusters) in synch is to use 'remote replication'. Using this feature, a master node of one cluster can be added as a pseudo-worker node to the other cluster, making a hierarchy of clusters. Master nodes for remote clusters added in this way do not take part in query answering, but do receive all updates. Also, many of the time-out and synchronisation parameters for the remote cluster are relaxed in order to cope with a more troublesome network layer.

When a remote master is added to another master it will have its own set of worker nodes. In such a configuration each update handled by the remote master will be slower, because it needs to update its own worker nodes. Before adding the remote master, set the RemoteMaster attribute to true on this node. This attribute value indicates that the remote master will receive only updates, but no read requests by the controlling master. The updates will be queued and sent asynchronously from the rest of the worker nodes so that no delay in the operation of the controlling master occurs. Incremental replications are made more likely when synchronizing a remote master by increasing the tolerated distance between its the current and expected state. Deep replications will be triggered if the remote master can not be made up-to-date based on the sequence of stored updates in the transaction log. If a deep replication is necessary, a regular worker node is selected to send its contents and during the replication the controlling master will not be in writable state. However, in cases when the remote master is only few updates behind an incremental replication will occur and the controlling master will remain in a writable state and will be able to accept and process updates.