GraphDB-Enterprise Administration

Compared with Version 3 by reneta.popova on Sep 10, 2014 17:00.

The following sections cover typical administrative tasks for managing a GraphDB-Enterprise cluster instance.

{toc}
h1. Adding and removing worker nodes

After the instantiation of a master node, worker nodes can be added using a JMX client application, e.g. jconsole. From the MBeans tab, select the bean associated with the master node to be modified. Each bean will be named: ReplicationCluster/ClusterInfo/<repository_id>.
Worker nodes can be added using the addClusterNode operation, with the following parameters:
* repository URL;
* replication port, which at the moment must be the servlet container's port number plus 10, e.g. 8090 for [http://192.168.1.25:8080/openrdf-sesame/repositories/];
* JMX port of the repository container (this is used for the cluster notifications and can safely be left 0 if notifications aren't necessary);
* {{readable}} flag, indicating if the node should be used for queries (true) or only for updates (false).

Worker nodes are removed using the removeClusterNode operation, which requires only the repository URL.
If the master node has the AutoReplication attribute set to true, then replication will begin as soon as the worker node is added.
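
The replication-port rule above (servlet container port plus 10) is easy to get wrong when typing values into a JMX client by hand. The following is a minimal Python sketch of that arithmetic. The function name {{add_cluster_node_args}} and the dictionary keys are hypothetical and only mirror the four parameters listed above; they are not part of any GraphDB API.

```python
from urllib.parse import urlsplit

# Offset between the servlet container port and the replication port,
# as stated in the parameter list above.
REPLICATION_PORT_OFFSET = 10


def add_cluster_node_args(repository_url, jmx_port=0, readable=True):
    """Collect the four addClusterNode parameters for one worker node.

    Hypothetical helper: it only prepares the values to enter in a JMX
    client and does not talk to the cluster itself.
    """
    parts = urlsplit(repository_url)
    if parts.port is None:
        raise ValueError("repository URL must contain an explicit port")
    return {
        "repository_url": repository_url,
        "replication_port": parts.port + REPLICATION_PORT_OFFSET,
        "jmx_port": jmx_port,   # 0: cluster notifications are not needed
        "readable": readable,   # True: used for queries; False: updates only
    }


args = add_cluster_node_args("http://192.168.1.25:8080/openrdf-sesame/repositories/")
print(args["replication_port"])  # 8090
```

The helper rejects URLs without an explicit port instead of guessing a default, since the replication port cannot be derived reliably in that case.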

If a master node is assigned two worker nodes of unknown, but different, status, then it will not be able to decide which of the nodes is correct. In this situation, both workers should be removed, the master's transaction log cleared, and the workers added back with the 'correct' worker first.

h1. Detecting cluster problems

If a master node fails completely, the most immediate effect is that clients of the repository start getting errors when trying to query or update it. To avoid this, use at least one additional master and the Client Failover API.
The Status attribute of the master node indicates the cluster health according to the following table:

_Table: Description of possible values of the Status attribute._
\\
|| Value || Meaning ||
| 0 | Indicates that all worker nodes are synchronised and no problems have been detected. |
| 1 | Needs attention. \\
A problem has prevented the cluster from synchronising the attached worker nodes. When this happens, no updates can be accepted (IsWritable becomes false) until all attached worker nodes are synchronised. If the AutoReplication attribute is true, this happens automatically and fairly quickly. Otherwise, manual replication must be initiated. When all workers are in synch, the Status attribute returns to 0. |
| 2 | Not available. \\
Indicates that no workers are available for processing requests. |
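
For monitoring scripts, the Status values in the table can be mapped to a short operator hint. The sketch below is illustrative only: the numbers come from the table, while the member names of {{ClusterStatus}} and the {{describe}} function are assumptions made for this example.

```python
from enum import IntEnum


class ClusterStatus(IntEnum):
    """Values of the master node's Status attribute, per the table above.

    The member names are illustrative; only the numbers come from the table.
    """
    OK = 0               # all workers synchronised, no problems detected
    NEEDS_ATTENTION = 1  # workers out of synch; no updates are accepted
    NOT_AVAILABLE = 2    # no workers available to process requests


def describe(status_value, auto_replication):
    """Return a short operator hint for a raw Status value."""
    status = ClusterStatus(status_value)  # raises ValueError for unknown values
    if status is ClusterStatus.OK:
        return "healthy"
    if status is ClusterStatus.NEEDS_ATTENTION:
        if auto_replication:
            return "wait: replication starts automatically"
        return "act: start replication manually"
    return "act: no workers available"


print(describe(1, auto_replication=False))  # act: start replication manually
```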

h1. Manual replication

The master node automatically initiates replication when {{AutoReplication}} is set to {{true}} and it detects any of the necessary conditions, such as an inconsistent state of a node with respect to the most recent update available (comparing the node's fingerprint, number of successful updates, number of statements, etc.). Otherwise, replication can be initiated manually using the {{startReplication}} operation.
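
The consistency check described above can be pictured as a field-by-field comparison. The following toy model is not GraphDB's actual algorithm; the {{NodeState}} class, its field names and the sample values are assumptions chosen to match the three quantities named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical snapshot of the values the text says are compared."""
    fingerprint: str
    successful_updates: int
    statements: int


def is_consistent(expected, worker):
    """A worker counts as consistent only if every compared value matches."""
    return expected == worker


def workers_needing_replication(expected, workers):
    """Return the names of workers whose state differs from the expected one."""
    return sorted(name for name, state in workers.items()
                  if not is_consistent(expected, state))


expected = NodeState("a1b2", 42, 1000)
workers = {
    "worker1": NodeState("a1b2", 42, 1000),
    "worker2": NodeState("ffff", 41, 990),
}
print(workers_needing_replication(expected, workers))  # ['worker2']
```

In this model, a non-empty result corresponds to the situation in which an operator would invoke {{startReplication}} when {{AutoReplication}} is disabled.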


h1. Online backup and restore

The deep-replication mechanism is also present in the master node and this is the basis for replication to a remote cluster (see below), as well as for online backup and restore.

Backups are made by copying a worker node's image to the specified path on the master node machine. A restore takes a local image on the master node and replicates it to all the worker nodes. The first step in preparing a backup operation is to ensure that the replication parameters for the master node have been set; there are no default values for these. In the {{ClusterInfo}} MBean, set {{MasterReplicationPort}} to an unused port number and {{MasterUrl}} to the actual Sesame server URL that is used to remotely access the cluster master.

To start a backup, invoke the {{backup}} operation. It takes a single parameter, which is the path to a directory on the master node where the image will be stored. Ensure that the server (Tomcat) has sufficient rights to this directory. *If this directory already exists, any files in it will be deleted*. Two notification messages are sent: one to say that the backup operation has started and another to say whether it has completed successfully.
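
Because an existing backup directory is emptied and the replication parameters have no defaults, a quick check before invoking {{backup}} can prevent surprises. The sketch below is a hypothetical helper, not part of GraphDB: the function name, its arguments and the example URL and path are assumptions. It must run on the master node machine, since the backup path refers to that machine's file system.

```python
from pathlib import Path


def backup_preflight(master_url, master_replication_port, backup_dir):
    """Return a list of problems to resolve before invoking backup.

    Hypothetical helper: the first two arguments are the values currently
    set for MasterUrl and MasterReplicationPort in the ClusterInfo MBean.
    """
    problems = []
    if not master_url:
        problems.append("MasterUrl is not set")
    if not master_replication_port:
        problems.append("MasterReplicationPort is not set")
    path = Path(backup_dir)
    # An existing, non-empty directory would have its files deleted.
    if path.is_dir() and any(path.iterdir()):
        problems.append(f"{path} is not empty; its files would be deleted")
    return problems


# Example values only; substitute the real master URL, port and path.
for problem in backup_preflight("http://192.168.1.25:8080/openrdf-sesame", 0,
                                "/var/backups/graphdb"):
    print(problem)
```

An empty list means none of the checked conditions apply; it does not verify that Tomcat has write access to the directory.
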

h1. Remote replication

In situations where the network link between two data-centres is poor (slow, unreliable, high transience, etc.), a better approach for keeping the two data-centres (i.e. two GraphDB clusters) in synch is to use 'remote replication'. With this feature, the master node of one cluster can be added as a pseudo-worker node to the other cluster, making a hierarchy of clusters. Master nodes for remote clusters added in this way do not take part in query answering, but do receive all updates. Also, many of the time-out and synchronisation parameters for the remote cluster are relaxed in order to cope with a more troublesome network layer.

When a remote master is added to another master, it has its own set of worker nodes. In such a configuration, each update handled by the remote master is slower, because it needs to update its own worker nodes. Before adding the remote master, set the RemoteMaster attribute to true on this node. This attribute value indicates that the remote master will receive only updates, but no read requests, from the controlling master. The updates are queued and sent asynchronously from the rest of the worker nodes, so that no delay occurs in the operation of the controlling master. Incremental replications are made more likely when synchronising a remote master by increasing the tolerated distance between the current and expected state. Deep replication is triggered when the remote master cannot be made up-to-date based on the sequence of stored updates in the transaction log. If a deep replication is necessary, a regular worker node is selected to send its contents and, during the replication, the controlling master is not in a writable state. However, when the remote master is only a few updates behind, an incremental replication occurs and the controlling master remains in a writable state, able to accept and process updates.
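
The choice between incremental and deep replication can be illustrated with a toy model. It assumes, purely for illustration, that updates are numbered consecutively and that the transaction log retains a contiguous range of them; the function name and its parameters are hypothetical and do not describe GraphDB's internal bookkeeping.

```python
def choose_replication(log_first, log_last, remote_last_applied):
    """Pick a replication mode for a remote master in a toy model.

    Assumes the transaction log still holds updates log_first..log_last
    (inclusive) and the remote master has applied everything up to
    remote_last_applied.
    """
    if remote_last_applied >= log_last:
        return "none"         # already up to date
    if remote_last_applied + 1 >= log_first:
        return "incremental"  # every missing update is still in the log
    return "deep"             # gap in the log: full copy from a worker


print(choose_replication(100, 150, 147))  # incremental
print(choose_replication(100, 150, 20))   # deep
```

In this model, only the "deep" outcome corresponds to the case in which the controlling master leaves the writable state for the duration of the replication.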