
The following sections cover typical administrative tasks for managing a GraphDB-Enterprise cluster instance.

{toc}

h1. Adding and removing worker nodes

After a master node has been instantiated, worker nodes can be added using a JMX client application, e.g. jconsole. In the MBeans tab, select the bean associated with the master node to be modified. Each bean is named: ReplicationCluster/ClusterInfo/<repository_id>.

Worker nodes are added using the {{addClusterNode}} operation, which takes the following parameters:
* repository URL;
* replication port, which at the moment must be the servlet container's port number plus 10, e.g. 8090 for [http://192.168.1.25:8080/openrdf-sesame/repositories/];
* JMX port of the repository container (used for cluster notifications; it can safely be left as 0 if notifications are not needed);
* {{readable}} flag, indicating whether the node should be used for queries (true) or only for updates (false).
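The same operation can be invoked programmatically with a standard JMX client. The sketch below is illustrative only: the {{ReplicationCluster:name=ClusterInfo/<repository_id>}} ObjectName form, the JMX service URL, and the host and repository names are assumptions — verify the exact bean name in jconsole's MBeans tab.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class AddWorkerExample {

    // ObjectName of the master's cluster bean. The
    // "ReplicationCluster:name=ClusterInfo/<repository_id>" form is an
    // assumption; check the exact name in jconsole's MBeans tab.
    static ObjectName clusterInfoName(String repositoryId) throws Exception {
        return new ObjectName("ReplicationCluster:name=ClusterInfo/" + repositoryId);
    }

    // The replication port must be the servlet container's port number plus 10.
    static int replicationPort(int servletContainerPort) {
        return servletContainerPort + 10;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: AddWorkerExample <jmx-service-url>");
            return;
        }
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // addClusterNode(repository URL, replication port, JMX port, readable);
            // the host and repository names here are hypothetical.
            mbsc.invoke(clusterInfoName("master-repo"),
                    "addClusterNode",
                    new Object[] {
                            "http://192.168.1.25:8080/openrdf-sesame/repositories/worker-repo",
                            replicationPort(8080), // 8090: container port + 10
                            0,                     // JMX notification port; 0 = no notifications
                            true                   // readable: serve queries as well as updates
                    },
                    new String[] { String.class.getName(), "int", "int", "boolean" });
        }
    }
}
```

Running the class without arguments prints the usage line and exits, so it can be compiled and tried safely before pointing it at a live master.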

Worker nodes are removed using the {{removeClusterNode}} operation, which requires only the repository URL.
If the master node's AutoReplication attribute is set to true, replication begins as soon as a worker node is added.

If a master node is assigned two worker nodes whose states are unknown but differ, it cannot decide which node holds the correct data. In this situation, remove both workers, clear the master's transaction log, and add the workers back, 'correct' worker first.
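The recovery procedure above is order-sensitive, so it can help to lay it out as an explicit sequence of JMX operations. The sketch below only enumerates the documented steps; the repository URLs are hypothetical, and this document does not name an operation for clearing the transaction log, so that step is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkerRecoveryPlan {

    // Ordered (operation, repository URL) pairs for recovering a master whose
    // two workers have unknown, differing states: remove both workers, clear
    // the master's transaction log, then re-add the 'correct' worker first.
    static List<String[]> recoverySteps(String correctWorkerUrl, String otherWorkerUrl) {
        List<String[]> steps = new ArrayList<>();
        steps.add(new String[] { "removeClusterNode", correctWorkerUrl });
        steps.add(new String[] { "removeClusterNode", otherWorkerUrl });
        // Clear the master's transaction log at this point. No JMX operation
        // for it is named in this document, so perform that step manually
        // (e.g. via jconsole) before re-adding the workers.
        steps.add(new String[] { "addClusterNode", correctWorkerUrl }); // 'correct' worker first
        steps.add(new String[] { "addClusterNode", otherWorkerUrl });
        return steps;
    }

    public static void main(String[] args) {
        // Print the plan for two hypothetical workers.
        for (String[] step : recoverySteps(
                "http://192.168.1.25:8080/openrdf-sesame/repositories/w1",
                "http://192.168.1.26:8080/openrdf-sesame/repositories/w2")) {
            System.out.println(step[0] + " " + step[1]);
        }
    }
}
```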

h1. Detecting cluster problems

If a master node fails completely, clients of the repository will start receiving errors when trying to query or update it. To avoid this, use at least one additional master node and the Client Failover API.

The Status attribute of the master node indicates the cluster health according to the following table:

_Table: Description of possible values of the Status attribute._
\\
| Value | Meaning |
| 0 | Available \\
Indicates that all worker nodes are synchronised and no problems have been detected. |
| 1 | Needs attention \\
A problem has prevented the cluster from synchronising the attached worker nodes. While this is the case, no updates can be accepted (IsWritable becomes false) until all attached worker nodes are synchronised. If the AutoReplication attribute is true, this happens automatically and fairly quickly; otherwise, replication must be initiated manually. When all workers are in sync, the Status attribute returns to 0. |
| 2 | Not available \\
Indicates that no workers are available for processing requests. |
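A monitoring script can poll the Status and IsWritable attributes through the same JMX bean. The sketch below assumes the ObjectName pattern shown earlier and a JMX service URL supplied on the command line; the attribute names come from the table above.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClusterStatusCheck {

    // Maps the Status attribute value to its documented meaning.
    static String describeStatus(int status) {
        switch (status) {
            case 0: return "Available";
            case 1: return "Needs attention";
            case 2: return "Not available";
            default: return "Unknown status: " + status;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ClusterStatusCheck <jmx-service-url> <repository-id>");
            return;
        }
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Bean name form is an assumption based on the pattern shown earlier.
            ObjectName master = new ObjectName("ReplicationCluster:name=ClusterInfo/" + args[1]);
            int status = ((Number) mbsc.getAttribute(master, "Status")).intValue();
            boolean writable = (Boolean) mbsc.getAttribute(master, "IsWritable");
            System.out.println("Status: " + status + " (" + describeStatus(status)
                    + "), IsWritable: " + writable);
        }
    }
}
```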

h1. Manual Replication

_The master node automatically initiates replication of worker nodes as needed or appropriate. The manual replication option is obsolete and no longer available._


h1. Online backup and restore

Cluster-wide backup and restore can be used to restore the cluster back to a previous operational state.

Backups are made by copying a worker node's image to a location on the master node's machine. A restore starts from a local image on the master node, which is replicated to the worker nodes in the cluster, propagating via peer master nodes as needed.

To start a backup, invoke the {{backup}} operation with a single parameter, the name used to identify the image. The backup image will go into a directory under the master node's repository data directory. At least two notification messages will be sent under normal operation, one to indicate that the backup operation has been started, and another to indicate that it has completed successfully.

To restore the cluster state from a backup image, invoke the {{restoreFromImage}} operation with an existing backup's name as a parameter. The image created by the backup will be replicated to every worker throughout the cluster. At least two notification messages will be sent under normal operation, one to indicate that the restore operation has been started, and another to indicate that it has completed successfully.
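Both operations can be invoked from a JMX client in the same way as the cluster-management operations earlier. The helpers below are a sketch: the ObjectName is assumed to follow the ReplicationCluster/ClusterInfo pattern shown above, and the guard against the reserved image name "default" reflects the usage note below.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class ClusterBackup {

    // "default" is reserved for the cluster's internal failure recovery.
    static boolean isReservedImageName(String imageName) {
        return "default".equals(imageName);
    }

    // Invokes the backup operation on the master's cluster bean.
    // Re-using an existing image name overwrites that image.
    static void backup(MBeanServerConnection mbsc, ObjectName master, String imageName)
            throws Exception {
        if (isReservedImageName(imageName)) {
            throw new IllegalArgumentException("image name 'default' is reserved for internal use");
        }
        mbsc.invoke(master, "backup",
                new Object[] { imageName }, new String[] { String.class.getName() });
    }

    // Restores the cluster from an existing backup image. Any updates made
    // after the image was created are irreversibly lost.
    static void restoreFromImage(MBeanServerConnection mbsc, ObjectName master, String imageName)
            throws Exception {
        mbsc.invoke(master, "restoreFromImage",
                new Object[] { imageName }, new String[] { String.class.getName() });
    }
}
```

In both cases, watch for the two notification messages (started, completed successfully) before assuming the operation has finished.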

Important usage notes:
* *The image name "default" is reserved for internal use.* Backing up to an image named "default" would interfere with the cluster's failure recovery capabilities.
* Backing up a second time with the same image name overwrites the previous image of that name.
* Each master node maintains its own set of backup images; the images available from one master are completely unrelated to those available from other masters.
* After a successful restore, any updates executed between the time the backup image was created and the time the cluster was restored are irreversibly lost.
* If a restore operation fails, the cluster may be left in an inconsistent state.
* Initiating a backup or restore while another backup or restore is already in progress on the same master node is treated as an error and fails immediately. This failure does not interfere with the operation already in progress.

h1. Remote replication

_Remote replication is no longer available._