
The following sections cover typical administrative tasks for managing an OWLIM-Enterprise cluster instance.

{toc}

h1. Adding and removing worker nodes

After the instantiation of a master node, worker nodes can be added using a JMX client application, e.g. jconsole. From the MBeans tab, select the bean associated with the master node to be modified. Each bean will be named: ReplicationCluster/ClusterInfo/<repository_id>
Worker nodes can be added using the addClusterNode operation, with the following parameters:
* repository URL
* replication port, which currently must be the servlet container's port number plus 10, e.g. 8090 for [http://192.168.1.25:8080/openrdf-sesame/repositories/]
* JMX port of the repository container (used for cluster notifications; it can safely be left as 0 if notifications are not needed)
* *readable* flag indicating whether the node should be used for queries (true) or only for updates (false)

Worker nodes are removed using the removeClusterNode operation, which requires only the repository URL.
If the master node has the AutoReplication attribute set to true, then replication will begin as soon as the worker node is added.
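
The same two operations can be scripted from Java instead of being invoked by hand in jconsole. The following is a minimal sketch under stated assumptions: the class WorkerNodeAdmin and its method names are hypothetical, the order and Java types of the addClusterNode parameters (String, int, int, boolean) are assumed from the list above, and the caller must supply an open JMX connection together with the ObjectName of the master's bean exactly as it appears in the MBeans tab.

```java
import java.net.URI;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper for adding and removing worker nodes over JMX. */
class WorkerNodeAdmin {

    /** Replication port = servlet container port + 10 (e.g. 8080 -> 8090). */
    static int replicationPortFor(String repositoryUrl) {
        int servletPort = URI.create(repositoryUrl).getPort();
        if (servletPort < 0) {
            throw new IllegalArgumentException(
                    "Repository URL has no explicit port: " + repositoryUrl);
        }
        return servletPort + 10;
    }

    /** Invokes addClusterNode; parameter order and types are assumptions. */
    static void addWorker(MBeanServerConnection conn, ObjectName masterBean,
                          String repositoryUrl, int jmxPort, boolean readable)
            throws Exception {
        Object[] params = {
            repositoryUrl, replicationPortFor(repositoryUrl), jmxPort, readable
        };
        String[] signature = { "java.lang.String", "int", "int", "boolean" };
        conn.invoke(masterBean, "addClusterNode", params, signature);
    }

    /** Invokes removeClusterNode, which needs only the repository URL. */
    static void removeWorker(MBeanServerConnection conn, ObjectName masterBean,
                             String repositoryUrl) throws Exception {
        conn.invoke(masterBean, "removeClusterNode",
                new Object[] { repositoryUrl },
                new String[] { "java.lang.String" });
    }
}
```

If the operation signature on a given installation differs, jconsole shows the exact parameter types next to the operation, and the signature array should be adjusted to match.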

h1. Replication

When a new worker node is attached to a master it will not be utilised until it is synchronised with the other worker nodes, i.e. the current state of the clustered repository must be replicated to the new worker node. During replication, an up-to-date worker node is selected at random and its state is replicated to any worker in the NEEDS_UPDATE state. If the AutoReplication attribute is set to true, then this will happen automatically as soon as a worker node is added.
Replication can be initiated manually by using the startReplication operation.
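
As an illustration, a manual trigger could be scripted as below. This is only a sketch: the class ReplicationTrigger is hypothetical, startReplication is assumed to take no arguments, and AutoReplication is assumed to be exposed as a boolean attribute. The connection and bean name must be supplied by the caller.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: starts replication by hand when it is not automatic. */
class ReplicationTrigger {

    /** Returns true if startReplication was invoked, false if left to the master. */
    static boolean replicateIfManual(MBeanServerConnection conn, ObjectName masterBean)
            throws Exception {
        // Assumption: AutoReplication is readable as a boolean attribute.
        boolean automatic =
                Boolean.TRUE.equals(conn.getAttribute(masterBean, "AutoReplication"));
        if (automatic) {
            return false; // the master replicates on its own
        }
        // Assumption: startReplication takes no arguments.
        conn.invoke(masterBean, "startReplication", new Object[0], new String[0]);
        return true;
    }
}
```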

h1. Switching in a hot-swap master node

After setting up a master node it will be in a read-only mode. This is the default state of the master and is also called 'hot-standby'. In this mode it can handle query requests and despatch these requests to all its attached worker nodes.
To enable a master to handle updates, first make sure that all other masters are in read-only mode. This can be done by checking the IsWritable attribute of each master using a JMX client (jconsole). To switch one master into read-write mode, set the ConfiguredWritable attribute to true. Immediately afterwards, the IsWritable attribute should also become true. The read-write master is now responsible for keeping the cluster synchronised.
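
The check-then-switch procedure can be expressed in code. The sketch below is illustrative only: MasterSwitch and Master are hypothetical names, and IsWritable and ConfiguredWritable are assumed to be boolean attributes. Each master needs its own JMX connection, since each runs in its own servlet container.

```java
import java.util.List;
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper for switching one master into read-write mode. */
class MasterSwitch {

    /** One master node: an open JMX connection plus the name of its cluster bean. */
    static final class Master {
        final MBeanServerConnection conn;
        final ObjectName bean;

        Master(MBeanServerConnection conn, ObjectName bean) {
            this.conn = conn;
            this.bean = bean;
        }

        boolean isWritable() throws Exception {
            return Boolean.TRUE.equals(conn.getAttribute(bean, "IsWritable"));
        }
    }

    /**
     * Sets ConfiguredWritable=true on the candidate, but only if no other
     * master is writable. Returns the candidate's IsWritable value afterwards.
     */
    static boolean promote(Master candidate, List<Master> others) throws Exception {
        for (Master other : others) {
            if (other.isWritable()) {
                throw new IllegalStateException(
                        "Another master is still writable: " + other.bean);
            }
        }
        candidate.conn.setAttribute(candidate.bean,
                new Attribute("ConfiguredWritable", Boolean.TRUE));
        return candidate.isWritable();
    }
}
```

A return value of false after the switch suggests that the cluster cannot currently accept updates, for example because a worker node is still out of synch.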

h1. Detecting cluster problems

If a master node fails completely, the most immediate effect is that clients of the repository will start getting errors when they try to query or update it. If this happens, a hot-swap/read-only master must be brought into read-write mode and all query and update requests should be directed to this master (see section&nbsp;4.3).
Once the problem with the failed master has been rectified, it can be brought online in read-only mode and the write ability can be switched back to the original master node when the time is suitable.
The Status attribute of the master node indicates the cluster health according to the following table:
\\
|| Value || Meaning ||
| 0 | Read/Write mode \\ Indicates that all worker nodes are synchronised, no problems have been detected and the cluster can accept updates. |
| 1 | Read only mode \\ Either the cluster has been configured to be read-only (ConfiguredWritable=false) or a problem has prevented the cluster from accepting updates. This can happen if a worker node is out of synch. When this happens, no updates can be accepted (IsWritable becomes false) until all attached worker nodes are synchronised. If the AutoReplication attribute is true then this will happen automatically and fairly quickly. Otherwise manual replication must be initiated. When all workers are in synch, the Status attribute will return to 0 (if ConfiguredWritable=true). |
| 2 | Not available \\ Indicates that no workers are available for processing read (or write) requests. |
_Description of possible values of the Status attribute._
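
For monitoring scripts it can be convenient to decode the numeric value into a named constant. The enum below is a hypothetical helper, not part of OWLIM-Enterprise; its constant names and the short descriptions are paraphrased from the table above.

```java
/** Hypothetical helper: decodes the numeric Status attribute of a master node. */
enum MasterStatus {
    READ_WRITE(0, "all workers synchronised, updates accepted"),
    READ_ONLY(1, "configured read-only, or a worker is out of synch"),
    NOT_AVAILABLE(2, "no workers available for read or write requests");

    final int code;
    final String meaning;

    MasterStatus(int code, String meaning) {
        this.code = code;
        this.meaning = meaning;
    }

    /** Only status 0 means the cluster can accept updates. */
    boolean acceptsUpdates() {
        return this == READ_WRITE;
    }

    /** Maps a Status value to its constant; rejects values not in the table. */
    static MasterStatus fromCode(int code) {
        for (MasterStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown Status value: " + code);
    }
}
```

For example, MasterStatus.fromCode(1) yields READ_ONLY, for which acceptsUpdates() is false.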

h1. Manual Replication

The current Master node implementation automatically initiates replication when it detects any of the necessary conditions, such as a node whose state is inconsistent with the most recent update available (it compares the node's fingerprint, number of successful updates, number of statements, etc.). However, this can be disabled and replication can be initiated manually.
The cluster indicates that it is not in a healthy state by a non-zero value of the *ClusterStatus* attribute of the JMX monitoring interface. In this situation, follow these steps:
# Shut down the inconsistent node.
# Choose a healthy node and shut it down as well, so that the most recent binary images of the database can be obtained.
# Transfer the binary files from the healthy node into the storage folder of the failed instance (refer to the configuration steps to find where these are located, usually under \~/.aduna/openrdf-sesame/repositories/<repository_id>/<storage_folder>).
# Restart both instances.

The Master node will detect their presence and return to normal operation.
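
The file transfer can be done with any copy tool. As one possible sketch, the hypothetical helper below copies a storage folder with java.nio, assuming that both storage folders are reachable from the same machine (for example through a network mount) and that both nodes have already been shut down. The two paths are passed in by the caller because their location depends on the configuration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper: copies the binary images of a healthy node to a failed one. */
class StorageCopy {

    /** Copies every file under healthyStorage into failedStorage, overwriting existing files. */
    static void copyStorage(Path healthyStorage, Path failedStorage) throws IOException {
        // Files.walk lists a directory before its contents, so parents exist first.
        try (Stream<Path> paths = Files.walk(healthyStorage)) {
            paths.forEach(source -> {
                Path target = failedStorage.resolve(healthyStorage.relativize(source));
                try {
                    if (Files.isDirectory(source)) {
                        Files.createDirectories(target);
                    } else {
                        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Note that this sketch does not delete files that exist only in the failed instance's folder. Emptying that folder before copying avoids mixing old and new files.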

h1. Synchronisation

During normal operations, the master node keeps the cluster synchronised by adopting the approach described in [Replication|OWLIM-Enterprise Administration#Replication]. However, if a worker node becomes out of synch with the rest of the cluster, perhaps because it was offline or unreachable due to network problems, then the master node will detect this condition and proceed as follows:
* AutoReplication ON -- an up-to-date worker node is chosen at random and used to replicate to the out-of-date worker node. During this time, both nodes will be unavailable and the cluster will not accept update requests (read-only mode). When replication is complete, both nodes will be returned to the \[ON\] status and the cluster will resume processing updates;
* AutoReplication OFF -- in this mode, the out of date worker will not be available for processing read requests and the cluster will remain in read-only mode. If desired, the *startReplication* operation can be invoked and when completed the worker node and cluster return to normal status.
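
In either mode, a script can wait for the cluster to recover by polling the master's Status attribute until it reads 0. The following is a minimal sketch: the class SyncWaiter is hypothetical, Status is assumed to be exposed as a numeric attribute, and the number of attempts and the pause between them are left to the caller.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: waits for the cluster to return to Status 0. */
class SyncWaiter {

    /** Polls Status up to maxAttempts times; returns true once it reads 0. */
    static boolean waitUntilHealthy(MBeanServerConnection conn, ObjectName masterBean,
                                    int maxAttempts, long pauseMillis) throws Exception {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Object status = conn.getAttribute(masterBean, "Status");
            // Assumption: Status is a numeric attribute (0 = read/write).
            if (status instanceof Number && ((Number) status).intValue() == 0) {
                return true;
            }
            Thread.sleep(pauseMillis);
        }
        return false;
    }
}
```

Status only returns to 0 if ConfiguredWritable is true on the master being polled, so a false result on a read-only master does not by itself indicate a replication failure.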

There are other situations that can cause a worker node to become out of synch with the cluster, for example if an update is passed directly to a worker node, bypassing the master node altogether. In this situation, the master node will detect the problem and stop passing any read or update requests to the problem worker node. However, because the fingerprint of the worker will not match any known state of the cluster, the master node will not be able to bring the worker back into synch with the cluster. The best approach is then to clear the history log of the master node(s) and, if necessary, restart the master. This will cause the master to choose the worker with the most recent update and use it for replication to the other worker nodes that have a different fingerprint.

h1. Frequently Occurring Problems

h4. Cannot connect the JMX client (jconsole) to the Apache Tomcat instance.

Make sure that Tomcat has been configured to expose the JMX interface using either JAVA_OPTS, CATALINA_OPTS or by configuring the Windows Tomcat monitor application, see section&nbsp;3.3.2.
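
To verify the JMX configuration independently of jconsole, a small probe can attempt the same connection. This is a sketch under assumptions: the class JmxProbe is hypothetical, Tomcat is assumed to expose JMX over the standard RMI connector without authentication or SSL, and the port is whatever value was configured for it.

```java
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical probe: checks that a JVM's JMX interface is reachable. */
class JmxProbe {

    /** Builds the standard JMX-over-RMI service URL for a host and port. */
    static String serviceUrl(String host, int jmxPort) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi";
    }

    /** Connects, reads the number of registered MBeans, and disconnects. */
    static int countMBeans(String host, int jmxPort) throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, jmxPort));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            return conn.getMBeanCount();
        }
    }
}
```

If countMBeans throws an IOException, the JMX interface is not reachable at that host and port, which usually points to the options not having been picked up by Tomcat or to a firewall blocking the port.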

h4. Cannot copy the OWLIM-Enterprise jar file to the Sesame WEB-INF/lib directory.

This directory will not exist until the Sesame war files have been deployed to the \[WEBAPPS\] directory AND Tomcat is running. If the war files have been deployed, but the directory does not exist, try restarting Tomcat.

h4. Cannot connect the Sesame console to the local Sesame server at [http://localhost:8080/openrdf-sesame]

Make sure that the Sesame war files have been deployed and that Tomcat is running. Restart Tomcat if necessary.

h4. Cannot create an OWLIM-Enterprise repository using the Sesame console

Make sure that the repository template file \[BIGOWLIM\]/templates/bigowlim.ttl has been copied to the 'templates' subdirectory of the Sesame console's data directory.

h4. Cannot create a repository, the Sesame console says unknown Sail type -- owlim:Sail

The Sesame console cannot find the OWLIM-Enterprise jar file. Make sure it was copied from \[BIGOWLIM\]/lib to the Sesame installation folder here: \[SESAME\]/lib

h4. A long error message/Java exception is given when trying to query a new repository instance.

The Lucene jar file must be copied from the BigOWLIM distribution to the Sesame server's WEB-INF/lib directory. Restart Tomcat if necessary.

h4. Cannot use my custom rule file (pie file), an exception occurred

The tools.jar file from the Java Development Kit (JDK) must be on the classpath or, alternatively, copied to the Sesame server's WEB-INF/lib directory.

h4. Sesame Workbench starts, but gives memory error on the 'explore' and 'query' menus

The maximum heap space must be increased, i.e. Tomcat's Java virtual machine must be allowed to allocate more memory. This can be done by setting the environment variable 'CATALINA_OPTS' to include the desired value, e.g. \-Xmx1024m
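
Whether the new limit has taken effect can be checked over JMX by reading the standard memory MXBean of Tomcat's JVM. The helper below is a hypothetical sketch; it expects an already open JMX connection to that JVM.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;

/** Hypothetical helper: reports the maximum heap size of a JVM over JMX. */
class HeapCheck {

    /** Returns the maximum heap in megabytes, or -1 if the JVM reports no limit. */
    static long maxHeapMegabytes(MBeanServerConnection conn) throws IOException {
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        long maxBytes = memory.getHeapMemoryUsage().getMax();
        return maxBytes < 0 ? -1 : maxBytes / (1024 * 1024);
    }
}
```

Depending on the garbage collector, the reported value can be somewhat lower than the figure given with \-Xmx, so it should be read as a rough confirmation rather than an exact match.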