Replication Cluster Installation

This document is for users and system administrators who are unfamiliar with the OWLIM Replication Cluster software. The information in the following pages should be enough to get started with the OWLIM Replication Cluster, i.e. to install and configure the software on both master and worker nodes.

Description

A cluster is made up of two basic node types: masters and workers. A master node is responsible for:

  • ensuring that all worker nodes are synchronised
  • propagating updates (insert and delete tasks) across all workers and checking updates for inconsistencies
  • load balancing read requests (query execution tasks) between all available worker nodes
  • providing a uniform entry point for client software, where the client interacts with the cluster as though it is just a normal Sesame openRDF repository
  • providing a JMX management interface for monitoring and administrating the cluster
  • automatic cluster re-configuration in the event of failure of one or more worker nodes
  • user-directed dynamic configuration of the cluster to add/remove worker nodes

Worker nodes are normal BigOWLIM instances exposed via the Sesame HTTP server (a Web application running inside a servlet container, e.g. Apache Tomcat). Master nodes do not store or manage any RDF data themselves (although it is possible to run a master and a worker instance on the same physical machine). From an external point of view, a master behaves exactly like any other Sesame repository that is exposed via the Sesame HTTP server, but internally it utilises worker nodes (also exposed via the Sesame HTTP server) to store and manage RDF data. In this way, query performance can be increased by having the worker nodes answer queries in parallel.
The replication cluster software therefore has two advantages: improved parallel query performance and resilience to hardware failures of individual nodes in the cluster.
Furthermore, each cluster can contain more than one master node. In this case, every master monitors the health of all the workers and can distribute query execution tasks between them, effectively allowing a cluster to have multiple entry points. However, only one master at any given time can be responsible for propagating updates across the workers and this node is said to be in 'normal' (also called 'read/write') mode. All other master nodes are in the 'hot-standby' (also called 'read-only') mode. Switching the modes of master nodes can be achieved using the JMX management interface.
The rest of this guide gives detailed step-by-step instructions for installing and configuring a BigOWLIM replication cluster that contains both master and worker nodes.

Third party software

This guide covers the installation of BigOWLIM. The required third-party software comprises:

  • Apache Tomcat version 6.x (or another servlet container)
  • the Aduna OpenRDF Sesame framework version 2.x
  • a Java Development Kit (JDK) version 1.5 or later

Note that another servlet container can be used instead of Tomcat; however, the rest of this guide describes the installation of BigOWLIM with this software. No part of this guide is intended to supersede the documentation published with these third-party software components, and readers are strongly advised to familiarise themselves with these.

Preparation

A suitable application server must be installed. The examples in this guide assume Apache Tomcat version 6.x is used. Importantly, the user must have sufficient privileges to write to the Tomcat webapps directory and to stop and start the Tomcat daemon.
The Aduna OpenRDF Sesame framework version 2.x is required. The examples in this guide use version 2.3.2.
A BigOWLIM distribution is required. This is published as a zip file. The examples in this guide use version 3.4.
Furthermore, a Java Development Kit (JDK) version 1.5 or later is necessary since this contains the jconsole management application.

Installation steps

Assuming that an instance of Tomcat is available on all cluster nodes, the installation proceeds as follows:

  • Install worker nodes
  • Install master nodes
  • Connect worker nodes to master nodes

Note that although workers behave like any other BigOWLIM instance, each worker node needs to expose a cluster management end-point (when necessary) on a port number given by the Tomcat port number plus 10, i.e. usually 8080 + 10 = 8090. Make sure that no other software is listening on this port number.

Install worker nodes

Install worker nodes using the Sesame HTTP server, referring to the BigOWLIM quick start guide if necessary.
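A minimal sketch of a worker installation on Linux, assuming the WAR files from the war directory of the Sesame SDK; the BigOWLIM JAR name below is an assumption, so check the lib folder of the distribution ZIP for the exact file name:

    # Deploy the Sesame HTTP server and workbench to Tomcat
    cp war/openrdf-sesame.war war/openrdf-workbench.war $CATALINA_HOME/webapps/
    # Once Tomcat has expanded the WAR, add the BigOWLIM runtime JAR
    # (the file name is an assumption) to the Sesame web application
    cp lib/bigowlim-3.4.jar $CATALINA_HOME/webapps/openrdf-sesame/WEB-INF/lib/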

Install master node

Install the master node as though it were any other BigOWLIM instance, except that a different configuration template file must be used. A suitable template called cluster.ttl can be found in the templates folder of the distribution ZIP file.
Configure the Tomcat instance to expose its JMX interface, which can be done in a variety of ways. Firstly, a JMX port number must be chosen; this example uses port 8089. Then the JAVA_OPTS environment variable can be set so that the Tomcat server exposes the JMX interface on this port number:
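For example, on Linux (a minimal sketch: remote authentication and SSL are disabled for brevity, which is only appropriate on a trusted network):

    export JAVA_OPTS="-Dcom.sun.management.jmxremote \
        -Dcom.sun.management.jmxremote.port=8089 \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Dcom.sun.management.jmxremote.ssl=false"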

Alternatively, use the CATALINA_OPTS environment variable or, if using Windows, right-click on the Tomcat monitor icon in the system tray, select the Java tab and add the above options to the 'Java Options' field.
Restart Tomcat.
Run the Sesame console application:
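The console is started from the bin directory of the Sesame SDK, for example:

    cd openrdf-sesame-2.3.2/bin
    ./console.sh    # or console.bat on Windows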

Connect to the Tomcat instance just started:
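For example, with Tomcat on the default port 8080 (note that Sesame console commands are terminated with a full stop):

    connect http://localhost:8080/openrdf-sesame.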
Create a cluster master node:
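Assuming the cluster.ttl template has been copied into the console's templates directory (e.g. in the Aduna application data folder), instantiate it by name; the console will then prompt for the template parameters, including the repository ID:

    create cluster.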
Exit the Sesame console:
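    exit.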
At this point the master node is running, but the cluster it manages has no attached worker nodes.

Connect the worker nodes

Ensure that the replication cluster repository is loaded by the HTTP server by starting the OpenRDF workbench and selecting it. To do this, open a browser and connect to the workbench using this URL (or similar):
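    http://localhost:8080/openrdf-workbench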

Use the workbench to select the correct HTTP server where the replication cluster instance is running. Then choose the replication cluster repository instance; it will have the ID that was entered when the repository was created. It will likely be necessary to force the repository to load and initialise by clicking on the query button. This will cause an exception to be displayed containing the text 'No available worker nodes'.
Start a JMX client, e.g. jconsole from the Java Development Kit (JDK).
Connect to the Tomcat process using the port number selected above. When using a non-standard port number, connect as a remote process using localhost:<port_number>. In this example enter the following:
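    localhost:8089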
After connecting, select the MBeans tab and open the ReplicationCluster bean (if this is not visible, try to force the instance into memory as described above). Add worker nodes using the addClusterNode operation. Two or more parameters are required: the first is the URL of the HTTP endpoint, e.g. http://localhost:8080/openrdf-sesame/repositories/w3, and the second is the port number used for replication. At the moment, this is hard-coded to be the servlet container (Tomcat) port number plus 10, so usually this will be 8090. Any other parameters can be ignored at this stage.
To ensure that the worker node has been added correctly, check the NodeStatus attribute. This is an array and will need to be expanded by double-clicking on it. The newly added worker node will be shown here with its full URL. If it has been added successfully, it will have the status [ON]. If, however, it has the status [OFF], then the node has not been added properly. A common mistake is to enter the Sesame workbench URL (containing '...openrdf-workbench...') instead of the repository endpoint URL (containing '...openrdf-sesame...'). The proper URL to use can be seen in the list of repositories shown in the workbench by clicking on the 'Repositories' link in the left-hand menu (assuming that the workbench is already using the correct Sesame server).
As soon as one or more workers have been attached, the cluster can be used as a read-only repository, and notifications and statistics can be monitored through the JMX client.

Put the cluster into read-write mode

When a master is brought online, it will be in read-only mode, otherwise known as 'hot-standby'. To put the master into read-write mode, attach to its JMX interface (using jconsole) and set the ConfiguredWritable attribute to true. If there is no problem, the IsWritable attribute should then also change to true.
