An OWLIM-Enterprise cluster is organised as one or more master nodes that manage one or more worker nodes. Fail-over and load-balancing between worker nodes are automatic. Multiple (stand-by) master nodes keep the cluster operating even if a master node fails. The cluster deployment can be modified while running, which allows worker nodes to be added during peak times, or released for maintenance, backup, etc. Such a set-up provides a high-performance, always-on service that can handle millions of query requests per hour.
The OWLIM-Enterprise cluster master manages and distributes atomic requests (query evaluations and update transactions) to a set of OWLIM instances.
The Master node operates in two modes: 'Normal' and 'Read-only'. Update requests are processed only when the Master node is in 'Normal' mode, i.e. when no issues with the registered worker nodes have been detected, such as failure to respond to status queries, HTTP errors, an invalid fingerprint, etc.
Worker nodes are OWLIM-SE repositories, configured with identical rule-sets, hosted in the openrdf-sesame Web application running in a Java servlet container, such as Tomcat. The Master node accesses them over HTTP via the SPARQL endpoint exposed by the Sesame service.
Every read request (SPARQL query) is passed to one of the available worker nodes. The node is chosen based on runtime statistics: the number of queries currently running on that node and its average query evaluation time. The available nodes are organised in a priority queue, which is rearranged after every read operation. If an error occurs (time-out, lost connection, etc.), the request is re-sent to another available node, and so on, until it is successfully processed.
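The selection and fail-over behaviour described above can be sketched as follows. This is a minimal illustration, not the actual master implementation; the `Worker` class and `dispatch_query` function are hypothetical names, and `evaluate` stands in for the HTTP call to a node's SPARQL endpoint.

```python
class Worker:
    """Tracks runtime statistics for one worker node (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.running = 0        # queries currently executing on this node
        self.avg_time = 0.0     # average query evaluation time (seconds)

    def priority(self):
        # Fewer running queries and a lower average time rank first.
        return (self.running, self.avg_time)


def dispatch_query(workers, evaluate):
    """Send a query to the best-ranked worker, failing over on error.

    `evaluate(worker)` stands in for the HTTP request to the node's
    SPARQL endpoint; it should raise on time-out or lost connection.
    """
    # Re-rank the nodes for this read operation (the priority queue).
    queue = sorted(workers, key=Worker.priority)
    errors = []
    for worker in queue:                # try nodes in priority order
        worker.running += 1
        try:
            return evaluate(worker)
        except Exception as exc:        # time-out, lost connection, ...
            errors.append((worker.name, exc))
        finally:
            worker.running -= 1
    raise RuntimeError("no worker could evaluate the query: %r" % errors)
```

A node that raises an error is simply skipped for this request; the loop moves on to the next node in the queue until one succeeds.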
Updates are handled as follows: each update is first logged. Then a probe node is selected and the request is sent to it. If the update succeeds, a set of control queries is evaluated against the probe node to check the consistency of the data it holds. If this control suite passes, the update is forwarded to the rest of the nodes in the cluster. Any node that fails to process an update is disabled and scheduled for recovery replication.
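The update protocol above can be summarised in a short sketch. All names here (`handle_update`, `send_update`, `run_control_queries`) are illustrative placeholders for the master's internal machinery, not a real API.

```python
def handle_update(update, workers, log, send_update, run_control_queries):
    """Sketch of the master's update protocol (all names illustrative).

    1. Log the update.
    2. Send it to a probe node and verify it with control queries.
    3. Forward it to the remaining nodes; collect any that fail.
    """
    log.append(update)                      # 1. every update is logged first
    probe, rest = workers[0], workers[1:]
    send_update(probe, update)              # 2. probe node processes it first
    if not run_control_queries(probe):      #    consistency check on the probe
        raise RuntimeError("update failed consistency check on probe node")
    failed = []
    for node in rest:                       # 3. forward to the other nodes
        try:
            send_update(node, update)
        except Exception:
            failed.append(node)             # disabled by the master
    return failed                           # scheduled for recovery replication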
For example, the following update will cause a synchronisation error, because its result will differ on each worker node:
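A representative update of this kind (the resource and predicate names here are purely illustrative) binds the current time at the moment of execution:

```
INSERT { <urn:example:doc1> <urn:example:lastModified> ?timestamp }
WHERE  { BIND( NOW() AS ?timestamp ) }
```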
That is, the value of ?timestamp will be set according to the clock and the time of execution at each worker. After propagating this update to all worker nodes, the master will detect that each worker node's signature is different (and matches no previous state in its logs) and will trigger a full replication from the worker that first received the update to all other worker nodes.