Smart Replication

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

The idea behind smart replication is to automatically choose between incremental update and full replication of a given worker, based on which one is better.

Parameters

There are 3 parameters that control the smart replication process:

Parameter | Type | Default value | Description
NetSpeedBitsPerSec | bits/sec (long) | 104857600 (100 Mbps) | The network speed. Used to estimate the time for full replication.
FullReplicationTimeFactor | ratio (float) | 1.3 | Speed-up ratio. See below.
MinTimeToConsiderFullReplicationS | seconds (long) | 600 (10 minutes) | Minimum absolute time. See below.

These parameters are controlled via the JMX bean ReplicationCluster:name=ClusterInfo/{$MASTER} and are persisted in the master's configuration file.

The IncrementalUpdateLimit parameter, which controlled the old logic, has been removed.
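
As an illustration of how NetSpeedBitsPerSec might feed into the full replication estimate, the sketch below converts a storage size in bytes to bits and divides by the configured network speed. The exact formula GraphDB uses is not stated here, so this is an assumption, and the function name estimate_full_replication_s is hypothetical.

```python
# Assumed estimate: transfer time = storage size in bits / network speed.
# The function name and the formula are illustrative, not GraphDB's code.
DEFAULT_NET_SPEED_BITS_PER_SEC = 104_857_600  # default from the table above


def estimate_full_replication_s(storage_size_bytes: int,
                                net_speed_bits_per_sec: int = DEFAULT_NET_SPEED_BITS_PER_SEC) -> float:
    """Return a rough full replication time in seconds."""
    if net_speed_bits_per_sec <= 0:
        raise ValueError("net_speed_bits_per_sec must be positive")
    # 8 bits per byte; ignores protocol overhead and disk throughput.
    return storage_size_bytes * 8 / net_speed_bits_per_sec


if __name__ == "__main__":
    ten_gib = 10 * 2**30
    print(f"{estimate_full_replication_s(ten_gib):.1f} s for a 10 GiB storage")
```

Under this assumption, a worker whose link is slower than the default would need NetSpeedBitsPerSec lowered accordingly, otherwise the full replication time would be underestimated.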

Heuristics

Generally, incremental updates are preferable because they affect only the worker node being updated, whereas full replication needs a second worker to replicate from. A planned improvement is to leave the cluster in RW mode during incremental updates, which would make them even more preferable.

Therefore, the current heuristic is as follows: full replication is preferred only when it is considerably faster than an incremental update. How much faster is controlled by two parameters: FullReplicationTimeFactor and MinTimeToConsiderFullReplicationS. Let incrementalDurationS be the estimated duration of the incremental update and replicationDurationS the estimated duration of full replication. GraphDB Enterprise prefers full replication when both of the following are true:

  1. incrementalDurationS > replicationDurationS * FullReplicationTimeFactor -- this is the required speed-up.
  2. incrementalDurationS > MinTimeToConsiderFullReplicationS -- this handles the case where the relative difference is large but the absolute difference is small, e.g. 1s for full replication vs. 2s for an incremental update.
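
The two conditions can be sketched as a small predicate. This is a minimal illustration of the rule as described on this page, not GraphDB's actual implementation; the function name prefer_full_replication and the Python parameter names are hypothetical, while the defaults are the documented ones.

```python
# Illustrative sketch of the documented heuristic; names are hypothetical.
def prefer_full_replication(incremental_duration_s: float,
                            replication_duration_s: float,
                            full_replication_time_factor: float = 1.3,
                            min_time_to_consider_full_replication_s: float = 600) -> bool:
    """Return True when full replication should be preferred."""
    # Condition 1: full replication is faster by at least the speed-up ratio.
    fast_enough = (incremental_duration_s
                   > replication_duration_s * full_replication_time_factor)
    # Condition 2: the incremental update is long enough in absolute terms.
    long_enough = incremental_duration_s > min_time_to_consider_full_replication_s
    return fast_enough and long_enough


if __name__ == "__main__":
    # 2s incremental vs. 1s full: large relative gap, tiny absolute one.
    print(prefer_full_replication(2, 1))        # False
    # 1000s incremental vs. 700s full: both conditions hold.
    print(prefer_full_replication(1000, 700))   # True
```

With the defaults, an incremental update estimated at 1000s is rejected in favour of a full replication estimated at 700s (1000 > 700 * 1.3 = 910 and 1000 > 600), but not in favour of one estimated at 800s (1000 is not greater than 1040).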

Logs

The old log message "Incremental update rejected because the difference is too big (N transactions)" is replaced by "Incremental update rejected because it would be slower than full replication".

There are also two new log messages. In the second one, names in angle brackets stand for the actual values:

  • "Couldn't find storage size" if the master cannot find a suitable worker to query for its storage size;
  • "Replication params: minTime = <MinTimeToConsiderFullReplicationS>s, replication factor = <FullReplicationTimeFactor>
    Full replication: size = <storage-size> bytes, speed = <NetSpeedBitsPerSec> bits/sec
    Incremental replication time: <estimated-incremental-replication-time>s
    verdict -> full or incremental replication"

Indicating reasons for worker replication in the enterprise log

In the current implementation, the following replication messages are written to the cluster's enterprise log:

  • Replicating (out of sync): The worker was deemed out of sync. This state is either reported by the worker, in case it found itself irreparably out of sync with the master's expectations, or detected by the master, in case the worker did not report a valid current state.
  • Replicating (out of sync), forced: The worker was detected to be out of sync after a reportedly successful completion of an operation (e.g., a replication or an update execution).
  • Replicating (empty): The worker was empty upon initialisation, and there was at least one non-empty worker attached to the cluster.
  • Replicating (catch up): The worker was cued to replicate through smart replication. This message is preceded by a message from the smart replication indicating why replication was preferred over ordinary execution of transactions.