Smart Replication

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

The idea behind smart replication is to automatically choose between an incremental update and a full replication of a given worker, based on which one is estimated to bring the worker up to date more efficiently.

Parameters

Three parameters control the smart replication process:

Parameter                          Type              Default value         Description
NetSpeedBitsPerSec                 bits/sec (long)   104857600 (100Mbps)   The network speed. Used to estimate the time for full replication.
FullReplicationTimeFactor          ratio (float)     1.3                   Speed-up ratio. See below.
MinTimeToConsiderFullReplicationS  seconds (long)    600 (10 minutes)      Minimum absolute time. See below.

These parameters are controlled via the JMX bean ReplicationCluster:name=ClusterInfo/{$MASTER} and are persisted in the master's configuration file.

N.B. The parameter IncrementalUpdateLimit, which used to control the old logic, is now removed.
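
To illustrate how NetSpeedBitsPerSec could feed into the full replication estimate, here is a minimal sketch in Python. It assumes the estimate is simply the storage size divided by the configured network speed; the exact formula used by GraphDB is not stated on this page, and the function name is hypothetical.

```python
# Hypothetical helper, not part of GraphDB: estimates how long a full
# replication would take if the whole storage is copied at the configured
# network speed. The formula itself is an assumption.
def estimate_full_replication_s(storage_size_bytes: int,
                                net_speed_bits_per_sec: int = 104_857_600) -> float:
    if net_speed_bits_per_sec <= 0:
        raise ValueError("net_speed_bits_per_sec must be positive")
    # Storage size is in bytes, the speed is in bits per second.
    return storage_size_bytes * 8 / net_speed_bits_per_sec


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    print(estimate_full_replication_s(ten_gib))  # 819.2
```

Under this assumption, a 10 GiB storage at the default speed gives an estimate of roughly 14 minutes for a full replication.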

Heuristics

Generally, incremental updates are preferable because they affect only the worker node being updated, whereas a full replication also needs a second worker to replicate from. A planned improvement is to leave the cluster in RW mode during incremental updates, which would make them even more preferable.

Therefore, the current heuristic is the following: a full replication is preferred only when it is considerably faster than the incremental update. How much faster is controlled by two parameters: FullReplicationTimeFactor and MinTimeToConsiderFullReplicationS. Let's say that the estimated duration of the incremental update is incrementalDurationS and the estimated duration of the full replication is replicationDurationS. GraphDB Enterprise will prefer the full replication when both of these are true:

  1. incrementalDurationS > replicationDurationS * FullReplicationTimeFactor -- this is the speed-up
  2. incrementalDurationS > MinTimeToConsiderFullReplicationS -- this handles the case when the relative difference is big but the absolute difference is small, e.g. 1s for full replication vs. 2s for the incremental one.
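
The two conditions above can be sketched as a small Python function. This is only an illustration of the rule as described here, not GraphDB's actual code; the function and argument names are hypothetical, and the defaults are the ones listed under Parameters.

```python
# Hypothetical illustration of the decision rule; not GraphDB source code.
def prefer_full_replication(incremental_duration_s: float,
                            replication_duration_s: float,
                            full_replication_time_factor: float = 1.3,
                            min_time_to_consider_full_replication_s: float = 600) -> bool:
    # Condition 1: full replication must be faster by at least the given factor.
    big_speed_up = (incremental_duration_s
                    > replication_duration_s * full_replication_time_factor)
    # Condition 2: the incremental update must be long enough in absolute terms.
    long_enough = incremental_duration_s > min_time_to_consider_full_replication_s
    return big_speed_up and long_enough


if __name__ == "__main__":
    print(prefer_full_replication(2, 1))       # False: large ratio, tiny absolute time
    print(prefer_full_replication(1200, 800))  # True: 1200 > 1040 and 1200 > 600
    print(prefer_full_replication(900, 800))   # False: 900 is not greater than 1040
```

With the defaults, an incremental update estimated at 2s is never replaced by a full replication, however fast the latter is, because it stays below the 600s minimum.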

Logs

The old log message "Incremental update rejected because the difference is too big (N transactions)" is replaced by "Incremental update rejected because it would be slower than full replication".

There are also two new log messages:

  • "Couldn't find storage size" if the master cannot find a suitable worker to query its storage size;
  • "Replication params: minTime = MinTimeToConsiderFullReplicationSs, replication factor = FullReplicationTimeFactor
    Full replication: size = storage-size bytes, speed = NetSpeedBitsPerSec bits/sec
    Incremental replication time: estimated-incremental-replication-times
    verdict -> full or incremental replication"