Smart replication automatically chooses between an incremental update and a full replication of a given worker, based on the estimated duration of each.
Parameters
There are 3 parameters that control the smart replication process:
| Parameter | Type | Default value | Description |
| NetSpeedBitsPerSec | bits/sec (long) | 104857600 (100 Mbps) | The network speed. Used to estimate the time for full replication. |
| FullReplicationTimeFactor | ratio (float) | 1.3 | Speed-up ratio. See below. |
| MinTimeToConsiderFullReplicationS | seconds (long) | 600 (10 minutes) | Minimum absolute time. See below. |
These parameters are controlled via the JMX bean ReplicationCluster:name=ClusterInfo/{$MASTER} and are persisted in the master's configuration file.
N.B. The parameter IncrementalUpdateLimit, which controlled the old logic, has been removed.
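To illustrate how NetSpeedBitsPerSec could feed the full-replication estimate, here is a minimal Python sketch. It assumes the estimate is simply the storage size converted to bits and divided by the network speed; the formula GraphDB actually uses is not given here, and the function name estimate_full_replication_s is hypothetical.

```python
def estimate_full_replication_s(storage_size_bytes: int,
                                net_speed_bits_per_sec: int = 104_857_600) -> float:
    """Estimate full replication time in seconds.

    Assumption (not confirmed by the documentation): the whole storage is
    transferred once, so time = size in bits / network speed in bits/sec.
    The default speed mirrors the NetSpeedBitsPerSec default.
    """
    if net_speed_bits_per_sec <= 0:
        raise ValueError("net_speed_bits_per_sec must be positive")
    return storage_size_bytes * 8 / net_speed_bits_per_sec


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Roughly 82 seconds per GiB at the default speed.
    print(estimate_full_replication_s(one_gib))
    # A 50 GiB storage would take a bit over an hour under this assumption.
    print(estimate_full_replication_s(50 * one_gib))
```

Under this assumption, lowering NetSpeedBitsPerSec makes full replication look slower and therefore makes incremental updates more likely to be chosen.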
Heuristics
Generally, incremental updates are preferable because they affect only the worker node being updated, whereas a full replication also ties up a second worker to replicate from. A planned improvement is to leave the cluster in RW mode during incremental updates, which would make them even more preferable.
Therefore, the current heuristic is the following: a full replication is preferred only when it is considerably faster than the incremental update. How much faster is controlled by two parameters: FullReplicationTimeFactor and MinTimeToConsiderFullReplicationS. Let the estimated duration of the incremental update be incrementalDurationS and the estimated duration of the full replication be replicationDurationS. GraphDB Enterprise will prefer the full replication when both of these are true:
- incrementalDurationS > replicationDurationS * FullReplicationTimeFactor (this is the speed-up condition);
- incrementalDurationS > MinTimeToConsiderFullReplicationS (this handles the case when the relative difference is big but the absolute difference is small, e.g. 1s for a full replication vs. 2s for an incremental update).
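The two conditions above can be sketched in Python as follows. This is an illustration of the documented rule, not GraphDB's actual code; the function name prefers_full_replication is hypothetical, and the defaults mirror the parameter defaults listed earlier.

```python
def prefers_full_replication(incremental_duration_s: float,
                             replication_duration_s: float,
                             full_replication_time_factor: float = 1.3,
                             min_time_to_consider_full_replication_s: float = 600) -> bool:
    """Return True when a full replication would be preferred.

    Both conditions from the heuristic must hold:
    1. the incremental update is slower than the full replication by more
       than the speed-up factor;
    2. the incremental update is slow in absolute terms as well.
    """
    speed_up_ok = (incremental_duration_s
                   > replication_duration_s * full_replication_time_factor)
    long_enough = incremental_duration_s > min_time_to_consider_full_replication_s
    return speed_up_ok and long_enough


if __name__ == "__main__":
    # 2s incremental vs. 1s full: big relative gap, tiny absolute one.
    print(prefers_full_replication(2, 1))
    # 1000s incremental vs. 700s full: passes both conditions.
    print(prefers_full_replication(1000, 700))
    # 700s incremental vs. 600s full: speed-up is below the 1.3 factor.
    print(prefers_full_replication(700, 600))
```

With the defaults, only the second call selects full replication: the first fails the minimum-time condition and the third fails the speed-up condition.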
Logs
The old log message "Incremental update rejected because the difference is too big (N transactions)" is replaced by "Incremental update rejected because it would be slower than full replication".
There are also two new log messages:
- "Couldn't find storage size" if the master cannot find a suitable worker to query its storage size;
- "Replication params: minTime = <MinTimeToConsiderFullReplicationS>s, replication factor = <FullReplicationTimeFactor>
Full replication: size = <storage-size> bytes, speed = <NetSpeedBitsPerSec> bits/sec
Incremental replication time: <estimated-incremental-replication-time>s
verdict -> full or incremental replication", where each name in angle brackets stands for the corresponding value.