
The idea behind smart replication is to choose automatically between an incremental update and a full replication of a given worker, based on estimates of how long each would take.

h2. Parameters

There are 3 parameters that control the smart replication process:
|| Parameter || Type || Default value || Description ||
| *NetSpeedBitsPerSec* | bits/sec (long) | 104857600 (100Mbps) | The network speed. Used to estimate the time for full replication. |
| *FullReplicationTimeFactor* | ratio (float) | 1.3 | Speed-up ratio. See below. |
| *MinTimeToConsiderFullReplicationS* | seconds (long) | 600 (10 minutes) | Minimum absolute time. See below. |
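
As a rough illustration of how *NetSpeedBitsPerSec* can feed the full replication estimate, the sketch below converts the storage size from bytes to bits and divides by the network speed. The exact formula used by GraphDB Enterprise is not stated on this page, so both the formula and the function name {{estimate_full_replication_s}} are assumptions.

```python
# Default taken from the parameter table; everything else here is illustrative.
DEFAULT_NET_SPEED_BITS_PER_SEC = 104857600


def estimate_full_replication_s(storage_size_bytes,
                                net_speed_bits_per_sec=DEFAULT_NET_SPEED_BITS_PER_SEC):
    """Hypothetical estimate: time to transfer the storage over the network, in seconds."""
    if net_speed_bits_per_sec <= 0:
        raise ValueError("net_speed_bits_per_sec must be positive")
    # Storage size is in bytes, network speed in bits per second.
    return storage_size_bytes * 8 / net_speed_bits_per_sec
```

Under this assumption, a slower configured network speed lengthens the estimate proportionally, which makes a full replication less likely to be chosen.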

These parameters are controlled via the JMX bean *ReplicationCluster:name=ClusterInfo/\{$MASTER\}* and are persisted in the master's configuration file.

*N.B.* The parameter *IncrementalUpdateLimit*, which controlled the old logic, has been removed.

h2. Heuristics

Generally, incremental updates are preferable because they affect only the worker node being updated, whereas a full replication also needs another worker to replicate from. A planned improvement is to leave the cluster in RW mode during incremental updates, which would make them even more preferable.

Therefore, the current heuristic is the following: a full replication is preferred only when it is considerably faster than the incremental update. How much faster is controlled by two parameters: *FullReplicationTimeFactor* and *MinTimeToConsiderFullReplicationS*. Let's say that the estimated duration of the incremental update is *incrementalDurationS* and the estimated duration of the full replication is *replicationDurationS*. GraphDB Enterprise will prefer the full replication when both of these are true:
# *incrementalDurationS > replicationDurationS * FullReplicationTimeFactor* \-\- this is the speed-up
# *incrementalDurationS > MinTimeToConsiderFullReplicationS* \-\- this handles the case when the relative difference is big but the absolute difference is small, e.g. 1s for a full replication vs. 2s for an incremental update.
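
The two conditions above can be sketched as a small function. This is an illustration of the stated rule only, not the actual GraphDB Enterprise code; the function name {{prefers_full_replication}} is hypothetical, and the default values are the ones listed in the parameter table.

```python
def prefers_full_replication(incremental_duration_s,
                             replication_duration_s,
                             full_replication_time_factor=1.3,
                             min_time_to_consider_full_replication_s=600):
    """Return True when both conditions for choosing full replication hold."""
    # Condition 1: full replication is faster by at least the configured factor.
    speed_up = incremental_duration_s > replication_duration_s * full_replication_time_factor
    # Condition 2: the incremental update is long enough in absolute terms to matter.
    long_enough = incremental_duration_s > min_time_to_consider_full_replication_s
    return speed_up and long_enough
```

With the defaults, an incremental estimate of 2s against a full replication estimate of 1s passes the first condition but fails the second, so the incremental update is kept. An incremental estimate of 1000s against 700s passes both, so full replication is preferred.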

h2. Logs

The old log message {{"Incremental update rejected because the difference is too big (}}{{{_}N{_}}} {{transactions)"}} is replaced by {{"Incremental update rejected because it would be slower than full replication"}}.

There are also two new log messages:
* {{"Couldn't find storage size"}} if the master cannot find a suitable worker to query its storage size;
* "{{Replication params: minTime =}} {{{*}MinTimeToConsiderFullReplicationS{*}}}{{s, replication factor =}} {{{*}FullReplicationTimeFactor{*}}}
{{Full replication: size =}} {{{*}storage-size{*}}} {{bytes, speed =}} {{{*}NetSpeedBitsPerSec{*}}} {{bits/sec}}
{{Incremental replication time:}} {{{*}estimated-incremental-replication-time{*}}}{{s}}
{{verdict \->}} {{{*}full{*}}} _or_ {{{*}incremental{*}}} {{replication}}", which reports the inputs to the decision and its outcome.