Safety in Ditto Server (bad things never happen) has been discussed at length above, in UST, and Transitions, and how Causally Consistent reads occur. Liveness, however, depends on every replica contributing to the UST. The UST (and GC timestamp) are calculated from gossip from every node. If any node is down, partitioned by the network, slow, or in some other way broken, it impacts the progress of the cluster. Yes, Ditto Server can still accept writes, and serve (ever staler) reads, but the UST wonât rise, Transitions wonât finish and GC will stall (leading to many versions on disk.) It is possible in future that we make some changes to how the UST is calculated, and use a quorum of nodes from a partition, or the single highest maximum transaction from a partition. The trade-off being that query routing becomes more complex, and in the event that the node that set the UST high then becomes unavailableâŚsomething has to give in terms of consistency. These trade-offs are mutable, we can re-visit them, we favoured safety in the current iteration of the design. If some nodes are keeping the UST down, and slowing or halting progress, the bad node(s) can be removed.Documentation Index
Fetch the complete documentation index at: https://docs.ditto.live/llms.txt
Use this file to discover all available pages before exploring further.
Bad Nodes
In this first iteration of Ditto Server we have the blunt, expedient tool available to us of killing and removing a node that is bad. The process is simple. Update the Current Config to remove the offending node(s) and signal the remaining servers. They will immediately no longer route to that node, use that node in their calculations, or listen to gossip from that node. This is fine as long as at least one node in each partition of the map is left standing. As soon as the offending nodes are removed the data is under-replicated. At once add replacement nodes by performing a transition as above. For example, imaginep1r2 has become unresponsive. Remove it from the Current Config, create a Next Config with a new server to take the place of** **p1r2, store the configs in the Strongly Consistent metadata store, and signal the nodes. The new node will begin to consume transactions and backfill, and the UST will rise etc.
Missing and Lost Data
As described in the first Backfill section, it is possible, with a long network incident and a short log retention policy, that some transactions are missed. If all the replicas for a partition miss some intersecting subset of transactions, that data has been missed, and it is lost. This should never happen. If it does, we donât want to throw away the Ditto Server cluster, and all the good data. Progress must still be made. In this case each replica of the partition understands from the**IntervalMaps that some transaction T** has been missed. After doing a strongly consistent read of the metadata store, to check that no server in the next config exists that may have the data, the replicas agree unilaterally to pretend that really they did store this data, and they splice it into their IntervalMaps. The UST rises, and progress is made.
It is essential to understand this is a disaster scenario, and not business as usual, but disasters happen, and they should be planned for. We do everything we can to never lose data, including a replicated durable transaction log with a long retention policy.