| Network Scenario | Description |
|---|---|
| Mesh networks | Helps replicate data across peers connected in the same mesh network, as well as between multiple disconnected mesh networks by way of multihop functionality. (Multihop Replication) |
| Centralized and decentralized systems | Handles data sync between centralized authorities, such as Ditto Server and, if applicable, your external database, and decentralized peer-to-peer networks. |
| Component | Description |
|---|---|
| Distributed database | The core storage nodes distributed (partitioned and replicated) across the platform. These database nodes consume the transaction log; commit data to disk; and exchange metadata, known as background gossip, with other storage nodes. (Distributed Database) |
| Transaction log | A log that captures each transaction, such as an upsert, update, or remove. (Transaction Log) |
| Subscription servers | Specialized nodes that extend Ditto Server functionality by bridging communication between all peers and various data sources, and that serve as caches to enhance query performance. (Subscription Servers) |
IntervalMap, and represents what has been observed in which slices of the keyspace. For example, if a server is responsible for the interval covering the first third of the keyspace, the server “splices” the observed transactions into the IntervalMap at that interval.
Imagine Server 1 is responsible for Interval 1. It receives transactions 1..=100 from the log and adds the data from those transactions to a local write transaction in RocksDB. It then splices into the IntervalMap the fact that it has seen the block of transactions 1..=100. We now say that the base for Interval 1 is 100. The server stores this updated IntervalMap, along with the data, in a write transaction to RocksDB.

Next the server receives transactions 150..=200 from the log. The server can detect that it has somehow missed transactions 101..=149. It can still observe and record the data from these new transactions and splice the information into the IntervalMap. The IntervalMap now has a base of 100 and a detached range of 150..=200.
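To make that bookkeeping concrete, here is a minimal sketch, assuming a simplified per-interval `TransactionSet` with a base counter and a list of detached ranges (the names and structure are illustrative, not Ditto's implementation):

```rust
use std::ops::RangeInclusive;

/// Illustrative per-interval bookkeeping: `base` means "every transaction
/// 1..=base has been observed"; `detached` holds observed ranges beyond a gap.
#[derive(Debug, Default)]
struct TransactionSet {
    base: u64,
    detached: Vec<RangeInclusive<u64>>,
}

impl TransactionSet {
    /// Splice a newly observed block of transactions into the set.
    fn splice(&mut self, block: RangeInclusive<u64>) {
        if *block.start() == self.base + 1 {
            // Contiguous with the base: extend it.
            self.base = *block.end();
        } else {
            // A gap exists: record the block as a detached range.
            self.detached.push(block);
        }
        // (A real implementation would also merge detached ranges back into
        // the base once the gap has been filled.)
    }
}

fn main() {
    let mut interval1 = TransactionSet::default();
    interval1.splice(1..=100);   // base becomes 100
    interval1.splice(150..=200); // 101..=149 missing: recorded as detached
    println!("{:?}", interval1); // base: 100, detached: [150..=200]
}
```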
Any server with detached ranges can look in the Partition Map to see if it has any peer replicas, and ask them for the detached range(s). This is an internal query in Ditto Server. If a peer replica has some or all of the missing transaction data, it sends that data to the requesting server, which splices the results into its IntervalMap and writes the data to disk. This way a server can recover any data it missed, assuming at least one replica stored that data. We call this Backfill.
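The gap a server must request is fully determined by its base and the detached range. A small helper illustrates the idea (hypothetical, not Ditto's API):

```rust
use std::ops::RangeInclusive;

/// Sketch of the Backfill gap computation: given the base and a detached
/// range, the missing transactions are everything between them.
fn missing_gap(base: u64, detached: &RangeInclusive<u64>) -> RangeInclusive<u64> {
    (base + 1)..=(detached.start() - 1)
}

fn main() {
    // With base 100 and detached range 150..=200, the server must
    // request 101..=149 from a peer replica of the same interval.
    assert_eq!(missing_gap(100, &(150..=200)), 101..=149);
}
```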
Nodes gossip their IntervalMaps; this is how the UST is calculated and how Backfill replicas are chosen.
Read on to “Missed/Lost Data” if you want to know how the cluster continues to make progress and function in the disastrous case that all replicas miss a transaction.
The IntervalMap, gossip, Backfill, UST, Read Transactions, and the GC timestamp all come together to facilitate “transitions,” which are how Ditto Server can scale up and down while remaining operational, available, and consistent.
p1r1 refers to the first replica of Partition 1, p2r2 to the second replica of Partition 2, and so on.

In the Current Config there are nodes p1r1, p1r2, p2r1, p2r2, p3r1, and p3r2. Two new nodes are started (p4r1, p4r2). A new Cluster Configuration is generated from the Current Configuration. This runs the Cut-Shift algorithm and produces a Next Configuration, with the partition map and intervals as per the diagram above.
We store the Current Config and the Next Config in a strongly consistent metadata store. Updating the metadata store causes the Current Config and Next Config files to be written to a location on disk for each Ditto Server Store node, and each node is signaled to re-read the configs.
The servers in p1-p3 are in both the Current Config and the Next Config. The servers in p4 are only in the Next Config.

A server will consume from the log if it is in either config. Servers in both configs store data in all intervals they own in both configs; in our example, each current-config server owns, in the Next Config, a sub-interval of its current ownership. The new servers in p4 start consuming from the log at once and gossip to all their peers in both configs.
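The membership rule is simple to state. Here is a hedged sketch of it, using the node names from the example (the function `should_consume` is hypothetical):

```rust
/// Sketch of the rule described above: a server consumes from the
/// transaction log if it appears in either configuration.
fn should_consume(node: &str, current: &[&str], next: &[&str]) -> bool {
    current.contains(&node) || next.contains(&node)
}

fn main() {
    let current = ["p1r1", "p1r2", "p2r1", "p2r2", "p3r1", "p3r2"];
    let next = ["p1r1", "p1r2", "p2r1", "p2r2", "p3r1", "p3r2", "p4r1", "p4r2"];
    assert!(should_consume("p1r1", &current, &next));  // in both configs
    assert!(should_consume("p4r1", &current, &next));  // only in Next Config
    assert!(!should_consume("p5r1", &current, &next)); // in neither
}
```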
Txn Id 1000. They must Backfill transactions 0-1000 from the owners of their intervals in the Current Configuration. They use the Current Configuration to calculate those owners, and the IntervalMaps from gossip to pick an owner to query for data that is no longer on the log. Recall that the UST is calculated from the base of the IntervalMap, but these new servers (which are only part of the new config) do not contribute to the UST until they have Backfilled.
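Choosing a Backfill source from gossip might look like the following sketch (illustrative only; Ditto's actual selection logic may weigh more factors):

```rust
/// Illustrative sketch: choose a Backfill source among the Current-Config
/// owners of an interval, using their gossiped bases. A candidate can serve
/// transactions up to `needed_up_to` only if its base has reached that point.
fn pick_backfill_source<'a>(
    owners: &[(&'a str, u64)], // (node name, gossiped base for the interval)
    needed_up_to: u64,
) -> Option<&'a str> {
    owners
        .iter()
        .find(|(_, base)| *base >= needed_up_to)
        .map(|(name, _)| *name)
}

fn main() {
    // A new p4 node needs transactions up to id 1000 for its interval.
    let owners = [("p1r1", 950), ("p1r2", 1000)];
    assert_eq!(pick_backfill_source(&owners, 1000), Some("p1r2"));
}
```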
Announce message, which contains the announcing server’s IntervalMap, GC timestamp, highest observed ClusterConfig epoch, and a boolean that indicates whether the server has backfilled in that ClusterConfig epoch.
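The fields listed above map naturally onto a message type. A hedged sketch in Rust, with assumed field names and placeholder types rather than Ditto's wire format:

```rust
use std::ops::RangeInclusive;

// Placeholder types for illustration only; document IDs and transaction
// sets are almost certainly richer types in the real system.
type DocumentId = u64;
type TransactionSet = Vec<RangeInclusive<u64>>;
// Simplified stand-in for the IntervalMap defined below: pairs of
// (interval of document IDs, transactions applied in that interval).
type IntervalMap = Vec<(RangeInclusive<DocumentId>, TransactionSet)>;

/// Sketch of the Announce gossip message as described above.
/// Field names and types are assumptions, not Ditto's wire format.
struct Announce {
    interval_map: IntervalMap,
    gc_timestamp: u64,
    cluster_config_epoch: u64, // highest observed ClusterConfig epoch
    backfilled: bool,          // has this server backfilled in that epoch?
}

fn main() {
    let msg = Announce {
        interval_map: vec![(0..=u64::MAX / 3, vec![1..=100, 150..=200])],
        gc_timestamp: 42,
        cluster_config_epoch: 7,
        backfilled: true,
    };
    println!(
        "epoch {} (gc {}), backfilled: {}",
        msg.cluster_config_epoch, msg.gc_timestamp, msg.backfilled
    );
}
```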
The IntervalMap is a mapping from ranges of document IDs, referred to as intervals, to TransactionSets. The IntervalMap expresses exactly which transactions have been applied in which intervals.
From this gossip, individual nodes calculate what they consider to be the UST. If every server gossips its local maximum committed transaction, then the UST is the minimum of those maximums.
The local maximum committed transaction is the highest transaction a node has committed to its local storage. Put differently, the UST represents a limit that should not be exceeded: the “upper limit.”
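As a minimal worked sketch (not Ditto's code), the UST computation reduces to a minimum over the gossiped maximums:

```rust
/// UST = the minimum of every server's local maximum committed transaction.
/// Every transaction at or below this id has been committed by every server.
fn ust(local_maximums: &[u64]) -> Option<u64> {
    local_maximums.iter().copied().min()
}

fn main() {
    // Three servers have committed up to 100, 95, and 120 respectively, so
    // no transaction past 95 is guaranteed to be on all of them.
    assert_eq!(ust(&[100, 95, 120]), Some(95));
}
```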
For more information, see Timestamps.