CDC is a premium feature that must first be enabled by request to the Ditto team. To enable CDC for your organization, contact us. Once your request is submitted, Ditto will be in touch to guide you through the next steps.
Combining Edge and Big Data
CDC enables asynchronous, real-time streaming of structured data change events directly to any Kafka consumer you choose. Just as Ditto’s SDK observes changes within the mesh as they happen, you can apply the same technique directly from Ditto Cloud. This empowers your external systems and applications to react immediately to data changes as they occur within Ditto, effectively connecting data at the edge to your cloud applications. Some real-world use cases include:
- Real-Time PoS Data: Immediately capture point-of-sale transactions from multiple retail locations, enabling real-time inventory management, dynamic pricing adjustments, and instant sales reporting.
- Operations Monitoring: Continuously stream event data from applications in the field to build a live, comprehensive view of business operations, helping you proactively identify issues and minimize downtime.
- Data Analytics: Egress data into third-party data lakes, data warehouses, or other systems of record, allowing you to centralize and persist data from your edge devices and power decision-making through advanced analytics.
Getting Started
Create a Kafka Consumer
If you’re new to Kafka, complete the steps provided in the official Apache Kafka Quickstart guide to set up and try out basic Kafka operations in your local environment.
Create a CDC Data Bridge
Data Bridges allow you to direct and organize data egress from your Ditto Cloud app to multiple destinations. You can create multiple Data Bridges, each with different filters, to send the right data where you need it to go.
1. From the Change Data Capture tab, create a new Kafka Data Bridge. Webhooks are also available; however, Kafka is our recommended Data Bridge type for most use cases.
To see and edit Data Bridges in the Change Data Capture tab, you’ll need an assigned role with the “Access app data bridges” and “Manage app data bridges” permissions, respectively. See Role-Based Access Control for more.
2. Choose whether you’d like the data filtered or unfiltered:
- A filtered Data Bridge is an effective way to deliver only the data relevant to your consuming cloud application. We recommend using filtered Data Bridges where possible, to reduce load on consumers and provide greater flexibility for future CDC changes.
- An unfiltered Data Bridge streams all changes across your entire app. This is useful when egressing all data to another data store or when routing data within your application.
3. If you’ve chosen a filtered Data Bridge, select your collection and query. The filter is specified using a DQL predicate; you can filter data just as you would after the `WHERE` clause of a DQL query. For example, to filter only blue cars, specify `color = 'blue'`. Or, to stream all data in the specified collection, set the query to `true`. You can test your query and preview the results within the editor before creating your Data Bridge.
4. Select the number of partitions to create for the Kafka topic. You may wish to adjust this depending on how many Kafka consumers your application is architected with; see Partitioning Strategies for details. If you’re unsure, 12 is our recommended default.
5. Name and create the Data Bridge.
Obtaining Certificates
Securely connect Ditto to your Kafka consumer by using certificates provided by Ditto to establish a Secure Sockets Layer (SSL) connection. To get your certificates, select a Data Bridge from the CDC page. Under the “Kafka consumer info” section, download and securely store the Cluster Certificate, Cluster Certificate Password, User Certificate, and User Certificate Password. Store the downloaded certificates securely to prevent unauthorized access.
Connecting
To connect your Kafka consumer to the Data Bridge, you’ll need the topic and consumer group prefix, in addition to the certificates and passwords you’ve obtained. These certificates and passwords correspond to the following SSL parameters in your consumer:
SSL Parameter | Ditto Name | Description |
---|---|---|
ssl.truststore.location | Cluster certificate | Certificate authority (CA) certificate in PKCS12 format. |
ssl.truststore.password | Cluster certificate password | Password to decrypt the CA certificate. |
ssl.keystore.location | User certificate | Location of user certificate in PKCS12 format. |
ssl.keystore.password | User certificate password | Password to decrypt the user certificate. |
Verifying Integration
To verify your CDC setup, we’ve provided a step-by-step guide. This assumes you’ve followed the Apache Kafka Quickstart. We’ll use the scripts located in the `bin` directory of the Kafka version you’ve downloaded, for example `kafka_2.13-3.9.0/bin`.
1. Copy and paste the following bash script into a terminal, replacing each variable with the relevant information displayed under the Kafka consumer info section of your Data Bridge.
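A minimal sketch of what that script can look like, assuming the Kafka quickstart directory layout; every value below (bootstrap server, topic, consumer group prefix, certificate paths, and passwords) is an illustrative placeholder for the corresponding detail shown under Kafka consumer info:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholders: replace with the values shown under "Kafka consumer info".
BOOTSTRAP_SERVER="<bootstrap server endpoint>"
TOPIC="<topic>"
CONSUMER_GROUP_PREFIX="<consumer group prefix>"

CLUSTER_CERT="/path/to/cluster-certificate.p12"   # Cluster certificate (PKCS12)
CLUSTER_CERT_PASSWORD="<cluster certificate password>"
USER_CERT="/path/to/user-certificate.p12"         # User certificate (PKCS12)
USER_CERT_PASSWORD="<user certificate password>"

# Write an SSL client configuration for the Kafka CLI tools.
cat > consumer.properties <<EOF
security.protocol=SSL
ssl.truststore.type=PKCS12
ssl.truststore.location=${CLUSTER_CERT}
ssl.truststore.password=${CLUSTER_CERT_PASSWORD}
ssl.keystore.type=PKCS12
ssl.keystore.location=${USER_CERT}
ssl.keystore.password=${USER_CERT_PASSWORD}
EOF

# Consume events from the Data Bridge topic.
./kafka_2.13-3.9.0/bin/kafka-console-consumer.sh \
  --bootstrap-server "${BOOTSTRAP_SERVER}" \
  --topic "${TOPIC}" \
  --group "${CONSUMER_GROUP_PREFIX}-getting-started" \
  --consumer.config consumer.properties
```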
You’ve successfully connected if no errors display in your terminal, and the script continues running without shutting down.
2. Next, verify that your connected script receives an event every time a change replicates to your Cloud App. The simplest way to do this is to use the DQL editor in the Ditto Portal to `INSERT` a document that matches the filter of your Data Bridge. For example, given a Data Bridge with the following filter:
- Collection: `cars`
- Query: `color = 'blue'`
you could insert a matching document with:
INSERT INTO cars DOCUMENTS ({ 'color': 'blue' })
Alternatively, you can trigger changes by performing an `INSERT` with the SDK or HTTP API. Your changes are propagating successfully if you see an event for the inserted document delivered in your consumer console; the event payload format is described under CDC Events Explained below.
Updating CDC Data Bridges
At present, CDC Data Bridges cannot be edited in place. Instead, create a new Data Bridge with the updated filters you require and switch your Kafka consumer over to the new Data Bridge. If you’re updating the number of partitions, you can create a new Data Bridge with an identical filter and simply specify the number of partitions you need.

It’s important to note that new Data Bridges begin streaming changes from the time of creation; they do not provide historical change events that occurred prior to their creation. Therefore, if you require a filter that overlaps with an existing Data Bridge and want to maintain uninterrupted streaming, you may wish to configure your application or storage system to consume from the old and new Data Bridge concurrently. This can be achieved by keeping a record, for each event, of the combination of `txnId` and the document’s `_id`. As no two events for a given document share the same `txnId`, this combination can be used to deduplicate events received from both streams, as sketched below.
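A minimal sketch of that deduplication, in Python. It assumes your consumers have already decoded each CDC event and extracted the document’s `_id` and the event’s `txnId` (see CDC Events Explained below for where these fields live per event type); everything else here is illustrative.

```python
# Deduplicate CDC events consumed from both the old and the new Data Bridge.
# Because no two events for a given document share the same txnId, the
# (_id, txnId) pair uniquely identifies an event across both streams.
seen: set[tuple[str, int]] = set()

def should_process(doc_id: str, txn_id: int) -> bool:
    """Return True only for the first copy of an event seen on either stream."""
    key = (doc_id, txn_id)
    if key in seen:
        return False
    seen.add(key)
    return True

# Feed events from both consumers through the same check, e.g.:
#   if should_process(doc_id, event["txnId"]):
#       handle(event)
# In a real deployment you would persist this record (or prune it once the
# old Data Bridge is retired) rather than keep an unbounded in-memory set.
```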
Advanced Configuration
Consumer Group Configuration
Kafka Consumer Groups are commonly used to allow multiple individual consumers to consume from the same topic, improving resilience and throughput. Ditto Data Bridges support a consumer group prefix, which is displayed in the Data Bridge consumer info. This allows multiple consumer groups to independently read from the same topic without interfering with each other. To do this, each group must have a unique ID that starts with the prefix shown in the portal.
Example
Given a Kafka Data Bridge with topic `e18540db-85a7-4079-8fad-c486190a752a` and consumer group prefix `1e3a1a13-93d7-48ce-8633-c82b78d48451`, our getting-started example could be adapted to run in two independent consumer groups as follows.
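The sketches below reuse the `consumer.properties` file and placeholder variables from the getting-started script; the `-a` and `-b` suffixes are illustrative, the only requirement being that each group ID begins with the prefix shown in the portal.

Consumer Group A

```bash
./kafka_2.13-3.9.0/bin/kafka-console-consumer.sh \
  --bootstrap-server "${BOOTSTRAP_SERVER}" \
  --topic "e18540db-85a7-4079-8fad-c486190a752a" \
  --group "1e3a1a13-93d7-48ce-8633-c82b78d48451-a" \
  --consumer.config consumer.properties
```

Consumer Group B

```bash
./kafka_2.13-3.9.0/bin/kafka-console-consumer.sh \
  --bootstrap-server "${BOOTSTRAP_SERVER}" \
  --topic "e18540db-85a7-4079-8fad-c486190a752a" \
  --group "1e3a1a13-93d7-48ce-8633-c82b78d48451-b" \
  --consumer.config consumer.properties
```

Both groups receive every event for the topic, while consumers within the same group split the partitions between them.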
Partitioning Strategies
You can customize the number of partitions the underlying Kafka topic is deployed with, from 1 to 100. It’s disruptive to change the partition count later, so choose it carefully.

The Kafka topic partition count limits the ability of the consumer group to scale. Kafka consumers track their progress on a per-partition basis, so different threads or instances of consumers in a consumer group need to consume from different partitions. Each consumer instance in a consumer group subscribes to a disjoint set of partitions, so that they consume different messages. However, with too many partitions, Kafka broker performance may suffer. In practice, it’s often difficult to scale a consumer group beyond one thread per topic partition. It’s also important that each consumer in the consumer group is subscribed to the same number of partitions, so that none fall behind.

Select a partition count that is around twice the maximum number of threads you anticipate needing to process the messages; this allows for a little growth. Partition counts with many integer factors provide smoother scaling, which may reduce your hosting costs. 12 is enough for most situations and allows 1, 2, 3, 4, 6, or 12 consumers to process the topic in parallel. For higher-throughput change feeds, 24, 36, 48, or 60 are good partition counts.

We suggest load testing in a development environment prior to launching to production, adjusting your consumers as needed; if you still aren’t achieving the throughput you need, create a Data Bridge with a higher number of partitions. These Confluent docs are a good starting point if you want to learn more: https://developer.confluent.io/learn-more/kafka-on-the-go/partitions/

CDC Events Explained
Once your consumer is active, events appear as a JSON message stream in your Kafka console whenever a change successfully replicates to Ditto Server.
Event Ordering
CDC guarantees event ordering at the individual document level. In other words, all changes for a specific document will be delivered sequentially, and each event you receive reflects the latest known state of that document. However, CDC does not guarantee global ordering across multiple documents. For example, consider the following sequence of updates on two separate documents, `A` and `B`:
1. `UPDATE cars SET color = 'blue' WHERE _id = 'A'`
2. `UPDATE cars SET color = 'red' WHERE _id = 'B'`
3. `UPDATE cars SET color = 'green' WHERE _id = 'A'`
Because updates 1 and 3 affect the same document, CDC will always deliver 1 before 3, while update 2 may arrive at any point relative to them. The events may therefore be delivered in any of the following orders:
- 1, 2, 3
- 1, 3, 2
- 2, 1, 3
In every case, the final events received reflect `{'_id': 'A', 'color': 'green'}` and `{'_id': 'B', 'color': 'red'}`.
Standard Fields
The structure of the message stream includes both standard fields and fields that are specific to each event. The following table provides an overview of the standard set of fields included in every message.
Field | Description |
---|---|
txnId | The timestamp for when Ditto Server internally replicates data modifications from small peers. (This timestamp is an always‑increasing value.) |
type | The type of event stream. |
collection | The collection that the changed document belongs to. |
change.method | The method that executed the event. |
New Document Events
When a document is created (for instance, in response to an `INSERT` DQL operation), an `upsert` CDC event is created.
A new `upsert` event will look something like the following:
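(The payload below is illustrative: the field layout follows the tables in this section, while the `type` value, `txnId`, and document contents are representative placeholders rather than captured output.)

```json
{
  "txnId": 552789,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "upsert",
    "oldValue": null,
    "newValue": {
      "_id": "abc123",
      "color": "blue"
    }
  }
}
```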
The fields specific to an `upsert` event are as follows:
Field | Description |
---|---|
change.oldValue | The previous state of the document; since the document did not previously exist, the change.oldValue field is always set to null. |
change.newValue | The current state of the document created as a result of the upsert operation. |
Updated Document Events
When a document is updated (for instance, in response to an `UPDATE` DQL operation), an `upsert` CDC event is created (similar to `INSERT`s above).
For example, given an existing document, an `update` operation that modifies it produces an `upsert` event in which `change.oldValue` contains the document’s prior state and `change.newValue` contains its updated state.
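As an illustrative sketch (values are placeholders): starting from the document `{'_id': 'abc123', 'color': 'blue'}`, running `UPDATE cars SET color = 'red' WHERE _id = 'abc123'` would produce an event shaped like:

```json
{
  "txnId": 552790,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "upsert",
    "oldValue": {
      "_id": "abc123",
      "color": "blue"
    },
    "newValue": {
      "_id": "abc123",
      "color": "red"
    }
  }
}
```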
Removed Document Events
When a document is removed (for instance, in response to a `DELETE` DQL operation), a `remove` CDC event is created.
A new `remove` event will look something like the following:
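(Again illustrative; the layout mirrors the `upsert` examples above, with a single `change.value` field.)

```json
{
  "txnId": 552791,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "remove",
    "value": {
      "_id": "abc123",
      "color": "red"
    }
  }
}
```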
Field | Description |
---|---|
change.value | Indicates the full document at the time of its removal. |
Evicted Document Events
When a document is evicted from Ditto Server, an `evict` CDC event is created.
A new `evict` event will look something like the following:
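(Illustrative; identical in shape to a `remove` event apart from the method.)

```json
{
  "txnId": 552792,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "evict",
    "value": {
      "_id": "abc123",
      "color": "red"
    }
  }
}
```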
If the document has already been deleted at the time of eviction, the `value` field will be `null`.
Field | Description |
---|---|
change.value | Indicates the full document at the time of its eviction; null if the document has already been deleted. |
Snapshot Events
When CDC experiences recovery scenarios (when catching up after falling behind), it may emit `snapshot` events that represent the current state of a document.
A `snapshot` event provides the full document state at the time of recovery:
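(Illustrative sketch; the exact `type` and `method` values for snapshot events are assumptions here, but `change.value` carries the complete document as described below.)

```json
{
  "txnId": 552793,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "snapshot",
    "value": {
      "_id": "abc123",
      "color": "red"
    }
  }
}
```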
Field | Description |
---|---|
change.value | The complete current state of the document. Unlike other event types, this represents the full document, not a change operation. |
Snapshot events are emitted during CDC recovery and do not represent individual change operations. They provide the current state of documents when CDC cannot guarantee delivery of all intermediate changes.
Event Failures
When the producer fails to keep up with the incoming changes, a `requeryRequired` event type displays in your Kafka console.
Following is an example of a `requeryRequired` event stream:
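(Values are illustrative; note the deprecated `documents` field, which is present but always empty.)

```json
{
  "txnId": 552794,
  "type": "requeryRequired",
  "collection": "cars",
  "documents": []
}
```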
The `documents` field is now deprecated and will only return an empty list, as demonstrated in the previous snippet.
Responding to Streaming Failures
To maintain data integrity, if you receive a `requeryRequired` message, invoke the HTTP API to refresh your system with the entire dataset specified by your Data Bridge’s filter.
Use the event’s `txnId` as the value of the `X-DITTO-TXN-ID` header in the HTTP API call to ensure the data retrieved is not stale.
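A hedged sketch of attaching that header with curl; the endpoint URL, authorization, and request body below are placeholders for your app’s HTTP API call rather than exact values:

```bash
# Only the X-DITTO-TXN-ID header is specific to CDC recovery; substitute your
# app's HTTP API endpoint, credentials, and DQL request body (matching your
# Data Bridge's filter) per the HTTP API documentation.
curl -X POST "$DITTO_HTTP_API_URL" \
  -H "Authorization: Bearer $DITTO_API_TOKEN" \
  -H "X-DITTO-TXN-ID: $TXN_ID_FROM_EVENT" \
  -H "Content-Type: application/json" \
  -d "$DQL_REQUEST_BODY"
```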
Webhooks
Ditto recommends using Kafka-based Data Bridges wherever possible, as they offer higher throughput and delivery reliability. However, in situations where you aren’t able to run a Kafka consumer, Data Bridges also support Webhooks as a destination type. Creating a Webhook Data Bridge is similar to creating a Kafka-based Data Bridge.
1. From the Change Data Capture tab, create a new Webhook Data Bridge.
2. Provide a URL that points to your webhook. It must be reachable from the public internet and able to receive an HTTP POST request. Ditto can provide the public IP addresses that CDC uses for your app if you need to make changes to any firewalls or network policies.
3. Select a collection and define a DQL filter for your Webhook Data Bridge.
4. Name and create the Data Bridge.
5. Once the Data Bridge has finished creating, a secret will be available under the “Webhook details” section. This secret is passed to your webhook in the `Ditto-Signature` header. You can inspect this in your webhook’s logic to verify the authenticity of the request, ensuring that it comes from your Ditto Data Bridge.
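A minimal sketch of such a check, in Python, using only the standard library. It assumes the secret is stored in an environment variable named `DITTO_WEBHOOK_SECRET` and that, as described above, the Data Bridge secret arrives verbatim in the `Ditto-Signature` header; the port and handler structure are illustrative.

```python
import hmac
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# The secret shown under "Webhook details" for your Data Bridge.
WEBHOOK_SECRET = os.environ["DITTO_WEBHOOK_SECRET"]

class CdcWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject requests whose Ditto-Signature header doesn't match the secret.
        signature = self.headers.get("Ditto-Signature", "")
        if not hmac.compare_digest(signature, WEBHOOK_SECRET):
            self.send_response(401)
            self.end_headers()
            return

        # Read the CDC event payload and hand it off to your application.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        print(payload.decode("utf-8", errors="replace"))

        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CdcWebhookHandler).serve_forever()
```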