Change Data Capture
Use Apache Kafka-powered Change Data Capture (CDC) to enable real-time event streaming from Ditto Cloud to your applications and third-party data solutions.
Change Data Capture (CDC) observes changes within your apps hosted on Ditto Cloud and automatically delivers change events to the user-consumable Kafka topics you’ve configured.
CDC is a premium feature which must first be enabled by request to the Ditto team.
To enable CDC for your organization, contact us.
Once submitted, Ditto will be in touch to guide you through the next steps.
Combining Edge and Big Data
CDC enables asynchronous, real-time streaming of structured data change events directly to any Kafka consumer you choose.
Just as Ditto’s SDK lets you observe changes within the mesh as they happen, CDC lets you apply the same techniques directly from Ditto Cloud.
This empowers your external systems and applications to immediately react to data changes as they occur within Ditto. In doing so, you can effectively connect data at the edge to your cloud applications.
Some real-world use cases include:
- Real-Time PoS Data: Immediately capture point-of-sale transactions from multiple retail locations, enabling real-time inventory management, dynamic pricing adjustments, and instant sales reporting.
- Operations Monitoring: Continuously stream event data from applications in the field to build a live, comprehensive view of business operations, helping you proactively identify issues and minimize downtime.
- Data Analytics: Egress data into third-party data lakes, data warehouses, or other systems of record, allowing you to centralize and persist data from your edge devices and power decision-making through advanced analytics.
Getting Started
Create a Kafka Consumer
If you’re new to Kafka, complete the steps provided in the official Apache Kafka Quickstart guide to set up and try out basic Kafka operations in your local environment.
Create a CDC Data Bridge
Data Bridges allow you to direct and organize data egress from your Ditto Cloud app to multiple destinations.
You can create multiple Data Bridges, each with different filters, to send the right data where you need it to go.
From the Change Data Capture tab, create a new Kafka Data Bridge.
Webhooks are also available; however, Kafka is our recommended Data Bridge type for most use cases.
Choose whether you’d like the data filtered or unfiltered:
- A filtered Data Bridge is an effective way to deliver only the data relevant to your consuming cloud application.
We recommend using filtered Data Bridges where possible, to reduce load on consumers and provide greater flexibility for future CDC changes.
- An unfiltered Data Bridge streams all changes across your entire app.
This is useful when egressing all data to another data store or when routing data within your application.
If you’ve chosen a filtered Data Bridge, select your collection and query.
The filter is specified using a DQL predicate: you can filter data just as you would after the WHERE clause of a DQL query.
For example, to stream only blue cars, specify: color = 'blue'. Or, to stream all data in the specified collection, set the query to true.
You can test your query within the editor before creating your Data Bridge and preview the results.
Name and create the Data Bridge.
You should now be able to see your first Data Bridge in the Ditto Portal!
Obtaining Certificates
Securely connect Ditto to your Kafka consumer by using certificates provided by Ditto to establish a Secure Sockets Layer (SSL) connection.
To get your certificates, select a Data Bridge from the CDC page.
Under the “Kafka consumer info” section, download and securely store the Cluster Certificate, Cluster Certificate Password, User Certificate, and User Certificate Password.
Store downloaded certificates securely to prevent unauthorized access.
Connecting
To connect your Kafka consumer to the Data Bridge, you’ll need the topic and consumer group prefix, in addition to the certificates and passwords you’ve obtained.
These certificates and passwords correspond to the following SSL parameters in your consumer:
SSL Parameter | Ditto Name | Description |
---|---|---|
ssl.truststore.location | Cluster certificate | Certificate authority (CA) certificate in PKCS12 format. |
ssl.truststore.password | Cluster certificate password | Password to decrypt the CA certificate. |
ssl.keystore.location | User certificate | Location of user certificate in PKCS12 format. |
ssl.keystore.password | User certificate password | Password to decrypt the user certificate. |
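For example, these parameters can be collected into a Kafka consumer configuration file. The sketch below is illustrative: the file name, certificate paths, and placeholder passwords are assumptions you’d replace with the values from your Data Bridge.

```properties
# consumer.properties (illustrative) -- substitute the values shown under
# the Kafka consumer info section of your Data Bridge.
security.protocol=SSL

# Cluster certificate (CA) and its password
ssl.truststore.type=PKCS12
ssl.truststore.location=/path/to/cluster-certificate.p12
ssl.truststore.password=<cluster-certificate-password>

# User certificate and its password
ssl.keystore.type=PKCS12
ssl.keystore.location=/path/to/user-certificate.p12
ssl.keystore.password=<user-certificate-password>
```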
Verifying Integration
To verify your CDC setup, follow the step-by-step guide below. It assumes you’ve completed the Apache Kafka Quickstart.
We’ll use the scripts located in the bin directory of the Kafka version you’ve downloaded, for example: kafka_2.13-3.9.0/bin.
Copy and paste the following bash script into a terminal, and then replace each variable with the relevant information displayed under the Kafka consumer info section of your Data Bridge.
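A sketch of such a script, assuming a console consumer run from the Kafka bin directory and the consumer.properties file shown above (the bootstrap server, topic, and consumer group values are placeholders):

```bash
#!/usr/bin/env bash
# Illustrative sketch -- replace each placeholder with the values shown under
# the Kafka consumer info section of your Data Bridge.
BOOTSTRAP_SERVER="<kafka-endpoint:port>"
TOPIC="<your-topic>"
CONSUMER_GROUP="<consumer-group-prefix>-verify"   # must begin with your consumer group prefix

# Stream events from the Data Bridge over SSL using the certificate settings
# defined in consumer.properties (see the Connecting section).
./kafka-console-consumer.sh \
  --bootstrap-server "$BOOTSTRAP_SERVER" \
  --topic "$TOPIC" \
  --group "$CONSUMER_GROUP" \
  --consumer.config /path/to/consumer.properties
```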
You’ve successfully connected if no errors display in your terminal, and the script continues running without shutting down.
Next, we want to verify that events are received by your connected script every time a change replicates to your Cloud App.
The simplest way to do this is by using the DQL editor in the Ditto Portal to INSERT a document that matches the filter of your Data Bridge.
For example, given a Data Bridge with a filter of:
- Collection: cars
- Query: color = 'blue'
We could run: INSERT INTO cars DOCUMENTS ({ 'color': 'blue' })
Alternatively, you can trigger changes by performing an INSERT with the SDK or HTTP API.
Your changes are propagating successfully if you see an event delivered in your consumer console.
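For example, the INSERT above might produce an event along the following lines; the type string, txnID, and _id values shown are illustrative placeholders:

```json
{
  "txnID": 552741,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "upsert",
    "oldValue": null,
    "newValue": { "_id": "651a2f3c0012ab34cd56ef78", "color": "blue" }
  }
}
```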
Updating CDC Data Bridges
At present, CDC Data Bridges cannot be edited in-place.
Instead, create a new Data Bridge with the updated filters you require and switch your Kafka consumer over to the new Data Bridge.
It’s important to note that new Data Bridges begin streaming changes from the time of creation. In other words, they do not provide historical change events that occurred prior to their creation.
Therefore, if you require a filter that overlaps with an existing Data Bridge, and want to maintain uninterrupted streaming, you may wish to configure your application or storage system to consume from the old and new Data Bridge concurrently.
This can be achieved by keeping a record, for each event, of the combination of txnId and the document’s _id. As no two events for a given document will share the same txnId, the combination can be used to deduplicate events received from both streams.
CDC Events Explained
Once your consumer is active, events appear as a JSON message stream in your Kafka console whenever a change successfully replicates to the Big Peer.
Event Ordering
CDC guarantees event ordering at the individual document level. In other words, all changes for a specific document will be delivered sequentially, and each event you receive reflects the latest known state of that document.
However, CDC does not guarantee global ordering across multiple documents.
For example, consider the following sequence of updates on two separate documents, A and B:
1. UPDATE cars SET color = 'blue' WHERE _id = 'A'
2. UPDATE cars SET color = 'red' WHERE _id = 'B'
3. UPDATE cars SET color = 'green' WHERE _id = 'A'
Possible event orders you might receive are:
- 1, 2, 3
- 1, 3, 2
- 2, 1, 3
Regardless of order across different documents, updates to the same document will always arrive in sequence.
Therefore, the net effect once all events are consumed is the same: {'_id': 'A', 'color': 'green'} and {'_id': 'B', 'color': 'red'}.
Standard Fields
The structure of the message stream includes both standard fields and fields that are specific to each event.
The following table provides an overview of the standard set of fields that are included in every message.
Field | Description |
---|---|
txnID | The timestamp for when the Big Peer internally replicates data modifications from small peers. (This timestamp is an always‑increasing value.) |
type | The type of event stream. |
collection | The collection that the changed document belongs to. |
change.method | The method that executed the event. |
change.oldValue | The previous contents of the document |
change.newValue | The current contents of the document |
New Document Events
When new documents are created (for instance, in response to an INSERT DQL operation), an upsert CDC event is created.
A new upsert event will look something like the following:
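This sketch is illustrative, based on the standard fields described above; the type string and txnID values are placeholders:

```json
{
  "txnID": 552789,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "upsert",
    "oldValue": null,
    "newValue": { "_id": "car_123", "color": "blue", "make": "Honda" }
  }
}
```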
The fields on an upsert event are as follows:
Field | Description |
---|---|
change.oldValue | The previous state of the document; since the document did not previously exist, the change.oldValue field is always set to null . |
change.newValue | The current state of the document created as a result of the upsert operation |
Updated Document Events
When existing documents are updated (for instance, in response to an UPDATE DQL operation), an update CDC event is created.
For example, given an existing document that is then modified by an update operation, you would receive a JSON event containing both the document’s previous and current state.
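The sketch below illustrates the flow with placeholder values (the type string and txnID are assumptions, and your documents will contain your own fields). Suppose the following document already exists in the cars collection:

```json
{ "_id": "A", "color": "blue" }
```

and the following update is applied:

```sql
UPDATE cars SET color = 'green' WHERE _id = 'A'
```

You would then receive an update event along these lines:

```json
{
  "txnID": 552873,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "update",
    "oldValue": { "_id": "A", "color": "blue" },
    "newValue": { "_id": "A", "color": "green" }
  }
}
```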
The following table provides an overview of the fields that carry information specific to the update event.
Field | Description |
---|---|
change.oldValue | The previous state of the document. |
change.newValue | The current state of the document (note that this is the full document, not the diff of the change). |
Removed Document Events
When documents are removed (for instance, in response to a TOMBSTONE DQL operation), a remove CDC event is created.
A new remove event will look something like the following:
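This sketch is illustrative; the type string and txnID values are placeholders:

```json
{
  "txnID": 552901,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "remove",
    "value": { "_id": "A", "color": "green" }
  }
}
```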
Field | Description |
---|---|
change.value | Indicates the full document at the time of its removal. |
Event Failures
When the producer fails to keep up with the incoming changes, a requeryRequired event type appears in your Kafka console.
Following is an example of a requeryRequired event:
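The snippet below is an illustrative sketch; field values are placeholders:

```json
{
  "txnID": 552950,
  "type": "requeryRequired",
  "collection": "cars",
  "documents": []
}
```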
The documents field is now deprecated and will only return an empty list, as demonstrated in the previous snippet.
Responding to Streaming Failures
To maintain data integrity, if you receive a requeryRequired message, invoke the HTTP API to update your system for the entire dataset, as specified in your Data Bridge’s filter.
Use the event’s txnID as the value of X-DITTO-TXN-ID in the HTTP API call to ensure the data retrieved is not stale.
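As a rough sketch, such a requery might look like the following curl call. The endpoint path, bearer token, and DQL statement are assumptions to adapt to your app per the HTTP API reference; the point shown here is passing the event’s txnID in the X-DITTO-TXN-ID header:

```bash
# Re-fetch the dataset covered by the Data Bridge's filter, pinning the read
# to the transaction ID from the requeryRequired event.
# The URL path and request body are illustrative assumptions.
curl -X POST "https://<your-app-id>.cloud.ditto.live/api/v4/store/execute" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -H "X-DITTO-TXN-ID: 552950" \
  -d '{ "statement": "SELECT * FROM cars WHERE color = '\''blue'\''" }'
```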
Webhooks
Ditto recommends using Kafka-based Data Bridges wherever possible, as they offer higher throughput and delivery reliability.
However, in situations where you aren’t able to run a Kafka consumer, Data Bridges also support Webhooks as a destination type.
Creating a Webhook Data Bridge is similar to how you’d create a Kafka-based Data Bridge.
From the Change Data Capture tab, create a new Webhook Data Bridge.
Provide a URL that points to your webhook.
It must be reachable from the public internet and able to receive an HTTP POST request. Ditto can provide the public IP addresses that CDC uses for your app if you need to make changes to any firewalls or network policies.
Select a collection, and define a DQL filter for your Webhook Data Bridge.
Name and create the Data Bridge.
Once the Data Bridge has been created, a secret will be available under the “Webhook details” section.
This secret is passed to your webhook under the Ditto-Signature header.
You can inspect this in your webhook’s logic to verify the authenticity of the request, ensuring that it comes from your Ditto Data Bridge.
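To smoke-test your endpoint before relying on it, you can send it a request shaped like the ones the Data Bridge will deliver. The URL, secret value, and event body below are illustrative placeholders:

```bash
# POST a sample payload to your webhook with the shared secret in the
# Ditto-Signature header; your handler should accept it and verify the header.
# The event body is illustrative only.
curl -X POST "https://example.com/ditto-webhook" \
  -H "Content-Type: application/json" \
  -H "Ditto-Signature: <secret-from-webhook-details>" \
  -d '{ "type": "documentChanged", "collection": "cars", "change": { "method": "upsert", "oldValue": null, "newValue": { "_id": "A", "color": "blue" } } }'
```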
Legacy Webhooks
Ditto CDC now uses DQL. Earlier versions of Ditto’s CDC offering supported Webhooks defined using the Legacy Query Language.
Webhooks defined using the Legacy Query Language are marked as “Legacy” in the portal. These will continue to function, but do not support any further edits.
If required, we recommend creating an equivalent Webhook Data Bridge, written using a DQL filter, per the steps in Webhooks.
Please contact Ditto should you need any support with this.