Change Data Capture
Use Apache Kafka-powered Change Data Capture (CDC) to enable real-time event streaming from Ditto Cloud to your applications and third-party data solutions.
Change Data Capture (CDC) observes changes within your apps hosted on Ditto Cloud and automatically delivers change events to the user-consumable Kafka topics you’ve configured.
CDC is a premium feature which must first be enabled by request to the Ditto team.
To enable CDC for your organization, contact us.
Once submitted, Ditto will be in touch to guide you through the next steps.
Combining Edge and Big Data
CDC enables asynchronous, real-time streaming of structured data change events directly to any Kafka consumer you choose.
Just as Ditto’s SDK lets you observe changes within the mesh as they happen, CDC lets you apply the same techniques directly from Ditto Cloud.
This empowers your external systems and applications to immediately react to data changes as they occur within Ditto. In doing so, you can effectively connect data at the edge to your cloud applications.
Some real-world use cases include:
- Real-Time PoS Data: Immediately capture point-of-sale transactions from multiple retail locations, enabling real-time inventory management, dynamic pricing adjustments, and instant sales reporting.
- Operations Monitoring: Continuously stream event data from applications in the field to build a live, comprehensive view of business operations, helping you proactively identify issues and minimize downtime.
- Data Analytics: Egress data into third-party data lakes, data warehouses, or other systems of record, allowing you to centralize and persist data from your edge devices and power decision-making through advanced analytics.
Getting Started
Create a Kafka Consumer
If you’re new to Kafka, complete the steps provided in the official Apache Kafka Quickstart guide to set up and try out basic Kafka operations in your local environment.
Create a CDC Data Bridge
Data Bridges allow you to direct and organize data egress from your Ditto Cloud app to multiple destinations.
You can create multiple Data Bridges, each with different filters, to send the right data where you need it to go.
From the Change Data Capture tab, create a new Kafka Data Bridge.
Webhooks are also available; however, Kafka is our recommended Data Bridge type for most use cases.
Choose whether you’d like the data filtered or unfiltered:
- A filtered Data Bridge is an effective way to deliver only the data relevant to your consuming cloud application.
We recommend using filtered Data Bridges where possible, to reduce load on consumers and provide greater flexibility for future CDC changes.
- An unfiltered Data Bridge streams all changes across your entire app.
This is useful when egressing all data to another data store or when routing data within your application.
If you’ve chosen a filtered Data Bridge, select your collection and query.
The filter is specified using a DQL predicate: you can filter data just as you would after the WHERE clause of a DQL query.
For example, to stream only blue cars, specify: color = 'blue'. Or, to stream all data in the specified collection, set the query to true.
You can test your query within the editor before creating your Data Bridge and preview the results.
Name and create the Data Bridge.
You should now be able to see your first Data Bridge in the Ditto Portal!
Obtaining Certificates
Securely connect Ditto to your Kafka consumer by using certificates provided by Ditto to establish a Secure Sockets Layer (SSL) connection.
To get your certificates, select a Data Bridge from the CDC page.
Under the “Kafka consumer info” section, download and securely store the Cluster Certificate, Cluster Certificate Password, User Certificate, and User Certificate Password.
Store downloaded certificates securely to prevent unauthorized access.
Connecting
To connect your Kafka consumer to the Data Bridge, you’ll need the topic and consumer group prefix, in addition to the certificates and passwords you’ve obtained.
These certificates and passwords correspond to the following SSL parameters in your consumer:
SSL Parameter | Ditto Name | Description |
---|---|---|
ssl.truststore.location | Cluster certificate | Certificate authority (CA) certificate in PKCS12 format. |
ssl.truststore.password | Cluster certificate password | Password to decrypt the CA certificate. |
ssl.keystore.location | User certificate | Location of user certificate in PKCS12 format. |
ssl.keystore.password | User certificate password | Password to decrypt the user certificate. |
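For example, these parameters can be collected into a Kafka consumer configuration file. The sketch below is illustrative: the file name, certificate paths, and placeholder passwords are assumptions you’d replace with the values from your Data Bridge.

```properties
# consumer.properties (illustrative) -- substitute the values shown under
# the Kafka consumer info section of your Data Bridge.
security.protocol=SSL

# Cluster certificate (CA) and its password
ssl.truststore.type=PKCS12
ssl.truststore.location=/path/to/cluster-certificate.p12
ssl.truststore.password=<cluster-certificate-password>

# User certificate and its password
ssl.keystore.type=PKCS12
ssl.keystore.location=/path/to/user-certificate.p12
ssl.keystore.password=<user-certificate-password>
```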
Verifying Integration
To verify your CDC setup, follow the step-by-step guide below. It assumes you’ve completed the Apache Kafka Quickstart.
We’ll use the scripts located in the bin directory of the Kafka version you’ve downloaded, for example: kafka_2.13-3.9.0/bin.
Copy and paste the following bash script into a terminal, and then replace each variable with the relevant information displayed under the Kafka consumer info section of your Data Bridge.
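A sketch of such a script, assuming a console consumer run from the Kafka bin directory and the consumer.properties file shown above (the bootstrap server, topic, and consumer group values are placeholders):

```bash
#!/usr/bin/env bash
# Illustrative sketch -- replace each placeholder with the values shown under
# the Kafka consumer info section of your Data Bridge.
BOOTSTRAP_SERVER="<kafka-endpoint:port>"
TOPIC="<your-topic>"
CONSUMER_GROUP="<consumer-group-prefix>-verify"   # must begin with your consumer group prefix

# Stream events from the Data Bridge over SSL using the certificate settings
# defined in consumer.properties (see the Connecting section).
./kafka-console-consumer.sh \
  --bootstrap-server "$BOOTSTRAP_SERVER" \
  --topic "$TOPIC" \
  --group "$CONSUMER_GROUP" \
  --consumer.config /path/to/consumer.properties
```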
You’ve successfully connected if no errors display in your terminal, and the script continues running without shutting down.
Next, we want to verify that events are received by your connected script every time a change replicates to your Cloud App.
The simplest way to do this is by using the DQL editor in the Ditto Portal to INSERT a document that matches the filter of your Data Bridge.
For example, given a Data Bridge with a filter of:
- Collection: cars
- Query: color = 'blue'
We could run: INSERT INTO cars DOCUMENTS ({ 'color': 'blue' })
Alternatively, you can trigger changes by performing an INSERT with the SDK or HTTP API.
Your changes are propagating successfully if you see an event delivered in your consumer console.
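For example, the INSERT above might produce an event along the following lines; the type string, txnID, and _id values shown are illustrative placeholders:

```json
{
  "txnID": 552741,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "upsert",
    "oldValue": null,
    "newValue": { "_id": "651a2f3c0012ab34cd56ef78", "color": "blue" }
  }
}
```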
Updating CDC Data Bridges
At present, CDC Data Bridges cannot be edited in-place.
Instead, create a new Data Bridge with the updated filters you require and switch your Kafka consumer over to the new Data Bridge.
It’s important to note that new Data Bridges begin streaming changes from the time of creation. In other words, they do not provide historical change events that occurred prior to their creation.
Therefore, if you require a filter that overlaps with an existing Data Bridge, and want to maintain uninterrupted streaming, you may wish to configure your application or storage system to consume from the old and new Data Bridge concurrently.
This can be achieved by keeping a record, for each event, of the combination of txnId and the document’s _id. As no two events for a given document will share the same txnId, the combination can be used to deduplicate events received from both streams.
CDC Events Explained
Once your consumer is active, events appear as a JSON message stream in your Kafka console whenever a change successfully replicates to the Big Peer.
Event Ordering
CDC guarantees event ordering at the individual document level. In other words, all changes for a specific document will be delivered sequentially, and each event you receive reflects the latest known state of that document.
However, CDC does not guarantee global ordering across multiple documents.
For example, consider the following sequence of updates on two separate documents, A and B:
1. UPDATE cars SET color = 'blue' WHERE _id = 'A'
2. UPDATE cars SET color = 'red' WHERE _id = 'B'
3. UPDATE cars SET color = 'green' WHERE _id = 'A'
Possible event orders you might receive are:
- 1, 2, 3
- 1, 3, 2
- 2, 1, 3
Regardless of order across different documents, updates to the same document will always arrive in sequence.
Therefore, the net effect once all events are consumed is the same: {'_id': 'A', 'color': 'green'} and {'_id': 'B', 'color': 'red'}.
Standard Fields
The structure of the message stream includes both standard fields and fields that are specific to each event.
The following table provides an overview of the standard set of fields that are included in every message.
Field | Description |
---|---|
txnID | The timestamp for when the Big Peer internally replicates data modifications from small peers. (This timestamp is an always‑increasing value.) |
type | The type of event stream. |
collection | The collection that the changed document belongs to. |
change.method | The method that executed the event. |
change.oldValue | The previous contents of the document |
change.newValue | The current contents of the document |
New Document Events
When new documents are created (for instance, in response to an INSERT DQL operation), an upsert CDC event is created.
A new upsert event will look something like the following:
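This sketch is illustrative, based on the standard fields described above; the type string and txnID values are placeholders:

```json
{
  "txnID": 552789,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "upsert",
    "oldValue": null,
    "newValue": { "_id": "car_123", "color": "blue", "make": "Honda" }
  }
}
```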
The fields on an upsert event are as follows:
Field | Description |
---|---|
change.oldValue | The previous state of the document; since the document did not previously exist, the change.oldValue field is always set to null . |
change.newValue | The current state of the document created as a result of the upsert operation |
Updated Document Events
When existing documents are updated (for instance, in response to an UPDATE DQL operation), an update CDC event is created.
For example, given an existing document that is then modified by an update operation, you would receive a JSON event containing both the document’s previous and current state.
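The sketch below illustrates the flow with placeholder values (the type string and txnID are assumptions, and your documents will contain your own fields). Suppose the following document already exists in the cars collection:

```json
{ "_id": "A", "color": "blue" }
```

and the following update is applied:

```sql
UPDATE cars SET color = 'green' WHERE _id = 'A'
```

You would then receive an update event along these lines:

```json
{
  "txnID": 552873,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "update",
    "oldValue": { "_id": "A", "color": "blue" },
    "newValue": { "_id": "A", "color": "green" }
  }
}
```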
The following table provides an overview of the fields that carry information specific to the update event.
Field | Description |
---|---|
change.oldValue | The previous state of the document. |
change.newValue | The current state of the document (note that this is the full document, not the diff of the change). |
Removed Document Events
When documents are removed (for instance, in response to a TOMBSTONE DQL operation), a remove CDC event is created.
A new remove event will look something like the following:
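This sketch is illustrative; the type string and txnID values are placeholders:

```json
{
  "txnID": 552901,
  "type": "documentChanged",
  "collection": "cars",
  "change": {
    "method": "remove",
    "value": { "_id": "A", "color": "green" }
  }
}
```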
Field | Description |
---|---|
change.value | Indicates the full document at the time of its removal. |
Event Failures
When the producer fails to keep up with the incoming changes, a requeryRequired event type appears in your Kafka console.
Following is an example of a requeryRequired event:
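The snippet below is an illustrative sketch; field values are placeholders:

```json
{
  "txnID": 552950,
  "type": "requeryRequired",
  "collection": "cars",
  "documents": []
}
```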
The documents field is now deprecated and will only return an empty list, as demonstrated in the previous snippet.
Responding to Streaming Failures
To maintain data integrity, if you receive a requeryRequired message, invoke the HTTP API to update your system for the entire dataset, as specified in your Data Bridge’s filter.
Use the event’s txnID as the value of X-DITTO-TXN-ID in the HTTP API call to ensure the data retrieved is not stale.
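As a rough sketch, such a requery might look like the following curl call. The endpoint path, bearer token, and DQL statement are assumptions to adapt to your app per the HTTP API reference; the point shown here is passing the event’s txnID in the X-DITTO-TXN-ID header:

```bash
# Re-fetch the dataset covered by the Data Bridge's filter, pinning the read
# to the transaction ID from the requeryRequired event.
# The URL path and request body are illustrative assumptions.
curl -X POST "https://<your-app-id>.cloud.ditto.live/api/v4/store/execute" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -H "X-DITTO-TXN-ID: 552950" \
  -d '{ "statement": "SELECT * FROM cars WHERE color = '\''blue'\''" }'
```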
Webhooks
Ditto recommends using Kafka-based Data Bridges wherever possible, as they offer higher throughput and delivery reliability.
However, in situations where you aren’t able to run a Kafka consumer, Data Bridges also support Webhooks as a destination type.
Creating a Webhook Data Bridge is similar to how you’d create a Kafka-based Data Bridge.
From the Change Data Capture tab, create a new Webhook Data Bridge.
Provide a URL that points to your webhook.
It must be reachable from the public internet and able to receive an HTTP POST request. Ditto can provide the public IP addresses that CDC uses for your app if you need to make changes to any firewalls or network policies.
Select a collection, and define a DQL filter for your Webhook Data Bridge.
Name and create the Data Bridge.
Once the Data Bridge has been created, a secret will be available under the “Webhook details” section.
This secret is passed to your webhook under the Ditto-Signature header.
You can inspect this in your webhook’s logic to verify the authenticity of the request, ensuring that it comes from your Ditto Data Bridge.
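To smoke-test your endpoint before relying on it, you can send it a request shaped like the ones the Data Bridge will deliver. The URL, secret value, and event body below are illustrative placeholders:

```bash
# POST a sample payload to your webhook with the shared secret in the
# Ditto-Signature header; your handler should accept it and verify the header.
# The event body is illustrative only.
curl -X POST "https://example.com/ditto-webhook" \
  -H "Content-Type: application/json" \
  -H "Ditto-Signature: <secret-from-webhook-details>" \
  -d '{ "type": "documentChanged", "collection": "cars", "change": { "method": "upsert", "oldValue": null, "newValue": { "_id": "A", "color": "blue" } } }'
```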
Legacy Webhooks
Ditto CDC now uses DQL. Earlier versions of Ditto’s CDC offering supported Webhooks defined using the Legacy Query Language.
Webhooks defined using the Legacy Query Language are marked as “Legacy” in the portal. These will continue to function, but do not support any further edits.
If required, we recommend creating an equivalent Webhook Data Bridge, written using a DQL filter, per the steps in Webhooks.
Please contact Ditto should you need any support with this.