Self-Managed Change Data Capture
Big Peer Change Data Capture (CDC) is currently available as a Helm-deployed add-on for Operator-managed Big Peer.
CDC allows you to connect Big Peer running in your environment to other cloud applications and third-party services.
CDC is organised into individual “Data Bridges” which work by:
- Capturing changes made to documents in Big Peer
- Filtering these changes based on your configuration
- Publishing them to Kafka topics for consumption by external systems
For an overview of CDC and possible use cases, see Change Data Capture.
Prerequisites
Before setting up CDC, ensure you have:
- Installed the Ditto Operator (version 0.3.0 or above)
- Deployed a Big Peer
- Created an App on your Big Peer
- Deployed Kafka
Deploying the Ditto Operator
Version 0.3.0 or above is required.
Consult the Operator documentation to get started with the Operator.
The examples in this guide assume you've deployed on a `kind` cluster using our recommended `kind` config, but they can be adjusted to suit your environment.
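As a sketch, using placeholder names for the kind config and the Operator chart (take the real references from the Operator documentation):

```bash
# Create a local kind cluster. <recommended-kind-config.yaml> is a placeholder
# for the recommended config from the Operator documentation.
kind create cluster --config <recommended-kind-config.yaml>

# Install the Ditto Operator with Helm. <ditto-operator-chart> is a placeholder
# for the chart reference in the Operator documentation; the namespace is an
# arbitrary choice for this guide.
helm install ditto-operator <ditto-operator-chart> \
  --namespace ditto \
  --create-namespace
```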
Deploying a Big Peer
Deploy a Big Peer using a `BigPeer` custom resource.
For example:
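```yaml
# Illustrative only: the apiVersion and any spec fields are assumptions here;
# consult the Operator documentation for the exact BigPeer schema.
apiVersion: ditto.live/v1alpha1
kind: BigPeer
metadata:
  name: bp1
  namespace: ditto
```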
This creates a basic Big Peer we'll reference throughout this guide, called `bp1`.
Creating an App
Create an App on your Big Peer using either the Operator API or a `BigPeerApp` resource.
For example:
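```yaml
# Illustrative only: the apiVersion and spec field names are assumptions;
# consult the Operator documentation for the exact BigPeerApp schema.
apiVersion: ditto.live/v1alpha1
kind: BigPeerApp
metadata:
  name: example-app
  namespace: ditto
spec:
  bigPeer: bp1                                  # the Big Peer created above
  appId: 2164bef3-37c0-489c-9ac6-c94b034525d7   # example UUID; yours will differ
```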
This creates an App called `example-app` on the `bp1` Big Peer. The `appId` is used to identify the App across different Big Peers and will be needed when configuring CDC.
Deploying Kafka
For your convenience, we've provided a Helm chart to deploy Kafka. You may need to change `baseDomain` if the Kafka topics must be reachable over a specific domain for which you have an ingress.
For this guide, we'll assume you've deployed to the recommended `kind` cluster, and we'll establish a path on localhost by setting `baseDomain` to `kafka.localhost`:
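```bash
# Illustrative install: <kafka-chart> is a placeholder for the Kafka chart
# mentioned above, and baseDomain is assumed to be a top-level chart value.
# The release name matters: several resource names are derived from it.
helm install kafka-connectors <kafka-chart> \
  --set baseDomain=kafka.localhost
```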
The naming of certain deployed resources depends on the name of the Helm release. The rest of this guide assumes this release is named `kafka-connectors`.
Wait a few minutes for all the pods to be ready:
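```bash
# One way to watch pod status until everything reports Ready
kubectl get pods -w
```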
Deploying CDC
CDC is deployed using the `ditto-connectors` Helm chart.
First, create a Helm values file with the configuration needed to deploy CDC:
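A sketch of the shape such a file might take (the key names here are assumptions; the authoritative set lives in the chart's values.yaml):

```yaml
# values-cdc.yaml -- illustrative keys only
appId: 2164bef3-37c0-489c-9ac6-c94b034525d7   # the App created earlier
bigPeer: bp1                                  # the Big Peer it belongs to
```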
With your configuration values set, deploy CDC using:
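```bash
# <ditto-connectors-chart> is a placeholder for wherever the ditto-connectors
# chart is published; "cdc" is the release name assumed in this guide's sketches.
helm install cdc <ditto-connectors-chart> -f values-cdc.yaml
```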
After a few moments, you should see `cdc`, `cdc-heartbeat`, and `stream-splitter` pods running:
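```bash
# For example, filter the pod list down to the CDC components
kubectl get pods | grep -E 'cdc|stream-splitter'
```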
Connecting to CDC
To connect to CDC Kafka topics, you’ll need to extract the necessary metadata and credentials.
In the examples below, these will be saved to local files for later use. For production environments, make sure to store credentials securely.
Extracting Credentials
Cluster Certificate and Password
Start by extracting the cluster certificate and its password, which are stored in a Kubernetes secret created during Kafka cluster deployment (in the same namespace).
If you followed the Kafka deployment steps exactly, it will be called `kafka-connectors-cluster-ca-cert`. Otherwise, it will take the name of the release used for the Kafka installation, suffixed with `-cluster-ca-cert`.
From this secret, extract and base64 decode the PKCS#12 certificate and password for use with your Kafka client.
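For example, assuming the secret uses Strimzi's conventional `ca.p12` and `ca.password` keys (check with `kubectl describe secret` if yours differ):

```bash
kubectl get secret kafka-connectors-cluster-ca-cert \
  -o jsonpath='{.data.ca\.p12}' | base64 -d > cluster-ca.p12
kubectl get secret kafka-connectors-cluster-ca-cert \
  -o jsonpath='{.data.ca\.password}' | base64 -d > cluster-ca.password
```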
Topic Selection
Next, we need to choose a topic to connect to.
You can see the full list of topics created with:
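```bash
# Assuming the chart deploys a Strimzi-managed cluster (suggested by the
# -cluster-ca-cert secret naming), topics are exposed as KafkaTopic resources
kubectl get kafkatopics
```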
Example output (illustrative; your topic names will differ):
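```
NAME                                            CLUSTER            PARTITIONS   READY
2164bef3-37c0-489c-9ac6-c94b034525d7-all-true   kafka-connectors   1            True
```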
The values in the NAME column are the topic names.
For this guide, we'll choose the `2164bef3-37c0-489c-9ac6-c94b034525d7-all-true` topic.
Group ID Prefix
The Kafka topics make use of a group ID prefix, so multiple consumer groups can read from the topic independently.
This prefix is identical to the topic you’ve selected in the previous step.
User Certificate and Password
Lastly, we need to obtain the user certificate and password.
These are stored in a secret named identically to the topic.
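For example, assuming Strimzi's conventional `user.p12` and `user.password` keys:

```bash
kubectl get secret 2164bef3-37c0-489c-9ac6-c94b034525d7-all-true \
  -o jsonpath='{.data.user\.p12}' | base64 -d > user.p12
kubectl get secret 2164bef3-37c0-489c-9ac6-c94b034525d7-all-true \
  -o jsonpath='{.data.user\.password}' | base64 -d > user.password
```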
Connecting to Kafka
With all the required connection information extracted, you can now connect a consumer to the topic.
See the Change Data Capture documentation for guidance on mapping this information onto your consumer's configuration.
For this guide, we'll follow the same verification steps as the CDC docs.
Identify your endpoint hostname
The endpoint to connect the console consumer to will vary depending on how you configured your ingress earlier.
To check, you can inspect your ingresses:
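```bash
kubectl get ingress
```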
The relevant ingress should have a name that corresponds to the Helm release name.
If you've followed the steps in this guide, it will be called `kafka-connectors-kafka-connectors-0`.
From here, you can obtain the hostname with:
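```bash
# Assumes a single rule on the ingress
kubectl get ingress kafka-connectors-kafka-connectors-0 \
  -o jsonpath='{.spec.rules[0].host}'
```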
Following these examples, this will return `ditto-broker-0.kafka.localhost`.
Run the console consumer
Now we can run the console consumer, supplying the user certs, cluster certs, topic, and a group ID that uses the group prefix.
For example:
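```bash
# Build a standard Kafka SSL client config from the files extracted earlier.
# The property names are stock Kafka client settings; the file names match
# the extraction steps above.
cat > client-ssl.properties <<EOF
security.protocol=SSL
ssl.truststore.type=PKCS12
ssl.truststore.location=cluster-ca.p12
ssl.truststore.password=$(cat cluster-ca.password)
ssl.keystore.type=PKCS12
ssl.keystore.location=user.p12
ssl.keystore.password=$(cat user.password)
EOF

# Run the console consumer (kafka-console-consumer on some installations).
# The group ID must begin with the group prefix; the -example suffix is arbitrary.
kafka-console-consumer.sh \
  --bootstrap-server ditto-broker-0.kafka.localhost:443 \
  --topic 2164bef3-37c0-489c-9ac6-c94b034525d7-all-true \
  --group 2164bef3-37c0-489c-9ac6-c94b034525d7-all-true-example \
  --consumer.config client-ssl.properties
```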
If the console runs without error, then you’ve successfully connected.
If you've followed the getting started guide, running a `kind` cluster and hosting the ingress on `kafka.localhost`, you may experience some DNS errors here.
This is due to how the Java runtime, which the console consumer is built upon, resolves DNS.
Adding an entry to your system's `/etc/hosts` should address this issue (note that host entries take a hostname only, with no port), for example:
echo "127.0.0.1 ditto-broker-0.kafka.localhost:443" >> /etc/hosts
Verify changes are streaming
The easiest way to verify that changes are streaming successfully is by inserting a document through the HTTP API.
If you haven't already, follow the steps in Using the Big Peer HTTP API to set up API access.
Example document insertion:
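```bash
# A sketch only: take the exact endpoint path, auth scheme, and request shape
# from the Big Peer HTTP API documentation. <api-endpoint> and <api-key> are
# placeholders; the collection and document are arbitrary examples.
curl -X POST "https://<api-endpoint>/api/v4/store/execute" \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "statement": "INSERT INTO cars DOCUMENTS (:newCar)",
    "args": { "newCar": { "make": "Toyota", "color": "red" } }
  }'
```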
You should see output in your Kafka consumer resembling the following (illustrative; see the Change Data Capture documentation for the exact event schema):
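```
{"txnId":552,"type":"documentChanged","collection":"cars","change":{"method":"upsert","oldValue":null,"newValue":{"_id":"abc123","make":"Toyota","color":"red"}}}
```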
Uninstalling
To uninstall, run `helm delete` on the Helm releases made during installation:
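```bash
# Release names from this guide: "cdc" for the CDC deployment and
# "kafka-connectors" for Kafka (remove the latter only if you no longer need it)
helm delete cdc
helm delete kafka-connectors
```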