Building Consistent Audit Log System with Change Data Capture (CDC)

February 5, 2025 (3mo ago)

Capturing reliable and consistent audit logs is essential for traceability, compliance, and debugging. This blog explores how to build a consistent audit logging system using Change Data Capture (CDC). We’ll cover the core concepts of CDC, how it enables reliable tracking of database changes, and practical implementation tips to ensure your logs are complete, ordered, and tamper-resistant.

This blog assumes that you know about CDC and audit logs individually

Other ways than CDC

Logging from application code

saveToDb(...);
saveToAuditLogStorage(...); // ES or any other audit log storage

Polling the Database Periodically

Database Triggers Writing to an Audit Table

Learn more about trigger

Database triggers solves above issues, but it leads to following issues.

Using Change Data Capture (CDC)

Change Data Capture (CDC) is a technique used to track and capture changes made to data in a database or data store. It is typically used to identify and capture inserts, updates, and deletes that occur in a database, and then move or process those changes to other systems or components in real time or in near real time.

I'll not show how to setup CDC, for that you can refer CDC With ProstgresSQL

In simple terms, CDC will stream row level changes made into database in JSON format.

Let's understant it through an example, consider we are having a row in database table user with 1 user.

{
  "id:1,
  "name:"Rahul Kumar"
}

Now, if someone updates the user's name to Rahul Thakur, the database will emit the following CDC event:

{
  "id:1,
  "name:"Rahul Thakur"
}

Acutal CDC event will have other meta data properties, like transaction id, LSN, operation type(c,u,d,t) and more. They are omited here for simplicity.

CDC will ensure that every update is received to your CDC consumer, no matter if you are updating database from app server, shell script, cron jobs or from anywhere. This gurrantee solves the problem of consistency, maintenance and direct DB updates.

CDC events can be written to MongoDB, seperate audit log table or S3/Blob stoarge.

Storing Actor Information in Audit Logs Using CDC

CDC events does not contain any information about who made the change. There are two ways we can use capture actor information when using CDC:

  1. Using outbox table
  2. Adding actor information in table schema itself

1. Outbox table

We can create a table for storing audit logs along with actor information

Simple audit_log table schema

- actor - who made change?
- timestamp - when change was made?
- data - what was the change?

Using above table, we can write updates to Db as well as in audit_log table in the same transaction. Now, we can setup CDC on audit_log table to update audit logs.

It has the following issues:

2. Adding actor information in table schema itself

We can modify the schema of table and add actor information as following.

book_table

- created_by - who created?
- last_modified_by - who made change?
---- others columns ----

When using CDC created_by and last_modified_by will be sent along with each event.

When you don't need actor information?

Actor information is not always necessary, for example personal taks management app. In personal taks management app, users will always modify their own tasks. There is no shared tasks that can be modifed by multiple users. So, it's unnecessary to work hard on keeping the actor information in audit logs.

In some applications, we usually get mix of situations. A careful design can eliminate the need to actor information.

Bonus