Implementing Change Data Capture (CDC) in Databricks

Change Data Capture (CDC) tracks row-level changes (inserts, updates, and deletes) made to data in a database so that downstream tables can be kept in sync. In Databricks, Delta Live Tables simplifies CDC with two APIs, APPLY CHANGES and APPLY CHANGES FROM SNAPSHOT, which apply those changes to a target table for you.

Using the APPLY CHANGES API

The APPLY CHANGES API processes change records from a change data feed (CDF) and applies them to a target table. It supports both SCD (slowly changing dimension) Type 1 and Type 2. SCD Type 1 overwrites records in place without retaining history, while SCD Type 2 retains a history of changes.
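To make the difference between the two SCD types concrete, here is a minimal pure-Python sketch. It is not Delta Live Tables code and does not claim to mirror its internals; the event tuples, the helper names `apply_scd1` and `apply_scd2`, and the `start_at` / `end_at` fields are all hypothetical and chosen only for illustration. Events are deliberately listed out of order and then sorted by their sequence number before being applied.

```python
# Hypothetical change events: (userId, name, city, operation, sequenceNum).
# They arrive out of order on purpose.
events = [
    (123, "Isabel", "Chihuahua", "UPDATE", 5),
    (123, "Isabel", "Monterrey", "INSERT", 1),
    (124, None, None, "DELETE", 6),
    (124, "Raul", "Oaxaca", "INSERT", 1),
]


def apply_scd1(events):
    """SCD Type 1: keep only the latest version of each key."""
    current = {}
    for user_id, name, city, op, seq in sorted(events, key=lambda e: e[4]):
        if op == "DELETE":
            current.pop(user_id, None)
        else:
            # Inserts and updates both overwrite whatever was there before.
            current[user_id] = {"name": name, "city": city}
    return current


def apply_scd2(events):
    """SCD Type 2: close the old version and append a new one."""
    history = []     # every version of every row
    open_rows = {}   # userId -> index of its currently open version
    for user_id, name, city, op, seq in sorted(events, key=lambda e: e[4]):
        if user_id in open_rows:
            # Close the previous version at this sequence number.
            history[open_rows.pop(user_id)]["end_at"] = seq
        if op != "DELETE":
            history.append({"userId": user_id, "name": name, "city": city,
                            "start_at": seq, "end_at": None})
            open_rows[user_id] = len(history) - 1
    return history


scd1 = apply_scd1(events)
scd2 = apply_scd2(events)
```

With this input, the Type 1 result holds a single current row for user 123, while the Type 2 result holds three versions: both of Isabel's rows plus Raul's row, which is closed by the delete rather than removed.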

Example with Python

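      import dlt
      from pyspark.sql.functions import col, expr

      # Source view: stream the raw CDC events. `spark` is provided by the
      # pipeline runtime.
      @dlt.view
      def users():
        return spark.readStream.table("cdc_data.users")

      # Create the streaming table that will receive the applied changes.
      dlt.create_streaming_table("target")

      dlt.apply_changes(
        target = "target",
        source = "users",
        keys = ["userId"],                                  # identifies a row
        sequence_by = col("sequenceNum"),                   # orders the events
        apply_as_deletes = expr("operation = 'DELETE'"),    # events treated as deletes
        except_column_list = ["operation", "sequenceNum"],  # CDC metadata left out of the target
        stored_as_scd_type = "2",                           # retain history
        track_history_except_column_list = ["city"]         # city changes do not create history rows
      )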
    

Example with SQL

      -- Create the target streaming table.
      CREATE OR REFRESH STREAMING TABLE target;

      -- Apply the CDC events from the source stream to the target.
      APPLY CHANGES INTO target
      FROM stream(cdc_data.users)
      KEYS (userId)
      APPLY AS DELETE WHEN operation = "DELETE"
      SEQUENCE BY sequenceNum
      COLUMNS * EXCEPT (operation, sequenceNum)
      STORED AS SCD TYPE 2
      TRACK HISTORY ON * EXCEPT (city);
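The examples above consume a feed of explicit change events. APPLY CHANGES FROM SNAPSHOT instead starts from successive full snapshots of the source and works out the changes by comparing them. The following pure-Python sketch illustrates only that comparison idea; it is not Delta Live Tables code, and the function `diff_snapshots` and the sample snapshots are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (key -> row dict) and return inferred changes."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("INSERT", key, row))
        elif previous[key] != row:
            changes.append(("UPDATE", key, row))
    for key in previous:
        if key not in current:
            # A key missing from the newer snapshot is treated as a delete.
            changes.append(("DELETE", key, None))
    return changes


# Two hypothetical snapshots of the same table, keyed by userId.
snapshot_v1 = {
    123: {"name": "Isabel", "city": "Monterrey"},
    124: {"name": "Raul", "city": "Oaxaca"},
}
snapshot_v2 = {
    123: {"name": "Isabel", "city": "Chihuahua"},
    125: {"name": "Mercedes", "city": "Tijuana"},
}

changes = diff_snapshots(snapshot_v1, snapshot_v2)
```

Here the comparison yields one update (user 123), one insert (user 125), and one delete (user 124). Once changes are derived this way, they can be stored as SCD Type 1 or Type 2 in the same manner as events from a change feed.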
    


Bottom Line

Implementing CDC in Databricks using Delta Live Tables simplifies the process of capturing and managing changes in data. The APPLY CHANGES and APPLY CHANGES FROM SNAPSHOT APIs provide flexible options for handling CDC data, making it easier to maintain data integrity and history.

