How Change Data Capture Works in Microsoft SQL Server

Change Data Capture (CDC) is very important in today's data-driven business ecosystem. It helps organizations recover from the adverse effects of data breaches, and it preserves changed data so that the history of earlier values is not lost. Several approaches to this problem were tried in the past, such as complex queries, triggers, timestamps, and data auditing, but without much success.

Microsoft's earlier approach to capturing changes, in SQL Server 2005, relied on "after update", "after insert", and "after delete" features. However, DBAs found this approach too complex and intrusive, and it was not well accepted. Subsequently, in SQL Server 2008, Microsoft introduced the Change Data Capture feature, which enabled DBAs and developers to capture and archive changed data without additional programming.

How does the SQL Server CDC feature work?

Change Data Capture tracks all changes made to user-created tables and stores them in relational tables that can be easily queried with T-SQL. When CDC is enabled on a database table, a change table that mirrors the column structure of the tracked table is created. The change table has additional metadata columns that describe the change made to each row. Apart from these columns, the structure of the change table matches that of the source table.

The source of change data for SQL Server CDC is the transaction log. As modifications (inserts, updates, deletes) are applied to the tracked source tables, entries describing them are added to the log, which then serves as the input to CDC. The capture process reads these entries from the log and adds the details of each change to the change table associated with the original table.
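The flow described above can be sketched as a toy simulation in Python. This is not SQL Server's implementation: the `LogEntry` type, the `capture` function, and the in-memory list standing in for the transaction log are all hypothetical. The metadata column names `__$start_lsn` and `__$operation`, and the operation codes (1 = delete, 2 = insert, 3 = update before-image, 4 = update after-image), follow the layout SQL Server uses for its change tables.

```python
from dataclasses import dataclass
from typing import Optional

# Operation codes as stored in the __$operation column of a change table.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


@dataclass
class LogEntry:
    """Hypothetical, simplified stand-in for one transaction log record."""
    lsn: int                 # log sequence number (a plain int here)
    table: str
    kind: str                # "insert", "update" or "delete"
    before: Optional[dict]   # row image before the change, if any
    after: Optional[dict]    # row image after the change, if any


def capture(log, tracked_table, from_lsn=0):
    """Scan the log and build change-table rows for one tracked table."""
    change_rows = []
    for entry in log:
        if entry.table != tracked_table or entry.lsn <= from_lsn:
            continue  # untracked table, or already captured
        if entry.kind == "insert":
            images = [(INSERT, entry.after)]
        elif entry.kind == "delete":
            images = [(DELETE, entry.before)]
        else:  # an update yields a before-image row and an after-image row
            images = [(UPDATE_BEFORE, entry.before), (UPDATE_AFTER, entry.after)]
        for op, image in images:
            # Source columns are mirrored; metadata columns are added.
            change_rows.append({"__$start_lsn": entry.lsn, "__$operation": op, **image})
    return change_rows


log = [
    LogEntry(1, "orders", "insert", None, {"id": 1, "status": "new"}),
    LogEntry(2, "audit", "insert", None, {"id": 9}),
    LogEntry(3, "orders", "update", {"id": 1, "status": "new"}, {"id": 1, "status": "paid"}),
    LogEntry(4, "orders", "delete", {"id": 1, "status": "paid"}, None),
]
changes = capture(log, "orders")
```

Passing the last captured LSN as `from_lsn` on the next run lets the scan resume where it stopped, so the source tables themselves are never queried for changes.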

Broadly, there are two approaches to CDC for SQL Server.

The first is log-based CDC, where the system reads the database's transaction log to identify changes made in the source system, which are then replicated in the target system. The main benefits of this method are that no changes are missed and the impact on the production database is minimal, which makes it highly reliable. On the flip side, this type of CDC is complex to implement and works only with databases that support log-based CDC.
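To illustrate the replication step, here is a minimal Python sketch of replaying an ordered change stream onto a target while keeping a checkpoint. The `(lsn, operation, key, row)` tuple shape, the `apply_changes` function, and the dictionary used as the target are assumptions made for this example, not a SQL Server format or API.

```python
def apply_changes(target, changes, last_lsn=0):
    """Replay ordered change records onto a target keyed by primary key.

    `changes` holds (lsn, operation, key, row) tuples; this shape is an
    assumption made for the sketch. Returns the highest LSN applied,
    to be stored as the checkpoint for the next run.
    """
    for lsn, operation, key, row in sorted(changes, key=lambda c: c[0]):
        if lsn <= last_lsn:
            continue  # already applied in an earlier run
        if operation == "delete":
            target.pop(key, None)
        else:  # "insert" or "update": write the latest row image
            target[key] = row
        last_lsn = lsn
    return last_lsn


target = {}
batch_1 = [(1, "insert", 1, {"name": "Ada"}), (2, "insert", 2, {"name": "Bob"})]
batch_2 = [(2, "insert", 2, {"name": "Bob"}),       # overlaps with batch_1
           (3, "update", 1, {"name": "Ada L."}),
           (4, "delete", 2, None)]

checkpoint = apply_changes(target, batch_1)
checkpoint = apply_changes(target, batch_2, checkpoint)
```

Because every log record carries an increasing sequence number, the checkpoint lets the reader skip records it has already applied and pick up exactly where it left off, which is what makes it possible to avoid both missed and duplicated changes.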

The second is trigger-based CDC, where database triggers record each change as it happens, which reduces the cost of extracting changed data. The main benefits are that it is easy to implement, changes are captured immediately, detailed logs of all transactions are kept in shadow tables, and selected databases support it directly in the SQL API.
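The shadow-table idea can be tried out with a small, self-contained sketch. It uses SQLite through Python's standard `sqlite3` module rather than SQL Server, purely so that it runs anywhere; the table and trigger names are illustrative. SQL Server's trigger syntax differs (it exposes `inserted` and `deleted` pseudo-tables instead of `NEW` and `OLD`), but the pattern is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);

-- Shadow table: one row per captured change (names are illustrative).
CREATE TABLE customers_shadow (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    operation TEXT,
    id        INTEGER,
    old_email TEXT,
    new_email TEXT
);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_shadow (operation, id, old_email, new_email)
    VALUES ('I', NEW.id, NULL, NEW.email);
END;

CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_shadow (operation, id, old_email, new_email)
    VALUES ('U', NEW.id, OLD.email, NEW.email);
END;

CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_shadow (operation, id, old_email, new_email)
    VALUES ('D', OLD.id, OLD.email, NULL);
END;
""")

# Ordinary writes to the source table; the triggers record each one.
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")
conn.execute("UPDATE customers SET email = 'b@example.com' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

history = conn.execute(
    "SELECT operation, id, old_email, new_email "
    "FROM customers_shadow ORDER BY change_id"
).fetchall()
```

Note that each write to the source table now performs an extra insert inside the same transaction. That added work on the production database is the trade-off of the trigger-based approach compared with reading the log.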
