The Carrier_SCD workflow

Here you have the basic workflow to load the SCD_Carrier Type 2 slowly changing dimension. We will go over the nodes. You can download this workflow here

First we connect to the database.
Let's now look at the nodes at the bottom left part. The purpose of these is to retrieve the highest Surrogate Key value used so far in the SCD_Carrier table.
- first we select the SCD_Carrier table
- in the DB Query and DB Reader nodes we get the highest value for CARRIER_SKEY used so far
- in the Table Row to Variable node we store this value in the Flow Variable 'max' for later use
The four nodes above these are basically the same as in our previous dimension load example:
- we select and read the flightdata staging table.
- we select the carrier columns (UNIQUE_CARRIER, UNIQUE_CARRIER_NAME)
- we filter out possible duplicate rows
In the next nodes (center and right part), we will check if there are any existing rows in the SCD_Carrier table. If so, we will set their 'valid' column to 'N'. (nodes Constant Column Value and DB Update)
In the Expression node we increment the value of our Surrogate key using the Flow Variable 'max' we created before.
In the Constant Column node we set the value of the 'valid' column to 'Y'
Finally, we add the new row to SCD_Carrier.
Below you can see a screenshot of how the SCD_Carrier table looks like after the first execution of our Workflow
On the next page we will see what happens if we receive new data and an airline changes its name.

Page updated

Google Sites

Report abuse