Kimballs SCD Type 2 Demo

This example is using SSIS Dimension Merge SCD Component for which pragmatic works has recently committed to supporting. The demo is purely to show the power of the tool and we have also adding some custom designs to take advantage of its efficiency. The original SSIS Slowly changing dimension component runs very slow and by using the Kimball’s SCD component it will dramatically improve the performance up to 100X faster. We have looked at the update process and found that the OLE DB Command is also quite slow for each update so we inserted into a Temp table and do our final update with a SQL Command.

The image below captures the SSIS package control flow. We have a data flow and a execute SQL Statement.

The ComputerInfo table is the dimensions target table. This table has your typical elements within the warehouse model. We have our surrogate key (S_Key), Effective date, Expire Date, business keys (Manufacturer, Model, and Serial Number), and the remaining fields are objects subject to change.

We have our Update Temp table Called ComputerInfoUpdateExpire which is a copy of the ComputerInfo table the will keep track of your update records with the appropriate expire date. This will be the table used to run the final update statement against the ComputerInfo table to expire those records.

We have our Source file or table depending on your design and here we are using a flat file. The Source Flat file has records coming in that have the business keys that will relate to the target table. So we have Manufacturer, Model, and Serial Number to use as our comparison keys and the we want to compare the Ram or Operating system to see if the records have changed or not.

Now that we have seen all the objects we are working with we will now dive into the Slowly Changing dimension design in SSIS. The following illustrates the design using the new SSIS Dimension Merge SCD Component and we want to look at each part closely to understand how this works.

The target object is the ComputerInfo table and the source object is the ComputerInfoSrc. Once we connect the source and target we then go into the designer and set our keys, mappings, and output structures.

Existing Dimension input column Definition

This section is very important as we want to establish our rules. As you can see we have our surrogate key set which is S_Key and this is a numeric identity column so it will update as new rows are created. We have our effective date and expire dates set. We have our business keys which are Manufacturer, Model, and SerialNumber. The last objects are the fields subject to change which are Ram and OperatingSystem.

Column Mapping

This allows us to see what’s being compared and so we want to map the business keys and the fields we are comparing for change.

Row Change Detection

This allows us to configure how we want to check for changes in the fields. We can ignore the case or Ignore Leading and trailing spaces.

SCD 2 Date Handling

This section lets us set up the date handling elements such as Load date and Expire date. This is a very nice feature as it handles these changes automatically with the component.

variable containing “now” value is the load date
granularity of eff/exp datetimes is setting the date part
record changes by marking is the how the expire date is set
last record’s expiry date is set to null cause you are handling it for record changes

Output Column Selection

This section lets us look at all the output choices and gives the ability to choose which columns will be available in the mapping. We have several output selections such as New, Unchanged, Deleted, New SCD2, Expired SCD2 with SCD1 changes, etc…

Note: There are a few more tabs in the designer that we will not discuss in this article. This gives us a good overview of the product and how to set the component up.

The last piece to look is the outputs for SSIS Dimension Merge SCD Component. In this example we have 4 output streams which we will break down as follows:

Expired SCD2 with SCD1 Changes – This is checking the business keys and the change detection scenario and finds changes so it sends the rows and the expire date to the ComputerInfoUpdateExpire Temp Table

New – Any new records in the source that doesn’t match the destination table

Deleted –

New SCD2 –This will insert the new SCD2 records with a new surrogate key.

The Final piece to the SCD Type 2 demo is to update the Expired dates for SCD Type 2. Instead of doing this in the dataflow with an OLE DB Command we decided to take advantage of the SQL language to do this all at ones based on our Temp Table. We call the Update and Expire SQL task.

The Update and Expire SQL statement is a select statement on the ComputerInfoUpdateExpire table that updates the ComputerInfo Table with the expiration date. Here is an illustration of the SQL command.

One thought on “Kimballs SCD Type 2 Demo

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s