Friday, September 19, 2008

SCD 1 implementation in Datastage

Type 1 Slowly Changing Dimension data warehouse architecture applies when no history is kept in the database. The new, changed data simply overwrites old entries. This approach is used quite often with data which change over the time and it is caused by correcting data quality errors (misspells, data consolidations, trimming spaces, language specific characters). Type 1 SCD is easy to maintain and used mainly when losing the ability to track the old history is not an issue.

SCD 1 implementation in Datastage
The job described and depicted below shows how to implement SCD Type 1 in Datastage. It is one of many possible designs which can implement this dimension. The example is based on the customers load into a data warehouse.

Datastage SCD1 job design
The most important facts and stages of the CUST_SCD2 job processing:
There is a hashed file (Hash_NewCust) which handles a lookup of the new data coming from the text file.
A T001_Lookups transformer does a lookup into a hashed file and maps new and old values to separate columns. A T002 transformer updates old values with new ones without concerning about the overwritten data. SCD1 Transformer update old entries The database is updated in a target ODBC stage (with the 'update existing rows' update action)

2 comments:

Unknown said...

Scd is the very important stage to maintain the history of the data.
href="http://datastage-etl-tool.blogspot.com/2011/05/datastage-interview-questions.html
Datastage Interview Questions

Unknown said...

SCD is the best option for complex stages . And I like to work on this stage with very interesting.

Filter Stage