Thursday, September 11, 2008

Factors for Deciding the Best ETL Estimation Strategy

You might have heard many requests like: "I need to prepare an effort estimate for our ETL process. Is it right to calculate the effort based only on the number of sources, targets, and transformations? If not, what other criteria should we follow when preparing the estimate? Also, to what extent should complexity level and data quality issues be taken into account?"

So here I post some sample ETL estimation guidelines for reference.

How well documented are the source files?
How knowledgeable are the ETL developers with the source data?
How clean does the data need to be? We often find that not all data has the same data quality requirements.
How knowledgeable are the ETL developers with the ETL tool?
Will the ETL developers be assigned full time to the project?
How well is the project being managed?
How much data has to go through the ETL process? Very large data volumes introduce challenges (read: problems) that take time and effort to correct.

What is the skill level of the programming resources?
Availability of resources (how many other projects are they working on, other time off for sickness and vacations)?
Who is doing the testing?
Who is writing the test cases?
How well were the test cases written?
How well were the requirements written?
Is the scope still changing?
What is the level of data quality?
What is the level of understanding of the data?
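To make this concrete, below is a minimal sketch of how a base estimate derived from counts of sources, targets, and transformations might be adjusted by the softer factors above. All of the base hours and multiplier values here are illustrative assumptions, not recommended numbers; calibrate them against your own team's history.

```python
# Illustrative sketch only: base hours and multipliers are assumed values,
# not figures from any real project.

BASE_HOURS_PER_SOURCE = 8
BASE_HOURS_PER_TARGET = 6
BASE_HOURS_PER_TRANSFORMATION = 4

# Adjustment factors for the "soft" criteria discussed above (assumed values).
COMPLEXITY_FACTOR = {"low": 1.0, "medium": 1.3, "high": 1.7}
DATA_QUALITY_FACTOR = {"clean": 1.0, "moderate": 1.25, "poor": 1.6}
TEAM_FAMILIARITY_FACTOR = {"experienced": 1.0, "mixed": 1.2, "new": 1.5}

def estimate_hours(sources, targets, transformations,
                   complexity="medium", data_quality="moderate",
                   team_familiarity="mixed"):
    """Return an effort estimate in hours, adjusted by the soft factors."""
    base = (sources * BASE_HOURS_PER_SOURCE
            + targets * BASE_HOURS_PER_TARGET
            + transformations * BASE_HOURS_PER_TRANSFORMATION)
    adjusted = (base
                * COMPLEXITY_FACTOR[complexity]
                * DATA_QUALITY_FACTOR[data_quality]
                * TEAM_FAMILIARITY_FACTOR[team_familiarity])
    return round(adjusted)

# Example: 5 sources, 2 targets, 20 transformations, high complexity, poor data.
print(estimate_hours(5, 2, 20, complexity="high", data_quality="poor"))
```

The point of the sketch is simply that the same raw counts can yield very different estimates once complexity, data quality, and team familiarity are factored in, which is why counting sources, targets, and transformations alone is not enough.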
My suggestion is to perform some research before providing estimates. This can be done by:

Reviewing how the data will be used,
Reviewing the data in the source systems by writing profiling queries and manually inspecting the data (see the sketch after this list), and
Doing some simple pseudo-coding of the solution first.
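For the query-based review, a few simple profiling checks can reveal the data quality issues that drive estimates. The sketch below is a minimal example assuming a local SQLite copy of the source; the database file, table, and column names (source_copy.db, customers, email, signup_date) are made up for illustration.

```python
# A minimal data profiling sketch; all names below are hypothetical.
import sqlite3

conn = sqlite3.connect("source_copy.db")  # assumed local copy of the source
cur = conn.cursor()

# Basic checks: volume, missing values, duplicates, and date range.
checks = {
    "row count": "SELECT COUNT(*) FROM customers",
    "null emails": "SELECT COUNT(*) FROM customers WHERE email IS NULL",
    "duplicate emails": """
        SELECT COUNT(*) FROM (
            SELECT email FROM customers GROUP BY email HAVING COUNT(*) > 1
        )""",
    "date range": "SELECT MIN(signup_date), MAX(signup_date) FROM customers",
}

for label, sql in checks.items():
    cur.execute(sql)
    print(label, cur.fetchall())

conn.close()
```

A half day spent running checks like these against each source usually tells you far more about the real effort than any rule of thumb based on counts alone.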
