Friday, September 19, 2008

Pentaho Data Integration - Kettle ETL tool

Kettle (K.E.T.T.L.E - Kettle ETTL Environment) has recently been acquired by the Pentaho group and renamed Pentaho Data Integration. Kettle is a leading open source ETL application on the market. It is classified as an ETL tool; however, Kettle slightly modifies the classic ETL process (extract, transform, load) by splitting it into four elements, ETTL, which stand for:

Data extraction from source databases
Transport of the data
Data transformation
Loading of data into a data warehouse
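To make the four ETTL stages concrete, here is a minimal sketch in plain Python. It is not Kettle's API and not how Kettle is implemented internally. It only illustrates the data flow, assuming a hypothetical in-memory CSV as the source and an in-memory SQLite table named `sales` as the "warehouse". All function and table names are illustrative.

```python
# Hypothetical illustration of the four ETTL stages; NOT Kettle's API.
import csv
import io
import sqlite3

# Assumed sample source data: the row for "bob" has no amount.
SOURCE_CSV = """id,name,amount
1,alice,10.5
2,bob,
3,carol,7.25
"""

def extract(text):
    """Extract: read raw rows from the source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transport(rows):
    """Transport: move rows to the staging side.
    Here this is just a copy; in practice it might be a file transfer or network hop."""
    return [dict(r) for r in rows]

def transform(rows):
    """Transform: validate and clean rows, dropping records with no amount."""
    cleaned = []
    for r in rows:
        if not r["amount"]:
            continue  # validation: skip incomplete records
        cleaned.append((int(r["id"]), r["name"].title(), float(r["amount"])))
    return cleaned

def load(rows, conn):
    """Load: write cleaned rows into a warehouse table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded = load(transform(transport(extract(SOURCE_CSV))), conn)
print(loaded)  # 2 (the incomplete "bob" row is dropped during transformation)
```

Keeping each stage as a separate function mirrors the idea that transport is its own step rather than being folded into extraction.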

Kettle is a set of tools and applications that allows data manipulation across multiple sources.
The main components of Pentaho Data Integration are:
Spoon - a graphical tool that makes ETTL transformations easy to design. It performs typical data flow functions such as reading, validating, refining, transforming and writing data to a variety of data sources and destinations. Transformations designed in Spoon can be run with Kettle Pan and Kitchen.
Pan - an application dedicated to running data transformations designed in Spoon.
Chef - a tool for creating jobs that automate complex database update processes.
Kitchen - an application that executes jobs in batch mode, usually on a schedule, which makes it easy to start and control ETL processing.
Carte - a web server that allows remote monitoring of running Pentaho Data Integration ETL processes through a web browser.
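Because Pan and Kitchen are command-line tools, batch runs are typically driven by a script that a scheduler calls. The sketch below builds and optionally runs such commands from Python. It assumes a Unix install under a hypothetical `/opt/data-integration` directory containing the `pan.sh` and `kitchen.sh` launch scripts (`pan.bat`/`kitchen.bat` on Windows). It uses only the `-file` and `-level` options and treats a non-zero exit status as failure. The paths and the helper names `build_command` and `run` are illustrative assumptions.

```python
# Hypothetical helper for launching Pan (transformations) and Kitchen (jobs).
# Assumptions: PDI is installed under PDI_HOME; paths used below are placeholders.
import shlex
import subprocess
from pathlib import Path

PDI_HOME = Path("/opt/data-integration")  # assumed install location

SCRIPTS = {"transformation": "pan.sh", "job": "kitchen.sh"}

def build_command(kind, path, level="Basic"):
    """Return the argv list for running a transformation (.ktr) with Pan
    or a job (.kjb) with Kitchen, at the given logging level."""
    if kind not in SCRIPTS:
        raise ValueError(f"unknown kind: {kind!r}")
    return [str(PDI_HOME / SCRIPTS[kind]), f"-file={path}", f"-level={level}"]

def run(kind, path, level="Basic"):
    """Execute the command; return True if it exited with status 0."""
    result = subprocess.run(build_command(kind, path, level))
    return result.returncode == 0

if __name__ == "__main__":
    # Only print the command here; call run(...) on a machine with PDI installed.
    print(shlex.join(build_command("job", "/etl/jobs/nightly_load.kjb")))
```

A scheduler such as cron could call a script like this nightly, which matches the batch, scheduled use of Kitchen described above.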
