Factors for consideration in the decision include:
1. Costs of run-time processing and development.
3. Proprietary nature of source or target systems. For example, a source system may only be accessible via screen scraping because the file layouts and key structures are part of a package whose source code is not available. In such cases neither ETL nor EAI will work and a solution might have to be developed on a case-by-case basis.
4. The state of the data and the load-time window available to move data between source and target, and whether real-time movement of data is required.
5. The complexity of the mapping between source and target systems at the data-element level, and the data quality in each system.
5. Skills of staff relative to EAI and ETL tools.
Monday, September 29, 2008
Friday, September 19, 2008
Delivering Asian Chat to your ETL blog
Asian Chat City has now arrived on a blog near you. This is where tons of Asian users from everywhere in the world join ETL chats and have fun, 100% free! Hop on your cam and let tons of people see you right from your ETL blog while you enjoy their video cams! You can sign up for a completely free account today and start seeing and talking to all of the ETL users on the site, especially for growing your ETL connections.
SCD 1 implementation in Datastage
Type 1 Slowly Changing Dimension data warehouse architecture applies when no history is kept in the database. The new, changed data simply overwrites the old entries. This approach is used quite often with data that changes over time as a result of correcting data quality errors (misspellings, data consolidation, trimming spaces, language-specific characters). Type 1 SCD is easy to maintain and is used mainly when losing the ability to track the old history is not an issue.
SCD 1 implementation in Datastage
The job described and depicted below shows how to implement SCD Type 1 in Datastage. It is one of many possible designs which can implement this dimension. The example is based on the customers load into a data warehouse.
Datastage SCD1 job design
The most important facts and stages of the CUST_SCD2 job processing:
There is a hashed file (Hash_NewCust) which handles a lookup of the new data coming from the text file.
A T001_Lookups transformer does a lookup into the hashed file and maps new and old values to separate columns. A T002 transformer then overwrites the old values with the new ones, without concern for the overwritten data. The database is updated in a target ODBC stage (with the 'update existing rows' update action). A minimal sketch of the same overwrite logic follows.
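For readers without DataStage, here is a minimal sketch of the same SCD Type 1 pattern in plain Python with SQLite. The table and column names (d_customer, cust_id and so on) are invented for illustration; the real job works against the hashed-file lookup and ODBC stages described above.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE d_customer (
    cust_id   INTEGER PRIMARY KEY,
    cust_name TEXT,
    city      TEXT)""")
# Existing dimension row with a data quality error (misspelled city)
conn.execute("INSERT INTO d_customer VALUES (1, 'ACME Corp', 'Warsav')")

# Incoming feed (the text file in the DataStage job): row 1 corrects the
# misspelling, row 2 is a brand new customer.
incoming = [(1, "ACME Corp", "Warsaw"), (2, "Globex", "Berlin")]

# Equivalent of the Hash_NewCust lookup: which keys already exist?
existing = {row[0] for row in conn.execute("SELECT cust_id FROM d_customer")}

for cust_id, name, city in incoming:
    if cust_id in existing:
        # Type 1: overwrite old values, no history kept ('update existing rows')
        conn.execute("UPDATE d_customer SET cust_name = ?, city = ? WHERE cust_id = ?",
                     (name, city, cust_id))
    else:
        conn.execute("INSERT INTO d_customer VALUES (?, ?, ?)", (cust_id, name, city))

conn.commit()
print(conn.execute("SELECT * FROM d_customer ORDER BY cust_id").fetchall())
# [(1, 'ACME Corp', 'Warsaw'), (2, 'Globex', 'Berlin')]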
Jewish Men for ETL work
Jewish Men are hard to find for an ETL work, but why would you need Jewish men ? Not only you need Jewish men for dating but you can hook up with ETL developers also through jsingles.com and they have pride in offering a special dating service that offers 24/7 phone support, active chat, instant messaging, & much more. If your hunting for Jewish Men, then you've found it right here. You get the access to the biggest database right from you blog,and the quickest sites, & very active customer base from any dating site on the internet. Just take a look for the newest members who have recently joined near you from their IP address based location finder on the homepage.
Steps for generation of a SAS business datawarehouse
Please also be aware that SAS is a very powerful and flexible system and the steps below can be done in many different ways. This is just one way to get the expected results.
- The first step will be to read the dimensions and populate sample dimension data.
- Then a fact table will be created.
- In the next step we will randomly generate transactions for the fact table, with sales data for three years. To generate the numbers we will use SAS random number generators with uniform and other distributions (an illustrative sketch of this step, in Python rather than SAS, follows the list).
- The final tasks will be to validate and extract the generated data and feed the reporting application.
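The post does not show the SAS code itself, so here is a rough sketch of the generation step in Python. All names, sizes and value ranges are invented placeholders; in SAS the same thing would typically be done in a DATA step with RANUNI/RAND-style generators.

import random
import datetime

random.seed(42)

products = [f"PROD_{i:03d}" for i in range(1, 21)]   # sample product dimension members
stores   = [f"STORE_{i:02d}" for i in range(1, 6)]   # sample store dimension members
start    = datetime.date(2005, 1, 1)

fact_sales = []
for day in range(3 * 365):                           # three years of daily transactions
    txn_date = start + datetime.timedelta(days=day)
    for _ in range(random.randint(5, 15)):           # random number of sales per day
        fact_sales.append({
            "date":     txn_date,
            "product":  random.choice(products),
            "store":    random.choice(stores),
            "quantity": random.randint(1, 10),
            "amount":   round(random.uniform(5.0, 500.0), 2),  # uniform-distributed sales value
        })

print(len(fact_sales), "fact rows generated; sample:", fact_sales[0])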
Talk Reviews for African ETL
Now there is an easy way to write a website review for your African ETL work. I have never seen a site dedicated to reviews of country-specific ETL work before. There are tons of top South African websites reviewed by fellow bloggers that give a deep insight into strategies in the ETL world and keep you at the technology forefront. Good or bad, pleased or displeased, happy or dissatisfied with the service, come share your ETL experience with others working in the African ETL field. So stop staring and start sharing at TalkReviews.com.
Pentaho Data Integration - Kettle ETL tool
Kettle (K.E.T.T.L.E - Kettle ETTL Environment) has recently been acquired by the Pentaho group and renamed to Pentaho Data Integration. Kettle is a leading open source ETL application on the market. It is classified as an ETL tool, however the concept of the classic ETL process (extract, transform, load) has been slightly modified in Kettle, as it is composed of four elements, ETTL, which stand for:
Data extraction from source databases
Transport of the data
Data transformation
Loading of data into a data warehouse
Kettle is a set of tools and applications which allows data manipulations across multiple sources.
The main components of Pentaho Data Integration are:
Spoon - a graphical tool which makes designing ETTL process transformations easy. It performs the typical data flow functions like reading, validating, refining, transforming and writing data to a variety of different data sources and destinations. Transformations designed in Spoon can be run with Kettle Pan and Kitchen.
Pan - an application dedicated to running data transformations designed in Spoon.
Chef - a tool to create jobs which automate complex database update processes.
Kitchen - an application which executes jobs in batch mode, usually on a schedule, which makes it easy to start and control the ETL processing.
Carte - a web server which allows remote monitoring of the running Pentaho Data Integration ETL processes through a web browser.
Just talk reviews on talkreviews.com
Website reviews are an essential element of a perfect shopping experience. Especially when you are purchasing something online, there is no way you can put complete trust in any site unless you hear real testimonials from other customers who have bought the same product and shared their complete experience. Now talkreviews.com talks just about reviews and allows users to share their views through the online site review form. For example, when I travel to NJ and want to stay at the Doubletree Hotel Jersey City, I can easily get detailed stats about it, as well as about the top websites Hilton operates in several countries. It's a value-add for a rich internet shopping experience.
Basics of Data Integration
The ETL process is also very often referred to as Data Integration process and ETL tool as a Data Integration platform. The terms closely related to and managed by ETL processes are: data migration, data management, data cleansing, data synchronization and data consolidation. The main goal of maintaining an ETL process in an organization is to migrate and transform data from the source OLTP systems to feed a data warehouse and form data marts. At present the most popular and widely used ETL tools and applications on the market are:
*IBM Websphere DataStage (Formerly known as Ascential DataStage and Ardent DataStage)
*Informatica PowerCenter
*Oracle Warehouse Builder
*Ab Initio
*Pentaho Data Integration - Kettle Project (open source ETL)
*SAS ETL studio
*Cognos Decisionstream
*Business Objects Data Integrator (BODI)
*Microsoft SQL Server Integration Services (SSIS)
Monday, September 15, 2008
FREE cell phones for ETL developers
Have you been frustrated choosing the right cellular deals just because they keep changing every day? Or are you the reader who does not read the news because the news keeps changing? Well, that was just a joke. But I found the website bestincellphones.com after I researched Blackberry phones online. I wish I had found this website earlier, before signing a contract with my current cell phone company. My current contract is with Sprint phones, a 1 year contract which I was not comfortable signing. The website bestincellphones.com showed me deals not only with Sprint but also Verizon cell phones that did not have any contract, and you can get a cell phone for free. That is really amazing. Can you believe a free phone with no contract? You have to check this site regularly for these types of deals, because they keep changing every day and one fine day you could miss the deal of your lifetime. I always keep an eye on at&t cell phones because my dream is to get a free iPhone with no contract, which I have been waiting on for the past year. Hopefully one day this site will offer me the best deal if we ETL developers unite and appeal to the company to come up with such a deal for us. In any event, I will keep a close eye on it. I also forgot to mention that they have TMobile phones; my girlfriend has a T-Mobile that she regrets not buying from this site! So don't miss your chance at the amazing phone deals of your lifetime.
Benefits of Jitterbit Integration Solution
Jitterbit Integration Solution is the Leading Open Source Integration Platform that combines intuitive integration design with robust functionality and scalable performance. Jitterbit is designed to handle the most complex integration challenges between legacy, enterprise and On-Demand applications, including Business Process Fusion, ETL, SaaS, and SOA. The Jitterbit Solution enables organizations to unite their independent applications and data in record time and for the lowest total cost of ownership in the industry.
Rapid Results:
Start integrating in minutes!
Graphical drag-n-drop mapping without custom code
100% standards-based communication
Out-of-the-box connections for major enterprise applications, all major databases, flat-files, Web Services, and messaging systems
Pre-built integrations (Jitterpaks) available at the Trading Post
Robust & Scalable
Multi-threaded architecture
Optimized for Windows, Linux and Solaris
Advanced caching for improved performance
Easy Integration Management:
Complete visibility into transactions
Schedule processes and automate success/failure operations
Proactive alerting of data and connectivity errors
Role-based access privileges
Cost Effective:
No software licenses
No appliance hardware to buy
Low annual support subscription pricing
Up to 90% lower operating costs vs. proprietary alternatives
Instantly connect with Asian ETL developers
A majority of ETL developers are Asian singles, and some ETL blogs help with matching for someone like you. All you need to do is sign up on asianmatching.com and instantly connect to thousands of Asian ETL developers. This can even be helpful for HR folks. The website has many features that other dating sites do not have. Joining only takes a minute, and joining the community allows the site to better match you with other members, although you can search without becoming a member.
Is cloud computing an integrated ETL function ?
Cloud computing does nothing to solve tricky data integration problems that companies may be wrestling with. Cloud-based systems do not do the hard integration work - that's still up to the enterprise. If an organization chooses Salesforce.com CRM on demand instead of SAP or Oracle CRM deployed within their firewall, does anything really change? Yes, there's no doubt that there are potential benefits as far as getting CRM up and running goes, but then what? In the enterprises I've worked with, the bulk of the projects were not about implementing some new vendor package. Some of that was always occurring, but there was plenty more that was about integration, enhancements, and other development activities.
Black Women with ETL background
Black women have a deep ETL background, judging from the comments we get on this blog, and now I found the site blacksene.com where you can always find active singles, with many people in your neighborhood. It is an online singles and personals site featuring some amazing members. If you're looking for Black women, then you just found them on this blog right here. The site also features the largest database, the speediest sites, and the most active users of any dating service online.
Coming Soon: Open Source ETL Conversion Tool
At last count there are about a dozen ETL vendors in the market. This includes Informatica, IBM, Talend, Apatar, Ab Initio, Expressor-Software, Business Objects/SAP, Oracle, Microsoft, Cognos/IBM, Pentaho (Kettle), SAS, Enhydra Octopus and probably more that I'm missing. The challenge? Nothing that you do in any one of these products is portable enough to migrate to another ETL tool. I can see why this is the case: it's about lock-in. Once you build an integration using an ETL tool, depending on the complexity, it is very difficult to swap one out for another. This certainly keeps prices up in the proprietary vendor community; ASPs in this space are probably around 200K or more. For all the years these products have been on the market, it's amazing that nothing has been pursued to produce an ETL process standard like what has been done with BPEL (Business Process Execution Language). My entrepreneurial instincts lean towards creating an open source ETL conversion tool for the market. Although very difficult to build, and thus a barrier to entry, I suppose that building it as an open source solution might be the best approach. No one really owns it, and it would benefit customers, who would have an option to jump from one ETL product to another without feeling locked in forever.
Saturday, September 13, 2008
Black Chat Room is here now
As I was looking back at the Black Chat Room widgets for ETL blogs, it brought back memories of the oldest form of true chat rooms, the text-based variety. I remember some black chat rooms were around as far back as 1990, using a prototype of the text-only chat room. The most popular of this kind was Internet Relay Chat (IRC), and I think blackchatcity.com also had its own IRC at some point in the past. The popularity of these kinds of chat rooms has waned over the years, and IRC's popularity has rapidly given way to instant messaging. A notable number of people were also introduced to chat rooms through blackchatcity.com and web chat sites. Now the chat technology provided by blackchatcity.com is more advanced, with video, audio and real-time anonymous chat for your ETL blog. This is all FREE of charge, so go ahead and install it on your ETL blog.
SQLIO Predeployment and Optimization Best Practices
• SQLIO is generally not a CPU-intensive process. However, you should monitor during the run to ensure that there are no CPU bottlenecks, which would skew the results.
• Larger I/O sizes normally result in slightly higher latencies.
• For random I/O, focus on number of I/Os per second and latency. For sequential I/Os, focus mainly on throughput (MB/s) and latency.
• Ideally, you should see some increase in your throughput as you increase the number of volumes being tested at the same time. This is dependent on the underlying storage design and specifically impacted by any spindle sharing between LUNs. You may also see increased total throughput as the size of the I/O is increased. This is, however, dependent on the specifics of your particular configuration.
• When you no longer see increased throughput as you increase the number of outstanding I/Os, you have probably saturated the disk or the bandwidth of the channel, or the HBA queue depth setting is not high enough (the small sweep script sketched after this list is one way to find that plateau).
• Save all of the test results. It can be helpful to share the results from your SQLIO tests with your storage vendor since the vendor has detailed knowledge about how the equipment should perform for different I/O types, RAID levels, and so on. This data will also be valuable if troubleshooting I/O problems in the future.
• If you have multiple HBAs or multipathing software in use on the host, ensure that both paths are functionally working and, ideally, that the cumulative bandwidth of the HBAs can be exhausted. One test approach to exhaust bandwidth resources between the host and storage is to use very small test files (100 MB) for SQLIO so that they reside in the cache of the storage array. Sequential I/O issued against these small test files should saturate the bandwidth of the paths and allow you to determine whether there are any issues related to bandwidth. When running a test such as this, look at the cumulative MB/s attainable and relate this number to the number and bandwidth of HBAs. Ideally, you should be able to almost completely saturate the cumulative bandwidth of the HBAs in the host.
• If test results vary wildly, check to determine if you are sharing spindles with others on the array.
• Monitoring at host and on the array during tests is ideal for complete analysis. This provides more insight into actual limiting hardware resource (disk, fiber channel (FC) ports, service processors, and so on). Monitoring strategies using System Monitor are discussed in the next section.
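To make the outstanding-I/O guidance concrete, here is a hedged sketch of sweeping the queue depth with SQLIO from Python. The sqlio flags and the output labels parsed below reflect common usage but are assumptions; check them against your SQLIO version and adjust the test file path before running anything.

import re
import subprocess

TEST_FILE = r"E:\sqlio\testfile.dat"   # hypothetical test file on the volume under test

results = []
for depth in (1, 2, 4, 8, 16, 32, 64):
    # 8 KB random reads for 120 seconds, 2 threads, latency histogram enabled
    cmd = ["sqlio", "-kR", "-frandom", "-b8", "-t2", "-s120",
           f"-o{depth}", "-LS", "-BH", TEST_FILE]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout

    # Pull throughput/latency figures out of the report (labels may differ by version)
    iops = re.search(r"IOs/sec:\s*([\d.]+)", out)
    mbs  = re.search(r"MBs/sec:\s*([\d.]+)", out)
    lat  = re.search(r"Avg_Latency\(ms\):\s*([\d.]+)", out)
    results.append((depth,
                    float(iops.group(1)) if iops else None,
                    float(mbs.group(1)) if mbs else None,
                    float(lat.group(1)) if lat else None))

# When MB/s stops climbing as the depth rises, you have likely hit the disk,
# path or HBA queue-depth limit described above.
for depth, iops, mbs, lat in results:
    print(f"o={depth:3d}  IOPS={iops}  MB/s={mbs}  avg_lat_ms={lat}")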
Goth Chat on your ETL blog
Goth Chat brings to your ETL blog the community around a style of rock music that often evokes bleak, lugubrious imagery. Goths are performers or followers of this style of music, and the chat also touches on the union of two of the major influences in the development of European culture, the Roman Empire and the Germanic tribes that invaded it. Essentially this is your portal to the darkest corners of the Web; as the Dark Knight says, the darkest hour is before the dawn, so go ahead and talk with fellow Goths by embedding a Goth chat on your ETL blog.
Strategies to optimize SQL Server cube and measure group design
• Define cascading attribute relationships (for example Day > Month > Quarter > Year) and define user hierarchies of related attributes (called natural hierarchies) within each dimension as appropriate for your data. Attributes participating in natural hierarchies are materialized on disk in hierarchy stores and are automatically considered to be aggregation candidates. User hierarchies are not considered to be natural hierarchies unless the attributes comprising the levels are related through cascading attribute relationships.
• Remove redundant relationships between attributes to assist the query execution engine in generating the appropriate query plan. Attributes need to have either a direct or an indirect relationship to the key attribute, not both.
• Keep cube space as small as possible by only including measure groups that are needed.
• Place measures that are queried together in the same measure group. A query that retrieves measures from multiple measure groups requires multiple storage engine operations. Consider placing large sets of measures that are not queried together into separate measure groups to optimize cache usage, but do not explode the number of measure groups.
• Minimize the use of large parent-child hierarchies. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute (for example, the All attribute) unless it is disabled. As a result, queries returning cells at intermediate levels are calculated at query time and can be slow for large parent-child dimensions. If you are in a design scenario with a large parent-child hierarchy (more than 250,000 members), you may want to consider altering the source schema to reorganize part or all of the hierarchy into a user hierarchy with a fixed number of levels.
• Optimize many-to-many dimension performance, if used. When you query the data measure group by the many-to-many dimension, a run-time “join” is performed between the data measure group and the intermediate measure group using the granularity attributes of each dimension that the measure groups have in common. Where possible, reduce the size of the intermediate fact table underlying the intermediate measure group. To optimize the run-time join, review the aggregation design for the intermediate measure group to verify that aggregations include attributes from the many-to-many dimension.
Free BBW Chat for ETL Bloggers
Have you ever wished you could communicate with your ETL blog readers? Now you can host a BBW chat on your ETL blog and start communicating with BBW readers right away. It is a big and beautiful idea: get their comments on your blog right away, improve your posts and hone your blogging skills. bbwchatcity.com lets you host text, audio and video, and also provides a host of graphical user interface (GUI) text-based chat rooms which allow users to select an identifying icon and modify the look of their chat environment.
SAN usage guidelines for ETL
You can have multiple copies of the same database on different servers at the same time. This will allow you to do "mix and match" configurations by having EMC re-map volumes from one server to another. It may sound like a rare luxury to have that amount of storage available. Here are some SAN experiences and guidelines:
Establish a usage pattern for your storage volumes that lets you and your SAN expert predict what will be needed.
Follow a naming convention that gives each volume a name indicating what it is for.
Volume names should be unique across all machines (a small check of the convention and uniqueness rules is sketched after this list).
Mount the volumes on NTFS folders that follow the same naming convention.
Keep in communication with your SAN expert about the usage patterns for your storage.
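As a purely illustrative example, the following Python sketch checks a set of volume names against a made-up naming convention and flags names reused across servers. The pattern and the sample data are assumptions, not anything prescribed by these guidelines.

import re
from collections import Counter

# Hypothetical convention: <server>_<purpose>_<nn>, e.g. DWH01_DATA_01
PATTERN = re.compile(r"^[A-Z0-9]+_(DATA|LOG|TEMPDB|BACKUP)_\d{2}$")

def check_volumes(volumes_by_server):
    """volumes_by_server maps a server name to the list of volume names mounted on it."""
    problems = []
    all_names = [name for names in volumes_by_server.values() for name in names]
    for name in all_names:
        if not PATTERN.match(name):
            problems.append(f"{name}: does not follow the naming convention")
    for name, count in Counter(all_names).items():
        if count > 1:
            problems.append(f"{name}: used on more than one server")
    return problems

print(check_volumes({"DWH01": ["DWH01_DATA_01", "scratch"],
                     "DWH02": ["DWH01_DATA_01", "DWH02_LOG_01"]}))
# ['scratch: does not follow the naming convention', 'DWH01_DATA_01: used on more than one server']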
Friday, September 12, 2008
Free Vampire Chat for ETL Bloggers
Free Vampire Chat is the best place for vampire ETL bloggers, i.e. the ETL developers who blog throughout the night, to join the chatrooms and have a great time for free. Thanks to Vampire Chat City, plenty of vampire users from everywhere in the world can easily hop on their cams right through their ETL blogs and see you while you check out their live cams! If this interests you, sign up for a totally free membership today and enjoy seeing and talking to all of the users on your blog now.
ETL Performance world record with SQL Server 2008
Here is a world record that must go into the Guinness Book of World Records for ETL tools, demonstrated by the Microsoft team at Seattle. SQL Server 2008 can now handle more than one terabyte of data parsed from flat files, transferred over the network and loaded into the destination database in less than 30 minutes, a world record beating all previously published results using an ETL tool. That is a rate in excess of 2 TB per hour (650+ MB/second). To be precise, 1.18 TB of flat file data was loaded in 1794 seconds. This is equivalent to 1.00 TB in 25 minutes 20 seconds, or 2.36 TB per hour. Multiple competitors have published results based on TPC-H data. Informatica had the fastest time previously reported, loading 1 TB in over 45 minutes. SSIS has now beaten that time by more than 15 minutes.
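The quoted figures are internally consistent; here is a quick back-of-the-envelope check in Python (decimal units, 1 TB = 10^6 MB):

data_tb = 1.18          # flat file data loaded
seconds = 1794          # elapsed load time

mb_per_s = data_tb * 1_000_000 / seconds   # ~658 MB/s, i.e. the quoted "650+ MB/second"
tb_per_h = data_tb * 3600 / seconds        # ~2.37 TB/hour, quoted (truncated) as 2.36
s_per_tb = seconds / data_tb               # ~1520 s, i.e. 25 minutes 20 seconds per 1.00 TB

print(round(mb_per_s), round(tb_per_h, 2), divmod(round(s_per_tb), 60))
# 658 2.37 (25, 20)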
Jewish Chat for ETL Bloggers
If you are running a Jewish blog and are an ETL developer like me, then you will love the Jewish Chat from Jewish Chat City. You can talk with Jewish users all day or all night long in the free chats for Jewish people. You will love talking with religious webcam users who want to talk about Judaism in the free Jewish chat lines with working webcams. There are not only religious things to chat about here; you can also discuss ETL questions and designs, and embed the chat as HTML on your blog so your visitors can communicate right from your ETL blog.
Requirements for Spatial ETL tools
Spatial ETL tools should have the following characteristics:
1. Common Spatial Reference System or Coordinate Reference System names and descriptions
2. Coordinate system (and CRS-related object) dictionaries, such as the EPSG dictionary
3. Datum shift lists (towgs84) and datum grid shift files (NTv1, etc.)
4. Transformations, calculations and algorithms written in pseudocode that can be implemented in different languages
5. Descriptions of spatial reference systems that can be used by developers in different programming languages
6. Notes on transformation between different representations of a CRS (WKT, PROJ.4, GCTP, GML, ...) - a small sketch touching on points 1-6 follows this list
7. Test suites with test points in a variety of coordinate systems and their lat/long and WGS84 equivalents
8. Articles on spatial reference systems and translations useful for programmers interested in spatial reference system implementations. For example: Understanding The Difference Between National Vertical Datum of 1929 and the North American Datum of 1988
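One concrete way several of these requirements show up in practice, sketched with the Python pyproj library (my choice for illustration; the post does not name a specific toolkit). The EPSG codes come from the dictionaries in point 2, and the Berlin coordinates are just a made-up test point in the spirit of point 7.

from pyproj import CRS, Transformer

wgs84 = CRS.from_epsg(4326)     # geographic lat/long on WGS84 (points 1-2: EPSG dictionary)
utm33n = CRS.from_epsg(32633)   # projected UTM zone 33N

# Different representations of the same CRS (point 6): WKT and PROJ strings
print(utm33n.to_wkt()[:60], "...")
print(utm33n.to_proj4())        # may emit a deprecation warning in newer pyproj versions

# Coordinate transformation (points 3-4): lat/long -> easting/northing
transformer = Transformer.from_crs(wgs84, utm33n)       # default axis order: lat, lon for EPSG:4326
easting, northing = transformer.transform(52.5200, 13.4050)   # Berlin as a sample test point
print(round(easting), round(northing))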
Search public records by email
Now you can search public records by email. This service is offered by localpublicrecords.org and is a convenient way to verify employment. You can search for the domain and email address by the employer domain that you want to look up. It is also reliable, since they get the most up-to-date records from the most current public databases in the US. If your target has moved from one state to another, you can even track that activity with the state-specific search on the website.
Thursday, September 11, 2008
Mission statements for an integrated EDW
Detailed textual descriptions of entities such as customers, products, locations and calendars to be applied uniformly across subject areas, using standardized data values. This is a fundamental tenet of MDM.
Aggregated groupings such as types, categories, flavors, colors and zones defined within entities to have the same interpretations across subject areas. This can be viewed as a higher-level requirement on the textual descriptions.
That constraints posed by business intelligence (BI) applications, which attempt to harvest the value of consistent text descriptions and groupings, be applied with identical application logic across subject areas. For instance, constraining on a product category should always be driven from a field named Category found in the Product dimension.
That numeric facts are represented consistently across subject areas so that it makes sense to combine them in computations and compare them to each other, perhaps with ratios or differences. For example, if Revenue is a numeric fact reported from multiple subject areas, then the definitions of each of these revenue instances must be the same.
That international differences in languages, location descriptions, time zones, currencies and business rules be resolved to allow all of the previous consistency requirements to be achieved.
That auditing, compliance, authentication and authorization functions be applied in the same way across subject areas.
Coordination with industry standards for data content, data exchange and reporting, where those standards impact the enterprise. Typical standards include ACORD (insurance), MISMO (mortgages), SWIFT and NACHA (financial services), HIPAA and HL7 (health care), RosettaNet (manufacturing) and EDI (procurement).
Privacy with the public records searches
The best thing I found about a public records search such as publicrecordsdb.org is that none of your personal data is collected on the site, and Public Records Database will not ask you to provide any personal details, passwords or credit card information. As the publicrecordsdb.org site uses several resources, you may be asked for this information by another website, and you should read their privacy policy if you are concerned about the use of your personal information. If you are looking for the fastest and easiest way to search public records online, whether on a local or nationwide level, you should try out publicrecordsdb.org without worrying about any loss of privacy.
The new Jaspersoft ETL suite with NetBeans
Jaspersoft Corporation announced the full-production availability of Jaspersoft’s iReport for NetBeans Integrated Development Environment (IDE), which has been certified for NetBeans 6.0 and 6.1 and supports the new beta version 6.5.
In addition, Jaspersoft announced that Jasper for MySQL, the MySQL OEM edition of the Jaspersoft Business Intelligence Suite, was upgraded to version 3 (v3).
iReport is the graphical report and dashboard design tool for JasperReports, a popular open source reporting product. iReport has been available in Beta since December as a native NetBeans plug-in. iReport developers deploy their reports with JasperServer, which includes a secure repository, dashboards, scheduling and ad hoc reporting.
The new Jasper for MySQL Version 3 incorporates all of the new features and functions Jaspersoft announced earlier this summer and is specifically certified and packaged for Sun's MySQL database.
Pre-requisite to ETL consulting hire
More and more ETL consulting firms have started using public records searches before hiring their top notch consultants. Complete Public Records (CPR) is so far the most comprehensive database of free public records. ETL firms can search the largest public records database on the World Wide Web. CPR has compiled the most useful and direct links to US public records and state public records. ETL firms can browse public records resources to research marriage and divorce records, birth and death records, business information, property records, court and criminal public records, links to employment searches, unemployment benefits, etc. The definitive know-how before you hire your key resources.
Have you done ETL reconciliation?
Reconciliation is critical. If you do not reconcile, you have no way of knowing whether the ETL process completed successfully, and the business users rarely have any way of performing their own reconciliation. This means they may be dealing with old data (which they may assume is current) and the data may be incomplete. These users will produce incorrect results, and when the cause is determined, they will point to you. You should be looking at tie-outs, which include certification of the number of rows loaded and verification of numbers within the database. Tie-outs are standard features of ETL tools, but they have to be turned on, and someone has to verify the results each time the ETL process completes. A minimal sketch of such a tie-out follows.
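As a rough illustration (not a feature of any particular ETL tool), here is a small Python tie-out that certifies the row count and a revenue control total between an extract file and the loaded table. The file layout, table and column names are invented for the example.

import csv

def source_tieout(path):
    """Row count and revenue control total taken from the extract file."""
    rows, total = 0, 0.0
    with open(path, newline="") as f:
        for rec in csv.DictReader(f):
            rows += 1
            total += float(rec["revenue"])
    return rows, total

def target_tieout(conn):
    """The same figures taken from the loaded table (conn is any DB-API connection)."""
    rows, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(revenue), 0) FROM f_sales_load").fetchone()
    return rows, float(total)

def reconcile(path, conn):
    src_rows, src_total = source_tieout(path)
    tgt_rows, tgt_total = target_tieout(conn)
    if src_rows != tgt_rows or abs(src_total - tgt_total) > 0.01:
        raise RuntimeError(f"Tie-out failed: source=({src_rows}, {src_total:.2f}) "
                           f"target=({tgt_rows}, {tgt_total:.2f})")
    # Publish these figures with each load so users can verify the data is current and complete.
    return src_rows, src_total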
Ringtones for the ETL bloggers
Kanye West Homecoming Ringtone is a one stop shop to get Kanye West ringtones for your mobile phone. The site is dedicated to Kanye Omari West (pronounced /ˈkɑnjɛj/) (born June 8, 1977 in Atlanta, Georgia), an American record producer and multiple Grammy Award-winning rapper and singer who rose to fame in the mid 2000s. The best feature is Listen Then Download, so you don't waste downloads on your phone and run your data plan down the drain. I hope you will enjoy all the Kanye West ringtones and find the one you were looking for to download to your cell phone or your ETL blog.
Factors for deciding best ETL Estimation Strategy
You might have heard many requests like: I need to prepare an effort estimate for our ETL process. Is it right to calculate the effort based only on the number of sources, targets and transformations? If not, what other criteria do we need to follow when preparing the estimate? Also, what level of complexity and what data quality issues need to be taken into account?
So here I post some sample ETL estimation guidelines for reference.
How well documented are the source files?
How knowledgeable are the ETL developers with the source data?
How clean does the data need to be? We often find that some data does not have the same data quality requirements.
How knowledgeable are the ETL developers with the ETL tool?
Will the ETL developers be assigned full time to the project?
How well is the project being managed?
How much data has to go through the ETL process? Very large amounts of data result in challenges (read problems) that take time and effort to correct.
What is the skill level of the programming resources?
Availability of resources (how many other projects are they working on, other time off for sickness and vacations)?
Who is doing the testing?
Who is writing the test cases?
How well were the test cases written?
How well were the requirements written?
Is the scope still changing?
What is the level of data quality?
What is the level of understanding of the data?
My suggestion is to perform some research before providing estimates (a purely illustrative estimation formula is sketched after this list). This can be done by:
Reviewing how the data will be used,
As well as reviewing the data in the data sources by writing queries and manually looking at the data, and
Performing some simple pseudo coding of the solution first.
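To show how the source/target/transformation counts and the softer checklist factors can be combined, here is a purely illustrative back-of-the-envelope model in Python. Every weight and multiplier is an invented placeholder to be calibrated against your own project history, not an established estimation standard.

BASE_DAYS_PER_INTERFACE = 3.0   # design + build + unit test for a simple mapping (placeholder)

def estimate_days(n_sources, n_targets, n_transformations,
                  complexity=1.0,       # 1.0 simple .. 2.5 very complex mappings
                  data_quality=1.0,     # 1.0 clean .. 2.0 heavy cleansing needed
                  team_familiarity=1.0, # 1.0 knows tool and data .. 1.5 new to both
                  test_overhead=0.3):   # fraction added for test cases and reconciliation
    interfaces = n_sources + n_targets + n_transformations
    build = (interfaces * BASE_DAYS_PER_INTERFACE
             * complexity * data_quality * team_familiarity)
    return round(build * (1 + test_overhead), 1)

# e.g. 5 sources, 2 targets, 12 transformations, moderately complex and dirty data
print(estimate_days(5, 2, 12, complexity=1.4, data_quality=1.3, team_familiarity=1.2))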
Friday, September 5, 2008
Know whom you are dealing with
Ever wanted to look up online criminal court records? Last time I was in an accident with another person, I wanted to do a check on his background. I didn't have an easy way to find the other person's court records to prove that his side of the story was wrong. Luckily I found the site courtrecordssearch.net, from which it was easy to access any court record or court documents. Thank goodness for him that I did not find anything wrong with the other person. But I love to use this site to check on any person's records before conducting any business with them. You never know whom you are dealing with!
PPP are you listening
I was really excited about PPP as it really pulled us out of a hole, living on a fixed income. I have 3 blogs and they are hardly doing anything at all now, so I have to look elsewhere to make some money for things like food. PPP said they passed the 100,000 posties point just a while back, but if there were some way to track it, I bet it is way down now, as we just aren't getting any posts. What's the solution? There don't seem to be many new posts at all, and one $5 post in 4 days won't buy the bread and milk. Apparently, all the posts are going to those with high Real Ranks and the rest of us are left out in the cold. Does anyone out there feel this way too?