Robin Bloor PhD & Rebecca Jozwiak

The Bloor Group

Executive Summary

In this white paper, we examine the evolution of ETL and conclude that a new generation of ETL products, which we call ETL 2.0, is putting a much-needed emphasis on the transformation aspect of ETL. The following bullet points summarise the contents of the paper.

• Data movement has proliferated wildly since the advent of the data warehouse, creating a market for ETL products that help to automate such transfers.

• Few data centres use ETL consistently; many data transfer programs are hand-coded or implemented with SQL utilities. As a consequence, the ETL environment is usually fragmented and poorly managed.

• Databases and data stores, in combination with data transfer activities, can be viewed as providing a data services layer to the organisation. Ultimately, the goal of such data services is to provide any data needed by authorised IT and business users when they want it and in the form that they need it.

• The capabilities of the first generation of ETL products are now being stressed by:

- The growth of new applications, particularly BI applications.

- The growth of data volumes, the increasing variety of data, and the need for speed.

- The increasing need to analyse very large pools of data, often including historical and social network data.

- High-availability (24/7) requirements that have closed the batch windows in which ETL programs used to run.

- Rapid changes in technology.

With respect to technology changes, we note the emergence of a whole new generation of databases purpose-designed to exploit current computer hardware, both to achieve better performance and scale and to manage very large collections of data. Similarly, we believe the second generation of ETL products will deliver better performance and scalability and will be better able to process very large volumes of data.

• We characterise the second generation of ETL products as having the following qualities:

- Improved connectivity

- Versatility of extracts, transformations, and loads

- Breadth of application

- Usability and collaboration

- Economy of resource usage

- Self-optimisation

ETL 2.0

By using an ETL tool that is versatile in both connectivity and scalability, businesses can overcome the challenges of large data volumes and improve the overall performance of data flows. The versatility of second-generation ETL tools also supports a wide variety of applications that address business needs, however complex. These products will shorten the time to value for many applications that depend on data flows and provide a framework that fosters collaboration among developers, analysts, and business users. Because the software is efficient, these tools will require fewer hardware resources than previous tools, and because transformations are processed in memory, they will eliminate the need for workarounds, scheduling, and constant tuning.

In summary, it is our view that ETL tools with such capabilities will become increasingly strategic because of their critical role in providing data services to applications and business users. Their inherently low development and maintenance costs can also help businesses achieve a significantly lower total cost of ownership (TCO).

