How To Obtain Flexible, Cost-Effective Scalability and Performance Through Pushdown Processing
Under the Hood of the Pushdown Optimization Option Now Available Through Informatica PowerCenter 8
This document contains Confidential, Proprietary and Trade Secret Information ("Confidential Information") of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica. While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice. The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product, as well as the timing of any such release or upgrade, is at the sole discretion of Informatica.
Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700.
This edition published April 2006.
White Paper
Table of Contents
Executive Summary
Historical Approaches to Data Integration
The Combined Engine- and RDBMS-based Approach to Data Integration
How Pushdown Optimization Works
    Overview of Pushdown Processing
    Two-Pass Pushdown Processing
    Partial Pushdown Processing
    Full Pushdown Processing
    Platform-specific Pushdown Optimization
    Limitations on the Types of Transformations that Can Be Pushed to the Database
Pushdown Optimization
Executive Summary
Over the next five to 10 years and beyond, the two dominant variables in the enterprise data integration equation are painfully clear: more data and less time. Given these, what's the right data integration strategy to effectively manage terabytes or even hundreds of terabytes of data with enough flexibility and adaptability to cope with future growth?

Historically, data integration was performed by developing hand-coded programs that extract data from source systems, apply business/transformation logic, and then populate the appropriate downstream system, be it a staging area, data warehouse, or other application interface.

Helping to overcome the challenges of implementing data integration as an enterprise-wide function, PowerCenter 8 offers key new features that can enable near-universal data access, deliver greater performance and scalability, and significantly increase developer productivity.

"The push-down logic will allow us to take further advantage of our database processing power."
Mark Cothron, Data Integration Architect, Ace Hardware
Hand-coding has been replaced, in many instances, by data integration software that performs the access, discovery, integration, and delivery of data using an engine or data integration server and visual tools to map and execute the desired process. Driven by accelerated productivity gains and ever-increasing performance, state-of-the-art data integration platforms, such as Informatica PowerCenter, handle the vast majority of today's scenarios quite effectively.

PowerCenter has enjoyed wide acceptance and use by high-volume customers representing companies and government organizations of all sizes. Based on this use, Informatica has identified performance scenarios where processing data in a source or target database, instead of within the data integration server, can lead to significant performance gains. These scenarios are primarily where data is co-located within a common database instance, such as when staging and production reside in a single Oracle relational database management system (RDBMS), or where a large investment has been made in database hardware and software that can provide additional processing power.

With these scenarios in mind, Informatica Corporation set out to deliver a solution that offers the best of both worlds without incurring undue configuration and management burden: a solution that leverages the performance capabilities of its data integration server and the processing power of a relational database interchangeably to optimize the use of available resources.
Informatica has developed a solution that offers IT architects flexibility and ease of performance optimization through pushdown processing into a relational database using the same metadata-driven mapping and execution architecture: the PowerCenter Pushdown Optimization Option, now available through Informatica PowerCenter 8. PowerCenter 8 is the latest release of Informatica's single, unified enterprise data integration platform for accessing and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed.

This white paper describes the flexibility, performance optimization, and leverage provided by the PowerCenter 8 Pushdown Optimization Option. It examines the historical approaches to data integration and describes how a combined engine- and RDBMS-based approach to data integration can help the enterprise:

- Cost-effectively scale by using a flexible, adaptable data integration architecture
- Increase developer and team productivity
- Save costs through greater leverage of RDBMS and hardware investments
- Eliminate the need to write custom-coded solutions
- Easily adapt to changes in underlying RDBMS architecture
- Maintain visibility and control of data integration processes

After reading this paper, you will understand how pushdown processing works, the option's technical capabilities, and how these capabilities will benefit your environment.
Figure 2: Data Integration Solution Architects Can Configure the Pushdown Strategy through a Simple Drop-Down Menu in the PowerCenter 8 Workflow Manager
Pushdown optimization can be used to push data transformation logic to the source or target database. The amount of work data integration solution architects can push to the database depends on the pushdown optimization configuration, the data transformation logic, and the mapping configuration. When pushdown optimization is used, PowerCenter writes one or more SQL statements to the source or target database based on the data transformation logic. PowerCenter analyzes the data transformation logic and mapping configuration to determine the data transformation logic it can push to the database. At run time, PowerCenter executes any SQL statement generated against the source or target tables, and it processes any data transformation logic within PowerCenter that it cannot push to the database. Using pushdown processing can improve performance and optimize available resources. For example, PowerCenter can push the data transformation logic for the mapping seen in Figure 2 to the source database.
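The analysis step described above can be pictured with a small sketch. This is not PowerCenter's actual algorithm: the `Transformation` class, the `PUSHABLE_TYPES` set, and the `split_for_source_pushdown` function are hypothetical names introduced only for illustration, and the sketch assumes a simple linear mapping. It walks the mapping from the source and stops at the first transformation the database cannot run, leaving the rest for the engine.

```python
from dataclasses import dataclass

# Hypothetical set of transformation types that can be expressed as SQL.
# The real rules depend on the database and the mapping configuration.
PUSHABLE_TYPES = {"source_qualifier", "filter", "sorter", "expression"}


@dataclass(frozen=True)
class Transformation:
    name: str
    kind: str


def split_for_source_pushdown(pipeline):
    """Return (pushed, remaining) for a linear list of transformations.

    Transformations are pushed in order, starting at the source, until
    the first one the database cannot run; that one and everything after
    it stay in the integration engine.
    """
    for index, step in enumerate(pipeline):
        if step.kind not in PUSHABLE_TYPES:
            return pipeline[:index], pipeline[index:]
    return list(pipeline), []


# Illustrative mapping: two pushable steps followed by one that is not.
mapping = [
    Transformation("SQ_ITEMS", "source_qualifier"),
    Transformation("FIL_ITEM_ID", "filter"),
    Transformation("XML_OUT", "xml_generator"),
]
pushed, remaining = split_for_source_pushdown(mapping)
```

Here `pushed` holds the steps that would become SQL, and `remaining` holds the steps the engine would still process at run time.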
The mapping contains a filter transformation that filters out all items except for those with an ID greater than 1005. PowerCenter can push the data transformation logic to the database, and it generates the following SQL statement to process the data transformation logic:

    INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE)
    SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC,
           CAST(ITEMS.PRICE AS INTEGER)
    FROM ITEMS
    WHERE (ITEMS.ITEM_ID > 1005)

PowerCenter generates an INSERT SELECT statement to obtain and insert the ID, NAME, and DESCRIPTION columns from the source table, and it filters the data using a WHERE clause. PowerCenter does not extract any data from the database during this process. Because PowerCenter does not need to extract and load data, performance improves and resources are maximized.
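To see the effect of an INSERT SELECT of this shape, the following minimal sketch runs a similar statement against an in-memory SQLite database using Python's standard `sqlite3` module. SQLite, the table names `ITEMS_SRC` and `ITEMS_TGT`, and the sample rows are assumptions made for illustration; they are not part of PowerCenter or of the example mapping. The point is that the filter and the cast run inside the database, and no rows pass through the client during the load.

```python
import sqlite3

# In-memory database standing in for a shared source/target instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEMS_SRC (ITEM_ID INTEGER, ITEM_NAME TEXT, ITEM_DESC TEXT, PRICE REAL);
    CREATE TABLE ITEMS_TGT (ITEM_ID INTEGER, ITEM_NAME TEXT, ITEM_DESC TEXT, n_PRICE INTEGER);
    INSERT INTO ITEMS_SRC VALUES
        (1001, 'Hammer', 'Claw hammer', 12.75),
        (1006, 'Wrench', 'Adjustable wrench', 18.20),
        (1010, 'Saw', 'Hand saw', 25.00);
""")

# One statement performs the filter, the type conversion, and the load.
pushdown_sql = """
    INSERT INTO ITEMS_TGT (ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE)
    SELECT ITEM_ID, ITEM_NAME, ITEM_DESC, CAST(PRICE AS INTEGER)
    FROM ITEMS_SRC
    WHERE ITEM_ID > 1005
"""
conn.execute(pushdown_sql)
conn.commit()

# Read back only to verify the result; the load itself moved no rows
# out of the database.
target_rows = conn.execute(
    "SELECT ITEM_ID, n_PRICE FROM ITEMS_TGT ORDER BY ITEM_ID"
).fetchall()
```

Only the two rows with an ID above 1005 reach the target table, with their prices truncated to integers by the cast.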
In Figure 5, the sources and targets are the same instance, and the data transformation logic can be pushed to the database. The work of filtering, joining, and sorting the data is performed by the database, freeing PowerCenter resources to perform other tasks. However, the transformation logic is represented in PowerCenter, so it is platform independent and easy to modify. The visual representation makes it simple to review the flow of logic, and the Pushdown Optimizer Viewer allows you to preview the SQL statements PowerCenter will execute at run time.
[Table: transformations supported by source-side and target-side pushdown optimization; row labels not recoverable]
With the PowerCenter Pushdown Optimization Option, data integration solution architects can leverage both the database's and PowerCenter's capabilities by pushing some transformation logic to the database and processing other data transformation logic using PowerCenter.

For example, users might have a mapping that filters and sorts data and then outputs the data to an XML target. To use database and PowerCenter capabilities to their fullest potential, data integration solution architects might push the transformation logic for the Source Qualifier, Filter, and Sorter transformations to the source database, and then extract the data to output it to the XML target. Figure 7 shows a mapping that uses database capabilities and PowerCenter's XML capabilities.
Figure 7: Mapping Pushes Transformation Logic to the Source and Writes to an XML Target
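The division of labor in this kind of mapping can be sketched in a few lines of Python. The sketch assumes an in-memory SQLite table named `ITEMS` with made-up rows and uses the standard `xml.etree.ElementTree` module as a stand-in for the XML target; none of these names come from PowerCenter. Filtering and sorting run as SQL in the database, and only the XML generation runs outside it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database standing in for the source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEMS (ITEM_ID INTEGER, ITEM_NAME TEXT);
    INSERT INTO ITEMS VALUES (1010, 'Saw'), (1001, 'Hammer'), (1006, 'Wrench');
""")

# Pushed to the source database: the source read, the filter, and the sort.
source_sql = (
    "SELECT ITEM_ID, ITEM_NAME FROM ITEMS "
    "WHERE ITEM_ID > 1005 ORDER BY ITEM_ID"
)

# Processed outside the database: building the XML output from the
# already filtered and sorted rows.
root = ET.Element("items")
for item_id, item_name in conn.execute(source_sql):
    item = ET.SubElement(root, "item", id=str(item_id))
    item.text = item_name

xml_output = ET.tostring(root, encoding="unicode")
```

The rows arrive from the database already reduced and ordered, so the step outside the database only has to serialize them.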
Increased Performance
The PowerCenter Pushdown Optimization Option increases system performance by providing the flexibility to push data transformation processing to the most appropriate processing resource, whether within a source or target database or through the PowerCenter server. With this option, PowerCenter is the only enterprise data integration software on the market that allows data integration solution architects to choose when pushing down processing offers a performance advantage.

With the PowerCenter Pushdown Optimization Option, data integration solution architects can choose to push all or part of the data transformation logic to the source or target database. They can select the database they want to push transformation logic to, and they can choose to push some sessions to the database while allowing PowerCenter to process other sessions.

For example, let's say an IT organization has an Oracle source database with very low user activity. This organization may choose to push transformation logic for all sessions that run on this database. In contrast, let's say an IT organization has a Teradata source database with heavy user activity. This organization may choose to allow PowerCenter to process the transformation logic for sessions that run on this database. In this way, the sessions can be tuned to work with the load on each database, optimizing performance.

With the PowerCenter Pushdown Optimization Option, data integration solution architects can also use variables to push different volumes of transformation logic to the source or target database at different times during the day. For example, partial pushdown optimization may be used during the peak hours of the day, while full pushdown optimization is used from midnight until 2 a.m., when activity is low.
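The time-of-day example can be expressed as a small decision function. This is only a sketch of the scheduling idea: the function name `pushdown_for_time` and the strings `"full"` and `"partial"` are hypothetical, and the sketch does not show how the value would be supplied to a PowerCenter session. It assumes that every hour outside the midnight to 2 a.m. window counts as peak time.

```python
from datetime import time

# Assumed low-activity window, taken from the example in the text.
LOW_ACTIVITY_START = time(0, 0)
LOW_ACTIVITY_END = time(2, 0)


def pushdown_for_time(now):
    """Choose a pushdown level for a session starting at the given time.

    Returns "full" inside the low-activity window (midnight up to, but
    not including, 2 a.m.) and "partial" at any other time.
    """
    if LOW_ACTIVITY_START <= now < LOW_ACTIVITY_END:
        return "full"
    return "partial"


overnight_choice = pushdown_for_time(time(1, 30))
afternoon_choice = pushdown_for_time(time(14, 0))
```

A scheduler could call a function like this before each run and pass the result to the session through whatever variable mechanism the environment provides.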
Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com
Informatica Offices Around The Globe: Australia, Belgium, Canada, China, France, Germany, Japan, Korea, the Netherlands, Singapore, Switzerland, United Kingdom, USA
© 2006 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and PowerCenter are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be tradenames or trademarks of their respective owners.