CDC Transformation and Delivery
Data at the speed of business
© 2008 IBM Corporation

What is CDC
Change Data Capture
• Capture data events in source database and move only the changes to the target
Many different ways of doing CDC
• Timestamps
• Triggers
• API
• Log-based

What fuels the IBM CDC Roadmap?
The widest breadth of functionality:
• Batch/pull and real-time push processing
• Guaranteed delivery/transactional integrity
• Multiple topologies (peer to peer, 1 to many, many to 1, uni-directional, bidirectional)
• Homogeneous & heterogeneous data synchronization
Broadest range of sources and targets:
• Log-based capture agents for DB2 (on all platforms), Oracle, SQL Server, Sybase, IMS, VSAM, IDMS, ADABAS
• Native/parallel applies for all RDBMS and JMS
• Multiple data delivery protocols (TCP/IP, JMS)
Industry-leading performance and scalability:
• End-to-end throughput and low latency
• Parallel apply to target system
• Low impact on source database systems

What fuels the IBM CDC Roadmap? (continued)
3000+ customers using the existing CDC products for:
• HA/DR (DB back-up, fault tolerance)
• Real-time reporting/off-load querying
• Application co-existence (migrations, upgrades, modernization)
• eCommerce (web apps, portals, data distribution)
• Dynamic Data Warehousing, Master Data Management
700+ people in engineering focused on Information Integration, including 170+ focused on CDC technologies
The most comprehensive suite of data integration products:
• BoB transform / cleanse / discovery, metadata management, scalable performance, services enabled for SOA architectures
• 5000+ customers using Information Server components

The IBM Solution: IBM Information Server
Delivering information you can trust
[Diagram: IBM Information Server. Labels: Unified Deployment; Unified Metadata Management.]
• Discover, model, and govern information structure and content
• Standardize, merge, and correct information
• Combine and restructure information for new uses
• Capture, virtualize and move information for inline delivery

InfoSphere CDC Solution
Provides real-time change data capture and delivery for:
• Dynamic warehousing and real-time reporting
• Synchronization and replication
• Event detection
[Diagram labels: Developers; Architects; DataMirror.]
Delivers real-time changed data to Information Server, applications and targets or message queues, without impacting performance of production systems
• Minimal impact on production systems
• High scalability and end-to-end performance
• Guaranteed data integrity
• Proven heterogeneous support

Key Value Proposition
[Diagram labels: Low Impact; Low Latency; Continuous Consistent Data Delivery.]
LATENCY
1. Near-zero latency for pervasive integration projects.
2. ETL can also deliver low latency, but at what impact to production systems and mission-critical applications?
IMPACT
1. Reduces risk to operational systems.
2. Non-intrusive to applications and databases.
3. Use of native DB logs, documented overhead of 2-5%.
4. No use of disk-based staging or triggers.
5. Management easily integrated into existing IT operations.
6. Help reduce/manage operational windows.
CONSISTENT DATA DELIVERY
1. Data pushed from source, delivered in a continuous stream, continuous with business operations.
2. Transaction consistency maintained to preserve units of work and referential integrity.
3. Full transaction granularity, with before and after images of all transactional changes.
4. Data event aware, can be used to trigger specific business processes.
5. Fault tolerance, recover to last committed transaction.

Architecture
[Diagram labels: Java-based GUI for admin & monitoring; Publisher; Subscriber; Source Engine and Metadata; Target Engine and Metadata; Journal Log; Redo/Archive Logs; TCP/IP; Database; ODS; Audit; JMS; Business Process; Flat files; Direct to existing ETL.]
Databases: Oracle, DB2, DB2 UDB, SQL Server, Sybase, Teradata, Netezza, PointBase; IMS, VSAM, IDMS, Adabas, DataCom (Classic)
Platforms: z/OS, System i5, Red Hat and SUSE Linux, AIX, HP/UX (PA-RISC and Itanium), Solaris SPARC, Tru64 UNIX, Windows
Messaging Middleware: MQSeries, Sun Open Message Queue (JMS), TIBCO, BEA AquaLogic, Oracle Fusion Middleware

Use Cases
Customer examples

1. Building A Low Latency ODS for Operational Reporting and Auditing
“Solution deployed to improve visibility into lines of business for organizations with Operational BI and Data Auditing requirements”
[Diagram: Production Servers (ERP, Manufacturing; Finance), each with an OLTP DB and native log, feeding an Operational Data Store (ODS).]
• Each OLTP insert, update and delete operation can be stored as an insert, update and delete to maintain a synchronized copy of the data.
• All OLTP insert, update and delete operations can be stored as inserts to maintain a complete transaction history. Add relevant information such as timestamp, transaction type, source system id, and id of the user who changed the transaction.

2. Complementing An Existing ETL Technology
“Solution deployed to improve visibility into lines of business (i.e. Dynamic Warehousing) and help manage impact concerns caused by ETL on mission critical systems”
[Diagram: Production Server (Retail, Point Of Sale; OLTP DB, native log), continuous feed to a Stage on the ETL Server, then scheduled batch ETL into the Data Warehouse (EDW).]
Stage can be:
1. Relational Table
2. Flat File
3. Message Queue
4. Direct to ETL
Complementary ETL Technologies:
1. Informatica “Power Center”
2. Business Objects “Data Integrator”
3. Ab Initio
4. IBM “DataStage” (has native integration)

3. Continuous Feed Of A Business Intelligence Appliance
“Solution deployed to improve visibility into lines of business by combining the cost/performance benefits of a BI Appliance with real-time data feeds”
[Diagram: Production Server (ERP, Manufacturing; OLTP DB, native log), “CDC” continuous feed to a Staging Server (CDC stage, flat file), then the appliance load API into the Appliance Nodes/Cluster.]
• Flat file containing transaction changes is viewed as an external file by the appliance.
• Load threshold is based on the number of transactions or a time interval.
• Once the threshold is reached, the appliance “load API” is called to bulk load the transactions into the appliance.
Supported Appliances:
1. Teradata
2. Netezza
3. GreenPlum
4. Paraccel
5. IBM Balanced Warehouse

4. Data Event Synchronization via an Enterprise Service Bus
“Solution deployed to provide real time data feeds for SOA and application integration business requirements”
[Diagram: Production Servers (Telco Billing; Telco CRM), each with an OLTP DB and native log, “CDC” continuous feed to Queue 1 on the ESB. Legend: CDC/Replication Process; Other Technology; CDC/Replication License.]
• A license would reside on the server that hosts the message oriented middleware.
Complementary ESB Technologies:
1. IBM “MQ Series”
2. TIBCO “Business Works”
3. BEA “Aqualogic”
4. WebMethods “Fabric”

5. e-Commerce Application Synchronization
“Solution deployed to provide continuous customer, sales and inventory visibility in web-based e-commerce applications”
[Diagram: Website (Orders), Corporate Production Server (Inventory) and Retail Downtown Store (Point Of Sale), each with an OLTP DB and native log.]
• Provides continuous bi-directional synchronization between web-based applications and mission critical business applications.
• Helps organizations improve customer online shopping experience with improved visibility into inventory and customer shopping activities.

6. Data Synchronization for Upgrades, Migrations and Workload Balancing
“Solution deployed to help IT support application, database and platform migrations”
[Diagram: Production Server (ERP, Manufacturing; OLTP DB, native log) and Testing Server (ERP, Manufacturing; OLTP DB, native log), linked for upgrades and migrations and for workload balancing.]
• Keep data synchronized between the current production server and a server deployed to test a new application upgrade/version, or a hardware/OS upgrade.
• Workload balancing capability (i.e. master to master support) allows database instances to remain synchronized where dual or double data entry is a requirement (i.e. data entry occurring on both systems at the same time).

7. Offloading Production Query & Reporting Cycles
“Solution deployed to allow organizations to offload the impact of query and reporting to a non mission critical system”
[Diagram: Production Server(s) (Finance 1, Finance 2, Finance 3; Services; OLTP DB, native log) feeding a Reporting Server (Report, Query) by log-based capture or “Table Copy”.]
• Reporting server can also be used for consolidation requirements, i.e. consolidating financials from multiple branches into a single corporate instance.
• Replication frequency generally varies from continuous (near real-time) to periodic.
• Table level refresh or copy can be used in addition to log-based change data capture.

8. Data Backup And Availability
“Solution deployed to allow organizations to back up copies of critical data for recovery where a full disaster solution is not a requirement”
[Diagram: Production Server (Finance; OLTP DB, native log), “CDC” continuous feed to a backup instance on an Availability Server, or to Backup Partition 1 and Backup Partition 2.]
• Availability of data only; does not support DDL replication.
• Exact image replication to produce a backup copy on a separate server or in a different partition on the same server.
• A separate license is not required for each partition used on the production server.

Thank You
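
Illustrative sketch (not part of the product): use case 1 describes storing every OLTP insert, update and delete as an insert, with a timestamp, transaction type, source system id and user id added, so that a complete transaction history is kept. The Python below shows one way that mapping could look. `ChangeEvent`, `to_audit_row`, `apply_to_audit_table` and the `audit_*` column names are hypothetical names chosen for this example; they are not InfoSphere CDC types or APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of one captured change (an assumption, not a product type)."""
    operation: str           # "INSERT", "UPDATE" or "DELETE"
    table: str
    row: Dict[str, Any]      # after image (before image for a DELETE)
    committed_at: datetime
    source_system: str
    changed_by: str


def to_audit_row(event: ChangeEvent) -> Dict[str, Any]:
    """Build the row that is appended to the audit table for any kind of change."""
    if event.operation not in {"INSERT", "UPDATE", "DELETE"}:
        raise ValueError(f"unknown operation: {event.operation}")
    audit = dict(event.row)  # copy so the captured image is left untouched
    audit.update(
        audit_timestamp=event.committed_at.isoformat(),
        audit_operation=event.operation,
        audit_source_system=event.source_system,
        audit_user=event.changed_by,
    )
    return audit


def apply_to_audit_table(audit_table: List[Dict[str, Any]],
                         events: List[ChangeEvent]) -> None:
    """Append one audit row per event; existing rows are never updated or deleted."""
    for event in events:
        audit_table.append(to_audit_row(event))


if __name__ == "__main__":
    ts = datetime(2008, 1, 1, tzinfo=timezone.utc)
    history: List[Dict[str, Any]] = []
    apply_to_audit_table(history, [
        ChangeEvent("INSERT", "orders", {"id": 1, "qty": 2}, ts, "ERP", "alice"),
        ChangeEvent("DELETE", "orders", {"id": 1, "qty": 2}, ts, "ERP", "bob"),
    ])
    for row in history:
        print(row)
```

Because the audit table only ever grows, a delete on the source shows up as one more history row rather than as a removed row, which is what distinguishes this mode from the synchronized-copy mode described in the same use case.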