® IBM Software Group An Introduction to WebSphere Information Integrator Q Replication Tridex June 16, 2005 Beth Hamel DB2 Replication Product Architect hameleb@us.ibm.com © 2004 IBM Corporation IBM Software Group Agenda Topics The Basics of MQ Based Replication Publishing DB2 data to MQ in an XML format Application Examples of Replication and Publishing IBM Software Group SQL Replication Architecture Admin DB2 IMS DB2 Log based Staging tables CD1 CD Control Sybase Oracle SQL Server Capture Informix Control CD1 CD CD1 CD Apply Federation Engine Oracle SQL Server Informix Trigger based Any source External application Sybase Teradata Nicknames Flexible scheduling, transformation, distribution Typically used for business intelligence, distribution and consolidation, application integration IBM Software Group Why Create Another Replication Architecture? Performance: Combine high throughput with low latency Capability: Significantly improve multi-directional replication support New function: Event publishing, table difference utility Manageability: Reduce the number of replication objects to be defined and managed, ease the definition process with new Replication Center wizards IBM Software Group Q Replication Architecture Admin Utilities Control Control Apply Capture Source Federation Engine Log based WebSphere MQ Each message represents a transaction Highly parallel apply process Differentiated conflict detection and resolution Integrated infrastructure for replication and publishing Staged availability of heterogeneous support Target IBM Software Group Q Replication – Q Subscription Process SOURCE METADATA SOURCE2 Q Apply Brows er SOURCE1 METADATA DB2 Log Q Captur e Apply Agent Apply Agent Apply Agent ADMINISTRATION TGT3 TGT1 TGT2 Replication Center Replication Monitor TARGET IBM Software Group Q Replication High Volume with Low Latency Performance Performance of Q-Replication vs. SQL Replication z/OS 25 Q Rep SQL Rep Latency (Seconds) 20 SQL Repl 15 Q Repl 10 5 0 0 5 10 15 Workload Throughput (Rows / Second) Thousands 20 IBM Software Group Subscription Types Unidirectional Changes are replicated in one direction between two servers (i.e. from source to target) Changes can be filtered and transformed Bidirectional Changes are replicated in two directions between two servers Utilizes VALUE based conflict detection Peer to peer Changes are replicated between 2 or more servers Utilizes VERSION based conflict detection IBM Software Group Q Replication Multidirectional Configurations Peer-to-peer – No master copy – Guaranteed Convergence – Version based conflict resolution – Requires extra columns and triggers to provide versioning of rows – N nodes: N * (N-1) subscriptions Bi-directional – One node prevails in case of conflicts – Value based conflict resolution Primary Secondary/backup – Uses old and new value data comparisons – 2 nodes only IBM Software Group Q Replication – Defining Subsets or Filters Subset data Subset of rows through Q Capture predicate on subscription/publication Subset of columns through subscription/publication definition Option included for ignoring deletes Signal defined to allow user selected transactions to be ignored Predicate examples Based on values in the row data itself • WHERE :LOCATION ='EAST' AND :SALES > 100000 Based on values in other data • WHERE :LOCATION ='EAST' AND :SALES > (SELECT SUM(expense) FROM STORES WHERE stores.deptno = :DEPTNO) IBM Software Group Q Replication - Transformations Transformations achieved through: Triggers on the target table Publish event to User Application Stored Procedures called by Apply at the row level Apply calls Stored Procedure , mapping columns to input parms COL1 COL2 COL3 COL4 InParm1 InParm2 InParm3 InParm4 Update Target table X where trg1 = “a”; Stored Procedure performs logic and makes insert/update/delete TRG1 TRG2 TRG3 TRG4 IBM Software Group Apply Load Options A subscription is defined as either: automatic load, manual load, no load required Automatic load: Load is performed by Apply, with automatic coordination of the simultaneous capture of changes, loading of the new table, and apply of changes to other tables. Manual load: Load is performed by user, coordination is required, and will be handled by user (with some help from our administration). No load: No loading required, no coordination required, can immediately capture and apply changes Example: target system is built through backup/restore, with replication started from an inactive source IBM Software Group Replication Administration Replication Center GUI Launchpads, Wizards, Online Help Definitions, Operations, Monitoring Command Line Interface Scripts or interactive mode Example: C:\asnclp REPL > CREATE QSUB USING REPLQMAP ... REPL > CREATE SUBSCRIPTION SET SETNAME ... REPL > CREATE MEMBER IN SETNAME ... Java API’s Typically used when replication is embedded IBM Software Group Q Create Subscription Wizard Create large numbers of subscriptions at a time!! IBM Software Group Table Reconciliation Utilities ASNTDIFF Utility that compares a subscription’s source table (S) with its target table (T) • Generates a table of differences between the two o Rows in S but not in T o Rows in T but not in S o Rows in T and S, but with different values • Checksum used to compare contents of entire row • Very similar concept to file compares such as UNIX diff command • Differences can be used to change source, target, or both ASNTREP Utility that uses the table built by the tdiff utility and issues SQL to make table (T) match table (S) S o n l y Intersection of S and T T o n l y IBM Software Group New in 2005 – MQ Client Support SOURCE MQ SERVER MQ SERVER METADATA SOURCE2 SOURCE1 METADATA METADATA DB2 Log RECV QUEUE Q Captur e MQ CLIENT Q Apply Brows er MQ CLIENT Apply Agent SEND QUEUE METADATA New – MQ Server not required on source or target Apply Agent Apply Agent TGT3 Distributed platforms only Allows separation of Database servers and MQ servers Allows replication support on platforms which currently lack MQ Server support TGT1 TGT2 TARGET IBM Software Group New in 2005 – Federated Targets WEBSPHERE INFORMATION INTEGRATOR SOURCE SOURCE2 METADATA SOURCE1 RECV QUEUE Q Apply Brows er METADATA METADATA DB2 Log METADATA Apply Agent Q Captur e Apply Agent SEND QUEUE Uses WebSphere II Federation Provides high speed parallel apply of data New targets: Oracle, Sybase (fp 9) Apply Agent TGT3 METADATA TGT1 TGT2 MS SQL Server, and Informix(fp10) FEDERATED TARGET IBM Software Group Why Publish Data? Application to Application Messaging Drive downstream applications or APIs based on the transactional changed data of database events Reduce application development and maintenance, performance impact to source applications, and availability impact to source applications Meet Auditing Requirements Capture and store information regarding what changes were made to critical business data and by whom Event Notification Stream changed data information to Web interfaces Stream only particular events of interest (filter data) Warehouse / Business Intelligence Integrate captured changed data with an ETL tool Perform very complex transformations Use a specific transaction format to update target IBM Software Group Q Replication – Event Publication Process SOURCE SOURCE2 SOURCE1 User Application METADATA DB2 Log Q Captur e WBI Event Broker User Application ADMINISTRATION DB2 MQ Listener Replication Center Replication Monitor TARGET User Stored Procedure IBM Software Group Event Publishing - Publication Options Format Only data from committed transactions is published Data is UTF-8, self describing with XML tags Row based = one row per message Transaction based = one transaction per message Row Content Subset by column Subset by predicate Changed column values only or all column values New data values only or include old values Row sent on any change or only on published column changes Suppress deletes IBM Software Group Information Integrator Event Publishing for Classic Sources Capture data changes for classic sources using log data where available Correlate by transactions within a single database WS Information Integrator Event Publishers for z/OS Publish onto message queue in XML format DB2 UDB for z/OS VSAM IMS IDMS Extending the value proposition of the MQ based replication and publishing architecture IBM Software Group Replication and Event Publishing Products: z/OS DB2 DataPropagator for z/OS DB2 UDB sources and targets (DB2 for z/OS V7 and V8) SQL Replication only WebSphere Information Integrator Replication for z/OS DB2 UDB sources and targets (DB2 for z/OS V7 and V8) DB2 DataPropagator Includes SQL Replication, Q Replication, and DB2 Event Publisher WebSphere Information Integrator Event Publisher for z/OS Event publishing to message queues Available for DB2, IMS, or VSAM New!!!! WS II V8.2 New!!!! WS II V8.2 IBM Software Group Replication and Event Publishing Products: Distributed Platforms DB2 LUW DB2 LUW and Informix IDS sources and targets SQL Replication only DB2 DataPropagator WebSphere Information Integrator Replication Edition DB2 DataPropagator WebSphere Information Integrator Event Publisher Edition Includes SQL Replication, Q Replication, and DB2 Event Publisher DB2 LUW sources and targets ( Q Replication) – note that Websphere MQ is bundled with this product Multi-vendor sources and targets (SQL Replication) DB2 LUW sources – note that Websphere MQ is bundled with this product New!!!! WS II V8.2 New!!!! WS II V8.2 IBM Software Group Combining SQL and Q Replication with Event Publishing Staging Tables Log Reader 1 CD2 Capture Schema „CAP1“ DB2 Log CD1 TARGET1 SQL Apply Q Sub TARGET2 Q Apply Log Reader 2 Q Capture Schema „CAP2“ SOURCE3 SOURCE2 SOURCE1 Event Pub User Application SQL Replication and Q Replication can co-exist Managed at source by using multiple capture schemas One Q Capture can handle both Publications and Subscriptions IBM Software Group Continuous Availability using Q Replication Read Only Applications Q Apply Q Capture Q Capture Q Apply DSNA Database Primary DSNB Secondary Database Primary Connection Connection Available for Failover Read/Write Applications Q Replication provides a solution for continuous availability where the active secondary system is also available for other applications IBM Software Group Why Use Q Replication for Continuous Availability? Advantages Allows the fastest switchover with transactionally consistent data Excellent solution for scheduled outage • • Allows flexibility of OS level, DB level, application level, data format Can be easily tested and monitored Allows for database read or write activity on secondary • • secondary site may be used for other applications is the only solution for geographically dispersed updateable databases Can supplement other HA solutions Allows for lower cost hardware or platform Low impact on source applications Disadvantages Asynchronous • Some data is left behind in a failure scenario Application awareness is required (triggers, generated always columns) IBM Software Group High Availability Disaster Recovery for DB2 LUW Log data is copied synchronously or asynchronously db HADR Agent database Agent Bufferpool(s) Database Storage Log Buffer Recovery Log db HADR Agent Copied data is continuously applied using forward recovery HADR Buffer Bufferpool(s) Recovery Log Production Database Database Storage Standby Database Offers a complete solution for high availability –easy to implement, replicates the complete database Will not initially support reads at secondary, partitioned tables IBM Software Group High Availability - Q Replication compared with HADR HADR Sync, async, near-sync DB2 II Q based Replication Near real time async whole DB2 database selected tables/columns DDL, DML DML only ** very simple to set up and manage more complex to set up and manage similar configurations only sites can be very different no support for unlogged LOBs can support unlogged LOBs 1 read/write site only ** multiple read and/or update sites No DPF DPF ok ** Current restriction only ** Current restriction only IBM Software Group Peer to Peer Q Replication Read/Write Applications Q Apply Q Capture Primary Database Read/Write Applications Q Capture Q Apply Secondary Database Replication processes and subscriptions are defined in both directions and data changes flow in both directions Recursion is stopped by Capture, which reads special logged events created by Apply Conflict detection is typically necessary, unless the application is carefully designed to completely avoid conflicts IBM Software Group Peer to Peer Q Replication – Best Practices Workload balancing Provides best results with high ratio of reads to writes Conflicts Plan carefully – avoidance is the best policy May occur with failovers and switchbacks Consider the application impact: for database convergence, single row updates are backed out, not whole transactions Exceptions table Understand the exceptions table – all conflicts are logged there Consider a global view or replicated consolidation of exceptions tables Consider a trigger on the exceptions table for additional actions that need to be performed Application considerations Make a plan to handle serialized objects such as sequences and identity columns Consider impacts to triggers and triggered actions IBM Software Group Online Trading – A case for very high speed replication Trade History Applications Trading Applications Trade Processing Data Q Capture Trade History Data Q Apply In many online environments OLTP data is kept separately from query/history data for better performance of both update and query applications This user has just made an online trade – he will keep hitting enter until he sees that the trade is complete, in this case meaning it has been replicated to the trade history database IBM Software Group Order Processing – Exploiting II Event Publishing Create Billing Request Q Capture Create New Order Order Entry Data WBI Event Broker As new orders are entered into the order entry system, the pertinent data is captured and published into a queue The Websphere Business Integrator Event Broker processes the queued data A billing transaction is created and queued in one system and a shipping transaction is created and queued in another system Create Shipping Request IBM Software Group Customer Profile Management – Exploiting II Event Publishing II IMS Event Publisher Change Customer Info Customer Data Create Customer Change Request WBI Event Broker When a change to customer profile information occurs in one system, the pertinent data is captured and published into a queue The Websphere Business Integrator Event Broker processes the queued data Transactions are created to update the customer profile information in all other database systems, as applicable Create Customer Change Request IBM Software Group Other Important Sources of Information/Education WebSphere Information Integrator sites on the web: http://www-306.ibm.com/software/data/integration/ http://www-306.ibm.com/software/data/db2imstools/ http://db2ii2.dfw.ibm.com/wps/portal/!ut/p/!ut/p/!ut/p/.scr/Login http://www-1.ibm.com/support/docview.wss?uid=swg27005663 Developer Works: http://www-106.ibm.com/developerworks/db2/zones/db2ii/ Tutorials, whitepapers, samples available now IBM Education for Q Replication: DW240: 3 day course without MQ basics DW241: 4 day course with MQ basics included Redbook recently published for Q Replication, EP later this year Consider IBM Services as part of your implementation plan