Oracle Change Data Capture
Jack Raitto, Development Manager, Oracle NEDC
NYOUG Long Island SIG, October 7, 2004
Oracle Corporation

Capture your change data for FREE!*
[Chart: Change Capture Cost, Before vs. After]
* Zero additional license cost over Oracle10g EE
  Virtually zero source system processing cost

What is Oracle CDC?
• Captures change data from operational system(s) as it occurs
• Part of Extract / Transform / Load (ETL) process for DSS / Data warehouse, potentially other applications
• Optimizes the extract phase
• Unleashes SQL power for transformations
• Provides management framework for change data

How was it done before (old way)?
Method                           Major Issues
Application logging / triggers   Maintenance, transaction impacts
Timestamp / change key column    Application design & performance impact, no before image
Table differencing               Impractical for large tables, high transport costs, not timely
Log sniffing                     Not supported, does not track DB releases, security issues, rocket science

CDC Advantages
• Built in, custom fit, evolves with the database
• Delivers change data when you need it, where you need it
• Offers several tradeoffs between timely change delivery vs. source system overhead (sync, async hotlog, async autolog, etc.)
• Assumes complete change management responsibility

CDC Advantages (concl.)
• Captures all change data along with transaction information: see all changes a given transaction made and who made them
• Transactional consistency for changes across multiple source tables is guaranteed
• Transparently coordinates sharing of change data across users and applications
• You don’t need rocket scientists on your staff!
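For contrast with the advantages above, the "table differencing" method from the old-way list can be sketched in a few lines. This is a hypothetical illustration, not part of the presentation: the function name `diff_snapshots`, the snapshot layout (a dict keyed by primary key), and the sample rows are all assumptions. It suggests why the method scales poorly: both full snapshots have to be transported and scanned to find even one change, and the result carries no before image history, user, or transaction information.

```python
def diff_snapshots(old, new):
    """Compare two full table snapshots, each a dict keyed by primary key.

    Returns (inserts, updates, deletes). Both snapshots must be available
    and scanned in full, regardless of how few rows actually changed.
    """
    inserts = {k: v for k, v in new.items() if k not in old}
    deletes = {k: v for k, v in old.items() if k not in new}
    updates = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return inserts, updates, deletes


# Hypothetical snapshots of a small customer table: CustNo -> (Last, First).
yesterday = {123: ("Smith", "Frank"), 124: ("Jones", "Mary"), 125: ("Stein", "Linda")}
today = {123: ("Smith", "Frank"), 124: ("Jones", "Maria"), 126: ("Vine", "Abe")}

inserts, updates, deletes = diff_snapshots(yesterday, today)
print("inserts:", inserts)
print("updates:", updates)
print("deletes:", deletes)
```

Note that a row changed twice between snapshots shows up as a single update, which is one reason differencing cannot deliver the complete change history that the slides attribute to CDC.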

CDC Configurations
                     Sync CDC                              Async CDC HotLog   Async CDC AutoLog
Available            Oracle 9i EE, Oracle 10g SE           Oracle 10g EE      Oracle 10g EE
Source system cost   Transaction delay, system resources   System resources   Minimal (~2%)
Part of txn          YES                                   NO                 NO
Latency              Real time                             Near real time     Varies w/ topology, checkpoint & log switch interval
Systems              1                                     1                  2

How CDC Works: Sync CDC
• Uses internal triggers to capture before and/or after images of new and updated rows
• Has the same performance implications as capture via user triggers
• Delivers change data in real-time
• Uses the same interface as async CDC

Synchronous CDC
[Diagram: Combined Source / Operational BI System. Labels: HotLog; source tables Customer and Order; CDC; CDC Change Tables; ETL Process (Upsert to load dimension tables, Direct Path Insert to load fact tables)]

How CDC Works: Async CDC
Relational interface to Streams
• Prepackaged Streams application
• Asynchronously captures change data from redo/archive logs
• Presents relational interface to change data stream
• Can operate on source system (hot log) or staging system (auto log)

Foundations of Async CDC
[Diagram: three layers, each with its uses]
Async CDC:  Change capture, Change management, Warehouse loading
Streams:    Replication, Message queuing, Warehouse loading, Event notification, Data protection
LogMiner:   Redo log inspection, Debugging, Auditing, Reversing transactions

Asynchronous CDC HotLog
[Diagram: Combined Source / Operational BI System. Labels: source tables Customer and Order; Active Redo Log; LogMiner; Streams; CDC; CDC Change Tables; ETL Process (Upsert to load dimension tables, Direct Path Insert to load fact tables)]

Asynchronous CDC AutoLog
[Diagram: Source Database and Data Warehouse / Staging System. Labels: source tables Customer and Order; Redo Logs; Arch Process; Archived Redo Logs; LogMiner; Streams; CDC; CDC Change Tables; ETL Process (Upsert to load dimension tables, Direct Path Insert to load fact tables)]

Using CDC: Publish/Subscribe
• Publisher supplies, subscribers consume
  change data
• Model allows sharing of change data across users and applications
• Coordinates retention / purge of change data
• Prevents application from accidentally processing change data more than once
• Guarantees transactional consistency of change data across source tables via change sets

Using CDC: Publish/Subscribe
[Diagram: a publisher exposes change data through a publication; two subscribers each hold a subscription over part of it]

Publisher: Change Data
CustNo  Last   First
123     Smith  Frank
124     Jones  Mary
125     Stein  Linda
126     Vine   Abe
127     Block  Greg

Publication
Table  Column  Type
Cust   CustNo  number
Cust   Last    varchar
Cust   First   varchar

Subscriber 1: Subscription
CustNo  Last   First
123     Smith  Frank
124     Jones  Mary
125     Stein  Linda

Subscriber 2: Subscription
CustNo  Last   First
125     Stein  Linda
126     Vine   Abe
127     Block  Greg

Publisher Concepts
Change source
• Defines the source system to CDC
Change set
• Collection of source tables for which transactionally consistent change data is needed
Change table
• Container to receive change data
• Is published to subscribers

Publisher Concepts
[Diagram: Source Database: HQ; Staging Database: DW; Change Source: HQ_SRC; Change Set: SH_SET]
Source table: sh.sales         PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD, QUANTITY_SOLD
Change table: sales_ct         PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD
Source table: sh.promotions    PROMO_ID, PROMO_SUBCAT, PROMO_CAT
Change table: promo_ct         PROMO_ID, PROMO_SUBCAT, PROMO_CAT, PROMO_COST

Publish Package
DBMS_CDC_PUBLISH
• CREATE / ALTER / DROP_AUTOLOG_CHANGE_SOURCE
• CREATE / ALTER / DROP_CHANGE_SET
• CREATE / ALTER / DROP_CHANGE_TABLE
• PURGE
• PURGE_CHANGE_SET
• PURGE_CHANGE_TABLE
• DROP_SUBSCRIPTION

Using Change Data: Subscribers
• The subscriber creates a subscription from an available publication
• The subscription provides a moving window (view) to the change data
• Subscriptions go against a single change set and are therefore transactionally consistent
• When all subscribers have advanced past old change data, CDC automatically and efficiently purges
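The moving-window and purge behavior described above can be modeled with a small toy sketch. This is not Oracle's implementation and does not call any Oracle API: the class `ChangeSetModel`, its method names, the SCN values, and the row payloads are all assumptions chosen for illustration. It shows two properties from the slides: a subscriber never sees the same change twice, and change data is purged only once every subscriber has advanced past it.

```python
class ChangeSetModel:
    """Toy model of one change set shared by several subscriptions."""

    def __init__(self):
        self.rows = []      # (scn, payload) tuples, appended in SCN order
        self.windows = {}   # subscription name -> [low_scn, high_scn]

    def publish(self, scn, payload):
        self.rows.append((scn, payload))

    def create_subscription(self, name):
        # Empty window: nothing is visible until the first extend.
        self.windows[name] = [0, 0]

    def extend_window(self, name):
        # Raise the high boundary to the newest change currently available.
        if self.rows:
            self.windows[name][1] = self.rows[-1][0]

    def view(self, name):
        # Only changes inside (low, high] are visible to this subscriber.
        low, high = self.windows[name]
        return [payload for scn, payload in self.rows if low < scn <= high]

    def purge_window(self, name):
        # The subscriber is finished with its window: low catches up to high.
        self.windows[name][0] = self.windows[name][1]
        # Rows at or below every subscriber's low boundary are no longer needed.
        floor = min(window[0] for window in self.windows.values())
        self.rows = [(scn, p) for scn, p in self.rows if scn > floor]


cs = ChangeSetModel()
cs.create_subscription("sub1")
cs.create_subscription("sub2")
cs.publish(100, "row A")
cs.publish(110, "row B")

cs.extend_window("sub1")
first_view = cs.view("sub1")          # sub1 sees both changes
cs.purge_window("sub1")
second_view = cs.view("sub1")         # nothing is seen twice
kept_for_sub2 = len(cs.rows)          # sub2 has not consumed them yet

cs.extend_window("sub2")
cs.purge_window("sub2")
left_after_all = len(cs.rows)         # every subscriber is past the data

print(first_view, second_view, kept_for_sub2, left_after_all)
```

In this model the purge floor is simply the minimum low boundary across subscriptions, so a slow subscriber delays purging for everyone; how the real product handles that case is outside what the slides state.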

Subscriber Concepts
[Diagram: Staging Database: DW; Change Set: SH_SET; Subscription: sales_promo_list]
Publication on: sh.sales         PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD     Subscriber view: spl_sales
Publication on: sh.promotions    PROMO_ID, PROMO_SUBCAT, PROMO_CAT           Subscriber view: spl_promos

Subscriber View
Subscriber view: spl_sales
OPERATION$  CSCN$   USERNAME$  PROD_ID  CUST_ID  PROMO_ID   Meaning
I           587322  GRIFFIN    12784    12       0          Insert
UO          587482  SLOAN      12784    12       0          Update before
UN          587482  SLOAN      12784    12       42         Update after
I           594312  BRIGGS     14899    302      42         Insert
I           602311  GRIFFIN    12498    12       55         Insert
D           711413  SLOAN      138922   7934     0          Delete
I           796122  BRIGGS     77741    712      55         Insert
I           796122  BRIGGS     13846    712      55         Insert

Subscriber Package
DBMS_CDC_SUBSCRIBE
• CREATE_SUBSCRIPTION
• SUBSCRIBE
• ACTIVATE_SUBSCRIPTION
• EXTEND_WINDOW
• PURGE_WINDOW
• DROP_SUBSCRIPTION

Security
• Sync publisher must have SELECT access to the source table
• Async publisher must have EXECUTE_CATALOG_ROLE privilege
• Publisher uses GRANT and REVOKE on change tables to control subscriber access

Performance Benchmark*
Objectives:
• Determine impact on transaction time
• Determine latency
Setup:
• Source system: Oracle 10g R1 Beta, SunFire 4800 SMP 8 x 900 MHz / 16 GB with striped 8 x Sun StorEdge T3 arrays (9 x 36.4 GB each)
• Customer insurance quote OLTP application run at Oracle, 250 concurrent users / 175 TPS, system “warmed up” (steady state)
• Mixture of Inserts, Updates, Deletes, Singleton Selects, Cursor Fetches, Rollbacks / Commits, savepoints
• Capture changes on all tables
* Your mileage will vary!
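Returning to the subscriber view shown earlier, the sketch below shows one way an ETL step might apply its rows to a target, using the OPERATION$ codes from the slide (I, UO, UN, D). The rows are copied from the spl_sales example; everything else is an assumption made for illustration, in particular the function name `apply_changes`, the in-memory dict standing in for a target table, and the choice of (PROD_ID, CUST_ID) as the key. The slides do not say how the target is keyed.

```python
from collections import namedtuple

Change = namedtuple("Change", "op cscn username prod_id cust_id promo_id")

# Rows from the spl_sales subscriber view on the slide, in commit (CSCN$) order.
ROWS = [
    Change("I", 587322, "GRIFFIN", 12784, 12, 0),
    Change("UO", 587482, "SLOAN", 12784, 12, 0),
    Change("UN", 587482, "SLOAN", 12784, 12, 42),
    Change("I", 594312, "BRIGGS", 14899, 302, 42),
    Change("I", 602311, "GRIFFIN", 12498, 12, 55),
    Change("D", 711413, "SLOAN", 138922, 7934, 0),
    Change("I", 796122, "BRIGGS", 77741, 712, 55),
    Change("I", 796122, "BRIGGS", 13846, 712, 55),
]


def apply_changes(target, changes):
    """Apply change rows to target, a dict of (prod_id, cust_id) -> promo_id.

    The key choice is hypothetical. Inserts and update-after images are
    treated as an upsert, deletes remove the key if present, and
    update-before images are skipped because the after image carries
    the new state.
    """
    for change in changes:
        key = (change.prod_id, change.cust_id)
        if change.op in ("I", "UN"):
            target[key] = change.promo_id
        elif change.op == "D":
            target.pop(key, None)
        elif change.op == "UO":
            continue
        else:
            raise ValueError(f"unknown operation code: {change.op!r}")
    return target


result = apply_changes({}, ROWS)
for key in sorted(result):
    print(key, result[key])
```

The delete in the sample refers to a row that was never inserted within this window, so the sketch tolerates a missing key rather than failing; a real load would decide that case based on what the target already holds.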

Transaction Performance
• Transaction elongated by 10%
• Relative impact varies depending on other overhead
[Chart: relative transaction time, axis 0.9 to 1.2; bars: no CDC, Sync CDC (9i), HotLog CDC (10g), AutoLog CDC (10g)]

Transaction Performance
• Transaction elongated by 8%
• Can reduce elongation by adding RAC nodes / CPUs
[Chart: relative transaction time, axis 0.9 to 1.2; bars: no CDC, Sync CDC (9i), HotLog CDC (10g), AutoLog CDC (10g)]

Transaction Performance
• Transaction elongation virtually eliminated
• Change capture processing moved off system
[Chart: relative transaction time, axis 0.9 to 1.2; bars: no CDC, Sync CDC (9i), HotLog CDC (10g), AutoLog CDC (10g)]

HotLog Latency Performance
• About ½ the change data arrived in 1 second
• Virtually all the change data arrived in 2 seconds
[Chart: % Changes Arrived (0 to 100) vs. Seconds (0 to 3)]

Summary
• CDC assumes the burden of change capture for you
• Change data is guaranteed consistent and complete
• Change data can be shared across users and applications effortlessly
• CDC delivers change data where you need it, when you need it, and with minimal overhead

For More Information
• Oracle Data Warehousing Guide, 10gR1, Chapter 16
• Oracle PL/SQL Packages and Types Reference, 10gR1, packages DBMS_CDC_*
• http://www.oracle.com/technology/oramag/oracle/03-nov/o63tech_bi.html
• http://www.oracle.com/technology/products/bi/db/10g/pdf/twp_dss_ontime_etl_10gr1_0304.pdf
• http://www.rittman.net/archives/000901.html
• http://www.nyoug.org/cdc.pdf (Oracle9i)

Questions?