Oracle Change Data Capture - New York Oracle User Group

Oracle Change Data Capture
Jack Raitto, Development Manager
Oracle NEDC
NYOUG Long Island SIG
October 7, 2004
Oracle Corporation
Capture your change data for FREE!*
[Chart: change capture cost, before vs. after]
* Zero additional license cost over Oracle10g EE
Virtually zero source system processing cost
What is Oracle CDC?





• Captures change data from operational system(s) as it occurs
• Part of Extract / Transform / Load (ETL) process for DSS / data warehouse, potentially other applications
• Optimizes the extract phase
• Unleashes SQL power for transformations
• Provides management framework for change data
How was it done before (old way)?
Method                          Major Issues
Application logging / triggers  Maintenance, transaction impacts
Timestamp / change key column   Application design & performance impact, no before image
Table differencing              Impractical for large tables, high transport costs, not timely
Log sniffing                    Not supported, does not track DB releases, security issues, rocket science
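To make the table differencing row concrete, here is a minimal Python sketch of the idea. It is not taken from the slides: the function name `diff_snapshots` and the sample rows are hypothetical. It compares two full snapshots keyed by primary key, which is why the cost grows with table size rather than with the number of changes.

```python
def diff_snapshots(old, new):
    """Compare two full snapshots (dicts keyed by primary key).

    Every key in both snapshots is visited, so the work is proportional
    to table size even when only a handful of rows changed.
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("I", key, new[key]))   # row appeared
        elif key not in new:
            changes.append(("D", key, old[key]))   # row disappeared
        elif old[key] != new[key]:
            changes.append(("U", key, new[key]))   # row differs
    return changes


# Hypothetical snapshots taken at two points in time.
yesterday = {123: ("Smith", "Frank"), 124: ("Jones", "Mary"), 125: ("Stein", "Linda")}
today = {123: ("Smith", "Frank"), 124: ("Jones", "Maria"), 126: ("Vine", "Abe")}

changes = diff_snapshots(yesterday, today)
```

Changes that happen and are undone between the two snapshots never show up, and nothing is known until the next snapshot is taken, which matches the "not timely" entry in the table.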
CDC Advantages
• Built in, custom fit, evolves with the database
• Delivers change data when you need it,
where you need it
• Offers several tradeoffs between timely change delivery vs. source system overhead (sync, async HotLog, async AutoLog, etc.)
• Assumes complete change management responsibility
CDC Advantages (concl.)
• Captures all change data along with transaction information: see all changes a given transaction made and who made them
• Transactional consistency for changes across multiple source tables is guaranteed
• Transparently coordinates sharing of change data across users and applications
• You don’t need rocket scientists on your staff!
CDC Configurations
Configuration       Sync CDC                             Async CDC HotLog   Async CDC AutoLog
Available           Oracle 9i EE, Oracle 10g SE          Oracle 10g EE      Oracle 10g EE
Source system cost  Transaction delay, system resources  System resources   Minimal (~2%)
Part of txn         YES                                  NO                 NO
Latency             Real time                            Near real time     Varies w/ topology, checkpoint & log switch interval
Systems             1                                    1                  2
How CDC Works: Sync CDC
• Uses internal triggers to capture before and/or after images of new and updated rows
• Has the same performance implications as capture via user triggers
• Delivers change data in real time
• Uses the same interface as async CDC
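As a rough analogy only, trigger-based capture can be sketched with Python's standard `sqlite3` module. Oracle's internal triggers are not shown in the slides, so the table names (`cust`, `cust_ct`), trigger names, and operation codes below are assumptions made for illustration. The sketch shows the property the slide describes: capture runs inside the user's transaction, so it adds work to that transaction, and a rolled-back change leaves nothing in the change table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust (custno INTEGER PRIMARY KEY, last TEXT, first TEXT);
    -- Hypothetical change table: one row per captured image.
    CREATE TABLE cust_ct (op TEXT, custno INTEGER, last TEXT, first TEXT);

    CREATE TRIGGER cust_ins AFTER INSERT ON cust BEGIN
        INSERT INTO cust_ct VALUES ('I', NEW.custno, NEW.last, NEW.first);
    END;
    CREATE TRIGGER cust_upd AFTER UPDATE ON cust BEGIN
        INSERT INTO cust_ct VALUES ('UO', OLD.custno, OLD.last, OLD.first);
        INSERT INTO cust_ct VALUES ('UN', NEW.custno, NEW.last, NEW.first);
    END;
""")

# Committed work: the triggers write the change rows in the same transaction.
conn.execute("INSERT INTO cust VALUES (123, 'Smith', 'Frank')")
conn.execute("UPDATE cust SET first = 'Francis' WHERE custno = 123")
conn.commit()

# Rolled-back work: both the source row and its change row disappear.
conn.execute("INSERT INTO cust VALUES (124, 'Jones', 'Mary')")
conn.rollback()

captured = conn.execute("SELECT * FROM cust_ct ORDER BY rowid").fetchall()
```

The update produces two change rows, a before image and an after image, which is the same pairing the subscriber view uses for updates.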
Synchronous CDC
[Diagram: combined source / operational BI system. CDC captures changes to the Customer and Order tables into change tables. An ETL process then upserts to load dimension tables and uses direct-path insert to load fact tables.]
How CDC Works: Async CDC

Relational interface to Streams
• Prepackaged Streams application
• Asynchronously captures change data from redo/archive logs
• Presents relational interface to change data stream

Can operate on source system (HotLog) or staging system (AutoLog)
Foundations of Async CDC
[Diagram: three layers, each with its uses]
Async CDC: change capture, change management, warehouse loading
Streams: replication, message queuing, warehouse loading, event notification, data protection
LogMiner: redo log inspection, debugging, auditing, reversing transactions
Asynchronous CDC HotLog
[Diagram: combined source / operational BI system. Changes to the Customer and Order tables are read from the active redo log by LogMiner and passed through Streams to CDC, which fills the change tables. An ETL process then upserts to load dimension tables and uses direct-path insert to load fact tables.]
Asynchronous CDC AutoLog
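[Diagram: source database and a separate data warehouse / staging system. On the source database, redo logs for the Customer and Order tables are archived by the Arch process. On the staging system, LogMiner, Streams, and CDC fill the change tables from the archived redo logs. An ETL process then upserts to load dimension tables and uses direct-path insert to load fact tables.]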
Using CDC: Publish/Subscribe





• Publisher supplies, subscribers consume change data
• Model allows sharing of change data across users and applications
• Coordinates retention / purge of change data
• Prevents application from accidentally processing change data more than once
• Guarantees transactional consistency of change data across source tables via change sets
Using CDC: Publish/Subscribe
Publisher: Change Data
CustNo  Last   First
123     Smith  Frank
124     Jones  Mary
125     Stein  Linda
126     Vine   Abe
127     Block  Greg

Publication
Table  Column  Type
Cust   CustNo  number
Cust   Last    varchar
Cust   First   varchar

Subscriber 1: Subscription
CustNo  Last   First
123     Smith  Frank
124     Jones  Mary
125     Stein  Linda

Subscriber 2: Subscription
CustNo  Last   First
125     Stein  Linda
126     Vine   Abe
127     Block  Greg
Publisher Concepts

Change source
• Defines the source system to CDC

Change set
• Collection of source tables for which
transactionally consistent change data
is needed

Change table
• Container to receive change data
• Is published to subscribers
Publisher Concepts
Source Database: HQ
Staging Database: DW
Change Source: HQ_SRC
Source table: sh.sales (PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD, QUANTITY_SOLD)
Source table: sh.promotions (PROMO_ID, PROMO_SUBCAT, PROMO_CAT, PROMO_COST)
Change Set: SH_SET
Change table: sales_ct (PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD)
Change table: promo_ct (PROMO_ID, PROMO_SUBCAT, PROMO_CAT)
Publish Package
DBMS_CDC_PUBLISH
CREATE / ALTER / DROP_AUTOLOG_CHANGE_SOURCE
CREATE / ALTER / DROP_CHANGE_SET
CREATE / ALTER / DROP_CHANGE_TABLE
PURGE
PURGE_CHANGE_SET
PURGE_CHANGE_TABLE
DROP_SUBSCRIPTION
Using Change Data: Subscribers




• The subscriber creates a subscription from an available publication
• The subscription provides a moving window (view) to the change data
• Subscriptions go against a single change set and are therefore transactionally consistent
• When all subscribers have advanced past old change data, CDC automatically and efficiently purges it
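The moving-window and purge behavior can be illustrated with a small in-memory model in Python. This is a toy under stated assumptions, not Oracle's implementation: the classes `ChangeTable` and `Subscription` are hypothetical, and the method names only echo the EXTEND_WINDOW and PURGE_WINDOW procedures listed for the subscriber package. Each subscription keeps its own low and high marks, and a change row is dropped only once every subscription has moved past it.

```python
class ChangeTable:
    """Toy change table shared by several subscriptions (hypothetical model)."""

    def __init__(self):
        self.rows = []            # (commit_scn, payload), in commit order
        self.subscriptions = []

    def publish(self, commit_scn, payload):
        self.rows.append((commit_scn, payload))

    def subscribe(self):
        sub = Subscription(self)
        self.subscriptions.append(sub)
        return sub

    def purge(self):
        """Drop rows that every subscription has already moved past."""
        if not self.subscriptions:
            return
        floor = min(sub.low for sub in self.subscriptions)
        self.rows = [(scn, p) for scn, p in self.rows if scn > floor]


class Subscription:
    def __init__(self, table):
        self.table = table
        self.low = 0     # rows with scn <= low were already processed
        self.high = 0    # rows with scn <= high are visible in the window

    def extend_window(self):
        """Move the high mark up to the newest published change."""
        if self.table.rows:
            self.high = self.table.rows[-1][0]

    def view(self):
        """Rows inside the current window: low < scn <= high."""
        return [p for scn, p in self.table.rows if self.low < scn <= self.high]

    def purge_window(self):
        """Mark the window as processed, then let the table purge."""
        self.low = self.high
        self.table.purge()


ct = ChangeTable()
s1, s2 = ct.subscribe(), ct.subscribe()

ct.publish(100, "cust 123")
ct.publish(101, "cust 124")
s1.extend_window()
first_batch = s1.view()
s1.purge_window()            # s2 has not read these rows yet, so they stay

ct.publish(102, "cust 125")
s1.extend_window()
second_batch = s1.view()     # only the new row; nothing is processed twice

s2.extend_window()
s2_batch = s2.view()
s2.purge_window()            # rows 100 and 101 are now behind both subscribers
```

Because each subscriber advances its own window, one slow consumer delays purging but never causes another consumer to see a row twice.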
Subscriber Concepts
Staging Database: DW
Change Set: SH_SET
Subscription: sales_promo_list
Publication on sh.sales (PROD_ID, CUST_ID, PROMO_ID, AMOUNT_SOLD), subscriber view: spl_sales
Publication on sh.promotions (PROMO_ID, PROMO_SUBCAT, PROMO_CAT), subscriber view: spl_promos
Subscriber View
Subscriber view: spl_sales
OPERATION$  CSCN$   USERNAME$  PROD_ID  CUST_ID  PROMO_ID
I           587322  GRIFFIN    12784    12       0         Insert
UO          587482  SLOAN      12784    12       0         Update before
UN          587482  SLOAN      12784    12       42        Update after
I           594312  BRIGGS     14899    302      42        Insert
I           602311  GRIFFIN    12498    12       55        Insert
D           711413  SLOAN      138922   7934     0         Delete
I           796122  BRIGGS     77741    712      55        Insert
I           796122  BRIGGS     13846    712      55        Insert
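One way an ETL step might consume rows like these is sketched below in Python. The rows are copied from the subscriber view above; everything else is an assumption for illustration: the function `apply_changes`, the choice of (PROD_ID, CUST_ID) as the target key, and the reading that rows sharing a CSCN$ value belong to one transaction. Inserts and update-after images are applied as upserts, deletes remove the key, and update-before images are skipped because they carry only the old values.

```python
from itertools import groupby

# (OPERATION$, CSCN$, USERNAME$, PROD_ID, CUST_ID, PROMO_ID)
changes = [
    ("I",  587322, "GRIFFIN", 12784, 12, 0),
    ("UO", 587482, "SLOAN", 12784, 12, 0),
    ("UN", 587482, "SLOAN", 12784, 12, 42),
    ("I",  594312, "BRIGGS", 14899, 302, 42),
    ("I",  602311, "GRIFFIN", 12498, 12, 55),
    ("D",  711413, "SLOAN", 138922, 7934, 0),
    ("I",  796122, "BRIGGS", 77741, 712, 55),
    ("I",  796122, "BRIGGS", 13846, 712, 55),
]


def apply_changes(target, rows):
    """Apply change rows to a dict, one commit SCN group at a time.

    Returns the commit SCNs in the order they were applied.
    """
    applied = []
    for cscn, txn_rows in groupby(rows, key=lambda r: r[1]):
        for op, _, _, prod_id, cust_id, promo_id in txn_rows:
            key = (prod_id, cust_id)       # assumed key, not stated in the slides
            if op in ("I", "UN"):          # insert or update-after image: upsert
                target[key] = promo_id
            elif op == "D":                # delete: remove if present
                target.pop(key, None)
            # "UO" is the update-before image; there is nothing to apply
        applied.append(cscn)
    return applied


# Hypothetical target that already holds the row deleted at CSCN$ 711413.
target = {(138922, 7934): 0}
scns = apply_changes(target, changes)
```

Grouping by commit SCN keeps the two rows written at 796122 together, so a loader that commits per group never exposes half of a source transaction.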
Subscriber Package
DBMS_CDC_SUBSCRIBE
CREATE_SUBSCRIPTION
SUBSCRIBE
ACTIVATE_SUBSCRIPTION
EXTEND_WINDOW
PURGE_WINDOW
DROP_SUBSCRIPTION
Security
• Sync publisher must have SELECT access to the source table
• Async publisher must be granted EXECUTE_CATALOG_ROLE
• Publisher uses GRANT and REVOKE on change tables to control subscriber access
Performance Benchmark*

Objectives:
• Determine impact on transaction time
• Determine latency




Source system: Oracle 10g R1 Beta, SunFire 4800 SMP 8 x 900 MHz / 16 GB with 8 striped Sun StorEdge T3 arrays (9 x 36.4 GB each)
Customer insurance quote OLTP application run at Oracle, 250 concurrent users / 175 TPS, system “warmed up” (steady state)
Mixture of inserts, updates, deletes, singleton selects, cursor fetches, rollbacks / commits, savepoints
Capture changes on all tables
* Your mileage will vary!
Transaction Performance
Transaction elongated by 10%
Relative impact varies depending on other overhead
[Bar chart, scale 0.9 to 1.2: no CDC, Sync CDC (9i), HotLog CDC (10g), AutoLog CDC (10g)]
Transaction Performance
Transaction elongated by 8%
Can reduce elongation by adding RAC nodes / CPUs
[Bar chart, scale 0.9 to 1.2: no CDC, Sync CDC (9i), HotLog CDC (10g), AutoLog CDC (10g)]
Transaction Performance
Transaction elongation virtually eliminated
Change capture processing moved off system
[Bar chart, scale 0.9 to 1.2: no CDC, Sync CDC (9i), HotLog CDC (10g), AutoLog CDC (10g)]
HotLog Latency Performance
About ½ the change data arrived in 1 second
Virtually all the change data arrived in 2 seconds
[Chart: % changes arrived (0 to 100) vs. seconds (0 to 3)]
Summary




• CDC assumes the burden of change capture for you
• Change data is guaranteed consistent and complete
• Change data can be shared across users and applications effortlessly
• CDC delivers change data where you need it, when you need it, and with minimal overhead
For More Information






• Oracle Data Warehousing Guide, 10gR1, Chapter 16
• Oracle PL/SQL Packages and Types Reference, 10gR1, packages DBMS_CDC_*
• http://www.oracle.com/technology/oramag/oracle/03-nov/o63tech_bi.html
• http://www.oracle.com/technology/products/bi/db/10g/pdf/twp_dss_ontime_etl_10gr1_0304.pdf
• http://www.rittman.net/archives/000901.html
• http://www.nyoug.org/cdc.pdf (Oracle9i)
Questions?