Rick Weaver
Product Manager for DB2 for z/OS
Considerations for Continuity
Business Continuity
- High Availability
  – Backup & Recovery
  – Data Replication/Failover
  – Online DB Maintenance
- Disaster Recovery
  – IC + Log Offsite
  – Remote Replication
© Copyright 05/07/2010 BMC Software, Inc
What can cause a database application outage?
Some events are planned:
- Application database maintenance
- Data migration
- Schema change implementation
- Hardware upgrades (processor, storage)
- Operating system or DBMS maintenance
- Disaster recovery preparation
Other events are unplanned:
- Site disasters (floods, power outages, storms, fire, etc.)
- Hardware failures (disk, CPU, network, etc.)
- Operating system failures
- DBMS failures
- Operation errors
- Batch cycle errors
- Improper data feeds
- User errors
- Deliberate data corruption
- Application software errors
- Application performance degradation
- Fallback from application change migrations
When to Declare Disaster
- Site-wide calamity – declare
- Lose network connectivity – maybe declare
  – Estimate repair time
  – > nn hours, declare (BCP group)
- Lose 1 LPAR – no declare
- Lose entire CPU – maybe declare
  – Estimate repair time
  – > nn hours, declare (BCP group)
- Lose one volume – no declare
- Lose one SHARK array – no declare
- Lose all SHARK arrays – maybe declare
  – Estimate repair time
  – > nn hours, declare (BCP group)
- Lose one application, one database, one segment/table – no declare (most likely event)
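The declaration rules above can be sketched as a small decision function. This is only an illustration: the event names are shorthand for the bullets on this slide, and `threshold_hours` stands in for the "nn hours" value, which the slide leaves open for the BCP group to set.

```python
from enum import Enum


class Decision(Enum):
    DECLARE = "declare"
    NO_DECLARE = "no declare"


# Shorthand labels for the slide's bullets (hypothetical names).
ALWAYS_DECLARE = {"site-wide calamity"}
NEVER_DECLARE = {"one LPAR", "one volume", "one SHARK array", "one application"}
# "Maybe declare" events: the answer depends on the estimated repair time.
MAYBE_DECLARE = {"network connectivity", "entire CPU", "all SHARK arrays"}


def decide(event: str, est_repair_hours: float, threshold_hours: float) -> Decision:
    """Return the declaration decision for one event.

    threshold_hours is the slide's unspecified "nn hours" (an assumption here).
    """
    if event in ALWAYS_DECLARE:
        return Decision.DECLARE
    if event in NEVER_DECLARE:
        return Decision.NO_DECLARE
    if event in MAYBE_DECLARE:
        # Declare only when the repair is expected to exceed the threshold.
        if est_repair_hours > threshold_hours:
            return Decision.DECLARE
        return Decision.NO_DECLARE
    raise ValueError(f"unknown event: {event}")
```

For example, with a hypothetical threshold of 8 hours, `decide("entire CPU", 10, 8)` returns `Decision.DECLARE`, while the same event with a 4-hour repair estimate does not trigger a declaration.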
[Diagram: Data Center containing six LPARs]
How customers spend their money
Recovery Type         Budget   Attention   Probability
Disaster              $$$$$    High        Low
Volume                $$$      Medium      Medium
Application/Logical   $        Low         Very high – it’s sure to happen!
Disaster Recovery
Options from weekly dumps to remote site replication
- Dumps – simple, cheap, maximum data loss
  – Weekly dumps mean several days of data loss
- Remote mirror – complex, expensive, no data loss
  – Disk, network, software, facilities, operations
- Compromise – periodic vaulting of copies and logs
  – Daily or hourly log shipment will minimize data loss
[Chart: the options compared on Cost, Complexity, Data Loss, and Outage Time]
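The link between shipment frequency and data loss can be put into a few lines of arithmetic. This is a sketch under simplifying assumptions that are not on the slide: the offsite copy is exactly as current as the last shipment, transit time is ignored, and a failure is equally likely at any moment. The function names are illustrative only.

```python
from typing import Optional


def worst_case_loss_hours(dump_interval_hours: float,
                          log_ship_interval_hours: Optional[float] = None) -> float:
    """Largest window of updates that can be lost in a site failure.

    Without log shipment, everything since the last dump is lost.
    With log shipment, only updates since the last shipped log are lost.
    """
    if log_ship_interval_hours is None:
        return dump_interval_hours
    return min(dump_interval_hours, log_ship_interval_hours)


def average_loss_hours(dump_interval_hours: float,
                       log_ship_interval_hours: Optional[float] = None) -> float:
    """Expected loss if the failure time is uniformly distributed."""
    return worst_case_loss_hours(dump_interval_hours, log_ship_interval_hours) / 2


WEEK = 7 * 24  # hours

print(average_loss_hours(WEEK))         # weekly dumps only
print(worst_case_loss_hours(WEEK, 24))  # weekly dumps plus daily log shipment
print(worst_case_loss_hours(WEEK, 1))   # weekly dumps plus hourly log shipment
```

Under these assumptions, weekly dumps alone lose 3.5 days on average (the "several days" above), daily log shipment caps the loss at 24 hours, and hourly shipment caps it at 1 hour.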
Cost Components of Backup
What do you spend doing Database Backups?
CPU time, overhead on system resources
Output resources (tape or disk)
Operations and Support resources
What’s the value to the business?
Recoverability of critical data asset
Possible side benefit – use backup to migrate data to ‘clone’ system
What’s the business impact?
Availability impact (maybe)
Data integrity and consistency risk (maybe)
Conflicts with business processing (maybe)
Cost Components of Log Processing
What do you spend doing Log Processing (accums)?
CPU time, overhead on system resources
Output resources (tape or disk)
Operations and Support resources
What’s the value to the business?
Faster Recovery of critical data asset
What’s the business impact?
Availability impact (maybe)
Conflicts with business processing (maybe)
Cost Components of Application Recovery
What do you spend doing Local Recovery?
Your business is DOWN – cost can be MILLIONs per hour!
CPU time, overhead on system resources
Output resources (tape or disk)
Operations/Support resources – do you have Recovery Experts?
– ‘Think Time’ can be a significant part of total outage time
– Remember – MOST outages are LOCAL outages, not Disaster Recovery
What’s the value to the business?
Recovery of critical data asset - eventually
Business Resumption
– Identify and Redo lost transactions
What’s the business impact?
Availability impact
– Lost sales, lost opportunity, fees and fines, supply chain impact, etc.
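A rough way to see why "think time" matters is to add it to the outage clock. The sketch below is illustrative only: the dollar figure and the durations are hypothetical inputs, not numbers from this presentation.

```python
def outage_cost(cost_per_hour: float,
                think_hours: float,
                recovery_hours: float,
                redo_hours: float = 0.0) -> float:
    """Total business cost of a local outage.

    think_hours    - time spent deciding what to recover and how
    recovery_hours - time the recovery jobs actually run
    redo_hours     - time to identify and redo lost transactions
    """
    total_hours = think_hours + recovery_hours + redo_hours
    return total_hours * cost_per_hour


def think_time_share(think_hours: float, recovery_hours: float,
                     redo_hours: float = 0.0) -> float:
    """Fraction of the total outage spent on think time."""
    total_hours = think_hours + recovery_hours + redo_hours
    return think_hours / total_hours


# Hypothetical example: $1M per hour, 1 hour of analysis, 3 hours of recovery.
print(outage_cost(1_000_000, think_hours=1.0, recovery_hours=3.0))
print(think_time_share(think_hours=1.0, recovery_hours=3.0))
```

In this made-up case the outage costs $4M, and a quarter of that is spent before any recovery job starts, which is the part that recovery expertise and automation can shorten.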
Application Recovery Support Tools
- DBMS vendors include a ‘basic’ set of utilities
  – Backup, Restore, Log accumulation, Reorganization
  – Usually delivered with the DBMS license, but some vendors break out utilities and charge for them
- ‘Native’ utilities were not a primary business goal for DBMS vendors
  – Utilities don’t ‘sell’ the DBMS
  – Vendors spent more time optimizing DBMS performance and connectivity
- Gave rise to a secondary ‘Independent Software Vendor’ market
  – Initial products were DBA Utilities and Administration tools
  – Support High Availability (local) and Disaster Recovery (remote)
  – Evolving into DBA and System Programmer solutions for automating administration and support of High Availability DBMS applications
Examples of ISV Innovation for High Availability
Storage exploitation for consistent image copies with minimal outage
High-speed recovery with a variety of techniques
Point-In-Time recovery to any timestamp with consistency
Disaster Recovery preparation automation
Data Replication for reporting and recovery
Using DBMS log data for reporting and transaction recovery
Online reorganization and unloads
Structure change automation
Monitoring DBMS performance for utility automation
Simplification and automation for complex tasks
Cost Components of Disaster Recovery
What do you spend preparing for Disaster Recovery?
CPU time, overhead on system resources
Output resources (tape or disk)
Operations and Support resources
Recovery Site (Facilities, Network/Communications, Work Area)
What’s the value to the business?
Recovery of critical data asset in a total site outage scenario
Business Resumption
– Identify and Redo lost transactions
What’s the business impact?
Cost to business for ‘insurance’ is high
– Typically a long term contract with a ‘hot site’ provider
Cost to business for ongoing DR preparation is draining
– Periodic process at local site (DR Prep – daily?)
Disaster Recovery vs Disaster Restart
Disaster Recovery (Cheap)
- Take database image copies and dataset dumps
- Ship or transmit to remote site (usually 3rd party)
- Restore files at remote site, start systems, recover all databases
- Data Loss – 24+ hours, Recovery Time 24-72 hours
- Most common option for Disaster protection
Disaster Restart to PiT copy (Simple)
- Suspend production database processing
- Dump all data, ship or transmit to remote site (usually 3rd party)
  – Typically using FlashCopy or something like it
- Restore from dumps at remote site, start systems
- Data Loss – 24+ hours, Recovery Time 24 hours
Disaster Restart to Replicate (most protection, most cost)
- No outage, no dumps, no restores
- Start systems at remote site, in-flight transactions automatically backed out
- Data Loss – 0 to minutes, Recovery (actually Restart) Time 2 to 4 hours
- Usually the remote site belongs to the company, not a 3rd party
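The three options above can be compared against a required RPO and RTO. The sketch below encodes the slide's figures as assumptions: "24+ hours" is taken at its best case of 24, "0 to minutes" at its best case of 0, and recovery time at the upper end of each quoted range. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DrOption:
    name: str
    min_data_loss_hours: float  # best-case data loss quoted on the slide
    max_recovery_hours: float   # upper end of the quoted recovery time


# Listed in slide order, which also runs from cheapest to most costly.
OPTIONS = [
    DrOption("Disaster Recovery", 24.0, 72.0),
    DrOption("Disaster Restart to PiT copy", 24.0, 24.0),
    DrOption("Disaster Restart to Replicate", 0.0, 4.0),
]


def candidates(rpo_hours: float, rto_hours: float) -> List[str]:
    """Names of the options that can meet the given RPO and RTO."""
    return [
        o.name
        for o in OPTIONS
        if o.min_data_loss_hours <= rpo_hours and o.max_recovery_hours <= rto_hours
    ]


print(candidates(rpo_hours=1, rto_hours=8))
print(candidates(rpo_hours=48, rto_hours=72))
```

With a 1-hour RPO and 8-hour RTO only the replication option qualifies; relaxing the targets to 48 and 72 hours admits all three, at which point cost decides.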
Replication Solution Cost Components
Constant contact with alternative site
Facilities
– Secure, Conditioned, Raised Floor, Accessible, Remote
Network
– Bandwidth plus associated Network Hardware gear
Hardware
– Disk, Tape, Processor, etc.
Software
– Possibly redundant licenses
Operational Complexity (Services)
– Typically 24x7
What criteria should be used to justify?
Recovery Time Objective (RTO)
Recovery Point Objective (RPO)
Recovery Geography Objective (RGO)
» Is your disaster site threatened by the same risk?
Data Integrity after recovery (DI)
» Recovery to inconsistent data or point is useless
Production Availability Impact (Avail)
» Establishing consistent recovery point may require outage
Production Performance impact (Perf)
» Due to extra processing or distance to remote site
Cost/Benefit Analysis ($$$)
Risk of solution failure or breakdown
Complexity of Disaster Recovery/Restart
Remote Replication Examples
- E-Net Remote Recovery Data Facility, Enterprise Data Replication
- IBM Extended Remote Copy (XRC)
- Sun/StorageTek Data Replicator
- Database-specific solutions such as Oracle Data Guard or IBM DB2 UDB HADR
- EMC Symmetrix Remote Data Facility family
- IBM Peer-to-Peer Remote Copy family
- StorageTek PowerPPRC
Remote Journaling – E-Net RRDF (Remote Recovery Data Facility)
- RPO~0, RTO>hrs, Perf~>0, Cost=$$$, Dist>66km
[Diagram: IMS, DB2, and CICS logs on the local host (DASD controller) are passed by an RRDF task over ESCON or T1/T3 to an RRDF task on the remote host (DASD controller), which writes the RRDF journal. The journal must be split and applied.]
Remote Propagation – E-Net RRDF with Log Apply, Oracle Data Guard
- RPO~0, RTO~<hrs, Perf~>0, Cost=$$$, Dist>66km
[Diagram: IMS, DB2, and CICS logs on the local host (DASD controller) are passed by an RRDF task over ESCON or T1/T3 to an RRDF task on the remote host (DASD controller). There the RRDF journal feeds the RRDF splitter, which produces IMS, DB2, and CICS logs; Log Apply applies them to the remote IMS, DB2, and CICS/VSAM data.]
Remote Replication – EDR: OAR/DB2
Capture, Apply and User Interface
- RPO~0, RTO~<hrs, Perf~>0, Cost=$$$$, Dist>66km
[Diagram: three DB2 for z/OS systems (Ocean A, Ocean B, Ocean C), each running OAR/DB2 Capture and OAR/DB2 Apply, with a user interface]
Remote Disk Shadow – IBM XRC (Extended Remote Copy)
(Emulated by HDS and EMC)
- RPO~0, RTO~0, Perf~>0, Cost=$$$, Dist>66km
[Diagram: local host with DFSMS, DASD controller, and DASD, connected over ESCON or T1/T3 to a remote host running the DFSMS SDM task, with its own DASD controller and DASD]
Remote Disk Shadow – EMC SRDF/A, HDS TrueCopy
- RPO~0, RTO~hrs, Perf~>0, Cost=$$$, Dist>66km
[Diagram: local host with Symmetrix DASD, connected by SRDF/A over ESCON or T1/T3 to remote Symmetrix DASD. Delta Sets and Write Folding reduce bandwidth.]
Remote Disk Mirror – EMC SRDF/S
- RPO=0, RTO~hrs, Perf~>0, Cost=$$$, Dist<66km
[Diagram: local host with Symmetrix DASD, connected by SRDF/S over ESCON to remote Symmetrix DASD]
Remote Disk Mirror – IBM PPRC (Peer to Peer Remote Copy)
(Emulated by HDS and EMC)
- RPO=0, RTO~hrs, Perf~>0, Cost=$$$, Dist<66km
[Diagram: local host with DFSMS and a 3990-6 controller, connected by PPRC over ESCON to a remote 3990-6 controller]
Remote Disk Mirror – EMC SRDF/AR
- RPO>0, RTO~hrs, Perf~>0, Cost=$$$$, Dist>66km
[Diagram: SRDF/S over ESCON to ‘Hop’ Mirror BCVs, then SRDF/A (only changed tracks) over T1(s) or T3(s) to Remote Mirror BCVs]
Remote Disk Mirror – EMC SRDF/STAR
- RPO>0, RTO~hrs, Perf~>0, Cost=$$$$, Dist>66km
[Diagram: SRDF/S over ESCON to ‘Hop’ Mirror BCVs, plus two SRDF/A links (only changed tracks) over T1(s) or T3(s) to Remote Mirror BCVs]
IBM GDPS – Geographically Dispersed Parallel Sysplex (disk controller can be EMC or HDS)
- RPO=0, RTO~0, Perf~>0, Cost=$$$$$, Dist<66km
[Diagram: local host and remote host, each with DFSMS and a coupling facility (CF); their 3990-6 controllers are linked by PPRC (Metro Mirror) or XRC (Global Mirror)]
RPO and RTO Technology Alternatives
RPO – Recovery Point Objective: How much transactional data is lost when an offsite recovery is required?
RTO – Recovery Time Objective: How much time is required to perform offsite recovery?
[Chart: Data Loss (RPO) on one axis, marked 0, 1 Second, 1 Hour, 1 Day; Recovery Time (RTO) on the other, marked 0, 15 Minutes, 1-2 Hours, 6-24 Hours. Technologies plotted:
- Synchronous Disk Mirroring (short distance only)
- Asynchronous Disk Mirroring
- Asynchronous Database Replication (LogApply & EDR)
- RRDF Journaling (ISVs can push this circle left)
- Daily Backups, Remote Vaulting (ISVs can push this circle left)]
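The RPO, RTO, Cost, and Dist ratings quoted on the replication slides can be collected into one table and filtered. The sketch below copies those ratings as given; the `Rated` type, the `shortlist` function, and the idea of counting "$" signs as a cost level are illustrative assumptions.

```python
from typing import List, NamedTuple


class Rated(NamedTuple):
    name: str
    rpo: str             # rating as quoted: "=0", "~0", or ">0"
    rto: str             # rating as quoted: "~0", "~<hrs", "~hrs", or ">hrs"
    cost: int            # number of "$" signs quoted
    long_distance: bool  # True where the slide says Dist>66km


RATINGS = [
    Rated("E-Net RRDF journaling", "~0", ">hrs", 3, True),
    Rated("E-Net RRDF with Log Apply, Oracle Data Guard", "~0", "~<hrs", 3, True),
    Rated("EDR: OAR/DB2", "~0", "~<hrs", 4, True),
    Rated("IBM XRC", "~0", "~0", 3, True),
    Rated("EMC SRDF/A, HDS TrueCopy", "~0", "~hrs", 3, True),
    Rated("EMC SRDF/S", "=0", "~hrs", 3, False),
    Rated("IBM PPRC", "=0", "~hrs", 3, False),
    Rated("EMC SRDF/AR", ">0", "~hrs", 4, True),
    Rated("EMC SRDF/STAR", ">0", "~hrs", 4, True),
    Rated("IBM GDPS", "=0", "~0", 5, False),
]


def shortlist(zero_rpo: bool = False, long_distance: bool = False,
              max_cost: int = 5) -> List[Rated]:
    """Filter the rated options; each flag adds a requirement when True."""
    return [
        r for r in RATINGS
        if (not zero_rpo or r.rpo == "=0")
        and (not long_distance or r.long_distance)
        and r.cost <= max_cost
    ]


for r in shortlist(zero_rpo=True, max_cost=3):
    print(r.name)
```

Asking for both `zero_rpo=True` and `long_distance=True` returns an empty list: in these ratings, every option with RPO=0 is limited to less than 66 km.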
Remote Replication Considerations
Remote Replication is a powerful solution for total site failure … but
- Doesn’t help for more likely local production failure
- Synchronous Remote Replication is very distance sensitive
- Asynchronous Remote Replication can be distant – and there will be data loss
- What if the solution fails? What is Plan ‘B’?
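One reason synchronous replication is distance sensitive is plain signal delay: each mirrored write waits for a round trip to the remote site. The sketch below estimates only that propagation delay. It assumes light travels through fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum), one round trip per write, and no switch, protocol, or controller overhead, so real figures will be higher.

```python
FIBER_KM_PER_SECOND = 200_000.0  # approximate speed of light in optical fiber


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay, in milliseconds, for one out-and-back trip."""
    seconds = 2 * distance_km / FIBER_KM_PER_SECOND
    return seconds * 1000


def max_serial_writes_per_second(distance_km: float) -> float:
    """Upper bound on strictly sequential synchronous writes, from delay alone."""
    return 1000 / round_trip_ms(distance_km)


for km in (10, 66, 1000):
    print(km, "km:", round(round_trip_ms(km), 2), "ms per synchronous write")
```

At the 66 km mark used in the ratings this adds about 0.66 ms to every write; at 1,000 km it adds about 10 ms, which is why long-distance options are asynchronous and accept some data loss.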
Considerations for Continuity
Business Continuity
- High Availability
  – Backup & Recovery
  – Data Replication/Failover
  – Online DB Maintenance
- Disaster Recovery
  – IC + Log Offsite
  – Remote Replication
RECOVER PLUS – Fast Forward Recovery
- INDEX work-area can use memory to reduce I/O
- Page built in memory, only written once
- Simultaneous COPY
- Simultaneous key extract
[Diagram: inputs are Active Logs, Archive Logs, Copies (Full, Inc), and a Work Dataset; phases are LOG INPUT, LOG SORT, MERGE, KEY SORT, and INDEX BUILD; outputs are the Table Space with Full copies and the Index Space with Full copies]
Recovery Management for DB2 DR support (Sysprog and DBA)
- Automated process, easy to implement
- Dialog driven generation of Recovery Manager utilities
- Simulation and Estimation at local site pre-DR
[Diagram: local-site DR preparation. Components shown:
- DB2 Catalog/Directory & BMC Repository
- ARMBSRR (Gen System)
- ARMBARC (copy log)
- ARMBLOG (Switch Active Log)
- ARMBGEN (Gen Apps JCL)
- Remote Full Copies (Change or Reference)
- Application Remote Copies (Full/Incremental, Change/Reference)
- ICF Dump
- TMS Pull]
Remote Site Execution
Remote site DB2 startup is easy
- Tracking and Reporting tables capture DR activity for subsequent analysis
- You can continue to ship/transmit logs to the DR site after ARMBSRR at local site
  – ARMBSDR will identify additional DR logs and update BSDS etc.
[Diagram: remote-site steps shown:
- ICF Restore
- TMS Restore
- ARMBSRR Job 1 (CLI, VSAM Allocates, Initialize Actives)
- ARMBSRR Job 2 (Cat/Dir/RM Recoveries)
- Application Dataset Restores
- Application Database Recoveries
- Business Resumption (as of ‘last nights’ ARCHIVE LOG point)]
BMC Recovery Management for DB2 Business Value
Cost Reduction
- Downtime costs money, backup and recovery operations consume resources – BMC reduces both
Availability
- BMC can reduce or eliminate downtime with online backup, recovery backout & avoidance, UNDO
Risk Mitigation
- Are you a Recovery Expert? If yes, you are getting too much practice. BMC automation ensures success
Productivity
- Automated backup and recovery assistance, exploit recovery assets for Auditing, Migrating, Replicating
Performance
- BMC runs faster and uses less CPU (dramatically so with Instant Snapshot, Backout Recovery, UNDO)