Rick Weaver - Briefing DB2 Recovery Extended DRx


Extended Disaster Recovery

Rick Weaver

Product Manager for DB2 for z/OS

Considerations for Continuity

Business Continuity

High Availability

- Backup & Recovery
- Data Replication/Failover
- Online DB Maintenance

Disaster Recovery

- IC+ Log Offsite
- Remote Replication

© Copyright 05/07/2010 BMC Software, Inc

What can cause a database application outage?

Some events are planned:

- Application database maintenance
- Data migration
- Schema change implementation
- Hardware upgrades (processor, storage)
- Operating system or DBMS maintenance
- Disaster recovery preparation

Other events are unplanned:

- Site disasters (floods, power outages, storms, fire, etc.)
- Hardware failures (disk, CPU, network, etc.)
- Operating system failures
- DBMS failures
- Operation errors
- Batch cycle errors
- Improper data feeds
- User errors
- Deliberate data corruption
- Application software errors
- Application performance degradation
- Fallback from application change migrations


When to Declare Disaster

Site-wide calamity – declare

Lose network connectivity – maybe declare

- Estimate repair time
- > nn hours, declare (BCP group)

Lose 1 LPAR – no declare

Lose entire CPU – maybe declare

- Estimate repair time
- > nn hours, declare (BCP group)

Lose one volume – no declare

Lose one SHARK array – no declare

Lose all SHARK arrays – maybe declare

- Estimate repair time
- > nn hours, declare (BCP group)

Lose one application, one database, one segment/table – no declare (most likely event)

[Figure: Data Center containing six LPARs]
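The declaration rules on this slide can be sketched as a small decision function. This is only an illustration: the event keys, the `Decision` enum, and the function name are hypothetical, and because the slide leaves the threshold as "nn hours", `threshold_hours` is a required argument rather than a recommended value.

```python
from enum import Enum


class Decision(Enum):
    DECLARE = "declare"
    NO_DECLARE = "no declare"


# Event classes taken from the slide; the string keys are illustrative names.
ALWAYS_DECLARE = {"site_wide_calamity"}
NEVER_DECLARE = {
    "one_lpar",
    "one_volume",
    "one_shark_array",
    "one_application_or_database",
}
MAYBE_DECLARE = {"network_connectivity", "entire_cpu", "all_shark_arrays"}


def should_declare(event: str, estimated_repair_hours: float,
                   threshold_hours: float) -> Decision:
    """'Maybe' events are declared only when the estimated repair time
    exceeds the threshold chosen by the BCP group (the slide's 'nn hours')."""
    if event in ALWAYS_DECLARE:
        return Decision.DECLARE
    if event in NEVER_DECLARE:
        return Decision.NO_DECLARE
    if event in MAYBE_DECLARE:
        if estimated_repair_hours > threshold_hours:
            return Decision.DECLARE
        return Decision.NO_DECLARE
    raise ValueError(f"unknown event class: {event}")
```

For example, `should_declare("entire_cpu", 12, threshold_hours=8)` yields `Decision.DECLARE`, while losing a single LPAR never does, whatever the repair estimate.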

How customers spend their money

Recovery Type          Budget   Attention   Probability
Disaster               $$$$$    High        Low
Volume                 $$$      Medium      Medium
Application/Logical    $        Low         Very high – it’s sure to happen!

Disaster Recovery

Options from weekly dumps to remote site replication

Dumps - simple, cheap, maximum data loss

- Weekly dumps mean several days of data loss

Remote mirror - complex, expensive, no data loss

- Disk, network, software, facilities, operations

Compromise - periodic vaulting of copies and logs

- Daily or hourly log shipment will minimize data loss

[Figure: options compared by Cost, Complexity, Data Loss, and Outage Time]
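To make the dump-versus-log-shipment trade-off concrete, the data exposed to loss can be estimated from the shipment interval alone. This is a minimal sketch, assuming shipments leave at a fixed interval, transit time is ignored, and a failure is equally likely at any moment in the cycle. The function name and the sample intervals are illustrative, not from the briefing.

```python
def data_loss_hours(shipment_interval_hours: float) -> tuple[float, float]:
    """Return (worst_case, average) hours of work lost.

    Worst case: the site fails just before the next shipment, so a full
    interval is lost. Average: half an interval, under the assumption
    that failures are uniformly distributed over the cycle.
    """
    if shipment_interval_hours <= 0:
        raise ValueError("interval must be positive")
    worst = shipment_interval_hours
    average = shipment_interval_hours / 2
    return worst, average


for label, interval in [("weekly dumps", 7 * 24),
                        ("daily log shipment", 24),
                        ("hourly log shipment", 1)]:
    worst, average = data_loss_hours(interval)
    print(f"{label}: worst {worst:g} h, average {average:g} h")
```

Under these assumptions, weekly dumps average 84 hours (3.5 days) of lost work, which matches the slide's "several days" remark.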

Cost Components of Backup

What do you spend doing Database Backups?

CPU time, overhead on system resources

Output resources (tape or disk)

Operations and Support resources

What’s the value to the business?

Recoverability of critical data asset

Possible side benefit – use backup to migrate data to ‘clone’ system

What’s the business impact?

Availability impact (maybe)

Data integrity and consistency risk (maybe)

Conflicts with business processing (maybe)


Cost Components of Log Processing

What do you spend doing Log Processing (accums)?

CPU time, overhead on system resources

Output resources (tape or disk)

Operations and Support resources

What’s the value to the business?

Faster Recovery of critical data asset

What’s the business impact?

Availability impact (maybe)

Conflicts with business processing (maybe)


Cost Components of Application Recovery

What do you spend doing Local Recovery?

Your business is DOWN – cost can be MILLIONs per hour!

CPU time, overhead on system resources

Output resources (tape or disk)

Operations/Support resources – do you have Recovery Experts?

– ‘Think Time’ can be a significant part of total outage time

– Remember – MOST outages are LOCAL outages, not Disaster Recovery

What’s the value to the business?

Recovery of critical data asset - eventually

Business Resumption

– Identify and Redo lost transactions

What’s the business impact?

Availability impact

– Lost sales, lost opportunity, fees and fines, supply chain impact, etc.
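As a rough illustration of why ‘Think Time’ matters, outage cost can be modeled as the hourly cost of downtime multiplied by the total outage, where the total is think time plus the recovery run itself. The model and the function names are assumptions made for this sketch, and any figures passed in are hypothetical inputs, not data from the briefing.

```python
def outage_cost(cost_per_hour: float, think_hours: float,
                recovery_hours: float) -> float:
    """Total cost of a local outage: downtime rate times total hours down."""
    total_hours = think_hours + recovery_hours
    return cost_per_hour * total_hours


def think_time_share(think_hours: float, recovery_hours: float) -> float:
    """Fraction of the outage spent deciding what to do, not recovering."""
    total_hours = think_hours + recovery_hours
    if total_hours == 0:
        return 0.0
    return think_hours / total_hours
```

With a hypothetical rate of 1,000,000 per hour, 1.5 hours of think time, and a 2.5 hour recovery, `outage_cost` returns 4,000,000 and `think_time_share` returns 0.375, so automation that shortens only the decision phase still removes more than a third of the cost.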


Application Recovery Support Tools

DBMS vendors include ‘basic’ set of utilities

- Backup, Restore, Log accumulation, Reorganization
- Usually delivered with DBMS license, but some vendors break out utilities and charge for them

‘Native’ utilities were not primary business goal for DBMS vendors

- Utilities don’t ‘sell’ the DBMS
- Spent more time optimizing DBMS performance and connectivity

Gave rise to secondary ‘Independent Software Vendor’ market

- Initial products were DBA Utilities and Administration tools
- Support High Availability (local) and Disaster Recovery (remote)
- Evolving into DBA and System Programmer solutions for automating administration and support of High Availability DBMS applications

Examples of ISV Innovation for High Availability

Storage exploitation for consistent image copies with minimal outage

High-speed recovery with a variety of techniques

Point-In-Time recovery to any timestamp with consistency

Disaster Recovery preparation automation

Data Replication for reporting and recovery

Using DBMS log data for reporting and transaction recovery

Online reorganization and unloads

Structure change automation

Monitoring DBMS performance for utility automation

Simplification and automation for complex tasks


Cost Components of Disaster Recovery

What do you spend preparing for Disaster Recovery?

CPU time, overhead on system resources

Output resources (tape or disk)

Operations and Support resources

Recovery Site (Facilities, Network/Communications, Work Area)

What’s the value to the business?

Recovery of critical data asset in a total site outage scenario

Business Resumption

– Identify and Redo lost transactions

What’s the business impact?

Cost to business for ‘insurance’ is high

– Typically a long term contract with a ‘hot site’ provider

Cost to business for ongoing DR preparation is draining

– Periodic process at local site (DR Prep – daily?)


Disaster Recovery vs Disaster Restart

Disaster Recovery (Cheap)

- Take database image copies and dataset dumps
- Ship or transmit to remote site (usually 3rd party)
- Restore files at remote site, start systems, recover all databases
- Data Loss – 24+ hours, Recovery Time 24-72 hours
- Most common option for Disaster protection

Disaster Restart to PiT copy (Simple)

- Suspend production database processing
- Dump all data, ship or transmit to remote site (usually 3rd party)
  – Typically using FlashCopy or something like it
- Restore from dumps at remote site, start systems
- Data Loss – 24+ hours, Recovery Time 24 hours

Disaster Restart to Replicate (most protection, most cost)

- No outage, no dumps, no restores
- Start systems at remote site, in-flight transactions automatically backed out
- Data Loss – 0 to minutes, Recovery (actually Restart) Time 2 to 4 hours
- Usually the remote site belongs to the company, not a 3rd party
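The three approaches can be compared against a target RPO and RTO with a small lookup. This is a sketch under stated assumptions: the `DrOption` type and `candidates` function are hypothetical, "24+ hours" is entered as its optimistic floor of 24, recovery times use the upper end of each quoted range, and "0 to minutes" is given an assumed ceiling of 1 hour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    name: str
    data_loss_hours: float  # slide figure; 24 stands in for "24+ hours"
    recovery_hours: float   # upper end of the slide's range


OPTIONS = [
    DrOption("Disaster Recovery", 24, 72),
    DrOption("Disaster Restart to PiT copy", 24, 24),
    # "0 to minutes" on the slide: 1 hour is an assumed ceiling.
    DrOption("Disaster Restart to Replicate", 1, 4),
]


def candidates(rpo_hours: float, rto_hours: float) -> list[str]:
    """Names of options whose figures fit within the stated objectives."""
    return [
        option.name
        for option in OPTIONS
        if option.data_loss_hours <= rpo_hours
        and option.recovery_hours <= rto_hours
    ]
```

Calling `candidates(24, 24)` excludes plain Disaster Recovery because its 72 hour recovery ceiling misses the RTO, and `candidates(1, 4)` leaves only the replication option.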

Replication Solution Cost Components

Constant contact with alternative site

Facilities

– Secure, Conditioned, Raised Floor, Accessible, Remote

Network

– Bandwidth plus associated Network Hardware gear

Hardware

– Disk, Tape, Processor, etc.

Software

– Possibly redundant licenses

Operational Complexity (Services)

– Typically 24x7


What criteria should be used to justify?

Recovery Time Objective (RTO)

Recovery Point Objective (RPO)

Recovery Geography Objective (RGO)

» Is your disaster site threatened by the same risk?

Data Integrity after recovery (DI)

» Recovery to inconsistent data or point is useless

Production Availability Impact (Avail)

» Establishing consistent recovery point may require outage

Production Performance impact (Perf)

» Due to extra processing or distance to remote site

Cost/Benefit Analysis ($$$)

Risk of solution failure or breakdown

Complexity of Disaster Recovery/Restart


Remote Replication Examples

Replication to Remote site for Disaster Recovery/Restart

- Host-based solution examples
  – E-Net Remote Recovery Data Facility, Enterprise Data Replication
  – IBM Extended Remote Copy (XRC)
  – Sun/StorageTek Data Replicator
  – Database specific solutions such as Oracle Data Guard or IBM DB2 UDB HADR
- Storage-based solution examples
  – EMC Symmetrix Remote Data Facility family
  – IBM Peer-to-Peer Remote Copy family
  – StorageTek PowerPPRC

Remote Journaling – E-Net RRDF (Remote Recovery Data Facility)

- RPO~0, RTO>hrs, Perf~>0, Cost=$$$, Dist>66km

[Figure: the Local Host (DASD, Controller, IMS/DB2/CICS Logs) runs an RRDF Task linked over ESCON or T1 or T3 to an RRDF Task on the Remote Host (DASD, Controller), which writes the RRDF Journal (Must be Split and Applied)]

Remote Propagation – E-Net RRDF with Log Apply, Oracle Data Guard

- RPO~0, RTO~<hrs, Perf~>0, Cost=$$$, Dist>66km

[Figure: the Local Host (DASD, Controller, IMS/DB2/CICS Logs) runs an RRDF Task linked over ESCON or T1 or T3 to an RRDF Task on the Remote Host (DASD, Controller); there the RRDF Journal feeds the RRDF Splitter, which produces IMS, DB2, and CICS logs for Log Apply to IMS, DB2, and CICS/VSAM]

Remote Replication - EDR: OAR/DB2

Capture, Apply and User Interface

– RPO~0, RTO~<hrs, Perf~>0, Cost=$$$$, Dist>66km

[Figure: three sites (Ocean A, Ocean B, Ocean C), each running DB2 for z/OS with OAR/DB2 Capture and OAR/DB2 Apply, plus a User interface]

Remote Disk Shadow - IBM XRC (Extended Remote Copy) (Emulated by HDS and EMC)

- RPO~0, RTO~0, Perf~>0, Cost=$$$, Dist>66km

[Figure: Local Host (DFSMS), DASD Controller, and DASD connected over ESCON or T1 or T3 to the Remote Host, DASD Controller, and DASD, with a DFSMS SDM Task]

Remote Disk Shadow - EMC SRDF/A, HDS TrueCopy

- RPO~0, RTO~hrs, Perf~>0, Cost=$$$, Dist>66km

[Figure: Local Host with Symmetrix DASD replicating via SRDF/A over ESCON or T1 or T3 to remote Symmetrix DASD; Delta Sets and Write Folding reduce bandwidth]

Remote Disk Mirror - EMC SRDF/S

- RPO=0, RTO~hrs, Perf~>0, Cost=$$$, Dist<66km

[Figure: Local Host with Symmetrix DASD mirrored via SRDF/S over ESCON to remote Symmetrix DASD]

Remote Disk Mirror - IBM PPRC (Peer to Peer Remote Copy) (Emulated by HDS and EMC)

- RPO=0, RTO~hrs, Perf~>0, Cost=$$$, Dist<66km

[Figure: Local Host (DFSMS) with a 3990-6 controller mirrored via PPRC over ESCON to a second 3990-6 controller]

Remote Disk Mirror – EMC SRDF/AR

- RPO>0, RTO~hrs, Perf~>0, Cost=$$$$, Dist>66km

[Figure: SRDF/S over ESCON to ‘Hop’ Mirror BCVs, then SRDF/A (Only Changed Tracks) over T1(s) or T3(s) to Remote Mirror BCVs]

Remote Disk Mirror – EMC SRDF/STAR

- RPO>0, RTO~hrs, Perf~>0, Cost=$$$$, Dist>66km

[Figure: SRDF/S over ESCON to ‘Hop’ Mirror BCVs; two SRDF/A links (Only Changed Tracks) over T1(s) or T3(s) to Remote Mirror BCVs]

IBM GDPS - Geographically Dispersed Parallel Sysplex (disk controller can be EMC or HDS)

- RPO=0, RTO~0, Perf~>0, Cost=$$$$$, Dist<66km

[Figure: Local Host and Remote Host (each with DFSMS) and two coupling facilities (CF), with a 3990-6 controller at each site mirrored by PPRC (Metro Mirror) or XRC (Global Mirror)]

RPO and RTO Technology Alternatives

RPO – Recovery Point Objective: How much transactional data is lost when an offsite recovery is required?

RTO – Recovery Time Objective: How much time is required to perform offsite recovery?

[Figure: technologies plotted by Data Loss (RPO; axis marks 0, 1 Second, 1 Hour, 1 Day) against Recovery Time (RTO; axis marks 0, 15 Minutes, 1-2 Hours, 6-24 Hours): Synchronous Disk Mirroring (short distance only); Asynchronous Disk Mirroring; Asynchronous Database Replication (LogApply & EDR); RRDF Journaling (ISVs can push this circle left); Daily Backups Remote Vaulting (ISVs can push this circle left)]

Remote Replication Considerations

Remote Replication is a powerful solution for total site failure … but

- Doesn’t help for more likely local production failure
- Synchronous Remote Replication is very distance sensitive
- Asynchronous Remote Replication can be distant – and there will be data loss
- What if the solution fails? What is Plan ‘B’?
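The distance sensitivity of synchronous replication follows from propagation delay: each write waits for an acknowledgement from the remote site. This is a back-of-the-envelope sketch, assuming light travels at roughly 200,000 km/s in optical fiber and that a write needs a whole number of round trips. It ignores switching, protocol, and controller overhead, so real delays are higher. The constant and function names are illustrative.

```python
# Approximate speed of light in optical fiber (about two thirds of c).
FIBER_KM_PER_SECOND = 200_000.0


def round_trip_ms(distance_km: float, round_trips: int = 1) -> float:
    """Minimum added latency per synchronous write, in milliseconds."""
    one_way_seconds = distance_km / FIBER_KM_PER_SECOND
    return 2 * one_way_seconds * round_trips * 1000.0


for km in (10, 66, 1000):
    print(f"{km} km: at least {round_trip_ms(km):.2f} ms per write")
```

At the 66 km mark used on the preceding slides, this floor is about 0.66 ms per write; at 1000 km it grows to about 10 ms, which is why long-distance links are run asynchronously and accept some data loss.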


RECOVER PLUS – Fast Forward Recovery

- Page built in memory, only written once
- Simultaneous COPY
- Simultaneous key extract

INDEX work-area can use memory to reduce I/O

[Figure: inputs are Active Logs, Archive Logs, and Copies (Full, Inc), with a Work Dataset; phases are LOG INPUT, LOG SORT, MERGE, KEY SORT, and INDEX BUILD; outputs are the Table Space, the Index Space, and Full Copies]

Recovery Management for DB2 DR support (Sysprog and DBA)

Offsite log recovery without complexity

- Automated process, easy to implement

Dialog driven generation of Recovery Manager utilities

Simulation and Estimation at local site pre-DR

[Figure: local-site DR preparation jobs and artifacts: ARMBSRR (Gen System); ARMBARC (copy log); ARMBLOG (Switch Active Log); ARMBGEN (Gen Apps JCL); DB2 Catalog/Directory & BMC Repository; Remote Full Copies (Change or Reference); Application Remote Copies (Full/Incremental, Change/Reference); ICF Dump; TMS Pull]

Remote Site Execution

Remote site DB2 startup is easy

- Tracking and Reporting tables capture DR activity for subsequent analysis
- You can continue to ship/transmit logs to the DR site after ARMBSRR at local site
  – ARMBSDR will identify additional DR log and update BSDS etc.

[Figure: remote-site steps: TMS Restore; ICF Restore; ARMBSRR Job 1 (CLI, VSAM Allocates, Initialize Actives); ARMBSRR Job 2 (Cat/Dir/RM Recoveries); Application Dataset Restores; Application Database Recoveries; Business Resumption (as of ‘last night’s’ ARCHIVE LOG point)]

BMC Recovery Management for DB2 Business Value

Cost Reduction

- Downtime costs money, backup and recovery operations consume resources – BMC reduces both

Availability

- BMC can reduce or eliminate downtime with online backup, recovery backout & avoidance, UNDO

Risk Mitigation

- Are you a Recovery Expert? If yes, you are getting too much practice. BMC automation ensures success

Productivity

- Automated backup and recovery assistance, exploit recovery assets for Auditing, Migrating, Replicating

Performance

- BMC runs faster and uses less CPU (dramatically so with Instant Snapshot, Backout Recovery, UNDO)