Business Continuity & Disaster Recovery


Lauren Farese – Oracle Corporation

Paul Christman – VERITAS Software

Walter Callahan – State of Ohio

What happened on August 14th, 2003?

Disasters happen every day... it's a fact!

• Disasters cost money, so why suffer by being unprepared?

• Organizations that survive typically have:

– management foresight

– tested procedures

– processes

– back-up facilities

• Business Continuity Planning (BCP)

Downtime Costs Money

Downtime Per Year (7x24x365)

Availability   Days   Hours   Minutes   Cost$ *
95%            18     6       0         $250M
99%            3      15      36        $51M
99.9%          0      8       46        $5,003,312
99.99%         0      0       53        $504,136
99.999%        0      0       5         $47,560
99.9999%       0      0       1         $9,512

Numbers assume $5B yearly revenue run rate.

* Costs were calculated by Oracle and are not associated with the Standish Group report.
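The downtime figures can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes a 365-day year and a simple cost model in which revenue is lost in proportion to unavailable time. The slide does not say how the cost column was derived, so the computed costs may differ slightly from the table. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 7x24x365 operation


def downtime_per_year(availability_pct):
    """Return yearly downtime as (days, hours, minutes), rounded to the nearest minute."""
    total_minutes = round((100 - availability_pct) / 100 * MINUTES_PER_YEAR)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


def downtime_cost(availability_pct, yearly_revenue=5_000_000_000):
    """Assumed cost model: revenue lost in proportion to unavailable time."""
    return (100 - availability_pct) / 100 * yearly_revenue


for pct in (95, 99, 99.9, 99.99, 99.999, 99.9999):
    d, h, m = downtime_per_year(pct)
    print(f"{pct}%: {d}d {h}h {m}m, roughly ${downtime_cost(pct):,.0f} per year")
```

Each additional "nine" cuts yearly downtime by a factor of ten, which is why the cost column drops so steeply between rows.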

Business Continuity Planning vs. Disaster Recovery Planning

• Both are directed at recovery of operations

• Business Continuity Planning is directed at the recovery and resumption of business activities across the entire enterprise

• Disaster Recovery Planning is usually directed at the recovery of information technology systems and business applications, including corporate data

• BCP addresses Processes, People and Property

Business Continuity Planning Phases

• Typically three phases

– Pre-Planning

– Planning

– Post-Planning

• Critical success factor

• Cost is always an issue

• Executive ownership is critical

• Must be a business priority

Phase One: Pre-Planning

• Project initiation and management

– Establish a need

– Executive management ownership

– Time and budget allocation

• Risk evaluation and control

– Events and environment issues

– Facilities and process evaluation

– Cost benefit analysis

• Impact analysis

– Disruption and disaster scenarios

– Critical business functions

– Recovery time analysis

Phase Two: Planning

• Develop continuity strategies

– Alternative organizational recovery

– Operations and information systems

– Adhere to recovery time objectives

• Emergency response and operations

– Procedures for response and stabilization

– Establish operations center

– Emergency command and control

• Developing and implementing the plan

– Plan provides recovery within time objective

Disaster Recovery - Business Continuity Planning

Oracle BCM Business Flow

[Flowchart. Swimlanes: Global IT Business Operations; Multi-disciplined Disaster Recovery Planning Team; Global IT Senior Management. Connectors A, B and C link the branches.]

(1) Review changes made to Global IT environment

(2) Establish a multidiscipline team

(3) Identify Business Continuity / Disaster Recovery team members

(4) Perform business risk assessment to determine current risk / future risk profile

(5) Document & communicate business risk assessment results & risk profile to Global IT Senior Management Team

(6) Review current Disaster Recovery plan to determine if new risk profile is mitigated within current DR plan

• Decision: Does current DR plan require modification? (Y/N)

(7) Identify within current plan areas that require additional work to mitigate new risk

(8) Develop new DR plan

(9) Determine what testing has to be performed on DR plan

(10) Test DR plan

• Decision: DR plan passes tests? (Y/N)

(11) Modify DR plan as necessary & re-test plan

(12) Submit DR plan to Senior Management for approval

(13) Review new / changed DR plan

• Decision: Approval received? (Y/N)

(14) Modify new DR plan to address reviewers' concerns

(15) Determine if modifications to plan require additional testing

• Decision: Plan requires additional testing due to modifications? (Y/N)

(16) Re-submit to Senior Management for approval

Phase Three: Post-Planning

• Awareness and training

– Create organizational awareness

– Enhance skills

• Maintaining and exercising

– Coordinate plan exercises

– Evaluate and document exercise results

– Develop process to maintain the plan

– Report results clearly and concisely

• Coordination and communication

– Communication with media, families, suppliers

– Crisis coordination with first responders, local authorities

What about the technology?

Match the Tools to the Business Needs

[Chart: techniques placed along two time scales]

• Recovery Point (scale from weeks down to seconds): Tape or Disk Backup, Async. Replication, Sync. Replication

• Recovery Time (scale from seconds up to weeks): Clustering, Remote Replication, Online Restore, Tape Restore
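One way to read the chart is as a lookup: given a recovery point or recovery time objective, pick the least demanding technique that still meets it. The sketch below assumes that reading. The threshold values are hypothetical placeholders chosen only to preserve the ordering shown on the chart; they are not taken from the slide and would have to come from a real business impact analysis.

```python
from datetime import timedelta

# Hypothetical thresholds (assumptions, not from the slide). Each entry is
# (best value the technique is assumed to achieve, technique name), ordered
# from the slowest technique to the fastest.
RECOVERY_POINT_TOOLS = [
    (timedelta(hours=24), "Tape or disk backup"),
    (timedelta(minutes=5), "Asynchronous replication"),
    (timedelta(0), "Synchronous replication"),
]
RECOVERY_TIME_TOOLS = [
    (timedelta(days=1), "Tape restore"),
    (timedelta(hours=1), "Online restore"),
    (timedelta(minutes=15), "Remote replication"),
    (timedelta(minutes=1), "Clustering"),
]


def pick_tool(objective, tools):
    """Return the first (slowest) technique whose assumed capability meets the objective."""
    for achievable, name in tools:
        if achievable <= objective:
            return name
    return None  # nothing in the list meets the objective


print(pick_tool(timedelta(minutes=30), RECOVERY_POINT_TOOLS))
print(pick_tool(timedelta(minutes=5), RECOVERY_TIME_TOOLS))
```

Searching from the slowest technique first reflects the slide's point that tools should match the business need rather than exceed it.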

Only as Good as the Weakest Link

[Diagram: Clients, Load Balancer, Web Cache, Application Server Tier (Java Clusters), Database Tier]

BC/DR Must Address Every Component

• Network Infrastructure

• Data Storage – online, near-line and off-line

• Application servers and their offspring

Any component down = the entire system is unusable
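When every component must be up for the system to work, and component failures are assumed to be independent, the overall availability is the product of the individual availabilities. The sketch below illustrates that rule; the per-tier figures are hypothetical and are not taken from the presentation.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of one another.
    """
    return prod(availabilities)


# Hypothetical per-component availabilities, for illustration only.
tiers = {
    "network infrastructure": 0.999,
    "load balancer": 0.999,
    "web cache": 0.999,
    "application server tier": 0.999,
    "database tier": 0.999,
}

overall = serial_availability(tiers.values())
print(f"overall availability: {overall:.4%}")
```

Five components at 99.9% each yield roughly 99.5% overall, so the chain is less available than any single link, and one component at zero takes the whole system to zero.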

Network Infrastructure

• Wide Area Traffic Manager to direct client traffic to proper site

• Network load balancer to distribute incoming requests

• Dedicated, fast link between sites

– Influences production database performance

• Redundant components and paths

– Network paths to the site and within the site

BC/DR Techniques for Data Storage

• Snapshots – frequent, within an array, FC, temporary

• Mirrors – frequent, in a different array, FC, temporary

• Replicas – synchronous or async, remote or local, FC or IP, temporary or semi-permanent

• Near-Line Disk – infrequent, cross-platform, FC or IP, BI copy, DLM, or staging for backup

• Tape Backup – infrequent, FC or IP, required best practice for DR

Application Availability with Local Clustering

[Diagram: Server 1 (Instance ‘A’) and Server 2 (Instance ‘B’) attached to one Database]

Protects from local server failures

Depends on shared available storage
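The two properties above can be expressed as a small availability check. This is a minimal sketch with hypothetical function names, not the behavior of any particular clustering product: the application survives the loss of a server as long as one node remains, but not the loss of the shared storage.

```python
def cluster_available(servers_up, shared_storage_up):
    """True if the clustered application can still serve requests.

    servers_up: iterable of booleans, one per cluster node.
    shared_storage_up: whether the shared storage is reachable.
    """
    return bool(shared_storage_up) and any(servers_up)


def pick_active(servers, is_up):
    """Return the first healthy server name, or None if all are down."""
    for name in servers:
        if is_up(name):
            return name
    return None


# Server 1 fails: the cluster keeps running on Server 2.
print(cluster_available([False, True], shared_storage_up=True))
# Shared storage fails: healthy servers cannot help.
print(cluster_available([True, True], shared_storage_up=False))
```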

Wide Area Clustering

• Extends local clustering model to several sites

• Requires data mirroring or replication

[Diagram: wide area clustering across sites in Cleveland, Columbus, Cincinnati and Sandusky; labels: Site Migration, Failover, Replication]

Key Steps to Success

• Conduct a Business Impact Analysis

• Identify which processes are truly critical and what business continuity for them will cost

• Prioritize investments in people and technology

• Plan and Implement

• Test, test, test!!!

• Review the business continuity plan when the business process changes

Real Life Example

Ohio Dept. of Public Safety

• State Highway Patrol

• Bureau of Motor Vehicles

• Emergency Management Agency

• Emergency Medical Services

• Investigative Unit

• Homeland Security

• Administration

Data Center Facilities

• State of Ohio Computer Center

– West campus of Ohio State University

– Primary site

– Full data center facilities, i.e., UPS, Generator, Environmental

– Operates lights-out

• Charles D. Shipley Building – Public Safety Headquarters, 1970 W. Broad Street

– Approximately 4 miles from the primary site

– Secondary site

– Full data center facilities, i.e., UPS, Generator, Environmental

– Remote operations

Features

• OC-48 SONET ring between the buildings

– Moving to Gigabit Ethernet

• Mainframe environment has mirrored disks at primary site, 3rd mirrored leg at secondary site

• Robotic tape silos at primary site, remote tape drives at secondary site

• Redundant server with failover for law enforcement

• Servers at either site, mirror to other site

Decision Factors

• Prioritize business functions

• Work with business units for business continuity to determine IT disaster planning levels

• Determine level of acceptable risks

– Distance for secondary site

– Hot versus cold site

– Mirror data versus backups

– Redundant servers with failover versus build new server at time of disaster

“The pessimist sees difficulty in every opportunity.

The optimist sees opportunity in every difficulty.”

- Winston Churchill
