Lauren Farese – Oracle Corporation
Paul Christman – VERITAS Software
Walter Callahan – State of Ohio
• Disasters cost money, so why suffer by being unprepared?
• Organizations that survive typically have:
– management foresight
– tested procedures
– processes
– back-up facilities
• Business Continuity Planning (BCP)
Downtime Per Year (7x24x365)

Percentage Availability   Days   Hours   Minutes   Cost$ *
95%                       18     6       0         $250M
99%                       3      15      36        $51M
99.9%                     0      8       46        $5,003,312
99.99%                    0      0       53        $504,136
99.999%                   0      0       5         $47,560
99.9999%                  0      0       1         $9,512
Numbers assume $5B yearly revenue run rate.
* Costs were calculated by Oracle and are not associated with the Standish Group Report.
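The downtime columns follow from simple arithmetic on a 7x24x365 year of 525,600 minutes. The sketch below reproduces that arithmetic. It assumes that lost revenue scales linearly with downtime against the $5B run rate; the function names are illustrative, and the cost figures on the slide may use different rounding.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 7x24x365 operation: 525,600 minutes


def downtime_minutes(availability_pct: float) -> float:
    """Expected minutes of downtime per year at a given availability percentage."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


def split_downtime(availability_pct: float) -> tuple[int, int, int]:
    """Downtime rounded to the nearest minute, as (days, hours, minutes)."""
    total = round(downtime_minutes(availability_pct))
    days, rest = divmod(total, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return days, hours, minutes


def downtime_cost(availability_pct: float, yearly_revenue: float = 5e9) -> float:
    """Revenue lost per year, ASSUMING loss is proportional to downtime."""
    return (100.0 - availability_pct) / 100.0 * yearly_revenue


if __name__ == "__main__":
    for pct in (95, 99, 99.9, 99.99, 99.999, 99.9999):
        d, h, m = split_downtime(pct)
        print(f"{pct}%: {d} d {h} h {m} min, about ${downtime_cost(pct):,.0f}")
```

Each additional "nine" cuts yearly downtime, and under the linear assumption the lost revenue, by a factor of ten.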
Business Continuity Planning vs. Disaster Recovery Planning
• Both are directed at recovery of operations
• Business Continuity Planning is directed at the recovery and resumption of business activities across the entire enterprise
• Disaster Recovery Planning is usually directed at the recovery of information technology systems and business applications, including corporate data
• BCP addresses Processes, People and Property
• Typically three phases
– Pre-Planning
– Planning
– Post-Planning
• Critical success factor
• Cost is always an issue
• Executive ownership is critical
• Must be a business priority
• Project initiation and management
– Establish a need
– Executive management ownership
– Time and budget allocation
• Risk evaluation and control
– Events and environment issues
– Facilities and process evaluation
– Cost benefit analysis
• Impact analysis
– Disruption and disaster scenarios
– Critical business functions
– Recovery time analysis
• Develop continuity strategies
– Alternative organizational recovery
– Operations and information systems
– Adhere to recovery time objectives
• Emergency response and operations
– Procedures for response and stabilization
– Establish operations center
– Emergency command and control
• Developing and implementing the plan
– Plan provides recovery within time objective
Disaster Recovery - Business Continuity Planning (process flowchart)
• Roles
– Global IT Business Operations
– Multi-disciplined Disaster Recovery Planning Team
– Global IT Senior Management
• Steps
– (1) Review changes made to Global IT environment
– (2) Establish a multidiscipline team
– (3) Identify Business Continuity / Disaster Recovery team members
– (4) Perform business risk assessment to determine current risk / future risk profile
– (5) Document & communicate business risk assessment results & risk profile to Global IT Senior Management Team
– (6) Review current Disaster Recovery plan to determine if new risk profile is mitigated within current DR plan
– (7) Identify within current plan areas that require additional work to mitigate new risk
– (8) Develop new DR plan
– (9) Determine what testing has to be performed on DR plan
– (10) Test DR plan
– (11) Modify DR plan as necessary & re-test plan
– (12) Submit DR plan to Senior Management for approval
– (13) Review new / changed DR plan
– (14) Modify new DR plan to address reviewers' concerns
– (15) Determine if modifications to plan require additional testing
– (16) Re-submit to Senior Management for approval
• Decision points (Y/N)
– Does current DR plan require modification?
– DR plan passes tests?
– Approval received?
– Plan requires additional testing due to modifications?
• Awareness and training
– Create organizational awareness
– Enhance skills
• Maintaining and exercising
– Coordinate plan exercises
– Evaluate and document exercise results
– Develop process to maintain the plan
– Report results clearly and concisely
• Coordination and communication
– Communication with media, families, suppliers
– Crisis coordination with first responders, local authorities
Recovery Point and Recovery Time (figure)
• Recovery Point scale: Wks, Days, Hrs, Mins, Secs
– Tape or Disk Backup
– Async. Replication
– Sync. Replication
• Recovery Time scale: Secs, Mins, Hrs, Days, Wks
– Clustering
– Remote Replication
– Online Restore
– Tape Restore
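One way to use the recovery point and recovery time scales is to screen techniques against a stated objective. The sketch below does that. Every number in it is a hypothetical order of magnitude chosen for illustration, not a value from the figure, and the names `Technique` and `candidates` are invented for this example.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR


@dataclass(frozen=True)
class Technique:
    name: str
    typical_seconds: float  # assumed order of magnitude, NOT taken from the slide


# Hypothetical data-loss windows (recovery point) per protection technique.
RECOVERY_POINT = [
    Technique("Tape or disk backup", 1 * DAY),
    Technique("Async. replication", 60),
    Technique("Sync. replication", 0),
]

# Hypothetical time-to-restore (recovery time) per recovery technique.
RECOVERY_TIME = [
    Technique("Clustering", 60),
    Technique("Remote replication", 1 * HOUR),
    Technique("Online restore", 8 * HOUR),
    Technique("Tape restore", 2 * DAY),
]


def candidates(options: list[Technique], objective_seconds: float) -> list[str]:
    """Names of techniques whose assumed typical value meets the objective."""
    return [t.name for t in options if t.typical_seconds <= objective_seconds]


if __name__ == "__main__":
    # Example requirement: lose at most 5 minutes of data, be back within 4 hours.
    print("Meets RPO:", candidates(RECOVERY_POINT, 5 * 60))
    print("Meets RTO:", candidates(RECOVERY_TIME, 4 * HOUR))
```

In practice the values would come from measured restore tests rather than assumed constants.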
• Clients
• Load Balancer
• Web Cache
• Application Server Tier
• Java Clusters
• Database Tier
• Network Infrastructure
• Data Storage – online, near-line and off-line
• Application servers and their offspring
Any component down = the entire system is unusable
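When every component sits in the request path, the availability of the whole system is the product of the component availabilities, so the chain is always less available than its weakest link. The sketch below shows this, assuming independent failures; the per-tier values are hypothetical, not measurements from the presentation.

```python
from math import prod


def serial_availability(component_availabilities: list[float]) -> float:
    """Availability of a chain where any single component failure takes the system down.

    Assumes components fail independently; each value is a fraction in [0, 1].
    """
    return prod(component_availabilities)


if __name__ == "__main__":
    # Hypothetical per-tier availabilities, for illustration only.
    tiers = {
        "network": 0.999,
        "load balancer": 0.999,
        "application servers": 0.999,
        "database": 0.999,
        "storage": 0.999,
    }
    overall = serial_availability(list(tiers.values()))
    print(f"End-to-end availability: {overall:.4%}")
```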
• Wide Area Traffic Manager to direct client traffic to proper site
• Network load balancer to distribute incoming requests
• Dedicated, fast link between sites
– Influences production database performance
• Redundant components and paths
– Network paths to the site and within the site
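Redundant components and paths raise availability because the service is lost only when every copy fails at once. The sketch below shows that calculation, assuming independent failures and perfect, instantaneous failover (both simplifications); the function name is illustrative.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one of them suffices.

    Assumes independent failures and instantaneous, perfect failover.
    `single` is the availability of one component as a fraction in [0, 1].
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} path(s) at 99% each: {redundant_availability(0.99, n):.6%}")
```

Real deployments fall short of this figure when copies share a failure cause, such as a common power feed or the same carrier.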
• Snapshots – frequent, within an array, FC, temporary
• Mirrors – frequent, in a different array, FC, temporary
• Replicas – synchronous or async, remote or local, FC or IP, temporary or semi-permanent
• Near-Line Disk – infrequent, x-platform, FC or IP, BI copy, DLM, or staging for backup
• Tape Backup – infrequent, FC or IP, required best practice for DR
Application Availability with Local Clustering
• Server 1: Instance ‘A’
• Server 2: Instance ‘B’
• Database
• Protects from local server failures
• Depends on shared available storage
• Extends local clustering model to several sites
• Requires data mirroring or replication
• Sites: Cleveland, Columbus, Cincinnati, Sandusky
• Operations between sites: Site Migration, Failover, Replication
• Conduct a Business Impact Analysis
• Identify which processes are truly critical and cost of BC
• Prioritize investments in people and technology
• Plan and Implement
• Test, test, test!!!
• Review the business continuity plan when the business process changes
• State Highway Patrol
• Bureau of Motor Vehicles
• Emergency Management Agency
• Emergency Medical Services
• Investigative Unit
• Homeland Security
• Administration
• State of Ohio Computer Center
– West campus of Ohio State University
– Primary site
– Full data center facilities, i.e., UPS, Generator, Environmental
– Operates lights out
• Charles D. Shipley Building – Public Safety Headquarters, 1970 W. Broad Street
– Approximately 4 miles apart
– Secondary site
– Full data center facilities, i.e., UPS, Generator, Environmental
– Remote operations
• OC-48 SONET ring between the buildings
– Moving to Gigabit Ethernet
• Mainframe environment has mirrored disks at primary site, 3rd mirrored leg at secondary site
• Robotic tape silos at primary site, remote tape drives at secondary site
• Redundant server with failover for law enforcement
• Servers at either site, mirror to other site
• Prioritize business functions
• Work with business units for business continuity to determine IT disaster planning levels
• Determine level of acceptable risks
– Distance for secondary site
– Hot versus cold site
– Mirror data versus backups
– Redundant servers with failover versus build new server at time of disaster
“The pessimist sees difficulty in every opportunity.
The optimist sees opportunity in every difficulty.”
- Winston Churchill