Disaster Recovery Planning

Questions to the Audience
What is an IT Disaster
• What is an IT Disaster?
• ‘Disaster’ – the unplanned interruption of normal business
processes resulting from the interruption of the IT
infrastructure components used to support them.
Common Types [1]:
• Power outages: 28%
• Hurricanes: 6%
• Storm Damage: 12%
• Fires: 6%
• Floods: 10%
• Software Error: 5%
• Hardware Error: 8%
• Power surge/spike: 5%
• Physical Attack: 7%
• Earthquake: 5%
[1] Healthcare Information and Management Systems Society (himss.org)
Common Types:
✔ Power outages: 28%
• Hurricanes: 6%
✔ Storm Damage: 12%
✔ Fires: 6%
✔ Floods: 10%
✔ Software Error: 5%
✔ Hardware Error: 8%
✔ Power surge/spike: 5%
• Physical Attack: 7%
• Earthquake: 5%
Business Continuity versus Disaster Recovery
• These are not the same thing!
• Business Continuity (BC): Considers the academic, research and business
functioning of the institution as a whole. Includes risk assessment, and
plans for functional units and business processes. Potentially wider
variety of scenarios to consider.
• Disaster Recovery (DR): IT activities to enable recovery to an acceptable
condition after a disaster. BC includes DR. DR requires guidance from BC
to direct priorities and set scope.
What is the York DR Plan?
Review 2008 Plan
• Project start: January 2003
• Sponsored by CIO and VP Finance and Administration
• Scope
• Systems: “key information systems”
• Scenarios: “localized disaster or failure”
• Intended to be a multi-phase, multi-year project
What is the York DR Plan?
• Engaged functional unit leaders and IT support areas
• Asked to identify maximum tolerable outage and data loss
• Surprise: >50% of business processes ranked “critical”
• Reality check based on observed impacts from lesser-scale
outages
• VP and AVP consultations were the final step to confirm
criticality
Risk Management
[Figure: Cost of Incidents and Cost of Countermeasures plotted against Degree of Assurance (Low to High), with the Optimal Cost/Benefit point marked]
What is the York DR Plan?
• DR Threat Assessment
• Proximity to heavy industry – Oil depot across street
• Freight train corridor (chemical spill 1980)
• Near intersection of major highways (400 & 407)
• York main campus on flight path of two airports
• Main data centre in basement of old building with UPS but no generator
• High pedestrian traffic (Science Library and washrooms upstairs) directly overhead
• Worst case scenario chosen:
• Loss of building containing main data centre
What is the York DR Plan?
• By 2008
• Secured Telus site for secondary site
• Identified 4 categories of information systems
• Recovery Point Objectives (RPO)
• Recovery Time Objectives (RTO)
• Recovery strategy defined for each category
• Business owners classified which systems belong in which categories
• Large infrastructure upgrades identified to meet the RTO/RPOs
• Planned to annually refresh DR plan
2012 DRP Refresh
• It’s been 4 years
• Big upgrade on storage and core network
• Acquisition of second on-campus data centre
• IT department merger
• And …
Goals for 2012 Refresh
2012 Goals
• Focus on C1 business applications as of 2012
• IT staff / office space not in scope
• Scenario is the loss of a single data centre (not both)
• Validate the categorization of “information systems”
• Gap Analysis for C1 information systems
• Table-top recovery scenario for supporting infrastructure
Methodology
• Produce the complete UIT-supported application inventory
• How hard can this be?
• The one list did not exist
• Categorize Applications and focus on 2012 C1 Applications
• Gap Analysis and Planning
• Tabletop Recovery of supporting infrastructure
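Since the single application list did not exist, the first step amounts to merging whatever lists each support group keeps. A minimal sketch of that merge, assuming each group can supply a plain list of application names; the function name, team names, and applications are all hypothetical:

```python
def merge_inventories(*sources):
    """Combine per-team application lists into one de-duplicated inventory."""
    inventory = {}
    for team, apps in sources:
        for name in apps:
            # Normalize case and spacing so "email " and "Email" match.
            key = " ".join(name.lower().split())
            entry = inventory.setdefault(key, {"name": name.strip(), "teams": []})
            if team not in entry["teams"]:
                entry["teams"].append(team)
    return inventory


# Hypothetical lists kept by two different support teams.
merged = merge_inventories(
    ("servers", ["Student Records", "Email"]),
    ("network", ["email ", "VPN"]),
)
for entry in merged.values():
    print(entry["name"], entry["teams"])
```

Recording which teams know about each application also gives a starting point for finding its business owner during categorization.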
DR Categories
Categories and associated RTOs/RPOs
Category   | Summary                                                                   | Recovery Time Objective (RTO) | Recovery Point Objective (RPO)
Category 1 | Vital Communications and Emergency Services                               | <= 4 hours                    | <= 15 minutes
Category 2 | Critical Customer / Partner Interfaces and Emergency Systems              | <= 48 hours                   | <= 15 minutes
Category 3 | Critical Customer / Partner Interfaces and Emergency Systems              | <= 7 days                     | <= 24 hours
Category 4 | Critical Internal Departmental Services and Non-Critical Customer Interface | <= 48 hours                   | <= 14 days
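A gap analysis against these targets can be expressed as a simple comparison. The sketch below uses the Category 1 and Category 2 figures from the table; the `Target` class, the `gaps` function, and the sample service numbers are illustrative assumptions, not part of the York plan:

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Target:
    """Recovery targets for one DR category."""
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


# Category 1 and 2 targets as listed in the table.
TARGETS = {
    "C1": Target(rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    "C2": Target(rto=timedelta(hours=48), rpo=timedelta(minutes=15)),
}


def gaps(category, est_recovery, replication_interval):
    """Return a list of gap descriptions; an empty list means targets are met."""
    target = TARGETS[category]
    found = []
    if est_recovery > target.rto:
        found.append(f"RTO gap: {est_recovery} > {target.rto}")
    if replication_interval > target.rpo:
        found.append(f"RPO gap: {replication_interval} > {target.rpo}")
    return found


# Hypothetical service: restored in 6 hours, data replicated hourly.
print(gaps("C1", timedelta(hours=6), timedelta(hours=1)))
```

Here the replication interval stands in for worst-case data loss, which is itself an assumption; a real assessment would measure what can actually be restored.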
Application Categorization
• CIO/Business owners re-categorized the application list
• Result: “information systems” changed criticality
• 2008: C1 – 5 services; C2 – None
• 2012: C1 – 5 different services; C2 – 7 services
C1/C2 Applications
• Gap Analysis
• Table-top recovery scenario
• “That is still in service, why?”, “That does what? When did that start?”
• Documentation, documentation, documentation
• Update deployment and SOP for services
Example Normal Service
Example Recovered Service
DR of Supporting Infrastructure
• The Business focuses on applications
• Document infrastructure service dependencies
• Determine the services required by Infrastructure groups to
complete a recovery
• e.g. monitoring, secure access, system inventory, recovery documentation, etc.
• Some services are considered Category 0 services
• i.e. storage, network, and power
• Tabletop recovery exercise
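Once infrastructure dependencies are documented, a recovery sequence can be derived from them mechanically. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the dependency map is a hypothetical example, and "student portal" is an invented application name:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be running first.
DEPENDS_ON = {
    "power": set(),
    "network": {"power"},
    "storage": {"power", "network"},
    "secure access": {"network"},
    "monitoring": {"network", "storage"},
    "student portal": {"storage", "secure access"},
}


def recovery_order(depends_on):
    """Return services in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


order = recovery_order(DEPENDS_ON)
print(order)
```

In this example the services with no unmet dependencies (power, then network and storage) surface first, which matches the idea of Category 0 services. A circular dependency raises `graphlib.CycleError`, which is itself a useful finding for a tabletop exercise.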
Lessons Learned
• RTOs and RPOs are set by the business, not IT
• IT helps in getting to the real requirement
• Services evolve and RTOs change
• Infrastructure capabilities change
• Identify key technologies
• Continual Improvement
• DR is big. Do it in small chunks
• DR is not Backup
• DR Planning can be used in more than just DR
Next Steps
• Review the DR plan for remaining services
• Ask the DR question up front
• Disaster RTO/RPO versus Operational RTO/RPO
• Bring staff space and equipment into scope
Questions
Chris Russell, Director of Information and Communication Technology Infrastructure, York University
russel@yorku.ca
Rick Smith, Lead Architect, York University
rvsmith@yorku.ca