Disaster Recovery Planning

Questions to the Audience

What is an IT Disaster
• What is an IT Disaster?
• ‘Disaster’: the unplanned interruption of normal business processes resulting from the interruption of the IT infrastructure components used to support them.

Common Types (1):
✔ Power outages 28%
✔ Storm Damage 12%
✔ Floods 10%
✔ Hardware Error 8%
• Physical Attack 7%
• Hurricanes 6%
✔ Fires 6%
✔ Software Error 5%
✔ Power surge/spike 5%
• Earthquake 5%

(1) Healthcare Information and Management Systems Society (himss.org)

Business Continuity versus Disaster Recovery
• These are not the same thing!
• Business Continuity (BC): Considers the academic, research and business functioning of the institution as a whole. Includes risk assessment, and plans for functional units and business processes. Potentially wider variety of scenarios to consider.
• Disaster Recovery (DR): IT activities to enable recovery to an acceptable condition after a disaster.
BC includes DR. DR requires guidance from BC to direct priorities and set scope.

What is the York DR Plan?
Review 2008 Plan
• Project start: January 2003
• Sponsored by CIO and VP Finance and Administration
• Scope
    • Systems: “key information systems”
    • Scenarios: “localized disaster or failure”
• Intended to be a multi-phase, multi-year project

What is the York DR Plan?
• Engaged functional unit leaders and IT support areas
• Asked to identify maximum tolerable outage and data loss
• Surprise: >50% of business processes ranked “critical”
• Reality check based on observed impacts from lesser-scale outages
• VP and AVP consultations were the final step to confirm criticality

Risk Management
[Figure: Cost of Incidents and Cost of Countermeasures plotted against Degree of Assurance (Low to High), with the Optimal Cost/Benefit point marked]

What is the York DR Plan?
• DR Threat Assessment
    • Proximity to heavy industry: oil depot across street
    • Freight train corridor (chemical spill 1980)
    • Near intersection of major highways (400 & 407)
    • York main campus on flight path of two airports
    • Main data centre in basement of old building with UPS but no generator
    • High pedestrian traffic (Science Library and washrooms upstairs) directly overhead
• Worst case scenario chosen:
    • Loss of building containing main data centre

What is the York DR Plan?
• By 2008
    • Secured Telus site for secondary site
    • Identified 4 categories of information systems
        • Recovery Point Objectives (RPO)
        • Recovery Time Objectives (RTO)
        • Strategy defined on style of recovery for each
        • Business owners classified which systems belong in which categories
    • Large infrastructure upgrades identified to meet the RTO/RPOs
    • Planned to annually refresh DR plan

2012 DRP Refresh
• It’s been 4 years
    • Big upgrade on storage and core network
    • Acquisition of second on-campus data centre
    • IT department merger
    • And …

Goals for 2012 Refresh

2012 Goals
• Focus on C1 business applications as of 2012
• IT staff / office space not in scope
• Scenario is the loss of a single data centre (not both)
• Validate the categorization of “information systems”
• Gap Analysis for C1 information systems
• Table-top recovery scenario for supporting infrastructure

Methodology
• Produce the complete UIT-supported
  application inventory
    • How hard can this be?
    • No single list existed
• Categorize Applications and focus on 2012 C1 Applications
• Gap Analysis and Planning
• Tabletop Recovery of supporting infrastructure

DR Categories
Categories and associated RTOs/RPOs

Category    Summary                                                                      Recovery Time Objective (RTO)  Recovery Point Objective (RPO)
Category 1  Vital Communications and Emergency Services                                  <= 4 hours                     <= 15 minutes
Category 2  Critical Customer / Partner Interfaces and Emergency Systems                 <= 48 hours                    <= 15 minutes
Category 3  Critical Customer / Partner Interfaces and Emergency Systems                 <= 7 days                      <= 24 hours
Category 4  Critical Internal Departmental Services and Non-Critical Customer Interface  <= 48 hours                    <= 14 days

Application Categorization
• CIO/Business owners re-categorized the application list
• Result:
    • “information systems” changed criticality
    • 2008: C1 = 5 services; C2 = none
    • 2012: C1 = 5 different services; C2 = 7 services

C1/C2 Applications
• Gap Analysis
• Table-top recovery scenario
    • “That is still in service, why?”, “That does what? When did that start?”
• Documentation, documentation, documentation
• Update deployment and SOP for services

Example Normal Service

Example Recovered Service

DR of Supporting Infrastructure
• The Business focuses on applications
• Document infrastructure service dependencies
• Determine the services required by Infrastructure groups to complete a recovery
    • e.g. monitoring, secure access, system inventory, recovery documentation, etc.
• Some services are considered Category 0 services
    • e.g. storage, network, and power
• Tabletop recovery exercise

Lessons Learned
• RTOs and RPOs are set by the business, not IT
    • IT helps in getting to the real requirement
• Services evolve and RTOs change
• Infrastructure capabilities change
• Identify key technologies
• Continual Improvement
• DR is big ...
  Do it in small chunks
• DR is not Backup
• DR Planning can be used in more than just DR

Next Steps
• Review the DR plan for remaining services
• Ask the DR question up front
• Disaster RTO/RPO versus Operational RTO/RPO
• Bring staff space and equipment into scope

Questions

Chris Russell
Director of Information and Communication Technology Infrastructure, York University
russel@yorku.ca

Rick Smith
Lead Architect, York University
rvsmith@yorku.ca
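
Sketch: Gap Analysis against Category Targets
The gap analysis step in the methodology compares what each application can achieve today with the RTO/RPO target for its category. The Python below is a minimal sketch of that comparison, not the process York actually used. It assumes the Category 1 and Category 2 targets from the DR Categories table. The application names and their achievable recovery figures are hypothetical placeholders, as are the names TARGETS, Application and find_gaps.

```python
from dataclasses import dataclass
from datetime import timedelta

# Targets copied from the DR Categories table (Category 1 and Category 2 only).
TARGETS = {
    "C1": {"rto": timedelta(hours=4), "rpo": timedelta(minutes=15)},
    "C2": {"rto": timedelta(hours=48), "rpo": timedelta(minutes=15)},
}


@dataclass
class Application:
    name: str                  # hypothetical application name
    category: str              # "C1" or "C2", assigned by the business owner
    achievable_rto: timedelta  # how quickly service could be restored today
    achievable_rpo: timedelta  # how much data could be lost today


def find_gaps(apps):
    """Return (name, objective, shortfall) for every target an application misses."""
    gaps = []
    for app in apps:
        target = TARGETS[app.category]
        if app.achievable_rto > target["rto"]:
            gaps.append((app.name, "RTO", app.achievable_rto - target["rto"]))
        if app.achievable_rpo > target["rpo"]:
            gaps.append((app.name, "RPO", app.achievable_rpo - target["rpo"]))
    return gaps


# Hypothetical inventory, for illustration only (not York data).
inventory = [
    Application("email", "C1", timedelta(hours=2), timedelta(minutes=5)),
    Application("student-portal", "C1", timedelta(hours=12), timedelta(minutes=15)),
    Application("payroll", "C2", timedelta(hours=24), timedelta(hours=24)),
]

for name, objective, shortfall in find_gaps(inventory):
    print(f"{name}: misses {objective} target by {shortfall}")
```

Keeping the targets in one table keyed by category means a re-categorization by the business owners changes only the category field on the application, which fits the lesson that RTOs and RPOs are set by the business, not IT.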