Disaster Recovery Costs and Impact on Healthcare Operations

Tom Walsh, CISSP
Tom Walsh Consulting, LLC
Overland Park, KS
Copyright © 2009, Tom Walsh Consulting, LLC

Things to Consider...
• When regional disasters strike, the public expects healthcare services to be available
• Healthcare is a critical component of our nation’s infrastructure
• We cannot always control events; the only thing we can control is our reactions

Tom Walsh
• Certified Information Systems Security Professional (CISSP)
• Passed the exam for a Certified Business Continuity Professional
• Co-authored three books on security
• Invited speaker at national conferences
• Information Security Officer for San Antonio Community Hospital in Upland, CA (Outsourced)
• A little nerdy, but overall, a nice guy

Objectives
• Using a Business Impact Analysis (BIA) to determine patient care and business operations needs
• Discussing key concepts such as Recovery Time Objective (RTO) and Recovery Point Objective (RPO) and their roles in determining an appropriate disaster recovery strategy

Objectives (2)
• Examining the pros and cons of the various strategies
• Comparing recovery costs versus recovery time
• Selecting recovery strategies that support recovery time
• Explaining potential solutions such as virtualization for reducing costs
• Reviewing some lessons learned (if time permits)

Terminology
• Business Continuity Plan (BCP) – The larger umbrella plan that covers multiple plans; the overall goal is to ensure the business can continue to operate in the aftermath of any problem or disastrous event
  – A business continuity plan includes all departments
Note: Government agencies often use the term Continuity of Operations Plan (COOP) or Contingency Plan instead of business continuity plan

Terminology
• Disaster Recovery Plan (DRP) – Applies to major, usually catastrophic, events that deny access to the normal facility for an extended period (tends to focus on technology in a Data Center)
• Contingency Plan – Focuses on sustaining a business function during a temporary disruption
• Data Backup Plan – Outlines how backups of systems are performed, frequency of backups, rotation of backups, and storage of backups (on-site and off-site backups)

Terminology
• Business Impact Analysis (BIA) – An exercise that determines the impact of losing the support of any resource to an organization, establishes the escalation of that loss over time, identifies the minimum resources needed to recover and the Recovery Time Objective (RTO), and prioritizes the recovery of processes and supporting systems

Terminology
• Recovery Time Objective (RTO) – The time within which business functions or application systems must be restored to acceptable levels of operational capacity
• Recovery Point Objective (RPO) – The maximum tolerable loss of information, as determined by the frequency of the backups
  – Example: If daily backups are made, then the RPO = 24 hours, which is the maximum loss of data (unless there are periodic snapshots of memory, transactional logs, or journaling)

Terminology
• Disaster – A calamitous event that creates an inability on an organization’s part to provide the critical business functions for some predetermined period of time and which results in great damage or loss
Note: The time factor which determines whether a service interruption is an inconvenience or a disaster will vary from organization to organization
Healthcare executives should move beyond “What if” to questions of “Are we prepared?”

Interruptions, Disasters, & Recovery
[Figure: timeline of recovery time with the RTO marked]
• Recovery time < RTO = Problem Event: Contingency Plan or Downtime Procedures
• Recovery time > RTO = Disaster: Activation of the Disaster Recovery Plan
The Recovery Time Objective (RTO) is determined by the Business Impact Analysis

Terminology
• Data Owner – (a.k.a. Information Owner) The directors or senior managers who are responsible for the functional areas or business units that depend on information systems to run their operations
• Interdependencies – Relying upon input, assistance, support, or interaction between business units in order for each to complete its mission and objectives

Terminology
Instead of…             Try using…
Redundancy              High availability, Resiliency, or Failover systems
Backup Data Center      Recovery Site or Alternate Data Center
Return on Investment    Loss avoidance
Unimportant             Less critical

Business Continuity Plan
The objectives of a business continuity plan (BCP) are to:
  – Protect human life
  – Maintain services to patients
  – Lessen the overall impacts by defining strategies and predetermined responses
  – Create a systematic approach to recover and restore systems
  – Comply with applicable laws and regulations

It’s Not Just “A Plan”
Business Continuity and Disaster Recovery Planning focuses on three things:
#1 People
#2 Data
#3 Information Systems

Key Steps in BCP and DRP
• Define the scope of the project
• Conduct a risk analysis
• Conduct a Business Impact Analysis (BIA)
• Research and recommend strategies
• Write the plan
• Educate staff on the plan
• Exercise and test the plan
• Revise and maintain the plan

Conduct a Business Impact Analysis
• Without a Business Impact Analysis (BIA), the organization runs the risk of either overcommitting or underestimating the resources required to respond to a disaster or business disruption
• The BIA is the foundation for Business Continuity and Disaster Recovery Planning

BIA Objectives
1. Identify the critical resources required to minimally maintain business operations in the wake of a disastrous event
2. Estimate the operational and financial impacts due to the loss of an information resource as it relates to the functioning of the organization

BIA Objectives
3. Determine business recovery objectives and assumptions
4. Establish an order or priority for restoring business functions and the information resources that support those functions
5. Facilitate planning strategies

BIA Questions
• What is the impact to patient care?
  – Identify key patient care departments
• How much downtime, loss of revenue, and loss of data can each department or business unit sustain?
• What are the IT systems that support those mission-critical operations?

BIA Questions
• If this business unit generates revenue, then on average, what is the hourly revenue generated?
• How is data or information received and processed by those departments?
• What are the dependencies?
  – Key employees, vendors, workflows, supply chain, etc.
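The BIA questions above, together with the earlier RTO and RPO definitions, lend themselves to a small worked example. The following is only a minimal sketch under stated assumptions: the class and function names (BiaEntry, estimated_revenue_loss, and so on) are hypothetical, the department figures are invented for illustration, and treating an outage exactly equal to the RTO as a problem event rather than a disaster is an assumption the slides do not settle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BiaEntry:
    """One department's answers to the BIA questions (illustrative values only)."""
    department: str
    hourly_revenue: float          # average revenue generated per hour, in dollars
    rto_hours: float               # Recovery Time Objective, in hours
    backup_interval_hours: float   # time between backups, in hours


def estimated_revenue_loss(entry: BiaEntry, outage_hours: float) -> float:
    """Rough revenue impact: hourly revenue multiplied by the length of the outage."""
    return entry.hourly_revenue * outage_hours


def worst_case_data_loss_hours(entry: BiaEntry) -> float:
    """With periodic backups only (no logs or journaling), the worst-case
    data loss equals the backup interval, e.g. daily backups -> 24 hours."""
    return entry.backup_interval_hours


def classify_outage(entry: BiaEntry, outage_hours: float) -> str:
    """Compare the expected outage with the RTO.

    Assumption: an outage exactly equal to the RTO is still a problem event.
    """
    if outage_hours <= entry.rto_hours:
        return "contingency plan or downtime procedures"
    return "activate disaster recovery plan"


def restoration_order(entries: List[BiaEntry]) -> List[BiaEntry]:
    """Order departments so that the shortest RTO is restored first."""
    return sorted(entries, key=lambda e: e.rto_hours)


# Hypothetical departments and figures, not taken from the presentation.
entries = [
    BiaEntry("Laboratory", 4000.0, 8.0, 24.0),
    BiaEntry("Emergency Department", 10000.0, 2.0, 24.0),
]

for entry in restoration_order(entries):
    print(
        entry.department,
        estimated_revenue_loss(entry, 6.0),
        worst_case_data_loss_hours(entry),
        classify_outage(entry, 6.0),
    )
```

Revenue is only one of the impacts a BIA weighs; impact to patient care would normally take priority over the revenue figure when setting restoration order, so sorting on RTO alone is a simplification.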
Possible Impacts
• Inability to treat patients
• Financial losses and lost revenue
• Damage to the organization's credibility and reputation
• Penalties or fines for noncompliance
• Litigation
  – Executives and officers are potentially culpable for not allocating the necessary resources to ensure the continuity of business (Duty of Care)

Analysis of BIA Data
• Determine the Recovery Point Objective (RPO) for each department or business unit
  – Assess any gaps with the current backup plan
• Determine the Recovery Time Objective (RTO) for each department or business unit
  – Determine the order in which information systems are needed (restoration priority)

Analysis of BIA Data (2)
• Identify the vital records necessary for running the business
  – Format and location of the records
• Determine existing technologies for supporting high availability and recovery
• Assess the gap between current recovery capabilities and needed capabilities to sustain the business

Analysis of BIA Data (3)
• List departments and business units ordered by their recovery time objective (RTO) and/or impact to patient care
• Identify gaps between current recovery capability and needed recovery capability
• Validate the BIA with key stakeholders

New Threat – Pandemic Flu

Research Recovery Strategies
• Determine how gaps between current recovery capability and recovery needs (RTO and RPO) will be handled
• Research potential recovery strategies to meet the overall RTO
• Create a cost-benefit analysis
• Make recommendations for business continuity and disaster recovery

Strategy – Alternate Sites
Hot site
  Advantages:
  – Shortest recovery time
  – Equipment is supplied
  – Easy to test backups and recovery plans
  Disadvantages:
  – Most expensive
  – Short-term use of facility
  – Facility may not always be available
Warm site
  Advantages:
  – Moderately priced
  – Basic infrastructure with some equipment
  Disadvantages:
  – Not easy to test plans
  – Facility may not always be available
Cold site
  Advantages:
  – Least expensive
  – Basic infrastructure
  – Can usually rent the space for a longer period of time
  Disadvantages:
  – Longest recovery time
  – No equipment is supplied; it must be ordered, delivered, and installed
  – No way to test

Recovery Time versus Strategy
[Figure]

Costs versus Recovery Time
[Figure]
Source: DRI International DRP-501 Business Continuity Planning Review

Recovery Site Location
• Too close – It may be affected by the same regional disaster
• Too far away
  – May have difficulty getting employees to leave their homes and families during a disaster to work at an alternate or recovery site
  – Ability to leave the disaster area
  – Costs associated with travel and temporary living expenses

Strategy – Virtualization
• Virtualization – A condition without boundaries or constraints
• Virtual machines – Allow a single server to run multiple operating systems (Windows, Linux, NetWare, etc.) and applications
• Originally developed by IBM in the 1960s for the mainframe operating system
• Breaks the “one server, one application” standard by decoupling the physical hardware from the operating system

Virtualization
[Figure: comparison of two server models]
• One server per operating system and application
• Virtual machine: one server, multiple operating systems and applications

Virtualization – Benefits
• Zero downtime
  – Within seconds, systems can be moved from one physical server to another
• Ease of managing failover systems
  – Servers are treated as a uniform pool
  – Any spare server could be the recovery target for a virtual machine
• Virtual machine environment is saved as a single file
  – Easier to back up, move, and copy

Virtualization – Benefits
• Owning and maintaining fewer servers
  – Making high availability more cost-effective
  – Curbing the proliferation of servers
• Maintenance budget
  – Reduces hardware, power, cooling, and floor space requirements
• Data does not leak across virtual machines

Findings and Recommendations
• Present report of findings and recommendations at a meeting with data owners and senior leadership
• Obtain an agreement on recovery strategies
• Conclude the BIA portion of the project
Note: Providing realistic cost estimates may be difficult given the many variables and vendors’ unwillingness to disclose prices

Lessons Learned from Katrina
Major challenges:
• Communications outages made it difficult to locate missing personnel
• Access to and reliable transportation into restricted areas was not always available
• Lack of electrical power or fuel for generators rendered computer systems inoperable

Lessons Learned from Katrina
Major challenges:
• Obtaining replacement supplies as initial stocks are exhausted can be difficult
  – Diesel fuel for generators
  – Food and water
• May need large amounts of cash to pay for critical supplies and services
• Mail service was interrupted for months in some areas

Summary
• Business continuity and disaster recovery planning should involve the entire organization (It is more than the recovery of the technology; it is the recovery of the business)
• A business impact analysis is the foundation for planning
• Select strategies that support recovery objectives which meet the needs of the organization (RPO & RTO)

References
• DRI International, DRP-501 Business Continuity Planning Review
• FFIEC, Lessons Learned From Hurricane Katrina: Preparing Your Institution for a Catastrophic Event
• NIST Special Publication 800-34, Contingency Planning Guide for Information Technology Systems
• NIST Special Publication 800-30, Risk Management Guide for Information Technology Systems

Tom Walsh, CISSP
twalshconsulting@aol.com
913-696-1573