Business Continuity

Chao-Hsien Chu, Ph.D.
College of Information Sciences and Technology
The Pennsylvania State University
University Park, PA 16802
chu@ist.psu.edu

IST 515

Organizational Security
• Policy
• Organizational Design
• Asset Classification and Control
• Access Control
• Compliance
• Personnel Security
• Awareness Education
• System Development and Maintenance
• Physical and Environmental Security
• Communications & Operations Mgmt.
• Business Continuity Management

What Can Disrupt Your Business?
• Fire
• Flood
• Terrorism
• Hackers
• Power
• IT
(Image: http://www.thehindubusinessline.com/mentor/2005/08/15/images/2005081500291101.jpg)

Objectives
• Describe what a disaster is.
• Describe the differences between natural and man-made disasters.
• Describe what business continuity is.
• Discuss why continuity planning is needed.
• Discuss the prime elements of BCP.
• Develop and document project scope and plan.
• Conduct Business Impact Analysis (BIA).
• Implement and maintain the plan.
• Conduct training in terms of BCP.
• Describe the differences between BCP and DRP.

Readings
• Hansche, S., Berti, J. and Hare, C., Official (ISC)2 Guide to the CISSP Exam, Auerbach, 2004. Chapter 9 (Required).
• Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J., and Thomas, R., Contingency Planning Guide for Information Technology Systems, NIST Special Publication 800-34, June 2002.
• Wikipedia, Business Continuity. http://en.wikipedia.org/wiki/Business_continuity
• NFPA 1600, Standard on Disaster/Emergency Management and Business Continuity Programs, 2007. http://www.nfpa.org/assets/files/pdf/nfpa1600.pdf

What is a Disaster?
“A business disaster is that point in time after the ‘cause’ when you cannot provide your customers and users with the minimum level of services they need and expect.”

Disasters
• A disaster is any sudden, unplanned calamitous event that brings about great damage or loss.
In the business environment, it is any event creating an inability for the organization to support critical business functions for some predetermined period of time.
• If it harms critical business processes, it may be a disaster.
• Time-based definition – how long can the business stand the pain?
• Probability of occurrence.
• Anything that diminishes or destroys normal data processing capabilities.

Types of Disasters
• Natural: Earthquakes, floods, storms (e.g., thunder, hail, lightning, electrical, snow, winter ice), tornadoes, hurricanes, volcanic eruptions, natural fires
• System/technical: Hardware/software outages, programming/system errors
• Supply Systems: Communication outages, power distribution (e.g., blackouts), burst pipes
• Man-made: Bombings, explosions, disgruntled employees, fires, purposeful destruction, aircraft crashes, hazardous/toxic spills, chemical contamination, malicious code
• Political Events: Terrorist attacks, espionage, riots or civil disturbances, strikes

Business Continuity Scenarios
• Large-scale natural disaster (hurricanes, earthquakes)
• Power outage caused by a winter storm
• Malfunctioning software
• Server malfunction
• Failed hard drive
• Office fire
• Computer virus outbreak
• Terrorist attack
• Pandemic disease outbreak

Why Doesn’t Everyone Plan?
• The human element.
• The “it’s not going to happen to me” view or philosophy.
• We have a tendency to view concerns from a “life span” and personal experience aspect.
  – It hasn’t happened yet…
  – Not on the manager’s list of goals
  – We’ll get to it
  – Looks too BIG! Where do we start?

Contingency Planning and Risk Management
[Diagram labels: Contingency Planning; Risk Management; Emergency Events; Contingency Plan Execution; Security Control Implementation; Plan; Response; Recover]

Business Continuity (1)
Business continuity is the activity performed by an organization to ensure that critical business functions will be available to customers, suppliers, regulators, and other entities that must have access to those functions.
These activities include many daily chores such as project management, system backups, change control, and help desk. Business Continuity is not something implemented at the time of a disaster; Business Continuity refers to those activities performed daily to maintain service, consistency, and recoverability.

Business Continuity (2)
Organizations write several types of plans, such as the Contingency Plan, Business Continuity Plan (BCP), Business Resumption Plan (BRP), or Disaster Recovery Plan (DRP), to ensure the availability of critical information system resources in the event of an unexpected network interruption or a disaster.

Business Continuity Plan (BCP)

Scope of Business Continuity
• BCP and DRP address the preparation, processes, and practices required to ensure the preservation of the business in the face of major disruptions to normal business operations.
• BCP and DRP involve the identification, selection, implementation, testing, and updating of processes and specific actions necessary to prudently protect critical business processes from the effects of major system and network disruptions and to ensure the timely restoration of business operations if significant disruptions occur.
Business Continuity & Disaster Recovery
Business Continuity and Disaster Recovery are a part of Business Continuity Management.
[Timeline figure. Labels: Incident Assessment; Within Hours; Declare a disaster; Travel to DR; Recovery of IT completely; End of disaster recovery; Restore Information Technology; Back to normal operations; Stabilization of production operations; Cleanup the mess; Plan for return to primary site; Return to primary site]
[Figure: plan types under Business Continuity Management: Business Continuity Plans; Disaster Recovery Plans; Crisis Management Plans; Normal problem management; Emergency response plans; Business Resumption Plans. Copyright 2006 VCPI]

Scope of Business Continuity
• Continuation of critical business processes when a disaster destroys data processing capabilities
• Used to be just the data center
• Now includes:
  - Distributed operations
  - Personnel, networks, power
  - All aspects of the IT environment
BCM - Not just an IT issue!

Business Continuity Planning
IT contingency planning refers to a coordinated strategy involving plans, procedures, and technical measures that enable the recovery of IT systems, operations, and data after a disruption. Contingency planning generally includes one or more of the following approaches to restore disrupted IT services:
• Restoring IT operations at an alternate location
• Recovering IT operations using alternate equipment
• Performing some or all of the affected business processes using non-IT (manual) means (typically acceptable for only short-term disruptions).

Business Continuity Planning
• To prevent interruptions to normal business activity
• To protect critical business processes from natural and man-made failures or disasters and the resultant loss of capital due to the unavailability of normal business processes
• A strategy to minimize the effect of disturbances and to allow for resumption of business processes

Why Continuity Planning (1)
• Reality of Terrorist Attack. E.g., September 11 attack.
• Natural Disasters.
E.g., Hurricane Katrina, fire, flood, hurricane, tornado, earthquake, volcanoes.
• Economic Frauds. E.g., the recent corporate corruption cases of WorldCom, Enron, HealthSouth, etc.
• Internal and External Audit Oversight.

Why Continuity Planning (2)
• Legislative and Regulatory Requirements:
  – HIPAA (Health Insurance Portability and Accountability Act). 1996.
  – SOX (Sarbanes-Oxley Act). 2002.
  – GLB (Gramm-Leach-Bliley Act). 1999.
  – The Patriot Act. 2001.

Why Have a Business Plan?
According to research data kept at the National Archives & Records Administration in Washington, DC:
• Nearly 90% of all small businesses don't have a continuity plan in place.
• Only 43% of businesses suffering a disaster ever recover sufficiently to resume business.
• Of those that do reopen, only 29% are still operating two years later.
• 93% of businesses that lost their data center for more than 9 days filed for bankruptcy within one year of the disaster.
• 50% of businesses that found themselves without data management for more than 9 days filed for bankruptcy immediately.

Success, Recovery or Failure?
[Chart labels: Fully tested effective BCM; No BCM – lucky escape; No BCM – usual outcome; curves A, B, C; axis: Time; marker: Critical Recovery Point]

Broad BCP Objectives
• Create, document, test, and update a plan that will:
  – Allow timely recovery of critical business operations
  – Minimize loss
  – Meet legal and regulatory requirements
• Availability – the main focus
• Confidentiality – still important
• Integrity – still important

Categories of Potential Loss
• Revenue Loss.
• Extra Expense.
• Compromised Customer Service.
• Embarrassment or Loss of Confidence Impact.
• Hidden Benefits of Continuity Planning.

BCP Cycle
The Business Continuity Management Cycle
1. Understanding Your Business
2. Business Continuity Strategies
3. Develop and Implement BCM Plans & Solution(s)
4. Building & Embedding a BCM Culture
5. Exercising, Maintenance and Audit
(Center of the cycle: BCM Programme Management)

The Five BCP Phases
1. Project Management & Initiation
2. Business Impact Analysis (BIA)
3. Review Recovery Strategies
4. Plan Design & Development
5. Testing, Maintenance, Awareness, and Training

1. Project Management & Initiation
• Establish need (risk analysis).
• Get management support.
• Identify strategic internal and external resources.
• Establish team (functional, technical, BCC – Business Continuity Coordinator).
• Create work plan (scope, goals, methods, timeline).
• Prepare and present an initial report to management.
• Obtain management approval to proceed.
• Develop formal meeting schedules.

BCP Team Members
• Senior management.
• BCP planner/coordinator.
• Recovery team members.
• Business unit representatives.
• Crisis management team.
• User community.
• Systems and network experts.
• Information security department.
• Legal representatives.

2. Business Impact Analysis (BIA)
• The BIA is a management-level analysis that identifies the impact should a potential data processing outage occur.
• Goal: obtain formal agreement with senior management on the MTD for each time-critical business resource.
• MTD – Maximum tolerable downtime, also known as MAO (Maximum Allowable Outage).
• Quantifies loss due to business outage (financial, extra cost of recovery, embarrassment).
• Does not consider what types of incidents cause a disruption; it only identifies the consequences.

Purpose of BIA
• Provide written documentation to understand the impact associated with possible outages.
• Identify an organization’s business functions and determine how critical those functions are to the organization.
• Identify any concerns that staff or management may have.
• Prioritize critical systems.
• Analyze the impact of an outage.
• Determine recovery windows for each business function.

BIA Procedure
• Choose information gathering methods (surveys, interviews, software tools).
• Select interviewees.
• Customize questionnaire.
• Analyze information.
• Identify time-critical business functions.
• Assign maximum tolerable downtimes (MTDs).
• Rank critical business functions by MTDs.
• Document, prepare, and report recommendations.
• Obtain management approval.

Critical Business Function Categories
Item                  Required Recovery Time
Nonessential          30 days
Normal                7 days
Important             72 hours
Urgent                24 hours
Critical/Essential    1 – 4 hours

Critical Business Function Categories
Item                  Required Recovery Time
Very High (1)         0 – 12 hours
High (2)              12 – 24 hours
Moderate (3)          24 – 72 hours
Low (4)               > 72 hours

Example of BIA
• An order department might list the following tasks and recovery time periods:
  – Receive orders electronically via e-commerce Web site: Critical/Essential
  – Receive orders by facsimile machine: Critical/Essential
  – Receive orders by phone system: Critical/Essential
  – Input orders into ordering system: Important
  – Process orders: Important
  – Issue and mail invoices: Important

Sample BIA Question Topics
• Business function.
• Date of interview.
• Contact name.
• Business process.
• Financial impacts.
• Operational impacts.
• Legal obligations.
• Damage to reputation.
• Technological dependence.
• Interdependencies.
• Existing BCP measures.
• Alternate processing options.
• Customized options:
  - Financial impact
  - Operational impact
  - Legal obligation
  - Damage to reputation

4. Plan Development and Implementation
1. Determine management concerns and priorities.
2. Determine planning scope.
3. Establish outage assumptions.
4. Define prevention strategies for risk management, physical security, information security, insurance coverage, and how to mitigate the emergency.
5. Identify resumption strategies for mission-critical applications and systems at alternate sites.
6. Identify recovery strategies for non-mission-critical applications and systems at alternate sites and for relocating the emergency operations center/command center to the recovery site.

Plan Development and Implementation
7. Develop service function recovery plans, including information processing, telecommunications, etc.
8. Develop business function recovery plans and procedures.
9. Develop facility recovery plans.
10. Identify the response procedures.
11. Gather data required for plan completion.
12. Review and outline how the organization will interface with external groups. (Communication)
13. Review and outline how the organization will cope with other complications beyond the actual disaster.

BCP Coverage
• General Introduction and Overview.
• Policy Statement.
• Functional Areas Priorities.
• Critical Resources / Non-critical Resources.
• Procedural Considerations.
• Emergency and Evacuation Procedures.
• Recovery Teams.
• Recovery Processes.
• Emergency Operations Center/Command Center.
• Facility Considerations.
• Inventory Considerations.

BCP Coverage
• Equipment Considerations.
• Communication Considerations.
• Documentation Considerations.
• Data/Software Considerations.
• Transportation Considerations.
• Supporting Equipment.
• Responding to the Disaster.
• Resume Critical Business Functions.
• Resume Non-Critical Business Functions.
• Planning for Return to the Primary Site (Restoration Operations).
• Interfacing with External Groups.

Continuity Plans Components
• Awareness of Roles and Responsibilities – Who will do what? Employees and staff are critical. A pandemic is an extreme example of a disaster where employee resources will be very limited!
• Defined recovery time objectives
• Risk Management to identify & reduce risks
• Alternate Processes (telecommuting, distance learning)
• Alternate recovery locations
• Off-site storage of critical media and non-media items
• Written plans, reviewed & updated regularly
• Frequent plan exercises

Business Continuity Plans must be useful
Make sure the plans that protect each of us are more than ……
Successful Business Continuity Planning helps ensure that employees and the interests of owners and customers are protected.

5. Plan Testing
• Until it’s tested, you don’t have a plan.
• Types of testing:
  - Structured walk-through
  - Checklist
  - Simulation
  - Parallel
  - Full interruption

Goals of Plan Testing
• Ensure the understanding and workability of documented recovery procedures.
• Acquaint test participants and recovery teams with their roles in the event of a disaster.
• Verify that recovery strategies are viable.
• Train team leaders and members in the procedures of executing the continuity plan.
• Identify flaws and oversights in plan procedures and strategies.
• Obtain information about recovery strategy implementation.

Goals of Plan Testing
• Demonstrate that the output performance of backup systems and networks is consistent with that of production systems and networks.
• Adapt and update existing plans to encompass new requirements.
• Test all components of the plan, including hardware, software, personnel, data and voice communications, procedures, supplies and forms, documentation, transportation, utilities, alternate site processing, etc.

5. Plan Maintenance
• Resolve all problems/deficiencies found during testing.
• Implement change management.
• Audit and address audit findings.
• Build maintenance procedures into the organization's operations.
• Annual review of plan.
• Centralize responsibility for updates.
• Report updates regularly to team members and, if necessary, to senior management.

5. Awareness and Training
• BCP team is probably the DR team.
• All staff should be trained in the business recovery process.
• Training should cover a range of outcomes, from simple awareness of the major provisions of the plan to the ability to carry out specific procedures.
• BCP training must be ongoing.
• BCP training needs to be part of the standard on-boarding and part of the corporate culture.

BCP Training Coverage
• Describe the recovery organization (teams and functions).
• Explain the flow of recovery events and activities following a disaster.
• State team members’ responsibilities in recovery activities.
• Provide an opportunity for each recovery team to meet to develop in-depth knowledge of their responsibilities and procedures.
• Require teams to conduct drills using the actual checklists and procedures in their section of the recovery plan.
• If possible, include a plan for cross-training teams so that individuals are familiar with a variety of recovery roles and responsibilities.

Sponsorship is Key to Success
• Board of Directors or senior executives (president, vice presidents, officers) must identify BCP as a priority.
• Executives and senior managers must actively support the BCP process.
• Business Recovery Coordinators (BRCs) within business units / departments must be actively involved in developing, implementing, and exercising BC plans, and must accept ownership of their plans.

Communication is Critical
• Employees, customers, and business partners must know key information about your plan if your plan is to work.
• Plans must be periodically reviewed in team meetings and shared with new team members.
Secret plans won’t work!

Communication…
• Contact information for all team members must be current.
• Make sure employees have Emergency Wallet Cards with key phone numbers, etc.
• Plans must include:
  – Clear chains of authority
  – Clear listing of tasks, roles and responsibilities
  – DR conference lines or standing communication tools
  – Standing meetings (times, numbers)
  – Alternate meeting locations
  – Centralized communication facility (VM, web site, etc.)

Off-Site Storage is Critical!
When a facility is lost or inaccessible, all items inside are no longer available. What is needed in off-site storage if you had to recover from scratch?
• PC backup media must be stored off-site.
• Critical non-media documents and materials must be available in an off-site location, accessible by appropriate individuals or teams during a disaster or exercise.
• Key personnel must know where off-site storage items are located and where items will be shipped (hot site, Incident Command Center, or remain in off-site storage?).

Effective BCP Is Built On 7 P’s
• Programme - the total BCM strategy
• People - roles and responsibilities, H&S, awareness and education
• Processes - all organisational processes including ICT
• Premises - buildings & facilities
• Providers - supply chain including outsourcing
• Profile - brand, image and reputation
• Performance - benchmarking, evaluation & audit

Essential Elements of BCM
• Take a holistic approach
• ‘End to End’
• Effects, not causes
• Prevention, not just cure
• Culture of BCM
• Need for measurement

Common Pitfalls In BCP

Industrial and Professional Standards (1)
• BS 25999-1 (2006), Business continuity management, Part 1: Code of practice, The British Standards Institution, United Kingdom.
• HB 221 (2004), Business Continuity Management, Standards Australia, Australia.
• HB 292 (2006), A practitioners guide to business continuity management, Standards Australia, Australia.

Industrial and Professional Standards (2)
• BS ISO/IEC 17799 (2005), Code of practice for information security management, The British Standards Institution, United Kingdom.
• NFPA 1600, Standard on Disaster/Emergency Management and Business Continuity Programs, The National Fire Protection Association, United States.
• Defense Security Service (DSS), formerly known as Defense Investigative Service (DIS).
• National Institute of Standards and Technology (NIST).
Current Regulations/Standards
• US - Securities and Exchange Commission - NASD Rules 3510 & 3520 and the NYSE Rule 446
• Basel II & E-banking
• UK Civil Contingencies Act
• Sarbanes-Oxley
• UK FSA – BCM Guidance
• PAS 56 and from Summer 2006 BSI
• King II in South Africa
• Singapore - MAS BCM Standard
• Australian Standard for BCM
• US - NFPA 1600
• Europe - Netherlands, Luxembourg, Belgium, et al.

A Changing World
[Diagram labels: Corporate Governance; ISO 17799-01; CCA, Comp Act; GDPdU & GoBS; BS7799-02; NF Z 42-013; COBIT; AIPA; ITIL; King II; MAS IT Baseline China; Basel II; Sarbanes Oxley Act; APO; BS 25999-1]

BCP Planning Resource
• Contingency Planning Association of the Carolinas – www.cpaccarolinas.org
• Disaster Recovery Journal – www.drj.com/groups/drj6.html
• Disaster Recovery Institute International (DRII) – www.drii.org/
• DHS – www.ready.gov/
• FEMA – www.fema.gov/
• Institute for Business & Home Safety (IBHS) – www.ibhs.org/business_protection/
• Premier Safety Institute – www.premierinc.com/quality-safety/toolsservices/safety/index.jsp

Key Terminologies
• Business Continuity Plan (BCP): A document describing how an organization responds to an event to ensure critical business functions continue without unacceptable delay or change.
• Business Continuity Planning. Business continuity planning will help organizations:
  – Identify the impacts of potential data processing operational disruptions and data loss.
  – Formulate and implement viable recovery plans to ensure the availability of data processing support for critical applications, data, and services.
  – Develop, implement, and administer a comprehensive BCP training, testing, and maintenance program.

Key Terminologies
• Business Resumption Planning (BRP). BRP develops procedures to initiate the recovery of business operations immediately following an outage or disaster. It can also outline the procedures for returning critical business functions to the normal processing site following the interruption.
• Continuity of Operations Plan (COOP): A COOP is a document describing the procedures and capabilities to sustain an organization’s essential, strategic functions at an alternate site for up to 30 days.

Key Terminologies
• Crisis Communication Plan. A document that outlines the procedures for disseminating status reports to personnel and the public in the event of an outage or disaster.
• Cyber Incident Response Plan. This document provides the strategies to detect, respond to, and limit consequences of malicious cyber incidents. The focus is on information security responses to incidents affecting systems and/or networks.

Key Terminologies
• Disaster Recovery Planning. Disaster recovery refers to the immediate and temporary restoration of critical computing and network operations after a natural or man-made disaster within defined timeframes. An organization documents how it will respond to a disaster and resume the critical business functions within a predetermined period of time; minimize the amount of loss; and repair (or replace) the primary facility to resume data processing support.
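
Illustration: Ranking Functions by MTD
The "predetermined period of time" in the definitions above is what the BIA slides call the maximum tolerable downtime (MTD). The BIA procedure assigns an MTD to each time-critical function and then ranks functions by MTD. A minimal Python sketch of that step follows. It is only an illustration, not part of any standard or tool. The names BusinessFunction, categorize, rank_by_mtd and within_mtd are hypothetical. The category limits are read from the first Critical Business Function Categories table, treating each recovery time as an upper bound (an assumption). The MTD hours for the order-department tasks are invented, since the slides give only categories.

```python
from dataclasses import dataclass

# Upper bounds in hours, read from the first "Critical Business Function
# Categories" table. Treating each recovery time as an upper bound on the
# MTD is an assumption of this sketch.
CATEGORY_LIMITS_HOURS = [
    ("Critical/Essential", 4),
    ("Urgent", 24),
    ("Important", 72),
    ("Normal", 7 * 24),
    ("Nonessential", 30 * 24),
]


@dataclass(frozen=True)
class BusinessFunction:
    name: str
    mtd_hours: float  # maximum tolerable downtime agreed with management


def categorize(mtd_hours: float) -> str:
    """Return the first category whose limit covers the given MTD."""
    for label, limit in CATEGORY_LIMITS_HOURS:
        if mtd_hours <= limit:
            return label
    return "Nonessential"  # anything longer than 30 days


def rank_by_mtd(functions):
    """Most time-critical first (smallest MTD); ties keep input order."""
    return sorted(functions, key=lambda f: f.mtd_hours)


def within_mtd(function: BusinessFunction, outage_hours: float) -> bool:
    """True if an outage of this length stays inside the function's MTD."""
    return outage_hours <= function.mtd_hours


# Hypothetical MTD values for the order-department example; the slides give
# only the category of each task, not hours.
ORDER_DEPARTMENT = [
    BusinessFunction("Issue and mail invoices", 72),
    BusinessFunction("Receive orders via e-commerce Web site", 2),
    BusinessFunction("Input orders into ordering system", 48),
    BusinessFunction("Receive orders by phone system", 4),
]

if __name__ == "__main__":
    for f in rank_by_mtd(ORDER_DEPARTMENT):
        print(f"{f.mtd_hours:>5} h  {categorize(f.mtd_hours):<18}  {f.name}")
```

In practice the MTD values come from the BIA interviews and need management approval. The code only sorts and labels what has already been agreed, and within_mtd shows how a recovery window can be checked against that agreement.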