Business Continuity Planning

Business Continuity
Chao-Hsien Chu, Ph.D.
College of Information Sciences and Technology
The Pennsylvania State University
University Park, PA 16802
chu@ist.psu.edu
IST 515
[Figure: course topic map. Organizational Security Policy; Organizational Design; Asset Classification and Control; Access Control; Compliance; Personnel Security; Awareness Education; System Development and Maintenance; Physical and Environmental Security; Communications & Operations Mgmt.; Business Continuity Management]
What Can Disrupt Your Business ?
Fire
Flood
Terrorism
Hackers
Power
IT
(http://www.thehindubusinessline.com/mentor/2005/08/15/images/2005081500291101.jpg)
Objectives

• Define a disaster and describe the differences between natural and man-made disasters.
• Describe what business continuity is.
• Discuss why continuity planning is needed.
• Discuss the prime elements of BCP.
• Develop and document project scope and plan.
• Conduct Business Impact Analysis (BIA).
• Implement and maintain the plan.
• Conduct BCP training.
• Describe the differences between BCP and DRP.
Readings
• Hansche, S., Berti, J. and Hare, C., Official (ISC)2 Guide
to the CISSP Exam, Auerbach, 2004. Chapter 9
(Required).
• Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J.,
and Thomas, R., Contingency Planning Guide for
Information Technology Systems, NIST Special
Publication 800-34, June 2002.
• Wikipedia, Business Continuity.
http://en.wikipedia.org/wiki/Business_continuity
• NFPA 1600, Standard on Disaster/Emergency
Management and Business Continuity Programs, 2007.
http://www.nfpa.org/assets/files/pdf/nfpa1600.pdf
What is a Disaster ?
“A business disaster is that point in time
after the ‘cause’ when you cannot
provide your customers and users with
the minimum level of services they
need and expect.”
Disasters
• A disaster is any sudden, unplanned calamitous event
that brings about great damage or loss. In the business
environment, it is any event creating an inability for the
organization to support critical business functions for
some predetermined period of time.
• If it harms critical business processes, it may be a
disaster.
• Time-based definition – how long can the business stand
the pain?
• Probability of occurrence.
• Anything that diminishes or destroys normal data
processing capabilities.
Types of Disasters
Natural: Earthquakes, floods, storms (e.g., thunder, hail, lightning, electrical, snow, winter ice), tornadoes, hurricanes, volcanic eruptions, natural fires
System/technical: Hardware/software outages, programming/system errors
Supply systems: Communication outages, power distribution (e.g., blackouts), burst pipes
Man-made: Bombings, explosions, disgruntled employees, fires, purposeful destruction, aircraft crashes, hazardous/toxic spills, chemical contamination, malicious code
Political events: Terrorist attacks, espionage, riots or civil disturbances, strikes
Business Continuity Scenarios

• Large-scale natural disaster (hurricanes, earthquakes)
• Power outage caused by a winter storm
• Malfunctioning software
• Server malfunction
• Failed hard drive
• Office fire
• Computer virus outbreak
• Terrorist attack
• Pandemic disease outbreak
Why Doesn’t Everyone Plan?
• The Human element.
• The “it’s not going to happen to me” view or
philosophy.
• We have a tendency to view concerns from a “life
span” and personal experience aspect.
– It hasn’t happened yet…
– Not on the manager’s list of goals
– We’ll get to it
– Looks too BIG! Where do we start?
Contingency Planning and
Risk Management
[Figure: how contingency planning relates to risk management. Labels: Risk Management; Security Control Implementation; Emergency Events; Contingency Planning; Contingency Plan Execution; Plan; Response; Recover]
Business Continuity (1)
Business continuity is the activity performed by an
organization to ensure that critical business
functions will be available to customers, suppliers,
regulators, and other entities that must have access
to those functions. These activities include many
daily chores such as project management, system
backups, change control, and help desk. Business
Continuity is not something implemented at the time
of a disaster; Business Continuity refers to those
activities performed daily to maintain service,
consistency, and recoverability.
Business Continuity (2)
Organizations write several types of plans, such
as the Contingency Plan, Business Continuity
Plan (BCP), Business Resumption Plan (BRP),
or Disaster Recovery Plan (DRP), to ensure the
availability of critical information system
resources in the event of an unexpected network
interruption or a disaster.
Scope of Business Continuity
• BCP and DRP address the preparation,
processes, and practices required to ensure the
preservation of the business in the face of major
disruptions to normal business operations.
• BCP and DRP involve the identification,
selection, implementation, testing, and updating
of processes and specific actions necessary to
prudently protect critical business processes from
the effects of major system and network
disruptions and to ensure the timely restoration of
business operations if significant disruptions
occur.
Business Continuity & Disaster Recovery
Business Continuity and Disaster Recovery are a part of Business Continuity Management
[Figure: disaster timeline. Incident; assessment (within hours); declare a disaster; travel to DR site; restore information technology; recovery of IT completely (end of disaster recovery); stabilization of production operations; clean up the mess; plan for return to primary site; return to primary site; back to normal operations. Plans shown under Business Continuity Management: normal problem management, emergency response plans, crisis management plans, disaster recovery plans, business resumption plans, business continuity plans. Copyright 2006 VCPI]
Scope of Business Continuity
• Continuation of critical business processes
when a disaster destroys data processing
capabilities
• Used to be just the data center
• Now includes:
- Distributed operations
- Personnel, networks, power
- All aspects of the IT environment
BCM - Not just an IT issue!
Business Continuity Planning
IT contingency planning refers to a coordinated strategy
involving plans, procedures, and technical measures that
enable the recovery of IT systems, operations, and data
after a disruption. Contingency planning generally
includes one or more of the following approaches to
restore disrupted IT services:
• Restoring IT operations at an alternate location
• Recovering IT operations using alternate equipment
• Performing some or all of the affected business
processes using non-IT (manual) means (typically
acceptable for only short-term disruptions).
Business Continuity Planning
• To prevent interruptions to normal business
activity
• To protect critical business processes from
natural and man-made failures or disasters
and the resultant loss of capital due to the
unavailability of normal business processes
• A strategy to minimize the effect of
disturbances and to allow for resumption of
business processes
Why Continuity Planning (1)
• Reality of Terrorist Attack. E.g., September
11 attack.
• Natural Disasters. E.g., Hurricane Katrina,
Fire, flood, hurricane, tornado, earthquake,
volcanoes.
• Economic Frauds. E.g., the recent corporate
corruption cases of WorldCom, Enron,
HealthSouth, etc.
• Internal and External Audit Oversight.
Why Continuity Planning (2)
• Legislative and Regulatory Requirements:
– HIPAA (Health Insurance Portability and
Accountability Act). 1996.
– SOX (Sarbanes–Oxley Act). 2002.
– GLBA (Gramm-Leach-Bliley Act). 1999.
– The Patriot Act. 2001.
Why Have a Business Continuity Plan?
According to research data kept at the National Archives
& Records Administration in Washington, DC:
• Nearly 90% of all small businesses don't have a continuity
plan in place
• Only 43% of businesses suffering a disaster ever recover
sufficiently to resume business
• Of those that do reopen, only 29% are still operating two years
later
• 93% of businesses that lost their data-center for more than 9
days filed for bankruptcy within one year of the disaster.
• 50% of businesses that found themselves without data
management for more than 9 days filed for bankruptcy
immediately.
Success, Recovery or Failure?
[Figure: three outcome curves (A, B, C) plotted against time, with a critical recovery point marked. Curve labels: fully tested, effective BCM; no BCM, lucky escape; no BCM, usual outcome]
Broad BCP Objectives
• Create, document, test, and update a plan that will:
– Allow timely recovery of critical business operations
– Minimize loss
– Meet legal and regulatory requirements
• Availability – the main focus
• Confidentiality – still important
• Integrity – still important
Categories of Potential Loss
• Revenue Loss.
• Extra Expense.
• Compromised Customer Service.
• Embarrassment or Loss of Confidence Impact.
• Hidden Benefits of Continuity Planning.
BCP Cycle
[Figure: The Business Continuity Management Cycle. BCM Programme Management sits at the center, surrounded by five stages: 1. Understanding Your Business; 2. Business Continuity Strategies; 3. Develop and Implement BCM Plans & Solution(s); 4. Building & Embedding a BCM Culture; 5. Exercising, Maintenance and Audit]
The Five BCP Phases
1. Project Management & Initiation
2. Business Impact Analysis (BIA)
3. Review Recovery Strategies
4. Plan Design & Development
5. Testing, Maintenance, Awareness,
and Training
1. Project Management & Initiation
• Establish need (risk analysis).
• Get management support.
• Identify strategic internal and external resources.
• Establish team (functional, technical, BCC –
Business Continuity Coordinator).
• Create work plan (scope, goals, methods, timeline).
• Prepare and present an initial report to
management.
• Obtain management approval to proceed.
• Develop formal meeting schedules.
BCP Team Members
• Senior management.
• BCP planner/coordinator.
• Recovery team members.
• Business unit representatives.
• Crisis management team.
• User community.
• Systems and network experts.
• Information security department.
• Legal representatives.
2. Business Impact Analysis (BIA)
• The BIA is a management-level analysis that
identifies the impact should a potential data
processing outage occur.
• Goal: obtain formal agreement with senior
management on the MTD for each time-critical
business resource
• MTD – Maximum tolerable downtime, also known
as MAO (Maximum Allowable Outage).
• Quantifies loss due to business outage (financial,
extra cost of recovery, embarrassment)
• Does not consider what types of incidents cause a
disruption; it identifies only the consequences.
Purpose of BIA
• Provide written documentation to understand the
impact associated with possible outages.
• Identify an organization’s business functions and
determine how critical those functions are to the
organization.
• Identify any concerns that staff or management may
have.
• Prioritize critical systems.
• Analyze the impact of an outage.
• Determine recovery windows for each business
function.
BIA Procedure
• Choose information gathering methods (surveys,
interviews, software tools).
• Select interviewees.
• Customize questionnaire.
• Analyze information.
• Identify time-critical business functions.
• Assign maximum tolerable downtimes (MTDs).
• Rank critical business functions by MTDs.
• Document, prepare, and report recommendations.
• Obtain management approval.
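The "assign MTDs" and "rank by MTDs" steps above can be sketched in a few lines of Python. This is only an illustration: the `BusinessFunction` class, the function names, and the MTD values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessFunction:
    """One time-critical business function identified during the BIA."""
    name: str
    mtd_hours: float  # maximum tolerable downtime agreed with management


def rank_by_mtd(functions):
    """Order functions from shortest MTD (most time-critical) to longest.

    Ties are broken alphabetically by name so the ranking is repeatable.
    """
    return sorted(functions, key=lambda f: (f.mtd_hours, f.name))


# Hypothetical interview results; the names and numbers are illustrative only.
functions = [
    BusinessFunction("Payroll", 72),
    BusinessFunction("Online order intake", 4),
    BusinessFunction("Invoicing", 168),
    BusinessFunction("Customer help desk", 24),
]

ranked = rank_by_mtd(functions)
for position, function in enumerate(ranked, start=1):
    print(f"{position}. {function.name}: MTD {function.mtd_hours} h")
```

The ranked list is what would go into the report for management approval; the shortest MTDs drive the recovery priorities.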
Critical Business Function Categories
Item                  Required Recovery Time
Nonessential          30 days
Normal                7 days
Important             72 hours
Urgent                24 hours
Critical/Essential    1 – 4 hours
Critical Business Function Categories
Item             Required Recovery Time
Very High (1)    0 – 12 hours
High (2)         12 – 24 hours
Moderate (3)     24 – 72 hours
Low (4)          > 72 hours
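As a rough sketch, the four priority levels above can be expressed as a lookup function. The slide does not say which level a boundary value such as exactly 12 hours belongs to, so the code assumes each upper bound is inclusive; the function name `priority_level` is hypothetical.

```python
def priority_level(recovery_hours: float) -> str:
    """Map a required recovery time in hours to a priority level.

    Assumption: upper bounds are inclusive, so exactly 12 hours counts as
    "Very High (1)" and exactly 24 hours counts as "High (2)".
    """
    if recovery_hours < 0:
        raise ValueError("recovery time cannot be negative")
    if recovery_hours <= 12:
        return "Very High (1)"
    if recovery_hours <= 24:
        return "High (2)"
    if recovery_hours <= 72:
        return "Moderate (3)"
    return "Low (4)"


for hours in (4, 18, 48, 100):
    print(f"{hours} h: {priority_level(hours)}")
```

An organization that treats the boundaries differently would only need to change the comparison operators.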
Example of BIA
• An order department might list the following tasks
and recovery time periods:
• Receive orders electronically via e-commerce Web
site: Critical/Essential
• Receive orders by facsimile machine:
Critical/Essential
• Receive orders by phone system: Critical/Essential
• Input orders into ordering system: Important
• Process orders: Important
• Issue and mail invoices: Important
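The order department example can be turned into a simple recovery schedule by pairing each task with the recovery time of its category. The sketch below takes the category limits from the first Critical Business Function Categories table and uses the upper end of the "1 – 4 hours" range for Critical/Essential, which is an assumption; the variable and function names are hypothetical.

```python
from collections import defaultdict

# Outer recovery limit in hours for each category (first category table).
# Assumption: Critical/Essential uses the upper end of its 1 to 4 hour range.
RECOVERY_LIMIT_HOURS = {
    "Critical/Essential": 4,
    "Urgent": 24,
    "Important": 72,
    "Normal": 7 * 24,
    "Nonessential": 30 * 24,
}

# Tasks and categories exactly as listed in the order department example.
ORDER_DEPARTMENT_TASKS = {
    "Receive orders electronically via e-commerce Web site": "Critical/Essential",
    "Receive orders by facsimile machine": "Critical/Essential",
    "Receive orders by phone system": "Critical/Essential",
    "Input orders into ordering system": "Important",
    "Process orders": "Important",
    "Issue and mail invoices": "Important",
}


def recovery_schedule(tasks):
    """Group tasks by recovery limit in hours, shortest limit first."""
    grouped = defaultdict(list)
    for task, category in tasks.items():
        grouped[RECOVERY_LIMIT_HOURS[category]].append(task)
    return dict(sorted(grouped.items()))


schedule = recovery_schedule(ORDER_DEPARTMENT_TASKS)
for hours, tasks in schedule.items():
    print(f"Recover within {hours} h:")
    for task in tasks:
        print(f"  - {task}")
```

Grouping by limit makes it easy to see that order intake must be restored first, while order entry, processing, and invoicing can follow within 72 hours.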
Sample BIA Question Topics
• Business function.
• Date of interview.
• Contact name.
• Business process.
• Financial impacts.
• Operational impacts.
• Legal obligations.
• Damage to reputation.
• Technological dependence.
• Interdependencies.
• Existing BCP measures.
• Alternate processing options.
• Customized options:
- Financial impact
- Operational impact
- Legal obligation
- Damage to reputation
4. Plan Development and Implementation
1. Determine management concerns and priorities.
2. Determine planning scope.
3. Establish outage assumptions.
4. Define prevention strategies for risk management,
physical security, information security, insurance
coverage, and how to mitigate the emergency.
5. Identify resumption strategies for mission-critical
applications and systems at alternate sites.
6. Identify recovery strategies for non-mission-critical
applications and systems at alternate sites and for
relocating the emergency operations center/command
center to the recovery site.
Plan Development and Implementation
7. Develop service function recovery plans, including
information processing, telecommunications, etc.
8. Develop business function recovery plans and
procedures.
9. Develop facility recovery plans.
10. Identify the response procedures.
11. Gather data required for plan completion.
12. Review and outline how the organization will interface
with external groups. (Communication)
13. Review and outline how the organization will cope with
other complications beyond the actual disaster.
BCP Coverage
• General Introduction and Overview.
• Policy Statement.
• Functional Areas Priorities.
• Critical Resources / Non-critical Resources.
• Procedural Considerations.
• Emergency and Evacuation Procedures.
• Recovery Teams.
• Recovery Processes.
• Emergency Operations Center/Command Center.
• Facility Considerations.
• Inventory Considerations.
BCP Coverage
• Equipment Considerations.
• Communication Considerations.
• Documentation Considerations.
• Data/Software Considerations.
• Transportation Considerations.
• Supporting Equipment.
• Responding to the Disaster.
• Resume Critical Business Functions.
• Resume Non-Critical Business Functions.
• Planning for Return to the Primary Site (Restoration
Operations).
• Interfacing with External Groups.
Continuity Plans Components
• Awareness of Roles and Responsibilities
– Who will do what? Employees and staff are critical.
Pandemic is an extreme example of a disaster where
employee resources will be very limited!
• Defined recovery time objectives
• Risk Management to identify & reduce risks
• Alternate Processes (telecommuting, distance learning)
• Alternate recovery locations
• Off-site storage of critical media and non-media items
• Written plans, reviewed & updated regularly
• Frequent plan exercises
Business Continuity Plans must be useful
Make sure the plans that
protect each of us are
more than ……..
Successful Business
Continuity Planning helps
ensure that employees and
the interests of owners
and customers are
protected.
5. Plan Testing
• Until it’s tested, you don’t have a plan.
• Types of testing:
– Structured walk-through
– Checklist
– Simulation
– Parallel
– Full interruption
Goals of Plan Testing
• Ensure the understanding and workability of
documented recovery procedures.
• Acquaint test participants and recovery teams with
their roles in the event of a disaster.
• Verify that recovery strategies are viable.
• Train team leaders and members in the procedures
of executing the continuity plan.
• Identify flaws and oversights in plan procedures
and strategies.
• Obtain information about recovery strategy
implementation.
Goals of Plan Testing
• Demonstrate that the output performance of backup
systems and networks is consistent with that of
production systems and networks.
• Adapt and update existing plans to encompass new
requirements.
• Test all components of the plan, including
hardware, software, personnel, data and voice
communications, procedures, supplies and forms,
documentation, transportation, utilities, alternate
site processing, etc.
5. Plan Maintenance
• Resolve all problems/deficiencies found during
testing.
• Implement change management.
• Audit and address audit findings
• Build maintenance procedures into the organization's
operations.
• Annual review of plan
• Centralize responsibility for updates.
• Report updates regularly to team members and, if
necessary, to senior management.
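The annual review item above lends itself to a small automated check. The following is a minimal sketch that treats "annual" as 365 days, which is an assumption; the plan names, dates, and the function `plans_due_for_review` are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "annual review" means no more than 365 days between reviews.
REVIEW_INTERVAL = timedelta(days=365)


def plans_due_for_review(last_reviewed, today):
    """Return the names of plans whose last review is older than the interval."""
    return sorted(
        name
        for name, reviewed_on in last_reviewed.items()
        if today - reviewed_on > REVIEW_INTERVAL
    )


# Hypothetical review log; names and dates are illustrative only.
last_reviewed = {
    "Data center recovery plan": date(2023, 1, 15),
    "Crisis communication plan": date(2024, 3, 1),
}

due = plans_due_for_review(last_reviewed, today=date(2024, 6, 1))
print("Plans due for review:", due)
```

The person with centralized responsibility for updates could run a check like this on a schedule and report the overdue plans to the team.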
5. Awareness and Training
• BCP team is probably the DR team.
• All staff should be trained in the business
recovery process.
• Training should cover a range of outcomes,
from simple awareness of the major provisions
of the plan to the ability to carry out specific
procedures.
• BCP training must be on-going.
• BCP training needs to be part of the standard
on-boarding and part of the corporate culture.
BCP Training Coverage
• Describe the recovery organization (teams and functions).
• Explain the flow of recovery events and activities following
a disaster.
• State team members’ responsibilities in recovery activities.
• Provide an opportunity for each recovery team to meet to
develop in-depth knowledge of their responsibilities and
procedures.
• Require teams to conduct drills using the actual checklists
and procedures in their section of the recovery plan.
• If possible, include a plan for cross-training teams so those
individuals are familiar with a variety of recovery roles and
responsibilities.
Sponsorship is Key to Success
• The Board of Directors or senior executives
(president, vice presidents, officers) must
identify BCP as a priority.
• Executives and senior managers must actively
support the BCP process.
• Business Recovery Coordinators (BRCs) within
business units / departments must be actively
involved in developing, implementing, and
exercising BC plans, and must accept ownership
of their plans.
Communication is Critical
• Employees, customers,
business partners must
know key information
about your plan if your
plan is to work.
• Plans must be
periodically reviewed in
team meetings and
shared with new team
members.
Secret Plans won’t
work!
Communication…..
• Contact information for all team members must be current
• Make sure employees have Emergency Wallet Cards with
key phone numbers, etc
• Plans must include:
– Clear chains of authority
– Clear listing of tasks, roles and responsibilities
– DR conference lines or standing communication tools
– Standing meetings (times, numbers)
– Alternate meeting locations
– Centralized communication facility (VM, web site, etc…)
Off Site Storage is Critical !
When a facility is lost or inaccessible, all items inside are
no longer available. What would you need in off-site storage
if you had to recover from scratch?
• PC backup media must be stored off-site.
• Critical non-media documents and materials must be
available in an off-site location, accessible by
appropriate individuals or teams during a disaster or
exercise.
• Key personnel must know where off-site storage items
are located and where items will be shipped (hot site,
Incident Command Center, or remain in off-site storage?).
Effective BCP Is Built On 7 P’s
• Programme - the total BCM strategy
• People - roles and responsibilities, H&S, awareness and education
• Processes - all organisational processes including ICT
• Premises - buildings & facilities
• Providers - supply chain inc. outsourcing
• Profile - brand, image and reputation
• Performance - benchmarking, evaluation & audit
Essential Elements of BCM
• Take a holistic approach
• ‘End to End’
• Effects, not causes
• Prevention, not just cure
• Culture of BCM
• Need for measurement
Common Pitfalls In BCP
Industrial and Professional Standards (1)
• BS 25999-1 (2006), Business continuity management,
Part 1: Code of practice, The British Standards
Institution, United Kingdom.
• HB 221 (2004), Business Continuity Management,
Standards Australia, Australia.
• HB 292 (2006), A practitioners guide to business
continuity management, Standards Australia, Australia
Industrial and Professional Standards (2)
• BS ISO/IEC 17799 (2005), Code of practice for
information security management, The British
Standards Institution, United Kingdom.
• NFPA 1600, Standard on Disaster/Emergency
Management and Business Continuity Programs, The
National Fire Protection Association, United States.
• Defense Security Service (DSS), formerly known as the
Defense Investigative Service (DIS).
• National Institute of Standards and Technology
(NIST).
Current Regulations/Standards
• US - Securities and Exchange Commission - NASD Rules
3510 & 3520 and the NYSE Rule 446
• Basel II & E-banking
• UK Civil Contingencies Act
• Sarbanes Oxley
• UK FSA – BCM Guidance
• PAS 56 and from Summer 2006 BSI
• King II in South Africa
• Singapore - MAS BCM Standard
• Australian Standard for BCM
• US - NFPA 1600
• Europe - Netherlands, Luxembourg, Belgium, et al.
A Changing World
[Figure: corporate governance drivers. Labels: ISO 17799-01; CCA, Comp Act; GDPdU & GoBS; BS7799-02; NF Z 42-013; COBIT; AIPA; ITIL; King II; MAS; IT Baseline; China; Basel II; Sarbanes Oxley Act; APO; BS 25999-1]
BCP Planning Resource
• Contingency Planning Association of the Carolinas
– www.cpaccarolinas.org
• Disaster Recovery Journal
– www.drj.com/groups/drj6.html
• Disaster Recovery Institute International (DRII)
– www.drii.org/
• DHS - www.ready.gov/
• FEMA - www.fema.gov/
• Institute for Business & Home Safety (IBHS)
– www.ibhs.org/business_protection/
• Premier Safety Institute
– www.premierinc.com/quality-safety/toolsservices/safety/index.jsp
Key Terminologies
• Business Continuity Plan (BCP): A document describing
how an organization responds to an event to ensure critical
business functions continue without unacceptable delay or
change.
• Business Continuity Planning. Business continuity planning
will help organizations:
– Identify the impacts of potential data processing operational disruptions
and data loss.
– Formulate and implement viable recovery plans to ensure the
availability of data processing support for critical applications, data,
and services.
– Develop, implement, and administer a comprehensive BCP training,
testing, and maintenance program.
Key Terminologies
• Business Resumption Planning (BRP). BRP
develops procedures to initiate the recovery of
business operations immediately following an outage
or disaster. It can also outline the procedures for
returning critical business functions to the normal
processing site following the interruption.
• Continuity of Operations Plan (COOP): A COOP
is a document describing the procedures and
capabilities to sustain an organization’s essential,
strategic functions at an alternate site for up to 30
days.
Key Terminologies
• Crisis Communication Plan. A document that
outlines the procedures for disseminating status
reports to personnel and the public in the event of
an outage or disaster.
• Cyber Incident Response Plan. This document
provides the strategies to detect, respond to, and
limit consequences of malicious cyber incidents.
The focus is on information security responses to
incidents affecting systems and/or networks.
Key Terminologies
• Disaster Recovery Planning. Disaster recovery
refers to the immediate and temporary restoration of
critical computing and network operations after a
natural or man-made disaster within defined
timeframes. An organization documents how it will
respond to a disaster and resume the critical business
functions within a predetermined period of time;
minimize the amount of loss; and repair (or replace)
the primary facility to resume data processing
support.