Business Recovery Strategies

Disaster Recovery
Chao-Hsien Chu, Ph.D.
College of Information Sciences and Technology
The Pennsylvania State University
University Park, PA 16802
chu@ist.psu.edu
IST 515
Objectives
• Describe the basic differences between BCP and DRP
• Describe the steps involved in creating disaster
recovery plan tests.
• Identify and describe the various types of recovery
strategies.
• Describe how to formulate a recovery strategy.
• Compare and contrast strategies for backup.
• Identify the advantages and disadvantages of mutual aid
agreements.
• Compare and contrast the advantages and disadvantages
of hot sites and cold sites.
• Compare and contrast the advantages and disadvantages
of using service bureaus.
Readings
• Hansche, S., Berti, J. and Hare, C., Official (ISC)2 Guide
to the CISSP Exam, Auerbach, 2004. Chapter 9
(Required).
• Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J.,
and Thomas, R., Contingency Planning Guide for
Information Technology Systems, NIST Special
Publication 800-34, June 2002.
• Wikipedia, Disaster recovery.
http://en.wikipedia.org/wiki/Disaster_recovery
Disasters
BCP Cycle
Areas Covered in BCP
• Contact points. Who to contact during office hours,
outside office hours, and in an emergency;
• Roles and responsibilities. A well-defined
organizational structure for the business continuity and
recovery teams;
• Risk levels. A categorization of business risks and the
level of risk the organization deems acceptable;
• Continuity and recovery service levels. How much
time is acceptable for responding to threats,
implementing continuity plans, and recovering from
failure scenarios;
• Business continuity reviews. How and when the
organization reviews business continuity plans;
• Business continuity processes. Processes and
procedures that inform staff how to react to and handle
particular failure scenarios;
• Incident reporting and documentation. Methods of
recording and documenting incidents and responses to
them;
• Testing. Acceptance criteria and testing requirements
for the business continuity plan; and
• Training. Training requirements for staff involved in
business continuity and disaster recovery processes.
Step 1: Initiate the BCP Project
1. Obtain and confirm support from senior
management.
2. Identify key business and technical stakeholders.
3. Form a business continuity working group.
4. Define objectives and constraints.
5. Establish strategic milestones and draw up a road
map.
6. Begin a draft version of business continuity
policy.
Step 2: Identify Business Threats
• Technology threats include natural disaster (such as
flooding), fire, power failure, systems and network
failure, systems and network flooding (when attackers
try to overwhelm a network with traffic), virus attack,
denial-of-service attack, theft, vandalism, and sabotage.
• Information threats come from hacking, theft, fraud,
fabrication, alteration, misuse, natural disaster, fire, and
the degradation of the ink on paper records.
• People threats include illness, recruitment shortfalls,
resignation, compassionate leave, pregnancy, weather,
and unavailability of transportation or office access.
1. Identify the community of business and
technical stakeholders.
2. Conduct threat identification workshops.
3. Delineate and document business threats.
Step 3: Conduct a Risk Analysis
• Conduct risk analysis workshops.
• Assess the likelihood and impact of threat
occurrence.
• Categorize and prioritize threats according to risk
level.
• Review outputs of risk analysis with management.
• Ascertain level of risk acceptable to the
organization.
• Document outputs in business continuity policy.
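The assessment and prioritization steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the 1-to-5 scales, the likelihood-times-impact score, the cut-offs in `risk_level`, and the sample threat values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (negligible) to 5 (severe); hypothetical scale

    @property
    def score(self) -> int:
        # A common simple scoring: likelihood multiplied by impact.
        return self.likelihood * self.impact


def risk_level(score: int) -> str:
    # Hypothetical cut-offs; an organization would set its own.
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def prioritize(threats, acceptable_score):
    """Return threats scoring above the acceptable level, highest risk first."""
    above = [t for t in threats if t.score > acceptable_score]
    return sorted(above, key=lambda t: t.score, reverse=True)


# Illustrative values only.
threats = [
    Threat("Power failure", likelihood=3, impact=4),
    Threat("Flooding", likelihood=1, impact=5),
    Threat("Virus attack", likelihood=4, impact=4),
    Threat("Staff illness", likelihood=3, impact=1),
]

for t in prioritize(threats, acceptable_score=4):
    print(f"{t.name}: score={t.score}, level={risk_level(t.score)}")
```

The `acceptable_score` argument stands in for the level of risk the organization deems acceptable; threats at or below it are left out of the prioritized list.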
Step 4: Establish the Business Continuity Team
• Identify key business, technical, and customer
services stakeholders.
• Form and empower the business continuity team.
• Clarify and agree on team objectives and working
mode.
• Define roles and responsibilities; produce a work
plan.
• Identify incident engagement and response
processes.
• Update business continuity policy.
Roles of BC Team
• A business continuity manager is the first point of contact,
manages the incident, initiates the business continuity plan,
mobilizes the business continuity team, and presents key decisions
to business owners when appropriate.
• The business owner makes key decisions about how the business
handles incidents.
• The technical services manager manages disruptions to technical
services, such as IT infrastructure and applications; initiates
continuity arrangements; and interacts with third-party business
continuity service providers.
• An estate manager manages disruptions relating to buildings,
offices, and the surrounding environment; initiates continuity
arrangements and interacts with third party business continuity
service providers.
• The business operations and customer services manager
manages disruptions to business operations and
customer services; keeps customers informed if there is
a noticeable impact on customer service levels; initiates
continuity arrangements; and interacts with third-party
business continuity service providers.
• Business continuity (or resumption) teams are technical,
estate, or customer services teams that execute the
business continuity plans.
• A recovery manager guides the business’ recovery to
normal operations.
Step 5: Design the Business Continuity Plan
• Identify critical and noncritical business services.
• Establish preferred business continuity service levels and
profiles for continuity and recovery.
• Reaffirm key constraints (such as time and cost).
• For each threat, identify possible continuity strategies
and evaluate them in terms of time, cost, and benefits.
• Identify and engage potential business continuity
partners.
• Draft a set of continuity plans and work toward an
agreed set of plans with senior management.
• Produce and execute an implementation plan.
Common Strategies
• Technology: Redundancy (of hardware and network, for
example), maintenance and support agreements, and backup
and restore capabilities are common defensive strategies.
• Information: Recover information by using data mirroring,
backup and restore, auditing, and off-site or secondary data
storage.
• People: To temporarily shore up people-related resources, use
contract staff, rotas (workloads that a company can change in
response to business demand or personnel shortfalls), call-out
arrangements (having certain staff in standby mode to be
called to work as necessary), rental offices and sites, manual
procedures, and service-forwarding agreements (such as with
specialist call centers).
Evaluating Criteria
• Costs for acquisition, deployment, testing, training, and
associated management overhead;
• Level of protection;
• Business resumption response time; and
• Time to implement, including time for acquiring,
deploying, and testing the business continuity strategy
and for conducting relevant and necessary training.
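One way to apply these criteria is to screen out strategies that break a constraint and then rank the rest. The sketch below assumes a hypothetical 1-to-5 protection scale, invented cost and timing figures, and a ranking rule (strongest protection first, then lowest cost) that is only one of many reasonable choices.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    total_cost: float        # acquisition, deployment, testing, training, overhead
    protection: int          # 1 (weak) to 5 (strong); hypothetical scale
    resumption_hours: float  # business resumption response time
    implement_weeks: float   # time to acquire, deploy, test, and train


def shortlist(strategies, budget, max_resumption_hours, max_implement_weeks):
    """Keep strategies that meet every constraint, best protection first."""
    feasible = [
        s for s in strategies
        if s.total_cost <= budget
        and s.resumption_hours <= max_resumption_hours
        and s.implement_weeks <= max_implement_weeks
    ]
    # Rank by protection (descending), then by cost (ascending).
    return sorted(feasible, key=lambda s: (-s.protection, s.total_cost))


# Illustrative figures only.
candidates = [
    Strategy("Redundant hardware", 120_000, 5, 1, 12),
    Strategy("Maintenance agreement", 30_000, 3, 24, 4),
    Strategy("Backup and restore", 15_000, 3, 12, 2),
]

for s in shortlist(candidates, budget=100_000,
                   max_resumption_hours=24, max_implement_weeks=8):
    print(s.name)
```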
Step 6: Define Your Business Continuity
Processes
• Identify, define, and document business continuity
processes.
• Review and verify business continuity processes
with relevant stakeholders.
• Identify training requirements.
• Develop training exercises, role-playing scripts,
and simulation case studies.
• Initiate training and awareness programs.
Business Continuity Processes
• Handling specific failure events, such as fire and
network failures;
• Backup and restoration of systems and business
data;
• Virus management;
• Incident reporting;
• Problem escalation hierarchies;
• Customer and staff communication; and
• Contact procedures for third-party support providers.
Step 7: Test Your Business Continuity Plan
• Define business continuity acceptance criteria.
• Formulate the business continuity test plan.
• Identify major testing milestones.
• Devise the testing schedule.
• Execute tests via simulation and rehearsal; document test
results.
• Assess overall effectiveness of business continuity plan;
pinpoint areas of weakness and improvement.
• Iterate tests until the plan meets acceptance criteria.
• Check, complete, and distribute business continuity
policy.
Reasons for Testing BCP
• Validate the plan’s effectiveness in meeting your stated
business continuity service levels;
• Identify, at an early stage, any shortcomings in the plan;
• Assess whether your business continuity service levels
are realistic and achievable given your budgetary and
time constraints; and
• Give senior management and other parties (such as
regulatory bodies) confidence in the plan.
Step 8: Review Your Business Continuity Plan
• Develop a review schedule for different types of
review.
• Arrange a business continuity review meeting or
workshop.
• Update the business continuity document.
• Kick off another BCP cycle if necessary.
When to Review BCP
• Significant changes to the business—for example, the
launch of new e-business operations;
• Changes in business priorities;
• Shifts in the legal or regulatory landscape;
• Significant world events (wars or terrorist attacks);
• Changes to the IT budget;
• Physical relocation of IT systems and operations;
• Outsourcing of IT systems and operations;
• Developments in IT infrastructure; and
• Significant changes in the labor market.
Common Pitfalls In BCP
Disaster Recovery
• Disaster recovery refers to the immediate and
temporary restoration of critical computing and
network operations after a natural or man-made
disaster within defined timeframes.
• An organization should document how it will
respond to a disaster and resume the critical
business functions within a predetermined period of
time; minimize the amount of loss; and repair (or
replace) the primary facility to resume data
processing support.
Disaster Recovery Planning
• A comprehensive statement of consistent actions
to be taken before, during, and after a disruptive
event that causes a significant loss of information
systems resources
• The procedures for responding to an emergency,
providing extended backup operations during the
interruption, and managing recovery and salvage
processes afterwards, should an organization
experience a substantial loss of processing
capability.
• To provide the capability to implement critical
processes at an alternative site and return to the
primary site and normal processing within a
time frame that minimizes the loss to the
organization, by executing rapid recovery
procedures.
Goals and Objectives of DRP
• Protecting an organization from major
computer services failure.
• Minimizing the risk to the organization from
delays in providing services.
• Guaranteeing the reliability of standby systems
through testing and simulation.
• Minimizing the decision-making required by
personnel during a disaster.
Disaster Recovery Procedures
• The recovery team.
• The salvage team.
• Normal operations resume.
• Other recovery issues:
– Interfacing with external groups
– Employee relations
– Fraud and crime (vandalism and looting)
• Financial disbursement.
• Media relations.
Recovery Strategies
• Recovery strategies consist of a set of predefined
and management approved actions implemented in
response to an unacceptable business interruption.
• The focus is on recovery methods to meet the
predetermined recovery timeframes established for
the operation and functioning of the critical
business functions.
• Developing the recovery strategies includes
compiling the resource requirements and
identifying the alternatives available during
recovery.
Sample of Business Unit Priorities
Business Units     Recovery Window (Hrs)   IT Platforms          Priority
IT Security        2                       Mainframe, LAN, WAN   1
Facilities         2                       LAN, WAN              1
Legal              36                      LAN, WAN              3
Administrative     18                      LAN, WAN              2
Accounting         48                      LAN, WAN              3
Human Resources    48                      LAN, WAN              3
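The relationship between recovery window and priority in the sample above can be sketched as follows. The unit names and hours come from the sample; the 12-hour and 24-hour cut-offs in `priority` are assumptions chosen only because they reproduce the sample's priorities, not thresholds stated in the source.

```python
# (business unit, recovery window in hours), taken from the sample above.
units = [
    ("IT Security", 2),
    ("Facilities", 2),
    ("Legal", 36),
    ("Administrative", 18),
    ("Accounting", 48),
    ("Human Resources", 48),
]


def priority(recovery_window_hours):
    # Hypothetical cut-offs that happen to agree with the sample.
    if recovery_window_hours <= 12:
        return 1
    if recovery_window_hours <= 24:
        return 2
    return 3


# Shorter recovery windows are recovered first; ties keep their listed order.
recovery_order = sorted(units, key=lambda unit: unit[1])

for name, hours in recovery_order:
    print(f"{name}: window={hours}h, priority={priority(hours)}")
```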
Steps in Developing Recovery Strategies
• Document all costs associated with each alternative.
• Obtain costs for any outside services.
• Develop written agreements.
• Evaluate risk reduction and resumption
strategies based on a full loss of the facility.
• Identify risk reduction measures and revise
resumption priorities and timeframes.
• Document recovery strategies and present them
to management for comments and approval.
Recovery Strategies
Strategies should address recovery of:
• Business operations
• Facilities & supplies
• Users (workers and end-users)
• Technical (network, telecommunication, data
center)
• Data (off-site backups of data and
applications)
Business Recovery Strategies
• Business recovery strategies focus on critical resources and
the MTD for each business function.
• The business unit priorities are taken directly from the
BIA. The length of the recovery window for each business
unit dictates the priority for recovery.
The strategies involve identifying the following:
• Critical business units and their associated business
functions.
• Critical IT system requirements for each business function.
• Procedures for connectivity to IT infrastructures (e.g.,
mainframe, mini, LAN, WAN).
• Critical equipment and supply requirements for each
business function.
• Essential office space requirements of each business unit.
• Key personnel for each business unit.
• Redirection of postal service mail, voice
telecommunications, and data networks to the recovery
site.
• Business unit interdependencies with other units.
• Off-site storage (procedures, media, documents).
• Vendor services.
Facility and Supply Recovery Strategies
• Facility recovery involves identifying recovery procedures
for the alternate facility, including space, security, fire
protection, infrastructure, utility, supply, and
environmental requirements.
• Determine minimum space for recovery of critical business
units.
• Determine space needs for less critical resources.
• Determine security needs at recovery sites.
• Determine fire protection needs.
• Determine critical furnishings and office equipment.
• Determine infrastructure requirements.
• Determine utility and environmental needs.
• Determine what office/business supplies are needed.
User Recovery Strategies
• The strategies involved with personnel requirements focus
on manual procedures, vital records, and restoration
procedures. A critical component is establishing methods
to implement the process and maintain the records so that
information can be easily and accurately updated to the
electronic format when service is restored.
The plan should specify the following:
• Manual procedures.
• Vital record storage (e.g., medical, personnel).
• Employee notification procedures.
• Employee transportation arrangements.
• Employee accommodations.
Technical Recovery Strategies
• Technical recovery strategies define alternate
recovery strategies for the data center and
network infrastructure components.
Methods:
• Subscription services.
• Mutual aid agreements.
• Redundant data centers.
• Service bureaus.
Subscription Services
• Subscription services provide an alternate facility
or “site” for recovery. They are characterized as
hot, warm, cold, mirror, and mobile sites.
• Hot Site. A fully configured site with complete
customer required hardware and software
provided by the service.
• Warm Site. Similar to a hot site, but the
expensive equipment (e.g., a mainframe) is not
available on-site. The site is ready within hours after
the needed equipment arrives.
• Cold Site. Does not include any technical
equipment or resources, except environmental
support such as air conditioning, power,
telecommunication links, raised floors, etc.
• Mirror Site. Also referred to as full redundancy, this is
a computer service facility equipped with utilities,
communication lines, and appropriate hardware
that is fully operational and processes each
transaction along with the primary site.
• Mobile Site. A trailer that can be set up and linked
by a trailer sleeve to create a space to suit the
subscriber’s recovery needs.
Reciprocal or Mutual Aid Agreements
• This strategy is to establish reciprocal or mutual aid
agreements with other companies, under which each
provides facilities to the other in the event of a disaster.
• Reciprocal agreements require the companies to have
similar hardware and software computing
environments.
• Typically, reciprocal agreements are dismissed in
practice because few information system facilities
have the extra capacity needed to run both their own
and another organization’s needs for any extended
period of time.
Technical Recovery Strategies
Redundant Processing Centers:
• Expensive
• May not have enough spare capacity for critical
operations
Service Bureaus:
• Many clients share facilities
• Almost as expensive as a hot site
• Must negotiate agreements with other clients
Data Recovery Strategies
• The objectives are to back up critical software
and data, store the backups at an off-site location,
and retrieve the backups quickly during a
recovery operation
• Backups of data and applications
• Off-site vs. on-site storage of media
• How fast can data be recovered?
• How much data can you lose?
• Security of off-site backup media
• Types of backups (full, incremental, differential,
etc.)
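The choice of backup type affects how many backup sets must be retrieved during a recovery. The sketch below assumes the usual definitions (an incremental backup holds changes since the previous backup, a differential backup holds changes since the last full backup) and a schedule that does not mix incrementals and differentials after the same full backup. The function name `restore_chain` and the sample schedules are hypothetical.

```python
def restore_chain(history):
    """Return indexes into `history` of the backups needed for a restore.

    `history` lists backup kinds in chronological order, each one of
    "full", "incremental", or "differential".
    """
    fulls = [i for i, kind in enumerate(history) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup to restore from")
    last_full = fulls[-1]

    later = range(last_full + 1, len(history))
    incrementals = [i for i in later if history[i] == "incremental"]
    differentials = [i for i in later if history[i] == "differential"]
    if incrementals and differentials:
        # Assumption of this sketch: one scheme per backup cycle.
        raise ValueError("mixed schedules are outside this sketch")

    if differentials:
        # Only the most recent differential is needed with the full backup.
        return [last_full, differentials[-1]]
    # Every incremental since the full backup is needed, in order.
    return [last_full] + incrementals


incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```

Under these assumptions the incremental schedule needs every set since the full backup, while the differential schedule needs only two sets, which bears on how fast data can be recovered.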
Recovery Management
• This is sometimes referred to as Crisis Management.
Essentially, it is the overall coordination of the
organization’s response to a crisis.
• The goal is to deal with the issues in an effective and
timely manner and avoid or minimize damage to the
organization’s profitability, reputation, and ability to
operate.
• The flow of accurate information is a key ingredient to
effective crisis management. The effective management of
information can serve as the first line of defense against a
crisis and can also be the most effective mechanism in the
process of restoring both the business functions and public
confidence.
Testing the Disaster Recovery Plan
• To verify the accuracy of the recovery
procedures and identify deficiencies
• To prepare and train the personnel to execute
their emergency duties
• To verify the processing capability of the
alternative backup site
Testing DRP
Creating the Test Document:
• Testing schedule and timing
• The duration of the test
• The specific test steps
• Who will be the participants in the test
• The task assignments of the test personnel
• The resources and services required (supplies,
hardware, software, documentation, and so
forth)
Five DRP Test Types
• Checklist
• Structured walk-through
• Simulation
• Parallel
• Full-interruption