slides - BCS Berkshire Branch

Weathering the Storm
Patricia Vella
Resilience Matters
Former Global Head of Business Continuity, Nortel
patriciavella@resilience-matters.com
About Resilience Matters Ltd.
• Patricia Vella is the owner of Resilience Matters, a
small business continuity consultancy
• Patricia ran Nortel’s corporate-wide business
continuity program for over 5 years
– She worked closely with key outsourced and
offshore facilities
• Since Nortel, Patricia has carried out business
continuity, disaster recovery and resilience work
for RAC, SABMiller, The Economist and Deutsche
Bank
• Patricia moved into business continuity from a
background as a technical architect for high
availability telecoms solutions
• http://uk.linkedin.com/in/pvella
4/13/2015
Copyright Resilience Matters
Contents
• Before you start
• Demystifying the Jargon
– Emergency Response
– Crisis Management
– Business Continuity
– Disaster Recovery
• What sort of plans do you need?
• Case Studies of incidents
Before you Start
• Ensure you know what your company does
and what is most critical
– eg. Nortel's support for the 999 service was almost
unknown within Nortel
• Identify where your company is located
• Find work you can reuse
– Emergency plans should already be in place
– Quality plans may contain critical business info
• Understand your company culture
Demystifying the Jargon
• Emergency Response
• Crisis Management
• Business Continuity
• Disaster Recovery/ICT Service Continuity
Emergency Response
• These are your plans for responding to an
emergency affecting a physical site, eg. a fire
• Typically developed and owned by H&S
– Fire safety requirements are specified in legislation:
the Regulatory Reform (Fire Safety) Order 2005
• These must be enacted first
– Ensure separation between personnel critical in
emergency response such as first aiders/fire
wardens and business continuity team members
Emergency Plans Must include
• Action on discovering a fire.
• Calling the fire brigade.
• Evacuation of the premises including those
particularly at risk.
• Power/process isolation.
• Places of assembly and roll call.
• Liaison with emergency services.
• Identification of key escape routes.
Emergency Plans May Include
• Alternative assembly points in case of bomb
threat
• Premises search instructions for bombs
• Instructions in case of disease outbreak on site,
eg. include liaison with the UK Health Protection
Agency (HPA)
• Contents and location of emergency grab bag
Crisis Management
• Process by which a major incident is managed
• If the incident affects business processes, crisis
management will invoke business continuity
and manage that process
• Some incidents, such as kidnap and ransom, are
managed without involving the wider business
and may utilise specialist external agencies
• Good idea to have a clear definition of what
constitutes a crisis and who can invoke it
Business Continuity
• Business Continuity is the set of plans and
processes that maintain critical operations
after a major incident
• BS 25999-1 defines Business Continuity as
– the strategic and tactical capability of the
organization to plan for and respond to incidents
and business disruptions in order to continue
business operations at an acceptable pre-defined
level
Business Continuity
• Business Continuity for large organisations is
much more than a set of plans
• A Business Continuity Programme needs
– Clearly identified leader (and alternate)
– Annual programme of updates to BIAs and BCPs
– Contact point for customer questions
– Defined strategy for supply chain resilience
– Annual test programme
Business continuity
• Business Continuity Planning includes
mitigations carried out ahead of an incident
that reduce impact/risk eg.
– IT Service Continuity measures
– dual sourcing of critical components
• BCP response strategies typically include
– Short term manual workarounds
– Work transfer to alternate teams
– Transfer of people to work area recovery sites
Disaster Recovery
• Disaster recovery
– is the set of processes, policies and procedures that
enable the recovery or continuation of technology
infrastructure after a disaster
• Disaster Recovery Plan (DRP) contains
– steps to be followed to enable recovery of the
technology infrastructure
– steps to be followed to reconcile the data
• Master DRP specifies running order for system
DRPs
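One way to picture the running order in a master DRP is as a dependency ordering: a system is recovered only after the systems it relies on. The sketch below is illustrative only. The system names and the `depends_on` mapping are hypothetical, and it uses the Python standard library's `graphlib` (Python 3.9+) rather than any tool mentioned in these slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical systems: each key lists the systems that must be
# recovered before it can be started.
depends_on = {
    "network": [],
    "database": ["network"],
    "message_bus": ["network"],
    "billing_app": ["database", "message_bus"],
    "customer_portal": ["billing_app"],
}


def running_order(dependencies):
    """Return one valid recovery order for the given dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = running_order(depends_on)
print(order)
```

Several valid orders may exist (here the database and the message bus could be recovered in either order), so a real master DRP would still need a person to choose between them.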
Disaster Recovery/ICT Continuity
• Huge variety of techniques to provide disaster
recovery. Selection of what is right for you
depends on your requirements (and budget).
• Before you start you need to define
– Recovery Time Objective (RTO) ie. how long can
you tolerate the system being down for
– Recovery Point Objective (RPO) ie. how much data
could you lose
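As a rough illustration of how RTO and RPO narrow the choice of technique, the sketch below checks candidate strategies against both objectives. The class, field names and figures are assumptions made up for this example. It also assumes that worst-case downtime equals the restore time and that worst-case data loss equals the interval between backups or replication points.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryStrategy:
    """A candidate DR technique (hypothetical example figures)."""
    name: str
    restore_time: timedelta   # time needed to bring the system back
    copy_interval: timedelta  # gap between backups or replication points


def meets_objectives(strategy, rto, rpo):
    """Return True if the strategy satisfies both the RTO and the RPO.

    Assumes worst-case downtime is the restore time and worst-case
    data loss is the interval between copies.
    """
    return strategy.restore_time <= rto and strategy.copy_interval <= rpo


# Illustrative requirement: back within 4 hours, lose at most 1 hour of data.
rto = timedelta(hours=4)
rpo = timedelta(hours=1)

nightly_tape = RecoveryStrategy(
    "nightly tape backup", timedelta(hours=24), timedelta(hours=24)
)
warm_standby = RecoveryStrategy(
    "warm standby", timedelta(hours=2), timedelta(minutes=15)
)

for strategy in (nightly_tape, warm_standby):
    print(strategy.name, "meets objectives:", meets_objectives(strategy, rto, rpo))
```

With these made-up figures only the warm standby passes; relaxing the RTO and RPO to 24 hours would make the cheaper tape backup acceptable as well.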
Disaster Recovery Strategies
• Mirroring
• Hot/warm/cold standby
• High availability
• Backup – tape vs hot swappable disks
• Rollback strategy in case of corruption
• UPS and Standby generators
What Plans do you need?
• Depends on company size, location, function
and regulatory requirements
• IT/Technology need DR plans
– prioritise the most business critical systems first
– don’t overlook the middleware systems
• Crisis plan should be simple and succinct
– know who will step in and take charge and when
• Start simple for your business continuity plans
– build up complexity over time
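The point about middleware can be made concrete: once the business critical systems are listed, everything they depend on, directly or indirectly, is in scope for DR too. The sketch below is a minimal illustration. The system names and the dependency map are hypothetical, not taken from any real estate.

```python
def required_systems(critical, depends_on):
    """Return every system the critical ones rely on, directly or indirectly."""
    needed = set()
    stack = list(critical)
    while stack:
        system = stack.pop()
        if system in needed:
            continue  # already visited; also protects against circular dependencies
        needed.add(system)
        stack.extend(depends_on.get(system, []))
    return needed


# Hypothetical dependency map: each system lists what it relies on.
depends_on = {
    "order_entry": ["message_queue", "orders_db"],
    "message_queue": ["auth_service"],
    "orders_db": [],
    "auth_service": [],
    "intranet_wiki": [],
}

in_scope = required_systems(["order_entry"], depends_on)
print(sorted(in_scope))
```

Here the message queue and the authentication service are pulled into scope even though nobody named them as business critical, while the intranet wiki stays out.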
London BC Invocation 7/7 05
• Nortel had large Managed Services presence in London
– This is where we managed parts of the telecoms network
– 20 different customers
– Managed all the switches for a major UK telecoms operator
• The London bombings quickly caused the following impacts
to our business
– Mass call event: unlike usual mass call events (eg. BGT), calls
were not restricted to a specific range of numbers and were
sustained over several hours
– London transport halted which caused significant impact to
shift change for 24x7 operations
– Difficulty in moving around London impacted spares, FLM,
I&C and other services around London
Copyright Patricia Vella
London BC Invocation 7/7 05
• Nortel Network Operators in London noticed an unusual
pattern of mobile calls from several locations in London
– They alerted the senior manager on duty that day
– They agreed whatever was going on was clearly bigger than
a power surge on the underground
• Ops manager contacted me 20 mins after the first explosion; we
reviewed the situation and invoked BC 30 mins after the first explosion
– followed BC process (eg. switch off provisioning, track shift
change, transfer operators to alternate sites)
– Incident EMT managed by conference calls throughout
• Thank-you email from CEO of major UK operator Friday am
• CSF Highly trained BC Primes
• CSF Every manager and team leader had been involved in at
least one 2 day simulation exercise, some had participated in
two previous exercises
India ‘Bollywood’ Star Death April 06
• Bollywood is the informal name for the film industry in India
– ‘Bollywood’ film stars have a huge following in India
• Rajkumar died 12th April 2006 in Bangalore
– Approx 60,000 fans took to the streets
– Vehicles were set alight, police used teargas to control
crowds
– 8 people died in the riots
– 2 days national mourning
• Technology companies shut down
– American companies hit first, then Indian companies
– Impacted R&D, Nortel was notified of the incident,
R&D schedules replanned
• Supplier run Nortel 24x7 call centre shut for 2 days
– Nortel BC was to temporarily transfer calls to Canadian
call centre
– CSF Ability to transfer work to alternate supplier at
short notice
• http://news.bbc.co.uk/1/hi/world/south_asia/4909432.stm
Summary
• It is essential that you understand what sort of
plans you need and why
– Emergency Response
– Crisis Management
– Business Continuity
– Disaster Recovery
• The types of incident that can cause you to
invoke your plans are many and unpredictable
• Tests prove their value in real incidents