Disaster Recovery Capstone Project

Advisor:
Jim French, Dept of Ecology
Team Members:
Scott Andersen, WSDOT
Gary Duffield, DIS
Doug Selix, OFM
Thelma Smith, WSDOT
Brian Sylvester, DOP

How can the state achieve a coordinated approach to IT disaster recovery?
How will we recover critical services and infrastructure knowing that we share services, platforms, and customers that rely on each other for data during the recovery?
How do we expose the risks, identify the gaps, and move toward meeting recovery time objectives?
How do we ensure that the capacity to recover aligns with the risk tolerance of state leadership?
1. Establish and empower a central authority for ‘Enterprise’ (Statewide) D/R Planning
2. Standardize and consolidate IT Infrastructure wherever possible to ease D/R Planning
3. Practice D/R Planning at the ‘Enterprise’ (not agency) level
4. Mandate D/R planning for all IT systems
5. Develop and document State guidelines on ‘risk appetite’
Resilience and Recoverability (R/R)
Leadership is about change!
Shared Vision: Changes on the horizon
 Standardization & Consolidation
 System Level R/R Focus
 R/R Designed into All Systems
 Risk Tolerance and Oversight
 Senior Level Sponsorship
 State Agencies’ Partnership
 Strategic and Tactical Leadership
 Strategic = Resilience
 Tactical = Recoverability
Governor
Emergency Management Council
State Agency Liaisons (DIS, OFM, DOP, DOT, etc.)
Comprehensive Emergency Management Plan (CEMP)
ISB Standards
State Agencies’ Plans
Existing Catch 22
 Change agency-centric approach to statewide R/R
solution
 Establish shared vision for funding R/R
Integrate R/R into Spending Plans
 Develop policy that cements R/R funding into IT
initiatives

Establish Ownership and Oversight
 Align R/R efforts with similar or preexisting efforts
 Emergency management groups
 Agencies’ leadership teams
 Establish new teams or partnerships as needed
 Establish policies for:
 Compliance
 Success Metrics
 Change Management
 LEADERSHIP!!!
 Proactive = Resilience
 Reactive = Recovery
 Close Gaps and Remove Roadblocks
 Leverage Existing or Empower New Program

Hardware and software consolidation and
standardization is becoming the driving force behind
organizations evaluating their Disaster Recovery
plans.

A 2009 survey from Symantec Corporation found
that 64% of organizations are creating or reevaluating their DR plans based on a plan to
consolidate and standardize their infrastructure.
[Figure: Hosting Service Matrix. Labels: "Increase provider mgmt, reduce agency resources"; "Leverage common infrastructure, consolidate hardware, reduce cost". Markers: Maturity Target, Transition Target.]
 Adopt a cost effective enterprise High
Availability Architecture solution
(Resilience).
 Future investments in Infrastructure and
Applications should include Resilience
and Recoverability.
 Planning for Resilience and Recoverability
should be at the Enterprise Level.
 Planning for recovery by agency,
technology, or individual application is not
effective for an enterprise class system.
Enterprise Level Planning is complex, and must be
done for Essential Systems.
 Essential Systems support Essential Agency Functions
as defined in agency COOP plans
 Must consider core agency systems - run by agency or
service provider
 Must consider dependencies such as infrastructure and
interface services
 Must consider dependent trading partner systems
 Must consider enterprise data at recovery point
 Must include procedures for assuring data integrity at
recovery point
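One way to make the dependency point above concrete is to derive a recovery order from a dependency map. The following is a minimal sketch, not an actual State plan: every system name other than AFRS is hypothetical, and the dependencies shown (including AFRS on the mainframe) are assumptions for illustration only.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be recovered first.
# All names except AFRS are illustrative, and every dependency is an assumption.
depends_on = {
    "network": set(),
    "mainframe": {"network"},
    "AFRS": {"mainframe"},
    "agency_payment_interface": {"AFRS", "network"},
    "partner_agency_feed": {"agency_payment_interface"},
}


def recovery_order(deps):
    """Return systems in an order where every dependency is recovered first."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(depends_on)
print(order)
```

If two systems depend on each other, `TopologicalSorter` raises `graphlib.CycleError`, which is itself a useful planning signal: circular dependencies need an agreed manual starting point.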
OFM Example - The State Payment Process
 Payment Process based upon AFRS and all
systems that it connects to
 Historical DR Plan: “DIS will recover the
mainframe and all will be good”
 Look at interfaces to partner agencies
 Look at known single points of failure
Enterprise Class Planning requires
someone to focus on getting it done for
essential systems!
 A single organization must facilitate
Enterprise planning
 Enterprise system owner and Stakeholders
must fully participate in development and
testing of R/R Plans
 Enterprise Planning is HARD!
 Enterprise Class Systems are COMPLEX!
 Someone Needs to GET ‘er DONE!
Many, if not most, recent IT systems were developed
without Disaster Recovery – Why?
 Elimination viewed as a ‘Cost Reduction’ strategy.
 This is a ‘false economy’ – a calculated risk
Real consequences to State citizens:
 Missing vital systems after a disaster
Or
 Spending too much to ensure their availability
Creation of WSRRO
 Mandate all new IT systems include R/R
 Review and approve
Criteria
 Agency impact analysis
 Integration impact analysis
 Validate appropriateness of plan
Types of ‘valid’ plans:
 ‘Resilience’
 ‘Warm site’
 ‘Cold site’
 Data protection only
 No recovery plan
[Chart: Time, Cost, and Assurance compared across plan types: Resilience, Recovery (Warm), Recovery (Cold), Data Protection Only]
 Mandate R/R planning for all IT
systems
 Scope for critical functions only
 Ensure ‘Enterprise’ context

If your house were on fire, what would you save?

We all live in the same house; we need to decide
what is going to be saved, and how much!
We won’t be able to save it all.
Be careful what you choose!
What is important to the WA State Enterprise?
 Public Safety (EMD/WSP/DOC/Roads/others?)
 Citizen Systems – Licensing, Social Systems,
others?
 Financial Systems - How we dispense and
receive funds.
 H/R Systems, Data Centers?
State Enterprise Approach!
How much and what loss is acceptable?





Data? E-mail? File Systems?
Hardware/infrastructure
Networks, communications?
Applications used by Citizens?
Applications used by Agencies?
What does this look like?
How do we determine what and how much?
Identify and Develop a Risk Matrix!
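A risk matrix can be sketched as likelihood times impact, bucketed into levels. This is only an illustration: the 1-5 scales, the thresholds, and the example systems and ratings are all hypothetical placeholders, not State guidelines or assessed values.

```python
# Hypothetical 1-5 scales; thresholds are placeholders, not State guidance.
def risk_score(likelihood, impact):
    """Multiply likelihood by impact, each rated from 1 (low) to 5 (high)."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be between 1 and 5")
    return likelihood * impact


def risk_level(score):
    """Bucket a score into a level using assumed cut-offs."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Illustrative systems with made-up (likelihood, impact) ratings.
systems = {
    "public_safety_dispatch": (3, 5),
    "licensing_portal": (2, 4),
    "internal_file_shares": (2, 2),
}

matrix = {name: risk_level(risk_score(l, i)) for name, (l, i) in systems.items()}
print(matrix)
```

The value of the exercise is less in the arithmetic than in forcing agreement on the scales and cut-offs, which is where leadership's risk tolerance gets written down.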

Now we know what. How do we really know it will
work?

What are our expectations for Disaster Recovery?

How do we ensure that RECOVERY WILL work?
 LEADERSHIP!
 Identify and apply standardized comprehensive testing
(Know what and how much to test, and test it the same way
across the board!)
 Perform Resilience and Recoverability Plans
 Review Results and apply Process Improvement!
(Do it better next time!)


Target Enterprise (State Level) Programs/Systems, NOT silo agencies
Identify how much of it we really need: RISK MATRIX!



Standardized Comprehensive Testing applied
Regularly perform Resilience and Recoverability Testing
Process Improvement
1. Establish and empower a central authority for ‘Enterprise’ (Statewide) R/R Planning
2. Standardize and consolidate IT Infrastructure wherever possible to ease R/R Planning
3. Practice R/R Planning at the ‘Enterprise’ (not agency) level
4. Mandate R/R planning for all new IT systems
5. Develop and document State guidelines on ‘risk appetite’

Thank you!
Scott Andersen, WSDOT
Gary Duffield, DIS
Doug Selix, OFM
Thelma Smith, WSDOT
Brian Sylvester, DOP