History - EDUCAUSE.edu

Disaster Recovery and
Business Continuity Planning
in a University Environment
Mardecia Bell
Ann Harris
Copyright Mardecia Bell/Ann Harris 2005. This work is the intellectual property of the authors. Permission
is granted for this material to be shared for non-commercial, educational purposes, provided that this
copyright statement appears on the reproduced materials and notice is given that the copying is by
permission of the authors. To disseminate otherwise or to republish requires written permission from the
authors.
Recognizing that one data center serving both
the central academic and administrative IT
environments was a single point of failure,
NC State University implemented a disaster
recovery strategy for communications
and critical applications residing in the
mainframe and open systems computing
environments.
History/Timeline
1997
Initiated with the administrative environment
Mainframe environment recovery test
1999
Y2K - Business Continuity concept
Acquired central repository software (LDRPS)
2001
Scheduled annual Mainframe recovery test
Included communications & academic environment
2002
Expanded to include Enterprise Business
Continuity/Disaster Recovery Planning
2004
Successful DR test of ERP systems
2005
Co-processing of production services began in Data
Center II
Implementation Steps
• Gain Sponsorship
• Establish Steering Committees
• Develop University Policy/Regulation
• Create DR Structure/Establish Staffing
• Market Program
• Establish Central Repository
• Review & Test Plans Regularly
Gain Sponsorship
• Office of the President – University System
• Chancellor
• Executive Management
– Present your Business Case
– Identify the roles involved
– Provide Executive Summary of BC/DR Program
– Present Statement of Work and Project Plan
• Add responsibilities to staff work plans
Establish Steering Committees
• IT Steering Committee
• Business/Service Steering Committee
• Both committees are composed of
– Vice Chancellor/Vice Provost Level
– Representatives from Critical Areas of the Campus
– Ex Officio members from IT areas
• Mission of IT Steering Committee
– Provide guidance and oversight for the
combined academic and administrative
Disaster Recovery Plan.
Policy/Regulations/Rule
• Develop a Policy or Regulation to affirm
the mandate and promote cooperation
Divide Campus Into Groupings
• Space/Facilities
• Teaching and Academic Programs
• Academic IT
• Administrative IT
• Environmental Health and Public Safety
• Business Administration
• Research Programs
• Student Affairs
• Extension and Engagement
Resource Projections
• Hire Full-Time Business Continuity and Disaster
Recovery Personnel
– Director of Business Continuity (plus 1 Business
Analyst)
– Admin IT DR Coordinator (plus 1 Business Analyst)
– Academic DR Coordinator (part-time)
• Add BC/DR responsibilities to work plan of
existing staff
• Identify Coordinators for each business unit
Marketing
• Present at campus departmental meetings
• Create a Website
• Utilize listservs
• Campus Newspaper
• Network with peer institutions
• Remain abreast of industry standards
• Attend conferences, workshops and seminars
Establish Central Information
Repository
Continuous Implementation
Accomplishments
• Disaster Recovery and Business Continuity
Plan
• Risk Assessments for Critical Business Units
• Successful Mainframe Recovery Tests
• Designed and implemented infrastructure for
central computing environment (academic &
administrative) in secondary data center.
• Implementation of recovery strategies in
secondary data center
• Creation of Administrative IT Disaster
Recovery Unit
Illustration of Various DR Deployments
• Fault-tolerant cluster (file and print services)
[Diagram labels: A Production, B Production, A Configuration, B Configuration]
• Co-processing and load-balancing (ERP)
[Diagram labels: A Production at each node]
• Distributed deployment (hosted systems)
[Diagram labels: A Production, A Development]
• Data replication (mainframe)
[Diagram labels: Server and Data at each node]
Enterprise Resource Planning (ERP) Deployment
• Financial System
• Human Resources (Version 8.8)
• Student Information System (under construction)
[Diagram labels: Campus Users, DC I, DC II, Web Servers, Application Servers, DB Servers, Batch Servers, Data, Storage Area Network]
Summary and Future Steps
[Diagram: services deployed in DC I and DC II. Labels shown: Novell Directory Services / Novell Email/Calendar, Anti-SPAM, File/Print and User Home, Citrix, Backup/vaulting, Hosted systems, Data, Active Directory / Windows, Storage Area Network, Infrastructure Database Server, Web Server, ERP Web, ERP Application, Development Server, ERP DB Server, Mainframe Server, ERP Batch]
Administrative IT Disaster Recovery Unit
Mission
• Ensure minimal risk of major disruptions to
critical University systems and processes
in the event that all or part of its computer
operations are rendered inoperable.
• Ensure timely recovery of infrastructure
and services in the event of a disruption.
• Ensure that business continuity plans are
available and viable relative to its
scenario.
Risk Management
• Identify
• Mitigate
• Process Mapping
Risk Management (NIST SP 800-30)
Risk Mitigation
• Prioritize Actions
• Evaluate Recommended Control Options
• Conduct Cost-Benefit Analysis
• Select Controls
• Assign Responsibility
• Develop Safeguard Implementation Plan
• Implement Selected Controls
Risk Assessment
• System Characterization
• Threat Identification
• Vulnerability Identification
• Control Analysis
• Likelihood Determination
• Impact Analysis
• Risk Determination
• Control Recommendations
• Results Documentation
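The Likelihood Determination, Impact Analysis, and Risk Determination steps can be illustrated with a small scoring sketch. This is a minimal example and not the University's actual method: the low/medium/high weights and thresholds are assumed here, modeled on the kind of example risk-level matrix found in NIST SP 800-30, and the threat names are hypothetical.

```python
# Assumed weights: likelihood in tenths (1 = 0.1, 5 = 0.5, 10 = 1.0)
# and impact on a 10/50/100 scale. These values are illustrative.
LIKELIHOOD_TENTHS = {"low": 1, "medium": 5, "high": 10}
IMPACT = {"low": 10, "medium": 50, "high": 100}


def risk_score(likelihood: str, impact: str) -> float:
    """Risk determination: likelihood weight times impact weight."""
    return LIKELIHOOD_TENTHS[likelihood] * IMPACT[impact] / 10


def risk_level(score: float) -> str:
    """Map a score to a level (assumed bands: >50 high, >10 medium)."""
    if score > 50:
        return "high"
    if score > 10:
        return "medium"
    return "low"


def prioritize(threats: dict[str, tuple[str, str]]) -> list[tuple[str, float, str]]:
    """Rank threats by score, highest first, for the mitigation step."""
    scored = [
        (name, risk_score(likelihood, impact))
        for name, (likelihood, impact) in threats.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, score, risk_level(score)) for name, score in scored]


# Hypothetical threats: name -> (likelihood, impact)
example_threats = {
    "data center power loss": ("medium", "high"),
    "single disk failure": ("high", "low"),
    "regional flood": ("low", "high"),
    "ERP database corruption": ("high", "high"),
}

ranking = prioritize(example_threats)
for name, score, level in ranking:
    print(f"{name}: score={score:g}, level={level}")
```

The ranked output feeds naturally into the Prioritize Actions step of risk mitigation, since the highest-scoring threats are listed first.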
Process Mapping
Infrastructure
• Total DR through distributed high
availability
• Client Recovery Solutions
• Application Restoration
• Establish collaborative partnerships with
other Universities
Client Recovery Solution(s)
Application Restoration
• Event
• Time
• Scope of Impact
– Infrastructure
– Software
– Hardware
Collaborative Partnerships
Vaulting
• Readily accessible
• Secure
• Onsite
• Offsite
Critical Business Units
• Advancement Services
• All Campus Network
• Budget Office
• College of Agriculture and Life Sciences - Personnel Office
• ComTech - Data Networking
• ComTech - Telecommunications
• Contracts and Grants
• Controller's Office
• Enterprise Application and Database Services
• EH&S - Business Continuity
• EH&S - Campus Police
• EH&S - Emergency Response
• EH&S - Environmental Affairs
• EH&S - Health and Safety
• EH&S - Industrial Hygiene
• EH&S - Insurance and Risk Management
• EH&S - Radiation Safety
• EH&S - Transportation
• EH&S - Waste Management
• Enrollment Management - Admissions
• Enrollment Management - Office of Scholarships & Financial Aid
• Enrollment Management - Registration and Records
• Enterprise Technology Services and Support
• Facilities - Construction Management
• Facilities - Design and Construction Services
• Facilities - Operations
• Facilities - University Architect
• Fire Protection
• Foundations Accounting & Investments
• HR - Benefits
• HR - Employment & Compensation
• HR - Human Resource Information Management
• HR - Payroll
• ITD - Business Services
• ITD - Computer Operations
• ITD - Computer Services
• ITD - Systems
• Libraries - Administration
• Materials Management - Materials Support
• Materials Management - Purchasing
• Materials Management - University Graphics
• Real Estate
• Student Health Services
• University Cashier's Office
• University Dining
• University Housing
Business Continuity Planning
Communication
• Consistency in plan updating
• Training
• Partnering
• Emergency Communication standardization
– Call Trees
– Mobile Devices
– Website
– Incident Command System Call Center
– Incident Report Plan
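A call tree, one of the emergency communication items listed above, can be represented as a simple mapping from each caller to the people they are responsible for notifying. The sketch below is illustrative only: the role names and tree shape are hypothetical, and the breadth-first ordering is an assumption about how notifications would cascade.

```python
from collections import deque


def call_order(tree: dict[str, list[str]], root: str) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs in breadth-first order from the root.

    Each person is called at most once, even if listed under two callers.
    """
    calls: list[tuple[str, str]] = []
    notified = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee in notified:
                continue  # already reached through another branch
            notified.add(callee)
            calls.append((caller, callee))
            queue.append(callee)
    return calls


# Hypothetical call tree: each key calls the people in its list.
example_tree = {
    "DR Coordinator": ["Network Lead", "Systems Lead"],
    "Network Lead": ["Network On-Call"],
    "Systems Lead": ["DBA On-Call", "Operations On-Call"],
}

calls = call_order(example_tree, "DR Coordinator")
for caller, callee in calls:
    print(f"{caller} calls {callee}")
```

Keeping the tree as plain data makes it easy to store in a central repository and to review alongside the plan whenever staffing changes.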
IT Disaster Categorization
• Category 1: A single person or group in a
Critical Business Unit (CBU) is unable to
perform their critical functions
• Category 2: An entire CBU is unable to
perform its critical functions
• Category 3: Multiple CBUs are unable to
perform their critical functions
• Category 4: Non-CBUs are not able to
perform their critical functions
• Category 5: A widespread event that impacts
the entire University
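The five categories above can be expressed as a small classification routine. This is a sketch under stated assumptions, not the University's procedure: the `Incident` fields are hypothetical, the highest applicable category is assumed to win, and any incident touching more than one CBU is assumed to be Category 3.

```python
from dataclasses import dataclass
from enum import IntEnum


class DisasterCategory(IntEnum):
    PERSON_OR_GROUP = 1   # a person or group within one CBU
    ENTIRE_CBU = 2        # one whole CBU
    MULTIPLE_CBUS = 3     # more than one CBU
    NON_CBUS = 4          # non-CBUs also affected
    UNIVERSITY_WIDE = 5   # widespread event, entire University


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields describing the scope of impact.
    cbus_fully_down: int = 0
    cbus_partially_down: int = 0
    non_cbus_down: int = 0
    university_wide: bool = False


def categorize(incident: Incident) -> DisasterCategory:
    """Return the highest category that applies (an assumed rule)."""
    if incident.university_wide:
        return DisasterCategory.UNIVERSITY_WIDE
    if incident.non_cbus_down > 0:
        return DisasterCategory.NON_CBUS
    affected = incident.cbus_fully_down + incident.cbus_partially_down
    if affected > 1:
        return DisasterCategory.MULTIPLE_CBUS
    if incident.cbus_fully_down == 1:
        return DisasterCategory.ENTIRE_CBU
    if incident.cbus_partially_down == 1:
        return DisasterCategory.PERSON_OR_GROUP
    raise ValueError("incident does not affect any unit")


payroll_outage = Incident(cbus_fully_down=1)
print(categorize(payroll_outage).name)
```

An explicit category value like this could drive which call tree is activated or which recovery plan is opened, though that mapping is left out here because the slides do not specify it.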
Goals
• Total DR through distributed high
availability
• Standardized Emergency
Communications
• Immediate Client Recovery Solutions
• Improved recovery time objective (RTO)
Ann Harris
Asst Dir, Administrative IT Disaster Recovery
919-515-9228
ann_harris@ncsu.edu
http://www.fis.ncsu.edu/dr