Disaster Recovery 101
Sudarshan Ranganath & Matthew Phillips
Ellucian
SESSION OBJECTIVES
• Business continuity is critical to every institution and its IT organization. How do you set up your ERP and other Tier 1 apps to reduce the risk of a disaster, and quickly recover from one should disaster strike? Learn about the infrastructure and practices that Ellucian’s Cloud Services uses to minimize the impact of a disaster for your Banner systems.
AGENDA
• Tier-1 Apps
• Disaster Prevention
• Disaster Readiness
• DR Execution
• DR Options
May 2, 2013
TIER 1 APPS FOR DISASTER RECOVERY / BC
• Communication
• SIS/ERP
• LMS
• CRM
• Other Financial/Operational
RISKS THAT COULD IMPACT YOUR OPERATIONS
Causes
• Natural disasters
• Human errors
• Technological failures
Impact
• Business Interruption
• Financial
• Legal
• Reputational
[Figure: risk matrix. Probability axis: Frequent, Likely, Occasional, Seldom, Unlikely. Severity axis: Catastrophic, Critical, Moderate, Negligible. Priority levels: Extremely High, High, Moderate, Low. Risks plotted: Hurricane/Flooding, Hit by Tornado, Security Breach, Staff Attrition, Power Outage, Demand Surge, CPU Failure, Disk Failure, Trip over a wire.]
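The risk matrix ranks each risk by probability and severity. A minimal sketch of that idea follows, assuming a simple multiplicative score. The slide does not give a scoring rule, so the numeric weights, the cutoffs, and the function name `priority` are hypothetical, and the example ratings are made up.

```python
# Ordinal labels come from the slide's axes; the numeric weights and the
# priority cutoffs below are assumptions for illustration only.
PROBABILITY = {"Unlikely": 1, "Seldom": 2, "Occasional": 3, "Likely": 4, "Frequent": 5}
SEVERITY = {"Negligible": 1, "Moderate": 2, "Critical": 3, "Catastrophic": 4}


def priority(probability: str, severity: str) -> str:
    """Map a (probability, severity) pair to a priority label."""
    score = PROBABILITY[probability] * SEVERITY[severity]
    if score >= 12:
        return "Extremely High"
    if score >= 8:
        return "High"
    if score >= 4:
        return "Moderate"
    return "Low"


if __name__ == "__main__":
    # Hypothetical ratings; rate each risk for your own site.
    for risk, prob, sev in [
        ("Power Outage", "Occasional", "Critical"),
        ("Trip over a wire", "Seldom", "Negligible"),
    ]:
        print(f"{risk}: {priority(prob, sev)}")
```

Scoring every risk the same way makes it easier to decide which prevention measures to fund first.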
KEY KPIS YOU CARE ABOUT AS AN IT ORGANIZATION
Downtime
• Application Availability
Performance
• User Experience for Public and Private apps
Security
• Number of Security Incidents
• Extent of compromise per Incident
Scalability
• Ability of current infrastructure to handle load
• Time to add capacity in response to demand spike
Disaster Recoverability
• Probability of a disaster affecting the datacenter
• Time to recover from a site-level disaster
Backup Currency
• Lost work product because of inefficient backup practices, and aging of backed-up data as a result
Software Currency
• Time to update to newest version after being made available by vendor
Stakeholder Support
• Effectiveness in furthering student/staff satisfaction
Costs/Investment Efficiency
• TCO to operate solution, ROI for every $$ invested
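Application availability is commonly expressed as a percentage target. As a rough worked example, the sketch below converts such a target into an annual downtime budget. The targets shown are illustrative and are not figures from the session; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Illustrative targets only.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} h/year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why higher targets usually require the redundancy described in the prevention slides.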
DISASTER PREVENTION
• Power
• Facility
• Network
• Hardware
• Application Architecture
• Replication
• Process
DISASTER PREVENTION - POWER
• Multiple Utilities or Stations
• A and B power Grids
• All components connected A&B
• UPS Generator
• Generator Backup
• Fueling agreements for outage >2 days
DISASTER PREVENTION - FACILITY
• Multiple Physical Entries for Power, Network
• Hardened Walls and Roof
• Temperature – Humidity
• Secure personnel and equipment Entries
• Multi-stage Fire Detection
DISASTER PREVENTION - NETWORK
• Multiple Internet connections
• Multiple ISPs
• Redundant firewalls
• Redundant core network
• Redundant connections for servers and storage
DISASTER PREVENTION - HARDWARE
• Redundancy is key at every level
• SAN vs. non-SAN
• Virtualization vs. Dedicated Server Hardware
• Redundant cold/warm/hot hardware in DR location
DISASTER PREVENTION - APPLICATION ARCHITECTURE
• Again… redundancy is key at every level
• DB tier and App tier
• Monitoring & alerting considerations
• Integrations
• Customization and Modifications
• Licensing
DISASTER PREVENTION - APPLICATION ARCHITECTURE
• Backup architecture considerations
• OS is static
• Application tier is static
• Database backup considerations
• Database backup architecture
• Fullexp, RMAN, cold, custom hot
• Archive vs no-archive mode (prod vs non-prod)
• Data-Domain style vs Tape architecture
• Architecture must consider RTO and RPO
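RPO (recovery point objective) bounds how much data you can afford to lose, and RTO (recovery time objective) bounds how long recovery may take. A minimal sketch of checking a backup design against both follows. It assumes a simplified model in which worst-case data loss equals the interval between recoverable points, and recovery time is restore time plus log replay. The class, function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupDesign:
    backup_interval_hours: float   # time between recoverable points
    restore_hours: float           # time to restore the last full backup
    log_replay_hours: float = 0.0  # time to apply archived logs, if any


def meets_targets(design: BackupDesign, rpo_hours: float, rto_hours: float) -> bool:
    """True if the design's worst case fits within both RPO and RTO."""
    worst_case_loss = design.backup_interval_hours
    worst_case_recovery = design.restore_hours + design.log_replay_hours
    return worst_case_loss <= rpo_hours and worst_case_recovery <= rto_hours


if __name__ == "__main__":
    # Hypothetical designs: a nightly cold backup without archived logs,
    # and a design that ships archived logs every 15 minutes.
    nightly_cold = BackupDesign(backup_interval_hours=24, restore_hours=6)
    with_archive_logs = BackupDesign(0.25, restore_hours=6, log_replay_hours=2)
    print(meets_targets(nightly_cold, rpo_hours=4, rto_hours=12))       # False
    print(meets_targets(with_archive_logs, rpo_hours=4, rto_hours=12))  # True
```

In this model the nightly backup alone misses a 4-hour RPO, which is one reason archive mode matters more for production than for non-production databases.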
DISASTER RECOVERY REPLICATION
• Backup Process
• Replication Process
• Recovery Point
• Recovery Time
DISASTER PREVENTION - PROCESS
• ITIL® Change Management
• Incident Management
• Shutdown / Startup Processes
• Access Control / Role Based Security
• Training
DISASTER READINESS
• How do you test your readiness for disaster?
• Failover Test
  • Power
  • Network test
  • VM test
  • Application / Database test
  • Monitoring test
EXECUTION WHEN YOU HAVE A SITE LEVEL DISASTER
• Requires People & Process
• Facility to restore
• Infrastructure (Network, servers, storage, Recovery software, DNS)
• Most Recent Backups
• Prioritization
• Move IP networking from primary to DR
• Recover Virtual Machines
• Recover Databases
• Recover Apps
• Integrations to other systems
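The recovery steps listed above depend on one another: applications cannot start before their databases, and nothing is reachable before networking moves to the DR site. A minimal sketch of deriving a safe execution order from such dependencies follows, using Python's standard `graphlib` module (Python 3.9+). The dependency edges are an assumption based on the order of the bullets, not a prescribed runbook, and `recovery_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# These edges are assumed from the slide's ordering.
DEPENDS_ON = {
    "Move IP networking to DR": set(),
    "Recover Virtual Machines": {"Move IP networking to DR"},
    "Recover Databases": {"Recover Virtual Machines"},
    "Recover Apps": {"Recover Databases"},
    "Restore integrations": {"Recover Apps"},
}


def recovery_order(depends_on: dict) -> list:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for number, step in enumerate(recovery_order(DEPENDS_ON), start=1):
        print(f"{number}. {step}")
```

Writing the dependencies down explicitly also exposes steps that could run in parallel once real systems, rather than this single chain, are modeled.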
DISASTER RECOVERY STRATEGIES
Strategy                     RPO           RTO            Cost
Server Replication           Secs – Min    < Hr           $$$$$
SAN Replication              Min – hours   Hours – Day    $$$$
VM + DB logs                 Hours – day   Hours – days   $$$
Offsite Tape + DR Contract   Days          Days – weeks   $$
Offsite Tape                 Days          Months         $
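The strategies above trade cost against RPO and RTO. A minimal sketch of picking the cheapest strategy that still meets a given pair of targets follows. The slide gives only rough ranges, so the hour values below are assumed worst-case readings of those ranges, the cost is simply the count of $ signs, and `cheapest_meeting` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    rpo_hours: float  # assumed worst-case data loss
    rto_hours: float  # assumed worst-case recovery time
    cost: int         # number of $ signs on the slide


# Hour values are assumptions chosen to fit the slide's ranges.
STRATEGIES = [
    Strategy("Server Replication", 0.1, 1, 5),
    Strategy("SAN Replication", 4, 24, 4),
    Strategy("VM + DB logs", 24, 72, 3),
    Strategy("Offsite Tape + DR Contract", 72, 336, 2),
    Strategy("Offsite Tape", 72, 2160, 1),
]


def cheapest_meeting(rpo_hours: float, rto_hours: float) -> Optional[Strategy]:
    """Cheapest strategy whose worst case fits both targets, or None."""
    candidates = [
        s for s in STRATEGIES
        if s.rpo_hours <= rpo_hours and s.rto_hours <= rto_hours
    ]
    return min(candidates, key=lambda s: s.cost, default=None)


if __name__ == "__main__":
    choice = cheapest_meeting(rpo_hours=24, rto_hours=72)
    print(choice.name if choice else "No strategy meets these targets")
```

Replace the assumed hour values with figures measured in your own failover tests before relying on a selection like this.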
SUMMARY
• DR is about
• Planning & Testing Readiness
• Prevention, Readiness, Execution