Disaster Recovery 101
Sudarshan Ranganath & Matthew Phillips
Ellucian
May 2, 2013

SESSION OBJECTIVES
Business continuity is critical to every institution and its IT organization. How do you set up your ERP and other Tier 1 apps to reduce the risk of a disaster, and recover quickly should one strike? Learn about the infrastructure and practices that Ellucian’s Cloud Services uses to minimize the impact of a disaster on your Banner systems.

AGENDA
• Tier 1 Apps
• Disaster Prevention
• Disaster Readiness
• DR Execution
• DR Options

TIER 1 APPS FOR DISASTER RECOVERY / BC
• Communication
• SIS/ERP
• LMS
• CRM
• Other Financial/Operational

RISKS THAT COULD IMPACT YOUR OPERATIONS
Causes
• Natural disasters
• Human errors
• Technological failures
Impact
• Business Interruption
• Financial
• Legal
• Reputational
[Risk matrix figure: Probability (Frequent, Likely, Occasional, Seldom, Unlikely) against Severity (Catastrophic, Critical, Moderate, Negligible), with cells rated by Priority (Extremely High, High, Moderate, Low). Example risks plotted: Hurricane/Flooding, Hit by Tornado, Security Breach, Staff Attrition, Power Outage, Demand Surge, CPU Failure, Disk Failure / Trip over a wire.]

KEY KPIS YOU CARE ABOUT AS AN IT ORGANIZATION
Downtime
• Application Availability
Performance
• User Experience for Public and Private apps
Security
• Number of Security Incidents
• Extent of compromise per Incident
Scalability
• Ability of current infrastructure to handle load
• Time to add capacity in response to demand spike
Disaster Recoverability
• Probability of a disaster affecting the datacenter
• Time to recover from a site-level disaster
Backup Currency
• Lost work product because of inefficient backup practices, and aging of backed-up data as a result
Software Currency
• Time to update to newest version after being made available by vendor
Stakeholder Support
• Effectiveness in furthering student/staff satisfaction
Costs/Investment Efficiency
• TCO to operate solution, ROI for every $$ invested

DISASTER PREVENTION
• Power
• Facility
• Network
• Hardware
• Application Architecture
• Replication
• Process

DISASTER PREVENTION - POWER
• Multiple Utilities or Stations
• A and B power grids
• All components connected A&B
• UPS
• Generator
• Generator Backup
• Fueling agreements for outage >2 days

DISASTER PREVENTION - FACILITY
• Multiple Physical Entries for Power, Network
• Hardened Walls and Roof
• Temperature – Humidity
• Secure personnel and equipment Entries
• Multi-stage Fire Detection

DISASTER PREVENTION - NETWORK
• Multiple Internet connections
• Multiple ISP providers
• Redundant firewalls
• Redundant core network
• Redundant connections for servers and storage

DISASTER PREVENTION - HARDWARE
• Redundancy is key at every level
• SAN vs. non-SAN
• Virtualization vs. Dedicated Server Hardware
• Redundant cold/warm/hot hardware in DR location

DISASTER PREVENTION - APPLICATION ARCHITECTURE
• Again… redundancy is key at every level
• DB tier and App tier
• Monitoring & alerting considerations
• Integrations
• Customization and Modifications
• Licensing

DISASTER PREVENTION - APPLICATION ARCHITECTURE
• Backup architecture considerations
  • OS is static
  • Application tier is static
  • Database backup considerations
• Database backup architecture
  • Full export (fullexp), RMAN, cold, custom hot
  • Archive vs. no-archive mode (prod vs. non-prod)
  • Data-Domain style vs. Tape architecture
• Architecture must consider RTO and RPO

DISASTER RECOVERY REPLICATION
• Backup Process
• Replication Process
• Recovery Point
• Recovery Time

DISASTER PREVENTION - PROCESS
• ITIL®
• Change Management
• Incident Management
• Shutdown / Startup Processes
• Access Control / Role Based Security
• Training

DISASTER READINESS
How do you test your readiness for disaster?
• Failover Test
• Power
• Network test
• VM test
• Application / Database test
• Monitoring test

EXECUTION WHEN YOU HAVE A SITE LEVEL DISASTER
Requires
• People & Process
• Facility to restore
• Infrastructure (Network, servers, storage, Recovery software, DNS)
• Most Recent Backups
• Prioritization
Steps
• Move IP networking from primary to DR
• Recover Virtual Machines
• Recover Databases
• Recover Apps
• Integrations to other systems

DISASTER RECOVERY STRATEGIES
Strategy                      RPO            RTO             Cost
Server Replication            Secs – Min     < Hr            $$$$$
SAN Replication               Min – Hours    Hours – Day     $$$$
VM + DB logs                  Hours – Day    Hours – Days    $$$
Offsite Tape + DR Contract    Days           Days – Weeks    $$
Offsite Tape                  Days           Months          $

SUMMARY
• DR is about
  • Planning & Testing Readiness
  • Prevention, Readiness, Execution
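
The DISASTER RECOVERY STRATEGIES table can be read as a lookup: given the RPO and RTO an institution can tolerate, pick the least expensive strategy that still meets both. The sketch below illustrates that reading only. The slides give qualitative ranges, not numbers, so the hour values here are assumptions: each range is taken at its worst case and rounded up to the next larger unit (minutes to 1 hour, hours or a day to 24 hours, days to 1 week, weeks to 30 days, months to 1 year). The names `Strategy`, `STRATEGIES`, and `cheapest_strategy` are hypothetical, and the cost tier is simply the count of `$` signs in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rpo_max_hours: float  # assumed worst-case data loss window, in hours
    rto_max_hours: float  # assumed worst-case time to restore, in hours
    cost_tier: int        # number of "$" signs in the slide's table


# Rows follow the slide's table; the hour values are illustrative
# assumptions, not figures from the presentation.
STRATEGIES = [
    Strategy("Server Replication", 1, 1, 5),
    Strategy("SAN Replication", 24, 24, 4),
    Strategy("VM + DB logs", 24, 168, 3),
    Strategy("Offsite Tape + DR Contract", 168, 720, 2),
    Strategy("Offsite Tape", 168, 8760, 1),
]


def cheapest_strategy(rpo_hours: float, rto_hours: float) -> Optional[Strategy]:
    """Return the lowest-cost strategy whose assumed worst-case RPO and RTO
    both fit within the tolerated values, or None if no row qualifies."""
    candidates = [
        s for s in STRATEGIES
        if s.rpo_max_hours <= rpo_hours and s.rto_max_hours <= rto_hours
    ]
    return min(candidates, key=lambda s: s.cost_tier, default=None)


if __name__ == "__main__":
    choice = cheapest_strategy(rpo_hours=24, rto_hours=168)
    print(choice.name if choice else "No strategy meets these targets")
```

Under these assumed bounds, a tolerance of 24 hours of data loss and one week of downtime selects "VM + DB logs", while targets tighter than one hour match no row. Real selection would replace the assumed hours with values measured in failover tests.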