IT Major Incident Management Response and Communication
A Collaborative Approach to IT Major Incident Management
Nancy Proctor, Chris Wright, Kevin Chenoweth
October 2012

Collaborative Approach to IT Major Incident Management
• Objective: Expeditious Return to Normal Operations
  – Managed
  – Measured
  – Deliberate
  – Safe
• Collaborative effort: IT; Hospital Administration; VUMC Communications; (Inpatient) Systems Support Services; (Outpatient) Operations Systems Engineering; Operational Units; Others as needed

IT Major Incident Definition*
• High, or potentially high, operational impact
• Requires a response that is above and beyond that given to normal incidents. Typically requires:
  – Cross-departmental coordination
  – Management escalation
  – Mobilization of additional resources
  – Increased communications
*ITIL Incident Management

IT Major Incident Phased Response and Escalation Points
• Phase 1: From IT Incident Awareness to Activation of IT Technical Conference Call Bridge
• Phase 2: From Activation of IT Technical Conference Call Bridge to Activation of Administrative Conference Call Bridge
• Phase 3: From Activation of Administrative Conference Call Bridge to Return to Normal Operations

Communication Protocols
• Early “Heads Up” Advisory to Hospital Administrative Coordinators (via text page)
• Technical IT Conference Call Bridge (CCB#1)
  – Reserved for IT troubleshooting and internal IT communications only
• Administrative Conference Call Bridge (CCB#2)
  – Reserved for IT and Hospital Administration communication, and coordination of Operational response
  – Activated by IT on request from Hospital Administration
• Medical Center Alerts and Communications
  – IT and Operations jointly approve content of alerts and medical center communications

Roles and Responsibilities
Incident Manager
• Helpdesk Manager On Call assumes the role of Incident Manager*
  – Coordinates initial incident response and IT impact assessment
  – Liaises with Helpdesk, workgroups, and Informatics Admin On Call
  – Communicates IT impact assessment and status updates to Informatics Admin On Call (via DR Admin on Administrative Conference Call Bridge, if activated)
  – Maintains list of impacted systems and status
*Currently Helpdesk Manager, but this may change

Roles and Responsibilities
Informatics Administrator On Call (IC AOC)
  – Provides IT leadership and direction
  – Communicates impact assessment and status updates to Hospital Administration (ACs and AOC, depending on “phase”)
  – Presence on either Conference Call Bridge depends on Incident Management Phase
  – Provides input to, and approves content of, enterprise communications
  – IT counterpart to Hospital Administrator On Call

Roles and Responsibilities
Systems Support Services/Operations Systems Engineering
• Receive initial notification from Helpdesk
  – Receive notification when Technical CCB is opened; SSS Primary On Call will join the bridge to receive status briefing
  – Escalate to Systems Support Management as necessary
  – Assist with impact assessment(s), workarounds, end-user communications, resource requirements, issue verification, and recovery verification
  – Provide advice regarding need for House-wide downtime
  – Assess need for StarPanel banner message; provide input to IT AOC and Hospital ACs on content
  – Receive notification when Administrative CCB is activated
• Interact with Informatics Center Admin On Call, Hospital ICs, IT Leadership Liaison, Hospital AOCs on CCB#2 (Admin Bridge Line)

Roles and Responsibilities
Hospital AC (Administrative Coordinator)
• Receives initial “heads up” text alert, notification of IT Conference Call Bridge activation (but does not join call), and IT impact assessment from IC AOC
• Conducts operational impact assessment
• Determines need to activate Administrative Conference Call Bridge (CCB#2)
• Determines need for overhead announcements, alerts, and enterprise communications
• Collaborates with IC AOC on content of announcements, alerts, and enterprise communications
• Engages other Operational (non-IT) resources as necessary
• Determines need to escalate within Hospital Administration hierarchy

Roles and Responsibilities
Hospital AOC (Administrator On Call)
• Receives initial notification and/or updates from Hospital Administrative Coordinator
• Joins Administrative Conference Call Bridge
• Leads Operational Response Activities
• Establishes communication with IC AOC via Administrative Conference Call Bridge
• Determines need to activate Emergency Operations Center (EOC)
• Determines need for overhead announcements, alerts, and enterprise communications
• Collaborates with IC AOC on content of announcements, alerts, and enterprise communications
• Engages other Operational (non-IT) resources as necessary

[Flowchart, steps 1 to 11]
1 Incident/Event reported
  1a Helpdesk alerted to incident
  1b Automated (monitored) event alerts to workgroups
2 Major Incident?
  2a No
  2b Yes
  Note: VUMC Helpdesk defines a Major Incident as an incident that has generated 3 or more end-user reports about the same issue
3 Standard Incident Response
4 Incident Resolved?
  4a Yes
  4b No
5 Close Incident
6 HD notifies: HD Manager, IT Workgroups
7 HD and Workgroups “work” the incident
8 Need for CCB#1?
  8a Yes
  8b No
9 Need to “Alert” Hosp ACs, IC AOC, SSS?
  9a Yes
  9b No
10 Send Alert
11 Helpdesk initiates CCB#1 activation

[Flowchart, steps 11 to 20]
11 Helpdesk initiates CCB#1 activation (From Phase 1)
12 Helpdesk contacts Computer Operations to activate CCB#1
13 Computer Operations activates and hosts CCB#1
14 HD notifies: HD Manager, IC AOC, IT Workgroups, Systems Support
15 HD alerts Hospital ACs
16 IT Impact Assessment
17 Hospital AC conducts Operational Impact Assessment
18 IC Admin and Hospital ACs confer
19 Initiate Admin Conf Call Bridge #2?
  19a No
  19b Yes
20 Helpdesk initiates activation of Conf Call Bridge #2

[Flowchart, steps 20 to 31]
20 Helpdesk initiates activation of Conf Call Bridge #2 (From Phase 2)
21 HD contacts CompOps to activate CCB#2
22 CompOps activates CCB#2
23 Hosp AC, IC Admin, SSS join Admin CCB#2
24 IT Incident Manager on CCB#1 provides updates to CCB#2
25 IC Admin provides updates to Hosp AC
26 Hosp AC, IC AOC, IT Liaison, SSS collaborate to manage incident
27 IT Major Incident Resolved?
  27a Yes
  27b No
28 Operational Impact resolved?
29 Continue to work Operational impact
  29a Yes
  29b No
30 Close CCB#1 and CCB#2
31 Return to Normal Operations
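
The phase boundaries and the Helpdesk's major-incident threshold described in this deck can be illustrated with a short sketch. This is only an illustration under stated assumptions, not a tool the authors describe: the `IncidentState` record, its field names, and both functions are hypothetical, and treating "IT incident resolved and operational impact resolved" as the return to normal operations is an assumption drawn from the last steps of the flow.

```python
from dataclasses import dataclass
from enum import Enum

# From the deck: a Major Incident has generated 3 or more end-user
# reports about the same issue.
MAJOR_INCIDENT_REPORT_THRESHOLD = 3


class Phase(Enum):
    PHASE_1 = "IT incident awareness, before CCB#1 is activated"
    PHASE_2 = "CCB#1 active, before CCB#2 is activated"
    PHASE_3 = "CCB#2 active, until return to normal operations"
    NORMAL = "Normal operations"


@dataclass
class IncidentState:
    """Hypothetical record of an incident; field names are illustrative."""
    report_count: int = 0
    ccb1_active: bool = False  # Technical IT Conference Call Bridge
    ccb2_active: bool = False  # Administrative Conference Call Bridge
    it_resolved: bool = False
    operational_impact_resolved: bool = False


def is_major_incident(state: IncidentState) -> bool:
    """Apply the Helpdesk's report-count definition of a Major Incident."""
    return state.report_count >= MAJOR_INCIDENT_REPORT_THRESHOLD


def current_phase(state: IncidentState) -> Phase:
    """Map bridge activation to the three response phases.

    Assumption: normal operations resume once both the IT incident and
    the operational impact are resolved.
    """
    if state.it_resolved and state.operational_impact_resolved:
        return Phase.NORMAL
    if state.ccb2_active:
        return Phase.PHASE_3
    if state.ccb1_active:
        return Phase.PHASE_2
    return Phase.PHASE_1
```

For example, `current_phase(IncidentState(report_count=5, ccb1_active=True))` evaluates to `Phase.PHASE_2`, because the technical bridge is open but the administrative bridge has not yet been requested.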