IT Major Incident Management Response and Communication

A Collaborative Approach to IT Major Incident Management
Nancy Proctor
Chris Wright
Kevin Chenoweth
October 2012
Collaborative Approach to IT Major Incident Management
• Objective: Expeditious Return to Normal Operations
– Managed
– Measured
– Deliberate
– Safe
• Collaborative effort: IT; Hospital Administration; VUMC Communications; (Inpatient) Systems Support Services; (Outpatient) Operations Systems Engineering; Operational Units; others as needed
IT Major Incident Definition*
• High impact, or potentially high operational impact
• Requires a response that is above and beyond that given to normal incidents. Typically requires:
– Cross-departmental coordination
– Management escalation
– Mobilization of additional resources
– Increased communications
*ITIL Incident Management
IT Major Incident Phased Response and Escalation Points
• Phase 1: From IT incident awareness to activation of the IT Technical Conference Call Bridge
• Phase 2: From activation of the IT Technical Conference Call Bridge to activation of the Administrative Conference Call Bridge
• Phase 3: From activation of the Administrative Conference Call Bridge to return to normal operations
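The three phases can be read as a small state derivation driven by the two bridge activations. The following is a minimal sketch under that reading; the names `Phase` and `current_phase`, and the extra `NORMAL` state, are illustrative assumptions and not part of the VUMC procedure.

```python
from enum import Enum


class Phase(Enum):
    PHASE_1 = 1  # incident awareness -> IT Technical bridge (CCB#1) activated
    PHASE_2 = 2  # CCB#1 active -> Administrative bridge (CCB#2) activated
    PHASE_3 = 3  # CCB#2 active -> return to normal operations
    NORMAL = 4   # assumption: a label for "normal operations restored"


def current_phase(ccb1_active: bool, ccb2_active: bool, resolved: bool) -> Phase:
    """Derive the response phase from the escalation points listed above."""
    if resolved:
        return Phase.NORMAL
    if ccb2_active:
        return Phase.PHASE_3
    if ccb1_active:
        return Phase.PHASE_2
    return Phase.PHASE_1
```

For example, `current_phase(True, False, False)` yields `Phase.PHASE_2`, since the technical bridge is open but the administrative bridge has not yet been requested.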
Communication Protocols
• Early “Heads Up” Advisory to Hospital Administrative Coordinators (via text page)
• Technical IT Conference Call Bridge (CCB#1)
– Reserved for IT troubleshooting and internal IT communications only
• Administrative Conference Call Bridge (CCB#2)
– Reserved for IT and Hospital Administration communication, and coordination of Operational response
– Activated by IT on request from Hospital Administration
• Medical Center Alerts and Communications
– IT and Operations jointly approve content of alerts and medical center communications
Roles and Responsibilities
Incident Manager
• Helpdesk Manager On Call assumes the role of Incident Manager*
– Coordinates initial incident response and IT impact assessment
– Liaises with Helpdesk, workgroups, and Informatics Admin On Call
– Communicates IT impact assessment and status updates to Informatics Admin On Call (via DR Admin on Administrative Conference Call Bridge, if activated)
– Maintains list of impacted systems and status
*Currently the Helpdesk Manager, but this may change
Roles and Responsibilities
Informatics Administrator On Call (IC AOC)
– Provides IT leadership and direction
– Communicates impact assessment and status updates to Hospital Administration (ACs and AOC, depending on “phase”)
– Presence on either Conference Call Bridge dependent on Incident Management Phase
– Provides input to, and approves content of, enterprise communications
– IT Counterpart to Hospital Administrator On Call
Roles and Responsibilities
Systems Support Services/Operations Systems Engineering
• Receive initial notification from Helpdesk
– Receive notification when Technical CCB is opened; SSS Primary On Call will join the bridge to receive status briefing
– Escalate to Systems Support Management as necessary
– Assist with impact assessment(s), workarounds, end-user communications, resource requirements, issue verification, and recovery verification
– Advise on the need for house-wide downtime
– Assess need for StarPanel banner message; provide input to IT AOC and Hospital ACs on content
– Receive notification when Administrative CCB is activated
• Interact with Informatics Center Admin On Call, Hospital ICs, IT Leadership Liaison, Hospital AOCs on CCB#2 (Admin Bridge Line)
Roles and Responsibilities
Hospital AC (Administrative Coordinator)
• Receives initial “heads up” text alert, notification of IT Conference Call Bridge activation (but does not join call), and IT impact assessment from IC AOC
• Conducts operational impact assessment
• Determines need to activate Administrative Conference Call Bridge (CCB#2)
• Determines need for overhead announcements, alerts, and enterprise communications
• Collaborates with IC AOC on content of announcements, alerts, and enterprise communications
• Engages other Operational (non-IT) resources as necessary
• Determines need to escalate within Hospital Administration hierarchy
Roles and Responsibilities
Hospital AOC (Administrator On Call)
• Receives initial notification and/or updates from Hospital Administrative Coordinator
• Joins Administrative Conference Call Bridge
• Leads Operational Response Activities
• Establishes communication with IC AOC via Administrative Conference Call Bridge
• Determines need to activate Emergency Operations Center (EOC)
• Determines need for overhead announcements, alerts, and enterprise communications
• Collaborates with IC AOC on content of announcements, alerts, and enterprise communications
• Engages other Operational (non-IT) resources as necessary
Phase 1 flowchart (steps listed by their diagram labels; connecting arrows not recoverable)
1 Incident/Event reported
1a Helpdesk alerted to incident
1b Automated (monitored) event alerts to workgroups
2 Major Incident? (2a No; 2b Yes)
Note: VUMC Helpdesk defines a Major Incident as an incident that has generated 3 or more end-user reports about the same thing
3 Standard Incident Response
4 Incident Resolved? (4a Yes; 4b No)
5 Close Incident
6 HD notifies: HD Manager, IT Workgroups
7 HD and Workgroups “work” the incident
8 Need for CCB#1? (8a Yes; 8b No)
9 Need to “Alert” Hosp ACs, IC AOC, SSS? (9a Yes; 9b No)
10 Send Alert
11 Helpdesk initiates CCB#1 activation
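The Helpdesk rule in the Phase 1 flowchart (3 or more end-user reports about the same thing) can be sketched as a simple count. This is a minimal illustration only: the function name, the constant name, and the idea of a per-report `issue_key` label used to group reports "about the same thing" are assumptions, not part of the Helpdesk tooling.

```python
from collections import Counter

# Threshold taken from the Helpdesk definition of a Major Incident.
MAJOR_INCIDENT_REPORT_THRESHOLD = 3


def major_incident_issues(issue_keys):
    """Return the issues that have reached the Major Incident threshold.

    `issue_keys` holds one hypothetical label per end-user report, so that
    reports about the same thing share the same label.
    """
    counts = Counter(issue_keys)
    return sorted(
        key for key, n in counts.items()
        if n >= MAJOR_INCIDENT_REPORT_THRESHOLD
    )
```

Issues below the threshold would follow the standard incident response path; those at or above it would proceed to the major incident notifications.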
Phase 2 flowchart (steps listed by their diagram labels; connecting arrows not recoverable)
11 Helpdesk initiates CCB#1 activation (From Phase 1)
12 Helpdesk contacts Computer Operations to activate CCB#1
13 Computer Operations activate and host CCB#1
14 HD notifies: HD Manager, IC AOC, IT Workgroups, Systems Support
15 HD alerts Hospital ACs
16 IT Impact Assessment
17 Hospital AC conducts Operational Impact Assessment
18 IC Admin and Hospital ACs confer
19 Initiate Admin Conf Call Bridge #2? (19a No; 19b Yes)
20 Helpdesk initiates activation of Conf Call Bridge #2
Phase 3 flowchart (steps listed by their diagram labels; connecting arrows not recoverable)
20 Helpdesk initiates activation of Conf Call Bridge #2 (From Phase 2)
21 HD contacts CompOps to activate CCB#2
22 CompOps activate CCB#2
23 Hosp AC, IC Admin, SSS join Admin CCB#2
24 IT Incident Manager on CCB#1 provides updates to CCB#2
25 IC Admin provides updates to Hosp AC
26 Hosp AC, IC AOC, IT Liaison, SSS collaborate to manage incident
27 IT Major Incident Resolved? (27a Yes; 27b No)
28 Operational Impact resolved? (branches labeled 29a Yes; 29b No)
29 Continue to work Operational impact
30 Close CCB#1 and CCB#2
31 Return to Normal Operations
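The two closing decisions in the Phase 3 flowchart (IT Major Incident resolved? Operational Impact resolved?) suggest that the bridges close only once both are answered yes. The sketch below encodes that reading. The function name is hypothetical, and the branch targets are assumptions, because the flowchart's connecting arrows did not survive extraction.

```python
def next_action(it_incident_resolved: bool, operational_impact_resolved: bool) -> str:
    """Pick the next step near the end of Phase 3 (branch targets assumed)."""
    if not it_incident_resolved:
        # Assumption: the "No" branch of decision 27 returns to step 26.
        return "collaborate to manage incident"
    if not operational_impact_resolved:
        # Step 29 in the flowchart.
        return "continue to work operational impact"
    # Steps 30 and 31 in the flowchart.
    return "close CCB#1 and CCB#2; return to normal operations"
```

Under this reading, an IT fix alone does not end the response: the operational impact must also be cleared before both bridges are closed.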