What is a Major Incident - e

Service Interrupted…
AHS Experience with IT Major Incidents &
Clinical Involvement
e-Health 2013
Challenges & Opportunities TEC Talk
Wendy Tegart, Provincial Director Service Management
Jill Robert, IT Strategic Partner
Service Interrupted… TEC Talk
Faculty/Presenter Disclosure
• Faculty: Wendy Tegart & Jill Robert
• Relationships with commercial interests:
– Grants/Research Support: Not applicable
– Speakers Bureau/Honoraria: Not applicable
– Consulting Fees: Not applicable
– Other: Employees of Alberta Health Services
Nothing to Disclose
www.albertahealthservices.ca
Agenda
1. Alberta Health Services Overview
2. Major Incident Process
3. Major Incident Roles
4. Communication Approach
5. Clinical Involvement
6. Next Steps
7. Questions
Alberta Health Services Overview
Alberta Health Services (AHS)
Responsible for delivering health services to the
3.8 million people living in Alberta, across a service
area of 661,848 square kilometers
Annual Service Volumes (2011-12)
Acute Care
• 2,029,191 Emergency Department Visits
• 376,115 Hospital Discharges
• 2,602,384 Total Hospital Days
• 50,099 Births
• 99 Acute care hospitals and 5 stand-alone psychiatric facilities
Primary Care
• 104,704 Home Care Clients
• 766,146 Health Link calls
• 393,964 EMS Calls/Events
AHS Scale of Effort
• Largest Employer in Alberta, 5th largest in Canada
◦ 100,000 employees
◦ 7,000 physicians
◦ 120,000 network IDs
• Scope of AHS-IT
◦ 1,514 production apps (163 critical)
◦ 34 data centers
◦ 4,721 servers (physical and virtual)
◦ 75,000 workstations
◦ 48,000 tickets generated monthly
◦ 550 concurrent users in ITSM tool
◦ 1,300 IT Staff (+ outsourced partners)
Major Incident Process
Context to Current Realities
• Complexities of Electronic Health Record in Alberta
• Local vs Provincial IT service delivery
• Given the complexities of the AHS IT landscape, aging and varied
technical infrastructure and critical service requirements to support
patient care...
“Downtimes happen...”
How do we minimize organizational and clinical impact and provide robust
support when the technology fails?
Super Bowl 2013 – infamous power outage
What is a Major Incident (MI)?
• IT has a provincial Incident Management Process to manage all Incidents.
When an Incident is of a certain scale, scope, or impact, a “Major”
Incident is launched.
• The goal of the Incident process is to return an IT Service to operational
status.
• Throughout AHS-IT, we employ this common process to ensure that
major IT service issues are quickly identified and appropriately
responded to.
• The purpose of the MI process is to supplement the Incident process
with additional resources, escalation, communication and record
keeping.
Is this a “Critical” Incident?
Urgency and Impact must both be High to create a critical incident. Critical
Incidents must be escalated to the IT Incident Response On Call (IROC) immediately.
Critical incidents are:
– a major outage affecting a large number of customers
– an outage of an essential service and/or a business unit for which no resolution
or workaround is available to return to business operations
Must also consider:
– Patient safety may be at risk, or the effectiveness of patient care may be reduced
– The safety of AHS staff and personnel
– Impact to the confidentiality or reliability of data
– Degradation of a service, including data, applications, or infrastructure
– A Senior Admin from the business is requesting that a Major Incident be declared
(requires immediate escalation to IROC)
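The escalation rule on this slide can be sketched in a few lines of Python. This is an illustration only, not the AHS tooling: the three-step `Level` scale and the names `Incident`, `is_critical` and `must_escalate_to_iroc` are assumptions, since the actual Urgency and Impact scales appear only as images on the following slides.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale; only HIGH is named in the slide text."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Incident:
    urgency: Level
    impact: Level
    # True when a Senior Admin from the business asks for an MI to be declared.
    senior_admin_request: bool = False


def is_critical(incident: Incident) -> bool:
    """Critical only when Urgency and Impact are both High."""
    return incident.urgency == Level.HIGH and incident.impact == Level.HIGH


def must_escalate_to_iroc(incident: Incident) -> bool:
    """Critical incidents and Senior Admin requests both go to IROC immediately."""
    return is_critical(incident) or incident.senior_admin_request
```

The other considerations (patient safety, staff safety, data confidentiality, service degradation) are judgement inputs to Urgency and Impact, so they are deliberately left out of the sketch.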
Is this a “Critical” Incident? Urgency
Is this a “Critical” Incident? Impact
Priority
Major Incidents by Month
Major Incident Roles
IT Major Incident Roles
An IT service Incident is typically managed by the IT Service Desk and/or a
specific IT Service team. When an MI is initiated, additional resources are
brought in, including:
IT Incident Response On Call (IROC)
This is a group of IT Directors who share an On Call responsibility for MIs. Once contacted, the
IROC is responsible for managing the MI Process so the Service Desk and Service team can
concentrate on resolving the Incident.
IT Security & Compliance On Call
On Call IT Security staff who respond to MIs with a security component.
IT Senior Leader On Call
This group of IT senior leaders is available to provide additional guidance and authority if/as
required by the particular MI.
Problem Manager
Chairs and facilitates communication bridge meetings. Notifies IT staff of updates.
Clinical Roles
Not all MIs require the engagement of clinical experts, but when required
these roles provide context on clinical impact and urgency.
Clinical Informatics
This is a group of physicians and non-physicians.
Clinical Operations Administrator On-call
On Call AHS leaders, including Executive Directors and Site Administrators. May provide front
line resources to support downtime and reconciliation efforts.
Senior Leadership On-call
This group of AHS senior leaders includes Facility Medical Directors and VPs.
Health Information Management
Health Record Management experts with data and record integrity expertise.
Zonal Emergency Operations Centres (ZEOCs)
Tied into Emergency Preparedness.
Communication Approach
Bridge Types (conference calls)
Technical Bridge
• Part of the Incident Management process, and is initiated independently of the MI process
• Opened when collaboration by several parties is required during incident resolution
activities
Communications Bridge
• Launched by IROC to bring the right stakeholders together to identify the problem and
direct its resolution
• Problem Manager assists by recording chronology, participants, decisions and results
• Directs communications within IT and the user community
Clinical Bridge
• Usually chaired by a Clinical Informatics physician
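One way to read the three bridge types is as a small decision function. The sketch below is a hedged interpretation, not the documented AHS procedure: the function name, its parameters, and in particular the assumption that a Clinical Bridge is opened only within an MI that needs clinical engagement are illustrative choices.

```python
from enum import Enum
from typing import List


class Bridge(Enum):
    TECHNICAL = "Technical Bridge"
    COMMUNICATIONS = "Communications Bridge"
    CLINICAL = "Clinical Bridge"


def bridges_to_open(major_incident: bool,
                    multi_party_resolution: bool,
                    clinical_engagement: bool) -> List[Bridge]:
    """Return the conference bridges suggested by the slide's descriptions."""
    bridges: List[Bridge] = []
    # The Technical Bridge belongs to the Incident process, so it does not
    # depend on whether an MI has been declared.
    if multi_party_resolution:
        bridges.append(Bridge.TECHNICAL)
    if major_incident:
        # IROC launches the Communications Bridge for the MI.
        bridges.append(Bridge.COMMUNICATIONS)
        # Assumption: the Clinical Bridge is tied to an MI with clinical impact.
        if clinical_engagement:
            bridges.append(Bridge.CLINICAL)
    return bridges
```

For example, `bridges_to_open(False, True, False)` yields only the Technical Bridge, which reflects that it can run without an MI being declared.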
MI Heads Up Notification
Incident Ticket 12345
Start Date and Time
Please be aware an MI has just been declared for <Service>.
Full impact is still being assessed but at this point we have identified the following stakeholders and
groups as affected by this issue: <groups and stakeholders>
If your team is directly or indirectly responsible for this Service, please attend the appropriate
bridge calls set out below.
Conference Bridge Information
Communication Bridge: <number>
Passcode: <number>
Start Time: <time>
Technical Bridge: (if applicable/tbd)
Passcode: (if applicable/tbd)
Start Time: <time>
Clinical Bridge: (if applicable/tbd)
Passcode: (if applicable/tbd)
Start Time: <time>
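A notification like this lends itself to being filled in from structured data. The following is a minimal sketch that assumes the template wording above; `BridgeInfo`, `bridge_lines` and `heads_up` are hypothetical names, and using the "(if applicable/tbd)" text for any missing field, including the start time, is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional

TBD = "(if applicable/tbd)"  # placeholder text taken from the template


@dataclass(frozen=True)
class BridgeInfo:
    """Dial-in details for one bridge; None means not yet known."""
    number: Optional[str] = None
    passcode: Optional[str] = None
    start_time: Optional[str] = None


def bridge_lines(label: str, info: BridgeInfo) -> List[str]:
    """Format one bridge, falling back to the TBD text for missing fields."""
    return [
        f"{label}: {info.number or TBD}",
        f"Passcode: {info.passcode or TBD}",
        f"Start Time: {info.start_time or TBD}",
    ]


def heads_up(ticket: str, start: str, service: str, affected: List[str],
             communication: BridgeInfo,
             technical: BridgeInfo = BridgeInfo(),
             clinical: BridgeInfo = BridgeInfo()) -> str:
    """Render the MI Heads Up Notification as plain text."""
    lines = [
        f"Incident Ticket {ticket}",
        f"Start Date and Time: {start}",
        f"Please be aware an MI has just been declared for {service}.",
        "Full impact is still being assessed but at this point we have "
        "identified the following stakeholders and groups as affected by "
        "this issue: " + ", ".join(affected),
        "If your team is directly or indirectly responsible for this Service, "
        "please attend the appropriate bridge calls set out below.",
    ]
    lines += bridge_lines("Communication Bridge", communication)
    lines += bridge_lines("Technical Bridge", technical)
    lines += bridge_lines("Clinical Bridge", clinical)
    return "\n".join(lines)
```

Only the Communication Bridge is required here, since the template marks the Technical and Clinical bridges as "if applicable".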
Communications to Customers
IT - Service Issue Information
Message may be sent to users of IT Services in relation to unexpected/unplanned service issues. Say
who this information is intended for / pertains to. Speak in terms the customer will understand.
Briefly and directly tell users what is happening and what impact they will experience. Note that IT
teams are working to resolve the issue and restore Services.
Acknowledge the issue/inconvenience and provide contact information for the relevant zone/FHE
Service Desk. If appropriate, state that an update will be provided within a specific timeframe.
Replace all text in this section with pertinent information. Review the notifications guide if there are questions
on when to use this format.
Impact Summary
Clearly state what, from the user's perspective, is not working. Also set out the specific locations
affected by this Service Issue.
NOTE: Any exclusions or caveats to what you've stated above regarding this Service Issue.
Replace all text in this section with pertinent information. Review the notifications guide if there are questions
on when to use this format.
MI Root Cause Code Definitions
Cause Code: Summary
Application/Software Bug: The failure is caused by a problem within the packaged software itself.
Communication: Failure is caused by a missed communication.
Data: Unexpected or corrupted data elements caused the failure.
Environment: The failure is caused by an uncontrolled element of the physical world where redundancy would not have reasonably mitigated the effect.
Equipment: Failure due to age, malfunction or fault in the physical equipment where redundancy would not have reasonably mitigated the effect.
IT Third Party Vendor: Root cause lies with the vendor providing a service.
Process: Missing or undeveloped process caused the failure. There was an oversight in the process; a branch of the process isn’t properly developed or was missed entirely.
Security: An IT Security failure caused the issue.
Training: The failure was caused by lack of understanding, incorrect qualification or insufficient training.
Other: A mistake was made where existing process, if followed correctly, should have avoided the failure.
Unknown: Root cause undetermined.
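A fixed list of cause codes is easiest to report on when it is stored as an enumeration instead of free text. The sketch below assumes the eleven codes listed on this slide; `CauseCode`, `parse_cause_code` and `tally` are hypothetical helper names, not part of any AHS system.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class CauseCode(Enum):
    APPLICATION_SOFTWARE_BUG = "Application/Software Bug"
    COMMUNICATION = "Communication"
    DATA = "Data"
    ENVIRONMENT = "Environment"
    EQUIPMENT = "Equipment"
    IT_THIRD_PARTY_VENDOR = "IT Third Party Vendor"
    PROCESS = "Process"
    SECURITY = "Security"
    TRAINING = "Training"
    OTHER = "Other"
    UNKNOWN = "Unknown"


def parse_cause_code(label: str) -> CauseCode:
    """Map a label to a code, ignoring case and surrounding whitespace."""
    cleaned = label.strip().casefold()
    for code in CauseCode:
        if code.value.casefold() == cleaned:
            return code
    # Unrecognised labels are rejected, not silently filed under Unknown,
    # because Unknown means "root cause undetermined".
    raise ValueError(f"not a defined MI root cause code: {label!r}")


def tally(labels: Iterable[str]) -> Counter:
    """Count closed MIs per cause code, e.g. for a periodic review."""
    return Counter(parse_cause_code(label) for label in labels)
```

Rejecting unknown labels keeps typos out of the counts, which matters when the tallies feed incident reviews.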
Clinical Involvement
Clinical Support During an MI
• Transparency
• Communication
– Understanding and translating the clinical impact
– Timely and frequent “clinical speak” communication about
the incident and immediate risk mitigation measures
• Support
– Robust downtime procedures owned by clinical operations
– Bedside to boardroom engagement and support
More than clinical involvement...
it’s about relationships,
partnerships and supporting
safe patient care!
Next Steps
• Continuously improve through per-incident reviews
• Develop overall service improvement plans driven by business
requirements
• Examine different scales of MIs and support requirements
• Leverage the successes of the MI process in other risk areas
• Continually examine clinical business risk tolerance/value and
architecture of information systems
• Simplify – application consolidation, migration to a provincial patient
care platform and large scale reliability/redundancy
Higdon’s Law
1. Good judgement comes from bad experience.
2. Experience comes from bad judgement.
Questions?
Comments / Questions
Insanity: Doing the same thing over
and over again and expecting
different results. ~ Albert Einstein
Jill.Robert@albertahealthservices.ca
Wendy.Tegart@albertahealthservices.ca