Incident Management Process Handbook
Griffith University
Version 1.7
Version History
Version No | Issue Date | Nature of Amendment | Editor
0.3 | July 2003 | First draft | Patrick Keogh (LucidIT)
0.4 | 8/3/2004 | Updates to priority and response/resolution table. Minor updates | John Scullen
0.5 | 22/3/2004 | Updates from quality review walkthrough | John Scullen
0.6 | 24/3/2004 | Updates suggested by ICTS-MT. Response definition adjusted and diagram included. Response times adjusted for priority 1 and 2. Hierarchical escalation modified to ICTS-MT specifications. | John Scullen
0.7 | 6/4/2004 | Minor updates resulting from final v0.6 distribution. | Merril Rogers
1.0 | 20/4/2004 | Minor updates from Geoff Dengate. Approved by Project Board | John Scullen
1.1 | 13/07/2006 | Revision | Sanja Tadic, Judy Bromage, Julie Aslett
1.2 | 02/10/2006 | Minor updates resulting from consultations with Product and Service Managers | Sanja Tadic, Julie Aslett
1.3 | 22/06/2007 | Minor update | Sanja Tadic
1.4 | 29/09/2008 | Minor update to document major outage procedure | Sanja Tadic, Naveen Sharma
1.5 | 30/07/2010 | Minor update to document procedure for incorrect group assignment. Update to Incident Management Quick Reference Guide Appendix A. Update to Incident Categories Appendix B. Update of name change from InfoServices to Library and IT Help. References to EITS changed to CTS. | Felicity Berends, Sanja Tadic
1.6 | 30/4/2013 | Minor updates of SDT name from LibraryandITHelp@Griffith to Service Desk tool. Minor corrections to role titles. Update to Quick Reference Guide (Appendix A) and Incident Categories spider chart (Appendix B) | Felicity Berends
Distribution
Ver # | Recipient | Date issued | Reason for distribution
0.4 | Andrew Bowness, Wendy Balachandran, Christine Schafer, Regina Obexer, Matt Maynard, Carol O’Faircheallaigh, Carolyn Plant, Karl Turnbull, Rowan Salt, Geoff Mitchell, Sandra Reis | 8/3/2004 | Initial release prior to quality review
0.6 | ICTS-MT, Chris Walker, Paul Jardine | 26/3/2004 | Circulated following suggested updates
0.7 | Project Board | 6/4/2004 | Approval
1.0 | ITIL Steering Committee | 21/4/2004 | FYI
1.1 | ITIL Steering Committee | 22/08/2006 | Approval
1.1 | INS Product and Service Managers (currently using LibraryandITHelp@Griffith) | 12/09/2006, 29/09/2006 | FYI
1.5 | INS Product and Service Managers (currently using LibraryandITHelp@Griffith) | 31/08/2010 | FYI
1.6 | INS Product and Service Managers (currently using the Service Desk tool) | 30/4/2013 | FYI
Print date: 31 August 2010
Filename: Incident Management Process Handbook v1_6.doc
Author(s): Patrick Keogh (Lucid IT Pty Ltd), Wendy Balachandran, John Scullen, Sanja Tadic, Felicity Berends
Table of Contents
1 Introduction
  1.1 Objective
  1.2 Scope
  1.3 Document Structure
2 The Incident Management Process
  2.1 Objectives of Incident Management
  2.2 Scope
  2.3 Incident Classification
  2.4 Process Description
  2.5 Response and Resolution Times
  2.6 Escalation
  2.7 Major Outage Procedure
3 Roles and Responsibilities
  3.1 Manager, Library and IT Help (Incident Management Process Owner)
  3.2 Library and IT Help Team Leaders (Service Desk Manager)
  3.3 First Tier (Service Desk) Support
  3.4 Resolution Groups
4 Communication Framework
  4.1 Communication Framework
  4.2 Relationship with Other Processes
  4.3 Incident Management & Problem Management
  4.4 Incident Management & Configuration Management
  4.5 Incident Management & Service Level Management
5 Performance Management
6 Management Reports
  6.1 Management Reports
7 Process Review
Appendix A: Incident Management Quick Reference Card
Appendix B: Incident Categories
Appendix C: Abbreviations and Definitions
  7.1 Abbreviations and Acronyms Used
  7.2 Definition of Terms Used
Appendix D: Unsupported Functions
1 Introduction
The main goal of Incident Management is to restore normal service operation as quickly as possible and minimise the adverse impact on business operations. The Service Desk, as the first point of contact for Information Services clients, is the owner of the Incident Management Process. The main objective of the Service Desk is to facilitate the restoration of normal operational service with minimal business impact on the client, within agreed service levels and business priorities.
These procedures are to be used by all Information Services staff handling Incidents
within the scope defined in section 2.2 “Scope”.
1.1 Objective
Information Services aims to achieve the following objectives with the Incident
Management Process:
- Capture information about an incident at the start of the process
- Give clients confidence in Information Services’ capability
- Provide consistent processes for clients
- Have clear procedures for clients on how to get help
- Better manage, and align with, client expectations
- Define the scope of the Service Desk role clearly and communicate it
- Analyse incidents to contribute to a better understanding of the underlying issues
1.2 Scope
This Handbook documents the Incident Management Process including:
- The process flow
- Roles and responsibilities
- Communication framework
- Performance Management
- Management reports
- Checklists and definitions
1.3 Document Structure
This document describes the Incident Management process for Information Services.
The document is organised as follows:
Section 2: The Incident Management Process – Provides a description of the Incident Management Process, including a high-level process description, incident classification, response and resolution times, and escalation matrices.
Section 3: Roles and Responsibilities – Describes the different roles within the Incident Management Process and the responsibilities of each role.
Section 4: Communication Framework – Explains the communication aspects in more detail, including a high-level communication framework and the relationship between Incident Management and other processes.
Section 5: Performance Management – Describes the Key Performance Indicators (KPIs) for the Incident Management Process.
Section 6: Management Reports – Provides information about the different management reports.
Section 7: Process Review – Provides information about reviewing the Incident Management Process.
The following Appendices are included at the end of the document:
Appendix A: Incident Management Quick Reference Card – Quick Reference Cards (QRC) for the Service Desk and second and third tier support groups.
Appendix B: Incident Categories – Overview of the different Incident Categories.
Appendix C: Abbreviations and Definitions – Explanation of abbreviations and definitions used in this document.
Appendix D: Unsupported Functions – Functions excluded from the scope of the Incident Management Process.
2 The Incident Management Process
Incident Management is closely associated with Problem Management: it provides categorisation and reporting of all Information Services-related incidents that occur within Griffith University, which enables root cause analysis.
The second and third tier support groups have a role within this process. The focus must be on quick solutions that restore service as quickly as possible.
After resolution, all actions taken should be fully recorded in the Service Desk tool. This will facilitate faster responses to future incidents and will free up second and third tier support groups for more proactive problem solving.
This section provides information about the Incident Management process.
2.1 Objectives of Incident Management
The objectives of Incident Management are:
- To restore normal service operation as quickly as possible; and
- To minimise the adverse impact on business operations.
2.2 Scope
An Incident is any event which is not part of the standard operation of the service and
which causes, or may cause, an interruption to, or a reduction of, the quality of the
service.
Only production systems, or systems connected to the production network, are covered
by the Incident Management process.
Excluded from the scope of this process are:
- Development activities
- Non-production activities (unless connected to the production network)
- All unsupported functions as specified in Appendix D.
2.3 Incident Classification
Classification of incidents is based on two aspects:
1. Priority of an incident: Relating to the severity of an incident; and
2. Category of an incident: Relating to the configuration item causing the incident to
occur
2.3.1 Incident Priority
The priority of an incident is determined by:
1. Impact: Impact of the incident on the business. The number of clients or
importance of system affected. The hierarchical position of the client is included in
this variable.
2. Urgency: How severely the client’s work process is affected. This influences the
timeframe that is allowed to resolve the incident.
The Impact/Urgency matrix, shown below, determines the priority of the incident.
Impact \ Urgency | Low | Medium | High
Low | 5 | 4 | 3
Medium | 4 | 3 | 2
High | 3 | 2 | 1
The assessment methodology for impact and urgency is explained in more detail in the sections below.
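Because the matrix is a fixed lookup, it can be encoded directly. The Python sketch below is illustrative only: the names `PRIORITY`, `LEVEL` and `incident_priority` are hypothetical and are not part of the Service Desk tool. It also checks a property of this particular matrix: if Low, Medium and High are scored 1, 2 and 3, the priority equals 7 minus the sum of the impact and urgency scores.

```python
# Impact/Urgency matrix from section 2.3.1, indexed as PRIORITY[impact][urgency]
PRIORITY = {
    "Low":    {"Low": 5, "Medium": 4, "High": 3},
    "Medium": {"Low": 4, "Medium": 3, "High": 2},
    "High":   {"Low": 3, "Medium": 2, "High": 1},
}


def incident_priority(impact: str, urgency: str) -> int:
    """Return the priority (1 = highest, 5 = lowest) for an impact/urgency pair."""
    try:
        return PRIORITY[impact][urgency]
    except KeyError:
        raise ValueError(
            f"impact and urgency must be Low, Medium or High, got {impact!r}/{urgency!r}"
        ) from None


# Equivalent arithmetic form for this matrix: score Low=1, Medium=2, High=3
LEVEL = {"Low": 1, "Medium": 2, "High": 3}
assert all(
    incident_priority(i, u) == 7 - (LEVEL[i] + LEVEL[u]) for i in LEVEL for u in LEVEL
)

print(incident_priority("Medium", "High"))  # 2
```

The dictionary form mirrors the table exactly and is easier to amend if the matrix changes; the arithmetic form only holds while the matrix keeps its current symmetric shape.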
2.3.1.1 Impact
Incidents will be placed into High, Medium and Low impact categories. The key factor in measuring impact is the effect the incident has on the business. Each incident will be reviewed on a case-by-case basis with appropriate impact assessment and approval based on the following criteria.
Impact | Description
High | Whole organisation affected; site or multiple sites affected; multiple groups of clients affected; critical business process interrupted; or system-wide outages to Learning@Griffith, Staff portal, or Email
Medium | Group of clients, a Pro Vice Chancellor (PVC), or a member of the Vice Chancellor’s (VC’s) Office staff affected; non-critical business process interrupted
Low | One client affected (other than VC’s Office or PVCs)
2.3.1.2 Urgency
Incidents will be placed into High, Medium and Low urgency categories. The key factor in measuring urgency is how severely the client’s work process is affected. This influences the timeframe that is allowed to resolve the incident. Each incident will be reviewed on a case-by-case basis with appropriate urgency assessment and approval based on the following criteria.
Urgency | Description
High | Process stopped; client(s) cannot work
Medium | Process affected; client(s) cannot use certain functions
Low | Process not affected; change request, new/extra/optimised function
2.3.2 Incident Category
Incident categories have been established to:
- assist with the correct assignment of incidents;
- facilitate reporting on the incident and problem management process; and
- identify priority areas for proactive problem management to focus on.
Incident categories are described in Appendix B: Incident Categories.
2.4 Process Description
The aim of the Incident Management process is to provide a standardised, high quality service to all clients who report incidents. This section provides an overview of the Incident Management process as pictured in the process flow chart. The Incident Management process is managed through the Service Desk tool.
2.4.1 Incident Detected
The process starts with the detection of an incident. An incident can originate:
- from a situation experienced by a client and reported to the Service Desk; or
- from a technical malfunction detected by clients, Information Services staff or third party vendors.
Note: The incident can be communicated to the Service Desk via the telephone, digitally or face-to-face.
Note: The Service Desk tool is a Request for Information, Incident, Problem and Change Management System.
[Figure: Incident Management process flow chart. Steps: 1. Incident Detected; 2. Acceptance, Recording and Classification; 3. Initial Support; 4. Investigation and Diagnosis; 5. Resolve Incident; 6. Incident Escalation; 7. Verify Resolution and Incident Closure; 8. Monitoring and Tracking. Decision points: Service Request? (activate Change Management process); Priority 1? (activate Priority 1 Incident procedure); Incident Resolved?]
2.4.2 Acceptance, Recording and Classification
When the incident is reported to the Service Desk, a Service Desk staff member should
first determine whether it falls within the scope of the Incident Management process. If
uncertain, this person should seek the advice of the Manager, Library and IT Help
(Incident Management Process Owner) or a Team Leader (Service Desk Manager) for
confirmation.
The Service Desk staff member is responsible for opening a new record in the Service
Desk tool and recording the incident details. The following is an example of information
that should be captured:
- Client details such as name, location, phone number, email
- Classification of the incident in terms of incident category and priority
- Detailed description of the incident and affected Configuration Items (CIs)
The Service Desk staff member classifies the incident according to the impact and urgency of the incident. Refer to the Impact/Urgency Matrix in section 2.3.1 to help determine the priority of the incident.
If an issue can be identified as a Request for Information (RFI) or a Request for Change (RFC), the RFI & RFC process is activated.
2.4.3 Initial Support
The Service Desk provides the initial support to solve the incident and, based on time
and knowledge, determines whether the incident can be solved.
If the incident cannot be solved by Service Desk, an incident reference number and a
response time (see section 2.5 “Response and Resolution Times”) will be provided.
If the Service Desk recognises incidents with similar symptoms, which have been
recently recorded, then Incident Matching can occur. Incident Matching is where similar
incidents are grouped together to reflect that there may be a larger “problem”. This
linking assists with Problem Management activities and, when a solution is found, it can
easily be transferred to all the incidents grouped together. The Service Desk tool
parent/child facility is used to record these matches/incidents.
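A minimal sketch of the parent/child grouping described above, assuming a simple in-memory record. The `Incident` class, its fields and the incident numbers are hypothetical and do not reflect how the Service Desk tool stores parent/child links; the sketch only shows how a resolution recorded on the parent can be carried across to every matched child.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    """Hypothetical minimal incident record (not the Service Desk tool's data model)."""
    number: str
    description: str
    status: str = "open"
    resolution: Optional[str] = None
    parent: Optional["Incident"] = field(default=None, repr=False, compare=False)
    children: List["Incident"] = field(default_factory=list, repr=False, compare=False)

    def link_child(self, child: "Incident") -> None:
        """Record a matching incident as a child of this (parent) incident."""
        if child.parent is not None:
            raise ValueError("incident is already linked to a parent")
        node: Optional["Incident"] = self
        while node is not None:  # refuse links that would create a cycle
            if node is child:
                raise ValueError("linking would create a cycle")
            node = node.parent
        child.parent = self
        self.children.append(child)

    def resolve(self, resolution: str) -> None:
        """Resolve this incident and transfer the resolution to all linked children."""
        self.status = "resolved"
        self.resolution = resolution
        for child in self.children:
            child.resolve(resolution)


parent = Incident("INC-1", "Users report email is unavailable")
for number in ("INC-2", "INC-3"):
    parent.link_child(Incident(number, "Cannot send email"))
parent.resolve("Mail service restarted")
print([(c.number, c.status) for c in parent.children])
# [('INC-2', 'resolved'), ('INC-3', 'resolved')]
```

`repr=False` and `compare=False` on the link fields keep the generated `__repr__` and `__eq__` from walking back and forth between parent and children.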
If the incident cannot be linked to an existing problem, the Service Desk staff member should search the Service Desk tool for similar incidents, or the Knowledge Base for a resolution. If a resolution is identified, it can be applied to the incident. If not, additional investigation and diagnosis should be carried out to resolve the incident.
2.4.4 Investigation and Diagnosis
The Service Desk staff member should first attempt to analyse the incident in search of a solution. This person can use the following information:
- Own experience and knowledge
- Service Desk tool information
- Knowledge Base entries
- Procedure manuals & other relevant documentation
- Technical information from the Internet
- Knowledge and experience of colleagues
Any additional information acquired which will be useful for resolving the incident
should be recorded in the Service Desk tool.
If a solution cannot be provided by the Service Desk, the incident is escalated to the relevant support groups (second tier, third tier or vendor) for investigation and diagnosis, to find and implement a solution/workaround for the incident. The relevant support group updates the incident record with the solution/workaround.
Note: If the Service Desk could not provide a solution, the solution eventually found should be identified as a possible entry for inclusion in the Knowledge Base, which will assist future diagnoses.
2.4.5 Resolve Incident
If a resolution of the incident can be found and implemented within the SLA time by the relevant second tier, third tier or vendor support group, the incident will be given a status of ‘resolved’. This will trigger an auto-generated email informing the client that their incident has been resolved.
Part of this activity is updating the Service Desk tool with the resolution information.
Accurate and complete recording of details is very important and is a requirement of all
parties involved in the Incident Management process. Quality information is critical for
future incident handling and restoration of normal business activities.
2.4.6 Incident Escalation
If an incident cannot be resolved by the Service Desk staff, the incident should be transferred to second or third tier support groups. The Service Desk remains responsible for the incident until it is transferred to a second tier support group.
Transfer means involving second or third tier support groups in the resolution of the
incident. When incidents are transferred, the assigned owner of the incident is
responsible for keeping the client informed of progress. This is done by any
appropriate means: face to face, telephone, manual notification from Service Desk tool
or email. In addition, all the relevant service groups are responsible for ensuring all
activity relating to an incident is suitably annotated.
There may be occasions when an incident is escalated to an incorrect group. It is the
assigned group’s responsibility to transfer the incident to the correct group and provide
information to the analyst who initially assigned the incident incorrectly to enable them
to assign incidents of this nature correctly in the future. If the incorrectly assigned group
does not know which group to assign the incident to, it is appropriate to transfer the
incident to the Service Desk for further investigation.
When Priority 1 and 2 incidents are assigned or transferred, the analyst assigning or
transferring the incident should phone the new assigned owner to advise that a Priority 1
(or 2) incident has been assigned or transferred to them.
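The reassignment rules above can be summarised as a small decision function. This is a sketch under stated assumptions: `plan_transfer`, `TransferPlan` and the group name "Network Services" are hypothetical, and the function only lists the steps a person would take; it does not call the Service Desk tool.

```python
from dataclasses import dataclass
from typing import List, Optional

SERVICE_DESK = "Service Desk"


@dataclass
class TransferPlan:
    target_group: str
    actions: List[str]


def plan_transfer(priority: int, correct_group: Optional[str]) -> TransferPlan:
    """Steps for a group that has received an incorrectly assigned incident.

    correct_group is None when the receiving group does not know the right group.
    """
    actions: List[str] = []
    if correct_group is None:
        target = SERVICE_DESK
        actions.append("Transfer the incident to the Service Desk for further investigation")
    else:
        target = correct_group
        actions.append(f"Transfer the incident to {correct_group}")
        actions.append("Tell the analyst who made the original assignment which group is correct")
    if priority in (1, 2):
        # Priority 1 and 2 transfers are also announced by phone
        actions.append(
            f"Phone {target} to advise that a Priority {priority} incident has been transferred to them"
        )
    return TransferPlan(target, actions)


for step in plan_transfer(2, "Network Services").actions:
    print("-", step)
```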
Depending on the priority, hierarchical escalation might take place as well. Hierarchical
escalation (awareness) means that higher levels of management are involved when
there is a threatened breach of service levels or additional authorisation is required for
incident resolution.
The incident notification and escalation process is explained in section 2.6.1.
Note: Updating the Internal Notes does not generate an email from the Service Desk tool.
Only confirmation, requestor communications, resolution and closure emails are sent to
the client.
2.4.7 Verify Resolution and Incident Closure
The client will be informed once the incident is resolved. The client has three business
days (Monday to Friday, excluding holidays) to confirm that the incident can be closed.
If the client does not agree that the incident has been resolved, it will be reopened and
the process will return to “Investigation and Diagnosis.”
If the client does not respond to the email requesting confirmation of closure, the incident will automatically be closed after three business days and its status changed to “closed”.
If the client responds after the incident has been closed, a new incident record will be created.
Incidents with status of “closed” should not be reopened. A new incident should be
created that refers to the original Incident record.
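The closure rules in this section amount to a small status lifecycle plus a business-day calculation. The Python sketch below assumes three statuses (open, resolved, closed), assumes the three business days are counted from the day after the incident is resolved, and leaves the list of holidays to the caller. `Status`, `change_status`, `add_business_days`, `client_responds_after_closure` and the incident numbers are hypothetical names, not Service Desk tool functions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import AbstractSet, Optional


class Status(Enum):
    OPEN = "open"          # under investigation, including after a reopen
    RESOLVED = "resolved"  # waiting for the client to confirm closure
    CLOSED = "closed"      # final: closed incidents are not reopened


# Status changes allowed by section 2.4.7
ALLOWED = {
    Status.OPEN: {Status.RESOLVED},
    Status.RESOLVED: {Status.OPEN, Status.CLOSED},  # reopened if the client disagrees
    Status.CLOSED: set(),
}


def change_status(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the process does not allow the change."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot change status from {current.value} to {new.value}")
    return new


def add_business_days(start: date, days: int,
                      holidays: AbstractSet[date] = frozenset()) -> date:
    """Date `days` business days after `start`, skipping weekends and listed holidays."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:  # Monday=0 ... Friday=4
            days -= 1
    return current


@dataclass
class Incident:
    number: str
    status: Status = Status.OPEN
    refers_to: Optional[str] = None


def client_responds_after_closure(original: Incident, new_number: str) -> Incident:
    """A closed incident stays closed; create a new record that refers to it."""
    if original.status is not Status.CLOSED:
        raise ValueError("only a closed incident needs a new, linked record")
    return Incident(number=new_number, refers_to=original.number)


# Hypothetical example: incident resolved on Thursday 2 May 2013, no client reply
resolved_on = date(2013, 5, 2)
print(add_business_days(resolved_on, 3))                      # 2013-05-07
print(add_business_days(resolved_on, 3, {date(2013, 5, 6)}))  # 2013-05-08 (hypothetical holiday)
```

Under these assumptions, an incident resolved on a Thursday auto-closes the following Tuesday, because the weekend does not count; a holiday on the Monday pushes closure to Wednesday.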
2.4.8 Monitoring and Tracking
The end-to-end progress of the incident is monitored and communicated to the client when necessary. The Service Desk tool is updated each time the status of the incident changes. Clients can monitor the status of their Incidents via the Service Desk tool, accessible from the Staff portal.
2.5 Response and Resolution Times
Response Time is the elapsed time between when the incident is recorded and when
work commences on investigation, diagnosis and resolution of the incident. The
response time can be utilised to:
- Research a solution
- Mobilise a priority team
- Request further details from the client
- Advise action taken and provide an indication of the resolution time if required
Resolution Time is the target time within which a resolution to an incident is to be implemented.
Information Services aims to resolve at least 80% of incidents within the specified resolution times; this allows for exceptional circumstances in which the standard times cannot be met. Official closure of the incident is dependent on the approval of the client. Resolution time therefore does not include the time taken for the client to contact Information Services to give approval, as this could happen some time later.
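As a worked illustration of the 80% target, the sketch below computes the share of incidents resolved within their resolution time. It assumes each record has already been reduced to business hours taken versus the target for that incident's priority, with client approval time excluded as described above; the function name and sample figures are hypothetical.

```python
from typing import Iterable, Tuple

TARGET_RATE = 0.80  # at least 80% of incidents resolved within target


def resolution_rate(records: Iterable[Tuple[float, float]]) -> float:
    """Share of incidents resolved within target.

    Each record is (business hours taken to resolve, resolution target in business hours).
    Time spent waiting for the client to approve closure should not be in the first value.
    """
    records = list(records)
    if not records:
        raise ValueError("no incidents in the reporting period")
    within = sum(1 for taken, target in records if taken <= target)
    return within / len(records)


# Hypothetical sample: five incidents, one resolved outside its target
sample = [(3.5, 4), (7.0, 8), (13.0, 12), (20.0, 27), (44.0, 45)]
rate = resolution_rate(sample)
print(f"{rate:.0%} within target; goal met: {rate >= TARGET_RATE}")
# 80% within target; goal met: True
```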
[Figure: incident timeline showing the stages Incident, Detection & report to Service Desk, Start Repair, Diagnosis, Finish Repair and Recovery, with the intervals Detection time, Response time, Repair time and Recovery time]
The following table provides an overview of response and resolution times for incidents. The times listed below relate to the response and resolution times of the support groups involved during standard business hours, Monday to Friday. Time required by external support service providers (non-Information Services) or purchasing time (should parts and/or materials need to be acquired) is excluded.
Priority | Response Time | Resolution Time
Priority 1 | 30 minutes | 4 hours
Priority 2 | 1 hour | 8 hours
Priority 3 | 4 hours | 12 hours
Priority 4 | 1 day | 3 days
Priority 5 | 2 days | 5 days
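For reporting or automation it can help to express every target in the same unit. The sketch below converts the table to business hours, assuming that "1 day" in this table means one 9-hour working day (8am to 5pm), as stated in the note under Table A. The names `TARGETS`, `to_business_hours`, `targets_in_hours` and `response_met` are hypothetical.

```python
from typing import Dict, Tuple

BUSINESS_HOURS_PER_DAY = 9  # assumption: "1 day" here means one 8am-5pm working day

# Priority -> ((response value, unit), (resolution value, unit)), from the table above
TARGETS: Dict[int, Tuple[Tuple[float, str], Tuple[float, str]]] = {
    1: ((30, "minutes"), (4, "hours")),
    2: ((1, "hours"), (8, "hours")),
    3: ((4, "hours"), (12, "hours")),
    4: ((1, "days"), (3, "days")),
    5: ((2, "days"), (5, "days")),
}


def to_business_hours(value: float, unit: str) -> float:
    """Convert a target expressed in minutes, hours or days to business hours."""
    if unit == "minutes":
        return value / 60
    if unit == "hours":
        return value
    if unit == "days":
        return value * BUSINESS_HOURS_PER_DAY
    raise ValueError(f"unknown unit {unit!r}")


def targets_in_hours(priority: int) -> Tuple[float, float]:
    """(response, resolution) targets for a priority, in business hours."""
    (r_value, r_unit), (s_value, s_unit) = TARGETS[priority]
    return to_business_hours(r_value, r_unit), to_business_hours(s_value, s_unit)


def response_met(priority: int, hours_to_first_work: float) -> bool:
    """True if work commenced within the response target."""
    return hours_to_first_work <= targets_in_hours(priority)[0]


for p in sorted(TARGETS):
    response, resolution = targets_in_hours(p)
    print(f"Priority {p}: respond within {response:g} h, resolve within {resolution:g} h")
```

On this assumption, Priority 4 works out to a 9-hour response and 27-hour resolution target, and Priority 5 to 18 and 45 hours.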
2.6 Escalation
Escalation can take place in two ways:
1. Functional escalation – This is escalation to another support group in order to solve the incident.
2. Hierarchical escalation – This is escalation in order to inform the right (management) level within Information Services for communication purposes and in order to free up the necessary resources to solve the incident.
2.6.1 Incident Notification and Escalation
Notification of Incidents occurs at defined times. Notification ensures that:
- The Business (including management) and clients are kept informed of the occurrence and progress towards resolution of an incident;
- Swift action is taken to resolve the incident;
- Management provide necessary resources to resolve the incident.
Notification is generated, depending on the priority, when:
- A higher-priority incident is recorded in the Service Desk tool (certain management levels are notified);
- An incident is assigned or transferred to a group using the Service Desk tool;
- 75% of the target resolution time has elapsed since the incident was recorded;
- The SLA is breached.
Table A details when internal notification occurs, for all priorities, once 75% of the target resolution time has elapsed and when the SLA is breached.
Table B details when additional internal and external notifications occur for each priority.
Table A
Event | Priority = elapsed time | Notification sent to | Method
75% of target resolution time elapsed | 1 = 3 hours; 2 = 6 hours; 3 = 9 hours; 4 = 2.25 days; 5 = 3.75 days | The Analyst the Incident is assigned to; Group Manager of the assigned group; Category Owner of the Incident Category | Email generated by Service Desk tool
100% (SLA breached) | 1 = 4 hours; 2 = 8 hours; 3 = 12 hours; 4 = 3 days; 5 = 5 days | The Analyst the Incident is assigned to; Group Manager of the assigned group; Category Owner of the Incident Category | Email generated by Service Desk tool
Weekly | | Report of all Incidents recorded in this period, including breaches, sent to Group Managers | Reports generated from Service Desk tool
Note: 1 working day = 9 hours, 8am to 5pm. The clock does not continue outside of these hours for the purpose of escalation notification.
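Because the escalation clock only runs during working hours, the elapsed time that triggers the Table A notifications is business time, not calendar time. The sketch below measures elapsed business hours (Monday to Friday, 8am to 5pm) and reports which Table A events have been reached, with the day-based thresholds converted at 9 hours per working day. Public holidays are not handled, and `business_hours_between`, `notifications_due` and `THRESHOLDS` are hypothetical names rather than Service Desk tool features.

```python
from datetime import datetime, time, timedelta
from typing import List

DAY_START = time(8, 0)  # working day 8am-5pm, per the note under Table A
DAY_END = time(17, 0)

# Table A thresholds in business hours: (75% of target, 100% / SLA breach)
THRESHOLDS = {
    1: (3, 4),
    2: (6, 8),
    3: (9, 12),
    4: (2.25 * 9, 3 * 9),
    5: (3.75 * 9, 5 * 9),
}


def business_hours_between(start: datetime, end: datetime) -> float:
    """Hours between start and end that fall inside Mon-Fri, 8am-5pm (no holiday handling)."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = max(start, datetime.combine(day, DAY_START))
            window_end = min(end, datetime.combine(day, DAY_END))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


def notifications_due(priority: int, recorded: datetime, now: datetime) -> List[str]:
    """Table A events whose business-hours threshold has been reached."""
    warn_at, breach_at = THRESHOLDS[priority]
    elapsed = business_hours_between(recorded, now)
    events = []
    if elapsed >= warn_at:
        events.append("75% of target resolution time elapsed")
    if elapsed >= breach_at:
        events.append("SLA breached")
    return events


# Hypothetical Priority 1 incident recorded at 4pm on Monday 6 May 2013
recorded = datetime(2013, 5, 6, 16, 0)
print(business_hours_between(recorded, datetime(2013, 5, 7, 10, 0)))  # 3.0
print(notifications_due(1, recorded, datetime(2013, 5, 7, 10, 0)))
# ['75% of target resolution time elapsed']
```

In the example only one business hour passes on Monday and two on Tuesday morning, so the 75% notification for Priority 1 falls due at 10am on Tuesday even though 18 calendar hours have gone by.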
Table B
Priority
1&2
After
Immediately after
an incident has
been assigned to a
group or
transferred to
another group
Notification sent to:
Method
Group Manager of the
assigned group
Email generated by
Service Desk tool
Category Owner of the Incident
Category
SDT Announcement is
created to notify INS staff
using SDT and/or Griffith
University Community
Incident Management Process
Owner (Manager Library and IT
Help)
Service Desk Manager (Library
and IT Help Management
Team)
The Analyst the Incident is
assigned to
1
30 minutes
(if Incident has not
been responded to
or appropriately
updated)
Product Manager, Team Leader
or Duty phone of the assigned
group
Analyst who assigns or
transfers the incident to
another support group
notifies the group via
telephone.
All INS Directors & Associate
Directors
Email generated by
Service Desk tool
PVC (INS)
Group Manager of the
assigned group
Category Owner of the Incident
Category
Incident Management Process
Owner (Manager Library and IT
Help)
Service Desk Manager (Library
and IT Help Management
Team)
The Analyst the Incident is
assigned to
Business/Clients
Library and IT Help phone
greeting may require
updating to include
information relating to the
Incident.
SDT Announcement is
created to notify INS staff
using SDT and/or Griffith
University Community
Priority
1
After
1 hour (if Incident
has not been
responded to or
appropriately
updated)
Notification sent to:
The Analyst the Incident is
assigned to
Method
Email generated by
Service Desk tool
Team Leader of the assigned
group
Business/Clients
SDT Announcement is
updated with a progress
report of the Incident.
Update Library and IT Help
phone greeting to keep
clients informed on
progress of the incident.
Email from the relevant
Director/PVC (INS) might
be sent to all staff and all
students
Priority
1
After
Hourly (if Incident
has not been
responded to or
appropriately
updated)
Notification sent to:
Method
Team Leader of the assigned
group
Email generated by
Service Desk tool
Business/Clients
SDT Announcement is
updated with a progress
report of the Incident.
Update Library and IT Help
phone greeting to keep
clients informed on
progress of the incident
Update email from the
relevant Director/PVC
(INS) might be sent to all
staff and all students
2
Immediately after an
incident has been
assigned to a group
or transferred to
another group
Product Manager, Team
Leader or Duty phone of the
assigned group
Analyst who assigns or
transfers the incident to
another support group
notifies the group via
telephone.
SDT Announcement is
created to notify INS staff
using SDT and/or
Griffith University
Community
Priority
2
After
30 minutes after an
incident has been
assigned to a group
or transferred to
another group
Notification sent to:
Team Leader of the assigned
group
Method
Email generated by
Service Desk tool
Incident Management Process
Owner
Service Desk Manager
(Library and IT Help
Management Team)
2
1 hour
(if Incident has not
been responded to
or appropriately
updated)
All INS Directors & Associate
Directors
Email generated by
Service Desk tool
PVC (INS)
Group Manager of the
assigned group
Category Owner of the
Incident Category
Incident Management
Process Owner (Manager
Library and IT Help)
Service Desk Manager
(Library and IT Help
Management Team)
The Analyst the Incident is
assigned to
2
2 hours
(if Incident has not
been responded to
or appropriately
updated)
3-5
Immediately after an
incident has been
assigned to a group
or transferred to
another group
Business/Clients
Library and IT Help phone
greeting may require
updating to include
information relating to the
Incident.
The Analyst the Incident is
assigned to
Email generated by
Service Desk tool
Team Leader of the assigned
group
All INS Directors & Associate
Directors
The Analyst the Incident is
assigned to
Email generated by
Service Desk tool
Group Manager of the
assigned group
2.7 Major Outage Procedure
A major outage is an incident that results in significant disruption to Griffith University staff and students.
It affects the majority of university enterprise systems and most or all university clients. Although a major outage is classified as a Priority 1 incident, not all Priority 1 incidents are major outages. Please refer to section 2.3 Incident Classification for further clarification.
2.7.1 Major Outage Procedure during Business Hours
Whenever a major outage occurs, a Priority 1 incident will be recorded and escalated immediately for:
- Functional escalation to the relevant specialist group (2nd or 3rd tier support) to resolve the incident
- Hierarchical escalation to the Manager, Library and IT Help (Incident Management Process Owner) for awareness and relevant communication to the business
The Manager, Library and IT Help (Incident Management Process Owner) or relevant
director/associate director will coordinate the communication within Information
Services about the major outage and liaise with other product and service managers to
ensure that adequate resources will be made available to resolve the Priority 1 incident
as soon as possible.
If the incident has not been responded to or appropriately updated within 30 minutes, the Manager, Library and IT Help (Incident Management Process Owner) or relevant director/associate director will liaise with the responsible Product and Service Manager(s) to ensure that a Priority Team, which includes all technical specialists, is mobilised. The Priority Team is responsible for:
- Being a communication contact point
- Ongoing communication with the relevant business units/managers about the status of the incident (a status update will be provided to the business every hour)
- Resolution of the incident
- Keeping the PVC (Information Services), Director and Associate Directors and the Service Desk informed about the status of the incident as the situation changes
- A detailed report about the cause and resolution of the major outage after the resolution and closure of the incident.
2.7.2 Major Outage Procedure outside Business Hours
Recording, resolution and communication of major outages outside business hours is the
responsibility of the appropriate on-call team. The on-call team will advise the Product and
Service Manager responsible for that service, who will contact the relevant Associate
Director/Director. It is the responsibility of the Associate Director/Director to advise the Pro Vice
Chancellor, Information Services.
If a major outage is likely to continue into business hours, the Manager, Library and IT Help
(Incident Manager) needs to be advised so that they can take over communication with clients and
Information Services staff, together with a Priority Team.
2.7.3 Roles and Responsibilities
Staff who will normally be involved in resolving a major outage, coordinating the
resolution process and managing communication are:
Incident Manager
PVC, Director, Associate Director
Product and Service Manager
Technical specialist
On Call Staff
Vendor staff
2.7.4 Major Outage Communication Guidelines
2.7.4.1 Purpose
The purpose of communication during a major outage is to immediately investigate and
confirm the impact and severity of the incident. It should also confirm that the incident
is a major outage and an emergency situation.
2.7.4.2 Frequency
The frequency of communication will be determined by the impact and severity of
the outage. The frequency of communication for Priority 1 and Priority 2 incidents is outlined in
Section 2.6.1 Incident Notification and Escalation, Table B above.
2.7.4.3 Content
The content will depend on the audience. Communication with the staff who are tasked
with the resolution of the incident will be internally focussed, detailed and will contain
clear actions and timelines. Communication with clients will focus on the impact and
what is being done to minimise the impact and resolve the incident. It should not
contain technical descriptions or detailed information about internal processes.
The content will focus on:
The nature and extent of the outage
Assessment of the impact
High level overview of actions taken to resolve the incident
Estimated resolution time
Confirmation that the incident has been resolved
2.7.4.4 Communication channels
Every communication channel available at the time of the major outage should
be used to communicate with clients and Information Services staff:
Email
Web page
Phone
Public Announcement System
Face to face (response team meetings, meetings with staff tasked with the
resolution of the incident, meetings with Information Services staff impacted by the incident,
CTS Team Leaders informing staff in schools, Library staff informing clients in the Library,
etc.)
Printed notices
Notice boards
SMS message to key stakeholders advising them who to contact for more
information or updates
3 Roles and Responsibilities
3.1 Manager, Library and IT Help (Incident Management Process Owner)
The Manager, Library and IT Help (Incident Management Process Owner) has
responsibility for the Incident Management process.
The Manager, Library and IT Help (Incident Management Process Owner) has the
following responsibilities:
Monitors Incident Management process
Determines scope of the Incident Management process
Establishes Incident Management procedures
Establishes prioritisation and escalation criteria
Monitors incident escalations
Establishes links to other service management disciplines
Monitors trends and takes appropriate action
Produces high level management reports about Incident Management
Reviews Service Desk procedures
Organises reviews and audit of process
Initiates improvement programmes
Liaises with Library and IT Help Team Leaders (Service Desk Manager)
Liaises with Problem Manager
Produces information for clients
Liaises with Change Manager and Service Level Manager over proposed changes
3.2 Library and IT Help Campus Coordinators (Service Desk Manager)
All Library and IT Help Campus Coordinators (Service Desk Manager) have the
responsibility for the Service Desk tool and act as the line managers of Service Desk
staff.
The Library and IT Help Campus Coordinators (Service Desk Manager) have the
following responsibilities:
Monitor the quality of delivered services
Determine the organisation, structure and scope of the Service Desk in
consultation with the Manager, Library and IT Help (Incident Management Process Owner)
Manage Service Desk staff
Review staffing levels
Review skill requirements
Organise training
Coordinate management reporting about Service Desk function (includes process
reporting)
Are responsible for internal and external communication (clients, Information
Services, Service Desk)
Liaise with Product Managers
Liaise with the business
Promote Service Desk
Establish links to service management processes
Organise client satisfaction surveys (together with process owners)
Liaise with other support teams providing technical resources
Participate in service desk tool selection, tailoring and installation
3.3 First Tier (Service Desk) Support
The Service Desk provides first tier support. First tier support is responsible for:
Incident and service request registration
Initial support and classification
Resolution and recovery of incidents
Escalation of incidents when necessary
Monitoring of the status and progress toward resolution of all open incidents
Following up on behalf of clients about progress towards resolution
Monitoring of response and resolution times
Closure of incidents (this process has been automated; refer to Section 2.4.7)
Keeping affected clients informed about progress
Quality checking of closed incidents
Identifying the need for and creating/editing of Knowledge Base documents to
assist with a timely response for future incidents
Proactive relationship management with clients
Communication with clients about Information Services issues and service
requests (status updates)
Advising and assisting Information Services’ clients to make best use of services
provided by Library and IT Help
Encouraging use of self-help resources
Note: Library and IT Help is the Information Services product/service line that provides first
tier support for the majority of Information Services products and services.
For the purposes of the Incident Management process, Library and IT Help is defined as the
Service Desk. However, all groups involved in the resolution of incidents and using the
Service Desk tool are responsible for following Service Desk processes. For
example, all groups using the Service Desk tool should record the incidents they detect, as well as
those incidents reported to them by other groups within Information Services that are
not currently using the Service Desk tool.
3.4 Resolution Groups (Second or Third Tier Support Groups)
These groups comprise technical specialists who hold strong relationships with other
areas within Information Services and have good skills in analysing incidents and
problems.
For Incident Management, resolution groups are of two types, each with the
responsibilities listed below:
On site support
Technical support
3.4.1 On Site Support (Second or Third Tier Support Groups)
On site support is responsible for:
Resolution and recovery of incidents that need support at a physical location
Escalating incidents where necessary (Hierarchical escalation)
Escalation to another support group if necessary (Functional escalation)
Resolution and recovery of assigned Incidents
Monitoring the status and progress towards resolution of all open Incidents
assigned to their group
Communicating solutions and workarounds to Library and IT Help (Service
Desk/First Tier Support) to assist in incident classification, initial support and escalation
Promotion of Library and IT Help (Service Desk/First Tier Support) and
Information Services on location (e.g. informing clients of correct channels for reporting
incidents and how to obtain updates about outages and the status of requests)
Keeping clients informed about the status of the Incident assigned to their group
Accurate and complete recording of Incidents from and to other internal groups
within Information Services, using the Service Desk tool
Accurate and complete updating of activities and steps taken to resolve the
Incident.
3.4.2 Technical Support (Second or Third Tier Support Groups)
Technical support is responsible for:
Escalating service requests where necessary
Resolution and recovery of assigned incidents
Monitoring the status and progress toward resolution of all open incidents assigned
to their group
Monitoring servers, network components and applications
Keeping affected clients informed about progress
Escalation to another support group if necessary
Communicating solutions and workarounds to On Site Support and Library and IT
Help (Service Desk/First Tier Support)
Communicating incidents generated by monitoring tools to Library and IT Help
(Service Desk/First Tier Support)
Providing monitoring and diagnosis tools to Library and IT Help (Service Desk/First
Tier Support)
Creating knowledge base documents and providing relevant training to Service
Desk staff
Providing any other information to Library and IT Help (Service Desk/First Tier
Support) to assist in incident classification, initial support and escalation
Accurate and complete recording of Incidents from and to other internal groups
within Information Services, using the Service Desk tool
Accurate and complete updating of activities and steps taken to resolve the
Incident.
Defined Procedural Roles
ARCI Matrix
Procedural Activities | 1st Tier | 2nd Tier | 3rd Tier | Service Desk Manager | Incident Management Process Owner | Group Managers | Client
Incident submitted to Service Desk | A | I | I | I | - | - | R
Incident detection and recording | R | R | R | A | - | - | -
Incident process | R | R | R | A | - | - | I
Request for Information process | R | R | R | A | - | - | I
Give the client a reference number | R | R | R | A | - | - | I
Initial support and classification | R | R,C | R,C | A | I | I | I
Escalation to right support group | R | R | R | A | I | I | I
Communicate status updates to client | A | R | R | I | I | I | I
Investigation and diagnosis | R | R,C | R,C | I,C | A | R | I
Escalate using escalation procedure | R | R,C | R,C | C,R | A | R | I
Resolution and recovery | R | R,C | R,C | C,R | A | R | I
Client approval of solution | R | R | R | I | R | - | A
Closure | R | I | I | A | I | I | R
Explanation of Roles
Accountable (A)
The person in this process who is accountable for ensuring that the overall
process is available, understood and performed correctly
Responsible (R)
The person(s) expected to perform the prescribed activity and to resolve and/or
escalate the related issues. Multiple levels within the matrix can do this
Consulted (C)
The person(s) who are consulted before decisions are made or implementations
are carried out
Informed (I)
The person(s) who need to be informed about the prescribed activity
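Responsibility matrices such as the ARCI matrix above commonly follow the convention that each activity has exactly one Accountable role, while Responsible, Consulted and Informed can apply to several roles. The following Python sketch checks that convention. It assumes the column order 1st Tier, 2nd Tier, 3rd Tier, Service Desk Manager, Incident Management Process Owner, Group Managers, Client. The sample rows, including the deliberately faulty "Example activity" row, are for illustration only and are not an authoritative copy of the matrix.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed column order of the ARCI matrix.
ROLES = (
    "1st Tier",
    "2nd Tier",
    "3rd Tier",
    "Service Desk Manager",
    "Incident Management Process Owner",
    "Group Managers",
    "Client",
)


def accountability_errors(
    matrix: Dict[str, Sequence[str]],
) -> List[Tuple[str, List[str]]]:
    """Return (activity, accountable_roles) for every activity that does not
    have exactly one 'A'. Cells look like 'R', 'R,C', 'I' or '-'."""
    errors = []
    for activity, cells in matrix.items():
        if len(cells) != len(ROLES):
            raise ValueError(f"{activity}: expected {len(ROLES)} cells, got {len(cells)}")
        accountable = [
            role for role, cell in zip(ROLES, cells) if "A" in cell.split(",")
        ]
        if len(accountable) != 1:
            errors.append((activity, accountable))
    return errors


if __name__ == "__main__":
    sample = {
        "Closure": ("R", "I", "I", "A", "I", "I", "R"),
        # Hypothetical faulty row: nobody is accountable.
        "Example activity": ("R", "R", "R", "-", "-", "-", "I"),
    }
    print(accountability_errors(sample))  # [('Example activity', [])]
```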
3.4.3 Product and Service Managers
Product and Service Managers are responsible for:
Monitoring the status and progress toward resolution of all open incidents assigned
to their group
Ensuring that adequate resources are available for efficient and effective resolution
of incidents assigned to their group
Monitoring of, and adherence to, response and resolution times of incidents
assigned to their groups
Monitoring client feedback for their product and service group and following up on
negative client feedback
Ensuring that adequate resources are available to resolve Priority 1 incidents
assigned to their groups as soon as possible
When a Priority 1 incident has not been resolved within 30 minutes, liaising with the
Manager, Library and IT Help (Incident Management Process Owner) about the situation and
the mobilisation of a Priority Team, which can include technical specialists from all product and
service groups.
4 Communication Framework
This section consists of two parts:
1. Communication framework: describes the different communication lines between
Information Services and the business at an operational, tactical and
strategic level.
2. Relationship with other processes: details the input and output flows between Incident
Management and the other processes (Problem, Change, Configuration and
Service Level Management).
4.1 Communication Framework
The Communication Framework provides a high level overview of the different
communication lines between Information Services and the business on an operational,
tactical and strategic level. The communication framework for the Incident
Management process is shown below.
[Figure: Communication framework for the Incident Management process, showing communication lines at three levels. Strategic: Business Director with the PVC (INS) and DSD (exception reporting, KPIs, INS and Griffith Strategic Plans, INS budget, SLAs, business needs). Tactical: Business Manager (exception reporting, service levels, Service Catalogue, SLA reporting, requests for service outside SLA, client satisfaction surveys, client needs). Operational: End User and the Service Desk, supported by the Incident/Problem Manager (incident reporting, incidents, Service Catalogue, outages, requests for service outside SLA, client satisfaction surveys, Service Delivery News, training).]
4.2 Relationship with Other Processes
The following diagram provides an overview of how Incident Management fits into the
Service Support processes as described by ITIL.
[Figure: Incident Management within the ITIL Service Support processes, linked to the Service Desk, Service Level Management, Problem Management, Configuration Management, Change Management and IT Operations through Incident and RFC flows.]
The main input and output flows between Incident Management and the other
processes that will be implemented at Griffith University (Problem, Change,
Configuration and Service Level Management) are detailed in this section.
4.3 Incident Management & Problem Management
4.3.1.1 Input
Information needed from the Problem Management process by Incident Management
includes:
Resolutions for Incidents
Workarounds for Incidents
Knowledge Base (known errors, existing resolutions, accepted workarounds)
4.3.1.2 Output
Information provided to Problem Management by Incident Management includes:
Incident details (affected systems, affected clients, classification, details of the error)
History of occurred incidents
Proposed workarounds for incidents
Proposed solutions for incidents
4.3.2 Incident Management & Change Management
4.3.2.1 Input
Information needed from the Change Management process by Incident Management
includes:
Change schedule
Status update of scheduled changes
Result of implemented changes (history)
4.3.2.2 Output
Information provided to Change Management by Incident Management includes:
Accepted RFCs
Advice on incidents resulting from an implemented change (feedback)
4.4 Incident Management & Configuration Management
4.4.1.1 Input
Information needed from the Configuration Management process by Incident
Management includes:
Details of Configuration Items (CIs)
Relationships between CIs
Service levels for CIs
Service contact details
4.4.1.2 Output
Information provided to Configuration Management by Incident Management includes:
Errors or discrepancies in the Configuration Management Database (CMDB)
Relationship between incidents and CIs
4.5 Incident Management & Service Level Management
4.5.1.1 Input
Information needed from the Service Level Management process by Incident
Management includes:
Service Levels / KPIs
Business priority escalations
Service Catalogue
Client satisfaction / feedback about the Incident Management process
Communication about new services
4.5.1.2 Output
Information provided to Service Level Management by Incident Management includes:
Incidents outside Service Level Agreement (SLA)
Requests for service outside SLA (new ad hoc business requirements)
Client satisfaction
Exceptions to SLAs
Escalation of priority calls
Process information (management reporting)
KPI reporting
5 Performance Management
The following Key Performance Indicators (KPIs) have been set for the Incident
Management process:
80% of incidents responded to within SLA (response time)
80% of incidents resolved within SLA (resolution time)
100% of non-pending incidents to have an activity log updated less than 2 days ago
90% of incidents to follow the predefined Incident Management process
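Each of these KPIs is a percentage over a reporting period, compared against a fixed target. The following minimal Python sketch shows how the percentages could be computed and how shortfalls could be flagged. The `IncidentRecord` fields are hypothetical stand-ins for data exported from the Service Desk tool. The sketch assumes the within-SLA flags have already been determined for each incident. It also assumes that an empty population counts as meeting the target.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

# Targets from the Incident Management KPIs (percentages).
KPI_TARGETS = {
    "responded_within_sla": 80.0,
    "resolved_within_sla": 80.0,
    "non_pending_log_current": 100.0,
    "followed_process": 90.0,
}
LOG_MAX_AGE = timedelta(days=2)


@dataclass
class IncidentRecord:
    # Hypothetical fields; real data would come from the Service Desk tool.
    responded_within_sla: bool
    resolved_within_sla: bool
    pending: bool
    last_activity_at: datetime
    followed_process: bool


def pct(count: int, total: int) -> float:
    # Assumption: with nothing to measure, the target is treated as met.
    return 100.0 * count / total if total else 100.0


def incident_kpis(records: List[IncidentRecord], now: datetime) -> Dict[str, float]:
    total = len(records)
    non_pending = [r for r in records if not r.pending]
    current = [r for r in non_pending if now - r.last_activity_at < LOG_MAX_AGE]
    return {
        "responded_within_sla": pct(sum(r.responded_within_sla for r in records), total),
        "resolved_within_sla": pct(sum(r.resolved_within_sla for r in records), total),
        "non_pending_log_current": pct(len(current), len(non_pending)),
        "followed_process": pct(sum(r.followed_process for r in records), total),
    }


def kpi_shortfalls(measured: Dict[str, float]) -> List[str]:
    """Names of KPIs whose measured percentage is below its target."""
    return [name for name, target in KPI_TARGETS.items() if measured[name] < target]
```

For example, with four compliant incidents and one late one, both SLA percentages are 80%, which meets the targets. The activity-log and process-compliance KPIs, however, would be reported as shortfalls.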
The following Key Performance Indicators (KPIs) have been set for the Service Desk:
95% of calls to be answered within 10 seconds
Client Satisfaction Survey to return an average rating of 4 or higher (on a 5 point scale)
100% of e-mailed incidents to be recorded within 24 business hours after the email is received
80% of incidents solved at 1st tier
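The e-mail recording KPI is measured in business hours rather than elapsed hours. An e-mail received late on a Friday is therefore not overdue on Monday morning. The following minimal Python sketch shows that calculation. It assumes business hours of 08:00 to 17:00, Monday to Friday, with no public holidays, and naive timestamps in a single time zone. The handbook section does not define business hours, so `DAY_START`, `DAY_END` and the weekday rule are placeholders to adjust.

```python
from datetime import datetime, time, timedelta

# Assumed business hours (not specified in this section of the handbook).
DAY_START = time(8, 0)
DAY_END = time(17, 0)
RECORDING_LIMIT_HOURS = 24


def business_hours_between(start: datetime, end: datetime) -> float:
    """Business hours elapsed between two naive timestamps (no holidays)."""
    if end <= start:
        return 0.0
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = max(datetime.combine(day, DAY_START), start)
            window_end = min(datetime.combine(day, DAY_END), end)
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


def recorded_on_time(received: datetime, recorded: datetime) -> bool:
    """True if an e-mailed incident was recorded within the KPI limit."""
    return business_hours_between(received, recorded) <= RECORDING_LIMIT_HOURS


if __name__ == "__main__":
    # Received Friday 16:00, recorded Monday 10:00: 1 h + 2 h = 3 business hours.
    print(business_hours_between(datetime(2013, 5, 3, 16, 0), datetime(2013, 5, 6, 10, 0)))  # 3.0
```

The day-by-day loop is chosen for clarity rather than speed. For a reporting period of a few thousand incidents, each spanning a few days, it is more than fast enough.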
6 Management Reports
This section specifies the management information provided by the Incident Management
process.
6.1 Management Reports
Management reporting takes place on a daily, weekly and monthly basis and covers the
following subjects:
Daily:
All Exceptions
Critical Issues
Weekly:
All Exceptions
Incident Summary (Open Incident, Closed Incident, New Incident, Group Breakdown)
Priority 1 Incident
SLA Exceptions
Monthly:
All Exceptions
Availability of services
Current Number of Users of Service Desk tool
Implemented Improvements
Incidents Summary – Rolling Trend
Incident Reporting, Incidents by: Status, Category, Priority, Service Group
Incident Summary (Detailed KPIs)
Performance Against SLA
Priority 1 Incidents
Recommendations
Top 10 Service Desk tool users
Management reports are submitted:
On a daily basis to all staff involved in the Incident Management process
On a weekly basis to Information Services management
On a monthly basis to Information Services Management, the Library and IT Help
Team Leaders (Service Desk Manager) and the Service Level Manager
These management reports are used to monitor the success of the Incident
Management process and to identify any problems with the process.
6.1.1 Incident Management Process Reports
The following reports will be available for the Incident Management process.
Metric | Metric Use
Total number of incidents recorded in the Service Desk tool (open and closed) | Gives an indication of the overall workload of Information Services staff.
Total number of incidents recorded per Information Services service group (open and closed) | Gives a breakdown of incidents logged to show which departments are requiring the most support. Further analysis can be carried out to drill down into “problem” departments to identify key groups that need more assistance than average.
Number of incidents recorded per category (open and closed) | Helps show which parts of the infrastructure are creating the most incidents. Useful for identifying areas that could require detailed analysis to remove common problems.
Percentage and number of incidents resolved within service level times | An important measure that indicates the level of service being provided to the clients of Information Services.
Percentage of incidents per priority code | Shows the workload per priority code and hence per service level. This data can be used to determine staffing levels or costs of services, or to review the defined priority codes.
Percentage of incidents resolved at first, second and third tier | Shows where the support work is taking place. It is a particularly useful metric for the Service Desk, which takes work away from second and third tier support, freeing them up for more proactive tasks.
Number of incidents assigned per Information Services group/staff member (open and closed) | Shows the amount of work that different staff members and groups are processing.
Apart from the specific metrics above, the following reports can be made available:
Results of client satisfaction surveys
A summary of any major outages, the actions taken to fix the outage and steps to
ensure that this will not occur again.
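Several of the metrics listed above (incidents per category, per priority code, per service group, or by resolving tier) are frequency breakdowns. The following Python sketch shows one way to turn a list of labels exported from the Service Desk tool into counts and percentages. The input format is an assumption. Rounding to one decimal place is an arbitrary presentation choice.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def breakdown(labels: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each label to (count, percentage of all incidents), most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (n, round(100.0 * n / total, 1))
        for label, n in counts.most_common()
    }


if __name__ == "__main__":
    # Hypothetical resolving tier for five closed incidents.
    tiers = ["1st", "1st", "1st", "2nd", "3rd"]
    print(breakdown(tiers))  # {'1st': (3, 60.0), '2nd': (1, 20.0), '3rd': (1, 20.0)}
```

The same function serves every breakdown. Pass it the priority codes, categories or service group names instead of the tier labels.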
7 Process Review
To maintain continuous improvement of the Incident Management process, an
ongoing review is essential. A detailed Incident Management review should be
undertaken every six months.
These reviews should take place to ensure quality is maintained or improved. The
following steps should take place for each review.
Gather data and information from the Service Desk tool, Client Satisfaction
Surveys and input from Incident Management staff. Input for the process review should be
proactively sought from Information Services staff by the Manager, Library and IT Help
(Incident Management Process Owner).
Analyse the data, looking at areas such as client satisfaction, suggestions from
staff and process metrics. Specific metrics to examine include those listed in Sections 5
“Performance Management” and 6 “Management Reports”, as well as the following:
Correct logging of incident data in terms of incident categories, priority and specific
information relating to the incident
Review of Incident Management reports, such as performance against service
levels
State of the Incident Management Process Handbook (up to date or awaiting
review)
Problems identified with the process
From the above analysis, identify improvement opportunities and put forward a
report detailing the suggestions. This report should be submitted to Information Services
management.
Following approval of the improvements documented in this report, a
Process Improvement Plan should be formulated.
Changes to the Incident Management process should be authorised through the
Change Management process. Once authorised, the changes should be implemented.
Following an improvement project, a Post Implementation Review should be carried
out. This review should re-examine the reports analysed earlier to confirm that the improvements
have made a positive change.
Appendix A: Incident Management Quick
Reference Card
Appendix B: Incident Categories
Categories are updated on a continuous basis to reflect the categories in the Service Desk tool.
Appendix C: Abbreviations and Definitions
7.1 Abbreviations and Acronyms Used
Abbreviation | Definition
INS | Information Services
ITIL | Information Technology Infrastructure Library
RFC | Request For Change
SLA | Service Level Agreement
7.2 Definition of Terms Used
Term
Definition
Business Manager
A person authorised to make decisions on behalf of an organisational
unit concerning a service and its associated service levels.
Change
Any action either physical or procedural which modifies or impacts the
production environment
Change Management
The management and control of changes to the production
environment, in order to minimise the impact of change-related
problems.
Change Manager
The person responsible for processing change requests, chairing
Change Advisory Board meetings, coordinating changes and reporting
change activity to management
Classification
Determining the value of items by placing them in a certain order on
the basis of category, impact, and severity. It can be used to support
decisions concerning priorities.
Configuration Item (CI)
A component of an IT infrastructure. CIs may vary widely in
complexity, size and type, from an entire system (including all
hardware, software and documentation) to a single software module
or a minor hardware component.
Configuration Management
The process of identifying and defining the CIs in a system, recording
and reporting the status of CIs, and verifying the completeness and
correctness of CIs.
Incident
ITIL Definition: Any event that deviates from the standard and
expected operation of an IT system or service.
Further Description: An incident can be seen as a client requesting
help for something that is not working, for example “I can’t print” or “I
can’t access the Internet”. In any situation where something does not
work and the specific details are not known, it is an incident.
Incident Management
The process whose primary focus is to restore normal service
operation as quickly as possible and minimise the adverse impact on
business operations.
Incident Recording
The quality recording of incidents in such a way that other activities
and increased service provision are possible.
Incident Reporting
The reporting of incidents, requests by clients and/or support groups.
Information Services
(Information Services)
Information Services encompasses the supporting technologies and
infrastructure on which the systems are run.
Information Technology
Infrastructure Library
(ITIL)
ITIL is a non-proprietary framework, developed by the UK Office of Government Commerce,
tailored to the operation of IT infrastructure.
It is a set of comprehensive, consistent and coherent codes of best
practice for IT Service Management.
Information Services
Element of Griffith University that encompasses the supporting
technologies and infrastructure on which the systems are run and
services provided.
Information Services
infrastructure
The sum of an organisation’s IT-related hardware, software, data
communication facilities, procedures, and people.
Information Services
service
A described set of facilities, IT and non-IT, supported by the Information
Services (service provider) that fulfils one or more needs of the client,
that supports the client’
Key Performance
Indicator (KPI)
Key Performance Indicators are clearly defined objectives with
measurable targets, set to judge process performance
Known Error
ITIL Definition: The successful diagnosis of the root cause of a
Problem (i.e. the specific infrastructure component at fault has been
identified).
Further Description: A known error is logged when the specific root
cause is known for a group of problems/incidents or a single major
problem. The known error record will define exactly what has gone
wrong and the solution so that it does not happen again. Continuing
our example, a known error would be: “There is a fault with the
network card in the printer in department X that is causing the printing
problems”. A Known Error is more defined than a Problem.
Problem
ITIL Definition: The unknown underlying cause of one or more
incidents. More specifically, a condition identified as a result of
multiple incidents that exhibit common symptoms, or of a single
significant incident indicative of a single error.
Further Description: A problem is a more specific definition of
something that has gone wrong. Quite often a number of similar
incidents are linked to a common problem. Where a
number of clients are not able to print, a “problem” will be defined
saying something like “there is a problem with the network in
department X causing printing problems”. A Problem is more defined
than an Incident.
Problem Management
The process whose primary focus is to minimise the adverse impact
on the business of Incidents and Problems caused by errors
within the IT Infrastructure, and to prevent recurrence of Incidents
related to these errors.
Process
A connected series of actions, activities or operations performed with
the intent of satisfying a purpose to achieve a goal.
Release Management
The process that has as primary focus to securely control the physical
and logical storage, management, distribution and implementation of
all software assets, ensuring that only currently authorised and quality
checked versions of software are actually brought into use in the
production environment at minimal cost
Request For Change (RFC)
A form or screen used to record details of a request for a change to
any component of an IT infrastructure.
Service Catalogue
Written statement of services, default service levels and options.
Service Desk
The Information Services organisational unit that makes its services
accessible to clients. The Library and IT Help product/service group is this
unit. All other Information Services groups (e.g. S3, NCS, CTS)
contribute to the Service Desk Incident Management process. See
Section 3.3 “First Tier (Service Desk) Support”.
Service Desk tool
Application which is used to record incidents, RFIs, changes and
problems (i.e. Service-Now.com).
Service Level
The expression of an aspect of a service in definitive and quantifiable
terms.
Service Level
Agreement (SLA)
A formal agreement between the client(s) and the IT service provider
specifying service levels and the terms under which a service or a
package of services is provided to the client.
Service Level
Management
The process of regular communication with the client to find out their
requirements and to offer new services and technologies.
Appendix D: Unsupported Functions
NB: There are no unsupported functions at the present time.