Incident Management Process Handbook
Griffith University
Version 1.7

Version History

Version No | Issue Date | Nature of Amendment | Editor
0.3 | July 2003 | First draft | Patrick Keogh (LucidIT)
0.4 | 8/3/2004 | Updates to priority and response / resolution table. Minor updates | John Scullen
0.5 | 22/3/2004 | Updates from quality review walkthrough | John Scullen
0.6 | 24/3/2004 | Updates suggested by ICTS-MT. Response definition adjusted and diagram included. Response times adjusted for priority 1 and 2. Hierarchical escalation modified to ICTS-MT specifications. | John Scullen
0.7 | 6/4/2004 | Minor updates resulting from final v0.6 distribution. | Merril Rogers
1.0 | 20/4/2004 | Minor updates from Geoff Dengate. Approved by Project Board | John Scullen
1.1 | 13/07/2006 | Revision | Sanja Tadic, Judy Bromage, Julie Aslett
1.2 | 02/10/2006 | Minor updates resulting from consultations with Product and Service Managers | Sanja Tadic, Julie Aslett
1.3 | 22/06/2007 | Minor update | Sanja Tadic
1.4 | 29/09/2008 | Minor update to document major outage procedure | Sanja Tadic, Naveen Sharma
1.5 | 30/07/2010 | Minor update to document procedure for incorrect group assignment. Update to Incident Management Quick Reference Guide (Appendix A). Update to Incident Categories (Appendix B). Update of name change from InfoServices to Library and IT Help. References to EITS changed to CTS. | Felicity Berends, Sanja Tadic
1.6 | 30/4/2013 | Minor updates of SDT name from LibraryandITHelp@Griffith to Service Desk tool. Minor corrections to role titles. Update to Quick Reference Guide (Appendix A) and Incident Categories spider chart (Appendix B) | Felicity Berends

Distribution

Ver # | Recipient | Date issued | Reason for distribution
0.4 | Andrew Bowness, Wendy Balachandran, Christine Schafer, Regina Obexer, Matt Maynard, Carol O’Faircheallaigh, Carolyn Plant, Karl Turnbull, Rowan Salt, Geoff Mitchell, Sandra Reis | 8/3/2004 | Initial release prior to quality review
0.6 | ICTS-MT, Chris Walker, Paul Jardine | 26/3/2004 | Circulated following suggested updates
0.7 | Project Board | 6/4/2004 | Approval
1.0 | ITIL Steering Committee | 21/4/2004 | FYI
1.1 | ITIL Steering Committee | 22/08/2006 | Approval
1.1 | INS Product and Service Managers (currently using LibraryandITHelp@Griffith) | 12/09/2006, 29/09/2006 | FYI
1.5 | INS Product and Service Managers (currently using LibraryandITHelp@Griffith) | 31/08/2010 | FYI
1.6 | INS Product and Service Managers (currently using the Service Desk tool) | 30/4/2013 | FYI

Print date – 31 August 2010
Filename – Incident Management Process Handbook v1_6.doc
Author(s) – Patrick Keogh (Lucid IT Pty Ltd), Wendy Balachandran, John Scullen, Sanja Tadic, Felicity Berends

Table of Contents
1 Introduction
1.1 Objective
1.2 Scope
1.3 Document Structure
2 The Incident Management Process
2.1 Objectives of Incident Management
2.2 Scope
2.3 Incident Classification
2.4 Process Description
2.5 Response and Resolution Times
2.6 Escalation
2.7 Major Outage Procedure
3 Roles and Responsibilities
3.1 Manager, Library and IT Help (Incident Management Process Owner)
3.2 Library and IT Help Campus Coordinators (Service Desk Manager)
3.3 First Tier (Service Desk) Support
3.4 Resolution Groups
4 Communication Framework
4.1 Communication Framework
4.2 Relationship with Other Processes
4.3 Incident Management & Problem Management
4.4 Incident Management & Configuration Management
4.5 Incident Management & Service Level Management
5 Performance Management
6 Management Reports
6.1 Management Reports
7 Process Review
Appendix A: Incident Management Quick Reference Card
Appendix B: Incident Categories
Appendix C: Abbreviations and Definitions
7.1 Abbreviations and Acronyms Used
7.2 Definition of Terms Used
Appendix D: Unsupported Functions

1 Introduction
The main goal of Incident Management is to restore normal service operation as quickly as possible and to minimise the adverse impact on business operations. As the first point of contact for Information Services clients, the Service Desk is the owner of the Incident Management Process. The main objective of the Service Desk is to facilitate the restoration of normal operational service with minimal business impact on the client, within agreed service levels and business priorities.

These procedures are to be used by all Information Services staff handling Incidents within the scope defined in section 2.2 “Scope”.

1.1 Objective
Information Services aims to achieve the following objectives with the Incident Management Process:
- Capture information about an incident at the start of the process
- Give clients confidence in Information Services’ capability
- Provide consistent processes for clients
- Provide clear procedures for clients on how to get help
- Better manage, and align with, client expectations
- Define the scope of the Service Desk role clearly and communicate it
- Analyse incidents to build a better understanding of the underlying issues

1.2 Scope
This Handbook documents the Incident Management Process, including:
- The process flow
- Roles and responsibilities
- Communication framework
- Performance management
- Management reports
- Checklists and definitions

1.3 Document Structure
This document describes the Incident Management process for Information Services.
The document is organised as follows:
- Section 2: The Incident Management Process – Describes the Incident Management Process, including a high-level process description, classification, lead times and escalation matrices.
- Section 3: Roles and Responsibilities – Describes the different roles within the Incident Management Process and the responsibilities of each role.
- Section 4: Communication Framework – Explains the communication aspects in more detail, including a high-level communication framework and the relationship between Incident Management and other processes.
- Section 5: Performance Management – Describes the Key Performance Indicators (KPIs) for the Incident Management Process.
- Section 6: Management Reports – Provides information about the different management reports.
- Section 7: Process Review – Provides information about reviewing the Incident Management process.

The following appendices are included at the end of the document:
- Appendix A: Incident Management Quick Reference Card – Quick Reference Cards (QRC) for the Service Desk and for second and third tier support groups.
- Appendix B: Incident Categories – An overview of the different Incident Categories.
- Appendix C: Abbreviations and Definitions – An explanation of the abbreviations and definitions used in this document.

2 The Incident Management Process
Incident Management is closely associated with Problem Management: it provides categorisation and reporting of all Information Services related incidents that occur within Griffith University, which enables root cause analysis. Second and third tier support groups also have a role within this process. The focus must be on quick solutions and on restoring service as quickly as possible.
After resolution, all actions taken should be recorded completely in the Service Desk tool. This will enable faster responses to future incidents and will free up second and third tier support groups for more proactive problem solving.

This section provides information about the Incident Management process.

2.1 Objectives of Incident Management
The objectives of Incident Management are:
- To restore normal service operation as quickly as possible; and
- To minimise the adverse impact on business operations.

2.2 Scope
An Incident is any event which is not part of the standard operation of the service and which causes, or may cause, an interruption to, or a reduction in, the quality of the service.

Only production systems, or systems connected to the production network, are covered by the Incident Management process. Excluded from the scope of this process are:
- Development activities
- Non-production activities (unless connected to the production network)
- All unsupported functions as specified in Appendix D.

2.3 Incident Classification
Classification of incidents is based on two aspects:
1. Priority of an incident: relating to the severity of the incident; and
2. Category of an incident: relating to the configuration item causing the incident.

2.3.1 Incident Priority
The priority of an incident is determined by:
1. Impact: the impact of the incident on the business, i.e. the number of clients affected or the importance of the system affected. The hierarchical position of the client is included in this variable.
2. Urgency: how severely the client’s work process is affected. This influences the timeframe allowed to resolve the incident.

The Impact/Urgency matrix, shown below, determines the priority of the incident (1 = highest, 5 = lowest).

                     Urgency
Impact        Low     Medium    High
Low            5        4         3
Medium         4        3         2
High           3        2         1

The assessment methodology for impact and urgency is explained in more detail in the sections below.
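The matrix lookup is simple enough to express directly in code, for example in a hypothetical form that suggests a priority when an incident is recorded. This is a minimal sketch, not a feature of the Service Desk tool; the names `Level` and `incident_priority` are illustrative assumptions.

```python
from enum import Enum


class Level(Enum):
    """Impact and urgency levels used in section 2.3.1."""
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Impact/Urgency matrix: _MATRIX[impact][urgency] -> priority (1 = highest).
_MATRIX = {
    Level.LOW: {Level.LOW: 5, Level.MEDIUM: 4, Level.HIGH: 3},
    Level.MEDIUM: {Level.LOW: 4, Level.MEDIUM: 3, Level.HIGH: 2},
    Level.HIGH: {Level.LOW: 3, Level.MEDIUM: 2, Level.HIGH: 1},
}


def incident_priority(impact: Level, urgency: Level) -> int:
    """Return the priority (1-5) for the given impact and urgency."""
    return _MATRIX[impact][urgency]


if __name__ == "__main__":
    # One client affected (Low impact) who cannot work at all (High urgency).
    print(incident_priority(Level.LOW, Level.HIGH))  # prints 3
```

Because the matrix is symmetric, the priority also equals 7 minus the sum of the impact and urgency ranks (Low = 1, Medium = 2, High = 3). The explicit table is kept here so it can be checked line by line against the handbook.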
2.3.1.1 Impact
Incidents will be placed into High, Medium and Low impact categories. The key factor in measuring impact is the effect the incident has on the business. Each incident will be reviewed on a case-by-case basis, with impact assessed and approved against the following criteria.

Impact | Description
High | Whole organisation affected; site or multiple sites affected; multiple groups of clients affected; critical business process interrupted; or system-wide outages to Learning@Griffith, Staff portal or Email
Medium | Group of clients, a Pro Vice Chancellor (PVC), or a member of the Vice Chancellor’s (VC’s) Office staff affected; non-critical business process interrupted
Low | One client affected (other than VC’s Office or PVCs)

2.3.1.2 Urgency
Incidents will be placed into High, Medium and Low urgency categories. The key factor in measuring urgency is how severely the client’s work process is affected. This influences the timeframe allowed to resolve the incident. Each incident will be reviewed on a case-by-case basis, with urgency assessed and approved against the following criteria.

Urgency | Description
High | Process stopped; client(s) cannot work
Medium | Process affected; client(s) cannot use certain functions
Low | Process not affected; change request, new/extra/optimised function

2.3.2 Incident Category
Incident categories have been established to:
- Assist with the correct assignment of incidents
- Facilitate reporting on the incident and problem management processes
- Identify priority areas on which proactive problem management should focus

Incident categories are described in Appendix B: Incident Categories.

2.4 Process Description
The aim of the Incident Management process is to provide a standardised, high quality service to all clients who report incidents. This section provides an overview of the Incident Management process, as shown in the process flow chart. The Incident Management process is managed through the Service Desk tool.

[Figure: Incident Management process flow chart. Steps: 1. Incident Detected; 2. Acceptance, Recording and Classification (decision “Service Request?”: activate Change Management process; decision “Priority 1?”: activate Priority 1 Incident procedure); 3. Initial Support; 4. Investigation and Diagnosis; 5. Resolve Incident (decision “Incident Resolved?”); 6. Incident Escalation; 7. Verify Resolution and Incident Closure; 8. Monitoring and Tracking.]

2.4.1 Incident Detected
The process starts with the detection of an incident. An incident can originate from a situation experienced by a client and reported to the Service Desk, or from a technical malfunction detected by clients, Information Services staff or third party vendors.

Note: The incident can be communicated to the Service Desk via telephone, digitally or face-to-face.

Note: The Service Desk tool is a Request for Information, Incident, Problem and Change Management System.

2.4.2 Acceptance, Recording and Classification
When an incident is reported to the Service Desk, a Service Desk staff member should first determine whether it falls within the scope of the Incident Management process. If uncertain, this person should seek confirmation from the Manager, Library and IT Help (Incident Management Process Owner) or a Team Leader (Service Desk Manager).

The Service Desk staff member is responsible for opening a new record in the Service Desk tool and recording the incident details.
The following is an example of information that should be captured:
- Client details such as name, location, phone number and email
- Classification of the incident in terms of incident category and priority
- Detailed description of the incident and affected Configuration Items (CIs)

The Service Desk staff member classifies the incident according to its impact and urgency. Refer to the Impact/Urgency matrix in section 2.3.1 to help determine the priority of the incident. If an issue can be identified as a Request for Information (RFI) or a Request for Change (RFC), the RFI & RFC process is activated.

2.4.3 Initial Support
The Service Desk provides initial support to solve the incident and, based on time and knowledge, determines whether the incident can be solved. If the incident cannot be solved by the Service Desk, an incident reference number and a response time (see section 2.5 “Response and Resolution Times”) will be provided.

If the Service Desk recognises recently recorded incidents with similar symptoms, Incident Matching can occur. Incident Matching is where similar incidents are grouped together to reflect that there may be a larger “problem”. This linking assists with Problem Management activities and, when a solution is found, it can easily be applied to all the grouped incidents. The Service Desk tool parent/child facility is used to record these matches.

If the incident cannot be linked to an existing problem, the Service Desk staff should search the Service Desk tool for similar incidents, or the Knowledge Base for a resolution. If a resolution is identified, it can be applied to the incident. If not, additional investigation and diagnosis should be carried out to resolve the incident.

2.4.4 Investigation and Diagnosis
The Service Desk staff member should first attempt to analyse the incident in search of a solution.
This person can use the following information:
- Own experience and knowledge
- Service Desk tool information
- Knowledge Base entries
- Procedure manuals and other relevant documentation
- Technical information from the Internet
- Knowledge and experience of colleagues

Any additional information acquired that will be useful for resolving the incident should be recorded in the Service Desk tool.

If the Service Desk cannot provide a solution, the incident is escalated to the relevant support group (second tier, third tier or vendor) for investigation and diagnosis, to find and implement a solution or workaround. The relevant support group updates the incident record with the solution or workaround.

Note: Where the Service Desk could not provide a solution, the solution eventually found should be identified as a possible entry for the Knowledge Base, to assist future diagnoses.

2.4.5 Resolve Incident
If a resolution can be found and implemented within the SLA time by the relevant second tier, third tier or vendor support group, the incident will be given a status of ‘resolved’. This triggers an auto-generated email informing the client that their incident has been resolved.

Part of this activity is updating the Service Desk tool with the resolution information. Accurate and complete recording of details is very important and is a requirement of all parties involved in the Incident Management process. Quality information is critical for future incident handling and for the restoration of normal business activities.

2.4.6 Incident Escalation
If an incident cannot be resolved by the Service Desk staff, the incident should be transferred to second or third tier support groups. The Service Desk remains responsible for the incident until it is transferred to a second tier support group. Transfer means involving second or third tier support groups in the resolution of the incident.
When incidents are transferred, the assigned owner of the incident is responsible for keeping the client informed of progress. This is done by any appropriate means: face to face, telephone, manual notification from the Service Desk tool, or email. In addition, all relevant service groups are responsible for ensuring that all activity relating to an incident is suitably annotated.

There may be occasions when an incident is escalated to an incorrect group. It is then the assigned group’s responsibility to transfer the incident to the correct group and to inform the analyst who assigned it incorrectly, so that incidents of this nature are assigned correctly in the future. If the incorrectly assigned group does not know which group the incident should go to, it should transfer the incident to the Service Desk for further investigation.

When Priority 1 and 2 incidents are assigned or transferred, the analyst assigning or transferring the incident should phone the new assigned owner to advise that a Priority 1 (or 2) incident has been assigned or transferred to them.

Depending on the priority, hierarchical escalation might also take place. Hierarchical escalation (awareness) means that higher levels of management are involved when there is a threatened breach of service levels or when additional authorisation is required for incident resolution. The incident notification and escalation process is explained in section 2.6.1.

Note: Updating the Internal Notes does not generate an email from the Service Desk tool. Only confirmation, requestor communication, resolution and closure emails are sent to the client.

2.4.7 Verify Resolution and Incident Closure
The client will be informed once the incident is resolved. The client has three business days (Monday to Friday, excluding holidays) to confirm that the incident can be closed.
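The three-business-day confirmation window, after which an unconfirmed incident is closed automatically, can be calculated as below. This is a minimal sketch under stated assumptions: counting starts on the business day after the resolution, and the university holiday calendar is supplied by the caller as a hypothetical `holidays` set. The function names are illustrative only, not Service Desk tool settings.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int, holidays: frozenset = frozenset()) -> date:
    """Return the date that is `days` business days after `start`.

    Business days are Monday to Friday, excluding any date in `holidays`.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday(): Monday = 0 ... Sunday = 6
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


def auto_close_date(resolved_on: date, holidays: frozenset = frozenset()) -> date:
    """Date on which a resolved but unconfirmed incident would be closed automatically."""
    return add_business_days(resolved_on, 3, holidays)


if __name__ == "__main__":
    # Resolved on Thursday 2 May 2013: Fri 3, Mon 6 and Tue 7 May are the three business days.
    print(auto_close_date(date(2013, 5, 2)))  # prints 2013-05-07
```

The handbook does not state whether the day of resolution itself counts; if the Service Desk tool counts it, the loop would need to start one day earlier.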
If the client does not agree that the incident has been resolved, it will be reopened and the process will return to “Investigation and Diagnosis”. If the client does not respond to the email requesting incident closure, the incident will automatically be closed after three business days and its status changed to “closed”. If the client responds after the incident has been closed, a new incident record will be created. Incidents with a status of “closed” should not be reopened; instead, a new incident should be created that refers to the original incident record.

2.4.8 Monitoring and Tracking
The end-to-end progress of the incident is monitored and communicated to the client when necessary. The Service Desk tool is updated each time the status of the incident changes. Clients can monitor the status of their incidents via the Service Desk tool, accessible from the Staff portal.

2.5 Response and Resolution Times
Response Time is the elapsed time between when the incident is recorded and when work commences on investigation, diagnosis and resolution of the incident. The response time can be used to:
- Research a solution
- Mobilise a priority team
- Request further details from the client
- Advise the action taken and provide an indication of the resolution time, if required

Resolution Time is the target time within which a resolution to an incident is to be implemented. Information Services aims to resolve at least 80% of incidents within the specified resolution times; the remaining margin allows for exceptional circumstances in which the standard times cannot be met.

Official closure of the incident is dependent on the approval of the client. Resolution time therefore does not include the time the client takes to contact Information Services to give approval, as this could happen some time later.
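Checking performance against the 80% goal is a count over resolved incidents. This is a minimal sketch, assuming each record already carries its resolution time in business hours, with external-provider, purchasing and client-approval time excluded as described in this section. `ResolvedIncident`, `within_target` and `meets_80_percent_goal` are hypothetical names, and the day-based targets are converted using the 9-hour working day given in the note to Table A.

```python
from dataclasses import dataclass
from typing import Iterable

# Resolution targets (section 2.5 table) expressed in business hours,
# assuming 1 working day = 9 hours (8am-5pm).
RESOLUTION_TARGET_HOURS = {1: 4, 2: 8, 3: 12, 4: 3 * 9, 5: 5 * 9}


@dataclass
class ResolvedIncident:
    """Hypothetical summary record for a resolved incident."""
    priority: int
    business_hours_to_resolve: float  # excludes external, purchasing and client-approval time


def within_target(incident: ResolvedIncident) -> bool:
    """True if the incident was resolved inside its priority's resolution target."""
    return incident.business_hours_to_resolve <= RESOLUTION_TARGET_HOURS[incident.priority]


def meets_80_percent_goal(incidents: Iterable[ResolvedIncident]) -> bool:
    """True if at least 80% of the given incidents met their resolution target."""
    incidents = list(incidents)
    if not incidents:
        raise ValueError("no resolved incidents to assess")
    hits = sum(1 for incident in incidents if within_target(incident))
    # Integer comparison avoids floating-point rounding at exactly 80%.
    return hits * 100 >= 80 * len(incidents)


if __name__ == "__main__":
    sample = [
        ResolvedIncident(1, 3.5),   # within 4 h
        ResolvedIncident(2, 9.0),   # outside 8 h
        ResolvedIncident(3, 12.0),  # exactly 12 h: counted as within
        ResolvedIncident(4, 20.0),  # within 27 h
        ResolvedIncident(5, 44.0),  # within 45 h
    ]
    print(meets_80_percent_goal(sample))  # prints True (4 of 5 = 80%)
```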
[Figure: incident timeline. Milestones: incident, detection and report to the Service Desk, diagnosis, start of repair, finish of repair, recovery. Intervals: detection time, response time, repair time, recovery time.]

The following table provides an overview of response and resolution times for incidents. The times relate to the response and resolution times of the support groups involved during standard business hours, Monday to Friday. Time required by external support service providers (non Information Services), and purchasing time (should parts and/or materials need to be acquired), are excluded.

Priority | Response Time | Resolution Time
Priority 1 | 30 minutes | 4 hours
Priority 2 | 1 hour | 8 hours
Priority 3 | 4 hours | 12 hours
Priority 4 | 1 day | 3 days
Priority 5 | 2 days | 5 days

2.6 Escalation
Escalation can take place in two ways:
1. Functional escalation – escalation to another support group in order to solve the incident.
2. Hierarchical escalation – escalation to inform the right (management) level within Information Services, for communication purposes and in order to free up the resources needed to solve the incident.

2.6.1 Incident Notification and Escalation
Notification of incidents occurs at defined times. Notification ensures that:
- The business (including management) and clients are kept informed of the occurrence of an incident and of progress towards its resolution;
- Swift action is taken to resolve the incident;
- Management provide the necessary resources to resolve the incident.

Notification is generated, depending on the priority, when:
- A higher priority incident is recorded in the Service Desk tool (certain management levels are notified);
- An incident is assigned or transferred to a group using the Service Desk tool;
- 75% of the target resolution time has elapsed since the incident was recorded and updated;
- The SLA is breached.
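The 75% and 100% notification points fall at business-hour offsets from when the incident is recorded, with the clock running only 8am–5pm, Monday to Friday (see the note under Table A). This is a minimal sketch, assuming no public holidays and that the clock starts when the incident is recorded; `add_business_time` and `notification_times` are hypothetical helpers, not part of the Service Desk tool.

```python
from __future__ import annotations

from datetime import datetime, timedelta

DAY_START_HOUR = 8   # 8am
DAY_END_HOUR = 17    # 5pm, giving a 9-hour working day

# Resolution targets in business hours (days converted at 9 hours per day).
RESOLUTION_TARGET_HOURS = {1: 4, 2: 8, 3: 12, 4: 3 * 9, 5: 5 * 9}


def add_business_time(start: datetime, hours: float) -> datetime:
    """Advance `start` by `hours`, counting only 8am-5pm, Monday to Friday."""
    remaining = timedelta(hours=hours)
    current = start
    while True:
        if current.weekday() >= 5:
            # Saturday or Sunday: jump to 8am on the following Monday.
            monday = current + timedelta(days=7 - current.weekday())
            current = monday.replace(hour=DAY_START_HOUR, minute=0, second=0, microsecond=0)
            continue
        day_start = current.replace(hour=DAY_START_HOUR, minute=0, second=0, microsecond=0)
        day_end = current.replace(hour=DAY_END_HOUR, minute=0, second=0, microsecond=0)
        if current < day_start:
            current = day_start  # before 8am: the clock starts at 8am
        if current >= day_end:
            current = day_start + timedelta(days=1)  # after 5pm: resume the next day
            continue
        available = day_end - current
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = day_start + timedelta(days=1)


def notification_times(recorded: datetime, priority: int) -> tuple[datetime, datetime]:
    """Return (75% notification time, SLA breach time) for an incident."""
    target = RESOLUTION_TARGET_HOURS[priority]
    return (
        add_business_time(recorded, 0.75 * target),
        add_business_time(recorded, target),
    )


if __name__ == "__main__":
    # Priority 1 recorded at 3pm on Friday 3 May 2013.
    warn, breach = notification_times(datetime(2013, 5, 3, 15, 0), 1)
    print(warn, breach)  # prints 2013-05-06 09:00:00 2013-05-06 10:00:00
```

For Priority 4, for example, 0.75 × 27 business hours gives 20.25 hours, which matches the 2.25 days listed for the 75% notification in Table A.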
Table A details when internal notification occurs, for 75% of target resolution time elapsed and for SLA breaches, for all priorities. Table B details when additional internal and external notifications occur for each priority.

Table A
Event: 75% of target resolution time elapsed (Priority 1 = 3 hours; 2 = 6 hours; 3 = 9 hours; 4 = 2.25 days; 5 = 3.75 days)
  Notification sent to: The Analyst the Incident is assigned to; Group Manager of the assigned group; Category Owner of the Incident Category
  Method: Email generated by Service Desk tool
Event: 100% – SLA breached (Priority 1 = 4 hours; 2 = 8 hours; 3 = 12 hours; 4 = 3 days; 5 = 5 days)
  Notification sent to: The Analyst the Incident is assigned to; Group Manager of the assigned group; Category Owner of the Incident Category
  Method: Email generated by Service Desk tool
Event: Weekly
  Notification sent to: Report of all Incidents recorded in this period, including breaches, sent to Group Managers
  Method: Reports generated from Service Desk tool

Note: 1 working day = 9 hours, 8am – 5pm. The clock does not run outside these hours for the purpose of escalation notification.

Table B
Priority 1 and 2 – Immediately after an incident has been assigned to a group or transferred to another group
  Notification sent to: Group Manager of the assigned group; Category Owner of the Incident Category; Incident Management Process Owner (Manager, Library and IT Help); Service Desk Manager (Library and IT Help Management Team); The Analyst the Incident is assigned to
  Method: Email generated by Service Desk tool. An SDT Announcement is created to notify INS staff using SDT and/or the Griffith University Community.
Priority 1 – 30 minutes (if the Incident has not been responded to or appropriately updated)
  Notification sent to: Product Manager, Team Leader or Duty phone of the assigned group
  Method: The analyst who assigns or transfers the incident to another support group notifies the group via telephone.
  Notification sent to: All INS Directors & Associate Directors; PVC (INS); Group Manager of the assigned group; Category Owner of the Incident Category; Incident Management Process Owner (Manager, Library and IT Help); Service Desk Manager (Library and IT Help Management Team); The Analyst the Incident is assigned to
  Method: Email generated by Service Desk tool
  Notification sent to: Business/Clients
  Method: The Library and IT Help phone greeting may require updating to include information relating to the Incident. An SDT Announcement is created to notify INS staff using SDT and/or the Griffith University Community.
Priority 1 – After 1 hour (if the Incident has not been responded to or appropriately updated)
  Notification sent to: The Analyst the Incident is assigned to; Team Leader of the assigned group
  Method: Email generated by Service Desk tool
  Notification sent to: Business/Clients
  Method: The SDT Announcement is updated with a progress report on the Incident. The Library and IT Help phone greeting is updated to keep clients informed of progress. An email from the relevant Director/PVC (INS) might be sent to all staff and all students.
Priority 1 – Hourly (if the Incident has not been responded to or appropriately updated)
  Notification sent to: Team Leader of the assigned group
  Method: Email generated by Service Desk tool
  Notification sent to: Business/Clients
  Method: The SDT Announcement is updated with a progress report on the Incident. The Library and IT Help phone greeting is updated to keep clients informed of progress. An update email from the relevant Director/PVC (INS) might be sent to all staff and all students.
Priority 2 – Immediately after an incident has been assigned to a group or transferred to another group
  Notification sent to: Product Manager, Team Leader or Duty phone of the assigned group
  Method: The analyst who assigns or transfers the incident to another support group notifies the group via telephone. An SDT Announcement is created to notify INS staff using L&ITH@G and/or the Griffith University Community.
Priority 2 – 30 minutes after an incident has been assigned to a group or transferred to another group
  Notification sent to: Team Leader of the assigned group; Incident Management Process Owner; Service Desk Manager (Library and IT Help Management Team)
  Method: Email generated by Service Desk tool
Priority 2 – 1 hour (if the Incident has not been responded to or appropriately updated)
  Notification sent to: All INS Directors & Associate Directors; PVC (INS); Group Manager of the assigned group; Category Owner of the Incident Category; Incident Management Process Owner (Manager, Library and IT Help); Service Desk Manager (Library and IT Help Management Team); The Analyst the Incident is assigned to
  Method: Email generated by Service Desk tool
Priority 2 – 2 hours (if the Incident has not been responded to or appropriately updated)
  Notification sent to: Business/Clients
  Method: The Library and IT Help phone greeting may require updating to include information relating to the Incident.
  Notification sent to: The Analyst the Incident is assigned to; Team Leader of the assigned group; All INS Directors & Associate Directors
  Method: Email generated by Service Desk tool
Priority 3–5 – Immediately after an incident has been assigned to a group or transferred to another group
  Notification sent to: The Analyst the Incident is assigned to; Group Manager of the assigned group
  Method: Email generated by Service Desk tool

2.7 Major Outage Procedure
A major outage is an incident that results in significant disruption to Griffith University staff and students. It affects the majority of the university’s enterprise systems and most or all of the university’s clients. Note that although a major outage is classified as a Priority 1 incident, not all Priority 1 incidents are major outages. Refer to section 2.3 “Incident Classification” for further clarification.
2.7.1 Major Outage Procedure during Business Hours
Whenever a major outage occurs, a Priority 1 incident will be recorded and escalated immediately for:
- Functional escalation to the relevant specialist group (2nd or 3rd tier support) to resolve the incident
- Hierarchical escalation to the Manager, Library and IT Help (Incident Management Process Owner) for awareness and relevant communication to the business

The Manager, Library and IT Help (Incident Management Process Owner) or the relevant Director/Associate Director will coordinate communication within Information Services about the major outage, and will liaise with other Product and Service Managers to ensure that adequate resources are made available to resolve the Priority 1 incident as soon as possible.

If the incident has not been responded to or appropriately updated within 30 minutes, the Manager, Library and IT Help (Incident Management Process Owner) or the relevant Director/Associate Director will liaise with the responsible Product and Service Manager(s) to ensure that a Priority Team including all technical specialists is mobilised.

The Priority Team is responsible for:
- Being a communication contact point
- Ongoing communication with the relevant business units/managers about the status of the incident (a status update will be provided to the business every hour)
- Resolution of the incident
- Continuous updates, as the situation changes, to the PVC (Information Services), Director, Associate Directors and the Service Desk about the status of the incident
- A detailed report on the cause and resolution of the major outage after the incident has been resolved and closed

2.7.2 Major Outage Procedure outside Business Hours
Recording, resolution and communication of major outages outside business hours are the responsibility of the appropriate on-call team. The on-call team will advise the Product and Service Manager responsible for the service, who will contact the relevant Associate Director/Director.
It is the responsibility of the Associate Director/Director to advise the Pro Vice Chancellor, Information Services. If the major outage is likely to continue into business hours, the Manager, Library and IT Help (Incident Manager) needs to be advised, so that they can take over communication with clients and Information Services staff together with a Priority Team.

2.7.3 Roles and Responsibilities
Staff who will normally be involved in resolving a major outage, and in coordinating the resolution and communication processes, are:
- Incident Manager
- PVC, Director, Associate Director
- Product and Service Manager
- Technical specialist
- On-call staff
- Vendor staff

2.7.4 Major Outage Communication Guidelines

2.7.4.1 Purpose
The purpose of communication during a major outage is to immediately investigate and confirm the impact and severity of the incident. It should also confirm that the incident is a major outage and an emergency situation.

2.7.4.2 Frequency
The frequency of communication will be determined by the impact and severity of the outage. The frequency of communication for Priority 1 and 2 incidents is outlined in section 2.6.1 “Incident Notification and Escalation”, Table B above.

2.7.4.3 Content
The content will depend on the audience. Communication with staff tasked with resolving the incident will be internally focused and detailed, and will contain clear actions and timelines. Communication with clients will focus on the impact and on what is being done to minimise it and resolve the incident. It should not contain technical descriptions or detailed information about internal processes.
The content will focus on:
- The nature and extent of the outage
- Assessment of the impact
- A high-level overview of actions taken to resolve the incident
- Estimated resolution time
- Confirmation that the incident has been resolved

2.7.4.4 Communication Channels

Every communication channel available at the time of the major outage should be used to communicate with clients and Information Services staff:
- Email
- Web page
- Phone
- Public announcement system
- Face to face (response team meetings, meetings with staff tasked with resolving the incident, meetings with Information Services staff impacted by the incident, CTS Team Leaders informing staff in schools, Library staff informing clients in the Library, etc.)
- Printed notices
- Notice boards
- SMS messages to key stakeholders advising them whom to contact for more information or updates

3 Roles and Responsibilities

3.1 Manager, Library and IT Help (Incident Management Process Owner)

The Manager, Library and IT Help (Incident Management Process Owner) is responsible for the Incident Management process.
The Manager, Library and IT Help (Incident Management Process Owner) has the following responsibilities:
- Monitors the Incident Management process
- Determines the scope of the Incident Management process
- Establishes Incident Management procedures
- Establishes prioritisation and escalation criteria
- Monitors incident escalations
- Establishes links to other service management disciplines
- Monitors trends and takes appropriate action
- Produces high-level management reports about Incident Management
- Reviews Service Desk procedures
- Organises reviews and audits of the process
- Initiates improvement programmes
- Liaises with the Library and IT Help Team Leaders (Service Desk Manager)
- Liaises with the Problem Manager
- Produces information for clients
- Liaises with the Change Manager and Service Level Manager over proposed changes

3.2 Library and IT Help Campus Coordinators (Service Desk Manager)

All Library and IT Help Campus Coordinators (Service Desk Manager) are responsible for the Service Desk tool and act as line managers of Service Desk staff.
The Library and IT Help Campus Coordinators (Service Desk Manager) have the following responsibilities:
- Monitor the quality of delivered services
- Determine the organisation, structure and scope of the Service Desk in consultation with the Manager, Library and IT Help (Incident Management Process Owner)
- Manage Service Desk staff
- Review staffing levels
- Review skill requirements
- Organise training
- Coordinate management reporting about the Service Desk function (includes process reporting)
- Take responsibility for internal and external communication (clients, Information Services, Service Desk)
- Liaise with Product Managers
- Liaise with the business
- Promote the Service Desk
- Establish links to service management processes
- Organise client satisfaction surveys (together with process owners)
- Liaise with other support teams providing technical resources
- Participate in Service Desk tool selection, tailoring and installation

3.3 First Tier (Service Desk) Support

The Service Desk provides first tier support. First tier support is responsible for:
- Incident and service request registration
- Initial support and classification
- Resolution and recovery of incidents
- Escalation of incidents when necessary
- Monitoring the status and progress toward resolution of all open incidents
- Following up on behalf of clients about progress towards resolution
- Monitoring response and resolution times
- Closure of incidents (this process has been automated; please refer to 2.4.7)
- Keeping affected clients informed about progress
- Quality checking of closed incidents
- Identifying the need for, and creating or editing, Knowledge Base documents to assist with a timely response to future incidents
- Active and proactive relationship management with clients
- Communication with clients about Information Services issues and service requests (status updates)
- Advising and assisting Information Services clients to make the best use of services provided by Library and IT Help
- Encouraging use of self-help resources

Note: Library and IT Help is the Information Services product/service line that provides first tier support for the majority of Information Services products and services. For the purpose of the Incident Management process, Library and IT Help is defined as the Service Desk. However, all groups involved in resolving incidents and using the Service Desk tool are responsible for following Service Desk processes. For example, all groups using the Service Desk tool should record incidents they detect, as well as incidents reported to them by other groups within Information Services that are not currently using the Service Desk tool.

3.4 Resolution Groups (Second or Third Tier Support Groups)

These groups comprise technical specialists who have strong relationships with other areas within Information Services and good skills in analysing incidents and problems.
For Incident Management, there are two types of resolution group, each with the responsibilities set out below:
- On Site Support
- Technical Support

3.4.1 On Site Support (Second or Third Tier Support Groups)

On Site Support is responsible for:
- Resolution and recovery of incidents that need support at a physical location
- Escalating incidents where necessary (hierarchical escalation)
- Escalation to another support group if necessary (functional escalation)
- Resolution and recovery of assigned incidents
- Monitoring the status and progress towards resolution of all open incidents assigned to their group
- Communicating solutions and workarounds to Library and IT Help (Service Desk/First Tier Support) to assist in incident classification, initial support and escalation
- Promoting Library and IT Help (Service Desk/First Tier Support) and Information Services on location (e.g. informing clients of the correct channels for reporting incidents and of how to obtain updates about outages and the status of requests)
- Keeping clients informed about the status of incidents assigned to their group
- Accurate and complete recording of incidents from and to other internal groups within Information Services, using the Service Desk tool
- Accurate and complete updating of the activities and steps taken to resolve the incident
3.4.2 Technical Support (Second or Third Tier Support Groups)

Technical Support is responsible for:
- Escalating service requests where necessary
- Resolution and recovery of assigned incidents
- Monitoring the status and progress toward resolution of all open incidents assigned to their group
- Monitoring tasks for servers, network components and applications
- Keeping affected clients informed about progress
- Escalation to another support group if necessary
- Communicating solutions and workarounds to On Site Support and Library and IT Help (Service Desk/First Tier Support)
- Communicating incidents generated by monitoring tools to Library and IT Help (Service Desk/First Tier Support)
- Providing monitoring and diagnosis tools to Library and IT Help (Service Desk/First Tier Support)
- Creating knowledge base documents and providing relevant training to Service Desk staff
- Providing any other information to Library and IT Help (Service Desk/First Tier Support) to assist in incident classification, initial support and escalation
- Accurate and complete recording of incidents from and to other internal groups within Information Services, using the Service Desk tool
- Accurate and complete updating of the activities and steps taken to resolve the incident
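In the role explanations accompanying the ARCI matrix that follows, Accountable is defined as "the person", while Responsible may be several people and several levels. This implies that each activity should have exactly one Accountable role, and that can be checked mechanically. The following is a minimal Python sketch, not part of the Service Desk tool. It assumes one reading of the extracted matrix, with columns in the order 1st Tier, 2nd Tier, 3rd Tier, Service Desk Manager, Incident Management Process Owner, Group Managers and Client, and transcribes only four of its rows. The names `ARCI`, `parse_cell`, `accountable_roles` and `check_single_accountability` are hypothetical.

```python
# Hypothetical sketch: encode part of the ARCI matrix and check that each
# activity has exactly one Accountable role. Codes: A = Accountable,
# R = Responsible, C = Consulted, I = Informed, "-" = no role.

ROLES = (
    "1st Tier", "2nd Tier", "3rd Tier", "Service Desk Manager",
    "Incident Management Process Owner", "Group Managers", "Client",
)

# Only four rows are transcribed; each cell is a comma-separated string of
# codes (for example "R,C"), listed in the same order as ROLES.
ARCI = {
    "Incident detection and recording": ["R", "R", "R", "A", "-", "-", "-"],
    "Investigation and diagnosis": ["R", "R,C", "R,C", "I,C", "A", "R", "I"],
    "Client approval of solution": ["R", "R", "R", "I", "R", "-", "A"],
    "Closure": ["R", "I", "I", "A", "I", "I", "R"],
}


def parse_cell(cell: str) -> set[str]:
    """Turn a cell such as "R,C" into {"R", "C"}; "-" becomes an empty set."""
    return set() if cell == "-" else set(cell.split(","))


def accountable_roles(activity: str) -> list[str]:
    """Return every role marked Accountable for the given activity."""
    return [role for role, cell in zip(ROLES, ARCI[activity]) if "A" in parse_cell(cell)]


def check_single_accountability() -> list[str]:
    """Return the activities that do not have exactly one Accountable role."""
    return [activity for activity in ARCI if len(accountable_roles(activity)) != 1]
```

For example, `accountable_roles("Closure")` returns `["Service Desk Manager"]`, and `check_single_accountability()` returns an empty list because every transcribed activity has exactly one Accountable role. Storing each cell as a set of codes keeps combined entries such as "R,C" intact.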
ARCI Matrix: Defined Procedural Roles

Role columns: 1T = 1st Tier, 2T = 2nd Tier, 3T = 3rd Tier, SDM = Service Desk Manager, IMPO = Incident Management Process Owner, GM = Group Managers, CL = Client.

Procedural activity                   1T    2T    3T    SDM   IMPO  GM    CL
Incident submitted to Service Desk    A     I     I     I     -     -     R
Incident detection and recording      R     R     R     A     -     -     -
Incident process                      R     R     R     A     -     -     I
Request for Information process       R     R     R     A     -     -     I
Give the client a reference number    R     R     R     A     -     -     I
Initial support and classification    R     R,C   R,C   A     I     I     I
Escalation to the right support group R     R     R     A     I     I     I
Communicate status updates to client  A     R     R     I     I     I     I
Investigation and diagnosis           R     R,C   R,C   I,C   A     R     I
Escalate using escalation procedure   R     R,C   R,C   C,R   A     R     I
Resolution and recovery               R     R,C   R,C   C,R   A     R     I
Client approval of solution           R     R     R     I     R     -     A
Closure                               R     I     I     A     I     I     R

Explanation of Roles
- Accountable: The person in this process who is accountable for ensuring that the overall process is available, understood and performed correctly.
- Responsible: The person(s) expected to perform the prescribed activity and to resolve and/or escalate the related issues. Multiple levels within the matrix can do this.
- Consulted: The person(s) consulted before decisions are made or implementations carried out.
- Informed: The person(s) who need to be informed about the prescribed activity.

3.4.3 Product and Service Managers

Product and Service Managers are responsible for:
- Monitoring the status and progress toward resolution of all open incidents assigned to their group
- Ensuring that adequate resources are available for the efficient and effective resolution of incidents assigned to their group
- Monitoring of, and adherence to, response and resolution times of incidents assigned to their groups
- Monitoring client feedback for their product and service group and following up on negative client feedback
- Ensuring that adequate resources are available to resolve Priority 1 incidents assigned to their groups as soon as possible
- When a Priority 1 incident has not been resolved within 30 minutes, liaising with the Manager, Library and IT Help (Incident Management Process Owner) about the situation and the mobilisation of a Priority Team, which can include technical specialists from all product and service groups

4 Communication Framework

This section consists of two parts:
1. Communication framework: the different communication lines between Information Services and the business at the operational, tactical and strategic levels.
2. Relationship with other processes: the input and output flows between Incident Management and the other processes (Problem, Change, Configuration and Service Level Management).

4.1 Communication Framework

The Communication Framework provides a high-level overview of the different communication lines between Information Services and the business at the operational, tactical and strategic levels.
The communication framework for the Incident Management process is shown below.

[Figure: Incident Management communication framework. Communication lines run at the strategic level (Business Director; PVC (INS), DSD), the tactical level (Business Manager) and the operational level (End User; Service Desk; Incident/Problem Manager), carrying items such as exception reporting, KPIs, the INS and Griffith Strategic Plans, the INS budget, SLAs and SLA reporting, the Service Catalogue, requests for service outside SLA, client satisfaction surveys, incident and outage reporting, service delivery news and training.]

4.2 Relationship with Other Processes

The following figure provides an overview of how Incident Management fits into the Service Support processes as described by ITIL.

[Figure: Incident Management in relation to the Service Desk, IT Operations, and Problem, Change, Configuration and Service Level Management, with incidents and RFCs passed between them.]

The main input and output flows between Incident Management and the other processes that will be implemented at Griffith University (Problem, Change, Configuration and Service Level Management) are detailed in the following sections.
4.3 Incident Management & Problem Management

4.3.1.1 Input

Information needed from the Problem Management process by Incident Management includes:
- Resolutions for incidents
- Workarounds for incidents
- Knowledge Base (known errors, existing resolutions, accepted workarounds)

4.3.1.2 Output

Information provided to Problem Management by Incident Management includes:
- Incident details (affected systems, affected clients, classification, details of the error)
- History of past incidents
- Proposed workarounds for incidents
- Proposed solutions for incidents

4.3.2 Incident Management & Change Management

4.3.2.1 Input

Information needed from the Change Management process by Incident Management includes:
- Change schedule
- Status updates on scheduled changes
- Results of implemented changes (history)

4.3.2.2 Output

Information provided to Change Management by Incident Management includes:
- Accepted RFCs
- Advice on incidents resulting from an implemented change (feedback)

4.4 Incident Management & Configuration Management

4.4.1.1 Input

Information needed from the Configuration Management process by Incident Management includes:
- Details of Configuration Items (CIs)
- Relationships between CIs
- Service levels for CIs
- Service contact details

4.4.1.2 Output

Information provided to Configuration Management by Incident Management includes:
- Errors or discrepancies in the Configuration Management Database (CMDB)
- Relationships between incidents and CIs

4.5 Incident Management & Service Level Management

4.5.1.1 Input

Information needed from the Service Level Management process by Incident Management includes:
- Service levels / KPIs
- Business priority escalations
- Service Catalogue
- Client satisfaction / feedback about the Incident Management process
- Communication about new services

4.5.1.2 Output

Information provided to Service Level Management by Incident Management includes:
- Incidents outside the Service Level Agreement (SLA)
- Requests for service outside SLA (new ad hoc business requirements)
- Client satisfaction
- Exceptions to SLAs
- Escalation of priority calls
- Process information (management reporting)
- KPI reporting

5 Performance Management

The following Key Performance Indicators (KPIs) have been set for the Incident Management process:
- 80% of incidents responded to within SLA (response time)
- 80% of incidents resolved within SLA (resolution time)
- 100% of non-pending incidents to have an activity log updated within the last 2 days
- 90% of incidents to follow the predefined Incident Management process

The following KPIs have been set for the Service Desk:
- 95% of calls to be answered within 10 seconds
- Client Satisfaction Survey to return an average rating of 4 or higher (on a 5-point scale)
- 100% of e-mailed incidents to be recorded within 24 business hours after the email is received
- 80% of incidents solved at first tier

6 Management Reports

This section specifies the management information provided by the Incident Management process.
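Several of the targets in section 5 are simple percentage thresholds, so they can be checked directly from incident records. The following is a minimal Python sketch under stated assumptions. The `Incident` fields, `pct` and `kpi_report` are hypothetical and are not fields or functions of the Service Desk tool. "Resolved within SLA" and "solved at first tier" are measured over resolved incidents only, and the activity-log rule is applied to open, non-pending incidents and measured in calendar days. The process-compliance, call-answering, satisfaction and e-mail KPIs need data the sketch does not model, so they are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical fields; a real Service Desk tool export will look different.
    responded_within_sla: bool
    resolved_within_sla: Optional[bool]  # None while the incident is unresolved
    resolution_tier: Optional[int]       # 1, 2 or 3 once resolved, otherwise None
    pending: bool                        # pending incidents are exempt from the log-age KPI
    last_activity: datetime              # time of the latest activity-log entry


def pct(flags: list[bool]) -> float:
    """Percentage of True values; an empty population counts as fully compliant."""
    return 100.0 * sum(flags) / len(flags) if flags else 100.0


def kpi_report(incidents: list[Incident], now: datetime) -> dict[str, tuple[float, float, bool]]:
    """Map each KPI name to (actual %, target %, whether the target is met)."""
    resolved = [i for i in incidents if i.resolution_tier is not None]
    open_active = [i for i in incidents if i.resolution_tier is None and not i.pending]
    measured = {
        "Responded within SLA": (pct([i.responded_within_sla for i in incidents]), 80.0),
        "Resolved within SLA": (pct([bool(i.resolved_within_sla) for i in resolved]), 80.0),
        "Activity log < 2 days old": (
            pct([now - i.last_activity < timedelta(days=2) for i in open_active]),
            100.0,
        ),
        "Solved at 1st tier": (pct([i.resolution_tier == 1 for i in resolved]), 80.0),
    }
    return {name: (actual, target, actual >= target) for name, (actual, target) in measured.items()}
```

With four sample incidents, of which three were responded to within SLA, the first KPI evaluates to 75%, short of the 80% target. Returning the actual value alongside the target, rather than a bare pass/fail flag, lets the same result feed the "Performance Against SLA" style of management report.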
6.1 Management Reports

Management reporting takes place on a daily, weekly and monthly basis on the following subjects:

Daily:
- All Exceptions
- Critical Issues
- Open Incidents
- Closed Incidents
- New Incidents
- Group Breakdown
- Priority 1 Incident Status
- SLA Exceptions

Weekly:
- All Exceptions
- Incident Summary

Monthly:
- All Exceptions
- Availability of services
- Current Number of Users of Service Desk tool
- Implemented Improvements
- Incidents Summary – Rolling Trend
- Incident Reporting, Incidents by Category, Priority and Service Group
- Incident Summary (Detailed KPIs)
- Performance Against SLA
- Priority 1 Incidents
- Recommendations
- Top 10 Service Desk tool users

Management reports are submitted:
- On a daily basis to all staff involved in the Incident Management process
- On a weekly basis to Information Services management
- On a monthly basis to Information Services management, the Library and IT Help Team Leaders (Service Desk Manager) and the Service Level Manager

These management reports are used to monitor the success of the Incident Management process and to identify any problems with it.

6.1.1 Incident Management Process Reports

The following reports will be available for the Incident Management process. Each metric is listed with its use.
- Total number of incidents recorded in the Service Desk tool (open and closed): Gives an indication of the overall workload of Information Services staff.
- Total number of incidents recorded per Information Services service group (open and closed): Gives a breakdown of incidents logged to show which departments require the most support. Further analysis can drill down into "problem" departments to identify key groups that need more assistance than average.
- Number of incidents recorded per category (open and closed): Shows which parts of the infrastructure are creating the most incidents. Useful for identifying areas that could require detailed analysis to remove common problems.
- Percentage and number of incidents resolved within service level times: An important measure of the level of service being provided to Information Services clients.
- Percentage of incidents per priority code: Shows the workload per priority code and hence per service level. This data can be used to determine staffing levels and costs of services, or to review the defined priority codes.
- Percentage of incidents resolved at first, second and third tier: Shows where support work is taking place. This is a particularly useful metric for the Service Desk, because incidents resolved at first tier take work away from second and third tier, freeing them up for more proactive tasks.
- Number of incidents assigned per Information Services group/staff member (open and closed): Shows the amount of work that different staff members and groups are processing.

Apart from the specific metrics above, the following reports can be made available:
- Results of client satisfaction surveys
- A summary of any major outages, the actions taken to fix each outage and the steps taken to ensure that it does not occur again

7 Process Review

An ongoing review is essential to maintain continuous improvement of the Incident Management process. A detailed Incident Management review should be undertaken every six months to ensure that quality is maintained or improved. The following steps should take place for each review.

Gather data and information from the Service Desk tool, Client Satisfaction Surveys and Incident Management staff. The Manager, Library and IT Help (Incident Management Process Owner) should proactively seek input for the process review from Information Services staff.

Analyse the data, looking at areas such as client satisfaction, suggestions from staff and process metrics.
Specific metrics to examine include those listed in sections 5 "Performance Management" and 6 "Management Reports", as well as the following:
- Correct logging of incident data in terms of incident categories, priority and specific information relating to the incident
- Review of Incident Management reports, such as performance against service levels
- State of the Incident Management Process Handbook (up to date or awaiting review)
- Problems identified with the process

From the above analysis, identify improvement opportunities and put forward a report detailing the suggestions. This report should be submitted to Information Services management.

Following approval of the improvements documented in this report, a Process Improvement Plan should be formulated. Changes to the Incident Management process should be authorised through the Change Management process. Once authorised, the changes should be implemented.

Following an improvement project, a Post Implementation Review should be carried out. This review should look at all the previously analysed reports to confirm that the improvements have made a positive change.

Appendix A: Incident Management Quick Reference Card

Appendix B: Incident Categories

Categories are updated on a continuous basis to reflect the categories in the Service Desk tool.

Appendix C: Abbreviations and Definitions

7.1 Abbreviations and Acronyms Used

INS: Information Services
ITIL: Information Technology Infrastructure Library
RFC: Request For Change
SLA: Service Level Agreement

7.2 Definition of Terms Used

Business Manager: A person authorised to make decisions on behalf of an organisational unit concerning a service and its associated service levels.
Change: Any action, either physical or procedural, that modifies or impacts the production environment.

Change Management: The management and control of changes to the production environment, in order to minimise the impact of change-related problems.

Change Manager: The person responsible for processing change requests, chairing Change Advisory Board meetings, coordinating changes and reporting change activity to management.

Classification: Determining the value of items by placing them in a certain order on the basis of category, impact and severity. It can be used to support decisions concerning priorities.

Configuration Item (CI): A component of an IT infrastructure. CIs may vary widely in complexity, size and type, from an entire system (including all hardware, software and documentation) to a single software module or a minor hardware component.

Configuration Management: The process of identifying and defining the CIs in a system, recording and reporting the status of CIs, and verifying the completeness and correctness of CIs.

Incident: ITIL definition: Any event that deviates from the standard and expected operation of an IT system or service. Further description: An incident can be seen as a client requesting help for something that is not working, for example "I can't print" or "I can't access the Internet". In any situation where something does not work and the specific details are not known, it is an incident.

Incident Management: The process whose primary focus is to restore normal service operation as quickly as possible and to minimise the adverse impact on business operations.

Incident Recording: High-quality recording of incidents in such a way that other activities and increased service provision are possible.

Incident Reporting: The reporting of incidents and requests by clients and/or support groups.
Information Services (INS): Information Services encompasses the supporting technologies and infrastructure on which the systems are run.

Information Technology Infrastructure Library (ITIL): ITIL is a non-proprietary framework for the operation of IT infrastructure, developed by the UK Office of Government Commerce. It is a comprehensive, consistent and coherent set of codes of best practice for IT Service Management.

Information Services: The element of Griffith University that encompasses the supporting technologies and infrastructure on which the systems are run and services are provided.

Information Services infrastructure: The sum of an organisation's IT-related hardware, software, data communication facilities, procedures and people.

Information Services service: A described set of facilities, IT and non-IT, supported by Information Services (the service provider) that fulfils one or more needs of the client, that supports the client’

Key Performance Indicator (KPI): Clearly defined objectives with measurable targets, set to judge process performance.

Known Error: ITIL definition: The successful diagnosis of the root cause of a problem (i.e. the specific infrastructure component at fault has been identified). Further description: A known error is logged when the specific root cause is known for a group of problems/incidents or for a single major problem. The known error record will define exactly what has gone wrong and the solution, so that it does not happen again. Continuing our example, a known error would be: "There is a fault with the network card in the printer in department X that is causing the printing problems". A known error is more defined than a problem.

Problem: ITIL definition: The unknown underlying cause of one or more incidents. More specifically, a condition identified as a result of multiple incidents that exhibit common symptoms, or of a single significant incident indicative of a single error. Further description: A problem is a more specific definition of something that has gone wrong. Quite often a number of similar incidents are linked to a common problem. Where a number of clients are unable to print, a problem will be defined saying something like "there is a problem with the network in department X causing printing problems". A problem is more defined than an incident.

Problem Management: The process whose primary focus is to minimise the adverse impact on the business of incidents and problems caused by errors within the IT infrastructure, and to prevent the recurrence of incidents related to these errors.

Process: A connected series of actions, activities or operations performed with the intent of satisfying a purpose or achieving a goal.

Release Management: The process whose primary focus is to securely control the physical and logical storage, management, distribution and implementation of all software assets, ensuring that only currently authorised and quality-checked versions of software are brought into use in the production environment, at minimal cost.

Request For Change (RFC): A form or screen used to record details of a request for a change to any component of an IT infrastructure.

Service Catalogue: A written statement of services, default service levels and options.

Service Desk: The Information Services organisational unit that makes its services accessible to clients. The Library and IT Help product/service group is this unit. All other Information Services groups (e.g. S3, NCS, CTS) contribute to the Service Desk Incident Management process. See section 3.3 "First Tier (Service Desk) Support".

Service Desk tool: The application used to record incidents, RFIs, changes and problems (i.e. Service-Now.com).
Service Level: The expression of an aspect of a service in definitive and quantifiable terms.

Service Level Agreement (SLA): A formal agreement between the client(s) and the IT service provider specifying service levels and the terms under which a service or a package of services is provided to the client.

Service Level Management: The process of regular communication with the client to find out their requirements and to offer new services and technologies.

Appendix D: Unsupported Functions

NB: There are no unsupported functions at present.