Interim major incident management process

IT Helpdesk responsibilities

Objective
• Restore service with minimal downtime
• Effective communication and co-ordination

Responsibilities

Task | Is New?
-----|--------
Create an incident; identify patterns/multiple calls as a potential major incident | Current
…or… create a suspected major incident as reported by the Support Team (via phone) | New
Categorise incident(s) as major (MI) by comparing with the priority matrix and critical services list (Gold/Silver/Bronze) | New
Apply Solution Template “MAJOR INCIDENT” and assign the MI to the Major Incident Manager (MIM) via phone (+61 3 992 52777) and infra call ticket (…. also notify the primary support team to begin investigation while the MIM is engaged) | New / Current
Co-ordinate with the Support Team to get updates | New
Update the MIM on progress via infra call updates and phone (+61 3 992 52777) | New

RMIT University©2011 Information Technology Services

Hours of coverage
• Major Incident Manager operates from 8am to 8pm (core business/IT Helpdesk hours)
• Quality Assurance Services (QAS) and IT Helpdesk will share responsibility to provide 12 hour coverage
• Use +61 3 992 52777 to contact the Major Incident Manager (note: call may divert to IT Helpdesk depending on time of day)
• Out of hours, the current “as is” process for each support team remains:
– Technical “on call” person notified by alerts
– “On call” person fixes
– Escalate to line manager if necessary
Note: The out of hours “as is” process varies between teams (to be standardised at a later date)

Process flow diagram

Major Incident Management - Team Roles & Responsibilities

[Swimlane flow diagram. Lanes: User Call, IT Helpdesk, Support Team, MI Manager (MIM), MIM Team & Response Plan. Stages: Incident Categorisation, Report Incident, Plan Execution, MI Closure. The box text is listed below by stage.]

Incident categorisation: IT Helpdesk (trigger: user call)
• Create incident
• Is it a major? (use priority matrix)
– NO: continue with standard incident management and fix process
– YES: create incident (apply Solution Templates; email + phone)

Incident categorisation: Support Team (trigger: monitoring tool, etc.)
• Start: incident found
• Conduct initial assessment to determine impacted services and user groups
• Is major? (use priority matrix)
– NO: continue with standard incident management and fix process
– YES (preliminary assessment): phone MIM to establish major incident response

Report incident
• Update progress to MIM; repeat at a frequency of 30 minutes or as agreed with MIM

Plan execution: MIM Team & Response Plan
• MI Team to conduct thorough assessment and develop response
• Get response approval from MIM
• MI Team to execute the response
• Progress update to IT Helpdesk/MIM at a regular interval as agreed with MIM
• Update incident record
• Service restored

MI closure
• Monitor the impacted services
• Prepare root cause and incident closure report
• Prepare report within 5 days from major incident resolved date
• Get approval
• Problem management (notify, if applicable)
• Initiate follow up with appropriate teams (if applicable)
• Closure

Evaluation

Urgency x Impact = Priority

Urgency
• High
• Medium
• Low

Impact
• University wide
• Campus wide
• Building (50+)/Faculty
• Floor/Lab (10+)/Course
• Individual

Service tiers
• Gold - Top "10" services or core *
• Silver - Mid tier services
• Bronze - Other services

Top 10
• Peoplesoft
• SAP
• myRMIT / Learning Hub
• Blackboard
• Google Mail
• Staff Groupwise Mail
• VOIP
• Teaching Spaces (AV)
• Teaching Spaces (IT)
• EOL / STS

Core
• Network
• Load Balancers
• Storage
• NDS/AD
• DNS
• DHCP
• Firewalls
• etc…

Priority Matrix

Urgency | Individual | Floor/Lab/Course | Building/Faculty | Campus Wide | University Wide
--------|------------|------------------|------------------|-------------|----------------
Low     | P4         | P4               | P3               | P3          | P3
Medium  | P4         | P3               | P2               | P2          | P1
High    | P3         | P2               | P1               | P1          | P1

P1 = Major Incident

* Refer to “Critical Services List” on the http://www.rmit.edu.au/its/majorincident for latest information
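
The priority matrix can also be read as a simple lookup: pick the row for the urgency, the column for the impact, and treat a result of P1 as a major incident. The sketch below is illustrative only and is not part of the RMIT process or tooling. The names `Urgency`, `Impact`, `PRIORITY_MATRIX`, `priority` and `is_major_incident` are assumptions introduced here; only the cell values come from the matrix itself.

```python
from enum import Enum


class Urgency(Enum):
    """Rows of the priority matrix (hypothetical names)."""
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


class Impact(Enum):
    """Columns of the priority matrix, in left-to-right order (hypothetical names)."""
    INDIVIDUAL = 0
    FLOOR_LAB_COURSE = 1
    BUILDING_FACULTY = 2
    CAMPUS_WIDE = 3
    UNIVERSITY_WIDE = 4


# Cell values transcribed from the priority matrix slide.
PRIORITY_MATRIX = {
    Urgency.LOW:    ("P4", "P4", "P3", "P3", "P3"),
    Urgency.MEDIUM: ("P4", "P3", "P2", "P2", "P1"),
    Urgency.HIGH:   ("P3", "P2", "P1", "P1", "P1"),
}


def priority(urgency: Urgency, impact: Impact) -> str:
    """Return the priority code (P1 to P4) for an urgency/impact pair."""
    return PRIORITY_MATRIX[urgency][impact.value]


def is_major_incident(urgency: Urgency, impact: Impact) -> bool:
    """P1 = Major Incident, per the slide."""
    return priority(urgency, impact) == "P1"


if __name__ == "__main__":
    # Example: a high-urgency fault affecting a whole building or faculty.
    print(priority(Urgency.HIGH, Impact.BUILDING_FACULTY))
    print(is_major_incident(Urgency.HIGH, Impact.BUILDING_FACULTY))
```

In this sketch a high-urgency, building-wide fault resolves to P1 and would be handled as a major incident, while a medium-urgency, campus-wide fault resolves to P2 and stays in standard incident management. The service tier (Gold/Silver/Bronze) is deliberately left out, because the slides do not state how the tier maps to urgency or impact.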