Interim major incident management process
Support team responsibilities

Objective
• Restore service with minimal downtime
• Effective communication
• Ensure and create an environment for support team to fully focus on the incident

Responsibilities

Task                                                                  Is New?
--------------------------------------------------------------------  -------
IT Helpdesk reports major incident to Major Incident Manager (MIM)    New
on +61 3 992 52777 and IT Support team via phone or infra call, and
MIM assigns actions as appropriate
…or…
IT Support team inform IT Helpdesk on finding a suspected/potential   New
major incident via phone (or phone and send infra call ticket)
Conduct thorough assessment                                           Current
Develop response and obtain approval from MIM on +61 3 992 52777      Current
(common sense, speak to line manager if you cannot get hold of MIM!!)
Execute response                                                      Current
Update progress to IT Helpdesk / MIM at regular intervals as agreed   New
with MIM
Restore service                                                       Current
Prepare and document root cause of the incident on Infra Call         New
Monitor the impacted service                                          Current

RMIT University©2011 Information Technology Services

Hours of coverage
• Major Incident Manager operates from 8am to 8pm (core business/IT Helpdesk hours)
• Quality Assurance Services (QAS) and Helpdesk will share responsibility to provide 12 hour coverage
• Use +61 3 992 52777 to contact the Major Incident Manager
• Out of hours, current “as is” process for each support team remains:
  – Technical “on call” person notified by alerts
  – “On call” person fixes
  – Escalate to line manager if necessary

Note: Out of hours “as is” process varies between teams (to be standardised at a later date)

Process flow diagram

[Figure: Major Incident Management - Team Roles & Responsibilities]

Roles: IT Helpdesk, MI Manager (MIM), Support Team
Phases: Report Incident, Incident Categorisation, MIM Team & Response Plan, Plan Execution

<<Trigger>> (user call, monitoring tool, etc.)
• Create Incident
• Is it a Major? <<use priority matrix>>
  – NO: Create Incident, Apply Solution Template
  – YES: Email + Phone

Start: Incident Found
• Conduct initial assessment to determine impacted services and user groups
• Is Major? Use priority matrix to assess
  – NO: Continue with Standard Incident Management & fix process
  – <<YES - Preliminary Assessment>>: Phone MIM to establish Major Incident response

Major incident response
• MI Team to conduct thorough assessment and develop response
• Get response approval from MIM
• MI Team to execute the response
• Progress update to IT Helpdesk / MIM at a regular interval as agreed with MIM
• Update Progress to MIM; repeat at a frequency of 30 minutes or as agreed with MIM
• Update Incident Record
• <<Service Restored>>

MI Closure
• Monitor the impacted services
• Prepare root cause and Incident closure report
• Prepare report within 5 days from Major incident resolved date
• Get Approval
• Problem management (notify, if applicable)
• Initiate follow up with appropriate teams (if applicable)
• Closure

Evaluation

Urgency x Impact = Priority

Urgency   Service tier
-------   --------------------------------
High      Gold - Top "10" services or core
Medium    Silver - Mid tier services
Low       Bronze - Other services

Impact
• University wide
• Campus wide
• Building (50+)/Faculty
• Floor/Lab (10+)/Course
• Individual

* Top 10
• Peoplesoft
• SAP
• myRMIT / Learning Hub
• Blackboard
• Google Mail
• Staff Groupwise Mail
• VOIP
• Teaching Spaces (AV)
• Teaching Spaces (IT)
• EOL / STS

Core
• Network
• Load Balancers
• Storage
• NDS/AD
• DNS
• DHCP
• Firewalls
• etc…

Priority Matrix

Urgency   Individual   Floor/Lab/Course   Building/Faculty   Campus Wide   University Wide
-------   ----------   ----------------   ----------------   -----------   ---------------
Low       P4           P4                 P3                 P3            P3
Medium    P4           P3                 P2                 P2            P1
High      P3           P2                 P1                 P1            P1

P1 = Major Incident

* Refer to “Critical Services List” on http://www.rmit.edu.au/its/majorincident for latest information
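
The priority matrix above can also be read as a simple lookup: pick the row for the urgency, pick the column for the impact, and the cell is the priority. The following is a minimal Python sketch of that lookup, offered only as an illustration. The names IMPACT_LEVELS, PRIORITY_MATRIX, priority and is_major_incident are hypothetical and are not part of the ITS process or any Helpdesk tool; the cell values are copied from the matrix.

```python
# Illustrative sketch only: all names here are hypothetical, not part of
# the ITS process. The cell values are copied from the priority matrix.

# Impact columns, in the same left-to-right order as the matrix.
IMPACT_LEVELS = (
    "Individual",
    "Floor/Lab/Course",
    "Building/Faculty",
    "Campus Wide",
    "University Wide",
)

# One row per urgency level; each tuple lines up with IMPACT_LEVELS.
PRIORITY_MATRIX = {
    "Low":    ("P4", "P4", "P3", "P3", "P3"),
    "Medium": ("P4", "P3", "P2", "P2", "P1"),
    "High":   ("P3", "P2", "P1", "P1", "P1"),
}


def priority(urgency: str, impact: str) -> str:
    """Return the priority (P1 to P4) for an urgency/impact pair."""
    if urgency not in PRIORITY_MATRIX:
        raise ValueError(f"unknown urgency: {urgency!r}")
    if impact not in IMPACT_LEVELS:
        raise ValueError(f"unknown impact: {impact!r}")
    column = IMPACT_LEVELS.index(impact)
    return PRIORITY_MATRIX[urgency][column]


def is_major_incident(urgency: str, impact: str) -> bool:
    """P1 = Major Incident, as stated under the matrix."""
    return priority(urgency, impact) == "P1"


if __name__ == "__main__":
    # Print the priority for every combination, flagging major incidents.
    for urgency in ("Low", "Medium", "High"):
        for impact in IMPACT_LEVELS:
            flag = " (Major Incident)" if is_major_incident(urgency, impact) else ""
            print(f"{urgency:<6} x {impact:<16} = {priority(urgency, impact)}{flag}")
```

For example, under this sketch a High urgency incident affecting a whole building or faculty evaluates to P1 and would be phoned through to the MIM, while the same urgency affecting a single floor, lab or course evaluates to P2 and continues through standard incident management.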