Interim Major Incident Management

Interim major incident management process
Support team responsibilities
Objective
• Restore service with minimal downtime
• Communicate effectively
• Create an environment in which the support team can focus fully on the incident
Responsibilities
Task | Is New?
IT Helpdesk reports the major incident to the Major Incident Manager (MIM) on +61 3 992 52777 and to the IT Support team via phone or Infra call; the MIM assigns actions as appropriate | New
Or: IT Support team informs IT Helpdesk of a suspected/potential major incident via phone (or via phone and an Infra call ticket) | New
Conduct thorough assessment | Current
Develop response and obtain approval from the MIM on +61 3 992 52777 (use common sense: speak to your line manager if you cannot reach the MIM) | Current
Execute response | Current
Update progress to IT Helpdesk / MIM at regular intervals as agreed with the MIM | New
Restore service | Current
Prepare and document root cause of the incident on Infra call | New
Monitor the impacted service | Current
RMIT University©2011
Information Technology Services
Hours of coverage
• Major Incident Manager operates from 8am to 8pm (core business/IT
Helpdesk hours)
• Quality Assurance Services (QAS) and Helpdesk will share responsibility to
provide 12 hour coverage
• Use +61 3 992 52777 to contact the Major Incident Manager
• Out of hours, current “as is” process for each support team remains:
– Technical “on call” person notified by alerts
– “On call” person fixes
– Escalate to line manager if necessary
Note: Out of hours “as is” process varies between teams (to be standardised
at a later date)
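The coverage rule above can be sketched as a small routing check. This is only an illustration: the function name `contact_for` and the returned labels are hypothetical, the 8pm boundary is assumed to be exclusive, and days of the week are ignored because the slide does not say whether coverage applies on weekends.

```python
from datetime import time

# Coverage window taken from the slide: 8am to 8pm.
MIM_START = time(8, 0)
MIM_END = time(20, 0)  # assumption: 8pm itself counts as out of hours


def contact_for(now: time) -> str:
    """Return who should be contacted for a major incident at local time `now`."""
    if MIM_START <= now < MIM_END:
        # Within core business / IT Helpdesk hours: call the MIM.
        return "Major Incident Manager"
    # Out of hours: each team's existing on-call process applies.
    return "team on-call person"
```

For example, `contact_for(time(14, 30))` returns `"Major Incident Manager"`, while `contact_for(time(22, 0))` falls back to the team's on-call person.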
Process flow diagram
Major Incident Management - Team Roles & Responsibilities
[Swimlane flow diagram; the steps recoverable from it are listed below.]
Lanes: IT Helpdesk, MI Manager (MIM), Support Team
Phases: Incident Report, Incident Categorisation, MIM Team & Response Plan, Plan Execution, MI Closure
IT Helpdesk
• Trigger: user call, monitoring tool, etc.
• Create incident
• Is it a major? (use priority matrix)
– NO: create incident, apply solution template
– YES: email + phone
Support Team
• Start: incident found
• Conduct initial assessment to determine impacted services and user groups
• Is major? (use priority matrix to assess)
– NO: continue with standard incident management and fix process
– YES (preliminary assessment): phone
MI Manager (MIM)
• MIM to establish major incident response
• Update progress to MIM: repeat every 30 minutes or as agreed with MIM
Response and execution
• MI Team to conduct thorough assessment and develop response
• Get response approval from MIM
• MI Team to execute the response
• Progress update to IT Helpdesk / MIM at a regular interval as agreed with MIM
• Update incident record
• Service restored
MI Closure
• Monitor the impacted services
• Prepare root cause and incident closure report (within 5 days of the date the major incident was resolved)
• Get approval
• Problem management (notify, if applicable)
• Initiate follow-up with appropriate teams (if applicable)
• Closure
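The two timing rules in the flow (progress updates every 30 minutes unless otherwise agreed, and a closure report within 5 days of resolution) can be sketched as follows. The function names are hypothetical, and "5 days" is assumed to mean calendar days, since the slide does not say whether business days are intended.

```python
from datetime import datetime, timedelta


def update_times(start: datetime, end: datetime, interval_minutes: int = 30) -> list:
    """List the times at which a progress update to the MIM falls due.

    The default of 30 minutes comes from the slide; pass another value
    when a different frequency has been agreed with the MIM.
    """
    step = timedelta(minutes=interval_minutes)
    due = []
    t = start + step
    while t <= end:
        due.append(t)
        t += step
    return due


def closure_report_due(resolved: datetime) -> datetime:
    """Latest time for the closure report (assumption: 5 calendar days)."""
    return resolved + timedelta(days=5)
```

For an incident declared at 09:00 and resolved at 10:30 on the same day, `update_times` yields 09:30, 10:00 and 10:30, and the report falls due five days after 10:30.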
Evaluation
Urgency x Impact = Priority
Urgency
High
Medium
Low
Impact
University wide
Campus wide
Building (50+)/Faculty
Floor/Lab (10+)/Course
Individual
Service tiers
Gold - Top 10 services or core services
Silver - Mid-tier services
Bronze - Other services
Top 10 *
• Peoplesoft
• SAP
• myRMIT / Learning Hub
• Blackboard
• Google Mail
• Staff Groupwise Mail
• VOIP
• Teaching Spaces (AV)
• Teaching Spaces (IT)
• EOL / STS
Core
• Network
• Load Balancers
• Storage
• NDS/AD
• DNS
• DHCP
• Firewalls
• etc…
Priority Matrix
Urgency | Individual | Floor/Lab/Course | Building/Faculty | Campus Wide | University Wide
Low     | P4         | P4               | P3               | P3          | P3
Medium  | P4         | P3               | P2               | P2          | P1
High    | P3         | P2               | P1               | P1          | P1
P1 = Major Incident
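The priority matrix can be encoded as a simple lookup table. The sketch below copies the values from the slide; the names `PRIORITY_MATRIX`, `priority` and `is_major_incident` are hypothetical and not part of any RMIT tool.

```python
# Impact levels, in the column order used by the matrix.
IMPACTS = (
    "Individual",
    "Floor/Lab/Course",
    "Building/Faculty",
    "Campus Wide",
    "University Wide",
)

# One row per urgency level; each entry lines up with IMPACTS.
PRIORITY_MATRIX = {
    "Low":    ("P4", "P4", "P3", "P3", "P3"),
    "Medium": ("P4", "P3", "P2", "P2", "P1"),
    "High":   ("P3", "P2", "P1", "P1", "P1"),
}


def priority(urgency: str, impact: str) -> str:
    """Look up the priority for an urgency/impact pair."""
    return PRIORITY_MATRIX[urgency][IMPACTS.index(impact)]


def is_major_incident(urgency: str, impact: str) -> bool:
    """P1 = Major Incident, as stated on the slide."""
    return priority(urgency, impact) == "P1"
```

For example, a medium-urgency, university-wide outage evaluates to P1 and is therefore a major incident, whereas a high-urgency issue confined to one floor or lab evaluates to P2.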
* Refer to “Critical Services List” on http://www.rmit.edu.au/its/majorincident for latest information