Interim Major Incident Management

Interim major incident management process
IT Helpdesk responsibilities
Objective
• Restore service with minimal downtime
• Effective communication and co-ordination
Responsibilities
Task | Is New?
Create an incident; identify patterns or multiple calls as a potential major incident | Current
…or… create a suspected major incident as reported by the Support Team (via phone) | New
Categorise incident(s) as major (MI) by comparing with the priority matrix and critical services list (Gold/Silver/Bronze) | New
Apply Solution Template “MAJOR INCIDENT” and assign the MI to the Major Incident Manager (MIM) via phone (+61 3 992 52777) and infra call ticket (… also notify the primary support team to begin investigation while the MIM is engaged) | New / Current
Co-ordinate with the Support Team to get updates | New
Update the MIM on progress via infra call updates and phone (+61 3 992 52777) | New
RMIT University©2011
Information Technology Services
Hours of coverage
• Major Incident Manager operates from 8am to 8pm (core business/IT Helpdesk hours)
• Quality Assurance Services (QAS) and IT Helpdesk will share responsibility to provide 12-hour coverage
• Use +61 3 992 52777 to contact the Major Incident Manager (note: call may divert to IT Helpdesk depending on time of day)
• Out of hours, the current “as is” process for each support team remains:
– Technical “on call” person notified by alerts
– “On call” person fixes
– Escalate to line manager if necessary
Note: The out of hours “as is” process varies between teams (to be standardised at a later date)
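The coverage rule above can be sketched as a small time check. This is only an illustration: the function names are hypothetical, the 8pm boundary is assumed to be exclusive, and the slide does not say whether the hours apply on weekends, so the sketch ignores the day of the week.

```python
from datetime import time

# Coverage window taken from the slide: 8am to 8pm.
MIM_START = time(8, 0)
MIM_END = time(20, 0)  # assumption: coverage ends at 8pm exactly (exclusive)


def mim_on_duty(now: time) -> bool:
    """Return True if the Major Incident Manager is within coverage hours."""
    return MIM_START <= now < MIM_END


def escalation_path(now: time) -> str:
    """Pick the escalation path described on the slide (labels are hypothetical)."""
    if mim_on_duty(now):
        return "phone MIM on +61 3 992 52777"
    return "follow support team on-call process"
```

For example, `escalation_path(time(21, 30))` falls outside the window and so selects the out of hours “as is” process.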
Process flow diagram
[Flow diagram: Major Incident Management - Team Roles & Responsibilities]
Roles: User Call, IT Helpdesk, Support Team, MI Manager (MIM), MIM Team
Phases: Report Incident, Incident Categorisation, Response Plan, Plan Execution, Closure
• Start: incident found (trigger: user call, monitoring tool, etc.)
• IT Helpdesk: create incident, then ask "Is it a Major?" (use priority matrix)
– NO: continue with standard incident management and fix process
– YES: create incident (apply Solution Templates; email + phone)
• Support Team: create incident and conduct initial assessment to determine impacted services and user groups, then ask "Is Major?" (use priority matrix)
– NO: continue with standard incident management and fix process
– YES (preliminary assessment): phone MIM
• MIM to establish Major Incident response
• Update progress to MIM; repeat at a frequency of 30 minutes or as agreed with MIM
• MI Team to conduct thorough assessment and develop response
• Get response approval from MIM
• MI Team to execute the response
• Progress update to IT Helpdesk/MIM at a regular interval as agreed with MIM
• Update incident record
• Service restored
• MI Closure:
– Monitor the impacted services
– Prepare root cause and incident closure report (prepare report within 5 days from major incident resolved date)
– Get approval
– Problem management (notify, if applicable)
– Initiate follow up with appropriate teams (if applicable)
• Closure
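The two timing rules in the flow (progress updates every 30 minutes, closure report within 5 days of resolution) can be sketched as simple date arithmetic. The function and constant names are hypothetical, and the sketch assumes "5 days" means calendar days, which the slide does not specify.

```python
from datetime import datetime, timedelta

# Default update frequency from the flow; may be overridden "as agreed with MIM".
UPDATE_INTERVAL = timedelta(minutes=30)
# Assumption: the 5-day report window is counted in calendar days.
REPORT_WINDOW = timedelta(days=5)


def next_update_due(last_update: datetime,
                    interval: timedelta = UPDATE_INTERVAL) -> datetime:
    """Time by which the next progress update to the MIM is due."""
    return last_update + interval


def closure_report_due(resolved_at: datetime) -> datetime:
    """Latest time for the closure report, counted from the resolved date."""
    return resolved_at + REPORT_WINDOW
```

Passing a different `interval` models an update frequency agreed with the MIM for a particular incident.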
Evaluation
Urgency x Impact = Priority
Urgency
• High
• Medium
• Low
Impact
• University wide
• Campus wide
• Building (50+)/Faculty
• Floor/Lab (10+)/Course
• Individual
Critical services list *
• Gold - Top "10" services or core
• Silver - Mid tier services
• Bronze - Other services
Top 10
• Peoplesoft
• SAP
• myRMIT / Learning Hub
• Blackboard
• Google Mail
• Staff Groupwise Mail
• VOIP
• Teaching Spaces (AV)
• Teaching Spaces (IT)
• EOL / STS
Core
• Network
• Load Balancers
• Storage
• NDS/AD
• DNS
• DHCP
• Firewalls
• etc…
Priority Matrix
Urgency | Individual | Floor/Lab/Course | Building/Faculty | Campus Wide | University Wide
Low | P4 | P4 | P3 | P3 | P3
Medium | P4 | P3 | P2 | P2 | P1
High | P3 | P2 | P1 | P1 | P1
P1 = Major Incident
* Refer to the “Critical Services List” at http://www.rmit.edu.au/its/majorincident for the latest information
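The "Urgency x Impact = Priority" lookup can be sketched as a small table in code. The cell values are copied from the priority matrix; the enum and function names are hypothetical and are not part of any RMIT tooling.

```python
from enum import Enum


class Urgency(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


class Impact(Enum):
    INDIVIDUAL = "Individual"
    FLOOR_LAB_COURSE = "Floor/Lab/Course"
    BUILDING_FACULTY = "Building/Faculty"
    CAMPUS_WIDE = "Campus Wide"
    UNIVERSITY_WIDE = "University Wide"


# Column order of the matrix, from narrowest to widest impact.
IMPACT_ORDER = list(Impact)

# One row per urgency level, cells in IMPACT_ORDER.
PRIORITY_MATRIX = {
    Urgency.LOW: ("P4", "P4", "P3", "P3", "P3"),
    Urgency.MEDIUM: ("P4", "P3", "P2", "P2", "P1"),
    Urgency.HIGH: ("P3", "P2", "P1", "P1", "P1"),
}


def priority(urgency: Urgency, impact: Impact) -> str:
    """Look up the priority for an urgency/impact pair."""
    return PRIORITY_MATRIX[urgency][IMPACT_ORDER.index(impact)]


def is_major_incident(urgency: Urgency, impact: Impact) -> bool:
    """P1 = Major Incident, per the slide."""
    return priority(urgency, impact) == "P1"
```

For instance, a high-urgency fault affecting a whole building resolves to P1 and would be treated as a major incident, while the same fault affecting one floor resolves to P2.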