AAA NCNU INFORMATION TECHNOLOGY
IT OPERATIONS
Problem Management
Jim Heronime, Manager, ITSM Program
Tanya Friehauf-Dungca, Manager, Problem Management
2/17/11
AAA NCNU INFORMATION TECHNOLOGY: I T
OPERATIONS
1
Agenda
• PM Overview
• History
• Vision & Mission
• Operational Level Agreement (OLA)
• Action Items
• Trending (Proactive Problem Management)
• Facilitated Meetings (MIR & ToE)
• KPIs and Metrics
• Future Initiatives
• Questions? Problem Management Team Members
Problem Management Overview
• Main goal of Problem Management:
– Detection of the underlying causes of an incident and the subsequent resolution and prevention of the incidents
• Problem Management ensures:
– The identification and classification of problems, root cause analysis, and resolution of problems
• Problem Management process also includes:
– The formulation of recommendations for improvement, maintenance of problem records, and review of the status of corrective actions
History of PM at AAA
• Began our formal Problem Management practice in 2008
– Tracked major incidents
– Identified root cause for major incidents
– Stored information in a rudimentary MS Access database
• Began formal implementation of ITSM in June 2009
– Root cause was found, on average, 55.4% of the time
– Mean time to close problems = 6 days
• Implemented the current iteration of Problem Management in October 2009; by January 2010:
– Root cause was found, on average, 83% of the time
– Mean time to close problems = 3 days
• We continue to mature our process
Vision and Mission
• VISION:
– To permanently eliminate problems in our production environment and prevent new problems from occurring
• MISSION:
– To aggressively identify root cause of problems and drive permanent solutions to stabilize our IT infrastructure
• We do this by:
– PROCESSES: Ensuring PM processes and procedures are followed by IT support teams
– ACTION ITEMS: Managing assigned action items and their timeframes with support teams to drive permanent solutions
– ROOT CAUSE: Driving root cause identification within OLA timeframes
OLAs for PM
• Be aggressive: 3 business days to identify root cause
– A report enables us to track daily progress
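The 3-business-day target can be checked mechanically. The following is a minimal sketch, assuming a Monday-to-Friday work week with no holiday calendar; the function names and the `OLA_BUSINESS_DAYS` constant are hypothetical and are not taken from Maximo or from the report mentioned on the slide.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: the OLA clock counts Monday-Friday only and ignores holidays.
OLA_BUSINESS_DAYS = 3


def add_business_days(start: date, days: int) -> date:
    """Return the date that falls `days` business days after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def root_cause_within_ola(opened: date, root_cause_found: Optional[date]) -> bool:
    """True if root cause was identified on or before the OLA deadline."""
    if root_cause_found is None:  # root cause still open
        return False
    return root_cause_found <= add_business_days(opened, OLA_BUSINESS_DAYS)


# A problem opened on Thursday 2011-02-17 has a deadline of Tuesday 2011-02-22.
deadline = add_business_days(date(2011, 2, 17), OLA_BUSINESS_DAYS)
```

A daily progress report could call `root_cause_within_ola` for each open problem record and list those approaching or past the deadline.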
Action Items
• Objective:
– Action items are identified and assigned to drive permanent solutions
• Types of Action Items:
– Root cause identification for every problem created from an incident
– Areas of improvement
    • Documentation
    • Process improvement & training
    • Vendor management
    • Hardware replacement
• How are Action Items identified?
– Incident management activities
– Problem management activities – Root Cause Analysis
– Meetings: Daily IT Operations Meeting, Major Incident Review (MIR), or Team of Experts (ToE)
• How are they tracked?
– Maximo – integrated system with Change, Incident, and Asset
Trend Analysis (Proactive Problem Management)
• Objective:
– Analyze related incidents for common root causes
• Collaboration with Operations Bridge:
– Weekly work sessions to identify potential areas of concern
– The Problem Management team reviews related incidents to look for common symptoms, causes, or conditions
• What happens when trend analysis identifies commonalities?
– A Global Problem record is created and assigned to the Service Owner with appropriately assigned action items
• Service Owner analysis:
– The Service Owner prioritizes their efforts
– Determines how to identify root cause
– Prioritizes and seeks approval from the business for funding and scheduling
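The review for common symptoms can be approximated with a simple frequency count. This is a rough sketch under stated assumptions: the incident records, their field names (`service`, `symptom`), and the threshold of three occurrences are all hypothetical illustrations, not the team's actual data or criteria.

```python
from collections import Counter

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"id": "INC-1", "service": "email", "symptom": "login timeout"},
    {"id": "INC-2", "service": "email", "symptom": "login timeout"},
    {"id": "INC-3", "service": "payroll", "symptom": "report failure"},
    {"id": "INC-4", "service": "email", "symptom": "login timeout"},
    {"id": "INC-5", "service": "payroll", "symptom": "slow response"},
]


def find_trends(records, min_count=3):
    """Return (service, symptom) pairs seen in at least `min_count` incidents."""
    counts = Counter((r["service"], r["symptom"]) for r in records)
    return {key: n for key, n in counts.items() if n >= min_count}


# Each pair returned here is a candidate for a Global Problem record.
candidates = find_trends(incidents)
```

In this sample, only the email login timeouts reach the threshold, so that pair would be the one candidate raised with the Service Owner.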
Major Incident Review (MIR)
• What is it?
– Evaluation of the incident process after a major incident
• What's its purpose?
– Validate details of the incident record
– Review incident handling – identify opportunities
– Identify lessons learned – share across the enterprise
– Identify action items
• When is one required?
– Mandated for all Severity 1 incidents
– Lower severities by request or as needed
• Why does Problem Management facilitate a Major Incident Review?
– Unbiased view of events – no call involvement
MIR Agenda
MIR Template
Team of Experts (ToE)
• What is it?
– A special team of technical subject matter experts (SMEs) assembled to analyze and resolve critical problems at an accelerated pace to minimize or eliminate exposure
• How long has this process been in place?
– This is one of our newest additions – since December 2010
• Why are ToEs initiated?
– Teams are not engaging each other collaboratively
– Need to identify root cause immediately – back-to-back incidents
– Leadership's request for information and status of critical or chronic problems
ToE (cont.)
• ToE Activities
– Root cause analysis
– Brainstorm solutions and permanent fixes
– Assign action items and due dates
• Where's the template?
– Currently under construction
KPIs and Metrics
• KPIs
– Root cause identified within OLA
– MIRs conducted for Sev1 Incidents
• Operational Metrics
– Total Problems by Severity
– Problems by Causing Party
– Outages by Domain (Applications, Network, Security, Servers, Telecom or Other)
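The KPI and the operational metrics listed above reduce to a percentage and a few grouped counts. The sketch below shows one way to compute them, assuming problem records are available as plain dictionaries; the field names (`severity`, `causing_party`, `domain`, `rc_within_ola`) and the sample rows are invented for illustration and do not reflect the Maximo schema or real AAA data.

```python
from collections import Counter

# Hypothetical problem records; all fields and values are illustrative.
problems = [
    {"severity": 1, "causing_party": "Vendor", "domain": "Network", "rc_within_ola": True},
    {"severity": 2, "causing_party": "Internal", "domain": "Servers", "rc_within_ola": True},
    {"severity": 2, "causing_party": "Internal", "domain": "Applications", "rc_within_ola": False},
    {"severity": 3, "causing_party": "Vendor", "domain": "Network", "rc_within_ola": True},
]


def pct_root_cause_within_ola(records):
    """Percentage of problems whose root cause was identified within the OLA."""
    if not records:
        return 0.0
    met = sum(1 for r in records if r["rc_within_ola"])
    return 100.0 * met / len(records)


def count_by(records, field):
    """Count problems grouped by one field, e.g. severity or domain."""
    return Counter(r[field] for r in records)


kpi = pct_root_cause_within_ola(problems)             # 75.0 for the sample rows
by_severity = count_by(problems, "severity")          # Total Problems by Severity
by_party = count_by(problems, "causing_party")        # Problems by Causing Party
by_domain = count_by(problems, "domain")              # grouped by Domain
```

The same `count_by` helper covers all three operational metrics, so adding a new breakdown only requires a new field name.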
KPIs
*Baseline determined by internal historical data = 82%
*Industry standards non-existent
KPI Details
*2010 Average for RC Identified within OLA = 85.7%
Examples of Metrics
[Chart: example metrics; surviving labels: *Change Freeze, AT&T, AAA NCNU]
Future Initiatives
• Workarounds and defects – Known Error Database
• Action item validation – quality check on completed actions
• ToE template development
Questions?
• PROBLEM MANAGEMENT TEAM MEMBERS
– Mark Hernandez – IT Service Transition Analyst V
– Gessica Briggs-Sullivan – IT Service Transition Analyst III
– Andrew Egan – Intern