AAA NCNU INFORMATION TECHNOLOGY
IT OPERATIONS
Problem Management
Jim Heronime, Manager, ITSM Program
Tanya Friehauf-Dungca, Manager, Problem Management
2/17/11

Agenda
– PM Overview
– History
– Vision & Mission
– Operational Level Agreement (OLA)
– Action Items
– Trending (Proactive Problem Management)
– Facilitated Meetings (MIR & ToE)
– KPIs and Metrics
– Future Initiatives
– Questions?
– Problem Management Team Members

Problem Management Overview
Main goal of Problem Management:
– Detecting the underlying causes of incidents, then resolving those causes and preventing the incidents from recurring
Problem Management ensures:
– The identification and classification of problems, root cause analysis, and resolution of problems
The Problem Management process also includes:
– Formulating recommendations for improvement, maintaining problem records, and reviewing the status of corrective actions

History of PM at AAA
Began our formal Problem Management practice in 2008:
– Tracked major incidents
– Identified root cause for major incidents
– Stored information in a rudimentary MS Access database
Began formal implementation of ITSM in June 2009:
– Average rate of root cause found: 55.4%
– Mean time to close problems: 6 days
Implemented the current iteration of Problem Management in October 2009. By January 2010:
– Average rate of root cause found: 83%
– Mean time to close problems: 3 days
We continue to mature our process.

Vision and Mission
VISION:
– To permanently eliminate problems in our production environment and prevent new problems from occurring
MISSION:
– To aggressively identify the root cause of problems and drive permanent solutions that stabilize our IT infrastructure
We do this by:
– PROCESSES: Ensuring IT support teams follow PM processes and procedures
– ACTION ITEMS: Managing assigned action items and their timeframes with support teams to drive permanent solutions
– ROOT CAUSE: Driving root cause identification within OLA timeframes

OLAs for PM
Be aggressive: 3 business days to identify root cause
– A report enables us to track progress daily

Action Items
Objective:
– Identify and assign action items that drive permanent solutions
Types of Action Items:
– Root cause identification for every problem created from an incident
– Areas for improvement
 • Documentation
 • Process improvement & training
 • Vendor management
 • Hardware replacement
How are action items identified?
– Incident management activities
– Problem management activities
– Root cause analysis
– Meetings: Daily IT Operations Meeting, Major Incident Review (MIR), or Team of Experts (ToE)
How are they tracked?
– In Maximo, a system integrated with Change, Incident, and Asset management

Trend Analysis (Proactive Problem Management)
Objective:
– Analyze related incidents for common root causes
Collaboration with Operations Bridge:
– Weekly work sessions to identify potential areas of concern
– The Problem Management team reviews related incidents for common symptoms, causes, or conditions
When trend analysis identifies commonalities:
– A Global Problem record is created and assigned to the Service Owner, with appropriate action items
Service Owner analysis:
– The Service Owner prioritizes their efforts
– Decides whether to pursue root cause identification
– Prioritizes and approves funding and scheduling with the business

Major Incident Review (MIR)
What is it?
– An evaluation of the incident process after a major incident
What is its purpose?
– Validate the details of the incident record
– Review incident handling and identify opportunities for improvement
– Identify lessons learned and share them across the enterprise
– Identify action items
When is one required?
– Mandatory for all Severity 1 incidents
– For lower severities, by request or as needed
Why does Problem Management facilitate a Major Incident Review?
– It provides an unbiased view of events, since Problem Management has no involvement in the incident call

MIR Agenda
[Slide image not included]

MIR Template
[Slide image not included]

Team of Experts (ToE)
What is it?
– A special team of technical subject matter experts (SMEs) assembled to analyze and resolve critical problems at an accelerated pace to minimize or eliminate exposure
How long has this process been in place?
– It is one of our newest additions, in place since December 2010
Why are ToEs initiated?
– Teams are not engaging each other collaboratively
– Root cause must be identified immediately, e.g. after back-to-back incidents
– Leadership requests information and status on critical or chronic problems

ToE (cont.)
ToE Activities:
– Root cause analysis
– Brainstorm solutions and permanent fixes
– Assign action items and due dates
Where's the template?
– Currently under construction

KPIs and Metrics
KPIs:
– Root cause identified within OLA
– MIRs conducted for Sev1 incidents
Operational Metrics:
– Total Problems by Severity
– Problems by Causing Party
– Outages by Domain (Applications, Network, Security, Servers, Telecom, or Other)

KPIs
[Chart not included]
*Baseline of 82%, determined from internal historical data
*No industry standards exist

KPI Details
[Chart not included]
*2010 average for RC identified within OLA: 85.7%

Examples of Metrics
[Chart not included; annotations: *Change Freeze, AT&T, AAA NCNU]

Future Initiatives
– Known Error Database for workarounds and defects
– Action item validation: quality check on completed actions
– ToE template development

Questions?
PROBLEM MANAGEMENT TEAM MEMBERS
– Mark Hernandez, IT Service Transition Analyst V
– Gessica Briggs-Sullivan, IT Service Transition Analyst III
– Andrew Egan, Intern
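The KPI "Root cause identified within OLA" combines the 3-business-day OLA with a simple percentage. The Python sketch below illustrates that arithmetic under stated assumptions: business days are Monday to Friday with no holiday calendar, a problem with no recorded root cause counts as a miss, and the function names and sample records are hypothetical. They are not taken from Maximo or from the team's actual report.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

# From the OLA slide: root cause is due within 3 business days.
OLA_BUSINESS_DAYS = 3


def add_business_days(start: date, days: int) -> date:
    """Return the date that is `days` business days after `start`.

    Assumption: business days are Monday to Friday; holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def met_ola(opened: date, root_cause_found: Optional[date]) -> bool:
    """True if root cause was recorded on or before the OLA due date.

    Assumption: a problem with no recorded root cause counts as a miss.
    """
    if root_cause_found is None:
        return False
    return root_cause_found <= add_business_days(opened, OLA_BUSINESS_DAYS)


def pct_within_ola(problems: Iterable[Tuple[date, Optional[date]]]) -> float:
    """KPI: percentage of problems whose root cause was identified within the OLA."""
    records = list(problems)
    if not records:
        return 0.0
    hits = sum(met_ola(opened, found) for opened, found in records)
    return 100.0 * hits / len(records)


# Hypothetical sample records: (problem opened, root cause identified)
sample = [
    (date(2011, 2, 17), date(2011, 2, 22)),  # Thu, due the following Tue: met
    (date(2011, 2, 17), date(2011, 2, 23)),  # one day past due: missed
    (date(2011, 2, 14), date(2011, 2, 16)),  # Mon, due Thu: met
    (date(2011, 2, 14), None),               # no root cause recorded: missed
]

print(pct_within_ola(sample))  # 50.0
```

The 82% baseline and the 85.7% 2010 average on the KPI slides are percentages of this kind. The sample records here are invented purely to show the calculation. A production version would likely need a holiday calendar and a rule for problems that are still open but not yet past due.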