Direct Cause vs Root Cause
“A Problem Solving Concept”

INCOSE Enchantment Chapter Meeting
March 14, 2007

Dr David E. Peercy
Sandia National Laboratories
Department 12341, Weapon System and Software Quality

Presentation Objective

Events have many potential “causes”. We tend to think of “causes” as related mostly to “unwanted” events, but in effect all events that occur have “causes”, that is, the reason that the event occurs.

The objective of this short presentation/discussion is to gain a better understanding of why it is important to understand the difference between “direct” causes and “root” causes of events. In so doing, we enhance our capability to influence a much larger class of events, both in preventing unwanted events and in ensuring wanted events actually do occur.

An Example of a Problem

USAF F-22A jets grounded by software glitch
Jeremy Epstein <jepstein@webmethods.com>, Fri, 23 Feb 2007 15:55:52 -0500

Navigational systems failed, planes forced to return to Hawaii [visually having to follow their tankers to safety]. The problem turns out to be software (no surprise there). Fix created, "verified", installed, and they're off again. [Direct or Root Cause addressed?]

A spokesman for Lockheed Martin this week insisted that the navigation software problem was minor. 'The issue was quickly identified in a matter of days and a fix installed in the airplanes, which were flown successfully to Japan,' he said.
'There are 87 of these exceptional fighters and they are out there performing exceptionally well, and their pilots continue to fly them in new and greater ways.'

Examples to Test Our Understanding

RESOURCE: http://catless.ncl.ac.uk/Risks
- Peter Neumann, SRI International (moderator)
- The RISKS site provides a voluminous list of risks, many of which are computer/software related. It is primarily interested in security and safety risks; summaries are provided with links to more detail.

Examples:
- Army Training Accident, June 2002
- Friendly Fire Deaths, March 2002
- Medical “Direct/Root” Cause Determinations

A Simple Example

Assume each of these factors is as described below:
  e: car will not start
  d: battery is dead
  c: alternator does not function
  b: alternator is well beyond its designed service life
  a: car is not being maintained according to recommended service schedule

Direct Cause? Intermediary Causes? Root Cause?

Error, Fault/Defect, Failure

- Error: a human action or lack of action that results in the inclusion of a fault in a product or the way it is used; the variance between expected and actual results
- Fault/Defect: an accidental condition that causes a product to fail to perform its required function if encountered during operational use
- Failure: an event in which a product does not perform a required function within its specified limits during operational use

ERROR may lead to FAULT/DEFECT, which may lead to FAILURE or, with FAULT TOLERANCE, may lead to NO FAILURE or REDUCED EFFECT.

Direct Cause

Causes of events may be natural or man-made, active or passive, initiating or permitting, obvious or hidden. Those causes that lead immediately to the effect are often called direct or proximate causes.
Examples of direct/proximate causes:

  Equipment          Human
  - Arced            - Pushed incorrect button
  - Leaked           - Fell
  - Over-loaded      - Dropped tool
  - Over-heated      - Connected wires

Root Cause

Direct causes often result from another set of causes, which could be called intermediate causes, and these may be the result of still other causes.

When a chain of cause and effect is followed from a known end-state back to an origin or starting point, root causes are found. The process used to find root causes is called root cause analysis: systematic problem solving.

A root cause is an initiating cause of a causal chain which leads to an outcome or effect of interest.

The Benefits of Problem Solving!

The usual purpose of attempting to find root causes is to solve a problem that has actually occurred, or to prevent a less serious problem from escalating to an unacceptable level (e.g., near-miss safety for aircraft).

The basic concept is that solving a problem by addressing root causes is ultimately more effective than merely addressing symptoms or direct causes. That is, a “class” of problems may be solved/prevented by addressing root causes rather than just direct causes.

Basic Process - Continue to Ask Why!

Continue to ask “why” until you have reached:
1. Direct, Intermediate, and Root cause(s), including all organizational factors that exert control over the design, fabrication, development, maintenance, operation, and disposal of the system.
2. A problem/cause that is not correctable by your organization => may be promoted to higher responsible organization.
3. Insufficient data to continue.
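
The three stopping conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Factor, Stop, and ask_why are hypothetical, the car example is read as a single acyclic chain (real causal trees branch), and marking factor "a" as the origin is an analyst's judgment rather than something the slides assert.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Stop(Enum):
    """The three stopping conditions of the ask-why process."""
    ROOT_REACHED = "direct, intermediate, and root causes identified"
    NOT_CORRECTABLE = "not correctable here; promote to a higher organization"
    INSUFFICIENT_DATA = "insufficient data to continue"


@dataclass
class Factor:
    description: str
    why: Optional[str] = None    # label of the factor that answers "why?"
    is_origin: bool = False      # analyst's judgment: the chain starts here
    correctable: bool = True     # False: outside this organization's control


def ask_why(factors: Dict[str, Factor], event: str) -> Tuple[List[str], Stop]:
    """Walk back from `event`, asking "why" until a stopping condition holds."""
    chain: List[str] = []
    label = factors[event].why
    while label is not None:
        chain.append(label)
        factor = factors[label]
        if not factor.correctable:
            return chain, Stop.NOT_CORRECTABLE
        if factor.is_origin:
            return chain, Stop.ROOT_REACHED
        label = factor.why
    # No recorded answer to the last "why": the data ran out.
    return chain, Stop.INSUFFICIENT_DATA


# The car example, read as one chain (an assumption made for this sketch).
CAR = {
    "e": Factor("car will not start", why="d"),
    "d": Factor("battery is dead", why="c"),
    "c": Factor("alternator does not function", why="b"),
    "b": Factor("alternator is well beyond its designed service life", why="a"),
    "a": Factor("car is not being maintained according to recommended "
                "service schedule", is_origin=True),
}

if __name__ == "__main__":
    chain, stop = ask_why(CAR, "e")
    print("direct cause:       ", chain[0])
    print("intermediate causes:", chain[1:-1])
    print("candidate root:     ", chain[-1], "|", stop.value)
```

Under this reading, the first factor in the chain is the direct cause, the last is the candidate root cause, and everything in between is intermediate. Whether the chain should continue past "a" (for example, into why the service schedule was not followed) is exactly the question the process asks.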

Why-Causal Tree

[Figure: a causal tree that starts at the Undesired Outcome and asks WHY at each level. Node types: Event #1 Occurred, Event #2 Occurred, Condition Existed or Changed, Failed/Exceeded Barrier or Control. Levels, from the outcome back: PROXIMATE CAUSES, INTERMEDIATE CAUSES, ROOT CAUSES.]

Example

[Figure: why-causal tree for a satellite mission failure. Nodes: Lost High Speed Data Stream From Satellite (Mission Failure); Thrusters Oriented Space Craft; Poor Line of Sight; Technician Used Wrong Method to Correct; Satellite Failed To Deploy Antenna; Power Supply Failed; Battery Dead; Installed Improperly; Beyond Shelf Limit. Annotations: Root Cause is Much Deeper; Keep Asking Why.]

Potential Problem Analysis Tools

- Failure Modes and Effects Analysis (FMEA): an inductive engineering technique used at the component level to define, identify, and eliminate known and/or potential failures, problems, and errors from the system, design, process, and/or service before they reach the customer
- Fault Tree Analysis (FTA): a deductive analytical technique of reliability and safety analyses, generally used for complex dynamic systems
- Probabilistic Risk Assessment (PRA): a systematic, logical, and comprehensive discipline that uses tools like FMEA, FTA, Event Tree Analysis (ETA), Event Sequence Diagrams (ESD), Master Logic Diagrams
(MLD), Reliability Block Diagrams (RBD), and so forth to quantify risk.

Summary

Direct Cause vs Root Cause
- Issue: level of problem solving

Problem Solving
- Direct Cause: objective is to solve an instance of a potential class of problems
- Root Cause: objective is to solve a class of problems
- Both are useful

Analysis Methods
- Methods exist to analyze events; the goal is to eliminate occurrence of unwanted events and ensure wanted events do occur
- FMEA, FTA, PRA

Q&A?

Examples

Army Training Accident

Incident
- Thu, 13 Jun 2002: two soldiers were killed in training at Ft Drum. They were firing artillery shells and were relying on the output of the Advanced Field Artillery Tactical Data System. When they forgot to enter the target altitude, the system assumed an altitude of zero. (Ft Drum is at 676 ft.)

Direct Cause
- Soldiers forgot to enter the target altitude

Potential Root Cause(s)
- Software should not default to a valid altitude
- Software/System analysis and modeling/testing inadequate
- Software requirements not adequately specified
- System CONOPS not adequate
- Soldier training inadequate

Friendly Fire Deaths

Incident
- A U.S. Special Forces air controller was calling in GPS positioning from some sort of battery-powered device. He had used the GPS receiver to calculate the latitude and longitude of the Taliban position in minutes and seconds for an airstrike by a Navy F/A-18. The bomber crew "required" a second calculation in decimal degrees. The crew did not have equipment to perform the minutes-seconds conversion themselves.
- The air controller had recorded the correct value in the GPS receiver when the battery died.
Upon replacing the battery, he called in the decimal-degree position the unit was showing, without realizing that the unit is set up to reset to its *own* position when the battery is replaced.
- The 2,000-pound bomb landed on the air controller position, killing three Special Forces soldiers and injuring 20 others.

Direct Cause
- Taliban position was incorrectly transmitted to the Navy F/A-18 bomber crew

Potential Root Cause(s)
- GPS system default was a valid position rather than an invalid one
- Lack of battery backup to hold values in memory during battery replacement
- Not equipping users to translate one coordinate system to another (reminiscent of the Mars Climate Orbiter slamming into the planet when ground crews confused English units with metric)
- Using a device with such flaws in a combat situation without adequate testing

Medical Direct/Root Cause Example 1 - Questions?

Sentinel event
A patient was given the wrong medication and experienced an adverse reaction. As a result, the patient's length of stay was extended by an additional 10 days.

Direct cause
The nurse who administered the medication did not compare the name on the patient's armband to the name on the medication order. The nurse did not follow the patient identification policy.

Root cause - thoughts?
Registration staff placed the wrong armband on the patient's arm to begin with.

Medical Direct/Root Cause Example 2 - Questions?

Sentinel event
Doctor prescribes an anti-seizure drug (phenytoin) and the patient develops a severe allergic reaction known as anaphylaxis. The symptoms were itching, hives, swelling in the throat, wheezing, light-headedness from low blood pressure, nausea, and abdominal cramping.

Direct cause
Patient is allergic to phenytoin.

Root cause - thoughts?
The doctor did not do a thorough background check on the patient's medical history, or the patient did not inform the doctor of his/her previous medical history.

Medical Direct/Root Cause Example 3 - Questions?

Sentinel event
A Lasix drip was hung for the wrong patient. The two patients had the same last name.

Direct cause
Interruption during medication administration. The nurse had a very heavy patient assignment and skipped the medication-administration double check with another RN.

Root cause - thoughts?
Missed the double-check process on patient identification and medication administration. All hospital medication should be double checked by two nurses.

Medical Direct/Root Cause Example 4 - Questions?

Sentinel event
A patient slips and falls on a slippery floor that had been mopped earlier after another patient had an upset stomach.

Direct cause
The sign noting the caution is not down.

Root cause - thoughts?
The janitor was not able to put down caution signs before the patient walked down the hall, because he was interrupted by a cafeteria worker who needed him to clean up a spill.
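
Returning to the Army Training Accident and Friendly Fire examples: both list a default that was a valid value among the potential root causes, and the second also lists the lack of a way to translate between coordinate formats. A minimal Python sketch of the alternative follows. It is not the software of either fielded system; the names MissingInputError, require, and dms_to_decimal are hypothetical, and the hemisphere-letter convention is an assumption made for the illustration.

```python
from typing import Optional


class MissingInputError(ValueError):
    """A required value was never entered (or was lost)."""


def require(value: Optional[float], name: str) -> float:
    # None means "no data". It is never replaced by a plausible-looking
    # default such as 0 ft or the device's own position.
    if value is None:
        raise MissingInputError(f"{name} has not been entered")
    return value


def dms_to_decimal(degrees: int, minutes: int, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if degrees < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("degrees must be >= 0; minutes and seconds in [0, 60)")
    magnitude = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by the usual signed-degrees convention.
    return -magnitude if hemisphere in ("S", "W") else magnitude


if __name__ == "__main__":
    print(dms_to_decimal(34, 30, 0.0, "N"))
    try:
        require(None, "target altitude")
    except MissingInputError as exc:
        print("refused:", exc)
```

The design choice is that a missing value stops the operation and is reported, so the operator's omission becomes visible at the moment it happens instead of being masked by a value that looks legitimate.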