Understanding Frequent Root Causes of System-development Failure
7 March 2012
Neil Siegel
Vice-President & Chief Engineer

Failure is not Uncommon
• The record indicates that the development of large-scale systems remains an endeavor that often fails:
  – Requiring significantly more money and/or time to complete than originally planned
  – Under-delivery of specified functionality
  – Lack of suitability of the delivered system for the actual intended use
  – Cancellation of the development project before a useful product has been delivered
• For example, (Glass 2001) cites data indicating that only about 16% of the system development projects that he examined were listed as successful by their own developers.
• Analyses of root causes* tend to focus on factors such as incomplete requirements, changing requirements, and so forth.
  – These are sometimes symptoms, and not causes.
  – I offer four candidate root causes, and discuss how to address each.
* For example, (Boehm 5-2006)

Four Root Causes for Failures
• “More Precision than Accuracy”
• “Effective but not Suitable”
• “90-90” Failures
• Too Late / Too Expensive to be Useful

More Precision than Accuracy
• We may have a great system specification, but the “wicked” nature of the problem prevents us from actually achieving consensus on what the system needs to do, even if we think we have already done so. Wicked problems:
  – Are ill-defined.
  – Involve many stakeholders with strong and opposing views.
  – Have conditions that change midstream.
  – Are misunderstood until a solution is in hand.*
• In many large-scale endeavors, the social factors must be addressed in synchronicity with the technical problems.
  – So our specification, contract, statement of work, design baseline, etc. are likely of little real value in reaching a satisfactory conclusion to the project.
* Quoted from Steve Nixon, “Wicked Problems,” November 2011. Used with permission.

Recognizing Wicked Problems
• Every time we discuss it with the users, we get important new insights about what problem we are actually trying to solve.
• We don’t seem to know who all of the stakeholders are; we keep finding new ones.
• The problem itself seems to change.
• We can’t get the stakeholders to agree.
• “Everything should talk to everything”: we can’t seem to bound the problem.
Adapted from Steve Nixon, “Wicked Problems,” November 2011. Used with permission.

Solving Wicked Problems
• Collaboration
• Experimentation
• Social complexity from integrated networks is a key driver.
• Traditional linear solution styles are not well-suited.
Adapted from Steve Nixon, “Wicked Problems,” November 2011. Used with permission.

“Effective but not Suitable”
• 95%+ of our specifications describe desired functionality, but experience suggests:
  – While the resulting systems may be effective (in the sense that they provide the specified functionality), they are not suitable (in the sense that they fail to operate appropriately within the intended environment, falling short in areas such as reliability, response times, ease of use, resistance to configuration-driven errors, and so forth).
  – Many systems are considered failures even after being shown to meet their specification!
• What to do:
  – Achieve far higher reliability in software-based systems.
  – Design to stay within the capability and interest level of the intended user.
  – etc.

“90-90” Failures
• Example scenario:
  – We have decomposed our system into a set of small components, each of which has been implemented.
  – When we start putting the system together, however, all sorts of failures and difficulties arise, performance is unacceptable, and the schedule and cost estimates are repeatedly exceeded.
• The problem is often unplanned dynamic behavior.
• What can we do better:
  – “Design for integration”

Too Late / Too Expensive to be Useful
• Example scenario:
  – The amount of time (or money, or both) required to build the capability makes it no longer of interest.
  – Due to repeated breaches of cost and schedule estimates, the development team has lost credibility with the funders and/or users.
• What can we do:
  – Agile methods
  – Radical reduction in SLOC counts

Summary
• Cost increases of 2x, 3x, even 10x are signals of something other than “requirements creep”.
  – Attributing failure to “lack of complete requirements” could be interpreted as passing the blame to someone else.
  – I believe that we in the development community need to take more responsibility for achieving consistently better performance.
• How:
  – Recognize the social aspect of our job, and thereby deal with the “wicked” aspects of systems development.
  – Recognize that we have to deliver systems that are suitable, as well as effective.
  – Deal better with projected dynamic behavior in our designs, and thereby avoid “90-90” failures.
  – Create methods that will allow us to deliver systems within budgets and schedules that are of interest.

Q&A
Questions?

NORTHROP GRUMMAN PRIVATE / PROPRIETARY LEVEL 1