COCOMO II Integrated with Crystal Ball® Risk Analysis Software

Clate Stansbury
MCR, LLC
cstansbury@mcri.com
(703) 506-4600

Prepared for 19th International Forum on COCOMO Software Cost Modeling
University of Southern California, Los Angeles, CA
27 October 2004

Contents
• Purpose: Describing Uncertainty
• Representing Uncertain Inputs
• Simulating Costs
• Correlating Inputs and Costs
• Summary

Traditional “Roll-Up” Method (Too Simple)
• Define “Best Estimate” of Each Cost Element to be the Most Likely Cost of that Element
• List Cost Elements in a Work-Breakdown Structure (WBS)
  – Calculate “Best Estimate” of Cost for Each Element
  – Sum All Best Estimates
  – Define Result to be “Best Estimate” of Total Project Cost
• Two Problems With Roll-up Method
  1. Ignores Uncertainty: Only Outputs a Point Estimate
  2. Estimate is Too Low (We’ll Discuss Later)

Estimators Must Describe Uncertainty
• Report Cost As a Statistical Quantity, Not a Point
  – Cost of Any Incomplete Program Is Uncertain
  – Estimator Must Report That Uncertainty as Part of His or Her Delivered Estimate
• Cost-risk Analysis Allows Estimator to Report Cost As a Probability Distribution, So Decision-maker Is Made Aware of
  – Expected Cost (Mean)
  – 50th Percentile Cost (Median)
  – 80th Percentile Cost
  – Overrun Probability of Project Budget

What a Cost Estimate Should Look Like (Crystal Ball Outputs)

  Percentile   Value
  0%           450.19
  10%          516.81
  20%          538.98
  30%          557.85
  40%          575.48
  50%          592.72
  60%          609.70
  70%          629.19
  80%          650.97
  90%          683.01
  100%         796.68

  Statistics           Value
  Trials               10,000
  Mean                 596.40
  Median               592.72
  Mode                 ---
  Standard Deviation   63.18
  Range Minimum        450.19
  Range Maximum        796.68

[Figure: Crystal Ball forecast for cell A8, 10,000 trials, 71 outliers. Two panels over a cost axis running from 462.43 to 761.35: a cumulative chart (“S-Curve”) and a frequency chart (“Density Curve”).]

Representing Uncertain Inputs Using Triangular Distributions
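As a rough illustration of what a triangular input looks like once it is sampled, the sketch below draws trials from one triangular distribution and summarizes them as a mean and a few percentiles, in the spirit of the percentile output shown above. It uses Python's standard library rather than Crystal Ball. The function name and the parameter values (optimistic 0.90, most likely 1.00, pessimistic 1.14) are hypothetical, chosen only for illustration.

```python
import random


def triangular_percentiles(low, mode, high, trials=10_000, seed=0):
    """Sample a triangular distribution; return (mean, {percentile: value})."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Note the argument order of random.triangular: (low, high, mode).
    draws = sorted(rng.triangular(low, high, mode) for _ in range(trials))
    percentiles = {}
    for p in (10, 50, 80, 90):
        # Simple empirical percentile: the value at rank p% of the sorted trials.
        index = min(trials - 1, int(p / 100 * trials))
        percentiles[p] = draws[index]
    mean = sum(draws) / trials
    return mean, percentiles


# Hypothetical input: optimistic 0.90, most likely 1.00, pessimistic 1.14.
mean, pct = triangular_percentiles(low=0.90, mode=1.00, high=1.14)
print(f"mean = {mean:.3f}")
for p, value in pct.items():
    print(f"{p}th percentile = {value:.3f}")
```

Because the pessimistic value lies farther from the mode than the optimistic value, the mean of the trials sits above the most likely value.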
Triangular Distribution of Element Cost, Reflecting Uncertainty in “Best” Estimate
[Figure: triangular density plotted against cost ($), with three marked points]
  L: Optimistic Cost
  M: Best-Estimate Cost (Mode = Most Likely)
  H: Cost Implication of Technical, Programmatic Assessment

COCOMO Cost Drivers as Triangular Distributions
Why Triangular Distribution?
• Triangular Distribution is Simple and Malleable
• Parameters (Optimistic, Most Likely, Pessimistic) Are Easy to Define and Explain
• Could Have User Provide Parameters for Normal, Lognormal, Exponential, Uniform, or Beta Distributions, for Example, if More is Known About the Distributions
• Good Topic for Further Research…

COCOMO Cost Drivers as Triangular Distributions
• For Each COCOMO II Input …
  – Input Request Interpreted as a Triangular Distribution
  – User Estimates Optimistic, Most Likely, and Pessimistic Values (Which May Not Always Be All Different from Each Other)
• User Provides Three Values for Each COCOMO II Input, as Though There Were Three Separate Projects
[Figure: triangular probability density over cost, marking Optimistic, Most Likely (mode), and Pessimistic; the base of the triangle is labeled “Range of Realistic Input Values”.]

COCOMO Cost Drivers as Triangular Distributions
[Figure: example cost-driver input entered as a triangular distribution; the values 0.90 and 1.14 appear on the axis.]

Processing Uncertainty Using Simulations

How to Process Triangular Distributions?
• How to Take the Product of Effort Multipliers When Each EM is a Triangular Distribution?
• How to Sum Code Counts for All CSCIs?
• How to Compute Rest of COCOMO II Algorithm?

Process Optimistic, ML, Pessimistic as 3 Separate Projects (Too Simple)
• Perform “Roll-up” Method Three Times
  – Input Optimistic Values into COCOMO II
  – Input Most Likely Values into COCOMO II
  – Input Pessimistic Values into COCOMO II
• Obtain Total Project Effort as a Triangular Distribution

Why “Roll-up” Doesn’t Work
[Figure: WBS-element triangular input distributions, each marked at its most likely value, merge into a total-cost distribution.]
[Figure labels on the total-cost distribution: “Roll-up to Most Likely Total Cost” and “Real Most Likely Total Cost”.]

Use Monte Carlo Simulation to Process the Input Triangular Distributions
[Figure: Crystal Ball spreadsheet screenshot. Labels: Trial 1, Trial 2, …, Trial 10,000; Assumption; Cell G5; =SUM($G$4:$G$8); Total Cost Forecast.]

Crystal Ball Risk-Analysis Software
• Commercially Available Third-Party Software Add-on to Excel, Marketed by Decisioneering, Inc., 2530 S. Parker Road, Suite 220, Aurora, CO 80014, (800) 289-2550
• Inputs
  – Parameters Defining WBS-Element Distributions
  – Rank Correlations Among WBS-Element Cost Distributions
• Mathematics
  – Monte-Carlo (Random) or Latin Hypercube (Stratified) Statistical Sampling
  – Virtually All Probability Distributions That Have Names Can Be Used
  – Suggests Adjustments to Inconsistent Input Correlation Matrix
• Outputs
  – Percentiles and Other Statistics of Program Cost
  – Cost Probability Density and Cumulative Distribution Graphics

Representing Correlations Among Risks

Risks are Correlated
• Resolving One Cost Driver’s Risk Issues by Spending More Money Often Involves Increasing Values of Several Other Drivers as Well
  – For Example, the Monte Carlo Could Generate a High RELY Value and a Low DOCU Value for the Same Trial, Which Doesn’t Make Any Sense
  – Schedule Slippage Due to Problems in One CSCI Leads to Cost Growth and Schedule Slippage in Other CSCIs
• As We Will Soon See, Correlation Tends to Increase the Variance of the Total-Cost Probability Distribution
• Numerical Values of Correlations are Difficult to Estimate, but That’s Another Story

Maximum Possible Underestimation of Total-Cost Sigma
• Percent Underestimated σ When Correlation Assumed to be 0 Instead of r (n = # of Input Values)
[Figure: Percent Underestimated (0 to 100) plotted against Actual Correlation (0 to 1), one curve each for n = 10, n = 30, n = 100, and n = 1000.]

Determining Correlations Among COCOMO II Cost Drivers
• Default Correlations to 0.2
• More Detailed Default Correlations?
  – Higher Correlation Between RELY and DOCU?
  – COCOMO II Security Extension Cost Driver Related to Existing Cost Drivers

Summary
• Estimator Must Model Uncertainty
• Describe Uncertainty by Representing COCOMO Inputs as Triangular Distributions
• Calculate Implications of Uncertainty by Using Monte Carlo or Latin Hypercube Simulations to Perform COCOMO II Algorithm
• Consider Correlation Among CSCI Risks and Costs
• Professional Software, e.g., Crystal Ball, is Available to do Computations

Acronyms
  AA       Assessment and Assimilation
  AT       Automatically Translated code
  CB       Crystal Ball
  CM       Percent of Code Modified
  COCOMO   Constructive Cost Model
  CSCI     Computer Software Configuration Item
  DM       Percent of Design Modified
  EI       External Input
  EIF      External Interface File
  EO       External Output
  EQ       External Inquiry
  ILF      Internal Logical File
  IM       Effort for Integration
  KSLOC    Thousands of Source Lines of Code
  MS       Microsoft
  O,M,P    Optimistic, Most Likely, Pessimistic
  SCED     Schedule compression/expansion rating
  SLOC     Source Lines of Code
  SU       Software Understanding
  UFP      Unadjusted Function Point
  UNFM     Programmer Unfamiliarity rating
  USC      University of Southern California
  WBS      Work Breakdown Structure

Backup Slides

Correlation Matters
• Suppose for Simplicity
  – There are n Cost Elements $C_1, C_2, \ldots, C_n$
  – Each $\mathrm{Var}(C_i) = \sigma^2$
  – Each $\mathrm{Corr}(C_i, C_j) = r < 1$
  – Total Cost $C = \sum_{i=1}^{n} C_i$
• Then

$$
\mathrm{Var}(C) = \sum_{i=1}^{n} \mathrm{Var}(C_i) + 2r \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \sqrt{\mathrm{Var}(C_i)\,\mathrm{Var}(C_j)}
= n\sigma^2 + n(n-1)\,r\,\sigma^2
= n\sigma^2 \bigl(1 + (n-1)\,r\bigr)
$$

  Correlation   Var(C)
  0             $n\sigma^2$
  r             $n\sigma^2 (1 + (n-1)\,r)$
  1             $n^2\sigma^2$

Correlation Matrices Allow User to Adjust Correlations
• One Matrix for Each CSCI Allows Estimator to Set Correlations Among Cost Drivers for that CSCI
  – How to Record Inter-CSCI Information?
• One Matrix for All Inputs in All CSCIs
  – Difficult for User (and Developer!) to Manipulate
• One Matrix for Project with which the Estimator Sets Correlations Among the Efforts of the CSCIs
  – But CSCI Costs are Not Inputs (aka Assumptions).
    Only Inputs Can Be Correlated

Selection of Correlation Values
• “Ignoring” Correlation Issue is Equivalent to Assuming that Risks are Uncorrelated, i.e., that All Correlations are Zero
• Square of Correlation (namely, R²) Represents Percentage of Variation in one WBS Element’s Cost that is Attributable to Influence of Another’s
• Reasonable Choice of Nonzero Values Brings You Closer to Truth
• Most Elements are, in Fact, Pairwise Correlated
• 0.2 is at “Knee” of Curve on Previous Charts, thereby Providing Most of the Benefits at Least Commitment

  Correlation   % Influenced
  0.00          0%
  0.10          1%
  0.32          10%
  0.50          25%
  0.71          50%

Cost-Risk Analysis Works by Simulating System Cost
• In Engineering Work, Computer Simulation of System Performance is Standard Practice, with Key Performance Characteristics Modeled by Monte Carlo Analysis as Random Variables, e.g.
  – Data Throughput
  – Time to Lock
  – Time Between Data Receipt and Delivery
  – Atmospheric Conditions
• Cost-Risk Analysis Enables the Cost Analyst to Conduct a Computer Simulation of System Cost
  – WBS-element Costs Are Modeled As Random Variables
  – Total System Cost Distribution is Determined by Monte Carlo Simulation
  – Cost is Treated as a Performance Criterion

Traditional “Roll-Up” Method (Too Simple)
• Define “Best Estimate” of Each Cost Element to be the Most Likely Cost of that Element
• List Cost Elements in a Work-Breakdown Structure (WBS)
  – Calculate “Best Estimate” of Cost for Each Element
  – Sum All Best Estimates
  – Define Result to be “Best Estimate” of Total Project Cost
• Unfortunately, It Turns Out That Things are Not as Simple as They Seem
  – There are a Lot of Problems with This Approach
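
One of those problems can be made concrete with a small simulation. The sketch below is a stand-in for the Crystal Ball workflow, not a reproduction of it. It models five WBS elements as triangular distributions, sums them over 10,000 Monte Carlo trials, and compares the simulated total with the roll-up of most likely values. The element values and helper names are hypothetical, and the elements are sampled independently (all correlations assumed to be 0), which by the correlation argument in this deck understates the spread of the total.

```python
import random

# Hypothetical WBS elements: (optimistic, most likely, pessimistic) costs.
# Each is skewed to the right, as cost distributions often are.
ELEMENTS = [
    (80, 100, 180),
    (40, 50, 110),
    (150, 200, 400),
    (20, 30, 60),
    (90, 120, 250),
]


def simulate_totals(elements, trials=10_000, seed=1):
    """Return sorted total costs, one per Monte Carlo trial."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(trials):
        # random.triangular takes its arguments as (low, high, mode).
        totals.append(sum(rng.triangular(lo, hi, ml) for lo, ml, hi in elements))
    totals.sort()
    return totals


def percentile(sorted_values, p):
    """Simple empirical percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]


# Roll-up method: sum the most likely value of every element.
roll_up = sum(ml for _, ml, _ in ELEMENTS)

totals = simulate_totals(ELEMENTS)
median = percentile(totals, 50)
p80 = percentile(totals, 80)
# Fraction of trials in which the total exceeds the roll-up estimate.
overrun_probability = sum(t > roll_up for t in totals) / len(totals)

print(f"roll-up of most likely values: {roll_up}")
print(f"simulated median: {median:.1f}, 80th percentile: {p80:.1f}")
print(f"probability total exceeds roll-up: {overrun_probability:.2f}")
```

With right-skewed inputs like these, the roll-up falls well below the simulated median, which is the "estimate is too low" problem noted on the first roll-up slide.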