INTEGRATION OF SOFTWARE COST ESTIMATES ACROSS COCOMO, SEER-SEM, AND PRICE-S MODELS

W. Thomas Harwick, Engineering Specialist
Northrop Grumman Corporation, Integrated Systems
El Segundo, California
October 18, 2004

Abstract

Classical challenges in using a single software cost model include determining software project size, normalizing size to equivalent new source lines of code, selecting appropriate environmental factors, and calculating and communicating the cost results to the software customer. When different suppliers use different software cost models, understanding how one model translates to another complicates the task of integrating the software estimates and communicating them to the buyer. This paper attempts to make model integration easier by showing the basic economic properties of each of three models: (1) COCOMO, (2) SEER-SEM, and (3) PRICE-S. The economic properties explored include economies of scale, productivity (cost) drivers, process maturity, integration complexity, and schedule compression. The productivity (cost) drivers examined include staff, product complexity, market, development tools, process maturity, and schedule compression. The method used is the "ceteris paribus" tool from the field of economics: each important parameter is examined one at a time, while the other parameters are held constant about a specified baseline. The results of the ceteris paribus analysis are used to show the differences between the software cost models. (The SEER-SEM and PRICE-S models are not as open as COCOMO.) A notional example (baseline) is generated that includes sensor software estimated with SEER-SEM, air vehicle software estimated with PRICE-S, and mission control station software estimated with COCOMO.
A project cost risk result will be integrated via a summary model that captures normalized sizing data and standardized productivity data as determined from each model's economic properties. A consistency check will also be performed across these three models using COCOMO. Last, this paper will also endeavor to address the impact of the Unified Modeling Language (UML) and Object Oriented Analysis (OOA) on software design productivity.

Presented at the 19th International Forum on COCOMO and Software Cost Modeling, Center for Software Engineering, University of Southern California, Los Angeles, CA, October 26-29, 2004.

Introduction

Integrators of software development activity face a challenge when integrating software cost estimates that often come from different software cost models used by major subcontractors. This paper attempts to make model integration easier through the following steps. First, the cost drivers from all three models are classified. Second, the economic properties of each software development cost model are shown. Third, a baseline example is developed for a notional air vehicle, sensor manager, and control station. Lastly, the cost models are integrated using the model sizing definitions and economic properties. The cost/risk is calculated with the Monte Carlo risk simulation method, using a sizing probability distribution and a productivity probability distribution.

Overview of the Software Cost Models

A first step is the identification of independent variables that "explain" changes in cost. These will be called "cost drivers" throughout this paper. In mathematical form: Cost = f(x1, x2, x3, ...). Each of the "x" terms is called a cost driver. The terms that explain the cost drivers are in turn called "root cost drivers." In terms of the models, root cost drivers form the basis for the knob settings used in selecting the cost driver values.
It is assumed that the reader has access to the detailed cost driver definitions of each of these models. 1 A list of cost drivers can be seen by reviewing the cost models COCOMO II, SEER-SEM, and PRICE-S. A partial list, grouped by category, is included below.

1 The COCOMO II model can be viewed online at http://sunset.usc.edu/research/COCOMOII/. The Web site for the SEER-SEM model is http://www.galorath.com; the PRICE-S model Web site is http://www.pricesystems.com. The author could not find details about the SEER-SEM or PRICE-S models online.

Team & Process
  Team:
    COCOMO II: ACAP, PCAP, APEX, PLEX, LTEX
    SEER-SEM: Analyst Capabilities; Analyst Experience; Programmer Capabilities; Programmers Language Experience
    PRICE-S: INTEGI, CPLX1
  Process:
    COCOMO II: TOOL, PCON, PVOL
    SEER-SEM: Development Method KnowledgeBase; Practices and Methods Experience
    PRICE-S: CPLX1
Market/Customer
  Market:
    COCOMO II: RUSE, SITE, SECU
    SEER-SEM: Requirements Definition Formality; Development Method, Application, and Acquisition Method KnowledgeBases; Multiple Site Development
    PRICE-S: CPLXM, CPLX1
  Schedule:
    COCOMO II: SCED
    SEER-SEM: Required Schedule; Start Date
    PRICE-S: DSTART, PEND
  Reliability/Certification Requirements:
    COCOMO II: RELY, DOCU
    SEER-SEM: Development Standard KnowledgeBase
    PRICE-S: Standard
Complexity & Operating Environment
  Complexity:
    COCOMO II: CPLX, DATA, STOR
    SEER-SEM: Application KnowledgeBase
    PRICE-S: UTIL, APPL
  Operating Environment & Technology:
    COCOMO II: TIME
    SEER-SEM: Platform and DevelopmentMethod KnowledgeBases
    PRICE-S: PLTFM
Sizing & Exponent
  Size:
    COCOMO II: Raw KSLOC; normalized size (equivalent new KSLOC)
    SEER-SEM: New Lines of Code; Pre-existing SLOC
    PRICE-S: Raw SLOC; NEWD, SLOC
  Exponent:
    COCOMO II: PREC, FLEX, RESL, TEAM, PMAT (or SEI rating)
Scope
  Hardware developed in parallel with software; System Integration:
    COCOMO II: Not in the default model
    PRICE-S: CPLX2

Table 1: Cost Drivers by Software Model

Model Sensitivity and Baseline Productivity Range

A CSC of 50,000 source lines of code (SLOC) is used to iterate each of the models through its major cost drivers for the sensitivity analysis. Each model is analyzed using the ceteris paribus method practiced in economics. The ceteris paribus method involves varying the cost driver under consideration and noting the change in the cost estimate, while holding all other cost drivers constant. The cost sensitivities are recorded for each major cost driver. 2 This yields the data to plot the ranked histograms shown below. A mathematically oriented person can alternatively think of the concept of a "partial derivative."

Next, we take the top several cost drivers and their derived ranges (local variation from the ceteris paribus baseline) and calculate the hours/SLOC range about the model baseline. This yields hours/SLOC values for a high productivity, baseline productivity, and low productivity scenario for each model.

2 While SEER-SEM has a built-in sensitivity analysis, it may reflect industry uncertainty rather than project uncertainty, particularly if one uses the K-Base settings.
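The one-at-a-time sweep described above can be sketched in a few lines. The driver names and multiplier tables below are illustrative placeholders invented for this sketch, not values taken from COCOMO, SEER-SEM, or PRICE-S:

```python
# Ceteris paribus sensitivity sweep: vary one cost driver at a time while
# holding the others at their baseline settings, and record the cost swing.
# Driver names and multiplier tables are hypothetical, for illustration only.

DRIVERS = {
    # setting index -> effort multiplier (Very Low .. Very High)
    "ACAP": [1.42, 1.19, 1.00, 0.85, 0.71],
    "CPLX": [0.73, 0.87, 1.00, 1.17, 1.34],
    "PCON": [1.29, 1.12, 1.00, 0.90, 0.81],
}
BASELINE = {"ACAP": 2, "CPLX": 2, "PCON": 2}  # index of the baseline setting

def effort(settings, size_ksloc=50.0, a=2.94, e=1.10):
    """Notional effort model: A * Size^E * product of driver multipliers."""
    eaf = 1.0
    for name, idx in settings.items():
        eaf *= DRIVERS[name][idx]
    return a * size_ksloc ** e * eaf

def local_sensitivity(step=1):
    """Move each driver +/- `step` settings from baseline, others held fixed."""
    base = effort(BASELINE)
    ranges = {}
    for name in DRIVERS:
        lo = dict(BASELINE); lo[name] = max(0, BASELINE[name] - step)
        hi = dict(BASELINE); hi[name] = min(len(DRIVERS[name]) - 1, BASELINE[name] + step)
        ranges[name] = (effort(lo) / base, effort(hi) / base)
    return ranges

for name, (low, high) in sorted(local_sensitivity().items()):
    print(f"{name}: {low:.2f}x .. {high:.2f}x of baseline effort")
```

Sorting the resulting ranges by width reproduces the kind of ranked histogram shown in the figures that follow.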
[Figure 1 is a ranked bar chart of local cost driver sensitivity for COCOMO II, +/- one setting about the baseline. Baseline settings: ACAP = High, APEX = High, PCAP = Nom, PLEX = Nom, LTEX = Nom, PCON = Nom, PVOL = Nom, TOOL = Nom, SITE = Low, DOCU = Nom, RUSE = Nom, SECU = High, SCED = Nom, RELY = VHigh, CPLX = XHigh, DATA = High, TIME = XHigh, STOR = Nom. Horizontal axis: environmental factor impact, 0.20 to 1.60.]

Figure 1: Local Cost Driver Sensitivity - COCOMO II Model

The COCOMO baseline cost driver settings are listed next to the bar plots, and the local sensitivity ranges are shown in the chart above. For the COCOMO "local" sensitivity analysis, the settings were moved +/- one setting from the COCOMO baseline. Thus, the COCOMO settings will differ somewhat from the (global) "Software productivity range" shown on the cover of Boehm's Software Cost Estimation with COCOMO II. These Environmental Factors were derived about the baseline settings listed in the left-hand column; the impact factors are +/- one COCOMO setting. 3 In COCOMO, the Environmental Factors are multipliers.

The major Environmental Factor cost drivers, in this example, are:
1. Staff Capability (ACAP + PCAP) = 1.94
2. Complexity (CPLX) = 1.90
3. Personnel Continuity (PCON) = 1.68. 4

The COCOMO hours/SLOC data is summarized in the table below.

Cost Driver (group)      High Productivity          Baseline Productivity     Low Productivity
Staff Capability         ACAP="VHigh"; PCAP="High"  ACAP="High"; PCAP="Nom"   ACAP="Nom"; PCAP="Low"
Complexity (CPLX)        "High"                     "XHigh"                   "VHigh"
Personnel Continuity     "High"                     "Nom"                     "Low"
EAF                      1.95                       3.32                      5.79
Person-Months*           439                        748                       1,305
Hours/SLOC               1.33                       2.27                      3.97
* Excludes security, HW & SW integration, and system integration.

Table 2: Hours per Source Lines of Code - COCOMO

The Environmental Factor result in COCOMO II is the multiplicative product of the 17 cost drivers.
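The person-month and hours/SLOC figures in Table 2 follow the published COCOMO II post-architecture form, PM = A x Size^E x (product of effort multipliers). A minimal sketch using the COCOMO II.2000 constants (A = 2.94, B = 0.91) and COCOMO's default 152 hours per person-month; the scale-factor sum used here is an assumed value, so the printed numbers only approximate the table's:

```python
# COCOMO II post-architecture effort: PM = A * Size^E * product(EM_i),
# with E = B + 0.01 * (sum of scale factors). A = 2.94 and B = 0.91 are
# the published COCOMO II.2000 calibration; the scale-factor sum below
# is an assumed illustrative value.

A, B = 2.94, 0.91
HOURS_PER_PM = 152.0  # COCOMO II default hours per person-month

def person_months(size_ksloc, eaf, sum_scale_factors):
    exponent = B + 0.01 * sum_scale_factors
    return A * size_ksloc ** exponent * eaf

# 50 KSLOC baseline; EAF values from Table 2's three scenarios
for label, eaf in [("High", 1.95), ("Baseline", 3.32), ("Low", 5.79)]:
    pm = person_months(50.0, eaf, 18.97)  # assumed scale-factor sum (E ~ 1.10)
    print(f"{label:8s} EAF={eaf:4.2f}  PM={pm:6.0f}  hours/SLOC={pm * HOURS_PER_PM / 50_000:.2f}")
```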
In symbols: EnvFactor = EF1 x EF2 x ... x EF17, the product of the 17 Environmental Factor multipliers.

3 Boehm, Horowitz, Reifer et al., Software Cost Estimation with COCOMO II, Prentice-Hall, Inc., New Jersey, 2000.
4 The selection of the top several cost drivers illustrates the local range of the three models. In actual practice, one would select cost drivers appropriate to the uncertainty about the supplier quote. In addition, the exponent drivers have been left out of this analysis; they should be included in future analyses.

SEER-SEM Sensitivity Analysis

For the SEER-SEM "local" sensitivity analysis, the settings were moved +/- two settings from the SEER-SEM baseline. (There are about 9 settings available in SEER-SEM, and only 5 discrete settings available in COCOMO.)

[Figure 2 is a ranked bar chart of local cost driver sensitivity for SEER-SEM 7.0. Baseline settings: Security Requirements = "Nom+"; Analyst Application Experience = "Nom"; Analyst Capabilities = "Nom+"; Programmer Capabilities = "Nom"; Requirements Volatility = "Hi"; Modern Development Practices Use = "Hi-"; Automated Tool Use = "Nom+"; Process Volatility = "Nom+"; Multiple Site Development = "VHi"; Programmers Language Experience = "Nom"; Language Type = "Nom"; Specification Level Reliability = "Hi". Horizontal axis: local impact, 0% to 80%.]

Figure 2: Local Cost Driver Sensitivity - SEER-SEM Model

The major SEER-SEM cost drivers, in this example, are:
1. Security Requirements = 1.75
2. Staff Capability (Analyst & Programmer) = 1.47
3. Volatility (Requirements, Modern Practices) = 1.32.

The hours/SLOC data for SEER-SEM is summarized in the table below:
Cost Driver (group)     High Productivity   Baseline Productivity   Low Productivity
Security Requirements   "Nom-"              "Nom+"                  "Hi"
Staff Capability        "Hi"                "Nom+"                  "Nom"
Volatility              "Nom+"              "Hi"                    "VH-"
Hours                   75,200              103,900                 182,700
Hours/SLOC              1.50                2.08                    3.65

Table 3: Hours per Source Lines of Code - SEER-SEM

PRICE-S Sensitivity Analysis

For the PRICE-S "local" sensitivity analysis, the settings were moved +/- one setting from the PRICE-S baseline.

[Figure 3 is a ranked bar chart of local cost driver ranges for PRICE-S. Baseline settings: INTEGI-Team = 0.70; INTEGE-Team = 0.70; CPLX1-personnel = -0.2; INTEGI-timing = 0.7; INTEGE-timing = 0.7; UTIL = 0.5; APPL = 8.46; PROFAC = 6.5; CPLXM = 1.2; PLTFM = 1.8; CPLX1-product familiarity; CPLX1-Tools = -0.1; CPLX1-Requirements + new Language = 0. Horizontal axis: productivity factor, 0 to 2.00.]

Figure 3: Local Cost Driver Sensitivity - PRICE-S Model

The major PRICE-S cost drivers, in this example, are:
1. Operating Environment (PLTFM) and Productivity (PROFAC) = 1.75
2. Staff Capability (INTEGI, INTEGE - crew; team familiarity with product line, CPLX1) = 1.55
3. Complexity (INTEGI, INTEGE integration complexity).

Cost Driver (group)     High Productivity      Baseline Productivity   Low Productivity
Operating Environment   PLTFM=1.7;             PLTFM=1.8;              PLTFM=1.9;
                        PROFAC=7.0             PROFAC=6.5              PROFAC=5.5
Staff Capability        INTEGI=0.5;            INTEGI=0.7;             INTEGI=1.0;
                        CPLX1 (crew= -0.2)     CPLX1 (crew= -0.1)      CPLX1 (crew= 0.0)
Hours                   141,056                220,400                 330,904
Hours/SLOC              2.82                   4.41                    6.62

Table 4: Hours per Source Lines of Code - PRICE-S

The hours/SLOC data for PRICE-S is summarized in the table above.

Inter-Model Comparison of Cost Drivers

The inter-model comparison will include both sizing and the other cost drivers. We first address the sizing and scope question, with a brief introduction to sizing for each of the models.
Software Sizing by Model

The COCOMO II model uses equivalent new source lines of code (SLOC) for sizing the software effort. Reused code is reduced to equivalent new code by determining the ratio of effort relative to a new line of code for the design, code and unit test, and integration and test phases. These relative efforts are multiplied by the respective phase weights and then summed to yield an equivalent new SLOC count for the reused code. COCOMO II adds two further parameters that apply to reused software: Software Understanding and Assessment and Assimilation. 5 The resulting equivalent new reused code is added to the actual new code to yield the total new SLOC. There are also special development cases that are handled differently, such as auto-generated code. 6

5 Boehm et al., Software Cost Estimation with COCOMO II, Appendix E, The Effort Estimation Equation.
6 The reader should refer to either Boehm, Software Engineering Economics, or Boehm et al., Software Cost Estimation with COCOMO II.

The SEER-SEM model has three major categories for determining the overall SLOC count: (1) New Lines of Code, (2) Pre-existing lines of code, not designed for reuse, and (3) Pre-existing lines of code, designed for reuse. It appears from using the model that the calculation for the pre-existing lines not designed for reuse is a multiplicative and additive (linear) combination of the amounts of effort in the Redesign, Re-implementation, and Retest ratios, each multiplied by the respective phase weight. This result is then multiplied by the pre-existing SLOC, yielding the effective size of the pre-existing code. The pre-existing, designed-for-reuse code would be calculated in a similar fashion.
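The reuse arithmetic just described can be sketched with COCOMO II's published adaptation weights (0.4 for percent design modified, 0.3 for percent code modified, 0.3 for percent integration redone), plus the Assessment & Assimilation and Software Understanding terms; the sample percentages below are invented for illustration:

```python
# Equivalent-new-SLOC for reused code, per the COCOMO II reuse model the
# text describes. DM/CM/IM are percent design modified, code modified,
# and integration & test redone; AA is Assessment & Assimilation, SU is
# Software Understanding, UNFM is programmer unfamiliarity (0..1).
# Weights 0.4/0.3/0.3 are COCOMO II's published phase weights; the sample
# inputs below are hypothetical.

def equivalent_sloc(adapted_sloc, dm, cm, im, aa=0.0, su=0.0, unfm=0.0):
    """dm, cm, im, aa, su are percentages (0..100); unfm is 0..1."""
    aaf = 0.4 * dm + 0.3 * cm + 0.3 * im      # adaptation adjustment factor
    if aaf <= 50:
        aam = (aa + aaf * (1 + 0.02 * su * unfm)) / 100.0
    else:
        aam = (aa + aaf + su * unfm) / 100.0
    return adapted_sloc * aam

# 100 KSLOC of pre-existing code: 10% redesigned, 20% recoded, 30% retested
esloc = equivalent_sloc(100_000, dm=10, cm=20, im=30, aa=2, su=20, unfm=0.4)
print(f"equivalent new SLOC: {esloc:,.0f}")  # 24,040 for these inputs
```

The equivalent size is then added to the actual new code before the effort equation is applied.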
That is, the phase ratios are multiplied by the respective phase weights and then by the respective pre-existing, designed-for-reuse SLOC.

The PRICE-S model includes ratios for the percentage of code newly designed and newly coded. There is also a "Language" parameter that normalizes SLOC size between languages and a "FRAC" factor that normalizes out comment lines.

Scope of the Models

It is also critical to understand the scope of the software cost estimate made by each model; we discuss each briefly. The major point is to investigate the model differences when comparing estimates: what may seem to be large differences in productivity rates can be due to differences in assumptions within each of the cost models.

COCOMO

There are four major phases in COCOMO I and COCOMO II, which retain the assumption of a "waterfall" software development approach:
1. Plans and Requirements
2. Product Design
3. Programming
4. Integration and Test of the software product.

COCOMO does not include hardware and software integration. The author does not believe that it includes software developed in parallel with hardware, owing to the differences between the software project sizes cited in the COCOMO I database (prior to 1981) and the size of the larger software projects of today. 7

7 See Boehm, Software Engineering Economics, Prentice-Hall, 1981, Table 29-1, COCOMO I database.

The Plans & Requirements phase includes plans for required functions, interfaces, and performance. These requirements are used to define the capabilities of the software product. The Product Design phase includes defining the hardware/software architecture, control structure, and data structure for the product. Test plans are also created in this phase.
In the Programming phase, software components are created per the design and the interface and performance requirements. In the Integration & Test phase, the software components are brought together to construct a properly functioning software product composed of loosely coupled modules. The requirements are used to determine the usefulness of the delivered software product.

The SEER-SEM and PRICE-S phase outputs were determined by running each model with the baseline of 50,000 SLOC. The percentages are listed in the table below.

Model                    Plans &   Product   Programming        Integration   HW and SW      System
                         Reqmts.   Design    (detailed design   & Test        Integration    Integration
                                             & code/unit test)
COCOMO II 9              7         16        53                 25            0              0
COCOMO I (Embedded) 8    7         17        58                 25            0              0
SEER-SEM                 4         6         60                 22            (part of       (part of System
                                                                              programming)   Integration)
PRICE-S                  13        13        30                 30            11             11

8 Boehm, Software Engineering Economics, Table 5-3, p. 66. The COCOMO I database of 63 projects has approximately 9 projects above 100 KSLOC. The system integration we are dealing with in our notional case must integrate many 100-KSLOC projects. Thus, the author concludes that zero percent should be assigned to System Integration for COCOMO I.
9 Boehm et al., Software Cost Estimation with COCOMO II, Table A-4, p. 307. Plans and Requirements are in addition to the 100%. The "Transition" effort (12%) is assumed to be part of support and is not included in the development phases.

Table 5: Percent Phase Effort by Model

Inter-Model Comparisons - Environmental Factors

The tables below show the approximate factors that apply in each of the following categories:
1. Team & Process
2. Market
3. Complexity
4.
Scope (between models)

Some additional cost drivers that are not present in the COCOMO model have been labeled "scope" (beyond the default model) parameters. These parameters include (1) hardware developed in parallel with software, and (2) system integration of large software code sections. Both of these factors are beyond the scope of COCOMO I and likely COCOMO II. The sizes of most of the software projects shown in Boehm's Software Engineering Economics are not large enough to include system integration across larger CSCs. (Since the author has not seen the database for COCOMO II, this observation may only hold for COCOMO I.)

Team & Process    COCOMO II     SEER-SEM                                 PRICE-S
Team              ACAP (1.41)   Analyst Capabilities (1.20)              INTEGI (crew) (1.58)
                  PCAP (1.31)   Programmer Capabilities (1.18)           PROFAC (1.35)
                  APEX (1.23)   Analyst Experience (1.21)                CPLX1 - product familiarity (1.25)
                  PLEX (1.20)   Programmers Language Experience (1.05)
                  LTEX (1.20)
Process & Tools   PVOL (1.99)   Development Method KnowledgeBase;        CPLX1 (1.10)
                  PCON (1.24)   Practices and Methods Experience;        CPLX1 - Reqmts. Volatility (1.13)
                  TOOL (1.21)   Reqmts. Volatility (1.32)                CPLX1 - Tools (1.23)

Table 6: Baseline - Team & Process Cost Factors

Market                   COCOMO II     SEER-SEM                                PRICE-S
Market                   RUSE (1.31)   Security Requirements (1.75);           CPLXM (1.2)
                         SITE (1.22)   Requirements Definition Formality
                         SECU (1.10)
Schedule                 SCED (1.14)   Required Schedule; Start Date           DSTART, PEND
Reliability              RELY (1.15)   (part of Platform, Application Kbase)   (part of PLTFM)
Certification reqmts.    DOCU (1.22)   Development Standard Kbase

Table 7: Baseline - Market & Customer, Safety Cost Factors

Complexity                            COCOMO II     SEER-SEM              PRICE-S
Complexity                            CPLX (1.30)   Application (Kbase)   APPL; UTIL;
                                      DATA (1.28)                         INTEGI (timing, coupling) (1.55)
                                      STOR (1.05)
Operating Environment & Technology    TIME (1.26)   Platform and          PLTFM (1.25)
                                                    DevelopmentMethod
                                                    KnowledgeBase

Table 8: Baseline - Complexity & Operating Environment Cost Factors

Scope b/t Models                             COCOMO II                              SEER-SEM                            PRICE-S
Hardware developed in parallel w/ software   Not included; added (1.11)             Hardware Integration Level (1.09)   CPLX2 (1.20)
System Integration                           Not included; currently added (1.12)   Programs Integrating (1.04)         INTEGE (1.13)

Table 9: Cost Model Normalization Factors

Economic Properties of These Cost Models

Cost models can be evaluated for their economic properties, which provides another way to compare them. The categories we shall use are (1) diseconomies/economies of scale, (2) impact of schedule (rate of development), and (3) productivity associated with the non-exponent cost drivers.

Diseconomies/Economies of Scale (Size)

In this section, we ask what happens to cost as the size of the product is doubled or quadrupled. This addresses the development economics of the size of the product or proposed product. The exponent of the sizing parameter determines a model's economies of scale. In software development, economies of scale have not existed until very recently; historically, there have only been diseconomies of scale, meaning that the exponents are greater than one.

The COCOMO II model allows for the possibility of economies of scale if the 5 scale factors have a Very High or Extra High rating. 10 The COCOMO exponent ranges from 0.91 to approximately 1.26, depending upon the Scale Factors. 11

[The figure below plots effort (person-months) versus size (0 to 600 KSLOC) for the three models. The fitted curves are: COCOMO, y = 19.4 x^1.09; SEER-SEM, y = 15.1 x^1.12; PRICE-S, y = 9.7 x^1.11.]
Figure 4: Economies of Scale for Software Cost Models

Economies of scale (an exponent with respect to project size of less than one) may be possible. This would require preliminary and detailed design strategies that have a great deal of code commonality. This could be done, in principle, by making use of parent-child classes (and corresponding objects) as well as the "inheritance" properties of classes (and corresponding objects). Some types of applications may lend themselves to economies of scale in design and, perhaps, also in code implementation. It is noted that the Unified Modeling Language (UML) can be used to architect systems at the analysis step. UML can also be used to generate class skeletons and basic structure. It remains to be seen what the impact will be upon software design productivity and decreases in system design defects. 12, 13

10 Boehm, Reifer et al., Software Cost Estimation with COCOMO II, inside cover.
11 This is reasonably in line with COCOMO I, which has three values for the exponent ("Embedded" mode = 1.20, "Semi-detached" mode = 1.12, "Organic" mode = 1.05).
12 Chonoles, Schardt, UML 2 for Dummies, Wiley Publishing, Inc., p. 24, 2003.
13 In the area of software maintenance, the use of encapsulation (data hiding) within computer components can have a major impact upon decreasing maintenance cost. For example, the Y2K problem was due to the previous era of programming, when a two-digit year was used instead of a four-digit year. These date references were spread across numerous places within financial and other programs and had to be found and corrected. If the early developers had known about encapsulation, they could have placed the date routine in one place within each large program. The maintenance cost would have been significantly less.
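The fitted effort curves make the diseconomies easy to quantify; a small sketch, with the coefficients read off the economies-of-scale figure (effort in person-months, size in KSLOC):

```python
# Fitted effort curves from the economies-of-scale figure: effort = a * KSLOC^b.
# All three exponents are just above 1.0, i.e. mild diseconomies of scale:
# doubling the size more than doubles the effort.

CURVES = {
    "COCOMO":   (19.4, 1.09),
    "SEER-SEM": (15.1, 1.12),
    "PRICE-S":  (9.7, 1.11),
}

def effort_pm(model, ksloc):
    a, b = CURVES[model]
    return a * ksloc ** b

for model in CURVES:
    # for a pure power law, the doubling ratio is simply 2^b
    ratio = effort_pm(model, 100) / effort_pm(model, 50)
    print(f"{model}: doubling 50 -> 100 KSLOC multiplies effort by {ratio:.2f}")
```

With an exponent of 1.12, for example, doubling size costs about 2.17 times the effort rather than 2.0.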
Impact of Schedule (Rate of Development)

What is the impact of changes in the rate of development? The cost models display the following penalties for schedule compression and expansion. The schedule curves are developed by varying the number of months, and each model showed somewhat different behavior. The COCOMO model showed a 43 percent penalty for a 25 percent schedule compression, but no penalty for schedule expansion. The PRICE-S model showed "classical" schedule behavior in that schedule compression exhibited penalties and schedule expansion also exhibited some effort penalty. SEER-SEM assumes a minimal schedule and then shows a lower amount of effort when expanding to a nominal schedule. All the cost models show an inverse square law feature. 14

14 This inverse square law was verified by independent work by Dr. Randall Jensen, who also notes that the SEER-SEM optimum schedule corresponds to the COCOMO nominal schedule. Dr. Jensen is a member of the Space Systems Cost Analysis Group (SSCAG) and is located at the Software Technology Support Center, Hill AFB, UT 84056.

[The figure below plots effort versus schedule (20 to 80 months) at 50,000 SLOC for COCOMO, PRICE-S, and SEER-SEM, showing each model's relative schedule and penalties.]

Figure 5: Models' Relative Schedules and Penalties (SLOC = 50,000)

Productivity Ranges (Local)

COCOMO shows the greatest low-to-high productivity range for the change in settings among the models. This is probably due to the fewer knob settings available for the ceteris paribus analysis. COCOMO and SEER-SEM are within about 10 percent of each other in productivity (hours/SLOC) at the Baseline, High, and Low Productivity scenarios. It is not immediately clear why the PRICE-S productivity is so much lower than that of the other cost models.
The impact of new hardware (developed with the software) was normalized out of the High Productivity settings for PRICE-S (CPLX2). This accounts for only about 12 percent of the difference [= (2.82/1.50) / (4.41/2.08)]. The Baseline and Low Productivity scenarios for PRICE-S contain new hardware. Hours/SLOC by model:

                              COCOMO   SEER-SEM   PRICE-S
High Productivity             1.33     1.50       2.82
Baseline Productivity         2.27     2.08       4.41
Low Productivity              3.97     3.65       6.62
Low/High Productivity Ratio   3.0      2.4        2.3

Table 10: Productivity Range by Model

Development of Cost/Risk Estimates

In this section, we develop the cost/risk profiles for the air vehicle, sensor, and ground control station. The basic equation is: Hours (effort) = Size in SLOC x Hours per SLOC. We will use probability curves for these two equation elements. Sizing uncertainty is modeled with three notional scenarios for each software system, as listed in the following three tables. For illustration, we have selected the triangular probability distribution, which requires a "Low," "Most Likely," and "High" sizing estimate for each software system. 15

15 Assume that the "Low" value is set at the one percent probability of being exceeded, the "Most Likely" has the highest probability of occurrence (in a single-modal distribution), and the "High" is at the 99 percent probability.

Air Vehicle (original SLOC point estimate: 2,000,000)
Scenario                                         SLOC
Low (code growth of -5%, 15% reuse)              1,615,000
Most Likely (code growth of 15%, 10% reuse)      2,070,000
High (code growth of 20%, 5% reuse)              2,280,000

Table 11: Air Vehicle Sizing Scenarios and Range

The Air Vehicle table shows the Low value as 1.615 million SLOC, the Most Likely as 2.07 million SLOC, and the High as 2.28 million SLOC.
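The Monte Carlo construction described above can be sketched directly, using the air-vehicle sizing range and the PRICE-S productivity range from the tables. One simplifying assumption: the Low and High points are treated here as the triangular distribution's absolute endpoints rather than the 1-percent and 99-percent points the text specifies, so the simulated spread is slightly narrower than the paper's:

```python
# Monte Carlo cost risk: Hours = SLOC * (hours/SLOC), with both factors
# drawn from triangular distributions. The air-vehicle numbers are taken
# from the paper's tables; treating Low/High as the distribution's min/max
# (rather than the 1%/99% points) is a simplifying assumption.
import random

def simulate_hours(sloc_low, sloc_ml, sloc_high,
                   prod_low, prod_ml, prod_high, trials=100_000, seed=1):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = []
    for _ in range(trials):
        sloc = rng.triangular(sloc_low, sloc_high, sloc_ml)
        hours_per_sloc = rng.triangular(prod_low, prod_high, prod_ml)
        draws.append(sloc * hours_per_sloc)
    draws.sort()
    return draws  # sorted draws give the cumulative probability curve

# Air vehicle: sizing from Table 11, PRICE-S productivity range from Table 14
draws = simulate_hours(1_615_000, 2_070_000, 2_280_000, 3.66, 4.48, 5.74)
p50 = draws[len(draws) // 2]
print(f"median air-vehicle effort: {p50 / 1e6:.1f} million hours")
```

Reading percentiles off the sorted draws yields cumulative tables of the kind shown later for each system.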
Sensor (original SLOC point estimate: 500,000)
Scenario                                         SLOC
Low (code growth of -5%, 20% reuse)              380,000
Most Likely (code growth of 15%, 10% reuse)      517,500
High (code growth of 30%, no reuse)              650,000

Table 12: Sensor Sizing Scenarios and Range

The Sensor table shows the Low value as 0.380 million SLOC, the Most Likely as 0.518 million SLOC, and the High as 0.650 million SLOC.

Ground Control Station (original SLOC point estimate: 1,000,000)
Scenario                                         SLOC
Low (code growth of -5%, 50% reuse)              475,000
Most Likely (code growth of 10%, 25% reuse)      825,000
High (code growth of 20%, 15% reuse)             1,020,000

Table 13: Ground Control Station Sizing Scenarios and Range

The Ground Control Station table shows the Low value as 0.475 million SLOC, the Most Likely as 0.825 million SLOC, and the High as 1.020 million SLOC.

Next, the most likely hours per SLOC is calculated from each of the cost models. The Air Vehicle has a productivity of 4.48 hours/SLOC per the PRICE-S model; the Sensor has 4.91 hours/SLOC per the SEER-SEM model; and the Ground Control Station has 4.06 hours/SLOC per the COCOMO model.

PRODUCTIVITY (Hours/SLOC)
System                   Model      High Productivity   Medium Productivity   Low Productivity
Air Vehicle              PRICE-S    3.66                4.48                  5.74
Sensor                   SEER-SEM   4.02                4.91                  6.29
Ground Control Station   COCOMO     3.32                4.06                  5.20

Table 14: Productivity Ranges for the Software Cost Models

The Low and High values are derived by varying several cost drivers within each model. For example, the COCOMO hours/SLOC is varied by assuming that the TIME and PCON variables could each vary by +/- one setting.

Final Consistency Check

The consistency check uses the initial source lines of code (SLOC) sizing data shown in the table below for each of the major systems (air vehicle, sensor, and ground control station).
MODEL             Air Vehicle   Sensor    Ground Control
COCOMO - SLOC     2,000,000     500,000   1,000,000
SEER-SEM - SLOC                 500,000
PRICE-S - SLOC                            1,000,000

Table 15: Model Cross-Check - Sizing Used

MODEL            COCOMO - PM   COCOMO - PM (normalized to       SEER-SEM - PM   PRICE-S - PM   Percent
                               include HW/SW Integration
                               & System Test)
Air Vehicle      43,681        55,308                           58,992          64,657         17%
Sensor           10,189        12,901                           14,748          15,863         23%
Ground Control   21,096        26,712                           29,496          32,273         21%

Table 16: Model Cross-Check - Estimates by Model

The model estimates are within approximately 23 percent of each other for each major software type, as shown in the Percent column of Table 16. At this level of analysis we can say the model results are consistent; that is, the hypothesis is that the software suppliers are using their models in ways that lead to results consistent with the type of environment being modeled.

Cost Risk Comparisons Across the Cost Models

The cost risk tables and curves are developed for each of the models.

[The figure below is a cumulative probability curve (0 to 100 percent) for total notional software effort, over 5 to 25 million hours; the 50 percent point occurs at 15.1 million hours.]

Figure 6: Probability Outcomes for Total (Notional) Software Effort

Percentile   Air Vehicle       Air Vehicle   Air Vehicle
             SLOC (thousands)  Hours/SLOC    Hours (millions)
0            1,625             3.70          6.14
10           1,794             4.07          7.83
20           1,864             4.24          8.28
30           1,914             4.37          8.59
40           1,960             4.48          8.87
50           2,002             4.61          9.16
60           2,038             4.73          9.44
70           2,071             4.87          9.76
80           2,110             5.03          10.13
90           2,164             5.22          10.64
100          2,275             5.73          12.48

Table 17: Cumulative Probability Results for Air Vehicle (SLOC, Hours/SLOC, Hours (M))
  Percentile   Sensor SLOC   Sensor       Sensor
               (Thousands)   Hours/SLOC   Hours (Millions)
  0            382           4.03         1.66
  10           441           4.48         2.15
  20           467           4.66         2.30
  30           486           4.81         2.42
  40           502           4.93         2.52
  50           518           5.05         2.61
  60           531           5.18         2.71
  70           548           5.33         2.82
  80           566           5.52         2.94
  90           590           5.74         3.13
  100          647           6.28         3.99

Table 18: Cumulative Probability Results for Sensor (SLOC, Hours/SLOC, Hours (M))

  Percentile   Gnd. Cntl. SLOC   Gnd. Cntl.   Gnd. Cntl.
               (Thousands)       Hours/SLOC   Hours (Millions)
  0            478.8             3.33         1.73
  10           608.7             3.69         2.50
  20           669.1             3.86         2.75
  30           715.4             3.97         2.95
  40           753.4             4.07         3.10
  50           786.1             4.17         3.25
  60           815.8             4.28         3.40
  70           845.5             4.40         3.57
  80           878.1             4.55         3.75
  90           923.3             4.73         3.98
  100          1017.1            5.18         5.06

Table 19: Cumulative Probability Results for Gnd. Cntl. Stn. (SLOC, Hours/SLOC, Hours (M))

Summary/Lessons Learned from Cost Models

Our objective was to integrate three different estimates that used the COCOMO II, SEER-SEM, and PRICE-S models. To understand the sizing, scope, and productivity associated with each of these models, we identified the major cost drivers and attempted to classify them into Team (team and process), Market (market, schedule, reliability, certification), Complexity (complexity, operating environment, technology), and Sizing (sizing and sizing economics) categories. We also added a Scope category to cover parameters that were not in one or more of the models. We used the ceteris paribus method to extract the largest (local) impact cost drivers about our assumed baseline of 50,000 source lines of code (SLOC). The input baseline, for each model, was used to calculate a Productivity Baseline and region for each of the cost models. The units used are Hours/SLOC.
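The cumulative results in Tables 17–19 come from Monte Carlo sampling of triangular distributions for size and productivity. A minimal sketch for the Sensor, using the SLOC range from Table 12 and the Hours/SLOC range from Table 14, and assuming size and productivity are sampled independently (the sample count and seed are illustrative choices, not from the paper):

```python
import random

random.seed(1)          # illustrative; any seed works
N = 100_000             # illustrative sample count

# Sample SLOC and productivity from triangular(low, high, mode)
# distributions, multiply, and collect total hours in millions.
hours = []
for _ in range(N):
    sloc = random.triangular(380_000, 650_000, 517_500)   # Table 12 range
    prod = random.triangular(4.02, 6.29, 4.91)            # Table 14 range
    hours.append(sloc * prod / 1e6)                       # millions of hours

# Read off cumulative percentiles, as in Table 18's Hours column.
hours.sort()
percentiles = {p: hours[int(p / 100 * (N - 1))] for p in range(0, 101, 10)}
```

With enough samples, the 50th-percentile value should land near the 2.61 million hours shown in Table 18; differences at the tails reflect the independence assumption and sampling noise.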
The scope of the cost models was briefly discussed, along with the models' economic properties (exponents) and their schedule compression and expansion properties. We then constructed the cost-risk results for each of the models using the Monte Carlo method. This involved constructing notional scenario uncertainty ranges in SLOC and productivity (Hours/SLOC) for each of the three systems (Air Vehicle, Sensor, and Ground Station). The final output was obtained using the Monte Carlo method, illustrating the analysis with a triangular probability distribution for the sizing and productivity uncertainties. The result is a cumulative histogram for total program hours and for the hours of each major system.

Further work is needed to understand the productivity basis for the three software cost models, as well as the mapping of cost drivers between models. In addition, one factor not explored is the cultural impact on software productivity in large and small organizations. As long as we have multiple suppliers bidding on different software systems, we will face the challenge of understanding how people work and how the models capture the aggregate sizing and productivity data.

References:

Models:
Price Systems, PRICE-S model, Los Angeles, California, 2004.
Galorath Associates, SEER-SEM, Los Angeles, California, 2003.
USC Associates, COCOMO II.1999, Los Angeles, California, 2004.
Decisioneering, Crystal Ball, Denver, Colorado, 2002.

Books:
1. Boehm, B., Reifer, D., et al., Software Cost Estimation with COCOMO II, Prentice-Hall, New Jersey, 2000.
2. Boehm, Barry W., Software Engineering Economics, Prentice-Hall, New Jersey, 1981.
3. Deming, W. Edwards, Out of the Crisis, Massachusetts Institute of Technology, Cambridge, 1982.
4. Greene, William H., Econometric Analysis, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
5. Cs3 Inc., Object Oriented Analysis Using the Unified Modeling Language (UML), Los Angeles, CA, 2000-2003.