University of Southern California
Center for Systems and Software Engineering

A COCOMO Extension for Software Maintenance
25th International Forum on COCOMO and Systems/Software Cost Modeling
Vu Nguyen, Barry Boehm
November 2nd, 2010

Outline
• Motivation
• Problem and a Solution
• COCOMO Extension for SW Maintenance
  – Sizing method
  – Effort model
• Results
  – Data collection results
  – Calibrations
• Conclusions

Software Maintenance
• The work of modifying, enhancing, and providing cost-effective support to existing software
• Characteristics of maintenance projects
  – Constrained by the legacy system:
    • Quality of the system
    • Requirements, architecture, and design
    • System understandability
    • Documentation

Magnitude of Software Maintenance
• The majority of software costs are incurred after the first operational release [Boehm 1981]
[Figure: Maintenance vs. total software cost — the percentage of total software cost spent on maintenance, as reported by Zelkowitz et al. (1979), McKee (1984), Moad (1990), and Erlikh (2000)]

Importance of Software Estimation in Managing Software Projects
• Estimation is a key factor determining the success or failure of software projects
  – Two of the three most-cited causes of project failure are related to resource estimation (CompTIA survey [Rosencrance 2007])
• Cost estimates are key inputs to investment decisions, project planning and control, etc.
• Many software estimation approaches have been proposed and used in industry
  – E.g., COCOMO, SEER-SEM, SLIM, PRICE-S, Function Point Analysis
Problem and Solution
• These models are built on the assumptions of new-development projects
• These assumptions do not always hold in software maintenance because of the differences between new development and maintenance
  – The result is low estimation accuracy
• Solution: extend COCOMO II to support estimating maintenance projects
• Objective: improve estimation performance

COCOMO II for Maintenance
• An extension of COCOMO II
  – COCOMO is the most popular non-proprietary model
  – COCOMO has attracted many independent validations and extensions
• Designed to estimate the effort of a software release
• Has two components
  – Maintenance Sizing Model
  – Effort Model
• Supports the maintenance types
  – Enhancements
  – Error corrections

COCOMO II for Maintenance – Extensions
• Maintenance Sizing Model
  – Unites the Adaptation/Reuse and Maintenance models
  – Redefines the size parameters DM, CM, and IM
    • Uses deleted SLOC from modified modules
  – Provides a method to determine actual equivalent SLOC from code
• Effort Model
  – Excludes the RUSE and SCED cost drivers from the model
  – Revises the rating levels for the personnel attributes
  – Provides a reduced-parameter model
  – Provides a new set of rating scales for the cost drivers

Software Maintenance Sizing
• Size is a key determinant of effort
• The sizing method has to take into account the different types of code
[Figure: Types of code — preexisting code (reused modules, external modules, existing system modules) becomes delivered code (adapted modules, new modules, automatically translated modules), either manually developed and maintained or automatically translated]

Software Maintenance Sizing (cont'd)
• Computing equivalent SLOC (a sizing sketch follows below):
  – New modules: KSLOC_added
  – Adapted modules: EKSLOC_adapted = AKSLOC × AAM
    • AKSLOC: KSLOC of the adapted modules before changes
  – Reused modules: EKSLOC_reused = 0.3 × RKSLOC × IM_reused
    • RKSLOC: KSLOC of the reused modules
  – Total equivalent size: EKSLOC = KSLOC_added + EKSLOC_adapted + EKSLOC_reused
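As an illustration of the sizing arithmetic, here is a minimal Python sketch. It assumes the standard COCOMO II.2000 reuse parameters (AAF = 0.4·DM + 0.3·CM + 0.3·IM, plus AA, SU, and UNFM); since the extension redefines DM, CM, and IM, treat the AAM computation below as the baseline COCOMO II formula, not the extension's exact method. The function names and the percent units for IM are illustrative assumptions.

```python
def aam(dm, cm, im, aa=0.0, su=30.0, unfm=0.4):
    """Adaptation Adjustment Multiplier, per the COCOMO II.2000 reuse model.

    dm, cm, im: percent of design modified, code modified, and integration
    effort required. aa: assessment & assimilation increment (0-8);
    su: software understanding (10-50); unfm: programmer unfamiliarity (0-1).
    """
    aaf = 0.4 * dm + 0.3 * cm + 0.3 * im  # Adaptation Adjustment Factor
    if aaf <= 50:
        return (aa + aaf * (1 + 0.02 * su * unfm)) / 100.0
    return (aa + aaf + su * unfm) / 100.0


def equivalent_ksloc(ksloc_added, aksloc, rksloc, im_reused, **aam_params):
    """Total equivalent KSLOC following the maintenance sizing slide."""
    eksloc_adapted = aksloc * aam(**aam_params)         # adapted modules
    # The slide leaves IM's units implicit; percent is assumed here.
    eksloc_reused = 0.3 * rksloc * (im_reused / 100.0)  # reused modules
    return ksloc_added + eksloc_adapted + eksloc_reused
```

For example, `equivalent_ksloc(10, 50, 20, 30, dm=20, cm=30, im=40)` sizes a release that adds 10 KSLOC, adapts 50 KSLOC, and reuses 20 KSLOC.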
COCOMO Effort Model for Maintenance
• Uses the same non-linear COCOMO II form:

  PM = A × Size^(B + Σ_i SF_i) × Π_j EM_j

  where
  – PM: project effort, measured in person-months
  – A: a multiplicative constant, calibrated using the data sample
  – B: an exponent constant, calibrated using the data sample
  – Size: software size, measured in EKSLOC
  – EM: 15 effort multipliers, cost drivers that have a multiplicative effect on effort
  – SF: 5 scale factors, cost drivers that have an exponential effect on effort
• Linearizing the model using a log transformation:

  log(PM) = β_0 + β_1·log(Size) + Σ_i β_i·SF_i·log(Size) + Σ_j β_j·log(EM_j)
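The effort model itself is a one-liner; the sketch below (hypothetical function name) follows the form on the slide. COCOMO II.2000 weights the scale-factor sum by 0.01 in the exponent; the slide leaves that weighting implicit, so here it is assumed to be folded into the SF inputs.

```python
import math

def maintenance_effort_pm(size_eksloc, A, B, sf, em):
    """Effort in person-months: PM = A * Size^(B + sum(SF)) * prod(EM).

    sf: the 5 scale-factor values (exponential effect on effort),
        assumed pre-scaled by 0.01 as in COCOMO II.2000.
    em: the 15 effort-multiplier values (multiplicative effect on effort).
    """
    exponent = B + sum(sf)
    return A * size_eksloc ** exponent * math.prod(em)
```

With all effort multipliers at their nominal value of 1.0 and scale factors summing to zero, the estimate reduces to A × Size^B.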
Data Collection
• Delphi survey
  – Surveys experts about the rating scales of the cost drivers
• Sample data
  – Collects data on completed maintenance projects from industry
  – Follows inclusion criteria, e.g.,
    • Starting and ending dates are clear
    • Only major releases with equivalent SLOC of no less than 2,000 SLOC are included
    • Maintenance types: error corrections, enhancements
[Figure: Release timeline — maintenance project N+1 starts at Release N (Baseline 1) and ends at Release N+1 (Baseline 2); this release period defines one data point]

Calibration
• The process of fitting data to the model to adjust its parameters and constants
• Inputs to model calibration:
  – Initial rating scales for the cost drivers
  – A Delphi survey of 8 experts (expert-judgment estimates)
  – Sample data: 80 data points from 3 organizations
• Output: new rating scales for the cost drivers, and the model constants
• Calibration techniques (an OLS sketch follows this slide):
  – Ordinary Least Squares (OLS) regression
  – Bayesian analysis [Boehm 2000]
  – Constrained regression [Nguyen 2008]
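A minimal sketch of the OLS step, assuming NumPy and the log-linear form from the effort-model slide; the Bayesian and constrained-regression variants differ only in how the coefficient vector is estimated (priors on the coefficients, or constraints on the fitted errors). The function name and argument layout are illustrative.

```python
import numpy as np

def calibrate_ols(pm, size, sf, em):
    """Fit the log-linear COCOMO form with ordinary least squares.

    pm: (n,) actual efforts in person-months; size: (n,) EKSLOC;
    sf: (n, 5) scale-factor ratings; em: (n, 15) effort multipliers.
    Returns the coefficients (beta0, beta1, beta_SF..., beta_EM...).
    """
    log_size = np.log(size)
    X = np.column_stack([
        np.ones_like(log_size),   # intercept -> log(A)
        log_size,                 # -> B
        sf * log_size[:, None],   # SF_i * log(Size) terms
        np.log(em),               # log(EM_j) terms
    ])
    beta, *_ = np.linalg.lstsq(X, np.log(pm), rcond=None)
    return beta
```

Exponentiating the intercept recovers the multiplicative constant A; the coefficient on log(Size) is B.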
Data Collection Results
• Delphi survey results
  – 8 surveys collected from experts in the field
  – Considerable changes seen in the personnel factors

Productivity Ranges (PRs): Delphi results vs. COCOMO II.2000
Parameter | Delphi PR | COCOMO II.2000 PR | Difference
PMAT | 1.41 | 1.43 | -0.02
PREC | 1.31 | 1.33 | -0.02
TEAM | 1.30 | 1.29 | 0.01
FLEX | 1.26 | 1.26 | 0.00
RESL | 1.39 | 1.38 | 0.01
PCAP | 1.83 | 1.76 | 0.07
RELY | 1.22 | 1.24 | -0.02
CPLX | 2.29 | 2.38 | -0.09
TIME | 1.55 | 1.63 | -0.08
STOR | 1.35 | 1.46 | -0.11
ACAP | 1.77 | 2.00 | -0.23
PLEX | 1.45 | 1.40 | 0.05
LTEX | 1.46 | 1.43 | 0.03
DATA | 1.38 | 1.42 | -0.04
DOCU | 1.52 | 1.52 | 0.00
PVOL | 1.46 | 1.49 | -0.03
APEX | 1.60 | 1.51 | 0.09
PCON | 1.69 | 1.59 | 0.10
TOOL | 1.55 | 1.50 | 0.05
SITE | 1.56 | 1.53 | 0.04

Data Collection Results (cont'd)
• Sample data
  – 86 releases in 24 programs (6 releases are outliers)

Releases | Source
64 | A large US organization, member of the CSSE Affiliates
14 | A CMMI Level-5 company, Vietnam
8 | A CMMI Level-3 company, Thailand

Statistic | Size (EKSLOC) | Effort (PM) | Schedule (months)
Average | 64.1 | 115.2 | 10.5
Median | 39.6 | 58.7 | 10.2
Max | 473.4 | 1505.1 | 36.9
Min | 2.8 | 4.9 | 1.8

• Distribution of size metrics: ESLOC added 31.8%, ESLOC adapted 60.7%, ESLOC reused 7.5%
  – Equivalent SLOC differs from the SLOC of the delivered program

Data Collection Results (cont'd)
• Distribution of size and effort
[Figure: Scatter plots of PM vs. EKSLOC and log(PM) vs. log(EKSLOC) for the sample data]

Model Calibrations
• Full model calibrations
  – Apply Bayesian analysis and constrained regression
  – Use 80 data points (6 outliers eliminated)
• Local calibrations
  – Calibrate the model to individual organizations and programs
  – Use four approaches: productivity index, simple regression, Bayesian analysis, constrained regression

Full Model Calibrations
• Bayesian approach
  – The productivity ranges indicate that
    • ACAP is less influential than it is in COCOMO II.2000
    • CPLX is still the most influential
    • PCAP is more influential than ACAP

Differences in PRs between COCOMO II.2000 and COCOMO II for Maintenance
Parameter | COCOMO II Maintenance PR | COCOMO II.2000 PR | Difference
A | 3.16 | 2.94 | 0.22
B | 0.78 | 0.91 | -0.13
PMAT | 1.41 | 1.43 | -0.03
PREC | 1.31 | 1.33 | -0.02
TEAM | 1.29 | 1.29 | 0.01
FLEX | 1.26 | 1.26 | -0.01
RESL | 1.39 | 1.38 | 0.01
PCAP | 1.79 | 1.76 | 0.02
RELY | 1.22 | 1.24 | -0.02
CPLX | 2.22 | 2.38 | -0.16
TIME | 1.55 | 1.63 | -0.08
STOR | 1.35 | 1.46 | -0.11
ACAP | 1.61 | 2.00 | -0.39
PLEX | 1.44 | 1.40 | 0.04
LTEX | 1.46 | 1.43 | 0.03
DATA | 1.36 | 1.42 | -0.06
DOCU | 1.53 | 1.52 | 0.01
PVOL | 1.46 | 1.49 | -0.04
APEX | 1.58 | 1.51 | 0.08
PCON | 1.49 | 1.59 | -0.10
TOOL | 1.55 | 1.50 | 0.05
SITE | 1.53 | 1.53 | 0.01

Full Model Calibrations (cont'd)
• Estimation accuracies
  – COCOMO II.2000: the published model, used to estimate the 80 data points
  – COCOMO II for Maintenance: calibrated using the Bayesian and constrained-regression approaches
• COCOMO II for Maintenance outperforms COCOMO II.2000 by a wide margin

Model | MMRE | PRED(0.25) | PRED(0.30)
COCOMO II.2000 | 56% | 31% | 38%
COCOMO II for Maintenance: Bayesian | 48% | 41% | 51%
COCOMO II for Maintenance: CMRE | 37% | 56% | 60%
COCOMO II for Maintenance: CMSE | 39% | 43% | 51%
COCOMO II for Maintenance: CMAE | 42% | 54% | 58%

Three constrained-regression techniques:
  – CMRE: Constrained Minimum sum of Relative Errors
  – CMSE: Constrained Minimum sum of Square Errors
  – CMAE: Constrained Minimum sum of Absolute Errors

Local Calibration
• Local calibration can improve the performance of estimation models [Chulani 1999, Valerdi 2005]
• In local calibration, the model's constants A and B are estimated using local data sets
• Local calibration types
  – Organization-based
    • All data points of each organization are used to calibrate the model
    • 3 organizations, 80 releases
  – Program-based
    • All data points (releases) of each program are used
    • Only programs having 5 or more releases are included
    • 45 releases in 6 programs in total

Local Calibration (cont'd)
• Approaches compared (sketches of the first two follow below)
  – Productivity index
    • Uses the productivity of past projects to estimate the effort of the current project, given its size
    • The simplest and most widely used approach
  – Simple linear regression
    • Builds a simple regression model using log(PM) as the response and log(EKSLOC) as the predictor
    • A widely used estimation approach
  – COCOMO II for Maintenance: Bayesian analysis
  – COCOMO II for Maintenance: CMRE
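For concreteness, minimal sketches of the two baseline approaches, under the assumption that the productivity index is the ratio of total past size to total past effort; the function names are illustrative.

```python
import numpy as np

def productivity_index_estimate(past_pm, past_size, new_size):
    """Productivity-index baseline: effort = size / average past productivity."""
    productivity = np.sum(past_size) / np.sum(past_pm)  # EKSLOC per person-month
    return new_size / productivity

def loglinear_estimate(past_pm, past_size, new_size):
    """Simple linear regression of log(PM) on log(EKSLOC)."""
    b1, b0 = np.polyfit(np.log(past_size), np.log(past_pm), 1)  # slope, intercept
    return np.exp(b0) * new_size ** b1
```

Both baselines ignore the cost drivers entirely, which is why the calibrated COCOMO II for Maintenance model can outperform them.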
Local Calibration (cont'd)
• Organization-based calibration accuracies (80 data points):

Model | MMRE | PRED(0.25) | PRED(0.30)
Productivity index | 44% | 40% | 48%
Simple linear regression | 50% | 34% | 35%
COCOMO II for Maintenance: Bayesian | 38% | 54% | 59%
COCOMO II for Maintenance: CMRE | 34% | 62% | 64%

• Program-based calibration accuracies (45 data points):

Model | MMRE | PRED(0.25) | PRED(0.30)
Productivity index | 27% | 53% | 64%
Simple linear regression | 25% | 64% | 69%
COCOMO II for Maintenance: Bayesian | 22% | 71% | 80%
COCOMO II for Maintenance: CMRE | 21% | 72% | 79%

Conclusions
• A model for sizing maintenance and reuse is proposed
• A set of cost drivers, and the levels of their impact on maintenance cost, are derived
• Deleted SLOC is an important maintenance cost driver
• The extension performs better than the productivity index and simple linear regression
• Organization-based and program-based calibrations improve estimation accuracy
  – The best model generates estimates within 30% of the actuals 80% of the time

Threats to Validity
• Threats to internal validity
  – Unrecorded overtime is not included in the actual effort reported
  – Various counting tools were used in the US organization
  – Reliability of the data reported by the organizations
• Threats to external validity
  – Bias in the data set: data from the three organizations may not be representative of the general software industry
  – Bias in the selection of participants for the Delphi survey

Future Work
• Calibrate the model with more data points from industry
• Build domain-specific, language-specific, or platform-specific models
• Survey a more diverse group of experts, not only those who are familiar with COCOMO
• Extend the model to other types of maintenance
  – Reengineering, language and data migration, performance improvement, etc.
• Extend the model to support effort estimation of iterations in iterative development

Thank You

References – 1/2
Abran A., Silva I., Primera L. (2002), "Field studies using functional size measurement in building estimation models for software maintenance", Journal of Software Maintenance and Evolution, 14(1), pp. 31-64.
Abran A., St-Pierre D., Maya M., Desharnais J.M. (1998), "Full function points for embedded and real-time software", Proceedings of the UKSMA Fall Conference, London, UK.
Albrecht A.J. (1979), "Measuring Application Development Productivity", Proc. IBM Applications Development Symp., SHARE-Guide, pp. 83-92.
Basili V.R., Condon S.E., Emam K.E., Hendrick R.B., Melo W. (1997), "Characterizing and Modeling the Cost of Rework in a Library of Reusable Software Components", Proceedings of the 19th International Conference on Software Engineering, pp. 282-291.
Boehm B.W. (1981), "Software Engineering Economics", Prentice-Hall, Englewood Cliffs, NJ.
Boehm B.W. (1999), "Managing Software Productivity and Reuse", Computer, vol. 32, Sept., pp. 111-113.
Boehm B.W., Horowitz E., Madachy R., Reifer D., Clark B.K., Steece B., Brown A.W., Chulani S., Abts C. (2000), "Software Cost Estimation with COCOMO II", Prentice Hall.
Briand L.C., Basili V.R. (1992), "A Classification Procedure for an Effective Management of Changes during the Software Maintenance Process", Proc. ICSM '92, Orlando, FL.
Chulani S. (1999), "Bayesian Analysis of Software Cost and Quality Models", PhD Thesis, University of Southern California.
Port D., Nguyen V., Menzies T. (2009), "Studies of Confidence in Software Cost Estimation Research Based on the Criterions MMRE and PRED", submitted to the Journal of Empirical Software Engineering.
De Lucia A., Pompella E., Stefanucci S. (2003), "Assessing the maintenance processes of a software organization: an empirical analysis of a large industrial project", The Journal of Systems and Software, 65(2), pp. 87-103.
Erlikh L. (2000), "Leveraging legacy system dollars for E-business", IEEE IT Pro, May/June, pp. 17-23.
Gerlich R., Denskat U. (1994), "A Cost Estimation Model for Maintenance and High Reuse", Proceedings of ESCOM 1994, Ivrea, Italy.
IEEE (1998), IEEE Std. 1219-1998, Standard for Software Maintenance, IEEE Computer Society Press, Los Alamitos, CA.

References – 2/2
Jorgensen M. (1995), "Experience with the accuracy of software maintenance task effort prediction models", IEEE Transactions on Software Engineering, 21(8), pp. 674-681.
McKee J. (1984), "Maintenance as a function of design", Proceedings of the AFIPS National Computer Conference, pp. 187-193.
Moad J. (1990), "Maintaining the competitive edge", Datamation, pp. 61-62, 64, 66.
Niessink F., van Vliet H. (1998), "Two case studies in measuring maintenance effort", Proceedings of the International Conference on Software Maintenance, Bethesda, MD, USA, pp. 76-85.
Ramil J.F. (2003), "Continual Resource Estimation for Evolving Software", PhD Thesis, University of London, Imperial College of Science, Technology and Medicine.
Nguyen V., Deeds-Rubin S., Tan T., Boehm B.W. (2007), "A SLOC Counting Standard", 22nd International Annual Forum on COCOMO and Systems/Software Cost Modeling.
Nguyen V., Steece B., Boehm B.W. (2008), "A constrained regression technique for COCOMO calibration", Proceedings of the 2nd ACM-IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 213-222.
Nguyen V., Boehm B.W., Danphitsanuphan P. (2009), "Assessing and Estimating Corrective, Enhancive, and Reductive Maintenance Tasks: A Controlled Experiment", Proceedings of the 16th Asia-Pacific Software Engineering Conference (APSEC 2009), Dec.
Nguyen V., Boehm B.W., Danphitsanuphan P. (2010), "A Controlled Experiment in Assessing and Estimating Software Maintenance Tasks", APSEC Special Issue, Information and Software Technology Journal.
Sneed H.M. (1995), "Estimating the Costs of Software Maintenance Tasks", IEEE International Conference on Software Maintenance, pp. 168-181.
Rosencrance L. (2007), "Survey: Poor communication causes most IT project failures", Computerworld.
Selby R. (1988), "Empirically Analyzing Software Reuse in a Production Environment", in Software Reuse: Emerging Technology, W. Tracz (Ed.), IEEE Computer Society Press, pp. 176-189.
Sneed H.M. (2004), "A Cost Model for Software Maintenance & Evolution", IEEE International Conference on Software Maintenance, pp. 264-273.
Symons C.R. (1988), "Function Point Analysis: Difficulties and Improvements", IEEE Transactions on Software Engineering, 14(1), pp. 2-11.
Valerdi R. (2005), "The Constructive Systems Engineering Cost Model (COSYSMO)", PhD Thesis, University of Southern California.
Zelkowitz M.V., Shaw A.C., Gannon J.D. (1979), "Principles of Software Engineering and Design", Prentice-Hall.
(2005), "The Constructive Systems Engineering Cost Model (Cosysmo)", PhD Thesis, The University of Southern California. Zelkowitz M.V., Shaw A.C., Gannon J.D. (1979). “Principles of Software Engineering and Design”. Prentice-Hall © 2010, USC-CSSE 28 University of Southern California Center for Systems and Software Engineering Backup Slides © 2010, USC-CSSE 29 University of Southern California Center for Systems and Software Engineering Abbreviations COCOMO COCOMO II CMMI EM PM OLS MSE MAE CMSE CMAE CMRE MMRE MRE PRED ICM PR SF Constructive Cost Model Constructive Cost Model version II Capability Maturity Model Integration Effort Multiplier Person Month Ordinary Least Squares Mean Square Error Mean Absolute Error Constrained Minimum Sum of Square Errors Constrained Minimum Sum of Absolute Errors Constrained Minimum Sum of Relative Errors Mean of Magnitude of Relative Errors Magnitude of Relative Errors Prediction level Incremental Commitment Model Productivity Range Scale Factor © 2010, USC-CSSE 30 University of Southern California Center for Systems and Software Engineering Model Parameter Abbreviations AA AAF AAM AKSLOC CM DM EKSLOC ESLOC IM KSLOC RKSLOC SLOC SU UNFM Assessment and Assimilation Adaptation Adjustment Factor Adaptation Adjustment Multiplier Kilo Source Lines of Code of the Adapted Modules Code Modified Design Modified Equivalent Kilo Source Lines of Code Equivalent Source Lines of Code Integration Modified Kilo Source Lines of Code Kilo Source Lines of Code of the Reused Modules Source Lines of Code Software Understanding Programmer Unfamiliarity © 2010, USC-CSSE ACAP APEX CPLX DATA DOCU FLEX LTEX PCAP PCON PERS PLEX PMAT PREC PREX PVOL RELY RESL SITE STOR TEAM TIME TOOL Analyst Capability Applications Experience Product Complexity Database Size Documentation Match to Life-Cycle Needs Development Flexibility Language and Tool Experience Programmer Capability Personnel Continuity Personnel Capability Platform Experience Equivalent Process Maturity Level Precedentedness of Application Personnel Experience Platform Volatility Required Software Reliability Risk Resolution Multisite Development Main Storage Constraint Team Cohesion Execution Time Constraint Use of Software Tools 31 University of Southern California Center for Systems and Software Engineering Model Accuracy Measures • Magnitude of relative error (MRE) yi yˆi MREi yi • Mean of MRE (MMRE) 1 MMRE N N MRE i 1 i • Prediction Level: PRED(l) = k/N – k is the number of estimates with MRE ≤ l – Commonly used PRED(0.30) and PRED(0.25) © 2010, USC-CSSE 32