University of Southern California
Center for Systems and Software Engineering

COCOMO II Maintenance Model Upgrade
Vu Nguyen, Barry Boehm
Center for Systems and Software Engineering (CSSE)
CSSE Annual Research Review 2010
March 8, 2010
© 2010, USC-CSSE

Outline
• Motivation
• Research Problem and A Solution
• Major Software Maintenance Estimation Models
• COCOMO II for Maintenance
• Research Validation and Preliminary Results
• Next Steps
• Expected Main Contributions
• References

Magnitude of Software Maintenance
• The majority of software costs are incurred after the first operational release [Boehm 1981]

[Fig. 1. Software maintenance cost versus total software cost. Bar chart of the percentage of software cost (0 to 100) split between maintenance and other costs, as reported by Zelkowitz et al. (1979), McKee (1984), Moad (1990), and Erlikh (2000).]

Importance of Software Estimation in Managing Software Projects
• Estimation is a key success factor of software projects
  – Two of the three most-cited causes of project failure are related to resource estimation, according to a 2007 CompTIA survey [Rosencrance 2007]
• The cost estimate is key information for investment decisions, project planning, and project control
• Many software estimation approaches have been proposed and used in industry
  – E.g., COCOMO, SEER-SEM, SLIM, PRICE-S, Function Point Analysis

The Problem
• These models are built on the assumptions of new development projects
• These assumptions do not always hold in software maintenance, because maintenance differs from new development
  – The result is low estimation accuracy

A Solution
Improved COCOMO models that let estimators better determine the equivalent size of maintained software and estimate maintenance effort can improve the accuracy of software maintenance effort estimation.

Major Software Maintenance Estimation Models

Model           Key size metrics
COCOMO          Code added, modified; equivalent SLOC
KnowledgePlan   Code added, reused, deleted, changed
PRICE-S         Code added, adapted, reused
SEER-SEM        Effective SLOC or FP
SLIM            Code added and modified

• The models estimate cost for the maintenance phase, after the software is delivered
• All types of regular maintenance tasks are included
• Most of the models use SLOC as a size input
• Empirical evaluation of these models is lacking

Outline
• Motivation
• Research Problem and A Solution
• Major Software Maintenance Estimation Models
• COCOMO II for Maintenance
  – Modeling Process
  – A Software Maintenance Sizing Method
  – Effort Model
  – Calibration Techniques
• Research Validation and Preliminary Results
• Next Steps
• Expected Main Contributions
• References

The Modeling Process

[Fig. 2. The Modeling Process, adapted from the COCOMO modeling process [Boehm 2000]. Flowchart of eight numbered steps (with parallel steps 3A/3B and 5A/5B) covering: analyze existing literature; perform maintenance experiment to validate some size measures; perform behavioral analysis; determine sizing method for maintenance; identify relative significance of factors; determine form of effort model; perform expert-judgment and Delphi assessment; gather project data; test hypotheses about impact of parameters; calibrate model parameters; determine parameter variability; evaluate model performance.]

COCOMO II for Maintenance – A Glance
• An extension of COCOMO II
  – COCOMO is the most popular non-proprietary model
  – COCOMO has attracted many independent validations and extensions
• Two components
  – A Unified Reuse and Maintenance Model
    • Determines equivalent SLOC for reuse and maintenance
  – COCOMO Effort Model for Maintenance
    • Uses a different set of parameters and constants

Software Maintenance Sizing
• Size is a key determinant of effort
• The sizing method has to take into account different types of code

[Fig. 3. Types of Code. Diagram relating preexisting code (external modules, existing system modules) to delivered code (reused modules, adapted modules, new modules, automatically translated modules); code is either manually developed and maintained or automatically translated.]

A Unified Reuse and Maintenance Model – 1/3
• Objectives
  – Provide a consistent size measure for both reuse and maintenance
  – Better account for different types of code
• Changes
  – Use a single model for both reuse and maintenance
  – Redefine the reuse model equation and parameters to
    • account for code expansion
    • allow equivalent SLOC to be determined from completed code
    • include deleted SLOC in modified modules, but exclude SLOC in deleted modules
    • smooth the curve representing nonlinear effects

A Unified Reuse and Maintenance Model – 2/3
• New AAF and AAM equations (major changes)

\[ AAF = 0.4 \cdot DM + CM + 0.3 \cdot IM \]

\[ AAM = \begin{cases} \dfrac{AA + AAF + \left[ 1 - \left( 1 - \dfrac{AAF}{100} \right)^{2} \right] \cdot SU \cdot UNFM}{100} & \text{if } AAF \le 100 \\[2ex] \dfrac{AA + AAF + SU \cdot UNFM}{100} & \text{if } AAF > 100 \end{cases} \]

AAF – Adaptation Adjustment Factor (% of modification)
AAM – Adaptation Adjustment Multiplier
DM – Percentage of Design Modified, accounting only for design changes made to the preexisting modules
IM – Percentage of Integration changed, relative to the integration of the preexisting modules
CM – Percentage of Code Modified, including code added, modified, and deleted

A Unified Reuse and Maintenance Model – 3/3
• Compute equivalent SLOC (ESLOC):
  – New modules: \( KSLOC_{added} \)
  – Adapted modules: \( EKSLOC_{adapted} = AKSLOC \cdot AAM \), where AKSLOC is the KSLOC of the adapted modules before changes
  – Reused modules: \( EKSLOC_{reused} = 0.3 \cdot RKSLOC \cdot IM_{reused} \), where RKSLOC is the KSLOC of the reused modules
  – Total equivalent KSLOC: \( EKSLOC = KSLOC_{added} + EKSLOC_{adapted} + EKSLOC_{reused} \)

COCOMO Effort Model for Maintenance
• Follows the same COCOMO II non-linear form

\[ PM = A \cdot Size^{\,B + \sum_i SF_i} \cdot \prod_j EM_j \]

Where
PM – project effort measured in person-months
A – a multiplicative constant, calibrated using the data sample
B – an exponent constant, calibrated using the data sample
Size – software size measured in SLOC
EM – effort multipliers, cost drivers that have a multiplicative effect on effort
SF – scale factors, cost drivers that have an exponential effect on effort

• Linearize the model using a log transformation

\[ \log(PM) = \beta_0 + \beta_1 \log(Size) + \sum_i \beta_i \, SF_i \log(Size) + \sum_j \beta_j \log(EM_j) \]

Calibration
• The process of fitting the model to data in order to adjust its parameters and constants

[Fig. 4. Calibration Process. Inputs: rating scales for cost drivers, a Delphi survey of experts (expert-judgment estimates), and sample data. Model calibration produces new rating scales for cost drivers and new constants.]

• Calibration techniques
  – Ordinary Least Squares (OLS) regression
  – Bayesian analysis
  – Constrained regression technique [Nguyen 2008]

Data Collection – 1/2
• Collect data on completed maintenance projects from industry
  – Maintenance types: error corrections, enhancements, etc.; reengineering and language-migration projects are excluded
• The CodeCount tool (UCC) is used for size collection [Nguyen 2007]

[Fig. 5. Release Period to be Collected. Timeline from Release N (Baseline 1), where the project for Release N+1 starts, to Release N+1 (Baseline 2), where the project for Release N+2 starts; maintenance project N+1 spans the period between the two baselines.]

Data Collection – 2/2
• Metrics

Metric             Description
Product            Name of the software product.
Release            Release number. A software product has multiple releases, each associated with a maintenance project.
Effort             Total time in person-months for the maintenance project delivering the release.
SLOC adapted       Sum of SLOC added, modified, and deleted in adapted modules.
SLOC pre-adapted   SLOC count of the preexisting modules to be adapted.
SLOC added         SLOC count of new modules.
SLOC reused        SLOC count of reused modules.
CM                 The percentage of code added, modified, and deleted.
DM                 The percentage of design modified.
IM                 The percentage of integration and test needed for the preexisting modules.
SU                 Software Understanding.
UNFM               Programmer Unfamiliarity.
Cost drivers       Rating levels for 22 cost drivers.

Current Status
• Completed the controlled experiment
• Collected 83 projects/releases
  – 64 from a large organization that is a member of CSSE Affiliates
  – 14 from a CMMI Level 5 company in Vietnam
  – 4 from a CMMI Level 3 company in Thailand
• Generated preliminary results using Bayesian and constrained regression techniques
  – Used cost-driver rating scales from the Delphi exercise for COCOMO II.2000 (Delphi COCOMOII.2000)
• The Delphi survey has yet to be completed

Preliminary Results – 1/2
• Relative impact of cost drivers on effort
  – Rating scales: Delphi COCOMOII.2000

[Fig. 6. Productivity Ranges. Two bar charts of the productivity ranges of the 22 cost drivers, on a scale from 1.00 to 2.50: the COCOMO II.2000 values [Boehm 2000] (sample data: 161 projects) and the values generated by COCOMO II for Maintenance using the Bayesian analysis (sample data: 83 projects).]

Preliminary Results – 2/2
• Estimation accuracies
  – Estimated 83 projects using
    • COCOMO II.2000
    • COCOMO II for Maintenance: Bayesian analysis
    • COCOMO II for Maintenance: Constrained regression
  – Rating scales: Delphi COCOMOII.2000
  – Computed MMRE and PRED(0.3) values:

Model                                                         MMRE   PRED(0.3)
COCOMO II.2000                                                64%    34%
COCOMO II for Maintenance: Bayesian                           51%    39%
COCOMO II for Maintenance: Constrained regression (CMRE)      48%    46%

Next Steps
• Perform the Delphi survey to obtain expert-judgment rating scales for maintenance
• Continue data collection
  – COCOMO data for both new development and maintenance projects
• Validate the research hypotheses
• Analyze and validate the models

Expected Main Contributions
• A model for sizing maintenance and reuse
• An extended COCOMO model for maintenance
• A set of cost drivers and the levels of their impact on maintenance cost
• Empirical validation of the impact of cost drivers on software maintenance

References
Boehm B.W. (1981). "Software Engineering Economics". Prentice-Hall, Englewood Cliffs, NJ.
Erlikh L. (2000). "Leveraging legacy system dollars for E-business". (IEEE) IT Pro, May/June, 17-23.
McKee J. (1984). "Maintenance as a function of design". Proceedings of the AFIPS National Computer Conference, 187-193.
Moad J. (1990). "Maintaining the competitive edge". Datamation 61-62, 64, 66.
Nguyen V., Deeds-Rubin S., Tan T., Boehm B.W. (2007). "A SLOC Counting Standard". The 22nd International Annual Forum on COCOMO and Systems/Software Cost Modeling.
Nguyen V., Steece B., Boehm B.W. (2008). "A constrained regression technique for COCOMO calibration". Proceedings of the 2nd ACM-IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 213-222.
Rosencrance L. (2007). "Survey: Poor communication causes most IT project failures". Computerworld.
Zelkowitz M.V., Shaw A.C., Gannon J.D. (1979). "Principles of Software Engineering and Design". Prentice-Hall.

Backup Slides
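
Worked sketch: equivalent size
The sizing equations of the Unified Reuse and Maintenance Model can be illustrated with a short Python sketch. It assumes the AAF and AAM forms as read from the slides (AAF = 0.4*DM + CM + 0.3*IM, with the AAM curve switching at AAF = 100). The function names and all input values are hypothetical. Treating IM for reused modules as a percentage, and therefore dividing by 100, is an assumption that the slides do not state.

```python
def adaptation_adjustment_factor(dm, cm, im):
    """AAF in percent, from DM, CM, and IM given in percent."""
    return 0.4 * dm + cm + 0.3 * im


def adaptation_adjustment_multiplier(aa, aaf, su, unfm):
    """AAM as a fraction of the adapted size.

    Below 100% modification, the SU * UNFM penalty is phased in along a
    smooth quadratic curve; above 100% it applies in full.
    """
    if aaf <= 100:
        smoothing = 1 - (1 - aaf / 100) ** 2
        return (aa + aaf + smoothing * su * unfm) / 100
    return (aa + aaf + su * unfm) / 100


def equivalent_ksloc(ksloc_added, aksloc, aam, rksloc, im_reused):
    """Total equivalent KSLOC for new, adapted, and reused modules.

    Assumption: im_reused is a percentage, hence the division by 100.
    """
    adapted = aksloc * aam
    reused = 0.3 * rksloc * im_reused / 100
    return ksloc_added + adapted + reused


# Hypothetical release: 10% design modified, 20% code modified, 30% integration.
aaf = adaptation_adjustment_factor(dm=10, cm=20, im=30)
aam = adaptation_adjustment_multiplier(aa=2, aaf=aaf, su=30, unfm=0.4)
total = equivalent_ksloc(ksloc_added=5, aksloc=20, aam=aam, rksloc=10, im_reused=20)
```

Both AAM branches give the same value at AAF = 100, so the curve is continuous at the switch point, which is one way to read "smooth the curve representing nonlinear effects".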
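
Worked sketch: effort model and its log-linear form
The following Python sketch evaluates the non-linear effort form and checks that the log-transformed form gives the same effort. It is illustrative only: the constants, scale-factor values, and effort multipliers are hypothetical placeholders, not calibrated values from this work. The scale-factor values are assumed to be already weighted so that they can be added directly to the exponent B, and all regression coefficients other than log(A) and B are fixed at 1.

```python
import math


def effort_person_months(a, b, size, scale_factors, effort_multipliers):
    """Non-linear form: PM = A * Size^(B + sum(SF)) * product(EM)."""
    exponent = b + sum(scale_factors)
    product = 1.0
    for em in effort_multipliers:
        product *= em
    return a * size ** exponent * product


def log_effort(a, b, size, scale_factors, effort_multipliers):
    """Log-linear form of the same model.

    log(PM) = log(A) + B*log(Size) + sum(SF_i * log(Size)) + sum(log(EM_j))
    """
    log_size = math.log(size)
    return (
        math.log(a)
        + b * log_size
        + sum(sf * log_size for sf in scale_factors)
        + sum(math.log(em) for em in effort_multipliers)
    )


# Hypothetical inputs, chosen only to exercise the two forms.
args = dict(a=3.0, b=1.0, size=10.0,
            scale_factors=[0.02, 0.03], effort_multipliers=[1.1, 0.9])
pm_direct = effort_person_months(**args)
pm_from_log = math.exp(log_effort(**args))
```

Because the two forms agree, the constants can be estimated with linear regression on the transformed data, which is what makes the calibration techniques listed in the deck applicable.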
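
Worked sketch: OLS calibration on log-transformed data
As a rough illustration of the OLS calibration technique, the sketch below fits only the two constants A and B from size and effort data, ignoring scale factors and effort multipliers. This is a deliberately reduced model, not the calibration procedure used in this work; the function name and the synthetic data are hypothetical.

```python
import math


def fit_constants_ols(sizes, efforts):
    """Fit log(PM) = beta0 + beta1 * log(Size) by ordinary least squares.

    Returns (A, B), where A = exp(beta0) and B = beta1.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return math.exp(beta0), beta1


# Synthetic, noise-free data generated from PM = 3 * Size^1.1.
sizes = [5.0, 10.0, 20.0, 40.0]
efforts = [3.0 * s ** 1.1 for s in sizes]
a_hat, b_hat = fit_constants_ols(sizes, efforts)
```

With noise-free data the fit recovers the generating constants; real project data would leave residuals, which is where accuracy measures such as MMRE and PRED come in.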
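
Worked sketch: MMRE and PRED(0.3)
The accuracy measures reported in the preliminary results can be computed as below, using their standard definitions: MMRE is the mean of the magnitudes of relative error, and PRED(l) is the fraction of projects whose relative error is at most l. The effort figures are made up for illustration and are not taken from the 83-project data set.

```python
def magnitude_of_relative_error(actual, estimate):
    """MRE = |actual - estimate| / actual."""
    return abs(actual - estimate) / actual


def mmre(actuals, estimates):
    """Mean magnitude of relative error over all projects."""
    errors = [magnitude_of_relative_error(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals, estimates, level=0.3):
    """Fraction of projects estimated within `level` of the actual effort."""
    errors = [magnitude_of_relative_error(a, e) for a, e in zip(actuals, estimates)]
    return sum(1 for err in errors if err <= level) / len(errors)


# Hypothetical actual and estimated efforts in person-months.
actuals = [100.0, 200.0, 50.0, 80.0]
estimates = [110.0, 150.0, 50.0, 120.0]
mmre_value = mmre(actuals, estimates)
pred_value = pred(actuals, estimates, level=0.3)
```

A lower MMRE and a higher PRED(0.3) both indicate better accuracy, which is the direction of change shown in the results table when moving from COCOMO II.2000 to the maintenance calibrations.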