University of Southern California, Center for Systems and Software Engineering

Domain-Driven Software Cost Estimation

Wilson Rosa (Air Force Cost Analysis Agency)
Barry Boehm (USC)
Brad Clark (USC)
Thomas Tan (USC)
Ray Madachy (Naval Postgraduate School)

27th International Forum on COCOMO® and Systems/Software Cost Modeling
October 16, 2012

This material is based upon work supported, in whole or in part, by the U.S. Department of Defense through the Systems Engineering Research Center (SERC) under Contract H98230-08-D-0171. The SERC is a federally funded University Affiliated Research Center (UARC) managed by Stevens Institute of Technology, consisting of a collaborative network of over 20 universities. More information is available at www.SERCuarc.org

Research Objectives
• Make collected data useful to oversight and management entities
  – Provide guidance on how to condition data to address challenges
  – Segment data into different Application Domains and Operating Environments
  – Analyze data for simple Cost Estimating Relationships (CER) and Schedule-Cost Estimating Relationships (SCER) within each domain
  – Develop rules-of-thumb for missing data
[Diagram: Data Records for one Domain → Domain CER/SER Data Preparation and Analysis]
  Cost (Effort) = a * Size^b
  Schedule = a * Size^b * Staff^c

Stakeholder Community
• The research is collaborative across heterogeneous stakeholder communities that have helped us refine our data definition framework and taxonomy and have provided data and funding
[Diagram: Funding Sources and Data Sources]
• The project has evolved into a Joint Government Software Study

Topics
• Data Preparation Workflow
  – Data Segmentation
• Analysis Workflow
• Software Productivity Benchmarks
• Cost Estimating Relationships
• Schedule Estimating Relationships
• Conclusion
• Future Work

Data Preparation

Current Dataset
• Multiple data formats (SRDR, SEER, COCOMO)
• SRDR (377 records) + Other (143 records) = 522 total records

Software Resources Data Report: Final Developer Report - Sample
Page 1: Report Context, Project Description and Size

1. Report Context
  1. System/Element Name (version/release)
  2. Report As Of
  3. Authorizing Vehicle (MOU, contract/amendment, etc.)
  4. Reporting Event: Contract/Release End; Submission # (Supersedes #, if applicable)
  Description of Actual Development Organization
  5. Development Organization
  6. Certified CMM Level (or equivalent)
  7. Certification Date
  8. Lead Evaluator
  9. Affiliation
  10. Precedents (list up to five similar systems by the same organization or team)
  Comments on Part 1 responses

2. Product and Development Description
  1. Primary Application Type and Percent of Product Size
  2.-4. Additional application types and their percent of product size
  17. Primary Language Used
  18. Upgrade or New?
  Actual Development Process
  21. List COTS/GOTS Applications Used
  22. Peak staff (maximum team size in FTE) that worked on and charged to this project
  23. Percent of personnel that was: Highly experienced in domain ___%; Nominally experienced ___%; Entry level, no experience ___%
  Comments on Part 2 responses

3. Product Size Reporting (multiple sources provide actuals at final delivery)
  1. Number of Software Requirements, not including External Interface Requirements (unless noted in associated Data Dictionary)
  2. Number of External Interface Requirements (i.e., not under project control)
  3. Amount of Requirements Volatility encountered during development (1 = Very Low .. 5 = Very High)
  Code size measures for items 4 through 6: for each, indicate S for physical SLOC (carriage returns), Snc for noncomment SLOC only, LS for logical statements, or provide an abbreviation and explain it in the associated Data Dictionary
  4. Amount of New Code developed and delivered
  5. Amount of Modified Code developed and delivered
  6. Amount of Unmodified, Reused Code developed and delivered
  Comments on Part 3 responses

DD Form 2630-3, Page 1 of 2
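The Part 3 size fields above (new, modified, and unmodified/reused code) feed the equivalent-SLOC (ESLOC) sizes used throughout the later benchmark and CER slides. As a minimal sketch of that roll-up, assuming illustrative adaptation weights of 0.35 for modified and 0.05 for reused code (the weights and all field names below are assumptions for illustration, not values taken from this study), a record might be represented and converted like this:

```python
from dataclasses import dataclass

@dataclass
class SrdrRecord:
    """Illustrative subset of SRDR Part 2/3 fields; names are assumptions, not the form's."""
    new_sloc: float       # Part 3, item 4: new code developed and delivered
    modified_sloc: float  # Part 3, item 5: modified code developed and delivered
    reused_sloc: float    # Part 3, item 6: unmodified, reused code delivered
    effort_pm: float      # reported effort in person-months
    peak_staff: float     # Part 2, item 22: peak team size in FTE

    def esloc(self, mod_weight: float = 0.35, reuse_weight: float = 0.05) -> float:
        """Equivalent SLOC; the adaptation weights are assumed for illustration only."""
        return (self.new_sloc
                + mod_weight * self.modified_sloc
                + reuse_weight * self.reused_sloc)

# Example: a hypothetical record and its productivity in ESLOC per person-month.
rec = SrdrRecord(new_sloc=40_000, modified_sloc=10_000, reused_sloc=50_000,
                 effort_pm=350, peak_staff=12)
print(f"ESLOC = {rec.esloc():,.0f}  ->  {rec.esloc() / rec.effort_pm:.0f} ESLOC/PM")
```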
The Need for Data Preparation
• Issues found in the dataset:
  – Inadequate information on modified code (size provided)
  – Inadequate information on size change or growth
  – Size measured inconsistently
  – Inadequate information on average staffing or peak staffing
  – Inadequate information on personnel experience
  – Inaccurate effort data in multi-build components
  – Missing effort data
  – Replicated duration (start and end dates) across components
  – Inadequate information on schedule compression
  – Missing schedule data
  – No quality data

Data Preparation Workflow
Start with SRDR submissions → Inspect each Data Point → Determine Data Quality Levels → Correct Missing or Questionable Data (if no resolution, Exclude from Analysis) → Normalize Data → Segment Data

Segment Data by Operating Environments (OE)
Operating Environment | Examples
Ground Site (GS) - Fixed (GSF) | Command Post, Ground Operations Center, Ground Terminal, Test Facilities
Ground Site (GS) - Mobile (GSM) | Intelligence gathering stations mounted on vehicles, Mobile missile launcher
Ground Vehicle (GV) - Manned (GVM) | Tanks, Howitzers, Personnel carrier
Ground Vehicle (GV) - Unmanned (GVU) | Robots
Maritime Vessel (MV) - Manned (MVM) | Aircraft carriers, destroyers, supply ships, submarines
Maritime Vessel (MV) - Unmanned (MVU) | Mine hunting systems, Towed sonar array
Aerial Vehicle (AV) - Manned (AVM) | Fixed-wing aircraft, Helicopters
Aerial Vehicle (AV) - Unmanned (AVU) | Remotely piloted air vehicles
Ordnance Vehicle (OV) - Unmanned (OVU) | Air-to-air missiles, Air-to-ground missiles, Smart bombs, Strategic missiles
Space Vehicle (SV) - Manned (SVM) | Passenger vehicle, Cargo vehicle, Space station
Space Vehicle (SV) - Unmanned (SVU) | Orbiting satellites (weather, communications), Exploratory space vehicles

Segment Data by Productivity Type (PT)
• Different productivities have been observed for different software application types.
• The SRDR dataset was segmented into 14 productivity types to increase the accuracy of estimating cost and schedule:
  1. Sensor Control and Signal Processing (SCP)
  2. Vehicle Control (VC)
  3. Real Time Embedded (RTE)
  4. Vehicle Payload (VP)
  5. Mission Processing (MP)
  6. System Software (SS)
  7. Telecommunications (TEL)
  8. Process Control (PC)
  9. Scientific Systems (SCI)
  10. Planning Systems (PLN)
  11. Training (TRN)
  12. Test Software (TST)
  13. Software Tools (TUL)
  14. Intelligence & Information Systems (IIS)
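The two segmentation slides above amount to binning each record by its Operating Environment and Productivity Type. A minimal sketch of that step with invented records, in plain Python rather than whatever tooling the study actually used:

```python
from collections import defaultdict

# Invented records: (operating environment, productivity type, KESLOC, effort in PM).
# The OE and PT codes follow the taxonomy above; the numeric values are made up.
records = [
    ("GSM", "IIS", 45.0, 410.0),
    ("GSM", "IIS", 120.0, 1300.0),
    ("AVM", "MP", 30.0, 220.0),
    ("SVU", "VP", 15.0, 140.0),
]

# Segment the dataset into (OE, PT) bins, mirroring the segmentation described above.
segments = defaultdict(list)
for oe, pt, kesloc, pm in records:
    segments[(oe, pt)].append((kesloc, pm))

for (oe, pt), rows in sorted(segments.items()):
    print(f"{oe}/{pt}: {len(rows)} record(s)")
```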
Example: Finding Productivity Type
Finding the Productivity Type (PT) using the Aircraft MIL-STD-881 WBS: the highest-level element represents the environment. In the MAV (manned aerial vehicle, i.e., AVM) environment there are the Avionics subsystem, the Fire Control and Data Display and Controls sub-subsystems, and the sensor, navigation, air data, display, bombing computer, and safety domains. Each domain has an associated productivity type.

Env (Level 1) | Subsystem (Level 2) | Sub-subsystem (Level 3) | Domain (Level 4) | PT
MAV | Avionics | Fire Control | Search, target, tracking sensors | SCP
MAV | Avionics | Fire Control | Self-contained navigation | RTE
MAV | Avionics | Fire Control | Self-contained air data systems | RTE
MAV | Avionics | Fire Control | Displays, scopes, or sights | RTE
MAV | Avionics | Fire Control | Bombing computer | MP
MAV | Avionics | Fire Control | Safety devices | RTE
MAV | Avionics | Data Display and Controls | Multi-function display | RTE
MAV | Avionics | Data Display and Controls | Control display units | RTE
MAV | Avionics | Data Display and Controls | Display processors | MP
MAV | Avionics | Data Display and Controls | On-board mission planning | TRN

Operating Environment & Productivity Type
[Matrix: Productivity Types (SCP, VC, RTE, VP, MP, SS, TEL, PC, SCI, PLN, TRN, TST, TUL, IIS) vs. Operating Environments (GSF, GSM, GVM, GVU, MVM, MVU, AVM, AVU, OVU, SVM, SVU)]
When the dataset is segmented by Productivity Type and Operating Environment, the impacts accounted for by many COCOMO II model drivers are captured.

Data Analysis

Analysis Workflow
[Workflow: Prepared, Normalized & Segmented Data → Derive CER Model Form → Derive Final CER & reference data subset → Publish CER results; Publish Productivity Benchmarks by Productivity Type & Size Group; Derive SCER → Publish SCER]
CER: Cost Estimating Relationship
PR: Productivity Ratio
SER: Schedule Estimating Relationship
SCER: Schedule Compression / Expansion Relationship

Software Productivity Benchmarks
• Productivity-based CER
• Software productivity refers to the ability of an organization to generate outputs using the resources it currently has as inputs. Inputs typically include facilities, people, experience, processes, equipment, and tools. Outputs generated include software applications and the documentation used to describe them.
• The metric used to express software productivity is thousands of equivalent source lines of code (ESLOC) per person-month (PM) of effort. While many other measures exist, ESLOC/PM is used because most of the data collected by the Department of Defense (DoD) on past projects is captured with these two measures. While controversy exists over whether or not ESLOC/PM is a good measure, consistent use of this metric (see Metric Definitions) provides for meaningful comparisons of productivity.
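Given ESLOC and person-month actuals per record, the benchmark statistics on the next slides (min, mean, max, standard deviation, CV) are straightforward to compute. Below is a minimal sketch with invented productivity values, assuming CV is the coefficient of variation (standard deviation divided by the mean), which matches the figures in the tables:

```python
import statistics

# Invented per-record productivities (ESLOC/PM), grouped by productivity type.
productivity = {
    "IIS": [320.0, 415.0, 610.0, 505.0],
    "RTE": [95.0, 140.0, 180.0, 120.0],
}

# Benchmark statistics in the style of the tables that follow.
for pt, values in productivity.items():
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    print(f"{pt}: min={min(values):.0f}  mean={mean:.0f}  max={max(values):.0f}  "
          f"obs={len(values)}  std dev={stdev:.0f}  CV={stdev / mean:.0%}")
```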
Software Productivity Benchmarks
Benchmarks by PT, across all operating environments**

PT  | Min (ESLOC/PM) | Mean (ESLOC/PM) | Max (ESLOC/PM) | Obs. | Std. Dev. | CV  | KESLOC Min | KESLOC Max
SCP | 10             | 50              | 80              | 38   | 19        | 39% | 1          | 162
VP  | 28             | 82              | 202             | 16   | 43        | 52% | 5          | 120
RTE | 33             | 136             | 443             | 52   | 73        | 54% | 1          | 167
MP  | 34             | 189             | 717             | 47   | 110       | 58% | 1          | 207
SCI | 9              | 221             | 431             | 39   | 119       | 54% | 1          | 171
SYS | 61             | 225             | 421             | 60   | 78        | 35% | 2          | 215
IIS | 169            | 442             | 1039            | 36   | 192       | 43% | 1          | 180

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added

Software Productivity Benchmarks
Benchmarks by PT, Ground System Manned only

PT  | OE  | Min (ESLOC/PM) | Mean (ESLOC/PM) | Max (ESLOC/PM) | Obs. | Std. Dev. | CV  | KESLOC Min | KESLOC Max
SCP | GSM | —              | 56              | 80              | 13   | 17        | 30% | 1          | 76
RTE | GSM | 51             | 129             | 239             | 22   | 46        | 36% | 9          | 89
MP  | GSM | 87             | 162             | 243             | 6    | 52        | 32% | 15         | 91
SYS | GSM | 115            | 240             | 421             | 28   | 64        | 26% | 5          | 215
SCI | GSM | 9              | 243             | 410             | 24   | 108       | 44% | 5          | 171
IIS | GSM | 236            | 376             | 581             | 23   | 85        | 23% | 15         | 180

Preliminary Results – More Records to be added
CV: Coefficient of Variation (Std. Dev. / Mean); ESLOC: Equivalent SLOC; KESLOC: Equivalent SLOC in Thousands; MAD: Mean Absolute Deviation; MAX: Maximum; MIN: Minimum; PM: Effort in Person-Months; PRED: Prediction (Level); PT: Productivity Type; OE: Operating Environment; CERs: Cost Estimating Relationships

Cost Estimating Relationships
Preliminary Results – More Records to be added

CER Model Forms
• Effort = a * Size                    (production cost: cost per unit)
• Effort = a * Size + b
• Effort = a * Size^b + c              (b: scaling factor)
• Effort = a * ln(Size) + b
• Effort = a * Size^b * Duration^c
• Effort = a * Size^b * c1..cn         (% adjustment factors)
Log-log transform:  ln(Effort) = b0 + b1 * ln(Size) + b2 * ln(c1) + b3 * ln(c2) + …
Anti-log transform: Effort = e^b0 * Size^b1 * c1^b2 * c2^b3 * …

Software CERs by Productivity Type (PT)
CERs by PT, across all operating environments**

PT  | Equation Form              | Obs. | R² (adj) | MAD | PRED(30) | KESLOC Min | KESLOC Max
IIS | PM = 1.266 * KESLOC^1.179  | 37   | 90%      | 35% | 65       | 1          | 180
MP  | PM = 3.477 * KESLOC^1.172  | 48   | 88%      | 49% | 58       | 1          | 207
RTE | PM = 34.32 + KESLOC^1.515  | 52   | 68%      | 61% | 46       | 1          | 167
SCI | PM = 21.09 + KESLOC^1.356  | 39   | 61%      | 65% | 18       | 1          | 171
SCP | PM = 74.37 + KESLOC^1.714  | 36   | 67%      | 69% | 31       | 1          | 162
SYS | PM = 16.01 + KESLOC^1.369  | 60   | 85%      | 37% | 53       | 2          | 215
VP  | PM = 3.153 * KESLOC^1.382  | 16   | 86%      | 27% | 50       | 5          | 120

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added
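A minimal sketch of the log-log fit described on the CER Model Forms slide: ordinary least squares on ln(PM) versus ln(KESLOC), followed by the anti-log transform back to PM = a * KESLOC^b. The sample data is invented, numpy.polyfit stands in for whatever regression tool the study actually used, and the percentage interpretation of MAD is an assumption:

```python
import numpy as np

# Invented sample: size in KESLOC and effort in person-months for one domain.
kesloc = np.array([5.0, 12.0, 30.0, 60.0, 110.0, 180.0])
pm     = np.array([35.0, 90.0, 260.0, 580.0, 1150.0, 2050.0])

# Log-log transform: ln(PM) = b0 + b1 * ln(KESLOC), fit by ordinary least squares.
b1, b0 = np.polyfit(np.log(kesloc), np.log(pm), 1)

# Anti-log transform back to the multiplicative CER form: PM = a * KESLOC^b.
a, b = np.exp(b0), b1
predicted = a * kesloc ** b

# Accuracy measures in the style of the CER tables: MAD taken here as the mean
# absolute relative deviation (assumed), and PRED(30) as the share of estimates
# falling within 30% of the actuals.
rel_err = np.abs(predicted - pm) / pm
print(f"PM = {a:.3f} * KESLOC^{b:.3f}   "
      f"MAD = {rel_err.mean():.0%}   PRED(30) = {np.mean(rel_err <= 0.30):.0%}")
```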
Software CERs for Aerial Vehicle Manned (AVM)
CERs by Productivity Type, AVM only

PT  | OE  | Equation Form              | Obs. | R² (adj) | MAD | PRED(30) | KESLOC Min | KESLOC Max
MP  | AVM | PM = 3.098 * KESLOC^1.236  | 31   | 88%      | 50% | 59       | 1          | 207
RTE | AVM | PM = 5.611 * KESLOC^1.126  | 9    | 89%      | 50% | 33       | 1          | 167
SCP | AVM | PM = 115.8 + KESLOC^1.614  | 8    | 88%      | 27% | 62       | 6          | 162

Preliminary Results – More Records to be added

Software CERs for Manned Ground Systems
CERs by Productivity Type (GSM)

PT  | OE  | Equation Form                      | Obs. | R² (adj) | MAD | PRED(30) | KESLOC Min | KESLOC Max
IIS | GSM | PM = 30.83 + 1.381 * KESLOC^1.103  | 23   | —        | 16% | 91       | 15         | 180
MP  | GSM | PM = 3.201 * KESLOC^1.188          | 6    | 86%      | 24% | 83       | 15         | 91
RTE | GSM | PM = 84.42 + KESLOC^1.451          | 22   | —        | 24% | 73       | 9          | 89
SCI | GSM | PM = 34.26 + KESLOC^1.286          | 24   | —        | 37% | 56       | 5          | 171
SCP | GSM | PM = 135.5 + KESLOC^1.597          | 13   | —        | 39% | 31       | 1          | 76
SYS | GSM | PM = 20.86 + 2.347 * KESLOC^1.115  | 28   | —        | 19% | 82       | 5          | 215

Preliminary Results – More Records to be added

Software CERs for Space Vehicle Unmanned
CERs by Productivity Type (PT), SVU only

PT | OE  | Equation Form              | Obs. | R² (adj) | MAD | PRED(30) | KESLOC Min | KESLOC Max
VP | SVU | PM = 3.153 * KESLOC^1.382  | 16   | 86%      | 27% | 50       | 5          | 120

Preliminary Results – More Records to be added

Schedule Estimating Relationships
Preliminary Results – More Records to be added

Schedule Estimation Relationships (SERs)
• SERs by Productivity Type (PT), across operating environments**

PT  | Equation Form                               | Obs. | R² (adj) | MAD | PRED(30) | KESLOC Min | KESLOC Max
IIS | TDEV = 3.176 * KESLOC^0.7209 / FTE^0.4476   | 35   | 65       | 25  | 68       | 1          | 180
MP  | TDEV = 3.945 * KESLOC^0.968 / FTE^0.7505    | 43   | 77       | 39  | 52       | 1          | 207
RTE | TDEV = 11.69 * KESLOC^0.7982 / FTE^0.8256   | 49   | 70       | 36  | 55       | 1          | 167
SYS | TDEV = 5.781 * KESLOC^0.8272 / FTE^0.7682   | 56   | 71       | 27  | 62       | 2          | 215
SCP | TDEV = 34.76 * KESLOC^0.5309 / FTE^0.5799   | 35   | 62       | 26  | 64       | 1          | 165

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added

Size – People – Schedule Tradeoff

COCOMO 81 vs. New Schedule Equations
• Model Comparisons

PT  | Obs. | New Schedule Equation                        | COCOMO 81 Equation
IIS | 35   | TDEV = 3.176 * KESLOC^0.7209 * FTE^-0.4476   | TDEV = 2.5 * PM^0.38
MP  | 43   | TDEV = 3.945 * KESLOC^0.968 * FTE^-0.7505    | TDEV = 2.5 * PM^0.35
RTE | 49   | TDEV = 11.69 * KESLOC^0.7982 * FTE^-0.8256   | TDEV = 2.5 * PM^0.32
SYS | 56   | TDEV = 5.781 * KESLOC^0.8272 * FTE^-0.7682   | TDEV = 2.5 * PM^0.35
SCP | 35   | TDEV = 34.76 * KESLOC^0.5309 * FTE^-0.5799   | TDEV = 2.5 * PM^0.32

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft
Preliminary Results – More Records to be added

COCOMO 81 vs. New Schedule Equations
• Model Comparisons using PRED (30%)

PT  | Obs. | New Schedule Equations PRED(30) | COCOMO 81 Equations PRED(30)
IIS | 35   | 68                              | 28
MP  | 43   | 52                              | 23
RTE | 49   | 55                              | 16
SYS | 56   | 62                              | 5
SCP | 35   | 64                              | 8

Preliminary Results – More Records to be added
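The new schedule equations and the COCOMO 81 forms compared above are easy to exercise side by side. A minimal sketch using the published IIS coefficients, with invented size, staffing, and effort inputs, which also illustrates the size-people-schedule tradeoff (more FTEs, shorter computed duration):

```python
def tdev_new(kesloc: float, fte: float, a: float, b: float, c: float) -> float:
    """New SER form from the slides above: TDEV (months) = a * KESLOC^b / FTE^c."""
    return a * kesloc ** b / fte ** c

def tdev_cocomo81(pm: float, exponent: float) -> float:
    """COCOMO 81 schedule form from the comparison slide: TDEV = 2.5 * PM^exponent."""
    return 2.5 * pm ** exponent

# Published IIS coefficients; the 60 KESLOC and 500 PM inputs are invented.
for fte in (5, 10, 20):
    months = tdev_new(60.0, fte, a=3.176, b=0.7209, c=0.4476)
    print(f"IIS, 60 KESLOC, {fte:2d} FTE -> {months:5.1f} months")
print(f"COCOMO 81 (IIS exponent 0.38), 500 PM -> {tdev_cocomo81(500.0, 0.38):.1f} months")
```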
Conclusions

Conclusion
• Developing CERs and benchmarks by grouping appears to account for some of the variability in estimating relationships.
• Grouping software applications by Operating Environment and Productivity Type appears to have promise, but needs refinement.
• The analyses shown in this presentation are preliminary; more data is available for analysis, but it requires preparation first.

Future Work
• Productivity benchmarks need to be segregated by size groups.
• More data is available to fill in missing cells in the OE-PT table.
• Workshop recommendations will be implemented:
  – New data grouping strategy
• Data repository that provides drill-down to source data:
  – Presents the data to the analyst
  – If there is a question, it is possible to navigate to the source document, e.g., data collection form, project notes, EVM data, Gantt charts, etc.
• Final results will be published online: http://csse.usc.edu/afcaawiki