Early Phase Software Cost and Schedule Estimation Models
Presenter: Wilson Rosa
Co-authors: Barry Boehm, Ray Madachy, Brad Clark, Cheryl Jones, John McGarry
November 16, 2015

Outline
• Introduction
• Experimental Design
• Data Analysis
• Descriptive Statistics
• Effort Models
• Schedule Models
• Conclusion

Introduction

Problem Statement
• Software cost estimates are most useful at the early elaboration phase, when source lines of code (SLOC) and Function Point Analysis (FPA) counts are not yet available.
• Mainstream software cost models do not provide alternative functional size measures for early-phase estimation.

Significance of Proposed Study
This study remedies these limitations in three ways:
1. Introduce effort and schedule estimating models for software development projects at the early elaboration phase.
2. Perform statistical analysis on parameters that are available to analysts at the early elaboration phase, such as:
   – Estimated functional requirements
   – Estimated peak staff
   – Estimated effort
3. Measure the direct effect of functional requirements on software development effort.

Research Questions
Question 1: Do estimated requirements relate to actual effort?
Question 2: Do estimated requirements, together with estimated peak staff, relate to actual effort?
Question 3: Does estimated effort relate to actual development duration?
Question 4: Are estimating models based on estimated size more accurate than those based on final size?
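Questions 1 through 3 come down to fitting and testing a power-law (log-log) regression between an early-phase input and an actual outcome, which is the model form used throughout the Effort Models section. A minimal sketch of that workflow, using hypothetical illustrative values (not the study's DoD dataset):

```python
import numpy as np

# Hypothetical (estimated requirements, actual person-months) pairs --
# illustrative numbers only, not the study's 40-project DoD dataset.
ereq = np.array([50, 120, 300, 800, 2000, 5000], dtype=float)
pm = np.array([140, 250, 480, 900, 1800, 3500], dtype=float)

# Fit PM = a * eREQ^b by ordinary least squares in log-log space.
# np.polyfit with degree 1 returns [slope, intercept].
b, log_a = np.polyfit(np.log(ereq), np.log(pm), 1)
a = float(np.exp(log_a))

# Goodness of fit (R^2) computed in log space, where the model is linear.
resid = np.log(pm) - (log_a + b * np.log(ereq))
r2 = 1.0 - np.sum(resid**2) / np.sum((np.log(pm) - np.log(pm).mean()) ** 2)

print(f"PM ~= {a:.2f} * eREQ^{b:.4f}  (R^2 = {r2:.3f})")
```

A high R² on such a fit is what would support an affirmative answer to Question 1; Question 4 would repeat the fit with estimated vs. final size and compare the accuracy measures.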
Experimental Design

Quantitative Method
• A non-random sample was used: NCCA had access to names in the population, and participants were selected based on convenience and availability (see next slide).
• This study focused on projects reported at the total level rather than by CSCI, since requirements counts at the elaboration phase are provided at the aggregate level.
• To minimize threats to validity, the analysis framework focused on estimated inputs rather than final inputs.

Instrumentation
• Questionnaire: Software Resource Data Report (SRDR), DD Form 2630
• Source: Cost Assessment Data Enterprise (CADE) website:
  http://cade.osd.mil/Files/Policy/Initial_Developer_Report.xlsx
  http://cade.osd.mil/Files/Policy/Final_Developer_Report.xlsx
• Content: allows for the collection of project context, company information, requirements, product size, effort, schedule, and quality

Sample and Population
• Empirical data from 40 very recent US DoD programs extracted from the Cost Assessment Data Enterprise: http://dcarc.cape.osd.mil/Default.aspx
• Each program submitted an SRDR Initial Developer Report (estimates) and an SRDR Final Developer Report (actuals).

Operating Environments
[Bar chart: number of projects by operating environment — C4I, Aircraft, Missile, Automated Information System, Enterprise Resource Planning, and Unmanned Aircraft — grouped into Major Automated Information Systems (AIS) and Major Defense Systems.]

Delivery Year
[Bar chart: number of projects by delivery year, 2006 through 2014.]
Projects were completed during the time period from 2006 to 2014.

Model Reliability and Validity
Accuracy of the models was verified using five different measures:

Measure | Symbol | Description
Coefficient of Variation | CV | Percentage expression of the standard error compared to the mean of the dependent variable; a relative measure allowing direct comparison among models.
P-value | α | Level of statistical significance, established through the coefficient alpha (p ≤ α).
Variance Inflation Factor | VIF | Indicates whether multicollinearity (correlation among predictors) is present in a multiple-regression analysis.
Coefficient of Determination | R² | Shows how much of the variation in the dependent variable is explained by the regression equation.
F-test | F | The value of the F-test is the square of the equivalent t-test; the bigger it is, the smaller the probability that the difference could occur by chance.

Data Analysis

Pairwise Correlation Analysis
• Variable selection was based on pairwise correlation.
  – Pairwise correlation was chosen over structural equation modeling because the number of observations (40) was far below the minimum (200) needed.
  – Variables examined: Actual Effort, Estimated Total Requirements, Actual Duration, Actual Total Requirements, Estimated New Requirements, Actual New Requirements, Estimated Peak Staff, Actual Peak Staff, Requirements Volatility (RVOL), Scope, Estimated Effort

Pairwise correlation matrix (REQ = requirements; Est = estimated, Act = actual):

                     | Act Effort | Act Duration | Est Tot REQ | Act Tot REQ | Est New REQ | Act New REQ | Est Effort | Act Peak Staff | Est Peak Staff
Actual Effort        |  1.0 |  0.6 | 0.7 | 0.7 | 0.7 | 0.5 | 0.6 |  0.4 |  0.4
Actual Duration      |  0.6 |  1.0 | 0.4 | 0.4 | 0.5 | 0.3 | 0.2 | -0.2 | -0.2
Estimated Total REQ  |  0.7 |  0.4 | 1.0 | 0.9 | 0.9 | 0.7 | 0.6 |  0.2 |  0.2
Actual Total REQ     |  0.7 |  0.4 | 0.9 | 1.0 | 0.8 | 0.8 | 0.6 |  0.3 |  0.3
Estimated New REQ    |  0.7 |  0.5 | 0.9 | 0.8 | 1.0 | 0.9 | 0.7 |  0.2 |  0.2
Actual New REQ       |  0.5 |  0.3 | 0.7 | 0.8 | 0.9 | 1.0 | 0.5 |  0.5 |  0.4
Estimated Effort     |  0.6 |  0.2 | 0.6 | 0.6 | 0.7 | 0.5 | 1.0 |  0.6 |  0.6
Actual Peak Staff    |  0.4 | -0.2 | 0.2 | 0.3 | 0.2 | 0.5 | 0.6 |  1.0 |  1.0
Estimated Peak Staff |  0.4 | -0.2 | 0.2 | 0.3 | 0.2 | 0.4 | 0.6 |  1.0 |  1.0
RVOL                 |  0.1 |  0.1 | 0.0 | 0.0 | 0.5 | 0.2 | 0.1 |  0.1 |  0.1
Scope                |  0.2 | -0.1 | 0.1 | 0.1 | 0.4 | 0.3 | 0.1 |  0.4 |  0.4

[Legend: shading in the original distinguishes strong, moderate, and weak correlations.]

Findings:
• Estimated requirements should be considered in the effort model, as they are strongly correlated with actual effort.
• Estimated peak staff should also be considered in the effort model, as it is correlated with actual effort.
• Although estimated effort is only weakly correlated with actual duration, it was still chosen, based on past literature.

Descriptive Statistics

Project Size Boxplot
[Boxplot: functional requirements by mission area; medians 906 (IT) and 2001 (Defense).]
Observation: higher requirements counts for defense projects.

Project Duration Boxplot
[Boxplot: development duration in months by mission area; medians 21 months (IT) and 39 months (Defense).]
Observation: longer duration for defense systems, due to interdependencies with hardware design and platform integration schedules.

Productivity Boxplot: New vs. Enhancement
[Boxplot: actual hours per estimated requirement, by scope (enhancement vs. new).]
Observation: no significant difference between new and enhancement projects.

Productivity Boxplot: IT vs. Defense Projects
[Boxplot: actual hours per estimated requirement for ERP and AIS (IT projects) and Defense; medians 192 (ERP), 141 (AIS), and 154 (Defense).]
Observation: ERP shows higher hours per requirement due to challenges with customizing SAP/Oracle.

Effort Growth Boxplot: Contract Award to End
[Boxplot: percent effort growth from contract award to end for ERP and AIS (IT projects) and Defense; medians 22% (ERP), 15% (AIS), and 37% (Defense).]
Observation: no significant difference between Defense and IT projects.

Effort Models

Effort Model Variables
Variable | Type | Definition
Actual Effort | Dependent | Actual software engineering effort (in person-months).
Actual Total Requirements | Independent | Total requirements captured in the Software Requirements Specification (SRS); the final total requirements at end of contract.
Estimated Total Requirements | Independent | Total requirements captured in the SRS; the estimated total requirements at contract award.
Actual Peak Staff | Independent | Actual peak team size, measured in full-time-equivalent staff; includes direct labor only.
Estimated Peak Staff | Independent | Estimated peak team size at contract award, measured in full-time-equivalent staff; includes direct labor only.

Effort Model 1: using Estimated REQ
Equation: PM = 22.37 × eREQ^0.5862
where PM = actual effort (in person-months) and eREQ = estimated total requirements.
Fit statistics: N = 40, R² = 76, CV = 64, Mean = 1739, MAD = 58, REQ Min = 25, REQ Max = 13900.
Coefficients: intercept 22.37 (t = 1.83); eREQ exponent 0.5862 (t = 7.39).
[Scatter plot: actual vs. predicted effort, unit space.]

Effort Model 2: using Actual REQ
Equation: PM = 29.08 × aREQ^0.5456
where PM = actual effort (in person-months) and aREQ = actual total requirements.
Fit statistics: N = 40, R² = 74, CV = 54, Mean = 1739, MAD = 55, REQ Min = 35, REQ Max = 12716.
Coefficients: intercept 29.08 (t = 1.75); aREQ exponent 0.5456 (t = 6.60).
[Scatter plot: actual vs. predicted effort, unit space.]

Effort Model 3: using Estimated REQ and Staff
Equation: PM = 11.82 × eREQ^0.4347 × eStaff^0.4269
where PM = actual effort (in person-months), eREQ = estimated total requirements, and eStaff = estimated peak staff.
Fit statistics: N = 40, R² = 78, CV = 54, Mean = 1739, MAD = 47, REQ Min = 25, REQ Max = 13900.
Coefficients: intercept 11.82 (t = 1.88); eREQ exponent 0.4347 (t = 4.71); eStaff exponent 0.4269 (t = 3.54).
[Scatter plot: actual vs. predicted effort, unit space.]
Effort Model 4: using Actual REQ and Staff
Equation: PM = 17.01 × aREQ^0.3006 × aStaff^0.5124
where PM = actual effort (in person-months), aREQ = actual total requirements, and aStaff = actual peak staff.
Fit statistics: N = 40, R² = 66, CV = 50, Mean = 1739, MAD = 57, REQ Min = 35, REQ Max = 12716.
Coefficients: intercept 17.01 (t = 5.89); aREQ exponent 0.3006 (t = 3.38); aStaff exponent 0.5124 (t = 4.29).
[Scatter plot: actual vs. predicted effort, unit space.]

Schedule Models

Schedule Model Variables
Variable | Type | Definition
Actual Duration | Dependent | Actual software engineering duration (in months), from software requirements analysis through final qualification test.
Actual Effort | Independent | Actual software engineering effort at the end of the contract.
Estimated Effort | Independent | Estimated software engineering effort at contract award.

Schedule Model 1: using Estimated Effort
Equation: TDEV = ePM^0.5290
where TDEV = actual duration (in months) and ePM = estimated effort (in person-months).
Fit statistics: N = 40, R² = 94, CV = 60, Mean = 38, F-stat = 683, PM Min = 17, PM Max = 7132.
Coefficient: ePM exponent 0.5290 (t = 26.14, p = 0.0000).
[Scatter plot: actual vs. predicted duration, unit space.]

Schedule Model 2: using Actual Effort
Equation: TDEV = aPM^0.5051
where TDEV = actual duration (in months) and aPM = actual effort (in person-months).
Fit statistics: N = 40, R² = 95, CV = 48, Mean = 38, F-stat = 887, PM Min = 27, PM Max = 14819.
[Scatter plot: actual vs. predicted duration, unit space.]

Conclusion

Primary Findings
• Estimated functional requirements are a significant contributor to software development effort.
• The variation in effort explained becomes more significant when estimated peak staff is added to the effort model. Thus, the effect of estimated functional requirements on effort should be interpreted together with estimated peak staff.
• Estimated effort is a significant contributor to development duration.

Model Usefulness
• Effort models based on estimated requirements and estimated peak staff are more appropriate at the early elaboration phase.
• Effort models based on final requirements and final peak staff are more appropriate after Critical Design Review, once requirements have stabilized.
• Productivity boxplots (effort per requirement) are useful for cross-checking estimates at Preliminary Design Review.
• The models are appropriate for both Defense and IT projects.

Study Limitations
• Since the data was collected at the aggregate level, the estimation models are not appropriate for projects reported at the CSCI level.
• Do not use Effort Models 1 through 4 if your input parameter is outside of the effort model range.
• Do not use Schedule Models 1 and 2 if your input parameter is outside of the schedule model range.

Future Work
• Develop similar effort and schedule estimation models using data reported at the CSCI level.
• Build effort models using functional requirements along with other cost drivers, such as:
  – Complexity/application domain
  – Percent reuse
  – Requirements volatility
  – Process maturity

Acknowledgement
• Dr. Shu-Ping Hu, Tecolote Research
• Dr. Brian Flynn, Technomics
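As a closing illustration, the published coefficients above can be turned into a small early-phase calculator. This sketch uses Effort Model 1 (PM = 22.37 × eREQ^0.5862) and Schedule Model 1 (TDEV = ePM^0.5290), and enforces the calibrated input ranges from the Study Limitations; the function names and the example input (the IT median of 906 requirements from the boxplot) are illustrative choices, not part of the study.

```python
def estimate_effort_pm(ereq: float) -> float:
    """Effort Model 1: PM = 22.37 * eREQ^0.5862 (estimated total requirements)."""
    # Per the study limitations, the model is calibrated for REQ in [25, 13900].
    if not 25 <= ereq <= 13900:
        raise ValueError("eREQ outside the effort model's calibrated range (25-13900)")
    return 22.37 * ereq ** 0.5862


def estimate_duration_months(epm: float) -> float:
    """Schedule Model 1: TDEV = ePM^0.5290 (estimated effort in person-months)."""
    # Per the study limitations, the model is calibrated for PM in [17, 7132].
    if not 17 <= epm <= 7132:
        raise ValueError("ePM outside the schedule model's calibrated range (17-7132)")
    return epm ** 0.5290


# Example: an IT project sized at the boxplot median of 906 requirements.
pm = estimate_effort_pm(906)
tdev = estimate_duration_months(pm)
print(f"effort ~= {pm:.0f} person-months, duration ~= {tdev:.1f} months")
```

Chaining the two models this way mirrors the intended early-phase use: requirements drive the effort estimate, and that effort estimate drives the schedule estimate.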