Local Calibration: How Many Data Points are Best?
Presented by Barry Boehm on behalf of Vu Nguyen, Thuy Huynh
University of Science, Vietnam National University - Ho Chi Minh City, Vietnam
6/27/2016 COCOMO Forum 2015

Outline
  - Motivation and Objectives
  - Methods
  - Data set
  - Results
  - Conclusions

Motivation
  - Local calibration is important for adapting an estimation model to an organization
  - The projects used for calibration affect model performance
  - Small organizations lack calibration data, while large ones have an abundance of it
  - Old data may become irrelevant for training models to estimate future projects

Objectives
Our study attempts to answer the following questions:
  - How many data points are best for calibrating COCOMO models?
  - How much old data can be used for calibrating COCOMO models?

Moving windows
  - A technique for selecting training sets, investigated in previous studies [1][2][3]
  - All data points/projects within a window are used as a training set
  - A window has a size, measured either as a number of projects or as a time duration
  [Figure: a training-set window followed by an estimating period on a time axis; the window moves forward in time]

COCOMO calibration
  - COCOMO II effort formula: PM = A * Size^(B + 0.01 * sum(SF_j)) * prod(EM_i)
  - EM and SF are effort multipliers and scale factors, respectively
  - A and B are constants
  - This study calibrates only the constants A and B

Applying moving windows
  - All projects within a window are used to calibrate the COCOMO constants A and B
  - Only projects within the one year following the window are estimated (the estimating period)
  - Variable window size: different numbers of projects and years
  [Figure: windows 1 to n moving forward along a 1970-2009 time axis, each followed by a one-year estimating period]

Applying moving windows (2)
For each window:
  - Calibrate COCOMO using the projects in the window
  - Use the calibrated model to estimate projects in the
    estimating period
  - Compute the Magnitude of Relative Error (MRE) for each estimated project
  - Increase the window size and repeat the steps above
  - Move the window one year forward

Data Set
  - A total of 341 projects completed between 1970 and 2009, including 161 projects used to calibrate COCOMO II.2000 from 25 organizations
  [Figure: number of projects each year from 1970 to 2009; x-axis: completion year, y-axis: number of projects]

How many data points are best for calibrating COCOMO models?
  - The lowest mean MREs are obtained with windows of 10 to 25 data points
  - Using more data points for calibration does not necessarily produce the best-calibrated model

Best window sizes (project)
  - The best window sizes (those with the lowest MREs) vary by year
  - In most years, the best window size is below 50 projects

How much old data can be used for calibrating COCOMO models?
  - Mean MREs increase when older data is used
  - The best model performance can be achieved with data from within the past 5 years

Best window sizes (year)
  - The best window sizes (those with the lowest MREs) vary by year
  - In recent years (2001-2009), the best sizes are less than 5 years

Conclusions
  - The best numbers of projects and years of data for calibrating COCOMO vary by year
  - In general, however, calibrating COCOMO models with 10 to 25 data points from within the past 5 years works best
  - Counter-intuitively, using more data points for calibration does not necessarily result in higher model accuracy
  - Legacy data may become irrelevant for calibrating models to estimate future projects

Future study
  - Analyze why the best window sizes vary significantly by year
  - Take organizations into account when analyzing the best window sizes
  - Apply different calibration methods to answer the study questions

Thank You

References
[1] C. Lokan, E. Mendes, "Applying moving windows to software effort estimation", in: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, IEEE Computer Society, 2009, pp. 111–122.
[2] S. Amasaki, C. Lokan, "The effects of moving windows to software estimation: comparative study on linear regression and estimation by analogy", in: IWSM/Mensura'12, 2012.
[3] C. Lokan, E. Mendes, "Investigating the use of duration-based moving windows to improve software effort prediction", in: K. R. P. H. Leung, P. Muenchaisri (Eds.), APSEC, IEEE, 2012, pp. 818–827.
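
Sketch: moving-window calibration
The moving-window procedure described in the Methods slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual implementation. The slides do not say how A and B are fitted, so the sketch assumes an ordinary least-squares fit on the log-transformed COCOMO II effort formula with the EM and SF ratings held fixed. It covers only windows sized by number of projects, and it uses MRE = |actual - estimate| / actual. The `Project` record and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Project:          # hypothetical record layout, not from the slides
    year: int           # completion year
    size: float         # size in KSLOC
    effort: float       # actual effort in person-months
    sf_sum: float       # sum of the scale-factor ratings
    em_prod: float      # product of the effort multipliers


def calibrate(train):
    """Assumed method: least-squares fit of ln(A) and B.

    Taking logs of PM = A * Size^(B + 0.01*sum(SF)) * prod(EM) gives
    ln(PM) - 0.01*sum(SF)*ln(Size) - ln(prod(EM)) = ln(A) + B*ln(Size).
    """
    xs = [math.log(p.size) for p in train]
    ys = [math.log(p.effort) - 0.01 * p.sf_sum * math.log(p.size)
          - math.log(p.em_prod) for p in train]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct project sizes")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


def estimate(p, a, b):
    """COCOMO II effort estimate for project p with constants a and b."""
    return a * p.size ** (b + 0.01 * p.sf_sum) * p.em_prod


def mre(actual, predicted):
    """Magnitude of Relative Error."""
    return abs(actual - predicted) / actual


def moving_window_mres(projects, window_size, first_year, last_year):
    """Map each estimating year to the mean MRE obtained when calibrating
    on the `window_size` most recent projects completed before that year."""
    ordered = sorted(projects, key=lambda p: p.year)
    results = {}
    for year in range(first_year, last_year + 1):
        past = [p for p in ordered if p.year < year]
        targets = [p for p in ordered if p.year == year]
        if len(past) < window_size or not targets:
            continue  # not enough training data, or nothing to estimate
        a, b = calibrate(past[-window_size:])
        results[year] = mean(mre(t.effort, estimate(t, a, b)) for t in targets)
    return results
```

Calling `moving_window_mres` once per candidate window size and comparing the resulting mean MREs per year mirrors the "increase window size and repeat" step. A duration-based window would select `past` by year range instead of taking the last `window_size` projects.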