Local Calibration: How Many Data Points are Best?

Presented by Barry Boehm on behalf of
Vu Nguyen, Thuy Huynh
University of Science
Vietnam National University - Ho Chi Minh city,
Vietnam
Outline





Motivation and Objectives
Methods
Data set
Results
Conclusions
6/27/2016
COCOMO Forum 2015
2
Motivation

Local calibration is important for adapting an estimation
model to an organization

Projects used for calibration affect model
performance

Small organizations lack data, while large ones
have an abundance for calibration

Old data may become irrelevant for training models
to estimate future projects
Objectives

Our studies attempt to address the following
questions:

How many data points are best for calibrating
COCOMO models?

How much old past data can be used for calibrating
COCOMO models?
Moving windows



A technique for selecting training sets, investigated
in previous studies [1][2][3]
All data points/projects within a window are used as
a training set
A window has a size, measured either as a number of
projects or as a time duration
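The two ways of sizing a window can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the `year` field (a project's completion year) are hypothetical, and the window is assumed to end immediately before the year being estimated.

```python
def window_by_count(projects, target_year, n):
    """Training set: the n most recent projects completed before target_year."""
    if n <= 0:
        return []
    past = sorted(
        (p for p in projects if p["year"] < target_year),
        key=lambda p: p["year"],
    )
    # Slicing returns all past projects when fewer than n are available.
    return past[-n:]


def window_by_duration(projects, target_year, years):
    """Training set: projects completed in the `years` years before target_year."""
    return [p for p in projects if target_year - years <= p["year"] < target_year]
```

For example, with projects completed in 2000, 2003 and 2004, a 2-project window and a 2-year window for target year 2005 both select the 2003 and 2004 projects, while a 5-year window also includes the 2000 project.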
[Figure: a training-set window followed by an estimating period on a time axis; the window moves forward in time.]
COCOMO calibration

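COCOMO II effort formula (PM is effort in person-months, Size is in KSLOC):

$$ PM = A \times Size^{\,B + 0.01 \sum_{j} SF_j} \times \prod_{i} EM_i $$
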
EM and SF are effort multipliers and scale factors,
respectively
A and B are constants
This study calibrates only A and B constants
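The slides do not state how A and B are fitted. One common approach, shown here only as an assumed sketch, is ordinary least squares on the log-transformed effort equation: with each project's scale-factor sum and effort-multiplier product held fixed, ln(PM) - 0.01 * sum(SF) * ln(Size) - ln(prod(EM)) = ln(A) + B * ln(Size), which is a straight line in ln(Size). The field names `size`, `effort`, `sf_sum` and `em_product` are hypothetical.

```python
import math


def calibrate_a_b(projects):
    """Fit COCOMO II constants A and B by least squares in log space.

    Each project is a dict with (hypothetical) keys:
      size       - size in KSLOC
      effort     - actual effort in person-months
      sf_sum     - sum of the scale-factor ratings
      em_product - product of the effort-multiplier ratings
    """
    if len(projects) < 2:
        raise ValueError("need at least two projects to fit A and B")
    xs, ys = [], []
    for p in projects:
        x = math.log(p["size"])
        # Move the known SF and EM terms to the left-hand side so that
        # y = ln(A) + B * x.
        y = math.log(p["effort"]) - 0.01 * p["sf_sum"] * x - math.log(p["em_product"])
        xs.append(x)
        ys.append(y)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all projects have the same size; B is undetermined")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_effort(a, b, size, sf_sum, em_product):
    """COCOMO II effort in person-months for given constants and ratings."""
    return a * size ** (b + 0.01 * sf_sum) * em_product
```

Fitting in log space keeps the problem linear, so the slope gives B directly and the intercept gives ln(A); other fitting methods would yield somewhat different constants.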
Applying moving windows



All projects within a window are used to calibrate
COCOMO constants A and B
Only projects within the one year following the
window are estimated (the estimating period)
Window size is varied, both in number of projects
and in years
[Figure: Windows 1 through n moving forward along a time axis from 1970 to 2009, each followed by a 1-year estimating period.]
Applying moving windows – 2

For each window, calibrate COCOMO using projects
in the window


Use the calibrated model to estimate projects in the
estimating period
Compute MREs for the estimated projects

Increase window size and repeat above steps
Move window one year forward

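Compute the Magnitude of Relative Error (MRE) for each estimated project:

$$ MRE = \frac{|Effort_{actual} - Effort_{estimated}|}{Effort_{actual}} $$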
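The steps on this slide can be sketched as a loop over window sizes and estimating years. This is an illustrative outline under stated assumptions, not the authors' code: windows are sized by number of projects, `calibrate` and `estimate` are caller-supplied placeholder functions, the `year` and `effort` fields are hypothetical names, and years with no projects to estimate or with too few past projects to fill the window are skipped.

```python
from statistics import mean


def mre(actual, estimated):
    """Magnitude of Relative Error for one project."""
    return abs(actual - estimated) / actual


def evaluate_window_sizes(projects, window_sizes, calibrate, estimate,
                          first_year, last_year):
    """Return {window_size: mean MRE} over all estimating years."""
    results = {}
    for size in window_sizes:                      # vary the window size
        errors = []
        for year in range(first_year, last_year + 1):  # move one year forward
            targets = [p for p in projects if p["year"] == year]
            past = sorted(
                (p for p in projects if p["year"] < year),
                key=lambda p: p["year"],
            )
            training = past[-size:]
            if not targets or len(training) < size:
                continue                           # nothing to estimate or window not full
            model = calibrate(training)            # calibrate on the window
            errors.extend(
                mre(p["effort"], estimate(model, p)) for p in targets
            )
        if errors:
            results[size] = mean(errors)
    return results
```

With the returned dictionary, `min(results, key=results.get)` gives the window size with the lowest mean MRE.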

Data Set

Total of 341 projects completed between 1970 and 2009
including 161 projects used to calibrate COCOMO II.2000
from 25 organizations
[Figure: bar chart of the number of projects completed each year from 1970 to 2009; x-axis: Completion Year, y-axis: Number of Projects.]
How many data points are best for calibrating
COCOMO models?


The lowest mean MREs are obtained with windows of 10 to 25
data points
Using more data points for calibration does not necessarily
produce the best-calibrated model
Best window sizes (project)


The best window size (the one with the lowest MRE) varies by year
In most years, the best window size is below 50
projects
How much old past data can be used for
calibrating COCOMO models?


Mean MREs increase when older past data are used
The best model performance can be achieved with past
data from within 5 years
Best window sizes (year)


The best window size (the one with the lowest MRE) varies by year
In recent years (2001-2009), the best window sizes are less than
5 years
Conclusions




The best number of projects and the best number of years
of data for calibrating COCOMO vary by year
In general, however, calibrating COCOMO models with
10 to 25 data points from within the past 5 years
works best
Counter-intuitively, using more data points for
calibration does not necessarily yield higher model
accuracy
Legacy data may become irrelevant for calibrating
models to estimate future projects
Future study



Analyze why the best window
sizes vary significantly by year
Take organizations into account in the
analysis of best window sizes
Apply different calibration methods to
answer the research questions
Thank You
References
[1] C. Lokan, E. Mendes, “Applying moving windows to software effort estimation”, in: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, IEEE Computer Society, 2009, pp. 111–122.
[2] S. Amasaki, C. Lokan, “The effects of moving windows to software estimation: comparative study on linear regression and estimation by analogy”, in: IWSM/Mensura’12, 2012.
[3] C. Lokan, E. Mendes, “Investigating the use of duration-based moving windows to improve software effort prediction”, in: K. R. P. H. Leung, P. Muenchaisri (Eds.), APSEC, IEEE, 2012, pp. 818–827.