Reducing Biases in Individual Software Effort Estimations: A Combining Approach

Qi Li, Qing Wang, Ye Yang and Mingshu Li
Laboratory for Internet Software Technologies
Institute of Software
Chinese Academy of Sciences
COCOMO Forum, October 28, 2008
6/27/2016
COCOMO Forum 2008
Agenda
• Introduction
• Optimal Linear Combining (OLC) Method with an Experimental Study
• Lessons learned from the Experiment
• Discussion of Possible Threats to Validity
• Conclusions and Future Work
Introduction
• Effort estimation tools and techniques abound, each with its own set of advantages and disadvantages, and no tool stands out as the silver bullet
• Usually, an estimation tool performs well on some projects but much worse on others
Which technique or tool should I use?
• Some empirical studies show that when one technique predicts poorly, other techniques tend to perform significantly better
Introduction (Cont.)
• Best practices recommend that project managers use at least two approaches, since many factors affect the estimate and these might be captured by using alternative approaches
• Techniques for combining forecasts have been rapidly developed and widely used, with considerable success, in many practical fields such as weather forecasting, money markets and macroeconomic analysis
• There is a consensus that combining estimates may help integrate the estimating knowledge acquired by the component methods, reduce errors arising from faulty assumptions, bias or mistakes in data, and improve estimation accuracy
Introduction (Cont.)
Which technique or tool should I use?
• Expert method
• Regression-based method
• Parametric method
• Learning-oriented method
• Dynamics-based method
• …
A new way to estimate: combine them to generate a more accurate result
OLC Method with an Experimental Study
• The Optimal Linear Combining (OLC) method is the most typical linear combining method
• Granger and Ramanathan first introduced the OLC method; Hashem and Schmeiser extended the idea of OLCs and discussed how to improve the predictive power of the combined model by reducing collinearity
• The OLC method gives components different weights according to their performance, making full use of the information provided by each component to maximize prediction accuracy
OLC Method with an Experimental Study
Overview of OLC Method
• Step 1: Preparing data and component methods
  - Data from the same organization, preferably of the same project type
  - “Different” component methods: error correlation analysis
• Step 2: OLC modeling
  - OLS; four cases of OLC
• Step 3: Further improving OLC’s predictive power
  - Collinearity
• Step 4: Returning the final estimating model
Flowchart of the OLC method:
1. Company-specific data collection: prepare the data and the component methods (CMs) and obtain their individual estimates.
2. Error correlation analysis.
3. OLC modeling: choose the case with the lowest MSE_CV among the four cases of OLC.
4. Further improving predictive power: if more than two CMs are left, detect the strongest collinearity, drop the CM with the higher MSE_CV and recalculate; keep the reduced model only if its MSE_CV is lower.
5. If the OLC has a constant, drop the constant and recalculate; keep the model without the constant only if its MSE_CV is lower.
6. Return the final estimating model: if the OLC's MSE_CV is lower than that of both the best CM and the simple average (SA), return the OLC (with or without the constant, as selected above); otherwise return whichever of the best CM and SA has the lower MSE. The outcome is a more accurate estimating result.
CM: Component Methods; MSE_CV: MSE by Cross-Validation; SA: Simple Average
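The decision flow of steps 4 to 6 can be sketched as follows. This is a minimal sketch under our own reading of the flowchart, not the authors' implementation: `select_model` and `mse_cv` are hypothetical names, the four OLC cases of step 3 are reduced to a single starting model with a constant, step 4 is assumed to repeat while it keeps lowering MSE_CV, and the "worse" CM of a collinear pair is judged by its raw MSE when used alone (the slides say MSE_CV, without defining it for a single CM).

```python
import numpy as np


def mse_cv(X, y, constant):
    """Leave-one-out cross-validated MSE (MSE_CV) of an OLS-fitted combination."""
    D = np.column_stack([np.ones(len(y)), X]) if constant else X
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i                      # hold project i out
        coef, *_ = np.linalg.lstsq(D[train], y[train], rcond=None)
        errs[i] = y[i] - D[i] @ coef                   # error on the held-out project
    return float(np.mean(errs ** 2))


def select_model(estimates, actual):
    """Hypothetical helper following one reading of the flowchart (steps 4-6).

    Returns (description, MSE) of the chosen estimating model.
    """
    X = np.asarray(estimates, dtype=float)
    y = np.asarray(actual, dtype=float)
    raw_mse = np.mean((X - y[:, None]) ** 2, axis=0)   # MSE of each CM used alone (assumption)
    cols = list(range(X.shape[1]))
    constant = True                                    # assumed starting case: OLC with a constant
    score = mse_cv(X[:, cols], y, constant)

    # Step 4 (assumed to repeat): in the most correlated pair, drop the CM
    # with the higher raw MSE, as long as this lowers MSE_CV.
    while len(cols) > 2:
        r = np.abs(np.corrcoef(X[:, cols], rowvar=False))
        np.fill_diagonal(r, 0.0)
        a, b = np.unravel_index(np.argmax(r), r.shape)
        drop = cols[a] if raw_mse[cols[a]] > raw_mse[cols[b]] else cols[b]
        trial = [c for c in cols if c != drop]
        trial_score = mse_cv(X[:, trial], y, constant)
        if trial_score >= score:
            break                                      # dropping did not help
        cols, score = trial, trial_score

    # Step 5: drop the constant if that lowers MSE_CV.
    no_const = mse_cv(X[:, cols], y, constant=False)
    if no_const < score:
        constant, score = False, no_const

    # Step 6: the OLC must beat both the best single CM and the simple average (SA).
    best_cm = ("best CM", float(raw_mse.min()))
    sa = ("SA", float(np.mean((X.mean(axis=1) - y) ** 2)))
    baseline = min(best_cm, sa, key=lambda t: t[1])
    if score < baseline[1]:
        kind = "OLC with constant" if constant else "OLC without constant"
        return f"{kind} on CM columns {cols}", score
    return baseline


# Hypothetical data: the actual effort is an exact mix of two components.
est = np.array([[100, 80], [200, 260], [150, 100], [300, 350], [250, 200], [120, 160]], dtype=float)
act = 0.6 * est[:, 0] + 0.4 * est[:, 1]
print(select_model(est, act))
```

Returning the best CM or SA when the OLC does not beat them mirrors the flowchart's final check: combining is only worth its cost when it outperforms both simple alternatives.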
OLC Method with an Experimental Study
Step 1: Preparing data and component methods
• Experiment data source: individual estimates of COCOMO, SLIM and Function Points for 15 projects from C. F. Kemerer’s empirical work
• Low correlation among the components’ errors can benefit combining
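Step 1's error correlation analysis can be sketched as below. This is a minimal sketch under assumptions the slides do not state: a component's error is taken as estimate minus actual effort, and Pearson correlation is used. The function name `error_correlation` is hypothetical, and the numbers are made up (they are not Kemerer's data).

```python
import numpy as np


def error_correlation(actual, estimates):
    """Pearson correlation matrix of the component methods' errors.

    actual:    shape (n,), actual effort of n projects
    estimates: shape (n, p), one column per component method
    """
    actual = np.asarray(actual, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    errors = estimates - actual[:, None]        # assumed error definition: estimate - actual
    return np.corrcoef(errors, rowvar=False)     # (p, p) matrix, ones on the diagonal


# Hypothetical toy data: 6 projects, 2 component methods
actual = [100, 250, 80, 400, 150, 300]
estimates = [[120, 90], [230, 280], [95, 70], [350, 430], [160, 140], [330, 280]]
corr = error_correlation(actual, estimates)
print(corr.round(2))
```

In this toy data the two methods tend to err in opposite directions (negative error correlation), which is the kind of situation in which combining is expected to help.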
OLC Method with an Experimental Study
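Step 2: OLC Modeling (1/2)
• Essence: multiple regression analysis using Ordinary Least Squares (OLS), with the component estimates as independent variables and the actual effort as the dependent variable:
  $Y = \beta_0 + \sum_{j=1}^{p} \beta_j y_j$
  where $Y$ is the actual effort, $y_j$ is the estimate of the $j$-th of $p$ component methods, $\beta_0$ is a constant and $\beta_j$ is the weight of component $j$
• Four extended cases of OLC models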
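Fitting the OLC equation by OLS can be sketched as follows. The slides do not list the four OLC cases, so this sketch only covers the two variants that later steps switch between: with the constant $\beta_0$ and without it. The helpers `fit_olc` and `predict_olc` are hypothetical names, and the data are constructed so that the true weights are known.

```python
import numpy as np


def fit_olc(estimates, actual, constant=True):
    """Fit OLC weights by ordinary least squares.

    estimates: shape (n, p), component estimates y_j (independent variables)
    actual:    shape (n,), actual effort Y (dependent variable)
    Returns (beta0, betas); beta0 is 0.0 when constant=False.
    """
    X = np.asarray(estimates, dtype=float)
    y = np.asarray(actual, dtype=float)
    if constant:
        X = np.column_stack([np.ones(len(y)), X])    # column of ones for beta_0
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS solution
    if constant:
        return float(coef[0]), coef[1:]
    return 0.0, coef


def predict_olc(beta0, betas, estimates):
    """Combined estimate: beta_0 + sum_j beta_j * y_j."""
    return beta0 + np.asarray(estimates, dtype=float) @ betas


# Hypothetical estimates from two component methods for five projects
est = np.array([[100, 200], [150, 120], [300, 310], [80, 60], [220, 250]], dtype=float)
act = 10 + 0.5 * est[:, 0] + 0.3 * est[:, 1]   # constructed so the true weights are known
b0, b = fit_olc(est, act)
print(round(b0, 3), b.round(3))
```

Note that OLS weights are not forced to sum to one or to be positive; each weight simply reflects how much a component's estimate helps explain the actual effort.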
OLC Method with an Experimental Study
Step 2: OLC Modeling (2/2)
• In-sample MSE comparison
• Accuracy comparison after leave-one-out cross-validation (LOOCV)
• OLC’s MSE is still larger than that of F (Function Points), so it needs further improvement
• MSRE and MMRE have already been improved
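The LOOCV comparison can be sketched like this: each project is held out in turn, the OLC is refitted on the remaining projects, and the held-out project is predicted. MMRE is computed here as the mean of |actual − estimate| / actual, its usual definition. MSRE is omitted because the slides do not define it. The helper names and the "actual" efforts are hypothetical.

```python
import numpy as np


def loocv_predictions(estimates, actual, constant=True):
    """Leave-one-out predictions of an OLS-fitted OLC model."""
    X = np.asarray(estimates, dtype=float)
    y = np.asarray(actual, dtype=float)
    if constant:
        X = np.column_stack([np.ones(len(y)), X])
    preds = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i                        # hold project i out
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        preds[i] = X[i] @ coef                                # predict the held-out project
    return preds


def mse(actual, pred):
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.mean((a - p) ** 2))


def mmre(actual, pred):
    """Mean magnitude of relative error: mean of |actual - pred| / actual."""
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.mean(np.abs(a - p) / a))


est = np.array([[100, 200], [150, 120], [300, 310], [80, 60], [220, 250]], dtype=float)
act = np.array([170, 140, 240, 90, 200], dtype=float)    # hypothetical actual efforts
cv = loocv_predictions(est, act)
print("MSE_CV:", mse(act, cv), "MMRE_CV:", mmre(act, cv))
```

Cross-validated errors are usually larger than in-sample errors, because each prediction comes from a model that never saw that project. This is why the in-sample and LOOCV comparisons can tell different stories.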
OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (1/4)
• The problem that affects the predictive power of the OLC is collinearity among the predictor variables
• Solution: a common and simple way to deal with collinearity is to drop a component involved in the strongest collinearity
• Two rules:
  - “High R² (the multiple coefficient of determination [45]) but few significant t ratios”. The variables whose coefficients are not significant are involved in collinearity. This rule of thumb helps us detect collinearity and identify all the variables involved in it.
  - “High pairwise correlations among regressors”. If the pairwise (zero-order) correlation coefficient between two regressors is high (generally higher than 0.8), collinearity is a serious problem. This rule of thumb helps us find the pair involved in the strongest collinearity.
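Both rules of thumb can be checked with a few lines of code. This is a minimal sketch: `strongest_collinear_pair` applies the pairwise-correlation rule with the 0.8 threshold from the slide, and `r2_and_t_ratios` returns R² and the OLS t ratios for the first rule. Both names are hypothetical, the R² shown is the usual centered R² (most meaningful when the model has a constant), and the data are made up.

```python
import numpy as np


def strongest_collinear_pair(estimates, threshold=0.8):
    """Rule 2: most highly correlated pair of regressors above the threshold.

    Returns (i, j, r) with i < j, or None if no pair exceeds the threshold.
    """
    r = np.corrcoef(np.asarray(estimates, dtype=float), rowvar=False)
    best = None
    for i in range(r.shape[0]):
        for j in range(i + 1, r.shape[0]):
            if abs(r[i, j]) > threshold and (best is None or abs(r[i, j]) > abs(best[2])):
                best = (i, j, float(r[i, j]))
    return best


def r2_and_t_ratios(estimates, actual, constant=True):
    """Rule 1: R^2 and t ratios of the OLS coefficients (constant first if present)."""
    X = np.asarray(estimates, dtype=float)
    y = np.asarray(actual, dtype=float)
    if constant:
        X = np.column_stack([np.ones(len(y)), X])
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    sigma2 = resid @ resid / (n - k)                          # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))    # standard errors of coefficients
    return float(r2), coef / se


# Hypothetical regressors: column 1 is nearly a multiple of column 0
a = np.array([1, 2, 3, 4, 5, 6], dtype=float)
b = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
c = np.array([5, 1, 4, 2, 6, 3], dtype=float)
X = np.column_stack([a, b, c])
print(strongest_collinear_pair(X))
```

Following the solution on the slide, one member of the returned pair would then be dropped and the OLC refitted.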
OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (2/4)
Accuracy comparison after dropping C
• Drop the worse component, C
• OLC’s MSE is now smaller than F’s; accuracy has been improved
OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (3/4)
OLC after dropping C -> drop the constant -> OLC after dropping the constant
Accuracy comparison after dropping the constant
• MSE decreases further; accuracy is improved further
• The coefficients are all significant
OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (4/4)
• OLC (Ⅰ_C+S+F) -> OLC (Ⅰ_S+F) -> OLC (Ⅲ_S+F): applied in succession to maximize OLC’s predictive power
• Decreasing trend of MSE, MSRE and MMRE
OLC Method with an Experimental Study
Step 4: Returning the Final Estimating Model
• Result, compared with the apparently best component F:
  - Accuracy: MSE, MSRE and MMRE are improved by 66.29%, 3.09 times and 61.48% respectively
  - Consistency: SD is improved by 96.91%
Lessons learned from the Experiment
• The improvement in combining accuracy depends on the following factors:
  - “Degree of redundancy in the information obtained from the components”: if every component method captures the same information, there is no benefit from combining
  - “Superiority of the best component method”: if one method performs much better than the rest while the other methods have no additional knowledge to contribute, the OLC will tend to favor using the best component by itself
Lessons learned from the Experiment (Cont.)
• “Adequacy of the combination data”: a small quantity of data might cause severe ill effects of collinearity
• “Outliers at different noise levels”: MSE is often blamed for its high sensitivity to outliers; another way to reduce OLC’s sensitivity to outliers might be to employ algorithms other than OLS that minimize a less sensitive criterion, such as MMRE
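One way to minimize MMRE instead of MSE is sketched below. The slides name the criterion but not an algorithm, so the algorithm here is our own swapped-in choice: MMRE, the mean of |Y − Xw| / Y, is a weighted L1 criterion, and it can be minimized approximately by iteratively reweighted least squares (IRLS) started from the OLS weights. The names `fit_min_mmre`, `iters` and `eps` are hypothetical, and the data, including the outlying last project, are made up.

```python
import numpy as np


def mmre(actual, pred):
    """Mean magnitude of relative error."""
    return float(np.mean(np.abs(actual - pred) / actual))


def fit_min_mmre(estimates, actual, constant=False, iters=200, eps=1e-8):
    """Combination weights that approximately minimize MMRE instead of MSE.

    Each IRLS step solves a weighted least-squares problem whose weights
    1 / (Y_i * |r_i|) make it a local approximation of sum_i |r_i| / Y_i.
    """
    X = np.asarray(estimates, dtype=float)
    y = np.asarray(actual, dtype=float)
    if constant:
        X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS starting point
    for _ in range(iters):
        r = np.abs(y - X @ coef)
        w = 1.0 / (y * np.maximum(r, eps))               # eps avoids division by zero
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


# Hypothetical data with one outlying estimate (last project, second method)
est = np.array([[100, 90], [200, 230], [150, 140], [300, 280],
                [250, 270], [120, 110], [180, 600]], dtype=float)
act = np.array([95, 210, 150, 290, 255, 115, 185], dtype=float)
w_ols, *_ = np.linalg.lstsq(est, act, rcond=None)
w_mmre = fit_min_mmre(est, act)
print("MMRE, OLS weights:", mmre(act, est @ w_ols))
print("MMRE, IRLS weights:", mmre(act, est @ w_mmre))
```

Because each IRLS step minimizes an upper bound of the weighted L1 criterion, MMRE does not increase from the OLS starting point (up to the small `eps` smoothing). A single outlier therefore pulls the weights far less than it does under OLS.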
Discussion of Possible Threats to Validity
• Data Quality
  - The component estimates are old and cannot be compared with up-to-date methods
  - Significant accuracy improvement in the component methods will result in further accuracy improvement in the combining methods
  - Our focus in this paper is not to evaluate component methods, but to show experimentally that combining methods can improve predictive power
• Data Quantity
  - There is a lack of public data containing individual estimates for the same data set
  - Data from only 15 projects may be too small a sample to demonstrate the OLC method’s effectiveness statistically
Discussion of Possible Threats to Validity (Cont.)
• Statistical Significance
  - Commonly used statistical tests, either parametric (paired t test) or nonparametric (Wilcoxon matched-pairs test), are not appropriate for evaluating a combining method’s statistical significance: because the combining results are highly dependent on the components, combining cannot always ensure a significant improvement over the best component
  - It is therefore not appropriate to require that the combining results be statistically significantly better than those of the best component
• Usability of OLC Model
  - Complex and costly to apply
  - We are currently implementing a tool that incorporates the most popular and mature cost estimation techniques with the same inputs to solve this problem
Conclusion
• Introduced the systematic combining idea into the field of software effort estimation, and estimated software effort using the Optimal Linear Combining (OLC) method in an experimental study based on a real-life data set
• Combining estimates that are derived from different techniques or tools and draw on different sources of information should become part of mainstream software effort estimating practice to improve estimating accuracy
• Combining estimates is especially useful when you are uncertain about the situation, uncertain about which method is the most accurate, and when you want to avoid large errors
Future Work
• Providing, together with an OLC estimate, a probability distribution of its possible values
• Exploring and validating more, and more effective, combining methods using more data sets
Thank you!
Q&A