Comparing Linear Regression and other Models
with Improved Analytical Programming Model
for Software Effort Estimation
Tomas Urbanek, Zdenka Prokopova, Radek Silhavy and Ales Kuncar
Faculty of Applied Informatics
Tomas Bata University in Zlin
Nad Stranemi 4511
Czech Republic
turbanek@fai.utb.cz
Abstract. This paper compares the most common regression models with improved analytical programming for the most accurate effort estimation. We used several models, for example simple linear models, multiple linear models, Karner's model and others, and compared these models with models generated by improved analytical programming. The study uses 10-fold cross-validation (CV) to assess reliability, and standard statistical methods are also used. For comparison, we use the MMRE measure for all models. The experimental results show that the linear regression and improved analytical programming techniques play a major role in the prediction of effort in software engineering. All results were evaluated by a standard approach: visual inspection and statistical significance testing.
Keywords: analytical programming, linear regression, effort estimation, use case points
1 Introduction
Effort estimation is defined as the activity of predicting the amount of effort required to complete the development of a software project [1]. Despite many attempts by scientists and software engineers, there is still no method that is optimal and effective for every software project. The common way to improve effort estimation is to enhance the algorithmic methods. Algorithmic methods use a mathematical formula for prediction, and this group typically also depends on historical data. The most common examples of algorithmic methods are COCOMO [2], FP [3] and, last but not least, UCP [4]; however, there are other algorithmic methods as well. It is essential that the calculation of effort estimation is completed in the early stages of the software development cycle; in the best case, these calculations are already known during the requirement analysis [4]. Accurate and reliable effort estimates are a crucial factor for a proper development cycle, since they are used for effective planning, monitoring and controlling of the software development process. The prediction of effort in software engineering is a complex and complicated process, mainly because many factors influence the final prediction; one of the most substantial is the human factor. For this reason, artificial intelligence could compensate for the prediction error made by a software engineer, and the use of artificial intelligence is nowadays very common in this research area.
Some work has been done to enhance effort estimation based on the Use Case Points method. These enhancements cover the review and calibration of the productivity factor, such as the work of Subriadi et al. [5]. Another enhancement is the construction investigation and simplification of the Use Case Points method presented by Ochodek et al. [6]. The recent work of Silhavy et al. [7] suggests a new approach, "automatic complexity estimation based on requirements", which is partly based on the Use Case Points method. Another approach uses a fuzzy inference system to improve the accuracy of the Use Case Points method [8]. The research of Kocaguneli et al. [9] is very promising; it shows that an ensemble of effort estimation methods can provide better results than a single estimator. The works of Kaushik et al. [10] and Attarzadeh et al. [11] use neural networks and the COCOMO [2] method for prediction.
In this article, we investigated the efficiency of several models: prediction by mean, Karner's model, simple linear regression, simple linear regression without intercept, multiple linear regression and models produced by improved analytical programming. The improved analytical programming can be seen as a regression function generator. Recently, we published a study that examined the selection of a fitness function for analytical programming, and we found that the best fitness functions are MSE and the very common MMRE [12]. To the best of our knowledge, no previous study has investigated the comparison of such models, especially with improved analytical programming, when the Use Case Points method and k-fold cross-validation were used. Therefore, this study makes a major contribution to research on the Use Case Points method when the analytical programming method and linear regression models are used.
1.1 The Use Case Points Method
This effort estimation method was presented in 1993 by Gustav Karner [4]. It is based on a principle similar to the function point method: project managers estimate the project parameters in four tables. Given the aims of this paper, a detailed description of the well-known Use Case Points method is omitted; please refer to [4], [12] for a more detailed description of the Use Case Points method.
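Although the details are omitted here, the core calculation in [4] is compact; the following is a hedged sketch of the standard UCP and Karner effort formulas (function and variable names are ours, not from the paper):

```python
def ucp(uucw, uaw, tcf, ecf):
    """Use Case Points per Karner [4]: size from use cases (UUCW) and
    actors (UAW), adjusted by the technical (TCF) and environmental
    (ECF) complexity factors."""
    return (uucw + uaw) * tcf * ecf

def karner_effort(uucw, uaw, tcf, ecf, pf=20.0):
    """Karner's effort model: effort [man/hour] = UCP * productivity
    factor (PF); this paper uses the standard PF of 20."""
    return ucp(uucw, uaw, tcf, ecf) * pf

# e.g. a hypothetical project with UUCW=300, UAW=12, TCF=1.0, ECF=0.9
effort = karner_effort(300, 12, 1.0, 0.9)  # (300+12)*1.0*0.9*20 ≈ 5616 man/hour
```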
1.2 Improved Analytical Programming Algorithm
Analytical programming (AP) is a symbolic regression method. The core of analytical programming is a set of functions and operands; these mathematical objects are used for the synthesis of a new function. Every function in the analytical programming core set has its own number of parameters, and the functions are sorted according to that number into General Function Sets (GFS). For example, GFS1par contains functions that have only one parameter, e.g. sin(), cos(), and others. AP must be combined with an evolutionary algorithm, which supplies the population of individuals for its run [13], [14]. In this paper, differential evolution (DE) is used as the evolutionary algorithm for analytical programming. We also utilize the new improved analytical programming technique; its most important advantage for our application is the automatic constant-resolving procedure. This algorithm was presented in the article by Urbanek et al. [15].
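As a rough illustration of how AP decodes an evolved individual (an integer vector) into a function, consider the heavily simplified sketch below. This is not the authors' implementation: the flat GFS list and the depth cap (which forces terminals so the expression always closes) stand in for AP's more careful partitioning into GFS_2par, GFS_1par and GFS_0par.

```python
import math

# Simplified GFS: (label, arity, implementation).
GFS = [
    ("plus",  2, lambda a, b: a + b),
    ("times", 2, lambda a, b: a * b),
    ("sin",   1, math.sin),
    ("UUCW",  0, None),   # terminal: a dataset attribute
    ("UAW",   0, None),   # terminal: a dataset attribute
]
TERMINALS = [g for g in GFS if g[1] == 0]

def synthesize(individual):
    """Decode an integer vector (one DE individual) into a callable model,
    depth-first; past a depth limit only terminals are allowed, a
    simplification of AP's use of GFS_0par near the end of an individual."""
    pos = 0
    def build(depth):
        nonlocal pos
        pool = TERMINALS if depth >= 3 else GFS
        label, arity, fn = pool[individual[pos % len(individual)] % len(pool)]
        pos += 1
        if arity == 0:
            return lambda row: row[label]
        args = [build(depth + 1) for _ in range(arity)]
        return lambda row: fn(*(a(row) for a in args))
    return build(0)

model = synthesize([0, 3, 4])           # decodes to plus(UUCW, UAW)
print(model({"UUCW": 300, "UAW": 12}))  # prints 312
```

Differential evolution then only has to search over integer vectors; the decoding above turns each candidate vector into a candidate regression function.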
2 Research Objectives
This section presents the design of the research questions. We compared linear regression models with improved analytical programming models for effort estimation. The research questions of our study can be outlined as follows:
– RQ-1: Can we use the sample mean for more accurate prediction?
– RQ-2: Are linear regression models more accurate than improved analytical programming?
– RQ-3: Is there evidence that the new models (linear regression or analytical programming) are more accurate than the original Karner's model?
The first research question (RQ-1) aims to gain insight into the dataset used in this research. We examine the dataset and then use the sample mean to produce predictions; furthermore, the MMRE is calculated for comparison against the other models. The second research question (RQ-2) concerns the production of linear regression models, for which the MMRE is also calculated. Another task within this question is to produce a model by the improved analytical programming technique; all of these models are compared by the MMRE measure. To address research question RQ-3, we experimented with the built models, as reported and discussed in the experiment section. To assess the evidence of statistical properties, we used exploratory statistical analysis and hypothesis testing.
3 Experiment
For all models, we used the 10-fold cross-validation method to assess the reliability of our research. In our experiment, we built several prediction models: sample mean prediction, Karner's model, a simple linear model, a multiple linear model and a model built by analytical programming. For all models, we calculate the MMRE measure, which is chosen as the criterion for model comparison.
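MMRE is not defined explicitly in this excerpt; the sketch below gives its usual definition as the mean magnitude of relative error in percent, consistent with how the fold errors are reported in the tables that follow:

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error in percent:
    MMRE = 100/n * sum(|y_i - yhat_i| / y_i)."""
    assert len(actual) == len(predicted) and actual
    return 100.0 * sum(abs(y - p) / y for y, p in zip(actual, predicted)) / len(actual)

print(mmre([2000, 4000], [3000, 3000]))  # (0.5 + 0.25)/2 * 100 = 37.5
```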
3.1 Prediction by sample mean
For each fold (10-fold CV), we calculated the sample mean x̄ on the training data. This sample mean is then used as the prediction on the testing fold, and the MMRE is calculated for comparison.
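This baseline can be sketched as follows, assuming the actual efforts are given as a plain list (the paper's dataset is not reproduced here, and the contiguous fold split is a simplification, not necessarily the authors' exact CV implementation):

```python
from statistics import mean

def kfold_indices(n, k=10):
    """Split range(n) into k contiguous folds (a simple stand-in for the
    10-fold CV used in the paper)."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < n % k else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def cv_mean_baseline(efforts, k=10):
    """For each fold, predict the training-set mean x_bar for every test
    project and report the per-fold MMRE in percent."""
    results = []
    for test in kfold_indices(len(efforts), k):
        test_set = set(test)
        train = [efforts[i] for i in range(len(efforts)) if i not in test_set]
        x_bar = mean(train)
        results.append(100 * mean(abs(efforts[i] - x_bar) / efforts[i] for i in test))
    return results
```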
3.2 Karner's model
For each fold (10-fold CV), we calculated the prediction on the testing set, and the MMRE is then calculated for comparison. For Karner's model, we used the standard value of PF, which is set to 20.
3.3 Linear Regression
In this research, we present three linear regression models: simple linear regression, simple linear regression without intercept, and multiple linear regression. Simple linear regression is given by equations (1) and (2).

ŷ = β0 + β1x    (1)

where ŷ is the prediction (dependent variable), β0 is the intercept, x is the number of Use Case Points (UCP) and β1 can be seen as the productivity factor of the Use Case Points method.

ŷ = β1x    (2)

The multiple linear regression is represented by equation (3).

ŷ = β0 + β1·UUCW + β2·UAW + β3·TCF + β4·ECF    (3)

where β1,2,3,4 are the estimated coefficients for the individual Use Case Points parameters. For these linear regression models, we used 10-fold cross-validation, and the MMRE is also calculated for comparison.
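A minimal sketch of fitting equations (1) and (2) by ordinary least squares; the sample UCP/effort pairs are hypothetical, not the paper's dataset:

```python
def fit_simple(x, y):
    """Least-squares fit of eq. (1): yhat = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1

def fit_through_origin(x, y):
    """Least-squares fit of eq. (2): yhat = b1*x (no intercept); with UCP
    on the x axis, b1 plays the role of the productivity factor PF."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# hypothetical UCP/effort pairs for illustration
ucp = [180.0, 250.0, 320.0]
effort = [2900.0, 3900.0, 5000.0]
b0, b1 = fit_simple(ucp, effort)
pf = fit_through_origin(ucp, effort)
```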
3.4 Analytical Programming
The analytical programming experiment can be seen in Figure 1. In this experiment, we used 10-fold cross-validation: one equation was generated in each loop and then verified on the rest of the dataset. The process begins with a cycle that loops through the number of folds. In the data preparation step, the 10-fold cross-validation was used to split the dataset into two distinct sets. Then there is a second loop, in which the differential evolution process starts by generating an initial population. Analytical programming then uses this initial population to synthesize a new function, and the new function is evaluated by the MMRE. If the termination condition is met, one can assume that one has an optimal predictive model, and this model is then evaluated by calculating the MMRE on the testing set.
Table 1 shows the analytical programming set-up. The number of leafs (functions built by analytical programming can be seen as trees) was set to 20, which can be recognized as a relatively high value. However, one needs to find a model that is more accurate than the other models; there is no need to generate short and easily memorable models, but rather models that are more accurate. The functions were chosen according to the assumed non-linearity of the dataset. The constant K range was set to 0-10; these particular values were tested in our previous experiment, where they gave the best results.

Fig. 1. Diagram of proposed experiment

Table 1. Set-up of analytical programming

Parameter          Value
Number of leafs    20
GFS - functions    Plus, Subtract, Divide, Multiply, Sin, Cos, Power, Sqrt
GFS - constants    UUCW, UAW, TCF, ECF, K
Constant K range   0-10
Table 2 shows the set-up of differential evolution. The best set-up of differential evolution is the subject of further research.

Table 2. Set-up of differential evolution

Parameter     Value
NP            45
Generations   100
F             0.2
Cr            0.8

Fitness Function. The new model built by the analytical programming method contains the following parameters: UUCW, UAW, TCF and ECF. The models built by the analytical programming method do not have to contain all of these parameters. Equation (4) is used for the optimization task; the closer the LAD result is to zero, the higher the accuracy of the proposed model.

LAD = Σ_{i=1}^{n} |yᵢ − ŷᵢ|    (4)

where n is the number of projects in the training set, ŷᵢ is the prediction and yᵢ is the actual effort.
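A sketch of equation (4) as a fitness function; the candidate model and the two training rows below are illustrative only:

```python
def lad(train, model):
    """Eq. (4): sum of absolute deviations between actual efforts y_i and
    model predictions yhat_i over the training set; lower is better."""
    return sum(abs(row["Actual"] - model(row)) for row in train)

# e.g. scoring one candidate of the frequent form yhat = k * UUCW
candidate = lambda row: 9.73 * row["UUCW"]
train = [{"UUCW": 300, "Actual": 3000}, {"UUCW": 150, "Actual": 1500}]
score = lad(train, candidate)  # |3000-2919| + |1500-1459.5| ≈ 121.5
```

During the DE run, each synthesized function is scored this way on the training fold; the final model is then reported with its MMRE on the testing fold.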
3.5 Dataset
The dataset for this study was collected using document reviews and contributions from software companies. The dataset contains 143 distinct software projects, with 5 values for each project: UUCW, UAW, TCF, ECF and actual effort.
The distribution of each parameter in the dataset, together with the correlation coefficients, can be seen in Figure 2. Nearly all parameters are approximately normally distributed; the only exception is the actual effort, which is skewed, with most of its mass on the left side. We can also see a considerably high correlation between the actual effort and the UUCW parameter; this relationship will be used to build the linear regression models.

Figure 3 shows the distribution of the actual effort in the dataset. As can be seen, the majority of the software projects were completed between 1800 man/hour and about 5000 man/hour. We can also see the skewness of these data toward the left side, which means that the actual effort is not normally distributed; this can also be seen in the density plot in Figure 2. The mean actual effort is 3565 man/hour.
4 Results
In this section, we present the results of our study. All calculations were performed with 10-fold cross-validation on 143 software projects.
4.1 Prediction by sample mean
Table 3 shows the results of the computed prediction by sample mean. As can be seen, the sample mean of this dataset is 3565.3 man/hour. The MMRE for each fold is listed in the table; the mean MMRE of these predictive models is 127 %. We can also see that the 5th fold shows the lowest error and the 6th fold the greatest error.
[Figure 2 is a scatter-plot matrix of UUCW, UAW, TCF, ECF and Actual effort, with density plots on the diagonal and pairwise correlation coefficients; the plot itself is not reproducible in text.]
Fig. 2. The distribution of each parameter in dataset
Table 3. Results from computed prediction by sample mean

Fold    x̄         MMRE [%]
1       3632.16   115
2       3549.98   132
3       3525.31   122
4       3631.45   139
5       3543.08   55
6       3535.79   248
7       3463.54   89
8       3573.16   99
9       3576.41   121
10      3622.15   153
Mean    3565.30   127

4.2 Karner's model
Table 4 shows the results of the computed Karner's model. The mean MMRE of this predictive model is 96 %. We can also see that the 5th fold shows the lowest error and the 6th fold the greatest error. This model used a PF set to 20.
[Figure 3 is a histogram of the actual effort (Count vs. Actual [man/hour], roughly 2000-10000 man/hour); the plot itself is not reproducible in text.]
Fig. 3. The distribution of actual efforts in dataset
Table 4. Results from computed Karner's model

Fold    MMRE [%]
1       119
2       94
3       93
4       117
5       43
6       122
7       78
8       100
9       76
10      115
Mean    96
4.3 Simple linear model
Table 5 shows the models derived by simple linear regression. We can also see from this table that R² is about 0.5. The mean MMRE of this predictive model is 78 %. We can also see that the 5th fold shows the lowest error and the 6th fold the greatest error.

Table 5. Models derived by simple linear regression (Equation 1)

Fold    Intercept   β1      R²     MMRE [%]
1       568.15      13.45   0.50   90
2       640.26      13.13   0.47   74
3       602.52      13.02   0.50   80
4       656.39      13.15   0.49   91
5       363.07      14.29   0.50   30
6       589.50      13.21   0.45   114
7       651.37      12.57   0.49   68
8       533.68      13.46   0.51   80
9       600.84      13.35   0.48   60
10      500.54      13.73   0.50   91
Mean    570.63      13.34   0.49   78
4.4 Simple linear model without intercept
Table 6. Models derived by simple linear regression without intercept (Equation 2)

Fold    PF      R²     MMRE [%]
1       15.41   0.86   85
2       15.40   0.85   62
3       15.09   0.85   72
4       15.40   0.85   83
5       15.58   0.85   29
6       15.32   0.84   88
7       14.84   0.85   65
8       15.29   0.85   75
9       15.46   0.84   48
10      15.47   0.85   82
Mean    15.33   0.85   69
Table 6 shows the models derived by simple linear regression without intercept. The mean PF is 15.33. The table also shows that R² is about 0.85 and the mean MMRE is 69 %. We can also see that the 5th fold shows the lowest error and the 6th fold the greatest error.
4.5 Multiple linear model
Table 7 shows the models derived by multiple linear regression. The table also shows that R² is about 0.54 and the mean MMRE is 69 %. We can also see that the 5th fold shows the lowest error and the 4th fold the greatest error.
Table 7. Models derived by multiple linear regression

Fold    β0         β1      β2        β3        β4        R²     MMRE [%]
1       -4278.38   15.50   -123.24   3625.87   2076.69   0.58   86
2       -3722.15   14.14   -103.45   2545.90   2638.47   0.52   59
3       -4374.94   13.76   -90.31    3411.71   2381.76   0.54   66
4       -3948.38   14.28   -75.13    3127.86   1941.63   0.55   89
5       -4423.32   14.71   -99.29    3349.90   2357.37   0.54   25
6       -3961.68   14.12   -96.87    3024.78   2303.34   0.51   72
7       -3678.35   13.55   -59.25    2486.13   2212.82   0.54   73
8       -4536.31   14.35   -101.36   3404.54   2565.34   0.56   75
9       -3966.67   14.52   -81.52    2764.32   2371.26   0.52   55
10      -4326.92   14.99   -94.16    2976.17   2527.32   0.55   87
Mean    -4121.71   14.35   -92.46    3071.72   2337.60   0.54   69

4.6 Analytical programming
Table 8. Models derived by improved analytical programming

Fold    Model                                                      MMRE [%]
1       times(UUCW,plus(plus(ECF,ECF),UAW))                        78
2       times(UUCW,9.73)                                           61
3       times(plus(UUCW,8.37),9.94)                                67
4       times(UUCW,9.95)                                           71
5       times(plus(minus(5.80,TCF),ECF),
        plus(plus(UUCW,divide(pow(ECF,ECF),divide(TCF,UUCW))),
        cos(minus(UAW,8.09))))                                     30
6       divide(plus(sqrt(plus(sqrt(4.22),ECF)),UUCW),
        divide(times(cos(cos(4.88)),sqrt(TCF)),
        times(4.03,sqrt(UAW))))                                    99
7       times(UUCW,9.55)                                           71
8       minus(times(plus(plus(pow(cos(ECF),
        times(UAW,ECF)),TCF),9.31),UUCW),
        sin(sin(times(pow(UAW,ECF),UUCW))))                        59
9       times(UUCW,plus(7.90,sqrt(8.97)))                          67
10      times(9.65,UUCW)                                           65
Mean                                                               67
Table 8 shows the models derived by improved analytical programming. As can be seen in this table, the mean MMRE is 67 %. The improved analytical programming algorithm mostly derived simple models; only a couple of models contain non-linearities through the sin, cos, pow and sqrt functions. The function ŷ = k·UUCW, where k is a coefficient generated by analytical programming, was derived in 4 cases. We can also see that the 5th fold shows the lowest error and the 6th fold the greatest error.
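The prefix notation in Table 8 maps directly onto ordinary arithmetic; for example, the fold 3 and fold 9 models can be written as follows (the dictionary-based row format is our convention, not the paper's):

```python
import math

def fold3(row):
    """Table 8, fold 3: times(plus(UUCW,8.37),9.94) -> (UUCW + 8.37) * 9.94."""
    return (row["UUCW"] + 8.37) * 9.94

def fold9(row):
    """Table 8, fold 9: times(UUCW,plus(7.90,sqrt(8.97)))."""
    return row["UUCW"] * (7.90 + math.sqrt(8.97))

print(fold3({"UUCW": 300}))  # ≈ 3065.2 man/hour
```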
4.7 Comparison
This section provides evidence about the statistical performance of each chosen method.
[Figure 4 shows box plots of the MMRE [%] for the models Mean, Karner's, S Lin. Reg., S Lin. Reg. w Int., M Lin. Reg. and AP; outliers are drawn as points.]
Fig. 4. Box plot comparing each method MMREs
Figure 4 depicts box plots comparing the MMREs of each method. We can see that the worst results were produced by the prediction by sample mean and by Karner's model with PF set to 20. The linear regression models and the improved analytical programming method performed similarly, although improved analytical programming shows a very low variance compared with the linear regression models. Some outliers, depicted as points, can also be seen in this box plot.
Two-sample t-tests were conducted on the MMRE values to test, at the 95 % significance level, whether each pair of methods has the same true mean.
H0: µ1 = µ2
HA: µ1 ≠ µ2
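This test can be sketched with the per-fold MMREs from Tables 3 and 4. Note that we use Welch's form of the two-sample t statistic here, which may differ in detail from the authors' exact test:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic (Welch's form, not assuming equal
    variances) for H0: mu1 == mu2 against HA: mu1 != mu2."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

# per-fold MMREs of the sample-mean baseline vs. Karner's model (Tables 3 and 4)
mean_mmre   = [115, 132, 122, 139, 55, 248, 89, 99, 121, 153]
karner_mmre = [119, 94, 93, 117, 43, 122, 78, 100, 76, 115]
t = welch_t(mean_mmre, karner_mmre)
# with roughly 18 degrees of freedom the two-sided 5 % critical value is
# about 2.10; |t| below it means H0 (equal true means) is retained
```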
Table 9 presents the two-sample t-test comparison of differences in means, where H0 means that the null hypothesis was accepted and HA means that the alternative hypothesis was accepted at the 95 % significance level. As can be seen, analytical programming outperformed the prediction by sample mean and also Karner's model; however, the difference in true means between analytical programming and the linear models is not significant at the chosen level.

Table 9. T-test comparison for two samples about difference in means

                     Mean   Karner's   S Lin. Reg.   S Lin. Reg. w Int.   M Lin. Reg.   AP
Mean                 -      H0         HA            HA                   HA            HA
Karner's             H0     -          HA            HA                   HA            HA
S Lin. Reg.          HA     HA         -             H0                   H0            H0
S Lin. Reg. w Int.   HA     HA         H0            -                    H0            H0
M Lin. Reg.          HA     HA         H0            H0                   -             H0
AP                   HA     HA         H0            H0                   H0            -
5 Discussion
The study set out with the goal of answering the three research questions outlined in the research objectives section. These questions are answered in the results section of this paper.
RQ-1: Can we use the sample mean for more accurate prediction?
This question is answered in the results section by comparing the MMREs of the individual methods. The mean MMRE of this method is 127 %, which can be seen as exceptionally worse than the other methods; the evidence for this statement can be seen in Table 9. The prediction by mean is comparable only with Karner's model with PF set to 20.
RQ-2: Are linear regression models more accurate than improved analytical programming?
This question is answered in the results section. We used 3 different linear regression models: a simple linear model, a simple linear model without intercept and a multiple linear model; these models are described in the experiment section. The simple linear model has a mean MMRE of 78 %, which can be seen as a significant improvement over the prediction by sample mean and Karner's model; however, the simple regression had an R² of only 0.49. On the other hand, the simple linear model without intercept had an MMRE of about 69 %, the same value as the multiple linear regression model; moreover, the simple linear model without intercept has an exceptionally better R² value (0.85). If we compare these results with the models derived by improved analytical programming, we can see that analytical programming yields very similar MMRE results. From Table 9, we can see that there is no evidence that the improved analytical programming generated better results at the significance level of 5 %; nevertheless, the overall MMREs are lower for analytical programming.
RQ-3: Is there evidence that the new models (linear regression or analytical programming) are more accurate than the original Karner's model?
Firstly, we must emphasize that Karner's model was not calibrated: we used the standard value of PF (20). From the comparisons in the tables presented in the results section and from Table 9, we can state that nearly every model outperformed the standard Karner's model; the only exception is the prediction by sample mean, which has the same true mean as Karner's model at the significance level of 5 %. If the PF and the whole UCP method are left at the default values, the model built by analytical programming may outperform the standard UCP equation. The reason for the exceptionally worse results of Karner's model may be that the PF needs to be set to a value of about 15; this follows from the simple linear regression without intercept, where β1 can be seen as a PF for Karner's model.
6 Threats to Validity
It is widely recognised that several factors can bias the validity of empirical
studies. Therefore, our results are not devoid of validity threats.
6.1 External validity
External validity questions whether the results can be generalized outside the specifications of a study [16]. Specific measures were taken to support external validity; for example, a 10-fold CV technique was used to draw samples from the population in order to conduct the experiments. Likewise, the statistical tests used in this paper are quite standard; we note that the t-test method features prominently. We used a relatively small dataset, which could be a significant threat to external validity; similarly, it is not clear whether a smaller or larger dataset would yield more reliable results. It is widely recognised that SEE datasets are neither easy to find nor easy to collect; this represents an important external validity threat that can be mitigated only by replicating the study on other datasets. Another validity issue is that neither the improved analytical programming nor the differential evolution has been exhaustively fine-tuned; therefore, future work is required to explore the parameters of these methods in order to use their best versions. The implementation of the improved analytical programming and differential evolution algorithms could also be a threat to external validity; moreover, the improved analytical programming implementation is quite new. Although we used standard implementations, there is a considerable amount of code, which could be a threat to validity.
6.2 Internal validity
Internal validity questions to what extent the cause-effect relationship between dependent and independent variables holds [17]. To assess the reliability of our study, we used 10-fold CV, which is a standard procedure for this kind of study.
7 Conclusion
The current study found that the prediction model generated by the analytical programming method can be seen as valid for effort estimation; however, the model generated by simple regression is as good as the model generated by analytical programming. The most interesting finding of this study is that Karner's model with PF set to 20 performed exceptionally worse than most of the models; the linear model without intercept shows that the PF value should be set to a number around 15. The findings of this study have a number of important implications for future research on the use of improved analytical programming as an effort estimation technique. More research is required to determine the efficiency of analytical programming for this task; it would also be interesting to compare improved analytical programming with other machine learning methods.
8 Acknowledgement
This study was supported by the internal grant of TBU in Zlin
No. IGA/FAI/2016/035 funded from the resources of specific university research.
References
1. J. W. Keung, “Theoretical Maximum Prediction Accuracy for Analogy-Based Software Cost Estimation,” Software Engineering Conference, 2008. APSEC ’08. 15th
Asia-Pacific, pp. 495–502, 2008.
2. B. W. Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. SE-10, pp. 4–21, Jan. 1984.
3. K. Atkinson and M. Shepperd, “Using Function Points to Find Cost Analogies,”
5th European Software Cost Modelling Meeting, Ivrea, Italy, pp. 1–5, 1994.
4. G. Karner, “Resource estimation for objectory projects,” Objective Systems SF
AB, pp. 1–9, 1993.
5. A. P. Subriadi and P. A. Ningrum, “Critical review of the effort rate value in
use case point method for estimating software development effort,” Journal of
Theoretical and Applied Information Technology, vol. 59, no. 3, pp. 735–744, 2014.
6. M. Ochodek, J. Nawrocki, and K. Kwarciak, “Simplifying effort estimation based
on Use Case Points,” Information and Software Technology, vol. 53, pp. 200–213,
mar 2011.
7. R. Silhavy, P. Silhavy, and Z. Prokopova, “Algorithmic Optimisation Method for
Improving Use Case Points Estimation,” PLOS ONE, vol. 10, nov 2015.
8. A. B. Nassif, L. F. Capretz, and D. Ho, “Estimating Software Effort Based on Use
Case Point Model Using Sugeno Fuzzy Inference System,” Tools with Artificial
Intelligence (ICTAI), 2011 23rd IEEE International Conference on, pp. 393–398,
2011.
9. E. Kocaguneli, T. Menzies, and J. W. Keung, “On the Value of Ensemble Effort
Estimation,” IEEE Transactions on Software Engineering, vol. 38, no. 6, pp. 1403–
1416, 2012.
10. A. Kaushik, A. K. Soni, and R. Soni, “An adaptive learning approach to software cost estimation,” Computing and Communication Systems (NCCCS), 2012
National Conference on, pp. 1–6, nov 2012.
11. I. Attarzadeh and S. Ow, “Software development cost and time forecasting using
a high performance artificial neural network model,” Intelligent Computing and
Information Science, pp. 18–26, 2011.
12. T. Urbanek, Z. Prokopova, R. Silhavy, and V. Vesela, “Prediction accuracy measurements as a fitness function for software effort estimation,” SpringerPlus, 2015.
13. I. Zelinka, D. Davendra, R. Senkerik, R. Jasek, and Z. Oplatkova, Analytical
programming-a novel approach for evolutionary synthesis of symbolic structures.
Rijeka: InTech, 2011.
14. Z. K. Oplatkova, R. Senkerik, I. Zelinka, and M. Pluhacek, “Analytic programming
in the task of evolutionary synthesis of a controller for high order oscillations stabilization of discrete chaotic systems,” Computers & Mathematics with Applications,
vol. 66, pp. 177–189, aug 2013.
15. T. Urbanek, Z. Prokopova, R. Silhavy, and A. Kuncar, “New Approach of Constant
Resolving of Analytical Programming,” 30th European Conference on Modeling and
Simulation, pp. 231–236, 2016.
16. D. Milicic and C. Wohlin, “Distribution patterns of effort estimations,” IEEE Conference Proceedings of Euromicro 2004, Track on Software Process and Product
Improvement, pp. 422–429, 2004.
17. Y. Bastanlar and M. Ozuysal, “Introduction to machine learning,” Methods in
molecular biology (Clifton, N.J.), vol. 1107, pp. 105–28, 2014.