Comparing Linear Regression and Other Models with an Improved Analytical Programming Model for Software Effort Estimation

Tomas Urbanek, Zdenka Prokopova, Radek Silhavy and Ales Kuncar
Faculty of Applied Informatics, Tomas Bata University in Zlin, Nad Stranemi 4511, Czech Republic
turbanek@fai.utb.cz

Abstract. This paper compares the most common regression models with improved analytical programming for accurate effort estimation. We used several models, for example simple linear models, multiple linear models, Karner's model and others, and compared them with models generated by improved analytical programming. The study uses 10-fold cross-validation (CV) to assess reliability, together with standard statistical methods. The MMRE measure is used to compare all models. The experimental results show that linear regression and improved analytical programming techniques play a major role in the prediction of effort in software engineering. All results were evaluated by a standard approach: visual inspection and statistical significance testing.

Keywords: analytical programming, linear regression, effort estimation, use case points

1 Introduction

Effort estimation is defined as the activity of predicting the amount of effort required to complete the development of a software project [1]. Despite many attempts by scientists and software engineers, there is still no method that is optimal and effective for every software project.

A common way to improve effort estimation is to enhance the algorithmic methods. Algorithmic methods use a mathematical formula for prediction and typically depend on historical data. The most common examples are COCOMO [2], FP [3] and, last but not least, UCP [4], although other algorithmic methods exist. It is essential that effort estimates are produced in the early stages of the software development cycle; in the best case, they are already known during requirement analysis [4]. Accurate and reliable effort estimates are a crucial factor for a proper development cycle, since they are used for effective planning, monitoring and controlling of the software development process.

The prediction of effort in software engineering is a complex and complicated process, mainly because many factors influence the final prediction. One of the most substantial is the human factor. For this reason, artificial intelligence could compensate for the prediction error introduced by the software engineer, and its use is nowadays very common in this research area.

Some work has been done to enhance effort estimation based on the Use Case Points method. These enhancements cover the review and calibration of the productivity factor, such as the work of Subriadi et al. [5]. Another enhancement is the construction investigation and simplification of the Use Case Points method presented by Ochodek et al. [6]. The recent work of Silhavy et al. [7] suggests a new approach, "automatic complexity estimation based on requirements", which is partly based on the Use Case Points method. Another approach uses a fuzzy inference system to improve the accuracy of the Use Case Points method [8]. Very promising is the research of Kocaguneli et al. [9], which shows that an ensemble of effort estimation methods can provide better results than a single estimator. The works of Kaushik et al. [10] and Attarzadeh et al. [11] use neural networks and the COCOMO [2] method for prediction.
In this article, we investigate the efficiency of several models: prediction by the sample mean, Karner's model, simple linear regression, simple linear regression without intercept, multiple linear regression, and models produced by improved analytical programming. Improved analytical programming can be seen as a regression function generator. Recently, we published a study that examined the selection of a fitness function for analytical programming and found that the best fitness function is the MSE, alongside the very common MMRE [12]. To the best of our knowledge, no previous study has investigated the comparison of such models, especially with improved analytical programming, when the Use Case Points method and k-fold cross-validation are used. Therefore, this study makes a major contribution to research on the Use Case Points method when analytical programming and linear regression models are used.

1.1 The Use Case Points Method

This effort estimation method was presented in 1993 by Gustav Karner [4]. It is based on a similar principle to the function point method: project managers estimate the project parameters in four tables. Given the aims of this paper, a detailed description of the well-known Use Case Points method is omitted; please refer to [4], [12] for a more detailed description.
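Although the detailed description is omitted above, the standard UCP formulation from [4], which also underlies Karner's model in Section 3.2, can be sketched as follows. This is a minimal illustration in Python; the function name and the example values are ours, and PF is the productivity factor that Karner sets to 20 man-hours per point.

```python
def ucp_effort(uucw: float, uaw: float, tcf: float, ecf: float,
               pf: float = 20.0) -> float:
    """Karner's Use Case Points estimate (standard formulation, cf. [4]).

    UUCW and UAW are the unadjusted use-case and actor weights, TCF and
    ECF the technical and environmental complexity factors, and PF the
    productivity factor in man-hours per Use Case Point.
    """
    ucp = (uucw + uaw) * tcf * ecf  # adjusted Use Case Points
    return ucp * pf                 # effort estimate in man-hours

# Hypothetical project: UUCW=300, UAW=12, TCF=1.0, ECF=0.9
print(ucp_effort(300, 12, 1.0, 0.9))  # -> 5616.0 man-hours
```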
1.2 Improved Analytical Programming Algorithm

Analytical programming (AP) is a symbolic regression method. Its core is a set of functions and operands; these mathematical objects are used for the synthesis of a new function. Every function in the AP core set has its own number of parameters, and the functions are sorted by this number into General Function Sets (GFS). For example, GFS_1par contains the functions that take only one parameter, e.g. sin(), cos(), and others. AP must be driven by an evolutionary algorithm that maintains a population of individuals [13], [14]. In this paper, differential evolution (DE) is used as the evolutionary algorithm for analytical programming. We also utilize the new improved analytical programming technique; its most important advantage for our application is the automatic constant-resolving procedure. This algorithm was presented in the article by Urbanek et al. [15].

2 Research Objectives

We compared linear regression models with improved analytical programming models for effort estimation. The research questions of our study can be outlined as follows:

– RQ-1: Can we use the sample mean for more accurate prediction?
– RQ-2: Are linear regression models more accurate than improved analytical programming?
– RQ-3: Is there evidence that the new models (linear regression or analytical programming) are more accurate than the original Karner's model?

The first research question (RQ-1) aims to gain insight into the dataset used in this research. We examine the dataset and then use the sample mean to produce predictions; the MMRE is then calculated for comparison against the other models. The second research question (RQ-2) concerns the production of linear regression models, for which the MMRE is also calculated. Another task in this question is to produce a model by the improved analytical programming technique. All of these models are compared by the MMRE measure. To address research question (RQ-3), we experimented with the built models as reported and discussed in the experiment section. To assess the evidence of statistical properties, we used exploratory statistical analysis and hypothesis testing.

3 Experiment

For all models, we used the 10-fold cross-validation method to assess the reliability of our research. In our experiment, we build several prediction models: sample mean prediction, Karner's model, a simple linear model, a multiple linear model and a model built by analytical programming. For all models, we calculate the MMRE, which is chosen as the criterion for model comparison.

3.1 Prediction by sample mean

For each fold (10-fold CV), we calculate the sample mean x̄ on the training data. This sample mean is then used as the prediction on the testing fold, and the MMRE is calculated for comparison.

3.2 Karner's model

For each fold (10-fold CV), we calculate predictions on the testing set and then the MMRE for comparison. For Karner's model, we used the standard value of PF, which is set to 20.

3.3 Linear Regression

In this research, we present three linear regression models: simple linear regression, simple linear regression without intercept, and multiple linear regression. Simple linear regression is given by equations (1) and (2):

ŷ = β0 + β1x    (1)

where ŷ is the prediction (dependent variable), β0 is the intercept, x is the number of Use Case Points (UCP), and β1 can be seen as the productivity factor of the Use Case Points method.

ŷ = β1x    (2)

Multiple linear regression is represented by equation (3):

ŷ = β0 + β1·UUCW + β2·UAW + β3·TCF + β4·ECF    (3)

where β1, β2, β3, β4 are the estimated coefficients for each Use Case Points parameter. For all of these linear regression models, we used 10-fold cross-validation, and the MMRE is calculated for each of them for comparison.
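To make the comparison concrete, the three regression models of equations (1)-(3) together with the MMRE criterion can be sketched as follows. This is a minimal sketch assuming scikit-learn, the standard MMRE definition (the paper does not spell out the formula), and our own array names and shuffling seed; the paper's exact fold assignment is not documented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def mmre(y_true, y_pred):
    """Mean Magnitude of Relative Error in percent (standard definition)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / y_true)

def cv_mmre(model, X, y, n_splits=10, seed=1):
    """Mean test-fold MMRE of a regression model under k-fold CV."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in folds.split(X):
        model.fit(X[train], y[train])
        scores.append(mmre(y[test], model.predict(X[test])))
    return float(np.mean(scores))

# X_ucp: (n, 1) matrix of total UCP values; X_full: (n, 4) matrix of
# UUCW, UAW, TCF, ECF; y: actual effort -- stand-ins for the dataset.
# eq. (1): cv_mmre(LinearRegression(), X_ucp, y)
# eq. (2): cv_mmre(LinearRegression(fit_intercept=False), X_ucp, y)
# eq. (3): cv_mmre(LinearRegression(), X_full, y)
```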
3.4 Analytical Programming

The analytical programming experiment is depicted in Figure 1. We used 10-fold cross-validation; one equation was generated in each loop and then verified against the rest of the dataset. The process begins with a cycle that loops through the folds. In the data preparation step, 10-fold cross-validation splits the dataset into two distinct sets. In the inner loop, the differential evolution process generates an initial population, which analytical programming uses to synthesize a new function. The new function is then evaluated by the MMRE. Once the termination condition is met, one can assume that an optimal predictive model has been found, and this model is evaluated by calculating the MMRE on the testing set.

Fig. 1. Diagram of the proposed experiment.

Table 1 shows the analytical programming set-up. The number of leaves (functions built by analytical programming can be seen as trees) was set to 20, which can be recognized as a relatively high value. However, the goal is to find a model that is more accurate than the other models; there is no need to generate short and easily memorable models, but rather more accurate ones.

Table 1. Set-up of analytical programming

Parameter          Value
Number of leaves   20
GFS - functions    Plus, Subtract, Divide, Multiply, Sin, Cos, Power, Sqrt
GFS - constants    UUCW, UAW, TCF, ECF, K
Constant K range   0-10

The functions were chosen according to the assumed non-linearity of the dataset. The constant K range was set to 0-10; this range was tested in our previous experiment, where these particular values gave the best results. Table 2 shows the set-up of differential evolution; the best set-up of differential evolution is the subject of further research.

Table 2. Set-up of differential evolution

Parameter     Value
NP            45
Generations   100
F             0.2
Cr            0.8

Fitness Function

A model built by the analytical programming method can contain the following parameters: UUCW, UAW, TCF and ECF; it does not have to contain all of them. Equation (4) is used for the optimization task. The closer the LAD result is to zero, the higher the accuracy of the proposed model:

LAD = Σ_{i=1}^{n} |yi − ŷi|    (4)

where n is the number of projects in the training set, ŷi is the prediction and yi is the actual effort.
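The automatic constant-resolving step can be illustrated by minimizing the LAD of equation (4) over the constants of one fixed candidate structure. The sketch below uses SciPy's differential evolution, whose population sizing differs from the plain DE of Table 2, and a hypothetical candidate in the spirit of Table 8; it is not the paper's implementation.

```python
import numpy as np
from scipy.optimize import differential_evolution

def lad(consts, model, X, y):
    """Equation (4): sum of absolute deviations of a candidate model."""
    return np.sum(np.abs(y - model(X, consts)))

def candidate(X, consts):
    """Hypothetical AP-like structure with one constant: y = K * UUCW."""
    return consts[0] * X[:, 0]

def resolve_constants(model, X, y, n_consts=1):
    """Resolve constants K in [0, 10] (Table 1) by DE minimizing LAD;
    mutation (F) and recombination (Cr) loosely mirror Table 2."""
    result = differential_evolution(
        lad, bounds=[(0.0, 10.0)] * n_consts, args=(model, X, y),
        maxiter=100, mutation=0.2, recombination=0.8, seed=1)
    return result.x, result.fun  # resolved constants, training LAD
```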
3.5 Dataset

The dataset for this study was collected using document reviews and contributions from software companies. It contains 143 distinct software projects, with 5 values for each project: UUCW, UAW, TCF, ECF and actual effort. The distribution of each parameter in the dataset, together with the correlation coefficients, can be seen in Figure 2. Nearly all parameters are normally distributed; the only exception is the actual effort, which is skewed, with most of its mass at lower values and a long tail of larger projects. We can also see a considerably high correlation between actual effort and the UUCW parameter; this relationship is used to build the linear regression models. Figure 3 shows the distribution of actual effort in the dataset. The majority of the software projects were completed in between 1800 and about 5000 man-hours, and the skewness, also visible in the density plot of Figure 2, means that the actual effort is not normally distributed. The mean actual effort is 3565 man-hours.

Fig. 2. The distribution of each parameter in the dataset (scatter-plot matrix of UUCW, UAW, TCF, ECF and actual effort with pairwise correlation coefficients; Corr(UUCW, Actual) = 0.691).

Fig. 3. The distribution of actual efforts in the dataset (histogram; Count [-] over Actual [man/hour]).

4 Results

In this section, we present the results of our study. All calculations were performed by 10-fold cross-validation on 143 software projects.

4.1 Prediction by sample mean

Table 3 shows the results of the prediction by sample mean. As can be seen, the sample mean for this dataset is 3565.3 man-hours; the MMRE for each fold is listed in the table. The mean MMRE of these predictive models is 127 %. The 5th fold shows the lowest error and the 6th fold the greatest.

Table 3. Results from computed prediction by sample mean

Fold    x̄         MMRE [%]
1       3632.16    115
2       3549.98    132
3       3525.31    122
4       3631.45    139
5       3543.08     55
6       3535.79    248
7       3463.54     89
8       3573.16     99
9       3576.41    121
10      3622.15    153
Mean    3565.30    127

4.2 Karner's model

Table 4 shows the results of the computed Karner's model, which used PF set to 20. The mean MMRE of this predictive model is 96 %. Again, the 5th fold shows the lowest error and the 6th fold the greatest.

Table 4. Results from computed Karner's model

Fold    MMRE [%]
1       119
2        94
3        93
4       117
5        43
6       122
7        78
8       100
9        76
10      115
Mean     96
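The two baselines above can be reproduced along the following lines, reusing an MMRE helper as in the earlier sketch and assuming X is the (143, 4) matrix of UUCW, UAW, TCF, ECF and y the actual effort; our fold assignment and seed are arbitrary, so the per-fold numbers will differ from Tables 3 and 4.

```python
import numpy as np
from sklearn.model_selection import KFold

mmre = lambda yt, yp: 100.0 * np.mean(np.abs(yt - yp) / yt)  # in percent

def baseline_mmres(X, y, pf=20.0, n_splits=10, seed=1):
    """Per-fold MMREs of the sample-mean baseline (Section 3.1) and of
    Karner's model Effort = UCP * PF with PF = 20 (Section 3.2)."""
    mean_scores, karner_scores = [], []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        x_bar = y[train].mean()  # training-fold sample mean
        mean_scores.append(mmre(y[test], np.full(len(test), x_bar)))
        ucp = (X[test, 0] + X[test, 1]) * X[test, 2] * X[test, 3]
        karner_scores.append(mmre(y[test], pf * ucp))
    return mean_scores, karner_scores
```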
4.3 Simple linear model

Table 5 shows the models derived by simple linear regression (equation 1). The R² is about 0.5 and the mean MMRE of this predictive model is 78 %. The 5th fold shows the lowest error and the 6th fold the greatest.

Table 5. Models derived by simple linear regression (equation 1)

Fold    Intercept   β1      R²     MMRE [%]
1       568.15      13.45   0.50    90
2       640.26      13.13   0.47    74
3       602.52      13.02   0.50    80
4       656.39      13.15   0.49    91
5       363.07      14.29   0.50    30
6       589.50      13.21   0.45   114
7       651.37      12.57   0.49    68
8       533.68      13.46   0.51    80
9       600.84      13.35   0.48    60
10      500.54      13.73   0.50    91
Mean    570.63      13.34   0.49    78

4.4 Simple linear model without intercept

Table 6 shows the models derived by simple linear regression without intercept (equation 2). The mean PF is 15.33, the R² is about 0.85 and the mean MMRE is 69 %. The 5th fold shows the lowest error and the 6th fold the greatest.

Table 6. Models derived by simple linear regression without intercept (equation 2)

Fold    PF      R²     MMRE [%]
1       15.41   0.86    85
2       15.40   0.85    62
3       15.09   0.85    72
4       15.40   0.85    83
5       15.58   0.85    29
6       15.32   0.84    88
7       14.84   0.85    65
8       15.29   0.85    75
9       15.46   0.84    48
10      15.47   0.85    82
Mean    15.33   0.85    69

4.5 Multiple linear model

Table 7 shows the models derived by multiple linear regression. The R² is about 0.54 and the mean MMRE is 69 %. The 5th fold shows the lowest error and the 4th fold the greatest.

Table 7. Models derived by multiple linear regression

Fold    β0         β1      β2        β3        β4        R²     MMRE [%]
1       -4278.38   15.50   -123.24   3625.87   2076.69   0.58    86
2       -3722.15   14.14   -103.45   2545.90   2638.47   0.52    59
3       -4374.94   13.76    -90.31   3411.71   2381.76   0.54    66
4       -3948.38   14.28    -75.13   3127.86   1941.63   0.55    89
5       -4423.32   14.71    -99.29   3349.90   2357.37   0.54    25
6       -3961.68   14.12    -96.87   3024.78   2303.34   0.51    72
7       -3678.35   13.55    -59.25   2486.13   2212.82   0.54    73
8       -4536.31   14.35   -101.36   3404.54   2565.34   0.56    75
9       -3966.67   14.52    -81.52   2764.32   2371.26   0.52    55
10      -4326.92   14.99    -94.16   2976.17   2527.32   0.55    87
Mean    -4121.71   14.35    -92.46   3071.72   2337.60   0.54    69

4.6 Analytical programming

Table 8 shows the models derived by improved analytical programming; the mean MMRE is 67 %. The improved analytical programming algorithm mostly derived simple models; only a couple of models contain non-linearities with the sin, cos, pow and sqrt functions. The function ŷ = k·UUCW, where k is a coefficient generated by analytical programming, was derived in 4 cases. The 5th fold shows the lowest error and the 6th fold the greatest.

Table 8. Models derived by improved analytical programming

Fold    Model                                                           MMRE [%]
1       times(UUCW,plus(plus(ECF,ECF),UAW))                              78
2       times(UUCW,9.73)                                                 61
3       times(plus(UUCW,8.37),9.94)                                      67
4       times(UUCW,9.95)                                                 71
5       times(plus(minus(5.80,TCF),ECF),
          plus(plus(UUCW,divide(pow(ECF,ECF),divide(TCF,UUCW))),
          cos(minus(UAW,8.09))))                                         30
6       divide(plus(sqrt(plus(sqrt(4.22),ECF)),UUCW),
          divide(times(cos(cos(4.88)),sqrt(TCF)),
          times(4.03,sqrt(UAW))))                                        99
7       times(UUCW,9.55)                                                 71
8       minus(times(plus(plus(pow(cos(ECF),
          times(UAW,ECF)),TCF),9.31),UUCW),
          sin(sin(times(pow(UAW,ECF),UUCW))))                            59
9       times(UUCW,plus(7.90,sqrt(8.97)))                                67
10      times(9.65,UUCW)                                                 65
Mean                                                                     67

4.7 Comparison

This section provides evidence of the statistical performance of each chosen method.

Fig. 4. Box plot comparing the MMREs of each method (Mean, Karner's, S Lin. Reg., S Lin. Reg. w Int., M Lin. Reg., AP; MMRE [%] on the vertical axis, Model [-] on the horizontal axis).

Figure 4 depicts box plots comparing the MMREs of each method.
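A box plot such as Figure 4 can be generated along these lines; mmre_by_method is a hypothetical container holding the ten per-fold MMRE values of each model (the last columns of Tables 3-8).

```python
import matplotlib.pyplot as plt

def plot_mmre_boxes(mmre_by_method):
    """One box per model, per-fold MMREs on the vertical axis (cf. Fig. 4)."""
    fig, ax = plt.subplots()
    ax.boxplot(list(mmre_by_method.values()),
               labels=list(mmre_by_method.keys()))
    ax.set_xlabel("Model [-]")
    ax.set_ylabel("MMRE [%]")
    plt.show()
```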
We can see that the worst results were produced by the prediction by sample mean and by Karner's model with PF set to 20. The linear regression models and the improved analytical programming method performed similarly, although improved analytical programming shows very low variance compared with the linear regression models. Some outliers, depicted as points, can also be seen in this box plot. Two-sample t-tests were conducted on the MMRE values to test, for each pair of methods, the null hypothesis of no difference in true means at the 5 % significance level:

H0: µ1 = µ2
HA: µ1 ≠ µ2

Table 9 presents the two-sample t-test comparison of differences in means, where H0 means that the null hypothesis was accepted and HA means that the alternative hypothesis was accepted at the 5 % significance level.

Table 9. T-test comparison for two samples about difference in means

                    Mean  Karner's  S Lin. Reg.  S Lin. Reg. w Int.  M Lin. Reg.  AP
Mean                -     H0        HA           HA                  HA           HA
Karner's            H0    -         HA           HA                  HA           HA
S Lin. Reg.         HA    HA        -            H0                  H0           H0
S Lin. Reg. w Int.  HA    HA        H0           -                   H0           H0
M Lin. Reg.         HA    HA        H0           H0                  -            H0
AP                  HA    HA        H0           H0                  H0           -

As can be seen, analytical programming outperformed the prediction by sample mean and also Karner's model. However, at the chosen significance level, there is no difference in true means between analytical programming and the linear models.
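The pairwise testing behind Table 9 can be sketched as follows. The paper does not state whether equal variances were assumed, so SciPy's default pooled-variance two-sample t-test stands in here; each argument is a vector of ten per-fold MMREs.

```python
from scipy.stats import ttest_ind

def compare_methods(mmre_a, mmre_b, alpha=0.05):
    """Two-sample t-test on per-fold MMREs of two methods; returns 'HA'
    when equal true means are rejected at the 5 % level, else 'H0'."""
    _, p_value = ttest_ind(mmre_a, mmre_b)
    return "HA" if p_value < alpha else "H0"
```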
5 Discussion

The study set out to answer the three research questions outlined in the research objectives section; they are answered in the results section of this paper.

RQ-1: Can we use the sample mean for more accurate prediction? This question is answered in the results section by comparing the MMREs of the methods. The mean MMRE of this method is 127 %, which is exceptionally worse than the other methods; the evidence for this statement can be seen in Table 9. The prediction by mean is comparable only with Karner's model with PF set to 20.

RQ-2: Are linear regression models more accurate than improved analytical programming? We used 3 different linear regression models, described in the experiment section: a simple linear model, a simple linear model without intercept and a multiple linear model. The simple linear model has a mean MMRE of 78 %, which is a significant improvement over the prediction by sample mean and Karner's model; however, its R² is only 0.49. The simple linear model without intercept has an MMRE of about 69 %, the same value as the multiple linear regression model, but an exceptionally better R² (0.85). Comparing these results with the model derived by improved analytical programming, we see that analytical programming yields a very similar MMRE. From Table 9, there is no evidence at the 5 % significance level that improved analytical programming generated better results; nevertheless, the overall MMREs are lower for analytical programming.

RQ-3: Is there evidence that the new models (linear regression or analytical programming) are more accurate than the original Karner's model? Firstly, we must emphasize that Karner's model is not calibrated and that we used the standard value of PF (20). From the tables presented in the results section and from Table 9, we can state that nearly every model outperformed the standard Karner's model; the only exception is the prediction by sample mean. The sample mean model and Karner's model have the same true mean at the 5 % significance level. If the PF and the whole UCP method are left at their default values, there is a possibility that the model built by analytical programming outperforms the standard UCP equation. The reason for the exceptionally worse results of Karner's model may be that the PF needs to be set to a value of about 15; this follows from the simple linear regression without intercept, where β1 can be seen as a PF for Karner's model.

6 Threats to Validity

It is widely recognised that several factors can bias the validity of empirical studies; therefore, our results are not devoid of validity threats.

6.1 External validity

External validity questions whether the results can be generalized outside the specifications of a study [16]. Specific measures were taken to support external validity; for example, a 10-fold CV technique was used to draw samples from the population for the experiments. Likewise, the statistical tests used in this paper, notably the t-test, are quite standard. We used a relatively small dataset, which could be a significant threat to external validity, and we cannot say whether a smaller or larger dataset would yield different results. It is widely recognised that SEE datasets are neither easy to find nor easy to collect; this represents an important external validity threat that can be mitigated only by replicating the study on other datasets. Another validity issue is that neither improved analytical programming nor differential evolution has been exhaustively fine-tuned; future work is required to explore the parameters of these methods in order to use their best versions. The implementation of the improved analytical programming and differential evolution algorithms could also be a threat to external validity. The improved analytical programming implementation is quite new, and although we used standard implementations, there is a considerable amount of code, which could be a threat to validity.

6.2 Internal validity

Internal validity questions to what extent the cause-effect relationship between dependent and independent variables holds [17]. To add reliability to our study, we used 10-fold CV, which is a standard procedure for this kind of study.

7 Conclusion

The current study found that the prediction model generated by the analytical programming method can be seen as valid for effort estimation. However, the model generated by simple regression is as good as the model generated by analytical programming. The most interesting finding of this study is that Karner's model with PF set to 20 performed exceptionally worse than most of the models; the linear model without intercept shows that the PF value should be set to a number around 15. The findings of this study have a number of important implications for future research on the use of improved analytical programming as an effort estimation technique. More research is required to determine the efficiency of analytical programming for this task; it would also be interesting to compare improved analytical programming with other machine learning methods.

8 Acknowledgement

This study was supported by the internal grant of TBU in Zlin No. IGA/FAI/2016/035, funded from the resources of specific university research.

References
1. J. W. Keung, "Theoretical Maximum Prediction Accuracy for Analogy-Based Software Cost Estimation," in Proceedings of the 15th Asia-Pacific Software Engineering Conference (APSEC '08), pp. 495–502, 2008.
2. B. W. Boehm, "Software Engineering Economics," IEEE Transactions on Software Engineering, vol. SE-10, no. 1, pp. 4–21, Jan. 1984.
3. K. Atkinson and M. Shepperd, "Using Function Points to Find Cost Analogies," in Proceedings of the 5th European Software Cost Modelling Meeting, Ivrea, Italy, pp. 1–5, 1994.
4. G. Karner, "Resource estimation for objectory projects," Objective Systems SF AB, pp. 1–9, 1993.
5. A. P. Subriadi and P. A. Ningrum, "Critical review of the effort rate value in use case point method for estimating software development effort," Journal of Theoretical and Applied Information Technology, vol. 59, no. 3, pp. 735–744, 2014.
6. M. Ochodek, J. Nawrocki, and K. Kwarciak, "Simplifying effort estimation based on Use Case Points," Information and Software Technology, vol. 53, pp. 200–213, Mar. 2011.
7. R. Silhavy, P. Silhavy, and Z. Prokopova, "Algorithmic Optimisation Method for Improving Use Case Points Estimation," PLOS ONE, vol. 10, Nov. 2015.
8. A. B. Nassif, L. F. Capretz, and D. Ho, "Estimating Software Effort Based on Use Case Point Model Using Sugeno Fuzzy Inference System," in Proceedings of the 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 393–398, 2011.
9. E. Kocaguneli, T. Menzies, and J. W. Keung, "On the Value of Ensemble Effort Estimation," IEEE Transactions on Software Engineering, vol. 38, no. 6, pp. 1403–1416, 2012.
10. A. Kaushik, A. K. Soni, and R. Soni, "An adaptive learning approach to software cost estimation," in Proceedings of the 2012 National Conference on Computing and Communication Systems (NCCCS), pp. 1–6, Nov. 2012.
11. I. Attarzadeh and S. Ow, "Software development cost and time forecasting using a high performance artificial neural network model," in Intelligent Computing and Information Science, pp. 18–26, 2011.
12. T. Urbanek, Z. Prokopova, R. Silhavy, and V. Vesela, "Prediction accuracy measurements as a fitness function for software effort estimation," SpringerPlus, 2015.
13. I. Zelinka, D. Davendra, R. Senkerik, R. Jasek, and Z. Oplatkova, Analytical Programming - a Novel Approach for Evolutionary Synthesis of Symbolic Structures. Rijeka: InTech, 2011.
14. Z. K. Oplatkova, R. Senkerik, I. Zelinka, and M. Pluhacek, "Analytic programming in the task of evolutionary synthesis of a controller for high order oscillations stabilization of discrete chaotic systems," Computers & Mathematics with Applications, vol. 66, pp. 177–189, Aug. 2013.
15. T. Urbanek, Z. Prokopova, R. Silhavy, and A. Kuncar, "New Approach of Constant Resolving of Analytical Programming," in Proceedings of the 30th European Conference on Modelling and Simulation, pp. 231–236, 2016.
16. D. Milicic and C. Wohlin, "Distribution patterns of effort estimations," in IEEE Conference Proceedings of Euromicro 2004, Track on Software Process and Product Improvement, pp. 422–429, 2004.
17. Y. Baştanlar and M. Özuysal, "Introduction to machine learning," Methods in Molecular Biology, vol. 1107, pp. 105–128, 2014.