MSc Software Maintenance


16/04/2020


Lectures 43 and 44

Estimating Effort for Corrective Software Maintenance

Dr Andy Brooks

Case Study


Reference

Effort Estimation for Corrective Software Maintenance, Andrea De Lucia, Eugenio Pompella, and Silvio Stefanucci, The Fourteenth International Conference on Software Engineering and Knowledge Engineering (SEKE'02), pp. 409-416, 2002. ©ACM


1. Introduction

• Effort estimation helps managers:

– plan resource and staff allocation

– prepare less risky bids for external contracts

– make maintain versus buy decisions

• Effort estimation is complicated by:

– the different types of software maintenance

• corrective, adaptive, perfective, preventive

– the scope of software maintenance work

• simple method fixes through to full reengineering



• Effort estimation requires the use of quantitative metrics.

• Software maintenance costs are mainly human resource costs.

– the person-days needed

• A linear or non-linear relationship between complexity/size and effort is “commonly assumed”.
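Two generic forms of this assumed relationship, written out for illustration (these particular equations are illustrative, not taken from the case study):

```latex
% E = effort, S = a size/complexity measure (illustrative forms)
E = \beta_0 + \beta_1 S
\qquad\text{or}\qquad
E = a\,S^{b}
```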


2. Related Work

Estimation by analogy

a simple fictitious example by Andy

• The following historical data is available:

– Project A involved 100 maintenance requests for 110,000 LOC and took 25 person-days.

– Project B involved 105 maintenance requests for 111,000 LOC and took 28 person-days.

– Project C involved 20 maintenance requests for 2,000 LOC and took 2 person-days.

• Project D will involve 85 maintenance requests on 91,000 LOC, so how much effort is required?

• Project A is the closest match, so the effort expended for Project A can be used as an estimate for Project D: 25 person-days.



Shepperd, M., Schofield, C., and Kitchenham, B., Effort Estimation Using Analogy, Proceedings of the International Conference on Software Engineering (ICSE '96), ©IEEE, 1996, pp. 170-178.

• The first step is deciding on the variables used to describe projects.

– “all datasets had at least one variable that was in some sense size related”

• The second step is deciding on how to determine similarity.

– “Analogies are found by measuring Euclidean distance in n-dimensional space where each dimension corresponds to a variable. Values are standardised so that each dimension contributes equal weight to the process of finding analogies.”

ArchANGEL tool here: http://dec.bournemouth.ac.uk/ESERG/ANGEL/


For two points (x₁, y₁) and (x₂, y₂):

Euclidean distance is √((x₁ − x₂)² + (y₁ − y₂)²)

Manhattan distance is |x₂ − x₁| + |y₂ − y₁|

• “In N dimensions, the Euclidean distance between two points p and q is √(∑ᵢ₌₁ᴺ (pᵢ − qᵢ)²), where pᵢ (or qᵢ) is the coordinate of p (or q) in dimension i.”

– http://www.nist.gov/dads/



• The third step is deciding how to use known effort data to derive an effort estimate for the new project.

– just use the effort for the closest project?

– average the effort for the X closest projects?

– average the effort for the X closest projects weighting by closeness of matching?

• Shepperd et al. used X = 2 and an unweighted average.
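A minimal sketch in Python of the three steps applied to Andy's fictitious Projects A-D (the variable names and the min-max standardisation are illustrative choices, not details of Shepperd et al.'s ANGEL tool):

```python
import math

# Historical projects: (name, maintenance requests, LOC, effort in person-days)
history = [("A", 100, 110_000, 25),
           ("B", 105, 111_000, 28),
           ("C", 20, 2_000, 2)]
project_d = (85, 91_000)  # maintenance requests, LOC

# Step 1: describe projects by size-related variables.
features = [(req, loc) for _, req, loc, _ in history]

# Step 2: standardise each dimension so it contributes equal weight,
# then measure Euclidean distance to the new project.
def standardise(column):
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

columns = list(zip(*(features + [project_d])))
rows = list(zip(*(standardise(c) for c in columns)))
*hist_rows, d_row = rows

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

ranked = sorted((euclidean(row, d_row), effort, name)
                for row, (name, _, _, effort) in zip(hist_rows, history))

# Step 3: unweighted average of the X = 2 closest analogies.
estimate = (ranked[0][1] + ranked[1][1]) / 2
print("closest analogies:", ranked[0][2], ranked[1][2])
print("estimated effort:", estimate, "person-days")
```

With X = 2 the two closest analogies are Projects A and B, giving 26.5 person-days rather than the 25 person-days obtained earlier from the single closest analogy.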



• Effort estimation using analogy was found to outperform traditional algorithmic methods for six different datasets.

– later studies, however, did not support this finding

• Shepperd et al. suggest it is better to use more than one estimation technique, to assess the degree of risk associated with a prediction.

– if effort estimation using regression analysis and analogy strongly disagree, then perhaps any estimation is unsafe

– Andy says: in industrial projects, it is unlikely resources are available to apply more than one technique


3. Experimental Setting

• Multiple linear regression analysis was applied to real data from five corrective maintenance projects from different companies.

– All five corrective maintenance projects were outsourced to one supplier company whose maintenance process closely followed the IEEE Standard for Software Maintenance.

• The data set comprised 144 observations corresponding to monthly maintenance periods.


Missing Data Techniques

Treatment of missing values

• If a value is missing, one approach is simply to exclude the entire observation [effort, size, NA, NB, NC] from the model building process.

– the safest approach

• Another approach is to substitute the mean or the median value calculated from the other observations.

• Yet another approach is to find the most similar observation and use the value found there.

– best analogy found by calculating Euclidean distances
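A minimal sketch of the three treatments in Python; the observations and field names are invented placeholders, with None marking a missing value:

```python
import math
import statistics

# Invented observations: effort plus task counts NA, NB, NC.
obs = [{"effort": 40, "NA": 5,    "NB": 30, "NC": 25},
       {"effort": 55, "NA": 8,    "NB": 41, "NC": 30},
       {"effort": 48, "NA": None, "NB": 35, "NC": 28}]  # NA missing

# Approach 1 (safest): exclude any observation with a missing value.
complete = [o for o in obs if None not in o.values()]

# Approach 2: substitute the mean (or median) of the other observations.
known = [o["NA"] for o in obs if o["NA"] is not None]
mean_fill = statistics.mean(known)        # or statistics.median(known)

# Approach 3: copy the value from the most similar complete observation,
# with similarity measured by Euclidean distance over the known fields.
def distance(a, b, keys):
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

target = obs[2]
keys = [k for k, v in target.items() if v is not None]
analogy_fill = min(complete, key=lambda o: distance(o, target, keys))["NA"]

print(mean_fill, analogy_fill)
```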

• Fortunately, the data set did not contain missing values.



Data available

• Size of the system.

• Effort spent in the maintenance period.

• Number of maintenance tasks by type:

– type A: source code modification

– type B: fixing data misalignments through database queries

• data cleansing

– type C: (not A or B) user disoperation, problems out of contract, etc.

• Other metrics such as software complexity were not available in full across all the projects.



Table 1: Collected Metrics

©ACM



Table 2: Descriptive statistics

©ACM

144 observations, monthly maintenance periods

1960/(35 hrs × 4 wks) = 14 person-months

4. Building Effort Estimation Models

• Multiple linear regression analysis minimizes the sum of the squared error (a worked sketch follows this slide).

• Regression analysis is said to be “as good as or better than many competing modeling techniques”.

– see references [7] and [18] of the case study article which showed estimation by analogy was not better

• Incorporating the size of a maintenance task would be useful, but this metric was not available.

• Analysis of residuals from the regression analyses revealed no non-linearity or other trends.
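A minimal sketch of such a regression in Python with NumPy; the observations are invented placeholders (not the paper's data set), and the generic form Effort = b₀ + b₁·NA + b₂·NB + b₃·NC simply illustrates task counts as predictors (the paper's Models A, B, and C differ in which predictors they include):

```python
import numpy as np

# Invented monthly maintenance periods: task counts and effort.
NA = np.array([5, 8, 3, 9, 6, 4])
NB = np.array([30, 41, 22, 45, 33, 27])
NC = np.array([25, 30, 18, 38, 26, 21])
effort = np.array([410.0, 560.0, 300.0, 620.0, 450.0, 360.0])

# Design matrix with an intercept column; least squares minimises
# the sum of squared residuals ||Xb - y||^2.
X = np.column_stack([np.ones(len(NA)), NA, NB, NC])
b, *_ = np.linalg.lstsq(X, effort, rcond=None)

residuals = effort - X @ b      # inspect these for non-linearity or trends
r2 = 1 - residuals.var() / effort.var()
print("coefficients:", b.round(2), " R^2:", round(r2, 3))
```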


http://www.physics.csbsju.edu/stats/box2.html

Dealing with outliers


• If a value is deemed to be an outlier, one approach is to exclude the entire observation.

– outliers can be caused by transcription errors

• In a box plot, the box contains 50% of the data set.

– the interquartile range (IQR)

• 1.5 × IQR away from the box, a value is a suspected outlier.

• 3.0 × IQR away from the box, a value is deemed an outlier.

There were no obvious outliers in the data set.
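A minimal sketch of the fence computation in Python (the sample values are illustrative):

```python
import statistics

def iqr_fences(values, k):
    """Return the (low, high) fences lying k*IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]   # 40 looks suspicious

lo, hi = iqr_fences(data, 1.5)           # suspected-outlier fences
suspected = [v for v in data if v < lo or v > hi]
lo3, hi3 = iqr_fences(data, 3.0)         # deemed-outlier fences
deemed = [v for v in data if v < lo3 or v > hi3]
print(suspected, deemed)                 # [40] [40]
```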



Table 3: Metrics correlation matrix

©ACM

• There are no strong correlations between the independent variables used to build the regression models.

• N (total number) correlates less well with NA possibly because NA is much smaller than NB and NC.

• No explanation is given for the correlation r = 0.6458.

Note: strong usually means r > 0.7.

Critical commentary from Andy

• Regression models are built assuming that model variables are independent.

– so it is important to carry out checks e.g. examine correlations

• We do not know the nature of the correlation coefficient used. Pearson is applied to normally distributed data and Spearman to non-normally distributed data.

– sometimes researchers compute both to be sure

• There are some large differences between means and medians in Table 2 which suggests non-normality.

– Spearman correlation coefficients should have been calculated

• The correlation of 0.6458 suggests a real linkage between NA and NC i.e. they may not be independent.


Some plots illustrating correlations of various sizes

Plots source: http://www.jerrydallal.com/

4. Building Effort Estimation Models

Effort estimation models A, B, C

• Recall that NBC is the sum of NB and NC.


4.1 Evaluating Model Performances

• The coefficient of determination R² represents the percentage of variation in the dependent variable explained by the independent variables of the model.

• Having a high R² does not guarantee the quality of future predictions.

– R² does not represent the performance of the model on a different data set, only the data set upon which the model was built.
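In symbols, with yᵢ the observed values, ŷᵢ the model's fitted values, and ȳ their mean:

```latex
R^2 \;=\; 1 \;-\; \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
                       {\sum_{i=1}^{n} (y_i - \bar{y})^2}
```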


Table 4: Model parameters

©ACM

• All model variables are statistically significant (p < 0.05).

• Model C explains 90% of the variation in effort.



Assessing the quality of future predictions

PRESS (PREdiction Sum of Squares)

• ŷ (y-hat) means predicted value.

• The residual represents the difference between the i-th value in the data set and the value predicted from a regression analysis using all data points except the i-th.

• In a data set of size n, n separate regression equations are calculated.

• Smaller PRESS scores are better.

• PRESS is also known as “leave-one-out cross validation”.
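In symbols (a standard formulation of the description above), writing ŷ₍ᵢ₎ for the prediction of the i-th value by the model fitted without observation i:

```latex
\mathrm{PRESS} \;=\; \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2
```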



SPR


• SPR is the sum of the absolute values rather than the squares of the PRESS residuals.

• SPR is used when a few large PRESS residuals can inflate the PRESS score unreasonably.
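In the same notation:

```latex
\mathrm{SPR} \;=\; \sum_{i=1}^{n} \left| \, y_i - \hat{y}_{(i)} \right|
```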



MMRE (Mean Magnitude of Relative Error)

• MREᵢ is the magnitude of the relative error.


• MMRE is the mean magnitude.

• MdMRE is the median magnitude. MMRE might be dominated by a few MREs with very high values.
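In the same notation (MdMRE replaces the mean with the median):

```latex
\mathrm{MRE}_i \;=\; \frac{\left|\, y_i - \hat{y}_{(i)} \right|}{y_i},
\qquad
\mathrm{MMRE} \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i
```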



PRED

• RE is the relative error.

• “We believe that maintenance managers may, in most cases and specially for small maintenance tasks, accept a relative error between the actual and predicted effort of about 50%.”

• According to reference [36] (1991) of the case study article, an average error of 100% can be considered “good” and an average error of 32% “outstanding”.
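A standard formulation of PRED, consistent with the PRED25 and PRED50 values quoted later (the paper states it in terms of the relative error RE):

```latex
% PRED(q): percentage of predictions whose relative error is at most q%
\mathrm{PRED}(q) \;=\; \frac{100}{n}\,
\left|\left\{\, i \;:\; \mathrm{MRE}_i \le \tfrac{q}{100} \,\right\}\right|
```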



Table 5: Leave-one-out cross validation ©ACM

• Model C is clearly better.

– Almost 50% of cases have a relative error of less than 25%.

– Almost 83% of cases have a relative error of less than 50%.


Extending the evaluation of Model C

Leave More Out Cross Validation for Model C

• The data set is randomly partitioned into a training data set and a test data set.

• The training data set is used to build the model.

• The test data set is used to assess the quality of the model's prediction.

• Lx means the training (learning) data set is composed of x% of the observations.

• T100−x means the test data set is composed of (100−x)% of the observations.
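A minimal sketch of the procedure in Python; the data are invented placeholders, and MMRE stands in for the full set of indicators reported in Table 6:

```python
import numpy as np

rng = np.random.default_rng(42)

def leave_more_out(X, y, train_fraction, rounds=10):
    """Average MMRE over random Lx / T(100-x) partitions."""
    n, mmres = len(y), []
    for _ in range(rounds):
        idx = rng.permutation(n)
        cut = int(train_fraction * n)
        train, test = idx[:cut], idx[cut:]
        b, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # learn
        pred = X[test] @ b                                       # predict
        mmres.append(np.mean(np.abs(y[test] - pred) / y[test]))
    return float(np.mean(mmres))

# Invented placeholder data: intercept, NA, NB, NC columns and effort.
X = np.column_stack([np.ones(8),
                     [5, 8, 3, 9, 6, 4, 7, 5],
                     [30, 41, 22, 45, 33, 27, 39, 31],
                     [25, 30, 18, 38, 26, 21, 33, 24]])
y = np.array([410.0, 560, 300, 620, 450, 360, 540, 420])
print("average MMRE for L90-T10:", round(leave_more_out(X, y, 0.9), 3))
```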


Model C

Table 6: Leave more out cross validation with random partitions

• As the size of the learning set decreases, so does the quality of prediction, as expected.


Model C

Critical commentary from Andy

• It is not stated how many partitions were used to establish each of the average values in Table 6.

– a minimum sample size of 10 is usually required to compute an average with reasonable accuracy

• The trend in Table 6 makes sense, but it is difficult to believe the PRED values for L90−T10.

• PRED50 = 100%, yet PRED50 is only 82.64% when all the data except one observation is used for training.

– The authors should have addressed what appears to be an anomalous result.


5 projects

Table 7: Cross validation at a project level

Model C

• Column P1 represents training with P2, P3, P4, and P5 and the results of testing on project P1.

– the regression analysis has “no knowledge” of project P1


Model C

Cross validation at the project level

• Predictive performance is poor when using projects P1 and P3 as test sets.

• Project P1 had no maintenance tasks of type B and this might explain the poor predictive performance of a model which actually has NB as a predictor variable.

• No explanation is provided for the poor predictive performance using P3 as a test set.


5. Conclusion

• Previously, the supplier company (i.e. the company doing the maintenance) had used a prediction model which did not distinguish between different types of maintenance task.

• PRED values for this earlier prediction model were not very satisfactory.

– PRED25 = 33.33%

– PRED50 = 53.47%

• The authors believed that modelling different types of maintenance task (A, B, and C) would improve prediction, which it did, especially for Model C.

– PRED25 = 49.31%

– PRED50 = 82.64%

(leave-one-out PREDs)


• More complicated prediction models could be built, but the authors chose not to, so that the models could be easily calculated by working engineers and managers.

• Effort estimation can also involve estimating values for the independent variables.

– estimating the number and type of maintenance tasks “ex ante” in a forthcoming maintenance period can be done reasonably accurately from historical data

• more complicated models involve more variables to estimate, making it more difficult to predict forthcoming effort



• The greatest weakness of using regression models for effort estimation is that they only apply to the “analyzed domain and technological environment”.

– i.e. the prediction models are company-specific and you cannot apply the values determined for the model coefficients in another company setting.

• Andy says: This is a likely explanation for the “cross validation at the project level” results. Projects were from different companies so it is perhaps not surprising that trying to predict for one company using data for other companies sometimes did not work well.

– P1 and P3 in Table 7



• The models presented were adopted by the supplier company providing maintenance services.

