MSc Software Maintenance


16/04/2020


Lectures 43 and 44

Estimating Effort for Corrective Software Maintenance

Dr Andy Brooks

Case Study


Reference

Effort Estimation for Corrective Software Maintenance, Andrea De Lucia, Eugenio Pompella, and Silvio Stefanucci, The Fourteenth International Conference on Software Engineering and Knowledge Engineering (SEKE'02), pp. 409-416, 2002. ©ACM


1. Introduction

• Effort estimation helps managers:

– plan resource and staff allocation

– prepare less risky bids for external contracts

– make maintain versus buy decisions

• Effort estimation is complicated by:

– the different types of software maintenance

• corrective, adaptive, perfective, preventive

– the scope of software maintenance work

• simple method fixes through to full reengineering



• Effort estimation requires the use of quantitative metrics.

• Software maintenance costs are mainly human resource costs.

– the person-days needed

• A linear or non-linear relationship between complexity/size and effort is “commonly assumed”.
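Two generic forms of this assumed relationship, written out for illustration (these particular equations are illustrative, not taken from the case study):

```latex
% E = effort, S = a size/complexity measure (illustrative forms)
E = \beta_0 + \beta_1 S
\qquad\text{or}\qquad
E = a\,S^{b}
```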


2. Related Work

Estimation by analogy

a simple fictitious example by Andy

• The following historical data is available:

– Project A involved 100 maintenance requests for 110,000 LOC and took 25 person-days.

– Project B involved 105 maintenance requests for 111,000 LOC and took 28 person-days.

– Project C involved 20 maintenance requests for 2,000 LOC and took 2 person-days.

• Project D will involve 85 maintenance requests on 91,000 LOC, so how much effort is required?

• Project A is the closest match, so the effort expended for Project A can be used as an estimate for Project D: 25 person-days.



Shepperd, M., Schofield, C., and Kitchenham, B., Effort Estimation Using Analogy, Proceedings of the International Conference on Software Engineering (ICSE '96), ©IEEE, 1996, pp. 170-178.

• The first step is deciding on the variables used to describe projects.

– “all datasets had at least one variable that was in some sense size related”

• The second step is deciding on how to determine similarity.

– “Analogies are found by measuring Euclidean distance in n-dimensional space where each dimension corresponds to a variable. Values are standardised so that each dimension contributes equal weight to the process of finding analogies.”

ArchANGEL tool here: http://dec.bournemouth.ac.uk/ESERG/ANGEL/


For two points (x₁, y₁) and (x₂, y₂):

Euclidean distance is √((x₁ − x₂)² + (y₁ − y₂)²)

Manhattan distance is |x₂ − x₁| + |y₂ − y₁|

• “In N dimensions, the Euclidean distance between two points p and q is √(∑ᵢ₌₁ᴺ (pᵢ − qᵢ)²), where pᵢ (or qᵢ) is the coordinate of p (or q) in dimension i.”

– http://www.nist.gov/dads/



• The third step is deciding how to use known effort data to derive an effort estimate for the new project.

– just use the effort for the closest project?

– average the effort for the X closest projects?

– average the effort for the X closest projects weighting by closeness of matching?

• Shepperd et al. used X = 2 and an unweighted average.
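A minimal sketch in Python of the three steps applied to Andy's fictitious Projects A-D (the variable names and the min-max standardisation are illustrative choices, not details of Shepperd et al.'s ANGEL tool):

```python
import math

# Historical projects: (name, maintenance requests, LOC, effort in person-days)
history = [("A", 100, 110_000, 25),
           ("B", 105, 111_000, 28),
           ("C", 20, 2_000, 2)]
project_d = (85, 91_000)  # maintenance requests, LOC

# Step 1: describe projects by size-related variables.
features = [(req, loc) for _, req, loc, _ in history]

# Step 2: standardise each dimension so it contributes equal weight,
# then measure Euclidean distance to the new project.
def standardise(column):
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

columns = list(zip(*(features + [project_d])))
rows = list(zip(*(standardise(c) for c in columns)))
*hist_rows, d_row = rows

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

ranked = sorted((euclidean(row, d_row), effort, name)
                for row, (name, _, _, effort) in zip(hist_rows, history))

# Step 3: unweighted average of the X = 2 closest analogies.
estimate = (ranked[0][1] + ranked[1][1]) / 2
print("closest analogies:", ranked[0][2], ranked[1][2])
print("estimated effort:", estimate, "person-days")
```

With X = 2 the two closest analogies are Projects A and B, giving 26.5 person-days rather than the 25 person-days obtained earlier from the single closest analogy.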



• Effort estimation using analogy was found to outperform traditional algorithmic methods for six different datasets.

– later studies, however, did not support this finding

• Shepperd et al. suggest it is better to use more than one estimation technique, to assess the degree of risk associated with a prediction.

– if effort estimation using regression analysis and analogy strongly disagree, then perhaps any estimation is unsafe

– Andy says: in industrial projects, it is unlikely resources are available to apply more than one technique


3. Experimental Setting

• Multiple linear regression analysis was applied to real data from five corrective maintenance projects from different companies.

– All five corrective maintenance projects were outsourced to one supplier company whose maintenance process closely followed the IEEE Standard for Software Maintenance.

• The data set comprised 144 observations corresponding to monthly maintenance periods.


Missing Data Techniques

Treatment of missing values

• If a value is missing, one approach is simply to exclude the entire observation [effort, size, NA, NB, NC] from the model building process.

– the safest approach

• Another approach is to substitute the mean or the median value calculated from the other observations.

• Yet another approach is to find the most similar observation and use the value found there.

– best analogy found by calculating Euclidean distances
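A minimal sketch of the three treatments in Python; the observations and field names are invented placeholders, with None marking a missing value:

```python
import math
import statistics

# Invented observations: effort plus task counts NA, NB, NC.
obs = [{"effort": 40, "NA": 5,    "NB": 30, "NC": 25},
       {"effort": 55, "NA": 8,    "NB": 41, "NC": 30},
       {"effort": 48, "NA": None, "NB": 35, "NC": 28}]  # NA missing

# Approach 1 (safest): exclude any observation with a missing value.
complete = [o for o in obs if None not in o.values()]

# Approach 2: substitute the mean (or median) of the other observations.
known = [o["NA"] for o in obs if o["NA"] is not None]
mean_fill = statistics.mean(known)        # or statistics.median(known)

# Approach 3: copy the value from the most similar complete observation,
# with similarity measured by Euclidean distance over the known fields.
def distance(a, b, keys):
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

target = obs[2]
keys = [k for k, v in target.items() if v is not None]
analogy_fill = min(complete, key=lambda o: distance(o, target, keys))["NA"]

print(mean_fill, analogy_fill)
```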

• Fortunately, the data set did not contain missing values.



Data available

• Size of the system.

• Effort spent in the maintenance period.

• Number of maintenance tasks by type:

– type A: source code modification

– type B: fixing data misalignments through database queries

• data cleansing

– type C: (not A or B) user disoperation, problems out of contract, etc.

• Other metrics such as software complexity were not available in full across all the projects.



Table 1: Collected Metrics

©ACM



Table 2: Descriptive statistics

©ACM

144 observations, monthly maintenance periods

1960/(35 hrs × 4 wks) = 14 person-months

4. Building Effort Estimation Models

• Multiple linear regression analysis minimizes the sum of the squared error (a worked sketch follows this slide).

• Regression analysis is said to be “as good as or better than many competing modeling techniques”.

– see references [7] and [18] of the case study article which showed estimation by analogy was not better

• Incorporating the size of a maintenance task would be useful, but this metric was not available.

• Analysis of residuals from the regression analyses revealed no non-linearity or other trends.
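A minimal sketch of such a regression in Python with NumPy; the observations are invented placeholders (not the paper's data set), and the generic form Effort = b₀ + b₁·NA + b₂·NB + b₃·NC simply illustrates task counts as predictors (the paper's Models A, B, and C differ in which predictors they include):

```python
import numpy as np

# Invented monthly maintenance periods: task counts and effort.
NA = np.array([5, 8, 3, 9, 6, 4])
NB = np.array([30, 41, 22, 45, 33, 27])
NC = np.array([25, 30, 18, 38, 26, 21])
effort = np.array([410.0, 560.0, 300.0, 620.0, 450.0, 360.0])

# Design matrix with an intercept column; least squares minimises
# the sum of squared residuals ||Xb - y||^2.
X = np.column_stack([np.ones(len(NA)), NA, NB, NC])
b, *_ = np.linalg.lstsq(X, effort, rcond=None)

residuals = effort - X @ b      # inspect these for non-linearity or trends
r2 = 1 - residuals.var() / effort.var()
print("coefficients:", b.round(2), " R^2:", round(r2, 3))
```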


http://www.physics.csbsju.edu/stats/box2.html

Dealing with outliers


• If a value is deemed to be an outlier, one approach is to exclude the entire observation.

– outliers can be caused by transcription errors

• In a box plot, the box contains 50% of the data set.

– the interquartile range (IQR)

• 1.5 × IQR away from the box, a value is a suspected outlier.

• 3.0 × IQR away from the box, a value is deemed an outlier.

There were no obvious outliers in the data set.
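A minimal sketch of the fence computation in Python (the sample values are illustrative):

```python
import statistics

def iqr_fences(values, k):
    """Return the (low, high) fences lying k*IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]   # 40 looks suspicious

lo, hi = iqr_fences(data, 1.5)           # suspected-outlier fences
suspected = [v for v in data if v < lo or v > hi]
lo3, hi3 = iqr_fences(data, 3.0)         # deemed-outlier fences
deemed = [v for v in data if v < lo3 or v > hi3]
print(suspected, deemed)                 # [40] [40]
```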



Table 3: Metrics correlation matrix

©ACM

• There are no strong correlations between the independent variables used to build the regression models.

• N (total number) correlates less well with NA possibly because NA is much smaller than NB and NC.

• No explanation is given for the correlation r = 0.6458.

Note: strong usually means r > 0.7.

Critical commentary from Andy

• Regression models are built assuming that model variables are independent.

– so it is important to carry out checks e.g. examine correlations

• We do not know the nature of the correlation coefficient used. Pearson is applied to normally distributed data and Spearman to non-normally distributed data.

– sometimes researchers compute both to be sure

• There are some large differences between means and medians in Table 2 which suggests non-normality.

– Spearman correlation coefficients should have been calculated

• The correlation of 0.6458 suggests a real linkage between NA and NC i.e. they may not be independent.


Some plots illustrating correlations of various sizes

Plots source: http://www.jerrydallal.com/

4. Building Effort Estimation Models

Effort estimation models A, B, C

• Recall that NBC is the sum of NB and NC.


4.1 Evaluating Model Performances

• The coefficient of determination R² represents the percentage of variation in the dependent variable explained by the independent variables of the model.

• Having a high R² does not guarantee the quality of future predictions.

– R² does not represent the performance of the model on a different data set, only the data set upon which the model was built.
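In symbols, with yᵢ the observed values, ŷᵢ the model's fitted values, and ȳ their mean:

```latex
R^2 \;=\; 1 \;-\; \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
                       {\sum_{i=1}^{n} (y_i - \bar{y})^2}
```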


Table 4: Model parameters

©ACM

• All model variables are statistically significant (p < 0.05).

• Model C explains 90% of the variation in effort.



Assessing the quality of future predictions

PRESS (PREdiction Sum of Squares)

• ŷ (y-hat) means predicted value.

• The residual represents the difference between the i-th value in the data set and the value predicted from a regression analysis using all data points except the i-th.

• In a data set of size n, n separate regression equations are calculated.

• Smaller PRESS scores are better.

• PRESS is also known as “leave-one-out cross validation”.
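In symbols (a standard formulation of the description above), writing ŷ₍ᵢ₎ for the prediction of the i-th value by the model fitted without observation i:

```latex
\mathrm{PRESS} \;=\; \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2
```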



SPR


• SPR is the sum of the absolute values rather than the squares of the PRESS residuals.

• SPR is used when a few large PRESS residuals can inflate the PRESS score unreasonably.
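In the same notation:

```latex
\mathrm{SPR} \;=\; \sum_{i=1}^{n} \left| \, y_i - \hat{y}_{(i)} \right|
```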



MMRE (Mean Magnitude of Relative Error)

• MREᵢ is the magnitude of the relative error.


• MMRE is the mean magnitude.

• MdMRE is the median magnitude. MMRE might be dominated by a few MREs with very high values.
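In the same notation (MdMRE replaces the mean with the median):

```latex
\mathrm{MRE}_i \;=\; \frac{\left|\, y_i - \hat{y}_{(i)} \right|}{y_i},
\qquad
\mathrm{MMRE} \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i
```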



PRED

• RE is the relative error.

• “We believe that maintenance managers may, in most cases and specially for small maintenance tasks, accept a relative error between the actual and predicted effort of about 50%.”

• According to reference [36] (1991) of the case study article, an average error of 100% can be considered “good” and an average error of 32% “outstanding”.
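A standard formulation of PRED, consistent with the PRED25 and PRED50 values quoted later (the paper states it in terms of the relative error RE):

```latex
% PRED(q): percentage of predictions whose relative error is at most q%
\mathrm{PRED}(q) \;=\; \frac{100}{n}\,
\left|\left\{\, i \;:\; \mathrm{MRE}_i \le \tfrac{q}{100} \,\right\}\right|
```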



Table 5: Leave-one-out cross validation ©ACM

• Model C is clearly better.

– Almost 50% of cases have a relative error of less than 25%.

– Almost 83% of cases have a relative error of less than 50%.


Extending the evaluation of Model C

Leave More Out Cross Validation for Model C

• The data set is randomly partitioned into a training data set and a test data set.

• The training data set is used to build the model.

• The test data set is used to assess the quality of the model's prediction.

• Lx means the training (learning) data set is composed of x% of the observations.

• T100−x means the test data set is composed of (100−x)% of the observations.
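A minimal sketch of the procedure in Python; the data are invented placeholders, and MMRE stands in for the full set of indicators reported in Table 6:

```python
import numpy as np

rng = np.random.default_rng(42)

def leave_more_out(X, y, train_fraction, rounds=10):
    """Average MMRE over random Lx / T(100-x) partitions."""
    n, mmres = len(y), []
    for _ in range(rounds):
        idx = rng.permutation(n)
        cut = int(train_fraction * n)
        train, test = idx[:cut], idx[cut:]
        b, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # learn
        pred = X[test] @ b                                       # predict
        mmres.append(np.mean(np.abs(y[test] - pred) / y[test]))
    return float(np.mean(mmres))

# Invented placeholder data: intercept, NA, NB, NC columns and effort.
X = np.column_stack([np.ones(8),
                     [5, 8, 3, 9, 6, 4, 7, 5],
                     [30, 41, 22, 45, 33, 27, 39, 31],
                     [25, 30, 18, 38, 26, 21, 33, 24]])
y = np.array([410.0, 560, 300, 620, 450, 360, 540, 420])
print("average MMRE for L90-T10:", round(leave_more_out(X, y, 0.9), 3))
```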


Model C

Table 6: Leave more out cross validation with random partitions

• As the size of the learning set decreases, so does the quality of prediction, as expected.


Model C

Critical commentary from Andy

• It is not stated how many partitions were used to establish each of the average values in Table 6.

– a minimum sample size of 10 is usually required to compute an average with reasonable accuracy

• The trend in Table 6 makes sense, but it is difficult to believe the PRED values for L90−T10.

• PRED50 = 100%, yet PRED50 is only 82.64% when all the data except one observation is used for training.

– The authors should have addressed what appears to be an anomalous result.


5 projects

Table 7: Cross validation at a project level

Model C

• Column P1 represents training with P2, P3, P4, and P5 and the results of testing on project P1.

– the regression analysis has “no knowledge” of project P1


Model C

Cross validation at the project level

• Predictive performance is poor when using projects P1 and P3 as test sets.

• Project P1 had no maintenance tasks of type B and this might explain the poor predictive performance of a model which actually has NB as a predictor variable.

• No explanation is provided for the poor predictive performance using P3 as a test set.


5. Conclusion

• Previously, the supplier company (i.e. the company doing the maintenance) had used a prediction model which did not distinguish between different types of maintenance task.

• PRED values for this earlier prediction model were not very satisfactory.

– PRED25 = 33.33%

– PRED50 = 53.47%

• The authors believed that modelling different types of maintenance task (A, B, and C) would improve prediction, which it did, especially for Model C.

– PRED25 = 49.31%

– PRED50 = 82.64%

(leave-one-out PREDs)


• More complicated prediction models could be built, but the authors chose not to, so that the models could be easily calculated by working engineers and managers.

• Effort estimation can also involve estimating values for the independent variables.

– estimating the number and type of maintenance tasks “ex ante” in a forthcoming maintenance period can be done reasonably accurately from historical data

• more complicated models involve more variables to estimate, making it more difficult to predict forthcoming effort



• The greatest weakness of using regression models for effort estimation is that they only apply to the “analyzed domain and technological environment”.

– i.e. the prediction models are company-specific and you cannot apply the values determined for the model coefficients in another company setting.

• Andy says: This is a likely explanation for the “cross validation at the project level” results. Projects were from different companies so it is perhaps not surprising that trying to predict for one company using data for other companies sometimes did not work well.

– P1 and P3 in Table 7



• The models presented were adopted by the supplier company providing maintenance services.

