Slides - Crest

On the application of GP for
software engineering predictive
modeling: A systematic review
Expert Systems with Applications, Vol. 38, No. 9, 2011
Wasif Afzal, Richard Torkar
Blekinge Institute of Technology,
Karlskrona, Sweden.
{waf,rto}@bth.se
Agenda
• Research question
• Symbolic regression
• Prediction and estimation in sw engineering
• GP for prediction and estimation in sw engineering
• Application of GP for sw quality classification
• Application of GP for sw cost/effort/size estimation
• Application of GP for sw fault prediction and sw reliability growth modeling
• Future work
• Conclusions
• Recommendations
Our research question
• Is there evidence that:
symbolic regression using GP is an effective
method for:
prediction and estimation, in comparison with:
regression, machine learning and other models
(including expert opinion and different
improvements over the standard GP algorithm)?
It is about symbolic regression!
• Symbolic regression – One of the many application
areas of GP
– Finds a function whose outputs match the desired outcomes.
– Makes no assumptions about:
• Structure of the function
• Data distribution
• Relationship between independent and dependent
variables
• Helps in identifying the significant variables in
subsequent modeling attempts
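To make the idea concrete, here is a minimal sketch of symbolic regression by evolutionary search over expression trees. It is a simplification, not the setup of any reviewed study: it uses only subtree mutation with truncation selection and elitism (no crossover, which standard GP would normally include), and the function set, terminal set, population size and target data are all illustrative assumptions.

```python
import operator
import random

# Illustrative function and terminal sets (assumptions, not from the reviewed studies).
FUNCS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMS = ["x", 1.0, 2.0]

def random_tree(rng, depth=3):
    """Grow a random expression tree as nested tuples (func, symbol, left, right)."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    func, sym = rng.choice(FUNCS)
    return (func, sym, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at the point x."""
    if isinstance(tree, tuple):
        func, _, left, right = tree
        return func(evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def to_string(tree):
    """Render a tree as a readable infix expression."""
    if isinstance(tree, tuple):
        _, sym, left, right = tree
        return f"({to_string(left)} {sym} {to_string(right)})"
    return str(tree)

def mutate(rng, tree, depth=3):
    """Subtree mutation: replace one randomly chosen node with a fresh subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.2:
        return random_tree(rng, depth)
    func, sym, left, right = tree
    if rng.random() < 0.5:
        return (func, sym, mutate(rng, left, depth - 1), right)
    return (func, sym, left, mutate(rng, right, depth - 1))

def mse(tree, data):
    """Fitness: mean squared error of the tree over (x, y) samples."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)

def evolve(data, pop_size=60, generations=40, seed=0):
    """Return the best tree found and the best error recorded per generation."""
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, data))
        history.append(mse(pop[0], data))
        survivors = pop[: pop_size // 4]  # truncation selection, survivors are kept
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = min(pop, key=lambda t: mse(t, data))
    return best, history + [mse(best, data)]

# Hypothetical target relationship y = x^2 + x + 1, sampled at integer points.
data = [(x, x * x + x + 1.0) for x in range(-5, 6)]
best, history = evolve(data)
print(to_string(best), history[-1])
```

No functional form is supplied in advance: the search only sees the samples and the error measure, which is the property the slide highlights. Because the survivors are carried over unchanged, the best error never increases from one generation to the next.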
Prediction and estimation in sw
engineering
• Software quality
– Software quality classification
– Software fault prediction
– Software reliability growth modeling
• Software size
• Software development cost/effort
• Maintenance task effort
• Software release timing
GP for prediction and estimation in sw
engineering
• 23 identified primary studies
– Software quality classification (8)
– Software cost/effort/size estimation (7)
– Software fault prediction and software
reliability growth modeling (8)
Application of GP for sw quality
classification (8 studies)
• Variations of the dependent variable:
– Fault proneness
– Quality ranking of program modules (high risk to low
risk)
• Variations in sampling of training and testing sets:
– Simple hold-out and 10-fold CV.
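The two sampling schemes named above can be sketched as index splits. This is a minimal illustration in plain Python; the function names, the 70/30 hold-out ratio and the fixed seed are assumptions, since the slides do not state the proportions the studies used.

```python
import random

def holdout_split(n, test_fraction=0.3, seed=0):
    """Simple hold-out: shuffle n sample indices once and cut into train/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_fraction)))
    return idx[:cut], idx[cut:]

def k_fold_splits(n, k=10, seed=0):
    """k-fold cross-validation: each fold serves as the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

train, test = holdout_split(10)
splits = list(k_fold_splits(25, k=10))
```

With hold-out, the result depends on a single split. With 10-fold CV, every sample is used for testing once and for training nine times, so the estimate is less sensitive to how one particular split falls.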
Application of GP for sw quality
classification cntd…
• Variations in fitness function
– Single objective
• Minimization of root mean square error
• Minimization of average cost of misclassification
– Multi-objective
• Minimization of average cost of misclassification +
minimization of tree size
• Maximization of the best percentage of the actual faults
averaged over the percentiles level of interest + controlling
the tree size.
• Balancing the over sampling and under sampling in each
class for a decision tree.
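As an illustration of the cost-based fitness functions listed above, the sketch below computes an average cost of misclassification and combines it with tree size. The cost values, the weighted-sum combination and the names are assumptions made for the example; the reviewed studies may define and combine these objectives differently.

```python
def average_misclassification_cost(actual, predicted, cost_type1=1.0, cost_type2=10.0):
    """Average cost per module; True means fault-prone.

    Type I error: a not-fault-prone module predicted as fault-prone.
    Type II error: a fault-prone module predicted as not fault-prone.
    The two cost values are illustrative assumptions.
    """
    type1 = sum(1 for a, p in zip(actual, predicted) if not a and p)
    type2 = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return (cost_type1 * type1 + cost_type2 * type2) / len(actual)

def penalized_fitness(actual, predicted, tree_size, size_weight=0.01):
    """Weighted sum of misclassification cost and tree size (lower is better)."""
    return average_misclassification_cost(actual, predicted) + size_weight * tree_size

actual = [True, True, False, False]
predicted = [True, False, True, False]
cost = average_misclassification_cost(actual, predicted)   # one error of each type
fitness = penalized_fitness(actual, predicted, tree_size=20)
```

Setting the Type II cost higher than the Type I cost reflects the common view that missing a fault-prone module is more expensive than inspecting a sound one. The size term discourages large trees that add nodes without reducing cost.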
Application of GP for sw quality
classification cntd…
• Variations in comparison groups:
– Neural networks
– k-nearest neighbour
– Regression (linear, logistic)
– Humans
Application of GP for sw quality
classification cntd…
• Results:
– Majority of the studies (6 out of 8) reported
results in favor of using GP for the classification
task.
• Limitations:
– Increase the comparisons with a more
representative set of techniques.
– Increase the use of publicly available data sets
for easier replications.
Application of GP for sw quality
classification cntd…
• Encouraging aspects:
– The datasets used represent real-world
projects.
– Problem dependent objectives represented in
fitness functions perform better than standard
GP.
Application of GP for sw cost/effort/size
(CES) estimation (7 studies)
• Variations of the dependent variable
– Software effort
– Software cost
– Software size
• Variations in fitness function
– Single objective
• Minimization of mean squared error or
MMRE
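For reference, the two fitness measures named above can be computed as follows. MMRE is the mean magnitude of relative error, i.e. the mean of |actual - predicted| / actual over all projects. The effort values in the example are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mmre(actual, predicted):
    """Mean magnitude of relative error; actual values must be non-zero."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical effort values (e.g. person-hours) for three projects.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
mse_value = mean_squared_error(actual, predicted)
mmre_value = mmre(actual, predicted)
```

The two measures can rank models differently: mean squared error is dominated by large projects, while MMRE weights each project by its relative error regardless of size.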
Application of GP for sw cost/effort/size
(CES) estimation cntd…
• Variations in comparison groups
– ANN, nearest neighbour and different forms
of regression.
• Variations in sampling of training and testing
sets
– Simple hold-out.
Application of GP for sw cost/effort/size
(CES) estimation cntd…
• Results
– No strong evidence of GP performing consistently on
all evaluation measures used.
• Limitations
– Evaluation measures used are not standardized.
– Different hold-out samplings for train and test sets.
– Lack of statistical hypothesis testing.
– Lack of comparison groups.
Application of GP for sw fault prediction and
sw reliability growth modeling (8 studies)
• Variations of the dependent variable
– SW fault prediction
– SW reliability growth modeling
• Variations in fitness function
– Single objective:
• Minimization of standard error
Application of GP for sw fault prediction and
sw reliability growth modeling cntd …
• Variations in comparison groups
– Standard GP, Naive Bayes, traditional
software reliability growth models.
• Variations in sampling of training and
testing sets
– Hold-out and 10-fold CV
Application of GP for sw fault prediction and
sw reliability growth modeling cntd …
• Results:
– 7 out of 8 studies favor the use of GP.
• Limitations:
– Poor representation of comparison groups
– Absence of a baseline to compare to.
Promising future work to undertake
• Multi-objective fitness evaluation (e.g.
Minimization of standard error and maximization
of correlation coefficient)
• Simplification of GP solutions to help
interpretation of relationships between variables.
• Evaluation of techniques to minimize overfitting
of GP solutions.
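One way to handle the two objectives mentioned in the first bullet is Pareto dominance rather than a single combined score. The sketch below is an assumption about how such an evaluation could look, not a method taken from the reviewed studies; each candidate is represented by a hypothetical (error, correlation) pair.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    and strictly better on one. Pairs are (error, correlation):
    error is minimized, correlation is maximized."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical (standard error, correlation coefficient) pairs for three models.
candidates = [(0.5, 0.9), (0.4, 0.8), (0.6, 0.7)]
front = pareto_front(candidates)
```

Here the first two candidates trade error against correlation, so neither dominates the other and both stay on the front, while the third is worse on both objectives and is dropped.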
Conclusions
• A total of 23 studies apply GP for predictive studies in sw
engineering:
– sw quality classification (8)
– sw cost/effort/size estimation (7)
– sw fault prediction and sw reliability growth modeling (8)
• There is evidence in support of using GP for:
– sw quality classification
– sw fault prediction and sw reliability growth modeling
• but not for:
– sw cost/effort/size estimation.
Recommendations
• Use public data sets wherever possible.
• Apply commonly used sampling strategies.
• Use techniques to avoid overfitting in GP
solutions.
• Report the settings of GP parameters.
• Compare the performances against a commonly
used baseline.
• Use statistical experimental designs.