An Analysis of Machine
Learning Algorithms for
Condensing Reverse Engineered
Class Diagrams
Hafeez Osman, Michel R.V. Chaudron and Peter van der Putten
Leiden University, Leiden, the Netherlands
Chalmers University of Technology and Goteborg University,
Gothenburg, Sweden
Luiz Paulo Coelho Ferreira
Introduction
• Up-to-date design documentation is important.
• UML models created during the design are often poorly
kept up to date during development and maintenance.
• For legacy software, up-to-date designs are valuable for
maintaining such systems, but they are hard to find.
• This paper is partially motivated by a scenario where
new programmers want to join a development team.
Research Problem
• This paper specifically aims at providing suitable
classification algorithms to decide which classes
should be included in a class diagram.
• They seek an automated approach to classify the key
classes in a class diagram.
Contribution
• They explore 9 classification algorithms for
predicting key classes that should be included in a
class diagram.
• They evaluate the approach on 9 open source systems,
ranging from 59 to 903 classes.
Research Questions
• RQ1: Which individual predictors are influential for
the classification?
• RQ2: How robust is the classification to the
inclusion of categories of predictors?
• RQ3: What are suitable classification algorithms in
classifying key classes?
Machine Learning
• Univariate Analysis
• Identifies which predictors have the most influence
• Machine Learning Classification Algorithm:
• J48 Decision Tree, k-Nearest Neighbor, Logistic
Regression, Naive Bayes, Decision Tables, Decision
Stumps, Radial Basis Function Networks, Random
Forests and Random Trees.
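To make one of the listed classifiers concrete, here is a minimal sketch of k-Nearest Neighbor scoring in plain Python. It is not the WEKA implementation used in the study. The feature layout (NumOps, Dep In, EC Par) and all metric values are invented for illustration, and `knn_score` is a hypothetical helper name.

```python
import math

def knn_score(train, query, k=5):
    """Return the fraction of the k nearest training points labeled key (1)."""
    # train is a list of (feature_vector, label) pairs; label 1 = key class.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / len(nearest)

# Hypothetical metric vectors (NumOps, Dep In, EC Par); values are invented.
train = [
    ((12, 5, 4), 1),
    ((10, 6, 3), 1),
    ((9, 4, 5), 1),
    ((2, 0, 0), 0),
    ((1, 1, 0), 0),
    ((3, 0, 1), 0),
    ((2, 1, 1), 0),
]

# Score a class whose metrics resemble the key classes above.
score = knn_score(train, (11, 5, 4), k=5)
```

The score can be thresholded to decide whether a class is kept in the condensed diagram, or used directly as a ranking score.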
Machine Learning
• Evaluation Method:
• For the univariate analysis, they used the InfoGain
Attribute Evaluator (InfoGain).
• Classification Algorithms were evaluated by Area
Under ROC curve (AUC).
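The two evaluation measures above can be sketched in plain Python. This is an illustrative sketch, not WEKA's code: `info_gain` assumes the predictor has already been discretized into categories, and `auc` uses the pairwise-ranking definition (the chance that a random key class is scored above a random non-key class, with ties counted as half). All function names and data values are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(values, labels):
    """Entropy of the labels minus their entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented data: a discretized predictor that separates the labels perfectly.
gain = info_gain(["high", "high", "low", "low"], [1, 1, 0, 0])

# Invented classifier scores for two key (1) and two non-key (0) classes.
area = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why it suits comparing classifiers on imbalanced key versus non-key data.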
Approach
• Examined Predictors and Tools
• Case Studies
• Process
Predictors and Tools
• Reverse Engineering:
• MagicDraw
• Software Metrics:
• SDMetrics
• Data Mining:
• WEKA
Case Studies
• Criteria:
• Open Source Project
• Must have a forward design class diagram
• 50+ classes
Process
Evaluation
• RQ1: Which individual predictors are influential for
the classification?
Evaluation
RQ2: How robust is the classification to the inclusion
of categories of predictors?
Evaluation
RQ2: How robust is the classification to the inclusion
of categories of predictors?
Evaluation
RQ3: What are suitable
classification algorithms in
classifying key classes?
Evaluation
RQ3: What are suitable classification algorithms in classifying
key classes?
Discussion and Future Work
• Export Coupling Parameter (EC Par), Dependency In
(Dep In) and Number of Operations (NumOps) were the
most influential predictors.
• K-NN(5) and Random Forest were the best algorithms,
and they can be combined to find better solutions.
• The approach was not able to produce high AUC values.
• Different metrics could be used.
• The “ground truth” could be evolved iteratively or derived
from version control mining.
Threats to Validity
• This study assumed that all the classes that existed in
the forward designs were the important classes.
• The input of this study depends on the
MagicDraw CASE tool.
• The study covers only 9 open source case studies.
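The ground-truth assumption in the first threat can be made explicit with a small sketch: a reverse engineered class is labeled as key exactly when it also appears in the forward design. The function name and the class names below are hypothetical, and matching by plain name is a simplifying assumption.

```python
def label_key_classes(reverse_engineered, forward_design):
    """Map each reverse engineered class name to 1 (key) or 0 (not key)."""
    forward = set(forward_design)
    return {name: int(name in forward) for name in reverse_engineered}

# Hypothetical class names, invented for illustration.
recovered = ["Order", "OrderDao", "StringUtil", "Customer"]
designed = ["Order", "Customer"]
labels = label_key_classes(recovered, designed)
```

Any class the designers left out of the forward design for reasons other than importance ends up labeled 0, which is where this threat to validity comes from.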
Conclusion
• They propose an approach for condensing reverse
engineered class diagrams by selecting their key classes.
• They evaluate the influential predictors in classifying key
classes and compare various machine learning
classification algorithms on 9 case studies.
• Export Coupling Parameter, Dependency In and Number
of Operations are the most influential predictors for
predicting key classes.
• On these predictor sets, Random Forest and k-Nearest
Neighbor provided the best results.
Questions?