Use Code Bad Smell to Predict Faults

An Empirical Study of the Relationship Between Code Bad Smells and Software Faults
Min Zhang
School of Computer Science
University of Hertfordshire
Introduction
What is a Code Bad Smell?
Problems in Using Code Bad Smells
An overview of the empirical study
Code Bad Smell detection
Fault identification
Results and discussion
Conclusion
Q/A

Code Bad Smells

The 22 Code Bad Smells are bad structures in
source code informally identified by Fowler et
al. (1999).

Fowler et al. (1999) suggest that Code Bad
Smells can give “indications that there is
trouble that can be solved by a refactoring”.

They are widely used for detecting refactoring
opportunities in software (Mens and Tourwe,
2004).

Problems in Using Code Bad Smells

Fowler et al. (1999) claim that Code Bad Smells are structures that have detrimental effects on software. However, little empirical evidence has been provided to support this claim.

Most existing Code Bad Smell detection tools are metric-based. We question their accuracy.

An Empirical Study of the Relationship Between Code Bad Smells and Faults

Objective:
Capture the relationship between Code Bad Smells
and faults

Targeted Code Bad Smells:
Data Clumps, Message Chains, Middle Man,
Speculative Generality, and Switch Statements

Research Data:
Eclipse Core Packages (Releases 3.0, 3.0.1, 3.0.2, 3.1 and 3.2)
Apache Commons Packages (Commons IO, Commons Logging, Commons Codec, Commons DbUtils, Commons DBCP and Commons Net)

Code Bad Smell Detection

Pattern-based Code Bad Smell detection:
  Define each Code Bad Smell as a particular code pattern
    (an idea taken from Gamma et al.’s (1995) definitions of the GoF Design Patterns)
  Use the Recoder API to analyse Java source code

An Example: The Pattern-based Definition of the Message Chains Bad Smell
Fowler et al.’s definition:
“You see message chains when a client asks one object for another object, which the client then asks for yet another object, which the client then asks for yet another another object, and so on. You may see these as a long line of getThis methods, or as a sequence of temps.” (Fowler et al., 1999)

Pattern-based definition:
An instance of the Message Chains Bad Smell occurs in either of the following situations.
Situation 1:
1. To access a data field in another class, a statement calls more than a threshold number of getter methods in sequence (e.g. int a=b.getC().getD();).
2. This method call statement and the declarations of the getter methods are in different classes.
Situation 2:
1. To access a data field in another class, the source code uses more than a threshold number of temp variables.
2. A temp variable is a variable that only accesses data members (data fields or getter methods) of other classes or of other temp variables (e.g. ClassC tmpC=b.getC(); int a=tmpC.getD();).
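The two situations above can be approximated in a few lines of Java. This is a minimal sketch under stated assumptions, not the study’s detector: the study analysed parsed source through the Recoder API, whereas this sketch matches statement text with regular expressions, only recognises zero-argument getters named getX(), skips the check that the getters are declared in a different class, and uses hypothetical threshold parameters. The class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based approximation of the pattern-based Message Chains
// definition (no Recoder API, no check that getters live in another class).
class MessageChainSketch {

    // A run of zero-argument getter calls, e.g. ".getC().getD()".
    private static final Pattern GETTER_RUN = Pattern.compile("(?:\\.get[A-Z]\\w*\\(\\))+");
    // A single zero-argument getter call inside such a run.
    private static final Pattern GETTER = Pattern.compile("\\.get[A-Z]\\w*\\(\\)");
    // A declaration initialised by one getter call, e.g. "ClassC tmpC = b.getC();".
    private static final Pattern TEMP_DECL =
            Pattern.compile("\\s*\\w+\\s+(\\w+)\\s*=\\s*(\\w+)\\.get[A-Z]\\w*\\(\\)\\s*;\\s*");

    /** Situation 1: length of the longest sequence of chained getter calls in a statement. */
    static int longestGetterChain(String statement) {
        int longest = 0;
        Matcher run = GETTER_RUN.matcher(statement);
        while (run.find()) {
            Matcher getter = GETTER.matcher(run.group());
            int count = 0;
            while (getter.find()) {
                count++;
            }
            longest = Math.max(longest, count);
        }
        return longest;
    }

    /** Situation 2: the most temp variables that any single getter access goes through. */
    static int longestTempChain(List<String> statements) {
        Map<String, String> receiverOf = new HashMap<>(); // temp variable -> variable it was read from
        int longest = 0;
        for (String statement : statements) {
            Matcher decl = TEMP_DECL.matcher(statement);
            if (!decl.matches()) {
                continue; // not a "Type var = x.getY();" statement
            }
            String receiver = decl.group(2);
            int temps = 0;
            Set<String> seen = new HashSet<>(); // guards against cycles such as "n = n.getNext()"
            while (receiverOf.containsKey(receiver) && seen.add(receiver)) {
                temps++;
                receiver = receiverOf.get(receiver);
            }
            receiverOf.put(decl.group(1), decl.group(2));
            longest = Math.max(longest, temps);
        }
        return longest;
    }

    /** Flags an instance when either situation exceeds its (hypothetical) threshold. */
    static boolean isMessageChain(List<String> statements, int getterThreshold, int tempThreshold) {
        for (String statement : statements) {
            if (longestGetterChain(statement) > getterThreshold) {
                return true;
            }
        }
        return longestTempChain(statements) > tempThreshold;
    }

    public static void main(String[] args) {
        System.out.println(longestGetterChain("int a = b.getC().getD();")); // 2
        List<String> body = List.of("ClassC tmpC = b.getC();", "int a = tmpC.getD();");
        System.out.println(longestTempChain(body));                         // 1
        System.out.println(isMessageChain(body, 1, 1));                     // false
    }
}
```

Both counts are compared with a strict “more than” test, matching the wording of the definition; which threshold values the study actually used is not stated here.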

Fault Identification

Zimmermann et al.’s (2007) fault identification approach:
1. Locate the tokens “bug”, “fix(ed)” and “update(d)” in CVS commit messages.
2. If a version entry in CVS contains one or more of these tokens and they are followed by numbers, treat the entry as a bug-fixing update.
3. Treat those numbers as bug IDs.
4. Confirm each bug ID against the Bugzilla database.
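The four steps above can be sketched as a small Java filter over commit messages. This is a minimal sketch, not Zimmermann et al.’s implementation: the regular expression (keyword, optional spaces, “#” or “:”, then digits) is an assumption, and the Bugzilla lookup in step 4 is replaced by a hypothetical set of known bug IDs.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the keyword-plus-number heuristic; the regular
// expression and the "known IDs" set stand in for a real Bugzilla lookup.
class BugFixCommitSketch {

    // "bug", "fix"/"fixed" or "update"/"updated", then optional spaces, '#' or ':',
    // then a number that is captured as a candidate bug ID.
    private static final Pattern KEYWORD_THEN_NUMBER = Pattern.compile(
            "\\b(?:bug|fix(?:ed)?|update(?:d)?)\\b[\\s#:]*(\\d+)",
            Pattern.CASE_INSENSITIVE);

    /** Steps 1-3: candidate bug IDs found in one CVS commit message. */
    static Set<String> candidateBugIds(String message) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = KEYWORD_THEN_NUMBER.matcher(message);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /** Step 4: keep only the candidates that exist in the bug database. */
    static Set<String> confirmedBugIds(String message, Set<String> knownBugIds) {
        Set<String> confirmed = new LinkedHashSet<>(candidateBugIds(message));
        confirmed.retainAll(knownBugIds);
        return confirmed;
    }

    /** A version entry counts as a bug-fixing update if any candidate ID is confirmed. */
    static boolean isBugFix(String message, Set<String> knownBugIds) {
        return !confirmedBugIds(message, knownBugIds).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("74139"); // hypothetical Bugzilla IDs
        System.out.println(candidateBugIds("Fixed bug 74139: NPE in parser")); // [74139]
        System.out.println(isBugFix("Fixed bug 74139: NPE in parser", known)); // true
        System.out.println(isBugFix("Updated 3 files", known));               // false
    }
}
```

The last example shows why step 4 matters: “Updated 3 files” yields the candidate number 3, but it is discarded because it does not correspond to a known bug.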

Results and Discussion: Binary Coding of the Existence of Code Bad Smells (1)
Existence of Code Bad Smells
Data Clumps  Message Chains  Speculative Generality  Middle Man  Switch Statements  Coding
0            0               0                       0           0                  0
1            0               0                       0           0                  1
0            1               0                       0           0                  2
1            1               0                       0           0                  3
0            0               1                       0           0                  4
1            0               1                       0           0                  5
0            1               1                       0           0                  6
1            1               1                       0           0                  7
0            0               0                       1           0                  8
1            0               0                       1           0                  9
0            1               0                       1           0                  10
1            1               0                       1           0                  11
0            0               1                       1           0                  12
1            0               1                       1           0                  13
0            1               1                       1           0                  14
1            1               1                       1           0                  15

Results and Discussion: Binary Coding of the Existence of Code Bad Smells (2)
Existence of Code Bad Smells
Data Clumps  Message Chains  Speculative Generality  Middle Man  Switch Statements  Coding
0            0               0                       0           1                  16
1            0               0                       0           1                  17
0            1               0                       0           1                  18
1            1               0                       0           1                  19
0            0               1                       0           1                  20
1            0               1                       0           1                  21
0            1               1                       0           1                  22
1            1               1                       0           1                  23
0            0               0                       1           1                  24
1            0               0                       1           1                  25
0            1               0                       1           1                  26
1            1               0                       1           1                  27
0            0               1                       1           1                  28
1            0               1                       1           1                  29
0            1               1                       1           1                  30
1            1               1                       1           1                  31
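Read together, the two coding tables are consistent with treating the five existence flags as a five-bit binary number: Data Clumps is the least significant bit (1), followed by Message Chains (2), Speculative Generality (4), Middle Man (8) and Switch Statements (16). Profile zero therefore means no Code Bad Smell, and profiles 1, 2, 4, 8 and 16 each contain exactly one. A minimal Java sketch of that mapping follows; the enum and method names are hypothetical.

```java
import java.util.EnumSet;

// Hypothetical sketch: each smell is one bit of the profile code.
class SmellProfileSketch {

    // Declaration order fixes each smell's bit position (ordinal 0 = value 1).
    enum Smell { DATA_CLUMPS, MESSAGE_CHAINS, SPECULATIVE_GENERALITY, MIDDLE_MAN, SWITCH_STATEMENTS }

    /** Encodes the set of detected smells as a profile code in 0..31. */
    static int encode(EnumSet<Smell> present) {
        int code = 0;
        for (Smell s : present) {
            code |= 1 << s.ordinal();
        }
        return code;
    }

    /** Decodes a profile code back into the set of smells it represents. */
    static EnumSet<Smell> decode(int code) {
        EnumSet<Smell> present = EnumSet.noneOf(Smell.class);
        for (Smell s : Smell.values()) {
            if ((code & (1 << s.ordinal())) != 0) {
                present.add(s);
            }
        }
        return present;
    }

    public static void main(String[] args) {
        System.out.println(encode(EnumSet.of(Smell.DATA_CLUMPS, Smell.MESSAGE_CHAINS)));       // 3
        System.out.println(encode(EnumSet.of(Smell.MESSAGE_CHAINS, Smell.SWITCH_STATEMENTS))); // 18
        System.out.println(decode(26)); // [MESSAGE_CHAINS, MIDDLE_MAN, SWITCH_STATEMENTS]
    }
}
```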

Results and Discussion: One-way Analysis of Variance, Eclipse Data

The five profiles that each indicate the presence of exactly one of the five Code Bad Smells (codes 1, 2, 4, 8 and 16) have a significantly lower mean number of faults than profile zero (no Code Bad Smells).

All profiles with a higher mean number of faults than profile zero contain both the Message Chains and the Switch Statements Bad Smells.
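As a reminder of what a one-way analysis of variance computes when comparing fault counts across profiles, here is a minimal Java sketch of the F statistic: the between-profile mean square divided by the within-profile mean square. The fault counts in main() are made-up toy values, not data from the study. The slides do not say which post-hoc procedure was used for the pairwise comparisons against profile zero, so none is shown, and turning F into a p-value would additionally require the F distribution with k - 1 and N - k degrees of freedom (for example from a statistics library).

```java
import java.util.List;

// Minimal sketch of the one-way ANOVA F statistic (no p-value, no post-hoc tests).
class OneWayAnovaSketch {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Returns F = MS_between / MS_within for k groups with N observations in total. */
    static double fStatistic(List<double[]> groups) {
        int k = groups.size();
        int n = 0;
        double grandSum = 0;
        for (double[] g : groups) {
            n += g.length;
            for (double x : g) {
                grandSum += x;
            }
        }
        double grandMean = grandSum / n;

        double ssBetween = 0; // spread of the group means around the grand mean
        double ssWithin = 0;  // spread of observations around their own group mean
        for (double[] g : groups) {
            double m = mean(g);
            ssBetween += g.length * (m - grandMean) * (m - grandMean);
            for (double x : g) {
                ssWithin += (x - m) * (x - m);
            }
        }
        double msBetween = ssBetween / (k - 1); // df_between = k - 1
        double msWithin = ssWithin / (n - k);   // df_within = N - k
        return msBetween / msWithin;
    }

    public static void main(String[] args) {
        // Toy fault counts for two hypothetical profiles.
        List<double[]> groups = List.of(new double[] {1, 2, 3}, new double[] {4, 5, 6});
        System.out.println(fStatistic(groups)); // 13.5
    }
}
```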

Results and Discussion: The Message Chains and Switch Statements Bad Smells

All source code samples associated with
more than 10 faults contain the Message
Chains Bad Smell.

The Switch Statements Bad Smell does not show a clear relationship with a high number of faults.
Results and Discussion: One-way Analysis of Variance, Apache Data

The five profiles that each indicate the presence of exactly one of the five Code Bad Smells have a lower mean number of faults than profile zero.

None of the profiles containing the Message Chains Bad Smell shows a higher mean number of faults than profile zero.

A Detailed Investigation of Message Chains

Objective:
To test whether the Message Chains Bad Smell is directly associated with faults.
To test whether the Message Chains Bad Smell is directly associated with particular types of faults.
Method:
Manually investigate 20 source code samples from the Eclipse project.

A Detailed Investigation of Message Chains: Direct Association with Faults
Association Type                       Detail of Change          Number of Instances
Message Chains Touched During Fix      Message Chains Increased  4
                                       Message Chains Reduced    5
Message Chains Not Touched During Fix  -                         45
Total                                                            54

A Detailed Investigation of Message Chains: Fault Classification

Classification Schema: An adapted version of Seaman et al.’s (2008) fault classification schema

Results:
Type of Fault           Number of Instances
Algorithm / Method      4
Checking                1
External Interface      2
Internal Interface      2
Non-functional Defects  0
Other                   0

A Detailed Investigation of Message Chains: Result

The Message Chains Bad Smell is not likely to be directly associated with faults (45 of the 54 instances were not touched during the fix), but it indicates a complicated software context.

The Message Chains Bad Smell is likely to be associated with Algorithm/Method faults, the most frequent type in the classification (4 of 9 instances).
Conclusion

Source code containing only one of the five
Code Bad Smells is not likely to be fault prone.

The Message Chains Bad Smell could cause a
high number of faults and is likely to be
associated with Algorithm/Method faults, so it
deserves further attention.

The Message Chains Bad Smell may not be
directly associated with faults but it may
indicate a complicated software context.
Q/A
References

FOWLER, M., BECK, K., BRANT, J., OPDYKE, W. & ROBERTS, D. (1999)
Refactoring: Improving the Design of Existing Code, Addison-Wesley.

GAMMA, E., HELM, R., JOHNSON, R. & VLISSIDES, J. (1995) Design Patterns:
Elements of Reusable Object-Oriented Software, Reading, Mass., Addison-Wesley.

MENS, T. & TOURWE, T. (2004) A survey of software refactoring. IEEE
Transactions on Software Engineering, 30, 126-139.

SEAMAN, C. B., SHULL, F., REGARDIE, M., ELBERT, D., FELDMANN, R. L.,
GUO, Y. & GODFREY, S. (2008) Defect categorization: making use of a decade
of widely varying historical data. Proceedings of the Second ACM-IEEE
international symposium on Empirical software engineering and measurement.
Kaiserslautern, Germany, ACM.

ZIMMERMANN, T., PREMRAJ, R. & ZELLER, A. (2007) Predicting Defects for
Eclipse. International Workshop on Predictor Models in Software Engineering
(PROMISE'07: ICSE Workshops 2007).