Hamid Abdul Basit, Member, IEEE, and Stan Jarzabek, Member, IEEE Computer Society
(IEEE Transactions on Software Engineering, vol. 35, issue 4, pp. 497-514, July-Aug. 2009)
Presenter: Deepa Shinde
Introduction
Code Clones are similar program structures.
Simple clones: Small fragments of duplicate code.
Structural clones: higher-level similarity patterns, formed when recurring
groups of simple clones in a class, file, or module are unified.
Structural clones are often induced by:
Domain: product line approach
Design technique: similar problems solved with similar design solutions
Time pressure: the need to speed up development
Motivation
Problems with clones: when a cloned fragment has to be
changed, a programmer must find and consistently update all
of its instances. This is expensive, and it makes
maintenance more difficult.
There are several clone detection tools, but most of them
detect only simple clones. Simple clones alone have
limitations: they do not help in understanding the
design of the system for better maintenance, and they are
less helpful for reuse.
The paper proposes a tool that helps in detecting
structural clones.
Solution
At the core of structural clones there are often simple clones that
coexist and relate to each other in certain ways.
Structural clones are detected by the Clone Miner tool.
Input: simple clone sets (SCSets)
Clone sets detected at higher levels:
method clone sets (MCSets, level 3)
file clone sets (FCSets, level 5)
directory clone sets (DCSets, level 7)
Algorithm Steps
Step 1 - Detect simple clones with the Repeated Token Finder tool
(finding the SCSets).
Step 2 - Find repeating groups of simple clones across the same or
different files or methods, that is, detect level 1 and level 2 structural clones.
1B and 2B structural clones are found with frequent itemset mining (FIM).
Each simple clone is encoded as a tuple <a, b>, where a is the SCSet ID and
b is the occurrence index of that ID in the given file or method.
1A and 2A are locally repeating groups, obtained by sorting and a
brute-force combination generator.
In this example, 9-9-9-15 is the structural clone.
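The tuple encoding and the mining step above can be illustrated with a minimal Python sketch. It is not Clone Miner's implementation: the function names, the file names, and the support threshold are hypothetical, and the naive subset enumeration stands in for a real FIM algorithm (it is exponential and only suitable for tiny inputs). The occurrence index is what lets a repeated SCSet ID, such as 9 appearing three times, survive as three distinct items in a set.

```python
from collections import Counter
from itertools import combinations

def encode(scset_ids):
    """Encode a container's SCSet IDs as <id, occurrence index> tuples."""
    seen = Counter()
    items = set()
    for scset_id in scset_ids:
        seen[scset_id] += 1
        items.add((scset_id, seen[scset_id]))
    return frozenset(items)

def frequent_itemsets(transactions, min_support=2, min_size=2):
    """Naive stand-in for FIM: count every item subset once per container."""
    counts = Counter()
    for items in transactions.values():
        ordered = sorted(items)  # fixed order so equal subsets compare equal
        for size in range(min_size, len(ordered) + 1):
            counts.update(combinations(ordered, size))
    return {group: n for group, n in counts.items() if n >= min_support}

# Hypothetical input: file name -> SCSet IDs of the simple clones found in it.
files = {
    "A.java": [9, 9, 9, 15],
    "B.java": [9, 15, 9, 9, 4],
    "C.java": [9, 15],
}
transactions = {name: encode(ids) for name, ids in files.items()}
groups = frequent_itemsets(transactions)

# The largest group shared by at least two files is the 9-9-9-15 pattern.
largest = max(groups, key=len)
print(largest, groups[largest])
```

With this input, the group 9-9-9-15 is shared by two files, while the smaller group 9-15 is shared by all three.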
Algorithm Steps
Step 3 - Detect file clone sets (FCSets, level 5) and method clone sets
(MCSets, level 3) by clustering the significant level 2-B and 1-B
structural clones.
Significance is defined with two measures:
Len(I): the length of the structural clone instance I, i.e., the length of
the simple clones it comprises. The unit can be lines of code.
Cover(I, C) = (Len(I) / Len(C)) * 100, where Len(C) is the length of the
method or file C. It measures the percentage of the container C that is
covered by the structural clone instance I.
[Figure: Files 13, 12, 19, and 14 grouped into Cluster 1 and Cluster 2]
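As a worked illustration of the Cover measure, the following Python sketch computes Cover(I, C) and keeps only the containers whose coverage passes a threshold. The file names, lengths, and the 80 percent threshold are hypothetical and not taken from the paper.

```python
def cover(instance_len, container_len):
    """Cover(I, C) = Len(I) / Len(C) * 100, with both lengths in lines of code."""
    if container_len <= 0:
        raise ValueError("container length must be positive")
    return instance_len * 100 / container_len

# Hypothetical data: container -> (Len(I), Len(C)) in lines of code.
instances = {
    "F1.java": (90, 100),
    "F2.java": (45, 50),
    "F3.java": (10, 200),
}

MIN_COVER = 80.0  # hypothetical significance threshold

# Only containers that are mostly covered by the structural clone instance
# would be passed on to the clustering step.
significant = sorted(
    name for name, (len_i, len_c) in instances.items()
    if cover(len_i, len_c) >= MIN_COVER
)
print(significant)
```

Here F1.java and F2.java are both 90 percent covered and count as significant, while F3.java is only 5 percent covered and is filtered out.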
Algorithm Steps
Step 4 - Detect level 4 (repeating groups of method clones) across the
same or different files.
4B is obtained by the FIM technique.
4A is obtained by sorting and brute-force combination generation.
Step 5 - Detect level 6 and level 7 by clustering, as before.
Structural clones at this level give a view of similarity across the entire
system.
[Figure: Directories 1, 12, 19, and 14 grouped into Cluster 1 and Cluster 2]
XVCL approach
Clones may be created intentionally, so eliminating them is not
always a good idea. Non-redundant code is still needed for
maintenance and reuse, and this can be achieved with XVCL.
XVCL provides a meta-level source code representation that is free of clones,
while keeping “good” clones untouched in the actual program.
The meta-level source code consists of generic metacomponents.
A generic metacomponent is a unified simple or structural clone set reported
by the Clone Miner tool.
This technique is used in the product line approach, which requires reuse of
program components across a family of similar systems.
XVCL Framework
Generic metacomponents for simple clones are at the bottom of the hierarchy.
Metacomponents for structural clones, formed as configurations of simple
clones, appear above them, and so on. Small XVCL metacomponents are
combined to form bigger ones, which eventually represent the structure of the
entire system.
The XVCL Processor reads and processes the configuration of the
metacomponents, then reconstructs the program in its native language, such
as Java or C++.
The SPC (specification) defines deltas for the metacomponents.
Buffer Library case studies
Java Buffer Library (java.nio.* package of J2SE 1.5).
Goal: build generic metacomponents for a group of files
that Clone Miner detected as an FCSet.
The Java Buffer library has 74 classes. They essentially
play the same role but differ in features
such as buffer element type, memory allocation
scheme, byte ordering, and access mode.
These classes were clustered into seven groups of
highly similar classes.
Buffer Library case studies
One example cluster is [T]Buffer, where T is the element type: Byte, Char, Int,
Double, Float, Long, or Short.
A closer analysis of Clone Miner's output reveals that the numeric buffer
classes (IntBuffer, ShortBuffer, FloatBuffer, LongBuffer, and DoubleBuffer)
differ from each other in type names only, while the CharBuffer and ByteBuffer
classes have extra methods that are missing from the other classes in the group.
This variation is handled by XVCL commands.
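The idea of one generic representation plus per-type variations can be sketched in Python with the standard string.Template class. This is only an analogy and does not use XVCL syntax or commands. The template body, the placeholder names, and the "extra methods" comment are hypothetical simplifications, not the actual java.nio source.

```python
from string import Template

# Hypothetical generic "metacomponent": one template for all [T]Buffer classes.
BUFFER_TEMPLATE = Template(
    "public abstract class ${Type}Buffer {\n"
    "    public abstract ${elem} get();\n"
    "${extra}"
    "}\n"
)

# Hypothetical per-type deltas: only Char and Byte buffers get extra methods.
EXTRA_METHODS = {
    "Char": "    // extra methods specific to CharBuffer go here\n",
    "Byte": "    // extra methods specific to ByteBuffer go here\n",
}

def generate(type_name):
    """Instantiate the generic template for one buffer element type."""
    return BUFFER_TEMPLATE.substitute(
        Type=type_name,
        elem=type_name.lower(),
        extra=EXTRA_METHODS.get(type_name, ""),
    )

int_src = generate("Int")
char_src = generate("Char")
print(int_src)
print(char_src)
```

The numeric types are produced by changing the type name only, while the Char and Byte variants pick up an additional delta, mirroring the two kinds of variation described above.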
Overall case study output
Other Case Studies
Eclipse Graphical Editing Framework: an open-source development platform
for the creation of graphical editors from an existing application model.
Other Case Studies
Eclipse Visual Editor: an open-source development platform
for creating GUI builders from an existing model.
OpenJGraph 0.9.2: an open-source Java library for
working with graphs. It contains well-known graph algorithms
such as ShortestPath and MinimumSpanningTree.
J2ME Wireless Toolkit 2.2 (WTK): a
Java development platform for wireless applications, mainly for
mobile devices such as cell phones and PDAs.
Java Pet Store 1.3.2: a model
application for the Java 2 Platform that demonstrates the
capabilities of J2EE for enterprise applications. It provides a
template for rapidly developing enterprise solutions.
Advantages of Clone Miner
Improves clone detection: most tools detect only simple
clones, which may be mere noise when considered individually. Combined
into structural clones, they represent a bigger entity.
Program understanding: design recovery is about analyzing programs
to identify important design information or concepts. Such
information is invaluable in program understanding, maintenance, reuse,
and reengineering.
Change impact analysis: efficient software maintenance requires a good
understanding of the system in order to deal with the ripple effects of
changes and with update anomalies.
Refactoring: various restructuring or refactoring techniques can be
applied to improve the design of a software system without changing its
functionality. Analysis of structural clones helps locate places
where high-level duplication is present and can be restructured or
refactored.
Drawbacks
Clone Miner currently does not support detection of
dynamic relationships between cloned entities.
It works only for code written in Java, so it is language
dependent.
More complex similarities could be considered but are not yet handled.
Detection and analysis of similarity patterns is based
only on the physical location of clones.
The paper does not explain how the SCSets are obtained.
The technique is rather domain specific.
Improvements
Another potentially useful analysis would be to
detect repeating groups of method clones across
directories, but this is currently not implemented
in Clone Miner.