Hamid Abdul Basit, Member, IEEE, and Stan Jarzabek, Member, IEEE Computer Society
(IEEE Transactions on Software Engineering, vol. 35, issue 4, pp. 497-514, July-Aug. 2009)
Presenter: Deepa Shinde
• Code clones are similar program structures.
• Simple clones: small fragments of duplicated code.
• Structural clones: higher-level clones, formed by finding patterns of simple clones within a class, file, or module and unifying them.
• Structural clones are often induced by:
  • Domain: the product line approach
  • Design technique: similar problems solved with similar design solutions
  • The need to speed up development
• Problems with clones: when a cloned fragment has to be changed, the programmer must find and update all of its instances consistently. This is expensive, and cloning makes maintenance harder.
• Several clone detection tools exist, but most detect only simple clones. Simple clones alone do little to explain the design of a system for better maintenance, and they are less helpful for reuse.
• The paper proposes a tool that helps detect structural clones.
• At the core of structural clones there are often simple clones that coexist and relate to each other in certain ways.
• Structural clones are detected by the Clone Miner tool.
  • Input: simple clones
  • Method clone sets (MCSets, at level 3)
  • File clone sets (FCSets, at level 5)
  • Directory clone sets (DCSets, at level 7)
Algorithm Steps
• Step 1: Simple clones are detected with the Repeated Token Finder tool (finding SCSets).
• Step 2: Find repeating groups of simple clones across different files or methods, or within the same one. This is the detection of level 1 and level 2 structural clones.
  • Levels 1B and 2B are found with FIM (frequent itemset mining). Each item is encoded as a tuple <a, b>, where a is the SCSet ID and b is the occurrence index of that ID in the given file or method.
  • Levels 1A and 2A are locally repeating groups, obtained by sorting and a brute-force combination generator.
  • In this example, 9-9-9-15 is the structural clone.
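The <a, b> encoding and the search for repeating groups in Step 2 can be sketched as follows. This is a minimal illustration, not Clone Miner's implementation: the function names, the file contents, and the brute-force counting (used here in place of a real frequent itemset mining algorithm) are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def encode(scset_ids):
    """Turn a container's list of SCSet IDs into <id, occurrence index> tuples."""
    seen = Counter()
    items = set()
    for scset_id in scset_ids:
        seen[scset_id] += 1
        items.add((scset_id, seen[scset_id]))
    return items


def frequent_itemsets(containers, min_support=2, min_size=2):
    """Brute-force stand-in for FIM: count every item combination per container."""
    counts = Counter()
    for items in containers.values():
        ordered = sorted(items)
        for size in range(min_size, len(ordered) + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical data: the SCSet IDs found in each file.
files = {
    "File 12": [9, 9, 9, 15],
    "File 13": [9, 15, 9, 9, 4],
    "File 14": [9, 9, 7],
}
encoded = {name: encode(ids) for name, ids in files.items()}
frequent = frequent_itemsets(encoded)
largest = max(frequent, key=len)
print(largest, frequent[largest])
```

The occurrence index is what lets a set-based miner distinguish the three occurrences of SCSet 9, so a group such as 9-9-9-15 survives as four distinct items. The brute-force counting grows exponentially with the number of items per container and is only suitable for toy inputs.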
Algorithm Steps
• Step 3: Detect file clone sets (FCSets, level 5) and method clone sets (MCSets, level 3) by clustering the significant level 2-B and level 1-B structural clones.
• Significance is defined in terms of:
  • Len(I): the length of the structural clone instance I, i.e. the length of the SCSet. The unit can be the number of lines of code.
  • Cover(I, C) = (Len(I) / Len(C)) * 100, where Len(C) is the length of the method or file. It measures the percentage of the container C covered by the structural clone instance I.
[Figure: File 12, File 13, File 14, and File 19 grouped into Cluster 1 and Cluster 2]
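As a rough sketch of how Len(I) and Cover(I, C) could drive the clustering in Step 3, the example below filters structural clone instances by two thresholds and groups the files that share a significant clone. The thresholds `min_len` and `min_cover`, the clone IDs, the file lengths, and the grouping rule are assumptions for illustration; the actual clustering in Clone Miner may differ.

```python
from collections import defaultdict


def cover(instance_len, container_len):
    """Cover(I, C): percentage of container C covered by instance I."""
    if container_len <= 0:
        raise ValueError("container length must be positive")
    return 100 * instance_len / container_len


def candidate_clusters(instances, container_lens, min_len, min_cover):
    """Group containers that share a significant structural clone."""
    clusters = defaultdict(set)
    for clone_id, container, instance_len in instances:
        covered = cover(instance_len, container_lens[container])
        if instance_len >= min_len and covered >= min_cover:
            clusters[clone_id].add(container)
    # A clone set needs at least two members.
    return {cid: sorted(m) for cid, m in clusters.items() if len(m) > 1}


# Hypothetical data: (structural clone ID, file, Len(I) in lines of code).
instances = [
    ("S1", "File 12", 90),
    ("S1", "File 13", 80),
    ("S1", "File 19", 20),
    ("S2", "File 14", 60),
    ("S2", "File 19", 150),
]
container_lens = {"File 12": 120, "File 13": 100, "File 14": 80, "File 19": 200}

print(cover(90, 120))
print(candidate_clusters(instances, container_lens, min_len=50, min_cover=70))
```

With these made-up numbers, the short S1 instance in File 19 is discarded as insignificant, so each structural clone yields one two-file candidate cluster.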
Algorithm Steps
• Step 4: Detect level 4 structural clones (repeating groups of method clones) across the same or different files.
  • Level 4B is obtained with the FIM technique.
  • Level 4A is obtained by sorting and brute-force combination generation.
• Step 5: Detect levels 6 and 7 by clustering, as before.
• By obtaining structural clones at this level it is possible to view the entire
[Figure: Directory 1 with Cluster 1 and Cluster 2]
XVCL approach
• Clones may be created intentionally, so eliminating them is not always a good idea. Non-redundant code is still needed for maintenance and reuse, and XVCL can provide it.
• XVCL provides a meta-level source code representation that is free of clones, while keeping "good" clones untouched in the actual program.
• The meta-level source code consists of generic metacomponents.
• A generic metacomponent is a unified simple or structural clone set reported by the Clone Miner tool.
• The technique is used in the product line approach, which requires reuse of program components across a family of similar systems.
XVCL Framework
• Generic metacomponents for simple clones are at the bottom of the hierarchy; metacomponents for structural clones, formed as configurations of simple clones, appear above them, and so on. Small XVCL metacomponents are combined into bigger ones and eventually represent the structure of the entire system.
• The XVCL Processor reads how the metacomponents are configured and then reconstructs the program in its native language, such as Java or C++.
• The SPC defines deltas for metacomponents.
Buffer Library case studies
• Java Buffer library (java.nio.* package of J2SE 1.5).
• Building generic metacomponents for a group of files that were detected as an FCSet by Clone Miner.
• The Java Buffer library has 74 classes. They essentially play the same role but differ in features such as buffer element type, memory allocation scheme, byte ordering, and access mode.
• These classes were clustered into seven groups of highly similar classes.
Buffer Library case studies
• An example cluster is [T]Buffer, where T is the element type: Byte, Char, Int, Double, Float, Long, or Short.
• Closer analysis of Clone Miner's output reveals that the numeric buffer classes (IntBuffer, ShortBuffer, FloatBuffer, LongBuffer, and DoubleBuffer) differ from each other in type names only, while CharBuffer and ByteBuffer have extra methods that are missing from the other classes in the group.
• This variation is handled by XVCL commands.
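To give a feel for what a generic [T]Buffer metacomponent achieves, the sketch below generates the seven class skeletons from one template, with a hook for the extra methods of CharBuffer and ByteBuffer. It uses Python's `string.Template` as a stand-in and is not XVCL syntax. The class body is a simplified, hypothetical skeleton rather than the real java.nio source, and the extra methods shown are illustrative stubs.

```python
from string import Template

# Simplified, hypothetical skeleton of a [T]Buffer class.
BUFFER_TEMPLATE = Template(
    "public abstract class ${T}Buffer {\n"
    "    public abstract ${t} get();\n"
    "    public abstract ${T}Buffer put(${t} value);\n"
    "${extra}"
    "}\n"
)

# Element type name -> Java primitive type.
ELEMENT_TYPES = {
    "Byte": "byte", "Char": "char", "Int": "int", "Double": "double",
    "Float": "float", "Long": "long", "Short": "short",
}

# Illustrative stubs for the methods only some classes have (an assumption).
EXTRA_METHODS = {
    "Char": "    public abstract char charAt(int index);\n",
    "Byte": "    public abstract ByteOrder order();\n",
}


def generate_buffers():
    """Expand the single template into one source text per element type."""
    return {
        f"{name}Buffer": BUFFER_TEMPLATE.substitute(
            T=name, t=primitive, extra=EXTRA_METHODS.get(name, "")
        )
        for name, primitive in ELEMENT_TYPES.items()
    }


sources = generate_buffers()
print(sources["IntBuffer"])
print(sources["CharBuffer"])
```

The numeric classes come out identical except for type names, which mirrors the observation above, while the per-type `extra` slot plays the role of the variation points that XVCL commands would handle.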
Overall case study output
Other Case Studies
• Eclipse Graphical Editing Framework: an open-source development platform for creating graphical editors from an existing application.
Other Case Studies
• Eclipse Visual Editor: an open-source development platform for creating GUI builders from an existing model.
• OpenJGraph 0.9.2: an open-source Java library for working with graphs. It contains well-known graph algorithms such as ShortestPath and MinimumSpanningTree.
• J2ME Wireless Toolkit 2.2 (WTK): a Java development platform for wireless applications, mainly for mobile devices such as cell phones and PDAs.
• Java Pet Store 1.3.2: a model application for the Java 2 Platform that demonstrates the capabilities of J2EE for enterprise applications. It provides a template for rapidly developing enterprise solutions.
Advantages of Clone Miner
• Improved clone detection: most tools detect only simple clones, which may be mere noise when considered individually but represent a bigger entity when combined into structural clones.
• Program understanding: design recovery analyzes programs to identify important design information or concepts. Such information is invaluable in program understanding, maintenance, reuse, and reengineering.
• Change impact analysis: efficient maintenance requires a good understanding of the system in order to deal with the ripple effects of changes and with update anomalies.
• Refactoring: various restructuring or refactoring techniques can be applied to improve the design of a software system without changing its functionality. Analysis of structural clones helps locate places where high-level duplication is present that can be restructured or
• Clone Miner currently does not support detection of dynamic relationships between cloned entities.
• It works only for code written in Java; it is language
• More complex similarities can be considered.
• Detection and analysis of similarity patterns is based only on the physical location of clones.
• The paper does not explain how the SCSets are obtained.
• The technique is more domain specific.
• Another potentially useful analysis would be to detect repeating groups of method clones across directories, but this is currently not implemented in Clone Miner.