Hamid Abdul Basit, Member, IEEE, and Stan Jarzabek, Member, IEEE Computer Society
(IEEE Transactions on Software Engineering, vol. 35, issue 4, pp. 497-514, July-Aug. 2009)
Presenter: Deepa Shinde

Introduction
Code clones are similar program structures.
Simple clones: small fragments of duplicated code.
Structural clones: higher-level clones, formed by finding patterns of simple clones within a class, file, or module and unifying them.
Structural clones are often induced by:
  Domain: the product line approach
  Design technique: similar problems solved with similar design solutions
  The need to speed up development

Motivation
The problem with clones: when a cloned fragment has to change, a programmer must find and update every instance of it consistently. This is expensive and makes maintenance harder.
There are several clone detection tools, but most of them detect only simple clones. Simple clones alone do little to explain the design of a system for maintenance purposes and are of limited help in reuse.
This paper proposes a tool that detects structural clones.

Solution
At the core of structural clones there are often simple clones that coexist and relate to each other in certain ways.
Structural clones are detected by the Clone Miner tool.
Input: simple clone sets (SCSets)
Output includes:
  Method clone sets (MCSets, level 3)
  File clone sets (FCSets, level 5)
  Directory clone sets (DCSets, level 7)

Algorithm Steps
Step 1: Detect simple clones with the Repeated Token Finder tool (this yields the SCSets).
Step 2: Find repeating groups of simple clones across the same or different files or methods, that is, detect levels 1 and 2. Frequent itemset mining (FIM) is used to find the 1B and 2B structural clones.
Each simple clone occurrence is encoded as a tuple <a, b>, where a is the SCSet ID and b is the occurrence index of that ID within the given file or method.
1A and 2A are locally repeating groups; they are obtained by sorting and a brute-force combination generator.
In the slide's example, 9-9-9-15 is the structural clone.
Step 3: Detect file clone sets (FCSets, level 5) and method clone sets (MCSets, level 3) by clustering the significant level 2B and level 1B structural clones.
Significance is defined with two measures:
  Len(I): the length of the structural clone instance I, that is, the length of its simple clones, measured for example in lines of code.
  Cover(I, C) = (Len(I) / Len(C)) * 100, where Len(C) is the length of the containing method or file C. Cover is the percentage of container C covered by structural clone instance I.
[Figure: File 12, File 13, File 14, and File 19 grouped into Cluster 1 and Cluster 2]
Step 4: Detect level 4 (repeating groups of method clones) across the same or different files. 4B is obtained with FIM; 4A is obtained by sorting and brute-force combination generation.
Step 5: Detect levels 6 and 7 by clustering, as before. With structural clones at this level, it becomes possible to view the entire system.
[Figure: Directory 1, Directory 12, Directory 14, and Directory 19 grouped into Cluster 1 and Cluster 2]

XVCL approach
Clones might be created intentionally, so eliminating them is not always a good idea.
Non-redundant code is still needed for maintenance and reuse, and XVCL can provide it.
XVCL provides a meta-level source code representation that is free of clones while leaving the “good” clones untouched in the actual program.
The meta-level source code consists of generic metacomponents.
A generic metacomponent is a unified simple or structural clone set reported by the Clone Miner tool.
The technique is used in the product line approach, which requires reuse of program components across a family of similar systems.
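Looking back at Steps 2 and 3 of the algorithm, the tuple encoding and the Cover measure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Clone Miner's implementation: the function names (encode, shared_groups, cover), the file names, and the SCSet ID lists are all made up, and a brute-force pairwise intersection stands in for a real frequent itemset mining algorithm.

```python
from collections import Counter
from itertools import combinations


def encode(scset_ids):
    """Turn a container's SCSet IDs into (id, occurrence_index) items.

    Repeated IDs get increasing occurrence indices, so [9, 9, 15]
    becomes {(9, 1), (9, 2), (15, 1)} and plain set operations can
    still tell the repeats apart.
    """
    seen = Counter()
    items = set()
    for scset_id in scset_ids:
        seen[scset_id] += 1
        items.add((scset_id, seen[scset_id]))
    return items


def shared_groups(containers, min_size=2):
    """Assumed stand-in for FIM: intersect every pair of containers.

    Returns, for each pair of containers, the encoded items they share,
    keeping only groups with at least min_size items.
    """
    encoded = {name: encode(ids) for name, ids in containers.items()}
    groups = {}
    for a, b in combinations(sorted(encoded), 2):
        common = encoded[a] & encoded[b]
        if len(common) >= min_size:
            groups[(a, b)] = sorted(common)
    return groups


def cover(instance_len, container_len):
    """Cover(I, C) = (Len(I) / Len(C)) * 100, as a percentage."""
    return 100 * instance_len / container_len


# Hypothetical input: the SCSet IDs found in each file (made-up data).
files = {
    "File12": [9, 9, 9, 15, 4],
    "File13": [9, 9, 9, 15],
    "File14": [7, 8],
}
groups = shared_groups(files)
print(groups)
print(cover(40, 50))  # a 40-line clone instance inside a 50-line file
```

With this made-up input, File12 and File13 share the group 9-9-9-15, while File14 shares nothing with either. A real FIM algorithm would additionally find groups supported by more than two containers and would apply a minimum support threshold; the Cover value would then be used to keep only significant instances before clustering.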

XVCL Framework
Generic metacomponents for simple clones sit at the bottom of the hierarchy. Metacomponents for structural clones, formed as configurations of simple clones, appear above them, and so on.
Small XVCL metacomponents are combined to form bigger ones and eventually represent the structure of the entire system.
The XVCL Processor reads how the metacomponents are configured and then reconstructs the program in its native language, such as Java or C++.
SPC defines deltas for metacomponents.

Buffer Library case studies
Java Buffer Library (java.nio.* package of J2SE 1.5).
Generic metacomponents were built for a group of files that Clone Miner detected as an FCSet.
The Java Buffer library contains 74 classes. Although they essentially play the same role, they differ in features such as buffer element type, memory allocation scheme, byte ordering, and access mode.
All these classes were clustered into seven groups of highly similar classes.
An example of a cluster is [T]Buffer, where T is an element type such as Byte, Char, Int, Double, Float, Long, or Short.
A closer analysis of Clone Miner’s output reveals that the numeric buffer classes (IntBuffer, ShortBuffer, FloatBuffer, LongBuffer, and DoubleBuffer) differ from each other in type names only.
CharBuffer and ByteBuffer have extra methods that are missing from the other classes in the group. This variation is handled by XVCL commands.
[Figure: overall case study output]

Other Case Studies
Eclipse Graphical Editing Framework: an open-source development platform for creating graphical editors from an existing application model.
Eclipse Visual Editor: an open-source development platform for creating GUI builders from an existing model.
OpenJGraph 0.9.2: an open-source Java library for working with graphs.
It contains well-known graph algorithms such as ShortestPath and MinimumSpanningTree.
J2ME Wireless Toolkit 2.2 (WTK): a Java development platform for wireless applications, mainly for mobile devices such as cell phones and PDAs.
Java Pet Store 1.3.2: a model application for the Java 2 Platform that demonstrates the capabilities of J2EE for enterprise applications. It provides a template for rapidly developing enterprise solutions.

Advantages of Clone Miner
Improves clone detection: most tools detect only simple clones, which may be mere noise when considered individually. Combined into structural clones, they represent a larger entity.
Program understanding: design recovery analyzes programs to identify important design information or concepts. Such information is invaluable in program understanding, maintenance, reuse, and reengineering.
Change impact analysis: efficient software maintenance requires a good understanding of the system in order to deal with the ripple effects of changes and with update anomalies.
Refactoring: various restructuring or refactoring techniques can improve the design of a software system without changing its functionality. Analysis of structural clones helps locate places where high-level duplication is present and can be restructured or refactored.

Drawbacks
Clone Miner currently does not support detection of dynamic relationships between cloned entities.
It works only for code written in Java, so it is language dependent.
More complex similarities could be considered.
Detection and analysis of similarity patterns is based only on the physical location of clones.
The paper does not explain how the SCSets are obtained.
The technique is rather domain specific.

Improvements
Another potentially useful analysis would be to detect repeating groups of method clones across directories, but this is not currently implemented in Clone Miner.