Mining Logical Clones in Software: Revealing High-Level Business & Programming Rules
Wenyi Qian¹, Xin Peng¹, Zhenchang Xing², Stan Jarzabek³, Wenyun Zhao¹
¹ Fudan University, China
² Nanyang Technological University, Singapore
³ National University of Singapore, Singapore
Logical Clones
• may not be well documented
• reveal high-level rules
Logical Clones
• Logical clones consist of:
– Similar methods
– Similar code fragments
– Similar entity classes
– Similar persistent data objects
Logical Clones
• Today’s techniques for clone/similarity detection:
– Simple clones (text, token, AST…)
– Structural clones (built from simple clones)
– Similar design structures (similarity metrics, machine learning)
• They are not enough to detect high-level clones:
– They lack high-level information
– They need pre-defined templates, such as specific design patterns
Approach Overview
[Figure: approach overview, with stages labeled input, abstraction, output]
Program Model
• Methods & functional clusters
• Entity classes
• Code clones
• Persistent data objects
Program Model
• Methods & functional clusters
– Semantic clustering
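The slide does not say which semantic clustering technique is used. As a rough stand-in, the sketch below groups methods whose names share enough word tokens (Jaccard similarity). The helper names `tokens`, `jaccard` and `cluster_methods` and the 0.3 threshold are assumptions for illustration only.

```python
import re

def tokens(name):
    """Split a camelCase identifier into a set of lowercase word tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return {p.lower() for p in parts}

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_methods(names, threshold=0.3):
    """Greedy clustering: a method joins the first cluster whose
    representative (first member) is similar enough, else starts a new one.
    The threshold is an arbitrary illustrative value."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if jaccard(tokens(name), tokens(cluster[0])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

clusters = cluster_methods(["PosPayCheck", "PosPayGiftCard", "getName"])
```

A real implementation would likely also use comments and identifiers inside method bodies; names alone keep the sketch short.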
Program Model
• Entity classes
– Encapsulating information with getter/setter
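One plausible way to recognise such classes is to check how many of their methods are plain accessors. The sketch below is a heuristic under that assumption; the function name `is_entity_class`, the accessor prefixes and the 0.8 ratio are hypothetical choices, not taken from the talk.

```python
import re

# Accessor naming pattern assumed here: getX / setX / isX
ACCESSOR = re.compile(r"^(get|set|is)[A-Z_]\w*$")

def is_entity_class(method_names, min_ratio=0.8):
    """Treat a class as an entity class when most of its methods are
    getters/setters. min_ratio is an illustrative cut-off."""
    if not method_names:
        return False
    accessors = sum(1 for m in method_names if ACCESSOR.match(m))
    return accessors / len(method_names) >= min_ratio

entity = is_entity_class(["getTotal", "setTotal", "getItems", "setItems"])
not_entity = is_entity_class(["processPay", "getTotal"])
```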
Program Model
• Code clones
– Simple clones in different methods
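To make "simple clones in different methods" concrete, here is a minimal token-normalising sketch: identifiers and numbers are replaced by placeholders, and any window of consecutive lines that appears in two or more methods is reported. The function names, the keyword list and the window size of 3 are assumptions; the talk does not say which clone detector is used.

```python
import re
from collections import defaultdict

# Small illustrative keyword list; everything else is treated as an identifier.
KEYWORDS = {"if", "else", "for", "while", "return", "new"}

def normalize(line):
    """Abstract identifiers to ID and numbers to NUM, then drop whitespace."""
    line = re.sub(r"[A-Za-z_]\w*",
                  lambda m: m.group() if m.group() in KEYWORDS else "ID", line)
    line = re.sub(r"\b\d+\b", "NUM", line)
    return "".join(line.split())

def find_clones(methods, window=3):
    """methods maps a method name to its list of source lines.
    Returns {normalized window: sorted method names} for windows
    that occur in at least two different methods."""
    seen = defaultdict(set)
    for name, lines in methods.items():
        norm = [normalize(l) for l in lines if l.strip()]
        for i in range(len(norm) - window + 1):
            seen[tuple(norm[i:i + window])].add(name)
    return {k: sorted(v) for k, v in seen.items() if len(v) >= 2}

sample = {
    "payCash": ["total = cart.getTotal();", "if (total > 0) {",
                "screen.show(total);", "}"],
    "payCard": ["sum = basket.getTotal();", "if (sum > 0) {",
                "screen.show(sum);", "}", "log(sum);"],
    "clear": ["cart.clear();"],
}
clones = find_clones(sample)
```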
Program Model
• Persistent data objects
– Data tables in DB or data entries in files
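The four element kinds above can be pictured as a typed graph. The sketch below is one possible representation, not the tool's actual data model; `NodeKind`, `Node`, `ProgramModel` and the single undifferentiated relation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class NodeKind(Enum):
    METHOD = "method"
    ENTITY_CLASS = "entity class"
    CODE_CLONE = "code clone"
    PERSISTENT_DATA = "persistent data object"

@dataclass(frozen=True)
class Node:
    kind: NodeKind
    name: str

@dataclass
class ProgramModel:
    # Adjacency map: node -> set of nodes it is related to
    # (call, access, containment are not distinguished in this sketch).
    edges: dict = field(default_factory=dict)

    def relate(self, source, target):
        """Record a directed relation and make sure both nodes are known."""
        self.edges.setdefault(source, set()).add(target)
        self.edges.setdefault(target, set())

    def neighbors(self, node, kind=None):
        """Nodes related to `node`, optionally restricted to one kind."""
        return {n for n in self.edges.get(node, set())
                if kind is None or n.kind == kind}

model = ProgramModel()
pay_check = Node(NodeKind.METHOD, "PosPayCheck")
screen = Node(NodeKind.ENTITY_CLASS, "PosScreen")
process_pay = Node(NodeKind.METHOD, "processPay")
model.relate(pay_check, screen)
model.relate(pay_check, process_pay)
```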
Mining Process
[Figure: program-model graph with <Method> nodes PosClearPayment, PosPayCheck, PosPayGiftCard and processPay, and <Entity class> node PosScreen]
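The figure labels suggest that several methods relate to the same set of elements (here PosScreen and processPay). A much simplified sketch of that idea follows: it reports element sets shared by several methods. This is not the algorithm from the talk; `mine_shared_patterns` is a hypothetical helper, the reading of the figure is an assumption, and the defaults borrow the "at least 3 nodes & 2 instances" setting from the case study slide.

```python
from collections import defaultdict
from itertools import combinations

def mine_shared_patterns(relations, min_nodes=3, min_instances=2):
    """relations maps a method name to the set of element names it relates to.
    Returns {frozenset of shared elements: sorted methods that relate to all
    of them}. Each instance counts the method itself plus the shared elements."""
    candidates = set()
    for a, b in combinations(sorted(relations), 2):
        common = frozenset(relations[a] & relations[b])
        if common:
            candidates.add(common)
    result = {}
    for pattern in candidates:
        methods = sorted(m for m, rel in relations.items() if pattern <= rel)
        if len(pattern) + 1 >= min_nodes and len(methods) >= min_instances:
            result[pattern] = methods
    return result

relations = {
    "PosClearPayment": {"PosScreen", "processPay"},
    "PosPayCheck": {"PosScreen", "processPay"},
    "PosPayGiftCard": {"PosScreen", "processPay"},
    "unrelatedMethod": {"SomethingElse"},  # hypothetical extra method
}
patterns = mine_shared_patterns(relations)
```

Comparing every pair of methods is quadratic and ignores edge types, so it only illustrates what a mined pattern looks like.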
Tool: MiLoCo
Case Study
• Project: Opentaps 1.4.0
– 14,351 classes & interfaces
– 253,743 methods
• 1,690 logical clones mined
– at least 3 nodes & 2 instances
Categories of Logical Clones
• Categories of Mined Logical Clones (manual work)
– Programming Convention (37%)
– Design Structure (24%)
– Business Task (23%)
– Business Process (16%)
Categories of Logical Clones
• Programming Convention
– Similar ways to implement similar functions
Categories of Logical Clones
• Design Structure
– Similar interaction structures
Categories of Logical Clones
• Business Task
– Similar ways to implement similar business tasks
Categories of Logical Clones
• Business Process
– Similar business processes or sub-processes
Human Study
• 5 senior graduate students, 2 questions:
– Helpful for program understanding? YES
– Helpful for reuse/evolution? YES
Discussion
• Helpful for reuse, even without knowledge of code details
• Developers with good domain knowledge will make better use of logical clones
• Integrating MiLoCo with IDEs will make logical clones more useful
Conclusion
• The concept of logical clones
• The approach for mining logical clones
• The tool: MiLoCo
• A case study, showing that logical clones are helpful in software understanding, reuse and maintenance
Thanks for your attention!