Makoto Ichii
Takashi Ishio
Katsuro Inoue
Osaka University
2008/12/2 AOAsia 4
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Automatic detection of crosscutting concerns helps with:
Finding refactoring opportunities
Understanding application-specific coding rules
Fung: a coding pattern detection tool [Ishio, 2008][Miyake, 2007]
Detects coding patterns, including crosscutting concerns, in an application using a data mining technique
Basic idea: “code implementing a crosscutting concern frequently appears across an application”
[Ishio, 2008] T. Ishio, H. Date, T. Miyake, and K. Inoue, "Mining Coding Patterns to Detect Crosscutting Concerns in Java Programs", Proc. WCRE 2008, 2008.
[Miyake, 2007] T. Miyake, T. Ishio, K. Taniguchi, and K. Inoue, "Towards Maintenance Support for Idiom-based Code Using Sequential Pattern Mining", Proc. AOAsia 3, 2007.
Coding pattern
An ordered sequence of method calls and control statements that frequently appears in source code.
Process of coding pattern detection: source code → parse & normalize → method call sequence → sequential pattern mining → coding pattern
Source code:
  …
  if (log.isDebugEnabled()) {
      log.debug(getMessage());
  }
  …

  …
  String status = getStatus();
  if (log.isDebugEnabled()) {
      log.debug(status);
  }
  …

  …
  if (log.isDebugEnabled()) {
      log.debug("QBK");
  }
  …
Method call sequences:
  … isDebugEnabled() / IF / getMessage() / debug() / END_IF …
  … getStatus() / isDebugEnabled() / IF / debug() / END_IF …
  … isDebugEnabled() / IF / debug() / END_IF …
Detected coding pattern:
  1: isDebugEnabled()
  2: IF
  3: debug()
  4: END_IF
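The slides leave the mining step abstract. As a minimal sketch, assuming a sequence supports a pattern when the pattern's elements occur in it in the same order (other elements may intervene), the Java code below counts the support of the detected pattern over the three normalized sequences of the logging example. The names PatternSupport, contains, and support are hypothetical; Fung's actual matching rules (for example, how IF/END_IF nesting is checked) may differ.

```java
import java.util.List;

public class PatternSupport {
    // True if `pattern` occurs in `sequence` as an ordered, not necessarily
    // contiguous, subsequence (greedy left-to-right matching).
    static boolean contains(List<String> sequence, List<String> pattern) {
        int matched = 0;
        for (String element : sequence) {
            if (matched < pattern.size() && element.equals(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    // Support: the number of sequences that contain the pattern.
    static int support(List<List<String>> sequences, List<String> pattern) {
        int count = 0;
        for (List<String> seq : sequences) {
            if (contains(seq, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> sequences = List.of(
            List.of("isDebugEnabled()", "IF", "getMessage()", "debug()", "END_IF"),
            List.of("getStatus()", "isDebugEnabled()", "IF", "debug()", "END_IF"),
            List.of("isDebugEnabled()", "IF", "debug()", "END_IF"));
        List<String> pattern = List.of("isDebugEnabled()", "IF", "debug()", "END_IF");
        System.out.println(support(sequences, pattern)); // prints 3
    }
}
```

A full sequential pattern miner would enumerate candidate patterns and keep those whose support reaches a minimum threshold; this sketch evaluates only one candidate.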
Detected coding patterns include generic idioms
Idioms also frequently appear across the code base
They are less interesting to developers who need application-specific knowledge
Example patterns detected in a target application:
Logging
1: isDebugEnabled()
2: IF
3: debug()
4: END_IF
Iterator idiom
1: iterator()
2: hasNext()
3: LOOP
4: next()
5: hasNext()
6: END_LOOP
Key Idea
Generic idioms appear in various applications
Application-specific patterns appear in only a few applications
Measure how widely a class/pattern is used across applications
– “Universality” metric
Logging (appears in only two applications)
1: isDebugEnabled()
2: IF
3: debug()
4: END_IF
Iterator idiom (appears in almost all applications)
1: iterator()
2: hasNext()
3: LOOP
4: next()
5: hasNext()
6: END_LOOP
Collect various applications, including the target application
Analyze the use-relations between the classes in these applications
Measure the universality metric for each class
Filter out the patterns comprising only universally-used classes
[Figure: the application collection is analyzed into use-relations between classes, yielding a list of universally-used classes; the patterns detected in the target application (e.g., indexOf / lastIndexOf / substring; iterator() / hasNext() / LOOP / next() / hasNext() / END_LOOP; isDebugEnabled() / IF / debug() / END_IF; activate / IF / deactivate / END_IF; contains / IF / get / END_IF) are then filtered using this list]
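The filtering step can be sketched as a set test: a pattern is dropped when every class it involves is on the list of universally-used classes. This is a minimal illustration, assuming each pattern has already been reduced to the set of classes whose methods it calls; the class PatternFilter, the universal-class list, and the logger class name "LoggerClass" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PatternFilter {
    // A detected pattern, reduced to a label and the classes whose methods it invokes.
    static final class Pattern {
        final String name;
        final Set<String> involvedClasses;

        Pattern(String name, Set<String> involvedClasses) {
            this.name = name;
            this.involvedClasses = involvedClasses;
        }
    }

    // Keeps a pattern only if at least one involved class is NOT universally used.
    static List<Pattern> filter(List<Pattern> detected, Set<String> universalClasses) {
        List<Pattern> kept = new ArrayList<>();
        for (Pattern p : detected) {
            if (!universalClasses.containsAll(p.involvedClasses)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical inputs for illustration only.
        Set<String> universal = Set.of("java.lang.String", "java.util.Collection", "java.util.Iterator");
        List<Pattern> detected = List.of(
            new Pattern("indexOf / lastIndexOf / substring", Set.of("java.lang.String")),
            new Pattern("iterator idiom", Set.of("java.util.Collection", "java.util.Iterator")),
            new Pattern("isDebugEnabled / IF / debug / END_IF", Set.of("LoggerClass")));
        for (Pattern p : filter(detected, universal)) {
            System.out.println(p.name); // only the logging pattern survives
        }
    }
}
```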
An extension of ordinary static use-relation analysis between classes in a single application
Build a use-relation graph
Node: class
Edge: static use-relation between classes
Kinds of use-relation
Inheritance, method call, field access, instantiation, and variable/parameter declaration
Source code (WarehouseApp):
  class Liquor {
      long price;
      String name;
      …
  }
  class Warehouse {
      …
      Liquor liq = new Liquor();
      …
  }
Use-relation graph (WarehouseApp): Warehouse → Liquor
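The slides do not say how the graph is stored. As a minimal sketch, assuming an adjacency map from each using class to the classes it uses, with the kinds of use recorded per edge, the WarehouseApp example collapses to a single edge Warehouse → Liquor carrying two kinds (declaration and instantiation). The names UseRelationGraph, Kind, addUse, and usedBy are hypothetical.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class UseRelationGraph {
    // The five kinds of use-relation listed on the slide.
    enum Kind { INHERITANCE, METHOD_CALL, FIELD_ACCESS, INSTANTIATION, DECLARATION }

    // Adjacency map: user class -> (used class -> kinds of use observed).
    private final Map<String, Map<String, EnumSet<Kind>>> edges = new TreeMap<>();

    void addUse(String user, String used, Kind kind) {
        edges.computeIfAbsent(user, k -> new TreeMap<>())
             .computeIfAbsent(used, k -> EnumSet.noneOf(Kind.class))
             .add(kind);
    }

    // Classes used by `user` (one node per class, regardless of how many kinds).
    Set<String> usedBy(String user) {
        return edges.getOrDefault(user, Map.of()).keySet();
    }

    public static void main(String[] args) {
        UseRelationGraph g = new UseRelationGraph();
        // "Liquor liq = new Liquor();" inside Warehouse: a declaration and an instantiation.
        g.addUse("Warehouse", "Liquor", Kind.DECLARATION);
        g.addUse("Warehouse", "Liquor", Kind.INSTANTIATION);
        System.out.println(g.usedBy("Warehouse")); // prints [Liquor]
    }
}
```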
Analyze use-relations between classes across application borders
Analyze intra-application use-relations in the same way as for a single application
If several applications contain a copy of a used class, create edges to all of the copies
[Figure: use-relation graph spanning WarehouseApp (Warehouse, Liquor) and StoreApp (Store, Shelf, Liquor, Paper); the Liquor in StoreApp is a copy of Liquor in WarehouseApp]
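The "edges to all copies" rule can be sketched as a lookup over every application in the collection. This assumes, as an illustration only, that copies are recognized by identical class names; the class CrossAppEdges, the method resolve, and the "Class@App" node notation are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrossAppEdges {
    // Resolves a use of `className` to every application that contains a class
    // with that name, so the using class gets one edge per copy.
    static List<String> resolve(Map<String, Set<String>> classesByApp, String className) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : classesByApp.entrySet()) {
            if (e.getValue().contains(className)) {
                targets.add(className + "@" + e.getKey());
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> classesByApp = new LinkedHashMap<>();
        classesByApp.put("WarehouseApp", Set.of("Warehouse", "Liquor"));
        classesByApp.put("StoreApp", Set.of("Store", "Shelf", "Liquor", "Paper"));
        // A use of Liquor gets an edge to both copies of Liquor.
        System.out.println(resolve(classesByApp, "Liquor"));
        // prints [Liquor@WarehouseApp, Liquor@StoreApp]
    }
}
```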
Class fan-in of a class c
The number of classes using c
Application fan-in of a class c
The number of applications using c
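Example (WA = WarehouseApp, SA = StoreApp; CFI = class fan-in, AFI = application fan-in):
App  Class      CFI  AFI
WA   Warehouse  0    0
WA   Liquor     3    2
SA   Store      0    0
SA   Shelf      1    1
SA   Liquor     3    2
SA   Paper      2    1
[Figure: use-relation graph of WarehouseApp (Warehouse, Liquor) and StoreApp (Store, Shelf, Liquor, Paper)]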
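Given an edge list, both fan-in measures are counts of distinct users: distinct using classes for CFI and distinct applications of those users for AFI. The sketch below assumes nodes written as "Class@App"; the edges themselves are hypothetical, chosen so that the counts reproduce the CFI/AFI values of the WarehouseApp/StoreApp example (the slides do not list the edges explicitly). FanIn, classFanIn, and appFanIn are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FanIn {
    // Nodes are written "Class@App"; this extracts the App part.
    static String appOf(String node) {
        return node.substring(node.indexOf('@') + 1);
    }

    // Class fan-in: number of distinct classes that use `target`.
    static int classFanIn(List<String[]> edges, String target) {
        Set<String> users = new HashSet<>();
        for (String[] e : edges) if (e[1].equals(target)) users.add(e[0]);
        return users.size();
    }

    // Application fan-in: number of distinct applications containing a user of `target`.
    static int appFanIn(List<String[]> edges, String target) {
        Set<String> apps = new HashSet<>();
        for (String[] e : edges) if (e[1].equals(target)) apps.add(appOf(e[0]));
        return apps.size();
    }

    public static void main(String[] args) {
        // Hypothetical edges {user, used}; every use of Liquor points at both copies.
        List<String[]> edges = new ArrayList<>();
        for (String user : List.of("Warehouse@WA", "Store@SA", "Shelf@SA")) {
            edges.add(new String[] {user, "Liquor@WA"});
            edges.add(new String[] {user, "Liquor@SA"});
        }
        edges.add(new String[] {"Store@SA", "Shelf@SA"});
        edges.add(new String[] {"Store@SA", "Paper@SA"});
        edges.add(new String[] {"Shelf@SA", "Paper@SA"});
        for (String c : List.of("Warehouse@WA", "Liquor@WA", "Store@SA", "Shelf@SA", "Liquor@SA", "Paper@SA")) {
            System.out.println(c + " CFI=" + classFanIn(edges, c) + " AFI=" + appFanIn(edges, c));
        }
    }
}
```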
Class universality of a class c
Represents how widely a class is used: by many classes and from many applications
\mathrm{univ}(c) = \frac{\log(i_c + 1)}{\log |C|} \times \frac{\log(a_c + 1)}{\log |A|}
i_c: class fan-in of c; a_c: application fan-in of c
|C|: total number of classes; |A|: total number of applications
First factor: frequently used locally; second factor: frequently used universally
Example (|C| = 6 classes, |A| = 2 applications):
App  Class          CFI  AFI  Univ.
WA   Warehouse      0    0    0
WA   Liquor         3    2    1.2
SA   Store          0    0    0
SA   Shelf          1    1    0.39
SA   Liquor (copy)  3    2    1.2
SA   Paper          2    1    0.61
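A small calculator makes the metric concrete. It assumes the formula univ(c) = log(i_c + 1) / log|C| × log(a_c + 1) / log|A|, which reproduces the example values (Shelf 0.39, Paper 0.61, Liquor 1.2 after rounding); the class and method names are hypothetical. It also assumes |C| ≥ 2 and |A| ≥ 2, since log 1 = 0 would make a denominator zero.

```java
import java.util.Locale;

public class Universality {
    // Assumed formula: univ(c) = log(i_c + 1) / log|C| * log(a_c + 1) / log|A|.
    // The base of the logarithm cancels out in each ratio.
    static double universality(int classFanIn, int appFanIn, int totalClasses, int totalApps) {
        double local = Math.log(classFanIn + 1) / Math.log(totalClasses);  // frequently used locally
        double global = Math.log(appFanIn + 1) / Math.log(totalApps);      // frequently used universally
        return local * global;
    }

    public static void main(String[] args) {
        // The two-application example: |C| = 6, |A| = 2.
        System.out.printf(Locale.ROOT, "Liquor %.2f%n", universality(3, 2, 6, 2)); // Liquor 1.23
        System.out.printf(Locale.ROOT, "Shelf  %.2f%n", universality(1, 1, 6, 2)); // Shelf  0.39
        System.out.printf(Locale.ROOT, "Paper  %.2f%n", universality(2, 1, 6, 2)); // Paper  0.61
    }
}
```

The example table rounds Liquor's value to 1.2; values above 1 can occur in such a tiny example because a class's fan-in can exceed the number of applications.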
Pattern universality of a pattern p
The minimum universality value of the classes whose methods are invoked in p
A universal pattern comprises only universal classes
Coding pattern
1: iterator()
2: hasNext()
3: LOOP
4: next()
5: hasNext()
6: END_LOOP
Involved classes
Class       Univ.
Collection  0.72
Iterator    0.77
Pattern universality = min(0.72, 0.77) = 0.72
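Pattern universality is a minimum over the involved classes, so a single non-universal class is enough to make the whole pattern non-universal. A minimal sketch, with hypothetical names; treating a class absent from the universality table as 0 and rejecting a pattern with no method calls are assumptions, not rules stated on the slide.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class PatternUniversality {
    // Minimum class universality among the classes whose methods the pattern invokes.
    static double patternUniversality(Collection<String> involvedClasses, Map<String, Double> classUniv) {
        if (involvedClasses.isEmpty()) {
            throw new IllegalArgumentException("pattern invokes no methods");
        }
        double min = Double.POSITIVE_INFINITY;
        for (String c : involvedClasses) {
            // Assumption: a class missing from the table counts as local (universality 0).
            min = Math.min(min, classUniv.getOrDefault(c, 0.0));
        }
        return min;
    }

    public static void main(String[] args) {
        Map<String, Double> univ = Map.of("java.util.Collection", 0.72, "java.util.Iterator", 0.77);
        // iterator() is a Collection method; hasNext() and next() are Iterator methods.
        System.out.println(patternUniversality(List.of("java.util.Collection", "java.util.Iterator"), univ)); // 0.72
    }
}
```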
Case Study 1
Measure the class universality values of real-world classes
Case Study 2
Measure the pattern universality values of coding patterns detected by Fung
Questions
Q.1
What kind of classes have high universality?
Q.2
Can universality distinguish widely-used classes from classes that are merely used frequently?
Q.3
What threshold value is good for filtering?
Process
Measure the class universality of the classes in the application collection
Investigate the results to answer the questions
The top-20 classes by universality [Q.1]
Difference between universality and fan-in [Q.2]
Distribution of universality values [Q.3]
Target
39 application packages (131,328 classes)
Java SE 1.5
Various OSS packages covering a broad range of domains
– Eclipse (IDE), Azureus (network client), Apache Tomcat (network server), Freemind (drawing tool), …
Q.1
What kind of classes have high universality?
Fundamental / utility classes
Rank  Class name                          Univ.  CFI
1     java.lang.String                    0.933  69,324
2     java.lang.Object                    0.915  55,628
3     java.util.List                      0.793  12,981
4     java.lang.System                    0.780  11,191
5     java.lang.Class                     0.776  10,590
6     java.lang.Throwable                 0.775  10,467
7     java.util.Iterator                  0.773  10,191
8     java.util.ArrayList                 0.772  10,135
9     java.lang.Exception                 0.761  8,840
10    java.util.Map                       0.757  8,476
11    java.lang.Integer                   0.748  7,568
12    java.util.Set                       0.741  6,954
13    java.io.File                        0.736  6,554
14    java.lang.StringBuffer              0.735  6,907
15    java.io.PrintStream                 0.730  6,132
16    java.util.HashMap                   0.730  6,129
17    java.io.IOException                 0.725  6,115
18    java.util.Collection                0.724  5,690
19    java.lang.IllegalArgumentException  0.714  5,057
20    java.lang.Runnable                  0.699  6,790
High universality / low fan-in: classes with a fundamental / utility role
Class name                  Rank [Univ.]  Rank [CFI]
java.lang.Character         39            104
java.util.LinkedList        41            105
java.io.FileOutputStream    56            177
java.lang.Comparable        78            240
java.util.Stack             95            354
Low universality / high fan-in: classes implementing crosscutting concerns in a large application
Class name                  Rank [Univ.]  Rank [CFI]
org.eclipse.swt...Control   213           25
org.eclipse.swt.SWT         221           34
org.eclipse.core…IResource  564           69
org.openide.util.NbBundle   1,398         24
org.openide.ErrorManager    1,496         54
Q.2
Can universality distinguish widely-used classes from classes that are merely used frequently?
Yes.
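The comparison above sets each class's rank by universality against its rank by class fan-in. A minimal sketch of the ranking step, assuming ranks are assigned by sorting in descending order (ties broken arbitrarily); the class names and metric values in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RankComparison {
    // 1-based ranks by descending metric value; the result iterates in rank order.
    static Map<String, Integer> ranks(Map<String, Double> metric) {
        List<String> names = new ArrayList<>(metric.keySet());
        names.sort(Comparator.comparing((String n) -> metric.get(n)).reversed());
        Map<String, Integer> rank = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            rank.put(names.get(i), i + 1);
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical values for three made-up classes.
        Map<String, Double> univ = Map.of("LibUtil", 0.70, "Helper", 0.40, "AppCore", 0.15);
        Map<String, Double> cfi = Map.of("LibUtil", 120.0, "Helper", 50.0, "AppCore", 900.0);
        Map<String, Integer> univRank = ranks(univ);
        Map<String, Integer> cfiRank = ranks(cfi);
        for (String c : univRank.keySet()) {
            // AppCore ranks 3rd by universality but 1st by CFI: low universality / high fan-in.
            System.out.println(c + ": rank " + univRank.get(c) + " by universality, rank "
                + cfiRank.get(c) + " by CFI");
        }
    }
}
```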
1.0 – 0.5: general-purpose classes
Primitive/fundamental classes, collection utilities, …
0.5 – 0.2: domain-specific classes
Logging utilities, networking, GUI, …
0.2 – 0.0: application-local classes
Univ.      # of classes  Packages
1.0 – 0.9  2             java.lang
0.9 – 0.8  0             -
0.8 – 0.7  17            java.util, java.lang, java.io
0.7 – 0.6  18            java.lang, java.util, java.io, java.net, java.awt
0.6 – 0.5  49            java.util, java.lang, java.io, javax.swing, java.awt,...
0.5 – 0.4  80            java.io, java.lang, javax.swing, javax.swing, java.awt,...
0.4 – 0.3  196           org.eclipse.swt.widgets, javax.swing, java.util, java.awt.event, java.lang, ...
0.3 – 0.2  348           org.eclipse.swt.widgets, org.eclipse.swt.graphics, javax.swing, ...
0.2 – 0.1  1,385         org.gudy.azureus2.core3.util, org.bouncycastle.asn1, ...
0.1 – 0.0  129,233       soot.jimple.parser.node, org.apache.poi.....functions, test, soot.coffi, ...
Q.3
What threshold value is good for filtering?
0.2 for finding application-specific concerns
0.5 for filtering out generic concerns
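The distribution table is a histogram of class universality in bins of width 0.1. A minimal sketch, assuming each bin includes its lower bound (so exactly 0.5 falls in "0.6 – 0.5") and that values of 1.0 or more are clamped into the top bin; both boundary choices are assumptions. The first four sample values are taken from the tables in these slides; the last two are hypothetical.

```java
import java.util.Arrays;

public class UniversalityHistogram {
    // Ten bins of width 0.1: bin b covers [b/10, (b+1)/10); bin 9 is "1.0 – 0.9".
    static int[] histogram(double[] values) {
        int[] bins = new int[10];
        for (double v : values) {
            int b = (int) Math.floor(v * 10);
            b = Math.max(0, Math.min(9, b)); // clamp out-of-range values into the end bins
            bins[b]++;
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] sample = {0.933, 0.741, 0.46, 0.36, 0.05, 0.0};
        System.out.println(Arrays.toString(histogram(sample)));
        // prints [2, 0, 0, 1, 1, 0, 0, 1, 0, 1]
    }
}
```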
Question
Can the pattern universality distinguish among generic, domain-specific and application-specific patterns?
Process
Categorize coding patterns according to pattern universality
1.0 – 0.5: Generic pattern
0.5 – 0.2: Domain-specific pattern
0.2 – 0.0: Application-specific pattern
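The categorization is a pair of thresholds applied to pattern universality. The slide gives only ranges, so placing a value of exactly 0.5 or 0.2 in the higher category is an assumption of this sketch; PatternCategory and categorize are hypothetical names.

```java
public class PatternCategory {
    enum Category { GENERIC, DOMAIN_SPECIFIC, APPLICATION_SPECIFIC }

    // 1.0 – 0.5: generic; 0.5 – 0.2: domain-specific; 0.2 – 0.0: application-specific.
    // Boundary values go to the higher category (assumption).
    static Category categorize(double patternUniversality) {
        if (patternUniversality >= 0.5) return Category.GENERIC;
        if (patternUniversality >= 0.2) return Category.DOMAIN_SPECIFIC;
        return Category.APPLICATION_SPECIFIC;
    }

    public static void main(String[] args) {
        System.out.println(categorize(0.72)); // GENERIC (the iterator idiom's value)
        System.out.println(categorize(0.36)); // DOMAIN_SPECIFIC
        System.out.println(categorize(0.05)); // APPLICATION_SPECIFIC
    }
}
```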
Target
Coding patterns: Azureus (presented in [Ishio, 2008])
Application collection: same as Case Study 1
Generic patterns (2,290 patterns)
String manipulation: String.lastIndexOf() / IF / String.substring() / END_IF
Collection manipulation: List.get() / IF / List.remove() / END_IF
Domain-specific patterns (79 patterns)
Collection manipulation (domain-specific?): Map.size() / Iterator.remove() / LinkedHashMap.get() / LinkedHashMap.remove()
Application-specific patterns (2,293 patterns)
Logging: LOOP / Thread.sleep() / Debug.printStackTrace() / END_LOOP
Synchronization: IF / AEMonitor.enter() / ArrayList.remove() / AEMonitor.exit() / END_IF
Q.
Can the pattern universality distinguish generic / domain-specific / application-specific patterns?
Almost yes.
The universality metric can distinguish universally-used classes
Resource management classes in Eclipse/NetBeans are identified as application-specific, although they have a large fan-in
Universality values may depend on the chosen set of applications
Case studies on different targets are needed
E.g., industrial software systems
Some domain-specific classes have higher class universality than general-purpose classes
Rank  Class name               Univ.
…
33    java.awt.Component       0.63
…
113   java.util.ListIterator   0.46
…
230   java.util.LinkedHashMap  0.36
…
Ideas to improve the metric
Propagate fan-in through important use-relations
E.g., inheritance
Combine it with other metrics
+ Less popular generic concerns may be more interesting than famous domain-specific ones
Cross-application fan-in analysis for filtering coding patterns
Measures universality, a metric representing how widely a class/pattern is used
Future work
Case studies with different applications
Refinement of the universality metric