Reverse Engineering of Design Patterns from Java Source Code

UC DAVIS
Nija Shi
shini@cs.ucdavis.edu
Ron Olsson
olsson@cs.ucdavis.edu
Outline
• Design patterns vs. reverse engineering
• Reclassification of design patterns
• Pattern detection techniques
• PINOT
• Ongoing and future work
ASE 2006
Design Patterns
A design pattern offers guidelines on when, how, and
why an implementation can be created to solve a
general problem in a particular context.
-- Design Patterns: Elements of Reusable Object-Oriented Software
Gang of Four (GoF)
• A few well-known uses
– Singleton: Java AWT’s (GUI builder) Toolkit class
– Proxy: CORBA’s (middleware) proxy and real objects
– Chain of Responsibility: Tomcat’s (application server)
request handlers
Reverse Engineering of Design Patterns
[Figure: UML class diagram of part of Java AWT]
• Component, with Container, TextField, Label, and Button
• Container refers to one LayoutManager (layoutMgr)
• ComponentPeer, with TextFieldPeer, LabelPeer, and ButtonPeer
• Toolkit
– private static Toolkit toolkit;
– public static synchronized Toolkit getDefaultToolkit()
– protected abstract ButtonPeer createButton(Button target)
– protected abstract TextFieldPeer createTextField(TextField target)
– protected abstract LabelPeer createLabel(Label target)
– protected abstract ScrollbarPeer createScrollbar(Scrollbar target)
Representative Current Approaches
Tool | Language | Technique | Case Study | Patterns Targeted
SPOOL | C++ | Database query | ET++ | Template Method, Factory Method, Bridge
DP++ | C++ | Database query | DTK | Composite, Flyweight, Class Adapter
Vokac et al. | C++ | Database query | SuperOffice CRM | Singleton, Template Method, Observer, Decorator
Antoniol et al. | C++ | Software metric | Leda, libg++, socket, galib, groff, mec | Adapter, Bridge
SPQR | C++ | Formal semantic | test programs | Decorator
Balanyi et al. | C++ | XML matching | Jikes, Leda, Star Office Calc, Writer | Builder, Factory Method, Prototype, Bridge, Proxy, Strategy, Template Method
PTIDEJ | Java | Constraint solver | java.awt.*, java.net.* | Composite, Facade
FUJABA | Java | Fuzzy logic and dynamic analysis | Java AWT | Bridge, Strategy, Composite
WoP Scanner | Java | AST query | AWT, Swing, JDBC API, etc. | Abstract Factory
HEDGEHOG | Java | Formal semantic | PatternBox, Java 1.1, 1.2 | Most GoF patterns (discussed later)
Heuzeroth et al. | Java | Dynamic analysis | Java Swing | Observer, Mediator, CoR, Visitor
KT | SmallTalk | Dynamic analysis | KT | Composite, Visitor, Template Method
MAISA | UML | UML matching | Nokia DX200 Switching System | Abstract Factory
Current Approaches
• Limitations
– Misinterpretation of pattern definitions
– Limited detection scope on implementation
variants
• Approaches can be grouped as follows:
– Targeting structural aspects
• Analyze class/method declarations
• Analyze inter-class relationships (e.g., whether one class
extends another)
– Targeting behavioral aspects
• Analyze code semantics (e.g., whether a code
segment is single entry)
Targeting Structural Aspects
• Method
– Extract structural relationships (inter-class
analysis)
– For a pattern, check for certain structural
properties
• Drawback
– Relies only on structural relationships,
which are not the only distinction
between patterns
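Strategy and State are a familiar case of this drawback: their class structure is essentially the same, a context class holding a reference to an interface. The sketch below illustrates the point using Java reflection on compiled classes as a stand-in for source-level inter-class analysis. All class names (Sorter, Connection, StructuralFacts) are hypothetical, and the structural test is a simplification written for this example, not the check used by any of the tools discussed.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Strategy participants.
interface SortStrategy { void sort(int[] data); }

class Sorter {
    private SortStrategy strategy;
    Sorter(SortStrategy s) { strategy = s; }
    void run(int[] data) { strategy.sort(data); }   // the context only reads its field
}

// Hypothetical State participants.
interface ConnectionState { ConnectionState next(); }

class Connection {
    private ConnectionState state;
    Connection(ConnectionState s) { state = s; }
    void advance() { state = state.next(); }        // the context rewrites its own field
}

class StructuralFacts {
    /** Names of the non-static fields of c whose declared type is an interface. */
    static List<String> interfaceFields(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && f.getType().isInterface()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    /** Simplified structural test (an assumption of this sketch): exactly one interface-typed field. */
    static boolean hasContextInterfaceAssociation(Class<?> c) {
        return interfaceFields(c).size() == 1;
    }

    public static void main(String[] args) {
        // Both classes satisfy the same structural test; only the method bodies differ.
        System.out.println(hasContextInterfaceAssociation(Sorter.class));      // true
        System.out.println(hasContextInterfaceAssociation(Connection.class));  // true
    }
}
```

Telling the two apart requires looking inside the method bodies, for example at whether the context writes to its own field, which is a behavioral question.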
Targeting Behavioral Aspects
• Method
– Narrow down search space
• using inter-class relationships
– Verify behavior in method bodies
• Dynamic analysis
• Machine learning
• Static program analysis
Targeting Behavioral Aspects
• Drawback
– Dynamic analysis:
• Requires good data coverage
• Verifies program behavior but does not verify
the intent
• Complicates detection of patterns that involve concurrency
– Machine learning:
• Most patterns have concrete definitions, so machine learning does not solve the fundamental problem
A Motivating Example
Detecting the Singleton Pattern:
• As detected by FUJABA
• Common search criteria
– private Singleton()
– private static Singleton instance
– public static Singleton getInstance()

public class Singleton
{
    private static Singleton instance;
    private Singleton() {}
    public static Singleton getInstance()
    {
        if (instance == null)
            instance = new Singleton();
        return instance;
    }
}

• Result
– The class above is correctly identified as a Singleton
– Variants whose getInstance() body is only "return new Singleton();", or "instance = new Singleton(); return instance;" without the null check, are inaccurately recognized as a Singleton
• Problem
– No behavioral analysis on getInstance()
• Solution?
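To make the problem concrete, here is a minimal sketch of a purely structural Singleton check, written with Java reflection. It encodes only the three search criteria listed above. The classes RealSingleton, FakeSingleton, and StructuralSingletonCheck are hypothetical names introduced for this illustration; this is not FUJABA's or PINOT's implementation.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class RealSingleton {
    private static RealSingleton instance;
    private RealSingleton() {}
    public static RealSingleton getInstance() {
        if (instance == null) instance = new RealSingleton();   // lazy instantiation
        return instance;
    }
}

class FakeSingleton {
    private static FakeSingleton instance;   // declared but never assigned
    private FakeSingleton() {}
    public static FakeSingleton getInstance() {
        return new FakeSingleton();          // a fresh object on every call
    }
}

class StructuralSingletonCheck {
    /** True if c has only private constructors, a private static field of its
     *  own type, and a public static method returning its own type. */
    static boolean matches(Class<?> c) {
        boolean privateCtorsOnly = true;
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            privateCtorsOnly &= Modifier.isPrivate(k.getModifiers());
        }
        boolean hasStaticSelfField = false;
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            hasStaticSelfField |= Modifier.isPrivate(m) && Modifier.isStatic(m) && f.getType() == c;
        }
        boolean hasStaticAccessor = false;
        for (Method me : c.getDeclaredMethods()) {
            int m = me.getModifiers();
            hasStaticAccessor |= Modifier.isPublic(m) && Modifier.isStatic(m) && me.getReturnType() == c;
        }
        return privateCtorsOnly && hasStaticSelfField && hasStaticAccessor;
    }

    public static void main(String[] args) {
        System.out.println(matches(RealSingleton.class));   // true
        System.out.println(matches(FakeSingleton.class));   // true, although it is not a singleton
        System.out.println(RealSingleton.getInstance() == RealSingleton.getInstance());   // true
        System.out.println(FakeSingleton.getInstance() == FakeSingleton.getInstance());   // false
    }
}
```

Both classes pass the structural test, yet only one of them ever shares a single instance. The difference is visible only in the body of getInstance().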
GoF Patterns Reclassified
[Figure: the GoF patterns regrouped by how they can be detected, annotated with the analyses involved]
• Categories: Language provided, Structure driven, Behavior driven, Domain specific, Generic concepts
• Patterns shown: Singleton, Factory Method, Abstract Factory, Template Method, Composite, Decorator, Proxy, Chain of Responsibility, Facade, Adapter, Bridge, Observer, Mediator, Flyweight, State, Strategy, Visitor, Command, Interpreter, Memento, Builder, Prototype, Iterator
• Analysis labels: class structure; factory interface; forward data-flow analysis; backward data-flow analysis; check object creation; virtual delegation; 1:1 and 1:N aggregation; unconditional delegation; conditional delegation; call dependence analysis; grouping family of products; push delegation; centralized delegation; verify implementation of flyweight pool; context-interface association; check write access on context
Language-provided Patterns
• Patterns provided in the language or library
– The Iterator Pattern
• “Provides a way to access the elements of an aggregate object
sequentially without exposing its underlying representation” [GoF]
• In Java:
– Enumeration since Java 1.0
– Iterator since Java 1.2
– The for-each loop since Java 1.5
– The Prototype Pattern
• “Specify the kinds of objects to create using a prototypical instance,
and create new objects by copying this prototype” [GoF]
• In Java:
– The clone() method in java.lang.Object
• Pattern Detection
– Recognizing variants in legacy code
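For reference, the language and library facilities listed above look like this in use. This is a small illustrative sketch: the class and method names are made up for the example, and generics are used for brevity even though code written against Java 1.0 or 1.2 would use raw types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

class LanguageProvidedPatterns {
    // Iterator pattern via Enumeration (available since Java 1.0).
    static int sumWithEnumeration(Vector<Integer> values) {
        int sum = 0;
        Enumeration<Integer> e = values.elements();
        while (e.hasMoreElements()) sum += e.nextElement();
        return sum;
    }

    // Iterator pattern via java.util.Iterator (available since Java 1.2).
    static int sumWithIterator(List<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) sum += it.next();
        return sum;
    }

    // Iterator pattern via the for-each loop (available since Java 1.5).
    static int sumWithForEach(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Prototype pattern via clone(): the new list is a copy of the prototype.
    static ArrayList<Integer> copyOf(ArrayList<Integer> prototype) {
        @SuppressWarnings("unchecked")
        ArrayList<Integer> copy = (ArrayList<Integer>) prototype.clone();
        return copy;
    }

    public static void main(String[] args) {
        ArrayList<Integer> xs = new ArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(sumWithEnumeration(new Vector<>(xs)));   // 6
        System.out.println(sumWithIterator(xs));                    // 6
        System.out.println(sumWithForEach(xs));                     // 6
        ArrayList<Integer> copy = copyOf(xs);
        System.out.println(copy.equals(xs) && copy != xs);          // true
    }
}
```

Legacy code that predates these facilities may implement the same ideas by hand, which is where detection of variants becomes relevant.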
Structure-driven Patterns
• Patterns that are driven by software architecture.
• Can be identified by inter-class relationships
– The Template Method, Composite, Decorator, Bridge,
Adapter, Proxy, Facade patterns
• Inter-class Relationships
– Accessibility
– Declaration
– Inheritance
– Delegation
– Aggregation
– Method invocation
Behavior-driven Patterns
• Patterns that are driven by system behavior.
• Can be detected using inter-class and program
analyses.
– The Singleton, Abstract Factory, Factory Method, Flyweight,
CoR, Visitor, Observer, Mediator, Strategy, and State
patterns.
• Program analysis techniques:
– Program slicing
– Data-flow analysis
– Call trace analysis
Domain-specific Patterns
• Patterns applied in a domain-specific context
– The Interpreter Pattern
• “Given a language, define a representation for its grammar
along with an interpreter that uses the representation to
interpret sentences in the language” [GoF]
• Commonly based on the Composite and Visitor patterns
– The Command Pattern
• “Encapsulate a request as an object, thereby letting you
parameterize clients with different requests, queue or log
requests, and support undoable operations” [GoF]
• Typically combines the Bridge and Composite patterns to
separate the user interface from the actual command execution. The
Memento pattern is also used to store a history of executed
commands
• Pattern Detection
– Requires domain-specific knowledge
Generic Concepts
• Patterns that are generic concepts
– The Builder Pattern
• “Separate the construction of a complex object from its
representation so that the same construction process can create
different representations” [GoF]
• A system bootstrapping pattern; object creation is not
required
– The Memento Pattern
• “Without violating encapsulation, capture and externalize an
object’s internal state so that the object can be restored to
this state later” [GoF]
• Implementation of memo pool and representation of states
are not specifically defined.
• Pattern detection
– These patterns lack an implementation trace
Recognizing the Singleton Pattern
• Structural aspect
– private Singleton()
– private static Singleton instance
– public static Singleton getInstance()
• Behavioral aspect
– Analyze the behavior in getInstance()
• Check if lazy-instantiation is implemented
– Check if instance is returned
– Slice the method body for instance and analyze
the sliced program
Recognizing the Singleton Pattern
public class SingleSpoon
{
    private SingleSpoon() {}
    private static SingleSpoon theSpoon;
    public static SingleSpoon getTheSpoon()
    {
        if (theSpoon == null)
            theSpoon = new SingleSpoon();
        return theSpoon;
    }
}
Slice of getTheSpoon() for theSpoon:
Conditions | Statements
theSpoon == null | theSpoon (created)
theSpoon != null | theSpoon (returned)
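The condition/statement view above can be checked mechanically. The sketch below is a toy model only: the slice is written by hand as a list of (guard, effect, variable) entries, and the rule applied is a simplification chosen for this example. SliceEntry and LazySingletonCheck are hypothetical names; PINOT itself derives this information from the Jikes AST, and its actual representation is not shown in these slides.

```java
import java.util.Arrays;
import java.util.List;

/** One entry of a hand-written slice: what happens to a variable under a guard. */
class SliceEntry {
    enum Effect { CREATED, RETURNED }
    final String guard;      // textual condition; "" means unconditional
    final Effect effect;
    final String variable;   // variable involved; "<new>" stands for a fresh unnamed object

    SliceEntry(String guard, Effect effect, String variable) {
        this.guard = guard;
        this.effect = effect;
        this.variable = variable;
    }
}

class LazySingletonCheck {
    /** Simplified rule (an assumption of this sketch): the static field is created
     *  only under "field == null", and only the field is ever created or returned. */
    static boolean isLazySingleton(String field, List<SliceEntry> slice) {
        boolean created = false;
        boolean returned = false;
        for (SliceEntry e : slice) {
            if (!e.variable.equals(field)) return false;   // something else is created or returned
            if (e.effect == SliceEntry.Effect.CREATED) {
                if (!e.guard.equals(field + " == null")) return false;   // unguarded creation
                created = true;
            } else {
                returned = true;
            }
        }
        return created && returned;
    }

    public static void main(String[] args) {
        List<SliceEntry> spoon = Arrays.asList(
            new SliceEntry("theSpoon == null", SliceEntry.Effect.CREATED, "theSpoon"),
            new SliceEntry("theSpoon != null", SliceEntry.Effect.RETURNED, "theSpoon"));
        List<SliceEntry> alwaysNew = Arrays.asList(
            new SliceEntry("", SliceEntry.Effect.RETURNED, "<new>"));
        List<SliceEntry> unguarded = Arrays.asList(
            new SliceEntry("", SliceEntry.Effect.CREATED, "theSpoon"),
            new SliceEntry("", SliceEntry.Effect.RETURNED, "theSpoon"));

        System.out.println(isLazySingleton("theSpoon", spoon));       // true
        System.out.println(isLazySingleton("theSpoon", alwaysNew));   // false
        System.out.println(isLazySingleton("theSpoon", unguarded));   // false
    }
}
```

The first slice corresponds to SingleSpoon; the other two model accessors that return a new object each time or recreate the instance without a null check.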
Pattern INference and recOvery Tool
• PINOT
– A fully automated pattern detection tool
– Designed to be faster and more accurate
– Detects structure- and behavior-driven
patterns
• How PINOT works
[Figure: Java source code is the input to PINOT; PINOT outputs pattern instances as text, and as XMI for viewing in UML editors]
Implementation Alternatives
• Program analysis tools
– Extract basic information of the source code
• Class, method, and variable declarations
• Class inheritance
• Method invocations, call trace
• Variable refers-to and refers-by relationships
• Parsers
– Extract the abstract syntax tree (AST)
• Compilers
– Extract the AST and provide related symbol
tables and built-in functions operating on the AST
Implementation Overview
• A modification of Jikes (an open-source Java compiler
written in C++)
• Analysis using Jikes abstract syntax tree (AST) and
symbol tables
• Identifying Structure-driven patterns
– Considers Java language constructs
– Considers commonly used Java utility classes:
java.util.Collection and java.util.Iterator
• Identifying Behavior-driven patterns
– Applies data-flow analysis, inter-procedural analysis, alias
analysis
• PINOT considers related patterns
– Speed up the process of pattern recognition
– E.g., Strategy and State Patterns, CoR and Decorator, etc.
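As an illustration of why related patterns can share detection work, consider Decorator and Chain of Responsibility: both wrap an object of their own interface type and delegate to it, and they differ mainly in whether the delegation is unconditional. The sketch below uses hypothetical classes (Handler, Terminal, Logging, SmallRequestHandler) and is not taken from PINOT or its benchmarks.

```java
interface Handler {
    String handle(int request);
}

// End of the chain / innermost component.
class Terminal implements Handler {
    public String handle(int request) { return "terminal(" + request + ")"; }
}

// Decorator-style: always forwards to the wrapped object and adds behavior.
class Logging implements Handler {
    private final Handler inner;
    Logging(Handler inner) { this.inner = inner; }
    public String handle(int request) {
        return "log:" + inner.handle(request);   // unconditional delegation
    }
}

// Chain-of-Responsibility-style: forwards only when it cannot handle the request.
class SmallRequestHandler implements Handler {
    private final Handler next;
    SmallRequestHandler(Handler next) { this.next = next; }
    public String handle(int request) {
        if (request < 10) return "small(" + request + ")";
        return next.handle(request);             // conditional delegation
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        System.out.println(new Logging(new Terminal()).handle(5));               // log:terminal(5)
        System.out.println(new SmallRequestHandler(new Terminal()).handle(5));   // small(5)
        System.out.println(new SmallRequestHandler(new Terminal()).handle(50));  // terminal(50)
    }
}
```

The structural facts (same interface, one wrapped reference, a forwarding call) are identical for Logging and SmallRequestHandler, so they need to be computed only once; the remaining question is whether the forwarding call is guarded by a condition.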
Benchmarks
• Java AWT (GUI toolkit)
• javac (Sun Java Compiler)
• JHotDraw (GUI framework)
• Apache Ant (Build tool)
• Swing (Java Swing library)
• ArgoUML (UML editor tool)
PINOT Results
• PINOT works well in terms of accuracy: it recognizes
many pattern instances in the benchmarks.
• Like other pattern detection tools, PINOT is not
perfect:
– False positives
• Prototype vs. Factory Method
– PINOT does not detect Prototype pattern
– Prototype pattern involves object creation
– PINOT identifies implementation of clone methods as factory
methods
– False Negatives
• User-defined data structures
– Container structures are commonly used with Observer,
Mediator, Composite, Chain of Responsibility patterns, etc.
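The false-negative case can be illustrated with a small sketch. It assumes, for illustration only, a detector that recognizes 1:N aggregation by looking for fields whose type is a java.util.Collection; the check below is written with reflection and is not PINOT's code. ListSubject, LinkedSubject, and AggregationCheck are hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

interface Listener {
    void update(String event);
}

// Observer subject that stores its listeners in a standard collection.
class ListSubject {
    private final List<Listener> listeners = new ArrayList<>();
    void attach(Listener l) { listeners.add(l); }
    void publish(String event) { for (Listener l : listeners) l.update(event); }
}

// Same behavior, but the listeners live in a hand-written linked list.
class LinkedSubject {
    private static final class Node {
        final Listener listener;
        final Node next;
        Node(Listener listener, Node next) { this.listener = listener; this.next = next; }
    }
    private Node head;
    void attach(Listener l) { head = new Node(l, head); }
    void publish(String event) {
        for (Node n = head; n != null; n = n.next) n.listener.update(event);
    }
}

class AggregationCheck {
    /** Illustrative test: does the class declare a field typed as a java.util.Collection? */
    static boolean hasCollectionField(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            if (Collection.class.isAssignableFrom(f.getType())) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasCollectionField(ListSubject.class));     // true
        System.out.println(hasCollectionField(LinkedSubject.class));   // false: the aggregation is missed
    }
}
```

Both subjects notify their listeners in the same way, but a check keyed to the standard collection types sees the aggregation only in the first one.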
Pattern Interpretation
• Flyweight vs. Immutable
– Immutable classes are sharable singletons
• Mediator vs. Facade
– Colleagues participating in the
Mediator pattern can have different types
– A mediator class becomes a facade
with respect to an individual colleague class
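PINOT Results
[Figure: bar chart of the number of pattern instances (y-axis 0 to 600) per benchmark (Ant, Swing, AWT, JHotDraw) for Abstract Factory, Factory Method, Singleton, Adapter, Bridge, Composite, Decorator, Facade, Flyweight, Proxy, CoR, Mediator, Observer, State, Strategy, Template Method, and Visitor]
[Table: benchmark size and analysis time; only the JHotDraw row label is recoverable, the other three rows belong to Ant, AWT, and Swing]
Benchmark | No. of classes | KLOC | Time (sec)
JHotDraw | 464 | 71.7 | 15.73
(label lost) | 526 | 72.4 | 15.54
(label lost) | 485 | 142.8 | 13.77
(label lost) | 1028 | 263.5 | 72.91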
Ongoing and Future Work
• Investigate other domain-specific
patterns
– High performance computing (HPC)
patterns
– Real-time patterns
• Extend usability of PINOT
– Formalize pattern definitions
– Visualizing detection results
PINOT + Eclipse
Conclusion
• Reverse engineering of design patterns
• Reclassifying the GoF patterns for reverse engineering
• PINOT – a faster and more accurate pattern detection tool
• Ongoing and future work
• More information on our website:
http://www.cs.ucdavis.edu/~shini/research/pinot