An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages

John Whaley
Monica S. Lam
Computer Systems Laboratory
Stanford University
September 18, 2002
Background
• Andersen’s points-to analysis for C (1994)
  - Flow-insensitive, context-insensitive
  - Inclusion-based, more accurate than unification-based Steensgaard
  - O(n³), considered too slow to be practical
• CLA optimization to Andersen’s analysis (Heintze & Tardieu, PLDI’01)
  - Online caching/cycle elimination
  - Field-independent: 1.3M lines of code in 137s
SAS 2002
Doing it for Java
• We want Andersen-level pointers for Java
• Naïve port of CLA algorithm:
  - Spec “compress” benchmark: 2+ hours!
  - Call graph accuracy: same as RTA (terrible)
• Our paper: how to do CLA for Java
  - Spec “compress” benchmark: 5 seconds!
  - JEdit (1371 classes): ~10 minutes!
  - Call graph accuracy: very good
Java vs. C: Virtual calls
• Java has many virtual calls
  - Accuracy of analysis strongly affects number of call targets
  - More call targets lead to more code being analyzed and longer analysis times
Java vs. C: Treatment of Fields
• Field-independent: in o.f, use only o
  - Most C pointer analyses
  - Sound even for non-type-safe languages
• Field-based: in o.f, use only f
  - Very inaccurate, requires type safety
• Field-sensitive: in o.f, use both o, f
  - Strictly more accurate than field-independent or field-based
  - Essential for Java
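One way to read the three treatments is as three choices of key for the abstract location that stands for o.f. The sketch below is only an illustration under that reading; the class `FieldModels` and its method names are made up for this example and do not come from the paper or from joeq.

```java
import java.util.*;

// Hypothetical illustration: how each field treatment names the abstract
// location for an access o.f, given an abstract object and a field name.
final class FieldModels {
    // Field-independent: all fields of an object collapse into the object.
    static String fieldIndependent(String obj, String field) { return obj; }

    // Field-based: all objects collapse; only the field name is kept.
    static String fieldBased(String obj, String field) { return field; }

    // Field-sensitive: a separate location per (object, field) pair.
    static String fieldSensitive(String obj, String field) { return obj + "." + field; }

    public static void main(String[] args) {
        // Four accesses: o1.f, o1.g, o2.f, o2.g.
        String[][] accesses = {{"o1", "f"}, {"o1", "g"}, {"o2", "f"}, {"o2", "g"}};
        Set<String> indep = new TreeSet<>(), based = new TreeSet<>(), sens = new TreeSet<>();
        for (String[] a : accesses) {
            indep.add(fieldIndependent(a[0], a[1]));
            based.add(fieldBased(a[0], a[1]));
            sens.add(fieldSensitive(a[0], a[1]));
        }
        System.out.println("field-independent: " + indep); // 2 locations
        System.out.println("field-based:       " + based); // 2 locations
        System.out.println("field-sensitive:   " + sens);  // 4 locations
    }
}
```

With two objects and two fields, only the field-sensitive key keeps all four locations apart, which is the sense in which it is the most precise of the three.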
Java vs. C: Local variables
• Local variables/stack locations are reused
• Flow insensitivity causes many false aliases
• Local flow sensitivity is necessary
Our Contribution
• Andersen-style inclusion-based points-to analysis for Java, based on ideas from CLA
  - Field sensitivity
    • Tracks separate fields of separate objects
  - Uses “method summary graphs”
    • Sparse representation, uses local flow sensitivity
  - Optimizations
    • Caching across iterations, reducing redundant ops
  - Supports all features of Java
Algorithm Overview
Intraprocedural:
  Generate a sparse, flow-insensitive summary graph for each method
  - Based on access paths, uses local flow sensitivity
Interprocedural:
  Using summary graphs, build inclusion graph to obtain whole-program result
Method Summaries
• Sparse, flow-insensitive summary of the semantics of each method
  - Stores (writes) in method
  - Calls made by method and their parameters
  - Return values, thrown and caught exceptions
• Use a flow-sensitive technique to generate method summaries
  - Precisely model updates to stack and locals
Method Summary: Example
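Code for method foo:
    static void foo(C x, C y) {
        C t = x.f;
        t.g = y;
        x.g = x;
        t.bar(y);
    }
Summary for method foo:
[Figure: summary graph with nodes x, x.f and y.
    read edge:           x --f--> x.f
    write edges:         x.f --g--> y,  x --g--> x
    parameter map edges: x.f and y to the call bar(t,y);]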
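The summary for foo can also be written down as plain data. The following is a minimal sketch under simplifying assumptions: nodes are access-path strings, and the class `MethodSummary`, its `Edge` type and the `paramMap` field are hypothetical names chosen for this example, not the representation used in the paper.

```java
import java.util.*;

// Hypothetical, simplified summary graph; nodes are access-path strings.
final class MethodSummary {
    // A field-labeled edge between two nodes.
    static final class Edge {
        final String from, field, to;
        Edge(String from, String field, String to) {
            this.from = from; this.field = field; this.to = to;
        }
        @Override public String toString() { return from + " -" + field + "-> " + to; }
    }

    final List<Edge> readEdges = new ArrayList<>();
    final List<Edge> writeEdges = new ArrayList<>();
    // Outgoing parameter map: call site -> nodes passed, receiver first.
    final Map<String, List<String>> paramMap = new LinkedHashMap<>();

    // Summary of: static void foo(C x, C y) { C t = x.f; t.g = y; x.g = x; t.bar(y); }
    static MethodSummary forFoo() {
        MethodSummary s = new MethodSummary();
        s.readEdges.add(new Edge("x", "f", "x.f"));    // t = x.f
        s.writeEdges.add(new Edge("x.f", "g", "y"));   // t.g = y, with t resolved to x.f
        s.writeEdges.add(new Edge("x", "g", "x"));     // x.g = x
        s.paramMap.put("bar", Arrays.asList("x.f", "y")); // t.bar(y)
        return s;
    }

    public static void main(String[] args) {
        MethodSummary s = forFoo();
        System.out.println("reads:  " + s.readEdges);
        System.out.println("writes: " + s.writeEdges);
        System.out.println("calls:  " + s.paramMap);
    }
}
```

The local t never appears as a node: the flow-sensitive pass over locals has already replaced it with the access path x.f.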
Node types
A node represents an object at run time.
• Concrete type nodes
  - Objects that have a known concrete type
  - new statements and constant objects
• Abstract nodes
  - Parameters, return values, dereferences
  - Interprocedural phase maps an abstract node to the set of concrete nodes it can represent
Edge types
• Read edge (labeled with a field f):
  - Created by load statements
  - Represent dereferences (access paths) of known locations
• Write edge (labeled with a field f):
  - Created by store statements
  - Represent references created by the method
Outgoing parameter map
• Records which nodes are passed as which parameters
• This is used in the interprocedural phase to match call sites to call targets
[Figure: the summary graph for foo (nodes x, x.f, y; edges f, g, g) with the call t.bar(y);]
Generating method summary
• Worklist data flow solver (flow-sensitive)
• Strong updates on locals, weak on others
• Detect and close cycles in access paths
• More detail in the paper
Review: Andersen’s Points-to
• Points-to is encoded as inclusion relations
    x = y  implies  x ⊇ y
    x ⊇ y is also written as: x → y
Review: Andersen’s Points-to
Rule name:            If code contains:    Apply rule:
Store                 x.f = e;             x ⊇ new_y  implies  new_y.f ⊇ e
Load                  e = x.f;             x ⊇ new_y  implies  e ⊇ new_y.f
Copy                  e1 = e2;             e1 ⊇ e2
Transitive closure                         e1 ⊇ e2, e2 ⊇ e3  implies  e1 ⊇ e3
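The Store, Load and Copy rules can be exercised with a deliberately naive fixed-point loop. This is a sketch for illustration only, not the CLA algorithm or the authors' implementation: the class `TinyAndersen`, the `Stmt` encoding and the sample program (including the allocation `d = new D` and the store `x.f = d`) are assumptions made up here. Transitive closure is obtained by re-applying the rules until nothing changes.

```java
import java.util.*;

// Naive fixed-point solver for the Store, Load and Copy rules (illustrative only).
final class TinyAndersen {
    // kind: "new"   lhs = new rhs      "copy"  lhs = rhs
    //       "load"  lhs = rhs.field    "store" lhs.field = rhs
    static final class Stmt {
        final String kind, lhs, rhs, field;
        Stmt(String kind, String lhs, String rhs, String field) {
            this.kind = kind; this.lhs = lhs; this.rhs = rhs; this.field = field;
        }
    }

    final Map<String, Set<String>> pts = new HashMap<>();

    // Points-to set of a variable or of an object field such as "C.f".
    Set<String> pt(String v) {
        Set<String> s = pts.get(v);
        if (s == null) { s = new TreeSet<>(); pts.put(v, s); }
        return s;
    }

    void solve(List<Stmt> program) {
        boolean changed = true;
        while (changed) {                      // repeat until no set grows
            changed = false;
            for (Stmt s : program) {
                switch (s.kind) {
                    case "new":
                        changed |= pt(s.lhs).add(s.rhs);
                        break;
                    case "copy":               // e1 = e2: e1 includes e2
                        changed |= pt(s.lhs).addAll(pt(s.rhs));
                        break;
                    case "load":               // e = x.f: e includes o.f for each o in pt(x)
                        for (String o : new ArrayList<>(pt(s.rhs)))
                            changed |= pt(s.lhs).addAll(pt(o + "." + s.field));
                        break;
                    case "store":              // x.f = e: o.f includes e for each o in pt(x)
                        for (String o : new ArrayList<>(pt(s.lhs)))
                            changed |= pt(o + "." + s.field).addAll(pt(s.rhs));
                        break;
                }
            }
        }
    }

    // Assumed sample program: x = new C; d = new D; y = new E; x.f = d;
    // followed by the three statements of the running example.
    static TinyAndersen example() {
        List<Stmt> program = Arrays.asList(
            new Stmt("new", "x", "C", null),
            new Stmt("new", "d", "D", null),
            new Stmt("new", "y", "E", null),
            new Stmt("store", "x", "d", "f"),
            new Stmt("load", "t", "x", "f"),    // t = x.f
            new Stmt("store", "t", "y", "g"),   // t.g = y
            new Stmt("store", "x", "x", "g"));  // x.g = x
        TinyAndersen a = new TinyAndersen();
        a.solve(program);
        return a;
    }

    public static void main(String[] args) {
        TinyAndersen a = example();
        System.out.println("t   -> " + a.pt("t"));
        System.out.println("D.g -> " + a.pt("D.g"));
        System.out.println("C.g -> " + a.pt("C.g"));
    }
}
```

Re-scanning every statement on every pass is what makes this approach slow on large programs; the graph-based formulation later in the deck avoids materializing the full closure.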
Andersen example
    t = x.f;
    t.g = y;
    x.g = x;
[Figure, built up over two slides: the graph for this code, with abstract nodes x, x.f and y joined by f and g edges, then with concrete nodes C, D and E added.]
Andersen example
[Figure, animated over six slides: the Load rule and then the Store rule are applied to the graph for t = x.f; t.g = y; x.g = x; with concrete nodes C, D and E.
    Load:   if code contains e = x.f;  apply  x ⊇ new_y  implies  e ⊇ new_y.f
    Store:  if code contains x.f = e;  apply  x ⊇ new_y  implies  new_y.f ⊇ e]
Mapping method calls
    t = x.f;
    t.g = y;
    x.g = x;
    t.bar(y);
[Figure, animated over three slides: on the graph with nodes x, x.f, y and concrete nodes C, D, E, the call t.bar(y); is linked to the parameter nodes “Bar: this” and “Bar: p1” of the target method.]
Overall Picture
[Figure: the “abstract” world and the “concrete” world, with nodes labeled C, D, E and F.]
Graph-based Andersen
• Computing full transitive closure is prohibitively expensive
• Store the graph in pre-transitive form, and calculate reachable nodes on demand
Algorithm
foreach write edge e1 → e2 (on field f) do
    foreach n in getConcreteNodes(e1) do
        add write edge n.f → e2
foreach read edge e1 → e2 (on field f) do
    foreach n in getConcreteNodes(e1) do
        add inclusion edge e2 ⊇ n.f
foreach method call e1.f() do
    foreach n in getConcreteNodes(e1) do
        add parameter mappings for target method
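The write-edge and read-edge loops can be sketched over a graph kept in pre-transitive form. This is an illustrative reduction, not the paper's code: method calls are left out, an inclusion edge a → b is taken to mean "a includes b", and the class `InclusionSolver`, the triple encoding of summary edges and the seed edges in `example()` are assumptions made for the example.

```java
import java.util.*;

// Sketch of the read/write-edge loops over a pre-transitive inclusion graph.
final class InclusionSolver {
    final Map<String, Set<String>> succ = new HashMap<>(); // a -> b: a includes b
    final Set<String> concrete = new HashSet<>();

    boolean addEdge(String from, String to) {
        return succ.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    // On-demand reachability query: concrete nodes reachable from e.
    Set<String> getConcreteNodes(String e) {
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(e);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (!seen.add(n)) continue;
            if (concrete.contains(n)) result.add(n);
            for (String m : succ.getOrDefault(n, Collections.<String>emptySet())) stack.push(m);
        }
        return result;
    }

    // Each summary edge is a triple {e1, f, e2}. Repeats until no edge is added.
    void solve(List<String[]> writes, List<String[]> reads) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] w : writes)
                for (String n : getConcreteNodes(w[0]))
                    changed |= addEdge(n + "." + w[1], w[2]);  // n.f -> e2
            for (String[] r : reads)
                for (String n : getConcreteNodes(r[0]))
                    changed |= addEdge(r[2], n + "." + r[1]);  // e2 -> n.f
        }
    }

    // Summary edges of foo, with assumed seeds: x may be C, y may be E, C.f holds D.
    static InclusionSolver example() {
        InclusionSolver s = new InclusionSolver();
        s.concrete.addAll(Arrays.asList("C", "D", "E"));
        s.addEdge("x", "C");
        s.addEdge("y", "E");
        s.addEdge("C.f", "D");
        s.solve(
            Arrays.asList(new String[][] {{"x.f", "g", "y"}, {"x", "g", "x"}}),
            Collections.singletonList(new String[] {"x", "f", "x.f"}));
        return s;
    }

    public static void main(String[] args) {
        InclusionSolver s = example();
        System.out.println("x.f -> " + s.getConcreteNodes("x.f"));
        System.out.println("D.g -> " + s.getConcreteNodes("D.g"));
        System.out.println("C.g -> " + s.getConcreteNodes("C.g"));
    }
}
```

In this example the write edge from x.f finds no concrete nodes on the first pass, because the read edge has not yet connected x.f to C.f; the second pass picks it up, which is why the loop runs to a fixed point.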
Caching reachability queries
• getConcreteNodes(e): transitive closure query on the inclusion graph
• The same queries are repeated many times
• Store the result in a hash table
  - Cached result may be stale due to edges added since the last query
  - Iterate until convergence
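A small sketch of the caching idea and of the staleness it introduces. The class `CachedReachability` and the method `startIteration` are hypothetical names; clearing the whole table at the start of each iteration is the simplest possible invalidation policy, chosen here only to keep the example short.

```java
import java.util.*;

// Memoized reachability query; results can go stale when edges are added.
final class CachedReachability {
    final Map<String, Set<String>> succ = new HashMap<>();
    final Set<String> concrete = new HashSet<>();
    final Map<String, Set<String>> cache = new HashMap<>(); // the hash table of results
    int hits = 0;

    void addEdge(String from, String to) {
        succ.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    Set<String> getConcreteNodes(String e) {
        Set<String> cached = cache.get(e);
        if (cached != null) { hits++; return cached; }  // possibly stale
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(e);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (!seen.add(n)) continue;
            if (concrete.contains(n)) result.add(n);
            for (String m : succ.getOrDefault(n, Collections.<String>emptySet())) stack.push(m);
        }
        cache.put(e, result);
        return result;
    }

    // Called once per outer iteration so stale answers are eventually refreshed.
    void startIteration() { cache.clear(); }

    public static void main(String[] args) {
        CachedReachability g = new CachedReachability();
        g.concrete.addAll(Arrays.asList("C", "D"));
        g.addEdge("x", "C");
        System.out.println(g.getConcreteNodes("x")); // computed
        g.addEdge("x", "D");
        System.out.println(g.getConcreteNodes("x")); // cache hit, misses D
        g.startIteration();
        System.out.println(g.getConcreteNodes("x")); // recomputed, includes D
    }
}
```

An outer loop that calls `startIteration()` and repeats until a full pass adds no edge ends with a pass whose cached answers are all exact, which is what makes the stale reads harmless.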
Online cycle detection
• Inclusion graph includes cycles
• The algorithm collapses cycles as they are traversed
  - During traversal, keeps track of current path
  - If a node on current path is revisited, collapse all nodes in cycle
  - Each node has a “skip” pointer, which is set when collapsed and followed on all accesses
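The three sub-points can be sketched as a depth-first traversal that tracks the current path and merges a cycle into one representative when it closes. This is a minimal illustration under assumptions, not the authors' data structure: the class `CycleCollapsingGraph` and the methods `rep`, `visit` and `collapse` are invented names, and the skip pointer is modeled as a map entry.

```java
import java.util.*;

// Depth-first traversal that collapses cycles as it meets them.
final class CycleCollapsingGraph {
    final Map<String, Set<String>> succ = new HashMap<>();
    final Map<String, String> skip = new HashMap<>(); // collapsed node -> representative

    // Follow skip pointers to the current representative of n.
    String rep(String n) {
        String s;
        while ((s = skip.get(n)) != null) n = s;
        return n;
    }

    void addEdge(String from, String to) {
        succ.computeIfAbsent(rep(from), k -> new LinkedHashSet<>()).add(to);
    }

    // Representatives of all nodes reachable from start (start included).
    Set<String> reachable(String start) {
        Set<String> visited = new HashSet<>();
        visit(start, new ArrayList<String>(), visited);
        Set<String> result = new TreeSet<>();
        for (String n : visited) result.add(rep(n));
        return result;
    }

    private void visit(String node, List<String> path, Set<String> visited) {
        String n = rep(node);
        int i = path.indexOf(n);
        if (i >= 0) {
            // n is on the current path: everything after it on the path is in a cycle with n.
            for (int j = i + 1; j < path.size(); j++) collapse(path.get(j), n);
            return;
        }
        if (!visited.add(n)) return;
        path.add(n);
        // Iterate over a snapshot, since collapsing may add edges to other nodes.
        for (String m : new ArrayList<>(succ.getOrDefault(n, Collections.<String>emptySet())))
            visit(m, path, visited);
        path.remove(path.size() - 1);
    }

    // Merge node into target: move its outgoing edges and set its skip pointer.
    private void collapse(String node, String target) {
        String a = rep(node);
        if (a.equals(target)) return;
        Set<String> moved = succ.remove(a);
        if (moved != null) succ.computeIfAbsent(target, k -> new LinkedHashSet<>()).addAll(moved);
        skip.put(a, target);
    }

    public static void main(String[] args) {
        CycleCollapsingGraph g = new CycleCollapsingGraph();
        g.addEdge("a", "b");
        g.addEdge("b", "c");
        g.addEdge("c", "a");  // closes the cycle a -> b -> c -> a
        g.addEdge("c", "d");
        System.out.println(g.reachable("a")); // [a, d]
        System.out.println(g.rep("c"));       // a
    }
}
```

After the first traversal, later queries that start at b or c go through `rep` and see a single node, so the cycle is never walked again.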
Reusing caches
• Concrete node cache values don’t change much between algorithm iterations
• Reallocating and rebuilding them is expensive
• Reuse caches from old iterations
  - Keep track of an iteration ‘version’ number for each cache entry
Minimizing set union operations
• Many caches don’t change across iterations
• Avoid set union operations for caches that haven’t changed since the last iteration
  - Keep a ‘changed’ flag for each cache entry, which records whether the last computation changed the entry
  - If input set hasn’t changed, set union operation is redundant
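Cache reuse and union skipping can be combined in one sketch. It is a variation under stated assumptions rather than the scheme from the paper: instead of a boolean ‘changed’ flag, each entry records the iteration in which it last grew (`changedAt`), the graph is assumed to be acyclic (cycles already collapsed), and the names `VersionedCache`, `Entry`, `merged` and `nextIteration` are invented for the example.

```java
import java.util.*;

// Cache entries survive across iterations; unions with unchanged inputs are skipped.
final class VersionedCache {
    static final class Entry {
        final Set<String> nodes = new TreeSet<>();   // cached concrete nodes
        final Set<String> merged = new HashSet<>();  // successors already unioned in
        int version = 0;    // iteration in which this entry was last recomputed
        int changedAt = 0;  // iteration in which 'nodes' last grew
    }

    final Map<String, Set<String>> succ = new HashMap<>();
    final Set<String> concrete = new HashSet<>();
    final Map<String, Entry> cache = new HashMap<>();
    int iteration = 1;
    int unions = 0, skipped = 0;

    void addEdge(String from, String to) {
        succ.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    void nextIteration() { iteration++; }  // entries are kept, not reallocated

    // Assumes an acyclic graph, so each successor is complete before it is merged.
    Entry get(String n) {
        Entry e = cache.get(n);
        if (e == null) { e = new Entry(); cache.put(n, e); }
        if (e.version == iteration) return e;        // already refreshed this iteration
        int prev = e.version;
        e.version = iteration;
        boolean grew = concrete.contains(n) && e.nodes.add(n);
        for (String m : succ.getOrDefault(n, Collections.<String>emptySet())) {
            Entry s = get(m);
            // Input unchanged since we last merged it: the union would add nothing.
            if (e.merged.contains(m) && s.changedAt <= prev) { skipped++; continue; }
            unions++;
            grew |= e.nodes.addAll(s.nodes);
            e.merged.add(m);
        }
        if (grew) e.changedAt = iteration;
        return e;
    }

    public static void main(String[] args) {
        VersionedCache c = new VersionedCache();
        c.concrete.addAll(Arrays.asList("C", "D"));
        c.addEdge("x", "C");
        c.addEdge("y", "x");
        System.out.println(c.get("y").nodes + " unions=" + c.unions);
        c.addEdge("x", "D");
        c.nextIteration();
        System.out.println(c.get("y").nodes + " unions=" + c.unions);
        c.nextIteration();
        System.out.println(c.get("y").nodes + " unions=" + c.unions + " skipped=" + c.skipped);
    }
}
```

In the third iteration nothing has changed, so every union is skipped and the entries are reused as they are.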
Experimental Results
• Concrete type inference
• Static call graph
• Implemented in ~800 lines of Java
• Freely available at: http://joeq.sourceforge.net
Programs

• SpecJVM
  - Standard benchmark suite
• J2EE – Java 2 Enterprise Edition v1.3
  - Massive (1+ million lines) business framework
• Cloudscape
  - Database shipped with J2EE, no source code
• joeq
  - Compiler infrastructure, 75K lines
• JEdit
  - Full-featured editor, 100K lines
Experimental Results
• We analyzed the reachable code for each application
  - Results include code in class library
  - Analysis was very effective in reducing total program size
• Pentium 4 2GHz 2GB RAM, Redhat 7.2
• Sun JDK 1.3.1_01 with 512MB heap
Analysis Precision vs. RTA
[Chart: average targets per call site (axis from 0 to 3), RTA vs. Points-to, for check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace, admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit.]
Analysis time: Small benchmarks
[Chart: analysis time in seconds (axis from 0 to 80), No opt vs. Opt, for check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace.]
Analysis time: Large benchmarks
[Chart: analysis time in seconds (axis from 0 to 2000), No opt vs. Opt, for admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit.]
Analysis time (speedup)
[Chart: times speedup of Opt (axis from 0 to 20) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace, admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit.]
Analysis time (bytecodes/second)
[Chart: bytecodes per second (axis from 0 to 20000) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, raytrace, admintool, appclient, deploytool, j2eeserver, packager, verifier, joeq, cloudscape, jedit.]
Related Work
• Original CLA paper
  - Heintze and Tardieu (PLDI 2001)
• Andersen’s analysis for Java
  - Rountev, Milanova, Ryder (OOPSLA 2001)
  - Liang, Pennings, Harrold (PASTE 2001)
  - Many others…
• Concrete type inference
  - CHA, RTA
  - Flow and context sensitivity, 0-CFA
Conclusion
• Improved precision
  - Field sensitivity
  - Local flow sensitivity
• Improved efficiency
  - Reuse reachability cache across iterations
  - Minimize set-union operations
• Scales to the largest Java programs
• A new baseline for Java pointers
  - No reason to use a less precise analysis