Notes by Walid Gomaa

Walid Gomaa
CMSC 631
Paper: “Type-Based Analysis and Applications”
Control-Flow Analysis Problem (CFA)
The purpose of control-flow analysis is to compute an approximation of the functions that can be
called from each program point. More formally, the standard CFA is defined on a variant of the
$\lambda$-calculus with labeled abstractions: $e ::= x \mid \lambda^l x.e \mid e_1\,e_2$. CFA
associates a set of labels $L(e)$, called the flow set of $e$, with each subexpression $e$ of the
program such that if $e$ reduces to an abstraction labeled $l$ during execution of the program, then
$l \in L(e)$.
0-CFA
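0-CFA reformulates CFA as a transition system over the subexpressions of a program $E$:

$$\lambda^l x.e \to \lambda^l x.e \qquad (1)$$

$$\frac{e_1 \to \lambda^l x.e}{x \to e_2} \quad (\text{if } e_1\,e_2 \text{ occurs in } E) \qquad (2)$$

$$\frac{e_1 \to \lambda^l x.e}{e_1\,e_2 \to e} \quad (\text{if } e_1\,e_2 \text{ occurs in } E) \qquad (3)$$

$$\frac{e_1 \to e_2 \qquad e_2 \to e_3}{e_1 \to e_3} \qquad (4)$$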
The standard CFA can now be restated as follows: given a program expression $e$, find all
abstractions $\lambda^l x.e'$ such that $e \to \lambda^l x.e'$ is derivable from the above rules.
For example, given the expression
$F = ((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a))\,(\lambda^4 b.b)$, we can derive the
following transitions, each marked with the rule that produces it:

$$\begin{aligned}
\lambda^1 f.\lambda^2 x.f\,x &\to \lambda^1 f.\lambda^2 x.f\,x && (1)\\
f &\to \lambda^3 a.a && (2)\\
(\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a) &\to \lambda^2 x.f\,x && (3)\\
a &\to x && (2)\\
f\,x &\to a && (3)\\
x &\to \lambda^4 b.b && (2)\\
F &\to f\,x && (3)\\
a &\to \lambda^4 b.b && (4)\\
f\,x &\to \lambda^4 b.b && (4)\\
F &\to \lambda^4 b.b && (4)
\end{aligned}$$

Hence $F \to \lambda^4 b.b$ and $L(F) = \{4\}$.
An algorithm based on this transition system essentially computes the transitive closure of the
flow graph whose basic edges are derived with the first three rules. It takes $O(n^3)$ time,
where $n$ is the number of syntax nodes.
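The closure described above can be sketched in a few lines of Python. This is a naive fixpoint that re-applies rules (1) to (4) until no new transition appears; it is not the cubic-time algorithm itself. The names `Var`, `Lam`, `App`, `zero_cfa`, and `flow_set` are illustrative, and variables are identified by name, which assumes all bound variables are distinct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    label: int
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def subterms(e):
    """Yield e and every subexpression of e."""
    yield e
    if isinstance(e, Lam):
        yield from subterms(e.body)
    elif isinstance(e, App):
        yield from subterms(e.fn)
        yield from subterms(e.arg)


def zero_cfa(program):
    """Return the set of derivable transitions (src, dst)."""
    terms = list(subterms(program))
    apps = [t for t in terms if isinstance(t, App)]
    edges = {(t, t) for t in terms if isinstance(t, Lam)}  # rule (1)
    changed = True
    while changed:
        new = set()
        for app in apps:
            for src, dst in edges:
                if src == app.fn and isinstance(dst, Lam):
                    new.add((Var(dst.var), app.arg))  # rule (2)
                    new.add((app, dst.body))          # rule (3)
        for a, b in edges:
            for c, d in edges:
                if b == c:
                    new.add((a, d))                   # rule (4)
        changed = not new <= edges
        edges |= new
    return edges


def flow_set(edges, e):
    """Labels l such that e -> (abstraction labeled l) is derivable."""
    return {dst.label for src, dst in edges
            if src == e and isinstance(dst, Lam)}


# The example F = ((λ1 f. λ2 x. f x) (λ3 a. a)) (λ4 b. b)
LAM1 = Lam(1, "f", Lam(2, "x", App(Var("f"), Var("x"))))
LAM3 = Lam(3, "a", Var("a"))
LAM4 = Lam(4, "b", Var("b"))
F = App(App(LAM1, LAM3), LAM4)

print(flow_set(zero_cfa(F), F))  # {4}
```

The nested loop for rule (4) keeps the sketch short at the cost of efficiency; a worklist over newly added edges would be the usual refinement.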
Type and Effect System
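From now on, assume that the program type checks. Simple types are generated by the grammar
$t ::= \alpha \mid t \xrightarrow{\varphi} t$, where $\alpha$ is a base type and each function
type is annotated with a flow set $\varphi$. The rules for this annotated simple typing are as
follows:

$$\Gamma \vdash x : t \quad (\Gamma(x) = t) \qquad (1)$$

$$\frac{\Gamma, x : s \vdash e : t}{\Gamma \vdash \lambda^l x.e : s \xrightarrow{\varphi} t} \quad (l \in \varphi) \qquad (2)$$

$$\frac{\Gamma \vdash e_1 : s \xrightarrow{\varphi} t \qquad \Gamma \vdash e_2 : s}{\Gamma \vdash e_1\,e_2 : t} \qquad (3)$$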
For the previous example, we can construct the following derivation:
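$$\begin{aligned}
\lambda^1 f.\lambda^2 x.f\,x &: ((\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{3\}} (\alpha \xrightarrow{\{4\}} \alpha)) \xrightarrow{\{1\}} ((\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{2\}} (\alpha \xrightarrow{\{4\}} \alpha))\\
\lambda^2 x.f\,x &: (\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{2\}} (\alpha \xrightarrow{\{4\}} \alpha)\\
\lambda^3 a.a &: (\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{3\}} (\alpha \xrightarrow{\{4\}} \alpha)\\
\lambda^4 b.b &: \alpha \xrightarrow{\{4\}} \alpha\\
(\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a) &: (\alpha \xrightarrow{\{4\}} \alpha) \xrightarrow{\{2\}} (\alpha \xrightarrow{\{4\}} \alpha)\\
F = ((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a))\,(\lambda^4 b.b) &: \alpha \xrightarrow{\{4\}} \alpha
\end{aligned}$$

Here $\alpha$ is the base type. The annotation on the type of $F$ gives $L(F) = \{4\}$.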
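The three typing rules can be read as a checker. The sketch below assumes, for simplicity, that each abstraction carries its parameter type and its flow set explicitly (the notes leave these to be found by the derivation), and it only verifies a given annotation rather than inferring one. The names `Base`, `Arrow`, `type_of`, and `ALPHA` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Arrow:
    dom: object
    flow: frozenset  # the flow set annotating the arrow
    ran: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    label: int
    var: str
    var_type: object  # assumed explicit annotation
    flow: frozenset   # assumed explicit annotation
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def type_of(env, e):
    """Return the annotated type of e under env, or raise TypeError."""
    if isinstance(e, Var):                                  # rule (1)
        return env[e.name]
    if isinstance(e, Lam):                                  # rule (2)
        if e.label not in e.flow:
            raise TypeError(f"label {e.label} missing from flow set")
        body_type = type_of({**env, e.var: e.var_type}, e.body)
        return Arrow(e.var_type, e.flow, body_type)
    fn_type = type_of(env, e.fn)                            # rule (3)
    if not isinstance(fn_type, Arrow) or fn_type.dom != type_of(env, e.arg):
        raise TypeError("ill-typed application")
    return fn_type.ran


def fs(*labels):
    return frozenset(labels)


# Annotations taken from the derivation for F in the notes.
ALPHA = Base("alpha")
A4 = Arrow(ALPHA, fs(4), ALPHA)   # alpha -{4}-> alpha
T3 = Arrow(A4, fs(3), A4)         # type of λ3 a. a

LAM2 = Lam(2, "x", A4, fs(2), App(Var("f"), Var("x")))
LAM1 = Lam(1, "f", T3, fs(1), LAM2)
LAM3 = Lam(3, "a", A4, fs(3), Var("a"))
LAM4 = Lam(4, "b", ALPHA, fs(4), Var("b"))
F = App(App(LAM1, LAM3), LAM4)

print(sorted(type_of({}, F).flow))  # [4]
```

Because rule (3) compares the argument type with the annotated domain exactly, the flow sets must agree on both sides of every application, which is what forces the annotation `{4}` onto the type of `F`.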
Sparse Flow Graph Approach
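The second type-based analysis uses a sparse flow graph and avoids the transitive closure of
0-CFA. All potential nodes are generated by the grammar
$n ::= e \mid \mathrm{dom}(n) \mid \mathrm{ran}(n)$, where $\mathrm{dom}(n)$ and
$\mathrm{ran}(n)$ are the domain and range of $n$. The edges of the flow graph for a program $E$
are defined by the following rules:

$$x \to \mathrm{dom}(\lambda^l x.e) \quad (\text{if } \lambda^l x.e \text{ occurs in } E) \qquad (1)$$

$$\mathrm{ran}(\lambda^l x.e) \to e \quad (\text{if } \lambda^l x.e \text{ occurs in } E) \qquad (2)$$

$$\mathrm{dom}(e_1) \to e_2 \quad (\text{if } e_1\,e_2 \text{ occurs in } E) \qquad (3)$$

$$e_1\,e_2 \to \mathrm{ran}(e_1) \quad (\text{if } e_1\,e_2 \text{ occurs in } E) \qquad (4)$$

$$\frac{n_1 \to n_2 \qquad n \to \mathrm{dom}(n_2)}{\mathrm{dom}(n_2) \to \mathrm{dom}(n_1)} \qquad (5)$$

$$\frac{n_1 \to n_2 \qquad n \to \mathrm{ran}(n_1)}{\mathrm{ran}(n_1) \to \mathrm{ran}(n_2)} \qquad (6)$$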
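$L(e)$ is defined as the set of labels $l$ of abstractions $\lambda^l x.e'$ such that there exists
a path $e \to^* \lambda^l x.e'$ in the flow graph. For the given expression $F$, the following
edges can be generated, each marked with the rule that produces it:

$$\begin{aligned}
& f \xrightarrow{(1)} \mathrm{dom}(\lambda^1 f.\lambda^2 x.f\,x) \xrightarrow{(3)} \lambda^3 a.a\\
& (\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a) \xrightarrow{(4)} \mathrm{ran}(\lambda^1 f.\lambda^2 x.f\,x) \xrightarrow{(2)} \lambda^2 x.f\,x\\
& F \xrightarrow{(4)} \mathrm{ran}((\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a)) \xrightarrow{(6)} \mathrm{ran}(\mathrm{ran}(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(6)} \mathrm{ran}(\lambda^2 x.f\,x) \xrightarrow{(2)} f\,x\\
& \quad \xrightarrow{(4)} \mathrm{ran}(f) \xrightarrow{(6)} \mathrm{ran}(\mathrm{dom}(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(6)} \mathrm{ran}(\lambda^3 a.a) \xrightarrow{(2)} a \xrightarrow{(1)} \mathrm{dom}(\lambda^3 a.a)\\
& \quad \xrightarrow{(5)} \mathrm{dom}(\mathrm{dom}(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(5)} \mathrm{dom}(f) \xrightarrow{(3)} x \xrightarrow{(1)} \mathrm{dom}(\lambda^2 x.f\,x)\\
& \quad \xrightarrow{(5)} \mathrm{dom}(\mathrm{ran}(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(5)} \mathrm{dom}((\lambda^1 f.\lambda^2 x.f\,x)(\lambda^3 a.a)) \xrightarrow{(3)} \lambda^4 b.b
\end{aligned}$$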
So the flow set of $F$ is $\{4\}$. It can be shown that if a $\lambda$-term is simply typed, then
the flow graph is finite and sparse, can be built in finite time, and yields the same flow
information as 0-CFA. For bounded types, the flow information can be computed in $O(n^2)$
time.
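The graph construction can be sketched as follows. The code seeds the graph with the four syntax-directed edge rules, then applies the two closure rules only when some existing edge already points at the `dom` or `ran` node in question, and finally reads $L(e)$ off by plain reachability. All names (`Dom`, `Ran`, `build_graph`, `flow_set`) are illustrative; termination relies on the program being simply typed, as the notes state, and variables are identified by name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    label: int
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Dom:
    node: object


@dataclass(frozen=True)
class Ran:
    node: object


def subterms(e):
    yield e
    if isinstance(e, Lam):
        yield from subterms(e.body)
    elif isinstance(e, App):
        yield from subterms(e.fn)
        yield from subterms(e.arg)


def build_graph(program):
    """Return the edge set of the sparse flow graph."""
    edges = set()
    for e in subterms(program):
        if isinstance(e, Lam):
            edges.add((Var(e.var), Dom(e)))  # rule (1)
            edges.add((Ran(e), e.body))      # rule (2)
        elif isinstance(e, App):
            edges.add((Dom(e.fn), e.arg))    # rule (3)
            edges.add((e, Ran(e.fn)))        # rule (4)
    changed = True
    while changed:
        changed = False
        targets = {dst for _, dst in edges}
        for n1, n2 in list(edges):
            new = []
            if Dom(n2) in targets:           # rule (5)
                new.append((Dom(n2), Dom(n1)))
            if Ran(n1) in targets:           # rule (6)
                new.append((Ran(n1), Ran(n2)))
            for edge in new:
                if edge not in edges:
                    edges.add(edge)
                    changed = True
    return edges


def flow_set(edges, start):
    """Labels of abstractions reachable from start (reflexively)."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    seen, stack = {start}, [start]
    while stack:
        for m in succ.get(stack.pop(), ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return {n.label for n in seen if isinstance(n, Lam)}


LAM1 = Lam(1, "f", Lam(2, "x", App(Var("f"), Var("x"))))
F = App(App(LAM1, Lam(3, "a", Var("a"))), Lam(4, "b", Var("b")))

print(flow_set(build_graph(F), F))  # {4}
```

Unlike a 0-CFA closure, no transitive edges are ever stored: a query pays for reachability only when it is asked.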
Types as Discriminators
In this type-based analysis, $L(e)$ is the set of abstractions in the program that have the same
type as $e$. In the given example, $F$ and $\lambda^4 b.b$ have the same type $\alpha \to \alpha$,
so $L(F) = \{4\}$.
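A minimal sketch of this idea, assuming each abstraction carries an explicit (unannotated) parameter type so that types can be computed without inference. The names `Base`, `Fun`, `type_of`, and `flow_set` are illustrative, and `ALPHA` stands for the base type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Fun:
    dom: object
    ran: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    label: int
    var: str
    var_type: object  # assumed explicit annotation
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def type_of(env, e):
    """Simple type of e under env; raises TypeError if ill-typed."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Lam):
        return Fun(e.var_type, type_of({**env, e.var: e.var_type}, e.body))
    fn_type = type_of(env, e.fn)
    if not isinstance(fn_type, Fun) or fn_type.dom != type_of(env, e.arg):
        raise TypeError("ill-typed application")
    return fn_type.ran


def lambda_types(env, e):
    """Yield (label, type) for every abstraction occurring in e."""
    if isinstance(e, Lam):
        yield e.label, type_of(env, e)
        yield from lambda_types({**env, e.var: e.var_type}, e.body)
    elif isinstance(e, App):
        yield from lambda_types(env, e.fn)
        yield from lambda_types(env, e.arg)


def flow_set(program, env, e):
    """Labels of all abstractions in program with the same type as e."""
    target = type_of(env, e)
    return {label for label, t in lambda_types({}, program) if t == target}


ALPHA = Base("alpha")
AA = Fun(ALPHA, ALPHA)  # alpha -> alpha
T = Fun(AA, AA)         # (alpha -> alpha) -> (alpha -> alpha)

LAM2 = Lam(2, "x", AA, App(Var("f"), Var("x")))
LAM1 = Lam(1, "f", T, LAM2)
F = App(App(LAM1, Lam(3, "a", AA, Var("a"))), Lam(4, "b", ALPHA, Var("b")))

print(flow_set(F, {}, F))  # {4}
```

Under these assumed types, the same query for `f` returns both labels 2 and 3, because the abstractions labeled 2 and 3 share a type. The 0-CFA derivation in these notes only yields $f \to \lambda^3 a.a$, so this analysis trades precision for simplicity.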
Advantages of Type-based Analysis

Simplicity: types provide an infrastructure on top of which analyses can be built.

Efficiency: statically-typed programs are more structured and hence easier to analyze than
dynamically-typed programs.

Correctness: well-typed programs cannot go wrong. The correctness of the analysis is
subsumed by the correctness of the type system.
Applications of Type-Based Analysis

Method Inlining: Type-based analyses such as CHA (Class Hierarchy Analysis) and RTA
(Rapid Type Analysis) use types as discriminators to determine which methods can be
invoked at virtual call sites, so that the compiler can inline these calls.

Application Extraction: CHA and RTA can be extended with a form of reachability
analysis to build the call graph which can be used to remove methods that are not reachable
from the main method, replace dynamically dispatched method calls with direct method
calls, inline method calls for which there is a unique target, and other optimizations.

Redundant-Load Elimination: A compile-time optimization that needs alias analysis.
Several type-based alias analyses have been proposed that use types as discriminators.

Type-based escape analysis: finding confined classes in Java bytecode, that is, classes whose
objects do not escape their package.

Type-based analysis for race detection: determining when two threads manipulate a shared
data structure without synchronization.

Type inference can determine where to allocate and deallocate regions in region-based
memory management.
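As an illustration of the method-inlining item, here is a minimal sketch of how Class Hierarchy Analysis resolves a virtual call: the possible targets of a call to method `m` on a receiver of static type `C` are the implementations of `m` seen by `C` and by every subclass of `C`. The class hierarchy, the tables `PARENT` and `DECLARES`, and the function names are all hypothetical.

```python
# Hypothetical single-inheritance hierarchy: class -> superclass.
PARENT = {
    "Shape": None,
    "Circle": "Shape",
    "Square": "Shape",
    "UnitCircle": "Circle",
}

# Methods that each class declares itself (not inherited ones).
DECLARES = {
    "Shape": {"area", "name"},
    "Circle": {"area"},
    "Square": {"area"},
    "UnitCircle": set(),
}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def lookup(cls, method):
    """Class whose declaration of method is used by cls, or None."""
    while cls is not None:
        if method in DECLARES[cls]:
            return cls
        cls = PARENT[cls]
    return None


def cha_targets(static_type, method):
    """Declaring classes of all possible targets of a virtual call."""
    targets = {lookup(c, method) for c in PARENT if is_subclass(c, static_type)}
    targets.discard(None)
    return targets


def can_inline(static_type, method):
    """A call is a candidate for inlining when its target is unique."""
    return len(cha_targets(static_type, method)) == 1


print(sorted(cha_targets("Shape", "area")))  # ['Circle', 'Shape', 'Square']
print(can_inline("Circle", "area"))          # True
```

RTA would shrink these sets further by only considering classes that the program actually instantiates; that refinement is not shown here.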