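Walid Gomaa
CMSC 631
Paper: “Type-Based Analysis and Applications”

Control-Flow Analysis Problem (CFA)

The purpose of control-flow analysis is to compute an approximation of the set of functions that can be called from each program point. More formally, we define the standard CFA on a variant of the $\lambda$-calculus with labeled abstractions:

$$e ::= x \mid \lambda^l x.e \mid e_1\, e_2$$

CFA associates with each subexpression $e$ of the program a set of labels $L(e)$, called the flow set of $e$, such that if $e$ reduces to an abstraction labeled $l$ during execution of the program, then $l \in L(e)$.

0-CFA

0-CFA reformulates CFA as a transition system over the subexpressions of the program $E$:

$$\lambda^l x.e \to \lambda^l x.e \qquad (1)$$

$$\frac{e_1 \to \lambda^l x.e}{x \to e_2} \quad (\text{if } e_1\, e_2 \text{ occurs in } E) \qquad (2)$$

$$\frac{e_1 \to \lambda^l x.e}{e_1\, e_2 \to e} \quad (\text{if } e_1\, e_2 \text{ occurs in } E) \qquad (3)$$

$$\frac{e_1 \to e_2 \qquad e_2 \to e_3}{e_1 \to e_3} \qquad (4)$$

The standard CFA can now be restated as follows: given a program expression $e$, find all abstractions $\lambda^l x.e'$ such that $e \to \lambda^l x.e'$ is derivable from the rules above. For example, given the expression

$$F = ((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a))\,(\lambda^4 b.b),$$

we can derive the following transitions, each annotated with the rule that produces it:

$$
\begin{aligned}
\lambda^1 f.\lambda^2 x.f\,x &\to \lambda^1 f.\lambda^2 x.f\,x && (1)\\
f &\to \lambda^3 a.a && (2)\\
(\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a) &\to \lambda^2 x.f\,x && (3)\\
a &\to x && (2)\\
f\,x &\to a && (3)\\
x &\to \lambda^4 b.b && (2)\\
((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a))\,(\lambda^4 b.b) &\to f\,x && (3)\\
a &\to \lambda^4 b.b && (4)\\
f\,x &\to \lambda^4 b.b && (4)\\
((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a))\,(\lambda^4 b.b) &\to \lambda^4 b.b && (4)
\end{aligned}
$$

Hence $F \to \lambda^4 b.b$ and $L(F) = \{4\}$.

An algorithm based on this transition system essentially computes the transitive closure of the flow graph whose basic edges are derived by the first three rules. It takes $O(n^3)$ time, where $n$ is the number of syntax nodes.

Type and Effect System

From now on assume that the program type checks. A simple type system is generated by the following grammar:

$$t ::= \iota \mid t \xrightarrow{\varphi} t$$

where $\iota$ is a base type and each function type is annotated with a flow set $\varphi$. The rules for this simply annotated typing are as follows:

$$\Gamma \vdash x : t \quad (\Gamma(x) = t) \qquad (1)$$

$$\frac{\Gamma, x : s \vdash e : t}{\Gamma \vdash \lambda^l x.e : s \xrightarrow{\varphi} t} \quad (l \in \varphi) \qquad (2)$$

$$\frac{\Gamma \vdash e_1 : s \xrightarrow{\varphi} t \qquad \Gamma \vdash e_2 : s}{\Gamma \vdash e_1\, e_2 : t} \qquad (3)$$

For the previous example, we can construct the following derivation:

$$
\begin{aligned}
\lambda^1 f.\lambda^2 x.f\,x &: ((\iota \xrightarrow{\{4\}} \iota) \xrightarrow{\{3\}} (\iota \xrightarrow{\{4\}} \iota)) \xrightarrow{\{1\}} ((\iota \xrightarrow{\{4\}} \iota) \xrightarrow{\{2\}} (\iota \xrightarrow{\{4\}} \iota))\\
\lambda^2 x.f\,x &: (\iota \xrightarrow{\{4\}} \iota) \xrightarrow{\{2\}} (\iota \xrightarrow{\{4\}} \iota)\\
\lambda^3 a.a &: (\iota \xrightarrow{\{4\}} \iota) \xrightarrow{\{3\}} (\iota \xrightarrow{\{4\}} \iota)\\
\lambda^4 b.b &: \iota \xrightarrow{\{4\}} \iota\\
(\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a) &: (\iota \xrightarrow{\{4\}} \iota) \xrightarrow{\{2\}} (\iota \xrightarrow{\{4\}} \iota)\\
((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a))\,(\lambda^4 b.b) &: \iota \xrightarrow{\{4\}} \iota
\end{aligned}
$$

So $F : \iota \xrightarrow{\{4\}} \iota$, and reading the flow set off the annotation of this type gives $L(F) = \{4\}$.

Sparse Flow Graph Approach

The second type-based analysis uses a sparse flow graph and avoids the transitive closure of 0-CFA. All potential nodes are generated by the grammar

$$n ::= e \mid dom(n) \mid ran(n)$$

where $dom(n)$ and $ran(n)$ are the domain and range of $n$. The transition system for the new flow graph is defined as follows:

$$x \to dom(\lambda^l x.e) \quad (\text{if } \lambda^l x.e \text{ occurs in } E) \qquad (1)$$

$$ran(\lambda^l x.e) \to e \quad (\text{if } \lambda^l x.e \text{ occurs in } E) \qquad (2)$$

$$dom(e_1) \to e_2 \quad (\text{if } e_1\, e_2 \text{ occurs in } E) \qquad (3)$$

$$e_1\, e_2 \to ran(e_1) \quad (\text{if } e_1\, e_2 \text{ occurs in } E) \qquad (4)$$

$$\frac{n_1 \to n_2 \qquad n \to dom(n_2)}{dom(n_2) \to dom(n_1)} \qquad (5)$$

$$\frac{n_1 \to n_2 \qquad n \to ran(n_1)}{ran(n_1) \to ran(n_2)} \qquad (6)$$

$L(e)$ is defined as the set of abstractions $\lambda^l x.e'$ such that there exists a path $e \to^* \lambda^l x.e'$ in the flow graph. For the given expression $F$ the following edges can be generated (each arrow is labeled with the rule that produces it):

$$f \xrightarrow{(1)} dom(\lambda^1 f.\lambda^2 x.f\,x) \xrightarrow{(3)} \lambda^3 a.a$$

$$(\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a) \xrightarrow{(4)} ran(\lambda^1 f.\lambda^2 x.f\,x) \xrightarrow{(2)} \lambda^2 x.f\,x$$

$$
\begin{aligned}
F &\xrightarrow{(4)} ran((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a)) \xrightarrow{(6)} ran(ran(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(6)} ran(\lambda^2 x.f\,x) \xrightarrow{(2)} f\,x\\
&\xrightarrow{(4)} ran(f) \xrightarrow{(6)} ran(dom(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(6)} ran(\lambda^3 a.a) \xrightarrow{(2)} a \xrightarrow{(1)} dom(\lambda^3 a.a)\\
&\xrightarrow{(5)} dom(dom(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(5)} dom(f) \xrightarrow{(3)} x \xrightarrow{(1)} dom(\lambda^2 x.f\,x)\\
&\xrightarrow{(5)} dom(ran(\lambda^1 f.\lambda^2 x.f\,x)) \xrightarrow{(5)} dom((\lambda^1 f.\lambda^2 x.f\,x)\,(\lambda^3 a.a)) \xrightarrow{(3)} \lambda^4 b.b
\end{aligned}
$$

So the flow set of $F$ is $\{4\}$. It can be shown that if a $\lambda$-term is simply typed, then the flow graph is finite and sparse, can be built in finite time, and yields the same flow information as 0-CFA. For bounded types, the flow information can be computed in $O(n^2)$ time.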
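The two transition systems above can be tried out on the example $F$. The following Python sketch saturates the 0-CFA rules (1)-(4) and, separately, the sparse flow graph rules (1)-(6), then reads flow sets off by reachability. The names `Var`, `Lam`, `App`, `zero_cfa`, `sparse_graph` and `reachable_labels` are illustrative assumptions and do not come from the paper. Both loops are naive fixed-point iterations rather than the efficient algorithms the complexity bounds refer to, and the sparse-graph loop assumes a simply typed term with distinct bound variable names (on an untyped term it may not terminate).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    label: int
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def subterms(e):
    """Yield e and every subexpression of e."""
    yield e
    if isinstance(e, Lam):
        yield from subterms(e.body)
    elif isinstance(e, App):
        yield from subterms(e.fn)
        yield from subterms(e.arg)

def zero_cfa(program):
    """Saturate 0-CFA rules (1)-(4) by naive iteration."""
    terms = set(subterms(program))
    apps = [t for t in terms if isinstance(t, App)]
    edges = {(t, t) for t in terms if isinstance(t, Lam)}       # rule (1)
    while True:
        new = set()
        for src, dst in edges:
            if isinstance(dst, Lam):
                for app in apps:
                    if app.fn == src:
                        new.add((Var(dst.param), app.arg))      # rule (2)
                        new.add((app, dst.body))                # rule (3)
            new.update((src, d) for s, d in edges if s == dst)  # rule (4)
        if new <= edges:
            return edges
        edges |= new

def dom(n):
    return ("dom", n)

def ran(n):
    return ("ran", n)

def sparse_graph(program):
    """Build the sparse flow graph; assumes a simply typed term."""
    edges = set()
    for t in set(subterms(program)):
        if isinstance(t, Lam):
            edges.add((Var(t.param), dom(t)))   # rule (1)
            edges.add((ran(t), t.body))         # rule (2)
        elif isinstance(t, App):
            edges.add((dom(t.fn), t.arg))       # rule (3)
            edges.add((t, ran(t.fn)))           # rule (4)
    while True:
        targets = {dst for _, dst in edges}     # nodes with an incoming edge
        new = set()
        for n1, n2 in edges:
            if dom(n2) in targets:
                new.add((dom(n2), dom(n1)))     # rule (5)
            if ran(n1) in targets:
                new.add((ran(n1), ran(n2)))     # rule (6)
        if new <= edges:
            return edges
        edges |= new

def reachable_labels(edges, start):
    """Labels of all abstractions reachable from start along edges."""
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for s, d in edges:
            if s == n and d not in seen:
                seen.add(d)
                stack.append(d)
    return {n.label for n in seen if isinstance(n, Lam)}

# F = ((\1 f. \2 x. f x) (\3 a. a)) (\4 b. b)
lam1 = Lam(1, "f", Lam(2, "x", App(Var("f"), Var("x"))))
lam3 = Lam(3, "a", Var("a"))
lam4 = Lam(4, "b", Var("b"))
F = App(App(lam1, lam3), lam4)

cfa_edges = zero_cfa(F)
sparse_edges = sparse_graph(F)
print(reachable_labels(cfa_edges, F))     # {4}
print(reachable_labels(sparse_edges, F))  # {4}
```

On this example the two graphs give the same flow set for every subterm, although the sparse graph never adds a direct edge from $F$ to $\lambda^4 b.b$: the abstraction is found only by following the path through the $dom$ and $ran$ nodes.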
Types as Discriminators

In this type-based analysis, $L(e)$ is the set of abstractions in the program that have the same type as $e$. In the given example, $\lambda^4 b.b$ is the only abstraction with the same type as $F$, so $L(F) = \{4\}$.

Advantages of Type-based Analysis

Simplicity: types provide an infrastructure on top of which analyses can be built.

Efficiency: statically typed programs are more structured and hence easier to analyze than dynamically typed programs.

Correctness: well-typed programs cannot go wrong. The correctness of the analysis is subsumed by the correctness of the type system.

Applications of Type-Based Analysis

Method Inlining: Type-based analyses such as CHA (Class Hierarchy Analysis) and RTA (Rapid Type Analysis) use types as discriminators to determine the methods that can be invoked at virtual call sites, so that the compiler can inline these calls.

Application Extraction: CHA and RTA can be extended with a form of reachability analysis to build the call graph. The call graph can then be used to remove methods that are not reachable from the main method, replace dynamically dispatched method calls with direct method calls, inline method calls that have a unique target, and perform other optimizations.

Redundant-Load Elimination: A compile-time optimization that needs alias analysis. Several type-based alias analyses that use types as discriminators have been proposed.

Type-based escape analysis (finding confined classes in Java bytecode, that is, classes whose objects do not escape their package).

Type-based analysis for race detection (determining when two threads manipulate a shared data structure without synchronization).

Type inference can determine where to allocate and deallocate regions in region-based memory management.
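To make the use of types as discriminators at virtual call sites more concrete, here is a minimal Python sketch of how CHA and RTA might narrow the set of possible call targets. The class hierarchy (`Shape`, `Circle`, `Square`, `UnitSquare`), the method names and the helper functions are all hypothetical, chosen only for illustration. The sketch assumes single inheritance and ignores interfaces and abstract methods.

```python
# Hypothetical class hierarchy: class -> parent (None for the root).
PARENT = {"Shape": None, "Circle": "Shape", "Square": "Shape",
          "UnitSquare": "Square"}

# Hypothetical method table: class -> methods declared in that class.
DECLARES = {"Shape": {"area", "name"}, "Circle": {"area"},
            "Square": {"area"}, "UnitSquare": set()}

def subclasses(cls):
    """All classes equal to cls or transitively inheriting from it."""
    result = set()
    for c in PARENT:
        k = c
        while k is not None:
            if k == cls:
                result.add(c)
                break
            k = PARENT[k]
    return result

def resolve(cls, method):
    """Class whose declaration of method runs for an object of class cls."""
    k = cls
    while k is not None:
        if method in DECLARES[k]:
            return k
        k = PARENT[k]
    return None

def cha_targets(static_type, method):
    """CHA: the run-time class may be any subclass of the static type."""
    return {resolve(c, method) for c in subclasses(static_type)} - {None}

def rta_targets(static_type, method, instantiated):
    """RTA: as CHA, but only classes the program instantiates count."""
    live = subclasses(static_type) & set(instantiated)
    return {resolve(c, method) for c in live} - {None}

print(cha_targets("Shape", "area"))
print(cha_targets("Square", "area"))            # {'Square'}
print(rta_targets("Shape", "area", {"Circle"})) # {'Circle'}
```

A call site whose target set is a singleton, such as `area` on a receiver of static type `Square`, is a candidate for replacement by a direct call or for inlining. RTA can shrink the set further when some subclasses are never instantiated.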