Data-Flow Analysis (Chapter 8)
Furman Michael

Outline
• What is Data-Flow Analysis?
• An example: Reaching Definitions
• Basic Concepts: Lattices, Flow-Functions, and Fixed Points
• Taxonomy of Data-Flow Problems and Solutions
• Iterative Data-Flow Analysis

Data-Flow Analysis
• Input: A control flow graph
• Output: A control flow graph with “global” information at every basic block
Examples
– Constant expressions: x+y*z
– Live variables

The purpose of data-flow analysis is to provide global information about how a procedure manipulates its data. For example, constant-propagation analysis seeks to determine whether all assignments to a particular variable that may provide the value of that variable at some particular point necessarily give it the same constant value. If so, a use of the variable at that point can be replaced by the constant.

Data-flow analysis should, like any optimization, always attempt to get the greatest possible benefit from the analyses and code-improvement transformations without ever transforming correct code into incorrect code.

Compiler structure
String of characters → Scanner → Tokens → Parser → AST → Semantic analyzer → IR → Code generator → Object code
Side components: symbol table and access routines; OS interface
Fig 1. Compiler structure

Optimizing compiler structure
String of characters → Front end → IR → Control-flow analysis → CFG → Data-flow analysis → CFG + information → Program transformation → IR → Instruction selection → Object code
Fig 2.
Optimizing compiler structure

An Example: Reaching Definitions
• A definition is an assignment to a variable
• A definition d reaches a basic block if there exists an execution path to the basic block along which the value assigned at d is still active at the basic block

Running Example

    unsigned int fib(unsigned int m)
    {   unsigned int f0 = 0, f1 = 1, f2, i;
        if (m <= 1) {
            return m;
        }
        else {
            for (i = 2; i <= m; i++) {
                f2 = f0 + f1;
                f0 = f1;
                f1 = f2;
            }
            return f2;
        }
    }

     1:      receive m (val)
     2:      f0 ← 0
     3:      f1 ← 1
     4:      if m <= 1 goto L3
     5:      i ← 2
     6:  L1: if i <= m goto L2
     7:      return f2
     8:  L2: f2 ← f0 + f1
     9:      f0 ← f1
    10:      f1 ← f2
    11:      i ← i + 1
    12:      goto L1
    13:  L3: return m

    B0: Entry
    B1: lines 1 to 4
    B2: line 5
    B3: line 6
    B4: line 7
    B5: lines 8 to 12
    B6: line 13
    B7: Exit
    Edges: B0→B1, B1→B2, B1→B6, B2→B3, B3→B4, B3→B5, B5→B3, B4→B7, B6→B7
Fig 3. Control Flow Graph

From the definition of reaching definitions it is easy to obtain the following table, which shows the definitions (by line number) that reach each block:

    Block   Definitions reaching the block
    B0      ∅
    B1      ∅
    B2      2,3
    B3      2,3,5,8,9,10,11
    B4      2,3,5,8,9,10,11
    B5      2,3,5,8,9,10,11
    B6      2,3
    B7      2,3,5,8,9,10,11

There are two options for treating line 1 (receive m):
1) The parameter m receives a value there, so line 1 counts as a definition and also reaches every block from B2 to B7.
2) Line 1 is not an assignment and is not counted as a definition. The table above follows this option.

Difficulties in Data-Flow Analysis
• In general it is recursively undecidable whether a definition actually reaches some other point.
• Reaching definitions may also depend on the input data. Consider the C function f that follows:
– If the actual parameter n is 1, the assignment of 2 in the if statement is the definition that actually reaches the two uses inside the while loop.
– If n > 0 and n ≠ 1, the initialization to 0 is the definition that actually reaches those uses.
– If n ≤ 0, the loop body is never executed, so neither definition actually reaches those uses.
– Which definition reaches the return statement? Once again it depends on the actual parameter, and there may be none at all: if n > 0 but the function g never decrements n, the loop does not terminate and the return is never executed.
     1  int g(int m, int i);
     2
     3  int f(int n)
     4  {   int i = 0, j;
     5      if (n == 1) i = 2;
     6      while (n > 0) {
     7          j = i + 1;
     8          n = g(n, i);
     9      }
    10      return j;
    11  }

Iterative Computation of Reaching Definitions
• Optimistically assume that no definition reaches any block
• Every basic block “generates” new definitions and “preserves” other definitions
• No definition reaches ENTRY
• Iteratively compute more and more definitions at every basic block
• The process must terminate
• The final solution is unique and conservative

No definition reaches the entry:

    RCin(ENTRY) = ∅

A definition may reach the beginning of a block if it may reach the end of one of its predecessors:

    RCin(B) = ⋃_{B' ∈ Pred(B)} RCout(B')

A definition may reach the end of a basic block iff
1) it reaches the beginning of the block and is preserved by the block, or
2) it is generated in the block:

    RCout(B) = GEN(B) ∪ (RCin(B) ∩ PRSV(B))

    Basic Block   GEN          PRSV
    B0            ∅            2,3,5,8,9,10,11
    B1            2,3          5,8,11
    B2            5            2,3,8,9,10
    B3            ∅            2,3,5,8,9,10,11
    B4            ∅            2,3,5,8,9,10,11
    B5            8,9,10,11    ∅
    B6            ∅            2,3,5,8,9,10,11
    B7            ∅            2,3,5,8,9,10,11

After the first iteration (blocks visited in the order B0 to B7):

    Basic Block   RCin      RCout
    B0            ∅         ∅
    B1            ∅         2,3
    B2            2,3       2,3,5
    B3            2,3,5     2,3,5
    B4            2,3,5     2,3,5
    B5            2,3,5     8,9,10,11
    B6            2,3       2,3
    B7            2,3,5     2,3,5

After one more iteration:

    Basic Block   RCin               RCout
    B0            ∅                  ∅
    B1            ∅                  2,3
    B2            2,3                2,3,5
    B3            2,3,5,8,9,10,11    2,3,5,8,9,10,11
    B4            2,3,5,8,9,10,11    2,3,5,8,9,10,11
    B5            2,3,5,8,9,10,11    8,9,10,11
    B6            2,3                2,3
    B7            2,3,5,8,9,10,11    2,3,5,8,9,10,11

A further iteration changes nothing, so this is the solution.

Iterative Computation of Reaching Definitions Using Bit-Vectors
• Represent every definition with a bit: 1 means the definition may reach the given point, 0 means it does not reach the point.
• PRSV and GEN are bit-vectors
The rules now take the following form, with ∨ as bitwise or and ∧ as bitwise and:

    RCin(ENTRY) = <000…000>
    RCin(i) = ⋁_{j ∈ Pred(i)} RCout(j)
    RCout(i) = GEN(i) ∨ (RCin(i) ∧ PRSV(i))

Complete Join-Lattices
A lattice L consists of
1) a set of values
2) two operations called meet (⊓) and join (⊔).
Properties:
1) For all x, y ∈ L, there exist unique z, w ∈ L such that x ⊓ y = z and x ⊔ y = w (closure)
2) For all x, y ∈ L, x ⊓ y = y ⊓ x and x ⊔ y = y ⊔ x (commutativity)
3) For all x, y, z ∈ L, (x ⊓ y) ⊓ z = x ⊓ (y ⊓ z) and (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z) (associativity)
4) There are two unique elements of L, called bottom (┴) and top (┬), such that for all x ∈ L, x ⊓ ┴ = ┴ and x ⊔ ┬ = ┬

Example 1: Bit vectors
BV(n) denotes the lattice of bit vectors of length n.
    ┴ = <0…0>
    ┬ = <1…1>
    ⊓ is bitwise and
    ⊔ is bitwise or

Example 2: ICP
Elements: ┴, ┬, all the integers, and the booleans
Properties:
1) For all n ∈ ICP, n ⊓ ┴ = ┴
2) For all n ∈ ICP, n ⊔ ┬ = ┬
The meet of any two elements is found by following the lines downward from them until they meet, and the join is found by following the lines upward until they join.

                        ┬
    false  …  -2  -1  0  1  2  …  true
                        ┴

• A partial order on the elements of L can be defined as follows: x ⊑ y if and only if x ⊓ y = x
• x ⊑ y means that x “covers” fewer states than y, that is, x is more precise than y
• Height of a lattice: the length of a maximal strictly increasing chain x1 ⊏ x2 ⊏ ... ⊏ xk, where ┴ = x1 and ┬ = xk

Functions on Lattices
• A function mapping a lattice to itself, f: L → L, is monotonic if for all x, y: x ⊑ y ⇒ f(x) ⊑ f(y)
Example: f: BV(3) → BV(3) defined by f(<x1 x2 x3>) = <x1 1 x3> is monotonic.
• A fixed point of a function f: L → L is an element x of L such that f(x) = x.
Example: for f: BV(1) → BV(1) with f(0) = 0 and f(1) = 1, both 0 and 1 are fixed points of f.
• For a monotonic function f: L → L, the effective height of L relative to f is the length of the longest strictly increasing chain obtained by iterated application of f, that is, the largest n for which there exist x1, x2 = f(x1), x3 = f(x2), …, xn = f(xn-1) such that x1 ⊏ x2 ⊏ ...
⊏ xn ⊑ ┬

The Join (Meet) Over All Paths
• A data-flow solution which is precise under the assumption that every control flow path is executable
• Let G = <N, E> be a flow graph
• Let Path(B) represent the set of all paths from entry to any node B of N, and let p be any element of Path(B)
• Let F_B( ) be the flow function representing flow through block B, and let F_p( ) represent the composition of the flow functions encountered in following the path p. For example, if B1 = entry, …, Bn = B are the blocks making up the path p to B, then

    F_p = F_Bn ∘ ... ∘ F_B2 ∘ F_B1

• Let Init be the lattice value associated with the entry block
• The MOP at a block B:

    MOP(B) = ⊓_{p ∈ Path(B)} F_p(Init)

• The JOP at a block B:

    JOP(B) = ⊔_{p ∈ Path(B)} F_p(Init)

Dimensions for Data-Flow Problems
• The information provided
• “Relational” vs. independent attributes
• The type of lattice and functions used
– powersets, ICPn, ..., unbounded heights
• The direction of information flow
– forward, backward, bidirectional

Example Data-Flow Problems
• Reaching Definitions
• Available Expressions
• Live Variables
• Upward Exposed Uses
• Copy-Propagation Analysis
• Constant-Propagation Analysis
• Partial-Redundancy Analysis

Data-Flow Analysis Algorithms
• Allen’s strongly connected regions
• Kildall’s iterative algorithm
• Ullman’s T1-T2 analysis
• Kennedy’s node-listing algorithm
• Farrow, Kennedy, and Zucconi’s graph grammar approach
• Rosen’s high-level approach
• Structural analysis
• Slotwise analysis
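
As a concrete instance of the iterative approach, the RCin/RCout equations for reaching definitions can be run on the fib flow graph of Fig 3. What follows is only a minimal sketch under stated assumptions: blocks are numbered B0 to B7 as in Fig 3, definitions are identified by their line numbers, line 1 (receive m) is not counted as a definition, and the blocks are simply swept in order until nothing changes. It is not meant as a reconstruction of any of the named algorithms above, and the names DEF_VAR, BLOCK_DEFS, PRED, gen, prsv, and reaching_definitions are made up for this illustration.

```python
# Which variable each definition (identified by its line number) assigns.
# Assumption: line 1 (receive m) is not treated as a definition.
DEF_VAR = {2: "f0", 3: "f1", 5: "i", 8: "f2", 9: "f0", 10: "f1", 11: "i"}
ALL_DEFS = frozenset(DEF_VAR)

# Definitions contained in each basic block (B0 = Entry, B7 = Exit).
BLOCK_DEFS = {
    "B0": [], "B1": [2, 3], "B2": [5], "B3": [], "B4": [],
    "B5": [8, 9, 10, 11], "B6": [], "B7": [],
}

# Predecessors of each block in the control flow graph.
PRED = {
    "B0": [], "B1": ["B0"], "B2": ["B1"], "B3": ["B2", "B5"],
    "B4": ["B3"], "B5": ["B3"], "B6": ["B1"], "B7": ["B4", "B6"],
}


def gen(block):
    """Definitions generated by the block.

    Simplification: valid here because no block assigns a variable twice.
    """
    return frozenset(BLOCK_DEFS[block])


def prsv(block):
    """Definitions preserved by the block: those whose variable it does not assign."""
    assigned = {DEF_VAR[d] for d in BLOCK_DEFS[block]}
    return frozenset(d for d in ALL_DEFS if DEF_VAR[d] not in assigned)


def reaching_definitions():
    """Sweep the blocks in order B0..B7 until RCin and RCout stop changing."""
    rc_in = {b: frozenset() for b in BLOCK_DEFS}   # optimistic start: nothing reaches
    rc_out = {b: frozenset() for b in BLOCK_DEFS}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for b in BLOCK_DEFS:
            # RCin(B) = union of RCout(B') over all predecessors B'
            new_in = frozenset().union(*(rc_out[p] for p in PRED[b]))
            # RCout(B) = GEN(B) ∪ (RCin(B) ∩ PRSV(B))
            new_out = gen(b) | (new_in & prsv(b))
            if new_in != rc_in[b] or new_out != rc_out[b]:
                rc_in[b], rc_out[b] = new_in, new_out
                changed = True
    return rc_in, rc_out, passes


if __name__ == "__main__":
    rc_in, rc_out, passes = reaching_definitions()
    for b in BLOCK_DEFS:
        print(b, sorted(rc_in[b]), sorted(rc_out[b]))
    print("passes:", passes)
```

Because each sweep can only add definitions to the sets and there are finitely many definitions, the loop must stop; on this graph the last sweep is the one that detects no change. Representing each set as an integer bit mask, with `|` and `&` in place of set union and intersection, would give the bit-vector form of the same computation.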