TDDC86 Compiler Optimizations and Code Generation: Control Flow Analysis
C. Kessler, IDA, Linköpings Universitet, 2009.
[ASU1e 10.4] [ALSU2e 9.6] [Muchnick 7]

Control Flow Analysis

- necessary to enable global optimizations beyond basic blocks
- basis for data-flow analysis
- reconstruction of if-then-else and loops from MIR, from unstructured source code, or from target code
  → loops: candidates for loop transformations, software pipelining
  → if-then-else: candidates for predication
- identify the basic blocks of a routine
- construct its flow graph / basic block graph
  1. Dominator-based analysis (iterative)
  2. Interval analysis (recursive)
  3. Structural analysis (recursive)

Control Flow Analysis: Running Example

    unsigned int fib ( unsigned int m )
    {   // Fibonacci - iterative alg.
        unsigned int f0 = 0, f1 = 1, f2, i;
        if (m <= 1)
            return m;
        else {
            for (i=2; i<=m; i++) {
                f2 = f0 + f1;
                f0 = f1;
                f1 = f2;
            }
            return f2;
        }
    }

MIR code of the running example (flattened):

        receive m            // read arg value
        f0 = 0
        f1 = 1
        if m<=1 goto L3
        i = 2
    L1: if i<=m goto L2
        return f2
    L2: f2 = f0 + f1
        f0 = f1
        f1 = f2
        i = i + 1
        goto L1
    L3: return m

Example (cont.): Control Flow Graph

[Figure: flowchart of the MIR code with one node per statement, numbered 1 (receive m) to 12 (return m); the branch nodes "m <= 1 ?" and "i <= m ?" have outgoing edges labelled Y and N.]

Detecting basic blocks

basic block (BB) = maximal sequence of consecutive statements (IR or target level) that can be entered by program control only via the first one and left only via the last one.

first instruction ("leader") of a BB: either
+ entry point of a procedure, or
+ branch target, or
+ instruction immediately following a branch or return

→ call instructions need not delimit the basic block (ok for most cases, but not for e.g. instruction scheduling)
→ exception-based control transfer not considered here

[Figure: Example (cont.): CFG → Basic Block Leaders.]

Basic-block graph

Terminology: in [Muchnick'97] called control-flow graph (CFG), whereas "our" CFG (statement level) is there called a "flowchart".

rooted, directed graph $G = (N, E)$
- $N$ = nodes = basic blocks + entry + exit
- $E$ = edges = control flow edges from the CFG/flowchart (there connecting BB exits to the leaders of their successor BBs)
  + entry → initial basic block
  + final basic blocks (no successors) → exit
- successor BBs of a BB $b$: $Succ(b) = \{\, n \in N : (b, n) \in E \,\}$
- predecessor BBs of a BB $b$: $Pred(b) = \{\, n \in N : (n, b) \in E \,\}$

[Figure: basic-block graph of the running example with nodes entry, B1-B6, exit; branch edges labelled Y and N.]

Extended basic blocks (EBBs), also known as treegions, are useful for some optimizations, e.g. instruction scheduling. Algorithm for computing the EBBs of a CFG: see e.g. [Muchnick 7.1].
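The three leader rules can be tried out directly on the running example. The following is a minimal sketch in Python, not part of the original slides; the helper names (`split_label`, `find_leaders`, `basic_blocks`) and the string encoding of MIR instructions are assumptions made for illustration, and the last instruction is taken as `return m`, matching the C source and the flowchart.

```python
import re

# MIR of the running example, one instruction per entry (labels kept inline).
# The string encoding is an assumption of this sketch.
MIR = [
    "receive m",
    "f0 = 0",
    "f1 = 1",
    "if m<=1 goto L3",
    "i = 2",
    "L1: if i<=m goto L2",
    "return f2",
    "L2: f2 = f0 + f1",
    "f0 = f1",
    "f1 = f2",
    "i = i + 1",
    "goto L1",
    "L3: return m",
]


def split_label(instr):
    """Return (label or None, instruction text without the label)."""
    match = re.match(r"(\w+):\s*(.*)", instr)
    return (match.group(1), match.group(2)) if match else (None, instr)


def find_leaders(code):
    """0-based indices of all leaders, in ascending order."""
    label_at = {}
    for idx, instr in enumerate(code):
        label, _ = split_label(instr)
        if label:
            label_at[label] = idx
    leaders = {0}  # rule 1: entry point of the procedure
    for idx, instr in enumerate(code):
        _, body = split_label(instr)
        target = re.search(r"goto\s+(\w+)", body)
        if target:
            leaders.add(label_at[target.group(1)])  # rule 2: branch target
        if (target or body.startswith("return")) and idx + 1 < len(code):
            leaders.add(idx + 1)  # rule 3: instruction after a branch or return
    return sorted(leaders)


def basic_blocks(code):
    """Split the code into index ranges running from one leader to the next."""
    leaders = find_leaders(code)
    ends = leaders[1:] + [len(code)]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]


if __name__ == "__main__":
    for block in basic_blocks(MIR):
        print([MIR[i] for i in block])
```

On this input the sketch yields six blocks, which agrees with the six nodes B1-B6 of the basic-block graph. Calls are deliberately not treated as block delimiters, in line with the remark on call instructions.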
Extended basic blocks, regions

Extended basic block (EBB) = maximal sequence of instructions beginning with a leader that contains no join nodes other than (maybe) its first node
→ single entry, multiple exits, tree-like internal control flow

Region = strongly connected subgraph (SCC) of the CFG with a single entry

[Figure: Example (cont.): CFG + Basic Blocks → Basic Block Graph.]
[Figure: Example (cont.): Extended Basic Blocks, marked on the basic-block graph with nodes entry, B1-B6, exit.]

Finding Loops

Programs spend most of the execution time in loops.
→ Optimizations that exploit the loop structure are important: loop unrolling, loop parallelization, software pipelining, ...

Loops may be expressed in programs by different constructs (while, for, goto, ..., compiler-converted tail-recursion)
→ Find uniform treatment for program loops

Graph-theoretic concepts of control-flow analysis (1)

- depth-first search (dfs): recursively explore descendants of a node before any of its siblings (as far as not yet visited)
- dfs-number: order in which dfs enters nodes
- tree edges: edges followed by dfs via recursive calls
- dfs-tree: ( nodes, tree edges )
  not unique! depends on ordering of descendants
- non-tree edges classified as
  forward edges "F"
  back edges "B"
  cross edges "C"

see also DFS-slides on course homepage
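The dfs concepts above (dfs-number, tree edges, and the F/B/C classification of non-tree edges) can be sketched in a few lines of Python. This is an illustrative sketch, not taken from the slides: the function name `classify_edges`, the successor-map representation, and the small graph `GRAPH` are all hypothetical.

```python
def classify_edges(succ, root):
    """Depth-first search from root.

    Returns (dfsnum, kind) where dfsnum maps each node to the order in which
    dfs entered it and kind maps each edge to 'T', 'F', 'B' or 'C'.
    """
    dfsnum = {}        # node -> dfs-number
    finished = set()   # nodes whose recursive call has returned
    kind = {}

    def dfs(node):
        dfsnum[node] = len(dfsnum) + 1
        for nxt in succ.get(node, []):
            if nxt not in dfsnum:
                kind[(node, nxt)] = "T"   # tree edge: followed by a recursive call
                dfs(nxt)
            elif nxt not in finished:
                kind[(node, nxt)] = "B"   # back edge: target is an ancestor (or the node itself)
            elif dfsnum[nxt] > dfsnum[node]:
                kind[(node, nxt)] = "F"   # forward edge: target is a finished descendant
            else:
                kind[(node, nxt)] = "C"   # cross edge: target finished earlier, not an ancestor
        finished.add(node)

    dfs(root)
    return dfsnum, kind


# Hypothetical example graph; the successor order determines the dfs-tree.
GRAPH = {"a": ["b", "c", "d"], "b": ["d"], "c": ["d"], "d": ["a"]}

if __name__ == "__main__":
    numbers, kinds = classify_edges(GRAPH, "a")
    print(numbers)
    print(kinds)
```

Reordering the successor lists in `GRAPH` generally changes both the dfs-tree and the labels of the non-tree edges, which is the non-uniqueness noted above.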
Finding loops: use a general approach based on graph-theoretic properties of the CFG.

[Figure: Example: DFS-tree, edge classification. Graph with nodes 1-8, edges labelled T (tree), F (forward), B (back), C (cross).]

Graph-theoretic concepts of control-flow analysis (2)

- preorder traversal of a digraph $G = (N, E)$: at each node $b \in N$, process $b$ before its descendants (not unique; in dfsnum-order)
- postorder traversal: at each node $b \in N$, process $b$ after its descendants

Dominance, immediate dominance, strict dominance

Given: flow graph $G = (N, E)$, nodes $d, i, p, b \in N$

- d dominates b (d dom b) if every possible execution path entry → b includes d
- dom is reflexive, transitive, antisymmetric → partial order on $N$
- i immediately dominates b (i idom b) if $i \neq b$, i dom b, and there is no $c \in N$, $i \neq c \neq b$, with i dom c and c dom b
- idom(b) is unique for each $b \in N$ other than entry → (i)dominator tree, rooted at entry
- strict dominance: d sdom b if d dom b and $d \neq b$

Dominance intuition (1)

Imagine a source of light going into the entry node; nodes are transparent and edges are optical fibers.
Place an opaque barrier at node v → nodes dominated by v get dark.
(Adapted from a nice presentation by J. Amaral 2003)

Dominance intuition (2)

The entry node dominates all nodes:

[Figure: basic-block graph of the running example (entry, B1-B6, exit) with the lamp at entry.]
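The lamp picture translates almost literally into code: a node b is dominated by v exactly when b is lit without the barrier but dark with the barrier at v. The sketch below is an illustration under assumptions, not the slides' algorithm: the edge list `SUCC` is reconstructed from the MIR of the running example (block naming B1-B6 assumed to match the slides' figures), and `reachable` and `dominated_by` are hypothetical helper names.

```python
# Basic-block graph of the running example (reconstructed; an assumption of this sketch).
SUCC = {
    "entry": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["exit"],
    "B3": ["B4"],
    "B4": ["B5", "B6"],
    "B5": ["exit"],
    "B6": ["B4"],
    "exit": [],
}


def reachable(succ, start, blocked=None):
    """Nodes lit from `start` when the node `blocked` is an opaque barrier."""
    if start == blocked:
        return set()
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, []):
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def dominated_by(succ, entry, v):
    """All nodes b with v dom b: v itself plus every node that goes dark behind v."""
    lit_without_barrier = reachable(succ, entry)
    lit_with_barrier = reachable(succ, entry, blocked=v)
    return lit_without_barrier - lit_with_barrier


if __name__ == "__main__":
    for v in SUCC:
        print(v, "dominates", sorted(dominated_by(SUCC, "entry", v)))
```

This costs one graph search per candidate node v, so it is meant to make the definition tangible rather than to compete with the set-based algorithms used in practice.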
Dominance intuition (3), (4)

Node B3 dominates B3, B4, B5, B6.
Node B6 only dominates itself.

[Figures: basic-block graph of the running example (entry, B1-B6, exit) with the lamp at entry and the barrier at B3 and at B6, respectively.]

Postdominance

p postdominates b (p pdom b) if every possible execution path b → exit includes p.
p pdom b iff p dom b in the reversed flow graph (all edges reversed, entry and exit interchanged).

Characterization of dominance

a dom b iff
(1) $a = b$, or
(2) there is a unique immediate predecessor of $b$, namely $a$, i.e. $Pred(b) = \{a\}$, or
(3) for all $c \in Pred(b)$: $c \neq a$ and a dom c.

Algorithm 1: Computing the dominators of a node

    $Domin(r) \leftarrow \{r\}$;
    $Domin(n) \leftarrow N \quad \forall n \in N - \{r\}$;
    change ← true;
    while ( change )
        change ← false
        for all $n \in N - \{r\}$        // in dfsnum order
            $D \leftarrow \{n\} \cup \bigcap_{p \in Pred(n)} Domin(p)$
            if $D \neq Domin(n)$
                $Domin(n) \leftarrow D$; change ← true
    return Domin

Example: Computing dominators

Here $N$ = {entry, B1, B2, B3, B4, B5, B6, exit}.

    node i   Domin(i), init.   Domin(i), iter. 1         iter. 2
    entry    {entry}           {entry}                   ...
    B1       N                 {entry, B1}               ...
    B2       N                 {entry, B1, B2}           ...
    B3       N                 {entry, B1, B3}           ...
    B4       N                 {entry, B1, B3, B4}       ...
    B5       N                 {entry, B1, B3, B4, B5}   ...
    B6       N                 {entry, B1, B3, B4, B6}   ...
    exit     N                 {entry, B1, exit}         ...
    change   true              true                      ...

Extension for computing immediate dominators

    ... compute Domin ...
    for each $n \in N$
        $Tmp(n) \leftarrow Domin(n) - \{n\}$
    for each $n \in N - \{r\}$           // in dfsnum order
        // if an s in Tmp(n) has a dominator t ≠ s, remove t from Tmp(n)
        for each $s \in Tmp(n)$
            for each $t \in Tmp(n) - \{s\}$
                if $t \in Tmp(s)$ then $Tmp(n) \leftarrow Tmp(n) - \{t\}$
    for each $n \in N - \{r\}$           // in dfsnum order
        idom(n) ← the $b \in Tmp(n)$

Total time: $O(n^2 e)$ if sets are represented by bitvectors

Example: Computing immediate dominators

    node i   init. Tmp(i) = Domin(i) - {i}   Tmp(i), iter. 1
    entry    {}                              {}
    B1       {entry}                         {entry}
    B2       {entry, B1}                     {B1}
    B3       {entry, B1}                     {B1}
    B4       {entry, B1, B3}                 {B3}
    B5       {entry, B1, B3, B4}             {B4}
    B6       {entry, B1, B3, B4}             {B4}
    exit     {entry, B1}                     {B1}

[Figure: Dominator tree: entry → B1; B1 → B2, B3, exit; B3 → B4; B4 → B5, B6.]

Computing dominators: Algorithm 2 [Lengauer/Tarjan'79]

based on depth first search and path compression
time $O(e \log n)$ or $O(e \cdot \alpha(e, n))$ (see e.g. [Muchnick pp. 185-190])

Loops and Strongly Connected Components

[Figure: two example graphs with nodes a-e and dfs edge labels T, F, C, B. In one of them c does not dominate d ⇒ not a natural loop (2 entry points).]

We call a (backward, B) edge (m, n) a loop back edge if n dom m.
Remark: Not every B edge is a loop back edge!

Natural loop of a loop back edge (m, n) = subgraph of n and all nodes v from which m can be reached without passing through n.
n is the loop header.

[Figure: back edge (m, n) with loop node v and further nodes w, v'; the edge marked "?" can't exist because n dom m.]
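Algorithm 1, the immediate-dominator extension, and the definitions of loop back edge and natural loop fit together in one small pipeline. The following Python sketch is illustrative only: the graph `SUCC` is reconstructed from the MIR of the running example (an assumption), dictionary insertion order stands in for dfsnum order, and all function names are hypothetical. It also assumes every node is reachable from the root, so each non-root node has at least one predecessor.

```python
SUCC = {  # basic-block graph of the running example (reconstructed; an assumption)
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}


def predecessors(succ):
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    return pred


def dominators(succ, root):
    """Iterate Domin(n) = {n} | intersection of Domin(p), p in Pred(n), to a fixed point."""
    pred = predecessors(succ)
    nodes = list(succ)                       # insertion order used in place of dfsnum order
    domin = {n: set(nodes) for n in nodes}   # Domin(n) <- N
    domin[root] = {root}                     # Domin(r) <- {r}
    change = True
    while change:
        change = False
        for n in nodes:
            if n == root:
                continue
            d = {n} | set.intersection(*(domin[p] for p in pred[n]))
            if d != domin[n]:
                domin[n], change = d, True
    return domin


def immediate_dominators(domin, root):
    """Shrink Tmp(n) = Domin(n) - {n} until only the immediate dominator is left."""
    tmp = {n: domin[n] - {n} for n in domin}
    for n in domin:
        if n == root:
            continue
        for s in list(tmp[n]):
            for t in tmp[n] - {s}:
                if t in tmp[s]:              # t dominates s, so t is not immediate for n
                    tmp[n].discard(t)
    return {n: next(iter(tmp[n])) for n in domin if n != root}


def loop_back_edges(succ, domin):
    """Edges (m, n) whose target n dominates their source m."""
    return [(m, n) for m in succ for n in succ[m] if n in domin[m]]


def natural_loop(succ, m, n):
    """n plus all nodes from which m can be reached without passing through n."""
    pred = predecessors(succ)
    loop, stack = {n, m}, [m]
    while stack:
        v = stack.pop()
        if v == n:                           # only happens for a self-loop (m == n)
            continue
        for p in pred[v]:
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop


if __name__ == "__main__":
    domin = dominators(SUCC, "entry")
    print(immediate_dominators(domin, "entry"))
    for m, n in loop_back_edges(SUCC, domin):
        print((m, n), sorted(natural_loop(SUCC, m, n)))
```

`loop_back_edges` does not run a dfs first: an edge whose target dominates its source always leads to an ancestor in every dfs-tree, so it is a B edge in any case. Plain Python sets are used here for readability; the $O(n^2 e)$ bound quoted above refers to a bitvector representation.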
Identifying the natural loop of a loop back edge

Algorithm: Compute the loop node set for a given loop back edge (m, n):
- Start by marking m and n as loop nodes.
- Backwards from m, (df)search predecessors v,
  stopping the recursive backward search at already found loop nodes.

[Figure: back edge (m, n) with a loop node v.]
[Figure: Example (cont.): Natural Loop. Basic-block graph of the running example (entry, B1-B6, exit) with the nodes n and m of the loop back edge marked.]

Loop header, preheader

For technical reasons, add a pre-header (initially empty) if the header has more than 2 predecessors, i.e. more than one entering edge in addition to the back edge:
→ Easier to place new instructions immediately before the loop

[Figure: loop header with predecessors B1, B2 and back edge B, before and after inserting the pre-header.]

Properties of Natural Loops

Two natural loops with different headers are either disjoint or one is nested in the other.

Several loops sharing a common header node is a pathological special case that must be treated ad hoc. See e.g. Muchnick 7.4 for more details.

Each natural loop is a SCC.

Background: Strongly connected component (SCC) = subgraph $S = (N_S, E_S)$, $N_S \subseteq N$, $E_S \subseteq E$, where every node in $N_S$ is reachable in $S$ from every other node in $N_S$ via edges in $E_S$.

SCCs can be computed with Tarjan's algorithm (extension of dfs) in time $O(|N| + |E|)$ [Tarjan'72]

Properties of Natural Loops (cont.)

A SCC $S$ with node set $N_S \subseteq N$ is maximal if every SCC containing $S$ is just $S$ itself.
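As a concrete reading of "extension of dfs", here is a compact sketch of Tarjan's algorithm in Python. It is a textbook-style rendering written for this handout, not code from [Tarjan'72]; the function name `tarjan_scc` and the graph `SUCC` (the running example's basic-block graph, reconstructed from the MIR) are assumptions of the sketch. The components it returns are the maximal SCCs.

```python
SUCC = {  # basic-block graph of the running example (reconstructed; an assumption)
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}


def tarjan_scc(succ):
    """Return the maximal strongly connected components as a list of node sets."""
    index = {}       # node -> dfs-number
    lowlink = {}     # node -> smallest dfs-number known to be reachable on the stack
    stack, on_stack = [], set()
    components = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, []):
            if w not in index:
                visit(w)                                  # tree edge
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])    # edge back into the open component
        if lowlink[v] == index[v]:                        # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in succ:
        if v not in index:
            visit(v)
    return components


if __name__ == "__main__":
    for component in tarjan_scc(SUCC):
        print(sorted(component))
```

On the running example the only component with more than one node is the cycle B4 → B6 → B4; every other block forms a trivial component of its own. The recursive formulation is fine for small flow graphs; very deep graphs would need an explicit stack.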
Example (maximal SCC):

[Figure: flow graph with nodes entry, B1, B2, B3, exit.]

$S_1 = \{B1, B2, B3\}$ is a maximal SCC.
$S_2 = \{B2\}$ is a SCC but not a maximal SCC.

Reducibility of flow graphs

A flow graph is reducible if all B edges in any DFS tree are loop back edges.

Intuitively: ... if there are no jumps into the middles of loops (e.g., goto's).

Reducible flow graphs are well-structured (loops properly nested).

Irreducible flow graphs are rare and can be made reducible by replicating nodes.

[Figure: flow graph with nodes b, c, d, e, f, and a variant with a replicated node c'.]

Interval analysis

- Divide the flowgraph into regions (e.g., loops in CFA)
- Repeatedly collapse a region to an abstract node
  → abstract flowgraph
  → nested regions (control tree)
- Hierarchical folding structure allows for faster / simpler data flow analysis

Simplest variant: T1-T2 Analysis [Ullman'73]

- T1: collapse a self-loop on a node (B1 → B1a)
- T2: merge a node with its unique predecessor (B1, B2 → B1a)
- Try to fold the entire flow graph into a single node
- Works only for very restricted flow graphs

T1-T2 Analysis: Example

[Figure: flow graph B1, B2, B3, B4 folded step by step by T1 and T2 into abstract nodes (B1a, B3a, B3b, B1b), together with the resulting T1-T2 control tree.]

Structural Analysis

is a special case of interval analysis:

- CFG folding follows the hierarchical structure of the program
  → folding transformations for loops, if-then-else, switch, etc.
- Every region has 1 entry point
- Advantage: equations etc. for dataflow analysis can be pre-formulated for each construct → faster
- Limitation: works only for well-structured programs
  → extensions to handle arbitrary flowgraphs (define a new region / transformation for otherwise irreducible constructs)

Structural analysis: Some regions and transformations

[Figure: folding transformations T_stmt-block, T_while-loop, T_self-loop, T_if-then, T_if-then-else, each collapsing a pattern of blocks (B1, B2, ..., Bn) into an abstract node B1a.]

Example (cont.): Structural analysis

[Figure: basic-block graph of the running example (entry, B1-B6, exit) with nested regions R7, R8, R9, and the corresponding Region Hierarchy Tree rooted at "procedure".]

Remark: If only loop-based regions are of interest, the hierarchy flattens accordingly (R8 and R9 merged with top level).

Summary: Control-Flow Analysis

- Basic blocks, extended basic blocks
- Loop detection
- Dominator-based CFA
- Interval-based CFA
- Structural analysis

Computing a bottom-up order of regions of a reducible flow graph

Input: A reducible flow graph G
Output: A bottom-up ordered list R of loop-based regions of G

1. $R \leftarrow \{B_1, B_2, \ldots\}$ = all leaf regions, i.e., all single blocks in G, in any order
2. repeat
       Choose a natural loop L such that, if there are any natural loops L' contained within L,
       then the (body and loop) regions for these L' were already added to R.
       R.add( the region consisting of the body of L )   // body of L = L without the back edges to the header of L
       R.add( the loop region for L )
   until all natural loops have been considered
3. If the entire flow graph is not itself a natural loop,
   R.add( the region consisting of the entire flow graph ).
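The three steps of the region-ordering procedure can be sketched as follows, assuming the natural loops have already been identified and that each header owns at most one loop. The function name `bottom_up_regions`, the tuple encoding of regions, and the nested-loop example with blocks H1, H2, X, Y are hypothetical; the first example uses the loop {B4, B6} of the running example.

```python
def bottom_up_regions(blocks, loops):
    """Bottom-up ordered list R of loop-based regions.

    blocks: list of block names of the flow graph
    loops:  dict mapping each loop header to the node set of its natural loop
    Regions are encoded as (kind, name) tuples; this encoding is an assumption.
    """
    regions = [("leaf", b) for b in blocks]        # step 1: all single blocks, in any order
    pending = dict(loops)
    while pending:                                 # step 2
        # Choose a loop that contains no loop still waiting to be processed.
        header = next(h for h, nodes in pending.items()
                      if not any(other < nodes for other in pending.values()))
        regions.append(("body", header))           # L without the back edges to its header
        regions.append(("loop", header))           # the loop region for L
        del pending[header]
    if not any(nodes == set(blocks) for nodes in loops.values()):
        regions.append(("graph", None))            # step 3: the entire flow graph
    return regions


if __name__ == "__main__":
    # Running example: one natural loop with header B4.
    print(bottom_up_regions(["B1", "B2", "B3", "B4", "B5", "B6"], {"B4": {"B4", "B6"}}))
    # Hypothetical nested loops: the inner loop (header H2) must come first.
    print(bottom_up_regions(["H1", "H2", "X", "Y"],
                            {"H1": {"H1", "H2", "X", "Y"}, "H2": {"H2", "X"}}))
```

The choice in step 2 relies on the property that two natural loops with different headers are either disjoint or nested, so a proper-subset test on the node sets is enough to detect inner loops that have not been handled yet.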