Interprocedural Control Dependence Computation
James Ezick & Kiri Wagstaff
CS 612 Final Paper
13 May 1999

Section 1: Introduction

Control dependence is a fundamental notion in program optimization. While a significant body of work explores the computation of control dependence for a single block of code [1,3,7], comparatively little examines the interprocedural case [5,8]. We seek to extend existing methods of computing control dependence in an intraprocedural setting to programs with multiple procedures. Modern programming practice suggests a trend away from the single monolithic blocks of code common to languages such as PASCAL and FORTRAN toward the more fine-grained division of code encouraged by object-oriented languages (C++, Java). In light of this trend, the question of computing interprocedural control dependence gains increasing relevance.

In addition to the problems in program analysis and parallelization that control dependence was introduced to solve, we recall that computing control dependence is identical to computing the edge dominance frontier relation of the reverse graph. That relation can be used to construct the SSA form of a program [2], and SSA form facilitates a wide range of optimizations, from constant propagation to code hoisting.

The scope of our activity has been to develop a means to compute control dependence in the interprocedural case. To this end we have divided the problem into two parts. First, we present a means of computing postdominance for control flow graphs involving multiple (possibly recursive) procedures. Second, given the postdominance relation, we attempt to extend the work of Pingali and Bilardi [7] to answer control dependence queries in optimal time following suitable preprocessing. Thus far, we have a well-defined iterative method for computing interprocedural postdominance, as well as numerous ideas, formulations, and partial solutions both for computing postdominance without iteration and for answering control dependence queries from a preprocessed form of the postdominance relation.

Section 2: Problem Statement

We assume that we are given an interprocedural control flow graph for the program.

Definition: control flow graph – A control flow graph G = (V, E) is a directed graph in which nodes represent statements and an edge u → v represents a possible flow of control from u to v. The set V contains two distinguished nodes: Start, with no predecessors and from which every node is reachable, and End, with no successors and reachable from every node. Further, we assume an edge Start → End.

Definition: interprocedural control flow graph – An interprocedural control flow graph G = (V, E, P) is a control flow graph augmented with a set P of pairs of call and return edges. The elements of P are pairs (C, R) of edges in which C must terminate at the distinguished node Start of some procedure and R must originate from the distinguished node End of the same procedure. Further, the procedures of G must form a partition of both V and E, with these partitions connected only by edges in P. For each procedure, End must be reachable from Start, although we do not require an edge Start → End.
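The paper leaves the representation of G = (V, E, P) abstract. Purely as a point of reference for the sketches that follow, here is one possible Python encoding; the class and field names are our own illustrative choices, not part of the formulation:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str

@dataclass
class ICFG:
    """One possible encoding of G = (V, E, P); each procedure has
    distinguished Start and End nodes among `nodes`."""
    nodes: set = field(default_factory=set)               # V
    edges: set = field(default_factory=set)               # E, a set of Edge
    call_return_pairs: set = field(default_factory=set)   # P: pairs (C, R)

    def successors(self, v):
        return {e.dst for e in self.edges if e.src == v}
```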
An immediate consequence of this formulation is that it introduces paths of control through the graph that do not correspond to any execution of the program. We seek to eliminate these paths from our formulation by redefining the nature of a path.

Definition: path – Any sequence of connected nodes in the interprocedural control flow graph.

Definition: complete path – Given an initially empty stack S, interpret a call edge in the interprocedural control flow graph as an edge that pushes its return site onto S, and a return edge as one that pops S if it returns to the site on top of S and does nothing otherwise. A complete path is any path for which S is empty at the terminal node.

Definition: total complete path – Any complete path originating at Start and terminating at End.

Definition: valid path – v is a valid path if and only if there exist a prefix path p and a suffix path s (both possibly empty) such that p v s is a complete path.

Given our definition of a valid path through the control flow graph, we adapt the usual definitions of control dependence and postdominance accordingly.

Definition: interprocedural postdominance – u interprocedurally postdominates v (IPDOM(u, v)) if and only if every valid path from v to End contains u.

Definition: interprocedural control dependence – w is interprocedurally control dependent (IPCOND) on edge u → v if and only if IPDOM(w, v) and, if w ≠ u, not IPDOM(w, u).

Given these definitions, we seek first to compute the postdominance relation and then to preprocess that information into a form conducive to answering interprocedural control dependence queries.

Section 3: Computing Interprocedural Postdominance Iteratively

We first observe that, in contrast to the intraprocedural postdominance relation, the interprocedural postdominance relation forms a directed acyclic graph rather than a tree. That the relation is acyclic is immediate from the observation that it is both transitive and antisymmetric (properties which follow directly from the definition). That the relation is no longer tree-structured is best illustrated with an example. Consider the following simple control flow graph and its associated postdominance relation:

[Figure: a control flow graph for Main in which a conditional at node a branches to call sites b and c, each calling function F (entry Fs, exit Fe), with return sites d and e merging at node f before END; shown alongside the associated postdominance relation.]

Function F postdominates each of its call sites (b and c). Each return site (d and e) also postdominates the corresponding call site. However, neither return site postdominates the function F (e.g., d does not postdominate F because a valid path from F to END exists through e that does not pass through d). As a result, the call sites have more than one parent in the postdominance relation, which therefore can no longer be tree-structured.

A straightforward way to approach the problem of computing this interprocedural postdominance relation is to view it as a dataflow problem [8]. For simplicity, and because it represents a forward-flow problem, we present an iterative dataflow algorithm for computing dominance in a control flow graph, observing that the same algorithm can be used to compute postdominance by reversing the edges of the control flow graph and working backward through it.

There is an additional consideration when developing an interprocedural analysis: we must determine the order in which to process the functions. We therefore perform a topological sort of the strongly connected components of the call graph and work backward through it, collapsing strongly connected components (corresponding to cycles in the call graph) to single nodes and performing the analysis on their equations simultaneously. This allows us to substitute the solutions to the equations solved for a function back into the equations for the body of code that calls the function; a sketch of this processing order appears below.
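A minimal sketch of that processing order, assuming the call graph is given as an adjacency dict mapping each procedure to the set of procedures it calls (an encoding of our own). Tarjan's algorithm conveniently emits strongly connected components in reverse topological order, which is exactly the callees-before-callers order we want:

```python
def scc_processing_order(call_graph):
    """Return the SCCs of the call graph (mutually recursive procedures
    collapsed together) in reverse topological order, so that each set of
    procedures is analyzed before any of its callers."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in call_graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in call_graph:
        if v not in index:
            strongconnect(v)
    return sccs

# For the example analyzed below: Main calls F, and F calls itself.
assert scc_processing_order({"Main": {"F"}, "F": {"F"}}) == \
    [frozenset({"F"}), frozenset({"Main"})]
```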
Having selected a function to process, we apply a standard dataflow algorithm to it. The lattice involved in our dataflow analysis is the powerset of all nodes in the control flow graph of the current function or set of mutually recursive functions. Unknowns are initialized to U, the set of all nodes in the CFG for the current set of functions, rather than to the empty set, because we seek a greatest fixed point. The dataflow equations for individual nodes are standard for a forward-flow dataflow problem:

For the Start node: Sout = {Start}.
For a node S1 with incoming dominance set Sin: Sout = Sin ∪ {S1}.
For a merge node S with incoming sets S1 and S2: Sout = (S1 ∩ S2) ∪ {S}.

The interesting case is how information propagates across function calls. The following equation indicates that the dominance set after a function call is composed of the dominators of the call node (Sin), the call and return nodes (Call F, Ret F), and the dominators of the function being called (SF):

Sout = Sin ∪ {Call F, Ret F} ∪ SF, where SF = {intraprocedural dominators of F}.

Note that this formulation allows us to do a context-sensitive analysis. Depending upon where the call to function F occurs, Sout will contain different call and return sites. This also means that information propagates correctly along valid paths in the CFG. This fits the traditional notion of a context-sensitive analysis in which the output is a function of the input; in this case the transition function simply appends the dominators of the function to the set containing the call and return sites. This realization about the simplicity of the transition function motivates our use of set rather than function notation for SF.

Consider the following example, which illustrates how the dataflow algorithm works and contains some interesting cases. The main function calls function F regardless of which branch of the conditional at a is taken, and function F makes a recursive call to itself.

[Figure: control flow graphs for Main and F. In Main, a conditional at a branches to call sites b and c, each calling F, with return sites d and e merging at f before END. In F, a conditional at Fa branches either to the recursive call site Fb (with return site Fc) or directly to Fd; the two paths merge at Fe before FEND.]

The call graph for this program consists of an edge from Main to F together with a self-loop on F. Thus we begin our analysis with function F. For convenience, we refer to the dominance sets as Sout(edge). The equations for F are as follows:

Sout(FSTART→Fa) = {FSTART}
Sout(Fa→Fb) = Sout(FSTART→Fa) ∪ {Fa}
Sout(Fa→Fd) = Sout(FSTART→Fa) ∪ {Fa}
Sout(Fd→Fe) = Sout(Fa→Fd) ∪ {Fd}
Sout(Fc→Fe) = Sout(Fa→Fb) ∪ {Fb, Fc} ∪ SF
Sout(Fe→FEND) = (Sout(Fc→Fe) ∩ Sout(Fd→Fe)) ∪ {Fe}
SF = Sout(Fe→FEND) ∪ {FEND}

All but the last three equations converge after one iteration. The remaining three converge after two:

Sout^0(Fc→Fe) = U = {FSTART, Fa, Fb, Fc, Fd, Fe, FEND}
Sout^0(Fe→FEND) = U
SF^0 = U

Sout^1(Fc→Fe) = Sout(Fa→Fb) ∪ {Fb, Fc} ∪ SF^0 = U
Sout^1(Fe→FEND) = (Sout^0(Fc→Fe) ∩ Sout(Fd→Fe)) ∪ {Fe} = {FSTART, Fa, Fd, Fe}
SF^1 = {FSTART, Fa, Fd, Fe, FEND}

Sout^2(Fc→Fe) = U
Sout^2(Fe→FEND) = {FSTART, Fa, Fd, Fe}
SF^2 = {FSTART, Fa, Fd, Fe, FEND}

The resulting sets for the function F are:

Sout(FSTART→Fa) = {FSTART}
Sout(Fa→Fb) = {FSTART, Fa}
Sout(Fa→Fd) = {FSTART, Fa}
Sout(Fd→Fe) = {FSTART, Fa, Fd}
Sout(Fc→Fe) = {FSTART, Fa, Fb, Fc, Fd, Fe, FEND}
Sout(Fe→FEND) = {FSTART, Fa, Fd, Fe}
SF = {FSTART, Fa, Fd, Fe, FEND}

Note that this also demonstrates the importance of initializing unknowns to U rather than to the empty set, which is what ensures a greatest fixed point. Had we initialized Sout(Fc→Fe) to the empty set, we would never add Fd to its set, and because of the intersection operator, Fd would then have been excluded from SF. This is clearly not what we want, since all paths out of the recursive function F and back to Main must pass through Fd (otherwise a new call to F would be generated instead of a return to Main).
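To make the fixed-point process concrete, here is a small Python sketch using an encoding of our own, in which each unknown is a function of the current environment of dominance sets. It reproduces the hand computation for F above:

```python
def solve_gfp(equations, universe):
    """Iterate the dataflow equations until nothing changes, initializing
    every unknown to the full universe U (not the empty set), so the
    result is a greatest fixed point."""
    env = {name: frozenset(universe) for name in equations}
    changed = True
    while changed:
        changed = False
        for name, fn in equations.items():
            new = fn(env)
            if new != env[name]:
                env[name] = new
                changed = True
    return env

U = {"FSTART", "Fa", "Fb", "Fc", "Fd", "Fe", "FEND"}
eqs = {
    "FSTART->Fa": lambda e: frozenset({"FSTART"}),
    "Fa->Fb":     lambda e: e["FSTART->Fa"] | {"Fa"},
    "Fa->Fd":     lambda e: e["FSTART->Fa"] | {"Fa"},
    "Fd->Fe":     lambda e: e["Fa->Fd"] | {"Fd"},
    # edge following the recursive call: call site Fb, return site Fc, plus SF
    "Fc->Fe":     lambda e: e["Fa->Fb"] | {"Fb", "Fc"} | e["SF"],
    "Fe->FEND":   lambda e: (e["Fc->Fe"] & e["Fd->Fe"]) | {"Fe"},
    "SF":         lambda e: e["Fe->FEND"] | {"FEND"},
}
assert solve_gfp(eqs, U)["SF"] == {"FSTART", "Fa", "Fd", "Fe", "FEND"}
```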
At this point, convergence for function F has been achieved. Returning to the main function, we likewise compute:

Sout(START→a) = {START}
Sout(a→b) = Sout(START→a) ∪ {a}
Sout(a→c) = Sout(START→a) ∪ {a}
Sout(d→f) = Sout(a→b) ∪ {b, d} ∪ SF
Sout(e→f) = Sout(a→c) ∪ {c, e} ∪ SF
Sout(f→END) = (Sout(d→f) ∩ Sout(e→f)) ∪ {f}
SMAIN = Sout(f→END) ∪ {END}

Having already computed SF, all of these equations converge after one iteration, to:

Sout(START→a) = {START}
Sout(a→b) = {START, a}
Sout(a→c) = {START, a}
Sout(d→f) = {START, a, b, d, FSTART, Fa, Fd, Fe, FEND}
Sout(e→f) = {START, a, c, e, FSTART, Fa, Fd, Fe, FEND}
Sout(f→END) = {START, a, f, FSTART, Fa, Fd, Fe, FEND}
SMAIN = {START, a, f, END, FSTART, Fa, Fd, Fe, FEND}

This example illustrates how the iterative algorithm handles function calls and recursion. Note that Sout(f→END) correctly includes all of the dominators from function F, since F is called on either side of the conditional.

Convergence in general is guaranteed by the finite nature of the lattice and the monotonicity of the dataflow equations: although they contain both union and intersection operations, it is still true for every equation f that, for all dominance sets x and y, x ⊆ y implies f(x) ⊆ f(y). Finally, we note that while the algorithm requires exponential time in theory (since the lattice contains the powerset of a subset of the nodes of the control flow graph), it is usually polynomial in practice. Even so, we do not expect this method to be as efficient as non-iterative algorithms.

Section 4: Computing Postdominance without Iteration

As we expect that computing the interprocedural postdominance relation by iterating to a greatest fixed point will be slow, we seek a method that produces the same solution without iteration. In the intraprocedural case, efficient algorithms exist for this computation [6], ranging from naïve quadratic-time algorithms based on depth-first search to more sophisticated (though still practical) linear-time algorithms.
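As an illustration of the naïve end of that range (and emphatically not the fast algorithm of [6]), here is a quadratic sketch for the intraprocedural case, built on the observation that v postdominates u exactly when deleting v leaves End unreachable from u. The adjacency-dict encoding of the CFG is our own assumption:

```python
def postdominators(succ, end):
    """For each node u, the set of nodes v that postdominate u: one
    backward reachability pass per deleted candidate v, O(V * (V + E))."""
    nodes = set(succ) | {w for ws in succ.values() for w in ws}
    pred = {n: set() for n in nodes}                # reverse edges once
    for u, ws in succ.items():
        for w in ws:
            pred[w].add(u)

    pdom = {u: {u} for u in nodes}                  # u postdominates itself
    for v in nodes:
        # which nodes can still reach End when v is deleted?
        seen, work = set(), ([end] if v != end else [])
        while work:
            x = work.pop()
            if x in seen or x == v:
                continue
            seen.add(x)
            work.extend(pred[x])
        for u in nodes:
            if u != v and u not in seen:            # u cannot avoid v
                pdom[u].add(v)
    return pdom

# e.g. a diamond a -> {b, c} -> d: d postdominates a, b, and c, while
# b and c postdominate only themselves.
pd = postdominators({"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}, "d")
assert pd["a"] == {"a", "d"} and pd["b"] == {"b", "d"}
```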
In the interprocedural case, the matter of computing postdominance is complicated by the existence of invalid paths in the control flow graph. While this problem is dealt with implicitly by the context-sensitive analysis of the previous section, it must be confronted explicitly in non-iterative algorithms. An additional goal in developing a non-iterative algorithm is to eliminate, to the greatest extent possible, the need to clone nodes of procedures called from multiple locations; this again forces us to deal with the existence of invalid paths. The temptation to clone comes from the observation that, in the absence of recursion, it is possible to eliminate invalid paths in the CFG by cloning the subgraph for each procedure once for every time it is called and 'inlining' that subgraph in the appropriate places. However, this leads to an undesirable explosion in the size of the CFG, and any algorithm that must process the result suffers a correspondingly large increase in runtime. Moreover, recursion cannot be handled at all this way, as it would require an infinite graph.

The solution to this problem that we are developing involves first computing the postdominance relation for each procedure, with placeholders for called procedures, and then "gluing" these pieces together. This approach is complicated by two factors. First, we have generated a sufficient bank of examples to convince ourselves that, in addition to which procedures are or might be called from a particular code block, we also need to know the sequence of these calls. Thus, when computing postdominance for a single procedure, we also need to know all of the possible sequences of calls that can be initiated from that procedure and the point(s) at which they are initiated. We accomplish this by introducing the notion of "sparse stubs". When constructing the control flow graph, we construct in parallel a scaled-down representation consisting of only the entry and exit points of functions and the decision points that precede these calls. The result is a graph in which every path from Start to End represents a possible sequence of procedure calls and returns in an execution of the program; further, every such possible sequence is represented in the sparse graph.

We note immediately that the graph may have repeated subgraphs. That is, if a procedure's Start and End each occur twice in the graph (they must occur in pairs as per our assumptions on the program's design), then the connected subgraph between one pair will be isomorphic to the subgraph between any other pair. In other words, the subgraph representing a procedure F will be identical regardless of where it is called. Given this observation, we simply choose one pair and inject that subgraph into the "gap" between the call and return site of the corresponding instance of the procedure in the control flow graph.

[Figure: the sparse representation of Main, consisting only of F-Start/F-End pairs and the decision points preceding the calls, alongside the "plugged" CFG for Main obtained by injecting the F-subgraph between each F-Start/F-End pair.]

This technique leads to our second problem. When a procedure is called two or more times from a procedure, we will have repeated nodes within the "filled" control flow graph. While we would clearly like to treat these nodes as copies of a single node, we cannot simply merge all of the edges of one copy with the edges of another, as this would introduce invalid paths into our control flow graph; the utility of this "plugged" graph comes precisely from the fact that it does not contain any invalid sequences of nodes. To compute postdominance on this special form of control flow graph, we introduce the notion of union postdominance, based on the concept of generalized dominators [4].

Definition: union postdominance – A set of nodes T in a control flow graph is said to union postdominate a set of nodes S if and only if, for every node s in S, every path from s to End contains an element of T.

Given this definition, we partition the "filled" control flow graph into sets consisting of a node and all of its copies (nodes without copies form singleton sets of the partition). We then seek to construct the DAG that is the transitive reduction of the union postdominance relation on the sets of the partition.
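A direct, naïve check of this definition by reachability, as a sketch (the adjacency-dict encoding is again our own, and a real implementation would build the transitive reduction rather than answer one query at a time):

```python
def union_postdominates(succ, end, T, S):
    """True iff T union postdominates S: no s in S can reach End along a
    path that avoids every element of T."""
    T = set(T)
    for s in S:
        if s in T:                      # s itself is the required element
            continue
        seen, work = set(), [s]
        while work:
            x = work.pop()
            if x in T or x in seen:     # paths through T are cut off
                continue
            seen.add(x)
            if x == end:
                return False            # s slipped past T to End
            work.extend(succ.get(x, ()))
    return True

# In a diamond a -> {b, c} -> d, the set {b, c} union postdominates {a}
# even though neither b nor c postdominates a on its own.
assert union_postdominates({"a": {"b", "c"}, "b": {"d"}, "c": {"d"}},
                           "d", {"b", "c"}, {"a"})
```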
Given these component pieces, it is our claim that, at least in the non-recursive case, we have enough information to put these component graphs together in topological order with respect to the call graph. In the DAG for a specific procedure F that calls procedure G, we know that we will find nodes for G-Start and G-End. Further, these nodes will define a subgraph with G-Start the source and G-End the sink. This subgraph may be empty, or it may have other procedure start and end nodes inside it, depending upon whether or not G calls any other functions. In either case, we replace this subgraph with the one constructed for G. By the nature of the G and F DAGs, any node appearing in the deleted subgraph of the F DAG will be included in the injected G DAG. From this we see that no information is lost, and the process can continue by injecting the graphs for procedures called by G.

It is worth noting that while this works for programs with acyclic call graphs, it does not work in the recursive case; there, the plugs from the sparse representation introduce invalid paths which break the algorithm. At present we are working on a new, augmented sparse representation carrying call-site information for recursive functions that may alleviate this problem.

Section 5: Answering Control Dependence Queries

Once we have computed the interprocedural postdominance relation, we can use it to answer control dependence queries. Three types of queries are of interest in the realm of control dependence:

cd(e) – the set of nodes control dependent on edge e.
conds(v) – the set of edges v is control dependent on.
cdequiv(v) – the set of nodes with the same control dependencies as node v.

In the intraprocedural case, Pingali and Bilardi have shown how to compute control dependence information in optimal time, that is, time proportional to the size of the output [7]. However, their methods exploit structure in the intraprocedural postdominance relation, namely the fact that it forms a tree. As we have shown, the interprocedural postdominance relation generalizes only to a DAG, so we are unable to apply the Roman Chariots approach directly. It does, however, form a starting point for our approach.

In answering the above control dependence queries, we must perform numerous reachability computations. For instance, we know that

cd(u→v) = {nodes reachable from v} − {nodes reachable from u}.

For a tree-structured relation, this reduces to

cd(u→v) = {nodes on the path from v up to, but excluding, LCA(u, v)},

since for any given u and v that path is uniquely determined. Further, Pingali and Bilardi observe that, due to the structure of the postdominance relation, the parent of u (its immediate postdominator) is also an ancestor of v. This allows the further simplification

cd(u→v) = {nodes on the path from v up to, but excluding, parent(u)},

which can be represented as a unique half-open interval on the postdominance tree:

cd(u→v) = [v, parent(u)).
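In code, the tree case is a simple walk up the postdominator tree; ipdom here is an assumed map from each node to its unique immediate postdominator:

```python
def cd_tree(ipdom, u, v):
    """cd(u -> v) for a tree-structured postdominance relation: the
    half-open interval [v, parent(u)) walked bottom-up."""
    stop = ipdom[u]           # parent(u), the immediate postdominator of u
    out, x = [], v
    while x != stop:
        out.append(x)
        x = ipdom[x]          # unique parent: only well defined in a tree
    return out
```

The walk touches exactly the nodes it reports, which is where the output-proportional bound comes from; on a DAG, ipdom[x] is no longer unique and the walk is not well defined.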
The interprocedural case presents additional challenges, however. The fact that the relation is a DAG rather than a tree means that nodes may have more than one parent, so LCA(u, v) is not well-defined. In addition, using an LCA as in the intraprocedural case yields an incorrect result. Returning to our earlier example, consider LCA(a, b). Fs is a least common ancestor of both nodes, but considering only the interval from b to Fs means that all nodes on the other path (namely d) will be excluded, even though they are control dependent on a → b. The interval endpoint we are really interested in is f, which is a least total ancestor (LTA) of a and b, an ancestor that every upward path from b eventually reaches. However, even if we are able to compute the LTA of a and b efficiently, the interval from b to f still does not completely specify cd(a→b), because we also need to exclude all nodes reachable from a.

[Figure: the portion of the postdominance DAG for the earlier example relevant to cd(a→b), with nodes START, a, b, d, e, Fs, Fe, f, and END; b has more than one parent.]

We therefore consider a reformulation as the Roman Aqueducts problem. In this formulation we have a set of cities connected by a large network of aqueducts. The Romans are occasionally faced with water pollution problems, and the Emperor is then concerned: given that city u is polluted and city v is not, which other cities that get water from city v have water that is safe to drink? This is exactly the computation we want, where we wish to compute {nodes reachable from v} − {nodes reachable from u}. This could be computed naively by marking all nodes reachable from v, then marking all nodes reachable from u, and then collecting any nodes that have the first mark but not the second. However, we would like to reduce the number of reachability computations that must be made. To do this, we break the reachable sets (aqueduct systems) into smaller intervals (aqueducts). Each aqueduct represents a segment of the system that has no split points (nodes with multiple parents).

For example, consider computing cd(a→b) in the example above. This requires computing {nodes reachable from b} − {nodes reachable from a}, or [b, END] − [a, END]. The aqueducts in [b, END] are listed in the first column of the following table:

Aqueduct    Reachable from a?    LCA(startpoint, a)    Modified aqueduct
[b, b]      N                    -                     [b, b]
[d, END]    Y                    f                     [d, f)
[Fs, END]   Y                    Fs                    [Fs, Fs)

Once we have these, we ask, for each aqueduct, whether its top endpoint is reachable from a. If not, then the entirety of that aqueduct is 'unpolluted' by a, so we preserve it unchanged; otherwise we compute the LCA of the start point of the aqueduct and a. Because each aqueduct has been constructed so as to eliminate any cross edges (no node within it has more than one parent), the LCA of these two nodes is well defined and will lie somewhere inside the aqueduct. It can be computed by starting at the bottom endpoint and walking up the interval until a node reachable from a is encountered. Having computed this LCA, we replace the aqueduct's top endpoint with it (the interval also becomes half-open, so as to exclude the LCA itself). The modified aqueducts then represent the 'clean' segments of the aqueduct system, because all nodes reachable from a have been excluded. In this example, cd(a→b) = {b, d}.

Here we need only answer one reachability query per interval, plus some additional number of queries when finding the LCA on a specific interval; those additional queries are in fact proportional to the size of the output, since we stop upon encountering a node that does not belong in the cd set. Thus the runtime for computing cd(u→v) is near-optimal, provided the aqueduct intervals are computed in a preprocessing stage.
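A sketch of the query phase under encodings of our own choosing: each aqueduct is a bottom-to-top list of nodes, and reachable(u, x) answers reachability queries on the postdominance DAG. Both the chain construction and the reachability oracle belong to the preprocessing stage, which we do not show:

```python
def cd_aqueducts(aqueducts, reachable, u, v):
    """cd(u -> v): keep each aqueduct of v's system whose top endpoint is
    unpolluted by u; otherwise walk up from the bottom, keeping nodes
    until the first one reachable from u (the LCA of the chain and u)."""
    clean = set()
    for chain in aqueducts:             # chain[0] is the bottom endpoint
        if not reachable(u, chain[-1]):
            clean.update(chain)         # the whole aqueduct survives
        else:
            for x in chain:
                if reachable(u, x):     # reached the LCA: stop, exclude it
                    break
                clean.add(x)
    return clean

# The table's example, with the ancestors of a in the postdominance DAG
# hand-computed for the running example; the intermediate nodes in the
# chains are our reading of the figure.
anc_a = {"a", "Fs", "Fe", "f", "END"}
reach = lambda u, x: x in anc_a         # only queries from u = a occur here
chains = [["b"], ["d", "f", "END"], ["Fs", "Fe", "END"]]
assert cd_aqueducts(chains, reach, "a", "b") == {"b", "d"}
```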
With these intervals on the postdominance DAG in hand, answering conds and cdequiv queries is straightforward. We can apply the methods proposed by Pingali and Bilardi for the intraprocedural case, because none of them rely on properties a tree has that a DAG does not. For instance, to answer conds queries we can make use of variable caching at interior nodes: in a DAG, as in a tree, the only intervals that could possibly be in the conds set of a given node must be cached at or beneath it. Likewise, the fingerprints of conds sets in the DAG case (the size of the set and the lowest common node for all intervals in the set) uniquely identify them and can be used for fast comparisons to answer cdequiv queries.

Section 6: Conclusions and Future Work

What we have presented in the preceding sections is a problem in interprocedural analysis worthy of further study. We have shown how existing iterative algorithms can be applied to a proper formulation of the interprocedural postdominance computation, and we have laid the foundation of what we believe will lead to practical direct algorithms for the same task. Further, we have taken the first steps toward extending the known optimal algorithm for answering control dependence queries to the interprocedural case. Finally, we have assembled a substantial bank of interesting examples that have provided considerable insight into the problem. Our next task is to construct a working implementation of the iterative approach to computing postdominance. With that in place, we hope to use it as both a testbed and a benchmark for the algorithms whose development we have described.

Section 7: References

[1] G. Bilardi and K. Pingali. A framework for generalized control dependence. In Proc. of the SIGPLAN '96 Conf. on Programming Language Design and Implementation, pages 291-300, May 1996.
[2] G. Bilardi and K. Pingali. The static single assignment form and its computation. November 1998.
[3] J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319-349, July 1987.
[4] R. Gupta. Generalized dominators. 1991.
[5] M. J. Harrold, G. Rothermel, and S. Sinha. Computation of interprocedural control dependence. 1998.
[6] T. Lengauer and R. E. Tarjan. A fast algorithm for finding dominators in a flowgraph. ACM Transactions on Programming Languages and Systems, 1(1):121-141, July 1979.
[7] K. Pingali and G. Bilardi. Optimal control dependence computation and the Roman Chariots problem. ACM Transactions on Programming Languages and Systems, 19(3):462-491, May 1997.
[8] M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis. In Program Flow Analysis: Theory and Applications, Prentice-Hall, 1981.