Interprocedural Control Dependence Computation
James Ezick & Kiri Wagstaff
CS 612 Final Paper
13 May 1999
Section 1: Introduction
Control dependence is a fundamental notion in program optimization. While a
significant body of work exists which explores the computation of control dependence in
the case of a single block of code [1,3,7], comparatively little exists which examines the
interprocedural case [5,8]. We seek to extend existing methods of computing control
dependence in an intraprocedural setting to the case of programs with multiple
procedures.
Consideration of modern programming techniques suggests a trend away from the single
monolithic blocks of code common to languages such as PASCAL and FORTRAN
toward the more fine-grained division of code encouraged by object-oriented languages
(C++, JAVA). In light of these observations, the question of computing
interprocedural control dependence gains increasing relevance.
In addition to the problems in program analysis and parallelization that control
dependence was introduced to solve, we recall that the problem of computing control
dependence is identical to the problem of computing the edge dominance frontier relation
of the reverse graph. It is this relation that can be used in the construction of the SSA
form of a program [2]. SSA form facilitates a wide range of optimizations, from
constant propagation to code hoisting.
The scope of our work has been to develop a means of computing control
dependence in the interprocedural case. To this end we have divided the problem into
two parts. First, we present a means of computing postdominance for control flow graphs
involving multiple (possibly recursive) procedures. Second, given the postdominance
relation, we attempt to extend the work of Pingali and Bilardi [7] to answer control
dependence queries, following suitable preprocessing, in optimal time.
Thus far, we have a well-defined iterative method for computing interprocedural
postdominance as well as numerous ideas, formulations, and partial solutions for both
computing postdominance without iteration and for answering control dependence
queries from a preprocessed form of the postdominance relation.
Section 2: Problem Statement
We assume that we are given an interprocedural control flow graph for the program.
Definition: control flow graph – G = (V, E) is a directed graph in which nodes represent
statements, and an edge u → v represents a possible flow of control from u to v. Set V
contains two distinguished nodes: Start, with no predecessors and from which every node
is reachable, and End, with no successors and reachable from every node. Further, we
assume an edge Start → End.
Definition: interprocedural control flow graph – G = (V, E, P) is a control flow graph
augmented with a set P of pairs of call and return edges. The elements of P are pairs (C,
R) of edges in which C must terminate at a distinguished node Start for some procedure
and in which R must originate from the distinguished node End of the same procedure.
Further, the procedures of G must form a partition of both V and E, with these partitions
connected only by edges in P. For each procedure, End must be reachable from Start,
although we do not require an edge Start → End.
An immediate consequence of this formulation is that we introduce paths of control
through the graph that do not correspond to any execution of the program. We seek to
eliminate these paths from our formulation by redefining the nature of a path.
Definition: path – Any sequence of connected nodes in the interprocedural control flow
graph.
Definition: complete path – Given an initially empty stack S, interpret a call edge
in the interprocedural control flow graph as an edge which pushes a return site, and a
return edge as one which pops the site if it returns to the same site as the top of the stack
(and does nothing otherwise). A complete path is any path for which S is empty at the
terminal node.
Definition: total complete path – Any complete path originating at Start and terminating
at End.
Definition: valid path – v is a valid path if and only if there exist a prefix path p and a
suffix path s (both possibly empty) such that the concatenation p v s is a complete path.
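
To make the stack discipline concrete, the following sketch (in Python) checks whether a
given path is complete. The edge encoding is our own illustrative assumption, not part of
the formal development:

    # Sketch: check completeness of a path under the stack discipline
    # defined above. Each edge is a pair (kind, site), where kind is
    # 'call', 'ret', or 'plain'; for a call edge, site is the return
    # site it pushes, and for a return edge, site is the node returned to.
    def is_complete(path):
        stack = []
        for kind, site in path:
            if kind == "call":
                stack.append(site)           # a call edge pushes its return site
            elif kind == "ret" and stack and stack[-1] == site:
                stack.pop()                  # a matching return pops the site
            # an unmatched return edge does nothing, per the definition
        return not stack                     # complete iff the stack ends empty

    # Example: call F (return site d), execute a statement of F, return to d.
    assert is_complete([("call", "d"), ("plain", None), ("ret", "d")])
    assert not is_complete([("call", "d")])  # a prefix of a complete path is valid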
Given our definition of a valid path through the control flow graph we adapt the usual
definitions of control dependence and postdominance accordingly.
Definition: interprocedural postdominance – u interprocedurally postdominates
(IPDOM) v if and only if every valid path from v to End contains u.
Definition: interprocedural control dependence – w is interprocedurally control
dependent (IPCOND) on edge u → v if and only if:
• IPDOM(w, v), and
• if w ≠ u, then not IPDOM(w, u).
Given these definitions we seek first to compute the postdominance relation and then to
preprocess that information into a form conducive to answering interprocedural control
dependence queries.
Section 3: Computing Interprocedural Postdominance Iteratively
We first observe that, in contrast to the intraprocedural postdominance relation, the
interprocedural postdominance relation forms a directed acyclic graph, rather than a tree.
That the relation is acyclic is immediate from the observation that the relation is both
transitive and antisymmetric (properties which follow directly from the definition). That
the relation is no longer tree-structured is best illustrated with an example.
Consider the following simple control flow graph and its associated postdominance
relation:
[Figure: a control flow graph in which node a branches to two call sites, b and c, of a
function F (entry Fs, exit Fe), with return sites d and e rejoining at f before END; beside
it, the associated postdominance relation, which forms a DAG rather than a tree.]
Function F postdominates each of its call sites (b and c). Each return site (d and e) also
postdominates the corresponding call site. However, neither return site can be said to
postdominate the function F (e.g. d does not postdominate F because an alternative path
(through e) exists from START to END which does not pass through d). As a result, the
call sites have more than one parent in the postdominance relation, which therefore can
no longer be tree-structured.
A straightforward way to approach the problem of computing this interprocedural
postdominance relation is to view it as a dataflow problem [8]. For simplicity and
because it represents a forward-flow problem, we present an iterative dataflow algorithm
for computing dominance in a control flow graph, observing that the same algorithm can
be used to compute postdominance by reversing the edges in the control flow graph and
working backwards through it.
Clearly there is an additional consideration when developing an algorithm to do
interprocedural analysis: we must determine in which order to process the functions.
Therefore we perform a topological sort of the strongly connected components of the call
graph and work backward through it. We collapse strongly connected components
(corresponding to cycles in the call graph) to single nodes, and then perform the analysis
on their equations simultaneously. This allows us to substitute the solutions to the
equations solved for a function back into equations for the body of code that calls the
function.
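
As a sketch of this ordering step (with the call graph given as a dictionary of successor
lists, an encoding of our own choosing), Tarjan's algorithm conveniently emits strongly
connected components in reverse topological order, which is exactly the callees-first
order we need:

    # Sketch: condense the call graph into SCCs and return them in
    # processing order (callees before callers). The dict-of-lists
    # call-graph encoding is an illustrative assumption.
    def sccs_in_processing_order(callgraph):
        index, low, on_stack, stack = {}, {}, set(), []
        order, counter = [], [0]

        def strongconnect(v):
            index[v] = low[v] = counter[0]; counter[0] += 1
            stack.append(v); on_stack.add(v)
            for w in callgraph.get(v, ()):
                if w not in index:
                    strongconnect(w)
                    low[v] = min(low[v], low[w])
                elif w in on_stack:
                    low[v] = min(low[v], index[w])
            if low[v] == index[v]:           # v roots an SCC: pop it off
                scc = set()
                while True:
                    w = stack.pop(); on_stack.discard(w); scc.add(w)
                    if w == v:
                        break
                order.append(scc)

        for v in callgraph:
            if v not in index:
                strongconnect(v)
        return order                         # reverse topological: callees first

    # For a program in which Main calls F and F calls itself:
    # sccs_in_processing_order({"Main": ["F"], "F": ["F"]}) == [{"F"}, {"Main"}]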
Having selected a function to process, we apply a standard dataflow algorithm to it. The
lattice involved in our dataflow analysis is the powerset of all nodes in the control flow
graph of the current function or set of mutually recursive functions. Unknowns are
initialized to U, the set of all nodes in the CFG for the current set of functions, rather than
null, because we seek a greatest fixed point.
The dataflow equations for individual nodes are defined as follows:
• For the Start node: Sout = {Start}
• For a statement node S1 with incoming set Sin: Sout = Sin ∪ {S1}
• For a merge node S with incoming sets S1 and S2: Sout = (S1 ∩ S2) ∪ {S}
These equations are standard for a forward-flow dataflow problem. The interesting case
is how information propagates across function calls. The following equation indicates
that the dominance set after a function call is composed of the dominators of the call node
(Sin), the call and return nodes (Call F, Ret F), and the dominators of the function being
called (SF).
For a call site, consisting of a call node Call F followed by a return node Ret F, with
incoming set Sin:
Sout = Sin ∪ {Call F, Ret F} ∪ SF
SF = {intraprocedural dominators of F}
Note that this formulation allows us to do a context-sensitive analysis. Depending upon
where the call to function F occurs, Sout will contain different call and return sites. This
also means that information propagates correctly along valid paths in the CFG. This fits
the traditional notion of a context-sensitive analysis in which the output is a function of
the input. In this case the transition function simply appends the dominators of the
function to the set containing the call and return sites. This realization about the
simplicity of the transition function motivates our use of set rather than function notation
for SF.
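
The following sketch shows one way the greatest-fixed-point iteration can be organized,
with each dataflow equation expressed as a function of the current environment of
unknowns. The representation (unknowns keyed by name, equations as closures) is our
own assumption, not a prescribed implementation:

    # Sketch: iterate a system of monotone set equations to a greatest
    # fixed point. `equations` maps each unknown to a function of the
    # current environment; every unknown starts at the universe U (top).
    def solve_gfp(equations, universe):
        env = {x: frozenset(universe) for x in equations}
        changed = True
        while changed:
            changed = False
            for x, eq in equations.items():
                new = eq(env)
                if new != env[x]:
                    env[x] = new
                    changed = True
        return env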
Consider the following example, which illustrates how the dataflow algorithm works and
contains some interesting cases. The main function calls function F regardless of which
branch of the conditional at a is taken. Function F makes a recursive call to itself.
[Figure: control flow graphs for Main and F. In Main, START leads to a conditional at a
with branches through call site b (return site d) and call site c (return site e); both
branches merge at f before END. In F, FSTART leads to a conditional at Fa with one
branch through the recursive call site Fb (return site Fc) and the other through Fd; both
branches merge at Fe before FEND.]
The call graph for this program is as follows:
Main → F, with a self-loop on F for the recursive call.
Thus we begin our analysis on function F. For convenience, we refer to the dominance
sets as Sout(edge). The equations for F are given as follows:
Sout(FSTART→Fa) = {FSTART}
Sout(Fa→Fb) = Sout(FSTART→Fa) ∪ {Fa}
Sout(Fa→Fd) = Sout(FSTART→Fa) ∪ {Fa}
Sout(Fd→Fe) = Sout(Fa→Fd) ∪ {Fd}
Sout(Fc→Fe) = Sout(Fa→Fb) ∪ {Fb, Fc} ∪ SF
Sout(Fe→FEND) = (Sout(Fc→Fe) ∩ Sout(Fd→Fe)) ∪ {Fe}
SF = Sout(Fe→FEND) ∪ {FEND}
All but the last three equations will converge after one iteration. The remaining ones will
converge after two iterations:
Sout0(Fc→Fe) = U = {FSTART, Fa, Fb, Fc, Fd, Fe, FEND}
Sout0(Fe→FEND) = U
SF0 = U
Sout1(Fc→Fe) = Sout(Fa→Fb) ∪ {Fb, Fc} ∪ SF0 = U
Sout1(Fe→FEND) = (Sout1(Fc→Fe) ∩ Sout(Fd→Fe)) ∪ {Fe}
             = {FSTART, Fa, Fd, Fe}
SF1 = {FSTART, Fa, Fd, Fe, FEND}
Sout2(Fc→Fe) = U
Sout2(Fe→FEND) = {FSTART, Fa, Fd, Fe}
SF2 = {FSTART, Fa, Fd, Fe, FEND}
The resulting sets for the function F are:
Sout(FSTART→Fa) = {FSTART}
Sout(Fa→Fb) = {FSTART, Fa}
Sout(Fa→Fd) = {FSTART, Fa}
Sout(Fd→Fe) = {FSTART, Fa, Fd}
Sout(Fc→Fe) = {FSTART, Fa, Fb, Fc, Fd, Fe, FEND}
Sout(Fe→FEND) = {FSTART, Fa, Fd, Fe}
SF = {FSTART, Fa, Fd, Fe, FEND}
Note that this also demonstrates the importance of initializing unknowns to U rather than
the null set, which ensures that we obtain a greatest fixed point. Had we initialized
Sout(Fc→Fe) to null, we would never add Fd to its set. Thus, because of the presence of
the intersection operator, Fd would have been excluded from SF. This is incorrect: all
paths out of the recursive function F and back to Main must pass through Fd (otherwise
a new call to F would be generated instead of a return to Main).
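
As a concrete check, the equations for F can be fed to the solver sketched earlier (the
string encoding of edges is again our own illustrative choice), and the run reproduces the
sets above:

    # The dataflow equations for F from the example, as closures.
    U = {"FSTART", "Fa", "Fb", "Fc", "Fd", "Fe", "FEND"}
    eqs = {
        "FSTART->Fa": lambda e: frozenset({"FSTART"}),
        "Fa->Fb":     lambda e: e["FSTART->Fa"] | {"Fa"},
        "Fa->Fd":     lambda e: e["FSTART->Fa"] | {"Fa"},
        "Fd->Fe":     lambda e: e["Fa->Fd"] | {"Fd"},
        "Fc->Fe":     lambda e: e["Fa->Fb"] | {"Fb", "Fc"} | e["SF"],
        "Fe->FEND":   lambda e: (e["Fc->Fe"] & e["Fd->Fe"]) | {"Fe"},
        "SF":         lambda e: e["Fe->FEND"] | {"FEND"},
    }
    env = solve_gfp(eqs, U)
    # env["SF"] == {"FSTART", "Fa", "Fd", "Fe", "FEND"}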
At this point, convergence for function F has been achieved. Returning to the main
function, we likewise compute:
Sout(START→a) = {START}
Sout(a→b) = Sout(START→a) ∪ {a}
Sout(a→c) = Sout(START→a) ∪ {a}
Sout(d→f) = Sout(a→b) ∪ {b, d} ∪ SF
Sout(e→f) = Sout(a→c) ∪ {c, e} ∪ SF
Sout(f→END) = (Sout(d→f) ∩ Sout(e→f)) ∪ {f}
SMAIN = Sout(f→END) ∪ {END}
Having already computed SF, all of these equations will converge after one iteration, to:
Sout(START→a) = {START}
Sout(a→b) = {START, a}
Sout(a→c) = {START, a}
Sout(d→f) = {START, a, b, d, FSTART, Fa, Fd, Fe, FEND}
Sout(e→f) = {START, a, c, e, FSTART, Fa, Fd, Fe, FEND}
Sout(f→END) = {START, a, f, FSTART, Fa, Fd, Fe, FEND}
SMAIN = {START, a, f, END, FSTART, Fa, Fd, Fe, FEND}
This example illustrates how the iterative algorithm handles function calls and recursion.
Note that Sout(f→END) correctly includes all of the dominators from function F, since F
is called on either side of the conditional.
Convergence in general is guaranteed due to the finite nature of the lattice and the
monotonicity of the dataflow equations. Although they contain both union and
intersection operations, it is still true for every equation f that for all dominance sets x
and y, x ⊆ y implies f(x) ⊆ f(y).
Finally, we note that while the algorithm requires exponential time in theory (since the
lattice contains the powerset of a subset of the nodes of the control flow graph) it is
usually polynomial in practice. However, we still do not expect this method to be as
efficient as non-iterative algorithms.
Section 4: Computing Postdominance without Iteration
As we expect that computing the interprocedural postdominance relation by means of
iteration until a greatest fixed point is found will be slow, we seek a method that produces
the same solution without the need for iteration. In the intraprocedural case, efficient
algorithms exist to do this computation [6]. These algorithms range from naïve quadratic
time algorithms based on depth first search to more sophisticated (though still practical)
linear time algorithms.
In the interprocedural case, the matter of computing postdominance is complicated by the
existence of invalid paths in the control flow graph. While this problem is dealt with
implicitly in the context sensitive analysis of the previous section, it must be confronted
explicitly in non-iterative algorithms.
An additional goal in developing a non-iterative algorithm is to eliminate to the greatest
extent possible the need for cloning of nodes of procedures called from multiple
locations. This forces us to deal with the existence of invalid paths. The temptation to
clone comes from the observation that in the absence of recursion, it is possible to
eliminate invalid paths in the CFG by cloning the subgraph for each procedure once for
every time it is called and 'inlining' that subgraph in the appropriate places. However,
this leads to an undesirable explosion in the size of the CFG, and any algorithm that must
process it incurs a corresponding penalty in runtime. Additionally, recursion cannot be
properly dealt with, as it would require an infinite graph.
The solution to this problem that we are developing involves first computing the
postdominance relation for each procedure with placeholders for called procedures and
then “gluing” these pieces together. This approach is complicated by two factors.
First, we have generated a sufficient bank of examples to convince ourselves
that, in addition to knowing which procedures are or might be called from a particular
code block, we also need to know the sequence of these calls. Thus, when computing
postdominance for a single procedure, we also need to know all of the possible sequences
of calls that can be initiated from that procedure and the point(s) at which they are
initiated.
We accomplish this by introducing the notion of “sparse stubs”. When constructing the
control flow graph, we construct in parallel a scaled down representation of the control
flow graph consisting of only the entry and exit points to functions as well as the decision
points that precede these calls. The result is a graph such that every path from Start to
End represents a possible sequence of procedure calls and returns in an execution of the
program. Further, every such possible sequence is represented in the sparse graph. We
note immediately that the graph may have repeated subgraphs. That is, if a procedure’s
start and end each occur twice in the graph (they must occur in pairs as per our
assumptions on the program’s design) then the connected subgraph between one pair will
be isomorphic to the subgraph between any other pair. In other words, the subgraph
representing a procedure F will be identical regardless of where it is called. Given this
observation, we simply choose one pair and inject that subgraph into the “gap” between
the call and return site of the corresponding instance of the procedure in the control flow
graph.
[Figure: the sparse representation (Start, a pair of F-Start/F-End stubs for each call to F,
End) shown beside the "plugged" CFG for Main, in which the subgraph for F has been
injected into the gap between each call site and its return site.]
This technique leads to our second problem. We notice that in the case of a procedure
being called two or more times from a procedure we will have repeated nodes within the
“filled” control flow graph. While it is clearly the case that we would like to treat these
nodes as copies of a single node, we cannot simply merge all of the edges of one copy
with the edges of another, as this would introduce invalid paths into our control flow
graph. The utility of this “plugged” graph comes from the fact that it does not contain
any invalid sequences of nodes. To compute postdominance on this special form of
control flow graph we need to introduce the notion of union postdominance based on the
concept of generalized dominators [4].
Definition: union postdominance – A set of nodes T in a control flow graph is said to
union postdominate a set of nodes S if and only if for every node s in S every path from s
to End contains an element of T.
Given this definition of union postdominance, we partition the “filled” control flow graph
into sets consisting of a node and all of its copies. (Nodes without copies form singleton
sets of the partition). We then seek to construct the DAG that is the transitive reduction
of the union postdominance relation on the sets of the partition.
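
A naive decision procedure for union postdominance follows directly from the definition:
T union postdominates S exactly when no node of S can reach End along a path that
avoids every node of T. A sketch follows, assuming a successor-dictionary encoding of
the graph (our own illustrative choice):

    # Sketch: naive union-postdominance test by searching for a
    # T-avoiding path from some s in S to End.
    def union_postdominates(T, S, succ, end):
        blocked = set(T)
        for s in S:
            if s in blocked:
                continue                     # any path from s contains s, in T
            seen, work = {s}, [s]
            while work:
                n = work.pop()
                if n == end:
                    return False             # a T-avoiding path to End exists
                for m in succ.get(n, ()):
                    if m not in seen and m not in blocked:
                        seen.add(m)
                        work.append(m)
        return True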
Given these component pieces, it is our claim that, at least in the non-recursive case, we
have enough information to put these component graphs together in topological order
with respect to the call graph.
In the DAG for a specific procedure F that calls procedure G, we know that we will find
nodes for G-Start and G-End. Further, these nodes will define a subgraph with G-Start the
source and G-End the sink. This subgraph may be empty, or it may have other procedure
start and end nodes inside of it, depending upon whether or not G calls any other
functions. In either case, we replace this subgraph with the one constructed for G. By
the nature of the G and F DAGs, any node appearing in the deleted subgraph of the F
DAG will be included in the injected G DAG. From this we see that no information is
lost, and the process can continue by injecting the graphs for procedures called by G.
It is worth noting that while this works for programs with acyclic call graphs, it does not
work in the recursive case. In that case the plugs from the sparse representation introduce
invalid paths which break the algorithm. At present we are working on developing a new,
augmented sparse representation with call site information for recursive functions that
may alleviate this problem.
Section 5: Answering Control Dependence Queries
Once we have computed the interprocedural postdominance relation, we can use it to
answer control dependence queries.
There are three types of queries of interest in the realm of control dependence:
• cd(e) is the set of nodes control dependent on edge e.
• conds(v) is the set of edges v is control dependent on.
• cdequiv(v) is the set of nodes with the same control dependencies as node v.
In the intraprocedural case, Pingali and Bilardi have shown methods for computing
control dependence information in optimal (proportional to the size of the output) time
[7]. However, their methods exploit structure in the intraprocedural postdominance
relation, namely the fact that it forms a tree. As we have shown, the interprocedural
postdominance relation generalizes only to a DAG, so we are unable to directly apply the
Roman Chariots approach. However, it forms a starting point for our approach.
In answering the above control dependence queries, we must perform numerous
reachability computations. For instance, we know that
cd(u→v) = {nodes reachable from v} − {nodes reachable from u}.
For a tree-structured relation, this reduces to
cd(u→v) = {nodes on the path from v up to but excluding LCA(u, v)}
since for any given u, v, that path is uniquely determined. Further, Pingali and Bilardi
cite that due to the structure of the postdominance relation, the parent of u (its immediate
postdominator) is also an ancestor of v. This then allows further simplification to
cd(u→v) = {nodes on the path from v up to but excluding parent(u)}
which can be represented as a unique open interval on the postdominance tree:
cd(u→v) = [v, parent(u))
The interprocedural case presents us with additional challenges, however. The fact that
the relation is a DAG, rather than a tree, means that nodes may have more than one
parent. Thus LCA(u, v) is not well-defined. In addition, using the LCA as in the
intraprocedural case yields an incorrect result. Returning to our previous example,
consider LCA(a, b). Fs is a least common ancestor of both nodes, but only considering
the interval from b to Fs means that all nodes on the other path (d) will be excluded, even
though they are control dependent on a → b. The interval endpoint we are more interested
in is f, which is a least total ancestor of a and b. However, even if we are able to
compute the LTA of a and b efficiently, the interval from b to f still does not completely
specify cd(a→b) because we need to exclude all nodes reachable from a.
[Figure: the postdominance DAG from the earlier example, over the nodes START, a, b,
d, e, Fs, Fe, f, and END; Fs is a least common ancestor of a and b, while f is their least
total ancestor.]
Thus we consider a reformulation as the Roman Aqueducts problem. In this formulation
we have a set of cities connected by a large network of aqueducts. The Romans are
occasionally faced with water pollution problems, and the Emperor is then concerned:
given that a city u is polluted and a city v is not, which other cities that get water from
city v have water that is safe to drink? This is exactly the computation we want, where we wish
to compute {nodes reachable from v} – {nodes reachable from u}. This could be
computed naively by marking all nodes reachable from v, then marking all nodes
reachable from u, and then collecting any nodes that have the first mark but not the
second.
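
A direct rendering of this naive approach over the postdominance DAG (with edges from
each node to its parents in the relation; the encoding is an assumption of ours) is:

    # Sketch: naive cd(u->v) = {reachable from v} - {reachable from u},
    # computed on the postdominance DAG given as a successor dictionary.
    def reachable(dag, n):
        seen, work = {n}, [n]
        while work:
            x = work.pop()
            for y in dag.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    work.append(y)
        return seen

    def cd(dag, u, v):
        return reachable(dag, v) - reachable(dag, u)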
However, we’d like to reduce the number of reachability computations that must be
made. To do this, we break the reachable sets (aqueduct systems) into smaller intervals
(aqueducts). Each aqueduct represents a segment of the system that has no split points
(nodes with multiple parents). For example, consider computing cd(a→b) in the above
example. This requires computing {nodes reachable from b} − {nodes reachable from a},
or [b, END] − [a, END]. The aqueducts in [b, END] are listed in the first column of the
following table.
Aqueduct     Reachable from a?   LCA(startpoint, a)   Modified Aqueduct
[b, b]       N                   -                    [b, b]
[d, END]     Y                   f                    [d, f)
[Fs, END]    Y                   Fs                   [Fs, Fs)
Once we have these, we then ask, for each aqueduct, whether its top endpoint is reachable
from a. If not, then the entirety of that aqueduct is 'unpolluted' by a, so we preserve that
aqueduct unchanged; otherwise we compute the LCA of the start point of the aqueduct
and a. Because each aqueduct has been constructed so as to eliminate any cross edges
(no node has more than one parent), the LCA of these two nodes is well defined and will
lie somewhere inside the aqueduct. It can be computed by starting at the bottom
endpoint and walking up the interval until a node reachable from a is encountered.
Having computed this LCA, we replace the aqueduct's top endpoint with it (the interval
also becomes an open interval, so as to exclude the LCA itself). The modified
aqueducts then represent the 'clean' segments of the aqueduct system, because all nodes
reachable from a have been excluded. In this example, cd(a→b) = {b, d}.
Here we need only answer a reachability query once for each interval, plus some
additional number of times when finding the LCA on a specific interval; those
additional queries will in fact be proportional to the size of the output, since we stop on
encountering a node that does not belong in the cd set. Thus the runtime for computing
cd(u→v) is near-optimal, if the aqueduct intervals are computed in a preprocessing stage.
Having constructed the intervals on the postdominance DAG, answering conds and cdequiv
queries is straightforward. We can apply the methods from the intraprocedural case
proposed by Pingali and Bilardi, because none of them rely on properties a tree has that a
DAG does not. For instance, to answer conds queries we can make use of variable
caching at interior nodes. In a DAG, as in a tree, the only intervals that could possibly be
in the conds set of a given node must be cached at or beneath it. Likewise, the
fingerprints of conds sets for the DAG case (size and the lowest common node for all
intervals in the conds set) will uniquely identify them and can be used for fast
comparisons to answer cdequiv queries.
Section 6: Conclusions and Future Work
What we have presented in the preceding sections is a problem in interprocedural
analysis worthy of further study. We have shown how existing iterative algorithms can
be applied to a proper formulation of the interprocedural postdominance relation
computation and the foundation of what we believe will lead to practical direct
algorithms to accomplish the same task. Further, we have taken the first steps toward
extending the known optimal algorithm for responding to control dependence queries to
the interprocedural case. Finally, we have assembled a substantial bank of interesting
examples that have provided enormous insight into the problem.
Our next task is to construct a working implementation of the iterative approach to
computing postdominance. With that in place we hope to use it as both a testbed and
benchmark for the algorithms whose development we have written about.
Section 7: References
[1] G. Bilardi and K. Pingali. A Framework for Generalized Control Dependence. In Proc. of the
SIGPLAN '96 Conf. on Prog. Lang. Design and Implem., pages 291-300, May 1996.
[2] G. Bilardi and K. Pingali. The Static Single Assignment Form and its Computation. November
1998.
[3] J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in
optimization. ACM Trans. on Prog. Lang. and Sys., 9(3):319-349, July 1987.
[4] R. Gupta. Generalized Dominators. 1991.
[5] M. J. Harrold, G. Rothermel, and S. Sinha. Computation of Interprocedural Control Dependence.
1998.
[6] T. Lengauer and R. E. Tarjan. A Fast Algorithm for Finding Dominators in a Flowgraph. ACM
Transactions on Programming Languages and Systems, 1(1):121-141, July 1979.
[7] K. Pingali and G. Bilardi. Optimal control dependence computation and the Roman Chariots
problem. ACM Transactions on Programming Languages and Systems, 19(3):462-491, May 1997.
[8] M. Sharir and A. Pnueli. Two approaches to interprocedural dataflow analysis. Prentice Hall,
1981.