Proceedings of the 28th Annual Hawaii International Conference on System Sciences - 1995
Classical Dependence Analysis Techniques:
Sufficiently Accurate in Practice
Kleanthis Psarris
Department of Computer Science
The University of Texas at San Antonio
San Antonio, TX 78249
psarris@ringer.cs.utsa.edu

Santosh Pande
Department of Computer Science
Ohio University
Athens, OH 45701
pande@ace.cs.ohiou.edu
Abstract

Data Dependence Analysis is the foundation of any parallelizing compiler. The GCD test and the Banerjee-Wolfe test are the two tests traditionally used to determine statement data dependence in automatic vectorization / parallelization of loops. These tests are approximate in the sense that they are necessary but not sufficient conditions for data dependence. In an earlier work we extended the Banerjee-Wolfe test and a combination of the GCD and Banerjee-Wolfe tests with a set of conditions to derive exact data dependence information. In this paper we perform an empirical study on the Perfect benchmarks to demonstrate the effectiveness and practical importance of our conditions. We show that the Banerjee-Wolfe test extended with our conditions becomes an exact test for data dependence in actual practice.
1. Introduction
In automatic parallelization of sequential programs, parallelizing compilers [2, 4, 12, 18, 24, 27] perform subscript analysis [3, 5, 6, 26, 29] to detect data dependences between pairs of array references inside loop nests. The data dependence problem is equivalent to the integer programming problem, a well-known NP-hard problem, and therefore can not be solved efficiently in general. A number of subscript analysis tests have been proposed in the literature [6, 9, 10, 15, 16, 21, 22, 28]. Each test offers a different tradeoff between accuracy and efficiency.

The most widely used approximate subscript analysis tests are the GCD test and the Banerjee-Wolfe test [6, 26, 29]. The major advantage of these tests is their simplicity and their low computational cost, which is linear in the number of variables in the dependence equation. This is the primary reason that they have been adopted by most parallelizing compilers. However, both the GCD test and the Banerjee-Wolfe test are necessary but not sufficient conditions for data dependence. When independence can not be proved, both tests approximate on the conservative side by assuming dependence, so that their use never results in unsafe parallelization.

In an earlier work [19, 20] we formally studied the accuracy of the Banerjee-Wolfe and GCD tests. We derived a set of conditions, which can be tested along with the Banerjee inequality and the GCD test, to prove data dependence. The cost of testing these conditions is proven to be linear in the number of variables in the dependence equation. In this paper we perform an empirical study on the Perfect benchmarks [7] to evaluate our formal results and demonstrate the effectiveness and practical importance of our extensions to the Banerjee-Wolfe and GCD tests. We show that our extensions indeed prove to be always accurate in practice. Our empirical study indicates that the Banerjee inequality extended with our accuracy conditions becomes an exact test, i.e., a necessary as well as sufficient condition for data dependence.

In Section 2 we discuss Data Dependence Analysis and review the GCD and Banerjee-Wolfe tests. In Section 3 we present our extensions to the Banerjee-Wolfe test and to a combination of the GCD and Banerjee-Wolfe tests. In Section 4 we demonstrate the effectiveness of our conditions in actual practice, by performing an empirical study on the Perfect benchmarks. In Section 5 we discuss related work and compare our results with other methods. Finally, in Section 6 we present our conclusions.
2. Data Dependence Analysis
Consider two statements, S1 and S2 (Figure 1), containing potentially conflicting references to a p-dimensional array A. We assume that the subscript expressions are linear functions of the loop iteration variables, i.e.,

* for 1 ≤ v ≤ p, f_v(I_1, I_2, ..., I_r) is a function of the form
  φ_{v,0} + φ_{v,1} I_1 + φ_{v,2} I_2 + ... + φ_{v,r} I_r
  where each of φ_{v,0}, φ_{v,1}, ..., φ_{v,r} is an integer.
* for 1 ≤ v ≤ p, g_v(I_1, I_2, ..., I_r) is a function of the form
  ψ_{v,0} + ψ_{v,1} I_1 + ψ_{v,2} I_2 + ... + ψ_{v,r} I_r
  where each of ψ_{v,0}, ψ_{v,1}, ..., ψ_{v,r} is an integer.

  DO I_1 = L_1, U_1
    DO I_2 = L_2, U_2
      ...
      DO I_r = L_r, U_r
        S1:  A(f_1(I_1, I_2, ..., I_r), ..., f_p(I_1, I_2, ..., I_r)) = ...
        S2:  ... = A(g_1(I_1, I_2, ..., I_r), ..., g_p(I_1, I_2, ..., I_r))
      ENDDO
      ...
    ENDDO
  ENDDO

Figure 1: A Nest of Loops

Each iteration of a nest of loops is identified by an iteration vector whose elements are the values of the iteration variables for that iteration. A statement embedded in a nest of loops may be executed once for each iteration of the nest. Each potential execution of a statement is termed an instance of the statement. An instance of a statement is represented by the statement together with an iteration vector. For example, the instance of statement S1 during iteration i = (i_1, i_2, ..., i_r) is denoted by S1(i) or S1(i_1, i_2, ..., i_r), and the instance of statement S2 during iteration j = (j_1, j_2, ..., j_r) is denoted by S2(j) or S2(j_1, j_2, ..., j_r).

Definition 1
Let S1 and S2 be two statements as in Figure 1. Let
  i = (i_1, i_2, ..., i_r) and j = (j_1, j_2, ..., j_r)
where L_k ≤ i_k, j_k ≤ U_k, for 1 ≤ k ≤ r.
If (i_1, i_2, ..., i_r) ≤ (j_1, j_2, ..., j_r), then S1(i) is said to precede S2(j), denoted S1(i) < S2(j).
If (j_1, j_2, ..., j_r) < (i_1, i_2, ..., i_r), then S2(j) is said to precede S1(i), denoted S2(j) < S1(i).
S1(i) < S2(j), of course, means that, in the sequential execution of the loop of Figure 1, if both S1(i) and S2(j) execute, then S1(i) executes before S2(j); S2(j) < S1(i) means that, if both S2(j) and S1(i) execute, then S2(j) executes before S1(i).

Definition 2
Let S1(i) and S2(j) be as in Definition 1. If
  f_1(i_1, i_2, ..., i_r) = g_1(j_1, j_2, ..., j_r)
  f_2(i_1, i_2, ..., i_r) = g_2(j_1, j_2, ..., j_r)
  ...
  f_p(i_1, i_2, ..., i_r) = g_p(j_1, j_2, ..., j_r)
then
* if S1(i) < S2(j), there is said to be a data dependence from S1(i) to S2(j).
* if S2(j) < S1(i), there is said to be a data dependence from S2(j) to S1(i).

Full data dependence information, between S1 and S2, consists of all sets of ordered pairs (S1(i), S2(j)) such that there is a dependence from S1(i) to S2(j) and all sets of ordered pairs (S2(j), S1(i)) such that there is a dependence from S2(j) to S1(i). Because the number of such pairs is often very large, we define the notion of dependence with a direction vector [26], and often compute only the set of direction vectors of data dependences.

Definition 3
Let S1(i) and S2(j) be as in Definition 1. A vector of the form (e_1, e_2, ..., e_r), where e_k ∈ {<, =, >, *}, 1 ≤ k ≤ r, is termed a direction vector.
If there is a data dependence from S1(i) to S2(j), and if for 1 ≤ k ≤ r, i_k e_k j_k, i.e., if the relation e_k holds between i_k and j_k, then there is said to be a data dependence from S1 to S2 with direction vector (e_1, e_2, ..., e_r).
Similarly, if there is a data dependence from S2(j) to S1(i), and if for 1 ≤ k ≤ r, j_k e_k i_k, then there is said to be a data dependence from S2 to S1 with direction vector (e_1, e_2, ..., e_r).
The symbol '*' stands for an arbitrary relationship between i_k and j_k, i.e., it indicates a don't care situation.
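The definitions above can be checked directly on small loops. The following sketch is ours, not part of the paper's machinery; the function name directions and the two-deep nest are illustrative assumptions. It enumerates all iteration pairs of a small nest and records the relations under which the two references touch the same array element. It ignores the S1-versus-S2 orientation of Definitions 1 and 2 (which follows from which instance precedes the other) and only reports which direction relations admit a collision, but it serves as a ground-truth oracle for the approximate tests discussed in the next sections.

from itertools import product

def directions(f, g, bounds):
    """Enumerate all iteration pairs (i, j) of a small nest with the given
    per-loop (L, U) bounds, and collect the direction vectors over {<, =, >}
    for which the subscript functions collide, i.e. f(i) == g(j)."""
    found = set()
    ranges = [range(L, U + 1) for (L, U) in bounds]
    for i in product(*ranges):
        for j in product(*ranges):
            if f(*i) == g(*j):
                dv = tuple('<' if a < b else ('=' if a == b else '>')
                           for a, b in zip(i, j))
                found.add(dv)
    return found

# The nest of Figure 1 specialised to A(I + 9*(J-1)) and A(I + 9*(J+1)),
# the pair used in the example later in the paper, with 1 <= I, J <= 10.
deps = directions(lambda i, j: i + 9 * (j - 1),
                  lambda i, j: i + 9 * (j + 1),
                  [(1, 10), (1, 10)])
print(sorted(deps))   # ('<', '>') and ('=', '>') appear among the vectors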
We see from Definitions 1-3 that a data dependence exists from S1 to S2 with direction vector (e_1, e_2, ..., e_r) if and only if there exist iteration vectors i and j such that the following system of linear equations

  f_1(i_1, i_2, ..., i_r) = g_1(j_1, j_2, ..., j_r)
  f_2(i_1, i_2, ..., i_r) = g_2(j_1, j_2, ..., j_r)
  ...
  f_p(i_1, i_2, ..., i_r) = g_p(j_1, j_2, ..., j_r)        (1)

has a simultaneous integer solution subject to loop limits on the values of the variables

  L_k ≤ i_k, j_k ≤ U_k, for 1 ≤ k ≤ r        (2)

and subject to dependence directions

  i_k = j_k, for 1 ≤ k ≤ r, such that e_k = "="
  i_k < j_k, for 1 ≤ k ≤ r, such that e_k = "<"
  i_k > j_k, for 1 ≤ k ≤ r, such that e_k = ">"        (3)

This problem is equivalent to the integer programming problem and, therefore, can not be solved efficiently in general. One approach taken, known as subscript by subscript testing [6, 26, 29], is testing one equation at a time and assuming that there is a data dependence unless at least one of the equations can be shown not to have a constrained integer solution. This method introduces a conservative approximation if and only if we are testing multidimensional arrays with coupled subscripts. We say that two different subscript pairs in a multidimensional array are coupled [14, 15] if they contain the same loop index variable. In case of coupled subscripts, techniques such as constraint propagation [10], wherever applicable, or the lambda test [15] can be applied to reduce the problem to testing single equations for constrained integer solutions.

Consider, therefore, one equation of the form:

  (φ_1 i_1 - ψ_1 j_1) + ... + (φ_r i_r - ψ_r j_r) = ψ_0 - φ_0        (4)

We proceed by simplifying the notation.

Definition 4
Consider equation (4) together with direction vector (e_1, e_2, ..., e_r). The sets V=, V<, V>, and V* are defined as follows:
  V= = {u | 1 ≤ u ≤ r and e_u = "="}
  V< = {u | 1 ≤ u ≤ r and e_u = "<"}
  V> = {u | 1 ≤ u ≤ r and e_u = ">"}
  V* = {u | 1 ≤ u ≤ r and e_u = "*"}
It is clear that the sets V=, V<, V>, and V* form a partition of the set of integers {1, 2, ..., r}.

Using the notation defined in Definition 4, we can rewrite equation (4) as

  Σ_{u ∈ V=} (φ_u - ψ_u) i_u + Σ_{u ∈ V<} (φ_u i_u - ψ_u j_u) + Σ_{u ∈ V>} (φ_u i_u - ψ_u j_u) + Σ_{u ∈ V*} (φ_u i_u - ψ_u j_u) = ψ_0 - φ_0        (5)

and the constraints in (2) and (3) as

  L_u ≤ i_u ≤ U_u, for u ∈ V=
  L_u ≤ i_u, j_u ≤ U_u, for u ∈ V< ∪ V> ∪ V*        (6)

and

  i_u < j_u for u ∈ V<
  i_u > j_u for u ∈ V>        (7)

Note that the constraint i_u = j_u for u ∈ V= has been expressed by simply eliminating the distinction between variables i_u and j_u for u ∈ V=.

Finally, eliminating terms with zero coefficients, and renaming coefficients and variables so that all inequality constraints are less-than constraints, we see that the problem can be reduced to that of determining whether an equation of the form

  Σ_{i=1}^{n} a_i X_i + Σ_{i=n+1}^{n+m} (b_i Y_i + c_i Z_i) = a_0        (8)

all of whose coefficients are non-zero integers, has an integer solution satisfying constraints of the form:

  M_i ≤ X_i ≤ N_i, for 1 ≤ i ≤ n
  M_i ≤ Y_i < Z_i ≤ N_i, for n+1 ≤ i ≤ n+m        (9)
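As a concrete illustration of this renaming step, the sketch below is our reading of the reduction, not code from the paper; normal_form and its argument names are our assumptions. It maps the coefficients of one subscript pair, a direction vector, and the loop bounds to the shape of equation (8), with bound pairs as in (9). Degenerate (b, c) pairs in which one coefficient vanishes would need the zero-coefficient elimination mentioned above and are not handled here.

def normal_form(phi, psi, phi0, psi0, dirvec, bounds):
    """Rewrite one subscript equation
        sum_u (phi[u]*i_u - psi[u]*j_u) = psi0 - phi0        -- equation (4)
    under a direction vector into the shape of equation (8):
        sum a_i*X_i + sum (b_i*Y_i + c_i*Z_i) = a0,
    with M <= X <= N and M <= Y < Z <= N as in (9).  Returns (xs, yz, a0)
    where xs is a list of (a, M, N) and yz a list of (b, c, M, N)."""
    xs, yz = [], []
    for u, e in enumerate(dirvec):
        L, U = bounds[u]
        if e == '=':                      # i_u and j_u merge into one variable
            a = phi[u] - psi[u]
            if a != 0:
                xs.append((a, L, U))
        elif e == '*':                    # i_u and j_u stay independent
            for a in (phi[u], -psi[u]):
                if a != 0:
                    xs.append((a, L, U))
        else:                             # '<' or '>': an ordered (Y, Z) pair
            b, c = (phi[u], -psi[u]) if e == '<' else (-psi[u], phi[u])
            if b == 0 or c == 0:
                raise NotImplementedError("degenerate pair not handled here")
            yz.append((b, c, L, U))
    return xs, yz, psi0 - phi0

# Equation (Ex-1) of the example later on: i1 - i2 + 9j1 - 9j2 = 18,
# direction vector (<, >), loop limits 1..10 for both indices:
print(normal_form([1, 9], [1, 9], -9, 9, ('<', '>'), [(1, 10), (1, 10)]))
# -> ([], [(1, -1, 1, 10), (-9, 9, 1, 10)], 18)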
The GCD test and the Banerjee-Wolfe test [6, 26, 29] are the two tests traditionally used in parallelizing compilers to determine whether an equation of the form (8) has an integer solution satisfying constraints of the form (9). Neither the GCD test nor the Banerjee-Wolfe test actually checks the equation for the existence of a constrained integer solution. The GCD test ignores loop limit and direction vector inequality constraints entirely and simply determines whether the equation has an unconstrained integer solution. The Banerjee-Wolfe test, on the other hand, takes constraints into account, but determines whether the equation has a constrained real solution.
The GCD test computes the greatest common divisor of the coefficients on the left hand side of equation (8). By a number theory result, equation (8) has an integer solution iff gcd(a_1, ..., a_n, b_{n+1}, ..., b_{n+m}, c_{n+1}, ..., c_{n+m}) is a divisor of a_0. If the gcd does not divide a_0, then there is definitely no dependence; otherwise there may be a dependence.
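A minimal sketch of this check, assuming the (xs, yz) representation produced by the normal_form sketch above (the helper name gcd_test is ours):

from math import gcd
from functools import reduce

def gcd_test(a0, xs, yz):
    """GCD test on equation (8): 'no' (definitely independent) if the gcd of
    all left-hand-side coefficients does not divide a0, otherwise 'maybe'.
    xs is a list of (a, M, N) terms and yz a list of (b, c, M, N) pairs."""
    coeffs = [a for a, _, _ in xs] + [t for b, c, _, _ in yz for t in (b, c)]
    d = reduce(gcd, (abs(t) for t in coeffs), 0)
    if d == 0:                      # no variables left: equation is a0 == 0
        return 'maybe' if a0 == 0 else 'no'
    return 'maybe' if a0 % d == 0 else 'no'

# gcd_test(18, [], [(1, -1, 1, 10), (-9, 9, 1, 10)]) -> 'maybe' (gcd 1 divides 18)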
The Banerjee-Wolfe test computes the extreme values min and max assumed by the expression on the left hand side of equation (8) when the variables are subjected to the constraints specified in (9). By the Intermediate Value Theorem, equation (8) has a real solution within the region specified by the constraints in (9) iff min ≤ a_0 ≤ max. If min ≤ a_0 ≤ max is not true, then there is definitely no dependence; otherwise there may be a dependence.
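A sketch of the extreme-value computation over the same (xs, yz) representation. It evaluates the region endpoints directly rather than using Banerjee's positive/negative-part formulas, and it treats a (Y, Z) pair as satisfying Z >= Y + 1, which reproduces the numbers of the worked example later in the paper; both choices are our assumptions, not a transcription of the original formulas.

def term_bounds(coef, lo, hi):
    # range of coef * t for t in [lo, hi]
    return (coef * lo, coef * hi) if coef >= 0 else (coef * hi, coef * lo)

def banerjee_bounds(xs, yz):
    """Extreme values (min, max) of the left hand side of equation (8) over
    the region defined by the constraints (9).  For a (b, c) pair the
    constraint M <= Y < Z <= N is used, i.e. Y in [M, N-1], Z in [Y+1, N]."""
    lo = hi = 0
    for a, M, N in xs:
        l, h = term_bounds(a, M, N)
        lo, hi = lo + l, hi + h
    for b, c, M, N in yz:
        # optimise over Z first (an endpoint of [Y+1, N]), then over Y
        cand = []
        for z_at_upper in (True, False):
            if z_at_upper:            # Z = N:      value = b*Y + c*N
                g, k = b, c * N
            else:                     # Z = Y + 1:  value = (b+c)*Y + c
                g, k = b + c, c
            l1, h1 = term_bounds(g, M, N - 1)
            cand += [l1 + k, h1 + k]
        lo, hi = lo + min(cand), hi + max(cand)
    return lo, hi

def banerjee_test(a0, xs, yz):
    lo, hi = banerjee_bounds(xs, yz)
    return 'maybe' if lo <= a0 <= hi else 'no'

# banerjee_bounds([], [(1, -1, 1, 10), (-9, 9, 1, 10)]) -> (0, 80)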
If both the GCD test and the Banerjee-Wolfe test return a maybe answer, then we assume a data dependence. However, in that case we do not know whether an approximation was made or not. In the next section we present a set of conditions which can be tested along with the Banerjee inequality and the GCD test to derive an exact yes answer.

3. Extending the Banerjee-Wolfe and GCD Tests
We derived the following results, stated as Corollaries 1 and 2 [19], based on a necessary and sufficient condition for the linear expression on the left hand side of equation (8) to assume all integer values between its min and max, when the variables are subject to the constraints in (9). Corollaries 1 and 2 provide conditions under which the Banerjee-Wolfe test and a combination of the GCD test and the Banerjee-Wolfe test become necessary and sufficient conditions for data dependence. It is proven in this study that these conditions are almost invariably satisfied in practice, making the extended Banerjee-Wolfe and GCD tests exact as well as efficient.

Corollary 1
Consider equation (8) together with the constraints in (9). For each i, n+1 ≤ i ≤ n+m, let

  γ_i = |b_i| + |c_i|                          if b_i c_i > 0
        max(|b_i|, |c_i|)                      if b_i c_i < 0

  t_i = max(|b_i|, |c_i|)                      if b_i c_i > 0
        max(min(|b_i|, |c_i|), |b_i + c_i|)    if b_i c_i < 0

For each i, 1 ≤ i ≤ n+m, let

  τ_i = |a_i|   if 1 ≤ i ≤ n
        t_i     if n+1 ≤ i ≤ n+m

Let π be a permutation of {1, 2, ..., n+m} such that τ_{π(1)} ≤ τ_{π(2)} ≤ ... ≤ τ_{π(n+m)}.
If
* τ_{π(1)} = 1, and
* for each j, 2 ≤ j ≤ n+m,

  τ_{π(j)} ≤ 1 + Σ_{k ∈ {h | 1 ≤ h ≤ j-1 and 1 ≤ π(h) ≤ n}} |a_{π(k)}| (N_{π(k)} - M_{π(k)})
               + Σ_{k ∈ {h | 1 ≤ h ≤ j-1 and n+1 ≤ π(h) ≤ n+m}} γ_{π(k)} (N_{π(k)} - M_{π(k)} - 1)

then

  min ≤ a_0 ≤ max
iff
  equation (8) has an integer solution satisfying the constraints in (9).
Corollary 1 states that if the coefficients of the dependence equation are small enough to satisfy its hypothesis, then the Banerjee-Wolfe test is an exact test, i.e., a necessary and sufficient condition for data dependence.
Testing the conditions in the hypothesis of Corollary 1 has a cost which is linear in the number of variables in the equation. Since the test values τ_i are integers, it takes linear time to sort them by applying the bin sort algorithm [1]. Once the test values have been sorted, it also takes linear time to evaluate the inequalities in the hypothesis. Hence, in case the conditions of Corollary 1 are satisfied, we are able to derive exact data dependence information in linear time.
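The hypothesis of Corollary 1 can be checked mechanically. The sketch below is our reading of the corollary over the (xs, yz) representation used in the earlier sketches; Python's built-in sorted() stands in for the bin sort, and the helper names are our assumptions.

def test_values(xs, yz):
    """Per-variable test values for Corollaries 1 and 2: returns a list of
    (tau, slack) records, where slack is the amount the variable contributes
    to the growing interval: |a|*(N-M) for an X term and gamma*(N-M-1) for a
    (Y, Z) pair."""
    recs = []
    for a, M, N in xs:
        recs.append((abs(a), abs(a) * (N - M)))
    for b, c, M, N in yz:
        if b * c > 0:
            gamma, t = abs(b) + abs(c), max(abs(b), abs(c))
        else:
            gamma = max(abs(b), abs(c))
            t = max(min(abs(b), abs(c)), abs(b + c))
        recs.append((t, gamma * (N - M - 1)))
    return recs

def corollary1_holds(xs, yz):
    """Hypothesis of Corollary 1: after sorting the test values tau in
    nondecreasing order, tau(1) must equal 1 and every tau must not exceed
    1 plus the accumulated slack of the smaller test values."""
    recs = sorted(test_values(xs, yz))   # bin sort in the paper; sorted() here
    if not recs or recs[0][0] != 1:
        return False
    reach = 1
    for tau, slack in recs:
        if tau > reach:
            return False
        reach += slack
    return True

# corollary1_holds([], [(1, -1, 1, 10), (-9, 9, 1, 10)]) -> True
# (this is the (<, >) case of the example that follows)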
Corollary 2
Consider equation (8) together with the constraints in (9). Let γ_i, t_i and τ_i be as in Corollary 1. Let π be a permutation of {1, 2, ..., n+m} such that τ_{π(1)} ≤ τ_{π(2)} ≤ ... ≤ τ_{π(n+m)}, and let

  d = gcd(a_1, ..., a_n, b_{n+1}, ..., b_{n+m}, c_{n+1}, ..., c_{n+m}).

If
* τ_{π(1)} = d, and
* for each j, 2 ≤ j ≤ n+m,

  τ_{π(j)} ≤ d + Σ_{k ∈ {h | 1 ≤ h ≤ j-1 and 1 ≤ π(h) ≤ n}} |a_{π(k)}| (N_{π(k)} - M_{π(k)})
               + Σ_{k ∈ {h | 1 ≤ h ≤ j-1 and n+1 ≤ π(h) ≤ n+m}} γ_{π(k)} (N_{π(k)} - M_{π(k)} - 1)

then

  d divides a_0 and min ≤ a_0 ≤ max
iff
  equation (8) has an integer solution satisfying the constraints in (9).

Corollary 2 states that if the coefficients of the dependence equation satisfy its hypothesis, then a combination of the GCD test and the Banerjee-Wolfe test is an exact test, i.e., a necessary and sufficient condition for data dependence. The application cost of the conditions in the hypothesis of Corollary 2 is, as in Corollary 1, linear in the number of variables in the equation.
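The corresponding check for Corollary 2 differs from the Corollary 1 sketch only in that the gcd d of the coefficients replaces 1; again a hedged sketch of our own, reusing test_values from above.

from math import gcd
from functools import reduce

def corollary2_holds(xs, yz):
    """Hypothesis of Corollary 2: as Corollary 1, with the smallest test
    value required to equal d = gcd of all coefficients, and d replacing 1
    in the accumulated bound."""
    coeffs = [a for a, _, _ in xs] + [t for b, c, _, _ in yz for t in (b, c)]
    d = reduce(gcd, (abs(t) for t in coeffs), 0)
    recs = sorted(test_values(xs, yz))
    if not recs or recs[0][0] != d:
        return False
    reach = d
    for tau, slack in recs:
        if tau > reach:
            return False
        reach += slack
    return True

# corollary2_holds([], [(-9, 9, 1, 10)]) -> True  (the (=, >) case below)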
Testing the conditions in Corollary 1, along with the Banerjee inequality, and, if required, the conditions in Corollary 2, along with the GCD test, enables a parallelizing compiler to obtain exact data dependence information in linear time and avoid the use of more expensive, potentially exponential tests [9, 16, 22, 28]. The following example illustrates the application of Corollaries 1 and 2.
Example
Consider the following loop:

  DO I = 1, 10
    DO J = 1, 10
      S1:  A(I + 9 * (J-1)) = ...
      S2:  ... = A(I + 9 * (J+1))
    ENDDO
  ENDDO

There is a data dependence between S1 and S2 iff the equation

  i_1 - i_2 + 9 j_1 - 9 j_2 = 18        (Ex-1)

has an integer solution subject to the following constraints:

  1 ≤ i_1, i_2 ≤ 10
  1 ≤ j_1, j_2 ≤ 10        (Ex-2)

Consider testing for data dependence from S1 to S2 with direction vector (<, >). This direction vector introduces the additional constraints:

  i_1 < i_2 and j_1 > j_2        (Ex-3)

The extreme values assumed by the expression on the left hand side of (Ex-1), within the region specified by the constraints in (Ex-2) and (Ex-3), as computed by the Banerjee-Wolfe test, are min = 0 and max = 80. Since min ≤ 18 ≤ max, the Banerjee-Wolfe test indicates that there may be a data dependence from S1 to S2 with direction vector (<, >).

Consider the application of Corollary 1 to equation (Ex-1) subject to constraints (Ex-2) and (Ex-3). We first compute the test values

  τ_1 = max(min(|1|, |-1|), |1 - 1|) = 1
  τ_2 = max(min(|9|, |-9|), |9 - 9|) = 9
Since

  τ_1 = 1  and
  τ_2 ≤ 1 + max(|1|, |-1|) (10 - 1 - 1) = 9

the hypothesis of Corollary 1 is satisfied and, therefore, equation (Ex-1) has an integer solution satisfying the constraints in (Ex-2) and (Ex-3). Hence, there is definitely a data dependence from S1 to S2 with direction vector (<, >).

The reason why equation (Ex-1) has an integer solution satisfying the constraints in (Ex-2) and (Ex-3) is that the expression on the left hand side of equation (Ex-1) assumes every integer value between its extreme values 0 and 80, when the variables are subject to the constraints in (Ex-2) and (Ex-3) [19, 20]. Therefore, regardless of the value a_0 on the right hand side of equation (Ex-1), if 0 ≤ a_0 ≤ 80, then (Ex-1) has an integer solution satisfying the constraints in (Ex-2) and (Ex-3).
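This claim is easy to confirm by brute force over the small iteration domain of the example (a check we add for illustration; it is not part of the paper):

# The left hand side of (Ex-1) takes every integer value from 0 to 80
# under the constraints (Ex-2) and (Ex-3).
values = {i1 - i2 + 9 * j1 - 9 * j2
          for i1 in range(1, 11) for i2 in range(1, 11)
          for j1 in range(1, 11) for j2 in range(1, 11)
          if i1 < i2 and j1 > j2}
assert values == set(range(0, 81))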
Now consider testing for data dependence from S2 to S1 with direction vector (=, <). This is equivalent to testing for data dependence from S1 to S2 with the implausible direction vector (=, >) [8]. This direction vector introduces the additional constraints:

  i_1 = i_2 and j_1 > j_2        (Ex-4)

By substituting the constraint i_1 = i_2 into equation (Ex-1) we derive the following simplified equation:

  9 j_1 - 9 j_2 = 18        (Ex-5)

The extreme values assumed by the expression on the left hand side of (Ex-5), within the region specified by the constraints in (Ex-2) and (Ex-4), as computed by the Banerjee-Wolfe test, are:

  min = 9 × 2 - 9 × 1 = 9
  max = 9 × 10 - 9 × 1 = 81

Since min ≤ 18 ≤ max, the Banerjee-Wolfe test indicates that there may be a data dependence from S1 to S2 with direction vector (=, >).

The hypothesis of Corollary 1 is not satisfied for equation (Ex-5) subject to constraints (Ex-2) and (Ex-4), since

  τ_1 = max(min(|9|, |-9|), |9 - 9|) = 9 > 1

Therefore, we can not yet conclusively determine if there is a data dependence from S1 to S2 with direction vector (=, >).

Since d = gcd(9, -9) = 9 and 9 divides 18, the GCD test also indicates that there may be a data dependence from S1 to S2 with direction vector (=, >).

But since τ_1 = d, the hypothesis of Corollary 2 is satisfied and, therefore, equation (Ex-5) has an integer solution satisfying the constraints in (Ex-2) and (Ex-4). Hence, there is definitely a data dependence from S1 to S2 with direction vector (=, >), i.e., a data dependence from S2 to S1 with direction vector (=, <).

The following algorithm demonstrates the order of application of Corollaries 1 and 2 to derive exact data dependence information.

Algorithm 1
  If not (min ≤ a_0 ≤ max) then
    No Dependence
  else if the conditions of Corollary 1 are true then
    Yes Dependence
  else if gcd does not divide a_0 then
    No Dependence
  else if the conditions of Corollary 2 are true then
    Yes Dependence
  else
    Maybe Dependence.

The algorithm suggests the application of Corollary 1 before the GCD test. If the conditions in Corollary 1 are satisfied, then there is definitely a data dependence and, therefore, the application of the GCD test is redundant. In that case, the cost of applying Corollary 1 is offset by not applying the GCD test. As we will see in the following section, the conditions in Corollary 1 are very frequently satisfied, justifying the order of application of these tests.
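Pulling the earlier sketches together, Algorithm 1 reads as the following driver. This is our composition of the hedged helpers defined above, not the implementation inside Tiny.

def algorithm1(a0, xs, yz):
    """Order of application suggested by Algorithm 1: Banerjee bounds, then
    Corollary 1, then the GCD test, then Corollary 2; 'maybe' only if none
    of them settles the question."""
    lo, hi = banerjee_bounds(xs, yz)
    if not (lo <= a0 <= hi):
        return 'no dependence'
    if corollary1_holds(xs, yz):
        return 'yes dependence'
    if gcd_test(a0, xs, yz) == 'no':
        return 'no dependence'
    if corollary2_holds(xs, yz):
        return 'yes dependence'
    return 'maybe dependence'

# The two direction vectors examined in the example:
# (<, >): both index pairs present  ->  resolved by Corollary 1
print(algorithm1(18, [], [(1, -1, 1, 10), (-9, 9, 1, 10)]))   # yes dependence
# (=, >): the i-term vanishes       ->  resolved by Corollary 2
print(algorithm1(18, [], [(-9, 9, 1, 10)]))                   # yes dependence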
4. Empirical Results

In this section we present empirical results on how often our conditions of Corollaries 1 and 2 guarantee exact answers in practice. For the experimental evaluation of our work we used the Perfect benchmarks [7], a representative collection of programs executed on parallel computers.

We have implemented the conditions in Corollaries 1 and 2 in the dependence analyzer of the Tiny program restructuring research tool [27]. The original version of Tiny was developed by Michael Wolfe and has been extended with additional features at the University of Maryland. Both the GCD test and the Banerjee inequality are implemented in Tiny. The Fortran 77 versions of the Perfect benchmarks were preprocessed and converted into Tiny syntax using flt, the Fortran 77 to Tiny converter. Intraprocedural constant propagation and dead code elimination were carried out before the application of the data dependence tests. Induction variable recognition was also performed, before applying the tests, to remove induction variables and convert array subscripts containing induction variables into functions of the loop index variables. No interprocedural or symbolic analysis [11, 13] was performed.
We tested only for potential dependences caused by array references. Furthermore, only dependences within the same loop nest were considered. Subscript pairs were not tested if they could not be expressed as linear functions of the loop indices.

If both subscripts in a pair of array references tested for data dependence are loop invariant, then the existence of dependence can be determined by simply comparing their values. This is known as the constant or ZIV test [10]. We do not report any results for these trivial cases. Results about the application frequency and independence rate of the constant test are reported in [10, 16, 17, 23].
We consider the case of statically unknown loop limits. In some programs a number of loop limits can be statically unknown, even after applying intraprocedural constant propagation and induction variable recognition and elimination. An experiment in [17] shows that overall 12% of the lower loop limits and 71% of the upper loop limits are unknown in the Perfect benchmarks. In an interactive system such as ParaScope [12] or PAT [24], information about statically unknown loop limits can be provided by the user. If such information can not be made available at compile time, the Banerjee inequality can still be applied by making the following conservative assumption. Whenever a lower loop limit is unknown we assume -∞ as its value, and whenever an upper loop limit is unknown we assume +∞ as its value.
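In terms of the earlier bounds sketch, this conservative assumption amounts to substituting infinite limits for the unknown ones before computing min and max. A small illustrative helper; the name and the None-marks-unknown convention are our assumptions.

import math

def with_unknown_limits(bounds):
    """Conservative substitution applied before the Banerjee inequality when
    a loop limit is not known at compile time: -inf for an unknown lower
    limit and +inf for an unknown upper limit (None marks 'unknown' here)."""
    return [(-math.inf if L is None else L,
             math.inf if U is None else U)
            for L, U in bounds]

# DO I = 1, N with N unknown, DO J = M, 10 with M unknown:
print(with_unknown_limits([(1, None), (None, 10)]))   # [(1, inf), (-inf, 10)]
# Feeding such bounds to a computation like banerjee_bounds above requires
# guarding any product of a zero coefficient with an infinite limit.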
Three experiments were performed on the Perfect benchmarks to study the effectiveness of our conditions for exactness on both unknown and known loop limits. In the first experiment (Table 1), unknown lower and upper loop limits were assumed to be -∞ and +∞ respectively. In the second experiment (Table 2), any unknown lower loop limit that was not a linear function of the loop indices in the enclosing loops was replaced by 1. Similarly, any unknown upper loop limit that was not a linear function of the loop indices in the enclosing loops was replaced by 40. The choice of those numbers was made to maintain consistency with earlier experiments [17, 23]. In the third experiment (Table 3) such unknown lower and upper loop limits were replaced by 1 and 5 respectively. One can observe that the hypothesis of Corollaries 1 and 2 is less likely to be satisfied when the loop iteration domains are small. The choice of 1 and 5 as the loop limits in this case was made to demonstrate the effectiveness of the conditions in Corollaries 1 and 2 even in very small iteration domains.
The dependence tests were applied dimension by dimension to all proper [29] direction vectors for a pair of array references. Direction vector pruning [16] was applied to prune away all unused variables. If an independence was proved in a given dimension, the tests were not applied in further dimensions. In each dimension the dependence tests were carried out in the order that Algorithm 1 suggests. First the Banerjee inequality was applied. If the Banerjee inequality was not satisfied, then the counter for Banerjee-No was incremented; otherwise the counter for Banerjee-Maybe was incremented. In the latter case Corollary 1 was applied. If the conditions in Corollary 1 were true, then the counter for Corollary 1-Yes was incremented; otherwise we continued by applying the GCD test. If the gcd did not divide the right hand side of the dependence equation, then the counter for GCD-No was incremented; otherwise the counter for GCD-Maybe was incremented. In the latter case Corollary 2 was applied. If the conditions in Corollary 2 were true, then the counter for Corollary 2-Yes was incremented; otherwise we can not conclusively resolve a dependence and an answer of maybe should be reported.
To measure the success of our conditions we introduce the following definitions.

Success Rate 1 indicates the success rate of Corollary 1, which is the percentage of the cases a Banerjee inequality maybe is converted into yes by applying Corollary 1. In our experiment it is computed as the ratio:

  Success Rate 1 = Corollary 1-Yes / Banerjee-Maybe.

Success Rate 2 indicates the combined success rate of Corollaries 1 and 2, which is the percentage of the cases a Banerjee inequality and GCD test maybe is converted into yes by applying either Corollary 1 or Corollary 2. In our experiment it is computed as the ratio:

  Success Rate 2 = (Corollary 1-Yes + Corollary 2-Yes) / (Banerjee-Maybe - GCD-No).
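In code, the bookkeeping of this section amounts to the following sketch, our reconstruction of the counting protocol using the helpers defined earlier; the counter names mirror the table columns, and the real measurements were of course taken inside Tiny's dependence analyzer.

from collections import Counter

def classify(a0, xs, yz, counts):
    """Per-equation bookkeeping: walk the tests in the order of Algorithm 1
    and bump the counter of the first test that fires."""
    lo, hi = banerjee_bounds(xs, yz)
    if not (lo <= a0 <= hi):
        counts['Banerjee-No'] += 1
        return
    counts['Banerjee-Maybe'] += 1
    if corollary1_holds(xs, yz):
        counts['Corollary 1-Yes'] += 1
        return
    if gcd_test(a0, xs, yz) == 'no':
        counts['GCD-No'] += 1
        return
    counts['GCD-Maybe'] += 1
    if corollary2_holds(xs, yz):
        counts['Corollary 2-Yes'] += 1
    # otherwise the answer remains a conservative 'maybe'

def success_rates(c):
    rate1 = c['Corollary 1-Yes'] / c['Banerjee-Maybe']
    rate2 = ((c['Corollary 1-Yes'] + c['Corollary 2-Yes'])
             / (c['Banerjee-Maybe'] - c['GCD-No']))
    return rate1, rate2

# counts = Counter(); classify(18, [], [(1, -1, 1, 10), (-9, 9, 1, 10)], counts)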
Several important observations can be derived from our experiments. First, as we can see from Tables 1-3, the success rate of Corollary 1 is 100% in all the benchmarks but one (AP), and in that one the combined success rate of Corollaries 1 and 2 is 100%. This demonstrates that the Banerjee inequality extended with the conditions in Corollary 1, or at least a combination of the Banerjee inequality and the GCD test extended with the conditions in Corollary 2, are always exact in practice. This obviates the need for more expensive, potentially exponential tests [9, 16, 22, 28] or special case approaches [10, 16].

Our experiments also indicate that the GCD test does not have to be applied at all in most cases. Only in the AP benchmark were a few of the cases resolved by the application of the GCD test and the conditions in Corollary 2. We can also see that the results of Tables 1 and 2 are almost identical. Only in the AP and SR benchmarks were a few additional independences discovered when the unknown lower and upper loop limits were replaced by 1 and 40 respectively. Comparing the results of Tables 2 and 3, though, we see that when we replaced the upper loop limit by 5, more independences were discovered in four benchmarks (AP, SR, LW, SM). In LW the number of independences increased from 31.8% to 41.0%. This indicates that even though the Banerjee inequality can be successfully performed without the need for complete information about the loop limits, in certain extreme cases this information may help discover additional independences. In all three experiments, though, the success rates of the Corollaries remain the same.
5. Related Work
A number of empirical studies have been performed evaluating the performance of different data dependence tests. They are all based, though, on the notion that the Banerjee-Wolfe test and the GCD test are approximate tests and they can not conclusively determine a data dependence. Our empirical study focuses on how often the conditions for exactness developed for the Banerjee-Wolfe test and a combination of the GCD and Banerjee-Wolfe tests occur in practice, i.e., how often our work will guarantee exact answers in practice.
Shen et al. [23], in a preliminary empirical study, present information about the usage frequency and independence detection rates of various dependence tests, including the constant test, the GCD test and different implementations of the Banerjee inequality. Their empirical study was performed on a number of Fortran numerical packages.
Petersen and Padua [17] perform an experimental evaluation on the Perfect benchmarks of a proposed sequence of dependence tests. This sequence consists of the constant test, the generalized GCD test, Banerjee's inequalities, integer programming and the Omega test [22]. In case of a Banerjee inequality maybe answer, based on the perception that the Banerjee inequality can not prove dependence, exponential tests such as integer programming and the Omega test were applied to elicit a yes or no answer. Their results also show that the Banerjee inequality is sufficiently accurate in practice and most of the applicability of the integer programming and Omega tests is in proving a dependence that would otherwise be assumed dependent. Here we have shown that this can be done in linear time by applying the conditions in Corollaries 1 and 2 rather than exponential dependence tests.
Maydan et al. [16] propose a different sequence of exact dependence tests for special case inputs and show that they derive an exact answer in all cases in the Perfect benchmarks. Their sequence includes Fourier-Motzkin variable elimination, an exponential method, as a backup test. It has been shown in their experiments that Fourier-Motzkin has to be applied in a number of cases. Triolet [25] found that using Fourier-Motzkin variable elimination takes from 22 to 28 times longer than conventional dependence testing.
Another approach taken is based on the fact that most array references in scientific programs are fairly simple. Goff et al. [10] propose a dependence testing scheme based on classifying pairs of subscripts. Efficient and exact tests are presented for certain classes of commonly occurring array references involving single index variables (SIV). Our results in Corollaries 1 and 2 are more general than their special SIV cases. In case of multiple index variable (MIV) subscripts their techniques, in fact, rely on the GCD and Banerjee-Wolfe tests.

As one can see from the above discussion, these approaches fall into two categories: first, developing exponential tests for solving more general instances of the problem and, second, applying sequences of different dependence tests, each one exact for special instances of the problem. In this work we in fact demonstrate that our extensions to the GCD and Banerjee-Wolfe tests are sufficiently accurate as well as efficient in practice and, therefore, there is no need for exponential or special case approaches.
6. Conclusions
Data Dependence Analysis is the foundation of any parallelizing compiler. The GCD test and the Banerjee-Wolfe test are the two tests traditionally used in parallelizing compilers to determine whether a pair of array references causes a data dependence. These tests are necessary but not sufficient conditions for data dependence. In our previous work we extended the Banerjee-Wolfe test and a combination of the GCD and Banerjee-Wolfe tests with a set of conditions (Corollaries 1 and 2) to become exact tests, i.e., necessary and sufficient conditions for data dependence. These conditions can be tested in linear time along with the Banerjee inequality and the GCD test to derive exact data dependence information.

In this paper we performed an empirical study on the Perfect benchmarks to find out how often our conditions guarantee exact answers in practice. The empirical results indicated that our conditions are always satisfied in practice. Therefore, the Banerjee-Wolfe test extended with the conditions in Corollary 1, or at least a combination of the GCD and Banerjee-Wolfe tests extended with the conditions in Corollary 2, are always exact tests in practice.

In light of the perception that the GCD and Banerjee-Wolfe tests were approximate methods, possibly inaccurate in practice, a number of exponential dependence tests have been proposed in the literature. Our empirical study has shown that exact answers can be derived in linear time in practice. The only cases where more expensive, potentially exponential tests might be helpful is in the presence of multidimensional coupled subscripts. The number of coupled subscripts in the Perfect benchmarks has been shown, though, to be only about 4% of the total number of subscript pairs [10]. In case of coupled subscripts a system of equations has to be tested for a simultaneous constrained integer solution, and the subscript-by-subscript Banerjee-Wolfe and GCD tests may introduce approximations. In those cases polynomial time techniques such as constraint propagation [10], wherever applicable, or the lambda test [15] can also be applied to reduce the problem to testing single equations for constrained integer solutions. Future work will address the issue of simultaneous constrained integer solutions.
References
[1] A. Aho, J. Hopcroft, and J. Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.
[2] F. Allen, M. Burke, P. Charles, R. Cytron, and J. Ferrante. An Overview of the PTRAN Analysis System for Multiprocessing. In Proceedings of the 1987 International Conference on Supercomputing, Athens, Greece, June 1987.

[3] J. R. Allen. Dependence Analysis for Subscripted Variables and its Application to Program Transformations. Ph.D. Thesis, Dept. of Computer Science, Rice University, April 1983.

[4] J. R. Allen and K. Kennedy. PFC: A Program to Convert Fortran to Parallel Form. Supercomputers: Design and Applications, IEEE Computer Society Press, Silver Spring, MD, 1984.

[5] J. R. Allen and K. Kennedy. Automatic Translation of Fortran Programs to Vector Form. ACM Transactions on Programming Languages and Systems, Vol. 9, No. 4, October 1987.

[6] U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Norwell, MA, 1988.

[7] M. Berry, et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers. The International Journal of Supercomputer Applications, Vol. 3, 1989.

[8] M. Burke and R. Cytron. Interprocedural Dependence Analysis and Parallelization. In Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, Palo Alto, CA, June 1986.

[9] C. Eisenbeis and J.-C. Sogno. A General Algorithm for Data Dependence Analysis. In Proceedings of the Sixth ACM International Conference on Supercomputing, Washington, D.C., July 1992.

[10] G. Goff, K. Kennedy, and C.-W. Tseng. Practical Dependence Testing. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991.

[11] M. Haghighat and C. Polychronopoulos. Symbolic Dependence Analysis for High-Performance Parallelizing Compilers. In Proceedings of the Third Annual Workshop on Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.

[12] K. Kennedy, K. McKinley, and C.-W. Tseng. Interactive Parallel Programming Using the ParaScope Editor. IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 3, July 1991.

[13] A. Lichnewsky and F. Thomasset. Introducing Symbolic Problem Solving Techniques in the Dependence Testing Phases. In Proceedings of the Second ACM International Conference on Supercomputing, Saint-Malo, France, July 1988.

[14] Z. Li and P. Yew. Some Results on Exact Data Dependence Analysis. In Proceedings of the 2nd Workshop on Languages and Compilers for Parallel Computing, Urbana, Illinois, August 1989.

[15] Z. Li, P. Yew, and C. Zhu. An Efficient Data Dependence Analysis for Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems, Vol. 1, No. 1, January 1990.

[16] D. Maydan, J. Hennessy, and M. Lam. Efficient and Exact Data Dependence Analysis for Parallelizing Compilers. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991.

[17] P. Petersen and D. Padua. Static and Dynamic Evaluation of Data Dependence Analysis. In Proceedings of the Seventh ACM International Conference on Supercomputing, Tokyo, Japan, July 1993.

[18] C. Polychronopoulos, et al. Parafrase-2: An Environment for Parallelizing, Partitioning, Synchronizing and Scheduling Programs on Multiprocessors. International Journal of High Speed Computing, Vol. 1, No. 1, May 1989.

[19] K. Psarris. On Exact Data Dependence Analysis. In Proceedings of the Sixth ACM International Conference on Supercomputing, Washington, D.C., July 1992.

[20] K. Psarris, D. Klappholz, and X. Kong. On the Accuracy of the Banerjee Test. Journal of Parallel and Distributed Computing, Special Issue on Shared Memory Multiprocessors, Vol. 12, No. 2, June 1991.

[21] K. Psarris, X. Kong, and D. Klappholz. The Direction Vector I Test. IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 11, November 1993.

[22] W. Pugh. A Practical Algorithm for Exact Array Dependence Analysis. Communications of the ACM, Vol. 35, No. 8, August 1992.

[23] Z. Shen, Z. Li, and P. Yew. An Empirical Study of Fortran Programs for Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems, Vol. 1, No. 3, July 1990.

[24] K. Smith and W. Appelbe. PAT -- An Interactive Fortran Parallelizing Assistant Tool. In Proceedings of the 1988 International Conference on Parallel Processing, Saint-Charles, IL, August 1988.

[25] R. Triolet. Interprocedural Analysis for Program Restructuring with Parafrase. CSRD Report No. 538, Department of Computer Science, University of Illinois at Urbana-Champaign, December 1985.

[26] M. Wolfe. Optimizing Supercompilers for Supercomputers. Pitman, London and The MIT Press, Cambridge, MA, 1989.

[27] M. Wolfe. The Tiny Loop Restructuring Research Tool. In Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, IL, August 1991.

[28] M. Wolfe and C.-W. Tseng. The Power Test for Data Dependence. IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 5, September 1992.

[29] H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, New York, NY, 1991.
Table 1: Dependence Results on Perfect Benchmarks.
[Per-benchmark table; columns: Program, Instances, Banerjee-No, Banerjee-Maybe, Corollary 1-Yes, Success Rate 1, GCD-No, GCD-Maybe, Corollary 2-Yes, Success Rate 2. Success Rate 2 is 100% for every benchmark; the remaining entries are not recoverable from this copy.]

Table 2: Dependence Results on Perfect Benchmarks (Unknown Upper Loop Limits assumed 40).
Results on NA, SD, TF, LW, SM, OC, LG, WS, MT, TI are identical to Table 1.
[Remaining entries are not recoverable from this copy.]

Table 3: Dependence Results on Perfect Benchmarks (Unknown Upper Loop Limits assumed 5).
Results on NA, SD, TF, OC, LG, WS, MT, TI are identical to Table 1.
[Remaining entries are not recoverable from this copy.]