Proceedings of the 28th Annual Hawaii International Conference on System Sciences - 1995

Classical Dependence Analysis Techniques: Sufficiently Accurate in Practice

Kleanthis Psarris
Department of Computer Science
The University of Texas at San Antonio
San Antonio, TX 78249
psarris@ringer.cs.utsa.edu

Santosh Pande
Department of Computer Science
Ohio University
Athens, OH 45701
pande@ace.cs.ohiou.edu

Abstract

Data Dependence Analysis is the foundation of any parallelizing compiler. The GCD test and the Banerjee-Wolfe test are the two tests traditionally used to determine statement data dependence in automatic vectorization/parallelization of loops. These tests are approximate in the sense that they are necessary but not sufficient conditions for data dependence. In an earlier work we extended the Banerjee-Wolfe test and a combination of the GCD and Banerjee-Wolfe tests with a set of conditions to derive exact data dependence information. In this paper we perform an empirical study on the Perfect benchmarks to demonstrate the effectiveness and practical importance of our conditions. We show that the Banerjee-Wolfe test extended with our conditions becomes an exact test for data dependence in actual practice.

1. Introduction

In automatic parallelization of sequential programs, parallelizing compilers [2, 4, 12, 18, 24, 27] perform subscript analysis [3, 5, 6, 26, 29] to detect data dependences between pairs of array references inside loop nests. The data dependence problem is equivalent to the integer programming problem, a well-known NP-hard problem, and therefore cannot be solved efficiently in general. A number of subscript analysis tests have been proposed in the literature [6, 9, 10, 15, 16, 21, 22, 28], and each makes a different tradeoff between accuracy and efficiency. The most widely used approximate subscript analysis tests are the GCD test and the Banerjee-Wolfe test [6, 26, 29]. Their major advantage is their simplicity and their low computational cost, which is linear in the number of variables in the dependence equation. This is the primary reason that most parallelizing compilers have adopted them. However, both the GCD test and the Banerjee-Wolfe test are necessary but not sufficient conditions for data dependence. When independence cannot be proved, both tests approximate on the conservative side by assuming dependence, so that their use never results in unsafe parallelization.

In an earlier work [19, 20] we formally studied the accuracy of the Banerjee-Wolfe and GCD tests. We derived a set of conditions that can be tested along with the Banerjee inequality and the GCD test to prove data dependence. The cost of testing these conditions is proven to be linear in the number of variables in the dependence equation. In this paper we perform an empirical study on the Perfect benchmarks [7] to evaluate our formal results and to demonstrate the effectiveness and practical importance of our extensions to the Banerjee-Wolfe and GCD tests. We show that, on these benchmarks, our extensions are always accurate in practice. Our empirical study indicates that the Banerjee inequality extended with our accuracy conditions becomes an exact test, i.e., a necessary as well as sufficient condition for data dependence.

In Section 2 we discuss Data Dependence Analysis and review the GCD and Banerjee-Wolfe tests. In Section 3 we present our extensions to the Banerjee-Wolfe test and to a combination of the GCD and Banerjee-Wolfe tests. In Section 4 we demonstrate the effectiveness of our conditions in actual practice by performing an empirical study on the Perfect benchmarks. In Section 5 we discuss related work and compare our results with other methods. Finally, in Section 6 we present our conclusions.

2. Data Dependence Analysis

Consider two statements, S1 and S2 (Figure 1), containing potentially conflicting references to a p-dimensional array A.
We assume that the subscript expressions are linear functions of the loop iteration variables, i.e.,

* for \(1 \le v \le p\), \(f_v(I_1, I_2, \ldots, I_r)\) is a function of the form \(\alpha_{v,0} + \alpha_{v,1} I_1 + \alpha_{v,2} I_2 + \cdots + \alpha_{v,r} I_r\), where each of \(\alpha_{v,0}, \alpha_{v,1}, \ldots, \alpha_{v,r}\) is an integer;
* for \(1 \le v \le p\), \(g_v(I_1, I_2, \ldots, I_r)\) is a function of the form \(\beta_{v,0} + \beta_{v,1} I_1 + \beta_{v,2} I_2 + \cdots + \beta_{v,r} I_r\), where each of \(\beta_{v,0}, \beta_{v,1}, \ldots, \beta_{v,r}\) is an integer.

1060-3425/95 $4.00 © 1995 IEEE
Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS'95) 1060-3425/95 $10.00 © 1995 IEEE

    DO I1 = L1, U1
      DO I2 = L2, U2
        ...
          DO Ir = Lr, Ur
    S1:     A(f1(I1, I2, ..., Ir), ..., fp(I1, I2, ..., Ir)) = ...
    S2:     ... = A(g1(I1, I2, ..., Ir), ..., gp(I1, I2, ..., Ir))
          END
      END
    END

    Figure 1. A Nest of Loops

Each iteration of a nest of loops is identified by an iteration vector whose elements are the values of the iteration variables for that iteration. A statement embedded in a nest of loops may be executed once for each iteration of the nest. Each potential execution of a statement is termed an instance of the statement. An instance of a statement is represented by the statement together with an iteration vector. For example, the instance of statement S1 during iteration \(\mathbf{i} = (i_1, i_2, \ldots, i_r)\) is denoted by \(S_1(\mathbf{i})\) or \(S_1(i_1, i_2, \ldots, i_r)\), and the instance of statement S2 during iteration \(\mathbf{j} = (j_1, j_2, \ldots, j_r)\) is denoted by \(S_2(\mathbf{j})\) or \(S_2(j_1, j_2, \ldots, j_r)\).

Definition 1. Let S1 and S2 be two statements as in Figure 1. Let \(\mathbf{i} = (i_1, i_2, \ldots, i_r)\) and \(\mathbf{j} = (j_1, j_2, \ldots, j_r)\), where \(L_k \le i_k, j_k \le U_k\) for \(1 \le k \le r\). If \((i_1, i_2, \ldots, i_r) \le (j_1, j_2, \ldots, j_r)\) in lexicographic order, then \(S_1(\mathbf{i})\) is said to precede \(S_2(\mathbf{j})\), denoted \(S_1(\mathbf{i}) < S_2(\mathbf{j})\). If \((j_1, j_2, \ldots, j_r) < (i_1, i_2, \ldots, i_r)\), then \(S_2(\mathbf{j})\) is said to precede \(S_1(\mathbf{i})\), denoted \(S_2(\mathbf{j}) < S_1(\mathbf{i})\). Of course, \(S_1(\mathbf{i}) < S_2(\mathbf{j})\) means that, in the sequential execution of the loop of Figure 1, if both \(S_1(\mathbf{i})\) and \(S_2(\mathbf{j})\) execute, then \(S_1(\mathbf{i})\) executes before \(S_2(\mathbf{j})\). Likewise, \(S_2(\mathbf{j}) < S_1(\mathbf{i})\) means that, if both execute, then \(S_2(\mathbf{j})\) executes before \(S_1(\mathbf{i})\). ∎

Definition 2. Let \(S_1(\mathbf{i})\) and \(S_2(\mathbf{j})\) be as in Definition 1. If

\[
f_1(i_1, \ldots, i_r) = g_1(j_1, \ldots, j_r), \quad f_2(i_1, \ldots, i_r) = g_2(j_1, \ldots, j_r), \quad \ldots, \quad f_p(i_1, \ldots, i_r) = g_p(j_1, \ldots, j_r),
\]

then

* if \(S_1(\mathbf{i}) < S_2(\mathbf{j})\), there is said to be a data dependence from \(S_1(\mathbf{i})\) to \(S_2(\mathbf{j})\);
* if \(S_2(\mathbf{j}) < S_1(\mathbf{i})\), there is said to be a data dependence from \(S_2(\mathbf{j})\) to \(S_1(\mathbf{i})\). ∎

Full data dependence information between S1 and S2 consists of two sets. The first is the set of all ordered pairs \((S_1(\mathbf{i}), S_2(\mathbf{j}))\) such that there is a dependence from \(S_1(\mathbf{i})\) to \(S_2(\mathbf{j})\). The second is the set of all ordered pairs \((S_2(\mathbf{j}), S_1(\mathbf{i}))\) such that there is a dependence from \(S_2(\mathbf{j})\) to \(S_1(\mathbf{i})\). Because the number of such pairs is often very large, we define the notion of dependence with a direction vector [26], and often compute only the set of direction vectors of data dependences.

Definition 3. Let \(S_1(\mathbf{i})\) and \(S_2(\mathbf{j})\) be as in Definition 1. A vector of the form \((e_1, e_2, \ldots, e_r)\), where \(e_k \in \{<, =, >, *\}\) for \(1 \le k \le r\), is termed a direction vector. Suppose there is a data dependence from \(S_1(\mathbf{i})\) to \(S_2(\mathbf{j})\), and \(i_k \; e_k \; j_k\) for \(1 \le k \le r\), i.e., the relation \(e_k\) holds between \(i_k\) and \(j_k\). Then there is said to be a data dependence from S1 to S2 with direction vector \((e_1, e_2, \ldots, e_r)\). Similarly, suppose there is a data dependence from \(S_2(\mathbf{j})\) to \(S_1(\mathbf{i})\), and \(j_k \; e_k \; i_k\) for \(1 \le k \le r\). Then there is said to be a data dependence from S2 to S1 with direction vector \((e_1, e_2, \ldots, e_r)\). The symbol '*' stands for an arbitrary relationship between \(i_k\) and \(j_k\), i.e., it indicates a don't-care situation. ∎
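Definitions 1 and 3 are easy to state mechanically. The following is a minimal Python sketch, assuming iteration vectors are given as integer tuples of equal length. The names `precedes`, `direction_vector` and `matches` are hypothetical helpers, not part of the paper. The sketch only classifies a given pair of iterations. Deciding whether any pair with equal subscripts exists (Definition 2) is the problem taken up below.

```python
from typing import Sequence, Tuple


def precedes(i: Sequence[int], j: Sequence[int]) -> bool:
    """Definition 1: S1(i) precedes S2(j) iff i <= j lexicographically.

    Ties go to S1 because S1 appears before S2 in the loop body.
    Python compares tuples lexicographically, which is exactly the order needed.
    """
    return tuple(i) <= tuple(j)


def direction_vector(i: Sequence[int], j: Sequence[int]) -> Tuple[str, ...]:
    """The relation e_k that holds between i_k and j_k at each loop level k."""
    return tuple("<" if a < b else "=" if a == b else ">" for a, b in zip(i, j))


def matches(dv: Sequence[str], pattern: Sequence[str]) -> bool:
    """Definition 3: '*' is a don't-care; every other entry must agree exactly."""
    return all(p == "*" or p == d for d, p in zip(dv, pattern))


# S1 executes at i = (2, 5) and S2 at j = (3, 4) in a 2-deep nest.
i, j = (2, 5), (3, 4)
print(precedes(i, j))                                  # True
print(direction_vector(i, j))                          # ('<', '>')
print(matches(direction_vector(i, j), ("<", "*")))     # True
```

If the subscripts of the two references were equal at these iterations, this pair would witness a dependence from S1 to S2 with direction vector (<, >), and hence also with (<, *).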
We see from Definitions 1-3 that a data dependence exists from S1 to S2 with direction vector \((e_1, e_2, \ldots, e_r)\) if and only if there exist iteration vectors \(\mathbf{i}\) and \(\mathbf{j}\) such that the following system of linear equations

\[
\begin{aligned}
f_1(i_1, i_2, \ldots, i_r) &= g_1(j_1, j_2, \ldots, j_r)\\
f_2(i_1, i_2, \ldots, i_r) &= g_2(j_1, j_2, \ldots, j_r)\\
&\;\;\vdots\\
f_p(i_1, i_2, \ldots, i_r) &= g_p(j_1, j_2, \ldots, j_r)
\end{aligned}
\tag{1}
\]

has a simultaneous integer solution subject to the loop limits on the values of the variables

\[
L_k \le i_k, j_k \le U_k, \quad 1 \le k \le r,
\tag{2}
\]

and subject to the dependence directions

\[
\begin{aligned}
i_k &< j_k && \text{for } 1 \le k \le r \text{ such that } e_k = \text{“<”},\\
i_k &= j_k && \text{for } 1 \le k \le r \text{ such that } e_k = \text{“=”},\\
i_k &> j_k && \text{for } 1 \le k \le r \text{ such that } e_k = \text{“>”}.
\end{aligned}
\tag{3}
\]

This problem is equivalent to the integer programming problem and, therefore, cannot be solved efficiently in general. One approach, known as subscript-by-subscript testing [6, 26, 29], tests one equation at a time. It assumes that there is a data dependence unless at least one of the equations can be shown not to have a constrained integer solution. This method can introduce a conservative approximation only when we are testing multidimensional arrays with coupled subscripts. We say that two different subscript pairs in a multidimensional array reference are coupled [14, 15] if they contain the same loop index variable. For coupled subscripts, techniques such as constraint propagation [10], wherever applicable, or the lambda test [15] can be applied to reduce the problem to testing single equations for constrained integer solutions.

Consider, therefore, one equation of the form:

\[
(\alpha_1 i_1 - \beta_1 j_1) + \cdots + (\alpha_r i_r - \beta_r j_r) = \beta_0 - \alpha_0
\tag{4}
\]

We proceed by simplifying the notation.

Definition 4. Consider equation (4) together with direction vector \((e_1, e_2, \ldots, e_r)\). The sets \(V^{=}\), \(V^{<}\), \(V^{>}\), and \(V^{*}\) are defined as follows:

\[
\begin{aligned}
V^{=} &= \{u \mid 1 \le u \le r \text{ and } e_u = \text{“=”}\}, &
V^{<} &= \{u \mid 1 \le u \le r \text{ and } e_u = \text{“<”}\},\\
V^{>} &= \{u \mid 1 \le u \le r \text{ and } e_u = \text{“>”}\}, &
V^{*} &= \{u \mid 1 \le u \le r \text{ and } e_u = \text{“*”}\}.
\end{aligned}
\]

It is clear that the sets \(V^{=}\), \(V^{<}\), \(V^{>}\), and \(V^{*}\) form a partition of the set of integers \(\{1, 2, \ldots, r\}\). ∎

Using the notation defined in Definition 4, we can rewrite equation (4) as

\[
\sum_{u \in V^{=}} (\alpha_u - \beta_u)\, i_u + \sum_{u \in V^{<}} (\alpha_u i_u - \beta_u j_u) + \sum_{u \in V^{>}} (\alpha_u i_u - \beta_u j_u) + \sum_{u \in V^{*}} (\alpha_u i_u - \beta_u j_u) = \beta_0 - \alpha_0
\tag{5}
\]

and the constraints in (2) and (3) as

\[
L_u \le i_u \le U_u \ \text{ for } u \in V^{=}, \qquad L_u \le i_u, j_u \le U_u \ \text{ for } u \in V^{<} \cup V^{>} \cup V^{*}
\tag{6}
\]

and

\[
i_u < j_u \ \text{ for } u \in V^{<}, \qquad i_u > j_u \ \text{ for } u \in V^{>}.
\tag{7}
\]

Note that the constraint \(i_u = j_u\) for \(u \in V^{=}\) has been expressed by simply eliminating the distinction between the variables \(i_u\) and \(j_u\) for \(u \in V^{=}\). Finally, we eliminate terms with zero coefficients and rename coefficients and variables so that all inequality constraints are less-than constraints. The problem then reduces to determining whether an equation of the form

\[
\sum_{i=1}^{n} a_i X_i + \sum_{i=n+1}^{n+m} (b_i Y_i + c_i Z_i) = a_0,
\tag{8}
\]

all of whose coefficients are nonzero integers, has an integer solution satisfying constraints of the form:

\[
M_i \le X_i \le N_i \ \text{ for } 1 \le i \le n, \qquad M_i \le Y_i < Z_i \le N_i \ \text{ for } n+1 \le i \le n+m.
\tag{9}
\]

The GCD test and the Banerjee-Wolfe test [6, 26, 29] are the two tests traditionally used in parallelizing compilers to determine whether an equation of the form (8) has an integer solution satisfying constraints of the form (9). Neither test actually checks the equation for the existence of a constrained integer solution. The GCD test ignores loop limit and direction vector inequality constraints entirely, and simply determines whether the equation has an unconstrained integer solution. The Banerjee-Wolfe test, on the other hand, takes the constraints into account, but determines whether the equation has a constrained real solution.

The GCD test computes the greatest common divisor of the coefficients on the left-hand side of equation (8). By a number theory result, equation (8) has an integer solution iff \(\gcd(a_1, \ldots, a_n, b_{n+1}, \ldots, b_{n+m}, c_{n+1}, \ldots, c_{n+m})\) divides \(a_0\). If the gcd does not divide \(a_0\), then there is definitely no dependence; otherwise there may be a dependence.

The Banerjee-Wolfe test computes the extreme values, min and max, that the left-hand side of equation (8) assumes when the variables are subject to the constraints specified in (9). By the Intermediate Value Theorem, equation (8) has a real solution within the region specified by the constraints in (9) iff \(\min \le a_0 \le \max\). If \(\min \le a_0 \le \max\) does not hold, then there is definitely no dependence; otherwise there may be a dependence.

If both the GCD test and the Banerjee-Wolfe test return a maybe answer, then we assume a data dependence. However, in that case we do not know whether an approximation was made or not. In the next section we present a set of conditions that can be tested along with the Banerjee inequality and the GCD test to derive an exact yes answer.

3. Extending the Banerjee-Wolfe and GCD Tests

We derived the following results, stated as Corollaries 1 and 2 [19], from a necessary and sufficient condition. That condition characterizes when the linear expression on the left-hand side of equation (8) assumes all integer values between its min and max, with the variables subject to the constraints in (9). Corollaries 1 and 2 provide conditions under which the Banerjee-Wolfe test, and a combination of the GCD test and the Banerjee-Wolfe test, become necessary and sufficient conditions for data dependence. This study shows empirically that these conditions are almost invariably satisfied in practice, making the extended Banerjee-Wolfe and GCD tests exact as well as efficient.

Corollary 1. Consider equation (8) together with the constraints in (9). For each \(i\), \(n+1 \le i \le n+m\), let

\[
\gamma_i =
\begin{cases}
|b_i| + |c_i| & \text{if } b_i c_i > 0\\
\max(|b_i|, |c_i|) & \text{if } b_i c_i < 0
\end{cases}
\qquad
t_i =
\begin{cases}
\max(|b_i|, |c_i|) & \text{if } b_i c_i > 0\\
\max(\min(|b_i|, |c_i|),\ |b_i + c_i|) & \text{if } b_i c_i < 0
\end{cases}
\]

For each \(i\), \(1 \le i \le n+m\), let

\[
\tau_i =
\begin{cases}
|a_i| & \text{if } 1 \le i \le n\\
t_i & \text{if } n+1 \le i \le n+m
\end{cases}
\]

Let \(\pi\) be a permutation of \((1, 2, \ldots, n+m)\) such that \(\tau_{\pi(1)} \le \tau_{\pi(2)} \le \cdots \le \tau_{\pi(n+m)}\). Suppose that

* \(\tau_{\pi(1)} = 1\), and
* for each \(j\), \(2 \le j \le n+m\),

\[
\tau_{\pi(j)} \le 1 + \sum_{\substack{1 \le k \le j-1\\ 1 \le \pi(k) \le n}} |a_{\pi(k)}| \,(N_{\pi(k)} - M_{\pi(k)}) + \sum_{\substack{1 \le k \le j-1\\ n+1 \le \pi(k) \le n+m}} \gamma_{\pi(k)} \,(N_{\pi(k)} - M_{\pi(k)} - 1).
\]

Then \(\min \le a_0 \le \max\) iff equation (8) has an integer solution satisfying the constraints in (9). ∎

Corollary 1 states that if the coefficients of the dependence equation are small enough to satisfy its hypothesis, then the Banerjee-Wolfe test is an exact test, i.e., a necessary and sufficient condition for data dependence. Testing the conditions in the hypothesis of Corollary 1 costs time linear in the number of variables in the equation. Since the test values \(\tau_i\) are integers, it takes linear time to sort them by applying the bin sort algorithm [1]. Once the test values have been sorted, it also takes linear time to evaluate the inequalities in the hypothesis. Hence, when the conditions of Corollary 1 are satisfied, we are able to derive exact data dependence information in linear time.

Corollary 2. Consider equation (8) together with the constraints in (9). Let \(\gamma_i\) and \(\tau_i\) be as in Corollary 1. Let \(\pi\) be a permutation of \((1, 2, \ldots, n+m)\) such that \(\tau_{\pi(1)} \le \tau_{\pi(2)} \le \cdots \le \tau_{\pi(n+m)}\), and let \(d = \gcd(a_1, \ldots, a_n, b_{n+1}, \ldots, b_{n+m}, c_{n+1}, \ldots, c_{n+m})\). Suppose that

* \(\tau_{\pi(1)} = d\), and
* for each \(j\), \(2 \le j \le n+m\),

\[
\tau_{\pi(j)} \le d + \sum_{\substack{1 \le k \le j-1\\ 1 \le \pi(k) \le n}} |a_{\pi(k)}| \,(N_{\pi(k)} - M_{\pi(k)}) + \sum_{\substack{1 \le k \le j-1\\ n+1 \le \pi(k) \le n+m}} \gamma_{\pi(k)} \,(N_{\pi(k)} - M_{\pi(k)} - 1).
\]

Then (\(d\) divides \(a_0\) and \(\min \le a_0 \le \max\)) iff equation (8) has an integer solution satisfying the constraints in (9). ∎

Corollary 2 states that if the coefficients of the dependence equation satisfy its hypothesis, then a combination of the GCD test and the Banerjee-Wolfe test is an exact test, i.e., a necessary and sufficient condition for data dependence. As with Corollary 1, the cost of testing the conditions in the hypothesis of Corollary 2 is linear in the number of variables in the equation. Testing the conditions in Corollary 1 along with the Banerjee inequality, and, if required, the conditions in Corollary 2 along with the GCD test, has a clear benefit. Whenever these conditions hold, a parallelizing compiler obtains exact data dependence information in linear time and avoids more expensive, potentially exponential tests [9, 16, 22, 28].

The following example illustrates the application of Corollaries 1 and 2.

Example. Consider the following loop:

    DO I = 1, 10
      DO J = 1, 10
    S1:   A(I + 9*(J-1)) = ...
    S2:   ... = A(I + 9*(J+1))
      ENDDO
    ENDDO

There is a data dependence between S1 and S2 iff the equation

\[
i_1 - i_2 + 9 j_1 - 9 j_2 = 18
\tag{Ex-1}
\]

has an integer solution subject to the following constraints:

\[
1 \le i_1, i_2 \le 10, \qquad 1 \le j_1, j_2 \le 10.
\tag{Ex-2}
\]

Consider testing for data dependence from S1 to S2 with direction vector (<, >). This direction vector introduces the additional constraints:

\[
i_1 < i_2 \ \text{ and } \ j_1 > j_2.
\tag{Ex-3}
\]

The Banerjee-Wolfe test computes the extreme values of the left-hand side of (Ex-1) within the region specified by the constraints in (Ex-2) and (Ex-3). They are:

\[
\min = 1 - 10 + 9 \times 2 - 9 \times 1 = 0, \qquad \max = 9 - 10 + 9 \times 10 - 9 \times 1 = 80.
\]

Since \(\min \le 18 \le \max\), the Banerjee-Wolfe test indicates that there may be a data dependence from S1 to S2 with direction vector (<, >).

Consider the application of Corollary 1 to equation (Ex-1) subject to constraints (Ex-2) and (Ex-3). We first compute the test values

\[
\tau_1 = \max(\min(|1|, |-1|),\ |1 - 1|) = 1, \qquad \tau_2 = \max(\min(|9|, |-9|),\ |9 - 9|) = 9.
\]

Since \(\tau_1 = 1\) and

\[
\tau_2 \le 1 + \max(|1|, |-1|)\,(10 - 1 - 1) = 9,
\]

the hypothesis of Corollary 1 is satisfied. Therefore, equation (Ex-1) has an integer solution satisfying the constraints in (Ex-2) and (Ex-3).
Corollary 1 thus establishes that there is definitely a data dependence from S1 to S2 with direction vector (<, >). Equation (Ex-1) has such a solution because its left-hand side assumes every integer value between its extreme values 0 and 80 when the variables are subject to the constraints in (Ex-2) and (Ex-3) [19, 20]. Therefore, whatever the value \(a_0\) on the right-hand side of equation (Ex-1), if \(0 \le a_0 \le 80\), then (Ex-1) has an integer solution satisfying the constraints in (Ex-2) and (Ex-3).

Now consider testing for data dependence from S2 to S1 with direction vector (=, <). This is equivalent to testing for data dependence from S1 to S2 with the implausible direction vector (=, >) [8]. This direction vector introduces the additional constraints:

\[
i_1 = i_2 \ \text{ and } \ j_1 > j_2.
\tag{Ex-4}
\]

By substituting the constraint \(i_1 = i_2\) into equation (Ex-1), we derive the following simplified equation:

\[
9 j_1 - 9 j_2 = 18
\tag{Ex-5}
\]

The Banerjee-Wolfe test computes the extreme values of the left-hand side of (Ex-5) within the region specified by the constraints in (Ex-2) and (Ex-4). They are:

\[
\min = 9 \times 2 - 9 \times 1 = 9, \qquad \max = 9 \times 10 - 9 \times 1 = 81.
\]

Since \(\min \le 18 \le \max\), the Banerjee-Wolfe test indicates that there may be a data dependence from S1 to S2 with direction vector (=, >). The hypothesis of Corollary 1 is not satisfied for equation (Ex-5) subject to constraints (Ex-2) and (Ex-4), since

\[
\tau_1 = \max(\min(|9|, |-9|),\ |9 - 9|) = 9 > 1.
\]

Therefore, we cannot yet conclusively determine whether there is a data dependence from S1 to S2 with direction vector (=, >). Since \(d = \gcd(9, -9) = 9\) and 9 divides 18, the GCD test also indicates that there may be such a dependence. But since \(\tau_1 = d\), the hypothesis of Corollary 2 is satisfied. Therefore, equation (Ex-5) has an integer solution satisfying the constraints in (Ex-2) and (Ex-4). Hence, there is definitely a data dependence from S1 to S2 with direction vector (=, >), i.e., a data dependence from S2 to S1 with direction vector (=, <). ∎

The following algorithm gives the order in which Corollaries 1 and 2 are applied to derive exact data dependence information.

Algorithm 1

    If not (min <= a0 <= max) then No Dependence
    else if the conditions of Corollary 1 are true then Yes Dependence
    else if gcd does not divide a0 then No Dependence
    else if the conditions of Corollary 2 are true then Yes Dependence
    else Maybe Dependence.

The algorithm applies Corollary 1 before the GCD test. If the conditions in Corollary 1 are satisfied, then there is definitely a data dependence, so applying the GCD test would be redundant. In that case, the cost of applying Corollary 1 is offset by not applying the GCD test. As we will see in the following section, the conditions in Corollary 1 are very frequently satisfied, which justifies this order of application.

4. Empirical Results

In this section we present empirical results on how often the conditions of Corollaries 1 and 2 guarantee exact answers in practice. For the experimental evaluation of our work we used the Perfect benchmarks [7], a representative collection of programs executed on parallel computers. We implemented the conditions in Corollaries 1 and 2 in the dependence analyzer of the Tiny program restructuring research tool [27]. The original version of Tiny was developed by Michael Wolfe and has since been extended with additional features at the University of Maryland. Both the GCD test and the Banerjee inequality are implemented in Tiny. The Fortran 77 versions of the Perfect benchmarks were preprocessed and converted into Tiny syntax using flt, the Fortran 77 to Tiny converter. Intraprocedural constant propagation and dead code elimination were carried out before the data dependence tests were applied. Induction variable recognition was also performed before applying the tests. It removes induction variables and converts array subscripts containing them into functions of the loop index variables. No interprocedural or symbolic analysis [11, 13] was performed.

We tested only for potential dependences caused by array references. Furthermore, only dependences within the same loop nest were considered. Subscript pairs were not tested if they could not be expressed as linear functions of the loop indices. If both subscripts in a pair of array references tested for data dependence are loop invariant, then the existence of a dependence can be determined by simply comparing their values. This is known as the constant or ZIV test [10]. We do not report any results for these trivial cases. Results about the application frequency and independence rate of the constant test are reported in [10, 16, 17, 23].

We also consider the case of statically unknown loop limits. In some programs a number of loop limits remain statically unknown, even after intraprocedural constant propagation and induction variable recognition and elimination. An experiment in [17] shows that, overall, 12% of the lower loop limits and 71% of the upper loop limits are unknown in the Perfect benchmarks. In an interactive system such as ParaScope [12] or PAT [24], information about statically unknown loop limits can be provided by the user. If such information cannot be made available at compile time, the Banerjee inequality can still be applied by making the following conservative assumption. Whenever a lower loop limit is unknown we assume \(-\infty\) as its value, and whenever an upper loop limit is unknown we assume \(+\infty\) as its value.

Three experiments were performed on the Perfect benchmarks to study the effectiveness of our conditions for exactness with both unknown and known loop limits.

* In the first experiment (Table 1), unknown lower and upper loop limits were assumed to be \(-\infty\) and \(+\infty\), respectively.
* In the second experiment (Table 2), any unknown lower loop limit that was not a linear function of the loop indices in the enclosing loops was replaced by 1. Similarly, any unknown upper loop limit that was not a linear function of the loop indices in the enclosing loops was replaced by 40. These numbers were chosen to maintain consistency with earlier experiments [17, 23].
* In the third experiment (Table 3), such unknown lower and upper loop limits were replaced by 1 and 5, respectively.

One can observe that the hypothesis of Corollaries 1 and 2 is less likely to be satisfied when the loop iteration domains are small. We chose 1 and 5 as the loop limits in the third experiment to show that the conditions in Corollaries 1 and 2 remain effective even in very small iteration domains.

The dependence tests were applied dimension by dimension to all proper [29] direction vectors for a pair of array references. Direction vector pruning [16] was applied to prune away all unused variables. If an independence was proved in a given dimension, the tests were not applied in further dimensions. In each dimension the dependence tests were carried out in the order that Algorithm 1 suggests.

1. First the Banerjee inequality was applied. If it was not satisfied, the counter for Banerjee-No was incremented; otherwise the counter for Banerjee-Maybe was incremented and Corollary 1 was applied.
2. If the conditions in Corollary 1 were true, the counter for Corollary 1-Yes was incremented; otherwise we continued by applying the GCD test.
3. If the gcd did not divide the right-hand side of the dependence equation, the counter for GCD-No was incremented; otherwise the counter for GCD-Maybe was incremented and Corollary 2 was applied.
4. If the conditions in Corollary 2 were true, the counter for Corollary 2-Yes was incremented; otherwise the dependence could not be conclusively resolved, and an answer of maybe should be reported.

To measure the success of our conditions we introduce the following definitions. Success Rate 1 is the success rate of Corollary 1, i.e., the percentage of Banerjee inequality maybe answers that are converted into yes by applying Corollary 1. In our experiment it is computed as the ratio:

\[
\text{Success Rate 1} = \frac{\text{Corollary 1-Yes}}{\text{Banerjee-Maybe}}
\]

Success Rate 2 is the combined success rate of Corollaries 1 and 2. It is the percentage of cases, among those where both the Banerjee inequality and the GCD test answer maybe, that are converted into yes by applying either Corollary 1 or Corollary 2. In our experiment it is computed as the ratio:

\[
\text{Success Rate 2} = \frac{\text{Corollary 1-Yes} + \text{Corollary 2-Yes}}{\text{Banerjee-Maybe} - \text{GCD-No}}
\]

Several important observations can be derived from our experiments. First, Tables 1-3 show that the success rate of Corollary 1 is 100% in all the benchmarks but one (AP). In that one benchmark, the combined success rate of Corollaries 1 and 2 is 100%. Two tests are therefore exact in every case arising in these benchmarks: the Banerjee inequality extended with the conditions in Corollary 1, or at least the combination of the Banerjee inequality and the GCD test extended with the conditions in Corollary 2. This obviates the need for more expensive, potentially exponential tests [9, 16, 22, 28] or special-case approaches [10, 16]. Our experiments also indicate that the GCD test does not have to be applied at all in most cases. Only in the AP benchmark were a few of the cases resolved by the application of the GCD test and the conditions in Corollary 2. We can also see that the results of Tables 1 and 2 are almost identical.
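The counting procedure described earlier in this section follows Algorithm 1 and produces the counts behind the two success-rate ratios. The following minimal Python sketch assumes the four checks have already been evaluated for a single dependence equation. `Checks`, `classify`, `ratio` and `success_rates` are hypothetical names, not part of Tiny. The sketch also ignores the per-dimension, per-direction-vector loop that the experiment wraps around these checks. The sample outcomes are made up for illustration and are not data from Tables 1-3.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Checks:
    """Hypothetical outcomes of the four checks for one dependence equation."""
    banerjee_maybe: bool   # min <= a0 <= max
    corollary1: bool       # hypothesis of Corollary 1 holds
    gcd_divides: bool      # gcd of the coefficients divides a0
    corollary2: bool       # hypothesis of Corollary 2 holds


def classify(c: Checks, counts: Counter) -> str:
    """Apply the checks in Algorithm 1's order, updating the named counters."""
    if not c.banerjee_maybe:
        counts["Banerjee-No"] += 1
        return "no"
    counts["Banerjee-Maybe"] += 1
    if c.corollary1:
        counts["Corollary 1-Yes"] += 1
        return "yes"
    if not c.gcd_divides:
        counts["GCD-No"] += 1
        return "no"
    counts["GCD-Maybe"] += 1
    if c.corollary2:
        counts["Corollary 2-Yes"] += 1
        return "yes"
    return "maybe"


def ratio(num: int, den: int) -> Optional[float]:
    """num/den, or None when no case reached this stage."""
    return num / den if den else None


def success_rates(counts: Counter) -> Tuple[Optional[float], Optional[float]]:
    rate1 = ratio(counts["Corollary 1-Yes"], counts["Banerjee-Maybe"])
    rate2 = ratio(counts["Corollary 1-Yes"] + counts["Corollary 2-Yes"],
                  counts["Banerjee-Maybe"] - counts["GCD-No"])
    return rate1, rate2


# Made-up outcomes for illustration only; not data from Tables 1-3.
counts = Counter()
sample = [Checks(False, False, False, False),
          Checks(True, True, True, False),
          Checks(True, False, True, True),
          Checks(True, False, False, False)]
print([classify(c, counts) for c in sample])   # ['no', 'yes', 'yes', 'no']
print(success_rates(counts))                   # (0.3333333333333333, 1.0)
```

The denominator of Success Rate 2 is Banerjee-Maybe minus GCD-No. It therefore counts the Corollary 1-Yes cases, which never reach the GCD test, together with the GCD-Maybe cases.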
Only in the AP and SR benchmarks were a few additional independences discovered when the unknown lower and upper loop limits were replaced by 1 and 40, respectively (Table 2 versus Table 1). Comparing the results of Tables 2 and 3, though, we see that replacing the upper loop limit by 5 uncovered more independences in four benchmarks (AP, SR, LW, SM). In LW the percentage of independences increased from 31.8% to 41.0%. The Banerjee inequality can thus be applied successfully without complete information about the loop limits, but in certain extreme cases this information may help discover additional independences. In all three experiments, though, the success rates of the Corollaries remain the same.

5. Related Work

A number of empirical studies have evaluated the performance of different data dependence tests. They are all based, though, on the notion that the Banerjee-Wolfe test and the GCD test are approximate tests that cannot conclusively determine a data dependence. Our empirical study focuses on how often the conditions for exactness occur in practice, i.e., how often our work will guarantee exact answers. These are the conditions developed for the Banerjee-Wolfe test and for a combination of the GCD and Banerjee-Wolfe tests.

Shen et al. [23], in a preliminary empirical study, present information about the usage frequency and independence detection rates of various dependence tests. These include the constant test, the GCD test and different implementations of the Banerjee inequality. Their empirical study was performed on a number of Fortran numerical packages.

Petersen and Padua [17] perform an experimental evaluation on the Perfect benchmarks of a proposed sequence of dependence tests. This sequence consists of the constant test, the generalized GCD test, Banerjee's inequalities, integer programming and the Omega test [22]. When the Banerjee inequality answered maybe, they applied exponential tests such as integer programming and the Omega test to elicit a yes or no answer, on the view that the Banerjee inequality cannot prove dependence. Their results also show that the Banerjee inequality is sufficiently accurate in practice. They further show that the integer programming and Omega tests are mostly useful for proving dependences that would otherwise only be assumed. Here we have shown that this can be done in linear time by applying the conditions in Corollaries 1 and 2 rather than exponential dependence tests.

Maydan et al. [16] propose a different sequence of exact dependence tests for special-case inputs, and they show that it derives an exact answer in all cases in the Perfect benchmarks. Their sequence includes Fourier-Motzkin variable elimination, an exponential method, as a back-up test. Their experiments show that Fourier-Motzkin has to be applied in a number of cases. Triolet [25] found that using Fourier-Motzkin variable elimination takes from 22 to 28 times longer than conventional dependence testing.

Another approach is based on the fact that most array references in scientific programs are fairly simple. Goff et al. [10] propose a dependence testing scheme based on classifying pairs of subscripts. They present efficient and exact tests for certain classes of commonly occurring array references involving single index variables (SIV). Our results in Corollaries 1 and 2 are more general than their special SIV cases. For multiple index variable (MIV) subscripts, their techniques in fact rely on the GCD and Banerjee-Wolfe tests.

As the above discussion shows, these approaches fall into two categories. The first develops exponential tests for solving more general instances of the problem. The second applies sequences of different dependence tests, each exact for special instances of the problem. In this work we demonstrate that our extensions to the GCD and Banerjee-Wolfe tests are sufficiently accurate as well as efficient in practice. Therefore, there is no need for exponential or special-case approaches.

6. Conclusions

Data Dependence Analysis is the foundation of any parallelizing compiler. The GCD test and the Banerjee-Wolfe test are the two tests traditionally used in parallelizing compilers to determine whether a pair of array references causes a data dependence. These tests are necessary but not sufficient conditions for data dependence. In our previous work we extended the Banerjee-Wolfe test and a combination of the GCD and Banerjee-Wolfe tests with a set of conditions (Corollaries 1 and 2) to become exact tests, i.e., necessary and sufficient conditions for data dependence. These conditions can be tested in linear time along with the Banerjee inequality and the GCD test to derive exact data dependence information.

In this paper we performed an empirical study on the Perfect benchmarks to find out how often our conditions guarantee exact answers in practice. The empirical results indicate that our conditions were satisfied in every case arising in these benchmarks. Therefore, the Banerjee-Wolfe test extended with the conditions in Corollary 1, or at least a combination of the GCD and Banerjee-Wolfe tests extended with the conditions in Corollary 2, are exact tests in practice.

A number of exponential dependence tests have been proposed in the literature, motivated by the perception that the GCD and Banerjee-Wolfe tests are approximate methods that may be inaccurate in practice. Our empirical study has shown that exact answers can be derived in linear time in practice.
The only cases where more expensive, potentially exponential, tests might be helpful arise in the presence of multidimensional coupled subscripts. However, coupled subscripts in the Perfect benchmarks have been shown to make up only about 4% of the total number of subscript pairs [10]. In the case of coupled subscripts, a system of equations has to be tested for a simultaneous constrained integer solution, and the subscript-by-subscript Banerjee-Wolfe and GCD tests may introduce approximations. In those cases, polynomial-time techniques such as constraint propagation [10], wherever applicable, or the lambda test [15] can also be applied to reduce the problem to testing single equations for constrained integer solutions. Future work will address the issue of simultaneous constrained integer solutions.

References

[1] A. Aho, J. Hopcroft, and J. Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.
[2] F. Allen, M. Burke, P. Charles, R. Cytron, and J. Ferrante. An Overview of the PTRAN Analysis System for Multiprocessing. In Proceedings of the 1987 International Conference on Supercomputing, Athens, Greece, June 1987.
[3] J. R. Allen. Dependence Analysis for Subscripted Variables and its Application to Program Transformations. Ph.D. Thesis, Dept. of Computer Science, Rice University, April 1983.
[4] J. R. Allen and K. Kennedy. PFC: A Program to Convert Fortran to Parallel Form. In Supercomputers: Design and Applications, IEEE Computer Society Press, Silver Spring, MD, 1984.
[5] J. R. Allen and K. Kennedy. Automatic Translation of Fortran Programs to Vector Form. ACM Transactions on Programming Languages and Systems, Vol. 9, No. 4, October 1987.
[6] U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Norwell, MA, 1988.
[7] M. Berry, et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers. The International Journal of Supercomputer Applications, Vol. 3, 1989.
[8] M. Burke and R. Cytron. Interprocedural Dependence Analysis and Parallelization. In Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, Palo Alto, CA, June 1986.
[9] C. Eisenbeis and J.-C. Sogno. A General Algorithm for Data Dependence Analysis. In Proceedings of the Sixth ACM International Conference on Supercomputing, Washington, DC, July 1992.
[10] G. Goff, K. Kennedy, and C.-W. Tseng. Practical Dependence Testing. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991.
[11] M. Haghighat and C. Polychronopoulos. Symbolic Dependence Analysis for High-Performance Parallelizing Compilers. In Proceedings of the Third Annual Workshop on Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.
[12] K. Kennedy, K. McKinley, and C.-W. Tseng. Interactive Parallel Programming Using the ParaScope Editor. IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 3, July 1991.
[13] A. Lichnewsky and F. Thomasset. Introducing Symbolic Problem Solving Techniques in the Dependence Testing Phases. In Proceedings of the Second ACM International Conference on Supercomputing, Saint-Malo, France, July 1988.
[14] Z. Li and P. Yew. Some Results on Exact Data Dependence Analysis. In Proceedings of the 2nd Workshop on Languages and Compilers for Parallel Computing, Urbana, Illinois, August 1989.
[15] Z. Li, P. Yew, and C. Zhu. An Efficient Data Dependence Analysis for Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems, Vol. 1, No. 1, January 1990.
[16] D. Maydan, J. Hennessy, and M. Lam. Efficient and Exact Data Dependence Analysis for Parallelizing Compilers. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991.
[17] P. Petersen and D. Padua. Static and Dynamic Evaluation of Data Dependence Analysis. In Proceedings of the Seventh ACM International Conference on Supercomputing, Tokyo, Japan, July 1993.
[18] C. Polychronopoulos, et al. Parafrase-2: An Environment for Parallelizing, Partitioning, Synchronizing, and Scheduling Programs on Multiprocessors. International Journal of High Speed Computing, Vol. 1, No. 1, May 1989.
[19] K. Psarris. On Exact Data Dependence Analysis. In Proceedings of the Sixth ACM International Conference on Supercomputing, Washington, DC, July 1992.
[20] K. Psarris, D. Klappholz, and X. Kong. On the Accuracy of the Banerjee Test. Journal of Parallel and Distributed Computing, Special Issue on Shared Memory Multiprocessors, Vol. 12, No. 2, June 1991.
[21] K. Psarris, X. Kong, and D. Klappholz. The Direction Vector I Test. IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 11, November 1993.
[22] W. Pugh. A Practical Algorithm for Exact Array Dependence Analysis. Communications of the ACM, Vol. 35, No. 8, August 1992.
[23] Z. Shen, Z. Li, and P. Yew. An Empirical Study of Fortran Programs for Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems, Vol. 1, No. 3, July 1990.
[24] K. Smith and W. Appelbe. PAT: An Interactive Fortran Parallelizing Assistant Tool. In Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, IL, August 1988.
[25] R. Triolet. Interprocedural Analysis for Program Restructuring with Parafrase. CSRD Report No. 538, Department of Computer Science, University of Illinois at Urbana-Champaign, December 1985.
[26] M. Wolfe. Optimizing Supercompilers for Supercomputers. Pitman, London and The MIT Press, Cambridge, MA, 1989.
[27] M. Wolfe. The Tiny Loop Restructuring Research Tool. In Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, IL, August 1991.
[28] M. Wolfe and C.-W. Tseng. The Power Test for Data Dependence. IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 5, September 1992.
[29] H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, New York, NY, 1991.

Table 1: Dependence Results on Perfect Benchmarks. [Table body garbled in extraction; the legible entries of its Success Rate 2 column all read 100%.]

Table 2: Dependence Results on Perfect Benchmarks (Unknown Upper Loop Limits assumed 40). Results on NA, SD, TF, LW, SM, OC, LG, WS, MT, TI are identical to Table 1. [Table body garbled in extraction; its columns are Program, Instances, Banerjee MAYBE, Banerjee NO, Corollary 1 YES, Success Rate 1, GCD MAYBE, GCD NO, Corollary 2 YES and Success Rate 2.]

Table 3: Dependence Results on Perfect Benchmarks (Unknown Upper Loop Limits assumed 5). Results on NA, SD, TF, OC, LG, WS, MT, TI are identical to Table 1. [Table body lost in extraction.]