Partitioning Loops with Variable Dependence Distances

Yijun Yu and Erik D'Hollander
Department of Electronics and Information Systems
University of Ghent, Belgium

Introduction
1. Overview
2. Dependence analysis: pseudo distance matrix (PDM)
3. Loop transformations: unimodular and partitioning
4. Results
5. Conclusion

1. Overview
• Loop with linear array subscripts
• Solve the dependence equation
• Find all non-constant distances
• Create the maximally covering grid and its base vectors
• Create the pseudo distance matrix (PDM), containing all base vectors of the covering grid
• Find independent loops or independent partitions, based on the rank of the PDM

Approach
Dependence analysis:
• Linear dependence equation
• Uniform or constant distance
• Variable or non-constant distance: H = PDM
Loop transformation:
• rank(H) < loop depth?
    Y (non-full rank): unimodular transformation
    N (full rank): continue with the determinant test
• det(H) > 1?
    Y: partitioning transformation
    N: no partitioning
Loop parallelization

2. Dependence Analysis

L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(4I1-I2+3, 2I1+I2-2) = …
        … = A(I1+I2-1, I1-I2+2)
      enddo
    enddo

The loop body has the form A[f(I)] = …, … = A[g(I)]. Iterations i = (I1, I2) and j = (J1, J2) are dependent when A[f(i)] = A[g(j)], i.e. f(i) = g(j):

    4I1 - I2 + 3 = J1 + J2 - 1
    2I1 + I2 - 2 = J1 - J2 + 2

    i          j          A[f(i)] = A[g(j)]    d = |j - i|
    (1, -5)    (3, 10)    A[12, -5]            (2, 15)
    (3, 0)     (9, 7)     A[15, 4]             (6, 7)
    (-3, -3)   (-9, 4)    A[-6, -11]           (6, -7)
    …

In matrix form, $iA + a = jB + b$ with

$$A = \begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix},\quad a = (3, -2),\quad B = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\quad b = (-1, 2).$$

The dependence distance
1. The linear dependence equation:
   $$iA + a = jB + b$$
2. Using Banerjee's unimodular transformation U to obtain an echelon matrix S, the equation $tS = b - a$ is solved, yielding:
   $$i = tU_l,\qquad j = tU_r$$
   • $U_l$ and $U_r$ are the left and right halves of U
   • t has a constant part $t_1$ and an unknown part $t_2$
3. The distance between dependent iterations i and j is:
   $$d = |j - i| = |t(U_r - U_l)| = |tF|$$

The distance set
1. From the dependence equation $tS = b - a$, the solution vector t contains a constant part and an arbitrary part:
   $$t = (t_1, t_2),\quad t_1 \text{ constant},\ t_2 \text{ variable}$$
2. The matrix $F = U_r - U_l$ can be separated vertically into two sub-matrices:
   $$d = |tF|,\qquad F = \begin{pmatrix} F' \\ F'' \end{pmatrix}$$
3. The distance set of the dependence equations is:
   $$D = \{\, d = xR \mid d \succ 0,\ x \in \mathbb{Z}^{i} \,\},\qquad \text{where } R = \begin{pmatrix} t_1F' \\ F'' \end{pmatrix} \text{ or } R = F''$$

Distances in the iteration space
• Iteration space (i1, i2) of loop 1 with the dependence equations 4I1-I2+3 = J1+J2-1 and 2I1+I2-2 = J1-J2+2.
• The arrows (I1, I2) → (J1, J2) represent the distance vectors between dependent iterations.
[Figure: iteration space with axes i1 and i2, showing the distance vectors as arrows]

Distances base vectors
1. The dependence distance is non-constant for the reference pair, e.g. (2,15), (6,7), (6,-7), as highlighted.
2. However, the distance set is spanned by the grid generated by the base vectors (2,1) and (0,2).
3. For example,
   (2, 15) = (2,1) + 7 (0,2),
   (6, 7)  = 3 (2,1) + 2 (0,2),
   (6, -7) = 3 (2,1) - 5 (0,2).
[Figure: iteration space with axes i1 and i2, showing the grid and the highlighted distances]

The largest base vectors
The distance set consists of linear combinations of the row vectors of R:
$$D = \{\, d = xR \mid d \succ 0,\ x \in \mathbb{Z}^{i} \,\}$$
A lattice L(R) is the group of vectors generated by all integer linear combinations of the independent row vectors of a matrix R:
$$L(R) = \{\, xR \mid x \in \mathbb{Z}^{i} \,\}$$
We look for the smallest lattice L(R) (generating the largest grid) that covers the whole distance set:
$$L(R) \supseteq D$$
In this way, the spurious dependences introduced by replacing the distance set with a lattice are minimized.

Pseudo Distance Matrix (PDM)
• A Hermite normal form HNF(R) is a full row rank matrix reduced from the echelon form of R by a unimodular transformation: $H = \mathrm{HNF}(R)$.
• Therefore H generates the same lattice as R does, that is, the smallest lattice, and the rows of the HNF are base vectors: $L(H) = L(R)$.
• H is called the pseudo distance matrix (PDM), because its row vectors generate the distance set.
• Since the row vectors of H are constant, the techniques developed for uniform distance dependence matrices can be applied.

Calculating the PDM
1. Solving the linear dependence equations for $d = tF$:
   $$(i_1\ \ i_2)\begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix} + (3, -2) = (j_1\ \ j_2)\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} + (-1, 2)$$
2. Expressing the distance set as $d = xR$, where the rows of R are (0,4), (2,1) and (0,2).
3. Finding the largest base vectors, $H = \mathrm{HNF}(R)$:
   $$\mathrm{HNF}\begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},\qquad \mathrm{PDM} = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$$

3. Loop transformations: unimodular and partitioning
Legality
Any transformation must be legal, i.e. it must preserve the execution order of dependent iterations.
Transformations depending on rank(H):
3.1 Unimodular transformation: non-full rank PDM
3.2 Partitioning transformation: full rank PDM
3.3 Combined approach

3.1 Unimodular transformation
• Given a non-full rank (r < m) pseudo distance matrix H, a unimodular matrix T can be constructed such that the first m-r columns of HT are zero.
• As a result, the m-r outermost loops can be parallelized.

3.2 Partitioning transformation
• Given a full rank pseudo distance matrix H, the loop nest can be partitioned into det(H) independent partitions.
• The partitioned parallelism is det(H).

3.3 Combined approach
• After a unimodular transformation on a non-full rank PDM, the transformed PDM has a full rank sub-matrix S.
• When det(S) > 1, additional parallelism can be found using the loop partitioning transformation.

4. Results (1) Non-full rank PDM

Original loop:

L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(3I1+1, 2I1+I2-1) = …
        … = A(I1+3, I2+1)
      enddo
    enddo

PDM = (2,2). The unimodular transformation T is built in two steps:

$$(2,2)\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = (2,0),\qquad (2,0)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = (0,2),\qquad T = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix}$$

Transformed loop:

L'1: doall J1 = -2N, 2N
L'2:   do J2 = max(-N, -N-J1), min(N, N-J1)
         I1 = J2
         I2 = J1 + J2
         A(3I1+1, 2I1+I2-1) = …
         … = A(I1+3, I2+1)
       enddo
     enddoall

NF-rank: Dependence graphs
[Figure: dependence graphs before and after the unimodular transformation; surviving axis labels i2, j1, j2]

4. Results (2) Partitioning

Original loop:

L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(4I1-I2+3, 2I1+I2-2) = …
        … = A(I1+I2-1, I1-I2+2)
      enddo
    enddo

The PDM is reduced from determinant 4 to determinant 1:

$$\mathrm{PDM} = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}\ (\det = 4),\quad \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1/2 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix},\quad \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},\quad \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\ (\det = 1)$$

Partitioned loop:

L'1: doall Io1 = 0, 1
L'2:   doall Io2 = 0, 1
L'3:     do I1 = -N+mod(N+Io1,2), N-mod(N-Io1,2), 2
           Io'2 = Io2 + (I1-Io1)/2
L'4:       do I2 = -N+mod(N+Io'2,2), N-mod(N-Io'2,2), 2
             A(4I1-I2+3, 2I1+I2-2) = …
             … = A(I1+I2-1, I1-I2+2)
           enddo
         enddo
       enddoall
     enddoall

F-rank partitioning: dependence graphs
[Figure: dependence graphs of the partitioned loop]

4. Results (3) Combined

After the unimodular transformation (PDM = (2,2), T as in Results (1), transformed PDM = (0,2)):

L'1: doall J1 = -2N, 2N
L'2:   do J2 = max(-N, -N-J1), min(N, N-J1)
         I1 = J2
         I2 = J1 + J2
         A(3I1+1, 2I1+I2-1) = …
         … = A(I1+3, I2+1)
       enddo
     enddoall

The full rank sub-matrix (2) has determinant 2 and is reduced to determinant 1:

$$(0,2)\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = (0,1),\qquad \det = 2 \rightarrow \det = 1$$

After the additional partitioning:

L''1: doall Jo2 = 0, 1
L''2:   doall J1 = -2N, 2N
          p2 = max(-N, -N-J1)
          q2 = min(N, N-J1)
L''3:     do J2 = p2+mod(Jo2-p2,2), q2-mod(q2-Jo2,2), 2
            I1 = J2
            I2 = J1 + J2
            A(3I1+1, 2I1+I2-1) = …
            … = A(I1+3, I2+1)
          enddo
        enddoall
      enddoall

F-rank submatrix dependence graph
[Figure: dependence graphs for the combined transformation, axes j1 and j2]

5. Conclusion
• The distances between dependent iterations are in general non-constant when the array subscripts are linear.
• A pseudo distance matrix (PDM) holding the largest base vectors of the distance space is computed from the linear dependence equations.
• Parallelism can still be exploited for loops with variable distances, using the unimodular and partitioning transformations derived from the PDM.
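The claims about the running example (subscripts 4I1-I2+3, 2I1+I2-2 and I1+I2-1, I1-I2+2) can be checked by brute force. The sketch below is not the authors' implementation. It enumerates all dependent iteration pairs for a small bound, then checks that every distance lies on the grid spanned by the PDM rows (2,1) and (0,2), and that dependent iterations always share a partition label. The function names, the bound N = 10 and the labelling function `partition` are assumptions made for illustration. The labelling mirrors the index arithmetic of the partitioned loop (Io1, Io2). The code works with the signed difference j - i, so the third pair of the table appears as (-6, 7) rather than (6, -7).

```python
from itertools import product

N = 10                      # illustrative loop bound (assumption, not from the slides)
H = ((2, 1), (0, 2))        # PDM of the running example, rows are base vectors


def f(i1, i2):
    """Subscripts of the write reference A(4I1-I2+3, 2I1+I2-2)."""
    return (4 * i1 - i2 + 3, 2 * i1 + i2 - 2)


def g(j1, j2):
    """Subscripts of the read reference A(I1+I2-1, I1-I2+2)."""
    return (j1 + j2 - 1, j1 - j2 + 2)


def dependences(n):
    """Return all (i, j, j - i) with f(i) == g(j) inside [-n, n] x [-n, n]."""
    space = list(product(range(-n, n + 1), repeat=2))
    readers = {}
    for j in space:
        readers.setdefault(g(*j), []).append(j)
    found = []
    for i in space:
        for j in readers.get(f(*i), []):
            found.append((i, j, (j[0] - i[0], j[1] - i[1])))
    return found


def in_lattice(d, h=H):
    """True if d = x1*h[0] + x2*h[1] for integers x1, x2 (h upper triangular)."""
    if d[0] % h[0][0] != 0:
        return False
    x1 = d[0] // h[0][0]
    return (d[1] - x1 * h[0][1]) % h[1][1] == 0


def partition(i):
    """Hypothetical partition label (Io1, Io2) mirroring the partitioned loop."""
    io1 = i[0] % 2
    io2 = (i[1] - (i[0] - io1) // 2) % 2
    return (io1, io2)


deps = dependences(N)
labels = {partition(i) for i in product(range(-N, N + 1), repeat=2)}
det_h = H[0][0] * H[1][1] - H[0][1] * H[1][0]

print("dependent pairs found:", len(deps))
print("all distances on the PDM grid:", all(in_lattice(d) for _, _, d in deps))
print("dependent iterations share a partition:",
      all(partition(i) == partition(j) for i, j, _ in deps))
print("partitions:", len(labels), "det(H):", det_h)
```

If the checks hold, no dependence crosses partitions, so the four partitions can run in parallel while each one keeps its original sequential order. Brute force is only practical for small bounds; it illustrates the result and does not replace the HNF computation.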