i 1 ,i 2 - Department of Computer Science

Partitioning Loops with Variable Dependence Distances

Yijun Yu and Erik D'Hollander
Department of Electronics and Information Systems
University of Ghent, Belgium

Introduction

1. Overview
2. Dependence analysis: pseudo distance matrix (PDM)
3. Loop transformations: unimodular and partitioning
4. Results
5. Conclusion

1. Overview

• Loop with linear array subscripts
• Solve the dependence equation
• Find all non-constant distances
• Create the maximally covering grid and its base vectors
• Create the pseudo distance matrix (PDM), containing all base vectors of the covering grid
• Find independent loops or independent partitions, based on the rank of the PDM

Approach

[Flowchart]
• Dependence analysis: the linear dependence equation yields either a uniform (constant) distance or a variable (non-constant) distance; in the variable case, H = PDM.
• Loop transformation: is rank(H) < loop depth?
    Y: non-full rank, apply the unimodular transformation.
    N: full rank; is det(H) > 1? Y: apply the partitioning transformation.
• Outcome: loop parallelization.

2. Dependence Analysis

    L1: do I1 = -N, N
    L2:   do I2 = -N, N
            A(4I1-I2+3, 2I1+I2-2) = …
            … = A(I1+I2-1, I1-I2+2)
          enddo
        enddo

The loop has the form A[f(I)] = … , … = A[g(I)]. Iterations i = (I1, I2) and j = (J1, J2) are dependent when f(i) = g(j):

    4I1 - I2 + 3 = J1 + J2 - 1
    2I1 + I2 - 2 = J1 - J2 + 2

Examples of dependent iterations:

    i           j           A[f(i)] = A[g(j)]    d = |j - i|
    (1, -5)     (3, 10)     A[12, -5]            (2, 15)
    (3, 0)      (9, 7)      A[15, 4]             (6, 7)
    (-3, -3)    (-9, 4)     A[-6, -11]           (6, -7)

Here |j - i| denotes the difference oriented to be lexicographically positive; in the last row j - i = (-6, 7), so d = (6, -7).

In matrix form the dependence equation is iA + a = jB + b, with

$$A = \begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix},\quad a = \begin{pmatrix} 3 \\ -2 \end{pmatrix},\quad B = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\quad b = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$$

The dependence distance

1. The linear dependence equation: $iA + a = jB + b$
2. Using Banerjee's unimodular transformation U to obtain an echelon matrix S, the equation $tS = (b - a)$ is solved, yielding
   $$i = tU_l, \qquad j = tU_r$$
   • $U_l$ and $U_r$ are the left and right halves of U
   • t has a constant part $t_1$ and an unknown part $t_2$
3. The distance between dependent iterations i and j is
   $$d = |j - i| = |t(U_r - U_l)| = |tF|$$

The distance set

1. From the dependence equation $tS = (b - a)$, the solution vector t contains a constant part and an arbitrary part:
   $$t = (t_1\ \ t_2), \quad t_1 = \text{const},\ t_2 = \text{variable}$$
2. The matrix $F = U_r - U_l$ can be separated vertically into two sub-matrices:
   $$d = |tF|, \qquad F = \begin{pmatrix} F' \\ F'' \end{pmatrix}$$
3. The distance set of the dependence equations is
   $$\mathcal{D} = \{\, d = xR \mid d \succ 0,\ x_i \in \mathbb{Z} \,\}, \quad\text{where } R = F'' \text{ (if } t_1F' = 0\text{) or } R = \begin{pmatrix} t_1F' \\ F'' \end{pmatrix}$$

Distances in the iteration space

• Iteration space (i1, i2) of loop 1, with dependence equations 4I1-I2+3 = J1+J2-1 and 2I1+I2-2 = J1-J2+2.
• The arrows (I1, I2) → (J1, J2) represent the distance vectors between dependent iterations.

[Figure: iteration space with axes i1 and i2; arrows connect dependent iterations]

Distances: base vectors

1. The dependence distance is non-constant for the reference pair, e.g. (2, 15), (6, 7), (6, -7), as highlighted in the figure.
2. However, the distance set is spanned by the grid generated by the base vectors (2, 1) and (0, 2).
3. For example:
   (2, 15) = (2, 1) + 7 (0, 2)
   (6, 7) = 3 (2, 1) + 2 (0, 2)
   (6, -7) = 3 (2, 1) - 5 (0, 2)

[Figure: the same iteration space with axes i1 and i2, showing the grid of base vectors]

The largest base vectors

The distance set consists of linear combinations of the row vectors of R:

$$\mathcal{D} = \{\, d = xR \mid d \succ 0,\ x_i \in \mathbb{Z} \,\}$$

A lattice L(R) is the group of vectors generated by all integer linear combinations of the independent row vectors of a matrix R:

$$L(R) = \{\, xR \mid x_i \in \mathbb{Z} \,\}$$

We look for the smallest lattice L(R) (generating the largest grid) that covers the whole distance set:

$$\mathcal{D} \subseteq L(R)$$

In this way, the spurious dependences that may be introduced by replacing the distance set with a lattice are minimized.

Pseudo Distance Matrix (PDM)

• The Hermite normal form HNF(R) is a full-row-rank matrix obtained from the echelon form of R by unimodular transformations: H = HNF(R).
• Therefore H generates the same lattice as R does, that is, the smallest lattice: L(H) = L(R). In addition, the rows of H are base vectors.
• H is called the pseudo distance matrix (PDM), because its row vectors generate the lattice that covers the distance set.
• Since the row vectors of H are constant, the techniques developed for uniform (constant) distance matrices can be applied.

Calculating the PDM

1. Solving the linear dependence equations:
   $$(i_1\ \ i_2)\begin{pmatrix} 4 & 2 \\ -1 & 1 \end{pmatrix} + (3\ \ {-2}) = (j_1\ \ j_2)\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} + (-1\ \ 2)$$
   gives $d = tF$, where t has a constant part and two free integer parameters $t_3$ and $t_4$.
2. Expressing the distance set as $d = xR$:
   $$d = (1\ \ t_3\ \ t_4)\begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 0 & 2 \end{pmatrix}$$
3. Finding the largest base vectors:
   $$H = \mathrm{HNF}(R) = \mathrm{HNF}\begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \qquad \mathrm{PDM} = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$$

3. Loop transformations: unimodular and partitioning

Legality: any transformation must be legal, i.e. it must preserve the execution order of dependent iterations.

The transformation depends on rank(H):
3.1 Unimodular transformation: non-full rank PDM
3.2 Partitioning transformation: full rank PDM
3.3 Combined approach

3.1 Unimodular transformation

• Given a non-full rank (r < m) pseudo distance matrix H, a unimodular matrix T can be constructed such that the first m-r columns of HT are zero.
• As a result, the m-r outermost loops can be parallelized.

3.2 Partitioning transformation

• Given a full rank pseudo distance matrix H, the loop nest can be split into det(H) independent partitions.
• The partitioned parallelism is therefore det(H).

3.3 Combined approach

• After a unimodular transformation on a non-full rank PDM, the transformed PDM has a full rank sub-matrix S.
• When det(S) > 1, additional parallelism can be found with the loop partitioning transformation.

4. Results (1): non-full rank PDM

Original loop:

    L1: do I1 = -N, N
    L2:   do I2 = -N, N
            A(3I1+1, 2I1+I2-1) = …
            … = A(I1+3, I2+1)
          enddo
        enddo

Here PDM = (2, 2). The unimodular transformation is built in two steps:

$$(2,\,2)\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = (2,\,0), \qquad (2,\,0)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = (0,\,2)$$

$$T = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix}$$

Transformed loop:

    L'1: doall J1 = -2N, 2N
    L'2:   do J2 = max(-N, -N-J1), min(N, N-J1)
             I1 = J2
             I2 = J1 + J2
             A(3I1+1, 2I1+I2-1) = …
             … = A(I1+3, I2+1)
           enddo
         enddoall

Non-full rank: dependence graphs

[Figure: dependence graphs before and after the transformation; axes i2, j1 and j2 are legible in the source]

4. Results (2): partitioning

Original loop:

    L1: do I1 = -N, N
    L2:   do I2 = -N, N
            A(4I1-I2+3, 2I1+I2-2) = …
            … = A(I1+I2-1, I1-I2+2)
          enddo
        enddo

The full rank PDM (det = 4) is reduced to the identity (det = 1):

$$\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1/2 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Partitioned loop:

    L'1: doall Io1 = 0, 1
    L'2:   doall Io2 = 0, 1
    L'3:     do I1 = -N+mod(N+Io1,2), N-mod(N-Io1,2), 2
               Io'2 = Io2 + (I1-Io1)/2
    L'4:       do I2 = -N+mod(N+Io'2,2), N-mod(N-Io'2,2), 2
                 A(4I1-I2+3, 2I1+I2-2) = …
                 … = A(I1+I2-1, I1-I2+2)
               enddo
             enddo
           enddoall
         enddoall

Full rank partitioning: dependence graphs

[Figure: dependence graphs of the partitioned loop]

4. Results (3): combined

Loop after the unimodular transformation, where PDM = (2, 2) becomes (0, 2):

$$(2,\,2)\begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix} = (0,\,2)$$

    L'1: doall J1 = -2N, 2N
    L'2:   do J2 = max(-N, -N-J1), min(N, N-J1)
             I1 = J2
             I2 = J1 + J2
             A(3I1+1, 2I1+I2-1) = …
             … = A(I1+3, I2+1)
           enddo
         enddoall

The full rank sub-matrix (det = 2) is then reduced to det = 1:

$$(0,\,2)\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} = (0,\,1)$$

Loop after the additional partitioning:

    L''1: doall Jo2 = 0, 1
    L''2:   doall J1 = -2N, 2N
              p2 = max(-N, -N-J1)
              q2 = min(N, N-J1)
    L''3:     do J2 = p2+mod(Jo2-p2,2), q2-mod(q2-Jo2,2), 2
                I1 = J2
                I2 = J1 + J2
                A(3I1+1, 2I1+I2-1) = …
                … = A(I1+3, I2+1)
              enddo
            enddoall
          enddoall

Full rank sub-matrix: dependence graphs

[Figure: dependence graphs with axes j1 and j2]

5. Conclusion

• The distances between dependent iterations are in general non-constant when the array subscripts are linear.
• A pseudo distance matrix (PDM), holding the largest base vectors of the distance space, is computed from the linear dependence equations.
• Parallelism can still be exploited for these loops with variable distances, using the unimodular and partitioning transformations derived from the PDM.
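The last step of "Calculating the PDM" can be checked numerically. The Python sketch below reduces R to a row-style Hermite normal form using only integer row operations. The function name `hermite_normal_form` and the order of the reduction steps are assumptions made for illustration, not the authors' implementation; the convention assumed here is positive pivots with the entries above each pivot reduced modulo that pivot.

```python
def hermite_normal_form(rows):
    """Row-style HNF of an integer matrix (hypothetical helper, illustration only).

    Only unimodular row operations are used (swap, negate, add an integer
    multiple of one row to another), so the row lattice is unchanged.
    Zero rows are dropped, leaving a full-row-rank echelon matrix.
    """
    m = [list(r) for r in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Euclid-style elimination: repeat until at most one row at or below
        # pivot_row has a non-zero entry in this column.
        while True:
            nonzero = [r for r in range(pivot_row, len(m)) if m[r][col] != 0]
            if len(nonzero) <= 1:
                break
            p = min(nonzero, key=lambda r: abs(m[r][col]))
            for r in nonzero:
                if r != p:
                    q = m[r][col] // m[p][col]
                    m[r] = [x - q * y for x, y in zip(m[r], m[p])]
        if not nonzero:
            continue  # no pivot in this column
        p = nonzero[0]
        m[pivot_row], m[p] = m[p], m[pivot_row]
        if m[pivot_row][col] < 0:
            m[pivot_row] = [-x for x in m[pivot_row]]
        # Reduce the entries above the pivot into the range [0, pivot).
        for r in range(pivot_row):
            q = m[r][col] // m[pivot_row][col]
            m[r] = [x - q * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return [r for r in m if any(r)]


# R from the example loop: the constant row (0, 4) stacked on the rows of F''.
R = [[0, 4], [2, 1], [0, 2]]
H = hermite_normal_form(R)
print(H)  # [[2, 1], [0, 2]]
```

For this R the result is the PDM with rows (2, 1) and (0, 2), whose determinant 4 is the number of partitions found in Results (2).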

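The partitioning result for the loop with subscripts A(4I1-I2+3, 2I1+I2-2) and A(I1+I2-1, I1-I2+2) can also be checked by brute force. The sketch below enumerates all dependent iteration pairs for a small N, confirms that every distance lies on the lattice generated by (2, 1) and (0, 2), and confirms that dependent iterations always receive the same partition label. The helper names and the label formula are assumptions for illustration; the label is read off from the offsets Io1 and Io2 of the partitioned loop in Results (2), not taken from the authors' code.

```python
from itertools import product


def write_subscript(i1, i2):
    """Subscript f(i) of the written element A(4I1-I2+3, 2I1+I2-2)."""
    return (4 * i1 - i2 + 3, 2 * i1 + i2 - 2)


def read_subscript(j1, j2):
    """Subscript g(j) of the read element A(I1+I2-1, I1-I2+2)."""
    return (j1 + j2 - 1, j1 - j2 + 2)


def dependent_pairs(n):
    """All (i, j) in [-n, n]^2 x [-n, n]^2 with f(i) == g(j)."""
    space = list(product(range(-n, n + 1), repeat=2))
    writers = {}
    for i in space:
        writers.setdefault(write_subscript(*i), []).append(i)
    return [(i, j) for j in space for i in writers.get(read_subscript(*j), [])]


def in_lattice(d):
    """True if d = x1*(2, 1) + x2*(0, 2) for some integers x1, x2."""
    d1, d2 = d
    if d1 % 2 != 0:
        return False
    x1 = d1 // 2
    return (d2 - x1) % 2 == 0


def partition_label(i):
    """Assumed label (Io1, Io2): the coset of i modulo the PDM lattice."""
    i1, i2 = i
    o1 = i1 % 2
    o2 = (i2 - (i1 - o1) // 2) % 2
    return (o1, o2)


pairs = dependent_pairs(10)
distances = [(j[0] - i[0], j[1] - i[1]) for i, j in pairs]

all_on_lattice = all(in_lattice(d) for d in distances)
same_partition = all(partition_label(i) == partition_label(j) for i, j in pairs)
labels = {partition_label(i) for i in product(range(-10, 11), repeat=2)}

print(all_on_lattice, same_partition, len(labels))  # True True 4
```

Since no dependence crosses a label boundary, the four label classes can run in parallel, which matches det(PDM) = 4.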