Partitioning Loops with Variable Dependence Distances
Yijun Yu and Erik D’Hollander
Department of Electronics and Information Systems
University of Ghent, Belgium
Introduction
1. Overview
2. Dependence analysis: pseudo distance matrix (PDM)
3. Loop transformations: unimodular and partitioning
4. Results
5. Conclusion
1. Overview
• Start from a loop with linear array subscripts
• Solve the dependence equation
• Find all non-constant distances
• Create a maximally covering grid and its base vectors
• Create the pseudo distance matrix (PDM) containing all base vectors of the covering grid
• Find independent loops or independent partitions, based on the rank of the PDM
Approach
Dependence analysis: linear dependence equation
• Uniform or constant distance
• Variable or non-constant distance: H = PDM
Loop transformation:
• rank(H) < loop depth? Yes (non-full rank): unimodular transformation
• No (full rank): det(H) > 1? Yes: partitioning transformation
Loop parallelization
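The decision step of the flowchart can be sketched in a few lines. This is a minimal sketch, assuming H is given as a list of integer rows and that rank and determinant are computed by Gaussian elimination over the rationals; the function names, and the "sequential" label for the full rank, det(H) = 1 branch (which the flowchart leaves open), are hypothetical.

```python
from fractions import Fraction


def rank_and_det(H):
    """Rank of integer matrix H (rational Gaussian elimination) and its
    determinant; the determinant is only meaningful when H is square."""
    M = [[Fraction(x) for x in row] for row in H]
    rows, cols = len(M), len(M[0])
    r, det = 0, Fraction(1)
    for c in range(cols):
        if r == rows:
            break
        p = next((k for k in range(r, rows) if M[k][c] != 0), None)
        if p is None:
            det = Fraction(0)  # a column without pivot: singular
            continue
        if p != r:
            M[r], M[p] = M[p], M[r]
            det = -det         # a row swap flips the sign
        det *= M[r][c]
        for k in range(r + 1, rows):
            factor = M[k][c] / M[r][c]
            M[k] = [a - factor * b for a, b in zip(M[k], M[r])]
        r += 1
    return r, det


def choose_transformation(H, depth):
    """Flowchart decision for a PDM H of a loop nest of the given depth."""
    rank, det = rank_and_det(H)
    if rank < depth:
        return "unimodular"    # non-full rank
    if abs(det) > 1:
        return "partitioning"  # full rank and det(H) > 1
    return "sequential"        # hypothetical label: branch not covered by the flowchart


print(choose_transformation([[2, 2]], 2), choose_transformation([[2, 1], [0, 2]], 2))
```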
2. Dependence Analysis
L1: do I1= -N,N
L2:   do I2= -N,N
        A(4I1-I2+3,2I1+I2-2)=…
        …=A(I1+I2-1,I1-I2+2)
      enddo
    enddo
The write is A[f(I)]=… and the read is …=A[g(I)]. Iterations i=(I1,I2) and j=(J1,J2) are dependent when f(i)=g(j):
4I1-I2+3=J1+J2-1
2I1+I2-2=J1-J2+2
In matrix form, iA+a = jB+b with
$$A=\begin{bmatrix}4&2\\-1&1\end{bmatrix},\quad a=(3,\ -2),\quad B=\begin{bmatrix}1&1\\1&-1\end{bmatrix},\quad b=(-1,\ 2)$$
Examples of dependent iterations:
i          j          A[f(i)]=A[g(j)]    d=|j-i|
(1,-5)     (3,10)     A[12,-5]           (2,15)
(3,0)      (9,7)      A[15,4]            (6,7)
(-3,-3)    (-9,4)     A[-6,-11]          (6,-7)
…
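The example pairs can be checked mechanically. The sketch below writes the two subscript functions in the matrix form iA + a = jB + b, with A, a, B, b read off the two dependence equations, and reads |j - i| as the lexicographically positive orientation of j - i, which is what the third pair, where j - i = (-6,7) is reported as (6,-7), suggests. The helper names are illustrative.

```python
# Matrix form of the example: f(i) = iA + a (write), g(j) = jB + b (read).
A = [[4, 2], [-1, 1]]
a = (3, -2)
B = [[1, 1], [1, -1]]
b = (-1, 2)


def affine(v, M, c):
    """Row vector times 2x2 matrix plus offset: v*M + c."""
    return tuple(v[0] * M[0][k] + v[1] * M[1][k] + c[k] for k in range(2))


def distance(i, j):
    """j - i, negated if needed so it is lexicographically positive
    (assumed reading of |j - i|)."""
    d = (j[0] - i[0], j[1] - i[1])
    return d if d > (0, 0) else (-d[0], -d[1])


examples = [((1, -5), (3, 10)), ((3, 0), (9, 7)), ((-3, -3), (-9, 4))]
for i, j in examples:
    # Both iterations touch the same array element.
    assert affine(i, A, a) == affine(j, B, b)
    print(i, j, affine(i, A, a), distance(i, j))
```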
The dependence distance
1. The linear dependence equation:
$$iA + a = jB + b$$
2. Using Banerjee’s unimodular transformation U to obtain an echelon matrix S, the equation tS = (b-a) is solved, yielding:
$$i = tU_l, \qquad j = tU_r$$
• U_l and U_r are the left and right halves of U
• t has a constant part t1 and an unknown (arbitrary) part t2
3. The distance between dependent iterations i and j is:
$$d = |j - i| = |t(U_r - U_l)| = |tF|$$
where |·| takes the lexicographically positive orientation of the vector; for example, j - i = (-6,7) gives d = (6,-7).
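For this small example, the dependence equations can also be solved by direct elimination instead of Banerjee's unimodular method; this is only a hand cross-check, not the method of the slides. Adding and subtracting the two equations gives j as a function of i, and hence a closed form for the distance, in which the free indices I1 and I2 play a role analogous to the arbitrary part t2:

$$
\begin{aligned}
(4I_1 - I_2 + 3) + (2I_1 + I_2 - 2) &= (J_1 + J_2 - 1) + (J_1 - J_2 + 2) &&\Rightarrow\; 6I_1 + 1 = 2J_1 + 1 \;\Rightarrow\; J_1 = 3I_1\\
(4I_1 - I_2 + 3) - (2I_1 + I_2 - 2) &= (J_1 + J_2 - 1) - (J_1 - J_2 + 2) &&\Rightarrow\; 2I_1 - 2I_2 + 5 = 2J_2 - 3 \;\Rightarrow\; J_2 = I_1 - I_2 + 4\\
j - i &= (2I_1,\; 4 + I_1 - 2I_2) = (0,4) + I_1(2,1) - I_2(0,2)
\end{aligned}
$$

For i = (1,-5) this gives j - i = (2, 4 + 1 + 10) = (2,15), matching the first example pair.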
The distance set
1. From the dependence equation tS = (b-a), the solution vector t contains a constant part and an arbitrary part:
$$t = (t_1,\ t_2), \qquad t_1 \text{ constant},\ t_2 \text{ variable}$$
2. The matrix F = U_r - U_l can be split vertically into two sub-matrices:
$$d = |tF|, \qquad F = \begin{bmatrix} F' \\ F'' \end{bmatrix}$$
3. The distance set of the dependence equations is:
$$\Delta = \{\, d = xR \mid d \neq 0,\ x_i \in \mathbb{Z} \,\}, \qquad R = \begin{bmatrix} t_1F' \\ F'' \end{bmatrix} \text{ or } R = F'' \text{ (when } t_1F' = 0\text{)}$$
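A brute-force cross-check of the distance set for the running example, as a sketch: it enumerates all dependent pairs in a [-n, n]² box and confirms that every nonzero distance can be written as d = (1, t3, t4)R with integer t3, t4. The rows of R, (0,4), (2,1) and (0,2), are assumed from the example on the "Calculating the PDM" slide; `distances`, `times_R` and `solve_x` are hypothetical helpers.

```python
from itertools import product

# Rows of R for the running example: t1F' = (0, 4), then F'' = (2, 1), (0, 2).
R = [(0, 4), (2, 1), (0, 2)]


def f(i):  # write subscript A(4I1-I2+3, 2I1+I2-2)
    return (4 * i[0] - i[1] + 3, 2 * i[0] + i[1] - 2)


def g(j):  # read subscript A(I1+I2-1, I1-I2+2)
    return (j[0] + j[1] - 1, j[0] - j[1] + 2)


def distances(n):
    """All nonzero j - i with f(i) == g(j), for i and j in [-n, n]^2."""
    space = list(product(range(-n, n + 1), repeat=2))
    readers = {}
    for j in space:
        readers.setdefault(g(j), []).append(j)
    return {(j[0] - i[0], j[1] - i[1])
            for i in space for j in readers.get(f(i), []) if j != i}


def times_R(x):
    """Row vector x times R."""
    return tuple(sum(x[r] * R[r][k] for r in range(3)) for k in range(2))


def solve_x(d):
    """x = (1, t3, t4) with xR == d, or None if t3, t4 are not integers."""
    if d[0] % 2:
        return None
    t3 = d[0] // 2
    rest = d[1] - 4 - t3  # what t4 * (0, 2) must supply
    return None if rest % 2 else (1, t3, rest // 2)


D = distances(10)
print(len(D), all(solve_x(d) is not None for d in D))
```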
Distances in the iteration space
[Figure: iteration space (i1, i2) of loop L1/L2 with dependence equations 4I1-I2+3=J1+J2-1 and 2I1+I2-2=J1-J2+2. The arrows (I1,I2)→(J1,J2) represent the distance vectors between dependent iterations.]
Distance base vectors
1. The dependence distance is non-constant for the reference pair, e.g. (2,15), (6,7), (6,-7), as highlighted in the iteration-space figure.
2. However, the distance set is covered by the grid generated by the base vectors (2,1) and (0,2).
3. For example,
(2,15) = (2,1) + 7(0,2),
(6,7) = 3(2,1) + 2(0,2),
(6,-7) = 3(2,1) - 5(0,2).
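The decompositions above amount to solving a small triangular integer system. A sketch, assuming the second base vector has a zero first component, as (0,2) does; `coords` is a hypothetical helper.

```python
def coords(d, b1=(2, 1), b2=(0, 2)):
    """Integers (x1, x2) with d == x1*b1 + x2*b2, or None if d is off the grid.
    Assumes b2 has a zero first component (upper-triangular basis)."""
    p, q = b1
    s = b2[1]
    if d[0] % p:
        return None
    x1 = d[0] // p          # fixed by the first component alone
    rest = d[1] - x1 * q    # remainder to be covered by multiples of b2
    if rest % s:
        return None
    return x1, rest // s


for d in [(2, 15), (6, 7), (6, -7), (1, 0)]:
    print(d, coords(d))
```

A result of None, as for (1,0), means the vector is not on the grid, so it cannot be a dependence distance of this loop.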
The largest base vectors
The distance set consists of integer linear combinations of the row vectors of R:
$$\Delta = \{\, d = xR \mid d \neq 0,\ x_i \in \mathbb{Z} \,\}$$
A lattice L(R) is the group of vectors generated by all integer linear combinations of the row vectors of a matrix R:
$$L(R) = \{\, xR \mid x_i \in \mathbb{Z} \,\}$$
We look for the smallest lattice L(R) (generating the largest grid) that covers the whole distance set:
$$\Delta \subseteq L(R)$$
In this way, the spurious dependences introduced by replacing the distance set with a lattice are minimized.
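To illustrate the covering condition and the spurious dependences it may introduce, the sketch below takes the running example with a small bound n = 3, collects the actual distances, checks that they all lie in the lattice generated by (2,1) and (0,2), and lists the lattice points in the same bounding box that are not actual distances. The bound and the helper names are assumptions for illustration.

```python
from itertools import product


def f(i):  # write subscript A(4I1-I2+3, 2I1+I2-2)
    return (4 * i[0] - i[1] + 3, 2 * i[0] + i[1] - 2)


def g(j):  # read subscript A(I1+I2-1, I1-I2+2)
    return (j[0] + j[1] - 1, j[0] - j[1] + 2)


def in_lattice(d):
    """Membership in the lattice of (2,1), (0,2): d = (2a, a + 2b)."""
    return d[0] % 2 == 0 and (d[1] - d[0] // 2) % 2 == 0


n = 3
space = list(product(range(-n, n + 1), repeat=2))
dist = {(j[0] - i[0], j[1] - i[1])
        for i in space for j in space if f(i) == g(j) and i != j}

# Lattice points in the bounding box of the actual distances
xs = [d[0] for d in dist]
ys = [d[1] for d in dist]
box = product(range(min(xs), max(xs) + 1), range(min(ys), max(ys) + 1))
spurious = {d for d in box if d != (0, 0) and in_lattice(d) and d not in dist}

print(all(in_lattice(d) for d in dist), sorted(spurious))
```

With n = 3 the covering holds, and the only spurious points in the box are (2,-3) and (2,3): they belong to the lattice, but the loop bounds leave no pair of iterations at that distance.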
Pseudo Distance Matrix (PDM)
• A Hermite normal form HNF(R) is a full row rank matrix obtained from the echelon form of R by a unimodular transformation:
$$H = \mathrm{HNF}(R)$$
• Because unimodular transformations preserve the lattice, H generates the same lattice as R, that is, the smallest covering lattice. In addition, the rows of the HNF are base vectors of that lattice:
$$L(H) = L(R)$$
• H is called the pseudo distance matrix (PDM) because its row vectors generate the distance set: every distance is an integer combination of them.
• Since the row vectors of H are constant, techniques developed for uniform (constant) distance matrices may apply.
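One way to compute H = HNF(R) is an integer variant of Gaussian elimination that uses only unimodular row operations. The sketch below follows one common row-style convention (upper echelon, positive pivots, entries above each pivot reduced into [0, pivot)), chosen because it reproduces the upper-triangular PDM of the slides; other HNF conventions exist, and `hnf` is a hypothetical helper, not a library function.

```python
def hnf(rows):
    """Row-style Hermite normal form of an integer matrix.

    Uses only unimodular row operations (swaps, negations, adding an
    integer multiple of one row to another), so the result spans the same
    lattice as the input. Zero rows are dropped, pivots are positive and
    entries above each pivot are reduced into [0, pivot)."""
    M = [list(r) for r in rows]
    m, n = len(M), len(M[0])
    top = 0  # index of the next pivot row
    for col in range(n):
        if top == m:
            break
        # Euclid's algorithm down column `col`, from row `top` on.
        while True:
            nz = [r for r in range(top, m) if M[r][col] != 0]
            if not nz:
                break
            p = min(nz, key=lambda r: abs(M[r][col]))
            M[top], M[p] = M[p], M[top]
            for r in range(top + 1, m):
                q = M[r][col] // M[top][col]
                M[r] = [x - q * y for x, y in zip(M[r], M[top])]
            if all(M[r][col] == 0 for r in range(top + 1, m)):
                break
        if M[top][col] == 0:
            continue  # no pivot in this column
        if M[top][col] < 0:
            M[top] = [-x for x in M[top]]
        for r in range(top):  # reduce the entries above the pivot
            q = M[r][col] // M[top][col]
            M[r] = [x - q * y for x, y in zip(M[r], M[top])]
        top += 1
    return [r for r in M if any(r)]


print(hnf([[0, 4], [2, 1], [0, 2]]))
```

Flipping the sign of a row of R does not change the result, since negation is itself a unimodular operation; this matters because the sign of a free parameter such as t4 is arbitrary.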
Calculating the PDM
1. Solving the linear dependence equations, d = tF:
$$(i_1\ i_2\ j_1\ j_2)\begin{bmatrix}4&2\\-1&1\\-1&-1\\-1&1\end{bmatrix}=(-4\ \ 4)$$
2. Expressing the distance set, d = xR:
$$d=(1\ t_3\ t_4)\begin{bmatrix}0&4\\2&1\\0&2\end{bmatrix}, \qquad R=\begin{bmatrix}0&4\\2&1\\0&2\end{bmatrix}$$
3. Finding the largest base vectors, H = HNF(R):
$$\mathrm{HNF}\begin{bmatrix}0&4\\2&1\\0&2\end{bmatrix}=\begin{bmatrix}2&1\\0&2\end{bmatrix}, \qquad \mathrm{PDM}=\begin{bmatrix}2&1\\0&2\end{bmatrix}$$
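The reduction in step 3 can be traced with elementary unimodular row operations. This is one possible sequence; each step keeps the generated lattice unchanged:

$$
\begin{bmatrix}0&4\\2&1\\0&2\end{bmatrix}
\xrightarrow{\ r_1 \leftrightarrow r_2\ }
\begin{bmatrix}2&1\\0&4\\0&2\end{bmatrix}
\xrightarrow{\ r_2 \leftarrow r_2 - 2r_3\ }
\begin{bmatrix}2&1\\0&0\\0&2\end{bmatrix}
\xrightarrow{\ \text{drop zero row}\ }
\begin{bmatrix}2&1\\0&2\end{bmatrix} = H,
\qquad \det(H) = 2 \cdot 2 = 4
$$

The entry 1 above the pivot 2 already lies in [0, 2), so no further reduction is needed.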
3. Loop transformations: unimodular and partitioning
Legality
Any transformation must be legal, i.e. preserve the execution order of dependent iterations.
Transformations depending on rank(H):
3.1 Unimodular transformation: non-full rank PDM
3.2 Partitioning transformation: full rank PDM
3.3 Combined approach
3.1 Unimodular transformation
• Given a non-full rank pseudo distance matrix H (rank r < m, where m is the loop depth), a unimodular matrix T can be constructed such that the first m-r columns of HT are zero.
• As a result, the m-r outermost loops of the transformed nest can be parallelized.
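For a depth-2 nest with a rank-1 PDM H = (h1, h2), one possible construction of T uses the extended Euclidean algorithm: the first column of T is orthogonal to H and the second column carries gcd(h1, h2). This is an illustrative assumption, not necessarily the construction used in the paper; `ext_gcd`, `unimodular_for_row` and `row_times` are hypothetical helpers.

```python
def ext_gcd(x, y):
    """Return (g, a, b) with a*x + b*y == g == gcd(x, y) >= 0."""
    if y == 0:
        return (abs(x), 1 if x >= 0 else -1, 0)
    g, a, b = ext_gcd(y, x % y)
    return (g, b, a - (x // y) * b)


def unimodular_for_row(h):
    """For a rank-1 PDM H = (h1, h2), build an integer 2x2 T with
    det(T) = -1 and HT = (0, gcd(h1, h2))."""
    h1, h2 = h
    g, a, b = ext_gcd(h1, h2)
    # Column 1 is orthogonal to h (zero first entry of HT);
    # column 2 = (a, b) makes the second entry of HT equal to g.
    return [[-h2 // g, a], [h1 // g, b]]


def row_times(h, T):
    """Row vector h times 2x2 matrix T."""
    return tuple(h[0] * T[0][k] + h[1] * T[1][k] for k in range(2))


H = (2, 2)  # PDM of the non-full rank example in the Results
T = unimodular_for_row(H)
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
print(T, det_T, row_times(H, T))
```

For H = (2,2) this construction yields T = [[-1, 0], [1, 1]] rather than the T = [[-1, 1], [1, 0]] used in the Results; both are unimodular and both give HT = (0, 2), since T is not unique.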
3.2 Partitioning transformation
• Given a full rank pseudo distance matrix H, the loop nest can be partitioned into det(H) independent partitions.
• The partitioned parallelism is det(H).
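Because the partitions are the cosets of the lattice L(H) in the iteration space, a partition number can be computed directly from an iteration. A sketch for an upper-triangular 2×2 PDM, assuming the HNF layout [[h11, h12], [0, h22]] with a positive diagonal; `partition_id` is a hypothetical helper. It checks that H = [[2,1],[0,2]] yields det(H) = 4 partitions and that no lattice translate leaves its partition.

```python
from itertools import product


def partition_id(i, H):
    """Coset of iteration i = (I1, I2) modulo the lattice L(H) of an
    upper-triangular 2x2 PDM H = [[h11, h12], [0, h22]]."""
    (h11, h12), (_, h22) = H
    k1, r1 = divmod(i[0], h11)  # I1 = k1*h11 + r1 with 0 <= r1 < h11
    r2 = (i[1] - k1 * h12) % h22
    return r1, r2


H = [[2, 1], [0, 2]]
n = 6
space = list(product(range(-n, n + 1), repeat=2))
ids = {partition_id(i, H) for i in space}
print(len(ids))  # det(H) = 2 * 2 partitions

# Every lattice translate stays in its partition, so no distance in L(H)
# can connect two different partitions.
for i in space:
    for x1, x2 in product(range(-3, 4), repeat=2):
        d = (x1 * H[0][0] + x2 * H[1][0], x1 * H[0][1] + x2 * H[1][1])
        j = (i[0] + d[0], i[1] + d[1])
        assert partition_id(i, H) == partition_id(j, H)
```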
3.3 Combined approach
• After a unimodular transformation of a non-full rank PDM, the transformed PDM has a full rank sub-matrix S.
• When det(S) > 1, additional parallelism can be found with the loop partitioning transformation.
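For the non-full rank example of the Results, with PDM H = (2 2) and the unimodular T given there, the combined approach works out as follows:

$$
HT = (2\ \ 2)\begin{bmatrix}-1&1\\1&0\end{bmatrix} = (0\ \ 2), \qquad S = (2), \qquad \det(S) = 2
$$

The zero first column makes the J1 loop parallel, and det(S) = 2 splits the J2 loop into two partitions (J2 even and J2 odd), each stepping by 2.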
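4. Results
(1) Non-full rank PDM
Original loop:
L1: do I1=-N,N
L2:   do I2=-N,N
        A(3I1+1,2I1+I2-1)=…
        …=A(I1+3,I2+1)
      enddo
    enddo
PDM = (2,2), rank 1 < 2. Unimodular transformation:
$$(2\ \ 2)\begin{bmatrix}1&-1\\0&1\end{bmatrix} = (2\ \ 0), \qquad (2\ \ 0)\begin{bmatrix}0&1\\1&0\end{bmatrix} = (0\ \ 2)$$
$$T = \begin{bmatrix}1&-1\\0&1\end{bmatrix}\begin{bmatrix}0&1\\1&0\end{bmatrix} = \begin{bmatrix}-1&1\\1&0\end{bmatrix}$$
Transformed loop (the first column of HT is zero, so J1 is a doall):
L’1: doall J1=-2N,2N
L’2:   do J2=max(-N,-N-J1),min(N,N-J1)
         I1=J2
         I2=J1+J2
         A(3I1+1,2I1+I2-1)=…
         …=A(I1+3,I2+1)
       enddo
     enddoall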
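Non-full rank: dependence graphs
[Figure: dependence graphs of example (1) before and after the unimodular transformation (axis labels i2, j1, j2)]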
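The transformation of example (1) can be checked by simulation. The sketch below, assuming a small bound N = 4, walks the transformed nest, maps each (J1, J2) back to (I1, I2), and verifies that every original iteration is visited exactly once and that no dependence connects different J1 values, which is what makes the doall on J1 legal. The helper names are illustrative.

```python
from itertools import product

N = 4


def f(i):  # write subscript A(3I1+1, 2I1+I2-1)
    return (3 * i[0] + 1, 2 * i[0] + i[1] - 1)


def g(i):  # read subscript A(I1+3, I2+1)
    return (i[0] + 3, i[1] + 1)


def to_j(i):
    """J = I T with T = [[-1, 1], [1, 0]]: J1 = I2 - I1, J2 = I1."""
    return (i[1] - i[0], i[0])


original = set(product(range(-N, N + 1), repeat=2))

# Walk the transformed nest L'1/L'2 and map back: I1 = J2, I2 = J1 + J2.
visited = []
for j1 in range(-2 * N, 2 * N + 1):
    for j2 in range(max(-N, -N - j1), min(N, N - j1) + 1):
        visited.append((j2, j1 + j2))

# Dependent pairs: iteration i writes the element that iteration k reads.
readers = {}
for k in original:
    readers.setdefault(g(k), []).append(k)
deps = [(i, k) for i in original for k in readers.get(f(i), []) if k != i]

print(len(visited) == len(original), set(visited) == original)
print(all(to_j(i)[0] == to_j(k)[0] for i, k in deps))  # no dependence crosses J1
```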
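(2) Partitioning
Original loop:
L1: do I1=-N,N
L2:   do I2=-N,N
        A(4I1-I2+3,2I1+I2-2)=…
        …=A(I1+I2-1,I1-I2+2)
      enddo
    enddo
$$\mathrm{PDM}=\begin{bmatrix}2&1\\0&2\end{bmatrix}, \qquad \det = 4$$
Partitioned loop:
L’1: doall Io1=0,1
L’2:   doall Io2=0,1
L’3:     do I1=-N+mod(N+Io1,2),N-mod(N-Io1,2),2
           Io’2=Io2+(I1-Io1)/2
L’4:       do I2=-N+mod(N+Io’2,2),N-mod(N-Io’2,2),2
             A(4I1-I2+3,2I1+I2-2)=…
             …=A(I1+I2-1,I1-I2+2)
           enddo
         enddo
       enddoall
     enddoall
The PDM is reduced step by step (det 4 → det 1):
$$\begin{bmatrix}2&1\\0&2\end{bmatrix}\begin{bmatrix}1/2&0\\0&1\end{bmatrix}=\begin{bmatrix}1&1\\0&2\end{bmatrix},\quad \begin{bmatrix}1&1\\0&2\end{bmatrix}\begin{bmatrix}1&-1\\0&1\end{bmatrix}=\begin{bmatrix}1&0\\0&2\end{bmatrix},\quad \begin{bmatrix}1&0\\0&2\end{bmatrix}\begin{bmatrix}1&0\\0&1/2\end{bmatrix}=\begin{bmatrix}1&0\\0&1\end{bmatrix}$$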
Full rank partitioning: dependence graphs
[Figure: dependence graphs of the partitioned iteration space of example (2)]
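A simulation of the partitioned nest of example (2), as a sketch assuming N = 5 and reading mod as the non-negative remainder (Python's %, which agrees with Fortran MOD whenever the argument is non-negative). It checks that each iteration lands in exactly one of the four (Io1, Io2) partitions and that every dependent pair stays within one partition. The helper names are illustrative.

```python
from itertools import product

N = 5


def f(i):  # write subscript A(4I1-I2+3, 2I1+I2-2)
    return (4 * i[0] - i[1] + 3, 2 * i[0] + i[1] - 2)


def g(i):  # read subscript A(I1+I2-1, I1-I2+2)
    return (i[0] + i[1] - 1, i[0] - i[1] + 2)


def partitioned(n):
    """Iterations of L'3/L'4 for each (Io1, Io2) of the two doall loops."""
    parts = {}
    for io1, io2 in product((0, 1), repeat=2):
        its = []
        for i1 in range(-n + (n + io1) % 2, n - (n - io1) % 2 + 1, 2):
            io2p = io2 + (i1 - io1) // 2  # Io'2; exact since i1 - io1 is even
            for i2 in range(-n + (n + io2p) % 2, n - (n - io2p) % 2 + 1, 2):
                its.append((i1, i2))
        parts[(io1, io2)] = its
    return parts


parts = partitioned(N)
where = {it: key for key, its in parts.items() for it in its}
space = set(product(range(-N, N + 1), repeat=2))
count = sum(len(its) for its in parts.values())

readers = {}
for k in space:
    readers.setdefault(g(k), []).append(k)
deps = [(i, k) for i in space for k in readers.get(f(i), []) if k != i]

print(count == len(space), set(where) == space)
print(all(where[i] == where[k] for i, k in deps))  # dependences stay inside a partition
```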
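(3) Combined
After the unimodular transformation of (1):
L’1: doall J1=-2N,2N
L’2:   do J2=max(-N,-N-J1),min(N,N-J1)
         I1=J2
         I2=J1+J2
         A(3I1+1,2I1+I2-1)=…
         …=A(I1+3,I2+1)
       enddo
     enddoall
$$\mathrm{PDM}=(2\ \ 2), \qquad T=\begin{bmatrix}-1&1\\1&0\end{bmatrix}, \qquad \mathrm{PDM}\,T=(0\ \ 2)$$
The full rank sub-matrix is S = (2), with det = 2. After partitioning the J2 loop:
L’’1: doall Jo2=0,1
L’’2:   doall J1=-2N,2N
          p2=max(-N,-N-J1)
          q2=min(N,N-J1)
L’’3:     do J2=p2+modulo(Jo2-p2,2),q2-modulo(q2-Jo2,2),2
            I1=J2
            I2=J1+J2
            A(3I1+1,2I1+I2-1)=…
            …=A(I1+3,I2+1)
          enddo
        enddoall
      enddoall
$$(0\ \ 2)\begin{bmatrix}1&0\\0&1/2\end{bmatrix}=(0\ \ 1), \qquad \det: 2 \rightarrow 1$$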
Full rank sub-matrix: dependence graph
[Figure: dependence graphs of example (3) in the (j1, j2) iteration space]
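The same kind of check for the combined nest of example (3), assuming N = 4. Here the arguments Jo2-p2 and q2-Jo2 can be negative, so the sketch uses Python's %, which behaves like Fortran MODULO (non-negative for a positive divisor). It confirms that every iteration is covered exactly once, split by the parity of J2 = I1, and that no dependence crosses the two Jo2 partitions. The helper names are illustrative.

```python
from itertools import product

N = 4


def f(i):  # write subscript A(3I1+1, 2I1+I2-1)
    return (3 * i[0] + 1, 2 * i[0] + i[1] - 1)


def g(i):  # read subscript A(I1+3, I2+1)
    return (i[0] + 3, i[1] + 1)


def combined(n):
    """Iterations (I1, I2) of L''3 for each partition Jo2 (all J1 merged)."""
    parts = {}
    for jo2 in (0, 1):
        its = []
        for j1 in range(-2 * n, 2 * n + 1):
            p2, q2 = max(-n, -n - j1), min(n, n - j1)
            # Python % is the non-negative remainder, i.e. Fortran MODULO.
            start = p2 + (jo2 - p2) % 2
            stop = q2 - (q2 - jo2) % 2
            for j2 in range(start, stop + 1, 2):
                its.append((j2, j1 + j2))  # I1 = J2, I2 = J1 + J2
        parts[jo2] = its
    return parts


parts = combined(N)
where = {it: key for key, its in parts.items() for it in its}
space = set(product(range(-N, N + 1), repeat=2))
count = sum(len(its) for its in parts.values())

readers = {}
for k in space:
    readers.setdefault(g(k), []).append(k)
deps = [(i, k) for i in space for k in readers.get(f(i), []) if k != i]

print(count == len(space), set(where) == space)
print(all(where[i] == where[k] for i, k in deps))
```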
5. Conclusion
• The distances between dependent iterations can be non-constant even when the array subscripts are linear.
• A pseudo distance matrix (PDM) containing the largest base vectors of the distance space is computed from the linear dependence equations.
• Parallelism can still be exploited in these loops with variable distances by unimodular and partitioning transformations derived from the PDM.