Notes on Loop Tiling

advertisement
Notes on Loop Tiling
Loop tiling can be viewed as a two-step transformation:
1. Split each loop of a nested loop-set into a pair of adjacent loops in the loop nest, with the
outer loop (tiling loop) traversing tiles (blocks), and the inner loop (intra-tile loop)
covering the iteration points within the tile. This transformation is always valid, since it
does not at all change the relative order of execution of the points in the iteration space.
2. Perform loop permutation, moving the tiling loops outwards.
As an example, consider the tiling of the loops for matrix-vector multiplication:
for i = 1 to N
for j = 1 to N
y(i) += A(I,j)*x(j)
endfor
endfor
for it = 1 to N by T
for i = it to it+T-1
for jt = 1 to N
for j= jt to jt+T-1
y(i) += A(i,j)*x(j)
endfor
endfor
endfor
endfor
for it = 1 to N by T
for jt = 1 to N
for i = it to it+T-1
for j= jt to jt+T-1
y(i) += A(i,j)*x(j)
endfor
endfor
endfor
endfor
The validity of the second step (permutation) depends on the data dependences of the loop nest –
they must be preserved for the permuted loop nest. One approach to check for validity of the
permutation step would be to explicitly form the dependence vectors for the intermediate loop
nest (with 2N components, if the original loop nest has N nested loops), and then check if the
permuted dependence vectors are lexicographically positive. However, a single constant
dependence vector in the original n-nest loop can result in a number of dependence vectors in the
intermediate 2n-nest loop. So instead of explicitly forming the dependence vectors in the 2Ndimensional iteration space of the intermediate loop nest, we take a different, simpler approach to
developing a sufficient condition for validity of loop tiling: consider all possible extreme cases of
tile sizes, where all but one of the tile extents is one, and one tile extent is N. Each of these
extreme cases corresponds simply to a permutation of the original loop nest.
For example, for a 2D loop such as shown above, the two extremes for a 2D Ti x Tj tile are 1 x N
and N x 1. If the tile size is chosen to be N x 1 (i.e. tile extent along “i” is N and along “j” is 1),
the tiled version essentially corresponds to the “ji” permuted version of the original loop – the “it”
and “j” loops degenerate to having only one iteration each. Similarly, the 1 x N case corresponds
to the original loop without any permutation.
For a triply nested loop, the various extreme cases for the 6-nested tiled version correspond to the
6 possible permutations of the original loop.
So the sufficient condition for validity of loop tiling is that the given loop nest be fully
permutable. Sometimes the loop nest is not fully permutable, but contiguous subsets of the loops
are fully permutable among themselves. If so, partial tiling is possible. Partial tiling involves the
same two-step process as seen above, performed among the subset of loops being tiled, with all
other loops being unchanged. For example, tiling the middle two loops of a 4-nested loop is
shown below:
for i = 1 to N
for j = 1 to N
for k = 1 to N
for l = 1 to N
S(i,j,k,l)
endfor
endfor
endfor
endfor
for i = 1 to N
for jt = 1 to N by T
for kt = 1 to N by T
for j = jt to jt+T-1
for k = kt to kt+T-1
for l = 1 to N
S(i,j,k,l)
endfor
endfor
endfor
endfor
endfor
endfor
The sufficient condition for validity of this tiling is: given any dependence vector (di,dj,dk,dl) for
the loop on the left, the permuted dependence vector (di,dk,dj,dl) must be lexicographically
positive.
Download