Notes on Loop Tiling

Loop tiling can be viewed as a two-step transformation:

1. Split each loop of the loop nest into a pair of adjacent loops: an outer loop (the tiling loop) that traverses tiles (blocks), and an inner loop (the intra-tile loop) that covers the iteration points within a tile. This step is always valid, since it does not change the relative order of execution of the points in the iteration space.

2. Perform loop permutation, moving the tiling loops outwards.

As an example, consider tiling the loops for matrix-vector multiplication with tile size T (the loop bounds below assume that T divides N). The original loop nest is:

    for i = 1 to N
      for j = 1 to N
        y(i) += A(i,j)*x(j)
      endfor
    endfor

After step 1 (splitting each loop into a tiling loop and an intra-tile loop):

    for it = 1 to N by T
      for i = it to it+T-1
        for jt = 1 to N by T
          for j = jt to jt+T-1
            y(i) += A(i,j)*x(j)
          endfor
        endfor
      endfor
    endfor

After step 2 (permuting the tiling loops outwards):

    for it = 1 to N by T
      for jt = 1 to N by T
        for i = it to it+T-1
          for j = jt to jt+T-1
            y(i) += A(i,j)*x(j)
          endfor
        endfor
      endfor
    endfor

The validity of the second step (permutation) depends on the data dependences of the loop nest: they must be preserved in the permuted loop nest. One approach to checking the validity of the permutation step would be to explicitly form the dependence vectors for the intermediate loop nest (with 2n components, if the original loop nest has n nested loops), and then check whether the permuted dependence vectors are lexicographically positive. However, a single constant dependence vector in the original n-deep loop nest can result in several dependence vectors in the intermediate 2n-deep loop nest.

So instead of explicitly forming the dependence vectors in the 2n-dimensional iteration space of the intermediate loop nest, we take a different, simpler approach to developing a sufficient condition for the validity of loop tiling: consider all possible extreme cases of tile sizes, where all but one of the tile extents is one, and one tile extent is N. Each of these extreme cases corresponds simply to a permutation of the original loop nest.
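
The tiled matrix-vector nest and its extreme tile sizes can be explored with a small Python sketch. It is only an illustration, not part of the original notes: the helper names `tiled_order` and `matvec` are hypothetical, and the `min(...)` clamp on the intra-tile bounds is an added assumption so that tile sizes that do not divide N still cover every point exactly once.

```python
def tiled_order(n, ti, tj):
    """Return the (i, j) points in the order visited by the tiled nest (it, jt, i, j)."""
    order = []
    for it in range(1, n + 1, ti):                        # tiling loop over i
        for jt in range(1, n + 1, tj):                    # tiling loop over j
            # Intra-tile loops; min() clamps partial tiles (assumption, not in the notes).
            for i in range(it, min(it + ti, n + 1)):
                for j in range(jt, min(jt + tj, n + 1)):
                    order.append((i, j))
    return order


def matvec(a, x, order):
    """Accumulate y(i) += A(i,j) * x(j), visiting the points in the given order."""
    y = [0] * len(a)
    for i, j in order:
        y[i - 1] += a[i - 1][j - 1] * x[j - 1]            # 1-based indices, as in the notes
    return y


n = 4
ij_order = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]  # original nest
ji_order = [(i, j) for j in range(1, n + 1) for i in range(1, n + 1)]  # permuted nest

a = [[i * n + j for j in range(n)] for i in range(n)]     # small integer test matrix
x = [1, 2, 3, 4]

# A 2 x 2 tiling visits the same points in a different order and gives the same y.
same_result = matvec(a, x, tiled_order(n, 2, 2)) == matvec(a, x, ij_order)
# Extreme tile sizes reproduce plain permutations of the original nest.
one_by_n_is_ij = tiled_order(n, 1, n) == ij_order
n_by_one_is_ji = tiled_order(n, n, 1) == ji_order
```

Integer data is used so that the results can be compared exactly; with floating-point data a reordered accumulation may differ in the last bits even though the transformation is valid.
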
For example, for a 2D loop nest such as the one shown above, the two extremes for a 2D Ti x Tj tile are 1 x N and N x 1. If the tile size is chosen to be N x 1 (i.e. the tile extent along “i” is N and along “j” is 1), the tiled version essentially corresponds to the “ji” permuted version of the original loop nest: the “it” and “j” loops degenerate to a single iteration each. Similarly, the 1 x N case corresponds to the original loop nest without any permutation. For a triply nested loop, the various extreme cases for the 6-deep tiled version correspond to the 6 possible permutations of the original loop nest. So a sufficient condition for the validity of loop tiling is that the given loop nest be fully permutable.

Sometimes the loop nest is not fully permutable, but a contiguous subset of its loops is fully permutable among themselves. If so, partial tiling is possible. Partial tiling involves the same two-step process as seen above, performed on the subset of loops being tiled, with all other loops left unchanged. For example, consider tiling the middle two loops of a 4-deep loop nest. The original loop nest is:

    for i = 1 to N
      for j = 1 to N
        for k = 1 to N
          for l = 1 to N
            S(i,j,k,l)
          endfor
        endfor
      endfor
    endfor

The partially tiled loop nest is:

    for i = 1 to N
      for jt = 1 to N by T
        for kt = 1 to N by T
          for j = jt to jt+T-1
            for k = kt to kt+T-1
              for l = 1 to N
                S(i,j,k,l)
              endfor
            endfor
          endfor
        endfor
      endfor
    endfor

The sufficient condition for the validity of this tiling is: given any dependence vector (di,dj,dk,dl) for the original loop nest, the permuted dependence vector (di,dk,dj,dl) must be lexicographically positive.
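
The sufficient condition above can be checked mechanically when the dependences are given as constant distance vectors. The following Python sketch is one possible way to do so; the function names and the example vectors are hypothetical, and it assumes that only loop-carried (nonzero) dependence vectors are passed in.

```python
from itertools import permutations


def is_lex_positive(vec):
    """True if the first nonzero component of vec is positive."""
    for d in vec:
        if d != 0:
            return d > 0
    return False  # an all-zero vector is not lexicographically positive


def band_is_fully_permutable(deps, start, stop):
    """Check that every reordering of loops start..stop-1 keeps all vectors lex positive.

    deps  : list of dependence vectors, one component per loop (outermost first)
    start : index of the first loop in the contiguous band (0-based)
    stop  : index one past the last loop in the band
    """
    for dep in deps:
        for perm in permutations(range(start, stop)):
            # Permute only the band; loops outside it keep their positions.
            permuted = list(dep[:start]) + [dep[p] for p in perm] + list(dep[stop:])
            if not is_lex_positive(permuted):
                return False
    return True


# Loops are ordered (i, j, k, l); the band 1..3 is the middle pair (j, k).
carried_by_i = [(1, -1, 1, 0)]   # i carries the dependence, so j and k may be swapped
blocks_jk = [(0, 1, -1, 0)]      # swapping j and k gives (0, -1, 1, 0): not positive

ok_middle = band_is_fully_permutable(carried_by_i, 1, 3)
bad_middle = band_is_fully_permutable(blocks_jk, 1, 3)
```

For the two-loop band used in the text, the only nontrivial permutation is the swap of dj and dk, so this check reduces to the stated condition on (di,dk,dj,dl). Passing `start=0` and `stop` equal to the nest depth tests full permutability of the whole nest.
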