CS 201 Compiler Construction
Array Dependence Analysis & Loop Parallelization

Loop Parallelization

Goal: Identify loops whose iterations can be executed in parallel on different processors of a shared-memory multiprocessor system.

Matrix Multiplication

    for I = 1 to n do          -- parallel
      for J = 1 to n do        -- parallel
        for K = 1 to n do      -- not parallel
          C[I,J] = C[I,J] + A[I,K]*B[K,J]

The K loop is not parallel because every K iteration reads and writes the same element C[I,J].

Data Dependences

Flow Dependence (write, then read):
    S1: X = …
    S2: … = X
    S1 δf S2

Anti Dependence (read, then write):
    S1: … = X
    S2: X = …
    S1 δa S2

Output Dependence (write, then write):
    S1: X = …
    S2: X = …
    S1 δo S2

Example: Data Dependences

    do I = 1, 40
    S1:   A(I+1) = …
    S2:   … = A(I-1)
    enddo
    S1 δf S2   (S1 writes A(I+1) in iteration I; S2 reads it in iteration I+2)

    do I = 1, 40
    S1:   A(I-1) = …
    S2:   … = A(I+1)
    enddo
    S2 δa S1   (S2 reads A(I+1) in iteration I; S1 overwrites it in iteration I+2)

    do I = 1, 40
    S1:   A(I+1) = …
    S2:   A(I-1) = …
    enddo
    S1 δo S2   (S1 writes A(I+1) in iteration I; S2 writes it again in iteration I+2)

Sets of Dependences

    do I = 1, 100
    S:    A(I) = B(I+2) + 1
    T:    B(I) = A(I-1) - 1
    enddo

    S(1):   A(1) = B(3) + 1
    T(1):   B(1) = A(0) - 1
    S(2):   A(2) = B(4) + 1
    T(2):   B(2) = A(1) - 1
    S(3):   A(3) = B(5) + 1
    T(3):   B(3) = A(2) - 1
    …
    S(100): A(100) = B(102) + 1
    T(100): B(100) = A(99) - 1

S δf T, due to A().
Set of iteration pairs associated with this dependence: {(i,j): j=i+1, 1<=i<=99}
Dependence distance: j-i = 1, constant in this case.

Nested Loops

    level 1: do I1 = 1, 100
    level 2:   do I2 = 1, 50
    S:           A(I1,I2) = A(I1,I2-1) + B(I1,I2)
               enddo
             enddo

The value computed by S in iteration (i1,i2) is the same as the value used in iteration (j1,j2):
    A(i1,i2) ≡ A(j1,j2-1) iff i1=j1 and i2=j2-1

• S is flow dependent on itself at level 2 (the inner loop): for a fixed value of I1, the dependence exists between different iterations of the inner loop.
• Iteration pairs: {((i1,i2),(j1,j2)): j1=i1, j2=i2+1, 1<=i1<=100, 1<=i2<=49}

Nested Loops Contd..

Iteration pairs: {((i1,i2),(j1,j2)): j1=i1, j2=i2+1, 1<=i1<=100, 1<=i2<=49}

• Dependence distance vector: (j1-i1, j2-i2) = (0,1)
• There is no dependence at level 1.
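The iteration pairs and distance vector for the nested loop above can be checked by brute force. The following is a minimal Python sketch, not part of the slides: the function name `flow_dependence_pairs` and its parameters are hypothetical, and it assumes each array element is written at most once (true for this loop), so remembering the single writer of each element is enough.

```python
from itertools import product

def flow_dependence_pairs(n1=100, n2=50):
    """Enumerate flow dependences of S on itself for
    S: A(I1,I2) = A(I1,I2-1) + B(I1,I2), I1 = 1..n1, I2 = 1..n2."""
    writer = {}   # array element -> iteration that wrote it
    pairs = []    # (source iteration, sink iteration)
    # product() yields iterations in lexicographic order, i.e. execution order.
    for it in product(range(1, n1 + 1), range(1, n2 + 1)):
        i1, i2 = it
        read_elem = (i1, i2 - 1)          # element read by S in this iteration
        if read_elem in writer:           # written by an earlier iteration
            pairs.append((writer[read_elem], it))
        writer[(i1, i2)] = it             # element written by S in this iteration
    return pairs

pairs = flow_dependence_pairs()
distances = {(j1 - i1, j2 - i2) for (i1, i2), (j1, j2) in pairs}
print(len(pairs), distances)   # 4900 {(0, 1)}
```

The 4900 pairs correspond to 1<=i1<=100 and 1<=i2<=49, and the only distance vector is (0,1), so the dependence is carried by the inner loop only.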
Computing Dependences: Formulation

    level 1: do I1 = 1, 8
    level 2:   do I2 = max(I1-3,1), min(I1,5)
                 A(I1+1,I2+1) = A(I1,I2) + B(I1,I2)
               enddo
             enddo

Can we find iterations (i1,i2) & (j1,j2) such that
    i1+1 = j1;  i2+1 = j2
and
    1 <= i1 <= 8          1 <= j1 <= 8
    i1-3 <= i2 <= i1      j1-3 <= j2 <= j1
    i1-3 <= i2 <= 5       j1-3 <= j2 <= 5
    1 <= i2 <= i1         1 <= j2 <= j1
    1 <= i2 <= 5          1 <= j2 <= 5
and i1, i2, j1, j2 are integers.

Computing Dependences Contd..

• Dependence testing is an integer programming problem, which is NP-complete.
• Practical solutions trade off the speed and accuracy of the solver.
• False positives: imprecise tests may report dependences that do not actually exist. A conservative test may report false positives but must never miss a dependence.
• Dependence tests: extreme value test; GCD test; Generalized GCD test; Lambda test; Delta test; Power test; Omega test etc.

Extreme Value Test

Approximate test: it guarantees that a solution exists, but the solution may be a real solution rather than an integer solution.

Let f: R^n → R such that f is bounded on a set S contained in R^n.
Let b be a lower bound of f in S and B be an upper bound of f in S:
    b ≤ f(ā) ≤ B for any ā ∈ S
For a real number c, the equation f(ā) = c has real solutions in S iff b ≤ c ≤ B, provided f is continuous, S is connected, and b and B are the minimum and maximum of f on S (as for a linear f over a region defined by loop bounds).

Extreme Value Test Contd..

Example:
    f: R^2 → R; f(x,y) = 2x + 3y
    S = {(x,y): 0<=x<=1 & 0<=y<=1} contained in R^2
    lower bound, b=0; upper bound, B=5

1. Is there a real solution to the equation 2x + 3y = 4? Yes. (0.5,1) is a solution; there are many others.
2. Is there an integer solution in S? No. f(0,0)=0; f(0,1)=3; f(1,0)=2; & f(1,1)=5. For none of the integer points in S is f(x,y)=4.

If there are no real solutions, there are no integer solutions.
If there are real solutions, then integer solutions may or may not exist.

Extreme Value Test Contd..

Example:
    DO I = 1, 10
      DO J = 1, 10
        A[10*I+J-5] = … A[10*I+J-10] …

    10*I1+J1-5 = 10*I2+J2-10
    10*I1-10*I2+J1-J2 = -5

    f: R^4 → R; f(I1,I2,J1,J2) = 10*I1-10*I2+J1-J2
    1<=I1,I2,J1,J2<=10; lower bound, b=-99; upper bound, B=+99

Since -99 <= -5 <= +99, the test reports a dependence (here one really exists: I1 = I2, J2 = J1 + 5).
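The two examples above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the course's implementation: the function names `extreme_values`, `extreme_value_test`, and `has_integer_solution` are hypothetical, f is assumed linear, and each variable is assumed to range over an independent interval (a box), so the extremes are reached by picking an endpoint per term. The brute-force integer check is included only to contrast real and integer solutions on these small examples.

```python
from itertools import product

def extreme_values(coeffs, bounds):
    """Minimum and maximum of f(x) = sum(c_k * x_k) over a box.
    Each term reaches its extreme at an endpoint of its interval."""
    lo = sum(c * (l if c >= 0 else u) for c, (l, u) in zip(coeffs, bounds))
    hi = sum(c * (u if c >= 0 else l) for c, (l, u) in zip(coeffs, bounds))
    return lo, hi

def extreme_value_test(coeffs, bounds, c):
    """True if f(x) = c may have a solution (a real one exists in the box)."""
    lo, hi = extreme_values(coeffs, bounds)
    return lo <= c <= hi

def has_integer_solution(coeffs, bounds, c):
    """Exhaustive check over all integer points in the box (small boxes only)."""
    ranges = [range(l, u + 1) for l, u in bounds]
    return any(sum(a * x for a, x in zip(coeffs, point)) == c
               for point in product(*ranges))

# f(x,y) = 2x + 3y on the unit square, c = 4
ex1 = ([2, 3], [(0, 1), (0, 1)], 4)
# f(I1,I2,J1,J2) = 10*I1 - 10*I2 + J1 - J2 with all variables in 1..10, c = -5
ex2 = ([10, -10, 1, -1], [(1, 10)] * 4, -5)

for coeffs, bounds, c in (ex1, ex2):
    print(extreme_values(coeffs, bounds),
          extreme_value_test(coeffs, bounds, c),
          has_integer_solution(coeffs, bounds, c))
# (0, 5) True False
# (-99, 99) True True
```

In the first example the test answers "maybe" although no integer solution exists, which is a false positive; in the second the reported dependence is real.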
Extreme Value Test Contd..

Nested Loops - Multidimensional Arrays

Nested Loops Contd..