EASTERN MEDITERRANEAN UNIVERSITY
Department of Computer Engineering
Quiz 1, CMPE-523 Parallel Programming
16.04.2014 (2 points, 90 min)

Student's Name-Surname _____________________________________
Student's Id _________________________________________
Instructor: Alexander Chefranov

Task 1. (0.5 point). Consider the expression (A + B*C + C + D*A) * (D + G*B). Draw the dependence graph for the expression and give its size and depth. Generate code for evaluating the expression using instructions of the format op1 = op2 "operation" op3, where op2 and op3 are the operands of the operation and op1 is the variable that holds its result. Assume that the computer has 2 adders and 2 multipliers. Addition takes 1 time unit, and multiplication takes 3 time units. Draw a time diagram showing execution of the code for the expression.

[Figure: dependence graph of the expression, with operand leaves A, B, C, D, G and operation nodes + and *]

Size = 8, depth = 5

    S1: t1 = B*C
    S2: t1 = A+t1
    S3: t1 = C+t1
    S4: t2 = A*D
    S5: t1 = t1+t2
    S6: t3 = B*G
    S7: t3 = t3+D
    S8: t1 = t1*t3

[Figure: time diagram with rows A1, A2 (adders) and M1, M2 (multipliers); S6 starts on M1 and S1 starts on M2, followed by S7, S2, S3, S4, S5, and S8]

Task 2. (0.5 point). Draw a dependence graph for the scalar product calculation $p = \sum_{i=1}^{8} x_i y_i$. Write SIMD pseudocode for its calculation. Assume that addition takes 1 time unit, and multiplication takes 3 time units. What is the minimal number of processors providing maximal performance for that program? Estimate speedup and efficiency for that number of processors.

[Figure: dependence graph of the scalar product: eight multiplications x_i * y_i feeding a binary tree of seven additions]

    x(i) = x(i)*y(i), (i=1,8)
    x(2i-1) = x(2i-1)+x(2i), (i=1,4)
    x(4i+1) = x(4i+1)+x(4i+3), (i=0,1)
    p = x(1)+x(5)

Minimal number of processors: 8

    T1 = 8*3 + 7 = 31
    T8 = 3 + 3 = 6
    S8 = T1/T8 = 31/6
    E8 = S8/8 = 31/48

Task 3. (0.5 point). Draw a data dependence diagram for the iterative solution of the one-dimensional Poisson's equation below, assuming N=6, M=3 and taking the whole assignment as one operation. Write a SIMD program for it.

    For k:=1 step 1 until M
        For i:=1 step 1 until N-1
            a[i]:=(a[i-1]+a[i+1])/2+b[i];

[Figure: data dependence diagram over the array indices 0 to 6]

Layers of mutually parallel operations are shown by connecting them with lines without arrows. The corresponding SIMD code follows:

    a(1) = (a(0)+a(2))/2+b(1)
    a(2) = (a(1)+a(3))/2+b(2)
    a(2i-1) = (a(2i-2)+a(2i))/2+b(2i-1), (i=1,2)
    a(2i) = (a(2i-1)+a(2i+1))/2+b(2i), (i=1,2)
    a(2i-1) = (a(2i-2)+a(2i))/2+b(2i-1), (i=1,3)
    a(2i) = (a(2i-1)+a(2i+1))/2+b(2i), (i=1,2)
    a(2i+1) = (a(2i)+a(2i+2))/2+b(2i+1), (i=1,2)
    a(4) = (a(3)+a(5))/2+b(4)
    a(5) = (a(4)+a(6))/2+b(5)

Task 4. (0.5 point). A SIMD computer's vector unit executes at a rate of 120 MFLOPS, and its scalar unit averages 15 MFLOPS. What is the average execution rate of that machine on an algorithm with 120 M floating point operations for vector execution and 15 M floating point operations for serial execution? Provide details of your calculations together with the necessary explanations.

    Ra = W/T = (S+P)/T        (1)
    T = S/Rs + P/Rp           (2)

where W = 135 M is the total number of operations, S = 15 M is the number of serial operations, P = 120 M is the number of parallel operations, Rs = 15 MFLOPS is the scalar rate, Rp = 120 MFLOPS is the rate of execution of parallel operations, and Ra is the average rate to be calculated.

From (1) and (2), we get

    Ra = 1/(f/Rs + (1-f)/Rp),   (3)

where

    f = S/(S+P) = 15/135 = 1/9  (4)

is the fraction of serial operations in the algorithm.

From (3) and (4), we get

    Ra = 1/(1/(9*15) + 8/(9*120)) = 9*120/(8+8) = 9*120/16 = 9*30/4 = 9*15/2 = 135/2 = 67.5 MFLOPS

The same result follows directly: from (2), T = 15/15 + 120/120 = 2 s, and from (1), Ra = W/T = 135 M / 2 s = 67.5 MFLOPS.
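
The Task 4 calculation can be checked numerically. The following is a minimal Python sketch, not part of the original solution; the function names average_rate and average_rate_from_fraction are hypothetical and chosen only for illustration. It evaluates Ra both from equations (1) and (2) and from equation (3).

```python
def average_rate(serial_mflop, vector_mflop, scalar_rate, vector_rate):
    """Average rate in MFLOPS from equations (1) and (2): Ra = (S+P)/T."""
    # Equation (2): total time is serial time plus vector time, in seconds.
    total_time = serial_mflop / scalar_rate + vector_mflop / vector_rate
    # Equation (1): total work divided by total time.
    return (serial_mflop + vector_mflop) / total_time


def average_rate_from_fraction(f, scalar_rate, vector_rate):
    """Average rate in MFLOPS from equation (3), f being the serial fraction."""
    return 1.0 / (f / scalar_rate + (1.0 - f) / vector_rate)


# Values from the task: S = 15 M, P = 120 M, Rs = 15 MFLOPS, Rp = 120 MFLOPS.
ra_direct = average_rate(15, 120, 15, 120)
ra_fraction = average_rate_from_fraction(15 / 135, 15, 120)

print(ra_direct, ra_fraction)
```

Both values should agree with the 67.5 MFLOPS obtained above, up to floating-point rounding in the second form.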