Computer Engineering Department - Eastern Mediterranean University

advertisement
EASTERN MEDITERRANEAN UNIVERSITY
Department of Computer Engineering
Quiz1 CMPE-523 Parallel Programming 16.04.2014 (2 points, 90 min)
Student’s Name-Surname _____________________________________
Student’s Id _________________________________________
Instructor Alexander Chefranov
Task 1. (0.5 point). Consıder an expression ( A  B * C  C  D * A) * ( D  G * B) . Draw
dependence graph for the expression, give its size and depth. Generate code for evaluation of the
expression using instructions of the format:
Op1= op2 ”operation” op3,
where op3 and op2 are operands of the “operation”, and op1 is a variable for keeping result of the
operation. Assume that a computer has 2 adders and 2 multipliers. Addition takes 1 time unit, and
multiplication takes 3 time units. Draw a time diagram showing execution of the code for the
expression.
A
+
+
B
C
*
+
*
*
D
G
*
Size=8,depth=5
S1: t1=B*C
S2: t1=a+t1
S3: t1=c+t1
S4: t2=a*d
S5: t1=t1+t2
S6: t3=b*g
S7: t3=t3+d
S8: t1=t1*t3
A1
A2
M1
S6
S6
M2
S1
S1
+
S6
S1
S7
S2
S3
S4
S4
S5
S4
S8
S8
S8
1
Task 2. (0.5 point). Draw a dependence graph for a scalar product calculation
8
p   xi yi
i 1
Write SIMD pseudocode for its calculation. Assume that addition takes 1 time unit, and
multiplication takes 3 time units. What is the minimal number  of processors providing maximal
performance for that program? Estimate speedup and efficiency for that number  of processors.
X1
y1
*
X2
*
+
Y2
X3
Y3
*
X4
Y4
*
X5
*
Y5
*
+
+
+
|
+
X6
Y6
X7
*
Y7
X8
Y8
*
+
+
X(i)=x(i)*y(i),(i=1,8)
X(2i-1)=x(2i-1)+x(2i),(i=1,4)
X(4i+1)=x(4i+1)+x(4i+3),(i=0,1)
P=x(1)+x(5)
 8
T1=8*3+7=31
T8=3+3=6
S8=31/6
E8=s8/8=31/48
2
Task 3. (0.5 point). Draw a data dependence diagram for an iterative solution of a onedimensional Poisson’s equation below assuming N=6, M=3 and taking the whole assignment as
one operation.Write a SIMD program for it.
For k:=1 step 1 until M
For i:=1 step 1 until N-1
a[i]:=(a[i-1]+a[i+1])/2+b[i];
0
1
2
3
4
5
6
Layers of mutually parallel operations are shown by connecting them not-arrowed lines
Respective SIMD code follows:
A(1)=(a(0)+a(2))/2+b(1)
A(2)=(a(1)+a(3))/2+b(2)
A(2i-1)=(a(2i-2)+a(2i))/2+b(2i-1),(i=1,2)
A(2i)=(a(2i-1)+a(2i+1))/2+b(2i),(i=1,2)
A(2i-1)=(a(2i-2)+a(2i))/2+b(2i-1),(i=1,3)
A(2i)=(a(2i-1)+a(2i+1))/2+b(2i),(i=1,2)
A(2i+1)=(a(2i)+a(2i+2))/2+b(2i+1),(i=1,2)
A(4)=(a(3)+a(5))/2+b(4)
A(5)=(a(4)+a(6))/2+b(5)
3
Task 4. (0.5 point). A SIMD computer’s vector unit executes with the rate 120 MFLOPS, and its
scalar unit averages at 15 MFLOPS. What is the average execution rate of that machine on an
algorithm with 120 M floating point operations for vector execution and 15 M floating point
operations for serial execution? Provide details of your calculations together with necessary
explanations
Ra=W/T=(S+P)/T
(1)
T=S/Rs+P/Rp
(2)
Where W=135 M is the total number of operations, S=15 M is the number of serial operations,
P=120 M is the number of parallel operations, Rs=15 MFLOPS is the scalar rate, Rp=120
MFLOPS is the rate of parallel operations execution, Ra is the average rate to be calculated
From (1), (2), we get
Ra=1/(f/Rs+(1-f)/Rp),
(3)
Where
F=S/(S+P)=15/135=1/9
(4)
is the fraction of serial operations in the algorithm
From (3), (4), we get
Ra=1/(1/(9*15)+8/(9*120))=9*120/(8+8)=9*120/16=9*30/4=9*15/2=135/2=67.5 MFLOPS
This we can obtain also since from (2), T=15/15+120/120=2 s, and from (1), Ra=W/T=135 M/2
s=67.5 MFLOPS
4
Download