Uploaded by abdelrhman yousry

HPC 2021 Final Answers

Q1)
1. a
2. b
3. d (c and d should both be acceptable?)
4. a
5. 3 levels, each node has 4 children : 4^0 (𝑟𝑜𝑜𝑡) + 4^1 + 4^2 + 4^3 = 85
6. d
7. d
8. b
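The count in item 5 can be checked by summing the nodes level by level. This is only a minimal sketch; the function name `complete_tree_nodes` is illustrative and does not come from the exam.

```python
def complete_tree_nodes(branching: int, depth: int) -> int:
    """Count nodes in a complete tree with levels 0..depth (the root is level 0)."""
    return sum(branching ** level for level in range(depth + 1))


# 3 levels below the root, 4 children per node: 1 + 4 + 16 + 64
print(complete_tree_nodes(4, 3))
```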
Q2)
a) b)
                           Star         Linear Array   Ring
Diameter                   2            2 (p - 1)      1 (⌊p/2⌋)
Bisection Width            1            1              2
Connectivity               1            1              2
Cost                       2 (p - 1)    2 (p - 1)      3 (p)
Graphical Representation   [figure]     [figure]       [figure]
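The table entries can be reproduced from the usual textbook formulas for these static networks. The sketch below assumes p = 3, which is inferred from the numeric values in the answer and not stated in it; the function name `network_metrics` and the dictionary keys are illustrative.

```python
def network_metrics(topology: str, p: int) -> dict:
    """Static-network metrics for p >= 3 nodes (standard textbook formulas)."""
    if p < 3:
        raise ValueError("the formulas below assume p >= 3")
    if topology == "star":
        return {"diameter": 2, "bisection_width": 1, "connectivity": 1, "cost": p - 1}
    if topology == "linear_array":
        return {"diameter": p - 1, "bisection_width": 1, "connectivity": 1, "cost": p - 1}
    if topology == "ring":
        return {"diameter": p // 2, "bisection_width": 2, "connectivity": 2, "cost": p}
    raise ValueError(f"unknown topology: {topology}")


# p = 3 is an assumption inferred from the numbers in the table.
for name in ("star", "linear_array", "ring"):
    print(name, network_metrics(name, 3))
```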
c)
Note : the solution assumes that each processor can either send or receive at a given time, but not both.
         Star          Linear Array   Ring
T1       𝑃1 −> 𝑃2      𝑃1 −> 𝑃2       𝑃1 −> 𝑃2
T2       𝑃2 −> 𝑃3      𝑃2 −> 𝑃3       𝑃2 −> 𝑃3
T3       𝑃3 −> 𝑃2      𝑃3 −> 𝑃2       𝑃3 −> 𝑃1
T4       𝑃2 −> 𝑃1      𝑃2 −> 𝑃1
Total    4             4              3
Q3)
1)
9 bits : 1 presence bit for each processor + 1 dirty bit
2)
Final Answer :
(detailed explanation below)
                     State of 𝑃0      State of 𝑃1      State of 𝑃2
Initial State        D:               D:               D : [U] 0 0 0
                     LM :             LM :             LM : a = 7
                     C:               C:               C:
𝑃0 reads a           D:               D:               D : [S] 1 0 0
                     LM :             LM :             LM : a = 7
                     C: a = 7         C:               C:
𝑃2 reads a           D:               D:               D : [S] 1 0 1
                     LM :             LM :             LM : a = 7
                     C: a = 7         C:               C: a = 7
𝑃0 writes 6 to a     D:               D:               D : [E] 1 0 0
                     LM :             LM :             LM : a = 7
                     C: a = 6         C:               C:
𝑃1 reads a           D:               D:               D : [S] 1 1 0
                     LM :             LM :             LM : a = 6
                     C: a = 6         C: a = 6         C:
𝑃2 writes 5 to a     D:               D:               D : [E] 0 0 1
                     LM :             LM :             LM : a = 6
                     C:               C:               C: a = 5
Detailed Steps :
Initial State :
• Directory set to Uncached
State of 𝑃0      State of 𝑃1      State of 𝑃2
D:               D:               D : [U] 0 0 0
LM :             LM :             LM : a = 7
C:               C:               C:
𝑃0 reads a :
• Reading uncached value
1. Change directory entry to Shared
2. Add presence bit for 𝑃0
3. Read from 𝑃2's memory
4. Add it to 𝑃0's cache
State of 𝑃0      State of 𝑃1      State of 𝑃2
D:               D:               D : [S] 1 0 0
LM :             LM :             LM : a = 7
C: a = 7         C:               C:
𝑃2 reads a :
• Reading cached value
1. Add presence bit for 𝑃2
2. Read into cache
State of 𝑃0      State of 𝑃1      State of 𝑃2
D:               D:               D : [S] 1 0 1
LM :             LM :             LM : a = 7
C: a = 7         C:               C: a = 7
𝑃0 writes 6 to a :
• Writing to shared value
1. Invalidate it in all processors that have it (remove it from their caches)
2. Change directory entry to Exclusive and remove all presence bits
3. Add presence bit for 𝑃0
State of 𝑃0      State of 𝑃1      State of 𝑃2
D:               D:               D : [E] 1 0 0
LM :             LM :             LM : a = 7
C: a = 6         C:               C:
𝑃1 reads a :
• Reading Exclusive value
1. Change directory entry to Shared
2. Read it from the cache of the processor that has it (𝑃0)
3. Update the memory entry with the new value (𝑃2's memory)
4. Add presence bit for 𝑃1
5. Read from the memory to 𝑃1's cache
State of 𝑃0      State of 𝑃1      State of 𝑃2
D:               D:               D : [S] 1 1 0
LM :             LM :             LM : a = 6
C: a = 6         C: a = 6         C:
𝑃2 writes 5 to a :
• Writing to shared value
1. Invalidate it in all processors that have it (remove it from their caches)
2. Change directory entry to Exclusive and remove all presence bits
3. Add presence bit for 𝑃2
State of 𝑃0      State of 𝑃1      State of 𝑃2
D:               D:               D : [E] 0 0 1
LM :             LM :             LM : a = 6
C:               C:               C: a = 5
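The sequence of directory updates above can be replayed with a small simulation. This is a simplified sketch, not a full coherence protocol: it tracks a single variable, uses the three states U, S and E from the answer, and assumes a read of an exclusive line writes the owner's value back to memory before sharing it. The class name `DirectorySim` and its methods are hypothetical.

```python
class DirectorySim:
    """Toy model of one memory word tracked by a directory (states U, S, E)."""

    def __init__(self, num_procs: int, value: int):
        self.state = "U"                   # U = uncached, S = shared, E = exclusive
        self.presence = [0] * num_procs    # one presence bit per processor
        self.memory = value                # copy in the home node's local memory
        self.caches = [None] * num_procs   # None means "not cached"

    def read(self, proc: int) -> int:
        if self.caches[proc] is not None:  # cache hit, directory is not involved
            return self.caches[proc]
        if self.state == "E":              # fetch from the owner and update memory
            owner = self.presence.index(1)
            self.memory = self.caches[owner]
        self.state = "S"
        self.presence[proc] = 1
        self.caches[proc] = self.memory
        return self.caches[proc]

    def write(self, proc: int, value: int) -> None:
        # Invalidate every cached copy, then record the writer as the sole owner.
        self.caches = [None] * len(self.caches)
        self.presence = [0] * len(self.presence)
        self.presence[proc] = 1
        self.state = "E"
        self.caches[proc] = value          # memory stays stale until another processor reads

    def snapshot(self):
        bits = " ".join(str(b) for b in self.presence)
        return f"[{self.state}] {bits}", self.memory, list(self.caches)


# Replay the operations from the answer; a = 7 lives in P2's local memory.
sim = DirectorySim(num_procs=3, value=7)
print("initial", sim.snapshot())
steps = [
    ("P0 reads a", lambda: sim.read(0)),
    ("P2 reads a", lambda: sim.read(2)),
    ("P0 writes 6 to a", lambda: sim.write(0, 6)),
    ("P1 reads a", lambda: sim.read(1)),
    ("P2 writes 5 to a", lambda: sim.write(2, 5)),
]
for label, op in steps:
    op()
    print(label, sim.snapshot())
```

Each printed tuple holds the directory entry, the memory value, and the three cache contents in the order 𝑃0, 𝑃1, 𝑃2, so it can be compared row by row with the final-answer table.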
Q4)
1.
a. The task size is small and the frequency of communication is high
b. Since the cache can maintain more records
c. Efficiency depends on the fraction of the problem that is
parallelizable: a large problem that cannot be parallelized gains no
efficiency from running on a parallel machine
2.
Max Degree of concurrency : 5
Critical Path : 4
Critical path length : 𝑇0 + 𝑇3 + 𝑇8 + 𝑇9 = 4 + 3 + 8 + 4 = 19
Total Work : 4 + 3 + 1 + 3 + 2 + 3 + 7 + 5 + 8 + 4 = 40
Average degree of concurrency : 40 / 19 ≈ 2.1
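The arithmetic above can be checked with a short script. It is a sketch under one assumption: the weights in the "Total Work" sum are listed in task order 𝑇0 to 𝑇9, which is consistent with the four critical-path weights given (𝑇0 = 4, 𝑇3 = 3, 𝑇8 = 8, 𝑇9 = 4). The task graph's edges are not reproduced here, so the critical path is taken from the answer and not recomputed.

```python
# Task weights T0..T9, assumed to be in the order of the "Total Work" sum.
weights = [4, 3, 1, 3, 2, 3, 7, 5, 8, 4]

# Critical path T0 -> T3 -> T8 -> T9, taken directly from the answer.
critical_path = [0, 3, 8, 9]

critical_path_length = sum(weights[t] for t in critical_path)
total_work = sum(weights)
average_concurrency = total_work / critical_path_length

print(critical_path_length, total_work, round(average_concurrency, 1))
```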