Q1)
1. a
2. b
3. d (c and d should arguably both be acceptable?)
4. a
5. 3 levels, each node has 4 children: 4^0 (root) + 4^1 + 4^2 + 4^3 = 85
6. d
7. d
8. b

Q2)
a)

b) (values for p = 3, with the general formula in parentheses)

                           Star         Linear Array   Ring
Diameter                   2            2 (p - 1)      1 (⌊p/2⌋)
Bisection Width            1            1              2
Connectivity               1            1              2
Cost                       2 (p - 1)    2 (p - 1)      3 (p)
Graphical Representation   [figure]     [figure]       [figure]

c) Note: the solution assumes that each processor can either send or receive at any one time, not both.

        Star        Linear Array   Ring
T1      P1 -> P2    P1 -> P2       P1 -> P2
T2      P2 -> P3    P2 -> P3       P2 -> P3
T3      P3 -> P2    P3 -> P2       P3 -> P1
T4      P2 -> P1    P2 -> P1       -
Total   4           4              3

Q3)
1) 9 bits: 1 presence bit for each processor + 1 dirty bit

2) Final Answer (the detailed explanation follows below):

D = directory entry (state, then presence bits for P0 P1 P2), LM = local memory, C = cache; "-" means empty.

Event              State of P0 (D / LM / C)   State of P1 (D / LM / C)   State of P2 (D / LM / C)
Initial            - / - / -                  - / - / -                  [U] 0 0 0 / a = 7 / -
P0 reads a         - / - / a = 7              - / - / -                  [S] 1 0 0 / a = 7 / -
P2 reads a         - / - / a = 7              - / - / -                  [S] 1 0 1 / a = 7 / a = 7
P0 writes 6 to a   - / - / a = 6              - / - / -                  [E] 1 0 0 / a = 7 / -
P1 reads a         - / - / a = 6              - / - / a = 6              [S] 1 1 0 / a = 6 / -
P2 writes 5 to a   - / - / -                  - / - / -                  [E] 0 0 1 / a = 6 / a = 5

Detailed Steps:

Initial State:
• Directory set to Uncached
State of P0: D: -             LM: -       C: -
State of P1: D: -             LM: -       C: -
State of P2: D: [U] 0 0 0     LM: a = 7   C: -

P0 reads a:
• Reading an uncached value
1. Change the directory entry to Shared
2. Set the presence bit for P0
3. Read the value from P2's memory
4. Add it to P0's cache
State of P0: D: -             LM: -       C: a = 7
State of P1: D: -             LM: -       C: -
State of P2: D: [S] 1 0 0     LM: a = 7   C: -

P2 reads a:
• Reading a cached (shared) value
1. Set the presence bit for P2
2. Read the value into P2's cache
State of P0: D: -             LM: -       C: a = 7
State of P1: D: -             LM: -       C: -
State of P2: D: [S] 1 0 1     LM: a = 7   C: a = 7

P0 writes 6 to a:
• Writing to a shared value
1. Invalidate it in all processors that have it (remove it from their caches)
2. Change the directory entry to Exclusive and clear all presence bits
3. Set the presence bit for P0
State of P0: D: -             LM: -       C: a = 6
State of P1: D: -             LM: -       C: -
State of P2: D: [E] 1 0 0     LM: a = 7   C: -

P1 reads a:
• Reading an exclusive value
1. Change the directory entry to Shared
2. Read the value from the cache of the processor that holds it (P0)
3. Update the memory entry with the new value (P2's memory)
4. Set the presence bit for P1
5. Read the value from memory into P1's cache
State of P0: D: -             LM: -       C: a = 6
State of P1: D: -             LM: -       C: a = 6
State of P2: D: [S] 1 1 0     LM: a = 6   C: -

P2 writes 5 to a:
• Writing to a shared value
1. Invalidate it in all processors that have it (remove it from their caches)
2. Change the directory entry to Exclusive and clear all presence bits
3. Set the presence bit for P2
State of P0: D: -             LM: -       C: -
State of P1: D: -             LM: -       C: -
State of P2: D: [E] 0 0 1     LM: a = 6   C: a = 5

Q4)
1.
a. The task size is small and the frequency of communication is high.
b. Because the cache can hold more records.
c. Efficiency depends on how much of the problem is parallelizable: a large problem that cannot be parallelized gains no efficiency from running on a parallel machine.

2.
Max degree of concurrency: 5
Critical path: 4 tasks (T0, T3, T8, T9)
Critical path length: T0 + T3 + T8 + T9 = 4 + 3 + 8 + 4 = 19
Total work: 4 + 3 + 1 + 3 + 2 + 3 + 7 + 5 + 8 + 4 = 40
Average degree of concurrency: 40 / 19 ≈ 2.1
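
The Q4 part 2 arithmetic can be checked with a short script. This is only a sketch: it assumes the weights in the Total Work sum are listed in task order T0 to T9 (which agrees with the weights used for T0, T3, T8 and T9 on the critical path), and the names `weights`, `critical_path` and `concurrency_summary` are made up for illustration. The task graph itself is not reproduced here, so the critical path is taken from the answer rather than computed.

```python
# Task weights, assumed to be in order T0..T9 as they appear in the Total Work sum.
weights = {
    "T0": 4, "T1": 3, "T2": 1, "T3": 3, "T4": 2,
    "T5": 3, "T6": 7, "T7": 5, "T8": 8, "T9": 4,
}

# Critical path as stated in the answer (not derived from the graph).
critical_path = ["T0", "T3", "T8", "T9"]


def concurrency_summary(weights, critical_path):
    """Return (total work, critical path length, average degree of concurrency)."""
    total_work = sum(weights.values())
    critical_length = sum(weights[task] for task in critical_path)
    return total_work, critical_length, total_work / critical_length


total_work, critical_length, average = concurrency_summary(weights, critical_path)
print(total_work, critical_length, round(average, 1))  # prints: 40 19 2.1
```

The average degree of concurrency is total work divided by critical path length, so any change to a task weight on the critical path moves both the numerator and the denominator.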