CS5334-HW2

Parallel & Concurrent Programming
CS 5334
Problem 2.14
Given a processor array containing 8 processing elements, each capable of performing
10 million integer operations per second, determine the performance in millions of
operations per second of this processor array adding two integer vectors for vectors of
sizes 10, 20, 30, 40, 50.
Solution: Each processing element performs 10^7 integer operations per second, so it
takes each processing element 10^-7 seconds to perform one integer operation.
For a vector of size 10 and a processor array of 8 elements, the array requires two
clock cycles to complete the addition: 8 elements are added in the first cycle and the
remaining 2 in the second. Thus, the performance is:
10 / (2*10^-7) = 5 * 10^7 operations per second (50 million operations per second).
If the vector is of size 20, then the processor array requires three clock cycles to add the
two vectors. Thus, the performance is:
20 / (3*10^-7) = 6.67 * 10^7 operations per second.
A vector of size 30 requires four clock cycles to add in the processor array.
Consequently, the performance is:
30 / (4*10^-7) = 7.5 * 10^7 operations per second.
When using vectors of length 40, the processor array needs 5 clock cycles. Therefore,
the performance is:
40/ (5*10^-7) = 8*10^7 operations per second.
Finally, if the vectors have length 50, the array needs six clock cycles. As a
result, the performance is:
50 / (6*10^-7) = 8.33 * 10^7 operations per second.
Problem 2.17
Why is the number of processors in a centralized multiprocessor limited to a few dozen?
Answer: On a centralized multiprocessor, all the processors share a common memory
bus. As the number of processors increases, so does the traffic on the bus. Because
the bus has a fixed bandwidth, it can support only a few dozen processors before it
becomes a bottleneck.
Problem 2.18
A directory based protocol is a popular way to implement cache coherence on a
distributed multiprocessor.
a. Why should the directory be distributed among the multiprocessor's local memories?
Answer: If the directory existed in only one local memory, the one processor attached
to that memory would have to handle all cache coherence operations, which could
lead to a bottleneck. Distributing the directory among the local memories spreads the
cache coherence work over all processors and avoids a bottleneck at any one
processor.
b. Why are the contents of the directory not replicated?
Answer: In a distributed multiprocessor, each processor has its own separate local
memory. As blocks from a local memory are copied into the caches of other
processors, coherence must be maintained between those copies and the original
block in the primary memory. Since the memory is distributed (not shared), the
directory of each local memory must keep track of which processors hold cached copies
of its own memory blocks so that, if one processor modifies its cached copy, all other
copies in other caches are invalidated. The contents of each primary memory differ.
Thus, each directory tracks cached copies of the distinct memory blocks of its own
primary memory. If the contents of the directory were replicated, this would
imply that the memory of each processor contains exactly the same memory blocks,
with copies in exactly the same caches. It is inappropriate to assume this.
Problem 2.19
Continue the illustration of a directory based cache coherence protocol begun in Figure
2.16. Assume the following five operations now occur in the order listed:
1. CPU 2 reads X.
2. CPU 2 writes 5 to X.
3. CPU 1 reads X.
4. CPU 0 reads X.
5. CPU 1 writes 9 to X.
Show the states of the directories, caches, and memories after each of these
operations.
Answer: See attachment.
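As a rough cross-check of the state transitions, the five operations can be replayed in a small simulation. This is only a sketch under stated assumptions: it models a generic write-back, invalidate-on-write directory protocol with the states U (uncached), S (shared) and E (exclusive), uses three CPUs, and starts with X uncached and a placeholder value in memory. The real starting point is the final panel of Figure 2.16, which is not reproduced here, and the class and method names are hypothetical.

```python
class DirectoryEntry:
    """Directory entry, memory word and per-CPU cache copies for one block X."""

    def __init__(self, num_cpus, initial_value):
        self.state = "U"                  # U = uncached, S = shared, E = exclusive
        self.sharers = set()              # CPUs currently holding a cached copy
        self.memory = initial_value       # value of X in its home memory
        self.caches = [None] * num_cpus   # None means "no copy in this cache"

    def _write_back_owner(self):
        # In state E exactly one cache holds the only up-to-date copy.
        (owner,) = self.sharers
        self.memory = self.caches[owner]

    def read(self, cpu):
        if self.caches[cpu] is not None:  # cache hit, no directory traffic
            return self.caches[cpu]
        if self.state == "E":             # owner must supply the current value
            self._write_back_owner()
        self.caches[cpu] = self.memory
        self.sharers.add(cpu)
        self.state = "S"
        return self.caches[cpu]

    def write(self, cpu, value):
        if self.state == "E" and self.sharers != {cpu}:
            self._write_back_owner()      # previous owner gives up the block
        for other in self.sharers - {cpu}:
            self.caches[other] = None     # invalidate every other copy
        self.caches[cpu] = value
        self.sharers = {cpu}
        self.state = "E"

    def snapshot(self):
        return self.state, sorted(self.sharers), self.memory, list(self.caches)


PLACEHOLDER_X = 0   # assumption: stands in for the value left by Figure 2.16
x = DirectoryEntry(num_cpus=3, initial_value=PLACEHOLDER_X)

x.read(2);      print("1. CPU 2 reads X      ", x.snapshot())
x.write(2, 5);  print("2. CPU 2 writes 5 to X", x.snapshot())
x.read(1);      print("3. CPU 1 reads X      ", x.snapshot())
x.read(0);      print("4. CPU 0 reads X      ", x.snapshot())
x.write(1, 9);  print("5. CPU 1 writes 9 to X", x.snapshot())
```

Under these assumptions the placeholder value only affects the first snapshot; from operation 2 onward the states depend solely on the listed reads and writes.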
Problem 2.22
Explain why contemporary supercomputers are invariably multicomputers.
Answer: The need to support both data and functional parallelism drives supercomputers
to adopt a Multiple Instruction, Multiple Data (MIMD) architecture. While multiprocessors
and multicomputers both fall into this category, multicomputers avoid the pitfalls of cache
coherence found in multiprocessors.