FACULTY OF INFORMATION TECHNOLOGY AND COMMUNICATION
UNIVERSITY TECHNICAL OF MELAKA (UTeM)
COMPUTER ARCHITECTURE AND COMPILER (MITS5113)

Project 10: Influence of the Cache Coherence Protocol on the Bus Traffic

ATTENTION: ASSOCIATE PROFESSOR DR SAZILAH BINTI SALAM
ASRAR NAJIB BIN YUNOS (M031010039)

Project 10: Influence of the Cache Coherence Protocol on the Bus Traffic

Purpose
Analyse the influence of the cache coherence protocol on the bus traffic during the execution of a parallel program in an SMP.

Development
Configure a system with the following architectural characteristics:
• Processors in SMP = 8.
• Scheme for bus arbitration = LRU.
• Word width (bits) = 16.
• Words by block = 32 (block size = 64 bytes).
• Blocks in main memory = 524288 (main memory size = 32 MB).
• Blocks in cache = 256 (cache size = 16 KB).
• Mapping = Set-Associative.
• Cache sets = 64 (four-way set-associative caches).
• Replacement policy = LRU.

Configure the cache coherence protocol using the following configurations: MSI, MESI, and DRAGON. For each of the configurations, obtain the bus traffic (in bytes per memory access) for the system using the trace files: FFT, Simple, Speech and Weather.

In order to compute the bus traffic, assume that cache block transfers move 64 bytes (the block size) on the bus data lines, and that each bus transaction involves six bytes of command and address on the bus address lines. Therefore, you can compute the address traffic (including command) by multiplying the number of bus transactions by the traffic per transaction (6 bytes). In the same way, you can compute the data traffic by multiplying the number of block transfers by the traffic per transfer (64 bytes). The total bus traffic, in bytes per memory access, is the sum of these two quantities divided by the number of memory accesses (references) in the trace (see Table 2). A short calculation sketch is given after the view lists below.

Configuration MSI
View cache evolution in text format for MSI and the HYDRO memory trace.
View cache evolution in text format for MSI and the EAR memory trace.
View cache evolution in text format for MSI and the NASA7 memory trace.
View cache evolution in graphic format for MSI and the HYDRO memory trace.
View cache evolution in graphic format for MSI and the EAR memory trace.
View cache evolution in graphic format for MSI and the NASA7 memory trace.

Configuration MESI
View cache evolution in text format for MESI and the HYDRO memory trace.
View cache evolution in text format for MESI and the EAR memory trace.
View cache evolution in text format for MESI and the NASA7 memory trace.
View cache evolution in graphic format for MESI and the HYDRO memory trace.
View cache evolution in graphic format for MESI and the EAR memory trace.
View cache evolution in graphic format for MESI and the NASA7 memory trace.

Configuration DRAGON
View cache evolution in text format for DRAGON and the HYDRO memory trace.
View cache evolution in text format for DRAGON and the EAR memory trace.
View cache evolution in text format for DRAGON and the NASA7 memory trace.
View cache evolution in graphic format for DRAGON and the HYDRO memory trace.
View cache evolution in graphic format for DRAGON and the EAR memory trace.
View cache evolution in graphic format for DRAGON and the NASA7 memory trace.
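The following is a minimal sketch (in Python) of the bus-traffic computation described in the Development section. The constants are those stated in the project; the example counts are the MSI/HYDRO figures reported in the result tables below, and the helper name bus_traffic is only illustrative.

# Minimal sketch of the bus traffic computation described above.
TRAFFIC_PER_TRANSACTION = 6   # bytes of command + address per bus transaction
TRAFFIC_PER_TRANSFER = 64     # bytes per cache block transfer (the block size)

def bus_traffic(bus_transactions, block_transfers, references):
    """Return (address_traffic, data_traffic, total_bytes, bytes_per_access)."""
    address_traffic = bus_transactions * TRAFFIC_PER_TRANSACTION
    data_traffic = block_transfers * TRAFFIC_PER_TRANSFER
    total_bytes = address_traffic + data_traffic
    return address_traffic, data_traffic, total_bytes, total_bytes / references

# Example: MSI protocol on the HYDRO trace (509 bus transactions,
# 488 block transfers, 30751 references, from the tables below).
print(bus_traffic(509, 488, 30751))
# -> (3054, 31232, 34286, ~1.115 bytes per memory access)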
RESULT: CACHE COHERENCE PROTOCOL (MSI, MESI AND DRAGON)

Given:
Traffic per transaction = 6 bytes
Traffic per transfer = 64 bytes
Memory accesses (references): NASA7 = 27497, HYDRO = 30751, COMP = 38409

PROCESSOR CACHE 8 (MSI)
Memory Trace | Bus Transactions | Address Traffic | Block Data Transfers | Data Traffic | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
HYDRO        | 509              | 3054            | 488                  | 31232        | 34286             | 77.01        | 22.99
EAR          | 630              | 3780            | 589                  | 37696        | 41476             | 89.412       | 10.588
NASA7        | 449              | 2694            | 423                  | 27072        | 29766             | 77.951       | 22.049

PROCESSOR CACHE 8 (MESI)
Memory Trace | Bus Transactions | Address Traffic | Block Data Transfers | Data Traffic | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
HYDRO        | 520              | 3120            | 500                  | 32000        | 35120             | 91.067       | 22.567
EAR          | 630              | 3780            | 581                  | 37184        | 40964             | 96.157       | 10.494
NASA7        | 449              | 2694            | 423                  | 27072        | 29766             | 90.674       | 22.049

PROCESSOR CACHE 8 (DRAGON)
Memory Trace | Bus Transactions | Address Traffic | Block Data Transfers | Data Traffic | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
HYDRO        | 539              | 3234            | 263.875              | 16888        | 20122             | 91.067       | 8.9328
EAR          | 496              | 2976            | 242.1875             | 15500        | 18476             | 96.157       | 3.8433
NASA7        | 510              | 3060            | 248.4375             | 15900        | 18960             | 90.674       | 9.3261

COHERENCE PROTOCOL USING THE HYDRO MEMORY TRACE
Protocol | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
MSI      | 34286             | 77.01        | 22.99
MESI     | 35120             | 91.067       | 22.567
DRAGON   | 20122             | 91.067       | 8.9328

Fig. 1: Total bus traffic and hit/miss rates for the HYDRO memory trace under the MSI, MESI and DRAGON protocols.

COHERENCE PROTOCOL USING THE EAR MEMORY TRACE
Protocol | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
MSI      | 41476             | 89.412       | 10.588
MESI     | 40964             | 96.157       | 10.494
DRAGON   | 18476             | 96.157       | 3.8433

Fig. 2: Total bus traffic and hit/miss rates for the EAR memory trace under the MSI, MESI and DRAGON protocols.

COHERENCE PROTOCOL USING THE NASA7 MEMORY TRACE
Protocol | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
MSI      | 29766             | 77.951       | 22.049
MESI     | 29766             | 90.674       | 22.049
DRAGON   | 18960             | 90.674       | 9.3261

Fig. 3: Total bus traffic and hit/miss rates for the NASA7 memory trace under the MSI, MESI and DRAGON protocols.

Answers:

1. Do all the protocols have the same bus traffic? Which coherence protocol has the best bus traffic, and which has the worst? In particular, is the bus traffic the same for the MSI and MESI protocols? Why (for this answer, remember how the miss rate was for these two protocols – project 9)?

The average total bus traffic for DRAGON is 19186 bytes, compared with 35176 for MSI and 35283 for MESI. The lowest total bus traffic is the best, so DRAGON is the coherence protocol with the best bus traffic; MSI and MESI are the worst in this case. The bus traffic is essentially the same for the MSI and MESI protocols because both are invalidation-based protocols with the same miss rate (project 9), and the higher the miss rate, the more bus traffic is generated. With an invalidation-based protocol such as MSI or MESI, the more processors there are, the more likely it is that several caches share the same block, and hence that, on a write operation, one cache forces the other caches to invalidate that block. This produces new misses (coherence misses) and increases the number of block transfers. On the other hand, the greater the number of processors, the greater the number of bus transactions needed to maintain cache coherence. In short, as the number of processors increases for a given problem size, the working set starts to fit in the cache, and a domination by local misses (mainly capacity misses) is replaced by a domination by coherence misses.
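The protocol averages quoted above can be reproduced directly from the per-trace totals in the result tables; the short sketch below (Python, with illustrative variable names) does exactly that.

# Average total bus traffic per protocol, computed from the per-trace
# totals (in bytes) reported in the result tables above.
totals = {
    "MSI":    {"HYDRO": 34286, "EAR": 41476, "NASA7": 29766},
    "MESI":   {"HYDRO": 35120, "EAR": 40964, "NASA7": 29766},
    "DRAGON": {"HYDRO": 20122, "EAR": 18476, "NASA7": 18960},
}

for protocol, per_trace in totals.items():
    average = sum(per_trace.values()) / len(per_trace)
    print(f"{protocol}: average total bus traffic = {average:.0f} bytes")
# -> MSI 35176, MESI 35283, DRAGON 19186 (the averages used in the answer)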
2. Do you observe any difference between the update-based protocol and the invalidation-based protocols? Which? Why (give at least two reasons)?

Yes. The update-based protocol is the DRAGON protocol, and the invalidation-based protocols are the MSI and MESI protocols.

Update-based protocols: all coherence misses are eliminated, since all copies of a memory block are updated with the new value instead of being invalidated on a write request to a shared block. The price to pay for the elimination of the coherence misses is an increased number of global write actions.

Invalidation-based protocols: they produce lower traffic for a sequence of writes from the same processor with no intervening accesses from other processors, since only the first write causes global traffic. Unfortunately, the invalidation of remote copies leads to coherence misses.

3. Do you think that the results and conclusions obtained with these experiments are of general application, or may they change depending on the benchmarks used? Indicate another scenario in which the invalidation protocol does much better than the update protocol. In conclusion, does the use of a concrete cache coherence protocol improve the multiprocessor system performance? Why? Are the conclusions you obtain similar to the previous ones for the miss rate (project 9)?

These experiments study the influence of the cache coherence protocol on the miss rate and the bus traffic. As before, the traffic is split into data traffic and address (including command) bus traffic. The same architecture is also considered (8 processors, 16-bit words, 64-byte blocks, four-way set-associative caches and LRU replacement). We have gathered the results shown in Figs. 1, 2 and 3.

The miss rate for the MSI and MESI protocols is the same. This is consistent with the theory, because the MESI protocol is only an improvement of the MSI protocol (it adds the "Exclusive" state) in order to reduce the number of bus transactions due to the coherence protocol. In fact, Figs. 1, 2 and 3 tell us that, although the MSI and MESI protocols have the same miss rate, the MESI protocol generates less bus traffic. It may also be observed that the Dragon protocol has the lowest miss rate and bus traffic. This is possible because the Dragon protocol is an update-based protocol, whereas the MSI and MESI protocols are invalidation-based protocols. In update-based protocols, whenever a shared location is written to by a processor, its value is updated in the caches of all other processors holding that memory block. Furthermore, in the Dragon protocol, updates are reduced to a single-word write (only the specific modified word) rather than a full cache block transfer. In contrast, with invalidation-based protocols, on a write operation the state of that memory block in all other processors' caches is set to invalid, so those processors will have to obtain the block again through a miss (a coherence miss), and hence they generate larger bus traffic.

Of course, it is easy to construct other scenarios in which the invalidation protocol does much better than the update protocol: for example, when a single processor performs a long sequence of writes to a shared block with no intervening accesses from the other processors, an update protocol broadcasts every write, while an invalidation protocol pays only for the first one (see the sketch after this answer). We can conclude that the greater the number of processors for a parallel application, the higher the miss rate and the bus traffic.
This is possible because, with an invalidation-based protocol such as MESI, the more processors there are, the more likely it is that several caches share the same block, and hence that, on a write operation, a cache forces the other caches to invalidate that block, producing new misses (coherence misses) and increasing the number of block transfers. On the other hand, the greater the number of processors, the greater the number of bus transactions needed to maintain cache coherence. In short, as the number of processors increases for a given problem size, the working set starts to fit in the cache, and a domination by local misses (mainly capacity misses) is replaced by a domination by coherence misses.
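To make the scenario mentioned in the answer to question 3 concrete, the toy model below (Python; it is not SMPCache, and the write-run lengths are only illustrative) compares the bus traffic of an invalidation protocol and a word-update protocol for a run of writes by a single processor to a shared block with no intervening accesses from the other processors.

# Toy comparison of invalidation vs. update traffic for a run of writes by one
# processor to a shared block, with no intervening accesses from other caches.
# The 6-byte transaction and 64-byte block follow the project; the 2-byte word
# matches the 16-bit word width of the configured system.
TRANSACTION_BYTES = 6   # command + address per bus transaction
BLOCK_BYTES = 64        # full block transfer on a miss
WORD_BYTES = 2          # single modified word broadcast by an update protocol

def invalidation_traffic(writes_in_run):
    # Only the first write issues a bus transaction to invalidate remote
    # copies; the remaining writes hit the now-exclusive block locally.
    # (Later re-reads by other processors would each cost one coherence miss,
    # i.e. TRANSACTION_BYTES + BLOCK_BYTES, not modelled here.)
    return TRANSACTION_BYTES if writes_in_run > 0 else 0

def update_traffic(writes_in_run):
    # Every write broadcasts the modified word to the other caches.
    return writes_in_run * (TRANSACTION_BYTES + WORD_BYTES)

for n in (1, 10, 100):
    print(n, invalidation_traffic(n), update_traffic(n))
# Update traffic grows linearly with the length of the write run, while the
# invalidation protocol pays only the initial invalidating transaction.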