FACULTY OF INFORMATION TECHNOLOGY AND COMMUNICATION
UNIVERSITI TEKNIKAL MALAYSIA MELAKA (UTeM)
COMPUTER ARCHITECTURE AND COMPILER (MITS5113)
Project 10: Influence of the Cache Coherence Protocol on the Bus Traffic
ATTENTION:
ASSOCIATE PROFESSOR DR SAZILAH BINTI SALAM
ASRAR NAJIB BIN YUNOS
M031010039
Project 10: Influence of the Cache Coherence Protocol on the Bus Traffic
Purpose
Analyse the influence of the cache coherence protocol on the bus traffic during the
execution of a parallel program in an SMP.
Development
Configure a system with the following architectural characteristics:
• Processors in SMP = 8.
• Scheme for bus arbitration = LRU.
• Word width (bits) = 16.
• Words by block = 32 (block size = 64 bytes).
• Blocks in main memory = 524288 (main memory size = 32 MB).
• Blocks in cache = 256 (cache size = 16 KB).
• Mapping = Set-Associative.
• Cache sets = 64 (four-way set associative caches).
• Replacement policy = LRU.
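The derived sizes in this list are consistent with one another; a minimal sketch (Python, parameters copied from the configuration above) recomputes them:

```python
# Architectural parameters taken from the configuration list above.
WORD_BYTES = 2            # 16-bit words
WORDS_PER_BLOCK = 32
BLOCK_BYTES = WORDS_PER_BLOCK * WORD_BYTES   # 64 bytes per block
MEM_BLOCKS = 524288
CACHE_BLOCKS = 256
ASSOCIATIVITY = 4         # four-way set associative

print("Block size:", BLOCK_BYTES, "bytes")                       # 64
print("Main memory:", MEM_BLOCKS * BLOCK_BYTES // 2**20, "MB")   # 32
print("Cache size:", CACHE_BLOCKS * BLOCK_BYTES // 2**10, "KB")  # 16
print("Cache sets:", CACHE_BLOCKS // ASSOCIATIVITY)              # 64
```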
Configure the cache coherence protocol using the following configurations: MSI, MESI, and
DRAGON. For each of the configurations, obtain the bus traffic (in bytes per memory access) for the
system using the trace files: FFT, Simple, Speech and Weather. In order to compute the bus traffic,
assume that cache block transfers move 64 bytes (the block size) on the bus data lines, and that each
bus transaction involves six bytes of command and address on the bus address lines. Therefore, you
can compute the address traffic (including command) by multiplying the obtained bus transactions
by the traffic per transaction (6 bytes). In the same way, you can compute the data traffic by
multiplying the number of block transfers by the traffic per transfer (64 bytes). The total bus traffic,
in bytes per memory access, will be the addition of these two quantities divided by the number of
memory accesses (references) in the trace (see Table 2).
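The computation described above can be sketched as a small helper (a sketch, not part of the assignment; the example figures are the MSI counts for the HYDRO trace: 509 transactions, 488 block transfers, 30751 references):

```python
def bus_traffic_per_access(transactions, block_transfers, references,
                           addr_bytes=6, block_bytes=64):
    """Total bus traffic in bytes per memory access, as defined above."""
    address_traffic = transactions * addr_bytes    # command + address lines
    data_traffic = block_transfers * block_bytes   # block data lines
    return (address_traffic + data_traffic) / references

# MSI / HYDRO example: (509*6 + 488*64) / 30751 = 34286 / 30751
print(bus_traffic_per_access(509, 488, 30751))  # ≈ 1.115 bytes per access
```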
Configuration MSI
View Cache Evolution in Text Format for MSI and HYDRO memory Traces.
View Cache Evolution in Text Format for MSI and EAR memory Traces.
View Cache Evolution in Text Format for MSI and NASA7 memory Traces.
View Cache Evolution in Graphic Format for MSI and HYDRO memory Traces.
View Cache Evolution in Graphic Format for MSI and EAR memory Traces.
View Cache Evolution in Graphic Format for MSI and NASA7 memory Traces.
Configuration MESI
View Cache Evolution in Text Format for MESI and HYDRO memory Traces.
View Cache Evolution in Text Format for MESI and EAR memory Traces.
View Cache Evolution in Text Format for MESI and NASA7 memory Traces.
View Cache Evolution in Graphic Format for MESI and HYDRO memory Traces.
View Cache Evolution in Graphic Format for MESI and EAR memory Traces.
View Cache Evolution in Graphic Format for MESI and NASA7 memory Traces.
Configuration DRAGON
View Cache Evolution in Text Format for DRAGON and HYDRO memory Traces.
View Cache Evolution in Text Format for DRAGON and EAR memory Traces.
View Cache Evolution in Text Format for DRAGON and NASA7 memory Traces.
View Cache Evolution in Graphic Format for DRAGON and HYDRO memory Traces.
View Cache Evolution in Graphic Format for DRAGON and EAR memory Traces.
View Cache Evolution in Graphic Format for DRAGON and NASA7 memory Traces.
RESULT CACHE COHERENCE PROTOCOL (MSI, MESI AND DRAGON)
Given:
Traffic per transaction = 6 bytes
Traffic per transfer    = 64 bytes
Memory accesses (references): NASA7 = 27497, HYDRO = 30751, COMP = 38409

PROCESSOR CACHE 8 (MSI)
Memory Trace | Bus Transactions | Address Traffic | Block Data Transfers | Data Traffic | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
HYDRO        | 509              | 3054            | 488                  | 31232        | 34286             | 77.01        | 22.99
EAR          | 630              | 3780            | 589                  | 37696        | 41476             | 89.412       | 10.588
NASA7        | 449              | 2694            | 423                  | 27072        | 29766             | 77.951       | 22.049

PROCESSOR CACHE 8 (MESI)
Memory Trace | Bus Transactions | Address Traffic | Block Data Transfers | Data Traffic | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
HYDRO        | 520              | 3120            | 500                  | 32000        | 35120             | 91.067       | 22.567
EAR          | 630              | 3780            | 581                  | 37184        | 40964             | 96.157       | 10.494
NASA7        | 449              | 2694            | 423                  | 27072        | 29766             | 90.674       | 22.049

PROCESSOR CACHE 8 (DRAGON)
Memory Trace | Bus Transactions | Address Traffic | Block Data Transfers | Data Traffic | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
HYDRO        | 539              | 3234            | 263.875              | 16888        | 20122             | 91.067       | 8.9328
EAR          | 496              | 2976            | 242.1875             | 15500        | 18476             | 96.157       | 3.8433
NASA7        | 510              | 3060            | 248.4375             | 15900        | 18960             | 90.674       | 9.3261
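As a cross-check, the address, data and total traffic columns follow directly from the transaction and transfer counts; a short script (counts copied from the result tables above) regenerates the totals:

```python
# (bus transactions, block data transfers) per trace, from the tables above.
# DRAGON's fractional transfers reflect single-word updates, counted in
# block-sized equivalents.
results = {
    "MSI":    {"HYDRO": (509, 488),     "EAR": (630, 589),      "NASA7": (449, 423)},
    "MESI":   {"HYDRO": (520, 500),     "EAR": (630, 581),      "NASA7": (449, 423)},
    "DRAGON": {"HYDRO": (539, 263.875), "EAR": (496, 242.1875), "NASA7": (510, 248.4375)},
}

for protocol, traces in results.items():
    for trace, (transactions, transfers) in traces.items():
        address = transactions * 6   # 6 bytes of command/address per transaction
        data = transfers * 64        # 64 bytes per block transfer
        print(f"{protocol:6s} {trace:5s} total = {address + data:.0f} bytes")
```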
COHERENCE PROTOCOL USING HYDRO MEMORY TRACES

Protocol | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
MSI      | 34286             | 77.01        | 22.99
MESI     | 35120             | 91.067       | 22.567
DRAGON   | 20122             | 91.067       | 8.9328

[Bar charts: total bus traffic, and hit/miss rate (%), by cache coherence protocol (MSI, MESI, DRAGON) for the HYDRO trace.]
Fig 1
COHERENCE PROTOCOL USING EAR MEMORY TRACES

Protocol | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
MSI      | 41476             | 89.412       | 10.588
MESI     | 40964             | 96.157       | 10.494
DRAGON   | 18476             | 96.157       | 3.8433

[Bar charts: total bus traffic, and hit/miss rate (%), by cache coherence protocol (MSI, MESI, DRAGON) for the EAR trace.]
Fig 2
COHERENCE PROTOCOL USING NASA7 MEMORY TRACES

Protocol | Total Bus Traffic | Hit Rate (%) | Miss Rate (%)
MSI      | 29766             | 77.951       | 22.049
MESI     | 29766             | 90.674       | 22.049
DRAGON   | 18960             | 90.674       | 9.3261

[Bar charts: total bus traffic, and hit/miss rate (%), by cache coherence protocol (MSI, MESI, DRAGON) for the NASA7 trace.]
Fig 3
Answers:
1. Do all the protocols have the same bus traffic? Which coherence protocol has the best
bus traffic, and which has the worst? In particular, is the bus traffic the same for the
MSI and MESI protocols? Why (for this answer, remember the miss rate for these two
protocols – project 9)?
The average total bus traffic for DRAGON is 19186 bytes, compared with 35176 for MSI
and 35283 for MESI. The lowest total bus traffic is best, so DRAGON is the coherence
protocol with the best bus traffic; MSI and MESI are the worst in this case.
The bus traffic is nearly the same for the MSI and MESI protocols because both are
invalidation-based protocols and, as seen in project 9, they have the same miss rate.
With an invalidation-based protocol such as MSI or MESI, the more processors there are,
the more likely it is that several caches share the same block; hence, on a write
operation, one cache forces the other caches to invalidate that block. This produces new
misses (coherence misses) and increases the number of block transfers. Likewise, the
greater the number of processors, the greater the number of bus transactions needed to
maintain cache coherence. In short, as the number of processors increases for a given
problem size, the working set starts to fit in the cache, and domination by local misses
(mainly capacity misses) is replaced by domination by coherence misses.
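The averages quoted in this answer follow from the per-trace totals in Figs. 1–3; a quick check:

```python
# Total bus traffic (bytes) per protocol for HYDRO, EAR, NASA7, from Figs. 1-3.
totals = {
    "MSI":    [34286, 41476, 29766],
    "MESI":   [35120, 40964, 29766],
    "DRAGON": [20122, 18476, 18960],
}
for protocol, values in totals.items():
    print(protocol, round(sum(values) / len(values)))
# MSI 35176, MESI 35283, DRAGON 19186
```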
2. Do you observe any difference between the update-based protocol and the
invalidation-based protocols? Which? Why (give at least two reasons)?
Yes. The update-based protocol is DRAGON, and the invalidation-based protocols are
MSI and MESI.
Update-Based Protocols
Update-based protocols eliminate all coherence misses, since all copies of a memory
block are updated with the new value instead of being invalidated on a write to a shared
block. The price paid for eliminating coherence misses is an increased number of global
write actions.
Invalidation-Based Protocols
Invalidation-based protocols generate lower traffic for a sequence of writes from the
same processor with no intervening accesses from other processors: only the first write
causes global traffic. Unfortunately, the invalidation of remote copies leads to
coherence misses.
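A toy model (hypothetical, not part of the simulator) illustrates the second reason: for a run of writes by one processor to a shared block with no intervening remote accesses, invalidation pays one global transaction while update pays one per write.

```python
def bus_transactions(writes, protocol):
    """Toy model: one processor performs `writes` consecutive writes to a
    block initially shared by other caches, with no intervening remote
    accesses.

    - invalidate: the first write broadcasts an invalidation; afterwards the
      writer holds the block exclusively, so later writes stay local.
    - update: every write broadcasts the new word to the other copies.
    """
    if protocol == "invalidate":
        return 1 if writes > 0 else 0
    if protocol == "update":
        return writes
    raise ValueError(protocol)

for n in (1, 10, 100):
    print(n, bus_transactions(n, "invalidate"), bus_transactions(n, "update"))
```

The gap widens linearly with the length of the write run, which is why invalidation wins when writes are bursty and poorly shared, while update wins when other processors keep reading the freshly written data.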
3. Do you think that the results and conclusions obtained with these experiments are of
general application, or may they change depending on the benchmarks used? Indicate
another scenario in which the invalidation protocol does much better than the update
protocol. In conclusion, does the use of a particular cache coherence protocol improve
multiprocessor system performance? Why? Are the conclusions you obtain similar to the
previous ones for the miss rate (project 9)?
These experiments measure the influence of the cache coherence protocol on the miss
rate and the bus traffic. As before, the traffic is split into data traffic and address
(including command) traffic, and the same architecture is considered (8 processors,
16-bit words, 64-byte blocks, four-way set-associative caches and LRU replacement). We
have gathered the results shown in Figs. 1, 2 and 3. The miss rate for the MSI and MESI
protocols is the same. This is consistent with the theory, because the MESI protocol is
only an improvement of the MSI protocol (it adds the "Exclusive" state) intended to
reduce the number of bus transactions due to the coherence protocol. In fact, the
figures show that, although the MSI and MESI protocols have the same miss rate, the
MESI protocol can generate less bus traffic (as for the EAR trace). It may also be
observed that the Dragon protocol has the lowest miss rate and bus traffic. This is
possible because the Dragon protocol is an update-based protocol, whereas the MSI and
MESI protocols are invalidation-based protocols. In update-based protocols, whenever a
shared location is written by a processor, its value is updated in the caches of all
other processors holding that memory block. Furthermore, in the Dragon protocol, updates
are reduced to a single-word write (the specific modified word) rather than a full
cache-block transfer. In contrast, with invalidation-based protocols, on a write
operation the state of that memory block in all other processors' caches is set to
invalid, so those processors must obtain the block through a miss (a coherence miss),
and hence a larger bus traffic results. Of course, it is easy to construct other
scenarios in which the invalidation protocol does much better than the update protocol;
for example, when one processor writes a block many times before any other processor
reads it again (migratory data), an update protocol wastes bus bandwidth refreshing
remote copies that will never be read, whereas an invalidation protocol pays the global
cost only for the first write.
We can conclude that the greater the number of processors for a parallel application,
the higher the miss rate and the bus traffic. This is possible because, with an
invalidation-based protocol such as MESI, the more processors there are, the more likely
it is that several caches share the same block and hence that, on a write operation, one
cache forces the other caches to invalidate that block, producing new misses (coherence
misses) and increasing the number of block transfers. On the other hand, the greater the
number of processors, the greater the number of bus transactions needed to maintain
cache coherence. In short, as the number of processors increases for a given problem
size, the working set starts to fit in the cache, and domination by local misses (mainly
capacity misses) is replaced by domination by coherence misses.