Exercises, Fall 2012
1. Consider the following cache:
- Block size: 64 bytes
- 4-way set associative
- 256 blocks
- Write-back with dirty bit
- Valid bit
- Random replacement strategy: a new block is always written to a random block position in the
  corresponding set.
- 32-bit address
- 32-bit data
a) How many Tag comparisons are done for a cache access in a set?
b) Find the number of tag, index and offset bits.
c) Find the total size of cache. Show your work.
d) How is the set number calculated for this cache configuration?
e) The processor reads the following blocks in the given order. The cache is empty at
program start. Block accesses: 0 - 12 - 144 - 12 - 256 - 76 - 18 - 140 - 204 - 5 - 10 - 140 - 64 - 256.
Calculate the set position of each accessed block.

Block | 0 | 12 | 144 | 12 | 256 | 76 | 18 | 140 | 204 | 5 | 10 | 140 | 64 | 256
Set   |   |    |     |    |     |    |    |     |     |   |    |     |    |
f) What is the hit rate in best case and in worst case?
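The address split and the block-to-set mapping asked for in exercise 1 can be checked with a small script. This is a minimal Python sketch, not a reference solution: it assumes all sizes are powers of two and that the listed block accesses are block numbers, so that the set is the block number modulo the number of sets. The function names `cache_fields` and `set_of_block` are illustrative.

```python
def cache_fields(block_bytes, ways, num_blocks, addr_bits):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    Assumes block_bytes and num_blocks // ways are powers of two.
    """
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size
    index_bits = num_sets.bit_length() - 1      # log2 of the number of sets
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def set_of_block(block_number, num_sets):
    """Map a block number to its set (assumed: block number mod number of sets)."""
    return block_number % num_sets


if __name__ == "__main__":
    # Cache from exercise 1: 64-byte blocks, 4-way, 256 blocks, 32-bit addresses.
    print(cache_fields(64, 4, 256, 32))
    accesses = [0, 12, 144, 12, 256, 76, 18, 140, 204, 5, 10, 140, 64, 256]
    print([set_of_block(b, 256 // 4) for b in accesses])
```

The same `cache_fields` helper applies to any other power-of-two configuration once the total number of blocks has been derived from the capacity and block size.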
2. Consider three processors with different cache configurations:
Cache 1: Direct mapped with one-word blocks
Cache 2: Direct mapped with four-word blocks
Cache 3: Two-way set associative with four-word blocks
The following miss rate measurements have been made:
Cache 1: Instruction miss rate is 4%; data miss rate is 6%
Cache 2: Instruction miss rate is 2%; data miss rate is 4%
Cache 3: Instruction miss rate is 2%; data miss rate is 3%
a) For these processors, one-half of the instructions contain a data reference. Assume that the cache miss
penalty is 6 plus the block size in words. The CPI for this workload was measured on the processor with
cache 1 and was found to be 2.0. Determine which processor spends the most cycles on cache misses. Show
your work.
b) The cycle times for the processors above are 420ps for the first and second processors and 310ps for
the third processor. Determine which processor is the fastest and which is the slowest. Show your work.
3. Assume a processor has a clock rate of 500 MHz and an ideal CPI (no memory misses) of 1.0. What is
the effective CPI if a program with a mix of 50% arithmetic and logic, 30% load/store and 20% control
instructions is run, and 10% of the data memory operations and 1% of the instructions have a miss penalty
of 50 cycles? Show the equation you used to get your answer.
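The usual stall-cycle model behind this kind of question is: effective CPI = base CPI + instruction miss rate x miss penalty + (data references per instruction) x data miss rate x miss penalty. Below is a minimal Python sketch of that equation. It assumes that "1% of the instructions" means the instruction-fetch miss rate and that only loads/stores access data memory; the function and parameter names are illustrative.

```python
def effective_cpi(base_cpi, instr_miss_rate, data_ref_fraction,
                  data_miss_rate, miss_penalty):
    """Effective CPI including memory stall cycles per instruction."""
    # Every instruction is fetched once, so fetch misses apply to all of them.
    instr_stalls = instr_miss_rate * miss_penalty
    # Only the fraction of instructions that reference data can miss in the data cache.
    data_stalls = data_ref_fraction * data_miss_rate * miss_penalty
    return base_cpi + instr_stalls + data_stalls


if __name__ == "__main__":
    # Values from exercise 3, under the assumptions stated above.
    print(effective_cpi(1.0, 0.01, 0.30, 0.10, 50))
```

Dividing the effective CPI by the base CPI gives the slowdown relative to a perfect cache at the same clock rate.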
4. Consider an 8-way set-associative level 2 data cache with a capacity of 2 MByte (1 MByte = 2^20 Byte)
and a block size of 128 Bytes. The cache is connected to the main memory by a shared 32-bit address and
data bus. The cache and the RISC CPU are connected by separate address and data buses, each with a
width of 32 bits. The CPU is executing a LW instruction.
a) How much user data is transferred from the main memory to the cache in case of a cache miss?
b) How much user data is transferred from the cache to the CPU in case of a cache miss?
5. A computer system has a 32K byte, 8-way set associative cache, and the block size is 8
bytes. The machine is byte addressable, and physical addresses generated by the CPU are
22 bits. Specify how the physical address is partitioned into tag, set, and offset fields, giving
the number of bits in each field.
6. Assume an instruction cache miss rate for gcc of 2% and a data cache miss rate of 4%. If a machine has
a CPI of 2 without any memory stalls and the miss penalty is 40 cycles for all misses, determine how
much faster a machine would run with a perfect cache that never missed. Assume 36% of instructions are
loads/stores. (6 pts)
7. Cache1 is direct-mapped, Cache2 is fully associative, and Cache3 is 2-way set associative.
Each has four one-word blocks (4 words total). Assume that the miss penalty for each cache is 10
clock cycles. Assume that the caches are initially empty. Using word addresses, fill in the chart
below with whether each memory access hits or misses and which block it would be in, for all of the caches.
At the bottom of the chart, compute the hit rate and the total miss penalty. Use an LRU strategy
for replacement when appropriate.
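The three organizations in exercise 7 differ only in the number of ways, so one small simulator covers all of them. The following Python sketch assumes one-word blocks, set selection by word address modulo the number of sets, and LRU replacement within each set. The address trace is hypothetical (the chart with the actual accesses is not reproduced here), and the function name `simulate` is illustrative.

```python
from collections import OrderedDict


def simulate(addresses, num_blocks, ways, miss_penalty=10):
    """Return (hit_rate, total_miss_penalty) for a cache of one-word blocks.

    ways=1 is direct mapped, ways=num_blocks is fully associative.
    """
    num_sets = num_blocks // ways
    # Each set keeps its blocks ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        s = sets[addr % num_sets]
        if addr in s:
            hits += 1
            s.move_to_end(addr)        # mark as most recently used
        else:
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used block
            s[addr] = True
    misses = len(addresses) - hits
    return hits / len(addresses), misses * miss_penalty


if __name__ == "__main__":
    trace = [0, 4, 0, 4, 1, 2, 3, 0]  # hypothetical word addresses
    for name, ways in [("direct mapped", 1), ("fully associative", 4), ("2-way", 2)]:
        print(name, simulate(trace, num_blocks=4, ways=ways))
```

With a single way per set, the LRU bookkeeping never decides anything, which matches the "when appropriate" wording: replacement policy only matters for the associative caches.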