Exercises, Fall 2012

1. Consider the following cache:

   Block size: 64 bytes
   4-way set associative
   256 blocks
   Write back with dirty bit
   Valid bit
   Random replacement strategy: a new block is always written to a random block position in the corresponding set.
   32-bit address
   32-bit data

   a) How many tag comparisons are done for a cache access in a set?
   b) Find the number of tag, index, and offset bits.
   c) Find the total size of the cache. Show your work.
   d) How is the set number calculated for this cache configuration?
   e) The processor accesses the following blocks for reading in the given order. The cache is empty at program start.
      Block accesses: 0 - 12 - 144 - 12 - 256 - 76 - 18 - 140 - 204 - 5 - 10 - 140 - 64 - 256.
      Calculate the set position of each accessed block.

      Block | 0 | 12 | 144 | 12 | 256 | 76 | 18 | 140 | 204 | 5 | 10 | 140 | 64 | 256
      Set   |   |    |     |    |     |    |    |     |     |   |    |     |    |

   f) What is the hit rate in the best case and in the worst case?

2. Consider three processors with different cache configurations:

   Cache 1: Direct mapped with one-word blocks
   Cache 2: Direct mapped with four-word blocks
   Cache 3: Two-way set associative with four-word blocks

   The following miss rate measurements have been made:

   Cache 1: Instruction miss rate is 4%; data miss rate is 6%
   Cache 2: Instruction miss rate is 2%; data miss rate is 4%
   Cache 3: Instruction miss rate is 2%; data miss rate is 3%

   a) For these processors, one-half of the instructions contain a data reference. Assume that the cache miss penalty is 6 plus the block size in words. The CPI for this workload was measured on a processor with cache 1 and was found to be 2.0. Determine which processor spends the most cycles on cache misses. Show your work.
   b) The cycle times for the processors above are 420 ps for the first and second processors and 310 ps for the third processor. Determine which processor is the fastest and which is the slowest. Show your work.

3. Assume a processor has a clock rate of 500 MHz and an ideal CPI (no memory misses) of 1.0.
   What is the effective CPI if a program with a mix of 50% arithmetic and logic, 30% load/store, and 20% control instructions is run, and 10% of the data memory operations and 1% of the instructions incur a miss penalty of 50 cycles? Show the equation you used to get your answer.

4. Consider an 8-way set associative level 2 data cache with a capacity of 2 MByte (1 MByte = 2^20 bytes) and a block size of 128 bytes. The cache is connected to the main memory by a shared 32-bit address and data bus. The cache and the RISC CPU are connected by separate address and data buses, each 32 bits wide. The CPU is executing a LW instruction.

   a) How much user data is transferred from the main memory to the cache in case of a cache miss?
   b) How much user data is transferred from the cache to the CPU in case of a cache miss?

5. A computer system has a 32 KByte, 8-way set associative cache, and the block size is 8 bytes. The machine is byte addressable, and physical addresses generated by the CPU are 22 bits. Specify how the physical address is partitioned into tag, set, and offset fields, giving the number of bits in each field.

6. Assume an instruction cache miss rate for gcc of 2% and a data cache miss rate of 4%. If a machine has a CPI of 2 without any memory stalls and the miss penalty is 40 cycles for all misses, determine how much faster a machine would run with a perfect cache that never missed. Assume 36% of instructions are loads/stores. (6 pts)

7. Cache 1 is direct mapped, Cache 2 is fully associative, and Cache 3 is 2-way set associative. Each has four one-word blocks (4 total words). Assume that the miss penalty for each cache is 10 clock cycles. Assume that the caches are initially empty. Using word addresses, fill in the chart below to show whether each memory access hits or misses and which block it would be in, for all of the caches. At the bottom of the chart, compute the hit rate and the total miss penalty. Use an LRU strategy for replacement when appropriate.
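
For checking answers to the address-partitioning questions (problems 1 b), 1 e) and 5), the usual field-width arithmetic can be sketched in Python. This is only a sketch under stated assumptions: the function names `log2_exact`, `partition_address` and `set_of_block` are made up for illustration, all sizes are assumed to be powers of two, and the set of a block is assumed to be the block number modulo the number of sets.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a positive power of two")
    return n.bit_length() - 1


def partition_address(address_bits, block_bytes, num_blocks, ways):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    Assumes a byte-addressable machine and power-of-two sizes.
    """
    num_sets = num_blocks // ways
    offset_bits = log2_exact(block_bytes)   # selects a byte within a block
    index_bits = log2_exact(num_sets)       # selects the set
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def set_of_block(block_number, num_blocks, ways):
    """Return the set a memory block maps to (block number mod number of sets)."""
    return block_number % (num_blocks // ways)


# Problem 1: 32-bit addresses, 64-byte blocks, 256 blocks, 4-way.
print(partition_address(32, 64, 256, 4))

# Problem 5: 22-bit addresses, 8-byte blocks, 32 KByte capacity, 8-way.
print(partition_address(22, 8, 32 * 1024 // 8, 8))

# Problem 1 e): set position for each block in the access sequence.
accesses = [0, 12, 144, 12, 256, 76, 18, 140, 204, 5, 10, 140, 64, 256]
print([set_of_block(b, 256, 4) for b in accesses])
```

The printed tuples are ordered (tag, index, offset). The script does not replace the written derivation the problems ask for; it only confirms the final numbers.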
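
Problems 3 and 6 both rest on the same stall-cycle equation: effective CPI = base CPI + instruction miss rate x miss penalty + (fraction of instructions that access data) x data miss rate x miss penalty. A minimal Python sketch of that equation follows. The function name `effective_cpi` and its parameter names are hypothetical, and the sketch assumes that every instruction is fetched exactly once and that the "1% of the instructions" in problem 3 refers to the instruction-fetch miss rate.

```python
def effective_cpi(base_cpi, instr_miss_rate, data_miss_rate,
                  data_ref_fraction, miss_penalty):
    """Return CPI including memory stall cycles per instruction.

    Assumes one instruction fetch per instruction and that
    data_ref_fraction of all instructions make one data access.
    """
    instr_stalls = instr_miss_rate * miss_penalty
    data_stalls = data_ref_fraction * data_miss_rate * miss_penalty
    return base_cpi + instr_stalls + data_stalls


# Problem 3: ideal CPI 1.0, 30% loads/stores, 10% data misses,
# 1% instruction misses, 50-cycle penalty.
print(effective_cpi(1.0, 0.01, 0.10, 0.30, 50))

# Problem 6: base CPI 2, 36% loads/stores, 2% / 4% miss rates,
# 40-cycle penalty. The speedup of a perfect cache is the CPI ratio,
# since clock rate and instruction count are unchanged.
cpi_with_stalls = effective_cpi(2.0, 0.02, 0.04, 0.36, 40)
print(cpi_with_stalls / 2.0)
```

The same function can be reused for problem 2 by passing a miss penalty of 6 plus the block size in words, under the same assumptions.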