Section 11
PREPARED BY: TASNEEM ADEL

Memory Hierarchy

Block (aka line): the unit of copying between levels; usually multiple words.
If the accessed data is present in the upper level:
Hit: the access is satisfied by the upper level.
Hit ratio = hits / accesses.
If the accessed data is absent:
Miss: the block must be copied from the lower level; the time this takes is the miss penalty.
Miss ratio = misses / accesses = 1 - hit ratio.

Summary

Write-through mode: data is written to both the cache and source memory whenever it is modified. However many times data is modified in the cache, it is written to source memory the same number of times. This mode can hurt performance because of the heavy traffic on the bus between the cache and source memory.
Write-back mode: data is written only to the cache, and is written back to source memory only when the block is replaced in the cache. Source memory is therefore written only once per block residency.

WRITE ALLOCATION

There are two common options on a write miss:
Write Allocate: the block is loaded into the cache on a write miss, followed by the write-hit action.
No Write Allocate (write-around): the block is modified in main memory and not loaded into the cache.

Write Through with Write Allocate:
on hits, it writes to the cache and to main memory;
on misses, it updates the block in main memory and brings the block into the cache.
Bringing the block into the cache on a miss does not make much sense in this combination, because the next hit to the block will generate a write to main memory anyway (per the write-through policy).

Write Through with No Write Allocate:
on hits, it writes to the cache and to main memory;
on misses, it updates the block in main memory without bringing it into the cache.
Subsequent writes to the block will update main memory anyway, because the write-through policy is employed. So some time is saved by not bringing the block into the cache on a miss, since it would be useless there anyway.
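The traffic difference between the two write policies can be illustrated with a minimal counting sketch (a simplified model, not from the slides, assuming the block stays resident in the cache for all writes):

```python
# Simplified model: count main-memory writes caused by repeated writes
# to a single block that remains resident in the cache.

def memory_writes(policy, writes_to_block):
    """Main-memory writes generated by `writes_to_block` cache writes."""
    if policy == "write-through":
        # every cache write is propagated to main memory
        return writes_to_block
    if policy == "write-back":
        # the block is written back once, when it is eventually replaced
        return 1 if writes_to_block > 0 else 0
    raise ValueError(policy)

print(memory_writes("write-through", 100))  # 100 bus transfers
print(memory_writes("write-back", 100))     # 1 bus transfer (on eviction)
```

This is why write-through generates heavy bus traffic for write-intensive blocks, while write-back collapses all those writes into a single write-back on replacement.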
Write Back with Write Allocate:
on hits, it writes to the cache and sets the dirty bit for the block; main memory is not updated;
on misses, it updates the block in main memory and brings the block into the cache.
Subsequent writes to the same block, even if the block originally caused a miss, will hit in the cache and set the dirty bit. This eliminates extra memory accesses and results in very efficient execution compared with the Write Through with Write Allocate combination.

Write Back with No Write Allocate:
on hits, it writes to the cache and sets the dirty bit for the block; main memory is not updated;
on misses, it updates the block in main memory without bringing it into the cache.
Subsequent writes to the same block, if the block originally caused a miss, will miss every time and result in very inefficient execution.

PRINCIPLE OF LOCALITY

Spatial locality: instructions stored near a recently executed instruction have a high chance of being executed. It refers to the use of data elements (instructions) that are relatively close together in storage.
Temporal locality: an instruction that was recently executed has a high chance of being executed again. Such an instruction is kept in cache memory so it can be fetched quickly, without having to search for it again.

Direct Mapping

ASSOCIATIVITY EXAMPLE
Block access sequence: 0, 8, 0, 6, 8

Replacement Algorithms

Problems

Problem
A set-associative cache consists of 64 lines, or slots, divided into four-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.
Solution
The cache is divided into 16 sets of 4 lines each, so 4 bits are needed to identify the set number. Main memory consists of 4K = 2^12 blocks, so the set plus tag lengths must total 12 bits; therefore the tag length is 8 bits. Each block contains 128 words.
Therefore, 7 bits are needed to specify the word.

Problem
A two-way set-associative cache has lines of 16 bytes and a total size of 8 KB. The 64-MB main memory is byte addressable. Show the format of main memory addresses.
Solution
There are a total of 8 KB / 16 bytes = 512 lines in the cache. Thus, the cache consists of 256 sets of 2 lines each, so 8 bits are needed to identify the set number. For the 64-MB main memory, a 26-bit address is needed. Main memory consists of 64 MB / 16 bytes = 2^22 blocks. Therefore, the set plus tag lengths must total 22 bits, so the tag length is 14 bits and the word field length is 4 bits.

Problem
Solution

Problem
➢ Which references exhibit temporal locality?
➢ Which references exhibit spatial locality?
Solution

Problem
Solution

Problem
For a direct-mapped cache design with a 32-bit address and byte-addressable memory, the following bits of the address are used to access the cache:
What is the cache line size (in words)? How many entries (cache lines) does the cache have? Cache lines are also called blocks.
Solution

Problem
Recall that we have two write policies and two write-allocation policies, and their combinations can be implemented in either the L1 or L2 cache. Assume the following choices for the L1 and L2 caches:
Write-back
Write-through
Buffers are employed between different levels of the memory hierarchy to reduce access latency. For this configuration, list the possible buffers needed between the L1 and L2 caches, as well as between the L2 cache and memory. Describe the procedure for handling an L2 write miss, considering the components involved and the possibility of replacing a dirty block.
Solution A)
Between the L1 and L2 caches, one write buffer is required. When a miss occurs, we directly place the updated portion of the block into the buffer, where it waits to be written into the L2 cache; the processor does not need to stall as long as the buffer is not full. Between the L2 cache and memory, we require write and store buffers.
When we have a cache miss, we must first write the block back to memory if the data in the cache has been modified. In this situation, a write buffer is required to hold that data, so that the processor can continue execution while the data waits to be written to memory. Meanwhile, a store buffer is used: the processor places the new data in the store buffer, and when a cache hit occurs, the new data is written from the store buffer into the cache.
Solution B)
First we check whether the block is dirty. If it is, we write the dirty block back to memory. Next, we retrieve the target block from memory (overwriting the block in its place). Finally, we write to our L2 block.

Virtual Memory

Problem
Solution
Number of page table entries = 2^32 / 2^12 (4 KB per page) = 2^20
Size of the page table = 2^20 * 2^2 bytes = 4 MB

Thank You
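As a closing check, the page-table arithmetic from the virtual memory problem can be verified in a few lines (a sketch assuming a 32-bit virtual address space, 4 KB pages, and 4-byte page-table entries, matching the numbers in the solution):

```python
# Page-table size check: 32-bit virtual addresses, 4 KB pages,
# 4-byte page-table entries (assumed from the 2^2 factor above).

virtual_address_bits = 32
page_size = 4 * 1024        # 4 KB -> 12 page-offset bits
pte_size = 4                # bytes per page-table entry

num_entries = 2**virtual_address_bits // page_size   # 2^32 / 2^12 = 2^20
table_size = num_entries * pte_size                  # 2^20 * 2^2 bytes

print(num_entries)                 # 1048576 entries (= 2^20)
print(table_size // (1024 * 1024)) # 4 (MB)
```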