COEN 180
Main Memory Cache Architectures

Basics

The speed difference between cache and main memory (MM) is small (especially compared with the gap between MM and disk). Therefore, cache algorithms need to be implemented in hardware and need to be simple.

MM is byte-addressable, but only whole words are moved; typically, the last two bits of the address are not even transmitted.

The hit rate needs to be high.

Average Access Time = (Hit Rate) * (Access Time to Cache) + (Miss Rate) * (Access Time to MM)

For example, with a 95% hit rate, a 1 ns cache, and a 60 ns MM, the average access time is 0.95 * 1 ns + 0.05 * 60 ns = 3.95 ns.

Direct Mapped Cache

Each item in MM can be located in only one position in the cache. The address is split into the Tag (highest-order bits), the Index, and the 2 LSBs, which distinguish the byte in the word. The Tag serves to identify the data item in the cache; the Index is the address of the word in the cache.

Direct Mapped Cache Example

Memory addresses are 8 bits long, and the cache holds 8 words. Consider the byte located at 0110 1011. The Tag is 011, the Index is 010, and the two least significant bits are 11. If the tag stored in line 010 = 2 matches 011, the byte is in the cache: it is byte 3 of the word stored in that line.

Write Policies

Write-through: a write is performed both to the cache and to main memory.

Copy-back: a write is performed only to the cache. If an item is replaced by another item, the item being replaced is first copied back to main memory.

Dirty Bit

To implement copy-back, add one bit per cache line that tells us whether the data item is dirty (has changed since it was brought in) or not.

Cache Operations: Write-Through

Read:
1. Extract Tag and Index from the address.
2. Go to the cache line given by Index.
3. See whether Tag matches the tag stored there.
4. If they match: hit. Satisfy the read from the cache.
5. If they do not match: miss. Satisfy the read from main memory and also store the item in the cache.

Write:
1. Extract Tag and Index from the address.
2. Write the datum into the cache at the location given by Index.
3. Set the tag field of that cache line to Tag.
4. Write the datum to main memory.

Cache Operations: Copy-Back

If there is a read miss, then before the current item in the cache is replaced, check the dirty bit. If the current item is dirty, it needs to be written to main memory before being replaced. Writes go only to the cache and set the dirty bit. (Both paths are pulled together in the second C sketch below.)

Cache Overhead

Assume a cache that contains 1 MB worth of data. Hence it contains 256K data words, i.e., it has 2^18 cache lines, so the index is 18 bits long and, with 32-bit addresses, the tag is 32 - 18 - 2 = 12 bits long. Hence each cache line stores 12 b of tag + 32 b of data (write-through, so no dirty bit is needed). The overhead is 12/32 = 37.5%: the cache uses 1.375 MB of actual storage.

Blocks

Moving only one word into the cache does not exploit spatial locality, so we move blocks of words into the cache. The address is split into: Address = Tag : Index : Word in Block : 2 LSBs.

If there is a miss on a read, bring all words of the block into the cache. For a write operation, either:
– write the word and bring the other words of the block into the cache, or
– write directly to main memory and do not update the cache.

Example: Blocked Cache

Here the 8-bit address splits as Tag : Index : Word in Block : 2 LSBs = 3 : 2 : 1 : 2 bits, i.e., blocks of two words. (The field extraction is reproduced in the first C sketch below.)

Read access to the byte at address 0111 0101. Extract the components: 011:10:1:01. Go to cache line 10 = 2 and compare tags. No match, hence a miss. The dirty bit is set, so first write the line's two words back to MM, then bring in the words starting at addresses 0111 0000 and 0111 0100.

Read access to the byte at address 0100 0101. Extract the components: 010:00:1:01. Go to cache line 00 = 0 and compare tags. They match, hence a hit. The requested byte is the second byte in word 1, i.e., the byte at 0100 0101 (assuming little-endian byte numbering).
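The field extraction in the blocked-cache example is mechanical enough to check in code. Below is a small C sketch that reproduces the address split; the field widths (3:2:1:2) and the function name split() are taken from or invented for this example, not a general design.

```c
#include <stdio.h>
#include <stdint.h>

/* Split an 8-bit address as Tag(3) : Index(2) : Word-in-Block(1) : Byte(2). */
static void split(uint8_t addr)
{
    unsigned byte_in_word =  addr       & 0x3u;  /* 2 LSBs            */
    unsigned word_in_blk  = (addr >> 2) & 0x1u;  /* 1 bit             */
    unsigned index        = (addr >> 3) & 0x3u;  /* 2 bits            */
    unsigned tag          = (addr >> 5) & 0x7u;  /* 3 high-order bits */
    printf("addr 0x%02X -> tag %u, index %u, word %u, byte %u\n",
           (unsigned)addr, tag, index, word_in_blk, byte_in_word);
}

int main(void)
{
    split(0x75);   /* 0111 0101 -> 011:10:1:01 (the miss example) */
    split(0x45);   /* 0100 0101 -> 010:00:1:01 (the hit example)  */
    return 0;
}
```

Running it prints tag 3, index 2, word 1, byte 1 for the first address and tag 2, index 0, word 1, byte 1 for the second, matching the decompositions 011:10:1:01 and 010:00:1:01 worked by hand above.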
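The read and write procedures from the Cache Operations slides can also be pulled together in one place. The following is a minimal sketch, assuming a direct-mapped, copy-back cache with one 32-bit word per line and the 2^18-line geometry of the overhead example. mm_read_word() and mm_write_word() are hypothetical stand-ins for the main-memory interface, and the write path is simplified to overwrite the whole word (the text leaves the write-miss case under copy-back unspecified).

```c
#include <stdint.h>
#include <stdbool.h>

#define INDEX_BITS 18
#define NUM_LINES  (1u << INDEX_BITS)

typedef struct {
    uint32_t tag;    /* the 12 high-order address bits */
    uint32_t data;   /* one 32-bit word */
    bool     valid;
    bool     dirty;  /* set by writes; checked before replacement */
} CacheLine;

static CacheLine cache[NUM_LINES];

/* Hypothetical main-memory interface (not from the original text). */
uint32_t mm_read_word(uint32_t addr);
void     mm_write_word(uint32_t addr, uint32_t word);

/* Address split: Tag : Index : 2 LSBs (byte in word). */
static uint32_t index_of(uint32_t addr) { return (addr >> 2) & (NUM_LINES - 1); }
static uint32_t tag_of(uint32_t addr)   { return addr >> (2 + INDEX_BITS); }

/* Rebuild a cached word's MM address from its tag and line index. */
static uint32_t addr_of(uint32_t tag, uint32_t index)
{
    return (tag << (2 + INDEX_BITS)) | (index << 2);
}

uint32_t cache_read(uint32_t addr)
{
    CacheLine *line = &cache[index_of(addr)];
    if (line->valid && line->tag == tag_of(addr))
        return line->data;                          /* hit: satisfy from cache */

    /* Miss: copy back the victim if it is dirty, then refill from MM. */
    if (line->valid && line->dirty)
        mm_write_word(addr_of(line->tag, index_of(addr)), line->data);
    line->tag   = tag_of(addr);
    line->data  = mm_read_word(addr & ~3u);
    line->valid = true;
    line->dirty = false;
    return line->data;
}

void cache_write(uint32_t addr, uint32_t word)
{
    CacheLine *line = &cache[index_of(addr)];
    /* Overwriting a different dirty word: copy it back first. */
    if (line->valid && line->dirty && line->tag != tag_of(addr))
        mm_write_word(addr_of(line->tag, index_of(addr)), line->data);
    line->tag   = tag_of(addr);
    line->data  = word;   /* write only to the cache ...  */
    line->valid = true;
    line->dirty = true;   /* ... and set the dirty bit    */
}
```

Note how a victim's address must be reconstructed from its stored tag and line index before the copy-back, which is why the full tag is kept in every line.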
Set Associative Cache

In a direct mapped cache, there is only one possible location for a datum in the cache. Contention between two (or more) popular data items is therefore possible, resulting in low hit rates. Solution: place more than one item in a cache line.

An n-way set associative cache has cache lines (sets) consisting of n pairs of tag + datum. The larger the associativity n, the larger the hit rate. However, the larger n is, the more complex the read, since all n tags need to be compared in parallel; cache replacement also becomes more difficult to implement.

Set Associative Cache Example

The popular words at addresses 0011 0000 and 1011 0000 both fit into a two-way set associative cache. Split address 0011 0000 into 0011:00:00 and address 1011 0000 into 1011:00:00: both map to set 00, but with different tags (0011 and 1011), so each can occupy one of the set's two slots.

Set Associative Cache Replacement

Assume an n-way set associative cache. On a miss, we need to replace one of the n data items in the cache line, and the choice has to be implemented in hardware:

Random: replace one of the items at random.
LRU: if the associativity is two, then a single bit per set suffices to store the priority, i.e., which of the two items was least recently used. If the set associativity is larger, then we use an approximation of LRU. (A sketch of the two-way case follows.)
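Below is a minimal sketch of the two-way case with the single-bit LRU just described. The geometry (4-bit tag, 2-bit set index, 2 LSBs) follows the example addresses above; mm_read_word() is again a hypothetical stand-in for main memory, and writes and dirty bits are omitted to keep the focus on lookup and replacement.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 4   /* 2-bit set index */

typedef struct {
    uint8_t  tag[2];    /* one tag per way */
    uint32_t data[2];
    bool     valid[2];
    uint8_t  lru;       /* which way was used *least* recently: 0 or 1 */
} Set;

static Set sets[NUM_SETS];

uint32_t mm_read_word(uint8_t addr);   /* hypothetical MM interface */

uint32_t sa_read(uint8_t addr)
{
    uint8_t set_idx = (uint8_t)((addr >> 2) & (NUM_SETS - 1));
    uint8_t tag     = (uint8_t)(addr >> 4);   /* 4 high-order bits */
    Set *s = &sets[set_idx];

    /* Compare both tags; in hardware this happens in parallel. */
    for (int way = 0; way < 2; way++) {
        if (s->valid[way] && s->tag[way] == tag) {
            s->lru = (uint8_t)(1 - way);   /* hit: the other way is now LRU */
            return s->data[way];
        }
    }

    /* Miss: replace the least recently used way. */
    int victim = s->lru;
    s->tag[victim]   = tag;
    s->data[victim]  = mm_read_word((uint8_t)(addr & ~3u));
    s->valid[victim] = true;
    s->lru = (uint8_t)(1 - victim);
    return s->data[victim];
}
```

With this arrangement, the two popular addresses 0011 0000 and 1011 0000 both land in set 0, and after their initial misses every subsequent access to either one hits, which a direct mapped cache of the same size cannot guarantee.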