CMP 301A
Computer Architecture 1
Lecture 2
Outline
 Direct mapped caches: Reading and writing policies
 Measuring cache performance
 Improving cache performance
 Enhancing main memory performance
 Flexible placement of blocks: Associativity
 Multilevel caches
Read and Write Policies
A cache read is much easier to handle than a cache write:
An instruction cache is much easier to design than a data cache
Cache write:
How do we keep data in the cache and memory consistent?
Two write options:
Write Through: write to cache and memory at the same time.
Isn’t memory too slow for this?
Write Back: write to cache only. Write the cache block to memory
only when that cache block is replaced on a cache miss.
 Needs a “dirty” bit for each cache block
 Greatly reduces the memory bandwidth requirement
 Control can be complex
Write Buffer for Write Through
(Figure: Processor → Cache → DRAM, with a Write Buffer between the Cache and DRAM)
A Write Buffer is needed between the Cache and Memory
 Processor: writes data into the cache and the write buffer
 Memory controller: writes contents of the buffer to memory
Write buffer is just a FIFO:
 Typical number of entries: 4
 Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
Memory system designer’s nightmare:
 Store frequency (w.r.t. time) → 1 / DRAM write cycle
 Write buffer saturation
Problem: the write buffer may hold the updated value of a location needed by a read miss!
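A minimal sketch of the FIFO write buffer described above, including the read-miss hazard (all names are illustrative):

```python
# Sketch of a write-through cache's FIFO write buffer.
from collections import deque

class WriteBuffer:
    def __init__(self, entries=4):          # typical depth from the slide
        self.fifo = deque()
        self.entries = entries

    def push(self, addr, data):             # processor side
        if len(self.fifo) >= self.entries:
            raise RuntimeError("write buffer saturated: processor must stall")
        self.fifo.append((addr, data))

    def drain_one(self, memory):            # memory controller side
        addr, data = self.fifo.popleft()
        memory[addr] = data

    def lookup(self, addr):
        # A read miss must check the buffer: it may hold a newer
        # value than memory for the same location.
        for a, d in reversed(self.fifo):
            if a == addr:
                return d
        return None

memory = {0x10: "old"}
wb = WriteBuffer()
wb.push(0x10, "new")
# A read miss to 0x10 before the buffer drains must see "new", not "old":
print(wb.lookup(0x10) or memory[0x10])  # new
```

The `lookup` scan is exactly the hazard check the slide's "problem" line alludes to; skipping it would let a read miss fetch stale data from DRAM.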
Write Allocate versus Write Not Allocate
Assume a 16-bit write to memory location 0x00 causes a miss
Do we read in the rest of the block (Bytes 2, 3, ..., 31)?
 Yes: Write Allocate
 No: Write Not Allocate
(Figure: the example direct mapped cache — Cache Tag (address bits 31–10, e.g. 0x00), Cache Index (bits 9–5, e.g. 0x00), Byte Select (bits 4–0); 32 blocks of 32 bytes (Byte 0 … Byte 1023), each entry with a valid bit, tag, and data)
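The two write-miss policies can be contrasted in a toy sketch (block-and-byte dictionaries are an illustrative stand-in for real hardware):

```python
# Illustrative write-miss handling under the two policies.
BLOCK_SIZE = 32

def handle_write_miss(cache, memory, addr, data, write_allocate):
    block = addr // BLOCK_SIZE
    if write_allocate:
        # Fetch the whole block (including bytes we did not write),
        # then merge the new data into the cached copy.
        cache[block] = dict(memory.get(block, {}))
        cache[block][addr % BLOCK_SIZE] = data
    else:
        # Write around the cache: update memory directly, fetch nothing.
        memory.setdefault(block, {})[addr % BLOCK_SIZE] = data

cache, memory = {}, {0: {1: "x"}}
handle_write_miss(cache, memory, 0x0, "A", write_allocate=True)
print(0 in cache, cache[0][0], cache[0][1])  # True A x
```

With write allocate, the untouched byte `"x"` is pulled into the cache alongside the written byte; with write not allocate, the cache is left unchanged.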
Measuring cache performance
Impact of cache miss on Performance
Suppose a processor executes at
Clock Rate = 1 GHz (1 ns per cycle), Ideal (no misses) CPI = 1.1
50% arith/logic, 30% ld/st, 20% control
Suppose that 10% of memory operations (involving data) incur a 100-cycle miss penalty
Suppose that 1% of instructions incur the same miss penalty
CPI = ideal CPI + average stalls per instruction
CPI = 1.1 (cycles/instr)
    + 0.30 (Data_Mops/instr) × 0.10 (miss/Data_Mop) × 100 (cycles/miss)
    + 1 (Inst_Mop/instr) × 0.01 (miss/Inst_Mop) × 100 (cycles/miss)
    = (1.1 + 3.0 + 1.0) cycles/instr
    = 5.1 cycles/instr
78% of the time the proc is stalled waiting for memory!
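The slide's arithmetic can be checked directly:

```python
# Reproducing the slide's CPI calculation.
ideal_cpi = 1.1
data_mops_per_instr = 0.30      # 30% loads/stores
data_miss_rate = 0.10           # 10% of data memory ops miss
inst_mops_per_instr = 1.0       # every instruction is fetched
inst_miss_rate = 0.01           # 1% of instruction fetches miss
miss_penalty = 100              # cycles

cpi = (ideal_cpi
       + data_mops_per_instr * data_miss_rate * miss_penalty
       + inst_mops_per_instr * inst_miss_rate * miss_penalty)
stall_fraction = (cpi - ideal_cpi) / cpi
print(round(cpi, 2), round(stall_fraction * 100))  # 5.1 78
```

The stall fraction is (5.1 − 1.1)/5.1 ≈ 78%, matching the slide's claim.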
Improving Cache Performance
Average memory access time (AMAT) =
Hit time + Miss rate × Miss penalty
To improve performance:
• reduce the hit time
• reduce the miss rate
• reduce the miss penalty
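The AMAT formula above, with illustrative numbers (not taken from the slide):

```python
# AMAT = hit time + miss rate * miss penalty
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# e.g. a 1-cycle hit, 5% miss rate, 100-cycle penalty:
print(amat(1, 0.05, 100))  # 6.0
```

Each of the three improvement targets maps onto one term of the formula: a faster hit lowers the first term, a better placement or size lowers the miss rate, and a second cache level lowers the penalty.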
Enhancing main memory performance
 Increasing memory and bus width
 Transfer more words every clock cycle
 Isn’t it too much wiring?
 Using an interleaved memory organization
 Reduce access time with less wiring
 Double Data Rate (DDR) DRAMs
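The interleaved organization mentioned above can be sketched as low-order interleaving, where consecutive words land in different banks so their accesses can overlap (bank count is an illustrative assumption):

```python
# Low-order memory interleaving across 4 banks (illustrative).
NUM_BANKS = 4

def bank_of(word_addr):
    return word_addr % NUM_BANKS        # low bits pick the bank

def offset_in_bank(word_addr):
    return word_addr // NUM_BANKS       # high bits pick the row in the bank

# Four consecutive words hit four different banks, so a block
# transfer can keep all banks busy at once:
print([bank_of(a) for a in range(4)])   # [0, 1, 2, 3]
```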
Enhancing main memory performance (Cont)
Flexible placement of blocks: Associativity
(Figure: memory blocks 0–31 mapping into an 8-block cache under three placement schemes)
Where can memory block 12 be placed?
 Fully Associative: anywhere in the cache
 (2-way) Set Associative: anywhere in set 0 (12 mod 4)
 Direct Mapped: only into cache block 4 (12 mod 8)
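The three placement rules above can be computed for the slide's example of memory block 12 in an 8-block cache (direct mapped is just 1-way, fully associative is 8-way):

```python
# Candidate cache blocks for a memory block under N-way placement,
# assuming blocks of a set are stored contiguously (an illustrative layout).
NUM_BLOCKS = 8

def candidate_blocks(mem_block, ways):
    num_sets = NUM_BLOCKS // ways
    s = mem_block % num_sets            # set index = block number mod #sets
    return [s * ways + w for w in range(ways)]

print(candidate_blocks(12, ways=1))     # direct mapped: [4]  (12 mod 8)
print(candidate_blocks(12, ways=2))     # 2-way: set 0 (12 mod 4) -> [0, 1]
print(candidate_blocks(12, ways=8))     # fully associative: all 8 blocks
```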
A Two-way Set Associative Cache
 N-way set associative: N entries for each Cache Index
 N direct mapped caches operate in parallel
 Example: Two-way set associative cache
 Cache Index selects a “set” from the cache
 The two tags in the set are compared in parallel
 Data is selected based on the tag comparison result
(Figure: two-way set associative cache — the Cache Index selects one block from each way; each way's valid bit and Cache Tag feed a comparator, the comparator outputs drive the mux selects (Sel0, Sel1) and are ORed to produce Hit, and the mux picks the hitting way's Cache Block)
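The lookup described above can be sketched as follows (dictionary-based ways stand in for the hardware arrays; names are illustrative):

```python
# Toy two-way set associative lookup (illustrative structure, not RTL).
NUM_SETS = 4

def lookup(sets, addr):
    index = addr % NUM_SETS             # Cache Index selects the set
    tag = addr // NUM_SETS
    # Both ways of the selected set are checked "in parallel" in hardware;
    # the OR of the two tag comparisons is the hit signal.
    for way in sets[index]:
        if way["valid"] and way["tag"] == tag:
            return True, way["data"]
    return False, None

sets = [[{"valid": False, "tag": 0, "data": None} for _ in range(2)]
        for _ in range(NUM_SETS)]
sets[1][0] = {"valid": True, "tag": 5, "data": "hello"}  # addr 5*4 + 1 = 21
print(lookup(sets, 21))   # (True, 'hello')
print(lookup(sets, 9))    # (False, None) -- same set, different tag
```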
And Yet Another Extreme Example: Fully Associative
 Fully Associative Cache -- push the set associative idea to its limit!
 Forget about the Cache Index
 Compare the Cache Tags of all cache entries in parallel
 Example: with 32-byte blocks (5 Byte Select bits) and 32-bit addresses, we need N 27-bit comparators
 By definition, Conflict Misses = 0 for a fully associative cache
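The comparator width follows directly from the address breakdown:

```python
# Checking the tag width: 32-bit address, 32-byte blocks, no index field.
from math import log2

ADDR_BITS = 32
BLOCK_SIZE = 32                               # bytes

byte_select_bits = int(log2(BLOCK_SIZE))      # 5 bits select a byte in a block
tag_bits = ADDR_BITS - byte_select_bits       # fully associative: no index bits
print(byte_select_bits, tag_bits)             # 5 27
```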
(Figure: fully associative cache — the address splits into a 27-bit Cache Tag (bits 31–5) and a 5-bit Byte Select (bits 4–0, e.g. 0x01); every entry's valid bit and Cache Tag feed its own comparator against the incoming tag)
Replacement Policy
In an associative cache, which block from a set should be evicted when the set becomes full?
• Random
• Least-Recently Used (LRU)
  • LRU state must be updated on every access
  • true implementation only feasible for small sets (2-way)
• First-In, First-Out (FIFO) a.k.a. Round-Robin
  • used in highly associative caches
• Not-Most-Recently Used (NMRU)
  • FIFO with an exception for the most-recently used block or blocks
Replacement only happens on misses
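LRU for one set can be sketched with an ordered map standing in for the hardware's recency state (a toy model; real caches track LRU with a few status bits per set):

```python
# Toy LRU replacement for one 2-way set.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways=2):
        self.ways = ways
        self.blocks = OrderedDict()         # oldest (LRU) entry first

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)    # hit: mark most-recently used
            return "hit"
        if len(self.blocks) >= self.ways:   # replacement only on a miss
            self.blocks.popitem(last=False) # evict the least-recently used
        self.blocks[tag] = True
        return "miss"

s = LRUSet()
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# A and B miss; A hits (now MRU); C misses and evicts B, the LRU; B misses
```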