Set Associative Cache

advertisement
CMPE 421 Advanced Computer Architecture
Caching with Associativity
PART2
1
Other Cache organizations
Direct Mapped
Index
V Tag
0:
1:
2
3:
4:
5:
6:
7:
8
9:
10:
11:
12:
13:
14:
15:
Fully Associative
Data
V Tag
Data
No Index
Each address
has only one
possible location
Address = Tag | Index | Block offset
Address = Tag | Block offset
2
Fully Associative Cache
3
A Compromise
2-Way set associative
V Tag
4-Way set associative
V Tag
Data
0:
0:
1:
2:
3:
Each address
has two possible
locations with
the same index
6:
1:
Each address
has four possible
locations with the
same index
2:
4:
5:
Data
One fewer index
bit: 1/2 the indexes
7:
Address = Tag | Index | Block offset
3:
Two fewer index bits:
1/4 the indexes
Address = Tag | Index | Block offset
4
Range of Set Associative Caches
Used for tag compare
Tag
Decreasing associativity
Direct mapped
(only one way)
Smaller tags
Selects the set
Index
Selects the word in the block
Block offset Byte offset
Increasing associativity
Fully associative
(only one set)
Tag is all the bits except
block and byte offset
5
Set Associative Cache
Cache
Way Set V Tag
0
0
1
1
0
1
Data
Q2: Is it there?
Compare all the cache
tags in the set to the
high order 3 memory
address bits to tell if
the memory block is in
the cache
Main Memory
0000xx
0001xx Two low order bits
0010xx define the byte in the
word (32-b words)
0011xx
One word blocks
0100xx
0101xx
0110xx
0111xx
1000xx Q1: How do we find it?
1001xx
1010xx Use next 1 low order
1011xx memory address bit to
1100xx determine which
1101xx cache set (i.e., modulo
1110xx the number of sets in
1111xx the cache)
(block address) modulo (# set in the cache)
6
Set Associative Cache Organization
FIGURE 7.17 The implementation of a four-way set-associative cache requires four comparators and a 4-to-1
multiplexor. The comparators determine which element of the selected set (if any) matches the tag. The output of the
comparators is used to select the data from one of the four blocks of the indexed set, using a multiplexor with a decoded
select signal. In some implementations, the Output enable signals on the data portions of the cache RAMs can be used
to select the entry in the set that drives the output. The Output enable signal comes from the comparators, causing the
element that matches to drive the data outputs.
7
Remember the Example for Direct Mapping (ping pong effect)

Consider the main memory word reference string
Start with an empty cache - all
blocks initially marked as not valid
0 miss
00
00
01


Mem(0)
0 miss
Mem(4)
0
01
00
01
00
0 4 0 4 0 4 0 4
4 miss
Mem(0)
4
4 miss
Mem(0)
4
00
01
00
01
0 miss
0
01
00
0 miss
0
Mem(4)
01
00
Mem(4)
4 miss
Mem(0)4
4 miss
Mem(0)
4
8 requests, 8 misses
Ping pong effect due to conflict misses - two memory
locations that map into the same cache block
8
Solution: Use set associative cache

Consider the main memory word reference string
Start with an empty cache - all
blocks initially marked as not valid
0 miss
000


Mem(0)
0 4 0 4 0 4 0 4
4 miss
0 hit
4 hit
000
Mem(0)
000
Mem(0)
000
Mem(0)
010
Mem(4)
010
Mem(4)
010
Mem(4)
8 requests, 2 misses
Solves the ping pong effect in a direct mapped cache
due to conflict misses since now two memory locations
that map into the same cache set can co-exist!
9
Byte offset (2 bits)
Block offset (2 bits)
Index (1-3 bits)
Tag (3-5 bits)
Set Associative Example
0100111000
1100110100
0100111100
0110110000
1100111000
Index V
000: 0
Tag
Miss
Miss
Miss
Miss
Miss
Data
Index V
0
0
0
01:
0
0
10:
0
0
1
11:
1
0
Tag
Miss
Miss
Hit
Miss
Miss
Data
00:
001: 0
010: 0
011: 0
1
100: 0
101: 0
0100111000
1100110100
0100111100
0110110000
1100111000
011
110
010
010
-
110: 0
111: 0
Direct-Mapped
1100
0100
0110
1100
-
2-Way Set Assoc.
0100111000
1100110100
0100111100
0110110000
1100111000
Index V
0
0
0:
0
0
0
1
0
1
1:
0
1
0
Miss
Miss
Hit
Miss
Hit
Tag
Data
01001
11001
01101
-
4-Way Set Assoc.
10
New Performance Numbers
Miss rates for DEC 3100 (MIPS machine)
Separate 64KB Instruction/Data Caches
Benchmark Associativity Instruction
rate
gcc
Direct
2.0%
Data miss
miss rate
1.7%
Combined
1.9%
gcc
2-way
1.6%
1.4%
1.5%
gcc
4-way
1.6%
1.4%
1.5%
spice
Direct
0.3%
0.6%
0.4%
spice
2-way
0.3%
0.6%
0.4%
spice
4-way
0.3%
0.6%
0.4%
11
Benefits of Set Associative Caches

The choice of direct mapped or set associative depends
on the cost of a miss versus the cost of implementation
12
4KB
8KB
16KB
32KB
64KB
128KB
256KB
512KB
Miss Rate
10
8
6
4
2
0
1-way
2-way
4-way
8-way
Data from Hennessy &
Patterson, Computer
Architecture, 2003
Associativity

Largest gains are in going from direct mapped to 2-way
(20%+ reduction in miss rate)
12
Virtual Memory (32-bit system): 8KB page size,16MB Mem
31
Index
0
13 12
19
219=512K
Page offset
13
Virtual Address
4GB / 8KB =
512K entries
Virt.
Pg.# V Phys. Page #
0
1
2
...
Disk Address
...
512K
11
23
13 12
0
Physical Address
Virtual memory example
System with 20-bit V.A., 16KB pages, 256KB of physical memory
Page offset takes 14 bits, 6 bits for V.P.N. and 4 bits for P.P.N.
Page Table:
Virtual Page #
(index)
000000
000001
000010
000011
000100
000101
000110
000111
Valid
Bit
1
0
1
0
1
1 0
0 1
1
Physical Page #/
Disk address
1001
sector 5000...
0010
sector 4323…
1011
1010 sector xxxx...
sector 1239... 1010
0001
Access to:
0000 1000 1100 1010 1010
PPN = 0010
Physical Address:
00 1000 1100 1010 1010
Access to:
0001 1001 0011 1100 0000
PPN = Page Fault to
sector 1239...
Pick a page to “kick out” of memory (use LRU). Read data from sector 1239
Assume LRU is VPN 000101 for this example. into PPN 1010
Download