ELEC 5200-001/6200-001
Computer Architecture and Design
Spring 2016
Memory Organization (Chapter 5)
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
Spr 2016, Mar 28 . . .
ELEC 5200-001/6200-001 Lecture 9
1
Types of Computer Memories
From the cover of:
A. S. Tanenbaum, Structured Computer Organization, Fifth Edition, Upper Saddle
River, New Jersey: Pearson Prentice Hall, 2006.
Random Access Memory (RAM)
[Block diagram: address bits feed an address decoder, which selects a row of the memory cell array; read/write circuits move the data bits in and out.]
Six-Transistor SRAM Cell
[Circuit diagram: the cell stores bit and its complement; the word line gates access transistors connecting the cell to a pair of complementary bit lines.]
Dynamic RAM (DRAM) Cell
[Circuit diagram: a single access transistor, gated by the word line, connects the storage node to the bit line.]
“Single-transistor DRAM cell” – Robert Dennard’s 1967 invention.
Electronic Memory Devices
Memory technology | Typical access time | Clock rate   | Cost per GB in 2004
SRAM              | 0.5-5 ns            | 0.2-2.0 GHz  | $4k-$10k
DRAM              | 50-70 ns            | 15-20 MHz    | $100-$200
Magnetic disk     | 5-20 ms             | 50-200 Hz    | $0.5-$2
For more on memories:
Semiconductor Memories: A Handbook of Design, Manufacture and
Application, by Betty Prince, Wiley 1996.
Emerging Memories: Technologies and Trends, by Betty Prince,
Springer 2002.
Building a Computer with 1GHz
Clock and 40GB Memory
Type of memory | Cost     | Clock rate
SRAM           | $160,000 | 0.2-2.0 GHz
DRAM           | $4,000   | 15-20 MHz
Disk           | $20      | 50-200 Hz
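The costs above follow from the 2004 per-GB prices on the previous slide; a quick check, using the low-end figure for each technology:

```python
# Approximate cost of a 40 GB memory in each technology,
# using low-end 2004 per-GB prices from the previous table.
capacity_gb = 40
price_per_gb = {"SRAM": 4000, "DRAM": 100, "Disk": 0.5}

for tech, price in price_per_gb.items():
    print(f"{tech}: ${capacity_gb * price:,.0f}")
# SRAM: $160,000   DRAM: $4,000   Disk: $20
```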
Trying to Buy a Laptop Computer?
(Three Years Ago)
IBM ThinkPad X Series by Lenovo
23717GU
1.20 GHz Low Voltage Intel® Pentium® M, 1MB L2 SRAM Cache
Microsoft® Windows® XP Professional
512 MB DRAM
40 GB Hard Drive
2.7 lbs, 12.1" XGA (1024x768)
IBM Embedded Security Subsystem 2.0
Intel PRO/Wireless Network Connection 802.11b, Gigabit Ethernet
Integrated graphics Intel Extreme Graphics 2
No CD/DVD drive PROP, Fixed Bay
Availability**: Within 2 weeks
$2,149.00 IBM web price*
$1,741.65 sale price*
Last year $1,023.75 X61 with Intel Duo 2GHz Processor
(Slide annotations: ~$5, ~$100, ~$40 – approximate costs of the SRAM cache, the DRAM, and the hard drive, per the earlier price table.)
2006/07
Choose a Lenovo 3000 V Series to customize & buy
From: $999.00; Sale price: $949.00
Processor: Intel Core 2 Duo T5500 (1.66GHz, 2MB L2, 667MHz FSB)
Total memory: 512MB PC2-5300 DDR2 SDRAM
Hard drive: 80GB, 5400rpm Serial ATA
Weight: 4.0lbs
2007 Executive-Class
The classic, award-winning
ThinkPad design remains
unchanged - why mess with
success?
The Reserve Edition features
a leather exterior handmade
by expert Japanese saddle
makers.
ThinkPad Reserve Verizon Edition or
ThinkPad Reserve Cingular Edition
From $5,000
Nicholas Negroponte’s OLPC
(One Laptop per Child)
http://www.flickr.com/photos/olpc/3145038187/
X0-1
Manufacturer: Quanta Computer
Type: Subnotebook
Connectivity: 802.11b/g/s wireless LAN, 3 USB 2.0 ports, MMC/SD card slot
Media: 1 GB flash memory
Operating system: Fedora-based (Linux)
Input: Keyboard, Touchpad, Microphone, Camera
Camera: Built-in video camera (640×480; 30 FPS)
Power: NiMH or LiFePO4 removable battery pack
CPU: AMD Geode LX700 @ 0.8 W
Memory: 256 MB DRAM
Display: Dual-mode 19.1 cm/7.5" diagonal TFT LCD, 1200×900
Dimensions: 242 mm × 228 mm × 32 mm
Weight: 1.45 kg [3.2 lbs] with LiFePO4 battery; 1.58 kg [3.5 lbs] with NiMH battery
Price: $100+
Today’s Laptop
Inspiron 15
Dell Price $379.99
Introducing the new Inspiron™ 15, a 15.6"
laptop that gives you the everyday features
you need, all at a great value.
•Up to Intel® Core™2 Duo processors
•Entertainment on the go with the HD display
•Personalize with a choice of six vibrant colors or choose from 200+ artist designs with Design Studio
Cache
[Diagram: the processor exchanges words with a small, fast cache; the cache exchanges blocks with a large, inexpensive (slow) main memory.]
Processor does all memory operations with cache.
Miss – If the requested word is not in cache, a block of words containing the requested word is brought to cache, and then the processor request is completed.
Hit – If the requested word is in cache, the read or write operation is performed directly in cache, without accessing main memory.
Block – the minimum amount of data transferred between cache and main memory.
Inventor of Cache
M. V. Wilkes, “Slave Memories and Dynamic Storage Allocation,”
IEEE Transactions on Electronic Computers, vol. EC-14, no. 2,
pp. 270-271, April 1965.
Cache Performance
[Diagram: processor, cache (small, fast memory; access time = T1), and main memory (large, inexpensive, slow; access time = Tm).]
Average access time
= T1 × h + (Tm + T1) × (1 – h)
= T1 + Tm × (1 – h)
where
– T1 = cache access time (small)
– Tm = memory access time (large)
– h = hit rate (0 ≤ h ≤ 1)
Hit rate is also known as hit ratio,
miss rate = 1 – hit rate
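The average-access-time formula can be sketched directly; the numeric values below are illustrative, not from the slides:

```python
def avg_access_time(t1, tm, h):
    # Hit: cache time t1. Miss: memory time tm on top of the cache probe.
    # t1*h + (tm + t1)*(1 - h) simplifies to t1 + tm*(1 - h).
    return t1 * h + (tm + t1) * (1 - h)

# Illustrative values: 1 ns cache, 60 ns memory, 95% hit rate.
print(avg_access_time(1, 60, 0.95))  # 1 + 60*0.05 = 4.0 ns
```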
Average Access Time
[Plot: access time T1 + Tm × (1 – h) versus miss rate (1 – h), rising linearly from T1 at h = 1 (miss rate 0) to T1 + Tm at h = 0 (miss rate 1). Acceptable miss rate < 10%; desirable miss rate < 5%.]
Comparing Performance
Ideal processor with 1 cycle memory access, CPI = 1
Processor without cache:
Assume main memory access time of 10 cycles
Assume 30% instructions require memory data access
Processor with cache:
Assume cache access time of 1 cycle
Assume hit rate 0.95 for instructions, 0.90 for data
Assume miss penalty (time to read memory into cache and from
cache to processor) is 11 cycles
– Comparing times of 100 instructions:

Time without cache          100×10 + 30×10
────────────────── = ──────────────────────────────────────── = 6.19
 Time with cache      100(0.95×1+0.05×11) + 30(0.9×1+0.1×11)
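The ratio above can be checked in a few lines:

```python
# 100 instruction fetches + 30 data accesses.
# Without cache: every access costs 10 cycles.
time_without = 100 * 10 + 30 * 10

# With cache: 1-cycle hit, 11-cycle miss penalty.
time_with = (100 * (0.95 * 1 + 0.05 * 11)    # instruction fetches
             + 30 * (0.90 * 1 + 0.10 * 11))  # data accesses

print(round(time_without / time_with, 2))  # 6.19
```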
Controlling Miss Rate
Increase cache size:
– More blocks can be kept in cache; lower miss rate.
– Larger cache is slower; expensive.
Increase block size:
– More data available; may lower miss rate.
– Fewer blocks in cache increase miss rate.
– Larger blocks need more time to swap.
[Diagrams: a cache of blocks backed by a large memory, shown once with a larger cache and once with larger blocks.]
Cache Size
[Plot: as cache size increases, hit rate rises while 1/(cycle time) falls; the optimum cache size lies where the two trends balance.]
Block Size
[Plot: hit rate (about 0.8 to 1.0) versus block size. Localized data favors larger blocks; fragmented data favors smaller blocks; an optimum block size lies between.]
Increasing Hit Rate
Hit rate increases with cache size.
Hit rate mildly depends on block size.
[Plot: hit rate (90%–100%) versus block size (16B, 32B, 64B, 128B, 256B) for cache sizes of 4KB, 16KB, and 64KB; miss rate = 1 – hit rate. Toward small blocks, the chances of covering a large data locality decrease; toward large blocks, the chances of fetching fragmented data increase.]
The Locality Principle
A program tends to access data that form a
physical cluster in the memory – multiple
accesses may be made within the same block.
Physical localities are temporal and may shift
over longer periods of time – data not used for
some time is less likely to be used in the future.
Upon miss, the least recently used (LRU) block
can be overwritten by a new block.
P. J. Denning, “The Locality Principle,”
Communications of the ACM, vol. 48, no. 7, pp.
19-24, July 2005.
Data Locality, Cache, Blocks
[Diagram: data needed by a program forms clusters in memory (Block 1, Block 2). Increase block size to match the locality size; increase cache size to include most blocks.]
Types of Caches
Direct-mapped cache
– Memory is divided into partitions the size of the cache
– Each partition is subdivided into blocks
Set-associative cache
Direct-Mapped Cache
[Diagram: a block of data needed by a program is swapped in from memory to its fixed cache location, overwriting the block already resident there (marked LRU).]
Set-Associative Cache
[Diagram: a needed block from memory may be placed in any of several cache locations within its set; the least recently used (LRU) block of the set is replaced.]
Direct-Mapped Cache
[Diagram: 32-word word-addressable memory, cache of 8 blocks, block size = 1 word. The 5-bit memory address splits into a 2-bit tag and a 3-bit index, the index being the local cache address (000–111); cache address = tag | index. Each 8-word partition of memory maps onto the whole cache; e.g., memory address 11 101 maps to index 101 with tag 11.]
Direct-Mapped Cache
[Diagram: 32-word word-addressable memory, cache of 4 blocks, block size = 2 words. The 5-bit memory address splits into a 2-bit tag, a 2-bit index (00–11), and a 1-bit block offset; cache address = tag | index | block offset. E.g., memory address 11 10 1 → tag 11, index 10, block offset 1.]
Number of Tag and Index Bits
Let the cache hold w words and the main memory W words.
Each word in cache has a unique index (local address):
– Number of index bits = log2 w
– Index bits are shared with the block offset when a block contains more than one word
Assume partitions of w words each in the main memory:
– There are W/w such partitions, each identified by a tag
– Number of tag bits = log2(W/w)
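A sketch of the bit-count arithmetic, using the 32-word memory and 8-word cache of the surrounding example slides:

```python
from math import log2

def address_split(memory_words, cache_words):
    # Index selects a word within the cache; tag identifies which
    # memory partition (of cache_words words) the block came from.
    index_bits = int(log2(cache_words))
    tag_bits = int(log2(memory_words // cache_words))
    return tag_bits, index_bits

print(address_split(32, 8))  # (2, 3): 2-bit tag, 3-bit index
```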
Direct-Mapped Cache (Byte Address)
[Diagram: 32-word byte-addressable memory, cache of 8 blocks, block size = 1 word. The 7-bit byte address splits into a 2-bit tag, a 3-bit index (000–111), and a 2-bit byte offset; cache address = tag | index. E.g., memory address 11 101 00 → tag 11, index 101, byte offset 00.]
Finding a Word in Cache
[Diagram: 32-word byte-addressable memory, address b6 b5 b4 b3 b2 b1 b0; cache size 8 words, block size = 1 word. The 3-bit index selects a cache entry (000–111); the entry’s stored 2-bit tag is compared with the address tag, and the comparator output is ANDed with the valid bit: 1 = hit (data is delivered from cache), 0 = miss. The 2-bit byte offset is not used for word access.]
How Many Bits Does the Cache Have?
Consider a main memory:
– 32 words; byte address is 7 bits wide: b6 b5 b4 b3 b2 b1 b0
– Each word is 32 bits wide
Assume that the cache block size is 1 word (32 bits of data) and the cache contains 8 blocks.
Cache requires, for each word: a 2-bit tag and one valid bit.
Total storage needed in cache
= #blocks in cache × (data bits/block + tag bits + valid bit)
= 8 × (32 + 2 + 1) = 280 bits
Physical storage/Data storage = 280/256 = 1.094
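The storage-overhead arithmetic generalizes; a small sketch (the helper name is illustrative):

```python
def cache_bits(num_blocks, words_per_block, tag_bits, word_bits=32):
    # Each block stores its data plus a tag and one valid bit.
    per_block = words_per_block * word_bits + tag_bits + 1
    total = num_blocks * per_block
    data = num_blocks * words_per_block * word_bits
    return total, total / data

# The 8-block, 1-word-block, 2-bit-tag example above:
print(cache_bits(8, 1, 2))  # (280, 1.09375)
```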
A More Realistic Cache
Consider 4 GB, byte-addressable main memory:
1Gwords; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0
Each word is 32 bits wide
Assume that cache block size is 1 word (32 bits data) and it
contains 64 KB data, or 16K words, i.e., 16K blocks.
Number of cache index bits = 14, because 16K = 2^14
Tag size = 32 – byte offset – #index bits = 32 – 2 – 14 = 16 bits
Cache requires, for each word:
16 bit tag, and one valid bit
Total storage needed in cache
= #blocks in cache × (data bits/block + tag size + valid bits)
= 2^14 × (32+16+1) = 16×2^10×49 = 784×2^10 bits = 784 Kb = 98 KB
Physical storage/Data storage = 98/64 = 1.53
But, need to increase the block size to match the size of locality.
Data Organization in Cache
[Diagram: a cache of 8 blocks (block index 000–111), block size = 4 words (block offset 00–11). Each block carries a valid bit and a 4-bit tag; a 4-bit tag means the memory is 16 times larger than the cache. The tag and valid bits are address-mapping overhead. Memory address of a word in cache: 1011 011 10 00 (tag | block index | block offset | byte offset).]
Cache Bits for 4-Word Block
Consider 4 GB, byte-addressable main memory:
1Gwords; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0
Each word is 32 bits wide
Assume that cache block size is 4 words (128 bits data) and
it contains 64 KB data, or 16K words, i.e., 4K blocks.
Number of cache index bits = 12, because 4K = 2^12
– Tag size = 32 – byte offset – #block offset bits – #index bits
= 32 – 2 – 2 – 12 = 16 bits
Cache requires, for each block:
– a 16-bit tag and one valid bit
– Total storage needed in cache
= #blocks in cache × (data bits/block + tag size + valid bit)
= 2^12 × (4×32+16+1) = 4×2^10×145 = 580×2^10 bits = 580 Kb = 72.5 KB
Physical storage/Data storage = 72.5/64 = 1.13
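The 4-word-block figures can be checked in a couple of lines:

```python
# 64 KB of data in 4-word (128-bit) blocks: 4K blocks, 16-bit tag, 1 valid bit.
num_blocks = 4 * 1024
bits_per_block = 4 * 32 + 16 + 1          # 145 bits per block
total_bits = num_blocks * bits_per_block  # 580 Kb of physical storage
data_bits = num_blocks * 4 * 32           # 64 KB of data

print(total_bits // 1024, round(total_bits / data_bits, 2))  # 580 1.13
```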
Using Larger Cache Block (4 Words)
[Diagram: 4GB = 1G words, byte-addressable memory; the 32-bit address b31 … b2 b1 b0 splits into a 16-bit tag, a 12-bit index (4K indexes, 0000 0000 0000 – 1111 1111 1111), a 2-bit block offset, and a 2-bit byte offset. The index selects one of 4K cache entries, each holding a valid bit, a 16-bit tag, and 4 words (128 bits) of data; the stored tag is compared with the address tag (1 = hit, 0 = miss), and a MUX driven by the block offset selects the requested word. Cache size: 16K words, block size = 4 words.]
Handling a Miss
Miss occurs when data at the required memory
address is not found in cache.
Controller actions:
– Stall pipeline
– Freeze contents of all registers
– Activate a separate cache controller
If cache is full
– select the least recently used (LRU) block in cache for overwriting
– If selected block has inconsistent data, take proper action
Copy the block containing the requested address from memory
– Restart the instruction
Miss During Instruction Fetch
Send original PC value (PC – 4) to the
memory.
Instruct main memory to perform a read
and wait for the memory to complete the
access.
Write cache entry.
Restart the instruction whose fetch failed.
Writing to Memory
Cache and memory become inconsistent
when data is written into cache, but not to
memory – the cache coherence problem.
Strategies to handle inconsistent data:
– Write-through
Always write to memory and cache simultaneously.
Write to memory is ~100 times slower than to cache.
– Write buffer
Write to cache and to buffer for writing to memory.
If buffer is full, the processor must wait.
Writing to Memory: Write-Back
Write-back (or copy back) writes only to
cache but sets a “dirty bit” in the block
where write is performed.
When a block with dirty bit “on” is to be
overwritten in the cache, it is first written to
the memory.
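A minimal sketch of the write-back bookkeeping, for a direct-mapped cache with 1-word blocks (class and variable names are illustrative; reads are omitted):

```python
class WriteBackCache:
    def __init__(self, num_blocks, memory):
        self.memory = memory               # backing store (list of words)
        self.blocks = [None] * num_blocks  # (tag, data, dirty) per index
        self.n = num_blocks

    def write(self, addr, value):
        index, tag = addr % self.n, addr // self.n
        entry = self.blocks[index]
        if entry is not None and entry[0] != tag and entry[2]:
            # Evicting a dirty block: write it back to memory first.
            self.memory[entry[0] * self.n + index] = entry[1]
        # Write only to the cache and set the dirty bit.
        self.blocks[index] = (tag, value, True)

mem = [0] * 32
c = WriteBackCache(8, mem)
c.write(5, 99)   # memory[5] stays 0: only the cache is updated
c.write(13, 42)  # address 13 also maps to index 5 -> dirty block written back
print(mem[5])    # 99
```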
AMD Opteron Microprocessor
L2 cache: 1MB; block 64B; write-back
L1 cache: split, 64KB each; block 64B; write-back
Interleaved Memory
Reduces miss penalty.
Memory designed to read all words of a block simultaneously in one read operation.
[Diagram: the processor exchanges words with the cache (small, fast memory); the cache exchanges blocks with a main memory built from four banks, memory bank 0 through memory bank 3.]
Example:
– Cache block size = 4 words
– Interleaved memory with 4 banks
– Suppose memory access takes ~15 cycles
– Miss penalty = 1 cycle to send address + 15 cycles to read a block + 4 cycles to send data to cache = 20 cycles
– Without interleaving, miss penalty = 65 cycles
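The two miss penalties can be tallied directly:

```python
send_addr, mem_access, transfer = 1, 15, 1  # cycles
words_per_block = 4

# Interleaved: all four banks read in parallel, then words stream to cache.
interleaved = send_addr + mem_access + words_per_block * transfer
# Without interleaving: each word pays the full memory access in turn.
sequential = send_addr + words_per_block * (mem_access + transfer)

print(interleaved, sequential)  # 20 65
```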
Cache Hierarchy
[Diagram: processor, L1 cache (SRAM; access time = T1), L2 cache (DRAM; access time = T2), and main memory (large, inexpensive, slow; access time = Tm).]
Average access time
= T1 + (1 – h1) [ T2 + (1 – h2)Tm ]
where
– T1 = L1 cache access time (smallest)
– T2 = L2 cache access time (small)
– Tm = memory access time (large)
– h1, h2 = hit rates (0 ≤ h1, h2 ≤ 1)
Average access time reduces by adding a cache.
Average Access Time
T1 + (1 – h1) [ T2 + (1 – h2)Tm ], where T1 < T2 < Tm
[Plot: access time versus L1 miss rate (1 – h1), rising from T1 at h1 = 1 toward h1 = 0, where it reaches T1 + T2 (if h2 = 1), T1 + T2 + Tm/2 (if h2 = 0.5), or T1 + T2 + Tm (if h2 = 0).]
Processor Pipeline Performance
Without Cache
5GHz processor, cycle time = 0.2ns
Memory access time = 100ns = 500 cycles
Ignoring memory access, CPI = 1
Assuming no memory data access:
CPI = 1 + # stall cycles
= 1 + 500 = 501
Performance with 1 Level Cache
Assume hit rate, h1 = 0.95
L1 access time = 0.2ns = 1 cycle
CPI
= 1 + # stall cycles
= 1 + 0.05×500
= 26
Processor speed increase due to cache
= 501/26 = 19.3
Performance with 2 Level Caches
Assume:
– L1 hit rate, h1 = 0.95
– L2 hit rate, h2 = 0.90
– L2 access time = 5ns = 25 cycles
CPI = 1 + # stall cycles
= 1 + 0.05 (25 + 0.10×500)
= 1 + 3.75 = 4.75
Processor speed increase due to 2 level caches
= 501/4.75
= 105.5
Speed increase due to L2 cache
= 26/4.75
= 5.47
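The CPI figures on these three slides can be reproduced in a few lines:

```python
mem_stall, l2_stall = 500, 25  # stall cycles for main memory and L2
h1, h2 = 0.95, 0.90

cpi_no_cache = 1 + mem_stall                               # 501
cpi_l1 = 1 + (1 - h1) * mem_stall                          # 26
cpi_l1_l2 = 1 + (1 - h1) * (l2_stall + (1 - h2) * mem_stall)  # 4.75

print(round(cpi_no_cache / cpi_l1, 1))     # 19.3, speedup from L1
print(round(cpi_no_cache / cpi_l1_l2, 1))  # 105.5, speedup from L1+L2
print(round(cpi_l1 / cpi_l1_l2, 2))        # 5.47, speedup from adding L2
```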
Miss Rate of Direct-Mapped Cache
[Diagram: 32-word word-addressable memory (byte addresses), cache of 8 blocks, block size = 1 word; cache address = tag | index, e.g., memory address 11 101 00 → tag 11, index 101, byte offset 00. A needed block maps to one fixed index, so it overwrites the block already resident there (the least recently used (LRU) block).]
Fully-Associative Cache (8-Way Set Associative)
[Diagram: 32-word word-addressable memory (byte addresses), cache of 8 blocks, block size = 1 word. The cache address is the tag alone; a needed block, e.g., memory address 11101 00 (tag 11101, byte offset 00), may be placed in any cache location, replacing the LRU block.]
Miss Rate of Direct-Mapped Cache
[Diagram: memory references to addresses 0, 8, 0, 6, 8, 16 on a direct-mapped cache of 8 blocks, block size = 1 word (32-word word-addressable memory). Addresses 0, 8, and 16 all map to index 000, so the tag stored there keeps changing (00 / 01 / 00 / 10) and every one of the six references misses.]
Miss Rate: Fully-Associative Cache
[Diagram: the same references 0, 8, 0, 6, 8, 16 on a fully-associative cache of 8 blocks. Blocks 00000, 01000, 00110, and 10000 can all reside in cache at once, so only the first touch of each block misses: 1. miss, 2. miss, 3. hit, 4. miss, 5. hit, 6. miss.]
Finding a Word in Associative Cache
[Diagram: 32-word byte-addressable memory; address b6 b5 b4 b3 b2 b1 b0 = 5-bit tag + 2-bit byte offset, with no index. The cache (8 words, block size = 1 word) stores a valid bit and a 5-bit tag with each word; having no index, the address tag must be compared with all tags in the cache. A tag match with a set valid bit means 1 = hit (data delivered from cache), otherwise 0 = miss.]
Eight-Way Set-Associative Cache
[Diagram: the same 32-word byte-addressable memory and 8-word cache, block size = 1 word. The 5-bit address tag is compared in parallel against all eight V | tag | data entries; the comparator outputs signal 1 = hit or 0 = miss and drive an 8-to-1 multiplexer that selects the matching entry’s data.]
Two-Way Set-Associative Cache
[Diagram: 32-word word-addressable memory (byte addresses), cache of 8 blocks organized as four sets (index 00–11) of two blocks each; cache address = tag | index. Memory address 111 01 00 → tag 111, index 01, byte offset 00; the needed block may occupy either way of set 01, replacing that set’s LRU block. Example tag pairs per set: 000 | 011, 100 | 001, 110 | 101, 010 | 111.]
Miss Rate: Two-Way Set-Associative Cache
[Diagram: references 0, 8, 0, 6, 8, 16 on the two-way set-associative cache (four sets, block size = 1 word). Addresses 0, 8, and 16 map to set 00 with tags 000, 010, and 100; two of them can reside in the set at once: 1. miss, 2. miss, 3. hit, 4. miss, 5. hit, 6. miss.]
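The three reference traces (direct-mapped: 6 misses; fully-associative and two-way: 4 misses each) can be replayed with a small simulator; this is an illustrative sketch with LRU replacement, 1-word blocks, and word addresses:

```python
from collections import OrderedDict

def count_misses(addresses, num_blocks, ways):
    sets = num_blocks // ways
    cache = [OrderedDict() for _ in range(sets)]  # per-set LRU: tag -> None
    misses = 0
    for a in addresses:
        s, tag = a % sets, a // sets
        if tag in cache[s]:
            cache[s].move_to_end(tag)        # hit: refresh LRU order
        else:
            misses += 1
            if len(cache[s]) == ways:
                cache[s].popitem(last=False)  # evict the LRU tag
            cache[s][tag] = None
    return misses

refs = [0, 8, 0, 6, 8, 16]                    # word addresses from the slides
print(count_misses(refs, 8, 1),               # direct-mapped: 6
      count_misses(refs, 8, 2),               # two-way: 4
      count_misses(refs, 8, 8))               # fully-associative: 4
```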
Two-Way Set-Associative Cache
[Diagram: 32-word byte-addressable memory; address b6 b5 b4 | b3 b2 | b1 b0 = 3-bit tag, 2-bit index, 2-bit byte offset. The cache (8 words, block size = 1 word) has four sets (index 00–11) of two V | tag | data entries. The index selects a set; both stored tags are compared with the address tag (1 = hit, 0 = miss), and a 2-to-1 MUX selects the matching entry’s data.]
Cache Miss and Page Fault
[Diagram: processor ↔ cache ↔ (via MMU) main memory ↔ disk. The disk holds all data, organized in pages (~4KB) accessed by physical addresses; main memory holds the cached pages and the page table; pages use write-back, same as in cache.]
Cache miss: a required block is not found in cache.
Page fault: a required page is not found in main memory.
“Page fault” in virtual memory is similar to “miss” in cache.
Virtual vs. Physical Address
Processor assumes a virtual memory addressing
scheme:
Disk is a virtual memory (large, slow)
A block of data is called a virtual page
An address is called virtual (or logical) address (VA)
Main memory may have a different addressing
scheme:
Physical memory consists of caches
Memory address is called physical address
MMU translates virtual address to physical address
Complete address translation table is large and is kept in
main memory
MMU contains TLB (translation-lookaside buffer), which
keeps record of recent address translations.
Memory Hierarchy
[Diagram: registers ↔ cache (1 or more levels) ↔ physical main memory ↔ virtual memory. Words are transferred between registers and cache via load/store; blocks are transferred automatically upon cache miss; pages are transferred automatically upon page fault.]
Memory Hierarchy Example
32-bit address (byte addressing)
4 GB virtual main memory (disk space)
Page size = 4 KB
Number of virtual pages = 4×2^30/(4×2^10) = 1M
Bits for virtual page number = log2(1M) = 20
128 MB physical main memory
Page size 4 KB
Number of physical pages = 128×2^20/(4×2^10) = 32K
Bits for physical page number = log2(32K) = 15
Page table contains 1M records specifying where
each virtual page is located.
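The page-number arithmetic in this example checks out directly:

```python
from math import log2

page = 4 * 1024                                   # 4 KB pages
virtual = 4 * 1024**3                             # 4 GB virtual memory
physical = 128 * 1024**2                          # 128 MB physical memory

v_pages, p_pages = virtual // page, physical // page
print(v_pages, int(log2(v_pages)))  # 1048576 (1M) virtual pages, 20 bits
print(p_pages, int(log2(p_pages)))  # 32768 (32K) physical pages, 15 bits
```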
Page Table
[Diagram: a page table register holds the address of the page table in main memory. The table has one entry per virtual page number (0, 1, 2, 3, …, K, … over 1M virtual pages); each entry carries a valid bit, other flags (e.g., dirty bit, LRU reference bit), and the page location: a physical page number (e.g., 2, 1, 3, 0) for pages resident in physical main memory, or “-” for pages held only in virtual main memory (pages on disk).]
32-bit Virtual Address (4 KB Page)
20-bit virtual page number | 10-bit word number within page | 2-bit byte offset
[Diagram: a virtual page contains 4KB of data, i.e., 1K words of 32 bits (4 bytes) each.]
Virtual to Physical Address Translation
Virtual address:   20-bit virtual page number | 12-bit byte offset within page
         ↓ address translation
Physical address:  15-bit physical page number | 12-bit byte offset within page
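Translation keeps the byte offset and swaps the page number; a sketch (the page-table contents here are made up for illustration):

```python
PAGE_OFFSET_BITS = 12
page_table = {0: 2, 1: 1, 2: 3, 3: 0}  # virtual page -> physical page (illustrative)

def translate(va):
    vpn = va >> PAGE_OFFSET_BITS              # virtual page number
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    ppn = page_table[vpn]                     # page fault if absent (not handled)
    return (ppn << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x0ABC)))  # virtual page 0 -> physical page 2: 0x2abc
```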
Virtual Memory System
[Diagram: the processor issues a virtual or logical address (VA) to the MMU, the memory management unit with TLB, which returns a physical address. The physical address goes to the cache (SRAM) and, on a miss, to main memory (DRAM), which holds the page table; data returns to the processor. The disk exchanges pages with main memory via DMA (direct memory access).]
TLB: Translation-Lookaside Buffer
A processor request requires two
accesses to main memory:
Access page table to get physical address
Access physical address
TLB acts as a cache of page table
Holds recent virtual to physical page translations
Eliminates one main memory access if requested
virtual page address is found in TLB (hit)
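A toy sketch of the TLB short-circuiting the page-table access (the dictionary structures are illustrative stand-ins for the hardware):

```python
page_table = {0: 5, 1: 7, 2: 3}  # in main memory: virtual page -> physical page
tlb = {}                         # small cache of recent translations

def lookup(vpn):
    if vpn in tlb:               # TLB hit: no page-table access needed
        return tlb[vpn], "TLB hit"
    ppn = page_table[vpn]        # TLB miss: one extra main-memory access
    tlb[vpn] = ppn               # record the translation for next time
    return ppn, "TLB miss"

print(lookup(1))  # (7, 'TLB miss')
print(lookup(1))  # (7, 'TLB hit')
```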
TLB Organization
[Diagram: as on the page-table slide, a page table register points to the page table in main memory (one entry per virtual page number over 1M virtual pages; each entry has a valid bit, flags such as the dirty bit and LRU reference bit, and a page location mapping the virtual page to physical main memory or to disk). The TLB caches a few recent entries; each TLB entry holds V, D, and R bits, a Tag (the index in the page table), and the physical page address.]
TLB Data
V: Valid bit
D: Dirty bit
R: Reference bit (LRU)
Tag: Index in page table
Physical page address
Typical TLB Characteristics
TLB size: 16 – 512 entries
Block size: 1 – 2 page table entries of 4 –
8 bytes each
Hit time: 0.5 – 1 clock cycle
Miss penalty: 10 – 100 clock cycles
Miss rate: 0.01% – 1%