IC220 Set #18: Caching Finale and Virtual Reality (Chapter 7)

ADMIN
• Reading: finish Chapter 7
– Sections 7.4 (skip 531-536), 7.5, 7.7, 7.8

Cache Performance
• Simplified model:
execution time = (execution cycles + stall cycles) × cycle time
= execTime + stallTime
stall cycles = (MemoryAccesses / Program) × MissRate × MissPenalty
(or)
= (Instructions / Program) × (Misses / Instruction) × MissPenalty
• Two typical ways of improving performance:
– decreasing the miss rate
– decreasing the miss penalty
What happens if we increase block size?
Add associativity?

Performance Example
• Suppose a processor has a CPI of 1.5 given a perfect cache. If there are 1.2 memory accesses per instruction, a miss penalty of 20 cycles, and a miss rate of 10%, what is the effective CPI with the real cache?

Split Caches
• Instructions and data have different properties
– May benefit from different cache organizations (block size, assoc…)
[Figure: cache hierarchy diagrams labeled L1, ICache (L1), DCache (L1), L2 Cache, Main memory]
• Why else might we want to do this?

Cache Complexities
• Not always easy to understand implications of caches:
[Figure: Theoretical behavior of Radix sort vs. Quicksort; x-axis: Size (K items to sort), 4 to 4096]
[Figure: Observed behavior of Radix sort vs. Quicksort; x-axis: Size (K items to sort), 4 to 4096]

Cache Complexities
• Here is why:
[Figure: Radix sort vs. Quicksort; x-axis: Size (K items to sort), 4 to 4096]
• Memory system performance is often the critical factor
– multilevel caches and pipelined processors make it harder to predict outcomes
– Compiler optimizations to increase locality sometimes hurt ILP
• Difficult to predict best algorithm: need experimental data

Program Design for Caches – Example 1
• Option #1
for (j = 0; j < 20; j++)
    for (i = 0; i < 200; i++)
        x[i][j] = x[i][j] + 1;
• Option #2
for (i = 0; i < 200; i++)
    for (j = 0; j < 20; j++)
        x[i][j] = x[i][j] + 1;

Program Design for Caches – Example 2
• Why might this code be problematic?
int A[1024][1024];
int B[1024][1024];
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        A[i][j] += B[i][j];
• How to fix it?

VIRTUAL MEMORY

Virtual memory summary (part 1)
• Data access without virtual memory:
[Figure: Memory address → Cache → Memory]
• “all problems in Computer Science can be solved by another level of indirection” -- Butler Lampson

Virtual memory summary (part 2)
• Data access with virtual memory:
[Figure: Virtual address (bits 31-0: Virtual page number, Page offset) → Translation → Physical address (bits 29-0: Physical page number, Page offset) → Cache → Memory → Disk]

Virtual Memory
• Main memory can act as a cache for the secondary storage (disk)
[Figure: Virtual addresses → Address translation → Physical addresses or Disk addresses]
• Advantages:
– Illusion of having more physical memory
– Program relocation
– Protection
• Note that the main point is caching of disk in main memory, but this will affect all our memory references!

Address Translation
Terminology:
• Cache block
• Cache miss
• Cache tag
• Byte offset
[Figure: Virtual address (bits 31-0: Virtual page number, Page offset) → Translation → Physical address (bits 29-0: Physical page number, Page offset)]

Pages: virtual memory blocks
• Page faults: the data is not in memory, retrieve it from disk
– huge miss penalty (slow disk), thus
• pages should be fairly ________
• Replacement strategy:
– can handle the faults in software instead of hardware
• Writeback or write-through?

Page Tables
[Figure: the Virtual page number indexes the Page table; each entry holds a Valid bit and a Physical page or disk address; valid entries point into Physical memory, invalid entries point to Disk storage]

Example – Address Translation Part 1
• Our virtual memory system has:
– 32 bit virtual addresses
– 28 bit physical addresses
– 4096 byte page sizes
• How to split a virtual address?
Virtual page # | Page offset
• What will the physical address look like?
Physical page # | Page offset
• How many entries in the page table?
EX 7-31…

Example – Address Translation Part 2
• Translate the following addresses:
1. C0001560
2. C0006123
3. C0002450

Page Table
Virtual page #   Valid?   Physical Page or Disk Block #
C0000            1        A204
C0001            1        A200
C0002            0        FB00
C0003            1        8003
C0004            1        7290
C0005            0        5600
C0006            1        F5C0
…

Making Address Translation Fast
• A cache for address translations: translation lookaside buffer
[Figure: the TLB (Valid, Dirty, Ref, Tag, Physical page address) holds a subset of the Page table entries (Valid, Dirty, Ref, Physical page or disk address), which point into Physical memory or Disk storage]
• Typical values:
– 16-512 entries
– miss-rate: .01% - 1%
– miss-penalty: 10 – 100 cycles

Protection and Address Spaces
• Every program has its own “address space”
– Program A’s address 0xc000 0200 not same as program B’s
– OS maps every virtual address to distinct physical addresses
• How do we make this work?
– Page tables –
– TLB –
• Can program A access data from program B? Yes, if…
1. OS can map different virtual page #’s to same physical page #’s
• So A’s 0xc000 0200 = B’s 0xb320 0200
2. Program A has read or write access to the page
3. OS uses supervisor/kernel protection to prevent user programs from modifying page table/TLB

TLBs and Caches
• What happens after translation?
[Figure: Virtual address (bits 31-0: Virtual page number, Page offset) → Translation → Physical address (bits 29-0: Physical page number, Page offset) → Cache]

Integrating Virtual Memory, TLBs, and Caches (Figure 7.25)
• Virtual address → TLB access
• TLB hit?
– No: TLB miss exception
– Yes: Physical address, then Write?
• Write? No:
– Try to read data from cache
– Cache hit? No: Cache miss stall while read block
– Cache hit? Yes: Deliver data to the CPU
• Write? Yes:
– Write access bit on? No: Write protection exception
– Write access bit on? Yes: Try to write data to cache
– Cache hit? No: Cache miss stall while read block
– Cache hit? Yes: Write data into cache, update the dirty bit, and put the data and the address into the write buffer

Modern Systems
• Things are getting complicated!

Some Issues
• Processor speeds continue to increase very fast, much faster than either DRAM or disk access times
[Figure: Performance (log scale, 1 to 100,000) vs. Year for CPU and Memory]
• Design challenge: dealing with this growing disparity
– Prefetching? 3rd level caches and more? Memory design?