Lecture 9-1
Virtual Memory
Original Note By Prof. Mike Schulte
Presented by Pradondet Nilagupta
Spring 2001
Virtual Memory
• Virtual memory (VM) allows main memory (DRAM) to
act like a cache for secondary storage (magnetic disk).
• VM address translation provides a mapping from the
virtual address of the processor to the physical
address in main memory or on disk.
• VM provides the following benefits
– Allows multiple programs to share the same physical memory
– Allows programmers to write code as though they have a very large
amount of main memory
– Automatically handles bringing in data from disk
• Cache terms vs. VM terms
– Cache block => page or segment
– Cache Miss => page fault or address fault
Virtual Memory Basics
• Programs reference “virtual” addresses in a nonexistent memory
– These are then translated into real “physical” addresses
– Virtual address space may be bigger than physical address
space
• Divide physical memory into blocks, called pages
– Anywhere from 512 bytes to 16 MB (4 KB typical)
• Virtual-to-physical translation by indexed table lookup
– Add another cache for recent translations (the TLB)
• Invisible to the programmer
– Looks to your application like you have a lot of memory!
– Anyone remember overlays?
VM: Page Mapping
[Figure: each process's virtual address space maps through page frames into physical memory; pages not currently resident in physical memory live on disk.]
VM: Address Translation
[Figure: a 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset (the offset width is log2 of the page size). The virtual page number indexes a per-process page table, located via a page table base register; each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and the physical page number. The physical page number is concatenated with the unchanged page offset to form the address sent to physical memory.]
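To make the lookup concrete, here is a minimal C sketch of the translation step, assuming a flat single-level page table, 4 KB pages, and illustrative field names (not any particular machine's format):

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_OFFSET_BITS 12                /* log2(4 KB page size) */
    #define NUM_VIRT_PAGES   (1u << 20)        /* 32-bit VA => 2^20 pages */

    /* Hypothetical page-table entry: valid bit plus physical page number.
       A real PTE also carries protection, dirty, and reference bits. */
    typedef struct {
        bool     valid;
        uint32_t ppn;                          /* physical page number */
    } pte_t;

    pte_t page_table[NUM_VIRT_PAGES];          /* per-process table */

    /* Translate a virtual address; returns false on a page fault. */
    bool translate(uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;
        uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);

        if (!page_table[vpn].valid)
            return false;                      /* page fault: OS fetches page */

        /* Physical page number concatenated with the unchanged offset. */
        *paddr = (page_table[vpn].ppn << PAGE_OFFSET_BITS) | offset;
        return true;
    }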
Typical Page Parameters
Parameter                            Value
Page Size                            4 KB - 64 KB
L1 Cache Hit Time                    1-2 clock cycles
Virtual Hit (e.g., mapped to DRAM)   50-400 clock cycles
Miss Penalty (all the way to disk)   700K-6M clock cycles
Disk Access Time                     500K-4M clock cycles
Page Transfer Time                   200K-2M clock cycles
Page Fault Rate                      0.001% - 0.00001%
Main Memory Size                     4 MB - 4 GB
• It’s a lot like what happens in a cache
– But everything (except miss rate) is a LOT worse
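A rough worked example using mid-range values from the table above: with a page fault rate of 0.001% (= 0.00001) and a miss penalty of 1M clock cycles, page faults add about 0.00001 × 1,000,000 = 10 cycles to the average memory access, so fault rates must stay tiny for VM to be practical.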
Paging vs. Segmentation
• Pages are fixed sized blocks
• Segments vary from 1 byte to 2^32 bytes (for 32-bit addresses)
Aspect                Page                             Segment
Words per address     One - contains page and offset   Two - possibly large max size,
                                                       so need segment and offset words
Programmer visible?   No                               Sometimes
Replacement           Trivial - because of fixed size  Hard - need to find contiguous
                                                       space; use garbage collection
Memory Efficiency     Internal fragmentation           External fragmentation
Disk Efficiency       Yes - adjust page size to        Not always - segment size
                      balance access and transfer      varies
                      time
Cache and VM Parameters
Parameter           L1 Cache        Virtual Memory
Block (page) size   16-128 bytes    4096-65,536 bytes
Hit time            1-2 cycles      40-100 cycles
Miss penalty        8-100 cycles    1 to 6 million cycles
Miss rate           0.5-10%         0.00001-0.001%
Memory size         16 KB to 1 MB   16 MB to 8 GB
• How is virtual memory different from caches?
– Software controls replacement - why?
– Size of virtual memory is determined by the size of the processor
address
– Disk is also used to store the file system - nonvolatile
Paged and Segmented VM
(Figure 5.38, pg. 442)
• Virtual memories can be categorized into two
main classes
– Paged memory : fixed size blocks
– Segmented memory : variable size blocks
Paged vs. Segmented VM
• Paged memory
– Fixed-size blocks (4 KB to 64 KB)
– One word per address (page number + page offset)
– Easy to replace pages (all same size)
– Internal fragmentation (not all of a page is used)
– Efficient disk traffic (optimized for the page size)
• Segmented memory
– Variable-size blocks (up to 64 KB or 4 GB)
– Two words per address (segment + offset)
– Difficult to replace segments (must find where a segment fits)
– External fragmentation (unused portions of memory)
– Inefficient disk traffic (may have small or large transfers)
• Hybrid approaches
– Paged segments: segments are a multiple of a page size
– Multiple page sizes: (e.g., 8 KB, 64 KB, 512 KB, 4096 KB)
Pages are Cached in a Virtual
Memory System
We can ask the same four questions we asked about caches
• Q1: Block Placement
– choice: lower miss rates and complex placement or vice versa
• miss penalty is huge
• so choose low miss rate ==> place page anywhere in
physical memory
• similar to fully associative cache model
• Q2: Block Addressing - use additional data structure
– fixed size pages - use a page table
• virtual page number ==> physical page number and
concatenate offset
• tag bit to indicate presence in main memory
Normal Page Tables
• Size is number of virtual pages
• Purpose is to hold the translation of VPN to
PPN
– Permits ease of page relocation
– Make sure to keep tags to indicate page is mapped
• Potential problem:
– Consider a 32-bit virtual address and 4 KB pages
– 4 GB / 4 KB = 1M entries (words) just for the page table!
– Might have to page in the page table…
• Consider how the problem gets worse on 64-bit
machines with even larger virtual address spaces!
• The Alpha has a 43-bit virtual address with 8 KB pages…
– Might have multi-level page tables (see the sketch below)
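A minimal C sketch of the multi-level idea, assuming an illustrative 10/10/12-bit split of a 32-bit address (not the Alpha's actual format): second-level tables are allocated only for regions of the address space that are actually used, so most of the giant flat table never needs to exist.

    #include <stdint.h>
    #include <stddef.h>

    #define L1_BITS     10     /* top-level index */
    #define L2_BITS     10     /* second-level index */
    #define OFFSET_BITS 12     /* 4 KB pages */

    typedef struct {
        uint32_t ppn;
        int      valid;
    } pte_t;

    typedef struct {
        pte_t *entries;        /* NULL if this 4 MB region was never touched */
    } l2_table_t;

    l2_table_t l1_table[1 << L1_BITS];   /* top level: always resident */

    /* Returns a pointer to the PTE, or NULL if no mapping exists. */
    pte_t *lookup(uint32_t vaddr)
    {
        uint32_t l1 = vaddr >> (L2_BITS + OFFSET_BITS);
        uint32_t l2 = (vaddr >> OFFSET_BITS) & ((1u << L2_BITS) - 1);

        if (l1_table[l1].entries == NULL)
            return NULL;                 /* whole region unmapped */
        pte_t *pte = &l1_table[l1].entries[l2];
        return pte->valid ? pte : NULL;
    }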
Inverted Page Tables
Similar to a set-associative mechanism
• Make the page table reflect the # of physical pages
(not virtual)
• Use a hash mechanism
– virtual page number ==> hashed index into the inverted page table
– Compare the virtual page number with the tag to make sure it is the
one you want
– If it matches
• check that the page is in memory - OK if yes - page fault if not
– If it doesn't match - miss
• go to the full page table on disk to get the new entry
• implies 2 disk accesses in the worst case
• trades an increased worst-case penalty for a decrease in the
capacity-induced miss rate, since the smaller page table leaves
more room for real pages (sketched below)
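A hedged C sketch of the hashed lookup (the hash function, table size, ASID field, and the omission of collision chaining are all simplifying assumptions):

    #include <stdint.h>

    #define NUM_FRAMES (1u << 16)   /* one entry per physical page frame */

    typedef struct {
        uint32_t vpn_tag;           /* virtual page number kept as the tag */
        uint32_t asid;              /* address-space id: which process */
        int      valid;
    } ipte_t;

    ipte_t inverted_table[NUM_FRAMES];

    /* Toy hash mixing VPN and ASID down to a frame index (an assumption;
       real designs also chain entries to handle collisions). */
    static uint32_t hash_vpn(uint32_t vpn, uint32_t asid)
    {
        return ((vpn * 2654435761u) ^ asid) & (NUM_FRAMES - 1);
    }

    /* Returns the frame number on a hit; -1 means consult the full
       page table on disk (the worst case's extra disk access). */
    int32_t ipt_lookup(uint32_t vpn, uint32_t asid)
    {
        uint32_t idx = hash_vpn(vpn, asid);
        ipte_t  *e   = &inverted_table[idx];

        if (e->valid && e->vpn_tag == vpn && e->asid == asid)
            return (int32_t)idx;    /* the index itself names the frame */
        return -1;                  /* miss */
    }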
Inverted Page Table
[Figure: the virtual page number is hashed to index the inverted page table, which stores entries only for pages in physical memory. The stored tag is compared against the virtual page number and checked against a valid bit; on a match, the frame number is combined with the page offset.]
Address Translation Reality
• The translation process using page tables
takes too long!
• Use a cache to hold recent translations
– Translation Lookaside Buffer
• Typically 8-1024 entries
• Block size same as a page table entry (1 or 2 words)
• Only holds translations for pages in memory
• 1 cycle hit time
• Highly or fully associative
• Miss rate < 1%
• Miss goes to main memory (where the whole page
table lives)
• Must be purged on a process switch
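A minimal software model of that lookup order, assuming a small fully associative TLB with illustrative sizes, a simplistic replacement choice, and a stubbed-out page-table walk:

    #include <stdint.h>

    #define TLB_ENTRIES 64

    typedef struct {
        uint32_t vpn;
        uint32_t ppn;
        int      valid;
    } tlb_entry_t;

    tlb_entry_t tlb[TLB_ENTRIES];

    /* Placeholder: a real walk indexes the page table in main memory. */
    static uint32_t walk_page_table(uint32_t vpn) { return vpn; }

    uint32_t translate_with_tlb(uint32_t vpn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++)        /* fully associative: */
            if (tlb[i].valid && tlb[i].vpn == vpn)   /* compare every tag  */
                return tlb[i].ppn;                   /* hit: ~1 cycle      */

        uint32_t ppn = walk_page_table(vpn);         /* miss: main memory  */
        int victim = (int)(vpn % TLB_ENTRIES);       /* simplistic choice  */
        tlb[victim] = (tlb_entry_t){ .vpn = vpn, .ppn = ppn, .valid = 1 };
        return ppn;
    }

    /* "Purged on a process switch": invalidate every entry. */
    void tlb_flush(void)
    {
        for (int i = 0; i < TLB_ENTRIES; i++)
            tlb[i].valid = 0;
    }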
Back to the 4 Questions
• Q3: Block Replacement (pages in physical
memory)
– LRU is best
• So use it to minimize the horrible miss penalty
– However, real LRU is expensive
• Page table contains a use tag
• On access the use tag is set
• OS checks them every so often, records what it sees,
and resets them all
• On a miss, the OS decides who has been used the
least
– Basic strategy: Miss penalty is so huge, you can spend a
few OS cycles to help reduce the miss rate
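One common way to implement that periodic scan is "aging"; this C sketch assumes an 8-bit history counter per page and fixed-size arrays (an illustration of the idea, not the only scheme):

    #include <stdint.h>

    #define NUM_PAGES 1024

    uint8_t use_bit[NUM_PAGES];   /* set by hardware on each access */
    uint8_t age[NUM_PAGES];       /* software history; larger = more recent */

    /* Run every so often by the OS: record what the use bits say,
       then reset them all for the next interval. */
    void age_pages(void)
    {
        for (int p = 0; p < NUM_PAGES; p++) {
            age[p] = (uint8_t)((age[p] >> 1) | (use_bit[p] << 7));
            use_bit[p] = 0;
        }
    }

    /* On a miss: evict the page with the smallest history value,
       i.e. the one used least recently (approximately). */
    int pick_victim(void)
    {
        int victim = 0;
        for (int p = 1; p < NUM_PAGES; p++)
            if (age[p] < age[victim])
                victim = p;
        return victim;
    }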
Last Question
• Q4: Write Policy
– Always write-back
• Due to the access time of the disk
• So, you need to keep tags to show when pages are
dirty and need to be written back to disk when
they’re swapped out.
– Anything else is pretty silly
– Remember – the disk is SLOW!
Page Sizes
An architectural choice
• Large pages are good:
– reduces page table size
– amortizes the long disk access
– if spatial locality is good then hit rate will improve
• Large pages are bad:
– more internal fragmentation
• if everything is random each structure’s last page is only half
full
• Half of bigger is still bigger
• if there are 3 structures per process: text, heap, and control
stack
• then 1.5 pages are wasted for each process
– process start-up time takes longer
• since at least 1 page of each type is required prior to starting
• the transfer-time penalty is higher
More on TLBs
• The TLB must be on chip
– otherwise it is worthless
– small TLBs are worthless anyway
– large TLBs are expensive
• high associativity is likely
• ==> the price of CPUs goes up!
– OK as long as performance goes up faster
Address Translation with
Page Table
(Figure 5.40, pg. 444)
• A page table translates a
virtual page number into
a physical page number
• The page offset remains
unchanged
• Page tables are large
– 32 bit virtual address
– 4 KB page size
– 2^20 4-byte table entries = 4 MB
• Page tables are stored in
main memory => slow
• Cache table entries in a
translation buffer
Fast Address Translation
with Translation Buffer (TB)
(Figure 5.41, pg. 446)
• Cache translated
addresses in TB
• Alpha 21064 data TB
– 32 entries
– fully associative
– 30-bit tag
– 21-bit physical address
– Valid and read/write bits
– Separate TB for instructions
• Steps in translation
– compare page no. to tags
– check for memory
access violation
– send physical page no.
of matching tag
– combine physical page
no. and page offset
Selecting a Page Size
• Reasons for larger page size
– Page table size is inversely proportional to the page size;
therefore memory is saved
– Fast cache hit time easy when cache size < page size (VA caches);
bigger page makes this feasible as cache size grows
– Transferring larger pages to or from secondary storage, possibly
over a network, is more efficient
– The number of TLB entries is restricted by clock cycle time, so a
larger page size maps more memory, thereby reducing TLB misses
• Reasons for a smaller page size
– Want to avoid internal fragmentation: don’t waste storage; data
must be contiguous within page
– Quicker process start for small processes - don’t need to bring in
more memory than needed
Memory Protection
• With multiprogramming, a computer is shared by
several programs or processes running concurrently
– Need to provide protection
– Need to allow sharing
• Mechanisms for providing protection
– Provide base and bound registers: Base ≤ Address ≤ Bound (see the sketch after this list)
– Provide both user and supervisor (operating system) modes
– Provide CPU state that the user can read, but cannot write
• Base and bound registers, user/supervisor bit, exception bits
– Provide method to go from user to supervisor mode and vice versa
• system call : user to supervisor
• system return : supervisor to user
– Provide permissions for each page or segment in memory
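A minimal C sketch of the base-and-bound check from the list above (the register names are hypothetical):

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical per-process protection registers, loaded by the OS
       in supervisor mode and readable but not writable by the user. */
    uint32_t base_reg;    /* lowest legal address  */
    uint32_t bound_reg;   /* highest legal address */

    /* Every user-mode access must satisfy Base <= Address <= Bound;
       otherwise the hardware raises a protection exception. */
    bool access_ok(uint32_t addr)
    {
        return base_reg <= addr && addr <= bound_reg;
    }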
Alpha VM Mapping
(Figure 5.43, pg. 451)
• “64-bit” address divided
into 3 segments
– seg0 (bit 63=0) user code
– seg1 (bit 63 = 1, 62 = 1) user stack
– kseg (bit 63 = 1, 62 = 0)
kernel segment for OS
• Three level page table, each
one page
– Reduces page table size
– Increases translation time
• PTE bits: valid, kernel &
user read & write enable
Alpha 21064
Memory Hierarchy
• The Alpha 21064 memory hierarchy includes
– A 32-entry, fully associative data TB
– A 12-entry, fully associative instruction TB
– An 8 KB direct-mapped, physically addressed data cache
– An 8 KB direct-mapped, physically addressed instruction cache
– A 4-entry by 64-bit instruction prefetch stream buffer
– A 4-entry by 256-bit write buffer
– A 2 MB direct-mapped second-level unified cache
• The virtual memory
– Maps a 43-bit virtual address to a 34-bit physical address
– Has a page size of 8 KB
Alpha Memory Performance: Miss Rates
[Figure: miss rates (log scale, 0.01% to 100%) for the 8 KB instruction cache, the 8 KB data cache, and the 2 MB L2 cache across AlphaSort, TPC-B (db1, db2), and SPEC benchmarks (Espresso, Eqntott, Li, Sc, Compress, Gcc, Ora, Mdljsp2, Fpppp, Swm256, Ear, Doduc, Tomcatv, Alvinn, Wave5, Hydro2d, Mdljp2, Nasa7, Spice, Su2cor).]
Alpha CPI Components
• Largest increase in CPI due to
– I stall: Instruction stalls from branch mispredictions
– Other: data hazards, structural hazards
[Figure: CPI (0.00-4.50) broken into D$, I$, L2, I Stall, and Other components for the same benchmarks.]
Pitfall: Address space too small
• One of the biggest mistakes that can be
made when designing an architecture is to
devote too few bits to the address
– address size limits the size of virtual memory
– difficult to change since many components depend on it
(e.g., PC, registers, effective-address calculations)
• As program size increases, larger and larger
address sizes are needed
– 8 bit: Intel 8080 (1975)
– 16 bit: Intel 8086 (1978)
– 24 bit: Intel 80286 (1982)
– 32 bit: Intel 80386 (1985)
– 64 bit: Intel Merced (1998)
Pitfall: Predicting Cache Performance
of one Program from Another Program
• 4 KB data cache miss rate: 8%, 12%, or 28%?
• 1 KB instruction cache miss rate: 0%, 3%, or 10%?
• Alpha vs. MIPS for 8 KB data: 17% vs. 10%
[Figure: miss rate (0%-35%) versus cache size (1-128 KB) for the instruction and data caches of gcc, espresso, and tomcatv.]
Pitfall: Simulating Too Small an
Address Trace
[Figure: cumulative average memory access time (1 to 4.5) versus instructions executed (0-12 billion).]
Virtual Memory Summary
• Virtual memory (VM) allows main memory
(DRAM) to act like a cache for secondary
storage (magnetic disk).
• The large miss penalty of virtual memory
leads to different strategies than caches
– Fully associative, TB + PT, LRU, Write-back
• Designed as
– paged: fixed size blocks
– segmented: variable size blocks
– hybrid: segmented paging or multiple page sizes
• Avoid small address size
Summary 2: Typical Choices
Option               TLB                L1 Cache       L2 Cache         VM (page)
Block Size           4-8 bytes (1 PTE)  4-32 bytes     32-256 bytes     4K-16K bytes
Hit Time             1 cycle            1-2 cycles     6-15 cycles      10-100 cycles
Miss Penalty         10-30 cycles       8-66 cycles    30-200 cycles    700K-6M cycles
Local Miss Rate      0.1-2%             0.5-20%        13-15%           0.00001-0.001%
Size                 32 B - 8 KB        1-128 KB       256 KB - 16 MB   -
Backing Store        L1 Cache           L2 Cache       DRAM             Disks
Q1: Block Placement  Fully or set       DM             DM or SA         Fully associative
                     associative
Q2: Block ID         Tag/block          Tag/block      Tag/block        Table
Q3: Block            Random (not last)  N.A. for DM    Random (if SA)   LRU/LFU
    Replacement