Lec13 - Program Detail

advertisement
This lecture…

Options for managing memory:
–
–
–
–
–

Paging
Segmentation
Multi-level translation
Paged page tables
Inverted page tables
Comparison among options
Hardware Translation Overview
Virtual address
CPU
Translation
Box (MMU)
Physical
address
Physical
memory
Data read or write
(untranslated)

Think of memory in two ways:
– View from the CPU – what program sees, virtual memory
– View from memory – physical memory



Translation implemented in hardware; controlled in
software.
There are many kinds of hardware translation schemes.
Start with the simplest!
Base and Bounds


Each program loaded into contiguous regions of physical
memory, but with protection between programs.
First built in the Cray-1.
bounds
CPU
virtual
address
base
yes
<
no
+
physical
address Memory
MMU
error


relocation: physical addr = virtual addr + base register
protection: check that address falls in (base, base+bound)
Base and Bounds



Program has illusion it is running on its own dedicated machine,
with memory starting at 0 and going up to size = bounds.
Like linker-loader, program gets contiguous region of memory.
But unlike linker-loader, protection: program can only touch
locations in physical memory between base and base + bounds.
0
Code
bound
6250
Data
Virtual memory
stack
6250 + bound
Physical memory
Base and Bounds
Provides level of indirection: OS can move bits around
behind the program’s back, for instance, if program
needs to grow beyond its bounds, or if need to
coalesce fragments of memory.
 Stop program, copy bits, change base and bounds
registers, restart.


Only the OS gets to change the base and bounds!
Clearly, user program can’t, or else lose protection.
Base and Bounds

With base&bounds system, what gets saved/restored
on a context switch?
– Everything from before + base/limit values
– Complete contents of memory out to disk (Called
“Swapping”)

Hardware cost:
– 2 registers, Adder, Comparator
– Plus, slows down hardware because need to take time to
do add/compare on every memory reference.
Base and bound tradeoffs

Pros:
–

Simple, fast
Cons:
1. Hard to share between programs
– For example, suppose two copies of “vi”
» Want to share code
» Want data and stack to be different
– Can’t do this with base and bounds!
2. Complex memory allocation
3. Doesn’t allow heap, stack to grow dynamically – want to put
these as far apart as possible in virtual memory, so that
they can grow to whatever size is needed.
Base and bound: Cons (complex allocation)

Variable-sized partitions
– Hole – block of available memory; holes of various size
are scattered throughout memory.
– New process allocated memory from hole large enough to
fit it
– Operating system maintains information about:
a) allocated partitions
b) free partitions (hole)
OS
process 5
OS
OS
process 5
process 5
process 9
10 arrive
9 arrive
8 done
OS
process 8
process 9
process 10
5 done
process 2
process 2
process 2
process 2
Dynamic Storage-Allocation Problem

How to satisfy a request of size n from a list of free
holes?
– First-fit: Allocate the first hole that is big enough.
– Best-fit: Allocate the smallest hole that is big enough; must
search entire list, unless ordered by size. Produces the
smallest leftover hole.
– Worst-fit: Allocate the largest hole; must also search
entire list. Produces the largest leftover hole.
First-fit and best-fit better than worst-fit in terms
of speed and storage utilization.
 Particularly bad if want address space to grow
dynamically (e.g., the heap).

Internal Fragmentation
OS
Process 7
Process 4 request for
18,462 bytes
Process 4
Process 2

Hole of 18,464 bytes
Internal fragment of
2 bytes
Internal Fragmentation – allocated memory may be
slightly larger than requested memory but not being
used.
External Fragmentation
External Fragmentation - total memory space exists
to satisfy request but it is not contiguous
 50-percent rule: one-third of memory may be
unusable.

– Given N allocated blocks, another 0.5N blocks will be lost
due to fragmentation.
OS
50k
125k
Process 9
?
process 3
process 8
100k
process 2
Compaction
Shuffle memory contents to place all free memory
together in one large block
 Only if relocation dynamic!
 Same I/O DMA problem

OS
OS
125k Process 9
50k
process 3
process 3 90k
process 8
process 8 60k
process 8
process 3
100k
process 2
OS
process 2
process 2
Segmentation
A segment is a region of logically contiguous memory.
 Idea is to generalize base and bounds, by allowing a
table of base&bound pairs.
 Virtual address: <segment-number, offset>
 Segment table – maps two-dimensional user defined
address into one-dimensional physical address

– base - starting physical location
– limit - length of segment

Hardware support
– Segment Table Base Register
– Segment Table Length Register
Segmentation
Segmentation example

Assume 14 bit addresses divided up as:
– 2 bit segment ID (1st digit), and a 12 bit segment offset
physical memory
(last 3).
Virtual memory
0
Seg
base
limit
0 code
0x4000
0x700
1 Data
0x0000
0x500
2
-
3 Stack 0x2000 0x1000
Segment table
6ff
1000
14ff
0
4ff
2000
2fff
3000
3fff
where is 0x0240?
0x1108?
0x265c?
0x3002?
0x1600?
4000
46ff
Observations about Segmentation
This should seem a bit strange: the virtual address
space has gaps in it!
 Each segment gets mapped to contiguous locations in
physical memory, but may be gaps between segments.
 But a correct program will never address gaps; if it
does, trap to kernel and then core dump.
 Minor exception: stack, heap can grow.
 In UNIX, sbrk() increases size of heap segment.
 For stack, just take fault, system automatically
increases size of stack.

Observations about Segmentation cont’d
Detail: Need protection mode in segmentation table.
 For example, code segment would be read-only (only
execution and loads are allowed).
 Data and stack segment would be read-write (stores
allowed).


What must be saved/restored on context switch?
– Typically, segment table stored in CPU, not in memory,
because it’s small.
– Might store all of processes memory onto disk when
switched (called “swapping”)
Segment Translation Example
Example: What happens with the segment table shown earlier,
with the following as virtual memory contents? Code does:
Physical memory
Initially PC = 240
Virtual memory strlen(x);

Main:
240
store 1108, r2
244
store pc +8, r31
248
jump 360
24c
…
…
Strlen: 360
loadbyte (r2), r3
jump (r31)
1108
666
…
Main: 4240
store 1108, r2
4244
store pc +8, r31
4248
jump 360
424c
…
Strlen: 4360
loadbyte (r2), r3
…
…
x:
108
...
…
420
x:
a b c \0
4420
jump (r31)
Segmentation Tradeoffs

Pro:
–
–
–
–

Efficient for sparse address spaces
Multiple segments per process
Easy to share whole segments (for example, code segment)
Don’t need entire process
in memory!!!
Con:
– Complex memory allocation
– Extra layer of translation
speed = hardware support
– Still need first fit, best fit, etc., and re-shuffling to
coalesce free fragments, if no single free space is big
enough for a new segment.

How do we make memory allocation simple and easy?
Paging
Logical address space can be noncontiguous; process is
allocated physical memory whenever available.
 Divide physical memory into fixed-sized blocks called
frames.
 Divide logical memory into blocks of same size called
pages (page size is power of 2, 512 bytes to 16 MB).
 Simpler, because allows use of a bitmap. What’s a
bitmap?
001111100000001100
 Each bit represents one page of physical memory – 1
means allocated, 0 means unallocated.
 Lots simpler than base&bounds or segmentation

Address Translation Architecture

Operating system controls mapping: any page of
virtual memory can go anywhere in physical memory.
virtual address
CPU
error
Virtual page #
No
physical address
Phys frame #
Offset
Offset
d
Page table size
<
yes
p
f
PTBR
Page
table
Phys frame #
Page table
physical memory
Paging Example
Paging Example
Paging Tradeoffs

What needs to be saved/restored on a context
switch?
– Page table pointer and limit

Advantages
– no external fragmentation (no compaction)
– relocation (now pages, before were processes)

Disadvantages
– internal fragmentation
» consider: 2048 byte pages, 72,766 byte proc
 35 pages + 1086 bytes = 962 bytes fragment
» avg: 1/2 page per process
» small pages!
– overhead
» page table / process (context switch + space)
» lookup (especially if page to disk)
Free Frames

Frame table: keeps track of which frames are
allocated and which are free.
Free frames (a) before allocation (b) After allocation
Implementation of Page Table
Page table kept in registers
 Fast!
 Only good when number of frames is small
 Expensive!
 Instructions to load or modify the page-table
registers are privileged.

Registers
Memory
Disk
Implementation of Page Table
Page table kept in main memory
 Page Table Base Register (PTBR)

Page 0
Page 1
Virtual memory
0 2
1
1
Page table
PTBR
0
2 1
1
Page 1
2
Page 0
3
Physical memory
Page Table Length
 Two memory accesses per data/inst access.

– Solution? Associative Registers or translation look-aside
buffers (TLBs).
Associative Register

Associative memory – parallel search
Page #

Frame #
Address translation (A´, A´´)
– If A´ is in associative register, get frame # out.
– Otherwise get frame # from page table in memory
TLB full – replace one (LRU, random, etc.)
 Address-space identifiers (ASIDs): identifies each
process, used for protection, many processes in TLB

Translation Look-aside Buffer (TLB)
Paging Hardware With TLB
10-20% mem time
(Intel P3 has 32 entries)
(Intel P4 has 128 entries)
Effective Access Time






Associative Lookup =  time unit
Assume memory cycle time is 1 microsecond
Hit ratio – percentage of times that a page number is found in
the associative registers; ratio related to number of associative
registers.
Hit ratio = 
Effective Access Time (EAT)
EAT = (1 + )  + (2 + )(1 – )
=2+–
Example:
– 80% hit ratio,  = 20 nanoseconds, memory access time = 100
nanoseconds
– EAT = 0.8 x 120 + 0.20 x 220 = 140 nanoseconds
Memory Protection

Protection bits with each frame
– “valid” - page in process’ logical address space
– “invalid” - page not in process’ logical address space.

Store in page table
Expand to more perms

14-bit address space –

– 0 to 16,383

Program’s addresses –
– 0 to 10,468


beyond 10,468 is illegal
Page 5 classified as valid
– Due to 2K page size,
– internal fragmentation
Multilevel Paging
Most modern operating systems support a very large
logical address space (232 or 264).
 Example

–
–
–
–
logical address space = 32 bit
suppose page size = 4K bytes (212)
page table = 1 million entries (232/212 = 220)
each entry is 4 bytes, space required for page table = 4 MB
Do not want to allocate the page table contiguously in
main memory.
 Solution

– divide the page table into smaller pieces (Page the page
table)
Two-Level Paging
page number
page offset
p1
p2
d
10
10
12
p1 – index into outer page table
p2 – displacement within the
page of the page table
On context-switch: save single
PageTablePtr register
Address-Translation Scheme

Address-translation scheme for a two-level 32-bit
paging architecture
Paging + segmentation: best of both?



simple memory allocation,
easy to share memory, and
efficient for sparse address spaces
Virtual address
virt seg # virt page #
Physical address
offset
phys frame# offset
page-table page-table
base
size
>
No
error
yes
Segment table
Physical memory
+
Phys frame #
Page table
Paging + segmentation

1.
2.

Questions:
What must be saved/restored on context switch?
How do we share memory? Can share entire segment, or a
single page.
Example: 24 bit virtual addresses = 4 bits of segment #, 8
bits of virtual page #, and 12 bits of offset.
Segment table
Page-table base Page-table size
0x2000
–
0x1000
–
0x14
–
0xD
–
Physical memory
What do the
0x1000
0x6
following addresses
0xb
translate to?
0x4
…
0x2000
0x13 0x002070?
0x2a 0x201016 ?
0x14c684 ?
0x3
0x210014 ?
…
portions of the page tables
for the segments
Multilevel translation

What must be saved/restored on context switch?
– Contents of top-level segment registers (for this example)
– Pointer to top-level table (page table)

Pro:
– Only need to allocate as many page table entries as we need.
» In other words, sparse address spaces are easy.
– Easy memory allocation
– Share at segment or page level (need additional reference
counting)

Cons:
– Pointer per page (typically 4KB - 16KB pages today)
– Page tables need to be contiguous
– Two (or more, if > 2 levels) lookups per memory reference
Hashed Page Tables
What is an efficient data structure for doing
lookups? Hash table.
 Why not use a hash table to translate from virtual
address to a physical address.
 Common in address spaces > 32 bits.
 Each entry in the hash table contains a linked list of
elements that hash to the same location (to handle
collisions).
 Take virtual page #, run hash function on it, index
into hash table to find page table entry with physical
page frame #.

Hashed Page Table
Hashed Page Table
Independent of size of address space,
 Pro:

– O(1) lookup to do translation
– Requires page table space proportional to how many pages
are actually being used, not proportional to size of address
space – with 64 bit address spaces, this is a big win!

Con:
– Overhead of managing hash chains, etc.

Clustered Page Tables
– Each entry in the hash table refers to several pages (such as
16) rather than a single page.
Inverted Page Table



One entry for each real (physical) page of memory.
Entry consists of the virtual address of the page
stored in that real memory location, with information
about the process that owns that page.
Address-space identifier (ASID) stored in each entry
maps logical page for a particular process to the
corresponding physical page frame.
Inverted Page Table Architecture
Inverted Page Table

Pro:
– Decreases memory needed to store each page table

Con:
– increases time needed to search the table
Use hash table to limit the search
 One virtual memory reference requires at least two
real memory reads: one for the hash table entry and
one for the page table.
 Associative registers can be used to improve
performance.

Download