Memory Management

 Basic memory management
 Swapping
 Virtual memory
 Page replacement algorithms
 Modeling page replacement algorithms
 Design issues for paging systems
 Implementation issues
 Segmentation
Memory Management
• Ideally programmers want memory that is large, fast, and non-volatile.
• Memory hierarchy (faster toward the top, larger toward the bottom):
– registers (Intel: 8-, 16-, 32-bit; MIPS: 32-bit)
– a small amount of fast, expensive memory: cache (32 KB to a few MB)
– some medium-speed, medium-price main memory (128 MB to 1 GB)
– gigabytes of slow, cheap disk storage (40 GB to 160 GB)
• The memory manager handles the memory hierarchy.
Basic Memory Management
Memory management systems can be divided into two classes: (1) those that swap or page processes between main memory and disk, and (2) those that do not (no swapping or paging).
Monoprogramming without Swapping or Paging
Three simple ways of organizing memory with an OS and one user process: (a) the OS at the bottom of memory in RAM; (b) the OS in ROM at the top of memory; (c) device drivers in ROM at the top and the rest of the system in RAM below.
Model (a) was used on mainframes and minicomputers, and is rarely used any more.
Model (b) is used on some palmtop computers and embedded systems.
Model (c) was used by early personal computers. The portion of the system in ROM is called the BIOS (Basic Input Output System).
Except on simple embedded systems, monoprogramming is hardly used anymore.
Multiprogramming with Fixed Partitions
(a) Separate input queues for each partition
(b) A single input queue
• Drawbacks of multiple input queues:
– the queue for a large partition may be empty while the queue for a small partition is full
– since the partitions are fixed, any space in a partition not used by a job is lost
• Single input queue: whenever a partition becomes free, the job closest to the front of the queue that fits in it is loaded into the empty partition and run.
• A different strategy: since it is undesirable to waste a large partition on a small job, search the whole input queue whenever a partition becomes free and pick the largest job that fits the partition.
Swapping
Two general approaches to memory management:
 Swapping: bringing each process into memory in its entirety, running it for a while, then copying its memory contents back to secondary storage so that the freed memory can be allocated to another process.
 Virtual memory: capability of operating systems that enables programs to address more memory locations than are actually provided in main memory. Virtual memory systems remove much of the burden of memory management from programmers, freeing them to concentrate on application development (see Sec. 4.3).
Memory allocation changes over time as processes come into memory and leave it.
 Swapping system:
 The number of processes in memory varies dynamically.
 The locations of processes in memory vary dynamically.
 The sizes of the partitions vary dynamically.
 Memory compaction: when swapping creates multiple holes in memory, it is possible to combine them all into one big one by moving all the processes downward as far as possible.
 Usually not done because it requires a lot of CPU time.
How much memory should be allocated for a process when it is created or swapped in?
 If processes are created with a fixed size that never changes, the allocation is simple: the OS allocates exactly what is needed, no more and no less.
 If processes' data segments can grow, a problem occurs whenever a process tries to grow.
Allocating Space for a Growing Data Segment
(a) Allocating space for a growing data segment:
if the hole between processes A and B runs out, A or B will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.
(b) Allocating space for a growing stack and a growing data segment:
if the hole between the stack segment and the data segment runs out, the process will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.
• Two ways to keep track of memory usage:
– bitmaps
– linked lists
Memory Management with Bitmaps
• Memory is divided up into allocation units; the size of a unit may be as small as a few words or as large as several kilobytes.
• Part of memory with 5 processes and 3 holes:
– tick marks show allocation units
– shaded regions are free
Trade-off:
 The smaller the allocation unit, the larger the bitmap.
 If the allocation unit is chosen large, the bitmap will be smaller, but memory may be wasted in the last unit of the process if the process size is not an exact multiple of the allocation unit.
Main problem:
 When it has been decided to bring a k-unit process into memory, the memory manager must search the bitmap to find a run of k consecutive 0 bits in the map. Searching a bitmap for a run of a given length is a slow operation (see the sketch below).
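A minimal sketch in C of the slow search just described, assuming one bit per allocation unit packed into a byte array (the function name and layout are illustrative, not from the slides):

#include <stddef.h>

/* Find the first run of k consecutive 0 bits (free units) in the
   bitmap; bit i describes allocation unit i, and a set bit means
   the unit is in use. Returns the start index, or -1 if none. */
long find_free_run(const unsigned char *bitmap, size_t nbits, size_t k)
{
    size_t run = 0;
    for (size_t i = 0; i < nbits; i++) {
        int in_use = (bitmap[i / 8] >> (i % 8)) & 1;
        run = in_use ? 0 : run + 1;
        if (run == k)
            return (long)(i - k + 1);   /* start of the free run */
    }
    return -1;                          /* no hole big enough */
}

The linear scan touches every bit in the worst case, which is exactly why the text calls this a slow operation.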
Memory Management with Linked Lists
Linked list of allocated and free memory segments
• The segment list is kept sorted by address. Sorting this way has the advantage that when a process terminates or is swapped out, updating the list is straightforward.
Four neighbor combinations for a terminating process X:
(a) Updating the list requires replacing a P with an H.
(b) Two entries are coalesced into one, and the list becomes one entry shorter.
(c) The same as (b).
(d) Three entries are merged and two items are removed from the list.
Algorithms to allocate memory for a newly created process
Assume that the memory manager knows how much memory to allocate.
 First fit: the memory manager scans along the list of segments until it finds a hole that is big enough. The hole is then broken up into two pieces, one for the process and one for the unused memory.
 It is a fast algorithm because it searches as little as possible.
 Next fit: it works the same way as first fit, except that it keeps track of where it is whenever it finds a suitable hole. The next time it is called to find a hole, it starts searching the list from the place where it left off last time.
 Simulations (Bays, 1977) show that it gives slightly worse performance than first fit.
 Best fit: it searches the entire list and takes the smallest hole that is adequate.
 It is slower than first fit.
 Worst fit: to get around the problem of breaking up nearly exact matches into a process and a tiny hole, it always takes the largest available hole, so that the hole broken off will be big enough to be useful.
 Simulation has shown that worst fit is not a very good idea either.
 Quick fit: it maintains separate lists for some of the more common sizes requested.
 e.g. a table with n entries, in which the first entry is a pointer to the head of a list of 4-KB holes, the second entry is a pointer to a list of 8-KB holes, the third entry a pointer to 12-KB holes, and so on.
 Finding a hole of the required size is fast.
 It has the same disadvantage as all schemes that sort by hole size: when a process terminates or is swapped out, finding its neighbors to see if a merge is possible is expensive.
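A minimal sketch in C of first fit over the address-sorted segment list described above (the struct layout and field names are assumptions for illustration):

#include <stdlib.h>

struct seg {                 /* one list entry: process (P) or hole (H) */
    int is_hole;             /* 1 = hole, 0 = process                   */
    size_t start, len;       /* start address and length                */
    struct seg *next;        /* list kept sorted by address             */
};

/* First fit: take the first hole big enough; split off the unused part. */
struct seg *first_fit(struct seg *list, size_t request)
{
    for (struct seg *s = list; s != NULL; s = s->next) {
        if (!s->is_hole || s->len < request)
            continue;
        if (s->len > request) {              /* split: process + hole */
            struct seg *rest = malloc(sizeof *rest);
            rest->is_hole = 1;
            rest->start = s->start + request;
            rest->len = s->len - request;
            rest->next = s->next;
            s->next = rest;
        }
        s->is_hole = 0;                      /* allocate this piece */
        s->len = request;
        return s;
    }
    return NULL;                             /* no hole big enough */
}

Next fit differs only in starting the scan where the previous search left off; best fit scans the whole list and remembers the smallest adequate hole.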
Virtual Memory
 Virtual memory: Capability of operating systems that enables
programs to address more memory locations than are actually provided
in main memory. Virtual memory systems help remove much of the
burden of memory management from programmers, freeing them to
concentrate on application development (Devised by Fotheringham,
1961)
 Basic idea: the combined size of a program, its data, and its stack may exceed the amount of physical memory available for it. The OS keeps those parts of the program currently in use in main memory, and the rest on disk.
 e.g. a 16-MB program can run on a 4-MB machine by carefully choosing which 4 MB to keep in memory at each instant, with pieces of the program being swapped between disk and memory as needed.
Paging
• Paging: virtual memory organization technique that divides an address space into fixed-size blocks of contiguous addresses. When applied to a process's virtual address space, the blocks are called pages, which store process data and instructions. When applied to main memory, the blocks are called page frames.
 Virtual address: program-generated address (using indexing, base registers, segment registers, and other ways).
 Virtual address space: formed by all the virtual addresses.
e.g. Pentium Pro: 36-bit addresses: 2^36 bytes = 64 GB
 Memory management unit (MMU): a chip or collection of chips that maps virtual addresses onto physical memory addresses.
Example of how the mapping works.
Virtual addresses: 16-bit (0 – 64 KB); physical memory: 32 KB (8 page frames of 4 KB).
A user program can be up to 64 KB, but it cannot be loaded into memory entirely and run.
The virtual address space is divided into units called pages; the corresponding units in physical memory are called page frames. The pages and page frames are always the same size: 4 KB here (512 B – 64 KB in real systems).
8 page frames, 16 virtual pages.
e.g. MOV REG, 0
is transformed (by the MMU) into
MOV REG, 8192
e.g. MOV REG, 8192
is transformed into
MOV REG, 24576
(virtual page 2, addresses 8192-12287, is mapped onto page frame 6, addresses 24576-28671).
In the actual hardware, a Present/absent bit keeps track of which pages are physically present in memory.
Page fault: fault that occurs when a process attempts to access a nonresident page, in which case the OS can load it from disk.
e.g. MOV REG, 32780
(the 12th byte within virtual page 8)
(1) The MMU notices that the page is unmapped and causes the CPU to trap to the OS.
(2) The OS picks a little-used page frame and writes its contents back to the disk.
(3) It then fetches the page just referenced into the page frame just freed.
(4) It changes the map and restarts the trapped instruction.
Page Tables
Page table: table that stores entries mapping page numbers to page frames. A page table contains an entry for each of a process's virtual pages.
e.g. 16-bit address:
High-order 4 bits: virtual page number.
Low-order 12 bits: offset.
8196 is transformed into 24580 by the MMU (virtual page 2, offset 4; page 2 maps to frame 6, and 6 × 4096 + 4 = 24580).
Internal operation of the MMU with 16 4-KB pages.
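A minimal sketch in C of the 16-bit translation above: the high-order 4 bits index the page table, the low-order 12 bits are copied through unchanged (the page table contents are an assumed example in which virtual page 2 maps to page frame 6):

#include <stdio.h>

#define PAGE_SHIFT 12                        /* 4-KB pages */

unsigned translate(unsigned vaddr, const unsigned *page_table)
{
    unsigned vpn = vaddr >> PAGE_SHIFT;      /* high-order 4 bits */
    unsigned offset = vaddr & 0x0FFF;        /* low-order 12 bits */
    return (page_table[vpn] << PAGE_SHIFT) | offset;
}

int main(void)
{
    unsigned pt[16] = {2, 1, 6, 0};          /* page 2 -> frame 6 */
    printf("%u\n", translate(8196, pt));     /* prints 24580      */
    return 0;
}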
 The purpose of the page table is to map virtual pages onto page frames.
 Two major issues must be faced:
(1) The page table can be extremely large.
e.g. a computer uses 32-bit virtual addresses with a 4-KB page size:
number of pages = 2^32 / 2^12 = 2^20 (about 1 million)
Remember that each process needs its own page table, because it has its own virtual address space.
(2) The mapping must be fast.
The virtual-to-physical mapping must be done on every memory reference. A typical instruction has an instruction word, and often a memory operand as well. Consequently, it is necessary to make 1, 2, or sometimes more page table references per instruction.
Hardware solutions:
 Simplest design: one page table consisting of an array of fast hardware
registers, with one entry for each virtual page, indexed by virtual page
number.
 Advantage: straightforward, and requires no memory reference.
 Disadvantage: expensive (if the page table is large)
 Page table entirely in main memory, and one hardware register that
points to the start of the page table
 Advantage: allows the memory map to be changed at a context
switch by reloading one register.
 Disadvantage: requires one or more memory references to read
page table entries during the execution of each instruction.
 Variations of the two approaches
Multilevel Page Tables
To get around the problem of having to store huge page tables in memory all the time, the page table is split into a top-level page table and second-level page tables.
32-bit virtual address: PT1: 10 bits, PT2: 10 bits, Offset: 12 bits
(page size: 4 KB; number of pages: 2^20)
The secret of the multilevel page table method is to avoid keeping all the page tables in memory all the time.
e.g. a process needs 12 MB: the bottom 4 MB for text, the next 4 MB for data, and the top 4 MB for stack. Only 4 page tables are actually needed: the top-level table, and second-level tables for 0 to 4M (text), 4M to 8M (data), and the top 4M (stack).
e.g. Virtual address = 0x00402004, then PT1 = 1, PT2 = 2, Offset = 4.
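A minimal sketch in C of extracting PT1, PT2, and the offset from the 32-bit virtual address in the example above:

#include <stdio.h>

int main(void)
{
    unsigned vaddr = 0x00402004;
    unsigned pt1 = vaddr >> 22;              /* top 10 bits    */
    unsigned pt2 = (vaddr >> 12) & 0x3FF;    /* middle 10 bits */
    unsigned offset = vaddr & 0xFFF;         /* low 12 bits    */
    printf("PT1=%u PT2=%u Offset=%u\n", pt1, pt2, offset);  /* 1 2 4 */
    return 0;
}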
Structure of a Page Table Entry
 The exact layout of an entry is highly machine dependent, but the kind of information present is roughly the same.
 The size varies from computer to computer, but 32 bits is a common size.
 Page frame number: the goal of the page mapping is to locate this value.
 Present/absent bit: if this bit is 1, the entry is valid and can be used. If it is 0, the virtual page to which the entry belongs is not currently in memory.
 Modified and Referenced bits: keep track of page usage. When a page is written to, the hardware automatically sets the Modified bit. If the page has been modified, it must be written back to the disk; the Modified bit is sometimes called the dirty bit. The Referenced bit is set whenever a page is referenced.
 Caching disabled bit: allows caching to be disabled for the page.
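A minimal sketch in C of one way such a 32-bit entry could be laid out (the exact field widths and ordering are machine dependent, as the text says; these are illustrative):

struct pte {
    unsigned frame      : 20;   /* page frame number          */
    unsigned present    : 1;    /* 1 = page is in memory      */
    unsigned modified   : 1;    /* dirty bit, set by hardware */
    unsigned referenced : 1;    /* set on each reference      */
    unsigned cache_off  : 1;    /* 1 = caching disabled       */
    unsigned protection : 3;    /* e.g. read/write/execute    */
    unsigned unused     : 5;
};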
TLBs – Translation Lookaside Buffers
 All paging schemes keep the page tables in memory => performance
problems!
 Most programs tend to make a large number of references to a small
number of pages, and not the other way around
 Solution: equip computers with a small hardware device for mapping
virtual addresses to physical addresses without going through the page
table
 This device is called an associative memory (AM) or translation lookaside buffer (TLB). It is usually inside the MMU and consists of a small number of entries (normally around 32).
A TLB to speed up paging:
 When a virtual address is presented to the MMU for translation, the hardware first checks whether its virtual page number is present in the TLB by comparing it to all the entries simultaneously. If a valid match is found and the access does not violate the protection bits, the page frame is taken directly from the TLB, without going to the page table.
 Hit ratio: the fraction of memory references that can be satisfied from the TLB. The higher the hit ratio, the better the performance.
 When the virtual page number is not in the TLB, the MMU detects the miss and does an ordinary page table lookup.
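A minimal sketch in C of the lookup just described (a real TLB compares all entries in parallel in hardware; page_table_lookup stands in for the ordinary page table walk and is an assumed helper):

#define TLB_ENTRIES 32

struct tlb_entry {
    int valid;
    unsigned vpn;                 /* virtual page number */
    unsigned frame;               /* page frame number   */
};

extern unsigned page_table_lookup(unsigned vpn);   /* assumed walk */

unsigned tlb_translate(struct tlb_entry tlb[], unsigned vpn)
{
    for (int i = 0; i < TLB_ENTRIES; i++)          /* "all entries" */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return tlb[i].frame;                   /* TLB hit  */
    return page_table_lookup(vpn);                 /* TLB miss */
}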
Software TLB Management
 Hardware TLB management
– The MMU hardware walks the page tables itself; TLB management and TLB fault handling are done entirely by the hardware.
 Software TLB management
– Many modern RISC machines do nearly all of this page management in software.
– e.g. SPARC, MIPS, Alpha, and HP PA.
– On these machines, TLB entries are explicitly loaded by the OS. When a TLB miss occurs, the MMU just generates a TLB fault and tosses the problem to the OS. The OS must find the page, remove an entry from the TLB, enter the new one, and restart the instruction that faulted. And, of course, all of this must be done in a handful of instructions, because TLB misses occur much more frequently than page faults.
– If the TLB is reasonably large, so that the miss rate is low, software management of the TLB turns out to be acceptably efficient (Uhlig, 1994).
– Main gain: a simpler MMU, which frees area on the CPU chip for caches and other features.
Inverted Page Tables
 Today: 32-bit virtual address space, 4-KB page size => each process needs 2^20 entries in its page table (PT); at 4 bytes per entry, that is 4 MB per process. The PT is large but manageable (multilevel paging schemes).
 RISC chips with a 64-bit virtual address space?
– 64-bit virtual address space >>>> physical memory
– 64-bit address space ≈ 20 million terabytes
– 4-KB page size => 2^52 ≈ 4 quadrillion PT entries => requires rethinking!
• Solution: the virtual address space is immense, but physical page frames are still manageable => inverted page table. In this design, there is one entry per page frame in real memory, rather than one entry per page of virtual address space.
e.g. with 64-bit virtual addresses, a 4-KB page size, and 256 MB of RAM, an inverted page table requires only 65,536 entries. Each entry keeps track of which (process, virtual page) is located in that page frame.
All virtual pages currently in memory that have the same hash value are chained together.
 Comparison of a traditional page table with an inverted page table.
 IBM and HP workstations use inverted page tables. They will become more common as 64-bit machines become widespread.
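A minimal sketch in C of the hash-chained lookup: hash the (process, virtual page) pair, then follow the chain of page frames sharing that hash value (sizes, fields, and the hash function are assumptions for illustration):

#define NFRAMES 65536                /* 256 MB of RAM / 4-KB pages */

struct ipt_entry {
    int pid;                         /* owning process              */
    unsigned long vpn;               /* virtual page number         */
    int next;                        /* next frame in chain, or -1  */
};

static struct ipt_entry ipt[NFRAMES];   /* one entry per page frame  */
static int hash_anchor[NFRAMES];        /* hash value -> first frame,
                                           maintained by the OS      */

/* Returns the page frame holding (pid, vpn), or -1 => page fault. */
int ipt_lookup(int pid, unsigned long vpn)
{
    int frame = hash_anchor[(vpn ^ (unsigned)pid) % NFRAMES];
    while (frame != -1) {
        if (ipt[frame].pid == pid && ipt[frame].vpn == vpn)
            return frame;
        frame = ipt[frame].next;     /* next page with same hash */
    }
    return -1;
}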
Page Replacement Algorithms
• Page fault => the OS has to select a page for replacement.
• Modified page => write it back to disk.
• Not modified page => just overwrite it with the new page.
• How to decide which page should be replaced?
– random
– many algorithms take into account:
• usage
• age
• ...
Optimal Page Replacement Algorithm
What is the optimal page replacement algorithm?
An unrealizable page-replacement strategy that replaces the page that will not be used until furthest in the future.
 Easy to describe, but impossible to implement, because the OS cannot look into the future.
 Useful as a benchmark for evaluating page replacement algorithms.
 Best (optimal) page replacement algorithm:
– a page fault occurs, and a set of pages is in memory
– label each page with the number of instructions that will be executed before that page is next used
– replace the page with the highest number
It is of no practical use.
NRU (Not Recently Used) Page Replacement Algorithm
What is the NRU page replacement algorithm?
A page replacement strategy that uses the Referenced and Modified bits to choose which page to replace.
• Status bits associated with each page:
– R: page referenced (read or written)
– M: page modified (written) (dirty bit, dirty page)
• Four classes, checked in this order:
– class 0: not referenced, not modified
– class 1: not referenced, modified
– class 2: referenced, not modified
– class 3: referenced, modified
• NRU removes a page at random from the lowest-numbered nonempty class.
• Low overhead.
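A minimal sketch in C of the NRU choice: compute class = 2R + M for every page and pick one at random (here via reservoir sampling) from the lowest-numbered nonempty class (the page array is an assumed representation):

#include <stdlib.h>

struct page { int r, m; };           /* Referenced, Modified bits */

int nru_victim(const struct page *pages, int n)
{
    int best_class = 4, victim = -1, seen = 0;
    for (int i = 0; i < n; i++) {
        int cls = 2 * pages[i].r + pages[i].m;      /* class 0..3 */
        if (cls < best_class) {                     /* new lowest class */
            best_class = cls;
            victim = i;
            seen = 1;
        } else if (cls == best_class && rand() % ++seen == 0) {
            victim = i;              /* random pick within the class */
        }
    }
    return victim;
}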
FIFO Page Replacement Algorithm
What is the FIFO page replacement algorithm?
It is a page replacement strategy that replaces the page that has been in memory longest.
 The OS maintains a list of all pages currently in memory.
 Pages are kept in the list in order of age.
 FIFO replaces the oldest page when a page fault occurs.
 It incurs low overhead, but does not predict future page usage accurately.
 FIFO is rarely used in its pure form.
Example list, from the page loaded first to the most recently loaded page (load times in parentheses):
A(0) B(3) C(7) D(8) E(12) F(14) G(15) H(18)
Second Chance Page Replacement Algorithm
What is the second chance page replacement algorithm?
It is a variation of FIFO page replacement that uses the Referenced bit and the FIFO queue to determine which page to replace. If the oldest page's Referenced bit is off, it replaces that page. Otherwise it turns off the Referenced bit of the oldest page, moves that page to the tail of the FIFO queue, and examines the next page, continuing until it locates a page whose Referenced bit is off.
• R: referenced bit.
• Second chance is a reasonable algorithm.
• But it is inefficient, because it is constantly moving pages around on its list.
Example (same load times as in the FIFO figure): the list is A(0) B(3) C(7) D(8) E(12) F(14) G(15) H(18). If A's R bit is set when a page fault occurs at time 20, A is treated like a newly loaded page and moved to the tail: B(3) C(7) D(8) E(12) F(14) G(15) H(18) A(20).
The Clock Page Replacement Algorithm
What is clock page replacement? It is a variation of the second chance strategy that arranges the pages in a circular list instead of a linear list.
When a page fault occurs, the page the hand is pointing to is inspected; the action taken depends on its R bit:
– R = 0: the page was not referenced in the last round => evict it.
– R = 1: the page was referenced in the last round => set R to 0 and advance the hand until the first page with R = 0 is found.
– The hand is advanced to the next entry in both cases.
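A minimal sketch in C of the clock hand logic above (the frame array and its fields are an assumed representation):

struct frame { int r; int page; };   /* R bit and resident page */

static int hand = 0;                 /* the clock hand          */

int clock_victim(struct frame *frames, int n)
{
    for (;;) {
        if (frames[hand].r == 0) {   /* not referenced: evict    */
            int victim = hand;
            hand = (hand + 1) % n;   /* advance past the victim  */
            return victim;
        }
        frames[hand].r = 0;          /* referenced: second chance */
        hand = (hand + 1) % n;
    }
}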
Least Recently Used (LRU) Page Replacement Algorithm
What is the LRU page replacement algorithm? A page-replacement strategy that replaces the page that has not been referenced for the longest time. LRU generally predicts future page usage well but incurs significant overhead. Implementations:
(1) A linked list of pages. It is expensive: maintaining the list on every memory reference is a time-consuming operation.
(2) Special hardware: a counter. Each page table entry must also have a field large enough to contain the counter; the page whose entry holds the lowest counter value is the least recently used.
(3) Other special hardware: a matrix of n × n bits for a machine with n page frames, initially all 0. Whenever page frame k is referenced, the hardware first sets all the bits of row k to 1, then sets all the bits of column k to 0. At any instant, the row whose binary value is lowest is the least recently used.
(Figure: the successive matrix states for pages referenced in the order 0, 1, 2, 3, 2, 1, 0, 3, 2, 3.)
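A minimal sketch in C of the matrix method for n = 4 page frames, following the rule just described:

#define N 4

static unsigned char m[N][N];        /* bit matrix, initially all 0 */

void on_reference(int k)
{
    for (int j = 0; j < N; j++) m[k][j] = 1;   /* set row k      */
    for (int i = 0; i < N; i++) m[i][k] = 0;   /* clear column k */
}

int lru_frame(void)
{
    int best = 0, best_val = -1;
    for (int i = 0; i < N; i++) {
        int val = 0;
        for (int j = 0; j < N; j++)            /* read row i as binary */
            val = (val << 1) | m[i][j];
        if (best_val < 0 || val < best_val) {  /* lowest row wins */
            best_val = val;
            best = i;
        }
    }
    return best;                               /* least recently used */
}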
Simulating LRU in Software
The previous LRU algorithms are realizable in principle if the machine has the necessary hardware. They are of no use to an OS designer building a system for a machine that does not have this hardware.
Solution: the NFU (Not Frequently Used) algorithm. It requires a software counter associated with each page, initially zero. At each clock interrupt, the OS scans all pages in memory, and for each page the R bit (0 or 1) is added to its counter.
— Main problem of NFU: it never forgets anything.
Aging: modifies NFU as follows, and makes it able to simulate LRU quite well:
(1) the counters are each shifted right 1 bit before the R bit is added in;
(2) the R bit is added to the leftmost bit, rather than the rightmost.
The aging algorithm simulates LRU in software (figure: 6 pages for 5 clock ticks, (a) – (e)). In practice, 8 bits per counter is enough if a clock tick is around 20 msec.
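A minimal sketch in C of one clock tick of aging (8-bit counters, as suggested at the end of the slide; the page array is an assumed representation):

#define NPAGES 6

struct apage { unsigned char r; unsigned char age; };

void aging_tick(struct apage pages[NPAGES])
{
    for (int i = 0; i < NPAGES; i++) {
        pages[i].age = (unsigned char)((pages[i].age >> 1)   /* shift right   */
                                      | (pages[i].r << 7));  /* R -> leftmost */
        pages[i].r = 0;                                      /* clear R bit   */
    }
}
/* On a page fault, the page with the lowest age value is evicted. */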
The Working Set Page Replacement Algorithm
Working set: the set of pages that a process is currently using.
w(k,t): the working set at time t, made up of the pages used by the k most recent memory references. (The figure plots the size of w(k,t) as a function of k.)
Age of a page = current virtual time − time of last use.
τ: a predetermined threshold; a page whose age exceeds τ is no longer in the working set.
The working set page replacement algorithm:
 The hardware is assumed to set the R and M bits.
 A periodic clock interrupt is assumed to cause software to run that clears the R bit on every clock tick.
 On every page fault, the page table is scanned to look for a suitable page to evict.
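A minimal sketch in C of the scan above (TAU and the entry layout are assumptions; the handling of dirty pages, which must be written back before reuse, is omitted for brevity):

#define TAU 50                       /* predetermined threshold τ */

struct ws_entry { int r; long last_use; };

int ws_victim(struct ws_entry *pt, int n, long current_virtual_time)
{
    int oldest = 0;
    for (int i = 0; i < n; i++) {
        if (pt[i].r) {                           /* used this tick   */
            pt[i].last_use = current_virtual_time;
            continue;
        }
        if (current_virtual_time - pt[i].last_use > TAU)
            return i;                            /* left the working set */
        if (pt[i].last_use < pt[oldest].last_use)
            oldest = i;                          /* remember the oldest  */
    }
    return oldest;                               /* fall back on oldest  */
}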
The WSClock Page Replacement Algorithm
It is based on the clock algorithm but also uses the working set information (age = current virtual time − time of last use; τ: the predetermined threshold).
Review of Page Replacement Algorithms
Segmentation
Problem in a one-dimensional address space with growing tables.
Ex. A compiler maintains the following tables:
(1) Source text
(2) Symbol table
(3) Constant table
(4) Parse tree
(5) Stack
Problem: one table may bump into another.
Solution: provide the machine with many completely independent address spaces, called segments.
Segment: a variable-size set of contiguous addresses in a process's virtual address space that is managed as one unit. A segment is typically the size of an entire set of similar items, such as the instructions of a procedure or the contents of an array, which enables the system to protect such items with fine granularity using appropriate access rights.
– two or more separate, independent virtual address spaces, each growing or shrinking on its own
– different kinds of protection are possible
Two-part address (n, k):
– n: segment number (which segment)
– k: address within the segment
Segmentation also facilitates sharing procedures or data between several processes
– e.g. a shared library
• Segmented memory allows each table to grow or shrink independently of the other tables.
Comparison of paging and segmentation
Implementation of Pure Segmentation
The implementation of segmentation differs from paging in an essential way: pages are of fixed size and segments are not.
(a)-(d) Development of checkerboarding
(e) Removal of the checkerboarding by compaction
External fragmentation (or checkerboarding): after the system has been running for a while, memory will be divided up into a number of chunks, some containing segments and some containing holes. This phenomenon is called external fragmentation.
Segmentation with Paging: MULTICS
MULTICS (MULTiplexed Information and Computing Service): one of the first operating systems to implement virtual memory. Developed by MIT, GE, and Bell Laboratories as the successor to MIT's CTSS (Compatible Time Sharing System).
Ken Thompson, one of the computer scientists at Bell Labs who had worked on the MULTICS project, wrote a stripped-down, one-user version of MULTICS. This work later developed into UNIX.
Segmentation with Paging
• Many large segments > main memory size => page the segments.
• MULTICS:
– Honeywell 6000 machines and descendants
– per program: a virtual memory of up to 2^18 = 256 K segments, each up to 64 K 36-bit words long
– each segment is treated as a virtual memory and paged
– segment table + page tables
– 16-word high-speed TLB
• The descriptor segment points to the page tables, one entry per segment.
A 34-bit MULTICS virtual address
Memory reference
 Conversion of a 2-part MULTICS address into a main memory address.
 Problem: the program would not run very fast, since every reference needs segment and page table lookups.
 Solution: a 16-word TLB.
Simplified version of the MULTICS TLB (the existence of 2 page sizes makes the actual TLB more complicated).
Segmentation with Paging: The Intel Pentium
• MULTICS:
– both segmentation and paging
– 256 K independent segments, each up to 64 K 36-bit words
• Intel Pentium:
– both segmentation and paging
– 16 K independent segments, each up to 1 billion 32-bit words
– Each program has its own LDT (Local Descriptor Table). The LDT describes the segments local to the program, including its code, data, stack, and so on.
– A single GDT (Global Descriptor Table) is shared by all programs on the computer. The GDT describes system segments, including the OS itself.
To access a segment, a Pentium program first loads a selector for that segment into one of the machine's 6 segment registers.
CS holds the selector for the code segment; DS holds the selector for the data segment.
A Pentium selector contains a 13-bit entry number, 1 bit telling whether the entry is in the LDT or the GDT, and a 2-bit privilege level. The tables are therefore restricted to holding 8 K segment descriptors.
At the time a selector is loaded into a segment register, the corresponding
descriptor is fetched from the LDT or GDT and stored in microprogram
registers, so it can be accessed quickly.
Pentium code segment descriptor (data segment descriptors differ slightly): 8 bytes.
How is a (selector, offset) pair converted to a physical address?
(1) Find the descriptor corresponding to the selector. If the segment does not exist, or is currently paged out, a trap occurs.
(2) Check whether the offset is beyond the end of the segment, in which case a trap also occurs.
If G (granularity) = 0, the limit field (20 bits) is the exact segment size, up to 1 MB. If G = 1, the limit field gives the segment size in pages instead of bytes; since the Pentium page size is fixed at 4 KB, 20 bits are enough for segments up to 2^32 bytes.
(3) Assuming that the segment is in memory and the offset is in range, the Pentium then adds the 32-bit base field to the offset to form a linear address. The 32-bit base is broken into 3 pieces spread over the descriptor for compatibility with the 286 (in which the base is only 24 bits).
(4) If paging is disabled (by a bit in a global control register), the linear address is interpreted as the physical address and sent to memory for the read or write. This is a pure segmentation scheme.
(5) If paging is enabled, the linear address is interpreted as a virtual address and mapped onto a physical address using page tables. With a 4-KB page size, a segment might contain 1 million pages.
Conversion of a (selector, offset) pair to a linear address
Each running program has a page directory consisting of 1 K 32-bit entries, located at an address pointed to by a global register. Each entry in the directory points to a page table, also containing 1 K 32-bit entries.
Mapping of a linear address onto a physical address: the linear address is split into a 10-bit Dir field, a 10-bit Page field, and a 12-bit Offset.
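A minimal sketch in C of how the linear address splits into the three fields used by this two-level lookup:

unsigned dir(unsigned linear)  { return linear >> 22; }           /* 10 bits */
unsigned page(unsigned linear) { return (linear >> 12) & 0x3FF; } /* 10 bits */
unsigned off(unsigned linear)  { return linear & 0xFFF; }         /* 12 bits */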
Page table entries are 32 bits each, 20 of which contain a page frame number; the remaining bits hold access and dirty bits, set by the hardware, and so on.
A single page table handles 4 MB of memory (1 K page frames, 4-KB page size).
To avoid making repeated references to memory, the Pentium (like MULTICS) has a small TLB that directly maps the most recently used Dir-Page combinations onto the physical address of the page frame.
If an application does not need segmentation but is content with a single, paged, 32-bit address space, that model is possible: all the segment registers can be set up with the same selector, whose descriptor has base = 0 and the limit set to the maximum. In fact, all current OSs for the Pentium work this way; OS/2 was the only one that used the full power of the Intel MMU architecture.
Protection on the Pentium
 The Pentium supports 4 protection levels. A running program is at a certain level, indicated by 2 bits in the PSW (processor status word).
 Each segment in the system also has a level.