Memory Management

Chapter 4
Memory Management
4.1 Basic memory management
4.2 Swapping
4.3 Virtual memory
4.4 Page replacement algorithms
4.5 Modeling page replacement algorithms
4.6 Design issues for paging systems
4.7 Implementation issues
4.8 Segmentation
Memory Management
• Ideally programmers want memory that is
– large
– fast
– nonvolatile
• Memory hierarchy
– small amount of fast, expensive memory – cache
– some medium-speed, medium price main memory
– gigabytes of slow, cheap disk storage
• Memory manager handles the memory hierarchy
Basic Memory Management
Monoprogramming without Swapping or Paging
Three simple ways of organizing memory
- an operating system with one user process
Multiprogramming with Fixed Partitions
• Fixed memory partitions
– separate input queues for each partition
Disadvantage: jobs may queue for a small partition while a large partition sits empty.
– single input queue shared by all partitions
Memory Management
• The CPU utilization can be modeled by the formula
CPU utilization = 1 - p^n
where there are n processes in memory and each
process spends a fraction p of its time waiting for I/O
(the probability that all n processes are waiting for I/O
at once is p^n).
• CPU utilization is a function of n, which is called the
degree of multiprogramming.
• A more accurate model can be constructed using
queuing theory.
• Example:
A computer has 32 MB. The OS takes 16 MB. Each process takes
4 MB, so four processes fit in memory. With 80 percent of time
spent waiting for I/O, the CPU utilization is 1 – 0.8^4 ≈ 60%. If
16 MB is added, eight processes fit and utilization rises to
1 – 0.8^8 ≈ 83%.
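In Python, a minimal sketch of this independence model (nothing beyond the formula above is assumed):

```python
def cpu_utilization(n: int, p: float) -> float:
    """Probability that at least one of n processes is not waiting on I/O."""
    return 1 - p ** n

print(cpu_utilization(4, 0.8))   # 0.5904 -> about 60% with four processes
print(cpu_utilization(8, 0.8))   # 0.8322... -> about 83% with eight
```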
Modeling Multiprogramming
Degree of multiprogramming
CPU utilization as a function of number of processes in memory
Analysis of Multiprogramming System
Performance
• Arrival and work requirements of 4 jobs
• CPU utilization for 1 – 4 jobs with 80% I/O wait
• Sequence of events as jobs arrive and finish
– note numbers show amount of CPU time jobs get in each interval
Relocation and Protection
• Multiprogramming introduces two problems – relocation
and protection.
• Relocation - cannot be sure where the program will be
loaded in memory
– address locations of variables and code routines cannot be
absolute
• Protection - must keep a program out of other processes’ partitions
• Solution: use base and limit values (registers), as sketched below
– address locations are added to the base value to map to a
physical address
– address locations larger than the limit value are an error
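A minimal sketch of base-and-limit translation in Python; the partition base, limit, and address below are made-up values:

```python
def translate(vaddr: int, base: int, limit: int) -> int:
    """Relocate a program-generated address, trapping on a violation."""
    if vaddr >= limit:                 # outside the partition: protection fault
        raise MemoryError("protection fault: address exceeds limit")
    return base + vaddr                # relocation: add the base register

# A partition loaded at 300K with a 100K limit (made-up values):
print(translate(4096, base=300 * 1024, limit=100 * 1024))   # 311296
```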
Swapping
• Two approaches to overcome the limitation of
memory:
– Swapping moves processes back and forth between
main memory and the disk.
– Virtual memory allows programs to run even when
they are only partially in main memory.
• When swapping creates multiple holes in
memory, memory compaction can be used to
combine them into one large hole by moving all
processes down together.
Swapping
Memory allocation changes as
– processes come into memory
– leave memory
Shaded regions are unused memory
Swapping
• Allocating space for growing data segment
• Allocating space for growing stack & data segment
Memory Management with Bit Maps
and Linked Lists
• There are two ways to keep track of memory usage:
bitmaps and free lists.
• The drawback of bitmaps: finding a run of consecutive 0
bits in the map is a slow operation.
• Four major algorithms can be used in memory
management with linked lists (usually a doubly-linked list);
the first two are sketched in code below:
– First fit searches from the beginning for a hole that fits.
– Next fit searches from the place where it left off last time for a
hole that fits.
– Best fit searches the entire list and takes the smallest hole that
fits.
– Worst fit always takes the largest hole.
– (Quick fit) maintains separate lists for some of the more
commonly requested sizes. Holes of the same size are linked together.
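A minimal Python sketch of first fit and best fit over such a free list; the hole positions and sizes are invented for illustration:

```python
# Free list as (start, size) holes, kept in address order; values invented.
free_list = [(0, 5), (8, 3), (14, 10), (30, 4)]

def first_fit(request: int):
    """Take the first hole large enough for the request."""
    for start, size in free_list:
        if size >= request:
            return start
    return None

def best_fit(request: int):
    """Search the entire list for the smallest hole that fits."""
    fits = [(size, start) for start, size in free_list if size >= request]
    return min(fits)[1] if fits else None

print(first_fit(4))   # 0: the hole at 0 (size 5) is the first that fits
print(best_fit(4))    # 30: the hole at 30 (size 4) is the tightest fit
```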
Memory Management with Bit Maps
• Part of memory with 5 processes, 3 holes
– tick marks show allocation units
– shaded regions are free
• Corresponding bit map
• Same information as a list
Memory Management with Linked Lists
Four neighbor combinations for the terminating process X
Virtual Memory
• Problem: Program too large to fit in memory
• Solution:
– Programmer splits the program into pieces called
overlays - too much work
– Virtual memory - [Fotheringham 1961] - OS keeps
the part of the program currently in use in memory
• Paging is a technique used to implement virtual
memory.
• A virtual address is a program-generated address.
• The MMU (memory management unit)
translates a virtual address into a physical
address.
Virtual Memory
Paging
The position and function of the MMU
Virtual Memory
• Suppose the computer generates 16-bit addresses (0–64K),
but it only has 32K of physical memory
→ a 64K program can be written, but not loaded into
memory in its entirety.
• The virtual address space is divided into (virtual) pages;
the corresponding units in physical memory are (page) frames.
• A Present/Absent bit keeps track of whether or not the
page is mapped.
• A reference to an unmapped page causes the CPU to trap
to the OS.
• This trap is called a page fault. The OS selects a
little-used page frame, writes its contents back to disk,
fetches the page just referenced, and restarts the trapped
instruction.
Paging
• The relation between virtual addresses and physical
memory addresses is given by the page table
• Where to keep the mapping information?
– In the page table
Paging Model Example
Page Tables
• Example: Virtual address = 4097 = 0001 000000000001
(virtual page # = 0001, 12-bit offset = 000000000001)
• See Figure 4-11.
• The purpose of the page table is to map virtual pages onto page
frames: the page table is a function from virtual page number to
page frame number.
• Page tables raise two major issues:
1. Page tables may be extremely large. E.g., most computers use
32-bit addresses; with a 4K page size, 12 bits go to the offset
→ 20 bits for the virtual page number
→ 1 million entries!
→ What can we do about it? Multilevel paging.
2. The mapping must be fast because it is done on every memory
access!!
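As a small illustration, a Python sketch of the page-number/offset split, using the 4K page size and the address 4097 from the example above:

```python
PAGE_SIZE = 4096        # 4K pages -> 12-bit offset
OFFSET_BITS = 12

def split(vaddr: int):
    """Split a virtual address into (virtual page number, offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

print(split(4097))      # (1, 1): page 1, offset 1, as in the slide
```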
Pure paging
Page Tables
Internal operation of MMU with 16 4 KB pages
Two-Level Paging Example
• A logical address (on 32-bit machine with 4K page size) is
divided into:
– a page number consisting of 20 bits.
– a page offset consisting of 12 bits.
• Since the page table is paged, the page number is further
divided into:
– a 10-bit page number.
– a 10-bit page offset.
• Thus, a logical address is as follows:
page number: p1 (10 bits) | p2 (10 bits)    page offset: d (12 bits)
where p1 is an index into the outer page table, and p2 is the
displacement within the page of the outer page table.
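A minimal Python sketch of this 10/10/12 split; the sample address is invented:

```python
def split_two_level(vaddr: int):
    """Split a 32-bit address into (p1, p2, d) with 10/10/12 bits."""
    d  = vaddr & 0xFFF           # low 12 bits: offset within the page
    p2 = (vaddr >> 12) & 0x3FF   # next 10 bits: index into a second-level table
    p1 = vaddr >> 22             # top 10 bits: index into the top-level table
    return p1, p2, d

print(split_two_level(0x00403004))   # (1, 3, 4)
```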
Address-Translation Scheme
• Address-translation scheme for a two-level 32-bit paging
architecture is shown below.
Two-Level Page-Table Scheme
Page Tables
• Multilevel page tables reduce the table size. Also, page
tables that are not needed are not kept in memory.
• See the diagram in Figure 4-12.
• Top-level entries point to the second-level page tables, e.g.:
entry 0 = program text (4M of code)
entry 1 = program data (4M of data)
entry 1023 = stack (4M of stack)
Page Tables
Top-level page table pointing to second-level page tables
• 32-bit address with 2 page table fields
• Two-level page tables
Page Tables
• Most operating systems allocate a page table for each
process.
• Single page table consisting of an array of hardware
registers. As a process is loaded, the registers are loaded
with its page table.
– Advantage - simple
– Disadvantage - expensive if table is large and loading the full
page table at every context switch hurts performance.
• Leave page table in memory - a single register points to
the table
– Advantage - context switch cheap
– Disadvantage - one or more memory references to read table
entries
Hierarchical Paging
• Examples of page table design
– PDP-11 uses one-level paging.
– The Pentium-II uses this two-level architecture.
– The VAX architecture supports a variation of two-level
paging (section + page + offset).
– The SPARC architecture (with 32-bit addressing) supports a
three-level paging scheme.
– The 32-bit Motorola 68030 architecture supports a four-level
paging scheme.
• Further division could be made for a large logical-address
space.
• However, for 64-bit architectures, hierarchical page tables
are generally considered infeasible.
Page Tables
Typical page table entry
Structure of a Page Table Entry
• Page frame number: the physical frame the page maps to
• Present/absent bit: 1/0 indicates a valid/invalid entry
• Protection bits: what kinds of access are permitted
• Modified (dirty) bit – set when the page is written to; a
dirty page must be written back to the disk when evicted
• Referenced bit - set when the page is referenced (helps decide
which page to evict)
• Caching disabled - caching keeps data that logically belongs
on the disk in memory to improve performance, but a
reference to memory-mapped I/O must not be cached!
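A toy sketch of such an entry in Python. The bit positions chosen here are hypothetical; real hardware layouts differ:

```python
# Hypothetical layout: a 20-bit frame number plus flag bits above it.
PRESENT    = 1 << 20
REFERENCED = 1 << 21
MODIFIED   = 1 << 22   # the "dirty" bit

def make_pte(frame: int) -> int:
    """Build an entry for a page that is present in the given frame."""
    return frame | PRESENT

pte = make_pte(0x1A3)
pte |= REFERENCED              # hardware sets R on any access
pte |= MODIFIED                # ... and M on any write
print(hex(pte & 0xFFFFF))      # frame number: 0x1a3
print(bool(pte & MODIFIED))    # True: must be written back to disk on eviction
```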
TLB
• Observation: Most programs make a large number of
references to a small number of pages.
• Solution: Equip computers with a small hardware
device, called Translation Look-aside Buffers (TLBs)
or associative memory, to map virtual addresses to
physical addresses without using the page table.
• Modern RISC machines do TLB management in
software. If the TLB is large enough to reduce the miss
rate, software management of the TLB becomes
acceptably efficient.
• Methods to reduce TLB misses and the cost of a TLB
miss:
– preload entries for pages likely to be used
– maintain a large (software) TLB cache
TLBs – Translation Lookaside Buffers
A TLB to speed up paging
Paging Hardware With TLB
Effective Access Time
• Associative lookup takes ε time units
• Assume the memory cycle time is t time units
• Hit ratio α – percentage of times that a page number is
found in the associative memory; the ratio is related to the
number of TLB entries.
• Effective Access Time (EAT)
EAT = α(t + ε) + (1 – α)(2t + ε)
    = αt + αε + 2t + ε – 2αt – αε
    = (2 – α)t + ε
• Example: α = 0.8, ε = 20 ns, t = 100 ns
EAT = 0.8 × 120 + 0.2 × (200 + 20) = 140 ns.
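The formula is easy to check in Python, reproducing the example's numbers:

```python
def eat(alpha: float, eps: float, t: float) -> float:
    """A TLB hit costs t + eps; a miss costs 2t + eps because one extra
    memory reference is needed to read the page table entry."""
    return alpha * (t + eps) + (1 - alpha) * (2 * t + eps)

print(eat(0.8, 20, 100))   # 140.0 ns, matching the example above
```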
Inverted Page Table
• Usually, each process has a page table associated with
it. One of the drawbacks of this method is that each page
table may consist of millions of entries.
• To solve this problem, an inverted page table can
be used. There is one entry for each real (page) frame
of memory.
• Each entry consists of the virtual address of the page
stored in that real memory location, with information
about the process that owns that page.
• Examples of systems using the inverted page tables
include 64-bit UltraSPARC and PowerPC.
Inverted Page Table
• To illustrate this method, consider a simplified version in which
each virtual address is a triple <process-id, page-number, offset>.
• Each inverted page-table entry is a pair <process-id, page-number>.
The inverted page table is searched for a match. If a match is
found in entry i, then the physical address <i, offset> is
generated. Otherwise, an illegal address access has been
attempted.
• Although this decreases the memory needed to store the page
tables, it increases the time needed to search the table when a
page reference occurs.
• A hash table can limit the search to one — or at most a few —
page-table entries, as sketched below.
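A minimal Python sketch of the hashed lookup; a dict stands in for the hash table, and the process IDs and table contents are invented:

```python
# A dict stands in for the hash table; its contents are invented.
inverted = {(7, 0): 0, (7, 3): 1, (9, 0): 2}   # (pid, page) -> frame

def translate(pid: int, vaddr: int, page_size: int = 4096) -> int:
    page, offset = divmod(vaddr, page_size)
    frame = inverted.get((pid, page))   # hashing limits the search
    if frame is None:
        raise MemoryError("illegal address (or page fault)")
    return frame * page_size + offset

print(translate(7, 3 * 4096 + 42))      # frame 1, offset 42 -> 4138
```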
Inverted Page Table Architecture
Inverted Page Table Structure
Inverted Page Tables
Comparison of a traditional page table with an inverted page table
Page Replacement Algorithms
• Page fault forces choice
– which page must be removed
– make room for incoming page
• Modified page must first be saved
– unmodified just overwritten
• Better not to choose an often used page
– will probably need to be brought back in soon
• Applications: Memory, Cache, Web pages
Optimal Page Replacement Algorithm
• Replace the page that will be referenced at the
farthest point in the future
– optimal, but impossible to implement; used only
for comparison
• Estimate by
– logging page use on previous runs of the process
– although this is impractical
Not Recently Used Page Replacement Algorithm
• Each page has a Reference bit (R) and a Modified bit (M).
– the bits are set when the page is referenced (read or written
recently) or modified (written to)
– when a process starts, both R and M are set to 0 for all
pages.
– periodically (e.g. on each clock interrupt, ~20 msec), the R bit
is cleared (i.e. R = 0).
• Pages are classified:
Class 0: not referenced, not modified (00)
Class 1: not referenced, modified (01)
Class 2: referenced, not modified (10)
Class 3: referenced, modified (11)
• NRU removes a page at random
– from the lowest-numbered non-empty class
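A minimal Python sketch of NRU victim selection; the R and M bit values are invented:

```python
import random

# Each page as (page number, R bit, M bit); the values are invented.
pages = [(0, 0, 1), (1, 1, 0), (2, 0, 0), (3, 1, 1)]

def nru_victim(pages):
    """Evict a random page from the lowest-numbered non-empty class,
    where class = 2*R + M."""
    lowest = min(2 * r + m for _, r, m in pages)
    return random.choice([n for n, r, m in pages if 2 * r + m == lowest])

print(nru_victim(pages))   # 2: the only class-0 (not referenced, not modified) page
```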
FIFO Page Replacement Algorithm
• Maintain a linked list of all pages
– in order they came into memory with the oldest page
at the front of the list.
• Page at beginning of list is replaced
• Advantage: easy to implement
• Disadvantage
– page in memory the longest (perhaps often used)
may be evicted
Second Chance Page Replacement
Algorithm
• Inspect the R bit of the oldest page:
if R = 0 → evict the page
if R = 1 → set R = 0 and put the page at the end (back)
of the list. The page is treated like a newly loaded
page. (See the sketch below.)
• Clock Replacement Algorithm: a different
implementation of second chance
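A minimal Python sketch of second chance; the page names and R bits are made up:

```python
from collections import deque

def second_chance(pages: deque, r_bit: dict) -> str:
    """Evict the first page found with R = 0; a page with R = 1 has its
    bit cleared and moves to the back as if newly loaded."""
    while True:
        page = pages.popleft()     # oldest page
        if r_bit[page]:
            r_bit[page] = 0
            pages.append(page)     # second chance
        else:
            return page            # victim

pages = deque(["A", "B", "C"])     # A is the oldest
r_bit = {"A": 1, "B": 0, "C": 1}
print(second_chance(pages, r_bit)) # A is spared once; B is evicted
```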
Second Chance Page Replacement Algorithm
• Operation of a second chance
– pages sorted in FIFO order
– Page list if fault occurs at time 20, A has R bit set
(numbers above pages are loading times)
The Clock Page Replacement Algorithm
Least Recently Used (LRU)
• Assume pages used recently will be used again soon
– throw out the page that has been unused for the longest time
• Software solution: must keep a linked list of pages
– most recently used at front, least at rear
– update this list on every memory reference → too expensive!!
• Hardware solution:
1. Equip the hardware with a 64-bit counter
• that is incremented after each instruction.
• The counter value is stored in the page table entry of the
page that was just referenced.
• On a page fault, choose the page with the lowest counter value.
• Periodically zero the counters.
• Problem: the page table gets larger.
Least Recently Used (LRU)
• Hardware solution:
2. Maintain a matrix of n x n bits for a machine with n
page frames.
• When page frame K is referenced:
(i) set row K to all 1s.
(ii) set column K to all 0s.
• The row whose binary value is smallest belongs to the LRU
page. (A software sketch follows.)
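A small Python simulation of the matrix method, here with 4 frames and a prefix of the reference string used in the figure:

```python
n = 4                                    # number of page frames
matrix = [[0] * n for _ in range(n)]

def reference(k: int):
    """Page frame k referenced: set row k to all 1s, then column k to 0s."""
    matrix[k] = [1] * n
    for row in matrix:
        row[k] = 0

def lru_frame() -> int:
    """The frame whose row has the smallest binary value is the LRU page."""
    row_value = lambda k: int("".join(map(str, matrix[k])), 2)
    return min(range(n), key=row_value)

for k in (0, 1, 2, 3, 2, 1):
    reference(k)
print(lru_frame())                       # 0: referenced longest ago
```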
Simulating LRU in Software
LRU using a matrix – pages referenced in order 0,1,2,3,2,1,0,3,2,3
Simulating LRU in Software - NFU
• LRU hardware is not usually available. NFU (Not
Frequently Used) is implemented in software.
– At each clock interrupt, the R bit is added to the counter
associated with each page. When a page fault occurs, the page
with the lowest counter is replaced.
– Problem: NFU never forgets, so a page referenced frequently
long ago may still have the highest counter.
• Modified NFU = NFU with aging - at each clock
interrupt:
– the counters are shifted right one bit, and
– each page's R bit is added as the leftmost bit.
– In this way, recent R values are given higher weight.
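A minimal Python sketch of aging with 8-bit counters; the per-tick R bits are invented:

```python
COUNTER_BITS = 8

def age(counters, r_bits):
    """One clock tick: shift each counter right one bit and add that
    page's R bit as the new leftmost bit."""
    for i, r in enumerate(r_bits):
        counters[i] = (counters[i] >> 1) | (r << (COUNTER_BITS - 1))

counters = [0, 0, 0]
ticks = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]        # R bits observed at each tick
for r_bits in ticks:
    age(counters, r_bits)
print(counters)                                  # [96, 192, 160]
print(min(range(3), key=lambda i: counters[i]))  # 0: lowest counter, evicted first
```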
LRU in Software – NFU with aging
• The aging algorithm simulates LRU in software
• Note 6 pages for 5 clock ticks, (a) – (e)
Working-Set Model
• Pages are loaded only on demand. This strategy is
called demand paging.
• During any phase of execution the process references only a
relatively small fraction of its pages. This is called
locality of reference.
• The set of pages that a process is currently using is called its
working set.
• A program causing page faults every few instructions
is said to be thrashing.
• Paging systems try to keep each process’s working set in
memory before letting the process run. This approach
is called the working set model.
Locality In A Memory-Reference Pattern
Working-Set Model
• Loading the pages before letting processes run is called
prepaging.
• Δ ≡ w(k, t) ≡ the working-set window: the k most recent page
references at time t
• WSi (working set of process Pi) =
the set of pages referenced in the most recent Δ
(varies in time)
– if Δ is too small, it will not encompass the entire locality.
– if Δ is too large, it will encompass several localities.
– if Δ = ∞, it will encompass the entire program.
Working-set model
The Working Set Page Replacement Algorithm
• The working set is the set of pages used by the k
most recent memory references
• w(k,t) is the size of the working set at time t
Working-Set Model
• The idea is to examine the most recent Δ page
references and evict a page that is not in the working set.
• The working set of a process is the set of pages it has
referenced during the past τ seconds of virtual time
(the amount of CPU time the process has actually used).
• Scan the entire page table and evict a page (see the sketch below):
– whose referenced bit is clear and whose age is greater than τ;
– failing that, the oldest page whose referenced bit is clear;
– failing that (all referenced bits set), the oldest page.
• The basic working set algorithm is expensive. Instead,
WSClock is used in practice.
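A simplified Python sketch of this scan, assuming each page records its R bit and its time of last use; the values below are made up:

```python
TAU = 50   # working-set window, in virtual-time units

def ws_victim(pages, current_time):
    """Scan the whole table: referenced pages get their time of last use
    updated; a page with R clear and age > TAU is outside the working
    set and is evicted; failing that, the oldest page is chosen."""
    oldest = None
    for p in pages:
        if p["r"]:
            p["last_use"] = current_time   # still in the working set
            p["r"] = 0
        elif current_time - p["last_use"] > TAU:
            return p                        # outside the working set: evict
        elif oldest is None or p["last_use"] < oldest["last_use"]:
            oldest = p
    return oldest                           # fall back to the oldest page

pages = [{"r": 1, "last_use": 90},
         {"r": 0, "last_use": 20},
         {"r": 0, "last_use": 60}]
print(ws_victim(pages, current_time=100))   # age 80 > TAU: page with last_use 20
```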
The Working Set Page Replacement Algorithm
The working set algorithm
The WSClock Page Replacement Algorithm
Operation of the WSClock algorithm
Review of Page Replacement Algorithms
Modeling Page Replacement
Algorithms
• Belady discovered that more page frames do not
always mean fewer page faults. This is called
Belady's anomaly (demonstrated in the sketch below).
• Conceptually, a process’s memory accesses can be
characterized by an (ordered) list of page numbers.
This list is called the reference string.
• A paging system can be characterized by three items:
– The reference string of the executing process.
– The page replacement algorithm.
– The number of (page) frames available in memory, m.
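The anomaly is easy to reproduce with a short FIFO simulation; the reference string below is the classic example:

```python
def fifo_faults(refs, frames: int) -> int:
    """Count page faults under FIFO replacement with a given frame count."""
    memory, faults = [], 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.pop(0)      # evict the oldest page
            memory.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]   # the classic anomaly string
print(fifo_faults(refs, 3))   # 9 page faults
print(fifo_faults(refs, 4))   # 10 page faults: more frames, more faults
```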
Modeling Page Replacement Algorithms
Belady's Anomaly
• FIFO with 3 page frames
• FIFO with 4 page frames
• P's indicate which page references cause page faults
Modeling Page Replacement
Algorithms
• Modeling the LRU algorithm:
– When a page is referenced, it is always moved to
the top of the stack of pages in memory.
– If the referenced page was already in memory, all
pages above it move down one position.
– Pages below the referenced page are not
moved.
• Algorithms that do not suffer from Belady’s
anomaly, such as LRU and the optimal page
replacement algorithm, are called stack algorithms.
Stack Algorithms
• State of memory array, M, after each item in
reference string is processed.
• n virtual pages and m page frames.
FIFO Page Replacement
• 15 page faults
Optimal Page Replacement
• 9 page faults
LRU Page Replacement
• 12 page faults
Distance String
• Distance string - a page reference can be
denoted by the distance of the page from the top of the stack.
Pages not in memory are at distance infinity.
The string so generated for a given reference
string is called the distance string.
– (Distance = 1 <==> page is at the top.)
– (Distance = infinity <==> page is not in memory
and has not been accessed yet.)
The Distance String
• Probability density functions for two
hypothetical distance strings
• Which needs more page frames?
– In the right one the references are so spread out
that more frames are required to avoid page faults.
The Distance String
• Computation of the page fault rate from the distance string
– the C vector (Ci = # of occurrences of distance i in the distance string)
– the F vector (Fm = # of page faults with m frames, i.e. the sum of Ci for i > m)
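A minimal Python sketch that derives the distance string from a reference string by maintaining the LRU stack described above:

```python
def distance_string(refs):
    """LRU stack distances: 1 = page at the top, inf = first reference."""
    stack, distances = [], []
    for page in refs:
        if page in stack:
            d = stack.index(page) + 1   # distance from the top of the stack
            stack.remove(page)
        else:
            d = float("inf")            # never referenced before
        distances.append(d)
        stack.insert(0, page)           # referenced page moves to the top
    return distances

d = distance_string([0, 1, 2, 3, 2, 1, 0, 3])
print(d)   # [inf, inf, inf, inf, 2, 3, 4, 4]
# C[i] = how often distance i occurs; with m frames every reference at
# distance > m faults, so F[m] sums C[i] for i > m (infinities included).
```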
Design Issues: Local versus Global
Allocation Policies
• Global algorithms dynamically allocate page
frames among all runnable processes. Local
algorithms allocate frames for a single process.
• A global algorithm such as the page fault
frequency (PFF) algorithm is used to prevent
thrashing and keep the paging rate within
acceptable bounds:
– page fault rate too high → assign more page frames to
the process.
– page fault rate too low → take page frames away from
the process.
Design Issues for Paging Systems
Local versus Global Allocation Policies
• Original configuration
• Local page replacement
• Global page replacement
Local versus Global Allocation Policies
• Page fault rate as a function of the number of page
frames assigned
– A: page fault rate unacceptably high
– B: page fault rate acceptably low
Load Control
• Despite good designs, system may still thrash
• When PFF (page fault frequency) algorithm
indicates
– some processes need more memory
– but no processes need less
• Solution:
Reduce the number of processes competing for
memory
– swap one or more to disk, divide up the pages they held
among the processes that are still thrashing
– reconsider the degree of multiprogramming
Page Size
Small page size
• Advantages
– less internal fragmentation
– better fit for various data structures, code sections
– less unused program in memory
• Disadvantages
– programs need many pages, larger page tables
Page Size
• Overhead due to page table and internal
fragmentation:
overhead = s·e/p + p/2
(the first term is the page table space, the second the average
internal fragmentation)
• Where
– s = average process size in bytes
– p = page size in bytes
– e = size of a page table entry in bytes
• The overhead is minimized when
p = √(2se)
Page Size
• Example:
– s = 128K
– e = 8 bytes
– p = √(2 × 128K × 8) ≈ 1448
– so use p = 1K or 2K (the nearest powers of two)
– In general, 512 <= page size <= 8K.
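The optimization is easy to verify numerically in Python; note that the two candidate powers of two happen to tie here:

```python
from math import sqrt

def overhead(p: float, s: float, e: float) -> float:
    """Page table space (s*e/p) plus average internal fragmentation (p/2)."""
    return s * e / p + p / 2

s, e = 128 * 1024, 8                # process size in bytes, bytes per entry
print(round(sqrt(2 * s * e)))       # 1448: the analytic optimum
print(overhead(1024, s, e), overhead(2048, s, e))   # 1536.0 1536.0: a tie
```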
Separate Instruction and Data Spaces
• Most systems support separate address spaces for
instructions (program text) and data.
• A process then has two page table pointers in its process
table: one for the instruction space and one for
the data space. Shared code can be pointed to by
two processes.
• For shared data, only the data pages that are actually
written need be copied. This approach is called copy
on write.
Separate Instruction and Data Spaces
• One address space
• Separate I and D spaces
Shared Pages
Two processes sharing the same program also share its page table
Cleaning Policy
• Need for a background process, the paging daemon
– periodically inspects the state of memory
• When too few frames are free
– selects pages to evict using a replacement algorithm
• It can use a two-handed clock.
– The front hand is controlled by the paging daemon and
cleans dirty pages.
– The back hand is used by the page replacement
algorithm.
Implementation Issues
Operating System Involvement with Paging
Four times when the OS is involved with paging:
1. Process creation
– determine program size
– create page table
2. Process execution
– MMU reset for new process
– TLB flushed
3. Page fault time
– determine virtual address causing fault
– swap target page out, needed page in
4. Process termination time
– release page table, pages
Page Fault Handling
1. Hardware traps to kernel
2. General registers saved
3. OS determines which virtual page needed
4. OS checks validity of address, seeks page
frame
5. If selected frame is dirty, write it to disk
Page Fault Handling
6. OS schedules new page to be brought in from disk
7. Page tables updated
8. Faulting instruction backed up to when it began
9. Faulting process scheduled
10. Registers restored
Program continues
Instruction Backup
An instruction causing a page fault
Locking Pages in Memory
• Virtual memory and I/O occasionally interact
• Process issues call for read from device into
buffer
– while waiting for I/O, another process is allowed to
run
– this other process has a page fault
– buffer for the first process may be chosen to be paged
out
• If a page involved in an I/O transfer is
paged out, part of the data ends up in the buffer
and part in the newly loaded page. To prevent this,
the page must be locked in memory (pinning).
Backing Store
• Two approaches can be used to allocate page
space on the disk:
– Paging to a static swap area - reserve separate swap
areas for the text, data, and stack when the process
is started.
– Backing up pages dynamically with a disk map -
allocate disk space for each page when it is swapped
out and release it when it is swapped back in.
Backing Store
(a) Paging to static swap area
(b) Backing up pages dynamically
Separation of Policy and Mechanism
• To manage the complexity of any system the
policy is separated from the mechanism.
• The memory management system can be
divided into three parts:
– A low-level MMU handler.
– A page fault handler that is part of the kernel.
– An external pager running in user space.
Separation of Policy and Mechanism
Page fault handling with an external pager
Segmentation
• Consider a compiler which has many tables.
– In a one-dimensional address space, one growing table will
eventually bump into another.
– A segmented memory allows each table to grow or
shrink independently.
Segmentation
• One-dimensional address space with growing tables
• One table may bump into another
Segmentation
Allows each table to grow or shrink, independently
Segmentation
• A segment is a logically independent address
space.
– segments may have different sizes
– their sizes may change dynamically
– the address space uses 2-dimensional memory
addresses and has 2 parts:
– (segment #, offset within segment)
– segments may have different protections
– allows for the sharing of procedures and data
between processes. An example is the shared
library.
Segmentation
Comparison of paging and segmentation
Implementation of Pure Segmentation
• The implementation of segmentation differs
from paging in an essential way: pages are fixed
size and segments are not.
• External fragmentation or checkerboarding is
wasted memory in the holes. It can be dealt
with by compaction.
Implementation of Pure Segmentation
(a)-(d) Development of checkerboarding
(e) Removal of the checkerboarding by compaction
Retrospect: Pure paging
• Address generated by CPU is divided into:
– Page number (p) – used as an index into a page table which
contains base address of each page in physical memory.
– Page offset (d) – combined with base address to define the
physical memory address that is sent to the memory unit.
page number: p (m–n bits) | page offset: d (n bits)
where p is an index into the page table and d is the
displacement within the page.
Pure paging
Retrospect: Pure segmentation
• Address generated by CPU is divided into:
– Segment number (s) – used as an index into a segment table
which contains the base and limit address of each segment.
– Segment displacement (d) – combined with base address to
define the physical memory address that is sent to the
memory unit. If d < limit, base + d specify the physical
memory address.
segment number: s (m–n bits) | displacement: d (n bits)
where s is an index into the segment table and d is the
displacement within the segment.
Pure Segmentation
Segmentation with Paging
• Paging the segments allows large segments to
fit in main memory.
• MULTICS, which ran on the Honeywell 6000
machines, paged its segments. The Motorola 68000
line is designed around a flat address space, whereas the
Intel 80x86 and Pentium family are based on
segmentation. Both memory models are converging
toward a mixture of paging and
segmentation.
Segmentation with Paging: MULTICS
A 34-bit MULTICS virtual address
Segmentation with Paging: MULTICS
• Descriptor segment points to page tables
• Segment descriptor – numbers are field lengths
Segmentation with Paging: MULTICS
Conversion of a 2-part MULTICS address into a main memory address
Segmentation with Paging: MULTICS
• Simplified version of the MULTICS TLB
• Existence of 2 page sizes makes actual TLB more complicated
Segmentation with Paging – Intel 386
• On the 386, the logical-address space of a process is
divided into two partitions. The first partition is private
to that process and the second is shared among all
processes.
• Information about the first partition is kept in the local
descriptor table (LDT), information about the second
partition is kept in the global descriptor table (GDT).
• The physical address on the 386 is 32 bits. The
segment register points to the appropriate entry in the
LDT or GDT.
Segmentation with Paging – Intel 386
• The base and limit information about the segment are
used to generate a linear address. It is divided into a
page number of 20 bits, and a page offset consisting of
12 bits.
• As shown in the following diagram, the Intel 386 uses
segmentation with paging for memory management
with a two-level paging scheme.
Segmentation with Paging: Pentium
A Pentium selector
Segmentation with Paging: Pentium
• Pentium code segment descriptor
• Data segments differ slightly
Segmentation with Paging: Pentium
Conversion of a (selector, offset) pair to a linear address
Segmentation with Paging: Pentium
Mapping of a linear address onto a physical address
Segmentation with Paging: Pentium
Protection (privilege levels) on the Pentium