TDDB68 Concurrent Programming and Operating Systems

Virtual Memory
[SGG7/8/9] Chapter 9

Overview: Virtual Memory
- Background
- Demand Paging
- Page Replacement
- Allocation of Frames
- Thrashing and Data Access Locality
- Process Creation: Copy-on-Write
- I/O Interlock
- Memory-Mapped Files (later)
Copyright Notice: The lecture notes are mainly based on Silberschatz’s, Galvin’s and Gagne’s book (“Operating System
Concepts”, 7th ed., Wiley, 2005). No part of the lecture notes may be reproduced in any form, due to the copyrights
reserved by Wiley. These lecture notes should only be used for internal teaching purposes at the Linköping University.
Christoph Kessler, IDA,
Linköpings universitet.
Virtual address space of a process
(figure slide)
TDDB68, C. Kessler, IDA, Linköpings universitet. Silberschatz, Galvin and Gagne ©2005
Virtual Memory

- Previously achieved: separation of user logical memory from physical memory
  - Sparse address space: "holes" should be allocated pages only when the stack/heap grows
  - ++ protection, reuse
  - Still, the whole program and data must be loaded in memory
- Virtual memory idea: only part of the program needs to be in memory for execution
  - Not all functionality provided in the program text will be executed in every run of the program
  - Throw out pages currently not used (to secondary memory)
  - The logical address space can therefore be much larger than the physical address space
  - Allows address spaces to be shared by several processes
  - Allows for more efficient process creation
- Virtual memory can be implemented via:
  - Demand paging
  - Demand segmentation

Virtual Memory That is Larger Than Physical Memory
(figure slide)
Demand Paging
[Kilburn et al. 1961]

- Demand paging ("lazy paging", "lazy swapping"): bring a page into memory only when it is needed
  - Less I/O needed
  - Less memory needed
  - Faster response
  - More users
- A page is needed if referenced (load/store, data/instructions):
  - invalid reference -> abort
  - not in memory -> bring it into memory

Transfer of a Paged Memory to Contiguous Disk Space
(figure slide)
Valid-Invalid Bit

- With each page table entry, a valid–invalid bit is associated
  (1 -> page in memory, 0 -> not in memory)
- Initially, the valid–invalid bit is set to 0 on all entries
- Example page-table snapshot (figure: frame # and valid–invalid bit columns; four valid entries with bit 1, the remaining entries with bit 0)
- Rather than swapping entire processes (cf. swapping), we page their pages from/to disk only when first referenced
Page Table When Some Pages Are Not in Main Memory

- During address translation, if the valid–invalid bit in the page table entry is 0 -> page fault

Steps in Handling a Page Fault
(Case: a free frame exists)

- "Busy waiting" for the slow disk? Better to allocate the CPU to some other ready process in the meanwhile...
What happens if there is no free frame?

- Page replacement: find some page in memory that is not really in use and swap it out
  - Write-back is only necessary if the victim page was modified
  - The same page may be brought into memory several times
- Various algorithms exist
- Performance: we want an algorithm that minimizes the number of page faults
Performance of Demand Paging

- Page fault rate p, 0 <= p <= 1.0
  - if p = 0: no page faults
  - if p = 1: every reference is a fault
- Memory access time t
- Effective Access Time (EAT):

  EAT = (1 - p) * t
        + p * ( page fault overhead
                + time to swap page out, if modified
                + time to swap new page in
                + restart overhead
                + t )
Demand Paging Example

- Write-back rate w, 0 <= w <= 1: fraction of page faults where page replacement is needed and the victim page has been modified, so that it needs to be swapped out
- Example:
  - Memory access time = 1 microsecond
  - Time for swapping a page = 10 ms = 10,000 microseconds
  - Write-back rate w = 0.5 = 50%
  - Expected swap time per page fault = (1 + w) * 10,000 us = 15,000 us
  - EAT = (1 - p) * 1 us + p * 15,000 us = 1 us + p * 14,999 us

Example: Need For Page Replacement
(figure slide: kernel plus user pages 0-3; a page fault occurs for page 3 while all frames are currently in use...)
Page Replacement

- Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
- Use a modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written back to disk
- Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory

Basic Page Replacement

Extended page-fault service routine:
1. Find the location of the desired page on disk
2. Find a free frame:
   - If there is a free frame, use it
   - If there is no free frame, use a page replacement algorithm to select a victim frame and, if dirty, write its page to disk
3. Read the desired page into the (newly) free frame
4. Update the page table
5. Restart the process

How to compare algorithms for page replacement?
First-In-First-Out (FIFO) Algorithm

- Goal: find the algorithm with the lowest page-fault rate
- Method: simulation
  - Assume an initially empty page table
  - Run the algorithm on a particular string of memory references (reference string: page numbers only)
  - Count the number of page faults on that string
- In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- FIFO: use a time stamp or a queue; the victim is the "oldest" page
- Assume 3 frames per process (3 pages can be in memory at a time per process). Frame contents at each faulting reference:

  faulting ref:  1  2  3  4  1  2  5  3  4
  frame 1:       1  1  1  4  4  4  5  5  5
  frame 2:          2  2  2  1  1  1  3  3
  frame 3:             3  3  3  2  2  2  4

  A total of 9 page faults.

- After page 3 is loaded, page 1 is the oldest of them all. The fact that we re-use an existing page does not alter who has been in there the longest...
Same FIFO: More Frames = Better?

- 4 frames per process (4 pages can be in memory at a time per process)
- Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. Frame contents at each faulting reference:

  faulting ref:  1  2  3  4  5  1  2  3  4  5
  frame 1:       1  1  1  1  5  5  5  5  4  4
  frame 2:          2  2  2  2  1  1  1  1  5
  frame 3:             3  3  3  3  2  2  2  2
  frame 4:                4  4  4  4  3  3  3

  A total of 10 page faults: more frames but more page faults!

- Generally, more frames => fewer page faults?
- FIFO replacement suffers from Belady's Anomaly: more frames with more page faults is possible!

Expected Graph of Page Faults Versus Number of Frames
(figure slide: page-fault count decreasing as the number of frames grows)

FIFO Illustrating Belady's Anomaly
(figure slide)
An Optimal Algorithm
[Belady 1966]

- "Optimal" means:
  - has the lowest possible page-fault rate (NB: still ignoring dirty-ness)
  - does not suffer from Belady's anomaly
- Belady's Algorithm (Farthest-First, MIN, OPT): replace the page that will not be used for the longest period of time
  - How do you know this? It is a theoretical algorithm, used for measuring how well your algorithm performs, e.g., "is it within 12% of the optimal?"
- Remark: Belady's algorithm is only optimal if there are no dirty write-backs; otherwise it is just a heuristic algorithm.
- Example: reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 4 frames. Frame contents at each faulting reference:

  faulting ref:  1  2  3  4  5  4
  frame 1:       1  1  1  1  1  4
  frame 2:          2  2  2  2  2
  frame 3:             3  3  3  3
  frame 4:                4  5  5

  A total of 6 page faults. At the reference to page 5: we will need pages 1, 2 and 3 before we need page 4 again, thus throw page 4 out!

Least Recently Used (LRU) Algorithm

- Optimal algorithm not feasible? ...try using recent history as an approximation of the future!
- Algorithm: replace the page that has not been used for the longest period of time
- Example: reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 4 frames. Frame contents at each faulting reference:

  faulting ref:  1  2  3  4  5  3  4  5
  frame 1:       1  1  1  1  1  1  1  5
  frame 2:          2  2  2  2  2  2  2
  frame 3:             3  3  5  5  4  4
  frame 4:                4  4  3  3  3

  A total of 8 page faults. At the reference to page 5: out of pages 1, 2, 3 and 4, page 3 is the one not used for the longest time...

LRU Algorithm Implementations

- Timestamp implementation
  - The process maintains a logical clock (a counter of the number of memory accesses made)
  - Every page entry has a timestamp field
  - Every time a page is referenced, copy the logical clock into the timestamp field
  - When a page needs to be replaced, search for the oldest timestamp. Overhead! Linear search!
- Stack implementation
  - Keep a stack of page numbers in doubly linked form
  - When a page is referenced, move it to the top (requires 6 pointers to be changed)
  - No search is needed to find the replacement victim
LRU Approximation Algorithms

- Reference bit for each page
  - Initially = 0 (set by the OS)
  - When a page is referenced, the bit is set to 1
  - Replace a page whose bit is 0 (if one exists)
  - We do not know the access order, however
  - Precision may be improved by using several bits
- Second-chance algorithm
  - Clockwise replacement
  - If the page to be replaced (in clock order) has reference bit = 1, then:
    - set the reference bit to 0
    - leave the page in memory
    - consider the next page (in clock order), subject to the same rules

Counting Algorithms

- Keep a counter of the number of references that have been made to each page
- LFU Algorithm: replaces the page with the smallest count
- MFU Algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used
- Not common in today's operating systems
Allocation of Frames

- Each process needs a minimum number of pages
  - Example: IBM 370 needs 6 pages to handle the SS MOVE instruction:
    - the instruction is 6 bytes and might span 2 pages
    - 2 pages to handle the from operand
    - 2 pages to handle the to operand
- Two major allocation schemes:
  - fixed allocation
  - priority allocation

Fixed Allocation

- Equal allocation
  - Example: if there are 100 frames and 5 processes, give each process 20 frames
- Proportional allocation: allocate according to the size of the process
  - s_i = size of process p_i
  - S = sum of all s_i
  - m = total number of frames
  - a_i = allocation for p_i = (s_i / S) * m
  - Example: m = 64, s_1 = 10, s_2 = 127, S = 137:
    a_1 = (10 / 137) * 64 ≈ 5
    a_2 = (127 / 137) * 64 ≈ 59
Priority Allocation

- Use a proportional allocation scheme using priorities rather than size
- If process P_i generates a page fault:
  - select for replacement one of its own frames, or
  - select for replacement a frame from a process with lower priority

Global vs. Local Allocation

- Global replacement
  - a process selects a replacement frame from the set of all frames
  - one process can take a frame from another
- Local replacement
  - each process selects from only its own set of allocated frames
Thrashing

- If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
  - low CPU utilization
  - the operating system thinks that it needs to increase the degree of multiprogramming
  - another process is added to the system
- Thrashing: a process is busy swapping pages in and out

Demand Paging and Thrashing

- Why does demand paging work? The locality model:
  - Locality = set of pages that are actively used together
  - A process migrates from one locality to another
  - Localities may overlap
- Why does thrashing occur?
  Sum of the sizes of the localities > total memory size
Working-Set Model
[Denning, 1968]

- Δ ≡ working-set window ≡ a fixed number of page references
- WS_i = WS_i(t) = working set of process P_i at current time t
  = set of pages referenced in the most recent Δ accesses
  - if Δ is too small, the WS will not encompass an entire locality
  - if Δ is too large, the WS will encompass several localities
  - if Δ = ∞, the WS will encompass the entire program
- WS can be derived e.g. by simulation, by sampling or by tracing (with hardware support)
- WSS_i = |WS_i| = working set size of process P_i at current time t
  = total number of pages referenced in the most recent Δ accesses
- D = Σ_i WSS_i ≡ total demand for frames
  - if D > m (the total number of frames) => thrashing
  - Policy: if D > m, then suspend one of the processes

Program Structure Matters!

- Example: matrix initialization

    int data[128][128];

  Assume page size = 128 ints, so each row is stored in one page.
  Assume < 128 frames available, LRU replacement.
- Program 1:

    for (j = 0; j < 128; j++)
        for (i = 0; i < 128; i++)
            data[i][j] = 0;   /* traverse column-wise */

  -> 128 x 128 = 16,384 page faults
- Program 2:

    for (i = 0; i < 128; i++)
        for (j = 0; j < 128; j++)
            data[i][j] = 0;   /* traverse row-wise */

  -> 128 page faults
- Data access locality is an important issue for programmers and optimizing compilers.
Process Creation: Copy-on-Write

- fork(): the child process gets a copy of the parent process' address space
- Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory
- Lazy copying: if either process modifies a shared page, only then is the page copied
  (figure: process2 writes to page C -> a copy of C is made for it)
- Free pages are allocated from a pool of zeroed-out pages
- More efficient process creation, as only modified pages are copied
- No improvement if the child process does exec() (which overwrites everything)
- Alternatively: vfork() in UNIX, Solaris, Linux: a light-weight fork variant with a sleeping parent and an immediate exec() by the child process
I/O Interlock

- Pages must sometimes be locked in memory
- Consider I/O: pages that are used for copying a file from a device must be locked from being selected for eviction by a page replacement algorithm
- DMA / interrupt-driven I/O uses physical memory addresses: the (physical) address of the buffer becomes stale if any of its pages is evicted, with the risk of overwriting (unrelated) data

Memory-Mapped Files

- Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory
- A file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page. Subsequent reads/writes to/from the file are treated as ordinary memory accesses.
- Simplifies file access by treating file I/O through memory rather than through read()/write() system calls
- Also allows several processes to map the same file, allowing the pages in memory to be shared
TDDB68 Concurrent Programming and Operating Systems
Additional Slides on Virtual Memory

Data Used by Page Replacement Algorithms

- All algorithms need extra data in the page table, e.g., one or several of the following:
  - A reference bit to mark whether the page has been used
  - A modify (dirty) bit to mark whether a page has been written to (changed) since it was last fetched from secondary memory
  - Additional mark bits
  - Counters or time stamps (used for theoretical algorithms; must often be converted into use of mark bits)
  - A queue or stack of page numbers...
Page-Fault Frequency Scheme

- Establish an "acceptable" page-fault rate
  - If the actual rate is too low, the process loses a frame
  - If the actual rate is too high, the process gains a frame

Other Issues – Prepaging

- Prepaging: reduce the large number of page faults (and kernel entry overhead) that occurs at process startup
  - Prepage all or some of the pages a process will need, before they are referenced
  - But if prepaged pages are unused, I/O and memory were wasted
- Assume s pages are prepaged and a fraction α of those pages is actually used:
  Is the cost of s * α saved page faults greater or less than the cost of prepaging s * (1 – α) unnecessary pages?
  α near zero => prepaging loses
Other Issues – Page Size

- Page size selection must take into consideration:
  - fragmentation
  - page table size
  - I/O overhead
  - locality

Other Issues – TLB Reach

- TLB Reach = the amount of memory accessible from the TLB
- TLB Reach = (TLB Size) * (Page Size)
- Ideally, the working set of each process is stored in the TLB; otherwise there is a high degree of TLB misses
- Increase the page size? This may lead to an increase in fragmentation, as not all applications require a large page size
- Provide multiple page sizes? This allows applications that require larger page sizes the opportunity to use them without an increase in fragmentation
References

- L. Belady: A Study of Replacement Algorithms for a Virtual Storage Computer. IBM Systems Journal 5 (1966), 78-101.
- Peter J. Denning: The working set model for program behavior. Communications of the ACM 11(5):323-333, May 1968.
- Peter J. Denning: The Locality Principle. Communications of the ACM 48(7):19-24, July 2005.