Week 6 PowerPoint Slides

Background Information
• To execute
– Processes must be in main memory
– The CPU can only directly access main memory and registers
• Speed
– Register access requires a single CPU cycle
– Accessing main memory can take multiple cycles
– Accessing disk can take milliseconds
– Cache sits between main memory and CPU registers
• Memory mapping: always depends on hardware assists
• Depending on the hardware, processes might be
– Contiguous in logical memory; contiguous in physical memory
– Contiguous in logical memory; scattered through physical memory
• Memory protection: processes have a limited memory view
Memory Management Issues
Goal: Effective Allocation of memory among processes
1. How and when are memory references bound to absolute
physical addresses?
2. How can processes maximize memory use?
• How many processes can be in memory?
• Can processes move while they execute?
• Can programs exceed the size of physical memory?
• Do entire programs need to be in memory to run?
• Can memory be shared among processes?
3. How are processes protected from each other?
4. What are the system limitations? Memory limits? CPU processing speed? Disk speed? Hardware assistance?
Logical vs. Physical Address Space
• Definitions
– Memory Management Unit (MMU): the device mapping logical (virtual) addresses to physical addresses
– Logical address: the process view of memory
– Physical address: the MMU view of memory
• Memory references
– Logical and physical addresses are the same when binding
occurs during compile or load time
– Logical and physical addresses are different when binding
occurs dynamically during execution
When are Processes Bound to Memory
• Compile time: The compiler generates absolute references
• Load time: The compiler generates relocatable code; the link editor merges separately compiled modules, and the loader generates absolute code
• Execution time: Binding is delayed until run time, so processes can move during execution. Hardware support is required.
A Simple Memory Mapping Scheme
• A pair of base and limit registers defines the logical address space
• The MMU adds the content of the relocation (base) register to each memory reference
• The limit register disallows references that are out of bounds (see the sketch below)
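
A minimal sketch of the base/limit check the MMU performs on each reference; the register values are arbitrary and the Java exception stands in for a hardware trap.

public class Mmu {
    private final long base;   // relocation (base) register
    private final long limit;  // size of the logical address space

    Mmu(long base, long limit) { this.base = base; this.limit = limit; }

    long translate(long logical) {
        if (logical < 0 || logical >= limit) {
            // Hardware would trap to the OS; the OS terminates the process.
            throw new IllegalStateException("trap: address out of range");
        }
        return base + logical;   // relocation: physical = base + logical
    }

    public static void main(String[] args) {
        Mmu mmu = new Mmu(300040, 120900);           // arbitrary register values
        System.out.println(mmu.translate(42));       // 300082
        try {
            mmu.translate(200000);                   // out of bounds
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());      // trap: address out of range
        }
    }
}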
Hardware to Support Many Processes in Memory
(Figure: MMU relocation register and limit register protection)
• The program accesses a memory location
• Trap
– Occurs when accessing a location that is out of range
– Action: terminate the process
Improving Memory Utilization
Overlays
• Parts of a process load into an overlay
area
• Implemented by user programs using
an overlay aware loader
Swapping with OS support
• Backing store: a fast disk partition
large enough to accommodate direct
access copies of all memory images
• Swap operation: Temporarily roll out
lower priority process and roll in
another process on the swap queue
• Issues: seek time and transfer time
• Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows)
Dynamic Library Loading
• Definitions:
– Library functions: those which are common to many applications
– Dynamic loading: the process of loading library functions at run time
• Advantages
– Unused functions are never loaded
– Minimize memory use if large functions handle infrequent events
– Operating system support is not required.
• Disadvantages
– Library functions are not shared among processes
– Applications may need to explicitly request loading
Dynamic Linking
• Assumption: A run-time (shared) library exists
– Set of functions shared by many processes
– Linked at execution time
• Stub
– A small piece of code that locates the memory-resident library function
– The stub replaces itself with the address of the library function and executes it
• Operating System Support
– Return address of function if in memory
– Load the function if it is not in memory
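
Java's class loading is itself dynamic, which makes it a convenient way to sketch the idea; the class name here is hypothetical, and this illustrates run-time loading in general rather than the stub mechanism of any particular operating system.

import java.lang.reflect.Method;

public class DynamicLoadDemo {
    public static void main(String[] args) throws Exception {
        // The class is not loaded until this call executes (binding at run time).
        Class<?> lib = Class.forName("com.example.RarelyUsedLibrary"); // hypothetical name
        Method fn = lib.getMethod("handleRareEvent");
        fn.invoke(null);   // call the just-loaded static function
    }
}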
Contiguous Memory Allocation
Each Process is stored in one contiguous block
• Memory is partitioned into two areas
– The kernel and interrupt vector are usually in low memory
– User processes are in high memory
• Single-partition allocation
– MMU relocation base and limit registers enforce memory protection
– The size of the operating system doesn’t impact user programs
• Multiple-partition allocation
– Processes are allocated into ‘holes’ (available areas of memory)
– The operating system maintains lists of allocated and free memory
(Figure: successive memory snapshots, with the OS in low memory; processes 5, 8, 9, 10, and 2 are allocated into holes as others terminate)
Algorithms for Contiguous Allocations
• Issues: How to maintain the free list; what is the search algorithm complexity?
• Algorithms (note: worst-fit generally performs worst)
– First-fit: Allocate the first hole that is big enough (sketched after this list)
– Best-fit: Allocate the smallest hole that is big enough; leaves a small leftover hole
– Worst-fit: Allocate the largest hole; leaves large leftover holes
• Fragmentation
– External: memory holes limit possible allocations
– Internal: allocated memory is larger than needed
– 50-percent rule: for every N allocated blocks, roughly 0.5N blocks are lost to fragmentation, so about one-third of memory may be unusable
• Compaction Algorithm
– Shuffle memory contents to place all free memory together.
– Issues
• Memory binding must be dynamic
• Time consuming, handling physical I/O during the remapping
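
A minimal first-fit sketch over a list of free holes; the hole list and its representation are illustrative assumptions, and best-fit would differ only in scanning for the smallest adequate hole.

import java.util.ArrayList;
import java.util.List;

public class FirstFit {
    record Hole(int start, int size) {}

    // Returns the start address of an allocated block, or -1 on failure.
    // Scans holes in address order and takes the first one large enough,
    // shrinking it in place.
    static int allocate(List<Hole> holes, int request) {
        for (int i = 0; i < holes.size(); i++) {
            Hole h = holes.get(i);
            if (h.size() >= request) {
                if (h.size() == request) holes.remove(i);
                else holes.set(i, new Hole(h.start() + request, h.size() - request));
                return h.start();
            }
        }
        return -1;  // external fragmentation: no single hole is big enough
    }

    public static void main(String[] args) {
        List<Hole> holes = new ArrayList<>(List.of(new Hole(100, 50), new Hole(300, 200)));
        System.out.println(allocate(holes, 120));  // 300 (first hole is too small)
        System.out.println(allocate(holes, 40));   // 100
    }
}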
Paged Memory Addressing
• The MMU divides every memory reference address into a:
– Page number (p): an index into the page table, an array containing the base address of each frame in physical memory
– Page offset (d): the offset into a physical frame
• Logical address layout:
| page number p (m-n bits) | page offset d (n bits) |
– Logical addresses contain m bits, n of which are the displacement; there are 2^(m-n) pages of size 2^n
– Advantage: No external fragmentation
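
A sketch of the split-and-translate arithmetic, assuming a 12-bit offset (4 KB pages) and an in-memory page table array; the table contents are illustrative.

public class PagedTranslate {
    static final int OFFSET_BITS = 12;                 // n = 12 -> 4 KB pages
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;

    // Illustrative page table: index = page number, value = frame number.
    static final int[] pageTable = {5, 2, 7, 0};

    static int translate(int logical) {
        int p = logical >>> OFFSET_BITS;            // page number: high-order m - n bits
        int d = logical & OFFSET_MASK;              // offset: low-order n bits
        return (pageTable[p] << OFFSET_BITS) | d;   // frame base + offset
    }

    public static void main(String[] args) {
        // Page 1, offset 0x34 -> frame 2, offset 0x34 = 0x2034.
        System.out.printf("0x%x%n", translate(0x1034));
    }
}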
Paging
Definition: A page is a fixed-sized block of logical memory, generally a
power of 2 in length between 512 and 8,192 bytes
Definition: A frame is a fixed-sized block of physical memory. Each frame
corresponds to a single page
Definition: A Page table is an array that translates from pages to frames
• Operating System responsibilities
– Maintain the page table
– Allocate sufficient pages from free frames to execute a program
• Benefit: Logical address space of a process can be noncontiguous
and allocated as needed
• Issue: Internal fragmentation
Paged Memory Allocation
1. p indexes the page table, which refers to physical frames
2. d is the offset into a physical frame
3. Each process has an OS-maintained page table
(Figure: a process page table mapping pages of four locations each to physical frames)
Note: The instruction address bits define the bounds of the logical address space
Page Table Examples
(Figures: memory layout and page tables before and after allocation)
Page Table Implementation
• Hardware assist
– The page-table base register (PTBR) addresses the page table
– The page-table length register (PTLR) indicates the page table size
• Issue
– Every memory access requires two trips to memory, which could halve the effective processor speed: (1) read the page table entry; (2) perform the actual memory reference
• Solution: a translation look-aside buffer (TLB), an associative memory described on the next slide
Translation look-aside buffers
Associative memory (parallel search) avoids the double memory access
• Two-column table of page number and frame number
– If the page is found, return the frame
– Otherwise, use the page table
• Timing (checked in the sketch below)
– Assume: 20 ns TLB access, 100 ns main memory access, 80% hit ratio
– Expected access time (EAT):
EAT = 0.80 × (20 + 100) + 0.20 × (20 + 100 + 100) = 96 + 44 = 140 ns
Note: The TLB is flushed on context switches
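
A quick check of the arithmetic above; the constants are the values assumed on this slide.

public class TlbEat {
    public static void main(String[] args) {
        double tlb = 20, mem = 100, hit = 0.80;   // slide's assumed values
        double eatHit = tlb + mem;                // TLB hit: one memory access
        double eatMiss = tlb + mem + mem;         // miss: page-table access + reference
        double eat = hit * eatHit + (1 - hit) * eatMiss;
        System.out.println(eat + " ns");          // 140.0 ns
    }
}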
Extra Page Table Bits
• Valid-invalid bits
– “valid”: the page belongs to the process; access is legal
– “invalid”: the page is illegal and not accessible
• Expanded uses
– Virtual memory: a trap on an invalid page triggers a disk load
– Read-only pages
– Address-space identifier (ASID) to identify the process owning the page
• Note
– The entire last, partially used page is marked as valid
– Processes can therefore incorrectly access those leftover locations
Processes Sharing Data (or Not)
• Shared
– One copy of read-only code is shared among processes
– Mapped to the same logical address in all processes
• Private
– Each process keeps a separate copy of its code and data
– Private code and data pages can be anywhere in memory
Hierarchical Page Tables
• Single level
| page (20 bits) | offset (12 bits) |
• Hierarchical two level
| outer page (10 bits) | inner page (10 bits) | offset (12 bits) |
• Notes
– Tree structure
– Multiple memory accesses are required to find the actual physical location
– Parts of the page table can be on disk
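
A sketch of how the 10/10/12 split above indexes a two-level table; the address value is a placeholder.

public class TwoLevelLookup {
    // Extract the three fields of a 32-bit logical address (10/10/12 split).
    static int outer(int addr)  { return addr >>> 22; }           // top 10 bits
    static int inner(int addr)  { return (addr >>> 12) & 0x3FF; } // middle 10 bits
    static int offset(int addr) { return addr & 0xFFF; }          // low 12 bits

    public static void main(String[] args) {
        int addr = 0xDEADBEEF;
        // outer indexes the outer page table; its entry addresses an inner
        // page table; inner indexes that to find the frame; offset is unchanged.
        System.out.printf("outer=%d inner=%d offset=0x%x%n",
                          outer(addr), inner(addr), offset(addr));
    }
}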
Three-level Paging Scheme
Hashed Page Tables
Hashing complexity is close to O(1)
• Collisions are resolved using separate chaining (linked lists)
• The virtual page number is hashed to a physical frame
• Common on address spaces larger than 32 bits
• Ineffective if collisions are frequent
Inverted Page Table
Goal: Reduce page table memory requirements
• One global page table
– Advantage: eliminates the per-process page tables
– Disadvantage: slower memory access because of searching
• Implementation
– Hash with key = (pid, page number)
– TLB access eliminates the search most of the time
• Example: UltraSPARC
Segmentation
Supports a process view of memory
Program segments: main, objects, stack, symbol table, arrays
(Figure: numbered segments; 1: subroutines, 2: library methods, 3: stack, 4: main program, 5: symbol table)
• Segment table registers
– Segment base register (SBR) = segment table’s location
– Segment length register (SLR) = # of segments in a program
• Segments
– Are variable size; allocated via first-fit/best-fit algorithms
– Can be shared among processes and relocated at the segment level
– Carry protection bits: a valid bit and read/write privileges
– Suffer from external fragmentation
Segmentation Examples
(Figure: segmentation hardware)
Segmentation with Paging
Segment table entries point to the correct segment page table
(Figures: MULTICS and Intel 386 address translation)
• The MULTICS system pages the segments.
Pentium Address Translation
• Supports both pure segmentation and segmentation with paging
• Translation scheme
– The segmentation unit produces a linear address
– The paging unit produces the physical address
(Figures: segmentation with paging; segmentation only)
Pentium Paging Architecture
Three-level Paging in Linux
Virtual Memory
Separate logical and physical memory spaces
• Concepts
– Programs access logical memory
– Operating system memory management and hardware coordinate to establish the logical-to-physical mapping
• Advantages
– The whole program doesn't need to be in memory
– The system can execute programs larger than physical memory
– Processes can share blocks of memory
– Library routines can stay resident
– Improved memory utilization: more processes running concurrently
– Memory-mapped files
– Copy-on-write algorithms
• Disadvantages
– Extra disk I/O and thrashing
Logical Memory Examples
Copy on Write
• Processes initially share the same pages
• Operating system support
– Maintains a list of free zeroed-out pages
– Each process gets its own copy only after it modifies a page
(Figures: page tables before and after modification)
Demand Paging
Definition: Pages loaded into memory “on demand”
The Lazy Swapper (pager)
The lazy swapper loads pages only when they are needed. This minimizes I/O and memory requirements and allows for more users
Hardware support: (Figure: page table with frame number, valid bit, and dirty bit columns)
Page table entries contain valid bits and dirty bits
• Valid-invalid bits - set 0 (invalid) when the page is not in memory.
• Dirty bits are set when a page gets modified. This avoids
unnecessary writes during swaps.
Advantages: less I/O, less memory, faster response, more users
Page Faults
Note: Some pages can be loaded and swapped out multiple times
Note: Unused bits for invalid entries contain the page’s disk address
Processing Page Faults
1. The user program references a location that is not resident
2. A page fault occurs and the OS handles the fault:
IF invalid reference THEN abort the program
ELSE
    IF no empty frame THEN choose a victim and write it to the backing store
    ELSE choose an empty frame
    Find the needed page on disk
    Read the page into the frame
    Update the page table
    Set the page-table valid bit
3. Re-execute the instruction that caused the page fault
Performance of Demand Paging
• Page fault rate: 0 ≤ p ≤ 1.0
– p = 0 means no page faults
– p = 1 means every reference triggers a page fault
• Effective access time (EAT)
– EAT = (1 − p) × memory access time + p × (page fault overhead + swap page out + swap page in + restart overhead)
• Example
– p = 0.01
– Memory access time = 200 nanoseconds
– Average page-fault service time = 8 milliseconds
– Restart overhead is insignificant
– EAT = (1 − 0.01) × 200 + 0.01 × 8,000,000 = 198 + 80,000 ns ≈ 80 µs
Question: Is the flexibility worth the extra overhead?
Page Replacement Algorithms
Occurs when all of the frames are occupied: swap a victim out and bring the new page in
• Technique: Assign a number of frames to each process (x axis)
• Goal: Minimize page faults (y axis)
• Algorithm evaluation: Count faults using a predefined reference string
• Belady’s anomaly: When allocating more frames causes more faults
• Copy-out: Only write frames to the backing store when they are “dirty”
First-In-First-Out (FIFO) Algorithm
Memory Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Illustration of Belady’s Anomaly (both counts are reproduced by the simulation below)
• Case 1: The process may hold 3 frames at a time: 9 page faults
• Case 2: The process may hold 4 frames at a time: 10 page faults
(Figures: frame contents after each reference for both cases)
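
A short simulation of FIFO replacement over this slide's reference string; the queue-plus-set representation is just one convenient sketch. It reproduces both fault counts, confirming the anomaly.

import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class FifoDemo {
    // Counts page faults for FIFO replacement with the given frame count.
    static int countFaults(int[] refs, int frames) {
        Queue<Integer> fifo = new ArrayDeque<>();   // eviction order
        Set<Integer> resident = new HashSet<>();    // pages currently in frames
        int faults = 0;
        for (int page : refs) {
            if (resident.contains(page)) continue;  // hit: FIFO order unchanged
            faults++;
            if (fifo.size() == frames) resident.remove(fifo.remove()); // evict oldest
            fifo.add(page);
            resident.add(page);
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        System.out.println("3 frames: " + countFaults(refs, 3) + " faults"); // 9
        System.out.println("4 frames: " + countFaults(refs, 4) + " faults"); // 10 (Belady)
    }
}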
Optimal Page Replacement
Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Replace the page that will not be used for the longest period of time
• With 4 frames at a time, the reference string incurs 6 page faults
(Figure: frame contents after each reference)
• Advantage: It is optimal
• Disadvantage: We don't know the future
• Use: A good benchmark algorithm
LRU Page Replacement
Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Replace the page that has not been used for the longest period of time
• Assumption: the process may hold four frames in memory at a time: 8 page faults
(Figure: frame contents after each reference)
Naïve stack implementation (sketched below)
• O(1) victim frame selection
• Requires a search and update on each memory reference
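
A compact LRU sketch using Java's access-ordered LinkedHashMap, which plays the role of the stack described above (each reference moves a page to the most-recently-used position). It reproduces the 8 faults for this slide's reference string with 4 frames.

import java.util.LinkedHashMap;

public class LruDemo {
    public static void main(String[] args) {
        int frames = 4;
        // Access-ordered map: iteration order runs from least- to most-recently
        // used, so the first key is always the LRU victim.
        LinkedHashMap<Integer, Boolean> lru = new LinkedHashMap<>(16, 0.75f, true);
        int faults = 0;
        for (int page : new int[]{1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5}) {
            if (lru.get(page) == null) {               // miss: page fault
                faults++;
                if (lru.size() == frames) {            // evict least recently used
                    lru.remove(lru.keySet().iterator().next());
                }
            }
            lru.put(page, Boolean.TRUE);               // touch: moves to MRU position
        }
        System.out.println(faults + " page faults");   // 8
    }
}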
Approximate LRU with Hardware Support
• Reference bit
– Each page has a reference bit, initially 0
– Hardware sets the bit to 1 when the page is referenced
– The OS replaces the first page found with a 0 bit
• Second-chance algorithm (see the clock sketch below)
– Uses the reference bit to give pages a second chance
– A clock hand loops through the pages
– Replace a page only when its reference bit is 0 twice in a row:
• on the first encounter, set the reference bit to 0
• leave the page in memory
• move to the next page (in clock order), subject to the same rules
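
A sketch of the clock loop described above, assuming a fixed array of frames; in reality the reference bits are set by hardware on every access, so they are set manually here.

public class ClockSketch {
    int[] frames;        // page resident in each frame
    boolean[] refBit;    // hardware-set reference bits
    int hand = 0;        // clock hand position

    ClockSketch(int n) { frames = new int[n]; refBit = new boolean[n]; }

    // Pick a victim frame: pages with their reference bit set get a second chance.
    int chooseVictim() {
        while (true) {
            if (refBit[hand]) {
                refBit[hand] = false;                // clear: second chance granted
                hand = (hand + 1) % frames.length;
            } else {
                int victim = hand;                   // bit already 0: replace this one
                hand = (hand + 1) % frames.length;
                return victim;
            }
        }
    }

    public static void main(String[] args) {
        ClockSketch clock = new ClockSketch(4);
        clock.refBit[0] = clock.refBit[1] = true;    // pages 0 and 1 recently referenced
        System.out.println(clock.chooseVictim());    // 2: first frame with a clear bit
    }
}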
Frame Allocation
How are frames allocated among executing processes?
• Allocation can be global or local
– Global: select a replacement frame from the single set of all frames
– Local: each process selects from its own set of allocated frames
• Each process needs a minimum number of pages
– Example: on the IBM 370, a MOVE instruction could require 6 pages:
– the instruction is 6 bytes long and could span 2 pages
– 2 pages for the from address and 2 pages for the to address
• Each process also has a maximum number of pages it can use
– Excessive allocation to one process can degrade system performance
• Examples of frame allocation algorithms (a proportional sketch follows this list)
– Fixed: each process gets an equal number of frames
– Priority: higher-priority processes get more frames
– Proportional: frames are allocated according to each process's size relative to the others
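
A minimal sketch of proportional allocation, aᵢ = (sᵢ / S) × m; the process sizes and the frame total are made-up values.

public class ProportionalAllocation {
    public static void main(String[] args) {
        int[] size = {10, 127, 63};            // hypothetical process sizes, in pages
        int m = 62;                            // total free frames
        int total = 0;
        for (int s : size) total += s;         // S = sum of all process sizes
        for (int i = 0; i < size.length; i++) {
            int frames = size[i] * m / total;  // a_i = (s_i / S) * m, truncated
            System.out.println("process " + i + ": " + frames + " frames");
        }
    }
}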
Other Replacement Algorithms
• Least Frequently Used (LFU)
– Replaces the page with the lowest usage count. In case of a tie, the oldest page in memory is replaced
– Disadvantage: a page with heavy early usage remains in memory after it is no longer needed
• Most Frequently Used (MFU)
– Replaces the page with the largest usage count. In case of a tie, replace the oldest page in memory
– Idea: the page with the smallest count was probably just loaded
• Usage counts are updated at regular intervals using each page table entry's reference bit
Thrashing Considerations
Thrashing: Excessive system resources dedicated to swapping pages
• Insufficient frames leads to
– Low CPU utilization
– A short ready queue
– Adding more processes, which leads to more thrashing
• Paging works because of locality
– Processes perform most of their work referencing narrow ranges of memory
• Thrashing occurs when the total size of the process localities exceeds the total memory size
(Figure: performance log of memory accesses over time)
Working Set Model
Adjust allocated frames to the references made within a window of time
• Goal: achieve an “acceptable” page-fault rate
– If the actual rate is too low, a process loses frames
– If the actual rate is too high, a process gains frames
Working-Set Model
• Δ ≡ working-set window ≡ a fixed number of page references (example: 10,000 instructions)
• WSSi (working set size of process Pi) = total number of pages referenced in the most recent Δ (varies in time)
– If Δ is too small, it will not encompass the entire locality
– If Δ is too large, it will encompass several localities
– If Δ = ∞, it will encompass the entire program
• D = Σ WSSi ≡ total demand for frames
• If D > total memory pages ⇒ thrashing
• Policy: if D > total memory pages, then suspend one of the processes
Working-Set Model
Working set: the pages referenced during a working-set window
Working-set window (Δ): a fixed number of instruction references (example: 10,000)
Processes are given the frames in their working set
• Considerations
– Small Δ: processes lose frames
– Large Δ: processes gain frames; Δ = ∞ includes the entire program
– Thrashing results if the sum of all working sets (D) exceeds memory (m)
• Implementation (a tracking sketch follows below)
– Suspend processes if D > m
– A timer interrupts every Δ/2 time units; referenced pages are kept in the working set, others are discarded, and reference bits are reset
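
A sketch of exact working-set tracking over a sliding window of the last Δ references; a real kernel only approximates this with reference bits and timer interrupts rather than keeping the full history.

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class WorkingSet {
    final int delta;                                      // window size in references
    final ArrayDeque<Integer> window = new ArrayDeque<>();
    final Map<Integer, Integer> counts = new HashMap<>(); // page -> refs inside window

    WorkingSet(int delta) { this.delta = delta; }

    void reference(int page) {
        window.addLast(page);
        counts.merge(page, 1, Integer::sum);
        if (window.size() > delta) {                      // slide the window
            int old = window.removeFirst();
            if (counts.merge(old, -1, Integer::sum) == 0) counts.remove(old);
        }
    }

    int wss() { return counts.size(); }                   // distinct pages in window

    public static void main(String[] args) {
        WorkingSet ws = new WorkingSet(5);
        for (int p : new int[]{1, 2, 1, 3, 2, 1, 4, 4, 4, 4}) ws.reference(p);
        System.out.println("WSS = " + ws.wss());          // 2: pages {1, 4} in last 5 refs
    }
}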
Pre-paging
• Purpose: reduce the page faults occurring at process startup
• Pre-page all or some of the pages before they are referenced
• Note: if pre-paged pages are unused, the I/O and memory are wasted
• Assume s pages are pre-paged and a fraction α of them is used
– Page-fault savings: s × α
– Unnecessary page loads: s × (1 − α)
– If α is near zero, pre-paging loses
Additional Considerations
• I/O interlock: pages involved in data transfer must be locked into memory
• TLB size impacts the working set
– TLB reach = (TLB size) × (page size)
– If the working set is in the TLB, there will be fewer page faults
• Techniques to reduce page faults
– Increase the page size: leads to increased internal fragmentation
– Provide variable page sizes based on application specifications
• Poor program design can increase page faults
– Example: a 1024 × 1024 array stored one row per page
– Program 1 (1024 × 1024 page faults): indexes by columns

for (int j = 0; j < A.length; j++)
    for (int i = 0; i < A.length; i++) A[i][j] = 0;

– Program 2 (1024 page faults): indexes by rows

for (int i = 0; i < A.length; i++)
    for (int j = 0; j < A.length; j++) A[i][j] = 0;
Memory Mapped Files
• Disk blocks are mapped to memory pages
• We read a page-sized portion of a file into physical pages
• Reads and writes to the file become simple memory accesses
• The file is accessed without read() and write() system calls
• Shared memory: the same mapped memory connects several processes
Memory-Mapped Files in Java
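
A minimal sketch using java.nio's FileChannel.map; the file name and the 4096-byte mapping size are arbitrary choices for illustration.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFileDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("demo.dat", "rw");
             FileChannel channel = file.getChannel()) {
            // Map the first 4096 bytes of the file into memory.
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.put(0, (byte) 'A');        // plain memory write, no write() syscall
            byte b = buf.get(0);           // plain memory read, no read() syscall
            System.out.println((char) b);  // prints A
        }
    }
}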
Memory-Mapped Shared Memory in Windows
Examples
Windows NT
• Demand paging with clustering. Clustering loads surrounding pages
• Process parameters: working set minimum and working set maximum.
• Automatic working set trimming occurs if free memory is too low
Solaris 2
• Maintains a list of free pages
• The pageout function selects victims using LRU-style scans; it runs more frequently when free memory is low
• The lotsfree threshold controls when paging starts
• The scanrate parameter controls the page scan rate, varying from slowscan to fastscan
(Figure: Solaris 2 page scanner parameters)
Allocating Kernel Memory
• Treated differently from user allocation
– Deals with physical memory
– The kernel requests memory for structures of varying sizes
– Some kernel memory must be contiguous
• Approaches
– Buddy System Allocation
– Slab Memory Allocation
Buddy System Allocation
• Allocates from a fixed-size segment of physically contiguous pages
• Memory is allocated in power-of-2 sized blocks
– Allocation requests round up to the next power of 2
– If the kernel needs a smaller allocation than the blocks that are available, a larger block is repeatedly split into two buddies of the next-lower power of 2 until a correctly sized block is found (see the sketch below)
– Search time is proportional to the depth of the resulting tree, O(log n)
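
A sketch of the rounding and splitting logic only, assuming one free list per power-of-2 order; a real buddy allocator also coalesces freed buddies, which is omitted here.

import java.util.ArrayDeque;
import java.util.Deque;

public class BuddySketch {
    static final int MAX_ORDER = 8;                  // largest block = 2^8 pages
    // freeLists[k] holds start addresses of free blocks of 2^k pages.
    @SuppressWarnings("unchecked")
    static final Deque<Integer>[] freeLists = new Deque[MAX_ORDER + 1];

    static int orderFor(int pages) {                 // round request up to 2^k
        int k = 0;
        while ((1 << k) < pages) k++;
        return k;
    }

    static int allocate(int pages) {
        int want = orderFor(pages);
        int k = want;
        while (k <= MAX_ORDER && freeLists[k].isEmpty()) k++;  // find a big-enough block
        if (k > MAX_ORDER) return -1;                          // out of memory
        int addr = freeLists[k].removeFirst();
        while (k > want) {                           // split into two buddies
            k--;
            freeLists[k].addFirst(addr + (1 << k));  // upper buddy stays free
        }
        return addr;                                 // lower half satisfies the request
    }

    public static void main(String[] args) {
        for (int k = 0; k <= MAX_ORDER; k++) freeLists[k] = new ArrayDeque<>();
        freeLists[MAX_ORDER].add(0);                 // one 256-page segment at address 0
        System.out.println(allocate(21));            // rounds to 32 pages -> address 0
        System.out.println(allocate(21));            // next 32-page buddy -> address 32
    }
}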
Slab Memory Allocation
Slab: one or more physically contiguous pages
• Slab cache
– Consists of one or more slabs
– A single cache exists for each unique kernel data structure
– A cache initially contains a group of instantiated data-structure objects, all marked as free
– Allocated objects are marked as used
– A new slab is added to a cache when it has no more free objects
• Benefits (see the object-cache sketch below)
– No fragmentation
– Fast allocation
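
A toy object cache in the spirit of slab allocation: objects are pre-instantiated, handed out by marking them used, and returned by marking them free again; the fixed slab size and the Buffer type are illustrative assumptions.

import java.util.ArrayDeque;
import java.util.Deque;

public class SlabCacheSketch {
    static class Buffer { final byte[] data = new byte[4096]; }  // example kernel object

    static final int SLAB_OBJECTS = 8;                // objects per slab
    final Deque<Buffer> free = new ArrayDeque<>();    // free objects across all slabs

    Buffer allocate() {
        if (free.isEmpty()) addSlab();                // grow the cache by one slab
        return free.removeFirst();                    // mark used: removed from free list
    }

    void release(Buffer b) { free.addFirst(b); }      // mark free: the object is reused

    private void addSlab() {
        // Pre-instantiate a slab's worth of objects, all initially free.
        for (int i = 0; i < SLAB_OBJECTS; i++) free.add(new Buffer());
    }

    public static void main(String[] args) {
        SlabCacheSketch cache = new SlabCacheSketch();
        Buffer b = cache.allocate();   // no construction cost after the first slab fill
        cache.release(b);
    }
}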
Slab Allocation