Background Information

• To execute, processes must be in main memory
  – The CPU can only directly access main memory and registers
• Speed
  – Register access takes a single CPU cycle
  – Accessing main memory can take multiple cycles
  – Accessing disk can take milliseconds
  – Cache sits between main memory and the CPU registers
• Memory mapping always depends on hardware assists
• Depending on the hardware, a process's contiguous logical memory might be
  – contiguous in physical memory, or
  – scattered through physical memory
• Memory protection: processes have a limited view of memory

Memory Management Issues

Goal: effective allocation of memory among processes
1. How and when are memory references bound to absolute physical addresses?
2. How can processes maximize memory use?
  • How many processes can be in memory?
  • Can processes move while they execute?
  • Can programs exceed the size of physical memory?
  • Do entire programs need to be in memory to run?
  • Can memory be shared among processes?
3. How are processes protected from each other?
4. What are the system limitations? Memory size? CPU processing speed? Disk speed? Hardware assistance?

Logical vs. Physical Address Space

• Definitions
  – Memory Management Unit (MMU): the hardware device that maps logical (virtual) addresses to physical addresses
  – Logical address: the process's view of memory
  – Physical address: the MMU's view of memory
• Memory references
  – Logical and physical addresses are the same when binding occurs at compile or load time
  – Logical and physical addresses differ when binding occurs dynamically during execution

When Are Processes Bound to Memory?

• Compile time: the compiler generates absolute references
• Load time: the compiler generates relocatable code; the link editor merges separately compiled modules, and the loader generates absolute code
• Execution time: binding is delayed until run time, so processes can move during execution; hardware support is required

A Simple Memory Mapping Scheme

• A pair of base and limit registers defines the logical address space
• The MMU adds the content of the relocation (base) register to each memory reference
• The limit register disallows references that are out of bounds

Hardware to Support Many Processes in Memory

MMU relocation register protection:
• The program accesses a memory location
• Trap
  – taken when the access is out of range
  – action: terminate the process
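To make the scheme concrete, here is a minimal C sketch of the check-and-relocate step that the MMU performs in hardware on every memory reference; the register values are hypothetical.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* The MMU's work for one reference: compare the logical address
     * against the limit register, then add the relocation (base) register. */
    static uint32_t mmu_translate(uint32_t logical, uint32_t base, uint32_t limit) {
        if (logical >= limit) {
            fprintf(stderr, "trap: address %u out of range\n", logical);
            exit(EXIT_FAILURE);          /* the OS would terminate the process */
        }
        return base + logical;           /* relocation */
    }

    int main(void) {
        uint32_t base = 14000, limit = 3000;   /* hypothetical register values */
        printf("logical 346 -> physical %u\n", mmu_translate(346, base, limit));
        return 0;
    }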
Improving Memory Utilization

Overlays
• Parts of a process are loaded into an overlay area as needed
• Implemented by user programs using an overlay-aware loader

Swapping (with OS support)
• Backing store: a fast disk partition large enough to hold direct-access copies of all memory images
• Swap operation: temporarily roll out a lower-priority process and roll in another process from the swap queue
• Issues: seek time and transfer time
• Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows)
(figures: overlays; swapping)

Dynamic Library Loading
• Definitions
  – Library functions: functions common to many applications
  – Dynamic loading: loading library functions at run time
• Advantages
  – Unused functions are never loaded
  – Minimizes memory use when large functions handle infrequent events
  – Operating system support is not required
• Disadvantages
  – Library functions are not shared among processes
  – Could require explicit load requests by the application

Dynamic Linking
• Assumption: a run-time (shared) library exists
  – A set of functions shared by many processes
  – Linked at execution time
• Stub
  – A piece of code that locates the memory-resident library function
  – The stub replaces itself with the address of the library function and then executes it
• Operating system support
  – Returns the address of the function if it is in memory
  – Loads the function if it is not

Contiguous Memory Allocation

Each process is stored in one contiguous block of memory
• Memory is partitioned into two areas
  – The kernel and the interrupt vector are usually in low memory
  – User processes are in high memory
• Single-partition allocation
  – The MMU relocation (base) and limit registers enforce memory protection
  – The size of the operating system doesn't impact user programs
• Multiple-partition allocation
  – Processes are allocated into "holes" (available areas of memory)
  – The operating system tracks allocated and free memory
(figure: a sequence of allocations in which processes 5, 8, and 2 are resident, process 8 exits, and processes 9 and 10 fill the resulting holes)

Algorithms for Contiguous Allocation

• Issues: how to maintain the free list, and what is the search algorithm's complexity? (See the first-fit sketch after this section.)
• Algorithms (worst-fit generally performs worst)
  – First-fit: allocate the first hole that is big enough
  – Best-fit: allocate the smallest hole that is big enough; leaves a small leftover hole
  – Worst-fit: allocate the largest hole; leaves large leftover holes
• Fragmentation
  – External: memory holes limit possible allocations
  – Internal: allocated memory is larger than needed
  – 50% rule: for every N allocated blocks, about N/2 more are lost to fragmentation, so roughly one-third of memory may be unusable
• Compaction
  – Shuffle memory contents to place all free memory together in one block
  – Issues
    • Memory binding must be dynamic
    • Time consuming, and physical I/O in progress must be handled during the remapping
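The three strategies above differ only in which hole they select. Here is a minimal C sketch of first-fit over a singly linked free list; the hole structure and list layout are assumptions for illustration, not any particular kernel's implementation.

    #include <stddef.h>

    /* A hole in the free list: its start address and size in bytes. */
    struct hole {
        size_t start, size;
        struct hole *next;
    };

    /* First-fit: return the first hole large enough for `request`,
     * or NULL if none fits. Best-fit would instead scan the whole
     * list and remember the smallest hole that still fits. */
    static struct hole *first_fit(struct hole *free_list, size_t request) {
        for (struct hole *h = free_list; h != NULL; h = h->next)
            if (h->size >= request)
                return h;
        return NULL;
    }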
Paged Memory Addressing

• The MMU splits every memory reference into
  – Page number (p): an index into the page table, an array holding the base address of every frame in physical memory
  – Page offset (d): the offset into that physical frame
• A logical address of m bits is split into a page number p (the high m − n bits) and an offset d (the low n bits); there are 2^(m−n) pages of size 2^n
• Advantage: no external fragmentation

Paging

Definition: a page is a fixed-size block of logical memory, generally a power of 2 in length, typically between 512 and 8,192 bytes
Definition: a frame is a fixed-size block of physical memory; each page corresponds to a single frame
Definition: a page table is an array that translates pages to frames
• Operating system responsibilities
  – Maintain the page table
  – Allocate enough free frames to execute a program
• Benefit: the logical address space of a process can be noncontiguous and allocated as needed
• Issue: internal fragmentation

Paged Memory Allocation
1. p indexes the page table, whose entries refer to physical frames
2. d is the offset into the physical frame
3. Each process has an OS-maintained page table
(figure: a process page table, four locations per page, mapped onto physical frames)
Note: the number of bits in an instruction address bounds the logical address space

Page Table Examples
(figures: memory layout; page tables before and after allocation)

Page Table Implementation
• Hardware assist
  – Page-table base register (PTBR): addresses the page table
  – Page-table length register (PTLR): holds the page table's size
• Issue
  – Every memory access requires two trips to memory, which could halve processor speed: (1) read the page table, then (2) make the actual memory reference
• Solution: a translation look-aside buffer (TLB), an associative memory described next

Translation Look-aside Buffers

An associative memory (parallel search) that avoids the double memory access
• A two-column table of (page number, frame number) pairs
  – If the page is found, return the frame directly
  – Otherwise, fall back to the page table
• Timing example
  – Assume a 20 ns TLB access, a 100 ns main memory access, and an 80% hit ratio
  – Expected access time: EAT = 0.80 × (20 + 100) + 0.20 × (20 + 100 + 100) = 96 + 44 = 140 ns
Note: the TLB is flushed on context switches

Extra Page Table Bits
• Valid-invalid bit
  – "valid": the page belongs to the process; access is legal
  – "invalid": the page is illegal and not accessible
• Expanded uses
  – Virtual memory: a trap on an invalid page triggers a disk load
  – Read-only pages
  – Address-space identifier (ASID) identifying the process that owns the page
• Note
  – The last, partially used page is marked valid in its entirety
  – A process can therefore access those leftover locations incorrectly

Processes Sharing Data (or Not)
• Shared
  – One copy of read-only code shared among processes
  – Mapped to the same logical address in every process
• Private
  – Each process keeps a separate copy
  – Private code and data can be located anywhere in memory

Hierarchical Page Tables
• Single level: page number (20 bits) | offset (12 bits)
• Two levels: outer page (10 bits) | inner page (10 bits) | offset (12 bits)
• Notes
  – Tree structure
  – Multiple memory accesses are required to find the actual physical location
  – Parts of the page table can be on disk
  – (C sketches of the single-level and two-level translations follow)
(figure: a three-level paging scheme)
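A minimal sketch of the single-level translation described above, assuming 32-bit logical addresses, a 12-bit offset (4 KB pages), and a small hypothetical in-memory page table of frame numbers:

    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 12                        /* n: 4 KB pages (2^12 bytes) */
    #define PAGE_SIZE   (1u << OFFSET_BITS)

    /* Split a logical address into page number p and offset d, then
     * translate through a page table (an array of frame numbers). */
    static uint32_t translate(uint32_t logical, const uint32_t *page_table) {
        uint32_t p = logical >> OFFSET_BITS;      /* the high m - n bits */
        uint32_t d = logical & (PAGE_SIZE - 1);   /* the low n bits */
        return (page_table[p] << OFFSET_BITS) | d;
    }

    int main(void) {
        uint32_t page_table[] = {5, 6, 1, 2};     /* hypothetical: page i -> frame */
        uint32_t logical = 1 * PAGE_SIZE + 100;   /* page 1, offset 100 */
        printf("physical = %u\n", translate(logical, page_table)); /* frame 6 */
        return 0;
    }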
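And a sketch of the two-level (10/10/12) walk from the hierarchical scheme above. The entry layout is an assumption; a NULL inner table or an invalid entry stands in for a page fault.

    #include <stdint.h>
    #include <stddef.h>

    #define L1_BITS  10
    #define L2_BITS  10
    #define OFF_BITS 12

    /* One inner page table entry: a frame number plus a valid bit. */
    typedef struct { uint32_t frame; int valid; } pte_t;

    /* Walk a two-level page table. Returns 0 and fills *phys on success,
     * -1 on a fault (missing inner table or invalid entry). */
    static int walk(pte_t *outer[1 << L1_BITS], uint32_t logical, uint32_t *phys) {
        uint32_t p1  = logical >> (L2_BITS + OFF_BITS);           /* outer index */
        uint32_t p2  = (logical >> OFF_BITS) & ((1u << L2_BITS) - 1);
        uint32_t off = logical & ((1u << OFF_BITS) - 1);

        pte_t *inner = outer[p1];
        if (inner == NULL || !inner[p2].valid)
            return -1;                                            /* page fault */
        *phys = (inner[p2].frame << OFF_BITS) | off;
        return 0;
    }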
Hashed Page Tables

Hashing complexity is close to O(1)
• The virtual page number is hashed; collisions are resolved with separate chaining (linked lists)
• Each chain element maps a virtual page number to a physical frame
• Common for address spaces larger than 32 bits
• Ineffective if collisions are frequent

Inverted Page Table

Goal: reduce page table memory requirements
• One global page table, with one entry per physical frame
  – Advantage: eliminates the per-process page tables
  – Disadvantage: slower memory access because of searching
• Implementation
  – Hash with key = (pid, page number)
  – TLB hits eliminate the search most of the time
• Example: UltraSPARC

Segmentation

Supports the process's view of memory: a program is a collection of segments such as the main program, subroutines, library methods, the stack, the symbol table, and arrays
(figure: a user program with five numbered segments: (1) subroutines, (2) library methods, (3) stack, (4) main program, (5) symbol table)
• Segment table registers
  – Segment base register (SBR): the segment table's location
  – Segment length register (SLR): the number of segments in the program
• Segments
  – Are variable sized; allocated via first-fit/best-fit algorithms
  – Can be shared among processes and relocated at the segment level
  – Carry protection bits: a valid bit and read/write privileges
  – Suffer from external fragmentation
(figures: segmentation examples)

Segmentation with Paging (Hardware)
• Segment table entries address a per-segment page table, pointing to the correct page table for each segment
• The MULTICS system pages the segments
(figures: MULTICS; Intel 386)

Pentium Address Translation
• Supports both pure segmentation and segmentation with paging
• Translation scheme
  – The segmentation unit produces a linear address
  – The paging unit produces the physical address
(figures: Pentium segmentation-only translation; Pentium paging architecture; three-level paging in Linux)

Virtual Memory

Separates the logical and physical memory spaces
• Concepts
  – Programs access logical memory
  – The operating system's memory management and the hardware cooperate to establish the logical-to-physical mapping
• Advantages
  – The whole program doesn't need to be in memory
  – The system can execute programs larger than memory
  – Processes can share blocks of memory
  – Resident library routines
  – Improved memory utilization: more processes running concurrently
  – Memory-mapped files
  – Copy-on-write algorithms
• Disadvantages
  – Extra disk I/O, and thrashing
(figures: logical memory examples)

Copy on Write
• Processes initially share the same physical pages
• Operating system support
  – Maintains a list of free, zeroed-out pages
  – Each process gets its own copy of a page only after the page is modified
(figures: shared pages before and after modification)
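Copy-on-write is what makes fork() cheap on Unix-like systems: the child logically receives its own copy of the parent's memory, but physical pages are duplicated only when one side writes. A small demonstration of the resulting semantics (the deferred copying itself is invisible to the program):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        char *buf = malloc(4096);
        strcpy(buf, "original");

        pid_t pid = fork();            /* child initially shares the parent's pages */
        if (pid == 0) {
            strcpy(buf, "modified");   /* the first write triggers a private copy */
            printf("child sees:  %s\n", buf);
            exit(0);
        }
        wait(NULL);
        printf("parent sees: %s\n", buf);   /* still "original" */
        free(buf);
        return 0;
    }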
Demand Paging

Definition: pages are loaded into memory "on demand"

The lazy swapper (pager) loads a page only when it is needed. This minimizes I/O and memory requirements and allows for more users.

Hardware Support
Page table entries contain a valid bit and a dirty bit
(figure: a page table whose entries hold a frame number plus valid and dirty bits)
• Valid-invalid bit: set to 0 (invalid) when the page is not in memory
• Dirty bit: set when a page is modified; avoids unnecessary writes during swaps
Advantages: less I/O, less memory, faster response, more users

Page Faults
Note: a page can be loaded and swapped out multiple times
Note: the unused bits of an invalid entry hold the page's disk address

Processing Page Faults
The user program references a location that is not resident; a page fault occurs, and the OS handles it:

    IF invalid reference, abort the program
    ELSE
        IF no empty frame
            choose a victim and write it to the backing store
        ELSE
            choose an empty frame
        find the needed page on disk
        read the page into the frame
        update the page table
        set the page table valid bit
        re-execute the instruction that caused the page fault

Performance of Demand Paging

• Page fault rate: 0 ≤ p ≤ 1.0
  – p = 0 means no page faults
  – p = 1 means every reference triggers a page fault
• Effective access time:
  EAT = (1 − p) × memory access time + p × (page-fault overhead + swap page out + swap page in + restart overhead)
• Example
  – p = 0.01; memory access time = 200 ns; average page-fault service time = 8 ms; restart overhead insignificant
  – EAT = 0.99 × 200 + 0.01 × 8,000,000 = 198 + 80,000 ns ≈ 80 µs, roughly a 400-fold slowdown
Question: is the flexibility worth the extra overhead?

Page Replacement Algorithms

Replacement occurs when all frames are occupied: swap a victim out and bring the needed page in
• Technique: assign a number of frames to each process (x-axis)
• Goal: minimize page faults (y-axis)
• Algorithm evaluation: count the faults produced on a predefined reference string
• Belady's anomaly: allocating more frames can cause more faults
• Copy-out: only write frames to the backing store if they are "dirty"

First-In-First-Out (FIFO) Algorithm

Memory reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Illustration of Belady's anomaly:
• Case 1: the process may hold 3 frames at a time: 9 page faults
• Case 2: the process may hold 4 frames at a time: 10 page faults
More frames, yet more faults. (A simulator that reproduces these counts follows.)
(figures: frame contents step by step; another reference-string example)
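A short C sketch that simulates FIFO replacement and reproduces the counts above, confirming Belady's anomaly on this reference string:

    #include <stdio.h>
    #include <stdbool.h>

    /* Count page faults for FIFO replacement with `nframes` frames. */
    static int fifo_faults(const int *refs, int n, int nframes) {
        int frames[16];            /* resident pages */
        int count = 0, next = 0;   /* pages loaded; FIFO eviction index */
        int faults = 0;
        for (int i = 0; i < n; i++) {
            bool hit = false;
            for (int j = 0; j < count; j++)
                if (frames[j] == refs[i]) { hit = true; break; }
            if (hit) continue;
            faults++;
            if (count < nframes) {
                frames[count++] = refs[i];    /* free frame available */
            } else {
                frames[next] = refs[i];       /* evict the oldest page */
                next = (next + 1) % nframes;
            }
        }
        return faults;
    }

    int main(void) {
        int refs[] = {1,2,3,4,1,2,5,1,2,3,4,5};
        int n = sizeof refs / sizeof refs[0];
        printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
        return 0;
    }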
Optimal Page Replacement

Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Replace the page that will not be used for the longest period of time
• With 4 frames at a time: 6 page faults
(figures: frame contents step by step; another reference-string example)
• Advantage: it is optimal
• Disadvantage: we don't know the future
• Use: a good benchmark for other algorithms

LRU Page Replacement

Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Replace the page that has not been used for the longest period of time
• With 4 frames in memory at a time: 8 page faults
(figures: frame contents step by step; another reference-string example)
• Naïve stack implementation
  – O(1) victim-frame selection
  – But a search and update on every memory reference

Approximate LRU with Hardware Support
• Reference bit
  – Each page has a reference bit, initially 0
  – Hardware sets it to 1 when the page is referenced
  – The OS replaces the first page found with a 0 bit
• Second-chance (clock) algorithm
  – Needs a second bit; pages are visited in a circular (clock) order
  – If the visited page's reference bit is 1: set it to 0 and leave the page in memory
  – Move to the next page in clock order, subject to the same rules
  – Replace the first page whose reference bit is already 0

Frame Allocation

How are frames allocated among executing processes?
• Allocation can be global or local
  – Global: select a replacement frame from the single set of all frames
  – Local: each process selects from its own set of allocated frames
• Each process needs a minimum number of pages
  – Example: on the IBM 370, a MOVE instruction could require 6 pages: the instruction is 6 bytes long and can span 2 pages, plus 2 pages for the source address and 2 pages for the destination address
• Each process needs less than some maximum number of pages
  – Excessive allocation to one process can degrade system performance
• Example frame-allocation policies
  – Fixed: each process gets an equal number of frames
  – Priority: higher-priority processes get more frames
  – Proportional: by the size of the process relative to the other processes

Other Replacement Algorithms
• Least Frequently Used (LFU)
  – Replace the page with the lowest usage count; on a tie, replace the oldest page in memory
  – Disadvantage: a heavily used page remains in memory after it is no longer needed
• Most Frequently Used (MFU)
  – Replace the page with the largest usage count; on a tie, replace the oldest page in memory
  – Idea: the page with the smallest count was probably just loaded and has yet to be used
• Usage counts are updated at regular intervals from each page table entry's reference bit

Thrashing Considerations

Thrashing: excessive system resources are devoted to swapping pages
• Insufficient frames leads to
  – low CPU utilization
  – a short ready queue
  – the OS adding more processes, which leads to more thrashing
• Paging works because of locality
  – Processes perform most of their work referencing narrow ranges of memory
• Thrashing occurs when the total size of the process localities exceeds total memory size
(figure: performance log of memory accesses over time, showing localities)

Working-Set Model

Adjust allocated frames to match the references made during a window of time
• Working set: the pages referenced during the working-set window
• Working-set window (Δ): a fixed number of recent page references (example: 10,000 instructions)
• WSSi (working-set size of process Pi): the number of distinct pages referenced in the most recent Δ references; it varies over time
  – If Δ is too small, it will not encompass the entire locality
  – If Δ is too large, it will encompass several localities
  – If Δ = ∞, it will encompass the entire program
• D = Σ WSSi = the total demand for frames; thrashing results if D exceeds the number of memory frames m
• Goal: achieve an "acceptable" page-fault rate
  – If a process's actual rate is too low, it loses frames
  – If too high, it gains frames
• Implementation
  – Give each process the frames in its working set; if D > m, suspend one of the processes
  – A timer interrupts every Δ/2 time units; pages referenced since the last interrupt stay in the working set, others are discarded, and the reference bits are reset

Pre-paging
• Purpose: reduce the many page faults that occur at process startup
• Pre-page all or some of the pages before they are referenced
• Note: if pre-paged pages go unused, the I/O and memory were wasted
• Assume s pages are pre-paged and a fraction α of them is used
  – Fault services avoided: s × α
  – Unnecessary page loads: s × (1 − α)
  – If α is near zero, pre-paging loses

Additional Considerations
• I/O interlock: pages involved in a data transfer must be locked into memory
• TLB size impacts the usable working-set size
  – TLB reach = (TLB size) × (page size)
  – If the working set fits in the TLB, there will be fewer page faults
• Techniques to reduce page faults
  – Increase the page size (but this increases internal fragmentation)
  – Provide variable page sizes based on application needs
• Poor program design can increase page faults
  – Example: a 1024 × 1024 array stored one row per page
  – Program 1 indexes by columns, touching a different page on nearly every access (1024 × 1024 page faults):

        for (j = 0; j < 1024; j++)
            for (i = 0; i < 1024; i++)
                A[i][j] = 0;

  – Program 2 indexes by rows, faulting once per row (1024 page faults):

        for (i = 0; i < 1024; i++)
            for (j = 0; j < 1024; j++)
                A[i][j] = 0;

Memory-Mapped Files
• Disk blocks are mapped to memory pages
• A page-sized portion of the file is read into a physical page at a time
• Reads and writes of the file become simple memory accesses, with no read() and write() system calls
• Shared memory: mapping the same region into several processes connects them
(figures: memory-mapped files in Java; memory-mapped shared memory in Windows)
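A minimal POSIX sketch of the idea: map a file with mmap() and scan it through ordinary memory accesses, letting the pager fault pages in on demand.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(int argc, char *argv[]) {
        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

        /* Map the file; pages are loaded on demand as they are touched. */
        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* Read the file through ordinary memory accesses, no read() calls. */
        long newlines = 0;
        for (off_t i = 0; i < st.st_size; i++)
            if (data[i] == '\n') newlines++;
        printf("%ld lines\n", newlines);

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }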
Examples

Windows NT
• Demand paging with clustering; clustering loads the surrounding pages as well
• Per-process parameters: a working-set minimum and a working-set maximum
• Automatic working-set trimming occurs if free memory drops too low

Solaris 2
• Maintains a list of free pages
• The pageout function selects victims using LRU-style scans; it runs more frequently when free memory is low
• lotsfree: the threshold that controls when paging starts
• scanrate: the page scan rate, varying from slowscan to fastscan
(figure: Solaris 2 page scanner parameters)

Allocating Kernel Memory

Treated differently from user allocation
• Deals with physical memory
• The kernel requests memory for structures of varying sizes
• Some kernel memory must be contiguous
Approaches: buddy system allocation and slab memory allocation

Buddy System Allocation
• Allocates from a fixed-size segment of physically contiguous pages
• Memory is allocated in power-of-2 blocks
  – Allocation requests are rounded up to the next power of 2
  – If only larger blocks are available, a block is repeatedly split into two "buddies" of the next-lower power of 2 until a correctly sized block is produced
  – Search time is proportional to the depth of the buddy tree, i.e., logarithmic

Slab Memory Allocation

Slab: one or more physically contiguous pages
• Slab cache
  – Consists of one or more slabs
  – A single cache exists for each unique kernel data structure
  – A cache is initialized with a group of instantiated data-structure objects, all marked free
  – Allocated objects are marked used
  – A new slab is added to the cache when no free objects remain
• Benefits
  – No fragmentation
  – Fast allocation
(figure: slab allocation)
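Returning to the buddy scheme above, here is a sketch of its power-of-2 rounding step; the block sizes are illustrative assumptions, and the gap between request and block is internal fragmentation. A 256 KB segment would satisfy the 33 KB request below by splitting 256 KB into 128 KB buddies and one 128 KB buddy into 64 KB buddies; freeing the block later lets it coalesce with its buddy back up the tree.

    #include <stdio.h>

    /* Round a request up to the next power-of-2 block size,
     * starting from the allocator's minimum block size. */
    static size_t buddy_block_size(size_t request, size_t min_block) {
        size_t size = min_block;
        while (size < request)
            size <<= 1;                  /* double until the request fits */
        return size;
    }

    int main(void) {
        size_t request = 33 * 1024;                        /* 33 KB request */
        size_t block = buddy_block_size(request, 4096);    /* 4 KB minimum  */
        printf("request %zu -> block %zu (waste %zu)\n",
               request, block, block - request);           /* 64 KB block   */
        return 0;
    }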
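And a toy version of the slab idea: one slab carved into equal-sized objects threaded onto a free list, so allocation and release are constant time. This is a sketch under simplifying assumptions (one slab per cache, objects at least pointer-sized); real slab allocators manage many slabs per cache.

    #include <stddef.h>

    /* A toy slab cache: each free object stores the pointer to the next. */
    struct slab_cache {
        size_t obj_size;
        void  *free_list;
    };

    static void cache_init(struct slab_cache *c, void *slab,
                           size_t slab_size, size_t obj_size) {
        c->obj_size = obj_size;
        c->free_list = NULL;
        for (size_t off = 0; off + obj_size <= slab_size; off += obj_size) {
            void *obj = (char *)slab + off;
            *(void **)obj = c->free_list;    /* push object onto the free list */
            c->free_list = obj;
        }
    }

    static void *cache_alloc(struct slab_cache *c) {
        void *obj = c->free_list;            /* mark a free object as used */
        if (obj) c->free_list = *(void **)obj;
        return obj;                          /* NULL means: add a new slab */
    }

    static void cache_free(struct slab_cache *c, void *obj) {
        *(void **)obj = c->free_list;        /* return the object to the cache */
        c->free_list = obj;
    }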