BYU CS 345: Virtual Memory (Chapter 8)

Objectives: Topics to Cover
- Program Execution Patterns
- Computer Memory
- Virtual Memory
- Paging
- Segmentation
- Performance
- Replacement Algorithms
- Paging Improvements

Program Execution
What are the characteristics of an executing program?
- A typical executing program:
  - has code that is unused
  - is allocated more memory than it needs
  - has features that are rarely used
- A program’s instructions must be in main memory to execute, even though:
  - the entire program is not always executing
  - the address space of the program could be broken up across available frames (paging)

Computer Memory
What are the implications of lack of memory?
- Lack of memory has serious implications
  - What if a program “grows” while executing?
  - What about moving to a new machine?
- Executing a program that is not ALL in physical memory would be advantageous:
  - larger address space possible
  - more programs could be in memory
  - less I/O needed to get a process going
  - unused modules would not be loaded

Early Memory Solutions
What were early solutions to lack of memory?
- All larger programs had to contain logic for managing two-level storage.
- The non-volatile hard drive was used to store data and code.
- Programs were responsible for moving “overlays” back and forth between primary and secondary storage.
- Multiprogramming had to use “base and bounds registers” to manage, allocate, and reallocate memory.

Virtual Memory to the Rescue!
- 1961 - First virtual memory machine, Atlas Computer project at the University of Manchester in the UK.
- 1962 - First commercial system, Burroughs B5000.
- 1972 - IBM introduces virtual memory in mainframes with OS/370.
- 1979 - Unix uses virtual memory with 3BSD.
- 1993 - Microsoft introduces virtual memory into Windows NT 3.
- All had challenges:
  - specialized, hard-to-build hardware required
  - too much processor power required to do address translation

Virtual Memory
What is the difference between “real” and “virtual” memory?
- A program uses only logical addresses
- Hardware maps logical addresses to physical addresses
- Only part of a process is loaded into memory
  - process may be larger than main memory
  - additional processes allowed in main memory
  - memory loaded/unloaded as the programs execute
  - generally implemented using demand paging
- Real memory – the physical memory occupied by a program (frames)
- Virtual memory – the larger memory space perceived by the program (pages)

Memory Hierarchy
- Cache memory: provides illusion of very high speed
- Main memory: reasonable cost, but slow & small
- Virtual memory: provides illusion of very large size
- Units transferred between levels:
  - Registers <-> Cache: words (transferred explicitly via load/store)
  - Cache <-> Main memory: lines (transferred automatically upon cache miss)
  - Main memory <-> Virtual memory: pages (transferred automatically upon page fault)

Virtual Memory
- Principle of Locality – a program tends to reference the same items; even if the same item is not used again, nearby items will often be referenced
- Resident Set – those parts of the program being actively used (remaining parts of the program are on disk)
- Thrashing – constantly needing to get pages off secondary storage
  - happens if the O.S. throws out a piece of memory that is about to be used
  - can happen if the program scans a long array, continuously referencing pages not used recently
  - O.S. must watch out for this situation!
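
The array-scan case of thrashing can be illustrated with a small simulation. This is a minimal sketch rather than course code: it assumes the O.S. evicts the least recently used page, and the function name `count_faults` and the page and frame counts are made up for illustration.

```python
from collections import OrderedDict

def count_faults(refs, frames):
    """Count page faults for a reference string, evicting the least
    recently used page when no frame is free (an assumed policy)."""
    resident = OrderedDict()              # resident pages, oldest use first
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)    # hit: page is now most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False)   # evict least recently used page
            resident[page] = None
    return faults

# A program scans a 5-page array ten times in a row.
scan = list(range(5)) * 10

print(count_faults(scan, frames=5))   # array fits: only the first pass faults
print(count_faults(scan, frames=4))   # one frame short: every access faults
```

Being one frame short is enough here: each scan step evicts exactly the page that will be needed next, so the resident set never matches what the program is about to use.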

Paging Hardware
- Use the page number as an index into the page table, which contains the physical frame holding that page
- Typical flag bits: Present, Accessed, Modified, various protection-related bits

More Paging Hardware
- Full page tables can be very large
  - 4G space with 4K pages = 1M entries
  - some systems put page tables in virtual address space
- Multilevel page tables
  - top-level page table has a Present bit to indicate entire range is not valid
  - second-level table only used if that part of the address space is used
  - second-level tables can also be used for shared libraries

Two-Level Paging System
[Figure: LC-3 two-level address translation]
- Virtual address: RPTE # (bits 15-11), UPTE # (bits 10-6), frame offset (bits 5-0)
- RPT (one per process) + RPTE # selects a Root Page Table entry (Flags / UPT #)
- UPT + UPTE # selects a User Page Table entry (Flags / Frame #)
- Physical address into LC-3 main memory: Frame<<6 (bits 15-6) + offset (bits 5-0)

MMU’s
- MMU’s used to sit between the CPU and bus
  - now they are typically integrated into the CPU
- Page tables
  - originally implemented in special very fast registers
  - now they are stored in normal memory
  - entries are cached in fast registers as they are used
- Optional features
  - separate page tables for each processor mode
  - read/write access control, referenced/dirty bits

More Paging Hardware
- To minimize the performance penalty of address translation, most modern CPUs include an on-chip memory management unit (MMU) and maintain a table of recently used virtual-to-physical translations, called a Translation Lookaside Buffer (TLB).

Segmentation
- Programmer sees memory as a set of multiple segments, each with a separate address space
- Growing data structures easier to handle
  - O.S. can expand or shrink segment
- Can alter one segment without modifying other segments
- Easy to share a library
  - share one segment among processes
- Easy memory protection
  - can set values for the entire segment

Segmentation (continued…)
- Implementation:
  - have a segment table for each process
  - similar to one-level paging method
  - status – present, modified, location, size
- Combine with paging:
  - no external fragmentation
  - easier to manage memory since all items are the same size
  - some processors have both (386)
  - each segment broken up into pages
  - address translation
    - do segment translation
    - translate that address using paging
  - some internal fragmentation at the end of each segment

So… What policy decisions do OS designers face?
- Support paging, segmentation, or both?
  - Windows/Unix/Linux
    - use paging for virtual memory
    - use segments only for privilege level (segment = address space)
- Support virtual memory?
- Which memory management algorithm?

Simple Paging Hardware
- Use the page number as an index into the page table, which contains the physical frame holding that page
[Figure: logical address translated through the page table to a physical address]

Simple Paging Quiz
Consider a simple (1 level) byte addressable paging system with the following parameters: 2^24 bytes of physical memory; page/frame size of 2^11 bytes; 2^9 pages of logical address space.
a. How many bits are in a logical address?
b. How many bytes in a frame?
c. How many bits in the physical address specify the frame #?
d. What is the size of the logical address space?
e. How many bits in each page table entry? (Include valid, dirty, and pin bits.)
f. What is the size of a page table?

Virtual Memory
- Paged memory combined with disk swapping
  - processes reside in main/secondary memory
- Demand Paging
  - could also be termed lazy swapping
  - bring pages into memory only when accessed
  - allows us to over-allocate
- What about at context switch time?
  - could swap out entire process
  - restore page state as remembered
  - anticipate which pages are needed

Decisions about Virtual Memory
- Fetch Policy
  - When to bring a page in?
  - When needed or in anticipation of need?
- Placement
  - Where to put it?
  - Unused page?
- Replacement
  - What to unload to make room for a new page?
- Resident Set Management
  - How many process pages to keep in memory?
  - Fixed or variable?
  - Reassign pages to other processes?
- Cleaning Policy
  - When is a page written to disk?
- Load Control

Page Replacement
- Frame replacement – two page transfers
  - select a frame (victim)
  - write the victim frame to disk
  - read in new frame from disk
  - update page tables
  - restart process
- Reduce overhead using dirty bit
  - dirty bit is set whenever a page is modified
  - if dirty, write page, else just throw it out

Page Fault
[Figure: logical memory of four pages (Page 0 to Page 3), a page table with entries 0:m1, 1:v0, 2:m3, 3:v1, and physical frames 0 to 7]
- Page fault is generated when an invalid page is accessed

Memory References
- Check internal table
  - valid or invalid frame (page fault)
  - new or swapped page
- Find a free frame
  - get from frame pool
  - unload a frame
- If page defined, read in from disk
- Update the page table
- Restart the instruction
  - process restarts from exact location
  - state unchanged (as if not interrupted)

Paging Implementation
- Extreme case
  - start a process with no pages in memory
  - pure demand paging
- What about overhead of paging?
  - locality
  - thrashing
- Hardware support
  - page table
  - secondary memory

Paging Implementation (continued…)
- Must be able to restart a process at any time
  - instruction fetch
  - operand fetch
  - operand store (any memory reference)
- Consider simple instruction (VLIW or CISC)
  - Add C,A,B (C = A + B)
  - All operands on different pages
  - Instruction not in memory
  - 4 possible page faults: slooooow :-(

Paging Performance
Paging Time…
  Disk latency          8 milliseconds
  Disk seek            15 milliseconds
  Disk transfer time    1 millisecond
  Total paging time   ~25 milliseconds
- Could be longer due to
  - device queueing time
  - other paging overhead

Paging Performance (continued…)
- Effective access time: $EAT = (1 - p) \cdot ma + p \cdot pft$, where:
  - $p$ is the probability of a page fault
  - $ma$ is the memory access time
  - $pft$ is the page fault time

Paging Performance (continued…)
- Effective access time with 100 ns memory access and 25 ms page fault time (all times in ns):
  $EAT = (1 - p) \cdot ma + p \cdot pft = (1 - p) \cdot 100 + p \cdot 25{,}000{,}000 = 100 + 24{,}999{,}900\,p$
- What is the EAT if p = 0.001 (1 out of 1000)?
  $100 + 24{,}999{,}900 \cdot 0.001 \approx 25{,}100\ \text{ns} \approx 25\ \mu\text{s}$
  250 times slowdown!
- How do we get less than 10% slowdown?
  $100 + 24{,}999{,}900\,p \le 1.10 \cdot 100\ \text{ns} = 110\ \text{ns}$, so $p \le 10 / 24{,}999{,}900 \approx 4 \times 10^{-7}$
  Less than 1 out of 2,500,000 accesses may fault

Placement Policies
- Where to put the page
  - trivial in a paging system – can be placed anywhere
  - Best-Fit, First-Fit, or Next-Fit can be used with segmentation
  - is a concern with distributed systems
- Frame Locking
  - require a page to stay in memory
    - O.S. kernel and interrupt handlers
    - real-time processes
    - other key data structures
  - implemented by bit in data structures

Replacement Algorithms
- Random (RAND)
  - choose any page to replace at random
- Belady’s Optimal Algorithm
  - best but unrealizable – used for comparison
- Least Recently Used (LRU)
  - discard pages we have probably lost interest in
- Least Frequently Used (LFU)
  - for the dogs, forget ‘em
- First-In, First-Out (FIFO)
  - strictly a straw-man for stack algorithms
- Clock – Not Recently Used (NRU)
  - efficient software LRU

Belady’s Optimal Algorithm
- Belady’s optimal replacement
  - “Perfect knowledge” of the page reference stream.
  - Select the page that will not be referenced for the longest time in the future.
- Rare for a system to know the reference stream for every thread in every process in advance.
  - generally unrealizable
  - few special cases: program to predict the weather
- Its theoretical behavior is used to compare the performance of realizable algorithms.

Replacement Quiz
Reference string (3 frames, numbered 0-2): 0 1 2 3 2 3 2 0 4 3 2 1 2 1 0 1
- Belady’s Optimal
- Least Recently Used
- Least Frequently Used

FIFO page replacement
- Replace oldest page in memory
- Intuition:
  - First referenced long time ago, done with it now
- Advantages:
  - Fair: all pages receive equal residency
  - Easy to implement (circular buffer)
- Disadvantages:
  - Some pages may always be needed
  - Difficult to implement (time stamps)
- Can we improve the performance by adding more frames?
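
The question of whether more frames help FIFO can be explored with a short simulation. The following is a minimal sketch, not the course's reference implementation; the function name `fifo_faults` is hypothetical, and a `deque` stands in for the circular buffer mentioned on the slide.

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement with `frames` frames."""
    queue = deque()       # resident pages, oldest arrival on the left
    resident = set()      # same pages, for fast membership tests
    faults = 0
    for page in refs:
        if page in resident:
            continue                            # hit: FIFO ignores re-references
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())   # evict the oldest arrival
        queue.append(page)
        resident.add(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7]
for n in (3, 4):
    print(n, "frames:", fifo_faults(refs, n), "faults")
```

For this reference string the fourth frame halves the fault count. Running the same function on 0 1 2 3 0 1 4 0 1 2 3 4 0 1 2 3 is a worthwhile experiment, because with FIFO more frames do not always mean fewer faults.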

FIFO
FIFO/3 Frames (16 page faults)
  Reference:  0  1  2  3  0  1  2  3  0  1  2  3  4  5  6  7
  Frame 0:    0  0  0  3  3  3  2  2  2  1  1  1  4  4  4  7
  Frame 1:    -  1  1  1  0  0  0  3  3  3  2  2  2  5  5  5
  Frame 2:    -  -  2  2  2  1  1  1  0  0  0  3  3  3  6  6

FIFO/4 Frames (8 page faults)
  Reference:  0  1  2  3  0  1  2  3  0  1  2  3  4  5  6  7
  Frame 0:    0  0  0  0  0  0  0  0  0  0  0  0  4  4  4  4
  Frame 1:    -  1  1  1  1  1  1  1  1  1  1  1  1  5  5  5
  Frame 2:    -  -  2  2  2  2  2  2  2  2  2  2  2  2  6  6
  Frame 3:    -  -  -  3  3  3  3  3  3  3  3  3  3  3  3  7

FIFO/3 Frames: 13 Page Faults!
  Reference:  0  1  2  3  0  1  4  0  1  2  3  4  0  1  2  3
  Frame 0:    0  0  0  3  3  3  4  4  4  4  4  4  0  0  0  3
  Frame 1:    -  1  1  1  0  0  0  0  0  2  2  2  2  1  1  1
  Frame 2:    -  -  2  2  2  1  1  1  1  1  3  3  3  3  2  2

FIFO/4 Frames: 14 Page Faults!
  Reference:  0  1  2  3  0  1  4  0  1  2  3  4  0  1  2  3
  Frame 0:    0  0  0  0  0  0  4  4  4  4  3  3  3  3  2  2
  Frame 1:    -  1  1  1  1  1  1  0  0  0  0  4  4  4  4  3
  Frame 2:    -  -  2  2  2  2  2  2  1  1  1  1  0  0  0  0
  Frame 3:    -  -  -  3  3  3  3  3  3  2  2  2  2  1  1  1

FIFO replacement performance
[Figure: Page Faults plotted against Number of Frames (x-axis: Number of Frames, 0 to 8; y-axis: Number of Page Faults, 0 to 14)]
- What is going on here? Belady’s Anomaly

Least Recently Used
- Replace page not used for longest time in past
- Intuition: Use past to predict the future
- Advantages:
  - With locality, LRU approximates OPT (Belady’s algorithm)
- Disadvantages:
  - harder to implement, must track which pages have been accessed (time stamp, page stack)
  - does not handle all workloads well
  - updates must occur at every memory access – huge overhead
  - few computers offer enough hardware support for LRU
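
The time-stamp form of LRU can be sketched in a few lines and run on the reference string that trips up FIFO. This is an illustrative sketch only: the name `lru_faults` is hypothetical, and the index of each reference stands in for a hardware clock value.

```python
def lru_faults(refs, frames):
    """Count page faults under LRU, tracked with per-page time stamps."""
    last_used = {}        # resident page -> time of most recent reference
    faults = 0
    for now, page in enumerate(refs):
        if page not in last_used:
            faults += 1
            if len(last_used) == frames:
                # Scan for the oldest time stamp: slow replacement,
                # but a reference only costs one store.
                victim = min(last_used, key=last_used.get)
                del last_used[victim]
        last_used[page] = now     # every reference updates the stamp
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3]
counts = [lru_faults(refs, n) for n in range(1, 6)]
print(counts)   # fault counts for 1 to 5 frames
```

Unlike FIFO, the LRU counts never go up as frames are added, because the pages LRU keeps in n frames are always a subset of those it would keep in n + 1 frames. The per-reference update in the last line of the loop is the overhead the slide warns about.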

Implementing LRU
- Software Perfect LRU
  - OS maintains ordered list of physical pages by reference time
  - When page is referenced: move page to front of list (top)
  - When need victim: pick page at back of list (bottom)
  - Trade-off: slow on memory reference, fast on replacement
- Hardware Perfect LRU
  - Associate register with each page
  - When page is referenced: store system clock in register
  - When need victim: scan through registers to find oldest clock
  - Trade-off: fast on memory reference, slow on replacement (especially as size of memory grows)
- In practice, do not need to implement perfect LRU
  - LRU is an approximation anyway, so approximate more
  - Goal: find an old page, but not necessarily the very oldest

Implementing LRU
- Can use a reference bit
  - cleared on loading
  - set every time the page is referenced
- Reference time tracking implemented in software
  - periodically scan page tables
  - note which pages referenced and modified
  - reset the reference/modified bits
- Clock algorithm – efficient software LRU
  - all pages are in a circular list
  - start scan where the previous scan left off
  - replace the first un-referenced page you find

Second-Chance Algorithm
- Often called “clock algorithm”
  - keep circular list of all pages (RPT’s, UPT’s, Memory)
  - clock pointer refers to next page to consider
- If the reference bit is 1
  - clear the reference bit
  - move clock pointer to the next page
- If reference bit is 0 and not pinned
  - swap page out to disk (if dirty)
  - move clock pointer to next page
  - return page number
- Could cycle through entire list before finding victim

Clock Replacement Quiz
Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0
- Clock/3 Frames (frames 0-2)
- Clock/4 Frames (frames 0-3)

CS 345 Virtual Memory Project

Project 4 Virtual Memory Guidelines
- Verify a clean compilation of your LC-3 virtual memory simulator. Validate that “crawler.hex” and “memtest.hex” programs execute properly.
- Modify the getMemAdr() function to handle 2-level, paged, virtual memory addressing.
- Implement a clock page replacement algorithm to pick which frame is unloaded, if necessary, on a page fault.
- Use the provided 1MB page swap table routine to simulate paged disk storage (8192 pages) or implement your own routine.
- Use crawler.hex and memtest.hex to validate your virtual memory implementation. Use other routines (such as im) to debug your implementation.

Project 4 Virtual Memory Guidelines
- Use the following CLI commands to verify and validate your virtual memory system. (Most of these routines are provided, but may require some adaptation to your system.)
  dfm <#>         Display LC3 memory frame <#>
  dft             Display frame allocation table
  dm <sa>,<ea>    Display physical LC3 memory from <sa> to <ea>
  dp <#>          Display page <#> in swap space
  dv <sa>,<ea>    Display virtual LC3 memory <sa> to <ea>
  im <#>          Init LC3/Set upper LC3 memory limit
  rpt <#>         Display task <#> root page table
  upt <p><#>      Display task <p> user page table <#>
  vma <a>         Access <a> and display RPTE’s and UPTE’s
  vms             Display LC3 statistics

Project 4 Virtual Memory Guidelines
- Demonstrate that LC-3 tasks run correctly. Be able to dynamically change LC-3 memory size (im command) and chart resulting changes in page hits/faults.
- Memory accesses, hits and faults are defined as follows:
  - Memory access (memAccess) = sum of memory hits (memHits) and memory faults (memPageFaults).
  - Hit (memHits) = access to task RPT, UPT, or data frame. (Exclude accesses below 0x3000.)
  - Fault (memPageFaults) = access to a task page that is undefined or not currently in a memory frame.
  - Page Reads (pageReads) = # pages read from swap space into memory.
  - Page Writes (pageWrites) = # pages written from memory to swap space.
  - Swap Page (nextPage) = # of swap space pages currently allocated to swapped pages.
- Chart to fill in:
                  Crawler                Memtest
  Frames:         320     16      2      320     16      2
  Accesses:
  Hits:
  Faults:
  Page Reads:
  Page Writes:
  Swap Pages:

Project 4 Grading Criteria
REQUIRED:
- 4 pts – Successfully execute crawler and memtest in 20k words (320 frames).
- 3 pts – Successfully execute crawler and memtest in 1k words (16 frames).
- 1 pt – Successfully execute 5 or more LC-3 tasks simultaneously in 16 frames of LC-3 memory.
- 1 pt – Correctly use the dirty bit to only write altered or new memory frames to swap space.
- 1 pt – Chart and submit the resulting memory access, hit, fault, and swap page statistics after executing crawler (and then memtest) in 320 and 16 frames.
BONUS:
- +1 point – early pass-off (at least one day before due date).
- +1 point – Add a task frame/swap page recovery mechanism for a terminated task.
- +1 point – Implement the advanced clock algorithm and chart the results.
- +1 point – Join the 2-frame club. (Successfully execute 5 or more LC-3 tasks simultaneously in 2 frames of LC-3 memory. Chart the memory accesses, hits, and faults.)
- -1 point penalty for each school day late.

Project 4: So…
1. Read and comprehend Stallings, Section 8.1.
2. Comprehend the lab specs. Discuss questions with classmates, the TA’s and/or the professor. Make sure you understand what the requirements are! It's a tragedy to code for 20 hours and then realize you're doing everything wrong.
3. Validate that the demo LC-3 simulator works for a single task with pass-through addressing (virtual equals physical) for the LC-3 by executing the commands “crawler” and “memtest”.
4. Design your MMU. Break the problem down into manageable parts.
5. Create and validate a “clock” mechanism that accesses all global root page tables, user page tables, and data frames.
6. Implement dirty bit last – use “write-through” for all swapping of a data frame to swap space.
7. Incrementally add support for the actual translation of virtual addresses to physical addresses with page fault detection as follows:
   a. Implement page fault frame replacement using available memory frames only. This should allow you to execute any test program in a full address space.
   b. Implement clock page replacement algorithm to unload data frames to swap pages and reload with a new frame or an existing frame from swap space. This should allow you to execute all the test programs in a 32k word address space (20k of paging frames).
   c. Implement clock page replacement algorithm to unload User Page Tables when there are no physical data frame references in the UPT. This will be necessary when running in a small physical space (16k words) with multiple tasks.
   d. Implement dirty bit to minimize writing frames to swap space.
8. Remember to always increment your clock after finding a replacement frame.
9. Use the vma function to access a single virtual memory location and then display any non-zero RPT and UPT entries. Implement various levels of debug trace to watch what is going on in your MMU. You may use the provided display functions.
10. When swapping a user page table to swap space, add some debug “sanity check” code to validate that the UPT does not have any entries with the frame bit set.

Paging Problems?
- Recent revival in page replacement research.
- Size of primary storage has increased – algorithms that require a periodic check of each and every memory frame are becoming less and less practical.
- Memory hierarchies have grown taller – the cost of a CPU cache miss is far more expensive. This exacerbates the previous problem.
- Object-oriented programming techniques have weakened locality of reference.
- Sophisticated data structures like trees and hash tables and the advent of garbage collection have drastically changed the memory access behavior of applications.

Paging Improvements?
- Disk access techniques
  - use larger blocks
  - separate swap space – no file table lookup
  - binary boundaries
- Demand paging
  - load several consecutive sectors/pages rather than individual sectors, due to seek and rotational latency
- Better Working Set model
  - monitor program execution – minimize the number of pages per process that are needed for execution (locality)
- Pre-paging
  - bring in pages that are likely to be used in the near future
  - easier to guess at program startup, but may load unnecessary pages

Clock Algorithm Enhancements?
- Consider reference bit and dirty bit
- 4 possible cases (Macintosh scheme)
  - (0,0) neither referenced nor modified
  - (0,1) not recently used but modified
  - (1,0) recently used but clean
  - (1,1) recently used and modified
- Still use “clock algorithm”
  - clear only reference bit upon consideration
- Add additional reference bits – 3rd, 4th,… chance
- At regular intervals, clear all reference bits
- A process can be in RAM if and only if all of the pages that it is currently using can be in RAM.

Frame Allocation
- Demand allocation
- Options
  - keep 3 empty frames, write out in background
- Minimum number of frames
  - what is the least number of frames to allocate
- Allocation Algorithms
  - equal allocation
  - proportional to storage for executable
  - priority

Global vs Local Allocation
- Global Allocation
  - replacement page is selected from among all pages in system
- Local Allocation
  - replacement page is selected only from the pages owned by the process
  - process controls its own page fault rate
  - number of pages for a process won’t grow
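
The second-chance (clock) victim search described earlier can be sketched as follows. This is a hedged illustration, not the Project 4 solution: the `Frame` class, its field names, and `clock_select` are hypothetical, the frame list stands in for the circular list of RPT's, UPT's, and data frames, and writing a dirty victim to swap space is left to the caller.

```python
from dataclasses import dataclass

@dataclass
class Frame:                     # hypothetical frame-table entry
    page: int
    referenced: bool = False     # set on every access to the page
    dirty: bool = False          # set on every write to the page
    pinned: bool = False         # locked frames are never victims

def clock_select(frames, hand):
    """Return (victim_index, new_hand) using the second-chance rule."""
    n = len(frames)
    for _ in range(2 * n):       # one sweep clears bits, the next must find a victim
        frame = frames[hand]
        if not frame.pinned and not frame.referenced:
            return hand, (hand + 1) % n      # advance the clock past the victim
        if not frame.pinned:
            frame.referenced = False         # give the page a second chance
        hand = (hand + 1) % n
    raise RuntimeError("all frames are pinned")

frames = [Frame(7, referenced=True),
          Frame(0, dirty=True),
          Frame(1, referenced=True)]
victim, hand = clock_select(frames, 0)
if frames[victim].dirty:
    print("write page", frames[victim].page, "to swap before reuse")
```

Returning the advanced hand along with the victim mirrors the project hint to always increment the clock after finding a replacement frame. An enhanced clock would prefer (0,0) frames over (0,1) frames before settling on a dirty victim.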