Chapter 9 – Virtual Memory (Pgs 357 - 409)
CSCI 3431: OPERATING SYSTEMS

Overview
Instructions need to be in memory to be executed
Address space (bus size) is usually bigger than physical space (memory size)
Not all of a program is needed at any point in time
Loading only what is needed speeds up execution and permits more processes to be in memory

Virtual Memory
Separates logical memory (i.e., address space seen by compiler/user) from the physical memory (i.e., actual RAM on the motherboard)
Each process can be given its own independent address space
Sharing, forking, code reuse can all be improved and supported more effectively

Demand Paging
Load pages only as they are needed
Pager – loads and stores pages from secondary storage
Swapper – loads and stores processes from secondary storage
Pages in memory are called "memory resident" and are usually identified using the "valid bit" in the page table

Page Faults
1. Determine if the reference is valid (i.e., check PCB to see if reference is part of process)
2. If invalid, abort, else begin paging
3. Find a free frame in physical memory
4. Schedule load from disk to frame
5. On completion, update page table
6. Restart instruction

Handling a Page Fault (9.6)

Concepts in Paging
Pure Demand Paging – Pages are only brought in when a page fault occurs
Locality of Reference: a phenomenon whereby programs tend to use small sets of pages that tend to change slowly
Instruction restarting – usually just repeating current instruction in PC
Sometimes difficult to predict page faults for CISC hardware

Performance
1. Trap to operating system
2. Jmp to page fault handler
3. Save registers and state
4. Determine validity of address
5. Identify an available frame
6. Issue a disk read, block process in IO queue
7. Run scheduler, execute another process ...
8. After page load, disk trap to operating system
9. Jmp to disk interrupt handler
10. Figure out the disk was handling a page fault
11. Update page table for process, unblock process, reset process so it can resume at failed instruction
12. Run scheduler, execute some process ...
13. Resume where we left off when we are scheduled to run ...
All this takes HOW LONG for ONE page fault?

What this means ...
Interrupt handlers are often hand-coded assembly for optimal performance
Page fault times are horrible if there are a lot of disk requests
Disk activity is the major wait factor
We can compute Effective Access Time = ((1 - p) * time_mem) + (p * time_fault), where p is the probability of a fault (0 <= p <= 1)

And if there is no empty frame ...

Copy on Write
Pages only ever change if written to
Pages that are not changed can be shared
Thus, assume everything is shared, and only copy if a write occurs to a page
Makes for very fast forks
Sometimes OS reserves frames so they are available when needed for the copy

Page Replacement
Eventually the number of active pages exceeds the number of available frames
1. Swap the entire process and all its pages out
2. Swap out pages as needed to make space for new pages
Most systems try to page out read-only pages because no write is needed (i.e., clean vs. dirty pages)
Requires updating of the process's page table (which is why it is held by the kernel)

Algorithms
While we can use random data, it is best to use a "reference string" of page references recorded from actual runtime data when evaluating algorithms
Goal is to minimise number of page replacements
Key factor is ratio of pages:frames
> Higher = more faults, lower = fewer faults

FIFO
Replace the first one loaded (which is also the oldest)
Mediocre performance
Belady's Anomaly: Increasing the number of frames can INCREASE the fault rate!
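A minimal sketch of FIFO replacement, intended only to illustrate Belady's Anomaly. The function name `fifo_faults` and the reference string are assumptions chosen for illustration; the string is a commonly used teaching example and is not necessarily the one in the textbook.

```python
from collections import deque


def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement (illustrative helper)."""
    queue = deque()    # resident pages, oldest on the left
    resident = set()   # same pages, for fast membership tests
    faults = 0
    for page in reference_string:
        if page in resident:
            continue                      # hit: FIFO order does not change
        faults += 1
        if len(queue) == num_frames:      # no free frame: evict the oldest page
            resident.remove(queue.popleft())
        queue.append(page)
        resident.add(page)
    return faults


# Hypothetical reference string, chosen for illustration only.
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9 faults with 3 frames
print(fifo_faults(refs, 4))  # 10 faults with 4 frames
```

With this string, adding a fourth frame produces one more fault than three frames, which is the anomaly in action.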
FIFO suffers from this anomaly for some reference strings (example, page 374)

Optimal
We know that a replacement strategy is optimal if it does the least work
Page that is best to replace is the one that will not be needed for the longest period of time
Problem: Requires knowledge of when a page will be needed
Not really possible and mostly used as a theoretical bound when measuring other strategies

LRU
Past history is the best indicator of future use
Least likely to be used is the page that was Least Recently Used (LRU)
Some page tables have a reference bit that is set whenever a page is referenced
> Good: enables us to delete pages not used very much
> Bad: setting the reference bit on every memory access is time consuming
Reference bit allows us to approximate LRU

Reference Shifts
Use 8 bits (or more) for the reference bits
Use some regular clock interval
Zero the reference bits when a page is loaded
Set the high-order (leftmost) bit to 1 when the page is referenced
Every clock interval, shift the bits right by 1
Treat as an integer, higher means more recently used and more frequently used

Second Chance
Algorithm is a variant of FIFO
When a frame is needed, take the first page in the queue
> If reference bit set, clear it and move page to back of the queue
> If not set, replace it
Sometimes called the "clock" algorithm
Can be easily implemented using a circular queue
Can be enhanced by avoiding (if possible) pages that are dirty

Other Options
Reference counting – count references and replace the least frequently used (LFU)
Issue – LFU can replace a new page that hasn't had a chance to be used much yet
Pools of empty pages can be kept so a free page is always available – permits CPU to schedule the creation of more free pages for times when it's less busy or disk isn't in use

*** Next Class: 9.5+, Frame Allocation ***

Raw Disk
It is possible to write directly to, or read directly from, a disk block
This technique is fast and avoids creating directory entries, file system data, etc.
Used by some databases and other applications
Bypasses paging mechanism – disk savings usually lost due to memory management problems

Frame Allocation
Page faults require a frame during servicing
Are all frames equal?
Do we just apply some variant of LRU, or do we need to ensure each user has their "fair share"?
Performance requires that a process be given some minimum number of frames
Hardware is a factor – What is the maximum number of frames that an instruction could possibly need to access? (This value makes a good minimum)
Indirect addressing also adds to the page minimum (mostly an obsolete idea now)
Also – stack & heap of process, OS tables, scheduler code, interrupt handlers etc. must be considered

Strategies
Equal Allocation: f frames are divided equally among u users (f/u per user)
Proportional Allocation: Allocate memory in proportion to the size of the process
> I.e., if p2 is twice the size of p1, p2 gets 2/3 of memory and p1 gets 1/3 of memory
> Process pi gets (si / S) * f frames, where si = pages in pi and S = total pages for all pi

Allocation (cont'd)
OS needs some pages to operate efficiently
Process should always have its minimum
Pool of free frames is useful
Have not considered priority in any way
Useful to consider scheduling – if a process isn't going to run, does it need much memory?
IO bound processes do a lot of waiting, good candidates for less memory

Global vs. Local Allocation
Global Allocation: All frames are viable candidates and must be considered for replacement when a new page is loaded
Local Allocation: Only the frames allocated to the process are considered for replacement
NUMA just messes everything up ...
> Good: being able to add additional memory on new memory cards
> Bad: cards are slower than the memory on the motherboard and should not be paged as often

Thrashing
When a process spends more time being paged than it does being executed
Occurs when page fault rate gets too high
Local replacement can prevent processes from causing each other to thrash
Locality: A set of pages that are used together
A process can be viewed as a set of sequential localities that tend to overlap

Locality Example (Fig 9.19)

Working Set
The set of pages accessed in some time period
Can adjust the time period to produce smaller or (somewhat) larger working sets
Captures a process' current locality
Can be used to prevent thrashing
Do not start a process (or swap one out) when some process(es) do not have sufficient frames for their working set
Working Set Size
> = number of total frames needed if the interval is greater than the life of the program
> = 1 if the interval contains only one instruction with no memory reference
Generally try to use an interval that captures the locality of most processes

Fault Rate
Alternatively, we can ignore the working set
Simply use the Page Fault Rate to avoid thrashing
Goal is to keep the fault rate below some limit
Very simple to calculate and keep track of
Generally a good approach, but like all approaches, fails in some rare situations

Memory Mapping
Loading of a file into memory to minimise disk accesses
Uses the standard paging system to do the loading
Can cause synchronisation issues between in-memory version and on-disk version until the file is closed
Uses the mmap (or mmap2) system call
Often used as a technique for memory sharing
Supports concurrent file access by multiple processes

IO
Some IO devices have their buffers also mapped into main memory
Writes to this location are then duplicated to the actual IO device
Used for Disks, Monitors/Video
Also used for hardware ports such as mouse, game controllers, USB
If the device uses a status/controller
register for communication, it is programmed IO (PIO)
If the device uses interrupts for communication, it is interrupt driven

Kernel Memory
The OS often works with non-paged data
Some data is very small (e.g., a semaphore) and doesn't need a page
Some data is very large (e.g., a disk block) and must be on contiguous frames so its address space is continuous

"Buddy System"
Round all data sizes up to a power of 2
Easier to deal with memory blocks if they are 32 bytes, 64 bytes, 128 bytes etc.
Large blocks easily split to form two smaller ones
Smaller adjacent blocks combined to form one larger one
Works well and is a very commonly used technique in all kinds of system programming

Slab Allocation
Keep a "slab" of memory for each specific kernel data structure (e.g., semaphore, PCB, Page Table)
Manage the slab almost as if it was paged, with pages the same size as the data structure
Has some issues if more pages are needed, as they won't be contiguous (which is necessary)
Has been generalised and can be used for some user-mode requests
This is how most modern OSs now function

Prepaging
Tries to avoid page faults by anticipating future faults
Wastes time if unneeded page brought into memory
Is often faster to read two sequential disk blocks than to issue two independent disk reads (so we can save time)

Page Size
Strongly dictated by CPU architecture
Also influenced by:
> Bus Width/Size
> Disk block size
> System use characteristics (e.g., process and file sizes)
> Amount of physical memory available
> Time to service a page fault

TLB Reach
The amount of memory accessible from the TLB
Should be greater than the working set size
Can be influenced by page size
TLB size may be configurable, depending upon the hardware

Page Locking
Sometimes a page should be locked into memory so that it's not paged out, e.g.,
> Disk buffer after an IO request issued
> The OS itself (i.e., scheduler, mmu - interrupt handlers, frame table)
Solution is to put a "lock bit" into the frame table for each frame
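To make the TLB Reach slide concrete: reach is the number of TLB entries multiplied by the page size. The sketch below works through that arithmetic. The entry count, the two page sizes, and the working set size are hypothetical values picked for illustration and do not describe any particular CPU.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(num_entries, page_size_bytes):
    """TLB reach = number of TLB entries * page size."""
    return num_entries * page_size_bytes


ENTRIES = 64              # hypothetical TLB size
WORKING_SET = 1 * MIB     # hypothetical working set size

small_pages = tlb_reach_bytes(ENTRIES, 4 * KIB)   # 64 * 4 KiB
large_pages = tlb_reach_bytes(ENTRIES, 2 * MIB)   # 64 * 2 MiB

print(small_pages // KIB, "KiB")      # 256 KiB
print(large_pages // MIB, "MiB")      # 128 MiB
print(small_pages >= WORKING_SET)     # False: reach is below the working set
print(large_pages >= WORKING_SET)     # True: larger pages extend the reach
```

Under these assumed numbers, the small page size leaves the working set only partly covered by the TLB, while the larger page size covers it with the same number of entries.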
Programming Concerns
Let the compiler optimise!
Most situations that result in poor performance can be detected and avoided by the compiler (e.g., memory access in loops)
Also note:
1. Pointers and malloc can be used very badly and can really harm performance
2. Objects almost always harm performance (as does method over-riding) – there is a reason no web browser, OS, compiler, VM, server ... is OO

To Do:
Do Lab 5
Read Chapter 9 (pgs 357-409; this and next lecture)
Continue working on Assignment 2
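As a rough illustration of the "memory access in loops" point under Programming Concerns, the sketch below counts how often consecutive accesses move to a different page when a 2D array is traversed in two loop orders. The function `page_switches`, the array dimensions, and the assumption that one page holds exactly one row are all hypothetical. With only a single frame available, every page switch would be a page fault.

```python
def page_switches(rows, cols, elems_per_page, row_major_loop):
    """Count changes of page between consecutive accesses (illustrative helper).

    The array is assumed to be stored row by row, so element (r, c)
    lives at linear index r * cols + c.
    """
    if row_major_loop:
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        order = ((r, c) for c in range(cols) for r in range(rows))

    switches = 0
    current_page = None
    for r, c in order:
        page = (r * cols + c) // elems_per_page
        if page != current_page:      # access lands on a different page
            switches += 1
            current_page = page
    return switches


# Hypothetical 128 x 128 array with 128 elements per page (one row per page).
print(page_switches(128, 128, 128, row_major_loop=True))    # 128
print(page_switches(128, 128, 128, row_major_loop=False))   # 16384
```

Walking the array in storage order touches each page once, while walking it column by column changes page on every single access.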