Memory Management

• Basic memory management
• Swapping
• Virtual memory
• Page replacement algorithms
• Modeling page replacement algorithms
• Design issues for paging systems
• Implementation issues
• Segmentation

Memory Management
• Ideally, programmers want memory that is large, fast, and nonvolatile
• Memory hierarchy
  – a small amount of fast, expensive memory: cache
  – some medium-speed, medium-priced main memory
  – gigabytes of slow, cheap disk storage
• Figure labels (faster toward the top, larger toward the bottom): registers (Intel: 8-, 16-, 32-bit; MIPS: 32-bit), cache 32 KB to a few MB, main memory 128 MB to 1 GB, disk 40 GB to 160 GB
• The memory manager handles the memory hierarchy

Basic Memory Management
Memory management systems fall into two classes: (1) those that use swapping and paging, and (2) those that do not.

Monoprogramming without Swapping or Paging
• Model (a) was used on mainframes and minicomputers, and is rarely used any more.
• Model (b) is used on some palmtop computers and embedded systems.
• Model (c) was used by the early personal computers. The portion of the system in ROM is called the BIOS (Basic Input Output System).
• Except on simple embedded systems, monoprogramming is hardly used anymore.

Multiprogramming with Fixed Partitions
(a) Separate input queues for each partition
(b) Single input queue

• Disadvantages of multiple input queues
  – the queue for a large partition may be empty while the queue for a small partition is full
  – since the partitions are fixed, any space in a partition not used by a job is lost
• Single input queue: whenever a partition becomes free, the job closest to the front of the queue that fits in it is loaded into the empty partition and run
• Different strategy: since it is undesirable to waste a large partition on a small job, search the whole input queue whenever a partition becomes free and pick the largest job that fits the partition.
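The two single-queue strategies above can be compared with a short sketch. It assumes, purely for illustration, that the input queue is a Python list of job sizes in KB with the front of the queue at index 0; the function names are hypothetical and not taken from any real system.

```python
from typing import List, Optional


def first_job_that_fits(queue: List[int], partition_size: int) -> Optional[int]:
    """Return the index of the job closest to the front that fits, or None."""
    for index, job_size in enumerate(queue):
        if job_size <= partition_size:
            return index
    return None


def largest_job_that_fits(queue: List[int], partition_size: int) -> Optional[int]:
    """Search the whole queue and return the index of the largest job that fits."""
    best = None
    for index, job_size in enumerate(queue):
        if job_size <= partition_size and (best is None or job_size > queue[best]):
            best = index
    return best


# Hypothetical queue of job sizes (KB); a 250 KB partition has just become free.
queue = [50, 300, 120, 200]
print(first_job_that_fits(queue, 250))    # index 0: the 50 KB job wastes 200 KB
print(largest_job_that_fits(queue, 250))  # index 3: the 200 KB job wastes 50 KB
```

The second strategy wastes less of the partition, but it always favors large jobs, so a real scheduler would need some rule that keeps small jobs from being passed over indefinitely.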
Swapping

Two general approaches to memory management:
• Swapping: copying a process's memory contents to secondary storage, removing the process from memory, and allocating the freed memory to another process, which runs for a while before it is in turn put back on disk.
• Virtual memory: capability of operating systems that enables programs to address more memory locations than are actually provided in main memory. Virtual memory systems help remove much of the burden of memory management from programmers, freeing them to concentrate on application development (Sec. 4.3).

Figure: memory allocation over time
In a swapping system:
• The number of processes in memory varies dynamically.
• The locations of processes in memory vary dynamically.
• The size of the partitions varies dynamically.

Memory compaction: when swapping creates multiple holes in memory, it is possible to combine them all into one big hole by moving all the processes downward as far as possible. This is usually not done because it requires a lot of CPU time.

How much memory should be allocated for a process when it is created or swapped in?
• If processes are created with a fixed size that never changes, the allocation is simple: the OS allocates exactly what is needed, no more and no less.
• If processes' data segments can grow, a problem occurs whenever a process tries to grow.

Allocating space for growing data
(a) Allocating space for a growing data segment: if the hole between processes A and B runs out, A or B will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.
(b) Allocating space for a growing stack and data segment: if the hole between the stack segment and the data segment runs out, the process will have to be moved to a hole with enough space, swapped out of memory until a large enough hole can be created, or killed.
• Two ways to keep track of memory usage
  – bit maps
  – lists

Memory Management with Bit Maps
• Memory is divided into allocation units; a unit may be as small as a few words or as large as several kilobytes.
• Figure: part of memory with 5 processes and 3 holes
  – tick marks show allocation units
  – shaded regions are free

Trade-off: the smaller the allocation unit, the larger the bitmap. If the allocation unit is large, the bitmap is smaller, but memory may be wasted in the last unit of a process if the process size is not an exact multiple of the allocation unit.
Main problem: when it has been decided to bring a k-unit process into memory, the memory manager must search the bitmap to find a run of k consecutive 0 bits. Searching a bitmap for a run of a given length is a slow operation.

Memory Management with Linked Lists
Linked list of allocated and free memory segments
• The segment list is kept sorted by address. Sorting this way has the advantage that when a process terminates or is swapped out, updating the list is straightforward.

Four neighbor combinations for the terminating process (P = process, H = hole):
(a) Updating the list requires replacing a P with an H.
(b) Two entries are coalesced into one, and the list becomes one entry shorter.
(c) The same as (b).
(d) Three entries are merged and two items are removed from the list.

Algorithms to allocate memory for a newly created process
Assume that the memory manager knows how much memory to allocate.
First fit: The memory manager scans along the list of segments until it finds a hole that is big enough. The hole is then broken up into two pieces, one for the process and one for the unused memory. It is a fast algorithm because it searches as little as possible.
Next fit: It works the same way as first fit, except that it keeps track of where it is whenever it finds a suitable hole. The next time it is called to find a hole, it starts searching the list from the place where it left off last time.
Simulations (Bays, 1977) show that next fit gives slightly worse performance than first fit.
Best fit: It searches the entire list and takes the smallest hole that is adequate. It is slower than first fit.
Worst fit: To get around the problem of breaking up nearly exact matches into a process and a tiny hole, it always takes the largest available hole, so that the hole broken off will be big enough to be useful. Simulation has shown that worst fit is not a very good idea either.
Quick fit: It maintains separate lists for some of the more common sizes requested, e.g. a table with n entries, in which the first entry is a pointer to the head of a list of 4-KB holes, the second entry is a pointer to a list of 8-KB holes, and the third entry is a pointer to a list of 12-KB holes. Finding a hole of the required size is fast. It has the same disadvantage as all schemes that sort by hole size: when a process terminates or is swapped out, finding its neighbors to see if a merge is possible is expensive.

Virtual Memory
Virtual memory: capability of operating systems that enables programs to address more memory locations than are actually provided in main memory. Virtual memory systems help remove much of the burden of memory management from programmers, freeing them to concentrate on application development. (Devised by Fotheringham, 1961)
Basic idea: the combined size of a program, its data, and its stack may exceed the amount of physical memory available for it. The OS keeps those parts of the program currently in use in main memory, and the rest on disk.
e.g. a 16-MB program can run on a 4-MB machine by carefully choosing which 4 MB to keep in memory at each instant, with pieces of the program being swapped between disk and memory as needed.

Paging
• Paging: virtual memory organization technique that divides an address space into fixed-size blocks of contiguous addresses. When applied to a process's virtual address space, the blocks are called pages, which store process data and instructions.
When applied to main memory, the blocks are called page frames.

Virtual address: program-generated address (using indexing, base registers, segment registers, and other ways).
Virtual address space: formed by all virtual addresses. Pentium Pro/II: 36-bit addresses, 2^36 bytes = 64 GB.
Memory management unit (MMU): a chip or collection of chips that maps virtual addresses onto physical memory addresses.

Example of how the mapping works
• Virtual addresses: 16-bit (0 to 64 KB)
• Physical memory: 32 KB
• A user program can be up to 64 KB, but it cannot be loaded into memory entirely and run.
• The virtual address space is divided into units called pages. The corresponding units in physical memory are called page frames. Pages and page frames are always the same size: 4 KB here (512 B to 64 KB in real systems).
• 8 page frames, 16 virtual pages
e.g. MOV REG, 0 is transformed (by the MMU) into MOV REG, 8192
e.g. MOV REG, 8192 is transformed into MOV REG, 24576
In the actual hardware, a present/absent bit keeps track of which pages are physically present in memory.
Figure labels (address ranges): (0-4095), (4096-8191), (8192-12287), (24576-28671)

Page fault: fault that occurs when a process attempts to access a nonresident page, in which case the OS can load the page from disk.
e.g. MOV REG, 32780 (byte 12 within virtual page 8)
(1) The MMU notices that the page is unmapped and causes the CPU to trap to the OS.
(2) The OS picks a little-used page frame and writes its contents back to the disk.
(3) It then fetches the page just referenced into the page frame just freed.
(4) It changes the map and restarts the trapped instruction.

Page Tables
Page table: table that stores entries that map page numbers to page frames. A page table contains an entry for each of a process's virtual pages.
e.g. 16-bit address: the high-order 4 bits are the virtual page number, the low-order 12 bits are the offset. 8196 is transformed into 24580 by the MMU.
Figure: internal operation of the MMU with 16 4-KB pages
The purpose of the page table is to map virtual pages onto page frames.
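The mapping in this example can be traced with a minimal sketch. It assumes a page table held as a Python dictionary and contains only the two mappings the examples above imply (virtual page 0 to frame 2, virtual page 2 to frame 6); every other page is treated as absent. The names `translate` and `PageFault` are illustrative only.

```python
OFFSET_BITS = 12                 # 4-KB pages: low-order 12 bits are the offset
PAGE_SIZE = 1 << OFFSET_BITS     # 4096
ADDRESS_BITS = 16                # 16-bit virtual addresses

# Only the mappings quoted in the examples; absent pages are simply missing.
page_table = {0: 2, 2: 6}


class PageFault(Exception):
    """Raised when the referenced virtual page is not present in memory."""


def translate(virtual_address: int) -> int:
    """Map a 16-bit virtual address onto a physical address, as the MMU would."""
    if not 0 <= virtual_address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 16 bits")
    page = virtual_address >> OFFSET_BITS          # high-order 4 bits
    offset = virtual_address & (PAGE_SIZE - 1)     # low-order 12 bits
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(page)                      # the OS would now load the page
    return (frame << OFFSET_BITS) | offset


print(translate(0))      # 8192
print(translate(8192))   # 24576
print(translate(8196))   # 24580
```

Calling `translate(32780)` raises `PageFault(8)`, matching the page fault example: the address lies 12 bytes into virtual page 8, which has no frame assigned.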
Two major issues must be faced:
(1) The page table can be extremely large.
e.g. a computer uses 32-bit virtual addresses and a 4-KB page size: number of pages = 2^32 / 2^12 = 2^20 (about 1 million). Remember that each process needs its own page table because it has its own virtual address space.
(2) The mapping must be fast.
The virtual-to-physical mapping must be done on every memory reference. A typical instruction has an instruction word, and often a memory operand as well. Consequently, it is necessary to make 1, 2, or sometimes more page table references per instruction.

Hardware solutions:
• Simplest design: one page table consisting of an array of fast hardware registers, with one entry for each virtual page, indexed by virtual page number.
  – Advantage: straightforward, and requires no memory references during mapping.
  – Disadvantage: expensive (if the page table is large).
• Page table entirely in main memory, with one hardware register that points to the start of the page table.
  – Advantage: allows the memory map to be changed at a context switch by reloading one register.
  – Disadvantage: requires one or more memory references to read page table entries during the execution of each instruction.
• Variations of the two approaches

Multilevel Page Tables
Multilevel page tables get around the problem of having to store huge page tables in memory all the time.
Figure labels: top-level page table, second-level page tables
• 32-bit virtual address: PT1: 10 bits, PT2: 10 bits, Offset: 12 bits (page size: 4 KB; number of pages: 2^20)
• The secret to the multilevel page table method is to avoid keeping all tables in memory all the time.
e.g. a process needs 12 MB: 4 MB for text, the next 4 MB for data, and the top 4 MB for stack. Only 4 page tables are actually needed: the top-level table, and second-level tables for 0 to 4M, 4M to 8M, and the top 4M.
e.g.
Virtual address = 0x00402004, then PT1 = 1, PT2 = 2, Offset = 4

Structure of a Page Table Entry
The exact layout of an entry is highly machine dependent, but the kind of information present is roughly the same. The size varies from computer to computer, but 32 bits is a common size.
• Page frame number: the goal of the page mapping is to locate this value.
• Present/absent bit: if this bit is 1, the entry is valid and can be used. If it is 0, the virtual page to which the entry belongs is not currently in memory.
• Modified and referenced bits: keep track of page usage. When a page is written to, the hardware automatically sets the modified bit. If a page has been modified, it must be written back to the disk before its page frame is reused. The modified bit is sometimes called the dirty bit. The referenced bit is set whenever a page is referenced.
• Caching disabled bit: allows caching to be disabled for the page.

TLBs – Translation Lookaside Buffers
• All paging schemes keep the page tables in memory => performance problems!
• Most programs tend to make a large number of references to a small number of pages, and not the other way around.
• Solution: equip computers with a small hardware device for mapping virtual addresses to physical addresses without going through the page table.
• This device is called an associative memory (AM) or translation lookaside buffer (TLB). It is usually inside the MMU and consists of a small number of entries (normally 32).

A TLB to speed up paging
When a virtual address is presented to the MMU for translation, the hardware first checks whether its virtual page number is present in the TLB by comparing it to all the entries simultaneously. If a valid match is found and the access does not violate the protection bits, the page frame is taken directly from the TLB, without going to the page table.
Hit ratio: fraction of memory references that can be satisfied from the TLB. The higher the hit ratio, the better the performance.
When the virtual page number is not in the TLB, the MMU detects the miss and does an ordinary page table lookup.

Software TLB Management
Hardware TLB management
  – The MMU hardware recognizes the page tables of the virtual memory system. TLB management and TLB fault handling are done entirely by the MMU hardware.
Software TLB management
  – Modern RISC computers do nearly all of this page management in software.
  – e.g. SPARC, MIPS, Alpha, and HP PA.
  – On these machines, TLB entries are explicitly loaded by the OS. When a TLB miss occurs, the MMU just generates a TLB fault and tosses the problem to the OS. The OS must find the page, remove an entry from the TLB, enter the new one, and restart the instruction that faulted. All of this must be done in a handful of instructions, because TLB misses occur much more frequently than page faults.
  – If the TLB is large enough to keep the miss rate low, software management of the TLB turns out to be acceptably efficient (Uhlig, 1994).
  – Main gain: a simpler MMU, which leaves more area on the CPU chip for cache and other features.

Inverted Page Tables
Today: 32-bit virtual address space and physical memory, 4-KB page size
  – each process needs 2^20 entries in its page table (PT); with 4 bytes per entry, that is 4 MB per process
  – the PT is large but manageable (multilevel paging schemes)
RISC chips with a 64-bit virtual address space?
  – 64-bit virtual address space >> physical memory
  – 64-bit address space = 2^64 bytes, roughly 18 million terabytes
  – 4-KB page size => 2^52 (more than 4 quadrillion) PT entries => requires rethinking!
• Solution: the virtual address space is immense, but the number of physical page frames is still manageable => inverted page table. In this design, there is one entry per page frame in real memory, rather than one entry per page of virtual address space.
e.g. with 64-bit virtual addresses, a 4-KB page, and 256 MB of RAM, an inverted page table requires only 65,536 entries. Each entry keeps track of which (process, virtual page) is located in the page frame.
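The entry counts quoted above follow from dividing the size of the address space (or of RAM) by the page size. The sketch below checks that arithmetic and shows a toy inverted page table with one slot per frame. The class `InvertedPageTable` and its methods are hypothetical, and the linear search is an illustration only.

```python
PAGE_SIZE = 4 * 1024  # 4-KB pages


def conventional_entries(virtual_address_bits: int) -> int:
    """One entry per virtual page."""
    return (1 << virtual_address_bits) // PAGE_SIZE


def inverted_entries(ram_bytes: int) -> int:
    """One entry per physical page frame."""
    return ram_bytes // PAGE_SIZE


class InvertedPageTable:
    """Toy table: slot i records which (process, virtual page) occupies frame i."""

    def __init__(self, num_frames: int) -> None:
        self.frames = [None] * num_frames

    def map(self, frame: int, pid: int, virtual_page: int) -> None:
        self.frames[frame] = (pid, virtual_page)

    def lookup(self, pid: int, virtual_page: int):
        """Return the frame holding (pid, virtual_page), or None if not resident."""
        for frame, owner in enumerate(self.frames):  # linear search: slow on purpose
            if owner == (pid, virtual_page):
                return frame
        return None


print(conventional_entries(32) == 2 ** 20)   # True
print(conventional_entries(64) == 2 ** 52)   # True
print(inverted_entries(256 * 2 ** 20))       # 65536

table = InvertedPageTable(inverted_entries(256 * 2 ** 20))
table.map(frame=7, pid=42, virtual_page=123456)
print(table.lookup(42, 123456))              # 7
```

The table is now indexed by frame rather than by virtual page, so finding the frame for a given (process, virtual page) pair requires a search. A practical design has to make that search fast, since the linear scan shown here would be far too slow to run on every memory reference.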
Figure: comparison of a traditional page table with an inverted page table. All virtual pages currently in memory that have the same hash value are chained together.
IBM and HP workstations use inverted page tables. They will become more common as 64-bit machines become widespread.

Page Replacement Algorithms
• Page fault => the OS has to select a page for replacement
• Modified page => write it back to disk
• Unmodified page => just overwrite it with the new page
• How to decide which page should be replaced?
  – at random
  – many algorithms take into account
    • usage
    • age
    • ...

Optimal Page Replacement Algorithm
What is the optimal page replacement algorithm?
An unrealizable page replacement strategy that replaces the page that will not be used until furthest in the future.
• Easy to describe, impossible to implement, because the OS cannot look into the future
• Useful as a yardstick for evaluating other page replacement algorithms
• Best (optimal) page replacement algorithm
  – a page fault occurs; a set of pages is in memory
  – label each page with the number of instructions that will be executed before that page is used again
  – replace the page with the highest number
• It is of no use in practice.

NRU (Not Recently Used) Page Replacement Algorithm
What is the NRU page replacement algorithm?
A page replacement strategy that uses the referenced and modified bits to choose the page to replace.
• Status bits associated with each page
  – R: page referenced (read or written)
  – M: page modified (written) (dirty bit, dirty page)
• Four classes, checked in this order:
  – class 0: not referenced, not modified
  – class 1: not referenced, modified
  – class 2: referenced, not modified
  – class 3: referenced, modified
• NRU removes a page at random from the lowest-numbered nonempty class
• Low overhead

FIFO Page Replacement Algorithm
What is the FIFO page replacement algorithm?
A page replacement strategy that replaces the page that has been in memory longest. The OS maintains a list of all pages currently in memory.
Pages are kept in the list in order of age. On a page fault, FIFO replaces the oldest page. It incurs low overhead, but does not predict future page usage accurately. FIFO is rarely used in its pure form.
Figure: pages A to H in FIFO order with load times 0, 3, 7, 8, 12, 14, 15, 18; A was loaded first, H is the most recently loaded page.

Second Chance Page Replacement Algorithm
What is the second chance page replacement algorithm?
A variation of FIFO page replacement that uses the referenced bit and the FIFO queue to determine which page to replace. If the oldest page's referenced bit is off, it replaces that page. Otherwise it turns off the referenced bit of the oldest page, moves it to the tail of the FIFO queue, and examines the next page or pages until it locates a page with its referenced bit turned off.
• R: referenced bit.
• Second chance is a reasonable algorithm
• But it is inefficient, because it constantly moves pages around on its list
Figure: (a) pages A to H sorted in FIFO order with load times 0, 3, 7, 8, 12, 14, 15, 18; (b) the list after a page fault at time 20 when A has its R bit set: B to H keep their positions and A moves to the tail with load time 20, so A is treated like a newly loaded page.

The Clock Page Replacement Algorithm
When a page fault occurs, the page the arrow is pointing to is inspected. The action taken depends on the R bit:
  R = 0: evict the page
  R = 1: clear R and advance the arrow
What is clock page replacement?
A variation of the second chance page replacement strategy that arranges the pages in a circular list instead of a linear list.
• The pointer points to the oldest page
  – R bit 0: page not referenced in the last round => replace it
  – R bit 1: page referenced in the last round
    • set the R bit to 0
    • advance until the first page with R = 0 is found
  – advance the pointer to the next entry in both cases

Least Recently Used (LRU) Page Replacement Algorithm
What is the LRU page replacement algorithm?
A page replacement strategy that replaces the page that has not been referenced for the longest time. LRU generally predicts future page usage well but incurs significant overhead.
(1) Linked list. It is expensive: maintaining the list on every memory reference is a time-consuming operation.
(2) Special hardware: a counter. Each page table entry must also have a field large enough to contain the counter. The counter is incremented after each instruction, and its current value is stored in the entry of the page just referenced; the page with the lowest stored value is the least recently used.
(3) Another kind of special hardware that holds a matrix of n × n bits, initially all 0. Whenever page frame k is referenced, the hardware first sets all the bits of row k to 1, then sets all the bits of column k to 0. At any instant, the row whose binary value is lowest is the least recently used.
Figure: LRU using a 4 × 4 matrix when pages are referenced in this order: 0 1 2 3 2 1 0 3 2 3

Simulating LRU in Software
The previous LRU algorithms are realizable in principle if machines have this hardware. They are of no use to an OS designer who is making a system for a machine that does not have it.
Solution: the NFU (Not Frequently Used) algorithm. It requires a software counter associated with each page, initially zero. At each clock interrupt, the OS scans all pages in memory. For each page, the R bit (0 or 1) is added to the counter.
Main problem of the NFU algorithm: it never forgets anything.
Aging: modifies the NFU algorithm as follows, which makes it able to simulate LRU quite well.
(1) The counters are each shifted right 1 bit before the R bit is added in.
(2) The R bit is added to the leftmost bit, rather than the rightmost.

Figure: the aging algorithm simulates LRU in software; shown are 6 pages for 5 clock ticks, (a) to (e).
In practice, 8 bits is enough if a clock tick is around 20 msec.

The Working Set Page Replacement Algorithm
Working set: the set of pages that a process is currently using.
  k: the number of most recent memory references
  t: time
  w(k, t): the size of the working set at time t

page span = current virtual time – time of last use
τ: predetermined page span
The working set page replacement algorithm: The hardware is assumed to set the R and M bits. A periodic clock interrupt is assumed to cause software to run that clears the R bit on every clock tick.
On every page fault, the page table is scanned to look for a suitable page to evict.

The WSClock Page Replacement Algorithm
It is based on the clock algorithm but also uses the working set information.
page span = current virtual time – time of last use
τ: predetermined page span

Review of Page Replacement Algorithms

Segmentation
Problem in a one-dimensional address space with growing tables
Ex. a compiler has the following tables:
(1) Source text
(2) Symbol table
(3) Constant table
(4) Parse tree
(5) Stack
Problem: one table may bump into another.
Solution: provide the machine with many completely independent address spaces, called segments.

Segment: variable-size set of contiguous addresses in a process's virtual address space that is managed as one unit. A segment is typically the size of an entire set of similar items, such as the set of instructions in a procedure or the contents of an array, which enables the system to protect such items with fine granularity using appropriate access rights.
  – two or more separate, independent virtual address spaces that can grow and shrink
  – different kinds of protection are possible
Two-part address (n, k):
  – n: segment number (which segment)
  – k: address within the segment
Segmentation also facilitates sharing procedures or data between several processes
  – e.g. a shared library

• Segmented memory allows each table to grow or shrink independently of the other tables

Comparison of paging and segmentation

Implementation of Pure Segmentation
The implementation of segmentation differs from paging in an essential way: pages are fixed size and segments are not.
Figure: (a)-(d) development of checkerboarding; (e) removal of the checkerboarding by compaction
External fragmentation (or checkerboarding): after the system has been running for a while, memory will be divided up into a number of chunks, some containing segments and some containing holes. This phenomenon is called external fragmentation.
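Checkerboarding and its removal by compaction can be illustrated with a small sketch. It assumes memory is described by a list of (name, base, size) tuples in arbitrary units; the helper names `find_holes` and `compact` and the sample layout are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]   # (name, base address, size)


def find_holes(segments: List[Segment], memory_size: int) -> List[Tuple[int, int]]:
    """Return the free chunks as (base, size) pairs, in address order."""
    holes = []
    cursor = 0
    for _name, base, size in sorted(segments, key=lambda s: s[1]):
        if base > cursor:
            holes.append((cursor, base - cursor))
        cursor = base + size
    if cursor < memory_size:
        holes.append((cursor, memory_size - cursor))
    return holes


def compact(segments: List[Segment]) -> List[Segment]:
    """Slide every segment downward as far as possible, keeping address order."""
    packed = []
    next_free = 0
    for name, _base, size in sorted(segments, key=lambda s: s[1]):
        packed.append((name, next_free, size))
        next_free += size
    return packed


# Hypothetical checkerboarded memory of 80 units with three segments.
layout = [("A", 0, 20), ("B", 30, 10), ("C", 50, 15)]
print(find_holes(layout, 80))            # [(20, 10), (40, 10), (65, 15)]
print(compact(layout))                   # [('A', 0, 20), ('B', 20, 10), ('C', 30, 15)]
print(find_holes(compact(layout), 80))   # [(45, 35)]
```

Before compaction, the largest single hole is 15 units although 35 units are free in total; afterwards the free space forms one 35-unit hole. The sketch ignores the cost of actually copying segment contents, which is what makes compaction expensive in practice.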
Segmentation with Paging: MULTICS
MULTICS (MULTiplexed Information and Computing Service): one of the first operating systems to implement virtual memory. Developed by MIT, GE, and Bell Laboratories as the successor to MIT's CTSS (Compatible Time Sharing System).
Ken Thompson, one of the computer scientists at Bell Labs who had worked on the MULTICS project, wrote a stripped-down, one-user version of MULTICS. This work later developed into UNIX.

Segmentation with Paging
• Many large segments > main memory size => paging
• MULTICS
  – Honeywell 6000 machines and their descendants
  – per program: a virtual memory of up to 2^18 = 256K segments (each up to 64K 36-bit words long)
  – treats each segment as a virtual memory and pages it
  – segment table + page tables
  – 16-word high-speed TLB

• The descriptor segment points to the page tables

Figure: a 34-bit MULTICS virtual address

Figure: conversion of a two-part MULTICS address into a main memory address
Problem: with this many memory references per access, programs would not run very fast.
Solution: 16-word TLB

Figure: simplified version of the MULTICS TLB (the existence of 2 page sizes makes the actual TLB more complicated)

Segmentation with Paging: The Intel Pentium
• MULTICS:
  – both segmentation and paging
  – 256K independent segments, each up to 64K 36-bit words
• Intel Pentium
  – both segmentation and paging
  – 16K independent segments, each up to 1 billion 32-bit words
  – Each program has its own LDT (Local Descriptor Table). The LDT describes segments local to each program, including its code, data, stack, and so on.
  – A single GDT (Global Descriptor Table) is shared by all programs on the computer. The GDT describes system segments, including the OS itself.

To access a segment, a Pentium program first loads a selector for that segment into one of the machine's 6 segment registers.
  CS: holds the selector for the code segment
  DS: holds the selector for the data segment
Figure: a Pentium selector. It specifies whether the LDT or the GDT is used and the entry number within that table. These tables are restricted to hold 8K segment descriptors each.
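The selector can be pulled apart with a few bit operations. The sketch below assumes the standard 16-bit x86 selector layout, which the missing figure presumably showed: a 13-bit entry index in the high bits (hence at most 8K descriptors per table), one bit choosing GDT (0) or LDT (1), and a 2-bit privilege level in the low bits. The function name `decode_selector` is illustrative.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit Pentium segment selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    return {
        "index": selector >> 3,                         # 13 bits: entry number
        "table": "LDT" if selector & 0b100 else "GDT",  # 1 bit: which table
        "privilege": selector & 0b11,                   # 2 bits: level 0 to 3
    }


print(decode_selector(0x0008))  # {'index': 1, 'table': 'GDT', 'privilege': 0}
print(decode_selector(0x000F))  # {'index': 1, 'table': 'LDT', 'privilege': 3}
print(decode_selector(0xFFFF)["index"])  # 8191, the last of 8K entries
```

Because the three low-order bits carry no index information, clearing them and treating the result as a byte offset gives the position of the 8-byte descriptor within its table.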
At the time a selector is loaded into a segment register, the corresponding descriptor is fetched from the LDT or GDT and stored in microprogram registers, so it can be accessed quickly.
Figure: Pentium code segment descriptor, 8 bytes (data segments differ slightly)

How is a (selector, offset) pair converted to a physical address?
(1) Find the descriptor corresponding to the selector. If the segment does not exist, or is currently paged out, a trap occurs.
(2) Check whether the offset is beyond the end of the segment, in which case a trap occurs. If G (granularity) = 0, the limit field (20 bits) is the exact segment size in bytes, up to 1 MB. If G = 1, the limit field gives the segment size in pages instead of bytes. Since the Pentium page size is fixed at 4 KB, 20 bits are enough for segments up to 2^32 bytes.
(3) Assuming that the segment is in memory and the offset is in range, the Pentium then adds the 32-bit base field to the offset to form the linear address. The 32-bit base is split into 3 pieces spread across the descriptor for compatibility with the 286, in which the base is only 24 bits.
(4) If paging is disabled (by a bit in a global control register), the linear address is interpreted as the physical address and sent to memory for the read or write. This is a pure segmentation scheme.
(5) If paging is enabled, the linear address is interpreted as a virtual address and mapped onto a physical address using page tables. With a 4-KB page size, a segment might contain 1 million pages.

Figure: conversion of a (selector, offset) pair to a linear address

Each running program has a page directory consisting of 1K 32-bit entries. It is located at an address pointed to by a global register. Each entry in this directory points to a page table that also contains 1K 32-bit entries.
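With 1K directory entries, 1K entries per page table, and 4-KB pages, a 32-bit linear address splits into 10 + 10 + 12 bits. The sketch below shows that split and a two-level walk. Nested Python dictionaries stand in for the page directory and page tables, and the frame number 0x12345 is made up for the example; `split_linear` and `walk` are illustrative names.

```python
def split_linear(linear: int):
    """Split a 32-bit linear address into (directory index, page index, offset)."""
    if not 0 <= linear <= 0xFFFFFFFF:
        raise ValueError("a linear address is a 32-bit value")
    directory = (linear >> 22) & 0x3FF   # top 10 bits select a page table
    page = (linear >> 12) & 0x3FF        # next 10 bits select an entry in it
    offset = linear & 0xFFF              # low 12 bits: byte within the 4-KB page
    return directory, page, offset


def walk(page_directory: dict, linear: int) -> int:
    """Two-level lookup: directory -> page table -> page frame number."""
    directory, page, offset = split_linear(linear)
    page_table = page_directory.get(directory)
    if page_table is None or page not in page_table:
        raise LookupError("page fault")  # the OS would handle this
    return (page_table[page] << 12) | offset


print(split_linear(0x00402004))          # (1, 2, 4)

# Hypothetical mapping: directory entry 1, page 2 -> frame 0x12345.
page_directory = {1: {2: 0x12345}}
print(hex(walk(page_directory, 0x00402004)))  # 0x12345004
```

Only the page tables that are actually populated need to exist, which is why the directory here holds a single entry rather than all 1K.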
Figure: mapping of a linear address onto a physical address

Page table entry: 32 bits each, 20 of which contain a page frame number. The remaining bits contain access and dirty bits, set by the hardware.
A single page table handles 4 MB of memory (1K page frames, page size 4 KB).
To avoid making repeated references to memory, the Pentium (like MULTICS) has a small TLB that directly maps the most recently used Dir-Page combinations onto the physical address of the page frame.
If an application does not need segmentation but is content with a single, paged, 32-bit address space, that model is possible. All segment registers can be set up with the same selector, whose descriptor has base = 0 and limit set to the maximum.
In fact, all current OSs for the Pentium work this way. OS/2 was the only one that used the full power of the Intel MMU architecture.

Protection Levels on the Pentium
The Pentium supports 4 protection levels. A running program is at a certain level, indicated by 2 bits in its PSW (processor status word). Each segment in the system also has a level.