Memory Management
• The part of the O.S. that manages the memory hierarchy (i.e., main memory and disk) is called the memory manager.
• The memory manager allocates memory to processes when they need it and deallocates it when they are done. It also manages swapping between main memory and disk when main memory is too small to hold all processes.

Memory Management Schemes
• There are two types of memory management schemes: those that move processes back and forth between main memory and disk (i.e., swapping and paging) and those that do not.
• An example of the second type is monoprogramming without swapping (rarely used today). In this approach only one program runs at a time, and memory is shared between that program and the O.S.
• Three ways of organizing memory in the monoprogramming approach are shown on the next slide.

Relocation and Protection
• When a program is linked (all parts of the program are combined into a single address space), the linker must know at what address the program will begin in memory.
• Because the program is loaded into a different partition each time, its code must be relocatable.
• One solution is relocation during loading, which means the loader modifies the relocatable addresses.
• The problem is that even this method cannot stop a program from reading or writing memory locations belonging to other users (i.e., the protection problem).

Relocation and Protection
• The solution to both of these problems is to use two special hardware registers, called the base and limit registers. When a process is scheduled, the base register is loaded with the start address of its partition and the limit register with the length of its partition.
• Example: if the program is loaded at location x = 10, its relocatable addresses evaluate to x + 10 = 20, 2x + 3 = 23, x + 40 = 50.

Swapping
• In interactive systems there is sometimes not enough memory for all active processes, so excess processes must be kept on disk and brought in to run dynamically.
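The base and limit register scheme described under Relocation and Protection can be sketched in Python (a simplified model; the class name and the fault behavior are illustrative, not from any particular hardware):

```python
class MMU:
    """Simplified base/limit relocation: every logical address is
    checked against the limit register, then offset by the base."""

    def __init__(self, base, limit):
        self.base = base    # start address of the process's partition
        self.limit = limit  # length of the partition

    def translate(self, logical):
        # Protection: the process may not reach outside its partition.
        if logical < 0 or logical >= self.limit:
            raise MemoryError("protection fault: address outside partition")
        # Relocation: physical address = base + logical address.
        return self.base + logical

# Process loaded at x = 10 with a partition of length 100:
mmu = MMU(base=10, limit=100)
print(mmu.translate(10))  # logical 10 -> physical 20
print(mmu.translate(40))  # logical 40 -> physical 50
```

Because the check and the addition happen on every reference, they must be done in hardware; the class above only models the logic.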
• Swapping is a memory management technique in which each process is brought in in its entirety, run for a while, and then put back on the disk.
• The partitions can be variable, meaning the number, location, and size of the partitions vary dynamically to improve memory utilization.

Swapping
• Swapping may create multiple holes in memory; in that case all the processes can be moved downward to combine the holes into one big one (memory compaction).
• Swapping is slow, and if processes' data segments grow dynamically, a little extra memory should be allocated whenever a process is swapped in. If there is no room in memory when a process grows, it must be swapped out of memory or killed (if there is no disk space either).
• Also, if both the stack segment (for return addresses and local variables) and the data segment (for the heap and dynamically allocated variables) of a process grow, a special memory configuration should be considered (see next slide).

Issues with Variable Partitions
• How do we choose which hole to assign to a process? (allocation strategy)
• How do we manage memory to keep track of holes? (see next slide)
• How do we avoid many small holes? One possible solution is compaction, but there are several variations of compaction. The question is which processes should be moved.

Memory Management with Bitmaps
• Each allocation unit in memory corresponds to one bit in the bitmap (e.g., 0 if the unit is free and 1 if it is occupied).
• The size of the allocation unit is important: the smaller the allocation unit, the larger the bitmap.
• If the allocation unit is chosen large, the bitmap is smaller, but if a process's size is not an exact multiple of the allocation unit, memory may be wasted.

Memory Management with Linked Lists
• A linked list keeps track of allocated and free segments (i.e., each segment is either a process or a hole).
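The bitmap scheme described above can be sketched as follows (a minimal model; the unit granularity and function names are illustrative):

```python
def find_free_run(bitmap, k):
    """Scan the bitmap (0 = free unit, 1 = occupied) for a run of k
    consecutive free allocation units; return its start index or -1."""
    run_start, run_len = 0, 0
    for i, bit in enumerate(bitmap):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == k:
                return run_start
        else:
            run_len = 0
    return -1

def allocate(bitmap, k):
    """Mark a run of k units as occupied; return its start index or -1."""
    start = find_free_run(bitmap, k)
    if start != -1:
        for i in range(start, start + k):
            bitmap[i] = 1
    return start

bitmap = [1, 1, 0, 0, 0, 1, 0, 0]
print(allocate(bitmap, 3))  # -> 2 (units 2..4 were free)
print(bitmap)               # -> [1, 1, 1, 1, 1, 1, 0, 0]
```

The linear scan for a run of k free units is exactly the cost that makes bitmaps slow for allocation, which motivates the linked-list approach.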
Memory Management with Linked Lists
• If the list is sorted by address, the possible neighbor combinations when a process X terminates can be enumerated; the list update is different for each situation. Using a doubly linked list is more convenient.

Memory Management with Linked Lists
When allocating memory for a new process, the possible allocation strategies are:
• First fit: takes the first hole that is big enough. Generates the largest leftover holes.
• Next fit: takes the next hole that is big enough; the search starts from the place where it stopped last time instead of from the beginning of the list.
• Best fit: finds the hole whose size is closest to the size needed. It is slower, and it produces useless tiny holes.
• Worst fit: takes the largest hole.
• Quick fit: maintains separate lists for holes of common sizes such as 4 KB, 12 KB, and 20 KB. It is fast, but its disadvantage is that merging free spaces is expensive.

Virtual Memory
• Overlays are parts of a program created by the programmer. By keeping some overlays on disk it was possible to fit a program larger than memory into memory. Swapping overlays in and out (they call each other) and splitting a program into overlays were tedious and time-consuming.
• To solve this problem, virtual memory was devised in 1961: if the size of a program exceeds the size of memory, the O.S. keeps the parts of the program currently in use in main memory and the rest on disk. Virtual memory can be used in both single-program and multiprogramming environments.

Paging
• Most virtual memory systems use a technique called paging, which is based on virtual addressing.
• When a program references an address, that address is a virtual address, and all such addresses form the virtual address space. A virtual address does not go directly to the memory bus; instead it goes to the Memory Management Unit (MMU), which maps the virtual address onto a physical memory address. (see next slide)

Virtual Address

Paging
• The virtual address space is divided into units called pages, which correspond to units in physical memory called page frames.
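The first fit and best fit strategies listed above can be compared with a small sketch over a list of hole sizes (the sizes and the index-based return convention are illustrative):

```python
def first_fit(holes, size):
    """Return the index of the first hole large enough, or -1."""
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return -1

def best_fit(holes, size):
    """Return the index of the smallest hole that still fits, or -1."""
    best = -1
    for i, h in enumerate(holes):
        if h >= size and (best == -1 or h < holes[best]):
            best = i
    return best

holes = [20, 4, 12, 7]      # hole sizes in KB, in address order
print(first_fit(holes, 6))  # -> 0 (the 20 KB hole comes first)
print(best_fit(holes, 6))   # -> 3 (the 7 KB hole fits most tightly)
```

Note that first fit stops as soon as it finds a candidate, while best fit must scan the whole list, which is why it is slower.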
• The page table maps virtual pages onto page frames. (see next slide)
• If the corresponding page frame is present in memory, the physical address is found and the address in the instruction is transformed into that physical address.
• If the corresponding page frame is not in memory, a page fault occurs, meaning the page currently exists only on disk. On a page fault the O.S. frees one of the used page frames (writing it back to disk if necessary), fetches the referenced page into that frame, and updates the page table.

Page Table
• In this example, each page table entry consists of 3 bits giving the page frame number and one bit showing whether the page frame is present in memory.
• The incoming address consists of a virtual page number, which is translated into the 3-bit page frame number, and an offset, which is copied directly to the output address. (see next slide)
• Each process has its own page table. If the O.S. copies a process's page table from memory into an array of fast hardware registers, memory references are reduced, but since page tables are large this is expensive. If the O.S. keeps the page table in memory, one or two extra memory references (to read the page table) are needed for each instruction, which slows down execution. We will see different solutions later.

Paging
• In general, given the logical address space and the page size, the page table can be designed.
• For example, with 16-bit addressing and a 4 KB page size: since 4K = 2^12, we need a 12-bit offset and 16 - 12 = 4 high-order bits for the virtual page number. The structure of the page table can be the same as on the previous slide.

Paging
• Example: for an 8-bit addressing system and a page size of 32 bytes, we need 5 bits for the offset and 3 high-order bits for the virtual page number.
If we assume 8-bit physical addressing, the page table can be as follows:

page  frame
000   010
001   111
010   011
011   000
100   101
101   001
110   100
111   110

This table translates logical address 011 01101 => 000 01101 (physical), and logical address 110 01101 => 100 01101 (physical).

Issues with the Page Tables
• With a 4 KB page size and a 32-bit address space, 2^20 = 1 million pages can be addressed, so the page table must have 1 million entries. One way of dealing with such a large page table is a multilevel page table, which allows keeping only the parts of the table that are needed in memory (see next slide).
• Another issue is that the mapping must be fast. For example, if an instruction takes 4 nsec, the page table lookup must be done in under 1 nsec.

2-Level Scheme
• The first 2 bits of the virtual page number index table 1, which always remains in main memory.
• The second 2 bits index table 2, which may not be entirely in memory (table 2 is itself subject to paging).
• Example: for virtual address 0010 1100, the first two bits (00) select the table 1 entry pointing to a second-level table; the next two bits (10) select the entry of that table holding page frame 1011; the 4-bit offset 1100 is copied unchanged. Thus 0010 1100 is translated to 1011 1100.
• Only the tables that are currently in use are kept in memory.

Structure of a Page Table Entry
• The modified bit, or dirty bit, shows whether the page has been modified in memory. When a page frame is reclaimed, a page whose dirty bit is set must be written back to disk; otherwise it can simply be abandoned.

TLBs (Translation Lookaside Buffers)
• Using a fast hardware lookup cache (typically between 8 and 2048 entries) is the standard solution for speeding up paging (see next slide).
• In general, the logical address is first looked up in the TLB. If the address is there and the access does not violate the protection bits, the page frame is taken directly from the TLB. On a miss, the page table is consulted; the missed entry is then loaded into the TLB, replacing one of its entries, for future use.

TLBs (Translation Lookaside Buffers)

Inverted Page Tables
• There is only one entry per page frame of real memory.
• Advantage: less memory is required. For example, with 256 MB of RAM and a 4 KB page size, memory holds 2^16 page frames, so we need only 65,536 (= 2^16) entries instead of the 2^52 entries a traditional page table would require with 64-bit addressing.
• Disadvantage: the whole table must be searched to find the entry for a requested page, because the table is indexed by page frame rather than by virtual page (see the next slide).

Inverted Page Tables

Page Replacement Algorithms
• When a page fault occurs, the operating system has to choose a page to evict from memory.
• The Optimal Page Replacement Algorithm is based on knowledge of the future use of each page: it evicts the page whose next use lies furthest in the future. It cannot be implemented, except when a program is run a second time and the system recorded the pattern of page use from the first run.
• The only use of the optimal algorithm is as a benchmark against which realizable algorithms can be compared.

NRU (Not Recently Used)
This algorithm uses the M (modified) and R (referenced) bits of the page table entry. On each page fault it removes a page at random from the lowest-numbered nonempty class:
• Class 0: not referenced, not modified
• Class 1: not referenced, modified
• Class 2: referenced, not modified
• Class 3: referenced, modified

The Second Chance Replacement Algorithm
• FIFO: a list of pages with a head and a tail; on each page fault the oldest page, at the head, is removed.
• The problem with FIFO is that it may throw out heavily used pages that happened to arrive early.
• The second chance algorithm is an enhancement of FIFO that solves this problem.
• On each page fault it checks the R bit of the oldest page: if it is 0, the page is replaced; if it is 1, the bit is cleared and the page is put at the end of the list.
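The second chance algorithm described above can be sketched as follows (a minimal model; the R bits are supplied explicitly for illustration):

```python
from collections import deque

def second_chance(pages, r_bits):
    """Pick a victim page from a FIFO queue.
    pages: deque of page numbers, oldest at the left;
    r_bits: dict mapping page number -> R (referenced) bit."""
    while True:
        page = pages.popleft()   # inspect the oldest page
        if r_bits[page] == 0:
            return page          # not recently referenced: evict it
        r_bits[page] = 0         # second chance: clear its R bit
        pages.append(page)       # and move the page to the tail

queue = deque([3, 7, 1])         # page 3 is the oldest
r = {3: 1, 7: 0, 1: 1}
victim = second_chance(queue, r)
print(victim)        # -> 7 (page 3 had R = 1, so it moved to the tail)
print(list(queue))   # -> [1, 3]
```

If every page has R = 1, the loop clears all the bits and eventually evicts the original head, so the algorithm degenerates to pure FIFO.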
(see the next slide)

The Second Chance Replacement Algorithm

The Clock Page Replacement Algorithm
• The problem with the second chance algorithm is its inefficiency, because it is constantly moving pages around on its list.
• The clock algorithm solves this problem: the pages are kept in a circular list with a hand pointing to the oldest page. On each page fault the page the hand points to is inspected. If its R bit is 0, the page is evicted and the hand is advanced one position; if R is 1, the bit is cleared and the hand is advanced to the next page. (see next slide)

The Clock Page Replacement Algorithm

The Least Recently Used (LRU) Page Replacement Algorithm
LRU: throw out the page that has not been used for the longest time. Possible implementations:
• A linked list of the pages in use. Expensive, because the list must be updated on every memory reference.
• A 64-bit hardware counter that records the time of the last reference to each page. On a page fault, the page with the lowest counter value is the LRU page. Each page table entry must be large enough to contain the counter.
• A hardware n x n bit matrix for n page frames: on each reference to page k, all bits of row k are first set to 1, and then all bits of column k are set to 0. At any instant, the row with the lowest binary value belongs to the LRU page.

LRU with the matrix. Pages referenced: 0, 1, 2, 3, 2, 1, 0, 3, 2, 3

Simulating LRU in Software
• NFU (Not Frequently Used) is an approximation of LRU that can be implemented in software.
• There is a software counter for each page. On each clock interrupt, the O.S. adds the value of the R bit (0 or 1) of each page to that page's counter.
• The problem with NFU is that it never forgets: if a page received many references in the past, it may never be evicted even though it has not been used recently.

Simulating LRU in Software
• To solve this problem, a modified algorithm known as aging works as follows:
• An 8-bit counter byte is kept for each page. At each clock tick the counter is shifted right by one bit and the R bit is added as the new leftmost bit. The page with the lowest binary value is the LRU page.
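The aging scheme just described can be simulated directly (8-bit counters; the page numbers and R-bit patterns are illustrative):

```python
def aging_tick(counters, r_bits):
    """One clock tick of the aging algorithm: shift every 8-bit counter
    right by one bit and put the page's R bit into the leftmost position."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (r_bits[page] << 7)
        r_bits[page] = 0   # R bits are cleared after each tick

counters = {0: 0, 1: 0, 2: 0}
# Three ticks with different reference patterns (1 = page was referenced):
for r in [{0: 1, 1: 0, 2: 1}, {0: 1, 1: 1, 2: 0}, {0: 0, 1: 1, 2: 0}]:
    aging_tick(counters, r)

# Page 2 was referenced only at the first tick, so its counter is lowest.
print({p: format(c, '08b') for p, c in counters.items()})
# -> {0: '01100000', 1: '11000000', 2: '00100000'}
print(min(counters, key=counters.get))  # -> 2 (the LRU candidate)
```

Because each counter is only 8 bits, all history older than 8 ticks is lost, which is the limitation mentioned on the next slide.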
(see next slide)
• The difference between aging and LRU is that aging remembers only the last 8 clock ticks: if two pages both have a counter value of 0, we cannot tell which of them was used less recently.

Aging Algorithm Simulation of LRU

More on Page Replacement Algorithms
• High paging activity is called thrashing. A process is thrashing if it spends more time on paging than on executing.
• Processes usually exhibit locality of reference, meaning that at any phase of execution they reference only a small fraction of their pages.
• As a process runs, it brings in pages as it faults on them; after a while it has most of the pages it needs. This strategy is called demand paging: pages are loaded on demand, not in advance.
• Strategies that try to load pages before the process runs are called prepaging.

Working Set Model
• The set of pages that a process is currently using is called its working set.
• If k is the number of most recent memory references considered, there is a wide range of k over which the working set is unchanged (see next slide). This makes prepaging possible.
• Prepaging algorithms guess which pages will be needed when a process restarts. The prediction is based on the process's working set at the time it was stopped.
• At each page fault, working set page replacement algorithms check whether each candidate page is part of the working set of the current process.
• The WSClock algorithm is a modified working set algorithm based on the clock algorithm.

Working Set Model

Summary of Page Replacement Algorithms

Modeling Page Replacement Algorithms
• For some page replacement algorithms, increasing the number of page frames does not necessarily reduce the number of page faults. This strange situation has become known as Belady's Anomaly.
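Belady's anomaly can be reproduced with a short FIFO simulation of the classic reference string 0 1 2 3 0 1 4 0 1 2 3 4 for a program with five virtual pages:

```python
from collections import deque

def fifo_faults(refs, n_frames):
    """Count page faults for FIFO replacement with n_frames page frames."""
    frames = deque()
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == n_frames:
                frames.popleft()   # evict the oldest page
            frames.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(fifo_faults(refs, 3))  # -> 9 page faults with three frames
print(fifo_faults(refs, 4))  # -> 10 page faults with four frames!
```

Adding a fourth frame increases the fault count from 9 to 10, which is exactly the anomaly: FIFO is not a stack algorithm, so its faults need not decrease monotonically with memory size.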
• For example, FIFO causes more page faults with four page frames than with three for a program with five virtual pages and the following reference string (see next slide): 0 1 2 3 0 1 4 0 1 2 3 4

Stack Algorithms
In these models, paging can be characterized by three items:
1- The reference string of the executing process
2- The page replacement algorithm
3- The number of page frames available in memory
For example, using LRU page replacement, a virtual address space of eight pages, and a physical memory of four page frames, the next slide shows the state of memory after each reference in the string: 0 2 1 3 5 4 6 3 7 4 7 3 3 5 5 3 1 1 1 7 2 3 4 1

The State of Memory with the LRU Stack-Based Algorithm

The Distance String
• The distance of each referenced page from the top of the stack forms the distance string.
• The distance string depends on both the reference string and the paging algorithm, and it characterizes the algorithm's performance. For example, in (a) most distances lie between 1 and k, which shows that with a memory of k page frames few page faults occur; in (b) the references are so spread out that many page faults are generated.

Segmentation
• Although paging provides a large linear address space without having to buy more physical memory, using a single linear address space can cause problems.
• For example, suppose that within a virtual address space running from 0 to some maximum we want to allocate space for the different tables of a program that are built up as compilation proceeds. Since memory is one-dimensional, all five tables produced by the compiler must be allocated as contiguous chunks. The problem is that as the tables grow, one table may bump into another. (see the next slide)

Segmentation
• Segmentation is a general solution to this problem: it provides the machine with many completely independent address spaces, called segments.
• Segments have variable sizes, and each of them can grow or shrink independently without affecting the others.
• For example, the next slide shows a segmented memory for the compiler tables.

Advantages of Segmentation
• Simplifies the handling of growing and shrinking data structures.
• Since each segment starts at address 0, if each procedure occupies a separate segment, linking separately compiled procedures is simplified: a call to a procedure in segment n uses the two-part address (n, 0), so size changes in other procedures do not affect its starting address.
• Sharing procedures or data between several processes is easier with segments (e.g., a shared library).
• Different segments can have different kinds of protection (e.g., read/write but not executable for an array segment).
• The major problem with segmentation is external fragmentation, which can be dealt with by compaction (see next slide).

Segmentation with Paging
• If the segments are large, paging them means that only the pages actually needed have to be placed in memory.
• MULTICS (the OS of the Honeywell 6000) uses segmentation with paging. Each 34-bit MULTICS virtual address contains a segment number and an address within that segment. The segment number is used to find the segment descriptor. The segment descriptor holds an 18-bit pointer to the segment's page table and contains the segment's other information. (see next slide)

Segmentation with Paging in MULTICS
• The segment number is extracted from the virtual address; then, if there is no segment fault, the page frame number can be found. If the page is in memory, its memory address is extracted and added to the offset. (see next slide)
• If the page frame is not in memory, a page fault occurs.
• The problem is that this algorithm is slow, so MULTICS uses a 16-word TLB to speed up the search.
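The two-step MULTICS-style translation can be sketched in simplified form (the page size, descriptor layout, and table contents below are illustrative, not the actual MULTICS formats):

```python
# Simplified segmentation with paging: a virtual address is a pair
# (segment number, address within segment); the in-segment address is
# further split into a virtual page number and an offset.

PAGE_SIZE = 1024  # bytes per page (illustrative, not the MULTICS value)

# Each segment descriptor points at that segment's page table; here a
# page table is just a dict mapping virtual page -> page frame.
segments = {
    0: {"length": 3000, "page_table": {0: 5, 1: 9, 2: 2}},
    1: {"length": 2048, "page_table": {0: 7, 1: 4}},
}

def translate(seg, addr):
    """Translate (segment, in-segment address) to a physical address."""
    desc = segments.get(seg)
    if desc is None or addr >= desc["length"]:
        raise MemoryError("segment fault")
    vpn, offset = divmod(addr, PAGE_SIZE)
    frame = desc["page_table"].get(vpn)
    if frame is None:
        raise MemoryError("page fault")  # page must be fetched from disk
    return frame * PAGE_SIZE + offset

print(translate(1, 1030))  # -> 4102 (page 1 -> frame 4; 4*1024 + 6)
```

Both lookups cost extra memory references on every access, which is why a TLB caching recent (segment, page) pairs is essential in such a design.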