Operating Systems: Lecture 9
Memory Management: Virtual Memory

Virtual Memory: LRU Approximation Algorithms

Many systems provide some help in the form of a reference bit. The reference bit for a page is set by the hardware whenever that page is referenced (either a read or a write to any byte in the page). Reference bits are associated with each entry in the page table.

Additional-Reference-Bits Algorithm

We can gain additional ordering information by recording the reference bits at regular intervals. We keep an 8-bit byte, the History Register (HR), for each page in a table in memory. At regular intervals (say, every 100 milliseconds), a timer interrupt transfers control to the OS. The OS shifts the reference bit for each page into the high-order bit of its HR, shifting the other bits right one position and discarding the low-order bit.

These HRs contain the history of page use for the last eight time periods:
- if the HR contains 00000000, the page has not been used for eight time periods;
- a page that is used at least once in each period has an HR value of 11111111.

Example: suppose a page's HR is 11000100. If the page is referenced during the next interval, a 1 is shifted in and the HR becomes 11100010; if it is not referenced during the interval after that, a 0 is shifted in and the HR becomes 01110001.

A page with HR = 11000100 has been referenced more recently than a page with HR = 01110111. If we interpret the HRs as unsigned integers, the page with the lowest number is the LRU page, and it can be replaced.

The number of bits in the HR can be varied (selection should be as fast as possible). In the extreme case, the number of bits can be reduced to zero, leaving only the reference bit itself. This is the second-chance algorithm.
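A minimal sketch of the history-register update in C. The page-table layout (a hypothetical pages[] array with ref and hr fields) and the constant NPAGES are assumptions for illustration; a real OS would call age_pages() from its timer-interrupt handler.

    #include <stdint.h>
    #include <stddef.h>

    #define NPAGES 64  /* hypothetical number of resident pages */

    struct page {
        uint8_t ref;  /* reference bit, set by hardware on any access */
        uint8_t hr;   /* 8-bit history register */
    };

    struct page pages[NPAGES];

    /* Called at every timer interval: shift each page's reference bit
     * into the high-order bit of its HR (discarding the low-order bit),
     * then clear the reference bit for the next interval. */
    void age_pages(void) {
        for (size_t i = 0; i < NPAGES; i++) {
            pages[i].hr = (uint8_t)((pages[i].ref << 7) | (pages[i].hr >> 1));
            pages[i].ref = 0;
        }
    }

    /* Victim selection: interpreting the HRs as unsigned integers,
     * the page with the smallest value is the approximate LRU page. */
    size_t select_victim(void) {
        size_t victim = 0;
        for (size_t i = 1; i < NPAGES; i++)
            if (pages[i].hr < pages[victim].hr)
                victim = i;
        return victim;
    }

Note how the unsigned comparison in select_victim() reproduces the ordering argument above: 11000100 (196) beats 01110111 (119) because more recent references carry more weight.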
Second-Chance Algorithm (Clock Algorithm)

The basic algorithm of second-chance replacement is a FIFO replacement algorithm. When a page has been selected, however, we inspect its reference bit:
- if the value is 0, we proceed to replace this page;
- if the reference bit is 1, we give that page a second chance: we set its reference bit to 0, leave the page in memory, and move on to the next FIFO page, subject to the same rules.

When a page gets a second chance, its reference bit is cleared and its arrival time is reset to the current time. Thus, a page that is given a second chance will not be replaced until all other pages have been replaced (or themselves given second chances). In addition, if a page is used often enough to keep its reference bit set, it will never be replaced.

One way to implement this algorithm is as a circular queue of pages, with a pointer (the clock hand) indicating the next candidate.

[Figure: a circular queue of pages (a, b, c, ..., m) with their reference bits, shown before and after a replacement; the pointer advances around the queue, clearing reference bits that are set, until it reaches page "h", whose reference bit is 0.]
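A minimal sketch of the clock algorithm in C, under assumptions similar to the previous sketch: a hypothetical frames[] array with per-frame reference bits, and a module-level hand variable for the clock pointer. clock_select() returns the index of the frame to replace.

    #include <stddef.h>

    #define NFRAMES 8  /* hypothetical number of physical frames */

    struct frame {
        int page;  /* page currently held in this frame (illustrative) */
        int ref;   /* reference bit */
    };

    static struct frame frames[NFRAMES];
    static size_t hand = 0;  /* the clock hand: next candidate frame */

    /* Second-chance selection: sweep the circular queue; a frame with
     * ref == 1 gets a second chance (bit cleared, hand advances); the
     * first frame found with ref == 0 is the victim. */
    size_t clock_select(void) {
        for (;;) {
            if (frames[hand].ref == 0) {
                size_t victim = hand;
                hand = (hand + 1) % NFRAMES;  /* hand stops past the victim */
                return victim;
            }
            frames[hand].ref = 0;             /* give a second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }

The loop always terminates: in the worst case (all reference bits set) the hand clears every bit on one full sweep and then selects the frame it started from, which is exactly the degenerate-to-FIFO behavior described above.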
Enhanced Second-Chance Algorithm

Use the modify bit, M, along with the reference bit, R. Depending on the pair (R, M), pages are classified into four classes:
1) (0, 0): neither recently used nor modified - the best page to replace;
2) (0, 1): not recently used, but modified - not quite as good, because the page will need to be written out before replacement;
3) (1, 0): recently used, but clean - probably will be used again soon;
4) (1, 1): recently used and modified - probably will be used again, and a write-out will be needed before replacing it.

Class 1 has the lowest priority for staying in memory and class 4 the highest. We replace the first page encountered in the lowest nonempty class.

Counting Algorithms

Counting algorithms keep a counter of the number of references that have been made to each page:
- LFU Algorithm: the least frequently used (LFU) algorithm replaces the page with the smallest count;
- MFU Algorithm: the most frequently used (MFU) algorithm is based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
Implementation of counting algorithms is expensive, and their approximation of OPT replacement is not very good.

Page-Buffering Algorithm

The system keeps a pool of free frames; the desired page is read into a free frame from the pool before the victim is written out.

Comparison of Page Replacement Algorithms

[Figure: page faults per 1000 references (0-40) versus number of page frames allocated (6-14) for the OPT, LRU, Clock, and FIFO algorithms; the fault rate falls as frames are added, with OPT best and FIFO worst. From W. Stallings, Operating Systems, 1992, Macmillan Publ. Co., p. 289.]

Factors which determine the page-fault rate:
- the program itself;
- memory size;
- page size;
- the replacement algorithm;
- the prefetch algorithm.

Virtual Memory: Allocation of Frames

The simplest case of virtual memory is the single-user system. A different problem arises when demand paging is combined with multiprogramming: multiprogramming puts two (or more) processes in memory at the same time, and there are many variations for distributing frames among them.

There is a minimum number of frames that must be allocated. Obviously, as the number of frames allocated to each process decreases, the page-fault rate increases, slowing process execution.

This minimum number of frames is defined by the instruction-set architecture. (Remember that when a page fault occurs before an executing instruction is complete, the instruction must be restarted.)
- Example: consider a machine in which all memory-reference instructions have only one memory address. We need at least one frame for the instruction and one frame for the memory reference. In addition, if one-level indirect addressing is allowed, paging requires at least three frames per process.
- Example: the IBM 370 MVC (Move Character) instruction is storage-to-storage. The instruction is 6 bytes and might span 2 pages; 2 pages may be needed for the "from" operand and 2 pages for the "to" operand, so 6 frames are needed in all.

Two major allocation schemes: fixed and priority allocation.

Fixed allocation:
- Equal allocation: if there are 100 frames and 5 processes, give each process 20 frames.
- Proportional allocation: allocate according to the size of the process. Let si be the size of process pi, S = sum of all si, and m the total number of frames; then the allocation for pi is ai = (si / S) x m.
  Example: m = 64, s1 = 10, s2 = 127; a1 = (10/137) x 64 ~ 5 and a2 = (127/137) x 64 ~ 59.
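A minimal sketch of proportional allocation in C; the process sizes and the rounding choice are assumptions made so the output matches the m = 64 example above.

    #include <stdio.h>

    /* Proportional allocation: a_i = (s_i / S) * m, computed in integer
     * arithmetic with rounding to the nearest frame. */
    void proportional(const long s[], int n, long m, long a[]) {
        long S = 0;
        for (int i = 0; i < n; i++)
            S += s[i];
        for (int i = 0; i < n; i++)
            a[i] = (s[i] * m + S / 2) / S;  /* multiply first, then round */
    }

    int main(void) {
        long s[] = {10, 127};  /* process sizes from the example */
        long a[2];
        proportional(s, 2, 64, a);
        printf("a1 = %ld, a2 = %ld\n", a[0], a[1]);  /* prints a1 = 5, a2 = 59 */
        return 0;
    }

Multiplying before dividing matters here: computing s[i] / S first in integer arithmetic would truncate both ratios to zero.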
Priority allocation: use a proportional allocation scheme based on priorities rather than size. If process Pi generates a page fault, it can:
- select for replacement one of its own frames; or
- select for replacement a frame from a process with a lower priority number.

We can classify page-replacement algorithms into two broad categories:
- Global replacement: a process selects a replacement frame from the set of all frames, so one process can take a frame from another. This generally gives greater throughput, but it may cause thrashing.
- Local replacement: each process selects a replacement frame from only its own set of allocated frames, so its paging behavior is not affected by other active processes.

Virtual Memory: Thrashing

High paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing; in other words, the process is busy swapping pages in and out. If a process does not have "enough" pages, its page-fault rate is very high.

How thrashing develops:
- the OS monitors CPU utilization; if utilization is too low, the OS increases the degree of multiprogramming by introducing a new process to the system;
- a global replacement algorithm is used, replacing pages with no regard to the process to which they belong, so it steals pages from active processes;
- active processes then have fewer pages and generate more page faults, and CPU utilization decreases;
- as a result, the CPU scheduler tries to increase the degree of multiprogramming even more, and the page-fault rate increases tremendously.

[Figure: CPU utilization versus degree of multiprogramming; utilization rises with multiprogramming up to a peak, then collapses once thrashing sets in.]

Virtual Memory: Thrashing: How to Prevent It

To prevent thrashing, we must provide a process with as many frames as it needs. But how do we know how many frames it "needs"? There are several techniques. The working-set strategy starts by looking at how many frames a process is actually using. This approach defines the locality model of process execution.

The locality model states that, as a process executes, it migrates from one locality to another. A locality is a set of pages that are actively used together. A program is generally composed of several different localities, which may overlap. Localities are defined by the program structure and its data structures.

90-10 rule: programs spend 90% of their time in 10% of their code.

Virtual Memory: Thrashing: Working-Set Model

The working-set model is based on the assumption of locality. The model uses a parameter, D (a time unit), to define the working-set window. The idea is to examine the most recent D page references: the set of pages in the most recent D page references is the working set. If a page is in active use, it will be in the working set; if it is no longer being used, it will drop from the working set D time units after its last reference. Thus, the working set is an approximation of the program's locality.

Example (window D = 10 page references):

    ... 2 6 1 5 7 7 7 7 5 1 6 2 3 4 1 2 3 4 4 4 3 4 3 4 4 4 1 3 2 3 4 4 4 3 4 4 4 ...

    WS(t1) = {1,2,5,6,7}   (t1 taken after the first ten references shown)
    WS(t2) = {3,4}         (t2 taken later, when the locality has shifted to pages 3 and 4)

The most important property of the working set is its size. If we compute the working-set size WSSi (the number of distinct pages referenced in the most recent D references) for each process in the system, we can then consider the total demand for frames, D_total = sum of all WSSi. Each process is actively using the pages in its working set, so process i needs WSSi frames.

If the total demand is greater than the total number of available frames (D_total > m), thrashing will occur, because some processes will not have enough frames.

Working-Set Strategy

Allocate to each process enough frames for its working-set size; if D_total > m, suspend one of the processes. This prevents thrashing while keeping the degree of multiprogramming as high as possible.

The overhead of keeping track of the working set is non-trivial, so approximations are used in actual implementations. One approximation uses the reference bits and an interval timer:
- set the interval timer to interrupt every T time units;
- when the timer interrupts, check the page frames: those with the reference bit set to 1 are considered to be in the working set.
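A minimal sketch of the working-set bookkeeping in C, assuming a hypothetical trace of page numbers. ws_size() counts the distinct pages among the last DELTA references, and main() applies the suspend-if-demand-exceeds-frames policy with an assumed second process.

    #include <stdio.h>

    #define DELTA 10    /* working-set window, in page references */
    #define MAXPAGE 8   /* hypothetical largest page number + 1 */

    /* Working-set size: number of distinct pages among the most
     * recent DELTA references of the trace. */
    int ws_size(const int trace[], int len) {
        int seen[MAXPAGE] = {0}, wss = 0;
        int start = len > DELTA ? len - DELTA : 0;
        for (int i = start; i < len; i++)
            if (!seen[trace[i]]) { seen[trace[i]] = 1; wss++; }
        return wss;
    }

    int main(void) {
        /* prefix of the reference string from the example above */
        int trace[] = {2, 6, 1, 5, 7, 7, 7, 7, 5, 1};
        int wss1 = ws_size(trace, 10);
        printf("WSS1 = %d\n", wss1);  /* prints 5: the set {1,2,5,6,7} */

        /* thrashing test: suspend a process whenever total demand
         * exceeds the available frames (assume a second process
         * with WSS2 = 4 and m = 8 frames, both illustrative) */
        int m = 8, demand = wss1 + 4;
        if (demand > m)
            printf("D = %d > m = %d: suspend a process\n", demand, m);
        return 0;
    }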
Virtual Memory: Thrashing: Page-Fault Frequency

Thrashing has a high page-fault rate, so we want to control the page-fault rate. The page-fault frequency (PFF) strategy takes a more direct approach than the working-set model. When the page-fault rate is too high, we know that the process needs more frames; similarly, if the page-fault rate is too low, the process may have too many frames.

We can establish upper and lower bounds on the desired page-fault rate:
- if the actual page-fault rate exceeds the upper limit, we allocate the process another frame;
- if the page-fault rate falls below the lower limit, we remove a frame from that process.
Thus, we can directly measure and control the page-fault rate to prevent thrashing.

[Figure: page-fault rate versus number of allocated frames; above the upper bound, increase the number of frames; below the lower bound, decrease the number of frames.]

Virtual Memory: Page Size Consideration

Page sizes are usually hardware defined, and there is no ideal choice:
- bigger pages reduce the size of the page table;
- smaller pages give better locality and reduce internal fragmentation;
- I/O is more efficient for bigger pages, but smaller pages may reduce the total amount of I/O needed;
- larger pages reduce the number of page faults.
Most contemporary systems have settled on 4-kilobyte pages.

Virtual Memory: Program Structure

Demand paging is designed to be transparent to the user program. However, in many cases system performance can be improved by an awareness of the underlying demand paging.

Example: assume pages are 128 words in size, and the program initializes each element of a 128 x 128 array A(128,128) to 0:

    for i := 1 to 128 do
        for j := 1 to 128 do
            A(i,j) := 0;

How many page faults are generated (assuming the OS allocates fewer than 128 frames to the array)?
- Case 1: Pascal, C, and PL/1 store the matrix in row-major order, so {A(1,1), A(1,2), ..., A(1,128)} lie in the same page. The loop touches one new page per row: 128 page faults.
- Case 2: Fortran stores the matrix in column-major order, so {A(1,1), A(2,1), ..., A(128,1)} lie in the same page. The loop above touches a different page on every access: 128 x 128 = 16384 page faults.

Virtual Memory: I/O Interlock

The problem with I/O in swapping systems still exists in paging systems: a frame that is the target of pending I/O must not be replaced. The solution is usually to lock individual pages in memory (those with I/O pending) rather than locking the whole process in memory. The overall impact of page locking is better than the overhead of copying data in and out of OS buffers.

Virtual Memory: Demand Segmentation

Demand paging is the most efficient virtual-memory scheme, but its implementation requires substantial hardware support. Demand segmentation is used when there is insufficient hardware to implement demand paging.

OS/2 on the 80286 allocates memory in segments, which it keeps track of through segment descriptors. A segment descriptor contains a valid bit to indicate whether the segment is currently in memory:
- if the segment is in main memory, the access continues;
- if it is not in memory, a segment fault occurs, handled analogously to a page fault but for the whole segment.
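A minimal sketch in C of the valid-bit check on a segment descriptor. The struct layout, the segment_fault() handler, and the malloc-based swap-in are all hypothetical stand-ins for what the OS would actually do.

    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical segment descriptor: a valid bit plus the segment's
     * base address and limit. */
    struct seg_desc {
        unsigned valid : 1;   /* 1 if the segment is in main memory */
        uintptr_t base;       /* where the segment starts in memory */
        size_t limit;         /* segment length in bytes */
    };

    /* Stand-in for the OS path that brings the whole segment into
     * memory and updates the descriptor. */
    void segment_fault(struct seg_desc *d) {
        d->base = (uintptr_t)malloc(d->limit);  /* illustrative swap-in */
        d->valid = 1;
    }

    /* Translate a (descriptor, offset) reference, faulting if needed. */
    uintptr_t access_segment(struct seg_desc *d, size_t offset) {
        if (!d->valid)
            segment_fault(d);    /* segment fault: load, then continue */
        if (offset >= d->limit)
            abort();             /* protection violation */
        return d->base + offset; /* segment is in memory: access continues */
    }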