Virtual Memory

Announcements
• TA swap:
  – Tyler Steele is a 415 TA
  – Rohit Doshi is a 414 TA
• My office hours today: 3:25-4:25

Review: Memory Hierarchy of a Modern Computer System
• Take advantage of the principle of locality to:
  – Present as much memory as is available in the cheapest technology
  – Provide access at the speed offered by the fastest technology
• The hierarchy (the processor itself also holds an on-chip cache between the registers/datapath and the second-level cache):

    Level                       Size (bytes)   Speed (ns)
    Registers                   100s           1s
    Second-level cache (SRAM)   Ks-Ms          10s-100s
    Main memory (DRAM)          Ms             100s
    Secondary storage (disk)    Gs             10,000,000s (10s ms)
    Tertiary storage (tape)     Ts             10,000,000,000s (10s sec)

Review: A Summary on Sources of Cache Misses
• Compulsory (cold start): first reference to a block
  – "Cold" fact of life: not a whole lot you can do about it
  – Note: when running billions of instructions, compulsory misses are insignificant
• Capacity:
  – The cache cannot contain all the blocks accessed by the program
  – Solution: increase the cache size
• Conflict (collision):
  – Multiple memory locations map to the same cache location
  – Solutions: increase the cache size, or increase associativity
• Two others:
  – Coherence (invalidation): another process (e.g., I/O) updates memory
  – Policy: due to a non-optimal replacement policy

Review: Where Does a Block Get Placed in a Cache?
• Example: block 12 of a 32-block address space placed in an 8-block cache
  – Direct mapped: block 12 can go only into block 4 (12 mod 8)
  – Set associative (2-way, sets 0-3): block 12 can go anywhere in set 0 (12 mod 4)
  – Fully associative: block 12 can go anywhere

Review: Other Caching Questions
• Which line gets replaced on a cache miss?
  – Easy for direct mapped: only one possibility
  – Set associative or fully associative:
    • Random
    • LRU (Least Recently Used)
• What happens on a write?
  – Write through: the information is written to both the cache and the block in lower-level memory
  – Write back: the information is written only to the block in the cache
    • A modified cache block is written to main memory only when it is replaced
    • Question: is the block clean or dirty?

Caching Applied to Address Translation
[Figure: the CPU issues a virtual address to the TLB; on a hit, the cached physical address goes straight to physical memory; on a miss, the MMU translates first. The data read or write itself is untranslated.]
• The question is one of page locality: does it exist?
  – Instruction accesses spend a lot of time on the same page (since accesses are sequential)
  – Stack accesses have definite locality of reference
  – Data accesses have less page locality, but still some
• Can we have a TLB hierarchy?
  – Sure: multiple levels at different sizes/speeds

What Actually Happens on a TLB Miss?
• Hardware-traversed page tables:
  – On a TLB miss, hardware in the MMU looks at the current page table to fill the TLB (it may walk multiple levels)
    • If the PTE is valid, the hardware fills the TLB and the processor never knows
    • If the PTE is marked invalid, a page fault is raised and the kernel decides what to do
• Software-traversed page tables (like MIPS; see the sketch below):
  – On a TLB miss, the processor receives a TLB fault
  – The kernel traverses the page table to find the PTE
    • If the PTE is valid, it fills the TLB and returns from the fault
    • If the PTE is marked invalid, it internally calls the page fault handler
• Most chipsets provide hardware traversal
  – Modern operating systems tend to have more TLB faults since they use translation for many things
  – Examples: shared segments, user-level portions of an operating system
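The software-traversal path above is easy to make concrete. Below is a minimal sketch of a MIPS-style software TLB miss handler, assuming a flat single-level page table; the names (pte_t, tlb_fill, page_fault_handler) and the PTE bit layout are hypothetical, not any particular kernel's API.

```c
/* Minimal sketch of a software-managed TLB miss handler, in the
 * style of a MIPS-like architecture.  All names (pte_t, tlb_fill,
 * page_fault_handler, the PTE bit layout) are assumptions; real
 * kernels differ in the details. */
#include <stdint.h>

#define PAGE_SHIFT 12                 /* assume 4 KB pages */
#define PTE_VALID  0x1u               /* assumed valid bit in the PTE */

typedef uint32_t pte_t;

extern pte_t *current_page_table;                /* one PTE per virtual page */
extern void tlb_fill(uint32_t vpn, pte_t pte);   /* hypothetical: load a TLB entry */
extern void page_fault_handler(uint32_t vaddr);  /* the slides' page-fault path */

/* Invoked by the TLB-fault trap: find the PTE and either refill the
 * TLB (valid) or fall through to the page fault handler (invalid). */
void tlb_miss_handler(uint32_t faulting_vaddr)
{
    uint32_t vpn = faulting_vaddr >> PAGE_SHIFT; /* virtual page number */
    pte_t pte = current_page_table[vpn];

    if (pte & PTE_VALID)
        tlb_fill(vpn, pte);                 /* refill and return from the fault */
    else
        page_fault_handler(faulting_vaddr); /* page not in memory */
}
```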
Goals for Today
• Virtual memory: how does it work?
  – Page faults
  – Resuming after page faults
• When to fetch?
• What to replace?
  – Page replacement algorithms: FIFO, OPT, LRU (Clock)
  – Page buffering
  – Allocating pages to processes

What Is Virtual Memory?
• Each process has the illusion of a large address space
  – 2^32 bytes for 32-bit addressing
• However, physical memory is much smaller
• How do we give this illusion to multiple processes?
  – Virtual memory: some addresses reside on disk
[Figure: a page table maps virtual pages 0 through N to physical frames; pages without a frame reside on disk.]

Virtual Memory
• Separates the user's logical memory from physical memory
  – Only part of the program needs to be in memory for execution
  – The logical address space can therefore be much larger than the physical address space
  – Allows address spaces to be shared by several processes
  – Allows for more efficient process creation

Virtual Memory
• Loading an entire process into memory (swapping), running it, and exiting is:
  – Slow (for big processes)
  – Wasteful (the process might not need everything)
• Solution: partial residency
  – Paging: bring in individual pages, not all pages of the process
  – Demand paging: bring in only the pages that are actually required
• Where to fetch a page from?
  – Keep a contiguous space on disk: the swap file (pagefile.sys)

How Does VM Work?
• Extend the page table entries with another bit ("valid")
  – If the page is in memory, valid = 1; otherwise valid = 0
  – If the page is in memory, translation works as before
  – If the page is not in memory, translation causes a page fault
[Figure: a four-entry page table; e.g., entry 0 maps to frame 32 with V=1 (in memory), entry 1 to disk block 4183 with V=0, entry 2 to frame 177 with V=1, entry 3 to disk block 5721 with V=0.]

Page Faults
• On a page fault, the OS:
  – Finds a free frame, or evicts a page from memory (which one? ideally we would want knowledge of the future)
  – Issues a disk request to fetch the data for the page (what to fetch? just the requested page, or more?)
  – Blocks the current process and context switches to another process (how? the process might be in the middle of an instruction)
  – When the disk completes, sets the valid bit to 1 and puts the faulting process back on the ready queue

Steps in Handling a Page Fault
[Figure: the standard sequence: the reference traps to the OS, the OS locates the page on the backing store, reads it into a free frame, updates the page table, and restarts the instruction.]

Resuming After a Page Fault
• Should be able to restart the instruction
• For RISC processors this is simple:
  – Instructions are idempotent until their memory references are done
• More complicated for CISC:
  – E.g., move 256 bytes from one location to another
  – Possible solution: ensure the pages are in memory before the instruction executes

Page Fault (Cont.)
• Other instructions that are hard to restart:
  – Block moves
  – Auto increment/decrement of a location

When to Fetch?
• Just before the page is used!
  – But that requires knowing the future
• Demand paging:
  – Fetch a page when it faults
• Prepaging:
  – Get the page on a fault plus some of its neighbors, or
  – Get all pages that were in use the last time the process was swapped out

Performance of Demand Paging
• Page fault rate p, with 0 <= p <= 1.0
  – If p = 0, no page faults
  – If p = 1, every reference is a fault
• Effective Access Time (EAT):
    EAT = (1 - p) x memory access time
        + p x (page fault overhead + swap page out + swap page in + restart overhead)

Demand Paging Example
• Memory access time = 200 nanoseconds
• Average page-fault service time = 8 milliseconds
• EAT = (1 - p) x 200 + p x 8,000,000
      = 200 + p x 7,999,800 nanoseconds
• If one access out of 1,000 causes a page fault, EAT = 8.2 microseconds
  – A slowdown by a factor of 40! (The sketch below works the numbers.)
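As a check on the arithmetic, here is a minimal sketch that evaluates the EAT formula with the example's values (200 ns memory access, 8 ms fault service, one fault per 1,000 accesses); the variable names are mine, not the slides'.

```c
/* Evaluate EAT = (1 - p) * t_mem + p * t_fault for the slides'
 * demand-paging example. */
#include <stdio.h>

int main(void)
{
    const double t_mem   = 200.0;       /* memory access time, ns */
    const double t_fault = 8000000.0;   /* page-fault service time, ns (8 ms) */

    double p   = 1.0 / 1000.0;          /* one fault per 1,000 accesses */
    double eat = (1.0 - p) * t_mem + p * t_fault;

    printf("EAT = %.1f ns (~%.1f us), slowdown = %.1fx\n",
           eat, eat / 1000.0, eat / t_mem);
    /* Prints: EAT = 8199.8 ns (~8.2 us), slowdown = 41.0x
     * (the slides round this to "a factor of 40"). */
    return 0;
}
```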
What to Replace?
• What happens if there is no free frame?
  – Find some page in memory that is not really in use and swap it out
• Page replacement:
  – Needed when a process has used up all the frames it is allowed to use
  – The OS must select a page to eject from memory to make room for the new page
  – The page to eject is chosen by the page replacement algorithm
• Goal: select the page that minimizes future page faults

Page Replacement
• Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
• Use the modify (dirty) bit to reduce the overhead of page transfers
  – Only modified pages are written back to disk
• Page replacement completes the separation between logical memory and physical memory
  – A large virtual memory can be provided on top of a smaller physical memory

Page Replacement
[Figure: replacement in action: swap the victim page out, mark its page table entry invalid, swap the desired page in, and update its entry.]

Page Replacement Algorithms
• Random: pick any page to eject at random
  – Used mainly for comparison
• FIFO: the page brought in earliest is evicted
  – Ignores usage
  – Suffers from Belady's anomaly: the fault rate can increase as the number of frames increases
  – E.g., the reference string 0 1 2 3 0 1 4 0 1 2 3 4 with 3 vs. 4 frames (see the FIFO simulation at the end of these notes)
• OPT (Belady's algorithm):
  – Evict the page that will not be used for the longest time
• LRU: evict the page that hasn't been used for the longest time
  – The past can be a good predictor of the future

First-In-First-Out (FIFO) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process): 9 page faults
• 4 frames: 10 page faults
  – One more frame, yet one more fault: Belady's anomaly

FIFO Illustrating Belady's Anomaly
[Figure: page faults plotted against the number of frames; under FIFO the curve is not monotonically decreasing.]

Optimal Algorithm
• Replace the page that will not be used for the longest period of time
• 4-frame example, reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults
• How do you know the future? You can't
  – OPT is used as a yardstick for measuring how well your algorithm performs

Least Recently Used (LRU) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• With 4 frames: 8 page faults, better than FIFO (10) but worse than OPT (6)

Implementing Perfect LRU
• On reference: time stamp each page
[Figure: each executed reference (add r1,r2,r3 at 0xffdcd, ld r1, 0(sp) at 0xffdd0) restamps its page; frames carry timestamps such as t=4, t=14, t=5.]
• On eviction: scan for the oldest frame
• Problems:
  – Large page lists
  – Timestamps are costly to maintain
• Approximate LRU instead
  – LRU is already an approximation!

LRU: Clock Algorithm
• Each page has a reference bit
  – Set on use, reset periodically by the OS
• Algorithm: FIFO + reference bit (keep pages in a circular list)
  – Scan: if the ref bit is 1, set it to 0 and proceed; if the ref bit is 0, stop and evict (see the sketch after the discussion below)
• Problem:
  – Low accuracy for large memories

LRU with Large Memory
• Solution: add another hand
  – The leading hand clears ref bits
  – The trailing hand evicts pages whose ref bit is still 0
• What if the angle between the hands is small?
• What if the angle is big?

Clock Algorithm: Discussion
• Sensitive to the sweeping interval
  – Too fast: lose usage information
  – Too slow: all pages look used
• Clock: add reference bits
  – Could use the (reference bit, modified bit) ordered pair
  – Might have to scan all pages
• LFU: remove the page with the lowest reference count
  – Keeps no track of when the page was referenced
  – Aging: use multiple bits and shift right by 1 at regular intervals
• MFU: remove the most frequently used page
• Neither LFU nor MFU approximates OPT well
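The one-handed clock described above fits in a few lines of code. This is a minimal sketch; the frame table layout and names (frame_t, clock_hand, clock_evict) are assumptions for illustration, not a real kernel's structures.

```c
/* Sketch of the one-handed clock (second-chance) algorithm: sweep a
 * circular list of frames, clearing ref bits until a frame with a
 * clear bit is found; that frame is the victim. */
#include <stdbool.h>
#include <stddef.h>

#define NFRAMES 64

typedef struct {
    int  page;        /* virtual page currently held by this frame */
    bool referenced;  /* ref bit: set (by hardware) on each use */
} frame_t;

static frame_t frames[NFRAMES];
static size_t  clock_hand = 0;   /* current position in the circular list */

/* Returns the index of the frame to evict. */
size_t clock_evict(void)
{
    for (;;) {
        frame_t *f = &frames[clock_hand];
        if (f->referenced) {
            f->referenced = false;               /* second chance: clear and move on */
            clock_hand = (clock_hand + 1) % NFRAMES;
        } else {
            size_t victim = clock_hand;          /* ref bit 0: evict this frame */
            clock_hand = (clock_hand + 1) % NFRAMES;
            return victim;
        }
    }
}
```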
Page Buffering
• Cute simple trick (XP, 2K, Mach, VMS):
  – Keep a list of free pages
  – Track which page each free frame still corresponds to
  – Periodically write modified pages out and reset the modified bit (batching the writes is what buys speed)
• On eviction, a dirty page goes onto the modified list and a clean page onto the free list; frames are reallocated from the free list

Allocating Pages to Processes
• Global replacement
  – A single memory pool for the entire system
  – On a page fault, evict the oldest page in the system
  – Problem: protection, since one process can take frames away from another
• Local (per-process) replacement
  – A separate pool of pages for each process
  – A page fault in one process can only replace pages of that same process
  – Problem: might leave resources idle

Allocation of Frames
• Each process needs a minimum number of frames
• Example: the IBM 370 needs 6 pages to handle the SS MOVE instruction:
  – The instruction is 6 bytes, so it might span 2 pages
  – 2 pages to handle the "from" operand
  – 2 pages to handle the "to" operand
• Two major allocation schemes:
  – Fixed allocation
  – Priority allocation

Summary
• Demand paging:
  – Treat memory as a cache for the disk
  – On a cache miss, get the page from disk
• Transparent level of indirection:
  – The user program is unaware of the OS activity behind the scenes
  – Data can be moved without affecting application correctness
• Replacement policies:
  – FIFO: place pages on a queue; replace the page at the end
  – OPT: replace the page that will be used farthest in the future
  – LRU: replace the page that hasn't been used for the longest time
• Clock algorithm: an approximation to LRU
  – Arrange all pages in a circular list
  – Sweep through them, marking them as not "in use"
  – If a page stays not "in use" for one full pass, it can be replaced
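To close, here is the FIFO simulation promised earlier: a small self-contained sketch (not part of the original slides) that replays the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 through FIFO replacement and reproduces Belady's anomaly, 9 faults with 3 frames but 10 with 4.

```c
/* Simulate FIFO page replacement on the slides' reference string
 * and count page faults for 3 and 4 frames. */
#include <stdio.h>

#define MAXFRAMES 8

/* Run FIFO over a reference string with `nframes` frames; return
 * the number of page faults. */
static int fifo_faults(const int *refs, int nrefs, int nframes)
{
    int frames[MAXFRAMES];
    int next = 0;      /* FIFO pointer: frame holding the oldest page */
    int used = 0;      /* frames filled so far */
    int faults = 0;

    for (int i = 0; i < nrefs; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (hit) continue;            /* page already resident */

        faults++;
        if (used < nframes) {
            frames[used++] = refs[i]; /* a free frame is available */
        } else {
            frames[next] = refs[i];   /* evict the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    const int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    const int n = (int)(sizeof refs / sizeof refs[0]);

    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3)); /* prints 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4)); /* prints 10 */
    return 0;
}
```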