Virtual Memory

Announcements
• Prelim coming up in one week:
  – In 203 Thurston, Thursday October 16th, 10:10-11:25pm, 1½ hour
  – Topics: Everything up to (and including) Thursday, October 9th
    • Lectures 1-13, chapters 1-9, and 13 (8th ed)
• Review Session will be this Thursday, October 9th
  – Time and Location TBD: Possibly 6:30pm - 7:30pm
• Nazrul’s office hours changed for today
  – 12:30pm - 2:30pm in Upson 328
• Homework 3 due today, October 7th
• CS 4410 Homework 2 graded. (Solutions available via CMS.)
  – Mean 45 (stddev 5), High 50 out of 50
  – Common problems
    • Q1: did not satisfy bounded waiting; mutual exclusion was not violated

Homework #2, Question #1
• Dekker’s Algorithm (1965)

    // J denotes the other process, i.e. J = 1 - i
    CSEnter(int i) {
        inside[i] = true;
        while (inside[J]) {
            if (turn == J) {
                inside[i] = false;
                while (turn == J) continue;
                inside[i] = true;
            }
        }
    }

    CSExit(int i) {
        turn = J;
        inside[i] = false;
    }

Review: Multi-level Translation
• Illusion of a contiguous address space
• Physical reality
  – Address space broken into segments or fixed-size pages
  – Segments or pages spread throughout physical memory
• Could have any number of levels. Example (top segment):
  Figure: the virtual address is split into a 10-bit virtual segment #, a 10-bit virtual page #, and a 12-bit offset. The segment # selects one of eight base/limit pairs (Base0-Base7, Limit0-Limit7), each marked valid (V) or not (N); a page # beyond the limit raises an access error. The page # indexes a page table whose entries hold a frame # and permission bits (V, R, W; N for invalid); a failed permission check also raises an access error. The physical address is the physical frame # followed by the offset.
• What must be saved/restored on context switch?
  – Contents of top-level segment registers (for this example)
  – Pointer to top-level table (page table)

Review: Two-Level Page Table
  Figure: the virtual address is split into a 10-bit P1 index, a 10-bit P2 index, and a 12-bit offset. PageTablePtr points to the top-level table, each table entry is 4 bytes, and pages are 4KB. The physical address is a frame # followed by the offset.
• Tree of Page Tables
• Tables fixed size (1024 entries)
  – On context-switch: save single PageTablePtr register
• Sometimes, top-level page tables called “directories” (Intel)
• Each entry called a (surprise!) Page Table Entry (PTE)

What is in a PTE?
• What is in a Page Table Entry (or PTE)?
  – Pointer to next-level page table or to actual page
  – Permission bits: valid, read-only, read-write, execute-only
• Example: Intel x86 architecture PTE:
  – Address same format as previous slide (10, 10, 12-bit offset)
  – Intermediate page tables called “Directories”

  Bits    Field
  31-12   Page Frame Number (Physical Page Number)
  11-9    Free (OS)
  8       0
  7       L
  6       D
  5       A
  4       PCD
  3       PWT
  2       U
  1       W
  0       P

  P: Present (same as “valid” bit in other architectures)
  W: Writeable
  U: User accessible
  PWT: Page write transparent: external cache write-through
  PCD: Page cache disabled (page cannot be cached)
  A: Accessed: page has been accessed recently
  D: Dirty (PTE only): page has been modified recently
  L: L=1 means 4MB page (directory only). Bottom 22 bits of virtual address serve as offset

Examples of how to use a PTE
• How do we use the PTE?
  – Invalid PTE can imply different things:
    • Region of address space is actually invalid, or
    • Page/directory is just somewhere other than memory
  – Validity checked first
    • OS can use other (say) 31 bits for location info
• Usage Example: Demand Paging
  – Keep only active pages in memory
  – Place others on disk and mark their PTEs invalid
• Usage Example: Copy on Write
  – UNIX fork gives copy of parent address space to child
    • Address spaces disconnected after child created
  – How to do this cheaply?
    • Make copy of parent’s page tables (point at same memory)
    • Mark entries in both sets of page tables as read-only
    • Page fault on write creates two copies
• Usage Example: Zero Fill On Demand
  – New data pages must carry no information (say, be zeroed)
  – Mark PTEs as invalid; page fault on use gets zeroed page
  – Often, OS creates zeroed pages in background

How is the translation accomplished?
  Figure: the CPU issues virtual addresses to the MMU, which emits physical addresses.
• What exactly happens inside the MMU?
• One possibility: Hardware Tree Traversal
  – For each virtual address, takes page table base pointer and traverses the page table in hardware
  – Generates a “Page Fault” if it encounters invalid PTE
    • Fault handler will decide what to do
    • More on this next lecture
  – Pros: Relatively fast (but still many memory accesses!)
  – Cons: Inflexible, complex hardware
• Another possibility: Software
  – Each traversal done in software
  – Pros: Very flexible
  – Cons: Every translation must invoke a fault!
• In fact, need way to cache translations for either case!

Caching Concept
• Cache: a repository for copies that can be accessed more quickly than the original
  – Make frequent case fast and infrequent case less dominant
• Caching underlies many of the techniques that are used today to make computers fast
  – Can cache: memory locations, address translations, pages, file blocks, file names, network routes, etc.
• Only good if:
  – Frequent case frequent enough, and
  – Infrequent case not too expensive
• Important measure:
  Average Access Time = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)

Why Bother with Caching?
  Figure: Processor-DRAM memory gap (latency). Performance (log scale, 1 to 1000) versus time, 1980-2000. CPU (µProc) performance follows “Moore’s Law” (really Joy’s Law) at 60%/yr (2X/1.5 yr); DRAM follows “Less’ Law?” at 9%/yr (2X/10 yrs). The processor-memory performance gap grows 50%/year.

Another Major Reason to Deal with Caching
  Figure: the segment and page table translation path from the multi-level translation review slide (segment base/limit lookup, page table lookup, permission check, access errors).
• Too expensive to translate on every access
  – At least two DRAM accesses per actual DRAM access
  – Or: perhaps I/O if page table partially on disk!
• Even worse problem: What if we are using caching to make memory access faster than DRAM access?
• Solution? Cache translations!
  – Translation Cache: TLB (“Translation Lookaside Buffer”)

Why Does Caching Help? Locality!
  Figure: probability of reference plotted over the address space, from 0 to 2^n - 1.
• Temporal Locality (Locality in Time):
  – Keep recently accessed data items closer to processor
• Spatial Locality (Locality in Space):
  – Move contiguous blocks to the upper levels
  Figure: blocks move between upper-level memory (Blk X, next to the processor) and lower-level memory (Blk Y).

Review: Memory Hierarchy of a Modern Computer System
• Take advantage of the principle of locality to:
  – Present as much memory as in the cheapest technology
  – Provide access at speed offered by the fastest technology

  Level                                       Speed (ns)                  Size (bytes)
  Registers (processor datapath)              1s                          100s
  On-chip cache, second-level cache (SRAM)    10s-100s                    Ks-Ms
  Main memory (DRAM)                          100s                        Ms
  Secondary storage (disk)                    10,000,000s (10s ms)        Gs
  Tertiary storage (tape)                     10,000,000,000s (10s sec)   Ts

A Summary on Sources of Cache Misses
• Compulsory (cold start): first reference to a block
  – “Cold” fact of life: not a whole lot you can do about it
  – Note: When running “billions” of instructions, compulsory misses are insignificant
• Capacity:
  – Cache cannot contain all blocks accessed by the program
  – Solution: increase cache size
• Conflict (collision):
  – Multiple memory locations mapped to same cache location
  – Solutions: increase cache size, or increase associativity
• Two others:
  – Coherence (Invalidation): other process (e.g., I/O) updates memory
  – Policy: Due to non-optimal replacement policy

Review: Where does a Block Get Placed in a Cache?
• Example: Block 12 placed in 8 block cache
  – Direct mapped: block 12 can go only into block 4 (12 mod 8)
  – Set associative: block 12 can go anywhere in set 0 (12 mod 4)
  – Fully associative: block 12 can go anywhere
  Figure: a 32-block address space (block no. 0-31) mapped onto an 8-block cache (block no. 0-7), shown direct mapped, set associative (sets 0-3), and fully associative.

Other Caching Questions
• What line gets replaced on cache miss?
  – Easy for Direct Mapped: Only one possibility
  – Set Associative or Fully Associative:
    • Random
    • LRU (Least Recently Used)
• What happens on a write?
  – Write through: The information is written to both the cache and to the block in the lower-level memory
  – Write back: The information is written only to the block in the cache
    • Modified cache block is written to main memory only when it is replaced
    • Question: is block clean or dirty?

Caching Applied to Address Translation
  Figure: the CPU sends a virtual address to the TLB. If the translation is cached, the physical address goes straight to physical memory; if not, the MMU translates first. Data reads and writes pass untranslated.
• Question is one of page locality: does it exist?
  – Instruction accesses spend a lot of time on the same page (since accesses sequential)
  – Stack accesses have definite locality of reference
  – Data accesses have less page locality, but still some
• Can we have a TLB hierarchy?
  – Sure: multiple levels at different sizes/speeds

What Actually Happens on a TLB Miss?
• Hardware traversed page tables:
  – On TLB miss, hardware in MMU looks at current page table to fill TLB (may walk multiple levels)
    • If PTE valid, hardware fills TLB and processor never knows
    • If PTE marked as invalid, causes Page Fault, after which kernel decides what to do
• Software traversed page tables (like MIPS)
  – On TLB miss, processor receives TLB fault
  – Kernel traverses page table to find PTE
    • If PTE valid, fills TLB and returns from fault
    • If PTE marked as invalid, internally calls Page Fault handler
• Most chip sets provide hardware traversal
  – Modern operating systems tend to have more TLB faults since they use translation for many things
  – Examples:
    • shared segments
    • user-level portions of an operating system

Goals for Today
• Virtual memory
• How does it work?
  – Page faults
  – Resuming after page faults
• When to fetch?
• What to replace?
  – Page replacement algorithms
    • FIFO, OPT, LRU (Clock)
  – Page Buffering
  – Allocating Pages to processes

What is virtual memory?
• Each process has illusion of large address space
  – 2^32 bytes for 32-bit addressing
• However, physical memory is much smaller
• How do we give this illusion to multiple processes?
  – Virtual Memory: some addresses reside on disk
  Figure: pages 0 through N of virtual memory are mapped through the page table either to physical memory or to disk.

Virtual Memory
• Separates user logical memory from physical memory.
  – Only part of the program needs to be in memory for execution
  – Logical address space can therefore be much larger than physical address space
  – Allows address spaces to be shared by several processes
  – Allows for more efficient process creation

Virtual Memory
• Load entire process in memory (swapping), run it, exit
  – Is slow (for big processes)
  – Wasteful (might not require everything)
• Solutions: partial residency
  – Paging: only bring in pages, not all pages of process
  – Demand paging: bring only pages that are required
• Where to fetch page from?
  – Have a contiguous space on disk: swap file (pagefile.sys)

How does VM work?
• Modify Page Tables with another bit (“valid”)
  – If page in memory, valid = 1, else valid = 0
  – If page is in memory, translation works as before
  – If page is not in memory, translation causes a page fault

  Page   Entry   Valid
  0      32      V=1
  1      4183    V=0
  2      177     V=1
  3      5721    V=0

  Figure: page table entries with V=1 point into memory (Mem); entries with V=0 refer to disk.

Page Faults
• On a page fault:
  – OS finds a free frame, or evicts one from memory (which one?)
    • Want knowledge of the future?
  – Issues disk request to fetch data for page (what to fetch?)
    • Just the requested page, or more?
  – Block current process, context switch to new process (how?)
    • Process might be executing an instruction
  – When disk completes, set valid bit to 1, and put current process in ready queue

Steps in Handling a Page Fault
  Figure: steps in handling a page fault.

Resuming after a page fault
• Should be able to restart the instruction
• For RISC processors this is simple:
  – Instructions are idempotent until references are done
• More complicated for CISC:
  – E.g. move 256 bytes from one location to another
  – Possible Solutions:
    • Ensure pages are in memory before the instruction executes

Page Fault (Cont.)
• Restart instruction
  – block move
  – auto increment/decrement location

When to fetch?
• Just before the page is used!
  – Need to know the future
• Demand paging:
  – Fetch a page when it faults
• Prepaging:
  – Get the page on fault + some of its neighbors, or
  – Get all pages in use last time process was swapped

Performance of Demand Paging
• Page Fault Rate 0 ≤ p ≤ 1.0
  – if p = 0, no page faults
  – if p = 1, every reference is a fault
• Effective Access Time (EAT)
  EAT = (1 – p) x memory access + p x (page fault overhead + swap page out + swap page in + restart overhead)

Demand Paging Example
• Memory access time = 200 nanoseconds
• Average page-fault service time = 8 milliseconds
• EAT = (1 – p) x 200 + p x (8 milliseconds)
      = (1 – p) x 200 + p x 8,000,000
      = 200 + p x 7,999,800 (in nanoseconds)
• If one access out of 1,000 causes a page fault, EAT = 8.2 microseconds. This is a slowdown by a factor of about 40!

What to replace?
• What happens if there is no free frame?
  – Find some page that is in memory but not really in use, and swap it out
• Page Replacement
  – When process has used up all frames it is allowed to use
  – OS must select a page to eject from memory to allow new page
  – The page to eject is selected using the Page Replacement Algorithm
• Goal: Select page that minimizes future page faults

Page Replacement
• Prevent over-allocation of memory by modifying page-fault service routine to include page replacement
• Use modify (dirty) bit to reduce overhead of page transfers
  – Only modified pages are written to disk
• Page replacement completes separation between logical memory and physical memory
  – Large virtual memory can be provided on a smaller physical memory

Page Replacement
  Figure: page replacement.

Page Replacement Algorithms
• Random: Pick any page to eject at random
  – Used mainly for comparison
• FIFO: The page brought in earliest is evicted
  – Ignores usage
  – Suffers from “Belady’s Anomaly”
    • Fault rate could increase on increasing number of frames
    • E.g. 0 1 2 3 0 1 4 0 1 2 3 4 with 3 and 4 frames
• OPT: Belady’s algorithm
  – Select page that will not be used for the longest time
• LRU: Evict page that hasn’t been used for the longest time
  – Past could be a good predictor of the future

First-In-First-Out (FIFO) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process): 9 page faults
  Frame 1: 1 → 4 → 5
  Frame 2: 2 → 1 → 3
  Frame 3: 3 → 2 → 4
• 4 frames: 10 page faults
  Frame 1: 1 → 5 → 4
  Frame 2: 2 → 1 → 5
  Frame 3: 3 → 2
  Frame 4: 4 → 3

FIFO Illustrating Belady’s Anomaly
  Figure: FIFO page faults versus number of frames, illustrating Belady’s anomaly.

Optimal Algorithm
• Replace page that will not be used for longest period of time
• 4 frames example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 gives 6 page faults
  Frame 1: 1 → 4
  Frame 2: 2
  Frame 3: 3
  Frame 4: 4 → 5
• How do you know this?
• Used for measuring how well your algorithm performs

Least Recently Used (LRU) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• Frame contents at successive snapshots (4 frames):
  Frame 1: 1, 1, 1, 1, 5
  Frame 2: 2, 2, 2, 2, 2
  Frame 3: 3, 5, 5, 4, 4
  Frame 4: 4, 4, 3, 3, 3

Implementing Perfect LRU
• On reference: Time stamp each page
• On eviction: Scan for oldest frame
• Problems:
  – Large page lists
  – Timestamps are costly
• Approximate LRU
  – LRU is already an approximation!
  Figure: instructions at 0xffdcd (add r1,r2,r3) and 0xffdd0 (ld r1, 0(sp)) execute at times 13 and 14; pages carry timestamps t=4, t=14, t=14, t=5.

LRU: Clock Algorithm
• Each page has a reference bit
  – Set on use, reset periodically by the OS
• Algorithm:
  – FIFO + reference bit (keep pages in circular list)
    • Scan: if ref bit is 1, set to 0, and proceed. If ref bit is 0, stop and evict.
• Problem:
  – Low accuracy for large memory
  Figure: pages arranged in a circular list, each labeled with its reference bit (R=1 or R=0).

LRU with large memory
• Solution: Add another hand
  – Leading edge clears ref bits
  – Trailing edge evicts pages with ref bit 0
• What if angle small?
• What if angle big?
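
The single-hand scan described on the clock slide can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the class name `ClockReplacer` and its fields are hypothetical, and it assumes one common variant in which a newly loaded page starts with its reference bit set.

```python
class ClockReplacer:
    """Second-chance (clock) replacement over a fixed number of frames.

    Hypothetical helper for illustration; names are not from the slides.
    """

    def __init__(self, num_frames):
        self.pages = [None] * num_frames  # page held in each frame
        self.ref = [0] * num_frames       # reference bit per frame
        self.hand = 0                     # next frame the hand inspects
        self.faults = 0

    def access(self, page):
        """Touch `page`; return True if the access caused a page fault."""
        if page in self.pages:
            # On real hardware the MMU sets this bit on use.
            self.ref[self.pages.index(page)] = 1
            return False
        self.faults += 1
        # Scan: a frame with ref bit 1 gets a second chance (bit cleared).
        while self.pages[self.hand] is not None and self.ref[self.hand] == 1:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        # The hand now rests on an empty frame or a page with ref bit 0.
        self.pages[self.hand] = page
        self.ref[self.hand] = 1  # assumption: loaded pages start referenced
        self.hand = (self.hand + 1) % len(self.pages)
        return True


clock = ClockReplacer(3)
for p in [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]:
    clock.access(p)
print(clock.faults, clock.pages)
```

When every resident page has its bit set, the hand clears all of them and comes back to where it started, so the policy degrades to plain FIFO, which is the accuracy problem the slide notes for large memories.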

Clock Algorithm: Discussion
• Sensitive to sweeping interval
  – Fast: lose usage information
  – Slow: all pages look used
• Clock: add reference bits
  – Could use (ref bit, modified bit) as ordered pair
  – Might have to scan all pages
• LFU: Remove page with lowest count
  – No track of when the page was referenced
  – Use multiple bits. Shift right by 1 at regular intervals.
• MFU: Remove the most frequently used page
• LFU and MFU do not approximate OPT well

Page Buffering
• Cute simple trick: (XP, 2K, Mach, VMS)
  – Keep a list of free pages
  – Track which page the free page corresponds to
  – Periodically write modified pages, and reset modified bit
  Figure: evicted pages are added to the modified list or the unmodified free list; the modified list is written out in batches (batch writes = speed), and pages in use are taken from the free list.

Allocating Pages to Processes
• Global replacement
  – Single memory pool for entire system
  – On page fault, evict oldest page in the system
  – Problem: protection
• Local (per-process) replacement
  – Have a separate pool of pages for each process
  – Page fault in one process can only replace pages from its own process
  – Problem: might have idle resources

Allocation of Frames
• Each process needs minimum number of pages
• Example: IBM 370 – 6 pages to handle SS MOVE instruction:
  – instruction is 6 bytes, might span 2 pages
  – 2 pages to handle from
  – 2 pages to handle to
• Two major allocation schemes
  – fixed allocation
  – priority allocation

Summary
• Demand Paging:
  – Treat memory as cache for disk
  – Cache miss: get page from disk
• Transparent Level of Indirection
  – User program is unaware of activities of OS behind the scenes
  – Data can be moved without affecting application correctness
• Replacement policies
  – FIFO: Place pages on queue, replace page at end
  – OPT: Replace page that will be used farthest in future
  – LRU: Replace page that hasn’t been used for the longest time
• Clock Algorithm: Approximation to LRU
  – Arrange all pages in circular list
  – Sweep through them, marking as not “in use”
  – If page not “in use” for one pass, then it can be replaced
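
The fault counts quoted on the FIFO and Optimal slides can be checked with a small simulation. The sketch below is illustrative only: the function names `fifo_faults`, `lru_faults`, and `opt_faults` are hypothetical, and OPT is given the whole reference string up front, which a real OS cannot do.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, num_frames):
    """Count faults when the page brought in earliest is evicted."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.remove(queue.popleft())  # oldest arrival leaves
        frames.add(page)
        queue.append(page)
    return faults


def lru_faults(refs, num_frames):
    """Count faults when the least recently used page is evicted."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)  # mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)  # front of the dict is the LRU page
        frames[page] = True
    return faults


def opt_faults(refs, num_frames):
    """Count faults when the page used farthest in the future is evicted."""
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            future = refs[i + 1:]

            def next_use(p):
                # Pages never used again sort last and are evicted first.
                return future.index(p) if p in future else len(future)

            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print("FIFO:", fifo_faults(REFS, 3), fifo_faults(REFS, 4))
print("OPT :", opt_faults(REFS, 4))
print("LRU :", lru_faults(REFS, 4))
```

FIFO takes 9 faults with 3 frames and 10 with 4, which is Belady's anomaly on this reference string; OPT takes 6 faults with 4 frames, matching the slide, and LRU lands between the two at 8.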

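The arithmetic in the demand paging example can be replayed directly. This is a rough sketch using the numbers from that slide (200 ns memory access, 8 ms page-fault service time); the function names are hypothetical, and `max_fault_rate` is an added rearrangement of the same EAT formula rather than something the slides state.

```python
def effective_access_time_ns(p, memory_access_ns=200, fault_service_ns=8_000_000):
    """EAT = (1 - p) x memory access + p x page-fault service time, in ns."""
    return (1 - p) * memory_access_ns + p * fault_service_ns


def max_fault_rate(slowdown, memory_access_ns=200, fault_service_ns=8_000_000):
    """Largest fault rate p that keeps EAT within `slowdown` x memory access."""
    budget_ns = slowdown * memory_access_ns
    return (budget_ns - memory_access_ns) / (fault_service_ns - memory_access_ns)


eat = effective_access_time_ns(1 / 1000)
print(f"EAT at p = 1/1000: {eat / 1000:.1f} microseconds")
print(f"Slowdown versus a plain memory access: {eat / 200:.0f}x")
print(f"Fault rate for at most 10% slowdown: {max_fault_rate(1.10):.2e}")
```

Because the fault service time is tens of thousands of times larger than a memory access, even a tiny fault rate dominates the average, which is why the choice of replacement policy matters so much.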