CPS110: Page Replacement
Landon Cox
February 21, 2008

Dynamic address translation

A user process issues a virtual address; the translator (MMU) converts it to a physical address before physical memory is touched. The translator is just a data structure. Trade-offs:
- Flexibility (sharing, growth, virtual memory)
- Size of the translation data
- Speed of translation

2. Segmentation

Segment #   Base   Bound   Contents
0           4000   700     Code segment
1           0      500     Data segment
2           --     --      Unused
3           2000   1000    Stack segment

[Figure: a virtual address space addressed as (segment #, offset), with the stack at (3,000)-(3,fff), data at (1,000)-(1,4ff), and code at (0,0)-(0,6ff), mapped onto physical memory: code segment at 4000-46ff, stack segment at 2000-2fff, data segment at 0-4ff.]

Virtual addresses: VA = {b31, b30, ..., b12, b11, ..., b1, b0}. The high-order bits hold the segment number; the low-order bits hold the offset.

3. Paging

Very similar to segmentation, except memory is allocated in fixed-size chunks called pages. A virtual address splits into a virtual page number (e.g., the upper 20 bits) and an offset within the page (e.g., the low 12 bits).

The translation data is a page table:

Virtual page #   Physical page #
0                10
1                15
2                20
3                invalid
...              ...
1048574          invalid
1048575          invalid

Translation process:

    if (virtual page # is invalid) {
        trap to kernel
    } else {
        physical page = pageTable[virtual page].physPageNum
        physical address = {physical page}{offset}
    }

Page size trade-offs

- If the page size is too small, we get lots of page table entries and a big page table. With 4-byte pages and 4-byte PTEs: num PTEs = 2^32 / 2^2 = 2^30, about 1 billion, and 1 billion PTEs * 4 bytes/PTE = 4 GB. The page table alone would take up the entire 32-bit address space!
- What if we use really big (1 GB) pages? Internal fragmentation: wasted space within the page. (Recall external fragmentation: wasted space between pages/segments.)
- So we compromise between the two. The x86 page size is 4 KB for a 32-bit processor; the SPARC page size is 8 KB.
- For 4 KB pages, how big is the page table? 2^20, about 1 million, page table entries at 32 bits (4 bytes) per entry = 4 MB per page table.

Paging pros and cons

+ Simple memory allocation
+ Can share lots of small pieces (how would two processes share a page? how would two processes share an address space?)
+ Easy to grow the address space
– Large page tables (even when most entries are invalid)

Comparing translation schemes

- Base and bounds: unit of translation = the entire address space.
- Segmentation: unit of translation = a segment; a few large, variable-size segments per address space.
- Paging: unit of translation = a page; lots of small, fixed-size pages per address space.

What we want:
- Efficient use of physical memory (little external or internal fragmentation)
- Easy sharing between processes
- A space-efficient data structure

How we'll do this: modify the paging scheme.
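Before modifying the scheme, it may help to see the flat version in runnable form. Here is a minimal C sketch of the translation process above, assuming the 20-bit page number / 12-bit offset split; the pte_t layout and the trap_to_kernel stub are illustrative inventions, not any real MMU's interface.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdlib.h>

    #define OFFSET_BITS 12              /* 4 KB pages */
    #define NUM_PAGES   (1u << 20)      /* 2^20 virtual pages */

    /* Hypothetical PTE: physical page number plus a valid bit. */
    typedef struct {
        uint32_t phys_page_num;
        bool     valid;
    } pte_t;

    /* The flat array is the whole point -- and the whole problem:
     * all 2^20 entries must exist even if most are invalid. */
    static pte_t page_table[NUM_PAGES];

    static void trap_to_kernel(uint32_t va) {
        exit(1);    /* stand-in for raising a page fault */
    }

    uint32_t translate(uint32_t va) {
        uint32_t vpn    = va >> OFFSET_BITS;                /* upper 20 bits */
        uint32_t offset = va & ((1u << OFFSET_BITS) - 1);   /* low 12 bits  */
        if (!page_table[vpn].valid) {
            trap_to_kernel(va);
        }
        /* physical address = {physical page}{offset} */
        return (page_table[vpn].phys_page_num << OFFSET_BITS) | offset;
    }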
Course administration

[Figure: thread library score distributions, plotted by group number (1-12) and by initial submit date (2/10-2/19), with scores from 0 to ~70. Test 1: median = 41 (out of 64). Total: median = 74% (last semester's median was 98%).]

Next week:
- I will be out of town.
- Mid-term exam on Tuesday; Amre will proctor the exam.
- No discussion sections; Thursday's lecture is cancelled.
- Project 2 out on Thursday.
- Any questions about the exam?

Project 2:
- Posted on the web next Thursday (February 28th).
- Write a virtual memory manager: manage address spaces and page faults.
- Due 4 weeks from today (March 20th).
- Compared to Project 1: less difficult concepts (no synchronization), but lots of bookkeeping and corner cases, and no solution to compare against (e.g., thread.o).

4. Multi-level translation

A standard page table is just a flat array of page table entries: in VA = {b31, b30, ..., b12, b11, ..., b1, b0}, the high-order bits (the page number) index into the table and the low-order bits are the offset.

A multi-level page table uses a tree instead of an array: VA = {b31, ..., b22, b21, ..., b12, b11, ..., b0}. The first group of bits indexes into the level-1 table, the second group indexes into a level-2 table, and the low-order bits are the offset.

- What is stored in the level-1 page table? If valid, a pointer to a level-2 page table; if invalid, NULL.
- What is stored in the level-2 page table? If valid, a (PhysPage, Resident, Protection) entry; if invalid, the entry is marked invalid.

[Figure: a two-level tree. The level-1 table has entries 0-1023, most of them NULL; each valid entry points to a level-2 table whose entries 0-1023 hold (PhysPage, Res, Prot).]

- How does this save space? A sparse address space needs level-2 tables only for its valid regions; a single NULL level-1 entry covers an entire invalid range.
- What changes on a context switch? The OS points the MMU at the new process's level-1 table.

Multi-level translation pros and cons

+ Space-efficient for sparse address spaces
+ Easy memory allocation
+ Easy sharing
– What is the downside? Two extra lookups per reference (read the level-1 page table, then the level-2 page table); memory accesses just got really slow.
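Here is a minimal C sketch of the two-level lookup, assuming the 10/10/12 bit split above (1024-entry tables); the struct layouts and the trap_to_kernel stub are illustrative assumptions, not a real page-table format.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    #define L1_BITS     10
    #define L2_BITS     10
    #define OFFSET_BITS 12

    /* Level-2 entry: PhysPage, Resident, Protection (as in the figure). */
    typedef struct {
        uint32_t phys_page;
        bool     resident;
        uint8_t  prot;
    } l2_entry_t;

    typedef struct {
        l2_entry_t entries[1 << L2_BITS];
    } l2_table_t;

    /* Level-1 entry: pointer to a level-2 table, or NULL if the whole
     * 4 MB range it covers is invalid. */
    static l2_table_t *level1[1 << L1_BITS];

    static void trap_to_kernel(uint32_t va) { exit(1); }

    uint32_t translate(uint32_t va) {
        uint32_t i1     = va >> (L2_BITS + OFFSET_BITS);               /* b31..b22 */
        uint32_t i2     = (va >> OFFSET_BITS) & ((1u << L2_BITS) - 1); /* b21..b12 */
        uint32_t offset = va & ((1u << OFFSET_BITS) - 1);              /* b11..b0  */

        l2_table_t *l2 = level1[i1];        /* extra memory read #1 */
        if (l2 == NULL || !l2->entries[i2].resident) {
            trap_to_kernel(va);
        }
        /* extra memory read #2, then the actual access */
        return (l2->entries[i2].phys_page << OFFSET_BITS) | offset;
    }

Note how a single NULL level-1 entry stands in for 1024 invalid PTEs; that is the entire space savings, paid for with the two extra reads on every reference.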
Translation look-aside buffer

- Aka the "TLB": a hardware cache (from CPS 104) that maps virtual page #s to physical page #s.
- On a cache hit, we get the PTE very quickly.
- On a miss, use the page table, store the mapping in the TLB, and restart the instruction.
- What happens on a context switch? We have to flush the TLB, and it takes time to re-warm the cache. As in life, context switches may be expensive.

Replacement

Think of physical memory as a cache. What happens on a cache miss? A page fault, and the OS must decide what to evict. Goal: reduce the number of misses.

Review of replacement algorithms

1. Random: easy to implement, but not great results.
2. FIFO (first in, first out): replace the page that came in longest ago. Problem: popular pages often come in early, and FIFO doesn't consider when a page was last used.
3. OPT (optimal): replace the page that won't be needed for the longest time. Problem: requires knowledge of the future.
4. LRU (least-recently used): use past references to predict the future, exploiting "temporal locality." Problem: expensive to implement exactly. Why? You either have to keep a sorted list, or maintain timestamps and scan them on eviction, and you must update this info on every access (ugh).

LRU is itself just an approximation of OPT, so we could try approximating LRU instead: we don't have to replace the oldest page, just an old page.

Clock

Clock approximates LRU. What can the hardware give us? A "reference bit" for each PTE, set each time the page is accessed. (Why is this done in hardware? It may be too slow to do in software.)

What do "old" pages look like to the OS? Clear all the reference bits, then check later to see which are set:
- Time 0: clear the reference bit for a page.
- Time t: examine the reference bit.

This splits pages into two classes: those that have been touched lately and those that haven't. Clearing all the bits simultaneously is slow, so we try to spread the work out over time.

[Figure: resident virtual pages A-E arranged in a circle over physical pages 0-4, each entry a (physical page, virtual page) pair, with a rotating clock hand.]

When you need to evict a page:
1. Check the physical page pointed to by the clock hand.
2. If reference = 0, the page hasn't been touched in a while: evict it.
3. If reference = 1, the page has been accessed since the last sweep. What to do? Set reference = 0, rotate the clock hand, and try the next page.

Does this cause an infinite loop? No: the first sweep sets all the bits to 0, so we evict on the next sweep at the latest.

What about new pages? Put them behind the clock hand with the reference bit set to 1. This maximizes their chance of staying in memory.
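The sweep above translates almost line-for-line into code. A minimal sketch, assuming the OS keeps a software-visible copy of each frame's reference bit; frame_t and clock_evict are illustrative names, not Project 2's interface.

    #include <stdbool.h>
    #include <stddef.h>

    #define NUM_FRAMES 5

    /* Illustrative bookkeeping: one entry per physical page. */
    typedef struct {
        int  virtual_page;   /* resident virtual page, or -1 if free */
        bool referenced;     /* set on each access to the page */
    } frame_t;

    static frame_t frames[NUM_FRAMES];
    static size_t  clock_hand = 0;

    /* Returns the index of the frame to evict.  Terminates because the
     * first full sweep clears every reference bit, so the second sweep
     * must find a frame with referenced == false. */
    size_t clock_evict(void) {
        for (;;) {
            frame_t *f = &frames[clock_hand];
            clock_hand = (clock_hand + 1) % NUM_FRAMES;
            if (!f->referenced) {
                /* Untouched since the last sweep: evict it.  The new
                 * page goes into this frame, i.e., just behind the
                 * advanced hand, with referenced = true. */
                return (size_t)(f - frames);
            }
            f->referenced = false;   /* give it a second chance */
        }
    }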
Paging out

- What can we do with evicted pages? Write them to disk.
- When don't you need to write to disk? When the disk already has the data (the page is clean), or when you can recompute the page's content (e.g., a zero page).
- Why set the dirty bit in hardware? If software set it on every store, it would be too slow.
- Why not write to disk on each store? Too slow. It's better to defer the work: you might not have to do it at all! (Except in CPS 110.)
- When does the work of writing to disk go away? If you store to the page again, or if the owning process exits before eviction.
- Project 2: other work you can defer includes initializing a page with zeroes and taking faults.
- A faulted-in page must wait for the disk write. Can we avoid this work too? Evict clean (non-dirty) pages first, or write out pages during idle periods. Project 2: don't do either of these!

Hardware page table info

What should go in a PTE?

Field             Set by / checked by
Physical page #   Set by the OS to control translation; checked by the MMU on each access.
Resident          Set by the OS; checked by the MMU on each access.
Protection (r/w)  Set by the OS to control access; checked by the MMU on each access.
Dirty             Set by the MMU when the page is modified; used by the OS to see if the page is dirty.
Reference         Set by the MMU when the page is used; used by the OS to see if the page has been referenced.

What bits does an MMU need to make access decisions? It needs to know whether the page is resident, readable, or writable. Do we really need a resident bit? No: if the page is non-resident, set R = W = 0.

MMU algorithm:

    if (VP # is invalid || non-resident || protected) {
        trap to OS fault handler
    } else {
        physical page = pageTable[virtual page].physPageNum
        physical address = {physical page}{offset}
        pageTable[virtual page].referenced = 1
        if (access is write) {
            pageTable[virtual page].dirty = 1
        }
    }

Project 2: the infrastructure performs the MMU functions. Note that the P2 page table entry definition has no dirty/reference bits.

Hardware page table entries

Do PTEs need to store disk block numbers? No. Only the OS needs this (the MMU doesn't). What per-page info does the OS maintain? Which virtual pages are valid, and the on-disk locations of virtual pages.

Do we really need a dirty bit? Claim: the OS can emulate it at reasonable overhead. How? Keep the page read-only; the MMU will fault on a store, and the OS (you) now knows the page is dirty. Do we need to fault on every store? No: after the first store, set the page writable. When do we make it read-only again? When it's clean (e.g., written to disk and paged in).

Do we really need a reference bit? Claim: the OS can emulate it at reasonable overhead. How? Keep the page unreadable; the MMU will fault on a load/store, and the OS (you) now knows the page has been referenced. Do we need to fault on every load/store? No: after the first load/store, set the page readable.

Application's perspective

The VM system manages page permissions, and the application is totally unaware of faults, etc. Most OSes allow apps to request page protections (e.g., make their code pages read-only). In Project 2, the app has no control over page protections and assumes all pages are readable and writable.
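Tying the last two ideas together, here is a minimal sketch of a fault handler that emulates dirty and reference bits with page protections, in the spirit of (but not identical to) Project 2. The vpage_t fields and the set_protection hook are assumptions invented for illustration.

    #include <stdbool.h>

    /* Illustrative per-page OS state; the hardware PTE carries only
     * protection bits in this scheme. */
    typedef struct {
        bool resident;
        bool referenced;   /* emulated reference bit */
        bool dirty;        /* emulated dirty bit */
    } vpage_t;

    /* Stub: a real VMM would update the hardware PTE's protection. */
    static void set_protection(int vpn, bool readable, bool writable) {
        (void)vpn; (void)readable; (void)writable;
    }

    /* Called on an MMU fault for a resident page. */
    void fault_handler(int vpn, bool is_write, vpage_t *vp) {
        if (!vp->referenced) {
            /* Page was kept unreadable; this fault proves it was
             * touched.  Make it readable so later loads don't fault. */
            vp->referenced = true;
            set_protection(vpn, true, false);
        }
        if (is_write) {
            /* Page was kept read-only; this fault proves it is now
             * dirty.  Make it writable so later stores don't fault. */
            vp->dirty = true;
            set_protection(vpn, true, true);
        }
    }

    /* To re-arm the emulated bits, drop permissions again:
     * clearing the reference bit -> set_protection(vpn, false, false);
     * the page becomes clean     -> set_protection(vpn, true, false). */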