CS 61C: Great Ideas in Computer Architecture (Machine Structures)
Virtual Memory Management
Instructors: Randy H. Katz, David A. Patterson
http://inst.eecs.Berkeley.edu/~cs61c/Sp11
Spring 2011 -- Lecture #24

New-School Machine Structures
Today's Big Idea: Memory Hierarchy
• Parallel Requests: assigned to a computer, e.g., search "Katz"
• Parallel Threads: assigned to a core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates @ one time
(Software and hardware harness parallelism to achieve high performance. Diagram: from the warehouse-scale computer and smart phone down through cores, instruction and functional units, caches, main memory, input/output, and logic gates; this lecture's topic, virtual memory, sits between the computer and main-memory levels.)

Overarching Theme for Today
"Any problem in computer science can be solved by an extra level of indirection."
– Often attributed to Butler Lampson (Berkeley PhD and Professor, Turing Award winner), who in turn attributed it to David Wheeler, a British computer scientist, who also said "… except for the problem of too many layers of indirection!"

Agenda
• HW Memory Protection
• Administrivia
• Virtual Memory
• Translation Lookaside Buffer
• Technology Break
• Another View of Virtual Memory
• Summary

Review: C Memory Management
• C has three pools of data memory (plus code memory)
– Static storage: global variable storage; basically permanent, lasts the entire program run
– The Stack: local variable storage, parameters, return addresses
– The Heap (dynamic storage): malloc() grabs space from here, free() returns it
• Common (dynamic) memory problems
– Using uninitialized values
– Accessing memory beyond your allocated region
– Improper use of free()/realloc() by messing with the pointer handle returned by malloc()
– Memory leaks: mismatched malloc()/free() pairs
(Memory layout, from ~0xFFFFFFFF down to ~0x0: stack, heap, static data, code. The OS prevents accesses between stack and heap via virtual memory.)
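As a concrete illustration, here is a minimal C sketch of the three data pools and the heap pitfalls listed above; the names (global_count, example, buf) are invented for this example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int global_count = 0;              /* static storage: lives for the whole run  */

void example(void) {
    int local = 42;                /* stack: disappears when example() returns */
    char *buf = malloc(16);        /* heap: caller must free() exactly once    */
    if (buf == NULL) return;       /* malloc can fail                          */

    strcpy(buf, "hello");          /* OK: 6 bytes fit in the 16 allocated      */
    /* buf[16] = 'x';      BUG: access beyond the allocated region             */
    /* buf++; free(buf);   BUG: messes with the handle returned by malloc      */

    free(buf);                     /* matched malloc/free pair: no leak        */
    printf("%d %d\n", global_count, local);
}

int main(void) {
    example();
    return 0;
}
```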
Simplest Model
• Only one program running on the computer
– Addresses in the program are exactly the physical memory addresses
• Extensions to the simple model:
– What if there is less physical memory than the full address space?
– What if we want to run multiple programs at the same time?

Problem #1: Physical Memory Less Than the Full Address Space
• One architecture, many implementations, with possibly different amounts of memory
• Memory used to be very expensive and physically bulky
• Where does the stack grow from then?
(Diagram: the "logical"/"virtual" address space from ~0x0 to ~0xFFFFFFFF holds code, static data, heap, and stack, but the real memory is smaller.)

Idea: Level of Indirection to Create the Illusion of Large Physical Memory
(Diagram: the high-order bits of the virtual address index an address map or table that supplies the high-order bits of the physical address; virtual "page" addresses 0–7, covering code, static data, heap, and stack, map to physical "page" addresses in real memory.)

Problem #2: Multiple Programs Sharing the Machine's Address Space
• How can we run multiple programs without accidentally stepping on the same addresses?
• How can we protect programs from clobbering each other?
(Diagram: Application 1 and Application 2 each have their own stack, heap, static data, and code in a ~0x0 to ~0xFFFFFFFF address space.)

Idea: Level of Indirection to Create the Illusion of Separate Address Spaces
(Diagram: each application's virtual pages 0–7 map through its own table to disjoint pages of real memory.)
• One table per running application, OR swap table contents when switching

Extension to the Simple Model
• Multiple programs sharing the same address space
– E.g., the operating system uses the low end of the address range, shared with the application
– Multiple programs in a shared (virtual) address space
• Static management: fixed partitioning/allocation of space
• Dynamic management: programs come and go, take different amounts of time to execute, and use different amounts of memory
• How can we protect programs from clobbering each other?
• How can we allocate memory to applications on demand?

Static Division of Shared Address Space
(Diagram: the operating system occupies the low 2^28 bytes (256 MB) of the address space, ~0x0 to ~0x0FFFFFFF, with its own stack, heap, static data, and code; the application gets the rest, ~0x10000000 to ~0xFFFFFFFF.)
• E.g., how do we manage the carving up of the address space among the OS and applications?
• Where does the OS end and the application begin?
• Dynamic management, with protection, would be better!

First Idea: Base + Bounds Registers for Location Independence
• Location-independent programs: for ease of programming and storage management, we need a base register
• Protection: independent programs should not affect each other inadvertently, so we need a bound register
(Diagram: prog1 and prog2 loaded at different addresses in physical memory, between address 0 and the maximum address.)
• Historically, base + bounds registers were a very early idea in computer architecture

Simple Base and Bound Translation
• The Base Register holds the base physical address of the current segment; the physical address of, e.g., a lw is the base plus the effective address
• The Bound Register holds the segment length; the effective address is compared against it to detect a bounds violation
• Base and bounds registers are visible/accessible to the programmer
• Trap to the OS if a bounds violation is detected ("seg fault"/"core dumped")
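A software sketch of the base-and-bound check described above, assuming 32-bit addresses; the struct and function names are hypothetical, since the real check happens in hardware on every access:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-program register state. */
typedef struct {
    uint32_t base;   /* physical address where the segment starts */
    uint32_t bound;  /* segment length in bytes                   */
} seg_regs_t;

/* Translate an effective address, trapping on a bounds violation. */
bool translate(const seg_regs_t *r, uint32_t eff, uint32_t *phys) {
    if (eff >= r->bound) {          /* bounds violation: trap to the OS  */
        fprintf(stderr, "seg fault at 0x%x\n", (unsigned)eff);
        return false;
    }
    *phys = r->base + eff;          /* relocation: add the base register */
    return true;
}

int main(void) {
    seg_regs_t r = { .base = 0x40000, .bound = 0x10000 };  /* 64 KB segment */
    uint32_t pa;
    if (translate(&r, 0x1234, &pa))
        printf("0x1234 -> 0x%x\n", (unsigned)pa);   /* prints 0x41234      */
    translate(&r, 0x20000, &pa);                    /* out of bounds       */
    return 0;
}
```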
Programs Sharing Memory
(Diagram, three snapshots of physical memory: initially OS space plus pgm 1 (16K), pgm 2 (24K), a 24K hole, pgm 3 (32K), and a 24K hole. When pgms 4 & 5 arrive, pgm 4 (16K) fills part of the first hole, leaving 8K, and pgm 5 (24K) fills the second. When pgms 2 & 5 leave, 24K, 8K, and 24K holes remain.)
• Why do we want to run multiple programs? Run others while waiting for I/O
• What prevents programs from accessing each other's data?

Restriction on Base + Bounds Regs
• Want only the operating system to be able to change the Base and Bound registers
• Processors need different execution modes:
1. User mode: can use the Base and Bound registers, but cannot change them
2. Supervisor mode: can use and change the Base and Bound registers
• Also need a Mode Bit (0 = User, 1 = Supervisor) to determine the processor mode
• Also need a way for a program in User mode to invoke the operating system in Supervisor mode, and vice versa

Programs Sharing Memory, Continued
(Same diagram as before.) As programs come and go, the storage becomes "fragmented", so at some stage programs have to be moved around to compact the storage. Is there an easy way to do this?

Administrivia
• Extra Credit due 4/24 – "Fastest" Matrix Multiply
– EC is not limited to the three fastest projects! You get EC for a 20% improvement over your Project #3
– Submit working code if your Project #3 did not work
• Face-to-face "grading" of Project #4 in lab this week
• Final Review: Mon 5/2, 5–8 PM, 2050 VLSB
• Final Exam: Mon 5/9, 11:30 AM–2:30 PM, 100 Haas Pavilion
– Designed for 90 minutes; you will have 3 hours
– Comprehensive (particularly problem areas on the midterm), but focused on the course since the midterm: lecture, lab, HWs, and projects are fair game
– 8½ × 11 inch crib sheet, as for the midterm
• http://www.geekosystem.com/engineering-professor-meme/2/

Idea #2: Page Tables to Avoid Memory Fragmentation
• Divide the memory address space into equal-sized blocks, called pages
– Traditionally 4 KB or 8 KB
• Use a level of indirection to map program addresses into memory addresses
– One indirection mapping per address-space page
• This table of mappings is called a page table

Paged Memory Systems
• The processor-generated address (a 32-bit byte address) is split into a 20-bit page number and a 12-bit offset (4096-byte pages)
• The page table contains the physical address of the base of each page; think of it as an array of base registers (or pointers)
(Diagram: a program consisting of four 4 KB pages, 16384 bytes in all; virtual pages 0–3 map through the page table into scattered pages of physical memory.)
• Page tables make it possible to store the pages of a program non-contiguously.
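The split-and-index operation fits in a few lines of C. This is a sketch of a single-level lookup assuming 32-bit addresses, 4 KB pages, and a flat page_table array holding only physical page numbers (real entries also carry valid and access-rights bits):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS  12u                        /* 4096-byte pages           */
#define NUM_VPAGES (1u << (32 - PAGE_BITS))   /* 2^20 virtual pages        */

/* Flat toy page table: one physical page number per virtual page. */
static uint32_t page_table[NUM_VPAGES];

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;               /* 20-bit page no. */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);  /* 12-bit offset   */
    uint32_t ppn    = page_table[vpn];                  /* the indirection */
    return (ppn << PAGE_BITS) | offset;                 /* physical addr   */
}

int main(void) {
    page_table[0x00400] = 0x12345;               /* map one page for demo  */
    printf("0x%x\n", (unsigned)translate(0x00400ABCu)); /* 0x12345abc      */
    return 0;
}
```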
Separate Address Space per Program
(Diagram: Pgm 1, Pgm 2, and Pgm 3 each translate the same virtual address VA1 through their own page tables to different pages of physical memory, which also holds OS pages and free pages.)
• Each program has its own page table
• The page table contains an entry for each program page

Paging Terminology
• Program addresses are called virtual addresses
– The space of all virtual addresses is called virtual memory
• Memory addresses are called physical addresses
– The space of all physical addresses is called physical memory

Processes and Virtual Memory
• Allow multiple processes to simultaneously occupy memory, and provide protection: don't let one program read/write memory belonging to another
– Each has its own PC, stack, and heap
– Like threads, except processes have separate address spaces
• Address space: give each program the illusion that it has its own private memory
– Suppose code starts at address 0x40000000. Different processes have different code, both residing at the same (virtual) address, so each program has a different view of memory.

Combine Idea #1 and Idea #2: Protection via the Page Table
• Access Rights are checked on every access to see if it is allowed
– Read: can read, but not write, the page
– Read/Write: can read or write data on the page
– Execute: can fetch instructions from the page
• Valid = valid page table entry

More Depth on Page Tables
(Diagram: the virtual address is split into a 20-bit page number and a 12-bit offset. The Page Table Base Register locates the page table in physical memory; the page number indexes into the table. Each entry holds a Valid bit, Access Rights, and a Physical Page Address; combining the physical page address with the offset yields the physical memory address.)
• The page table itself is located in physical memory

Address Translation & Protection
(Diagram: the Virtual Page Number (VPN) is translated to a Physical Page Number (PPN) while the offset passes through unchanged; a protection check, using Kernel/User mode and Read/Write permissions, can raise an exception.)
• Every instruction and data access needs address translation and protection checks

Patterson's Analogy
• A book title is like a virtual address
• A Library of Congress call number is like a physical address
• The card catalogue is like a page table, mapping from book title to call number
• A note on the card, "available for 2-hour in-library use" (vs. a 2-week checkout), is like access rights

Where Should Page Tables Reside?
• The space required by the page tables is proportional to the address space, the number of users, …
– The space requirement is large: e.g., a 2^32-byte address space with 2^12-byte pages needs 2^20 table entries = 1024 × 1024 entries (per program!)
– How many bits per page table entry?
– Too expensive to keep in processor registers!
• Idea: keep the page tables in main memory
– One memory reference to retrieve the page base address from the table in memory
– A second memory access to retrieve the data word
– This doubles the number of memory references! Why is this bad news?
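To see the size concretely, a quick back-of-the-envelope computation of the per-program page-table size, assuming (as an illustration; the slide leaves the entry size as a question) 4 bytes per entry:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t addr_space = 1ull << 32;     /* 2^32-byte address space        */
    uint64_t page_size  = 1ull << 12;     /* 2^12-byte (4 KB) pages         */
    uint64_t entries    = addr_space / page_size;   /* 2^20 = 1,048,576     */
    uint64_t pte_bytes  = 4;              /* assumed page-table entry size  */

    printf("entries per table: %llu\n", (unsigned long long)entries);
    printf("table size: %llu MB per program\n",
           (unsigned long long)((entries * pte_bytes) >> 20));  /* 4 MB     */
    return 0;
}
```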
Page Tables Can Be HUGE: Put Them in Physical Memory
(Diagram: the page tables for User 1 and User 2 themselves live in physical memory, alongside the users' mapped pages; each user's virtual address VA1 maps through the corresponding table.)

Virtual Memory Without Doubling Memory Accesses
• Caches suggest that there is temporal and spatial locality of data
• Locality of data really means locality of the addresses of that data
• What about locality of the translations of virtual page addresses into physical page addresses?
• The resulting cache is, for historical reasons, called the Translation Lookaside Buffer (TLB)
– A more accurate name would be Page Table Address Cache

Translation Lookaside Buffers (TLB): Another Layer of Indirection!
• Address translation is very expensive! Each reference becomes 2 memory accesses
• Solution: cache address translations in the TLB!
– TLB hit: single-cycle translation
– TLB miss: access the page table to refill the TLB
(Diagram: the virtual address splits into a VPN (virtual page number) and an offset; each TLB entry holds V, R, W, X bits, a tag, and a PPN (physical page number). On a hit, the PPN is concatenated with the offset to form the physical address.)

TLB Design
• Typically 32–128 entries
– Usually fully associative: why wouldn't direct-mapped work?
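A sketch of the fully associative hit path: every entry's tag is compared against the VPN. In hardware all comparisons happen in parallel; the sequential loop below is only a software model, and the sizes and names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64                 /* typical: 32-128 entries          */

typedef struct {
    bool     valid;
    bool     r, w, x;                  /* access-right bits R, W, X        */
    uint32_t tag;                      /* virtual page number (VPN)        */
    uint32_t ppn;                      /* physical page number             */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and fills in *paddr. */
bool tlb_hit(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> 12, offset = vaddr & 0xFFF;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].tag == vpn) {       /* tag match        */
            *paddr = (tlb[i].ppn << 12) | offset;      /* PPN ++ offset    */
            return true;
        }
    }
    return false;   /* miss: access the page table to refill the TLB */
}
```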
Historical Retrospective: 1960 versus 2010
• Memory used to be very expensive, and the amount available to the processor was highly limited
– Now memory is cheap: approximately $20 per GByte in April 2011
• Many apps' data could not fit in main memory, e.g., payroll
– Paged memory systems reduced fragmentation, but still required the whole program to be resident in main memory
– For good performance today, buy enough memory to hold your apps
• Programmers moved data back and forth from the disk store by overlaying it repeatedly on the primary store
– Programmers no longer need to worry about this level of detail

Demand Paging in Atlas (1962)
• "A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor." – Tom Kilburn
• Primary memory as a cache for secondary memory
• The user sees 32 × 6 × 512 words of storage
(Diagram: primary (central) memory of 32 pages at 512 words/page; secondary storage (~disk) of 32 × 6 pages.)

Demand Paging Scheme
• On a page fault:
– A transfer into a free page is initiated
– If no free page is available, a page is selected to be replaced (based on usage)
– The replaced page is written to the disk
• To minimize the effect of disk latency, the first empty page on the disk was selected
– The page table is updated to point to the new location of the page on the disk

Notes on Page Tables
• Solve the fragmentation problem: all chunks are the same size, so any holes can be used
• The OS must reserve Swap Space on disk for each process
• To grow a process, ask the operating system
– If unused pages are available, the OS uses them first
– If not, the OS swaps some old pages to disk (using, e.g., Least Recently Used to pick the pages to swap)
• How/why would you grow a process?

Impact on TLB
• Keep track of whether a page needs to be written back to disk, i.e., whether it has been modified
• Set the "Page Dirty Bit" in the TLB when any data in the page is written
• When a TLB entry is replaced, the corresponding Page Dirty Bit is set in the Page Table Entry

Hardware/Software Support for Memory Protection
• Different tasks can share parts of their virtual address spaces
– But we need to protect against errant access
– Requires OS assistance
• Hardware support for OS protection:
– Privileged supervisor mode (aka kernel mode)
– Privileged instructions
– Page tables and other state information accessible only in supervisor mode
– System call exception (e.g., syscall in MIPS)

Modern Virtual Memory Systems: the Illusion of a Large, Private, Uniform Store
• Protection & privacy: several users, each with a private address space (named by user i's page table) and one or more shared address spaces
• Demand paging provides the ability to run programs larger than primary memory, and hides differences in machine configurations (primary memory backed by a swapping store)
• The price is address translation (a VA → PA mapping) on each memory reference; disk is so slow that performance suffers if the program goes to disk all the time ("thrashing")

Another View of Virtual Memory: Just Another Part of the Memory Hierarchy (§5.4 Virtual Memory)
• Use main memory as a "cache" for secondary (disk) storage
– Managed jointly by the CPU hardware and the operating system (OS)
• Programs share main memory
– Each gets a private virtual address space holding its frequently used code and data
– Each is protected from other programs
• The CPU and OS translate virtual addresses to physical addresses
– A VM "block" is called a page
– A VM translation "miss" is called a page fault

Just Another View of the Memory Hierarchy
(Diagram, upper level (faster) to lower level (larger): registers hold instructions and operands; the cache and L2 cache hold blocks; memory holds pages — virtual memory begins here; disk holds files; then tape.)

Caching vs. Demand Paging
(Caching: CPU → cache → primary memory. Demand paging: CPU → primary memory → secondary memory.)
  cache entry                   | page frame
  cache block (~32 bytes)       | page (~4K bytes)
  cache miss rate (1% to 20%)   | page miss rate (<0.001%)
  cache hit (~1 cycle)          | page hit (~100 cycles)
  cache miss (~100 cycles)      | page miss (~5M cycles)
  miss handled in hardware      | miss handled mostly in software

Address Translation: Putting It All Together
(Flow chart: on a virtual address, hardware performs a TLB lookup. On a hit, a protection check follows: if access is permitted, the physical address goes to the cache; if denied, a protection fault (SEGFAULT) is raised in software. On a miss, the page table is walked in hardware or software: if the page is in memory, the TLB is updated and the access retried; if not, a page fault is raised, the OS loads the page in software, and the instruction is restarted.)
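The whole flow chart fits in one function. The sketch below is a toy software model, not any real MMU: a 16-page address space, a 4-entry TLB, made-up permission bits, and a trivial replacement policy, but the hit / refill / page-fault / protection-fault cases mirror the diagram:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { PERM_R = 1, PERM_W = 2, PERM_U = 4 };  /* read / write / user-ok   */
typedef enum { XLATE_OK, XLATE_PAGE_FAULT, XLATE_PROT_FAULT } result_t;

typedef struct { bool present; uint32_t ppn, perms; } pte_t;
static pte_t page_table[16];                  /* toy 16-page address space */

typedef struct { bool valid; uint32_t vpn, ppn, perms; } tlb_entry_t;
static tlb_entry_t tlb[4];                    /* toy 4-entry TLB           */

result_t translate(uint32_t vaddr, bool is_write, bool user, uint32_t *pa) {
    uint32_t vpn = vaddr >> 12, off = vaddr & 0xFFF, ppn, perms;
    int i;
    for (i = 0; i < 4 && !(tlb[i].valid && tlb[i].vpn == vpn); i++) ;
    if (i < 4) {                               /* TLB hit                  */
        ppn = tlb[i].ppn; perms = tlb[i].perms;
    } else {                                   /* TLB miss: walk the table */
        if (vpn >= 16 || !page_table[vpn].present)
            return XLATE_PAGE_FAULT;           /* OS loads page, restarts  */
        ppn = page_table[vpn].ppn; perms = page_table[vpn].perms;
        tlb[vpn % 4] = (tlb_entry_t){true, vpn, ppn, perms};  /* refill    */
    }
    if ((is_write && !(perms & PERM_W)) || (user && !(perms & PERM_U)))
        return XLATE_PROT_FAULT;               /* denied: SEGFAULT         */
    *pa = (ppn << 12) | off;                   /* physical address to cache */
    return XLATE_OK;
}

int main(void) {
    page_table[1] = (pte_t){true, 5, PERM_R | PERM_U};  /* map VPN 1 -> PPN 5 */
    uint32_t pa;
    printf("%d\n", translate(0x1234, false, true, &pa)); /* OK: pa = 0x5234   */
    printf("%d\n", translate(0x1234, true,  true, &pa)); /* protection fault  */
    printf("%d\n", translate(0x9234, false, true, &pa)); /* page fault        */
    return 0;
}
```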
Address Translation in the CPU Pipeline
(Diagram: the five-stage pipeline — PC, Decode, Execute, Memory, Writeback — with an Inst TLB before the instruction cache and a Data TLB before the data cache; a TLB miss, page fault, or protection violation can arise at either access point.)
• Software handlers need a restartable exception on a TLB fault
• Handling a TLB miss needs a hardware or software mechanism to refill the TLB
• Need mechanisms to cope with the additional latency of a TLB:
– Slow down the clock
– Pipeline the TLB and cache access
– Virtual address caches (indexed with virtual addresses)
– Parallel TLB/cache access

Concurrent Access to TLB & Cache
(Diagram: the virtual address splits into a VPN, a cache index of L bits, and a block offset of b bits; the page offset is the low k bits. A direct-mapped cache with 2^L blocks of 2^b bytes is indexed with the virtual index while the TLB produces the PPN; the tag comparison is against the physical tag.)
• The index (L bits) is available without consulting the TLB, so the cache and TLB accesses can begin simultaneously
• The tag comparison is made after both accesses are completed
• Cases: L + b = k, L + b < k, L + b > k

Impact of Paging on AMAT
• Memory parameters:
– L1 cache hit = 1 clock cycle; hits on 95% of accesses
– L2 cache hit = 10 clock cycles; hits on 60% of L1 misses
– DRAM = 200 clock cycles (~100 nanoseconds)
– Disk = 20,000,000 clock cycles (~10 milliseconds)
• Average Memory Access Time (with no paging):
– 1 + 5% × 10 + 5% × 40% × 200 = 5.5 clock cycles
• Average Memory Access Time (with paging):
– AMAT (with no paging) + ? = 5.5 + ?
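Filling in the "?" requires a page-fault rate, which the slide leaves open. The sketch below recomputes the 5.5-cycle AMAT and then adds the paging term for an invented fault rate, just to show how quickly disk accesses dominate:

```c
#include <stdio.h>

int main(void) {
    /* Parameters from the slide. */
    double l1_hit = 1, l2_hit = 10, dram = 200, disk = 20e6;
    double l1_miss = 0.05, l2_miss_of_l1 = 0.40;

    /* No paging: 1 + 5%*10 + 5%*40%*200 = 5.5 cycles. */
    double amat = l1_hit + l1_miss * l2_hit + l1_miss * l2_miss_of_l1 * dram;
    printf("AMAT, no paging: %.1f cycles\n", amat);          /* 5.5 */

    /* With paging, each access that goes to DRAM may also page-fault.
       The fault rate below is a made-up illustration, not from the slide. */
    double fault_rate = 1e-5;   /* 1 fault per 100,000 DRAM-level accesses */
    double amat_paging = amat + l1_miss * l2_miss_of_l1 * fault_rate * disk;
    printf("AMAT, with paging: %.1f cycles\n", amat_paging); /* 5.5 + 4.0 */
    return 0;
}
```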
Modern Memory Management
• The slowdown is too great to run programs much bigger than memory
– This is called thrashing
– Buy more memory, run the program on a bigger computer, or reduce the size of the problem
• The paging system today is still used for:
– Translation (mapping of virtual address to physical address)
– Protection (permission to access a word in memory)
– Sharing of memory between independent tasks

Impact of TLBs on Performance
• Each TLB miss to the page table costs about as much as an L1 cache miss
• Page sizes are 4 KB to 8 KB (4 KB on x86)
• A TLB typically has 128 entries
– Set associative or fully associative
• TLB reach: the size of the largest virtual address space that can be simultaneously mapped by the TLB
– 128 × 4 KB = 512 KB = 0.5 MB!
• What can you do to get better performance?

Nehalem Virtual Memory Details
• 48-bit virtual address space, 40-bit physical address space
• Two-level TLB: L1 + L2
• The I-TLB (L1) has 128 shared entries, 4-way associative, for 4 KB pages, plus 7 dedicated fully associative entries per SMT thread for large-page (2/4 MB) entries
• The D-TLB (L1) has 64 entries for 4 KB pages and 32 entries for 2/4 MB pages, both 4-way associative and dynamically shared between SMT threads
• The unified L2 TLB has 512 entries for 4 KB pages only, also 4-way associative
• Data TLB reach (4 KB pages only): L1: 64 × 4 KB = 0.25 MB; L2: 512 × 4 KB = 2 MB; (superpages) L1: 32 × 2–4 MB = 64–128 MB

Using Large Pages from an Application?
• The difficulty is communicating from the application to the operating system that it wants to use large pages
• Linux: "huge pages" via a library file system and memory mapping; beyond 61C
– See http://lwn.net/Articles/375096/
– http://www.ibm.com/developerworks/wikis/display/LinuxP/libhuge+short+and+simple
• Mac OS X: no support for applications to do this (the OS decides whether to use them)

Address Translation & Protection
(Diagram, as before: the Virtual Page Number (VPN) is translated to a Physical Page Number (PPN) while the offset passes through; Kernel/User mode and Read/Write permissions are checked, possibly raising an exception.)
• Every instruction and data access needs address translation and protection checks
• Good VM design needs to be fast (~ one cycle) and space efficient

And in Conclusion, …
• Separate memory management into orthogonal functions:
– Translation (mapping of virtual address to physical address)
– Protection (permission to access a word in memory)
– But most modern systems provide support for all functions with a single page-based system
• All desktops/servers have full demand-paged virtual memory
– Portability between machines with different memory sizes
– Protection between multiple users or multiple tasks
– Share small physical memory among active tasks
– Simplifies implementation of some OS features
• Hardware support: User/Supervisor mode, a way to invoke Supervisor mode, the TLB, and the Page Table Register