CS 152 Computer Architecture and Engineering
Lecture 15 – Virtual Memory
2005-10-20
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: David Marquardt and Udam Saini
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L15: Virtual Memory UC Regents Fall 2005 © UCB

Last Time: Practical Cache Design
Cache design control is done by many loosely coupled state machines, including ...
[Diagram: cache datapath with State Machine and Control blocks, Tags and Blocks arrays, and Addr/Din/Dout ports to the CPU and to lower level memory.]

State machines for bus control ....
For reads, your state machine must:
(1) sense REQ
(2) latch Addr
(3) create Wait
(4) put Data Out on the bus.
An example interface ... there are other possibilities.
[Diagram: upper level memory (small, fast, holding Blk X) sits between the processor and lower level memory (large, slow, holding Blk Y).]

State machines for block fetch from DRAM
DRAM can be set up to request an N byte region starting at an arbitrary N+k within region.
One request ... Many returns ...
[Figure 13: Consecutive READ Bursts. 256Mb SDRAM datasheet timing diagram, CAS latency = 2: READ commands for (BANK, COL n) and (BANK, COL b) return DOUT n, n+1, n+2, n+3, then DOUT b.]
State machine challenges:
(1) setting up correct block read mode
(2) delivering correct word direct to CPU
(3) putting all words in cache in right place.

State machine for writeback to DRAM
[Figure 47: Write – With Auto Precharge. SDRAM datasheet timing diagram.]
One command ...
[Timing diagram, continued: ACTIVE, then WRITE with auto precharge enabled; DQ carries DIN m through DIN m + 3.]
Many bytes written
State machine challenges:
(1) putting cache block into correct location
(2) what if a read or write wants to use DRAM before the burst is complete? Must stall ...

State machines to manage write buffer
Solution: add a “write buffer” to cache datapath
[Diagram: Processor, Cache, Write Buffer, Lower Level Memory.]
Write Buffer: holds data awaiting write-through to lower level memory.
Q. Why a write buffer?
A. So CPU doesn’t stall.
Q. Why a buffer, why not just one register?
A. Bursts of writes are common.
Q. Are Read After Write (RAW) hazards an issue for write buffer?
A. Yes! Drain buffer before next read, or check write buffers.
On reads, state machine checks cache and write buffer: what if word was removed from cache before lower-level write?
On writes, state machine stalls for full write buffer, handles write buffer duplicates.

Don’t design one big state machine!!!
Focus on the high-level state machine structure early!
[Diagram: the cache datapath and its state machine, as in “Last Time: Practical Cache Design”.]

Today’s Lecture - Virtual Memory
Virtual address spaces
Page table layout
TLB design options

The Limits of Physical Addressing
“Physical addresses” of memory locations
[Diagram: CPU and Memory connected by address lines A0-A31 and data lines D0-D31.]
Where we are in CS 152 ...
All programs share one address space: the physical address space.
Machine language programs must be aware of the machine organization.
No way to prevent a program from accessing any machine resource.

Apple II: A physically addressed machine
Apple ][ (1977)
CPU: 1000 ns
DRAM: 400 ns
[Photo: Steve Jobs and Steve Wozniak.]

The Limits of Physical Addressing
“Physical addresses” of memory locations
[Diagram: CPU and Memory connected by address lines A0-A31 and data lines D0-D31.]
Programming the Apple ][ ...

Solution: Add a Layer of Indirection
[Diagram: the CPU issues “Virtual Addresses”; Address Translation converts them to the “Physical Addresses” seen by Memory. Data lines D0-D31 are unchanged.]
User programs run in a standardized virtual address space.
Address Translation hardware, managed by the operating system (OS), maps virtual address to physical memory.
Hardware supports “modern” OS features: Protection, Translation, Sharing.

MIPS R4000: Address Space Model
ASID = Address Space Identifier
Process A: ASID = 12
Process B: ASID = 13
[Diagram: each process has a 2 GB region from 0 to 2^31; the region from 2^31 to 2^32 - 1 is marked “Address Error” and “May only be accessed by kernel/supervisor”.]
Process A and B have independent address spaces.
When Process A writes its address 9, it writes to a different physical memory location than Process B’s address 9.
To let Process A and B share memory, OS maps parts of ASID 12 and ASID 13 to the same physical memory locations.
All address spaces use a standard memory map.
Still works (slowly!)
if a process accesses more virtual memory than the machine has physical memory.

MIPS R4000: Who’s Running on the CPU?
[Manual excerpt]
4.3 System Control Coprocessor
The System Control Coprocessor (CP0) is implemented as an integral part of the CPU, and supports memory management, address translation, exception handling, and other privileged operations. CP0 contains the registers shown in Figure 4-7 plus a 48-entry TLB. The sections that follow describe how the processor uses the memory management-related registers †.
System Control Registers
Each CP0 register has a unique number that identifies it; this number is referred to as the register number. For instance, the Page Mask register is register number 5.

Figure 4-7 CP0 Registers and the TLB
Register number   Register
 0                Index
 1                Random
 2                EntryLo0
 3                EntryLo1
 4                Context
 5                Page Mask
 6                Wired
 8                BadVAddr
 9                Count
10                EntryHi
11                Compare
12                Status
13                Cause
14                EPC
15                PRId
16                Config
17                LLAddr
18                WatchLo
19                WatchHi
20                XContext
26                ECC
27                CacheErr
28                TagLo
29                TagHi
30                ErrorEPC
Figure legend: registers are marked as either “Used with memory management system” or “Used with exception processing. See Chapter 5 for details.”
TLB: entries 0 to 47, bits 127..0; “Safe” entries (see Random register, contents of TLB Wired).
† For a description of CP0 data dependencies and hazards, please see Appendix F.

Slide annotations:
Status (12): Indicates user, supervisor, or kernel mode.
User cannot write supervisor/kernel bits. Supervisor cannot write kernel bit.
EntryHi (10): 8-bit ASID field codes virtual address space ID.
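The role of the ASID can be illustrated with a minimal sketch, under stated assumptions: the dictionary `page_tables`, the function `translate`, and every frame number below are hypothetical and chosen only for illustration. It models the slide's claim that address 9 in ASID 12 and address 9 in ASID 13 reach different physical locations, while the OS can still map pages of both spaces to one shared frame.

```python
PAGE_SIZE = 4096  # 4 KB pages, one of the page sizes the R4000 supports

# Hypothetical per-ASID page tables: virtual page number -> physical frame number.
page_tables = {
    12: {0: 7, 5: 40},  # Process A
    13: {0: 3, 2: 40},  # Process B; its page 2 shares frame 40 with A's page 5
}


def translate(asid, vaddr):
    """Map (ASID, virtual address) to a physical address."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_tables[asid][vpn]  # a KeyError stands in for an unmapped page
    return frame * PAGE_SIZE + offset


# Same virtual address, different address spaces.
pa_a = translate(12, 9)
pa_b = translate(13, 9)

# Different virtual addresses that the OS mapped to the same frame.
shared_a = translate(12, 5 * PAGE_SIZE + 100)
shared_b = translate(13, 2 * PAGE_SIZE + 100)
```

Here `pa_a` and `pa_b` differ because the two page tables send virtual page 0 to different frames, and `shared_a` equals `shared_b` because both tables name frame 40.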
(Excerpt from MIPS R4000 Microprocessor User's Manual, page 80.)
User cannot change address translation configuration.

MIPS Address Translation: How it works
[Diagram: the CPU issues “Virtual Addresses”; the Translation Look-Aside Buffer (TLB) converts them to the “Physical Addresses” seen by Memory. Data lines D0-D31 are unchanged.]
Translation Look-Aside Buffer (TLB): a small fully-associative cache of mappings from virtual to physical addresses.
What is the table of mappings that it caches?
TLB also contains ASID and kernel/supervisor bits for virtual address.
Fast common case: virtual address is in TLB, process has permission to read/write it.

Page tables encode virtual address spaces
[Diagram: a virtual address indexes the Page Table (one per ASID), whose entries point to frames in the Physical Memory Space.]
A virtual address space is divided into blocks of memory called pages.
OS manages the page table for each ASID.
A page table is indexed by a virtual address.
A valid page table entry codes physical memory “frame” address for the page.
A machine usually supports pages of a few sizes (MIPS R4000):

Page Size     Bit 24   Bit 23
4 Kbytes      0        0
16 Kbytes     0        0
64 Kbytes     0        0
256 Kbytes    0        0
1 Mbyte       0        0
4 Mbytes      0        0
16 Mbytes     1        1
(Table cropped from MIPS R4000 Microprocessor User's Manual; only the columns for bits 24 and 23 are visible.)

The TLB caches page table entries
TLB caches page table entries.
[Diagram: a virtual address (page no., 10-bit offset) is translated through the Page Table for ASID. The Page Table Base Reg plus the page number index into the page table, which is located in physical memory. Each entry holds V, Access Rights, and PA (the physical frame address). The physical address is the frame's page no. plus the same 10-bit offset.]
In this example, physical and virtual pages must be the same size!
V=0 pages either reside on disk or have not yet been allocated.
OS handles V=0: “Page fault”.
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
Virtual memory => treat memory as a cache for the disk
Terminology: blocks in this cache are called “Pages”
Typical size of a page: 1K to 8K
Page table maps virtual page numbers to physical frames
“PTE” = Page Table Entry
[Diagram and bullets from CS152 / Kubiatowicz, Lec21.22, 4/19/04, ©UCB Spring 2004.]

Page tables may not fit in memory!
A table for 4KB pages for a 32-bit address space has 1M entries.
Each process needs its own address space!
Two-level Page Tables
32-bit virtual address:
  bits 31-22   P1 index (10 bits)
  bits 21-12   P2 index (10 bits)
  bits 11-0    page offset (12 bits)
Each table holds 1K PTEs of 4 bytes each (4KB per table).
Top-level table wired in main memory.
Subset of 1024 second-level tables in main memory; rest are on disk or unallocated.
Large address spaces: 2 GB virtual address space; 4 MB of PTE2 (paged, holes); 4 KB of PTE1.
What about a 48-64 bit address space?

What if a virtual page resides on disk?
TLB caches page table entries.
[Diagram: virtual address translated through the Page Table for ASID to a physical address, as on the slide “The TLB caches page table entries”.]
V=0 pages either reside on disk or have not yet been allocated.
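The two-level lookup and the V=0 case can be sketched together. This is a minimal illustration, not the R4000's actual refill code: the names `PageFault`, `split`, and `walk`, the use of `None` to model V=0, and the frame number `0x54321` are all assumptions made for the example. Only the 10/10/12 bit split comes from the slide.

```python
class PageFault(Exception):
    """Raised when the walk reaches an entry with V=0."""


def split(vaddr):
    """Split a 32-bit virtual address into (P1 index, P2 index, page offset)."""
    p1 = (vaddr >> 22) & 0x3FF  # bits 31..22
    p2 = (vaddr >> 12) & 0x3FF  # bits 21..12
    offset = vaddr & 0xFFF      # bits 11..0
    return p1, p2, offset


def walk(top_level, vaddr):
    """Translate vaddr through a top-level table of 1024 slots.

    A slot holding None models V=0: the second-level table or the page
    is on disk or has not been allocated.
    """
    p1, p2, offset = split(vaddr)
    second_level = top_level[p1]
    if second_level is None:
        raise PageFault(hex(vaddr))
    frame = second_level[p2]
    if frame is None:
        raise PageFault(hex(vaddr))
    return (frame << 12) | offset


# Build a sparse address space: only one second-level table exists.
top = [None] * 1024
top[1] = [None] * 1024
top[1][2] = 0x54321  # hypothetical physical frame number

vaddr = (1 << 22) | (2 << 12) | 0xABC
paddr = walk(top, vaddr)
```

Because unused slots in the top-level table stay `None`, only the second-level tables that are actually in use occupy memory, which is the point of the two-level layout. A `PageFault` is where the OS would take over.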
OS handles V=0: “Page fault”.
Question: What to do when a TLB miss causes an access to a page table entry with V=0?

VM and Disk: Page replacement policy
Dirty bit: page written.
Used bit: set to 1 on any reference.
[Diagram: the set of all pages in memory, a page table with dirty and used bit columns, a head pointer and a tail pointer sweeping the table, and a free list of free pages.]
Tail pointer: clear the used bit in the page table.
Head pointer: place pages on free list if used bit is still clear. Schedule pages with dirty bit set to be written to disk.
On page fault: deallocate page table entry of a page on the free list.
Architect’s role: support setting dirty and used bits.

Friday: Design document deadlines
[Diagram: a pipelined CPU connects over the IM Bus and DM Bus to the Instruction Cache and Data Cache, which connect over the IC Bus and DC Bus to the DRAM Controller and DRAM.]
List the bugs you will target in test benches.
Define the timing diagrams and signal names for the IM, DM, IC, DC buses.
Other items ...
Also: Lab 3 Peer Evaluations ...

TLB Design Concepts

MIPS R4000 TLB: A closer look ...
[Diagram: the CPU issues “Virtual Addresses”; the Translation Look-Aside Buffer (TLB) converts them to “Physical Addresses”.]
[Manual excerpt]
Memory Management
32-bit Mode Address Translation
Figure 4-2 shows the virtual-to-physical-address translation of a 32-bit mode address.
• The bottom portion of Figure 4-2 shows a virtual address with a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining 8 bits of the address represent the VPN, and index the 256-entry page table.
• The top portion of Figure 4-2 shows a virtual address with a 12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 20 bits of the address represent the VPN, and index the 1M-entry page table.

Virtual Address with 1M (2^20) 4-Kbyte pages:
  bits 39-32   ASID (8 bits), checked against CP0 ASID
  bits 31-12   VPN (20 bits = 1M pages)
  bits 11-0    Offset (12 bits)
Bits 31, 30 and 29 of the virtual address select user, supervisor, or kernel address spaces.
36-bit Physical Address (bits 35-0): PFN, then Offset.
Virtual-to-physical translation in TLB.
Offset passed unchanged to physical memory.
Physical space larger than virtual space!

Can TLB and caching be overlapped?
[Diagram: the Virtual Page Number goes to the Translation Look-Aside Buffer (TLB) while the Page Offset supplies the cache Index and Byte Select. The Physical Cache Tag from the TLB is compared (=) with the Cache Tags entry, gated by Valid, to produce Hit; Cache Data supplies the Cache Block as Data out.]
This works, but ...
Q. What is the downside?
A. Inflexibility. VPN size locked to cache tag size.

Can we cache virtual addresses?
[Diagram: the CPU sends “Virtual Addresses” to a Virtual Cache; the Translation Look-Aside Buffer (TLB) sits between the cache and Main Memory and produces “Physical Addresses”.]
Only use TLB on a cache miss!
Downside: a subtle, fatal problem. What is it?
A. Synonym problem. If two address spaces share a physical frame, data may be in cache twice. Maintaining consistency is a nightmare.

Conclusions
VM: Uniform memory models, protection, sharing.
A TLB acts as a fast cache for recent address translations.
Operating systems manage the page table and (often) the TLB.

Tuesday: How ECC memory works ...
Detecting and correcting RAM bit errors
Replacing lost network packets, recovering from disk drive failure
Detecting arbitrary bit errors in network packets
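Returning to this lecture's main conclusion, that a TLB acts as a fast cache for recent address translations, the following is a minimal sketch of a fully associative TLB with ASID-tagged entries and random replacement on a software-handled miss. The class `TLB`, the function `translate`, the flat `page_table` dictionary, and the frame numbers are hypothetical; the 48-entry default and the 12-bit offset follow the R4000 figures quoted in the slides.

```python
import random


class TLB:
    """Fully associative TLB model with random replacement."""

    def __init__(self, entries=48, seed=0):
        self.capacity = entries
        self.slots = {}                 # (asid, vpn) -> pfn
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.hits = 0
        self.misses = 0

    def lookup(self, asid, vpn, page_table):
        key = (asid, vpn)
        if key in self.slots:           # fast common case
            self.hits += 1
            return self.slots[key]
        self.misses += 1
        pfn = page_table[key]           # stands in for the software refill handler
        if len(self.slots) >= self.capacity:
            victim = self.rng.choice(list(self.slots))
            del self.slots[victim]      # random replacement
        self.slots[key] = pfn
        return pfn


def translate(tlb, asid, vaddr, page_table, offset_bits=12):
    """Translate the page number through the TLB; pass the offset unchanged."""
    vpn = vaddr >> offset_bits
    offset = vaddr & ((1 << offset_bits) - 1)
    return (tlb.lookup(asid, vpn, page_table) << offset_bits) | offset


# Hypothetical OS page table keyed by (ASID, VPN).
page_table = {(12, 0): 0x100, (13, 0): 0x200}

tlb = TLB(entries=2)
first = translate(tlb, 12, 0x9, page_table)   # miss, then refill
second = translate(tlb, 12, 0x9, page_table)  # hit
other = translate(tlb, 13, 0x9, page_table)   # same address, other ASID: miss
```

Tagging each entry with the ASID lets translations for several processes sit in the TLB at once, so the third lookup misses instead of wrongly reusing Process A's mapping.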