15-213 Recitation 7
Greg Reshko
Office Hours: Wed 2:00-3:00PM
March 31st, 2003

Outline
- Virtual Memory
  - Paging
  - Page faults
  - TLB
  - Address translation
- Malloc Lab
  - Lots of hints and ideas

Virtual Memory
- Reasons
  - Use RAM as a cache for disk
  - Easier memory management
  - Protection
  - Enable ‘partial swapping’
  - Share memory efficiently

[Figure: Physical memory. The CPU sends physical addresses 0 to N-1 directly to memory.]

[Figure: Virtual memory. The CPU issues virtual addresses 0 to N-1; the page table maps them to physical addresses 0 to P-1 in memory, or to disk.]

Paging: Purpose
- Solves two problems
  - External memory fragmentation
  - Long delay to swap a whole process
- Divide memory more finely
  - Page: small logical memory region
  - Frame: small physical memory region
- Any page can map to any frame

Paging: Address Mapping
[Figure: A logical address (page, offset) is looked up in the page table (entries ... f29, f34 ...) to give a physical address (frame, offset).]

Paging: Multi-Level
[Figure: A logical address (P1, P2, offset). P1 selects an entry in the page directory (... f07, f08 ...), which points to one of the page tables (... f99, f87 ...; ... f29, f34, f25). P2 selects the entry in that page table, giving a physical address (frame, offset).]

Page Faults
- Virtual address not in memory
  - This means it is on a disk
  - Go to disk, fetch the page, load it into memory, get back to the process
[Figure: Two diagrams of the CPU, page table (virtual addresses to physical addresses), memory, and disk.]

Copy-on-Write
- “Simulated” copy
  - Copy page table entries to new process
  - Mark PTEs read-only in old and new
- What really happens
  - Process writes to page
  - Page fault handler is called
  - Copy page into empty frame
  - Mark read-write in both PTEs
- Result
  - Faster and less work

Relevance to Fork
- Why is paging good for fork and exec?
  - Fork produces two very similar processes
    - Same code, data, and stack
  - Copying all pages is expensive
  - Many will never be modified (especially in exec)
  - Share pages instead, i.e.
    just mark them as read-only and duplicate when necessary

Address Translation: General Idea
- Mapping between virtual and physical addresses
[Figure: The processor sends a virtual address to the hardware address translation mechanism, part of the on-chip memory management unit (MMU), which produces a physical address for main memory. On a miss, a page fault invokes the fault handler, and the OS performs the transfer from secondary memory to main memory (only on a miss).]

Address Translation: In terms of address itself
- Higher bits of the address get mapped from virtual address to physical.
- Lower bits (page offset) stay the same.
  Virtual address:  bits n-1..p = virtual page number,  bits p-1..0 = page offset
  Physical address: bits m-1..p = physical page number, bits p-1..0 = page offset

TLB
- Translation Lookaside Buffer
  - Small hardware cache in MMU
  - Maps virtual page numbers to physical page numbers
[Figure: The CPU sends the VA to the TLB lookup. A TLB hit yields the PA directly; a TLB miss goes through translation first. The PA is then looked up in the cache: a hit returns data, a miss goes to main memory.]

Address Translation with TLB
[Figure: The virtual address is split into virtual page number (bits n-1..p) and page offset (bits p-1..0). The virtual page number is compared with the tags of valid TLB entries; a match is a TLB hit and supplies the physical page number. The resulting physical address is split into tag, index, and byte offset; a valid cache line with a matching tag is a cache hit and supplies the data.]

Example
- Motivation:
  - A detailed example of end-to-end address translation
  - Same as in the book and lecture
    - I just want to make sure it makes perfect sense
- Do practice problems at home
  - Ask questions if anything is unclear

Example: Description
- Memory is byte addressable
- Accesses are to 1-byte words
- Virtual addresses are 14 bits
- Physical addresses are 12 bits
- Page size is 64 bytes
- TLB is 4-way set associative with 16 total entries
- L1 d-cache is physically addressed and direct mapped, with 4-byte line size and 16 total sets

Example: Addresses
- 14-bit virtual addresses
- 12-bit physical addresses
- Page size = 64 bytes
  Virtual address:  bits 13..6 = VPN (Virtual Page Number),  bits 5..0 = VPO (Virtual Page Offset)
  Physical address: bits 11..6 = PPN (Physical Page Number), bits 5..0 = PPO (Physical Page Offset)

Example: Page Table
  VPN  PPN  Valid
  00   28   1
  01   –    0
  02   33   1
  03   02   1
  04   –    0
  05   16   1
  06   –    0
  07   –    0
  08   13   1
  09   17   1
  0A   09   1
  0B   –    0
  0C   –    0
  0D   2D   1
  0E   11   1
  0F   0D   1
  …

Example: TLB
- 16 entries
- 4-way associative
  Virtual address: bits 13..8 = TLBT, bits 7..6 = TLBI, bits 5..0 = VPO (TLBT and TLBI together make up the VPN)

  Set  Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid
  0    03   –    0       09   0D   1       00   –    0       07   02   1
  1    03   2D   1       02   –    0       04   –    0       0A   –    0
  2    02   –    0       08   –    0       06   –    0       03   –    0
  3    07   –    0       03   0D   1       0A   34   1       02   –    0

Example: Cache
- 16 lines
- 4-byte line size
- Direct mapped
  Physical address: bits 11..6 = CT, bits 5..2 = CI, bits 1..0 = CO (CT covers the PPN; CI and CO together make up the PPO)

  Index  Tag  Valid  B0  B1  B2  B3
  0      19   1      99  11  23  11
  1      15   0      –   –   –   –
  2      1B   1      00  02  04  08
  3      36   0      –   –   –   –
  4      32   1      43  6D  8F  09
  5      0D   1      36  72  F0  1D
  6      31   0      –   –   –   –
  7      16   1      11  C2  DF  03
  8      24   1      3A  00  51  89
  9      2D   0      –   –   –   –
  A      2D   1      93  15  DA  3B
  B      0B   0      –   –   –   –
  C      12   0      –   –   –   –
  D      16   1      04  96  34  15
  E      13   1      83  77  1B  D3
  F      14   0      –   –   –   –

Example: Address Translation
- Virtual Address 0x03D4
  - Split into offset and page number
    - 0x03D4 = 00001111010100
    - VPO = 010100 = 0x14
    - VPN = 00001111 = 0x0F
  - Let's see if this is in the TLB
    - 0x03D4 = 00001111010100
    - TLBI = 11 = 0x3
    - TLBT = 000011 = 0x03

Example: Address Translation
- Virtual Address 0x03D4
  - TLB lookup
    - This address is in the TLB (second entry, set 0x3)
    - PPN = 0x0D = 001101
    - PPO = VPO = 0x14 = 010100
    - PA = PPN followed by PPO (concatenated, not added) = 001101010100
  - Cache
    - PA = 0x354 = 001101010100
    - CT = 001101 = 0x0D
    - CI = 0101 = 0x5
    - CO = 00 = 0x0

Example: Address Translation
- Virtual Address 0x03D4
  - Cache hit
    - Tag
      in set 0x5 matches CT
    - Data at offset CO is 0x36
    - Data returned to MMU
    - Data returned to CPU

Lab 6 Hints and Ideas
- Due April 16
  - 40 points for performance
  - 20 points for correctness
  - 5 points for style
- Get the correctness points this week
  - Get a feel for how hard the lab is
  - You'll probably need the time
  - Starting a couple days before is a BAD idea!

How to get the correctness points
- We provide mm-helper.c, which contains the code from the book
  - malloc works
  - free works (with coalescing)
  - Heap checking doesn't work
  - realloc doesn't work
- Implement a dumb version of realloc
  - malloc new block, memcpy, free old block, return new block

How to get the correctness points
- Implement heap checking
  - Have to add a request id field to each allocated block (tricky)
  - Hint: need padding to maintain 8-byte alignment of user pointer
- In the book's code, bp is always the same as the user pointer
  Layout: [Size+a][Payload…][Footer], with bp pointing at the start of the payload
  - The 4 bytes immediately before bp contain the block size
  - The 3 lsb of size are unused (because of alignment)
  - The lowest bit (a) indicates whether the block is allocated or not

How to get the correctness points
- Need to change block layout to look like this:
  Layout: [ID][Size+a][Payload…][Footer], with bp pointing at the start of the payload
  - This changes how the implicit list has to be traversed
  - But size is at the same place relative to bp

How to get the correctness points
- Or change block layout to look like this:
  Layout: [Size+a][ID][Payload…][Footer], with bp pointing at the start of the payload
  - All accesses to what was size now access id
    - but can be clever and make size 4 bytes larger
  - Could even make bp point to id...
    - Most code would just work

How to get the correctness points
- Once malloc, free, and realloc work with the id field, write heapcheck
  - Iterate over the whole heap and print out allocated blocks
  - Need to read the id field…
- That's it for correctness

Hints
- Remember that pointer arithmetic behaves differently depending on the type of pointer
- Consider using structs/unions to eliminate some messy pointer code
- Get things working with the short trace file first:
  - ./mdriver -f short1-bal.rep
- To get the best performance
  - Red-Black trees
  - Ternary trees
  - Other interesting data structures
- That’s it for hints…

Good Luck!