55:035 Computer Architecture and Organization
Virtual Memory
Basics
Address Translation
Cache vs VM
Paging
Replacement
TLBs
Segmentation
Page Tables
Figure: The memory hierarchy. Upper levels are faster; lower levels are larger.

Level           Capacity     Access time             Cost
CPU registers   100s bytes   <10s ns                 --
Cache           K bytes      10-100 ns               1-0.1 cents/bit
Main memory     M bytes      200-500 ns              $.0001-.00001 cents/bit
Disk            G bytes      10 ms (10,000,000 ns)   10^-5 - 10^-6 cents/bit
Tape            infinite     sec-min                 10^-8 cents/bit

Staging/transfer units between levels:
registers <-> cache: instruction operands (1-8 bytes), managed by the program/compiler
cache <-> main memory: blocks (8-128 bytes), managed by the cache controller
main memory <-> disk: pages (4K-16K bytes), managed by the OS
disk <-> tape: files (Mbytes), managed by the user/operator
Some facts of computer life…
Computers run lots of processes simultaneously
There is not a full address space of memory for each process
We must share a smaller amount of physical memory among many processes
Virtual memory is the answer!
It divides physical memory into blocks and assigns them to different processes
Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk.
The compiler assigns data to a “virtual” address.
The VA is translated to a real/physical address somewhere in memory…
(this allows any program to run anywhere; where is determined by the particular machine and OS)
VM provides the following benefits:
Allows multiple programs to share the same physical memory
Allows programmers to write code as though they have a very large amount of main memory
Automatically handles bringing in data from disk
Programs reference “virtual” addresses in a non-existent memory
These are then translated into real “physical” addresses
Virtual address space may be bigger than physical address space
Divide physical memory into blocks, called pages
Anywhere from 512 bytes to 16 MB (4 KB is typical)
Virtual-to-physical translation by indexed table lookup
Add another cache for recent translations (the TLB)
Invisible to the programmer
Looks to your application like you have a lot of memory!
Figure: Process 1’s and Process 2’s virtual address spaces each map onto page frames of the shared physical memory; some virtual pages reside on disk.
Figure: Address translation. A 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset (log2 of the page size). The virtual page number indexes a per-process page table, located via a page table base register; each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and the physical page number. The physical page number concatenated with the page offset goes to physical memory.
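As a concrete sketch of the entry fields named in the figure (a hypothetical layout; real formats vary by machine), a page-table entry for this 20-bit physical page number might pack into one 32-bit word:

```c
/* Hypothetical 32-bit page-table entry: field widths are illustrative. */
typedef struct {
    unsigned int valid  : 1;   /* page is present in physical memory  */
    unsigned int prot   : 2;   /* protection bits (e.g., read/write)  */
    unsigned int dirty  : 1;   /* set when the page has been written  */
    unsigned int ref    : 1;   /* set on access; used to estimate LRU */
    unsigned int ppn    : 20;  /* physical page number                */
    unsigned int unused : 7;
} pte_t;
```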
Relieves the problem of making a program that is too large to fit in physical memory... well... fit!
Allows a program to run in any location in physical memory
(called relocation)
Really useful, as you might want to run the same program on lots of machines…
Figure: A logical program is contiguous in virtual address space and here consists of 4 pages: A, B, C, D (at virtual addresses 0, 4K, 8K, 12K). Physically, 3 of the 4 pages (C, A, B) are scattered among the main memory frames (0K-28K), and 1 (D) is located on disk.
So, some definitions/“analogies”
A “page” or “segment” of memory is analogous to a “block” in a cache
A “page fault” or “address fault” is analogous to a cache miss: if we go to main memory and our data isn’t there, we need to get it from disk…
Main memory acts as the “real”/physical memory
These are more definitions than analogies…
With VM, the CPU produces “virtual addresses” that are translated by a combination of HW/SW to “physical addresses”
The “physical addresses” access main memory
The process described above is called “memory mapping” or “address translation”
Parameter           | First-level cache    | Virtual memory
Block (page) size   | 12-128 bytes         | 4096-65,536 bytes
Hit time            | 1-2 clock cycles     | 40-100 clock cycles
Miss penalty        | 8-100 clock cycles   | 700,000-6,000,000 clock cycles
  (Access time)     | (6-60 clock cycles)  | (500,000-4,000,000 clock cycles)
  (Transfer time)   | (2-40 clock cycles)  | (200,000-2,000,000 clock cycles)
Miss rate           | 0.5-10%              | 0.00001-0.001%
Data memory size    | 0.016-1 MB           | 4 MB-4 GB
Replacement policy:
Replacement on cache misses primarily controlled by hardware
Replacement with VM (i.e. which page do I replace?) usually controlled by OS
Because of bigger miss penalty, want to make the right choice
Sizes:
Size of processor address determines size of VM
Cache size independent of processor address size
Timing’s tough with virtual memory:
AMAT = T_mem + (1 - h) * T_disk
     = 100 ns + (1 - h) * 25,000,000 ns
h (the hit rate) has to be incredibly (almost unattainably) close to perfect for this to work out
So: VM is a “cache”, but an odd one.
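A quick numeric sketch of the AMAT equation above (the hit rates chosen are illustrative):

```c
#include <stdio.h>

int main(void) {
    const double t_mem  = 100.0;   /* ns: main-memory access time */
    const double t_disk = 25e6;    /* ns: disk access time (25 ms) */
    const double h[] = { 0.99, 0.999, 0.99999, 0.9999999 };

    for (int i = 0; i < 4; i++) {
        /* AMAT = T_mem + (1 - h) * T_disk */
        double amat = t_mem + (1.0 - h[i]) * t_disk;
        printf("h = %.7f  AMAT = %12.1f ns\n", h[i], amat);
    }
    return 0;
}
```

Even at h = 0.999 the AMAT is about 25,100 ns, roughly 250x slower than DRAM; only around h = 0.9999999 does it approach T_mem.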
How big is a page? How big is the page table?
Figure: The CPU issues a 32-bit virtual address, split into page number and offset; the page table maps the page to a frame, and the frame number plus the offset form the 32-bit address into physical memory.
Figure: Paging translation. The virtual address is split into page # and offset. A page table pointer register locates the current program’s page table in main memory; the page # added to this base selects the entry holding the frame #. The frame # combined with the offset addresses the page frame in main memory.
Suppose
a 32-bit architecture
a page size of 4 kilobytes
Therefore a 32-bit virtual address divides into a 20-bit page number (2^20 pages) and a 12-bit offset (2^12 = 4096 bytes within a page).
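A minimal sketch of this split in C; the example address 0x10020 is the one used in the worked example that follows:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                        /* log2(4096) */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)  /* 0xFFF      */

int main(void) {
    uint32_t va  = 0x10020;
    uint32_t vpn = va >> PAGE_SHIFT;         /* 0x10  (page #)  */
    uint32_t off = va &  PAGE_MASK;          /* 0x020 (offset)  */
    printf("VA 0x%05x -> VPN 0x%03x, offset 0x%03x\n", va, vpn, off);
    return 0;
}
```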
A processor asks for the contents of virtual memory address
0x10020. The paging scheme in use breaks this into a VPN of
0x10 and an offset of 0x020.
PTR (a CPU register that holds the address of the page table) has a value of 0x100 indicating that this process’s page table starts at location 0x100.
The machine uses word addressing and the page table entries are each one word long.
PTR = 0x100    Memory reference: VPN = 0x010, OFFSET = 0x020
ADDR     CONTENTS
0x00000  0x00000
0x00100  0x00010
0x00110  0x00022
0x00120  0x00045
0x00130  0x00078
0x00145  0x00010
0x10000  0x03333
0x10020  0x04444
0x22000  0x01111
0x22020  0x02222
0x45000  0x05555
0x45020  0x06666

PTR = 0x100    Memory reference: VPN = 0x010, OFFSET = 0x020
• What is the physical address calculated?
1. 10020
2. 22020
3. 45000
4. 45020
5. none of the above
(Same memory contents, PTR, and memory reference as above.)
• What is the physical address calculated?
• What is the contents of this address returned to the processor?
• How many memory accesses in total were required to obtain the contents of the desired address?
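A minimal sketch of the two-access lookup against the table above, assuming word addressing and one-word PTEs as stated (only the two locations this reference touches are modeled):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                        /* 4 KB pages: 3 hex digits of offset */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* Sparse stand-in for the memory-contents table. */
static uint32_t mem_read(uint32_t addr, int *accesses) {
    ++*accesses;
    switch (addr) {
    case 0x00110: return 0x00022;  /* page-table entry for VPN 0x010 */
    case 0x22020: return 0x02222;  /* the requested data word        */
    default:      return 0;
    }
}

int main(void) {
    const uint32_t ptr = 0x100;    /* page-table base register (PTR) */
    const uint32_t va  = 0x10020;
    int accesses = 0;

    uint32_t vpn   = va >> PAGE_SHIFT;                 /* 0x010 */
    uint32_t off   = va &  PAGE_MASK;                  /* 0x020 */
    uint32_t frame = mem_read(ptr + vpn, &accesses);   /* PTE at 0x110 -> 0x022 */
    uint32_t pa    = (frame << PAGE_SHIFT) | off;      /* 0x22020 */
    uint32_t data  = mem_read(pa, &accesses);          /* 0x02222 */

    printf("PA = 0x%05x, data = 0x%05x, accesses = %d\n", pa, data, accesses);
    return 0;
}
```

So the physical address is 0x22020 (answer 2 on the previous slide), the contents returned are 0x02222, and two memory accesses were required: one for the page-table entry, one for the data.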
Figure: Paging example with 4-byte pages and a 32-byte physical memory. Logical memory holds 16 bytes (a through p) in four pages; the page table maps page 0 → frame 5 (101), page 1 → frame 6 (110), page 2 → frame 1 (001), page 3 → frame 2 (010). Logical address 01 01 (page 1, offset 1) thus maps to physical address 110 01 (frame 6, offset 1).
Which block should be replaced on a virtual memory miss?
Again, we’ll stick with the strategy that eliminating page faults is a good thing
Therefore, we want to replace the LRU block
Many machines use a “use” or “reference” bit
Periodically reset by the OS
Gives the OS an estimate of which pages have been referenced (see the sketch below)
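A hedged sketch of how the OS can turn the periodically reset reference bits into an LRU estimate; this is the classic "aging" approximation, and the table size and 8-bit history are illustrative:

```c
#include <stdint.h>

#define NPAGES 1024

static uint8_t ref_bit[NPAGES];   /* set by hardware on each access to the page */
static uint8_t history[NPAGES];   /* software-maintained recency history        */

/* Called periodically (e.g., from a timer interrupt): shift each page's
   history right and fold in the current reference bit, then reset it. */
void sample_reference_bits(void) {
    for (int p = 0; p < NPAGES; p++) {
        history[p] = (uint8_t)((history[p] >> 1) | (ref_bit[p] << 7));
        ref_bit[p] = 0;
    }
}

/* On a miss, the victim is the page with the smallest history value,
   i.e., the one referenced least recently according to the samples. */
int choose_victim(void) {
    int victim = 0;
    for (int p = 1; p < NPAGES; p++)
        if (history[p] < history[victim])
            victim = p;
    return victim;
}
```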
What happens on a write?
We don’t even want to think about a write-through policy!
The time for accesses through VM to the hard disk is so great that write-through is not practical
Instead, a write-back policy is used, with a dirty bit to tell whether a block has been written
Mechanism:
paging hardware
trap on page fault
Policy:
fetch policy: when should we bring in the pages of a process?
1. load all pages at the start of the process
2. load only on demand: “demand paging”
replacement policy: which page should we evict given a shortage of frames?
Given a full physical memory, which page should we evict??
What policy?
Random
FIFO: First-in-first-out
LRU: Least-Recently-Used
MRU: Most-Recently-Used
OPT: optimal (evict the page that will not be used for the longest time in the future)
Example sequence of page numbers: 0 1 2 3 42 2 37 1 2 3
How many page faults with FIFO? LRU? OPT?
How do you keep track of LRU info? (another data structure question; see the simulation sketch below)
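A small simulation sketch for this sequence (the slide doesn't fix a frame count, so 4 frames is assumed here; LRU recency is tracked with a per-frame timestamp, which is one answer to the data-structure question):

```c
#include <stdio.h>

#define NFRAMES 4
#define EMPTY   (-1)

/* policy: 0 = FIFO (evict oldest load), 1 = LRU (evict oldest use). */
static int count_faults(const int *refs, int n, int policy) {
    int frame[NFRAMES], stamp[NFRAMES], faults = 0, clock = 0;
    for (int f = 0; f < NFRAMES; f++) { frame[f] = EMPTY; stamp[f] = 0; }

    for (int i = 0; i < n; i++) {
        int hit = -1;
        for (int f = 0; f < NFRAMES; f++)
            if (frame[f] == refs[i]) { hit = f; break; }

        if (hit >= 0) {
            if (policy == 1) stamp[hit] = ++clock;   /* LRU: refresh on use */
        } else {
            int victim = 0;                          /* oldest stamp loses  */
            for (int f = 1; f < NFRAMES; f++)
                if (stamp[f] < stamp[victim]) victim = f;
            frame[victim] = refs[i];
            stamp[victim] = ++clock;                 /* both: stamp on load */
            faults++;
        }
    }
    return faults;
}

int main(void) {
    int refs[] = { 0, 1, 2, 3, 42, 2, 37, 1, 2, 3 };
    int n = (int)(sizeof refs / sizeof refs[0]);
    printf("FIFO faults: %d\n", count_faults(refs, n, 0));
    printf("LRU  faults: %d\n", count_faults(refs, n, 1));
    return 0;
}
```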
1. It’s slow! We’ve turned every access to memory into two accesses to memory
solution: add a specialized “cache” called a “translation lookaside buffer (TLB)” inside the processor
2. It’s still huge!
Even worse: we’re ultimately going to have a page table for every process. Suppose 1024 processes at 4 MB per table; that’s 4 GB of page tables!
Figure: Basic address translation. The CPU issues virtual page 42; page table entry i maps it to physical frame 356, which is accessed in physical memory; the operating system manages the page table and the disk.
Figure: The same translation, with the page table itself placed in physical memory.
Place the page table in physical memory
However: this doubles the time per memory access!!
Figure: The same translation, with a cache added for the translations.
A special-purpose cache for translations
Historically called the TLB: Translation Lookaside Buffer
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits fully associative lookup on these machines.
Most mid-range machines use small n-way set associative organizations.
Note: 128-256 entries times the 4KB-16KB page mapped per entry is only 512KB-4MB; the L2 cache is often bigger than this “span” of the TLB.
Figure: Translation with a TLB. The CPU issues a VA; on a TLB hit the translation is used directly, and on a TLB miss the full translation (page table walk) is performed. The translated address accesses the cache; a cache hit delivers data, and a cache miss goes to main memory.
A way to speed up translation is to use a special cache of recently used page table entries -- this has many names, but the most frequently used is Translation Lookaside Buffer or TLB
TLB entry: Virtual Page # | Physical Frame # | Dirty | Ref | Valid | Access bits (the virtual page # serves as the tag)
Really just a cache (a special-purpose cache) on the page table mappings
TLB access time comparable to cache access time
(much less than main memory access time)
Figure: A fully associative TLB producing a 34-bit physical address. The virtual address supplies a page frame address <30> and a page offset <13>. Each entry holds V <1>, R <2>, W <2> bits (read/write policies and permissions), a tag <30>, and a physical address <21>. Steps: (1) compare the page frame address against the tags; (2) check permissions; (3) a 32:1 mux selects the matching entry’s physical address <21>; (4) the high-order 21 bits of the address are concatenated with the low-order 13 bits (the offset) to form the 34-bit physical address.
Address translation is usually on the critical path…
…which determines the clock cycle time of the µP
Even in the simplest cache, TLB values must be read and compared
The TLB is usually smaller and faster than the cache’s address-tag memory
This way multiple TLB reads don’t increase the cache hit time
TLB accesses are usually pipelined because they are so important!
Figure: Flowchart of a memory access with a TLB and a cache. Start with the virtual address and a TLB access. On a TLB miss, stall and try to read the page table; if the reference page-faults, replace a page from disk. On a TLB hit, a read tries the cache (hit: deliver data to the CPU; miss: cache miss stall), while a write sets the dirty bit in the TLB and performs the cache/buffer memory write.
We can ask the same four questions we did about caches
Q1: Block placement
choice: a lower miss rate with complex placement, or vice versa
the miss penalty is huge
so choose a low miss rate ==> place a page anywhere in physical memory
similar to a fully associative cache model
Q2: Block addressing - use an additional data structure
fixed-size pages - use a page table
virtual page number ==> physical page number, then concatenate the offset
a tag bit indicates presence in main memory
Size is the number of virtual pages
Purpose is to hold the translation of VPN to PPN
Permits ease of page relocation
Make sure to keep tags to indicate the page is mapped
Potential problem:
Consider a 32-bit virtual address and 4 KB pages
4 GB / 4 KB = 1 M entries (words) required just for the page table!
Might have to page in the page table…
Consider how the problem gets worse on 64-bit machines with even larger virtual address spaces!
Might have multi-level page tables (see the sketch below)
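A sketch of the multi-level idea for the 32-bit / 4 KB case: split the 20-bit VPN into two 10-bit indices, so second-level tables are only allocated for regions of the address space actually in use (the structure and names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

#define L1_BITS    10
#define L2_BITS    10
#define PAGE_SHIFT 12

typedef struct {
    uint32_t *l2[1u << L1_BITS];   /* NULL = 4 MB region not mapped */
} pagetable_t;

/* Translate va to a physical address, or return 0 to signal a
   (hypothetical) page fault. Bits 31..22 index the first level,
   bits 21..12 the second level, bits 11..0 are the offset. */
uint32_t translate(const pagetable_t *pt, uint32_t va) {
    uint32_t i1  = va >> (PAGE_SHIFT + L2_BITS);
    uint32_t i2  = (va >> PAGE_SHIFT) & ((1u << L2_BITS) - 1);
    uint32_t off = va & ((1u << PAGE_SHIFT) - 1);

    const uint32_t *l2tab = pt->l2[i1];
    if (l2tab == NULL || l2tab[i2] == 0)     /* unmapped region or page */
        return 0;
    return (l2tab[i2] << PAGE_SHIFT) | off;  /* frame # + offset        */
}
```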
Similar to a set-associative mechanism
Make the page table reflect the # of physical pages (not virtual)
Use a hash mechanism:
virtual page number ==> hashed page number, an index into the inverted page table
Compare the virtual page number with the tag to make sure it is the one you want
if it matches, check that the page is in memory; if it is not, page fault
If it does not match - miss
go to the full page table on disk to get the new entry
this implies 2 disk accesses in the worst case
trades an increased worst-case penalty for a decrease in the capacity-induced miss rate, since there is now more room for real pages with the smaller page table (a lookup sketch follows)
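A minimal sketch of this lookup (the hash function, table size, and absence of collision chaining are simplifying assumptions):

```c
#include <stdint.h>

#define IPT_SIZE 4096   /* on the order of the number of physical frames */

typedef struct {
    uint32_t tag;       /* virtual page number resident in this frame */
    uint32_t frame;     /* physical frame number                      */
    int      valid;
} ipt_entry_t;

static ipt_entry_t ipt[IPT_SIZE];

/* Hash the VPN to an index, then compare the tag. A mismatch or an
   invalid entry means the page is not resident, so we fall back to
   the full page table (potentially on disk). */
int ipt_lookup(uint32_t vpn, uint32_t *frame_out) {
    uint32_t idx = (vpn * 2654435761u) % IPT_SIZE;   /* multiplicative hash */
    if (ipt[idx].valid && ipt[idx].tag == vpn) {
        *frame_out = ipt[idx].frame;
        return 1;   /* hit: page is in physical memory */
    }
    return 0;       /* miss: consult the full page table */
}
```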
Figure: Inverted page table. The page # of the virtual address is hashed to index the table; each entry stores a tag (virtual page #), a page frame, and a valid bit. Only entries for pages in physical memory are stored. On a tag match (=, OK), the frame is combined with the offset to form the physical address.
The translation process using page tables takes too long!
Use a cache to hold recent translations
Translation Lookaside Buffer
Typically 8-1024 entries
Block size same as a page table entry (1 or 2 words)
Only holds translations for pages in memory
1 cycle hit time
Highly or fully associative
Miss rate < 1%
Miss goes to main memory (where the whole page table lives)
Must be purged on a process switch
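A sketch of a fully associative TLB lookup consistent with these parameters (entry count and field names are illustrative; real hardware compares all tags in parallel rather than looping):

```c
#include <stdint.h>

#define TLB_ENTRIES 64
#define PAGE_SHIFT  12
#define PAGE_MASK   ((1u << PAGE_SHIFT) - 1)

typedef struct {
    uint32_t vpn;    /* tag: virtual page number */
    uint32_t ppn;    /* physical page number     */
    int      valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fully associative lookup: check the VPN against every entry. */
int tlb_translate(uint32_t va, uint32_t *pa_out) {
    uint32_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa_out = (tlb[i].ppn << PAGE_SHIFT) | (va & PAGE_MASK);
            return 1;   /* TLB hit */
        }
    }
    return 0;           /* TLB miss: walk the page table, then refill */
}
```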
Q3: Block Replacement (pages in physical memory)
LRU is best
So use it to minimize the horrible miss penalty
However, real LRU is expensive
Page table contains a use (reference) bit
On access the use tag is set
OS checks them every so often, records what it sees, and resets them all
On a miss, the OS decides who has been used the least
Basic strategy: Miss penalty is so huge, you can spend a few OS cycles to help reduce the miss rate
Q4: Write Policy
Always write-back
Due to the access time of the disk
So, you need to keep tags to show when pages are dirty and need to be written back to disk when they’re swapped out.
Anything else is pretty silly
Remember – the disk is SLOW!
An architectural choice
Large pages are good:
reduces page table size
amortizes the long disk access
if spatial locality is good, then the hit rate will improve
Large pages are bad:
more internal fragmentation
if everything is random, each structure’s last page is (on average) only half full
half of bigger is still bigger
if there are 3 structures per process (text, heap, and control stack), then 1.5 pages are wasted per process
process start-up takes longer
since at least 1 page of each type is required prior to start
the transfer-time penalty is higher
The TLB must be on chip
otherwise it is worthless
small TLBs are worthless anyway; large TLBs are expensive
high associativity is likely
==> the price of CPUs is going up!
OK as long as performance goes up faster
Reasons for a larger page size
Page table size is inversely proportional to the page size; memory is therefore saved
A fast cache hit time is easy when the cache size < page size (virtually addressed caches); a bigger page keeps this feasible as cache size grows
Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses
Reasons for a smaller page size
Want to avoid internal fragmentation: don’t waste storage (data must be contiguous within a page)
Quicker process start-up for small processes: don’t need to bring in more memory than needed
With multiprogramming, a computer is shared by several programs or processes running concurrently
Need to provide protection
Need to allow sharing
Mechanisms for providing protection:
Provide base and bound registers: Base ≤ Address ≤ Bound (see the sketch below)
Provide both user and supervisor (operating system) modes
Provide CPU state that the user can read, but cannot write
base and bounds registers, user/supervisor bit, exception bits
Provide a method to go from user to supervisor mode and vice versa
system call: user to supervisor
system return: supervisor to user
Provide permissions for each page or segment in memory
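A sketch of the base-and-bound check named in the list above; hardware performs this on every reference, shown here in C only for clarity:

```c
#include <stdint.h>

/* An access is legal only if Base <= Address <= Bound;
   otherwise the hardware traps to the operating system. */
int access_ok(uint32_t base, uint32_t bound, uint32_t address) {
    return base <= address && address <= bound;
}
```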
One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address
address size limits the size of virtual memory
difficult to change, since many components depend on it (e.g., PC, registers, effective-address calculations)
As program sizes increase, larger and larger address sizes are needed
8 bit: Intel 8080 (1975)
16 bit: Intel 8086 (1978)
24 bit: Intel 80286 (1982)
32 bit: Intel 80386 (1985)
64 bit: Intel Merced (1998)
Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
The large miss penalty of virtual memory leads to different strategies from caches:
fully associative placement, TLB + page table, LRU replacement, write-back
Designed as
paged: fixed-size blocks
segmented: variable-size blocks
hybrid: segmented paging or multiple page sizes
Avoid a small address size
Option                | TLB                      | L1 cache        | L2 cache        | VM (page)
Block size            | 4-8 bytes (1 PTE)        | 4-32 bytes      | 32-256 bytes    | 4k-16k bytes
Hit time              | 1 cycle                  | 1-2 cycles      | 6-15 cycles     | 10-100 cycles
Miss penalty          | 10-30 cycles             | 8-66 cycles     | 30-200 cycles   | 700k-6M cycles
Local miss rate       | .1-2%                    | .5-20%          | 13-15%          | .00001-.001%
Size                  | 32B-8KB                  | 1-128 KB        | 256KB-16MB      | --
Backing store         | L1 cache                 | L2 cache        | DRAM            | Disks
Q1: Block placement   | Fully or set associative | DM              | DM or SA        | Fully associative
Q2: Block ID          | Tag/block                | Tag/block       | Tag/block       | Table
Q3: Block replacement | Random (not last)        | N.A. for DM     | Random (if SA)  | LRU/LFU
Q4: Writes            | Flush on PTE write       | Through or back | Write-back      | Write-back