Lecture 22: Advanced Memory Management


CS 422/522 Design & Implementation of Operating Systems

Lecture 22: Advanced Memory Management

Zhong Shao

Dept. of Computer Science

Yale University

Acknowledgement: some slides are taken from previous versions of the CS422/522 lectures taught by Prof. Bryan Ford and Dr. David Wolinsky, and also from the official set of slides accompanying the OSPP textbook by Anderson and Dahlin.

Address translation uses

◆  Process isolation
     Keep a process from touching anyone else’s memory, or the kernel’s

◆  Efficient interprocess communication
     Shared regions of memory between processes

◆  Shared code segments
     E.g., common libraries used by many different programs

◆  Program initialization
     Start running a program before it is entirely in memory

◆  Dynamic memory allocation
     Allocate and initialize stack/heap pages on demand


Address translation (more)

◆  Program debugging
     Data breakpoints when an address is accessed

◆  Zero-copy I/O
     Directly from the I/O device into/out of user memory

◆  Memory-mapped files
     Access file data using load/store instructions

◆  Demand-paged virtual memory
     Illusion of near-infinite memory, backed by disk or memory on other machines

Address translation (even more)

◆  Checkpoint/restart
     Transparently save a copy of a process, without stopping the program while the save happens

◆  Persistent data structures
     Implement data structures that can survive system reboots

◆  Process migration
     Transparently move processes between machines

◆  Information flow control
     Track what data is being shared externally

◆  Distributed shared memory
     Illusion of memory that is shared between machines


Web server

[Figure: the path of one request through a web server. The server (user level) holds a request buffer and a reply buffer; below it sit the kernel and the hardware (network interface and disk interface). Numbered steps:]

1. Network socket read
2. Copy arriving packet (DMA)
3. Kernel copy into the request buffer
4. Parse request
5. File read
6. Disk request
7. Disk data (DMA)
8. Kernel copy into the reply buffer
9. Format reply
10. Write and copy to kernel buffer
12. Format outgoing packet and DMA
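As a rough illustration of where these copies come from, here is a minimal sketch (not the slide's code) of the user-level half of such a server using plain read()/write(); each read() and write() implies a copy between a kernel buffer and user memory, on top of the DMA transfers done by the NIC and the disk. The file name and buffer size are illustrative.

```c
/* Sketch: request/reply loop with ordinary read()/write(). Every read()
 * copies data from a kernel buffer into user memory, and every write()
 * copies it back, in addition to the hardware DMA transfers. */
#include <fcntl.h>
#include <unistd.h>

#define BUFSIZE 8192

void handle_request(int client_fd)
{
    char request[BUFSIZE], reply[BUFSIZE];

    ssize_t n = read(client_fd, request, sizeof request);   /* steps 1-3 */
    if (n <= 0)
        return;
    /* step 4: parse the request (omitted); assume it names a file */
    int file_fd = open("index.html", O_RDONLY);              /* illustrative name */
    if (file_fd < 0)
        return;
    ssize_t m = read(file_fd, reply, sizeof reply);           /* steps 5-8 */
    if (m > 0)
        write(client_fd, reply, (size_t)m);                   /* steps 9-12 */
    close(file_fd);
}
```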

Zero-copy I/O: block-aligned read/write system calls

[Figure: before zero copy, the user page table maps an empty user buffer while the data sits in a full kernel buffer; after zero copy, the kernel remaps the user's page-table entry to point at the full kernel buffer (now shared between user and kernel) and frees the old user page.]
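One widely available way to get a similar effect for the file-to-socket path is Linux's sendfile(2), which moves data directly from the page cache to the socket without a user-space copy. A minimal sketch, assuming a connected socket (this is not the page-remapping scheme in the figure, just one practical zero-copy interface):

```c
/* Sketch: serve a file to a socket without copying it through a
 * user-space buffer, using Linux's sendfile(2). */
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int send_file_zero_copy(int socket_fd, const char *path)
{
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0)
        return -1;

    struct stat st;
    if (fstat(file_fd, &st) < 0) { close(file_fd); return -1; }

    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t sent = sendfile(socket_fd, file_fd, &offset,
                                (size_t)(st.st_size - offset));
        if (sent <= 0)
            break;                          /* error or short transfer */
    }
    close(file_fd);
    return offset == st.st_size ? 0 : -1;
}
```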


Virtualization & virtual machines


Other virtualization examples

◆  Qemu
◆  Windows games on Linux and Mac (virtualize DirectX and WinAPI)
◆  Android/iOS simulators (emulators) for app development
◆  The cloud

Goals:
◆  Complete (machine-level) isolation (security)
◆  Consolidation
◆  Multiple OSes
◆  Kernel development
◆  …


x86 virtualization

◆  Instructions that violate requirements
     Privileged instructions – in user mode, they trap into the kernel
     Trap and emulate (a sketch follows this slide):
       *  Execute guest code as normal
       *  Guest executes a privileged instruction
       *  This triggers a trap to the hypervisor
       *  Hypervisor handles it and returns to the guest

◆  x86 hardware support
     Instantiates different machine states: VMCS (virtual machine control structure)
     VMXON / VMXOFF – turn support on / off
     VM entry and VM exit – run / stop a VM
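A schematic of the trap-and-emulate loop a hypervisor runs, as a minimal sketch rather than any particular VMM's code; the instruction categories, guest-state type, and helper functions (run_guest_until_trap, emulate_*) are assumptions made for illustration:

```c
/* Sketch of a trap-and-emulate dispatch loop. The guest runs natively
 * until it executes a privileged instruction, which traps to the
 * hypervisor; the hypervisor emulates its effect on the guest's
 * virtual machine state and resumes the guest. */
struct guest_state;                     /* virtual CPU registers, etc. (hypothetical) */

enum trap_kind { TRAP_CLI, TRAP_STI, TRAP_MOV_TO_CR3, TRAP_OTHER };

extern enum trap_kind run_guest_until_trap(struct guest_state *g);
extern void emulate_cli(struct guest_state *g);
extern void emulate_sti(struct guest_state *g);
extern void emulate_mov_to_cr3(struct guest_state *g);

void vcpu_loop(struct guest_state *g)
{
    for (;;) {
        enum trap_kind t = run_guest_until_trap(g);   /* guest code runs natively */
        switch (t) {                                  /* a privileged op trapped  */
        case TRAP_CLI:        emulate_cli(g);        break;
        case TRAP_STI:        emulate_sti(g);        break;
        case TRAP_MOV_TO_CR3: emulate_mov_to_cr3(g); break;
        default:              /* e.g., inject a fault into the guest */ break;
        }
        /* resume the guest at the instruction after the trapping one */
    }
}
```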

Virtual machines and virtual memory

[Figure: a guest virtual address in the guest virtual address space is translated by the guest page table into a guest physical address; that guest physical address is then translated by the host page table into a host physical address in host physical memory.]


Shadow page tables

[Figure: the same two-step translation as above (guest page table: guest virtual → guest physical; host page table: guest physical → host physical), plus a shadow page table that maps guest virtual addresses directly to host physical addresses. The shadow page table is the composition of the two translations and is the table the hardware actually walks.]
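To keep the shadow table consistent, a VMM typically write-protects the guest's page tables and rebuilds the affected shadow entries whenever the guest modifies them. A minimal sketch of that update path; the address types and helpers (guest_pte_lookup, gpa_to_hpa, shadow_set) are hypothetical:

```c
/* Sketch: updating one shadow page-table entry after the guest writes
 * its own page table. The shadow entry maps guest-virtual directly to
 * host-physical, i.e., the composition of the two translations. */
#include <stdint.h>

typedef uint64_t gva_t;   /* guest virtual address  */
typedef uint64_t gpa_t;   /* guest physical address */
typedef uint64_t hpa_t;   /* host physical address  */

extern gpa_t guest_pte_lookup(gva_t gva);        /* walk the guest page table */
extern hpa_t gpa_to_hpa(gpa_t gpa);              /* walk the host page table  */
extern void  shadow_set(gva_t gva, hpa_t hpa);   /* install the shadow entry  */

/* Called from the write-protection fault handler when the guest has
 * just modified the guest page-table entry covering 'gva'. */
void shadow_update(gva_t gva)
{
    gpa_t gpa = guest_pte_lookup(gva);   /* what the guest thinks it maps to */
    hpa_t hpa = gpa_to_hpa(gpa);         /* where that page really lives     */
    shadow_set(gva, hpa);                /* hardware-visible composition     */
}
```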

Hardware support for virtual machines

◆  x86 recently added hardware support for running virtual machines at user level

◆  The operating system kernel initializes two sets of translation tables
     One for the guest OS
     One for the host OS

◆  Hardware translates each address in two steps (see the sketch below)
     First using the guest OS tables, then the host OS tables
     TLB holds the composition
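With this nested scheme the hardware performs both walks itself; the sketch below only illustrates the logic of the two-step lookup and the composed TLB entry. The walk helpers and TLB functions are hypothetical (real hardware does this in the MMU, not in software):

```c
/* Sketch of the two-step (nested) translation: guest virtual -> guest
 * physical via the guest tables, then guest physical -> host physical
 * via the host tables. The TLB caches the end-to-end composition. */
#include <stdint.h>

typedef uint64_t gva_t, gpa_t, hpa_t;

extern gpa_t walk_guest_tables(gva_t gva);   /* uses the guest's CR3        */
extern hpa_t walk_host_tables(gpa_t gpa);    /* uses the host table pointer */
extern int   tlb_lookup(gva_t gva, hpa_t *hpa);
extern void  tlb_insert(gva_t gva, hpa_t hpa);

hpa_t translate(gva_t gva)
{
    hpa_t hpa;
    if (tlb_lookup(gva, &hpa))        /* TLB holds the composed mapping */
        return hpa;
    gpa_t gpa = walk_guest_tables(gva);
    hpa = walk_host_tables(gpa);
    tlb_insert(gva, hpa);
    return hpa;
}
```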


Intel® Virtualization Technologies

Intel® VT-x (processor)
   Hardware assists for robust virtualization
   Intel® VT FlexMigration – flexible live migration
   Intel® VT FlexPriority – interrupt acceleration
   Intel® EPT – memory virtualization

Intel® VT-d (chipset) – Intel® VT for Directed I/O
   Reliability and security through device isolation
   I/O performance with direct assignment

Intel® VT-c (network) – Intel® VT for Connectivity
   NIC enhancement with VMDq
   Single Root IOV support
   Network performance and reduced CPU utilization
   Intel® I/OAT for virtualization
   Lower CPU overhead and data acceleration


A Framework for Optimizing Virtualization

•  Reduce overheads from virtualization (reduce overhead within each VM)
   •  Intel® VT-x latency reductions
   •  Extended Page Tables
   •  Virtual Processor IDs
   •  APIC virtualization (FlexPriority)
   •  I/O assignment via DMA remapping
•  Introduce capabilities that increase scaling out across VMs
   •  Intel® Hyper-Threading Technology
   •  PAUSE-loop exiting
   •  Network virtualization

[Figure: (1) reduce overhead within each VM (guest OSes running on the VMM on the host), and (2) scale out across VMs.]


Reducing VT latencies

What makes VM context switching expensive:
•  Saving and loading privileged state
   •  Compounded by consistency checking
•  Addressing-context changes
   •  Translation lookaside buffer (TLB) flushes

Virtual Machine Control Structure (VMCS)
•  Maintains guest & host register state
•  “Backed” by host physical memory
•  Accessed via architectural VMREAD / VMWRITE
•  Enables caching of VMCS state on-die

Virtual Processor IDs (VPIDs)
•  Tag µarch structures (TLBs)
•  Remove the need to flush TLBs
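For illustration, reading a VMCS field from hypervisor code uses the architectural VMREAD instruction. A minimal sketch with GCC inline assembly; note this only works in VMX root operation with a current VMCS, error checking of RFLAGS is omitted, and the field encoding shown should be treated as illustrative:

```c
/* Sketch: reading one VMCS field with VMREAD. Must run in VMX root
 * operation with a current VMCS; no error handling. */
#include <stdint.h>

static inline uint64_t vmcs_read(uint64_t field_encoding)
{
    uint64_t value;
    __asm__ volatile("vmread %1, %0"
                     : "=rm"(value)
                     : "r"(field_encoding)
                     : "cc");
    return value;
}

/* Example use: 0x6802 is the GUEST_CR3 field encoding per the Intel
 * SDM; treat the constant as illustrative here. */
uint64_t read_guest_cr3(void)
{
    return vmcs_read(0x6802);
}
```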


Intel® Virtualization Technology (VT-x)

•  VT-x provides architected assists to allow guest OSes to run directly on hardware
•  On Nehalem and Westmere, VT-x is extended with:
   •  Extended Page Tables (EPT) – eliminate VM exits to the VMM for shadow page-table maintenance
   •  Virtual Processor IDs (VPID) – avoid TLB flushes on VM transitions, giving lower-cost VM transition times
   •  Guest Preemption Timer – lets a VMM preempt a guest OS; aids VMM vendors in flexibility and Quality of Service (QoS)
   •  Descriptor-Table Exiting – traps modifications of guest descriptor tables; allows the VMM to protect a guest from internal attack
   •  Transition latency reductions – continuing improvements in the microarchitectural handling of VMM round trips


Issues with abstracting physical memory

Address translation
•  The guest OS expects contiguous, zero-based physical memory
•  The VMM must preserve this illusion

Page-table shadowing
•  The VMM intercepts paging operations
•  It constructs a copy of the guest's page tables

Overheads
•  VM exits add to execution time
•  Shadow page tables consume significant host memory

[Figure: in each VM (VM 0 … VM n), guest page-table updates induce VM exits; the VMM remaps the guest page tables into shadow page tables, which feed the CPU's TLB and are what the hardware uses to access memory.]


How Extended Page Tables help with abstracting physical memory

Extended Page Tables (EPT)
•  Map guest physical addresses to host physical addresses
•  New hardware page-table walker

Performance benefit
•  A guest OS can modify its own page tables freely and without VM exits

Memory savings
•  A single EPT supports an entire VM, instead of a shadow page table per guest process


Linear address / Extended Page Table (EPT) walk

[Figure: a nested page walk. Starting from the guest's CR3, each of the four guest paging-structure references (PML4, PDP, PDE, PTE) is itself a guest physical address that must be translated through the four-level EPT (rooted at the EPT pointer, EPTP) before the final physical address is produced; the TLB caches the end-to-end result.]

•  Two-level TLB reduces page-table walks
   •  VPID tags help to retain TLB entries
•  Paging-structure caches
   •  Cache intermediate steps in the walk
   •  Result: reduced walk length
   •  Common case: much better than 24 steps (a worked count follows below)

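Where does the figure of 24 memory references come from? Under the usual assumption of a 4-level guest page table and a 4-level EPT with no paging-structure caches, each guest lookup needs its guest-physical pointer translated through the EPT, and the final guest physical address needs one more EPT walk. A tiny worked count of that arithmetic:

```c
/* Worked count of memory references in a full nested (EPT) walk,
 * assuming 4 guest paging levels and 4 EPT levels with no caching. */
#include <stdio.h>

int main(void)
{
    int guest_levels = 4;                 /* PML4, PDP, PDE, PTE            */
    int ept_levels   = 4;                 /* EPT walk per guest-physical ref */

    /* Each guest level: one EPT walk to locate the structure, plus the
     * read of the guest entry itself; then one final EPT walk to
     * translate the resulting guest physical address. */
    int refs = guest_levels * (ept_levels + 1) + ept_levels;
    printf("worst-case memory references: %d\n", refs);   /* prints 24 */
    return 0;
}
```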


Difficulties in virtualizing I/O

Virtual device interface
•  Traps device commands
•  Translates DMA operations
•  Injects virtual interrupts

Software methods
•  I/O device emulation
•  Paravirtualized device interface

Challenges
•  Controlling DMA and interrupts
•  Overheads of copying I/O buffers

[Figure: two software approaches. (1) I/O device emulation: each VM's guest device driver talks to a device model inside the VMM. (2) Paravirtualization: the guest device driver cooperates directly with the VMM's interface. In both cases, the VMM's physical device driver performs the real accesses to memory, storage, and the network.]


Solution: DMA remapping (Intel® VT-d)

[Figure: the DMA remapping engine in the chipset identifies the requesting device by bus/device/function (e.g., bus 0–255, dev 31 func 7), looks it up in memory-resident device-assignment structures, and translates the DMA address through per-device address-translation structures (4 KB page tables), aided by context and translation caches. The resulting memory access uses a host physical address to a 4 KB page frame; illegal accesses trigger fault generation.]


Transparent checkpoint

[Figure: a process executes instructions over time. A checkpoint saves a copy of the process while it continues running; after a failure, the process is restored from the saved copy and resumes executing from that point.]
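One common way to save a copy without stopping the program is fork(): the child shares the parent's pages copy-on-write, so the parent keeps running while the child writes the snapshot out. A minimal sketch; the snapshot format and the write_snapshot helper are hypothetical:

```c
/* Sketch: transparent checkpoint via fork(). The child is a
 * copy-on-write snapshot of the parent's address space at the moment
 * of the fork; the parent continues immediately while the child
 * writes the snapshot to stable storage. */
#include <sys/wait.h>
#include <unistd.h>

extern void write_snapshot(const char *path);   /* hypothetical: dump own memory */

void checkpoint(const char *path)
{
    pid_t pid = fork();
    if (pid == 0) {                 /* child: frozen copy of the parent */
        write_snapshot(path);
        _exit(0);
    } else if (pid > 0) {
        /* parent: keep executing; the child is reaped later
         * (non-blocking here; a SIGCHLD handler would also work) */
        waitpid(pid, NULL, WNOHANG);
    }
}
```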


Question

◆  At what point can we resume the execution of a checkpointed program?
     When the checkpoint starts?
     When the checkpoint is entirely on disk?


Incremental checkpoint

[Figure: checkpoint 1 saves the full memory image (pages A, B, C, D, E). Checkpoints 2 and 3 are incremental: they save only the pages modified since the previous checkpoint (e.g., P, Q and R, S). A full restore combines the last full checkpoint with the later incremental ones (yielding A, P, Q, R, S).]
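Incremental checkpoints need to know which pages changed since the last checkpoint. A standard user-level trick is to write-protect the region after each checkpoint and record pages in the fault handler. A compressed sketch, assuming a single contiguous region; region_base/region_len, save_page, and MAX_PAGES are hypothetical:

```c
/* Sketch: tracking dirty pages between incremental checkpoints by
 * write-protecting the region and catching the first write to each page. */
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 65536

extern char  *region_base;              /* start of the checkpointed region */
extern size_t region_len;
extern void   save_page(void *page);    /* append page to checkpoint file   */

static bool dirty[MAX_PAGES];

static void on_write_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    long pagesz = sysconf(_SC_PAGESIZE);
    char *page = (char *)((uintptr_t)si->si_addr & ~(uintptr_t)(pagesz - 1));
    dirty[(page - region_base) / pagesz] = true;
    mprotect(page, (size_t)pagesz, PROT_READ | PROT_WRITE);  /* let the write proceed */
}

void begin_interval(void)
{
    struct sigaction sa = { .sa_sigaction = on_write_fault,
                            .sa_flags = SA_SIGINFO };
    sigaction(SIGSEGV, &sa, NULL);
    mprotect(region_base, region_len, PROT_READ);    /* catch first writes */
}

void take_incremental_checkpoint(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    for (size_t i = 0; i < region_len / (size_t)pagesz; i++)
        if (dirty[i]) { save_page(region_base + i * (size_t)pagesz); dirty[i] = false; }
    mprotect(region_base, region_len, PROT_READ);    /* start the next interval */
}
```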


Deterministic debugging

◆  Can we precisely replay the execution of a multithreaded process?
     Yes, if the process does not have a memory race

◆  From a checkpoint, record:
     All inputs and return values from system calls
     All scheduling decisions
     All synchronization operations (see the sketch below)
       *  Ex: which thread acquired which lock, in which order
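As an illustration of recording synchronization order, here is a minimal sketch of a lock wrapper that logs which thread acquired which lock during recording and enforces the same order on replay. The log storage and the wait/notify helpers are hypothetical:

```c
/* Sketch: record/replay of lock-acquisition order. During recording,
 * each successful acquire appends (thread, lock) to a log; during
 * replay, a thread waits until the log says it is its turn. */
#include <pthread.h>
#include <stdbool.h>

extern bool replaying;                                   /* mode flag          */
extern void log_acquire(pthread_t tid, void *lock);      /* append to the log  */
extern void wait_for_turn(pthread_t tid, void *lock);    /* block until the
                                                            next log entry matches */

void rr_lock(pthread_mutex_t *m)
{
    if (replaying)
        wait_for_turn(pthread_self(), m);  /* impose the recorded order */
    pthread_mutex_lock(m);
    if (!replaying)
        log_acquire(pthread_self(), m);    /* remember the order we saw */
}
```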

Process migration

◆  What if we checkpoint a process and then restart it on a different machine?
     Process migration: move a process from one machine to another
     Special handling is needed if any system calls are in progress
       *  Where does the system call return to?


Cooperative caching

◆  Can we demand-page to memory on a different machine?
     Remote memory over a LAN is much faster than disk
     On a page fault, look in remote memory first before fetching from disk (sketch below)
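A minimal sketch of that fault path; the remote lookup, disk fetch, frame allocation, and mapping helpers are all hypothetical:

```c
/* Sketch: cooperative caching pager. On a page fault, first ask a
 * peer machine's memory for the page; fall back to local disk only
 * if no peer has it. */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vaddr_t;

extern bool fetch_from_remote_memory(vaddr_t page, void *frame);  /* LAN request */
extern void fetch_from_disk(vaddr_t page, void *frame);
extern void *allocate_frame(void);
extern void map_page(vaddr_t page, void *frame);

void handle_page_fault(vaddr_t faulting_page)
{
    void *frame = allocate_frame();
    if (!fetch_from_remote_memory(faulting_page, frame))  /* fast path: remote RAM */
        fetch_from_disk(faulting_page, frame);            /* slow path: local disk */
    map_page(faulting_page, frame);
}
```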


Distributed virtual memory

◆  Can we make a network of computers appear to be a shared-memory multiprocessor?
     Read-write: if the page is cached only on one machine
     Read-only: if the page is cached on several machines
     Invalid: if the page is cached read-write on a different machine

◆  On a read page fault:
     Change the remote copy to read-only
     Copy the remote version to the local machine

◆  On a write page fault (if cached):
     Change the remote copy to invalid
     Change the local copy to read-write
     (A sketch of both fault handlers follows below.)
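A minimal sketch of the two fault handlers; the directory lookup, remote-messaging, copy, and mapping helpers are hypothetical, and a real protocol would also handle races and multiple read-only copies:

```c
/* Sketch: distributed shared memory (DSM) fault handlers. Each page is
 * READ_WRITE on one machine, READ_ONLY on several, or INVALID locally. */
#include <stdint.h>

typedef uint64_t vaddr_t;
enum page_state { INVALID, READ_ONLY, READ_WRITE };

extern int  page_owner(vaddr_t page);                       /* directory lookup    */
extern void remote_set_state(int node, vaddr_t page, enum page_state s);
extern void copy_page_from(int node, vaddr_t page);         /* pull page contents  */
extern void map_local(vaddr_t page, enum page_state s);

void dsm_read_fault(vaddr_t page)
{
    int owner = page_owner(page);
    remote_set_state(owner, page, READ_ONLY);   /* downgrade the remote writer  */
    copy_page_from(owner, page);                /* fetch a copy of the data     */
    map_local(page, READ_ONLY);                 /* now cached on both machines  */
}

void dsm_write_fault(vaddr_t page)
{
    int owner = page_owner(page);
    remote_set_state(owner, page, INVALID);     /* invalidate the remote copy   */
    copy_page_from(owner, page);                /* get the latest contents      */
    map_local(page, READ_WRITE);                /* exclusive local ownership    */
}
```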


Recoverable virtual memory

◆  Data structures that survive failures
     We want a consistent version of the data structure
     The user marks a region of code as needing to be atomic
       *  Begin transaction, end transaction
     If there is a crash, restore the state from before or after the transaction


Recoverable virtual memory

◆  On begin transaction:
     Snapshot the data structure to disk
     Change page-table permissions to read-only

◆  On page fault:
     Mark the page as modified by the transaction
     Change page-table permissions to read-write

◆  On end transaction:
     Log the changed pages to disk
     Commit the transaction when all modifications are on disk

◆  Recovery:
     Read the last snapshot plus the logged changes, if committed
     (A user-level sketch of this scheme follows below.)
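A user-level sketch of this scheme using mprotect() to detect which pages a transaction touches; the region bounds, snapshot/log/commit helpers, fixed page size, and MAX_PAGES bound are all assumptions for illustration:

```c
/* Sketch: recoverable virtual memory with page-level write detection.
 * begin_transaction() write-protects the region; the fault handler
 * records modified pages; end_transaction() logs them and commits. */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096
#define MAX_PAGES 4096

extern char  *region_base;
extern size_t region_len;
extern void   snapshot_region_to_disk(void);
extern void   log_page_to_disk(void *page);
extern void   write_commit_record(void);       /* the commit point */

static void *modified[MAX_PAGES];
static size_t nmodified;

static void rvm_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    char *page = (char *)((uintptr_t)si->si_addr & ~(uintptr_t)(PAGE_SIZE - 1));
    modified[nmodified++] = page;               /* mark as modified by the txn */
    mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);
}

void begin_transaction(void)
{
    struct sigaction sa = { .sa_sigaction = rvm_fault, .sa_flags = SA_SIGINFO };
    sigaction(SIGSEGV, &sa, NULL);
    snapshot_region_to_disk();                  /* old version, for recovery */
    nmodified = 0;
    mprotect(region_base, region_len, PROT_READ);
}

void end_transaction(void)
{
    for (size_t i = 0; i < nmodified; i++)
        log_page_to_disk(modified[i]);          /* new versions of dirty pages      */
    write_commit_record();                      /* commit once all mods are on disk */
}
```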
