Process/Thread/Virtual Machine ECE7610/7650 © C. Xu 1998-2010 Outline • Process and Thread – Abstraction of execution entity • Software vs HW Implementation – User-level and Kernel-level – Hyper-threading and multi-core • Virtual Machines – Another layer of abstraction of exec entity • Scheduling Policies/Disciplines © C. Xu 1998-2010 Process Model • Process: a program in execution – IE to web browsing, outlook to read email, word to write reports • Process is a full blown virtual machine, defining its own address space and maintains running state – Address space: the set of memory locations that can be generated and accessed directly by a program; enforced by hw for protection – I/O part of the machine is accessed through OS calls. © C. Xu 1998-2010 Under the Hood • Process Control Block (PCB): a key data structure that maintains info associated with each process: – – – – – – – Process state Program counter CPU registers CPU scheduling information Memory-management info Accounting information I/O status information © C. Xu 1998-2010 Process State • As a process executes, it changes state – – – – – new: The process is being created. running: Instructions are being executed. waiting: The process is waiting for some event to occur. ready: The process is waiting to be assigned to a process. terminated: The process has finished execution. © C. Xu 1998-2010 Context Switch • When CPU switches to another process, the system must save the state of the old process and load the saved state for the new process. • Context-switch time is overhead; the system does no useful work while switching. • Time dependent on hardware support. PCB © C. Xu 1998-2010 Process Schedulers • Long-term scheduler (or job scheduler) – selects which processes should be brought into the ready queue. – Invoke infrequently (seconds, minutes) – Control the degree of concurrency • Short-term scheduler (or CPU scheduler) – selects which process should be executed next and allocates CPU. – Invoke frequently (millisecond) enough to provide a perception of concurrent execution on single core system; must be fast – Optimize throughput, average response time, slowdown, revenue, etc – Slowdown is a normalized queuing delay w.r.t. exec time – On multiprocessor or multi-core system, real concurrent execution © C. Xu 1998-2010 Interprocess Comm (IPC) • Processes are usually communicate by sending msgs back and forth via the OS • Processes can share a segment of physical memory by mapping it to their address spaces – data in the segment can be accessed by processes for faster IPC © C. Xu 1998-2010 Thread and Multithreaded Proc • A thread of control is a seq of instructions being executed within a process context. Each has its own logical control flow (pc) • A multithreaded process has two or more threads within the same context. They share code, data in heap, and kernel context(open files, timers, etc) • Each thread has its own id. Thread 1 (main thread) Shared code and data Thread 2 (peer thread) shared libraries stack 1 stack 2 run-time heap read/write data read-only code/data Thread 1 context: Data registers Condition codes SP1 PC1 0 Kernel context: VM structures Open files Signal handlers brk pointer Thread 2 context: Data registers Condition codes SP2 PC2 © C. Xu 1998-2010 Logical View of Threads • Threads associated with a process form a pool of peers. – unlike processes which form a tree hierarchy Threads associated with process foo Process hierarchy P0 T2 T4 T1 P1 shared code, data and kernel context sh T5 sh sh T3 foo bar © C. Xu 1998-2010 Process vs Threads • Processes are typically independent, while threads exist as subsets of a process • Processes carry considerable state info, whereas multiple threads within a process share state as well as memory and other resources • Processes have separate address space, whereas threads share their address space • Processes interact through system-provided IPC • Context switching between threads is an order of magnitude faster than context switching between processes © C. Xu 1998-2010 Why Multithreading • Improve program structure – e.g. producer-consumer problem • Efficiency – – – – Creating a process calls into OS, which duplicates entire address space Synchronizing processes need to trap into the OS, too. Threads could be created in user-space Threads could be synchronized by monitoring shared variables • Match multi-core and multiprocessor arch • Improve application responsiveness – overlapping I/O • e.g. Multithreaded clients, fast file read/write – asynchronous event handling • e.g. network-centric server , GUI © C. Xu 1998-2010 Outline • Process and Thread – Abstraction of execution entity • Software vs HW Implementation – User-level and Kernel-level – Hyper-threading and multi-core • Virtual Machines – Another layer of abstraction of exec entity • Scheduling Policies/Disciplines © C. Xu 1998-2010 Design Issues for Multithreading • Thread management: – thread creation and termination • Thread synchronization: – mutex, condition variable – semaphore, reader/writer – monitor • Thread scheduling – in a way similar to process scheduling © C. Xu 1998-2010 Implementation • User space implementation (green threads) – Transparent to OS kernel – Runtime system manages threads in userspace – E.g. Java Runtime Enviroment • Kernel space implementation – Thread becomes a basic unit of OS’s resource management – e.g. Window, Solaris, Linux • Hardware implementation: Simultaneous Multithreading (SMT) – Duplicate certain CPU sections (arch. states) to make a processor appear as multiple “logical” processors – E.g. Intel’s Hyper-Threading Technology (HTT) © C. Xu 1998-2010 User Space Implementation • Runtime system is a collection of procedures that manage threads – create/terminate threads, synchronization, schedule – RTS maintains a table per process with each entry per thread • thread’s registers, state, priority, etc. – Switch threads when a thread be suspended • switch register values • switch stack pointer and program counter © C. Xu 1998-2010 Kernel Space Implementation • Threads are managed by kernel – creation/termination are through system calls. – Thread table is maintained in the kernel space • Threads are scheduled as processes – when a thread is blocked, the kernel run another thread from the same proc or a different proc • but, relatively heavier – Solution: thread recycling (create more kernel level threads than available processors); But – Idle kernel threads, priority problems, deadlock introduced by blocked kernel threads – Kernel doesn’t know which threads to take away © C. Xu 1998-2010 Pros and Cons of User Threads • Lightweight: an order of magnitude faster than kernel trapping (procedure call ~10ns vs system call ~5us) • flexible in scheduling threads of a process in userspace • Cons: – how to implement blocking system calls • A blocking call may stop all threads of a process • Wrap a blocking call with a jacket: e.g. SELECT, MPI_Test() • No true parallelism without support from kernel thread – Information barrier between library and kernel • A thread holding lock could be scheduled out by kernel • A thread of high priority could be scheduled out by kernel • Hard for time-slicing scheduling – no clock interrupts to threads – Java thread scheduling is left up to implementation • round-robin in Solaris vs time-slicing in Windows • robust code: yield or sleep() © C. Xu 1998-2010 Hybrid User/kernel-Space Impl. • M-1 model (user-level): all app-level threads map to a single kernel-level scheduled entity. portable, easy to programming, but no concurrency • 1:1 model (kernel-level): user-level threads are in 1-1 correspondence with schedulable entities in the kernel. Each user level thread is known to the kernel and all threads can access the kernel at the same time but, hard to program. Linux, Solaris 5.9 and later, NetBSD5, FreeBSD 8 Solaris 5.2 to 5.9, NetBSD2 to 4, FreeBSD 5 and 6 © C. Xu 1998-2010 N-to-M Model: • M:M model (2-level model) to minimizesprogramming effort while reducing the cost and weight of each thread. • Threads, Lightweight Process, Processor – user threads over lightweight processes(lwp) – LWP, supported by kernel-thread, over CPU • A program can create any number of threads. It relies on user-level threads library for scheduling. Kernel only needs to manage currently active threads. • But, complexity, priority inversion, suboptimal scheduling © C. Xu 1998-2010 M-to-M Thread Model • Solaris 5.2 to 5.8, NetBSD 2 to Net BSD 4, FreeBSD 5 to 7 • Create user-level threads via library libthread • User can specify how many LWPs should run these userlevel threads – thr_setconcurrency(new_level) and thr_getconcurrency() • User can bind threads to LWPs, or leave it to the library to schedule – unbound vs bound • User level thread library libthread schedules the threads on the LWPs – preemptive scheduling: thr_setprio() and thr_getprio() • Kernel schedules the LWPs on the available CPUs – each LWP has a unique interval timer and alarm © C. Xu 1998-2010 Simultaneous Multithreading • SMT (Hyper-threading in Pentium 4): duplicate certain sections of the processor, mainly those for storing the architectural states, but not duplicating the main execution resources – Make it appear to OS as multiple “logical” processors – When the execution part is not used due to memory stall, another task can be scheduled for execution RAM CPU • Transparent to OS and programs. But SMP support needs to be enabled to take advantage of HTT • Cons: not energy efficient © C. Xu 1998-2010 Multicore Architecture • Combine 2 or more independent cores (normally CPU) into a single package • Support multitasking and multithreading in a single physical package © C. Xu 1998-2010 Hyperthreading vs Multicore © C. Xu 1998-2010 Multithreading on multi-core David Geer, IEEE Computer, 2007 © C. Xu 1998-2010 Outline • Process and Thread – Abstraction of execution entity • Software vs HW Implementation – User-level and Kernel-level – Hyper-threading and multi-core • Virtual Machines – Another layer of abstraction of exec entity • Scheduling Policies/Disciplines © C. Xu 1998-2010 What is Virtualization • In computing, virtualization is an abstraction layer (sw) that hides physical characteristics of computing platform from the users, instead showing another abstract, emulated computing platform [wikipedia] • Machine virtualization is about the creation and management of VMs on a “real” machine – E.g. Java Virtual Machine to accept java bytecode in the form .class files. – Virtual PC allows Windows app to be run on Mac/PowerPC – SunPC sw emulates a PC hw environment on Solaris/SPARC [10 years ago] – How about Cygwin, which provides Linux-like environment for Windows? – etc © C. Xu 1998-2010 Computer Systems Arch Application Programs Runtime Libraries Operating System drivers Memory mng sched Execution Hardware Memory Translation System Interconnect (bus) Controllers I/O devices and networking Controllers Main memory © C. Xu 1998-2010 Computer Systems Arch (cont’) • Instruction set arch. (ISA), introduced in IBM 360 series in early 60’s, provides an interface between HW and SW, so that hw could be implemented in various ways – Various models provides different perf/price levels – ISA makes SW portable in the same series. • OS provides a first layer of abstraction, that hides specifies of the hw from programs. – ISA provides an interface Application Software OS System ISA User ISA ISA Machine © C. Xu 1998-2010 Application Binary Interface • From the perspective of a user process, the machine is a combination of the OS and the underlying user-level HW, defined by the ABI interface Application Software System Calls User ISA ABI Machine Application Binary Interface © C. Xu 1998-2010 Virtual Machine • A VM executes sw in the same manner as the machine for which the sw was developed. • VM is implemented as a combination of a real machine and virtualizing software – The VM may have different resources different from the real machine either in quantity or type (cpu, mem, I/O, etc) – A VM often provides a less perf than an equivalent real machine running the same sw © C. Xu 1998-2010 Virtualization • Mapping of virtual resources or state (e.g. registers, memory, files, etc) to real resources • User of real machine instructions and/or system calls to carry out the actions specified by VM instructions and/or system calls (e.g. emulation of the VM ABI or ISA) • Two types of VM – Process VM, running as a user process – System VM, running as a OS © C. Xu 1998-2010 Process VMs • Process-level VMs provide user apps with a virtual ABI env for – Replication, emulation, and optimization • Types of process-level VMs: – Multiprogramming – Emulators and Dynamic Binary Translators – Same-ISA Binary Optimizers – High-Level Language Virtual Machine © C. Xu 1998-2010 HLL Process VM (a) Platform-dependent code is distributed in traditional systems (b) In HLL Process VM like JVM and MS common language infrs (CML): portable immediate code is interpreted by a platform-dependent VM – bytecode intermediate ISA, stack arch, runtime lib support © C. Xu 1998-2010 System Virtual Machine • VM monitor (VMM): provide a complete exec environment supporting multiple user processes, providing them with access to I/O devices, and supports GUI if on the desktop © C. Xu 1998-2010 System VMs Architecture • Two types of VMM: hypervisor versus hosted VM arch • Hypervisor (bare-metal) arch: – VMM is placed on bare HW, and VMs fit on top – VMM runs in the most highly privileged mode, and VMs run with lesser privileges – VMM intercepts and impl all the guest OS’s actions that interact with HW resources efficient – But, reinstallation, I/O device drivers must be available in VMM Window apps Linux apps Windows Linux VMM IA-32 © C. Xu 1998-2010 System VM Arch: Hosted VM • Place virtualizing sw on top of an existing host OS • Easy in management (installation), relying on Host OS for device drivers Guest Apps • But, less efficient Guest OS (win) Apps VMM Host OS (e.g. Linux) hardware © C. Xu 1998-2010 OS vs Runtime Library • RT Library • OS – Processes, each has an illusion of the machine – Multitasking: context switch between the illusions – System API (calls) to be called by apps – Ops in privileged mode permit the direct manipulation, allocation, and observation of shared hw resources like CPU, mem, and I/O – Runtime: program execution duration , vs compile-time – RT library: functions for I/O, mem, math, errorchecking, etc , which are performed at runtime • compiler-specific and platform-specific – Implementation of RT library needs access to OS © C. Xu 1998-2010 Virtualization Usage Models • Reduce total cost of ownership (TCO) by server consolidation – Increased systems utilization (current servers have less than 10% average utilization, less than 50% peak utilization) – Reduce hardware (25% of the TCO) – Space, electricity, cooling (50% of the operating cost of a data center) © C. Xu 1998-2010 Vir Usage: Dynamic Datacenter • Simplify management for perf. and availability – – – – Dynamic provisioning on demand Workload management/isolation VM migration Reconfiguration © C. Xu 1998-2010 Vir Usage: Client Centralization • Virtual desktop consolidation • VM streaming © C. Xu 1998-2010 Vir Usage: Security • Packet inspection through a vm security appliance • Env isolation & “disposable VMs” © C. Xu 1998-2010 Other Vir Usage Models • • • • • • • • Mixed-OS envs Legacy applications Multiplatform application development New system transition System sw development OS training Help desk support OS instrumentation • Virtualization protects IT investment • Virtualization is a true scalable multi-core work load © C. Xu 1998-2010 Inside the VMM • Resource Virtualization – Processor – Memory – Device and I/O • Inter-VM Comm – Event channel – Datagram channel – Shared mem mapping © C. Xu 1998-2010 CPU Virtualization • Execution of the guest instructions – Direct native execution, but not always possible • “A virtualizable arch allows any instruction inspecting/modifying machine state to be trapped when executed in any but the most privileged mode.” (Popek & Goldberg’1974) • X86 is NOT Applications Applications User Library User Library Kernel Hardware Conventional Arch User Space (unprivileged) Kernel Space (privileged) Kernel VM Monitor Hardware © C. Xu 1998-2010 Virtualization Arch Sufficient Conditions for Vir – Classification of Instructions: – Privileged instruction traps if the machine is in user mode and does not trap if in system mode – Control-sensitive instructions attempt to change the configuration of resources in the system – Behavior-sensitive instructions: results produced depend on the configuration of resources – A VMM may be constructed if the set of sensitive instructions is a subset of the privileged instructions – Intuitively, it is sufficient that all instructions that could affect the correct functioning of the VMM (sensitive instructions) always trap and pass control to the VMM. © C. Xu 1998-2010 Challenges of X86 Virtualization • IS-32 contains 16 sensitive, but non-privileged instructions – Sensitive register instructions: read or change sensitive registers and/or memory locations such as a clock register or interrupt registers: • SGDT, SIDT, SLDT, SMSW, PUSHF, POPF – Protection system instructions: reference the storage protection system, memory or address relocation system: • LAR, LSL, VERR, VERW, POP, PUSH, CALL, JMP, INT n, RET, STR, MOV • E.g. IA-32 POPF instruction pops the flag registers from a stack held in memory. © C. Xu 1998-2010 X86 Virtualization • Full virtualization using binary translation to trap and virtualize the execution of certain instructions – VMware Virtual Platform (patent for BT in 1998) • OS assisted vir or Para-virtualization – Xen • Hardware assisted virtualization © C. Xu 1998-2010 Full Virtualization • User level codes are executed directly on the processor • Translate kernel code to replace non-virtualizable instructions with new sequences of instructions that have the intended effects on the virutal haredware. X86 privileged level arch w/o virtualization Binary translation © C. Xu 1998-2010 OS-assisted Vir (Para-Vir) • Modify the OS kernel to replace non-virtualizable instructions with hypercalls that comm directly with the virtualization layer hypervisor • Hypervisor also provides hypercall interfaces for other critical kernel ops such as mem management, interrupt handling and time keeping © C. Xu 1998-2010 Hardware Assisted Virtualization • Intel Virtual Technology (VT-X) and AMD’s AMD-V • A new CPU exe mode that allows VMM to run in a new root mode below ring 0 • Privileged and sensitive calls are set to automatically trap to the hypervisor 51 © C. Xu 1998-2010 Memory Virtualization • Native platform (without VMM) : – OS keeps maps from virtual address space to real memory which is physical memory (tranlsate-lookup table) • Virtualized platform (with VMM): – Guest’s real memory must undergo further mapping to determine address in physical memory of host hardware © C. Xu 1998-2010 I/O Virtualization • Manage the routing of I/O requests between virtual devices and the shared physical hardware • For a given I/O device type, construct a virtual version of the device and then virtualize I/O activity directed at the device • When guest VM makes request to use virtual device, request is intercepted and converted to the equivalent on the physical device • Different treatments for different devices: – Partitioned devices: disk – Dedicated devices: mouse, console, keyboard… – Shared devices: network adapter © C. Xu 1998-2010 VMWare Arch • VMM doesn’t have access to I/O devices • I/O belongs to “host OS” © C. Xu 1998-2010 Xen 2.0 Arch • Para-virtualization • I/O through VM0 VM0 VM1 VM2 VM3 Device Manager & Control s/w Unmodified User Software Unmodified User Software Unmodified User Software GuestOS GuestOS GuestOS GuestOS (XenLinux) (XenLinux) (XenLinux) (Solaris) Front-End Device Drivers Front-End Device Drivers Front-End Device Drivers Back-Ends Native Device Drivers Control IF Safe HW IF Event Channel Virtual CPU Virtual MMU Xen Virtual Machine Monitor Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE) © C. Xu 1998-2010 Xen 3.0 Arch • Full virtualization on VT-x machines AGP ACPI PCI x86_32 x86_64 IA64 VM0 VM1 VM2 Device Manager & Control s/w Unmodified User Software Unmodified User Software Unmodified User Software GuestOS GuestOS GuestOS (XenLinux) (XenLinux) (XenLinux) Unmodified GuestOS (WinXP)) Back-End SMP Native Device Drivers Control IF Front-End Device Drivers Safe HW IF Front-End Device Drivers Event Channel Virtual CPU Front-End Device Drivers VT-x Virtual MMU Xen Virtual Machine Monitor Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE) © C. Xu 1998-2010 In Summary • Process and Thread – Abstraction of execution entity • Software vs HW Implementation – User-level and Kernel-level – Hyper-threading and multi-core • Virtual Machines – Another layer of abstraction of exec entity • Scheduling Policies/Disciplines © C. Xu 1998-2010