Virtualization
Virtual Data Center Design

• In computing, virtualization is a broad term that refers to the abstraction of computer resources.
• It is "a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This includes making a single physical resource (such as a server, an operating system, an application, or storage device) appear to function as multiple logical resources; or it can include making multiple physical resources (such as storage devices or servers) appear as a single logical resource."
• The common theme of all virtualization technologies is the hiding of technical detail through encapsulation.
• Virtualization creates an external interface that hides an underlying implementation, e.g. by multiplexing access, by combining resources at different physical locations, or by simplifying a control system.
• It is divided into two main categories:
  – Platform virtualization involves the simulation of virtual machines.
  – Resource virtualization involves the simulation of combined, fragmented, or simplified resources.

Platform Virtualization

• The creation of a virtual machine using a combination of hardware and software is referred to as platform virtualization.
• Platform virtualization is performed on a given hardware platform by "host" software (a control program), which creates a simulated computer environment (a virtual machine) for its "guest" software.
• The "guest" software, which is often itself a complete operating system, runs just as if it were installed on a stand-alone hardware platform.
• Typically, many such virtual machines are simulated on a given physical machine.
• For the "guest" system to function, the simulation must be robust enough to support all the guest system's external interfaces, which (depending on the type of virtualization) may include hardware drivers.
• There are several approaches to platform virtualization, listed below in order of how complete a hardware simulation each implements. (The following terms are not universally recognized as such, but the underlying concepts are all found in the literature.)
• Emulation or simulation – the virtual machine simulates the complete hardware, allowing an unmodified "guest" OS for a completely different CPU to be run. This approach has long been used to enable the creation of software for new processors before they were physically available. Examples include Bochs, PearPC, the PPC version of Virtual PC, QEMU without acceleration, and the Hercules emulator. Emulation is implemented using a variety of techniques, from state machines to the use of dynamic recompilation on a full virtualization platform.

[Figure: Hardware emulation uses a VM to simulate the required hardware]

• Native virtualization and full virtualization – the virtual machine simulates enough hardware to allow an unmodified "guest" OS (one designed for the same CPU) to be run in isolation. Typically, many instances can be run at once. This approach was pioneered in 1966 with CP-40 and CP[-67]/CMS, predecessors of IBM's VM family.
• Examples include Virtual Iron, VMware Workstation, VMware Server (formerly GSX Server), Parallels Desktop, Adeos, Mac-on-Linux, Win4BSD, Win4Lin Pro, and z/VM.

[Figure: Full virtualization uses a hypervisor to share the underlying hardware]

Hypervisor

• In computing, a hypervisor (also: virtual machine monitor) is a virtualization platform that allows multiple operating systems to run on a host computer at the same time. The term usually refers to an implementation using full virtualization.
• Hypervisors are currently classified in two types:
  – Type 1 hypervisor (or Type 1 virtual machine monitor) is software that runs directly on a given hardware platform (as an operating system control program).
    A "guest" operating system thus runs at the second level above the hardware.
    • The classic Type 1 hypervisor was CP/CMS, developed at IBM in the 1960s, ancestor of IBM's current z/VM. More recent examples are Xen, VMware's ESX Server, and Sun's Hypervisor (released in 2005).
  – Type 2 hypervisor (or Type 2 virtual machine monitor) is software that runs within an operating system environment. A "guest" operating system thus runs at the third level above the hardware.
    • Examples include VMware Server and Microsoft Virtual Server.

Platform Virtualization

• Partial virtualization (including "address space virtualization") – the virtual machine simulates multiple instances of much (but not all) of an underlying hardware environment, particularly address spaces. Such an environment supports resource sharing and process isolation, but does not allow separate "guest" operating system instances. Although not generally viewed as a virtual machine category per se, this was an important approach historically, and was used in such systems as CTSS, the experimental IBM M44/44X, and arguably such systems as OS/VS1, OS/VS2, and MVS. (Many more recent systems, such as Microsoft Windows and Linux, as well as the remaining categories below, also use this basic approach.)
• Paravirtualization – the virtual machine does not necessarily simulate hardware, but instead (or in addition) offers a special API that can only be used by modifying the "guest" OS. This system call to the hypervisor is called a "hypercall" in Xen, Parallels Workstation, and Enomalism; it is implemented via a DIAG ("diagnose") hardware instruction in IBM's CMS under VM (which was the origin of the term hypervisor).
• Examples include VMware ESX Server, Win4Lin 9x, and z/VM.
[Figure: Paravirtualization shares the process with the guest operating system]

• Operating system-level virtualization – virtualizing a physical server at the operating system level, enabling multiple isolated and secure virtualized servers to run on a single physical server. The "guest" OS environments share the same OS as the host system, i.e. the same OS kernel is used to implement the "guest" environments. Applications running in a given "guest" environment view it as a stand-alone system.
• Examples are Linux-VServer, Virtuozzo, OpenVZ, Solaris Containers, and FreeBSD Jails.

[Figure: Operating system-level virtualization isolates servers]

• Application virtualization – running a desktop or server application locally, using local resources, within an appropriate virtual machine; this is in contrast with running the application as conventional local software, i.e. software that has been "installed" on the system. (Compare this approach with software installation and Terminal Services.)
  – Such a virtualized application runs in a small virtual environment containing the components needed to execute, such as registry entries, files, environment variables, user interface elements, and global objects.
  – This virtual environment acts as a layer between the application and the operating system, and eliminates application conflicts and application-OS conflicts. Examples include the Sun Java Virtual Machine, Softricity, Thinstall, Altiris, and Trigence. (This approach to virtualization is clearly different from the preceding ones; only an arbitrary line separates it from such virtual machine environments as Smalltalk, FORTH, Tcl, P-code, or any interpreted language.)

Resource Virtualization

• The basic concept of platform virtualization was later extended to the virtualization of specific system resources, such as storage volumes, name spaces, and network resources.
• Resource aggregation, spanning, or concatenation combines individual components into larger resources or resource pools. For example:
  – RAID and volume managers combine many disks into one large logical disk.
  – Storage virtualization refers to the process of completely abstracting logical storage from physical storage, and is commonly used in SANs. The physical storage resources are aggregated into storage pools, from which the logical storage is created. Multiple independent storage devices, which may be scattered over a network, appear to the user as a single, location-independent, monolithic storage device, which can be managed centrally.
  – Channel bonding and network equipment combine multiple links so that they work as though they offered a single, higher-bandwidth link.
  – Virtual Private Network (VPN), Network Address Translation (NAT), and similar networking technologies create a virtualized network namespace within or across network subnets.
  – Multiprocessor and multi-core computer systems often present what appears as a single, fast processor.
• Computer clusters, grid computing, and virtual servers use the above techniques to combine multiple discrete computers into larger metacomputers.
• Partitioning is the splitting of a single resource (usually large), such as disk space or network bandwidth, into a number of smaller, more easily utilized resources of the same type. This is sometimes also called "zoning," especially in storage networks.
• Encapsulation is the hiding of resource complexity by the creation of a simplified interface. For example, CPUs often incorporate cache memory or pipelines to improve performance, but these elements are not reflected in their virtualized external interface. Similar virtualized interfaces hiding complex implementations are found in disk drives, modems, routers, and many other "smart" devices.
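The aggregation case above, where a volume manager combines many disks into one large logical disk, can be illustrated with a small sketch. This is a toy model under stated assumptions, not how any particular volume manager works: the class name `ConcatenatedVolume`, the method `locate`, and the disk sizes are all hypothetical, and simple end-to-end concatenation is assumed (no striping or redundancy).

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVolume:
    """Toy logical volume that concatenates several disks end to end."""

    def __init__(self, disk_sizes):
        # disk_sizes: number of blocks on each (hypothetical) physical disk
        if not disk_sizes or any(size <= 0 for size in disk_sizes):
            raise ValueError("need at least one disk, each with a positive size")
        self.disk_sizes = list(disk_sizes)
        # ends[i] is the first logical block number that lies beyond disk i
        self.ends = list(accumulate(self.disk_sizes))

    @property
    def total_blocks(self):
        # The user sees one disk whose size is the sum of all members
        return self.ends[-1]

    def locate(self, logical_block):
        """Map a logical block number to (disk index, block on that disk)."""
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block out of range")
        disk = bisect_right(self.ends, logical_block)
        start = self.ends[disk - 1] if disk > 0 else 0
        return disk, logical_block - start


volume = ConcatenatedVolume([100, 50, 200])
print(volume.total_blocks)  # 350
print(volume.locate(0))     # (0, 0)
print(volume.locate(120))   # (1, 20)
print(volume.locate(349))   # (2, 199)
```

The caller only ever deals with logical block numbers; which physical disk holds a block is hidden behind `locate`, which is the encapsulation idea described above.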
Linux-related virtualization projects

Project        Type                                   License
Bochs          Emulation                              LGPL
QEMU           Emulation                              LGPL/GPL
VMware         Full virtualization                    Proprietary
z/VM           Full virtualization                    Proprietary
Xen            Paravirtualization                     GPL
UML            Paravirtualization                     GPL
Linux-VServer  Operating system-level virtualization  GPL
OpenVZ         Operating system-level virtualization  GPL

Bochs (emulation)

• Bochs is an x86 computer simulator that is portable and runs on a variety of platforms, including x86, PowerPC, Alpha, SPARC, and MIPS. What makes Bochs interesting is that it doesn't just simulate the processor but the entire computer, including the peripherals, such as the keyboard, mouse, video graphics hardware, network interface card (NIC) devices, and so on.
• Bochs can be configured as an older Intel® 386, or successor processors such as the 486, Pentium, Pentium Pro, or a 64-bit variant. It even emulates optional instruction-set extensions such as MMX and 3DNow!.
• Using the Bochs emulator, you can run any Linux distribution on Linux, Microsoft® Windows® 95/98/NT/2000 (and a variety of applications) on Linux, and even the Berkeley Software Distribution (BSD) operating systems (FreeBSD, OpenBSD, and so on) on Linux.

QEMU (emulation)

• QEMU is another emulator, like Bochs, but it has some differences that are worth noting. QEMU supports two modes of operation. The first is the Full System Emulation mode. This mode is similar to Bochs in that it emulates a full personal computer (PC) system with processor and peripherals. This mode emulates a number of processor architectures, such as x86, x86_64, ARM, SPARC, PowerPC, and MIPS, with reasonable speed using dynamic translation. Using this mode, you can emulate the Windows operating systems (including XP) and Linux on Linux, Solaris, and FreeBSD. Many other operating system combinations are also supported (see the Resources section for more information).
• QEMU also supports a second mode called User Mode Emulation.
  In this mode, which can only be hosted on Linux, a binary for a different architecture can be launched. This allows, for example, a binary compiled for the MIPS architecture to be executed on Linux running on x86. Other architectures supported in this mode include ARM, SPARC, and PowerPC, though more are under development.

VMware (full virtualization)

• VMware is a commercial solution for full virtualization. A hypervisor sits between the guest operating systems and the bare hardware as an abstraction layer. This abstraction layer allows any operating system to run on the hardware without knowledge of any other guest operating system.
• VMware also virtualizes the available I/O hardware and places drivers for high-performance devices into the hypervisor.
• The entire virtualized environment is kept as a file, meaning that a full system (including guest operating system, VM, and virtual hardware) can be easily and quickly migrated to a new host for load balancing.

z/VM (full virtualization)

• While IBM System z™ is a new brand name, it has a long heritage dating back to the 1960s. The System/360 supported virtualization using virtual machines in 1965. Interestingly, the System z retains backward compatibility with the older System/360 line.
• z/VM® is the operating system hypervisor for the System z. At its core is the Control Program (CP), which provides the virtualization of physical resources to the guest operating systems, including Linux (see the figure on the next slide). This permits multiple processors and other resources to be virtualized for a number of guest operating systems.
• z/VM can also emulate a guest local area network (LAN) virtually for those guest operating systems that want to communicate with each other. This is emulated entirely in the hypervisor, making it highly secure.
[Figure: z/VM (full virtualization)]

Xen (paravirtualization)

• Xen is a free open source solution for operating system-level paravirtualization from XenSource. Recall that in paravirtualization the hypervisor and the operating system collaborate on the virtualization, requiring operating system changes but resulting in near-native performance.
• As Xen requires collaboration (modifications to the guest operating system), only those operating systems that are patched can be virtualized over Xen. From the perspective of Linux, which is itself open source, this is a reasonable compromise because the result is better performance than full virtualization. But from the perspective of wide support (such as supporting other non-open source operating systems), it's a clear disadvantage.
• It is possible to run Windows as a guest on Xen, but only on systems whose processors provide Intel Vanderpool or AMD Pacifica hardware virtualization support. Other operating systems that support Xen include Minix, Plan 9, NetBSD, FreeBSD, and OpenSolaris.

User-mode Linux (paravirtualization)

• User-mode Linux (UML) allows a Linux operating system to run other Linux operating systems in user-space. Each guest Linux operating system exists within a process of the host Linux operating system (see Figure 6). This permits multiple Linux kernels (with their own associated user-spaces) to run within the context of a single Linux kernel.
• As of the 2.6 Linux kernel, UML resides in the main kernel tree, but it must be enabled and the kernel then recompiled for use. These changes provide, among other things, device virtualization. This allows the guest operating systems to share the available physical devices, such as the block devices (floppy, CD-ROM, and file systems, for example), consoles, NIC devices, sound hardware, and others.
• Note that since the guest kernels run in application space, they must be specially compiled for this use (though they can be different kernel versions).
  This results in what's called the host kernel (which resides on the hardware) and the guest kernel (which runs in the user space of the host kernel). These kernels can even be nested, allowing a guest kernel to run on another guest kernel that is running on the host kernel.

[Figure: User-mode Linux (paravirtualization)]

Linux-VServer (operating system-level virtualization)

• Linux-VServer is a solution for operating system-level virtualization. Linux-VServer virtualizes the Linux kernel so that multiple user-space environments, otherwise known as Virtual Private Servers (VPS), run independently with no knowledge of one another. Linux-VServer achieves user-space isolation through a set of modifications to the Linux kernel.
• To isolate the individual user-spaces from one another, you begin with the concept of a context. A context is a container for the processes of a given VPS, so that tools like ps know only about the processes of that VPS. For initial boot, the kernel defines a default context. A spectator context also exists for administration (to view all executing processes). As you can guess, the kernel and internal data structures are modified to support this approach to virtualization.
• Linux-VServer also uses a form of chroot to isolate the root directory for each VPS. Recall that chroot allows a new root directory to be specified, but additional functionality is required (called a ChrootBarrier) so that a VPS can't escape its isolated root directory to the parent. Given an isolated root directory, each VPS has its own user list and root password.
• Linux-VServer is supported by both the 2.4 and 2.6 Linux kernels and operates on a number of platforms, including x86, x86-64, SPARC, MIPS, ARM, and PowerPC.
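The context idea described above can be sketched in a few lines. This is a simplified model, not Linux-VServer's kernel code: the `Process` record, the `visible_processes` function, and the numeric context IDs are assumptions made for illustration.

```python
from dataclasses import dataclass

HOST_CONTEXT = 0       # assumed ID for the default context defined at boot
SPECTATOR_CONTEXT = 1  # assumed ID for the administrative "see everything" context


@dataclass(frozen=True)
class Process:
    pid: int
    name: str
    context: int  # ID of the context (VPS) that owns the process


def visible_processes(process_table, viewer_context):
    """Return the processes a ps-like tool would list for the given viewer."""
    if viewer_context == SPECTATOR_CONTEXT:
        # The spectator context may observe every process on the machine
        return list(process_table)
    # Any other context sees only the processes it owns
    return [p for p in process_table if p.context == viewer_context]


table = [
    Process(1, "init", HOST_CONTEXT),
    Process(210, "httpd", 100),
    Process(211, "sshd", 100),
    Process(305, "mysqld", 200),
]

print([p.name for p in visible_processes(table, 100)])    # ['httpd', 'sshd']
print([p.name for p in visible_processes(table, 200)])    # ['mysqld']
print(len(visible_processes(table, SPECTATOR_CONTEXT)))   # 4
```

The filter sits where the process list is produced, so tools running inside a VPS need no changes; they simply never receive entries belonging to other contexts.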
OpenVZ (operating system-level virtualization)

• OpenVZ is another operating system-level virtualization solution, like Linux-VServer, but it has some interesting differences.
• OpenVZ is a virtualization-aware (modified) kernel that supports isolated user-spaces (VPSes), with a set of user tools for management.
• For example, you can easily create a new VPS from the command line:

    $ vzctl create 42 --ostemplate fedora-core-4
    Creating VPS private area
    VPS private area was created
    $ vzctl start 42
    Starting VPS ...
    VPS is mounted

• You can also list the currently created VPSes using the vzlist command, which operates in a similar fashion to the standard Linux ps command.
• To schedule processes, OpenVZ includes a two-level CPU scheduler. First, the scheduler determines which VPS should get the CPU. After this is done, the second-level scheduler picks the process to execute given the standard Linux priorities.
• OpenVZ also includes what are called beancounters. A beancounter consists of a number of parameters that define resource distribution for a given VPS. This provides a level of control over a VPS, defining how much memory is available, how many interprocess communication (IPC) objects are available, and so on.
• A unique feature of OpenVZ is the ability to checkpoint and migrate a VPS from one physical server to another. Checkpointing means that the state of a running VPS is frozen and stored in a file. This file can then be migrated to a new server and restored to bring the VPS back online.
• OpenVZ supports a number of hardware architectures, including x86, x86-64, and PowerPC.

Hardware support for full virtualization and paravirtualization

• Recall that the IA-32 (x86) architecture creates some issues when it comes to virtualization.
  Certain privileged-mode instructions do not trap, and can return different results based upon the mode. For example, the x86 STR instruction retrieves the security state, but the value returned is based upon the particular requester's privilege level. This is problematic when attempting to virtualize different operating systems at different levels. For example, the x86 supports four rings of protection, where level 0 (the highest privilege) typically runs the operating system, levels 1 and 2 support operating system services, and level 3 (the lowest level) supports applications. Hardware vendors have recognized this shortcoming (and others), and have produced new designs that support and accelerate virtualization.
• Intel is producing new virtualization technology that will support hypervisors for both the x86 (VT-x) and Itanium® (VT-i) architectures.
• VT-x supports two new forms of operation:
  – one for the VMM (root)
  – one for guest operating systems (non-root).
• The root form is fully privileged, while the non-root form is deprivileged (even for ring 0).
• The architecture also supports flexibility in defining the instructions that cause a VM (guest operating system) to exit to the VMM and store off processor state. Other capabilities have been added.
• AMD is also producing hardware-assisted virtualization technology, under the name Pacifica.
• Among other things, Pacifica maintains a control block for each guest operating system, which is saved on execution of special instructions.
• The VMRUN instruction allows a virtual machine (and its associated guest operating system) to run until the VMM regains control (which is also configurable). The configurability allows the VMM to customize the privileges for each of the guests.
• Pacifica also amends address translation with host and guest memory management unit (MMU) tables.
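On a Linux host, one quick way to see whether the processor advertises these extensions is to look at the CPU flags in /proc/cpuinfo: `vmx` indicates Intel VT-x and `svm` indicates AMD's extension (Pacifica, later marketed as AMD-V). The sketch below takes the file contents as a string so it can be tried anywhere; the function name and the sample text are made up for illustration.

```python
def hardware_virtualization_support(cpuinfo_text):
    """Return the virtualization extensions advertised in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() != "flags":
            continue
        flags = value.split()
        if "vmx" in flags:  # Intel VT-x
            found.add("Intel VT-x")
        if "svm" in flags:  # AMD Pacifica / AMD-V
            found.add("AMD Pacifica (AMD-V)")
    return found


# Made-up sample in the same "key : value" layout as /proc/cpuinfo
sample = "processor\t: 0\nflags\t\t: fpu vme de pse msr pae vmx est\n"
print(hardware_virtualization_support(sample))  # {'Intel VT-x'}
```

On a real Linux machine the same function can be called with `open("/proc/cpuinfo").read()`. An empty result means the flag is not advertised, which can also happen when the feature is disabled in firmware.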
Linux KVM (Kernel Virtual Machine)

• The most recent news out of Linux is the incorporation of KVM into the Linux kernel (2.6.20).
• KVM is a full virtualization solution that is unique in that it turns a Linux kernel into a hypervisor using a kernel module.
• This module allows other guest operating systems to then run in user-space of the host Linux kernel (see the figure on the next slide).
• The KVM module in the kernel exposes the virtualized hardware through the /dev/kvm character device.
• The guest operating system interfaces to the KVM module using a modified QEMU process for PC hardware emulation.

[Figure: Linux KVM (Kernel Virtual Machine)]

• The KVM module introduces a new execution mode into the kernel. Where vanilla kernels support kernel mode and user mode, KVM introduces a guest mode. The guest mode is used to execute all non-I/O guest code, whereas normal user mode supports I/O for guests.
• The introduction of KVM is an interesting evolution of Linux, as it represents the first virtualization technology that is part of the mainline Linux kernel. It exists in the 2.6.20 tree, but can be used as a kernel module for the 2.6.19 kernel. When run on hardware that supports virtualization, Linux (32- and 64-bit) and Windows (32-bit) guests are supported.

Virtualization Examples

• Server consolidation - Virtual machines are used to consolidate many physical servers into fewer servers, which in turn host virtual machines. Each physical server is reflected as a virtual machine "guest" residing on a virtual machine host system. This is also known as Physical-to-Virtual or "P2V" transformation.
• Disaster recovery - Virtual machines can be used as "hot standby" environments for physical production servers.
  This changes the classical "backup-and-restore" philosophy by providing backup images that can "boot" into live virtual machines, capable of taking over the workload of a production server experiencing an outage.
• Testing and training - Hardware virtualization can give root access to a virtual machine. This can be very useful, for example in kernel development and operating system courses.
• Portable applications - The Microsoft Windows platform has a well-known issue involving the creation of portable applications, needed (for example) when running an application from a removable drive without installing it on the system's main disk drive. This is a particular issue with USB drives. Virtualization can be used to encapsulate the application with a redirection layer that stores temporary files, Windows Registry entries, and other state information in the application's installation directory, and not within the system's permanent file system. See portable applications for further details. It is unclear whether such implementations are currently available.
• Portable workspaces - Recent technologies have used virtualization to create portable workspaces on devices like iPods and USB memory sticks. These products include:
  – Application level: Thinstall, a driver-less solution for running "Thinstalled" applications directly from removable storage without system changes or the need for Admin rights.
  – OS level: MojoPac, Ceedo, and U3, which allow end users to install some applications onto a storage device for use on another PC.
  – Machine level: moka5 and LivePC, which deliver an operating system with a full software suite, including isolation and security protections.

Server Virtualization

• Server virtualization is used to describe many different technologies and approaches to abstract operating systems from hardware.
• Server virtualization presents a virtual view of hardware to an operating system to allow multiple operating systems to share the same physical resource in complete isolation from each other.
• The key benefits of virtualization are:
  – Isolation: A virtual server’s state is unaffected by the state of other virtual servers on the same physical hardware.
  – Encapsulation: The state of a virtual server can be captured, and the files representing a virtual server are portable.
  – Hardware-independence: Virtual hardware does not have to be identical to the underlying physical hardware.

X86 Virtualization

• The x86 architecture was not originally designed for virtualization.
• This created tradeoffs in early server virtualization implementations in terms of both performance and complexity.
• Historically, there have been two approaches to virtualizing the x86 architecture:
  – binary patching
  – paravirtualization.
• Although both approaches create the illusion of physical hardware to achieve the goal of operating system independence from the hardware, there are significant differences between them.
• Full virtualization with binary patching rewrites, at run time, x86 instructions that cannot be trapped, converting them into a series of instructions that can be trapped and virtualized. Full virtualization is capable of running existing, legacy operating systems without modifications; however, it has significant costs in complexity and run-time performance.
• Paravirtualization modifies an operating system to replace non-trappable x86 instructions with a series of calls directly into a hypervisor (a virtual machine monitor). It achieves high performance with less complexity in the virtualization layer but requires the guest operating system to be substantially modified and tied to a particular version of the hypervisor.
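The binary patching idea above can be shown with a deliberately simplified model. Real implementations decode and rewrite machine code; here instructions are plain mnemonic strings, the set below is only a subset of the x86 instructions that are sensitive yet do not trap in user mode, and `TRAP_TO_VMM` is a hypothetical marker standing in for "a sequence of instructions that does trap".

```python
# A subset of x86 instructions that are sensitive but do not trap in user mode.
# Only mnemonics are modelled; no real instruction decoding takes place.
NON_TRAPPING_SENSITIVE = {"SGDT", "SIDT", "SLDT", "SMSW", "STR", "PUSHF", "POPF"}


def binary_patch(instructions):
    """Rewrite a guest instruction stream so sensitive operations reach the VMM."""
    patched = []
    for op in instructions:
        if op in NON_TRAPPING_SENSITIVE:
            # Hypothetical stand-in for a trapping replacement sequence
            patched.append(f"TRAP_TO_VMM({op})")
        else:
            # Harmless instructions are passed through unchanged
            patched.append(op)
    return patched


guest_code = ["ADD", "SUB", "STR", "CMP", "POPF"]
print(binary_patch(guest_code))
# ['ADD', 'SUB', 'TRAP_TO_VMM(STR)', 'CMP', 'TRAP_TO_VMM(POPF)']
```

In this model the guest's own code list is never edited by its author; the rewrite happens in the virtualization layer at run time. Under paravirtualization the equivalent substitution would instead be made once, in the guest operating system's source, as explicit calls into the hypervisor.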
Virtual Infrastructure

• All data center resources can be virtualized to create a Virtual Infrastructure. The components described in the chart below provide the foundation to create virtual servers. A virtual server consists of 32- or 64-bit CPUs, memory, disks, network adapters, Fibre Channel adapters, keyboard, video, and mouse. A virtual server can run standard Linux and Windows operating systems and applications.

Physical Resource / Virtual Infrastructure

• Servers
  – Physical resource: Industry-standard Intel and AMD servers upon which the virtualization layer is automatically deployed.
  – Virtual infrastructure: A Virtualized Node consists of a collection of CPUs and RAM that can be allocated to a virtual server.
• Network
  – Physical resource: Each server can have multiple gigabit Ethernet cards (NICs) to provide the required throughput and availability.
  – Virtual infrastructure: Virtual servers connect through virtual NICs to physical or virtual networks.
• Storage
  – Physical resource: iSCSI, SAN, and NAS storage technologies are used for reliable persistent storage.
  – Virtual infrastructure: A collection of storage resources can be partitioned and allocated to virtual servers using raw mappings or virtual hard disks.

Virtualization Tips

• In the VMware space, VirtualCenter is the management tool of choice for ESX Server.
• Other products, like Hewlett-Packard's Virtual Machine Management or IBM's Director modules, are adding functionality to deal with virtual machine (VM) environments.
• The problem is that most of these snap-in tools lack much of the simple functionality you get in VirtualCenter.
• Most companies will end up buying both VirtualCenter and the vendor's tool, and will use both depending on what they are doing.
• Shy away from workloads with large amounts of processing when doing consolidation.
• If you are doing virtualization for other reasons, like workload management, then you can get nearly anything to run virtualized if you are willing to change some of the things you do.
• However, if you are looking for maximum consolidation ratios and high ROIs, stay away from the quad-processor boxes that are already running at 50% utilization.

Security Tips

• Apply at least some standard minimum security:
  – Disable remote root access.
  – Use sudo when needed.
  – Configure the AD PAM modules for Windows shops.
• Some organizations use too much surrounding security and end up making their environment slower, more difficult, and more expensive to manage.
• When dealing with the VMs, all of the standard procedures should be followed.
• The host systems themselves should often be considered appliances, and organizations should limit the number of customized agents and security hacks applied to these systems.
• One should not go overboard with ESX hosts, since they are basically appliances serving up computing resources and should be treated as such. Nevertheless, taking a common-sense approach to security on the servers is the best bet.
• The most common mistakes made with virtual security stem from lack of knowledge of the Linux console, failure to understand how the virtual switch architecture works, and failure to recognize that the host does not directly see the data inside the VM disk files.
• The same practices that are used to secure a physical environment can, and should, be used in a virtual environment as well.
• Everything from proper VLAN/firewall organization to host-based intrusion detection should be leveraged to keep the environment secure.

Scalability Tips

• Simplicity. The more complicated the design and infrastructure, the less scalable it will be.
  – For example, a common mistake in large organizations is to assume that they cannot create a simple solution because they are big. One can argue that they should make the solution or design for VMware as simple as possible, so that it scales to the size of their organization and largest client base.
• Don't design the entire solution around the one-offs.
• When designing a virtual infrastructure, one should never look at the environment and try to plan one large infrastructure for the entire virtualization project. It won’t work.
• Organize the overall environment into smaller groupings of servers and address each grouping individually.
• When the project is approached this way, it ends with a very scalable deployment methodology in place, one that applies the same principles to a manageable number of servers in each phase of the project.