The Developer's Guide to Virtual Machines, an Internet.com Developer eBook

In a world of multiple operating systems, each with various versions, no developer has the luxury of building applications for only one target configuration. Every developer needs to ensure that his or her applications will function correctly on all the OS configurations used by today's heterogeneous IT environments. Because dedicating physical test systems for each target environment is out of most development teams' budgets, virtual machines (VMs) are the right solution at the right time.

Virtualization solutions enable you to run multiple VMs on one physical computer. Each VM behaves as an isolated physical PC or server with its own configuration, which makes it a very useful testing and development environment that's much cheaper than the real thing.

Java developers know the benefits of the VM concept well. The promise of enabling developers to "write once, run anywhere" was a key factor in the broad adoption of Java, which itself runs on the Java Virtual Machine.

As the market for virtualization grows, the ways in which developers use virtualization itself are expanding. The traditional development and testing uses of virtual machines as local disposable sandboxes and solutions for application isolation are broadening. Sharing development tasks across large teams in disparate locations appears to be the next step.

Proper Virtual Machine Uses in Development and Testing

Virtual machines can cut time and money from the software development and testing process, but they aren't the best choice in every scenario. When should you use virtual machines for developing and testing software, and when should you use a more traditional setup with physical machines?

Software development typically involves developing and testing for different target environments, but dedicating a physical computer to each environment can be expensive.
Not only do you have to consider the initial purchase cost, but physical computers also take up space, use power, and require maintenance. Virtual machines can reduce this cost by providing a way to run multiple development and test environments on one physical computer.

Another problem with dedicating a physical computer to each environment is that setting up your target environments can be quite time-consuming. In this situation, virtual machines can save you time. If you need to duplicate a particular environment, you can create a library of virtual hard disks that are pre-loaded with specific sets of software. You and other members of your development and test team can clone the disks that you need and quickly replicate a particular environment in a virtual machine. This type of setup can save lots of time when you need to start over with a clean installation, or duplicate the same environment in several virtual machines (see Figure 1).

The figure shows Microsoft's Virtual Server with three virtual machines running the same build of BizTalk Server on three different operating systems. You could click one of the thumbnails to access the virtual machine or use a Remote Desktop connection. (Note that licensing for software running in virtual machines is pretty much the same as in physical machines, so make sure you have the proper licenses for all of your running software.)

The Developer's Guide to Virtual Machines, an Internet.com Developer eBook. Copyright 2006, Jupitermedia Corp.

You can also attach virtual machines to physical networks just as if they were physical, or you can create a virtual network for testing different scenarios, while isolating virtual machine network traffic to the host computer.
This is useful for patching virtual machines, providing general network access to them, and validating different network scenarios that might be relevant in your software testing.

Development and Testing Uses for Virtual Machines

The following sections describe the different ways that you can use virtual machines for development and testing.

Create a Library of Virtual Hard Disks

As previously mentioned, you can create a library of virtual hard disk (.vhd) files that you and your colleagues can use to "instantly" recreate a particular environment. Using Microsoft's Virtual Server and Virtual PC, you can do this by creating a virtual machine, installing the requisite software on it, and then cloning the .vhd file. You can attach the .vhd file to a new virtual machine, boot it up, and voila: your environment is running on the new virtual machine. If you want to run more than one virtual machine with the same .vhd on the same network, you must run Sysprep (Microsoft's System Preparation tool) on the virtual machine so that each copy gets its own identity, which prevents network conflicts.

Create a Standardized IDE

With your IDE deployed in a virtual machine, you can quickly set up a development environment that meets your company's standards and even share it with colleagues in your .vhd library. If you work at home, you can install a virtual machine on your personal computer with the corporate "standard" environment so you can connect to the corporate network. You can also sandbox the virtual machine, isolating it from your personal computer, to satisfy corporate security requirements. We'll go into sandboxing in more detail later.

According to Rich Lechner, VP of Virtualization technology for IBM, "Anywhere from 40 to 50 percent of the clients out there either have implemented or plan to implement virtualization over the next one to two years. We are seeing very broad-based adoption. Certainly much broader than, in my experience, the early days of Java and Linux."
The trend has not been lost on chipmakers such as Intel and AMD, who are tweaking the underlying silicon of the x86 platform to enable VM software makers to optimize their products.

Let's examine exactly how virtualization works. An operating system running on a physical computer controls the computer's hardware, including memory, CPU, network adapters, hard disks, and peripherals. Because only one operating system can control the hardware at any given time, you usually can't run more than one operating system on the same computer at the same time.

Virtual machine technology uses an agent to allocate physical hardware resources to the host operating system and the running virtual machines as needed. The host still controls the physical hardware, but each virtual machine emulates its own set of hardware and "borrows" physical resources from the host to run it. The virtualization agent (called the virtual machine monitor in Microsoft's Virtual Server and Virtual PC products) allocates the resources to each virtual machine's emulated hardware. This allows the hardware of the physical computer to serve the host operating system and a number of virtual machines simultaneously.

As you might guess, you need enough hardware resources to run the various operating systems and applications, which is at the root of the main limitation of virtual machines: you may need to beef up your hardware if it doesn't have enough resources for the virtual machines you want to run.

Test New Development and Test Tools

You can try out new tools for software development and testing on a virtual machine without jeopardizing your primary workstation setup. You can set up a .vhd file that has your basic environment installed, copy it, attach it to a new virtual machine, and boot it up.
You can then install the new tools and see how they work in your environment.

Perform Functionality Tests

Use the following tips to make it quicker and easier to perform software functionality tests with virtual machines:

• Deploy a variety of destination environments for functionality testing using minimal hardware (as previously mentioned).
• Set up a library of test environments in virtual machines for rapid deployment (also previously mentioned). Just copy the .vhd file, attach it to a virtual machine, and boot up.
• Run tests and then quickly roll your virtual machines back to a clean state. You can do this with Microsoft's Virtual PC and Virtual Server thanks to a cool feature called "undo disks." It can be used to reinstall builds of software that are under development at Microsoft. To get back to a clean base where you can install the latest build, do the following:
    • Install all of the prerequisites for the software.
    • Enable undo disks.
    • Install the software.
  When you want to go back to a clean base, turn off the virtual machine and discard undo disks.
• Test complex network scenarios without setting up a physical network. Thanks to the flexible virtual networking in Virtual Server, you can create a complete network setup on your test computer and keep all of the network traffic isolated to the physical box. You can even set up a domain.

What Not to Expect from Virtual Machines

While their benefits all sound ideal, virtual machines do have two main drawbacks: they share physical resources with the host and any other running virtual machines, and they carry some processing overhead. So you can't expect the same performance from a virtual machine as you do from a physical one. Because they contend for resources in this way, the following are not good uses of virtual machines:

• Performance and stress testing. Your results may not be accurate because the amount of resources available for a given operation can fluctuate.
• Running multiple resource-intensive virtual environments on the same physical computer. Performance will be sub-optimal unless your computer is sized adequately. Your host computer must have the sum of all of the physical resources required by the running virtual machines, plus what the host system needs, plus about another 10 percent for overhead. You'll have other considerations as well, such as disk I/O requirements.

What does this mean to you? If you're a developer, limit the number of resource-intensive programs you run on a single computer. If you're a tester, you shouldn't try to use virtual machines for stress or performance testing. You should use physical computers for these purposes.

The Pros and Cons of Virtualization in the Datacenter

Let's discuss the pros and cons of virtual machine technology to help you determine whether the cost of implementing it is worthwhile. Should the VM benefits outweigh the drawbacks in your multiserver datacenter, virtual machine technology can provide more reliability, easier manageability, and lower overall cost for your organization.

Features and Benefits

Isolation

One of the key reasons to employ virtualization is to isolate applications from each other. Running everything on one machine would be great if it all worked, but many times it results in undesirable interactions or even outright conflicts. The cause is often software problems or business requirements, such as the need for isolated security.

Virtual machines allow you to isolate each application (or group of applications) in its own sandbox environment. The virtual machines can run on the same physical machine (simplifying IT hardware management), yet appear as independent machines to the software you are running.
For all intents and purposes, except performance, the virtual machines are independent machines. If one virtual machine goes down due to application or operating system error, the others continue running, providing services your business needs to function smoothly.

Standardization

Another key benefit virtual machines provide is standardization. The hardware that is presented to the guest operating system is uniform for the most part, usually with the CPU being the only component that is "pass-through" in the sense that the guest sees what is on the host. A standardized hardware platform reduces support costs and increases the share of IT resources that you can devote to accomplishing goals that give your business a competitive advantage. The host machines can be different (as indeed they often are when hardware is acquired at different times), but the virtual machines will appear to be the same across all of them.

Consolidation

Virtual machines also increase utilization and promote consolidation. Consolidation of servers results in easier management and decreased hardware costs. The drawback of consolidation is increased susceptibility to hardware failures and increased impact from those failures. However, the risk and negative impact can be mitigated with failover setups where virtual machines on two different physical machines monitor each other with each one ready to take over for the other.

Using virtual machines should not require more physical machines and usually will result in fewer physical machines. This is a great boon because setting up and maintaining physical hardware is messy and time-consuming. On top of that, physical servers consume power. With electricity rising in cost, power savings translate into larger and larger financial savings.

Ease of Testing

Virtual machines let you test scenarios easily. Most virtual machine software today provides snapshot and rollback capabilities.
This means you can stop a virtual machine, create a snapshot, perform more operations in the virtual machine, and then roll back again and again until you have finished your testing. This is very handy for software development, but it is also useful for system administration. Admins can snapshot a system and install some software or make some configuration changes that they suspect may destabilize the system. If the software installs or changes work, then the admin can commit the updates. If the updates damage or destroy the system, the admin can roll them back.

Virtual machines also facilitate scenario testing by enabling virtual networks. In VMware Workstation, for example, you can set up multiple virtual machines on a virtual network with configurable parameters, such as packet loss from congestion and latency. You can thus test timing-sensitive or load-sensitive applications to see how they perform under the stress of a simulated heavy workload.

Mobility

Virtual machines are easy to move between physical machines. Most of the virtual machine software on the market today stores a whole disk in the guest environment as a single file in the host environment. Snapshot and rollback capabilities are implemented by storing the change in state in a separate file in the host environment. Having a single file represent an entire guest environment disk promotes the mobility of virtual machines. Transferring the virtual machine to another physical machine is as easy as moving the virtual disk file and some configuration files to the other physical machine. Deploying another copy of a virtual machine is the same as transferring a virtual machine, except that instead of moving the files, you copy them.
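The move-versus-copy distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats a virtual machine as a directory of ordinary files (a virtual disk file plus configuration files), and the function name `deploy_copy` and its parameters are hypothetical, not part of any virtualization product. A real deployment should be done with the virtual machine shut down.

```python
import shutil
from pathlib import Path


def deploy_copy(source_dir: Path, target_dir: Path, move: bool = False) -> list[Path]:
    """Copy (or move) every file that makes up a VM into target_dir.

    Hypothetical helper: assumes the VM is just a flat directory holding
    a virtual disk file and its configuration files.
    Returns the files written to target_dir, sorted by name.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item in sorted(source_dir.iterdir()):
        if not item.is_file():
            continue  # this sketch ignores subdirectories
        destination = target_dir / item.name
        if move:
            # Transfer: the VM leaves the source location.
            shutil.move(str(item), str(destination))
        else:
            # Deploy another copy: the source VM stays intact.
            shutil.copy2(item, destination)
        written.append(destination)
    return written
```

Calling `deploy_copy(base, clone)` leaves the original files in place, while `deploy_copy(base, new_home, move=True)` corresponds to transferring the machine. Virtual disk files can be several gigabytes, so either operation may take a while on real images.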
Multiple deployments of a single virtual machine are much easier to achieve than multiple deployments of an operating system on a physical machine.

Drawbacks and Challenges

Concentration Risk

We've already discussed the increased reliance on fewer physical machines: the "putting all your eggs in a few baskets" effect, which is also called concentration risk. You can ameliorate this risk by setting up heartbeat monitoring and failover on virtual machines located on independent physical machines.

Virtual machine technology actually reduces concentration risk when deployed in the right configurations. Compared to a baseline configuration of x physical machines, you can almost always achieve a more failure-resistant configuration using x or fewer physical machines hosting more than x virtual machines that are networked to watch each other and take over in the event of a partner machine's failure. The efficiency multiple could be 1.5x, 2x, 3x, 4x, or more, depending on the applications in the virtual machines and the specifications of the physical hosts.

Cost

Licensing costs were a drawback to running virtual machines, but the picture is starting to look better. If you were running servers on VMware's GSX Server, for example, the dollar cost of licenses could have been a significant portion of (or more than) the cost of the physical hardware, depending on your physical machine specifications. That's because VMware GSX Server cost $1,400, which added considerably to the cost of a workgroup file server or a Web server. VMware replaced GSX Server with a free product called VMware Server, in the hope that a free product will make the company's virtualization technology available to a wider audience.

Xen charges no license fee, but it currently runs only on Linux hosts and handles only guest operating systems for which source code is available, a criterion that includes Linux and BSD but not Windows.
The guest operating system limitation is changing with Intel's release of its VT "Virtualization Technology" and AMD's Pacifica chip technology. Both enable a host hypervisor to execute unmodified guest operating systems, which means Xen will be able to run Windows as a guest operating system. The hypervisor is the bit of code sitting between the hardware and the guest environment that mediates access to physical hardware and controls execution of privileged instructions on the CPU.

Performance Penalty

Virtual machine technology imposes a performance penalty from running an additional layer above the physical hardware but beneath the guest operating system. The performance penalty varies based on the virtualization software used and the guest software being run.

Two good performance comparisons of VMware and Xen were conducted by the computer science departments at the University of Cambridge, England, and Clarkson University. Based on the Cambridge study, VMware Workstation achieves near-native performance for processor-intensive tasks, but experiences slowdowns of up to 88 percent on I/O-bound tasks. That means your I/O-bound process would be running at about 12 percent of its native speed, or roughly one-eighth, something that may be unacceptable to you. The Cambridge group performed its study based on VMware Workstation 3.2 because licensing restrictions in newer VMware versions prohibit test comparisons. VMware likely has improved its performance, but in any case if your task is I/O-intensive, you would do well to test it in a trial copy before purchasing the software. In the same study, Xen performed extremely well whether the task was CPU-bound or I/O-bound.
In some cases, Xen's performance penalty is almost non-existent thanks to its paravirtualization function, which modifies the guest operating system to optimize performance (hence, the more limited selection of supported guest operating systems).

The performance penalty can mean you need to purchase additional hardware or more expensive, higher-end hardware. This is one factor you must take into account when determining whether, or to what extent, to adopt virtual machine technology. For large deployments, the increased ease of management often far outweighs the license fees and potentially more demanding hardware requirements per physical machine. Furthermore, services often can be consolidated onto fewer physical machines that serve as hosts for multiple virtual machines, meaning that overall hardware costs decline.

Hardware Support

A fourth drawback of virtual machine technology is that it supports only the hardware that both the virtual machine hypervisor and the guest operating system support. Even if the guest operating system supports the physical hardware, it sees only the virtual hardware presented by the virtual machine.

The virtual machine's hardware support actually has two aspects. The first is what the virtual machine hypervisor recognizes on the host machine. This is generally fairly broad within the common categories such as networking, hard drive storage, keyboards, mice, and video cards. The virtual machine hypervisor, if it runs on top of the host operating system, usually just takes advantage of the host operating system's support of the physical device in question, so that the hypervisor's developers do not have to provide code specifically for the plethora of hardware devices on the market today. VMware ESX Server, in contrast, is designed to run on bare hardware with no underlying host operating system for support.
As a result, performance can be better than that provided by GSX Server or Workstation, but the range of hardware that the virtual machine hypervisor will run on is much more limited because the ESX Server code base must contain code to handle each device that it supports.

The second aspect of virtual machine hardware support is the hardware presented to the guest operating system. No matter the hardware in the host, the hardware presented to the guest environment is usually the same (with the exception of the CPU, which shows through). For example, VMware GSX Server presented an AMD PCnet32 Fast Ethernet card or an optimized VMware-proprietary network card, depending on which you chose. The network card in the host machine did not matter. VMware GSX Server performed the translation between the guest environment's network card and the host environment's network card. This was great for standardization, but it also means that host hardware that VMware does not understand will not be present in the guest environment.

Software Licensing

A fifth challenge of virtual machine technology is the complication of software licensing inside guest operating systems. If you load and run Windows Server 2003 in eight virtual machines on four physical machines, how many licenses would you be obligated to pay for? What about database software like Oracle or SQL Server, which are usually licensed based on the number of processors? A virtual machine on a dual processor host machine may have only one processor. If Oracle runs in the virtual machine, should you be charged for one processor or two?

Choose Wisely

The challenge of deploying virtual machine technology is figuring out whether the benefits outweigh the costs in your situation. Virtual machines improve utilization, facilitate management, reduce downtime, and enhance the mobility of applications in many scenarios.
So if the management of IT hardware and software resources is a current or anticipated headache, you should take a look at VMware, Xen, and Microsoft Virtual Server and carefully consider the pros and cons for your particular situation.

Building a Virtual PC

We're now going to walk through a step-by-step Microsoft Virtual PC 2004 installation of SuSE Linux 9.1 Professional. Once it's built, you can clone your Virtual PC, back it up, perform experiments on it, restore it, and even distribute it to others. This is a good skill for a developer to have because in order to replicate their customers' problems, developers need the ability to run their software in the same environments as their customers. If you develop on Windows XP, and a customer discovers a problem on Windows 98 Second Edition, it helps you immensely to have a Windows 98 SE installation.

Installing an operating system on a Virtual PC is just as much work as installing it on a real PC, at least the first time. To give you an idea of what building a Virtual PC involves, this tutorial walks you through a Microsoft Virtual PC 2004 installation of SuSE Linux 9.1 Professional from DVD. It uses Virtual PC 2004 SP1, installed on Windows XP SP2. The host computer has 1 GB of RAM.

Start by choosing to create a VPC. The New Virtual Machine Wizard will come up. Choose to create a virtual machine.

You usually name a virtual machine by the operating system you're installing (in this case, SuSE 9.1 Professional). If you later clone the VPC, use a name that indicates the special purpose of the clone. If you attempt to maintain simple numbered clones, you'll probably find out that after a week or so you can't remember which number had which software version.
The New VM Wizard knows the memory needs of 11 specific Microsoft operating systems, ranging from MS-DOS to Windows Server 2003. The wizard allows you to choose Other for anything else. The wizard tends to recommend less memory than you'll want for most systems, but you can change it now, and adjust it later.

To install SuSE Linux, allot 256 MB of RAM, at least for now. Later, you might decide to run the VM with less RAM to allow more processes to run in the host machine, or you might decide to run the VM with more RAM to accommodate the memory needs of the applications you are running in it.

Since you are installing the system from scratch, create a new virtual hard disk. If you were cloning an existing VPC, you'd use a copy of the existing virtual hard disk file. The default hard disk name is usually fine for a new VPC.

You now have a VPC that's ready to run, but it has an empty, unformatted hard disk. You need to boot it from a physical CD or DVD, or an ISO CD or DVD image file. The CD menu on the Virtual PC shows that you've already captured the physical D: drive. The boot information from the VPC's BIOS shows that it sees 256 MB of RAM, just as you set it. (See Figure 2.)

With a SuSE installation DVD booted, you get a menu of options. Choose a normal installation. From this point on, installing to the Virtual PC is almost exactly like installing to a physical PC. In YaST, SuSE's OS setup and configuration tool, you pick your language and you accept the default installation settings. YaST does not automatically detect the emulated sound card, so ask it to detect older sound chips, and it will find the emulated Sound Blaster 16.

Working with the SuSE VPC

Two or three hours of automatic installation later, you can boot SuSE Linux.
As you might expect, the support for Linux in Virtual PC 2004 is not as complete as the support for Windows. To release the mouse from the Linux VPC window, you must press the right Alt key. On a VPC that runs Windows, you can move the mouse focus in and out of the VPC window smoothly and transparently once you have installed the Virtual Machine Additions (see Figure 3).

The Virtual Machine Additions also allow the emulation of a video card with more memory, allowing for better video modes. They further support sizing the VPC window interactively with the mouse to nearly arbitrary sizes, as well as sharing folders between the host PC and the VPC.

You can work around the lack of shared host folders. SuSE, like most modern Linux builds, can view Windows networks using Samba, as long as you don't block the Samba ports with a firewall. If you have a Linux build of ActiveState Komodo on your host Windows machine, and you have shared the directory over the Windows network so that it can be seen from Linux with Samba, once you copy the license file to your VPC, you can run it in a shell. If a firewall blocked Samba, you could still transfer the file using Web sharing of the folder on the host Windows PC and view the directory from the Linux VPC using a Web browser.

Transferring the compressed installation TAR is a similar process. Once you've transferred it to the Linux VPC, you open it with the default archive management tool and extract it to a new directory. Then you install the software. In order to do so successfully, you need to take root privilege temporarily.

Once the software is installed, you need to exit the privileged shell to run under your own ID, since you have installed a license for only your user account. You can either create a link or put the installed software on the path to make it convenient to start from a shell (see Figure 4). SuSE installs Perl, Python, and Tcl by default, and Komodo detects all three.
SuSE also installs Java, but Komodo is not a Java editor. SuSE does not install PHP by default, but you can download and install it from php.net using the built-in Web browser, just as you would for a SuSE installation on a physical PC. Similarly, you can download and install Eclipse from eclipse.org and NetBeans from netbeans.org for Java editing.

Just like that, you have configured a complete installation of SuSE 9.1 Professional Linux in a Virtual PC and installed development tools. Now you can use the Virtual PC for Linux development and testing. You also can save the disk image on a DVD+R or other media, enabling you to revert to this configuration quickly in the future.

Tips for Working with Virtual PCs

The following tips can save you time and aggravation when working with Virtual PCs. Use them to get the most out of your VPC investments.

1. If you are installing a Windows OS in a Virtual PC, you can save yourself at least half an hour by doing a quick format of the virtual hard disk instead of a full format. What you're skipping is the extensive testing that the format utility does to find bad disk sectors. It's not a real disk, so the testing is essentially useless, assuming that you have a reliable physical hard disk on your host computer.

2. When working with Virtual PCs, you have the option to turn on "undo" disks. This option allows you to experiment with a VPC and decide at the end of your session whether to commit your changes to disk or return the disk to its original state. It's not as great as it sounds. Undo disks can slow the VPC down to a crawl. Instead, you can "clone" a VPC. For example, if you have an existing Windows XP VPC, you can copy its hard disk image file to a new VPC. If the experiments you do with the cloned VPC work, you can make that your new base VPC and delete the original.
If the experiments fail, you can delete the clone. If the experiments create a configuration you want to keep in addition to the base configuration, then you can keep both around.

3. Three factors limit the number of VPCs you can keep and run: disk space, RAM, and your time. Disk space is obvious: if a VPC takes between 500 KB and 5 GB of space on your host machine's hard disk, it's easy to chew up disk space by keeping too many VPCs live. Writable DVDs are a good way to store VPCs offline.

The RAM limit is not so obvious. You would think that VPCs could utilize the large virtual memory space of your host PC, but the actual RAM in your host limits them. For example, with the 256 MB SuSE VPC you created running, the host PC has only 450 MB of its total 1,024 MB of physical RAM available. With the VPC closed, the host PC has 750 MB of its physical RAM available. In other words, the VPC takes up the entire physical RAM you allocated, plus about 45 MB. That makes sense, because emulating the hardware also takes some RAM.

What about your time? Just like a physical PC, a Virtual PC needs not only to be installed, but maintained as well. The disk file needs to be backed up. The operating system needs to be patched. The antivirus needs to be kept up to date, and the anti-spyware solution needs to be kept up to date. On the other hand, if a VPC gets a virus and you have been faithful about backing up the virtual hard disk, restoring to a clean backup is a snap: delete the infected image file and copy the saved disk image back from a DVD in just a few minutes.

4. If you're planning to buy a new PC that will run multiple VPCs, include one or more big hard disks, a big backup, lots of RAM, and the fastest CPU you can get. It's only money, but giving this machine the ability to run multiple VPCs eliminates the need to buy a whole bunch of other computers.
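The RAM arithmetic in tip 3 can be turned into a quick back-of-the-envelope check. The sketch below is a rough estimate only: the function names are made up for illustration, and the 45 MB per-VPC overhead is the single figure observed in the example above, not a guaranteed constant for every guest or Virtual PC version.

```python
def vpc_ram_footprint_mb(allocated_mb: int, overhead_mb: int = 45) -> int:
    """Physical RAM one running VPC consumes on the host.

    overhead_mb defaults to the roughly 45 MB of emulation overhead
    observed in the example above; treat it as an assumption.
    """
    return allocated_mb + overhead_mb


def max_concurrent_vpcs(free_host_ram_mb: int, allocated_mb: int,
                        overhead_mb: int = 45) -> int:
    """How many identically sized VPCs fit in the host's free physical RAM."""
    return free_host_ram_mb // vpc_ram_footprint_mb(allocated_mb, overhead_mb)


# The host in the example has 750 MB free with no VPC running.
print(vpc_ram_footprint_mb(256))        # 301
print(750 - vpc_ram_footprint_mb(256))  # 449, close to the 450 MB observed
print(max_concurrent_vpcs(750, 256))    # 2
```

A count computed this way leaves the host with very little free RAM once the last VPC starts, so in practice you would want to keep some headroom for the host's own applications.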
Prototyping Complex Enterprise Solutions with a Workstation

Suppose you have to deploy a highly available and scalable database backend solution for an Internet application. A cluster immediately comes to mind. A cluster is a conglomerate of two or more machines that are capable of sharing their workload in a real-time, near real-time, or scheduled "toggle" mode. The following are its three main components:

1. Machines running some form of an enterprise-level operating system, such as Linux, Windows, or Unix
2. A network – a key component of the clustered architecture
3. Cluster-capable software products running within machines that are designated as the cluster members

Buying some hardware to prototype and test the proposed solution seems to be a reasonable course of action, but before you embark on a hardware purchasing spree, consider an option that will let you build a prototype of your clustered backend application right on your desktop: virtualization.

Virtualization products support the creation of the first two essential clustering components in your scenario: provisioning of machines and the establishment of a network between these machines. The third key component, cluster-capable software, is independent of the physical architecture and therefore has to be installed as part of the prototyping process.

Let's take a look at how to prototype machine clusters for the proposed solution utilizing VMware Workstation version 5.5.

Establish the Virtual Environment

Step 1: Determine Guest Machine Template

To save time and expedite the effort, you should establish a "template machine" that contains all the required components for your virtual machines. The example prototype uses MySQL 5.1 with cluster extensions installed on the template machine.
Running a cluster prototype on a single machine is certainly more cost effective than purchasing the requisite hardware for a physical cluster, but it does carry certain software, hardware, and skills requirements.

Software: We're using VMware Workstation 5.5 because it is a highly capable virtualization product. Similar products such as Microsoft's Virtual PC and the open source Xen also would enable you to develop similar prototyping solutions. VMware Workstation runs on Windows XP, Windows 2003, and the majority of current Linux distributions. It enables installation of any x86-compliant guest operating system (even Novell NetWare), including the 64-bit offerings.

Hardware: You will need plenty of working memory and CPU power: at least 256MB of RAM per virtual machine plus 256-500MB for the host operating system, and a 2.4GHz Pentium 4 or AMD Athlon XP 2400+ class processor or better.

Skills: In order to successfully establish the virtual cluster, you need to know the specifics of the operating system you are installing, as well as those for the network configurations and the cluster-aware software components.

Understanding a few crucial aspects of the proposed virtualization prototype and the strategy for its implementation will make the prototyping process as fast and productive as possible.

As previously mentioned, VMware works well with all three major x86-compliant operating systems: Linux, Windows, and Solaris. Your configuration depends on your needs and the nature of the prototype (security, size, performance requirements, etc.). You could select a "hardened" installation with only the smallest, safest set of operating system components. In that event, you should still have at least one file-transfer protocol available, so that you can add components to your virtual machines later if needed. Another approach is installing the operating system with all the options you can imagine. This installation would certainly require more resources from the virtual machine.
For prototyping three-tier application clusters, you may find Linux on Windows to be the most convenient combination. Linux allows for minimal, bare-bones guest operating system installs, and I found the Windows version of VMware the most convenient to work with because most of the workstations in my professional environment already run some form of Windows (XP or Server 2003).

Step 2: Install Guest Machine

The cluster here consists of CentOS Linux 3 (a free distribution of RedHat ES 3) guests on VMware Workstation 5.5 for Windows XP. VMware Workstation offers two convenient ways to install the operating system:

1. From the host machine's physical CD/DVD drive
2. From the virtual CD/DVD drive (i.e., from the ISO image on the physical machine's hard drive)

Virtual drive installation is very quick. You may find it more convenient to have ISO images for your operating system, but keep in mind that if you have a multi-CD installation you will need to remap the virtual CD (ISO image) every time you are asked to continue installation from the next CD. For this reason, I found downloading the Server ISO or DVD ISO images for Linux distributions (e.g., CentOS) very convenient (see Figure 5).

Figure 5

Each CentOS Linux 3 guest will be installed in a non-graphic mode, which occupies about 1GB of space on the hard drive and runs minimal kernel services requiring between 192MB and 256MB of memory per virtual machine.

When you install the guest operating system, VMware requires you to specify all the basic parameters for your virtual machine: memory, network, and allocated drive storage. For the template operating system, I usually select the default options and NAT (Network Address Translation) networking. You can customize these options later.
Step 3: Create Clones

The ability to clone the virtual machines is the primary reason for having a template operating system in the first place. You will use it to create clone machines (i.e., other members in the cluster).

Figure 6

With VMware Workstation 5.5, creating clones of the virtual guests is generally a simple process. It enables you to create a linked or a full clone (see Figure 6). A linked clone is an especially convenient feature for the type of prototype discussed here. As the name implies, it creates a clone whose installation files are linked to a "parent," an original virtual machine, and for which VMware creates only the specific configuration files. If you do not plan to move these clone machines around, linked clones are probably the best solution.

If you plan on moving the guest machines across multiple host machines, I recommend going with the full clone option. It creates a full replica of the parent virtual machine. For the example, I created three clones of the templated Linux-with-MySQL installation.

Step 4: Configure and Customize Networking

The VMware Workstation installation process automatically configures two new (virtual) network adapters on your machine: VMNet1 and VMNet8. The VMNet1 adapter is used for the private networking between the host and the virtual machines. VMNet8 is used for NAT networking, which enables sharing of the host's external network access with the virtual machines.

These adapters are essential for the interconnectivity and proper operation of the network between the virtual machines and the host, and for the virtual machines' access to the Internet. Through these adapters, VMware provides DHCP services to the virtual machines, as well as NAT access to the Internet.
VMware Network Configuration

It is time now to look at some basics of VMware network configuration and how they pertain to the cluster configuration. VMware Workstation supports three network modes:

1. Bridged networking – Virtual machines have full access to the host's network. However, in order to gain access to the network they need to be assigned their own IP addresses.
2. NAT – With NAT configuration, guest machines do not have their own IP addresses on the external network. They are assigned IP addresses in the context of the private network within the virtual environment. Virtual machines gain access to the external network via the host machine's VMNet8 adapter. The host machine translates the traffic coming from the virtual machines via the VMNet8 adapter as well as external traffic.
3. Host Only networking – This type of networking enables the connection only between the host machine and the virtual machine. Virtual machines do not have access to the external network.

The ability to create the virtual network adapters and configure network options as described above is essential to the cluster prototyping process. For the example cluster, the best approach is to start with Host Only networking to establish the interconnectivity between the machines in the cluster, and then to test it from the host machine.

For the cluster configuration, you need to create the subnet for the machines in the cluster and assign static IP addresses to them. Assign the addresses manually in the class C network, in the range <net>.3 to <net>.192. (Addresses <net>.1-3 are reserved for virtual machine use.) On Linux, you can configure the subnet, range, and IP address for each machine. One way is to add the following command to /etc/rc.d/rc.local so that it executes on startup (use the machine IP address as specified in the MySQL tutorial):

    /sbin/ifconfig eth0 192.168.0.10 netmask 255.255.255.0 broadcast 192.168.0.255
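With several cluster members, the per-machine startup commands can be generated rather than typed by hand. The following Python sketch is one possible approach, not part of VMware or MySQL. It assumes four cluster members with hypothetical role names, addresses spaced ten apart starting at .10 on the 192.168.0.0/24 subnet, and eth0 as the interface. It prints the ifconfig line for each guest's /etc/rc.d/rc.local along with a matching hosts-file entry, which is handy when no DNS server is available.

```python
import ipaddress


def plan_cluster_addresses(network, roles, first_host=10, step=10):
    """Map each role to a static address: <net>.10, <net>.20, ... by default."""
    net = ipaddress.ip_network(network)
    plan = {}
    for index, role in enumerate(roles):
        address = net.network_address + first_host + index * step
        # Reject addresses outside the subnet or on its broadcast address.
        if address not in net or address == net.broadcast_address:
            raise ValueError(f"{address} is not a usable host address in {net}")
        plan[role] = address
    return plan


def ifconfig_line(address, network, interface="eth0"):
    """Build the startup command in the same form as the one shown above."""
    net = ipaddress.ip_network(network)
    return (f"/sbin/ifconfig {interface} {address} "
            f"netmask {net.netmask} broadcast {net.broadcast_address}")


if __name__ == "__main__":
    subnet = "192.168.0.0/24"
    roles = ["mgmt", "sql", "data1", "data2"]  # hypothetical role names
    for role, address in plan_cluster_addresses(subnet, roles).items():
        print(f"# {role}")
        print(ifconfig_line(address, subnet))
        print(f"{address}\t{role}")  # hosts-file entry for this member
```

Keeping the address plan in one place also makes it harder for the clones to drift apart, since each one starts from the same template and differs only in the generated line.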
Since the VMware host does not automatically provide internal DNS service for the virtual machines, you need to manually configure one of the machines to act as a DNS server or maintain the hosts files on each machine (which is outside our scope right now).

The simplest cluster configuration has no firewalls between the machines, enabling software components to interact with each other based on your configuration preferences. In a more sophisticated configuration, you could configure special-purpose machines to serve as the routers/firewalls. (More on this option in Step 6.)

Figure 7

Step 5: Customize Cluster-Aware Software Components

Once you have established the networking between the virtual machines, the cluster-aware software components running inside each virtual machine "see" the network and the surrounding software exactly as they would on physical machines.

From this point on, the example just follows the cluster setup steps for the components it prototypes: MySQL Management Server on machine 192.168.0.10, MySQL Server on 192.168.0.20, and data servers on 192.168.0.30 and 192.168.0.40 (see Figure 7). You can just follow the MySQL 5.1 clustering instructions, as the virtualization process requires nothing extra except that you must have enough RAM to support all virtual machines running concurrently.

Note: RedHat-based systems require a few extra steps in order to enable multicast-based clustering.
To enable your primary network card (likely eth0) for multicast, use the following command:

    ifconfig eth0 multicast

Then add a route for multicast traffic:

    route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0

Step 6 (Optional): Configuring Firewall

To simulate a firewall, you could install a very small Linux virtual machine with two (virtual) network adapters (eth0, eth1) and configure Bridged networking on one adapter, allowing incoming traffic from the external network into that machine. You would configure the other adapter for membership in your virtual cluster's subnet. Using iptables (or the older ipchains) on Linux, you could configure the rules for traffic allowed from the external network (through the bridged adapter) into the adapter on the private subnet.

Now that you have configured the virtual machines to represent the cluster of physical machines, getting the clustered application up and running is completely a matter of following the directions as laid out in the MySQL documentation. From this point on there is nothing specific to virtual machine operations anymore.

Note that any configuration error you experience in the setup process will likely be related to improper network settings on Linux. It is absolutely essential that you understand all the intricacies of the network configuration before embarking on the cluster prototyping.

Once the virtual cluster is established, you can proceed with the testing and experimentation that is typical for this type of architecture: checking load balancing by generating load and examining the switching between the servers, suddenly bringing down (powering off) one of the servers, etc.

Keep in mind that virtual machines in this configuration will not exercise the same performance properties as their physical counterparts.
They will run more slowly. However, the performance ratios, failures, and successes observed during the experimentation on the virtual machines will be the same for the physical counterparts. If you experience performance issues with the data replication between two virtual data servers, you will see the same issues in the physical environment. The same applies to all the positives that you may observe during the testing.

Oh, the Possibilities...

We've provided you with one idea for utilizing virtualization to prototype clustered, highly available applications. As you can imagine, it is only the tip of the iceberg. Here are some other interesting ideas that you may find useful.

Prototyping Different Database Designs

The database is the most critical performance component of almost any system. A proper relational design and a physical storage strategy often help make a difference in how well the complete application performs. With virtualization products capable of savepoints (an important feature that enables you to save the complete state of the virtual machine at a given point in time), you can establish a baseline database architecture and then explore how well the database performs under different data loads, storage strategies, and logical optimizations such as denormalization. Savepoints will enable you to safely fall back to the original state of the application, or to the one you liked best.

Securing the Network

As mentioned previously, virtualization does not cover only the installation and configuration of guest machines, but also the configuration of virtual networks. With some creativity, you could prototype and explore different configurations and elements of network security: machine hardening, setting up and operating honeypots and traps, probing the network for weaknesses, exploits, and data leaks – and do it all within the safe confines of your own machine.
As you can see, virtualization software opens the door to many professionally exciting prototyping opportunities. Although we could not cover all the details involved in the relatively sophisticated prototyping process, the general concepts and ideas presented hopefully showed how helpful the virtualization concept can be, even in relatively complicated, multi-machine scenarios. So explore them. You will make yourself and your organization more agile and productive in accomplishing your technical goals.

Make A Virtual Machine Your Safe Browsing Sandbox

No matter how well protected your system is and how careful you are, browsing unknown Web sites puts your system at risk. Consider the highly publicized Microsoft Graphics Rendering Engine Vulnerability. An unpatched system with this vulnerability is subject to being completely taken over by an attacker. Browsing an infected Web site can be enough for this vulnerability to be exploited. Using a virtual machine for Web browsing provides an excellent defense against this type of threat.

To understand how to use a virtual machine for safer browsing, first some terminology needs to be defined:

• The physical machine on which the virtualization application (e.g., Virtual PC, Virtual Server, VMware, Xen) resides is the host machine, as in the machine that hosts the virtual machine.
• A virtual machine is a guest machine. The entire guest operating system and programs are written into a large virtual hard disk file that resides on the host machine.

(Although the figures in this section use Microsoft Virtual PC 2004, the concepts illustrated are generic and applicable to other virtualization products.)

Undoing a Threat

The single most valuable reason to use a virtual machine for browsing is the undo capability. Microsoft implements this with its undo disks feature.
The idea is simple: Whatever takes place in the guest machine, such as inadvertently downloading spyware, is written to another file instead of the principal virtual hard disk file where the OS and applications are installed. When the browsing session ends, the guest machine is turned off without saving any of the changes that occurred while it was running.

Figure 8

The undo disks feature is off by default, so you must enable it. The following steps show how to configure it:

1. Select a virtual machine in the Virtual PC Console.
2. Click the Settings button.
3. Select Undo Disks.
4. Check the Enable undo disks checkbox as shown in Figure 8 and then click OK.

Figure 9

The advantage of using the virtual machine becomes apparent when you turn off the machine (see Figure 9). By selecting the option Turn off and delete changes, you restore the virtual machine to the exact same state it was in before it was turned on. If any malware was downloaded, it will be in the undo disk file, which is discarded. The virtual hard disk where the operating system and programs reside is untouched.

In order for safe browsing to work, the virtual machine must connect to the network. How to configure networking in a virtual machine is covered in the next section.

Enabling Network Access

Virtual PC provides two options for enabling network access via the host machine's network adapter, using either the host network adapter itself or shared networking. These options are the last two in the dropdown list of networking options in the Virtual PC settings (see Figure 10). The second from the last option (using the host network adapter) is different on every machine because it is the description of the network adapter on the physical host machine.
Enabling the host's network adapter causes the guest machine to appear on the network as a separate machine with its own IP address. From a networking perspective, the guest functions the same way as a physical machine equipped with a network adapter. This is typically fine for a home network, but may not work in a corporate environment with a Windows domain because unless the guest machine joins the domain, it will not be authorized and may not be able to use the network. (Note: wireless networking and dialup do not work with a host network adapter.)

Figure 10

The other option to enable network access is Shared networking (NAT), which is referred to simply as NAT in VMware Workstation. With Shared networking enabled, Virtual PC serves as a NAT router that uses the host's IP address to access the network. Since all network access is routed through the host, you can establish network access in a tightly controlled domain. If the host is authorized to use the network, then Shared networking uses the host to connect to the network and then to the Internet. If multiple network adapters are available, you can configure Shared networking only on the first one. A guest using Shared networking cannot communicate with other guest machines on the same host. (Note: wireless networking and dialup do work with Shared networking.)

Regardless of which networking option you choose, if Windows Firewall is enabled only on the host, it will not protect the guest. You must enable Windows Firewall within the guest as well to ensure maximum protection.

Mitigating Risk

Virtual PC Shared Folders are host local drives or folders that appear as mapped drives, and they are actually functionally equivalent to mapped drives. A guest machine used to browse the Internet should not use the Shared Folders feature or have any drives mapped.
Network drives on the host cannot be shared using Shared Folders, and any type of drive mapping exposes the host filesystem to guest malware that targets mapped drives. Remember, the objective is to keep the host safe from any malware that may affect the guest, so don't connect the host's filesystem to the guest.

However, at some point, you may want to use the guest's browser to download a file from the Internet and make it available to the host. The safest way to do this is to use Virtual PC's drag-and-drop feature to transfer files between guest and host because it does not open up a TCP/IP connection between them.

Keeping a guest machine up to date with all Windows Updates, service packs, and security patches is just as important as keeping the host machine up to date. It's easy for a guest machine to get behind on updates because it typically is turned off most of the time. It has to be running to receive updates and they must not be undone when the machine is turned off.

Using Virtual Machines for Security Analysis

Now that you've seen how to use a virtual machine as a sort of Internet-browsing sandbox, expanding the use of the sandbox may seem logical. Using the Not connected network setting and then transferring a suspected malware file into a guest machine with drag and drop would appear to offer a safe environment for analyzing the behavior of the file. This technique might indeed work in many cases, but it could easily fail to detect malware in others.

The problem is that a malicious coder can easily add code that checks whether his or her malware program is executing inside a virtual machine. The coder could program the malware to behave safely if it detects that it is running in a virtual environment.
Thus, the malware would falsely pass the safety test and then run amok inside the physical machines you wanted to protect.

Some have proposed using virtual machines to host honeypots, another security technique that may seem attractive. Should malware damage the virtual honeypot, the argument goes, the virtual machine can be reset. Once again, the malware can determine if it's running in a virtual machine and behave differently, which makes the analysis a waste of time.

With these caveats in mind, you should always undo your changes when you browse unknown Web sites. You can't assume that the virtual machine is free of malware just because it appears to be normal.

Keys to a Successful Virtual Infrastructure Implementation

We asked Todd Hudson, a senior systems engineer who oversaw his organization's virtual infrastructure deployment, for some valuable advice for those ready to make the VM leap. Here's what he said.

The company I worked for decided to check out virtualization to reduce costs. We chose VMware because it is the market leader and its ESX server product gave great performance for the cost. Our team consisted of an architect and three engineers. We had help from other groups as needed, but that was the core group that got it implemented. We spent several months training and using VMware ESX Server before implementing it enterprise-wide. Today, we have dozens of hosts running hundreds of VMs.

Comparing the differences between each product and determining which is best is beyond the scope of this article. Suffice it to say, you can pick the best product yet still fail miserably in your virtualization project if you do not plan well.

A Step-by-Step Process

When the company I worked for decided to adopt a virtualization solution, they chose VMware ESX Server for a multitude of reasons.
We wanted to go with the market leader and use the product that would allow the greatest use of virtualization, while maintaining the lowest possible costs and minimizing hardware. We found that the main players for us were VMware and Microsoft, as the others were way too expensive or didn't support Microsoft Windows as a guest. Comparing the number of VMs against the total ROI and TCO, VMware ESX was the best choice.

We spent the first few months learning about the product, completing extensive research on how to set up and configure it, and determining which servers would be virtual and which physical servers to migrate to a virtual environment. We finally went live with our implementation after five months.

We started with the lowest-risk servers – our test servers – in case there was an issue. When all went well with the test servers, we moved on to disaster recovery and staging servers. Finally, we went after the servers with the lowest CPU utilization, migrating them to a virtual environment with VMware's P2V Assistant, a migration tool that transforms an image of an existing physical system into a VMware virtual machine.

At first, we fielded a lot of questions from our developers and users about VMware and how we were using it. Surprisingly, the biggest doubters were members of our IT department. We had to prove to them that virtualization was a good strategy for the datacenter, both in terms of productivity and cost savings. Today, the same people who questioned "going virtual" are the ones who complain when they can't virtualize their projects.

Two years later, we have hundreds of virtual machines running on several dozen VMware ESX Servers.
Virtualization has been invaluable in our disaster recovery strategy, because we can swap out a failed server with a spare we keep just for this purpose, reboot the virtual machines onto the new machine, and be up and running in about 45 minutes.

Where to Begin

The best way to get any virtualization method implemented is to first understand exactly what you are trying to do. The two most common reasons for virtualizing are saving money and addressing capacity issues such as limited rack space. No matter the issue, you can easily make a case for purchasing quad-processor servers, and perhaps even eight-way servers, because the cost of these large machines is substantially lower today than it was a few years ago.

The lowest-cost upfront method to begin your virtualization project is to use internal storage. However, in VMware's case, this would rule out the use of VMware's VMotion, which is a method of migrating virtual machines from one physical server to another without an outage. VMotion requires a storage area network (SAN).

Once you choose a hardware platform and verify that all the parts are on the software manufacturer's hardware compatibility list, the next process is to get trained on the product. The first step is to create a lab setup with a minimum of two physical machines, enough disk space for four to eight virtual machines, and enough server resources (RAM, NIC, CPU, and disk I/O) to handle the virtual machines' activities. Then install and configure the virtualization software. If you are using HP or IBM servers, you will need to do some hardware tuning.

Once you have installed and properly configured the software, you need to become familiar with the product. No matter which product you use, the vendors all offer training at a fairly nominal cost. For support reasons, maintaining an excellent relationship with your virtualization product vendor is also critical.
One bad experience can lead to virtualization being delayed or potentially not even being adopted.

On to the Fun Stuff

Once you are familiar with the product, create a number of virtual machines to learn how the process works. Familiarize yourself with how the VMs share the physical servers' (hosts') resources and what, if any, impact changing parameters or adding new virtual machines has on the other virtual machines.

You also should create good documentation explaining how to set it all back up if (or when) an issue arises. As an example, most companies that use VMware ESX Server do not back up the host server because installing and configuring it is very easy – it takes only five to seven minutes. However, backing up the virtual machines themselves is critical, as is backing up each VM's configuration files.

When you start virtualizing, go after the least critical servers first – specifically, the ones that won't create a huge problem if they fail and the ones that use minimal CPU. Typically, these are development or test servers. As an example, my company first used a planning tool to gather metrics on CPU, network, RAM, and disk I/O over the course of a month. We then focused our first efforts on the physical servers that used less than 20 percent CPU, using VMware's P2V product to move them from physical machines to VMs.

Make sure your own migration process, regardless of which tool you use, does not make any changes to the physical server's hard drive, as changes can cause issues. VMware's P2V tool boots to a CD-ROM and writes no data to the physical server's hard drive at all. This way, if you have to turn the VM off and fail back to the physical server, it most likely will work without problems.
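Once a month of utilization metrics is in hand, sorting servers into migration waves is easy to script. The following Python sketch is a minimal illustration, not the planning tool the author used. The server names and CPU figures are invented, the function name is hypothetical, and the 20 to 40 percent band for a second wave follows the author's guidance on higher-end servers.

```python
def plan_migration_waves(avg_cpu_by_server, first_wave_below=20.0, second_wave_max=40.0):
    """Group servers into migration waves by average CPU utilization (percent).

    wave_1: below first_wave_below, the low-risk starting point
    wave_2: from first_wave_below up to second_wave_max inclusive
    deferred: anything busier, left on physical hardware for now
    """
    waves = {"wave_1": [], "wave_2": [], "deferred": []}
    # Visit the least busy servers first so each wave is ordered by CPU load.
    for name, cpu in sorted(avg_cpu_by_server.items(), key=lambda item: item[1]):
        if cpu < first_wave_below:
            waves["wave_1"].append(name)
        elif cpu <= second_wave_max:
            waves["wave_2"].append(name)
        else:
            waves["deferred"].append(name)
    return waves


if __name__ == "__main__":
    # Invented sample data standing in for a month of collected metrics.
    metrics = {"build01": 35.0, "test02": 8.5, "db01": 72.0, "stage01": 15.0}
    for wave, names in plan_migration_waves(metrics).items():
        print(wave, names)
```

CPU is only one of the metrics mentioned above; a fuller version would also weigh RAM, network, and disk I/O, along with how critical each server is.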
Once you have the lower-end servers running in a virtual environment, you can tackle the higher-end servers, those that have 20 to 40 percent CPU usage. Take care to avoid overloading a system. It is better to mix high CPU/low disk I/O/low RAM usage virtual machines with virtual machines that are low CPU/high RAM usage than to have all of the same type on the same physical server.

Issues and Resolutions

As you well know, the server will eventually fail, if for no other reason than a system board failure. To help minimize this impact, do not put all virtual machines from the same group on one physical server, and do not group all production or all test servers together. We decided that no more than 60 percent of the virtual machines on a physical server would be production servers; we use the rest of the capacity for development, staging, test, and disaster recovery virtual machines. This way, if a server goes down, it is easy to prioritize which virtual machines need to be recovered in the event of a serious disk failure. Or if the system board has an issue, an entire group of virtual machines won't go down with it. Rather, just a few systems from each of several groups are affected.

When you are deciding what else to migrate to a virtual environment, stop and review the documentation. Is it current, or have you made some changes? If so, update it first and then ensure that various members of your team have been cross-trained. Planning like this is where you can make yourself (and your virtualization project) really valuable to your managers and executive team.

As your virtual infrastructure grows, expect growing pains that test the manageability of the new environment. If the product you are using for virtualization does not have strong management capabilities, you'll end up spending a lot of time performing duties that otherwise could have been automated. As an example, suppose you have more than 50 virtual machines running on several servers.
How do you find out (easily and quickly) where VM "X" is? What if you have 300+ virtual machines? What if you have 1,000? How do you shut down all of the virtual machines, or automatically start them, when you need to reboot the physical server? With VMware's VirtualCenter product, you can easily do all this and more. If a product like VirtualCenter is not available, you will need to consider some type of scripting, in a language such as Perl, to automate some of these functions. Otherwise, they will eat up your time and you will not gain all the efficiencies and cost savings that virtualization promises. For example, if you need to upgrade the host server software for the virtualization product you choose, you generally need to upgrade a software component running in each VM as well. Since it takes one minute just to log on to each system, doing this manually is very time-consuming.

Planning Is Key

Virtualization is here to stay. According to everything I have read, companies that do not use virtualization technology will have higher costs not only in acquiring new systems, but also in maintaining existing (legacy) systems. Based on what I have seen at different engagements, for every 1,000 servers a company has, it stands to save a minimum of $750,000 by using virtualization.

Which product should you pick? That depends on what you are looking to do. What works for one company will not necessarily be right for yours. It is not that a given product cannot do what you need, but if you want to virtualize a number of test servers and are really sensitive to upfront costs, then you need to look for a less expensive solution.

With planning and some training, your company can easily amass significant savings. Plan well so you can execute flawlessly. If you succeed, your group will be stars to management.
And when you look back after your first year and see the money you've saved, how about throwing some my way? : )

This content was adapted from the Special Report “Virtual Machines Usher in a New Era” on DevX.com. Contributors: Glen Kunene, Megan Davis, Wellie Chao, Martin Heller, Edmon Begoli, John Paul Cook, and Todd Hudson. Copyright 2006 Jupitermedia Corp.