Collated vSphere Best Practices, Pre-requisites and Requirements
Paul Meehan
http://paulpmeehan.com

Contents

1 Introduction
  1.1 Version Control
  1.2 Current Scope
2 vSphere Installation and Setup
  2.2 ESXi Booting Requirements
  2.3 ESXi Support for 64-Bit Guest Operating Systems
  2.4 Hardware Requirements for vCenter Server, vCenter Single Sign On, vSphere Client, and vSphere Web Client
  2.5 vCenter Server and vSphere Client System Recommendations for Performance Based on Deployment Size
  2.6 Required Ports for vCenter Server
  2.7 Required Ports for the vCenter Server Appliance
  2.8 Prepare Your System and Install the Auto Deploy Server
  2.9 Auto Deploy Best Practices and Security Consideration
  2.10 Before You Install vCenter Server
  2.11 (vCenter) Database Prerequisites
  2.12 Using a User Account for Running vCenter Server
3 HA Clusters
  3.1 Requirements for a vSphere HA Cluster
  3.2 EVC Requirements for Hosts
  3.3 Best Practices for vSphere HA Clusters
  3.4 Best Practices for Networking
  3.5 Fault Tolerance
4 Networking
  4.1 vSphere Distributed Switch Health Check
  4.2 vDS Port Group settings and parameters
  4.3 vSphere Network I/O Control
  4.4 TCP Segmentation Offload and Jumbo Frames
  4.5 Single Root I/O Virtualization (SR-IOV)
  4.6 Configure NetFlow Settings
  4.7 Mounting NFS Volumes
  4.8 Networking Best Practices
5 Storage
  5.1 Making LUN Decisions
  5.2 Best Practices for Fibre Channel Storage
  5.3 Preventing Fibre Channel SAN Problems
  5.4 Disable Automatic Host Registration
  5.5 Optimizing Fibre Channel SAN Storage Performance
  5.6 iSCSI
  5.7 iSCSI SAN Restrictions
  5.8 iBFT iSCSI Boot Overview
  5.9 Best Practices for iSCSI Storage
  5.10 Preventing iSCSI SAN Problems
  5.11 Optimizing iSCSI SAN Storage Performance
  5.12 Checking Ethernet Switch Statistics
  5.13 iSCSI SAN Configuration Checklist
  5.14 Identifying Device Connectivity Problems
  5.15 Best Practices for SSD Devices
  5.16 Upgrading VMFS Datastores
  5.17 Set Up Dynamic Disk Mirroring
  5.18 Creating a Diagnostic Partition
  5.19 About Raw Device Mapping
  5.20 Raw Device Mapping Characteristics
  5.21 VMkernel and Storage
  5.22 Understanding Multipathing and Failover
  5.23 Array-Based Failover with iSCSI
  5.24 Path Failover and Virtual Machines
  5.25 Managing Multiple Paths
  5.26 VMware Multipathing Module
  5.27 Path Scanning and Claiming
  5.28 Managing Storage Paths and Multipathing Plug-Ins
  5.29 Multipathing Considerations
  5.30 Hardware Acceleration on NAS Devices
  5.31 Hardware Acceleration Considerations
  5.32 Booting ESXi with Software FCoE
  5.33 Requirements and Considerations for Software FCoE Boot
  5.34 Best Practices for Software FCoE Boot
6 vSphere Resource Management
  6.1 Configuring Resource Allocation Settings
  6.2 Memory Virtualization Basics
  6.3 Memory Reliability
  6.4 Managing Storage I/O Resources
  6.5 Set Storage I/O Control Threshold Value
  6.6 Managing Resource Pools
  6.7 Managing Resource Pools
  6.8 Resource Pool Admission Control
  6.9 Creating a DRS Cluster
  6.10 DRS Cluster Requirements
  6.11 Removing a Host from a Cluster
  6.12 DRS Cluster Validity
  6.13 DPM
  6.14 Datastore clusters
  6.15 Setting the Aggressiveness Level for Storage DRS
  6.16 Datastore Cluster Requirements
  6.17 Adding and Removing Datastores from a Datastore Cluster
  6.18 Storage DRS Anti-Affinity Rules
  6.19 Storage vMotion Compatibility with Datastore Clusters
  6.20 Using NUMA Systems with ESXi
7 Security
  7.2 Securing Standard Switch Ports
  7.3 Cipher Strength
  7.4 Control CIM-Based Hardware Monitoring Tool Access
  7.5 General Security Recommendations
  7.6 ESXi Firewall Configuration
  7.7 Lockdown Mode Behavior
  7.8 Lockdown Mode Configurations
  7.9 ESXi Authentication and User Management
  7.10 Best Practices for Roles and Permissions
  7.11 Replace a Default ESXi Certificate with a CA-Signed Certificate
  7.12 Modifying ESXi Web Proxy Settings
  7.13 General Virtual Machine Protection
  7.14 Removing Unnecessary Hardware Devices
  7.15 Securing vCenter Server Systems
  7.16 Best Practices for Virtual Machine and Host Security
  7.17 Installing Antivirus Software
  7.18 Managing ESXi Log Files
  7.19 Securing Fault Tolerance Logging Traffic
  7.20 Auto Deploy Security Considerations
  7.21 Image Builder Security Considerations
  7.22 Host Password Strength and Complexity
  7.23 Synchronizing Clocks on the vSphere Network
  7.24 Monitoring and Restricting Access to SSL Certificates
8 MSCS
  8.2 Cluster Virtual Machines Across Physical Hosts
  8.3 Cluster Physical and Virtual Machines
  8.4 vSphere MSCS Setup Checklist
9 Virtual Machine Administration
  9.1 What Is a Virtual Machine?
  9.2 Installing the Microsoft Sysprep Tool
  9.3 Virtual Machine Compatibility Options
  9.4 Change CPU Hot Plug Settings in the vSphere Web Client
  9.5 VM Disk Persistence Modes
  9.6 SCSI Controller Configuration
  9.7 Configure Fibre Channel NPIV Settings in the vSphere Web Client
  9.8 Managing Multi-Tiered Applications with vSphere vApp in the vSphere Web Client
  9.9 vCenter Solutions Manager
  9.10 Monitoring vServices
  9.11 Using Snapshots To Manage Virtual Machines
  9.12 Change Disk Mode to Exclude Virtual Disks from Snapshots in the vSphere Web Client

1 Introduction

Hello and welcome. It's 02/01/2014. While studying for the VCAP-DCD and preparing a submission for the VCDX, I have been trying to find a really good single source for all the prerequisites, best practices, requirements and other important information used by vSphere architects, admins and end users. As a designer, understanding the impact of your decisions against these items is key, and I like to have things in one place where possible.

This document has been created from the existing official vSphere documentation set (pubs.vmware.com), using copy and paste, to capture the requirements above in a single document that can be used as a reference.

Note: This is the result of my personal editing of what I believe are the useful nuggets we all need to know. Once I started, I realised it could never be the 3-4 page document I originally intended, so it is better to make it complete and use an index so people can search around. The intellectual property for this material is not mine and it has not been added to in ANY way by me: it is VMware copyrighted material that is freely available on pubs.vmware.com.

Once I started doing this I noticed this link, where there is a massive amount of documentation, best practices and more: http://bit.ly/1caCcQs. That should be Bookmark #1 for any vSphere/vCloud architect.

I have copied what I believe are the other nuggets of information that will help you implement the best possible vSphere design. The current version (1.0) relates to v5.1, which is the subject of my studies. As with all things in the VMware community it is important to pass this around, so folks following the same certification route might find it useful, or just use it for general reference. Please share, and I truly hope you find it useful. This document assumes a medium level of understanding and is not a "build" guide. Please email me on info@tieroneconsulting.ie with any feedback or other ideas.

1.1 Version Control

Version: 1.0
Date: 02/01/2014
Author: Paul P Meehan
Description: Issued version for vSphere 5.1. Still requires some additional input from other vSphere official documentation.

1.2 Current Scope

At the time of writing the following vSphere documents have been reviewed, explored, edited and copied. I will be adding to this list on a regular basis and will upgrade the set to v5.5 in the near future.
- Availability
- Networking
- Storage
- Security
- Resource Management
- Installation and Setup
- Host Profiles
- Virtual Machine Administration

2 vSphere Installation and Setup

2.1.1 Hardware and System Resources

To install and use ESXi 5.1, your hardware and system resources must meet the following requirements:

- Supported server platform. For a list of supported platforms, see the VMware Compatibility Guide at http://www.vmware.com/resources/compatibility.
- ESXi 5.1 will install and run only on servers with 64-bit x86 CPUs.
- ESXi 5.1 requires a host machine with at least two cores.
- ESXi 5.1 supports only LAHF and SAHF CPU instructions.
- ESXi 5.1 requires the NX/XD bit to be enabled for the CPU in the BIOS.
- ESXi supports a broad range of x64 multicore processors. For a complete list of supported processors, see the VMware Compatibility Guide at http://www.vmware.com/resources/compatibility.
- ESXi requires a minimum of 2GB of physical RAM. Provide at least 8GB of RAM to take full advantage of ESXi features and run virtual machines in typical production environments.
- To support 64-bit virtual machines, support for hardware virtualization (Intel VT-x or AMD RVI) must be enabled on x64 CPUs.
- One or more Gigabit or 10Gb Ethernet controllers. For a list of supported network adapter models, see the VMware Compatibility Guide at http://www.vmware.com/resources/compatibility.
- Any combination of one or more of the following controllers:
  o Basic SCSI controllers. Adaptec Ultra-160 or Ultra-320, LSI Logic Fusion-MPT, or most NCR/Symbios SCSI.
  o RAID controllers. Dell PERC (Adaptec RAID or LSI MegaRAID), HP Smart Array RAID, or IBM (Adaptec) ServeRAID controllers.
- SCSI disk or a local, non-network, RAID LUN with unpartitioned space for the virtual machines.
- For Serial ATA (SATA), a disk connected through supported SAS controllers or supported on-board SATA controllers. SATA disks will be considered remote, not local. These disks will not be used as a scratch partition by default because they are seen as remote.

Note: You cannot connect a SATA CD-ROM device to a virtual machine on an ESXi 5.1 host. To use the SATA CD-ROM device, you must use IDE emulation mode.

2.2 ESXi Booting Requirements

vSphere 5.1 supports booting ESXi hosts from the Unified Extensible Firmware Interface (UEFI). With UEFI you can boot systems from hard drives, CD-ROM drives, or USB media. Network booting or provisioning with VMware Auto Deploy requires the legacy BIOS firmware and is not available with UEFI.

ESXi can boot from a disk larger than 2TB provided that the system firmware and the firmware on any add-in card that you are using support it. See the vendor documentation.

Note: Changing the boot type from legacy BIOS to UEFI after you install ESXi 5.1 might cause the host to fail to boot. In this case, the host displays an error message similar to: Not a VMware boot bank. Changing the host boot type between legacy BIOS and UEFI is not supported after you install ESXi 5.1.

2.2.1 Storage Requirements for ESXi 5.1 Installation

Installing ESXi 5.1 requires a boot device that is a minimum of 1GB in size. When booting from a local disk or SAN/iSCSI LUN, a 5.2GB disk is required to allow for the creation of the VMFS volume and a 4GB scratch partition on the boot device. If a smaller disk or LUN is used, the installer attempts to allocate a scratch region on a separate local disk. If a local disk cannot be found, the scratch partition (/scratch) is located on the ESXi host ramdisk, linked to /tmp/scratch.
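These sizing rules reduce to a few simple thresholds. The Python sketch below is not part of the original text; it is a minimal illustration, with hypothetical device names, of how a candidate boot device compares against the 1GB installation minimum and the 5.2GB figure needed for the VMFS volume plus the 4GB scratch partition.

    # Sketch: check a boot device against the ESXi 5.1 sizing rules above.
    # The device names and sizes below are hypothetical examples.
    MIN_BOOT_GB = 1.0       # absolute minimum boot device size
    SCRATCH_GB = 4.0        # scratch partition created on sufficiently large devices
    FULL_LAYOUT_GB = 5.2    # boot device size needed for the VMFS volume + 4GB scratch

    def check_boot_device(name, size_gb):
        if size_gb < MIN_BOOT_GB:
            return f"{name}: too small for an ESXi 5.1 installation (<{MIN_BOOT_GB}GB)"
        if size_gb < FULL_LAYOUT_GB:
            return (f"{name}: installable, but no room for VMFS + {SCRATCH_GB}GB scratch; "
                    "/scratch falls back to another local disk or the ramdisk")
        return f"{name}: large enough for the full boot, VMFS and scratch layout"

    for device, size in [("usb-key", 4.0), ("local-disk0", 72.0)]:
        print(check_boot_device(device, size))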
You can reconfigure /scratch to use a separate disk or LUN. For best performance and memory optimization, VMware recommends that you do not leave /scratch on the ESXi host ramdisk. To reconfigure /scratch, see Set the Scratch Partition from the vSphere Client.

Due to the I/O sensitivity of USB and SD devices, the installer does not create a scratch partition on these devices. As such, there is no tangible benefit to using large USB/SD devices, as ESXi uses only the first 1GB. When installing on USB or SD devices, the installer attempts to allocate a scratch region on an available local disk or datastore. If no local disk or datastore is found, /scratch is placed on the ramdisk. You should reconfigure /scratch to use a persistent datastore following the installation.

In Auto Deploy installations, the installer attempts to allocate a scratch region on an available local disk or datastore. If no local disk or datastore is found, /scratch is placed on the ramdisk. You should reconfigure /scratch to use a persistent datastore following the installation.

For environments that boot from a SAN or use Auto Deploy, it is not necessary to allocate a separate LUN for each ESXi host. You can co-locate the scratch regions for many ESXi hosts onto a single LUN. The number of hosts assigned to any single LUN should be weighed against the LUN size and the I/O behavior of the virtual machines.

2.2.2 Recommendation for Enhanced ESXi Performance

To enhance performance, install ESXi on a robust system with more RAM than the minimum required and with multiple physical disks.

- RAM: ESXi hosts require more RAM than typical servers. Provide at least 8GB of RAM to take full advantage of ESXi features and run virtual machines in typical production environments. An ESXi host must have sufficient RAM to run concurrent virtual machines. The following examples are provided to help you calculate the RAM required by the virtual machines running on the ESXi host. Operating four virtual machines with Red Hat Enterprise Linux or Windows XP requires at least 3GB of RAM for baseline performance. This figure includes approximately 1024MB for the virtual machines, 256MB minimum for each operating system as recommended by vendors. Running these four virtual machines with 512MB RAM requires that the ESXi host have approximately 4GB RAM, which includes 2048MB for the virtual machines. These calculations do not take into account possible memory savings from using variable overhead memory for each virtual machine. See vSphere Resource Management. (This arithmetic is sketched after these recommendations.)
- Dedicated Fast Ethernet adapters for virtual machines: Place the management network and virtual machine networks on different physical network cards. Dedicated Gigabit Ethernet cards for virtual machines, such as Intel PRO 1000 adapters, improve throughput to virtual machines with high network traffic.
- Disk location: Place all data that your virtual machines use on physical disks allocated specifically to virtual machines. Performance is better when you do not place your virtual machines on the disk containing the ESXi boot image. Use physical disks that are large enough to hold disk images that all the virtual machines use.
- VMFS5 partitioning: The ESXi installer creates the initial VMFS volumes on the first blank local disk found. To add disks or modify the original configuration, use the vSphere Client. This practice ensures that the starting sectors of partitions are 64K-aligned, which improves storage performance.
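The RAM example above can be written as a short calculation. The following Python sketch is not from the original document; it simply reproduces the arithmetic (per-VM allocation plus an allowance for the host itself, which is my own simplification) to estimate host RAM for a small set of virtual machines.

    # Sketch of the host RAM estimate described above (illustrative only).
    def estimate_host_ram_mb(vm_alloc_mb, vm_count, host_baseline_mb=2048):
        """Rough host RAM estimate: per-VM allocation plus a baseline for ESXi itself.

        vm_alloc_mb      -- memory configured for each virtual machine
        vm_count         -- number of concurrent virtual machines
        host_baseline_mb -- hypothetical allowance for the hypervisor and overheads
        """
        return vm_count * vm_alloc_mb + host_baseline_mb

    # Four VMs at 512MB each -> 2048MB for the VMs and roughly 4GB for the host,
    # matching the worked example in the text.
    print(estimate_host_ram_mb(512, 4) / 1024.0, "GB (approximate)")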
2.3 ESXi Support for 64-Bit Guest Operating Systems

ESXi offers support for several 64-bit guest operating systems. For a complete list of operating systems supported for ESXi, see the VMware Compatibility Guide at http://www.vmware.com/resources/compatibility/search.php.

Hosts running virtual machines with 64-bit guest operating systems have the following hardware requirements:

- For AMD Opteron-based systems, the processors must be Opteron Rev E or later.
- For Intel Xeon-based systems, the processors must include support for Intel Virtualization Technology (VT). Many servers that include CPUs with VT support might have VT disabled by default, so you must enable VT manually. If your CPUs support VT but you do not see this option in the BIOS, contact your vendor to request a BIOS version that lets you enable VT support.

To determine whether your server has 64-bit VMware support, you can download the CPU Identification Utility from the VMware Web site.

2.4 Hardware Requirements for vCenter Server, vCenter Single Sign On, vSphere Client, and vSphere Web Client

The vCenter Server system is a physical machine or virtual machine with access to a supported database. The vCenter Server system must meet specific requirements, and the vCenter Server machines must meet the hardware requirements.

2.4.1 vCenter Single Sign On, Inventory Service and vCenter Server Hardware Requirements

You can install vCenter Single Sign On, Inventory Service, and vCenter Server on the same host machine (as with vCenter Simple Install) or on different machines. The tables below list the hardware requirements for Single Sign On and Inventory Service when they run on separate host machines. If you install vCenter Single Sign On, vCenter Inventory Service, and vCenter Server on the same host machine, the Single Sign On and Inventory Service memory and disk storage requirements are in addition to the requirements for vCenter Server. See Minimum Hardware Requirements for vCenter Server.

Minimum Hardware Requirements for vCenter Single Sign On, Running on a Separate Host Machine from vCenter Server:
- Processor: Intel or AMD x64 processor with two or more logical cores, each with a speed of 2GHz.
- Memory: 3GB. Memory requirements might be higher if the vCenter Single Sign On database runs on the same host machine. If vCenter Single Sign On runs on the same host machine as vCenter Server, see Minimum Hardware Requirements for vCenter Server.
- Disk storage: 2GB. Disk requirements might be higher if the vCenter Single Sign On database runs on the same host machine.
- Network speed: 1Gbps.

2.4.2 Minimum Hardware Requirements for vCenter Inventory Service, Running on a Separate Host Machine from vCenter Server

- Processor: Intel or AMD x64 processor with two or more logical cores, each with a speed of 2GHz.
- Memory: 3GB. If vCenter Inventory Service runs on the same host machine as vCenter Server, see Minimum Hardware Requirements for vCenter Server.
- Disk storage: At least 60GB for medium- to large-sized inventories (more than 100 hosts or 1000 virtual machines). If vCenter Inventory Service runs on the same host machine as vCenter Server, see Minimum Hardware Requirements for vCenter Server.
- Network speed: 1Gbps.
2.4.3 Minimum Hardware Requirements for vCenter Server

- CPU: Two 64-bit CPUs or one 64-bit dual-core processor.
- Processor: 2.0GHz or faster Intel 64 or AMD 64 processor. The Itanium (IA64) processor is not supported. Processor requirements might be higher if the database runs on the same machine.
- Memory: The amount of memory needed depends on your vCenter Server configuration. If vCenter Server is installed on a different host machine than vCenter Single Sign On and vCenter Inventory Service, 4GB of RAM are required. If vCenter Server, vCenter Single Sign On and vCenter Inventory Service are installed on the same host machine (as with vCenter Simple Install), 10GB of RAM are required. Memory requirements are higher if the vCenter Server database or vCenter Single Sign On database runs on the same machine as vCenter Server. vCenter Server includes several Java services: VMware VirtualCenter Management Webservices (tc Server), Inventory Service, and Profile-Driven Storage Service. When you install vCenter Server, you select the size of your vCenter Server inventory to allocate memory for these services. The inventory size determines the maximum JVM heap settings for the services. You can adjust this setting after installation if the number of hosts in your environment changes. See the recommendations in JVM Heap Settings for vCenter Server.
- Disk storage: The amount of disk storage needed for the vCenter Server installation depends on your vCenter Server configuration. If vCenter Server is installed on a different host machine than vCenter Single Sign On and vCenter Inventory Service, 4GB are required. If vCenter Server, vCenter Single Sign On and vCenter Inventory Service are installed on the same host machine, more disk space is required. The JVM heap settings for vCenter Server depend on your inventory size. See Configuring VMware Tomcat Server Settings in vCenter Server 5.1.

2.4.4 JVM Heap Settings for vCenter Server

- Small inventory (1-100 hosts or 1-1000 virtual machines): VMware VirtualCenter Management Webservices (tc Server) 1GB, Inventory Service 3GB, Profile-Driven Storage Service 512MB.
- Medium inventory (100-400 hosts or 1000-4000 virtual machines): VMware VirtualCenter Management Webservices (tc Server) 2GB, Inventory Service 6GB, Profile-Driven Storage Service 1GB.
- Large inventory (more than 400 hosts or 4000 virtual machines): VMware VirtualCenter Management Webservices (tc Server) 3GB, Inventory Service 12GB, Profile-Driven Storage Service 2GB.

Note: Installing vCenter Server on a network drive or USB flash drive is not supported.

For the hardware requirements of your database, see your database documentation. The database requirements are in addition to the vCenter Server requirements if the database and vCenter Server run on the same machine.
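The memory figures above (vCenter Server RAM and the per-service JVM heaps) can be folded into a single lookup keyed on inventory size. The Python sketch below is illustrative only; it simply encodes the published 5.1 figures, and the helper names and the idea of classifying by host/VM counts are mine rather than VMware's.

    # Sketch: map an inventory size to the JVM heap settings listed above.
    JVM_HEAP_MB = {
        # inventory size: (tc Server, Inventory Service, Profile-Driven Storage Service)
        "small":  (1024, 3072, 512),    # 1-100 hosts or 1-1000 VMs
        "medium": (2048, 6144, 1024),   # 100-400 hosts or 1000-4000 VMs
        "large":  (3072, 12288, 2048),  # more than 400 hosts or 4000 VMs
    }

    def inventory_size(hosts, vms):
        """Classify an inventory using the host/VM thresholds from the tables above."""
        if hosts > 400 or vms > 4000:
            return "large"
        if hosts > 100 or vms > 1000:
            return "medium"
        return "small"

    tc, inv, pds = JVM_HEAP_MB[inventory_size(hosts=150, vms=1200)]
    print(f"tc Server {tc}MB, Inventory Service {inv}MB, Profile-Driven Storage {pds}MB")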
2.4.5 VMware vCenter Server Appliance Hardware Requirements and Recommendations

Important: The embedded database is not configured to manage an inventory that contains more than 5 hosts and 50 virtual machines. If you use the embedded database with the vCenter Server Appliance, exceeding these limits can cause numerous problems, including causing vCenter Server to stop responding.

2.4.5.1 Hardware Requirements for VMware vCenter Server Appliance

- Disk storage on the host machine: The vCenter Server Appliance requires at least 7GB of disk space, and is limited to a maximum size of 80GB. The vCenter Server Appliance can be deployed with thin-provisioned virtual disks that can grow to the maximum size of 80GB. If the host machine does not have enough free disk space to accommodate the growth of the vCenter Server Appliance virtual disks, vCenter Server might cease operation, and you will not be able to manage your vSphere environment.
- Memory in the VMware vCenter Server Appliance:
  o Very small inventory (10 or fewer hosts, 100 or fewer virtual machines): at least 4GB.
  o Small inventory (10-100 hosts or 100-1000 virtual machines): at least 8GB.
  o Medium inventory (100-400 hosts or 1000-4000 virtual machines): at least 16GB.
  o Large inventory (more than 400 hosts or 4000 virtual machines): at least 24GB.

2.5 vCenter Server and vSphere Client System Recommendations for Performance Based on Deployment Size

The number of hosts and powered-on virtual machines in your environment affects performance. Use the following system requirements as minimum guidelines for reasonable performance. For increased performance, you can configure systems in your environment with values greater than those listed here. Processing requirements are listed in terms of hardware CPU cores. Only physical cores are counted. In hyperthreaded systems, logical CPUs do not count as separate cores.

Medium Deployment of Up to 50 Hosts and 500 Powered-On Virtual Machines:
- vCenter Server: 2 cores, 4GB memory, 5GB disk.
- vSphere Client: 1 core, 1GB memory, 1.5GB disk.

Large Deployment of Up to 300 Hosts and 3,000 Powered-On Virtual Machines:
- vCenter Server: 4 cores, 8GB memory, 10GB disk.
- vSphere Client: 1 core, 1GB memory, 1.5GB disk.

Extra-Large Deployment of Up to 1,000 Hosts and 10,000 Powered-On Virtual Machines:
- vCenter Server: 8 cores, 16GB memory, 10GB disk.
- vSphere Client: 2 cores, 1GB memory, 1.5GB disk.

2.5.1 vSphere Web Client Hardware Requirements

The vSphere Web Client has two components: a Java server and an Adobe Flex client application running in a browser.

Hardware requirements for the vSphere Web Client server component:
- Memory: At least 2GB: 1GB for the Java heap, and 1GB for the resident code, the stack for Java threads, and the global/bss segments for the Java process.
- CPU: 2.00GHz processor with 4 cores.
- Disk storage: At least 2GB free disk space.
- Networking: Gigabit connection recommended.

2.5.2 Recommended Minimum Size and Rotation Configuration for hostd, vpxa, and fdm Logs

- Management Agent (hostd): maximum log file size 10240KB, 10 rotations to preserve, minimum disk space required 100MB.
- VirtualCenter Agent (vpxa): maximum log file size 5120KB, 10 rotations to preserve, minimum disk space required 50MB.
- vSphere HA agent (Fault Domain Manager, fdm): maximum log file size 5120KB, 10 rotations to preserve, minimum disk space required 50MB.

Important: The recommended disk sizes assume default log levels. If you configure more detailed log levels, more disk space is required.
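The minimum disk space figures in the log table follow directly from the maximum file size and the number of rotations kept. A small Python sketch (not from the original document) makes the arithmetic explicit:

    # Sketch: minimum disk space for an agent's logs = max file size x rotations kept.
    LOG_CONFIG_KB = {
        "hostd": (10240, 10),   # Management Agent
        "vpxa":  (5120, 10),    # VirtualCenter Agent
        "fdm":   (5120, 10),    # vSphere HA agent (Fault Domain Manager)
    }

    for agent, (max_kb, rotations) in LOG_CONFIG_KB.items():
        min_mb = max_kb * rotations / 1024.0
        print(f"{agent}: {min_mb:.0f}MB minimum disk space at default log levels")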
2.6 Required Ports for vCenter Server

- Port 80: vCenter Server requires port 80 for direct HTTP connections. Port 80 redirects requests to HTTPS port 443. This redirection is useful if you accidentally use http://server instead of https://server. If you use a custom Microsoft SQL database (not the bundled SQL Server 2008 database) that is stored on the same host machine as the vCenter Server, port 80 is used by the SQL Reporting Service. When you install vCenter Server, the installer prompts you to change the HTTP port for vCenter Server. Change the vCenter Server HTTP port to a custom value to ensure a successful installation. Microsoft Internet Information Services (IIS) also uses port 80. See Conflict Between vCenter Server and IIS for Port 80.
- Port 389: This port must be open on the local and all remote instances of vCenter Server. This is the LDAP port number for the Directory Services for the vCenter Server group. The vCenter Server system needs to bind to port 389, even if you are not joining this vCenter Server instance to a Linked Mode group. If another service is running on this port, it might be preferable to remove it or change its port to a different port. You can run the LDAP service on any port from 1025 through 65535. If this instance is serving as the Microsoft Windows Active Directory, change the port number from 389 to an available port from 1025 through 65535.
- Port 443: The default port that the vCenter Server system uses to listen for connections from the vSphere Client. To enable the vCenter Server system to receive data from the vSphere Client, open port 443 in the firewall. The vCenter Server system also uses port 443 to monitor data transfer from SDK clients. If you use another port number for HTTPS, you must use ip-address:port when you log in to the vCenter Server system.
- Port 636: For vCenter Server Linked Mode, this is the SSL port of the local instance. If another service is running on this port, it might be preferable to remove it or change its port to a different port. You can run the SSL service on any port from 1025 through 65535.

2.7 Required Ports for the vCenter Server Appliance

- Port 80: vCenter Server requires port 80 for direct HTTP connections. Port 80 redirects requests to HTTPS port 443. This redirection is useful if you accidentally use http://server instead of https://server.
- Port 443: The default port that the vCenter Server system uses to listen for connections from the vSphere Client. To enable the vCenter Server system to receive data from the vSphere Client, open port 443 in the firewall. The vCenter Server system also uses port 443 to monitor data transfer from SDK clients. If you use another port number for HTTPS, you must use ip-address:port when you log in to the vCenter Server system.
- Port 902: The default port that the vCenter Server system uses to send data to managed hosts. Managed hosts also send a regular heartbeat over UDP port 902 to the vCenter Server system. This port must not be blocked by firewalls between the server and the hosts or between hosts. Port 902 must not be blocked between the vSphere Client and the hosts. The vSphere Client uses this port to display virtual machine consoles.
- Port 8080: Web Services HTTP. Used for the VMware VirtualCenter Management Web Services.
- Port 8443: Web Services HTTPS. Used for the VMware VirtualCenter Management Web Services.
- Port 10080: vCenter Inventory Service HTTP.
- Port 10443: vCenter Inventory Service HTTPS.
- Port 10109: vCenter Inventory Service database.
- Port 514: vSphere Syslog Collector server.

2.8 Prepare Your System and Install the Auto Deploy Server

Before you turn on a host for PXE boot with vSphere Auto Deploy, you must install prerequisite software and set up the DHCP and TFTP servers that Auto Deploy interacts with.

- Ensure that the hosts that you will provision with Auto Deploy meet the hardware requirements for ESXi 5.1. See ESXi Hardware Requirements. Note: You cannot provision EFI hosts with Auto Deploy unless you switch the EFI system to BIOS compatibility mode.
- Ensure that the ESXi hosts have network connectivity to vCenter Server and that all port requirements are met. See Required Ports for vCenter Server. (A simple TCP reachability sketch follows this list.)
- If you want to use VLANs in your Auto Deploy environment, you must set up end-to-end networking properly. When the host is PXE booting, the UNDI driver must be set up to tag the frames with the proper VLAN IDs. You must do this setup manually by making the correct changes in the BIOS. You must also correctly configure the ESXi port groups with the correct VLAN IDs. Ask your network administrator how VLAN IDs are used in your environment.
- Ensure that you have enough storage for the Auto Deploy repository. The Auto Deploy server uses the repository to store data it needs, including the rules and rule sets you create and the VIBs and image profiles that you specify in your rules. Best practice is to allocate 2GB to have enough room for four image profiles and some extra space. Each image profile requires approximately 350MB. Determine how much space to reserve for the Auto Deploy repository by considering how many image profiles you expect to use.
- Obtain the vCenter Server installation media, which include the Auto Deploy installer, or deploy the vCenter Server Appliance. See Installing vCenter Server and Using Auto Deploy with the VMware vCenter Server Appliance.
- Ensure that a TFTP server is available in your environment. If you require a supported solution, purchase a supported TFTP server from your vendor of choice.
- Obtain administrative privileges to the DHCP server that manages the network segment you want to boot from. You can use a DHCP server already in your environment, or install a DHCP server. For your Auto Deploy setup, replace the gpxelinux.0 file name with undionly.kpxe.vmw-hardwired.
- Secure your network as you would for any other PXE-based deployment method. Auto Deploy transfers data over SSL to prevent casual interference and snooping. However, the authenticity of the client or the Auto Deploy server is not checked during a PXE boot. Note: Auto Deploy is not supported with NPIV (N_Port ID Virtualization).
- Set up a remote Syslog server. See the vCenter Server and Host Management documentation for Syslog server configuration information. Configure the first host you boot to use the remote Syslog server and apply that host's host profile to all other target hosts. Optionally, install and use the vSphere Syslog Collector, a vCenter Server support tool that provides a unified architecture for system logging and enables network logging and combining of logs from multiple hosts.
- Install ESXi Dump Collector, set up your first host so that all core dumps are directed to ESXi Dump Collector, and apply the host profile from that host to all other hosts. See Configure ESXi Dump Collector with ESXCLI and Set Up ESXi Dump Collector from the Host Profiles Interface in the vSphere Client. See Install or Upgrade vSphere ESXi Dump Collector.
- Auto Deploy does not support a pure IPv6 environment because the PXE boot specifications do not support IPv6. However, after the initial PXE boot state, the rest of the communication can happen over IPv6. You can register Auto Deploy to the vCenter Server system with IPv6, and you can set up host profiles to bring up hosts with IPv6 addresses. Only the initial boot process requires an IPv4 address.
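As a quick pre-flight check for the connectivity items above, the sketch below probes some of the TCP ports listed in the Required Ports sections from a management workstation. It is not part of the original text; the host name is a placeholder, the UDP heartbeat on port 902 cannot be verified with a plain TCP connect, and a closed port may simply indicate a firewall rule rather than a broken installation.

    # Sketch: TCP reachability check against a vCenter Server (illustrative only).
    import socket

    VCENTER = "vcenter.example.local"                       # placeholder host name
    TCP_PORTS = [80, 443, 902, 8080, 8443, 10080, 10443]    # from sections 2.6 and 2.7

    def tcp_open(host, port, timeout=3.0):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    for port in TCP_PORTS:
        state = "reachable" if tcp_open(VCENTER, port) else "not reachable"
        print(f"{VCENTER}:{port} {state}")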
2.9 Auto Deploy Best Practices and Security Consideration

Follow best practices when installing vSphere Auto Deploy and when using Auto Deploy with other vSphere components. Set up a highly available Auto Deploy infrastructure in large production environments or when using stateless caching. Follow all security guidelines that you would follow in a PXE boot environment, and consider the recommendations in this chapter.

2.9.1 Auto Deploy and vSphere HA Best Practices

You can improve the availability of the virtual machines running on hosts provisioned with Auto Deploy by following best practices.

Some environments configure the hosts provisioned with Auto Deploy with a distributed switch or configure virtual machines running on the hosts with Auto Start Manager. In those environments, deploy the vCenter Server system so that its availability matches the availability of the Auto Deploy server. Several approaches are possible:

o In a proof of concept environment, deploy the vCenter Server system and the Auto Deploy server on the same system. In all other situations, install the two servers on separate systems.
o Deploy vCenter Server Heartbeat. VMware vCenter Server Heartbeat delivers high availability for VMware vCenter Server, protecting the virtual and cloud infrastructure from application, configuration, operating system, or hardware related outages.
o Deploy the vCenter Server system in a virtual machine. Run the vCenter Server virtual machine in a vSphere HA enabled cluster and configure the virtual machine with a vSphere HA restart priority of high. Include two or more hosts in the cluster that are not managed by Auto Deploy and pin the vCenter Server virtual machine to these hosts by using a rule (vSphere HA DRS required VM-to-host rule). You can set up the rule and then disable DRS if you do not wish to use DRS in the cluster. The greater the number of hosts that are not managed by Auto Deploy, the greater your resilience to host failures.

Note: This approach is not suitable if you use Auto Start Manager, because Auto Start Manager is not supported in a cluster enabled for vSphere HA.

2.9.2 Auto Deploy Networking Best Practices

Prevent networking problems by following Auto Deploy networking best practices.

- IP Address Allocation: Using DHCP reservations is highly recommended for address allocation. Fixed IP addresses are supported by the host customization mechanism, but providing input for each host is cumbersome and not recommended.
- VLAN Considerations: Using Auto Deploy in environments that do not use VLANs is highly recommended. If you intend to use Auto Deploy in an environment that uses VLANs, you must make sure that the hosts you want to provision can reach the DHCP server. How hosts are assigned to a VLAN depends on the setup at your site. The VLAN ID might be assigned by the switch or by the router, or you might be able to set the VLAN ID in the host's BIOS or through the host profile. Contact your network administrator to determine the steps for allowing hosts to reach the DHCP server.

2.9.2.1 Auto Deploy and VMware Tools Best Practices

See VMware Knowledge Base article 2004018 for Auto Deploy and VMware Tools best practices.

2.9.2.2 Auto Deploy Load Management Best Practice

Simultaneously booting large numbers of hosts places a significant load on the Auto Deploy server. Because Auto Deploy is a web server at its core, you can use existing web server scaling technologies to help distribute the load. For example, one or more caching reverse proxy servers can be used with Auto Deploy. The reverse proxies serve up the static files that make up the majority of an ESXi boot image.
Configure the reverse proxy to cache static content and pass all requests through to the Auto Deploy server. See the VMware Techpubs video Using Reverse Web Proxy Servers for Auto Deploy. Configure the hosts to boot off the reverse proxy by using multiple TFTP servers, one for each reverse proxy server. Finally, set up the DHCP server to send different hosts to different TFTP servers. When you boot the hosts, the DHCP server sends them to different TFTP servers. Each TFTP server sends hosts to a different server, either the Auto Deploy server or a reverse proxy server, significantly reducing the load on the Auto Deploy server.

After a massive power outage, VMware recommends that you bring up the hosts on a per-cluster basis. If you bring up multiple clusters simultaneously, the Auto Deploy server might experience CPU bottlenecks. All hosts come up after a potential delay. The bottleneck is less severe if you set up the reverse proxy.

2.9.3 vSphere Auto Deploy Logging and Troubleshooting Best Practices

To resolve problems you encounter with vSphere Auto Deploy, use the Auto Deploy logging information from the vSphere Client and set up your environment to send logging information and core dumps to remote hosts.

2.10 Before You Install vCenter Server

2.10.1 System Prerequisites

- Verify that your system meets the requirements listed in Hardware Requirements for vCenter Server, vCenter Single Sign On, vSphere Client, and vSphere Web Client and vCenter Server Software Requirements, and that the required ports are open, as discussed in Required Ports for vCenter Server.
- Before you install or upgrade any vSphere product, synchronize the clocks of all machines on the vSphere network. See Synchronizing Clocks on the vSphere Network.
- Review the Windows Group Policy Object (GPO) password policy for your system machines. The Single Sign On installation requires you to enter passwords that comply with the GPO password policy.
- Verify that the DNS name of the vCenter Server host machine matches the actual computer name.
- Verify that the host name of the machine that you are installing vCenter Server on complies with RFC 952 guidelines.
- The installation path of vCenter Server must be compatible with the installation requirements for Microsoft Active Directory Application Mode (ADAM/AD LDS). The installation path cannot contain any of the following characters: non-ASCII characters, commas (,), periods (.), exclamation points (!), pound signs (#), at signs (@), or percentage signs (%).
- Verify that the host machine computer name is no more than 15 characters.
- Verify that the system on which you are installing vCenter Server is not an Active Directory domain controller.
- On each system that is running vCenter Server, verify that the domain user account has the following permissions:
  o Member of the Administrators group
  o Act as part of the operating system
  o Log on as a service
- vCenter Server requires the Microsoft .NET 3.5 SP1 Framework. If your system does not have it installed, the vCenter Server installer installs it. The .NET 3.5 SP1 installation might require Internet connectivity to download more files.
- If the system that you use for your vCenter Server installation belongs to a workgroup rather than a domain, not all functionality is available to vCenter Server. If assigned to a workgroup, the vCenter Server system is not able to discover all domains and systems available on the network when using some features. To determine whether the system belongs to a workgroup or a domain, right-click My Computer.
  Click Properties and click the Computer Name tab. The Computer Name tab displays either a Workgroup label or a Domain label.
- Verify that the NETWORK SERVICE account has read permission on the folder in which vCenter Server is installed and on the HKLM registry.
- During the installation, verify that the connection between the machine and the domain controller is working.
- Before the vCenter Server installation, in the Administrative Tools control panel of the vCenter Single Sign-On instance that you will register vCenter Server to, verify that the vCenter Single Sign-On and RSA SSPI services are started.
- You must log in as a member of the Administrators group on the host machine, with a user name that does not contain any non-ASCII characters.

2.10.2 Network Prerequisites

- Verify that the fully qualified domain name (FQDN) of the system where you will install vCenter Server is resolvable. To check that the FQDN is resolvable, type nslookup your_vCenter_Server_fqdn at a command line prompt. If the FQDN is resolvable, the nslookup command returns the IP and name of the domain controller machine.
- Verify that DNS reverse lookup returns a fully qualified domain name when queried with the IP address of the vCenter Server. When you install vCenter Server, the installation of the web server component that supports the vSphere Client fails if the installer cannot look up the fully qualified domain name of the vCenter Server from its IP address. Reverse lookup is implemented using PTR records. To create a PTR record, see the documentation for your vCenter Server host operating system. (A scripted version of these forward and reverse lookups is sketched after this list.)
- Verify that no Network Address Translation (NAT) exists between the vCenter Server system and the hosts it will manage.
- Install vCenter Server, like any other network server, on a machine with a fixed IP address and well-known DNS name, so that clients can reliably access the service. Assign a static IP address and host name to the Windows server that will host the vCenter Server system. This IP address must have a valid (internal) domain name system (DNS) registration.
- Ensure that the ESXi host management interface has a valid DNS resolution from the vCenter Server and all vSphere Clients. Ensure that the vCenter Server has a valid DNS resolution from all ESXi hosts and all vSphere Clients.
- If you use DHCP instead of a static IP address for vCenter Server, make sure that the vCenter Server computer name is updated in the domain name service (DNS). Ping the computer name to test this connection. For example, if the computer name is host-1.company.com, run the following command in the Windows command prompt: ping host-1.company.com. If you can ping the computer name, the name is updated in DNS.
- For the vCenter Single Sign-On installer to automatically discover Active Directory identity sources, verify that the following conditions are met:
  o The Active Directory identity source must be able to authenticate the user who is logged in to perform the Single Sign-On installation.
  o The DNS of the Single Sign-On Server host machine must contain both lookup and reverse lookup entries for the domain controller of the Active Directory. For example, pinging mycompany.com should return the domain controller IP address for mycompany. Similarly, the ping -a command for that IP address should return the domain controller hostname. Avoid trying to correct name resolution issues by editing the hosts file. Instead, make sure that the DNS server is correctly set up.
  o The system clock of the Single Sign-On Server host machine must be synchronized with the clock of the domain controller.
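The following Python sketch is a rough equivalent of the nslookup and ping -a checks described above: it resolves a name forward and then resolves the resulting address back. It is not part of the original document; the host name is a placeholder, and a clean result here does not replace verifying the PTR records on the DNS server itself.

    # Sketch: forward and reverse DNS check for a vCenter Server FQDN (illustrative).
    import socket

    FQDN = "vcenter.example.local"   # placeholder; use your vCenter Server FQDN

    try:
        ip = socket.gethostbyname(FQDN)                  # forward lookup (A record)
        print(f"{FQDN} resolves to {ip}")
        reverse_name, _, _ = socket.gethostbyaddr(ip)    # reverse lookup (PTR record)
        print(f"{ip} resolves back to {reverse_name}")
        if reverse_name.lower().rstrip(".") != FQDN.lower():
            print("Warning: reverse lookup does not match the FQDN; check PTR records")
    except socket.gaierror as err:
        print(f"Forward lookup failed: {err}")
    except socket.herror as err:
        print(f"Reverse lookup failed: {err}")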
2.11 (vCenter) Database Prerequisites

- Verify that your vCenter Server database meets the database requirements. See vCenter Server Database Configuration Notes and Preparing vCenter Server Databases.
- Create a vCenter Server database, unless you plan to install the bundled database.
- Create a vCenter Single Sign-On database, unless you plan to install the bundled database.
- If you are using an existing database for Single Sign-On, you must create a database user (RSA_USER) and database administrator (RSA_DBA) to use for the Single Sign-On database installation and setup. To create these users, run the script rsaIMSLiteDBNameSetupUsers.sql. The script is included in the vCenter Server installer download package, at vCenter Server Installation directory\SSOServer.
- If you are using an existing database with your vCenter Single Sign-On installation or upgrade, make sure that the table spaces are named RSA_DATA and RSA_INDEX. Any other table space names will cause the vCenter Single Sign-On installation to fail.
- If you are using an existing database for Single Sign-On, to ensure that table space is created for the database, run the script rsaIMSLite<DBName>SetupTablespaces.sql. The script is included in the vCenter Server installer download package, at vCenter Server Installation directory\Single Sign On\DBScripts\SSOServer\Schema\your_existing_database. You can run this script prior to the installation, or during the installation, when you are prompted by the installer. You can leave the installer to run the script, and resume the installer after you run the script.

2.11.1 vCenter Single Sign On Components

vCenter Single Sign On includes these components: STS (Security Token Service), an administration server, vCenter Lookup Service, and the RSA SSPI service. When you install vCenter Single Sign-On, the following components are deployed.

- STS (Security Token Service): The STS service issues Security Assertion Markup Language (SAML) tokens. These security tokens pass information about a system user between an identity provider and a web service. This service enables a user who has logged on through vCenter Single Sign-On to use multiple web-service delivered applications without authenticating to each one.
- Administration server: The administration server configures the vCenter Single Sign-On server and manages users and groups.
- vCenter Lookup Service: The Lookup Service contains topology information about the vSphere infrastructure, enabling vSphere components to connect to each other securely.
- RSA SSPI service: The Security Support Provider Interface is a Microsoft Windows-based API used to perform authentication against Security Support Providers such as NTLM and Kerberos.

2.11.2 vCenter Lookup Service

vCenter Lookup Service is a component of vCenter Single Sign On. Lookup Service registers the location of vSphere components so they can securely find and communicate with each other. The vCenter Single Sign-On installer also deploys the VMware Lookup Service on the same address and port. The Lookup Service enables different components of vSphere to find one another in a secure way.

When you install vCenter Server components after vCenter Single Sign-On, you must provide the Lookup Service URL. The Inventory Service and the vCenter Server installers ask for the Lookup Service URL and then contact the Lookup Service to find vCenter Single Sign-On. After installation, the Inventory Service and vCenter Server are registered in Lookup Service so other vSphere components, like the vSphere Web Client, can find them.
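Before running the Inventory Service or vCenter Server installers, it can be useful to confirm that the Lookup Service URL they will ask for actually answers over HTTPS. The Python sketch below is not from the original document: the URL is a placeholder in the general form used by SSO 5.1 deployments (take the real value from your own Single Sign-On installation), and certificate verification is disabled only because a default SSO deployment typically presents a self-signed certificate.

    # Sketch: confirm the vCenter Single Sign-On Lookup Service URL answers over HTTPS.
    import ssl
    import urllib.request

    # Placeholder URL of the form the vCenter Server and Inventory Service
    # installers prompt for; substitute your own Lookup Service address.
    LOOKUP_SERVICE_URL = "https://sso.example.local:7444/lookupservice/sdk"

    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # self-signed certificate in a default install
    ctx.verify_mode = ssl.CERT_NONE

    try:
        with urllib.request.urlopen(LOOKUP_SERVICE_URL, context=ctx, timeout=10) as resp:
            print(f"Lookup Service answered with HTTP status {resp.status}")
    except Exception as err:
        print(f"Lookup Service not reachable: {err}")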
2.11.3 Setting the vCenter Server Administrator User

In vCenter Server 5.1 with vCenter Single Sign On, the way you set the vCenter Server administrator user depends on your vCenter Single Sign On deployment.

In vSphere versions before vSphere 5.1, vCenter Server administrators are the users that belong to the local operating system administrators group. In vSphere 5.1, when you install vCenter Server, you must provide the default (initial) vCenter Server administrator user or group.

- For small deployments where vCenter Server and vCenter Single Sign-On are deployed on the same host machine, you can designate the local operating system group Administrators as vCenter Server administrative users. This option is the default. This behavior is unchanged from vCenter Server 5.0.
- For larger installations, where vCenter Single Sign-On and vCenter Server are deployed on different hosts, you cannot preserve the same behavior as in vCenter Server 5.0. Instead, assign the vCenter Server administrator role to a user or group from an identity source that is registered in the vCenter Single Sign-On server: Active Directory, OpenLDAP, or the system identity source.

2.11.4 Authenticating to the vCenter Server 5.1 Environment

In vCenter Server 5.1, users authenticate through vCenter Single Sign On. In vCenter Server versions earlier than vCenter Server 5.1, when a user connects to vCenter Server, vCenter Server authenticates the user by validating the user against an Active Directory domain or the list of local operating system users.

Because vCenter Server now has its own vCenter Single Sign-On server, you must create Single Sign-On users to manage the Single Sign-On server. These users might be different from the users that administer vCenter Server. The default vCenter Single Sign-On administrator user ID is admin@System-Domain. You can create Single Sign-On administrator users with the Single Sign-On administration tool in the vSphere Web Client. You can associate the following permissions with these users: Basic, Regular, and Administrator.

Users can log in to vCenter Server with the vSphere Client or the vSphere Web Client.

- Using the vSphere Client, the user logs in to each vCenter Server separately. All linked vCenter Server instances are visible on the left pane of the vSphere Client. The vSphere Client does not show vCenter Server systems that are not linked to the vCenter Server that the user logged in to unless the user connects to those vCenter Server systems explicitly. This behavior is unchanged from vCenter Server versions earlier than version 5.1.
- Using the vSphere Web Client, users authenticate to vCenter Single Sign-On and are connected to the vSphere Web Client. Users can view all the vCenter Server instances that the user has permissions on. After users connect to vCenter Server, no further authentication is required. The actions users can perform on objects depend on the user's vCenter Server permissions on those objects.

For vCenter Server versions earlier than vCenter Server 5.1, you must explicitly register each vCenter Server system with the vSphere Web Client, using the vSphere Web Client Administration Application. For more information about vCenter Single Sign On, see vSphere Security.
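The same authentication model applies to scripted access: an API client logs in to a vCenter Server instance with credentials from one of the registered identity sources. The sketch below is only an illustration and assumes the open-source pyVmomi SDK, which is not mentioned in the original document; the host name and credentials are placeholders. Depending on the Python and pyVmomi versions in use, an SSL context that accepts the vCenter certificate may also be required.

    # Sketch: scripted login to vCenter Server using pyVmomi (assumed SDK, illustrative).
    from pyVim.connect import SmartConnect, Disconnect

    si = SmartConnect(host="vcenter.example.local",        # placeholder vCenter Server
                      user="administrator@example.local",  # an SSO-backed account
                      pwd="secret",
                      port=443)
    try:
        about = si.RetrieveContent().about
        print(f"Connected to {about.fullName}")
    finally:
        Disconnect(si)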
2.11.5 How vCenter Single Sign-On Deployment Scenarios Affect Log In Behavior The way that you deploy vCenter Single Sign-On and the type of user who installs vCenter Single Sign-On affects which administrator user accounts have privileges on the Single SignOn server and on vCenter Server. During the vCenter Server installation process, certain users are granted privileges to log in to vCenter Server and certain users are granted privileges to manage vCenter Single Sign-On. The vCenter Server administrator might not be the same user as the vCenter Single Sign-On administrator. This means that when you log in to the vSphere Web Client as the default Single Sign-On administrator (admin@System-Domain), you might not see any vCenter Server systems in the inventory. The inventory appears to be empty because you see only the systems upon which you have privileges in the vSphere Web Client. This also means that when you log in to the vSphere Web Client as the default vCenter Server administrator, you might not see the vCenter Single Sign-On configuration tool. The configuration tool is not present because only the default vCenter Single Sign-On Administrator (admin@System-Domain) is allowed to view and manage vCenter Single SignOn after installation. The Single Sign-On administrator can create additional administrator users if necessary. 2.11.6 Login Behavior When You Use vCenter Simple Install The vCenter Simple Install process installs vCenter Single Sign-On, the Inventory Service, and vCenter Server on one system. The account you use when you run the Simple Install process affects which users have privileges on which components. When you log in as a domain account user or local account user to install vCenter Server using vCenter Simple Install, the following behavior occurs upon installation. By default, users in the local operating system Administrators group can log in to the vSphere Web Client and vCenter Server. These users cannot configure Single Sign-On or view the Single Sign-On management interface in the vSphere Web Client. By default, the vCenter Single Sign-On administrator user is admin@SystemDomain. This user can log in to the vSphere Web Client to configure Single Sign-On and add accounts to manage Single Sign-On if necessary. This user cannot view or configure vCenter Server. If you are logged in as a domain account user, the default Active Directory identity sources are discovered automatically during vCenter Single Sign On installation. If you are logged in as a local account user, Active Directory identity sources are not discovered automatically during vCenter Single Sign On installation. The local operating system (localos or hostname) users are added as an identity source. 2.11.7 Login Behavior When You Deploy vCenter Single Sign-On as a Standalone Server Deploying vCenter Single Sign-On in Basic mode means that a standalone version of vCenter Single Sign-On is installed on a system. Multiple vCenter Server, Inventory Service, and vSphere Web Client instances can point to this standalone version of vCenter Single Sign-On. In this deployment scenario, the installation process grants admin@System-Domain vCenter Server privileges by default. In addition, the installation process creates the user admin@System-Domain to manage vCenter Single Sign-On. Note When you install vCenter Server components with separate installers, you can choose which account or group can log in to vCenter Server upon installation. 
Specify this account or group on the Single Sign-On Information page of the installer, in the following text box: vCenter Server administrator recognized by vCenter Single Sign-On. For example, to grant a group of domain administrators permission to log in to vCenter Server, type the name of the domain administrators group, such as Domain Admins@VCADSSO.LOCAL. In high availability and multisite Single Sign-On modes, there is no local operating system identity source. Therefore, it will not work if you enter Administrators or Administrator in the text box vCenter Server administrator recognized by vCenter Single Sign-On. Administrators is treated as the local operating system group Administrators, and Administrator is treated as the local operating system user Administrator.

2.11.8 Identity Sources for vCenter Server with vCenter Single Sign On
vCenter Server 5.1 with vCenter Single Sign On adds support for several new types of user repository. vCenter Server versions earlier than version 5.1 supported Active Directory and local operating system users as user repositories. vCenter Server 5.1 supports the following types of user repositories as identity sources: Active Directory, OpenLDAP, local operating system, and System. vCenter Single Sign-On identity sources are managed by Single Sign-On administrator users. You can attach multiple identity sources from each type to a single Single Sign-On server. Each identity source has a name that is unique within the scope of the corresponding Single Sign-On server instance. There is always exactly one System identity source, named System-Domain. There can be at most one local operating system identity source. On Linux systems, the identity source label is localOS. On Windows systems, the identity source label is the system's host name. The local operating system identity source can exist only in nonclustered Single Sign-On server deployments. You can attach remote identity sources to a Single Sign-On server instance. Remote identity sources are limited to Active Directory and OpenLDAP server implementations. During Single Sign On installation, the installer can automatically discover Active Directory identity sources, if your system meets the appropriate prerequisites. See the section "Network Prerequisites" in Prerequisites for Installing vCenter Single Sign-On, Inventory Service, and vCenter Server. For more information about vCenter Single Sign On, see vSphere Security.

2.12 Using a User Account for Running vCenter Server
You can use the Microsoft Windows built-in system account or a user account to run vCenter Server. With a user account, you can enable Windows authentication for SQL Server, and it provides more security. The user account must be an administrator on the local machine. In the installation wizard, you specify the account name as DomainName\Username. You must configure the SQL Server database to allow the domain account access to SQL Server. The Microsoft Windows built-in system account has more permissions and rights on the server than the vCenter Server system needs, which can contribute to security problems. For SQL Server DSNs configured with Windows authentication, use the same user account for the VMware VirtualCenter Management Webservices service and the DSN user. If you do not plan to use Microsoft Windows authentication for SQL Server or you are using an Oracle or DB2 database, you might still want to set up a local user account for the vCenter Server system. The only requirement is that the user account is an administrator on the local machine.
Note: If you install an instance of vCenter Server as a local system account on a local SQL Server database with Integrated Windows NT Authentication, and you add an Integrated Windows NT Authentication user to the local database server with the same default database as vCenter Server, vCenter Server might not start. See vCenter Server Fails to Start When Installed as a Local System Account on a Local SQL Server Database with Integrated Windows NT Authentication.
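A quick way to confirm the account actually in use is to list the VMware services and their logon accounts. The sketch below is a hedged example only: it assumes it is run on the Windows vCenter Server machine with Python installed, and it relies on the wmic utility that ships with Windows Server releases of this era (wmic is deprecated on current Windows versions).

```python
# List VMware services and the Windows accounts they run as, so the vCenter Server
# service and the VMware VirtualCenter Management Webservices service can be checked
# against the intended DomainName\Username (and the SQL Server DSN user).
# Assumes Windows with the wmic utility available (deprecated on newer releases).
import subprocess

def vmware_service_accounts() -> str:
    """Return a text table of VMware service names, display names, and logon accounts."""
    cmd = [
        "wmic", "service",
        "where", "DisplayName like 'VMware%'",
        "get", "Name,DisplayName,StartName",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(vmware_service_accounts())
```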
3 HA Clusters

3.1 Requirements for a vSphere HA Cluster
Review this list before setting up a vSphere HA cluster. For more information, follow the appropriate cross reference or see Creating a vSphere HA Cluster.
All hosts must be licensed for vSphere HA.
You need at least two hosts in the cluster.
All hosts need to be configured with static IP addresses. If you are using DHCP, you must ensure that the address for each host persists across reboots.
There should be at least one management network in common among all hosts, and best practice is to have at least two. Management networks differ depending on the version of host you are using.
o ESX hosts - service console network.
o ESXi hosts earlier than version 4.0 - VMkernel network.
o ESXi hosts version 4.0 and later - VMkernel network with the Management traffic checkbox enabled.
See Best Practices for Networking.
To ensure that any virtual machine can run on any host in the cluster, all hosts should have access to the same virtual machine networks and datastores. Similarly, virtual machines must be located on shared, not local, storage; otherwise they cannot be failed over in the case of a host failure.
Note: vSphere HA uses datastore heartbeating to distinguish between partitioned, isolated, and failed hosts. Accordingly, if there are some datastores that are more reliable in your environment, configure vSphere HA to give preference to them.
For VM Monitoring to work, VMware Tools must be installed. See VM and Application Monitoring.
vSphere HA supports both IPv4 and IPv6. A cluster that mixes the use of both of these protocol versions, however, is more likely to result in a network partition.

3.2 EVC Requirements for Hosts
To improve CPU compatibility between hosts that have varying CPU feature sets, you can hide some host CPU features from the virtual machine by placing the host in an Enhanced vMotion Compatibility (EVC) cluster. Hosts in an EVC cluster and hosts that you add to an existing EVC cluster must meet EVC requirements. Power off all virtual machines in the cluster that are running on hosts with a feature set greater than the EVC mode that you intend to enable, or migrate them out of the cluster. All hosts in the cluster must meet the following requirements.
Supported ESX/ESXi version: ESX/ESXi 3.5 Update 2 or later.
vCenter Server: The host must be connected to a vCenter Server system.
CPUs: A single vendor, either AMD or Intel.
Advanced CPU features enabled: Enable these CPU features in the BIOS if they are available: hardware virtualization support (AMD-V or Intel VT), AMD No eXecute (NX), and Intel eXecute Disable (XD). Note: Hardware vendors sometimes disable particular CPU features in the BIOS by default. This can cause problems in enabling EVC, because the EVC compatibility checks detect the absence of features that are expected to be present for a particular CPU. If you cannot enable EVC on a system with a compatible processor, ensure that all features are enabled in the BIOS.
Supported CPUs for the EVC mode that you want to enable: To check EVC support for a specific processor or server model, see the VMware Compatibility Guide at http://www.vmware.com/resources/compatibility/search.php.
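The vendor and version rows of these requirements can be checked from vCenter before you try to enable EVC. The following sketch is illustrative only: it assumes the pyVmomi bindings (not something the document prescribes), placeholder connection details, and a hypothetical cluster name, and it does not verify the BIOS-level features (AMD-V/Intel VT, NX/XD), which still need to be confirmed in the BIOS.

```python
# EVC pre-check: report each host's product version and CPU vendor(s) in one cluster
# and warn if vendors are mixed. Connection details and cluster name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VC_HOST, VC_USER, VC_PASS = "vcenter01.example.local", "administrator", "changeme"
CLUSTER_NAME = "Cluster01"  # hypothetical cluster

def report_cluster_cpus(content, cluster_name: str) -> None:
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    try:
        cluster = next(c for c in view.view if c.name == cluster_name)
    finally:
        view.DestroyView()
    vendors = set()
    for host in cluster.host:
        host_vendors = {pkg.vendor for pkg in host.hardware.cpuPkg}
        vendors |= host_vendors
        print(f"{host.name}: {host.summary.config.product.fullName}, "
              f"CPU vendor(s): {sorted(host_vendors)}")
    if len(vendors) > 1:
        print(f"WARNING: mixed CPU vendors {sorted(vendors)}; EVC requires a single vendor.")

if __name__ == "__main__":
    si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS,
                      sslContext=ssl._create_unverified_context())
    try:
        report_cluster_cpus(si.RetrieveContent(), CLUSTER_NAME)
    finally:
        Disconnect(si)
```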
3.3 Best Practices for vSphere HA Clusters
To ensure optimal vSphere HA cluster performance, you should follow certain best practices. This topic highlights some of the key best practices for a vSphere HA cluster. You can also refer to the vSphere High Availability Deployment Best Practices publication for further discussion.

3.3.1 Setting Alarms to Monitor Cluster Changes
When vSphere HA or Fault Tolerance takes action to maintain availability, for example, a virtual machine failover, you can be notified about such changes. Configure alarms in vCenter Server to be triggered when these actions occur, and have alerts, such as emails, sent to a specified set of administrators. Several default vSphere HA alarms are available:
Insufficient failover resources (a cluster alarm)
Cannot find master (a cluster alarm)
Failover in progress (a cluster alarm)
Host HA status (a host alarm)
VM monitoring error (a virtual machine alarm)
VM monitoring action (a virtual machine alarm)
Failover failed (a virtual machine alarm)
Note: The default alarms include the feature name, vSphere HA.

3.3.2 Monitoring Cluster Validity
A valid cluster is one in which the admission control policy has not been violated. A cluster enabled for vSphere HA becomes invalid when the number of virtual machines powered on exceeds the failover requirements, that is, the current failover capacity is smaller than the configured failover capacity. If admission control is disabled, clusters do not become invalid. In the vSphere Web Client, select vSphere HA from the cluster's Monitor tab and then select Configuration Issues. A list of current vSphere HA issues appears. In the vSphere Client, the cluster's Summary tab displays a list of configuration issues for clusters. The list explains what has caused the cluster to become invalid or overcommitted. DRS behavior is not affected if a cluster is red because of a vSphere HA issue.

3.3.3 vSphere HA and Storage vMotion Interoperability in a Mixed Cluster
In clusters where ESXi 5.x hosts and ESX/ESXi 4.1 or prior hosts are present and where Storage vMotion is used extensively or Storage DRS is enabled, do not deploy vSphere HA. vSphere HA might respond to a host failure by restarting a virtual machine on a host with an ESXi version different from the one on which the virtual machine was running before the failure. A problem can occur if, at the time of failure, the virtual machine was involved in a Storage vMotion action on an ESXi 5.x host, and vSphere HA restarts the virtual machine on a host with a version prior to ESXi 5.0. While the virtual machine might power on, any subsequent attempts at snapshot operations could corrupt the vdisk state and leave the virtual machine unusable.

3.3.4 Admission Control Best Practices
The following recommendations are best practices for vSphere HA admission control. Select the Percentage of Cluster Resources Reserved admission control policy. This policy offers the most flexibility in terms of host and virtual machine sizing. When configuring this policy, choose a percentage for CPU and memory that reflects the number of host failures you want to support.
For example, if you want vSphere HA to set aside resources for two host failures and have ten hosts of equal capacity in the cluster, then specify 20% (2/10). Ensure that you size all cluster hosts equally. For the Host Failures Cluster Tolerates policy, an unbalanced cluster results in excess capacity being reserved to handle failures because vSphere HA reserves capacity for the largest hosts. For the Percentage of Cluster Resources Policy, an unbalanced cluster requires that you specify larger percentages than would otherwise be necessary to reserve enough capacity for the anticipated number of host failures. If you plan to use the Host Failures Cluster Tolerates policy, try to keep virtual machine sizing requirements similar across all configured virtual machines. This policy uses slot sizes to calculate the amount of capacity needed to reserve for each virtual machine. The slot size is based on the largest reserved memory and CPU needed for any virtual machine. When you mix virtual machines of different CPU and memory requirements, the slot size calculation defaults to the largest possible, which limits consolidation. If you plan to use the Specify Failover Hosts policy, decide how many host failures to support and then specify this number of hosts as failover hosts. If the cluster is unbalanced, the designated failover hosts should be at least the same size as the nonfailover hosts in your cluster. This ensures that there is adequate capacity in case of failure. 3.3.5 Using Auto Deploy with vSphere HA You can use vSphere HA and Auto Deploy together to improve the availability of your virtual machines. Auto Deploy provisions hosts when they power up and you can also configure it to install the vSphere HA agent on such hosts during the boot process. See the Auto Deploy documentation included in vSphere Installation and Setup for details. 3.3.6 Best Practices for Networking Observe the following best practices for the configuration of host NICs and network topology for vSphere HA. Best Practices include recommendations for your ESXi hosts, and for cabling, switches, routers, and firewalls. 3.3.7 Network Configuration and Maintenance The following network maintenance suggestions can help you avoid the accidental detection of failed hosts and network isolation because of dropped vSphere HA heartbeats. When making changes to the networks that your clustered ESXi hosts are on, suspend the Host Monitoring feature. Changing your network hardware or networking settings can interrupt the heartbeats that vSphere HA uses to detect host failures, and this might result in unwanted attempts to fail over virtual machines. When you change the networking configuration on the ESXi hosts themselves, for example, adding port groups, or removing vSwitches, suspend Host Monitoring. After you have made the networking configuration changes, you must reconfigure vSphere HA on all hosts in the cluster, which causes the network information to be reinspected. Then re-enable Host Monitoring. Note Because networking is a vital component of vSphere HA, if network maintenance needs to be performed inform the vSphere HA administrator. 3.3.8 Networks Used for vSphere HA Communications To identify which network operations might disrupt the functioning of vSphere HA, you should know which management networks are being used for heart beating and other vSphere HA communications. On legacy ESX hosts in the cluster, vSphere HA communications travel over all networks that are designated as service console networks. 
VMkernel networks are not used by these hosts for vSphere HA communications. On ESXi hosts in the cluster, vSphere HA communications, by default, travel over VMkernel networks, except those marked for use with vMotion. If there is only one VMkernel network, vSphere HA shares it with vMotion, if necessary. With ESXi 4.x and ESXi, you must also explicitly enable the Management traffic checkbox for vSphere HA to use this network. Note To keep vSphere HA agent traffic on the networks you have specified, configure hosts so vmkNICs used by vSphere HA do not share subnets with vmkNICs used for other purposes. vSphere HA agents send packets using any pNIC that is associated with a given subnet if there is also at least one vmkNIC configured for vSphere HA management traffic. Consequently, to ensure network flow separation, the vmkNICs used by vSphere HA and by other features must be on different subnets. 3.3.9 Network Isolation Addresses A network isolation address is an IP address that is pinged to determine whether a host is isolated from the network. This address is pinged only when a host has stopped receiving heartbeats from all other hosts in the cluster. If a host can ping its network isolation address, the host is not network isolated, and the other hosts in the cluster have either failed or are network partitioned. However, if the host cannot ping its isolation address, it is likely that the host has become isolated from the network and no failover action is taken. By default, the network isolation address is the default gateway for the host. Only one default gateway is specified, regardless of how many management networks have been defined. You should use the das.isolationaddress[...] advanced attribute to add isolation addresses for additional networks. See vSphere HA Advanced Attributes. 3.3.10 Network Path Redundancy Network path redundancy between cluster nodes is important for vSphere HA reliability. A single management network ends up being a single point of failure and can result in failovers although only the network has failed. If you have only one management network, any failure between the host and the cluster can cause an unnecessary (or false) failover activity if heartbeat datastore connectivity is not retained during the networking failure. Possible failures include NIC failures, network cable failures, network cable removal, and switch resets. Consider these possible sources of failure between hosts and try to minimize them, typically by providing network redundancy. You can implement network redundancy at the NIC level with NIC teaming, or at the management network level. In most implementations, NIC teaming provides sufficient redundancy, but you can use or add management network redundancy if required. Redundant management networking allows the reliable detection of failures and prevents isolation or partition conditions from occurring, because heartbeats can be sent over multiple networks. Configure the fewest possible number of hardware segments between the servers in a cluster. The goal being to limit single points of failure. Additionally, routes with too many hops can cause networking packet delays for heartbeats, and increase the possible points of failure. 3.3.11 Network Redundancy Using NIC Teaming Using a team of two NICs connected to separate physical switches improves the reliability of a management network. 
Because servers connected through two NICs (and through separate switches) have two independent paths for sending and receiving heartbeats, the cluster is more resilient. To configure a NIC team for the management network, configure the vNICs in vSwitch configuration for Active or Standby configuration. The recommended parameter settings for the vNICs are:
Default load balancing = route based on originating port ID
Failback = No
After you have added a NIC to a host in your vSphere HA cluster, you must reconfigure vSphere HA on that host.

3.3.12 Network Redundancy Using a Second Network
As an alternative to NIC teaming for providing redundancy for heartbeats, you can create a second management network connection, which is attached to a separate virtual switch. The original management network connection is used for network and management purposes. When the second management network connection is created, vSphere HA sends heartbeats over both management network connections. If one path fails, vSphere HA still sends and receives heartbeats over the other path.

vSphere HA Advanced Attributes
das.isolationaddress[...]: Sets the address to ping to determine if a host is isolated from the network. This address is pinged only when heartbeats are not received from any other host in the cluster. If not specified, the default gateway of the management network is used. This default gateway has to be a reliable address that is available, so that the host can determine if it is isolated from the network. You can specify multiple isolation addresses (up to 10) for the cluster: das.isolationaddressX, where X = 0-9. Typically you should specify one per management network. Specifying too many addresses makes isolation detection take too long.
das.usedefaultisolationaddress: By default, vSphere HA uses the default gateway of the console network as an isolation address. This attribute specifies whether or not this default is used (true|false).
das.isolationshutdowntimeout: The period of time the system waits for a virtual machine to shut down before powering it off. This only applies if the host's isolation response is Shut down VM. Default value is 300 seconds.
das.slotmeminmb: Defines the maximum bound on the memory slot size. If this option is used, the slot size is the smaller of this value or the maximum memory reservation plus memory overhead of any powered-on virtual machine in the cluster.
das.slotcpuinmhz: Defines the maximum bound on the CPU slot size. If this option is used, the slot size is the smaller of this value or the maximum CPU reservation of any powered-on virtual machine in the cluster.
das.vmmemoryminmb: Defines the default memory resource value assigned to a virtual machine if its memory reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 0 MB.
das.vmcpuminmhz: Defines the default CPU resource value assigned to a virtual machine if its CPU reservation is not specified or zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default is 32 MHz.
das.iostatsinterval: Changes the default I/O stats interval for VM Monitoring sensitivity. The default is 120 (seconds). Can be set to any value greater than, or equal to, 0. Setting to 0 disables the check.
das.ignoreinsufficienthbdatastore: Disables configuration issues created if the host does not have sufficient heartbeat datastores for vSphere HA. Default value is false.
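These attributes are normally set in the cluster's vSphere HA advanced options. For completeness, the sketch below shows one way to set two of them programmatically with the pyVmomi bindings; treat it as an illustration under stated assumptions (pyVmomi installed, placeholder vCenter, cluster name, and isolation address), not as a prescribed method.

```python
# Illustrative use of two advanced attributes from the table: add an extra isolation
# address and stop vSphere HA from using the default gateway. Placeholders throughout.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VC_HOST, VC_USER, VC_PASS = "vcenter01.example.local", "administrator", "changeme"
CLUSTER_NAME = "Cluster01"         # hypothetical cluster
ISOLATION_ADDRESS = "192.0.2.254"  # example address on a second management network

def set_ha_advanced_options(cluster) -> None:
    das = vim.cluster.DasConfigInfo()
    das.option = [
        vim.option.OptionValue(key="das.isolationaddress0", value=ISOLATION_ADDRESS),
        vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
    ]
    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    # modify=True merges these settings into the existing cluster configuration.
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

if __name__ == "__main__":
    si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS,
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == CLUSTER_NAME)
        view.DestroyView()
        set_ha_advanced_options(cluster)
    finally:
        Disconnect(si)
```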
3.3.13 Monitoring Cluster Validity
A valid cluster is one in which the admission control policy has not been violated. A cluster enabled for vSphere HA becomes invalid when the number of virtual machines powered on exceeds the failover requirements, that is, the current failover capacity is smaller than the configured failover capacity. If admission control is disabled, clusters do not become invalid. In the vSphere Web Client, select vSphere HA from the cluster's Monitor tab and then select Configuration Issues. A list of current vSphere HA issues appears. In the vSphere Client, the cluster's Summary tab displays a list of configuration issues for clusters. The list explains what has caused the cluster to become invalid or overcommitted. DRS behavior is not affected if a cluster is red because of a vSphere HA issue.

3.3.14 vSphere HA and Storage vMotion Interoperability in a Mixed Cluster
In clusters where ESXi 5.x hosts and ESX/ESXi 4.1 or prior hosts are present and where Storage vMotion is used extensively or Storage DRS is enabled, do not deploy vSphere HA. vSphere HA might respond to a host failure by restarting a virtual machine on a host with an ESXi version different from the one on which the virtual machine was running before the failure. A problem can occur if, at the time of failure, the virtual machine was involved in a Storage vMotion action on an ESXi 5.x host, and vSphere HA restarts the virtual machine on a host with a version prior to ESXi 5.0. While the virtual machine might power on, any subsequent attempts at snapshot operations could corrupt the vdisk state and leave the virtual machine unusable.

3.3.15 Admission Control Best Practices
The following recommendations are best practices for vSphere HA admission control. Select the Percentage of Cluster Resources Reserved admission control policy. This policy offers the most flexibility in terms of host and virtual machine sizing. When configuring this policy, choose a percentage for CPU and memory that reflects the number of host failures you want to support. For example, if you want vSphere HA to set aside resources for two host failures and have ten hosts of equal capacity in the cluster, then specify 20% (2/10). Ensure that you size all cluster hosts equally. For the Host Failures Cluster Tolerates policy, an unbalanced cluster results in excess capacity being reserved to handle failures because vSphere HA reserves capacity for the largest hosts. For the Percentage of Cluster Resources Policy, an unbalanced cluster requires that you specify larger percentages than would otherwise be necessary to reserve enough capacity for the anticipated number of host failures. If you plan to use the Host Failures Cluster Tolerates policy, try to keep virtual machine sizing requirements similar across all configured virtual machines. This policy uses slot sizes to calculate the amount of capacity needed to reserve for each virtual machine. The slot size is based on the largest reserved memory and CPU needed for any virtual machine. When you mix virtual machines of different CPU and memory requirements, the slot size calculation defaults to the largest possible, which limits consolidation. If you plan to use the Specify Failover Hosts policy, decide how many host failures to support and then specify this number of hosts as failover hosts. If the cluster is unbalanced, the designated failover hosts should be at least the same size as the nonfailover hosts in your cluster. This ensures that there is adequate capacity in case of failure.
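The two sizing rules above reduce to simple arithmetic. The following worked example is illustrative only; real slot sizes also include memory overhead and honor the das.slotmeminmb and das.slotcpuinmhz caps described in the attribute table earlier.

```python
# Worked example of the admission control sizing rules; the numbers are illustrative.

def percentage_to_reserve(host_failures: int, hosts: int) -> float:
    """Percentage of Cluster Resources policy: reserve failures/hosts of capacity."""
    return 100.0 * host_failures / hosts

def slots_per_host(host_mhz: int, host_mb: int, slot_mhz: int, slot_mb: int) -> int:
    """Host Failures Cluster Tolerates policy: slots are limited by the scarcer resource."""
    return min(host_mhz // slot_mhz, host_mb // slot_mb)

# Ten equal hosts, two tolerated failures: reserve 20% of CPU and memory (2/10).
print(percentage_to_reserve(host_failures=2, hosts=10))   # 20.0

# The slot size follows the largest reservation. One VM with an 8 GB memory
# reservation shrinks every host's slot count compared with a 2 GB slot.
print(slots_per_host(host_mhz=20000, host_mb=65536, slot_mhz=500, slot_mb=2048))  # 32
print(slots_per_host(host_mhz=20000, host_mb=65536, slot_mhz=500, slot_mb=8192))  # 8
```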
3.3.16 vSphere HA Security
vSphere HA is enhanced by several security features.
Select firewall ports opened: vSphere HA uses TCP and UDP port 8182 for agent-to-agent communication. The firewall ports open and close automatically to ensure they are open only when needed.
Configuration files protected using file system permissions: vSphere HA stores configuration information on the local storage or on ramdisk if there is no local datastore. These files are protected using file system permissions and they are accessible only to the root user. Hosts without local storage are only supported if they are managed by Auto Deploy.
Detailed logging: The location where vSphere HA places log files depends on the version of host. For ESXi 5.x hosts, vSphere HA writes to syslog only by default, so logs are placed where syslog is configured to put them. The log file names for vSphere HA are prepended with fdm, fault domain manager, which is a service of vSphere HA. For legacy ESXi 4.x hosts, vSphere HA writes to /var/log/vmware/fdm on local disk, as well as syslog if it is configured. For legacy ESX 4.x hosts, vSphere HA writes to /var/log/vmware/fdm.
Secure vSphere HA logins: vSphere HA logs onto the vSphere HA agents using a user account, vpxuser, created by vCenter Server. This account is the same account used by vCenter Server to manage the host. vCenter Server creates a random password for this account and changes the password periodically. The time period is set by the vCenter Server VirtualCenter.VimPasswordExpirationInDays setting. Users with administrative privileges on the root folder of the host can log in to the agent.
Secure communication: All communication between vCenter Server and the vSphere HA agent is done over SSL. Agent-to-agent communication also uses SSL except for election messages, which occur over UDP. Election messages are verified over SSL so that a rogue agent can prevent only the host on which the agent is running from being elected as a master host. In this case, a configuration issue for the cluster is issued so the user is aware of the problem.
Host SSL certificate verification required: vSphere HA requires that each host have a verified SSL certificate. Each host generates a self-signed certificate when it is booted for the first time. This certificate can then be regenerated or replaced with one issued by an authority. If the certificate is replaced, vSphere HA needs to be reconfigured on the host. If a host becomes disconnected from vCenter Server after its certificate is updated and the ESXi or ESX Host agent is restarted, then vSphere HA is automatically reconfigured when the host is reconnected to vCenter Server. If the disconnection does not occur because vCenter Server host SSL certificate verification is disabled at the time, verify the new certificate and reconfigure vSphere HA on the host.

3.4 Best Practices for Networking
Observe the following best practices for the configuration of host NICs and network topology for vSphere HA. Best practices include recommendations for your ESXi hosts, and for cabling, switches, routers, and firewalls.

3.4.1 Network Configuration and Maintenance
The following network maintenance suggestions can help you avoid the accidental detection of failed hosts and network isolation because of dropped vSphere HA heartbeats. When making changes to the networks that your clustered ESXi hosts are on, suspend the Host Monitoring feature. Changing your network hardware or networking settings can interrupt the heartbeats that vSphere HA uses to detect host failures, and this might result in unwanted attempts to fail over virtual machines. When you change the networking configuration on the ESXi hosts themselves, for example, adding port groups, or removing vSwitches, suspend Host Monitoring. After you have made the networking configuration changes, you must reconfigure vSphere HA on all hosts in the cluster, which causes the network information to be reinspected. Then re-enable Host Monitoring.
Note: Because networking is a vital component of vSphere HA, if network maintenance needs to be performed, inform the vSphere HA administrator.

3.4.2 Networks Used for vSphere HA Communications
To identify which network operations might disrupt the functioning of vSphere HA, you should know which management networks are being used for heartbeating and other vSphere HA communications. On legacy ESX hosts in the cluster, vSphere HA communications travel over all networks that are designated as service console networks. VMkernel networks are not used by these hosts for vSphere HA communications. On ESXi hosts in the cluster, vSphere HA communications, by default, travel over VMkernel networks, except those marked for use with vMotion. If there is only one VMkernel network, vSphere HA shares it with vMotion, if necessary. With ESXi 4.x and ESXi, you must also explicitly enable the Management traffic checkbox for vSphere HA to use this network.
Note: To keep vSphere HA agent traffic on the networks you have specified, configure hosts so vmkNICs used by vSphere HA do not share subnets with vmkNICs used for other purposes. vSphere HA agents send packets using any pNIC that is associated with a given subnet if there is also at least one vmkNIC configured for vSphere HA management traffic. Consequently, to ensure network flow separation, the vmkNICs used by vSphere HA and by other features must be on different subnets.

3.4.3 Network Isolation Addresses
A network isolation address is an IP address that is pinged to determine whether a host is isolated from the network. This address is pinged only when a host has stopped receiving heartbeats from all other hosts in the cluster. If a host can ping its network isolation address, the host is not network isolated, and the other hosts in the cluster have either failed or are network partitioned. However, if the host cannot ping its isolation address, it is likely that the host has become isolated from the network and no failover action is taken. By default, the network isolation address is the default gateway for the host. Only one default gateway is specified, regardless of how many management networks have been defined. You should use the das.isolationaddress[...] advanced attribute to add isolation addresses for additional networks. See vSphere HA Advanced Attributes.

3.4.4 Network Path Redundancy
Network path redundancy between cluster nodes is important for vSphere HA reliability. A single management network ends up being a single point of failure and can result in failovers although only the network has failed. If you have only one management network, any failure between the host and the cluster can cause an unnecessary (or false) failover activity if heartbeat datastore connectivity is not retained during the networking failure. Possible failures include NIC failures, network cable failures, network cable removal, and switch resets.
Consider these possible sources of failure between hosts and try to minimize them, typically by providing network redundancy. You can implement network redundancy at the NIC level with NIC teaming, or at the management network level. In most implementations, NIC teaming provides sufficient redundancy, but you can use or add management network redundancy if required. Redundant management networking allows the reliable detection of failures and prevents isolation or partition conditions from occurring, because heartbeats can be sent over multiple networks. Configure the fewest possible number of hardware segments between the servers in a cluster. The goal being to limit single points of failure. Additionally, routes with too many hops can cause networking packet delays for heartbeats, and increase the possible points of failure. 3.4.5 Network Redundancy Using NIC Teaming Using a team of two NICs connected to separate physical switches improves the reliability of a management network. Because servers connected through two NICs (and through separate switches) have two independent paths for sending and receiving heartbeats, the cluster is more resilient. To configure a NIC team for the management network, configure the vNICs in vSwitch configuration for Active or Standby configuration. The recommended parameter settings for the vNICs are: Default load balancing = route based on originating port ID Failback = No After you have added a NIC to a host in your vSphere HA cluster, you must reconfigure vSphere HA on that host. 3.4.6 Network Redundancy Using a Second Network As an alternative to NIC teaming for providing redundancy for heartbeats, you can create a second management network connection, which is attached to a separate virtual switch. The original management network connection is used for network and management purposes. When the second management network connection is created, vSphere HA sends heartbeats over both management network connections. If one path fails, vSphere HA still sends and receives heartbeats over the other path. 3.5 Fault Tolerance 3.5.1 Fault Tolerance Checklist The following checklist contains cluster, host, and virtual machine requirements that you need to be aware of before using vSphere Fault Tolerance. Review this list before setting up Fault Tolerance. You can also use the VMware SiteSurvey utility (download at http://www.vmware.com/download/shared_utilities.html) to better understand the configuration issues associated with the cluster, host, and virtual machines being used for vSphere FT. Note The failover of fault tolerant virtual machines is independent of vCenter Server, but you must use vCenter Server to set up your Fault Tolerance clusters. 3.5.2 Cluster Requirements for Fault Tolerance You must meet the following cluster requirements before you use Fault Tolerance. At least two FT-certified hosts running the same Fault Tolerance version or host build number. The Fault Tolerance version number appears on a host's Summary tab in the vSphere Web Client or vSphere Client. Note For legacy hosts prior to ESX/ESXi 4.1, this tab lists the host build number instead. Patches can cause host build numbers to vary between ESX and ESXi installations. To ensure that your legacy hosts are FT compatible, do not mix legacy ESX and ESXi hosts in an FT pair. ESXi hosts have access to the same virtual machine datastores and networks. See Best Practices for Fault Tolerance. Fault Tolerance logging and VMotion networking configured. 
See Configure Networking for Host Machines in the vSphere Client or Configure Networking for Host Machines in the vSphere Web Client. vSphere HA cluster created and enabled. See Creating a vSphere HA Cluster. vSphere HA must be enabled before you can power on fault tolerant virtual machines or add a host to a cluster that already supports fault tolerant virtual machines.

3.5.3 Host Requirements for Fault Tolerance
You must meet the following host requirements before you use Fault Tolerance.
Hosts must have processors from the FT-compatible processor group. It is also highly recommended that the hosts' processors are compatible with one another. See the VMware knowledge base article at http://kb.vmware.com/kb/1008027 for information on supported processors.
Hosts must be licensed for Fault Tolerance.
Hosts must be certified for Fault Tolerance. See http://www.vmware.com/resources/compatibility/search.php and select Search by Fault Tolerant Compatible Sets to determine if your hosts are certified.
The configuration for each host must have Hardware Virtualization (HV) enabled in the BIOS.
To confirm the compatibility of the hosts in the cluster to support Fault Tolerance, you can also run profile compliance checks as described in Create Cluster and Check Compliance in the vSphere Client or Create Cluster and Check Compliance in the vSphere Web Client.

3.5.4 Virtual Machine Requirements for Fault Tolerance
You must meet the following virtual machine requirements before you use Fault Tolerance.
No unsupported devices attached to the virtual machine. See Fault Tolerance Interoperability.
Virtual machines must be stored in virtual RDM or virtual machine disk (VMDK) files that are thick provisioned. If a virtual machine is stored in a VMDK file that is thin provisioned and an attempt is made to enable Fault Tolerance, a message appears indicating that the VMDK file must be converted. To perform the conversion, you must power off the virtual machine.
Incompatible features must not be running with the fault tolerant virtual machines. See Fault Tolerance Interoperability.
Virtual machine files must be stored on shared storage. Acceptable shared storage solutions include Fibre Channel, (hardware and software) iSCSI, NFS, and NAS.
Only virtual machines with a single vCPU are compatible with Fault Tolerance.
Virtual machines must be running on one of the supported guest operating systems. See the VMware knowledge base article at http://kb.vmware.com/kb/1008027, which also covers the supported guest operating systems.
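Several of these requirements (single vCPU, no snapshots, thick-provisioned disks) can be pre-checked before you attempt to turn on Fault Tolerance. The sketch below is a hedged example using the pyVmomi bindings, with placeholder connection details and virtual machine name; it does not replace the validation that vCenter Server itself performs.

```python
# Pre-check a VM against three of the requirements above: single vCPU, no snapshots,
# and no thin-provisioned VMDKs. Placeholders throughout; vCenter performs its own
# checks when Fault Tolerance is actually turned on.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VC_HOST, VC_USER, VC_PASS = "vcenter01.example.local", "administrator", "changeme"
VM_NAME = "app01"  # hypothetical virtual machine

def ft_precheck(vm) -> list:
    problems = []
    if vm.config.hardware.numCPU != 1:
        problems.append(f"{vm.config.hardware.numCPU} vCPUs (FT requires a single vCPU)")
    if vm.snapshot is not None:
        problems.append("existing snapshots must be removed or committed")
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualDisk) and getattr(dev.backing, "thinProvisioned", False):
            problems.append(f"disk '{dev.deviceInfo.label}' is thin provisioned")
    return problems

if __name__ == "__main__":
    si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS,
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == VM_NAME)
        view.DestroyView()
        issues = ft_precheck(vm)
        print(f"{VM_NAME}: " + ("; ".join(issues) if issues else "no obvious blockers found"))
    finally:
        Disconnect(si)
```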
3.5.5 Fault Tolerance Interoperability
Before configuring vSphere Fault Tolerance, you should be aware of the features and products Fault Tolerance cannot interoperate with.

3.5.5.1 vSphere Features Not Supported with Fault Tolerance
The following vSphere features are not supported for fault tolerant virtual machines.
Snapshots. Snapshots must be removed or committed before Fault Tolerance can be enabled on a virtual machine. In addition, it is not possible to take snapshots of virtual machines on which Fault Tolerance is enabled.
Storage vMotion. You cannot invoke Storage vMotion for virtual machines with Fault Tolerance turned on. To migrate the storage, you should temporarily turn off Fault Tolerance, and perform the storage vMotion action. When this is complete, you can turn Fault Tolerance back on.
Linked clones. You cannot enable Fault Tolerance on a virtual machine that is a linked clone, nor can you create a linked clone from an FT-enabled virtual machine.
Virtual Machine Backups. You cannot back up an FT-enabled virtual machine using Storage API for Data Protection, vSphere Data Protection, or similar backup products that require the use of a virtual machine snapshot, as performed by ESXi. To back up a fault tolerant virtual machine in this manner, you must first disable FT, then reenable FT after performing the backup. Storage array-based snapshots do not affect FT.

3.5.5.2 Features and Devices Incompatible with Fault Tolerance
For a virtual machine to be compatible with Fault Tolerance, the virtual machine must not use the following features or devices.
Features and Devices Incompatible with Fault Tolerance and Corrective Actions:
Symmetric multiprocessor (SMP) virtual machines. Only virtual machines with a single vCPU are compatible with Fault Tolerance. Corrective action: Reconfigure the virtual machine as a single vCPU. Many workloads have good performance configured as a single vCPU.
Physical Raw Disk mapping (RDM). Corrective action: Reconfigure virtual machines with physical RDM-backed virtual devices to use virtual RDMs instead.
CD-ROM or floppy virtual devices backed by a physical or remote device. Corrective action: Remove the CD-ROM or floppy virtual device or reconfigure the backing with an ISO installed on shared storage.
Paravirtualized guests. Corrective action: If paravirtualization is not required, reconfigure the virtual machine without a VMI ROM.
USB and sound devices. Corrective action: Remove these devices from the virtual machine.
N_Port ID Virtualization (NPIV). Corrective action: Disable the NPIV configuration of the virtual machine.
NIC passthrough. Corrective action: This feature is not supported by Fault Tolerance so it must be turned off.
vlance networking drivers. Corrective action: Fault Tolerance does not support virtual machines that are configured with vlance virtual NIC cards. However, vmxnet2, vmxnet3, and e1000 are fully supported.
Virtual disks backed with thin-provisioned storage or thick-provisioned disks that do not have clustering features enabled. Corrective action: When you turn on Fault Tolerance, the conversion to the appropriate disk format is performed by default. You must power off the virtual machine to trigger this conversion.
Hot-plugging devices. Corrective action: The hot plug feature is automatically disabled for fault tolerant virtual machines. To hot plug devices (either adding or removing), you must momentarily turn off Fault Tolerance, perform the hot plug, and then turn on Fault Tolerance. Note: When using Fault Tolerance, changing the settings of a virtual network card while a virtual machine is running is a hot-plug operation, since it requires "unplugging" the network card and then "plugging" it in again. For example, with a virtual network card for a running virtual machine, if you change the network that the virtual NIC is connected to, FT must be turned off first.
Extended Page Tables/Rapid Virtualization Indexing (EPT/RVI). Corrective action: EPT/RVI is automatically disabled for virtual machines with Fault Tolerance turned on.
Serial or parallel ports. Corrective action: Remove these devices from the virtual machine.

3.5.6 Best Practices for Fault Tolerance
To ensure optimal Fault Tolerance results, you should follow certain best practices. In addition to the following information, see the white paper VMware Fault Tolerance Recommendations and Considerations at http://www.vmware.com/resources/techresources/10040.

3.5.6.1 Host Configuration
Consider the following best practices when configuring your hosts.
Hosts running the Primary and Secondary VMs should operate at approximately the same processor frequencies, otherwise the Secondary VM might be restarted more frequently. Platform power management features that do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly. If Secondary VMs are being restarted on a regular basis, disable all power management modes on the hosts running fault tolerant virtual machines or ensure that all hosts are running in the same power management modes. Apply the same instruction set extension configuration (enabled or disabled) to all hosts. The process for enabling or disabling instruction sets varies among BIOSes. See the documentation for your hosts' BIOSes about how to configure instruction sets. 3.5.6.2 Homogeneous Clusters vSphere Fault Tolerance can function in clusters with nonuniform hosts, but it works best in clusters with compatible nodes. When constructing your cluster, all hosts should have the following configuration: Processors from the same compatible processor group. Common access to datastores used by the virtual machines. The same virtual machine network configuration. The same ESXi version. The same Fault Tolerance version number (or host build number for hosts prior to ESX/ESXi 4.1). The same BIOS settings (power management and hyperthreading) for all hosts. Run Check Compliance to identify incompatibilities and to correct them. 3.5.6.3 Performance To increase the bandwidth available for the logging traffic between Primary and Secondary VMs use a 10Gbit NIC, and enable the use of jumbo frames. 3.5.6.4 Store ISOs on Shared Storage for Continuous Access Store ISOs that are accessed by virtual machines with Fault Tolerance enabled on shared storage that is accessible to both instances of the fault tolerant virtual machine. If you use this configuration, the CD-ROM in the virtual machine continues operating normally, even when a failover occurs. For virtual machines with Fault Tolerance enabled, you might use ISO images that are accessible only to the Primary VM. In such a case, the Primary VM can access the ISO, but if a failover occurs, the CD-ROM reports errors as if there is no media. This situation might be acceptable if the CD-ROM is being used for a temporary, noncritical operation such as an installation. 3.5.6.5 Avoid Network Partitions A network partition occurs when a vSphere HA cluster has a management network failure that isolates some of the hosts from vCenter Server and from one another. See Network Partitions. When a partition occurs, Fault Tolerance protection might be degraded. In a partitioned vSphere HA cluster using Fault Tolerance, the Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the master host responsible for it. To ensure that your management network is less likely to have a failure that leads to a network partition, follow the recommendations in Best Practices for Networking. 3.5.6.6 Viewing Fault Tolerance Errors in the vSphere Client When errors related to your implementation of Fault Tolerance are generated by vCenter Server, the Fault Details screen appears. 
This screen lists faults related to Fault Tolerance and for each fault it provides the type of fault (red is an error, yellow is a warning), the name of the virtual machine or host involved, and a brief description. You can also invoke this screen for a specific failed Fault Tolerance task. To do this, select the task in either the Recent Tasks pane or the Tasks & Events tab for the entity that experienced the fault and click the View details link that appears in the Details column.

3.5.6.7 Viewing Fault Tolerance Errors in the vSphere Web Client
When tasks related to your implementation of Fault Tolerance cause errors, you can view information about them in the Recent Tasks pane. The Recent Tasks pane lists a summary of each error under the Failed tab. For information about failed tasks, click More Tasks to open the Task Console. In the Task Console, each task is listed with information that includes its Name, Target, and Status. In the Status column, if the task failed, the type of fault it generated is described. For information about a task, select it and details appear in the pane below the task list.

3.5.6.8 Upgrade Hosts Used for Fault Tolerance
When you upgrade hosts that contain fault tolerant virtual machines, ensure that the Primary and Secondary VMs continue to run on hosts with the same FT version number or host build number (for hosts prior to ESX/ESXi 4.1).

3.5.6.9 Prerequisites
Verify that you have cluster administrator privileges. Verify that you have sets of four or more ESXi hosts that are hosting fault tolerant virtual machines that are powered on. If the virtual machines are powered off, the Primary and Secondary VMs can be relocated to hosts with different builds. This upgrade procedure is for a minimum four-node cluster. The same instructions can be followed for a smaller cluster, though the unprotected interval will be slightly longer.
Procedure:
1. Using vMotion, migrate the fault tolerant virtual machines off of two hosts.
2. Upgrade the two evacuated hosts to the same ESXi build.
3. Turn off Fault Tolerance on the Primary VM.
4. Using vMotion, move the disabled Primary VM to one of the upgraded hosts.
5. Turn on Fault Tolerance on the Primary VM that was moved.
6. Repeat Step 1 to Step 5 for as many fault tolerant virtual machine pairs as can be accommodated on the upgraded hosts.
7. Using vMotion, redistribute the fault tolerant virtual machines.
When the procedure is complete, all ESXi hosts in the cluster are upgraded.

3.5.7 vSphere Fault Tolerance Configuration Recommendations
You should observe certain guidelines when configuring Fault Tolerance.
In addition to non-fault tolerant virtual machines, you should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESXi host and virtual machines, all of which can vary.
If you are using NFS to access shared storage, use dedicated NAS hardware with at least a 1Gbit NIC to obtain the network performance required for Fault Tolerance to work properly.
Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. The memory reservation of a fault tolerant virtual machine is set to the virtual machine's memory size when Fault Tolerance is turned on. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
Use a maximum of 16 virtual disks per fault tolerant virtual machine.
To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.
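The first recommendation in this list (no more than four fault tolerant virtual machines per host) is easy to audit. The sketch below is an illustration only, assuming the pyVmomi bindings and placeholder connection details; it counts the virtual machines on each host that have Fault Tolerance configured, which covers both Primary and Secondary VMs.

```python
# Count Fault Tolerance-configured virtual machines per host (Primary and Secondary
# VMs both carry ftInfo) and flag hosts above the recommended limit of four.
# Placeholders throughout.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VC_HOST, VC_USER, VC_PASS = "vcenter01.example.local", "administrator", "changeme"

def report_ft_vm_counts(content) -> None:
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    try:
        for host in view.view:
            ft_count = sum(1 for vm in host.vm if vm.config and vm.config.ftInfo)
            flag = "  <-- above the recommended maximum of four" if ft_count > 4 else ""
            print(f"{host.name}: {ft_count} FT-configured virtual machines{flag}")
    finally:
        view.DestroyView()

if __name__ == "__main__":
    si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS,
                      sslContext=ssl._create_unverified_context())
    try:
        report_ft_vm_counts(si.RetrieveContent())
    finally:
        Disconnect(si)
```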
4 Networking

4.1 vSphere Distributed Switch Health Check
vSphere 5.1 distributed switch health check helps identify and troubleshoot configuration errors in vSphere distributed switches. The following errors are common configuration errors that health check helps identify:
Mismatched VLAN trunks between a vSphere distributed switch and physical switch.
Mismatched MTU settings between physical network adapters, distributed switches, and physical switch ports.
Mismatched virtual switch teaming policies for the physical switch port-channel settings.
Health check monitors the following:
VLAN. Checks whether vSphere distributed switch VLAN settings match trunk port configuration on the adjacent physical switch ports.
MTU. Checks whether the physical access switch port MTU jumbo frame setting based on per VLAN matches the vSphere distributed switch MTU setting.
Teaming policies. Checks whether the physical access switch ports EtherChannel setting matches the distributed switch distributed port group IPHash teaming policy settings.
Health check is limited to only the access switch port to which the distributed switch uplink connects.
Note: For VLAN and MTU checks, you must have at least two link-up physical uplink NICs for the distributed switch. For a teaming policy check, you must have at least two link-up physical uplink NICs and two hosts when applying the policy.

4.2 vDS Port Group settings and parameters
Port binding: Choose when ports are assigned to virtual machines connected to this distributed port group. Static binding: Assign a port to a virtual machine when the virtual machine connects to the distributed port group. This option is not available when the vSphere Web Client is connected directly to ESXi. Dynamic binding: Assign a port to a virtual machine the first time the virtual machine powers on after it is connected to the distributed port group. Dynamic binding is deprecated in ESXi 5.0. Ephemeral: No port binding. This option is not available when the vSphere Web Client is connected directly to ESXi.
Port allocation: Elastic: The default number of ports is eight. When all ports are assigned, a new set of eight ports is created. This is the default. Fixed: The default number of ports is set to eight. No additional ports are created when all ports are assigned.
Number of ports: Enter the number of ports on the distributed port group.
Network resource pool: Use the drop-down menu to assign the new distributed port group to a user-defined network resource pool. If you have not created a network resource pool, this menu is empty.
VLAN: Use the Type drop-down menu to select VLAN options: None: Do not use VLAN. VLAN: In the VLAN ID field, enter a number between 1 and 4094. VLAN Trunking: Enter a VLAN trunk range. Private VLAN: Select a private VLAN entry. If you did not create any private VLANs, this menu is empty.
Advanced: Select this check box to customize the policy configurations for the new distributed port group.
(Optional) In the Security section, edit the security exceptions and click Next.
Promiscuous mode: Reject. Placing a guest adapter in promiscuous mode has no effect on which frames are received by the adapter. Accept. Placing a guest adapter in promiscuous mode causes it to detect all frames passed on the vSphere distributed switch. These frames are allowed under the VLAN policy for the port group to which the adapter is connected.
MAC address changes: Reject. If you set this option to Reject and the guest operating system changes the MAC address of the adapter to anything other than what is in the .vmx configuration file, all inbound frames are dropped. If the Guest OS changes the MAC address back to match the MAC address in the .vmx configuration file, inbound frames are passed again. Accept. Changing the MAC address from the Guest OS has the intended effect: frames to the new MAC address are received.
Forged transmits: Reject. Any outbound frame with a source MAC address that is different from the one currently set on the adapter is dropped. Accept. No filtering is performed and all outbound frames are passed.
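The same three security settings can also be applied to an existing distributed port group programmatically. The following sketch is an illustration under stated assumptions (pyVmomi bindings, placeholder vCenter and port group names) and simply sets all three policies to Reject; adjust the values to match your own policy.

```python
# Apply the three security policies to an existing distributed port group, setting
# each one to Reject. Names are placeholders; adjust the Bool values to your policy.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VC_HOST, VC_USER, VC_PASS = "vcenter01.example.local", "administrator", "changeme"
PORTGROUP_NAME = "dvPG-VM-Network"  # hypothetical distributed port group

def harden_portgroup(pg) -> None:
    security = vim.dvs.VmwareDistributedVirtualSwitch.SecurityPolicy(
        allowPromiscuous=vim.BoolPolicy(value=False),  # Promiscuous mode: Reject
        macChanges=vim.BoolPolicy(value=False),        # MAC address changes: Reject
        forgedTransmits=vim.BoolPolicy(value=False),   # Forged transmits: Reject
    )
    port_policy = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
        securityPolicy=security)
    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        configVersion=pg.config.configVersion, defaultPortConfig=port_policy)
    pg.ReconfigureDVPortgroup_Task(spec=spec)

if __name__ == "__main__":
    si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS,
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
        pg = next(p for p in view.view if p.name == PORTGROUP_NAME)
        view.DestroyView()
        harden_portgroup(pg)
    finally:
        Disconnect(si)
```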
(Optional) In the Traffic shaping section, edit the settings and click Next.
Status: If you enable either Ingress Traffic Shaping or Egress Traffic Shaping, you are setting limits on the amount of networking bandwidth allocated for each virtual adapter associated with this particular port group. If you disable the policy, services have a free, clear connection to the physical network by default.
Average Bandwidth: Establishes the number of bits per second to allow across a port, averaged over time. This is the allowed average load.
Peak Bandwidth: The maximum number of bits per second to allow across a port when it is sending and receiving a burst of traffic. This tops the bandwidth used by a port whenever it is using its burst bonus.
Burst Size: The maximum number of bytes to allow in a burst. If this parameter is set, a port might gain a burst bonus when it does not use all its allocated bandwidth. Whenever the port needs more bandwidth than specified by Average Bandwidth, it might temporarily transmit data at a higher speed if a burst bonus is available. This parameter tops the number of bytes that might be accumulated in the burst bonus and thus transferred at a higher speed.
(Optional) In the Teaming and failover section, edit the settings and click Next.
Load balancing: Specify how to choose an uplink.
Route based on the originating virtual port. Choose an uplink based on the virtual port where the traffic entered the distributed switch.
Route based on IP hash. Choose an uplink based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash.
Route based on source MAC hash. Choose an uplink based on a hash of the source Ethernet.
Route based on physical NIC load. Choose an uplink based on the current loads of physical NICs.
Use explicit failover order. Always use the highest order uplink from the list of Active adapters which passes failover detection criteria.
Note: IP-based teaming requires that the physical switch be configured with EtherChannel. For all other options, disable EtherChannel.
Network failover detection: Specify the method to use for failover detection.
Link Status only. Relies solely on the link status that the network adapter provides. This option detects failures, such as cable pulls and physical switch power failures, but not configuration errors, such as a physical switch port being blocked by spanning tree or that is misconfigured to the wrong VLAN, or cable pulls on the other side of a physical switch.
Sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure. This detects many of the failures previously mentioned that are not detected by link status alone. Note Do not use beacon probing with IP-hash load balancing. Notify switches Select Yes or No to notify switches in the case of failover. If you select Yes, whenever a virtual NIC is connected to the distributed switch or whenever that virtual NIC’s traffic would be routed over a different physical NIC in the team because of a failover event, a notification is sent out over the network to update the lookup tables on physical switches. In almost all cases, this process is desirable for the lowest latency of failover occurrences and migrations with vMotion. 4.3 vSphere Network I/O Control Network resource pools determine the bandwidth that different network traffic types are given on a vSphere distributed switch. When network I/O control is enabled, distributed switch traffic is divided into the following predefined network resource pools: Fault Tolerance traffic, iSCSI traffic, vMotion traffic, management traffic, vSphere Replication (VR) traffic, NFS traffic, and virtual machine traffic. You can also create custom network resource pools for virtual machine traffic. You can control the bandwidth each network resource pool is given by setting the physical adapter shares and host limit for each network resource pool. The physical adapter shares assigned to a network resource pool determine the share of the total available bandwidth guaranteed to the traffic associated with that network resource pool. The share of transmit bandwidth available to a network resource pool is determined by the network resource pool's shares and what other network resource pools are actively transmitting. For example, if you set your FT traffic and iSCSI traffic resource pools to 100 shares, while each of the other resource pools is set to 50 shares, the FT traffic and iSCSI traffic resource pools each receive 25% of the available bandwidth. The remaining resource pools each receive 12.5% of the available bandwidth. These reservations apply only when the physical adapter is saturated. Note The iSCSI traffic resource pool shares do not apply to iSCSI traffic on a dependent hardware iSCSI adapter. The host limit of a network resource pool is the upper limit of bandwidth that the network resource pool can use. Assigning a QoS priority tag to a network resource pool applies an 802.1p tag to all outgoing packets associated with that network resource pool. Select the Physical adapter shares for the network resource pool. Option Custom Description Type a specific number of shares, from 1 to 100, for this network resource pool. High Sets the shares for this resource pool to 100. Normal Sets the shares for this resource pool to 50. Low Sets the shares for this resource pool to 25. 4.4 TCP Segmentation Offload and Jumbo Frames You enable jumbo frames on a vSphere distributed switch or vSphere standard switch by changing the maximum transmission units (MTU). TCP Segmentation Offload (TSO) is enabled on the VMkernel interface by default, but must be enabled at the virtual machine level. 4.4.1 Enabling TSO To enable TSO at the virtual machine level, you must replace the existing vmxnet or flexible virtual network adapters with enhanced vmxnet virtual network adapters. This replacement might result in a change in the MAC address of the virtual network adapter. 
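Enabling TSO inside the guest is covered next; the jumbo frame half of this section is handled on the host by raising the MTU on the switch and on each VMkernel interface that carries the traffic, with the attached physical switch ports set to match. A minimal command-line sketch, in which the switch name vSwitch1, the interface vmk1, and the peer address 10.0.0.20 are placeholder assumptions:

    # Raise the MTU on a standard switch (for a distributed switch, change the MTU
    # in the switch settings in the vSphere Web Client instead)
    esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000

    # Raise the MTU on the VMkernel interface that carries the jumbo frame traffic
    esxcli network ip interface set --interface-name=vmk1 --mtu=9000

    # From the ESXi Shell, verify end to end with a do-not-fragment ping sized just
    # under 9000 bytes (8972 bytes of payload plus ICMP and IP headers)
    vmkping -d -s 8972 10.0.0.20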
TSO support through the enhanced vmxnet network adapter is available for virtual machines that run the following guest operating systems:
- Microsoft Windows 2003 Enterprise Edition with Service Pack 2 (32 bit and 64 bit)
- Red Hat Enterprise Linux 4 (64 bit)
- Red Hat Enterprise Linux 5 (32 bit and 64 bit)
- SUSE Linux Enterprise Server 10 (32 bit and 64 bit)

4.4.1.1 Enable TSO Support for a Virtual Machine
You can enable TSO support on a virtual machine by using an enhanced vmxnet adapter for that virtual machine.

4.5 Single Root I/O Virtualization (SR-IOV)
vSphere 5.1 and later supports Single Root I/O Virtualization (SR-IOV). SR-IOV is a specification that allows a single Peripheral Component Interconnect Express (PCIe) physical device under a single root port to appear to be multiple separate physical devices to the hypervisor or the guest operating system. SR-IOV uses physical functions (PFs) and virtual functions (VFs) to manage global functions for the SR-IOV devices. PFs are full PCIe functions that include the SR-IOV Extended Capability, which is used to configure and manage the SR-IOV functionality. It is possible to configure or control PCIe devices using PFs, and the PF has full ability to move data in and out of the device. VFs are lightweight PCIe functions that contain all the resources necessary for data movement but have a carefully minimized set of configuration resources. SR-IOV-enabled PCIe devices present multiple instances of themselves to the guest OS instance and hypervisor. The number of virtual functions presented depends on the device. For SR-IOV-enabled PCIe devices to function, you must have the appropriate BIOS and hardware support, as well as SR-IOV support in the guest driver or hypervisor instance.

4.5.1 SR-IOV Support
vSphere 5.1 supports SR-IOV. However, some features of vSphere are not functional when SR-IOV is enabled.

4.5.1.1 Supported Configurations
To use SR-IOV, your environment must meet the following configuration requirements:
- vSphere — Hosts with Intel processors require ESXi 5.1 or later. Hosts with AMD processors are not supported with SR-IOV.
- Physical host — Must be compatible with the ESXi release. Must have an Intel processor; must not have an AMD processor. Must support input/output memory management unit (IOMMU), and must have IOMMU enabled in the BIOS. Must support SR-IOV, and must have SR-IOV enabled in the BIOS. Contact the server vendor to determine whether the host supports SR-IOV.
- Physical NIC — Must be compatible with the ESXi release. Must be supported for use with the host and SR-IOV according to the technical documentation from the server vendor. Must have SR-IOV enabled in the firmware.
- PF driver in ESXi for the physical NIC — Must be certified by VMware. Must be installed on the ESXi host. The ESXi release provides a default driver for certain NICs, while for others you must download and manually install it.
- Guest OS — Red Hat Enterprise Linux 6.x, or Windows Server 2008 R2 with SP2.
To verify compatibility of physical hosts and NICs with ESXi releases, see the VMware Compatibility Guide.
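Driver and firmware levels are the items most often out of step with the Compatibility Guide. One way to confirm what a host is actually running before enabling SR-IOV is to query the adapter with esxcli; the adapter name vmnic4 below is a placeholder:

    # List the physical adapters, their drivers, and link state
    esxcli network nic list

    # Show driver and firmware details for one adapter, then compare them
    # against the VMware Compatibility Guide entry for SR-IOV support
    esxcli network nic get --nic-name=vmnic4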
4.5.1.2 Availability of Features
The following features are not available for virtual machines configured with SR-IOV:
- vMotion
- Storage vMotion
- vShield
- NetFlow
- Virtual Wire
- High Availability
- Fault Tolerance
- DRS
- DPM
- Suspend and resume
- Snapshots
- MAC-based VLAN for passthrough virtual functions
- Hot addition and removal of virtual devices, memory, and vCPU
- Participation in a cluster environment
Note: Attempts to enable or configure unsupported features with SR-IOV in the vSphere Web Client result in unexpected behavior in your environment.

4.5.1.3 Supported NICs
The following NICs are supported for virtual machines configured with SR-IOV. All NICs must have drivers and firmware that support SR-IOV. Some NICs might require SR-IOV to be enabled on the firmware.
- Products based on the Intel 82599ES 10 Gigabit Ethernet Controller Family (Niantic)
- Products based on the Intel Ethernet Controller X540 Family (Twinville)
- Emulex OneConnect (BE3)

4.5.1.4 Upgrading from earlier versions of vSphere
If you upgrade from vSphere 5.0 or earlier to vSphere 5.1 or later, SR-IOV support is not available until you update the NIC drivers for the vSphere release. NICs must have firmware and drivers that support SR-IOV enabled for SR-IOV functionality to operate.

4.5.2 vSphere 5.1 and Virtual Function Interaction
Virtual functions (VFs) are lightweight PCIe functions that contain all the resources necessary for data movement but have a carefully minimized set of configuration resources. There are some restrictions in the interactions between vSphere 5.1 and VFs:
- When a physical NIC creates VFs for SR-IOV to use, the physical NIC becomes a hidden uplink and cannot be used as a normal uplink. This means it cannot be added to a standard or distributed switch.
- There is no rate control for VFs in vSphere 5.1. Every VF could potentially use the entire bandwidth for a physical link.
- When a VF device is configured as a passthrough device on a virtual machine, the standby and hibernate functions for the virtual machine are not supported.
- Due to the limited number of vectors available for passthrough devices, a limited number of VFs is supported on a vSphere ESXi host. vSphere 5.1 SR-IOV supports up to 41 VFs on supported Intel NICs and up to 64 VFs on supported Emulex NICs.

4.5.3 DirectPath I/O vs SR-IOV
SR-IOV offers performance benefits and tradeoffs similar to those of DirectPath I/O. DirectPath I/O and SR-IOV have similar functionality, but you use them to accomplish different things. SR-IOV is beneficial in workloads with very high packet rates or very low latency requirements. Like DirectPath I/O, SR-IOV is not compatible with certain core virtualization features, such as vMotion. SR-IOV does, however, allow a single physical device to be shared among multiple guests. With DirectPath I/O you can map only one physical function to one virtual machine. SR-IOV lets you share a single physical device, allowing multiple virtual machines to connect directly to the physical function. This functionality allows you to virtualize low-latency (less than 50 microseconds) and high packet rate (greater than 50,000 packets per second) workloads, such as network appliances or purpose-built solutions, on an ESXi host.

4.5.3.1 Configure SR-IOV in a Host Profile with the vSphere Web Client
Before you can connect a virtual machine to a virtual function, you have to configure the virtual functions of the physical NIC on your host by using a host profile.
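The paragraph that follows describes enabling the virtual functions through the NIC driver's module parameter with the esxcli system module parameters set command. As an illustration only — the module name ixgbe, the max_vfs parameter, and the VF counts are assumptions for an Intel 82599-based NIC, and the actual parameter name comes from your NIC driver documentation — the call might look like this:

    # Request 16 virtual functions on each of two ports served by the ixgbe driver
    # (one comma-separated value per port); a host reboot is required afterwards
    esxcli system module parameters set --module=ixgbe --parameter-string="max_vfs=16,16"

    # Confirm the parameter value that the driver will pick up on the next boot
    esxcli system module parameters list --module=ixgbe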
You can enable SR-IOV virtual functions on the host by using the esxcli system module parameters set vCLI command on the NIC driver parameter for virtual functions, in accordance with the driver documentation. For more information about using vCLI commands, see the vSphere Command-Line Interface Documentation.

4.5.3.2 LACP Limitations on a vSphere Distributed Switch
Link Aggregation Control Protocol (LACP) on a vSphere distributed switch allows network devices to negotiate automatic bundling of links by sending LACP packets to a peer. However, there are some limitations when using LACP with a vSphere distributed switch:
- LACP only works with IP Hash load balancing and Link Status Network failover detection.
- LACP is not compatible with iSCSI software multipathing.
- vSphere only supports one LACP group per distributed switch, and only one LACP group per host.
- LACP settings do not exist in host profiles.
- LACP between two nested ESXi hosts is not possible.
- LACP does not work with port mirroring.
Standard switch security policy options:
- Promiscuous Mode — Reject: Placing a guest adapter in promiscuous mode has no effect on which frames are received by the adapter. Accept: Placing a guest adapter in promiscuous mode causes it to detect all frames passed on the vSphere standard switch that are allowed under the VLAN policy for the port group that the adapter is connected to.
- MAC Address Changes — Reject: If you set MAC Address Changes to Reject and the guest operating system changes the MAC address of the adapter to anything other than what is in the .vmx configuration file, all inbound frames are dropped. If the Guest OS changes the MAC address back to match the MAC address in the .vmx configuration file, inbound frames are passed again. Accept: Changing the MAC address from the Guest OS has the intended effect: frames to the new MAC address are received.
- Forged Transmits — Reject: Any outbound frame with a source MAC address that is different from the one currently set on the adapter is dropped. Accept: No filtering is performed and all outbound frames are passed.

4.6 Configure NetFlow Settings
NetFlow is a network analysis tool that you can use to monitor network traffic and virtual machine traffic. NetFlow is available on vSphere distributed switch version 5.0.0 and later.
Procedure:
1. Log in to the vSphere Client and select the Networking inventory view.
2. Right-click the vSphere distributed switch in the inventory pane, and select Edit Settings.
3. Navigate to the NetFlow tab.
4. Type the IP address and Port of the NetFlow collector.
5. Type the VDS IP address. With an IP address assigned to the vSphere distributed switch, the NetFlow collector can interact with the vSphere distributed switch as a single switch, rather than interacting with a separate, unrelated switch for each associated host.
6. (Optional) Use the up and down menu arrows to set the Active flow export timeout and Idle flow export timeout.
7. (Optional) Use the up and down menu arrows to set the Sampling rate. The sampling rate determines what portion of data NetFlow collects, with the sampling rate number determining how often NetFlow collects the packets. A collector with a sampling rate of 2 collects data from every other packet. A collector with a sampling rate of 5 collects data from every fifth packet.
8. (Optional) Select Process internal flows only to collect data only on network activity between virtual machines on the same host.
9. Click OK.
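The Promiscuous Mode, MAC Address Changes, and Forged Transmits options listed before the NetFlow procedure can also be read and set from the command line for a standard switch (distributed port groups are edited in the vSphere Web Client). A hedged sketch, with vSwitch0 as a placeholder switch name:

    # Show the current security policy of a standard switch
    esxcli network vswitch standard policy security get --vswitch-name=vSwitch0

    # Apply the commonly recommended Reject settings
    esxcli network vswitch standard policy security set --vswitch-name=vSwitch0 \
        --allow-promiscuous=false --allow-mac-change=false --allow-forged-transmits=false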
4.6.1 CDP
- Listen — ESXi detects and displays information about the associated Cisco switch port, but information about the vSphere distributed switch is not available to the Cisco switch administrator.
- Advertise — ESXi makes information about the vSphere distributed switch available to the Cisco switch administrator, but does not detect and display information about the Cisco switch.
- Both — ESXi detects and displays information about the associated Cisco switch and makes information about the vSphere distributed switch available to the Cisco switch administrator.

4.7 Mounting NFS Volumes
ESXi supports VMkernel-based NFS mounts for storing virtual disks on NFS datastores. In addition to storing virtual disks on NFS datastores, you can also use NFS datastores as a central repository for ISO images and virtual machine templates. For more information about creating NFS datastores, see vSphere Storage. ESXi supports NFS version 3 over Layer 2 and Layer 3 network switches. Host servers and NFS storage arrays must be on different subnets and the network switch must handle the routing information.

4.8 Networking Best Practices
Consider these best practices when you configure your network.
- Separate network services from one another to achieve greater security and better performance. Put a set of virtual machines on a separate physical NIC. This separation allows for a portion of the total networking workload to be shared evenly across multiple CPUs. The isolated virtual machines can then better serve traffic from a Web client, for example.
- Keep the vMotion connection on a separate network devoted to vMotion. When migration with vMotion occurs, the contents of the guest operating system's memory are transmitted over the network. You can do this either by using VLANs to segment a single physical network or by using separate physical networks (the latter is preferable).
- When using passthrough devices with a Linux kernel version 2.6.20 or earlier, avoid MSI and MSI-X modes because these modes have significant performance impact.
- To physically separate network services and to dedicate a particular set of NICs to a specific network service, create a vSphere standard switch or vSphere distributed switch for each service. If this is not possible, separate network services on a single switch by attaching them to port groups with different VLAN IDs. In either case, confirm with your network administrator that the networks or VLANs you choose are isolated in the rest of your environment and that no routers connect them.
- You can add and remove network adapters from a standard or distributed switch without affecting the virtual machines or the network service that is running behind that switch. If you remove all the running hardware, the virtual machines can still communicate among themselves. If you leave one network adapter intact, all the virtual machines can still connect with the physical network.
- To protect your most sensitive virtual machines, deploy firewalls in virtual machines that route between virtual networks with uplinks to physical networks and pure virtual networks with no uplinks.
- For best performance, use vmxnet3 virtual NICs.
- Every physical network adapter connected to the same vSphere standard switch or vSphere distributed switch should also be connected to the same physical network.
- Configure all VMkernel network adapters to the same MTU.
When several VMkernel network adapters are connected to vSphere distributed switches but have different MTUs configured, you might experience network connectivity problems. When creating a distributed port group, do not use dynamic binding. Dynamic binding is deprecated in ESXi 5.0. 5 Storage 5.1 Making LUN Decisions You must plan how to set up storage for your ESXi systems before you format LUNs with VMFS datastores. When you make your LUN decision, keep in mind the following considerations: Each LUN should have the correct RAID level and storage characteristic for the applications running in virtual machines that use the LUN. Each LUN must contain only one VMFS datastore. If multiple virtual machines access the same VMFS, use disk shares to prioritize virtual machines. You might want fewer, larger LUNs for the following reasons: More flexibility to create virtual machines without asking the storage administrator for more space. More flexibility for resizing virtual disks, doing snapshots, and so on. Fewer VMFS datastores to manage. You might want more, smaller LUNs for the following reasons: Less wasted storage space. Different applications might need different RAID characteristics. More flexibility, as the multipathing policy and disk shares are set per LUN. Use of Microsoft Cluster Service requires that each cluster disk resource is in its own LUN. Better performance because there is less contention for a single volume. When the storage characterization for a virtual machine is not available, there is often no simple method to determine the number and size of LUNs to provision. You can experiment using either a predictive or adaptive scheme. 5.1.1 Use the Predictive Scheme to Make LUN Decisions When setting up storage for ESXi systems, before creating VMFS datastores, you must decide on the size and number of LUNs to provision. You can experiment using the predictive scheme. Procedure Provision several LUNs with different storage characteristics. Create a VMFS datastore on each LUN, labeling each datastore according to its characteristics. Create virtual disks to contain the data for virtual machine applications in the VMFS datastores created on LUNs with the appropriate RAID level for the applications' requirements. Use disk shares to distinguish high-priority from low-priority virtual machines. Note Disk shares are relevant only within a given host. The shares assigned to virtual machines on one host have no effect on virtual machines on other hosts. Run the applications to determine whether virtual machine performance is acceptable. 5.1.2 Use the Adaptive Scheme to Make LUN Decisions When setting up storage for ESXi hosts, before creating VMFS datastores, you must decide on the number and size of LUNS to provision. You can experiment using the adaptive scheme. Procedure Provision a large LUN (RAID 1+0 or RAID 5), with write caching enabled. Create a VMFS on that LUN. Create four or five virtual disks on the VMFS. Run the applications to determine whether disk performance is acceptable. If performance is acceptable, you can place additional virtual disks on the VMFS. If performance is not acceptable, create a new, large LUN, possibly with a different RAID level, and repeat the process. Use migration so that you do not lose virtual machines data when you recreate the LUN. 5.1.3 NPIV Capabilities and Limitations Learn about specific capabilities and limitations of the use of NPIV with ESXi. ESXi with NPIV supports the following items: NPIV supports vMotion. 
When you use vMotion to migrate a virtual machine, it retains the assigned WWN. If you migrate an NPIV-enabled virtual machine to a host that does not support NPIV, VMkernel reverts to using a physical HBA to route the I/O. If your FC SAN environment supports concurrent I/O on the disks from an active-active array, the concurrent I/O to two different NPIV ports is also supported.
When you use ESXi with NPIV, the following limitations apply:
- Because the NPIV technology is an extension to the FC protocol, it requires an FC switch and does not work on direct attached FC disks.
- When you clone a virtual machine or template with a WWN assigned to it, the clones do not retain the WWN.
- NPIV does not support Storage vMotion.
- Disabling and then re-enabling the NPIV capability on an FC switch while virtual machines are running can cause an FC link to fail and I/O to stop.
Boot from SAN requirements:
- ESXi system requirements — Follow vendor recommendations for the server booting from a SAN.
- Adapter requirements — Enable and correctly configure the adapter, so it can access the boot LUN. See your vendor documentation.
- Access control — Each host must have access to its own boot LUN only, not the boot LUNs of other hosts. Use storage system software to make sure that the host accesses only the designated LUNs. Multiple servers can share a diagnostic partition. You can use array-specific LUN masking to achieve this.
- Multipathing support — Multipathing to a boot LUN on active-passive arrays is not supported because the BIOS does not support multipathing and is unable to activate a standby path.
- SAN considerations — SAN connections must be through a switched topology if the array is not certified for direct connect topology. If the array is certified for direct connect topology, the SAN connections can be made directly to the array. Boot from SAN is supported for both switched topology and direct connect topology if these topologies for the specific array are certified.
- Hardware-specific considerations — If you are running an IBM eServer BladeCenter and use boot from SAN, you must disable IDE drives on the blades.

5.2 Best Practices for Fibre Channel Storage
When using ESXi with Fibre Channel SAN, follow best practices that VMware offers to avoid performance problems. The vSphere Client and the vSphere Web Client offer extensive facilities for collecting performance information. The information is graphically displayed and frequently updated. You can also use the resxtop or esxtop command-line utilities. The utilities provide a detailed look at how ESXi uses resources in real time. For more information, see the vSphere Resource Management documentation. Check with your storage representative if your storage system supports Storage API - Array Integration hardware acceleration features. If it does, refer to your vendor documentation for information on how to enable hardware acceleration support on the storage system side. For more information, see Storage Hardware Acceleration.
This chapter includes the following topics:
- Preventing Fibre Channel SAN Problems
- Disable Automatic Host Registration
- Disable Automatic Host Registration in the vSphere Web Client
- Optimizing Fibre Channel SAN Storage Performance
- Fibre Channel SAN Configuration Checklist

5.3 Preventing Fibre Channel SAN Problems
When using ESXi in conjunction with a Fibre Channel SAN, you must follow specific guidelines to avoid SAN problems. You should observe these tips for preventing problems with your SAN configuration:
- Place only one VMFS datastore on each LUN.
- Do not change the path policy the system sets for you unless you understand the implications of making such a change.
- Document everything. Include information about zoning, access control, storage, switch, server and FC HBA configuration, software and firmware versions, and storage cable plan.
- Plan for failure:
  - Make several copies of your topology maps. For each element, consider what happens to your SAN if the element fails.
  - Cross off different links, switches, HBAs and other elements to ensure you did not miss a critical failure point in your design.
- Ensure that the Fibre Channel HBAs are installed in the correct slots in the host, based on slot and bus speed. Balance PCI bus load among the available busses in the server.
- Become familiar with the various monitor points in your storage network, at all visibility points, including host's performance charts, FC switch statistics, and storage performance statistics.
- Be cautious when changing IDs of the LUNs that have VMFS datastores being used by your ESXi host. If you change the ID, the datastore becomes inactive and its virtual machines fail. You can resignature the datastore to make it active again. See Managing Duplicate VMFS Datastores. If there are no running virtual machines on the VMFS datastore, after you change the ID of the LUN, you must use rescan to reset the ID on your host. For information on using rescan, see Storage Refresh and Rescan Operations.

5.4 Disable Automatic Host Registration
When you use EMC CLARiiON or Invista arrays for storage, it is required that the hosts register with the arrays. ESXi performs automatic host registration by sending the host's name and IP address to the array. If you prefer to perform manual registration using storage management software, disable the ESXi auto-registration feature.

5.5 Optimizing Fibre Channel SAN Storage Performance
Several factors contribute to optimizing a typical SAN environment. If the environment is properly configured, the SAN fabric components (particularly the SAN switches) are only minor contributors because of their low latencies relative to servers and storage arrays. Make sure that the paths through the switch fabric are not saturated, that is, that the switch fabric is running at the highest throughput.

5.5.1 Storage Array Performance
Storage array performance is one of the major factors contributing to the performance of the entire SAN environment. If there are issues with storage array performance, be sure to consult your storage array vendor's documentation for any relevant information. Follow these general guidelines to improve the array performance in the vSphere environment:
- When assigning LUNs, remember that each LUN is accessed by a number of hosts, and that a number of virtual machines can run on each host. One LUN used by a host can service I/O from many different applications running on different operating systems. Because of this diverse workload, the RAID group containing the ESXi LUNs should not include LUNs used by other servers that are not running ESXi.
- Make sure read/write caching is enabled.
- SAN storage arrays require continual redesign and tuning to ensure that I/O is load balanced across all storage array paths. To meet this requirement, distribute the paths to the LUNs among all the SPs to provide optimal load balancing. Close monitoring indicates when it is necessary to rebalance the LUN distribution.
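A quick way to see from the host side how paths to each LUN are spread across the storage processors is to list them with esxcli; this is a sketch only, and the fields reported depend on the array and the claiming SATP:

    # List every path the host sees, including its target and the device it leads to
    esxcli storage core path list

    # Summarize each device with its path selection policy and working paths
    esxcli storage nmp device list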
Tuning statically balanced storage arrays is a matter of monitoring the specific performance statistics (such as I/O operations per second, blocks per second, and response time) and distributing the LUN workload to spread the workload across all the SPs.
Note: Dynamic load balancing is not currently supported with ESXi.

5.5.2 Server Performance with Fibre Channel
You must consider several factors to ensure optimal server performance. Each server application must have access to its designated storage with the following conditions:
- High I/O rate (number of I/O operations per second)
- High throughput (megabytes per second)
- Minimal latency (response times)
Because each application has different requirements, you can meet these goals by choosing an appropriate RAID group on the storage array. To achieve performance goals:
- Place each LUN on a RAID group that provides the necessary performance levels. Pay attention to the activities and resource utilization of other LUNs in the assigned RAID group. A high-performance RAID group that has too many applications doing I/O to it might not meet the performance goals required by an application running on the ESXi host.
- Make sure that each server has a sufficient number of HBAs to allow maximum throughput for all the applications hosted on the server for the peak period. I/O spread across multiple HBAs provides higher throughput and less latency for each application.
- To provide redundancy in the event of HBA failure, make sure the server is connected to a dual redundant fabric.
- When allocating LUNs or RAID groups for ESXi systems, multiple operating systems use and share that resource. As a result, the performance required from each LUN in the storage subsystem can be much higher if you are working with ESXi systems than if you are using physical machines. For example, if you expect to run four I/O intensive applications, allocate four times the performance capacity for the ESXi LUNs.
- When using multiple ESXi systems in conjunction with vCenter Server, the performance needed from the storage subsystem increases correspondingly.
- The number of outstanding I/Os needed by applications running on an ESXi system should match the number of I/Os the HBA and storage array can handle.

5.5.3 Fibre Channel SAN Configuration Checklist
This topic provides a checklist of special setup requirements for different storage arrays and ESXi hosts.
Multipathing setup requirements:
- All storage arrays — Write cache must be disabled if not battery backed.
- Topology — No single failure should cause both HBA and SP failover, especially with active-passive storage arrays.
- IBM TotalStorage DS 4000 (formerly FastT) — Host type must be LNXCL or VMware in later versions. AVT (Auto Volume Transfer) is disabled in this host mode.
- HDS 99xx and 95xxV family — HDS 9500V family (Thunder) requires two host modes: Host Mode 1: Standard, and Host Mode 2: Sun Cluster. HDS 99xx family (Lightning) and HDS Tagma (USP) require host mode set to Netware.
- EMC Symmetrix — Enable the SPC2 and SC3 settings. Contact EMC for the latest settings.
- EMC Clariion — Set the EMC Clariion failover mode to 1 or 4. Contact EMC for details.
- HP MSA — Host type must be Linux. Set the connection type for each HBA port to Linux.

5.6 iSCSI
5.6.1 iSCSI Naming Conventions
iSCSI uses a special unique name to identify an iSCSI node, either target or initiator.
This name is similar to the World Wide Name (WWN) associated with Fibre Channel devices and is used as a way to universally identify the node. iSCSI names are formatted in two different ways. The most common is the IQN format. For more details on iSCSI naming requirements and string profiles, see RFC 3721 and RFC 3722 on the IETF Web site.

5.6.1.1 iSCSI Qualified Name (IQN) Format
The IQN format takes the form iqn.yyyy-mm.naming-authority:unique name, where:
- yyyy-mm is the year and month when the naming authority was established.
- naming-authority is usually the reverse syntax of the Internet domain name of the naming authority. For example, the iscsi.vmware.com naming authority could have the iSCSI qualified name form of iqn.1998-01.com.vmware.iscsi. The name indicates that the vmware.com domain name was registered in January of 1998, and iscsi is a subdomain, maintained by vmware.com.
- unique name is any name you want to use, for example, the name of your host. The naming authority must make sure that any names assigned following the colon are unique, such as:
  - iqn.1998-01.com.vmware.iscsi:name1
  - iqn.1998-01.com.vmware.iscsi:name2
  - iqn.1998-01.com.vmware.iscsi:name999

5.6.1.2 Enterprise Unique Identifier (EUI) Format
The EUI format takes the form eui.16 hex digits. For example, eui.0123456789ABCDEF. The 16-hexadecimal digits are text representations of a 64-bit number of an IEEE EUI (extended unique identifier) format. The top 24 bits are a company ID that IEEE registers with a particular company. The lower 40 bits are assigned by the entity holding that company ID and must be unique.

5.6.2 SCSI Storage System Types
ESXi supports different storage systems and arrays. The types of storage that your host supports include active-active, active-passive, and ALUA-compliant.
- Active-active storage system — Allows access to the LUNs simultaneously through all the storage ports that are available without significant performance degradation. All the paths are active at all times, unless a path fails.
- Active-passive storage system — A system in which one storage processor is actively providing access to a given LUN. The other processors act as backup for the LUN and can be actively providing access to other LUN I/O. I/O can be successfully sent only to an active port for a given LUN. If access through the active storage port fails, one of the passive storage processors can be activated by the servers accessing it.
- Asymmetrical storage system — Supports Asymmetric Logical Unit Access (ALUA). ALUA-compliant storage systems provide different levels of access per port. ALUA allows hosts to determine the states of target ports and prioritize paths. The host uses some of the active paths as primary and others as secondary.
- Virtual port storage system — Allows access to all available LUNs through a single virtual port. These are active-active storage devices, but hide their multiple connections through a single port. ESXi multipathing does not make multiple connections from a specific port to the storage by default. Some storage vendors supply session managers to establish and manage multiple connections to their storage. These storage systems handle port failover and connection balancing transparently. This is often referred to as transparent failover.

5.6.3 Error Correction
To protect the integrity of iSCSI headers and data, the iSCSI protocol defines error correction methods known as header digests and data digests. Both parameters are disabled by default, but you can enable them.
These digests pertain to, respectively, the header and SCSI data being transferred between iSCSI initiators and targets, in both directions. Header and data digests check the end-to-end, noncryptographic data integrity beyond the integrity checks that other networking layers provide, such as TCP and Ethernet. They check the entire communication path, including all elements that can change the network-level traffic, such as routers, switches, and proxies. The existence and type of the digests are negotiated when an iSCSI connection is established. When the initiator and target agree on a digest configuration, this digest must be used for all traffic between them. Enabling header and data digests does require additional processing for both the initiator and the target and can affect throughput and CPU use performance. 5.7 iSCSI SAN Restrictions A number of restrictions exist when you use ESXi with an iSCSI SAN. ESXi does not support iSCSI-connected tape devices. You cannot use virtual-machine multipathing software to perform I/O load balancing to a single physical LUN. ESXi does not support multipathing when you combine independent hardware adapters with either software or dependent hardware adapters. ESXi does not support IPv6 with software iSCSI and dependent hardware iSCSI. 5.7.1 Dependent Hardware iSCSI Considerations When you use dependent hardware iSCSI adapters with ESXi, certain considerations apply. When you use any dependent hardware iSCSI adapter, performance reporting for a NIC associated with the adapter might show little or no activity, even when iSCSI traffic is heavy. This behavior occurs because the iSCSI traffic bypasses the regular networking stack. If you use a third-party virtual switch, for example Cisco Nexus 1000V DVS, disable automatic pinning. Use manual pinning instead, making sure to connect a VMkernel adapter (vmk) to an appropriate physical NIC (vmnic). For information, refer to your virtual switch vendor documentation. The Broadcom iSCSI adapter performs data reassembly in hardware, which has a limited buffer space. When you use the Broadcom iSCSI adapter in a congested network or under heavy load, enable flow control to avoid performance degradation. Flow control manages the rate of data transmission between two nodes to prevent a fast sender from overrunning a slow receiver. For best results, enable flow control at the end points of the I/O path, at the hosts and iSCSI storage systems. To enable flow control for the host, use the esxcli system module parameters command. For details, see the VMware knowledge base article at http://kb.vmware.com/kb/1013413 Broadcom iSCSI adapters do not support IPv6. 5.7.2 Managing iSCSI Network Special consideration apply to network adapters, both physical and VMkernel, that are associated with an iSCSI adapter. After you create network connections for iSCSI, an iSCSI indicator on a number of Networking dialog boxes becomes enabled. This indicator shows that a particular virtual or physical network adapter is iSCSI-bound. To avoid disruptions in iSCSI traffic, follow these guidelines and considerations when managing iSCSI-bound virtual and physical network adapters: Make sure that the VMkernel network adapters are assigned addresses on the same subnet as the iSCSI storage portal they connect to. iSCSI adapters using VMkernel adapters are not able to connect to iSCSI ports on different subnets, even if those ports are discovered by the iSCSI adapters. 
When using separate vSphere switches to connect physical network adapters and VMkernel adapters, make sure that the vSphere switches connect to different IP subnets. If VMkernel adapters are on the same subnet, they must connect to a single vSwitch. If you migrate VMkernel adapters to a different vSphere switch, move associated physical adapters. Do not make configuration changes to iSCSI-bound VMkernel adapters or physical network adapters. Do not make changes that might break association of VMkernel adapters and physical network adapters. You can break the association if you remove one of the adapters or the vSphere switch that connects them, or change the 1:1 network policy for their connection.

5.7.3 iSCSI Network Troubleshooting
A warning sign indicates non-compliant port group policy for an iSCSI-bound VMkernel adapter.
Problem: The VMkernel adapter's port group policy is considered non-compliant in the following cases:
- The VMkernel adapter is not connected to an active physical network adapter.
- The VMkernel adapter is connected to more than one physical network adapter.
- The VMkernel adapter is connected to one or more standby physical adapters.
- The active physical adapter is changed.
CHAP security levels:
- None — The host does not use CHAP authentication. Select this option to disable authentication if it is currently enabled. Supported by: software iSCSI, dependent hardware iSCSI, independent hardware iSCSI.
- Use unidirectional CHAP if required by target — The host prefers a non-CHAP connection, but can use a CHAP connection if required by the target. Supported by: software iSCSI, dependent hardware iSCSI.
- Use unidirectional CHAP unless prohibited by target — The host prefers CHAP, but can use non-CHAP connections if the target does not support CHAP. Supported by: software iSCSI, dependent hardware iSCSI, independent hardware iSCSI.
- Use unidirectional CHAP — The host requires successful CHAP authentication. The connection fails if CHAP negotiation fails. Supported by: software iSCSI, dependent hardware iSCSI, independent hardware iSCSI.
- Use bidirectional CHAP — The host and the target support bidirectional CHAP. Supported by: software iSCSI, dependent hardware iSCSI.

5.8 iBFT iSCSI Boot Overview
ESXi hosts can boot from an iSCSI SAN using the software or dependent hardware iSCSI adapters and network adapters. To deploy ESXi and boot from the iSCSI SAN, the host must have an iSCSI boot capable network adapter that supports the iSCSI Boot Firmware Table (iBFT) format. The iBFT is a method of communicating parameters about the iSCSI boot device to an operating system. Before installing ESXi and booting from the iSCSI SAN, configure the networking and iSCSI boot parameters on the network adapter and enable the adapter for the iSCSI boot. Because configuring the network adapter is vendor specific, review your vendor documentation for instructions. When you first boot from iSCSI, the iSCSI boot firmware on your system connects to an iSCSI target. If login is successful, the firmware saves the networking and iSCSI boot parameters in the iBFT and stores the table in the system's memory. The system uses this table to configure its own iSCSI connection and networking and to start up. The following list describes the iBFT iSCSI boot sequence.
1. When restarted, the system BIOS detects the iSCSI boot firmware on the network adapter.
2. The iSCSI boot firmware uses the preconfigured boot parameters to connect with the specified iSCSI target.
3. If the connection to the iSCSI target is successful, the iSCSI boot firmware writes the networking and iSCSI boot parameters into the iBFT and stores the table in the system memory. Note: The system uses this table to configure its own iSCSI connection and networking and to start up.
4. The BIOS boots the boot device.
5. The VMkernel starts loading and takes over the boot operation.
6. Using the boot parameters from the iBFT, the VMkernel connects to the iSCSI target.
7. After the iSCSI connection is established, the system boots.

5.8.1 iBFT iSCSI Boot Considerations
When you boot the ESXi host from iSCSI using iBFT-enabled network adapters, certain considerations apply. The iBFT iSCSI boot does not support the following items:
- IPv6
- Failover for the iBFT-enabled network adapters
Note: Update your NIC's boot code and iBFT firmware using vendor supplied tools before trying to install and boot VMware ESXi. Consult vendor documentation and the VMware HCL for supported boot code and iBFT firmware versions for VMware ESXi iBFT boot. The boot code and iBFT firmware released by vendors prior to the ESXi 4.1 release might not work.
After you set up your host to boot from iBFT iSCSI, the following restrictions apply:
- You cannot disable the software iSCSI adapter. If the iBFT configuration is present in the BIOS, the host re-enables the software iSCSI adapter during each reboot. Note: If you do not use the iBFT-enabled network adapter for the iSCSI boot and do not want the software iSCSI adapter to be always enabled, remove the iBFT configuration from the network adapter.
- You cannot remove the iBFT iSCSI boot target using the vSphere Client or the vSphere Web Client. The target appears on the list of adapter static targets.

5.9 Best Practices for iSCSI Storage
When using ESXi with the iSCSI SAN, follow best practices that VMware offers to avoid problems. Check with your storage representative if your storage system supports Storage API - Array Integration hardware acceleration features. If it does, refer to your vendor documentation for information on how to enable hardware acceleration support on the storage system side. For more information, see Storage Hardware Acceleration.
This chapter includes the following topics:
- Preventing iSCSI SAN Problems
- Optimizing iSCSI SAN Storage Performance
- Checking Ethernet Switch Statistics
- iSCSI SAN Configuration Checklist

5.10 Preventing iSCSI SAN Problems
When using ESXi in conjunction with a SAN, you must follow specific guidelines to avoid SAN problems. You should observe these tips for avoiding problems with your SAN configuration:
- Place only one VMFS datastore on each LUN. Multiple VMFS datastores on one LUN are not recommended.
- Do not change the path policy the system sets for you unless you understand the implications of making such a change.
- Document everything. Include information about configuration, access control, storage, switch, server and iSCSI HBA configuration, software and firmware versions, and storage cable plan.
- Plan for failure: Make several copies of your topology maps. For each element, consider what happens to your SAN if the element fails. Cross off different links, switches, HBAs and other elements to ensure you did not miss a critical failure point in your design.
- Ensure that the iSCSI HBAs are installed in the correct slots in the ESXi host, based on slot and bus speed. Balance PCI bus load among the available busses in the server.
Become familiar with the various monitor points in your storage network, at all visibility points, including ESXi performance charts, Ethernet switch statistics, and storage performance statistics. Be cautious when changing IDs of the LUNs that have VMFS datastores being used by your host. If you change the ID, virtual machines running on the VMFS datastore will fail. If there are no running virtual machines on the VMFS datastore, after you change the ID of the LUN, you must use rescan to reset the ID on your host. For information on using rescan, see Storage Refresh and Rescan Operations. If you need to change the default iSCSI name of your iSCSI adapter, make sure the name you enter is worldwide unique and properly formatted. To avoid storage access problems, never assign the same iSCSI name to different adapters, even on different hosts. 5.11 Optimizing iSCSI SAN Storage Performance Several factors contribute to optimizing a typical SAN environment. If the network environment is properly configured, the iSCSI components provide adequate throughput and low enough latency for iSCSI initiators and targets. If the network is congested and links, switches or routers are saturated, iSCSI performance suffers and might not be adequate for ESXi environments. 5.11.1 Storage System Performance Storage system performance is one of the major factors contributing to the performance of the entire iSCSI environment. If issues occur with storage system performance, consult your storage system vendor’s documentation for any relevant information. When you assign LUNs, remember that you can access each shared LUN through a number of hosts, and that a number of virtual machines can run on each host. One LUN used by the ESXi host can service I/O from many different applications running on different operating systems. Because of this diverse workload, the RAID group that contains the ESXi LUNs should not include LUNs that other hosts use that are not running ESXi for I/O intensive applications. Enable read caching and write caching. Load balancing is the process of spreading server I/O requests across all available SPs and their associated host server paths. The goal is to optimize performance in terms of throughput (I/O per second, megabytes per second, or response times). SAN storage systems require continual redesign and tuning to ensure that I/O is load balanced across all storage system paths. To meet this requirement, distribute the paths to the LUNs among all the SPs to provide optimal load balancing. Close monitoring indicates when it is necessary to manually rebalance the LUN distribution. Tuning statically balanced storage systems is a matter of monitoring the specific performance statistics (such as I/O operations per second, blocks per second, and response time) and distributing the LUN workload to spread the workload across all the SPs. 5.11.2 Server Performance with iSCSI You must consider several factors to ensure optimal server performance. Each server application must have access to its designated storage with the following conditions: High I/O rate (number of I/O operations per second) High throughput (megabytes per second) Minimal latency (response times) Because each application has different requirements, you can meet these goals by choosing an appropriate RAID group on the storage system. To achieve performance goals, perform the following tasks: Place each LUN on a RAID group that provides the necessary performance levels. 
Pay attention to the activities and resource utilization of other LUNS in the assigned RAID group. A high-performance RAID group that has too many applications doing I/O to it might not meet performance goals required by an application running on the ESXi host. Provide each server with a sufficient number of network adapters or iSCSI hardware adapters to allow maximum throughput for all the applications hosted on the server for the peak period. I/O spread across multiple ports provides higher throughput and less latency for each application. To provide redundancy for software iSCSI, make sure the initiator is connected to all network adapters used for iSCSI connectivity. When allocating LUNs or RAID groups for ESXi systems, multiple operating systems use and share that resource. As a result, the performance required from each LUN in the storage subsystem can be much higher if you are working with ESXi systems than if you are using physical machines. For example, if you expect to run four I/O intensive applications, allocate four times the performance capacity for the ESXi LUNs. When using multiple ESXi systems in conjunction with vCenter Server, the performance needed from the storage subsystem increases correspondingly. The number of outstanding I/Os needed by applications running on an ESXi system should match the number of I/Os the SAN can handle. 5.11.3 Network Performance A typical SAN consists of a collection of computers connected to a collection of storage systems through a network of switches. Several computers often access the same storage. Single Ethernet Link Connection to Storage shows several computer systems connected to a storage system through an Ethernet switch. In this configuration, each system is connected through a single Ethernet link to the switch, which is also connected to the storage system through a single Ethernet link. In most configurations, with modern switches and typical traffic, this is not a problem. Single Ethernet Link Connection to Storage When systems read data from storage, the maximum response from the storage is to send enough data to fill the link between the storage systems and the Ethernet switch. It is unlikely that any single system or virtual machine gets full use of the network speed, but this situation can be expected when many systems share one storage device. When writing data to storage, multiple systems or virtual machines might attempt to fill their links. As Dropped Packets shows, when this happens, the switch between the systems and the storage system has to drop data. This happens because, while it hassingle connection to the storage device, it has more traffic to send to the storage system than a single link can carry. In this case, the switch drops network packets because the amount of data it can transmit is limited by the speed of the link between it and the storage system. Dropped Packets Recovering from dropped network packets results in large performance degradation. In addition to time spent determining that data was dropped, the retransmission uses network bandwidth that could otherwise be used for current transactions. iSCSI traffic is carried on the network by the Transmission Control Protocol (TCP). TCP is a reliable transmission protocol that ensures that dropped packets are retried and eventually reach their destination. TCP is designed to recover from dropped packets and retransmits them quickly and seamlessly. However, when the switch discards packets with any regularity, network throughput suffers significantly. 
The network becomes congested with requests to resend data and with the resent packets, and less data is actually transferred than in a network without congestion. Most Ethernet switches can buffer, or store, data and give every device attempting to send data an equal chance to get to the destination. This ability to buffer some transmissions, combined with many systems limiting the number of outstanding commands, allows small bursts from several systems to be sent to a storage system in turn. If the transactions are large and multiple servers are trying to send data through a single switch port, a switch's ability to buffer one request while another is transmitted can be exceeded. In this case, the switch drops the data it cannot send, and the storage system must request retransmission of the dropped packet. For example, if an Ethernet switch can buffer 32KB on an input port, but the server connected to it thinks it can send 256KB to the storage device, some of the data is dropped. Most managed switches provide information on dropped packets, similar to the following: *: interface is up IHQ: pkts in input hold queue OHQ: pkts in output hold queue RXBS: rx rate (bits/sec) TXBS: tx rate (bits/sec) TRTL: throttle count IQD: pkts dropped from input queue OQD: pkts dropped from output queue RXPS: rx rate (pkts/sec) TXPS: tx rate (pkts/sec) Sample Switch Information Interface IHQ IQD OHQ OQD RXBS RXPS TXBS TXPS TRTL * GigabitEthernet0/1 3 9922 0 0 476303000 62273 477840000 63677 0 In this example from a Cisco switch, the bandwidth used is 476303000 bits/second, which is less than half of wire speed. In spite of this, the port is buffering incoming packets and has dropped quite a few packets. The final line of this interface summary indicates that this port has already dropped almost 10,000 inbound packets in the IQD column. Configuration changes to avoid this problem involve making sure several input Ethernet links are not funneled into one output link, resulting in an oversubscribed link. When a number of links transmitting near capacity are switched to a smaller number of links, oversubscription is a possibility. Generally, applications or systems that write a lot of data to storage, such as data acquisition or transaction logging systems, should not share Ethernet links to a storage device. These types of applications perform best with multiple connections to storage devices. Multiple Connections from Switch to Storage shows multiple connections from the switch to the storage. Multiple Connections from Switch to Storage Using VLANs or VPNs does not provide a suitable solution to the problem of link oversubscription in shared configurations. VLANs and other virtual partitioning of a network provide a way of logically designing a network, but do not change the physical capabilities of links and trunks between switches. When storage traffic and other network traffic end up sharing physical connections, as they would with a VPN, the possibility for oversubscription and lost packets exists. The same is true of VLANs that share interswitch trunks. Performance design for a SANs must take into account the physical limitations of the network, not logical allocations. 5.12 Checking Ethernet Switch Statistics Many Ethernet switches provide different methods for monitoring switch health. Switches that have ports operating near maximum throughput much of the time do not provide optimum performance. If you have ports in your iSCSI SAN running near the maximum, reduce the load. 
If the port is connected to an ESXi system or iSCSI storage, you can reduce the load by using manual load balancing. If the port is connected between multiple switches or routers, consider installing additional links between these components to handle more load. Ethernet switches also commonly provide information about transmission errors, queued packets, and dropped Ethernet packets. If the switch regularly reports any of these conditions on ports being used for iSCSI traffic, performance of the iSCSI SAN will be poor.

5.13 iSCSI SAN Configuration Checklist
This topic provides a checklist of special setup requirements for different storage systems and ESXi hosts.

5.14 Identifying Device Connectivity Problems
When your ESXi host experiences a problem while connecting to a storage device, the host treats the problem as permanent or temporary depending on certain factors. Storage connectivity problems are caused by a variety of reasons. Although ESXi cannot always determine the reason for a storage device or its paths being unavailable, the host differentiates between a permanent device loss (PDL) state of the device and a transient all paths down (APD) state of storage.
- Permanent Device Loss (PDL) — A condition that occurs when a storage device permanently fails or is administratively removed or excluded. It is not expected to become available. When the device becomes permanently unavailable, ESXi receives appropriate sense codes or a login rejection from storage arrays, and is able to recognize that the device is permanently lost.
- All Paths Down (APD) — A condition that occurs when a storage device becomes inaccessible to the host and no paths to the device are available. ESXi treats this as a transient condition because typically the problems with the device are temporary and the device is expected to become available again.

5.14.1 Detecting PDL Conditions
A storage device is considered to be in the permanent device loss (PDL) state when it becomes permanently unavailable to your ESXi host. Typically, the PDL condition occurs when a device is unintentionally removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error. When the storage array determines that the device is permanently unavailable, it sends SCSI sense codes to the ESXi host. The sense codes allow your host to recognize that the device has failed and register the state of the device as PDL.
Note: The sense codes must be received on all paths to the device for the device to be considered permanently lost.
The following VMkernel log example of a SCSI sense code indicates that the device is in the PDL state:
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0 or Logical Unit Not Supported
For information about SCSI sense codes, see Troubleshooting Storage in vSphere Troubleshooting. In the case of iSCSI arrays with a single LUN per target, PDL is detected through iSCSI login failure. An iSCSI storage array rejects your host's attempts to start an iSCSI session with a reason Target Unavailable. As with the sense codes, this response must be received on all paths for the device to be considered permanently lost. After registering the PDL state of the device, the host stops attempts to reestablish connectivity or to issue commands to the device to avoid becoming blocked or unresponsive. The I/O from virtual machines is terminated.
Note: vSphere HA can detect PDL and restart failed virtual machines.
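The state the host has registered for a device can also be checked from the command line; section 5.14.4 later in this chapter gives the full procedure. A short sketch in which the naa identifier is a placeholder and the exact wording of the Status field varies by release:

    # Show the device record, including its operational Status field
    esxcli storage core device list --device=naa.600a0b80001a2b3c0000000000000001

    # List the paths to the device to see whether any are still active
    esxcli storage core path list --device=naa.600a0b80001a2b3c0000000000000001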
The vSphere Web Client displays the following information for the device: The operational state of the device changes to Lost Communication. All paths are shown as Dead. Datastores on the device are grayed out. It is possible for a device to return from PDL, however, data consistency is not guaranteed. Note The host cannot detect PDL conditions and continues to treat the device connectivity problems as APD when a storage device permanently fails in a way that does not return appropriate SCSI sense codes or a login rejection. For additional details, see the VMware knowledge base article at http://kb.vmware.com/kb/2004684. 5.14.2 Performing Planned Storage Device Removal When a storage device is malfunctioning, you can avoid permanent device loss (PDL) conditions or all paths down (APD) conditions and perform a planned removal and reconnection of a storage device. Planned device removal is an intentional disconnection of a storage device. You might also plan to remove a device for such reasons as upgrading your hardware or reconfiguring your storage devices. When you perform an orderly removal and reconnection of a storage device, you complete a number of tasks. Migrate virtual machines from the device you plan to detach. See the vCenter Server and Host Management documentation. Unmount the datastore deployed on the device. See Unmount VMFS or NFS Datastores. Detach the storage device. See Detach Storage Devices. For an iSCSI device with a single LUN per target, delete the static target entry from each iSCSI HBA that has a path to the storage device. See Remove Static Targets in the vSphere Web Client. Perform any necessary reconfiguration of the storage device by using the array console. Reattach the storage device. See Attach Storage Devices. Mount the datastore and restart the virtual machines. See Mount VMFS 5.14.3 Handling Transient APD Conditions A storage device is considered to be in the all paths down (APD) state when it becomes unavailable to your ESXi host for an unspecified period of time. The reasons for an APD state can be, for example, a failed switch or a disconnected storage cable. In contrast with the permanent device loss (PDL) state, the host treats the APD state as transient and expects the device to be available again. The host indefinitely continues to retry issued commands in an attempt to reestablish connectivity with the device. If the host's commands fail the retries for a prolonged period of time, the host and its virtual machines might be at risk of having performance problems and potentially becoming unresponsive. To avoid these problems, your host uses a default APD handling feature. When a device enters the APD state, the system immediately turns on a timer and allows your host to continue retrying nonvirtual machine commands for a limited time period. By default, the APD timeout is set to 140 seconds, which is typically longer than most devices need to recover from a connection loss. If the device becomes available within this time, the host and its virtual machine continue to run without experiencing any problems. If the device does not recover and the timeout ends, the host stops its attempts at retries and terminates any nonvirtual machine I/O. Virtual machine I/O will continue retrying. The vSphere Web Client displays the following information for the device with the expired APD timeout: The operational state of the device changes to Dead or Error. All paths are shown as Dead. Datastores on the device are dimmed. 
Even though the device and datastores are unavailable, virtual machines remain responsive. You can power off the virtual machines or migrate them to a different datastore or host. If one or more device paths later become operational, subsequent I/O to the device is issued normally and all special APD treatment ends.
5.14.3.1 Disable Storage APD Handling
The storage all paths down (APD) handling on your ESXi host is enabled by default. When it is enabled, the host continues to retry nonvirtual machine I/O commands to a storage device in the APD state for a limited time period. When the time period expires, the host stops its retry attempts and terminates any nonvirtual machine I/O. You can disable the APD handling feature on your host. If you disable the APD handling, the host will indefinitely continue to retry issued commands in an attempt to reconnect to the APD device. Continuing to retry is the same behavior as in ESXi version 5.0. This behavior might cause virtual machines on the host to exceed their internal I/O timeout and become unresponsive or fail. The host might become disconnected from vCenter Server.
Procedure
1. Browse to the host in the vSphere Web Client navigator.
2. Click the Manage tab, and click Settings.
3. Under System, click Advanced System Settings.
4. Under Advanced System Settings, select the Misc.APDHandlingEnable parameter and click the Edit icon.
5. Change the value to 0.
If you disabled the APD handling, you can reenable it when a device enters the APD state. The internal APD handling feature turns on immediately and the timer starts with the current timeout value for each device in APD.
5.14.3.2 Change Timeout Limits for Storage APD
The timeout parameter controls how many seconds the ESXi host will retry nonvirtual machine I/O commands to a storage device in an all paths down (APD) state. If needed, you can change the default timeout value. The timer starts immediately after the device enters the APD state. When the timeout expires, the host marks the APD device as unreachable and fails any pending or new nonvirtual machine I/O. Virtual machine I/O will continue to be retried. The default timeout parameter on your host is 140 seconds. You can increase the value of the timeout if, for example, storage devices connected to your ESXi host take longer than 140 seconds to recover from a connection loss.
Note: If you change the timeout value while an APD is in progress, it will not affect the timeout for that APD.
Procedure
1. Browse to the host in the vSphere Web Client navigator.
2. Click the Manage tab, and click Settings.
3. Under System, click Advanced System Settings.
4. Under Advanced System Settings, select the Misc.APDTimeout parameter and click the Edit icon.
5. Change the default value. You can enter a value between 20 and 99999 seconds.
5.14.4 Check the Connection Status of a Storage Device
Use the esxcli command to verify the connection status of a particular storage device. In the procedure, --server=server_name specifies the target server. The specified target server prompts you for a user name and password. Other connection options, such as a configuration file or session file, are supported. For a list of connection options, see Getting Started with vSphere Command-Line Interfaces.
Prerequisites
Install vCLI or deploy the vSphere Management Assistant (vMA) virtual machine. See Getting Started with vSphere Command-Line Interfaces. For troubleshooting, run esxcli commands in the ESXi Shell.
Procedure
1. Run the esxcli --server=server_name storage core device list -d=device_ID command.
2. Check the connection status in the Status: field.
o on - Device is connected.
o dead - Device has entered the APD state. The APD timer starts.
o dead timeout - The APD timeout has expired.
o not connected - Device is in the PDL state.
5.14.5 PDL Conditions and High Availability
When a datastore enters a Permanent Device Loss (PDL) state, High Availability (HA) can power off virtual machines located on the datastore and then restart them on an available datastore. VMware offers advanced options to regulate the power off and restart operations for virtual machines.
Advanced Parameters to Regulate PDL
disk.terminateVMOnPDLDefault: When set to true, this option enables default power off for all virtual machines on the ESXi host.
scsi0:0.terminateVMOnPDL: Power off parameter that you can set for each individual virtual machine. This option overrides disk.terminateVMOnPDLDefault.
das.maskCleanShutdownEnabled: This option is set to true by default. It allows HA to restart virtual machines that were powered off while the PDL condition was in progress. When this option is set to true, HA restarts all virtual machines, including those that were intentionally powered off by a user.
5.15 Best Practices for SSD Devices
Follow these best practices when you use SSD devices in a vSphere environment.
Use datastores that are created on SSD storage devices to allocate space for ESXi host cache. For more information, see the vSphere Resource Management documentation.
Make sure to use the latest firmware with SSD devices. Frequently check with your storage vendors for any updates.
Carefully monitor how intensively you use the SSD device and calculate its estimated lifetime. The lifetime expectancy depends on how actively you continue to use the SSD device.
5.15.1 Estimate SSD Lifetime
When working with SSDs, monitor how actively you use them and calculate their estimated lifetime. Typically, storage vendors provide reliable lifetime estimates for an SSD under ideal conditions. For example, a vendor might guarantee a lifetime of 5 years under the condition of 20GB writes per day. However, the more realistic life expectancy of the SSD will depend on how many writes per day your host actually generates. Follow these steps to calculate the lifetime of the SSD.
Procedure
1. Obtain the number of writes on the SSD by running the esxcli storage core device stats get -d=device_ID command. The Write Operations item in the output shows the number. You can average this number over a period of time.
2. Estimate the lifetime of your SSD by using the following formula:
estimated lifetime = (vendor-provided writes per day x vendor-provided life span) / actual average writes per day
For example, if your vendor guarantees a lifetime of 5 years under the condition of 20GB writes per day, and the actual number of writes per day is 30GB, the life span of your SSD will be approximately 3.3 years.
5.15.2 How VMFS5 Differs from VMFS3
VMFS5 provides many improvements in scalability and performance over the previous version. VMFS5 has the following improvements:
Greater than 2TB storage devices for each VMFS extent.
Increased resource limits such as file descriptors.
Standard 1MB file system block size with support of 2TB virtual disks.
Greater than 2TB disk size for RDMs in physical compatibility mode.
Support of small files of 1KB.
With ESXi 5.1, any file located on a VMFS5 datastore, new or upgraded from VMFS3, can be opened in a shared mode by a maximum of 32 hosts. VMFS3 continues to support 8 hosts or fewer for file sharing. This affects VMware products that use linked clones, such as View Manager. Scalability improvements on storage devices that support hardware acceleration. For information, see Storage Hardware Acceleration. Default use of hardware assisted locking, also called atomic test and set (ATS) locking, on storage devices that support hardware acceleration. For information about how to turn off ATS locking, see Turn off ATS Locking. Ability to reclaim physical storage space on thin provisioned storage devices. For information, see Array Thin Provisioning and VMFS Datastores. Online upgrade process that upgrades existing datastores without disrupting hosts or virtual machines that are currently running. For information, see Upgrading VMFS Datastores. For information about block size limitations of a VMFS datastore, see the VMware knowledge base article at http://kb.vmware.com/kb/1003565. 5.15.3 VMFS Datastores and Storage Disk Formats Storage devices that your host supports can use either the master boot record (MBR) format or the GUID partition table (GPT) format. With ESXi 5.0 and later, if you create a new VMFS5 datastore, the device is formatted with GPT. The GPT format enables you to create datastores larger than 2TB and up to 64TB for a single extent. VMFS3 datastores continue to use the MBR format for their storage devices. Consider the following items when you work with VMFS3 datastores: For VMFS3 datastores, the 2TB limit still applies, even when the storage device has a capacity of more than 2TB. To be able to use the entire storage space, upgrade a VMFS3 datastore to VMFS5. Conversion of the MBR format to GPT happens only after you expand the datastore to a size larger than 2TB. When you upgrade a VMFS3 datastore to VMFS5, the datastore uses the MBR format. Conversion to GPT happens only after you expand the datastore to a size larger than 2TB. When you upgrade a VMFS3 datastore, remove from the storage device any partitions that ESXi does not recognize, for example, partitions that use the EXT2 or EXT3 formats. Otherwise, the host cannot format the device with GPT and the upgrade fails. You cannot expand a VMFS3 datastore on devices that have the GPT partition format. 5.15.4 VMFS Datastores as Repositories ESXi can format SCSI-based storage devices as VMFS datastores. VMFS datastores primarily serve as repositories for virtual machines. With VMFS5, you can have up to 256 VMFS datastores per host, with the maximum size of 64TB. The required minimum size for a VMFS datastore is 1.3GB, however, the recommended minimum size is 2GB. Note Always have only one VMFS datastore for each LUN. You can store multiple virtual machines on the same VMFS datastore. Each virtual machine, encapsulated in a set of files, occupies a separate single directory. For the operating system inside the virtual machine, VMFS preserves the internal file system semantics, which ensures correct application behavior and data integrity for applications running in virtual machines. When you run multiple virtual machines, VMFS provides specific locking mechanisms for virtual machine files, so that virtual machines can operate safely in a SAN environment where multiple ESXi hosts share the same VMFS datastore. 
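To see at a glance which VMFS version and how much capacity each datastore uses, and whether the underlying device carries an MBR (msdos) or GPT partition table, you can run the following commands from the ESXi Shell or vCLI. This is only an illustrative sketch, and the device identifier shown is a placeholder:
esxcli storage filesystem list
partedUtil getptbl /vmfs/devices/disks/naa.60000000000000000000000000000001
The first command lists each mounted file system with its type (for example, VMFS-5), size, and free space; the second prints the partition table type (gpt or msdos) of the specified device.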
In addition to virtual machines, the VMFS datastores can store other files, such as virtual machine templates and ISO images.
5.15.5 VMFS Metadata Updates
A VMFS datastore holds virtual machine files, directories, symbolic links, RDM descriptor files, and so on. The datastore also maintains a consistent view of all the mapping information for these objects. This mapping information is called metadata. Metadata is updated each time you perform datastore or virtual machine management operations. Examples of operations requiring metadata updates include the following:
Creating, growing, or locking a virtual machine file
Changing a file's attributes
Powering a virtual machine on or off
Creating or deleting a VMFS datastore
Expanding a VMFS datastore
Creating a template
Deploying a virtual machine from a template
Migrating a virtual machine with vMotion
When metadata changes are made in a shared storage environment, VMFS uses special locking mechanisms to protect its data and prevent multiple hosts from concurrently writing to the metadata.
5.15.6 VMFS Locking Mechanisms
In a shared storage environment, when multiple hosts access the same VMFS datastore, specific locking mechanisms are used. These locking mechanisms prevent multiple hosts from concurrently writing to the metadata and ensure that no data corruption occurs. VMFS supports SCSI reservations and atomic test and set (ATS) locking.
5.15.6.1 SCSI Reservations
VMFS uses SCSI reservations on storage devices that do not support hardware acceleration. SCSI reservations lock an entire storage device while an operation that requires metadata protection is performed. After the operation completes, VMFS releases the reservation and other operations can continue. Because this lock is exclusive, excessive SCSI reservations by a host can cause performance degradation on other hosts that are accessing the same VMFS. For information about how to reduce SCSI reservations, see the vSphere Troubleshooting documentation.
5.15.6.2 Atomic Test and Set (ATS)
For storage devices that support hardware acceleration, VMFS uses the ATS algorithm, also called hardware assisted locking. In contrast with SCSI reservations, ATS supports discrete locking per disk sector. For information about hardware acceleration, see Storage Hardware Acceleration. Mechanisms that VMFS uses to apply different types of locking depend on the VMFS version.
Use of ATS Locking on Devices with Hardware Acceleration Support
Single extent: new VMFS5 - ATS only; upgraded VMFS5 - ATS, but can revert to SCSI reservations; VMFS3 - ATS, but can revert to SCSI reservations.
Multiple extents: new VMFS5 - spans only over ATS-capable devices; upgraded VMFS5 - ATS except when locks on non-head; VMFS3 - ATS except when locks on non-head.
In certain cases, you might need to turn off the ATS-only setting for a new VMFS5 datastore. For information, see Turn off ATS Locking.
5.15.6.3 Turn off ATS Locking
When you create a VMFS5 datastore on a device that supports atomic test and set (ATS) locking, the datastore is set to the ATS-only mode. In certain circumstances, you might need to turn off the ATS mode setting. Turn off the ATS setting when, for example, your storage device is downgraded or firmware updates fail and the device no longer supports hardware acceleration. The option that you use to turn off the ATS setting is available only through the ESXi Shell. For more information, see the Getting Started with vSphere Command-Line Interfaces.
Procedure
1. To turn off the ATS setting, run the following command:
vmkfstools --configATSOnly 0 device
The device parameter is the path to the head extent device on which VMFS5 was deployed. Use the following format: /vmfs/devices/disks/disk_ID:P
5.15.7 Using Layer 3 Routed Connections to Access NFS Storage
When you use Layer 3 (L3) routed connections to access NFS storage, consider certain requirements and restrictions. Ensure that your environment meets the following requirements:
Use Cisco's Hot Standby Router Protocol (HSRP) on the IP router. If you are using a non-Cisco router, be sure to use Virtual Router Redundancy Protocol (VRRP) instead.
Use Quality of Service (QoS) to prioritize NFS L3 traffic on networks with limited bandwidths, or on networks that experience congestion. See your router documentation for details.
Follow Routed NFS L3 best practices recommended by your storage vendor. Contact your storage vendor for details.
Disable Network I/O Resource Management (NetIORM).
If you are planning to use systems with top-of-rack switches or switch-dependent I/O device partitioning, contact your system vendor for compatibility and support.
In an L3 environment the following restrictions apply:
The environment does not support VMware Site Recovery Manager.
The environment supports only the NFS protocol. Do not use other storage protocols such as FCoE over the same physical network.
The NFS traffic in this environment does not support IPv6.
The NFS traffic in this environment can be routed only over a LAN. Other environments such as WAN are not supported.
The environment does not support Distributed Virtual Switch (DVS).
5.16 Upgrading VMFS Datastores
If your datastores were formatted with VMFS2 or VMFS3, you can upgrade the datastores to VMFS5. When you perform datastore upgrades, consider the following items:
To upgrade a VMFS2 datastore, you use a two-step process that involves upgrading VMFS2 to VMFS3 first. Because ESXi 5.0 and later hosts cannot access VMFS2 datastores, use a legacy host, ESX/ESXi 4.x or earlier, to access the VMFS2 datastore and perform the VMFS2 to VMFS3 upgrade. After you upgrade your VMFS2 datastore to VMFS3, the datastore becomes available on the ESXi 5.x host, where you complete the process of upgrading to VMFS5.
You can perform a VMFS3 to VMFS5 upgrade while the datastore is in use with virtual machines powered on.
While performing an upgrade, your host preserves all files on the datastore.
The datastore upgrade is a one-way process. After upgrading your datastore, you cannot revert it back to its previous VMFS format.
An upgraded VMFS5 datastore differs from a newly formatted VMFS5.
Comparing Upgraded and Newly Formatted VMFS5 Datastores
File block size: upgraded VMFS5 - 1, 2, 4, and 8MB; newly formatted VMFS5 - 1MB.
Subblock size: upgraded VMFS5 - 64KB; newly formatted VMFS5 - 8KB.
Partition format: upgraded VMFS5 - MBR, with conversion to GPT only after you expand the datastore to a size larger than 2TB; newly formatted VMFS5 - GPT.
Datastore limits: upgraded VMFS5 - retains the limits of the VMFS3 datastore.
5.16.1 Increase VMFS Datastore Capacity in the vSphere Client
When you need to create virtual machines on a datastore, or when the virtual machines running on a datastore require more space, you can dynamically increase the capacity of a VMFS datastore. Use one of the following methods to increase a VMFS datastore:
Add a new extent. An extent is a partition on a storage device. You can add up to 32 extents of the same storage type to an existing VMFS datastore. The spanned VMFS datastore can use any or all of its extents at any time.
It does not need to fill up a particular extent before using the next one.
Grow an extent in an existing VMFS datastore, so that it fills the available adjacent capacity. Only extents with free space immediately after them are expandable.
Note: If a shared datastore has powered on virtual machines and becomes 100% full, you can increase the datastore's capacity only from the host with which the powered on virtual machines are registered.
5.17 Set Up Dynamic Disk Mirroring
Typically, you cannot use logical-volume manager software on virtual machines to mirror virtual disks. However, if your Microsoft Windows virtual machines support dynamic disks, you can protect the virtual machines from an unplanned storage device loss by mirroring virtual disks across two SAN LUNs.
Prerequisites
Use a Windows virtual machine that supports dynamic disks.
Required privilege: Advanced
Procedure
1. Create a virtual machine with two virtual disks. Make sure to place the disks on different datastores.
2. Log in to your virtual machine and configure the disks as dynamic mirrored disks. See Microsoft documentation.
3. After the disks synchronize, power off the virtual machine.
4. Change virtual machine settings to allow the use of dynamic disk mirroring.
a. Right-click the virtual machine and select Edit Settings.
b. Click the VM Options tab and expand the Advanced menu.
c. Click Edit Configuration next to Configuration Parameters.
d. Click Add Row and add the following parameters:
scsi#.returnNoConnectDuringAPD = True
scsi#.returnBusyOnNoConnectStatus = False
e. Click OK.
5.18 Creating a Diagnostic Partition
To run successfully, your host must have a diagnostic partition or a dump partition to store core dumps for debugging and technical support. Typically, a local diagnostic partition is created during ESXi installation. You can override this default behavior if, for example, you use shared storage devices instead of local storage. To prevent automatic disk formatting, detach the local storage devices from the host before you install ESXi and power on the host for the first time. You can later create a diagnostic partition on a local disk or on a private or shared SAN LUN using the client. The following considerations apply:
A diagnostic partition cannot be located on an iSCSI LUN accessed through the software iSCSI or dependent hardware iSCSI adapter. For more information about diagnostic partitions with iSCSI, see General Boot from iSCSI SAN Recommendations.
Unless you are using diskless servers, set up a diagnostic partition on local storage.
Each host must have a diagnostic partition of 110MB. If multiple hosts share a diagnostic partition on a SAN LUN, the partition should be large enough to accommodate core dumps of all hosts.
If a host that uses a shared diagnostic partition fails, reboot the host and extract log files immediately after the failure. Otherwise, the second host that fails before you collect the diagnostic data of the first host might not be able to save the core dump.
To manage the host's diagnostic partition, use the vCLI commands. See vSphere Command-Line Interface Concepts and Examples.
5.18.1 Create a Diagnostic Partition in the vSphere Client
You can create a diagnostic partition for your host.
Procedure
1. Log in to the vSphere Client and select the host from the Inventory panel.
2. Click the Configuration tab and click Storage in the Hardware panel.
3. Click Datastores and click Add Storage.
4. Select Diagnostic and click Next.
If you do not see Diagnostic as an option, the host already has a diagnostic partition.
5. Specify the type of diagnostic partition.
5.19 About Raw Device Mapping
An RDM is a mapping file in a separate VMFS volume that acts as a proxy for a raw physical storage device. The RDM allows a virtual machine to directly access and use the storage device. The RDM contains metadata for managing and redirecting disk access to the physical device. The file gives you some of the advantages of direct access to a physical device while keeping some advantages of a virtual disk in VMFS. As a result, it merges VMFS manageability with raw device access. RDMs can be described in terms such as mapping a raw device into a datastore, mapping a system LUN, or mapping a disk file to a physical disk volume. All these terms refer to RDMs.
Although VMware recommends that you use VMFS datastores for most virtual disk storage, on certain occasions, you might need to use raw LUNs or logical disks located in a SAN. For example, you need to use raw LUNs with RDMs in the following situations:
When SAN snapshot or other layered applications run in the virtual machine. The RDM better enables scalable backup offloading systems by using features inherent to the SAN.
In any MSCS clustering scenario that spans physical hosts, including virtual-to-virtual clusters and physical-to-virtual clusters. In this case, cluster data and quorum disks should be configured as RDMs rather than as virtual disks on a shared VMFS.
Think of an RDM as a symbolic link from a VMFS volume to a raw LUN. The mapping makes LUNs appear as files in a VMFS volume. The RDM, not the raw LUN, is referenced in the virtual machine configuration. The RDM contains a reference to the raw LUN. Using RDMs, you can:
Use vMotion to migrate virtual machines using raw LUNs.
Add raw LUNs to virtual machines using the vSphere Client or the vSphere Web Client.
Use file system features such as distributed file locking, permissions, and naming.
Two compatibility modes are available for RDMs: Virtual compatibility mode allows an RDM to act exactly like a virtual disk file, including the use of snapshots. Physical compatibility mode allows direct access to the SCSI device for applications that require lower-level control.
5.19.1 RDM Considerations and Limitations
Certain considerations and limitations exist when you use RDMs.
The RDM is not available for direct-attached block devices or certain RAID devices. The RDM uses a SCSI serial number to identify the mapped device. Because block devices and some direct-attach RAID devices do not export serial numbers, they cannot be used with RDMs.
If you are using the RDM in physical compatibility mode, you cannot use a snapshot with the disk. Physical compatibility mode allows the virtual machine to manage its own, storage-based, snapshot or mirroring operations. Virtual machine snapshots are available for RDMs with virtual compatibility mode.
You cannot map to a disk partition. RDMs require the mapped device to be a whole LUN.
If you use vMotion to migrate virtual machines with RDMs, make sure to maintain consistent LUN IDs for RDMs across all participating ESXi hosts.
5.20 Raw Device Mapping Characteristics
An RDM is a special mapping file in a VMFS volume that manages metadata for its mapped device. The mapping file is presented to the management software as an ordinary disk file, available for the usual file-system operations. To the virtual machine, the storage virtualization layer presents the mapped device as a virtual SCSI device.
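The mapping file described here can also be created directly from the ESXi Shell with vmkfstools; -r creates a virtual compatibility mode RDM and -z a physical compatibility (pass-through) RDM. This is a sketch only, and the device ID, datastore, and file names are placeholders; in most environments the same result is achieved by adding an RDM disk to the virtual machine in the vSphere Client or vSphere Web Client:
vmkfstools -r /vmfs/devices/disks/naa.60000000000000000000000000000002 /vmfs/volumes/datastore1/vm1/vm1-rdm.vmdk
vmkfstools -z /vmfs/devices/disks/naa.60000000000000000000000000000002 /vmfs/volumes/datastore1/vm1/vm1-rdmp.vmdk
Either way, the resulting mapping file holds the metadata described next.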
Key contents of the metadata in the mapping file include the location of the mapped device (name resolution), the locking state of the mapped device, permissions, and so on.
5.20.1 RDM Virtual and Physical Compatibility Modes
You can use RDMs in virtual compatibility or physical compatibility modes. Virtual mode specifies full virtualization of the mapped device. Physical mode specifies minimal SCSI virtualization of the mapped device, allowing the greatest flexibility for SAN management software.
In virtual mode, the VMkernel sends only READ and WRITE to the mapped device. The mapped device appears to the guest operating system exactly the same as a virtual disk file in a VMFS volume. The real hardware characteristics are hidden. If you are using a raw disk in virtual mode, you can realize the benefits of VMFS such as advanced file locking for data protection and snapshots for streamlining development processes. Virtual mode is also more portable across storage hardware than physical mode, presenting the same behavior as a virtual disk file.
In physical mode, the VMkernel passes all SCSI commands to the device, with one exception: the REPORT LUNs command is virtualized so that the VMkernel can isolate the LUN to the owning virtual machine. Otherwise, all physical characteristics of the underlying hardware are exposed. Physical mode is useful to run SAN management agents or other SCSI target-based software in the virtual machine. Physical mode also allows virtual-to-physical clustering for cost-effective high availability. VMFS5 supports greater than 2TB disk size for RDMs in physical compatibility mode only. The following restrictions apply:
You cannot relocate larger than 2TB RDMs to datastores other than VMFS5.
You cannot convert larger than 2TB RDMs to virtual disks, or perform other operations that involve RDM to virtual disk conversion. Such operations include cloning.
Features Available with Virtual Disks and Raw Device Mappings
SCSI Commands Passed Through: virtual disk file - No; virtual mode RDM - No; physical mode RDM - Yes (REPORT LUNs is not passed through).
vCenter Server Support: virtual disk file - Yes; virtual mode RDM - Yes; physical mode RDM - Yes.
Snapshots: virtual disk file - Yes; virtual mode RDM - Yes; physical mode RDM - No.
Distributed Locking: virtual disk file - Yes; virtual mode RDM - Yes; physical mode RDM - Yes.
Clustering: virtual disk file - cluster-in-a-box only; virtual mode RDM - cluster-in-a-box, cluster-across-boxes; physical mode RDM - physical-to-virtual clustering, cluster-across-boxes.
SCSI Target-Based Software: virtual disk file - No; virtual mode RDM - No; physical mode RDM - Yes.
VMware recommends that you use virtual disk files for the cluster-in-a-box type of clustering. If you plan to reconfigure your cluster-in-a-box clusters as cluster-across-boxes clusters, use virtual mode RDMs for the cluster-in-a-box clusters.
Disk mode options:
Persistent: Changes are immediately and permanently written to the disk.
Nonpersistent: Changes to the disk are discarded when you power off or revert to the snapshot.
5.21 VMkernel and Storage
The VMkernel is a high-performance operating system that runs directly on the ESXi host. The VMkernel manages most of the physical resources on the hardware, including memory, physical processors, storage, and networking controllers. To manage storage, VMkernel has a storage subsystem that supports several Host Bus Adapters (HBAs) including parallel SCSI, SAS, Fibre Channel, FCoE, and iSCSI. These HBAs connect a wide variety of active-active, active-passive, and ALUA storage arrays that are certified for use with the VMkernel. See the vSphere Compatibility Guide for a list of the supported HBAs and storage arrays. The primary file system that the VMkernel uses is the VMware Virtual Machine File System (VMFS).
VMFS is a cluster file system designed and optimized to support large files such as virtual disks and swap files. The VMkernel also supports the storage of virtual disks on NFS file systems. The storage I/O path provides virtual machines with access to storage devices through device emulation. This device emulation allows a virtual machine to access files on a VMFS or NFS file system as if they were SCSI devices. The VMkernel provides storage virtualization functions such as the scheduling of I/O requests from multiple virtual machines and multipathing. In addition, VMkernel offers several Storage APIs that enable storage partners to integrate and optimize their products for vSphere. The following graphic illustrates the basics of the VMkernel core, with special attention to the storage stack. Storage‐related modules reside between the logical device I/O scheduler and the adapter I/O scheduler layers. VMkernel and Storage This chapter includes the following topics: Storage APIs 5.21.1 Storage APIs Storage APIs is a family of APIs used by third-party hardware, software, and storage providers to develop components that enhance several vSphere features and solutions. This publication describes the following sets of Storage APIs and explains how they contribute to your storage environment. For information about other APIs from this family, including Storage API - Data Protection and Storage API - Site Recovery Manager, see the VMware Web site. Storage APIs - Multipathing, also known as the Pluggable Storage Architecture (PSA). PSA is a collection of VMkernel APIs that allows storage partners to enable and certify their arrays asynchronous to ESXi release schedules, as well as deliver performance‐enhancing, multipathing and load‐balancing behaviors that are optimized for each array. For more information, see Managing Multiple Paths. Storage APIs - Array Integration, formerly known as VAAI, include the following APIs: o Hardware Acceleration APIs. Allows arrays to integrate with vSphere to transparently offload certain storage operations to the array. This integration significantly reduces CPU overhead on the host. See Storage Hardware Acceleration. o Array Thin Provisioning APIs. Help to monitor space use on thin-provisioned storage arrays to prevent out-of-space conditions, and to perform space reclamation. See Array Thin Provisioning and VMFS Datastores. Storage APIs - Storage Awareness. These vCenter Server-based APIs enable storage arrays to inform the vCenter Server about their configurations, capabilities, and storage health and events. See Using Storage Vendor Providers. 5.22 Understanding Multipathing and Failover 5.22.1 Host-Based Failover with iSCSI When setting up your ESXi host for multipathing and failover, you can use multiple iSCSI HBAs or multiple NICs depending on the type of iSCSI adapters on your host. For information on different types of iSCSI adapters, see iSCSI Initiators. When you use multipathing, specific considerations apply. ESXi does not support multipathing when you combine an independent hardware adapter with software iSCSI or dependent iSCSI adapters in the same host. Multipathing between software and dependent adapters within the same host is supported. On different hosts, you can mix both dependent and independent adapters. The following illustration shows multipathing setups possible with different types of iSCSI initiators. 
Host-Based Path Failover 5.22.2 Failover with Hardware iSCSI With hardware iSCSI, the host typically has two or more hardware iSCSI adapters available, from which the storage system can be reached using one or more switches. Alternatively, the setup might include one adapter and two storage processors so that the adapter can use a different path to reach the storage system. On the Host-Based Path Failover illustration, Host1 has two hardware iSCSI adapters, HBA1 and HBA2, that provide two physical paths to the storage system. Multipathing plug-ins on your host, whether the VMkernel NMP or any third-party MPPs, have access to the paths by default and can monitor health of each physical path. If, for example, HBA1 or the link between HBA1 and the network fails, the multipathing plug-ins can switch the path over to HBA2. 5.22.3 Failover with Software iSCSI With software iSCSI, as shown on Host 2 of the Host-Based Path Failover illustration, you can use multiple NICs that provide failover and load balancing capabilities for iSCSI connections between your host and storage systems. For this setup, because multipathing plug-ins do not have direct access to physical NICs on your host, you first need to connect each physical NIC to a separate VMkernel port. You then associate all VMkernel ports with the software iSCSI initiator using a port binding technique. As a result, each VMkernel port connected to a separate NIC becomes a different path that the iSCSI storage stack and its storage-aware multipathing plug-ins can use. For information on how to configure multipathing for software iSCSI, see Setting Up iSCSI Network. 5.23 Array-Based Failover with iSCSI Some iSCSI storage systems manage path use of their ports automatically and transparently to ESXi. When using one of these storage systems, your host does not see multiple ports on the storage and cannot choose the storage port it connects to. These systems have a single virtual port address that your host uses to initially communicate. During this initial communication, the storage system can redirect the host to communicate with another port on the storage system. The iSCSI initiators in the host obey this reconnection request and connect with a different port on the system. The storage system uses this technique to spread the load across available ports. If the ESXi host loses connection to one of these ports, it automatically attempts to reconnect with the virtual port of the storage system, and should be redirected to an active, usable port. This reconnection and redirection happens quickly and generally does not disrupt running virtual machines. These storage systems can also request that iSCSI initiators reconnect to the system, to change which storage port they are connected to. This allows the most effective use of the multiple ports. The Port Redirection illustration shows an example of port redirection. The host attempts to connect to the 10.0.0.1 virtual port. The storage system redirects this request to 10.0.0.2. The host connects with 10.0.0.2 and uses this port for I/O communication. Note The storage system does not always redirect connections. The port at 10.0.0.1 could be used for traffic, also. Port Redirection If the port on the storage system that is acting as the virtual port becomes unavailable, the storage system reassigns the address of the virtual port to another port on the system. Port Reassignment shows an example of this type of port reassignment. 
In this case, the virtual port 10.0.0.1 becomes unavailable and the storage system reassigns the virtual port IP address to a different port. The second port responds to both addresses.
Port Reassignment
With this form of array-based failover, you can have multiple paths to the storage only if you use multiple ports on the ESXi host. These paths are active-active. For additional information, see iSCSI Session Management.
5.24 Path Failover and Virtual Machines
Path failover occurs when the active path to a LUN is changed from one path to another, usually because of a SAN component failure along the current path. When a path fails, storage I/O might pause for 30 to 60 seconds until your host determines that the link is unavailable and completes failover. If you attempt to display the host, its storage devices, or its adapters, the operation might appear to stall. Virtual machines with their disks installed on the SAN can appear unresponsive. After failover is complete, I/O resumes normally and the virtual machines continue to run. However, when failovers take a long time to complete, a Windows virtual machine might interrupt the I/O and eventually fail. To avoid the failure, set the disk timeout value for the Windows virtual machine to at least 60 seconds.
5.24.1 Set Timeout on Windows Guest OS
Increase the standard disk timeout value on a Windows guest operating system to avoid disruptions during a path failover. This procedure explains how to change the timeout value by using the Windows registry.
Prerequisites
Back up the Windows registry.
Procedure
1. Select Start > Run.
2. Type regedit.exe, and click OK.
3. In the left-panel hierarchy view, double-click HKEY_LOCAL_MACHINE > System > CurrentControlSet > Services > Disk.
4. Double-click TimeOutValue.
5. Set the value data to 0x3c (hexadecimal) or 60 (decimal) and click OK. After you make this change, Windows waits at least 60 seconds for delayed disk operations to complete before it generates errors.
6. Reboot the guest OS for the change to take effect.
5.25 Managing Multiple Paths
To manage storage multipathing, ESXi uses a collection of Storage APIs, also called the Pluggable Storage Architecture (PSA). The PSA is an open, modular framework that coordinates the simultaneous operation of multiple multipathing plug-ins (MPPs). The PSA allows third-party software developers to design their own load balancing techniques and failover mechanisms for a particular storage array, and insert their code directly into the ESXi storage I/O path.
Topics discussing path management use the following acronyms:
PSA - Pluggable Storage Architecture
NMP - Native Multipathing Plug-In. Generic VMware multipathing module.
PSP - Path Selection Plug-In, also called Path Selection Policy. Handles path selection for a given device.
SATP - Storage Array Type Plug-In, also called Storage Array Type Policy. Handles path failover for a given storage array.
The VMkernel multipathing plug-in that ESXi provides by default is the VMware Native Multipathing Plug-In (NMP). The NMP is an extensible module that manages sub plug-ins. There are two types of NMP sub plug-ins: Storage Array Type Plug-Ins (SATPs) and Path Selection Plug-Ins (PSPs). SATPs and PSPs can be built-in and provided by VMware, or can be provided by a third party. If more multipathing functionality is required, a third party can also provide an MPP to run in addition to, or as a replacement for, the default NMP.
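To see how these pieces fit together on a running host, you can inspect the NMP configuration from the ESXi Shell or vCLI with the following standard command:
esxcli storage nmp device list
For each device claimed by the NMP, the output lists the Storage Array Type (the SATP in use), the Path Selection Policy (the PSP in use), and the working paths, which maps directly onto the plug-in roles described above.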
When coordinating the VMware NMP and any installed third-party MPPs, the PSA performs the following tasks: Loads and unloads multipathing plug-ins. Hides virtual machine specifics from a particular plug-in. Routes I/O requests for a specific logical device to the MPP managing that device. Handles I/O queueing to the logical devices. Implements logical device bandwidth sharing between virtual machines. Handles I/O queueing to the physical storage HBAs. Handles physical path discovery and removal. Provides logical device and physical path I/O statistics. As the Pluggable Storage Architecture illustration shows, multiple third-party MPPs can run in parallel with the VMware NMP. When installed, the third-party MPPs replace the behavior of the NMP and take complete control of the path failover and the load-balancing operations for specified storage devices. Pluggable Storage Architecture The multipathing modules perform the following operations: Manage physical path claiming and unclaiming. Manage creation, registration, and deregistration of logical devices. Associate physical paths with logical devices. Support path failure detection and remediation. Process I/O requests to logical devices: o Select an optimal physical path for the request. o Depending on a storage device, perform specific actions necessary to handle path failures and I/O command retries. Support management tasks, such as reset of logical devices. 5.26VMware Multipathing Module By default, ESXi provides an extensible multipathing module called the Native Multipathing Plug-In (NMP). Generally, the VMware NMP supports all storage arrays listed on the VMware storage HCL and provides a default path selection algorithm based on the array type. The NMP associates a set of physical paths with a specific storage device, or LUN. The specific details of handling path failover for a given storage array are delegated to a Storage Array Type Plug-In (SATP). The specific details for determining which physical path is used to issue an I/O request to a storage device are handled by a Path Selection Plug-In (PSP). SATPs and PSPs are sub plugins within the NMP module. With ESXi, the appropriate SATP for an array you use will be installed automatically. You do not need to obtain or download any SATPs. 5.26.1 VMware SATPs Storage Array Type Plug-Ins (SATPs) run in conjunction with the VMware NMP and are responsible for array-specific operations. ESXi offers a SATP for every type of array that VMware supports. It also provides default SATPs that support non-specific active-active and ALUA storage arrays, and the local SATP for direct-attached devices. Each SATP accommodates special characteristics of a certain class of storage arrays and can perform the array-specific operations required to detect path state and to activate an inactive path. As a result, the NMP module itself can work with multiple storage arrays without having to be aware of the storage device specifics. After the NMP determines which SATP to use for a specific storage device and associates the SATP with the physical paths for that storage device, the SATP implements the tasks that include the following: Monitors the health of each physical path. Reports changes in the state of each physical path. Performs array-specific actions necessary for storage fail-over. For example, for active-passive devices, it can activate passive paths. 
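Before changing any defaults, it can help to review which SATPs are installed on the host and which default PSP each one carries; the command below is standard, and any change to a default should follow your array vendor's guidance:
esxcli storage nmp satp list
The output shows one row per SATP, such as VMW_SATP_ALUA or VMW_SATP_LOCAL, together with its default PSP and a short description.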
5.26.2 VMware PSPs
Path Selection Plug-Ins (PSPs) are sub plug-ins of the VMware NMP and are responsible for choosing a physical path for I/O requests. The VMware NMP assigns a default PSP for each logical device based on the SATP associated with the physical paths for that device. You can override the default PSP. For information, see Path Scanning and Claiming. By default, the VMware NMP supports the following PSPs:
VMW_PSP_MRU - The host selects the path that it used most recently. When the path becomes unavailable, the host selects an alternative path. The host does not revert back to the original path when that path becomes available again. There is no preferred path setting with the MRU policy. MRU is the default policy for most active-passive storage devices. The VMW_PSP_MRU ranking capability allows you to assign ranks to individual paths. To set ranks to individual paths, use the esxcli storage nmp psp generic pathconfig set command. For details, see the VMware knowledge base article at http://kb.vmware.com/kb/2003468. The policy is displayed in the client as the Most Recently Used (VMware) path selection policy.
VMW_PSP_FIXED - The host uses the designated preferred path, if it has been configured. Otherwise, it selects the first working path discovered at system boot time. If you want the host to use a particular preferred path, specify it manually. Fixed is the default policy for most active-active storage devices. Note: If the host uses a default preferred path and the path's status turns to Dead, a new path is selected as preferred. However, if you explicitly designate the preferred path, it will remain preferred even when it becomes inaccessible. Displayed in the client as the Fixed (VMware) path selection policy.
VMW_PSP_RR - The host uses an automatic path selection algorithm rotating through all active paths when connecting to active-passive arrays, or through all available paths when connecting to active-active arrays. RR is the default for a number of arrays and can be used with both active-active and active-passive arrays to implement load balancing across paths for different LUNs. Displayed in the client as the Round Robin (VMware) path selection policy.
5.26.3 VMware NMP Flow of I/O
When a virtual machine issues an I/O request to a storage device managed by the NMP, the following process takes place:
1. The NMP calls the PSP assigned to this storage device.
2. The PSP selects an appropriate physical path on which to issue the I/O.
3. The NMP issues the I/O request on the path selected by the PSP.
4. If the I/O operation is successful, the NMP reports its completion.
5. If the I/O operation reports an error, the NMP calls the appropriate SATP.
6. The SATP interprets the I/O command errors and, when appropriate, activates the inactive paths.
7. The PSP is called to select a new path on which to issue the I/O.
5.27 Path Scanning and Claiming
When you start your ESXi host or rescan your storage adapter, the host discovers all physical paths to storage devices available to the host. Based on a set of claim rules, the host determines which multipathing plug-in (MPP) should claim the paths to a particular device and become responsible for managing the multipathing support for the device. By default, the host performs a periodic path evaluation every 5 minutes causing any unclaimed paths to be claimed by the appropriate MPP. The claim rules are numbered. For each physical path, the host runs through the claim rules starting with the lowest number first.
The attributes of the physical path are compared to the path specification in the claim rule. If there is a match, the host assigns the MPP specified in the claim rule to manage the physical path. This continues until all physical paths are claimed by corresponding MPPs, either third-party multipathing plug-ins or the native multipathing plug-in (NMP). For the paths managed by the NMP module, a second set of claim rules is applied. These rules determine which Storage Array Type Plug-In (SATP) should be used to manage the paths for a specific array type, and which Path Selection Plug-In (PSP) is to be used for each storage device. Use the vSphere Client or the vSphere Web Client to view which SATP and PSP the host is using for a specific storage device and the status of all available paths for this storage device. If needed, you can change the default VMware PSP using the client. To change the default SATP, you need to modify claim rules using the vSphere CLI. You can find some information about modifying claim rules in Managing Storage Paths and Multipathing Plug-Ins. For more information about the commands available to manage PSA, see Getting Started with vSphere Command-Line Interfaces. For a complete list of storage arrays and corresponding SATPs and PSPs, see the SAN Array Model Reference section of the vSphere Compatibility Guide. 5.27.1 Viewing the Paths Information You can review the storage array type policy (SATP) and path selection policy (PSP) that the ESXi host uses for a specific storage device and the status of all available paths for this storage device. You can access the path information from both the Datastores and Devices views. For datastores, you review the paths that connect to the device the datastore is deployed on. The path information includes the SATP assigned to manage the device, the PSP, a list of paths, and the status of each path. The following path status information can appear: Active Paths available for issuing I/O to a LUN. A single or multiple working paths currently used for transferring data are marked as Active (I/O). Standby If active paths fail, the path can quickly become operational and can be used for I/O. Disabled The path is disabled and no data can be transferred. Dead The software cannot connect to the disk through this path. If you are using the Fixed path policy, you can see which path is the preferred path. The preferred path is marked with an asterisk (*) in the Preferred column. For each path you can also display the path's name. The name includes parameters that describe the path: adapter ID, target ID, and device ID. Usually, the path's name has the format similar to the following: fc.adapterID-fc.targetID-naa.deviceID Note When you use the host profiles editor to edit paths, you must specify all three parameters that describe a path, adapter ID, target ID, and device ID 5.28 Managing Storage Paths and Multipathing Plug-Ins Use the esxcli commands to manage the PSA multipathing plug-ins and storage paths assigned to them. You can display all multipathing plug-ins available on your host. You can list any third-party MPPs, as well as your host's NMP and SATPs and review the paths they claim. You can also define new paths and specify which multipathing plug-in should claim the paths. For more information about commands available to manage PSA, see the Getting Started with vSphere Command-Line Interfaces. 5.29Multipathing Considerations Specific considerations apply when you manage storage multipathing plug-ins and claim rules. 
The following considerations help you with multipathing:
If no SATP is assigned to the device by the claim rules, the default SATP for iSCSI or FC devices is VMW_SATP_DEFAULT_AA. The default PSP is VMW_PSP_FIXED.
When the system searches the SATP rules to locate a SATP for a given device, it searches the driver rules first. If there is no match, the vendor/model rules are searched, and finally the transport rules are searched. If no match occurs, NMP selects a default SATP for the device.
If VMW_SATP_ALUA is assigned to a specific storage device, but the device is not ALUA-aware, no claim rule match occurs for this device. The device is claimed by the default SATP based on the device's transport type.
The default PSP for all devices claimed by VMW_SATP_ALUA is VMW_PSP_MRU. The VMW_PSP_MRU selects an active/optimized path as reported by the VMW_SATP_ALUA, or an active/unoptimized path if there is no active/optimized path. This path is used until a better path is available (MRU). For example, if the VMW_PSP_MRU is currently using an active/unoptimized path and an active/optimized path becomes available, the VMW_PSP_MRU will switch the current path to the active/optimized one.
While VMW_PSP_MRU is typically selected for ALUA arrays by default, certain ALUA storage arrays need to use VMW_PSP_FIXED. To check whether your storage array requires VMW_PSP_FIXED, see the VMware Compatibility Guide or contact your storage vendor. When using VMW_PSP_FIXED with ALUA arrays, unless you explicitly specify a preferred path, the ESXi host selects the most optimal working path and designates it as the default preferred path. If the host selected path becomes unavailable, the host selects an alternative available path. However, if you explicitly designate the preferred path, it will remain preferred no matter what its status is.
By default, the PSA claim rule 101 masks Dell array pseudo devices. Do not delete this rule, unless you want to unmask these devices.
5.29.1 List Multipathing Claim Rules for the Host
Use the esxcli command to list available multipathing claim rules. Claim rules indicate which multipathing plug-in, the NMP or any third-party MPP, manages a given physical path. Each claim rule identifies a set of paths based on the following parameters:
Vendor/model strings
Transportation, such as SATA, IDE, Fibre Channel, and so on
Adapter, target, or LUN location
Device driver, for example, Mega-RAID
In the procedure, --server=server_name specifies the target server. The specified target server prompts you for a user name and password. Other connection options, such as a configuration file or session file, are supported. For a list of connection options, see Getting Started with vSphere Command-Line Interfaces.
Prerequisites
Install vCLI or deploy the vSphere Management Assistant (vMA) virtual machine. See Getting Started with vSphere Command-Line Interfaces. For troubleshooting, run esxcli commands in the ESXi Shell.
Procedure
1. Run the esxcli --server=server_name storage core claimrule list --claimrule-class=MP command to list the multipathing claim rules.
5.29.1.1 Example: Sample Output of the esxcli storage core claimrule list Command
Rule Class  Rule   Class    Type       Plugin     Matches
MP          0      runtime  transport  NMP        transport=usb
MP          1      runtime  transport  NMP        transport=sata
MP          2      runtime  transport  NMP        transport=ide
MP          3      runtime  transport  NMP        transport=block
MP          4      runtime  transport  NMP        transport=unknown
MP          101    runtime  vendor     MASK_PATH  vendor=DELL model=Universal Xport
MP          101    file     vendor     MASK_PATH  vendor=DELL model=Universal Xport
MP          200    runtime  vendor     MPP_1      vendor=NewVend model=*
MP          200    file     vendor     MPP_1      vendor=NewVend model=*
MP          201    runtime  location   MPP_2      adapter=vmhba41 channel=* target=* lun=*
MP          201    file     location   MPP_2      adapter=vmhba41 channel=* target=* lun=*
MP          202    runtime  driver     MPP_3      driver=megaraid
MP          202    file     driver     MPP_3      driver=megaraid
MP          65535  runtime  vendor     NMP        vendor=* model=*
This example indicates the following:
The NMP claims all paths connected to storage devices that use the USB, SATA, IDE, and Block SCSI transportation.
You can use the MASK_PATH module to hide unused devices from your host. By default, the PSA claim rule 101 masks Dell array pseudo devices with a vendor string of DELL and a model string of Universal Xport.
The MPP_1 module claims all paths connected to any model of the NewVend storage array.
The MPP_3 module claims the paths to storage devices controlled by the Mega-RAID device driver.
Any paths not described in the previous rules are claimed by NMP.
The Rule Class column in the output describes the category of a claim rule. It can be MP (multipathing plug-in), Filter, or VAAI.
5.29.2 Hardware Acceleration for Block Storage Devices
With hardware acceleration, your host can integrate with block storage devices, Fibre Channel or iSCSI, and use certain storage array operations. ESXi hardware acceleration supports the following array operations:
Full copy, also called clone blocks or copy offload. Enables the storage arrays to make full copies of data within the array without having the host read and write the data. This operation reduces the time and network load when cloning virtual machines, provisioning from a template, or migrating with vMotion.
Block zeroing, also called write same. Enables storage arrays to zero out a large number of blocks to provide newly allocated storage, free of previously written data. This operation reduces the time and network load when creating virtual machines and formatting virtual disks.
Hardware assisted locking, also called atomic test and set (ATS). Supports discrete virtual machine locking without use of SCSI reservations. This operation allows disk locking per sector, instead of the entire LUN as with SCSI reservations.
Check with your vendor for the hardware acceleration support. Certain storage arrays require that you activate the support on the storage side. On your host, the hardware acceleration is enabled by default. If your storage does not support the hardware acceleration, you can disable it. In addition to hardware acceleration support, ESXi includes support for array thin provisioning. For information, see Array Thin Provisioning and VMFS Datastores.
If a storage device supports the T10 SCSI standard, the ESXi host can communicate with it directly and does not require the VAAI plug-ins. If the device does not support T10 SCSI or provides partial support, ESXi reverts to using the VAAI plug-ins, installed on your host, or uses a combination of the T10 SCSI commands and plug-ins. The VAAI plug-ins are vendor-specific and can be either VMware or partner developed. To manage the VAAI capable device, your host attaches the VAAI filter and vendor-specific VAAI plug-in to the device.
For information about whether your storage requires VAAI plug-ins or supports hardware acceleration through T10 SCSI commands, see the vSphere Compatibility Guide or check with your storage vendor. You can use several esxcli commands to query storage devices for the hardware acceleration support information. For the devices that require the VAAI plug-ins, the claim rule commands are also available. For information about esxcli commands, see Getting Started with vSphere Command-Line Interfaces. 5.29.2.1 Display Hardware Acceleration Plug-Ins and Filter To communicate with the devices that do not support the T10 SCSI standard, your host uses a combination of a single VAAI filter and a vendor-specific VAAI plug-in. Use the esxcli command to view the hardware acceleration filter and plug-ins currently loaded into your system. In the procedure, --server=server_name specifies the target server. The specified target server prompts you for a user name and password. Other connection options, such as a configuration file or session file, are supported. For a list of connection options, see Getting Started with vSphere Command-Line Interfaces. 5.30 Hardware Acceleration on NAS Devices Hardware acceleration allows your host to integrate with NAS devices and use several hardware operations that NAS storage provides. The following list shows the supported NAS operations: Full file clone. This operation is similar to the VMFS block cloning except that NAS devices clone entire files instead of file segments. Reserve space. Enables storage arrays to allocate space for a virtual disk file in thick format. Typically, when you create a virtual disk on an NFS datastore, the NAS server determines the allocation policy. The default allocation policy on most NAS servers is thin and does not guarantee backing storage to the file. However, the reserve space operation can instruct the NAS device to use vendor-specific mechanisms to reserve space for a virtual disk. As a result, you can create thick virtual disks on the NFS datastore. Lazy file clone. Allows VMware View to offload creation of linked clones to a NAS array. Extended file statistics. Enables storage arrays to accurately report space utilization. With NAS storage devices, the hardware acceleration integration is implemented through vendor-specific NAS plug-ins. These plug-ins are typically created by vendors and are distributed as VIB packages through a web page. No claim rules are required for the NAS plug-ins to function. There are several tools available for installing and upgrading VIB packages. They include the esxcli commands and vSphere Update Manager. For more information, see the vSphere Upgrade and Installing and Administering VMware vSphere Update Manager documentation. 5.31 Hardware Acceleration Considerations When you use the hardware acceleration functionality, certain considerations apply. Several reasons might cause a hardware-accelerated operation to fail. For any primitive that the array does not implement, the array returns an error. The error triggers the ESXi host to attempt the operation using its native methods. The VMFS data mover does not leverage hardware offloads and instead uses software data movement when one of the following occurs: The source and destination VMFS datastores have different block sizes. The source file type is RDM and the destination file type is non-RDM (regular file). The source VMDK type is eagerzeroedthick and the destination VMDK type is thin. The source or destination VMDK is in sparse or hosted format. 
The source virtual machine has a snapshot.
The logical address and transfer length in the requested operation are not aligned to the minimum alignment required by the storage device. All datastores created with the vSphere Client or the vSphere Web Client are aligned automatically.
The VMFS has multiple LUNs or extents, and they are on different arrays.
Hardware cloning between arrays, even within the same VMFS datastore, does not work.
5.32 Booting ESXi with Software FCoE
ESXi supports boot from FCoE capable network adapters.
When you install and boot ESXi from an FCoE LUN, the host can use a VMware software FCoE adapter and a network adapter with FCoE capabilities. The host does not require a dedicated FCoE HBA.
You perform most configurations through the option ROM of your network adapter. The network adapters must support one of the following formats, which communicate parameters about an FCoE boot device to the VMkernel:
FCoE Boot Firmware Table (FBFT). FBFT is Intel proprietary.
FCoE Boot Parameter Table (FBPT). FBPT is defined by VMware for third-party vendors to implement software FCoE boot.
The configuration parameters are set in the option ROM of your adapter. During an ESXi installation or a subsequent boot, these parameters are exported into system memory in either FBFT format or FBPT format. The VMkernel can read the configuration settings and use them to access the boot LUN.
This chapter includes the following topics:
Requirements and Considerations for Software FCoE Boot
Best Practices for Software FCoE Boot
Set Up Software FCoE Boot
Troubleshooting Installation and Boot from Software FCoE
5.33 Requirements and Considerations for Software FCoE Boot
When you boot the ESXi host from SAN using software FCoE, certain requirements and considerations apply.
5.33.1 Requirements
ESXi 5.1.
The network adapter must have the following capabilities:
o Be FCoE capable.
o Support ESXi 5.x open FCoE stack.
o Contain FCoE boot firmware which can export boot information in FBFT format or FBPT format.
5.33.2 Considerations
You cannot change the software FCoE boot configuration from within ESXi.
Coredump is not supported on any software FCoE LUNs, including the boot LUN.
Multipathing is not supported at pre-boot.
The boot LUN cannot be shared with other hosts even on shared storage.
5.34 Best Practices for Software FCoE Boot
VMware recommends several best practices when you boot your system from a software FCoE LUN.
Make sure that the host has access to the entire boot LUN. The boot LUN cannot be shared with other hosts even on shared storage.
If you use the Intel 10 Gigabit Ethernet Controller (Niantic) with a Cisco switch, configure the switch port in the following way:
o Enable the Spanning Tree Protocol (STP).
o Turn off switchport trunk native vlan for the VLAN used for FCoE.
6 vSphere Resource Management
6.1 Configuring Resource Allocation Settings
6.1.1 Resource Allocation Shares
Shares specify the relative importance of a virtual machine (or resource pool). If a virtual machine has twice as many shares of a resource as another virtual machine, it is entitled to consume twice as much of that resource when the two virtual machines are competing for resources.
Shares are typically specified as High, Normal, or Low, and these values specify share values with a 4:2:1 ratio, respectively. You can also select Custom to assign a specific number of shares (which expresses a proportional weight) to each virtual machine.
Specifying shares makes sense only with regard to sibling virtual machines or resource pools, that is, virtual machines or resource pools with the same parent in the resource pool hierarchy. Siblings share resources according to their relative share values, bounded by the reservation and limit. When you assign shares to a virtual machine, you always specify the priority for that virtual machine relative to other powered-on virtual machines.
The following table shows the default CPU and memory share values for a virtual machine. For resource pools, the default CPU and memory share values are the same, but must be multiplied as if the resource pool were a virtual machine with four virtual CPUs and 16 GB of memory.
Share Values
Setting   CPU share values               Memory share values
High      2000 shares per virtual CPU    20 shares per megabyte of configured virtual machine memory
Normal    1000 shares per virtual CPU    10 shares per megabyte of configured virtual machine memory
Low       500 shares per virtual CPU     5 shares per megabyte of configured virtual machine memory
For example, an SMP virtual machine with two virtual CPUs and 1GB RAM with CPU and memory shares set to Normal has 2x1000=2000 shares of CPU and 10x1024=10240 shares of memory.
Note Virtual machines with more than one virtual CPU are called SMP (symmetric multiprocessing) virtual machines. ESXi supports up to 64 virtual CPUs per virtual machine.
The relative priority represented by each share changes when a new virtual machine is powered on. This affects all virtual machines in the same resource pool. In the following examples, all of the virtual machines have the same number of virtual CPUs.
Two CPU-bound virtual machines run on a host with 8GHz of aggregate CPU capacity. Their CPU shares are set to Normal and they get 4GHz each.
A third CPU-bound virtual machine is powered on. Its CPU shares value is set to High, which means it should have twice as many shares as the machines set to Normal. The new virtual machine receives 4GHz and the two other machines get only 2GHz each. The same result occurs if the user specifies a custom share value of 2000 for the third virtual machine.
6.1.2 Resource Allocation Reservation
A reservation specifies the guaranteed minimum allocation for a virtual machine.
vCenter Server or ESXi allows you to power on a virtual machine only if there are enough unreserved resources to satisfy the reservation of the virtual machine. The server guarantees that amount even when the physical server is heavily loaded. The reservation is expressed in concrete units (megahertz or megabytes).
For example, assume you have 2GHz available and specify a reservation of 1GHz for VM1 and 1GHz for VM2. Now each virtual machine is guaranteed to get 1GHz if it needs it. However, if VM1 is using only 500MHz, VM2 can use 1.5GHz.
Reservation defaults to 0. You can specify a reservation if you need to guarantee that the minimum required amounts of CPU or memory are always available for the virtual machine.
6.1.3 Resource Allocation Limit
Limit specifies an upper bound for CPU, memory, or storage I/O resources that can be allocated to a virtual machine.
A server can allocate more than the reservation to a virtual machine, but never allocates more than the limit, even if there are unused resources on the system. The limit is expressed in concrete units (megahertz, megabytes, or I/O operations per second).
CPU, memory, and storage I/O resource limits default to unlimited.
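As a rough illustration of how shares, reservations, and limits fit together for an individual virtual machine, the values ultimately appear as scheduler options in the virtual machine's configuration file. The snippet below is a sketch only; it assumes the sched.* option names commonly seen in .vmx files (verify them for your ESXi version) and uses arbitrary example values:

sched.cpu.shares = "high"
sched.cpu.min = "1000"
sched.mem.shares = "normal"
sched.mem.min = "512"

Here sched.cpu.min and sched.mem.min express the CPU and memory reservations in MHz and MB respectively, and the shares options accept low, normal, high, or a custom number. In practice you normally set these values through the vSphere Client rather than by editing the .vmx file directly.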
When the memory limit is unlimited, the amount of memory configured for the virtual machine when it was created becomes its effective limit.
In most cases, it is not necessary to specify a limit. There are benefits and drawbacks:
Benefits — Assigning a limit is useful if you start with a small number of virtual machines and want to manage user expectations. Performance deteriorates as you add more virtual machines. You can simulate having fewer resources available by specifying a limit.
Drawbacks — You might waste idle resources if you specify a limit. The system does not allow virtual machines to use more resources than the limit, even when the system is underutilized and idle resources are available. Specify the limit only if you have good reasons for doing so.
6.1.4 Resource Allocation Settings Suggestions
Select resource allocation settings (shares, reservation, and limit) that are appropriate for your ESXi environment. The following guidelines can help you achieve better performance for your virtual machines.
If you expect frequent changes to the total available resources, use Shares to allocate resources fairly across virtual machines. If you use Shares, and you upgrade the host, for example, each virtual machine stays at the same priority (keeps the same number of shares) even though each share represents a larger amount of memory, CPU, or storage I/O resources.
Use Reservation to specify the minimum acceptable amount of CPU or memory, not the amount you want to have available. The host assigns additional resources as available based on the number of shares, estimated demand, and the limit for your virtual machine. The amount of concrete resources represented by a reservation does not change when you change the environment, such as by adding or removing virtual machines.
When specifying the reservations for virtual machines, do not commit all resources (plan to leave at least 10% unreserved). As you move closer to fully reserving all capacity in the system, it becomes increasingly difficult to make changes to reservations and to the resource pool hierarchy without violating admission control. In a DRS-enabled cluster, reservations that fully commit the capacity of the cluster or of individual hosts in the cluster can prevent DRS from migrating virtual machines between hosts.
The following options are available when you edit a resource pool's CPU and memory resources:
CPU Resources
Option       Description
Shares       CPU shares for this resource pool with respect to the parent's total. Sibling resource pools share resources according to their relative share values bounded by the reservation and limit. Select Low, Normal, or High, which specify share values respectively in a 1:2:4 ratio. Select Custom to give each virtual machine a specific number of shares, which expresses a proportional weight.
Reservation  Guaranteed CPU allocation for this resource pool. Select Expandable Reservation to specify that more than the specified reservation is allocated if resources are available in a parent.
Limit        Upper limit for this resource pool's CPU allocation. Select Unlimited to specify no upper limit.
Memory Resources
Option       Description
Shares       Memory shares for this resource pool with respect to the parent's total. Sibling resource pools share resources according to their relative share values bounded by the reservation and limit. Select Low, Normal, or High, which specify share values respectively in a 1:2:4 ratio. Select Custom to give each virtual machine a specific number of shares, which expresses a proportional weight.
Reservation  Guaranteed memory allocation for this resource pool. Select Expandable Reservation to specify that more than the specified reservation is allocated if resources are available in a parent.
Limit        Upper limit for this resource pool's memory allocation. Select Unlimited to specify no upper limit.
6.1.5 Changing Resource Allocation Settings—Example
The following example illustrates how you can change resource allocation settings to improve virtual machine performance.
Assume that on an ESXi host, you have created two new virtual machines—one each for your QA (VM-QA) and Marketing (VM-Marketing) departments.
Single Host with Two Virtual Machines
In the following example, assume that VM-QA is memory intensive and accordingly you want to change the resource allocation settings for the two virtual machines to:
Specify that, when system memory is overcommitted, VM-QA can use twice as much memory and CPU as the Marketing virtual machine. Set the memory shares and CPU shares for VM-QA to High and for VM-Marketing set them to Normal.
Ensure that the Marketing virtual machine has a certain amount of guaranteed CPU resources. You can do so using a reservation setting.
Procedure
1. Start the vSphere Client and connect to a vCenter Server system.
2. Right-click VM-QA, the virtual machine for which you want to change shares, and select Edit Settings.
3. Select the Resources tab, and in the CPU panel, select High from the Shares drop-down menu.
4. In the Memory panel, select High from the Shares drop-down menu.
5. Click OK.
6. Right-click the marketing virtual machine (VM-Marketing) and select Edit Settings.
7. In the CPU panel, change the Reservation value to the desired number.
8. Click OK.
If you select the cluster's Resource Allocation tab and click CPU, you should see that shares for VM-QA are twice that of the other virtual machine. Also, because the virtual machines have not been powered on, the Reservation Used fields have not changed.
6.1.6 Admission Control
When you power on a virtual machine, the system checks the amount of CPU and memory resources that have not yet been reserved. Based on the available unreserved resources, the system determines whether it can guarantee the reservation for which the virtual machine is configured (if any). This process is called admission control. If enough unreserved CPU and memory are available, or if there is no reservation, the virtual machine is powered on. Otherwise, an Insufficient Resources warning appears.
Note In addition to the user-specified memory reservation, for each virtual machine there is also an amount of overhead memory. This extra memory commitment is included in the admission control calculation.
When the vSphere DPM feature is enabled, hosts might be placed in standby mode (that is, powered off) to reduce power consumption. The unreserved resources provided by these hosts are considered available for admission control. If a virtual machine cannot be powered on without these resources, a recommendation to power on sufficient standby hosts is made.
6.1.7 CPU Virtualization Basics
CPU virtualization emphasizes performance and runs directly on the processor whenever possible. The underlying physical resources are used whenever possible and the virtualization layer runs instructions only as needed to make virtual machines operate as if they were running directly on a physical machine.
CPU virtualization is not the same thing as emulation. ESXi does not use emulation to run virtual CPUs. With emulation, all operations are run in software by an emulator.
A software emulator allows programs to run on a computer system other than the one for which they were originally written. The emulator does this by emulating, or reproducing, the original computer’s behavior by accepting the same data or inputs and achieving the same results. Emulation provides portability and runs software designed for one platform across several platforms. When CPU resources are overcommitted, the ESXi host time-slices the physical processors across all virtual machines so each virtual machine runs as if it has its specified number of virtual processors. When an ESXi host runs multiple virtual machines, it allocates to each virtual machine a share of the physical resources. With the default resource allocation settings, all virtual machines associated with the same host receive an equal share of CPU per virtual CPU. This means that a single-processor virtual machines is assigned only half of the resources of a dual-processor virtual machine. This chapter includes the following topics: Software-Based CPU Virtualization Hardware-Assisted CPU Virtualization Virtualization and Processor-Specific Behavior Performance Implications of CPU Virtualization 6.1.8 Software-Based CPU Virtualization With software-based CPU virtualization, the guest application code runs directly on the processor, while the guest privileged code is translated and the translated code executes on the processor. The translated code is slightly larger and usually executes more slowly than the native version. As a result, guest programs, which have a small privileged code component, run with speeds very close to native. Programs with a significant privileged code component, such as system calls, traps, or page table updates can run slower in the virtualized environment. Hardware-Assisted CPU Virtualization Certain processors provide hardware assistance for CPU virtualization. When using this assistance, the guest can use a separate mode of execution called guest mode. The guest code, whether application code or privileged code, runs in the guest mode. On certain events, the processor exits out of guest mode and enters root mode. The hypervisor executes in the root mode, determines the reason for the exit, takes any required actions, and restarts the guest in guest mode. When you use hardware assistance for virtualization, there is no need to translate the code. As a result, system calls or trap-intensive workloads run very close to native speed. Some workloads, such as those involving updates to page tables, lead to a large number of exits from guest mode to root mode. Depending on the number of such exits and total time spent in exits, hardware-assisted CPU virtualization can speed up execution significantly. 6.1.9 Virtualization and Processor-Specific Behavior Although VMware software virtualizes the CPU, the virtual machine detects the specific model of the processor on which it is running. Processor models might differ in the CPU features they offer, and applications running in the virtual machine can make use of these features. Therefore, it is not possible to use vMotion® to migrate virtual machines between systems running on processors with different feature sets. You can avoid this restriction, in some cases, by using Enhanced vMotion Compatibility (EVC) with processors that support this feature. See the vCenter Server and Host Management documentation for more information. 
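Before relying on vMotion between hosts, it can help to confirm what processor model and topology each host actually reports. A minimal sketch using esxcli (output fields can vary slightly between ESXi releases):

# Summary of CPU packages, cores, threads, and hyperthreading state
esxcli --server=server_name hardware cpu global get
# Per-CPU details, including family, model, and stepping
esxcli --server=server_name hardware cpu list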
6.1.10 Performance Implications of CPU Virtualization CPU virtualization adds varying amounts of overhead depending on the workload and the type of virtualization used. An application is CPU-bound if it spends most of its time executing instructions rather than waiting for external events such as user interaction, device input, or data retrieval. For such applications, the CPU virtualization overhead includes the additional instructions that must be executed. This overhead takes CPU processing time that the application itself can use. CPU virtualization overhead usually translates into a reduction in overall performance. For applications that are not CPU-bound, CPU virtualization likely translates into an increase in CPU use. If spare CPU capacity is available to absorb the overhead, it can still deliver comparable performance in terms of overall throughput. ESXi supports up to 64 virtual processors (CPUs) for each virtual machine. Note Deploy single-threaded applications on uniprocessor virtual machines, instead of on SMP virtual machines that have multiple CPUs, for the best performance and resource use. Single-threaded applications can take advantage only of a single CPU. Deploying such applications in dual-processor virtual machines does not speed up the application. Instead, it causes the second virtual CPU to use physical resources that other virtual machines could otherwise use. 6.1.11 Specifying CPU Configuration You can specify CPU configuration to improve resource management. However, if you do not customize CPU configuration, the ESXi host uses defaults that work well in most situations. You can specify CPU configuration in the following ways: Use the attributes and special features available through the vSphere Client. The vSphere Client graphical user interface (GUI) allows you to connect to the ESXi host or a vCenter Server system. Use advanced settings under certain circumstances. Use the vSphere SDK for scripted CPU allocation. Use hyperthreading. 6.1.12 Multicore Processors Multicore processors provide many advantages for a host performing multitasking of virtual machines. Intel and AMD have each developed processors which combine two or more processor cores into a single integrated circuit (often called a package or socket). VMware uses the term socket to describe a single package which can have one or more processor cores with one or more logical processors in each core. A dual-core processor, for example, can provide almost double the performance of a singlecore processor, by allowing two virtual CPUs to execute at the same time. Cores within the same processor are typically configured with a shared last-level cache used by all cores, potentially reducing the need to access slower main memory. A shared memory bus that connects a physical processor to main memory can limit performance of its logical processors if the virtual machines running on them are running memory-intensive workloads which compete for the same memory bus resources. Each logical processor of each processor core can be used independently by the ESXi CPU scheduler to execute virtual machines, providing capabilities similar to SMP systems. For example, a two-way virtual machine can have its virtual processors running on logical processors that belong to the same core, or on logical processors on different physical cores. The ESXi CPU scheduler can detect the processor topology and the relationships between processor cores and the logical processors on them. 
It uses this information to schedule virtual machines and optimize performance. The ESXi CPU scheduler can interpret processor topology, including the relationship between sockets, cores, and logical processors. The scheduler uses topology information to optimize the placement of virtual CPUs onto different sockets to maximize overall cache utilization, and to improve cache affinity by minimizing virtual CPU migrations. In undercommitted systems, the ESXi CPU scheduler spreads load across all sockets by default. This improves performance by maximizing the aggregate amount of cache available to the running virtual CPUs. As a result, the virtual CPUs of a single SMP virtual machine are spread across multiple sockets (unless each socket is also a NUMA node, in which case the NUMA scheduler restricts all the virtual CPUs of the virtual machine to reside on the same socket.) In some cases, such as when an SMP virtual machine exhibits significant data sharing between its virtual CPUs, this default behavior might be sub-optimal. For such workloads, it can be beneficial to schedule all of the virtual CPUs on the same socket, with a shared lastlevel cache, even when the ESXi host is undercommitted. In such scenarios, you can override the default behavior of spreading virtual CPUs across packages by including the following configuration option in the virtual machine's .vmx configuration file: sched.cpu.vsmpConsolidate="TRUE". 6.1.13 Hyperthreading Hyperthreading technology allows a single physical processor core to behave like two logical processors. The processor can run two independent applications at the same time. To avoid confusion between logical and physical processors, Intel refers to a physical processor as a socket, and the discussion in this chapter uses that terminology as well. Intel Corporation developed hyperthreading technology to enhance the performance of its Pentium IV and Xeon processor lines. Hyperthreading technology allows a single processor core to execute two independent threads simultaneously. While hyperthreading does not double the performance of a system, it can increase performance by better utilizing idle resources leading to greater throughput for certain important workload types. An application running on one logical processor of a busy core can expect slightly more than half of the throughput that it obtains while running alone on a non-hyperthreaded processor. Hyperthreading performance improvements are highly application-dependent, and some applications might see performance degradation with hyperthreading because many processor resources (such as the cache) are shared between logical processors. Note On processors with Intel Hyper-Threading technology, each core can have two logical processors which share most of the core's resources, such as memory caches and functional units. Such logical processors are usually called threads. Many processors do not support hyperthreading and as a result have only one thread per core. For such processors, the number of cores also matches the number of logical processors. The following processors support hyperthreading and have two threads per core. Processors based on the Intel Xeon 5500 processor microarchitecture. Intel Pentium 4 (HT-enabled) Intel Pentium EE 840 (HT-enabled) 6.1.14 Hyperthreading and ESXi Hosts A host that is enabled for hyperthreading should behave similarly to a host without hyperthreading. You might need to consider certain factors if you enable hyperthreading, however. 
ESXi hosts manage processor time intelligently to guarantee that load is spread smoothly across processor cores in the system. Logical processors on the same core have consecutive CPU numbers, so that CPUs 0 and 1 are on the first core together, CPUs 2 and 3 are on the second core, and so on. Virtual machines are preferentially scheduled on two different cores rather than on two logical processors on the same core.
If there is no work for a logical processor, it is put into a halted state, which frees its execution resources and allows the virtual machine running on the other logical processor on the same core to use the full execution resources of the core. The VMware scheduler properly accounts for this halt time, and charges a virtual machine running with the full resources of a core more than a virtual machine running on a half core. This approach to processor management ensures that the server does not violate any of the standard ESXi resource allocation rules.
Consider your resource management needs before you enable CPU affinity on hosts using hyperthreading. For example, if you bind a high priority virtual machine to CPU 0 and another high priority virtual machine to CPU 1, the two virtual machines have to share the same physical core. In this case, it can be impossible to meet the resource demands of these virtual machines. Ensure that any custom affinity settings make sense for a hyperthreaded system.
6.1.15 Hyperthreaded Core Sharing Options
You can set the hyperthreaded core sharing mode for a virtual machine using the vSphere Client.
Hyperthreaded Core Sharing Modes
Option    Description
Any       The default for all virtual machines on a hyperthreaded system. The virtual CPUs of a virtual machine with this setting can freely share cores with other virtual CPUs from this or any other virtual machine at any time.
None      Virtual CPUs of a virtual machine should not share cores with each other or with virtual CPUs from other virtual machines. That is, each virtual CPU from this virtual machine should always get a whole core to itself, with the other logical CPU on that core being placed into the halted state.
Internal  This option is similar to none. Virtual CPUs from this virtual machine cannot share cores with virtual CPUs from other virtual machines. They can share cores with the other virtual CPUs from the same virtual machine. You can select this option only for SMP virtual machines. If applied to a uniprocessor virtual machine, the system changes this option to none.
These options have no effect on fairness or CPU time allocation. Regardless of a virtual machine's hyperthreading settings, it still receives CPU time proportional to its CPU shares, and constrained by its CPU reservation and CPU limit values.
For typical workloads, custom hyperthreading settings should not be necessary. The options can help in case of unusual workloads that interact badly with hyperthreading. For example, an application with cache thrashing problems might slow down an application sharing its physical core. You can place the virtual machine running the application in the none or internal hyperthreading status to isolate it from other virtual machines.
If a virtual CPU has hyperthreading constraints that do not allow it to share a core with another virtual CPU, the system might deschedule it when other virtual CPUs are entitled to consume processor time. Without the hyperthreading constraints, both virtual CPUs could be scheduled on the same core.
The problem becomes worse on systems with a limited number of cores (per virtual machine). In such cases, there might be no core to which the virtual machine that is descheduled can be migrated. As a result, virtual machines with hyperthreading set to none or internal can experience performance degradation, especially on systems with a limited number of cores. 6.1.16 Quarantining In certain rare circumstances, ESXi might detect that an application is interacting badly with the Pentium IV hyperthreading technology. (This does not apply to systems based on the Intel Xeon 5500 processor microarchitecture.) In such cases, quarantining, which is transparent to the user, might be necessary. For example, certain types of self-modifying code can disrupt the normal behavior of the Pentium IV trace cache and can lead to substantial slowdowns (up to 90 percent) for an application sharing a core with the problematic code. In those cases, the ESXi host quarantines the virtual CPU running this code and places its virtual machine in the none or internal mode, as appropriate. 6.1.17 Using CPU Affinity By specifying a CPU affinity setting for each virtual machine, you can restrict the assignment of virtual machines to a subset of the available processors in multiprocessor systems. By using this feature, you can assign each virtual machine to processors in the specified affinity set. CPU affinity specifies virtual machine-to-processor placement constraints and is different from the relationship created by a VM-VM or VM-Host affinity rule, which specifies virtual machine-to-virtual machine host placement constraints. In this context, the term CPU refers to a logical processor on a hyperthreaded system and refers to a core on a non-hyperthreaded system. The CPU affinity setting for a virtual machine applies to all of the virtual CPUs associated with the virtual machine and to all other threads (also known as worlds) associated with the virtual machine. Such virtual machine threads perform processing required for emulating mouse, keyboard, screen, CD-ROM, and miscellaneous legacy devices. In some cases, such as display-intensive workloads, significant communication might occur between the virtual CPUs and these other virtual machine threads. Performance might degrade if the virtual machine's affinity setting prevents these additional threads from being scheduled concurrently with the virtual machine's virtual CPUs. Examples of this include a uniprocessor virtual machine with affinity to a single CPU or a two-way SMP virtual machine with affinity to only two CPUs. For the best performance, when you use manual affinity settings, VMware recommends that you include at least one additional physical CPU in the affinity setting to allow at least one of the virtual machine's threads to be scheduled at the same time as its virtual CPUs. Examples of this include a uniprocessor virtual machine with affinity to at least two CPUs or a two-way SMP virtual machine with affinity to at least three CPUs. 6.1.18 Potential Issues with CPU Affinity Before you use CPU affinity, you might need to consider certain issues. Potential issues with CPU affinity include: For multiprocessor systems, ESXi systems perform automatic load balancing. Avoid manual specification of virtual machine affinity to improve the scheduler’s ability to balance load across processors. Affinity can interfere with the ESXi host’s ability to meet the reservation and shares specified for a virtual machine. 
Because CPU admission control does not consider affinity, a virtual machine with manual affinity settings might not always receive its full reservation. Virtual machines that do not have manual affinity settings are not adversely affected by virtual machines with manual affinity settings.
When you move a virtual machine from one host to another, affinity might no longer apply because the new host might have a different number of processors.
The NUMA scheduler might not be able to manage a virtual machine that is already assigned to certain processors using affinity.
Affinity can affect the host's ability to schedule virtual machines on multicore or hyperthreaded processors to take full advantage of resources shared on such processors.
6.1.19 Host Power Management Policies
ESXi can take advantage of several power management features that the host hardware provides to adjust the trade-off between performance and power use. You can control how ESXi uses these features by selecting a power management policy.
In general, selecting a high-performance policy provides more absolute performance, but at lower efficiency (performance per watt). Lower-power policies provide less absolute performance, but at higher efficiency.
ESXi provides five power management policies. If the host does not support power management, or if the BIOS settings specify that the host operating system is not allowed to manage power, only the Not Supported policy is available. You select a policy for a host using the vSphere Client. If you do not select a policy, ESXi uses Balanced by default.
CPU Power Management Policies
Power Management Policy   Description
Not supported             The host does not support any power management features or power management is not enabled in the BIOS.
High Performance          The VMkernel detects certain power management features, but will not use them unless the BIOS requests them for power capping or thermal events.
Balanced (Default)        The VMkernel uses the available power management features conservatively to reduce host energy consumption with minimal compromise to performance.
Low Power                 The VMkernel aggressively uses available power management features to reduce host energy consumption at the risk of lower performance.
Custom                    The VMkernel bases its power management policy on the values of several advanced configuration parameters. You can set these parameters in the vSphere Client Advanced Settings dialog box.
When a CPU runs at lower frequency, it can also run at lower voltage, which saves power. This type of power management is typically called Dynamic Voltage and Frequency Scaling (DVFS). ESXi attempts to adjust CPU frequencies so that virtual machine performance is not affected.
When a CPU is idle, ESXi can take advantage of deep halt states (known as C-states). The deeper the C-state, the less power the CPU uses, but the longer it takes for the CPU to resume running. When a CPU becomes idle, ESXi applies an algorithm to predict how long it will be in an idle state and chooses an appropriate C-state to enter. In power management policies that do not use deep C-states, ESXi uses only the shallowest halt state (C1) for idle CPUs.
6.1.20 Select a CPU Power Management Policy
You set the CPU power management policy for a host using the vSphere Client.
Prerequisites
Verify that the BIOS settings on the host system allow the operating system to control power management (for example, OS Controlled).
Note Some systems have Processor Clocking Control (PCC) technology, which allows ESXi to manage power on the host system even if the host BIOS settings do not specify OS Controlled mode. With this technology, ESXi does not manage P-states directly. Instead, the host cooperates with the BIOS to determine the processor clock rate. HP systems that support this technology have a BIOS setting called Cooperative Power Management that is enabled by default.
6.2 Memory Virtualization Basics
Before you manage memory resources, you should understand how they are being virtualized and used by ESXi.
The VMkernel manages all machine memory. The VMkernel dedicates part of this managed machine memory for its own use. The rest is available for use by virtual machines. Virtual machines use machine memory for two purposes: each virtual machine requires its own memory, and the virtual machine monitor (VMM) requires some memory and a dynamic overhead memory for its code and data.
The virtual and physical memory space is divided into blocks called pages. When physical memory is full, the data for virtual pages that are not present in physical memory is stored on disk. Depending on processor architecture, pages are typically 4 KB or 2 MB. See Advanced Memory Attributes.
6.2.1 Virtual Machine Memory
Each virtual machine consumes memory based on its configured size, plus additional overhead memory for virtualization.
The configured size is a construct maintained by the virtualization layer for the virtual machine. It is the amount of memory that is presented to the guest operating system, but it is independent of the amount of physical RAM that is allocated to the virtual machine, which depends on the resource settings (shares, reservation, limit) explained below.
For example, consider a virtual machine with a configured size of 1GB. When the guest operating system boots, it detects that it is running on a dedicated machine with 1GB of physical memory. The actual amount of physical host memory allocated to the virtual machine depends on its memory resource settings and memory contention on the ESXi host. In some cases, the virtual machine might be allocated the full 1GB. In other cases, it might receive a smaller allocation. Regardless of the actual allocation, the guest operating system continues to behave as though it is running on a dedicated machine with 1GB of physical memory.
Shares: Specify the relative priority for a virtual machine if more than the reservation is available.
Reservation: A guaranteed lower bound on the amount of physical memory that the host reserves for the virtual machine, even when memory is overcommitted. Set the reservation to a level that ensures the virtual machine has sufficient memory to run efficiently, without excessive paging. After a virtual machine has accessed its full reservation, it is allowed to retain that amount of memory and this memory is not reclaimed, even if the virtual machine becomes idle. For example, some guest operating systems (for example, Linux) might not access all of the configured memory immediately after booting. Until the virtual machine accesses its full reservation, the VMkernel can allocate any unused portion of its reservation to other virtual machines. However, after the guest's workload increases and it consumes its full reservation, it is allowed to keep this memory.
Limit: An upper bound on the amount of physical memory that the host can allocate to the virtual machine.
The virtual machine’s memory allocation is also implicitly limited by its configured size. Overhead memory includes space reserved for the virtual machine frame buffer and various virtualization data structures. 6.2.2 Memory Overcommitment For each running virtual machine, the system reserves physical memory for the virtual machine’s reservation (if any) and for its virtualization overhead. Because of the memory management techniques the ESXi host uses, your virtual machines can use more memory than the physical machine (the host) has available. For example, you can have a host with 2GB memory and run four virtual machines with 1GB memory each. In that case, the memory is overcommitted. Overcommitment makes sense because, typically, some virtual machines are lightly loaded while others are more heavily loaded, and relative activity levels vary over time. To improve memory utilization, the ESXi host transfers memory from idle virtual machines to virtual machines that need more memory. Use the Reservation or Shares parameter to preferentially allocate memory to important virtual machines. This memory remains available to other virtual machines if it is not in use. In addition, memory compression is enabled by default on ESXi hosts to improve virtual machine performance when memory is overcommitted as described in Memory Compression. 6.2.3 Memory Sharing Many workloads present opportunities for sharing memory across virtual machines. For example, several virtual machines might be running instances of the same guest operating system, have the same applications or components loaded, or contain common data. ESXi systems use a proprietary page-sharing technique to securely eliminate redundant copies of memory pages. With memory sharing, a workload consisting of multiple virtual machines often consumes less memory than it would when running on physical machines. As a result, the system can efficiently support higher levels of overcommitment. The amount of memory saved by memory sharing depends on workload characteristics. A workload of many nearly identical virtual machines might free up more than thirty percent of memory, while a more diverse workload might result in savings of less than five percent of memory. 6.2.4 Software-Based Memory Virtualization ESXi virtualizes guest physical memory by adding an extra level of address translation. The VMM for each virtual machine maintains a mapping from the guest operating system's physical memory pages to the physical memory pages on the underlying machine. (VMware refers to the underlying host physical pages as “machine” pages and the guest operating system’s physical pages as “physical” pages.) Each virtual machine sees a contiguous, zero-based, addressable physical memory space. The underlying machine memory on the server used by each virtual machine is not necessarily contiguous. The VMM intercepts virtual machine instructions that manipulate guest operating system memory management structures so that the actual memory management unit (MMU) on the processor is not updated directly by the virtual machine. The ESXi host maintains the virtual-to-machine page mappings in a shadow page table that is kept up to date with the physical-to-machine mappings (maintained by the VMM). The shadow page tables are used directly by the processor's paging hardware. This approach to address translation allows normal memory accesses in the virtual machine to execute without adding address translation overhead, after the shadow page tables are set up. 
Because the translation look-aside buffer (TLB) on the processor caches direct virtual-tomachine mappings read from the shadow page tables, no additional overhead is added by the VMM to access the memory. 6.2.5 Performance Considerations The use of two-page tables has these performance implications. No overhead is incurred for regular guest memory accesses. Additional time is required to map memory within a virtual machine, which might mean: o The virtual machine operating system is setting up or updating virtual address to physical address mappings. o The virtual machine operating system is switching from one address space to another (context switch). Like CPU virtualization, memory virtualization overhead depends on workload. 6.2.6 Hardware-Assisted Memory Virtualization Some CPUs, such as AMD SVM-V and the Intel Xeon 5500 series, provide hardware support for memory virtualization by using two layers of page tables. The first layer of page tables stores guest virtual-to-physical translations, while the second layer of page tables stores guest physical-to-machine translation. The TLB (translation lookaside buffer) is a cache of translations maintained by the processor's memory management unit (MMU) hardware. A TLB miss is a miss in this cache and the hardware needs to go to memory (possibly many times) to find the required translation. For a TLB miss to a certain guest virtual address, the hardware looks at both page tables to translate guest virtual address to host physical address. The diagram illustrates the ESXi implementation of memory virtualization. ESXi Memory Mapping The boxes represent pages, and the arrows show the different memory mappings. The arrows from guest virtual memory to guest physical memory show the mapping maintained by the page tables in the guest operating system. (The mapping from virtual memory to linear memory for x86-architecture processors is not shown.) The arrows from guest physical memory to machine memory show the mapping maintained by the VMM. The dashed arrows show the mapping from guest virtual memory to machine memory in the shadow page tables also maintained by the VMM. The underlying processor running the virtual machine uses the shadow page table mappings. Because of the extra level of memory mapping introduced by virtualization, ESXi can effectively manage memory across all virtual machines. Some of the physical memory of a virtual machine might be mapped to shared pages or to pages that are unmapped, or swapped out. A host performs virtual memory management without the knowledge of the guest operating system and without interfering with the guest operating system’s own memory management subsystem. 6.2.7 Performance Considerations When you use hardware assistance, you eliminate the overhead for software memory virtualization. In particular, hardware assistance eliminates the overhead required to keep shadow page tables in synchronization with guest page tables. However, the TLB miss latency when using hardware assistance is significantly higher. As a result, whether or not a workload benefits by using hardware assistance primarily depends on the overhead the memory virtualization causes when using software memory virtualization. If a workload involves a small amount of page table activity (such as process creation, mapping the memory, or context switches), software virtualization does not cause significant overhead. Conversely, workloads with a large amount of page table activity are likely to benefit from hardware assistance. 
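Whether a particular virtual machine uses hardware-assisted memory virtualization is normally left to the automatic selection exposed as the CPU/MMU Virtualization setting of the virtual machine. If you need to pin the choice while testing a page-table-heavy workload, the selection is stored as monitor options in the virtual machine's configuration file. This is a sketch under the assumption that the monitor.virtual_exec and monitor.virtual_mmu option names apply to your ESXi version; verify them before use:

monitor.virtual_exec = "hardware"
monitor.virtual_mmu = "hardware"

Leaving both options at their automatic default lets ESXi pick the combination it expects to perform best for the guest operating system and host CPU.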
6.2.8 Overhead Memory on Virtual Machines
Virtual machines require a certain amount of available overhead memory to power on. You should be aware of the amount of this overhead.
The following table lists the amount of overhead memory a virtual machine requires to power on. After a virtual machine is running, the amount of overhead memory it uses might differ from the amount listed in the table. The sample values were collected with VMX swap enabled and hardware MMU enabled for the virtual machine. (VMX swap is enabled by default.)
Note The table provides a sample of overhead memory values and does not attempt to provide information about all possible configurations. You can configure a virtual machine to have up to 64 virtual CPUs, depending on the number of licensed CPUs on the host and the number of CPUs that the guest operating system supports.
Sample Overhead Memory on Virtual Machines
Memory (MB)   1 VCPU   2 VCPUs   4 VCPUs   8 VCPUs
256           20.29    24.28     32.23     48.16
1024          25.90    29.91     37.86     53.82
4096          48.64    52.72     60.67     76.78
16384         139.62   143.98    151.93    168.60
6.2.9 How ESXi Hosts Allocate Memory
A host allocates the memory specified by the Limit parameter to each virtual machine, unless memory is overcommitted. ESXi never allocates more memory to a virtual machine than its specified physical memory size.
For example, a 1GB virtual machine might have the default limit (unlimited) or a user-specified limit (for example, 2GB). In both cases, the ESXi host never allocates more than 1GB, the physical memory size that was specified for it.
When memory is overcommitted, each virtual machine is allocated an amount of memory somewhere between what is specified by Reservation and what is specified by Limit. The amount of memory granted to a virtual machine above its reservation usually varies with the current memory load.
A host determines allocations for each virtual machine based on the number of shares allocated to it and an estimate of its recent working set size.
Shares — ESXi hosts use a modified proportional-share memory allocation policy. Memory shares entitle a virtual machine to a fraction of available physical memory.
Working set size — ESXi hosts estimate the working set for a virtual machine by monitoring memory activity over successive periods of virtual machine execution time. Estimates are smoothed over several time periods using techniques that respond rapidly to increases in working set size and more slowly to decreases in working set size. This approach ensures that a virtual machine from which idle memory is reclaimed can ramp up quickly to its full share-based allocation when it starts using its memory more actively. Memory activity is monitored to estimate the working set sizes for a default period of 60 seconds. To modify this default, adjust the Mem.SamplePeriod advanced setting. See Set Advanced Host Attributes.
6.2.10 VMX Swap Files
Virtual machine executable (VMX) swap files allow the host to greatly reduce the amount of overhead memory reserved for the VMX process.
Note VMX swap files are not related to the swap to host cache feature or to regular host-level swap files.
ESXi reserves memory per virtual machine for a variety of purposes. Memory for the needs of certain components, such as the virtual machine monitor (VMM) and virtual devices, is fully reserved when a virtual machine is powered on. However, some of the overhead memory that is reserved for the VMX process can be swapped.
The VMX swap feature reduces the VMX memory reservation significantly (for example, from about 50MB or more per virtual machine to about 10MB per virtual machine). This allows the remaining memory to be swapped out when host memory is overcommitted, reducing overhead memory reservation for each virtual machine. The host creates VMX swap files automatically, provided there is sufficient free disk space at the time a virtual machine is powered on.
6.2.11 Memory Tax for Idle Virtual Machines
If a virtual machine is not actively using all of its currently allocated memory, ESXi charges more for idle memory than for memory that is in use. This is done to help prevent virtual machines from hoarding idle memory.
The idle memory tax is applied in a progressive fashion. The effective tax rate increases as the ratio of idle memory to active memory for the virtual machine rises. (In earlier versions of ESXi that did not support hierarchical resource pools, all idle memory for a virtual machine was taxed equally.)
You can modify the idle memory tax rate with the Mem.IdleTax option. Use this option, together with the Mem.SamplePeriod advanced attribute, to control how the system determines target memory allocations for virtual machines. See Set Advanced Host Attributes.
6.2.12 Using Swap Files
You can specify the location of your swap file, reserve swap space when memory is overcommitted, and delete a swap file.
ESXi hosts use swapping to forcibly reclaim memory from a virtual machine when the vmmemctl driver is not available or is not responsive, for example when:
It was never installed.
It is explicitly disabled.
It is not running (for example, while the guest operating system is booting).
It is temporarily unable to reclaim memory quickly enough to satisfy current system demands.
It is functioning properly, but maximum balloon size is reached.
Standard demand-paging techniques swap pages back in when the virtual machine needs them.
6.2.13 Swap File Location
By default, the swap file is created in the same location as the virtual machine's configuration file. A swap file is created by the ESXi host when a virtual machine is powered on. If this file cannot be created, the virtual machine cannot power on. Instead of accepting the default, you can also:
Use per-virtual machine configuration options to change the datastore to another shared storage location.
Use host-local swap, which allows you to specify a datastore stored locally on the host. This allows you to swap at a per-host level, saving space on the SAN. However, it can lead to a slight degradation in performance for vSphere vMotion because pages swapped to a local swap file on the source host must be transferred across the network to the destination host.
6.2.14 Configure Virtual Machine Swapfile Properties for the Host
Configure a swapfile location for the host to determine the default location for virtual machine swapfiles.
By default, swapfiles for a virtual machine are located on a VMFS3 datastore in the folder that contains the other virtual machine files. However, you can configure your host to place virtual machine swapfiles on an alternative datastore. You can use this option to place virtual machine swapfiles on lower-cost or higher-performance storage. You can also override this host-level setting for individual virtual machines.
Setting an alternative swapfile location might cause migrations with vMotion to complete more slowly. For best vMotion performance, store virtual machine swapfiles in the same directory as the virtual machine.
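For the per-virtual machine override mentioned above, the swap file directory can also be pinned for an individual virtual machine in its configuration file. A hedged sketch, assuming the sched.swap.dir option name and a hypothetical datastore path; verify both for your environment:

sched.swap.dir = "/vmfs/volumes/local_datastore_1/vm_swap"

Keeping swap files in the virtual machine's own directory remains the simplest choice for vMotion performance, as noted above.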
If vCenter Server manages your host, you cannot change the swapfile location if you connect directly to the host by using the vSphere Client. You must connect the vSphere Client to the vCenter Server system. 6.2.15 Swapping to Host Cache Datastores that are created on solid state drives (SSD) can be used to allocate space for host cache. The host reserves a certain amount of space for swapping to host cache. The host cache is made up of files on a low-latency disk that ESXi uses as a write back cache for virtual machine swap files. The cache is shared by all virtual machines running on the host. Host-level swapping of virtual machine pages makes the best use of potentially limited SSD space. Using swap to host cache is not the same as placing regular swap files on SSD-backed datastores. Even if you enable swap to host cache, the host still needs to create regular swap files. However, when you use swap to host cache, the speed of the storage where the host places regular swap files is less important. The Host Cache Configuration page allows you to view the amount of space on a datastore that a host can use to swap to host cache. Only SSD-backed datastores appear in the list of datastores on the Host Cache Configuration page. 6.2.16 Sharing Memory Across Virtual Machines Many ESXi workloads present opportunities for sharing memory across virtual machines (as well as within a single virtual machine). For example, several virtual machines might be running instances of the same guest operating system, have the same applications or components loaded, or contain common data. In such cases, a host uses a proprietary transparent page sharing technique to securely eliminate redundant copies of memory pages. With memory sharing, a workload running in virtual machines often consumes less memory than it would when running on physical machines. As a result, higher levels of overcommitment can be supported efficiently. Use the Mem.ShareScanTime and Mem.ShareScanGHz advanced settings to control the rate at which the system scans memory to identify opportunities for sharing memory. You can also disable sharing for individual virtual machines by setting the sched.mem.pshare.enable option to FALSE (this option defaults to TRUE). See Set Advanced Virtual Machine Attributes. ESXi memory sharing runs as a background activity that scans for sharing opportunities over time. The amount of memory saved varies over time. For a fairly constant workload, the amount generally increases slowly until all sharing opportunities are exploited. To determine the effectiveness of memory sharing for a given workload, try running the workload, and use resxtop or esxtop to observe the actual savings. Find the information in the PSHARE field of the interactive mode in the Memory page. 6.2.17 Memory Compression ESXi provides a memory compression cache to improve virtual machine performance when you use memory overcommitment. Memory compression is enabled by default. When a host's memory becomes overcommitted, ESXi compresses virtual pages and stores them in memory. Because accessing compressed memory is faster than accessing memory that is swapped to disk, memory compression in ESXi allows you to overcommit memory without significantly hindering performance. When a virtual page needs to be swapped, ESXi first attempts to compress the page. Pages that can be compressed to 2 KB or smaller are stored in the virtual machine's compression cache, increasing the capacity of the host. 
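The compression cache is controlled through host advanced settings. The following sketch assumes the Mem.MemZipEnable and Mem.MemZipMaxPct advanced attributes and the standard esxcli advanced-settings syntax; verify the option paths for your ESXi version before applying them:

# Limit the compression cache to 10 percent of each virtual machine's memory size
esxcli --server=server_name system settings advanced set --option=/Mem/MemZipMaxPct --int-value=10
# Disable memory compression on the host entirely
esxcli --server=server_name system settings advanced set --option=/Mem/MemZipEnable --int-value=0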
You can set the maximum size for the compression cache and disable memory compression using the Advanced Settings dialog box in the vSphere Client. 6.2.18 Measuring and Differentiating Types of Memory Usage The Performance tab of the vSphere Client displays a number of metrics that can be used to analyze memory usage. Some of these memory metrics measure guest physical memory while other metrics measure machine memory. For instance, two types of memory usage that you can examine using performance metrics are guest physical memory and machine memory. You measure guest physical memory using the Memory Granted metric (for a virtual machine) or Memory Shared (for a host). To measure machine memory, however, use Memory Consumed (for a virtual machine) or Memory Shared Common (for a host). Understanding the conceptual difference between these types of memory usage is important for knowing what these metrics are measuring and how to interpret them. The VMkernel maps guest physical memory to machine memory, but they are not always mapped one-to-one. Multiple regions of guest physical memory might be mapped to the same region of machine memory (in the case of memory sharing) or specific regions of guest physical memory might not be mapped to machine memory (when the VMkernel swaps out or balloons guest physical memory).
In these situations, calculations of guest physical memory usage and machine memory usage for an individual virtual machine or a host differ. Consider the example in the following figure, which shows two virtual machines running on a host. Each block represents 4 KB of memory and each color/letter represents a different set of data on a block. Memory Usage Example The performance metrics for the virtual machines can be determined as follows: To determine Memory Granted (the amount of guest physical memory that is mapped to machine memory) for virtual machine 1, count the number of blocks in virtual machine 1's guest physical memory that have arrows to machine memory and multiply by 4 KB. Since there are five blocks with arrows, Memory Granted would be 20 KB. Memory Consumed is the amount of machine memory allocated to the virtual machine, accounting for savings from shared memory. First, count the number of blocks in machine memory that have arrows from virtual machine 1's guest physical memory. There are three such blocks, but one block is shared with virtual machine 2. So count two full blocks plus half of the third and multiply by 4 KB for a total of 10 KB Memory Consumed. The important difference between these two metrics is that Memory Granted counts the number of blocks with arrows at the guest physical memory level and Memory Consumed counts the number of blocks with arrows at the machine memory level. The number of blocks differs between the two levels due to memory sharing and so Memory Granted and Memory Consumed differ. This is not problematic and shows that memory is being saved through sharing or other reclamation techniques. A similar result is obtained when determining Memory Shared and Memory Shared Common for the host. Memory Shared for the host is the sum of each virtual machine's Memory Shared. Calculate this by looking at each virtual machine's guest physical memory and counting the number of blocks that have arrows to machine memory blocks that themselves have more than one arrow pointing at them. There are six such blocks in the example, so Memory Shared for the host is 24 KB. Memory Shared Common is the amount of machine memory that is shared by virtual machines. To determine this, look at the machine memory and count the number of blocks that have more than one arrow pointing at them. There are three such blocks, so Memory Shared Common is 12 KB. Memory Shared is concerned with guest physical memory and looks at the origin of the arrows. Memory Shared Common, however, deals with machine memory and looks at the destination of the arrows. The memory metrics that measure guest physical memory and machine memory might appear contradictory. In fact, they are measuring different aspects of a virtual machine's memory usage. By understanding the differences between these metrics, you can better utilize them to diagnose performance issues. 6.3 Memory Reliability Memory reliability, also known as error isolation, allows ESXi to stop using parts of memory when it determines that a failure might occur, as well as when a failure did occur. When enough corrected errors are reported at a particular address, ESXi stops using this address to prevent the corrected error from becoming an uncorrected error. Memory reliability provides better VMkernel reliability despite corrected and uncorrected errors in RAM. It also enables the system to avoid using memory pages that might contain errors.
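The corrected-error handling just described amounts to retiring a page address once its corrected-error count passes some limit. The sketch below uses an assumed limit and illustrative names; the real threshold and bookkeeping are internal to ESXi.

    from collections import defaultdict

    CORRECTED_ERROR_LIMIT = 3   # assumed value; the actual ESXi limit is internal

    corrected_errors = defaultdict(int)
    retired_addresses = set()

    def report_corrected_error(address):
        # After enough corrected errors at one address, stop using that page so
        # the corrected error never has the chance to become an uncorrected one.
        corrected_errors[address] += 1
        if corrected_errors[address] >= CORRECTED_ERROR_LIMIT:
            retired_addresses.add(address)

    for _ in range(3):
        report_corrected_error(0x7F3A000)
    print(0x7F3A000 in retired_addresses)   # -> True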
6.3.1 Correct an Error Isolation Notification With memory reliability, VMkernel stops using pages that receive an error isolation notification. The user receives an event in the vSphere Client when VMkernel recovers from an uncorrectable memory error, when VMkernel retires a significant percentage of system memory due to a large number of correctable errors, or if there is a large number of pages that are unable to retire. Procedure Vacate the host. Migrate the virtual machines. Run tests. 6.4 Managing Storage I/O Resources vSphere Storage I/O Control allows cluster-wide storage I/O prioritization, which allows better workload consolidation and helps reduce extra costs associated with over provisioning. Storage I/O Control extends the constructs of shares and limits to handle storage I/O resources. You can control the amount of storage I/O that is allocated to virtual machines during periods of I/O congestion, which ensures that more important virtual machines get preference over less important virtual machines for I/O resource allocation. When you enable Storage I/O Control on a datastore, ESXi begins to monitor the device latency that hosts observe when communicating with that datastore. When device latency exceeds a threshold, the datastore is considered to be congested and each virtual machine that accesses that datastore is allocated I/O resources in proportion to their shares. You set shares per virtual machine. You can adjust the number for each based on need. Configuring Storage I/O Control is a two-step process: 1. Enable Storage I/O Control for the datastore. 2. Set the number of storage I/O shares and upper limit of I/O operations per second (IOPS) allowed for each virtual machine. By default, all virtual machine shares are set to Normal (1000) with unlimited IOPS. Note Storage I/O Control is enabled by default on Storage DRS-enabled datastore clusters. 6.4.1 Storage I/O Control Resource Shares and Limits You allocate the number of storage I/O shares and upper limit of I/O operations per second (IOPS) allowed for each virtual machine. When storage I/O congestion is detected for a datastore, the I/O workloads of the virtual machines accessing that datastore are adjusted according to the proportion of virtual machine shares each virtual machine has. Storage I/O shares are similar to those used for memory and CPU resource allocation, which are described in Resource Allocation Shares. These shares represent the relative importance of a virtual machine with regard to the distribution of storage I/O resources. Under resource contention, virtual machines with higher share values have greater access to the storage array, which typically results in higher throughput and lower latency. When you allocate storage I/O resources, you can limit the IOPS that are allowed for a virtual machine. By default, these are unlimited. If a virtual machine has more than one virtual disk, you must set the limit on all of its virtual disks. Otherwise, the limit will not be enforced for the virtual machine. In this case, the limit on the virtual machine is the aggregation of the limits for all virtual disks. The benefits and drawbacks of setting resource limits are described in Resource Allocation Limit. If the limit you want to set for a virtual machine is in terms of MB per second instead of IOPS, you can convert MB per second into IOPS based on the typical I/O size for that virtual machine. For example, to restrict a backup application with 64KB IOs to 10MB per second, set a limit of 160 IOPS. 
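The MB-per-second-to-IOPS conversion mentioned above is simple arithmetic; a small helper (illustrative names only) makes the 64 KB / 10 MB per second example explicit.

    def mbps_to_iops(limit_mb_per_sec, typical_io_size_kb):
        # IOPS limit = throughput limit divided by the typical I/O size.
        return int(limit_mb_per_sec * 1024 / typical_io_size_kb)

    # A backup VM issuing 64 KB I/Os, capped at 10 MB per second:
    print(mbps_to_iops(10, 64))   # -> 160, matching the example above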
6.4.2 Storage I/O Control Requirements Storage I/O Control has several requirements and limitations. Datastores that are Storage I/O Control-enabled must be managed by a single vCenter Server system. Storage I/O Control is supported on Fibre Channel-connected, iSCSI-connected, and NFS-connected storage. Raw Device Mapping (RDM) is not supported. Storage I/O Control does not support datastores with multiple extents. Before using Storage I/O Control on datastores that are backed by arrays with automated storage tiering capabilities, check the VMware Storage/SAN Compatibility Guide to verify whether your automated tiered storage array has been certified to be compatible with Storage I/O Control. Automated storage tiering is the ability of an array (or group of arrays) to migrate LUNs/volumes or parts of LUNs/volumes to different types of storage media (SSD, FC, SAS, SATA) based on user-set policies and current I/O patterns. No special certification is required for arrays that do not have these automatic migration/tiering features, including those that provide the ability to manually migrate data between different types of storage media. 6.5 Set Storage I/O Control Threshold Value The congestion threshold value for a datastore is the upper limit of latency that is allowed for a datastore before Storage I/O Control begins to assign importance to the virtual machine workloads according to their shares. You do not need to adjust the threshold setting in most environments. Caution Storage I/O Control will not function correctly unless all datastores that share the same spindles on the array have the same congestion threshold. If you change the congestion threshold setting, set the value based on the following considerations. A higher value typically results in higher aggregate throughput and weaker isolation.
Throttling will not occur unless the overall average latency is higher than the threshold. If throughput is more critical than latency, do not set the value too low. For example, for Fibre Channel disks, a value below 20 ms could lower peak disk throughput. A very high value (above 50 ms) might allow very high latency without any significant gain in overall throughput. A lower value will result in lower device latency and stronger virtual machine I/O performance isolation. Stronger isolation means that the shares controls are enforced more often. Lower device latency translates into lower I/O latency for the virtual machines with the highest shares, at the cost of higher I/O latency experienced by the virtual machines with fewer shares. If latency is more important, a very low value (lower than 20 ms) will result in lower device latency and better isolation among I/Os at the potential cost of a decrease in aggregate datastore throughput. 6.5.1 Monitor Storage I/O Control Shares Use the datastore Performance tab to monitor how Storage I/O Control handles the I/O workloads of the virtual machines accessing a datastore based on their shares. Datastore performance charts allow you to monitor the following information: Average latency and aggregated IOPS on the datastore Latency among hosts Queue depth among hosts Read/write IOPS among hosts Read/write latency among virtual machine disks Read/write IOPS among virtual machine disks Procedure 1. Select the datastore in the vSphere Client inventory and click the Performance tab. 2. From the View drop-down menu, select Performance. For more information, see the vSphere Monitoring and Performance documentation. 6.6 Managing Resource Pools A resource pool is a logical abstraction for flexible management of resources. Resource pools can be grouped into hierarchies and used to hierarchically partition available CPU and memory resources. Each standalone host and each DRS cluster has an (invisible) root resource pool that groups the resources of that host or cluster. The root resource pool does not appear because the resources of the host (or cluster) and the root resource pool are always the same. Users can create child resource pools of the root resource pool or of any user-created child resource pool. Each child resource pool owns some of the parent’s resources and can, in turn, have a hierarchy of child resource pools to represent successively smaller units of computational capability. A resource pool can contain child resource pools, virtual machines, or both. You can create a hierarchy of shared resources. The resource pools at a higher level are called parent resource pools. Resource pools and virtual machines that are at the same level are called siblings. The cluster itself represents the root resource pool. If you do not create child resource pools, only the root resource pools exist. In the following example, RP-QA is the parent resource pool for RP-QA-UI. RP-Marketing and RP-QA are siblings. The three virtual machines immediately below RP-Marketing are also siblings. Parents, Children, and Siblings in Resource Pool Hierarchy For each resource pool, you specify reservation, limit, shares, and whether the reservation should be expandable. The resource pool resources are then available to child resource pools and virtual machines. 6.7 Managing Resource Pools You can create a child resource pool of any ESXi host, resource pool, or DRS cluster. Note If a host has been added to a cluster, you cannot create child resource pools of that host. 
If the cluster is enabled for DRS, you can create child resource pools of the cluster. When you create a child resource pool, you are prompted for resource pool attribute information. The system uses admission control to make sure you cannot allocate resources that are not available. Prerequisites The vSphere Client is connected to the vCenter Server system. If you connect the vSphere Client directly to a host, you cannot create a resource pool. Procedure 1. In the vSphere Client inventory, select a parent object for the resource pool (a host, another resource pool, or a DRS cluster). 2. Select File > New > Resource Pool. 3. Type a name to identify the resource pool. 4. Specify how to allocate CPU and memory resources. The CPU resources for your resource pool are the guaranteed physical resources the host reserves for a resource pool. Normally, you accept the default and let the host handle resource allocation. Shares: Specify shares for this resource pool with respect to the parent's total resources. Sibling resource pools share resources according to their relative share values bounded by the reservation and limit. Select Low, Normal, or High to specify share values respectively in a 1:2:4 ratio. Select Custom to give each virtual machine a specific number of shares, which expresses a proportional weight. Reservation: Specify a guaranteed CPU or memory allocation for this resource pool. Defaults to 0. A nonzero reservation is subtracted from the unreserved resources of the parent (host or resource pool). The resources are considered reserved, regardless of whether virtual machines are associated with the resource pool. Expandable Reservation: When the check box is selected (default), expandable reservations are considered during admission control. If you power on a virtual machine in this resource pool, and the combined reservations of the virtual machines are larger than the reservation of the resource pool, the resource pool can use resources from its parent or ancestors. Limit: Specify the upper limit for this resource pool's CPU or memory allocation. You can usually accept the default (Unlimited). To specify a limit, deselect the Unlimited check box. 6.8 Resource Pool Admission Control When you power on a virtual machine in a resource pool, or try to create a child resource pool, the system performs additional admission control to ensure the resource pool's restrictions are not violated. Before you power on a virtual machine or create a resource pool, ensure that sufficient resources are available using the Resource Allocation tab in the vSphere Client. The Available Reservation value for CPU and memory displays resources that are unreserved. How available CPU and memory resources are computed and whether actions are performed depends on the Reservation Type. Reservation Types: Fixed: The system checks whether the selected resource pool has sufficient unreserved resources. If it does, the action can be performed. If it does not, a message appears and the action cannot be performed. Expandable (default): The system considers the resources available in the selected resource pool and its direct parent resource pool. If the parent resource pool also has the Expandable Reservation option selected, it can borrow resources from its parent resource pool. Borrowing resources occurs recursively from the ancestors of the current resource pool as long as the Expandable Reservation option is selected.
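To illustrate the Fixed and Expandable reservation types just described, here is a minimal sketch of the admission check. The class, field names, and bookkeeping are simplified assumptions for illustration, not the actual vCenter Server logic.

    class ResourcePool:
        def __init__(self, name, reservation, expandable, parent=None):
            self.name = name
            self.reservation = reservation      # MHz (or MB) guaranteed to this pool
            self.reservation_used = 0           # already promised to children and VMs
            self.expandable = expandable
            self.parent = parent

        def unreserved(self):
            return self.reservation - self.reservation_used

    def admit(pool, request):
        # Fixed pools may only hand out their own unreserved capacity; expandable
        # pools may recursively borrow the shortfall from their ancestors.
        available = max(pool.unreserved(), 0)
        if request <= available:
            pool.reservation_used += request
            return True
        if pool.expandable and pool.parent is not None:
            shortfall = request - available
            if admit(pool.parent, shortfall):
                pool.reservation_used += request
                return True
        return False

    parent = ResourcePool("P", reservation=6000, expandable=False)
    child = ResourcePool("S1", reservation=2000, expandable=True, parent=parent)
    parent.reservation_used = child.reservation   # the child's reservation is carved out of P
    print(admit(child, 2000))   # -> True, satisfied from S1's own reservation
    print(admit(child, 3000))   # -> True, the shortfall is borrowed from P
    print(admit(child, 5000))   # -> False, P cannot cover the request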
Leaving this option selected offers more flexibility, but, at the same time provides less protection. A child resource pool owner might reserve more resources than you anticipate. The system does not allow you to violate preconfigured Reservation or Limit settings. Each time you reconfigure a resource pool or power on a virtual machine, the system validates all parameters so all service-level guarantees can still be met. 6.8.1 Expandable Reservations Example 1 This example shows you how a resource pool with expandable reservations works. Assume an administrator manages pool P, and defines two child resource pools, S1 and S2, for two different users (or groups). The administrator knows that users want to power on virtual machines with reservations, but does not know how much each user will need to reserve. Making the reservations for S1 and S2 expandable allows the administrator to more flexibly share and inherit the common reservation for pool P. Without expandable reservations, the administrator needs to explicitly allocate S1 and S2 a specific amount. Such specific allocations can be inflexible, especially in deep resource pool hierarchies and can complicate setting reservations in the resource pool hierarchy. Expandable reservations cause a loss of strict isolation. S1 can start using all of P's reservation, so that no memory or CPU is directly available to S2. 6.8.2 Expandable Reservations Example 2 This example shows how a resource pool with expandable reservations works. Assume the following scenario, as shown in the figure. Parent pool RP-MOM has a reservation of 6GHz and one running virtual machine VM-M1 that reserves 1GHz. You create a child resource pool RP-KID with a reservation of 2GHz and with Expandable Reservation selected. You add two virtual machines, VM-K1 and VM-K2, with reservations of 2GHz each to the child resource pool and try to power them on. VM-K1 can reserve the resources directly from RP-KID (which has 2GHz). No local resources are available for VM-K2, so it borrows resources from the parent resource pool, RP-MOM. RP-MOM has 6GHz minus 1GHz (reserved by the virtual machine) minus 2GHz (reserved by RP-KID), which leaves 3GHz unreserved. With 3GHz available, you can power on the 2GHz virtual machine. Admission Control with Expandable Resource Pools: Successful Power-On Now, consider another scenario with VM-M1 and VM-M2. Power on two virtual machines in RP-MOM with a total reservation of 3GHz. You can still power on VM-K1 in RP-KID because 2GHz are available locally. When you try to power on VM-K2, RP-KID has no unreserved CPU capacity so it checks its parent. RP-MOM has only 1GHz of unreserved capacity available (5GHz of RP-MOM are already in use—3GHz reserved by the local virtual machines and 2GHz reserved by RP-KID). As a result, you cannot power on VM-K2, which requires a 2GHz reservation. Admission Control with Expandable Resource Pools: Power-On Prevented 6.9 Creating a DRS Cluster A cluster is a collection of ESXi hosts and associated virtual machines with shared resources and a shared management interface. Before you can obtain the benefits of cluster-level resource management you must create a cluster and enable DRS. Depending on whether or not Enhanced vMotion Compatibility (EVC) is enabled, DRS behaves differently when you use vSphere Fault Tolerance (vSphere FT) virtual machines in your cluster. 
DRS Behavior with vSphere FT Virtual Machines and EVC: EVC Enabled: DRS load balancing is Enabled (Primary and Secondary VMs), and DRS initial placement is Enabled (Primary and Secondary VMs). EVC Disabled: DRS load balancing is Disabled (Primary and Secondary VMs), and DRS initial placement is Disabled (Primary VMs) and Fully Automated (Secondary VMs). 6.9.1 Migration Recommendations If you create a cluster with a default manual or partially automated mode, vCenter Server displays migration recommendations on the DRS Recommendations page. The system supplies as many recommendations as necessary to enforce rules and balance the resources of the cluster. Each recommendation includes the virtual machine to be moved, current (source) host and destination host, and a reason for the recommendation. The reason can be one of the following: Balance average CPU loads or reservations. Balance average memory loads or reservations. Satisfy resource pool reservations. Satisfy an affinity rule. Host is entering maintenance mode or standby mode. 6.10 DRS Cluster Requirements Hosts that are added to a DRS cluster must meet certain requirements to use cluster features successfully. 6.10.1 Shared Storage Requirements A DRS cluster has certain shared storage requirements. Ensure that the managed hosts use shared storage. Shared storage is typically on a SAN, but can also be implemented using NAS shared storage. See the vSphere Storage documentation for information about other shared storage. 6.10.2 Shared VMFS Volume Requirements A DRS cluster has certain shared VMFS volume requirements. Configure all managed hosts to use shared VMFS volumes. Place the disks of all virtual machines on VMFS volumes that are accessible by source and destination hosts. Ensure the VMFS volume is sufficiently large to store all virtual disks for your virtual machines. Ensure all VMFS volumes on source and destination hosts use volume names, and all virtual machines use those volume names for specifying the virtual disks. Note Virtual machine swap files also need to be on a VMFS accessible to source and destination hosts (just like .vmdk virtual disk files). This requirement does not apply if all source and destination hosts are ESX Server 3.5 or higher and using host-local swap. In that case, vMotion with swap files on unshared storage is supported. Swap files are placed on a VMFS by default, but administrators might override the file location using advanced virtual machine configuration options. Processor Compatibility Requirements A DRS cluster has certain processor compatibility requirements. To avoid limiting the capabilities of DRS, you should maximize the processor compatibility of source and destination hosts in the cluster. vMotion transfers the running architectural state of a virtual machine between underlying ESXi hosts. vMotion compatibility means that the processors of the destination host must be able to resume execution using the equivalent instructions where the processors of the source host were suspended. Processor clock speeds and cache sizes might vary, but processors must come from the same vendor class (Intel versus AMD) and the same processor family to be compatible for migration with vMotion. Processor families are defined by the processor vendors. You can distinguish different processor versions within the same family by comparing the processors' model, stepping level, and extended features. Sometimes, processor vendors have introduced significant architectural changes within the same processor family (such as 64-bit extensions and SSE3).
VMware identifies these exceptions if it cannot guarantee successful migration with vMotion. vCenter Server provides features that help ensure that virtual machines migrated with vMotion meet processor compatibility requirements. These features include: Enhanced vMotion Compatibility (EVC) – You can use EVC to help ensure vMotion compatibility for the hosts in a cluster. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. This prevents migrations with vMotion from failing due to incompatible CPUs. Configure EVC from the Cluster Settings dialog box. The hosts in a cluster must meet certain requirements for the cluster to use EVC. For information about EVC and EVC requirements, see the vCenter Server and Host Management documentation. CPU compatibility masks – vCenter Server compares the CPU features available to a virtual machine with the CPU features of the destination host to determine whether to allow or disallow migrations with vMotion. By applying CPU compatibility masks to individual virtual machines, you can hide certain CPU features from the virtual machine and potentially prevent migrations with vMotion from failing due to incompatible CPUs. 6.10.3 vMotion Requirements for DRS Clusters A DRS cluster has certain vMotion requirements. To enable the use of DRS migration recommendations, the hosts in your cluster must be part of a vMotion network. If the hosts are not in the vMotion network, DRS can still make initial placement recommendations. To be configured for vMotion, each host in the cluster must meet the following requirements: vMotion does not support raw disks or migration of applications clustered using Microsoft Cluster Service (MSCS). vMotion requires a private Gigabit Ethernet migration network between all of the vMotion enabled managed hosts. When vMotion is enabled on a managed host, configure a unique network identity object for the managed host and connect it to the private migration network. Automation Level Action Manual Initial placement: Recommended host(s) is displayed. Migration: Recommendation is displayed. Partially Automated Initial placement: Automatic. Migration: Recommendation is displayed. Fully Automated Initial placement: Automatic. Migration: Recommendation is executed automatically. 6.10.4 Set a Custom Automation Level for a Virtual Machine After you create a DRS cluster, you can customize the automation level for individual virtual machines to override the cluster’s default automation level. For example, you can select Manual for specific virtual machines in a cluster with full automation, or Partially Automated for specific virtual machines in a manual cluster. If a virtual machine is set to Disabled, vCenter Server does not migrate that virtual machine or provide migration recommendations for it. This is known as pinning the virtual machine to its registered host. Note If you have not enabled Enhanced vMotion Compatibility (EVC) for the cluster, fault tolerant virtual machines are set to DRS disabled. They appear on this screen, but you cannot assign an automation mode to them. 6.10.5 Add an Unmanaged Host to a Cluster You can add an unmanaged host to a cluster. Such a host is not currently managed by the same vCenter Server system as the cluster and it is not visible in the vSphere Client. Procedure 1. Select the cluster to which to add the host and select Add Host from the right-click menu. 2. Enter the host name, user name, and password, and click Next. 3. 
View the summary information and click Next. 4. Select what to do with the host's virtual machines and resource pools. Put this host's virtual machines in the cluster's root resource pool – vCenter Server removes all existing resource pools of the host and the virtual machines in the host's hierarchy are all attached to the root. Because share allocations are relative to a resource pool, you might have to manually change a virtual machine's shares after selecting this option, which destroys the resource pool hierarchy. Create a resource pool for this host's virtual machines and resource pools – vCenter Server creates a top-level resource pool that becomes a direct child of the cluster and adds all children of the host to that new resource pool. You can supply a name for that new top-level resource pool. The default is Grafted from <host_name>. The host is added to the cluster. 6.11 Removing a Host from a Cluster When you remove a host from a DRS cluster, you affect resource pool hierarchies, virtual machines, and you might create invalid clusters. Consider the affected objects before you remove the host. Resource Pool Hierarchies – When you remove a host from a cluster, the host retains only the root resource pool, even if you used a DRS cluster and decided to graft the host resource pool when you added the host to the cluster. In that case, the hierarchy remains with the cluster. You can create a host-specific resource pool hierarchy. Note Ensure that you remove the host from the cluster by first placing it in maintenance mode. If you instead disconnect the host before removing it from the cluster, the host retains the resource pool that reflects the cluster hierarchy. Virtual Machines – A host must be in maintenance mode before you can remove it from the cluster, and for a host to enter maintenance mode all powered-on virtual machines must be migrated off that host. When you request that a host enter maintenance mode, you are also asked whether you want to migrate all the powered-off virtual machines on that host to other hosts in the cluster. Invalid Clusters – When you remove a host from a cluster, the resources available for the cluster decrease. If the cluster has enough resources to satisfy the reservations of all virtual machines and resource pools in the cluster, the cluster adjusts resource allocation to reflect the reduced amount of resources. If the cluster does not have enough resources to satisfy the reservations of all resource pools, but there are enough resources to satisfy the reservations for all virtual machines, an alarm is issued and the cluster is marked yellow. DRS continues to run. 6.11.1 Place a Host in Maintenance Mode You place a host in maintenance mode when you need to service it, for example, to install more memory. A host enters or leaves maintenance mode only as the result of a user request. Virtual machines that are running on a host entering maintenance mode need to be migrated to another host. 6.12 DRS Cluster Validity The vSphere Client indicates whether a DRS cluster is valid, overcommitted (yellow), or invalid (red). DRS clusters become overcommitted or invalid for several reasons. A cluster might become overcommitted if a host fails. A cluster becomes invalid if vCenter Server is unavailable and you power on virtual machines using a vSphere Client connected directly to a host. A cluster becomes invalid if the user reduces the reservation on a parent resource pool while a virtual machine is in the process of failing over.
If changes are made to hosts or virtual machines using a vSphere Client connected to a host while vCenter Server is unavailable, those changes take effect. When vCenter Server becomes available again, you might find that clusters have turned red or yellow because cluster requirements are no longer met. When considering cluster validity scenarios, you should understand these terms. Reservation – A fixed, guaranteed allocation for the resource pool input by the user. Reservation Used – The sum of the reservation or reservation used (whichever is larger) for each child resource pool, added recursively. Unreserved – This nonnegative number differs according to resource pool type. Nonexpandable resource pools: Reservation minus reservation used. Expandable resource pools: (Reservation minus reservation used) plus any unreserved resources that can be borrowed from its ancestor resource pools. 6.12.1 Valid DRS Clusters A valid cluster has enough resources to meet all reservations and to support all running virtual machines. The following figure shows an example of a valid cluster with fixed resource pools and how its CPU and memory resources are computed. Valid Cluster with Fixed Resource Pools The cluster has the following characteristics: A cluster with total resources of 12GHz. Three resource pools, each of type Fixed (Expandable Reservation is not selected). The total reservation of the three resource pools combined is 11GHz (4+4+3 GHz). The total is shown in the Reserved Capacity field for the cluster. RP1 was created with a reservation of 4GHz. Two virtual machines (VM1 and VM7) of 2GHz each are powered on (Reservation Used: 4GHz). No resources are left for powering on additional virtual machines. VM6 is shown as not powered on. It consumes none of the reservation. RP2 was created with a reservation of 4GHz. Two virtual machines of 1GHz and 2GHz are powered on (Reservation Used: 3GHz). 1GHz remains unreserved. RP3 was created with a reservation of 3GHz. One virtual machine with 3GHz is powered on. No resources for powering on additional virtual machines are available. The following figure shows an example of a valid cluster with some resource pools (RP1 and RP3) using reservation type Expandable. Valid Cluster with Expandable Resource Pools A valid cluster can be configured as follows: A cluster with total resources of 16GHz. RP1 and RP3 are of type Expandable, RP2 is of type Fixed. The total reservation used of the three resource pools combined is 16GHz (6GHz for RP1, 5GHz for RP2, and 5GHz for RP3). 16GHz shows up as the Reserved Capacity for the cluster at top level. RP1 was created with a reservation of 4GHz. Three virtual machines of 2GHz each are powered on. Two of those virtual machines (for example, VM1 and VM7) can use RP1's reservations, the third virtual machine (VM6) can use reservations from the cluster's resource pool. (If the type of this resource pool were Fixed, you could not power on the additional virtual machine.) RP2 was created with a reservation of 5GHz. Two virtual machines of 1GHz and 2GHz are powered on (Reservation Used: 3GHz). 2GHz remains unreserved. RP3 was created with a reservation of 5GHz. Two virtual machines of 3GHz and 2GHz are powered on. Even though this resource pool is of type Expandable, no additional 2GHz virtual machine can be powered on because the parent's extra resources are already used by RP1.
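The Reserved Capacity arithmetic in these examples, and in the overcommitted example in the next section, reduces to a simple comparison. The sketch below uses illustrative names; note that a red (invalid) cluster comes from an internally inconsistent resource pool tree, not from raw capacity, so this check only distinguishes valid from overcommitted.

    def cluster_color(total_capacity_ghz, pool_reservations_ghz):
        # Green (valid) while total capacity covers the combined pool reservations,
        # yellow (overcommitted) once it no longer does.
        reserved_capacity = sum(pool_reservations_ghz)
        return "green" if total_capacity_ghz >= reserved_capacity else "yellow"

    # Valid cluster with fixed pools: 12 GHz of capacity against 4 + 4 + 3 GHz reserved.
    print(cluster_color(12, [4, 4, 3]))   # -> green
    # The yellow example that follows: a 4 GHz host fails, leaving 8 GHz against 12 GHz reserved.
    print(cluster_color(8, [4, 5, 3]))    # -> yellow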
6.12.2 Overcommitted DRS Clusters A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all resources reserved by the child resource pools. There will always be enough resources to support all running virtual machines because, when a host becomes unavailable, all its virtual machines become unavailable. A cluster typically turns yellow when cluster capacity is suddenly reduced, for example, when a host in the cluster becomes unavailable. VMware recommends that you leave adequate additional cluster resources to avoid your cluster turning yellow. Yellow Cluster In this example: A cluster with total resources of 12GHz coming from three hosts of 4GHz each. Three resource pools reserving a total of 12GHz. The total reservation used by the three resource pools combined is 12GHz (4+5+3 GHz). That shows up as the Reserved Capacity in the cluster. One of the 4GHz hosts becomes unavailable, so total resources reduce to 8GHz. At the same time, VM4 (1GHz) and VM3 (3GHz), which were running on the host that failed, are no longer running. The cluster is now running virtual machines that require a total of 6GHz. The cluster still has 8GHz available, which is sufficient to meet virtual machine requirements. The resource pool reservations of 12GHz can no longer be met, so the cluster is marked as yellow. 6.12.3 Invalid DRS Clusters A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, resource constraints are not observed. The total amount of resources in the cluster does not affect whether the cluster is red. A cluster can be red, even if enough resources exist at the root level, if there is an inconsistency at a child level. You can resolve a red DRS cluster problem either by powering off one or more virtual machines, moving virtual machines to parts of the tree that have sufficient resources, or editing the resource pool settings in the red part. Adding resources typically helps only when you are in the yellow state. A cluster can also turn red if you reconfigure a resource pool while a virtual machine is failing over. A virtual machine that is failing over is disconnected and does not count toward the reservation used by the parent resource pool. You might reduce the reservation of the parent resource pool before the failover completes. After the failover is complete, the virtual machine resources are again charged to the parent resource pool. If the pool’s usage becomes larger than the new reservation, the cluster turns red. If a user is able to start a virtual machine (in an unsupported way) with a reservation of 3GHz under resource pool 2, the cluster would become red, as shown in the following figure. Red Cluster 6.13 DPM Note ESXi hosts cannot automatically be brought out of standby mode unless they are running in a cluster managed by vCenter Server. vSphere DPM can use one of three power management protocols to bring a host out of standby mode: Intelligent Platform Management Interface (IPMI), Hewlett-Packard Integrated Lights-Out (iLO), or Wake-On-LAN (WOL). Each protocol requires its own hardware support and configuration. If a host does not support any of these protocols it cannot be put into standby mode by vSphere DPM. If a host supports multiple protocols, they are used in the following order: IPMI, iLO, WOL. 
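The wake-protocol preference just described (IPMI, then iLO, then WOL) amounts to picking the first supported entry from an ordered list; a trivial sketch with illustrative names:

    WAKE_PROTOCOL_ORDER = ["IPMI", "iLO", "WOL"]

    def select_wake_protocol(supported_protocols):
        # A host that supports none of the protocols cannot be put into standby
        # mode by vSphere DPM at all.
        for protocol in WAKE_PROTOCOL_ORDER:
            if protocol in supported_protocols:
                return protocol
        return None

    print(select_wake_protocol({"iLO", "WOL"}))   # -> "iLO"
    print(select_wake_protocol(set()))            # -> None (host excluded from DPM standby)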
6.13.1 Test Wake-on-LAN for vSphere DPM The use of Wake-on-LAN (WOL) for the vSphere DPM feature is fully supported, if you configure and successfully test it according to the VMware guidelines. You must perform these steps before enabling vSphere DPM for a cluster for the first time or on any host that is being added to a cluster that is using vSphere DPM. Prerequisites Before testing WOL, ensure that your cluster meets the prerequisites. Your cluster must contain at least two ESX 3.5 (or ESX 3i version 3.5) or later hosts. Each host's vMotion networking link must be working correctly. The vMotion network should also be a single IP subnet, not multiple subnets separated by routers. The vMotion NIC on each host must support WOL. To check for WOL support, first determine the name of the physical network adapter corresponding to the VMkernel port by selecting the host in the inventory panel of the vSphere Client, selecting the Configuration tab, and clicking Networking. After you have this information, click on Network Adapters and find the entry corresponding to the network adapter. The Wake On LAN Supported column for the relevant adapter should show Yes. To display the WOL-compatibility status for each NIC on a host, select the host in the inventory panel of the vSphere Client, select the Configuration tab, and click Network Adapters. The NIC must show Yes in the Wake On LAN Supported column. The switch port that each WOL-supporting vMotion NIC is plugged into should be set to auto negotiate the link speed, and not set to a fixed speed (for example, 1000 Mb/s). Many NICs support WOL only if they can switch to 100 Mb/s or less when the host is powered off. After you verify these prerequisites, test each ESXi host that is going to use WOL to support vSphere DPM. When you test these hosts, ensure that the vSphere DPM feature is disabled for the cluster. Caution Ensure that any host being added to a vSphere DPM cluster that uses WOL as a wake protocol is tested and disabled from using power management if it fails the testing. If this is not done, vSphere DPM might power off hosts that it subsequently cannot power back up. 6.13.2 Using VM-Host Affinity Rules You use a VM-Host affinity rule to specify an affinity relationship between a group of virtual machines and a group of hosts. When using VM-Host affinity rules, you should be aware of when they could be most useful, how conflicts between rules are resolved, and the importance of caution when setting required affinity rules. One use case where VM-Host affinity rules are helpful is when the software you are running in your virtual machines has licensing restrictions. You can place such virtual machines into a DRS group and then create a rule that requires them to run on a host DRS group that contains only host machines that have the required licenses. Note When you create a VM-Host affinity rule that is based on the licensing or hardware requirements of the software running in your virtual machines, you are responsible for ensuring that the groups are properly set up. The rule does not monitor the software running in the virtual machines nor does it know what non-VMware licenses are in place on which ESXi hosts. If you create more than one VM-Host affinity rule, the rules are not ranked, but are applied equally. Be aware that this has implications for how the rules interact. 
For example, a virtual machine that belongs to two DRS groups, each of which belongs to a different required rule, can run only on hosts that belong to both of the host DRS groups represented in the rules. When you create a VM-Host affinity rule, its ability to function in relation to other rules is not checked. So it is possible for you to create a rule that conflicts with the other rules you are using. When two VM-Host affinity rules conflict, the older one takes precedence and the newer rule is disabled. DRS only tries to satisfy enabled rules and disabled rules are ignored. DRS, vSphere HA, and vSphere DPM never take any action that results in the violation of required affinity rules (those where the virtual machine DRS group 'must run on' or 'must not run on' the host DRS group). Accordingly, you should exercise caution when using this type of rule because of its potential to adversely affect the functioning of the cluster. If improperly used, required VM-Host affinity rules can fragment the cluster and inhibit the proper functioning of DRS, vSphere HA, and vSphere DPM. A number of cluster functions are not performed if doing so would violate a required affinity rule. DRS does not evacuate virtual machines to place a host in maintenance mode. DRS does not place virtual machines for power-on or load balance virtual machines. vSphere HA does not perform failovers. vSphere DPM does not optimize power management by placing hosts into standby mode. To avoid these situations, exercise caution when creating more than one required affinity rule or consider using VM-Host affinity rules that are preferential only (those where the virtual machine DRS group 'should run on' or 'should not run on' the host DRS group). Ensure that the number of hosts in the cluster with which each virtual machine is affined is large enough that losing a host does not result in a lack of hosts on which the virtual machine can run. Preferential rules can be violated to allow the proper functioning of DRS, vSphere HA, and vSphere DPM. Note You can create an event-based alarm that is triggered when a virtual machine violates a VM-Host affinity rule. In the vSphere Client, add a new alarm for the virtual machine and select VM is violating VM-Host Affinity Rule as the event trigger. For more information about creating and editing alarms, see the vSphere Monitoring and Performance documentation. 6.14 Datastore clusters Initial placement occurs when Storage DRS selects a datastore within a datastore cluster on which to place a virtual machine disk. This happens when the virtual machine is being created or cloned, when a virtual machine disk is being migrated to another datastore cluster, or when you add a disk to an existing virtual machine. Initial placement recommendations are made in accordance with space constraints and with respect to the goals of space and I/O load balancing. These goals aim to minimize the risk of over-provisioning one datastore, storage I/O bottlenecks, and performance impact on virtual machines. Storage DRS is invoked at the configured frequency (by default, every eight hours) or when one or more datastores in a datastore cluster exceeds the user-configurable space utilization thresholds. When Storage DRS is invoked, it checks each datastore's space utilization and I/O latency values against the threshold. For I/O latency, Storage DRS uses the 90th percentile I/O latency measured over the course of a day to compare against the threshold.
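The invocation check just described (space utilization and 90th percentile I/O latency compared against their thresholds) can be sketched as follows; the function and field names are illustrative, not an actual API.

    def storage_drs_should_act(datastore, space_threshold_pct, latency_threshold_ms):
        # Storage DRS acts when a datastore exceeds its space utilization threshold,
        # or its 90th percentile I/O latency (measured over a day) exceeds the
        # latency threshold.
        return (datastore["space_used_pct"] > space_threshold_pct or
                datastore["latency_90th_percentile_ms"] > latency_threshold_ms)

    print(storage_drs_should_act(
        {"space_used_pct": 85, "latency_90th_percentile_ms": 12}, 80, 15))   # -> True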
6.15 Setting the Aggressiveness Level for Storage DRS The aggressiveness of Storage DRS is determined by specifying thresholds for space used and I/O latency. Storage DRS collects resource usage information for the datastores in a datastore cluster. vCenter Server uses this information to generate recommendations for placement of virtual disks on datastores. When you set a low aggressiveness level for a datastore cluster, Storage DRS recommends Storage vMotion migrations only when absolutely necessary, for example, when I/O load, space utilization, or their imbalance is high. When you set a high aggressiveness level for a datastore cluster, Storage DRS recommends migrations whenever the datastore cluster can benefit from space or I/O load balancing. In the vSphere Client, you can use the following thresholds to set the aggressiveness level for Storage DRS: Space Utilization – Storage DRS generates recommendations or performs migrations when the percentage of space utilization on the datastore is greater than the threshold you set in the vSphere Client. I/O Latency – Storage DRS generates recommendations or performs migrations when the 90th percentile I/O latency measured over a day for the datastore is greater than the threshold. You can also set advanced options to further configure the aggressiveness level of Storage DRS. Space utilization difference – This threshold ensures that there is some minimum difference between the space utilization of the source and the destination. For example, if the space used on datastore A is 82% and datastore B is 79%, the difference is 3. If the threshold is 5, Storage DRS will not make migration recommendations from datastore A to datastore B. I/O load balancing invocation interval – After this interval, Storage DRS runs to balance I/O load. I/O imbalance threshold – Lowering this value makes I/O load balancing less aggressive. Storage DRS computes an I/O fairness metric between 0 and 1, with 1 being the fairest distribution. I/O load balancing runs only if the computed metric is less than 1 - (I/O imbalance threshold / 100). 6.16 Datastore Cluster Requirements Datastores and hosts that are associated with a datastore cluster must meet certain requirements to use datastore cluster features successfully. Follow these guidelines when you create a datastore cluster. Datastore clusters must contain similar or interchangeable datastores. A datastore cluster can contain a mix of datastores with different sizes and I/O capacities, and can be from different arrays and vendors. However, the following types of datastores cannot coexist in a datastore cluster. NFS and VMFS datastores cannot be combined in the same datastore cluster. Replicated datastores cannot be combined with non-replicated datastores in the same Storage-DRS-enabled datastore cluster. All hosts attached to the datastores in a datastore cluster must be ESXi 5.0 and later. If datastores in the datastore cluster are connected to ESX/ESXi 4.x and earlier hosts, Storage DRS does not run. Datastores shared across multiple datacenters cannot be included in a datastore cluster. As a best practice, do not include datastores that have hardware acceleration enabled in the same datastore cluster as datastores that do not have hardware acceleration enabled. Datastores in a datastore cluster must be homogeneous to guarantee hardware acceleration-supported behavior.
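Returning to the aggressiveness thresholds in 6.15, the two advanced-option checks can be written out explicitly. The function names are illustrative; the numbers reproduce the 82% / 79% example and the fairness formula given above.

    def should_recommend_space_migration(source_used_pct, dest_used_pct, min_difference_pct):
        # Space utilization difference threshold: 82% vs 79% with a threshold of 5
        # produces no recommendation, exactly as in the example above.
        return (source_used_pct - dest_used_pct) >= min_difference_pct

    def should_run_io_balancing(io_fairness_metric, io_imbalance_threshold_pct):
        # I/O load balancing runs only if the fairness metric (0..1, 1 = fairest)
        # drops below 1 - (I/O imbalance threshold / 100).
        return io_fairness_metric < 1 - (io_imbalance_threshold_pct / 100)

    print(should_recommend_space_migration(82, 79, 5))   # -> False
    print(should_run_io_balancing(0.7, 20))              # -> True (0.7 < 0.8)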
6.17 Adding and Removing Datastores from a Datastore Cluster You add and remove datastores to and from an existing datastore cluster by dragging them in the vSphere Client inventory. You can add to a datastore cluster any datastore that is mounted on a host in the vSphere Client inventory, with the following exceptions: All hosts attached to the datastore must be ESXi 5.0 and later. The datastore cannot be in more than one datacenter in the same instance of the vSphere Client. When you remove a datastore from a datastore cluster, the datastore remains in the vSphere Client inventory and is not unmounted from the host. 6.17.1 Place a Datastore in Maintenance Mode If you need to take a datastore out of service, you can place the datastore in Storage DRS maintenance mode. Prerequisites Storage DRS is enabled on the datastore cluster that contains the datastore that is entering maintenance mode. No CD-ROM image files are stored on the datastore. There are at least two datastores in the datastore cluster. Procedure 1. In the vSphere Client inventory, right-click a datastore in a datastore cluster and select Enter SDRS Maintenance Mode. A list of recommendations appears for datastore maintenance mode migration. 2. (Optional) On the Placement Recommendations tab, deselect any recommendations you do not want to apply. Note The datastore cannot enter maintenance mode without evacuating all disks. If you deselect recommendations, you must manually move the affected virtual machines. 3. If necessary, click Apply Recommendations. vCenter Server uses Storage vMotion to migrate the virtual disks from the source datastore to the destination datastore and the datastore enters maintenance mode. The datastore icon might not be immediately updated to reflect the datastore's current state. To update the icon immediately, click Refresh. 6.17.2 Ignore Storage DRS Affinity Rules for Maintenance Mode Storage DRS affinity or anti-affinity rules might prevent a datastore from entering maintenance mode. You can ignore these rules when you put a datastore in maintenance mode. When you enable the Ignore Affinity Rules for Maintenance option for a datastore cluster, vCenter Server ignores Storage DRS affinity and anti-affinity rules that prevent a datastore from entering maintenance mode. Storage DRS rules are ignored only for evacuation recommendations. vCenter Server does not violate the rules when making space and load balancing recommendations or initial placement recommendations. Procedure 1. In the vSphere Client inventory, right-click a datastore cluster and select Edit Settings. 2. In the right pane of the Edit Datastore Cluster dialog box, select SDRS Automation. 3. Click Advanced Options. 4. Select IgnoreAffinityRulesForMaintenance. 5. In the Value column, type 1 to enable the option, or type 0 to disable it. 6. Click OK. 6.18 Storage DRS Anti-Affinity Rules You can create Storage DRS anti-affinity rules to control which virtual disks should not be placed on the same datastore within a datastore cluster. By default, a virtual machine's virtual disks are kept together on the same datastore. When you create an anti-affinity rule, it applies to the relevant virtual disks in the datastore cluster. Anti-affinity rules are enforced during initial placement and Storage DRS-recommended migrations, but are not enforced when a migration is initiated by a user.
Note Anti-affinity rules do not apply to CD-ROM ISO image files that are stored on a datastore in a datastore cluster, nor do they apply to swapfiles that are stored in user-defined locations. Inter-VM Anti-Affinity Rules – Specify which virtual machines should never be kept on the same datastore. See Create Inter-VM Anti-Affinity Rules. Intra-VM Anti-Affinity Rules – Specify which virtual disks associated with a particular virtual machine must be kept on different datastores. See Create Intra-VM Anti-Affinity Rules. If you move a virtual disk out of the datastore cluster, the affinity or anti-affinity rule no longer applies to that disk. When you move virtual disk files into a datastore cluster that has existing affinity and anti-affinity rules, the following behavior applies: Datastore Cluster B has an intra-VM affinity rule. When you move a virtual disk out of Datastore Cluster A and into Datastore Cluster B, any rule that applied to the virtual disk for a given virtual machine in Datastore Cluster A no longer applies. The virtual disk is now subject to the intra-VM affinity rule in Datastore Cluster B. Datastore Cluster B has an inter-VM anti-affinity rule. When you move a virtual disk out of Datastore Cluster A and into Datastore Cluster B, any rule that applied to the virtual disk for a given virtual machine in Datastore Cluster A no longer applies. The virtual disk is now subject to the inter-VM anti-affinity rule in Datastore Cluster B. Datastore Cluster B has an intra-VM anti-affinity rule. When you move a virtual disk out of Datastore Cluster A and into Datastore Cluster B, the intra-VM anti-affinity rule does not apply to the virtual disk for a given virtual machine because the rule is limited to only specified virtual disks in Datastore Cluster B. Note Storage DRS rules might prevent a datastore from entering maintenance mode. You can choose to ignore Storage DRS rules for maintenance mode by enabling the Ignore Affinity Rules for Maintenance option. 6.18.1 Create Inter-VM Anti-Affinity Rules You can create an anti-affinity rule to indicate that all virtual disks of certain virtual machines must be kept on different datastores. The rule applies to individual datastore clusters. Virtual machines that participate in an inter-VM anti-affinity rule in a datastore cluster must be associated with an intra-VM affinity rule in the datastore cluster. The virtual machines must also comply with the intra-VM affinity rule. If a virtual machine is subject to an inter-VM anti-affinity rule, the following behavior applies: Storage DRS places the virtual machine's virtual disks according to the rule. Storage DRS migrates the virtual disks using vMotion according to the rule, even if the migration is for a mandatory reason such as putting a datastore in maintenance mode. If the virtual machine's virtual disk violates the rule, Storage DRS makes migration recommendations to correct the error or reports the violation as a fault if it cannot make a recommendation that will correct the error. No inter-VM anti-affinity rules are defined by default. Procedure 1. In the vSphere Client inventory, right-click a datastore cluster and select Edit Settings. 2. In the left pane of the Edit Datastore Cluster dialog box, select Rules. 3. Click Add. 4. Type a name for the rule. 5. From the Type menu, select VM anti-affinity. 6. Click Add. 7. Click Select Virtual Machine. 8. Select at least two virtual machines and click OK. 9. Click OK to save the rule.
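The two rule types above can be checked against a simple placement model, which may help when reasoning about violations reported as faults. The data structures and names are illustrative assumptions, not Storage DRS objects.

    def inter_vm_violations(vm_datastores, anti_affinity_group):
        # vm_datastores maps a VM name to the set of datastores holding its disks.
        # VMs covered by an inter-VM anti-affinity rule must not share any datastore.
        violations = []
        vms = sorted(anti_affinity_group)
        for i, a in enumerate(vms):
            for b in vms[i + 1:]:
                shared = vm_datastores[a] & vm_datastores[b]
                if shared:
                    violations.append((a, b, shared))
        return violations

    def intra_vm_violation(disk_datastores):
        # disk_datastores lists the datastore of each disk of one VM; an intra-VM
        # anti-affinity rule requires them all to differ.
        return len(set(disk_datastores)) < len(disk_datastores)

    print(inter_vm_violations({"vm1": {"ds1"}, "vm2": {"ds1", "ds2"}}, {"vm1", "vm2"}))
    print(intra_vm_violation(["ds1", "ds1", "ds2"]))   # -> True (two disks share ds1)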
6.19 Storage vMotion Compatibility with Datastore Clusters

A datastore cluster has certain vSphere Storage vMotion® requirements:

- The host must be running a version of ESXi that supports Storage vMotion.
- The host must have write access to both the source datastore and the destination datastore.
- The host must have enough free memory resources to accommodate Storage vMotion.
- The destination datastore must have sufficient disk space.
- The destination datastore must not be in maintenance mode or entering maintenance mode.

6.20 Using NUMA Systems with ESXi

ESXi supports memory access optimization for Intel and AMD Opteron processors in server architectures that support NUMA (non-uniform memory access). After you understand how ESXi NUMA scheduling is performed and how the VMware NUMA algorithms work, you can specify NUMA controls to optimize the performance of your virtual machines.

This chapter includes the following topics: What is NUMA?, How ESXi NUMA Scheduling Works, VMware NUMA Optimization Algorithms and Settings, Resource Management in NUMA Architectures, Using Virtual NUMA, and Specifying NUMA Controls.

6.20.1 What is NUMA?

NUMA systems are advanced server platforms with more than one system bus. They can harness large numbers of processors in a single system image with superior price-to-performance ratios.

For the past decade, processor clock speed has increased dramatically. A multi-gigahertz CPU, however, needs to be supplied with a large amount of memory bandwidth to use its processing power effectively. Even a single CPU running a memory-intensive workload, such as a scientific computing application, can be constrained by memory bandwidth. This problem is amplified on symmetric multiprocessing (SMP) systems, where many processors must compete for bandwidth on the same system bus. Some high-end systems try to solve this problem by building a high-speed data bus. However, such a solution is expensive and limited in scalability.

NUMA is an alternative approach that links several small, cost-effective nodes using a high-performance connection. Each node contains processors and memory, much like a small SMP system. However, an advanced memory controller allows a node to use memory on all other nodes, creating a single system image. When a processor accesses memory that does not lie within its own node (remote memory), the data must be transferred over the NUMA connection, which is slower than accessing local memory. Memory access times are not uniform and depend on the location of the memory and the node from which it is accessed, as the technology's name implies.

6.20.2 Challenges for Operating Systems

Because a NUMA architecture provides a single system image, it can often run an operating system with no special optimizations. The high latency of remote memory accesses can leave the processors under-utilized, constantly waiting for data to be transferred to the local node, and the NUMA connection can become a bottleneck for applications with high memory-bandwidth demands.

Furthermore, performance on such a system can be highly variable. It varies, for example, if an application has memory located locally on one benchmarking run, but a subsequent run happens to place all of that memory on a remote node. This phenomenon can make capacity planning difficult.

Some high-end UNIX systems provide support for NUMA optimizations in their compilers and programming libraries. This support requires software developers to tune and recompile their programs for optimal performance.
Optimizations for one system are not guaranteed to work well on the next generation of the same system. Other systems have allowed an administrator to explicitly decide on the node on which an application should run. While this might be acceptable for certain applications that demand 100 percent of their memory to be local, it creates an administrative burden and can lead to imbalance between nodes when workloads change.

Ideally, the system software provides transparent NUMA support, so that applications can benefit immediately without modifications. The system should maximize the use of local memory and schedule programs intelligently without requiring constant administrator intervention. Finally, it must respond well to changing conditions without compromising fairness or performance.

6.20.3 How ESXi NUMA Scheduling Works

ESXi uses a sophisticated NUMA scheduler to dynamically balance processor load and memory locality.

1. Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is one of the system's NUMA nodes containing processors and local memory, as indicated by the System Resource Allocation Table (SRAT).
2. When memory is allocated to a virtual machine, the ESXi host preferentially allocates it from the home node. The virtual CPUs of the virtual machine are constrained to run on the home node to maximize memory locality.
3. The NUMA scheduler can dynamically change a virtual machine's home node to respond to changes in system load. The scheduler might migrate a virtual machine to a new home node to reduce processor load imbalance. Because this might cause more of its memory to be remote, the scheduler might migrate the virtual machine's memory dynamically to its new home node to improve memory locality. The NUMA scheduler might also swap virtual machines between nodes when this improves overall memory locality.

Some virtual machines are not managed by the ESXi NUMA scheduler. For example, if you manually set the processor or memory affinity for a virtual machine, the NUMA scheduler might not be able to manage this virtual machine. Virtual machines that are not managed by the NUMA scheduler still run correctly. However, they don't benefit from ESXi NUMA optimizations.

The NUMA scheduling and memory placement policies in ESXi can manage all virtual machines transparently, so that administrators do not need to address the complexity of balancing virtual machines between nodes explicitly. The optimizations work seamlessly regardless of the type of guest operating system. ESXi provides NUMA support even to virtual machines that do not support NUMA hardware, such as Windows NT 4.0. As a result, you can take advantage of new hardware even with legacy operating systems.

A virtual machine that has more virtual processors than the number of physical processor cores available on a single hardware node can be managed automatically. The NUMA scheduler accommodates such a virtual machine by having it span NUMA nodes. That is, it is split up into multiple NUMA clients, each of which is assigned to a node and then managed by the scheduler as a normal, non-spanning client. This can improve the performance of certain memory-intensive workloads with high locality. For information on configuring the behavior of this feature, see Advanced Virtual Machine Attributes.

ESXi 5.0 and later includes support for exposing virtual NUMA topology to guest operating systems. For more information about virtual NUMA control, see Using Virtual NUMA.
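To make the wide-virtual-machine behavior concrete, the short Python sketch below is purely illustrative arithmetic, not an ESXi algorithm or API: given a hypothetical number of cores per physical NUMA node, it shows how many NUMA clients a virtual machine of a given vCPU width would be split into.

    # Illustrative only: rough arithmetic for how a wide VM maps onto NUMA clients.
    # The cores-per-node value is a property of the physical host; the values below
    # are hypothetical examples, not measurements from a real system.
    import math

    def numa_clients(vcpus: int, cores_per_node: int) -> int:
        """Number of NUMA clients a VM would be split into if each client
        is sized to fit within one physical NUMA node."""
        return math.ceil(vcpus / cores_per_node)

    for vcpus in (4, 8, 12, 16):
        print(f"{vcpus:>2} vCPUs on 8-core nodes -> {numa_clients(vcpus, 8)} NUMA client(s)")

A virtual machine that fits within one node is managed as a single, non-spanning client; anything larger is split, and each client is then placed on its own home node as described above.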
6.20.4 VMware NUMA Optimization Algorithms and Settings This section describes the algorithms and settings used by ESXi to maximize application performance while still maintaining resource guarantees. 6.20.5 Home Nodes and Initial Placement When a virtual machine is powered on, ESXi assigns it a home node. A virtual machine runs only on processors within its home node, and its newly allocated memory comes from the home node as well. Unless a virtual machine’s home node changes, it uses only local memory, avoiding the performance penalties associated with remote memory accesses to other NUMA nodes. When a virtual machine is powered on, it is assigned an initial home node so that the overall CPU and memory load among NUMA nodes remains balanced. Because internode latencies in a large NUMA system can vary greatly, ESXi determines these internode latencies at boot time and uses this information when initially placing virtual machines that are wider than a single NUMA node. These wide virtual machines are placed on NUMA nodes that are close to each other for lowest memory access latencies. Initial placement-only approaches are usually sufficient for systems that run only a single workload, such as a benchmarking configuration that remains unchanged as long as the system is running. However, this approach is unable to guarantee good performance and fairness for a datacenter-class system that supports changing workloads. Therefore, in addition to initial placement, ESXi 5.0 does dynamic migration of virtual CPUs and memory between NUMA nodes for improving CPU balance and increasing memory locality. 6.20.6 Dynamic Load Balancing and Page Migration ESXi combines the traditional initial placement approach with a dynamic rebalancing algorithm. Periodically (every two seconds by default), the system examines the loads of the various nodes and determines if it should rebalance the load by moving a virtual machine from one node to another. This calculation takes into account the resource settings for virtual machines and resource pools to improve performance without violating fairness or resource entitlements. The rebalancer selects an appropriate virtual machine and changes its home node to the least loaded node. When it can, the rebalancer moves a virtual machine that already has some memory located on the destination node. From that point on (unless it is moved again), the virtual machine allocates memory on its new home node and it runs only on processors within the new home node. Rebalancing is an effective solution to maintain fairness and ensure that all nodes are fully used. The rebalancer might need to move a virtual machine to a node on which it has allocated little or no memory. In this case, the virtual machine incurs a performance penalty associated with a large number of remote memory accesses. ESXi can eliminate this penalty by transparently migrating memory from the virtual machine’s original node to its new home node: 1. The system selects a page (4KB of contiguous memory) on the original node and copies its data to a page in the destination node. 2. The system uses the virtual machine monitor layer and the processor’s memory management hardware to seamlessly remap the virtual machine’s view of memory, so that it uses the page on the destination node for all further references, eliminating the penalty of remote memory access. When a virtual machine moves to a new node, the ESXi host immediately begins to migrate its memory in this fashion. 
It manages the rate to avoid overtaxing the system, particularly when the virtual machine has little remote memory remaining or when the destination node has little free memory available. The memory migration algorithm also ensures that the ESXi host does not move memory needlessly if a virtual machine is moved to a new node for only a short period.

When initial placement, dynamic rebalancing, and intelligent memory migration work in conjunction, they ensure good memory performance on NUMA systems, even in the presence of changing workloads. When a major workload change occurs, for instance when new virtual machines are started, the system takes time to readjust, migrating virtual machines and memory to new locations. After a short period, typically seconds or minutes, the system completes its readjustments and reaches a steady state.

6.20.7 Transparent Page Sharing Optimized for NUMA

Many ESXi workloads present opportunities for sharing memory across virtual machines. For example, several virtual machines might be running instances of the same guest operating system, have the same applications or components loaded, or contain common data. In such cases, ESXi systems use a proprietary transparent page-sharing technique to securely eliminate redundant copies of memory pages. With memory sharing, a workload running in virtual machines often consumes less memory than it would when running on physical machines. As a result, higher levels of overcommitment can be supported efficiently.

Transparent page sharing for ESXi systems has also been optimized for use on NUMA systems. On NUMA systems, pages are shared per node, so each NUMA node has its own local copy of heavily shared pages. When virtual machines use shared pages, they don't need to access remote memory.

Note: This default behavior is the same in all previous versions of ESX and ESXi.

6.20.8 Resource Management in NUMA Architectures

You can perform resource management with different types of NUMA architecture. With the proliferation of highly multicore systems, NUMA architectures are becoming more popular, as these architectures allow better performance scaling of memory-intensive workloads.

All modern Intel and AMD systems have NUMA support built into the processors. Additionally, there are traditional NUMA systems like the IBM Enterprise X-Architecture that extend Intel and AMD processors with NUMA behavior through specialized chipset support.

Typically, you can use BIOS settings to enable and disable NUMA behavior. For example, in AMD Opteron-based HP ProLiant servers, NUMA can be disabled by enabling node interleaving in the BIOS. If NUMA is enabled, the BIOS builds a system resource allocation table (SRAT), which ESXi uses to generate the NUMA information used in optimizations. For scheduling fairness, NUMA optimizations are not enabled for systems with too few cores per NUMA node or too few cores overall. You can modify the numa.rebalancecorestotal and numa.rebalancecoresnode options to change this behavior.

6.20.9 Using Virtual NUMA

vSphere 5.0 and later includes support for exposing virtual NUMA topology to guest operating systems, which can improve performance by facilitating guest operating system and application NUMA optimizations.

Virtual NUMA topology is available to hardware version 8 virtual machines and is enabled by default when the number of virtual CPUs is greater than eight. You can also manually influence virtual NUMA topology using advanced configuration options.
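As an illustration of those advanced options, the following pyVmomi sketch sets the socket topology and one virtual NUMA control on a powered-off virtual machine; the two vSphere Client settings that drive the same topology are described next. The vCenter address, credentials, and VM name are placeholders, and the specific advanced option key (numa.vcpu.maxPerVirtualNode) is an assumption drawn from VMware's virtual NUMA documentation, so confirm it for your vSphere version before use.

    # Sketch: adjust virtual-NUMA-related settings on a VM with pyVmomi.
    # 'vm01', the vCenter host name, and credentials are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                      pwd='password', sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == 'vm01')
    view.DestroyView()

    spec = vim.vm.ConfigSpec(
        numCPUs=12,
        numCoresPerSocket=6,   # two virtual sockets; influences virtual NUMA node size
        extraConfig=[
            # Assumed advanced option from VMware's virtual NUMA controls documentation;
            # verify the key against your vSphere version before relying on it.
            vim.option.OptionValue(key='numa.vcpu.maxPerVirtualNode', value='6'),
        ])
    task = vm.ReconfigVM_Task(spec)   # apply while the VM is powered off
    Disconnect(si)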
You can affect the virtual NUMA topology with two settings in the vSphere Client: number of virtual sockets and number of cores per socket for a virtual machine. If the number of cores per socket (cpuid.coresPerSocket) is greater than one, and the number of virtual cores in the virtual machine is greater than 8, the virtual NUMA node size matches the virtual socket size. If the number of cores per socket is less than or equal to one, virtual NUMA nodes are created to match the topology of the first physical host where the virtual machine is powered on.

When the number of virtual CPUs and the amount of memory used grow proportionately, you can use the default values. For virtual machines that consume a disproportionately large amount of memory, you can override the default values in one of the following ways:

- Increase the number of virtual CPUs, even if this number of virtual CPUs is not used. See Change the Number of Virtual CPUs.
- Use advanced options to control virtual NUMA topology and its mapping over physical NUMA topology. See Virtual NUMA Controls.

7 Security

VMware designed the virtualization layer, or VMkernel, to run virtual machines. It controls the hardware that hosts use and schedules the allocation of hardware resources among the virtual machines. Because the VMkernel is fully dedicated to supporting virtual machines and is not used for other purposes, the interface to the VMkernel is strictly limited to the API required to manage virtual machines.

ESXi provides additional VMkernel protection with the following features:

Memory Hardening: The ESXi kernel, user-mode applications, and executable components such as drivers and libraries are located at random, non-predictable memory addresses. Combined with the non-executable memory protections made available by microprocessors, this makes it difficult for malicious code to use memory exploits to take advantage of vulnerabilities.

Kernel Module Integrity: Digital signing ensures the integrity and authenticity of modules, drivers, and applications as they are loaded by the VMkernel. Module signing allows ESXi to identify the providers of modules, drivers, or applications and whether they are VMware-certified. VMware software and certain third-party drivers are signed by VMware.

Trusted Platform Module (TPM): vSphere uses Intel Trusted Platform Module/Trusted Execution Technology (TPM/TXT) to provide remote attestation of the hypervisor image based on a hardware root of trust. The hypervisor image comprises the following elements: ESXi software (hypervisor) in VIB (package) format, third-party VIBs, and third-party drivers.

To leverage this capability, your ESXi system must have TPM and TXT enabled. When TPM and TXT are enabled, ESXi measures the entire hypervisor stack when the system boots and stores these measurements in the Platform Configuration Registers (PCR) of the TPM. The measurements include the VMkernel, kernel modules, drivers, native management applications that run on ESXi, and any boot-time configuration options. All VIBs that are installed on the system are measured.

Third-party solutions can use this feature to build a verifier that detects tampering of the hypervisor image by comparing the image with the expected known good values. vSphere does not provide a user interface to view these measurements. The measurements are exposed in a vSphere API. An event log is provided as part of the API, as specified by the Trusted Computing Group (TCG) standard for TXT.
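Because the measurements are exposed only through the API, a monitoring script must query the host objects directly. The sketch below is a pyVmomi example under stated assumptions: the property names (capability.tpmSupported and runtime.tpmPcrValues) are taken from the vSphere 5.x HostSystem data objects but should be verified against your SDK, and the connection details are placeholders.

    # Sketch: check TPM support and count the PCR digests an ESXi host exposes.
    # Property names are assumptions; getattr() defaults keep the script from failing
    # if a property is absent in your vSphere/pyVmomi version.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                      pwd='password', sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)

    for host in view.view:
        tpm = getattr(host.capability, 'tpmSupported', None)
        pcrs = getattr(host.runtime, 'tpmPcrValues', None) or []
        print(f"{host.name}: TPM supported={tpm}, PCR entries={len(pcrs)}")

    view.DestroyView()
    Disconnect(si)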
7.1.1 Security and Virtual Machines

Virtual machines are the containers in which applications and guest operating systems run. By design, all VMware virtual machines are isolated from one another. This isolation enables multiple virtual machines to run securely while sharing hardware, and ensures both their ability to access hardware and their uninterrupted performance.

Even a user with system administrator privileges on a virtual machine's guest operating system cannot breach this layer of isolation to access another virtual machine without privileges explicitly granted by the ESXi system administrator. As a result of virtual machine isolation, if a guest operating system running in a virtual machine fails, other virtual machines on the same host continue to run. The guest operating system failure has no effect on:

- The ability of users to access the other virtual machines
- The ability of the operational virtual machines to access the resources they need
- The performance of the other virtual machines

Each virtual machine is isolated from other virtual machines running on the same hardware. Although virtual machines share physical resources such as CPU, memory, and I/O devices, a guest operating system on an individual virtual machine cannot detect any device other than the virtual devices made available to it.

Virtual Machine Isolation

Because the VMkernel mediates the physical resources and all physical hardware access takes place through the VMkernel, virtual machines cannot circumvent this level of isolation. Just as a physical machine communicates with other machines in a network through a network card, a virtual machine communicates with other virtual machines running in the same host through a virtual switch. Further, a virtual machine communicates with the physical network, including virtual machines on other ESXi hosts, through a physical network adapter.

Virtual Networking Through Virtual Switches

These characteristics apply to virtual machine isolation in a network context:

- If a virtual machine does not share a virtual switch with any other virtual machine, it is completely isolated from virtual networks within the host.
- If no physical network adapter is configured for a virtual machine, the virtual machine is completely isolated from any physical networks.
- If you use the same safeguards (firewalls, antivirus software, and so forth) to protect a virtual machine from the network as you would for a physical machine, the virtual machine is as secure as the physical machine.

You can further protect virtual machines by setting up resource reservations and limits on the host. For example, through the detailed resource controls available in ESXi, you can configure a virtual machine so that it always receives at least 10 percent of the host's CPU resources, but never more than 20 percent.

Resource reservations and limits protect virtual machines from performance degradation that would result if another virtual machine consumed excessive shared hardware resources. For example, if one of the virtual machines on a host is incapacitated by a denial-of-service (DoS) attack, a resource limit on that machine prevents the attack from taking up so much of the hardware resources that the other virtual machines are also affected. Similarly, a resource reservation on each of the virtual machines ensures that, in the event of high resource demands by the virtual machine targeted by the DoS attack, all the other virtual machines still have enough resources to operate.
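As a concrete illustration of these controls, the following pyVmomi sketch applies a CPU reservation and limit to one virtual machine; the VM name, vCenter address, credentials, and the 400/800 MHz values are placeholders chosen only for the example.

    # Sketch: set a CPU reservation and limit on a VM (placeholder names and MHz values).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                      pwd='password', sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == 'vm01')
    view.DestroyView()

    alloc = vim.ResourceAllocationInfo(reservation=400,   # MHz guaranteed to the VM
                                       limit=800)         # MHz ceiling, even under contention
    task = vm.ReconfigVM_Task(vim.vm.ConfigSpec(cpuAllocation=alloc))
    Disconnect(si)

The same ResourceAllocationInfo structure is used for memory allocation (memoryAllocation, in MB), so reservations and limits for both resources can be set in a single reconfigure operation.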
By default, ESXi imposes a form of resource reservation by applying a distribution algorithm that divides the available host resources equally among the virtual machines while keeping a certain percentage of resources for use by other system components. This default behavior provides a degree of natural protection from DoS and distributed denial-of-service (DDoS) attacks. You set specific resource reservations and limits on an individual basis to customize the default behavior so that the distribution is not equal across the virtual machine configuration.

VMware Security Resources on the Web

- VMware security policy, up-to-date security alerts, security downloads, and focus discussions of security topics: http://www.vmware.com/security/
- Corporate security response policy: http://www.vmware.com/support/policies/security_response.html. VMware is committed to helping you maintain a secure environment. Security issues are corrected in a timely manner. The VMware Security Response Policy states our commitment to resolve possible vulnerabilities in our products.
- Third-party software support policy: http://www.vmware.com/support/policies/. VMware supports a variety of storage systems, software agents such as backup agents, system management agents, and so forth. You can find lists of agents, tools, and other software that supports ESXi by searching http://www.vmware.com/vmtn/resources/ for ESXi compatibility guides. The industry offers more products and configurations than VMware can test. If VMware does not list a product or configuration in a compatibility guide, Technical Support will attempt to help you with any problems, but cannot guarantee that the product or configuration can be used. Always evaluate security risks for unsupported products or configurations carefully.
- General information about virtualization and security: VMware Virtual Security Technical Resource Center, http://www.vmware.com/go/security/
- Compliance and security standards, as well as partner solutions and in-depth content about virtualization and compliance: http://www.vmware.com/go/compliance/
- Information about VMsafe technology for protection of virtual machines, including a list of partner solutions: http://www.vmware.com/go/vmsafe/

7.1.2 Connecting to the Virtual Machine Console Through a Firewall

When you connect your client to ESXi hosts through vCenter Server, certain ports are required for user and administrator communication with virtual machine consoles. These ports support different client functions, interface with different layers on ESXi, and use different authentication protocols.

Port 902: This is the port that vCenter Server assumes is available for receiving data from ESXi. The vSphere Client uses this port to provide a connection for guest operating system mouse, keyboard, screen (MKS) activities on virtual machines. It is through this port that users interact with the virtual machine guest operating systems and applications. Port 902 is the port that the vSphere Client assumes is available when interacting with virtual machines. Port 902 connects vCenter Server to the host through the VMware Authorization Daemon (vmware-authd). This daemon multiplexes port 902 data to the appropriate recipient for processing. VMware does not support configuring a different port for this connection.

Port 443: The vSphere Client and SDK use this port to send data to vCenter Server managed hosts. Also, the vSphere SDK, when connected directly to ESXi, uses this port to support any management functions related to the server and its virtual machines. Port 443 is the port that clients assume is available when sending data to ESXi. VMware does not support configuring a different port for these connections. Port 443 connects clients to ESXi through the Tomcat Web service or the SDK. The host process multiplexes port 443 data to the appropriate recipient for processing.

Port 903: The vSphere Client uses this port to provide a connection for guest operating system MKS activities on virtual machines. It is through this port that users interact with the guest operating systems and applications of the virtual machine. Port 903 is the port that the vSphere Client assumes is available when interacting with virtual machines. VMware does not support configuring a different port for this function. Port 903 connects the vSphere Client to a specified virtual machine configured on ESXi.

The following figure shows the relationships between vSphere Client functions, ports, and processes. If you have a firewall between your vCenter Server system and vCenter Server managed host, open ports 443 and 903 in the firewall to allow data transfer to ESXi hosts from vCenter Server. For additional information on configuring the ports, see the firewall system administrator.

7.1.3 Connecting ESXi Hosts Through Firewalls

If you have a firewall between two ESXi hosts and you want to allow transactions between the hosts or use vCenter Server to perform any source or target activities, such as vSphere High Availability (vSphere HA) traffic, migration, cloning, or vMotion, you must configure a connection through which the managed hosts can receive data. To configure a connection for receiving data, open ports for traffic from services such as vSphere High Availability, vMotion, and vSphere Fault Tolerance. See TCP and UDP Ports for Management Access for a list of ports. Refer to the firewall system administrator for additional information on configuring the ports.

7.1.4 TCP and UDP Ports for Management Access

vCenter Server, ESXi hosts, and other network components are accessed using predetermined TCP and UDP ports. If you manage network components from outside a firewall, you might be required to reconfigure the firewall to allow access on the appropriate ports. The list below gives the TCP and UDP ports, the purpose of each, and the traffic type. Ports that are open by default at installation time are indicated by (Default).

TCP and UDP Ports

- 22: SSH Server - Incoming TCP
- 53 (Default): DNS Client - Incoming and outgoing UDP
- 68 (Default): DHCP Client - Incoming and outgoing UDP
- 80 (Default): HTTP access. The default non-secure TCP Web port, typically used in conjunction with port 443 as a front end for access to ESXi networks from the Web. Port 80 redirects traffic to an HTTPS landing page (port 443). WS-Management; vSphere Fault Tolerance (FT) (outgoing TCP, UDP) - Incoming TCP; outgoing TCP, UDP
- 111 (Default): RPC service used for the NIS register by vCenter Virtual Appliance - Incoming and outgoing TCP
- 123: NTP Client - Outgoing UDP
- 135 (Default): Used to join vCenter Virtual Appliance to an Active Directory domain - Incoming and outgoing TCP
- 161 (Default): SNMP Server - Incoming UDP
- 427 (Default): The CIM client uses the Service Location Protocol, version 2 (SLPv2) to find CIM servers - Incoming and outgoing UDP
- 443 (Default): HTTPS access; vCenter Server access to ESXi hosts; default SSL Web port; vSphere Client access to vCenter Server; vSphere Client access to ESXi hosts; WS-Management; vSphere Client access to vSphere Update Manager; third-party network management client connections to vCenter Server; third-party network management clients access to hosts - Incoming TCP

7.1.5 Security Considerations for VLANs

The way you set up VLANs to secure parts of a network depends on factors such as the guest operating system and the way your network equipment is configured. ESXi features a complete IEEE 802.1q-compliant VLAN implementation. VMware cannot make specific recommendations on how to set up VLANs, but there are factors to consider when using a VLAN deployment as part of your security enforcement policy.

7.1.5.1 VLANs as Part of a Broader Security Implementation

VLANs are an effective means of controlling where and how widely data is transmitted within the network. If an attacker gains access to the network, the attack is likely to be limited to the VLAN that served as the entry point, lessening the risk to the network as a whole.

VLANs provide protection only in that they control how data is routed and contained after it passes through the switches and enters the network. You can use VLANs to help secure Layer 2 of your network architecture, the data link layer. However, configuring VLANs does not protect the physical layer of your network model or any of the other layers. Even if you create VLANs, provide additional protection by securing your hardware (routers, hubs, and so forth) and encrypting data transmissions.

VLANs are not a substitute for firewalls in your virtual machine configurations. Most network configurations that include VLANs also include firewalls. If you include VLANs in your virtual network, be sure that the firewalls that you install are VLAN-aware.

7.1.5.2 Properly Configure VLANs

Equipment misconfiguration and network hardware, firmware, or software defects can make a VLAN susceptible to VLAN-hopping attacks. VLAN hopping occurs when an attacker with authorized access to one VLAN creates packets that trick physical switches into transmitting the packets to another VLAN that the attacker is not authorized to access. Vulnerability to this type of attack usually results from a switch being misconfigured for native VLAN operation, in which the switch can receive and transmit untagged packets.

To help prevent VLAN hopping, keep your equipment up to date by installing hardware and firmware updates as they become available. Also, follow your vendor's best practice guidelines when you configure your equipment.

VMware standard switches do not support the concept of a native VLAN. All data passed on these switches is appropriately tagged. However, because other switches in the network might be configured for native VLAN operation, VLANs configured with standard switches can still be vulnerable to VLAN hopping. If you plan to use VLANs to enforce network security, disable the native VLAN feature for all switches unless you have a compelling reason to operate some of your VLANs in native mode. If you must use native VLAN, see your switch vendor's configuration guidelines for this feature.

7.1.6 Standard Switch Protection and VLANs

VMware standard switches provide safeguards against certain threats to VLAN security. Because of the way that standard switches are designed, they protect VLANs against a variety of attacks, many of which involve VLAN hopping.
Having this protection does not guarantee that your virtual machine configuration is invulnerable to other types of attacks. For example, standard switches do not protect the physical network against these attacks; they protect only the virtual network. Standard switches and VLANs can protect against the following types of attacks.

MAC flooding: Floods a switch with packets that contain MAC addresses tagged as having come from different sources. Many switches use a content-addressable memory table to learn and store the source address for each packet. When the table is full, the switch can enter a fully open state in which every incoming packet is broadcast on all ports, letting the attacker see all of the switch's traffic. This state might result in packet leakage across VLANs. Although VMware standard switches store a MAC address table, they do not get the MAC addresses from observable traffic and are not vulnerable to this type of attack.

802.1q and ISL tagging attacks: Force a switch to redirect frames from one VLAN to another by tricking the switch into acting as a trunk and broadcasting the traffic to other VLANs. VMware standard switches do not perform the dynamic trunking required for this type of attack and, therefore, are not vulnerable.

Double-encapsulation attacks: Occur when an attacker creates a double-encapsulated packet in which the VLAN identifier in the inner tag is different from the VLAN identifier in the outer tag. For backward compatibility, native VLANs strip the outer tag from transmitted packets unless configured to do otherwise. When a native VLAN switch strips the outer tag, only the inner tag is left, and that inner tag routes the packet to a different VLAN than the one identified in the now-missing outer tag. VMware standard switches drop any double-encapsulated frames that a virtual machine attempts to send on a port configured for a specific VLAN. Therefore, they are not vulnerable to this type of attack.

Multicast brute-force attacks: Involve sending large numbers of multicast frames to a known VLAN almost simultaneously to overload the switch so that it mistakenly allows some of the frames to broadcast to other VLANs. VMware standard switches do not allow frames to leave their correct broadcast domain (VLAN) and are not vulnerable to this type of attack.

Spanning-tree attacks: Target Spanning-Tree Protocol (STP), which is used to control bridging between parts of the LAN. The attacker sends Bridge Protocol Data Unit (BPDU) packets that attempt to change the network topology, establishing themselves as the root bridge. As the root bridge, the attacker can sniff the contents of transmitted frames. VMware standard switches do not support STP and are not vulnerable to this type of attack.

Random frame attacks: Involve sending large numbers of packets in which the source and destination addresses stay the same, but in which fields are randomly changed in length, type, or content. The goal of this attack is to force packets to be mistakenly rerouted to a different VLAN. VMware standard switches are not vulnerable to this type of attack.

Because new security threats develop over time, do not consider this an exhaustive list of attacks. Regularly check VMware security resources on the Web to learn about security, recent security alerts, and VMware security tactics.
7.2 Securing Standard Switch Ports As with physical network adapters, a virtual network adapter can send frames that appear to be from a different machine or impersonate another machine so that it can receive network frames intended for that machine. Also, like physical network adapters, a virtual network adapter can be configured so that it receives frames targeted for other machines. When you create a standard switch for your network, you add port groups to impose a policy configuration for the virtual machines and storage systems attached to the switch. You create virtual ports through the vSphere Client. As part of adding a port or standard port group to a standard switch, the vSphere Client configures a security profile for the port. You can use this security profile to ensure that the host prevents the guest operating systems for its virtual machines from impersonating other machines on the network. This security feature is implemented so that the guest operating system responsible for the impersonation does not detect that the impersonation was prevented. The security profile determines how strongly you enforce protection against impersonation and interception attacks on virtual machines. To correctly use the settings in the security profile, you must understand the basics of how virtual network adapters control transmissions and how attacks are staged at this level. Each virtual network adapter has its own MAC address assigned when the adapter is created. This address is called the initial MAC address. Although the initial MAC address can be reconfigured from outside the guest operating system, it cannot be changed by the guest operating system. In addition, each adapter has an effective MAC address that filters out incoming network traffic with a destination MAC address different from the effective MAC address. The guest operating system is responsible for setting the effective MAC address and typically matches the effective MAC address to the initial MAC address. When sending packets, an operating system typically places its own network adapter’s effective MAC address in the source MAC address field of the Ethernet frame. It also places the MAC address for the receiving network adapter in the destination MAC address field. The receiving adapter accepts packets only when the destination MAC address in the packet matches its own effective MAC address. Upon creation, a network adapter’s effective MAC address and initial MAC address are the same. The virtual machine’s operating system can alter the effective MAC address to another value at any time. If an operating system changes the effective MAC address, its network adapter receives network traffic destined for the new MAC address. The operating system can send frames with an impersonated source MAC address at any time. This means an operating system can stage malicious attacks on the devices in a network by impersonating a network adapter that the receiving network authorizes. You can use standard switch security profiles on hosts to protect against this type of attack by setting three options. If you change any default settings for a port, you must modify the security profile by editing standard switch settings in the vSphere Client. 7.2.1 MAC Address Changes The setting for the MAC Address Changes option affects traffic that a virtual machine receives. When the option is set to Accept, ESXi accepts requests to change the effective MAC address to other than the initial MAC address. 
When the option is set to Reject, ESXi does not honor requests to change the effective MAC address to anything other than the initial MAC address, which protects the host against MAC impersonation. The port that the virtual adapter used to send the request is disabled and the virtual adapter does not receive any more frames until it changes the effective MAC address to match the initial MAC address. The guest operating system does not detect that the MAC address change was not honored.

Note: The iSCSI initiator relies on being able to get MAC address changes from certain types of storage. If you are using ESXi iSCSI and have iSCSI storage, set the MAC Address Changes option to Accept.

In some situations, you might have a legitimate need for more than one adapter to have the same MAC address on a network, for example, if you are using Microsoft Network Load Balancing in unicast mode. When Microsoft Network Load Balancing is used in the standard multicast mode, adapters do not share MAC addresses.

MAC address change settings affect traffic leaving a virtual machine. MAC address changes will occur if the sender is permitted to make them, even if standard switches or a receiving virtual machine does not permit MAC address changes.

7.2.2 Forged Transmissions

The setting for the Forged Transmits option affects traffic that is transmitted from a virtual machine. When the option is set to Accept, ESXi does not compare source and effective MAC addresses.

To protect against MAC impersonation, you can set this option to Reject. If you do, the host compares the source MAC address being transmitted by the operating system with the effective MAC address for its adapter to see if they match. If the addresses do not match, ESXi drops the packet. The guest operating system does not detect that its virtual network adapter cannot send packets by using the impersonated MAC address. The ESXi host intercepts any packets with impersonated addresses before they are delivered, and the guest operating system might assume that the packets are dropped.

7.2.3 Promiscuous Mode Operation

Promiscuous mode eliminates any reception filtering that the virtual network adapter would perform so that the guest operating system receives all traffic observed on the wire. By default, the virtual network adapter cannot operate in promiscuous mode.

Although promiscuous mode can be useful for tracking network activity, it is an insecure mode of operation, because any adapter in promiscuous mode has access to the packets regardless of whether some of the packets are received only by a particular network adapter. This means that an administrator or root user within a virtual machine can potentially view traffic destined for other guest or host operating systems.

Note: In some situations, you might have a legitimate reason to configure a standard switch to operate in promiscuous mode (for example, if you are running network intrusion detection software or a packet sniffer).

7.3 Cipher Strength

Transmitting data over insecure connections presents a security risk because malicious users might be able to scan data as it travels through the network. As a safeguard, network components commonly encrypt the data so that it cannot be easily read. To encrypt data, the sending component, such as a gateway or redirector, applies cryptographic algorithms, or ciphers, to alter the data before transmitting it. The receiving component uses a key to decrypt the data, returning it to its original form.
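One practical way to confirm what a given connection actually negotiates is to open a TLS session and inspect it. The following sketch uses only the Python standard library; the host name is a placeholder, and certificate verification is disabled here only because default ESXi certificates are self-signed.

    # Sketch: report the cipher suite negotiated with a host on port 443.
    import socket
    import ssl

    host = 'esxi01.example.com'   # placeholder
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, 443), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            name, protocol, bits = tls.cipher()
            print(f"{host}: {name} over {protocol}, {bits}-bit key")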
Several ciphers are in use, and the level of security that each provides is different. One measure of a cipher's ability to protect data is its cipher strength, the number of bits in the encryption key. The larger the number, the more secure the cipher.

To ensure the protection of the data transmitted to and from external network connections, ESXi uses one of the strongest block ciphers available, 256-bit AES block encryption. ESXi also uses 1024-bit RSA for key exchange. These encryption algorithms are the default for the following connections.

- vSphere Client connections to vCenter Server and to ESXi through the management interface.
- SDK connections to vCenter Server and to ESXi.
- Management interface connections to virtual machines through the VMkernel.
- SSH connections to ESXi through the management interface.

7.3.1 SSH Security

You can use SSH to remotely log in to the ESXi Shell and perform troubleshooting tasks for the host. SSH configuration in ESXi is enhanced to provide a high security level.

Version 1 SSH protocol disabled: VMware does not support Version 1 SSH protocol and uses Version 2 protocol exclusively. Version 2 eliminates certain security problems present in Version 1 and provides you with a safe way to communicate with the management interface.

Improved cipher strength: SSH supports only 256-bit and 128-bit AES ciphers for your connections.

These settings are designed to provide solid protection for the data you transmit to the management interface through SSH. If this configuration is too restrictive for your needs, you can lower security parameters.

7.4 Control CIM-Based Hardware Monitoring Tool Access

The Common Information Model (CIM) system provides an interface that enables hardware-level management from remote applications using a set of standard APIs. To ensure that the CIM interface is secure, provide only the minimum access necessary to these applications. If an application has been provisioned with a root or full administrator account and the application is compromised, the full virtual environment might be compromised.

CIM is an open standard that defines a framework for agent-less, standards-based monitoring of hardware resources for ESXi. This framework consists of a CIM object manager, often called a CIM broker, and a set of CIM providers.

CIM providers are used as the mechanism to provide management access to device drivers and underlying hardware. Hardware vendors, including server manufacturers and specific hardware device vendors, can write providers to provide monitoring and management of their particular devices. VMware also writes providers that implement monitoring of server hardware, ESXi storage infrastructure, and virtualization-specific resources. These providers run inside the ESXi system and therefore are designed to be extremely lightweight and focused on specific management tasks. The CIM broker takes information from all CIM providers and presents it to the outside world via standard APIs, the most common one being WS-MAN.

Do not provide root credentials to remote applications to access the CIM interface. Instead, create a service account specific to these applications and grant read-only access to CIM information to any local account defined on the ESXi system, as well as any role defined in vCenter Server.

Procedure
1. Create a service account specific to CIM applications.
2. Grant read-only access to CIM information to any local account defined on the ESXi system, as well as any role defined in vCenter Server.
3. (Optional) If the application requires write access to the CIM interface, create a role to apply to the service account with only two privileges:
- Host.Config.SystemManagement
- Host.CIM.CIMInteraction

This role can be local to the host or centrally defined on vCenter Server, depending on how the monitoring application works. When a user logs into the host with the service account (for example, using the vSphere Client), the user has only the SystemManagement and CIMInteraction privileges, or read-only access.

7.5 General Security Recommendations

To protect the host against unauthorized intrusion and misuse, VMware imposes constraints on several parameters, settings, and activities. You can loosen the constraints to meet your configuration needs, but if you do so, make sure that you are working in a trusted environment and have taken enough other security measures to protect the network as a whole and the devices connected to the host.

Consider the following recommendations when evaluating host security and administration.

Limit user access. To improve security, restrict user access to the management interface and enforce access security policies such as setting up password restrictions. The ESXi Shell has privileged access to certain parts of the host. Therefore, provide only trusted users with ESXi Shell login access. Also, strive to run only the essential processes, services, and agents, such as virus checkers and virtual machine backups.

Use the vSphere Client to administer your ESXi hosts. Whenever possible, use the vSphere Client or a third-party network management tool to administer your ESXi hosts instead of working through the command-line interface as the root user. Using the vSphere Client lets you limit the accounts with access to the ESXi Shell, safely delegate responsibilities, and set up roles that prevent administrators and users from using capabilities they do not need.

Use only VMware sources to upgrade ESXi components. The host runs a variety of third-party packages to support management interfaces or tasks that you must perform. VMware does not support upgrading these packages from anything other than a VMware source. If you use a download or patch from another source, you might compromise management interface security or functions. Regularly check third-party vendor sites and the VMware knowledge base for security alerts.

In addition to implementing the firewall, risks to the hosts are mitigated using other methods.

- ESXi runs only services essential to managing its functions, and the distribution is limited to the features required to run ESXi.
- By default, all ports not specifically required for management access to the host are closed. You must specifically open ports if you need additional services.
- By default, weak ciphers are disabled and all communications from clients are secured by SSL. The exact algorithms used for securing the channel depend on the SSL handshake. Default certificates created on ESXi use SHA-1 with RSA encryption as the signature algorithm.
- The Tomcat Web service, used internally by ESXi to support access by Web clients, has been modified to run only those functions required for administration and monitoring by a Web client. As a result, ESXi is not vulnerable to the Tomcat security issues reported in broader use.
- VMware monitors all security alerts that could affect ESXi security and, if needed, issues a security patch.
- Insecure services such as FTP and Telnet are not installed, and the ports for these services are closed by default.
Because more secure services such as SSH and SFTP are easily available, always avoid using these insecure services in favor of their safer alternatives. If you must use insecure services and have implemented sufficient protection for the host, you must explicitly open ports to support them. 7.6 ESXi Firewall Configuration ESXi includes a firewall between the management interface and the network. The firewall is enabled by default. At installation time, the ESXi firewall is configured to block incoming and outgoing traffic, except traffic for the default services listed in TCP and UDP Ports for Management Access. Note The firewall also allows Internet Control Message Protocol (ICMP) pings and communication with DHCP and DNS (UDP only) clients. Supported services and management agents that are required to operate the host are described in a rule set configuration file in the ESXi firewall directory /etc/vmware/firewall/. The file contains firewall rules and lists each rule's relationship with ports and protocols. You cannot add a rule to the ESXi firewall unless you create and install a VIB that contains the rule set configuration file. The VIB authoring tool is available to VMware partners. Note The behavior of the NFS Client rule set (nfsClient) is different from other rule sets. When the NFS Client rule set is enabled, all outbound TCP ports are open for the destination hosts in the list of allowed IP addresses. See NFS Client Rule Set Behavior for more information. 7.6.1 Rule Set Configuration Files A rule set configuration file contains firewall rules and describes each rule's relationship with ports and protocols. The rule set configuration file can contain rule sets for multiple services. Rule set configuration files are located in the /etc/vmware/firewall/ directory. To add a service to the host security profile, VMware partners can create a VIB that contains the port rules for the service in a configuration file. VIB authoring tools are available to VMware partners. The ESXi 5.x ruleset.xml format is the same as in ESX and ESXi 4.x, but has two additional tags: enabled and required. The ESXi 5.x firewall continues to support the 4.x ruleset.xml format. Each set of rules for a service in the rule set configuration file contains the following information. A numeric identifier for the service, if the configuration file contains more than one service. A unique identifier for the rule set, usually the name of the service. For each rule, the file contains one or more port rules, each with a definition for direction, protocol, port type, and port number or range of port numbers. A flag indicating whether the service is enabled or disabled when the rule set is applied. An indication of whether the rule set is required and cannot be disabled. Only users with the Administrator role can access the ESXi Shell. Users who are in the Active Directory group ESX Admins are automatically assigned the Administrator role. Any user with the Administrator role can execute system commands (such as vmware -v) using the ESXi Shell. 7.6.2 Lockdown Mode To increase the security of your ESXi hosts, you can put them in lockdown mode. When you enable lockdown mode, no users other than vpxuser have authentication permissions, nor can they perform operations against the host directly. Lockdown mode forces all operations to be performed through vCenter Server. When a host is in lockdown mode, you cannot run vSphere CLI commands from an administration server, from a script, or from vMA against the host. 
External software or management tools might not be able to retrieve or modify information from the ESXi host. Note Users with the DCUI Access privilege are authorized to log in to the Direct Console User Interface (DCUI) when lockdown mode is enabled. When you disable lockdown mode using the DCUI, all users with the DCUI Access privilege are granted the Administrator role on the host. You grant the DCUI Access privilege in Advanced Settings. Enabling or disabling lockdown mode affects which types of users are authorized to access host services, but it does not affect the availability of those services. In other words, if the ESXi Shell, SSH, or Direct Console User Interface (DCUI) services are enabled, they will continue to run whether or not the host is in lockdown mode. You can enable lockdown mode using the Add Host wizard to add a host to vCenter Server, using the vSphere Client to manage a host, or using the Direct Console User Interface (DCUI). Note If you enable or disable lockdown mode using the Direct Console User Interface (DCUI), permissions for users and groups on the host are discarded. To preserve these permissions, you must enable and disable lockdown mode using the vSphere Client connected to vCenter Server. Lockdown mode is only available on ESXi hosts that have been added to vCenter Server. This chapter includes the following topics: Lockdown Mode Behavior Lockdown Mode Configurations Enable Lockdown Mode Using the vSphere Client Enable Lockdown Mode Using the vSphere Web Client Enable Lockdown Mode from the Direct Console User Interface 7.7 Lockdown Mode Behavior Enabling lockdown mode affects which users are authorized to access host services. Users who were logged in to the ESXi Shell before lockdown mode was enabled remain logged in and can run commands. However, these users cannot disable lockdown mode. No other users, including the root user and users with the Administrator role on the host, can use the ESXi Shell to log in to a host that is in lockdown mode. Users with administrator privileges on the vCenter Server system can use the vSphere Client to disable lockdown mode for hosts that are managed by the vCenter Server system. Users granted the DCUI Access privilege can always log directly in to the host using the Direct Console User Interface (DCUI) to disable lockdown mode, even if the user does not have the Administrator role on the host. You must use Advanced Settings to grant the DCUI Access privilege. Note When you disable lockdown mode using the DCUI, all users with the DCUI Access privilege are granted the Administrator role on the host. Root users or users with the Administrator role on the host cannot log directly in to the host with the DCUI if they have not been granted the DCUI Access privilege. If the host is not managed by vCenter Server or if the host is unreachable, only DCUI Access users can log into the DCUI and disable lockdown mode. If the DCUI service is stopped, you must reinstall ESXi. Different services are available to different types of users when the host is running in lockdown mode, compared to when the host is running in normal mode. Nonroot users cannot run system commands in the ESXi Shell. 
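Lockdown mode can also be toggled programmatically through vCenter Server. The sketch below is a pyVmomi example with placeholder names; EnterLockdownMode and ExitLockdownMode are the HostSystem methods available in vSphere 5.x (later releases add a HostAccessManager), and the adminDisabled property used to read the current state should be treated as an assumption and verified against your SDK. The table that follows summarizes which services remain available in each mode.

    # Sketch: enable lockdown mode on a managed ESXi host through vCenter Server.
    # Host names and credentials are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                      pwd='password', sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == 'esxi01.example.com')
    view.DestroyView()

    # config.adminDisabled reflects lockdown state on 5.x; property name is an assumption,
    # so getattr() provides a safe default if it is absent in your version.
    if not getattr(host.config, 'adminDisabled', False):
        host.EnterLockdownMode()      # use host.ExitLockdownMode() to reverse
        print(f"Lockdown mode enabled on {host.name}")

    Disconnect(si)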
Lockdown Mode Behavior

- vSphere WebServices API: Normal mode, all users, based on ESXi permissions; lockdown mode, vCenter only (vpxuser).
- CIM Providers: Normal mode, root users and users with the Admin role on the host; lockdown mode, vCenter only (ticket).
- Direct Console UI (DCUI): Normal mode, users with the Admin role on the host and users with the DCUI Access privilege; lockdown mode, users with the DCUI Access privilege.
- ESXi Shell: Normal mode, users with the Admin role on the host; lockdown mode, no users.
- SSH: Normal mode, users with the Admin role on the host; lockdown mode, no users.

7.8 Lockdown Mode Configurations

You can enable or disable remote and local access to the ESXi Shell to create different lockdown mode configurations. The following list shows which services are enabled for three typical configurations (Default / Recommended / Total Lockdown).

Caution: If you lose access to vCenter Server while running in Total Lockdown Mode, you must reinstall ESXi to gain access to the host.

- Lockdown: Off / On / On
- ESXi Shell: Off / Off / Off
- SSH: Off / Off / Off
- Direct Console UI (DCUI): On / On / Off

7.9 ESXi Authentication and User Management

A user is an individual authorized to log in to ESXi or vCenter Server. In vSphere 5.1, ESXi user management has the following caveats.

- You cannot create ESXi users with the vSphere Web Client. You must log directly into the host with the vSphere Client to create ESXi users.
- ESXi 5.1 does not support local groups. However, Active Directory groups are supported.
- To prevent anonymous users such as root from accessing the host with the Direct Console User Interface (DCUI) or ESXi Shell, remove the user's administrator privileges on the root folder of the host. This applies to both local users and Active Directory users and groups.

Most inventory objects inherit permissions from a single parent object in the hierarchy. For example, a datastore inherits permissions from either its parent datastore folder or parent datacenter. Virtual machines inherit permissions from both the parent virtual machine folder and the parent host, cluster, or resource pool simultaneously. To restrict a user's privileges on a virtual machine, you must set permissions on both the parent folder and the parent host, cluster, or resource pool.

7.9.1 Multiple Permission Settings

Objects might have multiple permissions, but only one permission for each user or group. Permissions applied on a child object always override permissions that are applied on a parent object. Virtual machine folders and resource pools are equivalent levels in the hierarchy. If you assign propagating permissions to a user or group on a virtual machine's folder and its resource pool, the user has the privileges propagated from the resource pool and from the folder.

If multiple group permissions are defined on the same object and the user belongs to two or more of those groups, two situations are possible:

- If no permission is defined for the user on that object, the user is assigned the union of privileges assigned to the groups for that object.
- If a permission is defined for the user on that object, the user's permission takes precedence over all group permissions.

7.9.1.1 Example 1: Inheritance of Multiple Permissions

This example illustrates how an object can inherit multiple permissions from groups that are granted permission on a parent object. In this example, two permissions are assigned on the same object for two different groups.

- Role 1 can power on virtual machines.
- Role 2 can take snapshots of virtual machines.
- Group A is granted Role 1 on VM Folder, with the permission set to propagate to child objects.
- Group B is granted Role 2 on VM Folder, with the permission set to propagate to child objects.
- User 1 is not assigned specific permission.
User 1, who belongs to groups A and B, logs on. User 1 can both power on and take snapshots of VM A and VM B.

7.9.1.2 Example 2: Child Permissions Overriding Parent Permissions
This example illustrates how permissions that are assigned on a child object can override permissions that are assigned on a parent object. You can use this overriding behavior to restrict user access to particular areas of the inventory. In this example, permissions are assigned to two different groups on two different objects:
- Role 1 can power on virtual machines.
- Role 2 can take snapshots of virtual machines.
- Group A is granted Role 1 on VM Folder, with the permission set to propagate to child objects.
- Group B is granted Role 2 on VM B.
User 1, who belongs to groups A and B, logs on. Because Role 2 is assigned at a lower point in the hierarchy than Role 1, it overrides Role 1 on VM B. User 1 can power on VM A, but not take snapshots. User 1 can take snapshots of VM B, but not power it on.

7.9.1.3 Example 3: User Permissions Overriding Group Permissions
This example illustrates how permissions assigned directly to an individual user override permissions assigned to a group that the user is a member of. In this example, permissions are assigned to a user and to a group on the same object:
- Role 1 can power on virtual machines.
- Group A is granted Role 1 on VM Folder.
- User 1 is granted the No Access role on VM Folder.
User 1, who belongs to group A, logs on. The No Access role granted to User 1 on VM Folder overrides the group permission. User 1 has no access to VM Folder or VMs A and B.

7.9.2 root User Permissions
Root users can only perform activities on the specific host that they are logged in to. For security reasons, you might not want to use the root user in the Administrator role. In this case, you can change permissions after installation so that the root user no longer has administrative privileges. Alternatively, you can remove the access permissions for the root user. (Do not remove the root user itself.)

Important: If you remove the access permissions for the root user, you must first create another permission at the root level that has a different user assigned to the Administrator role.

Note: In vSphere 5.1, only the root user and no other user with administrator privileges is permitted to add a host to vCenter Server.

Assigning the Administrator role to a different user helps you maintain security through traceability. The vSphere Client logs all actions that the Administrator role user initiates as events, providing you with an audit trail. If all administrators log in as the root user, you cannot tell which administrator performed an action. If you create multiple permissions at the root level, each associated with a different user, you can track the actions of each administrator.
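Role and permission assignments like the ones in these examples can also be scripted. The following PowerCLI sketch is only an illustration; the role, group, and folder names are hypothetical, and the exact privilege names in your environment should be checked with Get-VIPrivilege:

# create a role that can power on virtual machines (corresponds to "Role 1" above)
$priv = Get-VIPrivilege -Name 'Power On'
New-VIRole -Name 'Role1-PowerOn' -Privilege $priv
# grant the role to a group on a folder, propagating to child objects
$folder = Get-Folder -Name 'VM Folder'
New-VIPermission -Entity $folder -Principal 'DOMAIN\GroupA' -Role (Get-VIRole -Name 'Role1-PowerOn') -Propagate:$true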
7.10 Best Practices for Roles and Permissions
Use best practices for roles and permissions to maximize the security and manageability of your vCenter Server environment. VMware recommends the following best practices when configuring roles and permissions in your vCenter Server environment:
- Where possible, grant permissions to groups rather than to individual users.
- Grant permissions only where needed. Using the minimum number of permissions makes it easier to understand and manage your permissions structure.
- If you assign a restrictive role to a group, check that the group does not contain the Administrator user or other users with administrative privileges. Otherwise, you could unintentionally restrict administrators' privileges in parts of the inventory hierarchy where you have assigned that group the restrictive role.
- Use folders to group objects so that they correspond to the differing permissions you want to grant for them.
- Use caution when granting a permission at the root vCenter Server level. Users with permissions at the root level have access to global data on vCenter Server, such as roles, custom attributes, vCenter Server settings, and licenses. Changes to licenses and roles propagate to all vCenter Server systems in a Linked Mode group, even if the user does not have permissions on all of the vCenter Server systems in the group.
- In most cases, enable propagation on permissions. This ensures that when new objects are inserted into the inventory hierarchy, they inherit permissions and are accessible to users.
- Use the No Access role to mask specific areas of the hierarchy that you do not want particular users to have access to.

7.11 Replace a Default ESXi Certificate with a CA-Signed Certificate
ESXi uses automatically generated certificates that are created as part of the installation process. These certificates are unique and make it possible to begin using the server, but they are not verifiable and they are not signed by a trusted, well-known certificate authority (CA). Using default certificates might not comply with the security policy of your organization. If you require a certificate from a trusted certificate authority, you can replace the default certificate.

Note: If the host has Verify Certificates enabled, replacing the default certificate might cause vCenter Server to stop managing the host. If the new certificate is not verifiable by vCenter Server, you must reconnect the host using the vSphere Client.

ESXi supports only X.509 certificates to encrypt session information sent over SSL connections between server and client components.

7.12 Modifying ESXi Web Proxy Settings
When you modify Web proxy settings, you have several encryption and user security guidelines to consider.

Note: Restart the host process after making any changes to host directories or authentication mechanisms.

- Do not set up certificates using pass phrases. ESXi does not support pass phrases, also known as encrypted keys. If you set up a pass phrase, ESXi processes cannot start correctly.
- You can configure the Web proxy so that it searches for certificates in a location other than the default location. This capability proves useful for companies that prefer to centralize their certificates on a single machine so that multiple hosts can use the certificates.

Caution: If certificates are not stored locally on the host (for example, if they are stored on an NFS share), the host cannot access those certificates if ESXi loses network connectivity. As a result, a client connecting to the host cannot successfully participate in a secure SSL handshake with the host.
To support encryption for user names, passwords, and packets, SSL is enabled by default for vSphere Web Services SDK connections. If you want to configure these connections so that they do not encrypt transmissions, disable SSL for your vSphere Web Services SDK connection by switching the connection from HTTPS to HTTP. Consider disabling SSL only if you have created a fully trusted environment for these clients, where firewalls are in place and transmissions to and from the host are fully isolated. Disabling SSL can improve performance, because you avoid the overhead required to perform encryption.

To protect against misuse of ESXi services, most internal ESXi services are accessible only through port 443, the port used for HTTPS transmission. Port 443 acts as a reverse proxy for ESXi. You can see a list of services on ESXi through an HTTP welcome page, but you cannot directly access the Storage Adapters services without proper authorization. You can change this configuration so that individual services are directly accessible through HTTP connections. Do not make this change unless you are using ESXi in a fully trusted environment.

When you upgrade vCenter Server, the certificate remains in place.

7.13 General Virtual Machine Protection
A virtual machine is, in most respects, the equivalent of a physical server. Employ the same security measures in virtual machines that you do for physical systems. For example, ensure that antivirus, anti-spyware, intrusion detection, and other protection are enabled for every virtual machine in your virtual infrastructure. Keep all security measures up to date, including applying appropriate patches. It is especially important to keep track of updates for dormant virtual machines that are powered off, because it can be easy to overlook them.

7.13.1 Disable Unnecessary Functions Inside Virtual Machines
Any service running in a virtual machine provides the potential for attack. By disabling system components that are not necessary to support the application or service running on the system, you reduce the number of components that can be attacked. Virtual machines do not usually require as many services or functions as physical servers. When you virtualize a system, evaluate whether a particular service or function is necessary.

Procedure
1. Disable unused services in the operating system. For example, if the system runs a file server, turn off any Web services.
2. Disconnect unused physical devices, such as CD/DVD drives, floppy drives, and USB adaptors. See Removing Unnecessary Hardware Devices.
3. Turn off screen savers.
4. Do not run the X Window system on Linux, BSD, or Solaris guest operating systems unless it is necessary.

7.13.2 Use Templates to Deploy Virtual Machines
When you manually install guest operating systems and applications on a virtual machine, you introduce a risk of misconfiguration. By using a template to capture a hardened base operating system image with no applications installed, you can ensure that all virtual machines are created with a known baseline level of security. You can use these templates to create other, application-specific templates, or you can use the application template to deploy virtual machines.

Procedure
1. Provide templates for virtual machine creation that contain hardened, patched, and properly configured operating system deployments. If possible, deploy applications in templates as well. Ensure that the applications do not depend on information specific to the virtual machine to be deployed.
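Deployments from such a template can then be scripted so that every new virtual machine starts from the same hardened baseline. A minimal PowerCLI sketch, assuming a template, host, and datastore that already exist under these hypothetical names:

# deploy a new virtual machine from a hardened template (names are hypothetical)
$template = Get-Template -Name 'Win2008R2-Hardened'
New-VM -Name 'app01' -Template $template -VMHost (Get-VMHost -Name 'esx01.example.com') -Datastore (Get-Datastore -Name 'ds01')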
What to do next
You can convert a template to a virtual machine and back to a template in the vSphere Client, which makes updating templates easy. For more information about templates, see the vSphere Virtual Machine Administration documentation.

7.13.3 Prevent Virtual Machines from Taking Over Resources
When one virtual machine consumes so much of the host resources that other virtual machines on the host cannot perform their intended functions, a Denial of Service (DoS) might occur. To prevent a virtual machine from causing a DoS, use host resource management features such as shares and limits to control the server resources that a virtual machine consumes. By default, all virtual machines on a host share resources equally.

Procedure
1. Use shares or reservations to guarantee resources to critical virtual machines.
2. Use limits to constrain resource consumption by virtual machines that have a greater risk of being exploited or attacked, or that run applications that are known to have the potential to greatly consume resources.

7.14 Removing Unnecessary Hardware Devices
Any enabled or connected device represents a potential attack channel. Users and processes without privileges on a virtual machine can connect or disconnect hardware devices, such as network adapters and CD-ROM drives. Attackers can use this capability to breach virtual machine security. Removing unnecessary hardware devices can help prevent attacks. Use the following guidelines to increase virtual machine security:
- Ensure that unauthorized devices are not connected, and remove any unneeded or unused hardware devices.
- Disable unnecessary virtual devices from within a virtual machine. An attacker with access to a virtual machine can connect a disconnected CD-ROM drive and access sensitive information on the media left in the drive, or disconnect a network adapter to isolate the virtual machine from its network, resulting in a denial of service.
- Ensure that no device is connected to a virtual machine if it is not required. Serial and parallel ports are rarely used for virtual machines in a datacenter environment, and CD/DVD drives are usually connected only temporarily during software installation.
- For less commonly used devices that are not required, either the corresponding device parameter should not be present in the virtual machine configuration or its value should be FALSE.
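Unused removable devices can also be detached in bulk with PowerCLI. The sketch below is only an example; the virtual machine name is hypothetical, and the virtual machine must be powered off before a floppy drive can be removed:

# detach any mounted media from the virtual CD/DVD drive
Get-VM -Name 'app01' | Get-CDDrive | Set-CDDrive -NoMedia -Confirm:$false
# remove an unused virtual floppy drive (requires the VM to be powered off)
Get-VM -Name 'app01' | Get-FloppyDrive | Remove-FloppyDrive -Confirm:$false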
7.15 Securing vCenter Server Systems

7.15.1 Hardening the vCenter Server Host Operating System
Protect the host where vCenter Server is running against vulnerabilities and attacks by ensuring that the operating system of the host (Windows or Linux) is as secure as possible.
- Maintain a supported operating system, database, and hardware for the vCenter Server system. If vCenter Server is not running on a supported operating system, it might not run properly, making vCenter Server vulnerable to attacks.
- Keep the vCenter Server system properly patched. By staying up to date with operating system patches, the server is less vulnerable to attack.
- Provide operating system protection on the vCenter Server host. Protection includes antivirus and antimalware software.
For operating system and database compatibility information, see the vSphere Compatibility Matrixes.

7.15.2 Best Practices for vCenter Server Privileges
Strictly control vCenter Server administrator privileges to increase security for the system.
- Full administrative rights to vCenter Server should be removed from the local Windows administrator account and granted to a special-purpose local vCenter Server administrator account. Grant full vSphere administrative rights only to those administrators who are required to have them. Do not grant this privilege to any group whose membership is not strictly controlled.
- Avoid allowing users to log in directly to the vCenter Server system. Allow only those users who have legitimate tasks to perform to log in to the system, and ensure that these events are audited.
- Install vCenter Server using a service account instead of a Windows account. You can use a service account or a Windows account to run vCenter Server. Using a service account allows you to enable Windows authentication for SQL Server, which provides more security. The service account must be an administrator on the local machine.
- Check for privilege reassignment when you restart vCenter Server. If the user or user group that is assigned the Administrator role on the root folder of the server cannot be verified as a valid user or group, the Administrator privileges are removed and assigned to the local Windows Administrators group.
- Grant minimal privileges to the vCenter Server database user. The database user requires only certain privileges specific to database access. In addition, some privileges are required only for installation and upgrade and can be removed after the product is installed or upgraded.

7.15.3 Restrict Use of the Administrator Privilege
By default, vCenter Server grants full administrator privileges to the administrator of the local system, which can be accessed by domain administrators. To minimize the risk of this privilege being abused, remove administrative rights from the local operating system's administrator account and assign these rights to a special-purpose local vSphere administrator account. Use the local vSphere account to create individual user accounts. Grant the Administrator privilege only to administrators who are required to have it. Do not grant the privilege to any group whose membership is not strictly controlled.

Procedure
1. Create a user account that you will use to manage vCenter Server (for example, vi-admin).
2. Ensure that the user does not belong to any local groups, such as the Administrators group.
3. Log in to the vCenter Server system as the local operating system administrator and grant the role of global vCenter Server administrator to the user account you created (for example, vi-admin).
4. Log out of vCenter Server and log in with the user account you created (vi-admin).
5. Verify that the user can perform all tasks available to a vCenter Server administrator.
6. Remove the administrator privileges that are assigned to the local operating system administrator user or group.

7.15.4 Restrict Use of the Administrator Role
Secure the vCenter Server Administrator role and assign it only to certain users.
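Step 3 of the procedure above (granting the global administrator role to the new account) can also be performed with PowerCLI. The following is a sketch under the assumption that the account is a local Windows user named vi-admin on a vCenter Server called vcenter.example.com; 'Admin' is the system name of the built-in Administrator role:

Connect-VIServer -Server vcenter.example.com
# grant the built-in Administrator role at the root folder, propagating to the whole inventory
$rootFolder = Get-Folder -NoRecursion
New-VIPermission -Entity $rootFolder -Principal 'VCENTERHOST\vi-admin' -Role (Get-VIRole -Name 'Admin') -Propagate:$true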
7.16 Best Practices for Virtual Machine and Host Security

7.17 Installing Antivirus Software
Because each virtual machine hosts a standard operating system, consider protecting it from viruses by installing antivirus software. Depending on how you are using the virtual machine, you might also want to install a software firewall. Stagger the schedule for virus scans, particularly in deployments with a large number of virtual machines. Performance of systems in your environment will degrade significantly if you scan all virtual machines simultaneously. Because software firewalls and antivirus software can be virtualization-intensive, you can balance the need for these two security measures against virtual machine performance, especially if you are confident that your virtual machines are in a fully trusted environment.

7.18 Managing ESXi Log Files
Log files are an important component of troubleshooting attacks and obtaining information about breaches of host security. Logging to a secure, centralized log server can help prevent log tampering. Remote logging also provides a long-term audit record. Take the following measures to increase the security of the host:
- Configure persistent logging to a datastore. By default, the logs on ESXi hosts are stored in the in-memory file system. Therefore, they are lost when you reboot the host, and only 24 hours of log data is stored. When you enable persistent logging, you have a dedicated record of server activity available for the host.
- Remote logging to a central host allows you to gather log files onto a central host, where you can monitor all hosts with a single tool. You can also do aggregate analysis and searching of log data, which might reveal information about things like coordinated attacks on multiple hosts.
- Configure remote secure syslog on ESXi hosts using a remote command line such as vCLI or PowerCLI, or using an API client.
- Query the syslog configuration to make sure that a valid syslog server has been configured, including the correct port.

7.18.1 Configure Syslog on ESXi Hosts
All ESXi hosts run a syslog service (vmsyslogd), which logs messages from the VMkernel and other system components to log files. You can use the vSphere Client or the esxcli system syslog vCLI command to configure the syslog service. For more information about using vCLI commands, see Getting Started with vSphere Command-Line Interfaces.

Procedure
1. In the vSphere Client inventory, select the host.
2. Click the Configuration tab.
3. In the Software panel, click Advanced Settings.
4. Select Syslog in the tree control.
5. To set up logging globally, click global and make changes to the fields on the right.
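The same settings can be applied remotely through the host advanced options Syslog.global.logDir and Syslog.global.logHost. A PowerCLI sketch; the host, datastore, and log server names are hypothetical:

$esx = Get-VMHost -Name 'esx01.example.com'
# persistent local logging to a datastore
Get-AdvancedSetting -Entity $esx -Name 'Syslog.global.logDir' | Set-AdvancedSetting -Value '[datastore1] /systemlogs' -Confirm:$false
# remote secure syslog over SSL
Get-AdvancedSetting -Entity $esx -Name 'Syslog.global.logHost' | Set-AdvancedSetting -Value 'ssl://loghost.example.com:1514' -Confirm:$false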
7.19 Securing Fault Tolerance Logging Traffic
When you enable Fault Tolerance (FT), VMware vLockstep captures inputs and events that occur on a Primary VM and sends them to the Secondary VM, which is running on another host. This logging traffic between the Primary and Secondary VMs is unencrypted and contains guest network and storage I/O data, as well as the memory contents of the guest operating system. This traffic can include sensitive data such as passwords in plaintext. To avoid such data being divulged, ensure that this network is secured, especially to avoid "man-in-the-middle" attacks. For example, use a private network for FT logging traffic.

7.20 Auto Deploy Security Considerations
To best protect your environment, be aware of security risks that might exist when you use Auto Deploy with host profiles. In most cases, administrators set up Auto Deploy to provision target hosts not only with an image, but also with a host profile. The host profile includes configuration information such as authentication or network settings. Host profiles can be set up to prompt the user for input on first boot. The user input is stored in an answer file. The host profile and answer file (if applicable) are included in the boot image that Auto Deploy downloads to a machine.
- The administrator password and user passwords that are included with the host profile and answer file are MD5-encrypted.
- Any other passwords associated with host profiles are in the clear. Use the vSphere Authentication Service to set up Active Directory to avoid exposing the Active Directory password. If you set up Active Directory using host profiles, the passwords are not protected.
For more information about Auto Deploy, see the Auto Deploy information that is part of the vSphere Installation and Setup documentation. For more information about host profiles and answer files, see the vSphere Host Profiles documentation.

7.21 Image Builder Security Considerations
To protect the integrity of the ESXi host, do not allow users to install unsigned (community-supported) VIBs. An unsigned VIB contains untested code that is not certified by, accepted by, or supported by VMware or its partners. Community-supported VIBs do not have a digital signature. The ESXi Image Profile lets you set an acceptance level for the type of VIBs that are allowed on the host. The acceptance levels include the following:
- VMware Certified. VIBs that are VMware Certified are created, tested, and signed by VMware.
- VMware Accepted. VIBs that are created by a VMware partner, but tested and signed by VMware.
- Partner Supported. VIBs that are created, tested, and signed by a certified VMware partner.
- Community Supported. VIBs that have not been tested by VMware or a VMware partner.
For more information about Image Builder, see the vSphere Installation and Setup documentation.
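The acceptance level of a custom image profile can be set with the PowerCLI Image Builder cmdlets. This is a sketch only; the depot bundle path and profile names are hypothetical, and the exact profile names available in your depot can be listed with Get-EsxImageProfile:

Add-EsxSoftwareDepot 'C:\depot\ESXi-5.1.0-offline-bundle.zip'   # hypothetical offline bundle
$ip = New-EsxImageProfile -CloneProfile 'ESXi-5.1.0-standard' -Name 'ESXi51-Custom' -Vendor 'Example'
Set-EsxImageProfile -ImageProfile $ip -AcceptanceLevel PartnerSupported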
7.22 Host Password Strength and Complexity
By default, ESXi uses the pam_passwdqc.so plug-in to set the rules that users must observe when creating passwords and to check password strength. The pam_passwdqc.so plug-in lets you determine the basic standards that all passwords must meet. By default, ESXi imposes no restrictions on the root password. However, when nonroot users attempt to change their passwords, the passwords they choose must meet the basic standards that pam_passwdqc.so sets.

A valid password should contain a combination of as many character classes as possible. Character classes include lowercase letters, uppercase letters, numbers, and special characters such as an underscore or dash.

Note: When the number of character classes is counted, the plug-in does not count uppercase letters used as the first character in the password and numbers used as the last character of a password.

To configure password complexity, you can change the default value of the following parameters:
- retry is the number of times a user is prompted for a new password if the password candidate is not sufficiently strong.
- N0 is the number of characters required for a password that uses characters from only one character class. For example, the password contains only lowercase letters.
- N1 is the number of characters required for a password that uses characters from two character classes.
- N2 is used for passphrases. ESXi requires three words for a passphrase. Each word in the passphrase must be 8-40 characters long.
- N3 is the number of characters required for a password that uses characters from three character classes.
- N4 is the number of characters required for a password that uses characters from all four character classes.
- match is the number of characters allowed in a string that is reused from the old password. If the pam_passwdqc.so plug-in finds a reused string of this length or longer, it disqualifies the string from the strength test and uses only the remaining characters.
Setting any of these options to -1 directs the pam_passwdqc.so plug-in to ignore the requirement. Setting any of these options to disabled directs the pam_passwdqc.so plug-in to disqualify passwords with the associated characteristic. The values used must be in descending order except for -1 and disabled.

Note: The pam_passwdqc.so plug-in used in Linux provides more parameters than the parameters supported for ESXi. For more information on the pam_passwdqc.so plug-in, see your Linux documentation.

7.22.1 Change Default Password Complexity for the pam_passwdqc.so Plug-In
Configure the pam_passwdqc.so plug-in to determine the basic standards all passwords must meet.

Procedure
1. Log in to the ESXi Shell as a user with administrator privileges.
2. Open the passwd file with a text editor. For example: vi /etc/pam.d/passwd
3. Edit the following line:
   password requisite /lib/security/$ISA/pam_passwdqc.so retry=N min=N0,N1,N2,N3,N4
4. Save the file.

7.22.1.1 Example: Editing /etc/pam.d/passwd

7.22.2 Ensure that vpxuser Password Meets Policy
When you add a host to the vCenter Server inventory, vCenter Server creates a special user account called vpxuser on the host. vpxuser is a privileged account that acts as a proxy for all actions initiated through vCenter Server. Ensure that the default settings for the vpxuser password meet the requirements of your organization's password policy.

By default, vCenter Server generates a new vpxuser password every 30 days using OpenSSL crypto libraries as a source of randomness. The password is 32 characters long and is guaranteed to contain at least one symbol from four character classes: symbols (./:=@[\\]^_{}~), digits (1-9), uppercase letters, and lowercase letters. Ensuring that the password expires periodically limits the amount of time an attacker can use the vpxuser password if it is compromised. You can change the default value for password expiration and for password length to meet your password policy.

Important: To preclude the possibility that vCenter Server is locked out of the ESXi host, the password aging policy must not be shorter than the interval that is set to automatically change the vpxuser password.

Procedure
1. To change the password length policy, edit the vpxd.hostPasswordLength parameter in the vCenter Server configuration file on the system where vCenter Server is running.
   Operating System | Default Location
   Windows | C:\Documents and Settings\All Users\Application Data\VMware VirtualCenter\vpxd.cfg
   Linux | /etc/vmware-vpx/vpxd.cfg
2. To change the password aging requirement, use the Advanced Settings dialog box in the vSphere Web Client.
3. Browse to the vCenter Server system in the vSphere Web Client inventory.
4. Click the Manage tab and click Settings.
5. Select Advanced Settings and locate the VirtualCenter.VimPasswordExpirationInDays parameter.
6. Restart vCenter Server.

7.23 Synchronizing Clocks on the vSphere Network
Before you install vCenter Single Sign On, install the vSphere Web Client, or deploy the vCenter Server appliance, make sure all machines on the vSphere network have their clocks synchronized. If the clocks on vCenter Server network machines are not synchronized, SSL certificates, which are time-sensitive, might not be recognized as valid in communications between network machines. Unsynchronized clocks can result in authentication problems, which can cause the vSphere Web Client installation to fail or prevent the vCenter Server Appliance vpxd service from starting.
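Enabling NTP on each host is one way to keep clocks synchronized, and it can be scripted before the components above are installed. A PowerCLI sketch with a hypothetical host name and NTP server; the manual vSphere Web Client procedure follows in the next section:

$esx = Get-VMHost -Name 'esx01.example.com'
Add-VMHostNtpServer -VMHost $esx -NtpServer 'pool.ntp.org'
# start the NTP daemon and have it start automatically with the host
$ntpd = Get-VMHostService -VMHost $esx | Where-Object { $_.Key -eq 'ntpd' }
Set-VMHostService -HostService $ntpd -Policy 'on'
Start-VMHostService -HostService $ntpd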
7.23.1 Synchronize ESX and ESXi Clocks with a Network Time Server
Before you install vCenter Single Sign On, the vSphere Web Client, or the vCenter Server appliance, make sure all machines on the vSphere network have their clocks synchronized.

Procedure
1. From the vSphere Web Client, connect to the vCenter Server.
2. Select the host in the inventory.

7.24 Monitoring and Restricting Access to SSL Certificates
Attackers can use SSL certificates to impersonate vCenter Server and decrypt the vCenter Server database password. You must monitor and strictly control access to the certificate. Only the service account user requires regular access to the directory that contains vCenter Server SSL certificates. Infrequently, the vCenter Server system administrator might need to access the directory as well. Because the SSL certificate can be used to impersonate vCenter Server and decrypt the database password, monitor the event log and set an alert to trigger when an account other than the service account accesses the directory. To prevent a user other than the service account user from accessing the directory, change the permissions on the directory so that only the vCenter Server service account is allowed to access it. This restriction prevents you from collecting a complete support log when you run the vc-support script. The restriction also prevents the administrator from changing the vCenter Server database password.

8 MSCS Clustering Requirements

Component | Requirement
Virtual SCSI adapter | LSI Logic Parallel for Windows Server 2003. LSI Logic SAS for Windows Server 2008.
Operating system | Windows Server 2003 SP1 and SP2, or Windows Server 2008 SP2 and above. For supported guest operating systems, see Other Clustering Requirements and Recommendations.
Virtual NIC | Use the default type for all guest operating systems.
I/O timeout | Set to 60 seconds or more. Modify HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue. The system might reset this I/O timeout value if you re-create a cluster. You must reset the value in that case.
Disk format | Select Thick Provision to create disks in eagerzeroedthick format.
Disk and networking setup | Add networking before disks. Refer to the VMware knowledge base article at http://kb.vmware.com/kb/1513 if you encounter any errors.
Number of nodes | Windows Server 2003 SP1 and SP2: two-node clustering. Windows Server 2008 SP2 and above: up to five-node clustering. For supported guest operating systems, see Other Clustering Requirements and Recommendations.
NTP server | Synchronize domain controllers and cluster nodes with a common NTP server, and disable host-based time synchronization when using clustering in the guest.

8.1.1 Supported Shared Storage Configurations
Different MSCS cluster setups support different types of shared storage configurations. Some setups support more than one type. Select the recommended type of shared storage for best results.

Shared Storage Requirements

Storage Type | Clusters on One Physical Machine (Cluster in a Box) | Clusters Across Physical Machines (Cluster Across Boxes) | Clusters of Physical and Virtual Machines (Standby Host Clustering)
Virtual disks | Yes (recommended) | No | No
Pass-through RDM (physical compatibility mode) | No | Yes (recommended) | Yes
Non-pass-through RDM (virtual compatibility mode) | Yes | Yes | No

Use of software iSCSI initiators within guest operating systems configured with MSCS, in any configuration supported by Microsoft, is transparent to ESXi hosts and there is no need for explicit support statements from VMware.
Note: Clusters across physical machines with non-pass-through RDM are supported only for clustering with Windows Server 2003. They are not supported for clustering with Windows Server 2008.

8.1.2 vSphere MSCS Setup Limitations
Before you set up MSCS, review the list of functions that are not supported for this release, and the requirements and recommendations that apply to your configuration. The following environments and functions are not supported for MSCS setups with this release of vSphere:
- Clustering on iSCSI and NFS disks.
- Mixed environments, such as configurations where one cluster node is running a different version of ESXi than another cluster node.
- Use of MSCS in conjunction with vSphere Fault Tolerance (FT).
- Migration with vSphere vMotion of clustered virtual machines.
- N-Port ID Virtualization (NPIV).
- With native multipathing (NMP), clustering is not supported when the path policy is set to round robin. Third-party multipathing plug-ins might support round robin or other load balancing behavior with Microsoft clusters. Support of third-party multipathing plug-ins is provided by the plug-in vendor. Round robin is the default policy for multiple storage arrays in new vSphere releases. See KB 1010041 for a list of storage arrays and the PSP to configure for MSCS.
- ESXi hosts that use memory overcommitment are not suitable for deploying MSCS virtual machines. Memory overcommitment can cause virtual machines to stall for short durations. This can be significantly disruptive because the MSCS clustering mechanism is time-sensitive and timing delays can cause the virtual machines to behave incorrectly.
- Suspend or resume of more than one MSCS node in an ESXi host with a five-node cluster-in-a-box configuration is not supported. This I/O-intensive operation is disruptive to the timing-sensitive MSCS clustering software.
FCoE is supported in ESXi 5.1 Update 1. See KB 1037959 for more information.

8.1.3 MSCS and Booting from a SAN
You can put the boot disk of a virtual machine on a SAN-based VMFS volume. Booting from a SAN is complex. Problems that you encounter in physical environments extend to virtual environments. For general information about booting from a SAN, see the vSphere Storage documentation. Follow these guidelines when you place the boot disk of a virtual machine on a SAN-based VMFS volume:
- Consider the best practices for boot-from-SAN that Microsoft publishes in the following knowledge base article: http://support.microsoft.com/kb/305547/en-us.
- Use StorPort LSI Logic drivers instead of SCSIport drivers when running Microsoft Cluster Service for Windows Server 2003 or 2008 guest operating systems.
- Test clustered configurations in different failover scenarios before you put them into production environments.

8.1.4 Setting up Clustered Continuous Replication or Database Availability Groups with Exchange 2010
You can set up Clustered Continuous Replication (CCR) with Exchange 2007 or Database Availability Groups (DAG) with Exchange 2010 in your vSphere environment. When working in a vSphere environment:
- Use virtual machines instead of physical machines as the cluster components.
- If the boot disks of the CCR or DAG virtual machines are on a SAN, see MSCS and Booting from a SAN.
For more information, see Microsoft's documentation for CCR or DAG on the Microsoft Web site.

8.2 Cluster Virtual Machines Across Physical Hosts
You can create an MSCS cluster that consists of two or more virtual machines on two or more ESXi hosts. A cluster across physical hosts requires specific hardware and software.
- ESXi hosts that have the following:
  - Two physical network adapters dedicated to the MSCS cluster and to the public and private networks.
  - One physical network adapter dedicated to the VMkernel.
- Fibre Channel (FC) SAN. Shared storage must be on an FC SAN.
- RDM in physical compatibility (pass-through) or virtual compatibility (non-pass-through) mode. VMware recommends physical compatibility mode. The cluster cannot use virtual disks for shared storage. Failover clustering with Windows Server 2008 is not supported with virtual compatibility mode (non-pass-through) RDMs.

8.3 Cluster Physical and Virtual Machines
You can create an MSCS cluster in which each physical machine has a corresponding virtual machine. This type of configuration is known as a standby host cluster. A standby host cluster has specific hardware and software requirements:
- Use ESXi hosts that have the following:
  - Two physical network adapters dedicated to the MSCS cluster and to the public and private networks.
  - One physical network adapter dedicated to the VMkernel.
- Use RDMs in physical compatibility mode (pass-through RDM). You cannot use virtual disks or RDMs in virtual compatibility mode (non-pass-through RDM) for shared storage.
- Use the STORport Miniport driver for the Fibre Channel (FC) HBA (QLogic or Emulex) in the physical Windows machine.
- Do not run multipathing software in the physical or virtual machines.
- Use only a single physical path from the host to the storage arrays in standby host configurations.

8.3.1 Using vSphere DRS Groups and VM-Host Affinity Rules with MSCS Virtual Machines
You can use the vSphere Client to set up two types of DRS groups: virtual machine DRS groups, which contain at least one virtual machine, and host DRS groups, which contain at least one host. A VM-Host affinity rule establishes an affinity (or anti-affinity) relationship between a virtual machine DRS group and a host DRS group.

You must use VM-Host affinity rules because vSphere HA does not obey VM-VM affinity rules. This means that if a host fails, vSphere HA might separate clustered virtual machines that are meant to stay together, or vSphere HA might put clustered virtual machines that are meant to stay apart on the same host. You can avoid this problem by setting up DRS groups and using VM-Host affinity rules, which are obeyed by vSphere HA.
- For a cluster of virtual machines on one physical host, all MSCS virtual machines must be in the same virtual machine DRS group, linked to the same host DRS group with the affinity rule "Must run on hosts in group."
- For a cluster of virtual machines across physical hosts, each MSCS virtual machine must be in a different virtual machine DRS group, linked to a different host DRS group with the affinity rule "Must run on hosts in group."
- Limit the number of hosts to two when you define host DRS group rules for a cluster of virtual machines on one physical host. (This does not apply to clusters of virtual machines across physical hosts.) Since vSphere HA does not obey VM-VM affinity rules, virtual machines in the configuration could be spread across hosts during a vSphere HA recovery from host failure if more than two hosts are included in a host DRS group rule.
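The DRS groups and VM-Host rules described here are created in the vSphere Client; more recent PowerCLI releases (newer than the vSphere 5.1-era tooling this document otherwise assumes) also expose them through cmdlets. The following is only a sketch under that assumption, with hypothetical cluster, virtual machine, and host names:

$cluster = Get-Cluster -Name 'MSCS-Cluster'
# a virtual machine DRS group and a host DRS group
New-DrsClusterGroup -Name 'MSCS-VMs' -Cluster $cluster -VM (Get-VM 'mscs-node1', 'mscs-node2')
New-DrsClusterGroup -Name 'MSCS-Hosts' -Cluster $cluster -VMHost (Get-VMHost 'esx01.example.com', 'esx02.example.com')
# a "Must run on hosts in group" VM-Host affinity rule
New-DrsVMHostRule -Name 'MSCS-MustRun' -Cluster $cluster -VMGroup 'MSCS-VMs' -VMHostGroup 'MSCS-Hosts' -Type MustRunOn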
8.4 vSphere MSCS Setup Checklist
When you set up MSCS on ESXi, see the checklists to configure your environment according to the requirements. You can also use the checklists to verify that your setup meets the requirements if you need technical support.

8.4.1 Requirements for Clustered Disks
Each type of clustered disk has its own requirements, depending on whether it is in a single-host cluster or multihost cluster.

Requirements for Clustered Disks

Component | Single-Host Clustering | Multihost Clustering
Clustered virtual disk (.vmdk) | SCSI bus sharing mode must be set to virtual. | Not supported.
Clustered disks, virtual compatibility mode (non-pass-through RDM) | Device type must be set to virtual compatibility mode. SCSI bus sharing mode must be set to virtual mode. A single, shared RDM mapping file for each clustered disk is required. | Device type must be set to virtual compatibility mode for cluster across boxes, but not for standby host clustering or cluster across boxes on Windows Server 2008. SCSI bus sharing mode must be set to physical. Requires a single, shared RDM mapping file for each clustered disk.
Clustered disks, physical compatibility mode (pass-through RDM) | Not supported. | Device type must be set to Physical compatibility mode during hard disk creation. SCSI bus sharing mode must be set to physical (the default). A single, shared RDM mapping file for each clustered disk is required.
All types | All clustered nodes must use the same target ID (on the virtual SCSI adapter) for the same clustered disk. A separate virtual adapter must be used for clustered disks.

8.4.2 Other Requirements and Recommendations
The following table lists the components in your environment that have requirements for options or settings.

Other Clustering Requirements and Recommendations

Component | Requirement
Disk | If you place the boot disk on a virtual disk, select Thick Provision during disk provisioning. The only disks that you should not create with the Thick Provision option are RDM files (both physical and virtual compatibility mode).
Windows | Use Windows Server 2003 SP1 and SP2 (32 bit), Windows Server 2003 SP1 and SP2 (64 bit), Windows Server 2008 SP2 (32 bit), Windows Server 2008 SP2 (64 bit), Windows Server 2008 R2 SP1 (32 bit), or Windows Server 2008 R2 SP1 (64 bit). For Windows Server 2003 SP1 and SP2, use only two cluster nodes. For Windows Server 2008 SP2 and above, you can use up to five cluster nodes. Disk I/O timeout is 60 seconds or more (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue). Note: If you recreate the cluster, this value might be reset to its default, so you must change it again. The cluster service must restart automatically on failure (first, second, and subsequent times).
ESXi configuration | Do not overcommit memory. Set the Memory Reservation (minimum memory) option to the same as the amount of memory assigned to the virtual machine. If you must overcommit memory, the swap file must be local, not on the SAN. ESXi 5.0 uses a different technique to determine if Raw Device Mapped (RDM) LUNs are used for MSCS cluster devices, by introducing a configuration flag to mark each device participating in an MSCS cluster as "perennially reserved". For ESXi hosts hosting passive MSCS nodes with RDM LUNs, use the esxcli command to mark the device as perennially reserved: esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true. See KB 1016106 for more information.
Multipathing | Contact your multipathing software vendor for information and support of non-VMware multipathing software in vSphere.
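The disk I/O timeout called out in the Windows row is set inside the guest operating system, not on the host. A PowerShell sketch, run from an elevated prompt in the Windows guest:

# set the Windows disk I/O timeout to 60 seconds (re-check this value after re-creating the cluster)
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Disk' -Name 'TimeOutValue' -Value 60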
9 Virtual Machine Administration

9.1 What Is a Virtual Machine?
A virtual machine consists of several types of files that you store on a supported storage device. The key files that make up a virtual machine are the configuration file, virtual disk file, NVRAM setting file, and the log file. You configure virtual machine settings through the vSphere Web Client or the vSphere Client. You do not need to touch the key files. A virtual machine can have more files if one or more snapshots exist or if you add Raw Device Mappings (RDMs).

Caution: Do not change, move, or delete these files without instructions from a VMware Technical Support Representative.

Virtual Machine Files

File | Usage | Description
.vmx | vmname.vmx | Virtual machine configuration file
.vmxf | vmname.vmxf | Additional virtual machine configuration files
.vmdk | vmname.vmdk | Virtual disk characteristics
-flat.vmdk | vmname-flat.vmdk | Virtual machine data disk
.nvram | vmname.nvram or nvram | Virtual machine BIOS or EFI configuration
.vmsd | vmname.vmsd | Virtual machine snapshots
.vmsn | vmname.vmsn | Virtual machine snapshot data file
.vswp | vmname.vswp | Virtual machine swap file
.vmss | vmname.vmss | Virtual machine suspend file
.log | vmware.log | Current virtual machine log file
-#.log | vmware-#.log (where # is a number starting with 1) | Old virtual machine log entries

9.1.1 Virtual Machine Options and Resources
Each virtual device performs the same function for the virtual machine as hardware on a physical computer does. A virtual machine might be running in any of several locations, such as ESXi hosts, datacenters, clusters, or resource pools. Many of the options and resources that you configure have dependencies on and relationships with these objects.

Every virtual machine has CPU, memory, and disk resources. CPU virtualization emphasizes performance and runs directly on the processor whenever possible. The underlying physical resources are used whenever possible. The virtualization layer runs instructions only as needed to make virtual machines operate as if they were running directly on a physical machine.

All recent operating systems provide support for virtual memory, allowing software to use more memory than the machine physically has. Similarly, the ESXi hypervisor provides support for overcommitting virtual machine memory, where the amount of guest memory configured for all virtual machines might be larger than the amount of the host's physical memory.

You can add virtual disks and add more space to existing disks, even when the virtual machine is running. You can also change the device node and allocate shares of disk bandwidth to the virtual machine.

VMware virtual machines have the following options:

Supported Features for Virtual Machine Compatibility

Feature | ESXi 5.1 and later | ESXi 5.0 and later | ESX/ESXi 4.x and later | ESX/ESXi 3.5 and later
Hardware version | 9 | 8 | 7 | 4
Maximum memory (MB) | 1035264 | 1035264 | 261120 | 65532
Maximum number of logical processors | 64 | 32 | 8 | 4
Maximum number of cores (virtual CPUs) per socket | 64 | 32 | 8 | 1
Maximum SCSI adapters | 4 | 4 | 4 | 4
Bus Logic adapters | Y | Y | Y | Y
LSI Logic adapters | Y | Y | Y | Y
LSI-Logic SAS adapters | Y | Y | Y | N
VMware Paravirtual controllers | Y | Y | Y | N

9.1.2 VM Disk formats

Option | Action
Same format as source | Use the same format as the source virtual machine.
Thick Provision Lazy Zeroed | Create a virtual disk in a default thick format. Space required for the virtual disk is allocated during creation. Any data remaining on the physical device is not erased during creation, but is zeroed out on demand at a later time, on first write from the virtual machine.
Thick Provision Eager Zeroed | Create a thick disk that supports clustering features such as Fault Tolerance. Space required for the virtual disk is allocated at creation time. In contrast to the thick provision lazy zeroed format, the data remaining on the physical device is zeroed out during creation. It might take longer to create disks in this format than to create other types of disks.
Thin Provision | Use the thin provisioned format. At first, a thin provisioned disk uses only as much datastore space as the disk initially needs. If the thin disk needs more space later, it can grow to the maximum capacity allocated to it.
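As an illustration of these formats, a new disk can be added to an existing virtual machine with PowerCLI. The VM name and size here are hypothetical; -StorageFormat accepts Thin, Thick, and EagerZeroedThick:

# add a 40 GB eager-zeroed thick disk (suitable for clustering features such as Fault Tolerance)
New-HardDisk -VM (Get-VM -Name 'app01') -CapacityGB 40 -StorageFormat EagerZeroedThick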
9.2 Installing the Microsoft Sysprep Tool
Install the Microsoft Sysprep tool so that you can customize Windows guest operating systems when you clone virtual machines. The guest operating system customization feature in vCenter Server and VMware vCenter Server Appliance uses the functions of the Sysprep tool. Verify that your vCenter Server or VMware vCenter Server Appliance system meets the following requirements before you customize your virtual machine's Windows guest operating systems:
- Install the Microsoft Sysprep tool. Microsoft includes the system tool set on the installation CD-ROM discs for Windows 2000, Windows XP, and Windows 2003. The Sysprep tool is built into the Windows Vista and Windows 2008 operating systems.
- The correct version of the Sysprep tool is installed for each guest operating system that you want to customize.
- The password for the local administrator account on the virtual machines is set to blank ("").
- If you are using the VMware vCenter Server Appliance, you must have access to the VMware vCenter Server Appliance Web console.

Note: Customization operations will fail if the correct version of the Sysprep tool is not found.

This chapter includes the following topics:
- Install the Microsoft Sysprep Tool from a Microsoft Web Site
- Install the Microsoft Sysprep Tool from the Windows Operating System CD
- Install the Microsoft Sysprep Tool for VMware vCenter Server Appliance

9.2.1 Install the Microsoft Sysprep Tool from a Microsoft Web Site
You can download and install the Microsoft Sysprep tool from the Microsoft Web site.

Prerequisites
Verify that you download the correct version for the guest operating system to customize. Microsoft has a different version of Sysprep for each release and service pack of Windows. You must use the version of Sysprep specific to the operating system that you are deploying.

The vCenter Server installer creates a Sysprep directory in ALLUSERSPROFILE. The ALLUSERSPROFILE location is usually \Documents And Settings\All Users\. The vpxd.cfg file is also in this location. On Windows 2008, the file location is C:\ProgramData\VMware\VMware VirtualCenter\sysprep\.

Procedure
1. Download the Sysprep files from the Microsoft Download Center and save them to your local system.
2. Open and expand the .cab file. The contents of the .cab file vary, depending on the operating system.
3. Extract the files to the appropriate directory for your guest operating system. The following Sysprep support directories are created during the vCenter Server installation:
   C:\ALLUSERSPROFILE\Application Data\Vmware\VMware VirtualCenter\sysprep
   ...\1.1\
   ...\2k\
   ...\xp\
   ...\svr2003\
   ...\xp-64\
   ...\svr2003-64\
4. Select the subdirectory that corresponds to your operating system.
5. Click OK to expand the files.
9.2.2 Install the Microsoft Sysprep Tool from the Windows Operating System CD
You can install the Microsoft Sysprep tool from a CD. The vCenter Server installer creates a Sysprep directory in ALLUSERSPROFILE. The ALLUSERSPROFILE location is usually \Documents and Settings\All Users\. The vpxd.cfg file is also in this location. On Windows 2008, the file location is C:\ProgramData\VMware\VMware VirtualCenter\sysprep\.

Procedure
1. Insert the Windows operating system CD into the CD-ROM drive, often the D: drive.
2. Locate the DEPLOY.CAB file in the \Support\Tools directory on the CD.
3. Open and expand the DEPLOY.CAB file. The contents of the .cab file vary, depending on the operating system.
4. Extract the files to the directory appropriate for your guest operating system. The following Sysprep support directories are created during the vCenter Server installation:
   C:\ALLUSERSPROFILE\Application Data\Vmware\VMware VirtualCenter\sysprep
   ...\1.1\
   ...\2k\
   ...\xp\
   ...\svr2003\
   ...\xp-64\
   ...\svr2003-64\
5. Select the subdirectory that corresponds to your operating system.
6. Click OK to expand the files.
7. Repeat this procedure to extract Sysprep files for each of the Windows guest operating systems that you plan to customize using vCenter Server.

9.2.3 Install the Microsoft Sysprep Tool for VMware vCenter Server Appliance
After you download and install the Microsoft Sysprep tool from the Microsoft Web site, you can use the VMware vCenter Server Appliance Web console to upload the files to the appliance.

Prerequisites
Verify that you download the correct version for the guest operating system to customize. Microsoft has a different version of Sysprep for each release and service pack of Windows. You must use the version of Sysprep specific to the operating system that you are deploying. When you upload the files to vCenter Server Appliance, the contents of the CAB file for the Sysprep Tool version that you downloaded are saved in /etc/vmware-vpx/sysprep/OS. For example, /etc/vmware-vpx/sysprep/2k or /etc/vmware-vpx/sysprep/xp.

Procedure
1. Download the Sysprep files from the Microsoft Download Center and save them to your local system.
2. Log in to the VMware vCenter Server Appliance Web console and click the vCenter Server Summary tab.
3. In the Utilities panel, click the Sysprep Files Upload button.
4. Select a Windows platform directory, and browse to the file.
5. Click Open. The file is uploaded to the vCenter Server Appliance.
6. Click Close.
You can customize a new virtual machine with a supported Windows guest operating system when you clone an existing virtual machine.

9.3 Virtual Machine Compatibility Options

Compatibility | Description
ESXi 5.1 and later | This virtual machine (hardware version 9) is compatible with ESXi 5.1 and later.
ESXi 5.0 and later | This virtual machine (hardware version 8) is compatible with ESXi 5.0 and 5.1.
ESX/ESXi 4.x and later | This virtual machine (hardware version 7) is compatible with ESX/ESXi 4.x, ESXi 5.0, and ESXi 5.1.
ESX/ESXi 3.5 and later | This virtual machine (hardware version 4) is compatible with ESX/ESXi 3.5, ESX/ESXi 4.x, and ESXi 5.1. It is also compatible with VMware Server 1.0 and later. ESXi 5.0 does not allow creation of virtual machines with this compatibility, but you can run such virtual machines if they were created on a host with different compatibility.
ESX Server 2.x and later | This virtual machine (hardware version 3) is compatible with ESX Server 2.x, ESX/ESXi 3.5, ESX/ESXi 4.x, and ESXi 5.0. You cannot create or edit virtual machines with ESX Server 2.x compatibility. You can only start or upgrade them.

The compatibility setting that appears in the Compatible with drop-down menu is the default for the virtual machine that you are creating. The following factors determine the default virtual machine compatibility:
- The ESXi host version on which the virtual machine is created.
- The inventory object that the default virtual machine compatibility is set on, including a host, cluster, or datacenter.
You can accept the default compatibility or select a different setting. It is not always necessary to select the latest ESXi host version. Selecting an earlier version can provide greater flexibility and is useful in the following situations:
- To standardize testing and deployment in your virtual environment.
- If you do not need the capabilities of the latest host version.

9.3.1 Determine the Default Virtual Machine Compatibility Setting in the vSphere Web Client
The compatibility setting for a virtual machine provides information about the hosts, clusters, or datacenter the virtual machine is compatible with. The virtual machine Summary tab displays the compatibility for the virtual machine. You can set and view the default compatibility used for virtual machine creation at the host, cluster, or datacenter level.

Procedure
Select an inventory object and display the virtual machine compatibility.

Option | Action
Virtual machine | Select a virtual machine in the inventory and click the Summary tab. The top panel displays the Compatibility setting.
Host | Select a host in the inventory and click the Manage tab. The Default Virtual Machine Compatibility is listed in the Virtual Machines section.
Cluster | Select a cluster in the inventory, click the Manage tab, and in the Configuration section, click General.
Datacenter | Right-click a datacenter in the inventory and select All Virtual Infrastructure Actions > Edit Default VM Compatibility.

You can change the default compatibility or upgrade the virtual machine compatibility.

9.3.2 Supported Features for Virtual Machine Compatibility

Feature | ESXi 5.1 and later | ESXi 5.0 and later | ESX/ESXi 4.x and later | ESX/ESXi 3.5 and later
Hardware version | 9 | 8 | 7 | 4
Maximum memory (MB) | 1035264 | 1035264 | 261120 | 65532
Maximum number of logical processors | 64 | 32 | 8 | 4
Maximum number of cores (virtual CPUs) per socket | 64 | 32 | 8 | 1
Maximum SCSI adapters | 4 | 4 | 4 | 4
Bus Logic adapters | Y | Y | Y | Y
LSI Logic adapters | Y | Y | Y | Y
LSI-Logic SAS adapters | Y | Y | Y | N
VMware Paravirtual controllers | Y | Y | Y | N
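A virtual machine's current hardware version can be checked, and upgraded to a newer compatibility level, from PowerCLI. A sketch with a hypothetical VM name; the upgrade is one-way and the virtual machine must be powered off:

Get-VM -Name 'app01' | Select-Object Name, Version      # e.g. v7, v8, v9
Set-VM -VM (Get-VM -Name 'app01') -Version v9 -Confirm:$false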
9.4 Change CPU Hot Plug Settings in the vSphere Web Client
The CPU hot plug option lets you add CPU resources for a virtual machine while the machine is turned on. The following conditions apply:
- For best results, use virtual machines that are compatible with ESXi 5.0 and later.
- Hot-adding multicore virtual CPUs is supported only with virtual machines that are compatible with ESXi 5.0 and later.
- Not all guest operating systems support CPU hot add. You can disable these settings if the guest is not supported.
- To use the CPU hot-add feature with virtual machines that are compatible with ESXi 4.x and later, set the Number of cores per socket to 1.
- Adding CPU resources to a running virtual machine with CPU hot plug enabled disconnects and reconnects all USB passthrough devices connected to that virtual machine.

Prerequisites
Verify that the virtual machine is running under the following conditions:
- VMware Tools is installed. This condition is required for hot plug functionality with Linux guest operating systems.
- The virtual machine has a guest operating system that supports CPU hot plug.
- The virtual machine compatibility is ESX/ESXi 4.x or later.
- The virtual machine is turned off.

9.4.1 Change CPU Identification Mask Settings in the vSphere Web Client
CPU identification (CPU ID) masks control the CPU features visible to the virtual machine's guest operating system. Masking or hiding CPU features can make a virtual machine widely available to ESXi hosts for migration. vCenter Server compares the CPU features available to a virtual machine with the CPU features of the destination host to determine whether to allow or disallow migration with vMotion. For example, masking the AMD No eXecute (NX) and the Intel eXecute Disable (XD) bits prevents the virtual machine from using these features, but provides compatibility that allows you to migrate virtual machines to ESXi hosts that do not include this capability. When the NX/XD bit is visible to the guest operating system, the virtual machine can use this feature, but you can migrate the virtual machine only to hosts on which the feature is enabled.

Caution: Changing the CPU compatibility masks can result in an unsupported configuration. Do not manually change the CPU compatibility masks unless instructed to do so by VMware Support or a VMware Knowledge Base article.

Prerequisites
Turn off the virtual machine.

Procedure
1. Right-click the virtual machine and select Edit Settings.
   a. To locate a virtual machine, select a datacenter, folder, cluster, resource pool, host, or vApp.
   b. Click the Related Objects tab and click Virtual Machines.
2. On the Virtual Hardware tab, expand CPU, and in the CPUID Mask drop-down menu, select an NX/XD option.

   Option | Description
   Hide the NX/XD flag from guest | Increases vMotion compatibility. Hiding the NX/XD flag increases vMotion compatibility between hosts, but might disable certain CPU security features.
   Expose the NX/XD flag to guest | Keeps all CPU security features enabled.
   Keep current Advanced setting values for the NX/XD flag | Uses the NX/XD flag settings specified in the CPU Identification Mask dialog box. Enabled only when current settings specify something other than what is specified in the other NX/XD flag options, for example, if the NX/XD flag bit setting varies with processor brand.

3. Click OK.

9.4.2 Expose VMware Hardware Assisted Virtualization in the vSphere Web Client
You can expose full CPU virtualization to the guest operating system so that applications that require hardware virtualization can run on virtual machines without binary translation or paravirtualization.

Prerequisites
- Verify that the virtual machine compatibility is ESXi 5.1 and later.
- Intel Nehalem Generation (Xeon Core i7) or later processors or AMD Opteron Generation 3 (Greyhound) or later processors.
- Verify that Intel VT-x or AMD-V is enabled in the BIOS so that hardware assisted virtualization is possible.
- Required Privileges: Virtual machine.Configuration.Settings set on the vCenter Server system.

Procedure
1. Right-click the virtual machine and select Edit Settings.
   a. To locate a virtual machine, select a datacenter, folder, cluster, resource pool, host, or vApp.
   b. Click the Related Objects tab and click Virtual Machines.
2. On the Virtual Hardware tab, expand CPU, and select Expose hardware-assisted virtualization to guest OS.
3. Click OK.
The Manage tab refreshes, and the Nested Hypervisor CPU option shows Enabled.
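Both of these per-VM CPU options map to properties of the vSphere API's VirtualMachineConfigSpec, so they can also be set through PowerCLI's Get-View. A sketch only; the VM name is hypothetical and the virtual machine must be powered off:

$vmView = Get-VM -Name 'app01' | Get-View
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.CpuHotAddEnabled = $true     # allow adding vCPUs while the VM is powered on
$spec.NestedHVEnabled = $true      # expose hardware-assisted virtualization to the guest (hardware version 9)
$vmView.ReconfigVM($spec)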
9.4.3 Allocate Memory Resources in the vSphere Web Client
You can change the amount of memory resources allocated to a virtual machine by using the shares, reservations, and limits settings. The host determines the appropriate amount of physical RAM to allocate to virtual machines based on these settings. You can assign a high or low shares value to a virtual machine, depending on its load and status.
The following user-defined settings affect the memory resource allocation of a virtual machine:
- Limit: Places a limit on the consumption of memory for a virtual machine. This value is expressed in megabytes.
- Reservation: Specifies the guaranteed minimum allocation for a virtual machine. The reservation is expressed in megabytes. If the reservation cannot be met, the virtual machine will not turn on.
- Shares: Each virtual machine is granted a number of memory shares. The more shares a virtual machine has, the greater share of host memory it receives. Shares represent a relative metric for allocating memory capacity. For more information about share values, see the vSphere Resource Management documentation.
You cannot assign a reservation to a virtual machine that is larger than its configured memory. If you give a virtual machine a large reservation and then reduce its configured memory size, the reservation is reduced to match the new configured memory size.

9.4.4 Network Adapter Types
When you configure a virtual machine, you can add network adapters (NICs) and specify the adapter type. The types of network adapter that are available depend on the following factors:
- The virtual machine version, which depends on which host created it or most recently updated it.
- Whether the virtual machine has been updated to the latest version for the current host.
- The guest operating system.
The following NIC types are supported:
- E1000: Emulated version of the Intel 82545EM Gigabit Ethernet NIC, with drivers available in most newer guest operating systems, including Windows XP and later and Linux versions 2.4.19 and later.
- Flexible: Identifies itself as a Vlance adapter when a virtual machine boots, but initializes itself and functions as either a Vlance or a VMXNET adapter, depending on which driver initializes it. With VMware Tools installed, the VMXNET driver changes the Vlance adapter to the higher performance VMXNET adapter.
- Vlance: Emulated version of the AMD 79C970 PCnet32 LANCE NIC, an older 10 Mbps NIC with drivers available in most 32-bit guest operating systems except Windows Vista and later. A virtual machine configured with this network adapter can use its network immediately.
- VMXNET: Optimized for performance in a virtual machine and has no physical counterpart. Because operating system vendors do not provide built-in drivers for this card, you must install VMware Tools to have a driver for the VMXNET network adapter available.
- VMXNET 2 (Enhanced): Based on the VMXNET adapter but provides high-performance features commonly used on modern networks, such as jumbo frames and hardware offloads. VMXNET 2 (Enhanced) is available only for some guest operating systems on ESX/ESXi 3.5 and later.
- VMXNET 3: Next generation of a paravirtualized NIC designed for performance. VMXNET 3 offers all the features available in VMXNET 2 and adds several new features, such as multiqueue support (also known as Receive Side Scaling in Windows), IPv6 offloads, and MSI/MSI-X interrupt delivery. VMXNET 3 is not related to VMXNET or VMXNET 2.
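As an illustration of the adapter types above, the sketch below adds a VMXNET 3 adapter with pyVmomi. It is a hedged example, not a prescribed method: it assumes a vim.VirtualMachine object named vm (retrieved as in the earlier sketch) and uses the placeholder port group name 'VM Network'; VMware Tools must be installed in the guest for the vmxnet3 driver to be available.

from pyVmomi import vim

# Pick a network visible to the host the VM currently runs on ('VM Network' is a placeholder).
net = next(n for n in vm.runtime.host.network if n.name == 'VM Network')

nic = vim.vm.device.VirtualVmxnet3()
nic.backing = vim.vm.device.VirtualEthernetCard.NetworkBackingInfo(
    network=net, deviceName=net.name)
nic.connectable = vim.vm.device.VirtualDevice.ConnectInfo(
    startConnected=True, allowGuestControl=True)

change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add, device=nic)
vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))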
9.5 VM Disk Persistence Modes
- Dependent: Dependent disks are included in snapshots.
- Independent Persistent: Disks in persistent mode behave like conventional disks on your physical computer. All data written to a disk in persistent mode is written permanently to the disk.
- Independent Nonpersistent: Changes to disks in nonpersistent mode are discarded when you turn off or reset the virtual machine. With nonpersistent mode, you can restart the virtual machine with a virtual disk in the same state every time. Changes to the disk are written to and read from a redo log file that is deleted when you turn off or reset the virtual machine.

9.5.1 RDM Compatibility Modes
Select a compatibility mode:
- Physical: Allows the guest operating system to access the hardware directly. Physical compatibility is useful if you are using SAN-aware applications on the virtual machine. However, a virtual machine with a physical compatibility RDM cannot be cloned, made into a template, or migrated if the migration involves copying the disk.
- Virtual: Allows the RDM to behave as if it were a virtual disk, so that you can use such features as taking snapshots, cloning, and so on. When you clone the disk or make a template out of it, the contents of the LUN are copied into a .vmdk virtual disk file. When you migrate a virtual compatibility mode RDM, you can migrate the mapping file or copy the contents of the LUN into a virtual disk.
In most cases, you can accept the default device node. For a hard disk, a nondefault device node is useful to control the boot order or to have different SCSI controller types. For example, you might want to boot from an LSI Logic controller and share a data disk with another virtual machine using a BusLogic controller with bus sharing turned on.
Disk modes are not available for RDM disks using physical compatibility mode.

9.5.2 Use Disk Shares to Prioritize Virtual Machines in the vSphere Web Client
You can change the disk resources for a virtual machine. If multiple virtual machines access the same VMFS datastore and the same logical unit number (LUN), use disk shares to prioritize the disk accesses from the virtual machines. Disk shares distinguish high-priority from low-priority virtual machines.
You can allocate the host disk's I/O bandwidth to the virtual hard disks of a virtual machine. Disk I/O is a host-centric resource, so you cannot pool it across a cluster. Shares is a value that represents the relative metric for controlling disk bandwidth to all virtual machines. The values are compared to the sum of all shares of all virtual machines on the server. Disk shares are relevant only within a given host. The shares assigned to virtual machines on one host have no effect on virtual machines on other hosts.
You can select an IOPS limit, which sets an upper bound for storage resources that are allocated to a virtual machine. IOPS is the number of I/O operations per second.
Procedure
1. Right-click the virtual machine and select Edit Settings.
   - To locate a virtual machine, select a datacenter, folder, cluster, resource pool, host, or vApp.
   - Click the Related Objects tab and click Virtual Machines.
2. On the Virtual Hardware tab, expand Hard disk to view the disk options.
3. In the Shares drop-down menu, select a value for the shares to allocate to the virtual machine.
4. If you selected Custom, enter a number of shares in the text box.
5. In the Limit - IOPs box, enter the upper limit of storage resources to allocate to the virtual machine, or select Unlimited.
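The disk shares and IOPS limit set in the procedure above can also be set through the API. The following pyVmomi sketch is illustrative only and assumes an existing vim.VirtualMachine object named vm; it edits the first virtual disk it finds, and the share and limit values are arbitrary examples.

from pyVmomi import vim

disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))

# Per-disk storage I/O allocation carries the shares and the IOPS limit (-1 = Unlimited).
disk.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(
    limit=500,
    shares=vim.SharesInfo(level=vim.SharesInfo.Level.custom, shares=2000))

change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=disk)
vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))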
9.6 SCSI Controller Configuration To access virtual disks and SCSI devices, a virtual machine uses virtual SCSI controllers. These virtual controllers appear to a virtual machine as different types of controllers, including BusLogic Parallel, LSI Logic Parallel, LSI Logic SAS, and VMware Paravirtual SCSI. You can add a SCSI controller, change the SCSI controller type, and select bus sharing for a virtual machine. Each virtual machine can have a maximum of four SCSI controllers. The default SCSI controller is numbered as 0. When you create a virtual machine, the default hard disk is assigned to the default SCSI controller 0 at bus node (0:0). When you add SCSI controllers, they are numbered sequentially 1, 2, and 3. If you add a hard disk or SCSI device to a virtual machine after virtual machine creation, it is assigned to the first available virtual device node on the default SCSI Controller, for example (0:1). If you add a SCSI controller, you can reassign an existing or new hard disk, or a SCSI device, to that controller. For example, you can assign the device to (1:z ), where 1 is SCSI Controller 1 and z is a virtual device node from 0 to 15. By default, the SCSI controller is assigned to virtual device node (z:7), so that device node is unavailable for hard disks or SCSI devices. 9.6.1 About VMware Paravirtual SCSI Controllers VMware Paravirtual SCSI controllers are high performance storage controllers that can result in greater throughput and lower CPU use. These controllers are best suited for high performance storage environments. VMware Paravirtual SCSI controllers are available for virtual machines with ESXi 4.x and later compatibility. Disks on such controllers might not experience optimal performance gains if they have snapshots or if memory on the ESXi host is over committed. This behavior does not mitigate the overall performance gain of using VMware Paravirtual SCSI controllers as compared to other SCSI controller options. If you have virtual machines with VMware Paravirtual SCSI controllers, those virtual machines cannot be part of an MSCS cluster. For platform support for VMware Paravirtual SCSI controllers, see the VMware Compatibility Guide. 9.6.2 Add a PCI Device in the vSphere Web Client vSphere DirectPath I/O allows a guest operating system on a virtual machine to directly access physical PCI and PCIe devices connected to a host. This action gives you direct access to devices such as high-performance graphics or sound cards. You can connect each virtual machine to up to six PCI devices. You configure PCI devices on the host to make them available for passthrough to a virtual machine. See the vCenter Server and Host Management documentation. When PCI vSphere DirectPath I/O devices are available to a virtual machine, you cannot suspend, migrate with vMotion, or take or restore Snapshots of such virtual machines. Prerequisites To use DirectPath, verify that the host has Intel Virtualization Technology for Directed I/O (VT-d) or AMD I/O Virtualization Technology (IOMMU) enabled in the BIOS. Verify that the PCI devices are connected to the host and marked as available for passthrough. Verify that the virtual machine is compatible with ESXi 4.x and later. 9.6.3 Configure Video Cards in the vSphere Web Client You can change the number of monitor displays for a virtual machine, allocate memory for the displays, and enable 3D support. The default setting for total video RAM is adequate for minimal desktop resolution. 
For more complex situations, you can change the default memory. Some 3D applications require a minimum video memory of 64MB.
Prerequisites
- Verify that the virtual machine is powered off.
- To enable 3D support, the virtual machine compatibility must be ESXi 5.0 and later.
- To use a hardware 3D renderer, ensure that graphics hardware is available. Otherwise, the virtual machine will not power on.
Required privilege: Virtual machine.Configuration.Modify device settings

9.6.4 How USB Device Passthrough Technology Works
When you attach a USB device to a physical host, the device is available only to virtual machines that reside on that host. The device cannot connect to virtual machines that reside on another host in the datacenter. A USB device is available to only one virtual machine at a time. When a device is connected to a powered-on virtual machine, it is not available to connect to other virtual machines that run on the host. When you remove the active connection of a USB device from a virtual machine, it becomes available to connect to other virtual machines that run on the host.
Connecting a USB passthrough device to a virtual machine that runs on the ESXi host to which the device is physically attached requires an arbitrator, a controller, and a physical USB device or device hub.
- USB Arbitrator: Manages connection requests and routes USB device traffic. The arbitrator is installed and enabled by default on ESXi hosts. It scans the host for USB devices and manages device connection among virtual machines that reside on the host. It routes device traffic to the correct virtual machine instance for delivery to the guest operating system. The arbitrator monitors the USB device and prevents other virtual machines from using it until you release it from the virtual machine it is connected to.
- USB Controller: The USB hardware chip that provides USB function to the USB ports that it manages. The virtual USB controller is the software virtualization of the USB host controller function in the virtual machine. USB controller hardware and modules that support USB 2.0 and USB 1.1 devices must exist on the host. Two virtual USB controllers are available to each virtual machine. A controller must be present before you can add USB devices to the virtual machine. The USB arbitrator can monitor a maximum of 15 USB controllers. Devices connected to controllers numbered 16 or greater are not available to the virtual machine.
- USB Devices: You can add up to 20 USB devices to a virtual machine. This is the maximum number of devices supported for simultaneous connection to one virtual machine. The maximum number of USB devices supported on a single ESXi host for simultaneous connection to one or more virtual machines is also 20. For a list of supported USB devices, see the VMware knowledge base article at http://kb.vmware.com/kb/1021345.

9.6.5 Configuring USB Devices for vMotion
With USB passthrough from a host to a virtual machine, you can migrate a virtual machine to another ESXi host in the same datacenter and maintain the USB passthrough device connections to the original host. If a virtual machine has USB devices attached that pass through to an ESXi host, you can migrate that virtual machine with the devices attached.
For a successful migration, review the following conditions:
- You must configure all USB passthrough devices connected to a virtual machine for vMotion. If one or more devices is not configured for vMotion, the migration cannot proceed. For troubleshooting details, see the vSphere Troubleshooting documentation.
- When you migrate a virtual machine with attached USB devices away from the host to which the devices are connected, the devices remain connected to the virtual machine. However, if you suspend or power off the virtual machine, the USB devices are disconnected and cannot reconnect when the virtual machine is resumed. The device connections can be restored only if you move the virtual machine back to the host to which the devices are attached.
- If you resume a suspended virtual machine that has a Linux guest operating system, the resume process might mount the USB devices at a different location on the file system.
- If a host with attached USB devices resides in a DRS cluster with distributed power management (DPM) enabled, disable DPM for that host. Otherwise DPM might turn off the host with the attached device, which disconnects the device from the virtual machine because the virtual machine migrated to another host.
- Remote USB devices require that the hosts be able to communicate over the management network following migration with vMotion, so the source and destination management network IP address families must match. You cannot migrate a virtual machine from a host that is registered to vCenter Server with an IPv4 address to a host that is registered with an IPv6 address.

9.6.6 Add a USB Controller to a Virtual Machine in the vSphere Web Client
USB controllers are available to add to virtual machines to support USB passthrough from an ESXi host or from a client computer to a virtual machine.
You can add two USB controllers to a virtual machine. The xHCI controller, available for Linux guest operating systems only, supports USB 3.0 superspeed, 2.0, and 1.1 devices. The EHCI+UHCI controller supports USB 2.0 and 1.1 devices. The conditions for adding a controller vary, depending on the device version, the type of passthrough (host or client computer), and the guest operating system.
USB Controller Support
- EHCI+UHCI: supported USB device versions 2.0 and 1.1; passthrough from ESXi host to VM: Yes; passthrough from client computer to VM: Yes.
- xHCI: supported USB device versions 3.0, 2.0, and 1.1; passthrough from ESXi host to VM: Yes (USB 2.0 and 1.1 devices only); passthrough from client computer to VM: Yes (Linux guests only).
For Mac OS X systems, the EHCI+UHCI controller is enabled by default and is required for USB mouse and keyboard access.
For virtual machines with Linux guests, you can add one or both controllers, but 3.0 superspeed devices are not supported for passthrough from an ESXi host to a virtual machine. You cannot add two controllers of the same type.
For USB passthrough from an ESXi host to a virtual machine, the USB arbitrator can monitor a maximum of 15 USB controllers. If your system includes controllers that exceed the 15 controller limit and you connect USB devices to them, the devices are not available to the virtual machine.
Prerequisites
- ESXi hosts must have USB controller hardware and modules that support USB 2.0 and 1.1 devices present.
- Client computers must have USB controller hardware and modules that support USB 3.0, 2.0, and 1.1 devices present.
- To use the xHCI controller on a Linux guest, ensure that the Linux kernel version is 2.6.35 or later.
- Verify that the virtual machine is powered on.
Required Privilege (ESXi host passthrough): Virtual Machine.Configuration.Add or Remove Device
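A scripted equivalent of adding the two controller types is sketched below with pyVmomi, assuming an existing vim.VirtualMachine object named vm. Keep the guest restrictions above in mind: the xHCI controller is useful only for Linux guests with a 2.6.35 or later kernel, and you cannot add two controllers of the same type.

from pyVmomi import vim

# EHCI+UHCI (USB 2.0/1.1) controller.
usb2 = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
    device=vim.vm.device.VirtualUSBController(autoConnectDevices=True, ehciEnabled=True))

# xHCI controller, used for USB 3.0 passthrough from a client computer (Linux guests only).
usb3 = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
    device=vim.vm.device.VirtualUSBXHCIController(autoConnectDevices=True))

vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[usb2, usb3]))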
9.6.7 How USB Device Passthrough Technology Works
The USB controller is the USB hardware chip that provides USB function to the USB ports that it manages. USB controller hardware and modules that support USB 3.0, 2.0, and USB 1.1 devices must exist in the virtual machine. Two USB controllers are available for each virtual machine. The controllers support multiple USB 3.0, 2.0, and 1.1 devices. The controller must be present before you can add USB devices to the virtual machine.
You can add up to 20 USB devices to a virtual machine. This is the maximum number of devices supported for simultaneous connection to one virtual machine. You can add multiple devices to a virtual machine, but only one at a time. The virtual machine retains its connection to the device while in S1 standby. USB device connections are preserved when you migrate virtual machines to another host in the datacenter.
A USB device is available to only one powered-on virtual machine at a time. When a virtual machine connects to a device, that device is no longer available to other virtual machines or to the client computer. When you disconnect the device from the virtual machine or shut the virtual machine down, the device returns to the client computer and becomes available to other virtual machines that the client computer manages.
For example, when you connect a USB mass storage device to a virtual machine, it is removed from the client computer and does not appear as a drive with a removable device. When you disconnect the device from the virtual machine, it reconnects to the client computer's operating system and is listed as a removable device.

9.6.8 USB 3.0 Device Limitations
USB 3.0 devices have the following requirements and limitations:
- The virtual machine that you connect the USB 3.0 device to must be configured with an xHCI controller and have a Linux guest operating system with a 2.6.35 or later kernel.
- You can connect only one USB 3.0 device operating at superspeed to a virtual machine at a time.
- USB 3.0 devices are available only for passthrough from a client computer to a virtual machine. They are not available for passthrough from an ESXi host to a virtual machine.

9.6.9 Avoiding Data Loss
Before you connect a device to a virtual machine, make sure the device is not in use on the client computer. If the vSphere Client disconnects from the vCenter Server or host, or if you restart or shut down the client computer, the device connection breaks. It is best to have a dedicated client computer for USB device use or to reserve USB devices connected to a client computer for short-term use, such as updating software or adding patches to virtual machines. To maintain USB device connections to a virtual machine for an extended time, use USB passthrough from an ESXi host to the virtual machine.

9.7 Configure Fibre Channel NPIV Settings in the vSphere Web Client
N-port ID virtualization (NPIV) provides the ability to share a single physical Fibre Channel HBA port among multiple virtual ports, each with unique identifiers. This capability lets you control virtual machine access to LUNs on a per-virtual machine basis.
Each virtual port is identified by a pair of world wide names (WWNs): a world wide port name (WWPN) and a world wide node name (WWNN). These WWNs are assigned by vCenter Server. For detailed information on how to configure NPIV for a virtual machine, see vSphere Storage.
NPIV support is subject to the following limitations:
- NPIV must be enabled on the SAN switch. Contact the switch vendor for information about enabling NPIV on their devices.
- NPIV is supported only for virtual machines with RDM disks. Virtual machines with regular virtual disks continue to use the WWNs of the host's physical HBAs.
- The physical HBAs on the ESXi host must have access to a LUN using its WWNs in order for any virtual machines on that host to have access to that LUN using their NPIV WWNs. Ensure that access is provided to both the host and the virtual machines.
- The physical HBAs on the ESXi host must support NPIV. If the physical HBAs do not support NPIV, the virtual machines running on that host will fall back to using the WWNs of the host's physical HBAs for LUN access.
- Each virtual machine can have up to 4 virtual ports. NPIV-enabled virtual machines are assigned exactly 4 NPIV-related WWNs, which are used to communicate with physical HBAs through virtual ports. Therefore, virtual machines can utilize up to 4 physical HBAs for NPIV purposes.
Prerequisites
- To edit the virtual machine's WWNs, power off the virtual machine.
- Verify that the virtual machine has a datastore containing a LUN that is available to the host.

ESXi Hosts and Compatible Virtual Machine Hardware Versions
- ESXi 5.0: Version 8 - create, edit, run; Version 7 - create, edit, run; Version 4 - edit, run. Compatible with vCenter Server 5.0.
- ESX/ESXi 4.x: Version 8 - not supported; Version 7 - create, edit, run; Version 4 - create, edit, run. Compatible with vCenter Server 4.x.
- ESX Server 3.x: Version 8 - not supported; Version 7 - not supported; Version 4 - create, edit, run. Compatible with VirtualCenter Server 2.x and later.
Version 3 virtual machines are not supported on ESXi 5.0 hosts. To make full use of these virtual machines, upgrade the virtual hardware.
Note: Virtual machine hardware version 4 might be listed as VM3 in documentation for earlier versions of ESX/ESXi.
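Where the note above calls for a virtual hardware upgrade, the operation can be scripted. The sketch below assumes an existing, powered-off vim.VirtualMachine object named vm; 'vmx-08' is an example target (hardware version 8, ESXi 5.0), and omitting the argument upgrades to the newest version the host supports. Take a backup or snapshot first, because a hardware upgrade cannot be reversed.

from pyVim.task import WaitForTask

# vm is assumed to be an existing, powered-off vim.VirtualMachine.
WaitForTask(vm.UpgradeVM_Task(version='vmx-08'))   # omit version= for the latest supported version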
9.8 Managing Multi-Tiered Applications with vSphere vApp in the vSphere Web Client
You can use VMware vSphere as a platform for running applications, in addition to using it as a platform for running virtual machines. The applications can be packaged to run directly on top of VMware vSphere. The format of how the applications are packaged and managed is called vSphere vApp.
A vApp is a container, like a resource pool, and can contain one or more virtual machines. A vApp also shares some functionality with virtual machines. A vApp can power on and power off, and can also be cloned. Each vApp has a specific summary page with the current status of the service and relevant summary information, as well as operations on the service. The distribution format for vApp is OVF.
Note: The vApp metadata resides in the vCenter Server's database, so a vApp can be distributed across multiple ESXi hosts. This information can be lost if the vCenter Server database is cleared or if a standalone ESXi host that contains a vApp is removed from vCenter Server. You should back up vApps to an OVF package to avoid losing any metadata.
vApp metadata for virtual machines within vApps does not follow the snapshot semantics for virtual machine configuration. So, vApp properties that are deleted, modified, or defined after a snapshot is taken remain intact (deleted, modified, or defined) after the virtual machine reverts to that snapshot or any prior snapshots.
You can use VMware Studio to automate the creation of ready-to-deploy vApps with prepopulated application software and operating systems. VMware Studio adds a network agent to the guest so that vApps bootstrap with minimal effort. Configuration parameters specified for vApps appear as OVF properties in the vCenter Server deployment wizard. For information about VMware Studio and for download, see the VMware Studio developer page on the VMware web site.
You can allocate CPU and memory resources for the new vApp using shares, reservations, and limits.
Procedure
Allocate CPU resources for this vApp:
- Shares: CPU shares for this vApp with respect to the parent's total. Sibling vApps share resources according to their relative share values, bounded by the reservation and limit. Select Low, Normal, or High, which specify share values respectively in a 1:2:4 ratio. Select Custom to give each vApp a specific number of shares, which express a proportional weight.
- Reservation: Guaranteed CPU allocation for this vApp.
- Reservation Type: Select the Expandable check box to make the reservation expandable. When the vApp is powered on, if the combined reservations of its virtual machines are larger than the reservation of the vApp, the vApp can use resources from its parent or ancestors.
- Limit: Upper limit for this vApp's CPU allocation. Select Unlimited to specify no upper limit.
Allocate memory resources for this vApp:
- Shares: Memory shares for this vApp with respect to the parent's total. Sibling vApps share resources according to their relative share values, bounded by the reservation and limit. Select Low, Normal, or High, which specify share values respectively in a 1:2:4 ratio. Select Custom to give each vApp a specific number of shares, which express a proportional weight.
- Reservation: Guaranteed memory allocation for this vApp.
- Reservation Type: Select the Expandable check box to make the reservation expandable. When the vApp is powered on, if the combined reservations of its virtual machines are larger than the reservation of the vApp, the vApp can use resources from its parent or ancestors.
- Limit: Upper limit for this vApp's memory allocation. Select Unlimited to specify no upper limit.
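Because a vApp is a specialised resource pool, the CPU and memory allocation described above can also be applied through the API. The following pyVmomi sketch is an assumption-laden illustration, not a definitive implementation: vapp is assumed to be an existing vim.VirtualApp object retrieved the same way as the virtual machine in the earlier sketch, and all values simply mirror the Web Client fields.

from pyVmomi import vim

spec = vim.ResourceConfigSpec(
    cpuAllocation=vim.ResourceAllocationInfo(
        reservation=1000,                 # MHz guaranteed to the vApp
        expandableReservation=True,       # the "Expandable" check box
        limit=-1,                         # -1 = Unlimited
        shares=vim.SharesInfo(level=vim.SharesInfo.Level.normal, shares=4000)),
    memoryAllocation=vim.ResourceAllocationInfo(
        reservation=2048,                 # MB guaranteed to the vApp
        expandableReservation=True,
        limit=4096,
        shares=vim.SharesInfo(level=vim.SharesInfo.Level.high, shares=8000)))
        # the numeric shares value is only honoured when the level is custom

vapp.UpdateConfig(name=None, config=spec)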
9.9 vCenter Solutions Manager
9.9.1 Monitoring Agents
The vCenter Solutions Manager displays the vSphere ESX Agent Manager agents that you use to deploy and manage related agents on ESX hosts. An administrator uses the solutions manager to keep track of whether a solution's agents are working as expected. Outstanding issues are reflected by the solution's ESX Agent Manager status and a list of issues.
When a solution's state changes, the solutions manager updates the ESX Agent Manager's summary status and state. Administrators use this status to track whether the goal state is reached.
The agency's health status is indicated by a specific color:
- Red. The solution must intervene for the ESX Agent Manager to proceed. For example, if a virtual machine agent is powered off manually on a compute resource and the ESX Agent Manager does not attempt to power on the agent, the ESX Agent Manager reports this action to the solution, and the solution alerts the administrator to power on the agent.
- Yellow. The ESX Agent Manager is actively working to reach a goal state. The goal state can be enabled, disabled, or uninstalled. For example, when a solution is registered, its status is yellow until the ESX Agent Manager deploys the solution's agents to all the specified compute resources. A solution does not need to intervene when the ESX Agent Manager reports its health status as yellow.
- Green. A solution and all its agents reached the goal state.

9.10 Monitoring vServices
A vService is a service or function that a solution provides to virtual machines and vApps. A solution can provide one or more vServices. These vServices integrate with the platform and are able to change the environment in which the vApp or virtual machine runs.
A vService is a type of service for a virtual machine and a vApp provided by a vCenter extension. Virtual machines and vApps can have dependencies on vServices. Each dependency is associated with a vService type. The vService type must be bound to a particular vCenter extension that implements that vService type. This vService type is similar to a virtual hardware device. For example, a virtual machine can have a networking device that at deployment must be connected to a particular network.
The vService Manager allows a solution to connect to operations related to OVF templates:
- Importing OVF templates. Receive a callback when OVF templates with a vService dependency of a certain type are imported.
- Exporting OVF templates. Inserts OVF sections when a virtual machine is exported.
- OVF environment generation. Inserts OVF sections into the OVF environment at power on.
The vService Provider tab in the solution manager provides details for each vCenter extension. This information allows you to monitor vService providers and list the virtual machines or vApps to which they are bound.

9.10.1 Install the Client Integration Plug-In in the vSphere Web Client
The Client Integration Plug-in provides access to a virtual machine's console in the vSphere Web Client, and provides access to other vSphere infrastructure tasks. You use the Client Integration Plug-in to deploy OVF or OVA templates and transfer files with the datastore browser. You can also use the Client Integration Plug-in to connect virtual devices that reside on a client computer to a virtual machine.
You install the Client Integration Plug-in only once to connect virtual devices to virtual machines that you access through an instance of the vSphere Web Client. You must restart the browser after you install the plug-in.
If you install the Client Integration Plug-in from an Internet Explorer browser, you must first disable Protected Mode. Internet Explorer identifies the Client Integration Plug-in as being on the Internet instead of on the local intranet. In such cases, the plug-in does not install correctly because Protected Mode is enabled for the Internet.
The Client Integration Plug-in also enables you to log in to the vSphere Web Client using Windows session credentials.
For information about supported browsers and operating systems, see the vSphere Installation and Setup documentation.

9.11 Using Snapshots To Manage Virtual Machines
With snapshots, you can preserve a baseline before diverging a virtual machine in the snapshot tree.
The Snapshot Manager in the vSphere Web Client and the vSphere Client provides several operations for creating and managing virtual machine snapshots and snapshot trees. These operations let you create snapshots, restore any snapshot in the snapshot hierarchy, delete snapshots, and more. You can create extensive snapshot trees that you can use to save the virtual machine state at any specific time and restore the virtual machine state later. Each branch in a snapshot tree can have up to 32 snapshots.
A snapshot preserves the following information:
- Virtual machine settings. The virtual machine directory, which includes disks that were added or changed after you took the snapshot.
- Power state. The virtual machine can be powered on, powered off, or suspended.
- Disk state. State of all the virtual machine's virtual disks.
- (Optional) Memory state.
The contents of the virtual machine's memory. 9.11.1 The Snapshot Hierarchy The Snapshot Manager presents the snapshot hierarchy as a tree with one or more branches. The relationship between snapshots is like that of a parent to a child. In the linear process, each snapshot has one parent snapshot and one child snapshot, except for the last snapshot, which has no child snapshots. Each parent snapshot can have more than one child. You can revert to the current parent snapshot or restore any parent or child snapshot in the snapshot tree and create more snapshots from that snapshot. Each time you restore a snapshot and take another snapshot, a branch, or child snapshot, is created. Parent Snapshots The first virtual machine snapshot that you create is the base parent snapshot. The parent snapshot is the most recently saved version of the current state of the virtual machine. Taking a snapshot creates a delta disk file for each disk attached to the virtual machine and optionally, a memory file. The delta disk files and memory file are stored with the base .vmdk file. The parent snapshot is always the snapshot that appears immediately above the You are here icon in the Snapshot Manager. If you revert or restore a snapshot, that snapshot becomes the parent of the You are here current state. Note The parent snapshot is not always the snapshot that you took most recently. Child Snapshots A snapshot that is taken of the same virtual machine after the parent snapshot. Each child constitutes delta files for each attached virtual disk, and optionally a memory file that points from the present state of the virtual disk (You are here). Each child snapshot's delta files merge with each previous child snapshot until reaching the parent disks. A child disk can later be a parent disk for future child disks. The relationship of parent and child snapshots can change if you have multiple branches in the snapshot tree. A parent snapshot can have more than one child. Many snapshots have no children. Important Do not manually manipulate individual child disks or any of the snapshot configuration files because doing so can compromise the snapshot tree and result in data loss. This restriction includes disk resizing and making modifications to the base parent disk using vmkfstools. 9.11.2 Snapshot Behavior Taking a snapshot preserves the disk state at a specific time by creating a series of delta disks for each attached virtual disk or virtual RDM and optionally preserves the memory and power state by creating a memory file. Taking a snapshot creates a snapshot object in the Snapshot Manager that represents the virtual machine state and settings. Each snapshot creates an additional delta .vmdk disk file. When you take a snapshot, the snapshot mechanism prevents the guest operating system from writing to the base .vmdk file and instead directs all writes to the delta disk file. The delta disk represents the difference between the current state of the virtual disk and the state that existed at the time that you took the previous snapshot. If more than one snapshot exists, delta disks can represent the difference between each snapshot. Delta disk files can expand quickly and become as large as the entire virtual disk if the guest operating system writes to every block of the virtual disk. 9.11.3 Snapshot Files When you take a snapshot, you capture the state of the virtual machine settings and the virtual disk. If you are taking a memory snapshot, you also capture the memory state of the virtual machine. 
These states are saved to files that reside with the virtual machine's base files.
9.11.3.1 Snapshot Files
A snapshot consists of files that are stored on a supported storage device. A Take Snapshot operation creates .vmdk, -delta.vmdk, .vmsd, and .vmsn files. By default, the first and all delta disks are stored with the base .vmdk file. The .vmsd and .vmsn files are stored in the virtual machine directory.
- Delta disk files: A .vmdk file to which the guest operating system can write. The delta disk represents the difference between the current state of the virtual disk and the state that existed at the time that the previous snapshot was taken. When you take a snapshot, the state of the virtual disk is preserved, which prevents the guest operating system from writing to it, and a delta or child disk is created. A delta disk has two files: a descriptor file that is small and contains information about the virtual disk, such as geometry and child-parent relationship information, and a corresponding file that contains the raw data.
  Note: If you are looking at a datastore with the Datastore Browser in the vSphere Client, you see only one entry to represent both files.
  The files that make up the delta disk are referred to as child disks or redo logs. A child disk is a sparse disk. Sparse disks use the copy-on-write mechanism, in which the virtual disk contains no data in places, until copied there by a write operation. This optimization saves storage space. A grain is the unit of measure in which the sparse disk uses the copy-on-write mechanism. Each grain is a block of sectors that contain virtual disk data. The default size is 128 sectors or 64KB.
- Flat file: A -flat.vmdk file that is one of two files that comprise the base disk. The flat disk contains the raw data for the base disk. This file does not appear as a separate file in the Datastore Browser.
- Database file: A .vmsd file that contains the virtual machine's snapshot information and is the primary source of information for the Snapshot Manager. This file contains line entries, which define the relationships between snapshots and between child disks for each snapshot.
- Memory file: A .vmsn file that includes the active state of the virtual machine. Capturing the memory state of the virtual machine lets you revert to a turned-on virtual machine state. With nonmemory snapshots, you can only revert to a turned-off virtual machine state. Memory snapshots take longer to create than nonmemory snapshots. The time the ESX host takes to write the memory onto the disk is relative to the amount of memory the virtual machine is configured to use.
A Take Snapshot operation creates the following files:
- vmname-number.vmdk and vmname-number-delta.vmdk: Snapshot file that represents the difference between the current state of the virtual disk and the state that existed at the time the previous snapshot was taken. The filename uses the following syntax, S1vm-000001.vmdk, where S1vm is the name of the virtual machine and the six-digit number, 000001, is based on the files that already exist in the directory. The number does not consider the number of disks that are attached to the virtual machine.
- vmname.vmsd: Database of the virtual machine's snapshot information and the primary source of information for the Snapshot Manager.
- vmname.Snapshotnumber.vmsn: Memory state of the virtual machine at the time you take the snapshot. The file name uses the following syntax, S1vm.snapshot1.vmsn, where S1vm is the virtual machine name, and snapshot1 is the first snapshot.
  Note: A .vmsn file is created each time you take a snapshot, regardless of the memory selection. A .vmsn file without memory is much smaller than one with memory.
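To see these files being created, the snapshot operation can be driven through the API. The sketch below is illustrative only: it assumes an existing vim.VirtualMachine object named vm, uses placeholder snapshot text, and then prints the snapshot-related entries reported in the virtual machine's file layout.

from pyVim.task import WaitForTask

WaitForTask(vm.CreateSnapshot_Task(
    name='pre-patch',
    description='Before applying guest OS patches',
    memory=True,        # also writes a memory image so you can revert to a running state
    quiesce=False))     # quiesce=True instead uses VMware Tools to quiesce the guest file system

# List snapshot-related files and disk extents now present in the VM's layout.
for f in vm.layoutEx.file:
    if f.type.startswith('snapshot') or f.type in ('diskDescriptor', 'diskExtent'):
        print(f.type, f.name, f.size)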
9.11.4 Snapshot Limitations
Snapshots can affect virtual machine performance and do not support some disk types or virtual machines configured with bus sharing. Snapshots are useful as short-term solutions for capturing point-in-time virtual machine states and are not appropriate for long-term virtual machine backups.
- VMware does not support snapshots of raw disks, RDM physical mode disks, or guest operating systems that use an iSCSI initiator in the guest.
- Virtual machines with independent disks must be powered off before you take a snapshot. Snapshots of powered-on or suspended virtual machines with independent disks are not supported.
- Snapshots are not supported with PCI vSphere DirectPath I/O devices.
- VMware does not support snapshots of virtual machines configured for bus sharing. If you require bus sharing, consider running backup software in your guest operating system as an alternative solution. If your virtual machine currently has snapshots that prevent you from configuring bus sharing, delete (consolidate) the snapshots.
- Snapshots provide a point-in-time image of the disk that backup solutions can use, but snapshots are not meant to be a robust method of backup and recovery. If the files containing a virtual machine are lost, its snapshot files are also lost. Also, large numbers of snapshots are difficult to manage, consume large amounts of disk space, and are not protected in the case of hardware failure.
- Snapshots can negatively affect the performance of a virtual machine. Performance degradation is based on how long the snapshot or snapshot tree is in place, the depth of the tree, and how much the virtual machine and its guest operating system have changed from the time you took the snapshot. Also, you might see a delay in the amount of time it takes the virtual machine to power on. Do not run production virtual machines from snapshots on a permanent basis.

9.11.5 Managing Snapshots
You can review all snapshots for the active virtual machine and act on them by using the Snapshot Manager.
After you take a snapshot, you can use the Revert to current snapshot command from the virtual machine's right-click menu to restore that snapshot at any time. If you have a series of snapshots, you can use the Go to command in the Snapshot Manager to restore any parent or child snapshot. Subsequent child snapshots that you take from the restored snapshot create a branch in the snapshot tree. You can delete a snapshot from the tree in the Snapshot Manager.
The Snapshot Manager window contains the following areas: Snapshot tree, Details region, command buttons, Navigation region, and a You are here icon.
- Snapshot tree: Displays all snapshots for the virtual machine.
- You are here icon: Represents the current and active state of the virtual machine. The You are here icon is always selected and visible when you open the Snapshot Manager. You can select the You are here state to see how much space the node is using. Go to, Delete, and Delete all are disabled for the You are here state.
- Go to, Delete, and Delete All: Snapshot options.
- Details: Shows the snapshot name and description, the date you created the snapshot, and the disk space. The Console shows the power state of the virtual machine when a snapshot was taken. The Name, Description, and Created text boxes are blank if you do not select a snapshot.
- Navigation: Contains buttons for navigating out of the dialog box. Close closes the Snapshot Manager. The question mark icon opens the help system.
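The Snapshot Manager actions described above have API counterparts. The following pyVmomi sketch assumes an existing vim.VirtualMachine object named vm with at least one snapshot and uses the placeholder snapshot name 'pre-patch'; it demonstrates the individual calls rather than a recommended workflow.

from pyVim.task import WaitForTask

# "Revert to current snapshot": go back to the parent of the You are here state.
WaitForTask(vm.RevertToCurrentSnapshot_Task())

# "Go to": find a named snapshot anywhere in the tree and revert to it.
def find_snapshot(nodes, name):
    for node in nodes:
        if node.name == name:
            return node.snapshot
        found = find_snapshot(node.childSnapshotList, name)
        if found:
            return found
    return None

snap = find_snapshot(vm.snapshot.rootSnapshotList, 'pre-patch')   # placeholder name
if snap:
    WaitForTask(snap.RevertToSnapshot_Task())
    # "Delete": consolidates this snapshot's data; child snapshots are kept.
    WaitForTask(snap.RemoveSnapshot_Task(removeChildren=False))

# "Delete All": consolidates every snapshot in the tree.
WaitForTask(vm.RemoveAllSnapshots_Task())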
9.12 Change Disk Mode to Exclude Virtual Disks from Snapshots in the vSphere Web Client
You can set a virtual disk to independent mode to exclude the disk from any snapshots taken of its virtual machine.
Prerequisites
Power off the virtual machine and delete any existing snapshots before you change the disk mode. Deleting a snapshot involves committing the existing data on the snapshot disk to the parent disk.
Required privileges:
- Virtual machine.State.Remove Snapshot
- Virtual machine.Configuration.Modify device settings
Procedure
1. Right-click the virtual machine and select Edit Settings.
   a. To locate a virtual machine, select a datacenter, folder, cluster, resource pool, host, or vApp.
   b. Click the Related Objects tab and click Virtual Machines.
2. On the Virtual Hardware tab, expand Hard disk, and select an independent disk mode option (a scripted equivalent is sketched at the end of this section):
   - Independent Persistent: Disks in persistent mode behave like conventional disks on your physical computer. All data written to a disk in persistent mode is written permanently to the disk.
   - Independent Nonpersistent: Changes to disks in nonpersistent mode are discarded when you power off or reset the virtual machine. With nonpersistent mode, you can restart the virtual machine with a virtual disk in the same state every time. Changes to the disk are written to and read from a redo log file that is deleted when you power off or reset.
3. Click OK.

Revert to Snapshot
Restoring snapshots has the following effects:
- The current disk and memory states are discarded, and the virtual machine reverts to the disk and memory states of the parent snapshot.
- Existing snapshots are not removed. You can restore those snapshots at any time.
- If the snapshot includes the memory state, the virtual machine will be in the same power state as when you created the snapshot.
Virtual Machine Power State After Restoring a Snapshot
- Parent snapshot taken while powered on (includes memory): reverts to the parent snapshot, and the virtual machine is powered on and running.
- Parent snapshot taken while powered on (does not include memory): reverts to the parent snapshot, and the virtual machine is powered off.
- Parent snapshot taken while powered off (does not include memory): reverts to the parent snapshot, and the virtual machine is powered off.
Virtual machines running certain kinds of workloads can take several minutes to resume responsiveness after reverting from a snapshot.
Note: vApp metadata for virtual machines in vApps does not follow the snapshot semantics for virtual machine configuration. vApp properties that are deleted, modified, or defined after a snapshot is taken remain intact (deleted, modified, or defined) after the virtual machine reverts to that snapshot or any previous snapshots.
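The scripted form of the disk mode change in section 9.12, referenced above, is sketched here with pyVmomi. It assumes an existing, powered-off vim.VirtualMachine object named vm with no snapshots, and switches the first hard disk it finds to independent persistent mode; it is an illustration under those assumptions rather than a supported procedure.

from pyVim.task import WaitForTask
from pyVmomi import vim

disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))
disk.backing.diskMode = 'independent_persistent'   # or 'independent_nonpersistent'

change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=disk)
WaitForTask(vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change])))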