White Paper

Cisco Solution for EMC VSPEX with ScaleIO Platform on Cisco UCS and VMware vSphere 5.5

Shivakumar Shastri
December 2014

This white paper describes the EMC VSPEX virtualized infrastructure solution on Cisco UCS, EMC ScaleIO, and VMware vSphere 5.5.

Acknowledgments

The author would like to acknowledge the following for their support and contributions:
Rajmohan Rajanayagam, Principal Solutions Engineer, EMC
Tripp Bridges, Solutions Engineer, EMC

© 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.

Contents

Introduction
  Executive Summary
  Business Objectives
  Target Audience
Solution Overview
  Architecture
  Key Components
  High Availability
Solution Details
  Virtualization
  Compute Layer
  Network Layer
  ScaleIO Software
Sizing
  Reference Virtual Machine and Workload
  Scale Out
  Validated Building Blocks
  Configuration Guidelines
Deployment
Test and Validation
  Post-Installation Checklist
  Failure Testing
  Monitoring
References

Introduction

This document provides guidance on the technical aspects of an integrated infrastructure solution using EMC ScaleIO software-defined storage on Cisco Unified Computing System™ (Cisco UCS®) and VMware through the VSPEX program. VSPEX provides modular solutions built with technologies that enable faster deployment, greater simplicity, greater choice, higher efficiency, and lower risk. This white paper presents a complete system architecture capable of supporting virtual machines with a fault-tolerant server and network topology and highly available ScaleIO software.

Executive Summary

An integrated infrastructure brings together disparate compute, network, and storage products into a cohesive, prevalidated solution. In most cases, this approach eliminates compatibility and deployment hurdles but does not address operational inefficiencies stemming from complex features of the underlying components. One approach to simplifying management of such an integrated stack is to introduce an orchestration layer for automation. Hyperconvergence presents a simpler and more granular approach to integrating widely available processing nodes, such as rack servers, without introducing operational complexity.
This method allows for a more flexible, agile, and efficient infrastructure in which capacity can keep pace with demand at a lower price point. ScaleIO presents a hypervisor-agnostic software-defined approach to serving up compute and storage resources from a cluster of nodes for consumption by workloads within the cluster. The use of Cisco UCS servers with Cisco UCS Manager adds to operational efficiency through a single pane for firmware and console management of cluster servers. Further, the architecture with Cisco UCS servers and fabric interconnects is fault tolerant and provides local switching for traffic between servers within the same domain.

Business Objectives

Business applications are moving into consolidated compute, network, and storage environments. The EMC ScaleIO platform on Cisco UCS with VMware, delivered as a VSPEX proven infrastructure, reduces the complexity of configuring every component of a traditional deployment model. The solution simplifies integration management while maintaining application design and implementation options. It also provides unified administration while enabling adequate control and monitoring of process separation. The business benefits of this VSPEX architecture include:

● An end-to-end virtualization solution that effectively uses the capabilities of the unified infrastructure components
● A proven VSPEX solution on Cisco UCS with VMware for efficiently virtualizing compute resources for varied customer use cases
● A reliable, flexible, and scalable reference design

Target Audience

Readers of this document must have the necessary training and background to install and configure VMware vSphere 5.5 and the associated infrastructure, including the Cisco UCS server platform, as required by this implementation. External references are provided where applicable, and readers should be familiar with these documents.
Readers should also be familiar with the infrastructure and database security policies of the customer installation. Guidance is provided on sizing, configuration, and validation, with references for further reading.

Solution Overview

Architecture

The following is an overview of the VSPEX proven infrastructure platform and the key technologies used in the solution. The solution has been designed to provide virtualization, server, network, and storage resources, giving customers the ability to start with a right-sized deployment and scale as business demand grows.

Physical and Logical Architecture

The VSPEX solution for VMware vSphere with EMC ScaleIO validates the configuration for a specified number of virtual machines. System configuration determines the capacity of each node and hence the overall cluster workload. The sections that follow show that disk capacity, more than processing, I/O operations per second (IOPS), or memory, is the limiting factor in the number of virtual instances supported.

Note: VSPEX uses a reference workload to describe and define a virtual machine. Therefore, one physical or virtual server in an existing environment may not be equal to one virtual machine in a VSPEX solution. Evaluate your workload in terms of the reference to arrive at an appropriate point of scale.

The solution uses EMC ScaleIO software and VMware vSphere 5.5 on a cluster of Cisco UCS C240 M3 Rack Servers to provide the storage and virtualization platform. The clusters tested include a minimum configuration with three nodes and another consisting of seven nodes. The workload consists of Microsoft Windows Server 2012 virtual machines. The ScaleIO cluster is interconnected by a pair of low-latency 10 Gigabit Ethernet switches, allowing the ScaleIO Data Client (SDC) on each processing node to access storage served up by the remote ScaleIO Data Servers (SDS).
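As a rough sizing sketch, the usable capacity of such a cluster can be estimated from the node count and disk configuration. The sketch below assumes ScaleIO's two-copy (mirrored) data protection and a spare reserve equivalent to one node's raw capacity for rebuilds; the exact figures depend on the spare-capacity policy actually configured:

```python
def usable_capacity_gb(nodes, disks_per_node, disk_gb, spare_nodes=1):
    """Estimate usable ScaleIO capacity in GB (illustrative sketch).

    Assumes two-copy mirroring (raw capacity halves) and a spare reserve
    equal to `spare_nodes` worth of raw capacity, kept free for rebuilds.
    """
    raw = nodes * disks_per_node * disk_gb
    spare = spare_nodes * disks_per_node * disk_gb
    return (raw - spare) / 2

# Config 1: 3 nodes, each with 6 x 600-GB disks
print(usable_capacity_gb(3, 6, 600))   # 3600.0 GB
# Config 2: 7 nodes, each with 9 x 900-GB disks
print(usable_capacity_gb(7, 9, 900))   # 24300.0 GB
```

The halving for mirroring is why disk capacity, rather than CPU or memory, tends to be the first limit reached.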
Figure 1 shows a basic configuration of a ScaleIO cluster consisting of a primary and secondary Metadata Manager (MDM) and the tie-breaker node with access requirements.

Figure 1. Basic Configuration of a ScaleIO Cluster

Figures 2 and 3 depict the configurations validated.

Option 1: Cisco UCS C240 M3 Rack Servers with Cisco Nexus 5548 10 Gigabit Ethernet Switches (Figure 2)

This is the basic method of providing low-latency switching for the cluster. The Cisco Nexus 5548 Switches provide redundant 10 Gigabit Ethernet low-latency connectivity between all the nodes within the ScaleIO cluster. External user access and access to enterprise services such as Active Directory and DNS are through separate VLANs set up on the Cisco Nexus 5548 Switches. While these switches provide sufficient bandwidth with the least amount of latency, a separate management network is still needed for console access to the servers.

Figure 2. Option 1: Cisco UCS C240 M3 Rack Servers with Cisco Nexus 5548 Switches

Option 2: Cisco UCS C240 M3 Rack Servers with Cisco UCS 6248UP Fabric Interconnects (Figure 3)

The Cisco UCS C240 servers serving as ScaleIO cluster nodes can be connected through the Cisco UCS Virtual Interface Card (VIC) 1225 to the Cisco UCS 6248UP 48-Port Fabric Interconnects to form a Cisco UCS domain. This method allows for a converged fabric, eliminating additional cabling for cluster management compared with option 1. The arrangement also introduces the features and functionality of Cisco UCS Manager, which allows for single-pane console, firmware, and LAN/SAN management of the servers within the ScaleIO cluster.

Figure 3.
Option 2: Cisco UCS C240 M3 Rack Servers with Cisco UCS 6248UP 48-Port Fabric Interconnects

Note: Please go to the “Deployment” section, later in this white paper, for details on configuration.

Key Components

This architecture includes the following key components:

● VMware vSphere 5.5: Provides a common virtualization layer to host a server environment. vSphere 5.5 provides highly available infrastructure through features such as:
◦ vMotion: Provides live migration of virtual machines within a virtual infrastructure cluster, with no virtual machine downtime or service disruption
◦ Storage vMotion: Provides live migration of virtual machine disk files within and across storage arrays, with no virtual machine downtime or service disruption
◦ vSphere High Availability: Detects and provides rapid recovery for a failed virtual machine in a cluster
◦ Distributed Resource Scheduler (DRS): Provides load balancing of computing capacity in a cluster
◦ Storage Distributed Resource Scheduler (SDRS): Provides load balancing across multiple datastores based on space usage and I/O latency
● VMware vCenter Server: Provides a scalable and extensible platform that forms the foundation for virtualization management of the VMware vSphere cluster. vCenter manages all vSphere hosts and their virtual machines.
● Microsoft SQL Server: VMware vCenter Server requires a database service to store configuration and monitoring details. This solution uses a Microsoft SQL Server 2012 database.
● Shared infrastructure: Adds DNS (name resolution) and authentication and authorization services, such as Active Directory, using existing infrastructure or set up as part of the new virtual infrastructure.
● Cluster network: A 10 Gigabit Ethernet network with either a pair of redundant Cisco Nexus 5548 Switches or a pair of Cisco UCS 6248UP fabric interconnects with VIC 1225 adapters in the C240 rack servers, carrying both management and storage traffic. A shared IP network carries user and management traffic into the cluster.
● EMC ScaleIO: EMC ScaleIO software creates a server-based SAN from local server storage to deliver elastic, scalable performance and capacity on demand.

Hardware Resources

Table 1 lists the hardware used in this solution.

Table 1. Hardware Components

VMware vSphere servers: Cisco UCS C240 M3 Rack Servers
- Config 1: 3 nodes, each with 64 GB memory and 6 x 600-GB 10,000-rpm SAS disks
- Config 2: 7 nodes, each with 128 GB memory and 9 x 900-GB 10,000-rpm SAS disks

CPU: 1 vCPU per virtual machine; maximum of 4 vCPUs per physical core*

Memory: 2 GB RAM per virtual machine; 2 GB RAM for each physical server for the hypervisor; 3 GB RAM reservation for each ScaleIO virtual machine (SVM)

ScaleIO virtual machine: 3 GB RAM and 2 vCPUs for each ScaleIO SVM

Network: 2 x 10 Gigabit Ethernet NICs per server

Network infrastructure: 2 x Cisco Nexus 5548 Switches or 2 x Cisco UCS 6248UP fabric interconnects; 2 x 10 Gigabit Ethernet ports per VMware vSphere server

Shared infrastructure: In most cases, the customer environment will already have infrastructure services such as Active Directory and DNS configured. The setup of these services is beyond the scope of this document. If implemented without existing infrastructure, the minimum requirements are:
- 2 physical servers
- 16 GB RAM per server
- 4 processor cores per server
- 2 x 1 Gigabit Ethernet ports per server

*For Intel® Xeon® processors based on the Ivy Bridge microarchitecture or later, use 8 virtual CPUs per physical core.
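The per-virtual-machine figures in Table 1 translate directly into a per-node density bound. The sketch below applies the 4:1 vCPU ratio and the memory reservations above; the 16 physical cores per node (two 8-core sockets) is an assumed figure for illustration, since Table 1 does not list core counts:

```python
def max_vms_per_node(node_ram_gb, cores, vcpu_ratio=4,
                     vm_ram_gb=2, hypervisor_gb=2, svm_gb=3, svm_vcpus=2):
    """Reference-VM density bound for one node (illustrative sketch)."""
    # Memory left after the hypervisor and the ScaleIO SVM reservation.
    mem_bound = (node_ram_gb - hypervisor_gb - svm_gb) // vm_ram_gb
    # vCPUs available after the SVM, at 1 vCPU per reference virtual machine.
    cpu_bound = cores * vcpu_ratio - svm_vcpus
    return min(mem_bound, cpu_bound)

print(max_vms_per_node(64, 16))    # Config 1 node: memory-bound at 29 VMs
print(max_vms_per_node(128, 16))   # Config 2 node: memory-bound at 61 VMs
```

Under these assumptions memory, not CPU, is the binding constraint at the node level; as noted in the architecture discussion, disk capacity is typically the limit at the cluster level.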
Note: Add at least one additional server to the infrastructure beyond the minimum requirements to implement VMware vSphere High Availability functionality and to meet the listed minimums.

Software Resources

Table 2 lists the software resources for the solution.

Table 2. Software Resources

vSphere: Enterprise Edition
vCenter Server: Standard Edition
Operating system for vCenter Server: Microsoft Windows Server 2012 R2 Standard Edition
Microsoft SQL Server: 2012 Standard Edition
ScaleIO: 1.3
ScaleIO virtual machine: ScaleIO virtual machine release 1.3
Metadata Manager (MDM)/tie breaker: ScaleIO components release 1.3
ScaleIO Data Server (SDS): ScaleIO components release 1.3
ScaleIO Data Client (SDC): ScaleIO components release 1.3
Virtual machines (for validation, not required for deployment), base operating system: Microsoft Windows Server 2012 R2 Datacenter Edition

Virtualization Layer

The virtualization layer is a key component of any private cloud solution. It decouples the application resource requirements from the underlying physical resources that serve them. This enables greater flexibility in the application layer by eliminating downtime for hardware maintenance, and allows the system to change physically without affecting the hosted applications. It enables multiple independent virtual machines to share the same physical hardware, rather than being directly implemented on dedicated hardware.

VMware vSphere 5.5: VMware vSphere 5.5 transforms the physical resources of a computer by virtualizing the CPU, RAM, hard disk, and network controller. This approach creates fully functional systems with dedicated and isolated instances of operating systems and applications, similar to physical computers. The high-availability features of VMware vSphere 5.5, such as vMotion, enable seamless migration of virtual machines and stored files from one vSphere server to another, with minimal or no performance impact.
Coupled with vSphere DRS, virtual machines have access to the appropriate resources at any point in time through load balancing of compute resources.

VMware vCenter: VMware vCenter is a centralized management platform for the VMware virtual infrastructure. This platform provides administrators with a single interface for all aspects of monitoring, managing, and maintaining the virtual infrastructure, accessed from multiple devices. VMware vCenter also manages some advanced features of the VMware virtual infrastructure, such as VMware vSphere High Availability and DRS, vMotion, and Update Manager.

EMC ScaleIO

EMC ScaleIO is a software-defined storage solution that uses local disks and LANs to create a virtual SAN with all the benefits of external storage, but at reduced cost and with less complexity. The lightweight ScaleIO software components are installed in the application hosts and intercommunicate using a standard LAN to handle the application I/O requests sent to ScaleIO block volumes. An extremely efficient decentralized block I/O flow, combined with a distributed, sliced volume layout, results in a massively parallel I/O system that can scale to thousands of nodes.

ScaleIO was designed and implemented with enterprise-grade resilience as a requirement. Furthermore, the software features efficient distributed automatic healing processes that overcome media and node failures without requiring administrator involvement. ScaleIO enables administrators to add or remove nodes and capacity “on the fly.” The software immediately responds to the changes, rebalancing the storage distribution and achieving a layout that optimally suits the new configuration. Because ScaleIO is hardware and hypervisor agnostic, the software works efficiently with various types of nodes and a mix of types within the same cluster.
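The rebalancing behavior described above can be pictured with a toy placement model. The round-robin scheme below is not ScaleIO's actual distribution algorithm (which slices volumes across the cluster and mirrors each chunk); it simply illustrates how adding a node spreads the same data over more parallel channels:

```python
def place_chunks(num_chunks, sds_nodes):
    """Toy model: spread volume chunks evenly across SDS nodes, round-robin.

    Illustrative only; ScaleIO's real layout also mirrors each chunk and
    rebalances automatically when nodes are added or removed.
    """
    layout = {sds: [] for sds in sds_nodes}
    for chunk in range(num_chunks):
        layout[sds_nodes[chunk % len(sds_nodes)]].append(chunk)
    return layout

before = place_chunks(12, ["sds1", "sds2", "sds3"])
grown = place_chunks(12, ["sds1", "sds2", "sds3", "sds4"])  # node added
print(len(before["sds1"]), len(grown["sds1"]))  # 4 chunks, then 3 chunks
```

After the hypothetical "sds4" joins, every node carries fewer chunks, which is the load-spreading effect the rebalance achieves without administrator involvement.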
Software Components: The ScaleIO virtual SAN software consists of three software components:

● Metadata Manager (MDM): Configures and monitors the ScaleIO system. The MDM can be configured in a redundant cluster mode, with three members on three servers, or in single mode on a single server.
● ScaleIO Data Server (SDS): Manages the capacity of a single server and acts as a back end for data access. The SDS is installed on all servers contributing storage devices to the ScaleIO system.
● ScaleIO Data Client (SDC): A lightweight device driver situated in each host whose application or file system requires access to the ScaleIO virtual block devices. The SDC exposes block devices representing the ScaleIO volumes that are currently mapped to the host.

Software Architecture: ScaleIO consists of two major functional components: the ScaleIO Data Client (SDC) and the ScaleIO Data Server (SDS). The SDC is a block service driver that exposes ScaleIO shared block volumes to applications. The SDC runs locally on any application server that requires access to the block storage volumes in the cluster. The SDS is a software component installed on each server that contributes local storage to the overall ScaleIO storage pool. The SDS serves incoming read and write requests from any of the SDCs within the cluster. The SDC possesses full knowledge of the data locations throughout the cluster and always directs I/O requests to their correct destination SDS, whether on the same server or another server. Because the same hosts run applications and provide storage for the virtual SAN, SDC and SDS are typically both installed on each of the participating hosts, such as a rack server, as shown in Figure 4.

Figure 4.
Layout of SDS and SDC

Designed and implemented to consume a minimum amount of computing resources, the ScaleIO software components have a negligible impact on the applications running on the hosts.

Pure Block Storage Implementation: ScaleIO implements a pure block storage layout. The entire architecture and data path are optimized for block storage access needs. For example, when an application submits a read I/O request to the SDC, the SDC instantly deduces which SDS is responsible for the specified volume address and then interacts directly with the relevant SDS. The SDS reads the data (by issuing a single read I/O request to the local storage, or by fetching the data from the cache in a cache-hit scenario) and returns the result to the SDC. The SDC provides the read data to the application. This implementation is very simple, consuming as few resources as possible. The data moves over the network exactly once, and a maximum of only one I/O request is sent to the SDS storage. The write I/O flow is similarly simple and efficient. Unlike some block storage systems that run on top of a file system, or object storage that runs on top of a local file system, ScaleIO offers optimal I/O efficiency.

Massively Parallel, Scale-Out I/O Architecture: ScaleIO can scale to many nodes, thus breaking the traditional scalability barrier of block storage. Because the SDCs propagate the I/O requests directly to the pertinent SDSs, there is no central point through which the requests move, and thus a potential bottleneck is avoided. This decentralized data flow is important to the linearly scalable performance of ScaleIO. Therefore, a large ScaleIO configuration results in a massively parallel system. The more servers or disks the system has, the greater the number of parallel channels available for I/O traffic.
Mix-and-Match Nodes: The vast majority of traditional scale-out systems are based on a symmetrical brick architecture, in which the same node configuration is used throughout a cluster. Such symmetric scale-out architectures are likely to run as small islands. ScaleIO was designed to support a mixture of new and old nodes with dissimilar configurations.

Volume Mapping and Volume Sharing: The volumes that ScaleIO exposes to the application clients can be mapped to one or more clients running in different hosts. Mapping can be changed dynamically if necessary, and ScaleIO volumes can be used by applications that expect shared-everything block access and by applications that expect shared-nothing or shared-nothing-with-failover access.

High Availability

This VSPEX solution provides a highly available virtualized server, network, and storage infrastructure. When the solution is implemented following the instructions in this white paper, business operations survive with little to no impact from single-unit failures. The VMware vSphere High Availability feature enables the virtualization layer to automatically restart virtual machines in various failure conditions:

● If the virtual machine operating system has an error, the virtual machine can automatically restart on the same hardware.
● If the physical hardware has an error, the affected virtual machines can automatically restart on other servers in the cluster.

Note: For virtual machines to restart on different hardware, the servers must have available resources. The “Compute Layer” section, later in this white paper, provides detailed information on enabling this function.

With vSphere High Availability, you can configure policies to determine which machines automatically restart, and under what conditions to attempt these operations.
Solution Details

Virtualization

VMware vSphere 5.5 includes advanced features that help maximize performance and overall resource use. The most important of these features are in memory management. This section describes some of these features, and what to consider when using them in the environment. In general, virtual machines on a single hypervisor consume memory as a pool of resources, as shown in Figure 5 for a system with 64 GB of memory (configuration 1).

Figure 5. Memory Configuration for Virtual Machines on a Single Hypervisor (Configuration 1)

Memory Compression

Memory overcommitment occurs when more memory is allocated to virtual machines than is physically present in a VMware vSphere host. Using sophisticated techniques such as ballooning and transparent page sharing, vSphere 5.5 can handle memory overcommitment without performance degradation. However, if memory usage exceeds server capacity, vSphere might swap out portions of a virtual machine's memory.

Nonuniform Memory Access (NUMA)

vSphere 5.5 uses a NUMA load balancer to assign a home node to a virtual machine. Because the home node allocates virtual machine memory, memory access is local and provides the best possible performance. Applications that do not directly support NUMA also benefit from this feature.

Transparent Page Sharing

Virtual machines running similar operating systems and applications typically have similar sets of memory content. Page sharing enables the hypervisor to reclaim redundant copies of memory pages and keep only one copy, which reduces total host memory consumption. If most of your application virtual machines use the same operating system and application binaries, total memory usage can be reduced to increase consolidation ratios.
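As a rough illustration of that consolidation effect, the sketch below estimates host memory consumed when some fraction of each guest's pages is identical across virtual machines and collapses to a single physical copy. The 30% shared fraction is an assumed figure for illustration only; actual savings depend on the workload and on hypervisor settings such as large-page use:

```python
def host_memory_with_tps(num_vms, vm_ram_gb, shared_fraction):
    """Estimate host memory (GB) used by guests with transparent page sharing.

    Assumes `shared_fraction` of each VM's pages is identical across all VMs
    and is deduplicated to one physical copy. Illustrative sketch only.
    """
    unique = num_vms * vm_ram_gb * (1 - shared_fraction)  # per-VM private pages
    shared = vm_ram_gb * shared_fraction                  # single shared copy
    return unique + shared

# 20 VMs at 2 GB each, assuming 30% of pages are shared:
print(round(host_memory_with_tps(20, 2, 0.3), 1))  # 28.6 GB instead of 40 GB
```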
Memory Ballooning

By using a balloon driver loaded in the guest operating system, the hypervisor can reclaim host physical memory when memory resources are under contention, with little or no impact on the performance of the application.

Memory Configuration Guidelines

This section provides guidelines for allocating memory to virtual machines. These guidelines take into account vSphere memory overhead and the virtual machine memory settings.

vSphere Memory Overhead: Some overhead is expected for the virtualization of memory resources. The memory space overhead has two components:

● The fixed system overhead for the VMkernel
● Additional overhead for each virtual machine

Memory overhead depends on the number of virtual CPUs and the configured memory for the guest operating system.

Allocating Memory to Virtual Machines: Many factors determine the proper sizing for virtual machine memory in VSPEX architectures. With the number of application services and use cases available, determining a suitable configuration for an environment requires creating a baseline configuration, testing it, and making adjustments for optimal results.

Compute Layer

VSPEX documents minimum requirements for the number of processor cores and the amount of RAM. The compute node used for this solution is the Cisco UCS C240 M3 Rack Server, with the capability to hold up to 24 internal disks of varying capacity and performance. The tested configuration contains nodes, each with two Intel Xeon E5-2650 v2 processors and 96 GB of memory. In general, the infrastructure must conform to the following attributes:

● Sufficient cores and memory to support the required number and types of virtual machines
● Sufficient network connections to enable redundant connectivity to the system switches
● Sufficient capacity to enable the environment to withstand a server failure and failover in the environment
ScaleIO components are designed to work with a minimum of three server nodes. The physical server node, running VMware vSphere, can host other workloads beyond the ScaleIO virtual machine. The implementation described in this paper contains eight nodes, with eight 900-GB 10,000-rpm SAS internal disks in each node.

Note: To enable high availability for the compute layer, the customer will need one additional spare server to ensure that the system has enough capability to maintain business operations when a server fails.

Best Practices in the Compute Layer

Use of identical, or at least compatible, servers is preferred, even though ScaleIO can accommodate different types within a cluster. This is because VSPEX implements hypervisor-level high-availability technologies that may require similar instruction sets and capabilities from the underlying physical hardware. By implementing ScaleIO on identical server units, you can minimize compatibility problems in this area. If high availability is implemented at the hypervisor layer, the largest virtual machine that can be created is constrained by the smallest physical server in the environment.

Implement high-availability features in the virtualization layer, and ensure that the compute layer has sufficient resources to accommodate at least single server failures. This helps ensure minimal downtime during upgrades and maintenance, with tolerance for single-unit failures. Within the boundaries of these recommendations and best practices, the compute layer for EMC VSPEX can be flexible enough to meet your customer’s specific needs. Ensure that there are sufficient processor cores and RAM per core to meet the needs of the target environment.

Configuration Guidelines

When designing and ordering the compute/server layer of this VSPEX solution, assuming the system workload is well understood, features such as memory ballooning and transparent page sharing can reduce the aggregate memory requirement.
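A minimal sketch of that effect: the physical memory to purchase can be derived by dividing the pool's allocated memory by an assumed overcommit factor once ballooning and page sharing are taken into account. The 1.25 factor below is a hypothetical planning value for illustration, not a VSPEX recommendation:

```python
def aggregate_memory_gb(total_vms, vm_ram_gb=2, overcommit=1.0):
    """Physical memory (GB) needed for the VM pool (illustrative sketch).

    `overcommit` > 1.0 models savings from ballooning and transparent page
    sharing; 1.0 means no overcommitment is assumed.
    """
    return total_vms * vm_ram_gb / overcommit

print(aggregate_memory_gb(100))                   # 200.0 GB, no overcommit
print(aggregate_memory_gb(100, overcommit=1.25))  # 160.0 GB at 25% overcommit
```

Size the overcommit factor conservatively: it should reflect measured sharing and ballooning behavior for the actual workload, not a best case.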
If the virtual machine pool does not have a high level of peak or concurrent usage, reduce the number of virtual CPUs. Conversely, if the applications being deployed are highly computational in nature, increase the number of CPUs and the amount of memory purchased.

Intel Xeon Updates

Testing on recent Intel Xeon processors has shown significant increases in virtual machine density from the server resource perspective. If your server deployment consists of these Intel Xeon processors, we recommend increasing the ratio of virtual CPUs to physical CPUs from 4:1 to 8:1. This essentially halves the number of server cores required to host the reference virtual machines. Figure 6 shows the results from the tested configurations.

Figure 6. Results of Testing on Intel Xeon Processors (Ivy Bridge Microarchitecture)

Current VSPEX sizing guidelines require a maximum ratio of virtual CPU cores to physical CPU cores of 4:1, with a maximum 8:1 ratio for Intel Xeon processors based on the Ivy Bridge microarchitecture or later. This ratio was based on an average sampling of CPU technologies available at the time of testing. Table 3 lists the hardware resources used at the compute layer by the VMware vSphere servers.

Table 3. Hardware Resources Used by VMware vSphere Servers

CPU: 1 vCPU per virtual machine; maximum of 4 vCPUs per physical core
Memory: 2 GB RAM per virtual machine; 2 GB RAM reservation per VMware vSphere host
Network: 2 x 10 Gigabit Ethernet NICs per server

Note: Add at least one additional server to the infrastructure beyond the minimum requirements to implement VMware vSphere High Availability functionality and to meet the listed minimums.
Note: The solution recommends a 10 Gigabit Ethernet network, or an equivalent 1 Gigabit Ethernet network infrastructure, as long as the underlying requirements for bandwidth and redundancy are fulfilled.

Network Layer

The tested configuration consists of either a pair of Cisco Nexus 5548 switches or redundant Cisco UCS 6248UP fabric interconnects. When the Cisco UCS 6248UP is used with C240 servers and the VIC 1225, the interconnects provide additional operational benefits, such as firmware, console, and LAN/SAN management from a single tool, Cisco UCS Manager. Event monitoring of server hardware, along with the flexibility to grow the ScaleIO cluster without additional effort and with multipathing, are among the advantages of building a Cisco UCS domain using the 6248UP fabric interconnects.

The infrastructure must provide the following attributes:

● Redundant network links for the hosts, switches, and storage
● Traffic isolation based on industry-accepted best practices
● Support for link aggregation

This section provides requirements for a redundant and highly available network. Please refer to the "Deployment" section, later in this white paper, for details on network setup. The guidelines consider VLANs, the Link Aggregation Control Protocol (LACP), the ESXi server, and the ScaleIO layer.

Component               Configuration
Network infrastructure  2 physical Cisco Nexus 5548UP LAN switches with 2 x 10 Gigabit Ethernet ports per VMware vSphere server, or 2 physical Cisco UCS 6248UP fabric interconnects with redundant converged 10G Fibre Channel over Ethernet (FCoE) connectivity to each server

Logical network traffic isolation between hosts and storage, hosts and clients, and management traffic is provided in both setups. Figure 7 shows the Cisco Nexus 5548UP setup, with logical VLAN separation.

Figure 7.
Cisco Nexus 5548UP Switch Configuration

Figure 8 shows the Cisco UCS 6248UP fabric interconnect setup, with converged 10G FCoE carrying management and storage traffic for the rack servers below the fabric interconnects.

Figure 8. Cisco UCS 6248UP Fabric Interconnect Configuration

You can use the client access network to communicate with the ScaleIO infrastructure. The storage network provides communication between the ScaleIO nodes. Administrators use the management network as a dedicated way to access the management connections on the hosts.

ScaleIO Software

This section provides guidelines for setting up the storage layer of the solution to provide high availability and the expected level of performance. VMware vSphere 5.5 supports more than one method of presenting storage to virtual machines. The tested solution uses block protocols, and the ScaleIO layer described in this section follows all current best practices.

VMware vSphere provides host-level storage virtualization: it virtualizes the physical storage and presents the virtualized storage to the virtual machines. A virtual machine stores its operating system and all the other files related to its activities in a virtual disk, which itself consists of one or more files. VMware uses a virtual SCSI controller to present virtual disks to the guest operating system running inside a virtual machine.

Virtual disks, as shown in Figure 9, reside on a datastore. Depending on the protocol used, a datastore can be a VMware VMFS datastore. Another option, Raw Device Mapping (RDM), allows the virtual infrastructure to connect a physical device directly to a virtual machine. In this ScaleIO solution, we use a VMFS datastore or RDM as the device that provides disk capacity.

Figure 9.
Virtual Disk Configuration

VMFS

VMFS is a cluster file system that provides storage virtualization optimized for virtual machines. It can be deployed over any SCSI-based local or network storage.

Raw Device Mapping (RDM)

VMware also provides RDM, which allows a virtual machine to directly access a volume on the physical storage.

Note: We recommend using RDM in the vSphere environment for this workload. The device is created on the ScaleIO virtual machines that point to the physical disks on the vSphere server.

Redundancy Scheme and Rebuild Process

ScaleIO software uses a mirroring scheme to protect data against disk and node failures (Figure 10). This architecture supports distributed two-copy schemes. When an SDS node or SDS disk fails, applications can continue to access ScaleIO volumes, because the data is still available through the remaining mirrors. ScaleIO software immediately starts a seamless rebuild process that creates another mirror for the data chunks that were lost in the failure. In the rebuild process, those data chunks are copied to free areas across the SDS cluster, so it is not necessary to add any capacity to the system. The surviving SDS cluster nodes carry out the rebuild by using the aggregated disk and network bandwidth of the cluster. Because the work is parallelized in this way, the process is dramatically faster, resulting in a shorter exposure time and less application performance degradation. After the rebuild, all the data is fully mirrored and healthy again.

If a failed node rejoins the cluster before the rebuild process is completed, ScaleIO software dynamically uses data from the rejoined node to further minimize the exposure time and the use of resources. This capability is important for overcoming short outages efficiently.

Figure 10. ScaleIO Topology

Elasticity and Rebalancing

Unlike many other systems, a ScaleIO cluster is extremely elastic.
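As a toy illustration of what this elasticity means for data movement, consider what happens when empty nodes join. This sketch is not ScaleIO's actual placement algorithm, and the chunk counts are hypothetical; it only models the principle, stated later in this section, that rebalancing moves the least amount of data possible:

```python
def chunks_to_move(chunks_per_sds, new_sds_count):
    # Toy model of rebalancing: when empty SDSs join the cluster,
    # only the excess above the new even share migrates off the
    # existing SDSs, so the least possible amount of data moves.
    layout = chunks_per_sds + [0] * new_sds_count
    target = sum(layout) / len(layout)
    return sum(c - target for c in layout if c > target)

# Three SDSs holding 1200 chunks each; one empty SDS joins.
# The new even share is 900 chunks per SDS, so 300 chunks move
# from each existing SDS: 900 chunks migrate in total.
print(chunks_to_move([1200, 1200, 1200], 1))  # 900.0
```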
Administrators can add and remove capacity and nodes on the fly, during I/O operations. When a cluster is expanded with new capacity (such as new SDSs, or new disks added to existing SDSs), ScaleIO immediately responds to the event and rebalances the storage by seamlessly migrating data chunks from the existing SDSs to the new SDSs or disks. Such a migration does not affect the applications, which continue to access the data stored in the migrating chunks. By the end of the rebalancing process, all ScaleIO volumes are spread across the SDSs and disks, including the newly added ones, in an optimally balanced manner. Thus, adding SDSs or disks not only increases the available capacity but also increases the performance of the applications as they access their volumes.

When an administrator decreases capacity (for example, by removing SDSs or removing disks from SDSs), ScaleIO performs a seamless migration that rebalances the data across the remaining SDSs and disks in the cluster.

Note: In all types of rebalancing, ScaleIO migrates the least amount of data possible. ScaleIO has the flexibility to accept new requests to add or remove capacity while still rebalancing previous capacity additions and removals.

Sizing

This section provides definitions of the reference workload used to size and implement the VSPEX architecture. Sizing the environment includes designing the nodes that will be used for the ScaleIO environment and specifying the number of those nodes. This section provides details on how variations in node configuration, and in the number of nodes in a cluster, affect the number of virtual instances that can be supported per host. The virtual machines used in this section correspond to the VSPEX definitions of those workloads.
Reference Virtual Machine and Workload

A reference virtual machine captures the basic resources needed by a virtual machine, with the intent of using it as a unit of measurement for scaling and sizing. Once it is defined, an actual customer application workload can be compared to this reference virtual machine workload to arrive at sizing information for the platform. In any discussion of virtual infrastructure, the first step is to define this reference workload. Note, however, that not all servers perform similar tasks or host the same applications, so for estimates to be accurate the reference must represent a common point of comparison. VSPEX solutions define a reference virtual machine (RVM) workload for this purpose. Table 4 describes this workload.

Table 4. VSPEX Reference Workload

Parameter                                   Value
Virtual machine OS                          Windows Server 2012 R2
Virtual CPUs                                1
Virtual CPUs per physical core (maximum)    4
Memory per virtual machine                  2 GB
IOPS per virtual machine                    25
I/O pattern                                 Fully random (skew = 0.5)
I/O read percentage                         67%
Virtual machine storage capacity            100 GB

Scale Out

ScaleIO is designed to scale from three to many nodes. Unlike most traditional storage systems, as the number of servers grows, so do capacity, throughput, and IOPS. The scalability of performance is mostly linear as the cluster grows. Storage and compute resources grow together, as in the case of rack servers, so the balance between them is maintained.

Validated Building Blocks

VSPEX uses a building-block approach to reduce complexity. A building block is one specific server node that can support a certain number of virtual servers in the VSPEX architecture. Each building block combines several local disk spindles to contribute to a shared ScaleIO volume that supports the needs of either a virtualized or private cloud environment. Both the SDS and the SDC are installed on each building block node.
The SDS presents local disks to a ScaleIO storage pool, which then exposes ScaleIO shared block volumes on which the virtual machines run. The SDC allows local compute resources on the node to use the shared block volumes.

The configuration of a reference building block includes the physical CPU core count, memory size, and disk spindle count for a server. Table 5 shows one specific node configuration, in a three-node cluster, that was validated and provides a flexible solution for VSPEX sizing.

Table 5. Building Block Node Configuration

Node Parameter  Target Value                 Notes
CPU             6 cores
Memory          64 GB                        This configuration can support up to 30 virtual machines
Disks           6 x 600-GB 10,000-rpm SAS    Capacity, not IOPS, limits the number of virtual machines supported

This configuration contains six SAS disks per node. The validated solution modeled these drives at 600 GB each. For the workload definition, we were limited by drive capacity more than by drive IOPS. With this configuration, up to 12 virtual machines can be supported by one building block (node).

The node configuration in Table 5 defines the CPU, memory, and disk configuration for one server. However, ScaleIO is infrastructure agnostic and can run on any server, so this VSPEX solution also provides more options for the building block node configuration. Users can redefine a building block with a different configuration, but after the building block configuration is redefined, the number of virtual machines that the building block can support also changes. To calculate the number of virtual machines that the new building block can support, we must consider the following components.

CPU Capability

With a recommended maximum of 4 virtual CPUs for each physical core in a virtual machine environment, a server node with 16 physical cores can support up to 64 virtual machines.
Memory Capability

When sizing the memory for a server node, the ScaleIO virtual machine and the hypervisor must be considered. A ScaleIO virtual machine consumes 3 GB of RAM, and 2 GB of RAM is reserved for the hypervisor. We do not recommend using memory overcommitment in this environment.

Note: ScaleIO 1.3 introduces a RAM cache feature that uses the SDS server's RAM. By default, the RAM size of the ScaleIO virtual machine is set to 3 GB, of which 128 MB is used for the SDS RAM cache. If more RAM cache is configured, add that amount to the 3 GB of the ScaleIO virtual machine.

Disk Capacity

ScaleIO uses a RAIN (Redundant Array of Inexpensive Nodes) topology to ensure data availability. In general, the capacity available is a function of the capacity per node (formatted capacity) and the number of nodes available (Table 6). Assuming N nodes and C TB of capacity per server, the storage available, S, is:

S = C x (N - 1) / 2

This formula accounts for two copies of the data and the ability to survive a single node failure.

Table 6. Theoretical Maximum Number of Virtual Machines per Node (Capacity Based)

10,000-rpm SAS Drive Size (GB)    Number of Virtual Machines
600                               12
900                               18
1200                              24

The primary method for adding IOPS capability to a node, without considering cache technologies, is to increase either the number of disk units or the speed of those units.

Determining the Maximum Number of Virtual Machines in a Building Block Node

With the entire configuration defined for the building block node, we calculate the number of virtual machines that each component can support; the lowest of these values determines the number of virtual machines that the building block node can support.
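The per-component limits in this section can be combined into a single sizing sketch. This is a simplified model: the per-node IOPS figure is an assumption (the paper does not state a per-drive IOPS rating), the memory term is an approximation of the reservations described above, and the capacity term uses the S = C x (N - 1) / 2 formula:

```python
def usable_capacity(n_nodes, capacity_per_node):
    # Two mirrored copies of all data, with headroom to re-mirror
    # after a single node failure: S = C * (N - 1) / 2
    return capacity_per_node * (n_nodes - 1) / 2

def max_vms_per_node(cores, memory_gb, node_iops, n_nodes, node_capacity_gb):
    limits = {
        "cpu": cores * 4,                    # 4 vCPUs per physical core
        "memory": (memory_gb - 2 - 3) // 2,  # 2 GB hypervisor + 3 GB ScaleIO VM
        "iops": node_iops // 25,             # 25 IOPS per reference VM
        "capacity": int(usable_capacity(n_nodes, node_capacity_gb)
                        / n_nodes // 100),   # 100 GB per reference VM
    }
    return min(limits.values()), limits

# Table 5 building block: 6 cores, 64 GB, 6 x 600-GB drives in a
# three-node cluster (node_iops=900 is an assumed figure).
count, limits = max_vms_per_node(6, 64, 900, 3, 6 * 600)
print(count, limits)  # capacity (12 VMs) is the limiting factor
```

For the Table 5 configuration this reproduces the 12-virtual-machine, capacity-bound result stated above; with 16 cores, 192 GB of RAM, and a higher IOPS budget it reproduces the CPU, memory, and IOPS limits worked through in the next example.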
For example, suppose the customer redefines the building block configuration. With 16 physical CPU cores, 64 virtual machines can be supported (16 cores x 4 virtual machines per core). With 192 GB of memory, 93 virtual machines can be supported (after 2 GB is reserved for the hypervisor and 3 GB for the ScaleIO virtual machine). With 8 SAS drives, 45 virtual machines can be supported, based on the IOPS limit. Therefore, the theoretical maximum, determined by the lowest of these values (the IOPS limit), is 45 virtual machines for this building block node.

Note: The actual number observed for each physical server is further reduced, because the disk capacity limit is lower than the IOPS limit.

Configuration Guidelines

To choose the appropriate reference architecture for a customer environment, determine the resource requirements of the environment and then translate these requirements into the appropriate number of reference virtual machines, as defined earlier in Table 4. This section describes how to use the customer configuration worksheet to simplify the sizing calculations, along with additional factors to take into consideration when deciding which architecture to deploy.

The customer configuration worksheet helps you assess the customer environment and calculate its sizing requirements. Table 7 shows a completed worksheet for a sample customer environment.

Table 7.
Sample Customer Configuration Worksheet

Application                                    CPU (virtual CPUs)  Memory (GB)  IOPS  Capacity (GB)  Reference Virtual Machines
Example application 1: Custom-built application
  Resource requirements                        1                   3            15    30             -
  Equivalent reference virtual machines        1                   2            1     1              2
Example application 2: Point-of-sale system
  Resource requirements                        4                   16           200   200            -
  Equivalent reference virtual machines        4                   8            8     2              8
Example application 3: Web server
  Resource requirements                        2                   8            50    25             -
  Equivalent reference virtual machines        2                   4            2     1              4
Total equivalent reference virtual machines                                                          14

To complete the customer configuration worksheet, follow these steps:

1. Identify the applications planned for migration into the VSPEX virtualized environment.
2. For each application, determine the compute resource requirements for virtual CPUs, memory (GB), storage performance (IOPS), and storage capacity.
3. For each resource type, determine the equivalent reference virtual machines required, that is, the number of reference virtual machines required to meet the specified resource requirement.
4. Determine the total number of reference virtual machines needed from the resource pool for the customer environment.

Determining the Resource Requirements

Consider the following when you determine resource requirements:

CPU: The reference virtual machine outlined earlier in Table 4 assumes that most virtual machine applications are optimized for a single CPU. If an application requires a virtual machine with multiple virtual CPUs, modify the proposed virtual machine count to account for the additional resources.

Memory: Memory plays a key role in ensuring application functionality and performance. Each group of virtual machines will have a different target for the amount of available memory that is considered acceptable. As with the CPU calculation, if an application requires additional memory resources, adjust the number of planned virtual machines to accommodate the additional resource requirements.
For example, if there are 30 virtual machines but each one needs 4 GB of memory instead of the 2 GB that the reference virtual machine provides, plan for 60 reference virtual machines.

IOPS: The storage performance requirements for virtual machines are usually the least understood aspect of performance. The reference virtual machine uses a workload generated by an industry-recognized tool to run a wide variety of office productivity applications, which should be representative of the majority of virtual machine implementations.

Storage Capacity: The storage capacity requirement for a virtual machine can vary widely depending on the type of provisioning, the types of applications in use, and specific customer policies.

Determining the Equivalent Reference Virtual Machines

With all of the resources defined, determine the number of equivalent reference virtual machines by using the relationships listed in Table 8. Round each fractional value up to the next whole number.

Table 8. Equivalent Reference Virtual Machines

Resource    Value for Reference Virtual Machine    Equivalent Reference Virtual Machines
CPU         1 vCPU                                 Resource requirement
Memory      2 GB                                   Resource requirement / 2
IOPS        25                                     Resource requirement / 25
Capacity    100 GB                                 Resource requirement / 100

For example, application 2 in the customer configuration worksheet in Table 7 requires 4 CPUs, 16 GB of memory, 200 IOPS, and 200 GB of storage. This translates to four reference virtual machines for CPU, eight for memory, eight for IOPS, and two for capacity. Table 9 shows how that application fits into the worksheet row.

Table 9.
Equivalent Reference Virtual Machines for Example Application 2 in Table 7

Application                                CPU (virtual CPUs)  Memory (GB)  IOPS  Capacity (GB)  Equivalent Reference Virtual Machines
Example application 2:
  Resource requirements                    4                   16           200   200            -
  Equivalent reference virtual machines    4                   8            8     2              8

Use the highest value in the row to complete the Equivalent Reference Virtual Machines column: the number of reference virtual machines required for each application equals the maximum required by any individual resource. In this example, the application requires eight reference virtual machines, because eight will meet all the resource requirements for IOPS, virtual CPU, memory, and capacity.

Determining the Total Reference Virtual Machines

After the worksheet is completed for each application that the user wants to migrate into the virtual infrastructure, compute the total number of reference virtual machines required in the resource pool by summing the reference virtual machines for all the application types. In the example in Table 7, the total is 14 reference virtual machines.

Deployment

Please refer to Chapter 6, "VSPEX Solution Implementation," in the EMC VSPEX Private Cloud: VMware vSphere and EMC ScaleIO Proven Infrastructure Guide for deployment details:
http://www.emc.com/collateral/technical-documentation/h13156-vspex-private-cloud-vmware-vsphere-scaleiopig.pdf

The following network option, from the "Network Implementation" section of the "VSPEX Solution Implementation" chapter, is provided to take advantage of the operational efficiencies afforded by Cisco UCS rack servers with the VIC 1225 adapter connected to the Cisco UCS 6248UP fabric interconnects.
This setup functions as a converged management and storage traffic network, thus simplifying the topology. The following link provides a quick setup of the Cisco UCS environment with fabric interconnects:
http://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-manager/whitepaper_c11697337.html

At the end of this procedure, a seven-node ScaleIO cluster consisting of Cisco UCS C240 rack servers, each with a VIC 1225, is built. Figure 11 shows a snapshot of the service profiles and network settings.

Figure 11. Service Profiles and Network Settings

Test and Validation

Because of the scale-out, multiple-node architecture of ScaleIO, EMC recommends that you consider the possibility of the loss of a system node. ScaleIO is designed to keep copies of data on multiple nodes to protect against such a loss. Any node loss affects the virtual machines running on that node, but it should not affect the other users of the ScaleIO environment.

Post-Installation Checklist

This section lists items to review and tasks to perform after configuring the solution. The goal is to verify the configuration and functionality of specific aspects of the solution and to ensure that the configuration meets core availability requirements. Table 10 lists tasks for testing the installation.

Table 10. Post-Installation Checklist

Task: Basic checks
● Verify that sufficient virtual ports exist on each vSphere host virtual switch. (Reference: vSphere Networking)
● Verify that each vSphere host has access to the required ScaleIO datastores and VLANs. (Reference: vSphere Storage Guide)
● Verify that the vMotion interfaces are configured correctly on all vSphere hosts. (Reference: vSphere Networking)

Task: Deploy and test a single virtual server
● Deploy a single virtual machine using the vSphere interface. Verify that the virtual machine is joined to the application domain, can be logged into, and has access to the expected networks. (Reference: vCenter Server and Host Management; vSphere Virtual Machine Management)

Task: Verify redundancy of the solution components
● Verify the data protection of the ScaleIO system: restart one ScaleIO node, and ensure that shared volume access is maintained. (Reference: the "Failure Testing" section of this paper)
● Disable each of the redundant network switches in turn and verify that the vSphere host and virtual machine are intact. (Reference: vendor documentation)
● On a vSphere host that contains at least one virtual machine, enable maintenance mode and verify that the virtual machine can successfully migrate to an alternate host. (Reference: vCenter Server and Host Management; vSphere Networking)

Failure Testing

To provide for system maintenance and hardware failures, a set of virtual machines running the reference workload was started on two of the three nodes in a ScaleIO environment. There were no virtual machines on the remaining node. At a predetermined point, the node with no virtual machines running was turned off. Predictably, the I/O latency of the system was affected by the loss of one-third of the storage resources, but the virtual machines running on the other nodes were still able to access all of their data. When the node was replaced, rebalancing occurred automatically in the background, without operator intervention and with minimal impact on applications and users.

Similar node and disk failure scenarios were conducted on a seven-node cluster. The rebuild rate was monitored while application availability and cluster resiliency were verified.

Note: Similar tests with virtual machines running on all nodes show the expected result: High Availability (configured for the non-ScaleIO virtual machines) restarts virtual machines on the surviving nodes until the restart criteria are no longer valid.
EMC recommends that you include one node more than the workload requires, to help ensure that you can support the environment during an outage or during system maintenance. The additional spare node should be configured to be as large as the largest active node in the cluster, so that it can accommodate a node failure.

Another seven-node cluster, with Cisco UCS 6248UP fabric interconnects, was built to conduct additional failure testing for cluster resiliency (Figure 12). This new cluster and its nodes were not profiled for virtual machine capacity, because that exercise, and the guidance stemming from it, was completed earlier with reference virtual machines. Each node in this setup consists of a Cisco UCS C240 rack server with VIC 1225 cards and nine internal 900-GB 10,000-rpm SAS disks under ScaleIO management.

Two storage pools were created within the protection domain. A set of virtual machines running the reference workload was started on nodes in one storage pool. Disk and node failures were introduced while connectivity, rebuild behavior, and performance were observed. The focus of these tests was to check for cluster resiliency without adversely affecting performance. To confirm the logical separation between storage pools, single disk failures were introduced in both pools concurrently. An MDM node failure was also forced, to check cluster setup and operation in the absence of the master node. In all of these tests, the cluster continued to operate, and the application workload remained available and operational within expected levels of performance.

ScaleIO can group a set of nodes within a protection domain as a fault set. Fault sets come into play when data is mirrored.
ScaleIO performs mirroring only across fault sets, to provide for the possible loss of a whole set of nodes that share firmware or other characteristics, such as during maintenance. This feature was not tested.

Figure 12. ScaleIO Fault Sets

Figure 13 shows a snapshot of the dashboard with the relevant metrics.

Figure 13. Dashboard Showing Metrics

Monitoring

This VSPEX proven infrastructure provides an end-to-end solution that requires system monitoring of three discrete but highly interrelated areas:

● Servers, both virtual machines and clusters
● Networking
● ScaleIO

Given the purview of this white paper, this section focuses primarily on monitoring key components of the ScaleIO infrastructure. Server resources (processing, memory, and disk) and network usage can be measured with tools such as esxtop and perfmon. Storage-related metrics can be monitored with vdbench. Key network metrics to track include aggregate throughput and latency.

ScaleIO Layer

Monitoring the ScaleIO layer of a VSPEX implementation is crucial to maintaining the overall health and performance of the system. The ScaleIO GUI enables you to review the overall status of the system, drill down to the component level, and monitor those components. The various screens display different views and data that are useful to the storage administrator, providing an easy yet powerful way to gain insight into how the underlying ScaleIO components are operating. Key areas to focus on include:

● Dashboard screen
● Protection domain screen
● Protection domain servers screen
● Storage pool screen

The EMC ScaleIO User Guide, on EMC Online Support, provides more instructions for monitoring the ScaleIO layer.
References

The following documents, available on EMC Online Support, provide additional relevant information. If you do not have access to a document, contact your EMC representative.

● EMC VSPEX Private Cloud: VMware vSphere and EMC ScaleIO Proven Infrastructure Guide
● EMC Host Connectivity Guide for VMware ESX Server
● EMC ScaleIO User Guide

The following documents, available on the VMware website, provide additional relevant information:

● vSphere Networking
● vSphere Storage Guide
● vSphere Virtual Machine Administration
● vSphere Virtual Machine Management
● vSphere Installation and Setup
● vCenter Server and Host Management
● vSphere Resource Management
● Interpreting esxtop Statistics
● Preparing vCenter Server Databases
● Understanding Memory Resource Management in VMware vSphere 5.0

For documentation on Microsoft products, refer to the following Microsoft resources:

● Microsoft Developer Network
● Microsoft TechNet

Printed in USA  C11-733544-00  12/14