Achieving a Million I/O Operations per Second from a Single VMware vSphere® 5.0 Host
Performance Study
TECHNICAL WHITE PAPER
Table of Contents
Introduction
Executive Summary
Software and Hardware
Test Workload
Multi-VM Tests
   Experimental Setup
      Server
      Storage Area Network
      Virtual Platform
      Virtual Machines
      Iometer Workload
      Test Bed
   Results
      Scaling I/O Operations per Second with Multiple VMs
      Scaling I/O Throughput as I/O Request Size Increases
      CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Virtual Controllers
Performance of a Single VM
   Experimental Setup
      Server
      Storage Area Network
      Virtual Platform
      Virtual Machines
      Iometer Workload
      Test Bed
   Results
      Scaling I/O Operations per Second with Virtual SCSI Controllers
Conclusion
Appendix A
   Selecting the Right PCIe Slots for the HBAs
   Building a Scalable Test Infrastructure
   Assigning Virtual Disks to a VM
   Estimation of CPU Cost of an I/O Operation
References
About the Author
   Acknowledgements
Introduction
One of the essential requirements for a platform supporting enterprise datacenters is the capability to support the extreme I/O demands of applications running in those datacenters. A previous study [1] has shown that vSphere can easily handle demands for high I/O operations per second. Experiments discussed in this paper strengthen this assertion further by demonstrating that a vSphere 5.0 virtual platform can easily satisfy an extremely high level of I/O demand that originates from the hosted applications, as long as the hardware infrastructure meets the need.
Executive Summary
The results obtained from the experiments show that:
• A single vSphere 5.0 host is capable of supporting a million+ I/O operations per second.
• 300,000 I/O operations per second can be achieved from a single virtual machine (VM).
• I/O throughput (bandwidth consumption) scales almost linearly as the request size of an I/O operation increases.
• I/O operations on vSphere 5.0 systems with Paravirtual SCSI (PVSCSI) controllers use fewer CPU cycles than those with LSI Logic SAS virtual SCSI controllers.
Software and Hardware
Because of its capabilities and rich feature set, VMware vSphere has become one of the industry's leading choices as a platform on which to build private and public clouds. Significant improvements have been made to vSphere's storage stack in successive releases. These improvements enable vSphere to easily satisfy the ever-increasing demand for I/O throughput from almost all enterprise applications.
The Symmetrix VMAX storage system is a high-end, highly scalable storage system from EMC [2]. It allows the scaling of storage system resources through common building blocks called Symmetrix VMAX engines. A system can be scaled from one VMAX engine with one storage bay to eight VMAX engines with a maximum of ten storage bays. Each VMAX engine contains four quad-core processors, up to 128GB of memory, and up to 16 front-end ports for host connectivity.
The Emulex LightPulse LPe12002 is an 8Gbps Fibre Channel PCI Express dual-channel host bus adapter [3]. The LPe12002 delivers some of the industry's highest performance, CPU efficiency, and reliability, making it a convenient choice for enabling mission-critical and I/O-intensive applications in cloud environments.
Test Workload
Iometer was used to generate the I/O load in all the experiments discussed in this paper [4]. Iometer was configured to generate 16 or 32 Outstanding I/Os (OIOs) with 100% random, 100% read requests. The size of the I/O requests was 512 bytes, 1KB, 2KB, 4KB, or 8KB, depending on the experiment. These I/O sizes represent the I/O characteristics of a wide gamut of transaction-oriented applications such as databases and enterprise mail servers.
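For reference, the workload matrix explored in the experiments can be summarized programmatically. The sketch below is a reading aid only; Iometer itself was driven through its own configuration, and the dictionary keys here are invented for this illustration rather than Iometer identifiers.

# Illustrative summary of the Iometer workload matrix used in this paper.
# Not an Iometer configuration format; key names are invented for this sketch.

ACCESS_PATTERN = {"random_pct": 100, "read_pct": 100}

WORKLOADS = [
    {"oio": 32, "request_sizes_bytes": [8 * 1024]},                    # multi-VM IOPS scaling test
    {"oio": 16, "request_sizes_bytes": [512, 1024, 2048, 4096, 8192]}, # request-size sweep
]

for wl in WORKLOADS:
    for size in wl["request_sizes_bytes"]:
        print(f"{wl['oio']} OIOs, {size}-byte requests, "
              f"{ACCESS_PATTERN['random_pct']}% random / {ACCESS_PATTERN['read_pct']}% read")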
Multi-VM Tests
Experimental Setup
Server
• 4 Intel Xeon E7-4870 processors, 2.40GHz, 10 cores each
• 256GB memory
• 6 dual-port Emulex LPe12002 HBAs (8Gbps)
Storage Area Network
• VMAX with 8 engines
• 4 quad-core processors and 128GB of memory per engine
• 64 front-end 8Gbps Fibre Channel (FC) ports
• 64 back-end 4Gbps FC ports
• 960 * 15K RPM, 450GB FC drives
• 1 FC switch
Virtual Platform
• vSphere 5.0
Virtual Machines
• Windows Server 2008 R2 EE x64
• 4 vCPUs
• 8GB memory
• 3 virtual SCSI controllers
Iometer Workload
• 1 worker for each pair of virtual disks (five workers per VM)
• 100% random
• 100% read
• An access region of 6.4GB in each virtual disk (a total of 384GB on 60 virtual disks)
• 16 or 32 OIOs depending on the test case
• Request size of 512 bytes, 1KB, 2KB, 4KB, or 8KB depending on the test case
Test Bed
A single physical server running vSphere 5.0 was used for the experiments. Six identical VMs were created on this host. Six dual-port 8Gbps FC HBAs were installed in the vSphere host, and all 12 FC ports of these HBAs were connected to an FC switch. The VMAX was connected to the same FC switch via 60 8Gbps FC links, with each link connected to a separate front-end FC port on the array. Refer to "Building a Scalable Test Infrastructure" in the appendix for more details.
A total of 480 RAID-1 groups were created on top of the 960 FC drives in the VMAX array. A single LUN was created in each RAID group. A 250GB metaLUN was created on each set of 8 LUNs, resulting in a total of 60 metaLUNs. All 60 metaLUNs were exposed to the vSphere host, and a separate VMFS datastore was created on each of the 60 metaLUNs. To ensure high concurrency and effective use of all the available I/O processing capacity, each VMFS datastore was configured to use a fixed path via a dedicated FC port on the array.
A single thick* virtual disk was created in each VMFS datastore. The virtual disks were assigned to the VMs as follows:
• Each VM was assigned a total of 10 virtual disks† for Iometer testing.
• All 10 virtual disks in a VM were distributed across three virtual SCSI controllers. Refer to "Assigning Virtual Disks to a VM" for more details on the distribution of virtual disks.
• The virtual SCSI controller type was varied between LSI Logic SAS and PVSCSI for the comparison tests.
• A total of 384GB (6.4GB in each virtual disk) of disk space was used by Iometer to generate I/O load in all VMs. The VMAX, because of its 1TB of aggregate memory, was able to cache most of the disk blocks belonging to this 384GB of disk space.
* Also known as eagerzeroed thick.
† These virtual disks were separate from the virtual disk that contained the operating system and Iometer application-related files.
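The storage layout described above can be sanity-checked with a little arithmetic. The following sketch simply re-derives the counts and capacities quoted in this section from the numbers given in the paper; it is a reader aid, not part of the test tooling.

# Re-deriving the storage layout figures quoted in the multi-VM test bed.

FC_DRIVES = 960
RAID1_GROUPS = FC_DRIVES // 2             # one RAID-1 group per pair of drives -> 480
LUNS = RAID1_GROUPS                       # one LUN per RAID group
METALUNS = LUNS // 8                      # 8 LUNs per 250GB metaLUN -> 60
DATASTORES = METALUNS                     # one VMFS datastore per metaLUN

VMS = 6
VDISKS_PER_VM = 10
assert VMS * VDISKS_PER_VM == DATASTORES  # 60 virtual disks, one per datastore

ACCESS_REGION_GB = 6.4                    # Iometer working set per virtual disk
working_set_gb = DATASTORES * ACCESS_REGION_GB   # 384GB total
vmax_cache_gb = 8 * 128                          # 8 engines x 128GB -> ~1TB aggregate memory

print(f"RAID-1 groups: {RAID1_GROUPS}, metaLUNs/datastores: {DATASTORES}")
print(f"Iometer working set: {working_set_gb:.0f}GB vs. ~{vmax_cache_gb}GB array memory")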
Figure 1. Configuration for Multi-VM Tests
Results
For each experiment, metrics such as I/O operations per second (IOPS), throughput in megabytes per second (MBps), and the latency of a single I/O operation in milliseconds (ms) were collected to analyze the I/O performance under the different test scenarios.
Scaling I/O Operations per Second with Multiple VMs
This test illustrates the scalability of I/O operations per second on a single host as the number of VMs generating the I/O load increases. Iometer in each VM was configured to generate a similar I/O load using 32 OIOs and an 8KB request size. The number of VMs generating the I/O load was increased from one to six. In each case, the total I/O operations per second and the average latency of each I/O operation were measured in each VM. This test was done with only PVSCSI controllers.
[Figure 2 plots the aggregate I/O operations per second (IOPS) and the I/O latency in milliseconds against the number of virtual machines, from one to six.]
Figure 2. Scaling I/O Operations per Second with Multiple VMs
Figure 2 shows the aggregate number of I/O operations per second achieved as the number of VMs was increased. Aggregate I/O operations per second scaled from about 200,000 to slightly above 1 million as the number of VMs was increased from one to six. The latency of I/O operations remained under 2 milliseconds throughout the test, increasing by only 10% as the I/O load on the host increased.
Scaling I/O Throughput as I/O Request Size Increases
This test demonstrates the scalability of the I/O throughput achieved on the host as the size of the I/O requests increased from 512 bytes to 8KB. Iometer in each VM was configured to generate a similar I/O load using 16 outstanding I/Os of varying size. The number of VMs generating the I/O load was fixed at six. The total I/O operations per second, I/O throughput (megabytes per second), and average latency of each I/O operation were measured in each VM. This test was run with both LSI Logic SAS and PVSCSI virtual controllers in order to compare their impact on overall I/O performance.
[Figure 3 plots the aggregate IOPS and the I/O latency in milliseconds for the LSI Logic SAS and PVSCSI controllers at request sizes of 512 bytes, 1KB, 2KB, 4KB, and 8KB.]
Figure 3. I/O Operations per Second with Different Request Sizes
Figure 3 shows the aggregate I/O performance of all six VMs generating a similar I/O load, each time with a different request size. For each request size, experiments were conducted with both LSI Logic SAS and PVSCSI controllers. As seen in Figure 3, the aggregate I/O operations achieved from six VMs remained well over 1 million for all I/O sizes except 8KB. The average latency of a single I/O operation remained well under a millisecond‡ with each virtual SCSI controller type, except at the 8KB request size. The increase in I/O latency as measured by Iometer was due to a corresponding increase in the I/O latency at the storage.
Note that the slightly lower aggregate IOPS observed with the 8KB request size and six VMs in this test, compared to the same request size and number of VMs in the test described in "Scaling I/O Operations per Second with Multiple VMs," was primarily due to the lower number of OIOs used for this test.
‡ The I/O latency in this test was lower than that in "Scaling I/O Operations per Second with Multiple VMs" because the number of OIOs was half of that used for the latter test.
[Figure 4 plots the aggregate I/O throughput in MBps for the LSI Logic SAS and PVSCSI controllers as the I/O request size increases from 512 bytes to 8KB.]
Figure 4. Scaling I/O Throughput with the Request Size
The corresponding aggregate I/O throughput observed on the host is shown in Figure 4. As seen in the figure, throughput scales almost linearly as the I/O request size is doubled in each iteration. vSphere utilized the available I/O bandwidth to scale almost linearly from 592MB per second to almost 8GB per second as the I/O request size increased from 512 bytes to 8KB. The nearly linear scaling indicates that the vSphere software stack did not present any bottleneck to the I/O workload originating from the VMs. The slight drop observed at the 8KB request size was due to a corresponding increase in the I/O latency observed in the storage array.
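The near-linear scaling follows directly from the relationship throughput = IOPS × request size. The short sketch below re-derives the two end points quoted above; the IOPS inputs are approximate readings from Figure 3, not exact measurements.

# Approximate check of the throughput end points quoted in this section.
# IOPS values are approximate readings from Figure 3.

def throughput_mbps(iops, request_size_bytes):
    # Aggregate bandwidth in megabytes per second.
    return iops * request_size_bytes / 1e6

print(throughput_mbps(1_150_000, 512))       # ~590 MB/s at 512-byte requests
print(throughput_mbps(1_000_000, 8 * 1024))  # ~8,200 MB/s (about 8GB/s) at 8KB requests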
Figures 3 and 4 also compare the aggregate I/O performance of the host when using the two different virtual SCSI controllers, LSI Logic SAS and PVSCSI. The PVSCSI adapter provided 7% to 10% better throughput than LSI Logic SAS in all of the cases.
CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Virtual Controllers
Another metric that is useful for comparing the performance of the LSI Logic SAS and PVSCSI adapters is the CPU cost of an I/O operation. A detailed explanation of the estimation methodology is given in "Estimation of CPU Cost of an I/O Operation." Figure 5 compares the CPU cost of an I/O operation with an LSI Logic SAS virtual SCSI adapter to that with a PVSCSI adapter for an I/O request size of 8KB.
[Figure 5 plots the aggregate IOPS and the normalized CPU cycles per I/O for the LSI Logic SAS and PVSCSI virtual SCSI controllers.]
Figure 5. CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Adapters
As seen in Figure 5, the PVSCSI adapter provided 8% better throughput at a 10% lower CPU cost. These results clearly show that PVSCSI adapters are capable of providing better throughput at a lower CPU cost than LSI Logic SAS adapters under extreme load conditions.
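Combining the two percentages gives a rough sense of the overall efficiency advantage; the calculation below simply restates the figures quoted above and is an approximation rather than a measured result.

# Rough combined efficiency gain of PVSCSI over LSI Logic SAS at 8KB, using the
# ~8% higher throughput and ~10% lower per-I/O CPU cost quoted above.

throughput_gain = 1.08   # PVSCSI IOPS relative to LSI Logic SAS
relative_cost = 0.90     # PVSCSI CPU cycles per I/O relative to LSI Logic SAS

efficiency_gain = throughput_gain / relative_cost - 1
print(f"~{efficiency_gain:.0%} more I/O operations completed per CPU cycle spent")  # ~20%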
Performance of a Single VM
The next study focuses on the performance of a single VM.
Experimental Setup
Server
• 4 Intel Xeon E7-4870 processors, 2.40GHz, 10 cores each
• 256GB memory
• 4 dual-port Emulex LPe12002 HBAs (8Gbps)
Storage Area Network
• VMAX with 5 engines
• 4 quad-core processors and 128GB of memory per engine
• 40 front-end 8Gbps FC ports
• 40 back-end 4Gbps FC ports
• 960 * 15K RPM, 450GB FC drives
• 1 FC switch
Virtual Platform
• vSphere 5.0
Virtual Machines
• Windows Server 2008 R2 EE x64
• 16 vCPUs
• 16GB memory
• 1 to 4 virtual SCSI controllers
Iometer Workload
• Two workers for every 10 virtual disks
• 100% random
• 100% read
• An access region of 6.4GB in each virtual disk
• 16 OIOs
• Request size of 8KB
Test Bed
A single VM running on the vSphere 5.0 host was used for this test. The number of vCPUs was increased from 4 to 16, and the amount of memory assigned to the VM was increased from 8GB to 16GB. The storage layout created for the multi-VM tests was reused for this study. Iometer was configured to use a total of 40 virtual disks. All the virtual disks were assigned to the single test VM in increments of 10, with each set of 10 virtual disks attached to a new virtual SCSI controller.§ Iometer was configured to generate the same I/O load on every virtual disk. This ensured a constant load on each virtual SCSI controller.
Each set of 10 virtual disks was accessed through a dual-port HBA in the host but through separate FC ports on the array. A total of four dual-port HBAs in the host and 40 front-end FC ports in the array were used to access all 40 virtual disks.
§ vSphere 5.0 supports a total of 4 virtual SCSI controllers per VM.
Figure 6. Test Configuration for Single-VM test
Results
As in the multi-VM test case, metrics such as I/O operations per second and the latency of a single I/O operation in milliseconds were used to study the I/O performance.
Scaling I/O Operations per Second with Virtual SCSI Controllers
This test case focused on studying the scalability of I/O operations per second when the number of virtual SCSI
controllers in a VM was increased.
[Figure 7 plots the aggregate IOPS and the I/O latency in milliseconds (PVSCSI) against the number of virtual SCSI controllers, from one to four.]
Figure 7. Scaling I/O Operations per Second with 8KB Request Size in a Single VM
Figure 7 shows the aggregate I/O operations per second and the average latency of an I/O operation achieved from a single VM as the number of virtual SCSI controllers was increased. As shown in the figure, aggregate I/O operations per second increased linearly, while the latency of I/O operations remained nearly constant after the increase from one to two virtual SCSI controllers.
Conclusion
vSphere offers a virtualization layer that, in conjunction with some of the industry’s biggest storage platforms,
can be used to create private and public clouds that are capable of supporting any level of I/O demands
originating from the applications running in those clouds. Results of the experiments conducted at EMC labs
illustrate that:
• vSphere can easily support a million+ I/O operations per second from a single host out of the box.
• A single VM running on vSphere 5.0 is capable of supporting 300,000 I/O operations per second at an 8KB request size.
• vSphere can scale almost linearly in terms of I/O throughput (bandwidth consumption) as the request size of an I/O operation increases.
• vSphere's Paravirtual SCSI controller offers a lower CPU cost per I/O operation compared to that of the LSI Logic SAS virtual SCSI controller.
The results presented in this paper are by no means the upper limit for the I/O operations achievable through any
of the components used for the tests. The intent is to show that a vSphere virtual infrastructure, such as the one
used for this study, can easily handle even the most extreme I/O demands that exist in datacenters today.
Customers can virtualize their most I/O-intensive applications with confidence on a platform powered by
vSphere 5 and EMC VMAX.
Appendix A
Selecting the Right PCIe Slots for the HBAs
Achieving a million+ I/O operations per second with an 8KB request size through the smallest number of HBAs required the HBAs to collectively provide 8GB per second of throughput. Although five dual-port 8Gbps HBAs could theoretically meet the 8GB per second requirement, at that throughput each HBA would be operating near its saturation point. To ensure sufficient concurrency and still have enough headroom to push a million+ IOPS, six HBAs were used for the exercise. This required each HBA to support at least 1.3GB per second. To sustain that throughput, the HBAs had to be placed in PCIe slots capable of supporting the required bandwidth. The HBAs were placed in different slots as shown in Table 1, which also lists the maximum theoretical throughput of each slot; a quick check of this bandwidth arithmetic follows the table.
Figure 8. PCIe Slot Configuration of the Server used for the Tests
SLOT NUMBER   PCIe VERSION   NUMBER OF LANES   MAXIMUM THROUGHPUT**
1             Gen 2          4                 2GBps
2, 3, 4, 6    Gen 2          8                 4GBps
7             Gen 2          16                8GBps
Table 1. Details of the PCIe Slots
** Each PCIe Gen 2 lane provides a theoretical maximum of 500MB per second.
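The bandwidth budget behind the HBA and slot selection can be checked as follows. The sketch uses only the nominal figures quoted in this appendix (500MB per second per PCIe Gen 2 lane, 8Gbps per FC port) and is a rough estimate rather than a measured result.

# Rough check of the bandwidth budget behind the HBA and PCIe slot selection.

TARGET_IOPS = 1_000_000
REQUEST_SIZE = 8 * 1024                            # bytes
required_gbps = TARGET_IOPS * REQUEST_SIZE / 1e9   # ~8.2 GB/s aggregate

HBAS = 6
per_hba_gbps = required_gbps / HBAS                # ~1.37 GB/s per dual-port HBA

# Nominal capacities (theoretical maxima, not achievable line rates):
fc_port_gbps = 8 / 8                               # one 8Gbps FC port is roughly 1GB/s
pcie_gen2_lane_gbps = 0.5                          # 500MB/s per PCIe Gen 2 lane
x8_slot_gbps = 8 * pcie_gen2_lane_gbps             # 4GB/s for the x8 slots used

print(f"Aggregate requirement: ~{required_gbps:.1f}GB/s, ~{per_hba_gbps:.2f}GB/s per HBA")
print(f"Dual-port HBA link ceiling: ~{2 * fc_port_gbps:.0f}GB/s; x8 Gen 2 slot: {x8_slot_gbps:.0f}GB/s")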
Building a Scalable Test Infrastructure
To build a scalable virtual infrastructure that can support a million+ IOPS, a few tests were conducted initially. A
single VM with 2 vCPUs and 8GB of memory was chosen as the basic building block. This VM was assigned 4
thick virtual disks, each created on a separate VMFS datastore. Each datastore was created on a separate
metaLUN in the VMAX array. Each datastore was accessed through a single 8Gbps FC port in the host, but with
dedicated FC ports on the array. Iometer was configured to issue 100% random, 100% read requests to 6.4GB
worth of storage space on each disk.
With this Iometer profile, the VM produced 120K IOPS and 100K IOPS with 512-byte and 8KB request sizes, respectively. If the vCPU and storage configuration of the VM were doubled, it was theoretically possible to achieve 240K and 200K IOPS with 512-byte and 8KB request sizes, respectively, through a dual-port 8Gbps FC HBA. Thus, a single VM with the following configuration was chosen as the building block for the tests: 4 vCPUs and 8GB of memory, issuing I/O requests to 10 virtual disks (all on separate VMFS datastores) through the two ports of a dual-port adapter. The number of VMs was increased to six, with each VM identical in configuration. The final test setup is shown in Figure 1.
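The projection from the building-block VM to the final six-VM configuration is simple multiplication. The sketch below restates it using the measured building-block figures quoted above; the result is a theoretical upper bound, not a measurement.

# Projecting aggregate IOPS from the measured 2-vCPU building-block VM.
# The 120K/100K IOPS building-block figures are the measurements quoted above.

building_block = {"512B": 120_000, "8KB": 100_000}   # 2 vCPUs, 4 virtual disks, one FC port

# Doubling the vCPU and storage configuration, per the paper's reasoning:
per_vm_projection = {size: 2 * iops for size, iops in building_block.items()}

VMS = 6
aggregate_projection = {size: VMS * iops for size, iops in per_vm_projection.items()}
print(per_vm_projection)      # {'512B': 240000, '8KB': 200000}
print(aggregate_projection)   # {'512B': 1440000, '8KB': 1200000} -- theoretical upper bound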
Assigning Virtual Disks to a VM
To achieve enough parallelism while pushing a high number of I/O operations per second, virtual disks in each VM
were spread across multiple virtual SCSI controllers as follows:
NUMBER OF VDISKS   VIRTUAL SCSI CONTROLLER ID
4                  0
3                  1
3                  2
Table 2. Virtual Disk Assignment for Multi-VM Tests

NUMBER OF VDISKS   VIRTUAL SCSI CONTROLLER ID
10                 0
10                 1
10                 2
10                 3
Table 3. Virtual Disk Assignment for Single-VM Test
Estimation of CPU Cost of an I/O Operation
The %Processor time of the vSphere host was used to estimate the CPU cost of an I/O operation. When all six VMs were actively issuing 8KB I/O requests to their storage, the total %Processor time of the vSphere host was recorded using esxtop [5]. The CPU cost of an I/O operation was estimated using the following equation:
CPU cycles per I/O = (Average %Processor time of the host × Rated CPU clock frequency × Number of logical threads) ÷ (100 × Total I/O operations per second)
The rated clock frequency of the processors in the server was 2.4GHz. By default, hyperthreading was enabled on the vSphere host; hence, the number of logical threads was 80.
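As a worked example of the equation above, the sketch below plugs in the clock frequency and thread count from this paper together with a hypothetical %Processor time and IOPS value; the paper itself reports the per-I/O cost only in normalized form, so the printed number is illustrative, not a published result.

# Worked example of the per-I/O CPU cost estimate described above.
# The 2.4GHz clock and 80 logical threads come from the paper; the 50% processor
# time and 1.05M IOPS are hypothetical inputs used only for illustration.

def cpu_cycles_per_io(avg_pct_processor_time, clock_hz, logical_threads, iops):
    return (avg_pct_processor_time * clock_hz * logical_threads) / (100 * iops)

cost = cpu_cycles_per_io(
    avg_pct_processor_time=50,   # hypothetical esxtop %Processor time
    clock_hz=2.4e9,              # rated clock frequency of the E7-4870
    logical_threads=80,          # 4 sockets x 10 cores x 2 (hyperthreading)
    iops=1_050_000,              # hypothetical aggregate IOPS
)
print(f"~{cost:,.0f} CPU cycles per I/O operation")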
References
1. “VMware vSphere 4 Performance with Extreme I/O Workloads.” VMware, Inc., 2009.
http://www.vmware.com/pdf/vsp_4_extreme_io.pdf.
2. “EMC Symmetrix VMAX storage system.” EMC Corporation, 2011.
http://www.emc.com/collateral/hardware/specification-sheet/h6176-symmetrix-vmax-storage-system.pdf.
3. “LightPulse LPe12002.” Emulex Corporation, 2008.
http://www.emulex.com/products/host-bus-adapters/emulex-branded/lightpulse-lpe12002/overview.html.
4. “Iometer.” http://www.iometer.org/.
5. “Interpreting esxtop Statistics.” VMware, Inc., 2010. http://communities.vmware.com/docs/DOC-9279.
About the Author
Chethan Kumar is a senior member of the Performance Engineering team at VMware, where his work focuses on performance topics concerning databases and storage. He has presented his findings in white papers, blog articles, and technical papers at academic conferences and at VMworld.
Acknowledgements
The author would like to thank VMware's partners (Intel, Emulex, and EMC) for providing the hardware needed to build the infrastructure used for the experiments discussed in this paper. He would also like to thank the Symmetrix Performance Engineering team of Thomas Rogers, John Aurin, John Adams, and Dan Aharoni for installing and configuring the hardware setup and for running all the experiments. Finally, the author would like to thank Chad Sakac, VP of the VMware Alliance at EMC, for supporting this effort and ensuring the availability of the hardware components to complete the exercise in time.
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright © 2011 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at
http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be
trademarks of their respective companies.