vSphere Performance Best Practices
Rob Moran
Premier Services Engineer – VMware Global Support Services – Cork, Ireland
© 2009 VMware Inc. All rights reserved
Global Support Services and Customer Advocacy
 Global coverage: 24x7, 365 days/year
• 6 support centers, 1000+ support engineers
• Support offices: Burlington, Canada; Palo Alto, CA; Cork, Ireland; Broomfield, CO; Tokyo, Japan; Bangalore, India
 Local language support
• Spanish, Portuguese, French, German, Japanese, Chinese
 Follow-the-sun support for Severity 1 issues
 Support relationships with 100% of the Fortune 100; 99% of the Fortune 500
Customer Support Day Events
Coming to a location near you: sharing of VMware best practices!
 Support Days are a collaboration between VMware Support, Sales and customers – you learn directly from the experts
 Topics are driven by customer input, and typically include:
• Best practices
• Tips/tricks
• Top issues
• Product roadmaps/demos
• Certification offerings
http://www.vmware.com/go/supportdays
Overview
What a performance problem sounds like:
• “My VM is running slow and I don’t know what to do!”
• “I tried adding more memory and CPUs but the problem got worse!”
• “My VM is slow on one host but fast on another!”
What to look for? Where to start?
We will explore some of the most common performance-related
issues that our support centers receive cases for
A word about performance…
 Troubleshooting methodology must define:
• How to find root cause
• How to fix the problem
 Must answer these questions:
1. How do we know when we are done?
2. Where do we start looking for problems?
3. How do we know what to look for to identify a problem?
4. How do we find the root-cause of a problem we have identified?
5. What do we change to fix the root-cause?
6. Where do we look next if no problem is found?
Agenda
 Benchmarking & Tools
 Best Practices and Troubleshooting
 The 4 “food groups”
• Memory
• CPU
• Storage
• Network
BENCHMARKING & TOOLS
Benchmarking
 Consistent and reproducible results
 Important to have base level of acceptable performance
• Expectation vs. Acceptable
 Determine baseline of performance prior to deployment
• Benchmark on a physical system if applicable
 Avoid subjective metrics, stay quantitative
• “The system seems slower”
• “This worked better last year”
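Staying quantitative can be as simple as comparing repeated measurements against a recorded baseline. The following is a minimal sketch only: the function name, the sample response times, and the 10% tolerance are hypothetical and should be replaced with values from your own application-level benchmark.

```python
from statistics import mean, stdev

def compare_to_baseline(baseline, current, tolerance_pct=10.0):
    """Return (percent change of the mean, whether it is within tolerance)."""
    base_avg = mean(baseline)
    cur_avg = mean(current)
    change_pct = (cur_avg - base_avg) * 100.0 / base_avg
    return change_pct, abs(change_pct) <= tolerance_pct

# Hypothetical response times in milliseconds from repeated benchmark runs.
baseline_ms = [100.0, 102.0, 98.0, 100.0]
current_ms = [120.0, 118.0, 122.0, 120.0]

change, ok = compare_to_baseline(baseline_ms, current_ms)
print(f"baseline mean={mean(baseline_ms):.1f} ms, stdev={stdev(baseline_ms):.1f} ms")
print(f"change={change:+.1f}%, within tolerance={ok}")
```

A statement such as "mean response time is 20% above baseline" can be acted on; "the system seems slower" cannot.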
Benchmarking
 Benchmarking should be done at the application layer
• Use application-specific benchmarking tools and load generators
• Check with the application vendor
 Isolate variables, benchmark optimum situation before introducing
load
 Understand dependencies
• Human interaction
• Other “food groups”
• Compare apples-to-apples
Tools – vCenter Operations
 Aggregates thousands of metrics into Workload, Capacity, Health
scores
 Self-learns “normal” conditions using patented analytics
 Smart alerts of impending performance and capacity degradation
 Identifies potential performance problems before they start
Tools – esxtop
 Valuable tool built into vSphere hosts
 View or capture real-time data
• View or play back data later
• Import data into 3rd party tools
 vSphere Client performance graphs get their data from the kernel
and VSI
• Presentation/unit may be different (e.g. %RDY)
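Captured esxtop data can be post-processed with a short script. The sketch below assumes a CSV capture (for example from esxtop batch mode) whose column headers contain the counter name; the header layout, host name, and VM names in the sample are hypothetical, so check them against a real capture before relying on the filter string.

```python
import csv
import io

def column_averages(csv_text, name_fragment):
    """Average every column whose header contains name_fragment."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name_fragment in name]
    sums = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        rows += 1
        for i in wanted:
            sums[i] += float(row[i])
    if rows == 0:
        return {}
    return {header[i]: sums[i] / rows for i in wanted}

# Hypothetical two-sample capture; a real file has far more columns and rows.
sample = (
    '"Time","\\\\host1\\Group Cpu(1:vm-a)\\% Ready","\\\\host1\\Group Cpu(2:vm-b)\\% Ready"\n'
    '"01/01/2013 10:00:00","2.0","12.0"\n'
    '"01/01/2013 10:00:05","4.0","8.0"\n'
)

averages = column_averages(sample, "% Ready")
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

The same function can be pointed at any other counter by changing the fragment, which keeps the analysis reproducible between captures.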
MEMORY
Memory – Overhead
 A VM’s RAM is not necessarily machine RAM
• vRAM + overhead = maximum machine RAM
Source: vSphere 5.1 Resource Management Guide
• Note: These are estimated values
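As a worked example of vRAM + overhead = maximum machine RAM, the sketch below adds up the worst-case machine memory for a set of VMs. The VM names and overhead figures are hypothetical placeholders; take real overhead estimates from the Resource Management Guide for your vCPU count and vRAM size.

```python
def max_machine_ram_mb(vram_mb, overhead_mb):
    """Upper bound on machine RAM a VM can consume: configured vRAM plus overhead."""
    return vram_mb + overhead_mb

# Hypothetical VMs: (name, configured vRAM in MB, assumed overhead in MB).
vms = [("web01", 4096, 60), ("db01", 16384, 140)]

total_mb = sum(max_machine_ram_mb(vram, overhead) for _, vram, overhead in vms)
print(f"worst-case machine RAM: {total_mb} MB")
```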
Memory – Transparent Page Sharing
Memory – Host Memory Management
Occurs when memory is under contention
 Ballooning
 Compression
 Swapping
Memory – Ballooning
Memory – Compression
Memory – Swapping
Memory – VM Resource Allocation
Memory – Resource Pool Allocation
Memory – Ballooning vs. Swapping
 Ballooning is better than swapping
 Guest can surrender unused/free pages
 Guest chooses what to swap, can avoid swapping “hot” pages
Memory – Rightsizing
 Generally it is better to OVER-commit than UNDER-commit
 If the running VMs are consuming too much host/pool memory…
• Some VMs may not get physical memory
• Ballooning or host swapping
• Higher disk IO
• All VMs slow down
Memory – Rightsizing
 If a VM has too little vRAM…
• Applications suffer from lack of RAM
• The guest OS swaps
• Increased disk traffic, thrashing
• SAN slowdown as a result of increased disk traffic
 If a VM has too much vRAM…
• Higher overhead memory
• Possible decreased failover capacity
• Longer vMotion time
• Larger VSWP file
• Wasted resources
Memory – Troubleshooting
 Wrong resource allocation
 May not notice a limit, e.g. VM or template with a limit gets cloned
 Custom share values
 Ballooning or swapping at the host level
• Ballooning is a warning sign, not a problem
• Swapping is a performance issue if seen over an extended period
 Swapping/paging at the guest level
• Under-provisioned guest memory
 Missing balloon driver (Tools)
Memory – Best Practices
 Avoid high active host memory over-commitment
• No host swapping occurs when total memory demand is less than the physical
memory (Assuming no limits)
 Right-size guest memory
• Avoid guest OS swapping
 Ensure there is enough vRAM to cover demand peaks
 Use a fully automated DRS cluster
• Use Resource Pools with High/Normal/Low shares
• Avoid using custom shares
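The rule that no host swapping occurs while total memory demand stays below physical memory can be checked with simple arithmetic. This is a sketch under stated assumptions: the per-VM demand figures and the host size are hypothetical, and limits are assumed to be absent, as on the slide.

```python
def host_memory_headroom_mb(physical_mb, demands_mb):
    """Positive: demand fits in physical memory. Negative: expect reclamation."""
    return physical_mb - sum(demands_mb)

# Hypothetical per-VM active memory demand in MB on a 32 GB host.
active_demand_mb = [8192, 12288, 6144]
headroom_mb = host_memory_headroom_mb(32768, active_demand_mb)

if headroom_mb >= 0:
    print(f"demand fits, {headroom_mb} MB headroom")
else:
    print(f"over-committed by {-headroom_mb} MB: ballooning or swapping is likely")
```

Run the same check against peak demand, not only the average, to confirm there is enough room for demand peaks.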
CPU
CPU – Overview
 Raw processing power of a given host or VM
• Hosts provide CPU resources
• VMs and Resource Pools consume CPU resources
 CPU cores/threads need to be shared between VMs
 Fair scheduling vCPU time
• Hardware interrupts for a VM
• Parallel processing for SMP VMs
• I/O
CPU – esxtop
 Interpret the esxtop columns correctly
 %RDY – The percentage of time a VM is ready to run but no physical processor is available to run it, which may result in decreased performance
 %USED – Physical CPU usage
 %SYS – Percentage of time in the VMkernel
 %RUN – Percentage of total scheduled time to run
 %WAIT – Percentage of time in blocked or busy wait states
 %IDLE – %WAIT minus %IDLE can be used to estimate I/O wait time
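The I/O wait estimate from %WAIT and %IDLE is a single subtraction. A minimal sketch, with hypothetical counter values and a function name chosen for illustration:

```python
def estimated_io_wait_pct(wait_pct, idle_pct):
    """Estimate I/O wait as %WAIT minus %IDLE, never below zero."""
    return max(wait_pct - idle_pct, 0.0)

# Hypothetical esxtop readings for one VM.
io_wait = estimated_io_wait_pct(wait_pct=85.0, idle_pct=60.0)
print(f"estimated I/O wait: {io_wait:.1f}%")
```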
CPU – Performance Overhead & Utilization
 Different workloads have different overhead costs (%SYS) even for
the same utilization (%USED)
 CPU virtualization adds varying amounts of system overhead
• Direct execution vs. privileged execution
• Paravirtual adapters vs. emulated adapters
• Virtual hardware (Interrupts!)
• Network and storage I/O
CPU – vSMP
 Relaxed Co-Scheduling: vCPUs can run out-of-sync
 Idle vCPUs incur a scheduling penalty
• Configure only as many vCPUs as needed
• Unneeded vCPUs impose unnecessary scheduling constraints
 Use Uniprocessor VMs for single-threaded applications
CPU – Scheduling
Overcommitting physical CPUs
[Figure: VMkernel CPU Scheduler, shown over three slides]
CPU – Ready Time
 The percentage of time that a vCPU is ready to execute, but waiting
for physical CPU time
 Does not necessarily indicate a problem
• Indicates possible CPU contention or limits
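Ready time may be reported as a percentage (esxtop %RDY) or as milliseconds accumulated over a sampling interval (performance charts). The sketch below converts between the two, assuming the chart value is a millisecond total for one sampling interval; the 20-second interval used in the example is an assumption that matches a real-time chart, so use the interval of the chart you are reading.

```python
def ready_ms_to_pct(ready_ms, interval_s):
    """Convert ready time accumulated over interval_s seconds into a percentage."""
    return ready_ms * 100.0 / (interval_s * 1000.0)

# Hypothetical reading: 1000 ms of ready time within a 20-second sample.
ready_pct = ready_ms_to_pct(ready_ms=1000.0, interval_s=20.0)
print(f"ready time: {ready_pct:.1f}%")
```

Converting both sources to the same unit avoids comparing a millisecond total against a percentage threshold.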
CPU – NUMA nodes
 Non-Uniform Memory Access system architecture
 Each node consists of CPU cores and memory
 A CPU core in one NUMA node can access memory in another
node, but at a small performance cost
[Figure: NUMA node 1 and NUMA node 2]
CPU – Troubleshooting
 vCPU to pCPU over allocation
• HyperThreading does not double CPU capacity!
 Limits or too many reservations
• Can create artificial limits
 Expecting the same consolidation ratios with different workloads
• Virtualizing “easy” systems first, then expanding to heavier systems
• Compare Apples to Apples
• Frequency, turbo, cache sizes, cache sharing, core count, instruction set…
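A quick way to quantify vCPU to pCPU over-allocation is to compute the ratio against physical cores rather than logical (hyperthreaded) processors. The host layout and vCPU counts below are hypothetical; the sketch only shows why counting hyperthreads as full cores understates the ratio.

```python
def vcpu_to_pcpu_ratio(vcpus_per_vm, physical_cpus):
    """Total configured vCPUs divided by the number of physical CPUs counted."""
    return sum(vcpus_per_vm) / physical_cpus

# Hypothetical host: 2 sockets x 8 cores, HyperThreading enabled (32 logical CPUs).
physical_cores = 16
logical_cpus = 32
vcpus = [4, 4, 8, 8, 8, 16]  # hypothetical vCPU count per powered-on VM

ratio_cores = vcpu_to_pcpu_ratio(vcpus, physical_cores)
ratio_threads = vcpu_to_pcpu_ratio(vcpus, logical_cpus)
print(f"vCPU per core: {ratio_cores:.1f}, vCPU per logical CPU: {ratio_threads:.1f}")
```

The per-core figure is the conservative one, since HyperThreading does not double CPU capacity.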
CPU – Best Practices
 Right-size vSMP VMs
 Keep heavy-hitters separated
• Fully automated DRS should do this for you
• Use anti-affinity rules if necessary
 Use a fully automated DRS cluster
• Test that vMotion works
• Use Resource Pools with High/Normal/Low shares
• Avoid using custom shares
STORAGE
Storage – esxtop Counters
 Different esxtop storage views
• Adapter (d)
• VM (v)
• Disk Device (u)
 Key Fields:
• DAVG + KAVG = GAVG
• QUED/USD – Command Queue Depth
• CMDS/s – Commands Per Second
• MBREADS/s
• MBWRTN/s
Storage – Troubleshooting with esxtop
 High DAVG: issue beyond the adapter
• Bad or overloaded zoning, overutilized storage processors, too few platters in the RAID set, etc.
 High KAVG: issue in the kernel storage stack
• Driver issue
• Full queue
 Aborts: GAVG exceeding 5000 ms
• Command will be repeated, storage delay for the VM
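The DAVG + KAVG = GAVG relationship suggests a simple triage step: work out which component dominates the guest-visible latency. The sketch below is illustrative only; the threshold values are assumptions chosen for the example, not figures from this presentation, and should be tuned to your array and workload.

```python
def classify_latency(davg_ms, kavg_ms, davg_limit_ms=20.0, kavg_limit_ms=2.0):
    """Return (GAVG, findings) for one device using assumed thresholds."""
    gavg_ms = davg_ms + kavg_ms
    findings = []
    if davg_ms > davg_limit_ms:
        findings.append("high DAVG: look beyond the adapter (fabric, array)")
    if kavg_ms > kavg_limit_ms:
        findings.append("high KAVG: check driver and queue depth")
    return gavg_ms, findings

# Hypothetical esxtop readings in milliseconds.
gavg, findings = classify_latency(davg_ms=25.0, kavg_ms=0.5)
print(f"GAVG={gavg:.1f} ms")
for finding in findings:
    print(finding)
```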
Storage – Benchmarking with iometer
Storage – Storage I/O Control
 Allows the use of Shares per VMDK
 Throttling occurs when datastore reaches latency threshold
• Higher share VMDKs perform IO first
 vCenter monitors latency across all hosts
• Not effective if datastore shared with other vCenters
Storage – Storage DRS
 Datastore clusters
• Maintenance mode
• Anti-affinity rules
 vCenter monitors for latency and disk space
• Migrate VMDKs for better performance or utilization
 Not effective with automated tiering SANs
• Check HCL to confirm these features are compatible
Storage – Troubleshooting
 Snapshots
 Excessive traffic down one HBA / Switch / SP can cause latency
• Consider using Round Robin in conjunction with ALUA
• Always be paranoid when it comes to monitoring storage I/O
 Consider your I/O patterns
• Peak time for storage IO?
• Virus scans, database maintenance, user logins
 Always consult with array vendor
• They know the best practices for their array!
Storage – Best Practices
 Use different tiers of storage for different VM workloads
• Slower storage for OS VMDKs
• Faster storage for databases or other high-IO applications
 Use the Paravirtual SCSI adapter
• Reduced overhead, higher throughput
 Use path balancing where possible, through either 3rd party plugins or Round Robin with ALUA, if supported
 Use Storage DRS with SIOC
• Balance for both free space and latency
• Simplified datastore management
NETWORK
Network – Load Balancing
 Load balancing defines which uplink is used
• Route based on Port ID
• Route based on IP hash
• Route based on MAC hash
• Route based on NIC load (Load Based Teaming)
 There is a real probability of high-bandwidth VMs ending up on the same physical NIC
 Traffic will stay on elected uplink until an event occurs
• NIC link state change, adding/removing NIC from a team, beacon probe
timeout…
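The chance of two high-bandwidth VMs sharing an uplink can be estimated with a birthday-problem calculation. This is a deliberately simplified model: it assumes each VM is placed on an uplink independently and uniformly at random, which only approximates static policies such as Route based on Port ID, and the function name is hypothetical.

```python
from math import perm

def prob_shared_uplink(num_vms, num_uplinks):
    """Probability that at least two of num_vms VMs land on the same uplink."""
    if num_vms > num_uplinks:
        return 1.0  # more VMs than uplinks: sharing is unavoidable
    distinct = perm(num_uplinks, num_vms)  # placements with no sharing
    return 1.0 - distinct / num_uplinks ** num_vms

for vms, uplinks in [(2, 2), (2, 4), (3, 4)]:
    print(f"{vms} VMs, {uplinks} uplinks: {prob_shared_uplink(vms, uplinks):.3f}")
```

Even with four uplinks, three busy VMs are more likely than not to collide under this model, which is why checking per-NIC counters matters.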
Network – Troubleshooting
 Check counters for NICs and VMs
• Network load imbalance
• 10 Gbps NICs can incur a significant CPU load when running at 100%
 Ensure hardware supports TSO
• Use latest drivers and firmware for your NIC on the host
 For multi-tier VM applications, use DRS affinity rules to keep VMs
on same host
• Same vSwitch / VLAN, rules out physical network
 If using Jumbo Frames, ensure it is enabled end-to-end
Network – Best Practices
 Use the vmxnet3 virtual adapter
• Less CPU overhead
• 10 Gbps connection to vSwitch
 Use the latest driver/firmware for the NICs on the host
 Use network shares
• Requires Virtual Distributed Switch 4.1
 Isolate vMotion and iSCSI traffic from regular VM traffic
• Separate vSwitches with dedicated NIC(s)
• Most applicable with Gigabit NICs
How to measure the network?
 scp to/from an ESXi host is not a valid check!
 scp involves the underlying storage on the source and destination VM/host
 CPU can affect the test, because scp encrypts/decrypts the network flow
 Copying to an ESXi host can give a false result, as the management interface has very limited resources
How to check network performance?
 VM – VM on the same ESXi host: excludes physical network problems
 VM – VM on different ESXi hosts: also involves the physical NICs and switch
 Physical – VM: also tests physical devices, but lets us focus on one VM
 Physical – Physical: gives us a number for what to expect
 Use iperf/jperf/netperf: free tools for network testing
Iperf
 Windows and Linux versions
 Does not use storage
 Different test options are available (UDP/TCP)
 Automatically calculates bandwidth
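The reason such tools avoid the scp pitfalls is that they generate the payload in memory and time only the network transfer. The sketch below illustrates that technique with plain sockets over loopback; it is not iperf, the function name and sizes are hypothetical, and a loopback figure says nothing about a physical network.

```python
import socket
import threading
import time

def measure_tcp_throughput(total_bytes=4 * 1024 * 1024, chunk_size=64 * 1024):
    """Send total_bytes over loopback TCP; return (bytes received, Mbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {"received": 0}

    def receive_all():
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk_size)
                if not data:  # sender closed the connection
                    break
                result["received"] += len(data)

    receiver = threading.Thread(target=receive_all)
    receiver.start()

    payload = b"\0" * chunk_size  # generated in memory, no disk involved
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += chunk_size
    receiver.join()  # wait until the receiver has drained the stream
    elapsed = time.perf_counter() - start
    server.close()
    return result["received"], result["received"] * 8 / elapsed / 1e6

received, mbit_per_s = measure_tcp_throughput()
print(f"{received} bytes transferred, {mbit_per_s:.0f} Mbit/s")
```

For real measurements, run a purpose-built tool between the endpoints listed above and compare the results against the Physical – Physical baseline.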
In conclusion…
Key Takeaways – Performance Best Practices
 Understand your environment
• Hardware, storage, networking
• VMs & applications
 Advanced configuration values do not need to be tweaked or
modified
• In almost all situations
 Use fully automated DRS
 Use Paravirtual hardware
Important Links