ESX Performance Troubleshooting
VMware Technical Support
Broomfield, Colorado
Confidential
© 2009 VMware Inc. All rights reserved
What is slow performance?
•What does slow performance mean?
• Application responds slowly - latency
• Application takes longer time to do a job – throughput
Both are related to time
•Interpretation varies wildly
• Slower than expected
• Throughput is low
• Latency is high
• Throughput, latency fine but uses excessive resources (efficiency)
•What are high latency, low throughput, and excessive
resource usage?
• These are subjective and relative
Bandwidth, Throughput, Goodput, Latency
Bandwidth vs. Throughput
• Higher Bandwidth does not guarantee Throughput.
• Low Bandwidth is a bottleneck for higher Throughput
Throughput vs. Goodput
• Higher Throughput does not mean higher Goodput
• Low Throughput is indicative of lower Goodput
Efficiency = Goodput/Bandwidth
Throughput vs. Latency
• Low Latency does not guarantee higher Throughput and vice versa
• Throughput or Latency alone can dominate performance
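A small numeric sketch of the efficiency relationship above (Python; all values are hypothetical):

    # Hypothetical 1 Gbit/s link: bandwidth is what it could carry, throughput
    # is what was actually moved (headers, retransmits included), goodput is
    # the application payload that arrived exactly once.
    bandwidth_mbps  = 1000.0
    throughput_mbps = 700.0
    goodput_mbps    = 550.0
    efficiency = goodput_mbps / bandwidth_mbps
    print(f"Efficiency = {goodput_mbps}/{bandwidth_mbps} = {efficiency:.0%}")   # -> 55%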
Bandwidth, Throughput, Goodput, Latency
[Diagram: relationship between Bandwidth (capacity), Throughput (achieved rate), Goodput (useful portion of throughput), and Latency]
How to measure performance?
Higher throughput does not necessarily mean higher
performance – Goodput could be low
Throughput is easy to measure, but Goodput is not
How do we measure performance?
• Performance is actually never measured
• We can only quantify the different metrics that affect performance.
These metrics describe the state of: CPU, memory, disk and
network
Performance Metrics
CPU
• Throughput: MIPS (%used); Goodput: useful instructions
• Latency: instruction latency (cache latency, cache misses)
Memory
• Throughput: MB/s; Goodput: useful data
• Latency: nanoseconds
Storage
• Throughput: MB/s, IOPS; Goodput: useful data
• Latency: seek time
Networking
• Throughput: MB/s, packets/s; Goodput: useful traffic
• Latency: microseconds
Hardware and Performance
CPU
• Processor architecture: Intel Xeon, AMD Opteron
• Processor cache – L1, L2, L3, TLB
• Hyperthreading
• NUMA
Hardware and Performance
Processor Architecture
• Clock speeds from one architecture are not comparable with another
 P-III outperforms P4 on a clock-by-clock basis
 Opteron outperforms P4 on a clock-by-clock basis
• A higher clock speed is not always beneficial
 Bigger cache or better architecture may outperform higher clock speeds
• Processor memory communication is often the performance
bottleneck
 The processor wastes hundreds of instruction cycles while waiting on memory access
 Caching alleviates this issue
Hardware and Performance
Processor Cache
• Cache reduces memory access latency
• Bigger cache increases cache hit probability
• Why not build a bigger cache?
 Expensive
 Cache access latency increases with cache size
• Cache is built in levels – L1, L2, L3 – with increasing access latency
• ESX benefits from larger cache sizes
• L3 cache seems to boost performance of networking workloads
Hardware and Performance
TLB – Translation Lookaside Buffer
• Every running process needs virtual address (VA) to physical
address (PA) translation
• Historically this translation was done entirely from page tables in memory
• Since memory access is significantly slower and a translation is needed on every memory access, the TLB was introduced
• The TLB is hardware circuitry that caches VA to PA mappings
• When a VA is not in the TLB, a TLB miss occurs and the mapping has to be brought into the TLB from the page tables (load latency)
• Performance of application depends on effective use of TLB
• TLB is flushed during context switch
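A toy Python model of the idea (not how the hardware is built): the TLB is a small cache in front of the page-table walk, and a miss pays the cost of that walk. All addresses below are made up.

    # Toy model: TLB as a cache in front of a page-table lookup.
    page_table = {0x1000: 0x7000, 0x2000: 0x9000}   # VA page -> PA page (hypothetical)
    tlb = {}                                        # cached VA -> PA mappings

    def translate(va_page):
        if va_page in tlb:                 # TLB hit: cheap
            return tlb[va_page], "hit"
        pa_page = page_table[va_page]      # TLB miss: pay for the page-table walk
        tlb[va_page] = pa_page             # fill the TLB for next time
        return pa_page, "miss"

    print(translate(0x1000))   # miss on first touch
    print(translate(0x1000))   # hit on the second access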
Hardware and performance
Hyperthreading
• Introduced with Pentium 4 and Xeon processors
• Allows simultaneous execution of two threads on a single processor
• HT maintains separate architectural state for each logical processor but shares the underlying processor resources such as execution units and caches
• HT strives to improve throughput by letting one logical processor use the cycles the other wastes on stalls
• HT performance can be worse than non-HT (uniprocessor) performance if the threads individually have high cache hit rates (above roughly 50%), since they now share the cache
Hardware and Performance
Multicores
• Each core has its own L1 cache
• The L2 cache is shared between the cores
• Cache coherency is relatively fast compared to multi-socket SMP systems
• Performance scaling is the same as on SMP systems
Hardware and performance
NUMA
• Memory contention increases as the number of processors increases
• NUMA alleviates memory contention by giving each processor local memory
Hardware and Performance - Memory
Node Interleaving
• Opteron processors support two types of memory access – NUMA and node interleaving mode
• Node interleaving mode alternates memory pages between processor nodes so that memory latencies are made uniform. This can improve performance for systems that are not NUMA aware
• On single-core Opteron systems, a NUMA node contains only one core
• An SMP VM on ESX running on a single-core Opteron system therefore has to access memory across the NUMA boundary, so SMP VMs may benefit from node interleaving
• On dual-core Opteron systems a single NUMA node has two cores, so NUMA mode can be left on
Hardware and Performance – I/O devices
I/O Devices
• PCI-E, PCI-X, PCI
 PCI at 66MHz – 533 MB/s
 PCI-X at 133 MHz – 1066 MB/s
 PCI-X at 266 MHz – 2133 MB/s
 PCI-E bandwidth depends on the number of lanes – each lane adds 250 MB/s, so an x16 link provides about 4 GB/s (see the sketch after this list)
• PCI bus saturation – dual port, quad port devices
 In the PCI protocol the bus bandwidth is shared by all devices on the bus; only one device can transmit at a time
 PCI-E allows parallel, full-duplex transmission through its lanes
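The lane arithmetic from the list above as a quick Python check:

    # PCI-E 1.x: each lane adds roughly 250 MB/s per direction.
    mb_per_lane = 250
    for lanes in (1, 4, 8, 16):
        print(f"x{lanes:<2} link: ~{lanes * mb_per_lane} MB/s per direction")
    # x16 -> ~4000 MB/s, i.e. the ~4 GB/s figure quoted above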
Hardware and Performance – I/O Devices
SCSI
• Ultra3/Ultra160 SCSI – 160 MB/s
• Ultra320 SCSI – 320 MB/s
• SAS 3 Gbps – 300 MB/s, full duplex
FC
• Speed is constrained by the medium and laser wavelength
• Link speeds (full duplex): 1G FC – 200 MB/s, 2G – 400 MB/s, 4G – 800 MB/s, 8G – 1600 MB/s
ESX Architecture
Performance Perspective
ESX Architecture – Performance Perspective
CPU Virtualization – Virtual Machine Monitor
• ESX does not trap and emulate every instruction; the x86 architecture does not allow this
• System calls and Faults are trapped by the monitor
• Guest code runs in one of three contexts
 Direct execution
 Monitor code (fault handling)
 Binary Translation (BT - non virtualizable instructions)
• BT behaves much like a JIT compiler
• Previously translated code fragments are stored in translation cache
and reused – saves translation overhead
ESX Architecture – Performance Implications
Virtual Machine Monitor – Performance implications
• Programs that don’t fault or invoke system calls run at near-native speed – e.g. gzip
• Micro-benchmarks that do nothing but invoke system calls will consist of little but monitor overhead
• Translation overhead varies with different Privileged instructions.
Translation cache tries to offset some of the overhead.
• Applications will have varying amounts of monitor overhead depending on their call stack profile.
• Call stack profile of an application can vary depending on its
workload, errors and other factors.
• It is hard to generalize monitor overheads for any workload. Monitor
overheads measured for an application are strictly applicable only to
“Identical” test conditions.
ESX Architecture – Performance Perspective
Memory virtualization
• Modern OSes set up page tables for each running process; the x86 paging hardware (TLB) caches VA to PA mappings
• Page table shadowing – additional level of indirection
 VMM maintains PA to MA mappings and composes them with the guest page tables into a shadow page table
 Allows the guest to use the x86 paging hardware, which walks the shadow table
• MMU updates
 VMM write protects shadow page tables (trace)
 When the guest updates its page tables, the monitor kicks in (page fault) and keeps the shadow page table consistent with the guest page table
• Hidden page faults
 Trace faults are hidden from the guest OS – they are pure monitor overhead
 Hidden page faults are similar to TLB misses in native environments
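A toy Python illustration of the extra level of indirection (dicts stand in for page tables; all addresses are made up): the guest maintains VA to PA, the VMM maintains PA to MA, and the shadow page table the hardware actually walks is their composition, VA to MA.

    # Guest page table: virtual address (VA) -> guest "physical" address (PA).
    guest_pt = {0x1000: 0x5000, 0x2000: 0x6000}
    # VMM mapping: guest "physical" (PA) -> machine address (MA) on the host.
    pmap = {0x5000: 0xA3000, 0x6000: 0xB7000}

    # Shadow page table walked by the hardware: VA -> MA.
    shadow_pt = {va: pmap[pa] for va, pa in guest_pt.items()}
    print(shadow_pt)
    # When the guest edits guest_pt, the write-protect "trace" fault lets the
    # monitor update shadow_pt so the two stay consistent.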
ESX Architecture – Performance Perspective
Page table shadowing
ESX Architecture – Performance Implications
Context Switches
• On native hardware the TLB is flushed during a context switch; the newly switched-in process incurs TLB misses on its first memory accesses
• The VMM caches page table entries (PTEs) across context switches (caching MMU), trying to keep the shadow PTEs consistent with the guest PTEs
• If there are lots of processes running in the guest and they context switch frequently, the VMM may run out of page-table cache. The workload=terminalservices vmx option increases this cache size
Process creation
• Every new process requires new page table mappings, so MMU updates are frequent
• Shell scripts that spawn many commands can cause MMU overhead
ESX Architecture – Performance Perspective
I/O Path
ESX Architecture – Performance Perspective
I/O Virtualization
• I/O devices are not virtualizable, so virtual devices are emulated for the guest OS
• VMkernel handles Storage and Networking devices directly as they
are performance critical in server environments. CDROM, floppy
devices are handled by the service console.
• I/O is interrupt driven and therefore incurs monitor overhead. All I/O
goes through VMkernel and involves a context switch from VMM to
VMKernel
• Networking devices have low latency, so delay due to context switches can hamper throughput
• The VMkernel fields I/O interrupts and delivers them to the correct VM. From ESX 2.1, the VMkernel delivers interrupts to an idle processor.
ESX Architecture – Performance Perspective
Virtual Networking
• Virtual NICs
 The queue buffer can overflow
- if the packet tx/rx rate is high
- if the VM is not scheduled frequently enough
 VMs are scheduled when they have packets for delivery
 Idle VMs still receive broadcast frames, which wastes CPU resources
 Guest speed/duplex settings are irrelevant
• Virtual switches don’t need to learn MAC addresses
 VMs register their MAC addresses, so the virtual switch knows where each MAC lives
• VMnics
 Listen for the MAC addresses registered by the VMs
 Layer 2 broadcast frames are passed up
ESX Architecture – Performance Perspective
NIC Teaming
• Teaming only provides outbound load balancing
• NICs with different capabilities could be teamed
 Least common Capability in the bond is used
• Out-MAC mode scales with number of VMs/virtual NICs. Traffic from
a single virtual NIC is never load balanced.
• Out-IP scales with the number of unique TCP/IP sessions (see the sketch after this list).
• Incoming traffic can come on the same NIC. Link aggregation on the
physical switches provides inbound load balancing.
• Packet reflections can cause performance hits in the guest OS. No
empirical data available.
• We fail back when the link comes alive again.
 Performance could be affected if the link flip-flops.
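A rough Python sketch of why out-IP scaling follows the number of distinct sessions (the hash below is purely illustrative, not the algorithm ESX uses):

    # Illustrative source/destination hash -> uplink choice.
    def pick_uplink(src_ip, dst_ip, num_uplinks):
        return hash((src_ip, dst_ip)) % num_uplinks

    uplinks = 2
    # A single VM talking to a single peer always maps to the same uplink...
    print(pick_uplink("10.0.0.5", "10.0.0.9", uplinks))
    print(pick_uplink("10.0.0.5", "10.0.0.9", uplinks))   # same answer every time
    # ...while many distinct peers typically spread across the team.
    print({pick_uplink("10.0.0.5", f"10.0.1.{i}", uplinks) for i in range(20)})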
ESX Architecture – Performance Perspective
vmxnet optimizations
• vmxnet handles cluster of packets at once – reduces context
switches and interrupts
• Clustering kicks in only when the packet receive/transmit rate is
high.
• vmxnet shares memory area with VMkernel – reduces copying
overhead
• vmxnet can take advantage of TCP checksum and Segmentation
offloading (TSO)
• NIC morphing – allows loading the vmxnet driver for a vlance virtual device. The driver probes a register exposed by the vlance device.
• Performance of a NIC-morphed vlance device is the same as that of a vmxnet virtual device.
ESX Architecture – Performance Perspective
SCSI performance
• Queue depth determines SCSI throughput. When the queue is full, SCSI I/Os are blocked, limiting effective throughput (see the sketch after this list).
• Stages of queuing
 BusLogic/LSILogic -> VMkernel queue -> VMkernel driver queue -> device firmware queue -> queue depth of the LUN
• Sched.numrequestOutstanding – number of outstanding I/O
commands per VM – see KB 1269
• The BusLogic driver in Windows limits the queue depth to 1 – see KB 1890
• Registry settings available for maximizing queue depth for LSILogic
adapter (Maximum Number of Concurrent I/Os)
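A back-of-the-envelope Python view of why the queue depth caps throughput (Little's law; the 5 ms latency is a made-up figure):

    # Achievable IOPS is roughly bounded by outstanding I/Os / per-I/O latency.
    service_time_s = 0.005                      # 5 ms average I/O latency (hypothetical)
    for queue_depth in (1, 8, 32):
        iops = queue_depth / service_time_s
        print(f"queue depth {queue_depth:>2}: up to ~{iops:,.0f} IOPS")
    # Queue depth 1 (the Windows BusLogic case, KB 1890) caps the VM at ~200 IOPS
    # here, regardless of what the array behind it can deliver.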
ESX Architecture – Performance Perspective
VMFS
• Uses larger block sizes (1MB default)
 Larger block size reduces Metadata size – metadata is completely cached
in memory
 Near-native speed is possible because metadata overhead is removed
 Fewer I/O operations. Improves read-ahead cache hits for sequential
reads
• Spanning
 Data spills onto the next LUN sequentially after the previous one fills. There is no striping.
 Does not offer performance improvements.
• Distributed Access
 Multiple ESX hosts can access the VMFS volume; only one ESX host updates the metadata at a time
ESX Architecture – Performance Perspective
VMFS
• Volume Locking
 Metadata updates are performed through a locking mechanism
 SCSI reservation is used to lock the volume
 Do not confuse this locking with the file level locks implemented in the
VMFS volume for different access modes
• SCSI reservation
 SCSI reservation blocks all I/O operations until the lock is released by the
owner
 SCSI reservation is held usually for a very short time and released as
soon as the update is performed
 SCSI reservation conflict happens when SCSI reservation is attempted on
a volume that is already locked. This usually happens when multiple ESX
hosts contend for metadata updates
ESX Architecture – Performance Perspective
VMFS
• Contention for metadata updates
 Redo log updates from multiple ESX hosts
 Template deployment with redo log activity
 Anything that changes/modifies file permission on every ESX host
• VMFS 3.0 uses a new volume locking mechanism that significantly reduces the number of SCSI reservations used
ESX Architecture – Performance Perspective
Service Console
• The service console can share interrupt lines with the VMkernel. Shared interrupt lines reduce the performance of I/O devices – KB 1290
• MKS is handled in the service console in ESX 2.x, and its performance is determined by the resources available in the COS
• The default Min CPU allocated is 8% and may not be sufficient if
there are lots of VMs running
• Memory recommendations for the service console do not account for memory that will be used by management agents
• Scalability of VMs is limited by the COS in ESX 2.x. ESX 3.x avoids this problem by moving these functions into VMkernel userworlds.
Understanding ESX Resource
Management & Over-Commitment
ESX Resource Management
Scheduling
• Only one VCPU runs on a CPU at any time
• Scheduler tries to run the VM on the same CPU as much as possible
• The scheduler can move VMs to other processors when it has to meet the CPU demands of the VM
Co-scheduling
• SMP VMs are co-scheduled, i.e. all the VCPUs run on their own
PCPUs/LCPUs simultaneously
• Co-scheduling facilitates synchronization/communication between
processors, like in the case of spinlock wait between CPUs
• Scheduler can run a VCPU without the other for a short period of time (1.5
ms)
• The guest could halt a co-scheduled VCPU it is not using, but Windows doesn’t seem to halt the CPU – this wastes CPU cycles
ESX Resource Management
NUMA Scheduling
• Scheduler tries to schedule the world within the same NUMA node
so that cross NUMA migrations are fewer
• If a VM’s memory pages are split between NUMA nodes, the
memory scheduler slowly migrates all the VM’s pages to the local
node. Over time the system becomes completely NUMA balanced.
• On NUMA architectures, CPU utilization per NUMA node gives a better idea of CPU contention
• When interpreting %ready, factor in the CPU contention within the same NUMA node
ESX Resource Management
Hyperthreading
• Hyperthreading support was added in ESX 2.1 and is recommended
• Hyperthreading increases scheduler’s flexibility especially in the
case of running SMP VMs with UP VMs
• A VM scheduled on an LCPU is charged only half the “package seconds”
• The scheduler tries to avoid scheduling an SMP VM onto the logical CPUs of the same package
• A high-priority VM may be scheduled to a package with one of its LCPUs halted – this prevents other running worlds from using the same package
ESX Resource Management
HTSharing
• Controls hyperthreading behavior with individual VMs.
• htsharing=any
 Virtual CPUs could be scheduled on any LCPUs. Most flexible option for the
scheduler.
• htsharing=none
 Excludes sharing of LCPUs with other VMs. VM with this option gets a full package
or never gets scheduled.
 Essentially this excludes the VM from using logical CPUs (useful for the security
paranoid). Use this option if an application in the VM is known to perform poorly with
HT.
• htsharing=internal
 Applies to SMP VMs only. This is the same as none, but allows the VCPUs of the same VM to share a package. Best of both worlds for SMP VMs.
 For UP VMs this translates to none
ESX Resource Management
HT Quarantining
• ESX uses P4 Performance counters to constantly evaluate HT
performance of running worlds
• If a VM appears to interact badly with HT, the VM is automatically
placed into a quarantining mode (i.e. htsharing is set to none)
• If the bad events disappear, the VM is automatically pulled back
from quarantining mode
• Quarantining is completely transparent
ESX Resource Management
CPU affinity
• Defines a subset of LCPUs/PCPUs that a world could run on
• Useful to
 Partition a server between departments
 Troubleshoot system reliability issues
 Set NUMA affinity manually in ESX 1.5.x
 Run applications that benefit from cache affinity
• Caveats
 Worlds that don’t have affinity can run on any CPU, so they have better chance of
getting scheduled
 Affinity reduces the scheduler’s ability to maintain fairness – min CPU guarantees may not be possible under some circumstances
 NUMA optimizations (page migrations) are excluded for VMs that have CPU affinity
(can enforce manual memory affinity)
 SMP VMs should not be pinned to LCPUs
 Disallows vMotion operations
ESX Resource Management
Proportional Shares
• Shares are used only when there is a resource contention
• Unused shares (shares of a halting/idling VM) are partitioned across
active VMs.
• In ESX 2.x shares operate on a flat namespace
• Changing shares of one world affects the effective CPU cycles
received by other running worlds.
• If VMs use a different share scale then shares for other worlds
should be changed to the same scale
ESX Resource Management
Minimum CPU
• Guarantees CPU resources when the VM requests them
• Unused resources are not wasted; they are given to other worlds that require them
• Setting min CPU to 100% (200% in case of SMP) ensures that the
VM is not bound by the CPU resource limits
• Using min CPU is favored over using CPU affinity or proportional
shares
• Admission control verifies if Min CPUs could be guaranteed when
the VM is powered on or VMotioned
ESX Resource Management
Demystifying “Ready” time
• A powered-on VM is either running, halted, or in the ready state
• Ready time signifies the time spent by a VM on the run queue waiting to be
scheduled
• Ready time accrues when more than one world wants to run at the same
time on the same CPU
 PCPU, VCPU over-commitment with CPU intensive workloads
 Scheduler constraints - CPU affinity settings
• Higher ready time increases response times and job completion times
• Total accrued ready time is not useful
 VM could have accrued ready time during their runtime without incurring performance
loss (for example during boot)
• %ready = ready time accrual rate
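A small Python sketch of turning accrued ready time into %ready over a sampling interval (the sample numbers are hypothetical):

    # %ready for an interval = ready time accrued in the interval / interval length.
    ready_ms_start, ready_ms_end = 12340, 13540   # accrued ready-time counters (hypothetical)
    interval_ms = 5000                            # 5 second sampling interval
    pct_ready = 100.0 * (ready_ms_end - ready_ms_start) / interval_ms
    print(f"%ready = {pct_ready:.0f}%")           # -> 24%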
ESX Resource Management
Demystifying “Ready” time
• There are no good/bad values for %ready.
 Depends on the priority of the VMs - latency sensitive applications may
require less or no ready time
• Ready time could be reduced by increasing the priority of the VM
 Allocate more shares, set minCPU, remove CPU affinity
ESX Resource Management
Unexplained “Ready” time
• If the VM accrues ready time while there are enough CPU resources
then it is called “Unexplained Ready time”
• There is some belief in the field that such a thing actually exists – hard to prove or disprove
• Very hard to determine if CPU resources are available when ready
time accrues
 CPU utilization is not a good indicator of CPU contention
 Burstiness is very hard to determine
 NUMA boundaries – All VMs may contend within the same NUMA node
 Misunderstanding of how scheduler works
ESX Resource Management
Resource Management in ESX 3.0
• Resource Pools
 Extends hierarchy. Shares operate within the resource pool domain.
• MHz
 Resource allocations are absolute, based on clock cycles. Percentage-based allocations would vary with processor speed.
• Clusters
 Aggregates resources from multiple ESX hosts
Resource Over-Commitment
CPU Over-Commitment
• Scheduling
 Too many things to do!
 Symptoms: high %ready
 Judicious use of SMP
• CPU utilization
 Too much to do!
 Symptoms: 100% CPU
 Things to watch
- Misbehaving applications inside the guest
- Do not rely on Guest CPU utilization – halting issues, timer interrupts
- Some applications/services seem to impact guest halting behavior. No longer tied
to SMP HALs.
Resource Over-Commitment
CPU Over-Commitment
• Higher CPU utilization does not necessarily mean lower performance
 The application’s progress is not necessarily affected by higher CPU utilization
 However if higher CPU utilization is due to monitor overheads then it may
impact performance by increasing latency
 When there is no headroom (100% CPU), performance degrades
• 100% CPU utilization and high %ready have almost identical effects – both delay application progress
• CPU Over-Commitment could lead to other performance problems
 Dropped network packets
 Poor I/O throughput
 Higher latency, poor response time
Resource Over-Commitment
Memory Over-Commitment
• Guest Swapping - Warning
 Guest page faults while swapping.
 Performance is affected both by the guest swapping itself and by the monitor overhead of handling page faults.
 Additional disk I/O
• Ballooning – Serious
• VMkernel Swapping - Critical
• COS Swapping - Critical
 The VMX process could stall and affect the progress of the VM
 The VMX process could be the victim when the kernel kills processes under memory pressure
 The COS requires additional CPU cycles to handle frequent page faults and disk I/O
• Memory shares determine the rate of ballooning/swapping
Resource Over-Commitment
Memory Over-Commitment
• Ballooning
 Ballooning/swapping stalls the processor and increases delay
 Windows VMs touch all allocated memory pages during boot. Memory pages touched by the guest can be reclaimed only by ballooning
 A Linux guest touches memory pages on demand. Ballooning kicks in only when the guest is under full memory pressure
 Ballooning could be avoided by using min=max
 /proc/vmware/sched/mem
- size <> sizetgt indicates memory pressure
- mctl > mctlgt – ballooning out (giving away pages)
- mctl < mctlgt – ballooning in (taking in pages)
 Memory shares affect ballooning rate
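A small Python helper that applies the mctl/mctlgt rule above (field names follow this slide; the sample values are made up and would normally come from /proc/vmware/sched/mem):

    # Interpret the balloon size (mctl) against its target (mctlgt).
    def balloon_state(mctl, mctlgt):
        if mctl > mctlgt:
            return "ballooning out (giving away pages)"
        if mctl < mctlgt:
            return "ballooning in (taking in pages)"
        return "balloon at target"

    print(balloon_state(mctl=2048, mctlgt=8192))   # -> ballooning in (taking in pages)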
Resource Over-Commitment
Memory Over-Commitment
• VMKernel Swapping
 Processor stalls due to VMkernel swapping are more expensive than
ballooning (due to disk I/O)
 Do not confuse this with
- Swap reservation: swap is always reserved for the worst case; if min <> max, reservation = max – min
- Total swapped pages: Only current swap I/O affects performance
 /proc/vmware/sched/mem-verbose
- swpd – total pages swapped
- swapin, swapout – swap I/O activity
 SCSI I/O delays during VMKernel I/O swapping could result in system
reliability issues
Resource Over-Commitment
I/O bottlenecks
• PCI Bus saturation
• Target device saturation
 Easy to saturate storage arrays if the topology is not designed correctly for load
distribution
• Packet drops
 Effective throughput reduces
 Retransmissions can cause congestion
 Window size scales down in the case of TCP
• Latency affects throughput
 TCP is very sensitive to Latency and packet drops
• Broadcast traffic
 Multicast and broadcast traffic is sent to all VMs.
• Keep an eye on Pkts/sec and IOPS and not just bandwidth consumption
ESX Performance
Application Performance issues
ESX Performance – Application Issues
Before we begin
• From the VM’s perspective, a running application is just an x86 workload.
• Any application performance tuning that makes the application run more efficiently will help
• Application performance can vary between versions
 New version could be more or less efficient
 Tuning recommendations could change
• Application behavior could change based on its configuration
• Application performance tuning requires intimate knowledge of how the application behaves
• Nobody at VMware specializes in application performance tuning
 Vendors should optimize their software with the thought that the hardware resources
could be shared by other Operating Systems.
 TAP program
- SpringSource (unit of VMware) – Provides developer support for API scripting
ESX Performance – Application issues
Citrix
• Roughly 50-60% monitor overhead – takes 50-60% more CPU cycles than
on the native machine
• The maximum-users limit is hit when the CPU is maxed out – roughly 50% of the users that would be seen in a native environment in an apples-to-apples comparison.
• Citrix Logon delays
 This could happen even on native machines when roaming profiles are configured.
Refer Citrix and MS KB articles
 Monitor overhead can introduce logon delays
• Workarounds
 Disable com ports, workload=terminalservices, disable unused apps, scale
horizontally
• ESX 3.0 improves Citrix performance – roughly 70-80% of native
performance
ESX Performance – Application issues
Database performance
• Scales well with vSMP – recommended
 Exceptions: Pervasive SQL – not optimized for SMP
• Two key parameters for database workloads
 Response time
- Transaction logs
 CPU utilization
• Understanding SQL performance is complex. Most enterprise
databases run some sort of query optimizer that changes the SQL
Engine parameters dynamically
 Performance will vary with run time. Typically benchmarking is done after
priming the database
• Memory resource is key. SQL performance can vary a lot depending
on the available memory.
ESX Performance – Application Issues
Lotus Domino Server
• One of the better performing workloads. 80-90% of direct_exec
• CPU and I/O intensive
• Scalability issues – Not a good idea to run all domino servers on the
same ESX server.
ESX Performance – Application Issues
16-bit applications
• 16-bit applications on Windows NT/2000 and above run in a sandboxed virtual machine (NTVDM)
• 16-bit apps depend on segmentation – possible monitor overhead
• Some 16-bit apps seem to spin in an idle loop instead of halting the CPU
• No performance studies done yet
 No compelling application
ESX Performance – Application Issues
Netperf – throughput
• Max Throughput is bound by a variety of parameters
 Available Bandwidth, TCP window size, available CPU cycles
• VM incurs additional CPU overhead for I/O
• CPU utilization for networking varies with
 Socket buffer size, MTU – affects the number of I/O operations performed
 Driver – vmxnet consumes fewer CPU cycles
 Offloading features – depending on the driver settings and NIC
capabilities
• For most applications, throughput is not the bottleneck
 Measuring throughput and improving it may not always resolve the
underlying performance issue
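One of those bounds made concrete in Python: with a fixed TCP window, a single connection cannot exceed window/RTT no matter how much bandwidth is available (the window size and RTT values are hypothetical):

    # Max single-stream TCP throughput ~= window size / round-trip time.
    window_bytes = 64 * 1024                       # 64 KB socket buffer / TCP window
    for rtt_ms in (0.1, 1.0, 10.0):
        mbit_per_s = (window_bytes * 8) / (rtt_ms / 1000.0) / 1e6
        print(f"RTT {rtt_ms:>4} ms -> at most ~{mbit_per_s:,.0f} Mbit/s")
    # Extra latency (scheduling delay, drops and retransmits) directly caps throughput.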
ESX Performance – Application Issues
Netperf – Latency
• Latency plays an important role for many applications
• Latency can increase
 When there are too many VMs to schedule
 VM is CPU bound
 Packets are dropped and then re-transmitted
ESX Performance – Application Issues
Compiler Workloads
• MMU intensive: Lots of new processes created, context switched,
and destroyed.
• SMP VM may hurt performance
 Many compiler workloads are not optimized for SMP
 Process threads could ping-pong between the vCPUs
• Workarounds:
 Disable NPTL
 Try UP (don’t forget to change the HAL)
 Workload=terminalservices might help
ESX Performance Forensics
ESX Performance Forensics
Troubleshooting Methodology
• Understand the problem.
 Pay attention to all the symptoms
 Pay less attention to subjective metrics.
• Know the mechanics of the application
 Find how the application works
 What resources it uses, and how it interacts with the rest of the system
• Identify the key bottleneck
 Look for clues in the data and see if that could be related to the symptoms
 Eliminate CPU, Disk I/O, Networking I/O, Memory bottlenecks by running
tests
• Running the right test is critical.
ESX Performance Forensics
Isolating memory bottlenecks
• Ballooning
• Swapping
• Guest MMU overheads
ESX Performance Forensics
Isolating Networking Bottlenecks
• Speed/Duplex settings
• Link state flapping
• NIC Saturation /Load balancing
• Packet drops
• Rx/Tx Queue Overflow
ESX Performance Forensics
Isolating Disk I/O bottlenecks
• Queue depth
• Path thrashing
• LUN thrashing
ESX Performance Forensics
Isolating CPU bottlenecks
• CPU utilization
• CPU scheduling contention
• Guest CPU usage
• Monitor Overhead
ESX Performance Forensics
Isolating Monitor overhead
• Procedures for release builds
 Collect performance snapshots
• Monitor Components
ESX Performance Forensics
Collecting Performance Snapshots
• Duration
• Delay
• Proc nodes
• Running esxtop on performance snapshots
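If the snapshot has been exported with esxtop’s batch (CSV) mode, a minimal Python sketch for seeing which counter columns are available (the file name, and the assumption that the first row holds the counter names, should be checked against your esxtop version):

    import csv

    # List the counter columns in an esxtop batch-mode CSV export.
    with open("esxtop-batch.csv", newline="") as f:
        header = next(csv.reader(f))
    for name in header[:10]:          # first few counter names
        print(name)
    print(f"... {len(header)} columns total")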
ESX Performance Forensics
Collecting Benchmarking numbers
• Client side benchmarks
• Running benchmarks inside the guest
ESX Performance
Troubleshooting - Summary
ESX Performance Troubleshooting - Summary
Key points
• Address real performance issues. Lots of time can be wasted spinning wheels on theoretical benchmarking studies
• Real performance issues can easily be described by the end user of the application
• There is no magical configuration parameter that will solve all performance
problems
• ESX performance problems are resolved by
 Re-architecting the deployment
 Tuning application
 Applying workarounds to circumvent bad workloads
 Moving to a newer version that addresses a known problem
• Understanding Architecture is the key
 Understanding both ESX and application architecture is essential to resolve
performance problems
Questions?
Reference links
http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf
http://www.vmware.com/resources/techresources/10041
http://www.vmware.com/resources/techresources/10054
http://www.vmware.com/resources/techresources/10066
http://www.vmware.com/files/pdf/perf-vsphere-cpu_scheduler.pdf
http://www.vmware.com/pdf/RVI_performance.pdf
http://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf
http://www.vmware.com/files/pdf/perf-vsphere-fault_tolerance.pdf