Resource Management in the Virtual World
Singapore, Q1 2013

Topic
How Resource Management works in vSphere 5
• Server Pool
• Storage Pool
• Network Pool
Architecting Pools of resources in a large environment
• Server Pool
• Storage Pool
Monitoring Pools of resources in a large environment
• Performance monitoring
• Compliance monitoring

Confidential

Resource Pool: CPU and RAM
The “Resource Pool” that most of us know.

Server Resource Pool: Quick Intro

Server Resource Pool
A cluster means you no longer need to think of individual ESXi hosts
• No longer need to map 1000 VMs to 100 ESX hosts
What it is
• A grouping of ESX CPU/RAM in a cluster, as if they were 1 giant computer.
• They are not, obviously, as a VM can’t span across 2 hosts at a given time.
• A few apps might be ESXi-aware and do their own co-ordination. An example is vFabric EM4J (Elastic Memory for Java), but this is a separate topic altogether.
• A logical grouping of CPU and RAM only
• No Disk and Network
• The cluster must be DRS-enabled to create resource pools
What it is not
• A way to organise VMs. Use folders for this.
• A way to segregate admin access for VMs. Use folders for this.
Example: a cluster has 8 ESX hosts. Each has 2 cores.
So the total is 48 GHz: 8 hosts × 2 cores × 3.0 GHz.
VI-3 cluster of 8 hosts
• [CPU] 8 * (3.0 GHz * 2)
• [RAM] 8 * 16,384 MB
Root Resource Pool
• [CPU] 49,152 MHz
• [RAM] 131,072 MB

Child Resource Pools
• A slice of the parent RP
• A child RP cannot exceed the capacity of the root resource pool
• Used to allocate capacity to different consumers and to enable delegated administration
Limits in the example (same cluster and root resource pool as above):
• RP1: [CPU] 24,576 MHz, [RAM] 65,536 MB
• RP1-1 (child of RP1): [CPU] 14,745 MHz, [RAM] 39,320 MB
• RP1-2 (child of RP1): [CPU] 9,831 MHz, [RAM] 26,216 MB
• RP2: [CPU] 8,096 MHz, [RAM] 24,576 MB
• RP3: [CPU] 16,192 MHz, [RAM] 40,960 MB

RP Settings
Can control CPU and RAM only
• Disk is done at per-VM level.
• Network is done at per-vDS port group level.
Shares is mandatory
• Can’t set it to blank
Shares is always relative
• Relative to other VMs in the same Resource Pool or Cluster
Reservation
• Impacts the cluster Slot Size. Use sparingly.
• Can’t overcommit. Notice the triangle.
Take note of “MHz”
• Not aware of CPU generation
• A 2 GHz Xeon 5600 is considered the same speed as a 2 GHz Xeon 5100.
No such thing as “unlimited” in Limit
• A VM can’t go beyond its Configured value.
• A VM with 2 GB RAM won’t run as if it has 128 GB (assume ESXi has 128 GB)

Configuration, Reservation, Limit
[Diagram: a VM's resource range divided by three levels, from top to bottom: Configured, Limit, Reservation]
• “Configured” = amount configured for the VM
• For resources above “Limit”: you will never gain access
• For resources between “Reservation” and “Limit”: if you ask for it, you get it if it’s available. It’s available for someone else’s reserved utilization (it can be “stolen” from you)
• Resource usage below “Reservation” is guaranteed: if you ask for it, you get it. If you don’t use it, it’s available for someone else’s unreserved utilization (it can be “loaned out”, but is reclaimed on request)
Configured
• The amount presented to the BIOS of the VM.
• Hence a VM will never exceed its configured amount, as it can’t see beyond it. ESX RAM is irrelevant.
• Take a Windows VM configured with 8 GB. Windows will start swapping to its own swap file on its NTFS drive if it reaches 8 GB.
Limit
• A virtual property. Does not exist in a physical server.
• Not visible to the VM.
• Can be used to force slow down a VM. ESXi does not clock down the CPU. It just gives the VM fewer CPU cycles.
Reservation
• Defines the minimum amount of a resource that a consumer is guaranteed to receive, if asked for
• Reserved capacity that is not used is available to other consumers for them to use, but not reserve
• If a consumer asks for reserved capacity that has been “loaned” to another consumer, it is reclaimed and given to satisfy the reservation

VM-level Reservation
CPU reservation:
• Guarantees a certain level of resources to a VM
• Influences the admission control (PowerOn)
• CPU reservation isn’t as bad as often referenced: it doesn’t claim the CPU when the VM is idle (it is refundable)
CPU reservation caveats:
• CPU reservation does not always equal priority
• If another VM is using the processors and the “Reserved VM” claims those CPUs, the Reserved VM has to wait until the threads / tasks are finished
• Active threads can’t be “de-scheduled”; doing so = Blue Screen / Kernel Panic
Memory reservation:
• Guarantees a certain level of resources to a VM
• Influences the admission control (PowerOn)
• Memory reservation is as bad as often referenced. It is “Non-Refundable” once allocated.
• Windows zeroes out every bit of memory during startup…
Memory reservation caveats:
• Will drop the consolidation ratio
• May waste resources (idle memory can’t be reclaimed)
• Introduces higher complexity (capacity planning)

Resource Pool
Shares are not “cascaded” down to each VM. The more VMs you put into a Resource Pool, the less each gets.
• The pool is not per VM. It is for the entire pool.
• The only way to give a VM a guarantee is to configure it for each VM. This has admin overhead as it’s not easily visible.
[Diagram: VM1 to VM6 placed in Pool 1, Pool 2 and Pool 3]

Resource Pool: A common mistake…
A Sys Admin created 3 resource pools called Tier 1, Tier 2, Tier 3.
• They follow the relative High, Normal, Low shares.
• So Tier 1 gets 4x the shares of Tier 3.
Place 10 VMs on each Tier.
• 30 total in the cluster.
• Everything is fine for now.
• Tier 1 does get 4x the shares.
Since Tier 1 performs better, place 10 more VMs on Tier 1.
• So Tier 1 now has 20 VMs
Result: Tier 1 performance drops.
• The 20 VMs are fighting over the same shares.
The above problem only happens if there is contention. If the physical ESXi host has enough resources to satisfy all 40 VMs, then Shares do not kick in.

Implication of a poorly designed resource pool
The cluster has 2 resource pools and a few VMs outside these 2 resource pools.
The “Test 1” resource pool is given 4x the shares. But it has 8 VMs. So 26% / 8 = ~3% per VM.

Per VM settings
Screen is based on vSphere 5 and VM hardware version 8

Shares Value and Shares
Shares can be “Normal” but the value can differ from VM to VM.
Use a script to set all the values to an identical amount.

Example
[Diagram: VM 1, VM 2 and VM 3 on an ESXi hypervisor with 6 GB pRAM]
• VM 1: Memory size: 4 GB. Reservation: 0. Limit: unlimited. Shares: 3000. Idle memory: 0. Entitlement: 3 GB
• VM 2: Memory size: 4 GB. Reservation: 0. Limit: unlimited. Shares: 1000. Idle memory: 0. Entitlement: 1 GB
• VM 3: Memory size: 2 GB. Reservation: 2 GB. Limit: unlimited. Shares: 1000. Idle memory: 0. Entitlement: 2 GB
Total for the 3 VMs = 10 GB. But ESX only has 6 GB.
VM 3 will get 2 GB, as it has a reservation. ESX has 4 GB left.
VM 1 will get 3000/4000 shares, which is 3/4 * 4 GB = 3 GB.
VM 2 will get 1000/4000, which is 1/4 * 4 GB = 1 GB. VM 2 performance drops.
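The arithmetic in this example can be sketched in a few lines of Python. This is a minimal sketch of the slide's simplified model only: reservations are honoured first, then the remaining RAM is split by shares among the VMs that still want more, capped at each VM's configured size. It is not ESXi's actual memory scheduler, it assumes every VM is fully active (idle memory 0), and the function and field names are made up for illustration.

```python
def entitlements(host_gb, vms):
    """Return name -> GB of RAM each VM is entitled to under contention.

    vms maps a VM name to a dict with 'size', 'reservation' (both in GB)
    and 'shares'. Hypothetical structure, not a vSphere API.
    """
    # Step 1: every VM first receives its reservation.
    alloc = {name: float(vm["reservation"]) for name, vm in vms.items()}
    remaining = host_gb - sum(alloc.values())

    # Step 2: VMs that still want memory compete for the rest by shares.
    active = {name for name, vm in vms.items() if vm["size"] > alloc[name]}
    while active and remaining > 1e-9:
        total_shares = sum(vms[n]["shares"] for n in active)
        grant = {n: remaining * vms[n]["shares"] / total_shares for n in active}
        # A VM never gets more than its configured size.
        full = {n for n in active if alloc[n] + grant[n] >= vms[n]["size"]}
        if not full:
            for n in active:
                alloc[n] += grant[n]
            break
        for n in full:
            remaining -= vms[n]["size"] - alloc[n]
            alloc[n] = float(vms[n]["size"])
        active -= full
    return alloc


# The three VMs from the slide, on a host with 6 GB of physical RAM.
slide_vms = {
    "VM 1": {"size": 4, "reservation": 0, "shares": 3000},
    "VM 2": {"size": 4, "reservation": 0, "shares": 1000},
    "VM 3": {"size": 2, "reservation": 2, "shares": 1000},
}
result = entitlements(6, slide_vms)
print(result)  # {'VM 1': 3.0, 'VM 2': 1.0, 'VM 3': 2.0}
```

Calling the same function with a 10 GB host gives every VM its full configured size, which matches the earlier point that shares only matter under contention.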
VM 3's performance is not affected at all.

Resource Pool: Best Practices
For a Tier 1 cluster, where all the VMs are critical to business
• Architect for Availability first, Performance second.
• Translation: Do not over-commit.
• So resource pools, reservations, etc. are immaterial, as there is enough for everyone.
• But size each VM accordingly. No oversizing, as it might slow the VM down.
For a Tier 3 cluster, use carefully, or don’t use at all.
• Tier 3 = overcommit.
• So use Reservation sparingly, even at VM level.
• A reservation guarantees resources, so it impacts the cluster slot size.
• Naturally, you can’t boot additional VMs if your guarantee is fully used.
• Take note of the extra complexity in performance troubleshooting.
• Use it as a mechanism to reserve at “group of VMs” level.
• If Department A pays for half the cluster, then creating an RP with 50% of the cluster resources will guarantee them the resources in the event of contention. They can then put in as many VMs as they need.
• But as a result, you cannot overcommit at cluster level, as you have guaranteed at RP level.
Do not configure high CPU or RAM, then use Limit
• E.g. configure with 4 vCPU, then use Limit to make it “2” vCPU.
• It can result in unpredictable performance, as the Guest OS does not know.
• High CPU or high RAM has higher overhead.
• Limit is used when you need to force slow down a VM. Using Shares won’t achieve the same result.
Don’t put a VM and an RP as “siblings” at the same level

Resource Pool: Disk and Network
The “Resource Pool” that most of us don’t give enough attention to.

Disk is set at individual VM, not Resource Pool
• Default Shares Value is 1000.
• This is at Datastore level, which may span across clusters.
• You can set Limit, but not Reservation.
• An NFS Datastore can even span across vCenters (use case: read-only templates and ISO images)

Reviewing Disk Resource Pool
Shares is at Datastore level. Just like the “Server” Resource Pool, the more VMs you put in, the less each VM gets.
You can view at Cluster level (which gives a view across the datastores from this single cluster). This does not tell the whole picture, as the datastores may span across clusters.
You cannot view at individual ESXi level if the host is part of a cluster.

Viewing at Datastore level
• Do not span a datastore across “data centers”, as you can only see 1 DC at a time.

Pre-requisite: Storage IO Control
As a Datastore is just a logical construct, it has no physical limit by itself. The limit is on the underlying LUN or path.
To enable sharing, enable Storage I/O Control.

Enabling Storage I/O Control
Not enabled by default

Storage DRS
Finally, a “cluster” for storage. New feature in vSphere 5.
• Differences
• VM disks won’t move to another DS in the event of datastore or LUN failure
• Has the concept of storage tiering.
• Similarity
• No need to specify an individual datastore
• Affinity and Anti-Affinity rules
• Load balances among datastores, although in hours/days and not 5 minutes.

Network Resource Pool
[Diagram: Tenant 1 VMs, Tenant 2 VMs, VR, vMotion, Mgmt, FT, NFS and iSCSI traffic pass through vSphere Distributed Portgroups (teaming policy, set by the Server Admin) into the vSphere Distributed Switch, which applies a shaper and per-uplink schedulers. Load Based Teaming. Limit enforcement is per team; shares enforcement is per uplink.]
Traffic | Shares | Limit (Mbps) | 802.1p
vMotion | 5 | 150 | 1
Mgmt | 30 | |
NFS | 10 | |
iSCSI | 10 | | 2
FT | 60 | -- | --
VR | 10 | -- | --
VM | 20 | |
VM: Tenant 1 | 5 | -- | --
VM: Tenant 2 | 15 | -- | --
Other values on the slide whose rows are not recoverable from the text: limits of 250 and 2000 Mbps, and 802.1p tag 4.

Network Resource Pool
New feature in vSphere 5. Can set Shares and Limit, but not Reservation.
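How shares and a limit interact on one congested uplink can be illustrated with a small model. The sketch below is an assumption-laden simplification, not the actual Network I/O Control scheduler: it treats every traffic type as trying to saturate a single uplink, splits bandwidth in proportion to shares, and hands back whatever a limited pool cannot use. The shares and the vMotion limit are as read from the table on the slide; the 10,000 Mbps uplink and the function name are hypothetical.

```python
def allocate(capacity_mbps, pools):
    """Split uplink bandwidth by shares, honouring optional limits.

    pools maps a traffic type to (shares, limit_mbps or None).
    Returns traffic type -> Mbps. Simplified model for illustration only.
    """
    alloc = {}
    active = dict(pools)
    remaining = float(capacity_mbps)
    while active:
        total_shares = sum(shares for shares, _ in active.values())
        # Pools whose proportional slice would exceed their limit are capped.
        capped = {
            name
            for name, (shares, limit) in active.items()
            if limit is not None and remaining * shares / total_shares > limit
        }
        if not capped:
            for name, (shares, _) in active.items():
                alloc[name] = remaining * shares / total_shares
            break
        for name in capped:
            alloc[name] = float(active[name][1])
            remaining -= active[name][1]
            del active[name]
    return alloc


# Shares (and the vMotion limit) as read from the slide's table.
slide_pools = {
    "vMotion": (5, 150),
    "Mgmt": (30, None),
    "NFS": (10, None),
    "iSCSI": (10, None),
    "FT": (60, None),
    "VR": (10, None),
    "VM": (20, None),
}
alloc = allocate(10_000, slide_pools)  # hypothetical 10 Gbps uplink
for name, mbps in alloc.items():
    print(f"{name:8s} {mbps:8.1f} Mbps")
```

In this model vMotion is held at its 150 Mbps limit and the bandwidth it gives up is redistributed to the other traffic types in proportion to their shares, so FT still receives twice what Mgmt does.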
Unlike CPU/RAM, there is no reservation for Disk and Network
• Network & Disk are not completely controlled by ESX.
• The array is serving multiple ESX hosts or clusters, and even non-ESX systems.
• The network has switches, routers, firewalls, etc. which will impact performance.

Sample Architecture
This shows an example Cloud for ~2000 VMs. It also uses Active/Passive data centers.

Sample Architecture
[Diagram: Primary Data Center (Active) with vCenter 1, vCenter 2 and vCenter 3; Tier 1, Tier 2, Tier 3 and Special clusters; an IT Cluster of 8 ESXi; Desktop Clusters 1 to N; Tier 1, Tier 2 and Tier 3 storage (FC and NFS) reached over the SAN fabric and NFS LAN; tape backup. Notes on the diagram: Management VMs for Desktops reside in the IT Cluster; with Linked Mode; with SRM integration; standalone.]

The need for IT Cluster: Large Cloud
Special purpose cluster
• Runs all the IT VMs used to manage the virtual DC or provide core services
• The Central Management will reside here too
• Separated for ease of management & security
This separation keeps the Business Cluster clean, “strictly for business”.
VMs in the IT Cluster:
• VMware vCenter (for Server Cloud), vCenter Heartbeat, vCenter Update Manager, Symantec AppHA Server, vCloud Director
• Storage: Storage Mgmt tool (may need physical RDM to get fabric info)
• Network: Network Management Tool, Nexus 1000V Manager (VSM)
• Core Infra: MS AD 1, MS AD 2, Syslog server, File Server (FTP Server)
• Advanced vDC Services: Site Recovery Manager + DB, Chargeback + DB, Agentless AV, Object-based Firewall
• Security: Security Management Server, vShield Manager
• Admin: Admin client (1 per Sys Admin), VMware Converter, vMA, vCenter Orchestrator
• Application Mgmt: App Dependency Manager
• Management: vCenter Ops + DB, Help Desk
• Desktop: View Managers + DB, ThinApp Update Server, vCenter (for Desktop Cloud)

3 Tier Server resource pool
Create 3 clusters
• The hosts can be identical.
Tier | # Host | Node Spec? | Failure Tolerance | MSCS | #VM | Monitoring | Remarks
Tier 1 | 5 (always) | Always identical | 2 hosts | Yes | Max 18 per cluster | Application level. Extensive alerts | Only for critical apps. No resource overcommit.
Tier 2 | 4–8 (likely 8) | 2 variations | 1 host | Limited | Max 70 VM. 10 per (N-1) | |
Tier 3 | 6–8 (likely 8) | 3 variations | 1 host | No | Max 105 VM. 15 per (N-1) | Infrastructure level. Minimal alerts | Resource overcommit
Remark whose row (Tier 2 or Tier 3) is not clear from the extracted text: the app can be vMotioned to Tier 1 during a critical run.
Each project then “leases” vCPU and GB
• Not GHz, as speed may vary.
• Not using Resource Pool, as we can’t control the #VM in the pool

3 Tier pools of storage
Create 3 Tiers of Storage.
• This becomes the type of Storage Pool provided to VMs
• Paves the way for standardisation
• Choose 1 size for each Tier. Keep it consistent.
• Keep 20% free capacity for VM swap files, snapshots, logs, thin volume growth, and storage vMotion (inter tier).
• Use Thin Provisioning at array level, not ESX level.
• Separate Production and Non Production
• A VMDK larger than 1 TB will be provisioned as RDM. Virtual-compatibility mode is used.
Example
Tier | Interface | IOPS | Latency | RAID | RPO | RTO | Size | Limit | Snapshot | # VM
1 | FC | >4000 | 10 ms | 10 | 1 hour | 1 hour | 1 TB | 70% | Yes | ~10 VM. EagerZeroedThick
2 | FC | >2000 | 15 ms | 5 | 4 hours | 4 hours | 2 TB | 80% | No | ~20 VM. Normal Thick
3 | iSCSI | >1000 | 20 ms | 5 | 8 hours | 8 hours | 3 TB | 80% | No | ~30 VM. Normal Thick

Mapping: Cluster - Datastore
Always know which cluster mounts which datastores
• Keep the diagram simple, without too much information. The idea is to have a mental picture that you can remember.
• If your diagram has too many lines, too many datastores, or too many clusters, then it may be too complex. Create a Pod when that happens. Modularisation can be good.

Performance counters: CPU
The same counters are shown for the other periods, because there are no real-time counters. It does not make sense to see real time.
Performance counters: RAM
Counters not shown: Memory Capacity Usage

Memory: Consumed vs Active
[Diagram: VM memory marked “consumed”, mapped to pRAM by the hypervisor]
Consumed = how much physical RAM a VM has allocated to it
• It does not mean the VM is actively using it. It can be idle pages.
Two types of memory overcommitment
• “Configured” memory overcommitment
• (Sum of VMs’ configured memory size) / host’s mem.capacity.usable*
• This is what is usually meant by “memory overcommitment”
• “Active” memory overcommitment
• (Sum of VMs’ mem.capacity.usage*) / host’s mem.capacity.usable*
Impact of overcommitment
• “Configured” memory overcommitment > 1
• zero to negligible VM performance degradation
• “Active” memory overcommitment ≈ 1
• very high likelihood of VM performance degradation!
*Only available in vSphere 5.0. But the net effect is the same.

Configured Memory Overcommitment
[Diagram: VM 1, VM 2 and VM 3, each with free, idle and active memory, on top of the hypervisor]
• Parts of idle and free memory are not in physical RAM due to reclamation
• All VMs’ active memory stays resident in physical RAM, allowing for maximum VM performance
• Entitlement >= demand for all VMs [good]

Active Memory Overcommitment
[Diagram: VM 1, VM 2 and VM 3, each with only active memory, on top of the hypervisor]
• No idle and free memory in physical RAM
• Some VM active memory is not in physical RAM, which will lead to VM performance degradation!
• Entitlement < demand for one or more VMs [bad]

Example
[Chart: Active, Limit and Consumed memory of a VM over time]
Notice that Active is lower than Consumed and Limit.
• The VM was doing fine.
Where usage reaches the Limit, the VM is fighting with ESX for memory.

vSphere and RAM
Below is a typical picture. Most VMware Admins will conclude that ESX is running out of RAM.
• Time to buy new RAM
• This is misleading. It is showing the memory.consumed counter, not the memory.active counter.

vCenter Operations and RAM
Same ESX. vCenter Ops shows 26%.
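The gap between the two charts comes down to which counter is divided by the host's usable memory. The following is a minimal sketch using made-up numbers, not the host in the screenshots; the class, function and field names are hypothetical and are not vSphere counter names.

```python
from dataclasses import dataclass


@dataclass
class VmMemory:
    configured_gb: float  # memory size presented to the guest
    consumed_gb: float    # physical RAM currently mapped to the VM
    active_gb: float      # memory the guest is actively touching


def memory_view(host_usable_gb, vms):
    """Compute the ratios discussed on the overcommitment slides."""
    configured = sum(vm.configured_gb for vm in vms)
    consumed = sum(vm.consumed_gb for vm in vms)
    active = sum(vm.active_gb for vm in vms)
    return {
        "configured_overcommit": configured / host_usable_gb,
        "active_overcommit": active / host_usable_gb,
        "consumed_pct": 100 * consumed / host_usable_gb,
        "active_pct": 100 * active / host_usable_gb,
    }


# Hypothetical host: 64 GB usable, four identical VMs.
vms = [VmMemory(configured_gb=24, consumed_gb=15, active_gb=4) for _ in range(4)]
view = memory_view(64, vms)
for name, value in view.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative numbers the host looks almost full when judged by consumed memory, and is overcommitted 1.5x on configured memory, yet active memory is only a quarter of capacity, so no performance impact would be expected.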
vCenter Ops is showing the right data.

Performance Monitoring

Global view

Thank You
And have fun in the pool!

© 2009 VMware Inc. All rights reserved