DBA Guide to Databases on VMware
Solution Presentation
Don Sullivan – Senior Systems Engineer, Database Specialist
sullivand@vmware.com
© 2011 VMware Inc. All rights reserved. Confidential.

Don Sullivan – sullivand@vmware.com
• Oracle Certified Master; Server Products Trainer for Oracle University and consultant with Oracle Advanced Technology Services, 1998-2005
• Oracle SA for Polyserve/HP, 2005-2010
• VMware SE, database specialist, 2010-present

Agenda
• Introduction
• Understanding VMware Performance
• Designing Databases on VMware
• Developing and Testing Databases
• Migrating Existing Databases
• Securing the Databases
• Running Databases on VMware
• Monitor and Troubleshoot Database Performance

Introduction
The Trend…
• Large, multi-core servers are becoming a commodity
  - Increasing numbers of CPU cores, more memory, more network bandwidth
  - The traditional “one app, one server” model is outdated
• Increasing demands for high availability
  - Business is going global
  - 24x7 Internet
• The economy demands greater IT efficiency
  - Reduce operational costs, increase productivity
  - Reduce hardware and software costs
• Increasing manageability challenges and security concerns over database server sprawl

Experienced DBAs Look to VMware…
• Quality of Service
• App Lifecycle
• App Costs

Improve Quality of Service
• Built-in HA protects all database environments, from production to development and QA
• Simple and reliable disaster recovery, managed per site instead of per database
• Scale on demand to handle database spikes and peak utilization
Accelerate the Database Lifecycle from Dev to Production
• Reduce provisioning times from weeks to minutes
• Self-service provisioning
• Enable testing of databases with production clones
Reduce Infrastructure and Software License Costs
• Reduce infrastructure footprint through consolidation while maintaining full database isolation
• Increase utilization of software licenses

Understanding VMware Performance
>95% of Apps Match Native Performance on Virtual Machines
[Figure: % of applications vs. application performance requirements, by ESX release. Source: VMware Capacity Planner assessments]

                 ESX 2        ESX 3        ESX 3.5       ESX 4
Overhead         30% - 60%    20% - 30%    <10% - 20%    <2% - 10%
vCPUs per VM     1            2            4             8
Memory per VM    < 4 GB       16 GB        64 GB         255 GB
Network          380 Mb/s     800 Mb/s     9 Gb/s        30 Gb/s
IOPS             < 10,000     20,000       100,000       > 350,000

SQL Server Scale-Up Performance Relative to Native
• At 1 and 2 vCPUs, ESX delivers 92% of native performance
  - At 1, 2, and 4 vCPUs on the 8-pCPU server, ESX can effectively offload certain tasks to idle cores
• At 4 vCPUs, ESX delivers 88% of native performance; at 8 vCPUs, 86%

Single VM Performance: Well-Known Database OLTP Workload†
• Intel® Xeon® processor 5500 series based 8-pCPU server, RHEL 5.1, Oracle 11gR1
• < 15% overhead for an 8-vCPU VM
• 8,900 total DB transactions per second
• 60,000 I/O operations per second
• Near-perfect scalability from 1 to 8 vCPUs
[Chart: transaction rate (ratio to 1-way VM), in-house ESX Server]
† A fair-use implementation of the TPC-C workload; results are not TPC-C compliant

The Average Oracle DB Fits Easily in a VM
Resource      VM maximum      Average Oracle DB
CPU           8 vCPUs         2-4 CPUs, 4% utilized
Memory        255 GB          4-8 GB, 50% utilized
Disk IO       350,000 IOPS    1,200 IOPS
Network IO    30 Gb/s         2 MB/s
Source: VMware Capacity Planner analysis of > 700,000 servers in customer production environments

Migrating Oracle 10g from UNIX to vSphere
• OnCourse application: 125,000 total users, 12,000 concurrent users
“We have been able to virtualize our most demanding Oracle Databases on x86 servers.
We now have the confidence that vSphere can handle our largest transaction-processing databases with ease.”
Rob Lowden, Director of IT at Indiana University
• Before: IBM pSeries, 9 POWER5 cores, 100% utilized
• After: x86, 8 virtual CPUs, 50% utilized

Designing Databases on VMware
vSphere High Availability Features
VMware HA
• Detects operating system and hardware failures
• Automatically restarts the failed database virtual machine
• Provides a simple and reliable first line of defense for all databases
• Can be used in conjunction with Symantec App HA to provide application-aware protection
VMware vMotion
• Enables live migration of database virtual machines from one physical server to another without service interruption
• Can reduce planned virtual machine downtime
• Allows host maintenance at any time of day
VMware DRS
• Monitors virtual machine resource usage
• Can automatically and intelligently place virtual machines
• Directs compute resources where they are needed
• Helps maintain database response times and SLAs

Scalability on Demand: Dynamic Scaling on VMware
• Hot-add capacity (e.g. grow a VM from 1 vCPU and 2 GB to 4 vCPUs and 64 GB)
• vMotion to a more powerful host
• Provision an additional app instance in minutes

Transforming Availability Service Levels
[Figure: availability levels (Unprotected, Automated Restart, Hardware Failure Tolerance, Continuous) vs. application coverage from 0% to 100%, comparing SQL Server Failover Clustering / Oracle RAC, SQL Server Database Mirroring / Oracle Data Guard, VMware FT, VMware HA, and vMotion for planned downtime]
• Clustering is too complex and expensive for most applications
• VMware HA provides simple, cost-effective availability
• vMotion provides continuous availability against planned downtime

VMware vCenter Site Recovery Manager™ (SRM)
• Relies on storage replication
• Allows creation, maintenance, and execution of an automated process to facilitate site recovery
• Safe testing without impacting the production environment
• Self-documenting

Conventional DB Consolidation Is Difficult
Multi-Instancing (several DB instances on a shared OS)
• No OS isolation (configuration, security, fault)
• No load balancing across physical nodes
Shared Instance (several DBs in one instance on a shared OS)
• No OS isolation (configuration, security, fault)
• No database isolation
• Resource isolation depends on the DBMS Resource Governor
• No load balancing across physical nodes

Ideal Platform for DB Consolidation
1. Fast consolidation of legacy DBs with P2V, with increased performance
2. Preserve isolation in VMs: OS isolation, DB isolation, security isolation
3. Guarantee resources: reservations, priorities, maximums
4. Load balance across nodes: vMotion, DRS

Developing and Testing Databases
Provisioning on Demand
• Fast, self-service provisioning for developers, QA, and DBAs with Lab Manager (and vCloud)

Streamline Testing with Snapshots and Clones
• Create an exact copy of the production vApp (Web, App, and DB VMs)
• Archive copies for fast roll-back
• Run more tests faster in the test vApp
• Move changes into production
Benefits:
• Faster testing
• More accurate testing on an exact production copy
• Lower-cost testing infrastructure

Migrating Existing Databases
P2V with vCenter Converter
• Easy-to-use, wizard-driven process
• Converts multiple local and remote physical database servers simultaneously, with a centralized management console
• Creates a one-to-one mapping from physical server to database virtual machine
• Stop database services (leave the OS running) for hot cloning of the database server to ensure data consistency
New Database Installation
• Install a new OS and database software on the VM, then migrate data from the physical server
• Works well when planning a database upgrade together with the migration
• Works with VMware templates and clones for rapid deployment of multiple databases
• With RDMs, data can swing over without backup/copy/restore
  - Minimizes downtime
  - No additional storage required for migration
• When used with native database replication features (such as mirroring or log shipping), the database VM can run side-by-side with the
  physical server to minimize migration downtime.

Securing the Databases
Better-than-Physical Security
• More granular security than native database consolidation
• Minimizes the database “surface area” per VM
• Allows customization of security at the VM level
  - Install/enable components and features only as needed
  - Enable network protocols only as needed
  - More selective administrative and DB owner privileges
• Database patching and change management are less risky

Running Databases on VMware
Reduce Planned and Unplanned Downtime
Protect Databases against Hardware Failures
• Built-in, host-based high availability
• Simple to configure and easy to manage
• Protects against hardware or operating system failures
• Provides a first line of defense for all databases on the host, including production, development, and QA
VMware HA with Database Mirroring for Faster Recovery
• Works in conjunction with native database high-availability features
• Protects against hardware/software failures and DB corruption

In-Guest Backup
• Standard method for physical or virtual servers
• An agent runs in the guest VM and handles database quiescing
• Data is sent over the IP network
• Can affect CPU utilization in the guest OS
Array-Based Backup
• Backup vendor software coordinates with VSS to create a supported backup image of the databases
• Snapshotted databases can later be streamed to tape as flat files with no I/O impact on the production databases

Manage Patches and Upgrades
The Challenges
• A patch/upgrade may introduce bugs or regressions
• Uninstalling a patch/upgrade may not be possible
• Rolling back a patch/upgrade requires rebuilding the environment and restoring data from backup
VMware Solutions
• Testing with production clones reduces the risk of regression
• VMware Snapshot captures the state and data of the database virtual machine at a specific point in time, so DBAs can easily revert the VM to its pre-upgrade state

Manage Legacy Databases
The Challenges
• Organizations need to maintain legacy databases for regulatory/compliance and other reasons
• Legacy databases cannot be upgraded because of vendor support and hardware compatibility issues
• Older hardware tends to fail more frequently
VMware Solutions
• VMs can be cloned and stored in a virtual vault/archive, then powered on in the event of an audit or discovery request
• Virtualization abstracts the OS/application from the underlying hardware, enabling legacy databases to run on the latest hardware
• Legacy database performance can improve significantly by moving to the latest hardware

Monitor and Troubleshoot Database Performance
Host-Level Monitoring
• vSphere Client
  - GUI; the primary tool for observing performance and configuration data for one or more ESX/ESXi hosts
  - Does not require high levels of privilege to access the data
• resxtop/esxtop
  - Gives access to detailed performance data for a single ESX/ESXi host
  - Provides fast access to a large number of performance metrics
  - Requires root-level access
  - Runs in interactive, batch, or replay mode

Key Metrics to Monitor
Resource  Metric              Host/VM  Description
CPU       %USED               Both     CPU used over the collection interval (%)
CPU       %RDY                VM       CPU time spent in the ready state
CPU       %SYS                Both     Percentage of time spent in the ESX Server VMkernel
Memory    Swapin, Swapout     Both     Memory the ESX host swaps in from / out to disk (per VM, or cumulative over the host)
Memory    MCTLSZ (MB)         Both     Amount of memory reclaimed from the resource pool by ballooning
Disk      READs/s, WRITEs/s   Both     Reads and writes issued in the collection interval
Disk      DAVG/cmd            Both     Average latency (ms) of the device (LUN)
Disk      KAVG/cmd            Both     Average latency (ms) in the VMkernel, also known as “queuing time”
Disk      GAVG/cmd            Both     Average latency (ms) in the guest; GAVG = DAVG + KAVG
Network   MbRX/s, MbTX/s      Both     Data received/transmitted per second (Mb)
Network   PKTRX/s, PKTTX/s    Both     Packets received/transmitted per second
Network   %DRPRX, %DRPTX      Both     Percentage of received/transmitted packets dropped

Database VM-Level Monitoring
• The primary tools and methodologies for monitoring database performance have not changed
• Monitoring tools
  - SQL Server: Perfmon, Profiler, Dynamic Management Views
  - Oracle: Statspack/AWR
• Time-based metrics reported by in-guest tools may not be accurate; use host-level monitoring tools for these
• Focus on identifying bottlenecks instead of time-based measurements
  - CPU bottleneck: high processor queue length
  - I/O bottleneck: high disk queue length

Host CPU Saturation
• Typical symptoms
  - DB instance: sluggish performance with no apparent CPU, memory, or disk resource issue
  - ESX: sustained high host CPU utilization (average > 75%, peak > 90%); high VM ready time
• Common causes
  - CPU overcommitment
  - Unexpected guest VM CPU saturation driving up host CPU usage
• Solutions
  - Use vMotion or DRS to redistribute VMs to other hosts
  - Use resource controls to ensure resources are available to the DB instance VMs
  - Check that hardware-assisted virtualization is enabled

Guest Memory Misconfiguration
• Typical symptoms
  - Oracle/SQL Server: low buffer cache hit ratio, low page life expectancy, high number of lazy writes, high number of checkpoint pages/sec
  - ESX: ballooning > 0
• Common causes
  - Misconfigured instance memory and/or insufficient ESX memory reservation for the VM
• Solutions
  - Set the VM memory reservation equal to the memory provisioned
  - Set policies that disallow overcommitting CPU resources
  - Analyze vCPU utilization and verify that vCPUs are not idle

Monitoring Disk Performance with esxtop
• Rule of thumb: GAVG/cmd > 20 ms means high latency
• Very large values for DAVG/cmd and GAVG/cmd: what does this mean?
  - Latency is high when the command reaches the device
  - Latency as seen by the guest is high
  - A low KAVG/cmd means the command is not queuing in the VMkernel
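The thresholds from the troubleshooting slides can be combined into a simple triage check. The Python sketch below is illustrative only: it assumes the readings have already been collected (for example with esxtop in batch mode), and the names `triage` and `DiskSample` are hypothetical, not part of any VMware tool. It reads "average > 75%, peak > 90%" as both conditions holding. To decide whether high latency comes from VMkernel queuing or from the device, it compares KAVG with DAVG; that comparison is an assumed heuristic, not a VMware rule.

```python
from dataclasses import dataclass
from typing import List

# Rules of thumb quoted on the slides; guidelines, not hard limits.
HOST_CPU_AVG_PCT = 75.0   # sustained average host CPU utilization (%)
HOST_CPU_PEAK_PCT = 90.0  # peak host CPU utilization (%)
GAVG_HIGH_MS = 20.0       # guest-observed disk latency considered high (ms)


@dataclass
class DiskSample:
    """Per-device latencies read from esxtop, in ms per command."""
    davg_ms: float  # DAVG/cmd: latency at the device (LUN)
    kavg_ms: float  # KAVG/cmd: time spent queuing in the VMkernel

    @property
    def gavg_ms(self) -> float:
        # GAVG/cmd is the latency seen by the guest: GAVG = DAVG + KAVG.
        return self.davg_ms + self.kavg_ms


def triage(host_cpu_avg_pct: float, host_cpu_peak_pct: float,
           mctlsz_mb: float, disk: DiskSample) -> List[str]:
    """Return the slide symptoms that one set of readings exhibits."""
    findings: List[str] = []

    # Host CPU saturation: sustained average > 75% with peaks > 90%.
    if host_cpu_avg_pct > HOST_CPU_AVG_PCT and host_cpu_peak_pct > HOST_CPU_PEAK_PCT:
        findings.append("host CPU saturation")

    # Guest memory misconfiguration: any ballooning (MCTLSZ > 0).
    if mctlsz_mb > 0:
        findings.append("ballooning: check VM memory reservation")

    # Insufficient disk subsystem: GAVG/cmd > 20 ms.
    if disk.gavg_ms > GAVG_HIGH_MS:
        # Assumed heuristic: if KAVG dominates, commands are queuing in the
        # VMkernel; otherwise the latency comes from the device itself.
        if disk.kavg_ms > disk.davg_ms:
            findings.append("high disk latency: queuing in VMkernel")
        else:
            findings.append("high disk latency: at the device")
    return findings


if __name__ == "__main__":
    # Hypothetical readings: busy host, no ballooning, slow LUN.
    print(triage(80.0, 95.0, 0.0, DiskSample(davg_ms=25.0, kavg_ms=0.5)))
```

The checks are kept independent so that one sample can report several symptoms at once, matching how the slides treat CPU, memory, and disk separately.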
Insufficient Disk Subsystem
• Typical symptoms
  - SQL Server: high number of PAGEIOLATCH_EX and PAGEIOLATCH_SH waits
  - ESX: high disk latency, GAVG/cmd > 20 ms
• Common causes
  - Overloaded or misconfigured storage subsystem
  - Sub-optimal query execution plans
• Solutions
  - Make sure devices are configured properly (caches, queue depths)
  - Check networking settings (for iSCSI/NAS)
  - Increase memory to reduce the need for disk access
  - Index-tune queries to reduce the number of I/Os
  - Use Storage vMotion to balance load across storage systems

Resources
• Partner Central, for solution toolsets: http://www.vmware.com/partners/partners.html
• Running business-critical applications on VMware (best practices, reference architectures, and case studies for Microsoft apps such as Exchange, SQL Server, and SharePoint, as well as Oracle and SAP): http://www.vmware.com/solutions/business-critical-apps/
• Performance white papers: http://www.vmware.com/resources/techresources/
• Performance user community: http://communities.vmware.com/community/vmtn/general/performance