DBA Guide to Databases on VMware
Solution Presentation - Don Sullivan – Senior Systems Engineer - Database
Specialist
sullivand@vmware.com
© 2011 VMware Inc. All rights reserved
Don Sullivan – sullivand@vmware.com
• Oracle Certified Master, Server Products Trainer for Oracle University, and
consultant with Oracle Advanced Technology Services, 1998–2005
• Oracle SA for Polyserve/HP, 2005–2010
• VMware SE DB specialist, 2010 – Present
Agenda
• Introduction
• Understanding VMware Performance
• Designing Databases on VMware
• Developing and Testing Databases
• Migrating Existing Databases
• Securing the Databases
• Running Databases on VMware
• Monitor and Troubleshoot Database Performance
Introduction
The Trend…
• Large, multi-core servers becoming commodity
• Increasing number of CPU cores, memory, network bandwidth
• Traditional “one app, one server” model is outdated
• Increasing demands for high availability
• Business going global
• 24x7 internet
• Economic pressure to increase IT efficiency
• Reduce operational costs, increase productivity
• Reduce HW and SW costs
• Increasing manageability challenges and security
concerns over database server sprawl
Experienced DBAs Look to VMware…
Quality of Service
App Lifecycle
App Costs
Improve Quality of Service
Built-in HA protects all database environments,
from production to development and QA
Simple and reliable disaster recovery, managed per site
instead of per database
Scale on demand to handle database spikes/peak utilization
Accelerate Database Lifecycle from Dev to Production
Reduce provisioning times from weeks to minutes
Self-service provisioning
Enable testing of databases with production clones
Reduce Infrastructure and Software License Costs
Reduce Infrastructure footprint through consolidation
while maintaining full database isolation
Increase utilization of software licenses
Understanding VMware Performance
>95% of Apps Match Native Performance on Virtual Machines
(Chart axes: % of Applications vs. Application Performance Requirements)

             ESX 2       ESX 3       ESX 3.5      ESX 4
Overhead     30% - 60%   20% - 30%   <10% - 20%   <2% - 10%
vCPUs        1 vCPU      2 vCPU      4 vCPU       8 vCPU
Memory       < 4 GB      16 GB       64 GB        255 GB
Network      380 Mb/s    800 Mb/s    9 Gb/s       30 Gb/s
IOPS         < 10,000    20,000      100,000      > 350,000

1. Source: VMware Capacity Planner assessments
SQL Server Scale Up Performance Relative to Native
• At 1 and 2 vCPUs, ESX reaches 92% of native performance
  - At 1, 2 and 4 vCPUs on the 8-pCPU server, ESX is able to
effectively offload certain tasks to idle cores.
• At 4 vCPUs, 88%, and at 8 vCPUs, 86% of native performance
Single VM Performance: Well-Known Database OLTP Workload †
Intel® Xeon® processor 5500 series based 8-pCPU server
RHEL 5.1
Oracle 11gR1
< 15% overhead for 8 vCPU VM
Transaction Rate (Ratio to 1-way VM)
In-house ESX Server
8,900 total DB transactions per second
Near-perfect scalability from 1 to 8 vCPUs
60,000 I/O operations/second
† A fair-use implementation of the TPC-C workload; results are not
TPC-C compliant
The average Oracle DB fits easily in a VM

Resource     Average Oracle DB        VM capacity
CPU          2-4 CPU, 4% utilized     8 vCPU
Memory       4-8GB, 50% utilized      255GB
Disk IO      1200 IOPS                350,000 IOPS
Network IO   2 MB/S                   30 Gb/s

Source: VMware Capacity Planner analysis of > 700,000
servers in customer production environments
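The comparison above can be turned into a quick sizing check. The following is a minimal sketch, assuming the upper end of each Oracle DB range from the slide and decimal unit conversion (2 MB/s = 16 Mb/s, 30 Gb/s = 30,000 Mb/s); the function and dictionary names are illustrative only.

```python
# Per-VM maximums and average Oracle DB demand, taken from the slide.
# Upper ends of the quoted ranges are used (4 CPUs, 8 GB); that choice is an assumption.
VM_MAX = {"cpu_cores": 8, "memory_gb": 255, "disk_iops": 350_000, "network_mbps": 30_000}
AVG_ORACLE_DB = {"cpu_cores": 4, "memory_gb": 8, "disk_iops": 1_200, "network_mbps": 2 * 8}


def utilization(demand, capacity):
    """Return, per resource, the fraction of VM capacity the workload would consume."""
    return {name: demand[name] / capacity[name] for name in capacity}


def fits_in_vm(demand, capacity):
    """True when every resource demand is within the VM maximum."""
    return all(demand[name] <= capacity[name] for name in capacity)


if __name__ == "__main__":
    for name, fraction in utilization(AVG_ORACLE_DB, VM_MAX).items():
        print(f"{name}: {fraction:.1%} of VM maximum")
    print("fits:", fits_in_vm(AVG_ORACLE_DB, VM_MAX))
```

Swapping in measured values for a specific database gives a first-pass answer on whether a single VM can host it.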
Migrating Oracle 10g from UNIX to vSphere
OnCourse Application
125,000 total users
12,000 concurrent users
“We have been able to virtualize our most demanding Oracle Databases on x86
servers. We now have the confidence that vSphere can handle our largest
transaction-processing databases with ease.”
Rob Lowden, Director of IT at Indiana University
Before (UNIX): IBM pSeries, 9 Power5 Cores, 100% utilized
After (vSphere): x86, 8 virtual CPUs, 50% utilized
Designing Databases on VMware
vSphere High Availability Features
VMware HA
• Detects operating system and hardware failures
• Automatically restarts failed database virtual machines
• Provides a simple and reliable first line of defense for all databases
• Can be used in conjunction with Symantec App HA to provide
application-aware protection
VMware vMotion
• Enables live migration of database virtual machines from one physical
server to another without service interruption
• Can reduce virtual machine planned downtime
• Perform host maintenance any time of the day
VMware DRS
• Monitors state of virtual machine resource usage
• Can automatically and intelligently place virtual machines
• Directs compute resources where needed
• Maintains database response time and SLAs
Scalability on Demand
Dynamic Scaling on VMware
• Hot-add capacity (for example, from 1 vCPU / 2 GB to 4 vCPU / 64 GB)
• VMotion to a more powerful host
• Provision an additional app instance in minutes
Transforming Availability Service Levels
(Figure: Hardware Failure Tolerance, from Unprotected through Automated Restart
to Continuous, plotted against Application Coverage from 0% to 100%. Labeled
technologies: SQL Failover Clustering / Oracle RAC, SQL Database Mirroring /
Oracle Data Guard, VMware HA, VMotion (Planned Downtime), VMware FT.)
• Clustering is too complex and expensive for most applications
• VMware HA provides simple, cost-effective availability
• VMotion provides continuous availability against planned downtime
VMware vCenter Site Recovery Manager™ (SRM)
• Relies on storage replication
• Allows creation, maintenance, and execution of automated processes to
facilitate site recovery
• Safe testing without impacting production environment
• Self-documenting
Conventional DB Consolidation is Difficult
Multi-Instancing (several orcl instances on a shared OS)
• No OS isolation (configuration, security, fault)
• No load balancing across physical nodes
Shared Instance (several DBs in one shared instance on a shared OS)
• No OS isolation (configuration, security, fault)
• No Database isolation
• Resource isolation depends on DBMS Resource Governor
• No load balancing across physical nodes
Ideal Platform for DB Consolidation
1 Fast consolidation of legacy DBs with P2V
• Increase performance!
2 Preserve isolation in VM
• OS isolation
• DB isolation
• Security isolation
3 Guarantee resources
• Reservations
• Priorities
• Maximums
4 Load balance across nodes
• vMotion
• DRS
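How reservations, priorities, and maximums interact under contention can be illustrated with a toy model. This is a simplified sketch, not the ESX scheduler: it assumes CPU is a single pool measured in MHz, that priorities are expressed as positive share counts, and that spare capacity above the guaranteed floor is split in proportion to shares. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    demand: float       # MHz the VM wants right now
    reservation: float  # guaranteed minimum (MHz)
    limit: float        # hard maximum (MHz)
    shares: int         # relative priority under contention (must be > 0)


def allocate(vms, capacity):
    """Toy allocator: reservations first, then share-proportional split up to each cap."""
    cap = {vm.name: min(vm.demand, vm.limit) for vm in vms}
    # Each VM first receives its guaranteed floor (never more than it can use).
    alloc = {vm.name: min(vm.reservation, cap[vm.name]) for vm in vms}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("guaranteed floors exceed host capacity")
    active = [vm for vm in vms if cap[vm.name] - alloc[vm.name] > 1e-9]
    # Hand out the rest by shares; VMs that hit their cap drop out and the
    # leftover is redistributed among the others on the next pass.
    while remaining > 1e-9 and active:
        total_shares = sum(vm.shares for vm in active)
        granted = 0.0
        for vm in active:
            offer = remaining * vm.shares / total_shares
            take = min(offer, cap[vm.name] - alloc[vm.name])
            alloc[vm.name] += take
            granted += take
        remaining -= granted
        active = [vm for vm in active if cap[vm.name] - alloc[vm.name] > 1e-9]
    return alloc


if __name__ == "__main__":
    vms = [
        VM("prod-db", demand=8000, reservation=4000, limit=8000, shares=2000),
        VM("dev-db", demand=8000, reservation=0, limit=1000, shares=1000),
        VM("qa-db", demand=8000, reservation=0, limit=8000, shares=1000),
    ]
    for name, mhz in allocate(vms, capacity=10000).items():
        print(f"{name}: {mhz:.0f} MHz")
```

In this example the production database never drops below its reservation, the development database is held at its maximum, and the capacity it cannot use flows to the other two VMs according to their shares.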
Developing and Testing Databases
Provisioning on Demand
Fast, Self-Service Provisioning
Developer / QA
DBA
Lab Manager (and vCloud)
Streamline Testing with Snapshots and Clones
(Figure: Production and Test vApps, each containing Web, APP, and DB virtual
machines with their own OS)
1 Exact copy of production
2 Archive for fast roll-back
3 Run more tests faster
4 Move changes into production
> Faster testing
> More accurate testing on exact production copy
> Lower cost testing infrastructure
Migrating Existing Databases
P2V with vCenter Converter
• Easy-to-use, wizard-driven process
• Converts multiple local and remote physical database
servers simultaneously with centralized management
console
• Creates one-to-one mapping from physical server to
database virtual machine
• Stop database services (leave OS running) for hot
cloning of database server to ensure data
consistency
New Database Installation
• Install new OS and database software on VM, then
migrate data from physical server
• Works well when planning database upgrade with
migration
• Works with VMware Templates and Clones for rapid
deployment of multiple databases
• With RDMs, data can swing over without
backup/copy/restore
• Minimize downtime
• No additional storage requirement for migration
• When used with native database replication features
(such as mirroring, log shipping), the database VM
can run side-by-side with the physical server to
minimize migration downtime
Securing the Databases
Better-than-Physical Security
• More granular security compared to native database
consolidation
• Minimize the database “surface area” per VM
• Allows customization of security at VM level
• Install/enable components and features as needed
• Enable network protocols as needed
• More selective administrative and db owner privileges
• Database patching and change management are less risky
Running Databases on VMware
Reduce Planned and Unplanned Downtime
Protect Databases against Hardware Failures
• Built-in, host-based high availability
• Simple to configure and easy to manage
• Protects against hardware or operating system failures
• Provides a first line of defense for all databases on the host, including
production, development, and QA
VMware HA with Database Mirroring for Faster Recovery
• Works in conjunction with native database high availability
features
• Protection against HW/SW failures and DB corruption
In-guest Backup
• Standard method for physical or virtual
• Agent runs in the VM guest and handles database quiescing
• Data is sent over the IP network
• Can affect CPU utilization in the guest OS
Array-based Backup
• Backup vendor software coordinates with VSS to create a supported backup
image of the databases
• Snapshotted databases can later be streamed to tape as flat files with no IO
impact to the production databases
Manage Patch Upgrade
• The Challenges
• Patch/upgrade may introduce bugs, regressions
• Uninstalling a patch/upgrade may not be possible
• Rolling back a patch/upgrade requires rebuilding the environment and
restoring data from backup
• VMware Solutions
• Testing with production clones reduces the risk of regression
• VMware Snapshot
  - Creates a snapshot of the state and data of the database virtual machine at
a specific point in time
  - Allows DBAs to easily revert to the original state of the database virtual
machine before the upgrade
Manage Legacy Databases
• The Challenges
• Organizations need to maintain legacy databases due to
regulatory/compliance requirements and other reasons
• Legacy databases are not upgradable due to vendor support and HW
compatibility issues
• Older hardware tends to fail more frequently
• VMware Solutions
• VMs can be cloned and stored in a virtual vault/archive, then powered
on in the event of an audit or discovery request
• Virtualization abstracts the OS/app from the underlying hardware,
enabling legacy databases to run on the latest hardware
• Legacy database performance can be improved significantly by
moving to the latest hardware
Monitor and Troubleshoot Database Performance
Host Level Monitoring
• vSphere Client:
• GUI interface, primary tool for observing
performance and configuration data for
one or more ESX/ESXi hosts
• Does not require high levels of privilege
to access the data
• Resxtop/Esxtop
• Gives access to detailed performance
data of a single ESX/ESXi host
• Provides fast access to a large number
of performance metrics
• Requires root-level access
• Runs in interactive, batch, or replay
mode
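Batch-mode output is convenient for offline analysis because it is plain CSV. A capture is commonly taken with something like `esxtop -b -d 5 -n 12 > esxtop.csv` (batch mode, 5-second delay, 12 iterations). The sketch below averages every column whose header contains a given substring; the sample header names are made up to resemble the perfmon-style naming of such files and should be treated as assumptions, as should the helper name `average_columns`.

```python
import csv
import io
from statistics import mean

# Hypothetical sample in the style of esxtop batch output; real files have
# thousands of columns and host-specific names.
SAMPLE = r'''
"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1234:oradb01)\% Used","\\esx01\Group Cpu(1234:oradb01)\% Ready"
"01/01/2011 10:00:00","55.0","4.0"
"01/01/2011 10:00:05","65.0","6.0"
'''.strip()


def average_columns(csv_text, needle):
    """Average every column whose header contains the substring `needle`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    values = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            # Skip short rows and empty cells rather than failing on them.
            if i < len(row) and row[i] != "":
                values[header[i]].append(float(row[i]))
    return {name: mean(vals) for name, vals in values.items() if vals}


if __name__ == "__main__":
    for column, avg in average_columns(SAMPLE, "% Ready").items():
        print(f"{column}: {avg:.1f}")
```

For a real capture, read the file contents and pass them in place of `SAMPLE`, choosing the substring after inspecting the actual header row.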
Key Metrics to Monitor

Resource | Metric            | Host / VM | Description
CPU      | %USED             | Both      | CPU used over the collection interval (%)
CPU      | %RDY              | VM        | CPU time spent in ready state
CPU      | %SYS              | Both      | Percentage of time spent in the ESX Server VMkernel
Memory   | Swapin, Swapout   | Both      | Memory ESX host swaps in/out from/to disk (per VM, or cumulative over host)
Memory   | MCTLSZ (MB)       | Both      | Amount of memory reclaimed from resource pool by way of ballooning
Disk     | READs/s, WRITEs/s | Both      | Reads and writes issued in the collection interval
Disk     | DAVG/cmd          | Both      | Average latency (ms) of the device (LUN)
Disk     | KAVG/cmd          | Both      | Average latency (ms) in the VMkernel, also known as “queuing time”
Disk     | GAVG/cmd          | Both      | Average latency (ms) in the guest. GAVG = DAVG + KAVG
Network  | MbRX/s, MbTX/s    | Both      | Amount of data received/transmitted per second
Network  | PKTRX/s, PKTTX/s  | Both      | Packets received/transmitted per second
Network  | %DRPRX, %DRPTX    | Both      | Percentage of receive/transmit packets dropped
Database VM Level Monitoring
• The primary tools and methodologies for monitoring
database performance have not changed
• Monitoring tools
• SQL Server: Perfmon, Profiler, Dynamic Management Views
• Oracle: Statspack/AWR
• Time-based metrics reported by in-guest tools may not be accurate;
use host-level monitoring tools for those
• Focus on identifying bottlenecks instead of time-based
measurements
• CPU bottleneck: high processor queue length
• IO bottleneck: high disk queue length
Host CPU Saturation
• Typical symptoms
• DB Instance
  - Sluggish performance with no apparent CPU, memory, or disk resource issue
• ESX
  - Sustained high host CPU utilization, with avg. > 75%, peak > 90%
  - High VM Ready time
• Common causes
• CPU overcommitment
• Unexpected guest VM CPU saturation driving up the host CPU usage
• Solutions
• Use vMotion or DRS to redistribute VMs to other hosts
• Use resource controls to ensure resources are available to DB instance VMs
• Check that hardware-assisted virtualization is enabled
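The ESX-side symptoms can be checked mechanically against collected samples. A minimal sketch follows, assuming host %USED samples and per-VM %RDY values have already been gathered. The 75% and 90% thresholds come from the slide; the slide gives no number for "high" ready time, so that threshold is a caller-supplied assumption, and the function name is hypothetical.

```python
from statistics import mean

AVG_THRESHOLD_PCT = 75.0   # from the slide: sustained average above 75%
PEAK_THRESHOLD_PCT = 90.0  # from the slide: peak above 90%


def host_cpu_findings(host_used_pct, vm_ready_pct, ready_threshold_pct):
    """Return a list of human-readable findings for host CPU saturation.

    host_used_pct: host CPU utilization samples (%)
    vm_ready_pct: mapping of VM name to ready time (%)
    ready_threshold_pct: what counts as "high" ready time (site-specific assumption)
    """
    findings = []
    if mean(host_used_pct) > AVG_THRESHOLD_PCT:
        findings.append("average host CPU utilization above 75%")
    if max(host_used_pct) > PEAK_THRESHOLD_PCT:
        findings.append("peak host CPU utilization above 90%")
    for vm, ready in sorted(vm_ready_pct.items()):
        if ready > ready_threshold_pct:
            findings.append(f"{vm}: ready time {ready:.1f}% exceeds {ready_threshold_pct:.1f}%")
    return findings


if __name__ == "__main__":
    samples = [70.0, 80.0, 95.0]
    ready = {"oradb01": 12.0, "dev01": 1.0}
    for finding in host_cpu_findings(samples, ready, ready_threshold_pct=10.0):
        print(finding)
```

An empty result does not prove the host is healthy; it only means none of these specific symptoms appeared in the samples supplied.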
Guest Memory Misconfiguration
• Typical Symptoms
• Oracle/SQL Server
  - Low buffer cache hit ratio, low page life expectancy, high number of lazy writes, high
number of checkpoint pages/sec
• ESX
  - Ballooning > 0
• Common Causes
• Misconfiguration of instance memory and/or insufficient ESX memory
reservation for VM
• Solutions
• Set VM memory reservation = memory provisioned
• Set policies which disallow overcommitting of CPU resources
• Analyze vCPU utilization and verify that vCPUs are not idle
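The memory symptom and the reservation guidance above lend themselves to a small configuration check. This sketch assumes the balloon size (MCTLSZ), the provisioned memory, and the reservation for a VM are already known in MB; the function and parameter names are hypothetical.

```python
def memory_findings(provisioned_mb, reservation_mb, balloon_mb):
    """Flag the memory symptoms and misconfigurations named on the slide."""
    findings = []
    if balloon_mb > 0:
        # Slide symptom: any ballooning on a database VM is worth investigating.
        findings.append(f"ballooning active: {balloon_mb} MB reclaimed")
    if reservation_mb < provisioned_mb:
        # Slide guidance: reservation should equal provisioned memory.
        shortfall = provisioned_mb - reservation_mb
        findings.append(f"reservation is {shortfall} MB below provisioned memory")
    return findings


if __name__ == "__main__":
    for finding in memory_findings(provisioned_mb=16384, reservation_mb=8192, balloon_mb=512):
        print(finding)
```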
Monitoring Disk Performance with ESXTOP
…
• Rule of thumb:
• GAVG/cmd > 20ms = high latency!
(Screenshot callout: very large values for DAVG/cmd and GAVG/cmd)
• What does this mean?
• Latency when command reaches device is high.
• Latency as seen by the guest is high.
• Low KAVG/cmd means command is not queuing in VMkernel.
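The reasoning above can be written as a small classifier. This is a sketch under stated assumptions: the 20 ms GAVG threshold is the slide's rule of thumb, while the 2 ms cut-off for "low" KAVG is an assumption introduced here for illustration, as are the function name and message wording.

```python
HIGH_GAVG_MS = 20.0  # rule of thumb from the slide
LOW_KAVG_MS = 2.0    # assumed cut-off for "low" KAVG; not from the slide


def diagnose_disk_latency(davg_ms, kavg_ms):
    """Return (gavg_ms, diagnosis) using GAVG = DAVG + KAVG."""
    gavg_ms = davg_ms + kavg_ms
    if gavg_ms <= HIGH_GAVG_MS:
        return gavg_ms, "guest latency within the rule of thumb"
    if kavg_ms <= LOW_KAVG_MS:
        # High guest latency with little VMkernel time points at the device.
        return gavg_ms, "high latency at the device; commands are not queuing in the VMkernel"
    return gavg_ms, "high latency with VMkernel queuing; review queue depths"


if __name__ == "__main__":
    gavg, verdict = diagnose_disk_latency(davg_ms=45.0, kavg_ms=0.5)
    print(f"GAVG/cmd = {gavg:.1f} ms: {verdict}")
```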
Insufficient Disk Sub-System
• Typical Symptoms
• Oracle/SQL Server
  - High number of waits for PAGEIOLATCH_EX, PAGEIOLATCH_SH
• ESX
  - Disk latency is high, GAVG/cmd > 20ms
• Common Causes
• Overloaded or misconfigured storage sub-system
• Sub-optimal query execution plan
• Solutions
• Make sure devices are configured properly (caches, queue depths)
• Check networking settings (for iSCSI/NAS)
• Increase memory to reduce need for disk access
• Tune indexes and queries to reduce the number of IOs
• Use Storage vMotion to balance load across storage systems
Resources
• Visit our partner central for Solutions Toolsets
http://www.vmware.com/partners/partners.html
• Running business critical applications on VMware
http://www.vmware.com/solutions/business-critical-apps/
  - Best Practices, Reference Architectures, and Case Studies
  - Microsoft Apps (Exchange, SQL, SharePoint)
  - Oracle
  - SAP
• Performance White Paper
http://www.vmware.com/resources/techresources/
• Performance User Community
http://communities.vmware.com/community/vmtn/general/performance