Optimizing Oracle on VMware

Bert Scalzo PhD & Oracle ACE
Quest Software: Database Expert
Intro
Many people swore that databases on virtual machines would not fly – or at best, would
be accepted very, very slowly. I was not one of those skeptics – so I wrote a book on just
that subject: Oracle on VMware: Expert Tips for Database Virtualization.
http://www.rampant-books.com/book_2008_1_oracle_vmware.htm
However, I also don't expect people in these tough economic times to rush out to buy
every new book, so this paper provides a brief synopsis of the book's overall thesis
along with some of its highest-value recommendations.
Other Sources
One of the first and most robust papers I’ve seen on this subject is from VMware itself:
Deployment of Oracle Databases on VMware Infrastructure.
http://www.vmware.com/partners/alliances/technology/oracle-database.html
This fine paper is 119 pages of highly useful DBA information for both effectively and
efficiently deploying Oracle databases on a VMware infrastructure.
At first Oracle was a little slow to fully embrace the new concept of databases on virtual
machines. But then they debuted their own virtual machine solution, and things quickly
changed. There are now papers showing virtual machine versus bare metal benchmarks
that achieve acceptable results.
http://www.oracle.com/technologies/virtualization/docs/ovmbenchmark.pdf
There are now even papers for successfully deploying RAC (Oracle Real Application
Clusters) on virtual machines.
http://www.oracle.com/technology/products/database/clusterware/pdf/oracle_rac_
in_oracle_vm_environments.pdf
So the trend winds have changed; virtualization is no longer verboten for databases.
So Let’s Begin
At Oracle Open World 2008, I presented some materials that show that deploying Oracle
on VMware with “default configuration and settings across the board” could cost you as
much as 440% in terms of achievable database server performance. However, we must
approach configuration, optimization and tuning in a slightly different manner, because
we must now fully consider four core, shared resources: CPU, memory, disk and network.
This is really nothing new: DBAs have always cared about CPU, memory, disk and
network bandwidth versus utilization. But generally speaking, DBAs in the past tended
to have one database instance per physical server, so they historically didn't worry too
much about "shared resource" consumption. Of course, with SAN, NAS and iSCSI shared
storage servers, DBAs began to feel, spot and optimize around such shared resources.
Virtualization merely extends that same paradigm across all of the key component layers
that the database engine relies upon – so we must now account and accommodate for that.
Suppose that we have a SQL Server database and an Oracle database sharing the same
resources and running radically different workloads concurrently. We must then set up,
configure, optimize and tune everything with the assumption that every resource of value
is shared and thus not ours alone. This means there is even less room for error than ever.
What Was Tested
Of course, the very best possible workload to test with would be the production workload
one intends to deploy onto a virtualized server. For the sake of this paper, I simply
needed to choose something that a general readership might either already know or at
least easily understand. Therefore I chose the TPC-C benchmark, a well-known and fairly
respected industry-standard OLTP benchmark (although soon to be replaced by the
newer TPC-E benchmark). I ran the TPC-C benchmark for 50 to 300
concurrent users – and not larger due to my hardware size limitations. My goal was to
meet a Service Level Agreement (SLA) of no transaction taking more than 2 seconds,
and to determine just how many concurrent users my server could ideally accommodate.
Luckily I work for a software vendor (i.e. Quest Software) – so it’s easy for me to obtain
tools to make this testing and optimization effort trivial. I don’t mention these tools as a
sales pitch, but rather merely to inform you as to what I used. There are of course several
other tools out there that offer similar capabilities. So feel free to utilize those that best
suit your needs, budget and/or preferences. Again, I only list these so that you know how
I performed all of these tests. The 440% improvement would probably have been fairly
easy to obtain using different tools as well, because it's really the concepts that apply
here, not the particular tools used to arrive at them.
1. Load Generator = Benchmark Factory for Databases
2. Virtualization Monitor = Foglight for VMware
3. Database Performance = Foglight Database Performance Analysis
4. Database Ad-Hoc Diagnostics = Spotlight for Oracle
So returning to the technique used, I incrementally and in an evolutionary style modified
all the default settings across the board to obtain that 440% improvement for the TPC-C
OLTP database benchmark (your mileage will of course vary). Note in the chart below
(Figure 1) how the response time for 300 concurrent users decreased from 1080 ms to
200 ms. And while 1080 ms (roughly one second) may have been acceptable (i.e. met the
required SLA), what would happen if the user load increased to 1000 or more users?
Obviously the tuned scenario would scale much better, to a much higher concurrent user load.
So what black magic was applied to get these results? The ten simple steps covered in
this paper are all that’s required to see the 440% improvement. Note that the VM server
used was running Windows 2003 Server Enterprise Edition Release 2 64-bit, but similar
issues would also apply had it been Linux or some other OS.
Figure 1 – Performance Achieved for Various Default Settings
Ten Tricks to Try
Sometimes the best things are quite often obvious and easy. So here's a simple plan for
incrementally optimizing all the relevant database virtualization default settings:
1. Obtain a baseline test for relative comparisons
2. On the VM host, exclude VM clients from active, online virus scans
3. Remove the Windows Indexing Service (databases don't need fast file searches; see the sc sketch after this list)
4. Remove other extraneous Windows services
5. Change the VM host registry settings to improve file system IO for databases
6. Optimize the VM host configuration and options
7. Optimize the VM client OS configuration and options for Oracle database
8. Remove other extraneous VM client OS services and daemons
9. Change VM client file system settings to improve IO performance for databases
10. Adjust VM client file system block size to more closely match the Oracle block size
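For steps 3 and 4, Windows' built-in sc utility can stop and disable unneeded services
from the command line. Here's a minimal sketch, assuming Windows Server 2003 where
the Indexing Service's internal service name is cisvc (verify service names on your own
build before disabling anything):

REM Stop and disable the Windows Indexing Service (service name: cisvc)
sc stop cisvc
sc config cisvc start= disabled

REM Same pattern for other services the database VM does not need,
REM e.g. the Print Spooler on a headless database server:
sc stop spooler
sc config spooler start= disabled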
My key contention is that as long as there are sufficient resources and/or bandwidth to
handle the net requests, requiring database servers to be islands unto themselves is now
passé. Even if we say that virtualization adds a 10-20% overhead (which I'm inclined to
say is far closer to 10 than 20), with cheap hardware these days the benefits far outweigh
the negatives – just spend 10% more and get a bigger server to handle multiple databases.
But it may be a few years before that opinion is generally or more widely accepted.
The remainder of this paper focuses on just three of the less obvious key steps from the
list above: #5 and #9 (which essentially go together as one concept), and #10. All the
rest are more obvious and thus need less explanation. Plus they also tend to be more
subjective (i.e. which services don't we need?).
Optimize All File Systems for Databases
Remember, a database ultimately makes IO requests – and we all know that IO is the
slowest part of the database equation. So look again above at what choices #5 and #9
really are and/or mean. As a reminder:

• #5 = Change the VM host registry settings to improve file system IO for databases
• #9 = Change VM client file system settings to improve IO performance for databases
In reality, this is one and the same improvement simply being applied to two different
virtualization levels: the host and each client. Thus I’m simply going to present how to
accomplish this technique for both Windows and Linux, and then leave it to the reader to
make sure to apply it properly across all their various virtualization layers.
Both the Windows NTFS and Linux ext2/3 file systems maintain "meta-data" information
related to file access, such as date created, last time updated, etc. So an IO request
might actually generate multiple physical IOs: one for the data file and one or more
for updating the related meta-data. In the case of the VMware server, we really don't
care to keep detailed OS file system information about the hosted clients' data file
access; it's simply neither useful nor critical (unless doing snapshots). And at the
Oracle database level, we know Oracle accesses its files every so many seconds, so why
pay to update the meta-data with that information? Oracle keeps its own timestamps
(i.e. the SCN) in the file headers.
For Windows, we simply adjust the following registry entry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem\NtfsDisableLastAccessUpdate = 1
By the way, Quest Software's Toad for Oracle offers a screen to do this for Windows.
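If you don't have Toad handy, Windows' built-in fsutil utility will flip the same
setting without editing the registry directly (a reboot may still be needed for it to
take full effect):

fsutil behavior set disablelastaccess 1
fsutil behavior query disablelastaccess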
For Linux, there are several ways to accomplish the same result. We can set the attribute
for individual Oracle files as follows:
chattr +A file_name
Or we can do it for an entire directory:
chattr -R +A directory_name
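Either way, you can confirm the attribute stuck with the companion lsattr command;
files carrying the no-atime-update flag show an A in the attribute listing:

lsattr file_name
lsattr -R directory_name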
However the best method (because it automatically handles any file additions) is to edit
the /etc/fstab and add the NOATIME attribute:
/dev/sda6    /           ext3     defaults,noatime    1 1
/dev/sda1    /boot       ext2     defaults,noatime    1 2
/dev/cdrom   /mnt/cdrom  iso9660  noauto,owner,ro     0 0
/dev/sda5    swap        swap     defaults            0 0
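After editing /etc/fstab there is no need to reboot; each file system can pick up the
new option immediately via a remount. A quick sketch using the mount points from the
example above:

mount -o remount,noatime /
mount -o remount,noatime /boot
grep noatime /proc/mounts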
Many people ask if I know similar settings for other operating systems such as AIX,
HPUX and Solaris – I don’t. But please email me and share that information, because I
get asked this one all the time – and would love to have an answer for those people. 
Match File System and Database Block Sizes
Now let me explain what choice #10 is. As a reminder:

• #10 = Adjust VM client file system block size to more closely match the Oracle block size
The default block size for both the host and client OS file systems is generally not the
same as your Oracle block size (although hopefully the Oracle block size is a multiple of
the OS file system block size). Since the host may be servicing multiple Oracle databases
with different block sizes, different database platforms (e.g. MySQL), or be used to host
other applications (e.g. a web server), we cannot always make this adjustment at the host
level. But generally we can make it for each of the Oracle database clients.
Let’s assume we have a Linux host running a Linux based Oracle database client. Let’s
examine Scenario #2 from the chart below. Start by assuming that we create the host file
system using the default block size: 2K. Let’s further assume that we do the same thing
on the client, but that we size the database blocks at 4K. Thus each Oracle physical IO
request asks the client OS for two IOs, and each client IO asks the host for one IO.
That's a total of four logical IO requests (although only two physical IOs in reality;
note that there is overhead for each logical IO request, so larger numbers are worse).
Now if the database block size had instead been 8K while the other factors had remained
at 2K (i.e. Scenario #3), then the logical IOs would instead be DB -> Client = 4 and
Client -> Host = 4, for a grand total of eight logical IO requests. Here's a basic chart
for some common combinations and their sometimes surprising overheads:
Obviously there are some very bad choices in the chart above. So a seemingly good 16K
database block size might actually result in a total of 12 to 16 logical IO requests across
the virtualization infrastructure under the worst case scenario. While it’s still effectively
just 16K being read, the extra overhead added by this mismatch only serves to multiply
the bad effects. So choose wisely.
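To know what you are matching against, the database side is easy to check; db_block_size
is fixed at database creation (although individual tablespaces may use nondefault block
sizes). A quick look from SQL*Plus, where the 8192 shown is just an example value:

sqlplus / as sysdba
SQL> show parameter db_block_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
db_block_size                        integer     8192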
The Results & Conclusion
Look again at Figure 1 above, where the response time for 300 concurrent users
decreased from 1080 ms to 200 ms. With those initial default settings, the server could
only scale to some 500+ users and still meet the under-2-second SLA. I was able to
scale well above 2000 users on the very same hardware, and with better average response times.
The first question people ask is: how did the incremental improvements pan out? That is,
what "measurable" percentage improvement went with each colored line in Figure 1?
I have included Figure 2 below to show that the tests (and their resulting graph lines) do
in fact have a one-to-one correspondence to the recommended 10 steps or tricks to try.
But I very purposely left the color-versus-performance results key off Figure 1 so that
you would be inclined to try them all. It's too easy to skip even one of them, and
they're all worth it.
Figure 2 - Breakdown of the Steps to Achieve the Results
This paper attempts to demonstrate that a database can perform up to 440% better simply
by adjusting various virtualized infrastructure default settings. Of course "your mileage
will vary," but the key point is that simply taking the time to properly tune your virtual
infrastructure settings, layer by layer, can have a substantial impact, one magnified by
the net or cumulative effect of a virtual server hosting multiple database servers. And
while my example was specific to Oracle, these same principles generally apply to any
database being deployed on a virtual server.
So don’t hesitate to deploy your database servers on virtualized platforms – it works. 
Lessons Learned (i.e. Best Practices)

• Tons of "low hanging fruit" (i.e. easy, no-brainer stuff)
• Optimize each of the four major VM platform stacks:
  o Optimize the Host Machine (BIOS too)
  o Optimize the Host VMware / OS Setup
  o Optimize each Guest VM Configuration
  o Optimize each Guest Operating System
• Remember: 440% improvement for practically free
About the Author
Bert Scalzo is a Database Expert for Quest Software and a member of the TOAD dev
team. He has worked with Oracle databases for over two decades, including time spent
working at both Oracle Education & Oracle Consulting. Mr. Scalzo is an Oracle ACE,
holds several Oracle Masters certifications, plus an extensive academic background including a BS, MS and PhD in Computer Science, an MBA and several insurance
industry designations. He is an accomplished speaker and has presented at numerous
Oracle conferences - including OOW, ODTUG, IOUGA, OAUG, RMOUG and Hotsos.
Mr. Scalzo's key areas of DBA interest are Data Modeling, Database Benchmarking,
Database Tuning & Optimization, "Star Schema" Data Warehouses, Linux and VMware.
He has also written many articles, papers and blogs - including for Oracle Technology
Network (OTN), Oracle Magazine, Oracle Informant, PC Week (eWeek), Dell Power
Solutions Magazine, The LINUX Journal, linux.com, Oracle FAQ and Toad World. Plus
he has written six books: Oracle DBA Guide to Data Warehousing and Star Schemas,
TOAD Handbook, TOAD Pocket Reference (2nd Edition), Database Benchmarking:
Practical Methods for Oracle & SQL Server, Oracle on VMware: Expert tips for
Database Virtualization, and Advanced Oracle Utilities: The Definitive Reference. Mr.
Scalzo’s email addresses are bert.scalzo@quest.com or bert.scalzo@yahoo.com.