LESSONS LEARNED

Oracle 10g RAC Scalability –
Lessons Learned
Bert Scalzo, Ph.D.
Bert.Scalzo@Quest.com
About the Author
• Oracle Dev & DBA for 20 years, versions 4 through 10g
• Worked for Oracle Education & Consulting
• Holds several Oracle Masters (DBA & CASE)
• BS, MS, PhD in Computer Science and also an MBA
• LOMA insurance industry designations: FLMI and ACS
• Books
– The TOAD Handbook (March 2003)
– Oracle DBA Guide to Data Warehousing and Star Schemas (June 2003)
– TOAD Pocket Reference 2nd Edition (June 2005)
• Articles
– Oracle Magazine
– Oracle Technology Network (OTN)
– Oracle Informant
– PC Week (now eWeek)
– Linux Journal
– www.Linux.com
About Quest Software
Products used in this paper
Project Formation
This paper is based upon collaborative RAC research efforts
between Quest Software and Dell Computers.
Quest:
• Bert Scalzo
• Murali Vallath – author of RAC articles and books
Dell:
• Anthony Fernandez
• Zafar Mahmood
Also an extra special thanks to Dell for allocating a million
dollars’ worth of equipment to make such testing possible.
Project Purpose
Quest:
• To partner with a leading hardware vendor
• To field test and showcase our RAC-enabled software:
– Spotlight on RAC
– Benchmark Factory
– TOAD for Oracle with DBA module
Dell:
• To write a Dell PowerEdge Magazine article about the
OLTP scalability of Oracle 10g RAC running on typical
Dell servers and EMC storage arrays
• To create a standard methodology for all benchmarking
of database servers, to be used for future articles and for
lab testing & demonstration purposes
OLTP Benchmarking
TPC benchmark (www.tpc.org)
TPC Benchmark™ C (TPC-C) is an OLTP workload. It is a mixture of read-only and update
intensive transactions that simulate the activities found in complex OLTP application
environments. It does so by exercising a breadth of system components associated with
such environments, which are characterized by:
• The simultaneous execution of multiple transaction types that span a breadth of
complexity
• On-line and deferred transaction execution modes
• Multiple on-line terminal sessions
• Moderate system and application execution time
• Significant disk input/output
• Transaction integrity (ACID properties)
• Non-uniform distribution of data access through primary and secondary keys
• Databases consisting of many tables with a wide variety of sizes, attributes, and
relationships
• Contention on data access and update
Excerpt from “TPC BENCHMARK™ C: Standard Specification, Revision 3.5”
Create the Load - Benchmark Factory
The TPC-C like benchmark measures on-line transaction processing (OLTP) workloads.
It combines read-only and update intensive transactions simulating the activities found
in complex OLTP enterprise environments.
Monitor the Load - Spotlight on RAC
Hardware & Software
Servers, Storage, and Software
Oracle 10g RAC Cluster Servers
10 x 2-CPU Dell PowerEdge 1850
3.8 GHz P4 processors with HT
4 GB RAM (later expanded to 8 GB RAM)
1 x 1 Gb NIC (Intel) for LAN
2 x 1 Gb LOMs teamed for RAC interconnect
1 x two-port HBA (QLogic 2342)
DRAC
Benchmark Factory Servers
2 x 4-CPU Dell PowerEdge 6650
8 GB RAM
Storage
1 x Dell | EMC CX700
1 x DAE unit: total 30 x 73GB 15K RPM disks
RAID Group 1: 16 disks having 4 x 50 GB RAID 1/0
LUNs for data and backup
RAID Group 2: 10 disks having 2 x 20 GB RAID 1/0
LUNs for redo logs
RAID Group 3: 4 disks having 1 x 5 GB RAID 1/0
LUN for voting disk, OCR, and spfiles
2 x Brocade SilkWorm 3800 Fibre Channel switches
(16 port)
Configured with 8 paths to each logical volume
Network
1 x Gigabit 5224 Ethernet switch (24 port) for
private interconnect
1 x Gigabit 5224 Ethernet switch for public LAN
RHEL AS 4 QU1 (32-bit)
EMC PowerPath 4.4
EMC Navisphere agent
Oracle 10g R1 10.1.0.4
Oracle ASM 10.1.0.4
Oracle Cluster Ready Services 10.1.0.4
Linux bonding driver for interconnect
Dell OpenManage
Windows 2003 server
Quest Benchmark Factory Application
Quest Benchmark Factory Agents
Quest Spotlight on RAC
Quest TOAD for Oracle
Flare Code Release 16
Linux bonding driver used to team the dual onboard
NICs for the private interconnect (see the sketch below)
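As a minimal sketch of that NIC teaming (assuming RHEL 4’s stock bonding module; the bond mode, monitoring interval, interface names eth1/eth2, and addresses are illustrative assumptions, not taken from the original setup):
 # /etc/modprobe.conf – register the bonding driver (assumed settings)
 alias bond0 bonding
 options bond0 miimon=100 mode=balance-rr

 # /etc/sysconfig/network-scripts/ifcfg-bond0 – teamed interconnect IP
 DEVICE=bond0
 IPADDR=192.168.1.1
 NETMASK=255.255.255.0
 ONBOOT=yes
 BOOTPROTO=none

 # /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for eth2)
 DEVICE=eth1
 MASTER=bond0
 SLAVE=yes
 ONBOOT=yes
 BOOTPROTO=none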
TOAD for Oracle
Setup Planned vs. Actual
Planned:
• Redhat 4 Update 1 64-bit
• Oracle 10.2.0.1 64-bit
Actual:
• Redhat 4 Update 1 32-bit
• Oracle 10.1.0.4 32-bit
Issues:
• Driver problems with 64-bit (no real surprise)
• Some software incompatibilities with 10g R2
• Known ASM issues require 10.1.0.4, not earlier
Testing Methodology – Steps 1 A-C
1. For a single node and instance
a. Establish a fundamental baseline
i. Install the operating system and Oracle database (keeping all normal installation defaults)
ii. Create and populate the test database schema
iii. Shut down and restart the database
iv. Run a simple benchmark (e.g. TPC-C for 200 users) to establish a baseline for default
operating system and database settings
b. Optimize the basic operating system
i. Manually optimize typical operating system settings
ii. Shut down and restart the database
iii. Run a simple benchmark (e.g. TPC-C for 200 users) to establish a new baseline for basic
operating system improvements
iv. Repeat the prior three steps until performance levels off
c. Optimize the basic non-RAC database
i. Manually optimize typical database “spfile” parameters
ii. Shut down and restart the database
iii. Run a simple benchmark (e.g. TPC-C for 200 users) to establish a new baseline for basic
Oracle database improvements
iv. Repeat the prior three steps until performance levels off
Testing Methodology – Steps 1 D-E
d. Ascertain the reasonable per-node load
i. Manually optimize scalability database “spfile” parameters
ii. Shut down and restart the database
iii. Run an increasing user load benchmark (e.g. TPC-C for 100 to 800 users, increment by
100) to find the “sweet spot” of how many concurrent users a node can reasonably
support
iv. Monitor the benchmark run via the vmstat command, looking for the point where
excessive paging and swapping begins – and where the CPU idle time consistently
approaches zero
v. Record the “sweet spot” number of concurrent users – this represents an upper limit
vi. Reduce the “sweet spot” number of concurrent users by some reasonable percentage to
account for RAC architecture and inter/intra-node overheads (e.g. reduce by say 10%)
e. Establish the baseline RAC benchmark
i. Shut down and restart the database
ii. Create an increasing user load benchmark based upon the node count and the “sweet
spot” (e.g. TPC-C for 100 to node count * sweet spot users, increment by 100)
iii. Run the baseline RAC benchmark
Step 1B - Optimize Linux Kernel
Linux Kernel parameters /etc/sysctl.conf:
 kernel.shmmax = 2147483648
 kernel.sem = 250 32000 100 128
 fs.file-max = 65536
 fs.aio-max-nr = 1048576
 net.ipv4.ip_local_port_range = 1024 65000
 net.core.rmem_default = 262144
 net.core.rmem_max = 262144
 net.core.wmem_default = 262144
 net.core.wmem_max = 262144
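A minimal sketch of applying these settings (assuming root access and the standard sysctl utility shipped with RHEL 4):
 # after appending the settings above to /etc/sysctl.conf,
 # load them into the running kernel without a reboot
 sysctl -p

 # spot-check that a value took effect
 sysctl kernel.shmmax    # expect: kernel.shmmax = 2147483648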
Step 1C - Optimize Oracle Binaries
Oracle compiled & linked for asynchronous IO:
1. cd to $ORACLE_HOME/rdbms/lib
a. make -f ins_rdbms.mk async_on
b. make -f ins_rdbms.mk ioracle
2. Set necessary “spfile” parameter settings
a. disk_asynch_io = true (default value is true)
b. filesystemio_options = setall (for both async and direct IO)
Note that in Oracle 10g Release 2, asynchronous IO is
compiled & linked in by default.
Step 1C - Optimize Oracle SPFILE
spfile adjustments shown below:
 cluster_database=true
 cluster_database_instances=10
 db_block_size=8192
 processes=16000
 sga_max_size=1500m
 sga_target=1500m
 pga_aggregate_target=700m
 db_writer_processes=2
 open_cursors=300
 optimizer_index_caching=80
 optimizer_index_cost_adj=40
The key idea was to eke out as much SGA memory usage as possible within the 32-bit operating system limit
(about 1.7 GB). Since our servers had only 4 GB of RAM each, we figured that allocating half to Oracle was
sufficient – with the remaining memory to be shared by the operating system and the thousands of dedicated
Oracle server processes that the TPC-C like benchmark would be creating as its user load.
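A hedged sketch of setting a few of those values (assuming a shared spfile and a SYSDBA SQL*Plus session; the SID='*' qualifier applies the change to all RAC instances):
 ALTER SYSTEM SET sga_max_size = 1500M SCOPE=SPFILE SID='*';
 ALTER SYSTEM SET sga_target = 1500M SCOPE=SPFILE SID='*';
 ALTER SYSTEM SET pga_aggregate_target = 700M SCOPE=SPFILE SID='*';
 ALTER SYSTEM SET processes = 16000 SCOPE=SPFILE SID='*';
 -- static parameters (e.g. processes, sga_max_size) only take
 -- effect after bouncing the database (all instances)
 SHUTDOWN IMMEDIATE
 STARTUP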
Step 1D – Find Per Node Sweet Spot
Finding the ideal per node sweet spot is arguably the most critical aspect of the entire benchmark testing
process – and especially for RAC environments with more than just a few nodes.
We initially ran a 100-800 user TPC-C on the single node:
• Without monitoring the database server using the vmstat command
• Simply looked at the BMF transactions per second graph, which remained positive beyond 700 users
• Assumed this meant the “sweet spot” was 700 users per node (and did not factor in any overhead)
What was happening in reality:
• The operating system was being overstressed and exhibited thrashing characteristics at about 600 users
• Running benchmarks at 700 users per node did not scale either reliably or predictably beyond four servers
• Our belief is that by taking each box to a near-thrashing threshold through our overzealous per-node user
load selection, the nodes did not have sufficient resources available to communicate in a timely enough fashion
for inter/intra-node messaging – and thus Oracle began to think that nodes were either dead or non-responsive
• Furthermore, when relying upon Oracle’s client- and server-side load balancing feature, which allocates
connections based upon the nodes responding, the user load per node became skewed and then exceeded our
per-node “sweet spot” value. For example, when we tested 7000 users on 10 nodes, since some nodes appeared
dead to Oracle, the load balancer simply directed all the sessions across whatever nodes were responding. So we
ended up with nodes trying to handle far more than 700 users – and thus the thrashing was even worse.
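A minimal sketch of the vmstat monitoring that would have caught this earlier (standard procps vmstat; the sampling interval and log name are arbitrary choices):
 # sample every 5 seconds during the benchmark run; rising si/so
 # (swap-in/out) columns plus an id (CPU idle) column stuck near 0
 # together mark the onset of thrashing
 vmstat 5

 # or capture it for later correlation with the BMF user-load steps
 vmstat 5 720 > vmstat_run1.log    # 720 samples = 1 hour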
Sweet Spot Lessons Learned
• Cannot solely rely on the BMF transactions per second graph
• Throughput can still be increasing while thrashing begins
• Need to monitor the database server with vmstat and other tools
• Must stop just shy of bandwidth limits (RAM, CPU, IO)
• Must factor in multi-node overhead, and reduce accordingly
• Prior to 10g R2, better to rely on app (BMF) load balancing (see below)
• If you’re not careful on this step, you’ll run into roadblocks
which either invalidate your results or simply cannot scale!!!
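On that last point, one hedged way to keep Oracle from re-balancing sessions (a hypothetical tnsnames.ora entry; host, service, and instance names are made up) is one alias per node with client-side load balancing off, so the application decides where each session lands:
 # tnsnames.ora – one alias per node, LOAD_BALANCE off, so Benchmark
 # Factory (or any client) can spread sessions deterministically
 RACDB1 =
   (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1)(PORT = 1521))
     (LOAD_BALANCE = OFF)
     (CONNECT_DATA = (SERVICE_NAME = racdb)(INSTANCE_NAME = racdb1))
   )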
Testing Methodology – Steps 2 A-C
2. For 2nd through Nth nodes and instances
a. Duplicate the environment
i. Install the operating system
ii. Duplicate all of the base node’s operating system settings
b. Add the node to the cluster
i. Perform node registration tasks
ii. Propagate the Oracle software to the new node
iii. Update the database “spfile” parameters for the new node
iv. Alter the database to add node-specific items such as redo logs (see the sketch after this list)
c. Run the baseline RAC benchmark
i. Update the baseline benchmark criteria to include user load scenarios from the prior
run’s maximum up to the new maximum based upon node count * “sweet spot” of
concurrent users, using the baseline benchmark’s constant for increment by
ii. Shut down and restart the database – adding the new instance
iii. Run the baseline RAC benchmark
iv. Plot the transactions per second graph showing this run versus all the prior baseline
benchmark runs – the results should show a predictable and reliable scalability factor
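A hedged sketch of step 2b-iv (the diskgroup name '+DATA', file sizes, and instance naming are assumptions; the paper does not show the actual DDL): each added instance needs its own redo thread and undo tablespace:
 -- create and enable a redo log thread for the new (2nd) instance
 ALTER DATABASE ADD LOGFILE THREAD 2
   GROUP 3 ('+DATA') SIZE 100M,
   GROUP 4 ('+DATA') SIZE 100M;
 ALTER DATABASE ENABLE PUBLIC THREAD 2;

 -- give the new instance its own undo tablespace
 CREATE UNDO TABLESPACE undotbs2 DATAFILE '+DATA' SIZE 500M;
 ALTER SYSTEM SET undo_tablespace = undotbs2 SCOPE=SPFILE SID='racdb2';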
Step 2C – Run OLTP Test per Node
With the correct per-node user load identified and load balancing guaranteed, it
was a very simple (although time consuming) exercise to run the TPC-C like benchmarks
listed below:
1 Node:   100 to 500 users, increment by 100
2 Nodes:  100 to 1000 users, increment by 100
4 Nodes:  100 to 2000 users, increment by 100
6 Nodes:  100 to 3000 users, increment by 100
8 Nodes:  100 to 4000 users, increment by 100
10 Nodes: 100 to 5000 users, increment by 100
Benchmark Factory’s default TPC-C like test iteration requires about 4 minutes for a given
user load. So for the single node with five user load scenarios, the overall OLTP benchmark
test run requires 20 minutes – and by the same math, the 10-node run with its 50 user load
scenarios needs well over three hours.
During the entire testing process the load was monitored with Spotlight on RAC to identify
any hiccups.
Some Speed Bumps Along the Way
As illustrated below, when we reached our four node tests we found that the CPUs on nodes
racdb1 and racdb3 reached 84% and 76% utilization respectively. The root cause of the problem
was a temporary overload of users on these servers, combined with the ASM response time.
Some ASM Fine Tuning Necessary
We increased the following parameters on the ASM
instance, ran our four node tests again, and all was well
beyond this:
Parameter     Default Value   New Value
SHARED_POOL   32M             67M
LARGE_POOL    12M             67M
These were the only parameter changes we had to make to
the ASM instance, and beyond this everything worked
smoothly.
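A minimal sketch of that change (assuming the slide’s SHARED_POOL and LARGE_POOL abbreviate the standard shared_pool_size and large_pool_size parameters, set from a SYSDBA session on the ASM instance):
 -- e.g. export ORACLE_SID=+ASM1, then in SQL*Plus as SYSDBA:
 ALTER SYSTEM SET shared_pool_size = 67M SCOPE=SPFILE SID='*';
 ALTER SYSTEM SET large_pool_size = 67M SCOPE=SPFILE SID='*';
 -- restart the ASM instances for the new sizes to take effect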
Smooth Sailing After That
Shown below are the cluster level latency charts from Spotlight on RAC during our eight
node run. They indicated that the interconnect latency was well within expectations and
on par with typical industry network latency numbers.
Full Steam Ahead!
As shown below, ASM was performing excellently at this user load. Ten instances with over
5000 users saw excellent service times from ASM; in fact, the I/O rate was noticeably
high – topping 2500 I/Os per second!
Final Results
Other than some basic monitoring to make sure that all is well and the tests are working, there’s
really not very much to do while these tests run – so bring a good book to read. The final results
are shown below.
Interpreting the Results
The results are quite interesting. As the previous graph clearly shows, Oracle’s RAC
and ASM are very predictable and reliable in terms of their scalability.
Each successive node continues the near-linear line almost without issue.
There are, however, 3 or 4 noticeable troughs in the graph for the 8 and 10 node test runs
that seem out of place.
Note that we had one database instance that was throwing numerous ORA-00600
[4194] errors related to its UNDO tablespace, and that one node took significantly
longer to start up and shut down than all the other nodes combined. A search of
Oracle’s MetaLink web site located references to a known problem that would
require a database restore or rebuild.
Since we were tight on time, we decided to ignore those couple of valleys in the
graph, because it’s pretty obvious from the overall results we obtained that
smoothing over those few inconsistent points would yield a near perfect graph –
showing that RAC is truly reliable and predictable in terms of scalability.
Projected RAC Scalability
Using the 6 node graph results to project forward, the figure below shows a reasonable expectation in terms of
realizable scalability – where 17 nodes should equal nearly 500 TPS and support about 10,000 concurrent users.
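As a quick sanity check on that projection (simple linear arithmetic, assuming the trend holds): 500 TPS / 17 nodes ≈ 29 TPS per node, and 10,000 users / 17 nodes ≈ 590 concurrent users per node – comfortably under the reduced per-node “sweet spot” of roughly 630 users (700 less 10%) derived earlier.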
Next Steps …
• Since the first test iteration was memory limited, we
upgraded each database server from 4 to 8 GB RAM
• Now able to scale up to 50% more users per node
• Now doing zero paging and/or swapping
• But – now CPU bound
• Next step: replace each CPU with a dual-core Pentium
• Increase from 4 CPUs (2 real / 2 virtual via HT) to 8 CPUs
• Should be able to double users again ???
• Will we now reach IO bandwidth limits ???
• Will be writing about those results in future Dell articles…
Conclusions …
• A few minor hiccups occurred in the initial round, where we tried to determine the optimal user load on a
node for the given hardware and processor configuration
• The scalability of the RAC cluster was outstanding. The addition of every node to the cluster showed
steady, close to linear scalability – close to linear because of the small overhead that the
cluster interconnect consumes during block transfers between instances.
• The interconnect also performed very well; in this particular case the NIC pairing/bonding feature of Linux
was implemented to provide load balancing across the redundant interconnects, which also helped provide
availability should any one interconnect fail.
• The DELL|EMC storage subsystem, which consisted of six ASM diskgroups for the various data
file types, performed with high throughput, also indicating high scalability. EMC
PowerPath provided IO load balancing and redundancy utilizing dual Fibre Channel host bus
adapters on each server.
• It is the unique architecture of RAC that makes this possible, because irrespective of the number of
instances in the cluster, the maximum number of hops performed before the requestor gets
the requested block will not exceed three under any circumstances. This unique architecture
removes limitations found in other vendors’ clustering technology, giving maximum
scalability. This was demonstrated through the tests above.
• Oracle® 10g Real Application Clusters (RAC) running on standards-based Dell™ PowerEdge™
servers and Dell/EMC storage can provide a flexible, reliable platform for a database grid.
• In particular, Oracle 10g RAC databases on Dell hardware can easily be scaled out to provide the
redundancy or additional capacity that the grid environment requires.
Questions …
Thanks for coming!