RAC be Nimble, RAC be Quick Bert Scalzo, Domain Expert, Oracle Solutions Bert.Scalzo@Quest.com Copyright © 2006 Quest Software About the Author … Domain Expert & Product Architect for Quest Software Oracle Background: • Worked with Oracle databases for well over two decades (starting with version 4) • Work history includes time at both Oracle Education and Oracle Consulting Academic Background: • Several Oracle Masters certifications • BS, MS and PhD in Computer Science • MBA (general business) • Several insurance industry designations Key Interests: • Data Modeling • Database Benchmarking • Database Tuning & Optimization • "Star Schema" Data Warehouses • Oracle on Linux – and specifically: RAC on Linux Articles for: • Oracle’s Technology Network (OTN) • Oracle Magazine, • Oracle Informant • PC Week (eWeek) Articles for: • Dell Power Solutions Magazine • The Linux Journal • www.linux.com • www.orafaq.com 1 Books by Author … Potentially in 2008 … 2 Agenda • RAC Performance – “RAC in the Box” syndrome – Must Optimize every subsystem • RAC Optimization Approach – RAC Optimization Methodology – via Quest’s RAC Tools (just example - not a sales pitch) • Real-world Scenario – Dell’s use of Quest’s Solution for Oracle RAC – “Best Practices” applied incrementally & results 3 Optimization Focus • “Low Hanging Fruit” • Obvious yet Overlooked • Subtle yet Highly Critical Radically Different than other RAC tuning sessions: Not going to delve into obscure hardware, OS, network, Oracle and RAC tuning parameters or configurations, Just easy stuff that makes a big difference 4 Oracle RAC is Great, but … • Too often people expect RAC to “auto-magically” function out of the box with little to no optimization • During RAC optimization attempts, people far too often concentrate on just a single dimension – the Oracle database “stuff” (i.e. waits, parameters, etc) • During RAC optimization attempts, Oracle is often too heavily weighted as the source and reason for most, if not all, of the performance bottlenecks • Not enough “application nature” is identified and accounted for during the optimization process • Result: Too many people achieve sub-par results! 5 I call it “RAC in the BOX” syndrome • That’s just my stupid name for it (hope it catches) • But nearly half the RAC sites I visit are suffering from RAC performance issues related to this! • Still far too RAC experts among the general DBA population (although improving every day) • Really no simple, single button tools yet to make RAC “auto-magically” “fire on all cylinders” • Many people bail on RAC far too soon and simply fall-back to big SMP boxes (the evil that they know) • But with just a little manual “box winding”, anyone should be able to “pop the RAC weasel” free 6 RAC is System, Must Tune its Entirety 7 RAC Performance = Sum of its Parts • Application Nature (affects everything else below) • Public Network • Storage Network • Storage Sub-System • Oracle Instance Configuration • Oracle Cluster Configuration • Private Network (i.e. Interconnect) 8 RAC Performance Testing Methodology 1. Benchmark Factory www.TPC.org tests or Oracle trace file playback generator 2. Spotlight on RAC Monitor RAC system for performance bottlenecks RAC Performance Testing Methodology 3. TOAD with DBA Module “AWR/ADDM Reports “ Identify and correct Top five wait events 9 Quest Solutions for Oracle RAC Benchmark Factory® • Test Oracle RAC environments rapidly and reliably • Perform database “scalability” or “goal” testing to determine the most optimal RAC configuration • Tests: – TPC-C – TPC-H – Trace File playback – etc, etc, etc … • Let’s DBA concentrate on task at hand - Optimization 10 11 Quest Solutions for Oracle RAC Spotlight® on RAC • Monitor Oracle RAC environments rapidly and reliably • Diagnose Oracle RAC environment health levels at – Node – Cluster – ASM – Instance – Interconnect • Intelligent performance alerts plus market-leading GUI for entire RAC to instances architecture & bottlenecks • Let’s DBA concentrate on task at hand - Diagnosing 12 13 14 15 Quest Solutions for Oracle RAC TOAD® with DBA Module • Expedite typical DBA management & tuning tasks • Great Productivity Enhancing Features – Database Health Check – Database Probe – Database Monitor – AWR/ADDM Reports – UNIX Monitor – Stats Pack Reports (coming soon) • See Toad World paper – Title: “Maximize Database Performance Via Toad for Oracle” – http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx • Let’s DBA concentrate on task at hand – Correcting (i.e. Fixing) 16 17 Real-world Scenario Quest strategic partner & customer Dell uses Quest’s solution for Oracle RAC to test the performance of the Oracle RAC architecture running on Dell Power Edge servers and EMC Clarion SAN & iSCSI Disk Arrays 18 DELL Success Story www.Quest.com/success_stories/Dell-Quest.pdf 19 DELL Success Story Database Configuration used at DELL for the RAC test environment 20 DELL Success Story Near Linear Scalability! 21 Step-By-Step Example Apply Methodology, “Best Practices”, and Quest’s RAC tools to optimize and quantify the approximate percentage of the improvements Note – will quote some specific examples for a given RAC setup, your mileage will surely vary Test = TPC-C (OLTP) for 200-2000 users, 10 GB 22 Step 1 - Application Nature Know Your Application Demands (this info flows downstream) • OLTP vs. Data Warehousing – – – – – • • Concurrent User Load Profile (i.e. user load over time) Focus on User Response Time Requirements – – • Primarily Read vs. Write Average Transaction Size Likelihood of “Dead Lock” Logging, Flashback and Recovery Requirements etc, etc, etc … For example, TPC-C must run each transaction <= 2 seconds Response Time ~= Wait Events • Cary Millsap (Hotsos) calls this “Method R” • Anjo Kolk & et al … (Oracle) call this “YAPP Method” • Kyle Haily (PerfVision) paper “Waits Defined” • etc, etc, etc … Don’t skip this step – cost can be enormous – and that no network, OS, or database tuning can compensate for !!!!! 23 Application “Best Practices” Well known rules: • Write efficient SQL and/or PL/SQL code (explain plans) • Use bind variables to reduce unnecessary “re-parsing” • Often the underlying Application Code (e.g. Benchmark) NOT changeable, so you can’t do anything Deeper TPC-C Analysis (remember across all next steps): • Primarily Read with Some Writes • Small Average Transaction Size • High Concurrency with Potential Deadlocks • Logging for ACID compliance and no flashback 24 Public Network “Best Practices” Well known rules: • Isolate Network (for single or related applications only) • Use Gigabit Ethernet (consider “bonding” multiple cards) • Use Layer 2 or 3 Switches and verify Gigabit throughput TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = No jumbo frames, Standard SDU/TCU • High Concurrency = Multiple Ethernet Segments (collisions) • No Logging, etc… = Nothing 25 Storage Network “Best Practices” Well known rules: • Isolate Network (for single or related applications only) • Use Fiber Channel for SAN, 10GB Ethernet for NAS/iSCSI • Consider multiple pathways per storage controller and HBA • Consider TCP/IP offload engine (TOE) NIC’s or iSCSI HBA’s TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = Jumbo frames since “Block Level” IO • High Concurrency = Fiber Channel and Multiple Pathways (if budget) • No Logging, etc… = Nothing 26 Storage Sub-System “Best Practices” Well known rules: • More Smaller Disks generally higher overall throughput • More memory cache generally higher overall throughput • Avoid “write-back” mode if no backup power source (e.g. battery) • Align Stripe Boundaries: drive, OS block, LVM, file sys, database block, etc • Stripe Depth (i.e. size) from 256 KB to 1 MB • Stripe Width (i.e. # disks) between 4 and 16 • Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request • Average I/O <= Stripe Width X Stripe Depth • Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered) TPC-C Analysis Ramifications: • Primarily Reads = RAID 5, Adjust cache memory allocations & look-ahead algorithms • Small Transaction = Stripe Depth >= db_block_size X db_file_multiblock_read_count • High Concurrency = Low Deadlock, so spread DB objects and partitions across LUN’s • No Logging, etc… = Logging but no flashback, so write IO is reasonable, so RAID 5 OK 27 Oracle Instance “Best Practices” Well known rules: • Size & Tune the SGA appropriately for application nature • Choose reasonable block size based on application nature (?8K?) • Partition large objects & indexes across storage devices & spindles • Don’t assume any “golden rules” – i.e. test all assumptions! TPC-C Analysis Ramifications: • Primarily Reads = opt_index_caching=80, opt_index_adj_cost=20 • Small Transaction = Size redo logs correctly for small size X high load • High Concurrency = cursor_space_for_time=t, cursor_sharng=similar • No Logging, etc… = Turn off “Recycle Bin” but keep “LOGGING” 28 Oracle Cluster “Best Practices” Well known rules: • Increase default SGA size for all ASM instances (64M too small) • Interconnect is the most important bottleneck – many bonded NIC’s • Consider hash partitions & reverse indexes to spread IO across nodes • Don’t assume any “golden rules” – i.e. really test all assumptions!!! TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = Decrease db_file_multiblock_read_count • High Concurrency = Decrease block size (?4K?) – see next slide • No Logging, etc… = Nothing 29 Single Instance – No Block Contention 30 Cluster – Block Contention Costs 31 Private Network “Best Practices” Well known rules: • Isolate Network (for single cluster only – and 0% public) • Use 10GB Ethernet or Inifini-Band (Dell found 15% RTI) • Consider multiple pathways per HBA and storage controller • Jumbo frames since high “Block Level” IO between nodes TPC-C Analysis Ramifications: • Primarily Reads = Nothing • Small Transaction = Nothing • High Concurrency = Lower block size until interconnect traffic OK – Consider increasing the OS priority of the global cache cluster services • No Logging, etc… = Nothing 32 Let’s Apply the Recommendations Read Count Block Size Cursor Cursor Index Space Share Cache Index Cost Jumbo Test 1 16 8 False Exact 0 100 False Test 2 2 8 False Exact 0 100 False Test 3 2 4 False Exact 0 100 False Test 4 2 4 True True 80 20 False Test 5 2 4 True True 80 20 True 33 Results – TPS (lesser interest here) Transactions / Second 35.00 30.00 25.00 Run 1 20.00 Run 2 Run 3 15.00 Run 4 Run 5 10.00 5.00 0.00 50 100 150 200 250 300 350 400 450 500 34 Results – Average Response Time Average Response Time 6.00 5.00 4.00 Run 1 Run 2 3.00 Run 3 Run 4 Run 5 2.00 1.00 Sub Second 0.00 50 100 150 200 250 300 350 400 450 500 35 Thank you Please offer any questions or comments Remember: • Eliminate “RAC in the Box” syndrome – eat low hanging fruit • Example was for TPC-C or OLTP type application • TPC-H or Data Warehouse would NOT be the same • Your mileage may well vary (especially percentages) Toad World Article: “Maximize Database Performance Via Toad for Oracle” http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx Dell Power Solutions article: http://www.quest.com/success_stories/Dell-Quest.pdf 36