SQL Performance 2011/12 Joe Chang, SolidQ www.qdpma.com http://sqlblog.com/blogs/joe_chang/default.aspx jchang6@yahoo.com 2011 • Hardware is Powerful & Cheap – CPU (cores), memory, and now IO too! • Quad-core since 2006, 1GHz since 2000 • Is Performance still a concern? – Yes, along with fundamentals • Modern Performance Strategy – Can handle minor inefficiencies – Identify and circumvent the really bad things Modern Hardware • 4-12 cores per processor socket • 16GB DIMM at less than $1K • SSD – Enterprise grade SSD still moderately expensive • Both SLC and MLC – Consumer SSD – really cheap, $3-4K per TB • Uneven performance characteristics over time • Desired IO performance: 1-2GB/s, 20-50K IOPS Hardware Baseline 2011 Entry • 1 Xeon E3 quad-core • 16GB memory – 4 x 4GB unbuffered ECC • SSD options: – 2-4 SATA SSDs – 1-2 PCI-E SSDs Mid-range • 2 Xeon 5600 6-core • 48 – 192GB – 6 x 8GB to 12 x 16GB • SSD options – 16+ SATA SSDs – 4-5 PCI-E SSDs Performance Fundamentals • Network round-trips – Owner qualified, case correct • Log write latencies – Sufficiently low to support transaction volume • Not necessarily separate data and log disks • Normalization – correct data trumps all! • Indexes – a few good ones, and not too many! • SQL – that the optimizer What can go wrong? • With immense hardware resources • And a great database engine – What can go wrong? • Following a fixed set of rules and procedures – Basic transactions processing should work well – If your process does something unanticipated • Some things can go horribly wrong Performance Concepts • Query Optimizer • Execution plan operators – Formula for component operation • Data distribution statistics – to estimate rows & pages, automatically updated – Rules when estimate not possible • Stored procedure compile rules – Parameters and variables Stored Procedure Basics Parameters and Variables • On compile – Parameter values used to for row estimate – Variables – assume unknown value • Consider effect of skewed distribution Parameter & Variable 6 rows for value 1 4 rows for value unknown Consider impact for skewed data distributions Stored Procedure Compile Options • WITH RECOMPILE • OPTIMIZE FOR • Plan Guide • Temp table – KEEP PLAN, KEEPFIXED PLAN Compile & Execute Time • Plan reuse desired when – Compile cost is high relative to execute cost • Recompile desired when – Execute cost is high relative to compile cost Statistics Basics Statistics • No statistics – table variables • Temp table – statistics auto recompute – 6 row modified, 500 rows, every 20% thereafter • Statistics sampling – Random page, how to handle skewed distribution? • Upper and lower bounds – Problems caused by incrementing columns • Propagation errors Statistics Recompute • Scenario: start with accurate statistics • Update column with new values – That did not previously exist • If fewer than 20% of rows updated – Auto-recompute is not triggered Sampling • Default sampling percentage is usually good • Caution: not a random row sample! – Random sampling of page • From nonclustered index if available • If there is correlation between pages & values – Then serious over estimation possible Out of range • Statistics sampling tries to identify lower and upper bound Bad Execution Plan Examples Not comprehensive Scenarios Not comprehensive • Or condition • Multiple optional search arguments • Skewed distributions – 1 business logic for Small and large data sets – Reports for 1 day, 1 week, 1 month, 1 year • Statistics related problems – Resulting in horrible execution plan When the Query Optimizer Does not understand you Simple OR Conditions Simple Index Seek OR Condition in Join Alternative – Union UNION and UNION ALL • UNION – Only distinct rows – Sort to eliminate duplicates • Can be expensive for high row counts • UNION ALL – All rows – No sort to eliminate duplicates Multiple Optional SARGs This was suppose to work, but does not Parameterized SQL Skew and Range Variation Statistics Out-of Range Statistics Out of Range (cont.) Table Variable No Statistics assumes 1 row, 1page Why? No recompiles Loop Join – Scan Inner Source Estimate 1 row Really Bad News The Correct Plan Estimate 1 row Hash Join forced with hint Loop Join vs. Hash Join Alternative Plan with Index Temp Table versus CTE • Consider options – SELECT xxx INTO #Temp – FROM Sql Main Expression – WITH tmp AS (SELECT xxx FROM Sql) Main Expression