TEN QUERY TUNING TECHNIQUES
Every SQL Programmer Should Know

Kevin Kline
Director of Engineering Services at SQL Sentry
Microsoft MVP since 2003
Facebook, LinkedIn, Twitter at KEKLINE
KEKline@sqlsentry.com
KevinEKline.com, ForITPros.com

FOR FRIENDS OF SQL SENTRY
• Free Plan Explorer download: http://www.sqlsentry.net/plan-explorer/
• Free query tuning consultations: http://answers.sqlperformance.com
• Free new ebook (regularly $10) to attendees. Send request to sales@sqlsentry.net.
• SQL Server educational videos, scripts, and slides: http://SQLSentry.TV
• Tuning blog: http://www.sqlperformance.com/
• Monthly eNews tips and tricks: http://www.sqlsentry.net/newsletterarchive.asp

AGENDA
• Introductions
• Test & tuning environment
  o 1. Clearing caches
• Looking for red flags
  o 2. Reading execution plans
• Query tuning techniques:
  o 8 more specific examples of widespread approaches that lead to poor performance
• Summary

TEST & TUNING ENVIRONMENT
• Code to clear the caches*:
  o CHECKPOINT
  o DBCC [FreeProcCache | FreeSystemCache | FlushProcInDB(<dbid>)]
  o DBCC DropCleanBuffers
• Code to set measurements:
  o SET STATISTICS [TIME | IO]
  o SET SHOWPLAN_[TEXT | XML] or Graphic Execution Plans
• Code for Dynamic Management View (DMV) checks:
  o System info – sys.dm_os_performance_counters, sys.dm_os_wait_stats
  o Query info – sys.dm_exec_requests
  o Index info – sys.dm_db_index_usage_stats, sys.dm_io_virtual_file_stats

RED FLAGS IN YOUR SQL CODE
• Red-flag query operators:
  o Lookups, Scans, Spools, Parallelism Operations
• Other red flags:
  o Dissimilar estimated versus actual row counts
  o High physical reads
  o Missing statistics alarms
  o Large sort operations
  o Implicit data type conversions

DEMOS: DEFAULT CURSORS
• I don’t always use cursors…
  o …but when I do, I avoid the default options
  o Slow and heavy-handed: Global, updatable, dynamic, scrollable
  o I use LOCAL FAST_FORWARD
  o May want to test STATIC vs. DYNAMIC, when tempdb is a bottleneck
• Blog post: http://bit.ly/AB-cursors

DEMOS: WHERE IN VERSUS WHERE EXISTS
• There are lots of ways to find data existing within subsets:
  o IN, EXISTS, JOIN, APPLY, subquery
• Which technique is best?
• Blog post: http://bit.ly/AB-NOTIN

OPTIMIZING FOR SELECT VERSUS DML
• Big differences between a SELECT and a DML statement that affects the same rows.
• Shouldn’t blindly create every index the Tuning Advisor or execution plan tells you to!
• Blog post: http://bit.ly/AB-BlindIndex

READS & INDEX STRUCTURE
• 8K pages
• Leaf pages ARE the data.
• Non-leaf pages are pointers.
[Diagram: index B-tree with the root page (level 2), intermediate pages (level 1), and leaf pages (level 0)]

WRITES & INDEX STRUCTURE
• Each change to the leaf pages requires all index structures be updated.
[Diagram: the same B-tree (root page, intermediate pages, leaf pages; levels 2, 1, 0), labeled "DML", "Page Split", and "Actual placement"]

DEMOS: UNWANTED RECOMPILES
[Flowchart. Node labels: Execution; In Memory? (NO / YES); Load metadata; Compile; ReComp; Optimize; Execute]

CAUSES OF RECOMPILE
• Expected: Because we request it:
  o CREATE PROC … WITH RECOMPILE or EXEC myproc … WITH RECOMPILE
  o sp_recompile foo
• Expected: Plan was aged out of memory
• Unexpected: Interleaved DDL and DML
• Unexpected: Big changes since last execution:
  o Schema changes to objects in underlying code
  o New/updated index statistics
  o sp_configure

INTERLEAVED DDL AND DML
  CREATE PROC testddldml AS … ;
  CREATE TABLE #testdml;        -- (DDL)
  <some T-SQL code here>
  INSERT INTO #testdml;         -- (DML + RECOMPILE)
  <some T-SQL code here>
  ALTER TABLE #testdml;         -- (DDL)
  <some T-SQL code here>
  INSERT INTO #testdml;         -- (DML + RECOMPILE)
  <some T-SQL code here>
  DROP TABLE #testdml;          -- (DDL)
  <some T-SQL code here>

DEMOS: THE "KITCHEN SINK" PROCEDURE
• Usually see it as a one-query-for-all-queries procedure, or even a one-proc-for-all-transactions procedure:
  o Where name starts with S, or placed an order this year, or lives in Texas
  o Insert AND Update AND Delete AND Select
• Conflicting optional parameters make optimization impossible
  o OPTION (RECOMPILE)
  o Dynamic SQL + Optimize for ad hoc workloads
  o Specialized procedures
• Better approach?
  o Specialize and optimize each piece of code to do ONE THING really effectively

DEMOS: SP_EXECUTESQL VS. EXEC(…)
• I don’t always use dynamic SQL…
  o …but when I do, I always use sp_executesql
  o Less fuss with concatenation and implicit/explicit conversions
  o Better protection against SQL injection (but not for all things)
  o In the worst case, behavior is the same
• Can promote better plan re-use
• Encourages strongly typed parameters instead of building up a massive string

IMPLICIT CONVERSIONS
• SQL Server has to do a lot of extra work / scans when conversion operations are assumed by the SQL programmer.
• Happens all the time with data types you’d think wouldn’t need it, e.g. between date types and character types.
• Very useful data type conversion chart at http://bit.ly/15bDRRA.
• Data type precedence can also have an impact: http://bit.ly/13Zio1f.

IMPLICIT CONVERSION RESOURCES
• Ian Stirk’s Column Mismatch Utility at http://www.sqlservercentral.com/articles/Administration/65138/.
• Jonathan Kehayias’ plan cache analyzer at http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/01/08/finding-implicit-column-conversions-in-the-plan-cache.aspx.
• Jonathan Kehayias’ index scan study at http://www.sqlperformance.com/2013/04/t-sql-queries/implicit-conversion-costs

DEMOS: COMMA-DELIMITED PARAMETERS
• Example: pass a comma-separated list of OrderIDs
• String splitting is expensive, even using CLR
• Table-valued parameters are typically a better approach

DEMOS: TEMPORARY STRUCTURES
• Which are better, temp tables or table variables?

                       Temp Table   Table Variable
  Stored in?           Tempdb       Tempdb
  Statistics?          Yes          No (1 row)
  Indexes/Keys?        Yes          1 UK / PK only
  Truncate?            Yes          No
  Recompiles?          Yes          No
  Parallelism?         Yes          No
  Metadata overhead?   Low          Lowest
  Lock overhead?       Normal       Lowest

CODING STANDARDS AND DISSIMILARITY
• Might sound frivolous, but naming schemes are important
  o Which convention you pick is not important; being consistent and logical is
  o Story: dbo.UpdateCustomer vs. dbo.Customer_Update
• Always specify schema when creating, altering, referencing objects
  o Object resolution works a little bit harder without it
  o More importantly, it can get the wrong answer
  o And will often yield multiple copies of the same plan
• Do not use the sp_ prefix on stored procedures
  o This has observable overhead, no matter how specific you are

MIMICKING PRODUCTION
• Your dev machine is usually nothing like production
  o Build representative data when you can
  o Build a stats-only database when you can’t (a.k.a. a database clone)
    • Will allow you to see plan issues, but not speed
  o Make sure settings are the same
    • @@VERSION, edition
    • Max memory if possible, sp_configure options
    • Logins (and permissions), tempdb settings
    • Parameterization settings, recovery model, compression, snapshot isolation
    • Compatibility level (usually not an issue when working with a restore)
    • Run a full business cycle workload after a restore
  o Simulate equivalent hardware: DBCC OPTIMIZER_WHATIF
  o Use Distributed Replay when you can
    • Not perfect, but more realistic than single-threaded trace replay

SUMMARY
Let’s connect!
Facebook, LinkedIn, Twitter at KEKLINE.
Email at KEKline@sqlsentry.com
Blogs at http://KevinEKline.com and http://ForITPros.com

WRAP UP
• Engage with us on social media
  o I’m thankful for your word-of-mouth promotions and endorsements!
• Share your tough SQL tuning problems with us: http://answers.sqlperformance.com
• Download SQL Sentry Plan Explorer for free: http://www.sqlsentry.com/plan-explorer/
• Check out our other award-winning tools: http://www.sqlsentry.net/download

NOLOCK
(Image: http://www.flickr.com/photos/relyjus/4289185639/)
• It is a turbo button… if you’re OK with inaccuracy
• There are times it is perfectly valid
  o Ballpark row counts
  o Please use session-level setting, not table hint
• Usually, though, better to use SNAPSHOT or RCSI
  o But test under heavy load
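
The NOLOCK advice above (session-level setting rather than table hint, with SNAPSHOT or RCSI as the usual better choice) could be sketched roughly as follows. This is a minimal illustration, not code from the demos: the #Orders temp table and the [YourDatabase] name are hypothetical, and the database-level statements are left commented out because they change behavior for every session.

```sql
-- Hypothetical scratch table, used only so the sketch is self-contained.
CREATE TABLE #Orders (OrderID int PRIMARY KEY, OrderDate date NOT NULL);
INSERT INTO #Orders (OrderID, OrderDate)
VALUES (1, '20130101'), (2, '20130102'), (3, '20130103');

-- Table-hint form: has to be repeated on every table reference in every query.
SELECT COUNT(*) AS BallparkRows FROM #Orders WITH (NOLOCK);

-- Session-level form: one statement, easy to find and easy to switch back.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) AS BallparkRows FROM #Orders;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- back to the default level

-- Row-versioning alternatives (database-wide settings, so commented out here).
-- Turning on READ_COMMITTED_SNAPSHOT generally needs no other active
-- connections in the database; versioning adds tempdb load, so test first.
-- ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON;   -- RCSI
-- ALTER DATABASE [YourDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;  -- SNAPSHOT
-- SET TRANSACTION ISOLATION LEVEL SNAPSHOT;                       -- per session

DROP TABLE #Orders;
```

Both of the first two forms read uncommitted data, so either can return inaccurate counts under concurrent writes. The session-level form simply keeps that choice in one place instead of scattering hints through the code.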