Modern Performance - SQL Server
Joe Chang
www.qdpma.com
Jchang6 @ yahoo

About Joe
• SQL Server consultant since 1999
• Query Optimizer execution plan cost formulas (2002)
• True cost structure of SQL plan operations (2003?)
• Database with distribution statistics only, no data (2004)
• Decoding statblob/stats_stream – writing your own statistics
• Disk IO cost structure
• Tools for system monitoring, execution plan analysis
See ExecStats: http://www.qdpma.com/ExecStats/SQLExecStats.html
Download: http://www.qdpma.com/ExecStatsZip.html
Blog: http://sqlblog.com/blogs/joe_chang/default.aspx

Overview
• Why is performance still important today?
  – Brute force? Yes, but …
• Special Topics
• Automating data collection
• SQL Server Engine – what do developers/DBAs need to know?

CPU & Memory: 2001 versus 2012
[Block diagrams of the 2001 four-socket FSB/MCH system (Pentium III Xeon, Xeon MP 2002-04) and the 2012 four-socket QPI/PCI-E system omitted]
• 2001 – 4 sockets, 4 cores total: Pentium III Xeon, 900MHz, 4-8GB memory?
• 2012 – 4 sockets, 8 cores each, 4 x 8 = 32 cores total: Westmere-EX, 1TB (64 x 16GB)
• Sandy Bridge E5: 768GB (48 x 16GB)
• Xeon E7 v2: 15 cores per socket, 3TB (96 x 32GB)
• Memory prices: 16GB DIMM $191 (2013), $180 (2014); 32GB DIMM $794 (2013), $650 (2014)
• Each core today is more than 10x over the Pentium III

CPU & Memory: 2001 versus 2014
[Block diagrams of the Xeon E7 v2 four-socket system (24 DIMMs per socket, 12 shown) and the E3 v3 (graphics, memory controller, DMI, PCH) omitted]
• Xeon E7 v2 (Ivy Bridge): 4 x 15 = 60 cores, 3TB (96 x 32GB)
• Intel E5 & E7 v2 (Ivy Bridge), E3 v3

Storage: 2001 versus 2012/13
[Diagrams of the 2001 PCI RAID/HDD configuration and the 2013 PCIe SSD configuration omitted]
• 2001: 100 x 10K HDD, 125 IOPS each = 12.5K IOPS
  – IO bandwidth limited: 1.3GB/s (about 1/3 of memory bandwidth)
• 2013: 64 SSDs, >10K IOPS each – 1M IOPS possible
  – IO bandwidth: 10GB/s easy
  – SAN vendors – questionable bandwidth
http://www.qdpma.com/Storage/Storage2013.html
http://www.qdpma.com/ppt/Storage_2013.pptx
[SAN topology diagram omitted: two 768GB nodes on 8Gbps FC / 10Gbps FCoE switches, SP A/SP B with 24GB cache each, x4 SAS at 2GB/s, 16 data volumes, 4 SSD volumes, 4 log volumes, 10K/7.2K hot spares]
http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx
http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx

Performance Past, Present, Future
• When will servers be so powerful that … ?
  – People have been saying this for a long time
• Today – 10 to 100X overkill
  – 32 cores in 2012, 60 cores in 2014
  – Enough memory that IO is only sporadic
  – Effectively unlimited IOPS with SSD
• What can go wrong?
Today’s Topic
Factors to consider: SQL, tables, indexes, statistics, the Query Optimizer, the Storage Engine, hardware, compile parameters, DOP, memory.

Special Topics
• Data type mismatch
• Multiple Optional Search Arguments (SARG)
  – Function on SARG
• Parameter Sniffing versus Variables
• Statistics related (big topic)
• First OR, then AND/OR combinations
• Complex Query with sub-expressions
• Parallel Execution
Not in order of priority
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

1a. Data Type Mismatch

DECLARE @name nvarchar(25) = N'Customer#000002760'
SELECT * FROM CUSTOMER WHERE C_NAME = @name

Table column is varchar; parameter/variable is nvarchar (.NET auto-parameter discovery?) – unable to use an index seek.

SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar, @name)

1b. Type Mismatch – Row Estimate

SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer#00000276%'
SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer#00000276%'

A row estimate error could have severe consequences in a complex query.

SELECT TOP + Row Estimate Error

SELECT TOP 1000 [Document].[ArtifactID]
FROM [Document] (NOLOCK)
WHERE [Document].[AccessControlListID_D] IN (1,1000064,1000269)
AND EXISTS (
  SELECT [DocumentBatch].[BatchArtifactID]
  FROM [DocumentBatch] (NOLOCK)
  INNER JOIN [Batch] (NOLOCK) ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID]
  WHERE [DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID]
  AND [Batch].[Name] LIKE N'%Value%'
)
ORDER BY [Document].[ArtifactID]

• Data type mismatch – results in a high row estimate
• TOP clause – the optimizer assumes it is easy to find the first 1000 rows
• In fact, there are few rows that match the SARG
• Result: the wrong plan, one built for evaluating a large number of rows
http://www.qdpma.com/CBO/Relativity.html

2. Multiple Optional SARGs

DECLARE @Orderkey int, @Partkey int = 1
SELECT * FROM LINEITEM
WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey)
AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

IF block (these are actually the stored procedure parameters):

DECLARE @Orderkey int, @Partkey int = 1
IF (@Orderkey IS NOT NULL)
  SELECT * FROM LINEITEM
  WHERE (L_ORDERKEY = @Orderkey)
  AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
ELSE IF (@Partkey IS NOT NULL)
  SELECT * FROM LINEITEM WHERE (L_PARTKEY = @Partkey)

Need to consider the impact of parameter sniffing; consider the OPTIMIZE FOR hint.

Dynamically Built Parameterized SQL

DECLARE @Orderkey int, @Partkey int = 1
  , @SQL nvarchar(500), @Param nvarchar(100)
SELECT @SQL = N'/* Comment */ SELECT * FROM LINEITEM WHERE 1=1'
  , @Param = N'@Orderkey int, @Partkey int'
IF (@Orderkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_ORDERKEY = @Orderkey'
IF (@Partkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_PARTKEY = @Partkey'
PRINT @SQL
EXEC sp_executesql @SQL, @Param, @Orderkey, @Partkey

• An IF block is easier for a few options
• Dynamically built parameterized SQL is better for many options
• Consider a /* comment */ to help identify the source of the SQL

2b. Function on Column SARG

SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1

SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE L_SHIPDATE BETWEEN '1995-01-01' AND '1995-01-31'

DECLARE @Startdate date, @Days int = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE L_SHIPDATE BETWEEN @Startdate AND DATEADD(dd,1,@Startdate)

Estimated versus Actual Plan – Rows
Estimated plan – 1 row???
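A half-open date range is an alternative sketch for the month filter above: it stays sargable, and unlike BETWEEN with a fixed end date it also works if the column were a datetime with a time portion (assumes the TPC-H LINEITEM table used throughout; the variable name is hypothetical):

```sql
-- Half-open date range: index-seekable, and no boundary rows are
-- missed at the end of the month.
DECLARE @MonthStart date = '1995-01-01';

SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE >= @MonthStart
  AND L_SHIPDATE <  DATEADD(MONTH, 1, @MonthStart);
```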
Actual plan – actual rows: 77,356

3. Parameter Sniffing

-- first call, procedure compiles with these parameters
exec p_Report @startdate = '2011-01-01', @enddate = '2011-12-31'
-- subsequent calls, procedure executes with the original plan
exec p_Report @startdate = '2012-01-01', @enddate = '2012-01-07'

Assuming the date data type, different execution plans are needed for narrow and wide ranges. Options:
1) WITH RECOMPILE
2) The main procedure calls one of two identical sub-procedures:
   one sub-procedure is only called for narrow ranges, the other for wide ranges
Skewed data distributions are also important. Example: large & small customers.

4. Statistics
• Auto-recompute points
• Sampling strategy
  – How much to sample – theory?
  – Random pages versus random rows
  – Histogram equal and range rows
  – Out of bounds, value does not exist
  – etc.
Statistics Used by the Query Optimizer in SQL Server 2008
Writer: Eric N. Hanson and Yavor Angelov; Contributor: Lubor Kollar
http://msdn.microsoft.com/en-us/library/dd535534.aspx

Statistics Structure
• Stored (mostly) in a binary field
• Scalar values
• Density vector – limit 30 columns: half for the NC key, half for the cluster key
• Histogram – up to 200 steps
• Consider not blindly using IDENTITY on critical tables
  Example: large customers get low ID values, small customers get high ID values
http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx

Statistics Auto/Re-Compute
• Automatically generated on query compile
• Recompute at 6 rows, 500 rows, every 20%? Has this changed?

Statistics Sampling
• Sampling theory
  – True random sample
  – Sample error – square root of N
  – Relative error 1/√N
• SQL Server sampling
  – Random pages (but always the first and last page???)
  – All rows in the selected pages

Row Estimate Problems
• Skewed data distribution
• Out of bounds
• Value does not exist

Loop Join – Table Scan on Inner Source
The estimated output from the first 2 tables is zero or 1 rows. The most efficient join to the third table (without an index on the join column) is a loop join with a scan. But if the actual row count is 2 or more, a full scan is performed for each row from the outer source.
Default statistics rules may lead to serious ETL issues – consider a custom strategy.

Compile Parameter Does Not Exist
• The main procedure has a cursor around view_Servers
• The first server in view_Servers is 'CAESIUM'
• The cursor executes a sub-procedure for each server:
  SELECT MAX(ID) FROM TReplWS WHERE Hostname = @ServerName
• But CAESIUM does not exist in TReplWS!
Good and bad plan?

SqlPlan Compile Parameters

<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0">
 <BatchSequence>
  <Batch>
   <Statements>
    <StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname = @ServerName"
      StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1"
      StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2"
      StatementOptmEarlyAbortReason="GoodEnoughPlanFound">
     <StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true"
       ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />
     <QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168">
      <RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0"
        EstimateCPU="1e-007" AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0"
        EstimateRebinds="0" EstimateRewinds="0">
      </RelOp>
      <ParameterList>
       <ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" />
      </ParameterList>
     </QueryPlan>
    </StmtSimple>
   </Statements>
  </Batch>
 </BatchSequence>
</ShowPlanXML>

The compile parameter values appear at the bottom of the .sqlplan file.

More Plan Details
• A query joining 6 tables; each table has too many indexes
• The row estimate is high, so the plan cost is high
• The query optimizer tries really, really hard to find a better plan
• The actual row count is moderate – any plan works

5a. Single Table OR

SELECT * FROM LINEITEM WHERE L_ORDERKEY = 1 OR L_PARTKEY = 184826

5a. Join 2 Tables, OR in SARG

SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099

5a. UNION (ALL) instead of OR

SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 184826
UNION (ALL)
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY = 137099
-- AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL)  -- Hugo Kornelis trick
-- Caution: the select list should include keys to ensure correct rows

UNION removes duplicates (with a Sort operation); UNION ALL does not.

5b. AND/OR Combinations
• A hash join is a good method for processing many rows
  – Requirement: an equality join condition

SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)
SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)

• In complex SQL with AND/OR or IN/NOT IN combinations:
  – The query optimizer may not be able to determine that an equality join condition exists
  – The execution plan will use a loop join, and an attempt to force a hash join will be rejected
• Rewrite using UNION in place of OR, and LEFT JOIN in place of NOT IN
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity3.html

Complex Query with Sub-expressions
• Query complexity – really high compile cost
• Repeating sub-expressions (including CTEs) must be evaluated multiple times
• Main problem: row estimate error propagation
• Solution/strategy: get a good execution plan
  – Use a temp table when the estimate is high and the actual row count is low
  – When the estimate is low and the actual row count is high, balance the temp table insert overhead against the plan benefit
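One way to apply the temp-table strategy above is to materialize the repeated sub-expression once, so that later joins start from real statistics rather than a propagated estimate. A minimal sketch using the TPC-H tables from the earlier examples (the date filter and temp table name are hypothetical):

```sql
-- Materialize the shared sub-expression once; #CustOrders gets
-- its own statistics, so the optimizer sees actual row counts.
SELECT O_CUSTKEY, COUNT(*) AS OrderCount
INTO #CustOrders
FROM ORDERS
WHERE O_ORDERDATE >= '1995-01-01'
GROUP BY O_CUSTKEY;

-- Later references join the temp table instead of re-evaluating
-- the sub-expression (or CTE) multiple times.
SELECT C.C_NAME, T.OrderCount
FROM #CustOrders T
INNER JOIN CUSTOMER C ON C.C_CUSTKEY = T.O_CUSTKEY;

DROP TABLE #CustOrders;
```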
• Would a join hint work?
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity4.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

Temp Table and Table Variable
• Forget what other people have said – most of it is cr@p
• Temp tables – subject to statistics auto/re-compile
• Table variables – no statistics; the optimizer assumes 1 row
• The question, in each specific case: do the statistics and recompile help or not?
  – Yes: temp table
  – No: table variable

Parallelism
• The settings were designed for the 1998 era
  – Cost Threshold for Parallelism: default 5
  – Max Degree of Parallelism – instance level
  – OPTION (MAXDOP n) – query level
• Today – complex systems with 32 cores
  – A plan-cost-5 query might run in 10ms
  – Some queries should run at DOP 4, others at DOP 16?
  – Really need to rethink parallelism / NUMA strategies
  – Keep (number of concurrently running queries) x DOP less than the number of logical/physical processors?
More on parallelism:
http://www.qdpma.com/CBO/ParallelismComments.html
http://www.qdpma.com/CBO/ParallelismOnset.html

Full-Text Search
• Loop join with full-text search as the inner source – potentially executed many times
• varchar(max) is stored in LOB pages
• Disk IO to LOB pages is synchronous?
  – Must access the row to get the 16-byte link?
  – Feature request: an index pointer to the LOB
See SQL PASS 2013, Understanding Data Files at the Byte Level, Mark Rasmussen

Summary
• Hardware today is really powerful
  – Storage may not be – SAN vendor disconnect
• Standard performance practice: top resource consumers, index usage
• But also look for serious blunders
http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html
http://www.qdpma.com/CBO/Relativity.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

Special Topics (recap)
• Data type mismatch
• Multiple Optional Search Arguments (SARG)
  – Function on SARG
• Parameter Sniffing versus Variables
• Statistics related (big topic)
• AND/OR
• Complex Query with sub-expressions
• Parallel Execution

SQL Server Edition Strategies
• Enterprise Edition – per-core licensing costs
  – Old system strategy: 4 (or 2)-socket server, top processor, max memory
  – Today: how many cores are necessary?
    A 2-socket system, max memory (16GB DIMMs)
• Is Standard Edition adequate?
  – Low cost, but many important features are disabled
• BI Edition – 16 cores, limited to 64GB for the SQL Server process

New Features in SQL Server
• 2005
  – Index included columns
  – Partitioning
  – CLR
• 2008
  – Filtered indexes
  – Compression
• 2012
  – Columnstore (non-clustered)
• 2014
  – Clustered columnstore
  – Hekaton (In-Memory OLTP)
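As a minimal illustration of the 2012 columnstore feature listed above (the index name is hypothetical; note that in SQL Server 2012 a nonclustered columnstore index makes the table read-only until the index is disabled or dropped):

```sql
-- Nonclustered columnstore index on the TPC-H LINEITEM table,
-- covering columns used by the reporting queries in this deck.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_LINEITEM
ON LINEITEM (L_ORDERKEY, L_PARTKEY, L_SHIPDATE, L_QUANTITY, L_EXTENDEDPRICE);
```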