SQL Server Technical Article

Writer: Jonathan Foster
Technical Reviewer: Robert L. Davis
Special Acknowledgement: Jimmy May, Andre Ciabattari, John Yi, and Ken Hughes.
Published: April 2011
Applies to: SQL Server 2008 R2 – Hosting TempDB on Fusion-IO ioDrive Duo 1280GB MLC

Summary: This paper compares TempDB performance between direct-attached physical disks and the Fusion-IO MLC DIMMS.

Copyright

This document is provided “as-is”. Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes. © 2011 Microsoft. All rights reserved.

Introduction

The Product Quality & Online group (PQO) at Microsoft was asked to design a centralized data store hosting copies of all the OLTP and data warehouses managed by their parent group, Customer Service & Support (CSS). While the long-term plan is a scale-out approach using Parallel Data Warehousing (PDW), the short-term plan is a scale-up approach based on the familiar star schema where several individual servers running ETLs and replication publications feed one very powerful server (VPS) that presents all the data to users. Due to the ad hoc nature of the queries expected to be run on the VPS, the monthly increase of end users, and the need to actively manage maintenance tasks across 200+ databases, the designers were concerned about the disk I/O performance of TempDB.

The main driving factor for a scale-up design was the end users' need to perform cross-database queries. Linked-server queries and distributed queries would not provide acceptable performance to end users.

A scale-up design was implemented in 2009 using an HPDL5-580 G5 w/ 24 CPU cores and 256GB RAM.
The server was attached to an EMC CX4-240 SAN with a mixture of Solid State (SSD), Fiber Channel (FC), and SATA drives. Performance analysis revealed that TempDB accounted for approximately 30% of all the I/O on the server on any given day. For this reason, TempDB was hosted on 2 LUNs spread across 48 72GB SAS 15k drives that were direct-attached to the server. At an estimate of 180 IOPS per disk, this setup provided approximately 5750 IOPS.

The size of the data being hosted and the number of users grew significantly in 2009/2010, which forced another upgrade in 2011. The to-be presentation server is an HP5-DL580 G7 w/ 48 cores and 1TB of RAM. The server was attached to an EMC CX4-960 SAN having a mixture of Solid State (SSD), Fiber Channel (FC), and SATA drives. The new G7’s attached storage for hosting TempDB consists of 32 300GB SAS 10k disks. An estimate of 140 IOPS per disk provides approximately 2770 IOPS for the entire LUN, resulting in a 50% throughput reduction from the G5 attached storage.

TempDB storage usage from 2009-2010 was typically 400GB per day with a maximum of 800GB. Performance of TempDB on the G5 was very good, with disk response times predominantly in the 5ms-8ms range and an average queue depth less than 1. The average daily load on the TempDB physical disks from 80% of the users is approximately 400 writes and 480 reads. The remaining 20% of the users produced an average daily load of approximately 55400 writes and 85050 reads, and these users are the primary stakeholders of the data. The number of primary stakeholders is expected to increase dramatically in the next 12 months, as will the amount of data they will be querying. For this reason it is imperative that the performance of TempDB be maintained at least at current levels, which rules out the use of the G7 attached storage.
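The spindle estimates above can be sanity-checked with a rough calculation. The Python sketch below is only an illustration: the disk counts, per-disk IOPS and quoted aggregates are taken from the text, while the helper names are made up for this example. The paper does not say what RAID or workload adjustment produced the 5750 and 2770 figures, so the raw products computed here are upper bounds rather than reproductions of those numbers. Both the raw products and the quoted aggregates point to a reduction of roughly one half.

```python
# Back-of-the-envelope check of the spindle IOPS estimates.
# Disk counts, per-disk IOPS and quoted aggregates come from the text;
# the function names are illustrative only.

def raw_iops(disks: int, iops_per_disk: int) -> int:
    """Raw aggregate IOPS with no RAID or workload penalty applied."""
    return disks * iops_per_disk


def reduction_pct(old: float, new: float) -> float:
    """Percentage drop when going from `old` to `new`."""
    return (old - new) / old * 100.0


# G5: 48 x 15k SAS disks at an estimated 180 IOPS each.
g5_raw = raw_iops(48, 180)
# G7: 32 x 10k SAS disks at an estimated 140 IOPS each.
g7_raw = raw_iops(32, 140)

# Aggregates as quoted in the paper (derivation not stated there).
g5_quoted, g7_quoted = 5750, 2770

print(f"raw G5={g5_raw} G7={g7_raw} "
      f"reduction={reduction_pct(g5_raw, g7_raw):.1f}%")
print(f"quoted G5={g5_quoted} G7={g7_quoted} "
      f"reduction={reduction_pct(g5_quoted, g7_quoted):.1f}%")
```

Keeping the raw and quoted figures side by side makes it clear that the "50% reduction" conclusion does not depend on the unstated adjustment.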
Use of SAN storage to host TempDB is not desired because the number of spindles required to provide the needed throughput results in a significant amount of unused storage. Since the TempDB database accounts for over 20% of the I/O, segregating that I/O off of the SAN heads helps reduce the overall load on the SAN as well.

Test Environment

Hardware
  Platform: HP ProLiant SE326M1 2U SAS Model
  CPU (Clock Speed, Cache, Max TDP): 2 x 8 Intel Xeon L5520 2.26GHz, 8MB L3 Cache, 60W
  Memory: 12 x 4GB PC3-10600R (48GB total)
  HDDs (Capacity, Rotational Speed, Interface, Form Factor): 20 x 146GB 10K SAS 2.5” (T$), 5 x 300GB 10K SAS 2.5” (H$)
  ioDIMMS: 2 x Fusion 640 ioDrive Duo cards (U$)
  LUN Configuration:
    LUN1 & LUN2 = 2x146GB RAID1 (OS & SQL)
    LUN3 = 7x300GB RAID5 (SQL data and log files)
    LUN4 = 16x146GB RAID10 (TempDB)
    LUN5 = 2x640 MLC ioDimms RAID1 (TempDB)
  Controller: HP P410/256 Smart Array Controller with Battery
  Power Supply: 2 x 750W Gold N+1 Redundant Power
  Network: Single Dual Port Embedded Intel NIC

Storage Configuration
  HDDs:
    Phase 1 ~ T$ = RAID1+0; H$ = RAID5
    Phase 2 ~ T$ = RAID 0; H$ = RAID5
  ioDimms: U$ = RAID 0

Software
  OS: Windows Server 2008 R2 Enterprise
  SQL: Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)

Test

The test involves a variety of TempDB-intensive queries executed concurrently via SSIS while disk performance counters are captured to measure I/O throughput. The test is executed 6 times against the same server: 3 runs with TempDB hosted on SAS drives and 3 runs with TempDB hosted on the Fusion-IO drives.

Test Details

1) DDL Tasks
   a. Analyze index fragmentation in a 1.25TB database having 1243 indexes in 436 tables (max index size = 3530024KB and min index size = 16KB).
   b. Reorganize all indexes found to have more than 10% and less than 30% fragmentation (est. 130 indexes at 5840KB)
      i. Rebuild all indexes using the Sort in TempDB option that are found to have more than 29% fragmentation (est. 688 indexes at 56380144KB)
      ii. Run DBCC CHECKTABLE on 11242382KB table
2) DML Tasks – (executed simultaneously with the above-mentioned DDL Tasks)
   a. Create and populate 3 temp tables with 1247328KB each of sorted data including LOB data types (all subsequent items are performed in 3 streams synchronously, the only difference being the WHERE qualifiers in the final queries)
      i. Regular SELECTs and INSERTs:
         1. Perform sorted distinct select of 2469264KB worth of data into user table from each temp table.
         2. Insert 4938528KB of data to new temp table using UNION ALL
         3. Insert 4938528KB of data to new temp table
         4. Insert 555375KB of data to new temp table using sorted SUBSTRING SELECT with WHERE qualifier on LOB column.
      ii. Cursors:
         1. Insert 4938528KB of data to new temp table using UNION ALL
         2. Insert 4938528KB of data to new temp table by combining 4 NVARCHAR(MAX) columns into 1 NVARCHAR(MAX) column
         3. Create 7 record metadata table having 1 NVARCHAR(MAX) column
         4. Cursor through 4938528 rows in table ii looking for data elements found in the 7 record meta table. INSERT found records to new temp table.
   b. Create and populate temp table with 28GB of data including LOB and uniqueidentifier data types.
      i. Use FOR WHILE loop to populate 2nd temp table with all rows from 1st temp table while scrambling the LOB data. Create clustered index on 1st temp table.
      ii. SELECT INTO 3rd temp table of all columns and rows from temp tables 1 & 2. Create clustered index on 3rd temp table.
      iii. Using the 3rd temp table, perform 100 sorted selects using a WHERE qualifier on the LOB columns
      iv. Using the 1st & 2nd temp tables via JOIN, perform 100 sorted selects using a WHERE qualifier on the LOB column and uniqueidentifier columns.
      v. Using the 1st & 2nd temp tables via JOIN, perform 200 sorted selects using a WHERE qualifier on the LOB column and uniqueidentifier columns.
   c. Create and populate temp table with 327680KB of LOB data.
      i. Use FOR WHILE loop to populate 2nd temp table with all rows from 1st temp table while scrambling the LOB data.
      ii. SELECT INTO 3rd temp table of all columns and rows from temp tables 1 & 2.
      iii. Using the 3rd temp table, perform 100 sorted selects using a WHERE qualifier
      iv. Using the 1st & 2nd temp tables via JOIN, perform 100 sorted selects using a WHERE qualifier.
      v. Using the 1st & 2nd temp tables via JOIN, perform 200 sorted selects using a WHERE qualifier
   d. Using a stored procedure, create and populate a 100 record sampling of source data. Use a FOR WHILE loop to select 1 unique row based upon specific column value 100 separate times.

[Figure: Screen shot of test SSIS package]

Success Criteria

If the following benchmarks are met or surpassed while hosting TempDB on Fusion-IO cards, we will consider this configuration superior to the conventional attached storage:

1) 20% less time to run the SSIS package end-to-end
2) 30% less disk queuing
3) Disk queue length will not exceed 32 (2 x # of physical disks) for longer than a cumulative 120 seconds

Test Results Fusion-IO vs. SAS RAID 1+0

| Test | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| Entire test package end-to-end | 22562 seconds | 24096 seconds | 6.37% |

| DDL Task | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| All DDL Tasks End-to-End | 22562 seconds | 24096 seconds | 6.37% |
| Analyze 1243 indexes | 731 seconds | 1360 seconds | 46.25% |
| Rebuild 688 indexes | 7680 seconds | 7920 seconds | 3.04% |
| Reorganize 130 indexes | 14100 seconds | 14700 seconds | 4.09% |
| DBCC Checktable | 51 seconds | 116 seconds | 56.04% |

| DML Task | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| All DML Tasks End-to-End | 5461 seconds | 9921 seconds | 44.96% |
| Generate RandomIO in TempDB using 100 Individual and Distinct Queries | 5 seconds | 9 seconds | 45.45% |
| Build 320MB Temp Tables filled with bogus LOB data then run a variety of serialized SELECTs | 491 seconds | 518 seconds | 17.90% |
| Create 28GB Temp Table w/ indexes and 380MB table then perform a JOIN to SELECT | 1902 seconds | 6074 seconds | 68.69% |
| Create 200MB temp table containing LOB data and cursor through all rows for a string value | 32 seconds | 50 seconds | 36% |
| Create 300MB temp table containing LOB data and cursor through all rows for a string value | 91 seconds | 81 seconds | -10.09% |
| Create 500MB temp table containing LOB data and cursor through all rows for a string value | 1171 seconds | 1250 seconds | 6.33% |
| Create and populate 3 separate 1200MB Temp Tables | 706 seconds | 815 seconds | 13.38% |
| Create 200MB table and use to populate two 225MB LOB tables via JOIN then SELECT | 53 seconds | 44 seconds | -16.99% |
| Create 256MB table and use to populate two 200MB LOB tables via JOIN then SELECT | 462 seconds | 542 seconds | 14.77% |
| Create 256MB table and use to populate two 350MB LOB tables via JOIN then SELECT | 548 seconds | 538 seconds | 1.83% |

| Disk Metric | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| Avg. Logical Disk Queue Length (Total) | 0.09 | 26 | 99.66% |
| Avg. Logical Disk Queue Length (initial 1000 seconds) | 7.31 | 212.29 | 96.56% |
| Avg. Logical Disk Queue Length (remaining seconds) | 0.01 | 0.29 | 96.56% |
| Cumulative seconds where avg disk queue length exceeded 32 | 90 | 380 | 76.32% |

Test Results Fusion-IO vs. SAS RAID 0

| Test | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| Entire test package end-to-end | 22562 seconds | 25390 seconds | 11.14% |

| DDL Task | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| All DDL Tasks End-to-End | 22562 seconds | 25390 seconds | 11.14% |
| Analyze 1243 indexes | 731 seconds | 897 seconds | 18.51% |
| Rebuild 688 indexes | 7680 seconds | 8234 seconds | 6.73% |
| Reorganize 130 indexes | 14100 seconds | 16128 seconds | 12.58% |
| DBCC Checktable | 51 seconds | 131 seconds | 61.07% |

| DML Task | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| All DML Tasks End-to-End | 5461 seconds | 8556 seconds | 36.18% |
| Generate RandomIO in TempDB using 100 Individual and Distinct Queries | 5 seconds | 7 seconds | 28.58% |
| Build 320MB Temp Tables filled with bogus LOB data then run a variety of serialized SELECTs | 491 seconds | 530 seconds | 7.36% |
| Create 28GB Temp Table and 380MB table then perform a JOIN to SELECT | 1902 seconds | 4156 seconds | 54.24% |
| Create 200MB temp table containing LOB data and cursor through all rows for a string value | 32 seconds | 63 seconds | 49.21% |
| Create 300MB temp table containing LOB data and cursor through all rows for a string value | 91 seconds | 106 seconds | 14.16% |
| Create 500MB temp table containing LOB data and cursor through all rows for a string value | 1171 seconds | 1544 seconds | 24.16% |
| Create and populate 3 separate 1200MB Temp Tables | 706 seconds | 896 seconds | 21.21% |
| Create 200MB table and use to populate two 225MB LOB tables via JOIN then SELECT | 53 seconds | 103 seconds | 48.55% |
| Create 256MB table and use to populate two 200MB LOB tables via JOIN then SELECT | 462 seconds | 575 seconds | 19.66% |
| Create 256MB table and use to populate two 350MB LOB tables via JOIN then SELECT | 548 seconds | 576 seconds | 4.87% |

| Disk Metric | Fusion-IO | SAS | Performance Delta |
|---|---|---|---|
| Avg. Logical Disk Queue Length (Total) | 0.09 | 12.36 | 99.28% |
| Avg. Logical Disk Queue Length (initial 1000 seconds) | 7.31 | 276.13 | 97.36% |
| Avg. Logical Disk Queue Length (remaining seconds) | 0.01 | 1.66 | 99.40% |
| Cumulative seconds where avg disk queue length exceeded 32 | 90 | 300 | 70.00% |

The first phase of the test was the most disk intensive, and specifically the metrics captured in the first 900 seconds caused the running averages to be abnormally high. The first graph below illustrates the queue depth difference between Fusion-IO and SAS during the first 1000 seconds of the test:

[Graph: disk queue length, first 1000 seconds of the test. Series: Fusion_RAID0_QLength, SAS_RAID10_QLength, SAS_RAID0_Qlength]

Eliminating these first 1000 seconds from the test results helps illustrate more accurately the difference between Fusion-IO and SAS over the bulk of the test:

[Graph: disk queue length, remainder of the test. Series: Fusion_RAID0_QLength, SAS_RAID10_QLength, SAS_RAID0_Qlength]

The following graph compares the WRITE throughput between the 3 disk configurations for the first 5000 seconds of the test.
Notice that the Fusion-IO cards are able to perform a disproportionate amount of WRITEs, which reduces queuing (seen above) and thereby finishes the operation more quickly:

[Graph: Disk Write Bytes/sec. Series: Fusion_Disk Write Bytes/sec, SAS_RAID10_Disk Write Bytes/sec, SAS_RAID0_Disk Write Bytes/sec]

[Figure: Performance Monitor Summary Report for Fusion-IO]

[Figure: Performance Monitor Summary Report for SAS @ RAID10]

[Figure: Performance Monitor Summary Report for SAS @ RAID 0]

Summary & Conclusion

When the results are strictly measured against the success criteria, the Fusion-IO card surpassed expectations for the two queuing criteria against both SAS RAID configurations. The time-to-completion delta was not as pronounced: the package ran 11% faster compared to the SAS RAID 0 configuration, and only 6% faster compared to the SAS RAID 10 configuration. The following caveat should be considered, though, before issuing a “failure” judgment on the time-to-completion criterion. The DDL tasks that showed the least improvement were also the most disk intensive on the RAID 5 data drive (H$). Because most of that index rebuild and reorganize activity was occurring on the H$ drive, holding the Fusion-IO drives accountable for that activity is not a reasonable expectation. Those DDL tasks that were TempDB dependent definitely met the success criteria. More importantly, the DML task metrics alone demonstrate that having TempDB hosted on Fusion-IO resulted in surpassing the success criteria by 50% to 100%.

Careful interpretation of the results reveals that hosting TempDB on Fusion-IO cards will result in a considerable performance improvement for queries that are dependent on TempDB. Similar throughput could be achieved with conventional disks.
However, the number of disks required to achieve the same throughput would result in several hundred gigabytes of “wasted space” since TempDB files must be the sole occupants of the physical disks allocated to them. Server OEMs are no longer offering smaller 15k SAS drives in their standard SKUs, and SAN manufacturers are phasing out the offering of smaller drives. Furthermore, reducing the load on the SAN by eliminating the 10% to 20% of total I/O that comes from TempDB activity will help improve query processing performance.

For more information:

http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
http://www.Fusion-IO.com/: Fusion-IO

Addendum

The table below shows the performance timings for the DML package tasks on the HP5-DL580 G7 with TempDB hosted on the 30 SAS drives. The Fusion-IO cards were not installed on this server; therefore, the performance delta is not entirely valid given that the test environment was not the same. However, it is worth noting that the Fusion-IO timings were still better even when compared to a SAS setup with 24 additional processors and 976 additional gigabytes of RAM.

| DML Task | Fusion-IO | HP5-DL580 G7 | Performance Delta |
|---|---|---|---|
| All DML Tasks End-to-End | 5461 seconds | 8970 seconds | 35.75% |
| Generate RandomIO in TempDB using 100 Individual and Distinct Queries | 5 seconds | 4 seconds | -20.00% |
| Build 320MB Temp Tables filled with bogus LOB data then run a variety of serialized SELECTs | 491 seconds | 724 seconds | 52.55% |
| Create 28GB Temp Table and 380MB table then perform a JOIN to SELECT | 1902 seconds | 4227 seconds | 55.01% |
| Create 200MB temp table containing LOB data and cursor through all rows for a string value | 32 seconds | 74 seconds | 56.76% |
| Create 300MB temp table containing LOB data and cursor through all rows for a string value | 91 seconds | 124 seconds | 26.62% |
| Create 500MB temp table containing LOB data and cursor through all rows for a string value | 1171 seconds | 785 seconds | -32.97% |
| Create and populate 3 separate 1200MB Temp Tables | 706 seconds | 1626 seconds | 56.59% |
| Create 200MB table and use to populate two 225MB LOB tables via JOIN then SELECT | 53 seconds | 53 seconds | 0% |
| Create 256MB table and use to populate two 200MB LOB tables via JOIN then SELECT | 462 seconds | 568 seconds | 18.67% |
| Create 256MB table and use to populate two 350MB LOB tables via JOIN then SELECT | 548 seconds | 785 seconds | 30.20% |
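
The end-to-end and DDL "Performance Delta" figures in the RAID 1+0 and RAID 0 result tables are consistent with the relative saving (SAS − Fusion-IO) / SAS. The paper does not state this formula; it is inferred from those rows, and several of the per-task DML deltas do not fit it exactly. The Python sketch below, with illustrative function and key names, applies that assumed formula and then checks the three success criteria against the RAID 1+0 figures reported above.

```python
# Assumed delta formula, inferred from the published end-to-end rows:
#   delta% = (SAS - Fusion-IO) / SAS * 100
# Function and key names are illustrative, not from the paper.

def perf_delta(fusion: float, sas: float) -> float:
    """Relative saving of Fusion-IO versus SAS, in percent."""
    return (sas - fusion) / sas * 100.0


def check_criteria(fusion_secs, sas_secs, fusion_queue, sas_queue,
                   fusion_secs_over_32):
    """Evaluate the paper's three success criteria for one SAS baseline."""
    return {
        # 1) 20% less time to run the SSIS package end-to-end
        "runtime_20pct_less": perf_delta(fusion_secs, sas_secs) >= 20.0,
        # 2) 30% less disk queuing
        "queuing_30pct_less": perf_delta(fusion_queue, sas_queue) >= 30.0,
        # 3) queue length above 32 for no more than 120 cumulative seconds
        "over_32_within_120s": fusion_secs_over_32 <= 120,
    }


# RAID 1+0 figures from the result tables: package run time, average
# logical disk queue length (total), and cumulative seconds above 32.
raid10 = check_criteria(22562, 24096, 0.09, 26, 90)

print(f"end-to-end delta: {perf_delta(22562, 24096):.2f}%")
for name, passed in raid10.items():
    print(f"{name}: {'met' if passed else 'not met'}")
```

Run against these inputs, the check agrees with the conclusion drawn in the paper: both queuing criteria are met, while the end-to-end run time criterion is not.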