Garrett Edmondson
Data Warehouse Architect, Blue Granite Inc.
GEdmondson@blue-granite.com
http://garrettedmondson.wordpress.com/

DW vs. OLTP
• OLTP
– Seek centric (random reads/writes) – IOPS
– Volatile data
– Index heavy, many heap tables
– High concurrency
• Data Warehouse
– Scan centric (sequential reads/writes) – MB/sec
– Nonvolatile data – nightly loads
– Index light – few covering clustered indexes
– Low concurrency

Traditional SQL DW Architecture
• Shared infrastructure: dedicated network bandwidth, enterprise shared SAN storage
• Data warehouse speed = THROUGHPUT from storage to CPU

The DW Trinity
• Star schema (fact & dimension tables)
– The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition)
• Balanced architecture
– Parallel Data Warehouse
• Columnstore indexes – SQL Server 2012 (see the SQL Server Columnstore Index FAQ)
– "Batch" execution mode
– Page compression

Potential Performance Bottlenecks
[Diagram: the IO path from disk to CPU, with the rate at every interface – Disk Feed Rate, LUN Read Rate, SP Port Rate, Switch Port Rate, HBA Port Rate, SQL Server Read Ahead Rate, and CPU Feed Rate – across the disks, LUNs, storage controller cache, FC switch, FC HBAs, Windows/SQL Server cache, and CPU cores.]

Balanced System IO Stack
• Tune every IO interface!

SMP Scale-Up for Data Growth
• Complex
• Costly
• Hardware becomes obsolete

CPU Maximum Consumption Rate (MCR)
• Theoretical max (MCR) – measured from cache:
– Set STATISTICS IO and STATISTICS TIME to ON
– MAXDOP = 4
– MCR per core (MB/sec) = ([Logical reads] / [CPU time in seconds]) * 8 KB / 1024
• Benchmark Consumption Rate (BCR) – measured from disk rather than cache:
– DBCC DROPCLEANBUFFERS
– Set STATISTICS IO and STATISTICS TIME to ON
– MAXDOP 8+
– BCR (MB/sec) = ([Logical reads] / [CPU time in seconds]) * 8 KB / 1024
• A worked measurement sketch follows at the end of this section
• Test storage throughput in MB/sec with SQLIO – sequential reads and writes

Important
• Establish real, rather than rated, performance metrics for the key hardware components of the Fast Track reference architecture!

LUN Configuration – Fast Track Data Striping
[Diagram: each storage enclosure array exposes LUNs, and each data file gets its own LUN – ARY01D1v01/DB1-1.ndf, ARY01D2v02/DB1-2.ndf, ARY02D1v03/DB1-3.ndf, ARY02D2v04/DB1-4.ndf, ARY03D1v05/DB1-5.ndf, ARY03D2v06/DB1-6.ndf, ARY04D1v07/DB1-7.ndf, ARY04D2v08/DB1-8.ndf – with the log file DB1.ldf on ARY05v09.]

Evaluating Page Fragmentation
• Average Fragment Size in Pages – a reasonable measure of contiguous page allocation for a table. The value should be >= 400 for optimal performance.

select db_name(ps.database_id) as database_name
      ,object_name(ps.object_id) as table_name
      ,ps.index_id
      ,i.name
      ,cast(ps.avg_fragment_size_in_pages as int) as [Avg Fragment Size In Pages]
      ,ps.fragment_count as [Fragment Count]
      ,ps.page_count
      ,(ps.page_count * 8)/1024/1024 as [Size in GB]
from sys.dm_db_index_physical_stats (DB_ID()   -- NULL for all DBs, else run in the context of one DB
      , OBJECT_ID('dbo.lineitem'), 1, NULL, 'SAMPLED') AS ps   -- DETAILED, SAMPLED, NULL = LIMITED
inner join sys.indexes AS i
      on ps.object_id = i.object_id and ps.index_id = i.index_id
where ps.database_id = db_id()
  and ps.index_level = 0

Trace Flags
• Startup option -E
• Trace flag -T1117
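A minimal T-SQL sketch of how the MCR/BCR measurement above might be run. The table dbo.lineitem is borrowed from the fragmentation query; substitute any representative fact table, and use any scan-heavy query with minimal per-row CPU work as the driver.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- For BCR only: flush the buffer pool first so the scan reads from disk.
-- DBCC DROPCLEANBUFFERS;

-- A simple scan driver; MAXDOP 4 for MCR, 8+ for BCR.
SELECT COUNT_BIG(*)
FROM dbo.lineitem
OPTION (MAXDOP 4);

-- From the STATISTICS IO/TIME output:
--   MB/sec per core = ([logical reads] / [CPU time in seconds]) * 8 / 1024
-- Example: 1,000,000 logical reads in 40 CPU-seconds
--   => (1000000 / 40) * 8 / 1024 ~ 195 MB/sec per core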
SQL Server Configurations
• Sequential scan performance starts with database creation and extent allocation
• Recall that the -E startup option is used to allocate 64 extents (4 MB) at a time
• Pre-allocation of user databases is strongly recommended
• Autogrow should be avoided if possible; if used, always grow in 4 MB increments

Storage Layout Best Practices for SQL Server
• Create one SQL data file per LUN, for every filegroup
• TempDB filegroups share the same LUNs as the other databases
• Logs go on separate disks within each enclosure
– Striped using SQL striping
– The log may share these LUNs with load files and backup targets

Storage Layout Best Practices for SQL Server (example layout)
[Diagram: TempDB, the Stage database, and the Permanent database spread one file per LUN across LUNs 1–16 – Permanent FG: Permanent_1.ndf … Permanent_16.ndf; Stage FG: Stage_1.ndf … Stage_16.ndf; TempDB: TempDB.mdf (25 GB), TempDB_02.ndf (25 GB), TempDB_03.ndf (25 GB), … TempDB_16.ndf (25 GB) – with a separate log LUN holding the Permanent DB and Stage DB logs, and Local Drive 1 shown alongside.]

Techniques to Maximize Scan Throughput
• -E startup parameter (64-extent, 4 MB allocations; no mixed extents)
• Minimize the use of nonclustered indexes on fact tables
• Use load techniques that avoid fragmentation
– Load in clustered index order (e.g. by date) when possible
• Always create indexes with MAXDOP 1 and SORT_IN_TEMPDB
• Isolate volatile tables in a separate filegroup
• Isolate staging tables in a separate filegroup or database
• Perform periodic maintenance
• Turn on SQL Server compression

Conventional data loads lead to fragmentation
• Bulk inserts into a clustered index using a moderate 'batchsize' parameter
– Each 'batch' is sorted independently, which causes fragmentation
• Overlapping batches lead to page splits
[Diagram: pages 1:31–1:40 shown against the key order of the index, with later batches splitting earlier pages so that logically adjacent rows land on non-adjacent pages.]

Best Practices for loading
• Use a heap
– Practical if queries need to scan whole partitions
• ...or use a batchsize = 0
– Fine if no parallelism is needed during the load
• ...or use a two-step load (see the sketch at the end of this section):
1. Load into a staging table (heap)
2. INSERT...SELECT from the staging table into the target clustered index
– The resulting rows are not fragmented
– Parallelism can be used in step 1 – essential for large data volumes

Other fragmentation best practices
• Avoid autogrow of filegroups
– Pre-allocate filegroups to the desired long-term size
– Grow manually in large increments when necessary
• Keep volatile tables in a separate filegroup
– Tables that are frequently rebuilt or loaded in small increments
• If historical partitions are loaded in parallel, consider separate filegroups for separate partitions to avoid extent fragmentation
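A minimal T-SQL sketch of the two-step load above. The file path, staging table, target table, and clustered key (sale_date) are hypothetical placeholders.

-- Step 1: bulk load into a staging heap; parallelism is available here.
-- Omitting BATCHSIZE loads the whole file as a single batch ('batchsize = 0').
BULK INSERT dbo.stg_sales
FROM 'C:\loads\sales.dat'
WITH (TABLOCK);

-- Step 2: insert into the clustered-index target in key order, which
-- produces contiguous, unfragmented pages.
INSERT INTO dbo.fact_sales WITH (TABLOCK)
SELECT *
FROM dbo.stg_sales
ORDER BY sale_date          -- the target's clustered index key
OPTION (MAXDOP 1);          -- serial insert avoids interleaved allocations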
Columnstore Indexes
• Storage = segments + dictionaries
• CPU = "batch" execution mode

Row Storage Layout
[Diagram: the Customers table stored row-wise – every page of the extent contains whole rows.]

Customers Table
ID  Name  Address  City  State  Bal Due
1   Bob   ...      ...   ...    3,000
2   Sue   ...      ...   ...    500
3   Ann   ...      ...   ...    1,700
4   Jim   ...      ...   ...    1,500
5   Liz   ...      ...   ...    0
6   Dave  ...      ...   ...    9,000
7   Sue   ...      ...   ...    1,010
8   Bob   ...      ...   ...    50
9   Jim   ...      ...   ...    1,300

Column Storage Layout
[Diagram: the same Customers table stored column-wise – each column's values are stored together, broken into segments.]
• Segment = 1-million-row chunks of a column

Run Length Encoding (RLE)
[Diagram: a (Quarter, ProdID, Price) table compressed with RLE – the Quarter column collapses to runs such as (Q1, start 1, count 310) and (Q2, start 311, count 290); ProdID collapses to runs such as (1, start 1, count 5) and (2, start 6, count 3); Price, with few repeated values, stays as raw values.]
• RLE compression is applied only when the compressed data is smaller than the original

Dictionary Encoding
[Diagram: the Quarter column dictionary-encoded – the dictionary maps the four distinct values Q1–Q4 to small integer IDs, and the resulting ID column is then RLE-compressed, e.g. (ID 1, start 1, count 4), (ID 2, start 5, count 10), and so on.]
• Only 4 distinct values, so 2 bits are enough to represent the column in the xVelocity store

CPU Architecture – "Batch" Mode
• Modern CPUs have many cores
• Cache hierarchies: RAM, L3, L2, L1
– Small L1 and L2 per core; L3 shared by the socket
– L1 is faster than L2; L2 is faster than L3
– CPUs stall while waiting for caches to load
• Batch mode sizes instructions and data to fit into the L2/L1 caches!
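Since columnstore storage was just described in terms of segments and dictionaries, here is a minimal SQL Server 2012 sketch: create a nonclustered columnstore index, then inspect its segments and dictionaries through the catalog views. The fact table and column names are hypothetical.

-- SQL Server 2012: nonclustered columnstore index over the scan columns.
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_fact_sales
ON dbo.fact_sales (sale_date, store_id, prod_id, qty_sold, dollars_sold);

-- Segments: the 1-million-row column chunks described above.
SELECT s.column_id, s.segment_id, s.encoding_type, s.row_count, s.on_disk_size
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.fact_sales');

-- Dictionaries: the value-to-ID mappings used by dictionary encoding.
SELECT d.column_id, d.entry_count, d.on_disk_size
FROM sys.column_store_dictionaries AS d
JOIN sys.partitions AS p ON d.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.fact_sales');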
Parallel Data Warehouse Overview
• Data warehouse appliance: a prepackaged or preconfigured, balanced set of hardware (servers, memory, storage and I/O channels), software (operating system, DBMS and management software), service and support, sold as a unit with built-in redundancy for high availability, and positioned as a platform for data warehousing.

Parallel Data Warehouse Appliance – Hardware Architecture
• Failover protection:
– Redundant control node (active/passive)
– Redundant compute node (spare database server)
– Cluster failover
– Redundant Array of Inexpensive Databases
[Diagram: the appliance – control nodes (SQL, active/passive), management servers with data center monitoring, a landing zone with the ETL load interface, a backup node tied to the corporate backup solution, and the database servers with their storage nodes, plus a spare database server; everything is connected by dual InfiniBand and dual Fibre Channel networks, and clients reach the appliance over the corporate network using standard SQL client drivers.]

PDW Data Example
• Star schema spread across the PDW compute nodes:
– Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
– Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
– Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
– Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
– Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
• Smaller dimension tables are replicated on every compute node
• The larger fact table is hash-distributed across all compute nodes
• A DDL sketch follows at the end of this section

SMP vs. MPP
• SMP: scale-up
– Complex – tune every IO interface
– Cost – exponential
– Hardware becomes obsolete
• MPP: scale-out
– Simple – buy more processing nodes
– Cost – linear
– Keep all hardware investments

SQL Server Parallel Data Warehouse – a quick look at MPP query execution
• The user connects to 'the appliance' as they would to a 'normal' SQL Server
• The control node generates a distributed execution plan
• Plan steps are executed on each compute node of the shared-nothing MPP system
• The Data Movement Service moves intermediate data between nodes

Dealing with Distributions – Shuffling
[Diagram: two compute nodes each hold part of a distributed table; to aggregate by color, rows are shuffled so matching values land on the same node, turning per-node rows (Red 5, Red 8, Red 12, Blue 11, Blue 10, Green 7, Yellow 12) into the totals Red 25, Blue 21, Green 7, Yellow 12.]

Overall Architecture
• Control rack
– Control node: client interface (JDBC, ODBC, OLE-DB, ADO.NET), PDW engine, DMS manager, DMS core, PDW agent
– Landing zone node: ETL interface, bulk data loader, PDW agent
– Management node: Active Directory, PDW agent
• Data rack (up to 4)
– Compute nodes 1–10, each running a DMS core and a PDW agent
• Legend: PDW = Parallel Data Warehouse; DMS = Data Movement Service
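As referenced in the data example above, a sketch of how the replicate/distribute choice might be expressed in PDW DDL. The table definitions are illustrative, and the exact WITH-clause syntax may vary across PDW releases.

-- Small dimension table: a full copy is replicated to every compute node.
CREATE TABLE StoreDim
(
    StoreDimID int NOT NULL,
    StoreName  varchar(50),
    StoreMgr   varchar(50),
    StoreSize  int
)
WITH (DISTRIBUTION = REPLICATE);

-- Large fact table: rows are hash-distributed on a column across all nodes.
CREATE TABLE SalesFacts
(
    DateDimID   int NOT NULL,
    StoreDimID  int NOT NULL,
    ProdDimID   int NOT NULL,
    MktgCampID  int NOT NULL,
    QtySold     int,
    DollarsSold money
)
WITH (DISTRIBUTION = HASH(DateDimID));

The distribution column should spread rows evenly and line up with frequent join and aggregation keys, which reduces shuffling.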
Release Themes – How did it work before?
• Problem: basic RDBMS functionality that already exists in SQL Server was re-built in PDW
• Challenge for the PDW AU3 release: can we leverage SQL Server and focus on the MPP-related challenges?

SQL Server PDW AU3 Architecture
PDW AU3 architecture with Shell Appliance and cost-based query optimizer
• The control node runs SQL Server as a 'shell appliance'
• Every database exists as an empty 'shell' – all objects, no user data
• Large parts of basic RDBMS functionality are now provided by the shell:
– Authentication and authorization
– Schema binding
– Metadata catalog
• DDL executes against both the shell and the compute nodes
[Diagram: the control node hosts the Engine Service plus the SQL Server shell appliance holding the empty shell of database foo; plan steps flow from the Engine Service to the compute nodes (SQL Server), each holding its portion of foo.]

SQL Server PDW AU3 Architecture – query lifecycle
1. The user issues a query (SELECT)
2. The query is sent to the shell through the sp_showmemo_xml stored procedure; SQL Server performs parsing, binding and authorization, and the SQL optimizer generates execution alternatives
3. A MEMO containing candidate plans, histograms and data types is generated and returned to the Engine Service
4. A parallel execution plan is generated
5. The parallel plan executes on the compute nodes
6. The result is returned to the user

PDW Cost-Based Optimizer – optimizer lifecycle
1. Simplification and space exploration
– Query standardization and simplification (e.g. column reduction, predicate pushdown)
– Logical space exploration (e.g. join re-ordering, local/global aggregation)
– Space expansion (e.g. bushy trees – dealing with intermediate result sets)
– Physical space exploration
– Serializing the MEMO into binary XML (logical plans)
– De-serializing the binary XML into the PDW MEMO
2. Parallel optimization and pruning
– Injecting data movement operations (expansion)
– Costing the different alternatives
– Pruning and selecting the lowest-cost distributed plan
3. SQL generation
– Generating the SQL statements to be executed on the compute nodes
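To make the optimizer's data-movement decisions concrete, two hypothetical queries against the star schema sketched earlier (SalesFacts hash-distributed on DateDimID; dimensions replicated). The plan commentary is illustrative, not actual PDW output.

-- Joins against replicated dimensions can run entirely node-local:
-- every node already holds the whole dimension, so no movement is costed.
SELECT d.CalendarYear, SUM(f.DollarsSold) AS Total
FROM SalesFacts AS f
JOIN DateDim    AS d ON f.DateDimID = d.DateDimID
GROUP BY d.CalendarYear;

-- Grouping on a non-distribution column typically costs a movement step:
-- local partial aggregates are computed per node, shuffled on StoreDimID,
-- then globally aggregated (the local/global aggregation named above).
SELECT f.StoreDimID, SUM(f.DollarsSold) AS Total
FROM SalesFacts AS f
GROUP BY f.StoreDimID;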
PDW Cost-Based Optimizer – cost model details
• PDW cost model assumptions:
– Only data movement operations are costed (relational operations are excluded)
– Sequential step execution (no pipelined or independent parallelism)
• Data movement operations consist of multiple tasks
• Each task has a fixed and a variable overhead
• Uniform data distribution is assumed (no data skew)

PDW Sales Test Workload – AU2 to AU3
[Chart: elapsed time for the 39 queries of a sales test workload, AU2 vs. AU3.]
• 5x improvement in total elapsed time, out of the box

Theme: Performance at Scale – zero data conversions in data movement
• Goal
– Eliminate CPU utilization spent on data conversions
– Further parallelize operations during data moves
• Functionality
– Use ODBC instead of ADO.NET for reading and writing data
– Minimize appliance resource utilization for data moves
• Benefits
– Better resource (CPU) utilization
– 6x or more faster move operations
– Increased concurrency
– Mixed workloads (loads + queries)
[Chart: DMS CPU utilization (%) for TPC-H queries Q1–Q22, AU2 vs. AU3.]
[Chart: throughput improvement for data movements – replicated table load, shuffle, replicate, trim, broadcast – ranging from 0% to 600%.]

Theme: SQL Server Compatibility – SQL Server security and metadata
• Security
– SQL Server security syntax and semantics
– Support for users, roles and logins
– Fixed database roles
– Allows script re-use
– Allows well-known security methods
• Metadata
– PDW metadata stored in SQL Server
– Existing SQL Server metadata tables/views (e.g. security views)
– PDW distribution info kept as extended properties in the SQL Server metadata
– Existing means and technology for persisting metadata
– Improved 3rd-party tool compatibility (BI, ETL)

Theme: SQL Server Compatibility – support for the SQL Server (Native) Client
• Goal
– 'Look' just like a normal SQL Server
– Better integration with other BI tools
• Functionality
– Use existing SQL Server drivers to connect to SQL Server PDW
– Implement the SQL Server TDS protocol
– Named parameter support
– SQLCMD connectivity to PDW
• Benefits
– Use known tools and a proven technology stack
– The existing SQL Server 'eco-system'
– 2x performance improvement for return operations
– 5x reduction in connection time
[Diagram: SQL Server clients (ADO.NET, ODBC, OLE-DB, JDBC) connect over TDS to 10.217.165.13, port 17001, while the legacy PDW clients (ODBC, OLE-DB, ADO.NET) connect via SequeLink to port 17000.]

Theme: SQL Server Compatibility – stored procedure support (subset)
• Goal
– Support common scenarios of code encapsulation and re-use in reporting and ETL (see the sketch at the end of this section)
• Functionality
– System and user-defined stored procedures
– Invocation using RPC or EXECUTE
– Control-flow logic, input parameters
• Benefits
– Enables common logic re-use
– Big impact for Reporting Services scenarios
– Allows porting of existing scripts
– Increases compatibility with SQL Server

Theme: SQL Server Compatibility – collations
• Goal
– Support local and international data
• Functionality
– Fixed server-level collation
– User-defined column-level collation
– Support for all Windows collations
– COLLATE clauses allowed in queries and DML
• Benefits
– Store all the data in PDW with additional querying flexibility
– Existing T-SQL DDL and query scripts
– SQL Server alignment and functionality
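A minimal sketch of the kind of stored procedure the AU3 subset targets: an input parameter plus simple control flow. The procedure and table names are hypothetical.

CREATE PROCEDURE dbo.usp_SalesByStore
    @MinDollars money
AS
BEGIN
    -- Control-flow logic: default the parameter if the caller passes NULL.
    IF @MinDollars IS NULL
        SET @MinDollars = 0;

    SELECT StoreDimID, SUM(DollarsSold) AS TotalSold
    FROM SalesFacts
    GROUP BY StoreDimID
    HAVING SUM(DollarsSold) >= @MinDollars;
END;

-- Invocation via EXECUTE (RPC invocation is also supported).
EXECUTE dbo.usp_SalesByStore @MinDollars = 1000;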
Theme: Improved Integration – SQL Server PDW connectors
• Connector for Hadoop
– Bi-directional (import/export) interface between MSFT Hadoop and PDW
– Delimited file support
– The adapter uses existing PDW tools (bulk loader, dwsql)
– A low-cost solution that handles all the data: structured and unstructured
– Additional agility, flexibility and choice
• Connector for Informatica
– Provides a PDW source and target (mappings, transformations)
– Informatica uses the PDW bulk loader for fast loads
– Leverages the existing toolset and knowledge
• Connector for Business Objects

Agenda
• Trends in the DW space
• How does SQL Server PDW fit in?
• SQL Server PDW AU3 – what's new?
• Building BI solutions with SQL Server PDW
– Customer successes
– Using SQL Server PDW with Microsoft BI solutions
– Using SQL Server PDW with third-party BI solutions
– BI solutions leveraging Hadoop integration
• What's coming next in SQL Server PDW?

PDW Retail POS Workload
[Chart: elapsed time for queries Q1–Q7, the original customer SMP solution (POS ODS) vs. PDW AU3 with the cost-based query optimizer.]

How are customers using PDW & BI?
• Data volume
– 80 TB data warehouse analyzing data from exchanges
– Existing system based on a SQL SMP farm – 2 different clusters of 6 servers each
• Requirements
– Linear scalability with additional hardware
– Support hourly loads with SSIS – 300 GB/day
– BI integration: SSRS, SSAS and PowerPivot (portal, reports, ETL, dashboards, scorecards over the operational DBs)
• PDW AU3 feedback
– Stored procedure and increased T-SQL support was great
– Migrating SMP SSRS to PDW was painless
– 142x speed-up for scan-heavy queries, with no summary tables
– Enabled queries that do not run on the existing system
– Fast parallel InfiniBand aggregation avoids the ETL overhead of the existing systems
– No need for indexes or indexed/materialized views

SSAS with SQL Server PDW
• Understand the differences compared to the 'SMP world': foreign key constraints, data design, retrieval planning
• MOLAP & ROLAP – limits specific to the nature of large data; include only the required data

New Challenges for Business Analytics
• Huge amounts of data are born 'unstructured' (sensor/RFID data, blogs, docs, web data)
• Increasing demand for (near) real-time business analytics
• Pre-filtering of important from less relevant raw data is required
• Hadoop applications: sensor networks & RFID, social networks & mobile apps, biology & genomics

Hadoop as a Platform Solution – in the context of ETL, BI and DW
• Fast ETL processing / fast refinery
– A platform to accelerate ETL processes (not competing with current ETL software tools!)
– Flexible and fast development of 'hand-written' refining requests over raw data
• Cost-optimal storage / active archive
– An active and cost-effective data archive that lets (historical) data 'live forever'
– Co-existence with a relational DW (not a complete replacement!)

Importing HDFS data into PDW for advanced BI
[Diagram: sensor/RFID data, blogs, docs and web data land in Hadoop; SQOOP moves it into SQL Server PDW, where application programmers, DBMS admins and power BI users run interactive BI and data visualization.]

Hadoop – PDW integration via SQOOP (export)
1. SQOOP export is invoked with a source (HDFS path) and a target (PDW database & table)
2. HDFS data is read via mappers
3. Incoming data is copied onto the Landing Zone (via its FTP/Telnet servers)
4. The PDW Hadoop connector reads the PDW configuration file
5. 'DWLoader' is invoked to load the data into the compute nodes

SQL Server PDW Roadmap
What is coming next?
CALENDAR YEAR 2011
• Q1 (shipped) – Appliance Update 1
– Improved node manageability
– Better performance and reduced overhead
– OEM requests
• Q4 (shipped) – Appliance Update 2
– Programmability: batches, control flow, variables, temp tables
– QDR InfiniBand switch
– Onboarding of Dell

CALENDAR YEAR 2012
• Q1 (shipped) – Appliance Update 3
– Cost-based optimizer
– Native SQL Server drivers, including JDBC
– Collations
– More expressive query language
– Data Movement Service performance
– SCOM pack
– Stored procedures (subset)
– Half-rack
– 3rd-party integration (Informatica, MicroStrategy, Business Objects, Hadoop)
• V-Next
– Columnar store index
– Stored procedures
– Integrated authentication
– PowerView integration
– Workload management
– LZ/BU redundancy
– Windows 8
– SQL Server 2012
– Hardware refresh