Agenda
• 10 Key SQL 2012 BI Innovations
• BI Semantic Model
• Project ‘Apollo’
• Vertipaq xVelocity in SQL 2012

10 Key SQL 2012 BI Innovations
1. BI Semantic Model
2. Analysis Services Tabular Mode
   1. With BIDS support
3. PowerPivot “2”
4. Power View (SSRS)
5. Self-Service Data Alerts (SSRS)
6. Hadoop Big Data Integration
7. xVelocity – Columnstore Indexes
8. Geospatial Indexes
9. Unstructured Data Queries
10. Data Quality Services – Updated Master Data Services

Analysis Services: Tomorrow
• Build on the strengths and success of Analysis Services and expand its reach to a much broader user base
• Bring together the relational and multidimensional models under a single unified BI platform: the best of both worlds!
• Embrace the relational data model – well understood by developers and IT Pros
• Provide flexibility in the platform to suit the diverse needs of BI applications

BI Semantic Model
• One Model for All User Experiences
• Visualise analysis using your favourite tools
• Model data the way you like
• Store analytical data however it is best done
• BISM is a concept, not a product
  – Can be hosted in PowerPivot or SSAS

BI Semantic Model: One Model for All User Experiences
[Diagram]
Client tools: Your Apps, Reporting Services & Power View, Excel, SharePoint Insights, PowerPivot
BI Semantic Model layers:
• Data Model: Multidimensional, Tabular
• Business Logic/Queries: MDX, DAX
• Data Access: ROLAP, MOLAP, xVelocity, Direct Query
Data sources: Databases, LOB Applications, Files, OData Feeds, Cloud Services

BI Semantic Model
What about existing Analysis Services applications?
[Diagram]
Before: Existing applications, based on the Unified Dimensional Model (UDM)
After: Existing applications and new applications; every UDM becomes a BI Semantic Model, with new technology options

BI Semantic Model: Architecture
[Diagram]
Clients: Third-party applications, Reporting Services, Excel, PowerPivot, SharePoint Insights
Data sources: Databases, LOB Applications, Files, OData Feeds, Cloud Services

Tabular Mode: Scaling PowerPivot to Enterprise Needs
• Model in PowerPivot
  – PowerPivot as source of SSAS Tabular Models
  – Excel for browsing and testing in SSDT
• All new PowerPivot features:
  – Diagrams, Measure Grid, KPIs, Hierarchies, Perspectives, 30+ New DAX Functions
• …and, unique to SSAS Tabular Mode:
  – Row-level Security, Partitions, Large Tables (>2 billion rows), Images, Memory Paging

Example: Power View Over a Sales Model
[Diagram] End User; Model Developer; data sources: SQL Server, Dynamics CRM

Example: Excel Over a Finance Model
[Diagram] End User; Model Developer; data sources: Oracle, SAP

Demo: Tour of SQL Server 2012 BISM

BI Semantic Model
[Diagram] Flexibility, Richness, Scalability; layers: Data Model, Business Logic, Data Access and Storage

Analysis Services Architecture
[Diagram]
Labels: SharePoint, Browser, BI Development Studio, Excel Services, Reporting Services, PowerPivot for Excel, Excel (xlsx), Analysis Services, PowerPivot for SharePoint (Analysis Services), BI Semantic Model (xlsx), Third Party Apps
Tiers: Personal BI, Team BI, Organizational BI

Project ‘Apollo’ in SQL 2012

Apollo: A new column-oriented query accelerator
• What is Apollo?
  – What does column-oriented mean?
  – How does it accelerate queries?
• When to create a columnstore index
• How to use Apollo
  – Creating an index
  – Running queries
  – Loading data
• How does Apollo relate to VertiPaq and PowerPivot?

What is Apollo?
• Apollo is the code name for new functionality that is available in SQL Server 2012
• It will substantially accelerate common data warehouse queries
• Adds a column store option in SQL Server database engine
  – New index type in the database engine
• Advanced query processing algorithms
  – New batch mode processing

When to use Apollo
• Data warehousing
  – Read-mostly workloads
  – Star joins
  – Process large amounts of data
• Generous amount of memory
  – Best performance when data fits in memory
  – Graceful degradation as fact table paged from disk
  – Under severe memory constraints, falls back to row-at-a-time processing

How does Apollo speed up queries? (1)
• Stores data column-wise …
• Better compression
• Uses VertiPaq compression technology
• Less IO
[Figure: a table stored as separate columns C1 to C6]

How does Apollo speed up queries? (2)
SELECT region, sum (sales) …
• Fetches only needed columns from disk
  – Less IO
  – Better buffer hit rates
[Figure: columns C1 to C6]

Improved Data Warehouse Query performance
• Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets
• Performance improvements for “typical” data warehouse queries from 10x to 100x
• Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables

What Happens When…
• You need to execute high performance DW queries against very large data sets?
  – In SQL Server 2008 and SQL Server 2008 R2
    • OLAP (SSAS) MDX solution
    • ROLAP and T-SQL + intermediate summary tables, indexed views and aggregate tables
      – Inherently inflexible
  – In SQL Server 2012
    • You can create a columnstore index on a very large fact table referencing all columns with supported data types
      – Utilizing T-SQL and core Database Engine functionality
      – Minimal query refactoring or intervention
    • Upon creating the columnstore index, your table becomes “read only”, but you can still use partitioning to switch data in and out, or drop and rebuild indexes periodically

How Are These Performance Gains Achieved?
• Two complementary technologies:
  – Storage
    • Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row)
      – Columnar storage allows less data to be accessed when only a subset of columns is referenced
      – Data density/selectivity determines how compression-friendly a column is, for example “State” / “City” / “Gender”
      – Translates to improved buffer pool memory usage
  – New “batch mode” execution
    • Data can then be processed in batches (blocks of about 1,000 rows) versus row-by-row
    • Depending on filtering and other factors, a query may also benefit from “segment elimination”: bypassing million-row chunks (segments) of data, further reducing I/O

Column vs. Row Store
• Row Store (Heap / B-Tree): whole rows are stored together on each data page

  data page 1000
  ProductID  OrderDate  Cost
  310        20010701   2171.29
  311        20010701   1912.15
  312        20010702   2171.29
  313        20010702   413.14

  data page 1001
  ProductID  OrderDate  Cost
  314        20010701   333.42
  315        20010701   1295.00
  316        20010702   4233.14
  317        20010702   641.22

• Column Store (values compressed): each column (ProductID, OrderDate, Cost) is stored separately, on data pages 2000, 2001 and 2002

Batch Mode
• Allows processing of blocks of about 1,000 rows as an alternative to single row-by-row operations
  – Enables additional algorithms that can reduce CPU overhead significantly
  – A “segment” is a million-row chunk of a partition, stored with statistics that the Storage Engine uses for filtering
• Batch mode can further improve query performance of a columnstore index, but this mode isn’t always chosen:
  – Some operations aren’t enabled for batch mode:
    • E.g. outer joins to columnstore index table / joining strings / NOT IN / IN / EXISTS / scalar aggregates
  – Row mode might be used if there is SQL Server memory pressure or parallelism is unavailable
  – Confirm batch vs. row mode by looking at the graphical execution plan

Columnstore format + batch mode Variations
• Performance gains can come from a combination of:
  – Columnstore indexing alone + traditional row mode in QP
  – Columnstore indexing + batch mode in QP
  – Columnstore indexing + hybrid of batch and traditional row mode in QP

Creating a columnstore index
• T-SQL
• SSMS

Defining the Columnstore Index
• Index type
  – Columnstore indexes are always non-clustered and non-unique
  – They cannot be created on views, indexed views, sparse columns
  – They cannot act as primary or foreign key constraints
• Column selection
  – Unlike other index types, there are no “key columns”
    • Instead you choose the columns that you anticipate will be used in your queries
    • Up to 1,024 columns, and the ordering in your CREATE INDEX doesn’t matter
    • No concept of “INCLUDE”
    • No 900 byte index key size limit
• Column ordering
  – Use of ASC or DESC sorting not allowed, as ordering is defined via columnstore compression algorithms

Demo..
Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes

Supported Data Types
• Supported data types
  – Char / nchar / varchar / nvarchar
    • (max) types, legacy LOB types and FILESTREAM are not supported
  – Decimal/numeric
    • Precision greater than 18 digits NOT supported
  – Tinyint, smallint, int, bigint
  – Float/real
  – Bit
  – Money, smallmoney
  – Date and time data types
    • Datetimeoffset with scale > 2 NOT supported

Limitations
• Columnstore indexes cannot be used in conjunction with
  – Change Data Capture and Change Tracking
  – Filestream columns (other, supported columns from the same table can still be indexed)
  – Page, row and vardecimal storage compression
  – Replication
  – Sparse columns
• Data type limitations
  – Binary / varbinary / ntext / text / image / varchar (max) / nvarchar (max) / uniqueidentifier / rowversion / sql_variant / decimal or numeric with precision > 18 digits / CLR types / hierarchyid / xml / datetimeoffset with scale > 2
• You can prevent a query from using the columnstore index using the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint

Adding data to a table with a columnstore index
• Method 1: Disable the columnstore index
  • Disable (or drop) the index
      ALTER INDEX my_index ON T DISABLE
  • Update the table
  • Rebuild the columnstore index
      ALTER INDEX my_index ON T REBUILD

Adding data to a table with a columnstore index
• Method 2: Use Partitioning
  • Load new data into a staging table
  • Build a columnstore index
      CREATE NONCLUSTERED COLUMNSTORE INDEX my_index ON StagingT(OrderDate, ProductID, SaleAmount)
  • Switch the partition into the table
      ALTER TABLE StagingT SWITCH TO T PARTITION 5

Apollo and VertiPaq
• VertiPaq:
  – PowerPivot for Excel
  – PowerPivot for SharePoint
  – Analysis Services
  – Database Engine: Apollo
• Use Apollo for relational data warehousing
  – Large fact tables
  – Ad hoc or reporting queries
  – When you don’t need MDX

Performance example
• 1 TB version of the TPC-DS database
• 1.44 billion rows in catalog_sales fact table
• 32 logical processor machine with 256 GB RAM
• Warm start
• Query
    SELECT w_city, w_state, d_year,
           SUM(cs_sales_price) AS cs_sales_price
    FROM warehouse, catalog_sales, date_dim
    WHERE w_warehouse_sk = cs_warehouse_sk
      AND cs_sold_date_sk = d_date_sk
      AND w_state IN ('SD','OH')
      AND d_year IN (2001,2002,2003)
    GROUP BY w_city, w_state, d_year
    ORDER BY d_year, w_state, w_city;

Performance example: Results
                 Total CPU time   Elapsed time
No columnstore   502 sec          501 sec
Columnstore      31.0 sec         1.10 sec
Speedup          16X              455X

Summary: Apollo in a nutshell
• Columnstore technology + Advanced query processing
• Astonishing speedup for DW queries
• Great compression

Summary: SQL 2012 Columnstore
• SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios
  – 10x to 100x performance improvement depending on the schema and query
    • I/O reduction and memory savings through columnstore compressed storage
    • CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs
  – Easy to deploy and requires less management than some legacy ROLAP or OLAP methods
    • No need to create intermediate tables, aggregates, preprocessing and cubes
  – Interoperability with partitioning
  – For the best interactive end-user BI experience, consider …

xVelocity in SQL 2012
No more Vertipaq, it’s now called xVelocity in-memory technologies in SQL 2012

Q&A

Thank You
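
Appendix: a toy illustration of column storage and segment elimination

The deck describes two storage-side ideas: reading only the columns a query needs, and skipping whole segments whose statistics rule them out. The Python sketch below mimics both on a tiny in-memory table. It is a conceptual illustration only, not how SQL Server implements columnstore indexes: the names `Segment`, `build_column_store`, `sum_cost_for_date`, `SEGMENT_ROWS` and the sample `ROWS` are all hypothetical, segments hold 4 rows rather than about a million, and there is no compression or batch mode.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SEGMENT_ROWS = 4  # toy size (assumption); the deck describes million-row segments

# Hypothetical sample data, loosely modelled on the ProductID/OrderDate/Cost figure.
ROWS = [
    {"ProductID": 310, "OrderDate": 20010701, "Cost": 100},
    {"ProductID": 311, "OrderDate": 20010701, "Cost": 200},
    {"ProductID": 312, "OrderDate": 20010701, "Cost": 300},
    {"ProductID": 313, "OrderDate": 20010701, "Cost": 400},
    {"ProductID": 314, "OrderDate": 20010702, "Cost": 500},
    {"ProductID": 315, "OrderDate": 20010702, "Cost": 600},
    {"ProductID": 316, "OrderDate": 20010703, "Cost": 700},
    {"ProductID": 317, "OrderDate": 20010703, "Cost": 800},
]


@dataclass
class Segment:
    """One chunk of a single column, plus min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def build_column_store(rows: List[dict],
                       segment_rows: int = SEGMENT_ROWS) -> Dict[str, List[Segment]]:
    """Pivot row-wise records into per-column lists of segments."""
    store: Dict[str, List[Segment]] = {}
    if not rows:
        return store
    for name in rows[0]:
        column = [row[name] for row in rows]
        chunks = (column[i:i + segment_rows]
                  for i in range(0, len(column), segment_rows))
        store[name] = [Segment(chunk, min(chunk), max(chunk)) for chunk in chunks]
    return store


def sum_cost_for_date(store: Dict[str, List[Segment]],
                      order_date: int) -> Tuple[int, int, int]:
    """Roughly: SELECT SUM(Cost) WHERE OrderDate = order_date.

    Only the OrderDate and Cost columns are touched; ProductID is never read.
    Returns (total, segments_read, segments_skipped).
    """
    total = 0
    read = skipped = 0
    for date_seg, cost_seg in zip(store["OrderDate"], store["Cost"]):
        # Segment elimination: the min/max statistics show no row can match.
        if not (date_seg.min_value <= order_date <= date_seg.max_value):
            skipped += 1
            continue
        read += 1
        total += sum(cost for date, cost in zip(date_seg.values, cost_seg.values)
                     if date == order_date)
    return total, read, skipped


if __name__ == "__main__":
    store = build_column_store(ROWS)
    print(sum_cost_for_date(store, 20010703))  # prints (1500, 1, 1)
```

With the sample rows, the first segment contains only 20010701, so a query for 20010703 skips it using the min/max values alone and reads just the second segment. Elimination only helps here because the rows happen to be ordered by date; with dates scattered across segments, every segment would have to be read.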