BISM and ColumnStoreIndex in SQL 2012 PPT

Agenda
10 Key SQL 2012 BI Innovations
BI Semantic Model
Project ‘Apollo’
VertiPaq
xVelocity in SQL 2012
10 Key SQL 2012 BI Innovations
1. BI Semantic Model
2. Analysis Services Tabular Mode
– With BIDS support
3. PowerPivot “2”
4. Power View (SSRS)
5. Self-Service Data Alerts (SSRS)
6. Hadoop Big Data Integration
7. xVelocity Columnstore Indexes
8. Geospatial Indexes
9. Unstructured Data Queries
10. Data Quality Services
– Updated Master Data Services
Analysis Services: Tomorrow
• Build on the strengths and success of Analysis Services and expand its reach to a much broader user base
• Bring together the relational and multidimensional models under a single unified BI platform: best of both worlds!
• Embrace the relational data model, which is well understood by developers and IT Pros
• Provide flexibility in the platform to suit the diverse needs of BI applications
BI Semantic Model
• One Model for All User Experiences
• Visualise analysis using your favourite tools
• Model data the way you like
• Store analytical data however it is best done
• BISM is a concept, not a product
– Can be hosted in PowerPivot or SSAS
BI Semantic Model
One Model for All User Experiences
[Diagram: the BI Semantic Model sits between client tools and data sources]
• Client tools: your apps, Reporting Services & Power View, Excel, PowerPivot, SharePoint Insights
• Data Model: Multidimensional, Tabular
• Business Logic/Queries: MDX, DAX
• Data Access: ROLAP, MOLAP, xVelocity, DirectQuery
• Data sources: Databases, LOB Applications, Files, OData Feeds, Cloud Services
BI Semantic Model
What about existing Analysis Services applications?
• Existing applications: based on the Unified Dimensional Model (UDM)
• Every UDM becomes a BI Semantic Model
• New applications: new technology options
BI Semantic Model: Architecture
[Diagram: BI Semantic Model architecture]
• Clients: third-party applications, Reporting Services, Excel, PowerPivot, SharePoint Insights
• Data sources: Databases, LOB Applications, Files, OData Feeds, Cloud Services
Tabular Mode
Scaling PowerPivot to Enterprise Needs
• Model in PowerPivot
– PowerPivot as source of SSAS Tabular Models
– Excel for browsing and testing in SSDT
• All new PowerPivot features:
– Diagrams, Measure Grid, KPIs, Hierarchies,
Perspectives, 30+ New DAX Functions
• …and, unique to SSAS Tabular Mode:
– Row-level Security, Partitions, Large Tables (>2 billion
rows), Images, Memory Paging
Example: Power View Over a Sales Model
[Diagram: End User and Model Developer roles; data sources: SQL Server, Dynamics CRM]
Example: Excel Over a Finance Model
[Diagram: End User and Model Developer roles; data sources: Oracle, SAP]
Demo: Tour of SQL Server 2012 BISM
BI Semantic Model
• Data Model: Flexibility
• Business Logic: Richness
• Data Access and Storage: Scalability
Analysis Services Architecture
[Diagram: Analysis Services architecture, with the BI Semantic Model hosted at three scales]
• Personal BI: PowerPivot for Excel (xlsx)
• Team BI: PowerPivot for SharePoint (Analysis Services; xlsx)
• Organizational BI: Analysis Services
• Clients and tools: SharePoint, browser, Excel, Excel Services, Reporting Services, BI Development Studio, third-party apps
Project ‘Apollo’ in SQL 2012
Apollo: A new column-oriented query accelerator
• What is Apollo?
– What does column-oriented mean?
– How does it accelerate queries?
• When to create a columnstore index
• How to use Apollo
– Creating an index
– Running queries
– Loading data
• How does Apollo relate to VertiPaq and PowerPivot?
What is Apollo?
• Apollo is the code name for new functionality that is available
in SQL Server 2012
• It will substantially accelerate common data warehouse queries
• Adds a column store option in SQL Server database engine
– New index type in the database engine
• Advanced query processing algorithms
– New batch mode processing
When to use Apollo
• Data warehousing
– Read-mostly workloads
– Star joins
– Process large amounts of data
• Generous amount of memory
– Best performance when data fits in memory
– Graceful degradation as the fact table is paged from disk
– Under severe memory constraints, falls back to row-at-a-time processing
How does Apollo speed up queries? (1)
• Stores data column-wise
• Better compression
• Uses VertiPaq compression technology
• Less IO
[Diagram: columns C1 to C6, each stored separately]
How does Apollo speed up queries? (2)
SELECT region, sum (sales) …
• Fetches only needed columns from disk
– Less IO
– Better buffer hit rates
[Diagram: columns C1 to C6]
Improved Data Warehouse Query performance
• Columnstore indexes provide an easy way to
significantly improve data warehouse and
decision support query performance against
very large data sets
• Performance improvements for “typical” data
warehouse queries from 10x to 100x
• Ideal candidates include queries against star
schemas that use filtering, aggregations and
grouping against very large fact tables
What Happens When…
• You need to execute high performance DW queries against very
large data sets?
– In SQL Server 2008 and SQL Server 2008 R2
• OLAP (SSAS) MDX solution
• ROLAP and T-SQL + intermediate summary tables, indexed views and aggregate
tables
– Inherently inflexible
– In SQL Server 2012
• You can create a columnstore index on a very large fact table referencing all
columns with supporting data types
– Utilizing T-SQL and core Database Engine functionality
– Minimal query refactoring or intervention
• Upon creating the columnstore index, your table becomes “read only” – but you
can still use partitioning to switch in and out data OR drop/rebuild indexes
periodically
How Are These Performance Gains
Achieved?
• Two complementary technologies:
– Storage
• Data is stored in a compressed columnar data format (stored by
column) instead of row store format (stored by row).
– Columnar storage allows for less data to be accessed when only a
sub-set of columns are referenced
– Data density/selectivity determines how compression friendly a
column is – example “State” / “City” / “Gender”
– Translates to improved buffer pool memory usage
– New “batch mode” execution
• Data can then be processed in batches (1,000 row blocks) versus
row-by-row
• Depending on filtering and other factors, a query may also
benefit by “segment elimination” - bypassing million row chunks
(segments) of data, further reducing I/O
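The segments that segment elimination works on can be inspected through the `sys.column_store_segments` catalog view. The query below is a minimal sketch: the table name `dbo.FactSales` is a hypothetical placeholder, and the table is assumed to already have a columnstore index.

```sql
-- Hypothetical table name: dbo.FactSales (assumed to have a columnstore index).
-- Lists per-segment metadata for every column stored in the columnstore index.
SELECT  p.partition_number,
        s.column_id,
        s.segment_id,
        s.row_count,        -- rows held in this segment
        s.min_data_id,      -- minimum data id in the segment (may be an encoded value)
        s.max_data_id,      -- maximum data id in the segment (may be an encoded value)
        s.on_disk_size      -- segment size in bytes
FROM    sys.column_store_segments AS s
JOIN    sys.partitions AS p
        ON p.hobt_id = s.hobt_id
WHERE   p.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY p.partition_number, s.column_id, s.segment_id;
```

The minimum and maximum ids per segment are the kind of metadata that lets the engine skip a segment when a filter cannot match anything inside it.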
Column vs. Row Store
• Row Store (Heap / B-Tree): each data page holds complete rows
data page 1000
ProductID   OrderDate   Cost
310         20010701    2171.29
311         20010701    1912.15
312         20010702    2171.29
313         20010702    413.14
data page 1001
ProductID   OrderDate   Cost
314         20010701    333.42
315         20010701    1295.00
316         20010702    4233.14
317         20010702    641.22
• Column Store (values compressed): each column is stored on its own data pages (2000, 2001, 2002)
ProductID: 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, …
OrderDate: 20010701, 20010702, 20010703, 20010704, …
Cost: 2171.29, 1912.15, 2171.29, 413.14, 333.42, 1295.00, 4233.14, 641.22, 24.95, 64.32, …, 1111.25, …
Batch Mode
• Allows processing of 1,000 row blocks as an alternative to
single row-by-row operations
– Enables additional algorithms that can reduce CPU overhead
significantly
– Batch mode “segment” is a partition broken into million row chunks
with associated statistics used for Storage Engine filtering
• Batch mode can work to further improve query
performance of a columnstore index, but this mode isn’t
always chosen:
– Some operations aren’t enabled for batch mode:
• E.g. outer joins to columnstore index table / joining strings / NOT IN /
IN / EXISTS / scalar aggregates
– Row mode might be used if there is SQL Server memory pressure or
parallelism is unavailable
– Confirm batch vs. row mode by looking at the graphical execution plan
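Besides the graphical plan, the execution mode can also be read from the actual plan XML. The script below is a sketch only: `dbo.FactSales`, `dbo.DimDate` and their columns are hypothetical names, and `GO` is the SSMS/sqlcmd batch separator.

```sql
-- Hypothetical star-schema tables; FactSales is assumed to have a columnstore index.
SET STATISTICS XML ON;   -- return the actual execution plan as XML after each statement
GO

SELECT  d.CalendarYear,
        SUM(f.SaleAmount) AS TotalSales
FROM    dbo.FactSales AS f
JOIN    dbo.DimDate   AS d
        ON d.DateKey = f.OrderDateKey      -- inner join: eligible for batch mode
GROUP BY d.CalendarYear;                   -- grouped (not scalar) aggregate
GO

SET STATISTICS XML OFF;
GO
-- In the returned plan XML, check each operator for
--   EstimatedExecutionMode="Batch"  and  ActualExecutionMode="Batch".
```

In the graphical plan the same information appears in the operator tooltip as the estimated and actual execution mode.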
Columnstore format + batch mode Variations
• Performance gains can come from a combination of:
– Columnstore indexing alone + traditional row mode in QP
– Columnstore indexing + batch mode in QP
– Columnstore indexing + hybrid of batch and traditional
row mode in QP
Creating a columnstore index
• T-SQL
• SSMS
Defining the Columnstore Index
• Index type
– Columnstore indexes are always non-clustered and non-unique
– They cannot be created on views, indexed views, sparse columns
– They cannot act as primary or foreign key constraints
• Column selection
– Unlike other index types, there are no “key columns”
• Instead you choose the columns that you anticipate will be used in your
queries
• Up to 1,024 columns – and the ordering in your CREATE INDEX doesn’t
matter
• No concept of “INCLUDE”
• No 900 byte index key size limit
• Column ordering
– Use of ASC or DESC sorting not allowed – as ordering is defined via
columnstore compression algorithms
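As an illustration of these rules, a columnstore index definition might look like the sketch below. The table `dbo.FactSales`, the index name and all column names are hypothetical.

```sql
-- Hypothetical fact table and columns, for illustration only.
-- No key columns, no INCLUDE clause, no ASC/DESC: just list the columns
-- that queries are expected to reference, in any order.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales
    ON dbo.FactSales (OrderDateKey, ProductKey, StoreKey, Quantity, SaleAmount);
```

A common choice is simply to list every column of the fact table that has a supported data type, so that any query against the table can be served by the index.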
Demo: Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes
Supported Data Types
• Supported data types
– Char / nchar / varchar / nvarchar
• (max) types, legacy LOB types and FILESTREAM are not
supported
– Decimal/numeric
• Precision greater than 18 digits NOT supported
– Tinyint, smallint, int, bigint
– Float/real
– Bit
– Money, smallmoney
– Date and time data types
• Datetimeoffset with scale > 2 NOT supported
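Before creating the index it can help to list the columns that would have to be left out. The query below is a rough sketch against the catalog views, using the unsupported types named in this deck; `dbo.FactSales` is a hypothetical table name, and alias (user-defined) types are not resolved to their base type here.

```sql
-- Hypothetical table name: dbo.FactSales.
-- Returns columns whose data type cannot be part of a SQL Server 2012 columnstore index.
SELECT  c.name AS column_name,
        t.name AS type_name,
        c.max_length,
        c.precision,
        c.scale
FROM    sys.columns AS c
JOIN    sys.types   AS t
        ON t.user_type_id = c.user_type_id
WHERE   c.object_id = OBJECT_ID(N'dbo.FactSales')
  AND ( t.name IN (N'binary', N'varbinary', N'text', N'ntext', N'image',
                   N'uniqueidentifier', N'timestamp',   -- timestamp = rowversion
                   N'sql_variant', N'xml')
     OR t.is_assembly_type = 1                          -- CLR types, incl. hierarchyid
     OR (t.name IN (N'varchar', N'nvarchar') AND c.max_length = -1)   -- (max) types
     OR (t.name IN (N'decimal', N'numeric')  AND c.precision > 18)
     OR (t.name = N'datetimeoffset'          AND c.scale > 2) );
```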
Limitations
• Columnstore indexes cannot be used in conjunction with
– Change Data Capture and Change Tracking
– Filestream columns (other, supported columns from the same table can still be indexed)
– Page, row and vardecimal storage compression
– Replication
– Sparse columns
• Data type limitations
– Binary / varbinary / ntext / text / image / varchar(max) / nvarchar(max) / uniqueidentifier / rowversion / sql_variant / decimal or numeric with precision > 18 digits / CLR types / hierarchyid / xml / datetimeoffset with scale > 2
• You can prevent a query from using the columnstore index using the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint
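A short sketch of the query hint in use, for example to compare timings with and without the columnstore index. The table and column names are hypothetical.

```sql
-- Hypothetical table and columns.
-- The hint makes the optimizer ignore any nonclustered columnstore index for this query.
SELECT  ProductKey,
        SUM(SaleAmount) AS TotalSales
FROM    dbo.FactSales
GROUP BY ProductKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
```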
Adding data to a table with a columnstore
index
• Method 1: Disable the columnstore index
• Disable (or drop) the index
ALTER INDEX my_index ON T DISABLE
• Update the table
• Rebuild the columnstore index
ALTER INDEX my_index ON T REBUILD
Adding data to a table with a columnstore
index
• Method 2: Use Partitioning
• Load new data into a staging table
• Build a columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX my_index ON StagingT(OrderDate, ProductID, SaleAmount)
• Switch the partition into the table
ALTER TABLE StagingT SWITCH TO T PARTITION 5
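The partition-switching method can be sketched end to end as follows. Everything here is an assumption for illustration: `T` is taken to be partitioned on `OrderDate`, partition 5 is taken to be empty and to cover January 2012, and `T` is taken to have the same columns and the same columnstore index (and no other indexes) as the staging table.

```sql
-- Hypothetical schema; dates, names and the partition number are illustrative.
CREATE TABLE dbo.StagingT
(
    OrderDate  date  NOT NULL,
    ProductID  int   NOT NULL,
    SaleAmount money NOT NULL,
    -- SWITCH needs a trusted constraint proving every row fits the target partition.
    CONSTRAINT CK_StagingT_OrderDate
        CHECK (OrderDate >= '20120101' AND OrderDate < '20120201')
);  -- must be created on the same filegroup as partition 5 of T

-- ... bulk load the new rows into dbo.StagingT here ...

-- Build the columnstore index on the staging table after loading it.
CREATE NONCLUSTERED COLUMNSTORE INDEX my_index
    ON dbo.StagingT (OrderDate, ProductID, SaleAmount);

-- Metadata-only operation: the loaded rows become partition 5 of T.
ALTER TABLE dbo.StagingT SWITCH TO dbo.T PARTITION 5;
```

Because the switch only changes metadata, the large table stays queryable through its columnstore index while the new data is being loaded and indexed in the staging table.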
Apollo and VertiPaq
• VertiPaq:
– PowerPivot for Excel
– PowerPivot for SharePoint
– Analysis Services
– Database Engine – Apollo
• Use Apollo for relational data warehousing
– Large fact tables
– Ad hoc or reporting queries
– When you don’t need MDX
Performance example
• 1 TB version of the TPC-DS database
• 1.44 billion rows in catalog_sales fact table
• 32 logical processor machine with 256 GB RAM
• Warm start
• Query
SELECT w_city, w_state, d_year, SUM(cs_sales_price) AS cs_sales_price
FROM warehouse, catalog_sales, date_dim
WHERE w_warehouse_sk = cs_warehouse_sk
  AND cs_sold_date_sk = d_date_sk
  AND w_state IN ('SD','OH')
  AND d_year IN (2001,2002,2003)
GROUP BY w_city, w_state, d_year
ORDER BY d_year, w_state, w_city;
Performance example: Results
                  Total CPU time   Elapsed time
No columnstore    502 sec          501 sec
Columnstore       31.0 sec         1.10 sec
Speedup           16X              455X
Summary: Apollo in a nutshell
Columnstore technology + Advanced query processing
Astonishing speedup for DW queries
Great compression
Summary: SQL 2012 Columnstore
• SQL Server 2012 offers significantly faster query
performance for data warehouse and decision
support scenarios
– 10x to 100x performance improvement depending on the
schema and query
• I/O reduction and memory savings through columnstore
compressed storage
• CPU reduction with batch versus row processing, further I/O
reduction if segment elimination occurs
– Easy to deploy and requires less management than some
legacy ROLAP or OLAP methods
• No need to create intermediate tables, aggregates, preprocessing and cubes
– Interoperability with partitioning
– For the best interactive end-user BI experience, consider …
xVelocity in SQL 2012
VertiPaq has been renamed: in SQL 2012 it is called xVelocity in-memory
technologies
Q&A
Thank You