Simultaneous Reporting from Structured and Unstructured Data

PDW Architecture Gets Real:
Customer Implementations
Brian Walker | Microsoft Corporation
PDW Center of Excellence
Murshed Zaman | Microsoft Corporation
SQL Customer Advisory Team
April 10-12, Chicago, IL
Agenda
Introducing Parallel Data Warehouse
Pre-Built Hardware + Software Appliance
• Co-engineered with HP and Dell
• Pre-built Hardware
• Pre-installed Software
• Appliance installed in 1-2 days
• Support - Microsoft provides first call support
• Hardware partner provides onsite break/fix support
Plug and Play
Built-in Best Practices
Save Time
The Power of PDW
Massively Parallel Processing (MPP)
Symmetric Multi-Processing (SMP)
The Basic Full Rack
InfiniBand & Ethernet
SQL Server PDW 2012
• Reduce hardware footprint by virtualizing the entire control server rack down to a few nodes
• 1.5x lower price/TB, providing one of the lowest prices per TB in the industry
• Save up to 70% of storage with up to ~15x compression via the xVelocity columnstore
• Resilient, scalable, and high performance storage features in Windows Server 2012 replace SAN with high density, low cost SAS JBODs
• 128 cores on 8 compute nodes
• 2TB of RAM on compute
• Up to 168 TB of temp DB
• Up to 1PB of user data
• 70% more disk I/O bandwidth over SQL Server PDW 2008 R2
Data Layout
PDW
Compute Nodes
Dimensional Model
Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Item Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Sls Fact: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold
Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
Promo Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
[Figure: compute node diagram; node labels D, S, I, P and numbered F blocks 1 to 5]
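The layout on this slide can be illustrated with a small sketch. It assumes the usual MPP arrangement: small dimension tables are copied to every compute node, and the fact table is spread across nodes by hashing one column. The node count, the sample rows, the choice of `prod_dim_id` as the distribution key, and the helper names are all hypothetical; this is plain Python, not PDW code.

```python
from collections import defaultdict
import zlib

NUM_NODES = 5  # hypothetical node count chosen for illustration

# Small dimension table: assumed to be replicated, so every node has a full copy.
store_dim = {1: "Downtown", 2: "Airport"}

# Fact rows: (store_dim_id, prod_dim_id, dollars_sold); values are made up.
sales_fact = [
    (1, 100, 20.0), (2, 101, 35.5), (1, 102, 12.0),
    (2, 100, 8.25), (1, 101, 40.0), (2, 103, 5.0),
]

def node_for(key: int) -> int:
    """Pick a node from a stable hash of the distribution key."""
    return zlib.crc32(str(key).encode()) % NUM_NODES

def distribute(rows):
    """Assign each fact row to one node, hashing on prod_dim_id."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[1])].append(row)
    return nodes

def dollars_by_store(nodes):
    """Each node joins its fact slice to its local dimension copy; partial sums are then merged."""
    total = defaultdict(float)
    for node_rows in nodes.values():
        partial = defaultdict(float)
        for store_id, _prod_id, dollars in node_rows:
            partial[store_dim[store_id]] += dollars
        for name, amount in partial.items():
            total[name] += amount
    return dict(total)

nodes = distribute(sales_fact)
result = dollars_by_store(nodes)
```

Because every node holds the whole dimension table, the join needs no data movement between nodes; only the small partial totals are combined at the end.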
Seamlessly Add Capacity
Start Small, Linearly Scale Out
Smallest (53TB) To Largest (6PB)
• Start small with a warehouse of a few terabytes
• Add capacity up to 6 petabytes
Add Capacity
Start Small And Grow: 53 TB
Largest Warehouse: 6 PB
Any Size: Next-Gen Performance
[Figure labels: Country, Supplier, Sales, Products, Customer]
xVelocity - Fast Data Query Processing
Columnstore Provides Dramatic Performance
• Updateable and clustered xVelocity columnstore
• Stores data in columnar format
• Memory-optimized for next-generation performance
• Updateable to support bulk and/or trickle loading
Up to 50X Faster
Up to 15x compression
Save Time and Costs
Batch Processing
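To make "stores data in columnar format" concrete, here is a minimal sketch of why a column layout tends to compress and scan well. It uses run-length encoding as a generic example of columnar compression; it is not a description of how xVelocity is implemented, and the sample rows and function names are invented for illustration.

```python
from itertools import groupby

# Row-oriented sample data: (sale_date, category, qty); values are made up.
rows = [
    ("2013-04-10", "Grocery", 3),
    ("2013-04-10", "Grocery", 1),
    ("2013-04-10", "Bakery", 2),
    ("2013-04-11", "Bakery", 5),
    ("2013-04-11", "Bakery", 4),
]

def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

dates, categories, qty = to_columns(rows)
encoded_dates = run_length_encode(dates)            # 5 values stored as 2 pairs
encoded_categories = run_length_encode(categories)  # 5 values stored as 2 pairs
total_qty = sum(qty)  # an aggregate reads only the qty column
```

Repeated values sit next to each other once data is stored by column, which is what lets an encoding like this shrink them, and a query that sums one column never has to read the others.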
The Power of Updatable ColumnStore Indexing on PDW 2012
Any Data: Hadoop Integration
PolyBase Details
• External Tables and full SQL query access to data stored in HDFS
• HDFS bridge for direct & fully parallelized access of data in HDFS
• Joining 'on-the-fly' PDW data with data from HDFS
• Parallel import of data from HDFS into PDW tables for persistent storage
• Parallel export of PDW data into HDFS including 'round-tripping' of data
[Figure labels: Regular T-SQL; Results; Enhanced PDW Query Engine; PDW 2012; Structured data; External Table; HDFS Bridge; HDFS Data Nodes; Unstructured data]
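The idea behind an external table and an on-the-fly join can be sketched in a few lines. This is a plain Python simulation, not the PolyBase API: the in-memory "file", the schema, the customer table, and the function name are all assumptions made for illustration. In PDW itself the same step would be written as a T-SQL query against an external table.

```python
import csv
import io

# Stand-in for a delimited text file stored in HDFS (hypothetical clickstream data).
hdfs_file = io.StringIO("1001|/home\n1002|/cart\n1001|/checkout\n")

# Stand-in for a relational customer table already in the warehouse.
customers = {1001: "Ana", 1002: "Ben"}

def external_rows(handle, schema):
    """Apply column names and types to raw delimited lines at read time."""
    for record in csv.reader(handle, delimiter="|"):
        yield {name: cast(value) for (name, cast), value in zip(schema, record)}

# The "external table" definition: column names and types laid over the file.
schema = [("customer_id", int), ("url", str)]

# Join the file rows to the relational rows without first loading the file anywhere.
joined = [
    (customers[row["customer_id"]], row["url"])
    for row in external_rows(hdfs_file, schema)
    if row["customer_id"] in customers
]
```

The file stays where it is; structure is applied only when the query reads it, which is the property the slide describes as querying HDFS data with regular SQL.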
Existing Excel Skillset With Big Data
Familiar Tools To Analyze Structured/Unstructured Data
[Figure labels: Familiar Tools Analyze Big Data; Hadoop Data; Structured Data]
• Native Microsoft BI Integration to PDW
• Structured and unstructured data in same spreadsheet
• Widely adopted and familiar user tools
High Adoption Of Excel
No IT Intervention
Analyze All Data Types
Simultaneous Reporting from Structured and Unstructured Data
Upgrading to PDW Gains 100x Improvement
“…basic queries that previously took 20 minutes only
took seconds using the SQL Server 2008 R2 Parallel
Data Warehouse.”
-Tom Settle, Assistant VP, Data Warehousing, Hy-Vee
Benefits
Business Objectives
Critical
Provide Broader Range of Critical Customer Purchasing Data
- Current system only supported 2 years of data – Business required 7 years
Load Speed
Improve Performance of Complex Transformations
- Faster delivery of data within specified SLAs
Save Time
Enable Self-Service Reporting
- SSAS/SSRS/SharePoint/Excel
Query
Enable User Ad hoc Reporting
- Leveraging Excel/SharePoint
Scale
Provide solution that Scales to Meet Future Data Needs
- Expansion of history, point of sale detail, and expansion into social media
Save Costs
Reduced IT Costs
- Creating self-sufficient end users – Frees IT to focus on delivering new data
Shift from ETL to ELT
Using the Power of MPP
• Move complex transformations and calculations from the ETL server to SQL Server Parallel Data Warehouse
• PDW has allowed Hy-Vee to create an enterprise data warehouse centralizing data from many sources
• Archiving point of sale source files for later data extraction
[Figure label: Complex Transformations]
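The ELT pattern described on this slide can be sketched as follows: raw rows are loaded unchanged, and the transformation then runs as SQL inside the database rather than on a separate ETL server. SQLite from the Python standard library is only a stand-in for the warehouse here, and the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_sales (store_id INTEGER, qty INTEGER, unit_price REAL)"
)

# Extract + Load: raw rows are copied in unchanged.
raw_rows = [(1, 2, 1.50), (1, 1, 4.00), (2, 3, 2.00)]
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

# Transform: the calculation runs as set-based SQL inside the database engine.
conn.execute(
    """
    CREATE TABLE sales_by_store AS
    SELECT store_id, SUM(qty * unit_price) AS dollars_sold
    FROM staging_sales
    GROUP BY store_id
    """
)

summary = conn.execute(
    "SELECT store_id, dollars_sold FROM sales_by_store ORDER BY store_id"
).fetchall()
```

On an MPP system the point of this ordering is that the transform step is spread across all compute nodes instead of being bottlenecked on one ETL machine.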
Upgrade to PDW 2012
Future Option
• Improves their opportunity to further analyze social media data
• Query data without having to move it into a relational database
• Provides an alternative archive solution for point of sale data
Data Archive Challenge – Financial Customer
Current Solution
• Business only actively analyzes a rolling 12 months of data
• Regulations require that data be online and accessible for an extended period
• Data > 12 months is pushed to a farm of SQL servers to meet regulatory requirements
[Figure labels: Reporting Services; Archive Servers; Centralized EDW]
Data Archive Challenge – Financial Customer
Future Solution
• Replace archive farm with Hadoop cluster
• PDW provides single point of access
• Allows analysts to leverage existing SQL skills
• Much lower maintenance and administration
• Meets regulatory requirements
[Figure labels: Reporting Services; Archive Servers; Centralized EDW; HDFS bridge; HDFS Data Nodes; Unstructured data]
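A rough sketch of the archive pattern in these two slides: records inside a rolling 12-month window stay in the active store, older records move to the archive, and one query function reads both so the analyst sees a single point of access. The cutoff is simplified to the first day of the month, and the record layout, function names, and sample data are hypothetical.

```python
from datetime import date

def split_by_age(records, today, months=12):
    """Return (active, archive) lists split at a rolling cutoff of `months` months."""
    # Count months since year 0, step back `months`, and rebuild a first-of-month date.
    total = today.year * 12 + (today.month - 1) - months
    cutoff = date(total // 12, total % 12 + 1, 1)
    active = [r for r in records if r[0] >= cutoff]
    archive = [r for r in records if r[0] < cutoff]
    return active, archive

def query_all(active, archive, predicate):
    """Single point of access: one call scans both stores."""
    return [r for r in active + archive if predicate(r)]

# Sample records: (record_date, identifier); values are made up.
records = [
    (date(2013, 3, 15), "trade-1"),
    (date(2012, 6, 1), "trade-2"),
    (date(2011, 11, 30), "trade-3"),
]
active, archive = split_by_age(records, today=date(2013, 4, 10))
found = query_all(active, archive, lambda r: r[1] in {"trade-1", "trade-3"})
```

The caller of `query_all` does not need to know which store a record lives in, which mirrors the role the slide gives PDW in front of the Hadoop archive.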
AMD Boosts Performance with PDW
“We used to worry about backlogs, but no more,”
- Rajarao Chitturi, Database and Applications Manager at AMD
AMD is also processing more reporting queries than it previously could—between 10,000 and
13,000 a day—with an average runtime of a few seconds and virtually no performance issues.
Benefits
AMD runs an average of 1,500 loads per day, and data loads to a given table range from four-minute to four-hour intervals. AMD averages about 500,000 file loads a day.
Because of the user complaints about the previous system, the data warehouse team had one
employee devoted full time to addressing performance-related support tickets. With Parallel Data
Warehouse, AMD has reduced support work to just a few hours a week.
AMD Business Challenges
Obstacles With SMP Oracle
Linux Based Reporting
Load Demand
• Only supported 6 months of data retention
• Loading data always lagged behind by days
• Issues loading concurrently with high query volume
• Analysts couldn't access recent data
• Continuous data loads throughout the day while users were querying the system
• Custom reporting tools hosted on Linux use JDBC and ODBC drivers
Project Overview
Critical
Wafer Quality Assurance Data
- 42 TB on PDW
Save Space
Space Saving PDW Index Lite Approach
- Oracle required excessive non-clustered indexes to get any performance
Load Speed
Improved Loading Speed
- 660 GB/hr. throughput
Query
10,000 – 13,000 Analytic Queries per Day
- Most are scan intensive
Save Time
Faster Backups – Complete in 1-2 hours per Database
- Compared to a week on Oracle
Save Costs
Reduced Support Costs by 90%
- No more chopping up queries to fit the data warehouse
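For a sense of scale, the load throughput and warehouse size quoted on this slide can be combined in a back-of-envelope calculation. The slide does not describe a full reload; the scenario, the assumption of a sustained rate, the 1 TB = 1000 GB convention, and the function name are all illustrative.

```python
def hours_to_load(terabytes: float, gb_per_hour: float = 660.0) -> float:
    """Hours needed to load `terabytes` at a sustained rate, using 1 TB = 1000 GB."""
    return terabytes * 1000.0 / gb_per_hour

# Hypothetical: loading the full 42 TB at the quoted 660 GB/hr.
full_reload_hours = hours_to_load(42)
```

Under those assumptions the result is a little under 64 hours, or roughly two and a half days of continuous loading.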
Parallel Data Warehouse 2012
Other PDW Sessions
Online Advertising: Hybrid Approach to Large-Scale Data Analysis
(DAV-303-M)
Data Analytics and Visualization
Breakout Session (60 minutes)
Fri April 12, 2013, 2:45 PM - 3:45 PM in Sheraton 3
Anna Skobodzinski
Christian Bonilla
Dmitri Tchikatilov
Trevor Attridge
Thank you!