Big Decision

HPS Performance CoE
Jimmy ZHAO
June 10, 2013
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP’s Big Data
Benchmark Strategy
HP’s Big Data Community Engagement
HP has led in BI performance for a long time, and we are interested in working with the WDBD to carry that leadership into Big Data.
• HP is the only company that has ever held #1 non-clustered TPC-H results across 100GB, 300GB, 1TB, 3TB, 10TB, and 30TB (see attached slide)
• Today HP continues to lead in non-clustered high-end TPC-H: #1 x86 3TB, #1 10TB, and #1 30TB
• HP has more TPC-H publications than any other vendor
Business Intelligence (BI) Performance Leadership
[Figure: TPC-H non-clustered results* for HP ProLiant DL380 G7, DL585 G7 and DL980 G7, and HP Integrity Superdome and Superdome 2]
• Sustained leadership in BI performance over several years
• Multi-OS proof points: HP-UX, Windows, and Linux
• Multi-DB proof points: Oracle, SQL Server, and Sybase
*Results as of July 15, 2010. Cannot be shared externally without additional TPC data.
HAVEn – Big Data Analytics Platform
• Hadoop/HDFS: Catalog massive volumes of distributed data
• Autonomy IDOL: Process and index all information
• Vertica: Analyze at extreme scale in real-time
• Enterprise Security: Collect & unify machine data
• nApps: Powering HP Software + your apps
Data sources: social media, video, audio, email, texts, mobile, transactional data, documents, IT/OT, search engine, images
Big Data Benchmarking
Problem Statement
New Business Model
Eco-system Analytics
Data-Driven Business Model
• Know more about the business, partners and customers
• Gather more business, customer, behavior, effectiveness and efficiency data from existing systems
SoMoized Business – Social & Mobile
• Social marketing, advertisement and promotion
• Business process tracking and reengineering
• Effectiveness data: marketing effectiveness, customer understanding, customer satisfaction, popularity ranking (Like/Unlike)
• Efficiency data
Mobile Internet
• Anytime and anywhere: Time + Location data
Requirements for Big Data Benchmarking
Modeling the real-world applications
• Handling huge data volume
• Data is variable
• Multiple types of analysis
Modeling the real-world infrastructure and technologies
• Cloud & Big Data
• Reuse the same infrastructure for varied analysis work
Demonstrating the new business model
• Changes in the business systems
• SoMo model
Low cost
• Simple framework
High velocity
• Different kinds of queries
• Interactive/ad-hoc queries
Support business growth
• Number of analysis jobs
• Size of data
Decision Support System
Definition
Traditional Data Mining Process
[Figure: traditional data mining process (structured and semi-structured data, ETL, data warehouse, data marts, analytics)]
CRISP-DM vs. NG-DM
[Figure: the CRISP-DM cycle, BDPMED (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment), compared with the NG-DM cycle, BUPME-(ML), which adds machine learning]
NG-DM
• Larger data volume
• More complicated
• Faster deployment
• Faster analytics
DSS System - Scope
[Figure: DSS system scope: sources; Extract, Transform, Load; query data in system; machine learning in system; data-mining phases (B, U, DU, BU, P, M, E, D, ML) run in parallel and mixed]
DSS System Scope
[Figure: DSS system scope trends: larger database; larger & hybrid DW; in-database analytics; more visual & interactive; AI; integration; unstructured data keeps growing continually, semi-structured data grows slowly, structured data shrinks]
Benchmark Design
- Big Decision
Why & How?
A benchmark for DSS/data mining solutions
• Everything running in the same system
Big Decision – Big TPC-DS!
TPC-DS
• Engine of analytics
• Mature and proven workload for BI
• Reflects the real business model
• Mixed workloads
• Well-defined scale factors
Huge data volume
• TB to PB, or even zettabyte, support
SoMoized TPC-DS
• Data from social media
• Data from web logs
• Data from comments
• Additional data and dimensions from the new data
Broader data support
• Semi-structured data
• Unstructured data
NEW TPC-DS generator – Agile ETL
• Continuous data generation and injection
• Treated as part of the workload
Continuous Data Integration
• ETL is just a normal job of the system
• Data integration whenever there is data
Big Data Analytics
• New massively parallel processing technologies
• Convert queries to SQL-like queries
• Include interactive and regular queries
• Include machine learning jobs
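The Agile ETL idea above (data generated and injected continuously, with ETL treated as just another job) can be sketched as a producer/loader pair. This is a minimal sketch under stated assumptions: `generate_batches`, `producer`, `injector` and `run_injection` are hypothetical names, the rows are dummy dictionaries rather than TPC-DS data, and a real injector (for example one built on Flume) would run concurrently with the query, MR and ML jobs rather than on its own.

```python
import queue
import threading
from typing import Dict, Iterator, List

Row = Dict[str, object]


def generate_batches(num_batches: int, batch_size: int) -> Iterator[List[Row]]:
    """Hypothetical stand-in for the modified TPC-DS generator: yields micro-batches."""
    for b in range(num_batches):
        yield [{"batch": b, "seq": i} for i in range(batch_size)]


def producer(q: queue.Queue, num_batches: int, batch_size: int) -> None:
    """Pushes data into the system as soon as it is generated."""
    for batch in generate_batches(num_batches, batch_size):
        q.put(batch)  # blocks if the loader falls behind (bounded queue)
    q.put(None)  # sentinel: generation finished


def injector(q: queue.Queue, store: List[Row]) -> None:
    """Loads each micro-batch whenever one is available (ETL as a normal job)."""
    while True:
        batch = q.get()
        if batch is None:
            break
        # Hypothetical transform step: mark each row as loaded.
        store.extend({**row, "loaded": True} for row in batch)


def run_injection(num_batches: int = 5, batch_size: int = 100) -> List[Row]:
    q: queue.Queue = queue.Queue(maxsize=2)
    store: List[Row] = []
    t_prod = threading.Thread(target=producer, args=(q, num_batches, batch_size))
    t_load = threading.Thread(target=injector, args=(q, store))
    t_prod.start()
    t_load.start()
    t_prod.join()
    t_load.join()
    return store
```

With one producer and one loader sharing a FIFO queue, rows arrive in generation order; the bounded queue is a simple way to keep injection from running arbitrarily far ahead of loading.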
Big Decision Block Diagram
[Figure: Big Decision block diagram: TPC-DS data (sales, item, customer, web page) plus new sources (marketing, social messages, SNS marketing, mobile logs, social feedback, web logs, reviews, search & social advertising, search, social web pages, social advertisements), all flowing through Agile ETL (Extraction, Transform, Load)]
SoMoized Retail Data Model Design
• Almost the same data model
• Inject more data
– Networking data
– Behavior data
– Tracking data
– Preference data
• More complicated data dimensions
– Time + Location
[Figure: SoMoized retail schema with fact table SM_Web_Sales and tables SM_Sites, Date_Dim, Item, Time_Dim, SM_Promotion, Ship_Mode, Web_Page, Customer_Demographics, Customer_Address, SM_Customer, Household_Demographics, Income_Band, Web & Mobile Log, and Warehouse]
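To illustrate the added Time + Location dimension, here is a hypothetical generator for SM_Web_Sales rows. The field names, value ranges, and the latitude/longitude encoding of location are assumptions for illustration only; the slide does not define the SM_Web_Sales columns.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class SMWebSale:
    """Hypothetical SM_Web_Sales row; columns are illustrative, not from the slides."""
    sale_id: int
    item_sk: int
    customer_sk: int
    sold_date: date
    sold_time_s: int  # seconds since midnight (assumed Time_Dim granularity)
    latitude: float  # Location: assumed lat/long encoding
    longitude: float
    channel: str  # "web" or "mobile"


def generate_sm_web_sales(n: int, seed: int = 42,
                          start: date = date(2013, 1, 1)) -> List[SMWebSale]:
    """Deterministically generates n rows (same seed -> same rows)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        rows.append(SMWebSale(
            sale_id=i + 1,
            item_sk=rng.randint(1, 1000),  # arbitrary key ranges
            customer_sk=rng.randint(1, 5000),
            sold_date=start + timedelta(days=rng.randrange(365)),
            sold_time_s=rng.randrange(86_400),
            latitude=round(rng.uniform(-90.0, 90.0), 4),
            longitude=round(rng.uniform(-180.0, 180.0), 4),
            channel=rng.choice(["web", "mobile"]),
        ))
    return rows
```

Seeding a private `random.Random` keeps generation repeatable, which a benchmark data generator needs so that different runs at the same scale load identical data.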
Workload Design
[Figure: workload design: sources feed the DSS system through Agile ETL (Extract, Transform, Load) using structured, semi-structured and unstructured data injectors (labels 20%, 30%, 50%); inside the system, mixed and parallel analytics over a huge volume run 9 SQL queries, 90 SQL-like/MR jobs and 9 ML jobs]
DI: Data Injectors
MR: Map Reduce Jobs
ML: Machine Learning Algorithms
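The job counts on the workload-design slide (9 SQL, 90 SQL-like/MR, 9 ML) can be assembled into one mixed batch as sketched below. Note that 9 + 90 = 99, the size of the TPC-DS query set, which suggests the 99 queries are split between SQL and SQL-like/MR, but that is an interpretation. Reading the 20%/30%/50% labels as the shares of structured, semi-structured and unstructured injected data is likewise an assumption, and `build_batch`, `job_mix` and `run_batch` are hypothetical names with a placeholder job runner.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Job = Tuple[str, str, float]  # (kind, name, injected size in GB)

# Job counts as read from the workload-design slide.
JOB_COUNTS = {"SQL": 9, "SQL-like/MR": 90, "ML": 9}
# Assumed reading of the 20%/30%/50% labels: share of injected data by type.
DI_SHARES = {"structured": 0.20, "semi-structured": 0.30, "unstructured": 0.50}


def build_batch(total_di_gb: float, seed: int = 0) -> List[Job]:
    """Returns one shuffled batch mixing analytic jobs and data-injection jobs."""
    jobs: List[Job] = []
    for kind, count in JOB_COUNTS.items():
        jobs += [(kind, f"{kind}-{i + 1}", 0.0) for i in range(count)]
    for dtype, share in DI_SHARES.items():
        jobs.append(("DI", dtype, total_di_gb * share))
    random.Random(seed).shuffle(jobs)  # mixed order, not one phase per job type
    return jobs


def job_mix(jobs: List[Job]) -> Counter:
    """Counts jobs per kind."""
    return Counter(kind for kind, _, _ in jobs)


def run_batch(jobs: List[Job], workers: int = 8) -> int:
    """Dispatches jobs in parallel; returns how many completed."""
    def run(job: Job) -> str:
        # Placeholder: a real driver would submit the query, MR, ML or DI job here.
        return job[0]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return len(list(pool.map(run, jobs)))
```

For example, `run_batch(build_batch(100.0))` dispatches 111 jobs: the 108 analytic jobs plus three injection jobs carrying 20, 30 and 50 GB.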
Architecture - Deployment
[Figure: deployment: a Driver (Batch Controller, Injectors with Flume, MR Driver, Query Driver) runs against the Big Decision System (Name Node, Secondary NN, Data Node … Data Node N)]
Architecture
Agile ETL + MR/Query
[Figure: architecture: a Batch Controller drives Query Drivers and MR Drivers in the Mining Domain; the Injector Domain contains the Structured DI (fed by a scale-factor-based Modified TPC-DS Generator), Semi-structured DI, Unstructured DI and Flume]
Benchmarking Metrics
[Figure: benchmarking metrics: the Batch Controller runs successive batches, each mixing DI, SQL, SQL-like/MR and ML jobs; the metrics shown are Peak, Consistency and Scaling]
DI: Data Injectors
MR: Map Reduce Jobs
ML: Machine Learning Algorithms
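The slides name three metrics (Peak, Consistency, Scaling) without defining them. One possible set of definitions, offered only as an assumption: peak is the highest per-batch throughput, consistency is one minus the coefficient of variation of per-batch throughput, and scaling compares growth in scale factor with growth in elapsed time. All function names below are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def throughput(jobs_done: int, elapsed_s: float) -> float:
    """Jobs completed per hour for one batch."""
    return jobs_done * 3600.0 / elapsed_s


def peak(batch_throughputs: Sequence[float]) -> float:
    """Assumed 'Peak': best per-batch throughput observed."""
    return max(batch_throughputs)


def consistency(batch_throughputs: Sequence[float]) -> float:
    """Assumed 'Consistency': 1 - coefficient of variation (1.0 = perfectly steady)."""
    return 1.0 - pstdev(batch_throughputs) / mean(batch_throughputs)


def scaling(sf_small: float, time_small_s: float,
            sf_large: float, time_large_s: float) -> float:
    """Assumed 'Scaling': data growth divided by elapsed-time growth.

    1.0 means elapsed time grew in proportion to the scale factor (linear);
    values above 1.0 mean time grew more slowly than the data.
    """
    return (sf_large / sf_small) / (time_large_s / time_small_s)
```

For example, tripling the scale factor while elapsed time also triples gives a scaling value of 1.0.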
Thank you
?