Big Decision
HPS Performance CoE
Jimmy ZHAO
June 10, 2013
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

HP’s Big Data Benchmark Strategy

HP’s Big Data Community Engagement
HP has led in BI performance for a long time, and we are interested in working with the WDBD to extend that leadership to Big Data.
• HP is the only company ever to have held #1 non-clustered results across 100GB, 300GB, 1TB, 3TB, 10TB, and 30TB in TPC-H (see attached slide)
• Today HP continues to lead in non-clustered high-end TPC-H: #1 x86 3TB, #1 10TB, and #1 30TB
• HP has more TPC-H publications than others

Business Intelligence (BI) Performance Leadership
[Chart: TPC-H non-clustered results* for HP ProLiant DL380 G7, DL585 G7, DL980 G7, Superdome 2, and HP Integrity Superdome]
• Sustained leadership in BI performance over several years
• Multi-OS proof points: HP-UX, Windows, and Linux
• Multi-DB proof points: Oracle, SQL Server, and Sybase
*Results as of July 15, 2010. Cannot be shared externally without additional TPC data.

HAVEn – Big Data Analytics Platform
Data sources: social media, video, audio, email, texts, mobile, transactional data, documents
HAVEn components:
• Hadoop/HDFS: Catalog massive volumes of distributed data
• Autonomy IDOL: Process and index all information
• Vertica: Analyze at extreme scale in real-time
• Enterprise Security: Collect & unify machine data
• nApps: Powering HP Software + your apps
Additional data sources: IT/OT, search engine, images

Big Data Benchmarking Problem Statement

New Business Model
Eco-system Analytics
• Know more from business, partners and customers
• More business, customer, behavior, effect and efficiency data from existing systems
Data Driven Business Model
• Business process tracking and reengineering
• Effectiveness data
  – Marketing effectiveness
  – Customer understanding
  – Customer satisfaction
• Efficiency data
SoMolized Business – Social & Mobile
• Social marketing, advertisement and promotion
• Popularity ranking – Like/Unlike
• Mobile Internet
  – Anytime and anywhere: Time + Location data

Requirements for Big Data Benchmarking
Modeling the real-world applications
• Handling huge data volume
• Data is variable
• Multiple types of analysis
• Model the real-world infrastructure and technologies
Demonstrating the new business model
• Changes in the business systems
• Cloud & Big Data
• SoMo model
Low cost
• Reuse the same infrastructure for different analysis work
• Simple framework
High Velocity
• Different kinds of queries
• Interactive/ad-hoc queries
Support business growth
• Number of analysis jobs
• Size of data

Decision Support System Definition

Traditional Data Mining Process
[Diagram labels: ETL, Data Warehouse, Structured Data Mart, Semi-structured Data Mart, Analytic]

CRISP-DM vs.
NG-DM
CRISP-DM (BDPMED)
• Phases shown: Business Understanding, Data Preparation, Modeling, Evaluation
NG-DM (BUPME-(ML))
• Machine Learning
• Larger data volume
• More complicated
• Faster deployment
• Faster analytics

DSS System - Scope
[Diagram: Sources feed the DSS System through Extract, Transform, and Load. Inside the system, Query and Machine Learning operate on the data in the system, and the process phases (B, U, D, P, M, E, plus ML) run mixed and in parallel.]

DSS System Scope
• Larger database
• Larger & hybrid DW
• In-database analytics
• More visual & interactive AI
• Continual integration
• Unstructured and semi-structured data: keep growing
• Structured data: slowly shrinking

Analytic Benchmark Design - Big Decision

Why & How?
Benchmark for a DSS/data mining solution
Big Decision – Big TPC-DS!
• Everything running in the same system
• Engine of analytics
• Reflecting the real business model
TPC-DS
• Mature and proven workload for BI
• Mixed workloads
• Well-defined scale factors
Huge data volume
• Data from social
• Data from web log
• Data from comments
SoMoized TPC-DS
• Additional data and dimensions from new data
• Semi-structured and unstructured data
• TB to PB or even zettabyte support
Broader data support
• Semi-structured data
• Unstructured data
NEW TPC-DS generator – Agile ETL
• Continuous data generation and injection
• Considered part of the workload
Continuous Data Integration
• ETL is just a normal job of the system
• Data integration whenever there’s data
Big Data Analytics
New massive parallel processing technologies
• Convert queries to SQL-like queries
• Include interactive & regular queries
• Include machine learning jobs

Big Decision Block Diagram
[Diagram labels: TPC-DS, Marketing, Sales, Item, Customer, Web page, Web log, Mobile log, Reviews, Search, Social Message, SNS Marketing, Social Feedbacks, Social Web pages, Social Advertise, Search & Social Advertise]
Agile ETL: Extraction, Transform, Load

SoMolized Retail Data Model Design
• Almost the same data model
• Inject more data
  – Networking data
  – Behavior data
  – Tracking data
  – Preference data
• More complicated data dimension
  – Time + Location
Tables shown: SM_Sites, Date_Dim, Time_Dim, Item, SM_Promotion, SM_Web_Sales, Ship_Mode, Web_Page, Customer_Demographics, Customer_Address, SM_Customer, Household_Demographics, Income_Band, Warehouse, and Web & Mobile Log
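The "Time + Location" dimension named on the data model slide can be illustrated with a small sketch. This is not the benchmark's schema: the table names are copied from the slide, the fact/dimension split assumes standard TPC-DS usage, and `LogEvent`, its fields, and `time_location_key` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Table names as listed on the slide. Treating SM_Web_Sales as the fact table
# and the rest as dimensions follows standard TPC-DS usage; for the SM_*
# tables this split is an assumption.
FACT_TABLE = "SM_Web_Sales"
DIMENSION_TABLES = [
    "SM_Sites", "Date_Dim", "Time_Dim", "Item", "SM_Promotion",
    "Ship_Mode", "Web_Page", "Customer_Demographics", "Customer_Address",
    "SM_Customer", "Household_Demographics", "Income_Band", "Warehouse",
]


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical web/mobile log record carrying time and location."""
    customer_id: int
    event_time: datetime
    latitude: float
    longitude: float
    action: str  # e.g. "view", "like", "unlike"


def time_location_key(event: LogEvent, grid_degrees: float = 0.5) -> tuple:
    """Bucket an event by hour and by a coarse latitude/longitude grid cell."""
    hour = event.event_time.replace(minute=0, second=0, microsecond=0)
    cell = (
        int(event.latitude // grid_degrees),
        int(event.longitude // grid_degrees),
    )
    return (hour, cell)
```

A key of this kind would let log events be joined to both a time dimension and a location dimension. The hourly bucket and the 0.5-degree grid are arbitrary choices made for the sketch.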
Workload Design
Legend: DI: Data Injectors; MR: MapReduce jobs; ML: Machine Learning algorithms
[Diagram: Sources feed the DSS System through Extract, Transform, and Load; Query and Machine Learning run on the data in the system.]
• Huge Volume: Structured DI, Semi-structured DI, Unstructured DI (shares shown on the slide: 20%, 30%, 50%)
• Agile ETL
• Mix and Parallel Analytics Workloads: 9 SQL jobs, 90 SQL-like/MR jobs, 9 ML jobs

Architecture - Deployment
[Diagram labels: Driver, Big Decision System, Batch Controller, Injectors, Flume, MR Driver, Query Driver, Name Node, Secondary NN, Data Node … Data Node N]

Architecture: Agile ETL + MR/Query
[Diagram: a Batch Controller coordinates two domains.]
• Injector Domain: Modified TPC-DS Generator, Structured DI, Semi-structured DI, Unstructured DI, Flume
• Mining Domain: MR Drivers (MR …), Query Drivers (Query …)

Benchmarking Metrics
[Diagram: the Batch Controller runs concurrent streams of DI, SQL, SQL-like/MR, and ML jobs.]
• Scale factor based
• Peak
• Consistency
• Scaling

Thank you ?
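The job mix on the Workload Design slide can be sketched as a small configuration. The job counts (9 SQL, 90 SQL-like/MR, 9 ML) are read from the slide. Mapping the 20%, 30%, and 50% figures to structured, semi-structured, and unstructured injectors in that order is an assumption, and the function names and the seeded shuffle are hypothetical, not part of the benchmark.

```python
import random
from collections import Counter

# Job counts as they appear on the Workload Design slide.
JOB_COUNTS = {"SQL": 9, "SQL-like/MR": 90, "ML": 9}

# ASSUMPTION: the slide's 20% / 30% / 50% figures are taken here to be the
# share of injected data per injector type, in the order the injectors appear.
INJECTOR_SHARE_PCT = {"structured": 20, "semi-structured": 30, "unstructured": 50}


def split_injected_volume(total_gb: int) -> dict:
    """Split a total injected volume (in GB) across the three injector types."""
    return {kind: total_gb * pct // 100 for kind, pct in INJECTOR_SHARE_PCT.items()}


def build_job_mix(seed: int = 0) -> list:
    """Return all jobs as (kind, index) pairs in a reproducible mixed order."""
    jobs = [(kind, i) for kind, count in JOB_COUNTS.items() for i in range(count)]
    random.Random(seed).shuffle(jobs)  # seeded so the order is repeatable
    return jobs


if __name__ == "__main__":
    print(split_injected_volume(1000))
    print(Counter(kind for kind, _ in build_job_mix()))
```

A real batch controller would also decide how many of these jobs run concurrently. The sketch only fixes the composition of the mix.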