The New BI Ecosystem: How Big Data Merges Top-Down and Bottom-Up Computing

Wayne W. Eckerson
Director of Research
Founder, BI Leadership Forum

Agenda
• Big data platforms
  – Relational databases
  – Analytical databases
  – Hadoop
• New analytical ecosystem

What comes next?
• Kilobyte (KB) – 10^3 bytes
• Megabyte (MB) – 10^6 bytes
• Gigabyte (GB) – 10^9 bytes
• Terabyte (TB) – 10^12 bytes
• Petabyte (PB) – 10^15 bytes
• Exabyte (EB) – 10^18 bytes
• Zettabyte (ZB) – 10^21 bytes
• Yottabyte (YB) – 10^24 bytes

What is “big data”?
Data
a) Lots of data
b) Different types of data
c) More data than you can handle
Systems
d) Purpose-built analytical systems
e) Distributed file system
f) New staging area and archive
Movement
g) A Java developer’s employment act
h) A replacement for the RDBMS
i) A club for hip data people
Yes!

Information explosion
[Chart, 2005–2012. Series: Unstructured & Content Depot; Structured & Replicated]
Every 18 months, non-rich structured and unstructured enterprise data doubles.
Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009

Data deluge
• Structured data
  – Call detail records
  – Point of sale records
  – Claims data
• Semi-structured data
  – Web logs
  – Sensor data
  – Email, Twitter
• Unstructured data
  – Video, audio, images, text
“A Sea of Sensors”, The Economist, Nov 4, 2010

From transactions to observations
[Diagram labels: Structured, Semi-Structured, Unstructured]

Three big data platforms (systems)
• General purpose relational database
• Analytical database
• Hadoop

1. General purpose RDBMS
Powers first-generation DW
Benefits:
- RDBMS already in-house
- SQL-based
- Trained DBAs
Challenges:
- Cost to deploy and upgrade
- Doesn’t support complex analytics
- Scalability and performance
[Diagram labels: Operational Systems, ETL, Data Warehouse, ETL, Data Mart, BI Server, Reports / Dashboards]

2. Analytical platforms
Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general purpose solutions.
Vendors: 1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)
Deployment options:
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)

Game-changing technology
• Quicker to deploy
  – Preconfigured and tuned
  – Fast ROI
• Faster and more scalable
  – Faster query response times
  – Linear performance
• Built-in analytics
  – Libraries of functions
  – Extensible SDK
• Less costly
  – Less power, cooling, space
  – Fewer people to maintain

Business value of analytic platforms
• Kelley Blue Book
  – Consolidates millions of auto transactions each week to calculate car valuations
• AT&T Mobility
  – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing
[Slide labels: Analytical appliance; Analytical database]

3. Hadoop
• Ecosystem of open source projects
• Hosted by Apache Foundation
• Google developed and shared concepts
• Distributed file system that scales out on commodity servers with direct attached storage and automatic failover

Hadoop distilled: What’s new?
[Diagram labels: BIG DATA; Unstructured data; Distributed file system; Data scientist; “Schema at Read”; Open source $$; No SQL; MapReduce]
Benefits:
- Comprehensive
- Agile
- Expressive
- Affordable
Drawbacks:
- Immature
- Batch oriented
- Expertise
- TCO

Hadoop ecosystem
[Diagram. Source: Hortonworks]

Hadoop use cases
• Sabre Holdings
  – Analyze airline shopping data
• Vestas
  – Site wind turbines by modeling larger volumes of weather data
• CBS Interactive
  – Optimize ad placement and pricing
• Nokia
  – Identify new data services

Hadoop hype
Overheard:
“Hadoop will replace relational databases.”
“Hadoop will replace data warehouses.”
“Hadoop has a superior query engine compared to analytical platforms.”
“Use Hadoop for any application that requires more than one node.”
[Chart: Gartner Group – Hype Cycle]

Hadoop adoption rates
No plans          38%
Considering       32%
Experimenting     20%
Implementing       5%
In production      4%
Based on 158 respondents, BI Leadership Forum, April 2012

Hadoop workloads
                        Today   In 18 Months
Staging area            92%     92%
Online archive          92%     92%
[Chart values for the remaining workloads (Transformation engine, Ad hoc queries, Scheduled reports, Visual exploration, Data mining), in extraction order: 83%, 58%, 42%, 67%, 25%, 67%, 67%, 58%, 83%, 92%]
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012

Which platform do you choose?
[Diagram labels: Hadoop, Analytic Database, General Purpose RDBMS; axis: Structured, Semi-Structured, Unstructured]

Big data platform comparison
                RDBMS                  Analytical Database   Hadoop
Purpose         OLTP                   Analytics             Anything
Volume          Low                    Moderate              High
Variety         Relational             Relational+           Variable
Access          SQL                    SQL+                  Java+
Latency         Low                    Moderate              High
Concurrency     High                   Moderate              Low
Cost per GB     High                   Moderate              Low
Role            DW hub or data mart    DW or sandbox         Staging area and archive

The New BI Ecosystem

BI Framework 2020
[Diagram. Domains: Business Intelligence, Continuous Intelligence, Content Intelligence, Analytics Intelligence. Layers: End-User Tools, Design Framework, Architecture. Other labels:
- Data Warehousing; Reports and Dashboards; MAD Dashboards; Reporting & Analysis
- CEP, Streams; Event detection and correlation; Event-Driven Alerts and Dashboards; Dashboard Alerts
- HDFS, NoSQL databases; MapReduce, XML schema, key-value pairs, graph notation, etc.; Keyword search, BI tools, XQuery, Hive, Java, etc.
- Analytic Sandboxes; Ad hoc query, spreadsheets, OLAP, visual analysis, analytic workbenches, Hadoop; Excel, Access, OLAP, data mining, visual exploration; Ad hoc SQL; Exploration; Power Users]

BI Framework
TOP DOWN: “Business Intelligence”
• Corporate Objectives and Strategy
• Reporting & Monitoring (Casual Users)
• Data Warehousing Architecture
• Predefined Metrics
• Non-volatile Data
• Reports Beget Analysis
Pros:
- Alignment
- Consistency
Cons:
- Hard to build
- Politically charged
- Hard to change
- Expensive
- “Schema Heavy”
BOTTOM UP: Analytics
• Processes and Projects
• Analysis and Prediction (Power Users)
• Analytics Architecture
• Ad hoc queries
• Volatile Data
• Analysis Begets Reports
Pros:
- Quick to build
- Politically uncharged
- Easy to change
- Low cost
Cons:
- Alignment
- Consistency
- “Schema Light”

The new analytical ecosystem
[Diagram labels:
- Sources: Operational Systems (Structured data); Machine Data; Web Data; Audio/video Data; External Data; Documents & Text
- Movement: Extract, Transform, Load (Batch, near real-time, or real-time); Streaming/CEP Engine
- Stores: Data Warehouse; Hadoop Cluster; Dept Data Mart; BI Server
- Sandboxes: Virtual Sandboxes; In-memory Sandbox; Free-Standing Sandbox; Analytic platform or nonrelational database
- Users: Casual User; Power User
- Top-down Architecture; Bottom-up Architecture]

Analytical sandboxes
[Diagram labels as in “The new analytical ecosystem”]

Workflows
[Diagram labels: Source Systems; Analytical database (DW); Analytical tools; “Capture only what’s needed”; “Capture in case it’s needed”. Numbered steps captured: 1. Extract, transform, load; 5. Explore data; 6. Parse, aggregate; 9. Report and mine data]

Recommendations
• Explore applications for multi-structured data
• Apply the right tool for the job
  – RDBMS, Analytical platform, Hadoop, NoSQL
• Make power users full-fledged members of your BI environment
• Reconcile top-down and bottom-up BI environments
Create an analytical ecosystem!

Questions?
• Analytical thought leader
• Founder, BI Leadership Forum
• Director of Research, TechTarget
• Former director of research at TDWI
• Author
• Wayne Eckerson
• weckerson@bileadership.com
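
As a back-of-the-envelope companion to the “What comes next?” and “Information explosion” slides, the following Python sketch encodes the decimal unit ladder (KB = 10^3 bytes up to YB = 10^24 bytes) and projects a data volume forward under the 18-month doubling claim. It is only an illustration: the function names, the 100 TB starting volume, and the assumption of perfectly steady exponential growth are not from the deck.

```python
# Decimal unit ladder from the "What comes next?" slide: each step is 10^3 larger.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_bytes(unit: str) -> int:
    """Bytes in one decimal unit, e.g. 'TB' -> 10**12."""
    return 10 ** (3 * (UNITS.index(unit) + 1))


def projected_volume(start_bytes: float, months: float,
                     doubling_months: float = 18.0) -> float:
    """Volume after `months`, assuming steady doubling every `doubling_months`.

    Steady doubling is a simplifying assumption layered on the slide's claim.
    """
    return start_bytes * 2 ** (months / doubling_months)


def humanize(num_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    for unit in reversed(UNITS):
        size = unit_bytes(unit)
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {unit}"
    return f"{num_bytes:.0f} bytes"


if __name__ == "__main__":
    start = 100 * unit_bytes("TB")  # hypothetical starting volume
    for years in (0, 3, 6):
        print(years, "years:", humanize(projected_volume(start, 12 * years)))
```

Starting from the hypothetical 100 TB, the sketch reports 100.0 TB today, 400.0 TB after three years (two doublings), and 1.6 PB after six years (four doublings), which is the kind of growth that pushes a warehouse from one row of the unit ladder to the next.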