David Douglas
Department of Information Systems

Introducing…

12 Definitions of Big Data
25 Big Data Facts

Short Big Data Video
Short Video

Leveraging Big Data in Today’s Enterprise

Big Data Via the Three Vs

Volume

SAS adds a V: Visualization
Of course, the objective is to get Value, the ultimate V
12 Vs have been reported

Definition of Big Data
“Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis” -- IDC
“Data is the new oil” -- European Consumer Commissioner Meglena Kuneva

The Human Face of Big Data

Digital Universe

Gartner Technology Hype Cycle

Gartner Technology Hype Cycle

Big Data Technologies…

Hadoop/MapReduce
• Was driven by the need to index the web
• Existing technology did not scale
• MapReduce framework developed at Google
• Yahoo! built Hadoop on the Map/Reduce framework
Note: a recent survey indicates that only 16% of companies use the Hadoop/MapReduce environment, dominated by the online companies

Hadoop
• Is the Storage Layer (HDFS)
  – Hadoop Distributed File System: software to distribute data across multiple computing nodes
  – Typically runs on top of Linux
  – Stores each block 3 times, hopefully with one copy on a node in a different rack
  – Sequential access: write once, read many
  – Optimized for streaming: no random access
  – No predefined schema: any data type

Hadoop (cont)
• The Execution Layer (Map/Reduce)
  – Responsible for running a batch job in parallel on many servers
  – Typically runs on top of Linux
  – Works with (key, value) pairs
  – For a job
    • Mappers pull data from their respective files
    • Mappers feed Shuffle (may not be needed)
    • Shuffle feeds Reducer, which summarizes and returns the result
  – Java is the native language

Map Reduce Example
• Five files, each with two columns of (key, value) pairs: city, max temperature
Example:
Toronto, 20
Whitby, 25
……
Problem: Find the maximum temperature for each city
Break down into 5 mapper tasks; the results of the mapper tasks are:
(Toronto, 20) (Whitby, 25) (New York, 22) (Rome, 33)
(Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)
(Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)
(Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)
(Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)
Mapper task results feed into reduce tasks, which combine the input results and output a single value for each city:
(Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)

Hadoop Ecosystem

Technology for Big Data

In-Memory Computing: Speed
RAM latency: 70 nanoseconds (1400 MPH)
Disk latency: 5 milliseconds (0.003 MPH)

In-Memory Computing Speed Demo Backdrop
• Oracle World Demo
• Put all of Wikipedia into Oracle 12c with in-memory option
• SAP Tech Ed a few weeks later
• Put all of Wikipedia into SGI HANA box (250 billion rows)
• Query and plot of Wikipedia page views of AIDS versus Ebola by date
• Forecast of Wikipedia page views of AIDS versus Ebola by date

HANA

Myth or Reality
• RAM is so inexpensive, it is a no-brainer to move to in-memory computing?
• In-memory computing is an expected evolution in the digital universe?
• In-memory computing tenet:
  – RAM is the new DISK
  – DISK is the new TAPE

Cases
• On-line Gambling
  – Increasing number of online bets per second from 20,000 to 150,000 (Bwin.Party)
• Education
  – Near real-time analytics driving intervention for improving retention (University of Kentucky)
• Health Care
  – Intersection of smart devices, electronic health care records and in-memory analytics to provide real-time diagnostics and treatment (McKinsey & Company)
• Global package company
  – Move to real-time tracking of packages (MarketWatch)

Thoughts on In-Memory Computing
• In-Memory Computing makes Big Data possible
• Insight at the speed of thought
• IMDBMS
  – Reduces data footprint
  – Eliminates aggregates
  – Compression for columns higher than for rows
  – Optimized for RAM instead of optimized for disk

Two Factors Will Drive In-Memory Computing Faster than Planned
• Automated Decision-Making
• Mobile Computing

A Data Scientist

Another View of a Data Scientist

So How Do I Find One…

Big Data is disruptive in the following ways
• It brings grid and in-memory computing to business
• Software is being moved to the data instead of moving the data to the software
• Transition from analytics at rest to analytics in motion
• Will create new demand for workers with analytics skills

Big Data is really about Analytics

A View of Analytics
Source: mu-sigma

Another View of Analytics
Source: Rose Business Technologies

Achieving Success with Business Analytics
Another View of Analytics
[Figure labels: Competitive Advantage; Advanced Analytics; Basic Analytics]
• Decision Optimization: What is the best decision?
• Predictive Modeling: What will happen next?
• Forecasting: What if these trends continue?
• Basic Statistical Analysis: Why is this happening?
• Reporting with Early Warning: What actions are needed?
• Dynamic Reporting: Where exactly are the problems?
• Ad Hoc Reporting: How many, how often, where?
• Basic Reporting: What happened?
[Remaining figure labels: Data, Decision Support, Information, Reporting, Intelligence, Decision Guidance]

Another View of Analytics

Another View of Analytics

Cognitive Computing?
Watson gains eyes, ears and a voice

The Importance of Big Data and Analytics
• Wall Street Journal 9/16/13
  – 44% of CIOs consider Business Intelligence as top priority for technology spending
  – 51% of the companies plan to increase spending on Business Intelligence and Analytics software this year
• A recent McKinsey report
  – Considers Big Data as “The next frontier for competition”
  – “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”
• Do you need a Data Scientist?

The Importance of Big Data and Analytics

Data Driven Decisions
• Analytics, The New Path to Value (MIT research report: 30 industries, 100 countries)
  – Analytics is the differentiator for the top performing companies (chart on next slide)
  – Data is not the problem

5 Stages of Big Data and Analytics Maturity

Current State

There is a Journal
Big Data Journal

Word (Tag) Cloud
Word Cloud with Images
Easy Text Manipulation
http://www.ibm.com/analytics/watson-analytics/
https://ace.ng.bluemix.net/
http://www.biography.com/people/warren-buffett-9230729#synopsis

Of Interest
• Social Bakers
• Amazing Twitter Stats
• Google Trends
• Social media location adds considerable opportunity

Implications…

The Analytics
• At rest (static): models, including predictive models, using historical data
• In-motion (real-time): using models on a stream feed
• Combination: uses models on a stream feed; stream feed goes into the data at rest to update models

Analytics at Rest, Analytics in Motion

Thoughts
• It is not a matter of “if” but “when” you get into Big Data analytics
• Purpose is to provide enablement for users
• Choices
  – Pure plays like Cloudera, Hortonworks, MapR,
    Pivotal, etc.
  – NoSQL databases (key-value, documents, networks)
  – Major computing players like IBM, Oracle, etc.
  – In-Memory Computing
  – Should not be a new silo

Terms
• IoT – Internet of Things
• IoE – Internet of Everything
• IoN – Internet of Nothing
The vast majority of the billions of things connected to the internet on Cisco’s website, for instance, are not the toasters, refrigerators, thermostats, smoke detectors, pace-makers and insulin pumps that the IoT's true believers enthuse about. Almost exclusively, they are existing smartphones, tablets, computers and routers, plus a surprising number of industrial components used to beam performance statistics back to corporate headquarters. Without any hoopla, operators of power stations, passenger jets, railways, refineries, chemical plants, oil platforms and other industrial equipment have been doing this for ages.

EMC Digital Universe with Research & Analysis by IDC
The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things
April 2014
40% of data created and consumed by consumers

We Live in an Era of Change

Gartner’s Magic Quadrant – BI & Analytics

Good Reading
iPad App
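
Supplement: Map Reduce Example as code
A minimal Python sketch of the Map Reduce Example slide (five mapper tasks, maximum temperature per city). It reuses the mapper results listed on that slide as stand-in file contents, which is an assumption because the raw files are not shown. The names mapper, shuffle, reducer and max_temperatures are illustrative only; they are not Hadoop API calls, and a real Hadoop job would typically be written in Java.

```python
from collections import defaultdict

# Mapper results copied from the slide, reused here as stand-in file
# contents (an assumption: the raw files themselves are not shown).
FILES = [
    [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 33)],
    [("Toronto", 18), ("Whitby", 27), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("Whitby", 20), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("Whitby", 19), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("Whitby", 22), ("New York", 19), ("Rome", 30)],
]


def mapper(records):
    """Emit one (city, max temperature) pair per city seen in one file."""
    best = {}
    for city, temp in records:
        if city not in best or temp > best[city]:
            best[city] = temp
    return list(best.items())


def shuffle(mapped):
    """Group the outputs of all mapper tasks by key (city)."""
    groups = defaultdict(list)
    for pairs in mapped:
        for city, temp in pairs:
            groups[city].append(temp)
    return groups


def reducer(city, temps):
    """Collapse the grouped values for one city into a single maximum."""
    return city, max(temps)


def max_temperatures(files):
    """Run map, shuffle and reduce in sequence (no real parallelism here)."""
    mapped = [mapper(records) for records in files]  # one mapper task per file
    grouped = shuffle(mapped)
    return dict(reducer(city, temps) for city, temps in grouped.items())


RESULT = max_temperatures(FILES)
print(RESULT)
```

On the slide's data this reproduces the reduce output listed there: (Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38). The sketch runs the steps one after another in a single process; the point of Hadoop is that the mapper tasks run in parallel on many servers.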