Big Data in Context
A practical, real-world view
BCS Event, Leeds
November 2012
Dale Vile, CEO & Research Director
Twitter: dale_vile
Blog: researchbeat.com
Freeform Dynamics Ltd
www.freeformdynamics.com
Copyright 2012 Freeform Dynamics Ltd

Chart: survey responses (Sep 2011 vs Sep 2012) to the statement: The term ‘big data’ is currently being over-hyped by IT vendors in an unhelpful way. Scale: 5 (Totally agree) to 1 (Totally disagree), plus Unsure.

Some up front statements
‘Big Data’ is a bandwagon
But some genuinely new and interesting stuff is going on behind the hype
Maturity remains an issue, and lots of challenges exist
The new doesn’t (usually) replace the old
It’s important to keep things in context

Topics
What’s the problem we are trying to solve?
What, exactly, constitutes ‘big data’?
Hadoop as an example of a big data solution
How does big data change the way we think?
Some common use cases
The broader technology picture
Frequently encountered challenges
Looking to the future

The problem (and opportunity) in a nutshell
Chart: two panels, ‘How much growth?’ and ‘How well do you exploit?’, each covering two data types:
Structured data (e.g. tabular data in RDBMSs)
Unstructured data (e.g.
documents, messages, multimedia, etc)
Scales: 5 (Extremely high growth) to 1 (No growth); 5 (Fully exploit) to 1 (Very poorly exploit)

And that data just keeps on coming
In the words of survey respondents…
Increased transaction rates
New business paradigms, especially the moving of revenue streams online
CRM & social networking
Movement away from paper to electronic documents
Audit stipulation
The shift from above the line advertising spend to direct marketing
Regulation & compliance
More affordable technology is available to store and analyse data
Cheap storage
Everything is bigger, faster, cheaper
High demand for immediate access to more and more data
Digital video archiving
Increased signalling traffic in telecoms networks
Business demand for better knowledge and insight
Desire for reporting over longer time periods, with higher levels of drill down.
Storage costs drop and processing power increases; formerly impossible applications morph into expensive ones, which eventually become mainstream
Greater use of ecommerce methods for supply management
Stashing data that we used to archive to take advantage of future technologies
Predictive analytics
Better and more widespread sensors
Fear of 'throwing away'
Duplicate copies of data for BI and data mining.
Poorly designed systems with inefficient storage and no archive functions
No desire from Business to archive data
Ever more detailed (higher resolution) survey data.
Digital imagery, Webex logging, email.
Same information stored in many places (mail, file server, SharePoint, ...)
Vast number of emails with client presentations attached
Increased use of digital cameras for data capture
*&%!* SharePoint
Smart meters
Increasing availability of external data which may or may not be highly relevant

So what constitutes big data?
The 3 V’s: Volume, Variety, Velocity
More V’s: Voracity, Value, Value-Density

A practical view
Diagram: data sources placed on two axes, HIGHLY STRUCTURED to HIGHLY UNSTRUCTURED and HIGH VALUE DENSITY to LOW VALUE DENSITY, with a region labelled BIG DATA. Sources shown:
M2M feeds, web activity logs, ticker data, etc.
ERP, CRM, SCM & other transaction data
Social media, news feeds, harvested web content, etc.
Document repositories, message stores, etc.

Need for a different architectural approach
SCALE UP (e.g. high performance RDBMS cluster)
  Powerful CPUs
  Lots of cores
  Huge memory
  Expensive disk
  Expensive SW
SCALE OUT
  Distributed Commodity Hardware
  Open Source Software
  Parallel processing
Principle of divide and conquer
  Distribute data into small chunks
  Execute lots of little tasks close to the data, then merge results

The elephant in the room
hadoop.apache.org
MapReduce
HDFS
Breaks traditional conventions
Other tools: Pig, Cassandra, ZooKeeper, Hive, HBase

Comparison of approaches
TRADITIONAL APPROACH | BIG DATA APPROACH
Schema based data model | Key/value based (no schema)
Create model, then load data | Load data, then create model
Only load what’s valuable | Load data speculatively
Premeditated/prescriptive analysis | Exploratory/iterative analysis
What’s the answer? | What’s the question?
Fastest time to result | Generate the best insight
Different way of thinking, different level of impact

Some common big data use cases
Social analytics (the ‘poster child’)
Customer analytics in the broader sense
  Profiling and segmentation
  Advertising and promotion
  Retail optimisation (pricing, merchandising, etc)
  Customer services and support
IT systems monitoring and management
Security and associated forensics
Business operations
  Supplier management, logistics, energy management
Industry specific
  Financial services, public sector, telecoms

INPUTS
  More data
  Greater diversity
  Faster acquisition
  More sources
ANALYSIS
  More urgency
  Less predictability
  More granularity
  More history
  Smaller time-slices

But vanilla Hadoop seldom the answer
Enterprise readiness of Hadoop
  Resilience, security, integration friendliness
Apache tools relatively raw, so look out for other distributions
  Cloudera, Hortonworks, MapR Technologies, IBM InfoSphere BigInsights…
Mainstream vendors substituting components and extending framework
Hadoop becoming an engine that sits behind commercial frameworks and tools
  IBM, Microsoft, Oracle, SAP, SAS, EMC, Teradata, …
And Hadoop doesn’t define the whole advanced data management and analytics opportunity anyway
  Enhanced RDBMS, next generation data warehousing, NoSQL, statistical modelling, predictive analytics, time-series analysis, in-memory databases, stream based processing engines, and more….
it’s a pretty lively area

Use of traditional and emerging technologies
Chart: two panels, ‘Current level of use’, on a scale from 5 (Extensive use) to 1 (Not used at all), plus Unsure, and ‘Change over next 3 years’, from less use to more use, for each of:
Legacy databases and file systems
General purpose RDBMS servers
High performance RDBMS configurations
OLAP multi-dimensional database systems
Write once read many (WORM) databases
Rule-based stream processing engines
In memory databases
Scale-out storage architectures
Distributed indexing and search
Distributed data analytics engines

Taking a joined up approach
Diagram elements: Derivative structured data, External feeds, Advanced analytics, Business insights, Traditional BI systems, Business decision makers, Business models and policies, Data scientists, Operational data, Actionable rules, Operational systems, Front line staff, Customers & suppliers

Common challenges organisations face
Culture of driving via the rear view mirror
  Too much focus on ‘lag’ rather than ‘lead’ indicators
  Emphasis on planning/score keeping rather than in-flight control
Management and decision making issues
  Lack of business and political alignment between divisions
  Parochial approach to budgeting and investment in IT
Fragmented and disjointed systems and information
  Different formats, different coding structures
  Different levels of accuracy, quality and completeness
Governance and control
  Ownership of source data often ambiguous
  Security, privacy and compliance challenges of centralised big data repositories
Business and IT staff don’t know what they don’t know
  Locked into historical perceptions and assumptions
  Knowledge and skills gap often not recognised

Looking to the future
Blurring of the lines
  Big data and traditional BI
  Operational control and analytics
  Analysts and business people
  Managers and
front-line staff
  On premise and cloud
  Mobile and office based
KEY QUESTIONS
  How many of those data stores can be combined?
  Layering of analytics tools over big data infrastructure
  Promise and potential of in memory solutions
  Role of deep space skills vs standard models and templates?
  How quickly will the cultural shifts take place?

How much do you agree or disagree with the following statements?
Chart: responses on a scale from 5 (Totally agree) to 1 (Totally disagree), plus Unsure, for each of:
Developments in advanced storage, access and analytics can allow us to tackle problems today that were either too hard or too expensive to deal with in the past
Developments in advanced storage, access and analytics can allow us to take different and better approaches to tackling some key business requirements
Vendors and consulting firms are well geared up to providing us with the support and services we need to take advanced storage, access and analytics on board effectively

Thank You

About Freeform Dynamics
Mission: To make emerging ideas and technologies more accessible to mainstream organisations
  Cut through vendor promises and hype
  Decipher aspirational marketing aimed at early adopters
  Pick the brains of early movers and learn from their experience
  Distil out critical success factors, tips, tricks and traps
  Provide advice to the broader community in plain English
Mechanics
  Briefings with IT vendors and service providers
  Primary research - face to face, telephone and online
  Use of press and social media to get stuff out there