Big Data … Big Opportunities? … Big Hype? (or just a Big Mess?)
Data challenges and IBM views
Dr. Matthew Ganis
IBM Senior Technical Staff Member
CIO Social Media Analytics Chief Architect
Member, IBM Academy of Technology
ganis@us.ibm.com
@mattganis (twitter)

The term "Big Data" is pervasive, but it still provokes a bit of confusion. So what is it?

Big Data has been used to convey all sorts of concepts, including huge quantities of data, social media analytics, next-generation data management capabilities, real-time data and much more.

That means we create about 1.8 zettabytes of information every two years.

Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.

Information is at the Center of a New Wave of Opportunity…
• 44x as much data and content over the coming decade: from 800,000 petabytes in 2009 to 35 zettabytes in 2020
• 80% of the world's data is unstructured

… And Organizations Need Deeper Insights
• 1 in 3 business leaders frequently make decisions based on information they don't trust, or don't have
• 1 in 2 business leaders say they don't have access to the information they need to do their jobs
• 83% of CIOs cited "Business intelligence and analytics" as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions

Structured vs Unstructured
Structured data refers to information with a high degree of organization, such that inclusion in a relational database is seamless and the data are readily searchable by simple, straightforward search engine algorithms or other search operations; unstructured data is essentially the opposite. The lack of structure makes compiling unstructured data a time- and energy-consuming task.
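To make the structured vs unstructured contrast concrete, here is a minimal sketch, assuming an in-memory SQLite table and a few invented free-text notes. The table name, columns and sample records are hypothetical. The same question ("how much was spent on laptops?") is answered once with a plain SQL query and once by pattern-matching free text.

```python
import re
import sqlite3

# Structured: each fact lives in a named, typed column, so a simple query finds it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Alice", "laptop", 1200.0), ("Bob", "phone", 650.0), ("Carol", "laptop", 1150.0)],
)
structured_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE product = ?", ("laptop",)
).fetchone()[0]

# Unstructured: similar facts buried in free text (hypothetical examples).
notes = [
    "Alice said she paid $1200 for her new laptop and loves it.",
    "Bob mentioned the phone cost him $650.",
    "Carol: laptop arrived late, but $1150 was a fair price.",
]

# We must guess a pattern for prices; any phrasing it misses is silently lost.
price_pattern = re.compile(r"\$(\d+(?:\.\d+)?)")
unstructured_total = sum(
    float(m.group(1))
    for note in notes
    if "laptop" in note
    for m in price_pattern.finditer(note)
)

print(structured_total, unstructured_total)  # 2350.0 2350.0
```

Both totals agree here only because the notes happen to match the guessed pattern; the query relies on the schema, while the regex relies on how people choose to write, which is where the extra time and energy goes.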
The Challenge: Bring Together a Large Volume and Variety of Data to Find New Insights
• Multi-channel customer sentiment and experience analysis
• Detect life-threatening conditions at hospitals in time to intervene
• Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
• Make risk decisions based on real-time transactional data
• Identify criminals and threats from disparate video, audio, and data feeds

Where we want to go: Merging the Traditional and Big Data Approaches (Structured vs. Exploratory)

Traditional Approach: Structured & Repeatable Analysis
• Business users determine what question to ask
• IT structures the data to answer that question
• Examples: monthly sales reports, profitability analysis, customer surveys

Big Data Approach: Iterative & Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business users explore what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization

Where is all this data coming from?
The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction.
Approximately 2.7 billion users are on the Internet today.

Social Media as Big Data
• What are we running?
• Who is talking about us? Male / female / student / professional / retired / customers?
• What do they "feel"? Positive/negative sentiment? Angry? Annoyed?
• Where are they talking?
• Who are they influencing?
• Who's listening to them?
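One common baseline for the "what do they feel?" question is lexicon-based sentiment scoring. The sketch below is illustrative only, assuming a tiny hand-made word list: `LEXICON`, `NEGATIONS`, `sentiment_score` and the sample posts are hypothetical, and this is not presented as the method behind IBM's social media analytics.

```python
import re

# A tiny, hypothetical sentiment lexicon; real systems use far larger, curated ones.
LEXICON = {
    "love": 2, "great": 2, "good": 1, "fast": 1,
    "bad": -1, "slow": -1, "angry": -2, "annoyed": -2, "broken": -2,
}
NEGATIONS = {"not", "never", "no"}

def sentiment_score(text: str) -> int:
    """Sum lexicon weights; flip the sign of a word directly preceded by a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
    return score

def label(score: int) -> str:
    """Map a numeric score to a coarse positive/negative/neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "I love the new phone, setup was fast!",
    "Support was slow and I am annoyed.",
    "The battery is not bad.",
]
for post in posts:
    s = sentiment_score(post)
    print(f"{label(s):8} {s:+d}  {post}")
```

The three posts score +3, -3 and +1; the last one shows why even a toy scorer needs some handling of negation.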
When customers are talking about us or about our products, we want to know where those conversations are happening so we can:
• Interact with interested customers
• Get in front of any issues

Numerous studies show that consumers see word-of-mouth and personal recommendations as far more credible than newspaper and television advertisements. While such mass advertisements are still necessary because of their powerful reach, these findings show that companies need to increase their focus on more personalized approaches. Clearly, it is incredibly difficult, maybe even impossible, for most companies to deal directly with the countless potential consumers. This is where influencers come in…

What makes someone influential?
• The number of tweets they make?
• The number of times people mention them?
• The number of followers they have?
• How often they are retweeted?

We were asked to look at why a particular product launch wasn't performing as expected. We pulled all the "chatter" about it and found:
But there were people talking about it…

Some things to think about…

Where is all this data coming from?
While it is true that vast amounts of data are and will be generated by everything from financial transactions, medical records, mobile phones and social media to the Internet of Things, there are questions that need to be asked to understand data's meaningful use:
• How will data be managed?
• How will data be shared?

Some thoughts about "data as a service":
• Establishment of standards, governance and guidelines (e.g., open architectures)
• Creation of industry-specific data exchanges (e.g., healthcare data exchanges, environmental data exchanges)
• Creation of cross-industry data exchanges (e.g., healthcare data exchanges seamlessly interacting with environmental data exchanges)
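Returning to the question of what makes someone influential: the four candidate measures (tweets posted, mentions received, followers, retweets) can disagree with each other. This sketch, assuming hypothetical users, follower counts and tweets, computes each measure side by side. The `Tweet` class and `influence_metrics` function are illustrative helpers, not part of any Twitter API.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tweet:
    author: str
    text: str
    retweet_of: Optional[str] = None  # author of the original tweet, if this is a retweet

# Hypothetical follower counts and tweets, for illustration only.
followers = {"ann": 12000, "bo": 300, "cy": 45}
tweets = [
    Tweet("ann", "Launch day! Try the new app"),
    Tweet("bo", "RT @ann Launch day! Try the new app", retweet_of="ann"),
    Tweet("cy", "RT @ann Launch day! Try the new app", retweet_of="ann"),
    Tweet("bo", "Is the app any good?"),
    Tweet("bo", "Still thinking about the app"),
    Tweet("bo", "@cy did the app work for you?"),
    Tweet("ann", "Thanks @cy for the early review"),
    Tweet("cy", "@bo yes, it works well"),
]

def influence_metrics(tweets, followers):
    """Return the four candidate influence measures for every user seen."""
    originals = [t for t in tweets if t.retweet_of is None]
    posted = Counter(t.author for t in originals)
    retweeted = Counter(t.retweet_of for t in tweets if t.retweet_of)
    # Count @mentions in original tweets only, so retweet text is not double counted.
    mentioned = Counter(name for t in originals for name in re.findall(r"@(\w+)", t.text))
    users = set(followers) | set(posted) | set(mentioned) | set(retweeted)
    return {
        u: {
            "tweets": posted[u],
            "mentions": mentioned[u],
            "followers": followers.get(u, 0),
            "retweets": retweeted[u],
        }
        for u in users
    }

metrics = influence_metrics(tweets, followers)
leaders = {
    key: max(metrics, key=lambda u: metrics[u][key])
    for key in ("tweets", "mentions", "followers", "retweets")
}
print(leaders)  # {'tweets': 'bo', 'mentions': 'cy', 'followers': 'ann', 'retweets': 'ann'}
```

The most prolific poster, the most mentioned user and the most followed user are three different people here, so which measure (or weighted mix) defines "influence" is a modeling decision rather than something the data settles on its own.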
Enterprise Integration
(Diagram: Traditional Sources and New Sources feed, through enterprise integration, both a Data Warehouse and a Big Data Platform.)
• Trusted Information & Governance: companies need to govern what comes in, and the insights that come out
• Data Management: insights from Big Data must be incorporated into the warehouse

• Poor data quality
• Dirty data
• Missing values
• Inadequate data size
• Poor representation in data sampling
• Data variety: trying to accommodate data that comes from different sources and in a variety of different forms (images, geo data, text, social, numeric, etc.). How do we link them together? Is there a common taxonomy or way to organize it? Is there a "signal" in one source of data that points to another?
• Dealing with huge datasets, or "Big Data," that require distributed approaches
• Who is influential? How do we define influence?

Thank you for your attention

The Big Data Opportunity
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
• Variety: manage the complexity of multiple relational and non-relational data types and schemas
• Velocity: streaming data and large-volume data movement
• Volume: scale from terabytes to zettabytes (1 billion TBs)

Big Data: why is it possible now?
Traditional approach: Data to Function
(Diagram: a user request goes to the application server, which queries the database server; the data travel back over the network and are processed on the application server, which then returns the result.)
• Application server and database server are separate
• Data can be on multiple servers
• The analysis program can run on multiple application servers
• The network is still in the middle: data have to go through the network

Big Data approach: Function to Data
(Diagram: a user request goes to the master node, which sends the function to the data nodes; each data node queries and processes its local data, and the master node consolidates the results.)
• The analysis program runs where the data are: on the data nodes
• Only the analysis program has to go through the network
• The analysis program needs to be MapReduce-aware
• Highly scalable: 1000s of nodes, petabytes and more

What Big Data Is Not
• It is not a replacement for your database strategy
• It is not a replacement for your warehouse strategy
• It is not a solution by itself; it needs jobs/applications to drive value
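The "function to data" idea can be sketched in plain Python: a map step that each data node runs on its own partition, and a reduce step the master node uses to consolidate the partial results. This is a single-process simulation under stated assumptions: the `data_nodes` partitions and the word-count task are hypothetical, the shuffle is collapsed into a single reducer on the master for simplicity, and a real framework such as Hadoop MapReduce would add distribution, shuffling and fault tolerance.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each list stands for the data stored on one data node.
data_nodes = {
    "node1": ["big data big opportunity", "big hype"],
    "node2": ["data to function", "function to data"],
    "node3": ["big mess or big data"],
}

def map_phase(lines):
    """Runs on a data node: build a partial word count from its local lines only."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_phase(partials):
    """Runs on the master node: merge the partial counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())

# "Send the function to the data": each node applies map_phase locally, and only
# the small partial Counters (not the raw lines) travel back to the master.
partials = [map_phase(lines) for lines in data_nodes.values()]
result = reduce_phase(partials)
print(result.most_common(2))  # [('big', 5), ('data', 4)]
```

Because the raw lines never leave their node, the network carries only the function and the compact partial results, which is the scalability argument the slide makes.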