Big Data Definitions
• Big data refers to data sets over 30 terabytes (one terabyte is about a trillion bytes, or a thousand gigabytes) that are collected from traditional and digital sources both inside and outside a company

Gartner’s Definitions – The 3 V’s
• Variety
– Structured: identifiable data in a traditional database, usually with columns and rows, that can be easily read by a computer or by a human
– Unstructured: has no identifiable structure, like text documents, email, pictures, video, audio, tweets, stock ticker data and financial transactions
– Multi-structured: a mixture of both
• Volume
– How much data is coming in
– Examples include transaction-based data stored through the years or unstructured data streaming in from social media
• Velocity
– How fast the data is coming in
– Examples include RFID tags, sensors and smart metering, all of which produce huge amounts of data in real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.

Not part of Gartner’s definitions but also widely accepted
• Variability
– Data flows can be highly inconsistent, with periodic peaks
– Examples are when something is trending on social media, or daily, seasonal or event-triggered peaks
• Complexity
– Data comes from multiple sources, and linking, matching, cleansing and transforming it across systems is difficult. It is necessary to connect and correlate relationships, hierarchies and multiple data linkages so the data doesn’t get out of control.
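To make the size threshold in the opening definition concrete, here is a small sketch of the unit arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000; the names `UNITS`, `to_bytes` and `humanize` are illustrative and not part of the original slides.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000 (an assumption;
# binary units would use 1,024 instead).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


threshold = to_bytes(30, "TB")
print(threshold)                               # 30000000000000 bytes
print(to_bytes(1, "TB") // to_bytes(1, "GB"))  # 1000 gigabytes per terabyte
print(humanize(threshold))                     # 30 TB
```

The same helper shows that one exabyte is 10^18 bytes, which is the scale used for worldwide data totals.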

Companies involved
• Data storage, networking and hardware companies:
– ARM, Brocade, Cisco, Dell, EMC, HP, Intel, Lenovo, NetApp, Seagate
• Enterprise software companies:
– Adobe, Citrix Systems, IBM, Fujitsu, Informatica, Oracle, Red Hat, SAP, Salesforce.com

How much data
• Every day 2.5 quintillion (a billion billion) bytes of data are created
• More data was produced from 2010-2012 than in all of history before that
• 295 exabytes of digital data exist (as of Dec 2012); one exabyte is 10^18 bytes, so that is 295,000,000,000,000,000,000 bytes
• Every hour enough information is consumed by internet traffic to fill 7 million DVDs
• Every 18 months the sum of digital human knowledge doubles
• As of 2013, 90% of the world’s data was created within the past two years
– Only 20% is structured, meaning that it can be readily analyzed via the same tools that have been used for over four decades
– The remaining 80% of this newly created data is “unstructured” content stemming from sources such as Instagram photos, YouTube videos and social media posts

Trends in Big Data
• According to research collective Wikibon, big data is an $18 billion market in 2013, on its way to $50 billion in 2018
• Mobiles will provide a lot of the future’s data, including information from apps, GPS location, and other services running in the background
• Price discrimination: Orbitz was accused of charging more to Mac users; Netflix ran an experiment using big data on the users of Rottentomatoes, Wikipedia and Blockbuster.com to see what price the market would bear
• On-the-fly and continuous champion/challenger testing of offers and content on websites
• Cukier and Mayer-Schoenberger wrote a best-selling book on the subject, “Big Data: A Revolution That Will Transform How We Live, Work, and Think” (2013)

Trends in Big Data
• Social network analysis (SNA)
– The mapping and measurement of relationships and flows between people, groups, organizations or other actors
– A network is made up of nodes (points or hubs) and ties (lines connecting the points), which are analyzed together
– An example of SNA is Stanley Milgram’s six degrees of
separation in the 1960s
– Example: uncovering relationships between entities or customers in a large network, with the goal of identifying influential customer nodes
• Next-best offer (NBO)
– Customer-centric marketing paradigm that considers the different actions that can be taken for a specific customer and decides on the ‘best’ one
– This is an offer, proposition, service, etc. that is determined by the customer’s interests and needs on the one hand, and the marketing organization’s business objectives on the other
– Analytics estimates the probability that customers will be interested in a targeted offer
• True-lift modeling or uplift modeling
– Modeling to predict the influence on a customer's buying behavior that results from marketing contact
– If you are launching a marketing campaign, there's no sense in sending an offer to prospective customers who would have bought anyway, to people who will react negatively when contacted, or to those who are "lost causes." The key is to focus in on only those people who are "persuadable."
– A predictive modelling technique that directly models the incremental impact of a treatment (such as a direct marketing action) on an individual's behaviour
– Can be used for up-sell, cross-sell and retention modelling

Derived revenue from big data
• Amazon, Microsoft, Deloitte and Google are deriving 1% of their revenue from big data (2013)

Future of Big Data
• McKinsey calls it "the biggest game-changing opportunity for marketing and sales since the Internet went mainstream 20 years ago."
• By 2020 one-third of data will be stored in, or will have passed through, the cloud
• By 2020 IT departments will look after 10x more servers, 50x more data and 75x more files
• Cognitive computing (what Google is doing) is next for big data: being able to analyze data in the context of other consumer behaviours
• Mobile, cloud, social and big data to drive 90% of all growth in the IT market from 2013-2020 (Chartered Institute for
IT)
• The world’s digital information is expected to grow by 57%. Within that, internet traffic is growing by 35%, and mobile data traffic by 110% (Cisco, 2013)

Challenges
• Bandwidth issues
• Privacy
• Security

Future of Big Data 2
• Intelligent personalization
– Uses all the data that a marketer has at their disposal to optimize the content and the experience, including mobile device and regional optimization, real-time behavior, social signals, transactional data coming from an e-commerce system and much more
• Situational analytics
– The topic of “predictive analytics” is very hot today. But as any marketer knows, it’s near impossible to actually predict how any customer interaction is going to go. Instead, the smart integration of data is going to result in “situational analytics.” This means being able to look at data, plug in different hypothetical situations, and see which one has the better chance of actually succeeding.
• Level playing field
– One of the biggest effects of integrating smarter data into content experiences is that it levels the playing field with larger competitors who may have more resources to burn on advertising media

Techniques for Analyzing
• A/B testing
• Association rule learning: discovering relationships, or “association rules,” such as which basket of goods a consumer is likely to buy
• Cluster analysis: splitting a large group into smaller groups of similar members
• Crowdsourcing: collecting data submitted by a large group of people or a community

Big Data and Healthcare
• Nearly a quarter billion health and fitness apps will be downloaded by 2017, up from 156 million today, predicts iSuppli
• Sales of sports and fitness monitors, like heart-rate monitors and pedometers, will reach 56.2 million units in 2017.
Many of these will be on mobile phones, and they will increasingly connect to the Internet
• Wellpoint and New York’s Memorial Sloan-Kettering Cancer Center are creating Watson (IBM) apps that will answer cancer diagnosis and treatment questions from doctors, researchers, and insurance companies
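
Returning to true-lift (uplift) modeling from the trends slides: the idea of targeting only the "persuadable" customers can be sketched in its simplest form as the difference in purchase rate between contacted and uncontacted customers within each segment. This is a minimal illustration, not a full predictive model; the segment names, the counts and the helpers `make_group` and `uplift_by_segment` are all hypothetical.

```python
from collections import defaultdict


def make_group(segment, treated, buyers, total):
    """Build hypothetical records: (segment, was_contacted, bought)."""
    return [(segment, treated, i < buyers) for i in range(total)]


def uplift_by_segment(records):
    """Estimate uplift per segment as rate(contacted) - rate(not contacted)."""
    # counts[segment][treated] holds [number of buyers, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, bought in records:
        cell = counts[segment][treated]
        cell[0] += int(bought)
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_buy, t_n), (c_buy, c_n) = groups[True], groups[False]
        if t_n == 0 or c_n == 0:
            continue  # both a contacted and a control group are required
        uplift[segment] = t_buy / t_n - c_buy / c_n
    return uplift


# Made-up campaign data, 100 contacted and 100 control customers per segment.
records = (
    make_group("new_visitors", True, 40, 100)
    + make_group("new_visitors", False, 10, 100)
    + make_group("loyal", True, 80, 100)
    + make_group("loyal", False, 80, 100)
    + make_group("opted_out", True, 5, 100)
    + make_group("opted_out", False, 15, 100)
)

for segment, value in uplift_by_segment(records).items():
    print(f"{segment}: {value:+.2f}")
```

In this made-up data, only the first segment shows a clearly positive uplift and would count as persuadable; the second buys at the same rate either way, and the third buys less often when contacted.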