Big Data - Notes for 4 Life 2.0

Big Data Definitions
• Big data refers to data sets that are over 30 terabytes
(one terabyte is ~a trillion bytes, or a thousand gigabytes) that are
collected from traditional and digital sources both
inside and outside a company
Gartner’s Definitions – The 3 V’s
• Variety
– Structured: identifiable data in a traditional database, usually with columns and rows,
that can be easily read by a computer or by a human
– Unstructured: has no identifiable structure, like text documents, email, pictures,
video, audio, tweets, stock ticker data and financial transactions
– Multi-structured: a mixture of both
• Volume
– How much data is coming in
– Examples include transaction-based data stored through the years or unstructured
data streaming in from social media.
• Velocity
– How fast the data is coming in
– Sources such as RFID tags, sensors and smart meters all produce huge amounts of data
in real time. Reacting quickly enough to deal with data velocity is a challenge for most
organizations.
Not part of Gartner’s definitions but also widely accepted
• Variability
– Data flows can be highly inconsistent, with periodic peaks
– Examples: a topic trending on social media, or daily, seasonal or event-triggered
peaks
• Complexity
– Data comes from multiple sources, and linking, matching, cleansing and transforming
it across systems is complex. Relationships, hierarchies and multiple data linkages
must be connected and correlated so the data doesn’t get out of
control.
Companies involved
• Data storage, networking and hardware companies:
– ARM, Brocade, Cisco, Dell, EMC, HP, Intel, Lenovo,
NetApp, Seagate
• Enterprise software companies:
– Adobe, Citrix Systems, IBM, Fujitsu, Informatica,
Oracle, Red Hat, SAP, Salesforce.com
How much data
•Every day 2.5 quintillion (billion billion) bytes of data are created
•More data was produced from 2010-2012 than in all of history
•295 exabytes of digital data (1 exabyte = 10^18 bytes, so 295,000,000,000,000,000,000 bytes)
exist (as of Dec 2012)
•Every hour enough information is consumed by internet traffic to fill 7
million DVDs
•Every 18 months the sum of digital human knowledge doubles source
•As of 2013, 90% of the world’s data was created within the past two years
–Only 20% is structured, meaning that it can be readily analyzed with the same tools
that have been used for over four decades
–The remaining 80% of this newly created data is “unstructured” content stemming
from sources such as Instagram photos, YouTube videos and social media posts source
source
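The volume figures above mix units (quintillions, exabytes, doubling periods). A small sketch of the arithmetic makes them comparable; it assumes decimal (SI) units, and the "days to create" comparison is purely illustrative because the daily rate and the 295-exabyte total come from different sources and years.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Assumption: decimal (SI) units, so 1 exabyte = 10**18 bytes.
EXABYTE = 10**18
QUINTILLION = 10**18  # "billion billion" on the short scale

bytes_per_day = 25 * QUINTILLION // 10   # 2.5 quintillion bytes created per day
stock_2012 = 295 * EXABYTE               # 295 exabytes of stored digital data

def to_exabytes(n_bytes: int) -> float:
    """Convert a byte count to exabytes."""
    return n_bytes / EXABYTE

def days_to_create(total_bytes: int, daily_bytes: int) -> float:
    """Days of data creation at a constant daily rate needed to match a total."""
    return total_bytes / daily_bytes

def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Growth multiple after `months` if the total doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

print(to_exabytes(bytes_per_day))                 # 2.5 exabytes per day
print(days_to_create(stock_2012, bytes_per_day))  # 118.0 days
print(growth_factor(36))                          # 4.0 (two doublings in 3 years)
```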
Trends in Big Data
• According to research collective Wikibon, big data was an
$18 billion market in 2013, on its way to $50 billion in 2018
source
• Mobiles will provide a lot of the future’s data, including
information from apps, GPS location, and other services
running in the background
• Price discrimination: Orbitz was accused of charging Mac users
more, and Netflix ran a big data experiment on the users of
Rotten Tomatoes, Wikipedia and Blockbuster.com to see what
price the market would bear
• On-the-fly and continuous champion/challenger testing of
offers and content on websites
• Cukier and Mayer-Schoenberger wrote the best-selling book Big Data: A Revolution
That Will Transform How We Live, Work, and Think (2013)
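The Wikibon forecast ($18 billion in 2013 to $50 billion in 2018) implies a compound annual growth rate that can be checked directly. This is only arithmetic on the quoted figures, not an independent forecast.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Wikibon figures quoted above: $18B in 2013 -> $50B in 2018 (5 years).
rate = cagr(18e9, 50e9, 2018 - 2013)
print(f"{rate:.1%}")  # 22.7% per year
```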
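Continuous champion/challenger testing sends most visitors to the current best offer (the champion) and a small share to a new candidate (the challenger), promoting the challenger once it proves better. A minimal sketch, assuming a hypothetical 10% challenger share, hash-based assignment so a visitor always sees the same variant, and a simple "more visits and higher conversion rate" promotion rule; a production system would usually also require a statistically significant difference before promoting.

```python
import hashlib

def assign_arm(visitor_id: str, challenger_share: float = 0.1) -> str:
    """Route a visitor to 'champion' or 'challenger' by hashing their ID, so the
    same visitor always sees the same variant across visits."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"

def next_champion(stats: dict, min_visits: int = 1000) -> str:
    """Decide which arm should be the champion going forward.
    `stats` maps arm name -> (visits, conversions). The challenger is promoted only
    once it has at least `min_visits` visits and a higher conversion rate."""
    ch_visits, ch_conv = stats["challenger"]
    cp_visits, cp_conv = stats["champion"]
    if ch_visits < min_visits or cp_visits == 0:
        return "champion"
    return "challenger" if ch_conv / ch_visits > cp_conv / cp_visits else "champion"

# Hypothetical results after a test period.
stats = {"champion": (9000, 270), "challenger": (1000, 45)}
print(next_champion(stats))  # challenger (4.5% vs 3.0% conversion)
```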
Trends in Big Data
• Social network analysis (SNA)
– The mapping and measurement of relationships and flows between people, groups,
organizations or other actors source
– A network is made up of nodes (points or hubs) and ties (lines connecting the points),
which are analyzed together
– An early example of SNA is Stanley Milgram’s “six degrees of separation” small-world
experiment in the 1960s
– Example: uncovering relationships between entities or customers in a large network with the
goal of identifying the most influential customers (nodes)
• Next-best offer (NBO)
– Customer-centric marketing paradigm that considers the different actions that can be taken
for a specific customer and decides on the ‘best’ one
– This is an offer, proposition, service, etc. that is determined by the customer’s interests and
needs on the one hand, and the marketing organization’s business objectives on the other
– Analytics estimates the probability that customers will be interested in a targeted offer
• True-lift modeling or uplift modeling
– Modeling to predict the influence on a customer's buying behavior that results from
marketing contact
– If you are launching a marketing campaign, there's no sense in sending an offer to prospective
customers who would have bought anyway, to people who will react negatively when
contacted, or to those who are "lost causes." The key is to focus only on the people who
are "persuadable." source
– A predictive modelling technique that directly models the incremental impact of a treatment
(such as a direct marketing action) on an individual's behaviour
– Can be used for up-sell, cross-sell and retention modelling
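For social network analysis, one simple way to find "influencing nodes" is degree centrality: the share of other nodes a node is directly tied to. The sketch below computes it from a list of ties; the customer names and ties are hypothetical, and degree is only one of several centrality measures (libraries such as NetworkX provide this as `networkx.degree_centrality`, alongside betweenness and eigenvector centrality).

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected network given as (a, b) ties.
    Each node's number of distinct neighbours is divided by n - 1, the maximum possible."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

# Hypothetical customer network: a tie means "referred or interacted with".
ties = [("ann", "bob"), ("ann", "cara"), ("ann", "dev"), ("bob", "cara"), ("dev", "eve")]
scores = degree_centrality(ties)
most_influential = max(scores, key=scores.get)
print(most_influential, scores[most_influential])  # ann 0.75
```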
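The next-best-offer idea (the customer's likely interest on one hand, the business's objectives on the other) can be sketched as choosing the eligible offer with the highest expected value: the model's acceptance probability times the value of the offer to the company. The `Offer` fields, the probabilities and the values below are hypothetical; real NBO systems often add constraints such as contact frequency or inventory.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    name: str
    accept_probability: float  # model's estimate that this customer accepts the offer
    value: float               # value to the business if the offer is accepted
    eligible: bool = True      # business rules, e.g. stock, budget, contact policy

def next_best_offer(offers):
    """Return the eligible offer with the highest expected value
    (acceptance probability x value), or None if no offer qualifies."""
    candidates = [o for o in offers if o.eligible]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o.accept_probability * o.value)

# Hypothetical scores for one customer.
offers = [
    Offer("premium upgrade", 0.05, 120.0),   # expected value ~6.0
    Offer("accessory bundle", 0.30, 25.0),   # expected value 7.5
    Offer("loyalty discount", 0.60, 10.0, eligible=False),
]
print(next_best_offer(offers).name)  # accessory bundle
```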
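Uplift modeling targets the incremental effect of contact, not the raw chance of buying. The simplest estimate, shown below, compares purchase rates between randomly contacted and held-out customers within each segment; the segment names mirror the groups described above, and all counts are hypothetical. Dedicated uplift methods (two-model approaches, uplift trees) refine this per individual rather than per segment.

```python
from collections import defaultdict

def segment_uplift(records):
    """Estimate uplift per segment from a randomized campaign.
    `records` are (segment, was_contacted, bought) tuples. Uplift is the purchase
    rate among contacted customers minus the rate in the uncontacted control group."""
    # segment -> {contacted?: [customers, buyers]}
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, contacted, bought in records:
        cell = counts[segment][bool(contacted)]
        cell[0] += 1
        cell[1] += int(bought)
    uplift = {}
    for segment, groups in counts.items():
        (n_t, b_t), (n_c, b_c) = groups[True], groups[False]
        if n_t and n_c:  # a rate difference needs both groups
            uplift[segment] = b_t / n_t - b_c / n_c
    return uplift

def make_records(segment, contacted, customers, buyers):
    """Hypothetical helper: expand aggregate counts into per-customer records."""
    return [(segment, contacted, i < buyers) for i in range(customers)]

records = (
    make_records("sure things", True, 100, 80) + make_records("sure things", False, 100, 80)
    + make_records("persuadables", True, 100, 50) + make_records("persuadables", False, 100, 10)
    + make_records("lost causes", True, 100, 0) + make_records("lost causes", False, 100, 0)
    + make_records("do-not-disturbs", True, 100, 10) + make_records("do-not-disturbs", False, 100, 30)
)
uplift = segment_uplift(records)
targets = [segment for segment, lift in uplift.items() if lift > 0]
print(targets)  # ['persuadables']
```

Only the segment whose behaviour actually changes when contacted is selected; the "sure things" and "lost causes" show zero uplift and the "do-not-disturbs" show negative uplift.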
Derived revenue from big data
• Amazon, Microsoft, Deloitte and Google derive
1% of their revenue from big data (2013)
source
Future of Big Data
•McKinsey calls it "the biggest game-changing opportunity for marketing and sales
since the Internet went mainstream 20 years ago." source
•By 2020 one-third of data will be stored or will have passed through the cloud
•By 2020 IT departments will look after 10x more servers, 50x more data and 75x
more files
•Cognitive computing (what Google is doing) is the next step for big data: analyzing
data in the context of other consumer behaviours
•Mobile, cloud, social and big data to drive 90% of all growth in the IT market from
2013-2020 (Chartered Institute for IT)
•The world’s digital information is expected to grow by 57%. Within that, internet
traffic is growing by 35% and mobile data traffic by 110% (Cisco, 2013)
Challenges:
•Bandwidth issues
•Privacy
•Security
Source source
Future of Big Data 2
•Intelligent personalization
–Uses all the data that a marketer has at their disposal to optimize the content and
the experience, drawing on signals such as the visitor’s mobile device, region,
real-time behavior, social signals, transactional data from an e-commerce system
and much more
•Situational analytics
–The topic of “predictive analytics” is very hot today. But as any marketer knows, it’s nearly impossible to
actually predict how any customer interaction is going to go. Instead, the smart integration of data is
going to result in “situational analytics.” This means being able to look at data, plug in different
hypothetical situations, and see which one has the best chance of actually succeeding.
•Level playing field
–One of the biggest evolutions of integrating smarter data into content experiences is that it levels the
playing field with larger competitors who may have more resources to burn on advertising media
source
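Situational analytics, as described above, amounts to "what-if" scoring: plug several hypothetical situations into a model and compare their predicted chances of success. The sketch below uses a hypothetical logistic conversion model whose weights are invented for illustration; in practice they would be fitted to historical data, and the ranking is only as good as that model.

```python
import math

def conversion_probability(discount_pct, is_mobile, hour, weights):
    """Hypothetical logistic model of conversion. The weights are illustrative
    assumptions, not fitted values."""
    z = (weights["bias"]
         + weights["discount"] * discount_pct
         + weights["mobile"] * (1 if is_mobile else 0)
         + weights["evening"] * (1 if 18 <= hour < 23 else 0))
    return 1 / (1 + math.exp(-z))

WEIGHTS = {"bias": -3.0, "discount": 0.08, "mobile": -0.4, "evening": 0.5}

# Hypothetical situations a marketer might want to compare.
scenarios = {
    "10% off, desktop, morning": dict(discount_pct=10, is_mobile=False, hour=9),
    "10% off, mobile, evening": dict(discount_pct=10, is_mobile=True, hour=20),
    "20% off, mobile, morning": dict(discount_pct=20, is_mobile=True, hour=9),
}

# Rank the situations by predicted chance of success.
ranked = sorted(
    scenarios,
    key=lambda name: conversion_probability(**scenarios[name], weights=WEIGHTS),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {conversion_probability(**scenarios[name], weights=WEIGHTS):.1%}")
```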
Techniques for Analyzing
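• A/B testing: comparing a control version with one or
more variants to see which performs better
• Association rule learning: discovering relationships
(“association rules”) between variables, such as which
goods a consumer tends to buy together in one basket
• Cluster analysis: Splitting a large group into
smaller groups of similar members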
• Crowdsourcing: Collecting data submitted by
a large group of people or a community
Source
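An A/B test usually ends with a check that the difference between variants is larger than chance would explain. A minimal sketch using a two-sided two-proportion z-test on conversion counts; the visitor and conversion numbers are hypothetical, and other tests (chi-squared, Bayesian comparisons) are equally common choices.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between the conversion rates of variants A and B.
    Returns (z, p_value), using the pooled proportion for the standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no variation at all (every visitor converted, or none did)
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical test: 5,000 visitors per variant, 4.0% vs 5.2% conversion.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here z is roughly 2.86 and p is below 0.01, so variant B's higher conversion rate is unlikely to be a fluke at the usual 5% level.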
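Association rule learning scores rules such as "customers who buy X also buy Y" with three standard measures: support (how often X and Y appear together), confidence (how often Y appears given X) and lift (confidence relative to Y's baseline popularity; above 1 means a positive association). The sketch below brute-forces one-item-to-one-item rules over hypothetical baskets; algorithms such as Apriori or FP-Growth scale the same idea to larger item sets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.2):
    """Mine one-item rules X -> Y from market baskets.
    support = share of baskets containing both X and Y
    confidence = support(X and Y) / support(X)
    lift = confidence / support(Y)"""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # score the rule in both directions
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]
rules = pair_rules(baskets, min_support=0.4)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```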
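Cluster analysis splits a large group into smaller groups of similar members. A common method is k-means (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The sketch below works on 2-D points, seeds the centroids with the first k points for simplicity (k-means++ seeding is the usual choice), and uses hypothetical customers described by annual visits and average spend.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means on 2-D points. Initial centroids are the first k points,
    a simplification for illustration."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per year, average spend).
customers = [(1, 20), (30, 200), (2, 25), (28, 210), (3, 18), (32, 190)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)  # [(2.0, 21.0), (30.0, 200.0)]
```

With real data, features on very different scales (such as visits versus spend) are usually standardized first so that one feature does not dominate the distance.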
Big Data and Healthcare
•Nearly a quarter billion health and fitness apps will be
downloaded by 2017, up from 156 million today, predicts
iSuppli
•Sales of sports and fitness monitors, like heart-rate monitors
and pedometers, will reach 56.2 million units in 2017. Many
of these will be on mobile phones, and they will increasingly
connect to the Internet
•WellPoint and New York’s Memorial Sloan-Kettering Cancer
Center are creating IBM Watson apps that will answer
cancer diagnosis and treatment questions from doctors,
researchers, and insurance companies
source