Big Data
…Big Opportunities ?
……Big Hype ?
(or just a Big Mess ?)
Data challenges and IBM views
Dr. Matthew Ganis
IBM Senior Technical Staff Member
CIO Social Media Analytics Chief Architect
Member, IBM Academy of Technology
[email protected]
@mattganis (twitter)
The term “Big Data” is pervasive, but it still provokes a bit of confusion.
So what is it ?
Big Data has been used to convey all sorts of concepts, including huge
quantities of data, social media analytics, next-generation data management
capabilities, real-time data and much, much more...
That means we create about
1.8 zettabytes of information every
two years.
Extracting insight from an immense volume, variety and velocity of
data, in context, beyond what was previously possible.
Information is at the Center
of a New Wave of Opportunity…
44x as much data and content over the coming decade
2009: 800,000 petabytes
2020: 35 zettabytes
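As a quick check on the 44x growth figure, the arithmetic can be sketched in Python. This assumes decimal (SI) units, where 1 petabyte is 10^15 bytes and 1 zettabyte is 10^21 bytes; the function and variable names are made up for illustration.

```python
# Decimal (SI) byte units; binary units (PiB, ZiB) would give slightly different numbers.
PETABYTE = 10**15
ZETTABYTE = 10**21


def growth_factor(start_bytes: int, end_bytes: int) -> float:
    """Return how many times larger end_bytes is than start_bytes."""
    return end_bytes / start_bytes


data_2009 = 800_000 * PETABYTE  # 800,000 petabytes, i.e. 0.8 zettabytes
data_2020 = 35 * ZETTABYTE      # 35 zettabytes

factor = growth_factor(data_2009, data_2020)
print(f"2009 volume: {data_2009 / ZETTABYTE} ZB")
print(f"Growth 2009 -> 2020: about {round(factor)}x")
```

The ratio comes out just under 44, which matches the rounded figure on the slide.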
80%
of the world’s data
is unstructured
… And Organizations Need
Deeper Insights
1 in 3
Business leaders frequently make
decisions based on information they
don’t trust, or don’t have
1 in 2
Business leaders say they don’t have
access to the information they need
to do their jobs
83%
of CIOs cited “Business
intelligence and analytics” as part
of their visionary plans
to enhance competitiveness
60%
of CEOs need to do a better job
capturing and understanding
information rapidly in order to make
swift business decisions
Structured vs Unstructured
Structured data refers to information with a high degree of
organization, such that inclusion in a relational database is
seamless and readily searchable by simple, straightforward
search engine algorithms or other search operations;
whereas unstructured data is essentially the opposite.
The lack of structure makes compilation a time and
energy-consuming task.
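To make the contrast concrete, here is a minimal Python sketch. The table, the column names and the sample comment are invented for illustration. The structured record is answered with one SQL query, while the free-text comment needs a hand-written pattern before the same fact can be pulled out.

```python
import re
import sqlite3

# Structured: the data already has named columns, so a simple query is enough.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, quantity INTEGER)")
conn.execute("INSERT INTO orders VALUES ('Ann', 'laptop', 2)")
structured_qty = conn.execute(
    "SELECT quantity FROM orders WHERE customer = 'Ann'"
).fetchone()[0]
conn.close()

# Unstructured: the same fact is buried in free text (hypothetical comment).
comment = "Hi, this is Ann. I ended up buying 2 laptops last week, love them!"
match = re.search(r"buying (\d+) (\w+)", comment)
unstructured_qty = int(match.group(1)) if match else None

print(structured_qty, unstructured_qty)
```

The pattern only works for this one phrasing. A customer who writes "picked up a couple of laptops" would be missed, which is the compilation effort the slide refers to.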
The Challenge: Bring Together a Large Volume and Variety of Data
to Find New Insights
Multi-channel customer sentiment
and experience analysis
Detect life-threatening
conditions at hospitals in time
to intervene
Predict weather patterns to plan
optimal wind turbine usage, and
optimize capital expenditure on
asset placement
Make risk decisions based on real-time transactional data
Identify criminals and threats from
disparate video, audio, and data
feeds
Where we want to go
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
Business Users: Determine what question to ask
IT: Structures the data to answer that question
Examples: Monthly sales reports, Profitability analysis, Customer surveys
Big Data Approach: Iterative & Exploratory Analysis
IT: Delivers a platform to enable creative discovery
Business Users: Explore what questions could be asked
Examples: Brand sentiment, Product strategy, Maximum asset utilization
Where is all this data coming from ?
The Internet of Things (IoT) is a scenario in
which objects, animals or people are provided
with unique identifiers and the ability to
automatically transfer data over a network
without requiring human-to-human or human-to-computer interaction
Where is all this data coming from ?
Approximately 2.7 billion users
on the Internet today
Social Media as Big Data
What are we running ?
Who is talking about us ?
Male / Female / Student / Professional / Retired / Customers ?
What do they “feel” ?
Positive/Negative Sentiment / Angry / Annoyed ?
Where are they talking ?
Who are they influencing ?
Who’s listening to them ?
When customers are talking about us or about our products we want
to know where those conversations are happening so we can:
•Interact with interested customers
•Get in front of any issues
Numerous studies show that word-of-mouth and personal recommendations
are seen as far more credible to consumers than newspaper and television
advertisements. While such mass advertisements are still necessary because
of their powerful reach, these findings show that companies need to increase
their focus on more personalized approaches. Clearly, it is incredibly difficult,
maybe even impossible, for most companies to deal directly with the countless
potential consumers. This is where influencers come in…
What makes someone Influential ?
The number of tweets they make ?
The number of times people mention them ?
The number of followers they have?
How often they are retweeted ?
We were asked to look at why a particular product launch wasn’t performing
as expected. We pulled all the “chatter” about it and found:
But there were people talking about it…..
Some things to think about…..
Where is all this data coming from ?
While vast amounts of data are and will be generated by everything from
financial transactions, medical records, mobile phones and social media to the
Internet of Things, there are questions that need to be asked to understand
data’s meaningful use:
• How will data be managed?
• How will data be shared?
Some thoughts about “data as a service”
•Establishment of standards, governance, guidelines. (E.g., open architectures)
•Creation of industry specific data exchanges. (E.g., healthcare data
exchanges, environment data exchanges etc.)
•Creation of cross-industry data exchanges. (E.g., healthcare data exchanges
seamlessly interacting with environmental data exchanges etc.)
Enterprise Integration
[Diagram: Traditional Sources and New Sources; Data Warehouse and Big Data Platform; Enterprise Integration]
• Trusted Information & Governance
– Companies need to govern what comes in, and the insights that come out
• Data Management
– Insights from Big Data must be incorporated into the warehouse
Poor data quality
Dirty data
Missing values
Inadequate data size
Poor representation in data sampling
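A few of these problems can be checked mechanically before any analysis starts. The sketch below uses only the Python standard library. The sample records, the field names and the `quality_report` function are hypothetical, and real data would need checks tailored to its own fields.

```python
from collections import Counter


def quality_report(records, required_fields, group_field):
    """Summarise missing values, exact duplicates and group balance."""
    missing = Counter()
    for rec in records:
        for field in required_fields:
            if rec.get(field) in (None, ""):
                missing[field] += 1
    # Exact duplicate rows are one simple form of dirty data.
    seen = Counter(tuple(sorted(rec.items())) for rec in records)
    duplicates = sum(count - 1 for count in seen.values())
    # Group counts give a first hint of poor representation in the sample.
    groups = Counter(rec.get(group_field) for rec in records)
    return {
        "rows": len(records),
        "missing": dict(missing),
        "duplicates": duplicates,
        "groups": dict(groups),
    }


records = [
    {"id": 1, "age": 34, "segment": "student"},
    {"id": 2, "age": None, "segment": "professional"},
    {"id": 2, "age": None, "segment": "professional"},
    {"id": 3, "age": 51, "segment": "professional"},
]
report = quality_report(records, ["id", "age", "segment"], "segment")
print(report)
```

A report like this does not fix anything by itself, but it shows how small or skewed a dataset is before conclusions are drawn from it.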
Data variety - trying to accommodate data that comes from different sources and in a
variety of different forms (images, geo data, text, social, numeric, etc.).
How do we link them together ?
Is there a common taxonomy or way to organize it ?
Is there a “signal” in one source of data that points to another ?
Dealing with huge datasets, or 'Big Data,' that require distributed approaches.
Who is influential ?
How do we define influence ?
Thank you for your attention
The Big Data Opportunity
Extracting insight from an immense volume, variety and velocity of
data, in context, beyond what was previously possible.
Variety: Manage the complexity of multiple relational and non-relational data types and schemas
Velocity: Streaming data and large-volume data movement
Volume: Scale from terabytes to zettabytes (1B TBs)
Big Data : why is it possible now ?
• Traditional approach : Data to Function
Application server and Database server are separate
Data can be on multiple servers
Analysis Program can run on multiple Application servers
The network is still in the middle
Data has to go through the network
[Diagram: User request, Application server, Query Data, Database server, return Data, process Data, Send result]
• Big Data approach : Function to Data
[Diagram: User request, Master node, Send Function to process on Data, Data nodes (Query & process Data), Send Consolidate result]
• Big Data approach
– The Analysis Program runs where the data is : on the Data Node
– Only the Analysis Program has to go through the network
– The Analysis Program needs to be MapReduce-aware
– Highly scalable : 1000s of nodes, petabytes and more
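The "function to data" idea can be sketched in a few lines of Python. This is a single-process simulation, not Hadoop or any real MapReduce framework: each inner list in `data_nodes` stands in for the data held on one node, and all names and sample posts are hypothetical. It illustrates that the map function is what gets sent to each node, and only small partial results travel back to the master.

```python
from collections import Counter
from functools import reduce

# Each inner list stands in for the data stored on one data node (made-up posts).
data_nodes = [
    ["big data is big", "data everywhere"],
    ["social data", "big hype"],
    ["big mess"],
]


def map_word_counts(local_lines):
    """Runs 'on the node': counts words in that node's local data only."""
    counts = Counter()
    for line in local_lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Runs 'on the master': combines two partial results."""
    return left + right


# The master sends the function to every node and collects the partial counts.
partials = [map_word_counts(lines) for lines in data_nodes]
total = reduce(merge_counts, partials, Counter())
print(total.most_common(2))
```

In a real cluster the list comprehension would be replaced by jobs running in parallel on separate machines, but the shape is the same: a map step next to the data, then a reduce step that consolidates the results.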
What Big Data Is Not
It is not a replacement for your Database strategy
It is not a replacement for your Warehouse strategy
It is not a solution by itself, it needs jobs/applications
to drive value