big data

Big Data in Context
A practical, real-world view
BCS Event, Leeds
November 2012
Dale Vile
CEO & Research Director
Twitter: dale_vile
Blog: researchbeat.com
Freeform Dynamics Ltd
www.freeformdynamics.com
Copyright 2012 Freeform Dynamics Ltd
The term ‘big data’ is currently being over-hyped
by IT vendors in an unhelpful way
[Chart: survey responses for Sep 2012 and Sep 2011, rated from 1 (Totally disagree) to 5 (Totally agree), plus Unsure]
Some up front statements
‘Big Data’ is a bandwagon
But some genuinely new and interesting
stuff is going on behind the hype
Maturity remains an issue, and lots of
challenges exist
The new doesn’t (usually) replace the old
It’s important to keep things in context
Topics
What’s the problem we are trying to solve?
What, exactly, constitutes ‘big data’?
Hadoop as an example of a big data solution
How does big data change the way we think?
Some common use cases
The broader technology picture
Frequently encountered challenges
Looking to the future
The problem (and opportunity) in a nutshell
[Chart: two panels comparing structured data (e.g. tabular data in RDBMSs) with unstructured data (e.g. documents, messages, multimedia, etc)]
How much growth? Rated from 1 (No growth) to 5 (Extremely high growth)
How well do you exploit? Rated from 1 (Very poorly exploit) to 5 (Fully exploit)
And that data just keeps on coming
In the words of survey respondents…
Increased transaction rates
New business paradigms, especially the moving of revenue streams online
CRM & social networking
Movement away from paper to electronic documents
Audit stipulation
The shift from above the line advertising spend to direct marketing
Regulation & compliance
More affordable technology is available to store and analyse data
Cheap storage
Everything is bigger, faster, cheaper
High demand for immediate access to more and more data
Digital video archiving
Increased signalling traffic in telecoms networks
Business demand for better knowledge and insight
Desire for reporting over longer time periods, with higher levels of drill down.
Storage costs drop and processing power increases; formerly impossible applications morph into expensive ones, which eventually become mainstream
Greater use of ecommerce methods for supply management
Stashing data that we used to archive to take advantage of future technologies
Predictive analytics
Better and more widespread sensors
Fear of 'throwing away'
Duplicate copies of data for BI and data mining.
Poorly designed systems with inefficient storage and no archive functions
No desire from Business to archive data
Ever more detailed (higher resolution) survey data.
Digital imagery, Webex logging, email.
Same information stored in many places (mail, file server, SharePoint, ...)
Vast number of emails with client presentations attached
Increased use of digital cameras for data capture
*&%!* SharePoint
Smart meters
Increasing availability of external data which may or may not be highly relevant
So what constitutes big data?
The 3 V’s: Volume, Variety, Velocity
More V’s: Voracity, Value, Value-Density
A practical view
[Diagram: a grid with one axis running from HIGHLY STRUCTURED to HIGHLY UNSTRUCTURED and the other from LOW VALUE DENSITY to HIGH VALUE DENSITY, with a region marked BIG DATA. Data types placed on the grid:]
M2M feeds, web activity logs, ticker data, etc.
ERP, CRM, SCM & other transaction data
Social media, news feeds, harvested web content, etc.
Document repositories, message stores, etc.
Need for a different architectural approach
SCALE UP (e.g. high performance RDBMS cluster)
Powerful CPUs
Lots of cores
Huge memory
Expensive disk
Expensive SW
SCALE OUT
Distributed Commodity Hardware
Open Source Software
 Parallel processing
 Principle of divide and conquer
 Distribute data into small chunks
 Execute lots of little tasks close
to the data, then merge results
 Breaks traditional conventions
MapReduce
HDFS
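The divide and conquer principle above can be illustrated with a small word count in the MapReduce style. This is only a sketch, not Hadoop code: the function names and the sample log lines are invented for illustration, and local threads stand in for the cluster nodes that would each hold a chunk of the data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Distribute the data into small chunks (stand-ins for blocks on different nodes)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """One 'little task': count words within a single chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(records, chunk_size=2, workers=4):
    """Split the records, count each chunk independently, then merge the partial counts."""
    chunks = split_into_chunks(records, chunk_size)
    # Threads are an assumption made for this sketch; a real framework would
    # ship each task to the machine that already stores the chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical sample data.
    log_lines = [
        "login ok",
        "login failed",
        "logout ok",
        "login ok",
        "payment failed",
    ]
    print(word_count(log_lines))
```

Because each chunk is counted without reference to any other, adding more workers (or machines) does not change the merged result, which is what makes the scale-out approach workable on commodity hardware.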
The elephant in the room
hadoop.apache.org
Other tools: Pig, Cassandra, ZooKeeper, Hive, HBase
Comparison of approaches
TRADITIONAL APPROACH                  BIG DATA APPROACH
Schema based data model               Key/value based (no schema)
Create model, then load data          Load data, then create model
Only load what’s valuable             Load data speculatively
Premeditated/prescriptive analysis    Exploratory/iterative analysis
What’s the answer?                    What’s the question?
Fastest time to result                Generate the best insight
 Different way of thinking, different level of impact
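As a rough illustration of the two approaches compared above, the sketch below loads the same feed twice: once against a predefined schema, and once speculatively as key/value records that are only given structure when a question is asked. The schema, field names and function names are hypothetical and chosen purely for the example.

```python
import json

# Traditional approach: the model comes first, and only conforming data is loaded.
SCHEMA = {"customer_id": int, "amount": float}  # hypothetical schema


def load_with_schema(raw_lines, schema=SCHEMA):
    """Keep only records that match the predefined schema, projected onto its fields."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        if all(isinstance(record.get(field), kind) for field, kind in schema.items()):
            table.append({field: record[field] for field in schema})
    return table


# Big data approach: load everything as key/value pairs and decide on structure later.
def load_speculatively(raw_lines):
    """Store every record untouched, keyed by its position in the feed."""
    return {key: json.loads(line) for key, line in enumerate(raw_lines)}


def explore(store, field):
    """Exploratory query: apply a 'model' at read time by asking for any field."""
    return [record[field] for record in store.values() if field in record]


if __name__ == "__main__":
    # Hypothetical feed mixing transaction records with a sensor reading.
    raw = [
        '{"customer_id": 1, "amount": 9.5}',
        '{"customer_id": 2, "amount": 20.0, "channel": "mobile"}',
        '{"device": "meter-7", "reading": 42}',
    ]
    print(load_with_schema(raw))
    print(explore(load_speculatively(raw), "channel"))
```

The schema-first loader discards the sensor reading and the extra channel field at load time, whereas the speculative store keeps both, so a question nobody anticipated can still be asked later.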
Some common big data use cases
 Social analytics (the ‘poster child’)
 Customer analytics in the broader sense
 Profiling and segmentation
 Advertising and promotion
 Retail optimisation (pricing, merchandising, etc)
 Customer services and support
 IT systems monitoring and management
 Security and associated forensics
 Business operations
 Supplier management, logistics, energy management
 Industry specific
 Financial services, public sector, telecoms
INPUTS
More data
Greater diversity
Faster acquisition
More sources
ANALYSIS
More urgency
Less predictability
More granularity
More history
Smaller time-slices
But vanilla Hadoop is seldom the answer
 Enterprise readiness of Hadoop
 Resilience, security, integration friendliness
 Apache tools relatively raw, so look out for other distributions
 Cloudera, Hortonworks, MapR Technologies, IBM InfoSphere BigInsights…
 Mainstream vendors substituting components and extending framework
 Hadoop becoming an engine that sits behind commercial
frameworks and tools
 IBM, Microsoft, Oracle, SAP, SAS, EMC, Teradata, …
 And Hadoop doesn’t define the whole advanced data
management and analytics opportunity anyway
 Enhanced RDBMS, next generation data warehousing, NoSQL, statistical
modelling, predictive analytics, time-series analysis, in-memory databases,
stream based processing engines, and more… It’s a pretty lively area
Use of traditional and emerging technologies
[Chart: two panels, "Current level of use", rated from 1 (Not used at all) to 5 (Extensive use) plus Unsure, and "Change over next 3 years", ranging from less use to more use, for each of the following technologies:]
Legacy databases and file systems
General purpose RDBMS servers
High performance RDBMS configurations
OLAP multi-dimensional database systems
Write once read many (WORM) databases
Rule-based stream processing engines
In memory databases
Scale-out storage architectures
Distributed indexing and search
Distributed data analytics engines
Taking a joined up approach
[Diagram: flow linking the following elements]
Derivative structured data
External feeds
Advanced analytics
Business insights
Traditional BI systems
Business decision makers
Business models and policies
Data scientists
Operational data
Actionable rules
Operational systems
Front line staff
Customers & suppliers
Common challenges organisations face
 Culture of driving via the rear view mirror
 Too much focus on ‘lag’ rather than ‘lead’ indicators
 Emphasis on planning/score keeping rather than in-flight control
 Management and decision making issues
 Lack of business and political alignment between divisions
 Parochial approach to budgeting and investment in IT
 Fragmented and disjointed systems and information
 Different formats, different coding structures
 Different levels of accuracy, quality and completeness
 Governance and control
 Ownership of source data often ambiguous
 Security, privacy and compliance challenges of centralised big data repositories
 Business and IT staff don’t know what they don’t know
 Locked into historical perceptions and assumptions
 Knowledge and skills gap often not recognised
Looking to the future
 Blurring of the lines
 Big data and traditional BI
 Operational control and analytics
 Analysts and business people
 Managers and front-line staff
 On premise and cloud
 Mobile and office based
KEY QUESTIONS
 How many of those data stores can be combined?
 Layering of analytics tools over big data infrastructure
 Promise and potential of in memory solutions
 Role of deep space skills vs standard models and templates?
 How quickly will the cultural shifts take place?
How much do you agree or disagree with the following statements?
[Chart: responses to each statement below, rated from 1 (Totally disagree) to 5 (Totally agree), plus Unsure]
Developments in advanced storage, access and analytics can allow us to tackle problems today that were either too hard or too expensive to deal with in the past
Developments in advanced storage, access and analytics can allow us to take different and better approaches to tackling some key business requirements
Vendors and consulting firms are well geared up to providing us with the support and services we need to take advanced storage, access and analytics on board effectively
Thank You
About Freeform Dynamics
Mission: To make emerging ideas and technologies
more accessible to mainstream organisations
 Cut through vendor promises and hype
 Decipher aspirational marketing aimed at early adopters
 Pick the brains of early movers and learn from their experience
 Distil out critical success factors, tips, tricks and traps
 Provide advice to the broader community in plain English
Mechanics
 Briefings with IT vendors and service providers
 Primary research - face to face, telephone and online
 Use of press and social media to get stuff out there