Big Data Presentation

David Douglas
Department of Information Systems
Introducing…
2
12 Definitions of Big Data
25 Big Data Facts
3
Short Big Data Video
4
Leveraging Big Data in Today’s Enterprise
5
Big Data Via the Three Vs
6
7
Volume
8
SAS adds a V: Visualization
Of course, the objective is to get Value, the ultimate V
As many as 12 V’s have been reported
9
Definition of Big Data
Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery and/or analysis. (IDC)
Data is the new oil – European Consumer Commissioner Meglena Kuneva
10
11
The Human Face of Big Data
12
Digital Universe
13
Gartner Technology Hype Cycle
14
Gartner Technology Hype Cycle
15
Big Data Technologies…
16
Hadoop/MapReduce
• Driven by the need to index the web
• Existing technology did not scale
• MapReduce framework developed at Google
• Yahoo! built Hadoop on the MapReduce framework
Note: a recent survey indicates that only 16% of companies use the Hadoop/MapReduce environment, and usage is dominated by online companies
17
Hadoop
• The Storage Layer (HDFS)
– Hadoop Distributed File System: software to distribute data across multiple computing nodes
– Typically runs on top of Linux
– Stores each block 3 times, ideally with one copy on a node in a different rack
– Sequential access: write once, read many
– Optimized for streaming reads rather than low-latency random access
– No predefined schema: any data type
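To make the replication bullet concrete, here is a minimal Python sketch of how a file might be cut into fixed-size blocks and how three replicas of one block could be spread across racks. It assumes a 128 MB block size and follows the rack-aware idea described in the HDFS documentation (first copy on the writer's node, the other two on two different nodes of one remote rack). The function names and the toy cluster layout are hypothetical, and real HDFS placement also weighs node load and free space.

```python
import random

# Assumed block size: 128 MB (the default in Hadoop 2.x and later).
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte sizes of the blocks a file of `file_size` bytes occupies."""
    full_blocks, remainder = divmod(file_size, block_size)
    # The last block only holds what is left over; it is not padded.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(writer, topology, rng=random):
    """Pick 3 nodes for one block in a simplified rack-aware way (hypothetical helper).

    `topology` is a hypothetical map of rack name -> list of node names.
    Replica 1 goes on the writer's node; replicas 2 and 3 go on two
    different nodes in one other rack, so losing a whole rack never
    loses every copy.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != rack_of[writer] and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need another rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote], 2)
    return [writer, second, third]


cluster = {  # hypothetical 3-rack, 6-node cluster
    "rack1": ["node1", "node2"],
    "rack2": ["node3", "node4"],
    "rack3": ["node5", "node6"],
}
blocks = split_into_blocks(300 * 1024 * 1024)
print([size // (1024 * 1024) for size in blocks])  # [128, 128, 44]
replicas = place_replicas("node1", cluster, random.Random(42))
print(replicas)
```

A 300 MB file therefore occupies three blocks, and each of those blocks would get its own set of three replicas.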
18
Hadoop (cont.)
• The Execution Layer (MapReduce)
– Responsible for running a batch job in parallel on many servers
– Typically runs on top of Linux
– Works with (key, value) pairs
– For a job:
• Mappers pull data from their respective files
• Mappers feed the shuffle (not needed for map-only jobs)
• The shuffle feeds the reducers, which summarize and return the result
– Java is the native language
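The shuffle step routes each mapper's (key, value) pairs to a reducer so that every value for a given key meets in one place. Below is a minimal Python sketch of that routing. It assumes hash partitioning with CRC32, chosen only because Python's built-in string hash is randomized per process. Hadoop's default HashPartitioner applies the same hash-modulo idea using the key's Java hashCode(). The function names here are illustrative.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Map a key to a reducer index: stable CRC32 hash modulo the reducer count (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Send every (key, value) pair to its reducer, grouping values by key there."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:          # one list of pairs per mapper task
        for key, value in pairs:
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    return reducer_inputs


mapper_outputs = [
    [("Toronto", 20), ("Rome", 33)],      # output of mapper 1
    [("Toronto", 18), ("Rome", 37)],      # output of mapper 2
]
for index, groups in enumerate(shuffle(mapper_outputs, num_reducers=2)):
    print(index, dict(groups))
```

Because the partition depends only on the key, all Toronto values land in the same reducer regardless of which mapper emitted them.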
19
MapReduce Example
• Five files, each with two columns of (key, value) pairs: city and maximum temperature
Example:
Toronto, 20
Whitby, 25
……
Problem: find the maximum temperature for each city
Break the work into 5 mapper tasks; the results of the mapper tasks are:
(Toronto, 20) (Whitby, 25) (New York, 22) (Rome, 33)
(Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)
(Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)
(Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)
(Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)
The mapper task results feed into reduce tasks, which combine the input results and output a single value for each city:
(Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)
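The max-temperature walk-through can be simulated in a few lines of plain Python. This is an in-memory sketch of the map, shuffle and reduce steps, not Hadoop code. The five input files are reconstructed from the five mapper results on the slide, and the function names are illustrative.

```python
from collections import defaultdict

# Each inner list stands in for one input file of "city, temperature" lines,
# using the values from the five mapper results on the slide.
files = [
    ["Toronto, 20", "Whitby, 25", "New York, 22", "Rome, 33"],
    ["Toronto, 18", "Whitby, 27", "New York, 32", "Rome, 37"],
    ["Toronto, 32", "Whitby, 20", "New York, 33", "Rome, 38"],
    ["Toronto, 22", "Whitby, 19", "New York, 20", "Rome, 31"],
    ["Toronto, 31", "Whitby, 22", "New York, 19", "Rome, 30"],
]


def map_task(lines):
    """Parse one file and emit (city, temperature) pairs."""
    for line in lines:
        city, temperature = line.split(",")
        yield city.strip(), int(temperature)


def shuffle(mapped):
    """Group the values from all mappers by key (city)."""
    groups = defaultdict(list)
    for pairs in mapped:
        for city, temperature in pairs:
            groups[city].append(temperature)
    return groups


def reduce_task(city, temperatures):
    """Collapse all readings for one city into its maximum."""
    return city, max(temperatures)


mapped = [map_task(lines) for lines in files]   # 5 mapper tasks
result = dict(reduce_task(city, temps) for city, temps in shuffle(mapped).items())
print(result)  # {'Toronto': 32, 'Whitby': 27, 'New York': 33, 'Rome': 38}
```

The output matches the reducer result on the slide; on a real cluster the mappers and reducers would run on different servers.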
20
Hadoop Ecosystem
21
Technology for Big Data
22
In-Memory Computing Speed
RAM latency: 70 nanoseconds (1400 MPH)
Disk latency: 5 milliseconds (0.003 MPH)
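The speed analogy is the latency ratio applied to a reference speed. Below is a small Python calculation using the slide's own numbers (70 ns for RAM, 5 ms for disk, 1400 MPH for RAM). These inputs give a ratio of roughly 71,000 to 1 and a scaled disk speed of about 0.02 MPH. The slide's 0.003 MPH figure therefore appears to assume a slower disk than 5 ms. The helper name `scaled_speed` is hypothetical.

```python
RAM_LATENCY_S = 70e-9    # 70 nanoseconds (from the slide)
DISK_LATENCY_S = 5e-3    # 5 milliseconds (from the slide)
RAM_SPEED_MPH = 1400     # speed the slide assigns to RAM


def scaled_speed(reference_speed, reference_latency, other_latency):
    """Scale a speed down by how many times longer the other device takes per access."""
    return reference_speed * reference_latency / other_latency


ratio = DISK_LATENCY_S / RAM_LATENCY_S
disk_speed_mph = scaled_speed(RAM_SPEED_MPH, RAM_LATENCY_S, DISK_LATENCY_S)
print(f"Disk access is about {ratio:,.0f} times slower than RAM")
print(f"Scaled disk speed: {disk_speed_mph:.4f} MPH")
```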
23
In-Memory Computing Speed Demo
Backdrop
• Oracle OpenWorld demo
• Put all of Wikipedia into Oracle 12c with the in-memory option
• SAP TechEd a few weeks later
• Put all of Wikipedia into an SGI HANA box (250 billion rows)
• Query and plot of Wikipedia page views of AIDS versus Ebola by date
• Forecast of Wikipedia page views of AIDS versus Ebola by date
HANA
24
25
26
Myth or Reality
• RAM is so inexpensive that moving to in-memory computing is a no-brainer?
• In-memory computing is an expected evolution in the digital universe?
• In-memory computing tenet:
– RAM is the new DISK
– DISK is the new TAPE
27
Cases
• Online Gambling
– Increased the number of online bets per second from 20,000 to 150,000 (Bwin.Party)
• Education
– Near real-time analytics driving interventions to improve retention (University of Kentucky)
• Health Care
– Intersection of smart devices, electronic health records and in-memory analytics to provide real-time diagnostics and treatment
McKinsey & Company
• Global package company
– Move to real-time tracking of packages
MarketWatch
28
Thoughts on In-Memory Computing
• In-Memory Computing makes Big Data possible
• Insight at the speed of thought
• IMDBMS (in-memory database management system) reduces the data footprint
– Eliminates precomputed aggregates
– Compression is higher for columns than for rows
– Optimized for RAM instead of for disk
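Column compression tends to beat row compression because a column stores values of one type, and similar values often sit next to each other. The Python sketch below illustrates this with run-length encoding, one of several column compression schemes, chosen here only for simplicity. It runs on a small hypothetical table and counts how many (value, run length) entries each layout needs.

```python
from itertools import groupby

# Hypothetical table: (country, city, temperature), sorted by country and city.
rows = [
    ("Canada", "Toronto", 20),
    ("Canada", "Toronto", 22),
    ("Canada", "Whitby", 25),
    ("Canada", "Whitby", 25),
    ("Italy", "Rome", 33),
    ("Italy", "Rome", 33),
]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Row layout: values are written one row after another, so repeats are interleaved.
row_stream = [value for row in rows for value in row]
row_runs = len(run_length_encode(row_stream))

# Column layout: each column is written contiguously and encoded on its own.
columns = list(zip(*rows))
column_runs = sum(len(run_length_encode(column)) for column in columns)

print(f"{len(row_stream)} values -> row layout: {row_runs} runs, column layout: {column_runs} runs")
```

On this toy table the row layout gains nothing (18 runs for 18 values), while the column layout needs only 9 runs. Real column stores combine this with dictionary and other encodings.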
29
Two Factors Will Drive In-Memory Computing Faster than Planned
• Automated Decision-Making
• Mobile Computing
30
A Data Scientist
31
Another View of a Data Scientist
32
So How Do I Find One…
33
Big Data is disruptive in the following ways
• It brings grid and in-memory computing to business
• Software is being moved to the data instead of the data being moved to the software
• Transition from analytics at rest to analytics in motion
• It will create new demand for workers with analytics skills
34
Big Data is really about Analytics
35
A View of Analytics
Source: mu-sigma
36
Another View of Analytics
Source: Rose Business Technologies
37
Achieving Success with Business Analytics
Another View of Analytics
(Chart axes: Competitive Advantage; Data, Information, Intelligence; Reporting, Decision Support, Decision Guidance)
Advanced Analytics
• Decision Optimization: What is the best decision?
• Predictive Modeling: What will happen next?
• Forecasting: What if these trends continue?
• Basic Statistical Analysis: Why is this happening?
Basic Analytics
• Reporting with Early Warning: What actions are needed?
• Dynamic Reporting: Where exactly are the problems?
• Ad Hoc Reporting: How many, how often, where?
• Basic Reporting: What happened?
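The Forecasting level asks "What if these trends continue?". One of the simplest ways to answer it is to fit a straight-line trend by least squares and extend it forward. The Python sketch below does exactly that on hypothetical monthly figures. It illustrates only the forecasting level and is not meant to represent the chart's predictive modeling or optimization levels. The function names are illustrative.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(values, steps):
    """Extend the fitted trend `steps` periods past the last observation."""
    slope, intercept = fit_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


monthly_sales = [10, 12, 14, 16]            # hypothetical history
print(forecast(monthly_sales, steps=2))     # [18.0, 20.0]
```

A straight line is the crudest trend model. Seasonal or noisy data would call for something richer, which is where the chart's higher levels begin.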
38
Another View of Analytics
39
Another View of Analytics
40
Cognitive Computing?
Watson gains eyes, ears and a voice
41
The Importance of Big Data and Analytics
• Wall Street Journal 9/16/13
– 44% of CIOs consider Business Intelligence their top priority for technology spending
– 51% of companies plan to increase spending on Business Intelligence and Analytics software this year
• A recent McKinsey report
– Calls Big Data “the next frontier for competition”
– “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”
• Do you need a Data Scientist?
42
The Importance of Big Data and Analytics
43
Data-Driven Decisions
• Analytics: The New Path to Value (MIT research report: 30 industries, 100 countries)
– Analytics is the differentiator for top-performing companies (chart on next slide)
– Data is not the problem
44
45
5 Stages of Big Data and Analytics Maturity
46
Current State
47
There is a Journal
Big Data Journal
Word (Tag) Cloud
Word Cloud with Images
Easy Text Manipulation
http://www.ibm.com/analytics/watson-analytics/
https://ace.ng.bluemix.net/
http://www.biography.com/people/warren-buffett-9230729#synopsis
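A word (tag) cloud is driven by word frequencies: the more often a term appears, the larger it is drawn. Below is a minimal Python sketch of that first step, which counts words and maps the counts linearly onto font sizes. The stop-word list and the 10 to 48 point size range are assumptions for illustration. Real tools add stemming, layout and coloring on top.

```python
import re
from collections import Counter

# Small illustrative stop-word list (assumption); real tools use much longer ones.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "in"}


def word_frequencies(text, top_n=20):
    """Return the `top_n` most common non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS).most_common(top_n)


def font_sizes(frequencies, min_size=10, max_size=48):
    """Map each word's count linearly onto a font size between min_size and max_size."""
    if not frequencies:
        return {}
    counts = [count for _, count in frequencies]
    low, high = min(counts), max(counts)
    spread = (high - low) or 1            # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - low) * (max_size - min_size) / spread
        for word, count in frequencies
    }


text = "Big data is big. Data is the new oil. Big data!"
print(font_sizes(word_frequencies(text)))  # {'big': 48.0, 'data': 48.0, 'new': 10.0, 'oil': 10.0}
```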
48
49
Of Interest
• Social Bakers
• Amazing Twitter Stats
• Google Trends
• Social media location adds considerable opportunity
50
Implications…
51
The Analytics
• At rest (static)
– Models, including predictive models, built on historical data
• In motion (real-time)
– Models applied to a stream feed
• Combination
– Models applied to a stream feed; the stream feed also goes into the data at rest to update the models
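The three modes can be illustrated with one small model. In the Python sketch below, a hypothetical anomaly detector learns the mean and standard deviation of historical readings (analytics at rest). It then scores each new reading from a stream against that model (analytics in motion). When `learn=True`, it also folds each reading back into the model (the combination). The running statistics use Welford's online algorithm. The 3-standard-deviation threshold and the sample data are assumptions.

```python
import math


class RunningStats:
    """Mean and sample standard deviation updated one value at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def build_model(history):
    """Analytics at rest: learn the model from stored historical data."""
    model = RunningStats()
    for value in history:
        model.update(value)
    return model


def score_stream(model, stream, k=3.0, learn=True):
    """Analytics in motion: flag readings more than k standard deviations from the mean.

    With learn=True each reading is also fed back into the model (the combination).
    The k=3.0 threshold is an assumed default.
    """
    alerts = []
    for value in stream:
        if model.std > 0 and abs(value - model.mean) > k * model.std:
            alerts.append(value)
        if learn:
            model.update(value)
    return alerts


model = build_model([10, 11, 9, 10, 10, 11, 9, 10])   # hypothetical history
print(score_stream(model, [10, 30, 11]))              # [30]
print(model.n)                                        # 11 readings now in the model
```

Each reading is scored before it is absorbed, so an outlier is flagged against the old model. Whether outliers should then update the model at all is a design choice this sketch leaves to the caller.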
52
Analytics at Rest—Analytics in Motion
53
Thoughts
• It is not a matter of “if” but “when” you get into Big Data analytics
• The purpose is to enable users
• Choices
– Pure plays like Cloudera, Hortonworks, MapR, Pivotal, etc.
– NoSQL databases (key-value, documents, networks)
– Major computing players like IBM, Oracle, etc.
– In-Memory Computing
– Should not be a new silo
54
Terms
• IoT – Internet of Things
• IoE – Internet of Everything
• IoN – Internet of Nothing
The vast majority of the billions of things connected to the internet on Cisco’s website, for instance, are not the toasters, refrigerators, thermostats, smoke detectors, pacemakers and insulin pumps that the IoT's true believers enthuse about. Almost exclusively, they are existing smartphones, tablets, computers and routers, plus a surprising number of industrial components used to beam performance statistics back to corporate headquarters. Without any hoopla, operators of power stations, passenger jets, railways, refineries, chemical plants, oil platforms and other industrial equipment have been doing this for ages.
55
EMC Digital Universe with Research & Analysis by IDC
The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things
April 2014
40% of data created and consumed by consumers
56
57
We Live in an Era of Change
58
Gartner’s Magic Quadrant – BI & Analytics
59
60
Good Reading
iPad App
61
62