BIG DATA - Department of Computer Science and Engineering

Internet Pictures Clips Maps News Shop Email more
BIG DATA
Challenges & Opportunities
Lei Chen
Outline
Background
“Big data” is a term acknowledging the exponential growth, availability and use of …
Challenges
“Big data” poses grand challenges for data capture, storage, analysis …
Opportunities
Many applications can benefit from “Big data” …
Background
We are capturing more data
Super exponential growth in data volume
Satellite imagery, mobile stations, distributed sensor networks, geographical plotting …
Copyright belongs to “Data Analysis Challenges”, JSR-08-142, Dec
Background
We are using more data
Intelligent transportation
Digital health care
Background
We need quick processing of the data
Volcano monitor
Hurricane path prediction
Background
We are exploring the unknowns with different means of data measurement
Exploring the universe
Ocean science
Background
We are discovering new rules from data
The well-formed.eigenfactor project visualizes information flow in science. This diagram shows the citation links of the journal Nature.
Copyright belongs to http://wellformed.eigenfactor.org
Background
Defining Big Data
Wiki: Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing.
Gartner (2011): Big data is a popular term used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of tomorrow.
Background
Features of Big Data
3V: Variety, Velocity and Volume
Challenges
Applications
Data Processing (processing language, optimization, visualization)
Data Model (interpretation, representation): <key,vals>, Object, E-R, Hierarchical
Network Topology
Storage (reliability, scalability, availability)
Data Extraction (acquisition, integration, representation)
Challenges
Data model challenges
• Volume: scale up, scale out, and scale in
• Velocity: “interactive” properties to facilitate processing
• Variety: simple but unified, to adapt to heterogeneity
Existing data models are not satisfactory: <key,vals>, Object, E-R, Hierarchical
Functionality vs. Simplicity
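As a rough illustration of the functionality-versus-simplicity trade-off between a hierarchical model and a <key,vals> model, the sketch below flattens a nested record into key-value pairs. The record, the `flatten` helper and the dotted key format are hypothetical and chosen only for illustration.

```python
def flatten(record, prefix=""):
    """Flatten a nested dict into a flat {path: value} key-value mapping."""
    pairs = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the sub-record, extending the dotted path.
            pairs.update(flatten(value, path))
        else:
            pairs[path] = value
    return pairs


# Hypothetical hierarchical record.
patient = {"id": 7, "name": "Ann", "visit": {"ward": "B", "vitals": {"pulse": 72}}}
kv = flatten(patient)
```

The flat form is simple to partition by key, but a structural question such as "all vitals of this visit" becomes a prefix scan over keys instead of a direct traversal.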
Challenges
Storage challenges
Storage concerns:
• Reliability: data is safe and trustworthy
• Availability: data is accessible
• Scalability: data operation performance does not degrade as data size grows
However, the CAP theorem is the bottleneck. No one-size-fits-all solution exists
Challenges
Storage challenges
CAP Theorem
• Consistency
• Availability
• Partition tolerance
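The choice the CAP theorem forces during a network partition can be sketched with a toy two-replica store. `TinyStore`, `Replica` and the `mode` flag are hypothetical names for this illustration, not the API of any real system.

```python
class Replica:
    def __init__(self):
        self.data = {}


class TinyStore:
    """Toy store with two replicas; `mode` is "CP" or "AP" (illustrative only)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when replicas a and b cannot talk

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write on the reachable replica; replicas diverge.
            self.a.data[key] = value
            return
        # No partition: replicate synchronously to both replicas.
        self.a.data[key] = value
        self.b.data[key] = value

    def read_from_b(self, key):
        return self.b.data.get(key)


ap = TinyStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)             # accepted, but only replica a sees it
stale = ap.read_from_b("x")  # replica b still holds the old value
```

In "CP" mode the same second write raises instead, so readers never observe diverging replicas, at the price of rejected requests.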
Challenges
Storage challenges
ACID vs. BASE
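RDBMS (ACID): Atomic, Consistent, Isolated, Durable
NoSQL (BASE): Basically Available, Soft-state, Eventually consistent
Systems placed on the CAP triangle (C, A, P):
• Consistency + Availability: RDBMS
• Consistency + Partition tolerance: BigTable, HyperTable, HBase, MongoDB, Redis, Scalaris etc.
• Availability + Partition tolerance: Dynamo, CouchDB, Cassandra, SimpleDB, Tokyo Cabinet, Riak, Voldemort etc.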
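The "eventually consistent" part of BASE can be sketched as replicas that accept writes independently and later reconcile. The example below assumes a last-write-wins rule keyed on a timestamp. That is only one possible reconciliation strategy, and the `merge` function and sample data are hypothetical.

```python
def merge(local, remote):
    """Last-write-wins merge of two {key: (timestamp, value)} replica states."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        # Take the remote entry if the key is new or the remote write is newer.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two replicas that diverged while accepting writes independently.
a = {"cart": (1, ["book"])}
b = {"cart": (2, ["book", "pen"]), "user": (1, "u1")}

# One round of state exchange in both directions brings them to the same state.
a, b = merge(a, b), merge(b, a)
```

Between the divergent writes and the exchange, a reader may see either version. That window is the "soft state" that ACID systems rule out.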
Challenges
Management challenges
“Solving 'Big Data' Challenge Involves More Than Just Managing
Volumes of Data”
Gartner(2011)
Big data management
• Functionality: indexing & partition
• Flexibility: adaptation to new requirements and new components
Challenges
Management challenges
E.g., Indexing over big data
Volume
Large volume of data captured every time unit
Requires: distributed adaptive index
Leads to: significant cost on metadata exchange
Variety
Data captured from different sources
Leads to: ambiguity on indexing the same object
Requires: distributed adaptive index
Challenges
Challenges on processing
• New query language (algebra)
Desired -> Sacrifices & Overhead
Flexibility -> Complexity in data modeling
“Relational” support -> Poor scalability
“Uncertain” support -> Poor scalability and significant computing overhead
Scalability -> Less functionality
Efficiency & Effectiveness -> Poor scalability
Challenges
Challenges on processing
• New computing paradigm for processing
Distributed computing paradigm -> Limitations
Message Passing -> Poor scalability and fault tolerance
Unified Access -> Efficiency not validated over large numbers of computing nodes
MapReduce -> Poor functionality
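To make the MapReduce paradigm concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. A real framework distributes these phases across machines. The function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big challenges", "data data"])
```

The programmer supplies only `map_phase` and `reduce_phase`. That restriction is what makes the model easy to parallelize, and it is also why anything that does not fit the two-function shape is awkward to express.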
Challenges
Challenges on processing
• New optimization methodology
Load Balance
Data Locality
High Parallelism
Merging Cost
Less Network I/O
Replicated Computing
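As one small illustration of the load-balance goal, the sketch below assigns tasks to workers with a greedy "largest task first, least-loaded worker" heuristic. It deliberately ignores data locality and network I/O, which are competing goals. The task names, costs and the `assign` helper are hypothetical.

```python
import heapq


def assign(task_costs, n_workers):
    """Greedy assignment: each task, largest first, goes to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    plan = {w: [] for w in range(n_workers)}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        plan[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))
    loads = {w: sum(task_costs[t] for t in tasks) for w, tasks in plan.items()}
    return plan, loads


# Five hypothetical data partitions with estimated processing costs.
plan, loads = assign({"p0": 8, "p1": 7, "p2": 6, "p3": 5, "p4": 4}, 2)
```

A locality-aware scheduler would instead prefer the worker that already stores a partition, accepting some imbalance to save network transfer.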
Opportunities
• We are empowered to learn knowledge and process information more accurately, effectively and efficiently.
Why “Big Data”?
Big Data
• Natural Science Study
• Fundamental Scientific Research
• Social Civilization
• Daily Life
Opportunities
Big Data for natural science study
• E.g., natural disaster forecasting and management
Disasters: flood, earthquake, extreme weather
Data: meteorological data; geographic data; population, transportation, urban design data; economic data
Tasks: forecasting, management
Opportunities
Big Data for fundamental scientific research
• E.g., bioinformatics and medicine
Gene technology and clinical medicine promote each other
Opportunities
Big Data for social civilization
• Light-speed information spreading & enormous knowledge
Quick events detection
Easy collaboration
Wondering where to get a really good cup of coffee?
Just tweet your question!
Opportunities
Big Data for daily life
• Our life can be much easier with more data … E.g., trip planning
Travel to Beijing :: Request
• 3-day stay
• Budget < $1000
• Predefined: Forbidden City; 10am meeting every day
Adaptive agenda, updated on real-world incidents: traffic jam, luggage delay, bad weather
Opportunities
Opportunity highlights
• Volume
o Capturing, storing and analyzing data helps us better understand the world
• Velocity
o Guaranteed effective & efficient data processing
• Variety
o Handling heterogeneous sources of data
Considering all the challenges and constraints, perhaps there is no one-size-fits-all solution
However, application-dependent “Big Data” solutions are promising
Opportunities
Applications
Heterogeneous data management
• Search doctors
• Search universities (in progress)
Search Doctors: data integration sources
• Web pages on the Internet
• Hospital databases
• Search results from general-purpose search engines
• News / rumors
• …
Data Extraction -> Integrated Database (~500,000 doctors & ~30,000 hospitals from 50+ GB of source data) -> OLAP Query Processing