Big data: The next frontier for innovation, competition and productivity

Yiqun Xie, Yongbo Chen
CSCI 5715 – Fall 2014
10/21/2014
Spatial Data vs. Spatial Big Data

Simple use cases
  Traditional Spatial Data: U of M gopher-way map
  Spatial Big Data: Real-time map of traffic using Waze user-generated content
Examples
  Traditional Spatial Data: Point data: restaurants. Graph: static roadmaps
  Spatial Big Data: Check-ins. Temporally detailed roadmaps
Volume
  Traditional Spatial Data: Gigabytes of roadmaps
  Spatial Big Data: Temporally detailed maps can reach 10^13 items per year; GPS traces from smart phones
Variety
  Traditional Spatial Data: Raster, vector, graph
  Spatial Big Data: Moving objects, temporal graph, frequently updated satellite/UAV imagery, lidar/laser, geo-located tweets about disasters
Velocity
  Traditional Spatial Data: Limited velocity (census-decade)
  Spatial Big Data: High velocity (show near real-time map of 400 million tweets/day related to disasters)
Dimensionality
  Traditional Spatial Data: 2D, 3D
  Spatial Big Data: Time dimension

Smart phone GPS-trace data size estimation
• One record: Time (date, clock), Location (x, y, z), Metadata = 64 bytes
• One record every 10 min: 64 x (160 x 10^6) x (6 x 24) -> 1.5 TB per day
• One record every 10 sec: 90 TB per day
• One record every 1 sec: 900 TB per day, close to 1 PB
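
The GPS-trace size estimate can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming the slide's figures of 64 bytes per record and 160 x 10^6 reporting devices; the slide does not say what the 160 million counts, and the function and constant names are illustrative only.

```python
# Figures taken from the slide; names are hypothetical.
RECORD_BYTES = 64            # time + location (x, y, z) + metadata
DEVICES = 160 * 10**6        # the slide's 160 x 10^6 (what it counts is not stated)
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(interval_seconds):
    """Bytes collected per day if every device reports once per interval."""
    records_per_device = SECONDS_PER_DAY // interval_seconds
    return RECORD_BYTES * DEVICES * records_per_device


for label, interval in [("10 min", 600), ("10 sec", 10), ("1 sec", 1)]:
    terabytes = daily_volume_bytes(interval) / 10**12  # decimal terabytes
    print(f"{label}: {terabytes:.1f} TB per day")
```

The computed values come out slightly below the slide's rounded figures (1.5 TB, 90 TB, and 900 TB per day), which suggests the slide rounds up at each step.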
Big Data vs. Spatial Big Data

Examples
  Big Data: Facebook/Twitter posts; Google search terms
  Spatial Big Data: Geo-located tweets and posts; Open Street Map
Data Types
  Big Data: Text keywords; web logs
  Spatial Big Data: GPS traces; geo-located social platform posts; temporally detailed roadmaps; frequently collected satellite/UAV imagery
Questions
  Big Data: Google brain: Does an image contain a cat?
  Spatial Big Data: Are there any hotspots of recent disaster-related tweets? Where?
Representative Computational Paradigms
  Big Data: Hadoop; hashing; sub-problem optimization (learning)
  Spatial Big Data: Spatial Hadoop, GIS in Hadoop; declustering; spatial partitioning. Spatial queries: partitioning; data skew; boundary objects
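
The spatial partitioning issues named above (data skew and boundary objects) can be illustrated with a uniform grid. This is a minimal sketch under stated assumptions: a fixed-size grid is only one possible partitioning scheme, it is not presented as how Spatial Hadoop or GIS in Hadoop work, and all function names and sample coordinates are hypothetical.

```python
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Grid cell (column, row) that contains the point (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def partition_points(points, cell_size):
    """Assign each point to exactly one grid cell (one partition per cell)."""
    cells = defaultdict(list)
    for x, y in points:
        cells[cell_of(x, y, cell_size)].append((x, y))
    return dict(cells)


def cells_for_rectangle(xmin, ymin, xmax, ymax, cell_size):
    """All grid cells that a rectangle overlaps."""
    cx0, cy0 = cell_of(xmin, ymin, cell_size)
    cx1, cy1 = cell_of(xmax, ymax, cell_size)
    return [(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]


# Data skew: four made-up points cluster in one cell, one point sits far away.
points = [(0.5, 0.5), (0.6, 0.4), (0.7, 0.9), (0.2, 0.3), (3.5, 2.5)]
sizes = {cell: len(members) for cell, members in partition_points(points, 1.0).items()}
print("points per cell:", sizes)

# Boundary object: a rectangle straddling a grid corner overlaps four cells,
# so it must be replicated to, or coordinated across, four partitions.
print("cells for boundary rectangle:", cells_for_rectangle(0.8, 0.8, 1.2, 1.2, 1.0))
```

With uneven cell sizes like these, one worker receives most of the data, and objects that cross cell borders need extra handling to avoid missed or duplicated query results.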
Relationship between data volume and use-case complexity
[Figure: use-case complexity (y-axis: 1, log n, n, n log n, n^2, n^3) against volume of dataset n (x-axis: 10^6, 10^9, 10^12, 10^15), with three platforms marked: Laptop (10^3 MIPS), Cluster (10^6 MIPS), Cloud computer (10^9 MIPS).]
Example use cases by complexity:
• 1: Query using hash map
• log n: Search: binary search
• n: Map check-ins from Facebook
• n^2: Distance between point-pairs
M. Evans, D. Oliver, K. Yang, X. Zhou, S. Shekhar. Enabling Spatial Big Data via CyberGIS: Challenges and Opportunities. Springer, 2014. (Book chapter)
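
The relationship between dataset volume, use-case complexity, and platform speed can be made concrete with a rough runtime estimate. The sketch below is an illustration, not part of the cited chapter: it assumes one machine instruction per elementary operation and ignores memory, I/O, and communication costs, and the function and dictionary names are hypothetical.

```python
import math

# Platform speeds from the figure, in millions of instructions per second.
PLATFORM_MIPS = {"Laptop": 10**3, "Cluster": 10**6, "Cloud computer": 10**9}

# Operation counts for the complexity classes on the figure's y-axis.
COMPLEXITY = {
    "1": lambda n: 1,
    "log n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n**2,
    "n^3": lambda n: n**3,
}


def estimated_seconds(n, complexity, mips):
    """Rough runtime, assuming one instruction per elementary operation."""
    operations = COMPLEXITY[complexity](n)
    return operations / (mips * 10**6)


# Pairwise distances (n^2) over 10^6 points on each platform.
for platform, mips in PLATFORM_MIPS.items():
    seconds = estimated_seconds(10**6, "n^2", mips)
    print(f"{platform}: {seconds:g} s")
```

Under these assumptions, an n^2 use case on 10^6 items takes about 1000 seconds on the laptop, and the same 1000 seconds on the cloud platform once n grows to 10^9, which is why higher-complexity use cases hit platform limits at much smaller volumes.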
Use cases of SBD - Vestas Wind Systems
• Goal: improve wind turbine placement for optimal energy output
• Platform: IBM BigInsights software + IBM "Firestorm" (#53 on the Top500 supercomputer list)
• Data volume: 2.8 petabytes; 20+ petabytes expected over the next four years
• Analysis time: weeks -> less than 1 hour
• Base resolution of wind data grids: 27 x 27 km to 3 x 3 km
Big Data Creates Big Jobs
• McKinsey: a shortage of 140,000 to 190,000 big data professionals by 2018
• Gartner: 2 million job openings in the U.S. by 2015
References
• M. Evans, D. Oliver, K. Yang, X. Zhou, S. Shekhar. Enabling Spatial Big Data via CyberGIS: Challenges and Opportunities. Springer, 2014. (Book chapter)
• Big data: The next frontier for innovation, competition and productivity. McKinsey Global Institute, May 2011.
• http://www.emarketer.com/Article/Smartphone-Users-Worldwide-Will-Total-175Billion-2014/1010536
• http://www.statista.com/statistics/201182/forecast-of-smartphone-users-in-the-us/
• http://www-03.ibm.com/press/us/en/pressrelease/35737.wss
• http://www.gartner.com/newsroom/id/2207915