Yiqun Xie, Yongbo Chen
CSCI 5715 – Fall 2014
10/21/2014

Spatial Data vs. Spatial Big Data
Examples
  Simple use case
    Traditional spatial data: U of M gopher-way map
    Spatial big data: real-time traffic map built from Waze user-generated content
  Point data
    Traditional spatial data: restaurants
    Spatial big data: check-ins
  Graph
    Traditional spatial data: static roadmaps
    Spatial big data: temporally detailed roadmaps
Volume
  Traditional spatial data: gigabytes
  Spatial big data: temporally detailed roadmaps can reach 10^13 items per year; GPS traces from smartphones (estimate below)
Variety
  Traditional spatial data: raster, vector, graph
  Spatial big data: moving objects, temporal graphs, frequently updated satellite/UAV imagery, lidar/laser, geo-located tweets about disasters
Velocity
  Traditional spatial data: limited velocity (e.g., the census, once a decade)
  Spatial big data: high velocity (e.g., showing a near-real-time map of 400 million tweets/day related to disasters)
Dimensionality
  Traditional spatial data: 2D, 3D
  Spatial big data: adds the time dimension

Smartphone GPS-trace data size estimation
  One record: time (date, clock), location (x, y, z), metadata = 64 bytes
  Sampled every 10 min: 64 bytes x (160 x 10^6 smartphones) x (6 x 24 samples per day) -> 1.5 TB per day
  Sampled every 10 sec: 90 TB per day
  Sampled every 1 sec: 900 TB per day, close to 1 PB

Big Data vs. Spatial Big Data
Examples
  Big data: Facebook/Twitter posts, Google search terms
  Spatial big data: geo-located tweets and posts, OpenStreetMap
Data types
  Big data: text keywords, web logs
  Spatial big data: GPS traces; geo-located social platform posts; temporally detailed roadmaps; frequently collected satellite/UAV imagery
Questions
  Big data: Google Brain: does an image contain a cat?
  Spatial big data: are there any hotspots of recent disaster-related tweets? Where?
Representative computational paradigms
  Big data: Hadoop; hashing; sub-problem optimization (learning)
  Spatial big data: Spatial Hadoop, GIS in Hadoop; spatial partitioning; declustering (spatial queries must cope with partitioning data skew and boundary objects)

Relationship between data volume and use-case complexity
[Figure: use-case complexity (1, log n, n, n log n, n^2, n^3) plotted against volume of dataset n (1, 10^6, 10^9, 10^12, 10^15), with regions for Laptop (10^3 MIPS), Cluster (10^6 MIPS), and Cloud computer (10^9 MIPS).]
Example use cases by complexity
  1: query using a hash map
  log n: search (binary search)
  n: map check-ins from Facebook
  n^2: distance between point pairs
Source: M. Evans, D. Oliver, K. Yang, X. Zhou, S. Shekhar.
Enabling Spatial Big Data via CyberGIS: Challenges and Opportunities. Springer, 2014. (Book chapter)

Use cases of SBD - Vestas Wind Systems
  Improve wind turbine placement for optimal energy output
  IBM BigInsights software + IBM "Firestorm" supercomputer (#53 on the Top500 list)
  2.8 petabytes; 20+ petabytes over the next four years
  Analysis time: weeks -> less than 1 hour
  Base resolution of wind data grids: from 27x27 km to 3x3 km

Big Data Creates Big Jobs
  McKinsey: a shortage of 140,000 to 190,000 big data professionals by 2018
  Gartner: 2 million job openings in the U.S. by 2015

Reference
  M. Evans, D. Oliver, K. Yang, X. Zhou, S. Shekhar. Enabling Spatial Big Data via CyberGIS: Challenges and Opportunities. Springer, 2014. (Book chapter)
  Big data: The next frontier for innovation, competition and productivity. McKinsey Global Institute, May 2011.
  http://www.emarketer.com/Article/Smartphone-Users-Worldwide-Will-Total-175Billion-2014/1010536
  http://www.statista.com/statistics/201182/forecast-of-smartphone-users-in-the-us/
  http://www-03.ibm.com/press/us/en/pressrelease/35737.wss
  http://www.gartner.com/newsroom/id/2207915
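The smartphone GPS-trace estimate on the "Spatial Data vs. Spatial Big Data" slide is record size times number of devices times samples per day. A minimal Python sketch of that arithmetic, assuming one 64-byte record per device per sampling interval, 160 million devices, and decimal terabytes (10^12 bytes); the function name `daily_volume_bytes` is illustrative only:

```python
# Back-of-envelope estimate of daily smartphone GPS-trace volume:
# bytes per record x number of devices x samples per device per day.

RECORD_BYTES = 64             # time, location (x, y, z), metadata (slide's figure)
DEVICES = 160 * 10**6         # 160 million smartphones (slide's figure)
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(sample_interval_s: int) -> int:
    """Bytes per day if every device logs one record every sample_interval_s seconds."""
    samples_per_day = SECONDS_PER_DAY // sample_interval_s
    return RECORD_BYTES * DEVICES * samples_per_day


if __name__ == "__main__":
    for label, interval in [("10 min", 600), ("10 sec", 10), ("1 sec", 1)]:
        tb = daily_volume_bytes(interval) / 10**12  # decimal terabytes
        print(f"{label}: {tb:,.1f} TB per day")
```

The exact values are about 1.5, 88.5, and 884.7 TB per day; the slide rounds the last two to 90 TB and 900 TB.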
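The "Relationship between data volume and use-case complexity" figure can be read as simple arithmetic: operations needed for a use case of size n, divided by the machine's instruction rate. A rough Python sketch, assuming each operation costs exactly one instruction and ignoring I/O, communication, and parallel overhead, so it only indicates orders of magnitude; the names `MACHINES`, `COMPLEXITY`, and `estimated_seconds` are hypothetical:

```python
import math

# Machine classes from the figure, in MIPS (millions of instructions per second).
MACHINES = {"Laptop": 10**3, "Cluster": 10**6, "Cloud computer": 10**9}

# Complexity classes from the figure's y-axis, as operation counts for input size n.
COMPLEXITY = {
    "1": lambda n: 1,
    "log n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n**2,
    "n^3": lambda n: n**3,
}


def estimated_seconds(complexity: str, n: int, mips: int) -> float:
    """Crude run time, ASSUMING each operation costs exactly one instruction."""
    return COMPLEXITY[complexity](n) / (mips * 10**6)


if __name__ == "__main__":
    # All-pairs point distance (n^2) at two dataset sizes on each machine class.
    for n in (10**6, 10**9):
        for name, mips in MACHINES.items():
            secs = estimated_seconds("n^2", n, mips)
            print(f"n={n:.0e} {name:>14}: {secs:.3g} s")
```

Under these assumptions, the n^2 point-pair distance computation on n = 10^6 points takes about 10^12 / 10^9 = 1,000 seconds on a 10^3-MIPS laptop, while n = 10^9 pushes the laptop to about 10^9 seconds (over 30 years) and brings a 10^9-MIPS cloud back to about 1,000 seconds.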
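The "Representative computational paradigms" row contrasts hashing with spatial partitioning and names two difficulties for spatial queries: data skew and boundary objects. A minimal Python sketch of both partitioning styles, assuming rectangular bounding boxes and a uniform grid (one simple choice of spatial partitioning, not the method of any particular system such as Spatial Hadoop); all function names are hypothetical:

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]


def cells_overlapping(r: Rect, cell: float) -> List[Cell]:
    """Grid cells of side `cell` that rectangle r touches (closed intervals)."""
    xmin, ymin, xmax, ymax = r
    return [(i, j)
            for i in range(int(xmin // cell), int(xmax // cell) + 1)
            for j in range(int(ymin // cell), int(ymax // cell) + 1)]


def grid_partition(objs: Dict[str, Rect], cell: float) -> Dict[Cell, List[str]]:
    """Spatial partitioning: each object goes to every cell its box overlaps,
    so boundary objects are replicated into more than one partition."""
    parts: Dict[Cell, List[str]] = defaultdict(list)
    for oid, r in objs.items():
        for c in cells_overlapping(r, cell):
            parts[c].append(oid)
    return dict(parts)


def hash_partition(objs: Dict[str, Rect], k: int) -> Dict[int, List[str]]:
    """Hash partitioning: roughly balanced, but ignores location entirely.
    crc32 is used instead of hash() so results are stable across runs."""
    parts: Dict[int, List[str]] = defaultdict(list)
    for oid in objs:
        parts[zlib.crc32(oid.encode()) % k].append(oid)
    return dict(parts)


def range_query(parts: Dict[Cell, List[str]], objs: Dict[str, Rect],
                q: Rect, cell: float) -> Set[str]:
    """Visit only the cells q overlaps; the set removes replicated boundary objects."""
    qx0, qy0, qx1, qy1 = q
    hits: Set[str] = set()
    for c in cells_overlapping(q, cell):
        for oid in parts.get(c, []):
            x0, y0, x1, y1 = objs[oid]
            if x0 <= qx1 and qx0 <= x1 and y0 <= qy1 and qy0 <= y1:
                hits.add(oid)
    return hits


if __name__ == "__main__":
    objs = {"a": (1, 1, 2, 2), "b": (8, 8, 12, 9), "c": (15, 15, 16, 16)}
    parts = grid_partition(objs, 10)
    print(parts)  # "b" straddles x = 10, so it lands in two cells
    print(range_query(parts, objs, (9, 0, 11, 10), 10))
    print(hash_partition(objs, 2))
```

Hashing balances partition sizes but scatters neighbors, so a range query would have to scan every partition. The grid keeps neighbors together and lets the query visit only overlapping cells, at the cost of replicating boundary objects (object "b" is stored in two cells and must be deduplicated) and of uneven cell populations when points cluster, which is the data-skew problem.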