Towards Personal High-Performance Geospatial Computing (HPC-G): Perspectives and a Case Study
Jianting Zhang
Department of Computer Science, the City College of New York
jzhang@cs.ccny.cuny.edu
2010 Workshop on High Performance and Distributed Geographic Information Systems (HPDGIS’10), 18th ACM SIGSPATIAL GIS: San Jose, CA, Nov 2-5, 2010

Outline
• Introduction
• Geospatial Data, GIS, Spatial Databases and HPC
  • Geospatial data: what’s special?
  • GIS: impacts of hardware architectures
  • Spatial Databases: parallel DB or MapReduce?
  • HPC: many options
• Personal HPC-G: A New Framework
  • Why Personal HPC for geospatial data?
  • GPGPU Computing: a brief introduction
  • Pipelining CPU and GPU workloads for performance
  • Parallel GIS prototype development strategies
• A Case Study: Geographically Weighted Regression
• Summary and Conclusions

[Diagram labels: Ecological Informatics; Geography; GIS Applications; Remote Sensing; Computer Science: Spatial Databases, Mobile Computing, Data Mining]

Introduction – Personal Stories
Computationally intensive problems in geospatial data processing
• Distributed hydrological modeling/flood simulation
• Satellite image processing: clustering/classification (multi-/hyper-spectral)
• Identifying and tracking storms from time-series NEXRAD images
• Species distribution modeling (e.g.
regression/GA-based)

History of accesses to HPC resources
• 1994: Simulating a 33-hour flood on a PC (33 MHz/4 MB) took 50+ hours
• 2000: A Cray machine was available, but special arrangements were required to access it while taking a course (Parallel and Distributed Processing)
• 2004-2007: HPC resources at SDSC were available to the SEEK project, but the project ended up using only SRB for data/metadata storage
• 2009-2010: An Nvidia Quadro FX 3700 GPU card (that came with a Dell workstation) gave a 23X speedup after porting a serial CPU codebase (for SSDBM’10) to the CUDA platform (ACM-GIS’10)

Two books that changed my research focus (as a database person)…
[Book cover images: 2nd edition; 4th edition]
http://courses.engr.illinois.edu/ece498/al/

As well as a few visionary database research papers
• David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98 (1992)
• Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood: DBMSs on a Modern Processor: Where Does Time Go? VLDB 1999: 266-277
• J. Cieslewicz, K.A. Ross: Database Optimizations for Modern Hardware. Proceedings of the IEEE 96(5) (2008)

Introduction – PGIS in traditional HPC Environment
A. Clematis, M. Mineter, and R. Marciano. High performance computing with geographical data. Parallel Computing, 29(10):1275–1279, 2003
“Despite all these initiatives the impact of parallel GIS research has remained slight: the anticipated performance plateau became a mountain still being scaled. GIS companies found that, other than for concurrency in databases, their markets did not demand multi-processor performance.
While computing in general demands less of its users, HPC has demanded more – the barriers to use remain high and the range of options has increased”
“…fundamental problem remains the fact that creating parallel GIS operations is non-trivial and there is a lack of parallel GIS algorithms, application libraries and toolkits.”
If parallel GIS runs in a personal computing environment, to what degree will these conclusions change?

Introduction – PGIS in Personal Computing Environment
Every personal computer is now a parallel machine
• Chip-Multiprocessors (CMP): dual-core, quad-core, six-core CPUs
  • Intel Xeon E5520, $379.99
  • 4 cores/8 threads; 2.26 GHz, 80 W
  • 4*256 KB L2 cache, 8 MB L3 cache
  • Max memory bandwidth 25.6 GB/s
• Massively parallel GPGPU computing: hundreds of GPU cores in a GPU card
  • Nvidia GTX480, $499.99
  • 480 cores (15*1024 threads); 700/1401 MHz, 250 W
  • 1.35 TFlops
  • 15*32768 registers; 15*64 KB shared memory/L1 cache; 768 KB L2 cache; additional constant/texture memory
  • 1.5 GB GDDR5: 1848 MHz clock rate, 384-bit memory interface width, 177.4 GB/s memory bandwidth
If this parallel computing power is fully utilized, to what degree can a personal workstation match a traditional cluster for geospatial data processing?

Geospatial data: what’s special?
• The slowest processing unit determines the overall performance in parallel computing
• Real-world data are very often skewed
  • Wavelet-compressed raster data
  • Clustered point data

Geospatial data: what’s special?
Techniques to handle skewness
• Data decomposition/partition
• Spatial indexing
• Task scheduling
Simple equal-size partitions may work well for local operations, but may not for focal, zonal and global operations, which require more sophisticated partitions to achieve load balancing.
The complexity of task scheduling grows fast with the number of tasks, and generic scheduling heuristics may not always produce good results.

GIS: impacts of hardware architectures
GIS have been evolving along with mainstream information technologies
• The major platform shift from Unix workstations to Windows PCs in the early 1990s
• The marriage with Web technologies to create Web-GIS in the late 1990s
Will GIS naturally evolve from serial to parallel as computers evolve from uniprocessors to chip multiprocessors?
What can the community do to speed up the evolution?

GIS: impacts of hardware architectures
Three roles of GIS
• Data management
• Information visualization
• Modeling support
GIS-based spatial modeling, such as agent-based modeling, is naturally suitable for HPC
• Computationally intensive
• Adopts a raster tessellation and mostly involves local operations and/or focal operations with small constant numbers of neighbors: parallelization-friendly or even “embarrassingly parallel”
• Runs in an offline mode and uses traditional GIS for visualization
How can we make full use of hardware and support data management and information visualization more efficiently and effectively?
HPC: many options
• The combination of architectural and organizational enhancements led to 16 years of sustained growth in performance at an annual rate of 50% from 1986 to 2002. Due to the combined power, memory and instruction-level parallelism problems, the growth rate dropped to about 20% per year from 2002 to 2006.
• In 2004, Intel cancelled its high-performance uniprocessor projects and joined IBM and Sun to declare that the road to higher performance would be via multiple processors per chip (or Chip Multiprocessors, CMP) rather than via faster uniprocessors.
• As a marketing strategy, Nvidia calls a personal computer equipped with one or more of its high-end GPGPU cards a personal supercomputer.
• Nvidia claimed that, compared to the latest quad-core CPU, Tesla 20-series GPU computing processors deliver equivalent performance at 1/20th of the power consumption and 1/10th of the cost.

HPC: many options
1. CPU Multi-cores
2. GPU Many-cores
3. CPU Multi-nodes (traditional HPC)
4. CPU+GPU Multi-nodes (2+3)
How about 1+2?
Personal HPC
• Affordable and dedicated personal computing environment
• No additional cost: use it or waste it
• Excellent visualization and user interaction support
• Can be the “last mile” of a larger cyberinfrastructure
• Data structures/algorithms/software are critical to its success

Personal HPC-G: A New Framework
Additional arguments for Personal HPC for geospatial data
• While some geospatial data processing tasks are computationally intensive, many more are data intensive in nature
• Distributing large data chunks incurs significant network and disk I/O overheads (50-100 MB/s)
• Personal HPC can make full use of the high interface bandwidths between CPU cores and memory (10-30 GB/s), CPU memory and GPU memory (8 GB/s), and GPU cores and GPU memory (100-200 GB/s)
• The improved CPU+GPU performance will not only solve old problems faster but also allow many traditionally offline data processing tasks to run online in an interactive manner
• The uninterrupted exploration processes are likely to facilitate novel scientific discoveries more effectively

Why Personal HPC for geospatial data?
High-Level Comparisons among Cluster Computing, Cloud Computing and Personal HPC

                              Cluster Computing   Cloud Computing   Personal HPC
Initial cost                  High                Low               Low
Operational cost              High                Medium            Low
End user control              Low                 High              High
Theoretical scalability       High                High              Medium
User code development         Medium              Low               High
Data management               Low                 Medium            Medium
Numeric modeling              High                Medium            High
Interaction & visualization   Low                 Low               High

Spatial Database: parallel DB or MapReduce
• Spatial databases: GIS without GUI
• Learn lessons from relational databases on parallelization
• The debates between Parallel DB and MapReduce
• The emergence of hybrid approaches (e.g. HadoopDB)
• While parallel processing of geospatial data to achieve high performance has been a research topic for quite a while, neither Parallel DB nor MapReduce has been extensively applied to practical large-scale geospatial data management
• We call for pilot studies that experiment with the two approaches to provide insights for future synthesis

GPGPU Computing
• Nvidia CUDA: Compute Unified Device Architecture
• AMD/ATI: Stream Computing

Parallel GIS prototype development strategies
We envision that Personal HPC-G provides an opportunity to evolve traditional GIS to parallel GIS gradually. Community research and development efforts are needed to speed up the evolution.
First, we propose to learn from existing parallel geospatial data processing algorithms and adapt them to CMP CPU and GPU architectures.
Second, we suggest studying existing GIS modules (e.g., ArcGIS geoprocessing tools) carefully, identifying the most frequently used ones, and developing parallel code for multicore CPUs and many-core GPUs.
Third, while existing database research on CMP CPU and GPU architectures is still relatively limited, it can be the starting point to investigate how geospatial data management can be realized on the new architectures and their hybridization.
Finally, we suggest reusing existing CMP- and GPU-based software codebases developed by the computer vision and computer graphics communities.

GWR Case Study
• A conceptual design for efficiently implementing GWR on the CUDA GPGPU computing architecture; preliminary in nature
• Being realized by a master's student at CCNY
  • Good C/C++ programming skills
  • New to GPGPU/CUDA programming
  • Supported for 5 hours per week through a tiny grant (an experiment on what $2000 can contribute to PGIS development)

GWR Case Study
• GWR extends the traditional regression framework by allowing local parameters to be estimated
• Given a neighborhood definition (or bandwidth) of a data item, a traditional regression can be applied to the data items that fall into the neighborhood or region
• The correlation coefficients for all the geo-referenced data items (raster cells or points) form a scalar field that can be visualized and interactively explored
• By interactively changing some GWR parameters (e.g., bandwidth) and visually exploring the changes in the corresponding scalar fields, users can gain a better understanding of the distributions of GWR statistics and the original dataset
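The neighborhood idea above can be sketched in a few lines of serial Python. This is only an illustration, not the CUDA design of the case study: it assumes a square moving window as the neighborhood and a plain Pearson correlation as the local statistic, and the function names `pearson_r` and `moving_window_r` are hypothetical. The sample values are the 3×3 dependent and independent windows from this deck's GWR example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences, using the sum form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    denom = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    # A constant window has zero variance; report NaN instead of dividing by zero.
    return (n * sxy - sx * sy) / denom if denom else float("nan")

def moving_window_r(x, y, w=3):
    """Return a scalar field with one r per cell whose w*w window fits the raster.

    Cells too close to the border keep NaN. Each cell is computed
    independently of the others, which is what makes the operation
    friendly to parallel hardware.
    """
    h = w // 2
    rows, cols = len(x), len(x[0])
    field = [[float("nan")] * cols for _ in range(rows)]
    for i in range(h, rows - h):
        for j in range(h, cols - h):
            window = [(a, b) for a in range(i - h, i + h + 1)
                             for b in range(j - h, j + h + 1)]
            xs = [x[a][b] for a, b in window]
            ys = [y[a][b] for a, b in window]
            field[i][j] = pearson_r(xs, ys)
    return field

# 3x3 sample windows taken from the deck's GWR example.
independent = [[6, 8, 7], [5, 6, 4], [5, 4, 3]]
dependent = [[7, 9, 8], [8, 7, 7], [6, 6, 5]]

field = moving_window_r(independent, dependent, w=3)
center_r = field[1][1]
print(round(center_r, 2))  # 0.84
```

On a larger raster the same loop yields one coefficient per interior cell, and changing `w` plays the role of changing the bandwidth.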
GWR is computationally intensive
Dependent variable (3×3 window):
7 9 8
8 7 7
6 6 5
Independent variable (3×3 window):
6 8 7
5 6 4
5 4 3
• Using an n*n moving window to compute correlation coefficients (n=3). The correlation coefficient at the dotted cell is r=0.84
• Point data are usually clustered, which makes load balancing very difficult

GWR Case Study: Overall Design

GWR Case Study: From partial to total statistics
\[
f = r_{xy}
= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y}
= \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{(n-1)\,s_x s_y}
= \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
       {\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\;
        \sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}
\]
Let $S_1 = n\sum x_i y_i$, $S_2 = \sum x_i$, $S_3 = \sum y_i$, $S_4 = n\sum x_i^2$, $S_5 = n\sum y_i^2$. Then f can be computed from n and $S_1$ through $S_5$:
\[
f = \frac{S_1 - S_2 S_3}{\sqrt{S_4 - S_2^2}\;\sqrt{S_5 - S_3^2}}
\]
Assume that data items $D_1, D_2, \dots, D_n$ are divided into m groups and that each group j has computed its partial statistics $S_{1j}, S_{2j}, S_{3j}, S_{4j}, S_{5j}$ over its $n_j$ items (defined as above, with $n_j$ in place of n). Then f can be computed from $n_j$, $S_{1j}$, $S_{2j}$, $S_{3j}$, $S_{4j}$ and $S_{5j}$ as follows ($j = 1, \dots, m$):
\[
n = \sum_j n_j,\quad
S_1 = n\sum_j \frac{S_{1j}}{n_j},\quad
S_2 = \sum_j S_{2j},\quad
S_3 = \sum_j S_{3j},\quad
S_4 = n\sum_j \frac{S_{4j}}{n_j},\quad
S_5 = n\sum_j \frac{S_{5j}}{n_j}
\]

Summary and Conclusions
We aimed at introducing a new HPC framework for processing geospatial data in a personal computing environment, i.e., Personal HPC-G.
We argued that the fast-increasing hardware capacities of modern personal computers equipped with chip multiprocessor CPUs and massively parallel GPU devices have made Personal HPC-G an attractive alternative to traditional Cluster computing and newly emerging Cloud computing for geospatial data processing.
We used a parallel design of GWR on Nvidia CUDA-enabled GPU devices as an example to discuss how Personal HPC-G can be utilized to realize parallel GIS modules by synergistic software and hardware co-programming.

Q&A
jzhang@cs.ccny.cuny.edu
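As a supplementary sketch of the "From partial to total statistics" step of the GWR case study, the Python below merges per-group statistics into the total statistics and checks that the merged correlation equals the one computed over all items at once. The function names `partial_stats`, `merge_stats` and `correlation`, and the choice of a two-group split, are assumptions made for illustration; in the envisioned CUDA design each group would presumably be handled by a separate unit of parallel work.

```python
import math

def partial_stats(xs, ys):
    """Per-group statistics (n_j, S1j, S2j, S3j, S4j, S5j) as defined in the talk."""
    n = len(xs)
    return (
        n,
        n * sum(x * y for x, y in zip(xs, ys)),  # S1j = n_j * sum(x*y)
        sum(xs),                                 # S2j = sum(x)
        sum(ys),                                 # S3j = sum(y)
        n * sum(x * x for x in xs),              # S4j = n_j * sum(x^2)
        n * sum(y * y for y in ys),              # S5j = n_j * sum(y^2)
    )

def merge_stats(groups):
    """Combine partial statistics of non-empty groups into total statistics."""
    n = sum(g[0] for g in groups)
    s1 = n * sum(g[1] / g[0] for g in groups)  # S1 = n * sum_j(S1j / n_j)
    s2 = sum(g[2] for g in groups)             # S2 = sum_j(S2j)
    s3 = sum(g[3] for g in groups)             # S3 = sum_j(S3j)
    s4 = n * sum(g[4] / g[0] for g in groups)  # S4 = n * sum_j(S4j / n_j)
    s5 = n * sum(g[5] / g[0] for g in groups)  # S5 = n * sum_j(S5j / n_j)
    return n, s1, s2, s3, s4, s5

def correlation(stats):
    """f = (S1 - S2*S3) / (sqrt(S4 - S2^2) * sqrt(S5 - S3^2))."""
    _, s1, s2, s3, s4, s5 = stats
    return (s1 - s2 * s3) / (math.sqrt(s4 - s2 * s2) * math.sqrt(s5 - s3 * s3))

# Values from the deck's 3x3 GWR example, flattened in reading order.
x = [6, 8, 7, 5, 6, 4, 5, 4, 3]  # independent variable
y = [7, 9, 8, 8, 7, 7, 6, 6, 5]  # dependent variable

whole = correlation(partial_stats(x, y))

# Hypothetical split into two groups of unequal size.
groups = [partial_stats(x[:4], y[:4]), partial_stats(x[4:], y[4:])]
merged = correlation(merge_stats(groups))

print(round(whole, 2), round(merged, 2))  # 0.84 0.84
```

Because only six numbers per group need to be exchanged, groups of very different sizes can be merged cheaply, which is relevant when clustered point data make the groups uneven.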