Cyberinfrastructure for Scalable and High Performance Geospatial Computation
Xuan Shi
Graduate assistants supported by the CyberGIS grant: Fei Ye (2011) and Zhong Chen (2012)
School of Computational Science and Engineering (CSE)
College of Computing, Georgia Institute of Technology
Overview
 Keeneland and Kraken: the cyberinfrastructure for our research and development
 Scalable and high performance geospatial software modules developed over the past 19 months
Keeneland: a hybrid computer architecture and system
 A five-year Track 2D cooperative agreement awarded by the National Science Foundation (NSF) in 2009
 Developed by GA Tech, UT-Knoxville, and ORNL
 120 nodes [240 CPUs + 360 GPUs]
 Integrated into XSEDE in July 2012
 Blue Waters: a full-scale hybrid computer system
Kraken: a Cray XT5 supercomputer
 As of November 2010, Kraken was the 8th fastest computer in the world
 The world’s first academic supercomputer to enter the petascale
 Peak performance of 1.17 PetaFLOPs
 112,896 compute cores (18,816 six-core 2.6 GHz AMD Opteron processors)
 147 TB of memory
Scalable and High Performance Geospatial Computation (1)
Interpolation Using IDW Algorithm on GPU and Keeneland
 Performance comparison across different data scales (i.e., numbers of sample points) and computing resources (times in seconds)
 Speedup is calculated as the time on a single CPU divided by the time on the GPU(s)
 Each output cell is interpolated from the values of its 12 nearest neighbors (the estimator and a kernel sketch follow below)
 Output grid size: 1M+ cells
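For reference, the standard IDW estimate at an output cell location x₀ combines the values z_i of the k = 12 nearest samples at distances d_i. The distance power p is an assumption here (p = 2 is the common choice); the slides do not state the exact weighting used:

\[
\hat{z}(x_0) = \frac{\sum_{i=1}^{12} w_i\, z_i}{\sum_{i=1}^{12} w_i},
\qquad w_i = d_i^{-p}
\]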
Time and speedup on the desktop (Single CPU, Single GPU) and on Keeneland (1, 3, 6, and 9 GPUs). CPU times are in seconds, with minutes in parentheses; GPU cells show time (s) / speedup.

| Data Size | Single CPU | Single GPU | 1 GPU | 3 GPUs | 6 GPUs | 9 GPUs |
|---|---|---|---|---|---|---|
| 2191 | 1331 (22.2) | 15.3 / 87 | 3 / 444 | 4 / 333 | 6 / 222 | 6 / 222 |
| 4596 | 2502 (41.7) | 14.6 / 171 | 5 / 500 | 5 / 500 | 7 / 357 | 8 / 313 |
| 5822 | 2926 (48.8) | 16.5 / 177 | 7 / 418 | 5 / 585 | 6 / 488 | 6 / 488 |
| 6941 | 3717 (62.0) | 17.1 / 217 | 6 / 620 | 4 / 929 | 7 / 531 | 6 / 620 |
| 7689 | 3978 (66.3) | 18.4 / 216 | 7 / 568 | 5 / 796 | 6 / 663 | 8 / 497 |
| 9543 | 4875 (81.3) | 20.6 / 237 | 7 / 696 | 4 / 1219 | 6 / 813 | 8 / 609 |
| 9817 | 5061 (84.4) | 21.2 / 239 | 7 / 723 | 4 / 1265 | 6 / 844 | 7 / 723 |
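A minimal CUDA sketch of the per-cell computation is shown below. It is an illustration under simplifying assumptions, not the code benchmarked above: each thread weights all sample points with p = 2, whereas the measured implementation restricts each cell to its 12 nearest neighbors (a per-thread neighbor-selection step omitted here), and all names are hypothetical.

```cuda
// One thread per output grid cell; weights every sample point by inverse
// squared distance (p = 2) and writes the normalized weighted average.
__global__ void idw_kernel(const float *sx, const float *sy, const float *sv,
                           int nSamples, float *grid,
                           int width, int height, float cellSize)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= width || row >= height) return;

    float px = col * cellSize;               // cell-center coordinates
    float py = row * cellSize;
    float num = 0.0f, den = 0.0f;

    for (int i = 0; i < nSamples; ++i) {
        float dx = sx[i] - px, dy = sy[i] - py;
        float d2 = dx * dx + dy * dy;        // squared distance
        if (d2 < 1e-12f) {                   // cell sits on a sample point
            num = sv[i]; den = 1.0f; break;
        }
        float w = 1.0f / d2;                 // inverse-distance weight, p = 2
        num += w * sv[i];
        den += w;
    }
    grid[row * width + col] = num / den;
}
```

A typical launch covers the output grid with 16 × 16 thread blocks, e.g. `dim3 block(16, 16); dim3 grid((width + 15) / 16, (height + 15) / 16);`; the multi-GPU runs partition the output grid across devices.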
Scalable and High Performance Geospatial Computation (2)
Interpolation Using Kriging Algorithm on GPU and Keeneland
 Performance comparison across different data scales (i.e., numbers of sample points) and computing resources (times in seconds)
 Speedup is calculated as the time on a single CPU divided by the time on the GPU(s)
 Each output cell is interpolated from the values of its 10 nearest neighbors
 Output grid size: 1M+ cells
Three Kriging variants, a) spherical, b) exponential, and c) Gaussian, have been implemented on GPU/Keeneland; their standard variogram forms are given below.
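For reference, one common textbook parameterization of the three variogram models, with nugget c₀, partial sill c, and range a (the exact constants used in this work are not stated in the slides):

\[
\gamma_{\mathrm{sph}}(h) =
\begin{cases}
c_0 + c\left(\dfrac{3h}{2a} - \dfrac{h^3}{2a^3}\right), & 0 < h \le a \\[4pt]
c_0 + c, & h > a
\end{cases}
\]
\[
\gamma_{\mathrm{exp}}(h) = c_0 + c\left(1 - e^{-3h/a}\right),
\qquad
\gamma_{\mathrm{gau}}(h) = c_0 + c\left(1 - e^{-3h^2/a^2}\right)
\]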
Time and speedup on the desktop (Single CPU, Single GPU) and on Keeneland (1, 3, 6, and 9 GPUs). CPU times are in seconds, with minutes in parentheses; GPU cells show time (s) / speedup.

| Data Size | Single CPU | Single GPU | 1 GPU | 3 GPUs | 6 GPUs | 9 GPUs |
|---|---|---|---|---|---|---|
| 2191 | 669 (11.2) | 56 / 12 | 7 / 96 | 4 / 167 | 6 / 112 | 6 / 112 |
| 4596 | 1570 (26.2) | 66 / 24 | 8 / 196 | 5 / 314 | 6 / 262 | 7 / 224 |
| 6941 | 1960 (32.7) | 65 / 30 | 7 / 280 | 4 / 490 | 7 / 280 | 6 / 327 |
| 9817 | 2771 (46.2) | 52 / 53 | 6 / 462 | 4 / 693 | 7 / 396 | 6 / 462 |
Scalable and High Performance Geospatial Computation (3)
Parallelizing Cellular Automata (CA) on GPU and Keeneland (1)
 Cellular Automata (CA) are the foundation for geospatial modeling and simulation, such as SLEUTH for urban growth simulation
 Game of Life (GOL), invented by Cambridge mathematician John Conway, is a well-known generic CA consisting of a collection of cells that, based on a few mathematical rules, can live, die, or multiply.
The Rules:
 For a space that is 'populated':
 A cell with one or no living neighbors dies, as if by loneliness.
 A cell with four or more living neighbors dies, as if by overpopulation.
 A cell with two or three living neighbors survives.
 For a space that is 'empty' or 'unpopulated':
 A cell with exactly three living neighbors becomes populated.
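This update maps naturally to one GPU thread per cell. Below is a minimal CUDA sketch of a single GOL step, an illustration rather than the project's code (names are hypothetical; the multi-GPU Keeneland version would additionally exchange halo rows between devices):

```cuda
// One thread per cell; out-of-range neighbors are treated as dead.
__global__ void gol_step(const unsigned char *in, unsigned char *out,
                         int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int alive = 0;                            // count the 8 neighbors
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < width && ny >= 0 && ny < height)
                alive += in[ny * width + nx];
        }

    // Birth on exactly 3 neighbors; survival on 2 or 3; death otherwise.
    unsigned char self = in[y * width + x];
    out[y * width + x] = (alive == 3 || (self && alive == 2)) ? 1 : 0;
}
```

Each iteration launches the kernel once and swaps the input/output buffers; for the 10,000 × 10,000 grid below, the domain would be partitioned across the 20 GPUs.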
Scalable and High Performance Geospatial Computation (3)
Parallelizing Cellular Automata on GPU and Keeneland (2)
 Size of CA: 10,000 × 10,000
 Number of iterations: 100
 CPU time: ~100 minutes
 GPU [desktop] time: ~6 minutes
 Keeneland [20 GPUs]: 20 seconds
A cell is “born” if it has exactly 3 neighbors, stays alive if it has 2 or 3 living neighbors, and dies otherwise.
CPU: Intel Xeon 5110 @ 1.60 GHz, 3.25 GB of RAM
GPU: NVIDIA GeForce GTX 260 with 27 streaming multiprocessors (SMs)
 A simple SLEUTH model has been implemented on a single GPU
 Implementation on Kraken and Keeneland using multiple GPUs is under development
Scalable and High Performance Geospatial Computation (4)
Parallelizing ISODATA for Unsupervised Image Classification on Kraken (1)
Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA)
Performance comparison: ERDAS takes 3:44:37 (13,477 seconds) to read the image file (~2 minutes of that total) and classify one tile of 18 GB imagery data (0.5 m resolution, three bands).
Our solution on Kraken, using different numbers of cores with optimized stripe count and stripe size:

| Number of Cores | 144 | 324 | 576 | 900 |
|---|---|---|---|---|
| Stripe Count | 80 | 80 | 80 | 80 |
| Stripe Size (MB) | 10 | 10 | 10 | 10 |
| Read Time (s) | 5.66 | 5.13 | 2.94 | 2.77 |
| Classification Time (s) | 13.72 | 6.15 | 3.56 | 3.31 |
 It takes 20+ hours to load the data from GT into Kraken @ ORNL
 The more cores are requested, the longer the job waits in the queue
 ~10 seconds to complete the classification process
 I/O needs to be further optimized
Console output from four runs at increasing scale:

36 GB, 1,800 cores:
Tue Jun 12 12:48:37 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.918
Iteration 3: convergence = 0.938
Iteration 4: convergence = 0.954
---- Classification completed ----
The reading file time is 15.4807
The classification time is 9.2374
The total ISODATA algorithm running time is 24.7181
Histogram:
Class 0: 1124674113
Class 1: 1970406180
Class 2: 2845484626
Class 3: 2897947070
Class 4: 2298948648
Class 5: 1662539363
Application 1436660 resources: utime ~30211s, stime ~1215s (Tue Jun 12 12:49:06 EDT 2012)

72 GB, 3,600 cores:
Tue Jun 12 14:24:23 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.915
Iteration 3: convergence = 0.935
Iteration 4: convergence = 0.952
---- Classification completed ----
The reading file time is 28.6973
The classification time is 8.9810
The total ISODATA algorithm running time is 37.6782
Histogram:
Class 0: 2811537615
Class 1: 3715199078
Class 2: 5660559329
Class 3: 5766104126
Class 4: 4652035362
Class 5: 2994564490
Application 1439048 resources: utime ~78392s, stime ~2164s (Tue Jun 12 14:25:05 EDT 2012)

144 GB, 7,200 cores:
Tue Jun 12 15:39:10 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.919
Iteration 3: convergence = 0.936
Iteration 4: convergence = 0.953
---- Classification completed ----
The reading file time is 53.5952
The classification time is 9.1167
The total ISODATA algorithm running time is 62.7119
Histogram:
Class 0: 2811537615
Class 1: 8743937711
Class 2: 12122628756
Class 3: 11850984345
Class 4: 9714452352
Class 5: 5956459221
Application 1440071 resources: utime ~208415s, stime ~4110s (Tue Jun 12 15:40:18 EDT 2012)

216 GB, 10,800 cores:
Tue Jun 12 16:06:31 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.919
Iteration 3: convergence = 0.937
Iteration 4: convergence = 0.953
---- Classification completed ----
The reading file time is 47.8197
The classification time is 9.6519
The total ISODATA algorithm running time is 57.4716
Histogram:
Class 0: 2811537623
Class 1: 14137169249
Class 2: 18231156326
Class 3: 17844190199
Class 4: 14839032207
Class 5: 8936914396
Application 1440335 resources: utime ~275810s, stime ~6377s (Tue Jun 12 16:07:33 EDT 2012)
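The data-parallel pattern behind these runs can be sketched in a few lines: each MPI rank classifies its own block of pixels against the current class means, per-class partial sums are combined with MPI_Allreduce to update the means, and the loop stops once the fraction of unchanged pixel labels reaches the convergence threshold (0.95, consistent with the logs above). This is a minimal host-side illustration under assumed data layouts, not the project's code; all names are hypothetical.

```cuda
// Host-side C++ sketch of parallel ISODATA: pixels are block-distributed
// across MPI ranks; class means and the convergence fraction are global.
#include <mpi.h>
#include <vector>

void isodata(const unsigned char *px, long nLocal, int nBands, int k,
             std::vector<float> &means /* k * nBands, pre-seeded */,
             MPI_Comm comm)
{
    std::vector<int> labels(nLocal, -1);
    double converged = 0.0;
    while (converged < 0.95) {
        std::vector<double> sum(k * nBands, 0.0), cnt(k, 0.0);
        long unchanged = 0;
        for (long i = 0; i < nLocal; ++i) {
            int best = 0; double bestD = 1e300;
            for (int c = 0; c < k; ++c) {          // nearest class mean
                double d = 0.0;
                for (int b = 0; b < nBands; ++b) {
                    double diff = px[i * nBands + b] - means[c * nBands + b];
                    d += diff * diff;
                }
                if (d < bestD) { bestD = d; best = c; }
            }
            if (labels[i] == best) ++unchanged;    // label kept from last pass
            labels[i] = best;
            for (int b = 0; b < nBands; ++b)
                sum[best * nBands + b] += px[i * nBands + b];
            cnt[best] += 1.0;
        }
        // Combine partial sums, counts, and unchanged-label totals globally.
        MPI_Allreduce(MPI_IN_PLACE, sum.data(), k * nBands, MPI_DOUBLE, MPI_SUM, comm);
        MPI_Allreduce(MPI_IN_PLACE, cnt.data(), k, MPI_DOUBLE, MPI_SUM, comm);
        long totals[2] = { unchanged, nLocal };
        MPI_Allreduce(MPI_IN_PLACE, totals, 2, MPI_LONG, MPI_SUM, comm);
        for (int c = 0; c < k; ++c)
            if (cnt[c] > 0.0)
                for (int b = 0; b < nBands; ++b)
                    means[c * nBands + b] = float(sum[c * nBands + b] / cnt[c]);
        converged = double(totals[0]) / double(totals[1]);
    }
}
```

The I/O side (the read times above) is tuned separately through the Lustre stripe count and stripe size shown in the table.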
Scalable and High Performance Geospatial Computation (4)
Parallelizing ISODATA for Unsupervised Image Classification on Kraken (2)
Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA)
ISODATA timing on Kraken by number of 18 GB tiles and number of classes. Times are in seconds; I/O = file read time, CLS = classification time, Total = I/O + CLS, IR = number of iterations.

5 classes:

| # of tiles | I/O | CLS | Total | IR |
|---|---|---|---|---|
| 1 | 4.32 | 2.13 | 6.45 | 4 |
| 2 | 8.94 | 2.16 | 11.10 | 4 |
| 4 | 21.01 | 2.21 | 23.23 | 4 |
| 8 | 28.83 | 2.23 | 31.06 | 4 |
| 12 | 44.86 | 2.29 | 47.15 | 4 |

10 classes:

| # of tiles | I/O | CLS | Total | IR |
|---|---|---|---|---|
| 1 | 4.25 | 8.62 | 12.87 | 11 |
| 2 | 5.51 | 12.07 | 17.57 | 11 |
| 4 | 20.31 | 7.92 | 28.23 | 10 |
| 8 | 16.40 | 7.95 | 24.35 | 10 |
| 12 | 45.92 | 6.57 | 52.49 | |

15 classes:

| # of tiles | I/O | CLS | Total | IR |
|---|---|---|---|---|
| 1 | 6.00 | 18.13 | 24.13 | 13 |
| 2 | 9.02 | 15.09 | 24.11 | 12 |
| 4 | 14.80 | 13.41 | 28.21 | 13 |
| 8 | 28.67 | 14.78 | 43.46 | 14 |
| 12 | 58.31 | 9.43 | 67.74 | 9 |

20 classes:

| # of tiles | I/O | CLS | Total | IR |
|---|---|---|---|---|
| 1 | 17.16 | 11.32 | 28.47 | 11 |
| 2 | 16.40 | 7.95 | 24.35 | 10 |
| 4 | 28.95 | 7.41 | 36.36 | |
| 8 | 29.52 | 15.34 | 44.86 | 12 |
| 12 | 41.56 | 15.37 | 56.93 | 12 |
Performance comparison: to classify one tile of 18 GB imagery into 10, 15, and 20 classes, ERDAS takes about 5.5, 6.5, and 7.5 hours respectively to complete 20 iterations, with the convergence value still below 0.95.
Scalable and High Performance Geospatial Computation (5)
Near-repeat calculation for spatiotemporal analysis of crime events on GPU and Keeneland
 Through a re-engineering process, the near-repeat calculation was first parallelized onto an NVIDIA GeForce GTX 260 GPU, which takes about 48.5 minutes to complete one calculation plus 999 simulations of two-event chains over 30,000 events.
 By combining MPI and GPU programs, we can dispatch the simulation work onto multiple Keeneland nodes to accelerate the simulation process (a dispatch sketch follows below).
 Using 100 GPUs on Keeneland, the 1,000 simulations complete in about 264 seconds.
 With more GPUs, the simulation time could be reduced further.
One run of a 4+ event-chain calculation easily approaches or exceeds petascale (10^15) and even exascale (10^18) operation counts: choosing 4 events out of 30,000 already yields C(30,000, 4) ≈ 3.4 × 10^16 combinations.
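The MPI + GPU dispatch described above can be sketched as follows. Each MPI rank binds to one GPU and takes every nranks-th simulation; this is an illustrative skeleton, not the project's code, and run_simulation_on_gpu is a hypothetical placeholder for the near-repeat kernel launch.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical placeholder: would launch the near-repeat Monte Carlo
// kernel for permutation simId and accumulate its result.
static void run_simulation_on_gpu(int simId) { (void)simId; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int nDevices = 0;
    cudaGetDeviceCount(&nDevices);
    cudaSetDevice(rank % nDevices);      // one GPU per rank on each node

    const int nSims = 1000;              // observed run + 999 permutations
    for (int s = rank; s < nSims; s += nranks)
        run_simulation_on_gpu(s);        // round-robin work distribution

    MPI_Barrier(MPI_COMM_WORLD);         // wait for all simulations
    MPI_Finalize();
    return 0;
}
```

With 100 ranks/GPUs, each rank runs 10 simulations, consistent with the roughly 264-second total reported above.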
Thank you
Questions?