Wave Propagation Modeling enabled by GPUs in Underwater Acoustics
Paul Hursky
Heat, Light, and Sound Research Inc.
http://hlsresearch.com
GTC 2013, San Jose, CA
18‐21 March, 2013
Outline
• Introduction to underwater sound propagation – typical problems and numerical approaches
• Split‐step Fourier Parabolic Equation (SFPE) implemented in CUDA
• Example application for SFPE
Ocean is a thin layer at scales we are interested in
• Deep water propagation: 5‐by‐75 kilometers
• Shallow water coastal propagation: 100‐by‐5000 meters
We can take advantage of low angles – paraxial approximation.
Repertoire of models
• Wavenumber integral
• Normal modes
• Ray tracing and Gaussian beams
• Parabolic equation
• Finite elements and boundary elements (currently impractical at the range scales desired, except for small focused problems, like target scattering in a small volume)
Wavenumber Integration – Fast Field Programs (FFP)
• FFP is an integral transform technique for horizontally stratified media
• Total field is sum of particular solution (due to specified sources) and some linear combination of homogeneous solutions to depth‐separated wave equation
• Boundary conditions are used to determine coefficients of homogeneous solutions in total field at each wavenumber (i.e. at each value of the kr transform variable for range)
• Field is calculated using spectral (wavenumber) integral to transform these solutions (at each wavenumber) to spatial domain (range)
• Called FFP because early versions used FFT to evaluate the spectral integrals
• If integral is evaluated using contour integration, the discrete set of poles corresponds to the normal modes
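The spectral integral described above can be written out explicitly. This is a sketch in the notation of Jensen et al.; the symbol g(k_r, z) for the depth‐dependent solution at each horizontal wavenumber is a label chosen here, not one taken from the slides.

```latex
p(r,z) = \int_0^\infty g(k_r,z)\, J_0(k_r r)\, k_r \, dk_r
\;\approx\; \sqrt{\frac{1}{2\pi r}}\; e^{-i\pi/4} \int_0^\infty g(k_r,z)\, \sqrt{k_r}\; e^{i k_r r}\, dk_r ,
\qquad k_r r \gg 1
```

The second form keeps only the outgoing Hankel function and replaces it by its large‐argument asymptote. It is a Fourier integral in k_r, which is why it can be evaluated with an FFT on a uniform wavenumber grid.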
Normal modes
• Normal mode methods are closely related to wavenumber integral methods
• The unforced version of depth‐separated wave equation is solved using an eigenanalysis, in which the eigenvectors are the “normal modes” of vibration and the eigenvalues are the horizontal wavenumbers
• The field is calculated by summing contributions from each mode weighted according to the source distribution
• This can be viewed as evaluating the wavenumber integral using a sum of residues, where the poles are at the discrete set of mode horizontal wavenumbers
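As an illustration of the mode sum, here is a minimal sketch for the textbook ideal waveguide (constant sound speed, pressure‐release surface, rigid bottom), where the modes are known in closed form. The function names, the 100 Hz / 100 m example values, and the unit density are assumptions made for this sketch; a real normal mode code solves the eigenproblem numerically for an arbitrary profile.

```python
import cmath
import math

def ideal_waveguide_modes(freq_hz, depth_m, c=1500.0):
    """Return (kz, kr) pairs for the propagating modes of an ideal waveguide."""
    k = 2.0 * math.pi * freq_hz / c
    modes = []
    m = 1
    while True:
        kz = (m - 0.5) * math.pi / depth_m   # vertical wavenumber of mode m
        if kz >= k:                          # higher modes are evanescent
            break
        modes.append((kz, math.sqrt(k * k - kz * kz)))  # kr is the eigenvalue
        m += 1
    return modes

def mode_shape(kz, z, depth_m):
    """Normalized mode function (unit density assumed)."""
    return math.sqrt(2.0 / depth_m) * math.sin(kz * z)

def pressure(r, z, zs, modes, depth_m):
    """Far-field mode sum for a point source at depth zs (unit density)."""
    total = 0j
    for kz, kr in modes:
        total += (mode_shape(kz, zs, depth_m) * mode_shape(kz, z, depth_m)
                  * cmath.exp(1j * kr * r) / math.sqrt(kr))
    return 1j * cmath.exp(-1j * math.pi / 4) / math.sqrt(8.0 * math.pi * r) * total

MODES = ideal_waveguide_modes(100.0, 100.0)
print(len(MODES), "propagating modes")
print(abs(pressure(5000.0, 50.0, 25.0, MODES, 100.0)))
```

Each term of the sum is one residue of the wavenumber integral, evaluated at the pole k_r = kr of that mode.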
Fast and accurate at low frequency: Normal Mode, Wavenumber Integral, and Parabolic Equation Models
• Wavenumber integral and normal mode models do not handle range‐dependent oceans; nevertheless, they provide useful baselines for more complicated models
• Wavenumber integral, normal mode, and PE models are SINGLE FREQUENCY, so broadband solutions require multiple runs and an inverse Fourier transform to synthesize the time‐domain waveform; they are VERY accurate at low frequency, perhaps impractical at high frequency (> 3 kHz)
• We are going to present a Parabolic Equation model that we have implemented in CUDA – will go into details in several slides
• First, I’ll briefly cover ray trace and Gaussian beam models – these models are high‐frequency and inherently broadband; can produce broadband impulse response function and convolve an arbitrary source waveform with it
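The broadband workflow described above (one run per frequency, then an inverse Fourier transform, then convolution with a source waveform) can be sketched as follows. The function `single_frequency_model` is a hypothetical stand‐in for a single‐frequency model run; here it simply returns the transfer function of two delayed arrivals. The sign convention of the transform and the sample values are also assumptions of this sketch.

```python
import cmath
import math

def single_frequency_model(freq_hz, arrivals):
    """Hypothetical stand-in for one model run: transfer function of delayed arrivals."""
    return sum(a * cmath.exp(-2j * math.pi * freq_hz * tau) for tau, a in arrivals)

def synthesize_impulse_response(arrivals, n_samples, dt):
    """Run the 'model' once per DFT bin, then inverse-transform to the time domain."""
    spectrum = [single_frequency_model(k / (n_samples * dt), arrivals)
                for k in range(n_samples)]
    response = []
    for n in range(n_samples):
        acc = sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_samples)
                  for k in range(n_samples))
        response.append((acc / n_samples).real)
    return response

def convolve(x, h):
    """Direct convolution of a source waveform x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Two arrivals: (delay in seconds, amplitude); delays fall on sample boundaries.
ARRIVALS = [(0.010, 1.0), (0.025, -0.5)]
IMPULSE = synthesize_impulse_response(ARRIVALS, n_samples=64, dt=0.001)
RECEIVED = convolve([1.0, 0.5], IMPULSE)
```

The cost scales with the number of frequency bins, which is why single‐frequency models become expensive for wide bands, whereas a ray or beam model produces the arrival list directly.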
Ray tracing is a useful way to look at propagation, but produces artifacts; Gaussian beams do better
[Figure panels: single Gaussian beam, ray tracing, FFP reference solution]
Gaussian beam models are fast and accurate at high frequency
[Figure panels: Gaussian beams, ray tracing, reference solution]
Split‐step Fourier PE Model
• Exploits FFTs (thus CUDA FFT)
• Has been eclipsed by split‐step Padé (which uses a tridiagonal solver and can model wider angles), but Fourier PE is still used when the computational domain is large (deep ocean)
• Will show shallow and deep water benchmarks
• Will show timing comparisons, CPU vs GPU
Split‐step Fourier Parabolic Equation
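Helmholtz equation:
$$\frac{\partial^2 p}{\partial r^2} + \frac{1}{r}\frac{\partial p}{\partial r} + \frac{\partial^2 p}{\partial z^2} + k_0^2\, n^2(r,z)\, p = 0$$
Assume outgoing cylindrical wave and far field ($k_0 r \gg 1$):
$$p(r,z) = \Psi(r,z)\, H_0^{(1)}(k_0 r), \qquad H_0^{(1)}(k_0 r) \cong \sqrt{\frac{2}{\pi k_0 r}}\; e^{i\left(k_0 r - \frac{\pi}{4}\right)}$$
Left with:
$$\frac{\partial^2 \Psi}{\partial r^2} + 2 i k_0 \frac{\partial \Psi}{\partial r} + \frac{\partial^2 \Psi}{\partial z^2} + k_0^2\left(n^2 - 1\right)\Psi = 0$$
Paraxial approximation ($\frac{\partial^2 \Psi}{\partial r^2} \ll 2 i k_0 \frac{\partial \Psi}{\partial r}$) leaves standard parabolic equation (Hardin and Tappert, 1973):
$$2 i k_0 \frac{\partial \Psi}{\partial r} + \frac{\partial^2 \Psi}{\partial z^2} + k_0^2\left(n^2 - 1\right)\Psi = 0$$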
Jensen, Kuperman, Porter, and Schmidt, Computational Ocean Acoustics, Second Edition, Springer, 2011.
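The step from the Helmholtz equation to the envelope equation can be filled in as follows. This is a sketch following the cited text of Jensen et al.; it assumes that $k_0 = \omega/c_0$ is a reference wavenumber, $n = c_0/c(r,z)$ is the index of refraction, and $H \equiv H_0^{(1)}(k_0 r)$, definitions the slide does not spell out.

```latex
% Substitute p = \Psi H and use Bessel's equation H'' + H'/r + k_0^2 H = 0:
\frac{\partial^2 \Psi}{\partial r^2}
+ \left( \frac{2}{H}\frac{\partial H}{\partial r} + \frac{1}{r} \right) \frac{\partial \Psi}{\partial r}
+ \frac{\partial^2 \Psi}{\partial z^2}
+ k_0^2 \left( n^2 - 1 \right) \Psi = 0

% Far field: H \cong \sqrt{2/(\pi k_0 r)}\, e^{i(k_0 r - \pi/4)} gives
\frac{1}{H}\frac{\partial H}{\partial r} = i k_0 - \frac{1}{2r}
\quad\Longrightarrow\quad
\frac{2}{H}\frac{\partial H}{\partial r} + \frac{1}{r} = 2 i k_0
```

The 1/r terms cancel exactly once the asymptotic form of the Hankel function is used, which is what leaves the constant coefficient $2 i k_0$ on the range derivative.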
Split‐step Fourier Parabolic Equation
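Use Fourier transform (FFT) in depth to calculate $\partial\Psi/\partial z$ and $\partial^2\Psi/\partial z^2$, where $\tilde\Psi(r,k_z)$ is the transform of $\Psi(r,z)$:
$$\frac{\partial \Psi}{\partial z} \rightarrow i k_z\, \tilde\Psi(r,k_z), \qquad \frac{\partial^2 \Psi}{\partial z^2} \rightarrow -k_z^2\, \tilde\Psi(r,k_z)$$
This first order differential equation is basis for range step:
$$2 i k_0 \frac{\partial \tilde\Psi}{\partial r} - k_z^2\, \tilde\Psi + k_0^2\left(n^2 - 1\right)\tilde\Psi = 0$$
$$\Psi(r,z) = e^{\, i \frac{k_0}{2}\left(n^2 - 1\right)\Delta r}\; \mathcal{F}^{-1}\!\left\{ e^{-i \frac{\Delta r}{2 k_0} k_z^2}\; \mathcal{F}\{\Psi(r_0,z)\} \right\}, \qquad \Delta r = r - r_0$$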
Jensen, Kuperman, Porter, and Schmidt, Computational Ocean Acoustics, Second Edition, Springer, 2011.
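A single range step of the split‐step Fourier algorithm can be sketched in a few lines. This is an illustration only, not the author's CUDA code: a small pure‐Python radix‐2 FFT stands in for cuFFT or FFTW, the depth grid is treated as periodic, and there is no starter field or absorbing layer. The names `fft`, `ifft`, and `split_step`, and all grid values, are assumptions of this sketch.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two); unnormalized."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def ifft(spec):
    """Inverse FFT with 1/N normalization."""
    n = len(spec)
    return [v / n for v in fft(spec, inverse=True)]

def split_step(psi, n2, k0, dz, dr):
    """Advance the field psi(z) by one range step dr.

    psi : complex field samples on a uniform depth grid
    n2  : squared index of refraction at each depth sample
    """
    npts = len(psi)
    spec = fft(psi)
    for j in range(npts):
        m = j if j < npts // 2 else j - npts      # signed FFT bin index
        kz = 2.0 * math.pi * m / (npts * dz)      # vertical wavenumber
        spec[j] *= cmath.exp(-1j * kz * kz * dr / (2.0 * k0))   # diffraction
    field = ifft(spec)
    return [cmath.exp(1j * k0 * (n2j - 1.0) * dr / 2.0) * p     # refraction
            for n2j, p in zip(n2, field)]

# Usage: one 10 m step at 100 Hz on a 64-point grid with 2 m spacing.
K0 = 2.0 * math.pi * 100.0 / 1500.0
PSI0 = [cmath.exp(-((j - 32) * 2.0 / 10.0) ** 2) for j in range(64)]
PSI1 = split_step(PSI0, [1.0] * 64, K0, dz=2.0, dr=10.0)
```

Every operation is either an FFT or an independent per‐sample multiply, which is the property that makes the method map well onto a GPU.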
Split‐step Fourier Parabolic Equation
Range step can be split several ways:
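$$\frac{\partial \Psi}{\partial r} = (A + B)\,\Psi, \qquad A = \frac{i k_0}{2}\left(n^2 - 1\right), \qquad B = \frac{i}{2 k_0}\frac{\partial^2}{\partial z^2}$$
$$\Psi(r_0 + \Delta r, z) = e^{(A+B)\Delta r}\,\Psi(r_0, z) \;\approx\; e^{A\Delta r}\, e^{B\Delta r}\,\Psi(r_0, z)$$
$$\approx\; e^{\frac{A}{2}\Delta r}\, e^{B\Delta r}\, e^{\frac{A}{2}\Delta r}\,\Psi(r_0, z) \qquad \text{or} \qquad e^{\frac{B}{2}\Delta r}\, e^{A\Delta r}\, e^{\frac{B}{2}\Delta r}\,\Psi(r_0, z)$$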
Name “split‐step”: A is the refraction step and B is the diffraction step
Jensen, Kuperman, Porter, and Schmidt, Computational Ocean Acoustics, Second Edition, Springer, 2011.
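Why the choice of splitting matters can be seen on a toy problem. The sketch below replaces the refraction and diffraction operators by two small non‐commuting matrices (an assumption made purely for illustration) and compares a simple two‐factor splitting with a symmetric three‐factor splitting against the exact exponential of the sum.

```python
def matmul(x, y):
    """Product of two 2x2 complex matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]

def scale(x, s):
    return [[s * v for v in row] for row in x]

def expm(x, terms=30):
    """Matrix exponential by truncated Taylor series (fine for small norms)."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n in range(1, terms):
        term = scale(matmul(term, x), 1.0 / n)
        result = add(result, term)
    return result

def max_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))

# Toy stand-ins: A is diagonal (like a phase screen), B couples the entries.
A = [[1j, 0j], [0j, -1j]]
B = [[0j, 1j], [1j, 0j]]

def splitting_errors(h):
    """Return (simple, symmetric) splitting errors for one step of size h."""
    exact = expm(scale(add(A, B), h))
    simple = matmul(expm(scale(A, h)), expm(scale(B, h)))
    half = expm(scale(A, h / 2.0))
    symmetric = matmul(matmul(half, expm(scale(B, h))), half)
    return max_diff(simple, exact), max_diff(symmetric, exact)

for step in (0.1, 0.05):
    print(step, splitting_errors(step))
```

Halving the step reduces the simple splitting error by about a factor of four and the symmetric one by about a factor of eight, so the symmetric form allows larger range steps for the same accuracy at the cost of one extra phase multiply.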
Split‐step Fourier PE Model

Reasonably complete implementation:
• Starters: Gaussian, Greene, Thomson, normal mode, RAM self‐starter
• Variable density via reduced pressure, seabed attenuation
• Three forms of operator splitting, including Thomson‐Chapman splitting
Implemented on:
• Host CPU, in Matlab
• Host CPU, in C using FFTW ($562, single‐core and multiple core versions with OpenMP)
• NVIDIA GTX‐460, in CUDA C using CUDA FFT ($250, Fermi architecture, 336 cores, 1GB RAM)
Getting speedups of roughly 2x to 35x on GPU relative to CPU
Pekeris waveguide benchmark: Scooter vs. split step (Thomson starter, 40 degrees)
Wedge benchmark: RAM vs. split step (self‐starter)
Demo
• Shows “race” between CPU (FFTW, OpenMP) and GPU (NVIDIA CUDA and CUFFT) models, both with a real‐time display – CPU version gets a 10‐second head start
• Watch how fast the CPU version runs in first 10 seconds, and then compare with GPU version, which is also sharing computer with CPU version
• Full set of examples in YouTube video ‐ search for “Split‐step Fourier CPU vs GPU”
Race: CPU gets 10‐second head start
Dickins seamount
Examples compared: GPU vs CPU (m5rd, w500, dickins)
Desktop workstation
CPU: Intel Core i7 950 @ 3.07 GHz (4 cores, 8 threads)
GPU: GeForce GTX 460, 7 MPs x 48 cores = 336 cores, 1.35 GHz, CUDA 4.0

Case     GPU sec  CPU sec  Ratio  GPU GFLOP/s  CPU GFLOP/s  Ratio  GPU threads  CPU threads
m5rd     11.05    319.65   28.93  34.632023    1.197348     28.92  32           1
m5rd     9.83     343.03   34.90  38.945095    1.115724     34.91  64           2
m5rd     9.34     310.89   33.29  40.984165    1.231072     33.29  128          4
m5rd     9.25     336.18   36.34  41.371372    1.138483     36.34  256          8
dickins  3.11     68.69    22.09  38.460396    1.741282     22.09  32           1
dickins  2.83     67.8     23.96  42.318722    1.763977     23.99  64           2
dickins  2.71     57       21.03  44.200207    2.098401     21.06  128          4
dickins  2.69     64.93    24.14  44.387684    1.841962     24.10  256          8
w500     2.98     19.56    6.56   11.388378    1.733704     6.57   32           1
w500     2.89     18.61    6.44   11.719803    1.822122     6.43   64           2
w500     2.87     15.64    5.45   11.80256     2.167825     5.44   128          4
w500     2.87     15.96    5.56   11.806542    2.124061     5.56   256          8

Totals (sec): GPU 62.42, CPU 1637.94
Totals (min): GPU 1.04, CPU 27.30
Profiler screen shot – m5rd case, 128 threads per block
Predicting exposure of marine mammals to man‐made noise sources
• Software product (called Simple) developed under NOAA SBIR funding (Phase I and II)
• Used by NOAA to assess impact of man‐made noise on marine environment for environmental impact assessments
• Simple enables non‐acoustic specialists to produce reliable predictions of sound pressure levels due to a variety of sources (cargo ships, oil exploration ships, air guns, pile drivers)
• Given sources at particular locations, Simple calculates how loud sound gets at remote location, where marine mammals may be located
• Forms map of loudness relative to source locations
User selects site to work in
Each site has databases for:
• sound speed profiles (by month)
• bathymetry
• seabed type (hard, soft)
User places sources and “pods” of marine mammals on the map
User selects from 122 different source types and sets locations
User selects from 125 marine mammal species and sets locations and densities, or relies upon OBIS‐SEAMAP database of previous sightings
Appropriate model is run along each radial from each source to predict sound pressure levels in vertical planes
Gaussian beam model
Parabolic equation alternative: fast and accurate at low frequency and shallow water
Parabolic equation model
Gaussian beams are versatile, but here we see too few rays have made it past seamount
Gaussian beam model
Parabolic equation handles full wave effects like diffraction better, particularly at low frequency
Parabolic equation model
Once calculations are complete along all radials, map of relevant metric is displayed (e.g. “sound exposure level” or “peak pressure”)
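The two metrics named above have standard definitions in underwater acoustics, which can be sketched as follows. The slides do not say how Simple computes them, so this is a generic illustration; the function names and the sampled‐waveform input are assumptions, and the reference pressure is the usual underwater value of 1 µPa.

```python
import math

P_REF = 1.0e-6  # reference pressure in Pa (1 micropascal, underwater convention)

def sound_exposure_level(pressure_pa, dt):
    """Sound exposure level in dB re 1 uPa^2 s for a sampled pressure waveform."""
    exposure = sum(p * p for p in pressure_pa) * dt   # time integral of p^2, Pa^2 s
    return 10.0 * math.log10(exposure / (P_REF ** 2))

def peak_pressure_level(pressure_pa):
    """Peak sound pressure level in dB re 1 uPa."""
    peak = max(abs(p) for p in pressure_pa)
    return 20.0 * math.log10(peak / P_REF)

# Usage: a constant 1 Pa signal lasting 1 second, sampled at 1 kHz.
WAVEFORM = [1.0] * 1000
print(sound_exposure_level(WAVEFORM, dt=0.001))
print(peak_pressure_level(WAVEFORM))
```

Sound exposure level accumulates energy over the whole event, so it suits long or repeated sources such as pile driving, while peak pressure responds only to the single largest excursion.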
Challenges for CUDA (perhaps OptiX)
• 3D features such as bathymetric canyons and internal waves cannot be handled using Nx2D modeling and require 3D modeling formulations
• Acoustic communications and reverberation observed in multi‐static active sonar systems require significant bandwidth at high frequency
• Modeling acomms channel must handle reflections from moving ocean surface and from the seabed when the platform is moving – this produces time‐varying wideband Doppler effects
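The phrase "wideband Doppler" in the last bullet refers to the fact that, for a wide band, motion compresses or stretches the waveform in time instead of shifting all frequencies by the same amount. A first‐order sketch, assuming platform speed is much smaller than the sound speed (the function names and the 1500 m/s default are choices made here):

```python
def doppler_factor(closing_speed_mps, sound_speed_mps=1500.0):
    """First-order time-compression factor for a source closing at the given speed."""
    return 1.0 + closing_speed_mps / sound_speed_mps

def doppler_shift_hz(freq_hz, closing_speed_mps, sound_speed_mps=1500.0):
    """Frequency shift at one frequency; it grows in proportion to frequency."""
    return freq_hz * (doppler_factor(closing_speed_mps, sound_speed_mps) - 1.0)

# Usage: a platform closing at 3 m/s shifts an 8-12 kHz band unevenly.
for f in (8000.0, 10000.0, 12000.0):
    print(f, doppler_shift_hz(f, 3.0))
```

Because the shift differs across the band, a single frequency offset cannot describe the channel, and each surface or seabed reflection can carry its own, time‐varying factor.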
Summary
• Initial implementation of split‐step Fourier PE model served as proof‐of‐concept that GPGPU was a viable path for our work
• This work generated healthy interest from our sponsors
• We are engaged in several projects where GPGPU technology is an important ingredient
• Interested to see how GPGPU intersects with low‐power mobile hardware like Tegra for use in autonomous vehicles
Gaming laptop
CPU: Intel Core i7 Q 720 @ 1.60 GHz, quad core (4 cores, 8 threads)
GPU: GeForce GTS 360M, 12 MPs x 8 cores = 96 cores, 1.32 GHz, CUDA 4.0

Case     GPU sec  CPU sec  Ratio  GPU GFLOP/s  CPU GFLOP/s  Ratio  GPU threads  CPU threads
m5rd     35.52    374.81   10.55  10.776348    1.021144     10.55  32           1
m5rd     35.82    547.66   15.29  10.686275    0.69885      15.29  64           2
m5rd     36.19    486.57   13.44  10.574224    0.786596     13.44  128          4
m5rd     36.01    529.34   14.70  10.627154    0.723031     14.70  256          8
dickins  10.45    81.27    7.78   11.448703    1.471658     7.78   32           1
dickins  10.52    109.42   10.40  11.373087    1.093098     10.40  64           2
dickins  10.65    91.34    8.58   11.229033    1.309453     8.58   128          4
dickins  10.6     104.82   9.89   11.279705    1.141046     9.89   256          8
w500     8        23.02    2.88   4.239896     1.472965     2.88   32           1
w500     7.99     29.87    3.74   4.240794     1.134978     3.74   64           2
w500     8.02     25.17    3.14   4.226517     1.347116     3.14   128          4
w500     8.03     25.45    3.17   4.224125     1.332057     3.17   256          8

Totals (sec): GPU 217.8, CPU 2428.74
Totals (min): GPU 3.63, CPU 40.48
GPU server
CPU: Intel Xeon X5550 @ 2.67 GHz, dual quad core (8 cores, 16 threads)
GPU: GeForce GTX 285, 30 MPs x 8 cores = 240 cores, 1.48 GHz, CUDA 3.2

Case     GPU sec  CPU sec  Ratio  GPU GFLOP/s  CPU GFLOP/s  Ratio  GPU threads  CPU threads
m5rd     20.77    348.75   16.79  18.425264    1.097438     16.79  32           1
m5rd     20.58    334.06   16.23  18.599964    1.145707     16.23  64           2
m5rd     20.63    302.02   14.64  18.554523    1.267248     14.64  128          4
m5rd     20.66    315.26   15.26  18.525311    1.214019     15.26  256          8
dickins  6.21     80.19    12.91  19.275263    1.491578     12.92  32           1
dickins  6.19     73.58    11.89  19.335495    1.625393     11.90  64           2
dickins  6.2      56.91    9.18   19.29336     2.101644     9.18   128          4
dickins  6.21     61.4     9.89   19.256332    1.947969     9.89   256          8
w500     6.42     20.57    3.20   5.27944      1.64818      3.20   32           1
w500     6.44     18.72    2.91   5.264232     1.810779     2.91   64           2
w500     6.45     16.23    2.52   5.257461     2.088724     2.52   128          4
w500     6.47     15.27    2.36   5.243372     2.220888     2.36   256          8

Totals (sec): GPU 133.23, CPU 1642.96
Totals (min): GPU 2.22, CPU 27.38
Profiler screen shot – w500 case, 128 threads per block
Pekeris waveguide benchmark: RAM vs. split step (Thomson starter, 40 degrees)