Radar Pulse Compression Using the NVIDIA CUDA SDK Stephen Bash, David Carpman, and David Holl HPEC 2008 September 23-25, 2008 MIT Lincoln Laboratory 0839-118-1 This work is sponsored by the Air Force Research Laboratory under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and not necessarily endorsed by the United States Government. NVIDIA Compute Unified Device Architecture SDK • • • • Create custom kernels that run on GPU Extension of C language Provides driver- and runtime-level APIs Includes numerical libraries – CUFFT – CUBLAS • $/GFLOP GPU=$1.27 CPU=$29.18 HPEC 07: Benchmarking the NVIDIA 8800GTX with the CUDA Development Platform Michael McGraw-Herdeg, MIT Douglas P. Enright, The Aerospace Corporation B. Scott Michel, The Aerospace Corporation ©2007 The Aerospace Corporation MIT Lincoln Laboratory 0839-118-2 NVIDIA Compute Unified Device Architecture SDK • • • • Create custom kernels that run on GPU Extension of C language Provides driver- and runtime-level APIs Includes numerical libraries – CUFFT – CUBLAS • $/GFLOP GPU=$1.27 CPU=$29.18 HPEC 07: Benchmarking the NVIDIA 8800GTX with the CUDA Development Platform Michael McGraw-Herdeg, MIT Douglas P. Enright, The Aerospace Corporation B. Scott Michel, The Aerospace Corporation ©2007 The Aerospace Corporation MIT Lincoln Laboratory 0839-118-3 Radar Pulse Compression • Waveform design and processing Replica to achieve higher range resolution and sensitivity* Fast Time FFT • Processing consists of Fast Time IFFT convolution with FIR filter – Doppler tolerant (top): traditional frequency domain convolution – Doppler intolerant (bottom): additional FFT and Doppler correction required Slow Time FFT Doppler Correction Replica Fast Time FFT Fast Time IFFT Fast Time IFFT MIT Lincoln Laboratory 0839-118-4 * Skolnik, Radar Handbook, Second Edition. McGraw Hill Publishing, Boston, MA, 1990. GPU vs. CPU Comparison CPU vs GPU comparison in real-world conditions – – 2 GHz dual quad-core AMD Opterons vs eVGA eGeForce 8800 Ultra Memory transfer to and from GPU included in timing 1D FFT 500 3 490 2 480 1 0.5 0.3 1 4 16 64 Stage Time (ms) 27000 10368 4725 1960 1000 65536 32768 16384 8192 4096 2048 1024 GPU Speedup FFT Size • Processing Time Per Stage CPU GPU 60 50 40 30 20 10 0 256 Batch Size MIT Lincoln Laboratory 0839-118-5 Backups MIT Lincoln Laboratory 0839-118-6 Reference: $/GFLOP As of July 2007, these products represent the top of the line consumer CPU and graphics card according to floating point computational power: 1. Kentsfield Core 2 Extreme QX6800 37.7 GFLOPS – fastest CPU as of 7/16/2007 http://www.tomshardware.com/2007/07/16/cpu_charts_2007/page36.html $1100 – price as of March 10, 2008 http://www.google.com/products?q=Kentsfield+Core+2+Extreme+QX6800 $/GFLOPS = $29.18 Notes: Price excludes motherboard + power supply + memory + GPU 2. EVGA GeForce 8800 Ultra Superclocked (NVIDIA) 576 GFLOPS – theoretical peak http://en.wikipedia.org/wiki/GeForce_8_Series $730 – price as of March 10, 2008 http://www.google.com/products?q=768-P2-N887-AR&scoring=p $/GFLOPS = $1.27 Notes: Price includes 768 MB GDDR3 memory, but excludes: motherboard + power supply + CPU MIT Lincoln Laboratory 0839-118-7