260_1 - GPUComputing.net

advertisement
Fast ARFTIS Reconstruction Algorithms using CUDA
Deqi Cui*, Ningfang Liao, Wenmin Wu, Boneng Tan, Yu Lin
Department of Optical and Electronical Engineering, Beijing Institute of Technology,
No.5 Zhongguancun South Street, Haidian District, Beijing, 100081, China
cuideqi@bit.edu.cn
Abstract. In order to realize the fast reconstruction processing of all-reflective Fourier Transform Imaging
Spectrometer (ARFTIS) data, the authors implement reconstruction algorithms on GPU using Compute
Unified Device Architecture (CUDA). We use both CUDA 1DFFT library and customization CUDA
parallel kernel to accelerate the spectrum reconstruction processing of ARFTIS. The results show that the
CUDA can drive hundreds of processing elements (‘manycore’ processors) of GPU hardware and can
enhance the efficiency of spectrum reconstruction processing significantly. Farther, computer with CUDA
graphic cards will implement real-time data reconstruction of ARFTIS alone.
Keywords. ARFTIS; Reconstruction; CUDA; Parallel.
1
Introduction
For hyperspectral Fourier transform imaging spectrometer (FTIS) the data have two distinct characteristics: one is a
large quantity of original data; the other is a large amount of calculation for reconstruction algorithm [1, 2]. This type of
imaging spectrometer has been adopted as optical payload in Chinese "Chang'e I" lunar exploration satellite and Chinese
"Huanjing I" environmental satellite. For the specificity of processing hyperspectral data, it will consume a lot of time,
one-track data takes one day, using the general-purpose computer currently. It is difficult to satisfy the demand of users,
especially for emergency command (such as 5•12 Wenchuan earthquake). Therefore, improving processing speed is
particularly important.
Graphics processing units (GPUs) originally designed for computer video cards have emerged as the most powerful
chip in a high-performance workstation. Unlike multicore CPU architectures, which currently ship with two or four cores,
GPU architectures are "manycore" with hundreds of cores capable of running thousands of threads in parallel [3]. From
the above, we can see the powerful computational capability of the GPU. Moreover, as the programmability and parallel
processing emerge, GPU begins being used in some non-graphics applications, which is general-purpose computing on
the GPU (GPGPU).
The emergence of CUDA (Compute Unified Device Architecture) technology can meet the demand of GPGPU in
some degree. CUDA brings the C-like development environment to programmers for the first time, which uses a C
compiler to compile programs, and provides some CUDA extended libraries [4-6]. Users needn’t map programs into
graphics APIs anymore, so GPGPU program development becomes more flexible and efficient. Our goal in this paper is
to examine the effectiveness of CUDA as a tool to express parallel computation in FTIS applications.
2
Interfere Data Reconstruction Theory
Firstly, introduce all-reflective Fourier transform imaging spectrometer (ARFTIS) designed by our team. ARFTIS
works on FTIS imaging principle. The optical structure of ARFTIS is shown in Figure 1. It consists of an all-reflected
three-mirror anastigmatic (TMA) telescope, an entrance slit, the Fresnel double mirror, the reflective collimating system,
the reflective cylindrical system and the focal plane array (FPA) detector. The incidence beam is first imaged onto the
entrance silt by the TMA telescope system, and then the wavefront is split by the Fresnel double mirror, and it is
collimated by the collimating system, then through the cylindrical system to image the interference fringes onto the FPA
[7]
.
Interfere data cube can be transformed to spectrum data cube by data reconstruction processing, which includes
interference frame filtering, window filtering, spectral reconstruction and phase-corrected etc.[8-10]. Spectral
reconstruction and window filtering are able to be accelerated befittingly. The following parts will introduce the basic
theories of them separately.
2
Deqi Cui*, Ningfang Liao, Wenmin Wu, Boneng Tan, Yu Lin
Fig. 1 All-Reflected Fourier Transform Imaging Spectrometer Principle
2.1
Spectral Reconstruction
The spectral cube can be obtained by using one-dimensional FFT transform to each line of each frame with in
interference cube separately. The data must be symmetric processing in order to carry out FFT transform, if the original
interference data is a single side acquisition. After symmetric processing we can obtained a double side interference data
I  l  , then the modulus of the Fourier transform of I  l  is the spectral of measurement B  
.
N 1
 2 lk 
B( )  F [ I (l )]   f (l )exp   j

N 

l 0
2.2
k=0, 1, 2, 3….N-1
(1)
Window Filtering
Theoretically, the monochromatic spectra B  (as impulse function) can be obtained by a Fourier transform from the
interferogram I l  (as cosine function), which the optical path difference is from the negative infinity to positive infinity.
But, in fact, the acquisition of FTIS I ' l  is only a limited optical path difference (  L ~ L ) within the interferogram,
which is equivalent to add a door function T l  to the cosine function, and the spectra will be B'   . After Inverse Fourier
Transform in interferogram, it is no longer an impulse function, but a sin c function, which generates errors in spectral
reconstruction. Therefore, window filtering is required before Fourier Transform. The window filtering function can be
selected from Bartlett (triangular), Blackman, Hamming and Hanning etc.
3
3.1
Parallel Optimization
CUDA Introduction
The hardware architecture can be seen from Figure 2. CUDA-enable cards contain many SIMD stream multiprocessors (SM), and each SM has also several stream processors (SP) and Super Function Unit (SFU). These cards use
on-chip memories and cache to accelerate memory access.
Fast ARFTIS Reconstruction Algorithms using CUDA
Fig. 2 CUDA Hardware Block Diagram
3
Fig. 3 Compiling CUDA
For software, CUDA mainly extended the driver program and function library. The software stack consists of driver,
runtime library, some APIs and NVCC compiler developed by NVIDIA [11]. In Figure 3, the integrated GPU and CPU
program is compiled by NVCC, and then the GPU code and CPU code are separated.
3.2
Implementation Spectral Reconstruction
For Fourier Transform, CUDA provide a library of CUFFT which contains many highly optimized function
interfaces[12], we can call these APIs simply. Before using the CUFFT library, users just need to create a Plan of FFT
transform, and then call the APIs. For FFT transform, device memory is needed to be allocated when creating the Plan
and the device memory will not vary in succeeding computations. For window filtering we use customization CUDA
parallel kernel to implementation matrix computing.
The framework designed is considered the two-dimensional irrelevance of reconstruction processing, that means the
reconstruction processing is parallelizable in space-dimensional. We designed parallel optimization reconstruction
arithmetic, which is parallel processing per frame interference data of a point spectral. The flow chart shows in Figure 4.
Beginning of Parallel Processing
Parallel Execution by Frame
Window Filtering
Spectral Reconstruction
Call CUFFT library
Fig. 4 Flow Chart of Parallel Optimization
Part Codes
// Allocate host memory for the signal
Complex* h_signal = (Complex*)malloc(sizeof(Complex) * SIGNAL_N * lines);
// Allocate device memory for signal
Complex* d_signal;
const unsigned int mem_size = sizeof(Complex) * SIGNAL_N * lines;
4
Deqi Cui*, Ningfang Liao, Wenmin Wu, Boneng Tan, Yu Lin
CUDA_SAFE_CALL(cudaMalloc((void**)&d_signal, mem_size));
// Copy host memory to device
CUDA_SAFE_CALL(cudaMemcpy(d_signal, h_signal, sizeof(Complex)*SIGNAL_N*lines,
cudaMemcpyHostToDevice));
// CUFFT plan
cufftHandle plan;
CUFFT_SAFE_CALL(cufftPlan1d(&plan, SIGNAL_N, CUFFT_C2C, 1));
Complex* p_signal = (Complex*)d_signal;
for (int line = 0; line < lines; line++)
{
// Transform signal and kernel
CUFFT_SAFE_CALL(cufftExecC2C(plan, (cufftComplex *)p_signal, (cufftComplex
*)p_signal, CUFFT_INVERSE));
p_signal += SIGNAL_N;
}
//Destroy CUFFT context
CUFFT_SAFE_CALL(cufftDestroy(plan));
// Copy device memory to host
CUDA_SAFE_CALL(cudaMemcpy(h_signal, d_signal, sizeof(Complex)*SIGNAL_N*lines,
cudaMemcpyDeviceToHost));
// cleanup memory
free(h_signal);
CUDA_SAFE_CALL(cudaFree(d_signal));
4
Experiment Results and Analysis
We experiment with simulation data, which is generate by FTIS Simulation Software (developed by CSEL, Beijing
Institute of Technology) The data parameters are as follows, the total pixel of each scene of interfere data cube is 512
(track through width) × 512 (track along width) × 512 (interference-dimensional, double side). The input data precision is
12-bit integer per pixel. The output spectral data is 115bands per point, saved as 16-bit float. Reconstruction processing
includes window filtering and 512 points FFT.
We carried out parallel optimization using NVIDIA GTX 280. The hardware specifications show in table 1. Program
development language is C++, and compiler is NVCC and the Microsoft Visual Studio 2008 compiler. The experiment
environment is Intel Core 2 Duo E8500 CPU (dual-core, 3.16G), Windows XP 32bit OS.
Table 1 NVIDIA GTX 280 Hardware Specifications
Texture processor clusters (TPC)
10
Streaming multiprocessors (SM) per TPC
3
Streaming processors (SP) per SM
8
Super function units (SFU) per SM
2
Total SPs in entire processor array
240
Peak floating point performance GFLOPS
933
Experiment with one scene data (512 frames × 512 lines × 512 points × 12bit). Interference pattern shows in Figure 5;
reconstruction image shows in Figure 6; spectral result shows in Figure 7;
Fast ARFTIS Reconstruction Algorithms using CUDA
5
Fig. 5 the Interference Pattern of One Scene
A
A
A
Fig. 6 Reconstruction Image, Left: Original Simulation Image;
Middle: 510nm Reconstruction Band Image; Right: 890nm Reconstruction Band Image.
Fig. 7 Reconstruction Spectrum of Point A
Left: Original Simulation Spectrum; Right: Reconstruction Spectrum.
From Figure 6 and Figure 7 we can see that the reconstructed image matched with the original simulation image, and
spectral characteristics is correct. However, there are some small difference between reconstructed spectra value and
original simulation spectra value. We consider it impact by the sampling precision of simulation and computing precision
of reconstruction, and it is in an acceptable error range.
The optimize results show in table 2. It can be clearly seen from Table 2 that the effect of speed-up on GPU
processing is obvious. Large memory block on GPU is better to accelerate the speed of stream processor shared bus
access than system memory.
6
Deqi Cui*, Ningfang Liao, Wenmin Wu, Boneng Tan, Yu Lin
Table 2 Optimize Results Compared CPU with GPU
Mode
Time(s) Speed-up
CPU
116.8
1
CPU + GPU 1MB block
8.3
14.07
CPU + GPU 512MB block
5.5
21.24
5
Conclusions
In this paper, we introduced novel implementations of ARFTIS reconstruction algorithms based on a novel
technology and new type hardware. Using CUDA programming and CUDA-enabled GPUs we can easily implement
basic reconstruction algorithms. It has been proved that the algorithm can enhance the efficiency of the spectrum
reconstruction processing significantly. However, the advantages in performance and efficiency on CUDA depend on
proper memory block allocated. It provides a feasible example for parallel optimizing of ARFTIS data reconstruction by
using CUDA, as well as providing a reference for using GPGPU computing effectively.
Future work includes implement more optional algorithms for ARFTIS. Another work is to optimize CUDA
efficiency on multi-core CPU.
Acknowledgment
The authors gratefully acknowledge the support of National Natural Science Foundation of China, No.60377042;
National 863 Foundation of China, No.2006AA12Z124; Aviation Science Foundation, No.20070172003 and Innovative
Team Development Plan of Ministry of Education, No.IRT0606; National 973 Foundation of China, No.2009CB7240054.
References
[1]
CHU Jianjun. The Study of Imaging Fourier Transform Spectroscopy[D]. Beijing: Beijing Institute of Technology, 2002. (in
Chinese)
[2]
Leonard John Otten III, R. Glenn Sellar, and Bruce Rafert. Mighty Sat II.1 Fourier-transform hyperspectral imager payload
performance[J]. Proc. SPIE, 1995, 2583: 566.
[3]
Hiroyuki Takizawa, Noboru Yamada, Seigo Sakai et al. Radiative Heat Transfer Simulation Using Programmable Graphics
Hardware [J]. Proceedings of the 5th IEEE/ACIS International Conference on Computer and Information Science, 2006, 0-76952613-6/06
[4]
NVIDIA, GPU Computing Programming a Massively Parallel Processor, 2006.
[5]
NVIDIA, NVIDIA CUDA Programming guide version 2.0, Available: http://www.developer.download.nvidia.com
[6]
J. Tölke, Implementation of a Lattice Boltzmann kernel using the Compute Unified Device Architecture, submitted to Computing
and Visualization in Science, 2007
[7]
Deqi Cui, Ningfang Liao, Ling Ma et al. Spectral calibration method for all-reflected Fourier transform imaging spectrometer
Proc. SPIE, 2008, 716022.
[8]
MA Ling. The Study of Spectral Reconstruction for Imaging Fourier Transform Spectrometer[D]. Beijing: Beijing Institute of
Technology, 2008.(in Chinese)
[9]
Griffiths PR. Fourier Transform Infrared Spectrometry[M]. New York: Wiley Interscience Publication, 1986.
[10] Andre J. Villemaire, Serge Fortin, Jean Giroux et al. Imaging Fourier transform spectrometer[J]. Proc. SPIE, 1995, 2480: 387.
[11] NVIDIA, The CUDA Compiler Driver NVCC, Available: http://www.developer.download.nvidia.com
[12] NVIDIA, CUFFT Library version 1.1, Available: http://www.developer.download.nvidia.com
Download