High Productivity Computing System Program

Wavelet Spectral Dimension Reduction of Hyperspectral Imagery on a Reconfigurable Computer

Tarek El-Ghazawi1, Esam El-Araby1, Abhishek Agarwal1, Jacqueline Le Moigne2, and Kris Gaj3
1The George Washington University, 2NASA/Goddard Space Flight Center, 3George Mason University
{tarek, esam, agarwala}@gwu.edu, lemoigne@backserv.gsfc.nasa.gov, kgaj@gmu.edu
Objectives and Introduction
Investigate Use of Reconfigurable Computing for On-Board Automatic Processing of Remote Sensing Data
- Remote Sensing → Image Classification
- Applications: Land Classification, Mining, Geology, Forestry, Agriculture, Environmental Management, Global Atmospheric Profiling (e.g. water vapor and temperature profiles), and Planetary Space missions
- Types of Carriers: Spaceborne, Airborne
Types of Sensing
- Mono-Spectral Imagery → 1 band (SPOT ≡ panchromatic)
- Multi-Spectral Imagery → 10s of bands (MODIS ≡ 36 bands, SeaWiFS ≡ 8 bands, IKONOS ≡ 5 bands)
- Hyperspectral Imagery → 100s-1000s of bands (AVIRIS ≡ 224 bands, AIRS ≡ 2378 bands)
[Figure: Multispectral / Hyperspectral Imagery Comparison]
Different Airborne Hyperspectral Systems
[Figure: example imagery from different airborne hyperspectral systems - AISA, AURORA, AVIRIS, and GER]
Why On-Board Processing?
Problems:
- Complex pre-processing steps: image registration / fusion
- Large data volumes
- Large cost and complexity of the on-the-ground / Earth processing systems
- Large critical-decision latency
- Large data downlink bandwidth requirements

Solutions:
- Automatic On-Board Processing
  - Reduces the cost and the complexity of the on-the-ground / Earth processing system
  - Larger utilization by a broader community, including educational institutions
  - Enables autonomous decisions to be taken on-board → faster critical decisions
  - Applications: future reconfigurable web sensor missions; future Mars and planetary exploration missions
- Dimension Reduction*
  - Reduction of communication bandwidth
  - Simpler and faster subsequent computations

* Investigated pre-processing step
Why Reconfigurable Computers?
On-Board Processing Problems:
- High computational complexity; low performance of traditional processing platforms
- High form / wrap factors (size and weight) of parallel computing systems
- Low flexibility of traditional ASIC-based solutions
- High costs and long design cycles of traditional ASIC-based solutions

Solutions: Reconfigurable Computers (RCs)
- Higher performance (throughput and processing power) compared to conventional processors
- Lower form / wrap factors compared to parallel computers
- Higher flexibility (reconfigurability) compared to ASICs
- Lower costs and shorter time-to-solution compared to ASICs
Introduction
Data Arrangement
[Figure: the hyperspectral cube ("Hyper Image", e.g. 512 x 512 pixels x 224 bands) is rearranged into "Matrix Form" of Pixels x Bands, where Pixels ≡ Rows x Columns. The pixel dimension is the parallel computing scope and the second reconfigurable computing scope; the band dimension is the first reconfigurable computing scope.]
Data Arrangement (cnt’d)
[Figure: array form of the data. Each pixel's spectral signature is stored as Bands consecutive 8-bit samples (indices 0 .. Bands-1), and the signatures of pixels (0,0), (0,1), ..., (rows-1, cols-1) are laid out one after another, giving Pixels = Rows x Columns signatures in total.]
A sketch of this layout in C follows.
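The array form maps naturally onto a flat buffer in C. This is a minimal sketch, assuming 8-bit samples and the pixel-major ordering shown in the figure; the type and function names (HyperCube, sample_at, signature) are illustrative only, not from the original code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hyperspectral cube in "array form": one 8-bit spectral signature
 * (bands samples) per pixel, pixels ordered row-major. */
typedef struct {
    size_t rows, cols, bands;
    uint8_t *data;            /* rows * cols * bands samples */
} HyperCube;

/* Sample of band b at spatial location (r, c). */
static inline uint8_t sample_at(const HyperCube *img,
                                size_t r, size_t c, size_t b)
{
    size_t pixel = r * img->cols + c;       /* Pixels = Rows x Columns */
    return img->data[pixel * img->bands + b];
}

/* Pointer to the 1-D spectral signature of pixel p (0 .. rows*cols-1). */
static inline const uint8_t *signature(const HyperCube *img, size_t p)
{
    return &img->data[p * img->bands];
}
```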
Examples of Hyperspectral Datasets
AVIRIS: INDIAN PINES’92 (400x400 by 192 bands)
AVIRIS: SALINAS’98 (217x512 by 192 bands)
Dimension Reduction Techniques
- Principal Component Analysis (PCA):
  - The most common dimension reduction method
  - Does not preserve spectral signatures
  - Complex and global computations: difficult for parallel processing and hardware implementations
- Wavelet-Based Dimension Reduction:
  - Preserves spectral signatures
  - Suited to high-performance implementation
  - Simple and local operations
  - Multi-resolution wavelet decomposition of each pixel's 1-D spectral signature (preservation of spectral locality)
2-D DWT (1-level Decomposition)
[Figure: one level of the 2-D DWT built from the 1-D DWT. The 1-D DWT filters the input with a low-pass filter L and a high-pass filter H, each followed by downsampling by 2; applying this 1-D step along one image dimension and then the other yields the four subbands LL, HL, LH, and HH.]
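For reference, here is a minimal C sketch of the 1-D analysis step this filter bank is built from; the Haar filter taps are an illustrative assumption, since the deck does not name the wavelet family used.

```c
#include <stddef.h>

/* One 1-D DWT level: filter with L (low-pass) and H (high-pass),
 * then keep every second output sample (downsample by 2).
 * approx and detail must each hold n/2 samples; n is assumed even. */
static void dwt_1level(const float *x, size_t n,
                       float *approx, float *detail)
{
    /* Haar analysis filters, used here purely as an example. */
    const float L[2] = { 0.70710678f,  0.70710678f };
    const float H[2] = { 0.70710678f, -0.70710678f };

    for (size_t i = 0; i < n / 2; i++) {
        const float a = x[2 * i];
        const float b = x[2 * i + 1];
        approx[i] = L[0] * a + L[1] * b;   /* low-pass output  */
        detail[i] = H[0] * a + H[1] * b;   /* high-pass output */
    }
}
```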
2-D DWT (2-level Decomposition)
[Figure: the second level repeats the same L/H filtering and downsampling on the LL subband of the first level, so each additional level further decomposes only the approximation (low-pass) output of the previous level.]
Wavelet-Based vs. PCA
(Execution Time, 500 MHz P3)
Complexity: Wavelet-Based = O(MN); PCA = O(MN² + N³), where M is the number of pixels and N the number of bands (the N² term comes from the covariance computation and the N³ term from its eigendecomposition).

Execution time on Salinas'98 (seconds):

No. of PCs / Level of Decomp.    6/5      12/4     24/3     48/2     96/1
Wavelet                          7.696    7.677    7.631    7.715    9.003
PCA                             90.634   94.824  104.178  122.173  158.583
Wavelet-Based vs. PCA (cnt’d)
(Execution Time, 500 MHz P3)
Complexity: Wavelet-Based = O(MN); PCA = O(MN² + N³)

[Figure: pie charts of the execution-time breakdown (IO_R, Comp., IO_W). For the wavelet-based method, computation accounts for roughly 92% of the total time and I/O for the remainder; for PCA, computation is essentially 100% of the total time.]

Wavelet-based timing (seconds) vs. number of PCs / level of wavelet decomposition:

                6/5      12/4     24/3     48/2     96/1
Timer GLOBAL    7.696    7.677    7.631    7.715    9.003
IO_R            0.406    0.412    0.411    0.412    0.410
Comp.           7.253    7.190    7.069    6.692    7.939
IO_W            0.037    0.075    0.151    0.311    0.654

PCA timing (seconds) vs. number of PCs / level of wavelet decomposition:

                6/5      12/4     24/3     48/2     96/1
Timer GLOBAL   90.634   94.824  104.178  122.173  158.583
IO_R            0.423    0.395    0.395    0.394    0.394
Comp.          90.173   94.355  103.633  121.478  157.568
IO_W            0.038    0.074    0.150    0.301    0.621
Wavelet-Based vs. PCA (cnt’d)
(Classification Accuracy)
- Implemented on the HIVE (an 8-node Pentium Xeon Beowulf-type system): 6.5 times faster than the sequential implementation
- Classification accuracy similar to or better than PCA
- Faster than PCA
The Algorithm
PIXEL LEVEL (per-pixel level determination):
1. Decompose the spectral pixel by one more wavelet level, keeping the DWT coefficients (the approximation), and save the current level [a].
2. Reconstruct the individual pixel from the approximation back to its original length (reconstructed approximation).
3. Compute the correlation (Corr) between the original and the reconstructed signatures.
4. If Corr < Th is false, repeat from step 1 at the next level; otherwise take the current level [a] of wavelet coefficients and add the pixel's contribution to the global histogram.

OVERALL:
1. Read the data and the threshold (Th).
2. Compute the level for each individual pixel (PIXEL LEVEL above).
3. Remove outlier pixels.
4. Get the lowest level (L) from the global histogram.
5. Decompose each pixel to level L.
6. Write the data.
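The flow above can be summarized in a short C sketch. This is a minimal reading of the flowchart, assuming Haar-style averaging for the approximation, reconstruction by sample duplication, and a Pearson correlation test; the helper names and the exact stopping convention are illustrative, not taken from the original code.

```c
#include <math.h>
#include <stddef.h>

#define BANDS      192
#define MAX_LEVEL  5

/* One decomposition level: keep only the approximation (pairwise averages). */
static size_t decompose_once(const float *in, size_t n, float *out)
{
    for (size_t i = 0; i < n / 2; i++)
        out[i] = 0.5f * (in[2 * i] + in[2 * i + 1]);
    return n / 2;
}

/* Reconstruct an approximation of length n back to the original length
 * by sample duplication (detail coefficients assumed zero). */
static void reconstruct(const float *approx, size_t n, size_t orig_n, float *out)
{
    size_t rep = orig_n / n;
    for (size_t i = 0; i < orig_n; i++)
        out[i] = approx[i / rep];
}

/* Pearson correlation between two vectors of length n. */
static float correlation(const float *a, const float *b, size_t n)
{
    float sa = 0, sb = 0, sab = 0, saa = 0, sbb = 0;
    for (size_t i = 0; i < n; i++) {
        sa += a[i]; sb += b[i];
        sab += a[i] * b[i]; saa += a[i] * a[i]; sbb += b[i] * b[i];
    }
    float num = n * sab - sa * sb;
    float den = sqrtf((n * saa - sa * sa) * (n * sbb - sb * sb));
    return den > 0 ? num / den : 0.0f;
}

/* PIXEL LEVEL: deepest level whose reconstruction still correlates >= th. */
static int pixel_level(const float *pixel, float th)
{
    float cur[BANDS], next[BANDS], recon[BANDS];
    size_t n = BANDS;
    int level = 0;

    for (size_t i = 0; i < n; i++) cur[i] = pixel[i];

    for (int a = 1; a <= MAX_LEVEL; a++) {
        size_t m = decompose_once(cur, n, next);
        reconstruct(next, m, BANDS, recon);
        if (correlation(pixel, recon, BANDS) < th)
            break;                     /* Corr < Th: stop at previous level */
        level = a;
        for (size_t i = 0; i < m; i++) cur[i] = next[i];
        n = m;
    }
    return level;
}

/* OVERALL: histogram the per-pixel levels (hist must be zero-initialized);
 * the caller then removes outliers, takes the lowest populated level L,
 * and decomposes every pixel to level L. */
static void build_histogram(const float *pixels, size_t npixels,
                            float th, unsigned hist[MAX_LEVEL + 1])
{
    for (size_t p = 0; p < npixels; p++)
        hist[pixel_level(&pixels[p * BANDS], th)]++;
}
```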
Prototyping Wavelet-Based Dimension
Reduction of Hyperspectral Imagery
on a Reconfigurable Computer,
the SRC-6E
Hardware Architecture of SRC-6E
SRC Compilation Process
[Figure: the SRC compilation flow. Application sources (.c or .f files) are processed by the µP compiler and the MAP compiler; the µP compiler produces object (.o) files, while the MAP compiler emits HDL sources (.v files). These, together with user macro sources (.vhd or .v files), go through logic synthesis into netlists (.ngo files) and then place & route into configuration bitstreams (.bin files). The linker combines the object files and bitstreams into the application executable.]
Top Hierarchy Module
[Figure: top hierarchy module. The input pixel stream X (N pixels) and the threshold TH feed the DWT_IDWT unit, which produces the approximation levels L1:L5 and the reconstructed signals Y1:Y5. The Correlator compares X with each Yi and asserts the flags GTE_1:GTE_5, which the Histogram module accumulates to produce the selected Level; a MUX controlled by Llevel selects among the approximation outputs.]
Decomposition and Reconstruction Levels of Dimension
Reduction (DWT_IDWT)
[Figure: five cascaded decomposition levels (Level_1 .. Level_5). At each level the previous approximation is filtered with the low-pass filter L and downsampled by 2, producing L1 through L5 from the input X (L0). Each approximation Lk is reconstructed back to the original length through a chain of upsamplers (by 2) and reconstruction filters L', with delay elements D aligning the outputs, yielding the reconstructed signals Y1 through Y5.]
A software sketch of this structure follows.
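A minimal C sketch of this decompose-and-reconstruct cascade, again assuming Haar filters for L and L' (an illustrative choice) and ignoring the hardware delay elements D, which only align the outputs in time.

```c
#include <stddef.h>

#define BANDS  192
#define LEVELS 5

/* Cascade of LEVELS analysis stages (low-pass L + downsample by 2), each
 * followed by a reconstruction chain (upsample by 2 + synthesis filter L')
 * back to the original length, mirroring the DWT_IDWT block diagram. */
static void dwt_idwt(const float x[BANDS],
                     float y[LEVELS][BANDS])     /* Y1..Y5 reconstructions */
{
    float approx[BANDS];
    size_t n = BANDS;

    for (size_t i = 0; i < n; i++) approx[i] = x[i];

    for (int lev = 0; lev < LEVELS; lev++) {
        /* Analysis: Haar low-pass L + downsample by 2 (in place). */
        for (size_t i = 0; i < n / 2; i++)
            approx[i] = 0.70710678f * (approx[2 * i] + approx[2 * i + 1]);
        n /= 2;

        /* Reconstruction chain: (lev+1) stages of upsample + L'. */
        float rec[BANDS];
        size_t m = n;
        for (size_t i = 0; i < m; i++) rec[i] = approx[i];
        for (int s = 0; s <= lev; s++) {
            float up[BANDS];
            for (size_t i = 0; i < m; i++) {   /* zero details: L' spreads  */
                up[2 * i]     = 0.70710678f * rec[i];
                up[2 * i + 1] = 0.70710678f * rec[i];
            }
            m *= 2;
            for (size_t i = 0; i < m; i++) rec[i] = up[i];
        }
        for (size_t i = 0; i < BANDS; i++) y[lev][i] = rec[i];
    }
}
```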
FIR Filters (L, L’) Implementation
[Figure: transversal FIR filter. The input image samples D(i) shift through a chain of registers; each register output is multiplied by a coefficient C(1) .. C(n), and an adder sums the products to produce the output image sample F(i).]
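A minimal C equivalent of this transversal structure, assuming n coefficient taps and zero samples before the start of the line (the slide does not specify the boundary handling):

```c
#include <stddef.h>

/* Transversal FIR filter: F(i) = sum_k C(k) * D(i - k + 1).
 * In hardware, D(i) shifts through a register chain; each register output
 * feeds one multiplier C(k), and an adder sums the products. */
static void fir_filter(const float *d, size_t len,      /* input image D(i)  */
                       const float *c, size_t ntaps,    /* coefficients C(k) */
                       float *f)                        /* output image F(i) */
{
    for (size_t i = 0; i < len; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps; k++) {
            if (i >= k)                /* samples before the line start = 0 */
                acc += c[k] * d[i - k];
        }
        f[i] = acc;
    }
}
```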
Correlator Module
For two N-sample vectors A and B the correlator computes cross terms of the form
\[
\mathrm{term}_{AB} \;=\; N\sum_{i} A_i B_i \;-\; \Big(\sum_{i} A_i\Big)\Big(\sum_{i} B_i\Big)
\quad\Rightarrow\quad (16 + 2\log_2 N)\ \text{bits},
\]
and the squared correlation between the original signature x and a reconstruction y,
\[
\rho^2(x,y) \;=\; \frac{\mathrm{term}_{xy}^{\,2}}{\mathrm{term}_{xx}\,\mathrm{term}_{yy}}
\;\ge\; \left(\frac{TH}{2^{16}}\right)^{2},
\]
is tested without division or square roots: term_xy is squared (MULT) and shifted left by 32 bits, term_xx·term_yy is multiplied by TH² (MULT), and a comparator asserts GTE_i (incrementing Histogram_i) when
\[
\mathrm{term}_{xy}^{\,2}\cdot 2^{32} \;\ge\; \mathrm{term}_{xx}\,\mathrm{term}_{yy}\cdot TH^{2}.
\]
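A C model of this comparison, assuming 8-bit samples and a 16-bit threshold TH as on the slide; the 64-bit and 128-bit intermediates stand in for the wide hardware datapaths (`__int128` is a GCC/Clang extension).

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 (GTE_i asserted) when the squared correlation between the
 * original signature a and the reconstruction b is >= (TH / 2^16)^2,
 * using only multiplies, a 32-bit shift, and a compare. */
static int correlator_gte(const uint8_t *a, const uint8_t *b, size_t n,
                          uint16_t th)
{
    int64_t sa = 0, sb = 0, sab = 0, saa = 0, sbb = 0;

    for (size_t i = 0; i < n; i++) {
        sa  += a[i];
        sb  += b[i];
        sab += (int64_t)a[i] * b[i];
        saa += (int64_t)a[i] * a[i];
        sbb += (int64_t)b[i] * b[i];
    }

    /* term_AB = N * sum(A_i*B_i) - sum(A_i)*sum(B_i): 16 + 2*log2(N) bits. */
    int64_t term_xy = (int64_t)n * sab - sa * sb;
    int64_t term_xx = (int64_t)n * saa - sa * sa;
    int64_t term_yy = (int64_t)n * sbb - sb * sb;

    /* term_xy^2 * 2^32  >=  term_xx * term_yy * TH^2 */
    __int128 lhs = (__int128)term_xy * term_xy;
    lhs <<= 32;
    __int128 rhs = (__int128)term_xx * term_yy * ((uint32_t)th * th);
    return lhs >= rhs;
}
```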
Histogram Module
[Figure: the flags GTE_1 .. GTE_5 update the five histogram counters cnt_1 .. cnt_5 (one per decomposition level); the level selector examines the counters and outputs the selected Level.]
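A small C model of the histogram module. The selection rule shown here (the shallowest populated level, matching the algorithm's "lowest level from the global histogram") is an assumption; the slide itself shows only the counters and the level selector.

```c
#define NLEVELS 5

/* Histogram counters cnt_1 .. cnt_5, updated by the GTE_1 .. GTE_5 flags
 * produced by the correlator for each pixel. */
typedef struct {
    unsigned cnt[NLEVELS];
} Histogram;

static void histogram_update(Histogram *h, const int gte[NLEVELS])
{
    for (int i = 0; i < NLEVELS; i++)
        if (gte[i])
            h->cnt[i]++;
}

/* Level selector: lowest (shallowest) populated level once outliers have
 * been removed; returns 0 if no counter is populated. */
static int level_select(const Histogram *h)
{
    for (int i = 0; i < NLEVELS; i++)
        if (h->cnt[i] > 0)
            return i + 1;              /* levels are numbered 1 .. 5 */
    return 0;
}
```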
Resource Utilization and Operating Frequency
Measurements Scenarios
[Figure: timing of the µP functions and the MAP function. The µP side performs MAP allocation, reads the data, transfers it from common memory (CM) to on-board memory (OBM), the MAP function computes, the results are transferred from OBM back to CM, the data is written, and the MAP is freed; the transfer-in / compute / transfer-out sequence is repeated nstreams times. The streamed transfers and computations make up the end-to-end time (HW); adding FPGA configuration gives the configuration + end-to-end time (SW), adding the data reads and writes gives the end-to-end time with I/O, and the MAP allocation and release times bracket the whole run.]
A host-side sketch of this measurement loop follows.
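The same scenario as a host-side timing sketch. The map_allocate/dma_in/map_compute/dma_out/map_free calls are hypothetical stand-ins for the SRC host API, not its real function names.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical placeholders for the SRC host-side calls (stubs only). */
static void map_allocate(void) { /* MAP Alloc (allocation time)      */ }
static void map_free(void)     { /* MAP Free (release time)          */ }
static void dma_in(int s)      { (void)s; /* CM -> OBM transfer-in   */ }
static void map_compute(int s) { (void)s; /* computations on the MAP */ }
static void dma_out(int s)     { (void)s; /* OBM -> CM transfer-out  */ }

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    const int nstreams = 41;              /* 41 streams for Salinas'98 */

    map_allocate();                       /* allocation time           */

    double t0 = now_sec();
    for (int s = 0; s < nstreams; s++) {  /* repeated nstreams times   */
        dma_in(s);                        /* transfer-in               */
        map_compute(s);                   /* compute                   */
        dma_out(s);                       /* transfer-out              */
    }
    double t_hw = now_sec() - t0;         /* end-to-end time (HW); data
                                             reads/writes around this loop
                                             give the time with I/O    */
    map_free();                           /* release time              */

    printf("End-to-end HW time: %.3f s\n", t_hw);
    return 0;
}
```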
SRC Experiment Setup and Results
- Salinas'98: 217 x 512 pixels, 192 bands = 162.75 MB
- Number of streams = 41
- Stream size = 2730 voxels ≈ 4 MB

[Figure: stream pipelining. In the non-overlapped case each stream performs DMA-in, compute, and DMA-out in sequence (T_TOTAL = T_DMA-IN + T_COMPUTATIONS + T_DMA-OUT); in the overlapped case the DMA transfers of one stream overlap the computation of another.]

Non-overlapped streams (per stream):
- T_DMA-IN = 13.040 ms
- T_COMP = 0.62428 ms
- T_DMA-OUT = 22.712 ms
- T_Total = 1.49 s (all streams)
- Throughput = 109.23 MB/s

Overlapped streams:
- T_DMA = 35.752 ms
- T_COMP = 0.62428 ms
- Xc = T_COMP / T_DMA = 0.0175
- Throughput = 111.14 MB/s
- Speedup over non-overlapped = (1 + Xc) = 1.0175 (insignificant)
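As a quick consistency check of the figures above (a worked example derived from the slide's own numbers, not part of the original deck):
\[
\text{Throughput}_{\text{non-overlapped}} = \frac{162.75\ \text{MB}}{1.49\ \text{s}} \approx 109.2\ \text{MB/s},
\qquad
X_c = \frac{T_{\text{COMP}}}{T_{\text{DMA}}} = \frac{0.62428}{35.752} \approx 0.0175,
\]
\[
\text{Speedup}_{\text{overlapped}} \approx 1 + X_c = 1.0175,
\qquad
\text{Throughput}_{\text{overlapped}} \approx 109.23 \times 1.0175 \approx 111.1\ \text{MB/s}.
\]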
Execution Time
[Figure: execution time on Salinas'98 vs. level of decomposition (1 to 5) for the P3 (500 MHz), the Intel Xeon (1.8 GHz), and the SRC-6E with non-overlapped and overlapped streams. The software times range from roughly 8.6 s to 33.05 s, while the SRC-6E takes about 1.49 s (non-overlapped) and 1.47 s (overlapped) at every level of decomposition.]
Distribution of Execution Times
Speedup Results
[Figure: speedup of the SRC-6E over the P3 (500 MHz) and the Xeon (1.8 GHz) implementations on Salinas'98, with and without stream overlapping, plotted against the level of decomposition (1 to 5).]
Concluding Remarks
- We prototyped the automatic wavelet-based dimension reduction algorithm on a reconfigurable architecture
- Both coarse-grain and fine-grain parallelism are exploited
- We observed a 10x speedup using the P3 version of the SRC-6E; from our previous experience we expect this speedup to double with the P4 version of the SRC machine
- These speedup figures were obtained while I/O still dominates; the speedup can be increased by improving the I/O bandwidth of reconfigurable platforms