Randomized methods in lossless compression of hyperspectral data

Qiang Zhang,a V. Paúl Pauca,b and Robert Plemmonsc
a Wake Forest School of Medicine, Department of Biostatistical Sciences, Winston-Salem, North Carolina 27157
qizhang@wakehealth.edu
b Wake Forest University, Department of Computer Science, Winston-Salem, North Carolina 27109
c Wake Forest University, Departments of Mathematics and Computer Science, Winston-Salem, North Carolina 27109
Abstract. We evaluate recently developed randomized matrix decomposition methods for fast lossless compression and reconstruction of hyperspectral imaging (HSI) data. Simple random projection methods have been shown to be effective for lossy compression without severely affecting the performance of object identification and classification. We build upon these methods to develop a new double-random projection method that may enable security in the transmission of compressed data. For HSI data, the distribution of elements in the resulting residual matrix, i.e., the original data minus its low-rank representation, exhibits low entropy relative to the original data, which favors a high compression ratio. We show both theoretically and empirically that randomized methods combined with residual-coding algorithms can lead to effective lossless compression of HSI data. We conduct numerical tests on real large-scale HSI data that show promise in this case. In addition, we show that randomized techniques are applicable for encoding on resource-constrained on-board sensor systems, where the core matrix-vector multiplications can be easily implemented on computing platforms such as graphics processing units or field-programmable gate arrays. © 2013 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.7.074598]
Keywords: random projections; hyperspectral imaging; dimensionality reduction; lossless compression; singular value decomposition.
Paper 12486SS received Jan. 3, 2013; revised manuscript received Apr. 18, 2013; accepted for
publication Jun. 14, 2013; published online Jul. 30, 2013.
1 Introduction
Hyperspectral image (HSI) data are measurements of the electromagnetic radiation reflected from an object or a scene (i.e., materials in the image) at many narrow wavelength bands. Spectral information is important in many fields such as environmental remote sensing, monitoring of chemical/oil spills, and military target discrimination. For comprehensive discussions, see Refs. 1–3. HSI data are being gathered by sensors of increasing spatial, spectral, and radiometric resolution, leading to the collection of truly massive datasets. The transmission, storage, and processing of these large datasets present significant difficulties in practical situations as new-generation sensors are used. For example, for aircraft or the increasingly popular unmanned aerial vehicles carrying hyperspectral scanning imagers, the imaging time is limited by the data capacity and computational capability of the on-board equipment, since hundreds to thousands of pixels of hyperspectral data are collected, and often preprocessed, within 5 to 10 s.1 For real-time on-board processing, it would be desirable to design algorithms capable of compressing such amounts of data within 5 to 10 s, before the next section of the scene is scanned. This requirement makes it difficult to apply algorithms such as JPEG2000,4 three-dimensional (3-D) SPIHT,5 or 3-D SPECK,6 unless they are deployed on acceleration platforms such as a digital signal processor,7 graphics processing unit (GPU), or field-programmable gate array (FPGA). For example, Christophe and Pearlman8 reported over 2 min of processing time using 3-D SPIHT with random access for a 512 × 512 × 224 HSI dataset, including 30 s for the discrete wavelet transform.
0091-3286/2013/$25.00 © 2013 SPIE
Journal of Applied Remote Sensing
074598-1
Vol. 7, 2013
Zhang, Pauca, and Plemmons: Randomized methods in lossless compression of hyperspectral data
Dimensionality reduction methods can provide a means to deal with the computational difficulties of hyperspectral data. These methods often use projections to compress a high-dimensional data space represented by a matrix A into a lower-dimensional space represented by a matrix B, which is then factorized. For HSI processing, hundreds of bands of images can be grouped in a 3-D data array, also called a tensor or a datacube, which can be unfolded into a matrix A from which B is obtained and then factorized. Such factorizations are referred to as low-rank matrix factorizations, resulting in a low-rank matrix approximation to the original HSI data matrix A.2,9–11
However, dimensionality reduction techniques provide lossy compression, as the original data are not exactly represented or reconstructed from the lower-dimensional space. Recent efforts to provide lossless compression exploit the correlation structure within HSI data, encoding the residuals (original data minus approximation) after stripping off the correlated parts.12,13 Given the large number of pixels, such correlations are often restricted to spatially or spectrally local areas, whereas dimensionality reduction techniques essentially exploit the global correlation structure.
In this paper, we propose using randomized dimensionality reduction techniques to efficiently capture global correlation structure, combined with residual encoding as in Ref. 13, to provide lossless compression. The success of this approach requires that the distribution of the residual data have low entropy relative to the original, and, as will be observed in the experimental section, this appears to be the case for HSI data.
The most popular methods for low-rank factorization employ the singular value decomposition (SVD), e.g., Ref. 14, which underlies popular data analysis methods such as principal component analysis (PCA).15 Compared with algorithms that employ fixed basis functions, such as the 3-D wavelets in JPEG2000, 3-D SPIHT, and 3-D SPECK, the bases given by the SVD or PCA are data-driven and provide a more compact representation of the original data. Moreover, by the optimality of the truncated SVD (TSVD) low-rank approximation,14 the Frobenius norm of the residual matrix is also minimal, so a low entropy in its distribution may be expected. Both the SVD and PCA can be used to represent an n-band hyperspectral dataset with a data size equivalent to only k bands, where k ≪ n. For applications of the SVD and PCA in HSI, see Refs. 16–19. The main disadvantage of the SVD is its computation time: $O(mn^2)$ floating-point operations (flops) for an m × n matrix with m ≥ n (Ref. 20). With recent technology, HSI datasets can easily reach the megapixel or even gigapixel level, rendering the use of a full SVD impractical in real scenarios.
The recent development of probabilistic methods for approximating singular vectors and singular values has provided a way to circumvent the computational complexity of the SVD, though at the cost of optimality in the approximation.21 These methods begin by randomly projecting the original matrix to obtain a lower-dimensional matrix, while keeping the range of the original matrix asymptotically intact. The much smaller projected matrix is then factorized using a full-matrix decomposition such as the SVD, and the resulting singular vectors are backprojected to the original space. Compared with deterministic methods, probabilistic methods often offer lower computational cost while still achieving high-accuracy approximations (see Ref. 21 and the references therein).
Chen et al.22 have recently provided an extensive study of the effects of linear projections on the performance of target detection and classification of HSI. In their tests, they found that the dimensionality of hyperspectral data can typically be reduced to 1/5 to 1/3 of that of the original data without severely affecting the performance of classical target detection and classification algorithms. Compressive sensing approaches for HSI also take advantage of redundancy along the spectral dimension,11,17,23–25 and involve random projection of the data onto a lower-dimensional space. For example, Fowler17 proposed an approach that exploits compressive projections in sensors that integrate dimensionality reduction and signal acquisition, effectively shifting the computational burden of PCA from the encoder platform to the decoder site. This technique, termed compressive-projection PCA (CPPCA), couples random projections at the encoder with a Rayleigh–Ritz process for approximating eigenvectors at the decoder. In its use of random projections, this technique possesses a certain duality with the recently proposed randomized SVD (rSVD) approaches.19 However, CPPCA recovers coefficients of a known sparsity pattern in an unknown basis. Accordingly, CPPCA requires the additional step of eigenvector recovery.
In this paper, we present several randomized algorithms designed for on-board lossy and lossless compression of HSI. Our goals are to process hundreds of pixels of hyperspectral data within a time frame of 5 s and to achieve a lossless compression ratio (CR) close to 3. The remainder of the paper is organized as follows. In Sec. 2, we present several fast randomized methods for lossless compression and reconstruction, suitable for on-board and off-board (receiving station) processing. In Sec. 3, we apply the methods to a large HSI dataset to demonstrate the efficiency and effectiveness of the proposed methods. We conclude with some observations in Sec. 4.
2 Methodology
Randomized algorithms have recently drawn a large amount of interest,21 and here we exploit this approach specifically for efficient on-board lossless compression and data transfer, and off-board reconstruction, of HSI data. For lossless compression, the process is as follows:
1. Calculate a low-rank approximation of the original data using randomized algorithms.
2. Encode the residual (original data minus approximation) using standard integer or floating-point coding algorithms.
We present several randomized algorithms for efficient low-rank approximation. They can be written in fewer than 10 lines of pseudo-code, can be easily implemented on PC platforms, and may be ported to platforms such as GPUs or FPGAs. As readers will see, all of the large-scale computations involve only matrix-vector multiplications, and the more computationally intensive SVD computations involve only small-scale matrices.
In the encoding and decoding algorithms that follow, it is assumed that HSI data are collected in blocks of size $n_x \times n_y \times n$, where $n_x$ and $n_y$ are the numbers of pixels along the spatial dimensions and $n$ is the number of spectral bands. During compression, each block is first unfolded into a two-dimensional array of size $m \times n$, where $m = n_x n_y$, by stacking each slice of size $n_x \times n_y$ into a one-dimensional array of size $m \times 1$. The compact representation for each block can then be stored on board. See Sec. 3 for a more extensive discussion of compression of HSI data in blocks as the entire dataset is being gathered.
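The unfolding step can be sketched in NumPy as follows. This is a minimal illustration, assuming the block is held as an `(nx, ny, n)` array and that each band is flattened in row-major order; the paper does not specify the pixel ordering, and the helper names `unfold_block` and `fold_block` are hypothetical.

```python
import numpy as np


def unfold_block(block: np.ndarray) -> np.ndarray:
    """Unfold an (nx, ny, n) HSI block into an (m, n) matrix with m = nx * ny.

    Column b of the result is band b flattened in row-major order (an assumed
    ordering; any fixed ordering works if the decoder uses the same one).
    """
    nx, ny, n = block.shape
    return block.reshape(nx * ny, n)


def fold_block(A: np.ndarray, nx: int, ny: int) -> np.ndarray:
    """Inverse of unfold_block: restore the (nx, ny, n) block."""
    return A.reshape(nx, ny, A.shape[1])


if __name__ == "__main__":
    blk = np.arange(4 * 5 * 3, dtype=np.uint16).reshape(4, 5, 3)
    A = unfold_block(blk)
    print(A.shape)  # (20, 3)
```

Because `reshape` only reinterprets the memory layout, the fold/unfold pair is lossless and essentially free, which matters on resource-constrained hardware.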
We start by defining terms and notation. The SVD of a matrix $A \in \mathbb{R}^{m\times n}$ is defined as $A = U\Sigma V^T$, where $U$ and $V$ are orthonormal, with columns denoted $u_i$ and $v_i$, respectively. $\Sigma$ is a diagonal matrix with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$, where $p = \min(m, n)$. For some $k \le p$, the TSVD rank-$k$ approximation of $A$ is the matrix $A_k$ such that $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ contain the first $k$ columns of $U$ and $V$, respectively. The residual matrix obtained from the approximation of $A$ with $A_k$ is given by $R = A - A_k$. By the Eckart–Young theorem,14 $A_k$ is the optimal rank-$k$ approximation of $A$, minimizing the Frobenius norm of $R$.
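As a small numerical check of these definitions, the following sketch computes the TSVD with NumPy's dense SVD and confirms the Eckart–Young identity $\|R\|_F^2 = \sum_{i>k} \sigma_i^2$. It uses a random Gaussian matrix as stand-in data; the function name `tsvd` is hypothetical.

```python
import numpy as np


def tsvd(A: np.ndarray, k: int):
    """Rank-k truncated SVD A_k = U_k Sigma_k V_k^T and residual R = A - A_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k columns of U by sigma_i, then multiply by V_k^T.
    Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return Ak, A - Ak, s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 30))
    Ak, R, s = tsvd(A, k=5)
    # Eckart-Young: ||R||_F^2 equals the sum of the discarded sigma_i^2.
    print(np.linalg.norm(R, "fro") ** 2, np.sum(s[5:] ** 2))
```

The dense SVD here costs $O(mn^2)$, which is exactly what the randomized methods below try to avoid on large blocks.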
2.1 Single-Random Projection Method
Computing low-rank approximations of a large matrix using the SVD is prohibitive in most real-world applications. Randomized projections into lower-dimensional spaces provide a feasible way around this problem. Let $P = (p_{ij})_{m\times k_1}$ be a matrix of size $m \times k_1$ with random independent and identically distributed (i.i.d.) entries drawn from $N(0, 1)$. We define the random projection of the row space of $A$ onto a lower $k_1$-dimensional subspace as
$$B = P^T A. \tag{1}$$
If $P$ is of size $n \times k_1$, then $B = AP$ is a similar random projection of the column space of $A$.
Given a target rank k, Vempala26 uses such P matrices to propose an efficient algorithm for
computing a rank-k approximation of A. The algorithm consists of the following three simple steps:
1. Compute the random projection $B = \frac{1}{\sqrt{k_1}} P_1^T A$ for some $k_1 \ge c \log n/\epsilon^2$.
2. Compute the SVD, $B = \sum_{i} \lambda_i \hat u_i \hat v_i^T$.
3. Return $\tilde A_k \leftarrow A\left(\sum_{i=1}^{k} \hat v_i \hat v_i^T\right) = A \hat V_k \hat V_k^T$.
It is also shown in Ref. 26 that, with high probability, the approximation error of $\tilde A_k$ is bounded by
$$\|A - \tilde A_k\|_F^2 \le \|A - A_k\|_F^2 + 2\epsilon \|A_k\|_F^2, \tag{2}$$
where $A_k$ is the optimal rank-$k$ approximation provided by the TSVD. This bound shows that the approximation $\tilde A_k$ is near optimal for small $\epsilon$.
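The three steps of Vempala's algorithm translate almost line for line into NumPy. The sketch below is illustrative rather than the authors' Matlab code; it assumes the natural logarithm in $k_1 \ge c \log n/\epsilon^2$ (which reproduces the value $k_1 = 1199$ quoted later for $c = 5$, $\epsilon = 0.15$, $n = 220$), and the function name `vempala_lowrank` is hypothetical.

```python
import numpy as np


def vempala_lowrank(A, k, eps=0.15, c=5.0, rng=None):
    """Rank-k approximation A V_k V_k^T via one Gaussian random projection."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    k1 = int(np.ceil(c * np.log(n) / eps**2))         # k1 >= c log n / eps^2 (natural log assumed)
    P1 = rng.standard_normal((m, k1))                  # i.i.d. N(0, 1) entries
    B = (P1.T @ A) / np.sqrt(k1)                       # step 1: k1 x n projected matrix
    _, _, Vt = np.linalg.svd(B, full_matrices=False)   # step 2: SVD of the small matrix only
    Vk = Vt[:k, :].T                                   # top-k right singular vectors, n x k
    return (A @ Vk) @ Vk.T, Vk                         # step 3: A V_k V_k^T
```

Only the $k_1 \times n$ matrix $B$ is decomposed; the large matrix $A$ enters solely through matrix products, which is what makes the method attractive for on-board hardware.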
During HSI remote sensing data acquisition, Vempala's algorithm may enable lossy compression by efficiently computing and storing $A\hat V_k$ and $\hat V_k$ on board as the data are being gathered. The storage requirement of $A\hat V_k$ and $\hat V_k$ is proportional to $(m+n)k$, compared with $mn$ for the original data. For lossless compression, the residual $R = A - \tilde A_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board.
Encoding and decoding procedures using Vempala's algorithm are presented in Algorithms 1 and 2, respectively. For lossy compression, $\hat R$ may be ignored. Clearly, there is a tradeoff between the target rank $k$, which determines the size of $A\hat V_k$ and $\hat V_k$, and the compressibility of the residual $R$, which also depends on the type of data being compressed. Figure 1 illustrates this tradeoff, assuming that the entropy of the residual decreases as a scaled power law of the form $(k/\alpha)^{-s}$ for $s = 0.1, 0.2, \ldots, 2$ and a constant $\alpha$.
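The encode/decode round trip of Algorithms 1 and 2 can be sketched for integer-valued data as follows. Two assumptions are ours, not the paper's: the rank-$k$ approximation is rounded to integers so that the residual $R$ is an integer array, and the decoder recomputes $W\hat V_k^T$ with the same floating-point arithmetic as the encoder, so the rounded approximation matches bit for bit. Residual entropy coding (step 5) is omitted, and `encode_rp`/`decode_rp` are hypothetical names.

```python
import numpy as np


def encode_rp(A_int, k, k1, rng):
    """Sketch of Algorithm 1 for integer data (e.g., uint16 HSI counts)."""
    A = A_int.astype(np.float64)
    m, _ = A.shape
    P1 = rng.standard_normal((m, k1))
    B = (P1.T @ A) / np.sqrt(k1)                          # step 1: random projection
    Vk = np.linalg.svd(B, full_matrices=False)[2][:k].T   # step 2: top-k right singular vectors
    W = A @ Vk                                            # step 3: W = A V_k
    approx = np.rint(W @ Vk.T).astype(np.int64)           # rounded rank-k approximation (assumption)
    R = A_int.astype(np.int64) - approx                   # step 4: integer residual, to be entropy coded
    return Vk, W, R


def decode_rp(Vk, W, R, dtype=np.uint16):
    """Sketch of Algorithm 2: rebuild the same rounded approximation and add R."""
    approx = np.rint(W @ Vk.T).astype(np.int64)
    return (approx + R).astype(dtype)
```

Rounding before subtraction keeps the residual in the integer domain, so losslessness does not depend on how the coder treats floating-point values.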
Algorithm 1 On-Board Random Projections Encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$, target rank $k$, and approximation tolerance $\epsilon$.
Output: $\hat V_k$, $W$, $\hat R$
1. Compute $B = \frac{1}{\sqrt{k_1}} P_1^T A$, for some $k_1 \ge c \log n/\epsilon^2$.
2. Compute the SVD of $B$: $B = \sum_i \lambda_i \hat u_i \hat v_i^T$.
3. Construct the rank-$k$ approximation: $\tilde A_k = W \hat V_k^T$, where $W = A\hat V_k$.
4. Compute the residual: $R = A - \tilde A_k$.
5. Encode the residual as $\hat R$ with a parallel coding algorithm.
6. Store $\hat V_k$, $W$, and $\hat R$.
Algorithm 2 Off-Board Random Projection Decoder.
Input: $\hat V_k$, $W$, $\hat R$.
Output: The original matrix $A$.
1. Decode $R$ from $\hat R$ with a parallel decoding algorithm.
2. Compute the rank-$k$ approximation: $\tilde A_k = W \hat V_k^T$.
3. Reconstruct the original: $A = \tilde A_k + R$.
Fig. 1 Theoretical compressibility curves (compression ratio versus desired rank) when the entropy of the residual decreases as $(k/\alpha)^{-s}$ for $k = 2, \ldots, 300$, $s = 0.1, \ldots, 2$, and a constant $\alpha = 2$. The dashed line indicates a compression ratio of 1 (original data).
Matrix $P_1$ plays an important role in the efficient low-rank approximation of $A$. $P_1$ can be fairly large, depending on the prespecified value of $\epsilon$. For example, for $\epsilon = 0.15$, $c = 5$, and $n = 220$, $P_1$ requires $k_1 \ge 1199$ columns. However, $P_1$ is needed only once in the compression process and may be generated in blocks (see Sec. 3). In addition, the distribution of random entries in $P_1$ is symmetric, being a normal distribution. Zhang et al.27 relax this requirement to allow any distribution with finite variance. For a faster implementation, a circulant random matrix could also be effective,27,28 requiring storage of only one random vector.
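One way to realize the circulant option is sketched below: take $P_1^T$ to be the first $k_1$ rows of an $m \times m$ circulant matrix $C$ generated by a single random vector $c$, and apply it with FFTs. This particular partial-circulant construction is an assumption on our part; Refs. 27 and 28 may use a different variant (for example, with additional random sign flips). The function name `circulant_projection` is hypothetical.

```python
import numpy as np


def circulant_projection(A: np.ndarray, c: np.ndarray, k1: int) -> np.ndarray:
    """Return P1^T A, where P1^T is the first k1 rows of the circulant matrix C
    with C[i, j] = c[(i - j) % m].

    Only the length-m vector c is stored. Each column of C @ A is a circular
    convolution of c with a column of A, computed in O(m log m) via real FFTs.
    """
    m = A.shape[0]
    spec = np.fft.rfft(c)[:, None] * np.fft.rfft(A, axis=0)  # pointwise product of spectra
    CA = np.fft.irfft(spec, n=m, axis=0)                      # C @ A, size m x n
    return CA[:k1, :]                                         # keep the first k1 rows
```

Compared with a dense Gaussian $P_1$, storage drops from $mk_1$ numbers to $m$, at the price of a matrix that is structured rather than fully i.i.d.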
2.2 Double-Random Projections Method
A variant of the above low-rank approximation approach may be derived by introducing a second random projection, this time multiplying $A$ on the right:
$$B_2 = AP_2, \tag{3}$$
where $P_2 \in \mathbb{R}^{n\times k_2}$ has i.i.d. entries drawn from $N(0, 1)$ and $B_2 \in \mathbb{R}^{m\times k_2}$. Substituting $A$ in Eq. (3) with its rank-$k$ approximation $A\hat V_k \hat V_k^T$ yields
$$B_2 \approx A\hat V_k \hat V_k^T P_2. \tag{4}$$
Notice that $\hat V_k^T P_2$ has full row rank (almost surely, since its entries are themselves i.i.d. Gaussian and $k_2 \ge k$); hence its Moore–Penrose pseudo-inverse satisfies
$$(\hat V_k^T P_2)(\hat V_k^T P_2)^\dagger = I_k. \tag{5}$$
Right-multiplying both sides of Eq. (4) by $(\hat V_k^T P_2)^\dagger$ gives
$$B_2 (\hat V_k^T P_2)^\dagger \approx A\hat V_k. \tag{6}$$
A new rank-$k$ approximation of $A$ can then be obtained as
$$\hat A_k = B_2 (\hat V_k^T P_2)^\dagger \hat V_k^T \approx A\hat V_k \hat V_k^T \approx A. \tag{7}$$
As in Vempala's algorithm, the quality of this approximation depends on choosing a sufficiently large value of $k_2 \ge 2k + 1$ (see Ref. 27 for a more detailed discussion). We refer to this method as the double-random projection (DRP) approach for low-rank approximation.
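Equations (3) to (7) can be checked directly in a few lines. The sketch below assumes $\hat V_k$ is already available (in the test it comes from an exact SVD of a rank-3 matrix, in which case Eq. (7) recovers $A$ up to rounding); `drp_lowrank` is a hypothetical name.

```python
import numpy as np


def drp_lowrank(A, Vk, k2, rng):
    """Double-random projection, Eq. (7): A_hat = B2 (V_k^T P2)^+ V_k^T."""
    n, _ = Vk.shape
    P2 = rng.standard_normal((n, k2))   # second Gaussian projection (later used as a shared key)
    B2 = A @ P2                         # Eq. (3): m x k2 sketch of A
    G = np.linalg.pinv(Vk.T @ P2)       # k2 x k right inverse of V_k^T P2, Eq. (5)
    return B2 @ G @ Vk.T, B2, P2        # Eq. (7)
```

Note that the encoder never forms $A\hat V_k$ explicitly: $B_2$ plays its role through Eq. (6).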
During HSI remote sensing data acquisition, the DRP approach may enable lossy compression by efficiently computing and storing $B_2$, $\hat V_k$, and $P_2$ on board as the data are being gathered. The storage requirement for these factors is proportional to $(m+n)k_2 + nk$. For lossless compression, the residual $R = A - \hat A_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board. Encoding and decoding procedures based on DRP are presented in Algorithms 3 and 4, respectively. As in the single-random projection case, $\hat R$ may be ignored for lossy compression.
Algorithm 3 On-Board Double-Random Projections Encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$, target rank $k$, and approximation tolerance $\epsilon$.
Output: $B_2$, $\hat V_k$, $\hat R$.
1. Compute $B_1 = \frac{1}{\sqrt{k_1}} P_1^T A$ and $B_2 = AP_2$, for some $k_1 \ge c \log n/\epsilon^2$ and $k_2 \ge 2k + 1$.
2. Compute the SVD of $B_1$: $B_1 = \sum_i \lambda_i \hat u_i \hat v_i^T$.
3. Compute the rank-$k$ approximation: $\hat A_k = B_2 (\hat V_k^T P_2)^\dagger \hat V_k^T$.
4. Compute the residual: $R = A - \hat A_k$.
5. Code the residual as $\hat R$ with a parallel coding algorithm.
6. Store $B_2$, $\hat V_k$, and $\hat R$.
Algorithm 4 Off-Board Double-Random Projections Decoder.
Input: $B_2$, $\hat V_k$, $P_2$, $\hat R$.
Output: The original matrix $A$.
1. Decode $R$ from $\hat R$ with a parallel decoding algorithm.
2. Compute the low-rank approximation: $\hat A_k = B_2 (\hat V_k^T P_2)^\dagger \hat V_k^T$.
3. Reconstruct the original: $A = \hat A_k + R$.
At a slight loss of precision and increased storage requirement, the DRP encoding and decoding algorithms offer the additional advantage of secure data transfer if $P_2$ is used as a shared key between the remote sensing aircraft and the ground. It remains to be seen in future study whether this cipher is easily broken; for now, we regard it as lightweight security. In this case, $P_2$ could be generated and transmitted securely only once between the ground and the aircraft, and subsequent communication would not require transmission of $P_2$. Unlike in the single-random projection approach, interception of the factors $B_2$, $\hat V_k$, and $\hat R$ would not easily lead to a reconstruction of the original without $P_2$.
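The shared-key idea can be illustrated by regenerating $P_2$ from a secret seed on both sides and showing that a decoder with the wrong seed does not recover $A$. The key schedule (`make_key`, seeding NumPy's generator) is purely hypothetical and is not a vetted cryptographic construction; residual coding is again omitted.

```python
import numpy as np


def make_key(seed: int, n: int, k2: int) -> np.ndarray:
    """Hypothetical key schedule: both ends regenerate P2 from a shared secret seed."""
    return np.random.default_rng(seed).standard_normal((n, k2))


def drp_decode(B2, Vk, P2, R):
    """Algorithm 4 without the residual-decoding step: A = B2 (V_k^T P2)^+ V_k^T + R."""
    return B2 @ np.linalg.pinv(Vk.T @ P2) @ Vk.T + R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((300, 3)) @ rng.standard_normal((3, 40))
    A += 0.01 * rng.standard_normal(A.shape)
    Vk = np.linalg.svd(A, full_matrices=False)[2][:3].T
    P2 = make_key(42, 40, 7)
    B2 = A @ P2                                              # transmitted
    R = A - B2 @ np.linalg.pinv(Vk.T @ P2) @ Vk.T            # transmitted (after coding)
    print(np.allclose(drp_decode(B2, Vk, make_key(42, 40, 7), R), A))  # right key
    print(np.allclose(drp_decode(B2, Vk, make_key(7, 40, 7), R), A))   # wrong key
```

With the wrong key, $(\hat V_k^T P_2)(\hat V_k^T P_2')^\dagger$ is a random $k \times k$ matrix rather than $I_k$, so the reconstructed low-rank part is distorted.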
2.3 Randomized Singular Value Decomposition
The rSVD algorithm described by Halko et al.21 computes approximate matrix factorizations by random projections and separates the process into two stages. In the first stage, $A$ is projected into an $l$-dimensional space by computing
$$Y = A\Omega, \tag{8}$$
where $\Omega$ is a matrix of size $n \times l$ with random entries drawn from $N(0, 1)$. Then, for a given $\epsilon > 0$, a matrix $Q \in \mathbb{R}^{m\times l}$ whose columns form an orthonormal basis for the range of $Y$ is obtained such that
$$\|A - QQ^TA\|_2^2 \le \epsilon. \tag{9}$$
See Algorithms 4.1 and 4.2 in Ref. 21 for how $Q$ and $l$ may be computed adaptively. In the second stage, the SVD of the reduced matrix $Q^TA \in \mathbb{R}^{l\times n}$ is computed as $\tilde U\hat\Sigma\hat V^T$. Since $l \ll n$, computing the SVD of the reduced matrix is generally feasible. Matrix $A$ can then be approximated as
$$A \approx (Q\tilde U)\hat\Sigma\hat V^T = \hat U\hat\Sigma\hat V^T, \tag{10}$$
where $\hat U = Q\tilde U$ and $\hat V$ are orthonormal matrices. As such, Eq. (10) is an approximate SVD of $A$, and the range of $\hat U$ is an approximation to the range of $A$. See Ref. 21 for details on the choice of $l$, along with extensive numerical experiments using rSVD methods and a detailed error analysis of the two-stage method described above.
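A minimal version of the two-stage rSVD, Eqs. (8) to (10), is sketched below. It simplifies Ref. 21 by using a fixed sample size $l$ and a single QR factorization of $Y$ instead of the adaptive Algorithms 4.1 and 4.2; `rsvd` is a hypothetical name.

```python
import numpy as np


def rsvd(A: np.ndarray, l: int, rng) -> tuple:
    """Two-stage randomized SVD with a fixed sample size l (no adaptive choice of l)."""
    Omega = rng.standard_normal((A.shape[1], l))               # n x l Gaussian test matrix
    Y = A @ Omega                                              # Eq. (8): sample the range of A
    Q, _ = np.linalg.qr(Y)                                     # orthonormal basis of range(Y), m x l
    Ut, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)   # SVD of the small l x n matrix
    return Q @ Ut, s, Vt                                       # Eq. (10): U_hat = Q U_tilde
```

For an exactly low-rank $A$ with $l$ at least its rank, $QQ^TA = A$ and the factors reproduce $A$; for real data the error is governed by the discarded singular values, as analyzed in Ref. 21.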
The rSVD approach may also be used to specify HSI encoding and decoding compression algorithms, as shown in Algorithms 5 and 6. For lossy compression, $Q$ and $B = Q^TA$ need to be computed and stored on board. The storage requirement for these factors is proportional to $(m+n)l$. As in the previous cases, for lossless compression the residual may be calculated and compressed using an integer or floating-point coding algorithm.
Compared with the previous single- and double-random projection approaches, rSVD requires the computation of $Q$ but is also able to push the SVD calculation to the decoder. Since $l$ appears to be much smaller than $k_1$ and $k_2$ in practice, the encoder is able to store $Q$ and $B$ directly without any loss in approximation accuracy. Perhaps the key benefit of rSVD is that the low-rank approximation factors $\hat U$, $\hat\Sigma$, and $\hat V$ can be used directly for subsequent analysis such as PCA, clustering, etc.
Algorithm 5 Randomized SVD Encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$, and approximation tolerance $\epsilon$.
Output: $Q$, $B$, $\hat R$
1. Calculate $Y = A\Omega$, for some $l > k$.
2. Apply Algorithm 4.2 in Ref. 21 to obtain $Q$ from $Y$.
3. Compute $B = Q^TA$.
4. Compute the residual: $R = A - QB$.
5. Code $R$ as $\hat R$ with a parallel coding algorithm.
6. Store $Q$, $B$, and $\hat R$.
Algorithm 6 Randomized SVD Decoder.
Input: $Q$, $B$, and $\hat R$.
Output: The original matrix $A$ and its rank-$k$ approximate SVD $\hat U$, $\hat\Sigma$, $\hat V$.
1. Decode $R$ from $\hat R$ with a parallel decoding algorithm.
2. Compute the SVD: $B = \tilde U\hat\Sigma\hat V^T$.
3. Compute $\hat U = Q\tilde U$.
4. Compute the low-rank approximation: $\hat A_l = \hat U\hat\Sigma\hat V^T$.
5. Reconstruct the original: $A = \hat A_l + R$.
2.4 Randomized Singular Value Decomposition by DRP
The DRP approach can also be applied in the rSVD calculation by introducing
$$B_1 = P_1^T A, \tag{11}$$
where $P_1$ is of size $m \times k_1$ with entries drawn from $N(0, 1)$. Replacing $A$ with its rSVD approximation $QQ^TA$ leads to
$$B_1 \approx P_1^T QQ^TA. \tag{12}$$
Multiplying both sides on the left by the pseudo-inverse of $P_1^TQ$, we have
$$(P_1^TQ)^\dagger B_1 \approx Q^TA. \tag{13}$$
With this slight modification, the rSVD calculation in the encoder can proceed by using $(P_1^TQ)^\dagger B_1$ instead of $Q^TA$. The corresponding encoding algorithm is given in Algorithm 7. The decoder algorithm remains the same as in the rSVD case.
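The substitution in Eq. (13) can be sketched as follows. Both sketches $B_1 = P_1^TA$ and $Y = A\Omega$ need only one pass over $A$, after which $Q^TA$ is replaced by $W = (P_1^TQ)^\dagger B_1$. As in the rSVD sketch, a plain QR of $Y$ with fixed $l$ stands in for the adaptive Algorithm 4.2 of Ref. 21, and `rsvd_drp_encode` is a hypothetical name; residual coding is omitted.

```python
import numpy as np


def rsvd_drp_encode(A, l, k1, rng):
    """Sketch of Algorithm 7: W = (P1^T Q)^+ B1 stands in for Q^T A."""
    m, n = A.shape
    P1 = rng.standard_normal((m, k1))
    Omega = rng.standard_normal((n, l))
    B1 = P1.T @ A                          # Eq. (11), k1 x n (unscaled, as in Eq. (13))
    Q, _ = np.linalg.qr(A @ Omega)         # orthonormal basis of range(Y), Y = A Omega
    W = np.linalg.pinv(P1.T @ Q) @ B1      # Eq. (13): approximately Q^T A, l x n
    R = A - Q @ W                          # residual to be coded losslessly
    return Q, W, R
```

Because $P_1^TQ$ is a tall $k_1 \times l$ matrix with full column rank, its pseudo-inverse is a left inverse, which is exactly what Eq. (13) requires.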
3 Numerical Experiments
We have tested the encoding algorithms presented in Sec. 2 on a large, publicly available HSI dataset, namely Indian Pines, collected by AVIRIS over a 25 × 6 mi² portion of Northwest Tippecanoe County, Indiana, on June 12, 1992. The sensor has a spectral range of 0.45 to 2.5 μm over 220 bands, and the full dataset consists of a 2,678 × 614 × 220 image cube stored as unsigned 16-bit integers. Figure 2 shows the 100th band in grayscale.
A remote-sensing aircraft carrying hyperspectral scanning imagers can collect such a data cube in blocks of hundreds to thousands of pixels, each gathered within a few seconds.1 The size of each data block is determined by factors such as the ground sample distance and the flight speed.
To simulate this process, we unfolded the Indian Pines data cube into a large matrix $T$ of size 1,644,292 × 220 and then divided $T$ into nine blocks $A_i$, each of size $m \times n = 182{,}699 \times 220$. For simplicity, the last pixel in the original dataset was ignored. Each block $A_i$ was then compressed sequentially using the encoding algorithms of Sec. 2. In all cases, $A_i$ is converted from unsigned 16-bit integers to double precision before compression, and the compressed representation is converted back to unsigned 16-bit integers for storage.
All algorithms were implemented in Matlab, and the tests were performed on a PC platform with eight 3.2 GHz Intel Xeon cores and 12 GB of memory. In the implementation of Algorithm 1, the random matrix $P_1 \in \mathbb{R}^{m\times k_1}$ can be large, since $m = 182{,}699$ and the oversampling requirement $k_1 \ge c \log n/\epsilon^2$ can lead to a relatively large $k_1$, e.g., $k_1 = 1199$ when $c = 5$ and $\epsilon = 0.15$. To reduce the memory requirement, we implicitly represent $P_1$ in column blocks as $P_1 = [P_1^{(1)}\; P_1^{(2)}\; \cdots\; P_1^{(\nu)}]$ and implement the matrix multiplication $P_1^TA$ as a series of products $(P_1^{(j)})^TA$, generating and storing only one block of $P_1$ at a time.
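The block-wise evaluation of $P_1^TA$ can be sketched as below. The sketch assumes each block $P_1^{(j)}$ is regenerated from a per-block seed, so that the same $P_1$ could be reproduced later without ever being stored whole; the seeding scheme and the name `blocked_projection` are our assumptions, not details from the paper.

```python
import numpy as np


def blocked_projection(A: np.ndarray, k1: int, block_cols: int, seed: int) -> np.ndarray:
    """Compute P1^T A with P1 (m x k1) generated one column block at a time.

    Only an m x block_cols slice of P1 is in memory at once; block j is
    regenerated deterministically from the seed sequence [seed, j].
    """
    m, n = A.shape
    out = np.empty((k1, n))
    for j, start in enumerate(range(0, k1, block_cols)):
        stop = min(start + block_cols, k1)
        Pj = np.random.default_rng([seed, j]).standard_normal((m, stop - start))
        out[start:stop] = Pj.T @ A          # rows start:stop of P1^T A
    return out
```

With $m = 182{,}699$ and $k_1 = 1199$, a dense $P_1$ in double precision would take about 1.75 GB, so splitting it into, say, ten column blocks cuts the peak memory for $P_1$ by roughly a factor of ten.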
Algorithm 7 Randomized SVD by DRP Encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$, and approximation tolerance $\epsilon$.
Output: $Q$, $W$, and $\hat R$
1. Calculate $B_1 = P_1^TA$ and $Y = A\Omega$, for some $k_1 \ge c \log n/\epsilon^2$ and $l > k$.
2. Apply Algorithm 4.2 in Ref. 21 to obtain $Q$ from $Y$.
3. Compute the residual: $R = A - QW$, where $W = (P_1^TQ)^\dagger B_1$.
4. Code $R$ as $\hat R$ with a parallel coding algorithm.
5. Store $Q$, $W$, and $\hat R$.
3.1 Compressibility of HSI Data
As alluded to by the compressibility curves in Fig. 1, the effectiveness of low-rank approximation and residual encoding depends on (1) the compressibility of the data and (2) the effectiveness of dimensionality reduction in reducing the entropy of the residual as a function of the desired rank $k$. The first point can be demonstrated by computing high-accuracy approximations of the singular vectors and singular values of the entire Indian Pines dataset using the rSVD algorithm. Figure 3 shows the first eight singular vectors folded as images of size 2,678 × 614. Figure 4 shows the corresponding singular values up to the 20th value. As can be observed, a great deal of the information is encoded in the first six singular vectors and singular values, with the seventh singular vector appearing more like noise.
Fig. 2 The grayscale image of the 100th band.
Fig. 3 The first eight singular vectors, $\hat u_i$, shown as images.
Fig. 4 The singular spectrum of the full Indian Pines dataset: singular values $\sigma_i$ (log scale) up to the 20th value.
To address the second point, we compare the histogram of the original dataset with that of the residual produced by the rSVD encoder in Algorithm 5 with target rank $k = 6$. Figure 5(a) shows that values in the original dataset lie in the range [0, 0.4]. After rSVD encoding, the residual values roughly follow a Laplacian distribution in the range [−0.1, 0.1], as seen in Fig. 5(b). Moreover, 95.42% of the residual values are within the range [−0.0015, 0.0015] (note the log scale on the y-axis). This suggests that the entropy of the residual is significantly smaller than the entropy of the original dataset and that, as a consequence, the residual may be effectively encoded for lossless compression. Figure 5(c) shows the probability of observing a residual value $r$ greater than a given value $x$, i.e., $p(r > x)$, again indicating that the residuals are densely concentrated around zero.
3.2 Lossless Compression Through Randomized Dimensionality Reduction
We use the entropy of the residuals produced by each encoding algorithm as the information-theoretic lower bound, i.e., the minimum number of bits required to code the residuals, to estimate the amount of space needed to store a compressed residual. The entropy of the distribution of residual values is defined as
$$h(R) = -\int p(x)\log_2 p(x)\,dx, \tag{14}$$
where $p(x)$ is the probability density function of residual values. We estimate $h(R)$ by computing and scaling histograms of residual values [as in Fig. 5(b)].
Fig. 5 (a) The distribution of the original Indian Pines hyperspectral imaging (HSI) data values. (b) The distribution of residuals after subtracting the truncated SVD (TSVD) approximation from the original data. (c) The cumulative distribution of residuals after subtracting the TSVD approximation from the original data.
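For integer-valued residuals, the histogram-based estimate amounts to the discrete Shannon entropy of the observed value frequencies, in bits per element. The sketch below is one plausible reading of "computing and scaling histograms", assuming one histogram bin per distinct integer value; `residual_entropy_bits` is a hypothetical name.

```python
import numpy as np


def residual_entropy_bits(R) -> float:
    """Empirical Shannon entropy (bits per element) of an integer residual array,
    using one histogram bin per distinct value: the discrete analogue of Eq. (14)."""
    _, counts = np.unique(np.asarray(R).ravel(), return_counts=True)
    p = counts / counts.sum()              # normalized histogram
    return float(-(p * np.log2(p)).sum())
```

For example, a residual taking four values equally often has an entropy of 2 bits per element, while a constant residual has an entropy of 0 bits.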
We assume that, like the original data, the low-rank representation and the corresponding residual are stored in signed 16-bit integer format. The CR is then calculated by dividing the amount of storage needed for the original data by the amount of storage needed for the compressed data. As an example, for Algorithm 1, the outputs $\hat V_k$ and $W = A\hat V_k$ require space proportional to $(m+n)k$. If the entropy of the residual is $h(R)$ bits per element, then the lossless CR obtained using Algorithm 1 is calculated as
$$\mathrm{CR} = \frac{16mn}{16nk + 16mk + h(R)\,mn}. \tag{15}$$
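Eq. (15) is straightforward to evaluate. In the sketch below, the residual entropy of 5 bits per element used in the example is a hypothetical value chosen only to show the arithmetic for one Indian Pines block ($m = 182{,}699$, $n = 220$, $k = 6$); it is not a measured result. The function name `lossless_cr` is also hypothetical.

```python
def lossless_cr(m: int, n: int, k: int, h_bits: float, bits: int = 16) -> float:
    """Eq. (15): original bits / (bits for W (m x k) and V_k (n x k) + coded residual bits)."""
    return bits * m * n / (bits * n * k + bits * m * k + h_bits * m * n)


if __name__ == "__main__":
    # Hypothetical residual entropy of 5 bits/element; gives a CR of about 2.94.
    print(round(lossless_cr(182_699, 220, 6, 5.0), 3))
```

Because $m \gg n$, the $16mk$ term dominates the factor storage, while the residual term $h(R)\,mn$ usually dominates overall; reducing the residual entropy by even one bit per element changes the CR considerably.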
Figure 6 shows the lossless CRs obtained using all four encoding algorithms of Sec. 2 for each data block $A_i$. The target rank is $k = 6$ in all cases, and the numbers of columns in $P_1$ and $P_2$ are $k_1 = 1{,}000$ and $k_2 = 2k + 1 = 13$, respectively. Notice that the CRs are above 2.5 and close to 3, whereas Wang et al.13 indicated 3 as a good CR for HSI data. Readers should be aware that Fig. 6 shows only theoretical upper bounds of the lossless CRs, whereas those in Ref. 13 are actual ones. The CRs produced by the DRP variants are slightly lower than those of their counterparts. This is an expected result, as the advantage of DRP (Algorithm 3) lies in its easily implemented lightweight data security rather than in compression. Finally, high CRs above 4.5 may be achieved, as shown in Fig. 6 for the last data block. This block corresponds to segments of homogeneous vegetation, seen on the right side of Fig. 2, which have been extensively tested by classification algorithms.29
In addition to the theoretical upper bounds on the CRs presented in Fig. 6, we also combine the randomized methods with several popular lossless compression algorithms for coding the residuals. The chosen residual-coding methods are the Lempel-Ziv-Welch (LZW) algorithm,30 Huffman coding,31 arithmetic coding,32 and JPEG2000.33 Table 1 presents the mean lossless CRs over the nine blocks of HSI data, where columns correspond to the randomized methods and rows to the coding algorithms. The highest CR, 2.430, is achieved by combining the rSVD method (Algorithm 5) with JPEG2000. Given the rapid development of coding algorithms and the relatively limited and rudimentary coders tested here, the CR could likely be raised further by incorporating more advanced coders in future work.
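The combination of a low-rank model with a residual coder can be sketched end to end as follows. This is a minimal sketch under stated assumptions. The rank-$k$ basis comes from a generic randomized range finder in the style of Halko et al. (Ref. 21), not from the paper's Algorithm 1. Python's `zlib` module (DEFLATE, the algorithm behind gzip) stands in for the coders of Table 1. `encode_lossless` and `decode_lossless` are hypothetical names. Lossless recovery holds because the decoder recomputes the same rounded approximation and adds back the integer residual. That requires the product $W\hat{V}_k^T$ to be reproduced bit for bit, which is true here because the same arrays are reused. Across different hardware or BLAS builds, this would need to be guaranteed explicitly.

```python
import zlib
import numpy as np

def encode_lossless(A, k, oversample=5, seed=0):
    """Hypothetical encoder: randomized rank-k model plus a DEFLATE-coded integer residual."""
    rng = np.random.default_rng(seed)
    Af = A.astype(np.float64)
    # Sample the row space of A with a Gaussian test matrix: Y is n x (k + oversample).
    Y = Af.T @ rng.standard_normal((A.shape[0], k + oversample))
    Q, _ = np.linalg.qr(Y)
    # Best rank-k directions inside span(Q): right singular vectors of A Q.
    _, _, Zt = np.linalg.svd(Af @ Q, full_matrices=False)
    V = Q @ Zt[:k].T              # n x k with orthonormal columns (role of V_k)
    W = Af @ V                    # m x k, i.e., W = A V_k
    # Integer residual against the *rounded* approximation, so decoding is exact.
    R = A.astype(np.int64) - np.rint(W @ V.T).astype(np.int64)
    payload = zlib.compress(R.astype(np.int32).tobytes(), 9)
    return V, W, payload

def decode_lossless(V, W, payload, shape):
    """Rebuild A exactly from the low-rank factors and the coded residual."""
    approx = np.rint(W @ V.T).astype(np.int64)
    R = np.frombuffer(zlib.decompress(payload), dtype=np.int32).reshape(shape)
    return (approx + R).astype(np.uint16)

# Usage on a synthetic uint16 block (a hypothetical stand-in for an HSI block A_i).
rng = np.random.default_rng(1)
m, n, k = 2000, 64, 6
signal = rng.random((m, k)) @ rng.random((k, n)) * 1000
A = np.clip(np.rint(signal + rng.normal(0.0, 2.0, (m, n))), 0, 65535).astype(np.uint16)
V, W, payload = encode_lossless(A, k)
A_rec = decode_lossless(V, W, payload, A.shape)
print("exact:", np.array_equal(A_rec, A), "residual bytes:", len(payload), "raw bytes:", A.nbytes)
```

The factors $\hat{V}_k$ and $W$ would also have to be stored, as accounted for in Eq. (15). The sketch reports only the coded residual size.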
[Figure 6: lossless compression ratio (2 to 5) versus data block (1 to 9) for RP, DRP, rSVD, and rSVD-DRP.]
Fig. 6 The lossless compression ratios (CR) using Algorithms 1, 3, 5, and 7.
Journal of Applied Remote Sensing
074598-11
Vol. 7, 2013
Zhang, Pauca, and Plemmons: Randomized methods in lossless compression of hyperspectral data
Table 1 Mean lossless compression ratios (CRs) over the nine blocks of hyperspectral imaging (HSI) data, combining randomized methods with residual-coding algorithms.

| Coding algorithm | Algorithm 1 (RP) | Algorithm 3 (DRP) | Algorithm 5 (rSVD) | Algorithm 7 (rSVD-DRP) |
|---|---|---|---|---|
| LZW | 1.438 | 1.338 | 1.569 | 1.563 |
| Huffman coding | 2.353 | 2.022 | 2.328 | 2.316 |
| Arithmetic coding | 2.362 | 2.017 | 2.326 | 2.313 |
| JPEG2000 | 2.414 | 2.189 | 2.430 | 2.419 |
3.3 Optimal Compressibility
Optimal CRs using the randomized dimensionality reduction methods of Sec. 2 depend on appropriate selection of parameters such as the target rank and approximation error tolerances. A full parameter optimization for the Indian Pines dataset is beyond the scope of this paper. However, some information about optimality can be gleaned by observing how the CR changes as a function of the target rank $k$, with the other parameters fixed. From Eq. (15), the storage needed for the low-rank representation, $16(m + n)k$ bits, increases with $k$, while the entropy of the residual decreases. The two terms in the denominator therefore trade off against each other and yield an optimal $k$, which is often data dependent. Figure 7 shows this behavior for the Indian Pines dataset. The different curves correspond to different data blocks of the original dataset, and the solid red curve is the mean across all blocks. Our choice of $k = 6$ is seen to be near optimal.
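The trade-off in Eq. (15) can be reproduced on a small scale by sweeping $k$. This is a minimal sketch under stated assumptions. It uses the exact TSVD (`np.linalg.svd`) as the rank-$k$ model rather than a randomized algorithm. The data block is synthetic, with rank 3 plus Gaussian noise, and $h(R)$ is estimated as the zeroth-order empirical entropy. The optimum $k = 3$ is a property of this synthetic example only, not of the Indian Pines data.

```python
import numpy as np

def residual_entropy_bits(R):
    """Zeroth-order empirical entropy of an integer array, in bits per element."""
    _, counts = np.unique(R, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def cr_versus_rank(A, ks):
    """Evaluate Eq. (15) for each target rank in ks, using the exact TSVD as the rank-k model."""
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A.astype(np.float64), full_matrices=False)
    crs = []
    for k in ks:
        approx = np.rint((U[:, :k] * s[:k]) @ Vt[:k]).astype(np.int64)
        h = residual_entropy_bits(A - approx)          # entropy term falls with k
        crs.append(16 * m * n / (16 * n * k + 16 * m * k + h * m * n))  # storage term grows with k
    return np.array(crs)

# Synthetic block: rank-3 "spectra" plus Gaussian noise (hypothetical stand-in for HSI data).
rng = np.random.default_rng(0)
m, n = 400, 60
signal = rng.uniform(0, 100, (m, 3)) @ rng.uniform(0, 100, (3, n))
A = np.rint(signal + rng.normal(0.0, 2.0, (m, n))).astype(np.int64)

ks = list(range(1, 11))
crs = cr_versus_rank(A, ks)
best_k = ks[int(np.argmax(crs))]
print("CR by k:", np.round(crs, 2), "best k:", best_k)
```

Below the true rank, the residual still carries signal and its entropy is high. Above it, each extra component removes only a little noise entropy but costs $16(m+n)$ more bits, so the CR falls on both sides of the peak.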
[Figure 7: four panels, Algorithm 1 (RP), Algorithm 3 (DRP), Algorithm 5 (rSVD), and Algorithm 7 (rSVD-DRP), each plotting the CR (1 to 5) against the target rank k (0 to 100) for Blocks 1 to 9 and their mean.]
Fig. 7 CRs of the Indian Pines HSI dataset as a function of target rank $k$.

We can learn several things from the curves in Fig. 7. First, HSI data are compressible, but their compressibility depends on the right choice of $k$ in the presented algorithms. Hints for choosing $k$ can be found in the singular values and singular vectors. For example, in Fig. 3, the singular vectors after the sixth look increasingly like noise, which indicates that most of the information is contained in the first six singular vectors. Second, we have empirically demonstrated that the entropy of the residuals approximately follows the power law $(k/\alpha)^{-s}$, as illustrated in Fig. 1; hence, the CR curves have a pronounced peak, and the optimal $k$ is relatively easy to choose compared with flatter curves. Further tests are needed to develop robust methods for obtaining near-optimal CRs. The adaptive selection of the rank parameter in the rSVD calculation21 can serve as an important first step in this direction. Third, since the rSVD algorithm is near optimal in the Frobenius norm, i.e., in the mean-squared-error sense, the similar curves produced by the other randomized algorithms indicate that they share this near optimality.
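This section does not describe how the power-law parameters were estimated. One simple option, assumed here, is a least-squares line fit on log-log axes. Taking logarithms of $h(k) = (k/\alpha)^{-s}$ gives $\log h = -s\log k + s\log\alpha$. The slope therefore yields $s$, and the intercept divided by $s$ yields $\log\alpha$. The data below are noiseless and hypothetical ($\alpha = 40$, $s = 0.5$), so the fit should recover them.

```python
import numpy as np

def fit_power_law(ks, h):
    """Least-squares fit of h(k) ~ (k/alpha)^(-s) on log-log axes; returns (alpha, s)."""
    x = np.log(np.asarray(ks, dtype=float))
    y = np.log(np.asarray(h, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)   # y ~ slope * x + intercept
    s = -slope                               # slope = -s
    alpha = np.exp(intercept / s)            # intercept = s * log(alpha)
    return alpha, s

ks = np.arange(1, 21)
h = (ks / 40.0) ** (-0.5)      # hypothetical residual entropies (bits/element)
alpha, s = fit_power_law(ks, h)
print(f"alpha = {alpha:.3f}, s = {s:.3f}")
```

A larger fitted $s$ means the residual entropy falls faster with $k$, which is what produces the pronounced CR peak described above.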
To further justify this finding, we examine the suboptimality of the four randomized algorithms by comparing the Frobenius norms and entropies of their residuals with those of the exact TSVD. For the nine blocks of HSI data, Fig. 8(a) shows the ratio of the Frobenius norm of the exact-TSVD residual to the Frobenius norm of each algorithm's residual. Fig. 8(b) shows the corresponding ratio of residual entropies. A ratio of 1 means that an algorithm matches the exact TSVD, and higher ratios are better. In terms of the Frobenius norm, three of the four algorithms are fairly close to optimal, while Algorithm 3, the DRP algorithm, is noticeably less so. In terms of entropy, all four algorithms are fairly close to optimal, which explains why their CRs are all close to each other. Interestingly, Fig. 8(b) shows ratios higher than 1, meaning that in some cases these algorithms produce residuals with lower entropy than the exact TSVD.
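The Frobenius-norm ratio of Fig. 8(a) can be computed as sketched below. The basic randomized SVD is the generic Halko et al. scheme (Ref. 21), not necessarily the paper's Algorithm 5. The oversampling `p=5` and the single power iteration `q=1` are assumed values, and the test matrix is synthetic. Because the truncated SVD is the best rank-$k$ approximation in the Frobenius norm (Eckart-Young theorem), this ratio can never exceed 1. No such bound holds for the entropy ratio of Fig. 8(b), which is consistent with the ratios above 1 reported there.

```python
import numpy as np

def rsvd(A, k, p=5, q=1, seed=0):
    """Basic randomized SVD: Gaussian range finder with q power iterations (cf. Ref. 21)."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], k + p))
    for _ in range(q):
        Y = A @ (A.T @ Y)                  # power iteration sharpens the spectral decay
    Q, _ = np.linalg.qr(Y)
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ Ub[:, :k], s[:k], Vt[:k]

def frobenius_optimality(A, k, **kw):
    """||A - A_k(TSVD)||_F / ||A - A_k(rSVD)||_F, the kind of ratio plotted in Fig. 8(a)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r_exact = np.linalg.norm(A - (U[:, :k] * s[:k]) @ Vt[:k])   # Frobenius norm by default
    Ur, sr, Vtr = rsvd(A, k, **kw)
    r_rand = np.linalg.norm(A - (Ur * sr) @ Vtr)
    return r_exact / r_rand

# Synthetic rank-6 signal plus noise (hypothetical stand-in for an HSI block).
rng = np.random.default_rng(0)
A = 10 * rng.standard_normal((300, 6)) @ rng.standard_normal((6, 80)) + rng.standard_normal((300, 80))
ratio = frobenius_optimality(A, 6)
print(f"Frobenius optimality ratio: {ratio:.4f}")
```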
3.4 Time Performance of Randomized Dimensionality Reduction
If lossy compression of HSI data is acceptable, randomized dimensionality reduction methods can run in near real time. Figure 9 shows the time (in seconds) that each encoder of Sec. 2 takes to process each data block $A_i$, ignoring the residuals. All encoders take less than 5 s for each of the nine data blocks. The computation times of the RP encoder (Algorithm 1) and the DRP encoder (Algorithm 3) do not appear to differ significantly. Both take less than 2.4 s per data block, averaging about 2.3 s over all nine blocks. This translates to a mean throughput of $182{,}699 \times 220 \times 8 / 2.3 \approx 140$ MB/s, where the factor of 8 is the number of bytes per value because the original unsigned 16-bit integers are converted to double precision before processing. The green curve, corresponding to the rSVD encoder (Algorithm 5), shows the best performance. The black curve, corresponding to the rSVD-DRP encoder (Algorithm 7), is the slowest but still takes less than 5 s per block. The extra time is spent in step 3, computing the pseudo-inverse of $P_1^T Q$. Efficient non-Matlab implementations of the encoding algorithms presented in this paper, on platforms such as GPUs, would be expected to perform in real time. For lossless compression, our tests show that the low-entropy residuals can be compressed effectively with conventional tools such as gzip in less than 4 s per data block. Better-performing tools such as JPEG2000 can compress each block within 4.5 s. Huffman coding and arithmetic coding would take significantly longer without special acceleration platforms such as GPUs or FPGAs.

[Figure 8: (a) Frobenius norm optimality and (b) entropy optimality, each plotted against data block for Algorithms 1, 3, 5, and 7.]
Fig. 8 (a) Ratio of the Frobenius norm of the residual from the exact TSVD to that from each algorithm, for the nine blocks of HSI data. (b) The corresponding ratio of the residual entropies.

[Figure 9: computation time for lossy compression using Algorithms 1, 3, 5, and 7; time in seconds (1.5 to 5) versus data block, for RP, DRP, rSVD, and rSVD-DRP.]
Fig. 9 The computation time for lossy compression by Algorithms 1, 3, 5, and 7.
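The throughput figure for the RP and DRP encoders is simple arithmetic and can be checked as shown below. Reading the factor of 8 as bytes per double-precision value gives megabytes per second. The timing harness is hypothetical. It times a single Gaussian projection `A @ P` on a synthetic block and does not reproduce the paper's encoders. The 13 projection columns merely echo $k_2 = 13$, and measured numbers will depend on the machine and BLAS library.

```python
import time
import numpy as np

def throughput_mb_per_s(pixels, bands, bytes_per_value, seconds):
    """Megabytes (10**6 bytes) of input processed per second."""
    return pixels * bands * bytes_per_value / seconds / 1e6

# Figure quoted in the text: 182,699 pixels x 220 bands in double precision, 2.3 s per block.
print(f"{throughput_mb_per_s(182_699, 220, 8, 2.3):.1f} MB/s")   # prints 139.8 MB/s

def time_projection(A, k, seed=0):
    """Hypothetical harness: wall-clock time of one Gaussian projection A @ P, with P of size n x k."""
    P = np.random.default_rng(seed).standard_normal((A.shape[1], k))
    start = time.perf_counter()
    W = A @ P
    return W, time.perf_counter() - start

# Synthetic uint16 block converted to float64, mirroring the conversion described in the text.
block = np.random.default_rng(1).integers(0, 10_000, size=(20_000, 220), dtype=np.uint16).astype(np.float64)
W, seconds = time_projection(block, 13)
print(f"projection only: {throughput_mb_per_s(*block.shape, 8, seconds):.0f} MB/s")
```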
For comparison, we also ran 3-D-SPECK and 3-D-SPIHT on a 512 × 512 × 128 subset, and both algorithms needed over 2 min to provide lossless compression. Christophe and Pearlman also reported over 2 min of processing time using 3-D-SPIHT with random access for a similar-size dataset.8
4 Conclusions and Discussions
As HSI datasets grow in size, compression and dimensionality reduction for analytical purposes become increasingly critical for storage, data transmission, and subsequent postprocessing. This paper shows the potential of randomized algorithms for efficient and effective compression and reconstruction of massive HSI datasets. Building on the random projection and rSVD algorithms, we have further developed a DRP method that can serve as a standalone encoding algorithm or be combined with the rSVD algorithm. The DRP algorithm sacrifices a little CR while adding lightweight encryption security.
We have demonstrated that for a large HSI dataset, such as the Indian Pines dataset, theoretical CRs close to 3 are possible, while empirical CRs as high as 2.43 are achieved with the limited number of coding algorithms tested. We have also used the rSVD algorithm to estimate near-optimal target ranks simply from the approximate singular vectors. Choosing optimal parameters for dimensionality reduction with randomized methods is a topic for future research, and the adaptive rank selection method described in Ref. 21 offers an initial step in this direction. Regarding suboptimality, we have compared the randomized algorithms with the exact TSVD in terms of the Frobenius norm and the entropy of the residuals, and by both measures the algorithms appear empirically to be near optimal.
The presented randomized algorithms can be regarded as lossy compression algorithms, which must be combined with residual-coding algorithms to achieve lossless compression. We have shown empirically that, for HSI data, the residual (original data minus low-rank approximation) has significantly lower entropy than the original data. Conventional entropy-based methods for integer coding are expected to perform well on these low-entropy residuals. Integrating advanced residual-coding algorithms with the randomized algorithms is an important topic for future study.
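Entropy coders translate low residual entropy into short codes. As an illustration, a Huffman code built from the residual symbol frequencies has an average length $L$ satisfying $H \le L < H + 1$ bits per symbol, where $H$ is the zeroth-order entropy. The sketch below is a textbook Huffman construction that computes code lengths only. It is not the improved recursive-splitting method of Ref. 31, and the function names and sample residuals are hypothetical.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                          # a single symbol still needs 1 bit
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)         # merge the two lightest subtrees;
        w2, _, d2 = heapq.heappop(heap)         # every symbol inside them gets one bit deeper
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def avg_length_and_entropy(symbols):
    """Average Huffman code length and zeroth-order entropy, both in bits per symbol."""
    freq = Counter(symbols)
    total = sum(freq.values())
    lengths = huffman_code_lengths(symbols)
    avg = sum(freq[s] * lengths[s] for s in freq) / total
    H = -sum((c / total) * math.log2(c / total) for c in freq.values())
    return avg, H

# Hypothetical residual values concentrated around zero.
residuals = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 5 + [-2] * 5
avg, H = avg_length_and_entropy(residuals)
print(f"Huffman: {avg:.3f} bits/symbol, entropy: {H:.3f} bits/symbol")
```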
One concern with residual coding is speed. In this regard, recent developments in floating-point coding34 have shown throughputs as high as 75 Gb/s on a GPU. On an eight-core Xeon computer, we have observed throughputs near 20 Gb/s. Both throughputs should be sufficient for coding the required HSI residual data. Saving the residuals back as 16-bit integers can further reduce the computation time.
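Storing a residual as 16-bit integers instead of double precision cuts each value from 8 bytes to 2, which reduces the volume the residual coder must process. The sketch below is minimal and rests on our own assumptions. It checks that the residual is integer-valued, then picks `int16` when the values fit. The `int32` fallback is also our assumption, added because a uint16 pixel minus its approximation can fall outside the int16 range. `pack_residual` is a hypothetical name.

```python
import numpy as np

def pack_residual(R):
    """Store an integer-valued residual in the narrowest signed type (int16, else int32) that is lossless."""
    Ri = np.rint(R).astype(np.int64)
    if not np.array_equal(Ri, R):
        raise ValueError("residual is not integer-valued; cannot pack losslessly")
    info = np.iinfo(np.int16)
    if info.min <= Ri.min() and Ri.max() <= info.max:
        return Ri.astype(np.int16)
    return Ri.astype(np.int32)

R = np.array([[-3.0, 0.0], [5.0, 120.0]])   # float64 residual holding integer values
packed = pack_residual(R)
print(packed.dtype, R.nbytes, "->", packed.nbytes, "bytes")
```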
Acknowledgments
Research by R. Plemmons and Q. Zhang is supported by the U.S. Air Force Office of Scientific
Research (AFOSR), under Grant FA9550-11-1-0194.
References
1. M. T. Eismann, Hyperspectral Remote Sensing, SPIE Press, Bellingham, WA (2012).
2. H. F. Grahn and E. Paul Geladi, Techniques and Applications of Hyperspectral Image
Analysis, John Wiley & Sons Ltd., West Sussex, England (2007).
3. J. Bioucas-Dias et al., “Hyperspectral unmixing overview: geometrical, statistical, and
sparse regression-based approaches,” IEEE J. Sel. Top. Appl. Earth Obs. Rem. Sens. 5(2),
354–379 (2012).
4. J. Zhang et al., “Evaluation of jp3d for lossy and lossless compression of hyperspectral
imagery,” in 2009 IEEE Int. Geoscience and Remote Sensing Symposium, IGARSS
2009, Vol. 4, pp. IV-474, IEEE, Cape Town, South Africa (2009).
5. B. Kim, Z. Xiong, and W. Pearlman, “Low bit-rate scalable video coding with 3-d set partitioning in hierarchical trees (3-D SPIHT),” IEEE Trans. Circuits Syst. Video Technol.
10(8), 1374–1387 (2000), http://dx.doi.org/10.1109/76.889025.
6. X. Tang, S. Cho, and W. Pearlman, “Comparison of 3D set partitioning methods in hyperspectral image compression featuring an improved 3D-SPIHT,” in Proc. Data Compression
Conf., 2003, DCC 2003, p. 449, IEEE, Snowbird, UT (2003).
7. Y. Langevin and O. Forni, “Image and spectral image compression for four experiments on
the ROSETTA and Mars express missions of ESA,” in Int. Symp. Optical Science and
Technology, pp. 364–373, SPIE Press, Bellingham, WA (2000).
8. E. Christophe and W. Pearlman, “Three-dimensional SPIHT coding of volume images with
random access and resolution scalability,” J. Image Video Process., 13, Article 2 (2008),
http://dx.doi.org/10.1155/2008/248905.
9. J. C. Harsanyi and C. Chang, “Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach,” IEEE Trans. Geosci. Rem. Sens. 32(4),
779–785 (1994), http://dx.doi.org/10.1109/36.298007.
10. A. Castrodad et al., “Learning discriminative sparse models for source separation and mapping of hyperspectral imagery,” IEEE Trans. Geosci. Rem. Sens. 49(11), 4263–4281 (2011),
http://dx.doi.org/10.1109/TGRS.2011.2163822.
11. C. Li et al., “A compressive sensing and unmixing scheme for hyperspectral data processing,” IEEE Trans. Image Process. 21(3), 1200–1210 (2012).
12. X. Tang and W. Pearlman, “Three-dimensional wavelet-based compression of hyperspectral
images,” Hyperspectral Data Compression, pp. 273–308, Springer, New York (2006).
13. H. Wang, S. Babacan, and K. Sayood, “Lossless hyperspectral-image compression using
context-based conditional average,” IEEE Trans. Geosci. Rem. Sens. 45(12), 4187–4193
(2007), http://dx.doi.org/10.1109/TGRS.2007.906085.
14. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, Maryland (1996).
15. I. Jolliffe, Principal Component Analysis, 2nd ed., Springer, New York (2002).
16. Q. Du and J. Fowler, “Hyperspectral image compression using JPEG2000 and principal component analysis,” IEEE Geosci. Rem. Sens. Lett. 4(2), 201–205 (2007), http://dx.doi.org/10.1109/LGRS.2006.888109.
17. J. Fowler, “Compressive-projection principal component analysis,” IEEE Trans. Image Process. 18(10), 2230–2242 (2009), http://dx.doi.org/10.1109/TIP.2009.2025089.
18. P. Drineas and M. W. Mahoney, “A randomized algorithm for a tensor-based generalization of the SVD,” Linear Algebra Appl. 420(2–3), 553–571 (2007), http://dx.doi.org/10.1016/j.laa.2006.08.023.
19. J. Zhang et al., “Randomized SVD methods in hyperspectral imaging,” J. Elect. Comput.
Eng., article 3, in press (2012).
20. L. Trefethen and D. Bau, Numerical Linear Algebra, Lecture 31, SIAM, Philadelphia, PA
(1997).
21. N. Halko, P. G. Martinsson, and J. A. Tropp, “Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions,” SIAM Rev. 53(2),
217–288 (2011), http://dx.doi.org/10.1137/090771806.
22. Y. Chen, N. Nasrabadi, and T. Tran, “Effects of linear projections on the performance of
target detection and classification in hyperspectral imagery,” J. Appl. Rem. Sens. 5(1),
053563 (2011), http://dx.doi.org/10.1117/1.3659894.
23. Q. Zhang et al., “Joint segmentation and reconstruction of hyperspectral data with compressed measurements,” Appl. Opt. 50(22), 4417–4435 (2011), http://dx.doi.org/10.1364/AO.50.004417.
24. M. Gehm et al., “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express 15(21), 14013–14027 (2007), http://dx.doi.org/10.1364/OE.15.014013.
25. A. Wagadarikar et al., “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. 47(10), B44–B51 (2008), http://dx.doi.org/10.1364/AO.47.000B44.
26. S. Vempala, The Random Projection Method, Vol. 65, American Mathematical Society,
Providence, Rhode Island (2004).
27. Q. Zhang, V. P. Pauca, and R. Plemmons, “Image reconstruction from double random projections,” (2013), to be submitted.
28. W. Bajwa et al., “Toeplitz-structured compressed sensing matrices,” in IEEE/SP 14th
Workshop on Statistical Signal Processing, 2007, SSP’07, pp. 294–298, IEEE,
Madison, WI (2007).
29. R. Archibald and G. Fann, “Feature selection and classification of hyperspectral images
with support vector machines,” IEEE Geosci. Rem. Sens. Lett. 4(4), 674–677 (2007),
http://dx.doi.org/10.1109/LGRS.2007.905116.
30. J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Trans. Inf. Theor. 24(5), 530–536 (1978), http://dx.doi.org/10.1109/TIT.1978.1055934.
31. K. Skretting, J. H. Husøy, and S. O. Aase, “Improved Huffman coding using recursive splitting,” in Proc. Norwegian Signal Processing, NORSIG, IEEE, Norway (1999).
32. M. Nelson and J.-L. Gailly, The Data Compression Book, 2nd ed., M & T Books,
New York, NY (1995).
33. T. Acharya and P.-S. Tsai, JPEG2000 Standard for Image Compression: Concepts,
Algorithms and VLSI Architectures, Wiley & Sons Ltd., Hoboken, NJ (2005).
34. M. O’Neil and M. Burtscher, “Floating-point data compression at 75 Gb/s on a GPU,” in Proc. Fourth Workshop on General Purpose Processing on Graphics Processing Units, p. 7, ACM, New York, NY (2011).
Biographies and photographs of the authors are not available.