Randomized methods in lossless compression of hyperspectral data

Qiang Zhang,a V. Paúl Pauca,b and Robert Plemmonsc

a Wake Forest School of Medicine, Department of Biostatistical Sciences, Winston-Salem, North Carolina 27157, qizhang@wakehealth.edu
b Wake Forest University, Department of Computer Science, Winston-Salem, North Carolina 27109
c Wake Forest University, Departments of Mathematics and Computer Science, Winston-Salem, North Carolina 27109

Abstract. We evaluate recently developed randomized matrix decomposition methods for fast lossless compression and reconstruction of hyperspectral imaging (HSI) data. Simple random projection methods have been shown to be effective for lossy compression without severely affecting the performance of object identification and classification. We build upon these methods to develop a new double-random projection method that may additionally enable secure transmission of the compressed data. For HSI data, the distribution of elements in the resulting residual matrix, i.e., the original data minus its low-rank representation, exhibits low entropy relative to the original data, which favors a high compression ratio. We show both theoretically and empirically that randomized methods combined with residual-coding algorithms can lead to effective lossless compression of HSI data, and we report promising numerical tests on real large-scale HSI data. In addition, we show that randomized techniques are applicable to encoding on resource-constrained on-board sensor systems, since the core matrix-vector multiplications are easily implemented on computing platforms such as graphic processing units or field-programmable gate arrays. © 2013 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.7.074598]

Keywords: random projections; hyperspectral imaging; dimensionality reduction; lossless compression; singular value decomposition.

Paper 12486SS received Jan. 3, 2013; revised manuscript received Apr. 18, 2013; accepted for publication Jun. 14, 2013; published online Jul. 30, 2013.

1 Introduction

Hyperspectral image (HSI) data are measurements of the electromagnetic radiation reflected from an object or scene (i.e., the materials in the image) at many narrow wavelength bands. Spectral information is important in many fields, such as environmental remote sensing, monitoring of chemical and oil spills, and military target discrimination; for comprehensive discussions, see Refs. 1–3. HSI data are being gathered by sensors of increasing spatial, spectral, and radiometric resolution, leading to the collection of truly massive datasets. The transmission, storage, and processing of these large datasets present significant difficulties in practical situations as new-generation sensors are deployed. For example, for aircraft or for the increasingly popular unmanned aerial vehicles carrying hyperspectral scanning imagers, the imaging time is limited by the data capacity and computational capability of the on-board equipment: within 5 to 10 s, hundreds to thousands of pixels of hyperspectral data are collected and often preprocessed.1 For real-time on-board processing, it would therefore be desirable to design algorithms capable of compressing such amounts of data within 5 to 10 s, before the next section of the scene is scanned.
This requirement makes it difficult to apply algorithms such as JPEG2000,4 three-dimensional (3-D) SPIHT,5 or 3-D SPECK,6 unless they are deployed on acceleration platforms such as digital signal processors,7 graphic processing units (GPUs), or field-programmable gate arrays (FPGAs). For example, Christophe and Pearlman8 reported over 2 min of processing time using 3-D SPIHT with random access for a 512 × 512 × 224 HSI dataset, including 30 s for the discrete wavelet transformation.

Dimensionality reduction methods can provide a means to deal with the computational difficulties of hyperspectral data. These methods often use projections to compress a high-dimensional data space, represented by a matrix A, into a lower-dimensional space, represented by a matrix B, which is then factorized. For HSI processing, hundreds of bands of images can be grouped into a 3-D data array, also called a tensor or a datacube, which can be unfolded into a matrix A from which B is obtained and then factorized. Such factorizations are referred to as low-rank matrix factorizations, and they yield a low-rank matrix approximation to the original HSI data matrix A.2,9–11 However, dimensionality reduction techniques provide lossy compression, since the original data cannot be exactly reconstructed from the lower-dimensional space. Recent efforts to provide lossless compression exploit the correlation structure within HSI data, encoding the residuals (original data minus approximation) after stripping off the correlated parts.12,13 Given the large number of pixels, such correlations are often restricted to spatially or spectrally local areas, whereas dimensionality reduction techniques essentially explore the global correlation structure. In this paper, we propose the use of randomized dimensionality reduction techniques for efficiently capturing global correlation structure, combined with residual encoding, as in Ref. 13, to provide lossless compression. The success of this approach requires that the residual data have low entropy relative to the original, and as shall be observed in the experimental section, this appears to be the case for HSI data.

The most popular methods for low-rank factorization employ the singular value decomposition (SVD), e.g., Ref. 14, and lead to widely used data analysis methods such as principal component analysis (PCA).15 Compared with algorithms that employ fixed basis functions, such as the 3-D wavelets in JPEG2000, 3-D SPIHT, and 3-D SPECK, the bases given by the SVD or PCA are data driven and provide a more compact representation of the original data. Moreover, by the optimality of the truncated SVD (TSVD) low-rank approximation,14 the Frobenius norm of the residual matrix is minimal, and a low entropy in its distribution may be expected. Both the SVD and PCA can represent an n-band hyperspectral dataset with a data size equivalent to only k bands, where k ≪ n. For applications of the SVD and PCA in HSI, see Refs. 16–19. The main disadvantage of using the SVD is its computation time: O(mn^2) floating-point operations (flops) for an m × n matrix with m ≥ n (Ref. 20). With current sensor technology, HSI datasets can easily reach the megapixel or even gigapixel level, rendering the use of a full SVD impractical in real scenarios.
The recent development of probabilistic methods for approximating singular vectors and singular values has provided a way to circumvent the computational complexity of the SVD, though at the cost of optimality in the approximation.21 These methods begin by randomly projecting the original matrix to obtain a lower-dimensional matrix, while keeping the range of the original matrix asymptotically intact. The much smaller projected matrix is then factorized using a full matrix decomposition such as the SVD, and the resulting singular vectors are backprojected to the original space. Compared with deterministic methods, probabilistic methods often offer lower computational cost while still achieving high-accuracy approximations (see Ref. 21 and the references therein). Chen et al.22 have recently provided an extensive study of the effects of linear projections on the performance of target detection and classification in HSI. In their tests, they found that the dimensionality of hyperspectral data can typically be reduced to 1/5 to 1/3 of the original without severely affecting the performance of classical target detection and classification algorithms.

Compressive sensing approaches for HSI also take advantage of redundancy along the spectral dimension,11,17,23–25 and involve random projection of the data onto a lower-dimensional space. For example, Fowler17 proposed an approach that exploits compressive projections in sensors that integrate dimensionality reduction and signal acquisition, effectively shifting the computational burden of PCA from the encoder platform to the decoder site. This technique, termed compressive-projection PCA (CPPCA), couples random projections at the encoder with a Rayleigh–Ritz process for approximating eigenvectors at the decoder. In its use of random projections, this technique possesses a certain duality with the more recently proposed randomized SVD (rSVD) approaches.19 However, CPPCA recovers coefficients of a known sparsity pattern in an unknown basis and accordingly requires an additional eigenvector-recovery step.

In this paper, we present several randomized algorithms designed for on-board lossy and lossless compression of HSI. Our goals are to process hundreds of pixels of hyperspectral data within a time frame of 5 s and to achieve a lossless compression ratio (CR) close to 3. The remainder of the paper is structured as follows. In Sec. 2, we present several fast randomized methods for lossless compression and reconstruction, suitable for on-board and off-board (receiving station) processing. In Sec. 3, we apply the methods to a large HSI dataset to demonstrate their efficiency and effectiveness. We conclude with some observations in Sec. 4.

2 Methodology

Randomized algorithms have recently drawn a large amount of interest,21 and here we exploit this approach specifically for efficient on-board lossless compression and data transfer and off-board reconstruction of HSI data. For lossless compression, the process is as follows:

1. Calculate a low-rank approximation of the original data using randomized algorithms.
2. Encode the residual (original data minus approximation) using standard integer or floating-point coding algorithms.

We present several randomized algorithms for efficient low-rank approximation.
They can be written in fewer than 10 lines of pseudo-code, are easily implemented on PC platforms, and may be ported to platforms such as GPUs or FPGAs. As readers will see, all of the large-scale computations involve only matrix-vector multiplications, and the more computationally intensive SVD computations involve only small matrices.

In the encoding and decoding algorithms that follow, it is assumed that HSI data are collected in blocks of size $n_x \times n_y \times n$, where $n_x$ and $n_y$ are the numbers of pixels along the spatial dimensions and $n$ is the number of spectral bands. During compression, each block is first unfolded into a two-dimensional array of size $m \times n$, where $m = n_x n_y$, by stacking each slice of size $n_x \times n_y$ into a one-dimensional array of size $m \times 1$. The compact representation for each block can then be stored on board. See Sec. 3 for a more extensive discussion of compressing HSI data in blocks as the entire dataset is being gathered.

We start by defining terms and notation. The SVD of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as $A = U \Sigma V^T$, where $U$ and $V$ are orthonormal with columns denoted $u_i$ and $v_i$, respectively, and $\Sigma$ is a diagonal matrix with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$, with $p = \min(m, n)$. For some $k \le p$, the TSVD rank-$k$ approximation of $A$ is the matrix $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ contain the first $k$ columns of $U$ and $V$, respectively. The residual matrix obtained from approximating $A$ with $A_k$ is $R = A - A_k$. By the Eckart–Young theorem,14 $A_k$ is the optimal rank-$k$ approximation of $A$, minimizing the Frobenius norm of $R$.

2.1 Single-Random Projection Method

Computing low-rank approximations of a large matrix using the SVD is prohibitive in most real-world applications. Randomized projections into lower-dimensional spaces provide a feasible way around this problem. Let $P = (p_{ij})_{m \times k_1}$ be a matrix of size $m \times k_1$ with random independent and identically distributed (i.i.d.) entries drawn from $\mathcal{N}(0, 1)$. We define the random projection of the row space of $A$ onto a lower $k_1$-dimensional subspace as

$B = P^T A$.  (1)

If $P$ is of size $n \times k_1$, then $B = AP$ is the analogous random projection of the column space of $A$. Given a target rank $k$, Vempala26 uses such matrices $P$ in an efficient algorithm for computing a rank-$k$ approximation of $A$. The algorithm consists of the following three simple steps:

1. Compute the random projection $B = \frac{1}{\sqrt{k_1}} P_1^T A$ for some $k_1 \ge c \log n / \epsilon^2$.
2. Compute the SVD $B = \sum_i \lambda_i \hat{u}_i \hat{v}_i^T$.
3. Return $\tilde{A}_k \leftarrow A \left( \sum_{i=1}^{k} \hat{v}_i \hat{v}_i^T \right) = A \hat{V}_k \hat{V}_k^T$.

It is also shown in Ref. 26 that, with high probability, the error between $\tilde{A}_k$ and $A$ is bounded by

$\|A - \tilde{A}_k\|_F^2 \le \|A - A_k\|_F^2 + 2\epsilon \|A_k\|_F^2$,  (2)

where $A_k$ is the optimal rank-$k$ approximation provided by the TSVD. This bound shows that the approximation $\tilde{A}_k$ is near optimal for small $\epsilon$.

During HSI remote sensing data acquisition, Vempala's algorithm may enable lossy compression by efficiently computing and storing $A\hat{V}_k$ and $\hat{V}_k$ on board as the data are being gathered. The storage requirement of $A\hat{V}_k$ and $\hat{V}_k$ is proportional to $(m + n)k$, compared with $mn$ for the original data. For lossless compression, the residual $R = A - \tilde{A}_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board.
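For illustration, the three steps above translate almost line for line into a few lines of code. The following NumPy sketch is illustrative only and is not the original implementation (the experiments in Sec. 3 were run in MATLAB); the function and variable names are ours, and as discussed in Sec. 3, $P_1$ could equally be generated and applied in column blocks to limit memory.

```python
import numpy as np

def single_random_projection_lowrank(A, k, k1, rng=None):
    """Rank-k approximation of A via a single random projection.

    A  : (m, n) unfolded HSI block
    k  : target rank
    k1 : number of projection columns, k1 >= c*log(n)/eps**2
    Returns (W, Vk, R) with A ~= W @ Vk.T and residual R = A - W @ Vk.T.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    P1 = rng.standard_normal((m, k1))                  # Gaussian projection matrix
    B = (P1.T @ A) / np.sqrt(k1)                       # step 1: project the row space
    _, _, Vt = np.linalg.svd(B, full_matrices=False)   # step 2: SVD of the small matrix
    Vk = Vt[:k].T                                      # first k right singular vectors
    W = A @ Vk                                         # step 3: A_k ~= A Vk Vk^T = W Vk^T
    R = A - W @ Vk.T                                   # residual, to be entropy coded
    return W, Vk, R
```

Here W and Vk play the role of the quantities kept by the on-board encoder, and R is the residual that a lossless pipeline would pass to an integer or floating-point coder.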
Encoding and decoding procedures using Vempala's algorithm are presented in Algorithms 1 and 2, respectively. For lossy compression, $\hat{R}$ may simply be ignored. Clearly, there is a tradeoff between the target rank, which determines the size of $A\hat{V}_k$ and $\hat{V}_k$, and the compressibility of the residual $R$, which also depends on the type of data being compressed. Figure 1 illustrates this tradeoff, assuming that the entropy of the residual decreases as a scaled power law $(k/\alpha)^{-s}$, for $s = 0.1, 0.2, \ldots, 2$ and constant $\alpha$.

Matrix $P_1$ plays an important role in the efficient low-rank approximation of $A$. $P_1$ can be fairly large, depending on the prespecified value of $\epsilon$; for example, for $\epsilon = 0.15$, $c = 5$, and $n = 220$, $P_1$ requires $k_1 \ge 1199$ columns. However, $P_1$ is needed only once in the compression process and may be generated in blocks (see Sec. 3). In addition, the random entries of $P_1$ are drawn from a normal distribution and hence are symmetrically distributed; Zhang et al.27 relax this requirement to allow any distribution with finite variance. For faster implementation, a circulant random matrix could also be effective,27,28 requiring storage of only one random vector.

Algorithm 1 On-board random projection encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; target rank $k$; approximation tolerance $\epsilon$.
Output: $\hat{V}_k$, $W$, $\hat{R}$.
1. Compute $B = \frac{1}{\sqrt{k_1}} P_1^T A$ for some $k_1 \ge c \log n / \epsilon^2$.
2. Compute the SVD of $B$: $B = \sum_i \lambda_i \hat{u}_i \hat{v}_i^T$.
3. Construct the rank-$k$ approximation: $\tilde{A}_k = W \hat{V}_k^T$, with $W = A\hat{V}_k$.
4. Compute the residual: $R = A - \tilde{A}_k$.
5. Encode the residual as $\hat{R}$ with a parallel coding algorithm.
6. Store $\hat{V}_k$, $W$, and $\hat{R}$.

Algorithm 2 Off-board random projection decoder.
Input: $\hat{V}_k$, $W$, $\hat{R}$.
Output: The original matrix $A$.
1. Decode $R$ from $\hat{R}$ with a parallel decoding algorithm.
2. Compute the rank-$k$ approximation: $\tilde{A}_k = W \hat{V}_k^T$.
3. Reconstruct the original: $A = \tilde{A}_k + R$.

Fig. 1 Theoretical compressibility curves when the entropy of the residual decreases as $(k/\alpha)^{-s}$, for $k = 2, \ldots, 300$, $s = 0.1, 0.2, \ldots, 2$, and constant $\alpha = 2$ (compression ratio versus desired rank). The dashed line indicates a compression ratio of 1 (the original data).

2.2 Double-Random Projection Method

A variant of the above low-rank approximation approach may be derived by introducing a second random projection,

$B_2 = A P_2$,  (3)

where $P_2 \in \mathbb{R}^{n \times k_2}$ has i.i.d. entries drawn from $\mathcal{N}(0, 1)$ and $B_2 \in \mathbb{R}^{m \times k_2}$. Substituting the rank-$k$ approximation $A\hat{V}_k\hat{V}_k^T$ for $A$ in Eq. (3) gives

$B_2 \approx A\hat{V}_k\hat{V}_k^T P_2$.  (4)

Notice that $\hat{V}_k^T P_2$ has full row rank; hence its Moore–Penrose pseudo-inverse satisfies

$(\hat{V}_k^T P_2)(\hat{V}_k^T P_2)^\dagger = I_k$.  (5)

Multiplying Eq. (4) on both sides by $(\hat{V}_k^T P_2)^\dagger$ gives

$B_2 (\hat{V}_k^T P_2)^\dagger \approx A\hat{V}_k$.  (6)

A new rank-$k$ approximation of $A$ can then be obtained as

$\hat{A}_k = B_2 (\hat{V}_k^T P_2)^\dagger \hat{V}_k^T \approx A\hat{V}_k\hat{V}_k^T \approx A$.  (7)

As in Vempala's algorithm, the quality of this approximation depends on choosing a sufficiently large value of $k_2 \ge 2k + 1$ (see Ref. 27 for a more detailed discussion). We refer to this method as the double-random projection (DRP) approach for low-rank approximation.
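A minimal code sketch of the DRP reconstruction in Eqs. (3) through (7) follows. It is illustrative only and not the original MATLAB implementation; the function name and the bundling of the encoder quantities into a single routine are expository choices. Note that the pseudo-inverse acts only on the small $k \times k_2$ matrix $\hat{V}_k^T P_2$, so the only large operations are the two products with $P_1$ and $P_2$.

```python
import numpy as np

def double_random_projection_lowrank(A, k, k1, k2=None, rng=None):
    """Rank-k approximation of A via double random projection (DRP).

    The encoder would keep B2 and Vk; the decoder additionally needs P2
    (the shared key).  k2 >= 2k + 1 keeps Vk.T @ P2 of full row rank.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    k2 = 2 * k + 1 if k2 is None else k2
    P1 = rng.standard_normal((m, k1))
    P2 = rng.standard_normal((n, k2))
    B1 = (P1.T @ A) / np.sqrt(k1)                   # row-space sketch, as in Algorithm 1
    B2 = A @ P2                                     # second projection, Eq. (3)
    _, _, Vt = np.linalg.svd(B1, full_matrices=False)
    Vk = Vt[:k].T                                   # (n, k) right singular vectors
    A_hat = B2 @ np.linalg.pinv(Vk.T @ P2) @ Vk.T   # Eq. (7)
    R = A - A_hat                                   # residual for lossless coding
    return B2, Vk, P2, R
```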
During HSI remote sensing data acquisition, the DRP approach may enable lossy compression by efficiently computing and storing $B_2$, $\hat{V}_k$, and $P_2$ on board as the data are being gathered. The storage requirement for these factors is proportional to $(m + n)k_2 + nk$. For lossless compression, the residual $R = A - \hat{A}_k$ may be compressed with an integer or floating-point coding algorithm and also stored on board. Encoding and decoding procedures based on DRP are presented in Algorithms 3 and 4, respectively. For lossy compression, $\hat{R}$ may be ignored, as in the single-random projection case.

At a slight loss of precision and an increased storage requirement, the DRP encoding and decoding algorithms offer the additional advantage of secure data transfer if $P_2$ is used as a shared key between the remote sensing aircraft and the ground. It remains to be seen whether this cipher can be easily broken; for now we regard it as a lightweight form of security. In this case, $P_2$ could be generated and transmitted securely only once between the ground and the aircraft; subsequent communication would not require transmission of $P_2$. Unlike the single-random projection approach, interception of the factors $B_2$, $\hat{V}_k$, and $\hat{R}$ would not easily lead to a reconstruction of the original data without $P_2$.

Algorithm 3 On-board double-random projection encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; target rank $k$; approximation tolerance $\epsilon$.
Output: $B_2$, $\hat{V}_k$, $\hat{R}$.
1. Compute $B_1 = \frac{1}{\sqrt{k_1}} P_1^T A$ and $B_2 = A P_2$, for some $k_1 \ge c \log n / \epsilon^2$ and $k_2 \ge 2k + 1$.
2. Compute the SVD of $B_1$: $B_1 = \sum_i \lambda_i \hat{u}_i \hat{v}_i^T$.
3. Compute the rank-$k$ approximation: $\hat{A}_k = B_2 (\hat{V}_k^T P_2)^\dagger \hat{V}_k^T$.
4. Compute the residual: $R = A - \hat{A}_k$.
5. Encode the residual as $\hat{R}$ with a parallel coding algorithm.
6. Store $B_2$, $\hat{V}_k$, and $\hat{R}$.

Algorithm 4 Off-board double-random projection decoder.
Input: $B_2$, $\hat{V}_k$, $P_2$, $\hat{R}$.
Output: The original matrix $A$.
1. Decode $R$ from $\hat{R}$ with a parallel decoding algorithm.
2. Compute the low-rank approximation: $\hat{A}_k = B_2 (\hat{V}_k^T P_2)^\dagger \hat{V}_k^T$.
3. Reconstruct the original: $A = \hat{A}_k + R$.

2.3 Randomized Singular Value Decomposition

The rSVD algorithm described by Halko et al.21 computes approximate matrix factorizations by random projections, separating the process into two stages. In the first stage, $A$ is projected into an $l$-dimensional space by computing

$Y = A\Omega$,  (8)

where $\Omega$ is a matrix of size $n \times l$ with random entries drawn from $\mathcal{N}(0, 1)$. Then, for a given $\epsilon > 0$, a matrix $Q \in \mathbb{R}^{m \times l}$ whose columns form an orthonormal basis for the range of $Y$ is obtained such that

$\|A - QQ^T A\|_2^2 \le \epsilon$.  (9)

See Algorithms 4.1 and 4.2 in Ref. 21 for how $Q$ and $l$ may be computed adaptively. In the second stage, the SVD of the reduced matrix $Q^T A \in \mathbb{R}^{l \times n}$ is computed as $\tilde{U}\hat{\Sigma}\hat{V}^T$. Since $l \ll n$, it is generally computationally feasible to compute the SVD of this reduced matrix. Matrix $A$ can then be approximated as

$A \approx (Q\tilde{U})\hat{\Sigma}\hat{V}^T = \hat{U}\hat{\Sigma}\hat{V}^T$,  (10)

where $\hat{U} = Q\tilde{U}$ and $\hat{V}$ are orthonormal matrices. As such, Eq. (10) is an approximate SVD of $A$, and the range of $\hat{U}$ is an approximation to the range of $A$. See Ref. 21 for details on the choice of $l$, along with extensive numerical experiments using rSVD methods and a detailed error analysis of the two-stage method described above.
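The two-stage procedure of Eqs. (8) through (10) can be sketched as follows. This is an illustrative rendering rather than the original code: it fixes the sketch size l up front and builds Q with a plain QR factorization instead of the adaptive range finder of Algorithms 4.1 and 4.2 in Ref. 21, and it omits the oversampling and power iterations that are common refinements.

```python
import numpy as np

def randomized_svd(A, l, rng=None):
    """Two-stage randomized SVD with a fixed sketch size l.

    Stage 1: Y = A @ Omega, Q = orth(Y), so that A ~= Q @ Q.T @ A.
    Stage 2: SVD of the small (l, n) matrix B = Q.T @ A.
    Returns (U_hat, s, V_hat) with A ~= U_hat @ np.diag(s) @ V_hat.T.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, l))          # Gaussian test matrix
    Y = A @ Omega                                # Eq. (8): sample the range of A
    Q, _ = np.linalg.qr(Y)                       # orthonormal basis for range(Y)
    B = Q.T @ A                                  # reduced matrix
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U_hat = Q @ U_tilde                          # Eq. (10): lift back to m dimensions
    return U_hat, s, Vt.T
```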
The rSVD approach may also be used to specify HSI encoding and decoding compression algorithms, as shown in Algorithms 5 and 6. For lossy compression, $Q$ and $B$ need to be computed and stored on board; the storage requirement for these factors is proportional to $(m + n)l$. As in the previous cases, for lossless compression the residual may be calculated and compressed using an integer or floating-point coding algorithm. Compared with the single- and double-random projection approaches, rSVD requires the computation of $Q$ but is also able to push the SVD calculation to the decoder. Since $l$ is in practice much smaller than $k_1$ and $k_2$, the encoder can store $Q$ and $B$ directly without any loss in approximation accuracy. Perhaps the key benefit of rSVD is that the low-rank approximation factors $\hat{U}$, $\hat{\Sigma}$, and $\hat{V}$ can be used directly for subsequent analysis such as PCA, clustering, etc.

Algorithm 5 Randomized SVD encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; approximation tolerance $\epsilon$.
Output: $Q$, $B$, $\hat{R}$.
1. Compute $Y = A\Omega$ for some $l > k$.
2. Apply Algorithm 4.2 in Ref. 21 to obtain $Q$ from $Y$.
3. Compute $B = Q^T A$.
4. Compute the residual: $R = A - QB$.
5. Encode $R$ as $\hat{R}$ with a parallel coding algorithm.
6. Store $Q$, $B$, and $\hat{R}$.

Algorithm 6 Randomized SVD decoder.
Input: $Q$, $B$, and $\hat{R}$.
Output: The original matrix $A$ and its rank-$k$ approximate SVD $\hat{U}$, $\hat{\Sigma}$, $\hat{V}$.
1. Decode $R$ from $\hat{R}$ with a parallel decoding algorithm.
2. Compute the SVD: $B = \tilde{U}\hat{\Sigma}\hat{V}^T$.
3. Compute $\hat{U} = Q\tilde{U}$.
4. Compute the low-rank approximation: $\hat{A}_l = \hat{U}\hat{\Sigma}\hat{V}^T$.
5. Reconstruct the original: $A = \hat{A}_l + R$.

2.4 Randomized Singular Value Decomposition by DRP

The DRP approach can also be applied in the rSVD calculation by introducing

$B_1 = P_1^T A$,  (11)

where $P_1$ is of size $m \times k_1$ with entries drawn from $\mathcal{N}(0, 1)$. Replacing $A$ with the rSVD approximation $QQ^T A$ leads to

$B_1 \approx P_1^T QQ^T A$.  (12)

Multiplying both sides by the pseudo-inverse of $P_1^T Q$, we have

$(P_1^T Q)^\dagger B_1 \approx Q^T A$.  (13)

With this slight modification, the rSVD calculation in the encoder can proceed using $(P_1^T Q)^\dagger B_1$ instead of $Q^T A$. The corresponding encoding algorithm is given in Algorithm 7; the decoder remains the same as in the rSVD case.

Algorithm 7 Randomized SVD by DRP encoder.
Input: HSI data block of size $n_x \times n_y \times n$, unfolded into an $m \times n$ array $A$; approximation tolerance $\epsilon$.
Output: $Q$, $W$, and $\hat{R}$.
1. Compute $B_1 = \frac{1}{\sqrt{k_1}} P_1^T A$ and $Y = A\Omega$, for some $k_1 \ge c \log n / \epsilon^2$ and $l > k$.
2. Apply Algorithm 4.2 in Ref. 21 to obtain $Q$ from $Y$.
3. Compute $W = (P_1^T Q)^\dagger B_1$ and the residual $R = A - QW$.
4. Encode $R$ as $\hat{R}$ with a parallel coding algorithm.
5. Store $Q$, $W$, and $\hat{R}$.
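As an illustration of the modification in Eqs. (11) through (13), the sketch below replaces the product $Q^T A$ in the rSVD encoder with $(P_1^T Q)^\dagger B_1$. It is a minimal rendering of the idea behind Algorithm 7 rather than the original code; for clarity it follows Eq. (11) and omits the $1/\sqrt{k_1}$ scaling used in Algorithm 7, so that W directly approximates $Q^T A$.

```python
import numpy as np

def rsvd_drp_encode(A, l, k1, rng=None):
    """rSVD encoder variant using a second projection, per Eqs. (11)-(13)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    P1 = rng.standard_normal((m, k1))
    Omega = rng.standard_normal((n, l))
    B1 = P1.T @ A                                # Eq. (11): row-space sketch
    Y = A @ Omega
    Q, _ = np.linalg.qr(Y)                       # A ~= Q @ Q.T @ A
    W = np.linalg.pinv(P1.T @ Q) @ B1            # Eq. (13): W ~= Q.T @ A
    R = A - Q @ W                                # residual for lossless coding
    return Q, W, R
```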
3 Numerical Experiments

We have tested the encoding algorithms presented in Sec. 2 on a large and publicly available HSI dataset, namely Indian Pines, collected by AVIRIS over a 25 × 6 mi² portion of Northwest Tippecanoe County, Indiana, on June 12, 1992. The sensor has a spectral range of 0.45 to 2.5 μm over 220 bands, and the full dataset consists of a 2,678 × 614 × 220 image cube stored as unsigned 16-bit integers. Figure 2 shows the 100th band in grayscale. A remote-sensing aircraft carrying hyperspectral scanning imagers can collect such a data cube in blocks of hundreds to thousands of pixels, each gathered within a few seconds.1 The size of each data block is determined by factors such as the ground sample distance and the flight speed. To simulate this process, we unfolded the Indian Pines data cube into a large matrix T of size 1,644,292 × 220 and then divided T into nine blocks $A_i$ of size m = 182,699 by n = 220 each. For simplicity, the last pixel in the original dataset was ignored. Each block $A_i$ was then compressed sequentially using the encoding algorithms of Sec. 2. In all cases, $A_i$ is converted from unsigned 16-bit integers to double precision before compression, and the compressed representation is converted back to unsigned 16-bit integers for storage. All algorithms were implemented in Matlab, and the tests were performed on a PC platform with eight 3.2 GHz Intel Xeon cores and 12 GB of memory.

Fig. 2 The grayscale image of the 100th band.

In the implementation of Algorithm 1, the random matrix $P_1 \in \mathbb{R}^{m \times k_1}$ could be large, since m = 182,699 and the oversampling requirement $k_1 \ge c \log n / \epsilon^2$ can lead to a relatively large $k_1$, e.g., $k_1 = 1199$ when $c = 5$ and $\epsilon = 0.15$. To reduce the memory requirement, we implicitly represent $P_1$ in column blocks, $P_1 = [P_1^{(1)} \; P_1^{(2)} \; \cdots \; P_1^{(\nu)}]$, and implement the matrix multiplication $P_1^T A$ as a series of products $(P_1^{(j)})^T A$, generating and storing only one block of $P_1$ at a time.

3.1 Compressibility of HSI Data

As suggested by the compressibility curves in Fig. 1, the effectiveness of low-rank approximation and residual encoding depends on (1) the compressibility of the data and (2) the effectiveness of dimensionality reduction in lowering the entropy of the residual as a function of the desired rank k. The first point can be demonstrated by computing high-accuracy approximate singular vectors and singular values of the entire Indian Pines dataset using the rSVD algorithm. Figure 3 shows the first eight singular vectors folded as images of size 2,678 × 614, and Fig. 4 shows the corresponding singular values up to the 20th value. As can be observed, a great deal of the information is encoded in the first six singular vectors and singular values, with the seventh singular vector appearing more like noise.

Fig. 3 The first eight singular vectors, $\hat{u}_i$, shown as images.

Fig. 4 The singular values of the full Indian Pines dataset, up to the 20th value.

To address the second point, we compare the histogram of the original dataset with that of the residual produced by the rSVD encoder in Algorithm 5 with target rank k = 6. Figure 5(a) shows the values in the original dataset to lie in the range [0, 0.4]. After rSVD encoding, the residual values roughly follow a Laplacian distribution in the range [−0.1, 0.1], as seen in Fig. 5(b). Moreover, 95.42% of the residual values lie within [−0.0015, 0.0015] (notice the log scale on the y-axis). This suggests that the entropy of the residual is significantly smaller than the entropy of the original dataset and, as a consequence, that the residual may be effectively encoded for lossless compression. Figure 5(c) shows the probability of observing a residual value r greater than a given value x, i.e., p(r > x), again indicating that the residuals are densely concentrated around zero.

Fig. 5 (a) The distribution of the original Indian Pines hyperspectral imaging (HSI) data values. (b) The distribution of residuals after subtracting the truncated SVD (TSVD) approximation from the original data. (c) The cumulative distribution of residuals after subtracting the TSVD approximation from the original data.
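The entropy gap suggested by Fig. 5 can be quantified with a short histogram-based estimate. The sketch below is illustrative; it assumes the values have already been rounded to the integers used for storage, and the function and variable names are hypothetical rather than taken from the original work.

```python
import numpy as np

def empirical_entropy_bits(values_int):
    """Empirical Shannon entropy, in bits per value, of an integer-valued block.

    A discrete, histogram-based counterpart of the residual entropy used for
    the compression-ratio estimates in Sec. 3.2.
    """
    _, counts = np.unique(np.asarray(values_int).ravel(), return_counts=True)
    p = counts / counts.sum()                    # empirical symbol probabilities
    return float(-(p * np.log2(p)).sum())        # bits per value

# Example (hypothetical arrays): compare an original 16-bit block with its
# rounded rSVD residual.
# h_orig = empirical_entropy_bits(A_block)
# h_res  = empirical_entropy_bits(np.round(R).astype(np.int32))
```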
3.2 Lossless Compression Through Randomized Dimensionality Reduction

We use the entropy of the residuals produced by each encoding algorithm as the information-theoretic lower bound, i.e., the minimum number of bits required to code the residuals, to estimate the amount of space needed to store a compressed residual. The entropy of the distribution of residual values is defined as

$h(R) = -\int p(x) \log(p(x)) \, dx$,  (14)

where $p(x)$ is the probability density function of the residual values. We estimate $h(R)$ by computing and scaling histograms of residual values [as in Fig. 5(b)]. We assume that, like the original data, the low-rank representation and the corresponding residual are stored in signed 16-bit integer format. The CR is then calculated by dividing the amount of storage needed for the original data by the amount of storage needed for the compressed data. As an example, for Algorithm 1 the outputs $\hat{V}_k$ and $W = A\hat{V}_k$ require space proportional to $(m + n)k$. If the entropy of the residual is $h(R)$ bits, then the lossless CR obtained using Algorithm 1 is

$\mathrm{CR} = \dfrac{16mn}{16nk + 16mk + h(R)mn}$.  (15)

Figure 6 shows the lossless CRs obtained using all four encoding algorithms of Sec. 2 as a function of the data block $A_i$. The target rank is k = 6 in all cases, and the numbers of columns in $P_1$ and $P_2$ are $k_1$ = 1,000 and $k_2$ = 2k + 1 = 13, respectively. Notice that the CRs are above 2.5 and close to or around 3, while Wang et al.13 indicated 3 as a good CR for HSI data. Readers should be aware that Fig. 6 shows only the theoretical upper bounds of the lossless CRs, while those in Ref. 13 are actual ratios. The CRs produced by the DRP variants are slightly lower than those of their counterparts; this is expected, as the advantage of DRP (Algorithm 3) lies in its easily implemented lightweight data security. Finally, CRs above 4.5 are achieved for the last data block, as shown in Fig. 6. This block corresponds to segments of homogeneous vegetation, seen on the right side of Fig. 2, which have been extensively used to test classification algorithms.29

Fig. 6 The lossless compression ratios (CRs) using Algorithms 1, 3, 5, and 7.
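Equation (15) is straightforward to evaluate once the residual entropy has been estimated. The helper below is a direct transcription of Eq. (15); the numbers in the trailing comment are hypothetical and only illustrate that a residual entropy of a few bits per value already yields a CR near 3 for a 182,699 × 220 block with k = 6.

```python
def lossless_cr_algorithm1(m, n, k, h_R, bits=16):
    """Theoretical lossless compression ratio of Eq. (15) for Algorithm 1.

    m, n : block dimensions (pixels x bands); k : target rank;
    h_R  : residual entropy in bits per value; bits : word length (16 here).
    """
    original = bits * m * n
    compressed = bits * n * k + bits * m * k + h_R * m * n
    return original / compressed

# Hypothetical example: about 5 bits/value of residual entropy gives a CR
# close to 3 for one Indian Pines block.
# print(lossless_cr_algorithm1(182699, 220, 6, 5.0))   # ~2.9
```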
Besides the theoretical upper bounds of the CRs presented in Fig. 6, we also combine the randomized methods with several popular lossless compression algorithms for coding the residuals. The chosen residual coding methods are the Lempel–Ziv–Welch (LZW) algorithm,30 Huffman coding,31 arithmetic coding,32 and JPEG2000.33 Table 1 presents the mean lossless CRs over the nine blocks of HSI data, where columns correspond to the randomized methods and rows correspond to the coding algorithms. The highest CR of 2.430 is achieved by the combination of the rSVD method and the JPEG2000 algorithm. Given the rapid development of coding algorithms, and the relatively limited and rudimentary algorithms tested here, the CR can be further improved by incorporating more advanced algorithms in future work.

Table 1 Lossless compression ratios (CRs) of hyperspectral imaging (HSI) data for combinations of the randomized methods (columns) and residual coding algorithms (rows).

                    Algorithm 1   Algorithm 3   Algorithm 5   Algorithm 7
LZW                 1.438         1.338         1.569         1.563
Huffman coding      2.353         2.022         2.328         2.316
Arithmetic coding   2.362         2.017         2.326         2.313
JPEG2000            2.414         2.189         2.430         2.419

3.3 Optimal Compressibility

Optimal CRs using the randomized dimensionality reduction methods of Sec. 2 depend on the appropriate selection of parameters such as the target rank and the approximation error tolerances; a systematic optimization over these parameters for the Indian Pines dataset is beyond the scope of this paper. However, some information about optimality can be gleaned by observing how the CR changes as a function of the target rank k (with the other parameters fixed). Notice from Eq. (15) that the amount of storage needed for the low-rank representation increases with k, while the entropy of the residual decreases; the two terms in the denominator thus lead to an optimal k, which is often data dependent. Figure 7 shows this result for the Indian Pines dataset. The different curves correspond to different data blocks of the original dataset, and the solid red curve is the mean across all blocks. Our choice of k = 6 is seen to be near optimal.

Fig. 7 CRs of the Indian Pines HSI dataset as a function of the target rank k for Algorithm 1 (RP), Algorithm 3 (DRP), Algorithm 5 (rSVD), and Algorithm 7 (rSVD-DRP); the curves correspond to Blocks 1 through 9 and their mean.

We can learn several things from the curves in Fig. 7. First, HSI data are compressible, but their compressibility depends on the right choice of k in the presented algorithms. Some hints for choosing k can be obtained from the singular values and singular vectors; for example, in Fig. 3, the singular vectors after the sixth look more and more like noise, which indicates that most of the information is contained in the first six singular vectors. Second, we have empirically demonstrated that the entropy of the residuals approximately follows the power law $(k/\alpha)^{-s}$ illustrated in Fig. 1; hence the CR curve has a sharply peaked maximum, and the optimal k is relatively easy to locate compared with flatter curves. Further tests are needed to develop robust methods for obtaining near-optimal CRs.
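One simple baseline for generating curves like those in Fig. 7 is a brute-force scan over the target rank, reusing the helper functions sketched earlier in this section. The sketch below is illustrative only and is not a substitute for the adaptive rank selection of Ref. 21; for a quick estimate it rounds the residual to integers, whereas a true lossless pipeline would quantize the stored factors first and form the residual against the dequantized approximation.

```python
import numpy as np

def scan_target_rank(A, ks, k1, bits=16):
    """Estimate the CR of Eq. (15) for each candidate target rank k and
    return the best rank found, using the single-random-projection sketch."""
    m, n = A.shape
    crs = []
    for k in ks:
        _, _, R = single_random_projection_lowrank(A, k, k1)   # earlier sketch
        h = empirical_entropy_bits(np.round(R).astype(np.int32))
        crs.append(lossless_cr_algorithm1(m, n, k, h, bits))
    best = ks[int(np.argmax(crs))]
    return best, crs
```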
The adaptive selection of the rank parameter in the rSVD calculation21 can be used as an important first step in this direction. Third, since the rSVD algorithm is near optimal in terms of the Frobenius norm, i.e., in the mean-squared-error sense, the similar curves produced by the other randomized algorithms indicate that they share this near optimality. To further examine this, we compare the Frobenius norms and entropies of the residuals produced by the four randomized algorithms with those produced by the exact TSVD. Figure 8(a) shows, for the nine blocks of HSI data, the ratio between the Frobenius norm of the residual from the exact TSVD and that from each algorithm, while Fig. 8(b) shows the corresponding ratio of residual entropies. A ratio of 1 represents exact optimality, and higher ratios are closer to optimal than lower ones. In terms of the Frobenius norm, three of the four algorithms are fairly close to optimal, while Algorithm 3, the DRP algorithm, is less so. In terms of entropy, all four algorithms are fairly close to optimal, which explains why the CRs of the four algorithms are all fairly close to each other. Interestingly, in Fig. 8(b) we observe ratios even higher than 1, meaning that in some cases the entropy of the residual from these algorithms can be even lower than that from the exact TSVD.

Fig. 8 (a) The ratios between the Frobenius norm of the residuals from the exact TSVD and those from each algorithm, for the nine blocks of HSI data. (b) The corresponding ratios of residual entropies.

3.4 Time Performance of Randomized Dimensionality Reduction

If lossy compression of HSI data is acceptable, the randomized dimensionality reduction methods can perform in near real time. Figure 9 shows the time (in seconds) that each encoder of Sec. 2 takes to process each data block $A_i$, ignoring the residuals. All encoders take less than 5 s for each of the nine data blocks. The computation times of the RP encoder (Algorithm 1) and the DRP encoder (Algorithm 3) do not differ significantly; both take less than 2.4 s per data block, averaging about 2.3 s over all nine blocks. With the unsigned 16-bit data converted to double precision before processing, this translates to a mean throughput of 182,699 × 220 × 8 / 2.3 ≈ 140 MB/s. The green curve corresponding to the rSVD encoder (Algorithm 5) shows the best performance, while the black curve corresponding to the rSVD-DRP encoder (Algorithm 7) is the slowest, though it still takes less than 5 s per block; the extra time is spent in step 3, computing the pseudo-inverse of $P_1^T Q$. Efficient non-Matlab implementations of the encoding algorithms presented in this paper, e.g., on GPUs, would be expected to run in real time.

Fig. 9 The computation time for lossy compression using Algorithms 1, 3, 5, and 7.

For lossless compression, our tests show that the low-entropy residuals may be effectively compressed with conventional tools such as gzip in less than 4 s per data block, or with higher-performance tools such as JPEG2000, which can compress each block within 4.5 s.
For the Huffman and arithmetic coding algorithms, the computation takes significantly longer without the assistance of special acceleration platforms such as GPUs or FPGAs. For comparison, we also ran 3-D SPECK and 3-D SPIHT on a 512 × 512 × 128 subset, and both algorithms needed over 2 min to provide lossless compression; Christophe and Pearlman also reported over 2 min of processing time using 3-D SPIHT with random access for a dataset of similar size.8

4 Conclusions and Discussions

As HSI datasets grow in size, compression and dimensionality reduction for analytical purposes become increasingly critical for storage, data transmission, and subsequent postprocessing. This paper shows the potential of randomized algorithms for efficient and effective compression and reconstruction of massive HSI datasets. Building upon the random projection and rSVD algorithms, we have further developed a DRP method that can serve as a standalone encoding algorithm or be combined with the rSVD algorithm. The DRP algorithm slightly sacrifices CR while adding lightweight encryption. We have demonstrated that for a large HSI dataset, such as Indian Pines, theoretical CRs close to 3 are possible, while empirical CRs as high as 2.43 were obtained with the limited number of coding algorithms tested. We have also used the rSVD algorithm to estimate near-optimal target ranks directly from the approximate singular vectors. Choosing optimal parameters for dimensionality reduction using randomized methods is a topic of future research; the adaptive rank selection method described in Ref. 21 offers an initial step in this direction. Regarding the suboptimality of the randomized algorithms, we have compared them with the exact TSVD in terms of the Frobenius norm and the entropy of the residuals, and both measures appear to be near optimal empirically.

The presented randomized algorithms are lossy compression algorithms, which need to be combined with residual-coding algorithms for lossless compression. We have shown empirically that the entropy of the residual (original data minus low-rank approximation) decreases significantly for HSI data, so conventional entropy-based methods for integer coding are expected to perform well on these low-entropy residuals. Integrating advanced residual-coding algorithms with the randomized algorithms is an important topic for future study. One concern for residual coding is speed. In this regard, recent developments in floating-point coding34 have demonstrated throughputs as high as 75 Gb/s on a GPU, and on an eight-core Xeon computer we have observed throughputs near 20 Gb/s; both should be sufficient for coding the required HSI residual data. Saving the residuals back as 16-bit integers can further reduce the computation time.

Acknowledgments

Research by R. Plemmons and Q. Zhang is supported by the U.S. Air Force Office of Scientific Research (AFOSR), under Grant FA9550-11-1-0194.
References

1. M. T. Eismann, Hyperspectral Remote Sensing, SPIE Press, Bellingham, WA (2012).
2. H. F. Grahn and P. Geladi, Techniques and Applications of Hyperspectral Image Analysis, John Wiley & Sons Ltd., West Sussex, England (2007).
3. J. Bioucas-Dias et al., "Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches," IEEE J. Sel. Top. Appl. Earth Obs. Rem. Sens. 5(2), 354–379 (2012).
4. J. Zhang et al., "Evaluation of JP3D for lossy and lossless compression of hyperspectral imagery," in 2009 IEEE Int. Geoscience and Remote Sensing Symposium (IGARSS 2009), Vol. 4, pp. IV-474, IEEE, Cape Town, South Africa (2009).
5. B. Kim, Z. Xiong, and W. Pearlman, "Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT)," IEEE Trans. Circuits Syst. Video Technol. 10(8), 1374–1387 (2000), http://dx.doi.org/10.1109/76.889025.
6. X. Tang, S. Cho, and W. Pearlman, "Comparison of 3D set partitioning methods in hyperspectral image compression featuring an improved 3D-SPIHT," in Proc. Data Compression Conf. (DCC 2003), p. 449, IEEE, Snowbird, UT (2003).
7. Y. Langevin and O. Forni, "Image and spectral image compression for four experiments on the ROSETTA and Mars Express missions of ESA," in Int. Symp. Optical Science and Technology, pp. 364–373, SPIE Press, Bellingham, WA (2000).
8. E. Christophe and W. Pearlman, "Three-dimensional SPIHT coding of volume images with random access and resolution scalability," J. Image Video Process. 13, Article 2 (2008), http://dx.doi.org/10.1155/2008/248905.
9. J. C. Harsanyi and C. Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Trans. Geosci. Rem. Sens. 32(4), 779–785 (1994), http://dx.doi.org/10.1109/36.298007.
10. A. Castrodad et al., "Learning discriminative sparse models for source separation and mapping of hyperspectral imagery," IEEE Trans. Geosci. Rem. Sens. 49(11), 4263–4281 (2011), http://dx.doi.org/10.1109/TGRS.2011.2163822.
11. C. Li et al., "A compressive sensing and unmixing scheme for hyperspectral data processing," IEEE Trans. Image Process. 21(3), 1200–1210 (2012).
12. X. Tang and W. Pearlman, "Three-dimensional wavelet-based compression of hyperspectral images," in Hyperspectral Data Compression, pp. 273–308, Springer, New York (2006).
13. H. Wang, S. Babacan, and K. Sayood, "Lossless hyperspectral-image compression using context-based conditional average," IEEE Trans. Geosci. Rem. Sens. 45(12), 4187–4193 (2007), http://dx.doi.org/10.1109/TGRS.2007.906085.
14. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, Maryland (1996).
15. I. Jolliffe, Principal Component Analysis, 2nd ed., Springer, New York (2002).
16. Q. Du and J. Fowler, "Hyperspectral image compression using JPEG2000 and principal component analysis," IEEE Geosci. Rem. Sens. Lett. 4(2), 201–205 (2007), http://dx.doi.org/10.1109/LGRS.2006.888109.
17. J. Fowler, "Compressive-projection principal component analysis," IEEE Trans. Image Process. 18(10), 2230–2242 (2009), http://dx.doi.org/10.1109/TIP.2009.2025089.
18. P. Drineas and M. W. Mahoney, "A randomized algorithm for a tensor-based generalization of the SVD," Linear Algebra Appl. 420(2–3), 553–571 (2007), http://dx.doi.org/10.1016/j.laa.2006.08.023.
19. J. Zhang et al., "Randomized SVD methods in hyperspectral imaging," J. Elect. Comput. Eng., Article 3, in press (2012).
20. L. Trefethen and D. Bau, Numerical Linear Algebra, Lecture 31, SIAM, Philadelphia, PA (1997).
21. N. Halko, P. G. Martinsson, and J. A. Tropp, "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions," SIAM Rev. 53(2), 217–288 (2011), http://dx.doi.org/10.1137/090771806.
22. Y. Chen, N. Nasrabadi, and T. Tran, "Effects of linear projections on the performance of target detection and classification in hyperspectral imagery," J. Appl. Rem. Sens. 5(1), 053563 (2011), http://dx.doi.org/10.1117/1.3659894.
23. Q. Zhang et al., "Joint segmentation and reconstruction of hyperspectral data with compressed measurements," Appl. Opt. 50(22), 4417–4435 (2011), http://dx.doi.org/10.1364/AO.50.004417.
24. M. Gehm et al., "Single-shot compressive spectral imaging with a dual-disperser architecture," Opt. Express 15(21), 14013–14027 (2007), http://dx.doi.org/10.1364/OE.15.014013.
25. A. Wagadarikar et al., "Single disperser design for coded aperture snapshot spectral imaging," Appl. Opt. 47(10), B44–B51 (2008), http://dx.doi.org/10.1364/AO.47.000B44.
26. S. Vempala, The Random Projection Method, Vol. 65, American Mathematical Society, Providence, Rhode Island (2004).
27. Q. Zhang, V. P. Pauca, and R. Plemmons, "Image reconstruction from double random projections," (2013), to be submitted.
28. W. Bajwa et al., "Toeplitz-structured compressed sensing matrices," in IEEE/SP 14th Workshop on Statistical Signal Processing (SSP '07), pp. 294–298, IEEE, Madison, WI (2007).
29. R. Archibald and G. Fann, "Feature selection and classification of hyperspectral images with support vector machines," IEEE Geosci. Rem. Sens. Lett. 4(4), 674–677 (2007), http://dx.doi.org/10.1109/LGRS.2007.905116.
30. J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inf. Theor. 24(5), 530–536 (1978), http://dx.doi.org/10.1109/TIT.1978.1055934.
31. K. Skretting, J. H. Husøy, and S. O. Aase, "Improved Huffman coding using recursive splitting," in Proc. Norwegian Signal Processing Symposium (NORSIG), IEEE, Norway (1999).
32. M. Nelson and J.-L. Gailly, The Data Compression Book, 2nd ed., M&T Books, New York, NY (1995).
33. T. Acharya and P.-S. Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, Wiley & Sons Ltd., Hoboken, NJ (2005).
34. M. O'Neil and M. Burtscher, "Floating-point data compression at 75 Gb/s on a GPU," in Proc. Fourth Workshop on General Purpose Processing on Graphics Processing Units, p. 7, ACM, New York, NY (2011).

Biographies and photographs of the authors are not available.