Hyperspectral Detection Algorithms: Use Covariances or Subspaces?
D. Manolakis, R. Lockwood^a, T. Cooley^b, and J. Jacobson^c
MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA 02420
^a Space Vehicles Directorate, Air Force Research Laboratory, 29 Randolph Road, Hanscom AFB, MA 01731-3010
^b Space Vehicles Directorate, Air Force Research Laboratory, 2251 Maxwell Ave, Kirtland AFB, NM 87117
^c National Air and Space Intelligence Center, Wright-Patterson AFB, OH
ABSTRACT
There are two broad classes of hyperspectral detection algorithms.[1, 2] Algorithms in the first class use the spectral covariance matrix of the background clutter; in contrast, algorithms in the second class characterize the background using a subspace model. In this paper we show that, due to the nature of hyperspectral imaging data, the two families of algorithms are intimately related. The link between the two representations of the background clutter is the low rank of the covariance matrix of natural hyperspectral backgrounds and its relation to the spectral linear mixture model. This link is developed using the method of dominant mode rejection. Finally, the effects of regularization, covariance shrinkage, and dominant mode rejection are discussed in the context of robust matched filtering algorithms.
Keywords: Hyperspectral imaging, target detection, statistical modeling, background characterization.
1. INTRODUCTION
The detection of materials and objects using remotely sensed spectral information has many military and civilian applications. Hyperspectral imaging sensors measure the radiance for every pixel at a large number of narrow spectral bands.
The obtained measurements are known as the radiance spectrum of the pixel. In the reflective part of the electromagnetic spectrum (0.4–2.5 μm), the spectral information characterizing a material is its reflectance spectrum, defined as the ratio of reflected to incident radiation as a function of wavelength.
The most widely used detection algorithms use the covariance matrix of the background data; however, there are algorithms that use a subspace model formed by the endmembers of a linear mixing model or by the eigenvectors of the covariance matrix.[3] Finding the endmembers in a data cube is a non-trivial task whose complexity exceeds that of the detection problem. On the other hand, due to the high dimensionality of hyperspectral imaging data, the estimated covariance matrix may be inaccurate or numerically unstable. A practical approach to improve the quality of the estimated covariance matrix is to use covariance shrinkage or the dominant mode rejection approximation. The invertibility of the estimated matrix can be assured by using regularization. These techniques lead to the development of robust matched filter detectors that can be used in practical applications without concerns about numerical instabilities. These issues are the topic of this paper, which is organized as follows. Section 2 discusses two approaches to covariance matrix regularization: matched filter optimization and shrinkage. In Section 3 we present an approach to covariance matrix estimation and inversion using dominant mode rejection and diagonal loading. Section 4 presents an interpretation of dominant mode rejection as covariance matrix augmentation. In Section 5 we discuss the relationship between the subspaces generated by eigenvectors and endmembers. Finally, Section 6 explores the relationship between covariance-based and subspace-based detectors.
2. COVARIANCE MATRIX REGULARIZATION
Accurate estimation and numerically robust inversion of the covariance matrix is critical in hyperspectral detection applications. We next present two different approaches to regularization that lead to the same diagonal loading solution.
Correspondence to D. Manolakis. E-mail: dmanolakis@ll.mit.edu, Telephone: 781-981-0524, Fax: 781-981-7271
2.1 The Matched Filter Approach
The spectral measurements obtained by a p-band hyperspectral imaging sensor can be arranged in vector form as
$$ \mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_p \end{bmatrix}^T \tag{1} $$
where T denotes matrix transpose. Let v be a p × 1 random vector from a normal distribution with mean μ and covariance
matrix Σ representing the background clutter. Finally, let s0 be a p × 1 vector representing the spectral signature of the
target of interest. To simplify notation, we assume that μ is removed from all spectra, that is, we deal with zero mean
clutter and a “clutter-centered” target signature.
The Optimum Matched Filter The optimum linear matched filter[4] is a linear operator
$$ y = \mathbf{h}^T \mathbf{x} \tag{2} $$
which can be determined by minimizing the output clutter power $\mathrm{Var}(y) = \mathbf{h}^T \Sigma \mathbf{h}$ subject to a unity gain constraint in the direction of the target spectral signature
$$ \min_{\mathbf{h}} \; \mathbf{h}^T \Sigma \mathbf{h} \quad \text{subject to} \quad \mathbf{h}^T \mathbf{s}_0 = 1 \tag{3} $$
The solution to (3) is given by
$$ \mathbf{h} = \frac{\Sigma^{-1} \mathbf{s}_0}{\mathbf{s}_0^T \Sigma^{-1} \mathbf{s}_0} \tag{4} $$
which is the formula for the widely used matched filter.
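To make the computation concrete, the following minimal NumPy sketch (our illustration, not code from the paper) evaluates (4); `Sigma` is an estimated clutter covariance and `s0` a clutter-centered target signature, both assumed given.

```python
import numpy as np

def matched_filter(Sigma, s0):
    """Matched filter (4): h = Sigma^{-1} s0 / (s0^T Sigma^{-1} s0)."""
    w = np.linalg.solve(Sigma, s0)  # Sigma^{-1} s0 without forming the inverse
    return w / (s0 @ w)             # normalize for unity gain: h^T s0 = 1
```

Applied to a clutter-centered pixel matrix X of size N × p, the detection statistic for all pixels is then simply `y = X @ matched_filter(Sigma_hat, s0)`.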
In the array processing area, where the data and filter vectors are complex, the matched filter (4) is known as the standard Capon beamformer (SCB).[5]
In practice, the clutter covariance matrix Σ and the target signature $\mathbf{s}_0$ have to be estimated from the available data. It turns out that the matched filter (4) is sensitive to signature errors and to the quality of the estimated clutter covariance matrix. Therefore, the development of matched filters that are robust to signature and clutter covariance errors is highly desirable. This problem has traditionally been dealt with using a diagonal loading approach or an eigenspace-based approach. However, in both cases the selection of the diagonal load or of the subspace dimension is ad hoc.[5]
Quadratically Constrained Matched Filter The robustness of the matched filter to covariance matrix and signature mismatch can be improved by constraining the size of $\mathbf{h}^T \mathbf{h}$. This is done by solving the following optimization problem
$$ \min_{\mathbf{h}} \; \mathbf{h}^T \Sigma \mathbf{h} \quad \text{subject to} \quad \mathbf{h}^T \mathbf{s}_0 = 1 \;\;\text{and}\;\; \mathbf{h}^T \mathbf{h} \le \epsilon_h \tag{5} $$
The solution is the well-known diagonally loaded matched filter
$$ \mathbf{h}_d = \frac{(\Sigma + \delta_h I)^{-1} \mathbf{s}_0}{\mathbf{s}_0^T (\Sigma + \delta_h I)^{-1} \mathbf{s}_0} \tag{6} $$
The load level $\delta_h$ can be computed from $\epsilon_h$ by solving a nonlinear equation. However, it is not clear what the parameter $\epsilon_h$ means or how to choose it. This issue is addressed next using the robust Capon beamformer approach.
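Under the same assumptions as the previous sketch, the diagonally loaded filter (6) requires only one extra parameter; choosing `delta` is exactly the difficulty noted above.

```python
import numpy as np

def loaded_matched_filter(Sigma, s0, delta):
    """Diagonally loaded matched filter (6); delta >= 0 is the load level."""
    p = Sigma.shape[0]
    w = np.linalg.solve(Sigma + delta * np.eye(p), s0)
    return w / (s0 @ w)  # unity gain constraint h^T s0 = 1
```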
The Robust Matched Filter In this section, we use the theory of the robust Capon beamformer (RCB) to develop a robust matched filter that takes measurement errors and the spectral variability of hyperspectral target signatures into consideration. The robust matched filter (RMF) addresses robustness to target signature errors by introducing an uncertainty region constraint into the optimization process. To this end, assume that the only knowledge we have about the signature $\mathbf{s}$ is that it belongs to an uncertainty ellipsoid
$$ (\mathbf{s} - \mathbf{s}_0)^T C^{-1} (\mathbf{s} - \mathbf{s}_0) \le 1 \tag{7} $$
where the vector s0 and the positive definite matrix C are given. In most hyperspectral target detection applications, it is
difficult to get sufficient data to reliably estimate the full matrix C. Therefore, we usually set C = εI, so that (7) becomes
$$ \|\mathbf{s} - \mathbf{s}_0\|^2 \le \epsilon \tag{8} $$
where ε is a positive number. These ideas are illustrated in Figure 1(a). It has been shown that the RMF can be obtained as the solution to the following optimization problem
$$ \min_{\mathbf{s}} \; \mathbf{s}^T \Sigma^{-1} \mathbf{s} \quad \text{subject to} \quad \|\mathbf{s} - \mathbf{s}_0\|^2 \le \epsilon \tag{9} $$
It turns out that the solution of (9) occurs on the boundary of the constraint set; therefore, we can reformulate (9) as a quadratic optimization problem with a quadratic equality constraint
$$ \min_{\mathbf{s}} \; \mathbf{s}^T \Sigma^{-1} \mathbf{s} \quad \text{subject to} \quad \|\mathbf{s} - \mathbf{s}_0\|^2 = \epsilon \tag{10} $$
This problem can be efficiently solved using the method of Lagrange multipliers.[6] The solution involves an estimated target signature
$$ \hat{\mathbf{s}} = \zeta (\Sigma^{-1} + \beta I)^{-1} \mathbf{s}_0 \tag{11} $$
which is subsequently used to determine the RMF by
$$ \mathbf{h}_\beta = \frac{\Sigma^{-1} \hat{\mathbf{s}}}{\hat{\mathbf{s}}^T \Sigma^{-1} \hat{\mathbf{s}}} \tag{12} $$
The Lagrange multiplier ζ ≥ 0 can be obtained by solving the nonlinear equation
$$ \mathbf{s}_0^T (I + \zeta \Sigma)^{-2} \mathbf{s}_0 = \sum_{k=1}^{p} \frac{|\tilde{s}_k|^2}{(1 + \zeta \lambda_k)^2} = \epsilon \tag{13} $$
where $\lambda_k$ and $\tilde{s}_k$ are obtained from the eigendecomposition
$$ \Sigma = Q \Lambda Q^T = \sum_{k=1}^{p} \lambda_k \mathbf{q}_k \mathbf{q}_k^T \tag{14} $$
and the orthogonal transformation
$$ \tilde{\mathbf{s}} = Q^T \mathbf{s}_0 \tag{15} $$
Equation (13) can be solved easily using a standard nonlinear root-finding algorithm, for example, Newton's method. Finally, we note that the RMF (12) can be expressed in diagonal loading form as follows
$$ \mathbf{h}_\zeta = \frac{(\Sigma + \zeta^{-1} I)^{-1} \mathbf{s}_0}{\mathbf{s}_0^T (\Sigma + \zeta^{-1} I)^{-1} \Sigma (\Sigma + \zeta^{-1} I)^{-1} \mathbf{s}_0} \tag{16} $$
where $\zeta^{-1}$ is a loading factor computed from (13).
Figure 1(b) illustrates the validity of the optimization approach leading to the RMF. We note that the RMF is obtained as a standard MF for a modified target signature. As expected, the "assumed" target signature specifies the center of the uncertainty region, whereas the modified signature "touches" the boundary of the uncertainty region.
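The RMF computation outlined above can be summarized in a short sketch: eigendecompose Σ once, solve (13) for ζ (here by bisection rather than Newton's method; the bracket `zeta_max` is our heuristic), and form the filter via (16). All names are ours.

```python
import numpy as np

def robust_matched_filter(Sigma, s0, eps, zeta_max=1e9, iters=100):
    """Robust matched filter: solve (13) for zeta, then form (16)."""
    lam, Q = np.linalg.eigh(Sigma)        # eigendecomposition (14)
    s_tilde = Q.T @ s0                    # orthogonal transformation (15)

    def g(zeta):                          # left-hand side of (13) minus eps
        return np.sum(s_tilde**2 / (1.0 + zeta * lam)**2) - eps

    # g decreases monotonically in zeta; for a nontrivial constraint
    # (eps < ||s0||^2) we have g(0) > 0, so bisect on [0, zeta_max].
    lo, hi = 0.0, zeta_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    zeta = 0.5 * (lo + hi)

    # Diagonally loaded form (16) with loading factor 1/zeta.
    w = np.linalg.solve(Sigma + np.eye(Sigma.shape[0]) / zeta, s0)
    return w / (w @ Sigma @ w)
```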
2.2 Covariance Shrinkage
In practice, the covariance matrix Σ has to be estimated from a set of observations $\mathbf{x}_k$, k = 1, 2, ..., N. The most widely used estimate is the sample covariance matrix defined by the well-known formula
$$ \hat{\Sigma} = \frac{1}{N} \sum_{k=1}^{N} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad \hat{\boldsymbol{\mu}} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{x}_k \tag{17} $$
[Figure 1 appears here: a two-band illustration (Band 1 vs. Band 2) showing the background variability contours $\mathbf{h}^T \Sigma \mathbf{h} = \text{constant}$, the target variability, the assumed target signature $\mathbf{s}_0$, the actual target signature $\mathbf{s}$, and the uncertainty region with $\epsilon = 0.52$.]
Figure 1. (a) Illustration of the robust matched filter design principle using two spectral bands. (b) Illustration of the robust matched filter when there is a target signature mismatch. The algorithm uses the available signature, which specifies the center of the uncertainty region, to produce a "robust" signature that is subsequently used to determine the RMF coefficients.
The sample covariance matrix has appealing properties: it is asymptotically unbiased and maximum likelihood under
normality. Since Σ has p(p + 1)/2 free parameters which have to be estimated from p × N measurements, we can get good estimates only when N ≫ p. However, when N is of the order of p, $\hat{\Sigma}$ is a poor estimate of Σ.
To mitigate the problem that $\|\hat{\Sigma} - \Sigma\|^2$, where $\|A\|^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij}^2$ is the squared Frobenius norm, is large when p is relatively large, it is suggested that we use a shrinkage estimator
$$ \tilde{\Sigma}(\delta) = \delta F + (1 - \delta) \hat{\Sigma}, \qquad 0 \le \delta \le 1 \tag{18} $$
where F is a constrained (highly structured) version of $\hat{\Sigma}$. The basic idea is to reduce the variance of the estimator by increasing its bias.
The sample covariance has many free parameters and very little structure; as a result, it is asymptotically unbiased but it
has a lot of estimation error. The matrix F has a few free parameters and a lot of structure. As a result of stringent and
misspecified structural assumptions, F has significant bias but insignificant variance. This technique is called shrinkage
because the sample covariance matrix is “shrunk” toward the structured estimator. The number δ is referred to as the
shrinkage constant. A shrinkage estimator has three components: an estimator with no structure, an estimator with a lot
of structure, and a shrinkage constant. The typical choice for F in (18) is the identity matrix (or a scaled version of it). When δ ≪ 1 we have $\tilde{\Sigma}(\delta) \approx \hat{\Sigma} + \delta I$, which is the familiar diagonal loading. The shrinkage approach to covariance matrix estimation, including estimation of δ, is thoroughly discussed by Ledoit and Wolf.[7]
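As an illustration of (18), the sketch below shrinks the sample covariance (17) toward a trace-scaled identity target; the fixed `delta` argument is our simplification, since Ledoit and Wolf give a data-driven estimate of the shrinkage constant.[7]

```python
import numpy as np

def shrinkage_covariance(X, delta):
    """Shrinkage estimator (18): delta*F + (1-delta)*Sigma_hat, F = (tr/p) I."""
    N, p = X.shape
    Xc = X - X.mean(axis=0)                     # remove the sample mean (17)
    Sigma_hat = (Xc.T @ Xc) / N                 # sample covariance (17)
    F = (np.trace(Sigma_hat) / p) * np.eye(p)   # structured shrinkage target
    return delta * F + (1.0 - delta) * Sigma_hat
```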
3. ESTIMATION AND INVERSION OF COVARIANCE MATRIX USING DOMINANT MODES
The basic idea is to estimate only the large eigenvalues and corresponding eigenvectors of the covariance matrix Σ of the background.[8] The advantage is that we can obtain better estimates of Σ with fewer spectra. The spectral decomposition
of the covariance matrix can be broken into two parts: one for the d largest eigenvalues and one for the (p − d) smaller
eigenvalues
$$ \Sigma = \sum_{i=1}^{p} \lambda_i \mathbf{q}_i \mathbf{q}_i^T = \sum_{i=1}^{d} \lambda_i \mathbf{q}_i \mathbf{q}_i^T + \sum_{i=d+1}^{p} \lambda_i \mathbf{q}_i \mathbf{q}_i^T \tag{19} $$
or in compact matrix form as
$$ \Sigma = Q \Lambda Q^T = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{20} $$
$$ = Q_1 \Lambda_1 Q_1^T + Q_2 \Lambda_2 Q_2^T \tag{21} $$
where Λ is the diagonal matrix of p eigenvalues of Σ sorted in decreasing order and Q is a matrix whose columns are the
corresponding eigenvectors. The elements of the other matrices can be easily determined by comparing (20) to (19).
Since some of the smaller eigenvalues may be zero, Σ may be less than full rank. The small eigenvalues and their
eigenvectors are difficult to estimate and hard to compute accurately when Σ is ill conditioned. If we replace the last p − d
eigenvalues by a constant α, we obtain the approximation
$$ \tilde{\Sigma} = \sum_{i=1}^{d} \lambda_i \mathbf{q}_i \mathbf{q}_i^T + \alpha \sum_{i=d+1}^{p} \mathbf{q}_i \mathbf{q}_i^T \tag{22} $$
From the orthogonality relation $Q Q^T = I$, we have
$$ \sum_{i=d+1}^{p} \mathbf{q}_i \mathbf{q}_i^T = I - \sum_{i=1}^{d} \mathbf{q}_i \mathbf{q}_i^T \tag{23} $$
Substituting (23) into (22) yields
$$ \tilde{\Sigma} = \sum_{i=1}^{d} (\lambda_i - \alpha) \mathbf{q}_i \mathbf{q}_i^T + \alpha I \tag{24} $$
To express the inverse of Σ̃ explicitly, we first rewrite (24) as
$$ \alpha^{-1} \tilde{\Sigma} = I + \sum_{i=1}^{d} \mathbf{q}_i \, \frac{\lambda_i - \alpha}{\alpha} \, \mathbf{q}_i^T = I + Q_1 \Lambda_a Q_1^T \tag{25} $$
where
$$ \Lambda_a = \frac{1}{\alpha} \, \mathrm{diag}\{(\lambda_1 - \alpha), (\lambda_2 - \alpha), \ldots, (\lambda_d - \alpha)\} \tag{26} $$
Using (25) and the matrix inversion lemma
$$ (A + BCD)^{-1} = A^{-1} - A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1} \tag{27} $$
we obtain the expression
$$ \tilde{\Sigma}^{-1} = \frac{1}{\alpha} \left( I - \sum_{i=1}^{d} \frac{\lambda_i - \alpha}{\lambda_i} \, \mathbf{q}_i \mathbf{q}_i^T \right) \tag{28} $$
One way to determine α is by requiring that $\mathrm{tr}\,\Sigma = \mathrm{tr}\,\tilde{\Sigma}$, where tr denotes the trace of a matrix. This yields
$$ \alpha = \frac{1}{p - d} \left( \mathrm{tr}\,\Sigma - \sum_{i=1}^{d} \lambda_i \right) = \frac{1}{p - d} \sum_{i=d+1}^{p} \lambda_i \tag{29} $$
which is the average of the smaller p − d eigenvalues of Σ.
Repeating the same process for the matrix Σ̃ + δI, we can easily show that
$$ (\tilde{\Sigma} + \delta I)^{-1} = \frac{1}{\alpha + \delta} \left( I - \sum_{i=1}^{d} \frac{\lambda_i - \alpha}{\lambda_i + \delta} \, \mathbf{q}_i \mathbf{q}_i^T \right) = \frac{1}{\alpha + \delta} \left( I - \sum_{i=1}^{d} \beta_i \, \mathbf{q}_i \mathbf{q}_i^T \right) \tag{30} $$
where
$$ \beta_i \triangleq \frac{\lambda_i - \alpha}{\lambda_i + \delta} \tag{31} $$
This procedure introduces diagonal loading to the dominant modes.
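A sketch of the resulting computation, with our own function and variable names: keep the d dominant eigenpairs, set α to the average of the remaining eigenvalues as in (29), and apply (30); setting δ = 0 recovers (28).

```python
import numpy as np

def dmr_inverse(Sigma, d, delta=0.0):
    """Approximate inverse (30) via dominant mode rejection."""
    lam, Q = np.linalg.eigh(Sigma)                  # ascending eigenvalues
    lam, Q = lam[::-1], Q[:, ::-1]                  # sort in decreasing order
    p = Sigma.shape[0]
    alpha = lam[d:].mean()                          # (29): mean of small eigenvalues
    beta = (lam[:d] - alpha) / (lam[:d] + delta)    # (31)
    Q1 = Q[:, :d]
    return (np.eye(p) - Q1 @ np.diag(beta) @ Q1.T) / (alpha + delta)
```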
4. DOMINANT MODE INTERPRETATION AS COVARIANCE MATRIX AUGMENTATION
Consider a data set with covariance matrix Σ, which is singular with rank d < p. Since the eigenvalues $\lambda_{d+1} = \cdots = \lambda_p = 0$, the spectral decomposition of Σ is given by
$$ \Sigma = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{32} $$
One possible approach to make Σ non-singular is to find an augmented matrix $\tilde{\Sigma}$ that retains its major characteristics:
1. Σ̃ is symmetric,
2. Σ̃ has full rank p,
3. the first d principal axes of Σ̃ are the same as those of Σ, and they are in the same order,
4. the last p − d principal axes are indeterminate, that is, the corresponding eigenvalues of Σ̃ are identical, and
5. tr(Σ̃) = tr(Σ).
These criteria are all upheld by the matrix Σ̃ defined by
$$ \tilde{\Sigma} = \frac{1}{\gamma} \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 + \delta I & 0 \\ 0 & (\alpha + \delta) I \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{33} $$
where α and δ are parameters satisfying δ ≥ 0, α < λ_d, and α + δ > 0, and γ is a normalizing constant given by
$$ \gamma = \left( \delta p + \alpha (p - d) + \sum_{i=1}^{d} \lambda_i \right) \Big/ \sum_{i=1}^{d} \lambda_i \tag{34} $$
A justification for this approach is provided by the optimum least squares approximation interpretation of principal component analysis (PCA). The method of PCA provides the best d-dimensional approximation to a p-dimensional set of data
by projecting the data onto the first d principal components. The p × p covariance matrix Σ of the projected data is singular
with rank d < p. The matrix (33) can be used to define an inverse PCA transform to "reverse" this process, that is, to obtain the "nearest" p-dimensional non-singular approximation to a d-dimensional singular set of data.
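A sketch of the augmentation (33)–(34), assuming the rank-d factors Q1 and Λ1 of the singular Σ are available (for example, from PCA); the function name and interface are ours.

```python
import numpy as np

def augment_covariance(Q1, lam1, p, alpha, delta=0.0):
    """Full-rank augmentation (33) of a rank-d covariance Q1 diag(lam1) Q1^T."""
    d = lam1.size
    gamma = (delta * p + alpha * (p - d) + lam1.sum()) / lam1.sum()  # (34)
    # (33): inflate the d principal axes by delta, give the remaining p-d
    # axes the common eigenvalue alpha + delta (using Q2 Q2^T = I - Q1 Q1^T),
    # and rescale by 1/gamma so that tr(Sigma_tilde) = tr(Sigma).
    Sigma_tilde = (Q1 @ np.diag(lam1 + delta) @ Q1.T
                   + (alpha + delta) * (np.eye(p) - Q1 @ Q1.T))
    return Sigma_tilde / gamma
```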
Dominant Mode Robust Matched Filter If we use the DMR approximation, the matched filter coefficients for a target with spectral signature $\mathbf{s}_0$ can be evaluated using the formulas
$$ \mathbf{h} = \kappa (\tilde{\Sigma} + \delta I)^{-1} \mathbf{s}_0 = \kappa \left( \mathbf{s}_0 - \sum_{i=1}^{d} \beta_i (\mathbf{q}_i^T \mathbf{s}_0) \, \mathbf{q}_i \right) \tag{35} $$
where
$$ \kappa \triangleq \left[ \mathbf{s}_0^T (\tilde{\Sigma} + \delta I)^{-1} \mathbf{s}_0 \right]^{-1} = \left[ \mathbf{s}_0^T \mathbf{s}_0 - \sum_{i=1}^{d} \beta_i (\mathbf{q}_i^T \mathbf{s}_0)^2 \right]^{-1} \tag{36} $$
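Combining the pieces, a sketch of the DMR matched filter (35)–(36) needs only the dominant eigenpairs of the estimated covariance; again the names and defaults are ours.

```python
import numpy as np

def dmr_matched_filter(Sigma, s0, d, delta=0.0):
    """Dominant mode robust matched filter (35)-(36)."""
    lam, Q = np.linalg.eigh(Sigma)
    lam, Q = lam[::-1], Q[:, ::-1]                  # decreasing order
    alpha = lam[d:].mean()                          # (29)
    beta = (lam[:d] - alpha) / (lam[:d] + delta)    # (31)
    proj = Q[:, :d].T @ s0                          # q_i^T s0, i = 1..d
    h = s0 - Q[:, :d] @ (beta * proj)               # bracketed term in (35)
    kappa = 1.0 / (s0 @ s0 - beta @ proj**2)        # (36)
    return kappa * h
```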
5. EIGENVECTORS AND ENDMEMBERS
The spectral linear mixing model assumes that the spectrum of any pixel can be expressed as
$$ \mathbf{x} = \sum_{i=1}^{M} a_i \mathbf{s}_i + \mathbf{w} = S \mathbf{a} + \mathbf{w} \tag{37} $$
where $\mathbf{s}_i$ is an endmember spectrum, $a_i \ge 0$ its abundance, and $\mathbf{w}$ is normally distributed noise with mean zero and covariance matrix $\sigma_w^2 I$. If the endmembers are assumed linearly independent, the matrix $S = [\mathbf{s}_1 \cdots \mathbf{s}_M]$ has rank M. The covariance matrix of $\mathbf{x}$ is given by
$$ \Sigma = S A S^T + \sigma_w^2 I \tag{38} $$
where $A = \mathrm{diag}\{a_1^2, \ldots, a_M^2\}$. We next accept the approximation (22) with d = M and set $\alpha = \sigma_w^2$. Then (22) and (38) yield
$$ \sigma_w^2 \, Q_d \Lambda_a Q_d^T = S A S^T \tag{39} $$
that is, the columns of Qd and S span the same space. Therefore, at least in theory, either S or Qd can be used for the
implementation of low-rank detectors.
Under the assumptions of the linear mixing model, the maximum likelihood estimate of the background subspace is spanned by the M dominant eigenvectors of the estimated correlation matrix of the data.[9] In practice, the covariance matrix of the noise in hyperspectral data differs from $\sigma_w^2 I$; therefore, this result is an approximation. For non-zero mean
data there is a difference between the linear subspace defined by a covariance matrix and the affine subspace defined by
the correlation matrix.3 Although the two approaches are theoretically different, for detection applications we de-mean the
data and we work with the covariance matrix of the background. Demeaning does not make the two approaches equivalent
but it appears to be sufficient for practical applications.
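The equal-span claim is easy to check numerically. The sketch below draws synthetic linear-mixing data (the generator, seed, and sizes are our arbitrary choices) and compares range(S) with range(Q_d) through principal angles, which should be near zero.

```python
import numpy as np

# Synthetic check that endmembers and dominant eigenvectors span the same space.
rng = np.random.default_rng(0)
p, M, N = 50, 3, 10000
S = rng.random((p, M))                               # M endmember spectra
a = rng.random((N, M))                               # nonnegative abundances
X = a @ S.T + 0.01 * rng.standard_normal((N, p))     # mixing model (37)

Xc = X - X.mean(axis=0)                              # de-mean the data
Sigma_hat = Xc.T @ Xc / N
lam, Q = np.linalg.eigh(Sigma_hat)
Qd = Q[:, -M:]                                       # M dominant eigenvectors

# Principal angles between range(S) and range(Qd): near zero if spans agree.
Us, _ = np.linalg.qr(S)
cosines = np.linalg.svd(Us.T @ Qd, compute_uv=False)
print(np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0))))  # ~0 degrees
```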
6. COVARIANCE OR SUBSPACE BASED DETECTORS
If we assume that $\lambda_i \gg \alpha$ for all $1 \le i \le d$, we obtain the Principal Component Inversion (PCI) approximation of the inverse covariance matrix
$$ \tilde{\Sigma}^{-1}_{\mathrm{PCI}} = \frac{1}{\alpha} \left( I - \sum_{i=1}^{d} \mathbf{q}_i \mathbf{q}_i^T \right) = \frac{1}{\alpha} \left( I - Q_1 Q_1^T \right) \tag{40} $$
This case, which is also known as zero-variance discrimination in the statistics literature, provides the link between the matched filter and subspace detection algorithms such as the OSP.[10] Since endmembers and dominant eigenvectors of the covariance matrix span the same subspace, there is a strong relationship between covariance-based and subspace-based detection algorithms. The link between the two classes of algorithms is provided by (30). Although there is a difference between covariance matrix and correlation matrix eigenspaces, we should keep in mind that the derivation of optimum detection and classification algorithms under a normal distribution model involves the use of covariance matrices. In Figure 2 we show an example of detection statistics for the OSP detector with M = 5 eigenvectors, the matched filter with d = 5 dominant modes, and the CEM detector (basically a matched filter that uses the correlation matrix) with d = 5 dominant modes. We note a strong similarity between the three detection statistics. Similar results have been obtained for other cases. Based on these findings and the underlying theoretical arguments, we prefer covariance-based detectors in practical hyperspectral imaging applications.
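For completeness, a sketch of a detector built on the PCI inverse (40): up to a scale factor, the matched filter score reduces to an OSP-style statistic computed after projecting out the d dominant background modes (function name and interface are ours).

```python
import numpy as np

def pci_detector(Sigma, s0, d):
    """Matched filter with the PCI inverse (40): reject the top-d modes."""
    lam, Q = np.linalg.eigh(Sigma)
    Q1 = Q[:, -d:]                              # d dominant eigenvectors
    alpha = lam[:-d].mean()                     # mean of the remaining eigenvalues
    P = np.eye(Sigma.shape[0]) - Q1 @ Q1.T      # projector I - Q1 Q1^T
    h = (P @ s0) / alpha                        # Sigma_PCI^{-1} s0
    return h / (s0 @ h)                         # matched filter normalization
```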
7. SUMMARY
In this paper we discussed the use of regularization and dominant mode rejection techniques in the implementation of
hyperspectral detection algorithms. We then used the dominant mode rejection inversion of the covariance matrix to obtain
a link between covariance and subspace detection algorithms. Experimental investigations showed that we can emulate
the behavior of subspace algorithms by changing the number of dominant modes in a matched filter detector. Further
work to fully understand the effects of regularization, covariance shrinkage, and dominant mode rejection on detection and
classification algorithms is in progress.
Figure 2. Detection statistics for the orthogonal subspace projection algorithm, the matched filter, and the constrained energy minimization (CEM) algorithm with DMR inversion and regularization.
REFERENCES
[1] D. Manolakis and G. Shaw, "Detection algorithms for hyperspectral imaging applications," IEEE Signal Processing Magazine, pp. 29–43, January 2002.
[2] D. Manolakis, D. Marden, and G. Shaw, "Target detection algorithms for hyperspectral imaging applications," Lincoln Laboratory Journal 14(1), pp. 79–116, 2003.
[3] D. Manolakis, R. Lockwood, T. Cooley, and J. Jacobson, “Is there a best hyperspectral detection algorithm?,” Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XV 7334(1), p. 733402, SPIE,
2009.
[4] S. M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall, New Jersey, 1998.
[5] H. L. Van Trees, Optimum Array Processing, Wiley, New York, 2002.
[6] P. Gill, W. Murray, and M. Wright, Practical Optimization, Academic Press, London, UK, 1981.
[7] O. Ledoit and M. Wolf, “A well-conditioned estimator for large-dimensional covariance matrices,” Journal of Multivariate Analysis 88, pp. 365–411, 2004.
[8] H. Cox and R. Pitre, "Robust DMR and multi-rate adaptive beamforming," in Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 920–924, Nov. 1997.
[9] L. Scharf, Statistical Signal Processing, Addison-Wesley, Reading, MA, 1991.
[10] J. C. Harsanyi and C. I. Chang, “Detection of low probability subpixel targets in hyperspectral image sequences with
unknown backgrounds,” IEEE Trans. Geoscience and Remote Sensing 32, pp. 779–785, July 1994.
ACKNOWLEDGMENTS
This work was sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.