This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
Spatial and Probabilistic
Classification of Forest Structures
Using Landsat TM Data
Jeffrey L. Moffett¹ and Julian Besag²
Abstract. Satellite sensors record upwelling radiant flux from the Earth's
surface. Classifying forest structures from these measurements is a
statistical inference problem, assuming the classes are defined at scales
corresponding to the spectral and spatial resolution of the data. Bayesian
image analysis methods incorporate both spectral and spatial (contextual)
information to improve classification accuracy over spectral classifiers.
Stochastic simulation techniques associated with Bayesian models provide
a mechanism for estimating class probability vectors for each image pixel.
This study evaluates a Markov chain Monte Carlo (MCMC) technique,
the Gibbs sampler, for forest classification. The Gibbs sampler runs as a
time-homogeneous Markov chain. Class estimates are obtained by seeking
the marginal posterior modes (MPM) of the limit distribution. The image
is topographically normalized and training samples are collected from
randomly distributed pixels. Preliminary results using a simple model
show an improvement over maximum likelihood. A hierarchical model
designed to incorporate the terrain effects and the system point spread
function (PSF) is currently under development.
INTRODUCTION
Inferring forest stand types from image data is a classification process that
reduces raw data to useful information. Satellite remote sensing can be useful as
a means of stratifying land cover in the first stage of a multi-stage sampling
scheme for forest inventory. Classified maps also provide necessary information
for landscape planning such as the distribution of stand structure classes by area
across a watershed. Probabilistic contextual classifiers are well suited to modelling
the uncertainty inherent in the process.
BAYESIAN IMAGE ANALYSIS METHODOLOGY
For a given scale, the desired classification can be modelled as a discrete
Markov random field (MRF) $x$ with an associated probability distribution $\pi(x)$.
This probability distribution can be combined with a likelihood density for the
data $y$ given $x$. The combination gives a posterior probability distribution for the
random field. The posterior distribution is thus a conditional probability
distribution for the classification given the image data. Statistical inferences about
$x$ are then based on the posterior distribution $\pi(x \mid y)$.
¹ Graduate student, Silviculture Laboratory, University of Washington, Seattle, WA (jmoffett@silvae.cfr.washington.edu).
² Professor, Department of Statistics, University of Washington, Seattle, WA.
There are several advantages of this approach. In addition to classifying each
pixel on the basis of the spectral information contained in the image data, other
a priori information can be specifically modelled by the prior distribution. This
information pertains to the local characteristics of the classification, e.g. having
adjacent pixels belonging to the same class, having objects with closed boundaries,
or preventing two particular classes from being adjacent. Unlike post-classification
smoothing algorithms, such as majority filters, Bayesian classification techniques
incorporate spatial information without losing fidelity to the data.
Observed data
Consider a two-dimensional rectangular array of pixels over a finite region $S$,
with each pixel identified by an index $i = 1, 2, \ldots, n$. For each pixel in $S$ a remote
sensor records a measurement of radiant flux that is a sample distribution of the
upwelling radiance, degraded by a stochastic process. The imperfect observations
form an image $y = \{y_i : i \in S\}$. Each $y_i$ is vector valued, with one component
for each of TM bands 1-5 and 7.
Markov random fields
Let $X$ be a random vector that classifies $y$ over region $S$ and $x = \{x_i : i \in S\}$ be
any realization of $X$. Thus, $x$ denotes an arbitrary classification of $S$ and
$X = \{X_1, \ldots, X_n\}$. Each univariate $X_i$ takes values among $c$ unordered classes. The
objective of classification is to estimate the "true" class image $x^*$. Bayesian image
analysis assumes $x^*$ is a realization of a locally dependent MRF with an associated
joint probability distribution $\pi(x)$ assigning classes to $S$ (Besag, 1986). In forest
type mapping, stands are typically larger than the individual pixels. Exploiting this
spatial continuity of the surface cover at a particular scale relative to the image
resolution provides a second source of imperfect information about $x^*$.
Unlike time-dependent variables, the only logical conditioning set for spatial
variables is all $x$ except $x_i$. An MRF must be defined with respect to a set of
neighbors, written as $x_{\partial i}$, and satisfy the positivity condition: $\pi(x) > 0$ for all $x$.
Given a specified neighborhood system, the Markov property states
$\pi(x_i \mid x_{-i}) = \pi(x_i \mid x_{\partial i})$, where $x_{-i}$ represents all pixels other than
pixel $i$. The Markov property
allows the full conditional distribution of $x_i$ to be specified in terms of its
neighbors. As a result of this assumption, $\pi(x)$ is a Gibbs distribution with respect
to the assumed neighborhood (Geman and Geman, 1984). The MRF-Gibbs
equivalence provides an explicit formula for the jointly distributed random field,
in terms of the full conditionals or Gibbs potential functions.
A second-order neighborhood consists of eight neighbors in the horizontal,
vertical, and diagonal directions. The general form of a pairwise interaction MRF
for discrete unordered classes can be written as follows (Besag, 1986):
$$\pi(x) \propto \exp\Big( \sum_{k} \alpha_k n_k - \sum_{k<l} \beta_{kl} n_{kl} \Big),$$
where $n_k$ is the number of pixels in class $k$ and $n_{kl}$ is the number of distinct
neighbor pairs with classes $(k, l)$. The $\alpha_k$'s and the $\beta_{kl}$'s govern the percentage
of pixels in each class and the strength of interaction between different classes;
for $\beta_{kl} > 0$ classes $k$ and $l$ are discouraged from being neighbors.
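As a rough illustration of the quantities in the pairwise interaction prior, the sketch below counts $n_k$ and $n_{kl}$ on a small class image and evaluates the unnormalized log prior, assuming the form $\log \pi(x) = \sum_k \alpha_k n_k - \sum_{k<l} \beta_{kl} n_{kl} + \text{const}$. The function names, the toy image, and the parameter values are hypothetical and are not taken from the study.

```python
from collections import Counter

# Offsets to "forward" neighbors (right, down, down-right, down-left) so that
# every unordered neighbor pair in a second-order neighborhood is visited once.
FORWARD = [(0, 1), (1, 0), (1, 1), (1, -1)]

def class_counts(x):
    """n_k: number of pixels in each class."""
    return Counter(c for row in x for c in row)

def pair_counts(x):
    """n_kl (k < l): number of neighbor pairs whose two classes differ."""
    rows, cols = len(x), len(x[0])
    n_kl = Counter()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in FORWARD:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and x[r][c] != x[rr][cc]:
                    n_kl[tuple(sorted((x[r][c], x[rr][cc])))] += 1
    return n_kl

def log_prior_unnormalized(x, alpha, beta):
    """sum_k alpha_k n_k - sum_{k<l} beta_kl n_kl (normalizing constant omitted)."""
    n_k, n_kl = class_counts(x), pair_counts(x)
    return (sum(alpha[k] * n for k, n in n_k.items())
            - sum(beta[kl] * n for kl, n in n_kl.items()))

# Toy 2 x 3 class image with two classes (illustrative values only).
x = [[0, 0, 1],
     [0, 1, 1]]
alpha = {0: 0.0, 1: 0.0}
beta = {(0, 1): 0.5}
value = log_prior_unnormalized(x, alpha, beta)
```

With positive `beta`, every unlike neighbor pair lowers the log prior, which is how the prior discourages fragmented classifications.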
Likelihood function
The recorded spectral radiance influences the classification through the
likelihood function. In contrast to the prior, the likelihood is ideally determined
by knowledge of the sensing process and the existence of training data. The
likelihood function represents the joint probability density as a conditional density
of the data $y$ given the actual scene $x$:
$$l(y \mid x) = \prod_{i=1}^{n} f(y_i \mid x_i).$$
Written in this form the image data are assumed to be conditionally independent
given knowledge of the classes, or class conditionally independent. While this
assumption expedites an overview of the theory, and may be reasonable for some
applications, it is not a valid assumption for remotely sensed images.
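To make the class conditional independence assumption concrete, the following sketch sums per-pixel log densities to obtain the log-likelihood of an image and picks the maximum likelihood class for a single pixel. It assumes normal class densities with a diagonal covariance (a simplification chosen here so the example needs no linear algebra library); the class means and variances are made-up numbers, not estimates from the study.

```python
import math

def log_gauss_diag(y, mean, var):
    """Log density of a normal with diagonal covariance (assumption of this sketch)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))

def log_likelihood(y_img, x_img, means, variances):
    """log l(y | x) = sum_i log f(y_i | x_i) under class conditional independence."""
    return sum(log_gauss_diag(y, means[c], variances[c])
               for y, c in zip(y_img, x_img))

def ml_classify(y, means, variances):
    """Class whose density gives pixel y the highest likelihood."""
    return max(means, key=lambda c: log_gauss_diag(y, means[c], variances[c]))

# Hypothetical two-band class statistics for two of the forest classes.
means = {"SE": [30.0, 60.0], "OF": [25.0, 50.0]}
variances = {"SE": [4.0, 9.0], "OF": [4.0, 9.0]}

pixels = [[29.0, 58.0], [25.5, 51.0]]
labels = [ml_classify(p, means, variances) for p in pixels]
total = log_likelihood(pixels, labels, means, variances)
```

Because the likelihood factorizes over pixels, each pixel can be classified on its own; this is exactly the property that the spatial prior is meant to relax.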
Posterior distribution
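Bayes' theorem can be used to construct a posterior distribution $\pi(x \mid y)$ on which
inferences about $x^*$ can be based. Using this theorem, the posterior distribution for
the scene $x$ given the data $y$ can be written as
$$\pi(x \mid y) \propto l(y \mid x)\, \pi(x),$$
where $l(y \mid x)$ is the likelihood.
The computational burden presented by the posterior distribution is overcome
by the fact that it too is an MRF with a Gibbs distribution. This allows pixels to
be evaluated individually based on their full conditional distributions, written as
$$\pi(x_i \mid y, x_{-i}) \propto f(y_i \mid x_i)\, \pi(x_i \mid x_{\partial i}),$$
where $f(y_i \mid x_i)$ is the class conditional density of the data at pixel $i$.
The conditional prior distribution follows from $\pi(x) = \pi(x_i \mid x_{-i})\, \pi(x_{-i})$.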
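The step from the joint posterior to a single-pixel full conditional can be spelled out as follows. This is a sketch under the class conditional independence assumption, writing $f(y_j \mid x_j)$ for the per-pixel class density, $x_{-i}$ for all pixels other than $i$, and $x_{\partial i}$ for the neighbors of $i$; proportionality is in $x_i$ with $y$ and $x_{-i}$ held fixed.

```latex
\[
\begin{aligned}
\pi(x_i \mid y, x_{-i})
  &\propto \pi(x \mid y) \\
  &\propto \Big[ \prod_{j} f(y_j \mid x_j) \Big]\, \pi(x_i \mid x_{-i})\, \pi(x_{-i}) \\
  &\propto f(y_i \mid x_i)\, \pi(x_i \mid x_{\partial i}).
\end{aligned}
\]
```

The factors $f(y_j \mid x_j)$ for $j \neq i$ and the term $\pi(x_{-i})$ do not involve $x_i$ and are absorbed into the constant; the Markov property then replaces $x_{-i}$ by $x_{\partial i}$.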
Image estimation
Within this framework the final classification is a point estimate of x*. Due to
the high dimensionality of x the optimization required to find an exact solution is
infeasible for realistic scenes. Estimates of various image attributes can be
obtained using stochastic simulation. The marginal posterior modes (MPM)
estimate will minimize the expected number of misclassifications by maximizing
the marginal probability with which each pixel is classified. MPM has been
increasingly adopted in recent years for classification problems (Besag et al.,
1991; Green et al., 1994). MPM estimates of $x_i^*$ are obtained by maximizing
$\pi(x_i \mid y)$, where the most frequently sampled class for pixel $i$ is the estimate of $x_i^*$.
This estimate can be computed using the Gibbs sampler (Geman and Geman,
1984), which runs as a discrete-time Markov chain, with state space the set of all
possible classifications $x$ and limit distribution $\pi(x \mid y)$ (Besag, 1989). For a single
iteration, a new $x_i$ is randomly sampled for each pixel from its full conditional
distribution. This generates a sequence of stochastically dependent classifications
$\{x\}$. An initial classification is required to start the algorithm, followed by a
number of iterations during which samples are not collected. This burn-in period
will minimize the influence of the initial classification. This procedure also
estimates a posterior probability distribution for each pixel, providing spatially
explicit information about the certainty of the point estimate classification.
Credible intervals can also be obtained for measurements such as class areas. The
MPM estimate requires simulating a predetermined number of realizations of the
MRF and it can be difficult to determine the number of realizations required to
assure convergence. Parameter estimation can also be problematic.
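The sampling scheme described above can be sketched in a few dozen lines. This is a minimal illustration, not the authors' C implementation: it assumes a single spectral band, normal class densities with a common variance, a prior whose conditional form is proportional to $\exp\{\beta\, n_i(c)\}$ (with $n_i(c)$ the number of the 8 neighbors in class $c$), a raster-scan update order, and a nearest-mean initial classification. All names and the toy image are hypothetical.

```python
import math
import random

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def full_conditional(y, x, r, c, means, var, beta):
    """Probability of each class at pixel (r, c) given its datum and neighbors."""
    rows, cols = len(x), len(x[0])
    logp = []
    for k, mu in enumerate(means):
        same = sum(1 for dr, dc in NEIGHBORS
                   if 0 <= r + dr < rows and 0 <= c + dc < cols
                   and x[r + dr][c + dc] == k)
        # Gaussian log-likelihood (common variance) plus prior term beta * n_i(k)
        logp.append(-0.5 * (y[r][c] - mu) ** 2 / var + beta * same)
    top = max(logp)
    w = [math.exp(v - top) for v in logp]
    total = sum(w)
    return [v / total for v in w]

def gibbs_mpm(y, means, var, beta, burn_in, n_samples, seed=0):
    """Return the MPM classification and per-pixel class frequencies."""
    rng = random.Random(seed)
    rows, cols, n_classes = len(y), len(y[0]), len(means)
    # Initial classification: nearest class mean.
    x = [[min(range(n_classes), key=lambda k: abs(y[r][c] - means[k]))
          for c in range(cols)] for r in range(rows)]
    counts = [[[0] * n_classes for _ in range(cols)] for _ in range(rows)]
    for it in range(burn_in + n_samples):
        for r in range(rows):
            for c in range(cols):
                p = full_conditional(y, x, r, c, means, var, beta)
                x[r][c] = rng.choices(range(n_classes), weights=p)[0]
                if it >= burn_in:  # discard burn-in sweeps
                    counts[r][c][x[r][c]] += 1
    probs = [[[n / n_samples for n in cell] for cell in row] for row in counts]
    mpm = [[max(range(n_classes), key=lambda k: cell[k]) for cell in row]
           for row in probs]
    return mpm, probs

# Toy image: left half near mean 0, right half near mean 10, and one
# spectrally ambiguous pixel (value 5) inside the left region.
y = [[0.0, 0.0, 0.0, 10.0, 10.0, 10.0] for _ in range(5)]
y[2][1] = 5.0
truth = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
mpm, probs = gibbs_mpm(y, means=[0.0, 10.0], var=1.0, beta=0.5,
                       burn_in=20, n_samples=200)
```

In this toy case the ambiguous pixel is equally likely under both classes, so its label is decided by the prior: all eight neighbors favor class 0. The `probs` array plays the role of the per-pixel posterior probability vectors mentioned in the text.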
A PRELIMINARY MODEL
Using this approach a project was undertaken to classify TM data for the Lizard
Lake watershed near Clallam Bay, Washington. The area is covered by second
growth forests. Six classes were selected to represent general forest structures and
land cover: clearcut (CC); stand initiation (SI); stem exclusion (SE); older forest
(OF); hardwood (HD); and deep water (W). Class means and covariances were
estimated from randomly selected pixels within training sites to avoid biasing the
estimates. As this is an area of rugged terrain, the backwards radiance correction
transformation (Colby, 1991) was used to topographically normalize the image.
The model
A simple model was constructed by first adopting the conventional maximum
likelihood function (ML), which assumes the data to be class conditionally
independent and normally distributed. A discrete pairwise difference prior was
chosen to model the tendency of adjacent pixels to be of the same class. The prior
distribution is written as follows,
$$\pi(x_i = c \mid x_{\partial i}) \propto \exp\{\beta\, n_i(c)\},$$
where $\beta$ is a strength of interaction parameter, and $n_i(c)$ counts the number of
neighbors (8 nearest) of pixel $i$ in class $c$ for $c \in \{0, \ldots, 5\}$.
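To give a feel for how the interaction parameter translates into probabilities, the sketch below computes the neighbor counts for one pixel and the resulting conditional prior, assuming it is proportional to $\exp\{\beta\, n_i(c)\}$ over six classes coded 0 to 5. The function names and the 3 x 3 class patch are hypothetical.

```python
import math

# 8 nearest neighbors (second-order neighborhood)
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbor_counts(x, r, c, n_classes):
    """n_i(c): number of neighbors of pixel (r, c) in each class."""
    rows, cols = len(x), len(x[0])
    counts = [0] * n_classes
    for dr, dc in NEIGHBORS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            counts[x[rr][cc]] += 1
    return counts

def conditional_prior(x, r, c, n_classes, beta):
    """Prior probability of each class at (r, c), proportional to exp(beta * n_i(c))."""
    counts = neighbor_counts(x, r, c, n_classes)
    weights = [math.exp(beta * n) for n in counts]
    total = sum(weights)
    return [w / total for w in weights]

# Centre pixel surrounded by seven class-0 neighbors and one class-1 neighbor.
x = [[0, 0, 0],
     [0, 3, 0],
     [0, 0, 1]]
counts = neighbor_counts(x, 1, 1, n_classes=6)
p = conditional_prior(x, 1, 1, n_classes=6, beta=0.5)
```

Note that the current label of the centre pixel (3 here) does not enter the calculation; only the neighbors do.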
Results
The Gibbs sampler was written in C. Samples were collected for 3000 iterations
after a burn-in of 1000 iterations; $\beta$ was fixed at 0.5. Figure 1 shows the "ground
truth" classification, which does not indicate the presence of hardwood patches
known to be on this landscape. Figure 2 shows the ML estimate from data that
was not normalized. The topographic shade causes some SE forests to be
classified as OF. Figure 3 shows the MPM point estimate from normalized data.
Figure 3 also shows a single pixel at the center of a small lake where the water
is deepest (lower left corner). The influence of the likelihood preserves the single
deep water pixel, which would be lost using a majority filter. Figure 4 shows a
posterior probability classification of SE forest. Note that the edges of the younger
forest types and older forests have a modest probability of being stem exclusion.
A HIERARCHICAL MODEL
The likelihood in the above model is problematic and the prior is the most
basic. Sources of variation such as atmospheric scattering and topography
complicate the classification process. As an alternative to data preprocessing, the
Bayesian approach allows separate models for each source of variation to be
linked through successive conditioning of the form
$\pi(x \mid z_1), \pi(z_1 \mid z_2), \ldots, \pi(z_m \mid y)$
(Smith and Roberts, 1993). Each source of variation is modelled by a conditional
probability distribution as a stage within a hierarchical framework. If a
second-order neighborhood is assumed, triple and quadruple pixel interaction can be
modelled (Besag, 1974). Triples are useful for modelling convex stand boundaries
and quadruples for modelling interior regions. Recent research suggests that for
restoration problems a crude prior will suffice, but that estimates of attributes such
as class area and perimeter are sensitive to the specification of the prior
distribution (Tjelmeland and Besag, 1996).
Topographic effect model
Estimates of slope and aspect are generally calculated from digital elevation
models (DEMs), but are subject to several sources of error. Given the likely errors
and little knowledge of the surface photometric function it is therefore logical to
model the effect of topography with some degree of uncertainty. This can be
specified by first defining z to be the measured pixel image incorporating the
effects of topography. Assuming z to be conditionally independent given x, and
each $z_j$ to be normally distributed gives the following model,
$$\pi(z \mid x, \mathrm{DEM}, \text{sun position}) \propto \prod_{j=1}^{n} \exp\left( -\frac{1}{2\theta} \left( z_j - \frac{\cos^k i \, \cos^k e}{\cos e}\, \mu_{x_j} \right)^2 \right),$$
where $\cos i$ is the cosine of the angle of incidence, $\cos e$ is the cosine of the
slope, $\theta$ is the variance, $\mu_{x_j}$ are the means, and $k$ is the Minnaert constant. This
model is derived from the normalization transformation of Colby (1991). The
assumption of normality in this model is arbitrary, requiring further investigation.
Figure 1. "Ground truth." Note missing patches of hardwoods.
Figure 2. ML classification without topographic normalization showing a greater portion of older forest.
Figure 3. MPM point estimate, $\beta = 0.5$.
Figure 4. Posterior probability of SE forest, $\beta = 0.5$. Linear stretch with brighter shades indicating higher probability.
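The mean structure of the topographic effect model can be illustrated numerically. The sketch below assumes the Minnaert factor $\cos^k i \, \cos^k e / \cos e$ multiplies the flat-terrain class mean, so that dividing an observed value by the same factor performs the backwards radiance correction. Function names and the numeric values are hypothetical, and both cosines are assumed to be strictly positive.

```python
import math

def minnaert_factor(cos_i, cos_e, k):
    """Topographic factor cos^k(i) * cos^k(e) / cos(e)."""
    return (cos_i ** k) * (cos_e ** k) / cos_e

def expected_radiance(mu, cos_i, cos_e, k):
    """Mean of z_j for a pixel whose class has flat-terrain mean mu."""
    return minnaert_factor(cos_i, cos_e, k) * mu

def normalize_radiance(z, cos_i, cos_e, k):
    """Backwards radiance correction: divide out the topographic factor."""
    return z / minnaert_factor(cos_i, cos_e, k)

def log_density(z, mu, cos_i, cos_e, k, theta):
    """Log normal density of z around the topographically adjusted mean."""
    resid = z - expected_radiance(mu, cos_i, cos_e, k)
    return -0.5 * (math.log(2 * math.pi * theta) + resid ** 2 / theta)

# A class with flat-terrain mean 50 seen on a shaded slope (illustrative values).
flat = expected_radiance(50.0, 1.0, 1.0, 0.7)
shaded = expected_radiance(50.0, 0.6, 0.9, 0.7)
recovered = normalize_radiance(shaded, 0.6, 0.9, 0.7)
```

Treating the factor as part of the model mean, rather than as a preprocessing step, lets uncertainty in slope and aspect propagate into the classification instead of being fixed in advance.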
System PSF effects model
After reflecting off of a topographic surface the upwelling radiant flux is
scattered by the atmosphere and recorded as a function of the sensor PSF (Duggin
and Robinove, 1990). If the data $y$ are assumed normally distributed and
conditionally independent given $z$, then this process can be modelled as follows,
$$\pi(y \mid z) \propto \prod_{i=1}^{n} \exp\left( -\frac{1}{2\phi} \Big( y_i - \sum_{j} h_{ij} z_j \Big)^2 \right),$$
where $h$ is the system PSF detecting radiation from $j$ pixels in the proximity of
pixel $i$, and $\phi$ is the variance. Note also that $h$ will be a function of wavelength.
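The role of the PSF as a local weighted average can be sketched as follows, assuming the mean of each recorded value $y_i$ is a weighted sum of the $z_j$ in a small window around pixel $i$. The 3 x 3 kernel is made up for illustration and is not a measured TM point spread function; renormalizing the weights at image edges is a further assumption of this sketch.

```python
def apply_psf(z, h):
    """Weighted local average of image z using an odd-sized square kernel h."""
    rows, cols = len(z), len(z[0])
    half = len(h) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc, wsum = 0.0, 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        w = h[dr + half][dc + half]
                        acc += w * z[rr][cc]
                        wsum += w
            # Renormalize so that weights falling outside the image are ignored.
            out[r][c] = acc / wsum
    return out

# Hypothetical symmetric kernel whose weights sum to 1.
H = [[0.05, 0.10, 0.05],
     [0.10, 0.40, 0.10],
     [0.05, 0.10, 0.05]]

flat = apply_psf([[7.0] * 3 for _ in range(3)], H)
point = apply_psf([[0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0]], H)
```

A uniform scene passes through unchanged, while a single bright pixel leaks into its neighbors. That leakage is why adjacent stand types contaminate each other's recorded radiance near boundaries.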
Posterior distribution
Bayes' theorem allows these models to be combined and gives the following
joint posterior distribution,
$$\pi(x, z \mid y) \propto \pi(y \mid z)\, \pi(z \mid x, \mathrm{DEM}, \text{sun position})\, \pi(x).$$
Again an MPM estimate can be obtained by sampling from the full conditionals.
However the added complexity of this model may require the use of a Metropolis
algorithm. The complexity will be further increased if prior distributions are
specified for some of the parameters as a fully Bayesian approach would suggest.
CONCLUSION
Appropriate accuracy assessment methods for probabilistic classifications must
still be developed. Estimates of posterior probabilities will be difficult to verify.
The "ground truth" for assessing point estimates needs to be at the same scale
and resolution as the image data. Sensitivity analysis must be conducted to assess
the stability of the solutions. Topographic normalization and spatial information
can improve forest classification. Probabilistic classifiers provide additional
information by estimating the level of certainty with which a pixel is classified.