Spatial and Probabilistic Classification of Forest Structures Using Landsat TM Data

Jeffrey L. Moffett¹ and Julian Besag²

Abstract. Satellite sensors record upwelling radiant flux from the Earth's surface. Classifying forest structures from these measurements is a statistical inference problem, provided the classes are defined at scales corresponding to the spectral and spatial resolution of the data. Bayesian image analysis methods incorporate both spectral and spatial (contextual) information to improve classification accuracy over purely spectral classifiers. Stochastic simulation techniques associated with Bayesian models provide a mechanism for estimating class probability vectors for each image pixel. This study evaluates a Markov chain Monte Carlo (MCMC) technique, the Gibbs sampler, for forest classification. The Gibbs sampler runs as a time-homogeneous Markov chain. Class estimates are obtained by seeking the marginal posterior modes (MPM) of the limit distribution. The image is topographically normalized and training samples are collected from randomly distributed pixels. Preliminary results using a simple model show an improvement over maximum likelihood. A hierarchical model designed to incorporate terrain effects and the system point spread function (PSF) is currently under development.

INTRODUCTION

Inferring forest stand types from image data is a classification process that reduces raw data to useful information. Satellite remote sensing can be used to stratify land cover in the first stage of a multi-stage sampling scheme for forest inventory. Classified maps also provide information needed for landscape planning, such as the distribution of stand structure classes by area across a watershed.
Probabilistic contextual classifiers are well suited to modelling the uncertainty inherent in the process.

¹Graduate student, Silviculture Laboratory, University of Washington, Seattle, WA (jmoffett@silvae.cfr.washington.edu).
²Professor, Department of Statistics, University of Washington, Seattle, WA.

BAYESIAN IMAGE ANALYSIS METHODOLOGY

For a given scale, the desired classification can be modelled as a discrete Markov random field (MRF) $x$ with an associated probability distribution $\pi(x)$. This probability distribution can be combined with a likelihood density for the data $y$ given $x$. The combination gives a posterior probability distribution for the random field, that is, a conditional probability distribution for the classification given the image data. Statistical inferences about $x$ are then based on the posterior distribution $\pi(x \mid y)$.

This approach has several advantages. In addition to classifying each pixel on the basis of the spectral information contained in the image data, other a priori information can be modelled explicitly by the prior distribution. This information pertains to the local characteristics of the classification, e.g. adjacent pixels belonging to the same class, objects having closed boundaries, or two particular classes being prevented from being adjacent. Unlike post-classification smoothing algorithms, such as majority filters, Bayesian classification techniques incorporate spatial information without losing fidelity to the data.

Observed data

Consider a two-dimensional rectangular array of pixels over a finite region $S$, with each pixel identified by an index $i = 1, 2, \ldots, n$. For each pixel in $S$ a remote sensor records a measurement of radiant flux that samples the upwelling radiance, degraded by a stochastic process. The imperfect observations form an image $y = \{y_i : i \in S\}$. Each $y_i$ is vector valued, with one component for each of TM bands 1-5 and 7.

Markov random fields

Let $X$ be a random vector that classifies $y$ over region $S$, and let $x = \{x_i : i \in S\}$ be any realization of $X$. Thus $x$ denotes an arbitrary classification of $S$, and $X = \{X_1, \ldots, X_n\}$. Each univariate $X_i$ takes values among $c$ unordered classes. The objective of classification is to estimate the "true" class image $x^*$. Bayesian image analysis assumes $x^*$ is a realization of a locally dependent MRF with an associated joint probability distribution $\pi(x)$ assigning classes to $S$ (Besag, 1986). In forest type mapping, stands are typically larger than individual pixels. Exploiting this spatial continuity of the surface cover at a particular scale relative to the image resolution provides a second source of imperfect information about $x^*$.

Unlike time-dependent variables, the only logical conditioning set for a spatial variable $x_i$ is all of $x$ except $x_i$. An MRF must be defined with respect to a set of neighbors, written as $x_{\partial i}$, and must satisfy the positivity condition $\pi(x) > 0$ for all $x$. Given a specified neighborhood system, the Markov property states

$$\pi(x_i \mid x_{-i}) = \pi(x_i \mid x_{\partial i}),$$

where $x_{-i}$ represents all pixels other than pixel $i$. The Markov property allows the full conditional distribution of $x_i$ to be specified in terms of its neighbors. As a result of this assumption, $\pi(x)$ is a Gibbs distribution with respect to the assumed neighborhood (Geman and Geman, 1984). The MRF-Gibbs equivalence provides an explicit formula for the jointly distributed random field, in terms of the full conditionals or Gibbs potential functions. A second-order neighborhood consists of eight neighbors in the horizontal, vertical, and diagonal directions.
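
As an illustration of the second-order neighborhood, the following minimal Python sketch counts how many of a pixel's eight neighbors fall in each class. The function name `neighbor_class_counts`, the small label array, and the treatment of image edges (edge pixels simply have fewer neighbors) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

# Offsets (row, col) of the eight second-order neighbors of a pixel:
# horizontal, vertical, and diagonal directions.
SECOND_ORDER_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                        (0, -1),           (0, 1),
                        (1, -1),  (1, 0),  (1, 1)]

def neighbor_class_counts(labels, row, col, n_classes):
    """Count the neighbors of pixel (row, col) that fall in each class."""
    n_rows, n_cols = labels.shape
    counts = np.zeros(n_classes, dtype=int)
    for dr, dc in SECOND_ORDER_OFFSETS:
        r, c = row + dr, col + dc
        # Assumption: pixels outside the image are ignored, so edge and
        # corner pixels have fewer than eight neighbors.
        if 0 <= r < n_rows and 0 <= c < n_cols:
            counts[labels[r, c]] += 1
    return counts

# Hypothetical 3 x 3 classification with three classes (0, 1, 2).
labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [2, 2, 1]])
print(neighbor_class_counts(labels, 1, 1, 3))  # interior pixel, eight neighbors
print(neighbor_class_counts(labels, 0, 0, 3))  # corner pixel, three neighbors
```

Counts of this kind are the only information about the rest of the image that a locally dependent prior needs when a single pixel is updated.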

The general form of a pairwise interaction MRF for discrete unordered classes can be written as follows (Besag, 1986):

$$\pi(x) \propto \exp\Big( \sum_{k} \alpha_k n_k - \sum_{k<l} \beta_{kl} n_{kl} \Big),$$

where $n_k$ is the number of pixels in class $k$ and $n_{kl}$ is the number of distinct neighbor pairs with classes $(k, l)$. The $\alpha_k$'s and the $\beta_{kl}$'s govern the percentage of pixels in each class and the strength of interaction between different classes; for $\beta_{kl} > 0$, classes $k$ and $l$ are discouraged from being neighbors.

Likelihood function

The recorded spectral radiance influences the classification through the likelihood function. In contrast to the prior, the likelihood is ideally determined by knowledge of the sensing process and the existence of training data. The likelihood function represents the joint probability density as a conditional density of the data $y$ given the actual scene $x$:

$$l(y \mid x) = \prod_{i=1}^{n} f(y_i \mid x_i).$$

Written in this form, the image data are assumed to be conditionally independent given knowledge of the classes, or class conditionally independent. While this assumption expedites an overview of the theory, and may be reasonable for some applications, it is not a valid assumption for remotely sensed images.

Posterior distribution

Bayes theorem can be used to construct a posterior distribution $\pi(x \mid y)$ on which inferences about $x^*$ can be based. Using this theorem, the posterior distribution for the scene $x$ given the data $y$ can be written as

$$\pi(x \mid y) \propto l(y \mid x)\, \pi(x).$$

The computational burden presented by the posterior distribution is overcome by the fact that it too is an MRF with a Gibbs distribution. This allows pixels to be evaluated individually based on their full conditional distributions, written as

$$\pi(x_i \mid y, x_{-i}) \propto f(y_i \mid x_i)\, \pi(x_i \mid x_{\partial i}).$$

The conditional prior distribution follows from $\pi(x) = \pi(x_i \mid x_{-i})\, \pi(x_{-i})$.

Image estimation

Within this framework the final classification is a point estimate of $x^*$. Due to the high dimensionality of $x$, the optimization required to find an exact solution is infeasible for realistic scenes. Estimates of various image attributes can instead be obtained using stochastic simulation.
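
The reduction of the posterior to a single-pixel full conditional, used in the posterior distribution discussion above, can be spelled out in a few steps. This is a sketch under the class conditional independence assumption; the symbol $f(y_i \mid x_i)$ for the per-pixel density is assumed notation, while $x_{-i}$ and $x_{\partial i}$ denote all other pixels and the neighbors of pixel $i$, as in the text.

```latex
\begin{aligned}
\pi(x_i \mid y, x_{-i})
  &= \frac{\pi(x \mid y)}{\sum_{x_i'} \pi(x_i', x_{-i} \mid y)} \\
  &\propto \Big[ \prod_{j} f(y_j \mid x_j) \Big]\,
           \pi(x_i \mid x_{-i})\, \pi(x_{-i}) \\
  &\propto f(y_i \mid x_i)\, \pi(x_i \mid x_{\partial i}).
\end{aligned}
```

The second line applies Bayes theorem together with the factorization of the prior. The third line drops every factor that does not depend on $x_i$, namely $f(y_j \mid x_j)$ for $j \neq i$ and $\pi(x_{-i})$, and then uses the Markov property to replace $x_{-i}$ by $x_{\partial i}$.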

The marginal posterior modes (MPM) estimate minimizes the expected number of misclassifications by maximizing the marginal probability with which each pixel is classified. MPM has been increasingly adopted in recent years for classification problems (Besag et al., 1991; Green et al., 1994). The MPM estimate of $x_i^*$ is obtained by maximizing $\pi(x_i \mid y)$; in practice, the most frequently sampled class for pixel $i$ is the estimate of $x_i^*$. This estimate can be computed using the Gibbs sampler (Geman and Geman, 1984), which runs as a discrete-time Markov chain with state space the set of all possible classifications $x$ and limit distribution $\pi(x \mid y)$ (Besag, 1989). In a single iteration, a new $x_i$ is randomly sampled for each pixel from its full conditional distribution. This generates a sequence of stochastically dependent classifications $\{x\}$. An initial classification is required to start the algorithm, followed by a number of iterations during which samples are not collected. This burn-in period reduces the influence of the initial classification.

This procedure also estimates a posterior probability distribution for each pixel, providing spatially explicit information about the certainty of the point estimate classification. Credible intervals can also be obtained for measurements such as class areas. The MPM estimate requires simulating a predetermined number of realizations of the MRF, and it can be difficult to determine the number of realizations required to assure convergence. Parameter estimation can also be problematic.

A PRELIMINARY MODEL

Using this approach, a project was undertaken to classify TM data for the Lizard Lake watershed near Clallam Bay, Washington. The area is covered by second-growth forests. Six classes were selected to represent general forest structures and land cover: clearcut (CC); stand initiation (SI); stem exclusion (SE); older forest (OF); hardwood (HD); and deep water (W).
Class means and covariances were estimated from randomly selected pixels within training sites to avoid biasing the estimates. As this is an area of rugged terrain, the backwards radiance correction transformation (Colby, 1991) was used to topographically normalize the image.

The model

A simple model was constructed by first adopting the likelihood function of the conventional maximum likelihood (ML) classifier, which assumes the data to be class conditionally independent and normally distributed. A discrete pairwise difference prior was chosen to model the tendency of adjacent pixels to be of the same class. The prior distribution is written as follows,

$$\pi(x_i = c \mid x_{\partial i}) \propto \exp\big( \beta\, n_i(c) \big),$$

where $\beta$ is a strength of interaction parameter, and $n_i(c)$ counts the number of neighbors (8 nearest) of pixel $i$ in class $c$, for $c \in \{0, \ldots, 5\}$.

Results

The Gibbs sampler was written in C. Samples were collected for 3000 iterations after a burn-in of 1000 iterations; $\beta$ was fixed at 0.5. Figure 1 shows the "ground truth" classification, which does not indicate the presence of hardwood patches known to be on this landscape. Figure 2 shows the ML estimate from data that were not normalized. The topographic shade causes some SE forests to be classified as OF. Figure 3 shows the MPM point estimate from normalized data. Figure 3 also shows a single pixel at the center of a small lake where the water is deepest (lower left corner). The influence of the likelihood preserves the single deep water pixel, which would be lost using a majority filter. Figure 4 shows a posterior probability classification of SE forest. Note that the edges of the younger forest types and older forests have a modest probability of being stem exclusion.

A HIERARCHICAL MODEL

The likelihood in the above model is problematic and the prior is of the most basic kind. Sources of variation such as atmospheric scattering and topography complicate the classification process.
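
The Gibbs sampling and MPM procedure used for the preliminary model can be sketched in Python as follows. This is a minimal illustration and not the authors' C implementation: the function name `gibbs_mpm`, the raster scan order, the handling of image edges, the ML starting classification, the short run lengths, and the single-band synthetic data are all assumptions made for the example. The paper's class likelihoods are multivariate normal; here any array of per-pixel log-likelihoods can be supplied.

```python
import numpy as np

def gibbs_mpm(log_lik, beta=0.5, n_burn=20, n_keep=50, seed=0):
    """Gibbs sampler with an MPM summary.

    log_lik: (rows, cols, classes) array of log f(y_i | x_i = c).
    Returns the MPM classification and per-pixel class frequencies.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols, n_classes = log_lik.shape
    x = log_lik.argmax(axis=2)            # assumed start: per-pixel ML classes
    visits = np.zeros(log_lik.shape)      # times each class is sampled per pixel
    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :]
    for sweep in range(n_burn + n_keep):
        for r in range(n_rows):           # assumed raster scan order
            for c in range(n_cols):
                # n_i(c): neighbors of the pixel in each class (fewer at edges)
                block = x[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
                counts = np.bincount(block.ravel(), minlength=n_classes).astype(float)
                counts[x[r, c]] -= 1      # the block includes the pixel itself
                # full conditional: likelihood times conditional prior
                logp = log_lik[r, c] + beta * counts
                p = np.exp(logp - logp.max())
                p /= p.sum()
                x[r, c] = rng.choice(n_classes, p=p)
        if sweep >= n_burn:               # discard the burn-in sweeps
            visits[rows, cols, x] += 1
    probs = visits / n_keep               # estimated marginal posteriors
    return probs.argmax(axis=2), probs

# Hypothetical two-class scene observed in one noisy band.
rng = np.random.default_rng(1)
truth = np.zeros((12, 12), dtype=int)
truth[:, 6:] = 1
y = truth + rng.normal(scale=0.8, size=truth.shape)
means = np.array([0.0, 1.0])
log_lik = -0.5 * ((y[..., None] - means) / 0.8) ** 2
mpm, probs = gibbs_mpm(log_lik)
print("ML agreement with truth: ", (log_lik.argmax(axis=2) == truth).mean())
print("MPM agreement with truth:", (mpm == truth).mean())
```

The array `probs` corresponds to the per-pixel posterior probability maps discussed in the Results, and `mpm` to the point estimate. The run reported in the paper used a burn-in of 1000 iterations followed by 3000 sampled iterations, far longer than the defaults chosen here to keep the example quick.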
As an alternative to data preprocessing, the Bayesian approach allows separate models for each source of variation to be linked through successive conditioning of the form $\pi(x \mid z_1), \pi(z_1 \mid z_2), \ldots, \pi(z_m \mid y)$ (Smith and Roberts, 1993). Each source of variation is modelled by a conditional probability distribution as a stage within a hierarchical framework.

If a second-order neighborhood is assumed, triple and quadruple pixel interactions can be modelled (Besag, 1974). Triples are useful for modelling convex stand boundaries and quadruples for modelling interior regions. Recent research suggests that for restoration problems a crude prior will suffice, but that estimates of attributes such as class area and perimeter are sensitive to the specification of the prior distribution (Tjelmeland and Besag, 1996).

Figure 1. "Ground truth." Note missing patches of hardwoods.
Figure 2. ML classification without topographic normalization, showing a greater portion of older forest.
Figure 3. MPM point estimate, $\beta = 0.5$.
Figure 4. Posterior probability of SE forest, $\beta = 0.5$. Linear stretch with brighter shades indicating higher probability.

Topographic effect model

Estimates of slope and aspect are generally calculated from digital elevation models (DEMs), but are subject to several sources of error. Given the likely errors and the limited knowledge of the surface photometric function, it is logical to model the effect of topography with some degree of uncertainty. This can be specified by first defining $z$ to be the measured pixel image incorporating the effects of topography. Assuming $z$ to be conditionally independent given $x$, and each $z_j$ to be normally distributed, gives the following model,

$$P(z \mid x, \mathrm{DEM}, \text{sun position}) \propto \prod_{j=1}^{n} \exp\Big( -\frac{1}{2\theta} \Big( z_j - \frac{\cos^k i \, \cos^k e}{\cos e}\, \mu_{x_j} \Big)^2 \Big),$$

where $\cos i$ is the cosine of the angle of incidence, $\cos e$ is the cosine of the slope, $\theta$ is the variance, $\mu_{x_j}$ are the class means, and $k$ is the Minnaert constant.
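
The terrain scaling in the topographic effect model can be illustrated with a short Python sketch. It reads the model as a normal density whose mean is the class mean multiplied by cos^k i cos^k e / cos e, which is an interpretation of the scanned equation rather than a verified transcription. The function names, the angles, the class mean of 40, the Minnaert constant of 0.5, and the variance of 4 are hypothetical values chosen only for the example.

```python
import numpy as np

def minnaert_factor(cos_i, cos_e, k):
    """Assumed factor by which terrain scales the level-surface radiance."""
    return (cos_i ** k) * (cos_e ** k) / cos_e

def expected_radiance(class_mean, cos_i, cos_e, k):
    """Mean of the observed value z_j for a pixel of a given class on a slope."""
    return minnaert_factor(cos_i, cos_e, k) * class_mean

def normalize_radiance(z, cos_i, cos_e, k):
    """Backwards correction: divide the terrain factor back out of z."""
    return z / minnaert_factor(cos_i, cos_e, k)

def log_density(z, class_mean, cos_i, cos_e, k, theta):
    """Unnormalized normal log density of one observed value z."""
    resid = z - expected_radiance(class_mean, cos_i, cos_e, k)
    return -resid ** 2 / (2.0 * theta)

# Hypothetical pixel: 60 degree incidence angle, 30 degree slope, k = 0.5.
cos_i = np.cos(np.radians(60.0))
cos_e = np.cos(np.radians(30.0))
z = expected_radiance(40.0, cos_i, cos_e, 0.5)
print("expected radiance on the slope:", z)
print("after normalization:", normalize_radiance(z, cos_i, cos_e, 0.5))
print("log density of a value 2 units off:",
      log_density(z + 2.0, 40.0, cos_i, cos_e, 0.5, 4.0))
```

Under this reading, preprocessing the image with the normalization and modelling the terrain effect inside the likelihood use the same factor; the hierarchical version differs in that it attaches a variance to the relationship instead of treating the correction as exact.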
The topographic effect model is derived from the normalization transformation of Colby (1991). The assumption of normality in this model is arbitrary and requires further investigation.

System PSF effects model

After reflecting off a topographic surface, the upwelling radiant flux is scattered by the atmosphere and recorded as a function of the sensor PSF (Duggin and Robinove, 1990). If the data $y$ are assumed normally distributed and conditionally independent given $z$, then this process can be modelled as follows,

$$P(y \mid z) \propto \prod_{i=1}^{n} \exp\Big( -\frac{1}{2\phi} \Big( y_i - \sum_{j} h_{ij} z_j \Big)^2 \Big),$$

where $h$ is the system PSF detecting radiation from the $j$ pixels in the proximity of pixel $i$, and $\phi$ is the variance. Note also that $h$ will be a function of wavelength.

Posterior distribution

Bayes theorem allows these models to be combined and gives the following joint posterior distribution,

$$\pi(x, z \mid y) \propto P(y \mid z)\, P(z \mid x, \mathrm{DEM}, \text{sun position})\, \pi(x).$$

Again an MPM estimate can be obtained by sampling from the full conditionals. However, the added complexity of this model may require the use of a Metropolis algorithm. The complexity will be further increased if prior distributions are specified for some of the parameters, as a fully Bayesian approach would suggest.

CONCLUSION

Appropriate accuracy assessment methods for probabilistic classifications must still be developed. Estimates of posterior probabilities will be difficult to verify. The "ground truth" for assessing point estimates needs to be at the same scale and resolution as the image data. Sensitivity analysis must be conducted to assess the stability of the solutions. Topographic normalization and spatial information can improve forest classification. Probabilistic classifiers provide additional information by estimating the level of certainty with which a pixel is classified,