Model based classification and segmentation of hyperspectral images

Project proposal Wiskunde Toegepast

A. Stein

Wageningen Universiteit, Wiskundige en statistische methoden. Postbus 100, 6700 AC Wageningen, The Netherlands

Titel: Modelmatige classificatie en segmentatie van hyperspectraal beelden

Title: Model based classification and segmentation of hyperspectral images

Address: Prof.dr. A. Stein

Wageningen Universiteit (Biometris)

Postbus 100

6700 AC Wageningen

tel. +31-317-483551

fax. +31-317-483554

email: alfred.stein@wur.nl

secr. +31-317-484085

Programme: Wiskunde Toegepast (Applied Mathematics)

Applications elsewhere: No support for this project is being requested elsewhere.

Key words: model based segmentation, spatial statistics, hyperspectral images

Mathematics subject classification:

1) Summary of the project

1.1 Research

In recent years, the quality of remotely sensed images has increased. Among other things, this has resulted in hyperspectral images, which have a spectral resolution of 100 or more bands. Each image may still contain on the order of 10^6 pixels. With such images, subsequent classification and segmentation become increasingly complicated. In this proposal we aim at a model-based approach to the classification and segmentation of hyperspectral images. The aim is to develop statistically sound and valid methods using modern developments in computational procedures (MCMC, hierarchical modeling, bootstrap). The procedures will be applied to several hyperspectral images with a clear geological and vegetation characterization.

1.2 Utilisation

- To be filled in

1.3 Translation in Dutch

2. Composition of the research group

2.1

Prof. A. Stein, WU – ITC, Spatial statistics

Dr. M.C. van Lieshout, CWI, Spatial statistics

Prof. F. van der Meer, TUD – ITC, Applied Earth Observation

2.2 Candidates

This project has been written to accommodate Willem Kruijer, currently an MSc student in statistics at the Rijksuniversiteit Groningen. In our assessment, he is an excellent candidate to do statistical research in a field where this is very much needed.

3) Scientific description of the proposal

3.1 The project

In applications of remote sensing for life sciences, statistical image classification and segmentation are important tasks. Segmentation differs from classification in the sense that segmentation aims at identifying spatially homogeneous contiguous objects, whereas classification solely deals with aspects of homogeneity.

In image segmentation it is particularly important to derive spatially dependent objects from hyperspectral images. So far, classification and segmentation methods for hyperspectral images have mainly had their roots in data reduction techniques: principal components or selection of a limited number of indicator bands, followed by maximum likelihood or k-nearest neighbors classification, is the most commonly applied approach (Richards, 2001). Standard statistical classification and segmentation methods often break down when applied to hyperspectral images. The reason for this is twofold. First, values obtained in different pixels are usually highly correlated, because dependence exists in the physical world; second, the number of bands and pixels is very large. Further, the objects are most often non-overlapping (but exceptions occur). It is therefore important not only to make homogeneous classes, but also to properly represent the spatial dependence between objects as modeled by means of the pixel values. Hyperspectral information may, however, be treated in another way, namely by considering the spectral information for each individual pixel as a curve, with the band number on the horizontal axis and the intensity on the vertical axis. Such a curve may be generated by the physical process, which in turn would require a (Bayesian) statistical approach for modeling. As a typical image contains 640 by 480 pixels, hundreds of thousands of these curves are obtained. At this stage it appears useful and important to further develop classification and segmentation methods for hyperspectral images based upon modeling of curves. This could be a much more natural approximation of intensity spectra than approximation by step functions.

During the past years some progress has been made in collecting standard curves for both geological and vegetation objects. Such curves are useful for investigating the general type of objects, as well as some specific features, displayed as deviations from the general type. For both purposes proper modeling is needed. Various distance and classification measures can be defined (an example is the ‘total variation norm’ for a region containing mainly reflections in the short-wavelength segment), each being sensitive to a particular part of the curve. Such measures (metrics) can then be combined into new metrics at larger scales. In this respect wavelets seem promising. For a specific type of land use or geological unit it should be possible to define the measure that is most suitable for detecting the differences with other units, provided that curves for the given type of land use are available. For any new measurement, however, such curves may be different, possibly including atmospheric distortions, and also reflecting local variation in land use (within-parcel variation, for example). A Bayesian analysis would start with the (known) optimal curve, the a priori discriminatory measure, and a first naïve classification. The curves, and the corresponding discriminatory measure, would then be adjusted iteratively to the local reflection conditions that apply to the specific combination of atmospheric condition, crop properties and/or geologic units. An optimal discriminatory metric requires the particular details that apply to the object that has to be identified.
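As an illustration of a measure that is sensitive to only one part of the curve, the sketch below computes a discrete total variation of a sampled spectrum restricted to a wavelength window. It is a minimal Python example and not the measure the project would necessarily adopt: the function name `total_variation`, the six-band spectra and the 700 nm cut-off are all assumptions made for illustration.

```python
def total_variation(wavelengths, reflectance, lo=None, hi=None):
    """Discrete total variation of a sampled spectral curve on [lo, hi]."""
    # Keep only the samples whose wavelength falls inside the window.
    window = [r for w, r in zip(wavelengths, reflectance)
              if (lo is None or w >= lo) and (hi is None or w <= hi)]
    # Sum of absolute jumps between consecutive bands.
    return sum(abs(b - a) for a, b in zip(window, window[1:]))


# Hypothetical 6-band spectra (wavelength in nm, reflectance in [0, 1]).
wavelengths = [400, 500, 600, 700, 800, 900]
flat_curve = [0.30, 0.30, 0.31, 0.30, 0.30, 0.30]
wiggly_curve = [0.30, 0.10, 0.40, 0.30, 0.30, 0.30]

# Restricting the window to short wavelengths makes the measure
# sensitive to that part of the curve only.
tv_flat_short = total_variation(wavelengths, flat_curve, hi=700)
tv_wiggly_short = total_variation(wavelengths, wiggly_curve, hi=700)
tv_wiggly_long = total_variation(wavelengths, wiggly_curve, lo=700)
```

In this toy case the two curves differ strongly on the short-wavelength window and not at all on the long-wavelength window, which is the kind of band-specific sensitivity described above.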

An unsupervised classification, on the other hand, has no fixed number of classes beforehand. It would therefore require a different algorithm that continuously balances the differences between and within segments. As such it should lead to the most likely segmentation of an area. This would again require a well-specified metric for (groups of) pixels, based on the full set of available spectral values but with a careful, model-based weighting. After application of such a measure, spatial models, such as Markov random field models or more advanced techniques, are incorporated in the algorithm.
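One simple way to combine a per-pixel measure with a Markov random field is iterated conditional modes (ICM) with a Potts-type smoothness penalty. The proposal does not prescribe this algorithm; it is shown here only as a minimal Python sketch of how spatial dependence can enter a labelling. The names `icm_segment`, `cost` and `beta`, the fixed number of labels and the 3 x 3 toy image are assumptions made for illustration.

```python
def icm_segment(cost, beta=1.0, sweeps=5):
    """Iterated conditional modes for a Potts-type Markov random field.

    cost[i][j][k] is the data cost of giving pixel (i, j) label k,
    for example a distance between the pixel curve and class curve k.
    """
    rows, cols, n_labels = len(cost), len(cost[0]), len(cost[0][0])
    # Start from the pixel-wise best label (a non-spatial classification).
    labels = [[min(range(n_labels), key=lambda k: cost[i][j][k])
               for j in range(cols)] for i in range(rows)]
    for _ in range(sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # Labels of the 4-connected neighbours inside the image.
                neighbours = [labels[a][b]
                              for a, b in ((i - 1, j), (i + 1, j),
                                           (i, j - 1), (i, j + 1))
                              if 0 <= a < rows and 0 <= b < cols]

                def energy(k):
                    # Data cost plus beta per neighbour with another label.
                    disagree = sum(1 for n in neighbours if n != k)
                    return cost[i][j][k] + beta * disagree

                best = min(range(n_labels), key=energy)
                if best != labels[i][j]:
                    labels[i][j] = best
                    changed = True
        if not changed:
            break  # a full sweep without changes: local optimum reached
    return labels


# Toy 3 x 3 image with two labels: every pixel clearly prefers label 0,
# except the centre pixel, which weakly prefers label 1 (e.g. noise).
cost = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
cost[1][1] = [0.6, 0.4]

unsmoothed = icm_segment(cost, beta=0.0)  # purely pixel-wise labels
smoothed = icm_segment(cost, beta=0.5)    # neighbours outvote the centre
```

With `beta=0.0` the result is the plain per-pixel classification; with a positive `beta` the isolated centre label is absorbed into the surrounding segment. ICM only finds a local optimum, so it stands in here for the more advanced techniques mentioned above.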

For this purpose it would hence be useful to develop a generic procedure that, given n curves, can automatically determine a prior metric, corresponding to the n given curves, and a posterior metric that belongs to the given curves. It would then be desirable to need only this metric when some additional pixels at the edge of the image are observed under precisely the same conditions. The problem to face is that this measure cannot be constructed based solely on the n curves, but that the source (and the amount) of variation also needs to be included. If the posterior curves are known, then the residuals within the different classes need to be considered. It is a challenge, however, to specify the variation within the classes a priori.

The research method will consist first of identifying usable curves on a per-pixel basis. Attention will focus on splines, wavelets, kernels and LOESS curves. In the background, probability density curves may be considered, which after estimation would yield a meaningful interpretation and allow further segmentation steps to be applied. This should then lead to meaningful objects. Methods will be developed to compare and either combine or separate curves from neighboring pixels, with proper attention to the presence of residuals. Standard metrics like the Kullback distance and the Hellinger distance seem to be reasonable first candidates for that purpose, to be critically analyzed and improved afterwards. A further step would have to address the applicability of such metrics within a spatially dependent context. Use of prior information seems promising and, where it is available, indispensable. A critical consideration at each of these steps is that the number of pixels is large, and that a solution is to be reached within a reasonable time frame.
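To make the two candidate metrics concrete, the following minimal Python sketch compares spectral curves from neighboring pixels using the Hellinger distance and the Kullback-Leibler divergence. It assumes, purely for illustration, that each non-negative spectrum is rescaled to sum to one so that it can be treated as a discrete probability distribution; the function names and the four-band example spectra are hypothetical.

```python
import math


def normalize(curve):
    """Rescale a non-negative spectral curve so that its values sum to 1."""
    total = sum(curve)
    return [v / total for v in curve]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (in [0, 1])."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def kullback_leibler(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q); eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps))
               for a, b in zip(p, q) if a > 0)


# Hypothetical 4-band spectra from three pixels.
pixel_a = [0.2, 0.4, 0.6, 0.8]  # rising spectrum
pixel_b = [0.4, 0.8, 1.2, 1.6]  # same shape, twice as bright
pixel_c = [0.8, 0.6, 0.4, 0.2]  # falling spectrum

p_a, p_b, p_c = normalize(pixel_a), normalize(pixel_b), normalize(pixel_c)
d_same_shape = hellinger(p_a, p_b)    # curves differ only in brightness
d_other_shape = hellinger(p_a, p_c)   # curves differ in shape
kl_other_shape = kullback_leibler(p_a, p_c)
```

Under this normalization both measures ignore overall brightness and respond only to curve shape, which may or may not be desirable for a given application. Note also that the Hellinger distance is symmetric, whereas the Kullback-Leibler divergence in general is not.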

Segmentation into fixed classes (using either a supervised or an unsupervised segmentation routine) has the drawback that nature is rarely, if ever, crisp. To overcome this problem, recent years have seen an increased use of fuzzy and random sets, leading to a more probabilistic way of reasoning. We expect a model-based approach using spectral curves to make a useful further contribution to this approach, in which spatially dependent classification and segmentation procedures with a solid probabilistic background are formulated. Such a segmentation procedure has the additional advantage that its quality can easily be visualized.

Of particular interest seems to be the use of wavelets, which allow a multiscale analysis in the combined spatial and spectral domain of an image. This would make it possible to distinguish local anomalies from general trends, either for the purpose of classification or for removing superfluous details.
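The separation of local anomalies from general trends can be illustrated with the simplest wavelet, the Haar wavelet. The Python sketch below is a one-dimensional example along the spectral axis only, using an unnormalised Haar transform (pairwise means and half-differences); the choice of wavelet, the function names and the eight-band spectrum are assumptions for illustration, and the combined spatial and spectral analysis envisaged here would require a multidimensional transform.

```python
def haar_step(signal):
    """One Haar level: pairwise means (trend) and half-differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2.0 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / 2.0 for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Return the coarsest approximation and the detail vectors, finest first."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical 8-band spectrum: constant level 2 with one anomalous band.
spectrum = [2, 2, 2, 2, 2, 8, 2, 2]
trend, details = haar_decompose(spectrum, levels=2)
```

The only non-zero fine-scale detail coefficient sits at the position of the anomalous band, while `trend` holds a two-value coarse summary of the spectrum. Setting small detail coefficients to zero before reconstruction would be one way of removing superfluous details.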

A final important issue concerns spatial data quality and spatial data uncertainty. Every band is likely to contain deviations, in terms of precise location in a field setting, in terms of spatial resolution or in terms of recording. Standard curves are derived on the basis of ideal information, after correction for several distortions. A specific aim of this project is to quantify the influence of such deviations on estimates of density and model curves. Again, a complicating factor is that measurement errors of bands that are close together, either in physical space or in feature space, can be correlated. A hierarchical Bayesian approach may be useful at this stage.

At this moment, research into the use of hyperspectral images mainly takes place within a disciplinary context. Convincing examples exist of applications to object characterization in a geological and vegetation setting. Within these disciplines, a proper, generally applicable and well-founded theory for classification and segmentation is felt to be lacking. We expect this project, therefore, to have a scope well beyond the current set of applications. We aim to stimulate collaboration with other scientists working in related domains.

This research will take place in collaboration with research started earlier by Pravesh Debba (ITC), who uses image segmentation to determine optimal sampling schemes. It is further related to research by Arko Lucieer (ITC) on spatial data quality. Finally, it will relate to research on the revision of geographic information. All these subjects are related, although somewhat remotely, to the research described in this proposal.

3.2. Need of personnel

Biometris is a new group of statisticians and mathematicians, with an emphasis on education and consultancy. It is composed of the chair ‘wiskundige en statistische methoden’ and the former Wageningen Center for Biometry. At the moment, only 2 PhD students are present. We are convinced that at the interface between statistics and earth observation a wealth of possibilities and requirements exists. Neither at the ITC, with its large experience in image analysis, nor at Biometris is any personnel currently available for this type of research. We have found in Willem Kruijer an excellent candidate for this research.

3.3 Work plan and viability

3.3.1 Preparation (6 months)

Literature review.

Identification of a small set (3-4) of relevant hyperspectral images

Preliminary analysis of a hyperspectral test image

Visit to the field location

Result: a report on the state of the art of segmentation and classification methods for hyperspectral images, including their applications to the test images

3.3.2 Development of a curve-based method (1 year)

Curve- and model-based approaches

MCMC modeling

Density function estimation

Identification of generic noise from spectra

Application to a relevant test image

Result: 1-2 papers on curve- and modeling approaches for image segmentation

3.3.3 Modeling impacts of external influences on hyperspectral images (6-9 months)

Modeling using the general linear model, MRFs and more modern techniques

Result: 1-2 papers on quantification of impacts of external influences on hyperspectral image segmentation

3.3.4 Wavelet applications (6 months)

Multiscale analysis in the space-spectrum domain

Development of data reduction techniques

Result: 1-2 papers on wavelet applications for a multiscale analysis of hyperspectral images

3.3.5 Creating a generic framework for data quality assessment and modeling (1 year)

Issues of positional and attribute accuracy are addressed

Consequences for modeling are assessed

Applications to 2-3 hyperspectral images

Field visit

Result: 1-2 papers on a generic framework for modeling data quality issues of hyperspectral images

3.3.6 Finishing the work by compiling the thesis (3-6 months)

3.4 Intended infrastructure

The main work will be done at Wageningen University. Regular visits to the ITC are planned, in order to have sufficient access to hyperspectral information, discussions on aspects of practical field conditions and dissemination of developed tools and ideas.

3.5 Positioning

Because of the dual position of the proposer, a closer collaboration can be established between Wageningen University and the ITC. In particular, a further collaboration with the CWI is envisaged here. Several activities involving all three institutes have already taken place in the recent past: mutual contributions to workshops (1996, 1999), collaboration in joint projects and joint publications. We hope to further strengthen the ties between the institutes by means of this proposal.

In an international setting we will further build on existing contacts with the IWMI. We also envisage case studies from developing countries.

4. Utilisation

4.1 Relevance to society

4.1.1 Image analysis

4.1.2 Vegetation and life sciences

4.1.3 Water related issues

4.2 User’s group

1. Dr. Hans van Leeuwen, Synopsis, Wageningen

2. Neo, Amersfoort

3. Prof. dr. Wim Bastiaanssen, IWMI, Sri Lanka

4. Prof. dr. A. Skidmore, ITC, PO Box 6, 7500 AA Enschede, The Netherlands

5) Literature

1. Richards, J.A. 2001. Remote sensing digital image analysis: an introduction. Springer, Berlin.

2. Stein, A., Van der Meer, F. and Gorte, B. 1999. Spatial Statistics for Remote Sensing. Kluwer Academic Publishers, Dordrecht, 284 pp. (ISBN: 0-7923-5978-X).

3. Epinat, V., Stein, A., De Jong, S.M. and Bouma, J. 2001. A wavelet characterization of high-resolution NDVI patterns for precision agriculture. International Journal of Applied Earth Observation and Geoinformation 3(2), 121-132.

4. Gorte, B. and Stein, A. 1998. Bayesian classification and class area estimation of satellite images using stratification. IEEE Transactions on Geoscience and Remote Sensing 36, 803-812.

5. Lucieer, A. and Stein, A. 2002. Existential uncertainty of spatial objects segmented from satellite sensor imagery. IEEE Transactions on Geoscience and Remote Sensing 40, 2518-2521.

6. Van der Meer, F. 2000. Geophysical inversion of hyperspectral data. International Journal of Remote Sensing 21(2), 387-393 (ISSN 1366-5901).

7. Van der Meer, F., Van Dijk, P., Van der Werff, H. and Yang, H. 2001. Remote sensing and petroleum leakage: a review and case study. Terra Nova 14(1), 1-17.

8. Van der Meer, F. 2000. Spectral curve shape matching with a continuum removed CCSM algorithm. International Journal of Remote Sensing 21(16), 3179-3185 (ISSN 1366-5901).

6) Budget

6.1 Personnel

1 PhD student for 4 years, requiring a total of € 135762. A benchfee of € 4538 is added to it to allow small expenditures during the PhD period.

6.2 Domestic travel

A total of € 1400, evenly spread over 4 years, is required for regular visits to both the CWI and the ITC.

6.3 International travel

A total of € 8000 is required for two visits to field sites and for presentations at two international conferences

6.4 Equipment

The hardware/software configuration consists of a robust PC with the ENVI/IDL package. Three NASA Hyperion images will be purchased, each containing approximately 200 spectral bands (400-2500 nm) at 30 m resolution, in strips of 7.5-80 km.

6.5 User’s contribution

6.6 Summary

Amounts in €.

Budget               Year 1   Year 2   Year 3   Year 4    Total
Personnel                                                140300
Travel NL               350      350      350      350     1400
Travel abroad          1500     1500     2500     2500     8000
Equipment/images      10800     1800     1800             14400

a) Personnel (1 PhD student, 4 years) = € 135762

b) Personal bench fee = € 4538

c) Additional travel budget = € 9050

2 x field visit, depending on location (max € 5000)

2 x international conference visit (€ 3000)

regular visits to the ITC (€ 350/yr)

d) Project-specific equipment and software = € 10000

Robust PC with good graphics capabilities = € 4000

Software package ENVI/IDL = € 5000

Images 1 = € 5400

1 NASA's Hyperion: approximately 200 spectral bands, 400-2500 nm, 30 m resolution, 7.5-80 km strips.
