synthMix-SVR
Manual for Application: synthMix-SVR
Last manual update
09.02.2016
Authors
Akpona Okujeni, Matthias Held, Marcel Schwieder, Benjamin Jakimow, Andreas
Rabe, Pedro Leitão, Sebastian van der Linden, Patrick Hostert
Copyright for application and manual
© Humboldt-Universität zu Berlin, Geomatics Lab, 2014, www.hu-geomatics.de
Citation
Please cite this manual as: Okujeni, A., Jakimow, B., Held, M., Schwieder, M.,
Rabe, A., van der Linden, S., Leitão, P., Hostert, P. (2014). synthMix-SVR,
Manual for Application: synthMix-SVR. Humboldt-Universität zu Berlin, Germany.
Please cite this application as: Okujeni, A., Jakimow, B., Held, M., Schwieder, M.,
Leitão, P., van der Linden, S., Hostert, P. (2014). synthMix-SVR, software
available at http://www.enmap.org/
Disclaimer
The authors of this manual accept no responsibility for errors or omissions in this
work and shall not be liable for any damage caused by these errors or omissions.
Contents
1 Concept of synthMix-SVR
2 Background
3 User Guide
4 References
Terms of Use synthMix-SVR
1 Concept of synthMix-SVR
synthMix-SVR is an IDL-based tool for the quantitative analysis of remote sensing
data. It implements the concept of combining support vector regression (SVR)
with synthetically mixed training data for mapping sub-pixel fractions of land
cover (Okujeni et al. 2013). synthMix-SVR is embedded into the EnMAP-Box 2.0
(Rabe et al. 2014a) and uses imageSVM 3.0 (Rabe et al. 2014b) for SVR
modeling.
The goal of synthMix-SVR is to provide a user-friendly tool for producing land
cover fraction maps from remote sensing imagery (Figure 1). synthMix-SVR
follows the approach proposed in Okujeni et al. (2013): (1) A spectral library is
used to generate a synthetically mixed training data set for a single land cover
category of interest, a so-called target category; (2) The synthetic data set is
used to train a single SVR model, which is subsequently used to derive a fraction
map of the respective target category. Building on this concept, synthMix-SVR
combines steps (1) and (2) and additionally allows the user to flexibly define
multiple target categories through iterative processing. Thus, synthMix-SVR is
suited for a comprehensive mapping of land cover, delivering a stack of target
fraction maps in a multilayer image.
The synthMix-SVR application is delivered along with a test data set, consisting
of a hyperspectral image subset and a corresponding spectral library. Once the
data have been loaded into the EnMAP-Box, synthMix-SVR leads the user through
several steps. This includes (i) the definition of input data and output directory,
(ii) a flexible selection of target categories, (iii) the definition of the mixing
interval for training data generation, and (iv) a fully automated or user-defined
SVR model parameterization with subsequent model application. Further optional
features of synthMix-SVR include the superposition of random noise on synthetic
spectral mixtures and post-processing steps to produce maps with physically
meaningful fraction values (values between 0 and 100%) or physically
meaningful stacks of fraction maps (sum to unity, i.e., 100%).
Figure 1 Workflow for quantifying land cover using synthMix-SVR.
2 Background
Kernel-based support vector machines (SVMs) from the field of machine learning
provide flexible, non-parametric and nonlinear models that are excellently suited
for exploiting remote sensing image data (e.g., Foody and Mathur 2004; Melgani
and Bruzzone 2004; Camps-Valls et al. 2006; Durbha et al. 2007). For a detailed
introduction to support vector machines and the underlying concepts, please
refer to Schölkopf and Smola (2002), Smola and Schölkopf (2004) or Burges
(1998). While the support vector classifier (SVC) has been established as a
powerful technique for per-pixel mapping of discrete land cover classes, little
attention has been paid to the use of support vector regression for estimating
sub-pixel fractions of land cover categories. This can be explained by the
difficulty of finding reliable quantitative training information, i.e., pairs of
spectra and associated cover fractions, needed for regression modeling. In
contrast to training data for per-pixel classifiers, such training signatures can
neither be labeled in the image data itself nor readily mapped in the field.
Another possible solution, combining image spectra with spatially aggregated land
cover information from a high-resolution reference map, often fails because the
data sets are not accurately co-registered. The combination of machine
learning-based SVR with synthetically mixed training data (Okujeni et al. 2013)
has been shown to overcome this dilemma and was therefore recommended as a
suitable approach for sub-pixel mapping.
Generation of synthetically mixed training data
The general idea behind generating synthetically mixed training data is to
produce a set of mixed spectra along with their mixing fractions, which can be
used as training input for regression modeling of a single target land cover
category. The basic principle is illustrated in Figure 2. A library of pure
material spectra, each assigned to a specified land cover category, forms the
database. Following the description in Okujeni et al. (2013), the further
processing steps are:
(1) The partitioning of the library spectra into a target category and a
background category (includes all remaining categories).
(2) The calculation of synthetic mixtures between each pure spectrum of the
target category (with 100% mixing fraction) and each pure spectrum of the
background category (with 0% mixing fraction). For simplicity, linear mixing
is assumed. Further, the mixing complexity (number of material spectra that
may be combined in one mixture) and the mixing interval (number of
intermediate mixtures within the fraction range between 0 and 100%) need to
be defined.
(3) The combination of all pure original and mixed spectra in a single spectral
library. The mixing fraction of the respective target category is assigned to
each spectrum.
synthMix-SVR extends this scheme so that multiple target categories can be
selected flexibly and processed iteratively. The current version of synthMix-SVR
only supports the generation of binary mixtures between each pure spectrum of
the target category and each pure spectrum of the background category. The
mixing interval is set by the user.
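The binary mixing procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IDL implementation of synthMix-SVR: spectra are plain lists of reflectance values, mixing is linear, and the target fractions are supplied directly (they follow from the mixing interval). The function name `synth_mix` is hypothetical.

```python
from typing import List, Sequence, Tuple

Spectrum = List[float]


def synth_mix(
    target: Sequence[Sequence[float]],
    background: Sequence[Sequence[float]],
    fractions: Sequence[float],
) -> Tuple[List[Spectrum], List[float]]:
    """Build a binary synthetic mixing library for one target category.

    Returns the spectra and, for each spectrum, its target fraction:
    pure target spectra get 1.0, pure background spectra 0.0, and each
    target/background pair is mixed linearly at every fraction f:
        mixture = f * target + (1 - f) * background
    """
    spectra: List[Spectrum] = []
    labels: List[float] = []

    # Step (3): keep the original pure spectra with their mixing fractions.
    for t in target:
        spectra.append(list(t))
        labels.append(1.0)
    for b in background:
        spectra.append(list(b))
        labels.append(0.0)

    # Step (2): binary linear mixtures of each target with each background spectrum.
    for t in target:
        for b in background:
            if len(t) != len(b):
                raise ValueError("spectra must have the same number of bands")
            for f in fractions:
                spectra.append([f * tv + (1.0 - f) * bv for tv, bv in zip(t, b)])
                labels.append(f)

    return spectra, labels


# Example: one target spectrum, two background spectra, three mixing steps.
spectra, labels = synth_mix([[0.2, 0.4]], [[0.6, 0.0], [0.1, 0.1]], [0.25, 0.5, 0.75])
print(len(spectra))  # 9
```

The example yields 1 + 2 original spectra plus 1 × 2 × 3 = 6 mixtures, each labeled with its target fraction.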
Figure 2 Generation of synthetically mixed training data (after Okujeni et al. (2013)).
Support vector regression modeling
Support vector regression (SVR) emanates from the field of kernel-based machine
learning and has been widely used as a powerful, nonlinear technique, mainly for
quantifying biophysical/-chemical plant properties (Camps-Valls et al. 2006;
Durbha et al. 2007; Tuia et al. 2011). In general, SVR estimates a linear
dependency between n-dimensional input vectors (i.e., spectra with n bands) and
a one-dimensional target variable (i.e., the land cover fraction of a target
category) by fitting an optimal approximating hyperplane to the training data.
For nonlinear problems, the training data are implicitly mapped by a kernel
function into a higher-dimensional space, in which a linear hyperplane fits the
transformed data better. The parameterization of an SVR requires the user to
select the kernel parameter(s) (γ for the Gaussian kernel) as well as the
regularization parameter (C) and the loss function parameter (ε). Once these
parameters have been selected, a quadratic optimization problem is solved to
construct the optimal approximating hyperplane.
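For reference, the ε-SVR problem with a Gaussian kernel can be written as below (standard formulation after Smola and Schölkopf 2004; the notation is ours, not taken from imageSVM). Here x_i are the l training spectra, y_i their target fractions, φ the implicit kernel mapping, and γ, C and ε the parameters named above:

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^{*}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{l} \left(\xi_i + \xi_i^{*}\right)
\quad \text{s.t.} \quad
\begin{cases}
  y_i - \langle \mathbf{w}, \phi(\mathbf{x}_i) \rangle - b \le \varepsilon + \xi_i \\
  \langle \mathbf{w}, \phi(\mathbf{x}_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*} \\
  \xi_i,\ \xi_i^{*} \ge 0
\end{cases}

f(\mathbf{x}) = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) K(\mathbf{x}_i, \mathbf{x}) + b
```

Deviations smaller than ε are not penalized, C weights the penalty on larger deviations, and only training samples with α_i − α_i* ≠ 0 (the support vectors) contribute to the predicted fraction f(x).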
synthMix-SVR integrates the SVR algorithm provided by imageSVM 3.0 (Rabe et
al. 2014b). imageSVM is an IDL-based tool for SVM classification and
regression analysis of remote sensing image data. Its workflow allows a flexible
and transparent use of the support vector concept for image data analysis.
imageSVM uses LIBSVM (Chang and Lin 2011) and a Gaussian kernel function
for SVM training. synthMix-SVR integrates the user-friendly imageSVM interface,
which enables (i) fully automated or user-defined SVR model training (optimal
model parameters are found via a grid search using a cross-validation strategy)
and (ii) subsequent model application to image data to derive a model
prediction. Once the training data have been generated, an iterative processing
strategy is carried out that trains one SVR model and derives one fraction map
for each selected target category.
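The bookkeeping of this iterative strategy can be sketched as follows. The sketch only determines which library spectra enter each iteration, assuming (as stated for the class selection in the User Guide) that every other selected class, including the remaining targets, serves as background for the current target. `plan_iterations` and the helpers named in the comments (`synthesize`, `train_svr`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def plan_iterations(
    labels: Sequence[str],
    targets: Sequence[str],
    background_classes: Sequence[str],
) -> Dict[str, Tuple[List[int], List[int]]]:
    """Map each target class to (target indices, background indices) in the library.

    For the current target, every other selected class (the remaining targets
    and the background classes) provides background spectra; classes that
    were not selected are left out entirely.
    """
    selected = set(targets) | set(background_classes)
    plan: Dict[str, Tuple[List[int], List[int]]] = {}
    for target in targets:
        target_idx = [i for i, lab in enumerate(labels) if lab == target]
        background_idx = [
            i for i, lab in enumerate(labels) if lab in selected and lab != target
        ]
        plan[target] = (target_idx, background_idx)
    return plan


labels = ["grass", "impervious", "impervious", "tree", "other"]
plan = plan_iterations(labels, targets=["impervious", "tree"], background_classes=["grass"])
for target, (tgt, bg) in plan.items():
    # One iteration per target; the hypothetical helpers below stand in for
    # synthetic mixing, SVR training and model application:
    #   X, y = synthesize(library, tgt, bg, fractions)
    #   fraction_map = train_svr(X, y).predict(image)
    print(target, tgt, bg)
# impervious [1, 2] [0, 3]
# tree [3] [0, 1, 2]
```

The per-target fraction maps produced in this loop are what the final output stacks into one multilayer image.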
3 User Guide
synthMix-SVR was developed as a handy tool for estimating fraction cover maps
of multiple target categories. In the following, we introduce the test data set
that is delivered with synthMix-SVR and explain the graphical user interface.
The application can be found in the EnMAP-Box under Applications.
Data type
synthMix-SVR requires a spectral library and a label file that assigns a class
to each spectrum in the library. Both files can be created within the EnMAP-Box
or in other programs. They need to be provided in the ENVI “spectral library”
and ENVI “classification” formats, respectively. For more specific information
on data types, we refer you to the EnMAP-Box manual (Held et al. 2012).
Furthermore, you will need the image in which you want to estimate land
cover fractions.
The synthMix-SVR application is delivered with a test data set. It contains a
subset of a hyperspectral image of Berlin, Germany, acquired with the HyMap
sensor (Figure 3), and a spectral library of 41 pure spectra of various urban
surfaces selected from the image (Figure 3; Table 1). The spectra are grouped
into the classes impervious, grass, tree and other, which are stored in the
accompanying pseudo-image label file (Figure 3).
Figure 3 (Top left) Subset of the HyMap image acquired over Berlin. (Top right) Pure image
spectra of distinct urban surfaces. (Bottom left) Label pseudo-image of the selected image spectra.
Table 1 Distribution of pure endmember spectra within each class.
Class         Endmembers
Grass         5
Impervious    23
Tree          7
Other         6
Total         41
Open the test data via:
• File > Open Image/SpecLib
• Path: \EnMAP-Box\SourceCode\applications\SyntMIX\_resource\testData
Define input data and output directory
You will find the synthMix-SVR application in the Applications menu:
• Applications > synthMix-SVR > synthMix-SVR Application
In the first window of the synthMix-SVR application (Figure 4), you are asked to
define the working directory in which all produced models, libraries and images
will be stored. Furthermore, you need to define the input spectral library and the
corresponding labels for the spectral mixing, as well as the image in which you
want to estimate the fractions of the selected target classes. After you have
provided and accepted all necessary information, the “Class Selection” window
(Figure 5) will pop up.
Figure 4 The first window of the synthMix-SVR application in which the input data need to be
defined.
Generation of synthetically mixed training data
In the “Class Selection” window (Figure 5), you will find the classes contained
in the input label file defined in the first window. Now select the classes for
which you want to estimate fractional cover within the chosen image: choose at
least one target and one background class, or two target classes. Note that
selected targets are always used as background spectra for the mixing of other
targets. Define the mixing interval for the selected classes (Table 2).
Furthermore, you can choose to add noise to the mixed spectra to make them
more realistic. In this case, you are asked to define the signal-to-noise
factor.
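The manual does not specify how the signal-to-noise factor is applied. One plausible reading, shown here purely as an assumption, is zero-mean Gaussian noise whose standard deviation in each band equals the band value divided by the factor, so that a larger factor means less noise. The function name `add_noise` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def add_noise(
    spectrum: Sequence[float],
    snr: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a copy of `spectrum` with zero-mean Gaussian noise superimposed.

    Assumed noise model (not taken from synthMix-SVR): in each band the noise
    standard deviation is |value| / snr, so a larger snr gives less noise.
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    rng = rng if rng is not None else random.Random()
    return [v + rng.gauss(0.0, abs(v) / snr) for v in spectrum]


# Reproducible example with a seeded generator.
noisy = add_noise([0.12, 0.35, 0.41], snr=50, rng=random.Random(42))
```

Passing a seeded `random.Random` makes the noisy library reproducible between runs.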
Figure 5 Class Selection window. Here you need to choose your target and background categories
of interest, the mixing interval and artificial noise.
Table 2 Mixing intervals for spectral mixing.
Steps   Width   Fractions of target profile
1       50%     .5
2       33%     .33, .66
3       25%     .25, .5, .75
4       20%     .2, .4, .6, .8
5       16%     .16, .33, .5, .66, .83
...     ...     ...
9       10%     .1, .2, .3, .4, .5, .6, .7, .8, .9
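The values in Table 2 appear to follow a simple rule: with s steps, the target fractions are k/(s + 1) for k = 1, ..., s, and the width is 1/(s + 1); the table lists them truncated to two decimals (e.g. 1/6 ≈ 0.167 appears as .16). A sketch of this rule, with hypothetical function names:

```python
from typing import List


def mixing_fractions(steps: int) -> List[float]:
    """Target fractions for `steps` intermediate mixtures: k / (steps + 1), k = 1..steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [k / (steps + 1) for k in range(1, steps + 1)]


def mixing_width(steps: int) -> float:
    """Distance between neighboring fractions (the 'Width' column), as a share of 1."""
    return 1 / (steps + 1)


print(mixing_fractions(3))  # [0.25, 0.5, 0.75]
print(mixing_width(4))      # 0.2
```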
Before you accept your selection, you can validate your settings to get an
overview of how many distinct libraries and spectra will be produced. Note that
the SVR processing time increases with the amount of training data. After
clicking the validate button, a window with a summary of your selection will pop
up (Figure 6). From the summary window, you will get back to the “Class
Selection” window to adjust your settings if necessary.
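Under binary mixing, the size of each target library can be estimated before running the tool. The count below assumes that every target spectrum is mixed with every background spectrum at each of the s mixing fractions and that all original pure spectra are kept; whether the validation summary counts in exactly this way is an assumption. `library_size` is a hypothetical name.

```python
def library_size(n_target: int, n_background: int, steps: int) -> int:
    """Number of spectra in one target library under binary mixing.

    Assumes every target spectrum is mixed with every background spectrum at
    each of the `steps` fractions and that all original pure spectra are kept.
    """
    mixtures = n_target * n_background * steps
    return n_target + n_background + mixtures


# Test data: impervious (23 spectra) as target; grass, tree and other
# (5 + 7 + 6 = 18 spectra) as background; three mixing steps.
print(library_size(23, 18, 3))  # 1283
```

The product term dominates quickly, which is why the processing time grows so strongly with finer mixing intervals and larger libraries.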
After you have accepted your selection, a library containing the original and
the mixed spectra will be created for each selected target, saved in the defined
working directory and opened in the EnMAP-Box. Accompanying fraction labels
are produced accordingly.
Figure 6 Summary of the number of libraries and mixed spectra you will create with the chosen
settings. After confirmation you will get back to the category selection window, where you can
adjust your settings if necessary. Note that SVR processing time increases with an increasing size
of input training data.
Define post-processing options
In the next window (Figure 7), you will be asked whether any post-processing
steps should be performed on the output estimations. SVR may produce values
that are negative or above 1 (super-positive). Such values are not physically
meaningful as pixel fractions. Thus, you can choose whether to keep the
estimated values as they are or to set negative and super-positive values to 0
and 1, respectively. In any case, the original estimations will be saved in the
defined working directory.
The estimated fractions of the selected targets do not necessarily sum to
100%, as they are estimated independently of each other. Therefore, you can
choose to weight the estimations relative to the total of all estimated
fractions, so that the fractions in each pixel of the final image sum to 100%.
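The two post-processing options can be sketched per pixel as follows. The clipping step follows the text directly; the sum-to-unity step assumes that each pixel's fractions are divided by their sum, which is one reading of weighting the estimations relative to the total of all estimated fractions. Function names are hypothetical.

```python
from typing import List, Sequence


def clip_fractions(values: Sequence[float]) -> List[float]:
    """Set negative estimates to 0 and super-positive estimates (> 1) to 1."""
    return [min(max(v, 0.0), 1.0) for v in values]


def sum_to_unity(pixel: Sequence[float]) -> List[float]:
    """Divide one pixel's target fractions by their sum so that they total 1 (100%).

    Intended to run after clipping; a pixel whose fractions sum to 0 is
    returned unchanged to avoid division by zero.
    """
    total = sum(pixel)
    if total == 0:
        return list(pixel)
    return [v / total for v in pixel]


# One pixel with estimates for impervious, grass and tree.
raw = [0.9, 0.5, -0.1]
print(sum_to_unity(clip_fractions(raw)))
```

Clipping before rescaling matters: a negative estimate left in place would shrink the denominator and inflate the other fractions.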
Figure 7 Post-processing window. Choose whether unrealistic estimations shall be corrected and
whether the fractions shall be weighted to sum to 100%.
SVR model parameterization
synthMix-SVR is based on imageSVM and allows you to use default settings for
the heuristic grid search of the best SVR parameters g (γ), C and ε. If you wish
to alter the default search settings, you can adjust them in the advanced
section by clicking the “advanced” button (Figure 8).
In the advanced settings section, you can modify the grid search by setting
min (g/C) and max (g/C), the minimum and maximum values that define the range
of the grid in the g and C dimensions.
On this basis, the best values for g, C and ε are determined.
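A grid search of this kind can be outlined as below. This is a sketch under several assumptions: the grid is spaced in powers of ten between the min and max values, the error measure is the mean squared error over k interleaved folds, ε is held fixed inside the model, and `fit_predict` is a hypothetical stand-in for training an SVR with (γ, C) and predicting the held-out spectra. imageSVM's actual grid spacing, fold assignment and ε handling may differ.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: train on (X_train, y_train) with the given
# gamma and C (epsilon held fixed inside), then predict X_test.
FitPredict = Callable[
    [List[Sequence[float]], List[float], List[Sequence[float]], float, float],
    List[float],
]


def log_grid(lo: float, hi: float) -> List[float]:
    """Powers of ten from lo to hi (assumed spacing; lo and hi must be > 0)."""
    if lo <= 0 or hi < lo:
        raise ValueError("require 0 < lo <= hi")
    first, last = round(math.log10(lo)), round(math.log10(hi))
    return [10.0 ** e for e in range(first, last + 1)]


def grid_search(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    gammas: Sequence[float],
    Cs: Sequence[float],
    fit_predict: FitPredict,
    k: int = 3,
) -> Tuple[float, float]:
    """Return the (gamma, C) pair with the lowest k-fold cross-validated MSE."""
    n = len(X)
    if not gammas or not Cs:
        raise ValueError("the parameter grid must not be empty")
    if k < 2 or n < k:
        raise ValueError("need k >= 2 and at least k training samples")
    folds = [list(range(start, n, k)) for start in range(k)]  # interleaved folds
    best, best_mse = (gammas[0], Cs[0]), math.inf
    for gamma, C in product(gammas, Cs):
        sq_err = 0.0
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            pred = fit_predict(
                [X[i] for i in train_idx], [y[i] for i in train_idx],
                [X[i] for i in test_idx], gamma, C,
            )
            sq_err += sum((p - y[i]) ** 2 for p, i in zip(pred, test_idx))
        mse = sq_err / n
        if mse < best_mse:
            best, best_mse = (gamma, C), mse
    return best


# Usage with a hypothetical SVR wrapper `my_svr_fit_predict`:
# grid_search(X, y, log_grid(0.01, 100), log_grid(1, 1000), my_svr_fit_predict)
```

Every (γ, C) combination is evaluated on the same folds, so the selected pair is the one that generalizes best to held-out synthetic mixtures within the chosen grid range.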
For more information on SVR parameterization, we refer to the imageSVM manual
(van der Linden et al. 2010), which can be found in the EnMAP-Box help menu.
The SVR models for the selected target classes will be saved to your working
directory and displayed in the EnMAP-Box GUI. If you want to check the models
manually, for example to optimize them, you can examine their parameters by
opening the model files with:
• Applications > Regression > imageSVM Regression > View SVR Parameters
Figure 8 Parameter selections in the default and advanced sections to initialize the SVR parameter
grid search.
After you accept the selected settings, synthMix-SVR trains an SVR model for
each selected target and applies the models iteratively to the input image.
Final output
When all processing steps are finished, the estimated results will be saved in
your chosen working directory and opened in the EnMAP-Box (Figure 9).
In the EnMAP-Box GUI you will find (n being the number of selected targets):
• the original input library, labels and image
• n mixed spectral libraries and labels (“TG_class”), one per target
• n individual fraction estimations, one per target
• n SVR models, one per target
• the final fraction image (stack of all estimations)
• the final unity image (stack that sums to 100%; optional)
If you have chosen any post-processing steps, their results will be opened along
with the original fraction estimations. You can display up to three fraction
estimations as a color composite and link the displays to each other or to the
original input image. Additionally, an HTML report will be opened in your
standard browser that provides general information about your synthMix-SVR run.
Figure 9 Final output of synthMix-SVR. On the far left-hand side, all produced and saved
products are listed. The left image shows the fractions of impervious, grass and trees as a
three-color composite. The right frame contains the input image in true colors.
4 References
Burges, C.J.C. (1998). A tutorial on Support Vector Machines for pattern
recognition. Data Mining and Knowledge Discovery, 2, 121-167.
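Camps-Valls, G., Bruzzone, L., Rojo-Alvarez, J.L., & Melgani, F. (2006). Robust
support vector regression for biophysical variable estimation from remotely
sensed images. IEEE Geoscience and Remote Sensing Letters, 3, 339-343.
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector
machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1-27:27.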
Durbha, S.S., King, R.L., & Younan, N.H. (2007). Support vector machines
regression for retrieval of leaf area index from multiangle imaging
spectroradiometer. Remote Sensing of Environment, 107, 348-361.
Foody, G.M., & Mathur, A. (2004). A relative evaluation of multiclass image
classification by support vector machines. IEEE Transactions on Geoscience and
Remote Sensing, 42, 1335-1343.
Held, M., Jakimow, B., Rabe, A., van der Linden, S., Wirth, F., & Hostert, P. (2012).
EnMAP-Box Manual, Version 1.4, Humboldt-Universität zu Berlin, Germany.
Melgani, F., & Bruzzone, L. (2004). Classification of hyperspectral remote sensing
images with support vector machines. IEEE Transactions on Geoscience and
Remote Sensing, 42, 1778-1790.
Okujeni, A., van der Linden, S., Tits, L., Somers, B., & Hostert, P. (2013).
Support vector regression and synthetically mixed training data for quantifying
urban land cover. Remote Sensing of Environment, 137, 184-197.
Rabe, A., van der Linden, S., & Hostert, P. (2014a). EnMAP Box, Version 2.0
[online]. Available from: http://www.enmap.org/ [accessed March 2014].
Rabe, A., van der Linden, S., & Hostert, P. (2014b). imageSVM, Version 3.0
[online]. Available from: http://www.imagesvm.net/ [accessed March 2014].
Schölkopf, B., & Smola, A.J. (2002). Learning with Kernels - Support Vector
Machines, Regularization, Optimization, and Beyond. Cambridge, Massachusetts:
MIT Press.
Smola, A.J., & Schölkopf, B. (2004). A tutorial on support vector regression.
Statistics and Computing, 14, 199-222.
Tuia, D., Verrelst, J., Alonso, L., Perez-Cruz, F., & Camps-Valls, G. (2011).
Multioutput support vector regression for remote sensing biophysical parameter
estimation. IEEE Geoscience and Remote Sensing Letters, 8, 804-808.
van der Linden, S., Rabe, A., Wirth, F., Suess, S., Okujeni, A., & Hostert, P.
(2010). imageSVM regression, Application Manual: imageSVM version 2.1.
Humboldt-Universität zu Berlin, Germany.
Terms of Use synthMix-SVR
© Copyright synthMix-SVR: Humboldt-Universität zu Berlin, 2014
Redistribution and use of synthMix-SVR in binary form, with or without
modification, are permitted for scientific purposes provided that the following
conditions are met:
1. Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
2. Neither name of copyright holders nor the names of its contributors may be
used to endorse or promote products derived from this software without specific
prior written permission.
Any commercial use of synthMix-SVR, derivatives thereof or of results achieved
by using the software is prohibited.
DISCLAIMER: THE SOFTWARE "TEMPLATE" IS PROVIDED BY THE COPYRIGHT
HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.