synthMix-SVR
Manual for Application: synthMix-SVR

Last manual update
09.02.2016

Authors
Akpona Okujeni, Matthias Held, Marcel Schwieder, Benjamin Jakimow, Andreas Rabe, Pedro Leitão, Sebastian van der Linden, Patrick Hostert

Copyright for application and manual
© Humboldt-Universität zu Berlin, Geomatics Lab, 2014, www.hu-geomatics.de

Citation
Please cite this manual as:
Okujeni, A., Jakimow, B., Held, M., Schwieder, M., Rabe, A., van der Linden, S., Leitão, P., Hostert, P. (2014). synthMix-SVR, Manual for Application: synthMix-SVR. Humboldt-Universität zu Berlin, Germany.

Please cite this application as:
Okujeni, A., Jakimow, B., Held, M., Schwieder, M., Leitão, P., van der Linden, S., Hostert, P. (2014). synthMix-SVR, software available at http://www.enmap.org/

Disclaimer
The authors of this manual accept no responsibility for errors or omissions in this work and shall not be liable for any damage caused by these errors or omissions.

Contents
1 Concept of synthMix-SVR
2 Background
3 User Guide
4 References
Terms of Use synthMix-SVR

1 Concept of synthMix-SVR

synthMix-SVR is an IDL-based tool for quantitative analysis of remote sensing data. It implements the concept of combining support vector regression (SVR) with synthetically mixed training data for mapping sub-pixel fractions of land cover (Okujeni et al. 2013). synthMix-SVR is embedded into the EnMAP-Box 2.0 (Rabe et al. 2014a) and uses imageSVM 3.0 (Rabe et al. 2014b) for SVR modeling.
The goal of synthMix-SVR is to provide a user-friendly tool for producing land cover fraction maps from remote sensing imagery (Figure 1). synthMix-SVR follows the approach proposed in Okujeni et al. (2013): (1) a spectral library is used to generate a synthetically mixed training data set for a single land cover category of interest, the so-called target category; (2) the synthetic data set is used to train a single SVR model, which is subsequently used to derive a fraction map of the respective target category. Building on this concept, synthMix-SVR (1) combines both steps in a single workflow and (2) additionally allows the user to flexibly define multiple target categories through iterative processing. Thus, synthMix-SVR is suited for a comprehensive mapping of land cover, delivering a stack of target fraction maps in a multilayer image.

The synthMix-SVR application is delivered along with a test data set, consisting of a hyperspectral image subset and a corresponding spectral library. Once the data have been loaded into the EnMAP-Box, synthMix-SVR leads the user through several steps. These include (i) the definition of input data and output directory, (ii) a flexible selection of target categories, (iii) the definition of the mixing interval for training data generation, and (iv) a fully automated or user-defined SVR model parameterization with subsequent model application. Further optional features of synthMix-SVR include the superposition of random noise on synthetic spectral mixtures and post-processing steps to produce maps with physically meaningful fraction values (values between 0 and 100%) or physically meaningful stacks of fraction maps (fractions that sum to unity, i.e., 100%).

Figure 1 Workflow for quantifying land cover using synthMix-SVR.

2 Background

Kernel-based support vector machines (SVMs) from the field of machine learning provide flexible, non-parametric and nonlinear models that are excellently suited for exploiting remote sensing image data (e.g., Foody and Mathur 2004; Melgani and Bruzzone 2004; Camps-Valls et al. 2006; Durbha et al. 2007). For a detailed introduction to support vector machines and the underlying concepts, please refer to Schölkopf and Smola (2002), Smola and Schölkopf (2004) or Burges (1998).

While the support vector classifier (SVC) has been established as a powerful technique for per-pixel mapping of discrete land cover classes, little attention has been paid to the use of support vector regression for estimating sub-pixel fractions of land cover categories. This can be explained by the difficulty of finding reliable quantitative training information, i.e., pairs of spectra and associated cover fractions, needed for regression modeling. In contrast to per-pixel classification, training signatures cannot simply be labeled in the data itself or mapped in the field. An alternative solution, combining image spectra with spatially aggregated land cover information from a high-resolution reference map, often fails because the data sets are not accurately co-registered. The combination of machine learning-based SVR with synthetically mixed training data (Okujeni et al. 2013) has been demonstrated to overcome this dilemma and was therefore recommended as a suitable approach for sub-pixel mapping purposes.

Generation of synthetically mixed training data

The general idea behind generating synthetically mixed training data is to produce a set of mixed spectra along with their mixing fractions, which can be used as training input for regression modeling of a single target land cover category. The basic principle is illustrated in Figure 2. A library consisting of pure material spectra that are assigned to specified land cover categories forms the database.
Following the description in Okujeni et al. (2013), further processing steps include:

(1) The partitioning of the library spectra into a target category and a background category (which includes all remaining categories).
(2) The calculation of synthetic mixtures between each pure spectrum of the target category (with 100% mixing fraction) and each pure spectrum of the background category (with 0% mixing fraction). For simplification, linear mixing is assumed. Further, the mixing complexity (number of material spectra to be mixed) and the mixing interval (number of intermediate mixtures within the fraction range between 0 and 100%) need to be defined.
(3) The combination of all pure original and mixed spectra in a single spectral library. The mixing fraction of the respective target category is assigned to each spectrum.

synthMix-SVR was developed to flexibly select multiple target categories through iterative processing. The current version of synthMix-SVR only supports the generation of binary mixtures between each pure spectrum of the target category and each pure spectrum of the background category. The user is requested to set the mixing interval.

Figure 2 Generation of synthetically mixed training data (after Okujeni et al. 2013).

Support vector regression modeling

Support vector regression emanates from the field of kernel-based machine learning methods and has been widely used as a powerful, nonlinear technique, mainly for quantifying biophysical and biochemical plant properties (Camps-Valls et al. 2006; Durbha et al. 2007; Tuia et al. 2011). In general, SVR estimates a linear dependency between pairs of n-dimensional input vectors (i.e., spectral bands) and 1-dimensional target variables (i.e., land cover fraction of a target category) by fitting an optimal approximating hyperplane to the training data.
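
For orientation, the linear case can be written in the standard ε-insensitive form given in the SVR literature (e.g., Smola and Schölkopf 2004). The notation below is an illustrative assumption and is not taken from synthMix-SVR or imageSVM: x_i denotes a training spectrum, y_i its target fraction, N the number of training samples, and ξ_i, ξ_i* slack variables.

```latex
\[
f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b
\]
\[
\min_{\mathbf{w},\, b,\, \xi,\, \xi^{*}} \;
\frac{1}{2}\lVert \mathbf{w} \rVert^{2}
+ C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right)
\quad \text{subject to} \quad
\begin{cases}
y_i - f(\mathbf{x}_i) \le \varepsilon + \xi_i \\
f(\mathbf{x}_i) - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i,\ \xi_i^{*} \ge 0
\end{cases}
\]
```

Here C weights deviations larger than ε against the flatness of the fitted function, and ε sets the width of the tube within which deviations are not penalized.
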
For nonlinear problems, the training data are implicitly mapped by a kernel function into a higher-dimensional space, in which the new data distribution enables a better fit of a linear hyperplane. The parameterization of an SVR requires the user to select the parameter(s) of the kernel function (γ) as well as the regularization parameter (C) and the loss function parameter (ε). Once these parameters have been selected, a quadratic optimization problem is solved to construct the optimal approximating hyperplane.

synthMix-SVR integrates the SVR algorithm provided by imageSVM 3.0 (Rabe et al. 2014b). imageSVM is an IDL-based tool for SVM classification and regression analysis of remote sensing image data. Its workflow allows a flexible and transparent use of the support vector concept for image data analysis. imageSVM uses LIBSVM (Chang and Lin 2011) and a Gaussian kernel function during the training of the SVM. synthMix-SVR integrates the user-friendly imageSVM interface, which enables (i) a fully automated or user-defined SVR model training (optimal model parameters are found via grid search using a cross-validation strategy) and (ii) subsequent application of the model to image data to derive a model prediction. Once the training data have been generated, an iterative processing strategy is carried out to train the SVR models and derive fraction maps of the selected target categories.

3 User Guide

synthMix-SVR was developed to provide a handy tool for estimating fraction cover maps for multiple targets. In the following, we introduce the test data set that is delivered with synthMix-SVR and explain the graphical user interface. You can find the application in the EnMAP-Box environment under Applications.

Data type

synthMix-SVR needs a spectral library and a label file with classes related to the spectra in the library. Both files can be created within the EnMAP-Box or in other programs.
They need to be provided in the ENVI-specific “ENVI spectral library” and “ENVI classification” formats, respectively. For more specific information on data types, we refer you to the EnMAP-Box manual (Held et al. 2012). Furthermore, you will need the image file in which you want to estimate land cover fractions.

The synthMix-SVR application is delivered with a set of test data that can be used for testing the application. It contains a subset of a hyperspectral image of Berlin, Germany, acquired with the HyMap sensor (Figure 3). The test data set also includes a spectral library with 41 pure spectra of various urban surfaces selected from the image (Figure 3; Table 1). These spectra are grouped into the classes impervious, grass, tree and other (stored in the accompanying pseudo-image label file; Figure 3).

Figure 3 (Top left) Subset of the HyMap image acquired over Berlin. (Top right) Pure image spectra of distinct urban surfaces. (Bottom left) Label pseudo-image of the selected image spectra.

Table 1 Distribution of pure endmember spectra within each class.

Class        Endmembers
Grass        5
Impervious   23
Tree         7
Other        6
Total        41

Open the test data via:
File > Open Image/SpecLib
Path:\EnMAP-Box\SourceCode\applications\SyntMIX\_resource\testData

Define input data and output directory

You will find the synthMix-SVR application within the application menu:
Applications > synthMix-SVR > synthMix-SVR Application

In the first window of the synthMix-SVR application (Figure 4) you are asked to define the working directory in which all produced models, libraries and images will be stored. Furthermore, you need to define the input spectral library and the corresponding labels for the spectral mixing, as well as the image in which you want to estimate the fractions of the selected target classes. After you have provided and accepted all necessary information, the “Class Selection” window (Figure 5) will pop up.

Figure 4 The first window of the synthMix-SVR application, in which the input data need to be defined.

Generation of synthetically mixed training data

In the “Class Selection” window (Figure 5), you will find the classes contained in the label file that was defined in the input window. Now select the classes for which you want to estimate fractional cover within the chosen image. Choose at least one target and one background class, or two target classes. Note that selected targets are always used as background spectra for the mixing of other targets. Define the mixing interval for the selected classes (Table 2). Furthermore, you can choose whether noise should be added to the mixed spectra to make their appearance more realistic. If so, you are asked to define the signal-to-noise factor.

Figure 5 Class Selection window. Here you need to choose your target and background categories of interest, the mixing interval and artificial noise.

Table 2 Mixing intervals for spectral mixing.

Steps   Width   Fractions of Target Profile
1       50%     .5
2       33%     .33, .66
3       25%     .25, .5, .75
4       20%     .2, .4, .6, .8
5       16%     .16, .33, .5, .66, .83
...     ...     ...
9       10%     .1, .2, .3, .4, .5, .6, .7, .8, .9

Before you accept your selection, you can validate your settings to get an overview of how many distinct libraries and spectra you are going to produce. Note that the SVR processing time increases with the amount of input data. After clicking the validate button, a window will pop up that gives you a summary of your selection (Figure 6). From the summary window you will get back to the “Class Selection” window to adjust your settings if necessary. After you have accepted your selection, a library containing the original and the mixed spectra will be created for each selected target, saved in the defined working directory and opened in the EnMAP-Box. Accompanying fraction labels will be produced accordingly.
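
The binary linear mixing and the mixing intervals of Table 2 can be illustrated with a minimal Python sketch. This is not the IDL implementation used by synthMix-SVR. The function names (mixing_fractions, mix_binary, build_training_set) and the toy two-band spectra are hypothetical and chosen for illustration only, and the optional noise is left out.

```python
from itertools import product


def mixing_fractions(steps):
    """Target fractions for a given number of intermediate steps (cf. Table 2).

    Assumption: the fractions are evenly spaced with width 1 / (steps + 1).
    """
    return [k / (steps + 1) for k in range(1, steps + 1)]


def mix_binary(target, background, fraction):
    """Linear mixture per band: fraction * target + (1 - fraction) * background."""
    return [fraction * t + (1.0 - fraction) * b for t, b in zip(target, background)]


def build_training_set(targets, backgrounds, steps):
    """Return (spectrum, target_fraction) pairs.

    Pure target spectra are labeled 1.0, pure background spectra 0.0, and each
    target/background pair contributes one mixture per intermediate fraction.
    """
    samples = [(list(s), 1.0) for s in targets]
    samples += [(list(s), 0.0) for s in backgrounds]
    for t, b in product(targets, backgrounds):
        for f in mixing_fractions(steps):
            samples.append((mix_binary(t, b, f), f))
    return samples


# Toy library: 2 target spectra and 3 background spectra with 2 bands each.
targets = [[1.0, 0.0], [0.8, 0.2]]
backgrounds = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.6]]

training = build_training_set(targets, backgrounds, steps=3)
print(mixing_fractions(3))   # [0.25, 0.5, 0.75]
print(len(training))         # 2 + 3 + 2 * 3 * 3 = 23
```

Under these assumptions, the size of the training set for one target is the number of pure spectra plus (targets × backgrounds × steps), which is why the processing time grows quickly with larger libraries and finer mixing intervals.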

Figure 6 Summary of the number of libraries and mixed spectra you will create with the chosen settings. After confirmation you will get back to the category selection window, where you can adjust your settings if necessary. Note that SVR processing time increases with the size of the input training data.

Define post-processing options

In the next window (Figure 7) you will be asked whether any post-processing steps should be performed on the output estimates. SVR may produce values that are negative or above 1 (super-positive). Such values are not physically meaningful as pixel fractions. You can therefore choose to keep the estimated values as they are or to set negative and super-positive values to 0 and 1, respectively. In either case the original estimates will be saved in the defined working directory. The estimated fractions of the selected targets do not necessarily sum to 100%, as they are estimated independently of each other. Therefore, you can choose to weight each estimate relative to the sum of all estimated fractions, so that the fractions in the final image sum to 100%.

Figure 7 Post-processing window. Choose whether unrealistic estimates should be corrected and whether fractions should be weighted to sum to 100%.

SVR model parameterization

synthMix-SVR is based on imageSVM and allows you to use default settings for the heuristic grid search for the best SVR parameters g (the kernel parameter γ), C and ε. If you wish to alter the default search parameters, you can adjust them in the advanced section by clicking on the “advanced” button (Figure 8). In the advanced settings, you can modify the grid search by setting min (g/C) and max (g/C), the minimum and maximum values that define the range of the grid in the g and C dimensions. On this basis, suitable parameter values for g, C and ε will be found. For more information on SVR parameterization, we refer to the imageSVM manual (van der Linden et al. 2010), which can be found in the EnMAP-Box help menu.
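
The two options described under “Define post-processing options” can be illustrated for a single pixel with the following minimal Python sketch. It is not the code used by synthMix-SVR, and the function names (clip_fractions, normalize_to_unity) are hypothetical. Whether the tool applies the sum-to-unity weighting to the clipped or to the original estimates is not stated in this manual; the sketch assumes clipped values.

```python
def clip_fractions(fractions):
    """Set negative values to 0 and super-positive values (> 1) to 1."""
    return [min(1.0, max(0.0, f)) for f in fractions]


def normalize_to_unity(fractions):
    """Weight each fraction by the sum of all fractions so that they sum to 1.

    Assumption: if the sum is not positive, the values are returned unchanged
    to avoid a division by zero.
    """
    total = sum(fractions)
    if total <= 0.0:
        return list(fractions)
    return [f / total for f in fractions]


# Independent SVR estimates for three targets in one pixel (made-up values).
raw = [-0.1, 0.6, 0.7]
clipped = clip_fractions(raw)          # negative estimate is set to 0
unity = normalize_to_unity(clipped)    # remaining fractions are rescaled
print(clipped)
print(unity, sum(unity))
```

In this example the clipped fractions sum to 1.3, so the weighting reduces the two non-zero estimates proportionally while keeping their ratio.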

The distinct SVR models for each selected target class will be saved to your working directory and displayed in the EnMAP-Box GUI. If you want to manually check your model results for optimization, you can examine the parameters by opening the model files with:
Applications > Regression > imageSVM Regression > View SVR Parameters

Figure 8 Parameter selection in the default and advanced sections to initialize the SVR parameter grid search.

After you have accepted the selected settings, synthMix-SVR will start the SVR model training for each selected target and apply the models iteratively to the input image.

Final output

When all processing steps are finished, the results will be saved in your chosen working directory and opened in the EnMAP-Box (Figure 9). In the EnMAP-Box GUI you will find:

Original input library, labels and image
n mixed spectral libraries and labels (“TG_class”), one for each target
n individual fraction estimates, one for each target
n SVR models, one for each target
Final fraction image (stack of all estimates)
Final unity image (stack that sums to 100%; optional)

If you have chosen any post-processing steps, the post-processed results will be opened along with the original fraction estimates. You can display the fraction estimates as color composites of up to three fractions and link them to each other or to the original input image. Additionally, an output HTML report will be opened in your standard browser and provide you with general information about your synthMix-SVR process.

Figure 9 Final output of synthMix-SVR. On the far left-hand side, all produced and saved products are listed. The left image shows the fractions of impervious, grass and trees in a three-color combination. The right frame contains the input image in true colors.

4 References

Burges, C.J.C. (1998). A tutorial on Support Vector Machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121-167.
Camps-Valls, G., Bruzzone, L., Rojo-Alvarez, J.L., & Melgani, F. (2006). Robust support vector regression for biophysical variable estimation from remotely sensed images. IEEE Geoscience and Remote Sensing Letters, 3, 339-343.
Durbha, S.S., King, R.L., & Younan, N.H. (2007). Support vector machines regression for retrieval of leaf area index from multiangle imaging spectroradiometer. Remote Sensing of Environment, 107, 348-361.
Foody, G.M., & Mathur, A. (2004). A relative evaluation of multiclass image classification by support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42, 1335-1343.
Held, M., Jakimow, B., Rabe, A., van der Linden, S., Wirth, F., & Hostert, P. (2012). EnMAP-Box Manual, Version 1.4. Humboldt-Universität zu Berlin, Germany.
Melgani, F., & Bruzzone, L. (2004). Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42, 1778-1790.
Okujeni, A., van der Linden, S., Tits, L., Somers, B., & Hostert, P. (2013). Support vector regression and synthetically mixed training data for quantifying urban land cover. Remote Sensing of Environment, 137, 184-197.
Rabe, A., van der Linden, S., & Hostert, P. (2014a). EnMAP Box, Version 2.0 [online]. Available from: http://www.enmap.org/ [accessed March 2014].
Rabe, A., van der Linden, S., & Hostert, P. (2014b). imageSVM, Version 3.0 [online]. Available from: http://www.imagesvm.net/ [accessed March 2014].
Schölkopf, B., & Smola, A.J. (2002). Learning with Kernels - Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, Massachusetts: MIT Press.
Smola, A.J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199-222.
Tuia, D., Verrelst, J., Alonso, L., Perez-Cruz, F., & Camps-Valls, G. (2011). Multioutput support vector regression for remote sensing biophysical parameter estimation. IEEE Geoscience and Remote Sensing Letters, 8, 804-808.
van der Linden, S., Rabe, A., Wirth, F., Suess, S., Okujeni, A., & Hostert, P. (2010). imageSVM regression, Application Manual: imageSVM version 2.1. Humboldt-Universität zu Berlin, Germany.

Terms of Use synthMix-SVR

© Copyright synthMix-SVR: Humboldt-Universität zu Berlin, 2014

Redistribution and use of synthMix-SVR in binary form, with or without modification, are permitted for scientific purposes provided that the following conditions are met:

1. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
2. Neither name of copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

Any commercial use of synthMix-SVR, derivatives thereof or of results achieved by using the software is prohibited.

DISCLAIMER: THE SOFTWARE "TEMPLATE" IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.