Gabor Descriptors for Aerial Image Classification

Vladimir Risojević, Snježana Momić, Zdenka Babić

Faculty of Electrical Engineering, University of Banja Luka, Bosnia and Herzegovina
vlado@etfbl.net, s momic@hotmail.com, zdenka@etfbl.net

Abstract. The amount of remotely sensed imagery that has become available by far surpasses the capacity for manual analysis. One of the most important tasks in the analysis of remotely sensed images is land use classification. This task can be recast as semantic classification of remotely sensed images. In this paper we evaluate classifiers for semantic classification of aerial images. The evaluated classifiers are based on Gabor and Gist descriptors, which have long been established in image classification tasks. We use support vector machines and propose a kernel well suited for use with Gabor descriptors. These simple classifiers achieve a correct classification rate of about 90% on two datasets. It follows from these results that, in aerial image classification, simple classifiers give results comparable to more complex approaches, and the pursuit of more advanced solutions should continue with this in mind.

Key words: Aerial image classification, Gabor filters, Gist descriptor

1 Introduction

There is a constantly increasing number of instruments for remote sensing of the Earth. Consequently, many databases of remotely sensed data are being flooded with data. At the moment, images dominate these databases, both in variety and quantity. Remote sensing imaging of the Earth is done by a variety of airborne and space-borne imagers in various spectral bands, ranging from the visible spectrum to microwave [8]. There are many applications of remote sensing imaging, both military and civilian. Civilian applications include land use planning, weather forecasting, studying long-term climate changes, crop monitoring, studying deforestation, city planning, and many others.
These applications require the development of effective means for acquisition, processing, transmission, storage, retrieval, and analysis of images.

One of the key problems in aerial image analysis is the problem of semantic classification. This problem is closely related to the task of land use monitoring, which is necessary for control of environmental quality as well as for maintaining and improving living conditions and standards. The holy grail of automatic land use classification is pixel-level semantic segmentation of remotely sensed images. The result of a pixel-level segmentation is a thematic map in which each pixel is assigned a predefined label from a finite set. However, remote sensing images are often multispectral and of high resolution, which makes their detailed semantic segmentation an excessively computationally demanding task. This is the reason why some researchers decided to classify image blocks instead of individual pixels. We also adopt this approach and evaluate, on the task of aerial image classification, classifiers based on state-of-the-art image descriptors and support vector machines, which have shown good results in image classification tasks.

The contribution of this paper is in the evaluation of Gabor and Gist descriptors for the task of aerial image classification. For the classifier based on Gabor descriptors we propose a kernel based on the distance function proposed for Gabor descriptors. In the experiments we show that the classifier based on Gabor descriptors yields similar or better performance compared to the Gist descriptor based classifier, despite the lower dimensionality of the former. We also show that these simple classifiers yield classification performance which is better than or comparable to that of some more complicated classifiers using more features.

The paper is organized as follows. In Section 2 we briefly review previous related work.
Image representation and classifier are described in Section 3, and experimental results are given in Section 4. In Section 5 we conclude and give ideas for future research.

2 Related Work

There has been a long history of using computer vision techniques for classification of aerial and satellite images. We briefly review here some of the methods that are relevant to our work.

Ma and Manjunath [3] use Gabor descriptors for representing aerial images. Their work is centered around efficient content-based retrieval from a database of aerial images, and they did not try to automatically classify images into semantic categories. Parulekar et al. [7] classify satellite images into four semantic categories in order to enable fast and accurate browsing of the image database. Fauqueur et al. [2] classify aerial images based on color, texture and structure features. The authors tested their algorithm on a dataset of 1040 aerial images from 8 categories. In a more recent work [6], Ozdemir and Aksoy use a bag-of-words model and frequent subgraph mining to construct higher level features for satellite image classification. The algorithm is tested on a dataset of 585 images classified into 8 semantic categories. Our work is in a similar vein, but rather than trying to construct semantic features for image classification we focus on low level features and aerial images.

Despite the wide use of the Gist descriptor [5] in general-purpose image classification, to the best of our knowledge there are not many examples of aerial image classification using the Gist descriptor. An exception is the work on tree detection by Yang et al. [10], where Gist is used for clustering of images prior to the detection phase.

3 Image Representation and Classifier

In this paper we evaluate two image descriptors, both based on Gabor filters.
There is a long tradition of using Gabor descriptors in computer vision and image processing, dating back to Daugman [1], who noted the similarity between low level processing in biological vision and Gabor filter banks. Subsequently, Gabor descriptors have been used for various tasks including texture segmentation, image recognition, iris recognition, registration, and motion tracking. In the context of image classification, the most notable are their uses for texture classification and retrieval, pioneered by Manjunath and Ma [4], and, more recently, for scene classification using the Gist descriptor, as proposed by Oliva and Torralba [5].

3.1 Gabor Descriptor

The Gabor descriptor of an image is computed by passing the image through a bank of Gabor filters. A Gabor filter is a linear band-pass filter whose impulse response is defined as a Gaussian function modulated with a complex sinusoid,

$$g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j \Omega x\right], \tag{1}$$

where $\Omega$ is the frequency of the Gabor function, and $\sigma_x$ and $\sigma_y$ determine its bandwidth. Gabor showed that these functions are optimal in the sense of minimizing the joint two-dimensional uncertainty in space and frequency [1]. Impulse responses of the filters in a Gabor filter bank are dilated (scaled) and rotated versions of the function (1). Filters in a Gabor filter bank can be considered as edge detectors with tunable orientation and scale, so that information on texture can be derived from statistics of the outputs of those filters [4].

We can consider (1) as a mother Gabor wavelet, and the functions obtained by its dilations and rotations are Gabor wavelets.
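As an illustration of how (1) and its dilated and rotated versions might be sampled on a pixel grid, the following is a minimal sketch rather than the implementation used in the experiments. The function names, the grid size, the scale ratio `a`, the uniform spacing of orientations over half a circle, and the amplitude factor `1 / scale` applied to dilated filters are all assumptions made for this example.

```python
import cmath
import math


def gabor_kernel(size, sigma_x, sigma_y, omega, theta=0.0, scale=1.0):
    """Sample a rotated, dilated version of (1) on a size x size grid.

    theta (rotation angle) and scale (dilation factor) are illustrative
    parameters; theta=0 and scale=1 give the mother function (1) itself.
    """
    half = size // 2
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    kernel = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            # Rotate and dilate the coordinates before evaluating (1).
            xr = (x * math.cos(theta) + y * math.sin(theta)) / scale
            yr = (-x * math.sin(theta) + y * math.cos(theta)) / scale
            envelope = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            carrier = cmath.exp(2j * math.pi * omega * xr)
            # Assumed amplitude normalization for dilated filters: 1 / scale.
            values.append(norm * envelope * carrier / scale)
        kernel.append(values)
    return kernel


def gabor_bank(size, sigma_x, sigma_y, omega, n_scales, n_orients, a=2.0):
    """Return bank[m][n]: the filter at scale m and orientation n.

    The scale ratio a and the orientation spacing pi / n_orients are
    assumptions of this sketch.
    """
    return [
        [
            gabor_kernel(size, sigma_x, sigma_y, omega,
                         theta=math.pi * n / n_orients, scale=a ** m)
            for n in range(n_orients)
        ]
        for m in range(n_scales)
    ]
```

Each returned kernel is a list of rows of complex samples, so convolving an image with it and taking the magnitude of the result would give one filter-response map per scale and orientation.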
For a given image $I(x, y)$, $(x, y) \in \Psi$ (where $\Psi$ is the set of image points), the output of a Gabor filter bank is the Gabor wavelet transform of that image, which can be written as

$$W_{mn}(x, y) = \iint_{\Psi} I(x_1, y_1)\, g_{mn}^{*}(x - x_1, y - y_1)\, dx_1\, dy_1, \tag{2}$$

where $g_{mn}(x, y)$ are Gabor wavelets at scale $m$ and orientation $n$, obtained from (1), and the asterisk denotes complex conjugation. Assuming that image regions have homogeneous texture, the means $\mu_{mn}$ and standard deviations $\sigma_{mn}$ of the transform coefficients are used to represent the texture of the region:

$$\mu_{mn} = \iint_{\Psi} |W_{mn}(x, y)|\, dx\, dy, \tag{3}$$

$$\sigma_{mn} = \sqrt{\iint_{\Psi} \left(|W_{mn}(x, y)| - \mu_{mn}\right)^2 dx\, dy}. \tag{4}$$

The Gabor descriptor is now formed as a vector of means and standard deviations of filter responses,

$$\mathbf{x} = \left[\mu_{00}\ \sigma_{00}\ \mu_{01}\ \sigma_{01}\ \cdots\ \mu_{(S-1)(K-1)}\ \sigma_{(S-1)(K-1)}\right], \tag{5}$$

where $S$ is the total number of scales, and $K$ is the total number of orientations. These values are typically set heuristically, through cross-validation.

In [4] a distance metric based on the weighted $L_1$-norm is proposed for computing the dissimilarity between textures:

$$d(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m} \sum_{n} d_{mn}(\mathbf{x}_i, \mathbf{x}_j), \tag{6}$$

where

$$d_{mn}(\mathbf{x}_i, \mathbf{x}_j) = \left|\frac{\mu_{mn}^{(i)} - \mu_{mn}^{(j)}}{\alpha(\mu_{mn})}\right| + \left|\frac{\sigma_{mn}^{(i)} - \sigma_{mn}^{(j)}}{\alpha(\sigma_{mn})}\right|, \tag{7}$$

and $\alpha(\mu_{mn})$ and $\alpha(\sigma_{mn})$ are the standard deviations of the respective features over the entire database.

3.2 Gist Descriptor

Oliva and Torralba proposed the Gist descriptor [5] to represent the spatial envelope of the scene. The spatial envelope is a set of holistic scene properties which can be used for inferring the semantic category of the scene, without the need for recognition of the objects in the scene. The Gist descriptor of an image is computed by first filtering the image with a bank of Gabor filters, and then averaging the responses of the filters in each block of a 4 × 4 non-overlapping grid. Comparing this descriptor to the Gabor descriptor, we see that the Gist descriptor is essentially a spatial layout of textures.
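The two pooling schemes described above, together with the distance in (6) and (7), can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used in the experiments: the function names are hypothetical, discrete averages over pixels stand in for the integrals in (3) and (4), the filter-response magnitudes are assumed to be already computed, and the image dimensions are assumed to be divisible by the grid size.

```python
import math


def gabor_descriptor(magnitudes):
    """Build the vector in (5).

    magnitudes holds one flat list of |W_mn(x, y)| samples per
    (scale, orientation) pair, in the order used in (5). Discrete
    averages stand in for the integrals in (3) and (4).
    """
    descriptor = []
    for values in magnitudes:
        mu = sum(values) / len(values)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
        descriptor.extend([mu, sigma])
    return descriptor


def feature_scales(descriptors):
    """Standard deviation of each feature over the database: alpha(.) in (7)."""
    n = len(descriptors)
    scales = []
    for column in zip(*descriptors):
        mean = sum(column) / n
        scales.append(math.sqrt(sum((v - mean) ** 2 for v in column) / n))
    return scales


def gabor_distance(xi, xj, scales):
    """Weighted L1 distance of (6)-(7); every entry of scales must be nonzero."""
    return sum(abs(a - b) / s for a, b, s in zip(xi, xj, scales))


def grid_means(response, grid=4):
    """Gist-style pooling: mean of a 2-D response map over grid x grid blocks.

    Assumes the height and width of response are divisible by grid.
    """
    block_h = len(response) // grid
    block_w = len(response[0]) // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            block = [
                response[y][x]
                for y in range(gy * block_h, (gy + 1) * block_h)
                for x in range(gx * block_w, (gx + 1) * block_w)
            ]
            means.append(sum(block) / len(block))
    return means
```

Applying `grid_means` to every filter-response map and concatenating the results gives a Gist-like vector with 16 values per filter, whereas `gabor_descriptor` keeps only two values per filter and discards the spatial layout.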
Note that here the standard deviations of the distribution of filter responses are not used. Despite its simplicity, this descriptor shows very good results in natural scene classification tasks.

3.3 Classifier

As a classifier we use a support vector machine (SVM). Since distances between Gabor descriptors are computed using (6), we construct a kernel function starting from this metric as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left[-d(\mathbf{x}_i, \mathbf{x}_j)\right], \tag{8}$$

where $d(\mathbf{x}_i, \mathbf{x}_j)$ is given by (6). This kernel function is essentially based on the weighted $L_1$-norm, and it satisfies Mercer's condition [9]. For the Gist descriptor we follow the approach in [5] and use an SVM with a radial basis function kernel. We construct a multi-class classifier using $N$ one-vs-all SVMs, where $N$ is the number of categories, and select the class with the maximal SVM output.

4 Datasets and Experimental Results

We tested the described image representations and classifier on two datasets. Both datasets consist of aerial images. The first dataset is our in-house dataset and contains images of a part of Banja Luka, Bosnia and Herzegovina. The second dataset contains images used previously for aerial image classification [2], and we include it here for comparison purposes.

4.1 In-House Dataset

For evaluation of the classifiers we used a 4500 × 6000 pixel multispectral (RGB) aerial image of a part of Banja Luka, Bosnia and Herzegovina. In this image there is a variety of structures, both man-made, such as buildings, factories, and warehouses, and natural, such as fields, trees and rivers. We partitioned this image into 128 × 128 pixel tiles, and used a total of 606 images in our experiments. We manually classified all images into 6 categories, namely: houses, cemetery, industry, field, river, and trees. Examples of images from each class are shown in Fig. 1.
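Returning briefly to the classifier of Section 3.3, the kernel in (8) and the one-vs-all decision rule can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in the experiments: the function names are hypothetical, and the per-feature weights `scales` (the $\alpha$ values in (7)) are assumed to have been computed beforehand and to be nonzero.

```python
import math


def weighted_l1(xi, xj, scales):
    """Distance (6)-(7) between two descriptors, given per-feature weights."""
    return sum(abs(a - b) / s for a, b, s in zip(xi, xj, scales))


def gabor_kernel_matrix(descriptors, scales):
    """Gram matrix with entries K(x_i, x_j) = exp(-d(x_i, x_j)), as in (8)."""
    n = len(descriptors)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = math.exp(-weighted_l1(descriptors[i], descriptors[j], scales))
            # The kernel is symmetric, so fill both halves at once.
            gram[i][j] = gram[j][i] = value
    return gram


def one_vs_all_predict(svm_outputs):
    """Index of the class whose one-vs-all SVM output is largest."""
    return max(range(len(svm_outputs)), key=lambda c: svm_outputs[c])
```

A Gram matrix built this way could be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`; which SVM software was used for the experiments is not stated here.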
It should be noted that the distribution of images in these categories is highly uneven, which can be observed from the bar graph in Fig. 2. In our experiments we used half of the images for training and the other half for testing.

Fig. 1. Samples of images from all classes. From left to right, column-wise: houses, cemetery, industry, field, river, trees. (Best viewed in color.)

Fig. 2. Per category distribution of images in the in-house dataset.

We compute Gabor descriptors at 8 scales and 8 orientations for all images from the dataset. We also tried other combinations of numbers of scales and orientations and chose the one with the best performance. Gabor descriptors, as proposed in [4], are computed for grayscale images. Since our images are multispectral, we compute the Gabor descriptor for all 3 spectral bands of an image and concatenate the obtained vectors, which yields 3 × 8 × 8 × 2 = 384-dimensional descriptors. For comparison purposes we also compute Gabor descriptors for grayscale (panchromatic) versions of the images, which are 8 × 8 × 2 = 128-dimensional.

As for Gist descriptors, we obtained the best results with the default setup, i.e., a filter bank at 4 scales and 8 orientations. For this descriptor we also compute a grayscale variant, which is 4 × 8 × 16 = 512-dimensional, and a color variant, which results in a 3 × 4 × 8 × 16 = 1536-dimensional descriptor.

For testing our classifiers we used 10-fold cross-validation, each time with a different random partition of the dataset, and averaged the results. Average classification accuracies on all categories are given in Table 1. In the table, Gabor (full) denotes the Gabor descriptor as given in (5), while Gabor (mean) denotes the descriptor obtained using only the means of the filter bank responses.

Table 1. Comparison of the classification accuracies (%) for the in-house dataset.

Descriptor      Panchromatic (grayscale)   Multispectral (RGB)
Gabor (full)    84.5                       88.0
Gabor (mean)    80.7                       84.5
Gist            79.5                       89.3

We see that the Gist descriptor computed for all spectral bands of an RGB image has the best performance, at the cost of high dimensionality of the descriptor. It is worth noting that the much simpler Gabor descriptor, with 4 times lower dimensionality, yields similar performance. Even more interesting is the fact that for grayscale (panchromatic) images the Gabor descriptor outperforms Gist. From these results, it is clear that the classifiers benefit from information from the various spectral bands. When grayscale images are considered, the standard deviations of the Gabor filter bank responses provide richer information about the texture of the image, hence the better performance of the Gabor descriptor. The importance of this information can be observed from the drop in performance when only the means of the Gabor filter bank responses are used. Another conclusion is that the spatial layout of filter bank responses does not have the beneficial influence on the performance of an aerial image classifier that it has for general scenes [5].

The confusion matrix for the Gabor descriptor is given in Fig. 3. We note that confusions mainly arise between categories which can be difficult to distinguish even for humans. The most notable examples are houses versus cemetery, because of rectangular structures with strong oriented edges, and river versus field, because both have a homogeneous, smooth texture without pronounced edges. It is also important to note that there are not many confusions between natural (river, trees, field) and man-made (houses, cemetery, industry) categories.

Fig. 3. Confusion matrix for the in-house dataset using Gabor (RGB) descriptor.

The confusion matrix for the Gist descriptor is given in Fig. 4. The same observations we made for the confusion matrix for the Gabor descriptor are also valid here.

Fig. 4. Confusion matrix for the in-house dataset using Gist (RGB) descriptor.
4.2 Window on the UK Dataset

For our second experiment we chose the Window on the UK dataset, which was also used in [2]. This dataset consists of 1040 aerial images of 64 × 64 pixels, which are manually classified into the following 8 categories: building, road, river, field, grass, tree, boat, vehicle. There are 130 images per category, so the distribution of images into categories in this dataset is uniform, in contrast to our in-house dataset. The authors of [2] also proposed a split into training and test sets of 520 images each.

For images from this dataset we computed the Gabor descriptor at 8 scales and 8 orientations, as well as the Gist descriptor, and then trained a multi-class classifier as described previously. In Table 2 we give the comparison of classification accuracies for this dataset. Again, the Gabor and Gist descriptors result in comparable performance, this time with some advantage on the side of the Gabor descriptor. This supports our previous findings about the descriptive power of these two descriptors. Moreover, we can see that the performance of our classifier with Gabor descriptors is better than the performance of the algorithm proposed in [2], and only slightly worse than the performance of the SVM classifier trained with features from [2].

Table 2. Comparison of the classification accuracies for Window on the UK dataset.

Method                             Accuracy (%)
SVM with Gabor descriptor (RGB)    90.8
SVM with Gist descriptor (RGB)     87.1
Algorithm from [2]                 89.4
SVM with features from [2]         92.3

The confusion matrix for the Gabor descriptor is shown in Fig. 5. We can see that common misclassifications again occur in cases that can also potentially confuse human subjects, such as building versus vehicle and field versus grass. It is important to note that, in this case too, misclassifications rarely occur between natural and man-made categories.

Fig. 5. Confusion matrix for Window on the UK dataset using Gabor descriptor.
5 Conclusion

In this paper we evaluate two image descriptors, namely Gabor and Gist descriptors, and show that classifiers based on these descriptors achieve results comparable to or better than those of more complex approaches. Both descriptors have previously shown good results in texture and image classification tasks. As a classifier we use an SVM with the standard radial basis function kernel, as well as with a kernel constructed using a metric function proposed for comparing Gabor descriptors.

We show that, for multispectral images, lower dimensional Gabor descriptors show similar or better performance than Gist, while, for panchromatic images, Gabor descriptors outperform Gist. This is mainly due to the fact that spatial layout is not such a strong cue for semantic classification of aerial images; rather, their texture regions are spatially homogeneous. Also, Gabor descriptors use standard deviations of filter bank responses, and the richer representation that they provide is another reason for their better performance.

Despite its simplicity, the classifier based on Gabor descriptors and SVMs with the weighted $L_1$-norm kernel achieves performance better than or comparable to that of more complex classifiers trained with color, texture and structural descriptors. This finding calls for a more thorough investigation of descriptors used for aerial image classification, since it is possible that state-of-the-art descriptors from other application areas do not show better performance than simpler descriptors on the task at hand.

Comparing the results of this paper with the literature, we also note that using multiple features does not guarantee better results. Therefore, another important research area, stemming from these results, is feature combination. This question needs more elaborate studies that will show what features are needed to adequately represent aerial images, and how they should be combined.
Also, the whole community would benefit from more manually annotated ground truth datasets which are publicly available, so that the algorithms from various groups can be compared.

References

1. Daugman, J.G.: Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Transactions on Acoustics, Speech and Signal Processing 36(7), 1169–1179 (1988)
2. Fauqueur, J., Kingsbury, N.G., Anderson, R.: Semantic discriminant mapping for classification and browsing of remote sensing textures and objects. In: Proceedings of IEEE International Conference on Image Processing (ICIP 2005). pp. 846–849 (2005)
3. Ma, W.Y., Manjunath, B.S.: A texture thesaurus for browsing large aerial photographs. Journal of the American Society for Information Science 49(7), 633–648 (1998)
4. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 837–842 (1996)
5. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision 42(3), 145–175 (2001)
6. Ozdemir, B., Aksoy, S.: Image classification using subgraph histogram representation. In: Proceedings of 20th IAPR International Conference on Pattern Recognition. Istanbul, Turkey (2010)
7. Parulekar, A., Datta, R., Li, J., Wang, J.Z.: Large-scale satellite image browsing using automatic semantic categorization and content-based retrieval. In: IEEE International Workshop on Semantic Knowledge in Computer Vision, in conjunction with IEEE International Conference on Computer Vision. pp. 1873–1880. Beijing, China (2005)
8. Ramapriyan, H.K.: Satellite imagery in earth science applications. In: Castelli, V., Bergman, L.D. (eds.) Image Databases, pp. 35–82. John Wiley & Sons, Inc. (2002)
9. Vapnik, V.: Statistical Learning Theory. John Wiley (1998)
10. Yang, L., Wu, X., Praun, E., Ma, X.: Tree detection from aerial imagery. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. pp. 131–137. GIS ’09, New York, NY, USA (2009)