Gabor Descriptors for Aerial Image
Classification
Vladimir Risojević, Snježana Momić, Zdenka Babić
Faculty of Electrical Engineering,
University of Banja Luka, Bosnia and Herzegovina
vlado@etfbl.net, s momic@hotmail.com, zdenka@etfbl.net
Abstract. The amount of remotely sensed imagery that has become available by far surpasses the capacity for manual analysis. One of the most
important tasks in the analysis of remotely sensed images is land use classification. This task can be recast as semantic classification of remotely
sensed images. In this paper we evaluate classifiers for semantic classification of aerial images. The evaluated classifiers are based on Gabor and
Gist descriptors, which have long been established in image classification
tasks. We use support vector machines and propose a kernel well suited
for use with Gabor descriptors. These simple classifiers achieve a correct classification rate of about 90% on two datasets. It follows from these
results that, in aerial image classification, simple classifiers give results
comparable to those of more complex approaches, and the pursuit of more advanced solutions should continue with this in mind.
Key words: Aerial image classification, Gabor filters, Gist descriptor
1 Introduction
There is a constantly increasing number of instruments for remote sensing of
the Earth. Consequently, many databases of remotely sensed data are being
flooded with data. At the moment, images dominate these databases, both in
variety and quantity. Remote sensing imaging of the Earth is done by a variety of
airborne and space-borne imagers in various spectral bands, ranging from visible
spectrum to microwave [8].
There are many applications of remote sensing imaging, both military and
civilian. Civilian applications include land use planning, weather forecasting,
studying long-term climate changes, crops monitoring, studying deforestation,
city planning, and many others. These applications require development of effective means for acquisition, processing, transmission, storage, retrieval, and
analysis of images.
One of the key problems in aerial image analysis is the problem of semantic
classification. This problem is closely related to the task of land use monitoring
which is necessary for control of environmental quality as well as maintaining
and improving living conditions and standards. The holy grail of automatic land
use classification is pixel-level semantic segmentation of remotely sensed images.
The result of a pixel-level segmentation is a thematic map in which each pixel
is assigned a predefined label from a finite set. However, remote sensing images
are often multispectral and of high resolution, which makes their detailed semantic
segmentation an excessively computationally demanding task. This is the reason
why some researchers decided to classify image blocks instead of individual pixels. We also adopt this approach and evaluate, on the task of aerial image
classification, classifiers based on state-of-the-art image descriptors and support
vector machines, which have shown good results in image classification tasks.
The contribution of this paper is in the evaluation of Gabor and Gist descriptors for the task of aerial image classification. For the classifier based on
Gabor descriptors we propose a kernel based on the distance function proposed
for Gabor descriptors. In the experiments we show that the classifier based on
Gabor descriptors yields similar or better performance compared to the Gist
descriptor based classifier, despite the lower dimensionality of the former. We also
show that these simple classifiers yield classification performance which is better
than or comparable to that of some more complicated classifiers using more features.
The paper is organized as follows. In Section 2 we briefly review previous
related work. Image representation and classifier are described in Section 3, and
experimental results are given in Section 4. In Section 5 we conclude and give
ideas for future research.
2 Related Work
There has been a long history of using computer vision techniques for classification of aerial and satellite images. We briefly review here some of the methods
that are relevant to our work.
Ma and Manjunath [3] use Gabor descriptors for representing aerial images.
Their work is centered around efficient content-based retrieval from the database
of aerial images and they did not try to automatically classify images to semantic categories. Parulekar et al. [7] classify satellite images into four semantic
categories in order to enable fast and accurate browsing of the image database.
Fauqueur et al. [2] classify aerial images based on color, texture and structure
features. The authors tested their algorithm on a dataset of 1040 aerial images
from 8 categories. In a more recent work [6], Ozdemir and Aksoy use a bag-of-words model and frequent subgraph mining to construct higher level features for
satellite image classification. The algorithm is tested on a dataset of 585 images
classified into 8 semantic categories. Our work is in a similar vein, but rather
than trying to construct semantic features for image classification we focus on
low level features and aerial images.
Despite the wide use of the Gist descriptor [5] in general-purpose image classification, to the best of our knowledge there are not many examples of aerial image
classification using the Gist descriptor. An exception is the work on tree detection by Yang
et al. [10], where Gist is used for clustering of images prior to the detection phase.
3 Image Representation and Classifier
In this paper we evaluate two image descriptors, both based on Gabor filters.
There is a long tradition of using Gabor descriptors in computer vision and image processing, dating back to Daugman [1], who noted the similarity between low
level processing in biological vision and Gabor filter banks. Subsequently, Gabor descriptors have been used for various tasks including texture segmentation,
image recognition, iris recognition, registration, and motion tracking. In the context of image classification, the most notable are their uses for texture classification
and retrieval, pioneered by Manjunath and Ma [4], and, more recently, for scene
classification using the Gist descriptor, as proposed by Oliva and Torralba [5].
3.1 Gabor Descriptor
Gabor descriptor for an image is computed by passing the image through a filter
bank of Gabor filters. Gabor filter is a linear band-pass filter whose impulse
response is defined as a Gaussian function modulated with a complex sinusoid,
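$$ g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j \Omega x\right], \tag{1} $$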
where Ω is the frequency of the Gabor function, and σx and σy determine its
bandwidth. Gabor showed that these functions are optimal in the sense of minimizing the joint two-dimensional uncertainty in space and frequency [1].
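As a minimal sketch, the function (1) can be evaluated pointwise as follows. The function name `gabor` and the parameter values (σx = σy = 2, Ω = 0.25) are assumptions chosen for illustration only; they are not the settings used in the experiments.

```python
import cmath
import math

def gabor(x, y, sigma_x, sigma_y, omega):
    """Evaluate the Gabor function of Eq. (1) at the point (x, y)."""
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    # Gaussian envelope (real part of the exponent).
    envelope = -0.5 * (x ** 2 / sigma_x ** 2 + y ** 2 / sigma_y ** 2)
    # Complex sinusoid along x (imaginary part of the exponent).
    carrier = 2j * math.pi * omega * x
    return norm * cmath.exp(envelope + carrier)

# Illustrative (assumed) parameters: sigma_x = sigma_y = 2, omega = 0.25.
g_origin = gabor(0.0, 0.0, 2.0, 2.0, 0.25)   # equals the normalization constant
g_shifted = gabor(2.0, 0.0, 2.0, 2.0, 0.25)  # carrier phase is pi at x = 2
```

At the origin the exponent vanishes, so the value reduces to the normalization constant 1/(2πσxσy); away from the origin the magnitude follows the Gaussian envelope while the phase is set by Ω.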
Impulse responses of the filters in a Gabor filter bank are dilated (scaled) and
rotated versions of the function (1). Filters in a Gabor filter bank can be considered as edge detectors with tunable orientation and scale so that information on
texture can be derived from statistics of the outputs of those filters [4]. We can
consider (1) as a mother Gabor wavelet, and the functions obtained by its dilations and rotations are Gabor wavelets. For a given image, I (x, y) , (x, y) ∈ Ψ
(Ψ is the set of image points), the output of a Gabor filter bank is actually Gabor
wavelet transformation of that image, which can be written as
$$ W_{mn}(x, y) = \iint_{\Psi} I(x_1, y_1) \, g_{mn}^{*}(x - x_1, y - y_1) \, dx_1 \, dy_1, \tag{2} $$
where gmn (x, y) are Gabor wavelets at scale m and orientation n, obtained from
(1), and asterisk denotes complex conjugation.
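The following sketch illustrates a discrete approximation of (2) at a single pixel. The dilation and rotation rule used in `gabor_wavelet` follows the common construction from [4] (scale factor a^(-m), rotation angle nπ/K) and is an assumption here, since the exact parametrization is not spelled out above. The names `gabor_wavelet` and `response`, the truncation radius, and all parameter values are likewise illustrative.

```python
import cmath
import math

def gabor(x, y, sigma_x, sigma_y, omega):
    """Mother Gabor function of Eq. (1)."""
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    envelope = -0.5 * (x ** 2 / sigma_x ** 2 + y ** 2 / sigma_y ** 2)
    return norm * cmath.exp(envelope + 2j * math.pi * omega * x)

def gabor_wavelet(x, y, m, n, K, a=2.0, sigma_x=2.0, sigma_y=2.0, omega=0.25):
    """Dilated and rotated version of Eq. (1) at scale m, orientation n.

    Assumed construction: coordinates are rotated by n*pi/K and scaled by a**(-m).
    """
    theta = n * math.pi / K
    s = a ** (-m)
    xr = s * (x * math.cos(theta) + y * math.sin(theta))
    yr = s * (-x * math.sin(theta) + y * math.cos(theta))
    return s * gabor(xr, yr, sigma_x, sigma_y, omega)

def response(image, x, y, m, n, K, radius=4):
    """Discrete approximation of Eq. (2) at pixel (x, y).

    The kernel is truncated to a (2*radius+1)^2 window and pixels outside
    the image contribute nothing. image[row][col] holds the intensity.
    """
    h, w = len(image), len(image[0])
    total = 0j
    for y1 in range(max(0, y - radius), min(h, y + radius + 1)):
        for x1 in range(max(0, x - radius), min(w, x + radius + 1)):
            g = gabor_wavelet(x - x1, y - y1, m, n, K)
            total += image[y1][x1] * g.conjugate()
    return total

# Toy image: a single unit impulse in the middle of a 9x9 block.
impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0
w_center = response(impulse, 4, 4, m=0, n=0, K=8)
```

For an impulse image the response at the impulse location is simply the conjugated wavelet at the origin, which gives a quick sanity check of the implementation. A practical implementation would use FFT-based filtering rather than the explicit double loop.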
Assuming that image regions have homogeneous texture, means µmn and
standard deviations σmn of the transform coefficients are used to represent the
texture of the region:
$$ \mu_{mn} = \iint_{\Psi} |W_{mn}(x, y)| \, dx \, dy, \tag{3} $$
$$ \sigma_{mn} = \sqrt{\iint_{\Psi} \left(|W_{mn}(x, y)| - \mu_{mn}\right)^2 dx \, dy}. \tag{4} $$
Gabor descriptor is now formed as a vector of means and standard deviations
of filter responses
$$ x = \left[\mu_{00} \;\; \sigma_{00} \;\; \mu_{01} \;\; \sigma_{01} \;\; \cdots \;\; \mu_{(S-1)(K-1)} \;\; \sigma_{(S-1)(K-1)}\right], \tag{5} $$
where S is the total number of scales, and K is the total number of orientations.
These values are typically set heuristically, through cross-validation.
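The construction of the descriptor (5) from the magnitudes of the transform coefficients can be sketched as follows. We assume here that the integrals in (3) and (4) are normalized by the number of pixels, so that they become the sample mean and standard deviation; the function name `gabor_descriptor` and the toy input are illustrative only.

```python
import math

def gabor_descriptor(magnitudes):
    """Build the vector of Eq. (5).

    magnitudes[m][n] is a flat list of |W_mn(x, y)| over all image points.
    Assumption: Eqs. (3) and (4) are normalized by the number of points.
    """
    descriptor = []
    for per_scale in magnitudes:          # scales m = 0 .. S-1
        for values in per_scale:          # orientations n = 0 .. K-1
            mu = sum(values) / len(values)
            var = sum((v - mu) ** 2 for v in values) / len(values)
            descriptor.extend([mu, math.sqrt(var)])
    return descriptor

# Toy input with S = 2 scales, K = 2 orientations, and 4 pixels per response.
toy = [[[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]],
       [[3.0, 3.0, 3.0, 3.0], [1.0, 3.0, 1.0, 3.0]]]
x = gabor_descriptor(toy)   # length 2 * S * K = 8
```

The resulting vector interleaves means and standard deviations in the order given by (5), and its length is 2SK.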
In [4] a distance metric based on the weighted L1-norm is proposed for computing the dissimilarity between textures:
$$ d(x_i, x_j) = \sum_{m} \sum_{n} d_{mn}(x_i, x_j), \tag{6} $$
where
$$ d_{mn}(x_i, x_j) = \left|\frac{\mu_{mn}^{(i)} - \mu_{mn}^{(j)}}{\alpha(\mu_{mn})}\right| + \left|\frac{\sigma_{mn}^{(i)} - \sigma_{mn}^{(j)}}{\alpha(\sigma_{mn})}\right|, \tag{7} $$
and α(µmn) and α(σmn) are the standard deviations of the respective features
over the entire database.
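A minimal sketch of the distance (6)-(7) is given below. Because the descriptor interleaves means and standard deviations, the double sum over m and n reduces to a single sum over all descriptor components, each divided by its standard deviation over the database. The function names and the two-element toy database are assumptions for illustration.

```python
import math

def feature_stds(descriptors):
    """Standard deviation of every descriptor component over the database
    (the alpha terms in Eq. (7))."""
    count = len(descriptors)
    stds = []
    for k in range(len(descriptors[0])):
        column = [d[k] for d in descriptors]
        mean = sum(column) / count
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in column) / count))
    return stds

def gabor_distance(xi, xj, alpha):
    """Weighted L1 distance of Eqs. (6)-(7).

    Assumes every alpha is non-zero, i.e. no feature is constant
    over the database.
    """
    return sum(abs(a - b) / s for a, b, s in zip(xi, xj, alpha))

# Toy database of two 2-component descriptors (one mean, one std each).
database = [[1.0, 0.0], [3.0, 4.0]]
alpha = feature_stds(database)                       # [1.0, 2.0]
d01 = gabor_distance(database[0], database[1], alpha)  # 2/1 + 4/2
```

The normalization by α keeps features with a large dynamic range from dominating the distance.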
3.2 Gist Descriptor
Oliva and Torralba proposed Gist descriptor [5] to represent the spatial envelope
of the scene. The spatial envelope is a set of holistic scene properties which
can be used for inferring the semantic category of the scene, without the need
for recognition of the objects in the scene. The Gist descriptor of an image is
computed by first filtering the image by a filter bank of Gabor filters, and then
averaging the responses of filters in each block on a 4 × 4 non-overlapping grid.
Comparing this descriptor to the Gabor descriptor, we see that Gist descriptor is
essentially a spatial layout of textures. Note that here standard deviations of the
distribution of filter responses are not used. Despite its simplicity this descriptor
shows very good results in natural scene classification tasks.
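The block-averaging step of the Gist descriptor can be sketched as follows for a single filter response. This is only an illustration of the pooling on a 4 × 4 grid, not the reference Gist implementation; the function name `block_means` and the toy response map are assumptions, and the image size is assumed to be a multiple of the grid size.

```python
def block_means(response, grid=4):
    """Average a 2-D filter response over a grid x grid layout of
    non-overlapping blocks, scanning blocks row by row.

    Assumes the height and width of `response` are multiples of `grid`.
    """
    h, w = len(response), len(response[0])
    bh, bw = h // grid, w // grid
    features = []
    for by in range(grid):
        for bx in range(grid):
            block = [response[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            features.append(sum(block) / len(block))
    return features

# Toy 8x8 response map whose value equals its row index.
toy_response = [[float(y)] * 8 for y in range(8)]
feats = block_means(toy_response)   # 16 values, one per block
```

Concatenating the 16 block averages of every filter in the bank gives the full descriptor, so the spatial position of each texture response is retained.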
3.3 Classifier
As a classifier we use a support vector machine (SVM). Since distances between Gabor
descriptors are computed using (6), we construct a kernel function starting from
this metric as
$$ K(x_i, x_j) = \exp\left[-d(x_i, x_j)\right], \tag{8} $$
where d(x_i, x_j) is given by (6). This kernel function is essentially based on the
weighted L1-norm, and it satisfies Mercer's condition [9].
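A sketch of the kernel (8) and of the resulting Gram matrix is shown below. The toy descriptors and the α values are made up for illustration. A matrix of this form could, for example, be passed to an SVM implementation that accepts precomputed kernels (such as scikit-learn's `SVC(kernel="precomputed")`); this is one possible setup and not necessarily the one used in our experiments.

```python
import math

def gabor_kernel(xi, xj, alpha):
    """Kernel of Eq. (8): exp(-d), with d the weighted L1 distance of Eq. (6)."""
    d = sum(abs(a - b) / s for a, b, s in zip(xi, xj, alpha))
    return math.exp(-d)

def gram_matrix(descriptors, alpha):
    """Kernel values for all pairs of descriptors."""
    return [[gabor_kernel(xi, xj, alpha) for xj in descriptors]
            for xi in descriptors]

# Toy descriptors and assumed per-feature standard deviations.
descriptors = [[1.0, 0.0], [3.0, 4.0], [2.0, 2.0]]
alpha = [1.0, 2.0]
K = gram_matrix(descriptors, alpha)
```

The diagonal of the Gram matrix is 1, the matrix is symmetric, and all entries lie in (0, 1], decaying exponentially with the distance between descriptors.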
For Gist descriptor we follow the approach in [5] and use SVM with radial
basis function kernel.
We construct a multi-class classifier using N (corresponding to the number of
categories) one-vs-all SVMs and selecting the class with maximal SVM output.
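The one-vs-all decision rule amounts to an argmax over the N SVM outputs, as in the following sketch. The function name `predict_one_vs_all` and the score values are made up for illustration; the class names are those of the in-house dataset.

```python
def predict_one_vs_all(scores, labels):
    """Return the label whose one-vs-all SVM gives the largest output.

    scores[c] is the output of the SVM trained for labels[c] versus the rest.
    """
    best = max(range(len(scores)), key=lambda c: scores[c])
    return labels[best]

labels = ["houses", "cemetery", "industry", "field", "river", "trees"]
scores = [-0.8, -1.2, 0.3, -0.1, 1.4, -0.5]   # made-up SVM outputs for one image
predicted = predict_one_vs_all(scores, labels)
```

With the scores above the image is assigned to the class `river`, since its SVM output is the largest.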
4 Datasets and Experimental Results
We tested the described image representations and classifier on two datasets.
Both datasets consist of aerial images. The first dataset is our in-house dataset
and contains images of the part of Banja Luka, Bosnia and Herzegovina. The
second dataset contains images used previously for aerial image classification [2],
and we include it here for comparison purposes.
4.1 In-House Dataset
For evaluation of the classifiers we used a 4500×6000 pixel multispectral (RGB)
aerial image of a part of Banja Luka, Bosnia and Herzegovina. In this image
there is a variety of structures, both man-made, such as buildings, factories, and
warehouses, as well as natural, such as fields, trees and rivers. We partitioned this
image into 128×128 pixel tiles, and used a total of 606 images in our experiments.
We manually classified all images into 6 categories, namely: houses, cemetery,
industry, field, river, and trees. Examples of images from each class are shown
in Fig. 1. It should be noted that the distribution of images in these categories
is highly uneven, which can be observed from the bar graph in Fig. 2. In our
experiments we used half of the images for training and the other half for testing.
We compute Gabor descriptors at 8 scales and 8 orientations for all images
from the dataset. We also tried other combinations of numbers of scales and
orientations and chose the one with the best performance. Gabor descriptors, as
proposed in [4] are computed for grayscale images. Since images are multispectral
we compute Gabor descriptor for all 3 spectral bands in an image, and concatenate the obtained vectors, which yields 3 × 8 × 8 × 2 = 384-dimensional descriptors. For comparison purposes we also compute Gabor descriptors for grayscale
(panchromatic) versions of images, which are 8 × 8 × 2 = 128-dimensional.
As for Gist descriptors, we obtained the best results with the default setup,
i.e., a filter bank at 4 scales and 8 orientations. For this descriptor we also compute
a grayscale variant, which is 4 × 8 × 16 = 512-dimensional, and a color variant, which
results in a 3 × 4 × 8 × 16 = 1536-dimensional descriptor.
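The descriptor dimensionalities follow directly from the filter bank sizes: the Gabor descriptor stores a mean and a standard deviation per filter, while Gist stores 16 block averages per filter (one per cell of the 4 × 4 grid). The helper names below are illustrative.

```python
def gabor_dim(scales, orientations, bands=1):
    """Mean and standard deviation per filter, per spectral band."""
    return bands * scales * orientations * 2

def gist_dim(scales, orientations, bands=1, grid=4):
    """One block average per grid cell, per filter, per spectral band."""
    return bands * scales * orientations * grid * grid

dims = {
    "Gabor (grayscale)": gabor_dim(8, 8),
    "Gabor (RGB)": gabor_dim(8, 8, bands=3),
    "Gist (grayscale)": gist_dim(4, 8),
    "Gist (RGB)": gist_dim(4, 8, bands=3),
}
```

The RGB Gist descriptor is therefore four times larger than the RGB Gabor descriptor.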
For testing our classifiers we used 10-fold cross validation, each time with
different random partition of the dataset, and averaged the results. Average
classification accuracies on all categories are given in Table 1. In the table, Gabor
(full) denotes Gabor descriptor as given in (5), while Gabor (mean) denotes
descriptor obtained using only means of filter-bank responses.
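One reading of this evaluation protocol, assumed in the sketch below, is ten repetitions of a random half/half split into training and test sets, with the accuracies averaged over the repetitions. The callback `evaluate` is hypothetical: it stands for training the classifier on the training indices and returning the accuracy on the test indices.

```python
import random

def random_half_split(n_items, rng):
    """Randomly split item indices into two halves (training, test)."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    half = n_items // 2
    return indices[:half], indices[half:]

def average_accuracy(n_items, evaluate, runs=10, seed=0):
    """Average the accuracy returned by `evaluate` over repeated random splits.

    `evaluate(train_indices, test_indices)` is a hypothetical callback that
    trains on the first set and returns the accuracy on the second.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        train, test = random_half_split(n_items, rng)
        accuracies.append(evaluate(train, test))
    return sum(accuracies) / runs

# Example split of the 606 in-house images.
train_idx, test_idx = random_half_split(606, random.Random(1))
```

Fixing the seed makes the sequence of partitions reproducible, which is useful when comparing descriptors on the same splits.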
Table 1. Comparison of the classification accuracies for the in-house dataset.
Descriptor      Panchromatic (grayscale) (%)    Multispectral (RGB) (%)
Gabor (full)    84.5                            88.0
Gabor (mean)    80.7                            84.5
Gist            79.5                            89.3
Fig. 1. Samples of images from all classes. From left to right, column-wise: houses,
cemetery, industry, field, river, trees. (Best viewed in color.)
Fig. 2. Per category distribution of images in the in-house dataset.
We see that the Gist descriptor computed for all spectral bands of an RGB image has the best performance, at the cost of high dimensionality of the descriptor.
It is worth noting that the much simpler Gabor descriptor, with 4 times lower dimensionality, yields similar performance. Even more interesting is the fact that
for grayscale (panchromatic) images the Gabor descriptor outperforms Gist. From
these results, it is clear that classifiers benefit from information from the various
spectral bands. When grayscale images are considered, standard deviations of
Gabor filter bank responses provide richer information about the texture of the
image, which explains the better performance of the Gabor descriptor. The importance of this information can be
observed from the drop in performance when only means of Gabor filter bank
responses are used. Another conclusion is that, unlike for general scenes [5], the spatial layout of filter bank responses does not have a beneficial influence on the performance of an aerial image
classifier.
The confusion matrix for Gabor descriptor is given in Fig. 3. We note
that confusions mainly arise between categories which can be difficult even for
humans. The most notable examples are houses versus cemetery, because of
rectangular structures with strong oriented edges, and river versus field, because
both have homogeneous, smooth texture without pronounced edges. It is also
important to note that there are not many confusions between natural (river,
trees, field) and man-made categories (houses, cemetery, industry).
Fig. 3. Confusion matrix for the in-house dataset using Gabor (RGB) descriptor.
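For reference, a confusion matrix of this kind can be computed from the true and predicted labels as in the sketch below. The label lists are toy data invented for illustration and do not reproduce the results in the figure.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Entry [i][j] counts images of class i that were classified as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

classes = ["houses", "cemetery", "river", "field"]
# Toy labels only; not the experimental data.
y_true = ["houses", "houses", "cemetery", "river", "field", "river"]
y_pred = ["houses", "cemetery", "cemetery", "field", "field", "river"]
cm = confusion_matrix(y_true, y_pred, classes)
accuracy = sum(cm[i][i] for i in range(len(classes))) / len(y_true)
```

Off-diagonal entries show which pairs of categories are confused, and the overall accuracy is the sum of the diagonal divided by the number of test images.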
The confusion matrix for Gist descriptor is given in Fig. 4. The same observations we made for the confusion matrix for Gabor descriptor are also valid
here.
Fig. 4. Confusion matrix for the in-house dataset using Gist (RGB) descriptor.
4.2 Window on the UK Dataset
For our second experiment we chose the Window on the UK dataset, which was also
used in [2]. This dataset consists of 1040 aerial images of 64 × 64 pixels, which are
manually classified into the following 8 categories: building, road, river, field,
grass, tree, boat, vehicle. There are 130 images per category, so the distribution
of images into categories in this dataset is uniform, in contrast to our in-house
dataset. The authors of [2] also proposed a split into training and test sets of
520 images each.
For images from this dataset we computed the Gabor descriptor at 8 scales and
8 orientations, as well as the Gist descriptor, and then trained a multi-class classifier
as described previously. In Table 2 we give the comparison of classification accuracies for this dataset. Again, Gabor and Gist descriptors result in comparable
performance, this time with some advantage on the side of Gabor descriptors.
This supports our previous findings about the descriptive power of these two descriptors. Moreover, we can see that the performance of our classifier with Gabor descriptors is better than the performance of the algorithm proposed in [2],
and only slightly worse than the performance of the SVM classifier trained with
features from [2].
The confusion matrix for Gabor descriptor is shown in Fig. 5. We can see
that common misclassifications again occur in cases that can also potentially
confuse human subjects, such as building versus vehicle and field versus grass. It
is important to note that, in this case too, misclassifications rarely occur between
natural and man-made categories.
Table 2. Comparison of the classification accuracies for Window on the UK dataset.
Method                             Accuracy (%)
SVM with Gabor descriptor (RGB)    90.8
SVM with Gist descriptor (RGB)     87.1
Algorithm from [2]                 89.4
SVM with features from [2]         92.3
Fig. 5. Confusion matrix for Window on the UK dataset using Gabor descriptor.
5 Conclusion
In this paper we evaluate two image descriptors, namely Gabor and Gist descriptors, and show that classifiers based on these descriptors give results comparable
to or better than those of more complex approaches. Both descriptors have previously shown
good results in texture and image classification tasks. As a classifier we use an SVM
with the standard radial basis function kernel, as well as with a kernel constructed using
a metric function proposed for comparing Gabor descriptors. We show that, for
multispectral images, lower dimensional Gabor descriptors show similar or better performance than Gist, while, for panchromatic images, Gabor
descriptors outperform Gist. This is mainly due to the fact that spatial layout
is not such a strong cue for semantic classification of aerial images; rather, their
texture regions are spatially homogeneous. Also, Gabor descriptors use
standard deviations of filter bank responses, and the richer representation that
they provide is another reason for their better performance.
Despite its simplicity, the classifier based on Gabor descriptors and SVMs with the
weighted L1-norm kernel achieves performance better than or comparable to that of more complex classifiers trained with color, texture and structural descriptors. This finding calls for
a more thorough investigation of descriptors used for aerial image classification
since it is possible that state of the art descriptors in other application areas do
not show better performance than simpler descriptors on the task at hand. Comparing results of this paper with the literature, we also note that using multiple
features does not guarantee better results. Therefore, another important research
area, stemming from these results, is feature combination. Obviously, this question needs more elaborate studies that will show what features are needed to
adequately represent aerial images, and how they should be combined. Also, the
whole community would benefit from more manually annotated ground truth
datasets which are publicly available so that the algorithms from various groups
can be compared.
References
1. Daugman, J.G.: Complete discrete 2-D Gabor transforms by neural networks for
image analysis and compression. IEEE Transactions on Acoustics, Speech and Signal Processing 36(7), 1169–1179 (1988)
2. Fauqueur, J., Kingsbury, N.G., Anderson, R.: Semantic discriminant mapping for
classification and browsing of remote sensing textures and objects. In: Proceedings
of IEEE International Conference on Image Processing (ICIP 2005). pp. 846–849
(2005)
3. Ma, W.Y., Manjunath, B.S.: A texture thesaurus for browsing large aerial photographs. Journal of the American Society for Information Science 49(7), 633–648
(1998)
4. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image
data. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 837–842 (1996)
5. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation
of the spatial envelope. International Journal of Computer Vision 42(3), 145–175
(2001)
6. Ozdemir, B., Aksoy, S.: Image classification using subgraph histogram representation. In: Proceedings of 20th IAPR International Conference on Pattern Recognition. Istanbul, Turkey (2010)
7. Parulekar, A., Datta, R., Li, J., Wang, J.Z.: Large-scale satellite image browsing
using automatic semantic categorization and content-based retrieval. In: IEEE International Workshop on Semantic Knowledge in Computer Vision, in conjunction
with IEEE International Conference on Computer Vision. pp. 1873–1880. Beijing,
China (2005)
8. Ramapriyan, H.K.: Satellite imagery in earth science applications. In: Castelli, V.,
Bergman, L.D. (eds.) Image Databases, pp. 35–82. John Wiley & Sons, Inc. (2002)
9. Vapnik, V.: Statistical Learning Theory. John Wiley (1998)
10. Yang, L., Wu, X., Praun, E., Ma, X.: Tree detection from aerial imagery. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances
in Geographic Information Systems. pp. 131–137. GIS ’09, New York, NY, USA
(2009)