IET Image Processing
Research Article
Effect of pooling strategy on convolutional neural network for classification of hyperspectral remote sensing images
ISSN 1751-9659
Received on 21st May 2019
Revised 30th September 2019
Accepted on 17th October 2019
E-First on 20th January 2020
doi: 10.1049/iet-ipr.2019.0561
www.ietdl.org
Somenath Bera1, Vimal K. Shrivastava2
1School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, India
2School of Electronics Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, India
E-mail: vimal.shrivastavafet@kiit.ac.in
Abstract: Deep convolutional neural networks (CNNs) have recently attracted considerable attention for the
classification of hyperspectral remote sensing images. A CNN mainly consists of convolution layers, pooling layers
and fully connected layers. Pooling acts as a regulariser and improves the performance of a CNN while reducing the
computation time. Various pooling strategies have been developed in the literature. This study examines the effect of
the pooling strategy on the performance of a deep CNN for classification of hyperspectral remote sensing images. The
authors compare five pooling strategies: max pooling, average pooling, stochastic pooling, rank-based average pooling
and rank-based weighted pooling. The experiments were performed on three well-known hyperspectral remote sensing
datasets: Indian Pines, University of Pavia and Kennedy Space Center. The experimental results show that max pooling
produces the best results on all three datasets.
1 Introduction
A hyperspectral image (HSI) acquired by advanced hyperspectral
sensors has high spectral and spatial resolution, which provides
rich information for earth observation and monitoring. An HSI
consists of several hundred spectral bands of the same scene,
which helps in distinguishing different objects on the surface.
In addition, hyperspectral sensors capture the spatial information
of small structures in the image, which significantly improves the
classification accuracy [1]. HSIs are widely used in agricultural
monitoring [2], environment analysis and prediction [3], climate
monitoring [4], crop analysis [5], mineral detection [6] and so on.
These applications often require the label of every pixel in the
image to be identified. However, manual labelling of pixels is time
consuming and costly. Therefore, a classification model is needed
for automatic labelling of pixels. Because few training samples are
available to train the classification model, its accuracy depends
on effective feature extraction and feature selection algorithms.
Another challenge is the curse of dimensionality [7], also known as
the Hughes phenomenon [8]. To handle this problem, various
dimension reduction algorithms have been proposed in the
literature, which can be divided into feature extraction [9] and
feature selection [10] algorithms.
The main aim of feature extraction is to find a set of
representative vectors of the HSI while reducing its high
dimensionality. Typical feature extraction methods include
principal component analysis (PCA) [11], independent component
analysis [12], Fisher discriminant analysis [13], local linear
embedding [14] and so on. In [15], the authors proposed a
semi-supervised support vector machine to classify HSIs, where
sparse coding and dictionary learning were used to extract
discriminant features. In [16], a hybrid feature extraction method
was proposed for synthetic aperture radar image registration.
On the other hand, the aim of feature selection algorithms is to
retain the most significant bands of the HSI and eliminate the
bands that have no impact on the classification [17].
Several algorithms have been presented in the literature for
hyperspectral remote sensing image classification. Traditional
classification algorithms such as support vector machine [18],
multinomial logistic regression (LR) [19], minimum spanning
forest [20] and so on are used in classification of HSI. Recently,
deep learning (DL) [21] has produced encouraging results on HSI
classification [22–29] because of its ability to automatically learn
representative features from the data. Various DL methods have
been proposed in the literature, such as the stacked autoencoder,
deep belief network and convolutional neural network (CNN) [30].
Among these, the deep CNN has shown great robustness and
effectiveness in spatial feature extraction of HSI because it
supports local connections and a weight-sharing mechanism. Zhao
and Du [31] used a deep CNN for spatial feature extraction and LR
for classification. Xu et al. [32] introduced a band grouping-based
long short-term memory model, followed by a CNN, for spectral and
spatial feature extraction. Cao et al. [28] proposed a CNN
framework for feature extraction from the hyperspectral cube.
IET Image Process., 2020, Vol. 14 Iss. 3, pp. 480-486
© The Institution of Engineering and Technology 2019
In general, a CNN has three main types of layer: convolutional,
pooling and fully connected. In this paper, we mainly focus on the
pooling layer. Pooling collects the discriminative information by
removing irrelevant details and makes the convolution features more
invariant to small translations of the input. The literature has
shown that pooling can greatly improve the performance of image
recognition [33]. Broadly, there are two types of pooling
strategies: value-based and rank-based. Most of the deep CNN models
presented in the literature adopt the max pooling strategy, which
belongs to value-based pooling [34, 35]. On the other hand, Shi et
al. [36] proposed rank-based pooling strategies, which have shown
impressive classification performance on four image datasets:
MNIST, CIFAR-10, CIFAR-100 and NORB. However, to the best of our
knowledge, the effect of various pooling strategies on the
performance of a deep CNN model has not been explored for HSI
datasets. Hence, an investigation is required to analyse the
performance of various pooling strategies for a better
understanding of their behaviour in HSI classification.
Therefore, we present here a comprehensive comparison of five
different pooling strategies: max pooling, average pooling,
stochastic pooling, rank-based average pooling (RAP) and rank-based
weighted pooling (RWP). We have performed the experiments on three
well-known hyperspectral remote sensing datasets: Indian Pines,
University of Pavia and Kennedy Space Center (KSC). The important
contributions of our paper are: (i) development of a deep CNN
architecture with value-based and rank-based pooling for HSI
classification; (ii) investigation of the effect of five pooling
strategies on the performance of the deep CNN for HSI
classification.
The remaining part of this paper is organised as follows.
Section 2 presents the methodology of HSI classification using
deep CNN and discusses the various pooling strategies in detail.
Section 3 reports the experimental results and discussion. Finally,
Section 4 concludes the paper.
2 Methodology
In this paper, a deep CNN model is presented for classification of
HSI, and the effect of different pooling strategies on its
performance is explored. The deep CNN model is constructed with an
input layer, several convolution and pooling layers, a fully
connected layer and an output layer. The details of the CNN are
given below.
2.1 Convolutional neural network
The human visual system can detect and classify objects very
efficiently. Building on this idea, machine learning researchers
have implemented several data processing methods inspired by
biological visual systems. Along this line, the CNN is inspired by
neuroscience. The CNN model has an advantage over other DL models
because of local connections and shared weights, i.e. the same
weights are applied at every location of the input. In other words,
all the pixel positions share the same filter weights, which
reduces the number of parameters. A deep CNN can be built by
stacking convolution layers with non-linear operations and pooling
layers. The layers of the CNN are briefly described below.
2.1.1 Convolution layer: The convolution layer is represented by a
set of filters (or kernels) and biases. These filters have a small
receptive field and are trained to learn specific features of an
image [37]. A convolution layer is formulated as shown in the
following equation [23]:
$$f_j^l = \alpha\left(\sum_{i=1}^{K} f_i^{l-1} * w_{ij}^l + b_i^l\right) \tag{1}$$
where $f_j^l$ represents the jth activation map of the current (l)th
layer, $f_i^{l-1}$ is the ith activation map of the previous (l−1)th
layer and K is the number of input activation maps. $w_{ij}^l$ and
$b_i^l$ are the weight and bias vectors. The * operator denotes the
convolution operation and α denotes the activation function. After
the activation function has been applied to every activation map,
the generated activation maps are sent to the pooling layer.
2.1.2 Pooling layer: The pooling layer offers translation invariance
while reducing the resolution of the activation maps, and hence
reduces the computational complexity [38]. The pooling layer
activations are generated from d × d (e.g. d = 2) windows of the
activation maps of the previous convolution layer. Pooling
strategies can be broadly categorised into two types: value-based
and rank-based. Examples of value-based pooling are max pooling,
average pooling and stochastic pooling. To reduce the scale problem
encountered by value-based pooling methods, Shi et al. [36]
proposed rank-based pooling strategies. Appropriate use of ranks
avoids the scale problem because the rank of an activation does not
change when the activation is much larger than the surrounding
activations. Rank-based pooling is derived from the observation
that the ranking list is invariant under changes of activation
values in a pooling region. Examples of rank-based pooling are RAP
and RWP.
The different pooling strategies are briefly described below.
(a) Value-based pooling: The value-based pooling methods are based
on activation values. We have compared three value-based pooling
strategies: max, average and stochastic. Max pooling outputs the
strongest value from the d × d pooling region of the convolution
activation map. Similarly, average pooling outputs the average
value from the d × d pooling region of the convolution activation
map. Stochastic pooling [39] randomly picks an activation according
to a multinomial distribution at the training phase and uses
probabilistic weighting at the test phase. Because random sampling
at the test phase would add noise to the network's predictions,
probabilistic weighting is used there instead [39]. More
specifically, at the training phase stochastic pooling first
computes the probability (p) of every element inside pooling region
(j) by normalising the activations, as shown in (2) [39]:
$$p_y = \frac{a_y}{\sum_{x \in R_j} a_x} \tag{2}$$
where $a_y$ is the activation at location y of pooling region $R_j$.
Then it selects a location z from the multinomial distribution
within the pooling region based on p, and the pooled activation is
given by (3) [39]:
$$c_j = a_z \quad \text{where } z \sim \mathrm{multinomial}(p_1, p_2, \ldots, p_{|R_j|}) \tag{3}$$
Finally, at the test phase, the activations in every pooling region
are multiplied by the probability $p_y$ (see (2)) and summed, as
shown in (4) [39]:
$$c_j = \sum_{y \in R_j} p_y a_y \tag{4}$$
(b) Rank-based pooling: In [36], a rank-based pooling strategy is
proposed to obtain robust features and to avoid the scaling problem
encountered with value-based pooling strategies. We have compared
two rank-based pooling strategies: RAP and RWP.
The RAP helps to solve the problem of useful information loss
caused by value-based pooling (mainly max pooling and average
pooling). Here, the m largest activations in the pooling region are
assigned a weight of 1/m and the remaining activations a weight of
0. The output of the pooling region is calculated as shown in
(5) [36]:
$$o_u = \frac{1}{m} \sum_{v \in R_u,\; r_v \le m} a_v \tag{5}$$
where m is the rank threshold that selects the elements included in
the averaging. $R_u$ represents the pooling region in the activation
maps and v is the index of each element within it. $r_v$ and $a_v$
are the rank and value of activation v, respectively.
The RWP reduces the scale problem and improves the classification
accuracy by assigning larger weights to higher activations in the
pooling region. The weights are calculated from the following
equation [40]:
$$p_k = \mu(1 - \mu)^{k-1}, \quad k = 1, 2, \ldots, n \tag{6}$$
where μ is a hyper-parameter, k indicates the rank of an
activation and n is the size of the pooling region. The RWP is
defined by [36]
$$o_s = \sum_{k \in R_s} p_k a_k \tag{7}$$
where $p_k$ is calculated from (6) and $a_k$ is the value of
activation k.
2.1.3 Fully connected layer: Activations in the fully connected
layer have connections to all activations of the previous layer.
This combines information from all activation maps of the previous
layer. The output of the fully connected layer is the
classification output.
The deep CNN model has several layers such as convolution,
pooling and fully connected.
2.2 HSI classification using deep CNN
We have applied the deep CNN model to the classification of HSI.
The HSI dataset is first normalised to the range [−0.5, +0.5]
before PCA is applied. Then, the first principal component, which
has the maximum variance, is selected using PCA, producing a 2D
image. Then, a patch of size 27 × 27 is extracted and passed
through the layers of the deep CNN model. The deep CNN model
extracts discriminating spatial features and avoids the use of
handcrafted feature extraction techniques. Our deep CNN model
consists of three convolutional layers, two pooling layers and one
fully connected layer. The function of the convolutional layers is
to extract features with discriminative information while reducing
noise and enhancing contrast. After several convolution layers, the
number of feature maps increases, which leads to high computational
complexity in the subsequent layers. The pooling layers therefore
reduce the dimension of the feature maps while adding translation
invariance. Lastly, the fully connected layer is connected to all
the activations of the previous layer and so exploits information
across all of its activation maps. In addition, we have added a
dropout [41] of 50% at the last layer and L2 regularisation [42] to
reduce overfitting. Finally, the extracted features are fed to the
LR classifier, which uses softmax as its activation function.

Table 1 Architecture details of the CNN
| No. | Convolution | Stride | ReLU | Pooling | Stride | Dropout |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 4 × 4 × 32 | 1 | yes | 2 × 2 | 2 | no |
| 2 | 5 × 5 × 64 | 1 | yes | 2 × 2 | 2 | no |
| 3 | 4 × 4 × 128 | 1 | yes | no | no | 50% |
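The pooling rules of Section 2.1.2, (2) to (7), can be illustrated on a single flattened pooling region. The following is a minimal sketch rather than the authors' MATLAB implementation: the function names, the uniform fallback for an all-zero region, and the example values m = 2 and μ = 0.5 are assumptions made here for illustration (the source does not state the values of m and μ that were used).

```python
import random


def max_pool(region):
    """Max pooling: strongest activation in the region."""
    return max(region)


def average_pool(region):
    """Average pooling: mean activation in the region."""
    return sum(region) / len(region)


def stochastic_probs(region):
    """Eq. (2): p_y = a_y / sum_x a_x (assumes non-negative activations, e.g. after ReLU)."""
    total = sum(region)
    if total == 0:
        # Assumption: fall back to uniform probabilities for an all-zero region.
        return [1.0 / len(region)] * len(region)
    return [a / total for a in region]


def stochastic_pool_train(region, rng=random):
    """Eq. (3): sample one activation a_z with z drawn according to p."""
    probs = stochastic_probs(region)
    return rng.choices(region, weights=probs, k=1)[0]


def stochastic_pool_test(region):
    """Eq. (4): probability-weighted sum of the activations."""
    probs = stochastic_probs(region)
    return sum(p * a for p, a in zip(probs, region))


def rank_average_pool(region, m):
    """Eq. (5): mean of the m largest activations (rank r_v <= m)."""
    top = sorted(region, reverse=True)[:m]
    return sum(top) / m


def rank_weighted_pool(region, mu):
    """Eqs. (6)-(7): weight mu * (1 - mu)**(k - 1) for the activation of rank k."""
    ranked = sorted(region, reverse=True)  # index 0 holds rank k = 1
    return sum(mu * (1 - mu) ** i * a for i, a in enumerate(ranked))


# A 2 x 2 pooling region, flattened (hypothetical activation values).
region = [1.0, 3.0, 0.0, 4.0]
print(max_pool(region))                 # strongest activation
print(average_pool(region))             # mean activation
print(stochastic_pool_test(region))     # test-phase stochastic pooling
print(rank_average_pool(region, m=2))   # RAP with m = 2 (illustrative)
print(rank_weighted_pool(region, mu=0.5))  # RWP with mu = 0.5 (illustrative)
```

Sorting the region once per call keeps the rank-based rules easy to compare with the value-based ones; a practical implementation would apply the same rule to every d × d window of every activation map.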
The structure of the deep CNN model is shown in Fig. 1 and its
details are presented in Table 1. Fig. 1 shows that the first
convolution layer generates 32 activation maps, the second
generates 64 and the third generates 128. The window size of all
pooling layers is set to 2 × 2 with a stride of 2. After these
feature extraction stages, the deep CNN model is trained using the
back-propagation method with a mini-batch size of 100. The learning
rate, dropout ratio and weight decay are set to 0.01, 0.5 and
0.0001, respectively. The number of training epochs was fixed at
200.
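The layer sizes in Table 1 can be checked by tracing a 27 × 27 patch through the network. This is a sketch under one assumption that the source does not state: that no padding is used in the convolution and pooling layers. The helper names `conv_out` and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a convolution or pooling window with no padding."""
    return (size - kernel) // stride + 1


# (layer kind, window size, stride, number of activation maps), following Table 1.
LAYERS = [
    ("conv", 4, 1, 32),
    ("pool", 2, 2, 32),
    ("conv", 5, 1, 64),
    ("pool", 2, 2, 64),
    ("conv", 4, 1, 128),
]


def trace_shapes(input_size=27):
    """Return (kind, height, width, maps) after each layer for a square input patch."""
    size = input_size
    shapes = []
    for kind, kernel, stride, maps in LAYERS:
        size = conv_out(size, kernel, stride)
        shapes.append((kind, size, size, maps))
    return shapes


for kind, height, width, maps in trace_shapes():
    print(f"{kind}: {height} x {width} x {maps}")
```

Under the no-padding assumption the spatial size shrinks as 27, 24, 12, 8, 4, 1, so the third convolution layer would yield a 1 × 1 × 128 output, i.e. a 128-element feature vector for the fully connected layer.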
3 Results and discussion
The performance of the presented deep CNN is evaluated on three
well-known hyperspectral remote sensing datasets: Indian Pines,
University of Pavia and KSC. For training, we randomly selected 10%
of the labelled data from each class, and the remaining data were
used for testing.
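A per-class 10% split of this kind can be sketched as follows. The function name, the fixed seed and the choice to round the per-class training count up are assumptions made for illustration; rounding up is consistent with the per-class counts listed in Tables 2 to 4 (for example, 5 training and 41 test samples for the 46 Alfalfa pixels), but the source does not state the rounding rule.

```python
import random


def split_per_class(labels, percent=10, seed=0):
    """Randomly pick `percent`% of the samples of each class for training.

    Returns two sorted lists of sample indices: (train, test).
    The per-class training count is rounded up (an assumption).
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Integer ceiling of percent% of the class size.
        n_train = (len(indices) * percent + 99) // 100
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Toy label vector with three classes of 46, 1428 and 20 samples.
labels = [0] * 46 + [1] * 1428 + [2] * 20
train_idx, test_idx = split_per_class(labels)
print(len(train_idx), len(test_idx))
```

Repeating the call with different seeds gives the repeated random draws over which results can be averaged.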
3.1 Dataset description
(i) Indian Pines Dataset: This dataset covers the Indian Pines test
site in north-western Indiana. It was acquired by the airborne
visible/infrared imaging spectrometer (AVIRIS) sensor. The dataset
size is 145 × 145 pixels, with a spectral resolution of 10 nm
covering the range 400–2500 nm and a spatial resolution of 20 m.
The scene has 16 classes, mostly associated with land cover, and
220 bands. However, only 200 bands were used for classification;
the other 20 bands were removed because they were affected by
atmospheric absorption. The reference map is shown in Fig. 2 and
the numbers of training and test samples are listed in Table 2.
(ii) University of Pavia Dataset: This dataset covers the
Engineering School at the University of Pavia, northern Italy. It
was acquired by the reflective optics system imaging spectrometer
sensor. The dataset size is 610 × 340 pixels with a spatial
resolution of 1.3 m. The scene consists of 9 classes and 115
spectral bands. However, only 103 bands were used for
classification; the other 12 bands were removed because they
contained noise. The reference map is shown in Fig. 3 and the
numbers of training and test samples are listed in Table 3.
(iii) KSC Dataset: This dataset covers KSC, Florida. It was
acquired by the AVIRIS sensor. The dataset size is 512 × 614 pixels
with a spatial resolution of 18 m. The scene consists of 13 classes
and 224 spectral bands. However, only 176 bands were used for
classification; the remaining bands were removed because they were
water absorption and low signal-to-noise-ratio bands. The reference
map is shown in Fig. 4 and the numbers of training and test samples
are listed in Table 4.
Fig. 1 Architecture of deep CNN model for classification of hyperspectral remote sensing images
Table 2 Number of training and test samples used in the Indian Pines dataset
| Class number | Class name | No. of training samples | No. of testing samples |
| --- | --- | --- | --- |
| 1 | Alfalfa | 5 | 41 |
| 2 | Corn-notill | 143 | 1285 |
| 3 | Corn-mintill | 83 | 747 |
| 4 | Corn | 24 | 213 |
| 5 | Grass-pasture | 49 | 434 |
| 6 | Grass-trees | 73 | 657 |
| 7 | Grass-pasture-mowed | 3 | 25 |
| 8 | Hay-windrowed | 48 | 430 |
| 9 | Oats | 2 | 18 |
| 10 | Soybeans-notill | 98 | 874 |
| 11 | Soybeans-mintill | 246 | 2209 |
| 12 | Soybeans-clean | 60 | 533 |
| 13 | Wheat | 21 | 184 |
| 14 | Woods | 127 | 1138 |
| 15 | Building-grass-trees-drives | 39 | 347 |
| 16 | Stone-steel-towers | 10 | 83 |
| | total | 1031 | 9218 |

Table 3 Number of training and test samples used in the University of Pavia dataset
| Class number | Class name | No. of training samples | No. of testing samples |
| --- | --- | --- | --- |
| 1 | Asphalt | 664 | 5967 |
| 2 | Meadows | 1865 | 16,784 |
| 3 | Gravels | 210 | 1889 |
| 4 | Trees | 307 | 2757 |
| 5 | Metal sheets | 135 | 1210 |
| 6 | Bare soil | 503 | 4526 |
| 7 | Bitumen | 133 | 1197 |
| 8 | Bricks | 369 | 3313 |
| 9 | Shadows | 95 | 852 |
| | total | 4281 | 38,495 |
3.2 Performance evaluation
As mentioned earlier, the objective of this paper is to investigate
the effect of different pooling strategies on the performance of
the deep CNN model for HSI classification. To evaluate this
performance, we have used the following measures:
(a) Overall accuracy (OA): The OA is defined as the number of
correctly classified test samples divided by the total number of
test samples.
(b) Average accuracy (AA): The AA is defined as the mean of the
per-class classification accuracies.
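The two measures can be computed directly from the true and predicted labels of the test samples. In this minimal sketch, the per-class accuracy is taken to be the fraction of a class's test samples that are labelled correctly; that reading, like the function names, is an assumption, since the source does not spell out how the per-class accuracy is computed.

```python
def overall_accuracy(y_true, y_pred):
    """OA: correctly classified test samples divided by all test samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """AA: mean of the per-class accuracies (assumed: correct / class size)."""
    per_class = {}  # label -> (correct count, sample count)
    for t, p in zip(y_true, y_pred):
        hits, total = per_class.get(t, (0, 0))
        per_class[t] = (hits + (t == p), total + 1)
    return sum(hits / total for hits, total in per_class.values()) / len(per_class)


# Toy example with two classes of unequal size (hypothetical labels).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(overall_accuracy(y_true, y_pred))
print(average_accuracy(y_true, y_pred))
```

Because AA weights every class equally while OA weights every sample equally, the two can differ noticeably when class sizes are unbalanced, as they are in all three datasets.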
All the experiments were repeated 20 times with randomly selected
training samples and the average results are reported. The
quantitative comparison of the deep CNN model with different
pooling strategies, in terms of per-class accuracy, AA and OA, is
presented in Tables 5–7.
Fig. 3 University of Pavia dataset
(a) False colour image, (b) Ground-truth map
The classification results obtained by the deep CNN model using
different pooling strategies on the Indian Pines dataset are shown
in Table 5. From this table, we observe that the deep CNN model
with max pooling obtains the best class-specific accuracies on six
classes (namely Alfalfa, Corn-notill, Corn-mintill,
Soybeans-mintill, Soybeans-clean and Stone-Steel-Towers), while the
best class-specific accuracies of the remaining ten classes are
obtained using other pooling strategies. However, for these ten
classes the difference between the class-specific accuracy obtained
using max pooling and that of the other pooling strategies is very
small. Further, we observed that the ‘Soybeans-mintill’ class was
the most difficult one to classify. Nevertheless, our deep CNN
model with max pooling achieves 89.74% accuracy for the
‘Soybeans-mintill’ class.
Fig. 2 Indian Pines dataset
(a) False colour image, (b) Ground-truth map
Table 4 Number of training and test samples used in the KSC dataset
| Class number | Class name | No. of training samples | No. of testing samples |
| --- | --- | --- | --- |
| 1 | Scrub | 77 | 684 |
| 2 | Willow swamp | 25 | 218 |
| 3 | CP hammock | 26 | 230 |
| 4 | CP/Oak | 26 | 226 |
| 5 | Slash pine | 17 | 144 |
| 6 | Oak/Broadleaf | 23 | 206 |
| 7 | Hardwood swamp | 11 | 94 |
| 8 | Graminoid marsh | 44 | 387 |
| 9 | Spartina marsh | 52 | 468 |
| 10 | Cattail marsh | 41 | 363 |
| 11 | Salt marsh | 42 | 377 |
| 12 | Mud flats | 51 | 452 |
| 13 | Water | 93 | 834 |
| | total | 528 | 4683 |
Table 5 Classification results obtained by deep CNN model using different pooling strategies on Indian Pines dataset
| Pooling strategy | Max | Average | Stochastic | RAP | RWP |
| --- | --- | --- | --- | --- | --- |
| OA | **81.86** | 79.97 | 77.28 | 76.32 | 77.58 |
| AA | **97.33** | 97.31 | 96.91 | 96.99 | 97.18 |
| Alfalfa | 99.76 | 99.71 | 99.74 | 99.74 | 99.72 |
| Corn-notill | 92.44 | 92.52 | 89.41 | 91.66 | 92.11 |
| Corn-mintill | 95.80 | 95.80 | 94.79 | 95.00 | 95.40 |
| Corn | 98.91 | 99.00 | 98.95 | 98.92 | 98.87 |
| Grass-pasture | 97.90 | 97.77 | 97.73 | 97.63 | 98.23 |
| Grass-trees | 97.33 | 97.33 | 97.65 | 97.28 | 97.24 |
| Grass-pasture-mowed | 99.91 | 99.92 | 99.90 | 99.93 | 99.95 |
| Hay-windrowed | 99.54 | 99.58 | 99.73 | 99.66 | 99.79 |
| Oats | 99.86 | 99.88 | 99.93 | 99.86 | 99.88 |
| Soybeans-notill | 94.83 | 94.99 | 94.19 | 94.33 | 94.49 |
| Soybeans-mintill | 89.74 | 89.14 | 87.87 | 87.30 | 87.64 |
| Soybeans-clean | 96.98 | 96.83 | 96.69 | 96.56 | 96.91 |
| Wheat | 99.77 | 99.76 | 99.71 | 99.80 | 99.78 |
| Woods | 97.72 | 97.74 | 97.34 | 97.63 | 98.15 |
| Building-grass-trees-drives | 97.85 | 98.06 | 97.89 | 97.55 | 97.66 |
| Stone-steel-towers | 98.99 | 98.94 | 98.91 | 98.89 | 98.99 |

Bold values indicate highest overall accuracy (OA) and average accuracy (AA) among various pooling strategies.
Fig. 4 KSC dataset
(a) False colour image, (b) Ground-truth map
Table 6 Classification results obtained by deep CNN model using different pooling strategies on University of Pavia dataset
| Pooling strategy | Max | Average | Stochastic | RAP | RWP |
| --- | --- | --- | --- | --- | --- |
| OA | **87.52** | 87.23 | 85.97 | 83.69 | 84.88 |
| AA | **97.17** | 96.80 | 96.84 | 96.45 | 96.92 |
| Asphalt | 96.41 | 96.33 | 95.87 | 95.31 | 96.73 |
| Meadows | 94.13 | 92.50 | 93.68 | 91.71 | 92.92 |
| Gravels | 98.09 | 98.27 | 97.85 | 97.53 | 98.42 |
| Trees | 97.25 | 96.98 | 96.94 | 97.10 | 97.12 |
| Metal sheets | 99.74 | 99.71 | 99.60 | 99.71 | 99.76 |
| Bare soil | 94.69 | 93.19 | 93.99 | 92.79 | 93.18 |
| Bitumen | 98.84 | 98.62 | 98.35 | 98.44 | 98.57 |
| Bricks | 97.79 | 98.01 | 97.78 | 97.90 | 97.99 |
| Shadows | 97.53 | 97.92 | 97.81 | 97.67 | 97.76 |

Bold values indicate highest overall accuracy (OA) and average accuracy (AA) among various pooling strategies.

Table 7 Classification results obtained by deep CNN model using different pooling strategies on KSC dataset
| Pooling strategy | Max | Average | Stochastic | RAP | RWP |
| --- | --- | --- | --- | --- | --- |
| OA | **91.13** | 90.50 | 89.59 | 88.28 | 88.46 |
| AA | **98.37** | 98.36 | 98.27 | 97.86 | 97.85 |
| Scrub | 95.35 | 96.12 | 95.17 | 94.41 | 94.10 |
| Willow swamp | 97.96 | 98.65 | 98.14 | 98.11 | 98.00 |
| CP hammock | 98.62 | 98.22 | 98.09 | 97.50 | 97.62 |
| CP/oak | 98.29 | 98.10 | 98.19 | 98.17 | 97.73 |
| Slash pine | 99.47 | 99.44 | 99.35 | 98.99 | 99.07 |
| Oak/broadleaf | 98.22 | 98.53 | 98.25 | 97.67 | 97.51 |
| Hardwood swamp | 99.88 | 99.61 | 99.86 | 99.70 | 99.76 |
| Graminoid marsh | 98.76 | 97.78 | 97.55 | 96.86 | 97.41 |
| Spartina marsh | 98.15 | 97.72 | 98.13 | 96.52 | 97.09 |
| Cattail marsh | 99.13 | 99.24 | 99.15 | 99.10 | 99.06 |
| Salt marsh | 97.50 | 97.52 | 97.11 | 96.84 | 96.38 |
| Mud flats | 98.60 | 98.45 | 98.57 | 98.30 | 98.33 |
| Water | 99.93 | 99.26 | 99.92 | 99.98 | 99.94 |

Bold values indicate highest overall accuracy (OA) and average accuracy (AA) among various pooling strategies.

Lastly, for the Indian Pines dataset, we have observed that the
deep CNN model with max pooling improves OA over average pooling by
1.89%, stochastic pooling by 4.58%, RAP by 4.54% and RWP by 4.28%.
The classification results obtained by the deep CNN model using
different pooling strategies on the University of Pavia dataset are
shown in Table 6. From this table, we observe that our deep CNN
model with max pooling achieved the best class-specific accuracies
on four classes (namely Meadows, Trees, Bare soil and Bitumen), and
its class-specific accuracies on the remaining five classes are not
much lower than those of the other pooling strategies. In this
dataset, we found the ‘Meadows’ class to be the most difficult one
to classify. Max pooling still shows the highest accuracy, 94.13%,
for this class. Lastly, the deep CNN model with max pooling
improves OA over average pooling by 0.29%, stochastic pooling by
1.55%, RAP by 3.83% and RWP by 2.62%.
The classification results obtained by the deep CNN model using
different pooling strategies on the KSC dataset are shown in
Table 7. From this table, we observe that our deep CNN model with
max pooling achieved the best class-specific accuracies on seven
classes (namely CP hammock, CP/Oak, Slash pine, Hardwood swamp,
Graminoid marsh, Spartina marsh and Mud flats), and the best
class-specific accuracies for the remaining six classes were
obtained using the average pooling strategy. However, for these six
classes the difference between the class-specific accuracies
obtained using max pooling and average pooling is <1%. Here, the
class ‘Scrub’ was found to be the most difficult one to classify.
For this class, our model with average pooling achieved an accuracy
of 96.12%, marginally better than max pooling, which achieved
95.35%. Lastly, we have observed that the CNN model with max
pooling improves OA over average pooling by 0.63%, stochastic
pooling by 1.54%, RAP by 2.85% and RWP by 2.67%.
In terms of overall performance, the following observations have
been made: (i) the deep CNN model with the max pooling strategy
achieved better OA and AA than the other pooling strategies on all
three datasets; (ii) our deep CNN model with the max pooling
strategy is able to achieve a very high OA and AA on large test
sets with small training sets. For example, OA = 81.86% and AA =
97.33% were achieved on 9218 test samples with 1031 training
samples for the Indian Pines dataset. Similarly, OA = 87.52% and
AA = 97.17% were obtained on 38,495 test samples with 4281 training
samples for the University of Pavia dataset, and OA = 91.13% and
AA = 98.37% were obtained on 4683 test samples with only 528
training samples for the KSC dataset. This shows that our deep CNN
model is able to achieve very high classification accuracy with a
small training set, which is desirable in HSI classification;
(iii) after max pooling, the average pooling strategy performed
better than the other pooling strategies on all three datasets, and
it is thus the closest competitor of max pooling.
After analysing the performance of the different pooling strategies
on all three datasets, we observe that max pooling retains the
prominent local feature, which contains discriminating information
for image classification. In this way, it preserves the
high-frequency components for the next layers and hence is more
suitable for HSI classification. On the other hand, average
pooling, RAP and RWP produce a blended feature by combining
information from the elements of the considered pooling region.
Therefore, they may perform worse than max pooling on HSI
classification. Lastly, stochastic pooling may or may not select
the salient information, as it selects an activation by sampling
from a multinomial distribution over each pooling region.
All the experiments in this paper were performed on a PC with an
Intel Core i5-6200U CPU @ 2.30 GHz and were implemented in MATLAB.
For the presented deep CNN model, each epoch takes 25 s for the
Indian Pines dataset, 107 s for the University of Pavia dataset and
10.5 s for the KSC dataset.
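As a rough worked estimate, and assuming that the 200 training epochs stated in Section 2.2 apply to every dataset and that each epoch takes the time quoted above, the total training time of a single run would be approximately:

$$
\begin{aligned}
T_{\text{Indian Pines}} &\approx 200 \times 25\ \text{s} = 5000\ \text{s} \approx 83\ \text{min},\\
T_{\text{Pavia}} &\approx 200 \times 107\ \text{s} = 21\,400\ \text{s} \approx 5.9\ \text{h},\\
T_{\text{KSC}} &\approx 200 \times 10.5\ \text{s} = 2100\ \text{s} = 35\ \text{min}.
\end{aligned}
$$

These totals are derived here for illustration only; the source reports per-epoch times, not total training times.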
4 Conclusion
In this paper, we have presented a hyperspectral remote sensing
image classification approach using a deep CNN model. The
experiments were performed on three hyperspectral datasets: Indian
Pines, University of Pavia and KSC. The paper shows that features
extracted using the deep CNN model are useful for classification of
HSI and achieve high classification accuracy with a small training
set. Further, this paper examines the performance of five pooling
strategies (max, average, stochastic, RAP and RWP) on the deep CNN
model for classification of HSIs. The experimental results show
that the deep CNN model with max pooling obtains better
classification accuracy than the other pooling strategies on all
three datasets. We have compared the performance of the pooling
strategies on a 2D CNN model, which extracts only spatial features.
Future work could compare the pooling strategies on a 3D CNN model,
which extracts spectral as well as spatial features.
5 References
[1] Kong, Y., Wang, X., Cheng, Y.: ‘Spectral–spatial feature extraction for HSI classification based on supervised hypergraph and sample expanded CNN’, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2018, 11, (11), pp. 4128–4140
[2] Luo, B., Yang, C., Chanussot, J., et al.: ‘Crop yield estimation based on unsupervised linear unmixing of multidate hyperspectral imagery’, IEEE Trans. Geosci. Remote Sens., 2012, 51, (1), pp. 162–173
[3] Yang, X., Yu, Y.: ‘Estimating soil salinity under various moisture conditions: an experimental study’, IEEE Trans. Geosci. Remote Sens., 2017, 55, (5), pp. 2525–2533
[4] Islam, T., Hulley, G.C., Malakar, N.K., et al.: ‘A physics-based algorithm for the simultaneous retrieval of land surface temperature and emissivity from VIIRS thermal infrared data’, IEEE Trans. Geosci. Remote Sens., 2016, 55, (1), pp. 563–576
[5] Huang, J., Wang, H., Dai, Q., et al.: ‘Analysis of NDVI data for crop identification and yield estimation’, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2014, 7, (11), pp. 4374–4384
[6] Sharma, A.: ‘Radioactive mineral identification based on FFT radix-2 algorithm’, Electron. Lett., 2004, 40, (9), pp. 536–537
[7] Donoho, D.L.: ‘High-dimensional data analysis: the curses and blessings of dimensionality’, AMS Math Chall. Lect., 2000, 1, (2000), p. 32
[8] Hughes, G.: ‘On the mean accuracy of statistical pattern recognizers’, IEEE Trans. Inf. Theory, 1968, 14, (1), pp. 55–63
[9] Zhao, W., Guo, Z., Yue, J., et al.: ‘On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery’, Int. J. Remote Sens., 2015, 36, (13), pp. 3368–3379
[10] Chang, C.I., Wang, S.: ‘Constrained band selection for hyperspectral imagery’, IEEE Trans. Geosci. Remote Sens., 2006, 44, (6), pp. 1575–1585
[11] Rodarmel, C., Shan, J.: ‘Principal component analysis for hyperspectral image classification’, Surv. Land Inf. Sci., 2002, 62, (2), pp. 115–122
[12] Villa, A., Benediktsson, J.A., Chanussot, J., et al.: ‘Hyperspectral image classification with independent component discriminant analysis’, IEEE Trans. Geosci. Remote Sens., 2011, 49, (12), pp. 4865–4876
[13] Pradhan, M.K., Minz, S., Shrivastava, V.K.: ‘Fisher discriminant ratio based multiview active learning for the classification of remote sensing images’. 2018 4th Int. Conf. on Recent Advances in Information Technology (RAIT), Dhanbad, India, 2018, pp. 1–6
[14] Roweis, S.T., Saul, L.K.: ‘Nonlinear dimensionality reduction by locally linear embedding’, Science, 2000, 290, (5500), pp. 2323–2326
[15] Andekah, Z.A., Naderan, M., Akbarizadeh, G.: ‘Semi-supervised hyperspectral image classification using spatial-spectral features and superpixel-based sparse codes’. 2017 Iranian Conf. on Electrical Engineering (ICEE), Tehran, Iran, 2017, pp. 2229–2234
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
Norouzi, M., Akbarizadeh, G., Eftekhar, F.: ‘A hybrid feature extraction
method for sar image registration’, Signal. Image. Video. Process., 2018, 12,
(8), pp. 1559–1566
Hinton, G.E., Salakhutdinov, R.R.: ‘Reducing the dimensionality of data with
neural networks’, Science, 2006, 313, (5786), pp. 504–507
Melgani, F., Bruzzone, L.: ‘Classification of hyperspectral remote sensing
images with support vector machines’, IEEE Trans. Geosci. Remote Sens.,
2004, 42, (8), pp. 1778–1790
Li, J., Bioucas-Dias, J.M., Plaza, A.: ‘Semisupervised hyperspectral image
segmentation using multinomial logistic regression with active learning’,
IEEE Trans. Geosci. Remote Sens., 2010, 48, (11), pp. 4085–4098
Bernard, K., Tarabalka, Y., Angulo, J., et al.: ‘Spectral–spatial classification
of hyperspectral data based on a stochastic minimum spanning forest
approach’, IEEE Trans. Image Process., 2011, 21, (4), pp. 2008–2021
Bengio, Y., Courville, A., Vincent, P.: ‘Representation learning: a review and
new perspectives’, IEEE Trans. Pattern Anal. Mach. Intell., 2013, 35, (8), pp.
1798–1828
Deng, C., Xue, Y., Liu, X., et al.: ‘Active transfer learning network: a unified
deep joint spectral–spatial feature learning model for hyperspectral image
classification’, IEEE Trans. Geosci. Remote Sens., 2018, 57, (3), pp. 1741–
1754
Chen, Y., Jiang, H., Li, C., et al.: ‘Deep feature extraction and classification of
hyperspectral images based on convolutional neural networks’, IEEE Trans.
Geosci. Remote Sens., 2016, 54, (10), pp. 6232–6251
Sharifzadeh, F., Akbarizadeh, G., Kavian, Y.S.: ‘Ship classification in sar
images using a new hybrid cnn–mlp classifier’, J. Ind. Soc. Remote Sens.,
2019, 47, (4), pp. 551–562
Qing, C., Ruan, J., Xu, X., et al.: ‘Spatial-spectral classification of
hyperspectral images: a deep learning framework with markov random fields
based modelling’, IET Image Process., 2018, 13, (2), pp. 235–245
Zhu, J., Fang, L., Ghamisi, P.: ‘Deformable convolutional neural networks for
hyperspectral image classification’, IEEE Geosci. Remote Sens. Lett., 2018,
15, (8), pp. 1254–1258
Li, W., Chen, C., Zhang, M., et al.: ‘Data augmentation for hyperspectral
image classification with deep cnn’, IEEE Geosci. Remote Sens. Lett., 2018,
16, (4), pp. 593–597
Cao, X., Zhou, F., Xu, L., et al.: ‘Hyperspectral image classification with
markov random fields and a convolutional neural network’, IEEE Trans.
Image Process., 2018, 27, (5), pp. 2354–2367
Feng, J., Chen, J., Liu, L., et al.: ‘Cnn-based multilayer spatial–spectral
feature fusion and sample augmentation with local and nonlocal constraints
for hyperspectral image classification’, IEEE J. Sel. Top. Appl. Earth Obs.
Remote Sens., 2019, 12, (4), pp. 1299–1313
Zhang, L., Zhang, L., Du, B.: ‘Deep learning for remote sensing data: a
technical tutorial on the state of the art’, IEEE Geosci. Remote Sens. Mag.,
2016, 4, (2), pp. 22–40
Zhao, W., Du, S.: ‘Spectral–spatial feature extraction for hyperspectral image
classification: a dimension reduction and deep learning approach’, IEEE
Trans. Geosci. Remote Sens., 2016, 54, (8), pp. 4544–4554
Xu, Y., Zhang, L., Du, B., et al.: ‘Spectral–spatial unified networks for
hyperspectral image classification’, IEEE Trans. Geosci. Remote Sens., 2018,
56, (10), pp. 5893–5909
Boureau, Y.L., Ponce, J., LeCun, Y.: ‘A theoretical analysis of feature pooling
in visual recognition’. Proc. of the 27th int. Conf. on Machine Learning
(ICML-10), Haifa, Israel, 2010, pp. 111–118
Xu, X., Li, W., Ran, Q., et al.: ‘Multisource remote sensing data classification
based on convolutional neural network’, IEEE Trans. Geosci. Remote Sens.,
2017, 56, (2), pp. 937–949
Paoletti, M., Haut, J., Plaza, J., et al.: ‘A new deep convolutional neural
network for fast hyperspectral image classification’, ISPRS J. Photogramm.
Remote Sens., 2018, 145, pp. 120–147
Shi, Z., Ye, Y., Wu, Y.: ‘Rank-based pooling for deep convolutional neural
networks’, Neural Netw., 2016, 83, pp. 21–31
Yang, X., Ye, Y., Li, X., et al.: ‘Hyperspectral image classification with deep
learning models’, IEEE Trans. Geosci. Remote Sens., 2018, 56, (9), pp. 5408–
5423
Li, Y., Xie, W., Li, H.: ‘Hyperspectral image reconstruction by deep
convolutional neural network for classification’, Pattern Recognit., 2017, 63,
pp. 371–383
Zeiler, M.D., Fergus, R.: ‘Stochastic pooling for regularization of deep
convolutional neural networks’, arXiv preprint arXiv:13013557, 2013
Michalewicz, Z.: ‘Genetic algorithms + data structures = evolution programs’
(Springer Science & Business Media, Berlin, 2013)
Srivastava, N., Hinton, G., Krizhevsky, A., et al.: ‘Dropout: a simple way to
prevent neural networks from overfitting’, J. Mach. Learn. Res., 2014, 15, (1),
pp. 1929–1958
Karpathy, A.: ‘Stanford university cs231n: convolutional neural networks for
visual recognition’, URL: Available at http://cs231n stanfordedu/syllabus
html, 2018
IET Image Process., 2020, Vol. 14 Iss. 3, pp. 480-486
© The Institution of Engineering and Technology 2019