Effect of pooling strategy on convolutional neural network for classification of hyperspectral remote sensing images

IET Image Processing Research Article. ISSN 1751-9659. Received on 21st May 2019; Revised 30th September 2019; Accepted on 17th October 2019; E-First on 20th January 2020. doi: 10.1049/iet-ipr.2019.0561. www.ietdl.org

Somenath Bera (1), Vimal K. Shrivastava (2)
(1) School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, India
(2) School of Electronics Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, India
E-mail: vimal.shrivastavafet@kiit.ac.in

Abstract: The deep convolutional neural network (CNN) has recently attracted researchers working on the classification of hyperspectral remote sensing images. A CNN mainly consists of convolution layers, pooling layers and fully connected layers. Pooling is a regularisation technique that improves the performance of a CNN while reducing the computation time. Various pooling strategies have been developed in the literature. This study shows the effect of the pooling strategy on the performance of a deep CNN for classification of hyperspectral remote sensing images. The authors have compared the performance of various pooling strategies, namely max pooling, average pooling, stochastic pooling, rank-based average pooling and rank-based weighted pooling. The experiments were performed on three well-known hyperspectral remote sensing datasets: Indian Pines, University of Pavia and Kennedy Space Center. The experimental results show that max pooling produced better results for all three considered datasets.

1 Introduction

Hyperspectral images (HSIs) acquired by advanced hyperspectral sensors have high spectral and spatial resolution, which provides prolific information for the study and monitoring of the earth. An HSI is arranged as several hundred spectral bands of the identical scene, which helps in classifying different objects on the surface.
On the other hand, hyperspectral sensors also provide spatial information of small spatial structures in images, which significantly improves the classification accuracy [1]. HSIs are widely used in agricultural monitoring [2], environment analysis and prediction [3], climate monitoring [4], crop analysis [5], mineral detection [6] and so on. These applications often require the identification of the label of every pixel in the image. However, manual labelling of pixels in the image is time consuming and costly. Therefore, there is a need for a classification model for automatic labelling of pixels in the image. Since only a limited number of training samples is available to train the classification model, the accuracy of the model depends on effective feature extraction and feature selection algorithms. Another challenge is the curse of dimensionality [7], also called the Hughes phenomenon [8]. To handle this problem, various dimension reduction algorithms have been proposed in the literature, which can be divided into feature extraction [9] and feature selection [10] algorithms. The main aim of feature extraction is to find a set of representative vectors of the HSI while eliminating the high dimensionality. Typical feature extraction methods include principal component analysis (PCA) [11], independent component analysis [12], Fisher discriminant analysis [13], local linear embedding [14] and so on. In [15], the authors proposed a semisupervised support vector machine to classify the HSI, where sparse coding and dictionary learning were used to extract discriminant features. In [16], a hybrid feature extraction method has been proposed for synthetic aperture radar image registration. On the other hand, the aim of feature selection algorithms is to retain the most significant bands of the HSI and eliminate the bands that have no impact on the classification [17]. Several algorithms have been presented in the literature for hyperspectral remote sensing image classification.
Traditional classification algorithms such as the support vector machine [18], multinomial logistic regression (LR) [19], minimum spanning forest [20] and so on have been used in the classification of HSI.

IET Image Process., 2020, Vol. 14 Iss. 3, pp. 480-486 © The Institution of Engineering and Technology 2019

Recently, deep learning (DL) [21] has produced encouraging results on HSI classification [22–29] because of its ability to automatically learn representative features from the data. Various DL methods have been proposed in the literature, such as the stacked autoencoder, deep belief network and convolutional neural network (CNN) [30]. Among the various DL methods, the deep CNN has shown great robustness and effectiveness in spatial feature extraction of HSI because it supports local connections and a weight-sharing mechanism. Zhao and Du [31] used a deep CNN for spatial feature extraction and LR for classification. Xu et al. [32] introduced a band grouping-based long short-term memory model in their work, followed by a CNN for spectral and spatial feature extraction. Cao et al. [28] proposed a CNN framework for feature extraction from the hyperspectral cube. In general, a CNN has three main layers: the convolutional layer, pooling layer and fully connected layer. In this paper, we mainly focus on the pooling layer. The pooling strategy collects the discriminative information by removing irrelevant details and makes the convolution features more invariant towards small translations of the input. The literature has shown that pooling can greatly improve the performance of image recognition [33]. Broadly, there are two types of pooling strategies: value-based and rank-based. Most of the deep CNN models presented in the literature adopt the max pooling strategy, which belongs to value-based pooling [34, 35]. On the other hand, Shi et al. [36] proposed a rank-based pooling strategy, which has shown impressive classification performance on the following four image datasets: MNIST, CIFAR-10, CIFAR-100 and NORB.
However, to the best of our knowledge, the effect of various pooling strategies on the performance of a deep CNN model has not been explored for HSI datasets. Hence, an investigation is required to analyse the performance of various pooling strategies for a better understanding of their behaviour in HSI classification. Therefore, we present here a comprehensive comparison of five different pooling strategies: max pooling, average pooling, stochastic pooling, rank-based average pooling (RAP) and rank-based weighted pooling (RWP). We have performed the experiments on three well-known hyperspectral remote sensing datasets: Indian Pines, University of Pavia and Kennedy Space Center (KSC). The important contributions of our paper are: (i) development of a deep CNN architecture with value-based and rank-based pooling for HSI classification; (ii) investigation of the effect of five pooling strategies on the performance of a deep CNN for HSI classification.

The remaining part of this paper is organised as follows. Section 2 presents the methodology of HSI classification using a deep CNN and discusses the various pooling strategies in detail. Section 3 reports the experimental results and discussion. Finally, we conclude the paper in Section 4.

2 Methodology

In this paper, a deep CNN model has been presented for classification of HSI, and the effect of different pooling strategies on the performance of the deep CNN has been explored. A deep CNN model is constructed with an input layer, several convolution and pooling layers, a fully connected layer and an output layer. The details of the CNN are given below.

2.1 Convolutional neural network

The human visual system has the power to detect and classify objects very efficiently. Using this concept, machine learning researchers have implemented several data processing methods that are inspired by biological visual systems. Along this line, the CNN is inspired by neuroscience. The CNN model has an advantage over other DL models because of local connections and shared weights, i.e. the same weights are applied at every location of the input. In other words, all pixel positions share the same filter weights. Therefore, the number of computational parameters can be reduced by using this concept. A deep CNN can be established by arranging convolution layers with non-linear operations and pooling layers. The layers of a CNN are described below in brief.

2.1.1 Convolution layer: The convolution layer is represented by a set of filters (or kernels) and biases. These filters have a small receptive field and are trained to learn specific features of an image [37]. A convolution layer is formulated as shown in the following equation [23]:

f_j^l = α(∑_{i=1}^{K} f_i^{l−1} * w_{i,j}^l + b_j^l)    (1)

where f_j^l represents the jth activation map of the current (l)th layer, f_i^{l−1} is the ith activation map of the previous (l−1)th layer and K is the number of input activation maps. w_{i,j}^l and b_j^l are the weight and bias terms. The * operator is used for the convolution operation and α denotes the activation function. After applying the activation function to every activation map, the generated activation maps are then sent to the pooling layer.

2.1.2 Pooling layer: The pooling layer offers translation invariance while diminishing the resolution of the activation maps and hence reduces the computational complexity [38]. The pooling layer activations are generated from a d × d (e.g. d = 2) window of the activation maps of the previous convolution layer. The pooling strategy can be broadly categorised into two types: value-based and rank-based. Examples of value-based pooling are max pooling, average pooling and stochastic pooling. To reduce the scale problem encountered by value-based pooling methods, Shi et al. [36] proposed rank-based pooling strategies. The appropriate usage of rank can avoid the scale problem, as the rank of an activation does not change when the activation is much larger than the surrounding activations. Rank-based pooling is derived from the observation that the ranking list is invariant under changes of activation values in a pooling region. Examples of rank-based pooling are RAP and RWP. The different pooling strategies are described below in brief.

(a) Value-based pooling: The value-based pooling methods are based on activation values. We have compared three value-based pooling strategies: max, average and stochastic. A max pooling provides the strongest value from the d × d pooling region of a convolution activation map. Similarly, an average pooling provides the average value from the d × d pooling region of a convolution activation map. There is another pooling strategy called stochastic pooling [39], which randomly picks an activation according to a multinomial distribution at the training phase and involves probabilistic weighting at the test phase. Since sampling noise at the test phase may affect the network's predictions, probabilistic weighting is used instead [39]. More specifically, at the training phase stochastic pooling first generates the probability p_y for every element y inside the pooling region R_j by normalising the activations, as shown in (2) [39]:

p_y = a_y / ∑_{x ∈ R_j} a_x    (2)

Then it selects a location z from a multinomial distribution within the pooling region based on p, and the pooled activation is shown in (3) [39]:

c_j = a_z, where z ∼ multinomial(p_1, p_2, …, p_|R_j|)    (3)

Finally, at the test phase, the activations in every pooling region are multiplied by the probability p_y (see (2)) and added, as shown in (4) [39]:

c_j = ∑_{y ∈ R_j} p_y a_y    (4)

(b) Rank-based pooling: In [36], a rank-based pooling strategy is proposed to achieve robust features and to avoid the scaling problem encountered in value-based pooling strategies. We have compared two rank-based pooling strategies: RAP and RWP. The RAP helps to solve the problem of useful information loss caused by value-based pooling (mainly max pooling and average pooling). Here, the weights of the largest m elements in the pooling region are set to 1/m and those of the remaining elements are set to 0. The output of the pooling region is calculated as shown in (5) [36]:

o_u = (1/m) ∑_{v ∈ R_u, r_v ≤ m} a_v    (5)

where m indicates the rank threshold that selects the elements included in the averaging, R_u represents the pooling region in the activation maps and v represents the index of every element within it. r_v and a_v are the rank and value of activation v, respectively.

The RWP reduces the scale problem and improves the classification accuracy by assigning larger weights to higher activations in the pooling region. The weights are calculated from the following equation [40]:

p_k = μ(1 − μ)^{k−1}, k = 1, 2, …, n    (6)

where μ is a hyper-parameter, k indicates the rank of an activation and n is the size of the pooling region. The RWP is defined by [36]:

o_s = ∑_{k ∈ R_s} p_k a_k    (7)

where p_k is calculated from (6) and a_k is the value of activation k.

2.1.3 Fully connected layer: Activations in the fully connected layer have connections to all activations of the previous layer. This accommodates knowledge from all activation maps of the previous layer. The output of the fully connected layer is the classification output.

2.2 HSI classification using deep CNN

We have applied the deep CNN model to the classification of HSI. The HSI dataset is first normalised in the range of [−0.5, +0.5] before applying PCA. Then, the first principal component having maximum variance has been selected using PCA, which produces a 2D image. Next, a patch of size 27 × 27 has been extracted and passed through the several layers of the deep CNN model. The deep CNN model extracts discriminating spatial features and avoids the use of handcrafted feature extraction techniques. Our deep CNN model consists of three convolutional layers, two pooling layers and one fully connected layer. The function of the convolutional layers is to extract features with discriminative information, noise reduction and contrast enhancement. After passing through several convolution layers, the number of feature maps increases, which leads to high computational complexity for further layers. Therefore, the pooling layer helps to reduce the dimension of the feature maps while adding the translation invariance property. Lastly, the fully connected layer has full connections to all the activations of the previous layer, which exploits information across all the activation maps of the previous layer. In addition, we have added a dropout [41] of 50% at the last layer and L2 regularisation [42] to reduce overfitting. Finally, the extracted features were fed to the LR classifier, which utilises softmax as its activation function. The structure of the deep CNN model is shown in Fig. 1 and the details of this structure are presented in Table 1. Fig. 1 shows that the first convolution layer generates 32 activation maps, the second convolution layer generates 64 activation maps and the third convolution layer generates 128 activation maps. The window size of all pooling layers is set to 2 × 2 with a stride of 2.

Table 1 Architecture details of the CNN
No. | Convolution | Stride | ReLU | Pooling | Stride | Dropout
1 | 4 × 4 × 32 | 1 | yes | 2 × 2 | 2 | no
2 | 5 × 5 × 64 | 1 | yes | 2 × 2 | 2 | no
3 | 4 × 4 × 128 | 1 | yes | no | — | 50%

After several feature extraction stages, the deep CNN model is trained using the back-propagation method with a mini-batch size of 100.
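As an illustration, the five pooling strategies of (2)–(7) can be sketched for a single pooling region with NumPy. This is a minimal sketch, not the authors' implementation: the function names are ours, and stochastic pooling is assumed to operate on non-negative (e.g. post-ReLU) activations so that the normalisation in (2) yields a valid probability distribution.

```python
import numpy as np

def max_pool(region):
    # Max pooling: strongest activation in the d x d region.
    return region.max()

def average_pool(region):
    # Average pooling: mean of all activations in the region.
    return region.mean()

def stochastic_pool(region, rng, train=True):
    # Stochastic pooling, eqs. (2)-(4): probabilities from normalised
    # activations; sample at training time, probability-weight at test time.
    # Assumes non-negative activations (e.g. after ReLU).
    a = region.ravel()
    p = a / a.sum()                  # eq. (2)
    if train:
        z = rng.choice(a.size, p=p)  # eq. (3): draw an index from the multinomial
        return a[z]
    return np.sum(p * a)             # eq. (4): probabilistic weighting at test phase

def rank_average_pool(region, m=2):
    # RAP, eq. (5): average of the m highest-ranked activations
    # (weight 1/m for ranks <= m, weight 0 for the rest).
    a = np.sort(region.ravel())[::-1]
    return a[:m].mean()

def rank_weighted_pool(region, mu=0.5):
    # RWP, eqs. (6)-(7): geometric weights p_k = mu * (1 - mu)^(k - 1)
    # assigned by rank, largest activation first.
    a = np.sort(region.ravel())[::-1]
    k = np.arange(1, a.size + 1)
    p = mu * (1 - mu) ** (k - 1)     # eq. (6)
    return np.sum(p * a)             # eq. (7)
```

For example, on the region [[1, 3], [2, 2]], max pooling returns 3, average pooling returns 2 and RAP with m = 2 returns 2.5, which illustrates how RAP blends the behaviour of max and average pooling.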
Apart from that, the other parameters such as the learning rate, dropout ratio and weight decay are set to 0.01, 0.5 and 0.0001, respectively. The number of training epochs was fixed to 200.

3 Results and discussion

The performance evaluation of the presented deep CNN is done on three well-known hyperspectral remote sensing datasets: Indian Pines, University of Pavia and KSC. For training, we have randomly selected 10% of the labelled data from each class, and the remaining data has been used for testing.

3.1 Dataset description

(i) Indian Pines Dataset: This dataset covers the Indian Pines test site in North-western Indiana. It was acquired by the airborne visible/infrared imaging spectrometer (AVIRIS) sensor. The dataset size is 145 × 145 pixels with a spectral resolution of 10 nm over the range of 400–2500 nm and a spatial resolution of 20 m. This scene has 16 classes, mostly associated with land covers, and 220 bands. However, only 200 bands were used for classification; the other 20 bands have been removed as they were affected by atmospheric absorption. The reference map has been depicted in Fig. 2 and the number of training and test samples is listed in Table 2.
(ii) University of Pavia Dataset: This dataset covers the Engineering School at the University of Pavia, northern Italy. It was acquired by a reflective optics system imaging spectrometer sensor. The dataset size is 610 × 340 pixels with a spatial resolution of 1.3 m. This scene consists of 9 classes and 115 spectral bands. However, only 103 bands were investigated for classification; the other 12 bands have been removed as they contained noise. The reference map has been depicted in Fig. 3 and the number of training and test samples is listed in Table 3.
(iii) KSC Dataset: This dataset covers the KSC, Florida. It was acquired by the AVIRIS sensor. The dataset size is 512 × 614 pixels with a spatial resolution of 18 m. This scene consists of 13 classes and 224 spectral bands.
However, only 176 bands were investigated for classification; the remaining bands have been removed as they were water absorption and low-signal-to-noise-ratio bands. The reference map has been depicted in Fig. 4 and the number of training and test samples is listed in Table 4.

Fig. 1 Architecture of deep CNN model for classification of hyperspectral remote sensing images

Table 2 Number of training and test samples used in the Indian Pines dataset
Class number | Class name | No. of training samples | No. of testing samples
1 | Alfalfa | 5 | 41
2 | Corn-notill | 143 | 1285
3 | Corn-mintill | 83 | 747
4 | Corn | 24 | 213
5 | Grass-pasture | 49 | 434
6 | Grass-trees | 73 | 657
7 | Grass-pasture-mowed | 3 | 25
8 | Hay-windrowed | 48 | 430
9 | Oats | 2 | 18
10 | Soybeans-notill | 98 | 874
11 | Soybeans-mintill | 246 | 2209
12 | Soybeans-clean | 60 | 533
13 | Wheat | 21 | 184
14 | Woods | 127 | 1138
15 | Building-grass-trees-drives | 39 | 347
16 | Stone-steel-towers | 10 | 83
— | total | 1031 | 9218

Table 3 Number of training and test samples used in the University of Pavia dataset
Class number | Class name | No. of training samples | No. of testing samples
1 | Asphalt | 664 | 5967
2 | Meadows | 1865 | 16,784
3 | Gravels | 210 | 1889
4 | Trees | 307 | 2757
5 | Metal sheets | 135 | 1210
6 | Bare soil | 503 | 4526
7 | Bitumen | 133 | 1197
8 | Bricks | 369 | 3313
9 | Shadows | 95 | 852
— | total | 4281 | 38,495

3.2 Performance evaluation

As mentioned earlier, the objective of this paper is to investigate the effect of different pooling strategies on the performance of the deep CNN model for HSI classification.
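The 10% per-class random sampling described above, with the remainder held out for testing, can be sketched as follows. This is a minimal sketch under our own assumptions: `per_class_split` is a hypothetical helper name, `labels` is the flattened ground-truth map, and label 0 is assumed to mark unlabelled background pixels.

```python
import numpy as np

def per_class_split(labels, train_frac=0.10, seed=0):
    """Randomly pick train_frac of the labelled pixels of every class for
    training and keep the rest for testing (label 0 = unlabelled background)."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        if c == 0:                     # skip unlabelled pixels
            continue
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)               # random order within the class
        n_train = max(1, int(round(train_frac * idx.size)))
        train_idx.append(idx[:n_train])
        test_idx.append(idx[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)
```

Sampling per class rather than over the whole label map keeps the 10% training fraction stratified, so even the smallest classes (e.g. Oats, with only 20 labelled pixels in Indian Pines) contribute at least one training sample.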
To evaluate the performance of the deep CNN model for HSI classification with different pooling strategies, we have used the following metrics: (a) Overall accuracy (OA): the OA is defined as the number of test samples classified correctly divided by the total number of test samples. (b) Average accuracy (AA): the AA is defined as the average of the classification accuracies of the individual classes. All the experiments were repeated 20 times with randomly selected training samples, and the average results are reported. The quantitative comparison of the deep CNN model with different pooling strategies in terms of accuracy per class, AA and OA has been presented in Tables 5–7.

Fig. 3 University of Pavia dataset (a) False colour image, (b) Ground-truth map

The classification results obtained by the deep CNN model using different pooling strategies on the Indian Pines dataset have been shown in Table 5. From this table, we observed that the deep CNN model with max pooling obtains the best class-specific accuracies on six classes (namely Alfalfa, Corn-notill, Corn-mintill, Soybeans-mintill, Soybeans-clean and Stone-steel-towers), while the best class-specific accuracies of the remaining ten classes are obtained using other pooling strategies. However, the differences between the class-specific accuracies of these ten classes obtained using max pooling and the other pooling strategies are very small. Further, we have observed that the 'Soybeans-mintill' class was the most difficult one to classify. Nevertheless, our deep CNN model with max pooling is able to achieve 89.74% accuracy for the 'Soybeans-mintill' class.
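The two metrics defined above can be computed from a confusion matrix as shown below. This is a minimal NumPy sketch (the function name is ours); it assumes integer class labels 0..n_classes−1 and that every class appears at least once in the test labels, so no row of the confusion matrix is empty.

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred, n_classes):
    # Confusion matrix: rows = true class, columns = predicted class.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered accumulation of (true, pred) pairs
    # OA: correctly classified test samples / all test samples.
    oa = cm.trace() / cm.sum()
    # AA: mean of the per-class accuracies (diagonal / row totals).
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))
    return oa, aa
```

For instance, with true labels [0, 0, 0, 0, 1, 1] and predictions [0, 0, 0, 1, 1, 1], OA is 5/6 ≈ 0.833 while AA is (3/4 + 2/2)/2 = 0.875, which shows how AA weights a small class as heavily as a large one.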
Fig. 2 Indian Pines dataset (a) False colour image, (b) Ground-truth map

Table 4 Number of training and test samples used in the KSC dataset
Class number | Class name | No. of training samples | No. of testing samples
1 | Scrub | 77 | 684
2 | Willow swamp | 25 | 218
3 | CP hammock | 26 | 230
4 | CP/Oak | 26 | 226
5 | Slash pine | 17 | 144
6 | Oak/Broadleaf | 23 | 206
7 | Hardwood swamp | 11 | 94
8 | Graminoid marsh | 44 | 387
9 | Spartina marsh | 52 | 468
10 | Cattail marsh | 41 | 363
11 | Salt marsh | 42 | 377
12 | Mud flats | 51 | 452
13 | Water | 93 | 834
— | total | 528 | 4683

Table 5 Classification results obtained by deep CNN model using different pooling strategies on Indian Pines dataset
Pooling strategy | Max | Average | Stochastic | RAP | RWP
OA | 81.86 | 79.97 | 77.28 | 76.32 | 77.58
AA | 97.33 | 97.31 | 96.91 | 96.99 | 97.18
Alfalfa | 99.76 | 99.71 | 99.74 | 99.74 | 99.72
Corn-notill | 92.44 | 92.52 | 89.41 | 91.66 | 92.11
Corn-mintill | 95.80 | 95.80 | 94.79 | 95.00 | 95.40
Corn | 98.91 | 99.00 | 98.95 | 98.92 | 98.87
Grass-pasture | 97.90 | 97.77 | 97.73 | 97.63 | 98.23
Grass-trees | 97.33 | 97.33 | 97.65 | 97.28 | 97.24
Grass-pasture-mowed | 99.91 | 99.92 | 99.90 | 99.93 | 99.95
Hay-windrowed | 99.54 | 99.58 | 99.73 | 99.66 | 99.79
Oats | 99.86 | 99.88 | 99.93 | 99.86 | 99.88
Soybeans-notill | 94.83 | 94.99 | 94.19 | 94.33 | 94.49
Soybeans-mintill | 89.74 | 89.14 | 87.87 | 87.30 | 87.64
Soybeans-clean | 96.98 | 96.83 | 96.69 | 96.56 | 96.91
Wheat | 99.77 | 99.76 | 99.71 | 99.80 | 99.78
Woods | 97.72 | 97.74 | 97.34 | 97.63 | 98.15
Building-grass-trees-drives | 97.85 | 98.06 | 97.89 | 97.55 | 97.66
Stone-steel-towers | 98.99 | 98.94 | 98.91 | 98.89 | 98.99
Bold values indicate highest overall accuracy (OA) and average accuracy (AA) among various pooling strategies.
Fig. 4 KSC dataset (a) False colour image, (b) Ground-truth map

Table 6 Classification results obtained by deep CNN model using different pooling strategies on University of Pavia dataset
Pooling strategy | Max | Average | Stochastic | RAP | RWP
OA | 87.52 | 87.23 | 85.97 | 83.69 | 84.88
AA | 97.17 | 96.80 | 96.84 | 96.45 | 96.92
Asphalt | 96.41 | 96.33 | 95.87 | 95.31 | 96.73
Meadows | 94.13 | 92.50 | 93.68 | 91.71 | 92.92
Gravels | 98.09 | 98.27 | 97.85 | 97.53 | 98.42
Trees | 97.25 | 96.98 | 96.94 | 97.10 | 97.12
Metal sheets | 99.74 | 99.71 | 99.60 | 99.71 | 99.76
Bare soil | 94.69 | 93.19 | 93.99 | 92.79 | 93.18
Bitumen | 98.84 | 98.62 | 98.35 | 98.44 | 98.57
Bricks | 97.79 | 98.01 | 97.78 | 97.90 | 97.99
Shadows | 97.53 | 97.92 | 97.81 | 97.67 | 97.76
Bold values indicate highest overall accuracy (OA) and average accuracy (AA) among various pooling strategies.

Table 7 Classification results obtained by deep CNN model using different pooling strategies on KSC dataset
Pooling strategy | Max | Average | Stochastic | RAP | RWP
OA | 91.13 | 90.50 | 89.59 | 88.28 | 88.46
AA | 98.37 | 98.36 | 98.27 | 97.86 | 97.85
Scrub | 95.35 | 96.12 | 95.17 | 94.41 | 94.10
Willow swamp | 97.96 | 98.65 | 98.14 | 98.11 | 98.00
CP hammock | 98.62 | 98.22 | 98.09 | 97.50 | 97.62
CP/Oak | 98.29 | 98.10 | 98.19 | 98.17 | 97.73
Slash pine | 99.47 | 99.44 | 99.35 | 98.99 | 99.07
Oak/Broadleaf | 98.22 | 98.53 | 98.25 | 97.67 | 97.51
Hardwood swamp | 99.88 | 99.61 | 99.86 | 99.70 | 99.76
Graminoid marsh | 98.76 | 97.78 | 97.55 | 96.86 | 97.41
Spartina marsh | 98.15 | 97.72 | 98.13 | 96.52 | 97.09
Cattail marsh | 99.13 | 99.24 | 99.15 | 99.10 | 99.06
Salt marsh | 97.50 | 97.52 | 97.11 | 96.84 | 96.38
Mud flats | 98.60 | 98.45 | 98.57 | 98.30 | 98.33
Water | 99.93 | 99.26 | 99.92 | 99.98 | 99.94
Bold values indicate highest overall accuracy (OA) and average accuracy (AA) among various pooling strategies.

Lastly, we have observed that the deep CNN model with max pooling shows an improvement on OA, as compared to average pooling of 1.89%, stochastic pooling of 4.58%, RAP of 4.54% and RWP of 4.28%. The classification results obtained by the deep CNN model using different pooling strategies on the University of Pavia dataset have been shown in Table 6.
From this table, we observed that our deep CNN model with max pooling achieved the best class-specific accuracies on four classes (namely Meadows, Trees, Bare soil and Bitumen), and the class-specific accuracies of the remaining five classes are not much lower than those of the other pooling strategies. In this dataset, we found that the 'Meadows' class was the most difficult one to classify. Max pooling still shows the highest accuracy of 94.13% for this class. Lastly, the deep CNN model with max pooling shows an improvement on OA with respect to average pooling of 0.29%, stochastic pooling of 1.55%, RAP of 3.83% and RWP of 2.62%.

The classification results obtained by the deep CNN model using different pooling strategies on the KSC dataset have been shown in Table 7. From this table, we observed that our deep CNN model with max pooling achieved the best class-specific accuracies on seven classes (namely CP hammock, CP/Oak, Slash pine, Hardwood swamp, Graminoid marsh, Spartina marsh and Mud flats), and the best class-specific accuracies for the remaining six classes have been obtained using the average pooling strategy. However, the difference between the class-specific accuracies for these six classes obtained using max pooling and average pooling is <1%. Here, the class 'Scrub' was found to be the most difficult one to classify. Still, our model with average pooling has achieved an accuracy of 96.12%, marginally better than max pooling, which achieved an accuracy of 95.35%. Lastly, we have observed that the CNN model with max pooling shows an improvement on OA with respect to average pooling of 0.63%, stochastic pooling of 1.54%, RAP of 2.85% and RWP of 2.67%.
In terms of overall performance, the following observations have been made: (i) the deep CNN model with the max pooling strategy has achieved better OA and AA than the other pooling strategies for all three datasets; (ii) our deep CNN model with the max pooling strategy is able to achieve a very high OA and AA on large test sets with small training sets. For example, OA = 81.86% and AA = 97.33% have been achieved on 9218 test samples with 1031 training samples for the Indian Pines dataset. Similarly, OA = 87.52% and AA = 97.17% have been obtained on 38,495 test samples with 4281 training samples for the University of Pavia dataset, and OA = 91.13% and AA = 98.37% have been obtained on 4683 test samples with only 528 training samples for the KSC dataset. This shows that our deep CNN model is able to achieve very high classification accuracy with a small training set, which is a desirable property in HSI classification; (iii) after max pooling, the average pooling strategy has performed better than the other pooling strategies for all three datasets, and thus it is the closest competitor of max pooling.

After analysing the performance of the different pooling strategies on all three considered datasets, we have observed that max pooling carries the prominent local features which contain discriminating information for image classification. In this way, it retains the high-frequency components for the next layers and hence is more suitable for HSI classification. On the other hand, average pooling, RAP and RWP carry diverse features by combining information from all the elements of the considered pooling region. Therefore, their operation may perform worse than max pooling on HSI classification. Lastly, stochastic pooling may or may not select the salient information, as it selects an activation by sampling from a multinomial distribution over each pooling region.
In this paper, all the experiments were performed using a PC with an Intel® Core(TM) i5-6200U CPU@2.30 GHz. All the experiments were implemented using MATLAB software. For the presented deep CNN model, each epoch costs 25 s for the Indian Pines dataset, 107 s for the University of Pavia dataset and 10.5 s for the KSC dataset.

4 Conclusion

In this paper, we have presented a hyperspectral remote sensing image classification approach using a deep CNN model. The experiments have been performed on three hyperspectral datasets: Indian Pines, University of Pavia and KSC. The paper shows that the features extracted using the deep CNN model are useful for classification of HSI and are able to achieve high classification accuracy with a small training set. Further, this paper examines the performance of five pooling strategies, namely max, average, stochastic, RAP and RWP, on the deep CNN model for classification of HSIs. The experimental results show that the deep CNN model with max pooling has obtained better classification accuracy as compared with the other pooling strategies for all three datasets. We have compared the performance of the different pooling strategies on a 2D CNN model, which extracts only spatial features. The future scope of this work is a comparison of the performance of different pooling strategies on a 3D CNN model, which extracts spatial as well as spectral features.
5 References

[1] Kong, Y., Wang, X., Cheng, Y.: 'Spectral–spatial feature extraction for HSI classification based on supervised hypergraph and sample expanded CNN', IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2018, 11, (11), pp. 4128–4140
[2] Luo, B., Yang, C., Chanussot, J., et al.: 'Crop yield estimation based on unsupervised linear unmixing of multidate hyperspectral imagery', IEEE Trans. Geosci. Remote Sens., 2012, 51, (1), pp. 162–173
[3] Yang, X., Yu, Y.: 'Estimating soil salinity under various moisture conditions: an experimental study', IEEE Trans. Geosci. Remote Sens., 2017, 55, (5), pp. 2525–2533
[4] Islam, T., Hulley, G.C., Malakar, N.K., et al.: 'A physics-based algorithm for the simultaneous retrieval of land surface temperature and emissivity from VIIRS thermal infrared data', IEEE Trans. Geosci. Remote Sens., 2016, 55, (1), pp. 563–576
[5] Huang, J., Wang, H., Dai, Q., et al.: 'Analysis of NDVI data for crop identification and yield estimation', IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2014, 7, (11), pp. 4374–4384
[6] Sharma, A.: 'Radioactive mineral identification based on FFT radix-2 algorithm', Electron. Lett., 2004, 40, (9), pp. 536–537
[7] Donoho, D.L.: 'High-dimensional data analysis: the curses and blessings of dimensionality', AMS Math Chall. Lect., 2000, 1, (2000), p. 32
[8] Hughes, G.: 'On the mean accuracy of statistical pattern recognizers', IEEE Trans. Inf. Theory, 1968, 14, (1), pp. 55–63
[9] Zhao, W., Guo, Z., Yue, J., et al.: 'On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery', Int. J. Remote Sens., 2015, 36, (13), pp. 3368–3379
[10] Chang, C.I., Wang, S.: 'Constrained band selection for hyperspectral imagery', IEEE Trans. Geosci. Remote Sens., 2006, 44, (6), pp. 1575–1585
[11] Rodarmel, C., Shan, J.: 'Principal component analysis for hyperspectral image classification', Surv. Land Inf. Sci., 2002, 62, (2), pp. 115–122
[12] Villa, A., Benediktsson, J.A., Chanussot, J., et al.: 'Hyperspectral image classification with independent component discriminant analysis', IEEE Trans. Geosci. Remote Sens., 2011, 49, (12), pp. 4865–4876
[13] Pradhan, M.K., Minz, S., Shrivastava, V.K.: 'Fisher discriminant ratio based multiview active learning for the classification of remote sensing images'. 2018 4th Int. Conf. on Recent Advances in Information Technology (RAIT), Dhanbad, India, 2018, pp. 1–6
[14] Roweis, S.T., Saul, L.K.: 'Nonlinear dimensionality reduction by locally linear embedding', Science, 2000, 290, (5500), pp. 2323–2326
[15] Andekah, Z.A., Naderan, M., Akbarizadeh, G.: 'Semi-supervised hyperspectral image classification using spatial-spectral features and superpixel-based sparse codes'. 2017 Iranian Conf. on Electrical Engineering (ICEE), Tehran, Iran, 2017, pp. 2229–2234
[16] Norouzi, M., Akbarizadeh, G., Eftekhar, F.: 'A hybrid feature extraction method for SAR image registration', Signal. Image. Video. Process., 2018, 12, (8), pp. 1559–1566
[17] Hinton, G.E., Salakhutdinov, R.R.: 'Reducing the dimensionality of data with neural networks', Science, 2006, 313, (5786), pp. 504–507
[18] Melgani, F., Bruzzone, L.: 'Classification of hyperspectral remote sensing images with support vector machines', IEEE Trans. Geosci. Remote Sens., 2004, 42, (8), pp. 1778–1790
[19] Li, J., Bioucas-Dias, J.M., Plaza, A.: 'Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning', IEEE Trans. Geosci. Remote Sens., 2010, 48, (11), pp. 4085–4098
[20] Bernard, K., Tarabalka, Y., Angulo, J., et al.: 'Spectral–spatial classification of hyperspectral data based on a stochastic minimum spanning forest approach', IEEE Trans. Image Process., 2011, 21, (4), pp. 2008–2021
[21] Bengio, Y., Courville, A., Vincent, P.: 'Representation learning: a review and new perspectives', IEEE Trans. Pattern Anal. Mach. Intell., 2013, 35, (8), pp. 1798–1828
[22] Deng, C., Xue, Y., Liu, X., et al.: 'Active transfer learning network: a unified deep joint spectral–spatial feature learning model for hyperspectral image classification', IEEE Trans. Geosci. Remote Sens., 2018, 57, (3), pp. 1741–1754
[23] Chen, Y., Jiang, H., Li, C., et al.: 'Deep feature extraction and classification of hyperspectral images based on convolutional neural networks', IEEE Trans. Geosci. Remote Sens., 2016, 54, (10), pp. 6232–6251
[24] Sharifzadeh, F., Akbarizadeh, G., Kavian, Y.S.: 'Ship classification in SAR images using a new hybrid CNN–MLP classifier', J. Ind. Soc. Remote Sens., 2019, 47, (4), pp. 551–562
[25] Qing, C., Ruan, J., Xu, X., et al.: 'Spatial-spectral classification of hyperspectral images: a deep learning framework with Markov random fields based modelling', IET Image Process., 2018, 13, (2), pp. 235–245
[26] Zhu, J., Fang, L., Ghamisi, P.: 'Deformable convolutional neural networks for hyperspectral image classification', IEEE Geosci. Remote Sens. Lett., 2018, 15, (8), pp. 1254–1258
[27] Li, W., Chen, C., Zhang, M., et al.: 'Data augmentation for hyperspectral image classification with deep CNN', IEEE Geosci. Remote Sens. Lett., 2018, 16, (4), pp. 593–597
[28] Cao, X., Zhou, F., Xu, L., et al.: 'Hyperspectral image classification with Markov random fields and a convolutional neural network', IEEE Trans. Image Process., 2018, 27, (5), pp. 2354–2367
[29] Feng, J., Chen, J., Liu, L., et al.: 'CNN-based multilayer spatial–spectral feature fusion and sample augmentation with local and nonlocal constraints for hyperspectral image classification', IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2019, 12, (4), pp. 1299–1313
[30] Zhang, L., Zhang, L., Du, B.: 'Deep learning for remote sensing data: a technical tutorial on the state of the art', IEEE Geosci. Remote Sens. Mag., 2016, 4, (2), pp.
22–40 Zhao, W., Du, S.: ‘Spectral–spatial feature extraction for hyperspectral image classification: a dimension reduction and deep learning approach’, IEEE Trans. Geosci. Remote Sens., 2016, 54, (8), pp. 4544–4554 Xu, Y., Zhang, L., Du, B., et al.: ‘Spectral–spatial unified networks for hyperspectral image classification’, IEEE Trans. Geosci. Remote Sens., 2018, 56, (10), pp. 5893–5909 Boureau, Y.L., Ponce, J., LeCun, Y.: ‘A theoretical analysis of feature pooling in visual recognition’. Proc. of the 27th int. Conf. on Machine Learning (ICML-10), Haifa, Israel, 2010, pp. 111–118 Xu, X., Li, W., Ran, Q., et al.: ‘Multisource remote sensing data classification based on convolutional neural network’, IEEE Trans. Geosci. Remote Sens., 2017, 56, (2), pp. 937–949 Paoletti, M., Haut, J., Plaza, J., et al.: ‘A new deep convolutional neural network for fast hyperspectral image classification’, ISPRS J. Photogramm. Remote Sens., 2018, 145, pp. 120–147 Shi, Z., Ye, Y., Wu, Y.: ‘Rank-based pooling for deep convolutional neural networks’, Neural Netw., 2016, 83, pp. 21–31 Yang, X., Ye, Y., Li, X., et al.: ‘Hyperspectral image classification with deep learning models’, IEEE Trans. Geosci. Remote Sens., 2018, 56, (9), pp. 5408– 5423 Li, Y., Xie, W., Li, H.: ‘Hyperspectral image reconstruction by deep convolutional neural network for classification’, Pattern Recognit., 2017, 63, pp. 371–383 Zeiler, M.D., Fergus, R.: ‘Stochastic pooling for regularization of deep convolutional neural networks’, arXiv preprint arXiv:13013557, 2013 Michalewicz, Z.: ‘Genetic algorithms + data structures = evolution programs’ (Springer Science & Business Media, Berlin, 2013) Srivastava, N., Hinton, G., Krizhevsky, A., et al.: ‘Dropout: a simple way to prevent neural networks from overfitting’, J. Mach. Learn. Res., 2014, 15, (1), pp. 
1929–1958 Karpathy, A.: ‘Stanford university cs231n: convolutional neural networks for visual recognition’, URL: Available at http://cs231n stanfordedu/syllabus html, 2018 IET Image Process., 2020, Vol. 14 Iss. 3, pp. 480-486 © The Institution of Engineering and Technology 2019 17519667, 2020, 3, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/iet-ipr.2019.0561 by CAPES, Wiley Online Library on [01/07/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License 3.3 Computational time