EXTENDED MULTI-STRUCTURE LOCAL BINARY PATTERN FOR HIGH-RESOLUTION IMAGE SCENE CLASSIFICATION

Xiaoyong Bian^{1,2}, Chen Chen^{3}, Qian Du^{4}, Yuxia Sheng^{5}

1 School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, China
2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System
3 Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080 USA
4 Department of Electrical and Computer Engineering, Mississippi State University, Mississippi State, MS 39762 USA
5 School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, 430081 China

ABSTRACT

This paper presents a novel extended multi-structure local binary pattern (EMSLBP) approach for high-resolution image classification, generalizing the well-known local binary pattern (LBP) approach. In the proposed EMSLBP approach, three-coupled descriptors with multi-structure sampling are proposed to extract complementary features (pixel value and radial difference) from local image patches. The anisotropic features derived from elliptical sampling are made rotation invariant by averaging the histograms over rotational angles, and are then combined with the isotropic features extracted from circular sampling. Experimental results show that the proposed method can effectively capture local spatial patterns and local contrast, consistently outperforming several state-of-the-art classification algorithms.

Index Terms: Spatial classification, High-resolution image, Random texture, Feature extraction, Rotation invariance

1. INTRODUCTION

High-resolution image classification is challenging due to severe within-class variation. In recent years, the exploitation of spatial information to enhance classification performance has become an important part of high-resolution remote sensing research [1].
There have been a variety of studies that utilize spatial information for high-resolution image classification or attempt to extract informative and spatially invariant features [2]. The early representative spatial methods for high-resolution image classification include the knowledge transfer framework [3] and the Markov random field (MRF) model [4], which yield better accuracies than conventional spectral classification algorithms. Later, a geostatistical analysis of high-resolution data across space was studied in [5], where better classification results were also provided. A probabilistic spatial classification method based on extended random walkers (ERW) was proposed in [6], and better performance was reported therein. In addition, the local binary pattern (LBP) operator, a simple yet efficient operator for describing local image patterns, has been widely used in texture classification [7]. Recently, some impressive LBP-based classification results on hyperspectral images have also been reported [8].

This work was supported in part by the National Natural Science Foundation of China under Grant 61501337, in part by the Natural Science Fund of Hubei Province under Grants 2015CFC839 and Q20151101.

Although different from a texture image, a high-resolution satellite image is still full of diversified textures belonging to land use and land cover (LULC) classes, which is especially true for multi-cluster classes in a large scene. This fact motivates us to develop new LBP descriptors for high-resolution image classification. However, several open questions remain about how to design LBP effectively for the classification of high-resolution images. Inspired by the work of Liu et al. [9] and Guo et al. [10], in this paper, an extended multi-structure LBP (EMSLBP) based high-resolution image classification paradigm is proposed. First, high-resolution image datasets can be converted into the YCbCr color space for feature extraction.
Then, the EMSLBP algorithm is adopted to extract spatial feature histograms from the scene image, and the feature histograms are combined. Finally, the support vector machine (SVM) classifier is used to obtain classification maps.

This paper is an extension of our previous work [11]. Here, the previously proposed framework is enhanced by considering the pixel values and the differences between central and neighboring pixels in a local patch as new LBP descriptors on the basis of extended multi-structure sampling. These enhancements lead to a substantial improvement in performance, as evidenced by extensive experimental tests on a wide range of high-resolution image datasets. The pixel values of a central pixel and its neighboring pixels are both considered, while for pixel differences, for simplicity, only the radial difference is studied. In our proposed approach, the feature code maps share the same format as conventional LBP and are readily combined to form the final feature histogram, and the implementation is simple as well.

2. EXTENDED MULTI-STRUCTURE LBP

The original LBP methods compute patterns on small local patches and only consider symmetric microstructures. Their performance may be limited, because they depend on a single microstructure and oversimplify local structure. Moreover, they are sensitive to noise and image rotation. An improved method suggested by Ojala [7] is to consider only the rotation invariant "uniform" descriptor (called $\text{LBP}^{riu2}_{p,r}$), defined as

\[
\text{LBP}^{riu2}_{p,r} =
\begin{cases}
\sum_{n=0}^{p-1} s(x_{r,n} - x_{0,0}), & \text{if } U(\text{LBP}_{p,r}) \le 2 \\
p + 1, & \text{otherwise}
\end{cases}
\tag{1}
\]

\[
U(\text{LBP}_{p,r}) = \sum_{n=0}^{p-1} \left| s(x_{r,n} - x_{0,0}) - s(x_{r,\mathrm{mod}(n+1,\,p)} - x_{0,0}) \right|,
\qquad
s(x) =
\begin{cases}
1, & x \ge 0 \\
0, & x < 0
\end{cases}
\tag{2}
\]

Thus, the LBP methods encode the local image information by circularly symmetric sampling of gray values at a central pixel $x_{0,0}$ and $p$ points $\{x_{r,n}\}_{n=0}^{p-1}$. Suppose the coordinates of the central pixel are $(0, 0)$, and let $a = 2\pi n / p$. For the circular neighborhood, the coordinates of $x_{r,n}$ are $[r\sin(a),\ r\cos(a)]$. For the elliptical neighborhood, let the length of the minor axis equal the radius $r$ of the circular neighborhood and let $m$ be the ratio of the major to the minor axis of the ellipse; with $X_0 = m r \cos(a)$ and $Y_0 = r \sin(a)$, the x-coordinate of $x_{r,n}$ is formed from $X_0 \sin(\theta)$ and $Y_0 \cos(\theta)$, while its y-coordinate is formed from $X_0 \cos(\theta)$ and $Y_0 \sin(\theta)$, where $\theta$ takes four different rotational angles $\{0^\circ, 45^\circ, 90^\circ, 135^\circ\}$ in each ellipse. Those locations not falling exactly on a pixel are estimated by interpolation.

As can be seen, $\text{LBP}^{riu2}_{p,r}$ has $p + 2$ distinct output values, leading to a local image representation of low dimensionality. Note that the function $s(\cdot)$ used in the following shares the same definition as in Equation (2). However, the $\text{LBP}^{riu2}_{p,r}$ descriptor loses local texture information and may fail to classify the LULC classes in a high-resolution image, since, as mentioned above, only the sign of the difference is utilized. On the other hand, the LULC classes are often randomly distributed, and anisotropic microstructures are also observed in them, as stated in [11]. To avoid such problems, an extended multi-structure LBP sampling is preferred, i.e., we extend the neighbor distribution in an elliptical manner to capture the anisotropic properties of LULC classes. Specifically, four local pixel value-based descriptors, $\text{CI-LBP}^{riu2}_{c,p,r}$ (abbreviated as $\text{CI-LBP}_{c,p,r}$), $\text{NI-LBP}^{riu2}_{c,p,r}$ (abbreviated as $\text{NI-LBP}_{c,p,r}$), $\text{CI-LBP}^{riu2}_{e,p,r}$ (abbreviated as $\text{CI-LBP}_{e,p,r}$) and $\text{NI-LBP}^{riu2}_{e,p,r}$ (abbreviated as $\text{NI-LBP}_{e,p,r}$), and two local difference-based descriptors, $\text{RD-LBP}^{riu2}_{c,p,r}$ (abbreviated as $\text{RD-LBP}_{c,p,r}$) and $\text{RD-LBP}^{riu2}_{e,p,r}$ (abbreviated as $\text{RD-LBP}_{e,p,r}$), are introduced. Here the subscripts $c$ and $e$ denote circularly symmetric sampling (isotropic properties) and elliptically asymmetric sampling (anisotropic properties), respectively.

Similar to [9], we define the $\text{CI-LBP}_{c,p,r}$, $\text{NI-LBP}_{c,p,r}$, and $\text{RD-LBP}_{c,p,r}$ descriptors as follows:

\[
\text{CI-LBP}_{c,p,r} = s(x_{0,0} - \mu)
\tag{3}
\]

where $\mu$ is the mean of the image, and

\[
\text{NI-LBP}_{c,p,r} =
\begin{cases}
\sum_{n=0}^{p-1} s(x_{r,n} - \mu_r), & \text{if } U(\text{LBP}_{p,r}) \le 2 \\
p + 1, & \text{otherwise}
\end{cases}
\tag{4}
\]

where $\mu_r = \frac{1}{p} \sum_{n=0}^{p-1} x_{r,n}$. Obviously, $\text{NI-LBP}_{c,p,r}$ and $\text{LBP}^{riu2}_{p,r}$ differ in the selection of the thresholding value. The $\text{NI-LBP}_{c,p,r}$ descriptor tends to be more robust to noise. In addition,

\[
\text{RD-LBP}_{c,p,r} =
\begin{cases}
\sum_{n=0}^{p-1} s(x_{r,n} - x_{r-1,n}), & \text{if } U(\text{LBP}_{p,r}) \le 2 \\
p + 1, & \text{otherwise}
\end{cases}
\tag{5}
\]

with the objective that $\text{RD-LBP}_{c,p,r}$ captures local radial difference patterns computed from the pixel values of pairs of neighboring pixels along the same radial direction. Note that $\text{CI-LBP}_{e,p,r}$ and $\text{CI-LBP}_{c,p,r}$ share the same definition format, except that an elliptical neighborhood structure is employed for the former. $\text{NI-LBP}_{e,p,r}$ and $\text{RD-LBP}_{e,p,r}$ are defined in the same way as their respective circular counterparts.

Let $\text{EMSLBP}_{x,p,r}$ be any of the three local feature descriptors mentioned above, and let $\text{EMSLBP}_{x,p,r}(i, j)$ be the extracted LBP pattern of each pixel $(i, j)$. Then the feature extractor $h_x$ of length $K$ is computed as

\[
h_x(k) = \sum_{i=1}^{N} \sum_{j=1}^{M} \delta\big(\text{EMSLBP}_{x,p,r}(i, j) - k\big)
\tag{6}
\]

where $0 \le k \le K - 1$, $K = 2^p$ is the number of LBP codes, the subscript $x$ represents 'c' or 'e', and $\delta(\cdot)$ is the Dirac delta function. $M$ and $N$ give the size of the image. Furthermore, the proposed three-coupled $\text{CI-LBP}_{x,p,r}$, $\text{NI-LBP}_{x,p,r}$ and $\text{RD-LBP}_{x,p,r}$ histogram features can be readily fused, and can also be utilized to extract macrostructure information by applying them to multiscale down-sampled pyramid images.

3. SPATIAL CLASSIFICATION WITH EXTENDED MULTI-STRUCTURE LBP

A schematic illustration of the proposed EMSLBP based high-resolution image classification method is shown in Fig. 1.
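As a rough companion to this pipeline, the sampling, coding, histogram, and averaging steps can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it covers only the NI-LBP descriptor, assumes a standard 2-D rotation of the ellipse by the angle theta, handles image borders by clamping coordinates, uses bilinear interpolation, and keeps p + 2 histogram bins for the riu2 codes. The function names (`offsets`, `bilinear`, `riu2`, `ni_lbp_hist`, `emslbp_ni_feature`) and the default axis ratio m = 2 are hypothetical choices made for illustration.

```python
import math

# Four rotational angles of the ellipse: 0, 45, 90 and 135 degrees.
ANGLES = (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)

def offsets(p, r, m=1.0, theta=0.0):
    """Offsets (dx, dy) of p neighbours on an ellipse with minor axis r and
    major axis m*r, rotated by theta; m=1 gives the circular case.
    The rotation convention is an assumption of this sketch."""
    out = []
    for n in range(p):
        a = 2.0 * math.pi * n / p
        x0, y0 = m * r * math.cos(a), r * math.sin(a)
        out.append((x0 * math.cos(theta) - y0 * math.sin(theta),
                    x0 * math.sin(theta) + y0 * math.cos(theta)))
    return out

def bilinear(img, y, x):
    """Bilinearly interpolated value at (y, x); coordinates outside the
    image are clamped to the border (an assumption of this sketch)."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def riu2(bits):
    """Rotation invariant uniform code: the number of ones if the circular
    bit string has at most two 0/1 transitions, otherwise p + 1."""
    p = len(bits)
    u = sum(abs(bits[n] - bits[(n + 1) % p]) for n in range(p))
    return sum(bits) if u <= 2 else p + 1

def ni_lbp_hist(img, p, r, m=1.0, theta=0.0):
    """Normalised histogram of NI-LBP codes: each neighbour is thresholded
    at the mean of the p sampled neighbours."""
    hist = [0.0] * (p + 2)
    offs = offsets(p, r, m, theta)
    h, w = len(img), len(img[0])
    for i in range(h):
        for j in range(w):
            vals = [bilinear(img, i + dy, j + dx) for dx, dy in offs]
            mu = sum(vals) / p
            hist[riu2([1 if v >= mu else 0 for v in vals])] += 1.0
    return [c / (h * w) for c in hist]

def emslbp_ni_feature(img, p=8, r=1, m=2.0):
    """Circular (isotropic) histogram stacked with the elliptical
    (anisotropic) histogram averaged over the four angles."""
    iso = ni_lbp_hist(img, p, r)
    aniso = [ni_lbp_hist(img, p, r, m, t) for t in ANGLES]
    avg = [sum(col) / len(ANGLES) for col in zip(*aniso)]
    return iso + avg

# Usage on a small synthetic image (a list of rows of gray values).
img = [[(3 * i + 5 * j) % 7 for j in range(12)] for i in range(12)]
feat = emslbp_ni_feature(img)
print(len(feat))
```

With p = 8 the stacked vector has 2 × (p + 2) = 20 entries. In a fuller version, the CI and RD descriptors and several radii would be added in the same way, and the resulting vectors would be passed to an SVM.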
First, the extended multi-structure LBP sampling is adopted to obtain the initial feature histograms, which describe the distribution of each feature descriptor and thus its contribution to the discriminative power for the sample to be predicted. Second, the anisotropic features are averaged and combined with the isotropic features, and the result is stacked as a final feature vector. Finally, the class of each test sample is determined by performing SVM classification.

Fig. 1. A schematic illustration of the proposed extended multi-structure LBP based spatial classification method.

3.1. EMSLBP Rotation Invariant Feature Extraction

The proposed approach introduces three-coupled local texture feature descriptors to extract spatial features from the local image, as mentioned above. The isotropic part of the proposed method is rotation invariant, whereas the anisotropic features extracted by elliptical sampling are not. Fortunately, there are some successful techniques that can contribute to rotation invariance. For instance, it could be achieved by globally searching for the corresponding angle (or minimal distance) among the extracted anisotropic histograms of all candidate samples, but that would be computationally expensive. We propose to derive rotation invariance from the anisotropic histograms by averaging the histograms over the different rotational angles. The reason is that an average anisotropic histogram is insensitive to local image fluctuations such as rotation, and its use as a statistical feature of each image is globally invariant to these changes.

3.2. Feature Descriptors Combination

After the anisotropic part of the proposed method has been made rotation invariant, all the extracted feature histograms are either directly stacked as a final composite feature vector or combined jointly, in which case a 2-D or 3-D joint histogram of them is built first and then converted to a 1-D histogram before concatenation. The resulting representation can be compared using standard distance metrics, allowing robust classification methods to be employed. Class label assignment on the test set is conducted using LIBSVM [12].

4. EXPERIMENTAL RESULTS

4.1. Image Data and Experimental Setup

The first dataset used in our experiments is the 19-class satellite scene dataset [13]. It consists of 19 classes of high-resolution satellite scenes, with 50 images of 600 × 600 pixels for each class. The second dataset is the 21-class land-use dataset with ground truth labeling [14]. It consists of images of 21 land-use classes, and each class contains 100 images of 256 × 256 pixels. This is a challenging dataset due to the variety of spatial patterns in those 21 classes.

For the 19-class and 21-class datasets, we randomly select 60% and 80% of the samples from each class as training samples, respectively, and use the others for testing, with five-fold cross-validation for free parameter selection. The random partitioning is repeated ten times, and the mean accuracy over the ten splits is used to evaluate the algorithms. For simplicity, we fix the number of sampling points p and vary the radius r to achieve the optimal implementation, where p is set to 16 and three radii are chosen (i.e., r = [1, 2, 3]). The proposed EMSLBP method is compared with several classification methods, including the NI/RD/CI-LBP in [9], CLBP_CSM [10], LBPV [15], MS-CLBP [8] and $\text{LBP}^{riu2}_{p,r}$ (the same as $\text{CLBP\_S}^{riu2}_{p,r}$, abbreviated as $\text{LBP}^{riu2}$), using overall accuracy (OA). To make the comparison as fair as possible, we use the same experimental settings as in [8, 9, 10], [15]. For the other compared methods, the multiresolution features are simply stacked.

4.2. Classification Results

Fig. 2(a) compares the best scores achieved by our proposed method and those of state-of-the-art methods.
It can be seen that the best scores of our approach are 5% to 9% higher than those of LBPV and $\text{LBP}^{riu2}$ on the same test sets, whereas $\text{LBP}^{riu2}$ always produces inferior performance in both cases, most likely due in part to the limited discriminative power of a single microstructure of the local image. Three different training rates are further investigated, as shown in Fig. 2(b)-(c), from which we see that our method outperforms the other methods at almost all training rates. Thus, it can be seen that our method is able to provide complementary texture features of image patches at multiple resolutions and with multiple structures without a significant increase in computational complexity.

Table 1 gives the average confusion matrices calculated from the 10 runs of random partitions of the training and testing sets using the proposed method. As seen in Table 1, most of the LULC classes can be correctly classified, some even achieving very high classification accuracies, e.g., commercial, desert, and railway station in the 19-class dataset and chaparral, harbor, and forest in the 21-class dataset. However, there are still a few difficult classes, e.g., buildings, dense residential, storage tanks, and tennis court. This is partly due to the high similarity of these scenes.

Fig. 2. Comparison of our approach, NI/RD/CI-LBP, CLBP_CSM, LBPV, MS-CLBP and $\text{LBP}^{riu2}$. (a) Results with the best classification scores for the 19-class and 21-class datasets. (b) Results with different training rates by SVM for the 19-class dataset. (c) Results with different training rates by SVM for the 21-class dataset.

Table 1. Average confusion matrices for the proposed EMSLBP method on two datasets: (a) 19-class; (b) 21-class.

5. CONCLUSIONS

This paper introduces a novel extended multi-structure LBP based (EMSLBP) spatial method for high-resolution image scene classification. By combining three-coupled complementary descriptors with multi-structure sampling in the classification framework, the classification accuracy of SVM can be consistently improved. The experimental results show that the proposed approach can achieve consistent improvements compared to the conventional LBP method and its variants.

6. REFERENCES

[1] G. Camps-Valls, D. Tuia, L. Bruzzone, and J. A. Benediktsson, “Advances in hyperspectral image classification: Earth monitoring with statistical learning methods,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 45–54, Jan. 2014.
[2] L. Bruzzone and C. Persello, “A novel approach to the selection of spatially invariant features for the classification of hyperspectral images with improved generalization capability,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 9, pp. 3180–3191, Sep. 2009.
[3] S. Rajan, J. Ghosh, and M. Crawford, “Exploiting class hierarchies for knowledge transfer in hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3408–3417, Nov. 2006.
[4] Y. Tarabalka, J. Benediktsson, M. Fauvel, and J. Chanussot, “SVM- and MRF-based method for accurate classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 736–740, Oct. 2010.
[5] G. Jun and J. Ghosh, “Spatially adaptive classification of land cover with remote sensing data,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 7, pp. 2662–2673, Jul. 2011.
[6] X. Kang, S. Li, L. Fang, M. Li, and J. A. Benediktsson, “Extended random walker-based classification of hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 1, pp. 144–153, Jan. 2015.
[7] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.
[8] C. Chen, B. Zhang, H. Su, W. Li, and L. Wang, “Land-use scene classification using multi-scale completed local binary patterns,” Signal, Image and Video Processing, vol. 10, no. 4, pp. 745–752, Apr. 2016.
[9] L. Liu, L. Zhao, Y. Long, G. Kuang, and P. Fieguth, “Extended local binary patterns for texture classification,” Image Vis. Comput., vol. 30, no. 2, pp. 86–99, Feb. 2012.
[10] Z. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary pattern operator for texture classification,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1657–1663, Jun. 2010.
[11] X. Bian, X. Zhang, R. Liu, L. Ma, and X. Fu, “Adaptive classification of hyperspectral images using local consistency,” Electronic Imaging, vol. 23, no. 6, pp. 063014-1–063014-17, Nov. 2014.
[12] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1–27, Apr. 2011.
[13] D. Dai and W. Yang, “Satellite image classification via two-layer sparse coding with biased image representation,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 1, pp. 173–176, Jan. 2011.
[14] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for land-use classification,” in Proc. Int. Conf. Advances in Geographic Information Systems, San Jose, CA, pp. 270–279, Nov. 2010.
[15] Z. Guo, L. Zhang, and D. Zhang, “Rotation invariant texture classification using LBP variance with global matching,” Pattern Recogn., vol. 43, no. 3, pp. 706–719, Mar. 2010.