Shape Descriptors in Morphological Galaxy Classification

Ishita Dutta1, S. Banerjee2 & M. De3
1&2 Department of Natural Science, West Bengal University of Technology
3 Department of Engineering and Technological Studies, University of Kalyani
E-mail: idutta_kalyani@yahoo.co.in1, bsreeparna1@rediffmail.com2, demallika@yahoo.com3

International Journal on Advanced Computer Theory and Engineering (IJACTE), ISSN (Print) : 2319 – 2526, Volume-2, Issue-5, 2013

Abstract – Morphological classification of galaxies is an important step towards understanding the origin of the Universe. Shape descriptors can be useful indicators for classifying galaxy shapes. Galaxy images obtained from different modalities, using different regions of the electromagnetic spectrum, are preprocessed and expressed as chain codes for efficient computation. Shape matching between these galaxy images and the prototype images of Hubble is performed using Principal Component Analysis (PCA). The Euclidean distance between the score values of candidate and prototype images is then computed, and the proper class for each candidate image is obtained. Promising results are discussed, and the overall accuracy of classification is 87%.

I. INTRODUCTION

Galaxies are gravitationally bound celestial entities composed of gas, dust, and billions of stars. Galaxies form over billions of years, and their morphology (essentially their shape and general visual appearance) gives astronomers much information about their composition and their evolution. Galaxy classification is important because astrophysicists frequently make use of large catalogues of information to test existing theories, or to form new conjectures that explain the physical processes governing galaxies, star formation, and the nature of the universe. This paper presents our attempt to automate galaxy classification using Elliptic Fourier Descriptors (EFD) [1] [2].

The paper is structured in the following way: Section II introduces galaxy classification, Section III describes earlier work, Section IV discusses the methodologies applied in detail, Section V presents experimental results, and our discussion is in Section VI.

II. GALAXY CLASSIFICATION

Morphological galaxy classification is a system used by astronomers to classify galaxies based on their structure and appearance. The most common classification scheme is the system devised by Edwin Hubble in 1936 [9]. This scheme is commonly referred to as the "Hubble Tuning Fork" and is shown in Figure 1. In this figure the following shapes are depicted:

Elliptical: E0, E3, E5, E7
Spiral: S0, Sa, Sb, Sc
Barred spiral: SBa, SBb, SBc

The barred spiral galaxy is a spiral galaxy with a band of bright stars emerging from the center and running across the middle of the galaxy. In the Hubble tuning fork it is classified as "SB" (spiral, barred) and arranged into three sub-categories based on how open the arms of the spiral are. SBa types feature tightly bound arms, while SBc types are at the other extreme and have loosely bound arms. SBb-type galaxies lie in between.

E or Elliptical galaxies are featureless objects with elliptical isophotes (these elliptical shapes contain old stars and are referred to as bulges). S0 and SB0 are essentially disk galaxies without spiral structure and are often referred to as lenticular galaxies, which contain both disks (where star formation activity can take place) and bulges. Spiral galaxies are in the development stage, with star formation occurring in the discs, bars and spiral arms, all of which have different intensities (which show up as different colors).

III. EARLIER WORK

Guo et al. [3] developed a new classification framework for quantitative galaxy classification using irregular shape symmetry measures. Before analyzing the galaxy shape, they performed image segmentation to separate the target object from the background, a step that is adopted in our paper as preprocessing work. All the classes, namely Bilateral symmetry and rotational symmetry galaxy (BR), Rotational symmetry galaxy (R), Bilateral symmetry galaxy (B) and Irregular galaxy (Ir), are then defined by geometric operations.

Kasivajhula et al. [4] performed a comparative study of three machine learning algorithms, i.e. Support Vector Machines (SVM), Random Forests (RF), and Naïve Bayes (NB), as applied to morphological galaxy classification. For this purpose, morphic features obtained by image analysis and data compressed through PCA [5] are used, and it is shown that RF performed better than SVM and NB. Morphic features were also found to be more effective than PCA [5] features.

Odewahn et al. [6] and Butler [7] use automated surface photometry and pattern classification techniques to morphologically classify galaxies. In Odewahn et al. [6], the two-dimensional light distribution of a galaxy is reconstructed using Fourier series fits to azimuthal profiles computed in concentric elliptical annuli centered on the galaxy. Both the phase and amplitude of each Fourier component have been studied as a function of radial bin number for a large collection of galaxy images using principal-component analysis.

In another paper, Butler [7] used the Fourier technique presented in Odewahn et al. [6] to reconstruct the images of one barred spiral galaxy and one nonbarred spiral galaxy, for which the method worked well.

A variant of the PCA [5] based classification method is adopted by State et al. [8], in which the system of principal directions of each class reflects the tendencies of that class. If a new example has to be assigned to one of the classes (Spiral or Elliptical), the comparison between the current system and the system that would result from assigning the example to the class supplies information about the disturbance of the class features implied by this assignment.

Although Fourier descriptors and PCA [5] have been used extensively for galaxy classification, no work appears to have been done on using the EFD [1] [2] for the classification of galaxies.

IV. GALAXY CLASSIFICATION USING EFD

The architecture of our process is divided into two main phases. In the Image Preprocessing phase, each galaxy image is enhanced and morphological operations are performed to remove extraneous noise. In the second phase, post-processing is carried out. The contours are encoded in Freeman chain code [10] form and then approximated with the first twenty coefficients of the Elliptic Fourier Descriptors [1] [2], which are subsequently renormalized. A Principal Component Analysis (PCA) [5] is then performed, and the Euclidean distances between the candidate images we have used and the prototype images obtained from the Hubble Tuning Fork [9] are computed. The best match corresponds to the minimum Euclidean distance between the candidate and prototype image, and the classification of the prototype image is taken to be the class of the candidate image. Details of pre-processing and post-processing are described below.

4.1 Image Preprocessing

The flowchart for the image pre-processing operations is given in Figure 2. Because the galaxy images from satellite data are not of the same size and scale and contain noise, they are difficult to classify directly. Hence, preprocessing operations are performed. Input images are first digitized, and Otsu's method [11] is then applied with a small offset of 0.01 to threshold the images. This method selects the binarization level automatically, based on the shape of the histogram. Next, we perform a morphological opening operation [3] [12] [13] on the resulting black-and-white image to remove small objects. This is followed by a flood-filling operation [3] [12] [13] to fill the holes in all objects.
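As an illustration only, the preprocessing chain described above (Otsu thresholding with a small offset, morphological opening, hole filling) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the list-of-rows 8-bit image layout, the way the 0.01 offset is added to the normalized Otsu level, and the default structuring-element radius are all hypothetical choices made for the example.

```python
from collections import deque


def otsu_threshold(image, levels=256):
    """Otsu's threshold, normalized to [0, 1], for an 8-bit grayscale image."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # summed intensity of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        # Between-class variance, up to a constant factor.
        var_between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t / (levels - 1)


def disk_offsets(radius):
    """Pixel offsets of a disk-shaped structuring element."""
    span = range(-radius, radius + 1)
    return [(dy, dx) for dy in span for dx in span
            if dy * dy + dx * dx <= radius * radius]


def _morph(mask, offsets, combine):
    """Erosion (combine=all) or dilation (combine=any); outside pixels are background."""
    h, w = len(mask), len(mask[0])
    return [[combine(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in offsets)
             for x in range(w)] for y in range(h)]


def fill_holes(mask):
    """Fill background regions that are not 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not mask[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[mask[y][x] or not outside[y][x] for x in range(w)] for y in range(h)]


def preprocess(image, offset=0.01, radius=1):
    """Binarize, open (erode then dilate) and hole-fill a grayscale image."""
    level = otsu_threshold(image) + offset  # assumed: offset added to normalized level
    mask = [[value / 255 > level for value in row] for row in image]
    disk = disk_offsets(radius)
    opened = _morph(_morph(mask, disk, all), disk, any)
    return fill_holes(opened)
```

Calling `preprocess(image)` on a list of rows of 8-bit intensities returns a boolean mask of the same size, in which isolated bright pixels have been removed by the opening and enclosed background regions have been filled.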
Following preprocessing, the open source package SHAPE, developed by Iwata and Ukai [14], is used for shape analysis, as described in the Image Post-Processing subsection.

Fig. 1 : Hubble's classification scheme

Fig. 2 : Flowchart of pre-processing work

4.2 Image Post-Processing

In the second phase of our project we mainly use the open source package SHAPE [14], which is based on the elliptic Fourier descriptors proposed by Kuhl and Giardina [2] and can delineate any type of shape with a closed two-dimensional contour. After the noise removal of the galaxy images, shape analysis, which constitutes the post-processing phase described in Figure 3, is performed. First the contours are chain coded using Freeman's chain code [10]. SHAPE [14] was used to chain code the contours and then to approximate the shapes with the first 20 harmonics of the Elliptic Fourier Descriptors (EFD) [1] [2]. The coefficients of the EFDs [1] [2] are subsequently normalized to be invariant with respect to size, rotation, and starting point, with a procedure based on the ellipse of the first harmonic. A principal component analysis of the EFD coefficients [1] [2] is then performed, based on the variance-covariance matrix of the coefficients. The scores of the derived principal components are also calculated and stored in text format files, which can be provided as input files for the various subsequent analyses.

Three principal components are considered for each image, and the Euclidean distance between the score values of each candidate image and those of all the model images is computed. The average distance is then computed, and the minimum value is considered to be the best match. Since three principal components are considered for each image, the asymmetry due to the bar and arms of spiral and barred spiral galaxies can be differentiated properly.
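The coefficient computation and the minimum-distance matching used in this section can be sketched as follows. This is a hedged illustration, not the SHAPE package's code: it evaluates the Kuhl and Giardina [2] formulas for the harmonics of a closed Freeman chain code and picks the prototype with the smallest Euclidean distance. The function names, the direction convention (0 = east, counterclockwise in 45-degree steps) and the example labels are assumptions; the normalization and PCA steps performed by SHAPE are omitted here.

```python
import math

# Freeman 8-direction steps: 0 = east, then counterclockwise in 45-degree steps.
DX = (1, 1, 0, -1, -1, -1, 0, 1)
DY = (0, 1, 1, 1, 0, -1, -1, -1)


def elliptic_fourier_descriptors(chain, n_harmonics=20):
    """Return [(a_n, b_n, c_n, d_n)] for n = 1..n_harmonics of a closed chain code."""
    dx = [DX[code] for code in chain]
    dy = [DY[code] for code in chain]
    dt = [math.hypot(x, y) for x, y in zip(dx, dy)]  # 1 or sqrt(2) per link
    if sum(dx) != 0 or sum(dy) != 0:
        raise ValueError("chain code does not describe a closed contour")
    t = [0.0]  # accumulated arc length at each contour point
    for step in dt:
        t.append(t[-1] + step)
    period = t[-1]
    coeffs = []
    for n in range(1, n_harmonics + 1):
        scale = period / (2.0 * n * n * math.pi ** 2)
        omega = 2.0 * n * math.pi / period
        a = b = c = d = 0.0
        for p in range(len(chain)):
            d_cos = math.cos(omega * t[p + 1]) - math.cos(omega * t[p])
            d_sin = math.sin(omega * t[p + 1]) - math.sin(omega * t[p])
            a += dx[p] / dt[p] * d_cos
            b += dx[p] / dt[p] * d_sin
            c += dy[p] / dt[p] * d_cos
            d += dy[p] / dt[p] * d_sin
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs


def nearest_prototype(candidate, prototypes):
    """Label of the prototype vector at minimum Euclidean distance from candidate."""
    return min(prototypes, key=lambda label: math.dist(candidate, prototypes[label]))
```

For example, `elliptic_fourier_descriptors([0, 0, 2, 2, 4, 4, 6, 6], n_harmonics=4)` describes a square traced counterclockwise, and `nearest_prototype(scores, {"E0": ..., "Sb": ...})` would return the label of the closest prototype score vector (the labels here are placeholders).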
The chain coding, EFD approximation and principal component analysis were performed for both candidate and prototype images. The Euclidean distance between a particular candidate image and all the model images is then calculated, and the best match is chosen.

Fig. 3 : Flowchart of post-processing work

V. RESULTS

The algorithm was tested using 50 images. Among them, 42 images matched Hubble's scheme. The output of the pre-processing work is shown in Figure 4. For the morphological opening operation, the structuring element is disk shaped with a radius of 60 pixels. The structuring element is 60 pixels in diameter. The classification obtained from shape matching of candidate and prototype images is shown in Table 1 for some chosen images. Here we consider three principal components of each image.

Fig. 4 : Original image and output of the pre-processing work

Table 1. Classification obtained from shape matching of Candidate & Prototype images

VI. RESULTS AND FUTURE WORK

In this paper, galaxy classification was performed using Elliptic Fourier Descriptors, and Principal Component Analysis (PCA) [5] was subsequently used for dimensionality reduction. For this purpose, Hubble Tuning Fork [9] images were used as model/prototype images. The whole process was carried out in two phases. First, some preprocessing is done to remove noise, threshold the image and extract the shapes. These contours were subsequently chain coded, and the EFDs [1] [2] were obtained using the SHAPE package [14]. These were used to extract the score values of the principal components of the images. These values, in turn, were matched with the score values of the model patterns of Hubble, which had been similarly post-processed. Thus the proper classification was obtained for each candidate image. The overall agreement is 84% for this algorithm.

Galaxy shapes have been analyzed based either on their luminosity and color or directly on the shapes.
Odewahn et al. [6] and Butler [7] have used the luminosity of galaxies to interpret their shape. The Fourier components of the light distribution from galaxies have been studied and analyzed. However, only the first principal component has been considered, so asymmetries like bars and arms cannot always be detected. Besides, only a few galaxies have been studied. Our method compares the principal component scores of candidate images with those of model images using two principal components: aspect and curvature.

Fig. 5 : ROC Curve

Our results compare well with other methods. Kasivajhula et al. [4] performed a comparative study of three classification techniques used in the literature, namely Random Forest with morphic features only (RF), Support Vector Machines (SVM) and the Naïve Bayes classifier (NB), and found that RF gives the best results with 85.72% classification accuracy, followed by SVM with 80.41% and NB, the lowest, with 79.91%. The PCA based classification technique [8] gave classification accuracies ranging from 60% using 5 training samples to 95% for 35 training samples. The perceptron based method [15] gives a correct classification of 75%-85% depending on image dimensions (i.e. best with image size 16x16 and decreasing with image sizes 12x12 and 8x8) but more or less independent of image size. Our method gives an overall accuracy of 87% (AUC), is simple to implement, does not depend on the number of training samples and can be applied to realistic image sizes.

Most of the previous work based on shape analysis focused on characterizing asymmetries or was restricted to rotational symmetry within an existing framework. Guo et al. [3] have quantified the imperfect symmetry measures using geometric transformations. Our method performs a shape based analysis of galaxies using EFDs [1] and PCA decomposition using SHAPE [14], a software package that has been successfully used to study biological shapes.
This method gives an actual comparison with geometric shapes rather than computing the degree of asymmetry. The EFD method has been successfully applied to the study of tornado shapes [1]. Furthermore, like the method of Guo et al. [3], this method, being shape based rather than color or luminosity based, can be extended to other astronomical objects such as galaxy clusters, nebulae and nebulae clusters.

Future work to improve classification accuracy includes incorporating more training data. With more data, a statistical analysis can be performed to ascertain the accuracy quantitatively.

The ROC curve is given in Figure 5. The Area Under the Curve (AUC) is 87%, which indicates the overall accuracy of our method.

Table 2: Results of classification using Receiver Operating Characteristic Analysis (ROC).

VII. ACKNOWLEDGMENTS

ID and SB wish to acknowledge a research grant from the University Grants Commission (UGC) of the Government of India (F: 37-534 (09)) for funding of this research.

VIII. REFERENCES

[1] M.A. Abidi and R.C. Gonzalez. Shape Decomposition Using Elliptic Fourier Descriptors, Proceedings of the 8th IEEE South-east Symposium on System Theory, Knoxville, TN, 53-61, 1986.
[2] F.P. Kuhl and C.R. Giardina. Elliptic Fourier features of a closed contour, Computer Graphics and Image Processing, 18, 236-258, 1982.
[3] Q. Guo, F. Guo and J. Shao. Irregular Shape Symmetry Analysis: Theory and Application to Quantitative Galaxy Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 10, October 2010.
[4] S. Kasivajhula, N. Raghavan and H. Shah. Morphological Galaxy Classification Using Machine Learning, pp. 1-5.
[5] J. Calleja and O. Fuentes. Machine Learning and Image Analysis for Morphological Galaxy Classification, Monthly Notices of the Royal Astronomical Society, Vol. 24, pp. 87-93, 2004.
[6] S.C. Odewahn, S.H. Cohen, R.A. Windhorst and N.S. Philip. Automated Galaxy Morphology: A Fourier Approach, ApJ, 568, 539, 2002.
[7] A.R. Butler. Development of a Fourier Technique for Automated Spiral Galaxy Classification, Bulletin of the American Astronomical Society, Meeting 209, vol. 38, p. 923, 2007.
[8] L. State, D. Constantin and C. Sararu. PCA Approach on Morphological Classification of Galaxies, IEEE Systems Signals and Image Processing, Chalkida, Greece, IWSSIP 2009, 978-1-4244-4530-1/09, pp. 1-4, 2009.
[9] E.P. Hubble. The Realm of the Nebulae, New Haven, 1936.
[10] H. Freeman. On encoding of arbitrary geometric configurations, IRE Transactions on Electronic Computers, EC-10, 260-268, 1961.
[11] N. Otsu. A Threshold Selection Method from Gray-Level Histograms, IEEE Trans. Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[12] K.R. Castleman. Digital Image Processing, Prentice Hall, 1996.
[13] R.C. Gonzalez and R.E. Woods. Digital Image Processing, second ed., Prentice Hall, 2002.
[14] H. Iwata and Y. Ukai. SHAPE: A Computer Program Package for Quantitative Evaluation of Biological Shapes, Journal of Heredity, 93(5), 384-385, 2002.
[15] J. Calleja and O. Fuentes. Automated classification of galaxy images, Proceedings of the Eighth International Conference on Knowledge-Based Intelligent Information and Engineering Systems, 3215, September 2004.