Shape Descriptors in Morphological Galaxy Classification
Ishita Dutta1, S. Banerjee2 & M. De3
1&2
Department of Natural Science, West Bengal University of Technology
Department of Engineering and Technological Studies, University of Kalyani
E-mail: idutta_kalyani@yahoo.co.in1, bsreeparna1@rediffmail.com2, demallika@yahoo.com3
3
Abstract – Morphological classification of galaxies is an
important step towards understanding the origin of the
Universe. Shape descriptors can be useful indicators for
classifying galaxy shapes. Galaxy images obtained from
different modalities, using different regions of the
electromagnetic spectrum, are preprocessed and expressed
as chain codes for efficient computation. Shape matching
between these galaxy images and Hubble's prototype images
is performed using Principal Component Analysis (PCA).
The Euclidean distance between the score values of
candidate and prototype images is then computed, and the
proper class for each candidate image is obtained.
Promising results are discussed; the overall accuracy of
classification is 87%.
I. INTRODUCTION
Galaxies are gravitationally bound celestial entities
composed of gas, dust, and billions of stars. Galaxies
form over billions of years, and their morphology –
essentially their shape and general visual appearance –
gives astronomers much information about their
composition and evolution. Galaxy classification is
important because astrophysicists frequently use large
catalogues of information to test existing theories, or
to form new conjectures that explain the physical
processes governing galaxies, star formation, and the
nature of the universe. This paper presents our attempt
to automate galaxy classification using Elliptic Fourier
Descriptors (EFD) [1] [2].
The paper is structured as follows: Section 2 introduces
galaxy classification, Section 3 describes earlier work,
and Section 4 discusses the methodologies applied in
detail. Section 5 presents experimental results, and our
discussion is in Section 6.
II. GALAXY CLASSIFICATION
Morphological galaxy classification is a system used by
astronomers to classify galaxies based on their structure
and appearance. The most common classification scheme is
the system devised by Sir Edwin Hubble in 1936 [9]. This
scheme is commonly referred to as the "Hubble Tuning
Fork" and is shown in Figure 1. The following shapes are
depicted in this figure:
Elliptical: E0, E3, E5, E7
Spiral: S0, Sa, Sb, Sc
Barred spiral: SBa, SBb, SBc
The barred spiral galaxy is a spiral galaxy with a band
of bright stars emerging from the center and running
across the middle of the galaxy. In the Hubble tuning
fork it is classified as "SB" (spiral, barred), with
three sub-categories based on how open the arms of the
spiral are. SBa types feature tightly bound arms, while
SBc types are at the other extreme and have loosely bound
arms. SBb-type galaxies lie in between. E or elliptical
galaxies are featureless objects with elliptical
isophotes (these elliptical shapes contain old stars and
are referred to as bulges). S0 and SB0 are essentially
disk galaxies without spiral structures and are often
referred to as lenticular galaxies, which contain both
disks (where star creation activity can take place) and
bulges. Spiral galaxies are in the development stage,
with star formation occurring in the discs, bars and
spiral arms, all of which have different intensities
(which show up as different colors).
III. EARLIER WORK
Guo et al. [3] developed a new classification framework
for quantitative galaxy classification using irregular
shape symmetry measures. Before analyzing the galaxy
shape, they performed image segmentation to separate the
target object from the background, a step that we adopt
in this paper as preprocessing. All the classes, namely
bilateral and rotational symmetry galaxy (BR), rotational
symmetry galaxy (R), bilateral symmetry galaxy (B) and
irregular galaxy (Ir), are then defined by geometric
operations.
ISSN (Print) : 2319 – 2526, Volume-2, Issue-5, 2013
136
International Journal on Advanced Computer Theory and Engineering (IJACTE)
Kasivajhula et al. [4] performed a comparative study of
three machine learning algorithms, namely Support Vector
Machines (SVM), Random Forests (RF), and Naïve Bayes
(NB), as applied to morphological galaxy classification.
For this purpose, morphic features obtained by image
analysis and data compressed through PCA [5] were used,
and RF was shown to perform better than SVM and NB. Also,
morphic features were found to be more effective than
PCA [5] features. Odewahn et al. [6] and Butler [7] use
automated surface photometry and pattern classification
techniques to morphologically classify galaxies. In
Odewahn et al. [6], a two-dimensional light distribution
of a galaxy is reconstructed using Fourier series fits to
azimuthal profiles computed in concentric elliptical
annuli centered on the galaxy. Both the phase and
amplitude of each Fourier component have been studied as
a function of radial bin number for a large collection of
galaxy images using principal-component analysis.
In another paper, Butler [7] used the Fourier technique
presented in Odewahn et al. [6] to reconstruct the images
of one barred spiral galaxy and one nonbarred spiral
galaxy, for which the method worked well. A variant of
the PCA [5] based classification method is adopted by
State et al. [8], in which the system of principal
directions of each class reflects the tendencies of that
class. When a new example has to be assigned to one of
the classes (Spiral or Elliptical), a comparison between
the current system and the system that would result from
assigning the example to the class indicates how much
that decision would disturb the class features. Although
Fourier descriptors and PCA [5] have been used
extensively for galaxy classification, no work appears to
have been done on using the EFD [1] [2] for the
classification of galaxies.
IV. GALAXY CLASSIFICATION USING EFD
The architecture of our process is divided into two main
phases. In the image preprocessing phase, each galaxy
image is enhanced and morphological operations are
performed to remove extraneous noise. In the second
phase, post-processing is carried out. The contours are
encoded in Freeman chain code [10] form and then
approximated with the first twenty harmonics of the
Elliptic Fourier Descriptors [1] [2], whose coefficients
are subsequently normalized. A Principal Component
Analysis (PCA) [5] is then performed, and the Euclidean
distances between the candidate images and the prototype
images obtained from the Hubble Tuning Fork [9] are
computed. The best match corresponds to the minimum
Euclidean distance between the candidate and prototype
image, and the class of that prototype image is taken to
be the class of the candidate image. Details of
pre-processing and post-processing are described below.
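The matching rule used in this work (a candidate image receives the class of the prototype at minimum Euclidean distance in principal-component score space) can be sketched as follows. This is only an illustration: the function names and the score values are hypothetical and are not taken from the experiments.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(candidate, prototypes):
    """Return (label, distance) of the prototype closest to the candidate."""
    label, scores = min(prototypes.items(),
                        key=lambda item: euclidean(candidate, item[1]))
    return label, euclidean(candidate, scores)

# Hypothetical PC scores (three components per image), for illustration only.
prototypes = {
    "E0":  [0.10, 0.02, 0.01],
    "Sb":  [0.45, 0.30, 0.12],
    "SBb": [0.50, 0.10, 0.40],
}
candidate = [0.47, 0.27, 0.15]

label, dist = classify(candidate, prototypes)
print(label, round(dist, 3))
```

With one prototype per Hubble class, no training set is needed; the rule reduces to a nearest-neighbour lookup over the prototype scores.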
4.1 Image Preprocessing
The flowchart for the image pre-processing operations is
shown in Figure 2. Because the galaxy images from
satellite data differ in size and scale and contain
noise, it is difficult to classify them directly. Hence,
preprocessing operations are performed. Input images are
first digitized, and Otsu's method [11] is then applied
with a small offset of 0.01 to threshold the images. This
method determines the binarization level automatically,
based on the shape of the histogram. Next, we perform a
morphological opening operation [3] [12] [13] on the
resulting black-and-white image to remove small objects.
This is followed by a flood-filling operation [3] [12]
[13] to fill the holes in all objects. After that, the
open source package SHAPE developed by Iwata and Ukai
[14] is used for shape analysis, as described in the
Image Post-processing subsection.
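The thresholding step can be illustrated with a minimal sketch of Otsu's method on an 8-bit grayscale image. How the 0.01 offset enters is an assumption here: it is added to the Otsu level after normalizing gray values to [0, 1]. The tiny test image is hypothetical, and the opening and hole-filling steps are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(image, offset=0.01, levels=256):
    """Threshold at the normalized Otsu level plus an offset (assumed usage)."""
    flat = [p for row in image for p in row]
    level = otsu_threshold(flat, levels) / (levels - 1) + offset
    return [[1 if p / (levels - 1) > level else 0 for p in row]
            for row in image]

# Hypothetical 4x4 image: a bright 2x2 object on a dark, noisy background.
image = [
    [10, 12, 11, 10],
    [12, 200, 210, 11],
    [10, 205, 198, 12],
    [11, 10, 12, 10],
]
mask = binarize(image)
for row in mask:
    print(row)
```

A small positive offset raises the threshold slightly, so that faint background pixels near the Otsu level are assigned to the background rather than to the object.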
Fig.1: Hubble’s classification scheme
Fig. 2 : Flowchart of pre-processing work
4.2 Image Post-Processing
In the second phase of our project we mainly use the open
source package SHAPE [14], which is based on the Elliptic
Fourier Descriptors proposed by Kuhl and Giardina [2];
these can delineate any type of shape with a closed
two-dimensional contour. After noise removal from the
galaxy images, shape analysis, which constitutes the
post-processing phase described in Figure 3, is
performed. First the contours are chain coded using
Freeman's chain code [10]. SHAPE [14] was used to chain
code the contours, approximate the shapes with the first
20 harmonics of the Elliptic Fourier Descriptors (EFD)
[1] [2], and normalize the EFD coefficients. The
coefficients are normalized to be invariant with respect
to size, rotation, and starting point, with the procedure
based on the ellipse of the first harmonic. Principal
component analysis is then performed on the coefficients
of the EFDs [1] [2], based on their variance-covariance
matrix. The scores of the derived principal components
are also calculated and stored in text files, which can
be provided as input for subsequent analyses. This
process was performed for both candidate and prototype
images. The Euclidean distances between a particular
candidate image and all the model images are then
calculated, and the best match is chosen.
Fig. 3 : Flowchart of post-processing work
V. RESULTS
The algorithm was tested using 50 images, of which 42
matched Hubble's scheme. The output of the pre-processing
work is shown in Figure 4. For the morphological opening
operation, the structuring element is disk shaped with a
radius of 60 pixels. The structuring element is 60 pixels
in diameter. The classifications obtained from shape
matching of candidate and prototype images are shown in
Table 1 for some chosen images. Here we consider three
principal components of each image, and the Euclidean
distances between the score values of each candidate
image and all the model images are computed. The average
distance is then computed, and the minimum value is
considered the best match. Since we considered three
principal components for each image, the asymmetry due to
the bar and arms of spiral and barred spiral galaxies can
be differentiated properly.
Fig. 4 : Original image and Output of the pre-processing work
Table 1. Classification obtained from shape matching of
Candidate & Prototype images
VI. RESULTS AND FUTURE WORK
In this paper, galaxy classification was performed using
Elliptic Fourier Descriptors, and Principal Component
Analysis (PCA) [5] was subsequently used for
dimensionality reduction. For this purpose, Hubble Tuning
Fork [9] images were used as model/prototype images. The
whole process was carried out in two phases. First, some
preprocessing is done to remove noise, threshold the
image and extract the shapes. The resulting contours were
subsequently chain coded and the EFDs [1,2] were obtained
using the SHAPE package [14]. These were used to extract
the score values of principal components of the images.
These values, in turn, were matched with the score values
of the model patterns of Hubble, which had been similarly
post-processed. Thus the proper classification was
obtained for each candidate image. The overall agreement
is 84% for this algorithm.
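The chain-coding and EFD steps can be sketched as below, following the coefficient formulas of Kuhl and Giardina [2] for a closed, 8-connected contour. This is an illustrative re-implementation, not the SHAPE code. The function names and the direction convention (code 0 points along +x and codes increase counter-clockwise in x-y coordinates) are assumptions, and the normalization for size, rotation and starting point is omitted.

```python
import math

# Freeman 8-direction steps (dx, dy); code 0 = +x, codes increase counter-clockwise.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(points):
    """Freeman codes of a closed contour given as consecutive 8-connected points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        codes.append(STEPS.index((x1 - x0, y1 - y0)))
    return codes

def efd_from_chain(chain, harmonics=20):
    """Unnormalized elliptic Fourier coefficients (a_n, b_n, c_n, d_n)."""
    dx = [STEPS[c][0] for c in chain]
    dy = [STEPS[c][1] for c in chain]
    dt = [math.hypot(x, y) for x, y in zip(dx, dy)]  # step length: 1 or sqrt(2)
    t = [0.0]
    for step in dt:
        t.append(t[-1] + step)  # cumulative arc length
    period = t[-1]
    coeffs = []
    for n in range(1, harmonics + 1):
        k = 2.0 * n * math.pi / period
        scale = period / (2.0 * n * n * math.pi ** 2)
        a = b = c = d = 0.0
        for p in range(len(chain)):
            dcos = math.cos(k * t[p + 1]) - math.cos(k * t[p])
            dsin = math.sin(k * t[p + 1]) - math.sin(k * t[p])
            a += dx[p] / dt[p] * dcos
            b += dx[p] / dt[p] * dsin
            c += dy[p] / dt[p] * dcos
            d += dy[p] / dt[p] * dsin
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs

# Hypothetical contours: a unit square and the same square at twice the size.
small = chain_code([(0, 0), (1, 0), (1, 1), (0, 1)])
big = [0, 0, 2, 2, 4, 4, 6, 6]
efd_small = efd_from_chain(small, harmonics=5)
efd_big = efd_from_chain(big, harmonics=5)
print(small)
print(efd_small[0])
```

The unnormalized coefficients scale linearly with the size of the contour, which is why a size normalization based on the first-harmonic ellipse is applied before the coefficients of different images are compared.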
Galaxy shapes have been analyzed based either on their
luminosity and color or directly on their shapes.
Odewahn et al. [6] and Butler [7] have used the
luminosity of galaxies to interpret their shape. The
Fourier components of the light distribution from
galaxies have been studied and analyzed. However, only
the first principal component has been considered, so
asymmetries like bars and arms cannot always be detected.
Besides, only a few galaxies have been studied. Our
method compares the principal component scores of
candidate images with those of model images using two
principal components: aspect and curvature.
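Principal component scores of the kind compared here can be derived from the variance-covariance matrix of the coefficient vectors. The sketch below assumes each image is represented by one row of normalized EFD coefficients. It uses power iteration with deflation only to stay dependency-free, which is not how SHAPE computes its components, and the data are hypothetical.

```python
import math

def pca_scores(rows, n_components=2, iters=500):
    """Scores of the rows on the leading eigenvectors of their covariance matrix."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    axes = []
    for _ in range(n_components):
        # Fixed, non-symmetric start vector (an arbitrary choice).
        start = [float(j + 1) for j in range(m)]
        norm0 = math.sqrt(sum(c * c for c in start))
        v = [c / norm0 for c in start]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        # Rayleigh quotient gives the eigenvalue of the axis just found.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                  for i in range(m))
        axes.append(v)
        # Deflation: remove that axis so the next pass finds the following one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)]
               for i in range(m)]
    scores = [[sum(x[j] * a[j] for j in range(m)) for a in axes]
              for x in centered]
    return scores, axes

# Hypothetical coefficient rows (one row per image), for illustration only.
rows = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
scores, axes = pca_scores(rows, n_components=2)
for s in scores:
    print([round(value, 6) for value in s])
```

Keeping more than one component retains variation that the first axis alone would discard, which is the motivation given above for using several components.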
Fig. 5 : ROC Curve
Our results compare well with other methods.
Kasivajhula et al. [4] performed a comparative study
between three classification techniques used in the
literature, namely Random Forest with morphic features
only (RF), Support Vector Machines (SVM) and Naïve Bayes
classifier (NB), and found that RF gives the best results
with 85.72% classification accuracy, followed by SVM with
80.41% and NB with the lowest at 79.91%. The PCA based
classification technique [8] gave classification
accuracies ranging from 60% using 5 training samples to
95% for 35 training samples. The perceptron based method
[15] gives a correct classification of 75%-85% depending
on image dimensions (i.e. best with image size 16x16 and
decreasing with image sizes 12x12 and 8x8) but more or
less independent of image size. Our method gives an
overall accuracy of 87% (AUC), is simple to implement,
does not depend on the number of training samples and can
be applied to realistic image sizes.
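For reference, an area under the ROC curve can be computed from per-image confidence scores and correct/incorrect labels with the trapezoidal rule, as sketched below. The paper does not state how the scores for the ROC analysis were defined, so the scores, labels and function names here are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so ties share one point.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical confidence scores; label 1 marks a positive case.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(round(area, 4))
```

The value equals the fraction of positive-negative pairs in which the positive case receives the higher score.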
Most of the previous work based on shape analysis focused
on characterizing asymmetries or was restricted to
rotational symmetry within an existing framework.
Guo et al. [3] have quantized the imperfect symmetry
measures using geometric transformation. Our method
performs a shape based analysis of galaxies using EFDs
[1] and PCA decomposition using SHAPE [14], a software
package that has been successfully used to study
biological shapes. This method gives an actual comparison
with geometric shapes rather than computing the degree of
asymmetry. The EFD method has been successfully applied
to the study of tornado shapes [1]. Furthermore, like the
method of Guo et al. [3], this method, being shape based
rather than color or luminosity based, can be extended to
other astronomical objects like galaxy clusters, nebulae
and nebulae clusters.
Future work to improve classification accuracy includes
incorporating more training data. With more data, a
statistical analysis can be performed to ascertain the
accuracy quantitatively.
VII. ACKNOWLEDGMENTS
ID and SB wish to acknowledge a research grant
from University Grants Commission (UGC) of the
Government of India (F: 37-534 (09)) for funding of this
research.
The ROC curve is given in Figure 5. The Area Under the
Curve (AUC) is 87%, which indicates the overall accuracy
of our method.
Table 2: Results of classification using Receiver
Operating Characteristic (ROC) analysis.
VIII. REFERENCES
[1] M.A. Abidi and R.C. Gonzalez. Shape Decomposition Using Elliptic Fourier Descriptors, Proceedings, 8th IEEE South-east Symposium on System Theory, Knoxville, TN, 53-61, 1986.
[2] F.P. Kuhl and C.R. Giardina. Elliptic Fourier features of a closed contour, Computer Graphics and Image Processing, 18, 236–258, 1982.
[3] Q. Guo, F. Guo and J. Shao. Irregular Shape Symmetry Analysis: Theory and Application to Quantitative Galaxy Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 10, October 2010.
[4] S. Kasivajhula, N. Raghavan and H. Shah. Morphological Galaxy Classification Using Machine Learning, pp. 1-5.
[5] J. Calleja and O. Fuentes. Machine Learning and Image Analysis for Morphological Galaxy Classification, Monthly Notices of the Royal Astronomical Society, Vol. 24, pp. 87-93, 2004.
[6] S.C. Odewahn, S.H. Cohen, R.A. Windhorst and N.S. Philip. Automated Galaxy Morphology: A Fourier Approach, ApJ, 568, 539, 2002.
[7] A.R. Butler. Development of a Fourier Technique for Automated Spiral Galaxy Classification, Bulletin of American Astronomical Society Meeting 209, vol. 38, p. 923, 2007.
[8] L. State, D. Constantin and C. Sararu. PCA Approach on Morphological Classification of Galaxies, IEEE Systems Signals and Image Processing, Chalkida, Greece, IWSSIP 2009, 978-1-4244-4530-1/09, pp. 1-4, 2009.
[9] E.P. Hubble. The Realm of the Nebulae, New Haven, 1936.
[10] H. Freeman. On encoding of arbitrary geometric configurations, IRE Transactions on Electronic Computers, EC 10, 260-268, 1961.
[11] N. Otsu. A Threshold Selection Method from Gray-Level Histograms, IEEE Trans. Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[12] K.R. Castleman. Digital Image Processing, Prentice Hall, 1996.
[13] R.C. Gonzalez and R.E. Woods. Digital Image Processing, second ed., Prentice Hall, 2002.
[14] H. Iwata and Y. Ukai. SHAPE: A Computer Program Package for Quantitative Evaluation of Biological Shapes, Journal of Heredity, 93(5), 384-385, 2002.
[15] J. Calleja and O. Fuentes. Automated classification of galaxy images, Proceedings of the Eighth International Conference on Knowledge-Based Intelligent Information and Engineering Systems, 3215, September 2004.
