GRB Classification using Self-Organizing Map (SOM)
Praveen Boinee, Ph.D. student, Udine University

Presentation outline
- GRB classification
- Neural networks
- Self-Organizing Map: operations and how it is used in the classification
- Visualization techniques with SOM
- Experimenting with data
- Research plan
- References

GRB Data Analysis
Importance of the analysis:
- It can be useful in understanding the physics of the gamma-ray sources.
- It can be helpful in finding the GRB sources.
But GRB data form a complex astronomical data set with high dimensionality.
Analysis techniques:
- Statistical methods
- Artificial neural networks, which can be used efficiently for data classification

GRB Classes
- Two GRB classes are known to exist.
- The properties of the burst classes are indistinct.
- It is difficult to assign individual GRBs to a class because their attributes overlap.
- Instrumental bias in the data adds further complexity.

GRB Classification Process
(Figure: flow diagram of the classification process with the stages Database, Data preparation, Preprocessed GRB data, Data mining, Classified data, Visualization, Scientific and logical assessment, GRB subclasses.)

Neural Networks
- A set of interconnected neurons (information-processing units).
- A program designed to model how the brain performs a particular task.
- Used to extract patterns of information from data sets that are very large and contain hidden relations.
- Able to handle noisy data.

Neural Network Learning
- Learning = training = acquiring information from the data.
- This information is stored on the links between the neurons, which are also called weights.
(Figure: input → neural network with weights → output.)
- There are two types of learning: supervised and unsupervised.
- After training, the neural network is ready to classify the data and to find hidden patterns and relations.

Supervised vs. Unsupervised Learning
Imagine an organism or machine that experiences a series of sensory inputs x1, x2, x3, x4, ...
Supervised learning: the machine is also given desired outputs y1, y2, ..., and its goal is to learn to produce the correct output given a new input.
Unsupervised learning: the goal of the machine is to build representations of x that can be used for reasoning, decision making, predicting things, communicating, etc.

Goals of Unsupervised Learning
To find useful representations of the data, for example:
- finding clusters
- dimensionality reduction
- finding the hidden causes or sources of the data
- modelling the data density

Uses of Unsupervised Learning
- data compression
- outlier detection
- classification
- making other learning tasks easier
- a theory of human learning and perception

Self-Organisation
Brain cells organize themselves into groups according to incoming information. This information is not received by a single neural cell only. It also influences other cells in its neighbourhood. The result is a kind of map in which neural cells with similar functions are arranged close together. The SOM mechanism is based on the same principle.

SOM Working
- The SOM produces a similarity graph of the input data.
- It converts non-linear relationships between high-dimensional data items into simple geometric relationships between their image points on a low-dimensional map.
(Figure: illustration of the SOM model with a 7 × 7 architecture, showing the input space, the output space, an input pattern, a weight vector and the updated weight.)

SOM – Self-Organizing Map
- A valuable tool in data mining and knowledge discovery in databases (KDD).
- A neural-network algorithm for data mining, based on unsupervised learning.
- Combines vector quantisation with vector projection.
- Used for clustering and visualization of high-dimensional data sets. It is very effective for information visualization.
- Introduced by Teuvo Kohonen in 1982.
- Used in many fields, but little has been done in astronomy.
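The "vector quantisation + vector projection" idea can be illustrated with the minimal Python sketch below. It is not the author's code. It assumes an already-trained set of prototype vectors and a list of their 2-D grid coordinates, both hypothetical here. Quantisation replaces each input by its nearest prototype, and projection places the input at that prototype's position on the map.

```python
import math

def best_matching_unit(x, prototypes):
    """Index of the prototype vector closest to x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))

def quantize(data, prototypes):
    """Vector quantisation: represent each input vector by the index of its nearest prototype."""
    return [best_matching_unit(x, prototypes) for x in data]

def project(data, prototypes, grid):
    """Vector projection: place each input vector at the map (grid) position of its
    nearest prototype; grid[i] is the 2-D coordinate of unit i."""
    return [grid[i] for i in quantize(data, prototypes)]

def mean_quantization_error(data, prototypes):
    """Average distance between each input vector and its nearest prototype."""
    total = sum(math.dist(x, prototypes[best_matching_unit(x, prototypes)]) for x in data)
    return total / len(data)

if __name__ == "__main__":
    # Hypothetical 2 x 2 map with 3-D prototype vectors.
    prototypes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
    grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
    data = [(0.1, 0.0, 0.0), (0.9, 0.1, 0.0), (0.9, 1.0, 1.0)]
    print(quantize(data, prototypes))        # [0, 1, 3]
    print(project(data, prototypes, grid))   # [(0, 0), (0, 1), (1, 1)]
    print(mean_quantization_error(data, prototypes))
```

Three 3-D vectors end up as three points on a 2-D grid. Vectors that are close in the input space land on the same or nearby units once the prototypes have been trained.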
SOM Architecture
- A set of neurons (cluster units).
- Each neuron is assigned a prototype vector taken from the input data set.
- The neurons of the map can be arranged on either a rectangular or a hexagonal lattice.
- Every neuron has a neighborhood, as shown in the figure.
(Figure: hexagonal and rectangular neighborhoods.)

SOM in Classification
The SOM is used in three stages: initialization, training and visualization.

Initialization
- Consider an n-dimensional data set. Each row of the data set is treated as an n-dimensional vector.
- For each neuron (classifier unit) of the map, assign a prototype vector from the data set.
- Prototype vectors are initialized either randomly or linearly.
- After training, each prototype vector serves as an exemplar for all the vectors associated with its neuron.

Training – Best-Matching Procedure
Let $i$ be a neuron in the $n \times n$ grid, $m_i$ the prototype vector associated with $i$, and $x \in \mathbb{R}^n$ an arbitrary input vector. The task is to map $x$ to one of the neurons. For each neuron, compute the distance $\lVert x - m_i \rVert$. The winner, denoted $b$, is the neuron whose prototype is closest to $x$:

$$ \lVert x - m_b \rVert = \min_i \lVert x - m_i \rVert $$

An alternative statistic, equivalent when all vectors are normalized to unit length, selects the neuron with the largest inner product:

$$ x^{\mathsf T} m_b = \max_i \, x^{\mathsf T} m_i $$

Topology Adjustment – the Critical Step
The following update rule is applied to each neuron $i$ in the neighborhood of the winner neuron $b$:

$$ m_i(t+1) = m_i(t) + \alpha(t)\, h_{bi}(t)\, \bigl[x(t) - m_i(t)\bigr] $$

where
- $t$ is the discrete time coordinate;
- $m_i(t+1)$ is the prototype vector of neuron $i$ at time $t+1$;
- $h_{bi}(t) = \exp\left(-\dfrac{\lVert r_b - r_i \rVert^2}{2\sigma^2(t)}\right)$ is the neighborhood kernel;
- $r_b$ and $r_i$ are the position (radius) vectors of neurons $b$ and $i$ on the map grid;
- $\sigma(t)$ is the width of the kernel;
- $\alpha(t)$ is the scalar-valued learning rate of the map;
- $\sigma(t)$ and $\alpha(t)$ both decrease monotonically with time.

Training – Topology
- Training and topology adjustments are repeated iteratively until a sufficiently accurate map is obtained.
- After training, the prototype vectors contain the cluster means used for the classification.
- Neurons can be labeled with the cluster means or with the classes of their associated prototype vectors.

Data Visualization using SOM
Data visualization techniques using the SOM can be divided into three categories based on their goal:
1. visualization of clusters and of the shape of the data: projections, U-matrices and other distance matrices;
2. visualization of components (variables): component planes, scatter plots;
3. visualization of data projections: hit histograms, response surfaces.

Data Visualization using SOM (continued)
The idea is to present many variables together visually, with a degree of control over a number of different visual properties. To cope with the high dimensionality of the data set, visual properties such as color and size can be added to the position property. When combining these properties in one view becomes difficult, multiple views can be used, with all the separate views linked together.

Representation forms:
- Cell visualizations: distance matrices (e.g. U-matrix), similarity coloring, map unit size
- Component plane representations: graphs, scatter plots, ...
- Mesh visualizations: SOM grid, connection lines, surface plot of the distance matrix
Visual properties: color, position, shape, lighting, surface reflectance, transparency.
(Diagram labels also include: derived information, user interactions, clusters (data structure), view (2D/3D), shape of data clusters, distribution, relationships, object identifiers (icons), coordinates control.)

Data Classification in Cube Points
The data set constructed for this demo consists of random vectors taken from a cube in 3D space. The data are plotted with 'o's of different colors and the map prototype vectors with black '+'s.
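The initialization, best-matching and update steps described above, together with two of the listed visualizations, can be sketched on cube-like data in plain Python with no external libraries. This is a sketch under stated assumptions, not the demo's actual code. The labels xy/yz/zx on the demo slides suggest the points lie on three faces of the unit cube, so the sketch assumes that. It also uses a rectangular lattice, random initialization from the data and the Gaussian kernel $h_{bi}(t)$. Linear decay of $\alpha(t)$ and $\sigma(t)$ is an assumption, since the slides only require both to decrease. The U-matrix here is a simplified variant (mean distance to the four grid neighbours) and is not the exact definition used by SOM_PAK or the SOM Toolbox. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def cube_face_data(n_per_face, rng):
    """Hypothetical demo data: random points on three faces of the unit cube.
    Labels follow the slides: 3 = xy points, 2 = yz points, 1 = zx points."""
    faces = {3: lambda a, b: (a, b, 0.0),   # xy plane (z = 0)
             2: lambda a, b: (0.0, a, b),   # yz plane (x = 0)
             1: lambda a, b: (a, 0.0, b)}   # zx plane (y = 0)
    data, labels = [], []
    for label, face in faces.items():
        for _ in range(n_per_face):
            data.append(face(rng.random(), rng.random()))
            labels.append(label)
    return data, labels

def best_matching_unit(x, prototypes):
    """Winner b: the unit whose prototype m_b minimises ||x - m_i||."""
    return min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))

def train_som(data, rows, cols, steps, alpha0=0.5, seed=0):
    """Sequential SOM training on a rectangular rows x cols lattice."""
    rng = random.Random(seed)
    positions = [(r, c) for r in range(rows) for c in range(cols)]  # r_i on the grid
    # Random initialisation: each prototype starts as a randomly chosen input vector.
    prototypes = [list(rng.choice(data)) for _ in positions]
    sigma0 = max(rows, cols) / 2.0
    for t in range(steps):
        frac = t / steps
        alpha = alpha0 * (1.0 - frac)            # learning rate alpha(t), decreasing (assumed linear)
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # kernel width sigma(t), decreasing to a floor
        x = rng.choice(data)
        b = best_matching_unit(x, prototypes)
        for i, r_i in enumerate(positions):
            # Gaussian neighbourhood kernel h_bi(t)
            h = math.exp(-math.dist(positions[b], r_i) ** 2 / (2.0 * sigma ** 2))
            # m_i(t+1) = m_i(t) + alpha(t) h_bi(t) [x(t) - m_i(t)]
            prototypes[i] = [m + alpha * h * (xk - m) for m, xk in zip(prototypes[i], x)]
    return positions, prototypes

def hit_histogram(data, prototypes):
    """Number of input vectors whose best-matching unit is each neuron."""
    hits = [0] * len(prototypes)
    for x in data:
        hits[best_matching_unit(x, prototypes)] += 1
    return hits

def label_units(data, labels, prototypes):
    """Label each neuron with the majority class of the vectors mapped to it (None if empty)."""
    votes = [Counter() for _ in prototypes]
    for x, y in zip(data, labels):
        votes[best_matching_unit(x, prototypes)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def u_matrix(positions, prototypes, rows, cols):
    """Simplified U-matrix: mean distance from each prototype to those of its
    4-connected grid neighbours (large values mark cluster borders)."""
    index = {pos: i for i, pos in enumerate(positions)}
    u = [[0.0] * cols for _ in range(rows)]
    for (r, c), i in index.items():
        neigh = [index[p] for p in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)) if p in index]
        u[r][c] = sum(math.dist(prototypes[i], prototypes[j]) for j in neigh) / len(neigh)
    return u

if __name__ == "__main__":
    data, labels = cube_face_data(100, random.Random(1))
    rows, cols = 7, 7  # 7 x 7 map, as in the SOM illustration
    positions, prototypes = train_som(data, rows, cols, steps=5000)
    print("hits:", hit_histogram(data, prototypes))
    print("unit labels:", label_units(data, labels, prototypes))
    for row in u_matrix(positions, prototypes, rows, cols):
        print(" ".join(f"{v:.2f}" for v in row))
```

Because every update moves a prototype only part of the way towards an input vector, the prototypes stay inside the cube. Units lying between the three faces show up as the high-valued ridges of the U-matrix.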
From the visualization we can see that there are three clusters, with some prototype vectors lying between the clusters. Labels: 3 – xy points, 2 – yz points, 1 – zx points. Similar vectors are coded with the same color, and clusters are coded with different, interpolated colors.
(Figures: XY plane points, YZ plane points, ZX plane points; data distributions for each vector component.)

World Poverty Map
The data set has 39 indicators describing various quality-of-life factors, such as state of health, nutrition and educational services.
(PhD research seminar (Qualifying phase), September 19, 2001, Etien Luc Koua)

WEBSOM
A SOM analysis technique used to map thousands of articles posted on Usenet newsgroups (Lagus et al. 1996; Honkela et al. 1998; HUT Neural Networks Research Centre).

GRB Classification – Choice of Parameters
Three variables were identified by the Bagoly study of the BATSE 3B catalog using principal component and factor analysis:
- Burst duration parameter (T90): the time interval during which the middle 90% of the total burst fluence arrives (from the 5% to the 95% level), taken from the duration table of the BATSE catalog.
- Total flux in the channels: the rate of flow of particles or energy through a given surface.
- Weighted fluence: the sum of the energies of the photons passing through a unit area.

BATSE 3B Data
(Figure: U-matrix of an SOM trained with 100 random GRBs from classes 1b and 2b (Mukherjee classification). Distances increase from gray to black.)

Landscape Plot
Classes 1 and 2 are separated by a clear boundary (a "mountain range").

Software Packages
- SOM_PAK: MS-DOS / UNIX; free from the website; the "official" SOM implementation.
- SOM Toolbox: Matlab 5; free from the website.
- Geo-vista: an open software development environment based on Java Bean component technology, http://www.geovista.psu.edu/software/software.jsp

Research Plan
(Figure: research plan diagram with eight numbered stages.)
1. Conceptual framework
2. Theoretical model of the SOM for GRB data
3. Modeling and preprocessing of data
4. SOM algorithm adaptation and implementation
5. Network training and testing
6. Visualization system design
7. Case studies: application to multidimensional data sets
8. Evaluation of results and conclusions

References
- T. Kohonen: Self-Organizing Maps (second edition).
- H. J. Rajaniemi, P. Mähönen: Classifying Gamma-Ray Bursts using Self-Organizing Maps, ApJ 566:202–209, 2002 February 10.
- J. Hakkila, A. Meegan: AI Gamma-Ray Burst Classification: Methodology/Preliminary Results, arXiv:astro-ph/9712077, 4 Dec 1997.
- J. Vesanto: SOM-Based Data Visualization Methods, Intelligent Data Analysis journal, 1999.
- S. Kaski: Data Exploration Using Self-Organizing Maps, Espoo, 1997.
- T. Kohonen: Exploration of Very Large Databases by Self-Organizing Maps, ICNN'97, Piscataway, NJ.
- S. Mukherjee: Three Types of Gamma-Ray Bursts, ApJ 508:314–327, 1998.
- M. Koskela, J. Laaksonen: Self-Organizing Image Retrieval with MPEG-7 Descriptors.
- BATSE GRB data: http://www.batse.msfc.nasa.gov/batse/grb/