The Supervised Network Self-Organizing Map for Classification
of Large Data Sets
Authors: Papadimitriou et al.
Advisor: Dr. Hsu
Graduate: Yu-Wei Su
Outline
Motivation
Objective
Introduction
The Supervised Network SOM
    The classification partition SOM (CP-SOM)
    The supervised expert network
Applications
Conclusions
Personal opinion
Motivation


Real data sets are frequently characterized by a large number of noisy observations
Unsupervised learning schemes usually cannot discriminate well over state-space regions with complex decision boundaries
Objective


To develop the Supervised Network Self-Organizing Map (SNet-SOM) to handle the ambiguous regions of the state space
To develop a more computationally efficient unsupervised learning scheme
Introduction



SNet-SOM uses a two-stage learning process: unsupervised learning identifies and classifies the simple regions, and supervised learning handles the difficult ones
The simple regions are handled by the SNet-SOM, which is based on Kohonen's SOM
The basic SOM is modified with a dynamic node insertion/deletion process driven by an entropy-based criterion
Introduction( cont.)

The difficult regions are handled by a supervised learning process such as an RBF (radial basis function) network or an SVM (support vector machine)
The Supervised Network SOM

The SNet-SOM consists of two components


The classification partition SOM(CP-SOM)
The supervised expert network
CP-SOM



The size of the CP-SOM is dynamically
expanded with an adaptive process for the
ambiguous regions
The dynamic growth is based on an entropy-based criterion
Classification is performed only in the unambiguous part of the state space, which corresponds to the neurons of small entropy
CP-SOM( cont.)

CP-SOM learning flow

Initialization phase



Usually four nodes are used at the start to represent the input data
Computational demands are lighter because fine-tuning of the neurons is avoided and the map is small
Adaptation phase

The parameters do not need to shrink with time: at first the neighborhood is large enough to include the whole map, and during subsequent training epochs the neighborhood becomes localized near the winning neuron
CP-SOM( cont.)
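$$
w_j(k+1) =
\begin{cases}
w_j(k) + \eta(k)\,\Lambda_k\,(x_k - w_j(k)), & j \in N_k \\
w_j(k), & j \notin N_k
\end{cases}
$$
($\eta(k)$: learning rate; $\Lambda_k$: neighborhood function; $N_k$: neighborhood of the winning neuron at step k)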
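The update rule above can be sketched in a few lines of Python. This is only an illustration: the names `som_update`, `winner`, `eta` and `neighborhood` are assumptions made for this sketch, and the neighborhood is passed in as a plain dictionary of per-node strengths rather than computed from a map topology.

```python
def winner(weights, x):
    """Return the id of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda j: sum((wi - xi) ** 2 for wi, xi in zip(weights[j], x)),
    )


def som_update(weights, x, neighborhood, eta):
    """Apply one adaptation step and return the new weights.

    weights:      dict node_id -> weight vector w_j(k) (list of floats)
    x:            input pattern x_k
    neighborhood: dict node_id -> strength for the nodes inside N_k;
                  nodes missing from the dict are outside N_k
    eta:          learning rate eta(k)
    """
    updated = {}
    for j, w in weights.items():
        if j in neighborhood:
            h = neighborhood[j]
            # w_j(k+1) = w_j(k) + eta * h * (x_k - w_j(k))
            updated[j] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
        else:
            # nodes outside N_k keep their weights
            updated[j] = list(w)
    return updated
```

Calling `som_update(weights, x, {winner(weights, x): 1.0}, eta=0.5)` moves only the winning node halfway toward the pattern.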

Expansion phase




Controlling the number of training patterns that correspond to the ambiguous regions is the motivation for modifying the SOM
The expansion phase follows the adaptation phase
SupervisedExpertMaxPatterns specifies the upper bound on the size of the supervised expert's training set
SupervisedExpertMinPatterns specifies the lower bound on the size of that training set
CP-SOM( cont.)
1. Compute the entropy of every node m, where $P_k$ is the proportion of the patterns mapped to the node that belong to class k and $N_c$ is the number of classes:
$$HN(m) = -\sum_{k=1}^{N_c} P_k \log P_k$$
2. Detect the neurons that are ambiguous according to the entropy threshold value
3. Evaluate the map to compute the number of training patterns that correspond to the ambiguous neurons, denoted by NumTrainingSetAtAmbiguous
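Steps 1 to 3 can be sketched as below. The function names, the dictionary layout (node id mapped to the class labels of the patterns it wins) and the use of the natural logarithm are assumptions of this sketch; the slides do not state the log base.

```python
import math
from collections import Counter


def node_entropy(labels):
    """Entropy of the class labels of the patterns mapped to one node."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def ambiguous_nodes(node_labels, threshold):
    """Ids of the nodes whose entropy exceeds the threshold (step 2)."""
    return [m for m, labels in node_labels.items()
            if node_entropy(labels) > threshold]


def num_training_set_at_ambiguous(node_labels, threshold):
    """Number of training patterns mapped to ambiguous nodes (step 3)."""
    return sum(len(node_labels[m])
               for m in ambiguous_nodes(node_labels, threshold))
```

A node that wins patterns of a single class has entropy 0 and is never flagged; a node with an even two-class split has entropy log 2.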
CP-SOM( cont.)
4. If NumTrainingSetAtAmbiguous > SupervisedExpertMaxPatterns
    1. Perform map expansion by smoothly inserting, in the neighborhood of each ambiguous neuron, a number of neurons that depends on its fuzziness
    2. Repeat the adaptation phase after the dynamic extension
else
    1. If NumTrainingSetAtAmbiguous < SupervisedExpertMinPatterns
        Reduce the parameter NodeEntropyThresholdForConsideringAmbiguous so that more nodes are considered ambiguous, then restart from step 2
CP-SOM( cont.)
else
    Generate the training and testing sets for the supervised expert
endif
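The control logic of the expansion phase can be sketched as a loop. Everything here is a hypothetical illustration: `count_ambiguous` and `expand_and_adapt` are caller-supplied callbacks standing in for the map evaluation and the expansion-plus-adaptation steps, and the multiplicative `shrink` factor is an assumption, since the slides only say that the threshold is reduced.

```python
def control_expansion(count_ambiguous, expand_and_adapt, threshold,
                      max_patterns, min_patterns, shrink=0.9, max_rounds=100):
    """Loop until the ambiguous pattern count lies within the two bounds.

    count_ambiguous(threshold)  -> NumTrainingSetAtAmbiguous (hypothetical)
    expand_and_adapt(threshold) -> grows the map, then re-adapts it
    Returns the final entropy threshold and the ambiguous pattern count.
    """
    for _ in range(max_rounds):
        n = count_ambiguous(threshold)
        if n > max_patterns:
            # too many ambiguous patterns: expand the map and adapt again
            expand_and_adapt(threshold)
        elif n < min_patterns:
            # too few: lower the threshold so more nodes count as ambiguous
            threshold *= shrink
        else:
            # within bounds: the supervised expert's sets can be generated
            return threshold, n
    raise RuntimeError("no configuration within the bounds was reached")
```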
A class label is assigned to each neuron of the CP-SOM by a majority-voting scheme
The scheme acts as a local averaging operator over the class labels of all the patterns that activate the neuron as the winner
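A minimal sketch of the majority vote, assuming the same hypothetical layout as before (node id mapped to the labels of the patterns it wins). The function names are illustrative, and ties are resolved in favor of the label seen first, which is a choice of this sketch rather than something the slides specify.

```python
from collections import Counter


def majority_label(labels):
    """Most frequent class label among the patterns won by one neuron."""
    if not labels:
        return None  # a neuron that wins no pattern stays unlabeled
    return Counter(labels).most_common(1)[0][0]


def label_map(node_labels):
    """Assign a class label to every neuron by majority vote."""
    return {m: majority_label(labels) for m, labels in node_labels.items()}
```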
The supervised expert network


Has the task of discriminating over the state-space regions where the class decision boundaries are complex
Appropriate neural network models are the Radial Basis Function (RBF) network and the Support Vector Machine (SVM)
RBF supervising expert


Generalization performance is obtained through a tradeoff between the fit of the solution to the training set and the smoothness of the solution
The tradeoff cost function is ($\lambda$: a positive real number called the regularization parameter; D: a stabilizer)
$$C(F) = C_s(F) + \lambda C_r(F)$$
where
$$C_s(F) = \frac{1}{2} \sum_{i=1}^{l} [d_i - F(x_i)]^2$$
$$C_r(F) = \frac{1}{2} \|DF\|^2$$
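To make the tradeoff concrete, the sketch below evaluates the cost for a small Gaussian RBF expansion. It makes a loud simplification: the stabilizer term is replaced by half the squared norm of the output weights, a ridge-style stand-in rather than the differential operator D of regularization theory. The function names and the Gaussian width parameter are likewise assumptions of this sketch.

```python
import math


def rbf_output(x, centers, weights, width):
    """F(x) for a Gaussian RBF expansion with one shared width."""
    return sum(
        w * math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                     / (2.0 * width ** 2))
        for w, c in zip(weights, centers)
    )


def regularized_cost(xs, ds, centers, weights, width, lam):
    """Return (C, C_s, C_r) with C = C_s + lam * C_r."""
    # C_s: half the summed squared error on the training set
    cs = 0.5 * sum((d - rbf_output(x, centers, weights, width)) ** 2
                   for x, d in zip(xs, ds))
    # C_r: ASSUMPTION, 0.5 * ||w||^2 stands in for 0.5 * ||DF||^2
    cr = 0.5 * sum(w * w for w in weights)
    return cs + lam * cr, cs, cr
```

A larger `lam` penalizes large weights more heavily, trading training fit for a smoother solution.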
RBF supervising expert( cont.)


Achieving proper generalization performance is difficult, as is the selection of the centers and parameters
SupervisedExpertMinPatterns is hard to estimate
SVM supervising expert


The SVM obtains high generalization performance without prior knowledge, even when the dimension of the input space is high
Classification means estimating a function $f: \mathbb{R}^N \to \{\pm 1\}$ from input-output training data $(x_1, y_1), \dots, (x_l, y_l) \in \mathbb{R}^N \times \{\pm 1\}$
SVM supervising expert(cont.)

The risk is minimized in order to obtain generalization performance
$$R[f] = \int \frac{1}{2} |f(x) - y| \, dP(x, y)$$

Since P(x,y) is unknown, only the empirical risk can be minimized
$$R_{emp}[f] = \frac{1}{l} \sum_{i=1}^{l} \frac{1}{2} |f(x_i) - y_i|$$
SVM supervising expert(cont.)


R[f] depends on the VC-dimension parameter h, which is controlled by maximizing the separation Δ between the different classes with a linear hyperplane
For a set of pattern vectors $x_1, \dots, x_l \in X$, the hyperplanes can be written as $\{x \in X : w \cdot x + b = 0\}$, where w is a weight vector and b a bias
SVM supervising expert(cont.)




xi·w+b ≥ +1 for yi = +1 (1)
xi·w+b ≤ -1 for yi = -1 (2)
(1) and (2) combine into yi(xi·w+b)-1 ≥ 0, i=1,…,l
H1: xi·w+b = 1, H2: xi·w+b = -1; w is the normal vector of H1 and H2
Margin = 2/║w║; ◎ marks a support vector
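The constraints and the margin can be checked numerically for a given hyperplane. The sketch below does not train an SVM; it only verifies a candidate (w, b) against the inequalities above. The function names and the tolerance `tol` are assumptions of this sketch.

```python
import math


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def margin(w):
    """Distance between H1 and H2: 2 / ||w||."""
    return 2.0 / math.sqrt(_dot(w, w))


def satisfies_constraints(w, b, samples, tol=1e-9):
    """True if y_i * (x_i . w + b) - 1 >= 0 holds for every sample."""
    return all(y * (_dot(x, w) + b) - 1.0 >= -tol for x, y in samples)


def support_vectors(w, b, samples, tol=1e-9):
    """Samples lying on H1 or H2, where the constraint is an equality."""
    return [x for x, y in samples
            if abs(y * (_dot(x, w) + b) - 1.0) <= tol]
```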
Applications



synthetic data
distinction of chaos from noise
ischemia detection
Synthetic data

The synthetic model looks like:
$$Y = f(A_1, A_2, \dots, A_n) = \sum_i f_i(A_i)$$

Construction steps:
1. Generate some proper values Vi, i=1,…,N
2. Induce observation noise on these values, giving V’i, i=1,…,N
3. Compute the values of the outcome variables
4. Induce observation noise on the outcome variables, giving O’
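The four construction steps can be sketched as follows. The uniform value range, the Gaussian noise levels and the component functions are assumptions of this sketch, as is the choice to compute the outcome from the clean values before noise is added; the slides do not specify these details.

```python
import random


def make_synthetic(n_samples, component_fns, value_range=(0.0, 1.0),
                   input_noise=0.05, output_noise=0.05, seed=0):
    """Generate (noisy inputs, noisy outcome) pairs for an additive model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        # step 1: generate one value per attribute
        values = [rng.uniform(*value_range) for _ in component_fns]
        # step 2: observation noise on the attribute values
        noisy_values = [v + rng.gauss(0.0, input_noise) for v in values]
        # step 3: outcome Y = sum_i f_i(A_i), from the clean values (assumption)
        outcome = sum(f(v) for f, v in zip(component_fns, values))
        # step 4: observation noise on the outcome
        noisy_outcome = outcome + rng.gauss(0.0, output_noise)
        data.append((noisy_values, noisy_outcome))
    return data
```

For example, `make_synthetic(1000, [lambda a: 2 * a, lambda a: a * a])` builds a two-attribute data set under these assumptions.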
Synthetic data (cont.)
distinction of chaos from noise



To design a classification system that is able to distinguish between a three-dimensional vector from a chaotic system and random Gaussian noise
The Lorenz chaotic system is used to generate a chaotic trajectory lying in three-dimensional space
The difficulty of distinguishing chaos from noise depends on the state-space region
distinction of chaos from noise (cont.)




The regions far from the attractor can be handled effectively by the CP-SOM classification
The remaining regions, handled by the supervised expert, cannot be cleanly distinguished, since these are regions where the classes overlap
Training set: 20,000 patterns, half from the Lorenz system and half Gaussian noise
Test set: 20,000 patterns, constructed similarly
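A data set of this kind could be built roughly as below. The slides do not give the Lorenz parameters, the integration scheme or the noise variance, so the classical values sigma=10, rho=28, beta=8/3, a fourth-order Runge-Kutta step, and the noise standard deviation are all assumptions of this sketch, as are the function names.

```python
import random


def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one RK4 step of size dt."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shift(s, d, h):
        return tuple(si + h * di for si, di in zip(s, d))

    k1 = deriv(state)
    k2 = deriv(shift(state, k1, dt / 2))
    k3 = deriv(shift(state, k2, dt / 2))
    k4 = deriv(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def make_chaos_vs_noise(n_total, dt=0.01, burn_in=1000,
                        noise_std=10.0, seed=0):
    """Half Lorenz points (label +1), half Gaussian noise (label -1)."""
    rng = random.Random(seed)
    half = n_total // 2
    state = (1.0, 1.0, 1.0)
    for _ in range(burn_in):  # discard the transient toward the attractor
        state = lorenz_step(state, dt)
    data = []
    for _ in range(half):
        state = lorenz_step(state, dt)
        data.append((state, 1))
    for _ in range(n_total - half):
        noise = tuple(rng.gauss(0.0, noise_std) for _ in range(3))
        data.append((noise, -1))
    rng.shuffle(data)
    return data
```

`make_chaos_vs_noise(20000)` would match the set size quoted above; a second call with a different `seed` and starting point could serve as a test set.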
distinction of chaos from noise (cont.)




The ambiguous pattern set contained about 2,000 patterns with an entropy criterion of 0.2
Plain SOM average performance: 79%
SNet-SOM with RBF: 81%
SNet-SOM with SVM: 82%
ischemia detection



The ECG signals come from the European ST-T database, a set of long-term Holter recordings provided by eight countries
From the samples composing each beat, a window of 400 ms is selected
This signal component forms the input to PCA, in order to describe most of its content with a few coefficients
ischemia detection( cont.)



Dimensionality reduction here means that a 100-dimensional data vector X is represented by a 5-dimensional vector
A wavelet-based denoising technique founded on Lipschitz regularization theory is applied
Training set: 15,000 ST-T segments from 44,000 beats of 6 records
Two classes: normal, ischemic
Test set: 15 records with 120,000 ECG beats
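The PCA step that maps each beat window to a few coefficients can be sketched in pure Python. The slides do not say how the principal components are computed; this sketch uses power iteration with re-orthogonalization as a simple stand-in, and the function names and iteration count are assumptions.

```python
import math
import random


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pca_project(vectors, n_components, n_iter=200, seed=0):
    """Return (coefficients, components) for the leading principal axes."""
    rng = random.Random(seed)
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    # sample covariance matrix of the centered data
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    components = []
    for _ in range(n_components):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(n_iter):
            u = [_dot(row, u) for row in cov]
            # remove the directions already found (re-orthogonalization)
            for c in components:
                proj = _dot(u, c)
                u = [ui - proj * ci for ui, ci in zip(u, c)]
            norm = math.sqrt(_dot(u, u))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            u = [ui / norm for ui in u]
        components.append(u)
    # coefficients: projections of the centered data on each component
    coeffs = [[_dot(r, c) for c in components] for r in centered]
    return coeffs, components
```

With 100-sample beat windows, `pca_project(beats, 5)` would give the 5 coefficients per beat described above, although a linear-algebra library would be far faster at that size.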
Conclusions


The SNet-SOM obtains significant computational benefits in large-scale problems
The SNet-SOM is a modular architecture that can be improved along many directions
Personal opinion


Provides a direction for detecting noise with an improved SOM
It is a useful reference for my research