Fuzzy Pattern Recognition

Overview of Pattern Recognition
Pattern Recognition Procedure
[Block diagram: unknown speech / image / data → feature extraction → feature reduction → either classification (supervised, using known class labels and a performance criterion) to produce a class label, or clustering (unsupervised or self-organizing) to produce clusters, which are then assessed by cluster validity.]
Overview of Pattern Recognition
Supervised Learning for Classification
• The class label is known for a set of samples.
• Find the decision boundary from the given samples.
• For an unknown data set, perform classification.
Unsupervised Learning for Clustering
• A set of data is given; find the groups or the grouping boundary.
Reinforcement Learning (Reward/Penalty)
• Only an "unkind teacher" is given: it returns a reward or penalty rather than the correct answer.
• Trial-and-error scheme
Overview of Pattern Recognition
Classification and Clustering
[Two scatter plots. Classification: the samples are labeled Class 1 / Class 2 and the problem is which class to assign a new sample to. Clustering: the samples are unlabeled and the problems are how to partition them and how many clusters to use.]
Overview of Pattern Recognition
Pattern Recognition Algorithm
Based on the statistical approach
• Parametric approach
  - Bayes classifier with Gaussian density
  - Nonlinear boundary or decision function
• Nonparametric approach for density estimation
  - Parzen window
  - K-nearest-neighbor method
Based on neural networks
• Classifier
  - Multilayer Perceptron, ART, Neocognitron, ...
• Clustering
  - SOM (Self-Organizing Map)
Fuzzy Pattern Recognition
Classification
• Rule-Based Classifier
• Fuzzy Perceptron
• Fuzzy K-NN Algorithm
Clustering
• Fuzzy C-Mean
• Possibilistic C-Mean
• Fuzzy C-Shell Clustering
• Fuzzy Rough Clustering
Cluster Validity
• Validity Measures Based on Fuzzy Set Theory
Fuzzy Pattern Recognition
Fuzzy Classification
Rule-Based Classifier
• Idea: nonlinear partition of the feature space (x1, x2) using fuzzy sets S (small), M (medium), and L (large) on each axis

  Rule 1: If x1 is S and x2 is S, then class 1
  Rule 2: If x1 is S and x2 is not S, then class 2
  Rule 3: If x1 is M and x2 is S, then class 2
  Rule 4: If x1 is M and x2 is not S, then class 1
  Rule 5: If x1 is not S and x2 is L, then class 1
  Rule 6: If x1 is L and x2 is S, then class 1
  Rule 7: If x1 is L and x2 is M, then class 2

• How to find the rules from sample data:
  - Project the labeled training data and design the membership functions
  - Use fuzzy clustering and projection to obtain the membership functions
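The sketch below shows, in Python, how such a rule base can be evaluated. It is a minimal, assumed example rather than the slide's own code: the triangular S/M/L membership functions, the [0, 1] feature scaling, the min operator for the rule firing strength, and the winner-takes-all class assignment are all illustrative choices.

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical S / M / L fuzzy sets on a feature scaled to [0, 1]
MF = {"S": lambda x: tri(x, -0.5, 0.0, 0.5),
      "M": lambda x: tri(x, 0.0, 0.5, 1.0),
      "L": lambda x: tri(x, 0.5, 1.0, 1.5)}

def member(label, x):
    if label.startswith("not "):              # "not S" -> complement 1 - mu_S(x)
        return 1.0 - MF[label[4:]](x)
    return MF[label](x)

# Rules 1-7 from the slide: (label for x1, label for x2, class)
RULES = [("S", "S", 1), ("S", "not S", 2), ("M", "S", 2), ("M", "not S", 1),
         ("not S", "L", 1), ("L", "S", 1), ("L", "M", 2)]

def classify(x1, x2):
    # Firing strength of a rule = min of its antecedent memberships
    strengths = [min(member(a1, x1), member(a2, x2)) for a1, a2, _ in RULES]
    best = max(range(len(RULES)), key=lambda r: strengths[r])
    return RULES[best][2]

print(classify(0.1, 0.1))   # x1 small and x2 small -> class 1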
Fuzzy Classification
Fuzzy K-Nearest Neighbor Algorithm
• Crisp K-NN Algorithm
[Figure: an unknown sample and its K = 3 nearest neighbors among labeled Class 1 and Class 2 samples; the majority class among the neighbors decides the label.]
W  {x1 , x2 ,..., xn } : A set of n labelled patterns
BEGIN
Input y, a unknownsample
Set K , 1  K  n
Initializei  1
DO UNT IL( K - nearest neighborsfound)
Computedistancefrom y to xi
Find K - nearest neighbors
END DO UNT IL
Determinethemajorityclass represented in theset of K - nearest neighbors
IF (a tie exists)T HEN
Computethedistancesof neighborsin each classes which tied
IF (no tie exists)T HEN
Classify y to theclass with minimumsum
ELSE
Classify y to theclass with thelast minimumsum
END IF
ELSE
Classify y to themajorityclass
END IF
END
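A minimal Python sketch (assumed, not from the slides) of the crisp K-NN rule above, including the distance-sum tie-break; the toy data are hypothetical.

import numpy as np

def crisp_knn(X, labels, y, K=3):
    d = np.linalg.norm(X - y, axis=1)             # distances from y to every x_i
    nn = np.argsort(d)[:K]                        # indices of the K nearest neighbors
    classes, votes = np.unique(labels[nn], return_counts=True)
    winners = classes[votes == votes.max()]       # majority class(es)
    if len(winners) == 1:
        return winners[0]                         # no tie: majority vote decides
    # Tie: compare the sum of neighbor distances per tied class (smaller wins)
    sums = {c: d[nn][labels[nn] == c].sum() for c in winners}
    return min(sums, key=sums.get)

# Example usage on a hypothetical toy data set
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.2, 0.1]])
labels = np.array([1, 1, 2, 2, 1])
print(crisp_knn(X, labels, np.array([0.15, 0.15]), K=3))   # -> 1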
Fuzzy Classification
Fuzzy K-Nearest Neighbor Algorithm
• Fuzzy K-NN Algorithm
Known class memberships of the labeled samples:
  u_ij = the degree of belongingness of x_j to class i, obtained by fuzzy clustering or other methods
[Figure: labeled Class 1 and Class 2 samples, each carrying a membership degree in every class.]

W = {x_1, x_2, ..., x_n}: a set of n labeled patterns
u_ij for all i = 1, 2, ..., C and j = 1, 2, ..., n

BEGIN
  Input y, an unknown sample
  Set K, 1 <= K <= n
  Initialize i = 1
  DO UNTIL (K nearest neighbors found)
    Compute the distance from y to x_i
    Find the K nearest neighbors
  END DO UNTIL
  For i = 1 to C
    Compute the membership of y in the i-th class (m > 1):

      u_i(y) = \frac{\sum_{j=1}^{K} u_{ij} \, (1 / \|y - x_j\|^{2/(m-1)})}{\sum_{j=1}^{K} 1 / \|y - x_j\|^{2/(m-1)}}

  END FOR
  Classify y to the class with the maximum u_i(y)
END

As m → 1, the closer neighbors are weighted far more heavily than those farther away; as m → ∞, the neighbors are weighted more evenly.
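A minimal Python sketch (assumed, not the author's code) of the fuzzy K-NN membership computation above. U_train[i, j] holds u_ij, the membership of labeled sample x_j in class i, and m > 1 plays the same role as the fuzzifier in FCM; the data values are hypothetical.

import numpy as np

def fuzzy_knn(X, U_train, y, K=3, m=2.0, eps=1e-12):
    d = np.linalg.norm(X - y, axis=1)                 # distances from y to all x_j
    nn = np.argsort(d)[:K]                            # the K nearest neighbors
    w = 1.0 / (d[nn] ** (2.0 / (m - 1.0)) + eps)      # weights 1 / ||y - x_j||^(2/(m-1))
    u_y = (U_train[:, nn] * w).sum(axis=1) / w.sum()  # u_i(y) for every class i
    return int(np.argmax(u_y)), u_y                   # crisp label and class memberships

# Example usage: 2 classes, 5 labeled samples with given memberships
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.2, 0.1]])
U_train = np.array([[0.9, 0.8, 0.1, 0.2, 0.7],        # memberships in class 0
                    [0.1, 0.2, 0.9, 0.8, 0.3]])       # memberships in class 1
label, u = fuzzy_knn(X, U_train, np.array([0.15, 0.15]))
print(label, u)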
Fuzzy Nearest Prototype Classification
Crisp and Fuzzy Nearest Prototype Classification
[Figure: the prototypes of Class 1 and Class 2 and the decision boundary between them.]
• Crisp Version
W = {Z_1, Z_2, ..., Z_C}: a set of C prototype vectors
BEGIN
  Input x, a vector to be classified
  For all i = 1, ..., C
    Find the distance from Z_i to x
  END For
  Determine the minimum distance and classify x to that class
END
• Fuzzy Version
W = {Z_1, Z_2, ..., Z_C}: a set of C prototype vectors
BEGIN
  Input x, a vector to be classified
  For all i = 1, ..., C
    Find the distance from Z_i to x
  END For
  For all i = 1, ..., C
    Compute

      u_i(x) = \frac{1 / \|x - Z_i\|^{2/(m-1)}}{\sum_{j=1}^{C} 1 / \|x - Z_j\|^{2/(m-1)}}

  END For
END
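A minimal Python sketch (assumed) of fuzzy nearest-prototype classification: each class i is represented by a single prototype Z_i, and u_i(x) follows the formula above; the prototypes and the query point are hypothetical.

import numpy as np

def fuzzy_nearest_prototype(Z, x, m=2.0, eps=1e-12):
    d = np.linalg.norm(Z - x, axis=1)             # distance from x to each prototype Z_i
    w = 1.0 / (d ** (2.0 / (m - 1.0)) + eps)      # 1 / ||x - Z_i||^(2/(m-1))
    u = w / w.sum()                               # memberships u_i(x), summing to 1
    return int(np.argmax(u)), u                   # crisp class (nearest prototype) and u

# Example usage with two hypothetical prototypes
Z = np.array([[0.0, 0.0], [1.0, 1.0]])
print(fuzzy_nearest_prototype(Z, np.array([0.2, 0.3])))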
Fuzzy Perceptron
Crisp Single-Layer Perceptron (Two-class problem)
• Find the linear decision boundary of separable data
[Figure: two linearly separable classes and the linear decision boundary between them.]

X = {x_1, x_2, ..., x_N | x_i ∈ R^d, i = 1, 2, ..., N}: set of N d-dimensional input vectors
C = {c_{x_1}, c_{x_2}, ..., c_{x_N} | c_{x_i} ∈ {+1, -1}, i = 1, 2, ..., N}: associated label set
X_{+1} = {x_i | c_{x_i} = +1} and X_{-1} = {x_i | c_{x_i} = -1}: sets of positive and negative examples
|X_{+1}| = N_{+1} and |X_{-1}| = N_{-1}, with N_{+1} + N_{-1} = N
x_i = (x_i^{(1)}, x_i^{(2)}, ..., x_i^{(d)}, 1)^t: augmented input vector for each x_i
w = (w_1, w_2, ..., w_d, w_{d+1})^t: linear separating hyperplane

Learning rule:

  w(t+1) = w(t) + η (c_{x(t)} - C_{w(t)}(x(t))) x(t)

  C_{w(t)}(x(t)) = +1 if sgn(w^t(t) x(t)) ≥ 0, and -1 otherwise
Fuzzy Perceptron
Fuzzy Perceptron
u_k(x_i) ∈ [0, 1]: degree of belongingness of x_i to class k
For the two-class case, u_{+1}(x_i) + u_{-1}(x_i) = 1.

For x_i ∈ X_{+1}:

  u_{+1}(x_i) = 0.5 + \frac{\exp(f (d_{-1}(x_i) - d_{+1}(x_i)) / d) - \exp(-f)}{2(\exp(f) - \exp(-f))},   u_{-1}(x_i) = 1 - u_{+1}(x_i)

For x_i ∈ X_{-1}:

  u_{-1}(x_i) = 0.5 + \frac{\exp(f (d_{+1}(x_i) - d_{-1}(x_i)) / d) - \exp(-f)}{2(\exp(f) - \exp(-f))},   u_{+1}(x_i) = 1 - u_{-1}(x_i)

where
  d_{+1}(x_i) = distance from x_i to the mean of the positive class
  d_{-1}(x_i) = distance from x_i to the mean of the negative class
  d = distance between the mean of the positive class and the mean of the negative class
  f = a constant
Fuzzy Perceptron
Fuzzy Perceptron
Learning rule:

  w(t+1) = w(t) + η |u_{+1}(x(t)) - u_{-1}(x(t))|^m (c_{x(t)} - C_{w(t)}(x(t))) x(t)

Advantages
• Generalizes the crisp algorithm: with crisp memberships, |u_{+1}(x(t)) - u_{-1}(x(t))| = 1 and the rule reduces to the crisp perceptron.
• Elegant termination in the non-separable case, whereas the crisp perceptron does not terminate in finite time.
Fuzzy Perceptron
Termination of FP
• If the misclassifications are all caused by very fuzzy data, then terminate the learning.

  Very fuzzy data: u_{+1}(x_i) ∈ [0.5 - ε, 0.5 + ε], with

    ε = \frac{1 - \exp(-f)}{2(\exp(f) - \exp(-f))}

• Note: FP can be combined with kernel-based methods (J.H. Chen and C.S. Chen, IEEE Trans. on Neural Networks, 2002).
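A minimal Python sketch (assumed, not the slides' code) of the fuzzy perceptron: each labeled sample receives a membership in its own class from the exponential formula above, and the crisp perceptron update is scaled by |u_{+1}(x) - u_{-1}(x)|^m. The toy data, learning rate η, fuzzifier m, and constant f are all hypothetical.

import numpy as np

def fuzzy_memberships(X, labels, f=2.0):
    # Membership of every sample in the positive class, following the formula above
    mp, mn = X[labels == 1].mean(axis=0), X[labels == -1].mean(axis=0)
    d = np.linalg.norm(mp - mn)
    dp = np.linalg.norm(X - mp, axis=1)            # distance to the positive-class mean
    dn = np.linalg.norm(X - mn, axis=1)            # distance to the negative-class mean
    denom = 2.0 * (np.exp(f) - np.exp(-f))
    diff = np.where(labels == 1, dn - dp, dp - dn) # distance difference toward the own class
    u_own = 0.5 + (np.exp(f * diff / d) - np.exp(-f)) / denom
    return np.where(labels == 1, u_own, 1.0 - u_own)

def train_fuzzy_perceptron(X, labels, m=2.0, eta=0.1, f=2.0, max_epochs=100):
    u_plus = fuzzy_memberships(X, labels, f)
    Xa = np.hstack([X, np.ones((len(X), 1))])      # augmented inputs (bias component)
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        changed = False
        for xi, ci, ui in zip(Xa, labels, u_plus):
            pred = 1 if w @ xi >= 0 else -1
            if pred != ci:                         # misclassified: fuzzy-scaled update
                w += eta * abs(2.0 * ui - 1.0) ** m * (ci - pred) * xi
                changed = True
        if not changed:
            break
    return w

X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
labels = np.array([1, 1, -1, -1])
print(train_fuzzy_perceptron(X, labels))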
Fuzzy C-Mean
Clustering Objective
• The aim of the iterative algorithm is to decrease the value of an objective function.

Notations
• Samples: x_1, x_2, ..., x_n
• Prototypes: p_1, p_2, ..., p_k
• Squared L2-distance:

  \|x_i - p_j\|^2 = \sum_{k=1}^{d} (x_{ik} - p_{jk})^2
Fuzzy C-Mean
Crisp objective:

  \sum_{i=1}^{n} \min_{j \in \{1,2,...,k\}} \|x_i - p_j\|^2

Fuzzy objective:

  \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^m \|x_i - p_j\|^2
Fuzzy C-Mean
Crisp C-Mean Algorithm
• Initialize k seeds of prototypes p_1, p_2, ..., p_k
• Grouping:
  - Assign samples to their nearest prototypes
  - Form non-overlapping clusters out of these samples
• Centering:
  - The centers of the clusters become the new prototypes
• Repeat the grouping and centering steps until convergence (a sketch follows below)
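A minimal Python sketch (assumed) of the crisp c-means (k-means) loop described above, alternating the grouping and centering steps; the data and the number of clusters are hypothetical.

import numpy as np

def crisp_c_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    p = X[rng.choice(len(X), size=k, replace=False)]          # initial prototypes (seeds)
    assign = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - p[None, :, :]) ** 2).sum(-1)   # squared distances, shape (n, k)
        assign = d2.argmin(axis=1)                             # grouping step
        new_p = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else p[j]
                          for j in range(k)])                  # centering step
        if np.allclose(new_p, p):
            break                                              # prototypes stopped moving
        p = new_p
    return p, assign

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2)) + 4.0])
print(crisp_c_means(X, k=2)[0])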
Fuzzy C-Mean
Crisp C-Mean Algorithm
• Grouping: assigning samples to their nearest prototypes helps to decrease the objective

  \sum_{i=1}^{n} \min_{j \in \{1,2,...,k\}} \|x_i - p_j\|^2

• Centering: also helps to decrease the above objective, because

  \sum_{i=1}^{m} \|y_i - w\|^2 = \sum_{i=1}^{m} \|y_i - \bar{y}\|^2 + m \|\bar{y} - w\|^2 \ge \sum_{i=1}^{m} \|y_i - \bar{y}\|^2

  and equality holds only if

  w = \bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i
Fuzzy C-Mean
Membership matrix U (c × n)
• u_ij is the grade of membership of sample j with respect to prototype i

Crisp membership:
  u_ij = 1, if \|p_i - x_j\|^2 = \min_k \|p_k - x_j\|^2
  u_ij = 0, otherwise

Fuzzy membership:
  \sum_{i=1}^{c} u_{ij} = 1,  for j = 1, ..., n
Fuzzy C-Mean
Objective function of FCM

  J = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2 = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|p_i - x_j\|^2

Introducing the Lagrange multiplier λ for the constraint \sum_{i=1}^{c} u_{ij} = 1, the objective function for a fixed sample j becomes:

  J = \sum_{i=1}^{c} u_{ij}^m d_{ij}^2 + \lambda ( \sum_{i=1}^{c} u_{ij} - 1 )
Fuzzy C-Mean
Setting the partial derivatives to zero:

  \frac{\partial J}{\partial \lambda} = \sum_{k=1}^{c} u_{kj} - 1 = 0

  \frac{\partial J}{\partial u_{ij}} = m u_{ij}^{m-1} d_{ij}^2 + \lambda = 0

From the 2nd equation,

  u_{ij} = \left( \frac{-\lambda}{m d_{ij}^2} \right)^{1/(m-1)}

From this fact and the 1st equation,

  1 = \sum_{k=1}^{c} u_{kj} = \left( \frac{-\lambda}{m} \right)^{1/(m-1)} \sum_{k=1}^{c} \left( \frac{1}{d_{kj}^2} \right)^{1/(m-1)}
  \quad\Rightarrow\quad
  \left( \frac{-\lambda}{m} \right)^{1/(m-1)} = \frac{1}{\sum_{k=1}^{c} (1/d_{kj}^2)^{1/(m-1)}}
Fuzzy C-Mean
Therefore, the membership updating rule is

  u_{ij} = \left( \frac{-\lambda}{m} \right)^{1/(m-1)} \left( \frac{1}{d_{ij}^2} \right)^{1/(m-1)}
         = \frac{(1/d_{ij}^2)^{1/(m-1)}}{\sum_{k=1}^{c} (1/d_{kj}^2)^{1/(m-1)}}
         = \frac{1}{\sum_{k=1}^{c} (d_{ij}^2 / d_{kj}^2)^{1/(m-1)}}
Fuzzy C-Mean
Setting the derivative of J with respect to p_i to zero:

  0 = \frac{\partial J}{\partial p_i}
    = \frac{\partial}{\partial p_i} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|p_i - x_j\|^2
    = \sum_{j=1}^{n} u_{ij}^m \frac{\partial}{\partial p_i} \|p_i - x_j\|^2
    = \sum_{j=1}^{n} u_{ij}^m \frac{\partial}{\partial p_i} (p_i - x_j)^T (p_i - x_j)
    = 2 \sum_{j=1}^{n} u_{ij}^m (p_i - x_j)
Fuzzy C-Mean
Update rule of p_i:

  \frac{\partial J}{\partial p_i} = 2 \sum_{j=1}^{n} u_{ij}^m (p_i - x_j) = 0
  \quad\Rightarrow\quad
  p_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}

To summarize:

  u_{ij} = \frac{1}{\sum_{k=1}^{c} (d_{ij} / d_{kj})^{2/(m-1)}},
  \qquad
  p_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}
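A minimal Python sketch (assumed, not the author's code) of the alternating FCM updates summarized above: the prototype update p_i followed by the membership update u_ij; the toy data are hypothetical.

import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                   # each column sums to 1
    for _ in range(max_iter):
        Um = U ** m
        P = (Um @ X) / Um.sum(axis=1, keepdims=True)     # prototype update p_i
        d = np.linalg.norm(X[None, :, :] - P[:, None, :], axis=2) + 1e-12   # d_ij, shape (c, n)
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))        # (d_ij / d_kj)^(2/(m-1))
        U_new = 1.0 / ratio.sum(axis=1)                  # membership update u_ij
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return P, U

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(size=(30, 2)), rng.normal(size=(30, 2)) + 5.0])
P, U = fcm(X, c=2)
print(P)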
Fuzzy C-Mean
[Figure: clustering results of K-means and fuzzy c-means compared side by side on the same data.]
Fuzzy C-Mean
Fuzzy C-Mean
• Gustafson-Kessel Algorithm
• Cluster Validity to Determine the Number of Clusters
• Extraction of a Rule Base from Fuzzy Clusters
Possibilistic C-Mean
Problem of FCM
• Equal evidence is treated the same as ignorance.
[Figure: two clusters with a point A lying midway between them and an outlier B far from both. Because of the constraint \sum_i u_{ij} = 1, FCM assigns u_1(A) = u_2(A) = 0.5 and u_1(B) = u_2(B) = 0.5, so a point supported by both clusters and a noise point supported by neither receive identical memberships.]
Possibilistic C-Mean
Objective Function of Fuzzy C-Mean

  J = \sum_{i=1}^{c} u_{ij}^m d_{ij}^2 + \lambda ( \sum_{i=1}^{c} u_{ij} - 1 )

• Constraint from Ruspini: the sum of the memberships of a datum over all classes should be 1.
• This is too restrictive a condition for noisy data.
Objective Function of PCM

  J = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^m d_{ij}^2 + \sum_{i=1}^{C} \eta_i \sum_{j=1}^{N} (1 - u_{ij})^m

• The first term minimizes the intra-cluster distances.
• The second term makes the memberships as large as possible (it penalizes small u_{ij}), without any sum-to-one constraint.
Possibilistic C-Mean
Necessary Condition
  u_{ij} = \frac{1}{1 + (d_{ij}^2 / \eta_i)^{1/(m-1)}}

Determination of η_i
• Average cluster distance:

  \eta_i = K \frac{\sum_{j=1}^{N} u_{ij}^m d_{ij}^2}{\sum_{j=1}^{N} u_{ij}^m}

• Based on an alpha-cut:

  \eta_i = \frac{\sum_{x_j \in (\Lambda_i)_\alpha} d_{ij}^2}{|(\Lambda_i)_\alpha|}, where (\Lambda_i)_\alpha is the α-cut of cluster \Lambda_i.
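A minimal Python sketch (assumed) of one PCM iteration using the conditions above: the typicality update for u_ij and the average-cluster-distance rule for η_i (with K = 1), followed by the usual center update. Seeding the possibilistic partition is a design choice; an FCM run is a common initialization, and the crude seeding below is only for illustration.

import numpy as np

def pcm_step(X, P, U, m=2.0, K=1.0):
    d2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(axis=2) + 1e-12   # d_ij^2, shape (c, n)
    Um = U ** m
    eta = K * (Um * d2).sum(axis=1) / Um.sum(axis=1)                  # eta_i per cluster
    U_new = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))    # typicality update
    Um_new = U_new ** m
    P_new = (Um_new @ X) / Um_new.sum(axis=1, keepdims=True)          # center update c_i
    return P_new, U_new, eta

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 2)), rng.normal(size=(30, 2)) + 5.0])
P = X[rng.choice(len(X), size=2, replace=False)]      # crude initial centers (illustrative)
U = np.full((2, len(X)), 0.5)                         # crude initial typicalities
for _ in range(30):
    P, U, eta = pcm_step(X, P, U)
print(P, eta)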
Possibilistic C-Mean
[Figure: the PCM membership u_ij plotted against d_{ij}^2 / \eta_i; the membership equals 0.5 at d_{ij}^2 = \eta_i and decays toward 0 as the ratio grows.]
Possibilistic C-Mean
Cluster centers:

  c_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}

Inner-product-induced distance:
• d_{ij}^2 = (x_j - c_i)^t A_i (x_j - c_i)
• Gustafson-Kessel (see previous page)

Spherical shell cluster:

  d_{ij}^2 = ( \|x_j - c_i\| - r_i )^2

Approximate prototype (for shell clusters):

  p_i = -\frac{1}{2} H_i^{-1} w_i,
  \qquad
  H_i = \sum_{j=1}^{N} u_{ij}^m y_j y_j^t,
  \qquad
  w_i = \sum_{j=1}^{N} u_{ij}^m (x_j^t x_j) y_j,

  where y_j = (x_j^t, 1)^t and p_i = (-2 c_i^t, c_i^t c_i - r_i^2)^t.
Possibilistic C-Mean
2-Pass Algorithm:
• Pass 1:
  - Initialize the PC partition
  - DO UNTIL (change in the PC partition is small)
    Update the prototypes
    Update the PC partition using average cluster distances
• Pass 2:
  - Start from the resulting PC partition
  - DO UNTIL (change in the PC partition is small)
    Update the prototypes
    Update the PC partition using alpha-cut distances
Possibilistic C-Mean
Advantages
• Robust to noisy data
• Potentially useful for extracting a fuzzy rule base
[Figure: FCM-based C-shell clustering compared with PCM-based C-shell clustering on the same data.]
Other Notion of Distance
Other Notion of Distance
• Weights on the features:

  d^2(v_i, x_k) = \sum_{s=1}^{p} \omega_{is}^{t} | x_k^{(s)} - v_i^{(s)} |^2,
  \qquad
  \sum_{s=1}^{p} \omega_{is} = 1, \quad i \in \{1, 2, ..., C\}

• Optimal weights:

  \omega_{is} = \frac{1}{\sum_{r=1}^{p} \left( \frac{\sum_{k=1}^{n} u_{ik}^m | x_k^{(s)} - v_i^{(s)} |^2}{\sum_{k=1}^{n} u_{ik}^m | x_k^{(r)} - v_i^{(r)} |^2} \right)^{1/(t-1)}}
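A minimal Python sketch (assumed) of the feature-weighted distance and the weight update above, written for a single cluster i with center v and memberships u; the exponent t, the data, and the memberships are hypothetical.

import numpy as np

def weighted_sq_distance(x, v, w, t=2.0):
    # d^2(v, x) = sum_s w_s^t * |x^(s) - v^(s)|^2
    return float((w ** t * (x - v) ** 2).sum())

def optimal_weights(X, v, u, m=2.0, t=2.0, eps=1e-12):
    # Per-feature weighted scatter of the cluster: sum_k u_k^m * |x_k^(s) - v^(s)|^2
    scatter = ((u ** m)[:, None] * (X - v) ** 2).sum(axis=0) + eps
    ratio = (scatter[:, None] / scatter[None, :]) ** (1.0 / (t - 1.0))  # (p, p) matrix of ratios
    return 1.0 / ratio.sum(axis=1)                                      # weights w_s, summing to 1

# Example usage: feature 0 is informative (small scatter), feature 1 is mostly noise
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0.0, 0.3, 50), rng.normal(0.0, 3.0, 50)])
v, u = X.mean(axis=0), np.full(50, 0.9)
w = optimal_weights(X, v, u)
print(w, weighted_sq_distance(X[0], v, w))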
Other Notion of Distance
[Figure: FCM with the Euclidean distance compared with FCM with the adaptive (feature-weighted) distance on the same data.]