Pattern Classification
All materials in these slides were taken from
Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000,
with the permission of the authors and the publisher.
Chapter 2 (Part 3)
Bayesian Decision Theory
(Sections 2.6, 2.9)
• Discriminant Functions for the Normal Density
• Bayes Decision Theory – Discrete Features
Discriminant Functions for the Normal Density
• We saw that minimum error-rate classification can be achieved by the discriminant function

  g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)
• Case of the multivariate normal density:

  g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
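As a concrete illustration (not from the text), this discriminant can be evaluated numerically. The sketch below assumes NumPy; the function name and the toy class parameters are illustrative.

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^-1 (x-mu) - d/2 ln(2 pi) - 1/2 ln|Sigma| + ln P(omega_i)."""
    d = mu.size
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(sigma, diff)
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.linalg.slogdet(sigma)[1]
            + np.log(prior))

# Toy two-class example: pick the class with the larger discriminant value.
x = np.array([0.5, 1.0])
params = [(np.array([0.0, 0.0]), np.eye(2), 0.5),
          (np.array([2.0, 2.0]), np.eye(2), 0.5)]
scores = [gaussian_discriminant(x, mu, sigma, p) for mu, sigma, p in params]
print(int(np.argmax(scores)))   # index of the chosen class
```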
• Case \Sigma_i = \sigma^2 I  (I stands for the identity matrix)

  g_i(x) = w_i^t x + w_{i0}   (linear discriminant function)

  where:

  w_i = \frac{\mu_i}{\sigma^2}, \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(\omega_i)

  (w_{i0} is called the threshold for the ith category!)
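A minimal sketch (names and numbers illustrative, not from the text) of building these weights and classifying with the largest g_i(x):

```python
import numpy as np

def linear_machine(means, sigma2, priors):
    """w_i = mu_i / sigma^2,  w_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(omega_i)."""
    ws = [mu / sigma2 for mu in means]
    w0s = [-(mu @ mu) / (2.0 * sigma2) + np.log(p) for mu, p in zip(means, priors)]
    return ws, w0s

def classify(x, ws, w0s):
    return int(np.argmax([w @ x + w0 for w, w0 in zip(ws, w0s)]))

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
ws, w0s = linear_machine(means, sigma2=1.0, priors=[0.5, 0.5])
print(classify(np.array([1.0, 1.0]), ws, w0s))   # 0: the point is closer to the first mean
```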
• A classifier that uses linear discriminant functions is called a "linear machine"
• The decision surfaces for a linear machine are pieces of hyperplanes defined by:

  g_i(x) = g_j(x)
• The hyperplane separating R_i and R_j passes through the point

  x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln\frac{P(\omega_i)}{P(\omega_j)} \,(\mu_i - \mu_j)

  and is always orthogonal to the line linking the means!

  If P(\omega_i) = P(\omega_j), then x_0 = \frac{1}{2}(\mu_i + \mu_j).
• Case \Sigma_i = \Sigma  (the covariance matrices of all classes are identical but arbitrary!)

• Hyperplane separating R_i and R_j:

  x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} \,(\mu_i - \mu_j)

  (the hyperplane separating R_i and R_j is generally not orthogonal to the line between the means!)
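A small sketch (illustrative, not from the text) of computing x_0 for the shared-covariance case; with \Sigma = \sigma^2 I it reduces to the formula of the previous case.

```python
import numpy as np

def boundary_point(mu_i, mu_j, sigma, prior_i, prior_j):
    """Point x0 on the hyperplane g_i(x) = g_j(x) when all classes share Sigma."""
    diff = mu_i - mu_j
    mahal_sq = diff @ np.linalg.solve(sigma, diff)   # (mu_i - mu_j)^t Sigma^-1 (mu_i - mu_j)
    return 0.5 * (mu_i + mu_j) - (np.log(prior_i / prior_j) / mahal_sq) * diff

mu_i, mu_j = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
print(boundary_point(mu_i, mu_j, np.eye(2), 0.5, 0.5))   # equal priors: the midpoint [0, 0]
```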
• Case \Sigma_i = arbitrary (the covariance matrices are different for each category)

  g_i(x) = x^t W_i x + w_i^t x + w_{i0}

  where:

  W_i = -\frac{1}{2}\Sigma_i^{-1}

  w_i = \Sigma_i^{-1}\mu_i

  w_{i0} = -\frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

  (The decision surfaces are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids.)
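A sketch (illustrative names, not from the text) of forming W_i, w_i, w_{i0} and evaluating the quadratic discriminant:

```python
import numpy as np

def quadratic_params(mu, sigma, prior):
    """W_i = -1/2 Sigma_i^-1,  w_i = Sigma_i^-1 mu_i,
    w_i0 = -1/2 mu_i^t Sigma_i^-1 mu_i - 1/2 ln|Sigma_i| + ln P(omega_i)."""
    sigma_inv = np.linalg.inv(sigma)
    W = -0.5 * sigma_inv
    w = sigma_inv @ mu
    w0 = -0.5 * mu @ sigma_inv @ mu - 0.5 * np.linalg.slogdet(sigma)[1] + np.log(prior)
    return W, w, w0

def g(x, W, w, w0):
    return x @ W @ x + w @ x + w0
```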
Bayes Decision Theory – Discrete Features

• Components of x are binary or integer valued; x can take only one of m discrete values v_1, v_2, ..., v_m.

• Case of independent binary features in a 2-category problem:
  Let x = [x_1, x_2, ..., x_d]^t, where each x_i is either 0 or 1, with probabilities

  p_i = P(x_i = 1 \mid \omega_1)
  q_i = P(x_i = 1 \mid \omega_2)
• The discriminant function in this case is:

  g(x) = \sum_{i=1}^{d} w_i x_i + w_0

  where:

  w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \quad i = 1, \ldots, d

  and:

  w_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}

  Decide \omega_1 if g(x) > 0 and \omega_2 if g(x) \le 0.
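The weights can be computed directly from the p_i and q_i. A minimal sketch with illustrative values (not from the text):

```python
import numpy as np

def binary_discriminant(x, p, q, prior1, prior2):
    """g(x) = sum_i w_i x_i + w_0 for independent binary features."""
    x, p, q = map(np.asarray, (x, p, q))
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)
    return float(w @ x + w0)

gx = binary_discriminant(x=[1, 0, 1], p=[0.8, 0.5, 0.9], q=[0.2, 0.5, 0.3],
                         prior1=0.5, prior2=0.5)
print("omega_1" if gx > 0 else "omega_2")   # decide omega_1 if g(x) > 0
```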
Bayesian Belief Networks

• Features are often not statistically independent; they may have causal relationships
• Such causal dependence is represented by Bayesian belief nets (also called causal networks or belief nets)
(Figure: an example network in which x1 and x3 are independent.)
Structure

• Each node represents one discrete variable
• Links from parent nodes to child nodes indicate direct influence
• Each node has a conditional probability table, set by an expert or learned from a training set
• (Learning the tables is not discussed here)
Examples
Evidence e
Ex. 4. Belief Network for Fish

Network structure: A (season) and B (locale) are parents of X (fish); X is the parent of C (lightness) and D (width).

P(a):
  a1 = winter   0.25
  a2 = spring   0.25
  a3 = summer   0.25
  a4 = autumn   0.25

P(b):
  b1 = north Atlantic   0.6
  b2 = south Atlantic   0.4

X: x1 = salmon, x2 = sea bass

P(x | a, b):
            x1     x2
  a1 b1    0.5    0.5
  a1 b2    0.7    0.3
  a2 b1    0.6    0.4
  a2 b2    0.8    0.2
  a3 b1    0.4    0.6
  a3 b2    0.1    0.9
  a4 b1    0.2    0.8
  a4 b2    0.3    0.7

P(c | x)   (c1 = light, c2 = medium, c3 = dark):
            c1     c2     c3
  x1       0.6    0.2    0.2
  x2       0.2    0.3    0.5

P(d | x)   (d1 = wide, d2 = thin):
            d1     d2
  x1       0.3    0.7
  x2       0.6    0.4
Belief Network for Fish

• The fish was caught in the summer in the north Atlantic, and is a sea bass that is dark and thin:

  P(a3, b1, x2, c3, d2)
  = P(a3) P(b1) P(x2 | a3, b1) P(c3 | x2) P(d2 | x2)
  = 0.25 × 0.6 × 0.4 × 0.5 × 0.4
  = 0.012
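The joint probability factorizes along the network, P(a, b, x, c, d) = P(a) P(b) P(x|a,b) P(c|x) P(d|x). A minimal sketch of that chain-rule product; the factor values are the ones used in the slide's worked computation, and the variable names are illustrative.

```python
# Chain-rule evaluation on the fish network, using the slide's factor values.
factors = {
    "P(a3)": 0.25,
    "P(b1)": 0.6,
    "P(x2|a3,b1)": 0.4,   # value as used in the slide's worked example
    "P(c3|x2)": 0.5,
    "P(d2|x2)": 0.4,
}

p = 1.0
for name, value in factors.items():
    p *= value

print(p)   # 0.012, matching the slide
```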
Query: given a light fish caught in the south Atlantic, which kind of fish is it?
Normalize
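The computation behind these two slides can be sketched as follows: for each candidate x, sum the joint probability over the unobserved variables (season a and width d), then normalize. This is a minimal illustrative sketch using the CPTs of Ex. 4; the function name is not from the text.

```python
from itertools import product

# CPTs from Ex. 4 (as read from the table above)
P_a = {"a1": 0.25, "a2": 0.25, "a3": 0.25, "a4": 0.25}
P_b = {"b1": 0.6, "b2": 0.4}
P_x_ab = {("a1", "b1"): {"x1": 0.5, "x2": 0.5}, ("a1", "b2"): {"x1": 0.7, "x2": 0.3},
          ("a2", "b1"): {"x1": 0.6, "x2": 0.4}, ("a2", "b2"): {"x1": 0.8, "x2": 0.2},
          ("a3", "b1"): {"x1": 0.4, "x2": 0.6}, ("a3", "b2"): {"x1": 0.1, "x2": 0.9},
          ("a4", "b1"): {"x1": 0.2, "x2": 0.8}, ("a4", "b2"): {"x1": 0.3, "x2": 0.7}}
P_c_x = {"x1": {"c1": 0.6, "c2": 0.2, "c3": 0.2},
         "x2": {"c1": 0.2, "c2": 0.3, "c3": 0.5}}
P_d_x = {"x1": {"d1": 0.3, "d2": 0.7}, "x2": {"d1": 0.6, "d2": 0.4}}

def posterior_x(evidence_b, evidence_c):
    """P(x | b, c): sum the joint over the unobserved a and d, then normalize."""
    scores = {}
    for x in ("x1", "x2"):
        total = 0.0
        for a, d in product(P_a, P_d_x[x]):
            total += (P_a[a] * P_b[evidence_b] * P_x_ab[(a, evidence_b)][x]
                      * P_c_x[x][evidence_c] * P_d_x[x][d])
        scores[x] = total
    z = sum(scores.values())                 # normalization constant
    return {x: s / z for x, s in scores.items()}

print(posterior_x("b2", "c1"))   # light fish, south Atlantic: salmon vs. sea bass
```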
Conditionally Independent
Medical Application

• Medical diagnosis
• Uppermost nodes: biological agent (virus or bacteria)
• Intermediate nodes: diseases (flu or emphysema)
• Lowermost nodes: symptoms (high temperature or coughing)
• By entering measured values, the network finds the most likely disease or cause
Exercise 50 (based on Ex. 4)

• (a) December 20, north Atlantic, thin: P(a1) = P(a4) = 0.5, P(b1) = 1, P(d2) = 1. Which fish? Error rate?
• (b) Thin, medium lightness. Which season? With what probability?
• (c) Thin, medium lightness, north Atlantic. Which season? With what probability?