ECE 8443 – Pattern Recognition
LECTURE 04: PERFORMANCE BOUNDS
• Objectives:
Typical Examples
Performance Bounds
ROC Curves
• Resources:
D.H.S.: Chapter 2 (Part 3)
V.V.: Chernoff Bound
J.G.: Bhattacharyya
T.T.: ROC Curves
NIST: DET Curves
Two-Category Case (Review)
• A classifier that places a pattern in one of two classes is often referred to as a
dichotomizer.
• We can reshape the decision rule:
$$g(\mathbf{x}) \equiv g_1(\mathbf{x}) - g_2(\mathbf{x}) > 0 \;\Longleftrightarrow\; g_1(\mathbf{x}) > g_2(\mathbf{x})$$
• Two convenient choices of discriminant are the difference of posteriors, or the log-likelihood ratio plus the log-prior ratio:
$$g(\mathbf{x}) = P(\omega_1 \mid \mathbf{x}) - P(\omega_2 \mid \mathbf{x})$$
$$g(\mathbf{x}) = \ln\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
• A dichotomizer can be viewed as a machine that computes a single
discriminant function and classifies x according to the sign (e.g., support
vector machines).
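As a concrete illustration, here is a minimal Python sketch of a dichotomizer built from the log-likelihood-ratio form above; the univariate Gaussian class parameters, priors, and test point are invented for illustration, not values from the lecture.

```python
# Minimal dichotomizer sketch for two univariate Gaussian classes with equal
# variance. All parameters (mu1, mu2, var, priors) are illustrative assumptions.
import math

def g(x, mu1=0.0, mu2=2.0, var=1.0, p1=0.5, p2=0.5):
    """g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)]; choose class 1 if g(x) > 0."""
    log_likelihood_ratio = ((x - mu2) ** 2 - (x - mu1) ** 2) / (2.0 * var)
    return log_likelihood_ratio + math.log(p1 / p2)

print("omega_1" if g(0.5) > 0 else "omega_2")  # x = 0.5 lies closer to mu1
```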
Unconstrained or “Full” Covariance (Review)
Threshold Decoding (Review)
• This has a simple geometric interpretation: choose $\omega_i$ if
$$\|\mathbf{x} - \boldsymbol{\mu}_j\|^2 - \|\mathbf{x} - \boldsymbol{\mu}_i\|^2 \ge 2\sigma^2 \ln\frac{P(\omega_j)}{P(\omega_i)} \qquad (\boldsymbol{\Sigma}_i = \sigma^2\mathbf{I})$$
• The decision boundary when the priors are equal and the support regions are spherical lies exactly halfway between the means (Euclidean distance).
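A minimal sketch of this threshold rule, assuming made-up class means, unit variance, and equal priors:

```python
# Sketch of the spherical-covariance (Sigma = sigma^2 I) decision rule above.
# The means, variance, and priors are illustrative assumptions.
import numpy as np

def choose_omega_i(x, mu_i, mu_j, sigma_sq, p_i, p_j):
    """True if ||x - mu_j||^2 - ||x - mu_i||^2 >= 2 sigma^2 ln[P(w_j)/P(w_i)]."""
    lhs = np.sum((x - mu_j) ** 2) - np.sum((x - mu_i) ** 2)
    return lhs >= 2.0 * sigma_sq * np.log(p_j / p_i)

x = np.array([1.0, 0.0])
print(choose_omega_i(x, np.zeros(2), np.array([3.0, 0.0]), 1.0, 0.5, 0.5))  # True
```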
General Case for Gaussian Classifiers
$$g_i(\mathbf{x}) - g_j(\mathbf{x}) = -\frac{1}{2}\left[(\mathbf{x}-\boldsymbol{\mu}_i)^t \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - (\mathbf{x}-\boldsymbol{\mu}_j)^t \boldsymbol{\Sigma}_j^{-1}(\mathbf{x}-\boldsymbol{\mu}_j)\right] + \frac{1}{2}\ln\frac{|\boldsymbol{\Sigma}_j|}{|\boldsymbol{\Sigma}_i|} + \ln\frac{P(\omega_i)}{P(\omega_j)}$$
$$= \mathbf{x}^t \mathbf{A}\mathbf{x} + \mathbf{b}^t \mathbf{x} + c$$
where:
$$\mathbf{A} = \frac{1}{2}\left(\boldsymbol{\Sigma}_j^{-1} - \boldsymbol{\Sigma}_i^{-1}\right)$$
$$\mathbf{b} = \boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i - \boldsymbol{\Sigma}_j^{-1}\boldsymbol{\mu}_j$$
$$c = -\frac{1}{2}\left(\boldsymbol{\mu}_i^t \boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i - \boldsymbol{\mu}_j^t \boldsymbol{\Sigma}_j^{-1}\boldsymbol{\mu}_j\right) + \frac{1}{2}\ln\frac{|\boldsymbol{\Sigma}_j|}{|\boldsymbol{\Sigma}_i|} + \ln\frac{P(\omega_i)}{P(\omega_j)}$$
The decision surface $g_i(\mathbf{x}) - g_j(\mathbf{x}) = 0$ is therefore a general quadratic in $\mathbf{x}$.
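As a sanity check on these formulas, the following sketch assembles A, b, and c for two hypothetical Gaussian classes and evaluates the discriminant; every parameter value is an illustrative assumption.

```python
# Sketch: build the quadratic discriminant g_i - g_j = x'Ax + b'x + c from two
# Gaussian class models. All means, covariances, and priors are made up.
import numpy as np

def quadratic_discriminant(mu_i, cov_i, p_i, mu_j, cov_j, p_j):
    ci, cj = np.linalg.inv(cov_i), np.linalg.inv(cov_j)
    A = 0.5 * (cj - ci)
    b = ci @ mu_i - cj @ mu_j
    c = (-0.5 * (mu_i @ ci @ mu_i - mu_j @ cj @ mu_j)
         + 0.5 * np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i))
         + np.log(p_i / p_j))
    return lambda x: x @ A @ x + b @ x + c

g = quadratic_discriminant(np.zeros(2), np.eye(2), 0.5,
                           np.array([2.0, 0.0]), 2.0 * np.eye(2), 0.5)
print("omega_i" if g(np.array([0.5, 0.0])) > 0 else "omega_j")  # omega_i
```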
Identity Covariance
• Case: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$:
$$\mathbf{A} = \mathbf{0}, \qquad \mathbf{b} = \frac{1}{\sigma^2}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j), \qquad c = \frac{1}{2\sigma^2}\left(\boldsymbol{\mu}_j^t\boldsymbol{\mu}_j - \boldsymbol{\mu}_i^t\boldsymbol{\mu}_i\right) + \ln\frac{P(\omega_i)}{P(\omega_j)}$$
• This can be rewritten as:
$$\mathbf{w}^t(\mathbf{x} - \mathbf{x}_0) = 0, \qquad \mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j$$
$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$
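A sketch of this hyperplane test, again with invented means, variance, and (unequal) priors to show how the boundary shifts away from the more probable class:

```python
# Sketch of the identity-covariance hyperplane w'(x - x0) = 0 with w = mu_i - mu_j.
# All numbers are illustrative assumptions.
import numpy as np

mu_i, mu_j = np.zeros(2), np.array([2.0, 2.0])
sigma_sq, p_i, p_j = 1.0, 0.7, 0.3

w = mu_i - mu_j
x0 = (0.5 * (mu_i + mu_j)
      - (sigma_sq / np.sum((mu_i - mu_j) ** 2)) * np.log(p_i / p_j) * (mu_i - mu_j))

x = np.array([1.0, 0.5])
print("omega_i" if w @ (x - x0) > 0 else "omega_j")  # omega_i
```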
Equal Covariances
• Case: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$:
$$\mathbf{w}^t(\mathbf{x} - \mathbf{x}_0) = 0, \qquad \mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$
$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^t \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$
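The same test with a shared, non-diagonal covariance, so the boundary is Mahalanobis rather than Euclidean; parameters are again illustrative:

```python
# Sketch of the equal-covariance case: w = Sigma^{-1}(mu_i - mu_j). Values made up.
import numpy as np

mu_i, mu_j = np.zeros(2), np.array([2.0, 0.0])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
p_i, p_j = 0.6, 0.4

diff = mu_i - mu_j
cov_inv = np.linalg.inv(cov)
w = cov_inv @ diff
x0 = 0.5 * (mu_i + mu_j) - (np.log(p_i / p_j) / (diff @ cov_inv @ diff)) * diff

x = np.array([0.8, 0.0])
print("omega_i" if w @ (x - x0) > 0 else "omega_j")  # omega_i
```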
Arbitrary Covariances
Typical Examples of 2D Classifiers
Error Bounds
• The Bayes decision rule guarantees the lowest average error rate.
• A closed-form solution exists for two-class Gaussian distributions.
• The full calculation is difficult in high-dimensional spaces.
• Bounds provide a way to gain insight into a problem and engineer better solutions.
• Need the following inequality (a numeric spot-check follows below):
$$\min[a, b] \le a^{\beta}\, b^{1-\beta}, \qquad a, b \ge 0 \ \text{and} \ 0 \le \beta \le 1$$
Assume $a \ge b$ without loss of generality, so $\min[a, b] = b$. Also, $a^{\beta} b^{1-\beta} = (a/b)^{\beta}\, b$ and $(a/b)^{\beta} \ge 1$. Therefore $b \le (a/b)^{\beta} b$, which implies $\min[a, b] \le a^{\beta} b^{1-\beta}$.
• Apply to our standard expression for P(error).
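Here is a quick numeric spot-check of the inequality above; the sampled (a, b, β) triples are arbitrary.

```python
# Spot-check min[a, b] <= a^beta * b^(1 - beta) on arbitrary sample values.
import itertools

for a, b, beta in itertools.product([0.2, 1.0, 5.0], [0.5, 2.0], [0.1, 0.5, 0.9]):
    assert min(a, b) <= a ** beta * b ** (1.0 - beta) + 1e-12
print("inequality holds on all sampled points")
```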
Chernoff Bound
• Recall:
$$P(\text{error}) = \int \min\left[P(\omega_1 \mid \mathbf{x}),\, P(\omega_2 \mid \mathbf{x})\right] p(\mathbf{x})\, d\mathbf{x}$$
$$= \int \min\left[\frac{P(\omega_1)\,p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x})},\, \frac{P(\omega_2)\,p(\mathbf{x} \mid \omega_2)}{p(\mathbf{x})}\right] p(\mathbf{x})\, d\mathbf{x}$$
$$= \int \min\left[P(\omega_1)\,p(\mathbf{x} \mid \omega_1),\, P(\omega_2)\,p(\mathbf{x} \mid \omega_2)\right] d\mathbf{x}$$
$$\le \int P^{\beta}(\omega_1)\,p^{\beta}(\mathbf{x} \mid \omega_1)\; P^{1-\beta}(\omega_2)\,p^{1-\beta}(\mathbf{x} \mid \omega_2)\, d\mathbf{x}$$
$$= P^{\beta}(\omega_1)\,P^{1-\beta}(\omega_2) \int p^{\beta}(\mathbf{x} \mid \omega_1)\, p^{1-\beta}(\mathbf{x} \mid \omega_2)\, d\mathbf{x}$$
• Note that this integral is over the entire feature space, not the decision
regions (which makes it simpler).
• If the conditional probabilities are normal, this expression can be simplified.
Chernoff Bound for Normal Densities
• If the conditional probabilities are normal, our bound can be evaluated analytically:
$$\int p^{\beta}(\mathbf{x} \mid \omega_1)\, p^{1-\beta}(\mathbf{x} \mid \omega_2)\, d\mathbf{x} = \exp(-k(\beta))$$
where:
$$k(\beta) = \frac{\beta(1-\beta)}{2}\,(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^t \left[\beta\boldsymbol{\Sigma}_1 + (1-\beta)\boldsymbol{\Sigma}_2\right]^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) + \frac{1}{2}\ln\frac{\left|\beta\boldsymbol{\Sigma}_1 + (1-\beta)\boldsymbol{\Sigma}_2\right|}{|\boldsymbol{\Sigma}_1|^{\beta}\,|\boldsymbol{\Sigma}_2|^{1-\beta}}$$
• Procedure: find the value of $\beta$ that minimizes $\exp(-k(\beta))$, and then compute P(error) using the bound (see the sketch below).
• Benefit: a one-dimensional optimization over $\beta$.
Bhattacharyya Bound
• The Chernoff bound is loose for extreme values of $\beta$.
• The Bhattacharyya bound follows by setting $\beta = 1/2$:
$$P(\text{error}) \le P^{\beta}(\omega_1)\,P^{1-\beta}(\omega_2) \int p^{\beta}(\mathbf{x} \mid \omega_1)\, p^{1-\beta}(\mathbf{x} \mid \omega_2)\, d\mathbf{x}$$
$$\le \sqrt{P(\omega_1)P(\omega_2)} \int \sqrt{p(\mathbf{x} \mid \omega_1)\, p(\mathbf{x} \mid \omega_2)}\, d\mathbf{x} = \sqrt{P(\omega_1)P(\omega_2)}\,\exp(-k(1/2))$$
where:
$$k(1/2) = \frac{1}{8}\,(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^t \left[\frac{\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2}{2}\right]^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) + \frac{1}{2}\ln\frac{\left|\frac{\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2}{2}\right|}{\sqrt{|\boldsymbol{\Sigma}_1|\,|\boldsymbol{\Sigma}_2|}}$$
• These bounds can still be used if the distributions are not Gaussian (why?
hint: Occam’s Razor). However, they might not be adequately tight.
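A matching sketch of the Bhattacharyya bound (the β = 1/2 special case), using the same invented class parameters as the Chernoff sketch above:

```python
# Sketch of the Bhattacharyya bound: sqrt(P(w1) P(w2)) exp(-k(1/2)). Values made up.
import numpy as np

def k_half(mu1, cov1, mu2, cov2):
    d = mu2 - mu1
    cov = 0.5 * (cov1 + cov2)
    return (0.125 * (d @ np.linalg.solve(cov, d))
            + 0.5 * np.log(np.linalg.det(cov)
                           / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))))

mu1, cov1 = np.zeros(2), np.eye(2)
mu2, cov2 = np.array([2.0, 0.0]), 2.0 * np.eye(2)
p1 = p2 = 0.5
print("Bhattacharyya bound:",
      np.sqrt(p1 * p2) * np.exp(-k_half(mu1, cov1, mu2, cov2)))
```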
Receiver Operating Characteristic (ROC)
• How do we compare two decision rules if they require different thresholds for
optimum performance?
• Consider four probabilities, defined relative to a decision threshold $x^*$:
  $P(x > x^* \mid x \in \omega_1)$: false alarm
  $P(x < x^* \mid x \in \omega_1)$: correct rejection
  $P(x > x^* \mid x \in \omega_2)$: hit
  $P(x < x^* \mid x \in \omega_2)$: miss
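To make these probabilities concrete, this sketch sweeps a threshold x* across two simulated 1-D score distributions and prints (P_FA, P_hit) pairs, i.e., points on an empirical ROC curve; the class statistics and sample sizes are invented.

```python
# Sketch: empirical ROC points from two simulated 1-D Gaussian score distributions.
# Class statistics and sample sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
scores_w1 = rng.normal(0.0, 1.0, 5000)   # omega_1: non-target scores
scores_w2 = rng.normal(2.0, 1.0, 5000)   # omega_2: target scores

for x_star in np.linspace(-2.0, 4.0, 7):
    p_fa = np.mean(scores_w1 > x_star)    # false alarm: P(x > x* | omega_1)
    p_hit = np.mean(scores_w2 > x_star)   # hit:         P(x > x* | omega_2)
    print(f"x* = {x_star:+.1f}  P_FA = {p_fa:.3f}  P_hit = {p_hit:.3f}")
```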
General ROC Curves
• An ROC curve is typically monotonic but not symmetric.
• One system can be considered superior to another only if its ROC curve lies above that of the competing system over the operating region of interest.
Summary
• Gaussian Distributions: how is the shape of the decision region influenced by
the mean and covariance?
• Bounds on performance (i.e., Chernoff, Bhattacharyya) are useful
abstractions for obtaining closed-form solutions to problems.
• A Receiver Operating Characteristic (ROC) curve is a very useful way to
analyze performance and select operating points for systems.
• Discrete features can be handled in a way completely analogous to
continuous features.