ECE 8443 – Pattern Recognition
LECTURE 04: PERFORMANCE BOUNDS

• Objectives: Typical Examples; Performance Bounds; ROC Curves
• Resources: D.H.S.: Chapter 2 (Part 3); V.V.: Chernoff Bound; J.G.: Bhattacharyya; T.T.: ROC Curves; NIST: DET Curves

Two-Category Case (Review)

• A classifier that places a pattern in one of two classes is often referred to as a dichotomizer.
• We can reshape the decision rule: decide \omega_1 if g_1(x) > g_2(x); equivalently, define g(x) \equiv g_1(x) - g_2(x) and decide \omega_1 if g(x) > 0.
• In terms of posteriors, g(x) = P(\omega_1|x) - P(\omega_2|x). If we use the log of the likelihoods and priors:

  g(x) = \ln \frac{p(x|\omega_1)}{p(x|\omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}

• A dichotomizer can be viewed as a machine that computes a single discriminant function and classifies x according to its sign (e.g., support vector machines).

Unconstrained or "Full" Covariance (Review)

[Figure-only slide.]

Threshold Decoding (Review)

• This has a simple geometric interpretation: decide \omega_i if

  \|x - \mu_i\|^2 - \|x - \mu_j\|^2 < 2\sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)}

• When the priors are equal and the support regions are spherical, the decision boundary lies simply halfway between the means (Euclidean distance).

General Case for Gaussian Classifiers

• The difference of two Gaussian discriminant functions is a quadratic form:

  g_i(x) - g_j(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) + \frac{1}{2}(x - \mu_j)^t \Sigma_j^{-1} (x - \mu_j) - \frac{1}{2} \ln \frac{|\Sigma_i|}{|\Sigma_j|} + \ln \frac{P(\omega_i)}{P(\omega_j)} = x^t A x + b^t x + c

  where:

  A = -\frac{1}{2} (\Sigma_i^{-1} - \Sigma_j^{-1})
  b = \Sigma_i^{-1} \mu_i - \Sigma_j^{-1} \mu_j
  c = -\frac{1}{2} (\mu_i^t \Sigma_i^{-1} \mu_i - \mu_j^t \Sigma_j^{-1} \mu_j) - \frac{1}{2} \ln \frac{|\Sigma_i|}{|\Sigma_j|} + \ln \frac{P(\omega_i)}{P(\omega_j)}

Identity Covariance

• Case: \Sigma_i = \sigma^2 I. The quadratic term vanishes, leaving a linear machine:

  A = 0,  b = \frac{1}{\sigma^2} (\mu_i - \mu_j),  c = -\frac{1}{2\sigma^2} (\mu_i^t \mu_i - \mu_j^t \mu_j) + \ln \frac{P(\omega_i)}{P(\omega_j)}

• This can be rewritten as w^t (x - x_0) = 0, where w = \mu_i - \mu_j and

  x_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)} \, (\mu_i - \mu_j)

Equal Covariances

• Case: \Sigma_i = \Sigma. The decision surface is again a hyperplane, w^t (x - x_0) = 0, with w = \Sigma^{-1} (\mu_i - \mu_j) and

  x_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{\ln [P(\omega_i)/P(\omega_j)]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} \, (\mu_i - \mu_j)

Arbitrary Covariances

[Figure-only slide.]

Typical Examples of 2D Classifiers

[Figure-only slide.]

Error Bounds

• The Bayes decision rule guarantees the lowest average error rate.
• A closed-form solution exists for two-class Gaussian distributions.
• The full calculation is difficult in high-dimensional spaces.
• Bounds provide a way to get insight into a problem and engineer better solutions.
• We need the following inequality:

  \min[a, b] \le a^{\beta} b^{1-\beta},  for a, b \ge 0 and 0 \le \beta \le 1

  Assume a \ge b without loss of generality, so \min[a, b] = b. Also, a^{\beta} b^{1-\beta} = (a/b)^{\beta} b and (a/b)^{\beta} \ge 1. Therefore, b \le (a/b)^{\beta} b, which implies \min[a, b] \le a^{\beta} b^{1-\beta}.
• Apply this to our standard expression for P(error).

Chernoff Bound

• Recall:

  P(error) = \int \min[P(\omega_1|x), P(\omega_2|x)] \, p(x) \, dx
           = \int \min\left[ \frac{P(\omega_1) p(x|\omega_1)}{p(x)}, \frac{P(\omega_2) p(x|\omega_2)}{p(x)} \right] p(x) \, dx
           = \int \min[P(\omega_1) p(x|\omega_1), P(\omega_2) p(x|\omega_2)] \, dx
           \le P^{\beta}(\omega_1) P^{1-\beta}(\omega_2) \int p^{\beta}(x|\omega_1) \, p^{1-\beta}(x|\omega_2) \, dx

• Note that this integral is over the entire feature space, not the decision regions (which makes it simpler).
• If the conditional probabilities are normal, this expression can be simplified.

Chernoff Bound for Normal Densities

• If the conditional probabilities are normal, the bound can be evaluated analytically:

  \int p^{\beta}(x|\omega_1) \, p^{1-\beta}(x|\omega_2) \, dx = e^{-k(\beta)}

  where:

  k(\beta) = \frac{\beta(1-\beta)}{2} (\mu_2 - \mu_1)^t [\beta \Sigma_1 + (1-\beta) \Sigma_2]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|\beta \Sigma_1 + (1-\beta) \Sigma_2|}{|\Sigma_1|^{\beta} |\Sigma_2|^{1-\beta}}

• Procedure: find the value of \beta that minimizes e^{-k(\beta)}, then compute the bound on P(error).
• Benefit: the minimization is a one-dimensional search over \beta, regardless of the dimensionality of the feature space.
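As an aside (not on the original slides): because k(\beta) is a scalar function of a single variable, the bound is easy to minimize numerically. The following minimal Python sketch, assuming NumPy and SciPy are available and using made-up example means, covariances, and priors, evaluates k(\beta) for two Gaussian class-conditional densities and searches for the minimizing \beta; the \beta = 0.5 evaluation anticipates the Bhattacharyya bound on the next slide.

import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_k(beta, mu1, mu2, sigma1, sigma2):
    """Chernoff exponent k(beta) for two Gaussian class-conditional densities."""
    diff = mu2 - mu1
    sigma = beta * sigma1 + (1.0 - beta) * sigma2
    quad = 0.5 * beta * (1.0 - beta) * diff @ np.linalg.solve(sigma, diff)
    # slogdet returns (sign, log|Sigma|); it is more stable than log(det(...)).
    log_det = (np.linalg.slogdet(sigma)[1]
               - beta * np.linalg.slogdet(sigma1)[1]
               - (1.0 - beta) * np.linalg.slogdet(sigma2)[1])
    return quad + 0.5 * log_det

def chernoff_bound(p1, mu1, mu2, sigma1, sigma2):
    """Minimize P(error) <= p1^beta * p2^(1-beta) * exp(-k(beta)) over beta."""
    p2 = 1.0 - p1
    def bound(beta):
        return p1**beta * p2**(1.0 - beta) * np.exp(-chernoff_k(beta, mu1, mu2, sigma1, sigma2))
    res = minimize_scalar(bound, bounds=(1e-6, 1.0 - 1e-6), method="bounded")
    return res.x, res.fun

# Hypothetical two-class example (values chosen for illustration only).
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
sigma1 = np.eye(2)
sigma2 = np.array([[2.0, 0.5], [0.5, 1.0]])

beta_star, chernoff = chernoff_bound(0.5, mu1, mu2, sigma1, sigma2)
# Bhattacharyya bound: the same expression evaluated at beta = 0.5.
bhat = np.sqrt(0.5 * 0.5) * np.exp(-chernoff_k(0.5, mu1, mu2, sigma1, sigma2))
print(f"Chernoff bound: {chernoff:.4f} at beta = {beta_star:.3f}")
print(f"Bhattacharyya bound: {bhat:.4f}")

The bounded scalar search is all that is needed here: no matter how high-dimensional x is, the only free parameter in the bound is \beta.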
Bhattacharyya Bound

• The Chernoff bound is loose for extreme values of \beta.
• The Bhattacharyya bound is derived by setting \beta = 0.5:

  P(error) \le P^{\beta}(\omega_1) P^{1-\beta}(\omega_2) \int p^{\beta}(x|\omega_1) \, p^{1-\beta}(x|\omega_2) \, dx
           \le \sqrt{P(\omega_1) P(\omega_2)} \int \sqrt{p(x|\omega_1) \, p(x|\omega_2)} \, dx
           = \sqrt{P(\omega_1) P(\omega_2)} \, e^{-k(1/2)}

  where:

  k(1/2) = \frac{1}{8} (\mu_2 - \mu_1)^t \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|(\Sigma_1 + \Sigma_2)/2|}{\sqrt{|\Sigma_1| |\Sigma_2|}}

• These bounds can still be used if the distributions are not Gaussian (why? hint: Occam's Razor). However, they might not be adequately tight.

Receiver Operating Characteristic (ROC)

• How do we compare two decision rules if they require different thresholds for optimum performance?
• Consider four probabilities defined by a decision threshold x*:

  P(x > x* | \omega_1): false alarm
  P(x < x* | \omega_1): correct rejection
  P(x > x* | \omega_2): hit
  P(x < x* | \omega_2): miss

General ROC Curves

• An ROC curve is typically monotonic but not symmetric.

[Figure-only illustration.]

• One system can be considered superior to another only if its ROC curve lies above the competing system's curve over the operating region of interest.

Summary

• Gaussian distributions: how is the shape of the decision region influenced by the mean and covariance?
• Bounds on performance (i.e., Chernoff, Bhattacharyya) are useful abstractions for obtaining closed-form solutions to problems.
• A Receiver Operating Characteristic (ROC) curve is a very useful way to analyze performance and select operating points for systems.
• Discrete features can be handled in a way completely analogous to continuous features.
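As a companion illustration (again, not part of the original deck): the short Python sketch below uses hypothetical one-dimensional Gaussian class-conditional densities, N(0, 1) for \omega_1 and N(2, 1) for \omega_2, sweeps the threshold x*, and prints the (false alarm, hit) operating points that trace out an ROC curve.

import numpy as np
from scipy.stats import norm

# Hypothetical 1-D two-class problem: omega_1 ~ N(0, 1), omega_2 ~ N(2, 1).
mu1, mu2, sigma = 0.0, 2.0, 1.0

# Sweep the decision threshold x* across the support of both densities.
thresholds = np.linspace(-4.0, 6.0, 200)

# P(x > x* | omega_1): false alarm;  P(x > x* | omega_2): hit.
p_fa = norm.sf(thresholds, loc=mu1, scale=sigma)
p_hit = norm.sf(thresholds, loc=mu2, scale=sigma)
# The other two probabilities are the complements:
# P(x < x* | omega_1) = 1 - p_fa (correct rejection); P(x < x* | omega_2) = 1 - p_hit (miss).

# Each threshold yields one (false alarm, hit) operating point on the ROC curve.
for pf, ph in zip(p_fa[::40], p_hit[::40]):
    print(f"P(false alarm) = {pf:.3f}  ->  P(hit) = {ph:.3f}")

Because both survival functions decrease monotonically as x* increases, the traced curve is monotonic, consistent with the observation on the General ROC Curves slide.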