ECE 8443 – Pattern Recognition
LECTURE 04: PERFORMANCE BOUNDS

• Objectives: Typical Examples; Performance Bounds; ROC Curves
• Resources: D.H.S.: Chapter 2 (Part 3); V.V.: Chernoff Bound; J.G.: Bhattacharyya; T.T.: ROC Curves; NIST: DET Curves

Two-Category Case (Review)

• A classifier that places a pattern in one of two classes is often referred to as a dichotomizer.
• We can reshape the decision rule: decide \omega_1 if g_1(x) > g_2(x); equivalently, define g(x) \equiv g_1(x) - g_2(x) and decide \omega_1 if g(x) > 0.
• In terms of posteriors, g(x) = P(\omega_1|x) - P(\omega_2|x). If we use the log of the likelihoods and priors:

  g(x) = \ln \frac{p(x|\omega_1)}{p(x|\omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}

• A dichotomizer can be viewed as a machine that computes a single discriminant function and classifies x according to its sign (e.g., support vector machines).

Unconstrained or "Full" Covariance (Review)

[Figure-only slide.]

Threshold Decoding (Review)

• This has a simple geometric interpretation: decide \omega_i if

  \|x - \mu_i\|^2 - \|x - \mu_j\|^2 < 2\sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)}

• When the priors are equal and the support regions are spherical, the decision boundary lies simply halfway between the means (Euclidean distance).

General Case for Gaussian Classifiers

• The difference of two Gaussian discriminant functions is a quadratic form:

  g_i(x) - g_j(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) + \frac{1}{2}(x - \mu_j)^t \Sigma_j^{-1} (x - \mu_j) - \frac{1}{2} \ln \frac{|\Sigma_i|}{|\Sigma_j|} + \ln \frac{P(\omega_i)}{P(\omega_j)} = x^t A x + b^t x + c

  where:

  A = -\frac{1}{2} (\Sigma_i^{-1} - \Sigma_j^{-1})
  b = \Sigma_i^{-1} \mu_i - \Sigma_j^{-1} \mu_j
  c = -\frac{1}{2} (\mu_i^t \Sigma_i^{-1} \mu_i - \mu_j^t \Sigma_j^{-1} \mu_j) - \frac{1}{2} \ln \frac{|\Sigma_i|}{|\Sigma_j|} + \ln \frac{P(\omega_i)}{P(\omega_j)}

Identity Covariance

• Case: \Sigma_i = \sigma^2 I. The quadratic term vanishes, leaving a linear machine:

  A = 0,  b = \frac{1}{\sigma^2} (\mu_i - \mu_j),  c = -\frac{1}{2\sigma^2} (\mu_i^t \mu_i - \mu_j^t \mu_j) + \ln \frac{P(\omega_i)}{P(\omega_j)}

• This can be rewritten as w^t (x - x_0) = 0, where w = \mu_i - \mu_j and

  x_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)} \, (\mu_i - \mu_j)

Equal Covariances

• Case: \Sigma_i = \Sigma. The decision surface is again a hyperplane, w^t (x - x_0) = 0, with w = \Sigma^{-1} (\mu_i - \mu_j) and

  x_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{\ln [P(\omega_i)/P(\omega_j)]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} \, (\mu_i - \mu_j)

Arbitrary Covariances

[Figure-only slide.]

Typical Examples of 2D Classifiers

[Figure-only slide.]

Error Bounds

• The Bayes decision rule guarantees the lowest average error rate.
• A closed-form solution exists for two-class Gaussian distributions.
• The full calculation is difficult in high-dimensional spaces.
• Bounds provide a way to get insight into a problem and engineer better solutions.
• We need the following inequality:

  \min[a, b] \le a^{\beta} b^{1-\beta},  for a, b \ge 0 and 0 \le \beta \le 1

  Assume a \ge b without loss of generality, so \min[a, b] = b. Also, a^{\beta} b^{1-\beta} = (a/b)^{\beta} b and (a/b)^{\beta} \ge 1. Therefore, b \le (a/b)^{\beta} b, which implies \min[a, b] \le a^{\beta} b^{1-\beta}.
• Apply this to our standard expression for P(error).

Chernoff Bound

• Recall:

  P(error) = \int \min[P(\omega_1|x), P(\omega_2|x)] \, p(x) \, dx
           = \int \min\left[ \frac{P(\omega_1) p(x|\omega_1)}{p(x)}, \frac{P(\omega_2) p(x|\omega_2)}{p(x)} \right] p(x) \, dx
           = \int \min[P(\omega_1) p(x|\omega_1), P(\omega_2) p(x|\omega_2)] \, dx
           \le P^{\beta}(\omega_1) P^{1-\beta}(\omega_2) \int p^{\beta}(x|\omega_1) \, p^{1-\beta}(x|\omega_2) \, dx

• Note that this integral is over the entire feature space, not the decision regions (which makes it simpler).
• If the conditional probabilities are normal, this expression can be simplified.

Chernoff Bound for Normal Densities

• If the conditional probabilities are normal, the bound can be evaluated analytically:

  \int p^{\beta}(x|\omega_1) \, p^{1-\beta}(x|\omega_2) \, dx = e^{-k(\beta)}

  where:

  k(\beta) = \frac{\beta(1-\beta)}{2} (\mu_2 - \mu_1)^t [\beta \Sigma_1 + (1-\beta) \Sigma_2]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|\beta \Sigma_1 + (1-\beta) \Sigma_2|}{|\Sigma_1|^{\beta} |\Sigma_2|^{1-\beta}}

• Procedure: find the value of \beta that minimizes e^{-k(\beta)}, then compute the bound on P(error).
• Benefit: the minimization is a one-dimensional search over \beta, regardless of the dimensionality of the feature space.
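As an aside (not on the original slides): because k(\beta) is a scalar function of a single variable, the bound is easy to minimize numerically. The following minimal Python sketch, assuming NumPy and SciPy are available and using made-up example means, covariances, and priors, evaluates k(\beta) for two Gaussian class-conditional densities and searches for the minimizing \beta; the \beta = 0.5 evaluation anticipates the Bhattacharyya bound on the next slide.

import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_k(beta, mu1, mu2, sigma1, sigma2):
    """Chernoff exponent k(beta) for two Gaussian class-conditional densities."""
    diff = mu2 - mu1
    sigma = beta * sigma1 + (1.0 - beta) * sigma2
    quad = 0.5 * beta * (1.0 - beta) * diff @ np.linalg.solve(sigma, diff)
    # slogdet returns (sign, log|Sigma|); it is more stable than log(det(...)).
    log_det = (np.linalg.slogdet(sigma)[1]
               - beta * np.linalg.slogdet(sigma1)[1]
               - (1.0 - beta) * np.linalg.slogdet(sigma2)[1])
    return quad + 0.5 * log_det

def chernoff_bound(p1, mu1, mu2, sigma1, sigma2):
    """Minimize P(error) <= p1^beta * p2^(1-beta) * exp(-k(beta)) over beta."""
    p2 = 1.0 - p1
    def bound(beta):
        return p1**beta * p2**(1.0 - beta) * np.exp(-chernoff_k(beta, mu1, mu2, sigma1, sigma2))
    res = minimize_scalar(bound, bounds=(1e-6, 1.0 - 1e-6), method="bounded")
    return res.x, res.fun

# Hypothetical two-class example (values chosen for illustration only).
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
sigma1 = np.eye(2)
sigma2 = np.array([[2.0, 0.5], [0.5, 1.0]])

beta_star, chernoff = chernoff_bound(0.5, mu1, mu2, sigma1, sigma2)
# Bhattacharyya bound: the same expression evaluated at beta = 0.5.
bhat = np.sqrt(0.5 * 0.5) * np.exp(-chernoff_k(0.5, mu1, mu2, sigma1, sigma2))
print(f"Chernoff bound: {chernoff:.4f} at beta = {beta_star:.3f}")
print(f"Bhattacharyya bound: {bhat:.4f}")

The bounded scalar search is all that is needed here: no matter how high-dimensional x is, the only free parameter in the bound is \beta.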
Bhattacharyya Bound

• The Chernoff bound is loose for extreme values of \beta.
• The Bhattacharyya bound is derived by setting \beta = 0.5:

  P(error) \le P^{\beta}(\omega_1) P^{1-\beta}(\omega_2) \int p^{\beta}(x|\omega_1) \, p^{1-\beta}(x|\omega_2) \, dx
           \le \sqrt{P(\omega_1) P(\omega_2)} \int \sqrt{p(x|\omega_1) \, p(x|\omega_2)} \, dx
           = \sqrt{P(\omega_1) P(\omega_2)} \, e^{-k(1/2)}

  where:

  k(1/2) = \frac{1}{8} (\mu_2 - \mu_1)^t \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|(\Sigma_1 + \Sigma_2)/2|}{\sqrt{|\Sigma_1| |\Sigma_2|}}

• These bounds can still be used if the distributions are not Gaussian (why? hint: Occam's Razor). However, they might not be adequately tight.

Receiver Operating Characteristic (ROC)

• How do we compare two decision rules if they require different thresholds for optimum performance?
• Consider four probabilities defined by a decision threshold x*:

  P(x > x* | \omega_1): false alarm
  P(x < x* | \omega_1): correct rejection
  P(x > x* | \omega_2): hit
  P(x < x* | \omega_2): miss

General ROC Curves

• An ROC curve is typically monotonic but not symmetric.

[Figure-only illustration.]

• One system can be considered superior to another only if its ROC curve lies above the competing system's curve over the operating region of interest.

Summary

• Gaussian distributions: how is the shape of the decision region influenced by the mean and covariance?
• Bounds on performance (i.e., Chernoff, Bhattacharyya) are useful abstractions for obtaining closed-form solutions to problems.
• A Receiver Operating Characteristic (ROC) curve is a very useful way to analyze performance and select operating points for systems.
• Discrete features can be handled in a way completely analogous to continuous features.
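As a companion illustration (again, not part of the original deck): the short Python sketch below uses hypothetical one-dimensional Gaussian class-conditional densities, N(0, 1) for \omega_1 and N(2, 1) for \omega_2, sweeps the threshold x*, and prints the (false alarm, hit) operating points that trace out an ROC curve.

import numpy as np
from scipy.stats import norm

# Hypothetical 1-D two-class problem: omega_1 ~ N(0, 1), omega_2 ~ N(2, 1).
mu1, mu2, sigma = 0.0, 2.0, 1.0

# Sweep the decision threshold x* across the support of both densities.
thresholds = np.linspace(-4.0, 6.0, 200)

# P(x > x* | omega_1): false alarm;  P(x > x* | omega_2): hit.
p_fa = norm.sf(thresholds, loc=mu1, scale=sigma)
p_hit = norm.sf(thresholds, loc=mu2, scale=sigma)
# The other two probabilities are the complements:
# P(x < x* | omega_1) = 1 - p_fa (correct rejection); P(x < x* | omega_2) = 1 - p_hit (miss).

# Each threshold yields one (false alarm, hit) operating point on the ROC curve.
for pf, ph in zip(p_fa[::40], p_hit[::40]):
    print(f"P(false alarm) = {pf:.3f}  ->  P(hit) = {ph:.3f}")

Because both survival functions decrease monotonically as x* increases, the traced curve is monotonic, consistent with the observation on the General ROC Curves slide.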