Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3-2.5)

• Minimum-Error-Rate Classification
• Classifiers, Discriminant Functions and Decision Surfaces
• The Normal Density

Minimum-Error-Rate Classification

• Actions are decisions on classes: if action αi is taken and the true state of nature is ωj, the decision is correct if i = j and in error if i ≠ j.
• Seek a decision rule that minimizes the probability of error, i.e. the error rate.

• Introduce the zero-one loss function:

\[
\lambda(\alpha_i \mid \omega_j) =
\begin{cases}
0 & i = j \\
1 & i \neq j
\end{cases}
\qquad i, j = 1, \dots, c
\]

The conditional risk is therefore

\[
R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)
= \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)
\]

"The risk corresponding to this loss function is the average probability of error."

• Minimizing the risk requires maximizing P(ωi | x), since R(αi | x) = 1 − P(ωi | x).
• For minimum error rate:
  decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i;
  equivalently, decide ωi with i = arg maxj P(ωj | x).
  (See the numerical sketches at the end of these notes.)

• Likelihood-ratio form of the decision rule. For a general loss λij = λ(αi | ωj), let

\[
\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)};
\quad \text{then decide } \omega_1 \text{ if } \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \theta_\lambda.
\]

• If λ is the zero-one loss function, which means

\[
\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\quad \text{then} \quad
\theta_\lambda = \frac{P(\omega_2)}{P(\omega_1)} = \theta_a.
\]

• If instead

\[
\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix},
\quad \text{then} \quad
\theta_\lambda = \frac{2\,P(\omega_2)}{P(\omega_1)} = \theta_b.
\]

Classifiers, Discriminant Functions and Decision Surfaces

• The multi-category case: use a set of discriminant functions gi(x), i = 1, ..., c.
  The classifier assigns a feature vector x to class ωi if gi(x) > gj(x) for all j ≠ i.

• The feature space is thereby divided into c decision regions:
  if gi(x) > gj(x) for all j ≠ i, then x is in Ri (x ∈ Ri means: assign x to ωi).

• The two-category case: a classifier with two discriminant functions g1 and g2 is a "dichotomizer".
  Let g(x) ≡ g1(x) − g2(x).
  Decide ω1 if g(x) > 0; otherwise decide ω2.

• Bayes classifier: gi(x) = −R(αi | x) (the maximum discriminant corresponds to the minimum risk!).
• For minimum error rate we take gi(x) = P(ωi | x) (the maximum discriminant corresponds to the maximum posterior!).
• Equivalent choices, since monotone transformations leave the decision unchanged:
  gi(x) ≡ p(x | ωi) P(ωi), or
  gi(x) = ln p(x | ωi) + ln P(ωi) (ln: the natural logarithm).

• The computation of g(x) in the two-category case, in two equivalent forms:

\[
g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)
\]
\[
g(x) = \ln \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}
\]

Error Probabilities

• Two-category case:

\[
P(\text{error}) = P(x \in R_2, \omega_1) + P(x \in R_1, \omega_2)
= P(x \in R_2 \mid \omega_1)\,P(\omega_1) + P(x \in R_1 \mid \omega_2)\,P(\omega_2)
\]
\[
= \int_{R_2} p(x \mid \omega_1)\,P(\omega_1)\,dx + \int_{R_1} p(x \mid \omega_2)\,P(\omega_2)\,dx
\]

Minimax Solution

• For a fixed decision boundary, the total risk R is linear in the prior:

\[
R(P(\omega_1)) = R_{mm} + P(\omega_1)\, S
\]

The minimax boundary is chosen so that the slope S vanishes; the resulting risk Rmm is then independent of P(ω1).
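A minimal numerical sketch of the minimum-error-rate rule (not from the book; the two Gaussian class-conditional densities and the priors below are hypothetical choices for illustration). It checks that the maximum-posterior decision agrees with the likelihood-ratio test against θλ for the zero-one loss:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D two-class problem (illustrative parameters, not from the slides).
p1 = norm(loc=0.0, scale=1.0)   # p(x | w1)
p2 = norm(loc=2.0, scale=1.0)   # p(x | w2)
P1, P2 = 0.7, 0.3               # priors P(w1), P(w2)

def posteriors(x):
    """P(wi | x) by Bayes' rule; the evidence p(x) normalizes the joint terms."""
    joint = np.array([p1.pdf(x) * P1, p2.pdf(x) * P2])
    return joint / joint.sum()

def decide_min_error(x):
    """Zero-one loss: R(ai | x) = 1 - P(wi | x), so pick the maximum posterior."""
    return np.argmax(posteriors(x)) + 1   # returns class label 1 or 2

# Equivalent likelihood-ratio test: decide w1 if p(x|w1)/p(x|w2) > theta_lambda.
lam = np.array([[0.0, 1.0],
                [1.0, 0.0]])              # zero-one loss matrix, lam[i-1][j-1] = lambda_ij
theta = ((lam[0, 1] - lam[1, 1]) / (lam[1, 0] - lam[0, 0])) * (P2 / P1)

x = 0.8
assert decide_min_error(x) == (1 if p1.pdf(x) / p2.pdf(x) > theta else 2)
print(decide_min_error(x), theta)         # -> 1 0.4286 (theta_a = P(w2)/P(w1))
```

Swapping in the loss matrix with λ12 = 2 reproduces θb = 2P(ω2)/P(ω1) with no other change.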
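The dichotomizer in code, reusing the same hypothetical densities and priors: g(x) is the log-likelihood ratio plus the log prior ratio, and the sign of g(x) carries the whole decision, with g(x) = 0 giving the decision surface:

```python
import numpy as np
from scipy.stats import norm

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)   # same assumed class-conditional densities
P1, P2 = 0.7, 0.3

def g(x):
    """g(x) = ln p(x|w1)/p(x|w2) + ln P(w1)/P(w2); decide w1 where g(x) > 0."""
    return (p1.logpdf(x) - p2.logpdf(x)) + np.log(P1 / P2)

xs = np.linspace(-4.0, 6.0, 11)
labels = np.where(g(xs) > 0, "w1", "w2")
print(list(zip(np.round(xs, 1), labels)))  # boundary falls near x = 1.42 here
```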
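Finally, a sketch of the two-category error probability and the minimax property, again under the same assumed Gaussians. For zero-one loss the risk equals P(error), which for a fixed boundary is linear in the prior; placing the boundary where the two conditional errors are equal makes the slope S zero, so P(error) no longer depends on P(ω1):

```python
import numpy as np
from scipy.stats import norm

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)   # same assumed class-conditional densities

def error_rate(boundary, P1):
    """P(error) = int_{R2} p(x|w1)P(w1) dx + int_{R1} p(x|w2)P(w2) dx,
    with R1 = (-inf, boundary) and R2 = (boundary, inf) in this 1-D setup."""
    P2 = 1.0 - P1
    return p1.sf(boundary) * P1 + p2.cdf(boundary) * P2

# Slope of the risk in P(w1) is P(x in R2 | w1) - P(x in R1 | w2); by the
# symmetry of these two unit-variance Gaussians it vanishes at x = 1.
for P1 in (0.1, 0.5, 0.9):
    print(P1, round(error_rate(1.0, P1), 4))   # same value for every prior
```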