Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2 (Part 3): Bayesian Decision Theory (Sections 2.6, 2.9)

• Discriminant Functions for the Normal Density
• Bayes Decision Theory – Discrete Features

Discriminant Functions for the Normal Density

• We saw that minimum error-rate classification can be achieved by the discriminant function

  $g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$

• Case of the multivariate normal:

  $g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$

• Case $\Sigma_i = \sigma^2 I$ ($I$ stands for the identity matrix):

  $g_i(x) = w_i^t x + w_{i0}$   (linear discriminant function)

  where

  $w_i = \frac{1}{\sigma^2} \mu_i; \qquad w_{i0} = -\frac{1}{2\sigma^2} \mu_i^t \mu_i + \ln P(\omega_i)$

  ($w_{i0}$ is called the threshold for the ith category!)

• A classifier that uses linear discriminant functions is called a "linear machine".
• The decision surfaces for a linear machine are pieces of hyperplanes defined by $g_i(x) = g_j(x)$.

• The hyperplane separating $R_i$ and $R_j$ passes through

  $x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)} \, (\mu_i - \mu_j)$

  and is always orthogonal to the line linking the means!

• If $P(\omega_i) = P(\omega_j)$, then $x_0 = \frac{1}{2}(\mu_i + \mu_j)$.

• Case $\Sigma_i = \Sigma$ (the covariance matrices of all classes are identical but arbitrary!)
• The hyperplane separating $R_i$ and $R_j$ passes through

  $x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln [P(\omega_i)/P(\omega_j)]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} \, (\mu_i - \mu_j)$

  (the hyperplane separating $R_i$ and $R_j$ is generally not orthogonal to the line between the means!)

• Case $\Sigma_i$ = arbitrary
• The covariance matrices are different for each category:

  $g_i(x) = x^t W_i x + w_i^t x + w_{i0}$

  where

  $W_i = -\frac{1}{2} \Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1} \mu_i, \qquad w_{i0} = -\frac{1}{2} \mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$

  (The decision surfaces are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, and hyperhyperboloids. A code sketch of this general case follows the discrete-features section below.)

Bayes Decision Theory – Discrete Features

• Components of x are binary or integer valued; x can take only one of m discrete values $v_1, v_2, \ldots, v_m$.
• Case of independent binary features in a two-category problem:
  let $x = [x_1, x_2, \ldots, x_d]^t$, where each $x_i$ is either 0 or 1, with probabilities

  $p_i = P(x_i = 1 \mid \omega_1)$
  $q_i = P(x_i = 1 \mid \omega_2)$

• The discriminant function in this case is

  $g(x) = \sum_{i=1}^{d} w_i x_i + w_0$

  where

  $w_i = \ln \frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \qquad i = 1, \ldots, d$

  and

  $w_0 = \sum_{i=1}^{d} \ln \frac{1 - p_i}{1 - q_i} + \ln \frac{P(\omega_1)}{P(\omega_2)}$

• Decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) \le 0$.
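The three normal-density cases differ only in the form $\Sigma_i$ is allowed to take, and the arbitrary-$\Sigma_i$ quadratic discriminant covers them all. The following minimal NumPy sketch of that general case is an added illustration, not material from the slides; the two-class means, covariances, and priors are made-up values.

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - d/2 ln(2 pi) - 1/2 ln|Sigma| + ln P(w_i)."""
    d = mu.shape[0]
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Hypothetical two-class problem in two dimensions (illustrative values only).
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2),                               # class 1: Sigma_1 = sigma^2 I with sigma = 1
         np.array([[2.0, 0.5], [0.5, 1.0]])]      # class 2: arbitrary Sigma_2
prior = [0.5, 0.5]

x = np.array([1.0, 2.0])
g = [gaussian_discriminant(x, mu[i], Sigma[i], prior[i]) for i in range(2)]
print("decide class", np.argmax(g) + 1)           # assign x to the class with the largest g_i(x)
```

When every class shares the same $\Sigma_i = \sigma^2 I$, the quadratic term is identical across classes, and the comparison reduces to the linear machine above.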
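The binary-feature discriminant is equally direct: the weights $w_i$ and the threshold $w_0$ come straight from $p_i$, $q_i$, and the priors. A minimal sketch, again with made-up illustrative probabilities:

```python
import numpy as np

def binary_discriminant(x, p, q, prior1, prior2):
    """g(x) = sum_i w_i x_i + w_0 for independent binary features,
    where p[i] = P(x_i = 1 | w1) and q[i] = P(x_i = 1 | w2)."""
    x, p, q = np.asarray(x), np.asarray(p), np.asarray(q)
    w = np.log(p * (1 - q) / (q * (1 - p)))                          # per-feature weights
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)
    return x @ w + w0

# Hypothetical three-feature problem (illustrative values only).
p = [0.8, 0.7, 0.6]        # P(x_i = 1 | w1)
q = [0.2, 0.4, 0.5]        # P(x_i = 1 | w2)
g = binary_discriminant([1, 0, 1], p, q, prior1=0.5, prior2=0.5)
print("decide w1" if g > 0 else "decide w2")                         # decide w1 iff g(x) > 0
```

Since $w_i > 0$ exactly when $p_i > q_i$, observing such a feature votes for $\omega_1$, which is the log-likelihood-ratio reading of the weights.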
Bayesian Belief Network

• Features
• Causal relationships
• Statistical independence
• Bayesian belief nets, also called causal networks or belief nets

(Figure: a network in which x1 and x3 are independent.)

Structure

• Nodes represent discrete variables.
• Parent and child nodes: a link between nodes expresses direct influence.
• Each node carries a conditional probability table, set by an expert or by learning from a training set (sorry, learning is not discussed here).

Examples

(Figure slides: example belief networks; the observed variables form the evidence e.)

Ex. 4. Belief Network for Fish

A (season) and B (location) are parents of X (the fish); C (lightness) and D (width) are children of X.

P(a): a1 = winter 0.25, a2 = spring 0.25, a3 = summer 0.25, a4 = autumn 0.25
P(b): b1 = north Atlantic 0.6, b2 = south Atlantic 0.4
X: x1 = salmon, x2 = sea bass

P(x | a, b):

           x1     x2
  a1 b1   0.5    0.5
  a1 b2   0.7    0.3
  a2 b1   0.6    0.4
  a2 b2   0.8    0.2
  a3 b1   0.4    0.6
  a3 b2   0.1    0.9
  a4 b1   0.2    0.8
  a4 b2   0.3    0.7

P(c | x), with c1 = light, c2 = medium, c3 = dark:

           c1     c2     c3
  x1      0.6    0.2    0.2
  x2      0.2    0.3    0.5

P(d | x), with d1 = wide, d2 = thin:

           d1     d2
  x1      0.3    0.7
  x2      0.6    0.4

Belief Network for Fish

• The fish was caught in the summer in the north Atlantic, and it is a sea bass that is dark and thin:

  P(a3, b1, x2, c3, d2) = P(a3) P(b1) P(x2 | a3, b1) P(c3 | x2) P(d2 | x2)
                        = 0.25 × 0.6 × 0.6 × 0.5 × 0.4 = 0.018

Light, south Atlantic, fish?

Normalize

Conditionally Independent

(Figure slides: the query P(x | c1, b2) is answered by summing the joint over the unobserved variables and normalizing the result; given X, the children C and D are conditionally independent.)

Medical Application

• Medical diagnosis
• Uppermost nodes: biological agents (virus or bacteria)
• Intermediate nodes: diseases (flu or emphysema)
• Lowermost nodes: symptoms (high temperature or coughing)
• By entering measured values, the network finds the most likely disease or cause.

Exercise 50 (based on Ex. 4)

(a) December 20, north Atlantic, thin: P(a1) = P(a4) = 0.5, P(b1) = 1, P(d2) = 1.
    Which fish? What is the error rate?
(b) Thin, medium lightness.
    Which season? With what probability?
(c) Thin, medium lightness, north Atlantic.
    Which season? With what probability?
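The fish network is small enough to evaluate by brute-force summation over the joint. The sketch below is an added illustration (variable names are my own); it encodes the CPTs of Ex. 4, reproduces the joint probability computed above, answers the "light, south Atlantic" query by summing out the unobserved variables and normalizing, and sets up part (a) of Exercise 50.

```python
import numpy as np

# CPTs from Ex. 4. Indices: a = season (0..3), b = location (0..1),
# x = fish (0 = salmon, 1 = sea bass), c = lightness (0..2), d = width (0..1).
P_a = np.array([0.25, 0.25, 0.25, 0.25])              # winter, spring, summer, autumn
P_b = np.array([0.6, 0.4])                            # north Atlantic, south Atlantic
P_x_ab = np.array([[[0.5, 0.5], [0.7, 0.3]],          # P(x | a, b), indexed [a][b][x]
                   [[0.6, 0.4], [0.8, 0.2]],
                   [[0.4, 0.6], [0.1, 0.9]],
                   [[0.2, 0.8], [0.3, 0.7]]])
P_c_x = np.array([[0.6, 0.2, 0.2],                    # P(c | x): light, medium, dark
                  [0.2, 0.3, 0.5]])
P_d_x = np.array([[0.3, 0.7],                         # P(d | x): wide, thin
                  [0.6, 0.4]])

def joint(a, b, x, c, d):
    """P(a, b, x, c, d) = P(a) P(b) P(x | a, b) P(c | x) P(d | x)."""
    return P_a[a] * P_b[b] * P_x_ab[a, b, x] * P_c_x[x, c] * P_d_x[x, d]

# Summer, north Atlantic, sea bass, dark, thin:
print(joint(a=2, b=0, x=1, c=2, d=1))                 # 0.25 * 0.6 * 0.6 * 0.5 * 0.4 = 0.018

# "Light, south Atlantic, fish?": P(x | c1, b2), summing out a and d, then normalizing.
unnorm = [sum(joint(a, 1, x, 0, d) for a in range(4) for d in range(2)) for x in range(2)]
print("P(x | c1, b2) =", np.array(unnorm) / sum(unnorm))

# Exercise 50(a): December 20 gives P(a1) = P(a4) = 0.5; north Atlantic fixes b = b1;
# thin fixes d = d2; lightness is unobserved, so c sums out of the joint.
P_a_ex = np.array([0.5, 0.0, 0.0, 0.5])
unnorm_ex = [P_a_ex @ P_x_ab[:, 0, x] * P_d_x[x, 1] for x in range(2)]
print("P(x | evidence) =", np.array(unnorm_ex) / sum(unnorm_ex))   # decide the larger entry
```

The error rate asked for in part (a) is the posterior probability of the class not chosen, so it can be read off the same normalized vector.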