※ Discrimination and classification

1. To describe the differential features of objects from several known collections.
2. To assign new objects to two or more labeled classes by using a rule.

Example: We want to distinguish two species of chickweed by sepal and petal length, petal cleft depth, bract length, scarious tip length, and pollen diameter.

Let $f_1(x)$ and $f_2(x)$ be the density functions associated with the $p \times 1$ random vector $X$ for the populations $\pi_1$ and $\pi_2$. Let $\Omega$ be the sample space of all possible observations $x$. Let $R_1$ be the set of $x$ values for which we classify objects as $\pi_1$, and let $R_2 = \Omega - R_1$ be the remaining $x$ values, for which we classify objects as $\pi_2$.

The conditional probability of classifying an object as $\pi_2$ when it is from $\pi_1$ is

$$P(2 \mid 1) = P(X \in R_2 \mid \pi_1) = \int_{R_2} f_1(x)\,dx.$$

Similarly, the conditional probability of classifying an object as $\pi_1$ when it is really from $\pi_2$ is

$$P(1 \mid 2) = P(X \in R_1 \mid \pi_2) = \int_{R_1} f_2(x)\,dx.$$

Let $p_1$ be the prior probability of $\pi_1$ and $p_2$ the prior probability of $\pi_2$, where $p_1 + p_2 = 1$. Then

$$P(\text{observation is misclassified as } \pi_1) = P(X \in R_1 \mid \pi_2)\,P(\pi_2) = P(1 \mid 2)\,p_2,$$

$$P(\text{observation is misclassified as } \pi_2) = P(X \in R_2 \mid \pi_1)\,P(\pi_1) = P(2 \mid 1)\,p_1.$$

Let $c(1 \mid 2)$ be the cost when an observation from $\pi_2$ is incorrectly classified as $\pi_1$, and $c(2 \mid 1)$ the cost when an observation from $\pi_1$ is incorrectly classified as $\pi_2$. The expected cost of misclassification is

$$\mathrm{ECM} = c(2 \mid 1)\,P(2 \mid 1)\,p_1 + c(1 \mid 2)\,P(1 \mid 2)\,p_2.$$

1. For two populations (minimize ECM):

Allocate $x$ to $\pi_1$ if

$$\frac{f_1(x)}{f_2(x)} \ge \frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1}.$$

In particular, if the two populations are normal with mean vectors $\mu_1, \mu_2$ and covariance matrices $\Sigma_1, \Sigma_2$, then:

a. If $\Sigma_1 = \Sigma_2 = \Sigma$: allocate $x$ to $\pi_1$ if

$$(\mu_1 - \mu_2)'\Sigma^{-1}x - \frac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \ge \ln\left(\frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1}\right).$$

PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1, \bar{x}_2, S_{pooled}$ for $\mu_1, \mu_2, \Sigma$, respectively.

b. If $\Sigma_1 \ne \Sigma_2$: allocate $x$ to $\pi_1$ if

$$-\frac{1}{2}x'(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x - K \ge \ln\left(\frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1}\right),$$

where

$$K = \frac{1}{2}\ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \frac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right).$$

PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1, \bar{x}_2, S_1, S_2$ for $\mu_1, \mu_2, \Sigma_1, \Sigma_2$, respectively.

2.
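For two populations using Fisher's discrimination (maximum separation):

It does not assume that the populations are normal, but it needs equal population covariance matrices. Choose a linear transformation $y = a'x$ such that the separation

$$\text{separation} = \frac{|\bar{y}_1 - \bar{y}_2|}{s_y}$$

is maximum, where

$$s_y^2 = \frac{\sum_{j=1}^{n_1}(y_{1j} - \bar{y}_1)^2 + \sum_{j=1}^{n_2}(y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2}$$

is the pooled estimate of the variance. Then the linear transformation

$$\hat{y} = \hat{a}'x = (\bar{x}_1 - \bar{x}_2)'S_{pooled}^{-1}x$$

maximizes the separation. So, allocate $x$ to $\pi_1$ if

$$(\mu_1 - \mu_2)'\Sigma^{-1}x - \frac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \ge \ln\left(\frac{c(1 \mid 2)}{c(2 \mid 1)} \cdot \frac{p_2}{p_1}\right).$$

PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1, \bar{x}_2, S_{pooled}$ for $\mu_1, \mu_2, \Sigma$, respectively.

Remark: Fisher's linear discrimination rule, which allocates $x$ to $\pi_1$ when $\hat{y} \ge \frac{1}{2}(\bar{x}_1 - \bar{x}_2)'S_{pooled}^{-1}(\bar{x}_1 + \bar{x}_2)$, is equivalent to the minimum ECM rule with equal prior probabilities and equal costs of misclassification. In that case the right-hand side of the inequality above is $\ln 1 = 0$.

3. For several populations (minimum TPM, the total probability of misclassification):

Allocate $x$ to $\pi_k$ if

$$\ln p_k f_k(x) > \ln p_i f_i(x) \quad \text{for all } i \ne k.$$

In particular, if all populations are normal, then:

a. Unequal $\Sigma_i$: allocate $x$ to $\pi_k$ if $d_k^Q(x) = \max_{1 \le i \le g}\{d_i^Q(x)\}$, where

$$d_i^Q(x) = -\frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(x - \mu_i)'\Sigma_i^{-1}(x - \mu_i) + \ln p_i.$$

PS: The above rule is implemented by substituting the sample quantities $\bar{x}_i, S_i$ for $\mu_i, \Sigma_i$, respectively.

b. Equal $\Sigma_i = \Sigma$, $i = 1, 2, \dots, g$: allocate $x$ to $\pi_k$ if $d_k(x) = \max_{1 \le i \le g}\{d_i(x)\}$, where

$$d_i(x) = \mu_i'\Sigma^{-1}x - \frac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln p_i.$$

PS: The above rule is implemented by substituting the sample quantities $\bar{x}_i, S_{pooled}$ for $\mu_i, \Sigma$, respectively.

4. For several populations using Fisher's discrimination (maximum separation):

Now we introduce this method. It does not assume that the populations are normal, but it needs equal population covariance matrices. We consider the linear combination $Y = a'X$ and the ratio

$$\frac{\text{sum of squared distances from the population means to the overall mean of } Y}{\text{variance of } Y} = \frac{\sum_{i=1}^{g}(\mu_{iY} - \bar{\mu}_Y)^2}{\sigma_Y^2} = \frac{\sum_{i=1}^{g}(a'\mu_i - a'\bar{\mu})^2}{a'\Sigma a} = \frac{a'\left(\sum_{i=1}^{g}(\mu_i - \bar{\mu})(\mu_i - \bar{\mu})'\right)a}{a'\Sigma a} = \frac{a'B_\mu a}{a'\Sigma a},$$

where $\bar{\mu} = \frac{1}{g}\sum_{i=1}^{g}\mu_i$ and

$$B_\mu = \sum_{i=1}^{g}(\mu_i - \bar{\mu})(\mu_i - \bar{\mu})'.$$

Ordinarily, $\Sigma$ and the $\mu_i$ are unavailable. Suppose a random sample of size $n_i$ is available from population $\pi_i$, $i = 1, 2, \dots, g$.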
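Denote the $n_i \times p$ data set from population $\pi_i$ by $X_i$ and its $j$th row by $x_{ij}'$. We define

$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i}x_{ij}, \qquad \bar{x} = \frac{\sum_{i=1}^{g}n_i\bar{x}_i}{\sum_{i=1}^{g}n_i},$$

and the sample between-groups matrix

$$B = \sum_{i=1}^{g}n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'.$$

The sample covariance matrix of population $\pi_i$ is $S_i$, and

$$S_{pooled} = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'}{\sum_{i=1}^{g}(n_i - 1)} = \frac{W}{\sum_{i=1}^{g}(n_i - 1)}.$$

Consequently we want to choose an $\hat{a}$ maximizing

$$\frac{\hat{a}'B\hat{a}}{\hat{a}'S_{pooled}\hat{a}},$$

which is equivalent to choosing an $\hat{a}$ maximizing

$$\frac{\hat{a}'B\hat{a}}{\hat{a}'W\hat{a}},$$

because $W$ is a positive multiple of $S_{pooled}$. Then $\hat{a}_1 = \hat{e}_1, \hat{a}_2 = \hat{e}_2, \dots, \hat{a}_s = \hat{e}_s$, where $\hat{e}_1, \hat{e}_2, \dots, \hat{e}_s$ are the eigenvectors of $W^{-1}B$, ordered by decreasing eigenvalue and scaled so that $\hat{e}'S_{pooled}\hat{e} = 1$, and $s = \min\{g - 1, p\}$. The linear combination $\hat{a}_1'x$ is called the sample first discriminant, and $\hat{a}_k'x$ is called the sample $k$th discriminant.

Remark: Let $e_1, e_2, \dots, e_s$ be the eigenvectors of $\Sigma^{-1/2}B_\mu\Sigma^{-1/2}$, where $B_\mu$ is the population between-groups matrix. Then $\Sigma^{-1/2}e_1, \Sigma^{-1/2}e_2, \dots, \Sigma^{-1/2}e_s$ are eigenvectors of $\Sigma^{-1}B_\mu$ with the same eigenvalues. Similarly, if $\hat{e}_1, \hat{e}_2, \dots, \hat{e}_s$ are the eigenvectors of $W^{-1}B$, then they are also the eigenvectors of $S_{pooled}^{-1}B$.

Moreover,

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_s \end{bmatrix} = \begin{bmatrix} a_1'x \\ a_2'x \\ \vdots \\ a_s'x \end{bmatrix} \quad \text{has mean vector} \quad \mu_{iY} = \begin{bmatrix} a_1'\mu_i \\ a_2'\mu_i \\ \vdots \\ a_s'\mu_i \end{bmatrix}$$

under population $\pi_i$ and covariance matrix $I$. Then the appropriate measure of squared distance from $Y = y$ to $\mu_{iY}$ is

$$(y - \mu_{iY})'(y - \mu_{iY}) = \sum_{j=1}^{s}(y_j - \mu_{iYj})^2.$$

Allocate $x$ to $\pi_k$ if

$$\sum_{j=1}^{s}(y_j - \mu_{kYj})^2 = \sum_{j=1}^{s}\left(a_j'(x - \mu_k)\right)^2 \le \sum_{j=1}^{s}\left(a_j'(x - \mu_i)\right)^2 \quad \text{for all } i \ne k.$$

PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_i, \hat{a}_j$ for $\mu_i, a_j$, where $\hat{a}_j$ is defined as above.

In fact, Fisher's discrimination among several populations is a special case of the "normal theory" discriminant score $d_i(x)$. Using all $p$ eigenvectors,

$$\sum_{j=1}^{p}(y_j - \mu_{iYj})^2 = \sum_{j=1}^{p}\left(a_j'(x - \mu_i)\right)^2 = (x - \mu_i)'\Sigma^{-1}(x - \mu_i) = -2d_i(x) + x'\Sigma^{-1}x + 2\ln p_i,$$

where $y_j = a_j'x$, $a_j = \Sigma^{-1/2}e_j$, and $e_j$ is an eigenvector of $\Sigma^{-1/2}B_\mu\Sigma^{-1/2}$. The terms with $j > s$ correspond to zero eigenvalues and are the same for every population, so only the first $s$ discriminants affect the allocation. With equal prior probabilities, minimizing this distance is therefore the same as maximizing $d_i(x)$.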
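
The sample version of Fisher's method for several populations can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not a reference implementation: the function names `fisher_discriminants` and `classify` are hypothetical, the data are synthetic, and a Cholesky factor of $S_{pooled}$ is used in place of the symmetric square root to turn the eigenproblem for $S_{pooled}^{-1}B$ into a symmetric one. The resulting vectors satisfy $\hat{a}'S_{pooled}\hat{a} = 1$ by construction.

```python
import numpy as np


def fisher_discriminants(groups):
    """Sample discriminant vectors for a list of (n_i, p) arrays.

    Hypothetical helper: returns (A, means, S_pooled), where the columns
    of A are a_hat_1, ..., a_hat_s with s = min(g - 1, p).
    """
    n = np.array([len(X) for X in groups])
    g, p = len(groups), groups[0].shape[1]
    means = np.array([X.mean(axis=0) for X in groups])   # row i is xbar_i
    grand = (n[:, None] * means).sum(axis=0) / n.sum()   # weighted overall mean

    # Between-groups matrix B = sum_i n_i (xbar_i - xbar)(xbar_i - xbar)'
    D = means - grand
    B = (n[:, None] * D).T @ D

    # Within-groups matrix W and S_pooled = W / sum_i (n_i - 1)
    W = sum((X - m).T @ (X - m) for X, m in zip(groups, means))
    S_pooled = W / (n.sum() - g)

    # Assumption: S_pooled is positive definite, so S_pooled = L L'.
    # Eigenvectors u of the symmetric matrix L^{-1} B L^{-T} give
    # a = L^{-T} u, an eigenvector of S_pooled^{-1} B with a' S_pooled a = 1.
    L = np.linalg.cholesky(S_pooled)
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)

    s = min(g - 1, p)
    top = np.argsort(evals)[::-1][:s]                    # largest eigenvalues first
    A = L_inv.T @ U[:, top]
    return A, means, S_pooled


def classify(x, A, means):
    """Index k minimizing sum_j (a_hat_j'(x - xbar_k))^2."""
    y = A.T @ x                  # discriminant scores of x
    centers = means @ A          # row i holds a_hat_j' xbar_i for j = 1..s
    d2 = ((y - centers) ** 2).sum(axis=1)
    return int(np.argmin(d2))


# Synthetic illustration (made-up data): three groups in two dimensions.
rng = np.random.default_rng(0)
groups = [rng.normal(loc=c, size=(30, 2)) for c in ([0, 0], [6, 0], [0, 6])]
A, means, S_pooled = fisher_discriminants(groups)
label = classify(np.array([5.5, 0.5]), A, means)
```

Working with the symmetric matrix lets `np.linalg.eigh` return real, orthonormal eigenvectors, which avoids the complex round-off that a general eigen-solver can produce on the non-symmetric matrix $W^{-1}B$.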