Classification with Several Populations
Presented by: Libin Zhou

Classification procedure: Minimum Expected Cost of Misclassification (ECM)

For two populations, the ECM is

  ECM = P(2|1) c(2|1) p_1 + P(1|2) c(1|2) p_2,

where P(k|i) is the conditional probability of classifying an observation into population k when it actually comes from population i, p_i is the prior probability of population i, and c(k|i) is the cost of that misclassification.

For g populations, the conditional ECM for an observation from population 1 is

  ECM(1) = P(2|1) c(2|1) + P(3|1) c(3|1) + ... + P(g|1) c(g|1) = \sum_{k=2}^{g} P(k|1) c(k|1),

and in general, for population i,

  ECM(i) = \sum_{k=1, k \ne i}^{g} P(k|i) c(k|i).

Weighting the conditional ECMs by the prior probabilities gives the overall ECM:

  ECM = p_1 ECM(1) + ... + p_g ECM(g) = \sum_{i=1}^{g} p_i ECM(i) = \sum_{i=1}^{g} p_i \left( \sum_{k=1, k \ne i}^{g} P(k|i) c(k|i) \right).

Minimum ECM classification rule (Result 11.5, p. 614)

Allocate x to the population k for which

  \sum_{i=1, i \ne k}^{g} p_i f_i(x) c(k|i)

is smallest; this assignment minimizes the ECM.
• If the misclassification costs are all equal, the rule simplifies to: allocate x to population k if

  p_k f_k(x) > p_i f_i(x)  for all i \ne k,

or equivalently

  ln p_k f_k(x) > ln p_i f_i(x)  for all i \ne k.

Maximum posterior probability rule

The posterior probability P(\pi_k | x) is the probability that x comes from population k given that x was observed:

  P(\pi_k | x) = p_k f_k(x) / \sum_{i=1}^{g} p_i f_i(x) = (prior × likelihood) / \sum [(prior) × (likelihood)],  for k = 1, 2, ..., g.

Allocating x to the population with the largest posterior probability is the generalization of the largest-posterior-probability rule for two-population classification (Equation (11-9)).

Classification with normal populations

When the populations are multivariate normal, the density f_i(x) appearing in the minimum ECM classification rule with equal misclassification costs, ln p_k f_k(x) > ln p_i f_i(x) (Equation (11-41)), can be written as

  f_i(x) = (2\pi)^{-p/2} |\Sigma_i|^{-1/2} exp[ -\frac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) ].

Then we get

  ln p_k f_k(x) = ln p_k - (p/2) ln(2\pi) - \frac{1}{2} ln|\Sigma_k| - \frac{1}{2} (x - \mu_k)' \Sigma_k^{-1} (x - \mu_k) = max_i ln p_i f_i(x).

Dropping the constant term (p/2) ln(2\pi), which is the same for every population, define the quadratic discriminant score

  d_i^Q(x) = -\frac{1}{2} ln|\Sigma_i| - \frac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) + ln p_i,  i = 1, 2, ..., g.

Minimum total probability of misclassification (TPM) rule for normal populations with unequal \Sigma_i: if the quadratic discriminant score d_k^Q(x) = max_i d_i^Q(x), then x is allocated to population k.
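The quadratic-score rule above is easy to compute directly. The following is a minimal sketch in Python with NumPy; the function names and the two-population numeric example are illustrative inventions, not values from the text.

```python
import numpy as np

def quadratic_score(x, mu, Sigma, prior):
    """Quadratic discriminant score:
    d_i^Q(x) = -1/2 ln|Sigma_i| - 1/2 (x-mu_i)' Sigma_i^{-1} (x-mu_i) + ln p_i."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)          # stable log-determinant
    maha = diff @ np.linalg.solve(Sigma, diff)    # squared Mahalanobis distance
    return -0.5 * logdet - 0.5 * maha + np.log(prior)

def classify(x, mus, Sigmas, priors):
    """Minimum TPM rule: allocate x to the population with the largest score."""
    scores = [quadratic_score(x, m, S, p)
              for m, S, p in zip(mus, Sigmas, priors)]
    return int(np.argmax(scores)), scores

# Hypothetical two-population example with unequal covariances
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]
priors = [0.5, 0.5]
k, scores = classify(np.array([0.5, 0.2]), mus, Sigmas, priors)
```

Here a point near the first population's mean receives the larger score for population 0, so the rule allocates it there.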
Estimated minimum TPM rule for several normal populations with unequal \Sigma_i

In practice, the \mu_i and \Sigma_i are usually unknown, but a training set of correctly classified observations is often available for the construction of estimates. The relevant sample quantities for population i are the sample mean \bar{x}_i and sample covariance S_i. The estimated quadratic discriminant score is then

  \hat{d}_i^Q(x) = -\frac{1}{2} ln|S_i| - \frac{1}{2} (x - \bar{x}_i)' S_i^{-1} (x - \bar{x}_i) + ln p_i,  i = 1, 2, ..., g.

Estimated minimum TPM rule for equal-covariance normal populations

If the covariance matrices of the several populations are equal, the quadratic discriminant score simplifies to an estimated linear discriminant score based on the pooled estimate of the covariance:

  \hat{d}_i(x) = \bar{x}_i' S_{pooled}^{-1} x - \frac{1}{2} \bar{x}_i' S_{pooled}^{-1} \bar{x}_i + ln p_i.

We can also define a new quantity, the generalized squared distance

  D_i^2(x) = (x - \bar{x}_i)' S_{pooled}^{-1} (x - \bar{x}_i).

Then the sample discriminant score can be written as

  \hat{d}_i^Q(x) = -\frac{1}{2} ln|S_i| - \frac{1}{2} (x - \bar{x}_i)' S_i^{-1} (x - \bar{x}_i) + ln p_i = const - \frac{1}{2} D_i^2(x) + ln p_i.

Example 11.11: Classifying a potential business-school graduate student

Introduction: The admissions officer of a business school has used an "index" of undergraduate grade point average (GPA) and graduate management aptitude test (GMAT) scores to help decide which applicants should be admitted to the school's graduate programs.

Analysis:
Populations: \pi_1, admit; \pi_2, do not admit; \pi_3, borderline.
Variables: x_1 = GPA; x_2 = GMAT.
Question: Allocate a new applicant with observation x_0 = (3.21, 497) using the sample discriminant scores.

Resolution:
1) Calculate the sample mean vector for each population.
2) Calculate the pooled covariance matrix S_{pooled}.
3) Calculate the sample squared distances D_i^2(x_0) = (x_0 - \bar{x}_i)' S_{pooled}^{-1} (x_0 - \bar{x}_i), for i = 1, 2, 3.
4) Results: D_1^2(x_0) = 2.58; D_2^2(x_0) = 17.10; D_3^2(x_0) = 2.47.

By the rule of assigning x to the "closest" population, the new applicant should be assigned to population 3, borderline.
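The resolution steps above can be sketched in Python with NumPy. The book's training data for Example 11.11 are not reproduced in the text, so the three groups below are simulated stand-ins with hypothetical group means; only the structure of the computation (pooled covariance, generalized squared distances, nearest-population assignment under equal priors) follows the example.

```python
import numpy as np

def pooled_covariance(samples):
    """S_pooled = sum_i (n_i - 1) S_i / sum_i (n_i - 1)."""
    num = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in samples)
    den = sum(len(X) - 1 for X in samples)
    return num / den

def squared_distances(x, samples):
    """Generalized squared distances D_i^2(x) = (x - xbar_i)' S_pooled^{-1} (x - xbar_i)."""
    Sp = pooled_covariance(samples)
    dists = []
    for X in samples:
        d = x - X.mean(axis=0)
        dists.append(float(d @ np.linalg.solve(Sp, d)))
    return dists

# Simulated training groups (hypothetical means, NOT the book's data):
# columns are (GPA, GMAT)
rng = np.random.default_rng(0)
g1 = rng.normal([3.4, 560], [0.1, 20], size=(20, 2))  # admit
g2 = rng.normal([2.5, 450], [0.1, 20], size=(20, 2))  # do not admit
g3 = rng.normal([3.0, 500], [0.1, 20], size=(20, 2))  # borderline

x0 = np.array([3.21, 497.0])
D2 = squared_distances(x0, [g1, g2, g3])
closest = int(np.argmin(D2))  # equal priors: assign to the closest population
```

With these stand-in means, the applicant lands closest to the "borderline" group, mirroring the qualitative conclusion of the example; the actual distances 2.58, 17.10, and 2.47 come from the book's data, which are not reproduced here.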