Ch11_Classification_LZhou

Classification with
several populations
Presented by: Libin Zhou
Classification procedure

Minimum Expected Cost of Misclassification Method (ECM)
 The ECM for two populations is:
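$$\mathrm{ECM} = P(2 \mid 1)\,c(2 \mid 1)\,p_1 + P(1 \mid 2)\,c(1 \mid 2)\,p_2$$
where $P(k \mid i)$ is the conditional probability of classifying an observation from population $i$ into population $k$, $p_i$ is the prior probability of population $i$, and $c(k \mid i)$ is the cost of that misclassification.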

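The ECM for multiple populations ($g$ populations) is built up as follows.
For an observation from population 1:
$$\mathrm{ECM}(1) = P(2 \mid 1)c(2 \mid 1) + P(3 \mid 1)c(3 \mid 1) + \cdots + P(g \mid 1)c(g \mid 1) = \sum_{k=2}^{g} P(k \mid 1)\,c(k \mid 1)$$
For an observation from population $i$:
$$\mathrm{ECM}(i) = \sum_{k=1,\, k \neq i}^{g} P(k \mid i)\,c(k \mid i)$$
Overall:
$$\mathrm{ECM} = p_1\,\mathrm{ECM}(1) + \cdots + p_g\,\mathrm{ECM}(g) = \sum_{i=1}^{g} p_i\,\mathrm{ECM}(i) = \sum_{i=1}^{g} p_i \left( \sum_{k=1,\, k \neq i}^{g} P(k \mid i)\,c(k \mid i) \right)$$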
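The overall ECM formula can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `expected_cost`, the array layout (`misclass_prob[i, k]` holds $P(k \mid i)$, `cost[i, k]` holds $c(k \mid i)$), and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def expected_cost(priors, misclass_prob, cost):
    """Return (ECM, per-population ECM(i)) for g populations.

    misclass_prob[i, k] = P(k | i), cost[i, k] = c(k | i).
    Diagonal entries (correct classification) are ignored.
    """
    priors = np.asarray(priors, dtype=float)
    P = np.asarray(misclass_prob, dtype=float)
    C = np.asarray(cost, dtype=float)
    off_diagonal = ~np.eye(len(priors), dtype=bool)
    # ECM(i) = sum over k != i of P(k | i) c(k | i)
    per_population = np.where(off_diagonal, P * C, 0.0).sum(axis=1)
    # ECM = sum over i of p_i ECM(i)
    return float(priors @ per_population), per_population

# Hypothetical inputs for g = 3 populations (not from the slides).
priors = [0.5, 0.3, 0.2]
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]
C = [[0, 10, 20],
     [5, 0, 10],
     [50, 20, 0]]

ecm, ecm_per_population = expected_cost(priors, P, C)
print(ecm_per_population, ecm)
```

With these made-up inputs, ECM(1) = 3, ECM(2) = 2, ECM(3) = 11, and the overall ECM is 0.5(3) + 0.3(2) + 0.2(11) = 4.3.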
Minimum ECM classification Rule

Result 11.5 on page 614.
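Allocate $x$ to the population $k$ for which
$$\sum_{i=1,\, i \neq k}^{g} p_i f_i(x)\, c(k \mid i)$$
is smallest; this allocation minimizes the ECM.
•
If the misclassification costs are all equal, the rule simplifies to: allocate $x$ to population $k$ if
$$p_k f_k(x) > p_i f_i(x) \quad \text{for all } i \neq k$$
or, equivalently,
$$\ln p_k f_k(x) > \ln p_i f_i(x) \quad \text{for all } i \neq k$$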
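The minimum ECM rule can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function name `min_ecm_allocation`, the cost layout (`cost[i, k]` holds $c(k \mid i)$), and every number below are hypothetical.

```python
import numpy as np

def min_ecm_allocation(priors, densities, cost):
    """Return (k, expected costs), where k is the 0-based index that
    minimises sum over i != k of p_i f_i(x) c(k | i).

    densities[i] = f_i(x) evaluated at the observation x,
    cost[i, k]   = c(k | i).
    """
    weighted = np.asarray(priors, dtype=float) * np.asarray(densities, dtype=float)
    C = np.asarray(cost, dtype=float).copy()
    np.fill_diagonal(C, 0.0)      # a correct classification costs nothing
    expected = weighted @ C       # entry k: sum_i p_i f_i(x) c(k | i)
    return int(np.argmin(expected)), expected

# Hypothetical priors, density values and costs (not from the slides).
priors = [0.05, 0.60, 0.35]
densities = [0.01, 0.85, 2.0]
cost = [[0, 10, 50],
        [500, 0, 200],
        [100, 50, 0]]

k_unequal, expected = min_ecm_allocation(priors, densities, cost)

# With equal costs the rule reduces to picking the largest p_i f_i(x).
k_equal = int(np.argmax(np.asarray(priors) * np.asarray(densities)))
print(k_unequal + 1, k_equal + 1)
```

With these made-up inputs the unequal-cost rule picks population 2, while the equal-cost rule picks population 3, which shows how the costs can change the allocation.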
Maximum posterior probability Rule

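The posterior probability $P(\pi_k \mid x)$ is the probability that $x$ comes from population $k$ given that $x$ was observed:
$$P(\pi_k \mid x) = \frac{p_k f_k(x)}{\sum_{i=1}^{g} p_i f_i(x)} = \frac{(\text{prior}) \times (\text{likelihood})}{\sum \left[ (\text{prior}) \times (\text{likelihood}) \right]}, \quad k = 1, 2, \ldots, g$$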
This rule generalizes the largest posterior probability rule for
two-population classification
(Equation (11-9))
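As a small illustration of the posterior formula, the sketch below normalizes prior times likelihood over the $g$ populations. The function name `posterior_probabilities` and the numeric inputs are hypothetical.

```python
import numpy as np

def posterior_probabilities(priors, densities):
    """P(pi_k | x) = p_k f_k(x) / sum_i p_i f_i(x) for k = 1..g."""
    joint = np.asarray(priors, dtype=float) * np.asarray(densities, dtype=float)
    return joint / joint.sum()

# Hypothetical priors and density values f_i(x) (not from the slides).
posterior = posterior_probabilities([0.05, 0.60, 0.35], [0.01, 0.85, 2.0])
best = int(np.argmax(posterior))   # 0-based index of the largest posterior
print(posterior, best + 1)
```

Because the denominator is the same for every $k$, picking the largest posterior gives the same allocation as picking the largest $p_k f_k(x)$.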
Classification with Normal population
When the populations are multivariate normal, the density $f_i(x)$ in the
minimum ECM classification rule with equal misclassification costs,
$\ln p_k f_k(x) > \ln p_i f_i(x)$ (Equation (11-41)),
can be written as
$$f_i(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_i|^{1/2}} \exp\left[ -\tfrac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) \right]$$
where $p$ (without a subscript) is the number of variables.
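Then $x$ is allocated to population $k$ when
$$\ln p_k f_k(x) = \ln p_k - \frac{p}{2} \ln(2\pi) - \tfrac{1}{2} \ln|\Sigma_k| - \tfrac{1}{2} (x - \mu_k)' \Sigma_k^{-1} (x - \mu_k) = \max_i \ln p_i f_i(x)$$
The term $-\frac{p}{2}\ln(2\pi)$ is the same for every population, so it can be dropped, leaving
$$d_i^Q(x) = -\tfrac{1}{2} \ln|\Sigma_i| - \tfrac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) + \ln p_i$$
where $d_i^Q(x)$ is the quadratic discrimination score and $i = 1, 2, \ldots, g$.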
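The quadratic discrimination score is straightforward to evaluate when the mean, covariance, and prior are known. The following is a minimal sketch assuming NumPy; the function name `quadratic_score` and the inputs are hypothetical.

```python
import numpy as np

def quadratic_score(x, mean, cov, prior):
    """d_i^Q(x) = -1/2 ln|Sigma_i| - 1/2 (x - mu_i)' Sigma_i^{-1} (x - mu_i) + ln p_i."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # slogdet returns (sign, log|det|); it is numerically safer than log(det(...)).
    _, log_det = np.linalg.slogdet(cov)
    # Solve Sigma z = diff instead of forming the explicit inverse.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    return -0.5 * log_det - 0.5 * mahalanobis_sq + np.log(prior)

# Hypothetical inputs: |Sigma| = 1, squared distance = 1/2 + 4/0.5 = 8.5.
score = quadratic_score([1.0, 2.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 0.5]], 0.25)
print(score)
```

For these inputs the score is $-\tfrac{1}{2}(0) - \tfrac{1}{2}(8.5) + \ln 0.25 \approx -5.636$.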
Minimum total probability of misclassification
(TPM) rule for normal populations with different $\Sigma_i$

If the quadratic discrimination score satisfies
$$d_k^Q(x) = \max_i d_i^Q(x), \quad i = 1, 2, \ldots, g$$
then $x$ is allocated to population $k$.
Estimated minimum TPM rule for several
normal populations with different $\Sigma_i$


In practice, the $\mu_i$ and $\Sigma_i$ are usually unknown, but a
training set of correctly classified observations is
often available for the construction of estimates.
The relevant sample quantities for population $i$ are the sample mean vector $\bar{x}_i$
and the sample covariance matrix $S_i$.
The estimated score $\hat{d}_i^Q(x)$ can then be written as
$$\hat{d}_i^Q(x) = -\tfrac{1}{2} \ln|S_i| - \tfrac{1}{2} (x - \bar{x}_i)' S_i^{-1} (x - \bar{x}_i) + \ln p_i, \quad i = 1, 2, \ldots, g$$
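The estimated rule can be sketched end to end: estimate $\bar{x}_i$ and $S_i$ from each training sample, evaluate the estimated quadratic scores, and allocate to the largest. The training data below are synthetic (randomly generated, not from the slides), and the function name `estimated_quadratic_scores` is hypothetical.

```python
import numpy as np

def estimated_quadratic_scores(x, samples, priors):
    """Estimated d_i^Q(x) for each population.

    samples: list of (n_i, p) arrays of correctly classified observations.
    """
    x = np.asarray(x, dtype=float)
    scores = []
    for X, prior in zip(samples, priors):
        X = np.asarray(X, dtype=float)
        xbar = X.mean(axis=0)              # sample mean vector
        S = np.cov(X, rowvar=False)        # sample covariance, divisor n_i - 1
        diff = x - xbar
        _, log_det = np.linalg.slogdet(S)
        dist_sq = diff @ np.linalg.solve(S, diff)
        scores.append(-0.5 * log_det - 0.5 * dist_sq + np.log(prior))
    return np.array(scores)

# Synthetic training data for two well-separated populations (assumption).
rng = np.random.default_rng(0)
samples = [
    rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=50),
    rng.multivariate_normal([5.0, 5.0], [[2.0, 0.5], [0.5, 1.0]], size=50),
]
scores = estimated_quadratic_scores([4.8, 5.1], samples, priors=[0.5, 0.5])
k = int(np.argmax(scores))   # 0-based index of the allocated population
print(scores, k + 1)
```

The new observation lies close to the second population's mean, so the second score is the larger one and the observation is allocated to population 2.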
The estimated minimum TPM rule for equal-covariance normal populations

If the covariance matrices of the populations are equal, the
quadratic discrimination score simplifies to a linear discriminant score,
which is estimated using the pooled estimate of the covariance matrix, $S_{pooled}$:
$$\hat{d}_i(x) = \bar{x}_i' S_{pooled}^{-1} x - \tfrac{1}{2} \bar{x}_i' S_{pooled}^{-1} \bar{x}_i + \ln p_i$$

We can also define a new variable, the Generalized Squared Distance:
$$D_i^2(x) = (x - \bar{x}_i)' S_{pooled}^{-1} (x - \bar{x}_i)$$
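The sample quadratic discriminant score is
$$\hat{d}_i^Q(x) = -\tfrac{1}{2} \ln|S_i| - \tfrac{1}{2} (x - \bar{x}_i)' S_i^{-1} (x - \bar{x}_i) + \ln p_i$$
Replacing each $S_i$ by $S_{pooled}$ makes the term $-\tfrac{1}{2}\ln|S_{pooled}|$ the same for every population, so the score becomes
$$\hat{d}_i^Q(x) = \text{const} - \tfrac{1}{2} D_i^2(x) + \ln p_i$$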
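The relation between the linear discriminant score and the squared distance can be verified numerically. The sketch below uses small made-up training samples; the function names `pooled_covariance`, `linear_scores`, and `squared_distances` are hypothetical. It checks that $\hat{d}_i(x)$ and $-\tfrac{1}{2} D_i^2(x) + \ln p_i$ differ only by a quantity that does not depend on $i$, so both give the same allocation.

```python
import numpy as np

def pooled_covariance(samples):
    """S_pooled = sum_i (n_i - 1) S_i / (sum_i n_i - g)."""
    numerator = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in samples)
    return numerator / (sum(len(X) for X in samples) - len(samples))

def linear_scores(x, samples, priors):
    """Estimated linear discriminant score for each population."""
    S = pooled_covariance(samples)
    scores = []
    for X, prior in zip(samples, priors):
        xbar = X.mean(axis=0)
        a = np.linalg.solve(S, xbar)       # S_pooled^{-1} xbar_i
        scores.append(a @ x - 0.5 * (a @ xbar) + np.log(prior))
    return np.array(scores)

def squared_distances(x, samples):
    """D_i^2(x) = (x - xbar_i)' S_pooled^{-1} (x - xbar_i)."""
    S = pooled_covariance(samples)
    diffs = [x - X.mean(axis=0) for X in samples]
    return np.array([d @ np.linalg.solve(S, d) for d in diffs])

# Made-up training samples for two populations (not from the slides).
samples = [
    np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [2.0, 3.0]]),
    np.array([[6.0, 7.0], [7.0, 5.0], [8.0, 9.0], [7.0, 8.0]]),
]
priors = np.array([0.5, 0.5])
x = np.array([3.0, 3.0])

d_linear = linear_scores(x, samples, priors)
d_from_distance = -0.5 * squared_distances(x, samples) + np.log(priors)
gap = d_linear - d_from_distance           # same value for every population
print(d_linear, d_from_distance, gap)
```

The common gap equals $\tfrac{1}{2} x' S_{pooled}^{-1} x$, which is why it can be absorbed into the constant.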
Example 11.11. Classifying a potential
business-school graduate student



Introduction: the admission officer of a business school has used an
“index” of undergraduate grade point average (GPA) and graduate
management aptitude test (GMAT) scores to help decide which
applicants should be admitted to the school’s graduate programs.
Analysis:
 Populations: Pop1—admit; Pop2—do not admit; Pop3—
borderline
 Variable: x1—GPA; x2—GMAT
Question: Allocate a new applicant with $x_0 = (3.21, 497)$, that is, GPA 3.21 and GMAT 497,
using the sample discriminant scores.
Resolution



1) Calculate the sample mean vector $\bar{x}_i$ for each population.
2) Calculate the pooled covariance matrix $S_{pooled}$.
3) Calculate the sample squared distances $D_i^2(x)$
using the sample squared distance function
$$D_i^2(x) = (x - \bar{x}_i)' S_{pooled}^{-1} (x - \bar{x}_i), \quad i = 1, 2, 3$$
4) Results: $D_1^2(x_0) = 2.58$; $D_2^2(x_0) = 17.10$; $D_3^2(x_0) = 2.47$
By the rule of assigning x to the “closest” population (which applies when the prior probabilities are equal), $D_3^2(x_0)$ is the smallest distance, so the new
applicant should be assigned to population 3, borderline.
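The final allocation step can be reproduced from the three squared distances reported in the results. The sketch below assumes equal prior probabilities, which the slides do not state explicitly, and drops the common constant from the score; the variable names are illustrative only.

```python
import numpy as np

# Squared distances D_1^2(x0), D_2^2(x0), D_3^2(x0) reported in the results.
squared_distances = np.array([2.58, 17.10, 2.47])
labels = ["admit", "do not admit", "borderline"]

# Assumption: equal prior probabilities for the three populations.
priors = np.full(3, 1.0 / 3.0)

# Score up to a common constant: -1/2 D_i^2(x0) + ln p_i.
scores = -0.5 * squared_distances + np.log(priors)
assigned = labels[int(np.argmax(scores))]
print(assigned)
```

With equal priors the largest score corresponds to the smallest distance, so the applicant is assigned to "borderline". The margin between $D_1^2$ and $D_3^2$ is small, so unequal priors could change the allocation.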