
※ Discrimination and classification
1. To describe the differential features of objects from several known collections (populations).
2. To assign new objects to one of two or more labeled classes by means of a classification rule.
Example:
We want to distinguish two species of chickweed by sepal and petal length, petal cleft depth, bract length, scarious tip length, and pollen diameter.
Let $f_1(x)$ and $f_2(x)$ be the density functions associated with the $p \times 1$ random vector $X$ for the populations $\pi_1$ and $\pi_2$. Let $\Omega$ be the sample space of all possible observations $x$.
Let $R_1$ be the set of $x$ values for which we classify objects as $\pi_1$, and let $R_2 = \Omega - R_1$ be the remaining $x$ values, for which we classify objects as $\pi_2$. The conditional probability of classifying an object as $\pi_2$ when it is from $\pi_1$ is $P(2|1) = P(X \in R_2 \mid \pi_1) = \int_{R_2} f_1(x)\,dx$; similarly, the conditional probability of classifying an object as $\pi_1$ when it is really from $\pi_2$ is $P(1|2) = P(X \in R_1 \mid \pi_2) = \int_{R_1} f_2(x)\,dx$.
Let $p_1$ be the prior probability of $\pi_1$ and $p_2$ be the prior probability of $\pi_2$, where $p_1 + p_2 = 1$. Then
$P(\text{observation is misclassified as } \pi_1) = P(X \in R_1 \mid \pi_2)\,P(\pi_2) = P(1|2)\,p_2$,
$P(\text{observation is misclassified as } \pi_2) = P(X \in R_2 \mid \pi_1)\,P(\pi_1) = P(2|1)\,p_1$.
Let $c(1|2)$ be the cost when an observation from $\pi_2$ is incorrectly classified as $\pi_1$, and $c(2|1)$ the cost when a $\pi_1$ observation is incorrectly classified as $\pi_2$.
1. For two populations (minimize ECM):
Allocate $x$ to $\pi_1$ if
$\dfrac{f_1(x)}{f_2(x)} \ge \dfrac{c(1|2)\,p_2}{c(2|1)\,p_1}$,
and to $\pi_2$ otherwise. In particular, if the two populations are normal, this becomes the following rules (both are sketched in code after case b):
a. If $\Sigma_1 = \Sigma_2 = \Sigma$:
Allocate $x$ to $\pi_1$ if
$(\mu_1 - \mu_2)'\Sigma^{-1}x - \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \ge \ln\!\left(\dfrac{c(1|2)\,p_2}{c(2|1)\,p_1}\right)$.
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1$, $\bar{x}_2$, $S_{\text{pooled}}$ for $\mu_1$, $\mu_2$, $\Sigma$, respectively.
b. If $\Sigma_1 \ne \Sigma_2$:
Allocate $x$ to $\pi_1$ if
$-\tfrac{1}{2}x'(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x - K \ge \ln\!\left(\dfrac{c(1|2)\,p_2}{c(2|1)\,p_1}\right)$,
where $K = \tfrac{1}{2}\ln\!\left(\dfrac{|\Sigma_1|}{|\Sigma_2|}\right) + \tfrac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right)$.
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1$, $\bar{x}_2$, $S_1$, $S_2$ for $\mu_1$, $\mu_2$, $\Sigma_1$, $\Sigma_2$, respectively.
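A minimal numerical sketch of both cases with sample quantities, assuming training samples X1, X2 whose rows are observations; the function names, argument names, and default costs/priors here are all hypothetical, not from the notes:

    import numpy as np

    def sample_quantities(X1, X2):
        # Sample means, covariances, and pooled covariance estimate.
        n1, n2 = len(X1), len(X2)
        xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
        S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
        S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
        return xbar1, xbar2, S1, S2, S_pooled

    def linear_ecm_rule(x, X1, X2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
        # Case a: allocate x to pi_1 when the linear score reaches ln(c12*p2/(c21*p1)).
        xbar1, xbar2, _, _, S_pooled = sample_quantities(X1, X2)
        d = np.linalg.solve(S_pooled, xbar1 - xbar2)     # S_pooled^{-1}(xbar1 - xbar2)
        score = d @ x - 0.5 * d @ (xbar1 + xbar2)
        return score >= np.log(c12 * p2 / (c21 * p1))    # True -> allocate to pi_1

    def quadratic_ecm_rule(x, X1, X2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
        # Case b: quadratic rule with separate covariance estimates S1, S2.
        xbar1, xbar2, S1, S2, _ = sample_quantities(X1, X2)
        S1inv, S2inv = np.linalg.inv(S1), np.linalg.inv(S2)
        # K = (1/2) ln(|S1|/|S2|) + (1/2)(xbar1' S1inv xbar1 - xbar2' S2inv xbar2)
        K = (0.5 * (np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S2)[1])
             + 0.5 * (xbar1 @ S1inv @ xbar1 - xbar2 @ S2inv @ xbar2))
        score = (-0.5 * x @ (S1inv - S2inv) @ x
                 + (xbar1 @ S1inv - xbar2 @ S2inv) @ x - K)
        return score >= np.log(c12 * p2 / (c21 * p1))    # True -> allocate to pi_1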
2. For two populations using Fisher’s discrimination (maximum separation):
This method does not assume that the populations are normal, but it does require equal population covariance matrices.
Choose a linear combination $y = a'x$ such that the separation $\dfrac{|\bar{y}_1 - \bar{y}_2|}{s_y}$ is maximized, where
$s_y^2 = \dfrac{\sum_{j=1}^{n_1}(y_{1j} - \bar{y}_1)^2 + \sum_{j=1}^{n_2}(y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2}$
is the pooled estimate of the variance. The linear combination
$\hat{y} = \hat{a}'x = (\bar{x}_1 - \bar{x}_2)'S_{\text{pooled}}^{-1}x$
maximizes this separation ratio.
So, allocate $x$ to $\pi_1$ if
$(\mu_1 - \mu_2)'\Sigma^{-1}x - \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \ge \ln\!\left(\dfrac{c(1|2)\,p_2}{c(2|1)\,p_1}\right)$.
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_1$, $\bar{x}_2$, $S_{\text{pooled}}$ for $\mu_1$, $\mu_2$, $\Sigma$, respectively.
Remark: Fisher’s linear discrimination rule is equivalent to the minimum ECM rule with equal prior probabilities and equal costs of misclassification (in that case the right-hand side above is $\ln 1 = 0$).
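A sketch of Fisher’s two-group rule with sample quantities (again with hypothetical names, and self-contained rather than from the notes):

    import numpy as np

    def fisher_two_groups(x, X1, X2):
        # Project onto a_hat = S_pooled^{-1}(xbar1 - xbar2) and compare
        # y = a_hat'x with the midpoint of the two projected sample means.
        n1, n2 = len(X1), len(X2)
        xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
        S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                    + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
        a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)
        return a_hat @ x >= 0.5 * a_hat @ (xbar1 + xbar2)   # True -> pi_1

With c12 = c21 and p1 = p2, linear_ecm_rule above returns the same decision as this function, which is exactly the content of the remark.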
3. For several populations (minimum TPM):
Allocate $x$ to $\pi_k$ if $\ln p_k f_k(x) \ge \ln p_i f_i(x)$ for all $i \ne k$. In particular, if all populations are normal, this becomes the following rules (both are sketched in code after case b):
a. Unequal $\Sigma_i$:
Allocate $x$ to $\pi_k$ if $d_k^Q(x) = \max_{1 \le i \le g}\{d_i^Q(x)\}$, where $d_i^Q(x) = -\tfrac{1}{2}\ln|\Sigma_i| - \tfrac{1}{2}(x - \mu_i)'\Sigma_i^{-1}(x - \mu_i) + \ln p_i$.
PS: The above is implemented by substituting the sample quantities $\bar{x}_i$, $S_i$ for $\mu_i$, $\Sigma_i$, respectively.
b. Equal $\Sigma_i = \Sigma$, $i = 1, 2, \ldots, g$:
Allocate $x$ to $\pi_k$ if $d_k(x) = \max_{1 \le i \le g}\{d_i(x)\}$, where $d_i(x) = \mu_i'\Sigma^{-1}x - \tfrac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln p_i$.
PS: The above is implemented by substituting the sample quantities $\bar{x}_i$, $S_{\text{pooled}}$ for $\mu_i$, $\Sigma$, respectively.
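A sketch of both cases with sample quantities; samples, priors, equal_cov, and the function name are hypothetical. It returns the 0-based index of the population with the largest discriminant score:

    import numpy as np

    def classify_min_tpm(x, samples, priors, equal_cov=False):
        # samples: list of (n_i x p) arrays, one per population pi_i.
        xbars = [Xi.mean(axis=0) for Xi in samples]
        covs = [np.cov(Xi, rowvar=False) for Xi in samples]
        if equal_cov:
            # Linear scores d_i(x) = xbar_i' Sp^{-1} x - (1/2) xbar_i' Sp^{-1} xbar_i + ln p_i
            dof = sum(len(Xi) - 1 for Xi in samples)
            Sp = sum((len(Xi) - 1) * Si for Xi, Si in zip(samples, covs)) / dof
            Sp_inv = np.linalg.inv(Sp)
            scores = [xb @ Sp_inv @ x - 0.5 * xb @ Sp_inv @ xb + np.log(pi)
                      for xb, pi in zip(xbars, priors)]
        else:
            # Quadratic scores d_i^Q(x) = -(1/2)ln|S_i| - (1/2)(x - xbar_i)' S_i^{-1}(x - xbar_i) + ln p_i
            scores = [-0.5 * np.linalg.slogdet(Si)[1]
                      - 0.5 * (x - xb) @ np.linalg.solve(Si, x - xb) + np.log(pi)
                      for xb, Si, pi in zip(xbars, covs, priors)]
        return int(np.argmax(scores))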
4. For several populations using Fisher’s discrimination (maximum separation):
This method does not assume that the populations are normal, but it does require equal population covariance matrices.
We consider the linear combination $Y = a'X$ and the separation ratio
$\dfrac{\text{sum of squared distances from the population means of } Y \text{ to the overall mean of } Y}{\text{variance of } Y} = \dfrac{\sum_{i=1}^{g}(\mu_{iY} - \bar{\mu}_Y)^2}{\sigma_Y^2} = \dfrac{\sum_{i=1}^{g}(a'\mu_i - a'\bar{\mu})^2}{a'\Sigma a} = \dfrac{a'\left(\sum_{i=1}^{g}(\mu_i - \bar{\mu})(\mu_i - \bar{\mu})'\right)a}{a'\Sigma a} = \dfrac{a'B_\mu a}{a'\Sigma a},$
where $B_\mu = \sum_{i=1}^{g}(\mu_i - \bar{\mu})(\mu_i - \bar{\mu})'$ is the population between-groups matrix.
Ordinarily, $\bar{\mu}$ and the $\mu_i$ are unavailable. Suppose we take a random sample of size $n_i$ from population $\pi_i$, $i = 1, 2, \ldots, g$. Denote the $n_i \times p$ data set from population $\pi_i$ by $X_i$ and its $j$th row by $x_{ij}'$.
We define $\bar{x}_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$ and $\bar{x} = \dfrac{\sum_{i=1}^{g}\sum_{j=1}^{n_i} x_{ij}}{\sum_{i=1}^{g} n_i}$,
the sample between-groups matrix $B = \sum_{i=1}^{g} n_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'$,
and, with $S_i$ the sample covariance matrix of population $\pi_i$,
$S_{\text{pooled}} = \dfrac{\sum_{i=1}^{g}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'}{\sum_{i=1}^{g}(n_i - 1)} = \dfrac{W}{\sum_{i=1}^{g}(n_i - 1)}$,
where $W = \sum_{i=1}^{g}(n_i - 1)S_i$ is the within-groups matrix.
Consequently we want to choose an $\hat{a}$ maximizing $\dfrac{\hat{a}'B\hat{a}}{\hat{a}'S_{\text{pooled}}\hat{a}}$; this is equivalent to choosing an $\hat{a}$ maximizing $\dfrac{\hat{a}'B\hat{a}}{\hat{a}'W\hat{a}}$.
Then $\hat{a}_1 = \hat{e}_1, \hat{a}_2 = \hat{e}_2, \ldots, \hat{a}_s = \hat{e}_s$, where $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_s$ are the eigenvectors of $W^{-1}B$, scaled so that $\hat{e}'S_{\text{pooled}}\hat{e} = 1$, and $s \le \min\{(g - 1), p\}$.
The linear combination $\hat{a}_1'x$ is called the sample first discriminant, and $\hat{a}_k'x$ is called the sample $k$th discriminant.
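A sketch of computing the sample discriminants; the function name and samples argument are hypothetical, and SciPy’s generalized symmetric eigensolver is used for $W^{-1}B$:

    import numpy as np
    from scipy.linalg import eigh

    def sample_discriminants(samples):
        # Columns of the returned matrix are a_hat_1, ..., a_hat_s:
        # eigenvectors of W^{-1} B scaled so that a' S_pooled a = 1.
        g, p = len(samples), samples[0].shape[1]
        ns = [len(Xi) for Xi in samples]
        xbars = [Xi.mean(axis=0) for Xi in samples]
        xbar = np.concatenate(samples).mean(axis=0)               # overall mean
        B = sum(n * np.outer(xb - xbar, xb - xbar) for n, xb in zip(ns, xbars))
        W = sum((len(Xi) - 1) * np.cov(Xi, rowvar=False) for Xi in samples)
        S_pooled = W / (sum(ns) - g)
        # The generalized eigenproblem B a = lambda S_pooled a has the same
        # eigenvectors as S_pooled^{-1} B (hence as W^{-1} B); eigh returns
        # them normalized so that a' S_pooled a = 1.
        vals, vecs = eigh(B, S_pooled)
        order = np.argsort(vals)[::-1]                            # largest first
        return vecs[:, order[:min(g - 1, p)]]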
Remark:
Let $e_1, e_2, \ldots, e_s$ be the eigenvectors of $\Sigma^{-1}B_\mu$; then $\Sigma^{\frac{1}{2}}e_1, \Sigma^{\frac{1}{2}}e_2, \ldots, \Sigma^{\frac{1}{2}}e_s$ are eigenvectors of $\Sigma^{-\frac{1}{2}}B_\mu\Sigma^{-\frac{1}{2}}$, with the same eigenvalues. Similarly, if $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_s$ are the eigenvectors of $W^{-1}B$, then $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_s$ are also the eigenvectors of $S_{\text{pooled}}^{-1}B$, since $S_{\text{pooled}}$ is a scalar multiple of $W$.
Moreover, $Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_s \end{pmatrix} = \begin{pmatrix} a_1'x \\ a_2'x \\ \vdots \\ a_s'x \end{pmatrix}$ has mean vector $\mu_{iY} = \begin{pmatrix} a_1'\mu_i \\ a_2'\mu_i \\ \vdots \\ a_s'\mu_i \end{pmatrix}$ under population $\pi_i$ and covariance matrix $I$.
Then the appropriate measure of squared distance from $Y = y$ to $\mu_{iY}$ is
$(y - \mu_{iY})'(y - \mu_{iY}) = \sum_{j=1}^{s}(y_j - \mu_{iYj})^2$.
Allocate $x$ to $\pi_k$ if
$\sum_{j=1}^{s}(y_j - \mu_{kYj})^2 = \sum_{j=1}^{s}\left(a_j'(x - \mu_k)\right)^2 \le \sum_{j=1}^{s}\left(a_j'(x - \mu_i)\right)^2$ for all $i \ne k$.
PS: The above inequality is implemented by substituting the sample quantities $\bar{x}_i$, $\hat{a}_j$ for $\mu_i$, $a_j$, where $\hat{a}_j$ is defined as above.
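A sketch of this allocation rule with sample quantities; the names are hypothetical, and A is the matrix of discriminant vectors from sample_discriminants above:

    import numpy as np

    def classify_fisher(x, samples, A):
        # Allocate x to the population whose projected mean is nearest
        # in the coordinates of the s sample discriminants (columns of A).
        y = A.T @ x
        dists = [np.sum((y - A.T @ Xi.mean(axis=0)) ** 2) for Xi in samples]
        return int(np.argmin(dists))   # 0-based population index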
In fact, Fisher’s discrimination among several populations is a special case of the “normal theory” discriminant score $d_i(x)$, i.e.
$\sum_{j=1}^{s}(y_j - \mu_{iYj})^2 = \sum_{j=1}^{s}\left(a_j'(x - \mu_i)\right)^2 = (x - \mu_i)'\Sigma^{-1}(x - \mu_i) = -2d_i(x) + x'\Sigma^{-1}x + 2\ln p_i$,
where $y_j = a_j'x$, $a_j = \Sigma^{-\frac{1}{2}}e_j$, and $e_j$ is an eigenvector of $\Sigma^{-\frac{1}{2}}B_\mu\Sigma^{-\frac{1}{2}}$.
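The last equality is just the expansion of the quadratic form, using the definition of $d_i(x)$ from 3b:
$(x - \mu_i)'\Sigma^{-1}(x - \mu_i) = x'\Sigma^{-1}x - 2\mu_i'\Sigma^{-1}x + \mu_i'\Sigma^{-1}\mu_i = x'\Sigma^{-1}x - 2\left(d_i(x) - \ln p_i\right).$
Since $x'\Sigma^{-1}x$ does not depend on $i$, minimizing the squared distance over $i$ is the same as maximizing $d_i(x) - \ln p_i$; with equal priors, this is the same as maximizing $d_i(x)$.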