11.7 Fisher's discriminant function: several populations

11.7 Fisher's Method for Discriminating Among Several Populations

1. Separation:

Suppose there are $g$ populations, with observations

$X_1, X_2, \ldots, X_{n_1}$: population 1
$X_{n_1+1}, X_{n_1+2}, \ldots, X_{n_1+n_2}$: population 2
$\vdots$
$X_{n_1+\cdots+n_{g-1}+1}, \ldots, X_{n_T}$: population $g$,

where $n_1 + n_2 + \cdots + n_g = n_T$.

Let $\bar{X}_j$ be the sample mean for population $j$, $j = 1, \ldots, g$, and let

$$\bar{X} = \frac{1}{n_T} \sum_{i=1}^{n_T} X_i .$$

The sample between matrix is

$$B = \sum_{j=1}^{g} n_j (\bar{X}_j - \bar{X})(\bar{X}_j - \bar{X})^t .$$

Thus,

$$a^t B a = \sum_{j=1}^{g} n_j\, a^t (\bar{X}_j - \bar{X})(\bar{X}_j - \bar{X})^t a = \sum_{j=1}^{g} n_j (a^t \bar{X}_j - a^t \bar{X})(\bar{X}_j^t a - \bar{X}^t a) = \sum_{j=1}^{g} n_j (\bar{Y}_j - \bar{Y})^2 ,$$

where $Y_i = a^t X_i$, $i = 1, \ldots, n_T$, $\bar{Y}_j$ is the mean of the transformed observations in the $j$'th population, $j = 1, \ldots, g$ (for example, $\bar{Y}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} Y_i$), and $\bar{Y} = \frac{1}{n_T} \sum_{i=1}^{n_T} Y_i$.

The sample within group matrix is

$$W = \sum_{i=1}^{n_1} (X_i - \bar{X}_1)(X_i - \bar{X}_1)^t + \sum_{i=n_1+1}^{n_1+n_2} (X_i - \bar{X}_2)(X_i - \bar{X}_2)^t + \cdots + \sum_{i=n_1+\cdots+n_{g-1}+1}^{n_T} (X_i - \bar{X}_g)(X_i - \bar{X}_g)^t .$$

Thus,

$$a^t W a = \sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2 + \cdots + \sum_{i=n_1+\cdots+n_{g-1}+1}^{n_T} (Y_i - \bar{Y}_g)^2 .$$

Note:

$$\frac{a^t W a}{n_T - g} = \frac{\sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2 + \cdots + \sum_{i=n_1+\cdots+n_{g-1}+1}^{n_T} (Y_i - \bar{Y}_g)^2}{n_T - g}$$

is the pooled variance estimate based on $Y_1, Y_2, \ldots, Y_{n_T}$, and

$$S_{pooled} = \frac{W}{n_T - g}$$

is the pooled covariance estimate based on $X_1, X_2, \ldots, X_{n_T}$.

Fisher's linear discriminant method for several populations is as follows. Find the vector $\hat{a}_1$ maximizing the separation function

$$S(a) = \frac{a^t B a}{a^t W a} = \frac{\sum_{j=1}^{g} n_j (\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2 + \cdots + \sum_{i=n_1+\cdots+n_{g-1}+1}^{n_T} (Y_i - \bar{Y}_g)^2}$$

subject to $\hat{a}_1^t S_{pooled}\, \hat{a}_1 = 1$. The linear combination $\hat{a}_1^t X$ is called the sample first discriminant.
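The quantities defined above can be checked numerically. The following is a minimal Python sketch (the notes themselves use S-Plus further on) that builds $B$, $W$ and $S_{pooled}$ for the small three-group data set used in the worked example of these notes and evaluates $S(a)$ for one direction. The helper names (`mean`, `add_outer`, `quad`, `separation`, `between_ss`) and the direction `a = (1, 1)` are illustrative assumptions, not part of the original notes.

```python
# Toy data from the worked example in these notes: g = 3 groups, 2 variables.
groups = [
    [(-2.0, 5.0), (0.0, 3.0), (-1.0, 1.0)],   # population 1
    [(0.0, 6.0), (2.0, 4.0), (1.0, 2.0)],     # population 2
    [(1.0, -2.0), (0.0, 0.0), (-1.0, -4.0)],  # population 3
]

def mean(rows):
    """Componentwise mean of a list of 2-vectors."""
    return tuple(sum(r[k] for r in rows) / len(rows) for k in range(2))

def add_outer(M, u, weight=1.0):
    """Return M + weight * u u^t for a 2 x 2 matrix M and a 2-vector u."""
    return [[M[i][j] + weight * u[i] * u[j] for j in range(2)] for i in range(2)]

def quad(a, M):
    """Quadratic form a^t M a."""
    return sum(a[i] * M[i][j] * a[j] for i in range(2) for j in range(2))

all_rows = [x for grp in groups for x in grp]
xbar = mean(all_rows)                 # overall mean
n_T, g = len(all_rows), len(groups)

B = [[0.0, 0.0], [0.0, 0.0]]          # between matrix
W = [[0.0, 0.0], [0.0, 0.0]]          # within-group matrix
for grp in groups:
    m = mean(grp)
    B = add_outer(B, (m[0] - xbar[0], m[1] - xbar[1]), weight=len(grp))
    for x in grp:
        W = add_outer(W, (x[0] - m[0], x[1] - m[1]))

S_pooled = [[W[i][j] / (n_T - g) for j in range(2)] for i in range(2)]

def separation(a):
    """Separation function S(a) = a^t B a / a^t W a."""
    return quad(a, B) / quad(a, W)

def between_ss(a):
    """sum_j n_j (Ybar_j - Ybar)^2 for the transformed data Y_i = a^t X_i."""
    y_all = [a[0] * x[0] + a[1] * x[1] for x in all_rows]
    ybar = sum(y_all) / n_T
    total = 0.0
    for grp in groups:
        y = [a[0] * x[0] + a[1] * x[1] for x in grp]
        total += len(y) * (sum(y) / len(y) - ybar) ** 2
    return total

a = (1.0, 1.0)  # arbitrary direction, chosen only for illustration
print("B =", B)
print("W =", W)
print("S_pooled =", S_pooled)
print("S(a) =", separation(a))
print("a^t B a =", quad(a, B), "sum n_j (Ybar_j - Ybar)^2 =", between_ss(a))
```

The last line compares the matrix form $a^t B a$ with the sum of squares of the transformed group means, which the derivation above shows to be equal.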
Find the vector $\hat{a}_2$ maximizing the separation function $S(a)$ subject to $\hat{a}_2^t S_{pooled}\, \hat{a}_2 = 1$ and $\hat{a}_2^t S_{pooled}\, \hat{a}_1 = 0$.

$\vdots$

Find the vector $\hat{a}_s$ maximizing the separation function $S(a)$ subject to $\hat{a}_s^t S_{pooled}\, \hat{a}_s = 1$ and $\hat{a}_s^t S_{pooled}\, \hat{a}_l = 0$, $l < s$.

Note: $\hat{a}_j^t S_{pooled}\, \hat{a}_j$ is the estimate of $\mathrm{Var}(\hat{a}_j^t X)$, $j = 1, \ldots, s$, and $\hat{a}_j^t S_{pooled}\, \hat{a}_l$, $j \neq l$, is the estimate of $\mathrm{Cov}(\hat{a}_j^t X, \hat{a}_l^t X)$. The condition $\hat{a}_j^t S_{pooled}\, \hat{a}_l = 0$ is similar to the condition imposed in principal component analysis.

Intuitively, $S(a)$ measures the difference among the transformed means, reflected by

$$\sum_{j=1}^{g} n_j (\bar{Y}_j - \bar{Y})^2 ,$$

relative to the random variation of the transformed data, reflected by

$$\sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2 + \cdots + \sum_{i=n_1+\cdots+n_{g-1}+1}^{n_T} (Y_i - \bar{Y}_g)^2 .$$

When the transformed observations

$Y_1, Y_2, \ldots, Y_{n_1}$ (population 1), $Y_{n_1+1}, Y_{n_1+2}, \ldots, Y_{n_1+n_2}$ (population 2), $\ldots$, $Y_{n_1+\cdots+n_{g-1}+1}, \ldots, Y_{n_T}$ (population $g$)

are well separated, $\sum_{j=1}^{g} n_j (\bar{Y}_j - \bar{Y})^2$ should be large even after the random variation of the transformed data is taken into account.

Important result:

Let $e_1, e_2, \ldots, e_s$ be the orthonormal eigenvectors of $W^{-1/2} B W^{-1/2}$ corresponding to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_s > 0$. Then

$$\hat{a}_j = S_{pooled}^{-1/2}\, e_j , \quad j = 1, \ldots, s,$$

where $S_{pooled}^{-1/2} S_{pooled}^{-1/2} = S_{pooled}^{-1}$.

The following result provides another way to obtain the discriminants.

Important result:

Let $e_1, e_2, \ldots, e_s$ be the eigenvectors of $W^{-1} B$ corresponding to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_s > 0$. Then $\hat{a}_j$, $j = 1, \ldots, s$, are the scaled eigenvectors satisfying $\hat{a}_j^t S_{pooled}\, \hat{a}_j = 1$. That is,

$$\hat{a}_j = \frac{e_j}{\sqrt{e_j^t S_{pooled}\, e_j}} .$$
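The second result can be illustrated on a 2 x 2 case. The following minimal Python sketch takes $B$ and $W$ from the worked example in these notes, finds the eigenvalues of $W^{-1}B$ from the characteristic polynomial, and rescales each eigenvector so that $\hat{a}_j^t S_{pooled}\, \hat{a}_j = 1$. The closed-form 2 x 2 eigen-solver and the helper names are assumptions made for this illustration; for larger problems a numerical eigen-routine would be used instead.

```python
import math

# Matrices from the worked example in these notes.
B = [[6.0, 3.0], [3.0, 62.0]]
W = [[6.0, -2.0], [-2.0, 24.0]]
n_T, g = 9, 3
S_pooled = [[w / (n_T - g) for w in row] for row in W]

def inverse2(M):
    """Inverse of a 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matmul2(A, C):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * C[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bilinear(u, M, v):
    """Bilinear form u^t M v."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

M = matmul2(inverse2(W), B)  # W^{-1} B

# Eigenvalues of a 2 x 2 matrix from its characteristic polynomial.
trace = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
root = math.sqrt(trace * trace - 4.0 * det)
eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]  # lambda_1 >= lambda_2

def discriminant_vector(lam):
    """Eigenvector of W^{-1} B for eigenvalue lam, scaled so a^t S_pooled a = 1.

    Uses the first row of (M - lam I) e = 0, which requires M[0][1] != 0.
    """
    e = (M[0][1], lam - M[0][0])
    scale = math.sqrt(bilinear(e, S_pooled, e))
    return (e[0] / scale, e[1] / scale)

a1_hat, a2_hat = (discriminant_vector(lam) for lam in eigenvalues)

print("eigenvalues:", eigenvalues)
print("a1_hat:", a1_hat)
print("a2_hat:", a2_hat)
print("a1' S a1:", bilinear(a1_hat, S_pooled, a1_hat))
print("a1' S a2:", bilinear(a1_hat, S_pooled, a2_hat))
```

The last two printed values check the constraints: each scaled vector has unit estimated variance, and the two discriminants have zero estimated covariance.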
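2. Classification:

Fisher's classification method for several populations is as follows. For an observation $X_0$, the procedure based on the first $r \le s$ sample discriminants allocates $X_0$ to population $l$ if

$$\sum_{j=1}^{r} (\hat{Y}_j - \bar{Y}_l^{\,j})^2 = \sum_{j=1}^{r} [\hat{a}_j^t (X_0 - \bar{X}_l)]^2 \le \sum_{j=1}^{r} [\hat{a}_j^t (X_0 - \bar{X}_i)]^2 = \sum_{j=1}^{r} (\hat{Y}_j - \bar{Y}_i^{\,j})^2 , \quad i \neq l,$$

where $\hat{Y}_j = \hat{a}_j^t X_0$ and $\bar{Y}_i^{\,j} = \hat{a}_j^t \bar{X}_i$, $j = 1, \ldots, r$; $i = 1, \ldots, g$. Here the superscript $j$ indexes the discriminant; it is not a power.

Intuition of Fisher's method:

The maps $l_j(X) = \hat{a}_j^t X$, $j = 1, \ldots, r$, carry points in $R^p$ to points in $R^r$:

population 1 mean $\bar{X}_1$ is carried to $(\bar{Y}_1^{\,1}, \ldots, \bar{Y}_1^{\,r})$
population 2 mean $\bar{X}_2$ is carried to $(\bar{Y}_2^{\,1}, \ldots, \bar{Y}_2^{\,r})$
$\vdots$
population $g$ mean $\bar{X}_g$ is carried to $(\bar{Y}_g^{\,1}, \ldots, \bar{Y}_g^{\,r})$
the new observation $X_0$ is carried to $(\hat{Y}_1, \ldots, \hat{Y}_r)$

$\sum_{j=1}^{r} (\hat{Y}_j - \bar{Y}_1^{\,j})^2$: the "total" squared distance between the transformed $X_0$, $(\hat{Y}_1, \ldots, \hat{Y}_r)$, and the transformed mean of population 1, $(\bar{Y}_1^{\,1}, \ldots, \bar{Y}_1^{\,r})$.

$\sum_{j=1}^{r} (\hat{Y}_j - \bar{Y}_2^{\,j})^2$: the "total" squared distance between the transformed $X_0$ and the transformed mean of population 2, $(\bar{Y}_2^{\,1}, \ldots, \bar{Y}_2^{\,r})$.

$\vdots$

$\sum_{j=1}^{r} (\hat{Y}_j - \bar{Y}_g^{\,j})^2$: the "total" squared distance between the transformed $X_0$ and the transformed mean of population $g$, $(\bar{Y}_g^{\,1}, \ldots, \bar{Y}_g^{\,r})$.

The inequalities

$$\sum_{j=1}^{r} (\hat{Y}_j - \bar{Y}_l^{\,j})^2 \le \sum_{j=1}^{r} (\hat{Y}_j - \bar{Y}_i^{\,j})^2 , \quad i \neq l,$$

imply that the total squared distance between the transformed $X_0$ and the transformed mean of population $l$ is no larger than the distance between the transformed $X_0$ and the transformed mean of any other population. In this sense, $X_0$ is "closer" to population $l$ than to the other populations. Therefore, $X_0$ is allocated to population $l$.

Example:

$$X_1 = \begin{pmatrix} x_1^t \\ x_2^t \\ x_3^t \end{pmatrix} = \begin{pmatrix} -2 & 5 \\ 0 & 3 \\ -1 & 1 \end{pmatrix},\ n_1 = 3; \quad X_2 = \begin{pmatrix} x_4^t \\ x_5^t \\ x_6^t \end{pmatrix} = \begin{pmatrix} 0 & 6 \\ 2 & 4 \\ 1 & 2 \end{pmatrix},\ n_2 = 3; \quad X_3 = \begin{pmatrix} x_7^t \\ x_8^t \\ x_9^t \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ 0 & 0 \\ -1 & -4 \end{pmatrix},\ n_3 = 3 .$$

Then,

$$\bar{x}_1 = \begin{pmatrix} -1 \\ 3 \end{pmatrix}, \quad \bar{x}_2 = \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \quad \bar{x}_3 = \begin{pmatrix} 0 \\ -2 \end{pmatrix}, \quad \bar{x} = \begin{pmatrix} 0 \\ 5/3 \end{pmatrix},$$

$$B = \sum_{i=1}^{3} 3 (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^t = \begin{pmatrix} 6 & 3 \\ 3 & 62 \end{pmatrix},$$

and

$$W = \sum_{j=1}^{3} (x_j - \bar{x}_1)(x_j - \bar{x}_1)^t + \sum_{j=4}^{6} (x_j - \bar{x}_2)(x_j - \bar{x}_2)^t + \sum_{j=7}^{9} (x_j - \bar{x}_3)(x_j - \bar{x}_3)^t = \begin{pmatrix} 6 & -2 \\ -2 & 24 \end{pmatrix}.$$

Further,

$$S_{pooled} = \frac{1}{n_1 + n_2 + n_3 - 3}\, W = \begin{pmatrix} 1 & -1/3 \\ -1/3 & 4 \end{pmatrix}, \quad W^{-1} B = \begin{pmatrix} 1.07143 & 1.4 \\ 0.21429 & 2.7 \end{pmatrix}.$$

The eigenvalues and eigenvectors of $W^{-1} B$ are

$$\lambda_1 = 2.867,\ e_1 = \begin{pmatrix} 0.7183 \\ 0.9213 \end{pmatrix}; \quad \lambda_2 = 0.9043,\ e_2 = \begin{pmatrix} 0.9929 \\ -0.11842 \end{pmatrix}.$$

Thus,

$$\hat{a}_1 = \frac{e_1}{\sqrt{e_1^t S_{pooled}\, e_1}} = \frac{e_1}{\sqrt{3.47}} = \begin{pmatrix} 0.386 \\ 0.495 \end{pmatrix}; \quad \hat{a}_2 = \frac{e_2}{\sqrt{e_2^t S_{pooled}\, e_2}} = \frac{e_2}{\sqrt{1.12}} = \begin{pmatrix} 0.938 \\ -0.112 \end{pmatrix}.$$

Therefore,

$$\hat{y}_1 = \hat{a}_1^t x = 0.386 x_1 + 0.495 x_2 ; \quad \hat{y}_2 = \hat{a}_2^t x = 0.938 x_1 - 0.112 x_2 .$$

To classify a new observation $x_0 = (1, 3)^t$, we need to compute

$\hat{y}_1 = \hat{a}_1^t x_0 = 1.87$, $\hat{y}_2 = \hat{a}_2^t x_0 = 0.60$,
$\bar{y}_1^{\,1} = \hat{a}_1^t \bar{x}_1 = 1.10$, $\bar{y}_1^{\,2} = \hat{a}_2^t \bar{x}_1 = -1.27$,
$\bar{y}_2^{\,1} = \hat{a}_1^t \bar{x}_2 = 2.37$, $\bar{y}_2^{\,2} = \hat{a}_2^t \bar{x}_2 = 0.49$,
$\bar{y}_3^{\,1} = \hat{a}_1^t \bar{x}_3 = -0.99$, $\bar{y}_3^{\,2} = \hat{a}_2^t \bar{x}_3 = 0.22$.

Since

$$\sum_{j=1}^{2} (\hat{y}_j - \bar{y}_1^{\,j})^2 = (1.87 - 1.10)^2 + (0.60 + 1.27)^2 = 4.09,$$

$$\sum_{j=1}^{2} (\hat{y}_j - \bar{y}_2^{\,j})^2 = (1.87 - 2.37)^2 + (0.60 - 0.49)^2 = 0.26,$$

$$\sum_{j=1}^{2} (\hat{y}_j - \bar{y}_3^{\,j})^2 = (1.87 + 0.99)^2 + (0.60 - 0.22)^2 = 8.32,$$

the smallest distance is to population 2, so the observation $x_0 = (1, 3)^t$ is allocated to population 2.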
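The allocation rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the rounded discriminant coefficients $(0.386, 0.495)$ and $(0.938, -0.112)$ and the three group means from the example; the names `a_hat`, `means`, `project` and `classify` are hypothetical.

```python
# Rounded discriminant coefficients and group means from the worked example.
a_hat = [(0.386, 0.495), (0.938, -0.112)]
means = {1: (-1.0, 3.0), 2: (1.0, 4.0), 3: (0.0, -2.0)}

def project(x):
    """Map a 2-vector x to its discriminant scores (a1_hat' x, a2_hat' x)."""
    return tuple(a[0] * x[0] + a[1] * x[1] for a in a_hat)

def classify(x0):
    """Return the label of the nearest transformed mean and all squared distances."""
    y0 = project(x0)
    dist = {}
    for label, m in means.items():
        ym = project(m)
        dist[label] = sum((u - v) ** 2 for u, v in zip(y0, ym))
    return min(dist, key=dist.get), dist

label, dist = classify((1.0, 3.0))
print("allocated to population", label)
print({k: round(v, 2) for k, v in dist.items()})
```

The sketch does not round the intermediate scores, so its distances can differ from the values 4.09, 0.26 and 8.32 in the second decimal place; the allocation to population 2 is unaffected.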
Useful Splus Commands:

# ir is not defined in these notes; it is presumably the 150 x 4 iris data matrix,
# with rows 1-50, 51-100 and 101-150 holding the three groups.
>xmean1=apply(ir[1:50,],2,mean)      # $\bar{X}_1$
>xmean2=apply(ir[51:100,],2,mean)    # $\bar{X}_2$
>xmean3=apply(ir[101:150,],2,mean)   # $\bar{X}_3$
>xmean=apply(ir,2,mean)              # $\bar{X}$
>b1<-50*(xmean1-xmean)%*%t(xmean1-xmean)   # $n_1 (\bar{X}_1 - \bar{X})(\bar{X}_1 - \bar{X})^t$
>b2<-50*(xmean2-xmean)%*%t(xmean2-xmean)   # $n_2 (\bar{X}_2 - \bar{X})(\bar{X}_2 - \bar{X})^t$
>b3<-50*(xmean3-xmean)%*%t(xmean3-xmean)   # $n_3 (\bar{X}_3 - \bar{X})(\bar{X}_3 - \bar{X})^t$
>b<-b1+b2+b3                         # $B = \sum_{j=1}^{g} n_j (\bar{X}_j - \bar{X})(\bar{X}_j - \bar{X})^t$
>sum1=49*var(ir[1:50,])              # $\sum_{i=1}^{n_1} (X_i - \bar{X}_1)(X_i - \bar{X}_1)^t$
>sum2=49*var(ir[51:100,])            # $\sum_{i=n_1+1}^{n_1+n_2} (X_i - \bar{X}_2)(X_i - \bar{X}_2)^t$
>sum3=49*var(ir[101:150,])           # $\sum_{i=n_1+n_2+1}^{n_T} (X_i - \bar{X}_3)(X_i - \bar{X}_3)^t$
>w<-sum1+sum2+sum3                   # $W$
>invw<-solve(w)                      # $W^{-1}$
>spool=w/(50+50+50-3)                # $S_{pooled} = W / (n_1 + n_2 + n_3 - 3)$
>evectors=eigen(invw%*%b)$vectors    # $e_1, e_2, e_3, e_4$
>a1hat=evectors[,1]/sqrt(t(evectors[,1])%*%spool%*%evectors[,1])   # $\hat{a}_1 = e_1 / \sqrt{e_1^t S_{pooled} e_1}$
>a2hat=evectors[,2]/sqrt(t(evectors[,2])%*%spool%*%evectors[,2])   # $\hat{a}_2 = e_2 / \sqrt{e_2^t S_{pooled} e_2}$
>a1x=ir%*%a1hat                      # $\hat{a}_1^t X_i$, $i = 1, \ldots, n_T$
>a2x=ir%*%a2hat                      # $\hat{a}_2^t X_i$, $i = 1, \ldots, n_T$
>plot(a1x,a2x)                       # separation based on the first two sample discriminants

Objective: Allocate $x_0 = (5, 3, 1, 1)^t$ and compute the error rate.

Useful Splus Commands (Classification):

>x0=c(5,3,1,1)
>yhat1=a1hat%*%x0                    # $\hat{y}_1$
>yhat2=a2hat%*%x0                    # $\hat{y}_2$
>y1bar1=a1hat%*%xmean1               # $\bar{y}_1^{\,1}$
>y1bar2=a2hat%*%xmean1               # $\bar{y}_1^{\,2}$
>y2bar1=a1hat%*%xmean2               # $\bar{y}_2^{\,1}$
>y2bar2=a2hat%*%xmean2               # $\bar{y}_2^{\,2}$
>y3bar1=a1hat%*%xmean3               # $\bar{y}_3^{\,1}$
>y3bar2=a2hat%*%xmean3               # $\bar{y}_3^{\,2}$
>dis1=(yhat1-y1bar1)^2+(yhat2-y1bar2)^2   # $\sum_{j=1}^{2} (\hat{y}_j - \bar{y}_1^{\,j})^2$
>dis2=(yhat1-y2bar1)^2+(yhat2-y2bar2)^2   # $\sum_{j=1}^{2} (\hat{y}_j - \bar{y}_2^{\,j})^2$
>dis3=(yhat1-y3bar1)^2+(yhat2-y3bar2)^2   # $\sum_{j=1}^{2} (\hat{y}_j - \bar{y}_3^{\,j})^2$
>c(dis1,dis2,dis3)
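The objective above also asks for an error rate, for which the notes list no commands. One common reading, assumed here, is the apparent (resubstitution) error rate: classify every training observation with the fitted rule and report the fraction misallocated. The following minimal Python sketch does this for the small three-group example from these notes rather than for the `ir` data, using the rounded discriminant coefficients from that example; all names are hypothetical.

```python
# Rounded discriminant coefficients from the three-group worked example.
a_hat = [(0.386, 0.495), (0.938, -0.112)]

# Training data from the same example, keyed by population label.
groups = {
    1: [(-2.0, 5.0), (0.0, 3.0), (-1.0, 1.0)],
    2: [(0.0, 6.0), (2.0, 4.0), (1.0, 2.0)],
    3: [(1.0, -2.0), (0.0, 0.0), (-1.0, -4.0)],
}

def project(x):
    """Discriminant scores (a1_hat' x, a2_hat' x) of a 2-vector x."""
    return tuple(a[0] * x[0] + a[1] * x[1] for a in a_hat)

def mean(rows):
    """Componentwise mean of a list of 2-vectors."""
    return tuple(sum(r[k] for r in rows) / len(rows) for k in range(2))

# Transformed group means (Ybar_i^1, Ybar_i^2).
centers = {label: project(mean(rows)) for label, rows in groups.items()}

def classify(x):
    """Allocate x to the population whose transformed mean is nearest."""
    y = project(x)
    return min(centers,
               key=lambda lab: sum((u - v) ** 2 for u, v in zip(y, centers[lab])))

# Count training observations that are allocated to the wrong population.
errors = sum(1 for label, rows in groups.items()
             for x in rows if classify(x) != label)
n_T = sum(len(rows) for rows in groups.values())
apparent_error_rate = errors / n_T

print("misclassified:", errors, "of", n_T)
print("apparent error rate:", apparent_error_rate)
```

Because the same observations are used to fit and to evaluate the rule, this estimate tends to be optimistic; a holdout or leave-one-out estimate is a common alternative.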
