Useful Splus Commands (2 Populations, classification):

advertisement
25
Useful Splus Commands (2 Populations, classification):
>xmean1=apply(irsv[1:50,],2,mean)
> xmean2=apply(irsv[51:100,],2,mean)
>y1bar=ira%*%xmean1
>y2bar=ira%*%xmean2
# X1
# X2
# y1
# y2
>mid=(y1bar+y2bar)/2
# y1  y 2
>x0=irsv[1,]
>yhat=ira%*%x0
2
# X0
# ŷ 0
The following important result provides another way to obtain the
discriminants!!
Important result:
Let e1 , e2 ,, es be the eigenvectors of W 1 B corresponding to the
eigenvalues 1  2     s  0. Then, aˆ j , j  1, , s, are the scaled
eigenvectors satisfying aˆ j t S pooled aˆ j  1 . That is,
ˆj 
a
ej
e tj S pooled e j
Example:
 x1t   2 5
 x 4t  0
 t 
 
X 1   x 2    0 3 n1  3; X 2   x5t   2
 x3t    1 1
 x 6t  1
 
 
 x7t   1  2
6
 
4 n 2  3; X 3   x8t    0
0  n3  3 .
 x9t   1  4
2
 
Then,
 0 
 1
1 
 0 
x1   , x 2   , x3  
,
x

5  ,

 3 
3
 4
  2
6
t
B   3 xi  x  xi  x   
i 1
3
3
,
62
3
and
W   x j  x1 x j  x1    x j  x2 x j  x2    x j  x3 x j  x3 
3
t
j 1
 6

 2
6
t
j 4
9
j 7
 2
24 
25
t
.
26
Further,
S pooled 
1 
3 , W 1 B  1.07143 1.4  .
0.21429 2.7
4 



 1
W

n1  n2  n3  3  1
 3
The eigenvectors of W 1 B are
0.7183
 0.9929 


e1  


2
.
867
;
e

2
 1
 0.11842  0.9043 .
0.9213


Thus,
aˆ1 
e1
e1t S pooled e1
0.386
 0.938 
e2
e2

; aˆ 2 



.
3.47 0.495
1.12  0.112
e2t S pooled e2
e1

Therefore,
yˆ1  aˆ1t x  0.386 x1  0.495x2 ; yˆ 2  aˆ 2t x  0.938x1  0.112 x2 .
1
To classify a new observation x0    , we need to compute
3
yˆ1  aˆ1t x0  1.87, yˆ 2  aˆ 2t x0  0.60 ,
y11  aˆ1t x1  1.10, y12  aˆ2t x1  1.27 ,
y21  aˆ1t x2  2.37, y22  aˆ2t x2  0.49 ,
y31  aˆ1t x3  0.99, y32  aˆ 2t x3  0.22 .
Since
 yˆ
2
2
j 1
 yˆ
j
y
j
y

2
j 1
  1.87  1.10  0.60  1.27 
2
2
 4.09,
2
2
j 1
j
1
j
2
  1.87  2.37   0.60  0.49
2
2
 0.26,
2
 8.32,
2
yˆ j  y 3j
  1.87  0.99  0.60  0.22
2
1
the observation x0    is allocated to population 2.
3
Useful Splus Commands:
>xmean1=apply(ir[1:50,],2,mean)
> xmean2=apply(ir[51:100,],2,mean)
# X1
# X2
26
27
> xmean3=apply(ir[101:150,],2,mean) # X 3
>xmean=apply(ir,2,mean)
# X
>b1<-50*(xmean1-xmean)%*%t(xmean1-xmean)
>b2<-50*(xmean2-xmean)%*%t(xmean2-xmean)
# n1 ( X 1  X )( X 1  X ) t
# n2 ( X 2  X )( X 2  X ) t
>b3<-50*(xmean3-xmean)%*%t(xmean3-xmean)
# n3 ( X 3  X )( X 3  X ) t
# B   n j X j  X X j  X 
g
>b<-b1+b2+b3
t
j 1
 X
 X 1 X i  X 1 
n1
>sum1=49*var(ir[1:50,])
#
i 1
t
i
n1  n2
> sum2=49*var(ir[51:100,])
#
 X
i  n1 1
 X 2  X i  X 2 
t
i
 X
nT
> sum3=49*var(ir[101:150,])
#
i  n1  n2 1
>w<-sum1+sum2+sum3
 X 3  X i  X 3 
t
i
#W
# W 1
>invw<-solve(w)
>spool=w/(50+50+50-3)
# S pooled  W
>evectors=eigen(invw%*%b)$vectors
# e1 , e2 , e3 , e4
#
aˆ1 
n1  n2  n3  3
e1
e1t S pooled e1
>a1hat=evectors[,1]/sqrt(t(evectors[,1])%*%spool%*%evectors[,1])
#
aˆ 2 
e2
e2t S pooled e2
>a2hat=evectors[,2]/sqrt(t(evectors[,2])%*%spool%*%evectors[,2])
# aˆ1t X i , i  1, nT .
>a1x=ir%*%ia1hat
> a2x=ir%*%a2hat
>plot(a1x,a2x)
# aˆ 2t X i , i  1,  nT
# separation based on the first two sample discriminant
27
28
Objective: Allocate x0  5 3 1 1t and compute the error rate
Useful Splus Commands (Classfication):
>x0=c(5,3,1,1)
>yhat1=a1hat%*%x0
>yhat2=a2hat%*%x0
# ŷ1
# ŷ 2
>y1bar1=a1hat%*%xmean1
>y1bar2=a2hat%*%xmean1
>y2bar1=a1hat%*%xmean2
>y2bar2=a2hat%*%xmean2
#
#
#
#
>y3bar1=a1hat%*%xmean3
# y 31
>y3bar2=a2hat%*%xmean3
# y 32
y11
y12
y 21
y 22
 yˆ
2
>dis1=(yhat1-y1bar1)^2+(yhat2-y1bar2)^2
#
j 1
#
j
 y 2j

 yˆ
j
 y3j

j 1
2
>dis3=(yhat1-y3bar1)^2+(yhat2-y3bar2)^2
#
 y1j
 yˆ
2
>dis2=(yhat1-y2bar1)^2+(yhat2-y2bar2)^2
j
j 1

2
2
2
>c(dis1,dis2,dis3)
Homework
Suppose we have the following data for 2 populations:
X1 , X 2 
Population 1
(-3, 11), (1, 7), (-1, 3)
Population 2
(1, 13), (5, 9), (3, 5)
Population 3
(2, 15), (6, 11), (4, 7)
(a) Based on populations 1 and 2, please answer the following questions:
(i) Find
â
and the linear discriminant function. (20%)
(ii) Find the error rate based the linear discriminant function in (i) for
populations 1 and 2. (10%)
(b) Based on all populations, suppose the two discriminants are
yˆ1  0.558 X 1  0.204 X 2 ,
yˆ 2  0.149 X 1  0.204 X 2 .
Please classify the observations (1, 7), (5, 9) and (6, 11). (10%)
28
Download