Stat 407 Lab 5 Classification Fall 2001 SOLUTION

This lab is designed to help you explore a classification problem using S-Plus (Chapter 6, Multivariate, of the
S-Plus manual). We will use the Australian crabs data as the example.
The primary question is “How do we distinguish the species from 5 physical measurements?”
1. Start up S-Plus and load the crabs data.
2. Set the variable Sp to be a “factor” by choosing Data, Change Variable Type, selecting Sp, and setting
the New Type to Factor.
3. Compute the variance-covariance matrix for the physical measurements for each species. Is it appropriate
to assume that the variance-covariance structure is homogeneous in this data?
Species 1
          FL        RW        CL        CW        BD
FL  9.118044  6.179527  20.74157  23.64237  9.158436
RW  6.179527  5.195168  14.10301  16.21093  6.322804
CL 20.741568 14.103006  47.64731  54.22052 21.004935
CW 23.642372 16.210933  54.22052  61.87456 23.952615
BD  9.158436  6.322804  21.00494  23.95262  9.411930
Species 2
          FL        RW        CL        CW        BD
FL 10.729394  7.718697  21.90159  24.49089 10.140828
RW  7.718697  6.788787  15.41899  17.67314  7.059574
CL 21.901586 15.418993  45.75524  50.84400 21.192592
CW 24.490889 17.673143  50.84400  56.86551 23.530772
BD 10.140828  7.059574  21.19259  23.53077  9.931834
They look quite similar, and from our plots of the data in earlier labs, the variance-covariance structure
between groups looked similar.
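For readers working outside the S-Plus menus, the per-species matrices can be computed along the following lines. This is only a sketch in R, not the lab's own procedure: it assumes the `crabs` data frame from the MASS package (columns `sp`, `FL`, `RW`, `CL`, `CW`, `BD`), which may code the species differently from the "Species 1" and "Species 2" labels used here.

```r
# Assumption: R with the MASS package, whose `crabs` data frame holds the
# Australian crabs measurements; the lab itself uses S-Plus menus instead.
library(MASS)

vars <- c("FL", "RW", "CL", "CW", "BD")

# One 5 x 5 variance-covariance matrix per species
cov.by.sp <- lapply(split(crabs[, vars], crabs$sp), cov)
print(cov.by.sp)

# Element-wise ratio of the two matrices: values near 1 are consistent
# with a homogeneous variance-covariance structure
print(cov.by.sp[[2]] / cov.by.sp[[1]])
```

The ratio is an informal check only; it complements, rather than replaces, the visual comparison from the earlier labs.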
4. From the main menu choose Statistics, Multivariate, Discriminant Analysis. In the dialog window
choose Sp as the dependent variable, and the 5 physical measurements FL, RW, CL, CW, BD as the
independent variables. Run the Classical discriminant analysis.
5. Write down the discriminant functions for the two species produced by S-Plus.
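$f_1(x) = -16.7 + 0.01\,\mathrm{FL} + 1.55\,\mathrm{RW} + 1.72\,\mathrm{CL} + 1.60\,\mathrm{CW} - 7.47\,\mathrm{BD}$
$f_2(x) = -23.7 + 8.20\,\mathrm{FL} + 2.44\,\mathrm{RW} + 3.78\,\mathrm{CL} - 6.47\,\mathrm{CW} - 0.64\,\mathrm{BD}$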
6. Plot the predicted species values for each crab as a histogram. Mark the discriminant boundary between
the two groups.
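# Coefficients of the two discriminant functions (constant term first)
a1 <- c(-16.7, 0.01, 1.55, 1.72, 1.60, -7.47)
a2 <- c(-23.7, 8.20, 2.44, 3.78, -6.47, -0.64)
# Columns 4:8 of d.crabs hold FL, RW, CL, CW, BD
X <- as.matrix(d.crabs[, 4:8])
x <- a1[1] + X %*% a1[2:6]   # f1 evaluated for each crab
y <- a2[1] + X %*% a2[2:6]   # f2 evaluated for each crab
# Histogram of f1 - f2 with 20 classes
hist(x - y, 20)
# Discriminant boundary at 0: values above 0 are assigned to group 1
abline(v = 0)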
7. Write down the discriminant rule (equation 11.9 in text, equation “*” in notes) based on quantities produced
by S-Plus.
Assign to group 1 if
$(-8.19,\ -0.89,\ -2.06,\ 8.07,\ -6.83)'\, x_0 + 7.00 > 0$
else assign to group 2.
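The coefficients in this rule are the differences between the two discriminant functions: assigning to group 1 when $f_1(x_0) > f_2(x_0)$ is the same as requiring $f_1(x_0) - f_2(x_0) > 0$. A short R sketch of that subtraction, using the rounded coefficient vectors from step 6 (the names `const` and `rule` are introduced here for illustration only):

```r
# Rounded coefficients from step 6, constant term first
a1 <- c(-16.7, 0.01, 1.55, 1.72, 1.60, -7.47)
a2 <- c(-23.7, 8.20, 2.44, 3.78, -6.47, -0.64)
names(a1) <- names(a2) <- c("const", "FL", "RW", "CL", "CW", "BD")

# f1 - f2: the constant and the weight on each measurement in the rule
rule <- a1 - a2
print(round(rule, 2))
```

The printed vector gives the constant 7.00 followed by the five weights in the rule.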
8. Calculate the apparent error rate for the discriminant rule. Based on our analyses in previous labs, do you
think that the apparent error rate for a classification rule for Sex would be higher or lower than this value?
The apparent error rate is 0.
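The apparent error rate is the proportion of crabs in the data used to build the rule that the rule itself misclassifies. The R sketch below shows the calculation on a small made-up example; the helper `apparent.error` and the toy labels are hypothetical and are not the crabs results.

```r
# Hypothetical helper: share of cases whose predicted group differs
# from the true group
apparent.error <- function(truth, predicted) {
  mean(truth != predicted)
}

# Toy labels for illustration only (not the crabs data)
truth     <- c(1, 1, 2, 2, 2)
predicted <- c(1, 2, 2, 2, 2)

# Cross-tabulation of true against predicted groups
print(table(truth, predicted))

err <- apparent.error(truth, predicted)
print(err)  # one of five toy cases is misclassified, so 0.2
```

For the crabs, `truth` would be the recorded species and `predicted` the group assigned by the rule in step 7.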
9. Classify this new observation; to what species does it likely belong?
  FL   RW   CL   CW   BD
23.1 20.2 46.2 52.5 21.1
Plugging this observation into equation * returns a result of -15.77. Since this is less than 0, the
observation is assigned to group 2.
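The arithmetic can be checked with a few lines of R. This sketch uses the rounded coefficients from step 7, so the result can differ in the last digit from a value computed with unrounded S-Plus output; the names `w`, `const`, `x0`, `score` and `group` are chosen here for illustration.

```r
# Rule from step 7: weights on the five measurements and the constant
w     <- c(FL = -8.19, RW = -0.89, CL = -2.06, CW = 8.07, BD = -6.83)
const <- 7.00

# New observation from step 9
x0 <- c(FL = 23.1, RW = 20.2, CL = 46.2, CW = 52.5, BD = 21.1)

# Linear score: positive means group 1, otherwise group 2
score <- sum(w * x0) + const
group <- if (score > 0) 1 else 2

print(score)  # about -15.78 with these rounded coefficients
print(group)  # 2
```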