Stat 407 Lab 5 Classification Fall 2001

This lab is designed to help you explore a classification problem using S-Plus (see Chapter 6, Multivariate, of the
S-Plus manual). We will use the Australian crabs data as the example.
The primary question is: “How can we distinguish the two species using five physical measurements?”
1. Start S-Plus and load the crabs data.
2. Set the variable Sp to be a “factor”: choose Data, Change Variable Type, select Sp, and set the New Type
to Factor.
3. Compute the variance-covariance matrix of the physical measurements for each species. Is it appropriate
to assume that the variance-covariance structure is homogeneous (the same for both species) in these data?
4. From the main menu choose Statistics, Multivariate, Discriminant Analysis. In the dialog window
choose Sp as the dependent variable, and the 5 physical measurements FL, RW, CL, CW, BD as the
independent variables. Run the Classical discriminant analysis.
5. Write down the discriminant functions that S-Plus produces for the two species.
6. Plot a histogram of the difference between the two discriminant functions evaluated for each crab, and mark
the discriminant boundary between the two groups (where the two functions are equal). Here is some sample
scripting code to do this:
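# Example coefficients: constant first, then FL, RW, CL, CW, BD
a1 <- c(-16.7, 0.01, 1.55, 1.72, 1.60, -7.47)   # first discriminant function
a2 <- c(-23.7, 8.20, 2.44, 3.78, -6.47, -0.64)  # second discriminant function
meas <- as.matrix(d.crabs[, 4:8])     # columns 4 to 8: FL, RW, CL, CW, BD
x <- meas %*% a1[2:6] + a1[1]         # first function evaluated for each crab
y <- meas %*% a2[2:6] + a2[1]         # second function evaluated for each crab
hist(x - y, 20)                       # histogram of the differences, roughly 20 classes
abline(v = 0)                         # boundary: the two functions are equal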
7. Write down the discriminant rule (equation 11.9 in the text, equation “*” in the notes) using the quantities
produced by S-Plus.
8. Calculate the apparent error rate of the discriminant rule. Based on our analyses in previous labs, would you
expect the apparent error rate of a classification rule for Sex to be higher or lower than this value?
9. Classify this new observation; to what species does it likely belong?
   FL     RW     CL     CW     BD
   23.1   20.2   46.2   52.5   21.1
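For step 3, the per-species covariance matrices can also be computed from the command line. The following is a minimal sketch in S syntax, written for R (which implements the S language) and assumed, but not verified, to run unchanged in S-Plus. The function name group.cov is hypothetical and is not part of S-Plus.

```r
# Sketch: one sample variance-covariance matrix per group.
# X holds the measurements (one column per variable), g the group labels.
group.cov <- function(X, g) {
  X <- as.matrix(X)
  g <- as.factor(g)
  covs <- list()
  for (lev in levels(g)) {
    # var() of a matrix returns its sample covariance matrix (divisor n_k - 1)
    covs[[lev]] <- var(X[g == lev, , drop = FALSE])
  }
  covs
}
```

Assuming, as in step 6, that the data frame is d.crabs with the five measurements in columns 4 to 8, covs <- group.cov(d.crabs[, 4:8], d.crabs$Sp) returns one matrix per species. Printing covs[[1]] / covs[[2]] gives element-wise ratios, an informal way to see whether the two structures look alike.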
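For steps 5 and 7, the sketch below computes linear discriminant functions in the standard form for a common (pooled) covariance matrix S: for group k with mean vector m_k and prior probability p_k, the coefficient vector is S^{-1} m_k and the constant is -(1/2) m_k' S^{-1} m_k + log(p_k). With two groups and equal misclassification costs, allocating to the group with the larger function value is the same as comparing the difference of the two functions with zero, which is what the histogram in step 6 displays. Whether S-Plus uses exactly this parameterisation (in particular, how it includes the priors) is an assumption to check against your output. The names lda.functions and lda.classify are hypothetical, and the sample-proportion default for the priors is a choice made here. Written for R and assumed, not verified, to run in S-Plus.

```r
# Linear discriminant functions with a pooled covariance matrix.
# Each row of the result is: constant, then one coefficient per column of X.
lda.functions <- function(X, g, prior = NULL) {
  X <- as.matrix(X)
  g <- as.factor(g)
  levs <- levels(g)
  k <- length(levs)
  p <- ncol(X)
  # Default priors: sample proportions (an assumption, not necessarily S-Plus's default)
  if (is.null(prior)) prior <- as.vector(table(g)) / length(g)
  # Pooled covariance: within-group covariances weighted by n_k - 1
  S <- matrix(0, p, p)
  for (lev in levs) {
    Xk <- X[g == lev, , drop = FALSE]
    S <- S + (nrow(Xk) - 1) * var(Xk)
  }
  S <- S / (nrow(X) - k)
  coefs <- matrix(0, k, p + 1, dimnames = list(levs, NULL))
  for (i in 1:k) {
    mk <- apply(X[g == levs[i], , drop = FALSE], 2, mean)  # group mean vector
    b <- solve(S, mk)                                       # S^{-1} m_k
    coefs[i, ] <- c(-0.5 * sum(mk * b) + log(prior[i]), b)
  }
  coefs
}

# Allocate each row of X (or a single observation) to the group whose
# discriminant function is largest; ties go to the first group.
lda.classify <- function(X, coefs) {
  if (is.null(dim(X))) X <- matrix(X, nrow = 1)
  scores <- cbind(1, as.matrix(X)) %*% t(coefs)
  dimnames(coefs)[[1]][apply(scores, 1, which.max)]
}
```

With the crabs data, cf <- lda.functions(d.crabs[, 4:8], d.crabs$Sp) gives one row per species in the same layout as a1 and a2 in step 6, so the rows can be compared with the S-Plus output; lda.classify(d.crabs[, 4:8], cf) then returns the allocated species for every crab.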
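For step 8, the apparent error rate is the proportion of the training observations that the rule misclassifies: the number of crabs allocated to the wrong species divided by the total number of crabs. A minimal sketch that computes it, together with the table of true against predicted groups, from two label vectors; the function name apparent.error is hypothetical. Written for R and assumed, not verified, to run in S-Plus.

```r
# Apparent error rate: fraction of training cases whose predicted group
# differs from their true group, plus the table of true vs predicted groups.
apparent.error <- function(truth, predicted) {
  truth <- as.character(truth)
  predicted <- as.character(predicted)
  list(confusion = table(truth, predicted),
       aper = mean(truth != predicted))
}
```

Assuming a1 corresponds to the first level of Sp (check this against your output), apparent.error(d.crabs$Sp, ifelse(x - y >= 0, levels(d.crabs$Sp)[1], levels(d.crabs$Sp)[2])) gives the apparent error rate of the rule from step 6.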
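For step 9, the new crab is scored with each discriminant function and allocated to the species with the larger value. A minimal sketch, assuming the two functions are stored in the layout of a1 and a2 from step 6 (constant first, then the FL, RW, CL, CW, BD coefficients); the function classify.new and its default labels are hypothetical. Written for R and assumed, not verified, to run in S-Plus.

```r
# f1, f2: discriminant functions as c(constant, coefficients...).
# x0: one observation, measurements in the same order as the coefficients.
classify.new <- function(x0, f1, f2, labels = c("group 1", "group 2")) {
  s1 <- f1[1] + sum(f1[-1] * x0)   # value of the first function at x0
  s2 <- f2[1] + sum(f2[-1] * x0)   # value of the second function at x0
  list(scores = c(s1, s2),
       class = if (s1 >= s2) labels[1] else labels[2])
}
```

For example, classify.new(c(23.1, 20.2, 46.2, 52.5, 21.1), a1, a2, labels = ...) with the labels set to the species that a1 and a2 belong to in your S-Plus output.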