Stat 407 Lab 5 Classification Fall 2001

This lab is designed to help you explore a classification problem using S-Plus (Chapter 6, Multivariate, of the S-Plus manual). We will use the Australian crabs data as the example. The primary question is “How do we distinguish the species from 5 physical measurements?”

1. Start up S-Plus and load the crabs data.

2. Set the variable Sp to be a “factor” by choosing Data, Change Variable Type, selecting Sp and setting the New Type to Factor.

3. Compute the variance-covariance matrix of the physical measurements for each species. Is it appropriate to assume that the variance-covariance structure is homogeneous across the two species in these data?

4. From the main menu choose Statistics, Multivariate, Discriminant Analysis. In the dialog window choose Sp as the dependent variable, and the 5 physical measurements FL, RW, CL, CW, BD as the independent variables. Run the Classical discriminant analysis.

5. Write down the discriminant functions for the two species produced by S-Plus.

6. Plot the predicted species values for each crab as a histogram. Mark the discriminant boundary between the two groups. Here is some sample scripting code to do this:

    # a1, a2: constant (first element) followed by the coefficients for FL, RW, CL, CW, BD
    a1<-c(-16.7,0.01,1.55,1.72,1.60,-7.47)
    a2<-c(-23.7,8.20,2.44,3.78,-6.47,-0.64)
    # discriminant score of each of the 200 crabs under the first function
    x<-t(as.matrix(a1[2:6]))%*%t(d.crabs[,4:8])
    x<-x+rep(a1[1],200)
    # discriminant score of each crab under the second function
    y<-t(as.matrix(a2[2:6]))%*%t(d.crabs[,4:8])
    y<-y+rep(a2[1],200)
    # histogram of the difference between the two scores, 20 classes
    hist(x-y,20)

7. Write down the discriminant rule (equation 11.9 in text, equation “*” in notes) based on quantities produced by S-Plus.

8. Calculate the apparent error rate for the discriminant rule. Based on our analyses in previous labs, do you think that the apparent error rate for a classification rule for Sex would be higher or lower than this value?

9. Classify this new observation; to what species does it likely belong?

    FL     RW     CL     CW     BD
    23.1   20.2   46.2   52.5   21.1
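
The scoring, classification and apparent error rate of steps 6 to 8 can be sketched in scripting form as follows. This is only an illustrative outline, not the S-Plus dialog's own method. The function names score.diff, classify and apparent.error are hypothetical, the two "toy" rows and their labels are made up for the example and are not crabs data, and the cutoff of 0 assumes that the constants in the discriminant functions already account for any prior probabilities. Which species corresponds to "group 1" depends on your own S-Plus output.

```r
# Coefficient vectors copied from the sample script in step 6:
# element 1 is the constant, elements 2-6 multiply FL, RW, CL, CW, BD.
a1 <- c(-16.7, 0.01, 1.55, 1.72, 1.60, -7.47)
a2 <- c(-23.7, 8.20, 2.44, 3.78, -6.47, -0.64)

# Difference between the two linear discriminant scores for each row of X.
# (hypothetical helper; X holds one crab per row, columns FL, RW, CL, CW, BD)
score.diff <- function(X, a1, a2) {
  X <- as.matrix(X)
  d1 <- a1[1] + X %*% a1[-1]   # score under the first function
  d2 <- a2[1] + X %*% a2[-1]   # score under the second function
  as.vector(d1 - d2)
}

# Assign group 1 when the difference exceeds the cutoff, otherwise group 2.
# ASSUMPTION: a cutoff of 0 is the boundary between the groups.
classify <- function(X, a1, a2, cutoff = 0) {
  ifelse(score.diff(X, a1, a2) > cutoff, 1, 2)
}

# Apparent error rate: proportion of cases whose predicted group
# differs from the known group.
apparent.error <- function(predicted, actual) {
  mean(predicted != actual)
}

# Made-up measurements and labels, for illustration only (not crabs data).
toy <- matrix(c(15.0, 12.0, 30.0, 35.0, 13.0,
                18.0, 13.5, 36.0, 41.0, 16.0),
              ncol = 5, byrow = TRUE)
toy.group <- c(1, 2)

toy.pred <- classify(toy, a1, a2)
toy.err <- apparent.error(toy.pred, toy.group)
```

In the lab itself the same functions would be applied to d.crabs[,4:8] in place of toy, with the known species coded as 1 and 2 in the order that matches your two discriminant functions.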