
Stat 407 Lab 5: Classification, Fall 2001
SOLUTION

This lab is designed to help you explore a classification problem using S-Plus (Chapter 6, Multivariate, of the S-Plus manual). We will use the Australian crabs data as the example. The primary question is “How do we distinguish the species from 5 physical measurements?”

1. Start up S-Plus and load the crabs data.

2. Set the variable Sp to be a “factor” by choosing Data, Change Variable Type, selecting Sp, and setting the New Type to Factor.

3. Compute the variance-covariance matrix for the physical measurements for each species. Is it appropriate to assume that the variance-covariance structure is homogeneous in this data?

Species 1
           FL        RW       CL       CW        BD
    FL  9.118044  6.179527 20.74157 23.64237  9.158436
    RW  6.179527  5.195168 14.10301 16.21093  6.322804
    CL 20.741568 14.103006 47.64731 54.22052 21.004935
    CW 23.642372 16.210933 54.22052 61.87456 23.952615
    BD  9.158436  6.322804 21.00494 23.95262  9.411930

Species 2
           FL        RW       CL       CW        BD
    FL 10.729394  7.718697 21.90159 24.49089 10.140828
    RW  7.718697  6.788787 15.41899 17.67314  7.059574
    CL 21.901586 15.418993 45.75524 50.84400 21.192592
    CW 24.490889 17.673143 50.84400 56.86551 23.530772
    BD 10.140828  7.059574 21.19259 23.53077  9.931834

The two matrices look quite similar, and in our plots of the data in earlier labs the variance-covariance structure of the two groups also looked similar, so assuming a homogeneous structure is reasonable.

4. From the main menu choose Statistics, Multivariate, Discriminant Analysis. In the dialog window choose Sp as the dependent variable, and the 5 physical measurements FL, RW, CL, CW, BD as the independent variables. Run the Classical discriminant analysis.

5. Write down the discriminant functions for the two species produced by S-Plus.

$f_1(x) = -16.7 + 0.01\,\mathrm{FL} + 1.55\,\mathrm{RW} + 1.72\,\mathrm{CL} + 1.60\,\mathrm{CW} - 7.47\,\mathrm{BD}$
$f_2(x) = -23.7 + 8.20\,\mathrm{FL} + 2.44\,\mathrm{RW} + 3.78\,\mathrm{CL} - 6.47\,\mathrm{CW} - 0.64\,\mathrm{BD}$

6. Plot the predicted species values for each crab as a histogram. Mark the discriminant boundary between the two groups.
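    # Coefficients of the two discriminant functions: intercept, then FL, RW, CL, CW, BD
    a1<-c(-16.7,0.01,1.55,1.72,1.60,-7.47)
    a2<-c(-23.7,8.20,2.44,3.78,-6.47,-0.64)
    # f1(x) for each of the 200 crabs (columns 4 to 8 of d.crabs hold the measurements)
    x<-t(as.matrix(a1[2:6]))%*%t(d.crabs[,4:8])
    x<-x+rep(a1[1],200)
    # f2(x) for each crab
    y<-t(as.matrix(a2[2:6]))%*%t(d.crabs[,4:8])
    y<-y+rep(a2[1],200)
    # Histogram of f1(x) - f2(x); the discriminant boundary is at 0
    hist(x-y,20)
    abline(v=0)

7. Write down the discriminant rule (equation 11.9 in text, equation “*” in notes) based on quantities produced by S-Plus.

Assign to group 1 if $(-8.19,\ -0.89,\ -2.06,\ 8.07,\ -6.83)'\,x_0 + 7.00 > 0$; otherwise assign to group 2. The coefficients are the differences between the coefficients of the two discriminant functions, with the measurements ordered FL, RW, CL, CW, BD.

8. Calculate the apparent error rate for the discriminant rule. Based on our analyses in previous labs, do you think that the apparent error rate of a classification rule for Sex would be higher or lower than this value?

The apparent error rate is 0. An error rate cannot fall below 0, so the rate for Sex can be no lower than this value.

9. Classify this new observation; to what species does it likely belong?

      FL    RW    CL    CW    BD
    23.1  20.2  46.2  52.5  21.1

Plugging this value into equation * returns a result of -15.77, which is less than 0, indicating this observation belongs to group 2.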
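
The calculations in steps 7 to 9 can also be checked at the command line. The following is a minimal sketch, not part of the S-Plus output. It assumes the rounded coefficients quoted above and is written in plain S that should also run in R. The helper names discriminant.score, classify and apparent.error are hypothetical, introduced only for illustration.

```r
# Coefficients copied from the solution: intercept, then FL, RW, CL, CW, BD.
a1 <- c(-16.7, 0.01, 1.55, 1.72, 1.60, -7.47)
a2 <- c(-23.7, 8.20, 2.44, 3.78, -6.47, -0.64)

# Hypothetical helper: returns f1(x) - f2(x) for each row of x.
# x is a numeric vector (one crab) or a numeric matrix (one crab per row).
discriminant.score <- function(x, a1, a2) {
  if (!is.matrix(x)) x <- matrix(x, nrow = 1)
  d <- a1 - a2                      # difference of the two coefficient vectors
  as.vector(d[1] + x %*% d[-1])     # intercept difference plus linear part
}

# Hypothetical helper: group 1 if the score is positive, otherwise group 2.
classify <- function(x, a1, a2) {
  ifelse(discriminant.score(x, a1, a2) > 0, 1, 2)
}

# Hypothetical helper: proportion of cases whose predicted group differs
# from the actual group (the apparent error rate).
apparent.error <- function(predicted, actual) {
  mean(predicted != actual)
}

# The new observation from step 9, in the order FL, RW, CL, CW, BD.
new.crab <- c(23.1, 20.2, 46.2, 52.5, 21.1)
score <- discriminant.score(new.crab, a1, a2)
group <- classify(new.crab, a1, a2)
```

With these rounded coefficients the score for the new crab comes out at about -15.8, so it falls on the group 2 side of the boundary, in agreement with the answer to step 9. To reproduce the apparent error rate in step 8, the same classify function could be applied to the measurement columns of d.crabs (as a numeric matrix) and compared with Sp, assuming Sp is coded 1 and 2.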