Stat 407 Lab 2 (Summary Statistics of Multivariate Data)
Fall 2001
SOLUTION

This lab is an introduction to using S-Plus with data collected on crabs from Australia. The data are taken from “Modern Applied Statistics with S-Plus (2nd ed)” by Venables and Ripley. The data contain measurements on 2 species of crabs (blue, orange), with males and females from both species. The variables in the data set are:

Index = Obs number within group
CL = Carapace Length
CW = Carapace Width
FL = Frontal Lobe
RW = Rear Width
BD = Body Depth

We are going to calculate the summary statistics for this data, and report them in matrix format.

1. Copy the data from my class web site. You should be able to save it onto your own zip disk.

2. Load the data into S-Plus. You need to go into File → Import Data → From File. Navigate the appropriate folders to select the crabs data on your zip disk. Use the Data Set controls to get the data into an S-Plus data structure. (Chapter 2 of the S-Plus User's Guide gives detailed information on how to do this.)

3. Generate summary statistics for the data set. Calculate the mean, min, max, and variance for each variable, and the covariance for each pair of variables, for the 5 physical measurements on the crabs. Report these in matrix format. (Chapter 6 of the S-Plus User's Guide gives detailed information on how to do this.)

         FL     RW     CL    CW    BD
Mean:  [15.6  12.74  32.1  36.4  14.0]’
Min:   [ 7.2   6.50  14.7  17.1   6.1]’
Max:   [23.1  20.20  47.6  54.6  21.6]’

Variance-covariance matrix:

       FL     RW     CL    CW    BD
FL  [12.22   8.16  24.4  26.6  11.82]
RW  [ 8.16   6.62  16.4  18.2   7.84]
CL  [24.36  16.35  50.7  55.8  23.97]
CW  [26.55  18.24  55.8  62.0  26.09]
BD  [11.82   7.84  24.0  26.1  11.73]

4. Generate the mean vector and variance-covariance matrix conditionally on species. With your group decide on a good way to present this information, so that we can compare the values across species. It could be tabular, or side-by-side on the page, or whatever you think best.
From this display describe how the two species differ in the physical measurements.

              FL    RW    CL    CW    BD
Sp 1 Mean:  [14.1  11.9  30.1  34.7  12.6]’
Sp 2 Mean:  [17.1  13.5  34.2  38.1  15.5]’

Sp 1 variance-covariance matrix:

       FL     RW     CL    CW    BD
FL  [ 9.12   6.18  20.7  23.6   9.16]
RW  [ 6.18   5.20  14.1  16.2   6.32]
CL  [20.74  14.10  47.6  54.2  21.00]
CW  [23.64  16.21  54.2  61.9  23.95]
BD  [ 9.16   6.32  21.0  24.0   9.41]

Sp 2 variance-covariance matrix:

       FL     RW     CL    CW    BD
FL  [10.73   7.72  21.9  24.5  10.14]
RW  [ 7.72   6.79  15.4  17.7   7.06]
CL  [21.90  15.42  45.8  50.8  21.19]
CW  [24.49  17.67  50.8  56.9  23.53]
BD  [10.14   7.06  21.2  23.5   9.93]

Species 2 is slightly bigger overall: its mean is larger on every one of the five measurements. The variances and covariances are similar for both species.

5. In general language, explain what you learn by examining the summary statistics of multivariate data (that is, not in relation to this crabs data). Why is it important to calculate these and study them? What type of information can’t you learn about the nature of and relationship between variables, from studying only the mean and variance-covariance?

From the mean and variance information we learn about the location and scale of the multiple variables. The means are single values which give an estimate of the center of each variable's values. The variance provides information on the spread of the values around the mean. The covariances summarize, in a single number, how each pair of variables varies together. (The min and max give the range of the values in the sample.)
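
A covariance depends on the units and spread of both variables, so the raw entries of the variance-covariance matrix are hard to compare with one another. One common way to read them "in a single number" is to divide each covariance by the product of the two standard deviations, which gives a correlation between -1 and 1. The following is a minimal sketch in Python rather than S-Plus, so it is not part of the original lab. It uses the all-crabs variance-covariance matrix reported in step 3, and the function name cov_to_corr is hypothetical, chosen for illustration.

```python
import math

# Variable order and the all-crabs variance-covariance matrix, copied from
# the step 3 solution (entries are rounded as reported there).
names = ["FL", "RW", "CL", "CW", "BD"]
cov = [
    [12.22,  8.16, 24.4, 26.6, 11.82],
    [ 8.16,  6.62, 16.4, 18.2,  7.84],
    [24.36, 16.35, 50.7, 55.8, 23.97],
    [26.55, 18.24, 55.8, 62.0, 26.09],
    [11.82,  7.84, 24.0, 26.1, 11.73],
]


def cov_to_corr(cov):
    """Rescale a covariance matrix to correlations: r_ij = s_ij / (s_i * s_j).

    Hypothetical helper for illustration only.
    """
    p = len(cov)
    # Standard deviations are the square roots of the diagonal (the variances).
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


corr = cov_to_corr(cov)

# Print the correlation matrix with the same row and column order as step 3.
print("    " + " ".join(f"{n:>6}" for n in names))
for name, row in zip(names, corr):
    print(f"{name:<4}" + " ".join(f"{r:6.3f}" for r in row))
```

With these inputs every off-diagonal correlation comes out at roughly 0.89 or higher (FL with RW is about 0.91), which suggests that all five measurements largely track overall crab size. Because the step 3 entries are rounded, the result is only approximately symmetric.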