Stat 407 Lab 3 (Plotting Multivariate Data using S-Plus) Fall 2001 SOLUTION
This lab is an introduction to using graphics in S-Plus with data collected on crabs from Australia. This
data is taken from “Modern Applied Statistics with S-Plus (3rd ed)” by Venables and Ripley. (Chapters 3, 4, and 5
of the S-Plus user's manual give detailed information on generating the plots for this lab.)
The data contains measurements on 2 species of crabs (blue - sp= 1, orange - sp= 2) and males (sex= 1) and
females (sex= 2) from both species. The variables in the data set are:
Index = Obs number within group
Sp = Species
Sex = Gender
FL = Frontal Lobe
RW = Rear Width
CL = Carapace Length
CW = Carapace Width
BD = Body Depth
1. You should already have a copy of this data on your zip disk. Locate it and load it into S-Plus.
2. Generate a scatterplot matrix of the 5 physical measurements. To do this highlight the last 5 columns in
the spreadsheet. Open the Graph menu, select 2D plot, select Matrix, click OK. Describe the relationships
between the 5 variables. (Note also, that double left-clicking on the plot brings up some plot options so
that you can edit colors, backgrounds, ... Also the top panel has a selection of plot editing tools.)
All 5 variables are strongly and linearly related to one another. The variance is larger at higher values, especially
in plots involving RW and, to a lesser extent, FL. CL and CW are almost perfectly linearly related. There is some
evidence of clustering in some plots, for example RW vs CL and FL vs CW.
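The visual impression from the scatterplot matrix can be checked numerically with pairwise correlations. The following is a minimal sketch in Python rather than S-Plus; the function names pearson and correlation_matrix are made up for this illustration, and the numbers are toy values, not the crab measurements.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map each ordered pair of column names to its correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Toy values for illustration only; these are NOT the crab measurements.
toy = {
    "FL": [8.0, 10.0, 12.0, 15.0, 18.0],
    "RW": [7.0, 8.5, 10.0, 12.5, 14.0],
    "CL": [16.0, 21.0, 25.0, 31.0, 37.0],
}
corr = correlation_matrix(toy)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:.3f}")
```

A correlation near 1 corresponds to a tight, "pencil-shaped" panel in the matrix. It does not capture the changing variance or the clustering, which still have to be read from the plots.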
3. Generate a coplot (using trellis plot display) of RW against CL, and Sex as the conditioning variable, with
a smoothed loess curve overlay. Make sure no column of data is highlighted in the Data view. In the Graph
menu, select 2D plot, Linear, Smoothing - Loess, OK. Select the X and Y variables accordingly. Now
here is the tricky conditioning part. Make sure both the Data window and the Plot window are visible. Highlight
the Sex column in the Data window. Click half-way down the column and drag it to the Plot window.
You will see a dashed rectangle highlighted in the Plot window; release the mouse button when the cursor is
in this box. What can you see from this plot? (Note that sex = 1 is male and sex = 2 is female.)
The larger male crabs have lower than expected RW values based on CL values.
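What the coplot does can be sketched in two steps: split the data by the conditioning variable, then smooth RW against CL within each part. The code below is a simplified local linear smoother with tricube weights, written in Python. It is not the S-Plus loess implementation, the names loess and loess_by_group and the span value are assumptions for this sketch, and the numbers are toy values rather than the crab data.

```python
from math import ceil

def loess(xs, ys, span=0.75):
    """Local linear smoother with tricube weights, evaluated at each x."""
    n = len(xs)
    q = min(n, max(2, ceil(span * n)))  # number of points in each neighbourhood
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its q-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[q - 1] or 1.0
        w = [max(0.0, 1.0 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted

def loess_by_group(xs, ys, groups, span=0.75):
    """Condition on a grouping variable: one smooth curve per level."""
    curves = {}
    for g in sorted(set(groups)):
        gx = [x for x, k in zip(xs, groups) if k == g]
        gy = [y for y, k in zip(ys, groups) if k == g]
        curves[g] = list(zip(gx, loess(gx, gy, span)))
    return curves

# Toy values for illustration only; these are NOT the crab measurements.
cl = [10.0, 15.0, 20.0, 25.0, 30.0, 35.0] * 2
sex = [1] * 6 + [2] * 6
rw = [0.3 * x + 2.0 if s == 1 else 0.4 * x + 1.0 for x, s in zip(cl, sex)]
curves = loess_by_group(cl, rw, sex)
for s, curve in curves.items():
    print("sex =", s, [(x, round(f, 2)) for x, f in curve])
```

Each entry of curves plays the role of one panel of the coplot. A curve that bends away from a straight line in one panel but not the other is the kind of feature the question asks about.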
4. Generate a scatterplot of RW against CL, and use symbol or color to represent the two sexes. Make sure
that no column is highlighted in the Data view. In the Graph menu, select 2D plot, Linear, Scatterplot,
OK. Select CL and RW as the x and y variables respectively, then select Sex as the w variable. In the
Symbol/Color panel select color of the symbol to be set by the w variable. What do you notice from this
plot? Is it easier to see differences between the sexes from this plot or from the coplot (previous question)?
The relationship between RW and CL appears linear for both males and females. Among the smallest crabs the
two sexes are almost indistinguishable, but as the values of both variables get larger the female crabs
have increasingly higher RW values in comparison to males. The values of RW and CL for both sexes have
higher variance as the values increase.
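The widening gap between the sexes can be expressed as a difference in slopes. As a rough sketch in Python, one could fit a separate least-squares line of RW on CL for each sex and compare the predicted RW at a small and a large CL. The helper names fit_line, lines_by_group and predicted_gap are hypothetical, and the numbers are toy values chosen to mimic the pattern, not the crab measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def lines_by_group(xs, ys, groups):
    """Fit a separate line within each level of the grouping variable."""
    fits = {}
    for g in sorted(set(groups)):
        gx = [x for x, k in zip(xs, groups) if k == g]
        gy = [y for y, k in zip(ys, groups) if k == g]
        fits[g] = fit_line(gx, gy)
    return fits

def predicted_gap(fits, x, a=1, b=2):
    """Predicted y for group b minus predicted y for group a at x."""
    (sa, ia), (sb, ib) = fits[a], fits[b]
    return (sb * x + ib) - (sa * x + ia)

# Toy values for illustration only; these are NOT the crab measurements.
cl = [10.0, 20.0, 30.0, 40.0] * 2
sex = [1] * 4 + [2] * 4  # 1 = male, 2 = female, as in the lab
rw = [0.3 * x + 2.0 if s == 1 else 0.4 * x + 1.0 for x, s in zip(cl, sex)]
fits = lines_by_group(cl, rw, sex)
print("gap at CL = 10:", round(predicted_gap(fits, 10.0), 3))
print("gap at CL = 40:", round(predicted_gap(fits, 40.0), 3))
```

A gap near zero at small CL and a clearly positive gap at large CL is the numerical counterpart of the two colored point clouds fanning apart in the scatterplot.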
5. Using graphics find which of the physical measurements are important for separating the two species. (You
can also use the summary statistics that you calculated in the last lab.)
FL, BD and CW appear to be fairly important variables for separating species. For example, the blue crabs
(sp= 1) tend to have higher values of CW than the orange crabs (sp= 2) at similar FL values.
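A comparison "at similar FL values" can be made concrete by fitting one pooled line of CW on FL and then averaging the residuals within each species. This is only one possible way to quantify the separation, sketched here in Python; the function names are hypothetical and the numbers are toy values, not the crab measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_residual_by_group(xs, ys, groups):
    """Average residual from the pooled line within each group."""
    slope, intercept = fit_line(xs, ys)
    totals = {}
    for x, y, g in zip(xs, ys, groups):
        s, c = totals.get(g, (0.0, 0))
        totals[g] = (s + (y - (intercept + slope * x)), c + 1)
    return {g: s / c for g, (s, c) in totals.items()}

# Toy values for illustration only; these are NOT the crab measurements.
fl = [10.0, 12.0, 14.0, 16.0] * 2
sp = [1] * 4 + [2] * 4  # 1 = blue, 2 = orange, as in the lab
cw = [2.0 * x + 3.0 if s == 1 else 2.0 * x + 1.0 for x, s in zip(fl, sp)]
resid = mean_residual_by_group(fl, cw, sp)
for s in sorted(resid):
    print("sp =", s, "mean residual =", round(resid[s], 3))
```

A positive mean residual for one species and a negative one for the other means that, for the same FL, the first species tends to sit above the pooled line. This matches a vertical shift between the two point clouds in the FL vs CW panel.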
6. Explain the differences in the joint distribution between the 4 groups (2 species, 2 sexes). Are there some
values of the variables for which the 4 groups would be difficult to separate?
The 4 groups have strong linear trends between all 5 variables, that is, the point clouds are “pencil-shaped”. There is a shift of
the linear trend according to species. The slope of the linear trend differs between females and males, especially
when RW is considered in relation to the other variables. Among the smaller crabs, the sexes are more difficult to
distinguish.