Homework 2

Stat 519 Multivariate Analysis
Homework #2
R Graphics
The data set LOST DAYS can be found among the Chapter 3 data sets on your text CD. This data set
looks at lost work days per crew member due to injury (LostDPC), crew size (SIZE), foreman
age (ForeAge), foreman experience in years (ForeExp), average crew member experience in
years (AvgExp), and whether or not the crew customarily uses power tools (Power).
Import this file into R and use it to answer the following questions. Turn in your R input and
output, including the requested plots, along with any discussion on your results.
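One possible way to get started is sketched below. The comma-separated format, the temporary file name, and the 0/1 coding of Power are assumptions, since the layout of the file on the text CD is not described here. The three rows written to the temporary file are made-up placeholders so that the sketch runs on its own; they are not values from LOST DAYS.

```r
# Placeholder file standing in for the LOST DAYS data set (made-up values).
# In practice, point `path` at the file copied from the text CD instead.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "LostDPC,SIZE,ForeAge,ForeExp,AvgExp,Power",
  "1.2,8,45,12,6.5,1",
  "0.4,5,38,7,9.0,0",
  "2.1,12,52,20,3.5,1"
), path)

# Read the file into a data frame (assumes a header row and comma separators)
lost <- read.csv(path, header = TRUE)

# Treat Power as a categorical variable (assumes a 0/1 coding)
lost$Power <- factor(lost$Power, levels = c(0, 1), labels = c("No", "Yes"))

str(lost)      # check variable names and types
summary(lost)  # quick look at ranges and missing values
```

If the file on the CD is tab- or space-delimited, read.table or read.delim would likely be the better starting point.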
1. Examine histograms and kernel density estimates of LostDPC. In addition to the defaults,
examine at least one additional choice of bin width (histogram) and bandwidth (KDE).
What do these plots tell you about the distribution of LostDPC?
2. Examine a strip chart and a boxplot of LostDPC, broken down by Power. What do these
charts tell you about the effect of the use of power tools on the distribution of lost days
per capita?
3. Examine scatterplots, looking at the effect on LostDPC (y) from each of crew size,
foreman age, foreman experience, and average crew experience (x). Discuss your results.
Which independent variable appears to have the greatest influence on lost days per
capita?
4. Use the plot3d function from the rgl library to examine 3D scatterplots of these data. Your
third variable (z) should be LostDPC, and your second (y) should be the influential
variable you identified in problem 3. Try each of the remaining numeric variables as your
first (x) variable. (Remember, you’ll need to use the rgl.snapshot command to save your
rgl images to import into Word.) Discuss your results. Can you find any interaction
effects (that is, behavior of x and y on z that is not obvious from just looking at x vs z and
y vs z separately)?
5. Construct a pairwise scatterplot matrix, a stars plot, and a parallel coordinates plot of the
five numeric variables, using color to separate by Power. Examine these plots for
additional interesting interactions, outliers, or other features of the data. Discuss your
observations.
6. Construct conditioning plots on some variables that you find interesting and
illuminating, and discuss your observations.
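As a rough orientation for problems 1 through 6, the sketch below shows one R function call per plot type. It is only an illustration: the data frame is simulated with random placeholder values (not the LOST DAYS data), the alternative bin count and bandwidth are arbitrary, and SIZE is used as the y variable in the 3D plot purely as a stand-in for whichever variable you identify in problem 3. The parallel coordinates plot assumes the parcoord function from the MASS package, and the rgl call is only attempted in an interactive session with rgl installed.

```r
# Simulated stand-in for LOST DAYS (random placeholder values, not the real data)
set.seed(519)
n <- 40
lost <- data.frame(
  LostDPC = rexp(n, rate = 1),
  SIZE    = sample(4:15, n, replace = TRUE),
  ForeAge = round(runif(n, 25, 60)),
  ForeExp = round(runif(n, 1, 30)),
  AvgExp  = round(runif(n, 1, 15), 1),
  Power   = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
num_vars <- c("LostDPC", "SIZE", "ForeAge", "ForeExp", "AvgExp")
cols <- c("blue", "red")[as.integer(lost$Power)]  # one color per Power level

# Problem 1: histogram and kernel density estimate, default and one alternative
hist(lost$LostDPC)                                 # default bins
h20 <- hist(lost$LostDPC, breaks = 20)             # more bins (a suggestion to hist)
d_def <- density(lost$LostDPC)                     # default bandwidth
d_alt <- density(lost$LostDPC, bw = d_def$bw / 2)  # half the default bandwidth
plot(d_def)
lines(d_alt, lty = 2)

# Problem 2: strip chart and boxplot of LostDPC by Power
stripchart(LostDPC ~ Power, data = lost, method = "jitter")
boxplot(LostDPC ~ Power, data = lost)

# Problem 3: LostDPC against each of the other numeric variables
op <- par(mfrow = c(2, 2))
for (v in c("SIZE", "ForeAge", "ForeExp", "AvgExp")) {
  plot(lost[[v]], lost$LostDPC, xlab = v, ylab = "LostDPC")
}
par(op)

# Problem 4: 3D scatterplot (y = SIZE is an arbitrary stand-in here)
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(lost$ForeAge, lost$SIZE, lost$LostDPC,
              xlab = "ForeAge", ylab = "SIZE", zlab = "LostDPC", col = cols)
}

# Problem 5: scatterplot matrix, stars plot, parallel coordinates, colored by Power
pairs(lost[num_vars], col = cols)
stars(lost[num_vars], col.stars = cols)
if (requireNamespace("MASS", quietly = TRUE)) {
  MASS::parcoord(lost[num_vars], col = cols)
}

# Problem 6: conditioning plot of LostDPC vs SIZE, conditioned on Power
coplot(LostDPC ~ SIZE | Power, data = lost)
```

With the real data, replace the simulated data frame with the imported one and vary the breaks and bw arguments to see how sensitive the histogram and density estimate are to those choices.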