Looking at Data – Relationships Math 311.01, Winter 2003 Lab 3

Minitab Commands Discussed:
Graph > Plot
Stat > Basic Statistics > Correlation
Stat > Regression > Fitted Line Plot
Stat > Regression > Regression
In this lab, we’ll review and investigate various Minitab commands that permit the analysis of
relationships between two variables.
Scatterplots:
Consider the very small data set given in Example 2.4 on page 110. This data set is recorded as
archaeopteryx.mpj on the class webpage (http://www.cwu.edu/~englundt/Data.htm). Download
it now.
You’ve already learned that the command Graph > Plot can be used to construct a basic
scatterplot of these data. Do so now, making the length of the humerus the explanatory variable.
Note that you can alter the plotting symbol using the dialog box that appears when you click the
Edit Attributes button.
Using the dialog box that appears when you click on the Annotations button, it is possible to give
the plot a title, label plotted points, etc.
Using the dialog box that appears when you click on the Frame button, you can change the labels
on the axes.
Rather than just plotting the points in a scatterplot, you can add connection lines (join the points
with lines), add projection lines (drop a line from each point to the x-axis), and add areas (fill in
the area under a polygon joining the points).
You can also use the scatterplot smoother lowess to plot a piecewise linear continuous
curve through the scatter of points.
These features are available via Graph > Plot > Display. Several other options there let
you control the appearance of the plot.
It is also possible to overlay several scatterplots on the same plot. For example, C3 in the
archaeopteryx worksheet contains the natural log of the femur variable (we’ll learn how to create
such a column soon). We obtain the plot below by entering a second pair of variables as another
row of the Graph variables box in Display 2.1 (see last page), with C3 as the Y variable and
humerus as the X variable. To put these scatterplots on the same plot, use Frame > Multiple Graphs
and click the Overlay graphs on the same page radio button.
[Figure: overlaid scatterplots, x-axis Humerus, y-axis Femur]
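The following rough Python sketch shows what the overlay involves. It builds the counterpart of C3 (the natural log of femur) and pairs both responses with the shared humerus values. The five femur and humerus lengths are an assumption on our part: they are the values we believe appear in Example 2.4. Check them against archaeopteryx.mpj before trusting anything the sketch prints.

```python
import math

# ASSUMED data: femur and humerus lengths (cm) that we believe match
# Example 2.4; verify against archaeopteryx.mpj before relying on them.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

# Counterpart of column C3: the natural log of each femur length.
ln_femur = [math.log(f) for f in femur]

# Both series share humerus as the x variable, which is what allows
# them to be overlaid on one set of axes.
series = {
    "Femur": list(zip(humerus, femur)),
    "C3 (ln Femur)": list(zip(humerus, ln_femur)),
}

for name, points in series.items():
    print(name)
    for x, y in points:
        print(f"  humerus={x:3d}  y={y:7.3f}")
```

For these values, ln(femur) stays between about 3.6 and 4.3. Those points therefore sit near the bottom of a y-axis scaled for femur lengths.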
Correlations:
While a scatterplot is a convenient graphical method for assessing whether there is any
relationship between two variables, we would also like to assess this numerically. The
correlation coefficient r provides a numerical summary of the direction and strength of a linear
relationship between two quantitative variables. It can be calculated using the
Stat > Basic Statistics > Correlation command.
Regressions:
Regression is another technique for studying a linear relationship between two variables, and it
is closely related to correlation. For this we may use the Stat > Regression > Regression
command.
As noted in the text and in lecture, the regression analysis of two quantitative variables involves
computing the least-squares line $\hat{y} = a + bx$, where one variable is taken to be the response
variable y and the other is taken to be the explanatory variable x.
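The following Python sketch shows how the least-squares line is computed. It uses the standard formulas b = S_xy / S_xx (equivalently r·s_y/s_x) and a = ȳ − b·x̄. The function name `least_squares` is hypothetical, and the data in the usage lines are again the assumed Example 2.4 values.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b * x.

    b = S_xy / S_xx (equivalently r * s_y / s_x) and a = y_bar - b * x_bar,
    so the line always passes through the point (x_bar, y_bar).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("the x values must not all be equal")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# ASSUMED Example 2.4 values: humerus is explanatory (x), femur is response (y).
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
a, b = least_squares(humerus, femur)
print(f"y_hat = {a:.3f} + {b:.4f} x")
```

Passing the arguments in the other order regresses humerus on femur, which gives a different line unless r = ±1. This is why the lab fixes humerus as the explanatory variable.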
It is very convenient to have a scatterplot of the points together with the least-squares line. This
can be accomplished using the Stat > Regression > Fitted Line Plot command.
Additional topics:
There are some additional quantities that are often of interest in a regression analysis. For
example, you may wish to have the fitted values $\hat{y} = a + bx$ at each x value printed, as well
as the residuals $y - \hat{y}$. Clicking the Results button in the dialog box of Display 2.4 (see
the next page) and filling in the ensuing dialog box as in Display 2.7 prints these quantities in
the Session window along with the usual output of Display 2.5.
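Here is a minimal Python sketch of the fitted values and residuals that the Results dialog prints. The function `fits_and_residuals` is a hypothetical name. It refits the least-squares line internally so that the block stands alone, and the data are the assumed Example 2.4 values.

```python
def fits_and_residuals(x, y):
    """Least-squares fitted values y_hat = a + b*x and residuals y - y_hat."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    fits = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    return fits, residuals


# ASSUMED Example 2.4 values; humerus is x, femur is y.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
fits, residuals = fits_and_residuals(humerus, femur)
for x, f, r in zip(humerus, fits, residuals):
    print(f"{x:3d}  fit={f:7.3f}  resid={r:7.3f}")
```

When the model includes an intercept, least-squares residuals always sum to zero, up to rounding. Checking this is a quick sanity test on a stored residual column.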
You will probably want to keep these values for later work. To do so, click the Storage button
of Display 2.4 and fill in the ensuing dialog box as in Display 2.8. The quantities are then saved
in the next two available columns (in this case, C4 and C5) with the names RESI1 and FITS1 for
the residuals and fits, respectively.
It is even more likely that you’ll want to plot the residuals as part of assessing whether the
assumptions that underlie a regression analysis make sense in a particular application. To do
this, click the Graphs button in the dialog box of Display 2.4, and the dialog box of Display 2.9
appears. Notice that we have requested a plot of the standardized residuals (each residual
divided by its standard error). You will probably want to plot just the Regular residuals.
Recall that no pattern should be discernible.
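In simple linear regression, the i-th residual has estimated standard deviation s√(1 − h_i). Here s = √(SSE/(n − 2)) and h_i = 1/n + (x_i − x̄)²/S_xx is the leverage of point i. The Python sketch below divides each residual by that quantity. We believe this matches what Minitab labels a standardized residual, but treat that correspondence as an assumption. The small data set is made up for illustration.

```python
import math


def standardized_residuals(x, y):
    """Residuals divided by s * sqrt(1 - h_i) for a simple linear regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length data with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    if s == 0:
        raise ValueError("perfect fit: standardized residuals are undefined")
    # Leverage of each point: larger for x values far from x_bar.
    leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]


# Made-up illustrative data, not from the lab.
sr = standardized_residuals([1, 2, 3, 4], [1, 3, 2, 4])
print([round(v, 3) for v in sr])  # [-0.577, 1.134, -1.134, 0.577]
```

Dividing by √(1 − h_i) inflates residuals at high-leverage points relative to the others. A standardized residual plot can therefore look somewhat different from a plot of regular residuals. In either plot, look for the absence of pattern.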
Questions:
1. Complete exercise 2.10. Calculate the least-squares line and make a scatterplot of Fuel
used against Speed together with the least-squares line. Plot the regular residuals against
Speed. What is the squared correlation coefficient between these variables?
2. Complete exercise 2.62 using Minitab.
3. Complete any homework problems you have not yet completed.
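Question 1 asks for the squared correlation coefficient. In simple linear regression, r² also equals the fraction of the variation in y explained by the least-squares line, 1 − SSE/SST. Minitab's regression output reports this quantity as R-Sq, given as a percentage. The Python sketch below computes r² both ways on a small made-up data set so you can check that the two agree. It is not the answer to Exercise 2.10, and the function name is hypothetical.

```python
import math


def r_squared_two_ways(x, y):
    """Return (r**2, 1 - SSE/SST) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares, SST
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_xx * s_yy)
    # Least-squares line and its error sum of squares, SSE.
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return r ** 2, 1 - sse / s_yy


# Made-up illustrative data, not from Exercise 2.10.
print(r_squared_two_ways([1, 2, 3, 4], [1, 3, 2, 4]))
```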