Math 311.01, Winter 2003
Lab 3
Looking at Data – Relationships

Minitab Commands Discussed:
Graph > Plot
Stat > Basic Statistics > Correlation
Stat > Regression > Fitted Line Plot
Stat > Regression > Regression

In this lab, we'll review and investigate various Minitab commands that permit the analysis of relationships between two variables.

Scatterplots: Consider the very small data set given in Example 2.4 on page 110. This data set is recorded as archaeopteryx.mpj on the class webpage (http://www.cwu.edu/~englundt/Data.htm). Download it now. You've already learned that the command Graph > Plot can be used to construct a basic scatterplot of these data. Do so now. Make the length of the humerus the explanatory variable.

Note that we could alter the plotting symbol using the dialog box that appears when you click on the Edit Attributes button. Using the dialog box that appears when you click on the Annotation button, it is possible to give the plot a title, label plotted points, etc. Using the dialog box that appears when you click on the Frame button, you can change the labels on the axes.

Rather than just plotting the points in a scatterplot, you can add connection lines (join the points with lines), add projection lines (drop a line from each point to the x-axis), and add areas (fill in the area under a polygon joining the points). You can also employ the scatterplot smoother lowess to plot a continuous piecewise-linear curve through the scatter of points. These features are available via Graph > Plot > Display, along with a number of other features that allow you to control the appearance of the plot.

It is also possible to have multiple scatterplots on the same plot. For example, C3 in the archaeopteryx worksheet contains the natural log of the femur variable (we'll learn how to do this soon). We obtain the plot below by adding another pair of variables in the second row of the Graph variables box in Display 2.1 (see last page), with C3 as the y variable and humerus as the x variable.
To put these scatterplots on the same plot, use Frame > Multiple Graphs and click on the "Overlay graphs on the same page" radio button.

[Figure: overlaid scatterplots, with Femur on the vertical axis (0 to 80) and Humerus on the horizontal axis (40 to 85)]

Correlations: While a scatterplot is a convenient graphical method for assessing whether or not there is any relationship between two variables, we would also like to assess this numerically. The correlation coefficient, r, provides a numerical summary of the degree to which a linear relationship exists between two quantitative variables. It can be calculated using the Stat > Basic Statistics > Correlation command.

Regressions: Regression is another technique for describing a linear relationship between two variables, and it is closely related to correlation. For this we may use the Stat > Regression > Regression command. As noted in the text and in lecture, the regression analysis of two quantitative variables involves computing the least-squares line y = a + bx, where one variable is taken to be the response variable y and the other is taken to be the explanatory variable x. It is very convenient to have a scatterplot of the points together with the least-squares line. This can be accomplished using the Stat > Regression > Fitted Line Plot command.

Additional topics: There are some additional quantities that are often of interest in a regression analysis. For example, you may wish to have the fitted values ŷ = a + bx at each x value printed, as well as the residuals y − ŷ. Clicking on the Results button in the dialog box of Display 2.4 (see the next page) and filling in the ensuing dialog box as in Display 2.7 results in these quantities being printed in the Session window along with the usual output of Display 2.5. You will probably want to keep these values for later work.
In this case, clicking on the Storage button of Display 2.4 and filling in the ensuing dialog box as in Display 2.8 results in these quantities being saved in the next two available columns (here, C4 and C5) with the names RESI1 and FITS1 for the residuals and fits, respectively.

Even more likely is that you'll want to plot the residuals as part of assessing whether the assumptions that underlie a regression analysis make sense in the particular application. For this, click on the Graphs button in the dialog box of Display 2.4. The dialog box of Display 2.9 becomes available. Notice that we have requested that the standardized residuals (each residual divided by its standard error) be plotted. You will probably want to plot just the Regular residuals. Recall that no pattern should be discernible in the residual plot.

Questions:
1. Complete exercise 2.10. Calculate the least-squares line and make a scatterplot of Fuel used against Speed together with the least-squares line. Plot the regular residuals against Speed. What is the squared correlation coefficient between these variables?
2. Complete exercise 2.62 using Minitab.
3. Complete any homework problems you have not yet completed.
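
Optional check outside Minitab: if you would like to see how the quantities discussed under Correlations, Regressions, and Additional topics fit together, the following is a minimal Python sketch. It is not part of the lab and is not how Minitab computes its output internally. The data values and the helper name least_squares are hypothetical, chosen only for illustration; they are not taken from any lab worksheet or textbook exercise.

```python
from math import sqrt

# Hypothetical illustrative data (NOT from the lab worksheets or the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable
y = [2.0, 4.1, 5.9, 8.2, 9.8]   # response variable


def least_squares(x, y):
    """Return (a, b, r) for the least-squares line y = a + b*x
    and the correlation coefficient r."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = ybar - b * xbar        # intercept
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


a, b, r = least_squares(x, y)

# Fitted values and residuals, analogous to the stored fits and residuals.
fits = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fits)]

print("intercept a =", a)
print("slope b =", b)
print("r =", r, " r squared =", r * r)
print("fits:", fits)
print("residuals:", residuals)
```

To compare against Minitab, replace x and y with the columns from your own worksheet and check that the slope, intercept, and squared correlation agree with the Session window output. The residuals from a least-squares fit always sum to zero apart from rounding, which is a quick sanity check on stored residuals.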