Activity 6: Regression and Correlation

Observations on rainfall volume (m³) and runoff volume for a particular location were recorded. These data can be obtained on the website. When you click on the link, the data should open automatically in Minitab. This lab will cover some things we haven’t discussed in class, including Sections 12.4 and 12.5. The topics covered in this lab might be on the final exam.

A) Are the assumptions for regression reasonable in this case? Go to Stat > Regression > Regression and enter the variables. The response is runoff volume and the predictor is rainfall volume. Under ‘Graphs’, obtain a normal probability plot of the residuals and a plot of the residuals versus rainfall volume. Copy and paste these plots below. What do these plots indicate about the assumptions?

(The model assumptions are on p. 500. The two plots indicate whether the normality assumption is satisfied and whether the assumption of constant variance σ² is satisfied. You should know how to check the former assumption using the normal probability plot. The latter assumption may be checked by verifying that the height of the spread of residuals around the center line is roughly constant across the width of the plot of residuals versus rainfall.)

[Figure: Normal Probability Plot of the Residuals (response is runoff v); vertical axis: Normal Score, horizontal axis: Residual]

The normal probability plot is fairly straight, which indicates that the assumption of normally distributed errors is reasonable. The residual plot shows a fairly even spread of points around the center across the width of the plot, which indicates that the assumption of equal variance of the errors is reasonable.

B) Is there a significant linear relationship between rainfall and runoff volume? Give the appropriate hypotheses, show how the test statistic (given in the output) was calculated, report the p-value (also given in the output), and make a complete conclusion.
Hint: The linear relationship is significant as long as the estimated slope coefficient is significantly different from zero. If you’re stuck, refer to the model utility test on p. 526.

The regression equation is
runoff volume = - 1.13 + 0.827 rainfall volume

Predictor    Coef       SE Coef    T        P
Constant     -1.128     2.368      -0.48    0.642
rainfall     0.82697    0.03652    22.64    0.000

S = 5.240    R-Sq = 97.5%    R-Sq(adj) = 97.3%

The hypotheses are H0: β1 = 0 and Ha: β1 ≠ 0. The t-statistic is the slope estimate divided by its standard error: 0.82697 / 0.03652 = 22.64. The p-value is reported as 0.000 (that is, less than 0.0005), so we reject H0 and conclude that there is a statistically significant linear relationship between rainfall volume and runoff volume.

C) The sample correlation, r, is a numerical measure of the strength of the linear relationship between two variables. It always lies between -1 and 1, and its properties are listed on p. 540. Its square is the coefficient of determination, and its sign (plus or minus) is the same as the sign of the slope of the regression line. What is the correlation between rainfall and runoff volume? Explain how you found it from the regression output.

In this case, r is positive because the slope is positive. Numerically, it’s the square root of r² = .975, so r = .987. (Minitab reports 0.988 in part D; the small difference arises because R-Sq is rounded to 97.5% in the regression output.)

D) Use Minitab to test for the absence of correlation. The alternative will be the 2-sided alternative on p. 544. Go to Stat > Basic Statistics > Correlation. Report a p-value for the test, and interpret it in plain English. (Is there evidence that a linear relationship between rainfall volume and runoff volume exists?)

Pearson correlation of rainfall volume and runoff volume = 0.988
P-Value = 0.000

The small p-value means that the correlation coefficient is significantly different from zero, which means that there is a significant linear relationship between rainfall volume and runoff volume.

E) In a year with 30 m³ of rainfall, what is a good estimate of the expected (mean) amount of runoff? What is a good estimate for the actual (mean plus error) runoff?
The difference between these questions is the difference between a confidence interval for the mean (top of p. 532) and a prediction interval for a future observation (p. 535). Minitab can give you both intervals at the same time if you go to Stat > Regression > Regression and click the Options button. Type the value(s) of the predictor variable you want in the “new observations” box and make sure the confidence level is 95%. You do not need to check any of the storage boxes; the confidence interval for the mean (CI) and the prediction interval (PI) will be displayed automatically. Give each of these intervals below, and explain in plain English why the PI is wider than the CI.

New Obs    Fit      SE Fit    95.0% CI            95.0% PI
1          23.68    1.60      (20.23, 27.13)      (11.85, 35.52)

Both the CI and PI take into account the variability in estimating where the true regression line passes through x = 30. The prediction interval is wider than the confidence interval because it is the only interval that ALSO accounts for the variability of a single observation around this line.
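
The arithmetic behind parts B, C, and E can be checked by hand or with a short script. The following is a minimal Python sketch, not part of the Minitab workflow, that uses only the numbers printed in the output above. The variable names are illustrative. The output shown does not state the sample size, so the t multiplier is backed out of the reported CI rather than looked up in a table; that step is an assumption made for this sketch.

```python
import math

# Values copied from the Minitab output in this activity.
b0, b1 = -1.128, 0.82697          # intercept and slope estimates
se_b1 = 0.03652                   # standard error of the slope
s = 5.240                         # residual standard deviation S
r_sq = 0.975                      # R-Sq written as a proportion
se_fit = 1.60                     # SE Fit at x = 30
ci_low, ci_high = 20.23, 27.13    # reported 95% CI for the mean

# Part B: t statistic for H0: beta1 = 0 is estimate / standard error.
t_slope = b1 / se_b1

# Part C: r has the sign of the slope and magnitude sqrt(R-Sq).
r = math.copysign(math.sqrt(r_sq), b1)

# Part E: fitted (mean) runoff at 30 m^3 of rainfall.
x_new = 30
fit = b0 + b1 * x_new

# Assumption: recover the t multiplier from the reported CI,
# since CI = fit +/- t_mult * se_fit.
t_mult = (ci_high - ci_low) / (2 * se_fit)

# The PI adds the variance of a single observation (s^2) to the
# variance of the fitted mean (se_fit^2).
se_pred = math.sqrt(s**2 + se_fit**2)
pi_low = fit - t_mult * se_pred
pi_high = fit + t_mult * se_pred

print(f"t for slope = {t_slope:.2f}")
print(f"r = {r:.3f}")
print(f"fit at x = {x_new}: {fit:.2f}")
print(f"approx. 95% PI: ({pi_low:.2f}, {pi_high:.2f})")
```

The printed values should land close to the Minitab output. Small differences in the last digit of the PI are expected because SE Fit and the CI endpoints are rounded in the output.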