Lab 6 solutions

Activity 6: Regression and Correlation
Observations on rainfall volume (m³) and runoff volume for a particular location were recorded. These data can
be obtained on the website. When you click on the link, the data should open automatically in Minitab.
This lab will cover some things we haven’t discussed in class, including Sections 12.4 and 12.5. The topics
covered in this lab might be on the final exam.
A) Are the assumptions for regression reasonable in this case? Go to Stat > Regression > Regression and
enter the variables. The response is runoff volume and the predictor is rainfall volume. Under ‘Graphs’, obtain
a normal probability plot of the residuals and a plot of the residuals versus rainfall volume. Copy and paste
these plots below. What do these plots indicate about the assumptions? (The model assumptions are on p. 500.
The two plots indicate whether the normality assumption is satisfied and whether the assumption of constant
variance σ² is satisfied. You should know how to check the former assumption using the normal probability
plot. The latter assumption may be checked by verifying that the height of the spread of residuals around the
center line is roughly constant across the width of the plot of residuals versus rainfall.)
[Figure: Normal Probability Plot of the Residuals (response is runoff v). Vertical axis: Normal Score; horizontal axis: Residual.]
The normal probability plot is fairly straight, which indicates that the assumption of normally distributed errors
is reasonable. The residual plot shows a fairly even spread of points around the center across the width of the
plot, which indicates that the assumption of equal variance of the errors is reasonable.
B) Is there a significant linear relationship between rainfall and runoff volume? Give the appropriate
hypotheses, show how the test statistic (given in the output) was calculated, report the p-value (also given in the
output), and make a complete conclusion. Hint: The linear relationship is significant as long as the estimated
slope coefficient is significantly different from zero. If you’re stuck, refer to the model utility test on p. 526.
The regression equation is
runoff volume = - 1.13 + 0.827 rainfall volume

Predictor       Coef   SE Coef       T       P
Constant      -1.128     2.368   -0.48   0.642
rainfall     0.82697   0.03652   22.64   0.000

S = 5.240    R-Sq = 97.5%    R-Sq(adj) = 97.3%
The hypotheses are H₀: β₁ = 0 and Hₐ: β₁ ≠ 0. The t-statistic 22.64 equals the slope estimate (0.82697) divided by
its standard error (0.03652). The p-value is reported as 0.000 (that is, less than 0.0005), so we reject H₀ and
conclude that there is a statistically significant linear relationship between rainfall volume and runoff volume.
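The test-statistic calculation can be checked by hand. The following is a minimal Python sketch that uses only the values printed in the regression output; the variable names are illustrative and not part of Minitab.

```python
# Values copied from the Minitab regression output.
slope = 0.82697      # Coef for rainfall
se_slope = 0.03652   # SE Coef for rainfall

# Model utility test: t = (estimate - hypothesized value) / standard error,
# where the hypothesized slope under H0 is 0.
t_stat = (slope - 0.0) / se_slope
print(round(t_stat, 2))  # 22.64
```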
C) The sample correlation, r, is a numerical measure of the strength of the linear relationship between two
variables. It always lies between -1 and 1, and its properties are listed on p. 540. Its square is the coefficient of
determination, and its sign (plus or minus) is the same as the slope of the regression line. What is the
correlation between rainfall and runoff volume? Explain how you found it from the regression output.
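In this case, r is positive because the slope is positive. Numerically, it is the square root of r² = 0.975, so r ≈ 0.987.
(Minitab's direct calculation in part D gives 0.988; the small difference most likely comes from R-Sq being rounded to
97.5% in the regression output.)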
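As a quick check of this arithmetic, here is a small Python sketch that recovers r from the printed R-Sq and takes its sign from the slope. It works only from the rounded values in the output, so the last digit may differ from a correlation computed on the raw data.

```python
import math

r_squared = 0.975   # R-Sq from the regression output (rounded)
slope = 0.82697     # estimated slope; r takes the same sign

# r is the square root of R-Sq, signed like the slope.
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))  # 0.987
```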
D) Use Minitab to test for the absence of correlation. The alternative will be the 2-sided alternative on p. 544.
Go to Stat > Basic Statistics > Correlation. Report a p-value for the test, and interpret it in plain English. (Is
there evidence that a linear relationship between rainfall volume and runoff volume exists?)
Pearson correlation of rainfall volume and runoff volume = 0.988
P-Value = 0.000
The small p-value means that the correlation coefficient is significantly different from zero, so there is
evidence of a linear relationship between rainfall volume and runoff volume.
E) In a year with 30 m³ of rainfall, what is a good estimate of the expected (mean) amount of runoff? What is a
good estimate for the actual (mean plus error) runoff? The difference between these questions is the difference
between a confidence interval for the mean (top of p. 532) and a prediction interval for a future observation (p.
535). Minitab can give you both intervals at the same time if you go to Stat > Regression > Regression and
click the Options button. Type the value(s) of the predictor variable you want in the “new observations” box
and make sure the confidence level is 95%. You do not need to check any of the storage boxes; the confidence
interval for the mean (CI) and the prediction interval (PI) will be displayed automatically. Give each of these
intervals below, and explain in plain English why the PI is wider than the CI.
New Obs     Fit   SE Fit        95.0% CI              95.0% PI
1         23.68     1.60   (  20.23,   27.13)   (  11.85,   35.52)
The point estimate for both questions is the fitted value, 23.68. Both the CI and the PI account for the uncertainty
in estimating the height of the true regression line at x = 30. The prediction interval is wider than the confidence
interval because it is the only interval that ALSO accounts for the variability of a single observation around this line.
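The extra width of the PI can be reproduced approximately from the printed output. The sketch below is illustrative only: the t multiplier is not printed by Minitab, so it is backed out of the reported CI (an assumption of this sketch), and the prediction standard error is taken as the square root of S² plus (SE Fit)².

```python
import math

# Values copied from the Minitab output.
intercept, slope = -1.128, 0.82697
s = 5.240                    # S, the residual standard deviation
fit, se_fit = 23.68, 1.60    # Fit and SE Fit at x = 30
ci_low, ci_high = 20.23, 27.13
x_new = 30.0

# Fitted value at x = 30 from the estimated regression line.
fit_check = intercept + slope * x_new

# Assumption: recover the t multiplier from the reported CI half-width,
# since the critical value itself is not shown in the output.
t_mult = (ci_high - ci_low) / 2 / se_fit

# A single new observation adds its own error variance s^2
# to the variance of the estimated mean.
se_pred = math.sqrt(s**2 + se_fit**2)
pi_low = fit - t_mult * se_pred
pi_high = fit + t_mult * se_pred

print(round(fit_check, 2), round(se_pred, 2))
print(round(pi_low, 2), round(pi_high, 2))
```

The endpoints land within a few hundredths of the reported PI (11.85, 35.52); the gap reflects rounding in the printed SE Fit and CI endpoints. The prediction standard error is dominated by S, which is why the PI is so much wider than the CI.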