STA291
Statistical Methods
Lecture 27
Inference for Regression
Does the cost of a movie depend on its length?
๐ต๐‘ข๐‘‘๐‘”๐‘’๐‘ก = −31.38695 + 0.7144001 × ๐‘…๐‘ข๐‘›๐‘‡๐‘–๐‘š๐‘’
Now we want to know: how useful is this model?
The Population and the Sample
The movie budget sample is based on 120 observations.
But we know observations vary from sample to sample. So
we imagine a true line that summarizes the relationship
between x and y for the entire population,
$\mu_y = \beta_0 + \beta_1 x$
where $\mu_y$ is the population mean of y at a given value of x.
We write $\mu_y$ instead of y because the model assumes that the means of the y values at each value of x fall exactly on the line.
For a given value x:
• Most, if not all, of the y values obtained from a particular sample will not lie on the line.
• The sampled y values will be distributed about $\mu_y$.
• We account for the difference between each observed y and $\mu_y$ by adding an error term, ε: $y = \beta_0 + \beta_1 x + \varepsilon$
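To make the population model concrete, here is a small simulation sketch, assuming hypothetical values β0 = 10, β1 = 2 and a Normal error with standard deviation 3 (none of these come from the movie data). Individual y values scatter around µy, but their average at a fixed x settles near the line.

```python
import random

def sample_y(x, beta0, beta1, sigma, rng):
    """Draw one y at the given x from y = beta0 + beta1*x + eps,
    where eps ~ Normal(0, sigma)."""
    mu_y = beta0 + beta1 * x        # point on the population line
    eps = rng.gauss(0.0, sigma)     # random error about mu_y
    return mu_y + eps

# Hypothetical population parameters, for illustration only
BETA0, BETA1, SIGMA = 10.0, 2.0, 3.0
rng = random.Random(1)

x = 5
draws = [sample_y(x, BETA0, BETA1, SIGMA, rng) for _ in range(10_000)]
mu_y = BETA0 + BETA1 * x
# Individual draws miss the line, but their mean lands close to mu_y = 20
print(mu_y, round(sum(draws) / len(draws), 2))
```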
Regression Inference
• Collect a sample and estimate the population β's by finding a regression line (Chapter 6):
$\hat{y} = b_0 + b_1 x$
$b_0$ estimates $\beta_0$, $b_1$ estimates $\beta_1$
• The residuals $e = y - \hat{y}$ are the sample-based versions of ε.
• Account for the uncertainty in our estimates of $\beta_0$ and $\beta_1$ by making confidence intervals, as we've done for means and proportions.
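The least-squares estimates can be computed directly from the data. This Python sketch uses b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄ on a small made-up data set (not the movie data) and returns the residuals e = y − ŷ along with the estimates.

```python
def fit_line(x, y):
    """Least-squares estimates b0, b1 plus fitted values and residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                  # estimates beta1
    b0 = y_bar - b1 * x_bar         # estimates beta0
    y_hat = [b0 + b1 * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # e = y - y_hat
    return b0, b1, y_hat, residuals

# Small made-up data set, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, y_hat, e = fit_line(x, y)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
```

Least-squares residuals always sum to (essentially) zero, which makes a quick sanity check.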
Assumptions and Conditions
In this order:
1. Linearity Assumption
2. Independence Assumption
3. Equal Variance Assumption
4. Normal Population Assumption
Summary of Assumptions and Conditions
1. Make a scatterplot of the data to check for linearity.
(Linearity Assumption)
2. Fit a regression and find the residuals, e, and predicted values ŷ.
3. Plot the residuals against time (if appropriate) and
check for evidence of patterns (Independence
Assumption).
4. Make a scatterplot of the residuals against x or the
predicted values. This plot should not exhibit a “fan”
or “cone” shape. (Equal Variance Assumption)
5. Make a histogram and Normal probability plot of the
residuals (Normal Population Assumption)
The Standard Error of the Slope
For a sample, we expect b1 to be close to, but not equal to, the model slope β1.
For similar samples, the standard error of the slope is a
measure of the variability of b1 about the true slope β1.
๐‘ ๐‘’
๐‘†๐ธ ๐‘1 =
๐‘ ๐‘ฅ ๐‘› − 1
Spread around the line: se
Spread of the x values: sx
Sample size: n
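The formula is easy to explore numerically. This sketch uses hypothetical inputs to show each component's effect: doubling se doubles SE(b1), while doubling sx or quadrupling n − 1 halves it.

```python
import math

def se_slope(s_e, s_x, n):
    """Standard error of the slope: s_e / (s_x * sqrt(n - 1))."""
    return s_e / (s_x * math.sqrt(n - 1))

# Hypothetical inputs, chosen so the arithmetic is easy to follow
base = se_slope(s_e=2.0, s_x=1.0, n=5)  # 2 / (1 * 2)
print(base)                    # 1.0
print(se_slope(4.0, 1.0, 5))   # more spread around the line: 2.0
print(se_slope(2.0, 2.0, 5))   # more spread in x: 0.5
print(se_slope(2.0, 1.0, 17))  # larger sample, sqrt(16) = 4: 0.5
```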
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population?
Hint: Compare se's.
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population?
Hint: Compare sx's.
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population?
Hint: Compare n's.
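The three questions above can also be answered by simulation. This sketch repeatedly samples from a hypothetical population (β0 = 0, β1 = 1, Normal errors) at fixed x values and reports how much the fitted slopes vary. Because $s_x\sqrt{n-1} = \sqrt{\sum (x - \bar{x})^2}$, the baseline result should be close to σ divided by that square root.

```python
import math
import random
import statistics

def slope(x, y):
    """Least-squares slope b1 = Sxy / Sxx."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def sd_of_slopes(x, sigma, reps, rng, beta0=0.0, beta1=1.0):
    """Draw `reps` samples at the fixed x values from
    y = beta0 + beta1*x + eps, eps ~ Normal(0, sigma),
    and return the standard deviation of the fitted slopes."""
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(slope(x, y))
    return statistics.stdev(slopes)

rng = random.Random(291)
x_base = list(range(1, 11))          # n = 10
x_wide = [2 * xi for xi in x_base]   # same n, twice the spread in x
x_big_n = x_base * 4                 # about the same spread in x, n = 40
reps = 2000

baseline = sd_of_slopes(x_base, 1.0, reps, rng)
larger_se = sd_of_slopes(x_base, 3.0, reps, rng)  # more scatter about the line
wider_x = sd_of_slopes(x_wide, 1.0, reps, rng)
bigger_n = sd_of_slopes(x_big_n, 1.0, reps, rng)

# Theory for the baseline case: sigma / sqrt(sum((x - x_bar)^2))
x_bar = statistics.fmean(x_base)
theory = 1.0 / math.sqrt(sum((xi - x_bar) ** 2 for xi in x_base))

for label, value in [("baseline", baseline), ("larger s_e", larger_se),
                     ("larger s_x", wider_x), ("larger n", bigger_n)]:
    print(f"{label:10s} {value:.3f}")
print(f"theory     {theory:.3f}")
```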
A Test for the Regression Slope
When the conditions are met, the standardized estimated regression slope,
$t = \dfrac{b_1 - \beta_1}{SE(b_1)}$
follows a t-distribution with df = n − 2.
We estimate $SE(b_1)$ with:
$SE(b_1) = \dfrac{s_e}{s_x \sqrt{n-1}}$
where $s_x$ is the ordinary standard deviation of the x's and
$s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n-2}}$
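The pieces of the test statistic can be computed from raw data. This sketch, on a small made-up data set, computes se, sx, SE(b1) and t for a hypothesized slope (0 by default). Turning t into a P-value needs the t-distribution with n − 2 df, from a table or statistics software, and is not shown.

```python
import math
import statistics

def slope_test(x, y, beta1_null=0.0):
    """Return b1, s_e, SE(b1), t and df for testing H0: beta1 = beta1_null."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))          # spread around the line
    s_x = statistics.stdev(x)               # ordinary SD of the x's
    se_b1 = s_e / (s_x * math.sqrt(n - 1))  # SE(b1)
    t = (b1 - beta1_null) / se_b1           # compare to t with df = n - 2
    return b1, s_e, se_b1, t, n - 2

# Made-up data, for illustration only
b1, s_e, se_b1, t, df = slope_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 4), round(s_e, 4), round(se_b1, 4), round(t, 4), df)
# 0.6 0.8944 0.2828 2.1213 3
```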
The usual null hypothesis about the slope is that it’s equal
to 0. Why?
A slope of zero says that y doesn’t tend to change linearly
when x changes. In other words, if the slope equals zero,
there is no linear association between the two variables.
H0: β1 = 0. This would mean that x and y are not linearly
related.
Ha: β1 ≠ 0. This would mean . . .
CI for the Regression Slope
When the assumptions and conditions are met, we can find a confidence interval for $\beta_1$ from
$b_1 \pm t^*_{n-2} \times SE(b_1)$
where the critical value t* depends on the confidence level and has df = n − 2.
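A minimal sketch of the interval. The function takes t* as an input rather than computing it, since Python's standard library has no t-distribution; the values b1 = 0.6, SE(b1) = 0.2828 and t* = 3.182 (the 95% table value for df = 3) are illustrative only.

```python
def slope_ci(b1, se_b1, t_star):
    """Confidence interval b1 +/- t* x SE(b1).
    t_star must be the critical value for the chosen confidence
    level with df = n - 2 (for example, from a t-table)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

# Illustrative inputs only
low, high = slope_ci(0.6, 0.2828, 3.182)
print(round(low, 3), round(high, 3))
```

Because this interval contains 0, a test of H0: β1 = 0 at α = 0.05 would not reject for these illustrative numbers.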
16.4 A Test for the Regression Slope
Example: Soap
A soap manufacturer tested a standard bar of soap to see how long it would last. A test subject showered with the soap each day for 15 days and recorded the weight (in grams) remaining. The conditions were met, so a linear regression gave the following:
Dependent variable is: Weight
R squared = 99.5%
s = 2.949

Variable     Coefficient   SE(Coeff)   t-ratio   P-value
Intercept    123.141       1.382        89.1     <0.0001
Day          -5.57476      0.1068      -52.2     <0.0001
What is the standard deviation of the residuals? $s_e = 2.949$
What is the standard error of b1? $SE(b_1) = 0.1068$
H o : ๏ข1 ๏€ฝ 0
What are the hypotheses for the
H a : ๏ข1 ๏‚น 0
regression slope?
At α = 0.05, what is the conclusion? Since the p-value is small
(<0.0001), reject the null hypothesis. There is strong evidence
of a linear relationship between Weight and Day.
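The t-ratio column can be checked from the other two columns: for H0: β = 0, t = coefficient / SE(coefficient). A quick Python check using the numbers in the output above:

```python
# Values read from the soap regression output
coef = {"Intercept": 123.141, "Day": -5.57476}
se = {"Intercept": 1.382, "Day": 0.1068}

t_ratios = {}
for name in coef:
    t_ratios[name] = coef[name] / se[name]  # t for H0: beta = 0, df = 13
    print(name, round(t_ratios[name], 1))
# Intercept 89.1
# Day -52.2
```

A t-ratio this far from 0 on 13 df is why the reported P-value is below 0.0001.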
Find a 95% confidence interval for the slope. With df = 15 − 2 = 13, t* = 2.160:
$b_1 \pm t^* \times SE(b_1) = -5.57476 \pm (2.160)(0.1068) = (-5.805, -5.344)$
Interpret the 95% confidence interval for the slope. We can be 95% confident that the weight of the soap decreases by between 5.34 and 5.81 grams per day, on average.
At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion? Yes. The interval does not contain zero, so we reject the null hypothesis, just as the test did.
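The interval arithmetic can be reproduced in a few lines; t* = 2.160 is the 95% critical value for df = 13 from a t-table.

```python
b1, se_b1 = -5.57476, 0.1068  # from the soap regression output
t_star = 2.160                # 95% critical value, df = 15 - 2 = 13

margin = t_star * se_b1
low, high = b1 - margin, b1 + margin
print(round(low, 3), round(high, 3))    # -5.805 -5.344
print("contains 0:", low <= 0 <= high)  # contains 0: False
```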
• Don't fit a linear regression to data that aren't straight.
• Watch out for changing spread.
• Watch out for non-Normal errors. Check the histogram and the Normal probability plot.
• Watch out for extrapolation. It is always dangerous to predict for x-values that lie far away from the center of the data.
• Watch out for high-influence points and unusual observations.
• Watch out for one-tailed tests. Most software packages perform only two-tailed tests. Adjust your P-values accordingly.
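Adjusting a two-sided P-value relies on the symmetry of the t-distribution: if the estimate falls on the side named by Ha, halve the P-value; otherwise the one-sided P-value is 1 minus half of it. A minimal sketch with hypothetical P-values and t values:

```python
def one_sided_p(p_two_sided, t, alternative):
    """Convert a two-sided P-value for a t-statistic to a one-sided one.
    alternative: "greater" (Ha: beta1 > 0) or "less" (Ha: beta1 < 0)."""
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')
    half = p_two_sided / 2
    in_direction = (t > 0) if alternative == "greater" else (t < 0)
    return half if in_direction else 1 - half

# Hypothetical software output, for illustration only
print(one_sided_p(0.08, 2.0, "greater"))   # 0.04
print(one_sided_p(0.08, -2.0, "greater"))  # 0.96
```

For the soap data, a one-sided Ha: β1 < 0 would give half the reported two-sided P-value, which is still below 0.0001.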
Looking back
o Know the assumptions and conditions for inference about regression coefficients and how to check them, in this order: LIEN (Linearity, Independence, Equal variance, Normal population)
o Know the components of the standard error of the slope coefficient
o Test statistic
o CI Interpretation