LinRegtTest.Notes

AP Statistics
Linear Regression t Test
NOTES
Body weights and backpack weights were collected for eight students.
Weight (lbs)    Backpack weight (lbs)
120             26
187             30
109             26
103             24
131             29
165             35
158             31
116             28
These data were entered into a statistics package and least squares regression of backpack weight
on body weight was requested. Here are the results:
Predictor    Coef      Stdev      t-ratio    P
Constant     16.265    3.937      4.13       0.006
Backpack     0.0908    0.02831    3.21       0.018

S=2.270    R-sq=63.2%    R-sq(adj)=57%
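The printed coefficients can be checked by hand. The following is a minimal Python sketch, not the statistics package's own routine, that recomputes the slope, intercept, S, and R-sq from the eight data pairs using the standard least-squares formulas. The variable names are chosen here for illustration only.

```python
import math

# Data from the table above: body weight (x) and backpack weight (y), in lbs
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

n = len(body)
x_bar = sum(body) / n
y_bar = sum(pack) / n

# Sums of squares and cross-products about the means
s_xx = sum((x - x_bar) ** 2 for x in body)
s_yy = sum((y - y_bar) ** 2 for y in pack)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))

b = s_xy / s_xx                   # slope of the least-squares line
a = y_bar - b * x_bar             # intercept
sse = s_yy - b * s_xy             # sum of squared residuals
s_e = math.sqrt(sse / (n - 2))    # standard deviation about the line (S in the output)
r_sq = 1 - sse / s_yy             # coefficient of determination

print(f"a = {a:.3f}, b = {b:.4f}, S = {s_e:.3f}, R-sq = {100 * r_sq:.1f}%")
```

The values should agree with the Coef column, S, and R-sq above up to rounding. Note that the slope belongs to body weight, the x variable, even though the output row is labeled "Backpack".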
What is the equation of the least-squares line?
Interpret r-squared in the context of this problem.
If a scatter plot, residual plot, and r do not provide convincing evidence for a useful linear
relationship, we can use the following model utility test for simple linear regression to help
us assess the situation more definitively.
Our data are a sample; we have not taken a census of all students' body weights and backpack
weights. So the a and b we have been calculating are statistics: they are based on a sample.
But there is some "true" value out there for the slope of the regression line.
Let β represent the true slope of the linear regression line. This is a parameter. In the same way
that x̄ is the sample mean used to estimate the population parameter μ, b is the sample slope
statistic used to estimate the true slope of the regression line (from ŷ = a + bx).
Hypotheses about β can be tested using a t-test very similar to the t-tests discussed for other
situations. The null hypothesis specifies that there is no useful linear relationship between x and
y, whereas the alternative hypothesis specifies that there is a useful linear relationship between x
and y. If Ho is rejected, we conclude that the simple linear regression model is useful for
predicting y. This means that knowledge of x is useful for predicting y.
Ho : β = 0 (there is no useful linear relationship)
Ha : β ≠ 0 (there is a useful linear relationship) … or β > 0 or β < 0 (if needed)
test statistic: $t = \dfrac{b - \text{hypothesized value}}{s_b}$
where df = n – 2 and $s_b = \dfrac{s_e}{\sqrt{S_{xx}}}$
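As a worked illustration, the test statistic can be computed directly from the eight backpack data pairs. This is a sketch under two assumptions that are standard but not spelled out in these notes: s_e is the standard deviation of the residuals using n – 2 in the denominator, and Sxx is the sum of squared deviations of x from its mean. The hypothesized value is taken to be 0, as in Ho.

```python
import math

# Body weight (x) and backpack weight (y), in lbs
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

n = len(body)
x_bar = sum(body) / n
y_bar = sum(pack) / n
s_xx = sum((x - x_bar) ** 2 for x in body)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))

b = s_xy / s_xx            # sample slope
a = y_bar - b * x_bar      # sample intercept

# Standard deviation about the line, from the residuals
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(body, pack))
s_e = math.sqrt(sse / (n - 2))

s_b = s_e / math.sqrt(s_xx)    # estimated standard deviation of the slope
t = (b - 0) / s_b              # hypothesized value of the true slope is 0
df = n - 2

print(f"s_b = {s_b:.5f}, t = {t:.2f}, df = {df}")
```

The results should match the Stdev and t-ratio entries in the slope row of the computer output, up to rounding.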
Conditions:
a) The scatterplot indicates a reasonable linear relationship
b) The set of observations represents the population and was randomly selected.
c) The residual plot does not show any curved pattern and is reasonably scattered, with little
skewness and no extreme outliers.
d) The errors around the regression line at each value of x follow a normal distribution.
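Conditions c) and d) are judged from the residuals, the differences between each observed y and the value predicted by the line. The sketch below lists the residuals for the backpack data in order of body weight so that any curved pattern or extreme value can be spotted. It is only a rough aid: a printed list does not replace a residual plot, and with eight points any check is approximate.

```python
# Body weight (x) and backpack weight (y), in lbs
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

n = len(body)
x_bar = sum(body) / n
y_bar = sum(pack) / n
s_xx = sum((x - x_bar) ** 2 for x in body)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))

b = s_xy / s_xx            # least-squares slope
a = y_bar - b * x_bar      # least-squares intercept

# residual = observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(body, pack)]

# List residuals in order of increasing x, as they would appear on a residual plot
for x, e in sorted(zip(body, residuals)):
    print(f"x = {x:3d}   residual = {e:+.2f}")
```

For a least-squares line the residuals always sum to zero (apart from rounding), so the check is about their pattern, not their total.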
CALCULATOR TEST: LinRegTTest
When p is small we reject Ho and conclude that knowing x does give us information about y.
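To see where the P-value comes from, the area in the tails of the t distribution can be approximated numerically. The sketch below is one possible approach using only the Python standard library, not what the calculator does internally. The function names are made up for this illustration, and the inputs t = 3.21 and df = 6 are taken from the backpack output.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value; integrates the density on [0, |t_stat|] by Simpson's rule.

    steps must be even for Simpson's rule.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3           # P(0 < T < |t_stat|)
    return 2 * (0.5 - area)        # both tails beyond |t_stat|

p = two_sided_p(3.21, 6)
print(f"P-value = {p:.3f}")
```

The result should agree with the P entry in the slope row of the output, up to rounding. For a one-sided alternative, half of this value applies when the sample slope falls on the side named in Ha.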
What you must know (or have) for the following HT and CI:
b1 = slope estimate (average) based on sample data
β = hypothesized population parameter for slope
s_b1 = standard deviation of slope estimate based on sample data
n = sample size (number of sampled paired data)
s.e.(s_b1) = standard error of slope estimate = $\dfrac{s_{b_1}}{\sqrt{n}}$ : This is what appears on
computer outputs.
t* = critical test value based on confidence level and degrees of freedom
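These quantities combine into a confidence interval for the true slope of the form estimate ± t* × (standard error). The sketch below applies it to the backpack output. It assumes a 95% confidence level, which these notes do not specify, and takes t* = 2.447 from a t table for df = 6. The Stdev entry in the slope row of the output is used as the standard error.

```python
b1 = 0.0908        # slope estimate (Coef column, slope row)
se_b1 = 0.02831    # standard error of the slope (Stdev column, slope row)
n = 8              # number of (body weight, backpack weight) pairs
df = n - 2

# Assumption: 95% confidence; t* read from a t table for df = 6
t_star = 2.447

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"df = {df}")
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the interval lies entirely above 0, which is consistent with rejecting Ho at the 5% level in the two-sided test.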
Run LinRegtTest in calculator and compare with computer output above.
Use the data from example 1 to determine if there is a significant relationship between
body weight and backpack weight.
The Fish and Wildlife Agency is interested in being able to estimate the weight of bears based on
their length. Data were collected from a random sample of 143 bears and a least-squares
regression line was estimated. The output from this model is provided below.
Predictor    Coef        Stdev     t-ratio    P
Constant     -422.49     31.19     -13.55     0.000
Length       10.1487     0.5031    20.17      0.000

S=56.07    R-sq=74.3%    R-sq(adj)=74.1%
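Each t-ratio in output like this is the coefficient divided by its standard deviation, which is the test statistic for Ho: parameter = 0. The short sketch below checks that relationship for the bear output and computes the degrees of freedom. The dictionary layout is just a convenient way to hold the printed numbers.

```python
n = 143    # bears in the sample

# (Coef, Stdev) pairs copied from the output above
rows = {
    "Constant": (-422.49, 31.19),
    "Length": (10.1487, 0.5031),
}

df = n - 2

# t-ratio = (coefficient - 0) / standard deviation of the coefficient
t_ratios = {name: coef / stdev for name, (coef, stdev) in rows.items()}

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}")
print(f"df = {df}")
```

The computed ratios should reproduce the t-ratio column up to rounding. With a t statistic this far from 0 and this many degrees of freedom, the P-value rounds to 0.000 as printed.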
(Analysis of variance information may also be included, but is not tested on AP test.)