Lecture#35

Stat 141 R1 - Lecture #35
Announcements:
1) Assignment #11, Question 5:
The answer is wrong: it should be "fail to reject", but MyStatLab wants
"reject", so give the wrong answer ("reject") for full marks on this question.
2) Exam:
STAT 141 R1, 3 hrs, 1400, Wed Apr 17, MAIN GYM
~45 multiple-choice questions
Chapters 7, 8, and 18-28; some pre-midterm skills will be required.
Simple Linear Regression (continued)
Last time:
Ex) Predicting final exam marks (%) from midterm exam marks (%) in a class
of 88 students:

Student    Midterm mark    Final mark
#1         67%             62%
#2         72%             50%
…          …               …
#88        88%             91%
Given x = midterm percentage and y = final percentage:
$n = 88,\ \bar{x} = 67.812,\ \bar{y} = 52.643,$
$s_x = 17.922,\ s_y = 25.430,\ r = 0.718,$
$\sum (y_i - \hat{y}_i)^2 = 27278.82$
We had calculated:
The slope and intercept of the sample line of best fit:
⇒ sample line of best fit: $\hat{y} = -16.443 + 1.019x$
An estimate for σ (the standard deviation about the population line) is $s_e$.
Given $SSE = \sum (y_i - \hat{y}_i)^2 = 27278.82$:
$$s_e^2 = \frac{SSE}{n-2} = \frac{27278.82}{88-2} = 317.196 \quad\Rightarrow\quad \sigma \approx s_e = \sqrt{317.196} = 17.810$$
Inference for the population slope β1
When the 4 basic assumptions of the SLR model are satisfied:
o The relationship between x and y is sufficiently linear; that is, at any x,
the mean of ε is $\mu_\varepsilon = 0$.
o The std. dev. of ε is the same for any particular x (constant).
o The distribution of ε at any particular x is normal.
o The random deviations ε1, ε2, ..., εn associated with different
observations are independent of one another.
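then: i) $b_1$ is normally distributed
ii) The mean of $b_1$ is $\mu_{b_1} = \beta_1$
iii) The standard deviation of $b_1$ is $\sigma_{b_1} = \dfrac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}}$
The standard error of $b_1$ (σ replaced by its estimate $s_e$) is
$$SE(b_1) = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{s_e}{s_x\sqrt{n-1}},$$
since $\sum (x_i - \bar{x})^2 = (n-1)s_x^2$.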
⇒ CI for $\beta_1$: $b_1 \pm t_{\alpha/2} \times SE(b_1)$ with df = n - 2
Test statistic for $H_0: \beta_1 = 0$: $t_0 = \dfrac{b_1 - \beta_1}{SE(b_1)}$ with df = n - 2
Ex) Construct a 95% CI for β1.
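Sol.: $SE(b_1) = \dfrac{s_e}{s_x\sqrt{n-1}} = \dfrac{17.810}{17.922\sqrt{87}} = 0.1065$
df = n - 2 = 88 - 2 = 86; df = 86 is not in the t-table, so the df = 75 row is used: $t_{\alpha/2} = t_{0.025} \approx 1.992$
$b_1 \pm t_{\alpha/2} \times SE(b_1) = 1.019 \pm (1.992)(0.1065) = 1.019 \pm 0.212 = (0.807, 1.231)$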
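The same arithmetic as a short Python sketch (standard library only). It uses the rounded values from the lecture and the table value $t = 1.992$ from the df = 75 row, not an exact df = 86 quantile:

```python
import math

# Values from the example (summary statistics and fitted slope).
n = 88
s_x = 17.922
s_e = 17.810
b1 = 1.019
t_crit = 1.992  # t_{0.025} read from the df = 75 row of the t-table, as in the lecture

# Standard error of the slope: SE(b1) = s_e / (s_x * sqrt(n - 1)).
se_b1 = s_e / (s_x * math.sqrt(n - 1))

# 95% confidence interval: b1 +/- t * SE(b1).
margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin
print(f"SE(b1) = {se_b1:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval lies entirely above 0, it already suggests a positive slope, which the test below confirms formally.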
Ex) Is there sufficient evidence to conclude that the final percentage
increases as midterm percentage increases? Carry out an appropriate
test using ๏ก = 0.01 .
Sol.: H0: β1=0
HA: β1> 0
Assumptions of SLR model: as above
Test statistic:
$$t_0 = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{1.019 - 0}{0.1065} = 9.568$$
with df = 88 - 2 = 86 (t-table row df = 75)
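P-value:
In the t-table (df = 75), $t_0 = 9.568$ lies beyond the largest tabulated critical value, so 0 < P-value < 0.005.
Note that the test is one-tailed (upper tail), since $H_A: \beta_1 > 0$.
Conclusion: Since P-value < 0.005 < α = 0.01, reject $H_0$ in favour of $H_A$ at α = 0.01.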
There is convincing evidence against H0 in favour of HA: final
percentage increases as midterm percentage increases (a positive linear
association between mt% and fin%).
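The lecture only bounds the P-value with a t-table. As a cross-check, here is a sketch that computes the one-tailed P-value directly with df = 86 (instead of the table's df = 75). Python's standard library has no t distribution, so this integrates the Student t density numerically with Simpson's rule; that numerical method is a swapped-in technique rather than anything from the course, and the helper names `t_pdf` and `upper_tail_p` are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t0: float, df: int, n_steps: int = 4000) -> float:
    """One-tailed P(T >= t0), computed as 0.5 minus the density's integral over [0, t0].

    The integral uses composite Simpson's rule, so n_steps must be even.
    """
    if t0 < 0:
        return 1.0 - upper_tail_p(-t0, df, n_steps)
    h = t0 / n_steps
    total = t_pdf(0.0, df) + t_pdf(t0, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)  # clamp tiny negative rounding error

# H0: beta1 = 0 vs HA: beta1 > 0, using the lecture's values.
b1, se_b1, n, alpha = 1.019, 0.1065, 88, 0.01
t0 = (b1 - 0) / se_b1
p_value = upper_tail_p(t0, n - 2)
print(f"t0 = {t0:.3f}, one-tailed P-value = {p_value:.2e}, reject H0: {p_value < alpha}")
```

If SciPy is available, `scipy.stats.t.sf(t0, df)` gives the same upper-tail probability without hand-written integration.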
A typical summary table (produced by Excel, StatCrunch, etc.):
Inferences based on the estimated regression line:
• CI for the mean value of y corresponding to an x value $x_\nu$:
For $\hat{y}_\nu = b_0 + b_1 x_\nu$:
$\hat{y}_\nu \pm t_{df,\alpha/2} \times SE(\hat{\mu}_\nu)$ (df = n - 2)
• Prediction interval (PI) for an individual value of y corresponding to an
x value $x_\nu$:
$\hat{y}_\nu \pm t_{df,\alpha/2} \times SE(\hat{y}_\nu)$ (df = n - 2)
Note that the PI is wider than the CI. Why?
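The standard errors are not written out here. Assuming the standard SLR textbook formulas (which reproduce the values 1.977 and 17.919 used in the example that follows), they are:

```latex
SE(\hat{\mu}_\nu) = s_e\sqrt{\frac{1}{n} + \frac{(x_\nu - \bar{x})^2}{(n-1)s_x^2}},
\qquad
SE(\hat{y}_\nu) = \sqrt{s_e^2 + SE(\hat{\mu}_\nu)^2}
               = s_e\sqrt{1 + \frac{1}{n} + \frac{(x_\nu - \bar{x})^2}{(n-1)s_x^2}}
```

The extra $s_e^2$ term accounts for the scatter of an individual y about the population line, on top of the uncertainty in estimating the mean; that term does not shrink as n grows, so the PI is always wider than the CI.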
Ex) Give a 95% CI for the mean final% when midterm% = 73%.
Compare with the 95% PI for final% when midterm% = 73%.
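Sol.: $\hat{y}_\nu = b_0 + b_1 x_\nu = -16.443 + 1.019(73) = 57.928\%$ (computed with the unrounded slope $b_1 \approx 1.0188$; the rounded coefficients give 57.944%)
where $t_{n-2,\alpha/2} = t_{86,0.025} \approx 1.992$ (df = 75 table value, as before)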
CI: ลทν ± tdf, α/2× SE(๐œ‡ฬ‚ ๐œˆ )= 57.928 ± (1.992)(1.977) =
57.928 ± 3.939 = (53.990, 61.867)
PI: ลทν ± tdf, α/2× SE(๐‘ฆฬ‚๐œˆ )= 57.928 ± (1.992)(17.919) =
57.928 ± 35.696 = (22.233, 93.624)
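The lecture quotes $SE(\hat{\mu}_\nu) = 1.977$ and $SE(\hat{y}_\nu) = 17.919$ without showing the computation. The sketch below assumes the standard SLR formulas (the mean-response SE, and the prediction SE that adds $s_e^2$ for an individual's scatter about the line) and recomputes both intervals. $b_0$ and $b_1$ are recomputed unrounded from $r$, $s_x$, $s_y$, which is why $\hat{y}$ comes out as 57.928 rather than 57.944; $t = 1.992$ is the lecture's table value.

```python
import math

# Summary statistics from the lecture.
n, x_bar, y_bar = 88, 67.812, 52.643
s_x, s_y, r = 17.922, 25.430, 0.718
s_e = 17.810
t_crit = 1.992  # t_{0.025} from the df = 75 row of the t-table, as in the lecture
x_nu = 73.0     # midterm % of interest

# Unrounded least-squares coefficients and the point estimate at x_nu.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar
y_hat = b0 + b1 * x_nu

# SE of the estimated mean response at x_nu.
se_mean = s_e * math.sqrt(1 / n + (x_nu - x_bar) ** 2 / ((n - 1) * s_x ** 2))
# SE for predicting one new y at x_nu: adds the scatter s_e about the line.
se_pred = math.sqrt(s_e ** 2 + se_mean ** 2)

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print(f"y-hat = {y_hat:.3f}")
print(f"95% CI for mean final%: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% PI for one final%:  ({pi[0]:.3f}, {pi[1]:.3f})")
```

These match the lecture's intervals to within rounding, and the PI width (about 71 points) versus the CI width (about 8 points) shows how much of the uncertainty comes from individual scatter rather than from estimating the line.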
๏Š๏Š๏Š๏Š THIS IS THE END! ๏Š๏Š๏Š๏Š