Chapter 8 Notes Filled In

advertisement
AP Statistics- Chapter 8
Line of Best Fit
Errors = residuals = actual Y – predicted Y
(+)
from line
(-)
line of best fit ~ average (mean) of the plot
sum of Positive errors = sum of negative errors
Linear Regression Line:
ο‚· Describes how… a response variable (Y) changes as an explanatory variable (x) changes
ο‚·
Used to … predict the value for Y for a given value of X
Y = mX + b
Most accurate Regression line:
ο‚· Called: Least Squares Regression Line (LSR line or LSRL)
ο‚·
Definition: minimizes… the (π‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿπ‘ )2 of Y
“y hat” (sample)
ο‚·
Form: 𝑦̂ = π‘π‘œ + 𝑏1 π‘₯
slope
y-intercept
ο‚·
Pieces:
b1 =
𝑠𝑦
π‘Ÿ (𝑠 )
π‘₯
b0 = 𝑦̅ − 𝑏1 π‘₯
ο‚·
always … passes through (π‘₯Μ… , 𝑦̅)
ο‚·
not resistant (to outliers) – outliers affect the line a lot!
ο‚·
on calculator: STAT οƒ  CALC οƒ  #8 LinReg (a+bx) X List, Y List, Y1
Note: You get Y1 by going to VARS οƒ  Y VARS οƒ  Function οƒ  Y1
COMPLETE WORKSHEET 8A
Vocab:
Extrapolation- using x value far outside the range of the data to predict a y value
-inaccurate/unsure of reliability
Coefficient of Determinationο‚· symbol: π‘Ÿ 2
*listed as a percent
ο‚·
calculation:
ο‚·
sentence interpretation: π‘Ÿ 2 % of the change in the y-variable is due to the change in the x-variable (or due to
the LSR line)
is explained by
Example: For airfare data r = 0.795 so π‘Ÿ 2 = 0.632
63.2% of the change in airfare is due to the change in distance flown.
Interpreting the slope of the LSR line:
Sentence: On average, for every increase in 1 x-variable (units), the y-variable tends to increase/decrease by
slope (units).
Example: For airfare data slope = 0.1174 (distance vs. airfare)
On average, for every 1 mile flown, the airfare tends to increase by $0.12.
RESIDUALS
Example #1: Ungroup the group RESID. You will have lists LIST1 and LIST2.
1. Create a scatterplot of LIST1 (x) versus LIST2 (y) below. Describe the scatterplot.
14
12
L
i 10
s 8
t 6
y = -0.5871x + 17.886
R² = 0.5761
4
2
2
0
0
5
10
15
20
25
List 1
Linear, moderately strong, negative, no outliers
2. Find the LSR line, r, and r2 for the data and record it below. Add the LSR line to your plot.
𝑦̂ = 17.886 − 0.587π‘₯
r = -.759
π‘Ÿ 2 = .576
3. Look at your plot. Do you think that the line does a good job of describing the trend of the data?
Yes. The positive errors and negative errors cancel out.
4. Does it hit every point though?
No. That is okay though. It doesn’t have to go through any points.
5. There are obviously errors/residuals between the line and the actual points. Remember, residuals are calculated
by doing actual value – predicted value. The predicted value is from the LSR line. What type of residuals will the
points that fall ABOVE the line have? Positive or negative?
Positive
6. What type of residuals will the points that fall BELOW the line have? Positive or negative?
Negatve
7. Keep looking at your plot. Will all the residuals be the same number? Or will they all be varied?
They will be varied
8. Keep looking at your plot. What can you say about the number of positive residuals as compared to the number
of negative residuals. Will there be about the same amount? Or more positive? More negative?
There should be the same positive as negative.
RESIDUALS
Residuals (errors):
= actual y – predicted y
οƒ₯ residuals ο€½ 0
= 𝑦𝑖 − 𝑦̂𝑖
xresiduals ο€½ 0
Residual Plot:
ο‚· Definition: scatterplot of the x-variable vs. the residuals
ο‚·
Helps… assess the fit of the LSR line
ο‚·
No pattern = scattered = our line is a good model for the data
ο‚·
Pattern = another model (quadratic, exponential, etc.) would be a better fit for the data
ο‚·
On calculator:
o Scatterplot
o X-list: keep the x-variable
o Y-list:𝐿 RESID
Note: You must run the LSR line first!!!(LinReg)
Use another model if you get residual plots like these:
Example #2:
1) Do you think that the LSR line is the best model for our data in the first example (the one with LIST1 and
LIST2)? Or do you think we should use a curved model, like a quadratic regression equation?
2) Let’s create a residual plot for this example. Sketch it here:
3) What does it look like? Scattered? Linear?
4) So is our line a good model for our data?
Example #3:
1) Create a scatterplot of the following data and sketch it below. Describe its form.
x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
y 20 18 18 16 15 14 14 12 11 10 12 13 15 15 16 18 19 20 22 23
25
20
15
y = 0.2068x + 13.879
R² = 0.1133
10
5
0
0
5
10
15
20
25
Strong, Nonlinear, Negative until x= 10 then positive
2) Calculate the LSR line and add it to your plot. Is the line a good model for this data?
Y = 13.879+0.207x
3) Create the residual plot below
Y-Values
10
5
0
-5
0
5
10
15
20
25
-10
4) Using the residual plot, is the line the best model for the data? Why?
The line is not the best model for the data because the residual plot is not scattered.
COMPLETE WORKSHEET 8B & 8C
Chapter 8: Linear Regression Computer Outputs
An insurance company conducts a survey of 15 of its life insurance agents. The average number of minutes spent
with each potential customer and the number of policies sold in a week are noted for each agent.
The following is a printout from the statistical analysis tool on Microsoft Excel.
Regression Statistics
Mul ti pl e R
R Squa re
Adjus ted R Squa re
Sta nda rd Error
Obs erva ti ons
0.883620846
0.780785799
0.763923168
1.311483261
15
Coefficients
-1.73106061
0.549242424
Intercept
Mi nutes
slope
y-int
Standard Error
2.04612023
0.080716215
t Stat
-0.84602
6.80461
P-value
0.4128433
1.25E-05
1. What is the equation of the LSR line relating minutes spent and policies sold.
𝑦̂ = −1.731 + 0.549π‘₯
2
2. What is the value of r? What is the value of r ?
π‘Ÿ = 0.884
π‘Ÿ 2 = 0.781
3. Interpret the slope in the context of the problem
On average, for every increase in one minute spent with each potential customer, 0.549 life insurance
policies tend to be sold each week.
The following is a MINITAB regression printout relating average number of degree-days per month to gas
consumption (in cubic feet).
y-int
Predictor
Constant
Degree-d
Coef
123.24
20.221
S= 43.45
R-sq = 97.8%
slope
StDev
28.60
1.145
T
4.31
17.66
P
0.004
0.000
R-sq(adj) = 97.5%
1. What is the equation of the LSR line relating degree days to gas consumption?
𝑦̂ = 123.24 + 20.221π‘₯
2. What is the value of r? What is the value of r2?
π‘Ÿ = 0.990
π‘Ÿ 2 = 0.978
3. Interpret the slope in the context of the problem?
On average, for every one increase in degree-days per month, gas consumption tends to increase by 20.221
cubic feet.
Download