AP Statistics- Chapter 8 Line of Best Fit Errors = residuals = actual Y – predicted Y (+) from line (-) line of best fit ~ average (mean) of the plot sum of Positive errors = sum of negative errors Linear Regression Line: ο· Describes how… a response variable (Y) changes as an explanatory variable (x) changes ο· Used to … predict the value for Y for a given value of X Y = mX + b Most accurate Regression line: ο· Called: Least Squares Regression Line (LSR line or LSRL) ο· Definition: minimizes… the (ππππππ )2 of Y “y hat” (sample) ο· Form: π¦Μ = ππ + π1 π₯ slope y-intercept ο· Pieces: b1 = π π¦ π (π ) π₯ b0 = π¦Μ − π1 π₯ ο· always … passes through (π₯Μ , π¦Μ ) ο· not resistant (to outliers) – outliers affect the line a lot! ο· on calculator: STAT ο CALC ο #8 LinReg (a+bx) X List, Y List, Y1 Note: You get Y1 by going to VARS ο Y VARS ο Function ο Y1 COMPLETE WORKSHEET 8A Vocab: Extrapolation- using x value far outside the range of the data to predict a y value -inaccurate/unsure of reliability Coefficient of Determinationο· symbol: π 2 *listed as a percent ο· calculation: ο· sentence interpretation: π 2 % of the change in the y-variable is due to the change in the x-variable (or due to the LSR line) is explained by Example: For airfare data r = 0.795 so π 2 = 0.632 63.2% of the change in airfare is due to the change in distance flown. Interpreting the slope of the LSR line: Sentence: On average, for every increase in 1 x-variable (units), the y-variable tends to increase/decrease by slope (units). Example: For airfare data slope = 0.1174 (distance vs. airfare) On average, for every 1 mile flown, the airfare tends to increase by $0.12. RESIDUALS Example #1: Ungroup the group RESID. You will have lists LIST1 and LIST2. 1. Create a scatterplot of LIST1 (x) versus LIST2 (y) below. Describe the scatterplot. 14 12 L i 10 s 8 t 6 y = -0.5871x + 17.886 R² = 0.5761 4 2 2 0 0 5 10 15 20 25 List 1 Linear, moderately strong, negative, no outliers 2. Find the LSR line, r, and r2 for the data and record it below. Add the LSR line to your plot. π¦Μ = 17.886 − 0.587π₯ r = -.759 π 2 = .576 3. Look at your plot. Do you think that the line does a good job of describing the trend of the data? Yes. The positive errors and negative errors cancel out. 4. Does it hit every point though? No. That is okay though. It doesn’t have to go through any points. 5. There are obviously errors/residuals between the line and the actual points. Remember, residuals are calculated by doing actual value – predicted value. The predicted value is from the LSR line. What type of residuals will the points that fall ABOVE the line have? Positive or negative? Positive 6. What type of residuals will the points that fall BELOW the line have? Positive or negative? Negatve 7. Keep looking at your plot. Will all the residuals be the same number? Or will they all be varied? They will be varied 8. Keep looking at your plot. What can you say about the number of positive residuals as compared to the number of negative residuals. Will there be about the same amount? Or more positive? More negative? There should be the same positive as negative. RESIDUALS Residuals (errors): = actual y – predicted y ο₯ residuals ο½ 0 = π¦π − π¦Μπ xresiduals ο½ 0 Residual Plot: ο· Definition: scatterplot of the x-variable vs. the residuals ο· Helps… assess the fit of the LSR line ο· No pattern = scattered = our line is a good model for the data ο· Pattern = another model (quadratic, exponential, etc.) would be a better fit for the data ο· On calculator: o Scatterplot o X-list: keep the x-variable o Y-list:πΏ RESID Note: You must run the LSR line first!!!(LinReg) Use another model if you get residual plots like these: Example #2: 1) Do you think that the LSR line is the best model for our data in the first example (the one with LIST1 and LIST2)? Or do you think we should use a curved model, like a quadratic regression equation? 2) Let’s create a residual plot for this example. Sketch it here: 3) What does it look like? Scattered? Linear? 4) So is our line a good model for our data? Example #3: 1) Create a scatterplot of the following data and sketch it below. Describe its form. x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 y 20 18 18 16 15 14 14 12 11 10 12 13 15 15 16 18 19 20 22 23 25 20 15 y = 0.2068x + 13.879 R² = 0.1133 10 5 0 0 5 10 15 20 25 Strong, Nonlinear, Negative until x= 10 then positive 2) Calculate the LSR line and add it to your plot. Is the line a good model for this data? Y = 13.879+0.207x 3) Create the residual plot below Y-Values 10 5 0 -5 0 5 10 15 20 25 -10 4) Using the residual plot, is the line the best model for the data? Why? The line is not the best model for the data because the residual plot is not scattered. COMPLETE WORKSHEET 8B & 8C Chapter 8: Linear Regression Computer Outputs An insurance company conducts a survey of 15 of its life insurance agents. The average number of minutes spent with each potential customer and the number of policies sold in a week are noted for each agent. The following is a printout from the statistical analysis tool on Microsoft Excel. Regression Statistics Mul ti pl e R R Squa re Adjus ted R Squa re Sta nda rd Error Obs erva ti ons 0.883620846 0.780785799 0.763923168 1.311483261 15 Coefficients -1.73106061 0.549242424 Intercept Mi nutes slope y-int Standard Error 2.04612023 0.080716215 t Stat -0.84602 6.80461 P-value 0.4128433 1.25E-05 1. What is the equation of the LSR line relating minutes spent and policies sold. π¦Μ = −1.731 + 0.549π₯ 2 2. What is the value of r? What is the value of r ? π = 0.884 π 2 = 0.781 3. Interpret the slope in the context of the problem On average, for every increase in one minute spent with each potential customer, 0.549 life insurance policies tend to be sold each week. The following is a MINITAB regression printout relating average number of degree-days per month to gas consumption (in cubic feet). y-int Predictor Constant Degree-d Coef 123.24 20.221 S= 43.45 R-sq = 97.8% slope StDev 28.60 1.145 T 4.31 17.66 P 0.004 0.000 R-sq(adj) = 97.5% 1. What is the equation of the LSR line relating degree days to gas consumption? π¦Μ = 123.24 + 20.221π₯ 2. What is the value of r? What is the value of r2? π = 0.990 π 2 = 0.978 3. Interpret the slope in the context of the problem? On average, for every one increase in degree-days per month, gas consumption tends to increase by 20.221 cubic feet.