
WARM-UP
• Do the work on the slip of paper (handout)

HOMEWORK QUESTIONS

SECTION 3.2 LEAST SQUARES REGRESSION LINES

REGRESSION LINE
• A regression line is a straight line that describes how a response variable (y) changes as an explanatory variable (x) changes.
• You can use a regression line to predict the value of y for any value of x by substituting that x into the equation of the line.

INTERPRETING REGRESSION LINES
• $\hat{y} = a + bx$
• $\hat{y}$ (read as “y hat”) is the predicted value of y for a given value of x
• b is the slope, the predicted change in y when x increases by 1 unit
• a is the y-intercept, the predicted value of the response variable when the explanatory variable equals zero (x = 0)

INFLUENTIAL POINT
• An observation is influential if removing it would markedly change the position of the regression line.
• Points that are outliers in the x direction are often influential.

EXTRAPOLATION
• Extrapolation is the use of a regression line for prediction using values of the explanatory variable (x) outside the range of the data from which the line was calculated. Such predictions are often inaccurate and should be avoided.
• See warm-up…
• What if I told you that the x’s were supposed to represent months and that the y’s were supposed to represent lows in temperature? Are your predictions still correct?

RESIDUALS
A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is…
Residual = observed y − predicted y = $y - \hat{y}$

RESIDUAL PLOTS
A residual plot is a scatterplot with the explanatory variable on the x-axis and the residuals on the y-axis. We can use the residual plot to judge whether a linear model is appropriate for the data.

TWO IMPORTANT THINGS
• The residual plot should show no obvious pattern.
• A curved pattern shows that the relationship is not linear. A straight line may not be the best model for such data.
• Increasing (or decreasing) spread about the line as x increases indicates that prediction of y will be less accurate for larger x (or for smaller x).
• The residuals should be relatively small in size. A regression line that fits the data well should come “close” to most of the points. That is, the residuals should be fairly small. How do we decide whether the residuals are “small enough”? We consider the size of a “typical” prediction error.

EXAMPLE – FAT GAIN
Almost all of the residuals are between −0.7 and 0.7. For these individuals, the predicted fat gain from the least-squares line is within 0.7 kg of their actual fat gain during the study. That sounds pretty good. But the subjects gained only between 0.4 kg and 4.2 kg, so a prediction error of 0.7 kg is relatively large compared with the actual fat gain for an individual. The largest residual, 1.64, corresponds to a prediction error of 1.64 kg. This subject's actual fat gain was 3.8 kg, but the regression line predicted a fat gain of only 2.16 kg. That's a pretty large error, especially from the subject's perspective!

SOMETHING UNUSUAL
• Residuals from the least squares regression line have an unusual property: the mean of the residuals is always zero.
• Why does this make sense?

LEAST SQUARES REGRESSION LINE
• The least squares regression line (LSRL) is the straight line that minimizes the sum of the squares of the vertical distances of the observed points from the line.
• Every LSRL goes through the point $(\bar{x}, \bar{y})$
• The LSRL is $\hat{y} = a + bx$ with
• slope $b = r \frac{s_y}{s_x}$ and
• y-intercept $a = \bar{y} - b\bar{x}$

CALCULATING THE LSRL
• The mean and standard deviation of the explanatory variable in this example are $\bar{x} = 324.8$ calories and $s_x = 257.66$ calories. For the 16 people studied, the mean and standard deviation of fat gain are $\bar{y} = 2.388$ kg and $s_y = 1.1389$ kg. The correlation is $r = -0.7786$. Find the equation of the LSRL. Show your work.
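
One way to check the hand calculation is a short script that applies the slope and intercept formulas to the summary statistics above. This is only a sketch: the function name `lsrl_from_summary` is made up for illustration, and the correlation is taken as r = −0.7786, where the negative sign is an assumption of this sketch.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for the line y_hat = a + b*x, using only summary statistics."""
    b = r * s_y / s_x        # slope: b = r * (s_y / s_x)
    a = y_bar - b * x_bar    # intercept: forces the line through (x_bar, y_bar)
    return a, b


# Summary statistics from the fat gain example.
# Assumption: the correlation is negative (r = -0.7786).
a, b = lsrl_from_summary(x_bar=324.8, s_x=257.66, y_bar=2.388, s_y=1.1389, r=-0.7786)

print(f"y_hat = {a:.3f} + ({b:.5f}) x")
```

Under that assumption the slope is negative, so the predicted fat gain falls as x increases, and substituting x = 324.8 into the fitted line returns the mean fat gain of 2.388 kg.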

COEFFICIENT OF DETERMINATION
• The coefficient of determination is the fraction of the variation in one variable that is accounted for by the LSRL on the other variable.
• $r^2$ on your calculator…
• It measures how well the regression explains the response.
• If $r^2 = 0.73$, then 73% of the variation in y is accounted for by the straight-line relationship between x and y.

CAUTION!
• Correlation and regression must be interpreted with caution. Plot the data to be sure that the relationship is roughly linear and to detect outliers. Also, the correlation and the regression line are nonresistant: outliers in x will often greatly influence the regression line.
• Most of all, be careful not to conclude that there is a cause-and-effect relationship between two variables just because they have a strong linear relationship. (Don’t mistake correlation for causation!)

HOMEWORK
• Page 191 (35-42, 44-46)
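
As a supplement to the slides on residuals and the coefficient of determination, here is a minimal sketch that fits an LSRL to a small data set and checks two claims numerically: the residuals average to zero, and the fraction of variation accounted for by the line equals the square of the correlation. The data values are hypothetical, made up purely for illustration, and the fraction of variation is computed as 1 − SSE/SST, a standard formula not shown on the slides.

```python
from math import sqrt

# Hypothetical data, invented for illustration only (not from the fat gain study).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample standard deviations (divide by n - 1).
s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

# Correlation r.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

# LSRL: slope b = r * s_y / s_x, intercept a = y_bar - b * x_bar.
b = r * s_y / s_x
a = y_bar - b * x_bar

# Residual = observed y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / n

# Fraction of the variation in y accounted for by the line.
sse = sum(e ** 2 for e in residuals)        # variation left over after the fit
sst = sum((y - y_bar) ** 2 for y in ys)     # total variation in y
r_squared = 1 - sse / sst

print(f"mean residual = {mean_residual:.2e}")
print(f"r^2 from r    = {r ** 2:.4f}")
print(f"1 - SSE/SST   = {r_squared:.4f}")
```

The two printed values of r² should agree, and the mean residual should be zero up to floating-point rounding.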