ST 361: Ch3.3 Simple Linear Regression

Topics:
a) Definition
b) Finding the regression line: method of least squares
c) Deviation between regression line and data
d) Statistical inference
------------------------------------------------------------------------
(a) Simple Linear Regression: ___________________________________________________

Ex1. Speed of cars and the distances taken to stop.
X (________________ variable; explanatory variable) = speed (mph)
Y (________________ variable; response variable) = distance (feet)

obs            1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
Speed (x_i)    4   4   7   7   8   9  10  10  10  11  11  12  12  12  12  13  13
Dist (y_i)     2  10   4  22  16  10  18  26  34  17  28  14  20  24  28  26  34

obs           18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
Speed (x_i)   13  13  14  14  14  14  15  15  15  16  16  17  17  17  18  18  18
Dist (y_i)    34  46  26  36  60  80  20  26  54  32  40  32  40  50  42  56  76

obs           35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50
Speed (x_i)   18  19  19  19  20  20  20  20  20  22  23  24  24  24  24  25
Dist (y_i)    84  36  46  68  32  48  52  56  64  66  54  70  92  93 120  85

A (sample) regression line has the form ____________________________ that best describes the ______________ relationship between Y and X as displayed in the scatter plot.
a : ____________ Usually not of interest
b : _____________ slope = 0 implies __________________________________________

Ex1. The sample regression line is as shown below:
Y = -17.1 + 3.9X

The (sample) regression line Y = a + bX can be used to
(1) Describe the (linear) relationship between X and Y
How? To report that when X increases by 1 unit, Y increases/decreases _____ units.
E.g., in the speed-distance example (Ex1),

(2) Predict Y using X
How?
If we know someone with X = $x^*$, then we can predict the corresponding Y by
$\hat{Y} = a + b x^*$
E.g., in the speed-distance example (Ex1), for a car with speed X = 15 mph, we can predict the required stopping distance using

Interpolation: if $x^*$ is inside the range of the observed $x_i$ values
E.g., predict Y for $x^*$ within ___ to ___ mph in the speed-distance example
Extrapolation: if $x^*$ is outside the range of the observed $x_i$ values. Extrapolating too far can be dangerous, because the data tell us nothing about the relationship between X and Y beyond the observed range.

Population regression line vs. sample regression line
The true underlying relationship between X and Y is
$Y = \alpha + \beta X$ (population regression line)
From sample data we obtain the estimates of $\alpha$ and $\beta$, and get
$Y = a + bX$ (sample regression line)

(b) Finding the regression line: Method of Least Squares
Thoughts:
Each sample point in the scatter plot can be presented as ($x_i$, $y_i$).
Residual (denoted by _____) = the difference between the observed $y_i$ and the predicted $\hat{y}_i$
The best regression line Y = a + bX is the line that makes the $e_i$'s as small as possible. That is, we find the line by finding the a and b that minimize
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$

The solutions of the least squares method:

Comments:
Swapping the labels of X and Y will/will not (choose one) change the value of b
The regression line always goes through the point ($\bar{x}$, $\bar{y}$)

Regression coefficient b vs. correlation coefficient r
(1) b and r have _____________ sign
(2) the values of b and r are _______________
    b measures the amount of change in Y when X increases by 1 unit
    r measures how far the dots are from the line (see (c) below)
(3) Labels of variables X and Y
    b is/is not sensitive to the labels of the X variable and the Y variable, while r is/is not
(4) Change of units on X or Y
    b is/is not sensitive to a unit change (on either X or Y), while r is/is not

Ex1 (continued). In the speed-distance example,
$\sum_i x_i = 770$, $\sum_i y_i = 2149$, $s_x = 5.3$, $s_y = 25.8$, $\sum_i x_i y_i = 38482$.
Find (1) the regression line Y = a + bX, and (2) the correlation coefficient r.

Ex2. Tips Y vs. Bill X (in dollars)
X    20    40    60    80
Y   2.4     8    10    22
(a) Draw a scatterplot. Is the relationship between X and Y linear?
(b) Determine the regression line using the least squares method.
$\bar{x} = 50$, $\bar{y} = 10.6$, $s_x = 25.8$, $s_y = 8.3$, $\sum_{i=1}^{4} x_i y_i = 2728$
(c) Calculate the sample correlation coefficient r.
(d) What is the expected tip when the bill is 36 dollars? Is it a reasonable prediction?
(e) What is the expected tip when the bill is 6 dollars? Is it a reasonable prediction?
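
A minimal sketch of how the Ex2 answers could be checked numerically. The handout leaves the least squares solutions blank, so this sketch assumes the standard ones: b = S_xy / S_xx, a = y-bar - b * x-bar, and r = S_xy / sqrt(S_xx * S_yy), where S_xx, S_yy and S_xy are the sums of squared and cross deviations from the means. The function names least_squares and predict are hypothetical and chosen only for this illustration.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (a, b, r) for the sample regression line y = a + b*x.

    Assumes the standard least squares solutions (not spelled out in the
    handout): b = S_xy / S_xx and a = y_bar - b * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross deviations from the means
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                 # slope
    a = y_bar - b * x_bar           # intercept; line passes through (x_bar, y_bar)
    r = s_xy / sqrt(s_xx * s_yy)    # sample correlation coefficient
    return a, b, r


def predict(a, b, x_star):
    """Predicted Y at X = x_star from the fitted line."""
    return a + b * x_star


# Ex2 data: bill X and tip Y, both in dollars
bills = [20, 40, 60, 80]
tips = [2.4, 8, 10, 22]

a, b, r = least_squares(bills, tips)
print(f"regression line: Y = {a:.3f} + {b:.3f} X")
print(f"correlation r = {r:.3f}")
print(f"predicted tip at a 36-dollar bill: {predict(a, b, 36):.3f}")  # interpolation
print(f"predicted tip at a  6-dollar bill: {predict(a, b, 6):.3f}")   # extrapolation
```

With the tip data the slope comes out at about 0.304 and the intercept at about -4.6. The predicted tip for a 6-dollar bill is therefore negative, which shows why extrapolating below the smallest observed bill (20 dollars) is not reasonable.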