AP Statistics Section 3.2 A: Regression Lines

Linear relationships between two quantitative variables are quite common. Correlation measures the direction and strength of these relationships. Just as we drew a density curve to model the data in a histogram, we can summarize the overall pattern in a linear relationship by drawing a regression line on the scatterplot. Note that regression requires that we have an explanatory variable and a response variable. A regression line is often used to predict the value of y for a given value of x.

Who: 16 healthy young adults
What: Explanatory: change in NEA (cal). Response: fat gain (kg)
Why: Do changes in NEA explain weight gain?
When, where, how and by whom? The data come from a controlled experiment in which subjects were forced to overeat for an 8-week period. Results of the study were published in Science magazine in 1999.

[Scatterplot: fat gain (kg), 0 to 8, on the vertical axis versus NEA change (calories), -100 to 700, on the horizontal axis]

Numerical summary: The correlation between NEA change and fat gain is r = -0.7786.

A least-squares regression line relating y to x has an equation of the form $\hat{y} = a + bx$. In this equation, b is the slope and a is the y-intercept.

This formula gives the value of b: $b = r \dfrac{s_y}{s_x}$

Once you have computed b, you can find the value of a using this equation: $a = \bar{y} - b\bar{x}$

We can also find these values on the TI-83/84, the same way we found r earlier.

For this example, the least-squares line (LSL) is $\hat{y} = 3.505 - 0.0034x$, or FatGain (kg) = 3.505 - 0.0034 (NEA change (cal)).

Interpreting b: The slope b is the predicted rate of change in the response variable y as the explanatory variable x changes. The slope b = -0.0034 tells us that predicted fat gain goes down by 0.0034 kg for each additional calorie of NEA.
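The slope and intercept formulas above, b = r(s_y/s_x) and a = ȳ - b·x̄, can be sketched by hand in Python. This is only an illustration: the function name `least_squares_line` and the five data points are made up for the example and are not the NEA study data.

```python
import math

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Correlation r: sum of products of standardized values, divided by n - 1
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

# HYPOTHETICAL data for illustration only (not the values from the study)
xs = [0, 100, 200, 300, 400]       # explanatory variable
ys = [3.6, 3.1, 2.9, 2.4, 2.2]     # response variable

a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.3f} + ({b:.4f})x")
```

Because a is computed from the means, the fitted line always passes through the point (x̄, ȳ).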
You cannot say how important a relationship is by looking at how big the regression slope is, because the size of the slope depends on the units in which the variables are measured.

Interpreting a: The y-intercept a = 3.505 kg is the fat gain estimated by the model if NEA does not change when a person overeats.

Model: Using the equation above, draw the LSL on your scatterplot. To plot the slope as rise over run: 0.0034 = 34/10000 = 0.34/100 = 1.7/500, so the line drops 1.7 kg for every 500 calories of NEA.

[Scatterplot: fat gain (kg), 0 to 8, on the vertical axis versus NEA change (calories), -100 to 700, on the horizontal axis]

TI-83/84: 8:LinReg(a+bx) L1, L2, Y1. To enter Y1: VARS, Y-VARS, 1:Function, 1:Y1. Press ENTER, then GRAPH.

Prediction: Predict the fat gain for an individual whose NEA increases by 400 cal by:
(a) using the graph: about 2.2 kg
(b) using the equation: $\hat{y} = 3.505 - 0.0034(400) = 2.145$ kg

Predict the fat gain for an individual whose NEA increases by 1500 cal.
$\hat{y} = 3.505 - 0.0034(1500) = -1.595$ kg
So we are predicting that this individual loses fat when he/she overeats. What went wrong? 1500 is way outside the range of NEA values in our data.

Extrapolation is the use of a regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.
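The two predictions can be checked with a short Python sketch that also flags extrapolation. The intercept and slope come from the fitted line in these notes. The bounds `X_MIN` and `X_MAX` are an assumption: they are taken from the plotted axis (-100 to 700 cal), because the notes do not list the actual smallest and largest NEA values. The function name `predict_fat_gain` is made up for the example.

```python
A = 3.505     # y-intercept in kg, from the fitted line in the notes
B = -0.0034   # slope in kg per calorie of NEA change

# ASSUMPTION: range read from the plot axis, not the true min/max of the data
X_MIN, X_MAX = -100, 700

def predict_fat_gain(nea_change):
    """Return (predicted fat gain in kg, True if the prediction is an extrapolation)."""
    y_hat = A + B * nea_change
    is_extrapolation = not (X_MIN <= nea_change <= X_MAX)
    return y_hat, is_extrapolation

for x in (400, 1500):
    y_hat, extrapolated = predict_fat_gain(x)
    note = " (extrapolation, do not trust)" if extrapolated else ""
    print(f"NEA change {x} cal: predicted fat gain {y_hat:.3f} kg{note}")
```

The guard only checks the x-value against the assumed range. It says nothing about how accurate a prediction inside that range is.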