9.2 Linear Regression
• Key Concepts:
  – Residuals
  – Least Squares Criterion
  – Regression Line
  – Using a Regression Equation to Make Predictions
• In this section, we will assume a significant linear correlation exists between the two variables of interest.
  – We would like to find the equation of the line that best fits the data. How do we decide whether one line is a better fit than another?
• We start by measuring the residuals (the amounts of error). Each residual is the difference between the actual observed value and the value our proposed line predicts for a given value of x: $d = y - \hat{y}$ (see p. 486).
• A positive residual tells us our line underestimated the observed value. A negative residual tells us our line overestimated the observed value. If the residual = 0, we have an exact match.
• A natural next step is to add all the residuals together.
  – We might expect the line that best fits the data to have the smallest sum of residuals.
• Unfortunately, positive and negative residuals cancel each other. Any line that passes through the point $(\bar{x}, \bar{y})$ has residuals that sum to zero, so the plain sum cannot tell a good fit from a poor one. We can avoid this problem by squaring the residuals and then adding.
  – According to the Least Squares Criterion, the line of best fit (called the regression line) has the smallest sum of squared residuals, $\sum d^2$.
• The equation of a regression line for an independent variable x and a dependent variable y is:
  $\hat{y} = mx + b$
  where $\hat{y}$ is the predicted y-value for a given x-value.
  – We use the Least Squares Criterion and calculus to derive the slope m and y-intercept b of the regression line.
• The slope m and y-intercept b of a regression line are given by:
  $m = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$
  $b = \bar{y} - m\bar{x} = \dfrac{\sum y}{n} - m\,\dfrac{\sum x}{n}$
  where n is the number of data pairs.
• It is best to use a regression line to make predictions for x-values over (or close to) the range of the original data.
  Extrapolation (using a regression line to make predictions for x-values well beyond the range of the original data) can lead to highly inaccurate results.
• Practice building a regression equation and using it to make predictions:
  – #20 p. 491 (Wins and Earned Run Averages)
  – #24 p. 492 (High-Fiber Cereals: Caloric and Sugar Content)
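
The steps in this section can be sketched in a few lines of Python. This is only an illustration, not the textbook's method of presentation: the data set is made up (it is not taken from the exercises above), the function names are hypothetical, and a residual is taken to be the observed y-value minus the predicted y-value.

```python
def fit_line(xs, ys):
    """Return slope m and y-intercept b of the least-squares regression line."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope and intercept formulas from this section
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b


def predict(x, m, b):
    """Predicted y-value (y-hat) for a given x-value."""
    return m * x + b


def residuals(xs, ys, m, b):
    """Residuals, assuming residual = observed y - predicted y."""
    return [y - predict(x, m, b) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, m, b):
    """The quantity the Least Squares Criterion minimizes."""
    return sum(d * d for d in residuals(xs, ys, m, b))


def is_extrapolation(x, xs):
    """True if x lies outside the range of the original x-values."""
    return x < min(xs) or x > max(xs)


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
ssr_best = sum_squared_residuals(xs, ys, m, b)

# A competing line that also passes through (x-bar, y-bar) = (3, 4)
ssr_other = sum_squared_residuals(xs, ys, 0.5, 2.5)

print("slope:", round(m, 4), "intercept:", round(b, 4))
print("sum of residuals (regression line):", round(sum(residuals(xs, ys, m, b)), 4))
print("sum of residuals (competing line): ", round(sum(residuals(xs, ys, 0.5, 2.5)), 4))
print("sum of squared residuals:", round(ssr_best, 4), "vs", round(ssr_other, 4))
print("x = 10 is extrapolation:", is_extrapolation(10, xs))
```

For this data both lines have residuals that sum to (essentially) zero, so the plain sum cannot rank them. The sums of squared residuals differ, and the regression line has the smaller one. The `is_extrapolation` check is one simple way to flag an x-value that falls outside the range of the original data before trusting a prediction.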