9.2 Linear Regression
• Key Concepts:
– Residuals
– Least Squares Criterion
– Regression Line
– Using a Regression Equation to Make
Predictions
• In this section, we will assume a significant
linear correlation exists between the two
variables of interest.
– We would like to find the equation of the line that best
fits the data. How do we decide whether one line is a
better fit than another?
• We start by measuring the residuals (the amount of error).
Each residual is the difference between the actual observed
value and the value our proposed line predicts for a given
value of x: residual = observed y - predicted y (see p. 486).
• A positive residual tells us our line under-estimated the
observed value. A negative residual tells us our line
over-estimated the observed value. If the residual = 0, we have
an exact match.
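To make the sign of a residual concrete, here is a minimal sketch. The data points and the proposed line are hypothetical (they are not from the text), and the residual is taken as observed minus predicted.

```python
# Hypothetical data and a hypothetical proposed line (neither is from the text).
xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.0, 9.5]
m, b = 2.0, 1.0  # proposed line: y-hat = 2x + 1

# Residual = observed y minus predicted y.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

for x, y, d in zip(xs, ys, residuals):
    if d > 0:
        verdict = "line under-estimated"
    elif d < 0:
        verdict = "line over-estimated"
    else:
        verdict = "exact match"
    print(f"x={x}: observed={y}, predicted={m * x + b}, residual={d} ({verdict})")
```

With these made-up numbers the residuals are 0.5, -0.5, 0.0 and 0.5, so the line falls below two points, above one, and passes exactly through one.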
• A natural next step is to add all the residuals together.
– We might expect the line that best fits the data to
have the smallest sum of residuals.
• Unfortunately, positive and negative residuals cancel. Any
line through the point of means $(\bar{x}, \bar{y})$ has
residuals that sum to zero, however poorly it fits. We can
avoid this problem by squaring the residuals and then adding.
– According to the Least Squares Criterion, the line of
best fit (called the regression line) will have the
smallest sum of squared residuals.
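A short sketch of why the residuals are squared before adding. The data and both candidate lines are hypothetical, chosen so that each line passes through the point of means (2.5, 5).

```python
# Hypothetical data; both candidate lines pass through the point of means (2.5, 5).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

def residuals(m, b):
    """Observed y minus predicted y for the line y-hat = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

flat = residuals(0, 5)    # y-hat = 5 (ignores x entirely)
steep = residuals(2, 0)   # y-hat = 2x

# Raw sums cancel to zero for both lines, so they cannot rank the fits.
raw_flat, raw_steep = sum(flat), sum(steep)

# Sums of squared residuals do distinguish them.
sse_flat = sum(d ** 2 for d in flat)
sse_steep = sum(d ** 2 for d in steep)

print(raw_flat, raw_steep)   # 0 0
print(sse_flat, sse_steep)   # 26 2
```

Both raw sums are zero, yet the sum of squared residuals is 26 for the flat line and 2 for the steeper one, so the Least Squares Criterion prefers the steeper line.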
• The equation of a regression line for an
independent variable x and a dependent
variable y is:
y  mx  b
where y is the predicted y-value for a given x-value.
– We use the Least Squares Criterion and calculus to
derive the slope m and y-intercept b of the regression
line.
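For readers who want to see the calculus step, the following is a sketch of the standard derivation, not taken from the slides. The symbol $S$ for the sum of squared residuals and the subscript $i$ on the data points are notation introduced here.

```latex
% Sum of squared residuals for a candidate line \hat{y} = mx + b
S(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

% Set both partial derivatives to zero
\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - m x_i - b \bigr) = 0
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \bigl( y_i - m x_i - b \bigr) = 0

% Rearranged: two linear equations in the unknowns m and b
m \sum x^2 + b \sum x = \sum xy
\qquad
m \sum x + n b = \sum y
```

Solving this pair of equations for m and b yields the slope and y-intercept of the regression line.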
• The slope m and y-intercept b of a regression
line are given by:
$$m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
$$b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}$$
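The slope and intercept formulas can be evaluated directly from a handful of sums. The sketch below assumes plain Python lists of equal length. The function name `regression_line` and the sample data are hypothetical and not from the text.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b.

    Assumes xs and ys have the same length and the x-values are not
    all identical (otherwise the denominator below is zero).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # m = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b = y-bar - m * x-bar
    b = sum_y / n - m * sum_x / n
    return m, b

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = regression_line(xs, ys)
print(f"y-hat = {m:.2f}x + {b:.2f}")  # y-hat = 0.60x + 2.20
```

For this made-up data set, n = 5, Σx = 15, Σy = 20, Σxy = 66 and Σx² = 55, so m = 30/50 = 0.6 and b = 4 - 0.6(3) = 2.2.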
• It is best to use a regression line to make
predictions for x-values over (or close to) the
range of the original data. Extrapolation (using a
regression line to make predictions for x-values
well beyond the range of the original data) can
lead to highly inaccurate results.
• Practice building a regression equation and
using it to make predictions:
#20 p. 491 (Wins and Earned Run Averages)
#24 p. 492 (High-Fiber Cereals: Caloric and Sugar Content)
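When working such exercises, one way to respect the caution about extrapolation is to check each x-value against the range of the original data before trusting the prediction. This is a minimal sketch: the function name, the `tolerance` parameter, and the fitted line ŷ = 0.6x + 2.2 with data range 1 to 5 are all hypothetical.

```python
def predict_with_range_check(x, m, b, x_min, x_max, tolerance=0.0):
    """Return (prediction, in_range) for the line y-hat = m*x + b.

    in_range is False when x lies outside [x_min - tolerance, x_max + tolerance],
    which signals extrapolation beyond the original data.
    """
    in_range = (x_min - tolerance) <= x <= (x_max + tolerance)
    return m * x + b, in_range

# Hypothetical regression line and data range, for illustration only.
m, b = 0.6, 2.2
x_min, x_max = 1, 5

for x in (3, 50):
    y_hat, ok = predict_with_range_check(x, m, b, x_min, x_max)
    note = "within the data range" if ok else "extrapolation, treat with caution"
    print(f"x={x}: y-hat={y_hat:.1f} ({note})")
```

How far "close to" the range extends is a judgment call, which is why the tolerance is left as a parameter rather than fixed.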