Least-Squares Regression Line

Different regression lines produce different residuals. The regression line we want is the one that minimizes the sum of the squared residuals.

Definition:
The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.
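To make "as small as possible" concrete, here is a minimal Python sketch (not from the slides; the tiny dataset is made up purely for illustration) showing that the least-squares line beats nearby candidate lines on the sum of squared residuals:

```python
import numpy as np

# A small made-up dataset, just to illustrate the idea.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, 1)  # least-squares slope and intercept

def sse(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

print(f"LSRL slope {b:.3f}: SSE = {sse(a, b):.4f}")
for db in (-0.1, 0.1):      # nudging the slope always increases the SSE
    print(f"slope {b + db:.3f}: SSE = {sse(a, b + db):.4f}")
```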
Least-Squares Regression Line

We can use technology to find the equation of the least-squares regression line. We can also write it in terms of the means and standard deviations of the two variables and their correlation.

Definition: Equation of the least-squares regression line
We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means and standard deviations of the two variables and their correlation. The least-squares regression line is the line ŷ = a + bx with

slope $b = r \dfrac{s_y}{s_x}$

and y intercept $a = \bar{y} - b\bar{x}$
The number of miles (in thousands) for the 11 used Hondas has a mean of 50.5 and a standard deviation of 19.3. The asking prices had a mean of $14,425 and a standard deviation of $1,899. The correlation for these variables is r = -0.874. Find the equation of the least-squares regression line and explain what change in price we would expect for each additional 19.3 thousand miles.

To find the slope: $b = -0.874 \cdot \dfrac{1899}{19.3} = -86.0$

To find the intercept: use $\bar{y} = a + b\bar{x}$, so $14425 = a - 86(50.5)$, giving $a = 18{,}768$.

For each additional 19.3 thousand miles, we expect the price to change by $r \cdot s_y = (-0.874)(1899) = -1660$. So, the CR-V will cost about $1,660 less for each additional 19.3 thousand miles.

Note: Due to roundoff error, values for the slope and y intercept are slightly different from the values on page 166.
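As a quick arithmetic check (not part of the original slides), here is a minimal Python sketch that plugs the Hondas summary statistics into the slope and intercept formulas above; the variable names are just illustrative.

```python
# Minimal sketch: slope and intercept from summary statistics
# (used-Hondas numbers from the example above).
r = -0.874                   # correlation between miles and price
s_x, s_y = 19.3, 1899        # std. devs: miles (thousands), price ($)
x_bar, y_bar = 50.5, 14425   # means: miles (thousands), price ($)

b = r * s_y / s_x            # slope: b = r * (s_y / s_x)  ->  about -86.0
a = y_bar - b * x_bar        # intercept: a = y-bar - b * x-bar  ->  about 18768
print(f"y-hat = {a:.0f} + ({b:.1f})x")
```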
Least-Squares Regression Line

Find the least-squares regression equation for the following data using your calculator.

Body Weight (lb):      120  187  109  103  131  165  158  116
Backpack Weight (lb):   26   30   26   24   29   35   31   28

You should get ŷ = 16.3 + 0.0908x.
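If you want to verify the calculator result another way, here is a minimal sketch (one possible approach, assuming NumPy is available) that fits the same data with np.polyfit.

```python
import numpy as np

# Body and backpack weights from the table above.
body = np.array([120, 187, 109, 103, 131, 165, 158, 116], dtype=float)
pack = np.array([26, 30, 26, 24, 29, 35, 31, 28], dtype=float)

# np.polyfit with degree 1 performs least-squares regression;
# it returns the slope first, then the intercept.
b, a = np.polyfit(body, pack, 1)
print(f"y-hat = {a:.1f} + {b:.4f}x")  # y-hat = 16.3 + 0.0908x
```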
Residual Plots

One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between two variables. We see departures from this pattern by looking at the residuals.

Definition:
A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.

[Figure: example residual plots. A pattern in the residuals means a linear model is not appropriate.]
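As a sketch of how such a plot might be made (not part of the slides, assuming NumPy and matplotlib are available), here is one way to draw a residual plot for the backpack data from the earlier exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

body = np.array([120, 187, 109, 103, 131, 165, 158, 116], dtype=float)
pack = np.array([26, 30, 26, 24, 29, 35, 31, 28], dtype=float)

# Fit the LSRL, then plot residuals (observed - predicted) against x.
b, a = np.polyfit(body, pack, 1)
residuals = pack - (a + b * body)

plt.scatter(body, residuals)
plt.axhline(0, linestyle="--")   # a good fit scatters evenly around this line
plt.xlabel("Body weight (lb)")
plt.ylabel("Residual (lb)")
plt.title("Residual plot for the backpack data")
plt.show()
```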
Interpreting Residual Plots

A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns.

1) The residual plot should show no obvious patterns.
2) The residuals should be relatively small in size.

Definition:
If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by

$$ s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} $$
For the used Hondas data, the standard deviation is
$$ s = \sqrt{\frac{8{,}499{,}851}{11-2}} = \$972. $$
So, when we use number of miles to estimate the asking price, we will be off by an average of $972.
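The same calculation is easy to reproduce; here is a minimal sketch using the SSE and n reported on the slide (math.sqrt would work equally well):

```python
import numpy as np

sse = 8_499_851          # sum of squared residuals from the slide
n = 11                   # number of used Hondas
s = np.sqrt(sse / (n - 2))
print(f"s = ${s:.0f}")   # s = $972
```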
Residual Plots

Here are the scatterplot and the residual plot for the used Hondas data:

[Figure: scatterplot of asking price vs. miles driven, and the corresponding residual plot, for the used Hondas data]
The Role of r² in Regression

The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y.

Definition:
The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the following formula:

$$ r^2 = 1 - \frac{SSE}{SST} $$

where $SSE = \sum \text{residual}^2$ and $SST = \sum (y_i - \bar{y})^2$.
The Role of r² in Regression

r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean ȳ for each value in the dataset. Consider the example on page 179. If we needed to predict a backpack weight for a new hiker, but didn't know each hiker's body weight, we could use the average backpack weight as our prediction.

If we use the mean backpack weight as our prediction, the sum of the squared residuals is SST = 83.87. Then SSE/SST = 30.90/83.87 = 0.368, so 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line.
If we use the LSRL to make our predictions, the sum of the squared residuals is SSE = 30.90. Then r² = 1 − SSE/SST = 1 − 30.90/83.87 = 0.632, so 63.2% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight.
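As a closing check (not part of the slides, assuming NumPy), here is a minimal sketch that reproduces SSE, SST, and r² for the backpack data directly from the definitions:

```python
import numpy as np

body = np.array([120, 187, 109, 103, 131, 165, 158, 116], dtype=float)
pack = np.array([26, 30, 26, 24, 29, 35, 31, 28], dtype=float)

b, a = np.polyfit(body, pack, 1)
sse = np.sum((pack - (a + b * body)) ** 2)  # errors from the LSRL
sst = np.sum((pack - pack.mean()) ** 2)     # errors from guessing y-bar
r_squared = 1 - sse / sst

# Matches the slide's SSE = 30.90, SST = 83.87, r^2 = 0.632 up to rounding.
print(f"SSE = {sse:.2f}, SST = {sst:.2f}, r^2 = {r_squared:.3f}")
```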