WARM-UP
• Do the work on the slip of paper (handout)
HOMEWORK QUESTIONS
SECTION 3.2
LEAST SQUARES REGRESSION LINES
REGRESSION LINE
• A regression line is a straight line that describes how
a response variable (y) changes as an explanatory
variable (x) changes.
• You can use a regression line to predict the value of
y for any value of x by substituting this x into the
equation of the line.
INTERPRETING REGRESSION LINES
• $\hat{y} = a + bx$
• $\hat{y}$ (read as “y hat”) is the predicted value of y
for a given value of x
• b is the slope, the predicted change in y when x
increases by 1 unit
• a is the y-intercept, the predicted value of the response
variable when the explanatory variable equals zero (x = 0)
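To see what the slope and intercept mean numerically, here is a minimal sketch of a line $\hat{y} = a + bx$. The values a = 2.0 and b = 0.5 and the helper name `predict` are hypothetical, chosen only for illustration. Raising x by 1 changes the prediction by exactly b, and the prediction at x = 0 is a.

```python
# Hypothetical line for illustration only: a = 2.0, b = 0.5
A = 2.0  # y-intercept (hypothetical)
B = 0.5  # slope (hypothetical)


def predict(x: float, a: float = A, b: float = B) -> float:
    """Return y-hat = a + b*x for the regression line."""
    return a + b * x


# The slope is the change in y-hat when x increases by 1 unit.
change = predict(11) - predict(10)
# The intercept is the prediction at x = 0.
at_zero = predict(0)

print(f"y-hat(10) = {predict(10)}")        # 2.0 + 0.5*10 = 7.0
print(f"change per unit of x = {change}")  # 0.5
print(f"y-hat(0) = {at_zero}")             # 2.0
```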
INFLUENTIAL POINT
• An observation is influential if removing it would
markedly change the position of the regression line.
• Points that are outliers in the x direction are often
influential.
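A small sketch of why an outlier in the x direction can be influential. The data are hypothetical: four points lie exactly on y = x + 1, and one extra point sits far to the right at (20, 2). The helper `lsrl` computes the least-squares slope as $S_{xy}/S_{xx}$ and the intercept as $\bar{y} - b\bar{x}$. Removing that single point moves the slope from about −0.064 to 1.

```python
def lsrl(xs, ys):
    """Least-squares intercept and slope: b = Sxy / Sxx, a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: four points on the line y = x + 1, plus one point far out in x.
xs = [1, 2, 3, 4, 20]
ys = [2, 3, 4, 5, 2]

a_all, b_all = lsrl(xs, ys)                      # fit using all five points
a_without, b_without = lsrl(xs[:-1], ys[:-1])    # fit without (20, 2)

print(f"with (20, 2):    slope = {b_all:.3f}, intercept = {a_all:.3f}")
print(f"without (20, 2): slope = {b_without:.3f}, intercept = {a_without:.3f}")
```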
EXTRAPOLATION
• Extrapolation is the use of a regression line to make
predictions for values of the explanatory variable (x)
outside the range of the data from which the line was
calculated. Avoid it: such predictions are often
unreliable.
• See warm-up…
• What if I told you that the x’s were supposed to
represent months and that the y’s were supposed
to represent low temperatures? Are your
predictions still reasonable?
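One way to guard against accidental extrapolation is to compare each x with the range of x values the line was fitted on. This is a minimal sketch with a hypothetical helper `predict_checked` and a made-up line ($\hat{y} = 10 + 2x$) fitted on x values from 1 to 12, such as months; it is not taken from the warm-up data.

```python
def predict_checked(x, a, b, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x.

    is_extrapolation is True when x lies outside [x_min, x_max],
    the range of x values the line was fitted on.
    """
    y_hat = a + b * x
    return y_hat, not (x_min <= x <= x_max)


# Hypothetical line fitted on x values from 1 to 12.
a, b, x_min, x_max = 10.0, 2.0, 1, 12
for x in (6, 12, 24):
    y_hat, extrap = predict_checked(x, a, b, x_min, x_max)
    note = "  <- extrapolation, treat with suspicion" if extrap else ""
    print(f"x = {x:>2}: y-hat = {y_hat}{note}")
```

The line still produces a number at x = 24; the check only flags that nothing in the data supports it.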
RESIDUALS
A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,
residual = observed y − predicted y = $y - \hat{y}$
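Computing residuals is just "observed minus predicted" for each point. The line ($\hat{y} = 1 + 2x$) and the three data points below are hypothetical, for illustration only.

```python
# Hypothetical line and data, for illustration only.
a, b = 1.0, 2.0
observed = [(1, 3.5), (2, 4.5), (3, 7.0)]  # (x, observed y) pairs

residuals = []
for x, y in observed:
    y_hat = a + b * x          # predicted y from the line
    residual = y - y_hat       # observed y - predicted y
    residuals.append(residual)
    print(f"x = {x}: y = {y}, y-hat = {y_hat}, residual = {residual:+.1f}")
```

A positive residual means the point lies above the line, a negative residual means it lies below, and a residual of zero means the line passes through the point.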
RESIDUAL PLOTS
A residual plot is a scatterplot of the residuals (vertical axis) against the explanatory variable (horizontal axis). We can use the residual plot to judge whether a linear model is a good fit for the data.
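The points of a residual plot are just the pairs (x, residual). As a sketch, the hypothetical data below follow $y = x^2$, a curved relationship, and are fitted with a least-squares line using the helper `lsrl` (defined here for illustration). Printing the residuals in order of x stands in for drawing the plot.

```python
def lsrl(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical curved data: y = x squared.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = lsrl(xs, ys)
# Each point of the residual plot is (x, residual).
pairs = [(x, y - (a + b * x)) for x, y in zip(xs, ys)]
for x, res in pairs:
    print(f"x = {x}: residual = {res:+.1f}")
```

The residuals are positive at both ends and negative in the middle. That curved pattern is the signal that a straight line is not the best model for these data.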
TWO IMPORTANT THINGS
• The residual plot should show no obvious pattern.
  • A curved pattern shows that the relationship is not linear. A
  straight line may not be the best model for such data.
  • Increasing (or decreasing) spread about the line as x
  increases indicates that prediction of y will be less accurate
  for larger x (or for smaller x).
• The residuals should be relatively small in size. A
regression line that fits the data well should come
“close” to most of the points. That is, the residuals
should be fairly small. How do we decide whether the
residuals are “small enough”? We consider the size of
a “typical” prediction error.
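One common way to measure the size of a "typical" prediction error is the standard deviation of the residuals, $s = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$, where dividing by n − 2 is the usual convention for a regression line. The sketch below computes s for hypothetical, roughly linear data so it can be compared with the spread of the y values themselves.

```python
import math


def lsrl(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical, roughly linear data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals, dividing by n - 2.
n = len(xs)
s = math.sqrt(sum(res ** 2 for res in residuals) / (n - 2))
print(f"typical prediction error s = {s:.3f}")
print(f"range of y = {max(ys) - min(ys):.1f}")
```

Here s is about 0.17 while the y values span 7.9 units, so these residuals are small relative to the data.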
EXAMPLE – FAT GAIN
Almost all of the residuals are between −0.7 and 0.7. For these individuals, the predicted fat gain from the least-squares line is within 0.7 kg of their actual fat gain during the study. That sounds pretty good. But the subjects gained only between 0.4 kg and 4.2 kg, so a prediction error of 0.7 kg is relatively large compared with the actual fat gain for an individual.
The largest residual, 1.64, corresponds to a prediction error of 1.64 kg. This subject's actual fat gain was 3.8 kg, but the regression line predicted a fat gain of only 2.16 kg. That's a pretty large error, especially from the subject's perspective!
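The arithmetic in this example can be checked directly. The numbers below are the ones quoted in the example; expressing 0.7 kg as a fraction of the range of fat gains is just one way to make "relatively large" concrete.

```python
actual = 3.8       # kg, subject's observed fat gain
predicted = 2.16   # kg, from the least-squares line
residual = actual - predicted
print(f"residual = {residual:.2f} kg")

# Compare a 0.7 kg error with the spread of observed fat gains.
gain_min, gain_max = 0.4, 4.2  # kg, range quoted in the example
gain_range = gain_max - gain_min
print(f"range of fat gains = {gain_range:.1f} kg")
print(f"0.7 kg is {0.7 / gain_range:.0%} of that range")
```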
SOMETHING UNUSUAL
• Residuals from the least squares regression line have
an unusual property – the mean of the residuals is
always zero.
• Why does this make sense?
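A quick numerical check of this property: fit a least-squares line to hypothetical data and average the residuals. The result is zero up to floating-point rounding error.

```python
def lsrl(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data
xs = [1, 2, 4, 7, 9]
ys = [3, 5, 4, 9, 12]

a, b = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)
print(f"mean residual = {mean_residual:.2e}")  # zero up to rounding error
```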
LEAST SQUARES REGRESSION LINE
• The least squares regression line (LSRL) is the straight
line that minimizes the sum of the squares of the
vertical distances of the observed points from the
line.
• All LSRLs go through the point $(\bar{x}, \bar{y})$
• The LSRL is $\hat{y} = a + bx$ with
  • slope $b = r\,\dfrac{s_y}{s_x}$ and
  • y-intercept $a = \bar{y} - b\bar{x}$
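The sketch below applies the two formulas to hypothetical data and checks two facts from this slide: the line passes through $(\bar{x}, \bar{y})$, and nudging the slope or intercept away from the formula values makes the sum of squared vertical distances larger. The helpers `mean`, `sd`, `corr`, and `sse` are written out here for illustration; `sd` is the sample standard deviation (dividing by n − 1).

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))


def corr(xs, ys):
    """Pearson correlation r: average product of z-scores (divides by n - 1)."""
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    n = len(xs)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)


def sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data
xs = [1, 2, 4, 7, 9]
ys = [3, 5, 4, 9, 12]

r = corr(xs, ys)
b = r * sd(ys) / sd(xs)        # slope b = r * s_y / s_x
a = mean(ys) - b * mean(xs)    # intercept a = y_bar - b * x_bar

# The line passes through (x_bar, y_bar).
print(abs((a + b * mean(xs)) - mean(ys)) < 1e-12)  # True

# Nudging the intercept or slope only increases the sum of squared residuals.
best = sse(xs, ys, a, b)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    print(sse(xs, ys, a + da, b + db) > best)  # True each time
```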
CALCULATING THE LSRL
• The mean and standard deviation of the explanatory
variable for this example are $\bar{x} = 324.8$ calories and
$s_x = 257.66$ calories. For the 16 people studied, the mean
and standard deviation of fat gain are $\bar{y} = 2.388$ kg and
$s_y = 1.1389$ kg. The correlation is $r = -0.7786$. Find the
equation of the LSRL. Show your work.
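A sketch for checking your hand calculation: it plugs the summary statistics from this example into $b = r\,s_y/s_x$ and $a = \bar{y} - b\bar{x}$. It assumes the correlation is r = −0.7786, a negative association; with a positive r, the slope's sign would flip and the intercept would change accordingly.

```python
# Summary statistics from the example (x in calories, y = fat gain in kg).
x_bar, s_x = 324.8, 257.66
y_bar, s_y = 2.388, 1.1389
r = -0.7786  # assumed sign: negative association

b = r * s_y / s_x          # slope
a = y_bar - b * x_bar      # y-intercept
print(f"slope b = {b:.5f}")
print(f"intercept a = {a:.3f}")
```

Carrying full precision gives an intercept of about 3.506; rounding the slope to −0.00344 before computing a gives about 3.505, so small differences in the last digit come from when you round.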
COEFFICIENT OF DETERMINATION
• The coefficient of determination, $r^2$, is the fraction of
the variation in one variable that is accounted for by the
LSRL on the other variable.
• It appears as $r^2$ on your calculator.
• It measures how well the regression line explains the
response.
• If $r^2 = 0.73$, then 73% of the variation in y is accounted
for by the straight-line relationship between x and y.
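For a least-squares line, $r^2$ equals $1 - \text{SSE}/\text{SST}$, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of y from $\bar{y}$. That identity is what justifies reading $r^2$ as "fraction of variation accounted for." The sketch checks it on hypothetical data, then squares the fat-gain example's correlation (magnitude 0.7786).

```python
import math

# Hypothetical data
xs = [1, 2, 4, 7, 9]
ys = [3, 5, 4, 9, 12]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)   # correlation
b = sxy / sxx                    # LSRL slope
a = y_bar - b * x_bar            # LSRL intercept

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left over
sst = syy                                                  # total variation in y
r2_from_fit = 1 - sse / sst

print(f"r^2 = {r ** 2:.4f}, 1 - SSE/SST = {r2_from_fit:.4f}")
print(f"fat-gain example: 0.7786^2 = {0.7786 ** 2:.3f}")
```

For the fat-gain example, $r^2 \approx 0.606$, so about 61% of the variation in fat gain is accounted for by the linear relationship with x.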
CAUTION!
• Correlation and regression must be interpreted with
caution. Plot the data to be sure that the
relationship is roughly linear and to detect outliers.
Also, the correlation and the regression line are
nonresistant: outliers in x often have a large influence
on the regression line.
• Most of all, be careful not to conclude that there is
a cause-and-effect relationship between two
variables just because they have a strong linear
relationship. (Don’t mistake correlation for causation!)
HOMEWORK
• Page 191 (35-42, 44-46)