Chp 8

CHAPTER 8: LINEAR REGRESSION
By Dara Lee and Michelle Smith
Period 1

The linear model is just an equation of a straight
line through data.

The points in a scatterplot don’t all line up, but a
straight line can summarize the general pattern.

The model can help us understand how the
variables are associated.
LINEAR MODEL

An estimate from a model is called the predicted
value (ŷ).

The difference between the observed value (y) and the
predicted value (ŷ) is called the residual (e).

Residual = Observed - Predicted (e = y - ŷ)

A negative residual means the predicted value is
too big.

A positive residual means the predicted value is too
small.

To see if a linear model is appropriate, look at the
residual plot. It should show scattered points with no
interesting features: no direction, no shape, no bends,
and no outliers.
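As a small illustration of e = y - ŷ, the sketch below computes residuals for three of the Corolla listings from Problem #33, using the line reported in that problem's answer (predicted price = 12319.6 - 924 × age). The function and variable names are made up for this example.

```python
# Residual = observed - predicted (e = y - y_hat).
# Coefficients are the ones reported for Problem #33 in this chapter.
B0 = 12319.6   # intercept, in dollars
B1 = -924.0    # slope, in dollars per year of age

def predict_price(age_years):
    """Predicted price (y_hat) for a car of the given age."""
    return B0 + B1 * age_years

def residual(observed, predicted):
    """Positive: prediction was too small. Negative: prediction was too big."""
    return observed - predicted

# (age, advertised price) pairs taken from the Problem #33 table
listings = [(1, 12995), (4, 6995), (9, 2250)]

for age, price in listings:
    y_hat = predict_price(age)
    e = residual(price, y_hat)
    verdict = "too small" if e > 0 else "too big"
    print(f"age {age}: observed {price}, predicted {y_hat:.1f}, "
          f"residual {e:.1f} (prediction {verdict})")
```

The 1-year-old car has a positive residual (the line underestimates its price), while the other two have negative residuals.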
RESIDUALS

The line of best fit is the line for which the sum of the
squared residuals (Σe²) is the smallest.

Also known as the “least squares line.”

Squaring makes every residual positive, so positive and
negative residuals cannot cancel in the sum. It also
emphasizes the largest residuals.

The smaller the sum, the better the fit.
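To make "the smaller the sum, the better the fit" concrete, here is a minimal sketch that compares two lines on the hamburger data from Problem #37. The first line is the one reported in that problem; the second is a hypothetical line picked by eye purely for comparison. The helper name `sum_squared_residuals` is an assumption of this example.

```python
# Fat (g) and calories for the seven burgers in Problem #37
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of e^2, where e = y - (b0 + b1 * x)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Line reported in Problem #37: predicted calories = 211 + 11.06 * fat
sse_reported = sum_squared_residuals(211, 11.06, fat, calories)

# Hypothetical competing line, chosen by eye for this example only
sse_by_eye = sum_squared_residuals(250, 10, fat, calories)

print(f"reported line: {sse_reported:.1f}")
print(f"by-eye line:   {sse_by_eye:.1f}")
```

The reported line gives the smaller sum, which is what we expect from a least squares line.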

Equation of the line: ŷ = b0 + b1x

Slope: b1 = r(Sy/Sx)

Intercept: b0 = ȳ - b1x̄
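The slope and intercept formulas can be tried out on the hamburger data from Problem #37. This is only a sketch using Python's standard library; the function names `correlation` and `least_squares_line` are made up for the example.

```python
from statistics import mean, stdev

# Fat (g) and calories for the seven burgers in Problem #37
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

def least_squares_line(xs, ys):
    """Return (b0, b1) with b1 = r * Sy / Sx and b0 = y_bar - b1 * x_bar."""
    r = correlation(xs, ys)
    b1 = r * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1

b0, b1 = least_squares_line(fat, calories)
print(f"predicted calories = {b0:.1f} + {b1:.2f} * fat")
```

The result should agree, after rounding, with the line quoted in Problem #37 (211 + 11.06 × fat).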
LINE OF BEST FIT

The equation for a line that passes through
the origin can be written with just a slope
and no intercept: y = mx

When both variables are standardized, the
regression line is this kind of line, with slope r:
ẑy = r(zx)

The standardized points aren’t written as (x, y);
their coordinates are z-scores: (zx, zy)

In the original units, every horizontal change of
Sx goes with a predicted vertical change of r(Sy)

Moving one standard deviation away from
the mean in “x” moves our estimate “r”
standard deviations away from the mean in
“y.”

In general, moving any number of standard
deviations in “x” moves “r” times that
number of standard deviations in “y.”
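One way to check the claim about z-scores is to standardize a data set and fit a line to the standardized points. The sketch below does this for the hamburger data from Problem #37; the helper name `z_scores` is an assumption of the example, and the slope formula relies on z-scores having mean 0.

```python
from statistics import mean, stdev

# Fat (g) and calories for the seven burgers in Problem #37
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

def z_scores(values):
    """Standardize: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

zx = z_scores(fat)
zy = z_scores(calories)
n = len(zx)

# Correlation: sum of z-score products divided by n - 1
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Least squares slope and intercept fitted directly to the z-scores.
# Because z-scores have mean 0, the slope reduces to sum(zx*zy) / sum(zx^2).
slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
intercept = mean(zy) - slope * mean(zx)

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The fitted slope matches r and the intercept is zero up to rounding error, so one standard deviation in x moves the prediction r standard deviations in y.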
CORRELATION AND THE LINE

Each predicted “y” tends to be closer to its
mean (in standard deviations) than its
corresponding x was.

This property of the linear model is called
regression to the mean; the line is called the
regression line.
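The reason behind regression to the mean can be written in one line, using the standardized form of the model. The value r = 0.5 in the comments is made up for illustration and does not come from the chapter.

```latex
\hat{z}_y = r\,z_x
\quad\Longrightarrow\quad
|\hat{z}_y| = |r|\,|z_x| \le |z_x|
\qquad \text{because } -1 \le r \le 1 .

% Illustration with r = 0.5 (hypothetical):
% a point 2 standard deviations above the mean in x
% gets a prediction of
\hat{z}_y = 0.5 \times 2 = 1
% that is, only 1 standard deviation above the mean in y.
```

The prediction is exactly as far from the mean as x only when r is 1 or -1.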
HOW BIG CAN PREDICTED VALUES GET?

“r” is the correlation between two variables. The
greater the absolute value of the correlation, the
stronger the association.

The squared correlation, R², gives the fraction of
the data’s variation accounted for by the model,
and 1 - R² is the fraction of the original variation
left in the residuals.

An R² of 0 means that none of the variance in the
data is in the model; all of it is still in the residuals.
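The two descriptions of R² (squared correlation, and one minus the fraction of variation left in the residuals) can be checked against each other. This sketch uses the hamburger data from Problem #37; the variable names are this example's own.

```python
from statistics import mean

# Fat (g) and calories for the seven burgers in Problem #37
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

x_bar, y_bar = mean(fat), mean(calories)
sxx = sum((x - x_bar) ** 2 for x in fat)
syy = sum((y - y_bar) ** 2 for y in calories)   # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fat, calories))

# Least squares line and its predictions
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in fat]

# Variation left in the residuals, as a fraction of the total
residual_variation = sum((y - p) ** 2 for y, p in zip(calories, predicted))
r_squared = 1 - residual_variation / syy

# Correlation computed separately, as a cross-check
r = sxy / (sxx * syy) ** 0.5

print(f"1 - (residual variation / total variation) = {r_squared:.4f}")
print(f"r squared                                  = {r ** 2:.4f}")
```

Both lines print the same value, roughly 0.92 for this data set, meaning about 92% of the variation in calories is accounted for by the line on fat.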

The line of best fit is found by minimizing the sum
of the squared residuals. That same line leaves the
smallest possible fraction of the variation in the
residuals, so no other line has a higher R².
R²: THE VARIATION ACCOUNTED FOR

Quantitative Variables Condition: Both variables must be
quantitative, not categorical.

Straight Enough Condition: Scatterplot must look
reasonably straight. The linearity can be checked again
after the regression, when residuals can be examined.

Outlier Condition: No point should stand far apart from
the rest. To spot outliers, check the residuals; outliers
may have large residuals. Outliers can dramatically
change a regression model.
ASSUMPTIONS AND CONDITIONS
CHAPTER 8 PROBLEM #33
Classified ads in the Ithaca Journal offered several used
Toyota Corollas for sale. Listed below are the ages of the
cars and the advertised prices.

Age (yr)   Advertised price ($)
1          12995, 10950
2          10495
3          10995, 10995
4          6995, 7990
5          8700, 6995
6          5990, 4995
9          3200, 2250, 3995
11         2900, 2995
13         1750

a) Find the equation of the regression line.
b) Explain the meaning of the slope of the line.
c) Explain the meaning of the intercept of the line.
d) If you want to sell a 7-year-old Corolla, what price
seems appropriate?
e) You have a chance to buy one of two cars. They are
about the same age and appear to be in equally good
condition. Would you rather buy the one with a positive
residual or a negative residual? Explain.
f) You see a “For Sale” sign on a 10-year-old Corolla
stating the asking price as $1500. What is the residual?
g) Would this regression model be useful in establishing
a fair price for a 20-year-old car? Explain.
CHAPTER 8 PROBLEM #33

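a) Predicted price = 12319.6 - 924 × age (years)

b) Each extra year of age is associated with a decrease
of about $924 in the predicted price, on average.

c) The model predicts that a new (0-year-old) Corolla
would be advertised at about $12,319.60. No car in the
data is new, so this is an extrapolation.

d) About $5851.60, since 12319.6 - 924(7) = 5851.6

e) Negative residual. Its price is below the predicted
value for its age, so it is the better deal.

f) -$1579.60. The model predicts 12319.6 - 924(10) =
$3079.60, and 1500 - 3079.60 = -1579.60

g) No. The oldest car in the data is 13 years old, and
soon after age 13 the model predicts negative prices
(for age 20, 12319.6 - 924(20) = -$6160.40). The
relationship is no longer linear that far out.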
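The answers above can be checked by fitting the line directly to the listed data. The sketch below is one way to do that with plain Python; `fit_line` and `predict` are names invented for this example, and small differences from the slide answers come from rounding the intercept to 12319.6.

```python
from statistics import mean

# (age in years, advertised price in dollars) from the Problem #33 table
cars = [
    (1, 12995), (1, 10950), (2, 10495), (3, 10995), (3, 10995),
    (4, 6995), (4, 7990), (5, 8700), (5, 6995), (6, 5990), (6, 4995),
    (9, 3200), (9, 2250), (9, 3995), (11, 2900), (11, 2995), (13, 1750),
]
ages = [age for age, _ in cars]
prices = [price for _, price in cars]

def fit_line(xs, ys):
    """Least squares intercept and slope for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(ages, prices)

def predict(age):
    """Predicted advertised price for a car of the given age."""
    return b0 + b1 * age

print(f"a) predicted price = {b0:.1f} + ({b1:.1f}) * age")
print(f"d) 7-year-old car: {predict(7):.2f}")
print(f"f) residual for $1500 at age 10: {1500 - predict(10):.2f}")
print(f"g) 20-year-old car: {predict(20):.2f}")
```

The last line prints a negative price, which is the warning sign discussed in part g).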
CHAPTER 8 PROBLEM #37
Here are the data used when the association
between the amounts of fat and calories in
hamburgers was examined.

Fat (g)    19   31   34   35   39   39   43
Calories   410  580  590  570  640  680  660

When a scatterplot was made, the equation of the
regression line was calculated to be:
Predicted calories = 211 + 11.06 × fat (g)
a) Explain why you cannot use that model to estimate
the fat content of a burger with 600 calories.
b) Using an appropriate model, estimate the fat
content of a burger with 600 calories.
CHAPTER 8 PROBLEM #37

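a) The regression was fit to predict calories from
fat, not the other way around. Solving that equation
for fat does not give the least squares line for
predicting fat.

b) Predicted fat (g) = -15 + 0.083 × calories
For 600 calories: -15 + 0.083(600) = 34.8 grams of fat.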
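Part b) can be reproduced by swapping the roles of the two variables and fitting a new line. The sketch below does that with the burger data; `fit_line` and `predict_fat` are names made up for this example. With unrounded coefficients the estimate comes out slightly above 35 g; the 34.8 g in the answer comes from using the rounded coefficients -15 and 0.083.

```python
from statistics import mean

# Fat (g) and calories for the seven burgers in Problem #37
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

def fit_line(xs, ys):
    """Least squares intercept and slope for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Calories is now the explanatory variable and fat is the response
b0, b1 = fit_line(calories, fat)

def predict_fat(cal):
    """Predicted grams of fat for a burger with the given calories."""
    return b0 + b1 * cal

print(f"predicted fat = {b0:.2f} + {b1:.4f} * calories")
print(f"600-calorie burger: {predict_fat(600):.1f} g of fat")
```

Note that this slope is not simply 1/11.06; the two regressions are different lines unless the correlation is exactly 1 or -1.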