Bell Ringer
A random sample of records of sales of homes from Feb.
15 to Apr. 30, 1993, from the files maintained by the
Albuquerque Board of Realtors gives the Price and Size (in
square feet) of 117 homes. A regression to predict Price
(in thousands of dollars) from Size has r = 0.84. The
residuals plot indicated that a linear model is appropriate.
a) What are the variables and units in this regression?
b) What units does the slope have?
c) Do you think the slope is positive or negative?
Linear Regression
Recall that a residual is the difference between
an observed value and the predicted value.
$e = y - \hat{y}$
The standard deviation of the residuals gives us a
measure of how much the points spread around
the regression line.
$s_e = \sqrt{\dfrac{\sum e^2}{n - 2}}$
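As a rough illustration of the residual and $s_e$ formulas, here is a minimal Python sketch. The four data points are made up (hypothetical toy values, not from the lesson), and 1.0 + 1.9x happens to be the least-squares line for those points; the function name `residual_std` is our own.

```python
import math

def residual_std(x, y, intercept, slope):
    """Return (residuals, s_e) for the line y_hat = intercept + slope * x."""
    # Each residual is observed minus predicted: e = y - y_hat
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n = len(residuals)
    # Sum the squared residuals, divide by n - 2, then take the square root
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return residuals, s_e

# Hypothetical toy data, made up for illustration only
x = [1, 2, 3, 4]
y = [3, 5, 6, 9]
residuals, s_e = residual_std(x, y, intercept=1.0, slope=1.9)
print([round(e, 2) for e in residuals], round(s_e, 4))
```

For a least-squares line the residuals sum to zero, so $s_e$ measures their typical size rather than their average.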
• r is the correlation coefficient
• If we square r, we get the portion of
the variation in “y” accounted for by
variation in “x”
$r^2 = \text{variation accounted for}$
AKA “coefficient of determination”
$1 - r^2 = \text{variation left in the residuals}$
Example
The correlation between a cereal’s fiber and potassium contents is
r = 0.903. What fraction of the variability in potassium is accounted for
by the amount of fiber that servings contain?
$r^2 = 0.903^2 \approx 0.815$, so about 81.5% of the variability in potassium content is accounted for by the model.
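A short Python sketch of the same arithmetic, splitting the variation into the part accounted for ($r^2$) and the part left in the residuals ($1 - r^2$) for r = 0.903. The helper name `variation_split` is hypothetical.

```python
def variation_split(r):
    """Return (r^2, 1 - r^2): variation accounted for, and variation left in the residuals."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared

explained, left_over = variation_split(0.903)
print(f"R^2 = {explained:.1%}, left in residuals = {left_over:.1%}")
```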
The regression model for fiber (in grams) and potassium content (in mg)
based on 77 breakfast cereals is $\widehat{\text{Potassium}} = 38 + 27\,\text{Fiber}$. What does
it mean if $s_e = 30.77$?
The actual potassium content of cereals varies from the predicted values with a standard
deviation of about 30.77 milligrams.
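A minimal sketch of what the model and $s_e$ mean for a single cereal. The fiber amount (4 g) and observed potassium (170 mg) are hypothetical inputs chosen only for illustration; `predict_potassium` is our own name for the fitted line.

```python
def predict_potassium(fiber_g):
    """Predicted potassium (mg) from the fitted model Potassium = 38 + 27 * Fiber."""
    return 38 + 27 * fiber_g

S_E = 30.77  # standard deviation of the residuals, in mg (from the example)

# Hypothetical cereal: 4 g of fiber and 170 mg of potassium (made-up values)
predicted = predict_potassium(4)  # 38 + 27 * 4
observed = 170
residual = observed - predicted   # e = y - y_hat
# Express the residual in units of s_e to see how typical it is
print(predicted, residual, round(residual / S_E, 2))
```

A residual of this size is smaller than one $s_e$, so this hypothetical cereal would sit well within the typical scatter around the line.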
The notation that is typically used is $R^2$.
We express $R^2$ as a percent between 0% and 100%.
A random sample of records of sales of homes from February 15 to April
30, 1993, from the files maintained by the Albuquerque Board of Realtors
gives the Price and Size (in square feet) of 117 homes. A regression to
predict Price (in thousands of dollars) from Size has an R-squared of
71.4%. The residuals plot indicated that a linear model is appropriate.
a) What are the variables and units in the regression?
The explanatory variable (x) is size, measured in square feet, and the response
variable (y) is price, measured in thousands of dollars.
b)What units does the slope have?
The units of the slope are thousands of dollars per square foot.
c) Do you think the slope is positive or negative?
The slope of the regression line predicting price from size should be positive.
Bigger homes are expected to cost more.
From the bell ringer example: A regression to predict Price (in thousands
of dollars) from Size has an R-squared of 71.4%. The residuals plot
indicated that a linear model is appropriate.
a) What is the correlation between Size and Price?
The correlation between size and price is $r = \sqrt{0.714} \approx 0.845$. The positive square root is used, since the relationship is believed to be positive.
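A minimal Python sketch of going from R-squared back to r. The helper `r_from_r_squared` is hypothetical; the sign must be supplied from the direction of the relationship (the sign of the slope), because the square root alone is always positive.

```python
import math

def r_from_r_squared(r_squared, slope_sign):
    """Recover r from R^2, giving r the same sign as the slope."""
    return math.copysign(math.sqrt(r_squared), slope_sign)

r = r_from_r_squared(0.714, slope_sign=+1)  # price rises with size
print(round(r, 3))
```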
b) What would you predict about the Price of a home 1 standard deviation
above average in Size?
The price of a home that is one standard deviation above the mean size would be
predicted to be 0.845 standard deviations (in other words r standard deviations) above
the mean price.
c) What would you predict about the Price of a home 2 standard
deviations below average in Size?
The price of a home that is two standard deviations below the mean size would be
predicted to be 1.69 (2 × 0.845) standard deviations below the mean price.
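In standard-deviation units the prediction rule used in parts b) and c) is $\hat{z}_y = r\,z_x$. A small sketch with r = 0.845 from this example; the function name `predicted_z` is our own.

```python
def predicted_z(r, z_x):
    """Predicted response in standard-deviation units: z_y_hat = r * z_x."""
    return r * z_x

r = 0.845
# Part b): 1 SD above the mean size; part c): 2 SDs below the mean size
for z_x in (1, -2):
    print(z_x, round(predicted_z(r, z_x), 3))
```

Because r is less than 1, each predicted price lies fewer standard deviations from its mean than the size does from its mean.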
Engine sizes (called displacement) measure the volume of the cylinders in
cubic inches. The regression analysis of gasoline use and displacement is shown.
The constant is the y-intercept of the
regression line.
The independent (explanatory) variable
is paired with the slope of the
regression line.
The equation of the regression line:
$\widehat{\text{fuel economy}} = 34.9799 - 0.066196 \cdot \text{engine size}$
The only other information we need at
this time: n and R-squared (take the
square root for r, giving it the same sign as the slope).
1) How many cars were included in
this analysis?
2) What is the correlation between
engine size and fuel economy?
3) A car you are thinking of buying
is available with two different
size engines, 190 cubic inches or
240 cubic inches. How much
difference might this make in
your gas mileage?
Answers:
1) 89
2) r = -0.78
3) 19.1 mpg for 240 cubic inches or 22.4 mpg for
190 cubic inches, a difference of about 3.3 mpg (50 × 0.066196 ≈ 3.31)
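A quick check of answer 3 in Python, using the intercept and slope from the regression equation for fuel economy; the names `INTERCEPT`, `SLOPE`, and `predict_mpg` are our own.

```python
INTERCEPT = 34.9799  # mpg
SLOPE = -0.066196    # mpg per cubic inch of displacement

def predict_mpg(engine_size):
    """Predicted fuel economy (mpg) from engine displacement (cubic inches)."""
    return INTERCEPT + SLOPE * engine_size

small, large = predict_mpg(190), predict_mpg(240)
print(round(small, 1), round(large, 1), round(small - large, 1))
```

The difference depends only on the slope: a 50-cubic-inch change in engine size shifts the prediction by 50 × 0.066196 mpg, whatever the intercept is.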
Today’s Assignment:
Be sure to read Chapter 8
Add to HW: p. 192 #8, 10, 16, 18, 20, 22