(The next 8 questions are based on the following information.)
The general manager of a supermarket chain believes that sales of a product are related to the amount of
space the product is given on shelves. Across a sample of stores, she measures the sales of a particular
brand of laundry detergent (in boxes per week) and relates sales to the amount of shelf space (in inches)
devoted to the detergent. In the sample, shelf space ranges from 12 to 32 inches, while sales range from
173 to 395 boxes per week.
The general manager estimates two models, a straight-line model, and a quadratic (parabola) model.
Selected output from Excel is given below.
Straight-Line Model:
R^2 = .191    R^2(Adjusted) = .148    Residual Standard Deviation = 47.8

                     Coefficients   Standard Error   t Stat   P-value
Intercept                  191.54            39.31     4.87    0.0001
ShelfSpace                   3.64             1.72     2.12    0.0479

Quadratic Model:
R^2 = .380    R^2(Adjusted) = .312    Residual Standard Deviation = 43.0

                     Coefficients   Standard Error   t Stat   P-value
Intercept                 -109.85           133.14    -0.83    0.4201
ShelfSpace                  33.29            12.72     2.62    0.0175
ShelfSpace Squared          -0.67             0.29    -2.35    0.0305
320. Determine the correlation coefficient relating Sales to ShelfSpace.
(a) r= -0.67
(b) r= 0.44
(c) r= 0.62
(d) r= 0.67
Answer: (b)
The correlation coefficient measures the straight-line association between X and Y. Its magnitude is
the square root of the R^2 value from the straight-line model:
r = sqrt(.191) = .44
The correlation is positive because the slope is positive (3.64).
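The calculation above can be sketched in Python. This is only an illustration: the function name `correlation_from_r2` is an assumption made here, and the inputs are the R^2 and slope reported for the straight-line model.

```python
import math

def correlation_from_r2(r_squared, slope):
    """For a simple (one-predictor) linear regression, r has magnitude
    sqrt(R^2) and takes the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# R^2 and ShelfSpace slope from the straight-line model output
r = correlation_from_r2(0.191, 3.64)
print(round(r, 2))
```

This shortcut applies only to the straight-line model; the square root of the quadratic model's R^2 is not the correlation between Sales and ShelfSpace.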
321. Based on the quadratic model, what are the predicted sales of detergent if there are 20 inches of shelf
space devoted to detergent?
(a) 264 boxes per week
(b) 288 boxes per week
(c) 395 boxes per week
(d) Cannot be determined, because it would require extrapolating outside of the range of the data.
Answer: (b)
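Shelf space of 20 inches lies within the sampled range (12 to 32 inches), so no extrapolation is involved:
predicted sales = -109.85 + 33.29*20 - .67*20^2 = 288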
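As a quick check of the arithmetic, here is a minimal Python sketch. The function name and the range check are illustrative assumptions; the coefficients and the 12 to 32 inch range come from the problem statement.

```python
def predict_sales_quadratic(shelf_space):
    """Predicted boxes per week from the quadratic model's coefficients."""
    return -109.85 + 33.29 * shelf_space - 0.67 * shelf_space ** 2

shelf_space = 20
# Shelf space in the sample ranged from 12 to 32 inches
within_sample_range = 12 <= shelf_space <= 32
sales_20 = predict_sales_quadratic(shelf_space)
print(within_sample_range, round(sales_20))
```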
322. Suppose one store is considering expanding the shelf space for detergent from 20 inches to 30
inches. Which model predicts a larger change in sales when changing from 20 inches to 30 inches of
shelf space?
(a) The straight-line model predicts a larger increase in sales than the quadratic model does.
(b) The quadratic model predicts a larger increase in sales than the straight-line model does.
(c) Both models predict the same increase in sales.
Answer: (a)
From 20 to 30 inches, the straight-line model predicts an increase of 10*3.64 = 36.4 boxes.
The quadratic model predicts a change from 288 (from the previous problem) to
-109.85 + 33.29*30 - .67*30^2 = 285.9, a decrease of about 2 boxes.
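The comparison can be reproduced with a short Python sketch. The function names are assumptions chosen for illustration; the coefficients are those in the Excel output.

```python
def predict_linear(shelf_space):
    """Straight-line model: intercept 191.54, slope 3.64."""
    return 191.54 + 3.64 * shelf_space

def predict_quadratic(shelf_space):
    """Quadratic model: intercept -109.85, linear 33.29, squared -0.67."""
    return -109.85 + 33.29 * shelf_space - 0.67 * shelf_space ** 2

# Predicted change in weekly sales when going from 20 to 30 inches
linear_change = predict_linear(30) - predict_linear(20)
quadratic_change = predict_quadratic(30) - predict_quadratic(20)
print(round(linear_change, 1), round(quadratic_change, 1))
```

The straight-line model's change depends only on the slope, while the quadratic model's change depends on where along the curve the two shelf-space values fall.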
323. Which statement(s) below is/are correct?
(a) There appears to be some curvature in the relationship between sales and shelf space, because the
coefficient for ShelfSpace Squared is significantly different from 0.
(b) There appears to be some curvature in the relationship between sales and shelf space, because the
Adjusted R^2 is higher for the quadratic model than for the straight-line model.
(c) There does not appear to be any curvature in the relationship between sales and shelf space, because
the ShelfSpace coefficient in the straight-line model is significantly different from 0.
(d) Both (a) and (b) are correct
Answer: (d)
The better fit of the quadratic model (as measured by Adjusted R^2) and the statistical significance
of the ShelfSpace Squared coefficient both tell us that there’s curvature in the relationship.
324. What is the best interpretation of the residual standard deviation (43.0) of the quadratic model?
(a) The typical prediction error for the quadratic model is about 43 boxes per week.
(b) The typical difference between the predictions of the quadratic model and the predictions of the
straight line model is about 43 boxes per week.
(c) When ShelfSpace=0, the slope of the parabola is 43.
(d) The highest point of the parabola is when ShelfSpace=43.
Answer: (a)
The residual standard deviation (43.0) is the typical prediction error of the model.
325. Suppose we wanted to estimate a power model for these data (Sales = A * ShelfSpace^B). Describe
the independent variable(s) and dependent variable you would use in order to estimate this model with a
simple linear regression.
(a) Independent variable: ShelfSpace
Dependent variable: Sales
(b) Independent variable: ShelfSpace^2
Dependent variable: Sales^2
(c) Independent variable: log(ShelfSpace)
Dependent variable: log(Sales)
(d) Independent variables: ShelfSpace and ShelfSpace^2
Dependent variable: Sales^2
Answer: (c)
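To fit a power model relating Y to X, we run a simple linear regression of the log of Y on the log of
X. Taking logs of both sides of Sales = A * ShelfSpace^B gives
log(Sales) = log(A) + B * log(ShelfSpace), which is a straight line in the logged variables.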
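The log-log approach can be sketched in plain Python as follows. The data here are hypothetical and noise-free (generated with A = 60 and B = 0.5 purely for illustration, not taken from the supermarket sample), and `fit_power_model` is a name assumed for this example.

```python
import math

def fit_power_model(x_values, y_values):
    """Fit y = A * x**B by least squares on log(y) versus log(x).
    Returns the pair (A, B)."""
    log_x = [math.log(x) for x in x_values]
    log_y = [math.log(y) for y in y_values]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    b = sxy / sxx                      # slope of the log-log line = exponent B
    a = math.exp(mean_y - b * mean_x)  # exp(intercept) = multiplier A
    return a, b

# Hypothetical data for illustration only
shelf_space = [12, 16, 20, 24, 28, 32]
sales = [60 * s ** 0.5 for s in shelf_space]

a_hat, b_hat = fit_power_model(shelf_space, sales)
print(round(a_hat, 2), round(b_hat, 2))
```

The slope of the log-log regression estimates B directly, and the intercept must be exponentiated to recover A.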
326. What would you expect the exponent (B) from the power model to be?
(a) Less than 0
(b) Exactly 0
(c) Between 0 and 1
(d) Greater than 1
Answer: (c)
Here we need to think about the curvature of the relationship and pick the exponent that matches
it. The negative coefficient for ShelfSpace Squared tells us that the curve is like an
upside-down U: increasing steeply at first, then leveling off (and eventually decreasing). The power
model closest to this shape has an exponent that is positive but less than 1 (i.e., between 0
and 1).
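To see how the exponent controls the shape, the sketch below compares successive increases in x^B over equally spaced shelf-space values. The exponents tried (0.5, 2, -0.5) and the helper name `increments` are illustrative assumptions, not estimates from the data.

```python
def increments(exponent, x_values):
    """Successive changes in x**exponent across the given x values."""
    y = [x ** exponent for x in x_values]
    return [later - earlier for earlier, later in zip(y, y[1:])]

x_values = [12, 22, 32]  # equally spaced points within the sampled range

between_0_and_1 = increments(0.5, x_values)  # positive and shrinking
greater_than_1 = increments(2, x_values)     # positive and growing
less_than_0 = increments(-0.5, x_values)     # negative throughout

print(between_0_and_1, greater_than_1, less_than_0)
```

Only the exponent between 0 and 1 produces a curve that rises while flattening. Unlike the parabola, a power curve with a positive exponent never turns downward, so it matches only the rising, leveling-off part of the upside-down U.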
327. The best way to determine whether the amount of shelf space affects sales is to:
(a) Randomly assign different stores to have different amounts of shelf space, and then observe whether
sales are associated with amount of shelf space.
(b) Test the null hypothesis that the coefficient for ShelfSpace is equal to 0 for both the straight-line
model and the quadratic model.
(c) Conduct an observational study where we carefully observe whether shelf space and sales are
associated with each other.
(d) Hold shelf space constant across all stores and see if sales are identical in those stores.
Answer: (a)
To determine a cause and effect relationship, we need to run an experiment: we need to randomly
assign stores to different amounts of shelf space, and then see what happens.
(The next 4 questions are based on the following information.)
In an attempt to understand the relationship between driving speed and gas mileage, 50 identical cars are
driven at different speeds (ranging from 2 to 100 miles per hour), and the gas mileage is measured for
each car (in miles per gallon). The simple correlation between speed and mileage is very low, r=0.005.
A quadratic regression model is estimated, yielding the following output:
R^2 = 0.710    Adjusted R^2 = 0.698

                Coefficients   Standard Error   t Stat   P-value
Intercept               9.34            1.707     5.47   .000002
Speed                  0.802            0.077    10.39   .000001
SpeedSquared        -0.00788         0.000734   -10.73   .000001
328. Based on the quadratic model, what is the predicted mileage for a car traveling at 50 miles per hour?
(a) 20.4
(b) 29.7
(c) 45.5
(d) 49.4
Answer: (b)
Plug in 50 for Speed and 50^2 for SpeedSquared:
predicted mileage = 9.34 + .802 * 50 - .00788 * 50^2 = 29.74
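A minimal Python sketch of the same substitution, written for any speed, is given below. The function name `predict_mileage` is an assumption; the coefficients are those reported in the regression output.

```python
def predict_mileage(speed, intercept=9.34, b_speed=0.802, b_speed_sq=-0.00788):
    """Predicted miles per gallon from the quadratic model.
    SpeedSquared is computed from Speed rather than supplied separately."""
    return intercept + b_speed * speed + b_speed_sq * speed ** 2

mileage_50 = predict_mileage(50)
print(round(mileage_50, 2))
```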
329. To test whether or not there is substantial curvature in the relationship between mileage and speed,
the appropriate test statistic and conclusion (using alpha=.05) are:
(a) Test statistic t= -10.73;
Conclusion: reject the null hypothesis of no curvature
(b) Test statistic t = 10.39;
Conclusion: reject the null hypothesis of substantial curvature
(c) Test statistic t = 5.47;
Conclusion: reject the null hypothesis of no curvature
(d) Test statistic t = 5.13;
Conclusion: reject the null hypothesis of substantial curvature
Answer: (a)
The coefficient for SpeedSquared provides the test for curvature. If the coefficient is zero in the
population, then there’s no curvature; if the coefficient is not zero, then there’s some detectable
curvature. The t-value of -10.73 and the low p-value of .000001 tell us to reject the null
hypothesis of no curvature. So we conclude that there’s some curvature in the data.
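The test statistic is simply the coefficient divided by its standard error, which the sketch below recomputes. The critical value of 2.012 is an assumption taken from a standard t table (two-sided, alpha = .05, 47 degrees of freedom); the small gap between the recomputed value and the reported -10.73 comes from rounding in the printed coefficients.

```python
coefficient = -0.00788   # SpeedSquared coefficient from the output
std_error = 0.000734     # its standard error from the output

t_stat = coefficient / std_error

n_cars = 50              # sample size stated in the problem
n_parameters = 3         # intercept, Speed, SpeedSquared
degrees_of_freedom = n_cars - n_parameters

# Assumed two-sided 5% critical value for t with 47 degrees of freedom
critical_value = 2.012
reject_no_curvature = abs(t_stat) > critical_value

print(round(t_stat, 2), degrees_of_freedom, reject_no_curvature)
```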
330. Which of the following statements best describes the relationship between mileage and speed?
(a) Mileage appears to consistently decrease as speed increases.
(b) Mileage appears to consistently increase as speed increases.
(c) Mileage appears to be better for both very low and very high speeds, with worse mileage for
moderate speeds.
(d) Mileage appears to be worse for both very low and very high speeds, with better mileage for
moderate speeds.
Answer: (d)
The negative coefficient for SpeedSquared tells us that the plot of Mileage vs speed will look like an
upside-down U. Option (d) best describes an upside-down U shape.
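The location of the peak follows from the fitted coefficients: a parabola a*x^2 + b*x + c with a < 0 reaches its maximum at x = -b / (2a). The sketch below applies that formula; the variable and function names are illustrative assumptions.

```python
intercept = 9.34
b_speed = 0.802
b_speed_sq = -0.00788

def predict_mileage(speed):
    """Predicted miles per gallon from the quadratic model."""
    return intercept + b_speed * speed + b_speed_sq * speed ** 2

# Vertex of the parabola: the speed with the highest predicted mileage
peak_speed = -b_speed / (2 * b_speed_sq)
peak_mileage = predict_mileage(peak_speed)

# Predictions at the extremes of the tested range (2 and 100 mph)
low_end = predict_mileage(2)
high_end = predict_mileage(100)

print(round(peak_speed, 1), round(peak_mileage, 1),
      round(low_end, 1), round(high_end, 1))
```

The predicted mileage at both ends of the tested range is well below the predicted mileage near the vertex, which is the pattern option (d) describes.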
331. Why does the quadratic model fit well, while the correlation between mileage and speed is so small?
(a) Because the relationship between mileage and speed is not described very well by a straight line, but
is described well by a parabola.
(b) Because the relationship between mileage and speed is not described very well by a parabola, but is
described quite well by a straight line.
(c) Because the quadratic model has more parameters than the straight line model, it will always yield a
large Adjusted R^2.
(d) Both (a) and (c) are true.
Answer: (a)
A straight line doesn’t fit the upside-down U very well at all, so the correlation is very low: r=.005.
But a parabola fits very well, based on the fit of the quadratic model: R^2 = .710 and Adjusted R^2 = .698.
Option (c) is not true because Adjusted R^2 can go up or down as more predictors are added.
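The point about option (c) can be illustrated with the standard adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. This sketch assumes that is the formula behind the reported output; the "extra predictor" case is hypothetical and only shows the direction of the penalty.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Quadratic mileage model: 50 cars, 2 predictors (Speed, SpeedSquared)
adj_quadratic = adjusted_r_squared(0.710, 50, 2)

# Hypothetical: a third predictor that leaves R^2 unchanged at 0.710
adj_with_useless_predictor = adjusted_r_squared(0.710, 50, 3)

print(round(adj_quadratic, 3), round(adj_with_useless_predictor, 3))
```

With n = 50 and two predictors the formula reproduces the reported .698, and adding a predictor that does not raise R^2 lowers the adjusted value.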