Solutions to PQ2

Solutions to practice problems for Exam 2
1. a) β0, β1 and σ² are the parameters; b0, b1 and S² are the respective estimates. Zero is not a
parameter, it is just a number, and ε is a random variable.
2. b) The correct answer is R²; the other options, r and F among them, do not give the proportion of
variation explained by the regression model.
3. c) We have been using Spooled = √MSE, so S² = MSE and S² is an estimator of σ².
4. c) The df for the t-test = df for error = n – (number of parameters estimated) = n – (p+1)
5. b) You will fail to reject H0 when α = 0.05. When α gets smaller, the confidence
level = (1 – α)×100% becomes larger. The larger the confidence level, the wider the confidence
interval will be; hence the CI will still include zero, so we will fail to reject H0 for any α ≤ 0.05.
6. b) In this case, the predicted value of Y will be equal to the sample mean of the y’s, i.e., ŷ = ȳ for all X
values. So X is not a good predictor of Y; we do not know if there is a quadratic relation or not
(we need to see a plot of the residuals for that). We cannot say “there is NO RELATIONSHIP” since
we did not check for other types of relations. Finally, the predictor is X, not Y (Y is the response).
7. d) You need to draw a scatter diagram which indicates that a quadratic fit will be better than a
linear fit. The coefficient of X² should be positive (why?).
8. d) We are looking for association, not causation, between X and Y, so (a) or (b) cannot be true.
Since Y depends on X, a change in X will be associated with a change in Y. Also, (c) cannot be true
since the question states that “X is found to be highly significant.”
9. a) Remember the additional term (1) in the SE for prediction, which makes it larger; hence the PI is
always wider than the CI for the mean response at the same x value.
10. a) Note the (x* – X̄)² in the formula for the SE. The closer x* is to the sample mean, the smaller
the SE will be, and hence both the PI and the CI will be narrower as x* gets closer to the sample mean. [In
an old version of the notes, the square on (x* – X̄) did not appear. Please correct it.]
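The points made in items 9 and 10 can be checked numerically. The sketch below assumes the usual simple linear regression formulas, SE(mean response) = s√(1/n + (x* – x̄)²/Sxx) and SE(prediction) = s√(1 + 1/n + (x* – x̄)²/Sxx). The values of s, n, x̄ and Sxx are made up for illustration and do not come from any problem on the practice exam.

```python
import math

def se_mean_response(s, n, x_star, x_bar, sxx):
    """SE behind the CI for the mean response at x_star."""
    return s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)

def se_prediction(s, n, x_star, x_bar, sxx):
    """SE behind the PI for a new observation at x_star (note the extra 1)."""
    return s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)

# Hypothetical summary values, chosen only for illustration.
s, n, x_bar, sxx = 2.0, 25, 10.0, 400.0

for x_star in (10.0, 15.0, 20.0):
    ci_se = se_mean_response(s, n, x_star, x_bar, sxx)
    pi_se = se_prediction(s, n, x_star, x_bar, sxx)
    print(f"x* = {x_star}: SE(mean) = {ci_se:.3f}, SE(prediction) = {pi_se:.3f}")
```

Both standard errors grow as x* moves away from x̄, and the prediction SE is larger at every x*.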
11. b) The slope gives the average change in Y for a one unit increase in X; hence the average change in Y will
always be β1 whenever X changes by one unit from any value. It does not depend on the value of
Y or the error term, and β0 does not affect the change in Y.
12. c) No interaction means the regression lines are parallel, i.e., they have the same slope. The
intercepts may or may not be different. Thus there must be a single slope, but there CAN be
more than one intercept.
13. a) When two or more predictors are highly correlated, they do not give any extra information
about Y and hence the information provided by such predictors is redundant, not complementary.
However, we cannot say that ALL of these predictors are bad (useless), we may need one of them
in our model.
14. Using the estimation formula (which you should have learned in STA 2023 and which will not be given in this
test) we obtain
b = r × (SY / SX) = (–0.774) × (54.2 / 16.2) = –2.589555556 and
a = ȳ – b x̄ = 874.1 – (–2.589555556)(163.5) = 1297.492333.
Hence the prediction equation is ŷ = 1297.49 – 2.59 X.
15. (i) d. When one of the predictors has a curvilinear relationship with the response we use a
quadratic regression model.
15. (ii) m. Influential points are those that can change the direction of the association.
15. (iii) f. We use residual plots to check the assumptions of the regression models.
15. (iv) c. Adjusted R² is used when comparing different regression models, when the number of
predictors in them is different.
15. (v) e. Interaction is used when the effect of one predictor on Y depends on other predictors.
15. (vi) k. R² gives the proportion of variation in Y explained by the regression model.
15. (vii) l. Residual = y – ŷ = observed value of y minus the predicted value for a given value of x.
15. (viii) n. An outlier is a point that lies far from other observed points.
15. (ix) b. Extrapolation (predicting y for a value of X outside the range of observed values of X) can
give bad predictions if the conditions (linear relation between X and Y) do not hold for the
values outside the range of observed values of X.
15. (x) i. Cause and effect can be erroneously assumed in an observational study.
15. (xi) j. This is how the multiple linear regression model is written.
15. (xii) g. This is how the fitted (or prediction) equation is shown in a multiple linear regression
model.
15. (xiii) a. A multicollinearity problem can occur when the information provided by several predictors
overlaps (i.e., they are highly correlated).
15. (xiv) h. Dummy variables are used in a regression model to represent categorical variables.
16. b) Since the scatter diagram shows no relation between length of life-line and age, correlation
between them should be close to zero.
17. b) With a correlation close to zero, we can say that the length of life-line is a poor predictor of age.
18. a) We expect F to be close to one since the length of life-line is not a good predictor of age at
death.
19. c) Since the length of life-line is not a good predictor of age at death, we might use the average
age at death as the prediction of the age at death.
20. b) The null hypothesis specifies that none of the predictors provides significant information
about the response and hence β1 = β2 = … = βp = 0.
21. d) The alternative hypothesis specifies that at least one of the predictors provides significant
information about the response and hence at least one βi ≠ 0.
22. d) The method of finding estimates of the coefficients of the regression model is called least
squares estimation because it minimizes the sum of the squared errors.
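To make "minimizes the sum of the squared errors" concrete, the sketch below fits a simple linear regression with the standard closed-form least squares formulas and shows that nudging the fitted coefficients only increases the SSE. The five data points are invented for illustration and are not from the practice exam.

```python
def least_squares(xs, ys):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(b0, b1, xs, ys):
    """Sum of squared errors of the line b0 + b1*x on the data."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {best:.4f}")

# Any other line has a larger SSE than the least squares line.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    print(db0, db1, round(sse(b0 + db0, b1 + db1, xs, ys), 4))
```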
23. c) This is a misuse of causation. More firefighters are sent to worse fires, which cause larger financial
damage, i.e., firefighters are not the cause of the financial damage; there is another
variable (size of fire) that influences both variables (financial damage and number of firefighters
dispatched).
24. a) In this problem response is the number of Elvis impersonators and the predictor is year.
Assuming this quotation is the result of a regression analysis, based on data between 1977 and
2007, we are talking about a prediction for the year 2012 which is outside the observed values of
the predictor. Hence here we are using extrapolation which does not give reliable results.
25. d) The response variable is sales, denoted by y and it is a quantitative variable. Hence the correct
answer is “all of the above.”
26. b) The slope is – 1418. This means as the price (predictor) increases by one dollar, sales will
decrease by $1418.
27. d) The intercept is the average value of the response (sales) when the predictor (price) is zero. But this
makes sense only when zero is within or close to the values of the predictor used for estimating the
coefficients. In this problem, a price of zero is not within the
values of price used in the analysis, and hence the intercept should NOT be interpreted.
28. c) R² gives the proportion of variability in the response (sales) accounted for (or explained by) the
predictor (price). Hence we can say that 59.70% of the variability in sales is accounted for (or explained)
by price.
29. The correlation coefficient is r = (sign of slope) × √(R² / 100) = –√(59.70 / 100) = –0.773, with R² expressed in percent.
There are two common mistakes in this type of problem. The first is to ignore the fact that a number has
both a positive and a negative square root; we decide on the sign by looking at the slope (slope and
correlation must have the same sign, why?). Hence r = +0.773 is NOT the correct answer. The
second common mistake concerns the meaning of % (per cent = divide by 100):
R² = 59.70% = 59.70 / 100 = 0.5970 and hence r = –√0.5970. So 7.73 or –7.73 are wrong.
You should also remember that r (the correlation coefficient) CANNOT be less than –1 or more
than +1.
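Both pitfalls in item 29 (forgetting the sign and forgetting to divide the percentage by 100) can be seen in a short calculation. This is a minimal sketch; the function name is hypothetical, and the slope of –1418 is the one quoted in item 26.

```python
import math

def correlation_from_r2(r2_percent, slope):
    """Recover r from R-squared (given in percent) and the sign of the slope."""
    r2 = r2_percent / 100.0                      # per cent means divide by 100
    return math.copysign(math.sqrt(r2), slope)   # r takes the sign of the slope

r = correlation_from_r2(59.70, -1418)
print(round(r, 3))  # -0.773
```

The result always lies between –1 and +1 because R², as a proportion, lies between 0 and 1.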
30. b) You are asked to predict sales when price = $1.10. Putting this value in the prediction equation
we get predicted-sales = 2259 – 1418×1.10 = $699.20
31. a) Yes, since the p-value is small, but the low R² says there is room for improvement and we may
add other predictors to the model to improve it. (The values and the signs of the slope and the
intercept do not give us any information about how good a predictor is, so b and d are wrong.)
32. c) The vertical axis of a residual plot is always the residuals (standardized or not). On the
horizontal axis we put the predictor (X) to see if a quadratic term in X may improve the model (it
does in this case).
33. c) It seems that a quadratic model will fit the data better, i.e., a simple linear regression model
does not seem to be the best model.
34. Both b and c are correct answers, with the understanding that in b, X1 = price and X2 = (price)², and
in c we let X = price and hence X² = (price)². Note that a is not the correct answer because it
ignores the quadratic term (which has a p-value < 0.0005 and hence is significant), and d means
adding a cubic term (X1X2 = (price)×(price)² = (price)³).
35. b) To find R² from the ANOVA we calculate the ratio of SSReg to SST:
R² = SSReg / SST = 16,060,569 / 19,911,800 = 0.8066 = 80.7%
36. d) The test statistic for testing if the coefficient of price² is significantly different from zero is T,
defined as T = coefficient/SE(coefficient). Hence Tcal = 3522.3/436.8 = 8.06.
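The two calculations in items 35 and 36 can be verified as follows. This is a minimal sketch using the sums of squares and the coefficient quoted above; the standard error is read as 436.8, which is the value that reproduces the stated T of 8.06.

```python
# Item 35: R-squared from the ANOVA table.
ss_reg = 16_060_569      # regression sum of squares (SSReg)
ss_total = 19_911_800    # total sum of squares (SST)
r_squared = ss_reg / ss_total

# Item 36: t statistic for the coefficient of price squared.
coef = 3522.3
se_coef = 436.8          # assumed reading of the standard error (see lead-in)
t_cal = coef / se_coef

print(f"R^2 = {r_squared:.4f} ({r_squared:.1%})")
print(f"T = {t_cal:.2f}")
```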
37. d) This can be seen from the sign of the coefficient for the quadratic term; b is wrong because it means
that as price increases so do sales, whereas we have a negative slope in the SLR model. If we have to
choose between c and d, we choose d (the quadratic model) since the p-values for both
coefficients are extremely small (<0.0005).
38. Since the prediction equation for the quadratic model is ŷ = b0 + b1 X + b2 X², we have
ŷ = 7990.0 – 10660 × 1.10 + 3522.3 × (1.10)² = 525.98
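The predictions in items 30 and 38 use the same price, $1.10, in two different fitted models, so they are easy to compare side by side. The sketch below assumes the coefficients quoted in those items, with the signs that reproduce the stated answers (699.20 and 525.98); the function names are illustrative.

```python
def predict_linear(price):
    """Simple linear model from item 30: sales = 2259 - 1418 * price."""
    return 2259 - 1418 * price

def predict_quadratic(price):
    """Quadratic model from item 38: sales = 7990.0 - 10660 * price + 3522.3 * price^2."""
    return 7990.0 - 10660 * price + 3522.3 * price ** 2

price = 1.10
print(f"linear:    {predict_linear(price):.2f}")
print(f"quadratic: {predict_quadratic(price):.2f}")
```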
39. c) We decide by looking at the p-value for the coefficient of the quadratic term. Since the p-value
is (almost) zero, we reject H0: β2 = 0. Hence the quadratic term should be in the model.
40. a) The coefficient for flyer is 804.12 and it is significantly different from zero as indicated by the
corresponding p-value. Hence, on the average sales will increase by 804 units when the product is
advertised by a flyer.
41. d) Since the p-value corresponding to display is large (0.558) this variable should not be in the
model and hence we cannot interpret the coefficient.
42. a) Since the p-values for both are small, we conclude that both price and price² are good predictors
of sales.
43. c) We must run the regression analysis one more time without display in the model. This may
change the estimates of the coefficients. We must also carry out an analysis of residuals to check
if the assumptions are satisfied and see if the model can be further improved (although it seems
satisfactory with the high R² values).
44. b) The combined effect of these two predictors on sales may be more than the sum of their
individual effects, i.e., there may be significant interaction.
45. d) Note that although this looks similar to b, the error term (ε) in b is wrong since it should not
appear in the equation of the fitted line (prediction equation).
46. b) The p-values of all predictors are very small, thus strongly supporting the alternative hypothesis
in ANOVA test.
47. d) The p-value reported in the output is for a two-sided alternative; we are asked for a one sided
alternative thus should take half of the reported p-value = 0.032/2 = 0.0016.
48. c) The p-value corresponding to papers is the smallest of all the p-values indicating a strong
support in explaining the variation in salary.
49. c) The 95% CI for β1 is b1 ± t × SE(b1) = 1.1031 ± t × 0.3596. [Note that 3.068 is NOT SE(b1); it
is the calculated value of the test statistic, Tcal, for testing H0: β1 = 0 vs. Ha: β1 ≠ 0.]
50. c) The df for the t is the df for error = n – p – 1 = 35 – 4 – 1 = 30.
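Items 49 and 50 fit together: the error df picks the t critical value, and the estimate and its SE give both the interval and the test statistic. The sketch below uses the numbers quoted in those items; the critical value 2.042 is the standard t-table entry for 30 df at 95% two-sided confidence, entered by hand as an assumption rather than computed.

```python
b1, se_b1 = 1.1031, 0.3596   # estimate and standard error from item 49
n, p = 35, 4                 # sample size and number of predictors from item 50

df_error = n - p - 1         # df for the t distribution
t_crit = 2.042               # t table value for df = 30, 95% two-sided (assumed, from a table)

t_cal = b1 / se_b1           # test statistic, NOT the standard error
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

print(f"df = {df_error}")
print(f"Tcal = {t_cal:.3f}")
print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the interval lies entirely above zero, in line with the large value of Tcal.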
51. d) We assume that the response (Y = Salary) has a normal distribution.
52. a) It seems that there is a very weak increasing linear relationship between age of mother and the
weight of baby.
53. d) All of the first three alternatives serve the same purpose, to help us determine if age of mother
is a significantly good predictor of the weight of baby.
54. b) Positive, since the linear relation seems to be increasing, but not significantly different from
zero since the points are far from the fitted line (and hence the correlation is close to zero).
55. d) Since there is no quadratic term (square of any of the X’s) in the model, it is not a quadratic
model (the product term is for interaction). It IS a multiple regression model with dummy
variables and uses least squares to estimate the parameters hence all of a, b and c are true
statements.
56. c) Since X2 = 0 for those who had prenatal care, that group is the baseline group.
57. d) When we put the values of X2 into the model, it simplifies to
y = β0 + β1 X1 + ε for the baseline (prenatal care) group and
y = (β0 + β2) + (β1 + β3) X1 + ε for the “no prenatal care” group.
Hence, β3 reflects the change in slope.
58. a) Using the simplified models in # 57, we see that β2 is the change in the intercept.
59. c) Using the simplified models in # 57, we see that β1 is the slope for the baseline group.
60. b) Using the simplified models in # 57, we see that β0 is the intercept for the baseline group.
61. b) We can see that the slopes of the lines for the two groups seem to be significantly different
from zero, since the (visually) fitted lines are not parallel to the x-axis. We are not so sure about
the interaction in c, there may or may not be interaction. Similarly, a is wrong since the coefficient
of age (slope) is different from zero and d is wrong since there is one correct statement.
62. a) We need to test for the change in slope since we are interested in the rate at which weight
increases; hence we will test H0: β3 = 0 vs. Ha: β3 ≠ 0.
63. a) Here we will use the estimates of the parameters in the second equation of # 57 since we are
interested in those who did not receive prenatal care (X2 = 1). So the prediction equation is
predicted wt = (1.84 – 1.79) + (0.53 – 0.003) X1 = 0.05 + 0.527 X1
64. b) Here we will use the estimates of the parameters in the first equation of # 57 since we are
interested in those who did receive prenatal care (X2 = 0). So the prediction equation is
predicted wt = 1.84 + 0.53 X1
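The way the dummy variable X2 splits one fitted model into two lines (items 57 to 64) can be sketched in code. The estimates below are those implied by items 63 and 64, with b2 = –1.79 and b3 = –0.003 taken as negative because that reproduces the stated line 0.05 + 0.527 X1; the function names are illustrative.

```python
# Estimates implied by items 63 and 64 (signs of B2 and B3 are an inference).
B0, B1, B2, B3 = 1.84, 0.53, -1.79, -0.003

def predict_weight(x1, no_prenatal_care):
    """Fitted interaction model: b0 + b1*X1 + b2*X2 + b3*X1*X2, with X2 a 0/1 dummy."""
    x2 = 1 if no_prenatal_care else 0
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def line_for_group(no_prenatal_care):
    """Intercept and slope of the simplified line for one group."""
    x2 = 1 if no_prenatal_care else 0
    return B0 + B2 * x2, B1 + B3 * x2

print("prenatal care (X2 = 0):   ", line_for_group(False))
print("no prenatal care (X2 = 1):", line_for_group(True))
```

Setting X2 = 0 leaves the baseline intercept and slope untouched, while X2 = 1 shifts the intercept by b2 and the slope by b3.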
65. c) Since the two fitted lines intersect, we should use a model with interaction.
66. d) In ANOVA we test H0: β1 = β2 = β3 = 0 vs. Ha: At least one of the β’s ≠ 0.
67. d) Since the p-value < 0.0005, we conclude that at least one of the β’s ≠ 0, i.e., at least one of the
variables (including the variable for interaction) in the model is a good predictor of weight.
68. a) We would like to find if a model with an interaction term is appropriate, so we must first test if
there is any significant interaction, i.e., test H0: β3 = 0 vs. Ha: β3 ≠ 0.
69. d) The p-value corresponding to the test H0: β3 = 0 vs. Ha: β3 ≠ 0 is 0.098. Thus we will reject H0
at α = 0.10 but not at smaller levels of significance such as 0.05 or 0.01. This is not strong support. Hence none of the
options in this question is the correct conclusion of the test.
70. a) Since there is very weak support for interaction, we will exclude it from the model and run the
analysis again without that variable in the model.
71. d) The p-values for height and gender are small, so we reject the null hypotheses which state that
the corresponding parameters are zero. However, the p-value for the intercept (Constant) is large.
So we fail to reject H0: β0 = 0. Hence, all of the statements in a, b and c are correct.
72. b) We have already eliminated the interaction from the model and also forced the regression line
to pass through the origin; hence the model has neither an intercept nor an interaction term,
leaving us with Weight = β1 Height + β2 Gender + ε