Statistics 401C
November 6, 2001
Exam 2
Name:
INSTRUCTIONS: Read the questions carefully and completely. Answer each question and show work in the space provided. Partial credit will not be given if work is not shown. When asked to explain, describe, or comment, do so within the context of the problem.
1. [40 pts] A server at a local restaurant wants to predict the tip from the cost of a meal. The server writes down the meal cost and tip for 10 customers selected at random. The data appear below.
Customer             1      2      3      4      5      6      7      8      9     10
Meal Cost ($)     4.58   5.75   6.14   8.22   5.20   6.80  19.15   4.25   1.25   7.27
Tip ($)           0.70   0.85   0.90   1.25   0.80   1.00   0.85   0.60   0.25   1.10
Using simple linear regression of Tip on Meal Cost the following prediction equation is obtained.
Pred Tip = 0.66 + 0.025*Meal Cost
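As a check on the prediction equation, the least-squares fit can be recomputed from the ten (Meal Cost, Tip) pairs. The sketch below is not part of the original exam: it uses plain Python rather than JMP, and the helper name fit_line is hypothetical.

```python
# Meal cost and tip for the 10 customers, copied from the data table.
meal_cost = [4.58, 5.75, 6.14, 8.22, 5.20, 6.80, 19.15, 4.25, 1.25, 7.27]
tip = [0.70, 0.85, 0.90, 1.25, 0.80, 1.00, 0.85, 0.60, 0.25, 1.10]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(meal_cost, tip)
print(f"Pred Tip = {intercept:.2f} + {slope:.3f}*Meal Cost")
```

After rounding, the printed coefficients should agree with the prediction equation given above.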
(a) [6] Give interpretations of the estimated intercept and slope coefficients.
(b) [3] Compute the predicted value and residual for customer 7. Round to 2 decimal places.
(c) [4] The MS Error = 0.0755 and the leverage for customer 7 is h = 0.851. Compute the studentized residual for customer 7.
(d) [5] Is the studentized residual for customer 7 statistically significant? Use an overall α = 0.10 and control for the fact that you could do multiple tests, one for each customer.
(e) [5] Is the leverage for customer 7 statistically significant? Support your answer and again use an overall α = 0.10 and control for the fact that you could do multiple tests, one for each customer.
(f) [4] What is Cook’s D for customer 7? What does this value indicate?
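The quantities in parts (b), (c), and (f) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exam's answer key: it assumes the internally studentized residual e / sqrt(MSE(1 - h)) and the usual form of Cook's D, r^2/(k + 1) * h/(1 - h), with k + 1 = 2 coefficients for simple linear regression. The function names are hypothetical, and JMP may round intermediate values differently.

```python
import math


def studentized_residual(residual, ms_error, leverage):
    """Internally studentized residual: e / sqrt(MSE * (1 - h))."""
    return residual / math.sqrt(ms_error * (1.0 - leverage))


def cooks_distance(student_resid, leverage, n_coefficients=2):
    """Cook's D from a studentized residual r and leverage h.

    D = r^2 / (k + 1) * h / (1 - h), where k + 1 is the number of
    estimated coefficients (2 for a simple linear regression).
    """
    return student_resid ** 2 / n_coefficients * leverage / (1.0 - leverage)


# Customer 7, using the values stated in the problem.
meal_cost_7, tip_7 = 19.15, 0.85
predicted_7 = round(0.66 + 0.025 * meal_cost_7, 2)   # rounded as in part (b)
residual_7 = round(tip_7 - predicted_7, 2)

r_7 = studentized_residual(residual_7, ms_error=0.0755, leverage=0.851)
d_7 = cooks_distance(r_7, leverage=0.851)
print(predicted_7, residual_7, round(r_7, 2), round(d_7, 2))
```

Working from the rounded residual mirrors the instruction in part (b); carrying full precision would change the later digits slightly.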
The data for customer 7 is removed and a simple linear regression is refit for the remaining 9 customers. Below is information on this new fit. Also refer to the JMP output on the studentized residuals.
Pred Tip = 0.04 + 0.1435*Meal Cost
Customer             1      2      3      4      5      6      8      9     10
Meal Cost ($)     4.58   5.75   6.14   8.22   5.20   6.80   4.25   1.25   7.27
Tip ($)           0.70   0.85   0.90   1.25   0.80   1.00   0.60   0.25   1.10
Pred Tip          0.70   0.86   0.92   1.22   0.79   1.01   0.65   0.22   1.08
Residual          0.00  −0.01  −0.02   0.03   0.01  −0.01  −0.05   0.03   0.02
Studentized       0.14  −0.53  −0.76   1.34   0.54  −0.57  −1.87   1.87   0.69
h                 0.14   0.11   0.12   0.33   0.11   0.16   0.16   0.65   0.21
Cook’s D          0.00   0.02   0.04   0.45   0.02   0.03   0.33   3.27   0.06
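The refit without customer 7 can be reproduced from the remaining nine data pairs. The sketch below is a rough check in plain Python, not the JMP analysis itself. It assumes the simple-linear-regression leverage formula h = 1/n + (x - x̄)²/Sxx, and it takes the tip for customer 8 as 0.60, the value in the original ten-customer data.

```python
# Nine remaining customers (customer 7 dropped).
customers = [1, 2, 3, 4, 5, 6, 8, 9, 10]
meal_cost = [4.58, 5.75, 6.14, 8.22, 5.20, 6.80, 4.25, 1.25, 7.27]
tip = [0.70, 0.85, 0.90, 1.25, 0.80, 1.00, 0.60, 0.25, 1.10]

n = len(meal_cost)
x_bar = sum(meal_cost) / n
y_bar = sum(tip) / n
sxx = sum((x - x_bar) ** 2 for x in meal_cost)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(meal_cost, tip))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Leverage for simple linear regression: h = 1/n + (x - x_bar)^2 / Sxx.
leverage = [1.0 / n + (x - x_bar) ** 2 / sxx for x in meal_cost]

print(f"Pred Tip = {intercept:.2f} + {slope:.4f}*Meal Cost")
for c, x, y, h in zip(customers, meal_cost, tip, leverage):
    predicted = intercept + slope * x
    print(f"customer {c:2d}: pred = {predicted:.2f}, "
          f"residual = {y - predicted:+.2f}, h = {h:.2f}")
```

Small differences in the last printed digit relative to the table are possible, since the table was presumably produced from unrounded JMP output.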
(g) [5] Describe the distribution of studentized residuals. Are the conditions necessary for the regression analysis satisfied?
(h) [2] Do any of the h values exceed 2(k + 1)/n? If so, who are the associated customers?
(i) [2] Do any of the Cook’s D values exceed 1? If so, who are the associated customers?
(j) [4] Why do the customers identified in (h) and (i) exhibit high leverage and high influence?
(k) Extra Credit [5] For the analysis with all 10 customers, what is the jackknife residual for customer 7?
2. [25 pts] Data are collected on the selling price (pounds sterling) and age (100 to 200 years) of 32 antique grandfather clocks. One wishes to predict the selling price from the age. There is an extra variable, the number of bidders (5 to 15) at the auction. Refer to the output on the various models relating selling price to age, number of bidders and a cross product term.
(a) [3] Look at the output for the simple linear regression of selling Price on Age. Give the prediction equation and value of R².
(b) [3] Look at the output for the multiple regression of selling Price on Age and number of Bidders. Give the prediction equation and value of R².
(c) [5] From the information in (a) and (b) are Age and number of Bidders confounded? Explain briefly.
(d) [3] Look at the output for the multiple regression of selling Price on Age, number of Bidders and a cross product term (Age*Bidders). Give the prediction equation and value of R².
(e) [5] Is there significant interaction? Support your answer.
(f) [6] On the graph below, plot the regression lines for the relationship between selling Price and Age for an auction with 10 Bidders and an auction with 15 Bidders. It must be clear to me what equations you are using to plot your lines.
3. [35 pts] A production plant is looking into cost reduction. One of the most costly items is the amount of water used each month. The relationship between water usage (Water) and four other variables is investigated: average monthly temperature (Temperature), amount of production (Production), number of days the plant operates that month (Days) and the number of persons on the monthly payroll (Persons). Refer to the JMP output accompanying this problem.
(a) [5] For the simple linear regression of Water on Production what is the value of R² and is the relationship statistically significant? Support your answer using α = 0.05.
(b) [5] For the multiple regression of Water on Production and Temperature what is the increase in R² from the simple linear regression in (a)? Does adding Temperature add significantly to the explanatory power of the model? Support your answer using α = 0.05.
(c) [5] Is there a significant linear relationship between Water usage and Temperature (ignoring all other variables)? Support your answer using α = 0.05.
(d) [4] Looking at the full model (Water regressed on Production, Temperature, Days and Persons) does the model have significant explanatory power? Support your answer using α = 0.05.
(e) [6] In the full model, which (if any) of the individual variables add significant explanatory power? Since you will be doing four simultaneous tests you should adjust the overall significance level of 0.05 to account for these multiple tests.
(f) [6] Does dropping both Temperature and Days from the full model result in a significant reduction in explanatory power? Support your answer using α = 0.05.
(g) [4] Are Persons and Production collinear? Support your answer.
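Parts (b) and (f) compare nested models, which is commonly done with a partial F statistic built from the two R² values. The sketch below shows the arithmetic only. The function name partial_f and every number passed to it are hypothetical, since the JMP output is not reproduced here; the real R² values and sample size must be read from that output.

```python
def partial_f(r2_full, r2_reduced, n_obs, k_full, q_dropped):
    """Partial F statistic for dropping q_dropped variables from a full model.

    F = [(R2_full - R2_reduced) / q] / [(1 - R2_full) / (n - k - 1)],
    where k is the number of explanatory variables in the full model.
    Returns (F, numerator df, denominator df).
    """
    df_error = n_obs - k_full - 1
    numerator = (r2_full - r2_reduced) / q_dropped
    denominator = (1.0 - r2_full) / df_error
    return numerator / denominator, q_dropped, df_error


# Hypothetical inputs for illustration only (NOT taken from the JMP output).
f_stat, df_num, df_den = partial_f(
    r2_full=0.80, r2_reduced=0.70, n_obs=20, k_full=4, q_dropped=2
)
print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) degrees of freedom")
```

The resulting F would then be compared with the F critical value at α = 0.05 for the stated degrees of freedom.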