Exam3

Exam 3 (100pts)
Stat 3500 (W07)
Instructor: Lin, Xiaoyan
Name:________________
Section: _________
Note: please show all of your work to get partial credit. Good luck!
Note: in the intermediate steps of your calculations, keep at least 3
decimal places.
1.(42pts) Suppose we want to predict intelligence level (y) based on brain size (x1),
person’s height (x2, in inches), person’s weight (x3, in lbs). Use the output below
(when necessary) to answer the following questions.
Predictor          Coef   SE Coef      T      P
Constant          14.63     44.30   0.33  0.743
Brain Si (x1)    1.4409    0.5706   2.53  0.016
Height (x2)    -0.01625   0.03356  -0.48  0.631
Weight (x3)     -0.2130    0.1797  -1.18  0.244

S = 21.5728   R-Sq = 18.9%   R-Sq(adj) = 11.7%

Source           DF       SS      MS     F
Regression        3   3571.4  1190.5  ____
Residual Error   34  15823.1   465.4
Total            37  18894.6
Predicted Values for New Observations (x1=90, x2=72, x3=180)
Fit      SE Fit   95% CI             95% PI
104.81   6.46     (91.68, 117.93)    (_____, _____)
(a).(5pts) Write the proposed model.
(b).(5pts) Report the least squares prediction equation.
(c).(6pts) Show that weight (x3) is not useful using either a 95% confidence
interval or using a hypothesis test with significance level 0.05. If you use CI, be
sure to explain briefly why weight is not useful.
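As a rough check on the confidence-interval route in part (c), the arithmetic can be sketched in Python. The coefficient and its standard error are read from the output above; `T_CRIT = 2.032` is an assumed t-table value for 34 residual degrees of freedom (two-sided, 95%) and does not appear in the output, and `coef_interval` is a hypothetical helper name.

```python
# Sketch: 95% confidence interval for the weight coefficient (x3).
# COEF and SE_COEF come from the regression output; T_CRIT is an assumed
# t-table value for 34 residual degrees of freedom (two-sided, 95%).
COEF = -0.2130
SE_COEF = 0.1797
T_CRIT = 2.032  # assumption: look this up in a t table for df = 34


def coef_interval(coef, se, t_crit):
    """Return (lower, upper) for coef +/- t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


lower, upper = coef_interval(COEF, SE_COEF, T_CRIT)
contains_zero = lower < 0 < upper
print(f"95% CI for beta3: ({lower:.3f}, {upper:.3f})")
print("Interval contains 0:", contains_zero)
```

An interval that contains 0 means the data are consistent with a weight coefficient of zero, which is the brief explanation the question asks for.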
(d).(8pts) Test the overall usefulness of the model. Use a significance level of 0.10.
Ho:
Ha:
Test Statistic:
Rejection Region:
Conclusion:
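The blank F entry in the ANOVA table is the ratio of the two mean squares. A minimal sketch of that calculation follows; `global_f` and `reject_h0` are hypothetical helper names, and the critical value for the rejection region is deliberately left as an argument to be read from an F table (alpha = 0.10, 3 and 34 degrees of freedom).

```python
# Sketch: global F statistic from the ANOVA mean squares in the output.
MS_REGRESSION = 1190.5
MS_ERROR = 465.4
DF_NUM, DF_DEN = 3, 34


def global_f(ms_reg, ms_err):
    """F statistic for H0: beta1 = beta2 = beta3 = 0."""
    return ms_reg / ms_err


def reject_h0(f_value, f_crit):
    """Reject H0 when the statistic exceeds the table critical value."""
    return f_value > f_crit


f_stat = global_f(MS_REGRESSION, MS_ERROR)
print(f"F = {f_stat:.3f} on ({DF_NUM}, {DF_DEN}) degrees of freedom")
```

Calling `reject_h0(f_stat, f_crit)` with the table value for `f_crit` gives the decision for the conclusion line.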
(e).(6pts) Calculate a 95% interval for the intelligence level of a student who has a
brain size of 90, a height of 72 inches, and a weight of 180 lbs. Hint: the interval of
interest is the blank one in the output!
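One common way to fill in the blank interval is the formula fit ± t·sqrt(s² + (SE Fit)²), which widens the confidence interval by the residual variance. The sketch below assumes that formula and an assumed t-table value `T_CRIT = 2.032` for 34 degrees of freedom; the helper names are hypothetical. As a sanity check, the same critical value reproduces the 95% CI printed in the output.

```python
import math

# Values read from the output for x1=90, x2=72, x3=180.
FIT = 104.81
SE_FIT = 6.46
S = 21.5728
T_CRIT = 2.032  # assumption: t-table value for df = 34, two-sided 95%


def confidence_interval(fit, se_fit, t_crit):
    """Interval for the mean response: fit +/- t * SE Fit."""
    half = t_crit * se_fit
    return fit - half, fit + half


def prediction_interval(fit, se_fit, s, t_crit):
    """Interval for one new response: fit +/- t * sqrt(s^2 + SE Fit^2)."""
    half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return fit - half, fit + half


ci_low, ci_high = confidence_interval(FIT, SE_FIT, T_CRIT)
pi_low, pi_high = prediction_interval(FIT, SE_FIT, S, T_CRIT)
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"95% PI: ({pi_low:.2f}, {pi_high:.2f})")
```

The prediction interval is always wider than the confidence interval at the same point, because it accounts for the scatter of an individual observation around the mean.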
(f).(6pts) If you could remove one predictor (x-variable) from the model,
which one would it be, and why?
(g).(6pts) Suppose we add another predictor to the model. What happens to $R^2$?
2. (20pts) Consider a fuel consumption problem in which a natural gas company
wishes to predict weekly fuel consumption (y) for its city. We wish to predict y
on the basis of average hourly temperature (x1) and the chill index (x2). Data was
collected for eight weeks and two models were proposed.
Model 1: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Source           DF       SS       MS      F      P
Regression        5  25.1889  5.03778  28.00  0.025
Residual Error    2   0.3598   0.1799
Total             7  25.5488

Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$

Source           DF      SS      MS      F      P
Regression        2  24.875  12.438  92.30  0.000
Residual Error    5   0.674   0.135
Total             7  25.549
(a).(4pts) Set up the null and alternative hypotheses for testing which model is better.
Ho: ___________________
Ha: _____________________
(b).(6pts) Perform the test corresponding to your hypotheses from part a using α = 0.10.
Test Statistic:
Rejection Region:
Conclusion & Interpretation:
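The comparison of a complete model with a reduced (nested) model is usually done with a partial F statistic. The sketch below assumes that approach, with Model 1 as the complete model and Model 2 as the reduced one, so that three terms (those with beta2, beta3, and beta5) are being tested. `nested_f` is a hypothetical helper name.

```python
# Sketch: partial (nested) F statistic comparing Model 2 with Model 1.
SSE_REDUCED = 0.674     # Residual Error SS, Model 2
SSE_COMPLETE = 0.3598   # Residual Error SS, Model 1
DF_ERROR_COMPLETE = 2   # Residual Error DF, Model 1
NUM_TESTED = 3          # beta2, beta3, beta5 appear only in Model 1


def nested_f(sse_reduced, sse_complete, num_tested, df_error_complete):
    """F = [(SSE_R - SSE_C) / (# betas tested)] / [SSE_C / error df of C]."""
    numerator = (sse_reduced - sse_complete) / num_tested
    denominator = sse_complete / df_error_complete
    return numerator / denominator


f_stat = nested_f(SSE_REDUCED, SSE_COMPLETE, NUM_TESTED, DF_ERROR_COMPLETE)
print(f"F = {f_stat:.3f} on ({NUM_TESTED}, {DF_ERROR_COMPLETE}) degrees of freedom")
```

The statistic is then compared with the F-table critical value for alpha = 0.10 with 3 numerator and 2 denominator degrees of freedom.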
(c).(10pts) Use the adjusted $R^2$ ($R_a^2$) criterion to decide which model is better. (Hint: use the
outputs to calculate $R_a^2$ for each model first.)
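The adjusted R-squared can be computed directly from the ANOVA tables as 1 − [SSE/(n − k − 1)] / [SST/(n − 1)], where k is the number of predictors in the model. A minimal sketch, assuming n = 8 weeks and taking SSE and SST from each model's output (`adjusted_r2` is a hypothetical helper name):

```python
# Sketch: adjusted R-squared for each model from its ANOVA table.
N = 8  # eight weeks of data


def adjusted_r2(sse, sst, n, k):
    """1 - [SSE / (n - k - 1)] / [SST / (n - 1)], with k predictors."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


ra2_model1 = adjusted_r2(sse=0.3598, sst=25.5488, n=N, k=5)
ra2_model2 = adjusted_r2(sse=0.674, sst=25.549, n=N, k=2)
print(f"Model 1 adjusted R^2: {ra2_model1:.4f}")
print(f"Model 2 adjusted R^2: {ra2_model2:.4f}")
```

Under this criterion the model with the larger adjusted value is preferred, since the adjustment penalizes extra terms that do not reduce SSE enough.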
3.(20pts) Suppose a golfer has decided to keep a log of his scores from various
courses in Columbia. This golfer is interested in building a regression model to
estimate his average score (y) based on what golf course (L.A. Nickell, Lake of
the Woods, A.L. Gustin) he is playing. Use the following output to answer the
questions given below. NOTE: Each time he played was on a different day
(independent of each other).
        Lake of the Woods   L.A. Nickell   A.L. Gustin
        75                  79             80
        74                  82             82
        79                  77             84
        75                  78             80
Avg.    75.75               79.00          81.50
(a).(8pts) Propose a model for estimating the golfer’s score based on the course he
is playing. Be sure to define any indicator variables, if any.
E(y) =
(b).(8pts) Calculate the least squares line (prediction equation) by hand.
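With one qualitative predictor coded by indicator variables, the least squares estimates reduce to differences of group means: the intercept is the mean of the base level, and each indicator's coefficient is that level's mean minus the base mean. The sketch below assumes A.L. Gustin is chosen as the base level; that choice is arbitrary and only changes which differences are reported.

```python
from statistics import mean

# Scores from the golfer's log, by course.
scores = {
    "Lake of the Woods": [75, 74, 79, 75],
    "L.A. Nickell": [79, 82, 77, 78],
    "A.L. Gustin": [80, 82, 84, 80],
}
BASE = "A.L. Gustin"  # assumption: base level (both indicators equal 0)

course_means = {course: mean(values) for course, values in scores.items()}

# Intercept = base-level mean; each slope = level mean - base-level mean.
beta0 = course_means[BASE]
betas = {
    course: m - beta0 for course, m in course_means.items() if course != BASE
}

print(f"beta0 (mean at {BASE}): {beta0}")
for course, value in betas.items():
    print(f"coefficient for indicator '{course}': {value}")
```

Choosing a different base level gives a different but equivalent prediction equation with the same three fitted means.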
(c).(4pts) Set up the null and alternative hypotheses for testing whether
there are significant differences among the courses.
4.(12pts) Consider the second-order interaction model:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$,
where $x_1$ is a quantitative variable and $x_2 = 1$ if level 1, $x_2 = 0$ if level 2.
The resulting least squares prediction equation is
$\hat{y} = 48.8 \;[?]\; 3.4 x_1 \;[?]\; .07 x_1^2 \;[?]\; 2.4 x_2 \;[?]\; 3.7 x_1 x_2 \;[?]\; .02 x_1^2 x_2$
(a).(8pts) Write down the separate prediction equations for each level.
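The separate equations come from substituting each value of the indicator into the model. As a worked sketch in terms of the model parameters, assuming the model reads $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$ with $x_2 = 1$ for level 1 and $x_2 = 0$ for level 2:

```latex
\begin{align*}
\text{Level 1 } (x_2 = 1):\quad E(y) &= (\beta_0 + \beta_3) + (\beta_1 + \beta_4)\,x_1 + (\beta_2 + \beta_5)\,x_1^2 \\
\text{Level 2 } (x_2 = 0):\quad E(y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_1^2
\end{align*}
```

The prediction equations have the same form, with each $\beta$ replaced by its estimate from the fitted equation.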
(b).(4pts) Suppose x1 = 1. What is the predicted y value for level 2?
5.(6pts) Propose an appropriate model based on the following plot of the
response variable y against the predictor x.
[Figure: plot of y vs. x; x-axis ticks from 2 to 10, y-axis ticks from 0 to 20]