MidTerm2F03

Stat 511
Fall 2003
Midterm 2
Statistics 511
Midterm 2
Nov. 24, 2003
The following rules apply.
1. You may use 3 sheets of paper for any information you need (double-sided, any font).
2. You may use a calculator.
3. You may not collaborate or copy.
4. Failure to comply with item 3 could lead to reduction in your grade, or
disciplinary action.
I have read the rules above and agree to comply with them.
Signature ________________________________________________
Name (printed) ___________________________________________
1. (31/50 points) This problem refers to the additional handout "Computer Output for
Problem 1". All the information needed to answer computational problems is available on
the output.
Why animals sleep is one of the unanswered mysteries of biology. Among mammals, the
amount of sleep needed varies widely among species. One hypothesis is that the amount
of sleep needed depends on the body weight, brain weight, lifespan and gestation period
(number of days of pregnancy) of the animal. Another hypothesis is that animals that are
in danger due to predation are selected for shorter sleep periods, since they are more
likely to be attacked by predators while they are asleep.
A study collected data on 51 mammal species. The variables used included the average
number of hours of sleep per day, the average body weight, average brain weight,
maximum observed lifespan, average gestation time, and "danger", an estimate of how
likely the species was to suffer attack by predators while sleeping.
The distributions of body and brain weights were highly skewed, so the investigators took
the logarithms of these quantities.
They then regressed

SLEEP      the average number of hours of sleep per day

on

LBRAIN     LOG(average brain weight)
DANGER     danger of predation index
LBODYWT    LOG(average body weight)
LIFESPAN   maximum life span
GESTATE    average gestation length

a) Assuming that the model assumptions are met, what is the meaning of the p-value in
the ANOVA table? (That is, what hypothesis does it test, and what do you conclude for
these data?)
b) Is the effect of DANGER statistically significant when the other variables are in the
model? Justify your answer briefly.
c) Are the effects of LBODYWT, LIFESPAN and GESTATE statistically significant
when LBRAINWT and DANGER are in the model? Compute a statistical test.
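
As a study aid, the general form of the test asked for in part c) can be sketched as an extra-sum-of-squares (partial) F statistic comparing the two-predictor model with the five-predictor model. The degrees of freedom follow from the 51 species and the number of fitted coefficients. The two error sums of squares below are hypothetical placeholders, not values from the handout, and the function name `partial_f` is made up for this illustration.

```python
def partial_f(sse_reduced, sse_full, df_reduced, df_full):
    """Extra-sum-of-squares F statistic for two nested linear models."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


n = 51                 # number of species in the study
df_full = n - 6        # error d.f.: intercept + 5 predictors
df_reduced = n - 3     # error d.f.: intercept + log brain weight + DANGER

# Hypothetical error sums of squares, NOT taken from the computer output.
f_stat = partial_f(sse_reduced=520.0, sse_full=450.0,
                   df_reduced=df_reduced, df_full=df_full)

# The statistic is compared with an F distribution on (3, 45) d.f.
print(df_reduced - df_full, df_full, round(f_stat, 3))
```

With the real sums of squares from the output in place of the placeholders, the same function gives the statistic to compare against the F(3, 45) reference distribution.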
d) What is the variance inflation factor for DANGER, and what important information
does it give the investigator?
e) What is the most important feature you see on the partial leverage plot for DANGER?
f) What is the sign of the regression coefficient for DANGER? Explain how you arrived
at this answer.
Another investigator used the same data, but fitted only LBRAINWT and
DANGER. All of the remaining problems refer to this regression.
g) What is the ANOVA table from that regression, including the F-test and p-value?
Source    D.F.    SS    Mean Square    F    P-value
Model
Error
Total
h) Test whether or not DANGER is statistically significant when only LBRAINWT is in
the model.
i) Is the variance inflation factor for LBRAINWT in this model greater than 10? Justify
your response without computing the variance inflation factor. (You do not have enough
information to make this computation, but you do have enough to answer this question.)
j) The highest leverage value for the model with LBRAIN and DANGER is 0.145, for the
smallest mammal in the sample (Lesser short-tailed shrew). Is the lesser
short-tailed shrew a high leverage point for this regression? Briefly justify your answer.
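
One common way to approach part j) is to compare the leverage with a rule-of-thumb cutoff based on the average leverage p/n, where p counts the fitted coefficients including the intercept. The sketch below assumes the widely used 2p/n and 3p/n cutoffs. The course may prefer one of them or a different convention, so treat the choice of cutoff as an assumption.

```python
n = 51          # species in the sample
p = 3           # coefficients: intercept, log brain weight, DANGER
h_max = 0.145   # largest leverage, as stated in the question

mean_leverage = p / n      # leverages always average to p/n
cutoff_2p = 2 * p / n      # a common "high leverage" threshold
cutoff_3p = 3 * p / n      # a stricter threshold used by some texts

print(round(mean_leverage, 4), round(cutoff_2p, 4), round(cutoff_3p, 4))
print(h_max > cutoff_2p, h_max > cutoff_3p)
```

Under these assumptions the leverage of 0.145 lies above 2p/n but below 3p/n, so the conclusion depends on which cutoff is adopted.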
k) The lesser short-tailed shrew has an average log(brainwt) of -1.97 and a danger index
of 4. The estimated regression equation, where SLEEP is measured in hours, is
PREDICTED SLEEP = 17.79 - 0.922 LBRAIN - 1.71 DANGER
Compute a 95% confidence interval for the estimated number of hours of sleep for this
species.
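
The arithmetic behind part k) can be sketched as follows, assuming the interval wanted is for the mean response, so that the standard error is sqrt(MSE x leverage) with the leverage of 0.145 quoted in part j). The mean squared error must be read from the computer output; the value used here is a hypothetical placeholder. The t critical value for 48 error degrees of freedom is entered as an approximate constant rather than computed.

```python
import math


def predicted_sleep(lbrain, danger):
    """Fitted equation quoted in the question (hours of sleep per day)."""
    return 17.79 - 0.922 * lbrain - 1.71 * danger


def mean_response_ci(y_hat, leverage, mse, t_crit):
    """Confidence interval for the mean response at a design point."""
    half_width = t_crit * math.sqrt(mse * leverage)
    return y_hat - half_width, y_hat + half_width


y_hat = predicted_sleep(lbrain=-1.97, danger=4)

T_CRIT_48 = 2.0106        # approximate 0.975 quantile of t with 51 - 3 = 48 d.f.
mse_placeholder = 10.0    # HYPOTHETICAL; replace with the MSE from the output

lower, upper = mean_response_ci(y_hat, leverage=0.145,
                                mse=mse_placeholder, t_crit=T_CRIT_48)
print(round(y_hat, 3), round(lower, 3), round(upper, 3))
```

Only the point prediction (about 12.77 hours) is determined by the question text alone; the interval endpoints change once the actual MSE is substituted.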
2. (10/50 points) This problem refers to the additional handout "Computer Output for
Problem 2". All the information needed to answer computational problems is available on
the output.
The stopping speed of cars is determined by a number of factors including weight. In a
1983 study, data on stopping time and weight were compiled from 406 models of car
manufactured during the previous 10 years.
a) A plot of stopping time versus weight with a loess curve is included in the output.
Based on this plot, what degree polynomial appears to be adequate to fit the data?
Briefly explain your answer.
b) The investigator decided to fit a 7th degree polynomial to the data. To avoid
multicollinearity he first centered weight by subtracting the mean. The residuals from
this model appear to fit the Normal regression model. Use sequential testing (without
pooling) to determine an appropriate degree for the polynomial.
c) The investigator consulted a statistician, who insisted that a degree 3 polynomial
would provide adequate fit to these data. Based on the model R^2 and the residual plot
provided, is there evidence that the 3rd degree polynomial does not fit? Support your
answer with 3 pieces of evidence.
d) The investigator was surprised that when the 7th degree polynomial was fitted, the
cubic term was highly statistically significant, but when the 3rd degree polynomial was
fitted, the 3rd degree term was not statistically significant. Briefly explain why this
might occur.
3. (9/50 points)
Suppose we observe Y1, Y2, Y3, which are random variables with
$E(Y_i) = \mu$
$\mathrm{Var}(Y_i) = \sigma^2$
$\mathrm{Cov}(Y_i, Y_j) = \sigma^2 \rho^{|i-j|}$

We will do this problem only for n=3.

a) Write the mean vector and variance matrix of the vector Y.
b) Let Z1=Y1, Z2=Y2-Y1, Z3=Y3-Y2.
Find a 3x3 matrix A so that AY=Z, where Y = (Y1, Y2, Y3)' and Z = (Z1, Z2, Z3)'.
c) Compute E(Z)
d) Compute Var(Z2)
e) Compute Covariance(Z1, Z3)
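
For checking work on parts b) to e), the rule Var(AY) = A Var(Y) A' can be sketched numerically. The sketch assumes the covariance structure is Cov(Yi, Yj) = sigma^2 * rho^|i-j| and uses hypothetical values sigma^2 = 1 and rho = 0.5, which are not part of the problem. The helper functions are written out in plain Python so that no external library is needed.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


sigma2, rho = 1.0, 0.5    # HYPOTHETICAL values, chosen only for illustration

# Variance matrix of Y under the assumed sigma^2 * rho^|i-j| structure.
cov_y = [[sigma2 * rho ** abs(i - j) for j in range(3)] for i in range(3)]

# Rows encode Z1 = Y1, Z2 = Y2 - Y1, Z3 = Y3 - Y2.
A = [[1, 0, 0],
     [-1, 1, 0],
     [0, -1, 1]]

cov_z = matmul(matmul(A, cov_y), transpose(A))

var_z2 = cov_z[1][1]       # equals 2 * sigma2 * (1 - rho) here
cov_z1_z3 = cov_z[0][2]    # equals sigma2 * (rho**2 - rho) here
print(var_z2, cov_z1_z3)
```

The numerical entries can then be compared with the symbolic expressions obtained by hand for general sigma^2 and rho.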