Stat 511 Fall 2003 Midterm 2

Statistics 511 Midterm 2
Nov. 24, 2003

The following rules apply.

1. You may use 3 sheets of paper for any information you need - double-sided, any font.
2. You may use a calculator.
3. You may not collaborate or copy.
4. Failure to comply with item 3 could lead to reduction in your grade, or disciplinary action.

I have read the rules above and agree to comply with them.

Signature ________________________________________________

Name (printed) ___________________________________________

1. (31/50 points) This problem refers to the additional handout "Computer Output for Problem 1". All the information needed to answer computational problems is available on the output.

Why animals sleep is one of the unanswered mysteries of biology. Among mammals, the amount of sleep needed varies widely among species. One hypothesis is that the amount of sleep needed depends on the body weight, brain weight, lifespan and gestation period (number of days of pregnancy) of the animal. Another hypothesis is that animals that are in danger due to predation are selected for shorter sleep periods, since they are more likely to be attacked by predators while they are asleep.

A study collected data on 51 mammal species. The variables used included the average number of hours of sleep per day, the average body weight, average brain weight, maximum observed lifespan, average gestation time, and "danger", an estimate of how likely the species was to suffer attack by predators while sleeping. The distributions of body and brain weights were highly skewed, so the investigators took logarithms of these quantities.
They then regressed

SLEEP      the average number of hours of sleep per day

on

LBRAIN     LOG(average brain weight)
DANGER     danger of predation index
LBODYWT    LOG(average body weight)
LIFESPAN   maximum life span
GESTATE    average gestation length

a) Assuming that the model assumptions are met, what is the meaning of the p-value on the ANOVA table? (i.e. what hypothesis does it test, and what do you conclude for these data?)

b) Is the effect of DANGER statistically significant when the other variables are in the model? Justify your answer briefly.

c) Are the effects of LBODYWT, LIFESPAN and GESTATE statistically significant when LBRAINWT and DANGER are in the model? Compute a statistical test.

d) What is the variance inflation factor for DANGER and what important information does it tell the investigator?

e) What is the most important feature you see on the partial leverage plot for DANGER?

f) What is the sign of the regression coefficient for DANGER? Explain how you arrived at this answer.

Another investigator used the same data, but fitted only LBRAINWT and DANGER. All of the remaining problems refer to this regression.

g) What is the ANOVA table from that regression, including the F-test and p-value?

Source    D.F.    SS    Mean Square    F    P-value
Model
Error
Total

h) Test whether or not DANGER is statistically significant when only LBRAINWT is in the model.

i) Is the variance inflation factor for LBRAINWT in this model greater than 10? Justify your response without computing the variance inflation factor. (You do not have enough information to make this computation, but you do have enough to answer this question.)

j) The highest leverage value for the model with LBRAIN and DANGER is 0.145, for the smallest mammal in the sample (Lesser short-tailed shrew). Is the lesser short-tailed shrew a high leverage point for this regression?
Briefly justify your answer.

k) The lesser short-tailed shrew has an average log(brainwt) of -1.97 and a danger index of 4. The estimated regression equation, where SLEEP is measured in hours, is

PREDICTED SLEEP = 17.79 - 0.922 LBRAIN - 1.71 DANGER

Compute a 95% confidence interval for the estimated number of hours of sleep for this species.

2. (10/50 points) This problem refers to the additional handout "Computer Output for Problem 2". All the information needed to answer computational problems is available on the output.

The stopping speed of cars is determined by a number of factors including weight. In a 1983 study, data on stopping time and weight were compiled from 406 models of car manufactured during the previous 10 years.

a) A plot of stopping time versus weight with a loess curve is included in the output. Based on this plot, what degree polynomial appears to be adequate to fit the data? Briefly explain your answer.

b) The investigator decided to fit a 7th degree polynomial to the data. To avoid multicollinearity he first centered weight by subtracting the mean. The residuals from this model appear to fit the Normal regression model. Use sequential testing (without pooling) to determine an appropriate degree for the polynomial.

c) The investigator consulted a statistician, who insisted that a degree 3 polynomial would provide adequate fit to these data. Based on the model R^2 and the residual plot provided, is there evidence that the 3rd degree polynomial does not fit? Support your answer with 3 pieces of evidence.

d) The investigator was surprised that when the 7th degree polynomial was fitted, the cubic term was highly statistically significant, but when the 3rd degree polynomial was fitted, the 3rd degree term was not statistically significant. Briefly explain why this might occur.

3.
(9/50 points) Suppose we observe Y1, Y2, Y3, which are random variables with

E(Yi) = μ, Var(Yi) = σ^2, Cov(Yi, Yj) = σ^2 ρ^|i-j|

We will do this problem only for n=3.

a) Write the mean vector and variance matrix of the vector Y.

b) Let Z1=Y1, Z2=Y2-Y1, Z3=Y3-Y2. Find a 3x3 matrix A so that AY=Z, where Y = (Y1, Y2, Y3)' and Z = (Z1, Z2, Z3)'.

c) Compute E(Z).

d) Compute Var(Z2).

e) Compute Covariance(Z1, Z3).
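
The matrix rules behind parts b) to e) of Problem 3 are E(AY) = A E(Y) and Var(AY) = A Var(Y) A'. The following is a minimal numerical sketch of those rules, not a model answer. The exam keeps the moments of Y symbolic, so the values MU, SIGMA2 and RHO, the covariance form SIGMA2 * RHO**|i-j|, and the helper names `matmul` and `transpose` are all assumptions chosen only for illustration.

```python
# Illustrative values only; the exam leaves these quantities symbolic.
MU, SIGMA2, RHO = 1.0, 2.0, 0.5  # hypothetical numbers, not from the exam


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


n = 3
mean_y = [[MU] for _ in range(n)]  # column vector E(Y)

# Assumed covariance structure: Cov(Yi, Yj) = SIGMA2 * RHO**|i-j|
var_y = [[SIGMA2 * RHO ** abs(i - j) for j in range(n)] for i in range(n)]

# Each row of A encodes one definition: Z1 = Y1, Z2 = Y2 - Y1, Z3 = Y3 - Y2
A = [[1, 0, 0],
     [-1, 1, 0],
     [0, -1, 1]]

mean_z = matmul(A, mean_y)                      # E(Z) = A E(Y)
var_z = matmul(matmul(A, var_y), transpose(A))  # Var(Z) = A Var(Y) A'

print("E(Z)        =", [row[0] for row in mean_z])
print("Var(Z2)     =", var_z[1][1])
print("Cov(Z1, Z3) =", var_z[0][2])
```

With these assumed numbers the output can be checked by hand against the scalar rules, for example Var(Y2 - Y1) = Var(Y1) + Var(Y2) - 2 Cov(Y1, Y2). The same two matrix products apply unchanged when the entries are symbols.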