LAST NAME (Please Print):
FIRST NAME (Please Print):
HONOR PLEDGE (Please Sign):

Statistics 111 Midterm 4

• This is a closed book exam.
• You may use your calculator and a single page of notes.
• The room is crowded. Please be careful to look only at your own exam. Try to sit one seat apart; the proctors may ask you to randomize your seating a bit.
• Report all numerical answers to at least two correct decimal places or (when appropriate) write them as a fraction.
• All question parts count for 1 point.

1. Captain Hornblower is sailing from Bombay to London. He can either sail through the Suez canal or around the Cape of Good Hope. The Suez route risks capture by Somali pirates; this happens with probability 1/80 and will cost a random ransom that has a uniform distribution between $2 million and $4 million. If he travels around the Cape, the longer trip will cost him $30,000.
What is the expected loss from taking the Suez route?
What is his best choice?

2. Suppose the duration of a summer romance has density $f(x) = \theta x^{\theta - 1}$ for $0 < x < 1$ and $0 < \theta$, and is 0 otherwise. (Here, time is scaled so that a full summer has duration 1.)
What is the survival function?
What is the hazard function?
What is the shape of the failure rate?
You observe three summer romances, which lasted 1/2, 3/4, and 2/3 of the summer. What is your maximum likelihood estimate for $\theta$?

3. You can invest d dollars in fixing up a house to sell. You believe that the sale price of the fixed-up house has an exponential distribution with parameter $\lambda = (5d)^{-1}$, where d is the amount you invest.
What is your net expected profit from investing $10K?
If your utility for money is linear and you have $30K in the bank, how much should you invest?

4. Describe how you would use 2-fold cross-validation to assess the predictive accuracy of a nonparametric regression. (Anthony: Please print neatly.) (6 points)

5. You want to compare salaries for new graduates among Harvard, Yale, Princeton, and Duke.
To control for effects due to major, you pick one person from each school in each of five majors: Statistics, Economics, Computer Science, English, and Biology.

Source df SS MS F
School
Major 50
Error 30
Total 600

Complete the table above (10 points).
In words specific to the problem, what is the appropriate null hypothesis regarding school?
What is the critical value for a 0.05 level test on school?
In words specific to the problem, what is the conclusion for a 0.05 level test regarding school?
Was it useful to use blocks in this problem?
Assume that one should not have used blocks. Write the one-way ANOVA table for the situation above without blocking (12 points).

Source df SS MS F

6. Which model is better for the lifespan of a rabbit in the wild: (A) a competing risks model or (B) a Cox proportional hazards model?

7. You fit a Cox proportional hazards model to the lifespans of companies. The two covariates (measured in millions) are Liquidity, with coefficient 10, and CEO Pay, with coefficient -12. Suppose Exxon (in the numerator) has $8M and $11M for Liquidity and CEO Pay, respectively, while Walmart has $6M and $12M.
What is the hazard ratio?
Which company will probably last longer?

8. You observe the following IQs for random Duke students in three different majors.

Statistics Economics English
120 122 121 115 115 113 110 105 107 110

Suppose you do an analysis of variance, that the mean squared error is 5, and that the test is significant at the 0.05 level. Now you want to find which majors are significantly different.
What is your critical value?
What is the value of your test statistic for comparing Statistics and English majors?

9. In order to pass this test, you need to know probability and statistics and have a working calculator. The chance that you know probability is 0.8, the chance that you know statistics is 0.9, and you brought two calculators to class: one fails with probability 0.7 and the other fails with probability 0.6.
What is the probability that you pass?

10. List all, and only, the true statements (10 pts.)
A. Multicollinearity occurs when two explanatory variables are strongly correlated.
B. Interactions in regression are handled by taking the product of explanatory variables.
C. Increasing failure rates describe human lifespans.
D. Including irrelevant explanatory variables reduces predictive accuracy.
E. Interpolation is less reliable than extrapolation.
F. A running line smoother is more accurate than bin smoothing.
G. In high dimensions, it becomes hard for statistical tests to select a good model.
H. People underestimate common risks.
I. Humans frame risk perception in terms of dread and controllability.
J. People tend to have linear utility for money.