STAT 511    Spring 2001    Assignment 7    NAME ________________

Reading Assignment:
Efron & Tibshirani, article on bootstrap estimation from Statistical Science, 1986, Vol. 1, No. 1, pp. 54-77.
Firth, D., article on Generalized Linear Models from Stat. Theory & Modelling, 1991, pp. 55-82.
(These articles are available from the Parks Library electronic reserve system. The course web page has a link.)
Rencher, Chapter 17.

Written Assignment: Solutions will be posted by April 27.

Final Exam: Monday, April 30, 9:45-11:45 a.m. in 171 Durham.

1. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a population with c.d.f. $F(x)$ and density function $f(x)$. Let $\theta$ denote the median of the population. Suppose $n = 2m$ is an even integer. Then a consistent estimator for the median is obtained by ordering the observations from smallest to largest, $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, and computing the sample median

$$\hat{\theta}_n = \frac{X_{(m)} + X_{(m+1)}}{2}$$

When the form of the c.d.f. is known, the exact finite sample distribution of $\hat{\theta}_n$ can be obtained from the theory of order statistics. Unless n is very small, however, the evaluation of moments, such as the variance of $\hat{\theta}_n$, can be quite tedious. Standard asymptotic theory reveals that

$$n \, \mathrm{Var}(\hat{\theta}_n) \to \frac{1}{4[f(\theta)]^2} \quad \text{as } n \to \infty,$$

where $f(\theta)$ is the density function evaluated at $\theta$, the true median for the population. You could use $\left(4n[f(\hat{\theta}_n)]^2\right)^{-1}$ as an estimate of $\mathrm{Var}(\hat{\theta}_n)$, if you knew the form of the density function. The bootstrap provides a reasonably good estimator for the standard error of the sample median, as well as a confidence interval for the median, without any knowledge of the c.d.f. $F(x)$ or density $f(x)$. Basically, the bootstrap replaces the derivation of asymptotic theoretical results with simulation.

(a) The file posted as heart1.dat contains survival times (in days) for 69 patients who received heart transplants. There is one line of data for each patient and each line contains a survival time. Compute the sample median of the survival times.
Use bootstrap methods to obtain a standard error for the sample median and 95% percentile confidence limits for the population median. Also compute bias-corrected accelerated (BCA) 95% confidence limits. (Use 5000 bootstrap samples. S-PLUS code for doing this is posted in the file heart.ssc. Be sure to remove unwanted files from your .Data directory or your .MySwork directory when you are finished with them.)

(b) Repeat part (a) for the survival times for a similar set of 34 patients who did not receive heart transplants. These data are posted as heart2.dat. (S-PLUS code is available in the file posted as heart.ssc.)

(c) Use bootstrap methods to test the null hypothesis that the median survival time for patients who received heart transplants is equal to the median survival time for similar patients who did not receive heart transplants. State your conclusion. (S-PLUS code is available in the file posted as heart.ssc.)

2. Consider the model you fit to the chemical process data in part (c) of Problem 3 on Assignment 6. Use appropriate bootstrap methods to

(a) Construct individual 95% confidence intervals for $\beta_1$ and $\beta_2$. Use both empirical and BCA bootstrap methods for each of the following resampling schemes:
i. Resample cases with replacement.
ii. Keep the model matrix fixed and resample centered residuals.
iii. Keep the model matrix fixed and resample centered residuals that have also been re-scaled to have approximately the same variance as the actual random errors.

(b) Construct a 95% confidence interval for the expected fraction of original material remaining after the process is run for 80 minutes at a temperature of 625 degrees Kelvin.

(c) Construct a 95% prediction interval for the actual fraction of original material that would remain if the process were run for 80 minutes at a temperature of 625 degrees Kelvin.

3. The file shelter.dat on the course web page contains data on wind speed at various distances behind a windbreak.
In this case the windbreak is a row of trees planted along the western boundary of a field. The data were recorded when the wind was directly from the west and hit the windbreak at a 90 degree angle. There are two numbers on each line of the file. The first is the distance behind the windbreak (in meters) and the second is the relative velocity of the wind, recorded as a percentage of the wind speed in front of the windbreak. Plot the data. Try each of the following methods for fitting a curve to these data:

(i) Polynomial regression (it may require a high-order polynomial)
(ii) Loess
(iii) Gaussian kernel smoother

For each method, provide a graph of the fitted curve and a residual plot with a "smoothed curve" to indicate any trend. Indicate the degree of the polynomial, or the span or bandwidth you selected for loess or Gaussian kernel smoothing. Finally, estimate the mean relative wind speed at 1, 2, 5, 10, 20 and 30 meters behind the windbreak and report standard errors or confidence intervals for your estimates. S-PLUS code is displayed in the solutions and posted in the file shelter.ssc.

The SAS package has a LOESS procedure for fitting smooth curves to data without specifying a formula for the curve. For more information on model-free curve fitting techniques, look at the article by Hastie and Tibshirani that has been posted on the Parks Library electronic reserve system for this course. (The course web page has a link to this site.)
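
Illustration for Problem 1: the resampling idea behind the bootstrap standard error and the percentile confidence limits for a sample median can be sketched roughly as follows. This is a minimal Python sketch, not the S-PLUS code posted in heart.ssc. The function name bootstrap_median, the seed, the simple quantile-index convention, and the toy survival times are all assumptions made for illustration; the real data are in heart1.dat. BCA limits are not computed here.

```python
import random
import statistics


def bootstrap_median(data, n_boot=5000, alpha=0.05, seed=511):
    """Bootstrap SE and percentile limits for the sample median.

    Hypothetical helper for illustration only; the seed and the
    quantile-index convention below are assumptions of this sketch.
    """
    rng = random.Random(seed)
    n = len(data)

    # Draw n observations with replacement, n_boot times, and record
    # the median of each bootstrap sample.
    boot_medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )

    # Bootstrap standard error: standard deviation of the bootstrap medians.
    se = statistics.stdev(boot_medians)

    # Percentile limits: empirical alpha/2 and 1 - alpha/2 quantiles
    # of the sorted bootstrap medians (one simple indexing convention).
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)

    return statistics.median(data), se, (boot_medians[lo_idx], boot_medians[hi_idx])


# Made-up values for illustration, NOT the heart1.dat survival times.
toy_times = [15, 43, 61, 88, 132, 207, 274, 340, 512, 730, 995, 1350]

est, se, (lower, upper) = bootstrap_median(toy_times)
print("sample median:", est)
print("bootstrap standard error:", se)
print("95% percentile limits:", lower, upper)
```

With the actual data, the survival times would be read from heart1.dat (one value per line) and passed in place of toy_times. Fixing the seed only makes the simulation repeatable; a different seed gives slightly different limits.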