STAT 511 Spring 2001

Assignment 7
NAME ________________
Reading Assignment: Efron & Tibshirani, article on bootstrap estimation from Statistical
Science, 1986, Vol. 1, No. 1, pp. 54-77.
Firth, D., article on Generalized Linear Models from Stat. Theory &
Modelling, 1991, pp. 55-82. (These articles are available from the Parks
Library electronic reserve system. The course web page has a link.)
Rencher, Chapter 17.
Written Assignment: Solutions will be posted by April 27.
Final Exam: Monday, April 30, 9:45-11:45 a.m. in 171 Durham.
1. Suppose X1, X2, ..., Xn is a random sample from a population with c.d.f. F(x) and density
function f(x). Let θ denote the median of the population. Suppose n = 2m is an even
integer. Then, a consistent estimator for the median is obtained by ordering the
observations from smallest to largest, X(1) ≤ X(2) ≤ ... ≤ X(n), and computing the sample
median
θ̂n = (X(m) + X(m+1))/2
When the form of the c.d.f. is known, the exact finite sample distribution of θ̂n can be
obtained from the theory of order statistics. Unless n is very small, however, the
evaluation of moments, such as the variance of θ̂n , can be quite tedious. Standard
asymptotic theory reveals that
n Var(θ̂n) → 1 / (4[f(θ)]^2) as n → ∞,
where f(θ) is the density function evaluated at θ, the true median for the population. You
could use (4n[f(θ̂n)]^2)^(-1) as an estimate of Var(θ̂n), if you knew the form of the density
function.
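As a check on the limit above, the following Python sketch (not part of the posted S-PLUS code) simulates the sample median for standard normal samples, where θ = 0 and f(θ) = 1/sqrt(2π), so the limiting value 1/(4[f(θ)]^2) equals π/2. The sample size, number of replications, seed, and function name are arbitrary choices made for illustration.

```python
import math
import random
import statistics


def scaled_median_variance(n, reps, rng):
    """Monte Carlo estimate of n * Var(sample median) for N(0, 1) samples."""
    medians = [
        statistics.median(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return n * statistics.variance(medians)


rng = random.Random(511)          # fixed seed so the run is repeatable
n, reps = 200, 4000               # n is even, as in the text (n = 2m)

simulated = scaled_median_variance(n, reps, rng)

# Standard normal: the median is 0 and the density there is 1/sqrt(2*pi).
density_at_median = 1.0 / math.sqrt(2.0 * math.pi)
asymptotic = 1.0 / (4.0 * density_at_median ** 2)   # equals pi / 2

print(f"simulated n*Var(median): {simulated:.3f}")
print(f"asymptotic limit:        {asymptotic:.3f}")
```

The two numbers should be close but not identical, since the simulation has Monte Carlo error and n is finite.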
The bootstrap provides a reasonably good estimator for the standard error of the sample
median, as well as a confidence interval for the median, without any knowledge of the
c.d.f. F(x) or density f(x). Basically, the bootstrap replaces the derivation of asymptotic
theoretical results with simulation.
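A minimal Python sketch of this idea, assuming simulated exponential survival times in place of a real data file (the values are synthetic and are not the heart transplant data): resample the observations with replacement, recompute the median each time, and use the spread of the bootstrap medians for a standard error and a percentile interval. The function names are hypothetical, and the posted S-PLUS code may use a different quantile convention.

```python
import random
import statistics


def bootstrap_medians(data, n_boot, rng):
    """Return the medians of n_boot resamples drawn with replacement."""
    n = len(data)
    return [statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_limits(values, level=0.95):
    """Percentile limits read off the sorted bootstrap values."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(alpha * last)], ordered[round((1.0 - alpha) * last)]


rng = random.Random(2001)         # fixed seed so the run is repeatable

# Synthetic data: 68 simulated survival times (mean 100 days), NOT heart1.dat.
times = [rng.expovariate(1.0 / 100.0) for _ in range(68)]

sample_median = statistics.median(times)
boot = bootstrap_medians(times, 5000, rng)
boot_se = statistics.stdev(boot)          # bootstrap standard error
lower, upper = percentile_limits(boot)    # 95% percentile limits

print(f"sample median: {sample_median:.1f}")
print(f"bootstrap SE:  {boot_se:.1f}")
print(f"95% percentile limits: ({lower:.1f}, {upper:.1f})")
```

Nothing in the sketch uses the form of F(x) or f(x); only the observed sample enters the calculation.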
(a) The file posted as heart1.dat contains survival times (in days) for 69 patients
who received heart transplants. There is one line of data for each patient and each
line contains a survival time. Compute the sample median of the survival times.
Use bootstrap methods to obtain a standard error for the sample median and 95%
percentile confidence limits for the population median. Also compute bias-corrected
accelerated (BCA) 95% confidence limits. (Use 5000 bootstrap samples. S-PLUS code
for doing this is posted in the file heart.ssc. Be sure to remove unwanted files
from your .Data directory or your .MySwork directory when you are finished with them.)
(b) Repeat part (a) for the survival times for a similar set of 34 patients that did not
receive heart transplants. These data are posted as heart2.dat. (S-PLUS code
is available in the file posted as heart.ssc.)
(c) Use bootstrap methods to test the null hypothesis that the median survival time for
patients that received heart transplants is equal to the median survival time for
similar patients that do not receive heart transplants. State your conclusion.
(S-PLUS code is available in the file posted as heart.ssc.)
2. Consider the model you fit to the chemical process data in part (c) of Problem 3 on
Assignment 6. Use appropriate bootstrap methods to
(a) Construct individual 95% confidence intervals for β1 and β2. Use both empirical and
BCA bootstrap methods for each of the following resampling schemes:
i. Resample cases with replacement.
ii. Keep the model matrix fixed and resample centered residuals.
iii. Keep the model matrix fixed and resample centered residuals that have
also been re-scaled to approximately have the same variance as the actual
random errors.
(b) Construct a 95% confidence interval for the expected fraction of original material
remaining after the process is run for 80 minutes at a temperature of 625 degrees
Kelvin.
(c) Construct a 95% prediction interval for the actual fraction of original material that
would remain if the process was run for 80 minutes at a temperature of 625
degrees Kelvin.
3. The file shelter.dat on the course web page contains data on wind speed at various
distances behind a windbreak. In this case the windbreak is a row of trees planted along
the western boundary of a field. The data were recorded when the wind was directly
from the west and hit the windbreak at a 90 degree angle. There are two numbers on
each line of the file. The first is distance behind the windbreak (in meters) and the
second is the relative velocity of the wind recorded as a percentage of the wind speed in
front of the windbreak.
Plot the data. Try each of the following methods for fitting a curve to these data:
(i) Polynomial regression (it may require a high-order polynomial)
(ii) Loess
(iii) Gaussian kernel smoother
For each method provide a graph of the fitted curve and a residual plot with a “smoothed
curve” to indicate any trend. Indicate the degree of the polynomial or the span or
bandwidth you selected for Loess or Gaussian kernel smoothing. Finally, estimate the
mean relative wind speed at 1, 2, 5, 10, 20 and 30 meters behind the windbreak and
report standard errors or confidence intervals for your estimates. S-PLUS code is
displayed in the solutions and posted in the file shelter.ssc. The SAS package has a
LOESS procedure for fitting smooth curves to data without specifying a formula for the
curve. For more information on model-free curve-fitting techniques, see the article by
Hastie and Tibshirani that has been posted on the Parks Library electronic reserve
system for this course. (The course web page has a link to this site.)
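Of the three methods, the Gaussian kernel smoother is the simplest to write out directly. The Python sketch below is a Nadaraya-Watson weighted average, assuming made-up distance and velocity values rather than shelter.dat; the function name and the bandwidth of 3 meters are illustrative choices. Here the bandwidth is the standard deviation of the Gaussian kernel, which may not match the scaling used by the S-PLUS smoothing functions.

```python
import math
import random


def gaussian_kernel_smooth(x, y, x0, bandwidth):
    """Kernel-weighted average of y at x0 (bandwidth = kernel standard deviation)."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


rng = random.Random(7)            # fixed seed so the run is repeatable

# Made-up data for illustration only (NOT shelter.dat): relative wind speed
# dips behind the windbreak and recovers with distance.
distance = [0.5 * i for i in range(1, 81)]            # 0.5, 1.0, ..., 40.0 meters
velocity = [
    100.0 - 60.0 * math.exp(-((d - 5.0) / 12.0) ** 2) + rng.gauss(0.0, 3.0)
    for d in distance
]

# Estimate the mean relative wind speed at the distances named in the problem.
targets = [1, 2, 5, 10, 20, 30]
fitted = [gaussian_kernel_smooth(distance, velocity, t, bandwidth=3.0) for t in targets]

for t, f in zip(targets, fitted):
    print(f"{t:>2} m: {f:.1f}%")
```

This gives point estimates only. A smaller bandwidth follows the data more closely and a larger one gives a smoother curve, so the choice should be reported along with the fit.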