Stat 579: Final Exam – November 17, 7-9 pm

The total number of points you can attempt to score is 110. Of these, 10 points are reserved for clarity
of presentation. However, the maximum score credited will be 100.
1. The astronomer Edmund Halley devised an iterative root-finding method that is faster than the Newton-Raphson method but requires second derivatives. Thus, if x_n is the current solution, then the
update x_{n+1} is as follows:
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n) - f(x_n) f''(x_n) / \{2 f'(x_n)\}} \]
where the updates proceed until convergence.
(a) Write a function in R called halley() which takes as its input the initial estimate, the function
(for which the root is to be found), its first two derivative functions, and the desired tolerance. Make
sure that halley() is written so that it can take a variable number of arguments (for
use in statistical problems where we may need to maximize over a parameter given some observations,
as in the next problem). [25 points]
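One possible shape for such a function is sketched below. It is only an illustration of the update formula above, not a model answer: the argument names (init, f, fp, fpp, tol, maxit), the stopping rule based on the change between successive iterates, and the returned list are all assumptions made here.

```r
# Sketch of Halley's method. Extra arguments in `...` are forwarded to
# f, fp and fpp, so data can be passed through to all three functions.
halley <- function(init, f, fp, fpp, ..., tol = 1e-8, maxit = 100) {
  x <- init
  for (iter in seq_len(maxit)) {
    fx  <- f(x, ...)
    fpx <- fp(x, ...)
    # Halley update: x - f / (f' - f f'' / (2 f'))
    xnew <- x - fx / (fpx - fx * fpp(x, ...) / (2 * fpx))
    if (abs(xnew - x) < tol) {
      return(list(root = xnew, iterations = iter, converged = TRUE))
    }
    x <- xnew
  }
  list(root = x, iterations = maxit, converged = FALSE)
}

# Toy check: the positive root of x^2 - a, with a supplied through `...`
f   <- function(x, a) x^2 - a
fp  <- function(x, a) 2 * x
fpp <- function(x, a) 2
res <- halley(1, f, fp, fpp, a = 2)
res$root
```

Placing tol and maxit after `...` means they must be named in full, which keeps short names in `...` from being partially matched to them. An extra argument named x would still partially match a formal such as x0, which is why the first formal is called init in this sketch.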
(b) The following logistic distribution has the density function:
\[ f(x; \mu) = \frac{\exp\{-(x - \mu)\}}{[1 + \exp\{-(x - \mu)\}]^2} \quad \text{for } -\infty < x < \infty, \; -\infty < \mu < \infty \]
For an observed random sample X_1, X_2, ..., X_n from this distribution, the log likelihood is
\[ \ell(\mu) = -\sum_{i=1}^{n} (X_i - \mu) - 2 \sum_{i=1}^{n} \log\{1 + \exp\{-(X_i - \mu)\}\} \]
The file at http://maitra.public.iastate.edu/stat579/logistic.dat provides realizations
from the logistic distribution. We now analyze this dataset.
i. Plot the log likelihood ℓ(µ) in the range −10 < µ < 10. What is an appropriate initial value?
[15 points]
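As a rough illustration of the plotting step, the sketch below evaluates the log likelihood on a grid and draws it. The layout of logistic.dat is not described here, so simulated logistic data stand in for it; the names loglik, obs, mu.grid and mu0, the grid size, and the choice of the grid maximizer as a starting value are all assumptions.

```r
# Log likelihood of the logistic location model, evaluated at each value in mu
loglik <- function(mu, obs) {
  sapply(mu, function(m) -sum(obs - m) - 2 * sum(log1p(exp(-(obs - m)))))
}

set.seed(579)
obs <- rlogis(50, location = 2)   # simulated stand-in for the exam data

mu.grid <- seq(-10, 10, length.out = 401)
ll <- loglik(mu.grid, obs)
plot(mu.grid, ll, type = "l", xlab = expression(mu), ylab = "log likelihood")

# One way to read off a starting value: the grid point with the largest ell
mu0 <- mu.grid[which.max(ll)]
mu0
```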
ii. Use the R function optimize() to find the maximum likelihood estimate of µ. [10 points]
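A minimal sketch of the optimize() call follows, again on simulated data rather than the exam file. The function name loglik and the argument name obs are illustrative; the search interval simply reuses the plotting range.

```r
# Scalar log likelihood; obs is forwarded by optimize() through `...`
loglik <- function(mu, obs) -sum(obs - mu) - 2 * sum(log1p(exp(-(obs - mu))))

set.seed(579)
obs <- rlogis(50, location = 2)   # simulated stand-in for the exam data

# maximum = TRUE asks optimize() to maximize instead of minimize
fit <- optimize(loglik, interval = c(-10, 10), obs = obs, maximum = TRUE)
fit$maximum     # estimate of mu
fit$objective   # log likelihood at the estimate
```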
iii. Use the function halley() to find the maximum likelihood estimate of µ by solving ℓ′(µ) = 0
using the starting value µ^(0) that you identify in the figure. (If you are unable to use halley(),
you may use the Newton-Raphson method from the class, but you will only be credited with a
maximum of 10 points on this question.) [15 points]
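Turn in the plot, any functions you write, function calls, and the results. P.S. To help you, the
necessary derivatives are provided:
\[ \frac{\partial \ell}{\partial \mu} = n - 2 \sum_{i=1}^{n} \frac{\exp\{-(X_i - \mu)\}}{1 + \exp\{-(X_i - \mu)\}} \]
\[ \frac{\partial^2 \ell}{\partial \mu^2} = -2 \sum_{i=1}^{n} \frac{\exp\{-(X_i - \mu)\}}{[1 + \exp\{-(X_i - \mu)\}]^2} \]
\[ \frac{\partial^3 \ell}{\partial \mu^3} = -2 \sum_{i=1}^{n} \frac{\exp\{-(X_i - \mu)\} [1 - \exp\{-(X_i - \mu)\}]}{[1 + \exp\{-(X_i - \mu)\}]^3} \]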
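The three derivatives translate fairly directly into R. The sketch below codes them and applies the Halley update with f = ℓ′, f′ = ℓ″ and f″ = ℓ‴ in a plain loop. The names dl, d2l and d3l, the simulated data, the sample median as starting value, and the tolerance are assumptions made for illustration.

```r
# First, second and third derivatives of the log likelihood in mu
dl  <- function(mu, obs) { e <- exp(-(obs - mu)); length(obs) - 2 * sum(e / (1 + e)) }
d2l <- function(mu, obs) { e <- exp(-(obs - mu)); -2 * sum(e / (1 + e)^2) }
d3l <- function(mu, obs) { e <- exp(-(obs - mu)); -2 * sum(e * (1 - e) / (1 + e)^3) }

set.seed(579)
obs <- rlogis(50, location = 2)   # simulated stand-in for the exam data

mu <- median(obs)                 # illustrative starting value
for (iter in 1:50) {
  g <- dl(mu, obs); g1 <- d2l(mu, obs); g2 <- d3l(mu, obs)
  # Halley step applied to the score equation dl(mu) = 0
  mu.new <- mu - g / (g1 - g * g2 / (2 * g1))
  if (abs(mu.new - mu) < 1e-10) break
  mu <- mu.new
}
mu.new
```

Since ℓ″(µ) is negative for every µ, a root of ℓ′ found this way is a maximizer of the log likelihood.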
2. Seattle Air Pollution Data. In Seattle, the concentrations of fine particles are highest in the winter, and
woodsmoke is a major source. If similar associations with deaths or hospitalizations are also seen in
Seattle, it is less likely that the associations are explained by confounding. Confounding is an important
issue because the estimated health effects of fine particles are very small, substantially smaller than the
variations expected from changes in temperature or other seasonally varying factors such as influenza
epidemics.
Stat 579, Fall 2011 – Maitra
The file available on the WWW at the publicly accessible location:
http://maitra.public.iastate.edu/stat579/SEAir.dat provides daily data on pollution, weather,
and hospital admissions for asthma and for all respiratory diseases in Seattle over an eight-year period
(January 1, 1987–December 31, 1994). The file contains:
date:    Date, in number of days since 1-1-1960
dow:     Day of the week, with Sunday=1, Monday=2, ...
yr:      Year: 1=1987, 2=1988, ...
mo:      Month
admyng:  Number of hospital admissions for all respiratory
         diseases on that day in Seattle hospitals, for people aged
         5-65 years
astyng:  Number of hospital admissions for asthma on that day
         in Seattle hospitals, for people aged 5-65 years
pm25:    24-hour average of concentration of fine particles
         (below 2.5 microns aerodynamic diameter)
temp:    Daily maximum temperature (Fahrenheit)
o3max8:  Highest 8-hour average concentration of ozone. Ozone
         is measured only in summer; the concentrations are thought to
         be low in winter
so2avg:  24-hour average of sulfur dioxide concentration.
coavg:   24-hour average of carbon monoxide concentration.
stagno:  Air stagnation index: the number of hours in the day
         with little or no wind.
The pollution measurements (pm25, o3max8, so2avg, coavg) are averaged over all the available monitors in the Seattle area. There is only one monitor for sulfur dioxide, and it is near the only important
source of sulfur dioxide pollution in the area. Ozone and sulfur dioxide are known to have respiratory
effects at sufficiently high concentrations, but the concentration of sulfur dioxide in Seattle is low.
Carbon monoxide is not thought to have effects on respiratory disease at the concentrations we see in
Seattle.
(a) For each of the seven days of the week, and using tapply and a single function with appropriate
arguments, calculate the median and the first and third quartiles (only) of the total number of
admissions for all respiratory diseases on that day of the week. Plot these against the days of the
week using a line plot. Comment. [15 points].
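The tapply step might look like the sketch below, where quantile() with a probs argument serves as the single function. The layout of SEAir.dat is not specified here, so a simulated data frame with columns dow and admyng stands in for it; the object names sea, q and qmat are illustrative.

```r
set.seed(1)
# Simulated stand-in: 100 weeks of daily admission counts
sea <- data.frame(dow = rep(1:7, length.out = 700),
                  admyng = rpois(700, lambda = 10))

# One call: quantile() returns the first quartile, median and third quartile
q <- tapply(sea$admyng, sea$dow, quantile, probs = c(0.25, 0.5, 0.75))

# tapply gives a list with one entry per day; bind the entries into a 7 x 3 matrix
qmat <- do.call(rbind, q)

matplot(1:7, qmat, type = "l", lty = c(2, 1, 2), col = 1,
        xlab = "Day of week (1 = Sunday)", ylab = "Admissions")
```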
(b) Create a new dataframe with the observations from February 29, 1988 and February 29, 1992
discarded. Next create a three-dimensional array with the eight years in the third dimension with
each index representing observations in the dataframe for that year. Provide annotated R code
for this purpose. [10 points]
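One way to approach this is sketched below on a simulated data frame. It assumes that date counts days since 1960-01-01 as described above, that rows are in date order, and that all columns are numeric; only a few of the variables are simulated, and the names sea, sea2, cal and arr are illustrative.

```r
set.seed(2)
dates <- seq(as.Date("1987-01-01"), as.Date("1994-12-31"), by = "day")
# Simulated stand-in with a subset of the columns described above
sea <- data.frame(date   = as.numeric(dates - as.Date("1960-01-01")),
                  yr     = as.integer(format(dates, "%Y")) - 1986L,
                  admyng = rpois(length(dates), 10),
                  astyng = rpois(length(dates), 3))

# Convert the day counts back to calendar dates and drop both leap days
cal  <- as.Date(sea$date, origin = "1960-01-01")
sea2 <- sea[format(cal, "%m-%d") != "02-29", ]

# Every year now has 365 rows: days x variables x years
arr <- array(NA_real_, dim = c(365, ncol(sea2), 8),
             dimnames = list(NULL, names(sea2), 1987:1994))
for (k in 1:8) {
  arr[, , k] <- as.matrix(sea2[sea2$yr == k, ])   # slice k holds year k
}
dim(arr)
```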
(c) Using the apply function on the above array, provide the yearly averages of number of daily
hospital admissions for asthma as well as all respiratory diseases for that year, the 24-hour average
concentration of fine particles, the daily maximum temperature (in F), the highest 8-hour daily
concentration of ozone, the 24-hour average of sulfur dioxide and carbon monoxide concentrations
as well as the air stagnation index. Comment on changes in these quantities (if any) over the
years. [10 points]
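With a days x variables x years array, the yearly averages come from applying mean over the second and third margins. The sketch below uses a simulated array with the variable names listed earlier. The injected missing values and the use of na.rm = TRUE are assumptions, motivated by the note that ozone is measured only in summer.

```r
set.seed(3)
vars <- c("admyng", "astyng", "pm25", "temp", "o3max8", "so2avg", "coavg", "stagno")
# Simulated stand-in: 365 days x 8 variables x 8 years
arr <- array(rnorm(365 * 8 * 8, mean = 10), dim = c(365, 8, 8),
             dimnames = list(NULL, vars, 1987:1994))
arr[sample(length(arr), 50)] <- NA   # a few missing values, for illustration

# Keep margins 2 (variable) and 3 (year); average over days
yearly <- apply(arr, c(2, 3), mean, na.rm = TRUE)
round(yearly, 2)
```

The result has one row per variable and one column per year, so reading along a row shows how that quantity changes over time.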
What to turn in
Turn in your best-effort code and the first five rows of each matrix (using the head function in R) wherever
appropriate.