Stat 579: Final Exam – November 17, 7-9 pm

You can attempt a total of 110 points, of which 10 are reserved for clarity of presentation. However, the maximum score credited will be 100.

1. The astronomer Edmund Halley devised an iterative root-finding method that is faster than the Newton-Raphson method but requires second derivatives. If $x_n$ is the current solution, then the update $x_{n+1}$ is

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n) - \dfrac{f(x_n)\, f''(x_n)}{2 f'(x_n)}} \]

where the updates proceed until convergence.

(a) Write a function in R called halley() which takes as its input the initial estimate, the function (for which the root is to be found), its first two derivative functions, and the desired tolerance. Make sure that halley() is written so that it can take a variable number of arguments (for use in statistical problems where we may need to maximize over a parameter given some observations, as in the next problem). [25 points]

(b) The logistic distribution has the density function

\[ f(x; \mu) = \frac{\exp\{-(x - \mu)\}}{\left[1 + \exp\{-(x - \mu)\}\right]^2} \quad \text{for } -\infty < x < \infty,\ -\infty < \mu < \infty. \]

For an observed random sample $X_1, X_2, \ldots, X_n$ from this distribution, the log likelihood is

\[ \ell(\mu) = -\sum_{i=1}^{n} (X_i - \mu) - 2 \sum_{i=1}^{n} \log\left\{1 + \exp\{-(X_i - \mu)\}\right\}. \]

The file at http://maitra.public.iastate.edu/stat579/logistic.dat provides realizations from the logistic distribution. We now analyze this dataset.

i. Plot the log likelihood $\ell(\mu)$ in the range $-10 < \mu < 10$. What is an appropriate initial value? [15 points]

ii. Use the R function optimize() to find the maximum likelihood estimate of $\mu$. [10 points]

iii. Use the function halley() to find the maximum likelihood estimate of $\mu$ by solving $\ell'(\mu) = 0$, using the starting value $\mu^{(0)}$ that you identify in the figure. (If you are unable to use halley(), you may use the Newton-Raphson method from the class, but you will only be credited with a maximum of 10 points on this question.)
[15 points]

Turn in the plot, any functions you write, function calls, and the results.

P.S. To help you, the necessary derivatives are provided:

\[ \frac{\partial \ell}{\partial \mu} = n - 2 \sum_{i=1}^{n} \frac{\exp\{-(X_i - \mu)\}}{1 + \exp\{-(X_i - \mu)\}} \]

\[ \frac{\partial^2 \ell}{\partial \mu^2} = -2 \sum_{i=1}^{n} \frac{\exp\{-(X_i - \mu)\}}{\left[1 + \exp\{-(X_i - \mu)\}\right]^2} \]

\[ \frac{\partial^3 \ell}{\partial \mu^3} = -2 \sum_{i=1}^{n} \frac{\exp\{-(X_i - \mu)\}\left[1 - \exp\{-(X_i - \mu)\}\right]}{\left[1 + \exp\{-(X_i - \mu)\}\right]^3} \]

Stat 579, Fall 2011 – Maitra

2. Seattle Air Pollution Data. In Seattle, the concentrations of fine particles are highest in the winter, and woodsmoke is a major source. If similar associations with deaths or hospitalizations are also seen in Seattle, it is less likely that the associations are explained by confounding. Confounding is an important issue because the estimated health effects of fine particles are very small, substantially smaller than the variations expected from changes in temperature or other seasonally varying factors such as influenza epidemics.

The file available on the WWW at the publicly accessible location http://maitra.public.iastate.edu/stat579/SEAir.dat provides daily data on pollution, weather, and hospital admissions for asthma and for all respiratory diseases in Seattle over an eight-year period (January 1, 1987–December 31, 1994). The file contains:

date: Date, in number of days since 1-1-1960
dow: Day of the week, with Sunday=1, Monday=2, ...
yr: Year: 1=1987, 2=1988, ...
mo: Month
admyng: Number of hospital admissions for all respiratory diseases on that day in Seattle hospitals, for people aged 5-65 years
astyng: Number of hospital admissions for asthma on that day in Seattle hospitals, for people aged 5-65 years
pm25: 24-hour average of concentration of fine particles (below 2.5 microns aerodynamic diameter)
temp: Daily maximum temperature (Fahrenheit)
o3max8: Highest 8-hour average concentration of ozone. Ozone is measured only in summer; the concentrations are thought to be low in winter
so2avg: 24-hour average of sulfur dioxide concentration.
coavg: 24-hour average of carbon monoxide concentration.
stagno: Air stagnation index: the number of hours in the day with little or no wind.

The pollution measurements (pm25, o3max8, so2avg, coavg) are averaged over all the available monitors in the Seattle area. There is only one monitor for sulfur dioxide, and it is near the only important source of sulfur dioxide pollution in the area. Ozone and sulfur dioxide are known to have respiratory effects at sufficiently high concentrations, but the concentration of sulfur dioxide in Seattle is low. Carbon monoxide is not thought to have effects on respiratory disease at the concentrations we see in Seattle.

(a) For each of the seven days of the week, and using tapply and a single function with appropriate arguments, calculate the median and the first and third quartiles (only) of the total number of admissions for all respiratory diseases on that day of the week. Plot these against the days of the week using a line plot. Comment. [15 points]

(b) Create a new dataframe with the observations from February 29, 1988 and February 29, 1992 discarded. Next, create a three-dimensional array with the eight years in the third dimension, with each index representing the observations in the dataframe for that year. Provide annotated R code for this purpose. [10 points]

(c) Using the apply function on the above array, provide the yearly averages of the following: the number of daily hospital admissions for asthma and for all respiratory diseases, the 24-hour average concentration of fine particles, the daily maximum temperature (in F), the highest 8-hour daily concentration of ozone, the 24-hour averages of sulfur dioxide and carbon monoxide concentrations, and the air stagnation index. Comment on changes in these quantities (if any) over the years. [10 points]

What to turn in

Turn in your best-effort code, and the first five rows of each matrix (using the head function in R) wherever appropriate.
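
As an illustration of the Halley update stated in Problem 1, the following is a minimal sketch on a toy function, not a model answer. The argument names (x0, fprime, fdprime, tol, maxit), the iteration cap, the returned list, and the test function x^2 - a are all assumptions made for this example and do not come from the exam.

```r
# Hypothetical sketch of a Halley iteration; names and defaults are illustrative only.
# Extra arguments in `...` are forwarded unchanged to f and its two derivatives.
halley <- function(x0, f, fprime, fdprime, tol = 1e-8, maxit = 100, ...) {
  x <- x0
  for (iter in seq_len(maxit)) {
    fx  <- f(x, ...)
    f1x <- fprime(x, ...)
    f2x <- fdprime(x, ...)
    # Halley update: x - f / (f' - f * f'' / (2 * f'))
    xnew <- x - fx / (f1x - fx * f2x / (2 * f1x))
    # Stop when successive iterates differ by less than the tolerance
    if (abs(xnew - x) < tol) {
      return(list(root = xnew, iterations = iter, converged = TRUE))
    }
    x <- xnew
  }
  # Iteration cap reached without meeting the tolerance
  list(root = x, iterations = maxit, converged = FALSE)
}

# Toy check (assumed example): the positive root of f(x) = x^2 - a is sqrt(a).
# The value of `a` reaches f and its derivatives through `...`.
res <- halley(1,
              f       = function(x, a) x^2 - a,
              fprime  = function(x, a) 2 * x,
              fdprime = function(x, a) 2,
              a = 2)
print(res$root)
```

In this sketch the extra argument a plays the role that the observed data would play in a likelihood problem. Because R partially matches named arguments against the formals that precede `...`, a forwarded argument whose name is a prefix of tol or maxit would be captured by halley() itself rather than passed on.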