Statistics 580 Assignment No.4 (40 points) Spring 2009


1. Use the function nlm() in R to obtain the maximum likelihood estimates of k and λ for a random sample from the two-parameter Weibull density

$$f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \qquad x \ge 0,\ k > 0,\ \lambda > 0,$$

assuming that there are no censored observations. Set up the objective, gradient, and Hessian using the deriv3() (or the deriv()) function in R, as in the logistic regression class example. Use the carcinogen data and the starting values you used in Problem #3 (Assignment #3).
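The deriv3()/nlm() pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the carcinogen data are not reproduced in this handout, so the sample is simulated with rweibull(), and the names nll_terms and negloglik, the seed, and the starting values are hypothetical choices, not the ones from Assignment #3.

```r
# Per-observation negative log-density of the two-parameter Weibull:
#   -log f(x; lambda, k) = -log(k) + k*log(lambda) - (k - 1)*log(x) + (x/lambda)^k
# deriv3() returns a function whose value carries "gradient" and
# "hessian" attributes with respect to k and lambda.
nll_terms <- deriv3(
  ~ -log(k) + k * log(lambda) - (k - 1) * log(x) + (x / lambda)^k,
  c("k", "lambda"),
  c("k", "lambda", "x")
)

# Objective for nlm(): sum the per-observation pieces and attach the
# summed gradient and Hessian as attributes, which nlm() will use.
negloglik <- function(theta, x) {
  terms <- nll_terms(theta[1], theta[2], x)
  value <- sum(terms)
  attr(value, "gradient") <- colSums(attr(terms, "gradient"))
  attr(value, "hessian")  <- apply(attr(terms, "hessian"), c(2, 3), sum)
  value
}

# Simulated stand-in for the carcinogen data (an assumption).
set.seed(580)
x <- rweibull(200, shape = 1.5, scale = 2)

# Illustrative starting values: k = 1 and lambda = sample mean.
fit <- nlm(negloglik, p = c(1, mean(x)), x = x, hessian = TRUE)
fit$estimate   # (k-hat, lambda-hat)
```

With real data, only the vector x and the starting values in p would change; the hessian = TRUE output can be inverted to approximate the covariance matrix of the estimates.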

2. A standard model for population growth is given by the differential equation

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$

where N is the population size, r is the growth rate parameter, and K is the carrying capacity of the environment. The solution to this differential equation is given by

$$N_t = \frac{K}{1 + \exp(a - rt)}$$

where $N_t$ denotes the population size at time t and $a = \log\left((K - N_0)/N_0\right)$. Use the R function nls() to obtain the least squares estimates of K, a and r for fitting the growth rates of AIDS cases in Australia given below. Use the Gauss-Newton algorithm with appropriate starting values, which you must estimate from the data, and print trace information.

Year 1981 1984 1987 1990 1993 1996 1999 2002 2005

Cases 2 55 802 2623 5060 7493 8415 9118 9609
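One possible way to get starting values is to fix a rough guess for K and linearize the solution, since log(K/N_t − 1) = a − rt. The sketch below follows that idea; the guess K0 = 10000, the choice of 1981 as the time origin, and the variable names are assumptions made for illustration, and convergence from these starting values is not guaranteed.

```r
year  <- c(1981, 1984, 1987, 1990, 1993, 1996, 1999, 2002, 2005)
cases <- c(2, 55, 802, 2623, 5060, 7493, 8415, 9118, 9609)
yrs   <- year - 1981            # assumed time origin: t = 0 in 1981

# Rough carrying capacity, chosen slightly above the largest count (assumption).
K0 <- 10000

# Linearization: log(K0 / N_t - 1) = a - r * t, fitted by ordinary least squares.
lin <- lm(log(K0 / cases - 1) ~ yrs)
a0  <- unname(coef(lin)[1])
r0  <- -unname(coef(lin)[2])

# Gauss-Newton is the default algorithm in nls(); trace = TRUE prints
# the residual sum of squares and parameter values at each iteration.
fit <- nls(cases ~ K / (1 + exp(a - r * yrs)),
           start = list(K = K0, a = a0, r = r0),
           algorithm = "default", trace = TRUE)
coef(fit)
```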

3. Use the R function nr() you wrote to minimize the sum of squared errors from fitting the model in the previous problem to the Australian AIDS data. Define the derivs() function appropriately and use the same starting values you used above. Modify your nr() function to write convergence information at each iteration to a file. Print this file after convergence.

4. An experiment is performed and the data obtained are considered to be a random sample of size n from the bivariate Normal distribution,

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim N(\mu, \Sigma) \quad \text{with} \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$$

and we wish to find the MLE of $\theta = (\mu_1, \mu_2, \sigma_{11}, \sigma_{12}, \sigma_{22})'$. Suppose that through a random accident, p data values are missing from the first variable $x_1$, q data values are missing from the second variable $x_2$, and both variables are observed for the other n − (p + q) data values. We label the observed data vector

$$x_{obs} = (x_{1,p+1}, \ldots, x_{1,p+q}, x_{1,p+q+1}, \ldots, x_{1,n}, x_{2,1}, \ldots, x_{2,p}, x_{2,p+q+1}, \ldots, x_{2,n})',$$

the missing data vector

$$x_{mis} = (x_{1,1}, \ldots, x_{1,p}, x_{2,p+1}, \ldots, x_{2,p+q})',$$

and the complete-data vector is $(x_{1,1}, \ldots, x_{1,n}, x_{2,1}, \ldots, x_{2,n})'$.

(a) Show that the complete-data log-likelihood $\ell(\theta)$ belongs to the regular exponential family with the sufficient statistic $\left(\sum x_{1i}, \sum x_{2i}, \sum x_{1i}^2, \sum x_{1i} x_{2i}, \sum x_{2i}^2\right)'$.

(b) Give (no need to prove) the complete-data MLE of θ.

(c) Formulate the E-step and M-step for the (t + 1)st iteration, given that $\theta^{(t)}$ is the value of θ after the t-th iteration, using the above information.

(d) Write an R function to perform these iterations. Use it in a main program to obtain the MLEs using the data set given as a dumped R matrix in the file virus.R in the Downloads folder. (To access this data, just source this file and note that the NAs indicate missing values.) For this data, n = 100, p = 10, and q = 15.
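A minimal sketch of such an iteration is given below, assuming the standard EM updates for a bivariate Normal in which each row has at most one missing value: the E-step replaces each missing term of the sufficient statistic by its conditional expectation given the observed variable, and the M-step recomputes the moment-based complete-data MLE. The function name em_bvn, its arguments, the starting values, and the simulated data are hypothetical; the virus.R matrix is not reproduced here.

```r
em_bvn <- function(X, tol = 1e-8, max_iter = 500) {
  miss1 <- is.na(X[, 1])
  miss2 <- is.na(X[, 2])
  stopifnot(!any(miss1 & miss2))     # assumes no row has both values missing

  # Starting values from the available cases (an illustrative choice).
  mu  <- unname(colMeans(X, na.rm = TRUE))
  s11 <- var(X[, 1], na.rm = TRUE)
  s22 <- var(X[, 2], na.rm = TRUE)
  s12 <- cov(X[, 1], X[, 2], use = "complete.obs")
  theta <- c(mu, s11, s12, s22)

  for (iter in seq_len(max_iter)) {
    # E-step: conditional means of the missing values given the observed ones.
    x1 <- X[, 1]
    x2 <- X[, 2]
    x1[miss1] <- mu[1] + s12 / s22 * (x2[miss1] - mu[2])
    x2[miss2] <- mu[2] + s12 / s11 * (x1[miss2] - mu[1])
    # Conditional second moments add the conditional variance.
    x1sq <- x1^2
    x1sq[miss1] <- x1sq[miss1] + s11 - s12^2 / s22
    x2sq <- x2^2
    x2sq[miss2] <- x2sq[miss2] + s22 - s12^2 / s11

    # M-step: complete-data MLE from the expected sufficient statistics.
    mu  <- c(mean(x1), mean(x2))
    s11 <- mean(x1sq) - mu[1]^2
    s12 <- mean(x1 * x2) - mu[1] * mu[2]
    s22 <- mean(x2sq) - mu[2]^2

    theta_new <- c(mu, s11, s12, s22)
    converged <- max(abs(theta_new - theta)) < tol
    theta <- theta_new
    if (converged) break
  }
  names(theta) <- c("mu1", "mu2", "s11", "s12", "s22")
  list(theta = theta, iterations = iter)
}

# Simulated stand-in for the virus.R matrix (an assumption): n = 100, p = 10, q = 15.
set.seed(580)
n <- 100; p <- 10; q <- 15
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n, sd = sqrt(0.75))
X <- cbind(x1, x2)
X[1:p, 1] <- NA
X[(p + 1):(p + q), 2] <- NA

fit <- em_bvn(X)
fit$theta
```

With the actual data, the simulated matrix would be replaced by the matrix obtained from sourcing virus.R. When no values are missing, the first M-step already returns the complete-data MLE, which is a convenient check on the code.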

5. (Bonus problem for an additional 5 points) Write a C program to obtain the maximum likelihood estimates of the two-parameter Weibull as in Problem #1 using the GSL function gsl_multimin_fdfminimizer_vector_bfgs. Follow the C programs supplied for using this function for the logistic problem. Recall that to use the above function we need only provide the function and its first derivative. Use appropriate values for the step size and tolerance parameters. Use the same starting values as in Problem #1.

Due Tuesday, 14 April 2009
