Project #3
UNL STAT 950 Fall 2012

Complete the following problems below. When R is needed for part of a problem, include all of your R program output with code inside of it and any additional information needed to explain your answer. You may need to edit your output and code in order to make it look nice after you copy and paste it into your Word document.

1) The purpose of this problem is to perform inferences about the ratio of variances from two populations. Let $Y_1 \sim F_1$ and $Y_2 \sim F_2$ with $Y_1$ being independent of $Y_2$. Define $t(F_1, F_2) = \sigma_1^2 / \sigma_2^2$ as the ratio of the two variances from the two populations.

   a) What is the plug-in estimator $T_{plug} = t(\hat{F}_1, \hat{F}_2)$?

   b) Derive the nonparametric $\delta$-method estimate of the variance for $T_{plug}$.

   c) Simulate samples from $F_1$ and $F_2$ using the following code:

        > set.seed(8121)
        > n1<-10
        > n2<-10
        > y1<-rnorm(n = n1, mean = 0, sd = 1) #Sample from F_1
        > y2<-rnorm(n = n2, mean = 0, sd = 2) #Sample from F_2
        > set1<-data.frame(y = c(y1, y2), pop = c(rep(x = 1, times = n1), rep(x = 2, times = n2)))
        > set1
                     y pop
        1   1.49855323   1
        2  -0.98053791   1
        3   0.44460395   1
        4  -0.97115284   1
        5  -1.16749149   1
        6   0.33838806   1
        7   0.61929116   1
        8   0.91287587   1
        9   0.73927369   1
        10  0.77852717   1
        11  0.09511827   2
        12  0.50533449   2
        13 -2.14266837   2
        14 -1.71690455   2
        15 -2.14187131   2
        16  1.14793945   2
        17 -1.58705439   2
        18 -2.13012146   2
        19  0.52124358   2
        20  3.05966581   2

      Calculate the empirical influence values for this data using both your derivations from b) and the empinf() function. Calculate the nonparametric $\delta$-method estimate of the variance for $T_{plug}$.

   d) Estimate the empirical influence values using the jackknife and regression-based methods. Calculate estimates of the variance for $T_{plug}$ using these estimates of the empirical influence values. When implementing the regression-based methods, use R = 1999 resamples, the boot() function, and set a seed of 8111 before implementing the boot() function.

   e) Suppose the statistic of interest is changed to

      $$T = \left[\frac{1}{n_1 - 1}\sum_{j=1}^{n_1}(Y_{1j} - \bar{Y}_1)^2\right] \Big/ \left[\frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(Y_{2j} - \bar{Y}_2)^2\right]$$

      Use the jackknife and regression-based methods to estimate the variance of $T$ (with same resamples). Compare your estimates to those found in parts c) and d) and provide specific reasons why differences or similarities occur.

   f) Find the bootstrap estimate of the variance of $T$ using R = 1999 resamples, the boot() function, and set a seed of 8111 before implementing the boot() function. Compare the variance here to those obtained previously.

   g) In the previous parts, you saw some similarities and differences among the variance estimates. How could we determine which estimate (if any) is correct? There are a few different approaches to answering this question. The parts outlined below provide one approach through using Monte Carlo simulation.

      i) Simulate 500 data sets using the same settings as given at the beginning of this problem. Display the first and last data sets simulated. Do not use a for loop here!

      ii) For each simulated data set, calculate the variance estimates for $T$ using each of the four methods examined in this problem (this includes the nonparametric $\delta$-method variance estimate for $T_{plug}$). Set a seed number of 8729 right before estimating the variance with the first data set. Display the first and last variance calculated for each of the four methods.

      iii) Average the variance estimates for each method over the 500 simulated data sets.

      iv) Estimate $T$ for each of the 500 simulated data sets. Calculate the sample variance across these estimates of $T$; i.e., calculate $499^{-1}\sum_{b=1}^{500}(t_b - \bar{t})^2$ where $t_b$ is the estimate of $T$ for the $b$th data set and $\bar{t} = 500^{-1}\sum_{b=1}^{500} t_b$. The resulting sample variance is a Monte Carlo estimate of the actual variance of $T$.

      v) Compare the values from iii) to that calculated in iv). Which of the variance estimators is doing a better job?

      vi) Repeat this process for $n_1 = n_2 = 100$ using the same seed numbers.
      Do the variance estimators improve? Explain.

2) This is a continuation of #3 of Project #1 (Example 10.1.22 of Casella and Berger). For simplicity of notation, let $T = (n-1)^{-1}\sum_{j=1}^{n}(Y_j - \bar{Y})^2$ denote the unbiased sample variance, let $T_{bias} = n^{-1}\sum_{j=1}^{n}(Y_j - \bar{Y})^2$ denote the biased sample variance, and let the population variance be $\sigma^2$.

   a) Calculate $t$ and $t_{bias}$.

   b) Using actual resamples, find the bootstrap estimate of the bias for both $T$ and $T_{bias}$ and find the corresponding corrected statistic values. Compare bias estimates and corrected statistics to the actual values (remember that $\sigma^2 = 4$). Use a seed number of 7818 with the boot() function when taking R = 1999 resamples.

   c) Using actual resamples, find the bootstrap estimate for the bias of the bias corresponding to both $T$ and $T_{bias}$. Compare bias estimates and corrected statistics to the actual values. Use a seed number of 7818 again with the boot() function when taking R = 1999 and M = 500 resamples.

   d) You may be surprised here that the corrected estimates are not closer to $\sigma^2$. Why is this not a cause for concern? Describe how one could evaluate whether a similar problem occurs for other cases.

   e) Use the jack.after.boot() function to create the default diagnostic plot with respect to $T$. Comment on the sensitivity of the bootstrap calculations for $T$ using the plot.
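
For reference with parts b) and c) of problem 2, the usual bootstrap bias estimate from R resamples and the corresponding bias-corrected statistic can be written as below. The symbols $t_r^*$ (the statistic computed on the $r$th resample), $\bar{t}^*$ (their average), and $b_R$ (the bias estimate) are notation assumed here for illustration and do not appear in the problem statement. This is a sketch of the standard definitions, not a prescribed solution.

```latex
% Bootstrap bias estimate from R resamples and the bias-corrected statistic.
% t is the statistic on the observed data; t_r^* is its value on resample r.
\[
  b_R \;=\; \bar{t}^* - t \;=\; \frac{1}{R}\sum_{r=1}^{R} t_r^* \;-\; t,
  \qquad
  t_{\text{corrected}} \;=\; t - b_R \;=\; 2t - \bar{t}^* .
\]
```

In an object returned by boot(), $t$ corresponds to the `t0` component and the $t_r^*$ values to the rows of the `t` component.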