Assignment #1
STAT 992
Spring 2015

Complete the problems below. Within each part, include your R program output with the code inside of it and any additional information needed to explain your answer. Your R code and output should be formatted in exactly the same manner as in the lecture notes.

1) (37 total points) The purpose of this problem is for you to obtain experience with MC simulation in the context of simple linear regression models.

a) (3 points) Simulate one data set consisting of an explanatory variable X and a response variable Y using the model $Y = \beta_0 + \beta_1 X + \epsilon$, where $\epsilon$ ~ independent $N(0, \sigma^2)$, $\beta_0 = 1$, $\beta_1 = 2$, and $\sigma^2 = 1$. Use a sample size of n = 20 and X ~ independent Uniform(0,1). Set a seed of 9110 prior to simulating the data. Simulate $\epsilon$ first with rnorm() and X second with runif(). Your first observation should be x = 0.16184103, y = 1.5476160.

b) (5 points) Estimate and state the corresponding simple linear regression model for the data simulated in part a). Are the estimates for $\beta_0$, $\beta_1$, and $\sigma^2$ close to their actual values? If not, discuss whether this is of concern.

c) (3 points) Simulate R = 10000 different data sets using the same initial seed as in part a). Verify that your second simulated data set has a first observation of x = 0.9703157, y = 3.1645653.

d) (3 points) Estimate how long R = 10000 MC simulations would take with the simulated data in part c) if $\hat{\beta}_0$, $\hat{\beta}_1$, $\widehat{Var}(\hat{\beta}_1)$, and $\hat{\sigma}^2$ were calculated for each data set. Use the first 100 simulated data sets to make this judgment. I recommend using the for() function to help estimate each model.

e) (5 points) Estimate the corresponding simple linear regression model for each data set simulated in part c). For each data set, save the following: $\hat{\beta}_0$, $\hat{\beta}_1$, $\widehat{Var}(\hat{\beta}_1)$, and $\hat{\sigma}^2$. Print their values for the first six data sets. Compare your estimate of time from part d) to how long it took for all R simulations.

f) (6 points) Evaluate the approximate unbiasedness of $\hat{\beta}_0$, $\hat{\beta}_1$, $\widehat{Var}(\hat{\beta}_1)$, and $\hat{\sigma}^2$ using the values obtained from part e).

g) (4 points) A standard t-based confidence interval for $\beta_1$ is $\hat{\beta}_1 \pm t_{1-\alpha/2,\,n-2}\,\widehat{Var}(\hat{\beta}_1)^{1/2}$. What is the estimated true confidence level of this interval using the values obtained from part e)? Set $\alpha = 0.05$. Is the confidence interval conservative, liberal, or neither?

h) (4 points) A standard t-based hypothesis test of $H_0: \beta_1 = 2$ vs. $H_a: \beta_1 \ne 2$ uses the test statistic $t_0 = (\hat{\beta}_1 - 2) / \widehat{Var}(\hat{\beta}_1)^{1/2}$. What is the estimated size of this test using the values obtained from part e)? Set $\alpha = 0.05$. Is the hypothesis test procedure conservative, liberal, or neither?

i) (4 points) The actual sampling distribution of $\hat{\beta}_1$ is $N\left(\beta_1,\ \sigma^2 / \sum_{i=1}^{n}(X_i - \bar{X})^2\right)$. Construct an EDF plot and a histogram with this sampling distribution overlaid upon them. Discuss how well the actual sampling distribution approximates the empirical distributions plotted. Because $\sigma^2 / \sum_{i=1}^{n}(X_i - \bar{X})^2$ will not be exactly the same for each data set, simply use $(R-1)^{-1}\sum_{r=1}^{R}(\hat{\beta}_{1,r} - \bar{\hat{\beta}}_1)^2$ as the variance, where $\hat{\beta}_{1,r}$ is the estimated value of $\beta_1$ for the rth simulated data set and $\bar{\hat{\beta}}_1 = R^{-1}\sum_{r=1}^{R}\hat{\beta}_{1,r}$.

2) (10 points) For a statistical problem of your own choosing, perform ONE set of MC simulations to evaluate either the unbiasedness of an estimator, the true confidence level of a confidence interval, OR the size of a hypothesis test. Describe the statistical problem so that a student who has completed the first year of our MS program would understand it.
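
As a rough illustration of the kind of MC workflow problem 1 describes (simulate data from the regression model, fit each data set, save the estimates, then summarize them), here is a minimal R sketch. It rests on several assumptions that are not part of the assignment: the seed (1234) and the number of data sets (500) are arbitrary placeholders, each data set is simulated inside the loop, and the object names save.res, mod.fit, and est.level are hypothetical. Because of these choices it will not reproduce the check values stated in parts a) and c), and it does not follow the lecture-note formatting.

```r
# Placeholder settings (NOT the values required by the assignment)
set.seed(1234)
n <- 20        # sample size per data set
R <- 500       # number of simulated data sets (kept small for speed)
beta0 <- 1
beta1 <- 2
sigma <- 1
alpha <- 0.05

# One row per data set: beta0-hat, beta1-hat, Var-hat(beta1-hat), sigma^2-hat
save.res <- matrix(NA_real_, nrow = R, ncol = 4,
                   dimnames = list(NULL, c("b0", "b1", "var.b1", "sigma.sq")))

for (r in 1:R) {
  eps <- rnorm(n, mean = 0, sd = sigma)   # errors first
  x <- runif(n, min = 0, max = 1)         # explanatory variable second
  y <- beta0 + beta1 * x + eps
  mod.fit <- lm(y ~ x)
  save.res[r, ] <- c(coef(mod.fit),               # intercept and slope estimates
                     vcov(mod.fit)[2, 2],         # estimated variance of the slope
                     summary(mod.fit)$sigma^2)    # estimated error variance
}

# Averages over the simulated data sets, for comparison with the true values
colMeans(save.res)

# Estimated true confidence level of the t-based interval for beta1
t.crit <- qt(1 - alpha / 2, df = n - 2)
half.width <- t.crit * sqrt(save.res[, "var.b1"])
lower <- save.res[, "b1"] - half.width
upper <- save.res[, "b1"] + half.width
est.level <- mean(lower < beta1 & beta1 < upper)
est.level
```

Preallocating the results matrix and filling one row per iteration keeps the loop simple; wrapping the loop in system.time() is one way to time a subset of the simulations before running all of them.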