Assignment #1

STAT 992
Spring 2015
Complete the problems below. Within each part, include your R program output with the code inside of it,
along with any additional information needed to explain your answer. Your R code and output
should be formatted exactly as in the lecture notes.
1) (37 total points) The purpose of this problem is for you to obtain experience with MC simulation in
the context of simple linear regression models.
a) (3 points) Simulate one data set consisting of an explanatory variable X and a response
variable Y using the model $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon \sim$ independent $N(0, \sigma^2)$, $\beta_0 = 1$, $\beta_1 = 2$, and
$\sigma^2 = 1$. Use a sample size of n = 20 and X ~ independent Uniform(0,1). Set a seed of 9110
prior to simulating the data. Simulate $\varepsilon$ first with rnorm() and X second with runif(). Your first
observation should be x = 0.16184103, y = 1.5476160.
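A minimal sketch of the simulation in part a), drawing $\varepsilon$ before X as required. The object names (n, beta0, beta1, sigma, set1) are illustrative choices, not names the assignment requires. Note that rnorm() is parameterized by the standard deviation, so sd = $\sqrt{\sigma^2}$ = 1.

```r
# Model settings from part a)
n <- 20
beta0 <- 1
beta1 <- 2
sigma <- 1  # sigma^2 = 1, and rnorm() expects the standard deviation

set.seed(9110)
# Draw the errors first, then the explanatory variable, in the stated order
epsilon <- rnorm(n = n, mean = 0, sd = sigma)
x <- runif(n = n, min = 0, max = 1)
y <- beta0 + beta1 * x + epsilon

set1 <- data.frame(x = x, y = y)
head(set1)  # compare the first row with the values given in part a)
```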
b) (5 points) Estimate and state the corresponding simple linear regression model for the data
simulated in part a). Are the estimates of $\beta_0$, $\beta_1$, and $\sigma^2$ close to their actual values? If not,
discuss whether this is a concern.
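One way to obtain the part b) estimates with lm(), sketched under the part a) settings: coef() returns $\hat\beta_0$ and $\hat\beta_1$, and the squared residual standard error from summary() is the MSE, which estimates $\sigma^2$. The names mod.fit, beta.hat, and sigma2.hat are hypothetical.

```r
# Re-create the part a) data so this sketch runs on its own
n <- 20
set.seed(9110)
epsilon <- rnorm(n = n, mean = 0, sd = 1)
x <- runif(n = n, min = 0, max = 1)
y <- 1 + 2 * x + epsilon
set1 <- data.frame(x = x, y = y)

# Fit the simple linear regression model
mod.fit <- lm(formula = y ~ x, data = set1)
beta.hat <- coef(mod.fit)                # estimates of beta0 and beta1
sigma2.hat <- summary(mod.fit)$sigma^2   # MSE = SSE / (n - 2), estimates sigma^2
beta.hat
sigma2.hat
```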
c) (3 points) Simulate R = 10000 different data sets using the same initial seed as in part a).
Verify your second simulated data set has a first observation of x = 0.9703157, y = 3.1645653.
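A possible setup for part c), assuming each row of an R × n matrix holds one data set and that all $\varepsilon$ values are drawn before all X values. The check values stated in part c) depend on exactly how the draws are ordered and arranged into data sets, so compare the printed first observation of the second data set against them before relying on this layout.

```r
R <- 10000
n <- 20
set.seed(9110)
# Assumed layout: row r of each matrix holds the n observations of data set r
epsilon <- matrix(data = rnorm(n = n * R, mean = 0, sd = 1), nrow = R, ncol = n)
X <- matrix(data = runif(n = n * R, min = 0, max = 1), nrow = R, ncol = n)
Y <- 1 + 2 * X + epsilon

# First observation of the second simulated data set
c(x = X[2, 1], y = Y[2, 1])
```

Generating everything in two large calls avoids growing objects inside a loop; the trade-off is that the mapping from draws to data sets is fixed by the matrix fill order (column-major by default).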
d) (3 points) Estimate how long R = 10000 MC simulations would take with the simulated data in
c) if $\hat\beta_0$, $\hat\beta_1$, $\widehat{Var}(\hat\beta_1)$, and $\hat\sigma^2$ were calculated for each data set. Use the first 100 simulated data
sets to make this judgment. I recommend using the for() function to help estimate each
model.
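A sketch of the timing idea in part d): time the first 100 fits with system.time() and scale linearly to R. It assumes the data from part c) are stored one data set per matrix row, and that the cost per data set is roughly constant; the names R.test, save.test, and time.test are hypothetical.

```r
R <- 10000
n <- 20
set.seed(9110)
# Assumed layout: row r of each matrix is data set r
epsilon <- matrix(data = rnorm(n = n * R, mean = 0, sd = 1), nrow = R, ncol = n)
X <- matrix(data = runif(n = n * R, min = 0, max = 1), nrow = R, ncol = n)
Y <- 1 + 2 * X + epsilon

R.test <- 100
save.test <- matrix(data = NA, nrow = R.test, ncol = 4,
  dimnames = list(NULL, c("beta0.hat", "beta1.hat", "var.beta1.hat", "sigma2.hat")))
time.test <- system.time(
  for (r in 1:R.test) {
    x <- X[r, ]
    y <- Y[r, ]
    mod.fit <- lm(formula = y ~ x)
    # vcov()[2, 2] is the estimated variance of the slope estimate
    save.test[r, ] <- c(coef(mod.fit), vcov(mod.fit)[2, 2], summary(mod.fit)$sigma^2)
  }
)
# Linear extrapolation (in seconds) from 100 data sets to all R data sets
time.test[["elapsed"]] * R / R.test
```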
e) (5 points) Estimate the corresponding simple linear regression model for each data set
simulated in c). For each data set, save the following: $\hat\beta_0$, $\hat\beta_1$, $\widehat{Var}(\hat\beta_1)$, and $\hat\sigma^2$. Print their
values for the first six data sets. Compare your time estimate from part d) with how long it actually took
for all R simulations.
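A sketch of the full part e) loop under the same one-data-set-per-row assumption. Here $\widehat{Var}(\hat\beta_1)$ is taken as the squared slope standard error from summary(), which equals $\hat\sigma^2 / \sum_{i=1}^{n}(X_i - \bar{X})^2$. The names save.est and time.all are hypothetical.

```r
R <- 10000
n <- 20
set.seed(9110)
# Assumed layout: row r of each matrix is data set r
epsilon <- matrix(data = rnorm(n = n * R, mean = 0, sd = 1), nrow = R, ncol = n)
X <- matrix(data = runif(n = n * R, min = 0, max = 1), nrow = R, ncol = n)
Y <- 1 + 2 * X + epsilon

save.est <- matrix(data = NA, nrow = R, ncol = 4,
  dimnames = list(NULL, c("beta0.hat", "beta1.hat", "var.beta1.hat", "sigma2.hat")))
time.all <- system.time(
  for (r in 1:R) {
    x <- X[r, ]
    y <- Y[r, ]
    mod.fit <- lm(formula = y ~ x)
    sum.fit <- summary(mod.fit)
    # Squared standard error of the slope = estimated Var(beta1.hat)
    save.est[r, ] <- c(coef(mod.fit),
                       sum.fit$coefficients[2, "Std. Error"]^2,
                       sum.fit$sigma^2)
  }
)
head(save.est, n = 6)
time.all[["elapsed"]]  # actual elapsed seconds, to compare with the part d) projection
```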
f) (6 points) Evaluate the approximate unbiasedness of $\hat\beta_0$, $\hat\beta_1$, $\widehat{Var}(\hat\beta_1)$, and $\hat\sigma^2$ using the values
obtained from e).
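A sketch of one way to approach part f): compare each Monte Carlo average with its target and judge the difference against its MC standard error (a rough rule of thumb is a bias within about two standard errors of zero). Two assumptions: the estimates are computed with closed-form least-squares formulas, vectorized over rows, instead of lm(), to keep the sketch fast and self-contained; and because the true $Var(\hat\beta_1)$ depends on the random X values, the sample variance of the $\hat\beta_1$ values stands in for it, the same substitution part i) suggests.

```r
R <- 10000
n <- 20
set.seed(9110)
# Assumed layout: row r of each matrix is data set r
epsilon <- matrix(data = rnorm(n = n * R, mean = 0, sd = 1), nrow = R, ncol = n)
X <- matrix(data = runif(n = n * R, min = 0, max = 1), nrow = R, ncol = n)
Y <- 1 + 2 * X + epsilon

# Closed-form least squares for every row at once (the values lm() would give);
# x.bar recycles down the columns, so row r is centered by x.bar[r]
x.bar <- rowMeans(X)
y.bar <- rowMeans(Y)
Sxx <- rowSums((X - x.bar)^2)
beta1.hat <- rowSums((X - x.bar) * (Y - y.bar)) / Sxx
beta0.hat <- y.bar - beta1.hat * x.bar
sigma2.hat <- rowSums((Y - beta0.hat - beta1.hat * X)^2) / (n - 2)
var.beta1.hat <- sigma2.hat / Sxx

est <- cbind(beta0.hat, beta1.hat, var.beta1.hat, sigma2.hat)
target <- c(1, 2, var(beta1.hat), 1)  # var(beta1.hat) is itself an MC estimate
data.frame(target = target,
           mc.mean = colMeans(est),
           bias = colMeans(est) - target,
           mc.se = apply(X = est, MARGIN = 2, FUN = sd) / sqrt(R))
```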
g) (4 points) A standard t-based confidence interval for $\beta_1$ is $\hat\beta_1 \pm t_{1-\alpha/2,\,n-2}\,\widehat{Var}(\hat\beta_1)^{1/2}$. What is the
estimated true confidence level of this interval using the values obtained from e)? Set $\alpha = 0.05$.
Is the confidence interval conservative, liberal, or neither?
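A sketch of estimating the true confidence level in part g): the proportion of the R intervals that contain $\beta_1 = 2$. The estimates are computed with closed-form least-squares formulas rather than lm() (an assumption made for speed and self-containment). A Wald interval for a binomial proportion is one way to judge the result: an interval lying entirely above 0.95 suggests a conservative procedure, one entirely below suggests a liberal one, and one containing 0.95 suggests neither.

```r
R <- 10000
n <- 20
set.seed(9110)
# Assumed layout: row r of each matrix is data set r
epsilon <- matrix(data = rnorm(n = n * R, mean = 0, sd = 1), nrow = R, ncol = n)
X <- matrix(data = runif(n = n * R, min = 0, max = 1), nrow = R, ncol = n)
Y <- 1 + 2 * X + epsilon

# Closed-form slope estimate and its estimated variance for every data set
x.bar <- rowMeans(X)
y.bar <- rowMeans(Y)
Sxx <- rowSums((X - x.bar)^2)
beta1.hat <- rowSums((X - x.bar) * (Y - y.bar)) / Sxx
beta0.hat <- y.bar - beta1.hat * x.bar
var.beta1.hat <- (rowSums((Y - beta0.hat - beta1.hat * X)^2) / (n - 2)) / Sxx

alpha <- 0.05
t.crit <- qt(p = 1 - alpha / 2, df = n - 2)
lower <- beta1.hat - t.crit * sqrt(var.beta1.hat)
upper <- beta1.hat + t.crit * sqrt(var.beta1.hat)
covered <- (lower < 2) & (2 < upper)  # does each interval contain beta1 = 2?
est.conf <- mean(covered)
est.conf
# Approximate 95% Wald interval for the true confidence level
est.conf + qnorm(p = c(0.025, 0.975)) * sqrt(est.conf * (1 - est.conf) / R)
```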
h) (4 points) A standard t-based hypothesis test of $H_0: \beta_1 = 2$ vs. $H_a: \beta_1 \ne 2$ uses the test statistic
$t_0 = (\hat\beta_1 - 2) / \widehat{Var}(\hat\beta_1)^{1/2}$. What is the estimated size of this test using the values obtained from
e)? Set $\alpha = 0.05$. Is the hypothesis test procedure conservative, liberal, or neither?
i) (4 points) The actual sampling distribution of $\hat\beta_1$ is $N\left(\beta_1,\ \sigma^2 / \sum_{i=1}^{n}(X_i - \bar{X})^2\right)$. Construct an EDF plot
and a histogram with this sampling distribution overlaid upon them. Discuss how well the
actual sampling distribution approximates the empirical distributions plotted. Because
$\sigma^2 / \sum_{i=1}^{n}(X_i - \bar{X})^2$ will not be exactly the same for each data set, simply use $(R - 1)^{-1} \sum_{r=1}^{R}(\hat\beta_{1,r} - \bar{\hat\beta}_1)^2$,
where $\hat\beta_{1,r}$ is the estimated value of $\beta_1$ for the rth simulated data set and $\bar{\hat\beta}_1 = R^{-1} \sum_{r=1}^{R}\hat\beta_{1,r}$.
2) (10 points) For a statistical problem of your own choosing, perform ONE set of MC simulations to
evaluate either the unbiasedness of an estimator, the true confidence level of a confidence
interval, OR the size of a hypothesis test. Describe the statistical problem so that a student who
has completed the first year of our MS program would understand it.