Assignment

Name__________________________
Extra Credit E733
Due Friday, December 4th, at 9:00 AM
A perfectly done assignment will be worth 25 points added to your total points.
o It will be very difficult to earn all 25 points. However, most people who completed
this last year did earn some points.
For this assignment, use the Word document on the class website to type and paste into,
but give me a hard copy. Keep everything on the same page as it is in this document.
o When you type things in, please use Arial font and italicize it (making it blue would
also be nice).
For everything you do set the seed to 1937 so we all get the same results.
Part I (short answer/theory review)
1. What are some uses of the Weibull distribution?
2. What are some applications of the log normal distribution?
3. What are some applications of the F distribution?
4. What are some applications of the chi-square distribution?

These notes may not be distributed or copied without the permission of David Welsch
Part II (simulations)
• Unless otherwise stated, use 10,000 observations in this section.
• Remember that we want to set the seed to 1937.
5. Bunch of F’s: simulate the following 4 F-distributions (remember I did this slightly
incorrectly in the notes):
one with 1 numerator df and 5 denominator df
another with 5 numerator df and 5 denominator df
another with 1 numerator df and 100 denominator df
another with 10 numerator df and 1 denominator df
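One standard way to build an F variate is as a ratio of two independent chi-squares, each divided by its degrees of freedom. The lines below are only a minimal sketch of that construction. The degrees of freedom (3 and 20) and the variable name f_3_20 are illustrative assumptions, not the values asked for above, and this is not necessarily the approach used in the notes.

```stata
* Sketch: F(3, 20) draws built from two independent chi-squares
* (illustrative degrees of freedom, not the ones in the assignment)
clear
set seed 1937
set obs 10000
* (chi2_num/df_num) / (chi2_den/df_den) follows an F(df_num, df_den)
generate f_3_20 = (rchi2(3)/3) / (rchi2(20)/20)
* the sample mean should be near 20/(20-2), about 1.11
summarize f_3_20
kdensity f_3_20
```

Each call to rchi2() produces a fresh, independent draw for every observation, which is what the ratio construction requires.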
Paste your kdensity graph here:
6. Put the kernel density graph of an exponential with a lambda of 2 (mean of ½) here.
(To do this, use the Stata command for the gamma distribution):
7. Using Stata simulations, demonstrate that a chi-square with 3 degrees of freedom is the same
as “adding up three squared standard normal random variables”. For this one use 100,000
observations.
a) Copy and paste your Stata code here:
b) Copy and paste two nicely labeled sets of descriptive statistics here:
c) Copy and paste two nicely labeled histograms here:
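As a reference point for checking the output of question 7, the claim being demonstrated can be written out as follows. The symbols $Z_1, Z_2, Z_3$ and $Q$ are introduced here for illustration and do not appear in the original handout.

$$Z_1, Z_2, Z_3 \overset{\text{iid}}{\sim} N(0,1) \;\Longrightarrow\; Q = Z_1^2 + Z_2^2 + Z_3^2 \sim \chi^2_3, \qquad E[Q] = 3, \quad \operatorname{Var}(Q) = 6$$

If the simulation is set up correctly, both sets of descriptive statistics should show a mean near 3 and a variance near 6.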
8. Simulate 4 different gamma distributions in Stata for the following gamma functions and put
them all on the same graph.
Note: this is “standard notation”; you will give the Stata command different values.
Gamma(2,1), Gamma(2,2), Gamma(2,4), Gamma(2,10)
Put your graph here:
9. Simulate 4 different gamma distributions in Stata for the following gamma functions and put
them all on the same graph.
Note: this is “standard notation”; you will give the Stata command different values.
Gamma(1,2), Gamma(2,2), Gamma(4,2), Gamma(10,2)
Put your graph here:
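When translating notation into Stata, it helps to remember that rgamma(a, b) takes a shape a and a scale b, so its draws have mean a*b and variance a*b^2. The sketch below checks this with illustrative values (shape 3 with scales 2 and 5, which are not the parameters asked for above) and overlays two kernel densities on one graph. The variable names are assumptions made for the example. Whether the second argument in the handout's notation is a rate or a scale is for the course notes to settle.

```stata
* Sketch with illustrative values (not the parameters in the assignment)
clear
set seed 1937
set obs 10000
* rgamma(a, b): a is the shape and b is the scale, so the mean is a*b
generate g_shape3_scale2 = rgamma(3, 2)
generate g_shape3_scale5 = rgamma(3, 5)
* sample means should be close to 3*2 = 6 and 3*5 = 15
summarize g_shape3_scale2 g_shape3_scale5
* overlay both kernel densities on one graph
twoway (kdensity g_shape3_scale2) (kdensity g_shape3_scale5), ///
    legend(order(1 "shape 3, scale 2" 2 "shape 3, scale 5"))
```

The `///` line continuation works in a do-file. When typing interactively, put the twoway command on a single line.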
10. Did the previous two problems give you any insight into what the gamma density
looks like?
Part III (Demonstrating the CLT)
11. CLT with a gamma
• We will start by taking samples of size 30.
Run the following code (note: be careful to type the correct quote characters: ` and '):
set seed 1937
postfile sim_mem xmean using CLTsimres1, replace
* draw 100,000 samples and store each sample mean
forvalues i=1/100000 {
    drop _all
    quietly set obs 30
    tempvar x
    generate `x'=rgamma(10,.5)
    quietly sum `x'
    post sim_mem (r(mean))
}
postclose sim_mem
* load the simulated sample means and plot them
use CLTsimres1, clear
sum
histogram xmean
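For reference, the result this simulation is meant to illustrate can be stated generically. The symbols $\mu$ and $\sigma^2$ (the population mean and variance) and $n$ (the size of each sample) are introduced here and are not used in the original handout. For i.i.d. draws $X_1, \ldots, X_n$ with finite variance,

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0,1) \quad \text{as } n \to \infty$$

so for a large enough sample size the histogram of xmean should look roughly normal, centered at $\mu$ with standard deviation $\sigma/\sqrt{n}$.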
A. What is the population mean?
B. Put your histogram here.

Now change the number of observations in each of your samples to 10.
C. Put your histogram here.
D. Why is the shape of this histogram somewhat surprising?
Part IV (simulating OLS)
12. A two-variable regression with normally distributed errors.
Consider the following DGP:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \qquad i = 1, \ldots, 1{,}000{,}000$$
where $\beta_0 = 5$, $\beta_1 = 2$, $\beta_2 = 6$, and $\varepsilon_i$ is an error term (independent of the X’s) that is
normally distributed with a mean of 10 and a standard deviation of 30.
Run the following Stata code to create a population of 1 million (note: we can pick x1 and x2
from any distribution, as long as they are random):
clear
set seed 1937
set obs 1000000
gen x1=1000*runiform()-100
gen x2=rgamma(2,3)
gen error=rnormal(10,30)
gen y=5+2*x1+6*x2+error
Now run the following Stata code to randomly pick a sample (n=10,000) from this population
and run a regression:
* randomly sorts the population and picks first 10,000 observations
generate rannum=runiform()
sort rannum
keep in 1/10000
reg y x1 x2
Note: it would be OK to skip the second step and just run the regression on the full dataset created first.
Now answer the questions below.
A. Are the estimated coefficients (approximately) what we would expect?
B. What is the true value of the population constant term?
C. What is your estimate of the constant term?
D. Why are the true constant and the estimated constant so different?
Now create an irrelevant variable and add it to the regression to see how much it affects the
estimated coefficients.
Run the following Stata code:
* generate a random X3 that is not part of the DGP
gen x3=rgamma(33,12)
* run the regression to see how including this irrelevant variable
* affects the estimated coefficients
reg y x1 x2 x3
E. How much did it affect your “beta-hat 1” and your “beta-hat 2”?
F. Which estimated coefficient did it affect?
G. Why is the significance of X3 interesting?
13. Now try to create your own DGP where the errors are not normally distributed and sample
(n=100,000) from it. Did it matter that the errors were not normally distributed?