Name__________________________

Extra Credit E733
Due Friday, December 4th, at 9:00 AM

• A perfectly done assignment will be worth 25 points added into your total points.
    o It will be very difficult to earn all 25 points. However, most people who completed this last year did earn some points.
• For this assignment, use the Word document on the class website to type and paste into, but give me a hard copy.
• Keep everything on the same page as it is in this document.
    o When you type things in, please use Arial font and italicize it (making it blue would also be nice).
• For everything you do, set the seed to 1937 so we all get the same results.

Part I (short answer/theory review)

1. What are some uses of the Weibull distribution?

2. What are some applications of the log normal distribution?

3. What are some applications of the F distribution?

4. What are some applications of the chi-square distribution?

These notes may not be distributed or copied without the permission of David Welsch.

Part II (simulations)

Unless otherwise stated, use 10,000 observations in this section. Remember, we want to set the seed to 1937.

5. Bunch of F’s: simulate the following 4 F-distributions (remember, I did this slightly incorrectly in the notes):
    • one with 1 numerator df and 5 denominator df
    • another with 5 numerator df and 5 denominator df
    • another with 1 numerator df and 100 denominator df
    • another with 10 numerator df and 1 denominator df
   Paste your kdensity graph here:

6. Put the kernel density graph of an exponential with a lambda of 2 (mean of ½) here (to do this, use the Stata command for the gamma distribution):

7. Using Stata simulations, demonstrate that a chi-square with 3 degrees of freedom is the same as “adding up three squared standard normal random variables”. For this one use 100,000 observations.
    a) Copy and paste your Stata code here:
    b) Copy and paste two nicely labeled sets of descriptive statistics here:
    c) Copy and paste two nicely labeled histograms here:

8.
Simulate 4 different gamma distributions in Stata for the following gamma functions and put them all on the same graph. Note: this is “standard notation”; you will need to give the Stata command different values.
    Gamma(2,1), Gamma(2,2), Gamma(2,4), Gamma(2,10)
   Put your graph here:

9. Simulate 4 different gamma distributions in Stata for the following gamma functions and put them all on the same graph. Note: this is “standard notation”; you will need to give the Stata command different values.
    Gamma(1,2), Gamma(2,2), Gamma(4,2), Gamma(10,2)
   Put your graph here:

10. Did the previous two problems give you any insight into the properties of the gamma density and what it looks like?

Part III (Demonstrating the CLT)

11. CLT with a gamma
We will start by taking samples of size 30. Run the following code (note: be careful to use the right characters for ` and '):

    set seed 1939
    postfile sim_mem xmean using CLTsimres1, replace
    forvalues i=1/100000 {
        drop _all
        quietly set obs 30
        tempvar x
        generate `x'=rgamma(10,.5)
        quietly sum `x'
        post sim_mem (r(mean))
    }
    postclose sim_mem
    use CLTsimres1, clear
    sum
    histogram xmean

    A. What is the population mean?
    B. Put your histogram here.

Now change the number of observations in each of your samples to 10.

    C. Put your histogram here.
    D. Why is the shape of this histogram somewhat surprising?

Part IV (simulating OLS)

12. A two-variable regression with normally distributed errors. Consider the following DGP:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \qquad i = 1, \ldots, 1{,}000{,}000$$

where $\beta_0 = 5$, $\beta_1 = 2$, $\beta_2 = 6$, and $\varepsilon_i$ is an error term (independent of the X’s) that is normally distributed with a mean of 10 and a standard deviation of 30.
Run the following Stata code to create a population of 1 million (note: we can pick x1 and x2 from any distribution, as long as they are random):

    clear
    set seed 1937
    set obs 1000000
    gen x1=1000*runiform()-100
    gen x2=rgamma(2,3)
    gen error=rnormal(10,30)
    gen y=5+2*x1+6*x2+error

Now run the following Stata code to randomly pick a sample of size n = 10,000 from this population and run a regression:

    * randomly sorts the population and picks first 10,000 observations
    generate rannum=runiform()
    sort rannum
    keep in 1/10000
    reg y x1 x2

Note: it would be OK to skip the second step and just run a regression on the first dataset created.

Now answer the questions below.

    A. Are the estimated coefficients (approximately) what we would expect?
    B. What is the true value of the population constant term?
    C. What is your estimate of the constant term?
    D. Why are the true constant and the estimated constant so different?

Now create an irrelevant variable and add it to the regression to see how much it affects the estimated coefficients. Run the following Stata code:

    * generate a random X3 that is not part of the DGP
    gen x3=rgamma(33,12)
    * run the regression to see how including this irrelevant variable
    * will affect the estimated coefficients
    reg y x1 x2 x3

    E. How much did it affect your “beta-hat 1” and your “beta-hat 2”?
    F. Which estimated coefficient did it affect?
    G. Why is the significance of X3 interesting?

13. Now try to create your own DGP where the errors are not normally distributed and sample (n=100,000) from it. Did it matter that the errors were not normally distributed?
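If you are unsure how to set up question 13, one possible starting point is sketched below. It is only an illustration and not the required answer: the centered gamma error, the variable name error2, and the reuse of the question 12 coefficients are all assumptions made for this example, and any other non-normal error distribution would serve as well.

```stata
* Sketch only: a DGP like question 12, but with a skewed (non-normal) error
clear
set seed 1937
set obs 1000000
gen x1=1000*runiform()-100
gen x2=rgamma(2,3)
* assumption: error drawn from a gamma with shape 2 and scale 3 (mean 6),
* shifted by 6 so that it has mean zero but is still right-skewed
gen error2=rgamma(2,3)-6
gen y=5+2*x1+6*x2+error2
* check that the error is centered near zero and look at its shape
sum error2
histogram error2
* randomly sort the population and keep the first 100,000 observations
generate rannum=runiform()
sort rannum
keep in 1/100000
reg y x1 x2
```

Stata's rgamma(a,b) takes a shape a and a scale b, so rgamma(2,3) has mean 2*3 = 6; subtracting 6 keeps the skewed shape while centering the error at zero. Compare the estimated coefficients with the values used in the gen y line.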