Homework 1 1. Show that (x i x )( yi y ) ( xi x )( yi ) ( xi )( yi y ) i 2. i i (a) How does the above relate to 𝐸(𝑥 − 𝑥̅ )(𝑦 − 𝑦̅), where x and y are two random variables? Explain. (b) Write another expected value expression that is equivalent to the expression in (a). (c) What distributional concept does that expected value represent? How do questions 1 and 2 relate to probability and statistics? 3. [From Wool C5.3] Suppose you are interested in the effect of parental education on child birthweights. a. Your initial model is bwght = 0 + 1motheduc+u Suppose that mothereduc was a binary variable that was equal to 1 if a mother had completed high school and 0 if a mother had not completed high school. Interpret the coefficients 0 and 1. [I’m not asking you to run this regression, just to show me you know how to interpret coefficient on binary variables.] b. Your next model is bwght = 0 + 1cigs+ 2parity+3faminc+4motheduc+5fatheduc +u Suppose that faminc was measured in logs. How would you interpret 3? c. Use BWGHT.RAW for the following problem. Using STATA, estimate the regression equation in (b). What is the estimated coefficient on the number of cigarettes? Interpret it. What do we mean by asking “is that coefficient statistically significant”? What is the relevant hypothesis test and its associated test statistic? What do you conclude and why? d. Conduct an F test for whether or not parental education (both mother and father) are jointly significant. What is the test statistic? How is it distributed? What do you conclude and why? e. Now construct an LM test of whether motheduc and fatheduc are jointly significant. [NOTE: In obtaining the residuals for the restricted model, be sure that the restricted model is estimated using only those observations for which all the variables in the unrestricted model are available.] 4. Do the Fundamental Econometrics Concepts review. (Attached at the end.) STATA Basics Use the data you had in Anton’s class. If you do not have any data, download the cpsdata.dta from the website. 1. Load your data into Stata. 2. Describe your data. Label any variables missing descriptors. If you have string variables that should be converted to numeric or missing values that should be recoded, do that. 3. Calculate the summary statistics for your dataset. Note any features of interest using comments in your do-file (e.g., potential outliers, low variance variables, etc.) 4. Create a two way table for some of your variables of interest. 5. Use the tab, sum command for a variable of interest. 6. Make a well designed graph for a key relationship of interest. (Use labels, titles, etc.) 7. Use the gen command and the egen command to make two variables of interest. Label these. 8. Create a relevant dummy variable for your data. 9. Run a regression and conduct a hypothesis test of interest to you. Explain how to interpret the results using comments in your do-file. 10. Make a table with the regression results using the outreg2 command. Turn in Your do file Your log file A word document with a well designed graph and a formal table of regression results (from the outreg command, not cut and pasted Stata output). Fundamental Econometric Concepts The following concepts are at the heart of econometrics. I strongly encourage you to write up your answers to these questions in a format that you can refer to throughout this course. I will not provide answers to these questions. You will need to turn this in and continue to resubmit revisions until all of the answers are correct, so I strongly encourage you to type it up! If you did not cover a topic in 561, do not answer the question yet. OLS Estimator 1. What is an estimator? Why is an estimator a random variable? What does it mean that it has a distribution? 2. What does it mean for an estimator to be unbiased? Consistent? Efficient? BLUE? 3. What is the formula for the OLS estimator? Give the formula for the OLS estimator using summation notation for a single independent variable and using matrix notation for multiple independent variables. Give some intuition for the formulas. 4. What is the formula for the standard error of the OLS estimator? Give this in both summation notation and matrix notation. Explain the intuition for this formula. 5. What are the Gauss-Markov assumptions that must be satisfied for the OLS estimator to be BLUE? 6. Under what violations will the OLS estimator be biased? 7. Under what violations will the OLS estimator be inefficient? 8. Which of the GM assumptions can be tested directly? Complications: 9. Multicollinearity 10. Omitted variables 11. Included irrelevant variables 12. Measurement error in X 13. Measurement error in Y 14. Heteroskedasticity 15. Autocorrelation For 9-15, Explain a. Does this problem violate one of the GM assumptions? If so, which? b. Does this problem bias the coefficient estimates or the standard errors? Is what direction is the bias? (If there are specific conditions that need to be satisfied to sign the bias, state these.) c. How would you determine whether this problem was present in your data? If there are specific tests for this problem that YOU HAVE ALREADY STUDIED, list them. d. If there is a solution for this problem that YOU HAVE ALREADY STUDIED, describe it. Hypothesis Testing Once we have constructed a point estimate using our estimator, we of course would like to test hypotheses. To do those tests, we need to use an appropriate distribution for the estimator. (Why?) A distribution is characterized by a family (normal, t, F, Poisson, etc.) and its parameters (mean, variance, etc.) This distribution of the OLS estimator depends on a number of factors, including whether Errors are normally distributed Errors are not normally distributed Variance of errors is known Variance of errors is unknown Sample is small Sample is large List the implications of the above cases (in the relevant combinations) for a. The distribution of the OLS Estimator b. The mean of that distribution c. The standard error d. Hypothesis tests about β (e.g., what distributions will you use, etc.) e. Hypothesis tests about linear restrictions Be sure to state when something is known and when it is estimated, when a distribution is exact and when it is approximate, etc. Hypothesis testing: For each of the following, 1. Give a few examples of hypotheses where the test statistic takes on this distribution/form. 2. What types of conclusions are made about these hypotheses (reject, fail to reject, accept, fail to accept, etc)? 3. How is the test statistic constructed? 4. How is it distributed (what is the relevant distribution/degrees of freedom)? 5. How is the test statistic used to form conclusions about the hypothesis? That is, the test statistic is calculated to be some number—say, 3.2. What does that tell you? Tests A. t-test B. F test C. Wald test D. LM test