Hypothesis Testing - Montana State University

advertisement
Homework 1
1. Show that
 (x
i
 x )( yi  y )   ( xi  x )( yi )   ( xi )( yi  y )
i
2.
i
i
(a) How does the above relate to 𝐸(𝑥 − 𝑥̅ )(𝑦 − 𝑦̅), where x and y are two random variables?
Explain. (b) Write another expected value expression that is equivalent to the expression in
(a). (c) What distributional concept does that expected value represent? How do questions 1
and 2 relate to probability and statistics?
3. [From Wool C5.3] Suppose you are interested in the effect of parental education on child
birthweights.
a.
Your initial model is
bwght = 0 + 1motheduc+u
Suppose that mothereduc was a binary variable that was equal to 1 if a mother had
completed high school and 0 if a mother had not completed high school. Interpret the
coefficients 0 and 1. [I’m not asking you to run this regression, just to show me you
know how to interpret coefficient on binary variables.]
b. Your next model is
bwght = 0 + 1cigs+ 2parity+3faminc+4motheduc+5fatheduc +u
Suppose that faminc was measured in logs. How would you interpret 3?
c. Use BWGHT.RAW for the following problem. Using STATA, estimate the regression
equation in (b). What is the estimated coefficient on the number of cigarettes? Interpret
it. What do we mean by asking “is that coefficient statistically significant”? What is the
relevant hypothesis test and its associated test statistic? What do you conclude and why?
d. Conduct an F test for whether or not parental education (both mother and father) are
jointly significant. What is the test statistic? How is it distributed? What do you
conclude and why?
e. Now construct an LM test of whether motheduc and fatheduc are jointly significant.
[NOTE: In obtaining the residuals for the restricted model, be sure that the restricted
model is estimated using only those observations for which all the variables in the
unrestricted model are available.]
4. Do the Fundamental Econometrics Concepts review. (Attached at the end.)
STATA Basics
Use the data you had in Anton’s class. If you do not have any data, download the cpsdata.dta
from the website.
1. Load your data into Stata.
2. Describe your data. Label any variables missing descriptors. If you have string variables
that should be converted to numeric or missing values that should be recoded, do that.
3. Calculate the summary statistics for your dataset. Note any features of interest using
comments in your do-file (e.g., potential outliers, low variance variables, etc.)
4. Create a two way table for some of your variables of interest.
5. Use the tab, sum command for a variable of interest.
6. Make a well designed graph for a key relationship of interest. (Use labels, titles, etc.)
7. Use the gen command and the egen command to make two variables of interest. Label these.
8. Create a relevant dummy variable for your data.
9. Run a regression and conduct a hypothesis test of interest to you. Explain how to interpret
the results using comments in your do-file.
10. Make a table with the regression results using the outreg2 command.
Turn in
 Your do file
 Your log file
 A word document with a well designed graph and a formal table of regression results
(from the outreg command, not cut and pasted Stata output).
Fundamental Econometric Concepts
The following concepts are at the heart of econometrics. I strongly encourage you to
write up your answers to these questions in a format that you can refer to throughout
this course. I will not provide answers to these questions. You will need to turn this
in and continue to resubmit revisions until all of the answers are correct, so I strongly
encourage you to type it up!
If you did not cover a topic in 561, do not answer the question yet.
OLS Estimator
1. What is an estimator? Why is an estimator a random variable? What does it
mean that it has a distribution?
2. What does it mean for an estimator to be unbiased? Consistent? Efficient?
BLUE?
3. What is the formula for the OLS estimator? Give the formula for the OLS
estimator using summation notation for a single independent variable and using
matrix notation for multiple independent variables. Give some intuition for the
formulas.
4. What is the formula for the standard error of the OLS estimator? Give this in
both summation notation and matrix notation. Explain the intuition for this
formula.
5. What are the Gauss-Markov assumptions that must be satisfied for the OLS
estimator to be BLUE?
6. Under what violations will the OLS estimator be biased?
7. Under what violations will the OLS estimator be inefficient?
8. Which of the GM assumptions can be tested directly?
Complications:
9. Multicollinearity
10. Omitted variables
11. Included irrelevant variables
12. Measurement error in X
13. Measurement error in Y
14. Heteroskedasticity
15. Autocorrelation
For 9-15, Explain
a. Does this problem violate one of the GM assumptions? If so, which?
b. Does this problem bias the coefficient estimates or the standard errors? Is
what direction is the bias? (If there are specific conditions that need to be
satisfied to sign the bias, state these.)
c. How would you determine whether this problem was present in your data?
If there are specific tests for this problem that YOU HAVE ALREADY
STUDIED, list them.
d. If there is a solution for this problem that YOU HAVE ALREADY
STUDIED, describe it.
Hypothesis Testing
Once we have constructed a point estimate using our estimator, we of course would
like to test hypotheses. To do those tests, we need to use an appropriate distribution
for the estimator. (Why?) A distribution is characterized by a family (normal, t, F,
Poisson, etc.) and its parameters (mean, variance, etc.)
This distribution of the OLS estimator depends on a number of factors, including
whether






Errors are normally distributed
Errors are not normally distributed
Variance of errors is known
Variance of errors is unknown
Sample is small
Sample is large
List the implications of the above cases (in the relevant combinations) for
a. The distribution of the OLS Estimator
b. The mean of that distribution
c. The standard error
d. Hypothesis tests about β (e.g., what distributions will you use, etc.)
e. Hypothesis tests about linear restrictions
Be sure to state when something is known and when it is estimated, when a
distribution is exact and when it is approximate, etc.
Hypothesis testing:
For each of the following,
1. Give a few examples of hypotheses where the test statistic takes on this
distribution/form.
2. What types of conclusions are made about these hypotheses (reject, fail to
reject, accept, fail to accept, etc)?
3. How is the test statistic constructed?
4. How is it distributed (what is the relevant distribution/degrees of freedom)?
5. How is the test statistic used to form conclusions about the hypothesis? That
is, the test statistic is calculated to be some number—say, 3.2. What does that
tell you?
Tests
A. t-test
B. F test
C. Wald test
D. LM test
Download