Lab #2 - pantherFILE

Lab #2
POL SCI 701
Fall 2002
Let's start out by taking the STATA tutorial on regression. Type tutorial regress. Once
you've completed the tutorial, you may move on to the actual assignment.
Part I: Follow along with Hamilton.
This section is designed to get you to read the Hamilton chapters I gave you and to see
how STATA runs certain commands. I also hope to give you experience interpreting the
output obtained by running the various regression commands. Be sure to answer each
question fully and to attach all relevant output. I'd prefer responses to the various
questions to be typed. You can either answer each number separately or write your
responses in essay form. Just be sure to answer each part of each question. NOTE: For
this section, it is enough to use only the tests given in the Hamilton chapters. In Part
II, you will also be asked to run tests learned in class.
Using states.dta (from the Hamilton package), do the following:
1) Run a regression using mean composite SAT score as the dependent
variable and per pupil expenditures as the independent variable. Interpret
all relevant output: F, R-squared, Adj R-squared, the effect of x on y,
whether x is significant and at what level, and Root MSE (the standard
error of the estimate we've talked about before). What does this tell us
substantively? (Attach both your answers and the output generated.)
2) Control for the potential influence of other variables including
percentage of high school grads taking the test, median household
income, percentage of adults over 25 with a high school diploma, and
percentage of adults over 25 with a bachelor's degree.
What do you find? Again, interpret (and attach) all relevant output. Test
the null hypothesis that the addition of these variables was not necessary.
(HINT: Conduct an F test using information obtained from this regression
and the regression in number 1.)
3) Which variable has the most substantive impact on SAT scores? (Run
the appropriate regression needed to answer this question and interpret all
relevant output.)
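Questions 1 through 3 can be sketched in STATA roughly as follows. This is a minimal sketch, not the required write-up. The variable names (csat, expense, percent, income, high, college) are assumed from Hamilton's states.dta and should be checked with describe, and the file is assumed to sit in the current working directory. The last lines compute the F test from question 2 by hand from the two residual sums of squares, F = [(RSS_r - RSS_u)/q] / [RSS_u/df_u], with q = 4 added variables.

```stata
* Assumption: states.dta is in the working directory and uses these names.
use states, clear
describe csat expense percent income high college

* Question 1: bivariate regression of SAT score on per pupil expenditures.
regress csat expense

* Question 2: add the four control variables.
regress csat expense percent income high college

* Joint F test that the four added coefficients are all zero.
test percent income high college

* Question 3: same model with standardized (beta) coefficients.
regress csat expense percent income high college, beta

* The F test from question 2 by hand. The restricted model is re-run on
* the cases used by the full model so both fits share one sample.
regress csat expense percent income high college
scalar rss_u = e(rss)
scalar df_u  = e(df_r)
regress csat expense if e(sample)
scalar rss_r = e(rss)
scalar F = ((rss_r - rss_u)/4) / (rss_u/df_u)
display "F(4, " df_u ") = " F "   p = " Ftail(4, df_u, F)
```

Provided both regressions use the same cases, the hand-computed F should match the statistic reported by test.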
* For the following questions, use the large model without standardized
coefficients. You should run that model again so that it's the last regression in
STATA's memory.
4) Is there any evidence of multicollinearity in your model? Use plain old
correlations to test this, and check the output of your model for clues.
5) Is there any evidence of autocorrelation in your model? Use the
Durbin-Watson test. (Note: To use the t() option with regdw, you should
first sort the data by region, then create a variable that merely numbers
the cases, so that each state is now a number (use gen stcode=_n), and
then use that new variable as the time variable. It should be clear to you
why we would need to do this. [See your notes on autocorrelation if it is
not!]) Compare your results here with an appropriate diagnostic plot.
What do you find?
6) Is there evidence of an omitted variable bias? Use ovtest.
7) Is there evidence of heteroskedasticity? Use hettest. Which states
seem to be the problem? [HINT: You may want to run a diagnostic plot to
determine this]
8) Are there any influential cases? Use cooksd, dfits, and dfbeta to
determine this.
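The diagnostics in questions 4 through 8 might be run along these lines. This is a sketch under several assumptions: it uses the estat spellings from STATA 9 and later (estat ovtest, estat hettest, estat dwatson) where the handout names ovtest, hettest, and regdw; it assumes the variable names csat, expense, percent, income, high, college, region, and state from Hamilton's states.dta; and the cutoffs 4/n and 2*sqrt(k/n) are only common rules of thumb, not fixed thresholds.

```stata
* Assumption: states.dta is in the working directory.
use states, clear

* 5) Order the cases by region and number them so the counter can stand
*    in for time. This is done first so the regression below is the one
*    in memory for every test that follows.
sort region
generate stcode = _n
tsset stcode

* The large model without standardized coefficients.
regress csat expense percent income high college

* 4) Pairwise correlations among the independent variables.
correlate expense percent income high college

* 5) Durbin-Watson d, plus residuals plotted against case order.
estat dwatson
predict ehat, resid
twoway scatter ehat stcode, yline(0)

* 6) Ramsey RESET test for omitted variables.
estat ovtest

* 7) Test for heteroskedasticity, then a residual-versus-fitted plot
*    with points labeled by state to see which cases stand out.
estat hettest
rvfplot, yline(0) mlabel(state)

* 8) Influence statistics: Cook's D, DFITS, and DFBETAs.
predict cook, cooksd
predict dfit, dfits
dfbeta

* List cases above the rule-of-thumb cutoffs ("< ." excludes missing).
list state cook if cook > 4/e(N) & cook < .
list state dfit if abs(dfit) > 2*sqrt((e(df_m)+1)/e(N)) & dfit < .
```

Sorting by region alone leaves the order of states within a region arbitrary, so the Durbin-Watson value can change from run to run unless a second sort key is added.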
Part II: Applying what we've learned here and in class to a real data problem.
In this section, we'll use the tests in Hamilton along with some of the other tests we
learned how to conduct in class to test for violations of the various OLS assumptions.
Use the STATA dataset auto.dta (found on the lab computers in the STATA folder) to
conduct the following analyses.
1) Create a variable called forxmpg which is the interaction between
foreign and mpg (gen forxmpg=foreign*mpg).
2) Run a regression using price as the dependent variable, and weight,
mpg, forxmpg, and foreign as the independent variables. Interpret this
regression. First, what seems to be the theory here? Second, is the theory
supported? Third, what does the regression output tell us? Which
variable has the most impact on the dependent variable?
3) Is there evidence of heteroskedasticity here? Use STATA's hettest, a
plot of the residuals, the Goldfeld-Quandt test, and White's general
heteroskedasticity test. What do you conclude from each?
4) Is there evidence of multicollinearity? Examine the output of the
regression for clues, check the pairwise correlations and partial
correlations, run the necessary auxiliary regressions, and examine the
VIF. What do you conclude?
5) Are there any influential cases? Again, use cooksd, dfits, and dfbeta
to make this determination.
6) Since this isn't a time-series or really even cross-sectional dataset, open
the states.dta dataset again and test it for autocorrelation (you obtained
the Durbin-Watson statistic above). Again, be sure to sort by region and then
create a counting variable that can be a proxy for time. Now plot the
residuals and run the Breusch-Godfrey and M tests. What can you
conclude? Is this conclusion different from that found above using
Durbin's d?
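A possible outline of the Part II commands is given below, again as a hedged sketch rather than a finished answer. It assumes STATA 9 or later (sysuse and the estat commands), loads the auto.dta copy that ships with STATA instead of the one in the lab folder, and assumes the same states.dta variable names as in Part I. The Goldfeld-Quandt test has no built-in command and is left out. Treating estat durbinalt (Durbin's alternative test) as the "M test" from class is an assumption that should be checked against your notes.

```stata
* Assumption: the shipped copy of auto.dta matches the lab copy.
sysuse auto, clear

* 1) Interaction between foreign and mpg.
generate forxmpg = foreign*mpg

* 2) Main regression, then standardized coefficients to compare impact.
regress price weight mpg forxmpg foreign
regress price weight mpg forxmpg foreign, beta

* 3) Heteroskedasticity: hettest, White's general test, residual plot.
estat hettest
estat imtest, white
rvfplot, yline(0)

* 4) Multicollinearity: VIF first (it needs the main model in memory),
*    then pairwise and partial correlations, then one auxiliary
*    regression (repeat with each regressor as the dependent variable).
estat vif
correlate weight mpg forxmpg foreign
pcorr price weight mpg forxmpg foreign
regress weight mpg forxmpg foreign

* 5) Influence statistics, after restoring the main model.
quietly regress price weight mpg forxmpg foreign
predict cook, cooksd
predict dfit, dfits
dfbeta

* 6) Autocorrelation in the states data, using case order within
*    region as a proxy for time. Assumption: states.dta is in the
*    working directory.
use states, clear
sort region
generate stcode = _n
tsset stcode
regress csat expense percent income high college
predict ehat, resid
twoway scatter ehat stcode, yline(0)
estat bgodfrey, lags(1)
estat durbinalt
```

In the auxiliary regression, an R-squared close to 1 signals the same problem as a large VIF, since VIF = 1/(1 - R-squared) for that regressor.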