Applied Regression/Maximum Likelihood

POL 733: Applied Regression
Chris Johnston
christopher.johnston@duke.edu
(919)660-4345
Meetings by appointment, or stop by my office!
Course Synopsis
This is a course on applied regression analysis. The goal of this course is to give you the
conceptual background and concrete tools necessary to read and understand scholarly work
utilizing the general linear model and its multilevel extensions, and to conduct and communicate
your own research utilizing these methods. The course is organized by the nature of the
dependent variable of interest. Each section of the course will be concerned with a different type
of dependent variable, and the methods appropriate to modeling that variable. The final section
of the course will introduce you to multilevel modeling as a generalization of the models we
examine in the previous weeks. I will attempt to cover as much MLM as possible, but the extent
of this coverage will depend on course progression. At a minimum, I hope to get you to the point
where you will feel comfortable with MLMs reported in scholarly articles, and with your ability
to estimate and interpret your own MLMs, perhaps with a bit of supplemental reading.
The course is applied in the sense that we will focus less on statistical theory than on conceptual
understanding and practical execution. The course is designed to give you a great deal of hands-on experience with all models discussed. Roughly, our class meeting on Thursday will be lecture
and our meeting on Tuesday will be “lab.” In the latter, we will troubleshoot issues related to
estimation, interpretation, and presentation of models discussed on Thursday. Nearly every week
contains a homework assignment to be completed by 8am on Tuesday, which will be emailed to
me. I will read through these assignments prior to class to pinpoint areas of concern, and
structure Tuesday’s class to prioritize these issues. Homework assignments will be “graded” as
“complete” or “incomplete.” I do not expect perfection on these by any means; they are intended
as a learning aid. Nonetheless, you must put time into them. I will give an incomplete on any
assignment for which it is clear you did not make a good-faith effort to attempt the exercises.
In addition to homework assignments, you are required to complete 3 paper assignments, each of
which will entail the estimation, interpretation, and communication of models discussed in the
previous section of the course. Finally, you will be required to complete a full research paper by
the end of the semester, and present your findings to the faculty during a poster session on the
final day of class. The quality of your final paper should be such that, with a little extra work,
you could submit it as an article to a peer-reviewed journal. I will grade your final paper
as a reviewer for a journal. There are no exams for this course. Your learning will be examined
on the basis of your demonstrated competence in applying what we cover in class.
Books
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Fox, John, and Sanford Weisberg. 2011. An R Companion to Applied Regression, 2nd Edition. Thousand Oaks, CA: Sage.
Grades
Completion of homework assignments: 20%
Three paper assignments: 20% each
One final paper: 20%
I will use the following scale to assign final grades:
A   93-100    B+  87-90    C+  77-80    D+  67-70
A-  90-93     B   83-87    C   73-77    D   60-67
              B-  80-83    C-  70-73    F   <60
Policy
I will follow Duke University’s procedures to establish whether absences from any event related
to this class are justified (e.g., illness, sporting events) and merit ad hoc arrangements. Late
assignments will be reduced by one full letter grade for each day beyond the deadline.
I will also follow the University’s policy in any case of plagiarism or academic dishonesty.
Grade complaints: You have the right to dispute a grade if you disagree with it. You must do so
in writing, no more than 3 working days after I have returned the assignment to you. Upon
receiving your appeal, I will reevaluate your grade. Please note that I will reevaluate the entire
assignment. Thus, if I have made an error in your favor, this will also be corrected.
A Note on Software
I will use and teach R in this class, and it will be easiest for you to do the same. I know there are
some Stata users, but, the more I have thought about it, the more strongly I recommend that you
all use R. Very little prior knowledge of R is required to do well in this course, and I am very
willing to help those without prior experience. I will provide notes for Stata users if there is
interest in these, but again, I recommend against this. Software notes in R will be posted on
Sakai and listed for the week in which we discuss the issues relevant to the notes. These will help
you to complete your homework assignment due the following Tuesday.
Homework
Homework assignments should be turned in by email no later than 8am on the Tuesday of the
week for which the assignment is listed. You should append the code and the output generated
for the assignment to the end of the document with your answers to the questions. This will
enable me to identify where problems exist (if there are any problems).
Paper Assignments
All paper assignments should be treated more formally than the homework assignments. That is,
your presentation of any results of analyses you conduct should “look nice.” Do not copy and
paste output from a software package. I am mostly thinking here about tables of coefficients, etc.
Treat these assignments as “mini-research papers”: If you wouldn’t do it in submitting a paper to
a journal, don’t do it here.
Assignment 1
Find data of interest to you that contain at least one dependent variable that can be reasonably
treated as continuous, and that contain predictor variables useful for testing a coherent
hypothesis. In looking for data, you should keep in mind issues of model specification: if you
would not be willing to defend your specification as valid, then rethink your specification or find
another dataset. The paper should do the following things:
(1) Describe your data and the hypothesis you are testing (can be relatively brief).
(2) Describe the statistical model you will estimate, and why that model.
(3) Estimate the model via OLS and MLE and present results clearly in a table and/or figure.
(4) Estimate heteroskedasticity-consistent SEs and bootstrapped SEs, and table the results.
(5) Model the variance as a function of predictor variables chosen by you, and explain why
you included those predictors in the variance portion of the model. Present these results
clearly in a table and/or figure.
(6) Estimate the same model as (3), but add an interaction with your key predictor of interest.
Estimate this interactive model via OLS and present the key results clearly in a figure.
(7) Briefly summarize what you found in 3-6.
In all cases, you should substantively interpret your findings in-text.
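As a rough illustration of what steps (3) and (4) of Assignment 1 involve, here is a minimal sketch. It is written in Python with only the standard library purely for illustration (the course itself uses R), and the toy data and all variable names are hypothetical, not part of the assignment. It fits a one-predictor linear model by OLS, shows that the Gaussian MLE shares those coefficients but divides the residual sum of squares by n rather than n - k, and computes the conventional and HC0 (White) standard errors for the slope.

```python
import math

# Hypothetical toy data (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(x)
k = 2  # number of estimated coefficients: intercept and slope

# OLS closed form for y = a + b*x + e.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b_ols = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a_ols = y_bar - b_ols * x_bar
resid = [yi - (a_ols + b_ols * xi) for xi, yi in zip(x, y)]
rss = sum(e ** 2 for e in resid)

# Error-variance estimates: OLS divides by n - k, the Gaussian MLE by n.
sigma2_ols = rss / (n - k)
sigma2_mle = rss / n


def log_lik(a, b, sigma2):
    """Gaussian log-likelihood of the linear model at (a, b, sigma2)."""
    ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


# Slope standard errors: conventional (constant variance) and HC0 (White).
se_b_classic = math.sqrt(sigma2_ols / sxx)
se_b_hc0 = math.sqrt(
    sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
) / sxx

print("intercept, slope:", a_ols, b_ols)
print("sigma^2 (OLS, MLE):", sigma2_ols, sigma2_mle)
print("log-likelihood at the MLE:", log_lik(a_ols, b_ols, sigma2_mle))
print("slope SE (classic, HC0):", se_b_classic, se_b_hc0)
```

The log-likelihood is highest at the OLS coefficients paired with the MLE variance, which is why the two estimators agree on the coefficients and differ only in the variance estimate.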
Assignment 2
Everything is the same as in Assignment 1, except: find a binary DV, omit (4), use the MLE model
for all analyses after (3), and provide model-appropriate fit statistics.
Assignment 3
Describe, estimate, present, and interpret three models and respective hypothesis tests: one
ordinal DV, one nominal DV, and one selection model.
Course Schedule
 Indicates a required reading for that week
 Indicates a homework assignment, posted on Sakai under resources, to be completed and
emailed to me no later than 8am on Tuesday of the assignment’s respective week
 Indicates the due date for a paper assignment
o Indicates a recommended reading
January 9th: No class meeting, but…
 G&H: skim Chapter 2 and refresh your understanding of these core concepts
 Long: Chapter 1
o F&W: Chapters 1-3 if you are using R and need to refresh your basic skills
January 14th and 16th: Introduction to regression, Continuous DVs I: concepts
 G&H: Chapter 3
 Long: Chapter 2.1 and 2.2
January 21st and 23rd: Continuous DVs II: estimation and inference
 G&H: Chapter 3
 Long: All of Chapter 2
 Software notes 1
 HW #1
o F&W: Chapter 4
o Brambor, Thomas, William Roberts Clark, and Matt Golder. 2006. “Understanding
Interaction Models: Improving Empirical Analyses.” Political Analysis 14 (1): 63-82.
January 28th and 30th: Continuous DVs III: diagnostics and additional complexities
 G&H: Chapter 4
 Software notes 2
 F&W: Chapter 6 (ignore section 6.6 for now)
 HW #2
February 4th and 6th: Binary DVs I: concepts
 Long: Chapter 3.1-3.5
 G&H: Chapter 5.1-5.5 and Chapter 6.4
 Paper assignment 1 due on Thursday at start of class
February 11th and 13th: Binary DVs II: estimation and inference
 Long: all of Chapter 3
 Software notes 3
o F&W: Chapter 5 (up to p. 246)
 HW #3
o Berry, William D., Jacqueline H.R. DeMeritt, and Justin Esarey. 2010. “Testing for
Interaction in Binary Logit and Probit Models: Is a Product Term Essential?” American
Journal of Political Science 54 (1): 248-266.
February 18th and 20th: Binary DVs III: diagnostics and additional complexities
 G&H: Chapter 5.6-5.8
 Long: Chapter 4
 Software notes 4
 HW #4
o F&W: Chapter 6.6 only
o Herron, Michael C. 1999. “Postestimation Uncertainty in Limited Dependent Variable
Models.” Political Analysis 8 (1): 83-98.
o Gelman, Andrew, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su. 2008. “A Weakly
Informative Default Prior Distribution for Logistic and Other Regression Models.” The
Annals of Applied Statistics 2 (4): 1360-1383.
o Alvarez, R. Michael, and John Brehm. 1997. “Are Americans Ambivalent Towards
Racial Policies?” American Journal of Political Science 41 (2): 345-374.
February 25th and 27th: Ordinal DVs
 G&H: Chapter 6.5
 Long: Chapter 5
 Software notes 5
o F&W: Chapter 5.9
 Paper assignment 2 due on Thursday at start of class
March 4th and 6th: Nominal DVs
 G&H: Chapter 6.5
 Long: Chapter 6
 Software notes 6
o F&W: Chapter 5.7 and 5.8
 HW #5
March 11th and 13th: Spring Break
March 18th and 20th: Count DVs
 G&H: Chapter 6.2
 Long: Chapter 8
 Software notes 7
o F&W: Chapter 5.5 and 5.10.4
 HW #6
March 25th and 27th: Models for censoring, truncation, and selection
 Long: Chapter 7
 Software notes 8
 HW #7
April 1st and 3rd: Multilevel models I: Concepts
 G&H: Chapters 1, 11, and 12
 Software notes 9
 Paper assignment 3 due on Thursday at start of class
April 8th and 10th: Multilevel models II: Estimation and Inference
 G&H: Chapters 13, 14, and 15
 Software notes 10
 HW #8
April 15th: Multilevel models III
TBA
May 2nd: Poster presentations (Time TBA)
FINAL PAPER DUE BY 5PM!!!