POL 733: Applied Regression
Chris Johnston
christopher.johnston@duke.edu
(919) 660-4345
Meetings by appointment, or stop by my office!

Course Synopsis

This is a course on applied regression analysis. The goal of this course is to give you the conceptual background and concrete tools necessary to read and understand scholarly work utilizing the general linear model and its multilevel extensions, and to conduct and communicate your own research utilizing these methods.

The course is organized by the nature of the dependent variable of interest. Each section of the course will be concerned with a different type of dependent variable, and the methods appropriate to modeling that variable. The final section of the course will introduce you to multilevel modeling as a generalization of the models we examine in the previous weeks. I will attempt to cover as much MLM as possible, but the extent of this coverage will depend on course progression. At a minimum, I hope to get you to the point where you will feel comfortable with MLMs reported in scholarly articles, and with your ability to estimate and interpret your own MLMs, perhaps with a bit of supplemental reading.

The course is applied in the sense that we will focus less on statistical theory than on conceptual understanding and practical execution. The course is designed to give you a great deal of hands-on experience with all models discussed. Roughly, our class meeting on Thursday will be lecture and our meeting on Tuesday will be “lab.” In the latter, we will troubleshoot issues related to estimation, interpretation, and presentation of models discussed on Thursday. Nearly every week contains a homework assignment to be completed by 8am on Tuesday, which will be emailed to me. I will read through these assignments prior to class to pinpoint areas of concern, and structure Tuesday’s class to prioritize these issues.
Homework assignments will be “graded” as “complete” or “incomplete.” I do not expect perfection on these by any means; they are intended as a learning aid. Nonetheless, you must put time into them. I will give an incomplete on any assignment for which it is clear you did not make a good-faith effort to attempt the exercises.

In addition to homework assignments, you are required to complete 3 paper assignments, each of which will entail the estimation, interpretation, and communication of models discussed in the previous section of the course.

Finally, you will be required to complete a full research paper by the end of the semester, and present your findings to the faculty during a poster session on the final day of class. The quality of your final paper should be such that, with a little extra work, you could submit the paper as an article at a peer-reviewed journal. I will grade your final paper as a reviewer for a journal.

There are no exams for this course. Your learning will be examined on the basis of your demonstrated competence in applying what we cover in class.

Books

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Fox, John, and Sanford Weisberg. 2011. An R Companion to Applied Regression, 2nd Edition. Thousand Oaks, CA: Sage.

Grades

Completion of homework assignments: 20%
Three paper assignments: 20% each
One final paper: 20%

I will use the following scale to assign final grades:

A    93-100    B+   87-90    C+   77-80    D+   67-70
A-   90-93     B    83-87    C    73-77    D    60-67
               B-   80-83    C-   70-73    F    <60

Policy

I will follow Duke University’s procedures to establish whether absences from any event related to this class are justified (e.g. illness, sport events) and merit ad hoc arrangements. Late assignments will be reduced by one full letter grade for each day beyond the deadline.
I will also follow the University’s policy in any event of plagiarism and academic dishonesty.

Grade complaints: You have the right to dispute a grade if you disagree with it. You must do so in writing, no more than 3 working days after I have returned the assignment to you. Upon receiving your appeal, I will reevaluate your grade. Please note that I will reevaluate the entire assignment. Thus, if I have made an error in your favor, this will also be corrected.

A Note on Software

I will use and teach R in this class, and it will be easiest for you to do the same. I know there are some Stata users, but, the more I have thought about it, the more strongly I recommend that you all use R. Very little prior knowledge of R is required to do well in this course, and I am very willing to help those without prior experience. I will provide notes for Stata users if there is interest in these, but again, I recommend against this.

Software notes in R will be posted on Sakai and listed for the week in which we discuss the issues relevant to the notes. These will help you to complete your homework assignment due the following Tuesday.

Homework

Homework assignments should be turned in by email no later than 8am on the Tuesday of the week for which the assignment is listed. You should append the code and the output generated for the assignment to the end of the document with your answers to the questions. This will enable me to identify where problems exist (if there are any problems).

Paper Assignments

All paper assignments should be treated more formally than the homework assignments. That is, your presentation of any results of analyses you conduct should “look nice.” Do not copy and paste output from a software package. I am mostly thinking here about tables of coefficients, etc. Treat these assignments as “mini-research papers”: If you wouldn’t do it in submitting a paper to a journal, don’t do it here.
Assignment 1

Find data of interest to you that contain at least one dependent variable that can be reasonably treated as continuous, and that contain predictor variables useful for testing a coherent hypothesis. In looking for data, you should keep in mind issues of model specification: if you would not be willing to defend your specification as valid, then rethink your specification or find another dataset. The paper should do the following things:

(1) Describe your data and the hypothesis you are testing (can be relatively brief).
(2) Describe the statistical model you will estimate, and why that model.
(3) Estimate the model via OLS and MLE and present results clearly in a table and/or figure.
(4) Estimate heteroskedasticity-consistent SEs and bootstrapped SEs, and table the results.
(5) Model the variance as a function of predictor variables chosen by you, and explain why you included those predictors in the variance portion of the model. Present these results clearly in a table and/or figure.
(6) Estimate the same model as (3), but add an interaction with your key predictor of interest. Estimate this interactive model via OLS and present the key results clearly in a figure.
(7) Briefly summarize what you found in (3)-(6).

In all cases, you should substantively interpret your findings in-text.

Assignment 2

Everything is the same as in Assignment 1, except: find a binary DV; nix (4); after (3), use the MLE model for all analyses; and provide model-appropriate fit statistics.

Assignment 3

Describe, estimate, present, and interpret three models and their respective hypothesis tests: one ordinal DV, one nominal DV, and one selection model.
Course Schedule

Indicates a required reading for that week
Indicates a homework assignment, posted on Sakai under resources, to be completed and emailed to me no later than 8am on Tuesday of the assignment’s respective week
Indicates the due date for a paper assignment
o Indicates a recommended reading

January 9th: No class meeting, but…
G&H: skim Chapter 2 and refresh your understanding of these core concepts
Long: Chapter 1
o F&W: Chapters 1-3 if you are using R and need to refresh your basic skills

January 14th and 16th: Introduction to regression, Continuous DVs I: concepts
G&H: Chapter 3
Long: Chapter 2.1 and 2.2

January 21st and 23rd: Continuous DVs II: estimation and inference
o o G&H: Chapter 3
Long: All of Chapter 2
Software notes 1
HW #1
F&W: Chapter 4
Brambor, Thomas, William Roberts Clark, and Matt Golder. 2006. “Understanding Interaction Models: Improving Empirical Analyses.” Political Analysis 14 (1): 63-82.

January 28th and 30th: Continuous DVs III: diagnostics and additional complexities
G&H: Chapter 4
Software notes 2
F&W: Chapter 6 (ignore Section 6.6 for now)
HW #2

February 4th and 6th: Binary DVs I: concepts
Long: Chapter 3.1-3.5
G&H: Chapter 5.1-5.5 and Chapter 6.4
Paper assignment 1 due on Thursday at start of class

February 11th and 13th: Binary DVs II: estimation and inference
o o Long: All of Chapter 3
Software notes 3
F&W: Chapter 5 (up to p. 246)
HW #3
Berry, William D., Jacqueline H.R. DeMeritt, and Justin Esarey. 2010. “Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential?” American Journal of Political Science 54 (1): 248-266.

February 18th and 20th: Binary DVs III: diagnostics and additional complexities
o o G&H: Chapter 5.6-5.8
Long: Chapter 4
Software notes 4
HW #4
F&W: Chapter 6.6 only
Herron, Michael C. 1999. “Postestimation Uncertainty in Limited Dependent Variable Models.” Political Analysis 8 (1): 83-98.
o Gelman, Andrew, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su. 2008. “A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models.” The Annals of Applied Statistics 2 (4): 1360-1383.
o Alvarez, R. Michael, and John Brehm. 1997. “Are Americans Ambivalent Towards Racial Policies?” American Journal of Political Science 41 (2): 345-374.

February 25th and 27th: Ordinal DVs
o G&H: Chapter 6.5
Long: Chapter 5
Software notes 5
F&W: Chapter 5.9
Paper assignment 2 due on Thursday at start of class

March 4th and 6th: Nominal DVs
o G&H: Chapter 6.5
Long: Chapter 6
Software notes 6
F&W: Chapter 5.7 and 5.8
HW #5

March 11th and 13th: Spring Break

March 18th and 20th: Count DVs
o G&H: Chapter 6.2
Long: Chapter 8
Software notes 7
F&W: Chapter 5.5 and 5.10.4
HW #6

March 25th and 27th: Models for censoring, truncation, and selection
Long: Chapter 7
Software notes 8
HW #7

April 1st and 3rd: Multilevel models I: Concepts
G&H: Chapters 1, 11, and 12
Software notes 9
Paper assignment 3 due on Thursday at start of class

April 8th and 10th: Multilevel models II: Estimation and Inference
G&H: Chapters 13, 14, and 15
Software notes 10
HW #8

April 15th: Multilevel models III
TBA

May 2nd: Poster presentations (Time TBA)
FINAL PAPER DUE BY 5PM!!!