SOCY 7111 Data III
Advanced Data Analysis
TTH 2:00-3:15, KTCH 33
Fall 2010

Instructor: Professor Fred Pampel
Office: 102A IBS #3 (1424 Broadway)
Email: fred.pampel@colorado.edu
Office Hours: 3:30-4:30 TTH and by appointment
Phone: 2-5620

Texts
Generalized Linear Models: An Applied Approach (2004) – John P. Hoffman
Multilevel Modeling (2004) – Douglas A. Luke
Missing Data (2002) – Paul D. Allison

Selections (see CU Learn or http://www.colorado.edu/ibs/pop/pampel)
Statistics for Stata Version 10 (2009) – Lawrence Hamilton
Logistic Regression: A Primer (2000) – Fred C. Pampel
Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition (2006) – J. Scott Long and Jeremy Freese
UCLA Academic Technology Services: Resources to Help You Learn and Use Stata (2010) – Xiao Chen, Phil Ender, Michael Mitchell, and Christine Wells, URL: http://www.ats.ucla.edu/stat/stata
Introducing Multilevel Modeling (1998) – Ita Kreft and Jan de Leeuw
Multilevel Analysis: Techniques and Applications (2002) – Joop Hox
Quantitative Data Analysis: Doing Social Research to Test Ideas (2009) – Donald J. Treiman
Stata Reference Manuals (2009) – Stata

Programs
Stata 11 – Stata Corp (versions 9 and 10 will work for most but not all assignments)
SPost: Post-Estimation Analysis with Stata – J. Scott Long and Jeremy Freese (free download)

Objectives
Data III covers several widely used statistical methods that extend the basic regression model to deal with 1) categorical and limited dependent variables, 2) multilevel data, 3) missing data, and 4) complex survey designs. Although the methods also apply to analysis of longitudinal data – a topic covered in another course – this course focuses on cross-sectional data. Students should have had a previous course on multiple regression and experience using Stata, the statistical analysis package to be used throughout the course.

More informally, I intend this class as a practicum in quantitative social research that emphasizes using statistics with real data. Perhaps the most important and most difficult skill to teach is the insightful application of statistical techniques to real research questions. The course thus emphasizes the match between theoretical reasoning, substantive research problems, and statistical results. Toward that end, the assignments require the application of the techniques covered in class to a topic and data set of your choice.

Assignments
The sections below list the lecture topics and assigned readings. Mastering the material requires more than reading, however. To apply the readings, the course requires completion of four problem assignments and two papers (with details handed out during the semester).

First, the four problem assignments involve the written interpretation of computer output. Each assignment contributes 12.5 percent of the grade (50 percent in total). These assignments involve the concrete application of material covered in a more abstract form in the readings, will help you to explore your data, and are needed to prepare you to complete the papers.

Second, the two papers are based on analysis of your data to address a research problem of your own selection. They contribute 25 percent each to the final grade (50 percent in total). Each should be about 10 pages, demonstrate your understanding of the statistical techniques, and relate the results to a substantive issue and related theory. You can choose the topic and data, but the end goal is to complete a professional research paper that has the potential for later publication.

Stata will be used for the data analysis in the problems and papers, and it is available in the Ketchum data labs (rooms 3 and 116). Having Stata on your own computer, however, will make the assignments easier to complete.
When working on the lab computers, it is best to keep your data and command files on a flash drive and copy them to the particular machine being used each time. Working on a single computer can avoid this inconvenience.

I assume some experience using Stata. Those unfamiliar with the program (and those wanting to upgrade their understanding) should work through a Stata tutorial and consult two manuals: Getting Started with Stata and the User’s Guide. The manuals are available in the Ketchum 3 lab, and the web tutorials can be found at http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/index.html, http://data.princeton.edu/stata, and http://www.ats.ucla.edu/stat/stata/sk/modules_sk.htm. I will give examples and guidance on commands for the statistical techniques we cover, but you will need to know the basic commands to create and recode variables, select cases, and obtain descriptive statistics.

You will need to select a data set for analysis to be used in the problems and papers. I have made several available to you on CU Learn: the 2008 General Social Survey, the 2006 National Health Interview Survey, the 2006 Monitoring the Future survey of teen drug use, and the 2008 Eurobarometer Survey of European attitudes and climate views. However, you can select any others of more interest. ICPSR (http://www.icpsr.umich.edu/icpsrweb/ICPSR) makes thousands of data sets available for download, or you may have your own data to use. In any case, choose your topic, data, and variables in the first week or two, in time to use for the first problem.

Schedule
The schedule below lists the dates, topics, and assignments, and the section to follow lists the specific readings to complete for each class period. At this stage, the schedule represents a rough guide, and changes will occur throughout the semester. I would like to cover all the topics listed but may need to spend more time on difficult topics or bring in other materials.
We may not proceed at exactly the pace initially planned.

Week  Date     Topic
1     Aug 24   Orientation
      Aug 26   Background: Logs and Exponents
2     Aug 31   Regression Review
      Sep 2    Regression Assumptions
3     Sep 7    Link Functions
      Sep 9    Maximum Likelihood
4     Sep 14   Logistic Regression
      Sep 16   SPost
5     Sep 21   Interpretation Problems
      Sep 23   Probit Models
6     Sep 28   Ordered Logit and Probit
      Sep 30   Multinomial Logistic Regression
7     Oct 5    Poisson Regression
      Oct 7    Negative Binomial Regression
8     Oct 12   Sample Selection
      Oct 14   Event History and Survival Models
9     Oct 19   Introduction to Multilevel Models
      Oct 21   Basic Multilevel Models
10    Oct 26   Building Multilevel Models
      Oct 28   Multilevel Analysis I
11    Nov 2    Multilevel Analysis II
      Nov 4    Multilevel Model Assessment
12    Nov 9    Multilevel Extensions
      Nov 11   Standard Approaches to Missing Data
13    Nov 16   Multiple Imputation
      Nov 18   Imputation Phase
      Fall Break and Thanksgiving Holiday
14    Nov 30   Estimation Phase
      Dec 2    Complex Samples
15    Dec 7    Stata Survey Commands
      Dec 9    More on Stata Survey Commands
16    Dec 14   Finals Week

Assignment
Weeks 1-10 (in order): Select Data and Topic; Theory Outline; Problem 1 Due; Problem 2 Due; Paper 1 Due
Weeks 11-13: Problem 3 Due
Weeks 14-15: Problem 4 Due
Finals Week, Dec 14, 4:30 (Tuesday): Paper 2 Due

Readings

Aug 26  Background
    Freedman, David A. 1991. “Statistical Models and Shoe Leather.” Sociological Methodology 21:291-313
    Pampel, pp. 74-82

Aug 31  Regression Review
    Hoffman, pp. 1-21

Sep 2  Regression Assumptions
    Pampel, pp. 1-10
    Hamilton, Chapter 7, “Regression Diagnostics,” pp. 209-228

Sep 7  Link Functions
    Hoffman, pp. 22-33

Sep 9  Maximum Likelihood
    Hoffman, pp. 33-44
    Pampel, pp. 39-48

Sep 14  Logistic Models
    Hoffman, pp. 45-54, 59-64
    UCLA, “Logistic Regression with Stata.” URL: http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter1/statalog1.htm (up to Tools to Assist)
    UCLA, “Stata Annotated Output: Logistic Regression Analysis.” URL: http://www.ats.ucla.edu/stat/stata/output/stata_logistic.htm

Sep 16  SPost: Additional Coefficients for Interpretations
    Long and Freese, pp. 136-181
    UCLA, “Logistic Regression with Stata.” URL: http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter1/statalog1.htm (start with Tools to Assist)

Sep 21  Interpretation Problems
    Mood, Carina. 2010. “Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It.” European Sociological Review 26:67-82

Sep 23  Probit Models
    Hoffman, pp. 54-59
    Pampel, pp. 54-68

Sep 28  Ordered Logit and Probit
    Hoffman, pp. 65-82
    UCLA, “Ordinal Logistic Regression.” URL: http://www.ats.ucla.edu/stat/stata/dae/ologit.htm

Sep 30  Multinomial Logistic Regression
    Hoffman, pp. 83-100
    UCLA, “Stata Data Analysis Examples: Multinomial Logistic Regression.” URL: http://www.ats.ucla.edu/stat/stata/dae/mlogit.htm

Oct 5  Poisson Regression
    Hoffman, pp. 101-112
    Long and Freese, pp. 349-370
    UCLA, “Stata Data Analysis Examples: Poisson Regression.” URL: http://www.ats.ucla.edu/stat/stata/dae/poissonreg.htm

Oct 7  Negative Binomial Regression
    Hoffman, pp. 112-120
    Long and Freese, pp. 372-381
    UCLA, “Stata Data Analysis Examples: Negative Binomial Regression.” URL: http://www.ats.ucla.edu/stat/stata/dae/nbreg.htm
    UCLA, “Stata Annotated Output: Negative Binomial Regression.” URL: http://www.ats.ucla.edu/stat/stata/output/stata_nbreg_output.htm

Oct 12  Sample Selection
    Berk, Richard A. 1983. “An Introduction to Sample Selection Bias in Sociological Data.” American Sociological Review 48:386-398
    Stata 11 Reference Manual, “Heckman Selection Model,” pp. 644-653

Oct 14  Event History and Survival Models
    Hoffman, pp. 121-148

Oct 19  Introduction to Multilevel Models
    Luke, pp. 1-9
    Kreft and de Leeuw, pp. 1-14

Oct 21  Basic Multilevel Models
    Luke, pp. 9-23
    Kreft and de Leeuw, pp. 35-44

Oct 26  Building Multilevel Models
    Luke, pp. 23-33
    Kreft and de Leeuw, pp. 44-56

Oct 28  Multilevel Analysis I
    Hox, Chapter 4, “Some Important Methodological and Statistical Issues,” pp. 49-58
    Hamilton, Chapter 15, “Multilevel and Mixed Effects Modeling,” pp. 413-421
    Luke, pp. 48-53

Nov 2  Multilevel Analysis II
    Hox, Chapter 4, “Some Important Methodological and Statistical Issues,” pp. 58-63
    Hamilton, Chapter 15, “Multilevel and Mixed Effects Modeling,” pp. 421-434

Nov 4  Multilevel Model Assessment
    Luke, pp. 33-48
    Hox, Chapter 4, “Some Important Methodological and Statistical Issues,” pp. 63-66

Nov 9  Multilevel Extensions
    Luke, pp. 53-72
    Hamilton, Chapter 15, “Multilevel and Mixed Effects Modeling,” pp. 434-438

Nov 11  Standard Approaches to Missing Data
    Allison, pp. 1-12
    Treiman, Chapter 8, “Multiple Imputation of Missing Data,” pp. 181-194

Nov 16  Multiple Imputation
    Allison, pp. 27-50

Nov 18  Imputation Phase
    Allison, pp. 50-55, 69-73
    Stata 11 Reference Manual, “mi,” pp. 1-13

Fall Break and Thanksgiving Holiday

Nov 30  Estimation Phase
    Stata 11 Reference Manual, “mi,” pp. 14-23, 105-118

Dec 2  Complex Samples
    Treiman, Chapter 9, “Sample Design and Survey Estimation,” pp. 195-215

Dec 7  Stata Survey Commands
    Treiman, Chapter 9, “Sample Design and Survey Estimation,” pp. 215-225
    Hamilton, Chapter 14, “Survey Data Analysis,” pp. 391-399

Dec 9  More on Stata Survey Commands
    Hamilton, Chapter 14, “Survey Data Analysis,” pp. 399-408
    Stata Reference Manual, “Introduction to Survey Commands,” pp. 3-14, 16-17

Special Issues

Disability. If you qualify for accommodations because of a disability, please submit to me a letter from Disability Services in a timely manner so that your needs may be addressed. Disability Services determines accommodations based on documented disabilities. Contact: 303-492-8671, Willard 322, and http://www.Colorado.EDU/disabilityservices.

Religious Obligations.
Campus policy regarding religious observances requires that faculty make every effort to reasonably and fairly deal with all students who, because of religious obligations, have conflicts with scheduled exams, assignments or required attendance. Please inform me ahead of time of any obligation that conflicts with the assignments or exams so that accommodations can be made. See full details at http://www.colorado.edu/policies/fac_relig.html.

Appropriate Behavior. Students and faculty each have responsibility for maintaining an appropriate learning environment. Students who fail to adhere to such behavioral standards may be subject to discipline. Faculty have the professional responsibility to treat all students with understanding, dignity and respect, to guide classroom discussion and to set reasonable limits on the manner in which they and their students express opinions. Professional courtesy and sensitivity are especially important with respect to individuals and topics dealing with differences of race, culture, religion, politics, sexual orientation, gender variance, and nationalities. Class rosters are provided to the instructor with the student's legal name. I will gladly honor your request to address you by an alternate name or gender pronoun. Please advise me of this preference early in the semester so that I may make appropriate changes to my records. See policies at http://www.colorado.edu/policies/classbehavior.html and at http://www.colorado.edu/studentaffairs/judicialaffairs/code.html#student_code.

Discrimination and Harassment. The University of Colorado at Boulder policy on Discrimination and Harassment (http://www.colorado.edu/policies/discrimination.html), the University of Colorado policy on Sexual Harassment, and the University of Colorado policy on Amorous Relationships apply to all students, staff and faculty.
Any student, staff or faculty member who believes s/he has been the subject of discrimination or harassment based upon race, color, national origin, sex, age, disability, religion, sexual orientation, or veteran status should contact the Office of Discrimination and Harassment (ODH) at 303-492-2127 or the Office of Judicial Affairs at 303-492-5550. Information about the ODH and the campus resources available to assist individuals regarding discrimination or harassment can be obtained at http://www.colorado.edu/odh.

Honor Code. All students of the University of Colorado at Boulder are responsible for knowing and adhering to the academic integrity policy of this institution. Violations of this policy may include: cheating, plagiarism, aid of academic dishonesty, fabrication, lying, bribery, and threatening behavior. All incidents of academic misconduct shall be reported to the Honor Code Council (honor@colorado.edu; 303-725-2273). Students who are found to be in violation of the academic integrity policy will be subject to both academic sanctions from the faculty member and non-academic sanctions (including but not limited to university probation, suspension, or expulsion). Other information on the Honor Code can be found at http://www.colorado.edu/policies/honor.html and http://www.colorado.edu/academics/honorcode.