The University of North Carolina at Chapel Hill School of Social Work SOWO 918: Applied Regression Analysis and Generalized Linear Models Spring Semester, 2015 INSTRUCTOR Roderick Rose Ph.D. Room 245D Tate Turner Kuralt, CB #3550 School of Social Work, Chapel Hill, NC 27599-3550 Phone: (919) 260-0542 Email: rarose@email.unc.edu TEACHING ASSISTANT Vivien Li Room 245A wenli@email.unc.edu CLASS MEETING TIMES & OFFICE HOURS Class meets on Wednesdays 9:00-11:50 am (Room 102 TTK) Office hours are Tuesday 10-noon with Dr. Rose and Wednesday 12-1 with Vivien Li Course Description: This course introduces statistical frameworks, analytical tools, and social behavioral applications of OLS regression model, weighted least-square regression, logistic regression models, and generalized linear models. Course Objectives: At the completion of the course, students will be able to: 1. Understand the type and nature of research questions and data that are suitable for regression analysis; 2. Use Stata computing software package to manage and analyze data with the OLS regression model; 3. Understand the Gauss-Markov theorem and the BLUE property of OLS, especially conditions under which BLUE does not hold; 4. Have a solid understanding of the five assumptions embedded in the OLS regression; 5. Know how to conduct statistical tests detecting violations of OLS assumptions (i.e., multicollinearity, heteroskedasticity, influential data and outliers, etc.); 6. Know how to take remedial measures if harmful violations exist (i.e., weighted leastsquares regression, etc.); 7. Understand the type and nature of research questions and data that are suitable for the generalized linear models; 1 8. Have a solid understanding of basic concepts of categorical data (i.e., odds ratio, relative risk, marginal probability, and conditional probability); 9. Use Stata computing software package to manage and analyze data with the binary, ordered, and multinomial logistic regressions; 10. Know how to interpret results of regression analysis and logistic regression analysis, and communicate findings to general audiences clearly and effectively in writing; 11. Understand limitations of the regression and logistic regression models, and common pitfalls in using these models; 12. Understand the basics of conducting a Monte Carlo study. Pre-requirement: Students are assumed to be familiar with descriptive and inferential statistics. They should have statistical and statistical software background at least equivalent to that provided by SOWO 911. Students without such prerequisites should contact the instructor to determine their eligibility to take the course. Statistical Software Package: This course will use Stata as the main software package. However, you may also use SAS. Required Textbook: Kutner, M. H., Nachtsheim, C.J., and Neter, J. (2004). Applied Linear Regression Models, 4th Edition, McGraw Hills/Irwin. Required Articles for Reading All required journal articles are available on E-journals. Required book chapters will be distributed in class. Recommended Textbooks: Kennedy, P. (2003). A Guide to Econometrics, 5th ed. Cambridge, MA: The MIT Press. Berk, R.A. (2004). Regression Analysis: A Constructive Critique, Thousand Oaks, CA: Sage Publications Assignments Assignment 1 Assignment 2 Assignment 3 Assignment 4 Assignment 5 Class Project/Participation Midterm Exam Final Exam Grade Percentage 8% 8% 8% 8% 8% 10% 20% 30% GRADING SYSTEM The standard School of Social Work interpretation of grades and numerical scores will be used. H = 94-100 P = 80-93 2 L = 70-79 F = 69 and below POLICY ON CLASS ATTENDANCE Class attendance is an important element of class evaluation, and you are expected to attend all scheduled sessions. You will easily fall behind the course if you miss a class session. It will affect the class learning project, so it is imperative to attend. Students are responsible for informing the instructor when they must miss a class session. POLICY ON INCOMPLETE AND LATE ASSIGNMENTS Assignments are to be turned in to the professor by 5pm of the due date noted in the course outline. Brief extensions may be granted by the professor given advance notice of at least 24 hours. Late assignments (not turned in by 5pm on the due date) will be reduced 10 percent for each day late (including weekend days). A grade of incomplete will only be given under extenuating circumstances and in accordance with University policy. POLICY ON ACADEMIC DISHONESTY Students are expected to follow the UNC Honor Code. Please include the honor code statement along with your signature on all assignments: “I have neither given nor received unauthorized aid on this assignment.” Please refer to the APA Style Guide, the SSW Manual, and the SSW Writing Guide for information on attribution of quotes, plagiarism and appropriate use of assistance in preparing assignments. If reason exists to believe that academic dishonesty has occurred, a referral will be made to the Office of the Student Attorney General for investigation and further action as required. POLICY ON ACCOMMODATIONS FOR STUDENTS WITH DISABILITIES Students with disabilities that affect their participation in the course may notify the instructor if they wish to have special accommodations in instructional format, examination format, etc., considered. 3 SESSION SCHEDULE All sessions meet in Tate-Turner-Kuralt Room 102 except as noted. 1 1/7 -SSWR Wed 1/14, no class2 1/21 3 1/28 4 2/4 5 2/11 6 2/18 by Vivien Li 7 2/25 8 3/4 -Spring break on 3/11, no class9 3/18 10 3/25 11 4/1 by Vivien Li 12 4/8 13 4/15 14 4/22 4 COURSE OUTLINE (TOPICS, READINGS, AND ASSIGNMENTS) 1 Introduction and course overview Course overview Project: Orientation to Stata or SAS on the VCL Review of important concepts: Association versus causation Univariate, bivariate, and multivariate analyses Statistical properties Summation operator No readings due for this session. Readings and practice to be completed after this session: Neter et al: 1.1-1.2 Guo, S. (2008). Quantitative research. An entry of the Encyclopedia of Social Work, 20th Edition. New York, NY: The Oxford University Press. 2 Simple linear regression Optimization: minimization versus maximization Least-square estimator Interval estimation and hypothesis testing Readings to be completed for this session: Kutner et al: 1.3-1.7, Chapters 2 & 4 Seminar reading to be completed for this session: Holland, Paul W. (1986) Statistics and causal inference. Journal of the American Statistical Association, 81: 945-960. doi:10.2307/2289064 Assignment 1 (due in session 4). In this Assignment, you will solve problems pertaining to simple linear regression and basic matrix algebra. 3 Matrix algebra Matrices, vectors, and scalars Matrix operations Readings to be completed for this session: Kutner et al: 5.1-5.9 Seminar reading to be completed for this session: Kenney, G. M., Ruhter, J., & Selden, T. M. (2009). Containing costs and improving care for children in Medicaid and CHIP. Health Affairs, w1026w1036. 5 4 Multiple linear regression Model specifications Estimation of regression coefficients Inferences concerning Readings to be completed for this session: Kutner et al: Chapter 6 (the first half) Seminar reading to be completed for this session: Haveman, R., Holden, K., Romanv, A., & Wolfe, B. (2007). Assessing the maintenance of savings sufficiency over the first decade of retirement. International Tax and Public Finance 14, 481-502. Assignment 1 due today Assignment 2 out (Due in session 6):In this Assignment, you will solve problems pertaining to multiple linear regression models. 5 ANOVA and R2 Review of variance, covariance and correlation Decomposition of total sum of squares, F test R2 and adjusted R2 Readings to be completed for this session: Kutner et al: Chapter 6 (the second half) Guo, S. (1996). “Determinants of fertility decline in urban China”, in Alice Goldstein and Feng Wang (Eds.): China, Many Facets in Demographic Change, Westview Press. Look ahead to assignment 3. Students should find the article they will use to complete assignment 3 by session 6. Seminar reading to be completed for this session: Bowen, Powers & Rose, 2005 (in Sakai folder) 6 The Five Assumptions of OLS BLUE criterion and properties of OLS Maximum likelihood estimator The five OLS assumptions Readings to be completed for this session: Kutner et al: 1.8 6 Kennedy, P. (2003). A Guide to Econometrics, 5th Edition, Chapter 3, Cambridge, MA: The MIT Press. (available on Sakai) No seminar reading. Assignment 2 due today Assignment 3 out (Due in session 7). You will find an article that uses multiple linear regression and you will do a brief 10 minute presentation. In this presentation, you will critique how the authors address the 5 assumptions of OLS. 7 The Five Assumptions & Power Project: a Monte Carlo study showing the consequences for OLS under various data generation conditions. Power analysis Students will present their critiques (assignment 3). Assignment 3 due today Readings to be completed for this session: Kutner et al: Chapter 3 Rose & Bowen. Available on Sakai. Guo, S., & Hussey, D.L. (2004). Nonprobability sampling in social work research: Dilemmas, consequences, and strategies. Journal of Social Service Research, 30(3): 1-18. No seminar reading. 8 Violating OLS Assumptions and Remedial Measures – I Specification errors and selection of Predictors Influential data and outliers Multicollinearity Readings to be completed for this session: Kutner et al: Chapters 9 and 10 Lum & Chang, 1998 Guo, response to Lum & Chang Seminar reading Parish, S. L., Rose, R. A., & Swaine, J. G. (2012). (in Sakai folder) Midterm exam out (DUE DATE PROVIDED BY INSTRUCTOR): Use data sets provided by the course or data set you choose to run a multiple linear regression model. Write a paper (no more than 12 pages, double spaced) to present findings. The paper should include: (1) research questions and data that are suitable to a regression analysis; (2) methods and specification of the regression model; (3) 7 diagnostics detecting at least two problems pertaining to violation of regression assumptions and discussion of remedial measures; (4) interpretation of findings; and (5) presentation of findings that answers research questions effectively and efficiently. 9 9. Violating OLS Assumptions and Remedial Measures – II Heteroskedasticity Weighted Least Squares Readings to be completed for this session: Kutner et al: Chapter 11 Gujarati, D. N. (1995). Basic Econometrics, Third Edition, McGraw-Hill, pp. 365-389. No seminar reading. Assignment 4 out today (due session 11): In this Assignment, you are required to solve problems pertaining to interpretation of standardized regression coefficients, diagnostics of heteroskedasticity, performing partial F test, and testing interactions. 10 Other Topics in Regression Analysis -- I Regression through origin Partial-correlation coefficient Standardized regression coefficient Functional form, curvilinear relationship and polynomial regression models R2 increment: Hierarchical regression analysis Readings to be completed for this session: Kutner et al: Chapter 7 http://andrewgelman.com/2014/06/02/hate-stepwise-regression/ http://www.danielezrajohnson.com/stepwise.pdf White, M., Biddlecom, A. and Guo, S. (1993) “Immigration, naturalization and residential assimilation among Asian Americans in 1980s”, Social Forces,72(1): 93-118. Seminar reading to be completed for this session: No seminar reading this week; read the Gelman and Johnson stepwise essays/comments (shown above) and be prepared to talk about them. 11 Other Topics in Regression Analysis -- I Dummy variables as predictors: Interpretation of regression coefficients Comparison of several regression equations 8 Indirect effects: mediation, suppression and confounding Testing interactions: Mediator versus moderator Interaction, joint effect and moderator ANCOVA Experimental design Concomitant variables Readings to be completed for this session: Kutner et al: Chapter 8 Seminar reading to be completed for this session: Rose, Parish et al. (in Sakai folder) Assignment 4, due today. Assignment 5 out today (due in session 13): In this Assignment, you are required: (a) to perform maximum likelihood estimation using Excel; (b) to solve problems pertaining to interpretation of odds ratio and running binary logistic regression; and (c) do a group exercise on critical review of a study using regression. 12 Logistic Regression Analysis - I Violations of OLS assumption when dependent variable is dichotomous Logistic response function Maximum likelihood estimator Readings to be completed for this session: Kutner et al: Chapter 13 Seminar reading to be completed for this session: Parish, S. L., Rose, R. A., & Andrews, M. E. (2009). (in Sakai folder) 13 Logistic regression analysis - II Odds ratio Relative risk Predicted probability Readings to be completed for this session: Kutner et al: Chapter 14 Agresti, A. (1990). Categorical Data Analysis, pp. 13-19. Hoboken, NJ: John Wiley & Sons, Inc. No seminar reading (Hand out Final) 9 Final Exam (DUE DATE PROVIDED BY INSTRUCTOR): DESCRIBE 14 Other topics Multinomial logistic regression Ordinal logistic regression Overview of the generalized linear models Common pitfalls in conducting and reporting regression analysis Missing data Readings to be completed for this session: Rose & Fraser 2008 (on Sakai) No seminar reading 10