The University of North Carolina at Chapel Hill
School of Social Work
SOWO 917 Longitudinal and Multilevel Analysis
Fall Semester, 2015

INSTRUCTOR
Din Chen, Ph.D.
Wallace H. Kuralt Distinguished Professor
Director of Statistical Development and Consultation
School of Social Work, University of North Carolina at Chapel Hill, NC USA
Office: Room 548C, Tate Turner Kuralt, CB #3550 School of Social Work, Chapel Hill, NC 27599-3550
Phone: (919) 843-2434
Email: dinchen@email.unc.edu

CLASS MEETING TIMES & OFFICE HOURS
Class meets on Wednesdays 9:00-11:50 am (Room 102 TTK) from 8/17 to 12/14
Office hours are Wednesday 12-2 or by appointment

COURSE DESCRIPTION
This course introduces the context and intuition for longitudinal and multilevel models, and the statistical frameworks, analytical tools, and social behavioral applications of three types of models: event history analysis (EHA), multilevel modeling (MLM), and growth curve analysis.

COURSE OBJECTIVES
At the completion of the course, students will have a solid understanding of the challenges and problems in longitudinal and multilevel analysis. They will know how to choose appropriate statistical analyses that best suit the type of data and research questions for a given study. They are expected to be able to conceptualize, design, run, interpret, and communicate results clearly and effectively in spoken and written settings based on multilevel modeling (including two-level and three-level hierarchical linear models, growth curve analysis, categorical MLMs, and understanding cross-classification and cross-level effects) and event history analysis (life tables, Kaplan-Meier’s estimate of survivor function, discrete time model, Cox proportional hazard model, marginal models handling multilevel event data). Students are encouraged to bring their own research projects/datasets to be used as examples in the course.
PRE-REQUISITES
Students are assumed to be familiar with descriptive and inferential statistics as well as multiple regression analysis. They should have a background in statistics and statistical software at least equivalent to that provided by SOWO918, SOCI209, PSYC282, EDUC284 (linear regression), or SOCI211 (categorical data analysis). Students without such prerequisites should contact the instructor to determine their eligibility to take this course.

SAKAI COURSE SITE
Go to: https://www.unc.edu/sakai/
Enter your ONYEN
Navigate to SOWO917.001.FA15
This syllabus is under “syllabus” on the left-hand navigation menu
All class lecture notes, assignments, and other materials as needed will be provided under “resources” on the left-hand navigation menu
All course materials are on the web site and students are responsible for bringing their materials to class.

STATISTICAL SOFTWARE PACKAGES
Students may choose to use Stata, SPSS, SAS, or R as the primary statistical software package for the course. I will mainly use R/SAS for this class along with the textbooks (below) because of their functionality for multilevel modelling and graphics. I will relate R/SAS to SPSS/Stata at various times in classroom lectures, materials, and demonstrations.

TEXTBOOK (ALL ARE EXPECTED TO BUY THE FIRST BOOK FOR THIS CLASS)
Finch, W. H., Bolin, J.E. and Kelley, K. (2014). Multilevel Modelling Using R. Chapman and Hall/CRC. Statistics in the Social Behavioral Sciences Series. (Referred to as “R4MLM” for this class)
Singer, J.D., and Willett, J.B., (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, New York, NY: Oxford University Press

RECOMMENDED TEXTBOOKS
Allison, P.D. (1995). Survival Analysis Using the SAS System. Cary, NC: SAS Institute Inc.
Bliese, P. (2013) R Manual for MLM: http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf (saved into Sakai “Resources” as “Bliese_Multilevel.pdf”)
Cleves, M.A., Gould, W.W., & Gutierrez, R.G. (2004).
An introduction to survival analysis using Stata, Rev. ed., College Station, TX: Stata Press.
Guo, S. (2010). Survival Analysis: A Practical Guide to Social Work Research. New York, NY: Oxford University Press.
Pinheiro, J. and Bates, D. (2009). Mixed-Effects Models in S and SPlus. Springer.
Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and Longitudinal Modeling Using Stata, College Station, TX: Stata Press.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods, Second Edition, Thousand Oaks, CA: Sage Publications Ltd.
SAS Online documentation for Proc Mixed: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#mixed_toc.htm

ASSIGNMENTS                     GRADE PERCENTAGE
Assignment 1                    10%
Assignment 2                    10%
Assignment 3                    10%
Assignment 4                    10%
Assignment 5                    10%
Midterm project (take home)     25%
Final Project (take home)       25%

GRADING SYSTEM
The standard School of Social Work interpretation of grades and numerical scores will be used.
H = 94-100
P = 80-93
L = 70-79
F = 69 and below

POLICY ON CLASS ATTENDANCE
Class attendance is an important element of class evaluation, and you are expected to attend all scheduled sessions. Each class session will cover a great deal of material, and you will easily fall behind if you miss a session, which will also affect the class learning project, so it is imperative to attend. Students are responsible for informing the instructor when they must miss a class session. You are expected not to miss more than two sessions for the whole semester. Starting with the second absence, your course grade will be reduced by 10% for each session missed.

POLICY ON INCOMPLETE AND LATE ASSIGNMENTS
Assignments are to be turned in to the professor by 5pm on the due date noted in the course outline. Brief extensions may be granted by the professor given advance notice of at least 24 hours.
Late assignments (not turned in by 5pm on the due date) will be reduced 10% for each day late (including weekend days). A grade of incomplete will only be given under extenuating circumstances and in accordance with University policy.

POLICY ON ACADEMIC DISHONESTY
Students are expected to follow the UNC Honor Code. Please include the honor code statement along with your signature on all assignments: “I have neither given nor received unauthorized aid on this assignment.” Please refer to the APA Style Guide, the SSW Manual, and the SSW Writing Guide for information on attribution of quotes, plagiarism and appropriate use of assistance in preparing assignments. If reason exists to believe that academic dishonesty has occurred, a referral will be made to the Office of the Student Attorney General for investigation and further action as required.

POLICY ON ACCOMMODATIONS FOR STUDENTS WITH DISABILITIES
Students with disabilities that affect their participation in the course may notify the instructor if they wish to have special accommodations in instructional format, examination format, etc., considered.

SESSION SCHEDULE
All sessions meet in Tate-Turner-Kuralt Room 102 except as noted.
1   8/19
2   8/26
3   9/2
4   9/9
5   9/16
6   9/23
7   9/30
8   10/7
    Fall break on 10/14, no class
9   10/21
    Midterm exam due on 10/27.
10  10/28
11  11/4
12  11/11
13  11/18
    Thanksgiving on 11/26, no class on Wed 11/25.
14  12/2  Class final project. Due on 12/11.

COURSE OUTLINE (TOPICS, READINGS, AND ASSIGNMENTS)

1 (8/19) Introduction and course overview
Introduction to R: How to install and get help.
Review of fundamental statistical concepts.
How to do regression using R
Readings to be completed for this session:
R4MLM: Chapter 1
Optional Reading:
Guo, S. (2013). Advanced statistical analysis. Entry for the Encyclopedia of Social Work Online. New York, NY: The Oxford University Press.
Assignment #1 (Due on 8/25): In this assignment you will demonstrate your readiness to use R to read data, summarize it, run a regression analysis, and produce plots. You will also use R for a simple simulation study to generate your own data and analyze it.

2 (8/26) Introduction to multilevel and hierarchical linear modeling
The importance of context to social and behavioral science.
Overview of MLM/HLM.
Nested and clustered data in multilevel hypotheses in social sciences
Variance decomposition, intra-class correlation & reliability
Pitfalls of ignoring multilevel data structure
Random effects & fixed effects
Overview of Two-level MLM and three-level MLM
Readings to be completed for this session:
R4MLM: Chapter 2
Raudenbush & Bryk, Chapters 1 and 2
Singer, J. D. (1998). Using SAS Proc Mixed to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics 23(4), 323-355.
Hedges, L. V. (2007). Correcting a significance test for clustering. Journal of Educational and Behavioral Statistics 32(2), 151-179.
Optional Reading:
Guo, S. (2005). “Analyzing grouped data with hierarchical linear modeling”, Children and Youth Services Review 27:637-65.

3 (9/2) Fitting Two-level MLM in R
Two-level models and implementation in R packages nlme and lme4
Writing out equations and substitution.
Estimation theory.
Variance explained and presenting results.
Readings to be completed for this session:
R4MLM: Chapter 3
Raudenbush & Bryk, Chapter 5 (99-130)
Primo, D., Jacobsmeier, M. L., and Milyo, J. (2007). Estimating the impact of state policies and institutions with mixed-level data. State Politics and Policy Quarterly, 7(4), 446-449.
Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods and Research 22(3), 342-363.
Hedges, L. V. & Hedberg, E. C. (2007). Intraclass correlations for planning group randomized experiments in rural education.
Educational Evaluation and Policy Analysis 29(1), 60-87.

4 (9/9) Models for Three and more levels
Three-level models and implementation in R packages nlme and lme4
Model fitting and goodness-of-fit indices.
Readings to be completed for this session:
R4MLM: Chapter 4
Raudenbush & Bryk, Finish chapter 5
Rose, R. A., & Bowen, G. L. (2009). Power analysis in social work intervention research design: Designing cluster randomized trials. Social Work Research, 33(1), 43-52.
Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2(2), 173-185.
Optional reading:
Schochet, P. Z. (2005). Statistical power for random assignment evaluations of education programs. Washington, DC: Mathematica Policy Research.
Assignment 2 (due on 9/16): Two-/Three-level data analysis using MLM.

5 (9/16) Longitudinal Data Analysis Using MLM
Longitudinal or multilevel longitudinal data and specifying time
Random effects vs. fixed effects models
The multilevel model for change; model building
Readings to be completed for this session:
R4MLM: Chapter 5
Singer & Willett, Chapters 1-3
Raudenbush, S. W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology 52, 501-525.
Raudenbush, S. W., & Liu, X. (2000). “Statistical power and optimal design for multisite randomized trials.” Psychological Methods 5(2): 199-213.

6 (9/23) Graphing Data in Multilevel Contexts.
R's powerful graphics for MLM
Readings to be completed for this session:
R4MLM: Chapter 6
Assignment 3 (due on 9/29): Longitudinal data analysis using MLM.

7 (9/30) Generalized Linear Models for non-normal data
Logistic regression and Poisson Regression
When and how to do these regression models
Model fitting and presentations
Readings to be completed for this session:
R4MLM: Chapter 7
Other lecture notes will be provided

8 (10/7) Advanced MLM: Generalized Linear Models.
Random coefficient logistic regression for categorical data
Random coefficient Poisson regression for count data
MLM Data and model fitting, presentation
Readings to be completed for this session:
R4MLM: Chapter 8
Assignment 4 (due on 10/25): Data analysis using MLM.

9 (10/21) Introduction to Bayesian MLM.
Introduction to Bayesian models and Bayesian MLM
Introduction to Markov Chain Monte Carlo (MCMC)
Easy implementation in R for data analysis
Readings to be completed for this session:
R4MLM: Chapter 9.

Midterm Exam (Due on 10/27): Use data sets provided by the course or a data set you choose to run a multilevel regression model. Write a brief paper (no more than 12 pages, double spaced) to present findings. The paper should include: (1) 2 research questions; (2) data description and specification of the multilevel regression; (3) description of the process by which the model will be fitted; (4) a description of model diagnostics and sensitivity tests; and (5) report and interpret the findings from each of (2)-(3). You should be able to explain the findings to a lay audience.

10 (10/28) Intro to Survival Analysis
Overview of event history analysis
Censoring
Discrete-time event occurrence
Life tables
Hazard and survival functions/curves
Survival data analysis using R/SAS
Readings to be completed for this session:
Singer & Willett, Chapters 9-10.
Yang, T. & Aldrich, H. E. (2012). Out of sight but not out of mind: Why failure to account for left truncation biases research on failure rates. Journal of Business Venturing 27, 477-492.
Seminar reading to be completed for this session:
Berger, M. C. & Black, D. A. (1998). The duration of Medicaid spells: An analysis using flow and stock samples. The Review of Economics and Statistics 80(4), 667-675.
Optional readings:
Guo, G. (1993). “Event history analysis for left-truncated data”, Sociological Methodology, 23, 217-243.
Harris, K.M. (1993).
“Work and welfare among single mothers in poverty.” American Journal of Sociology 99: 317-352.

11 (11/4) Discrete-time models, continued
The discrete time hazard model
Alternate specifications for time
Time-varying covariates
Proportionality and unobserved heterogeneity
Parametric models (Weibull, accelerated failure time, etc.)
Survival data analysis using R/SAS
Readings to be completed for this session:
Singer & Willett, Chapters 11-12.
Nam, Y. (2005). The roles of employment barriers in welfare exits and reentries after welfare reform: Event history analysis. Social Service Review 79(2), 268-293.
Haque, M. M. & Washington, S. (2014). A parametric duration model of the reaction times of drivers distracted by mobile phone conversations. Accident Analysis and Prevention 62, 42-53.
Seminar reading to be completed for this session:
Glick, J. E. & Van Hook, J. (2011). Does a house divided stand? Kinship and the continuity of shared living arrangements. Journal of Marriage and Family 73, 1149-1164.
Optional readings:
Lee, E. T. & Go, O. T. (1997). Survival analysis in public health research. Annual Review of Public Health 18, 105-134.
Allison, P.D. (1982). “Discrete-time methods for the analysis of event histories”, Sociological Methodology, 13, 61-98.
Hetling, A., Ovwigho, P. C., & Born, C. E. (2007). Do welfare avoidance grants prevent cash assistance? Social Service Review 81(4), 609-631.

12 (11/11) Kaplan-Meier & Cox proportional hazards model
The clog-log model
Rare event models
Kaplan-Meier’s estimate of survivor functions
The cumulative hazard function and kernel smoothing
Partial likelihood estimator
Cox regression
Data analysis using R/SAS
Readings to be completed for this session:
Singer & Willett, Chapters 13 and 14 up to page 516.
Heise, M. (2012). Law and policy entrepreneurs: Empirical evidence on the expansion of school choice policy. Notre Dame Law Review 87(5), 1917-1940.
Seminar reading to be completed for this session:
Kosterman, R., Hawkins, D., Guo, J., Catalano, R. F., & Abbott, R. D. (2000). The dynamics of alcohol and marijuana initiation: Patterns and predictors of first use in adolescence. American Journal of Public Health 90(3), 360-366.
Optional readings:
Sandefur & Cook, (1998). “Permanent exits from public assistance: The impact of duration, family, and work”. Social Forces, 77(2) 763-786.
Guo, S., Biegel, D., Johnson, J. & Dyches, H. (2001) “Assessing the impact of mobile crisis services on preventing hospitalization: A community-based evaluation”. Psychiatric Services 52(2):223-228.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society 2(XX), 187-220.
Efron, B. (1977). The efficiency of Cox’s likelihood function for censored data. Journal of the American Statistical Association 72(359), 557-565.
Assignment 5 (due on 10/17): Survival Data analysis.

13 (11/18) Cox proportional hazards model, continued.
Partial likelihood method
Interpreting results
Alternate structures for time
Non-proportional hazards and interactions with time
Diagnostics
Competing risks
Power analysis for survival models
Introduction to multilevel event time data (multivariate failure time data)
Readings to be completed for this session:
Singer & Willett, finish chapter 14 and 15.
Jozwiak, K. & Moerbeek, M. (2012). Power analysis for trials with discrete-time survival endpoints. Journal of Educational and Behavioral Statistics 37(5), 630-654.
More optional readings:
Stata documentation on STPOWER: http://www.stata.com/manuals13/ststpower.pdf
Heckman, J.J., & Singer, B. (1985), “Social science duration analysis”, in Longitudinal Studies of Labor Market Data, New York, NY: Cambridge University Press. Chapter 2.
Grilli, L. (2005). The random effects proportional hazards model with grouped survival data: A comparison between the group continuous and continuation ratio versions.
Journal of the Royal Statistical Society Series A, 168(1), 83-94.
Guo, S., & Wells, K. (2003). Research on timing of foster-care outcomes: one methodological problem and approaches to its solution. Social Service Review 77(1): 1-24.
Lin, D.Y. (1994). Cox regression analysis of multivariate failure time data: The marginal approach. Statistics in Medicine 13: 2233-2247.
Trussell, J., & Richards, T. (1985). “Correcting for unmeasured heterogeneity in hazard models using the Heckman-Singer procedure.” Sociological Methodology 15: 242-276.

14 Final Exam (Due on 12/11): Use data sets provided by the course or data set you choose to run an event history/survival regression model (any type). Write a brief paper (no more than 12 pages, double spaced) to present findings. The paper should include: (1) 2 research questions; (2) data description and specification of the survival regression; (3) description of the process by which the model will be fitted; (4) a description of model diagnostics and sensitivity tests; and (5) report and interpret the findings from each of (2)-(3). You should be able to explain the findings to a lay audience.