The University of North Carolina at Chapel Hill Fall Semester, 2015

The University of North Carolina at Chapel Hill
School of Social Work
SOWO 917 Longitudinal and Multilevel Analysis
Fall Semester, 2015
INSTRUCTOR
Din Chen, Ph.D.
Wallace H. Kuralt Distinguished Professor
Director of Statistical Development and Consultation
School of Social Work, University of North Carolina at Chapel Hill, NC USA
Office: Room 548C, Tate Turner Kuralt, CB #3550
School of Social Work, Chapel Hill, NC 27599-3550
Phone: (919) 843-2434
Email: dinchen@email.unc.edu
CLASS MEETING TIMES & OFFICE HOURS
Class meets on Wednesdays 9:00-11:50 am (Room 102 TTK) from 8/17 to 12/14
Office hours are Wednesday 12-2 or by appointment
COURSE DESCRIPTION
This course introduces the context and intuition for longitudinal and multilevel models,
and the statistical frameworks, analytical tools, and social behavioral applications of three
types of models: event history analysis (EHA), multilevel modeling (MLM), and growth
curve analysis.
COURSE OBJECTIVES
At the completion of the course, students will have a solid understanding of the
challenges and problems in longitudinal and multilevel analysis. They will know how to
choose appropriate statistical analyses that best suit the type of data and research
questions for a given study. They are expected to be able to conceptualize, design, run,
interpret, and communicate results clearly and effectively in spoken and written settings
based on multilevel modeling (including two-level and three-level hierarchical linear
models, growth curve analysis, categorical MLMs, and understanding cross-classification
and cross-level effects) and event history analysis (life tables, Kaplan-Meier’s estimate of
survivor function, discrete time model, Cox proportional hazard model, marginal models
handling multilevel event data). Students are encouraged to bring their own research
projects/datasets to be used as examples in the course.
PRE-REQUISITES
Students are assumed to be familiar with descriptive and inferential statistics as well as
multiple regression analysis. They should have statistical and statistical software
background at least equivalent to that provided by SOWO918, SOCI209, PSYC282,
EDUC284 (linear regression), or SOCI211 (categorical data analysis). Students without
such prerequisites should contact the instructor to determine their eligibility to take this
course.
SAKAI COURSE SITE
Go to: https://www.unc.edu/sakai/
Enter your ONYEN
Navigate to SOWO917.001.FA15
This syllabus is under “syllabus” on the left-hand navigation menu
All class lecture notes, assignments, and other materials as needed will be provided under
“resources” on the left-hand navigation menu
All course materials are on the web site and students are responsible for bringing their
materials to class.
STATISTICAL SOFTWARE PACKAGES
Students may choose to use Stata, SPSS, SAS, or R as the primary statistical software
package for the course. I will mainly use R/SAS for this class along with the textbooks
(below) due to their functionalities on multilevel modelling and graphics. I will reference
R/SAS to SPSS/Stata at various times in classroom lectures, materials, and
demonstrations.
TEXTBOOK (ALL ARE EXPECTED TO BUY THE FIRST BOOK FOR THIS CLASS)
Finch, W. H., Bolin, J.E. and Kelley, K. (2014). Multilevel Modelling Using R. Chapman
and Hall/CRC. Statistics in the Social Behavioral Sciences Series.
(Referred to as “R4MLM” in this class)
Singer, J.D., and Willett, J.B., (2003). Applied Longitudinal Data Analysis: Modeling
Change and Event Occurrence, New York, NY: Oxford University Press
RECOMMENDED TEXTBOOKS
Allison, P.D. (1995). Survival Analysis Using the SAS System. Cary, NC: SAS Institute
Inc.
Bliese, P. (2013). R Manual for MLM:
http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf (saved into Sakai “Resources” as
“Bliese_Multilevel.pdf”)
Cleves, M.A., Gould, W.W., & Gutierrez, R.G. (2004). An introduction to survival
analysis using Stata, Rev. ed., College Station, TX: Stata Press.
Guo, S. (2010). Survival Analysis: A Practical Guide to Social Work Research. New
York, NY: Oxford University Press.
Pinheiro, J. and Bates, D. (2009). Mixed-Effects Models in S and S-PLUS. Springer.
Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and Longitudinal Modeling Using
Stata, College Station, TX: Stata Press.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical Linear Models: Applications and
Data Analysis Methods, Second Edition, Thousand Oaks, CA: Sage Publications
Ltd.
SAS Online documentation for Proc Mixed:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#mixed_toc.htm
ASSIGNMENTS
ASSIGNMENT                     GRADE PERCENTAGE
Assignment 1                   10%
Assignment 2                   10%
Assignment 3                   10%
Assignment 4                   10%
Assignment 5                   10%
Midterm project (take home)    25%
Final Project (take home)      25%
GRADING SYSTEM
The standard School of Social Work interpretation of grades and numerical scores will be used.
H = 94-100
P = 80-93
L = 70-79
F = 69 and below
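As a worked illustration of the weights and cut-offs above, the following is a minimal Python sketch of how a course score and letter grade could be computed. The function names, the 0-100 scale for each component, and rounding the weighted score to the nearest whole number are assumptions made for illustration and are not stated in this syllabus. Attendance and late-assignment reductions are not modeled.

```python
import math

# Weights (percent of course grade) copied from the ASSIGNMENTS table.
WEIGHTS = {
    "Assignment 1": 10,
    "Assignment 2": 10,
    "Assignment 3": 10,
    "Assignment 4": 10,
    "Assignment 5": 10,
    "Midterm project": 25,
    "Final project": 25,
}


def course_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a numerical score to H/P/L/F using the cut-offs in GRADING SYSTEM.

    Rounding half up to a whole number is an assumption; the syllabus
    does not say how fractional scores are handled.
    """
    rounded = math.floor(score + 0.5)
    if rounded >= 94:
        return "H"
    if rounded >= 80:
        return "P"
    if rounded >= 70:
        return "L"
    return "F"


# Hypothetical student: full marks on all assignments, 90 and 92 on the projects.
example = {name: 100 for name in WEIGHTS}
example["Midterm project"] = 90
example["Final project"] = 92
print(course_score(example), letter_grade(course_score(example)))
```

The weights sum to 100, so the weighted score stays on the same 0-100 scale as the grade cut-offs.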
POLICY ON CLASS ATTENDANCE
Class attendance is an important element of class evaluation, and you are expected to attend all
scheduled sessions. Each class session covers a great deal of material, and missing a session will
put you behind in the course and affect the class learning project, so it is imperative to attend.
Students are responsible for informing the instructor when they must miss a class session. You are
expected not to miss more than two sessions for the whole semester. Starting with the second
missed session, your course grade will be reduced by 10% for each session missed.
POLICY ON INCOMPLETE AND LATE ASSIGNMENTS
Assignments are to be turned in to the professor by 5pm of the due date noted in the course
outline. Brief extensions may be granted by the professor given advance notice of at least 24
hours. Late assignments (not turned in by 5pm on the due date) will be reduced 10% for each day
late (including weekend days). A grade of incomplete will only be given under extenuating
circumstances and in accordance with University policy.
POLICY ON ACADEMIC DISHONESTY
Students are expected to follow the UNC Honor Code. Please include the honor code statement
along with your signature on all assignments:
“I have neither given nor received unauthorized aid on this assignment.”
Please refer to the APA Style Guide, the SSW Manual, and the SSW Writing Guide for
information on attribution of quotes, plagiarism and appropriate use of assistance in preparing
assignments.
If reason exists to believe that academic dishonesty has occurred, a referral will be made to the
Office of the Student Attorney General for investigation and further action as required.
POLICY ON ACCOMMODATIONS FOR STUDENTS WITH DISABILITIES
Students with disabilities that affect their participation in the course may notify the instructor if
they wish to have special accommodations in instructional format, examination format, etc.,
considered.
SESSION SCHEDULE
All sessions meet in Tate-Turner-Kuralt Room 102 except as noted.
SESSION   DATE
1         8/19
2         8/26
3         9/2
4         9/9
5         9/16
6         9/23
7         9/30
8         10/7
          Fall break on 10/14, no class.
9         10/21
          Midterm exam due on 10/27.
10        10/28
11        11/4
12        11/11
13        11/18
          Thanksgiving on 11/26, no class on Wed 11/25.
14        12/2    Class final project. Due on 12/11.
COURSE OUTLINE (TOPICS, READINGS, AND ASSIGNMENTS)
1(8/19)
Introduction and course overview
Introduction to R: How to install and get help.
Review of fundamental statistical concepts.
How to do regression using R
Readings to be completed for this session:
• R4MLM: Chapter 1
Optional Reading:
• Guo, S. (2013). Advanced statistical analysis. Entry for the Encyclopedia of
Social Work Online. New York, NY: The Oxford University Press.
Assignment #1 (due on 8/25): In this assignment you will demonstrate your
readiness to use R to read data, summarize it, run a regression analysis, and
produce plots. You will also use R for a simple simulation study to generate
your own data and analyze it.
2(8/26)
Introduction to multilevel and hierarchical linear modeling
The importance of context to social and behavioral science.
Overview of MLM/HLM.
Nested and cluster data in multi-level hypotheses in social sciences
Variance decomposition, intra-class correlation & reliability
Pitfalls of ignoring multilevel data structure
Random effects & fixed effects
Overview of Two-level MLM and three-level MLM
Readings to be completed for this session:
• R4MLM: Chapter 2
• Raudenbush & Bryk, Chapters 1 and 2
• Singer, J. D. (1998). Using SAS Proc Mixed to fit multilevel models,
hierarchical models, and individual growth models. Journal of Educational
and Behavioral Statistics 23(4), 323-355.
• Hedges, L. V. (2007). Correcting a significance test for clustering. Journal
of Educational and Behavioral Statistics 32(2), 151-179.
Optional Reading:
• Guo, S. (2005). “Analyzing grouped data with hierarchical linear modeling”,
Children and Youth Services Review 27:637-65.
3(9/2)
Fitting Two-level MLM in R
Two-level models and implementation in R packages nlme and lme4
Writing out equations and substitution.
Estimation theory.
Variance explained and presenting results.
Readings to be completed for this session:
• R4MLM: Chapter 3
• Raudenbush & Bryk, Chapter 5 (99-130)
• Primo, D., Jacobsmeier, M. L., and Milyo, J. (2007). Estimating the impact
of state policies and institutions with mixed-level data. State Politics and
Policy Quarterly, 7(4), 446-449.
• Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level
models. Sociological Methods and Research 22(3), 342-363.
• Hedges, L. V. & Hedberg, E. C. (2007). Intraclass correlations for planning
group randomized experiments in rural education. Educational Evaluation
and Policy Analysis 29(1), 60-87.
4 (9/9)
Models for Three and more levels
Three-level models and implementation in R packages nlme and lme4
Model fitting and goodness-of-fit indices.
Readings to be completed for this session:
• R4MLM: Chapter 4
• Raudenbush & Bryk, finish Chapter 5
• Rose, R. A., & Bowen, G. L. (2009). Power analysis in social work
intervention research design: Designing cluster randomized trials. Social
Work Research, 33(1), 43–52.
• Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster
randomized trials. Psychological Methods, 2(2), 173-185.
Optional reading:
• Schochet, P. Z. (2005). Statistical power for random assignment evaluations
of education programs. Washington, DC: Mathematica Policy Research.
Assignment 2 (due on 9/16): Two-/Three-level data analysis using MLM.
5 (9/16)
Longitudinal Data Analysis Using MLM
Longitudinal or multilevel longitudinal data and specifying time
Random effects vs. fixed effects models
The multilevel model for change; model building
Readings to be completed for this session:
• R4MLM: Chapter 5
• Singer & Willett, Chapters 1-3
• Raudenbush, S. W. (2001). Comparing personal trajectories and drawing
causal inferences from longitudinal data. Annual Review of Psychology 52,
501-525.
• Raudenbush, S. W., & Liu, X. (2000). “Statistical power and optimal design
for multisite randomized trials.” Psychological Methods 5(2): 199-213.
6 (9/23)
Graphing Data in Multilevel Contexts.
R's powerful graphics for MLM
Readings to be completed for this session:
• R4MLM: Chapter 6
Assignment 3 (due on 9/29): Longitudinal data analysis using MLM.
7(9/30)
Generalized Linear Models for non-normal data
Logistic regression and Poisson Regression
When and how to do these regression models
Model fitting and presentations
Readings to be completed for this session:
• R4MLM: Chapter 7
• Other lecture notes will be provided
8(10/7)
Advanced MLM: Generalized Linear Models.
Random coefficient logistic regression for categorical data
Random coefficient Poisson regression for count data
MLM Data and model fitting, presentation
Readings to be completed for this session:
• R4MLM: Chapter 8
Assignment 4 (due on 10/25): Data analysis using MLM.
9(10/21)
Introduction to Bayesian MLM.
Introduction to Bayesian models and Bayesian MLM
Introduction to Markov Chain Monte Carlo (MCMC)
Easy implementation in R for data analysis
Readings to be completed for this session:
• R4MLM: Chapter 9.
Midterm Exam (Due on 10/27):
Use data sets provided by the course or data set you choose to run a multilevel regression
model. Write a brief paper (no more than 12 pages, double spaced) to present findings.
The paper should include: (1) 2 research questions; (2) data description and specification
of the multilevel regression; (3) description of the process by which the model will be
fitted; (4) a description of model diagnostics and sensitivity tests; and (5) report and
interpret the findings from each of (2)-(3). You should be able to explain the findings to a
lay audience.
10(10/28)
Intro to Survival Analysis
Overview of event history analysis
Censoring
Discrete-time event occurrence
Life tables
Hazard and survival functions/curves
Survival data analysis using R/SAS
Readings to be completed for this session:
• Singer & Willett, Chapters 9-10.
• Yang, T. & Aldrich, H. E. (2012). Out of sight but not out of mind: Why
failure to account for left truncation biases research on failure rates. Journal
of Business Venturing 27, 477-492.
Seminar reading to be completed for this session:
• Berger, M. C. & Black, D. A. (1998). The duration of Medicaid spells: An
analysis using flow and stock samples. The Review of Economics and
Statistics 80(4), 667-675.
Optional readings:
• Guang Guo (1993). “Event history analysis for left-truncated data”,
Sociological Methodology, 23, 217-243.
• Harris, K.M. (1993). “Work and welfare among single mothers in poverty.”
American Journal of Sociology 99: 317-352.
11(11/4)
Discrete-time models, continued
The discrete time hazard model
Alternate specifications for time
Time-varying covariates
Proportionality and unobserved heterogeneity
Parametric models (Weibull, accelerated failure time, etc.)
Survival data analysis using R/SAS
Readings to be completed for this session:
• Singer & Willett, Chapters 11-12.
• Nam, Y. (2005). The roles of employment barriers in welfare exits and
reentries after welfare reform: Event history analysis. Social Service Review
79(2), 268-293.
• Haque, M. M. & Washington, S. (2014). A parametric duration model of the
reaction times of drivers distracted by mobile phone conversations. Accident
Analysis and Prevention 62, 42-53.
Seminar reading to be completed for this session:
• Glick, J. E. & Van Hook, J. (2011). Does a house divided stand? Kinship
and the continuity of shared living arrangements. Journal of Marriage and
Family 73, 1149-1164.
Optional readings:
• Lee, E. T. & Go, O. T. (1997). Survival analysis in public health research.
Annual Review of Public Health 18, 105-134.
• Allison, P.D. (1982). “Discrete-time methods for the analysis of event
histories”, Sociological Methodology, 13, 61-98.
• Hetling, A., Ovwigho, P. C., & Born, C. E. (2007). Do welfare avoidance
grants prevent cash assistance? Social Service Review 81(4), 609-631.
12(11/11)
Kaplan Meier & Cox proportional hazards model
The clog-log model
Rare event models
Kaplan-Meier’s estimate of survivor functions
The cumulative hazard function and kernel smoothing
Partial likelihood estimator
Cox regression
Data analysis using R/SAS
Readings to be completed for this session:
• Singer & Willett, Chapters 13 and 14 up to page 516.
• Heise, M. (2012). Law and policy entrepreneurs: Empirical evidence on the
expansion of school choice policy. Notre Dame Law Review 87(5), 1917-1940.
Seminar reading to be completed for this session:
• Kosterman, R., Hawkins, D., Guo, J., Catalano, R. F., & Abbott, R. D.
(2000). The dynamics of alcohol and marijuana initiation: Patterns and
predictors of first use in adolescence. American Journal of Public Health
90(3), 360-366.
Optional readings:
• Sandefur & Cook, (1998). “Permanent exits from public assistance: The
impact of duration, family, and work”. Social Forces, 77(2) 763-786.
• Guo, S., Biegel, D., Johnson, J. & Dyches, H. (2001) “Assessing the impact
of mobile crisis services on preventing hospitalization: A community-based
evaluation”. Psychiatric Services 52(2):223-228.
• Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal
Statistical Society 2(XX), 187-220.
• Efron, B. (1977). The efficiency of Cox’s likelihood function for censored
data. Journal of the American Statistical Association 72(359), 557-565.
Assignment 5 (due on 10/17): Survival Data analysis.
13(11/18)
Cox proportional hazards model, continued.
Partial likelihood method
Interpreting results
Alternate structures for time
Non-proportional hazards and interactions with time
Diagnostics
Competing risks
Power analysis for survival models
Introduction to multilevel event time data (multivariate failure time data)
Readings to be completed for this session:
• Singer & Willett, finish Chapter 14 and Chapter 15.
• Jozwiak, K. & Moerbeek, M. (2012). Power analysis for trials with
discrete-time survival endpoints. Journal of Educational and Behavioral
Statistics 37(5), 630-654.
• MORE
Optional readings:
• Stata documentation on STPOWER:
http://www.stata.com/manuals13/ststpower.pdf
• Heckman, J.J., & Singer, B. (1985), “Social science duration analysis”, in
Longitudinal Studies of Labor Market Data, New York, NY: Cambridge
University Press. Chapter 2.
• Grilli, L. (2005). The random effects proportional hazards model with
grouped survival data: A comparison between the group continuous and
continuation ratio versions. Journal of the Royal Statistical Society Series A,
168(1), 83-94.
• Guo, S., & Wells, K. (2003). Research on timing of foster-care outcomes:
one methodological problem and approaches to its solution. Social Service
Review 77(1): 1-24.
• Lin, D.Y. (1994). Cox regression analysis of multivariate failure time data:
The marginal approach. Statistics in Medicine 13: 2233-2247.
• Trussell, J., & Richards, T. (1985). “Correcting for unmeasured
heterogeneity in hazard models using the Heckman-Singer procedure.”
Sociological Methodology 15: 242-276.
14
Final Exam (Due on 12/11): Use data sets provided by the course or data set you
choose to run an event history/survival regression model (any type). Write a brief
paper (no more than 12 pages, double spaced) to present findings. The paper
should include: (1) 2 research questions; (2) data description and specification of
the survival regression; (3) description of the process by which the model will be
fitted; (4) a description of model diagnostics and sensitivity tests; and (5) report and
interpret the findings from each of (2)-(3). You should be able to explain the
findings to a lay audience.