Planning the Data Analysis: Statistical Power, Mixed Effects Linear Models, Moderator and Mediator Models and Related Issues

Katie Witkiewitz, PhD
Department of Psychology
University of New Mexico
2012 NIH Summer Institute on Social and Behavioral Intervention Research
July 12, 2012

Ten questions…
1. What is the type of study design?
   - Number of treatment groups/arms
   - Randomization procedures
   - Any clustering?
2. How many subjects can be recruited/observed in a study period?
3. Will there be an equal number of participants in each group?
4. What are the research hypotheses?
5. What do you hope to achieve or learn from each aim?
6. What are the primary outcome measures?
7. What types of variables will be included?
   - Unit of measurement for DVs, IVs, covariates
   - Issues with measures (e.g., skewness, counts, inflated zeroes)
8. How many total evaluations and measurements?
9. For repeated measurements, what is the measurement interval?
10. What kinds of missing data patterns do you expect and how much?

Other questions depend on design
• Qualitative research
  – If focus groups, how many groups and group composition?
  – How will data be transcribed and coded?
  – Which software, if any, do you plan to use?
  – What are the research questions?
  – Are you comparing? Aggregating? Contrasting? Sorting?
• Secondary data analyses
  – What is the available data?
  – Return to questions #4-10

Components of the data analysis plan
• Study design
  – Brief overview of design
  – Sampling plan
  – Randomization plan, if applicable
  – Precision/power analysis and sample size
• Data management
• Statistical analyses
  – Proposed analyses for each primary aim
  – Secondary or exploratory analyses
  – Interim analyses, if applicable
  – Missing data methodology

Study design issues for data analysis
• Design issues specific to data analyses
  – Sampling procedures
    • Stratified sampling
    • Clustering
  – Randomization procedures
  – # of treatment conditions
    • Realistically achievable n within each condition
    • Allocation ratio
  – Measurement
    • Unit of measurement for outcome(s) and any covariates
    • # of measures/constructs
    • Duration and intervals of assessment
  – Sample size

What sample size do I need?
• How many subjects can you realistically recruit?
• Precision – how precise do you want to be in measuring the effect?
• Power – powering your study to detect a significant effect

Precision analyses for pilot studies
• Sim & Lewis (2012) recommend n>50
• Precision increases with greater n
• In order to carry out any precision-based sample size calculation you need:
  – Confidence level (e.g., 95%) and target width of the confidence interval – narrower interval = more precise estimate
  – Formula for the relevant standard error

Effect of Sample Size on Precision
• Estimating a percentage of 30%

Sample size   95% CI
50            ± 12.7% (17.3% - 42.7%)
100           ± 9.0% (21% - 39%)
500           ± 4.0% (26% - 34%)
1000          ± 2.8% (27.2% - 32.8%)

Precision Analysis
• The sample size required to achieve the desired maximum error E can be chosen as

$$n = \frac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$$

• Example: suppose that we wish to have a 95% assurance that the error in the estimated mean is less than 10% of the standard deviation (i.e., E = 0.1σ). The required sample size is

$$n = \frac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \frac{1.96^{2}\,\sigma^{2}}{(0.1\,\sigma)^{2}} = 384.2 \approx 385$$

Precision Analysis
• Suppose that we wish to have a 95% assurance that the error in the estimated mean is less than 25% of the standard deviation (i.e., E = 0.25σ). The required sample size is

$$n = \frac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \frac{1.96^{2}\,\sigma^{2}}{(0.25\,\sigma)^{2}} = 61.5 \approx 62$$

Insert Successful, Carefully Controlled Pilot Study Here

I have estimates of the effect from my pilot study, now what do I do?

Effect size has been defined in various ways
• Cohen (1988): “degree to which the phenomenon is present in the population” (p. 9).
• Kelley & Preacher (2012): “a quantitative reflection of the magnitude of some phenomenon that is used for the purpose of addressing a question of interest.” (p. 140).
• Context of social and behavioral interventions: magnitude of the detectable, minimally expected difference between intervention conditions on outcome of interest.
  – Standardized difference between two means (Cohen’s d):

$$d = \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{S_{\text{pooled}}}$$

Importance of variability in the achieved power of an intervention

Do achieved effect sizes from a prior study = the probable effect size in a subsequent study?
Study 1 – Pilot trial; Study 2 – Main RCT

$$d = \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{S_{\text{pooled}}}$$

Same mean difference, different pooled SD:

$$d = \frac{10.5 - 9.5}{2.0} = 0.5 \qquad d = \frac{10.5 - 9.5}{1.5} = 0.67$$

Can I adjust my SD from my pilot to estimate SD for RCT?
• If you have pilot data, then you can use the precision analysis logic to “inflate” your observed pilot SD.
• Sim & Lewis (2012) provide formulas and calculations

Statistical power and sample size
• How many subjects do you need to have sufficient statistical power to detect the hypothesized effect?
  – What level of power is desirable for your design?
  – What level for statistical significance?
  – What are your statistical tests?
  – What effect size do you expect to detect?
  – What is the probable variation in the sample?

Step-by-step approach to calculating effect sizes.
• Decide on the question of interest.
  – Group differences (Cohen’s d/f²; Hedges’ g)
  – Strength of association (r, partial r, β, η²)
  – Difference in proportion (risk ratio (RR), odds ratio (OR), number needed to treat (NNT))
• Examine prior studies in existing literature to obtain estimates of parameters in the effect size equation.
• Find the appropriate formula (or an online calculator) and input the parameters.

Online effect size calculators
• www.campbellcollaboration.org/resources/
• www.danielsoper.com/statcalc3/
• www.divms.uiowa.edu/~rlenth/Power/
• http://statpages.org/#Power
• + many more…

Step-by-step approach to power analysis.
• Decide on α and β
• Using the effect size (or a range of effect sizes), estimate how many subjects are needed to achieve the desired power (1 − β), given α, for a particular test.
• Use existing software or simulation study.

Software for estimating power
• Within statistics programs:
  – SAS, R, Stata, SPSS
  – SamplePower
• Stand-alone and online programs:
  – http://statpages.org/#Power
  – www.divms.uiowa.edu/~rlenth/Power/
  – Optimal Design – http://sitemaker.umich.edu/group-based/home
  – G*power – available free at http://www.psycho.uniduesseldorf.de/aap/projects/gpower/
  – PASS – http://www.ncss.com/pass.html
  – Power & Precision – http://www.power-analysis.com/
  – nQuery – www.statistical-solutions-software.com

Simulation studies for estimating power
• Generate multiple datasets that mimic the design with sample size n and hypothesized parameter estimates as starting values with set effect size
  – SAS, Mplus, R, Stata, Matlab
• More complicated model? Difficulty in determining parameter estimates

Data management
• Briefly describe your plan for data entry and management.
• Overview of preliminary data checking.
  – Check distributions of primary measures
  – Examine randomization failures
  – Test for systematic attrition biases
• Clarify how you will handle data issues that may arise.
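
The simulation approach to estimating power outlined earlier can be sketched in a few lines. This is a minimal illustration, not a recommended workflow: it assumes a two-arm trial with equal group sizes, a normally distributed outcome with SD = 1 in both arms, and a pooled-variance test statistic compared against the normal critical value 1.96 (a large-sample stand-in for the exact t critical value). The function names, the seed, and all parameter values are hypothetical choices made for the example.

```python
import math
import random


def mean_and_var(xs):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulate_power(n_per_group, d, n_sims=2000, z_crit=1.96, seed=2012):
    """Estimate power as the share of simulated trials that reject H0.

    Assumptions (illustrative only): outcomes are normal with SD 1 in
    both arms, so the mean shift d is also the standardized effect size.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Generate one dataset that mimics the design
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        m_c, v_c = mean_and_var(control)
        m_t, v_t = mean_and_var(treatment)
        # Pooled variance for equal group sizes
        pooled_var = (v_c + v_t) / 2
        se = math.sqrt(2 * pooled_var / n_per_group)
        # Two-sided test using the normal approximation
        if abs((m_t - m_c) / se) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for n in (20, 64, 100):
        print(n, simulate_power(n, d=0.5))
```

For more complicated designs (clustering, repeated measures, attrition), the same loop applies, but the data-generating step and the analysis step are replaced with the planned model; that is where the difficulty in choosing parameter values noted above comes in.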
Statistical analyses
  – Types of Analyses
  – Proposed analyses for each primary aim
  – Secondary or exploratory analyses
    • Mediation models
    • Moderation models
    • Mediated-moderation and moderated-mediation
    • Multiple mediator models
  – Missing data methodology

Types of Analyses
• Intent-to-treat – includes all randomized subjects, whether or not the subjects were compliant or completed the study
• Full analysis set – excludes those who are missing all data by completing no assessments after randomization. May also exclude those who never took a single treatment dose
• Per protocol – subset of subjects who complied with the protocol to ensure that the data would be likely to exhibit the effects of the treatment, according to the underlying scientific model

Which statistical test?
• What are your research hypotheses?
• What is the level of measurement for outcome measure(s) and IVs?

Outcomes | IVs | Statistical test
Interval/scale, 1 DV at 1 time | 1 with 2 levels | t-test or regression (interval/scale), Mann-Whitney (ordinal), χ² (categorical)
Interval/scale, 1 DV at 1 time | 1 with 2+ levels | ANOVA or regression (interval/scale), Kruskal-Wallis (ordinal), χ² (categorical)
Interval/scale, 2+ DVs at 1 time | 1+ with 2+ levels | MANOVA, multivariate linear regression, latent variable model
Interval/scale, repeated measures | 1+ with 2+ levels | Mixed-effects model (aka random effects regression; multilevel model; latent growth model), repeated measures ANOVA, survival model
Categorical DV | 1+ with 2+ levels | Logistic regression (multinomial if 2+ categories of DV), binary classification test
Count DV | 1+ with 2+ levels | Poisson or negative binomial regression models, generalized linear models

Latent Variable Models
[Path diagram: one latent variable with three observed indicators and their error terms]
• Latent variable – unobserved; unmeasured; based on relationships between observed variables
• Observed variables (x1, x2, x3) – observed; measured variables
• E1, E2, E3 – error or “residual,” not explained by shared variance

Latent variables can be continuous or
categorical; two representations of the same reality

Continuous latent variable – correlation explained by underlying factor [scatterplot of Y against X]
Ex. structural equation models, factor models, growth curve models, multilevel models

Categorical latent variable – correlation reflects difference between discrete groups on mean levels of observed variables [scatterplot of Y against X]
Ex. latent class analysis, mixture analyses, latent transition analysis, latent profile analysis

Types of latent variable models

 | Continuous latent variable | Categorical latent variable | Categorical & continuous latent variable
Cross-sectional | Factor analysis* | Latent class/profile analysis | Factor mixture model
Longitudinal | Latent growth curve (i.e., mixed effects*, multilevel, HLM) | Latent Markov model (i.e., latent transition analysis) | Growth mixture model (i.e., latent class growth, semiparametric group)

Factor Analysis
• Common tool for examining constructs and creating a measure of related constructs.
• Models should be driven by theory, guided by best-practices for model selection, evaluation.

Aim #1: Develop a multidimensional measure of alcohol treatment outcome.
Analysis plan text for factor analysis model:
“Measurement models will be estimated for each construct across studies using a moderated nonlinear factor analysis (MNLFA) approach (Bauer & Hussong, 2009). MNLFA is a novel approach for examining measure structure that allows for items of mixed scale types (i.e., binary, count, continuous) and allows parameters of the factor model to vary as a function of moderator variables (e.g., source of the data, gender).”

Mixed Effects Models
(aka linear mixed models, random effects regression models, multilevel models, hierarchical linear models, latent growth curve models)
• Mixed = “fixed” and “random” effects.
• Fixed effects – group level; the same for all individuals (or groups), with no variability across them
• Random effects – individual level; allowed to vary across individuals (or groups)

Mixed effects models in pictures: Fixed effects model
[Figure: y plotted against time (1, 2, 3) for individuals Bill, Jane, Joe, Gordan, and Sue]

Mixed effects models in pictures: Random-intercept model
[Figure: y plotted against time (1, 2, 3); path diagram with Intercept and Slope factors, indicators X1-X3 at Time 1-Time 3, and residuals ε1-ε3]

Mixed effects models in pictures: Random intercept and random slope model
[Figure: y plotted against time (1, 2, 3); path diagram with Intercept and Slope factors, indicators X1-X3 at Time 1-Time 3, and residuals ε1-ε3]

Aim #2: To examine the effect of treatment in reducing heavy drinking days among help-seeking alcohol dependent patients.
Analysis plan text for mixed effects model:
“The primary outcome measure of percent heavy drinking days assessed weekly across 14 weeks of treatment will be examined using a mixed effects model with fixed effects of treatment and random effects of time (week since randomization).”

Survival Models
• Modeling the amount of time to an event, where T is the event time and the survival function is S(t) = Pr(T > t)
  – Comparison in survival rate across groups
  – Incorporate time-invariant or time-varying covariates
• Cox Proportional Hazards Model
  – Underlying hazard function, $h(t) = -\frac{dS(t)/dt}{S(t)}$, describes how the hazard (risk) changes over time in response to explanatory covariates.

Aim #3: To examine the effect of treatment on the amount of time to first drinking or drug use lapse following release from jail.
Analysis plan text for survival model:
“We will estimate the time to first lapse using a Cox proportional hazards model, where time (t) will be evaluated weekly over the 6-month time interval.
The hazard probability for a given week (t) is estimated by the proportion of individuals under observation who are known to have not experienced any drinking or substance use lapses prior to week t that then experienced their first drinking or substance use lapse during week t, conditional on treatment group, gender, and dependence severity.”

Testing Mediation
• Purpose: To statistically test whether the association between the IV and DV is explained by some other variable(s).
• An illustration of mediation [path diagram]: a, b, and c are path coefficients. Values in parentheses are standard errors of those path coefficients.

Comparison of Mediation Tests (MacKinnon et al., 2002)
• Baron and Kenny’s approach criticized for having low power - the product of the regression coefficients α and β is often skewed and highly kurtotic
• Product of coefficients approach more powerful

$$z' = \frac{\alpha\beta}{\sqrt{\beta^{2} s_{\alpha}^{2} + \alpha^{2} s_{\beta}^{2}}}$$

• Bootstrapping to obtain a confidence interval for the indirect effect:
  – ProdClin: http://www.public.asu.edu/~davidpm/ripl/Prodclin/
  – Macros created for SPSS and SAS: http://www.afhayes.com/spss-sas-and-mplus-macros-andcode.html#sobel
  – Mplus code: www.statmodel.com
  – R code generator: http://www.quantpsy.org/medmc/medmc.htm

Designing your study to test mediation hypotheses
• Key to testing mediation is measurement
  – Temporal precedence of hypothesized variables in the causal chain
• Experimental manipulation, if possible
• Mechanism of change?
  – Mediator ≠ Mechanism
  – Other steps needed to establish mechanism, see Kazdin (2007)

What is Moderation?
• Variable that affects the direction and/or strength of the relationship between a predictor and a criterion variable
  – Categorical (e.g., males vs.
females)
  – Continuous (e.g., level of moderator)
• Designing your study to test moderation – need a larger sample size

More Complex Examples
• “Conditional indirect effects” see Preacher, Rucker, & Hayes (2007)
  – Mediated moderation
  – Moderated mediation
• Latent mediators/moderators
• Multiple mediators and/or moderators
• Mediation/moderation of growth process

Missing Data Issues and Methodology
• Design the study to minimize missing data
• Acknowledge how you will examine missing data and test missing data assumptions
• Missing data methodologies

Examining Missing Data and Missing Data Assumptions
• Missing data pattern – which values are missing and which are observed
  – Univariate missing – confined to a single variable
  – Monotone missing – once a case is missing at one assessment, it is missing at all later assessments, e.g., attrition
• Missing data mechanism – why values are missing and the association between missing values and treatment outcomes
  – Missing completely at random (MCAR)
  – Missing at random (MAR)
  – Missing not at random (MNAR)

Missing Data Methodologies
• Commonly used methods under MAR (or MCAR)
1. Complete case - discard incomplete cases
2. Imputation – fill-in missing values
   - Single imputation (e.g., mean, baseline, LOCF, BOCF)
     - Underestimate standard errors and yield biased estimates
   - Multiple imputation – takes into account uncertainty in imputations
3. Analyze incomplete data using method that does not require complete set
   - Maximum likelihood
   - Bayesian methods

Problem with single imputation methods

Analytic Methods under MNAR
• Sensitivity analyses to estimate degree of bias (see Enders, 2010)
• Pattern Mixture Models – assume that the substantive data are conditional on the missing data mechanism. The conditional model of the observed data is estimated separately for each missing data pattern.
• Selection models – assume that the missing data mechanism is conditional on the substantive data. Observed data are used to predict the probability of missing data.
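
The earlier concern that single imputation underestimates standard errors can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: a single normally distributed outcome, values deleted completely at random (MCAR) with probability 0.3, and mean imputation as the single-imputation method. The variable names, sample size, seed, and distribution parameters are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def naive_se(values):
    """Standard error of the mean, treating every value as real data."""
    return statistics.stdev(values) / math.sqrt(len(values))


rng = random.Random(51)

# Hypothetical complete outcome data for 200 participants
complete = [rng.gauss(50.0, 10.0) for _ in range(200)]

# MCAR: each value is dropped with probability 0.3, unrelated to the data
observed = [y for y in complete if rng.random() > 0.3]
n_missing = len(complete) - len(observed)

# Mean imputation: every missing value is replaced by the observed mean
imputed = observed + [statistics.fmean(observed)] * n_missing

se_complete_case = naive_se(observed)
se_mean_imputed = naive_se(imputed)

print("missing values:", n_missing)
print("SE, complete cases:", round(se_complete_case, 3))
print("SE, mean imputed:  ", round(se_mean_imputed, 3))
```

The imputed values sit exactly at the mean, so they add no spread while inflating the apparent sample size; the standard error from the mean-imputed data is therefore always smaller than the complete-case standard error, even though no new information was collected. Multiple imputation and maximum likelihood avoid this by carrying the uncertainty about the missing values into the estimate.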
Describing missing data procedures:
“For the proposed study we will use maximum likelihood estimation for all analyses, which provides the variance-covariance matrix for all available data and is a preferred method for estimation when some data are missing (Schafer & Graham, 2002). Sensitivity analyses will be used to test the influence of missing data and missing data assumptions (see Witkiewitz et al., 2012).”

Common pitfalls
1. Data analysis plan does not match rest of proposal.
   - Cannot address aims or answer hypotheses
   - Not consistent with measures/research design
2. Ignores critical issues or makes unrealistic assumptions.
   - Effect size over-estimated or not included
   - Assumes data will be normally distributed, when unlikely
   - Does not address missing data and attrition biases
   - Does not address measurement invariance when comparing across groups
3. Proposes complicated models without statistical expertise in the research team.
4. Proposes to test models without clear hypothesis or rationale.
5. Includes complex measures (e.g., time varying covariates, genetic information, imaging data) without clear description of how the measures will be analyzed or included in the analyses.

References
• Sim & Lewis (2012). The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. Journal of Clinical Epidemiology, 65, 301-308.
• Vickers, A. J. (2005). Parametric versus non-parametric statistics in the analysis of randomized trials with non-normally distributed data. BMC Medical Research Methodology, 5, 35.
• Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
• Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137-152.
• MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test the significance of the mediated effect. Psychological Methods, 7(1), 83-104.
• Preacher, K. J., Rucker, D. D., & Hayes, A. F. (2007). Addressing moderated mediation hypotheses: Theory, methods, and prescriptions. Multivariate Behavioral Research, 42, 185-227.
• Kazdin, A. E. (2007). Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology, 3, 1-27.
• National Research Council. (2010). The Prevention and Treatment of Missing Data in Clinical Trials. Panel on Handling Missing Data in Clinical Trials. Washington, DC: The National Academies Press.
• Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
• Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
• Witkiewitz, K., Bush, T., Magnusson, L. B., Carlini, B. H., & Zbikowski, S. M. (2012). Trajectories of cigarettes per day during the course of telephone tobacco cessation counseling services: A comparison of missing data models. Nicotine and Tobacco Research. doi: 10.1093/ntr/ntr291