Planning the Data Analysis: Statistical Power, Mixed Effects Linear Models, Moderator and Mediator Models and Related Issues
Katie Witkiewitz, PhD
Department of Psychology
University of New Mexico
2012 NIH Summer Institute on
Social and Behavioral Intervention Research
July 12, 2012
Ten questions…
1. What is the type of study design?
   - Number of treatment groups/arms
   - Randomization procedures
   - Any clustering?
2. How many subjects can be recruited/observed in a study period?
3. Will there be an equal number of participants in each group?
4. What are the research hypotheses?
5. What do you hope to achieve or learn from each aim?
6. What are the primary outcome measures?
7. What types of variables will be included?
   - Unit of measurement for DVs, IVs, covariates
   - Issues with measures (e.g., skewness, counts, inflated zeroes)
8. How many total evaluations and measurements?
9. For repeated measurements, what is the measurement interval?
10. What kinds of missing data patterns do you expect and how much?
Other questions depend on design
• Qualitative research
– If focus groups, how many groups and group composition?
– How will data be transcribed and coded?
– Which software, if any, do you plan to use?
– What are the research questions?
– Are you comparing? Aggregating? Contrasting? Sorting?
• Secondary data analyses
– What is the available data?
– Return to questions #4-10
Components of the data analysis plan
• Study design
– Brief overview of design
– Sampling plan
– Randomization plan, if applicable
– Precision/power analysis and sample size
• Data management
• Statistical analyses
– Proposed analyses for each primary aim
– Secondary or exploratory analyses
– Interim analyses, if applicable
– Missing data methodology
Study design issues for data analysis
• Design issues specific to data analyses
– Sampling procedures
• Stratified sampling
• Clustering
– Randomization procedures
– # of treatment conditions
• Realistically achievable n within each condition
• Allocation ratio
– Measurement
• Unit of measurement for outcome(s) and any covariates
• # of measures/constructs
• Duration and intervals of assessment
– Sample size
What sample size do I need?
• How many subjects can you realistically recruit?
• Precision – how precise do you want to be in measuring the effect?
• Power – powering your study to detect a significant effect
Precision analyses for pilot studies
• Sim & Lewis (2012) recommend n>50
• Precision increases with greater n
• In order to carry out any precision-based sample size calculation you need:
– Width of the confidence interval (e.g., 95%) – a narrower interval = a more precise estimate
– Formula for the relevant standard error
Effect of Sample Size on Precision
• Estimating a percentage of 30%

Sample size | 95% CI
50          | ± 12.7% (17.3% – 42.7%)
100         | ± 9.0% (21% – 39%)
500         | ± 4.0% (26% – 34%)
1000        | ± 2.8% (27.2% – 32.8%)
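As a quick check of the table above, here is a minimal sketch that reproduces these half-widths using the normal-approximation confidence interval for a proportion (the function name and the 1.96 critical value are illustrative choices, not part of the original slides):

```python
from math import sqrt

def proportion_ci(p, n, z=1.96):
    """Normal-approximation CI half-width and limits for a proportion."""
    half_width = z * sqrt(p * (1 - p) / n)
    return half_width, p - half_width, p + half_width

for n in (50, 100, 500, 1000):
    hw, lo, hi = proportion_ci(0.30, n)
    print(f"n={n:4d}: +/- {hw:.1%}  ({lo:.1%} - {hi:.1%})")
```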
Precision Analysis
• The sample size required to achieve the desired maximum error E can be chosen as

$n = \dfrac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$

• Example: suppose that we wish to have a 95% assurance that the error in the estimated mean is less than 10% of the standard deviation σ (i.e., E = 0.1σ). The required sample size is

$n = \dfrac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \dfrac{(1.96)^{2}\,\sigma^{2}}{(0.1\sigma)^{2}} = 384.2 \rightarrow 385$
Precision Analysis
• Suppose that we wish to have a 95% assurance that the error in the estimated mean is less than 25% of the standard deviation σ (i.e., E = 0.25σ). The required sample size is

$n = \dfrac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \dfrac{(1.96)^{2}\,\sigma^{2}}{(0.25\sigma)^{2}} = 61.5 \rightarrow 62$
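A minimal sketch of the same calculation, with the maximum error expressed as a fraction of σ (the function name and the scipy dependency are illustrative assumptions):

```python
import math
from scipy.stats import norm

def n_for_precision(error_frac_of_sd, confidence=0.95):
    """Required n so that the error in an estimated mean stays below
    error_frac_of_sd * sigma with the stated confidence."""
    z = norm.ppf(1 - (1 - confidence) / 2)   # z_{1-alpha/2}
    return math.ceil(z ** 2 / error_frac_of_sd ** 2)

print(n_for_precision(0.10))  # 385, matching the 10% example
print(n_for_precision(0.25))  # 62, matching the 25% example
```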
Insert Successful, Carefully Controlled Pilot Study Here
I have estimates of the effect from my pilot study, now what do I do?
Effect size has been defined in various ways
• Cohen (1988): “degree to which the phenomenon is present in the
population” (p. 9).
• Kelley & Preacher (2012): “a quantitative reflection of the
magnitude of some phenomenon that is used for the purpose of
addressing a question of interest.” (p. 140).
• Context of social and behavioral interventions: magnitude of the
detectable, minimally expected difference between intervention
conditions on outcome of interest.
– Standardized difference between two means:

$\text{Cohen's } d = \dfrac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{S_{\text{pooled}}}$
Importance of variability in the achieved power of an intervention
Do achieved effect sizes from a prior
study = the probable effect size in a
subsequent study?
Study 1 – Pilot trial:

$d = \dfrac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{S_{\text{pooled}}} = \dfrac{10.5 - 9.5}{0.5} = 2.0$

Study 2 – Main RCT:

$d = \dfrac{10.5 - 9.5}{1.5} = 0.67$
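A minimal sketch of the pooled-SD Cohen's d; the closing arithmetic simply restates the slide's numbers, while the function name and its use on raw data are illustrative assumptions:

```python
import numpy as np

def cohens_d(treatment, control):
    """Cohen's d using the pooled standard deviation."""
    t, c = np.asarray(treatment, float), np.asarray(control, float)
    n1, n2 = len(t), len(c)
    pooled_sd = np.sqrt(((n1 - 1) * t.var(ddof=1) + (n2 - 1) * c.var(ddof=1))
                        / (n1 + n2 - 2))
    return (t.mean() - c.mean()) / pooled_sd

# The slide's point in one line: the same 1-point difference in means gives
# d = 2.0 when the pooled SD is 0.5 (pilot) but d = 0.67 when it is 1.5 (RCT)
print((10.5 - 9.5) / 0.5, (10.5 - 9.5) / 1.5)
```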
Can I adjust my SD from my pilot
to estimate SD for RCT?
• If you have pilot data, then you can
use the precision analysis logic to
“inflate” your observed pilot SD.
• Sim & Lewis (2012) provide formulas
and calculations
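Sim & Lewis (2012) give the exact formulas and worked calculations; purely as an illustration of the inflation logic (this chi-square upper confidence limit is an assumption on my part, not necessarily their formula), one conservative planning value is the upper one-sided confidence limit of the pilot SD:

```python
import numpy as np
from scipy.stats import chi2

def inflated_sd(pilot_sd, pilot_n, assurance=0.80):
    """Upper one-sided confidence limit for sigma, used as a conservative
    planning SD in place of the raw pilot SD (illustrative approach only)."""
    df = pilot_n - 1
    return pilot_sd * np.sqrt(df / chi2.ppf(1 - assurance, df))

print(inflated_sd(pilot_sd=1.5, pilot_n=30))  # somewhat larger than 1.5
```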
Statistical power and sample size
• How many subjects do you need to have sufficient
statistical power to detect the hypothesized effect?
– What level of power is desirable for your design?
– What level of statistical significance?
– What are your statistical tests?
– What effect size do you expect to detect?
– What is the probable variation in the sample?
Step-by-step approach to
calculating effect sizes.
• Decide on the question of interest.
– Group differences (Cohen's d, f²; Hedges' g)
– Strength of association (r, partial r, β, η²)
– Difference in proportion (risk ratio (RR), odds
ratio (OR), number needed to treat (NNT))
• Examine prior studies in existing literature to
obtain estimates of parameters in the effect
size equation.
• Find the appropriate formula (or an online
calculator) and input the parameters.
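For the difference-in-proportions family, a minimal sketch computing the risk ratio, odds ratio, and number needed to treat from two event rates (the 20% vs. 35% rates are made-up numbers for illustration only):

```python
def proportion_effect_sizes(p_treatment, p_control):
    """Risk ratio, odds ratio, and number needed to treat from two proportions."""
    rr = p_treatment / p_control
    or_ = (p_treatment / (1 - p_treatment)) / (p_control / (1 - p_control))
    nnt = 1 / abs(p_treatment - p_control)
    return rr, or_, nnt

# Illustrative rates only: 20% relapse with treatment vs. 35% in control
print(proportion_effect_sizes(0.20, 0.35))
```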
Online effect size calculators
• www.campbellcollaboration.org/resources/
• www.danielsoper.com/statcalc3/
• www.divms.uiowa.edu/~rlenth/Power/
• http://statpages.org/#Power
• + many more…
Step-by-step approach to power
analysis.
• Decide on α and β
• Using the effect size (or a range of effect sizes), estimate how many subjects are needed to achieve the desired power (1 − β), given α, for a particular test.
• Use existing software or simulation study.
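A minimal sketch using statsmodels' power routines for an independent-samples t-test; the d = 0.5, α = .05, and power = .80 targets are illustrative choices:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(round(n_per_group))  # about 64 per group
```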
Software for estimating power
• Within statistics programs:
– SAS, R, Stata, SPSS – SamplePower
• Stand-alone and online programs:
– http://statpages.org/#Power
– www.divms.uiowa.edu/~rlenth/Power/
– Optimal Design http://sitemaker.umich.edu/group-based/home
– G*Power – available free at http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/
– PASS –http://www.ncss.com/pass.html
– Power & Precision –http://www.power-analysis.com/
– nQuery – www.statistical-solutions-software.com
Simulation studies for estimating
power
• Generate many datasets that mimic the design, with sample size n and the hypothesized parameter values (i.e., a set effect size) as the data-generating values; fit the planned model to each dataset and take the proportion of significant results as the estimated power (see the sketch below)
– SAS, Mplus, R, Stata, Matlab
• More complicated models make it harder to specify plausible parameter estimates
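A minimal sketch of that simulation logic for a simple two-group comparison (the parameter values, seed, and t-test analysis are all illustrative choices):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2012)

def simulated_power(n_per_group, effect_size, n_sims=5000, alpha=0.05):
    """Proportion of simulated trials with p < alpha (estimated power)."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(effect_size, 1.0, n_per_group)
        if ttest_ind(treatment, control).pvalue < alpha:
            hits += 1
    return hits / n_sims

print(simulated_power(n_per_group=64, effect_size=0.5))  # roughly 0.80
```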
Data management
• Briefly describe your plan for data entry
and management.
• Overview of preliminary data checking.
– Check distributions of primary measures
– Examine randomization failures
– Test for systematic attrition biases
• Clarify how you will handle data issues
that may arise.
Statistical analyses
– Types of Analyses
– Proposed analyses for each primary aim
– Secondary or exploratory analyses
• Mediation models
• Moderation models
• Mediated moderation and moderated mediation
• Multiple mediator models
– Missing data methodology
Types of Analyses
• Intent-to-treat – includes all randomized subjects,
whether or not the subjects were compliant or
completed the study
• Full analysis set – excludes those who are missing all data because they completed no assessments after randomization; may also exclude those who never took a single treatment dose
• Per protocol – the subset of subjects who complied with the protocol, to ensure that the data are likely to exhibit the effects of the treatment according to the underlying scientific model
Which statistical test?
• What are your research hypotheses?
• What is the level of measurement for outcome measure(s) and IVs?
Outcomes | IVs | Statistical test
Interval/scale, 1 DV at 1 time | 1 with 2 levels | t-test or regression (interval/scale), Mann–Whitney (ordinal), χ² (categorical)
Interval/scale, 1 DV at 1 time | 1 with 2+ levels | ANOVA or regression (interval/scale), Kruskal–Wallis (ordinal), χ² (categorical)
Interval/scale, 2+ DVs at 1 time | 1+ with 2+ levels | MANOVA, multivariate linear regression, latent variable model
Interval/scale, repeated measures | 1+ with 2+ levels | Mixed-effects model (aka random effects regression, multilevel model, latent growth model), repeated measures ANOVA, survival model
Categorical DV | 1+ with 2+ levels | Logistic regression (multinomial if 2+ categories of DV), binary classification test
Count DV | 1+ with 2+ levels | Poisson or negative binomial regression models, generalized linear models
Latent Variable Models
• Latent variable – unobserved and unmeasured; inferred from the relationships between the observed variables
• Observed (measured) variables – e.g., observed variable x1, x2, x3
• E1, E2, E3 – error or "residual" variance in each observed variable, not explained by the shared variance
Latent variables can be continuous or categorical;
two representations of the same reality
Continuous latent variable – the correlation between observed variables (X, Y) is explained by an underlying factor
  Ex. structural equation models, factor models, growth curve models, multilevel models

Categorical latent variable – the correlation reflects differences between discrete groups in the mean levels of the observed variables
  Ex. latent class analysis, mixture analyses, latent transition analysis, latent profile analysis
Types of latent variable models
Design | Continuous latent variable | Categorical latent variable | Categorical & continuous latent variable
Cross-sectional | Factor analysis* | Latent class/profile analysis | Factor mixture model
Longitudinal | Latent growth curve (i.e., mixed effects*, multilevel, HLM) | Latent Markov model (i.e., latent transition analysis) | Growth mixture model (i.e., latent class growth, semi-parametric group)
Factor Analysis
• A common tool for examining constructs and creating a measure of related constructs.
• Models should be driven by theory and guided by best practices for model selection and evaluation.
Aim #1: Develop a multidimensional
measure of alcohol treatment outcome.
Analysis plan text for factor analysis model:
“Measurement models will be estimated for each
construct across studies using a moderated nonlinear
factor analysis (MNLFA) approach (Bauer & Hussong,
2009). MNLFA is a novel approach for examining
measure structure that allows for items of mixed scale
types (i.e., binary, count, continuous) and allows
parameters of the factor model to vary as a function of
moderator variables (e.g., source of the data, gender).”
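MNLFA itself is typically estimated in specialized software (e.g., Mplus); as a much simpler stand-in that only illustrates the basic measurement-model idea, here is an ordinary one-factor analysis (not the moderated nonlinear model) on made-up item data:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Illustrative data: four items driven by one latent construct
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
items = pd.DataFrame({f"item{i}": 0.7 * latent + rng.normal(scale=0.5, size=500)
                      for i in range(1, 5)})

fa = FactorAnalysis(n_components=1).fit(items)
print(pd.Series(fa.components_[0], index=items.columns))  # factor loadings
```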
Mixed Effects Models
(aka linear mixed models, random effects regression models, multilevel
models, hierarchical linear models, latent growth curve models)
• Mixed = “fixed” and “random” effects.
• Fixed effects – group-level (average) effects that do not vary across individuals (or groups)
• Random effects – individual-level effects that vary across individuals (or groups)
Mixed effects models in pictures: Fixed effects model
[Figure: y plotted across time points 1–3 for five individuals (Bill, Jane, Joe, Gordan, Sue).]
Mixed effects models in pictures: Random-intercept model
[Figure: y across time points 1–3, with the corresponding path diagram: Intercept and Slope factors loading on the repeated measures X1–X3 (Times 1–3), each with a residual ε1–ε3.]
Mixed effects models in pictures: Random intercept and
random slope model
[Figure: same layout as above – y across time points 1–3 and the path diagram of Intercept and Slope factors on X1–X3 with residuals ε1–ε3 – now allowing both intercepts and slopes to vary across individuals.]
Aim #2: To examine the effect of treatment
in reducing heavy drinking days among
help-seeking alcohol dependent patients.
Analysis plan text for mixed effects model:
“The primary outcome measure of percent heavy
drinking days assessed weekly across 14 weeks
of treatment will be examined using a mixed
effects model with fixed effects of treatment and
random effects of time (week since
randomization).”
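A minimal sketch of a model like the one described, using statsmodels' MixedLM formula interface; the synthetic data, column names, and the treatment-by-week specification are illustrative assumptions rather than the study's actual model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative synthetic data: 100 subjects x 14 weekly assessments
rng = np.random.default_rng(1)
n, weeks = 100, 14
df = pd.DataFrame({
    "subject_id": np.repeat(np.arange(n), weeks),
    "week": np.tile(np.arange(weeks), n),
    "treatment": np.repeat(rng.integers(0, 2, n), weeks),
})
subj_intercept = np.repeat(rng.normal(0, 5, n), weeks)
subj_slope = np.repeat(rng.normal(0, 0.5, n), weeks)
df["pct_heavy_days"] = (50 + subj_intercept
                        + (subj_slope - 1.0 - 0.8 * df["treatment"]) * df["week"]
                        + rng.normal(0, 5, len(df)))

# Fixed effects of treatment and week; random intercept and slope over weeks
model = smf.mixedlm("pct_heavy_days ~ treatment * week", data=df,
                    groups=df["subject_id"], re_formula="~week")
print(model.fit().summary())
```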
Survival Models
• Modeling the amount of time (t) to an event (T), where the survival function S(t) = Pr(T > t)
– Comparison of survival rates across groups
– Incorporate time-invariant or time-varying covariates
• Cox Proportional Hazards Model
– The underlying hazard function, h(t) = −S′(t)/S(t), describes how the hazard (risk) changes over time in response to explanatory covariates
Aim #3: To examine the effect of treatment
on the amount of time to first drinking or
drug use lapse following release from jail.
Analysis plan text for survival model:
“We will estimate the time to first lapse using a Cox
proportional hazards model, where time (t) will be
evaluated weekly over the 6-month time interval.
The hazard probability for a given week (t) is
estimated by the proportion of individuals under
observation who are known to have not experienced
any drinking or substance use lapses prior to week t
that then experienced their first drinking or
substance use lapse during week t, conditional on
treatment group, gender, and dependence severity.”
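A minimal sketch of a Cox proportional hazards fit with the lifelines package; the synthetic data and the column names (weeks_to_lapse, lapsed, treatment, gender, severity) are illustrative assumptions, not the study's actual variables:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Illustrative synthetic data: time to first lapse over a 26-week window
rng = np.random.default_rng(7)
n = 200
treatment = rng.integers(0, 2, n)
gender = rng.integers(0, 2, n)
severity = rng.integers(1, 8, n)
true_time = rng.exponential(scale=8 + 6 * treatment - 0.5 * severity)
observed = np.minimum(true_time, 26)
df = pd.DataFrame({
    "weeks_to_lapse": np.ceil(observed),
    "lapsed": (true_time <= 26).astype(int),   # 0 = censored at 6 months
    "treatment": treatment, "gender": gender, "severity": severity,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks_to_lapse", event_col="lapsed")
cph.print_summary()   # hazard ratios for treatment, gender, severity
```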
Testing Mediation
• Purpose: To statistically test whether the
association between the IV and DV is explained
by some other variable(s).
• An illustration of mediation (path diagram): a, b, and c are path coefficients; values in parentheses are the standard errors of those path coefficients.
Comparison of Mediation Tests
(MacKinnon et al., 2002)
• Baron and Kenny's approach has been criticized for having low power; the product of the regression coefficients α and β is often skewed and highly kurtotic
• The product of coefficients approach is more powerful:

$z' = \dfrac{\alpha\beta}{\sqrt{\beta^{2} s_{\alpha}^{2} + \alpha^{2} s_{\beta}^{2}}}$

• Bootstrapping to obtain a confidence interval for the indirect effect:
– ProdClin: http://www.public.asu.edu/~davidpm/ripl/Prodclin/
– Macros created for SPSS and SAS: http://www.afhayes.com/spss-sas-and-mplus-macros-and-code.html#sobel
– Mplus code: www.statmodel.com
– R code generator:
http://www.quantpsy.org/medmc/medmc.htm
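A minimal sketch of a percentile-bootstrap confidence interval for the indirect effect a·b, using ordinary least squares; the variable names (x = treatment, m = mediator, y = outcome) and the simulated data are illustrative only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 300
x = rng.integers(0, 2, n).astype(float)        # treatment (IV)
m = 0.5 * x + rng.normal(size=n)               # mediator
y = 0.4 * m + 0.1 * x + rng.normal(size=n)     # outcome (DV)

def indirect_effect(x, m, y):
    a = sm.OLS(m, sm.add_constant(x)).fit().params[1]                        # x -> m
    b = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit().params[1]  # m -> y given x
    return a * b

boot = []
for _ in range(2000):
    s = rng.choice(n, size=n, replace=True)
    boot.append(indirect_effect(x[s], m[s], y[s]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"a*b = {indirect_effect(x, m, y):.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```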
Designing your study to test
mediation hypotheses
• Key to testing mediation is measurement
– Temporal precedence of hypothesized
variables in the causal chain
• Experimental manipulation, if possible
• Mechanism of change?
– Mediator ≠ Mechanism
– Other steps are needed to establish mechanism; see Kazdin (2007)
What is Moderation?
• Variable that affects the direction and/or
strength of the relationship between a
predictor and a criterion variable
– Categorical (e.g., males vs. females)
– Continuous (e.g., level of moderator)
• Designing your study to test moderation – you will need a larger sample size, because interaction effects are typically smaller and harder to detect (see the sketch below)
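Moderation is commonly tested as an interaction term in a regression model; a minimal sketch with illustrative variable names and simulated data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "severity": rng.normal(size=n),   # continuous moderator (illustrative)
})
# Build in a treatment effect that grows with severity
df["outcome"] = (0.3 * df["treatment"] + 0.2 * df["severity"]
                 + 0.5 * df["treatment"] * df["severity"] + rng.normal(size=n))

# The treatment:severity coefficient is the test of moderation
fit = smf.ols("outcome ~ treatment * severity", data=df).fit()
print(fit.params)
```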
More Complex Examples
• “Conditional indirect effects” see Preacher,
Rucker, & Hayes (2007)
– Mediated moderation
– Moderated mediation
• Latent mediators/moderators
• Multiple mediators and/or moderators
• Mediation/moderation of growth process
Missing Data Issues and
Methodology
• Design the study to minimize missing data
• Acknowledge how you will examine missing data and test missing data assumptions
• Missing data methodologies
Examining Missing Data and Missing
Data Assumptions
• Missing data pattern – which values are
missing and which are observed
– Univariate missing – confined to a single variable
– Monotone missing – once a value is missing, all subsequent values are missing for that case (e.g., attrition)
• Missing data mechanism – why values are
missing and the association between missing
values and treatment outcomes
– Missing completely at random (MCAR)
– Missing at random (MAR)
– Missing not at random (MNAR)
Missing Data Methodologies
• Commonly used methods under MAR (or MCAR):
1. Complete case – discard incomplete cases
2. Imputation – fill in missing values
   - Single imputation (e.g., mean, baseline, LOCF, BOCF) – underestimates standard errors and yields biased estimates
   - Multiple imputation – takes into account uncertainty in the imputations
3. Analyze the incomplete data using a method that does not require a complete set
   - Maximum likelihood
   - Bayesian methods
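A minimal sketch of the multiple imputation option using statsmodels' MICE (chained equations); the simulated data, column names, and the OLS analysis model are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation.mice import MICE, MICEData

# Illustrative data with roughly 20% of outcomes missing
rng = np.random.default_rng(5)
n = 300
df = pd.DataFrame({"treatment": rng.integers(0, 2, n).astype(float),
                   "baseline": rng.normal(size=n)})
df["outcome"] = 0.5 * df["treatment"] + 0.3 * df["baseline"] + rng.normal(size=n)
df.loc[rng.random(n) < 0.2, "outcome"] = np.nan

imp = MICEData(df)                                      # chained-equations imputation
analysis = MICE("outcome ~ treatment + baseline", sm.OLS, imp)
results = analysis.fit(n_burnin=10, n_imputations=20)   # pool across 20 imputed datasets
print(results.summary())
```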
Problem with single imputation methods
Analytic Methods under MNAR
• Sensitivity analyses to estimate degree of bias (see
Enders, 2010)
• Pattern Mixture Models – assume that the substantive
data are conditional on the missing data mechanism. The
conditional model of the observed data is estimated
separately for each missing data pattern.
• Selection models – assume that the missing data
mechanism is conditional on the substantive data.
Observed data are used to predict the probability of
missing data.
Describing missing data procedures:
“For the proposed study we will use maximum
likelihood estimation for all analyses, which
provides the variance-covariance matrix for all
available data and is a preferred method for
estimation when some data are missing (Schafer &
Graham, 2002). Sensitivity analyses will be used
to test the influence of missing data and missing
data assumptions (see Witkiewitz et al., 2012).”
Common pitfalls
1. Data analysis plan does not match the rest of the proposal.
   - Cannot address the aims or answer the hypotheses
   - Not consistent with the measures/research design
2. Ignores critical issues or makes unrealistic assumptions.
   - Effect size over-estimated or not included
   - Assumes data will be normally distributed, when unlikely
   - Does not address missing data and attrition biases
   - Assumes measurement invariance when comparing across groups
3. Proposes complicated models without statistical expertise on the research team.
4. Proposes to test models without a clear hypothesis or rationale.
5. Includes complex measures (e.g., time-varying covariates, genetic information, imaging data) without a clear description of how the measures will be analyzed or included in the analyses.
References
• Sim, J., & Lewis, M. (2012). The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. Journal of Clinical Epidemiology, 65, 301-308.
• Vickers, A. J. (2005). Parametric versus non-parametric statistics in the analysis of randomized trials with non-normally distributed data. BMC Medical Research Methodology, 5, 35.
• Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
• Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137-152.
• MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test the significance of the mediated effect. Psychological Methods, 7(1), 83-104.
• Preacher, K. J., Rucker, D. D., & Hayes, A. F. (2007). Addressing moderated mediation hypotheses: Theory, methods, and prescriptions. Multivariate Behavioral Research, 42, 185-227.
• Kazdin, A. E. (2007). Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology, 3, 1-27.
• National Research Council. (2010). The prevention and treatment of missing data in clinical trials. Panel on Handling Missing Data in Clinical Trials. Washington, DC: The National Academies Press.
• Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
• Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
• Witkiewitz, K., Bush, T., Magnusson, L. B., Carlini, B. H., & Zbikowski, S. M. (2012). Trajectories of cigarettes per day during the course of telephone tobacco cessation counseling services: A comparison of missing data models. Nicotine and Tobacco Research. doi:10.1093/ntr/ntr291