Statistical Analysis of Single Case Design
Serial Dependence Is More than Needing Cheerios for Breakfast
Goal of Presentation
• Review the concept of effect size
• Describe issues in using the effect size concept for single case design
• Describe different traditional approaches to calculating effect size for single case design
• Illustrate one recent approach
Behavioral Intervention Research
Putting a Coat on Your Dog Before Going for a Walk
How We Have Gotten To This Meeting
• Long history of statistical analysis of SCD
• Criticism of the quality of educational research (Shavelson & Towne, 2002)
• Establishment of IES
– Initial resistance to SCD
• Influence of professional groups
• IES willingness to fund research on statistical analysis
Concept of Effect Size of Study
• Effect size is a statistic quantifying the extent to which sample statistics diverge from the null hypothesis (Thompson, 2006)
Types of ESs for Group Design
• Glass's Δ = (Me – Mc) / SDc
• Cohen's d = (Me – Mc) / SDpooled
• Interpretation
– Small = .20
– Medium = .50
– Large = .80
• R²
• η² = SSeffect / SStotal
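As a quick illustration of the two standardized mean difference formulas above, here is a minimal sketch with invented group scores; the variable names and data are illustrative only, not from any study discussed here.

```python
# Minimal sketch: Glass's delta and Cohen's d for two hypothetical groups.
from statistics import mean, stdev

def glass_delta(experimental, control):
    """(Me - Mc) / SDc: standardizes the mean difference by the control-group SD."""
    return (mean(experimental) - mean(control)) / stdev(control)

def cohens_d(experimental, control):
    """(Me - Mc) / SDpooled: standardizes the mean difference by the pooled SD."""
    ne, nc = len(experimental), len(control)
    pooled_var = ((ne - 1) * stdev(experimental) ** 2 +
                  (nc - 1) * stdev(control) ** 2) / (ne + nc - 2)
    return (mean(experimental) - mean(control)) / pooled_var ** 0.5

treatment = [12, 15, 14, 17, 16, 18]   # hypothetical experimental-group scores
control   = [10, 11, 12, 13, 11, 12]   # hypothetical control-group scores
print(round(glass_delta(treatment, control), 2))
print(round(cohens_d(treatment, control), 2))
```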
Statistical Analysis Antithetical to Single Case Design (SCD)?
• Original developers believed that socially important treatment effects have to be large enough to be reliably detected by visual inspection of the data.
Kazdin (2011) proposes
• Visual inspection is less trustworthy when effects are not crystal clear
• Serial dependence may obscure visual analysis
• Detection of small effects may lead to understanding that could in turn lead to large effects
• Statistical analysis may generate ESs that allow one to answer more precise questions
– Effects for different types of individuals
– Experimenter effects
Example: PRT and Meta-Analysis (Shadish, 2012)
• Pivotal Response Training (PRT) for childhood autism
• 18 studies containing 91 SCDs
• For this example, to meet the assumptions of the method, the preliminary analysis:
– Used only the 14 studies with at least 3 cases (66 SCDs)
– Kept only the first baseline and PRT treatment phases, eliminating studies with no baseline
• After computing 14 effect sizes (one for each study), he used standard random-effects meta-analytic methods to summarize results:
Results

Distribution Description
  N = 14     Min ES = .181     Max ES = 2.087     Weighted SD = .374

Fixed & Random Effects Model
            Mean ES   -95% CI   +95% CI     SE        Z        p
  Fixed      .4878     .3719     .6037     .0591    8.2485   .0000
  Random     .6630     .4257     .9002     .1210    5.4774   .0000

Random Effects Variance Component:  v = .112554
Homogeneity Analysis:  Q = 39.9398, df = 13, p = .0001
Random effects v estimated via noniterative method of moments.
I² = 67.5%
The results are of the order of magnitude that we commonly see in meta-analyses of between-groups studies.
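As a minimal sketch of how such fixed- and random-effects summaries are produced (this is not Shadish's code), the function below combines study-level effect sizes with inverse-variance weights and estimates the between-study variance with the DerSimonian-Laird method of moments noted above; the effect sizes and variances are invented for illustration.

```python
# Minimal sketch: fixed- and random-effects meta-analysis of k effect sizes.

def meta_analyze(es, var):
    k = len(es)
    w = [1 / v for v in var]                              # fixed-effect weights
    fixed = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    Q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, es))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)                    # between-study variance (method of moments)
    w_re = [1 / (v + tau2) for v in var]                  # random-effects weights
    random = sum(wi * ei for wi, ei in zip(w_re, es)) / sum(w_re)
    i2 = max(0.0, (Q - (k - 1)) / Q) * 100 if Q > 0 else 0.0
    return fixed, random, tau2, Q, i2

es  = [0.2, 0.5, 0.9, 0.4, 1.3]      # hypothetical study-level effect sizes
var = [0.04, 0.02, 0.06, 0.03, 0.08] # hypothetical sampling variances
print(meta_analyze(es, var))
```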
Studies done at UCSB (= 0) or elsewhere (= 1)

Analog ANOVA table (Homogeneity Q)
                Q        df       p
  Between     3.8550      1     .0496
  Within     16.8138     12     .1567
  Total      20.6688     13     .0797

Q by Group
  Group        Qw       df       p
  0           1.9192     3     .5894
  1          14.8947     9     .0939

Effect Size Results, Total
           Mean ES     SE     -95% CI   +95% CI      Z        p      k
  Total     .6197     .0980    .4277     .8118     6.3253   .0000    14

Effect Size Results by Group
  Group    Mean ES     SE     -95% CI   +95% CI      Z        p      k
  0        1.0228     .2275    .5769    1.4686     4.4965   .0000     4
  1         .5279     .1086    .3151     .7407     4.8627   .0000    10

Maximum Likelihood Random Effects Variance Component:  v = .05453, se(v) = .04455

Of course, we have no idea why studies done at UCSB produce larger effects:
• different kinds of patients?
• different kinds of outcomes?
But the analysis does illustrate one way to explore heterogeneity.
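The between/within partition of Q in the analog ANOVA table above can be computed directly from the weighted effect sizes. Here is a minimal sketch of that partition; the effect sizes, variances, and the 0/1 site indicator are invented for illustration and are not the PRT data.

```python
# Minimal sketch: homogeneity Q partitioned into between- and within-group parts.

def q_statistic(es, var):
    w = [1 / v for v in var]
    mean_es = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    return sum(wi * (ei - mean_es) ** 2 for wi, ei in zip(w, es))

es    = [1.1, 0.9, 1.3, 0.4, 0.5, 0.6, 0.3]    # hypothetical study-level d's
var   = [0.05, 0.06, 0.04, 0.03, 0.05, 0.04, 0.06]
group = [0, 0, 0, 1, 1, 1, 1]                  # hypothetical site indicator

q_total = q_statistic(es, var)
q_within = sum(
    q_statistic([e for e, g in zip(es, group) if g == lvl],
                [v for v, g in zip(var, group) if g == lvl])
    for lvl in set(group))
q_between = q_total - q_within   # compared to chi-square with (groups - 1) df
print(q_between, q_within, q_total)
```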
Search for the Holy Grail of Effect Size Estimators
• No single approach agreed upon (40+ have been identified; Swaminathan et al., 2008)
• Classes of approaches
– Computational approaches
– Randomization tests
– Regression approaches
– Tau-U (Parker et al., 2011) as a combined approach
Computational Approaches
• Percentage of Nonoverlapping Data Points (PND) (Scruggs, Mastropieri, & Casto, 1987) (see the sketch below)
• Percentage of Zero Data (Campbell, 2004)
• Improvement Rate Difference (Parker, Vannest, & Brown, 2009)
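As a concrete illustration of the simplest of these computational indices, here is a minimal sketch of PND for one AB comparison. It assumes higher scores mean improvement; the data are invented.

```python
# Minimal sketch: Percentage of Nonoverlapping Data (PND) for one AB comparison.

def pnd(baseline, treatment):
    """Percent of treatment-phase points exceeding the highest baseline point."""
    ceiling = max(baseline)
    above = sum(1 for x in treatment if x > ceiling)
    return 100.0 * above / len(treatment)

baseline  = [2, 3, 2, 4, 3]       # hypothetical A-phase data
treatment = [5, 4, 6, 7, 6, 8]    # hypothetical B-phase data
print(round(pnd(baseline, treatment), 1))   # 83.3: 5 of 6 treatment points nonoverlapping
```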
[Figure: ABAB design example. Evaluate for level and trend; level of experimental control rated from 1 (no experimental control) to 7 (strong experimental control), with 5–6 considered publishable.]
Problem with phases
Randomization Test
• Edgington (1975, 1980) advocated strongly for the use of nonparametric randomization tests (see the sketch below)
– Involves selection of comparison points in the baseline and treatment conditions
– Requires a random start day for participants (could be random assignment of participants in an MB design; Wampold & Worsham, 1986)
• Criticized for SCD
– Large Type I error rate (Haardofer & Gagne, 2010)
– Not robust to the independence assumption, and sensitivity is low for data series of fewer than 30 to 40 datapoints (Manolov & Solanas, 2009)
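A minimal sketch of the basic idea behind an intervention start-point randomization test for a single AB series follows. The observed mean shift at the actual start point is compared with the shifts from every other eligible start point, as if the start had been chosen at random. The data, minimum phase length, and start point are invented, and Edgington's published procedures differ in detail.

```python
# Minimal sketch: randomization test on the intervention start point of one AB series.
from statistics import mean

def randomization_test(series, actual_start, min_phase=3):
    def shift(start):
        return mean(series[start:]) - mean(series[:start])
    observed = shift(actual_start)
    starts = range(min_phase, len(series) - min_phase + 1)   # all eligible start points
    shifts = [shift(s) for s in starts]
    # one-tailed p: proportion of eligible start points with a shift at least as large
    p = sum(1 for s in shifts if s >= observed) / len(shifts)
    return observed, p

data = [2, 3, 2, 3, 4, 6, 7, 6, 8, 7, 8]   # hypothetical AB series
print(randomization_test(data, actual_start=5))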
Regression (Least Squares) Approaches
• ITSACORR (Crosbie, 1993)
– Interrupted time series analysis
– Criticized for not being correlated with other methods
• White, Rusch, Kazdin, & Hartmann (1989): Last Day of Treatment (LDT) comparison
– Compares the two LDTs for baseline and treatment
– Power is weak because of the lengthy predictions involved
Regression Analyses
• Mean shift and mean-plus-trend models (Center, Skiba, & Casey, 1985-86)
• Ordinary least squares regression analysis (Allison & Gorman, 1993)
– Both approaches attempt to control for trends in baseline when examining performance in treatment (see the sketch below)
• d-estimator (Shadish, Hedges, Rinscoff, 2012)
• GLS with removal of autocorrelation (Swaminathan, Horner, Rogers, & Sugai, 2012)
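The sketch below is in the spirit of these mean-shift-plus-trend regressions, not any author's exact model: the outcome is regressed on time, a phase dummy, and a phase-by-time interaction, so the level and trend changes at intervention are estimated after accounting for any baseline trend. The data, baseline length, and variable names are invented.

```python
# Minimal sketch: piecewise OLS regression for one AB series (level shift + trend change).
import numpy as np

y = np.array([2, 3, 2, 3, 4, 6, 7, 6, 8, 7, 8], dtype=float)  # hypothetical AB series
n_a = 5                                                        # hypothetical baseline length
t = np.arange(len(y), dtype=float)
phase = (t >= n_a).astype(float)                               # 0 = A phase, 1 = B phase
t_in_b = np.where(phase == 1, t - n_a, 0.0)                    # time counted within B phase

X = np.column_stack([np.ones_like(t), t, phase, t_in_b])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta = [intercept, baseline trend, level change at intervention, trend change]
print(dict(zip(["intercept", "trend_A", "level_shift", "trend_change"], beta.round(3))))
```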
Tau-U
(Parker, Vannest, Davis, & Sauber, 2011)
• Mann-Whitney U is a nonparametric test that compares individual data points across phases (AB comparisons)
• Kendall's Tau does the same thing for trend within a phase
• Tau-U (see the sketch below)
– Tests and controls for trend in the A phase
– Tests for differences between the A and B phases
– Tests and adjusts for trend in the B phase
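Here is a minimal sketch of one common Tau-U formulation for a single AB comparison: pairwise A-versus-B nonoverlap (Mann-Whitney-like) with the baseline-trend pairs (Kendall-like) subtracted. It is not necessarily identical to the published calculator's algorithm, and the data are invented.

```python
# Minimal sketch: Tau-U-style index for one AB comparison with baseline trend correction.

def tau_u(a, b):
    # A-vs-B pairs: +1 if the B point improves on the A point, -1 if it is worse
    s_ab = sum((bj > ai) - (bj < ai) for ai in a for bj in b)
    # A-vs-A pairs: baseline trend that should be discounted
    s_aa = sum((a[j] > a[i]) - (a[j] < a[i])
               for i in range(len(a)) for j in range(i + 1, len(a)))
    return (s_ab - s_aa) / (len(a) * len(b))

baseline  = [2, 3, 2, 4, 3]       # hypothetical A-phase data
treatment = [5, 4, 6, 7, 6, 8]    # hypothetical B-phase data
print(round(tau_u(baseline, treatment), 3))
```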
Tau-U Calculator
http://www.singlecaseresearch.org/
Vannest, K. J., Parker, R. I., & Gonen, O. (2011). Single Case Research: Web-based calculators for SCR analysis (Version 1.0) [Web-based application]. College Station, TX: Texas A&M University. Retrieved July 15, 2012.
• Combines nonoverlap between phases with trend from within the intervention phase
– Will detect, and allow the researcher to control for, undesirable trend in the baseline phase
• Data are easily entered on the free website
• Generates a d for effects, with trend withdrawn when necessary
Theme: An accessible and feasible effect size estimator
• As end users, SCD researchers need a tool that we can use without having to consult our statisticians
– Utility of the hand-calculated trend line analysis
– Example of a feasible but criticized tool (ITSACORR; Crosbie, 1993)
• Parker, Vannest, Davis, & Sauber (2011)
– Tau-U calculator
Theme: What is an effect? A d that detects trend and/or level effects
• If a single effect size is going to be generated for an AB comparison, should the d be reported separately for level (intercept) and trend (slope)?
– If so, problematic for meta-analysis
• ES estimators here appear to provide a combined effect for slope and intercept
• Parker et al. (2011) incorporate both
Theme: What comparisons get included in the meta-analysis?
• Should we only use the initial AB comparison in ABAB designs?
• Should we only include points at which a functional relationship is established?
Theme: How many effect sizes per study?
[Diagram: measurement moments (m) nested within subjects, nested within Studies 1 through K; the number of moments per subject and of subjects per study varies across studies.]
Challenge: How do you handle zero baselines or treatment phases?
Heterogeneity in SCD: A Reality
• SCD researchers use a range of different designs and design combinations
• A look at current designs
– Fall 2011 issue of the Journal of Applied Behavior Analysis
Comparison of SCD and Group Design ESs: The Apples and Oranges Issue
• The logic of causal inference is different
– Groups: mean differences between groups
– SCD: replication of effects either within or across all "participants"
– Generally, d represents a different comparison
• The data collected to document an effect are different
– Group designs collect data before treatment and after treatment
– SCDs collect data throughout the treatment phase, so treatments that build performance across time may appear less efficacious because "acquisition" phase effects are included in the analysis
Conclusions
• I learned a lot
• Sophistication of analyses is increasing
• Feasibility of using statistical analysis is improving
• Can use statistical analysis as a supplement to visual inspection (Kazdin, 2011)
• Statistical analysis may not be for everybody, but it is going to foster acceptability in the larger education research community, and for that reason SCD researchers should consider it.