General Overview and History of
Statistical Analysis of Single-Case
Intervention Data
Joel R. Levin
University of Arizona
Historical Overview
Three Considerations (after Levin, Marascuilo, &
Hubert, 1978)
1. Why an investigator would want, or feel the need,
to conduct a formal statistical analysis
• scholarly research vs. client-centered focus
• investigator’s purpose and desired inferences
• generalizations from a single-case study
Levin, J. R., Marascuilo, L. A., & Hubert, L. J. (1978). N = nonparametric
randomization tests. In T. R. Kratochwill (Ed.), Single subject research:
Strategies for evaluating change (pp. 167-196). New York: Academic
Press.
Historical Overview
2. Whether any statistical analysis should be
conducted
• predefined outcome criteria
• visual analysis options
• validity issues
– external: generalization, sampling
– internal: design features, experimental
control
3. Which statistical procedure, of many potentially
available, is the most appropriate one to use for the
particular study/design?
• distributional assumptions
• statistical properties
• the most important criteria to be satisfied
The Statistical Elephant in the Room:
The Independence Assumption
• Of tea tastes (Fisher, 1935) and t tests
• From Shine (Shine & Bower, 1971) to Gentile,
Roden, & Klein (1972) and beyond (e.g.,
Kratochwill et al., 1974)
• As reflected by the series autocorrelation (AC)
coefficient, or the degree of serial dependence
in the data
Fisher, R. A. (1935). The design of experiments. New York: Hafner.
Gentile, J. R., Roden, A. H., & Klein, R. D. (1972). An analysis of variance model for the
intrasubject replication design. Journal of Applied Behavior Analysis, 5, 193-198.
Kratochwill, T. R., Alden, K., Demuth, D., Dawson, D., Panicucci, C., Arntson, P.,
McMurray, N., Hempstead, J., & Levin, J. (1974). A further consideration in the
application of an analysis-of-variance model for the intrasubject replication design.
Journal of Applied Behavior Analysis, 7, 629-633.
Shine, L. C., & Bower, S. M. (1971). A one-way analysis of variance for single-subject
designs. Educational and Psychological Measurement, 31, 105-113.
[Figure: two time-series panels plotting Behavior (vertical axis) against Time (horizontal axis), one with first-order autocorrelation = .50 and one with first-order autocorrelation = 0.]
From Levin, J. R., Lall, V. F., & Kratochwill, T. R. (2011). Extensions of a versatile
randomization test for assessing single-case intervention effects. Journal of School
Psychology, 49, 55-79.
The Statistical Elephant in the Room:
The Independence Assumption
Empirical evidence, based on theory (e.g.,
Crosbie, 1987; Toothaker et al., 1983)
But what exactly did these researchers find?
And what does that have to do with the price of t
in time-series intervention studies?
Let’s take a look, in our own mini “Monte Carlo”
simulation study…
Autocorrelation Implications for an AB
Intervention Study: A Simulation Example
(With Special Thanks to John Ferron)
20 simulation “studies” were conducted with 30
outcome observations apiece.
Each observation was drawn from a normal
distribution with a mean of 0 and a standard
deviation of 1 (i.e., normally distributed z scores).
Each “study” of 30 observations was split into two
halves, with the first 15 representing a baseline
(A) phase and the second 15 representing a B
(intervention) phase.
A Little Food for Thought
In each of the 20 AB “studies,” the mean of the 15
A-phase observations and the mean of the 15
B-phase observations are calculated and
statistically compared on the basis of a
conventional two independent-sample t test
based on a Type I error probability (α) per study
of .10.
Given the information presented so far, what
statistical results would you expect to find in the
20 “studies”?
A Little More Food to Chew On
OK, now consider this additional information:
In 10 of the “studies,” the 30 observations were
generated assuming that they were mutually
independent (i.e., as reflected by an
autocorrelation of 0).
In the other 10 “studies,” the 30 observations
were generated assuming that they were serially
dependent (as reflected by an autocorrelation of
.40).
What would you expect of the results now? Or
would you prefer to phone a personal friend who
is also a professional statistician?
A simulation with ρ = 0: r = -.018, p = .991
A Phase Mean = -.27 (47.3)
B Phase Mean = -.26 (47.4)
(Note: 20 changes of direction in the series)
A simulation with ρ = .40: r = .575, p = .025
In Addition
For the 10 AC = 0 simulations, with a nominal α
of .10, only 1 resulted in rejecting the “no
phase-mean difference” hypothesis ‒ thereby yielding an
empirical α of 1/10 = .10.
whereas
For the 10 AC = .40 simulations, with a nominal α
of .10, 4 of them resulted in rejecting the “no
phase-mean difference” hypothesis ‒ thereby
yielding an empirical α of 4/10 = .40.
A coincidence perhaps???? Or are we entering
the twilight zone????
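The mini-simulation can be rerun at a larger scale to check that the 1/10 versus 4/10 contrast is not a fluke of only 10 "studies" per condition. The following is a minimal sketch under stated assumptions, not the code used for the slides: the serially dependent observations are assumed to follow a stationary first-order autoregressive (AR(1)) process, the t test is assumed to be two-tailed with pooled variance, and the names `ar1_series`, `pooled_t`, and `empirical_alpha` are hypothetical.

```python
import math
import random

# Two-tailed critical t for alpha = .10 with df = 15 + 15 - 2 = 28 (standard t table).
# It applies only to the 30-observation design described above.
T_CRIT_DF28 = 1.701


def ar1_series(n, rho, rng):
    """Return n observations from a stationary AR(1) process with mean 0 and SD 1."""
    innovation_sd = math.sqrt(1.0 - rho ** 2)  # keeps the marginal SD at 1
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, innovation_sd))
    return y


def pooled_t(a, b):
    """Conventional two-independent-sample t statistic (pooled variance)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def empirical_alpha(rho, n_studies=10000, n_obs=30, seed=1):
    """Proportion of no-effect AB 'studies' in which the t test rejects at alpha = .10."""
    rng = random.Random(seed)
    half = n_obs // 2  # first half = A (baseline) phase, second half = B phase
    rejections = 0
    for _ in range(n_studies):
        y = ar1_series(n_obs, rho, rng)
        if abs(pooled_t(y[:half], y[half:])) > T_CRIT_DF28:
            rejections += 1
    return rejections / n_studies


if __name__ == "__main__":
    for rho in (0.0, 0.40):
        print(f"rho = {rho:.2f}: empirical alpha = {empirical_alpha(rho):.3f}")
```

Because there is no true phase difference in any generated series, every rejection is a Type I error. Under these assumptions the rejection rate should sit near the nominal .10 when ρ = 0 and noticeably above it when ρ = .40.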
The Statistical Elephant in the Room:
The Independence Assumption
• But what about in practice?
– point (Busk & Marascuilo, 1988; Sharpley &
Alavosius, 1988)
– counterpoint (Huitema, 1985, 1988; Shadish &
Sullivan, 2011)
– what’s the point? (Baer, 1988; Parsonson & Baer,
1992)
• Other autocorrelation issues
– how best to estimate it? (e.g., Crosbie, 1993;
Hedges, Pustejovsky, & Shadish, 2012; Riviello &
Beretvas, 2011)
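As a point of reference for the estimation question, here is a minimal sketch of the conventional lag-1 (first-order) autocorrelation estimate. It is only the textbook estimator, not any of the refined procedures cited above, and the function name `lag1_autocorrelation` is hypothetical. In short series this estimator is known to be biased, which is part of why estimation remains an open issue.

```python
def lag1_autocorrelation(y):
    """Conventional lag-1 autocorrelation: lagged cross-products over total sum of squares."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(y) / n
    dev = [v - mean for v in y]
    total_ss = sum(d * d for d in dev)
    if total_ss == 0:
        raise ValueError("series has no variability")
    # Sum of products of each deviation with the deviation that follows it.
    lagged = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return lagged / total_ss


if __name__ == "__main__":
    print(lag1_autocorrelation([1, 2, 3, 4, 5]))   # steadily rising series
    print(lag1_autocorrelation([1, -1, 1, -1]))    # alternating series
```

A steadily trending series yields a positive estimate and a strictly alternating series a negative one, which mirrors the "changes of direction" noted for the simulated series.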
Some Autocorrelation References
Baer, D. M. (1988). An autocorrelated commentary on the need for a different debate. Behavioral
Assessment, 10, 295-298.
Busk, P. L., & Marascuilo, L. A. (1988). Autocorrelation in single-subject research: A
counterargument to the myth of no autocorrelation. Behavioral Assessment, 10, 229-242.
Crosbie, J. (1987). The inability of the binomial test to control Type I error with single-subject data.
Behavioral Assessment, 9, 141-146.
Crosbie, J. (1993). Interrupted time-series analysis with brief single-subject data. Journal of
Consulting and Clinical Psychology, 61, 966-974.
Huitema, B. E. (1985). Autocorrelation in applied behavior analysis: A myth. Behavioral
Assessment, 7, 107-118.
Huitema, B. E. (1988). Autocorrelation: 10 years of confusion. Behavioral Assessment, 10, 253-297.
Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the
stimuli controlling it. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and
analysis: New developments for psychology and education (pp. 15-40). Hillsdale, NJ: Erlbaum.
Riviello, C., & Beretvas, S. N. (2011). Detecting lag-one autocorrelation in interrupted time series
experiments with small datasets. Unpublished manuscript, University of Texas, Austin.
Shadish, W. R., Rindskopf, D. M., Hedges, L. V., & Sullivan, K. J. (2012). Bayesian estimates of
autocorrelations in single-case designs. Behavior Research Methods, 4(4). DOI
10.3758/s13428-012-0282-1.
Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess
intervention effects in 2008. Behavior Research Methods, 43, 971-980.
Sharpley, C. F., & Alavosius, M. P. (1988). Autocorrelation in behavioral data: An alternative
perspective. Behavioral Assessment, 10, 243-251.
Toothaker, L. E., Banz, M., Noble, C., Camp, J., & Davis, D. (1983). N = 1 designs: The failure of
ANOVA-based tests. Journal of Educational Statistics, 8, 289-309.
So Who Cares About Elephants Anyway?
• Reflect back on our three initiating questions
• Why?
– The hope for improved data-analysis
objectivity and reliability, as well as more
“valid” conclusions about intervention effects?
– But always keep in mind the IOTT!
• Whether?
In light of the investigator’s purposes, is
statistical analysis really necessary?
So Who Cares About Elephants Anyway?
• Which?
– Statistical-conclusion validity issues
• Acceptable Type I error control
• Acceptable statistical power
• The potential to yield informative
confidence intervals and effect-size
measures
Proposed Statistical Analysis Strategies
• Standard parametric statistical methods (e.g., t
test, analysis of variance, conventional
regression analysis, binomial test)
– problems, problems everywhere…
• Time-series analysis (e.g., ARIMA models)
– concepts and basics (McCleary & Welsh,
1992)
– problems, and problems within problems
(Crosbie, 1993)
– current limitations
The Elephant Plods Along to This Day:
Some Published Examples
Sometimes masquerading in esoteric
international costumes:
Lancioni et al.’s (2009) two-sample (A-phase vs.
B-phase observations) Kolmogorov-Smirnov test
Lancioni, G. E. et al. (2009). A learning assessment procedure to
re-evaluate three persons with a diagnosis of post-coma
vegetative state and pervasive motor impairment. Brain Injury, 23,
154-162.
The Elephant Plods Along to This Day:
Some Published Examples
Take a gander at this:
Wadnerkar et al.’s (2011) chi square “goodness
of fit” test applied to a single case’s pre- and
post-intervention categorized (and correlated)
eye-gaze frequency counts, complete with
computational errors.
Wadnerkar, M. B., et al. (2011). A single case study of a
family-centred intervention with a young girl with cerebral palsy who is a
multimodal communicator. Child: Care, Health, and Development,
38, 87-97.
Figure 2. Frequency of codes for girl’s multimodal communicative behaviours
at pre and post intervention. AAC, augmentative and alternative
communication.
The Elephant Plods Along to This Day:
Some Published Examples
How about an impressive combination of
“nonparametric” techniques?
Bragard et al.’s (2012) McNemar test of
proportions and Wilcoxon test applied to 24
individual pre- and posttest items on a
picture-naming task.
Bragard, A. et al. (2012). Word-finding intervention for children
with specific language impairment: A multiple single-case study.
Language, Speech, and Hearing Services in the Schools, 43,
222-234.
Proposed Statistical Analysis Strategies
• Adapted regression-based, HLM, and GAM models
(e.g., Beretvas & Chung, 2008; Maggin, Swaminathan,
Rogers, O’Keeffe, Sugai, & Horner, 2011; Rindskopf &
Ferron, 2014; Shadish et al., 2014)
– potential advantages, but wait and see
– current limitations
• Nonparametric (permutation-based) analyses (e.g.,
Edgington & Onghena, 2007; Ferron & Levin, 2014)
– potential advantages
– current limitations
• Editorial comment: Tradeoffs between statistical
analysis elegance/complexity and
parsimony/comprehensibility
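To make the permutation-based option concrete, here is a minimal sketch of one randomization test for an AB design. It assumes that the intervention start point was randomly selected in advance from a set of admissible start points, that an increase in the outcome is predicted (a one-sided test), and that each phase must contain at least `min_phase_len` observations. The function name and parameters are hypothetical, and this is an illustration of the general idea rather than the specific procedures described in the works cited above.

```python
def ab_randomization_test(y, actual_start, min_phase_len=3):
    """One-sided start-point randomization test for an AB series.

    y            : outcome observations in time order
    actual_start : index of the first B-phase observation (chosen at random in the design)
    Returns (observed B-minus-A mean difference, randomization p value).
    """
    n = len(y)
    candidates = range(min_phase_len, n - min_phase_len + 1)
    if actual_start not in candidates:
        raise ValueError("actual_start is not an admissible start point")

    def mean_difference(start):
        a, b = y[:start], y[start:]
        return sum(b) / len(b) - sum(a) / len(a)

    observed = mean_difference(actual_start)
    # Reference distribution: the statistic under every start point that could have been drawn.
    distribution = [mean_difference(s) for s in candidates]
    # Small tolerance so floating-point ties count as "at least as large".
    as_extreme = sum(1 for d in distribution if d >= observed - 1e-12)
    return observed, as_extreme / len(distribution)


if __name__ == "__main__":
    series = [0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5]
    diff, p = ab_randomization_test(series, actual_start=6)
    print(f"B - A mean difference = {diff:.2f}, p = {p:.3f}")
```

The smallest attainable p value is 1 divided by the number of admissible start points, so a design with few admissible start points cannot reach conventional significance levels no matter how large the effect. The validity of the test rests on the random selection of the start point, not on an independence or normality assumption.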
Proposed Statistical Analysis Strategies: References
Beretvas, S. N., & Chung, H. (2008). An evaluation of modified R2-change effect size
indices for single-subject experimental designs. Evidence-Based Communication
Assessment and Intervention, 2:3, 129-128.
Edgington, E. S., & Onghena, P. (2007). Randomization tests (4th ed.). Boca Raton, FL:
Chapman & Hall/CRC.
Ferron, J. M., & Levin, J. R. (2014). Single-case permutation and randomization
statistical tests: Present status, promising new developments. (pp. 153-183). In T. R.
Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological
and statistical advances. Washington, DC: American Psychological Association.
Maggin, D. M., Swaminathan, H., Rogers, H. J., O’Keeffe, B. V., Sugai, G., & Horner, R.
H. (2011). A generalized least squares regression approach for computing effect
sizes in single-case research: Application examples. Journal of School Psychology,
49, 301-321.
McCleary, R., & Welsh, W. N. (1992). Philosophical and statistical foundations of
time-series experiments. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research
design and analysis: New developments for psychology and education (pp. 41-91).
Hillsdale, NJ: Erlbaum.
Rindskopf, D. M., & Ferron, J. M. (2014). Using multilevel models to analyze single-case
design data (pp. 221-246). In T. R. Kratochwill & J. R. Levin (Eds.), Single-case
intervention research: Methodological and statistical advances. Washington, DC:
American Psychological Association.
Shadish, W. R., et al. (2014). Analyzing single-case designs: d, G, hierarchical models,
Bayesian estimators, and the hopes and fears of researchers about analyses (pp.
247-281). In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research:
Methodological and statistical advances. Washington, DC: American Psychological
Association.
“True” Single-Case Applications and
Classroom-Based Applications
• Statistical issues
─ consideration of the autocorrelation “elephant” in
each, vis-à-vis statistical properties
─ this and other statistical-conclusion validity issues
are currently being studied for different
statistical-analysis strategies applied to various single-case
designs