General Overview and History of Statistical Analysis of Single-Case Intervention Data

Joel R. Levin
University of Arizona

Historical Overview

Three Considerations (after Levin, Marascuilo, & Hubert, 1978)

1. Why an investigator would want, or feel the need, to conduct a formal statistical analysis
• scholarly research vs. client-centered focus
• investigator’s purpose and desired inferences
• generalizations from a single-case study

2. Whether any statistical analysis should be conducted
• predefined outcome criteria
• visual analysis options
• validity issues
  – external: generalization, sampling
  – internal: design features, experimental control

3. Which statistical procedure, of the many potentially available, is the most appropriate one to use for the particular study/design
• distributional assumptions
• statistical properties
• the most important criteria to be satisfied

Levin, J. R., Marascuilo, L. A., & Hubert, L. J. (1978). N = nonparametric randomization tests. In T. R. Kratochwill (Ed.), Single subject research: Strategies for evaluating change (pp. 167-196). New York: Academic Press.

The Statistical Elephant in the Room: The Independence Assumption

• Of tea tastes (Fisher, 1935) and t tests
• From Shine (Shine & Bower, 1971) to Gentile, Roden, & Klein (1972) and beyond (e.g., Kratochwill et al., 1974)
• As reflected by the series autocorrelation (AC) coefficient, or the degree of serial dependence in the data

Fisher, R. A. (1935). The design of experiments. New York: Hafner.
Gentile, J. R., Roden, A. H., & Klein, R. D. (1972). An analysis of variance model for the intrasubject replication design. Journal of Applied Behavior Analysis, 5, 193-198.
Kratochwill, T. R., Alden, K., Demuth, D., Dawson, D., Panicucci, C., Arntson, P., McMurray, N., Hempstead, J., & Levin, J. (1974). A further consideration in the application of an analysis-of-variance model for the intrasubject replication design. Journal of Applied Behavior Analysis, 7, 629-633.
Shine, L. C., & Bower, S. M. (1971). A one-way analysis of variance for single-subject designs. Educational and Psychological Measurement, 31, 105-113.

[Figure: two example series of Behavior plotted against Time, one with first-order autocorrelation = .50 and one with first-order autocorrelation = 0.]
From Levin, J. R., Lall, V. F., & Kratochwill, T. R. (2011). Extensions of a versatile randomization test for assessing single-case intervention effects. Journal of School Psychology, 49, 55-79.

The Statistical Elephant in the Room: The Independence Assumption

Empirical evidence, based on theory (e.g., Crosbie, 1987; Toothaker et al., 1983)

But what exactly did these researchers find? And what does that have to do with the price of t in time-series intervention studies?

Let’s take a look, in our own mini “Monte Carlo” simulation study…

Autocorrelation Implications for an AB Intervention Study: A Simulation Example
(With Special Thanks to John Ferron)

20 simulation “studies” were conducted with 30 outcome observations apiece.

Each observation was drawn from a normal distribution with a mean of 0 and a standard deviation of 1 (i.e., normally distributed z scores).

Each “study” of 30 observations was split into two halves, with the first 15 representing a baseline (A) phase and the second 15 representing an intervention (B) phase.

A Little Food for Thought

In each of the 20 AB “studies,” the mean of the 15 A-phase observations and the mean of the 15 B-phase observations are calculated and statistically compared on the basis of a conventional two independent-sample t test, with a Type I error probability (α) per study of .10.

Given the information presented so far, what statistical results would you expect to find in the 20 “studies”?
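For readers who want to try the setup just described, here is a minimal Python sketch of it. It assumes that the 30 draws within each “study” are mutually independent, uses a pooled-variance t statistic, and compares |t| with the tabled two-sided critical value for α = .10 and 28 df (about 1.701) rather than computing a p value. The function names, the seed, and the use of Python’s standard library are illustrative assumptions, not part of the original simulation.

```python
import math
import random

# Tabled two-sided critical t for alpha = .10 with 15 + 15 - 2 = 28 df.
# It is only valid for two phases of 15 observations each.
T_CRIT_DF28 = 1.701


def pooled_t(a, b):
    """Conventional two independent-sample t statistic (pooled variance)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def run_studies(n_studies=20, n_obs=30, seed=1):
    """Count how many null AB "studies" reject at alpha = .10 (hypothetical helper)."""
    rng = random.Random(seed)  # the seed is arbitrary
    half = n_obs // 2
    rejections = 0
    for _ in range(n_studies):
        # Assumption: independent standard-normal observations (z scores).
        series = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        a_phase, b_phase = series[:half], series[half:]
        if abs(pooled_t(a_phase, b_phase)) > T_CRIT_DF28:
            rejections += 1
    return rejections


if __name__ == "__main__":
    print("rejections out of 20:", run_studies())
```

Changing the seed gives a different set of 20 “studies.” With independent draws and no true phase difference, about 20 × .10 = 2 rejections would be expected on average.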
A Little More Food to Chew On

OK, now consider this additional information:

In 10 of the “studies,” the 30 observations were generated assuming that they were mutually independent (i.e., as reflected by an autocorrelation of 0).

In the other 10 “studies,” the 30 observations were generated assuming that they were serially dependent (as reflected by an autocorrelation of .40).

What would you expect of the results now? Or would you prefer to phone a personal friend who is also a professional statistician?

A simulation with ρ = 0: r = -.018, p = .991
A Phase Mean = -.27 (47.3)
B Phase Mean = -.26 (47.4)
(Note: 20 changes of direction in the series)

A simulation with ρ = .40: r = .575, p = .025

In Addition

For the 10 AC = 0 simulations, with a nominal α of .10, only 1 resulted in rejecting the “no phase-mean difference” hypothesis, thereby yielding an empirical α of 1/10 = .10,

whereas

for the 10 AC = .40 simulations, with a nominal α of .10, 4 of them resulted in rejecting the “no phase-mean difference” hypothesis, thereby yielding an empirical α of 4/10 = .40.

A coincidence perhaps???? Or are we entering the twilight zone????

The Statistical Elephant in the Room: The Independence Assumption

• But what about in practice?
  – point (Busk & Marascuilo, 1988; Sharpley & Alavosius, 1988)
  – counterpoint (Huitema, 1985, 1988; Shadish & Sullivan, 2011)
  – what’s the point? (Baer, 1988; Parsonson & Baer, 1992)
• Other autocorrelation issues
  – how best to estimate it? (e.g., Crosbie, 1993; Hedges, Pustejovsky, & Shadish, 2012; Riviello & Beretvas, 2011)

Some Autocorrelation References

Baer, D. M. (1988). An autocorrelated commentary on the need for a different debate. Behavioral Assessment, 10, 295-298.
Busk, P. L., & Marascuilo, L. A. (1988). Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment, 10, 229-242.
Crosbie, J. (1987). The inability of the binomial test to control Type I error with single-subject data. Behavioral Assessment, 9, 141-146.
Crosbie, J. (1993). Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology, 61, 966-974.
Huitema, B. E. (1985). Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment, 7, 107-118.
Huitema, B. E. (1988). Autocorrelation: 10 years of confusion. Behavioral Assessment, 10, 253-297.
Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the stimuli controlling it. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New developments for psychology and education (pp. 15-40). Hillsdale, NJ: Erlbaum.
Riviello, C., & Beretvas, S. N. (2011). Detecting lag-one autocorrelation in interrupted time series experiments with small datasets. Unpublished manuscript, University of Texas, Austin.
Shadish, W. R., Rindskopf, D. M., Hedges, L. V., & Sullivan, K. J. (2012). Bayesian estimates of autocorrelations in single-case designs. Behavior Research Methods, 4(4). DOI 10.3758/s13428-012-0282-1.
Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods, 43, 971-980.
Sharpley, C. F., & Alavosius, M. P. (1988). Autocorrelation in behavioral data: An alternative perspective. Behavioral Assessment, 10, 243-251.
Toothaker, L. E., Banz, M., Noble, C., Camp, J., & Davis, D. (1983). N = 1 designs: The failure of ANOVA-based tests. Journal of Educational Statistics, 8, 289-309.

So Who Cares About Elephants Anyway?

• Reflect back on our three initiating questions
• Why?
  – The hope for improved data-analysis objectivity and reliability, as well as more “valid” conclusions about intervention effects?
  – But always keep in mind the IOTT!
• Whether?
  – In light of the investigator’s purposes, is statistical analysis really necessary?
• Which?
  – Statistical-conclusion validity issues
    • Acceptable Type I error control
    • Acceptable statistical power
    • The potential to yield informative confidence intervals and effect-size measures

Proposed Statistical Analysis Strategies

• Standard parametric statistical methods (e.g., t test, analysis of variance, conventional regression analysis, binomial test)
  – problems, problems everywhere…
• Time-series analysis (e.g., ARIMA models)
  – concepts and basics (McCleary & Welsh, 1992)
  – problems, and problems within problems (Crosbie, 1993)
  – current limitations

The Elephant Plods Along to This Day: Some Published Examples

Sometimes masquerading in esoteric international costumes: Lancioni et al.’s (2009) two-sample (A-phase vs. B-phase observations) Kolmogorov-Smirnov test

Lancioni, G. E. et al. (2009). A learning assessment procedure to re-evaluate three persons with a diagnosis of post-coma vegetative state and pervasive motor impairment. Brain Injury, 23, 154-162.

The Elephant Plods Along to This Day: Some Published Examples

Take a gander at this: Wadnerkar et al.’s (2011) chi-square “goodness of fit” test applied to a single case’s pre- and postintervention categorized (and correlated) eye-gaze frequency counts, complete with computational errors.

Wadnerkar, M. B., et al. (2011). A single case study of a family-centred intervention with a young girl with cerebral palsy who is a multimodal communicator. Child: Care, Health, and Development, 38, 87-97.

[Figure 2. Frequency of codes for girl’s multimodal communicative behaviours at pre and post intervention. AAC, augmentative and alternative communication.]

The Elephant Plods Along to This Day: Some Published Examples

How about an impressive combination of “nonparametric” techniques? Bragard et al.’s (2012) McNemar test of proportions and Wilcoxon test applied to 24 individual pre- and posttest items on a picture-naming task.

Bragard, A. et al. (2012). Word-finding intervention for children with specific language impairment: A multiple single-case study. Language, Speech, and Hearing Services in the Schools, 43, 222-234.

Proposed Statistical Analysis Strategies

• Adapted regression-based, HLM, and GAM models (e.g., Beretvas & Chung, 2008; Maggin, Swaminathan, Rogers, O’Keeffe, Sugai, & Horner, 2011; Rindskopf & Ferron, 2014; Shadish et al., 2014)
  – potential advantages, but wait and see
  – current limitations
• Nonparametric (permutation-based) analyses (e.g., Edgington & Onghena, 2007; Ferron & Levin, 2014)
  – potential advantages
  – current limitations
• Editorial comment: Tradeoffs between statistical analysis elegance/complexity and parsimony/comprehensibility

Proposed Statistical Analysis Strategies: References

Beretvas, S. N., & Chung, H. (2008). An evaluation of modified R2-change effect size indices for single-subject experimental designs. Evidence-Based Communication Assessment and Intervention, 2:3, 129-128.
Edgington, E. S., & Onghena, P. (2007). Randomization tests (4th ed.). Boca Raton, FL: Chapman & Hall/CRC.
Ferron, J. M., & Levin, J. R. (2014). Single-case permutation and randomization statistical tests: Present status, promising new developments. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 153-183). Washington, DC: American Psychological Association.
Maggin, D. M., Swaminathan, H., Rogers, H. J., O’Keeffe, B. V., Sugai, G., & Horner, R. H. (2011). A generalized least squares regression approach for computing effect sizes in single-case research: Application examples. Journal of School Psychology, 49, 301-321.
McCleary, R., & Welsh, W. N. (1992). Philosophical and statistical foundations of time-series experiments. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New developments for psychology and education (pp. 41-91). Hillsdale, NJ: Erlbaum.
Rindskopf, D. M., & Ferron, J. M. (2014). Using multilevel models to analyze single-case design data. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 221-246). Washington, DC: American Psychological Association.
Shadish, W. R., et al. (2014). Analyzing single-case designs: d, G, hierarchical models, Bayesian estimators, and the hopes and fears of researchers about analyses. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case intervention research: Methodological and statistical advances (pp. 247-281). Washington, DC: American Psychological Association.

“True” Single-Case Applications and Classroom-Based Applications

• Statistical issues
  – consideration of the autocorrelation “elephant” in each, vis-à-vis statistical properties
  – this and other statistical-conclusion validity issues are currently being studied for different statistical-analysis strategies applied to various single-case designs
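
One way to explore the autocorrelation “elephant” numerically is to repeat the AB simulation many times and record how often the conventional t test rejects a true “no phase-mean difference” hypothesis. The Python sketch below is an illustration under stated assumptions, not the simulation reported earlier: serially dependent series are generated from a stationary first-order autoregressive, AR(1), process with unit variance; the number of replications and the seed are arbitrary; and all function names are hypothetical.

```python
import math
import random

# Tabled two-sided critical t for alpha = .10 with 28 df.
# It is only valid for two phases of 15 observations each (n_obs = 30).
T_CRIT_DF28 = 1.701


def ar1_series(n, rho, rng):
    """Generate n observations from a stationary AR(1) process (assumed model).

    The marginal mean is 0, the marginal variance is 1, and rho is the
    population lag-1 autocorrelation.
    """
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    series = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        series.append(rho * series[-1] + rng.gauss(0.0, innovation_sd))
    return series


def lag1_autocorrelation(series):
    """Conventional lag-1 sample autocorrelation of a series."""
    mean = sum(series) / len(series)
    numerator = sum((series[t] - mean) * (series[t + 1] - mean)
                    for t in range(len(series) - 1))
    denominator = sum((value - mean) ** 2 for value in series)
    return numerator / denominator


def pooled_t(a, b):
    """Two independent-sample t statistic with pooled variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def empirical_alpha(rho, n_reps=2000, n_obs=30, seed=2024):
    """Proportion of null AB "studies" in which the t test rejects."""
    rng = random.Random(seed)
    half = n_obs // 2
    rejections = 0
    for _ in range(n_reps):
        series = ar1_series(n_obs, rho, rng)
        if abs(pooled_t(series[:half], series[half:])) > T_CRIT_DF28:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    for rho in (0.0, 0.40):
        print(f"rho = {rho:.2f}: empirical alpha = {empirical_alpha(rho):.3f}")
```

Under these assumptions, the rejection rate for ρ = 0 should stay close to the nominal .10, whereas the rate for ρ = .40 should come out well above it. The `lag1_autocorrelation` helper can be applied to any generated series to see how much the sample AC varies in series as short as 30 observations.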