Statistical Analysis of Single Case Design
Serial Dependence Is More than Needing Cheerios for Breakfast

Goal of Presentation
• Review the concept of effect size
• Describe issues in using the effect size concept for single case design
• Describe different traditional approaches to calculating effect size for single case design
• Illustrate one recent approach

Behavioral Intervention Research
Putting a Coat on Your Dog Before Going for a Walk

How We Have Gotten to This Meeting
• Long history of statistical analysis of SCD
• Criticism of the quality of educational research (Shavelson & Towne, 2002)
• Establishment of IES
 – Initial resistance to SCD
• Influence of professional groups
• IES willingness to fund research on statistical analysis

Concept of Effect Size of a Study
• Effect size is a statistic quantifying the extent to which sample statistics diverge from the null hypothesis (Thompson, 2006)

Types of ESs for Group Designs
• Glass's Δ = (M_E − M_C) / SD_C
• Cohen's d = (M_E − M_C) / SD_pooled
• Interpretation
 – Small = .20
 – Medium = .50
 – Large = .80
• R²
• Eta² = SS_effect / SS_total

Is Statistical Analysis Antithetical to Single Case Design (SCD)?
• The original developers believed that socially important treatment effects have to be large enough to be reliably detected by visual inspection of the data.

Kazdin (2011) proposes:
• Visual inspection is less trustworthy when effects are not crystal clear
• Serial dependence may obscure visual analysis
• Detection of small effects may lead to understanding that could in turn lead to large effects
• Statistical analysis may generate ESs that allow one to answer more precise questions
 – Effects for different types of individuals
 – Experimenter effects

Example: PRT and Meta-Analysis (Shadish, 2012)
• Pivotal Response Training (PRT) for childhood autism
• 18 studies containing 91 SCDs
• For this example, to meet the assumptions of the method, the preliminary analysis:
 – Used only the 14 studies with at least 3 cases (66 SCDs)
 – Kept only the first baseline and PRT treatment phases, eliminating studies with no baseline
• After computing 14 effect sizes (one for each study), he used standard random-effects meta-analytic methods to summarize the results:

Results

Distribution Description
  N        Min ES   Max ES   Wghtd SD
  14.000   .181     2.087    .374

Fixed & Random Effects Model
           Mean ES  -95%CI   +95%CI   SE       Z        P
  Fixed    .4878    .3719    .6037    .0591    8.2485   .0000
  Random   .6630    .4257    .9002    .1210    5.4774   .0000

Random Effects Variance Component
  v = .112554 (random effects v estimated via noniterative method of moments)

Homogeneity Analysis
  Q         df        p
  39.9398   13.0000   .0001
  I² = 67.5%

• The results are of the order of magnitude that we commonly see in meta-analyses of between-groups studies.

Studies Done at UCSB (= 0) or Elsewhere (= 1)

Analog ANOVA Table (Homogeneity Q)
            Q         df        p
  Between   3.8550    1.0000    .0496
  Within    16.8138   12.0000   .1567
  Total     20.6688   13.0000   .0797

Q by Group
  Group     Qw        df        p
  .0000     1.9192    3.0000    .5894
  1.0000    14.8947   9.0000    .0939

Effect Size Results: Total
           Mean ES  SE       -95%CI   +95%CI   Z        P        k
  Total    .6197    .0980    .4277    .8118    6.3253   .0000    14.0000

Effect Size Results by Group
  Group     Mean ES  SE       -95%CI   +95%CI   Z        P        k
  .0000     1.0228   .2275    .5769    1.4686   4.4965   .0000    4.0000
  1.0000    .5279    .1086    .3151    .7407    4.8627   .0000    10.0000

Maximum Likelihood Random Effects Variance Component
  v = .05453, se(v) = .04455

• Of course, we have no idea why studies done at UCSB produce larger effects:
 – Different kinds of patients?
 – Different kinds of outcomes?
• But the analysis does illustrate one way to explore heterogeneity.
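To make the logic behind the printout above concrete, here is a minimal Python sketch of the kind of summary it reports: an inverse-variance fixed-effect mean, the Q and I² homogeneity statistics, and a noniterative method-of-moments (DerSimonian-Laird) estimate of the between-study variance v. The effect sizes and variances below are hypothetical placeholders, not Shadish's (2012) PRT data, and this is not the software that produced the output above.

```python
# Minimal random-effects meta-analysis sketch (assumes one effect size d and its
# sampling variance v per study; the arrays below are hypothetical placeholders).
import numpy as np

d = np.array([0.18, 0.45, 0.62, 0.90, 1.30, 0.35, 0.55, 0.75])   # hypothetical ESs
v = np.array([0.05, 0.04, 0.06, 0.08, 0.20, 0.03, 0.05, 0.07])   # hypothetical variances

# Fixed-effect (inverse-variance) weighted mean
w = 1.0 / v
fixed_mean = np.sum(w * d) / np.sum(w)
se_fixed = np.sqrt(1.0 / np.sum(w))

# Homogeneity statistic Q and I-squared
Q = np.sum(w * (d - fixed_mean) ** 2)
df = len(d) - 1
I2 = max(0.0, (Q - df) / Q) * 100

# Noniterative method-of-moments between-study variance ("v" in the printout)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects weighted mean
w_re = 1.0 / (v + tau2)
random_mean = np.sum(w_re * d) / np.sum(w_re)
se_random = np.sqrt(1.0 / np.sum(w_re))

print(f"Fixed ES  = {fixed_mean:.4f} (SE {se_fixed:.4f})")
print(f"Random ES = {random_mean:.4f} (SE {se_random:.4f})")
print(f"Q = {Q:.4f}, df = {df}, I2 = {I2:.1f}%, between-study v = {tau2:.5f}")
```

The same weighting machinery extends to the moderator analysis shown above: splitting the studies into groups (e.g., UCSB vs. elsewhere) and partitioning Q into between-group and within-group components.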
Search for the Holy Grail of Effect Size Estimators
• No single approach is agreed upon (40+ have been identified; Swaminathan et al., 2008)
• Classes of approaches
 – Computational approaches
 – Randomization tests
 – Regression approaches
 – Tau-U (Parker et al., 2011) as a combined approach

Computational Approaches
• Percentage of Nonoverlapping Data points (PND) (Scruggs, Mastropieri, & Casto, 1987)
• Percentage of Zero Data (Campbell, 2004)
• Improvement Rate Difference (Parker, Vannest, & Brown, 2009)

[Figure: example ABAB data set evaluated for level and trend; experimental control rated on a 1–7 scale, from no experimental control (1) through publishable (5–6) to strong experimental control (7). Noted problem with phases.]

Randomization Test
• Edgington (1975, 1980) advocated strongly for the use of nonparametric randomization tests.
 – Involves selection of comparison points in the baseline and treatment conditions
 – Requires a random start day for participants (could be random assignment of participants in a multiple baseline design; Wampold & Worsham, 1986)
• Criticized for SCD
 – Large Type I error rate (Haardörfer & Gagné, 2010)
 – Not robust to violations of the independence assumption, and sensitivity is low for data series of fewer than 30 to 40 data points (Manolov & Solanas, 2009)

Regression (Least Squares) Approaches
• ITSACORR (Crosbie, 1993)
 – Interrupted time series analysis
 – Criticized for not being correlated with other methods
• Last Day of Treatment (LDT) comparison (White, Rusch, Kazdin, & Hartmann, 1989)
 – Compares the two LDTs for the baseline and treatment phases
 – Power is weak because of the lengthy predictions involved

Regression Analyses
• Mean-shift and mean-plus-trend models (Center, Skiba, & Casey, 1985–86)
• Ordinary least squares regression analysis (Allison & Gorman, 1993)
 – Both approaches attempt to control for trends in baseline when examining performance in treatment
• d-estimator (Shadish, Hedges, & Rindskopf, 2012)
• GLS with removal of autocorrelation (Swaminathan, Horner, Rogers, & Sugai, 2012)

Tau-U (Parker, Vannest, Davis, & Sauber, 2011)
• The Mann-Whitney U is a nonparametric statistic that compares individual data points across groups (A–B comparisons)
• Kendall's tau does the same thing for trend within groups
• Tau-U
 – Tests and controls for trend in the A phase
 – Tests for differences between the A and B phases
 – Tests and adjusts for trend in the B phase

Tau-U Calculator
http://www.singlecaseresearch.org/
Vannest, K. J., Parker, R. I., & Gonen, O. (2011). Single Case Research: Web-based calculators for SCR analysis (Version 1.0) [Web-based application]. College Station, TX: Texas A&M University. Retrieved July 15, 2012.
• Combines nonoverlap between phases with trend from within intervention phases
 – Will detect, and allow the researcher to control for, undesirable trend in the baseline phase
• Data are easily entered on the free website
• Generates a d for effects, with trend withdrawn when necessary
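As a concrete illustration of the nonoverlap family, here is a short Python sketch for a single AB comparison: PND from the Computational Approaches slide, plus a Tau-U-style index. It assumes higher scores indicate improvement and uses one simplified Tau-U variant (A-versus-B pairwise nonoverlap minus baseline trend, divided by nA × nB); it is a sketch of the general idea, not the algorithm implemented in the Parker, Vannest, Davis, & Sauber (2011) calculator, whose corrections and denominators differ in detail. The data series are hypothetical.

```python
# Two nonoverlap indices for a single AB comparison (higher score = improvement).

def pnd(baseline, treatment):
    """Percentage of treatment points exceeding the highest baseline point."""
    ceiling = max(baseline)
    return 100.0 * sum(x > ceiling for x in treatment) / len(treatment)

def kendall_s(series):
    """Kendall S for monotonic trend: improving minus worsening time-ordered pairs."""
    s = 0
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            s += (series[j] > series[i]) - (series[j] < series[i])
    return s

def tau_u(baseline, treatment, correct_baseline_trend=True):
    """A-vs-B pairwise nonoverlap, optionally minus baseline trend (simplified variant)."""
    s_ab = sum((b > a) - (b < a) for a in baseline for b in treatment)
    s_a = kendall_s(baseline) if correct_baseline_trend else 0
    return (s_ab - s_a) / (len(baseline) * len(treatment))

# Hypothetical AB data (not from any study cited above)
A = [2, 3, 2, 4, 3]
B = [5, 6, 7, 6, 8, 9]

print(f"PND                 = {pnd(A, B):.1f}%")
print(f"Tau (A vs B)        = {tau_u(A, B, correct_baseline_trend=False):.2f}")
print(f"Tau-U (trend A out) = {tau_u(A, B):.2f}")
```

Subtracting the baseline Kendall S is what lets the index discount an improving baseline trend, which is the feature that distinguishes Tau-U from plain nonoverlap measures such as PND.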
Theme: An Accessible and Feasible Effect Size Estimator
• As end users, SCD researchers need a tool that we can use without having to consult our statisticians
 – Utility of the hand-calculated trend line analysis
 – ITSACORR (Crosbie, 1993) is an example of a feasible tool, but it has been criticized
• Parker, Vannest, Davis, & Sauber (2011)
 – Tau-U calculator

Theme: What Is an Effect? Should d Capture Level, Trend, or Both?
• If a single effect size is going to be generated for an AB comparison, should the d be reported separately for level (intercept) and trend (slope)?
 – If so, that is problematic for meta-analysis
• The ES estimators described here appear to provide a combined effect for slope and intercept
• Parker et al. (2011) incorporate both

Theme: What Comparisons Get Included in the Meta-Analysis?
• Should we only use the initial AB comparison in ABAB designs?
• Should we only include points at which a functional relationship is established?

Theme: How Many Effect Sizes per Study?
[Schematic: each study contains several subjects, and each subject contributes a different number of measurement moments, so a single study could yield one effect size or many.]

Challenge: How do you handle zero baselines or treatment phases?

Heterogeneity in SCD: A Reality
• SCD researchers use a range of different designs and design combinations
• A look at current designs
 – Fall 2011 issue of the Journal of Applied Behavior Analysis

Comparison of SCD and Group Design ESs: The Apples and Oranges Issue
• The logic of causal inference differs
 – Group designs: mean differences between groups
 – SCD: replication of effects either within or across all "participants"
 – In general, d represents a different comparison
• The data collected to document an effect differ
 – Group designs collect data before treatment and after treatment
 – SCDs collect data throughout the treatment phase, so treatments that build performance across time may appear less efficacious because "acquisition" phase effects are included in the analysis

Conclusions
• I learned a lot
• The sophistication of analyses is increasing
• The feasibility of using statistical analysis is improving
• Statistical analysis can be used as a supplement to visual inspection (Kazdin, 2011)
• Statistical analysis may not be for everybody, but it is going to foster acceptability in the larger education research community, and for that reason SCD researchers should consider it.