Class 11-12: Chapters 5 & Elkin et al. (1989)

Threats to Statistical Conclusion Validity
Are the observed relations among variables accurate?
• Power
• Unreliability of Measures
  - Introduces error variance
  - Attenuates correlations
• Unreliability of Treatment Implementation
  - Specificity: active ingredients
  - Fidelity of delivery
  - Competency
• Extraneous Variance in the Experimental Setting
• Heterogeneity of Participants

Threats to Internal Validity
Can we conclude that there is a causal relation between the IV and the DV? Did treatment cause differences in the DV across groups?
• Selection
  - Inclusion-exclusion criteria, and who gets assigned to which group?
• History
• Attrition
  - What do we know about drop-outs?
• Repeated Testing Effects
• Reaction to Control Group Assignment
  - Double-blind designs (pharmaceutical studies)
  - Placebo effects: are non-specific factors or the active ingredient responsible for the observations? (Houston study)

Department of Veterans Affairs (VA) and Baylor College of Medicine, Houston
180 patients with osteoarthritis and knee pain were randomly assigned to (New England J of Medicine, 2002):
• Debridement: worn, torn cartilage is cut and removed with the aid of a viewing tube called an arthroscope
• Arthroscopic lavage: bad cartilage is flushed out
• Simulated arthroscopic surgery: small incisions were made, but no instruments were inserted and no cartilage was removed

Findings
• During two years of follow-up, patients in all three groups reported moderate improvements in pain and ability to function.
• The intervention groups did not report less pain or better function than the placebo group.
• Placebo patients reported better outcomes than the debridement patients at certain points during follow-up.
• Patients were blind to type of surgery.

Threats to Construct Validity
To what extent do the variables capture the desired constructs?
• Mono-Operation Bias (instruments)
• Mono-Method Bias
• Experimenter Expectancies
• Self-report
• Clinician-rated
• Allegiance effect

Threats to External Validity
Can we generalize observed relations across persons, settings and times?
• Person-Units
• Outcome Measures
• Settings

Elkin et al.: Purpose
• Test feasibility of the collaborative clinical trial model
• Examine relative efficacy of CBT, IPT, and medication for depression

NIMH Treatment of Depression Collaborative Research Program
• U. of Pittsburgh, George Washington U., U. of Oklahoma
• 250 patients: major depressive disorder
• 28 therapists: 2-27 years of experience; 10 psychologists, 18 psychiatrists; 71% male

Experimental Between-Group Designs
1. Post-Test Only Control
2. Pre-Test -- Post-Test Control
3. Solomon Four Group (combination of 1 and 2 above)
• Factorial Design: more than one independent variable; interactions (treatment X therapist or patient characteristic)
• Dependent Sample Design (Matching)

Experimental Between-Group Designs
1. Post-Test Only Control
2. Pre-Test -- Post-Test Control
3. Solomon Four Group (combination of 1 and 2 above)
• Factorial Design, Post Hoc: more than one independent variable; interactions (treatment X patient characteristic: depression level at intake)
• Dependent Sample Design (Matching)

IVs: Experimental Groups
• Cognitive Behavioral Therapy
• Interpersonal Therapy
  (16 individual sessions, 50 min. each)
• Medication + Clinical Management*
• Pill-Placebo + Clinical Management*
  (1st session 55 min.; then 20 to 25 min.)
* Minimal supportive therapy condition

Dependent Variables
Clinical Evaluator
• Hamilton Rating Scale for Depression (HRSD)
• Global Assessment Scale (GAS)
Self-Report
• Beck Depression Inventory (BDI)
• Hopkins Symptom Checklist (HSCL-90)

Outcome Research Strategies
• Primary Analyses
• Secondary Analyses (Post-Hoc)

1. Treatment Package Strategy
2. Dismantling Strategy
3. Constructive Strategy
4. Parametric Strategy (structural components)
5. Comparative Outcome Strategy
6. Client and Therapist Variation Strategy
   - Moderation designs

Outcome Research Strategies
Primary Analyses
• Treatment package
• Comparative
Secondary Analyses
• Client variation: moderation effect?

Outcome Research Strategies
Secondary Analyses
• Client variation (moderation effect): depression level at intake as a moderator of differences in outcomes between treatment groups
• Were outcomes across treatment groups different for patients with higher versus lower levels of depression at pre-test?

Control Groups
• CBT
• IPT
• Medication + Clinical Management*
• Pill-Placebo + Clinical Management*
* Minimal supportive therapy condition

Treatments & Therapists
• Cognitive Behavioral Therapy, Interpersonal Therapy: a different group of experienced therapists for each (potential confound)
• Medication + Clinical Management, Pill-Placebo + Clinical Management: same therapists, psychiatrists (safeguards internal validity, undermines generalizability)

Ensure Valid Treatments
• Specify the treatment(s): manuals
• Therapist training/monitoring
• Fidelity checks: therapy tapes
  - Collaborative Study Psychotherapy Rating Scale (CSPRS): taped treatments could be discriminated 95% of the time

Attrition (fewer than 15 weeks or 12 sessions)
Total: 77/239 (32%)
  CBT         32%
  IPT         23%
  Meds/CM     33%
  Placebo/CM  40%
Early terminators were more depressed at pre-test than completers.

Which group to use in outcome analysis?
• Total: N = 239
• Completers: N = 155 (15 weeks or 12 sessions)
• End Point: N = 204 (at least 3.5 weeks or 4 sessions)
• End Point, Intent-to-Treat Group: N = 239 (last assessment or pre-test)

Assessment Times
• Pre-treatment
• Post-treatment
• 4, 8, 12 weeks
• Termination: 15 weeks
• Follow-up: 6, 12, 18 months

Analyses of Pre-test/Post-test (1)
Paired t-tests to examine differences between pre-test and post-test scores (p. 974)
How many?
• 4 treatment groups: CBT, IPT, IMI-CM, PLA-CM
• X 4 outcome measures: HRSD, GAS, BDI, HSCL-90
• X 3 samples: Completers; End Point 204; End Point 239

Table 1. Completer Group: at least 12 sessions; n = 155 (p. 975)

Findings: T-Tests (p. 974, right)

Analyses of Post-test Scores
• Use the pre-test as a covariate in analyses of covariance (ANCOVAs) to compare mean post-test scores across the 4 treatment groups
• Calculate a residualized change score: the amount of variability in the post-test that is not associated with the pre-test score
• Used p < .10 for the ANCOVAs and p = .10/6 = .0167 (rounded to .017) for the 6 pairwise comparisons, a Bonferroni correction (p. 974)

ANCOVAs: Post-test Scores
• Statistically significant differences between groups in scales at post-test
• Four 3 X 4 ANCOVAs, 3 (sites) X 4 (treatment groups): differences across treatments in post-treatment scores on HRSD, GAS, BDI, HSCL-90
• Analyses reported only for treatment groups, combining them across sites

Covariates
• Pre-test scores
• Marital status (1, 2)
Why not MANCOVAs?
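The test counts and significance thresholds on the preceding analysis slides can be checked with a few lines of arithmetic. The sketch below is only an illustration: the group, measure and sample labels come from the slides, while the variable names and the helper `passes_bonferroni` are hypothetical and not part of the study.

```python
from itertools import combinations

# Labels taken from the slides.
treatments = ["CBT", "IPT", "IMI-CM", "PLA-CM"]
measures = ["HRSD", "GAS", "BDI", "HSCL-90"]
samples = ["Completers (n=155)", "End Point (n=204)", "End Point (n=239)"]

# One paired t-test per treatment group, per outcome measure, per sample.
n_paired_t_tests = len(treatments) * len(measures) * len(samples)

# Every unordered pair of treatment groups is compared after an ANCOVA.
pairwise_comparisons = list(combinations(treatments, 2))

FAMILY_ALPHA = 0.10  # level quoted on the slides for the ANCOVAs
per_comparison_alpha = FAMILY_ALPHA / len(pairwise_comparisons)


def passes_bonferroni(p_value, alpha=per_comparison_alpha):
    """Hypothetical helper: True when p meets the Bonferroni-adjusted level."""
    return p_value <= alpha


print("paired t-tests:", n_paired_t_tests)
print("pairwise comparisons:", len(pairwise_comparisons))
print("per-comparison alpha:", round(per_comparison_alpha, 4))
```

Dividing the family-wise level by the number of comparisons is the plain Bonferroni rule named on the slide; a pairwise p-value only slightly above .0167 would therefore be read as a trend rather than a significant difference.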
(Covariates: p. 973)

Table 1. Completer Group: at least 12 sessions; n = 155 (p. 975)
• p < .10
• BDI: no significant differences in pairwise comparisons

Table 1. End Point 239 Group: CBT, IPT, IMI-CM, PLA-CM
• p < .10

Findings: Pairwise Comparisons
Completer (N = 155)
• Self-report: BDI, pairwise NS; HSCL-90-T, IMI-CM < PLA-CM (p = .006)
EP-204
• Clinical evaluator: GAS, IMI-CM < PLA-CM (trend: p = .020 against the .017 criterion)
EP-239
• Clinical evaluator: HRSD, IPT and IMI-CM < PLA-CM (trend: p = .017, .018); GAS, IMI-CM < PLA-CM (p = .010)

Measuring Change (Elkin et al., 1989)
• Statistical significance: differences between groups in scales at post-test, controlling for pre-test scores
• Clinical significance (recovery analysis): percentage of participants who changed from a dysfunctional to a functional level (using cut-off scores)

Clinical Significance: Recovery Analysis
Proportion of patients who improved vs. not improved
Cut-off scores
• Not depressed: HRSD < 6 and BDI < 9
• Depressed: HRSD > 6 or BDI > 9
Statistical analyses
• Chi-square: proportion of depressed and nondepressed patients across treatment groups at termination

End Point 239, HRSD (p = .04)
Proportion of cases that met recovery criteria:
  CBT     36% (ns)
  IPT     43%
  IMI-CM  42%
  P-CM    21%
The chi-square (χ²) test examines whether the proportion in each group is what would be expected by chance, or whether it is larger or smaller than expected.
• IPT = IMI-CM > Placebo-CM
• CBT: the % comparison was not significant for any group

Completer Group, HRSD
Proportion of cases that met recovery criteria:
  CBT     51%
  IPT     55%
  IMI-CM  57%
  P-CM    29%
• IPT, IMI-CM > Placebo-CM

Secondary Analyses
To examine the effect of pre-treatment severity (HRSD/GAS) on outcome by treatment group
• DVs: post-treatment scores
• Severity criteria: HRSD > 20 (44% of sample); GAS < 50 (41%)
• Covariate: marital status

2 X 4 ANCOVA (severity X treatment)
DVs: post-test HRSD, GAS, BDI, HSCL-90
• Main effect for severity: more severe at pre-test (HRSD > 20; GAS < 50) vs. less severe at pre-test
• Main effect for treatment: CBT, IPT, IMI-CM, P-CM
• Severity X treatment (interaction term)***

Interaction Effect: HRSD Severity X Treatment Group (p. 976)
Dependent variables: HRSD*, GAS, BDI, HSCL-90
[Figure: panels for the Completer*, End Point 204* and End Point 239^ samples; severity levels High HRSD and Low HRSD; groups CBT, IPT, IMI-CM, P-CM]
4 sets of 3 2 X 4 ANCOVAs: 4 DVs, 3 sample subgroups
*p < .10; ^p < .11

Interaction Effect: GAS Severity X Treatment Group
Dependent variables: HRSD, GAS, BDI, HSCL-90
[Figure: panels for the Completer**, End Point 204**** and End Point 239* samples; severity levels High GAS and Low GAS; groups CBT, IPT, IMI-CM, P-CM]

Treatment by Severity Interaction, End Point 204 sample
[Figure: two panels, one labeled "Higher score: negative outcome" and one labeled "Higher score: positive outcome"]

Summary: all pairwise analyses following interaction effects (p. 976)
• Less severe groups: no differences across treatment groups
• More severe groups: IPT more effective than PLA-CM in 3 instances, all on the HRSD measure in the End Point 204 sample (3 out of 4 comparisons); IMI-CM more effective than PLA-CM across a number of measures (8 out of 10 comparisons)

Figure 2. Recovery Rates (%), End Point 204 sample, for severity groups (p. 977)
• Less severe subgroups: NS differences among treatments for all samples with HRSD or GAS
• More severe subgroups for HRSD and GAS: consistent findings across the three samples; IPT > PLA-CM in 5/6 and IMI-CM > PLA-CM in 6/6

Threats to Statistical Conclusion Validity
Are the observed relations among variables accurate?
Power; Unreliability of Measures
• Large N by group: range 34-62 (+)
• Outcome measures are well-known (+)
• Power analyses: 81-95% for medium effects (+)
• p < .10 for ANCOVAs and .10/6 for pairwise comparisons
Unreliability of Treatment Implementation
• Experienced therapists: 2-27 yrs, mean = 11 (+)
• Manuals, training per treatment group (+)
• Closely monitored (+)
• Taped sessions: 95% correctly classified (+)
Extraneous Variance in the Experimental Setting
• Not known for the most part
• 28 therapists with 3-11 patients each
• No way to control for therapist effects
• p. 980: one site CBT, another site IPT, similar to Meds/CM
Heterogeneity of Participants
• Random assignment to groups (+)
• Only included 45% of those screened (+)
• Mostly women: 70% female (+)
• 89% white participants (+)

Threats to Internal Validity
Can we conclude that there is a causal relation between the IV and the DV? Did treatment cause differences in the DV across groups?
• Selection: who gets assigned to which group?
• History
• Attrition: what do we know about drop-outs?
• Repeated Testing Effects
• Reaction to Control Group Assignment

Threats to Internal Validity
Can we conclude that there is a causal relation between the IV and the DV?
Selection
• Used randomization. See factors under Heterogeneity of Participants.
History
• Time frame of study not reported
• Did therapy happen at about the same time for everyone?
Attrition
• Relatively high attrition rate, 32%; about 25% was for negative reasons related to treatment (-)
• Early terminators were more depressed at intake (-)
Repeated Testing Effects
• Tested at frequent intervals: pre-test; 4, 8, 12 weeks; termination; 6-, 12- and 18-month follow-up
Reaction to Control Group Assignment
• Not known, but could be the case. Placebo/CM experienced the highest attrition: CBT 32%, IPT 23%, Meds/CM 33%, Placebo/CM 40%

Threats to Construct Validity
To what extent do the variables capture the desired constructs?
Mono-Operation Bias (instruments)
• Used 4 different outcome measures: HRSD, BDI, GAS, HSCL-90 (+)
• Measures of well-known psychometric properties (+)
Mono-Method Bias
• Used both patient self-report and clinician-completed measures (+)
• Measures of well-known psychometric properties (+)
Experimenter Expectancies
• Clinicians not blind to therapy modality
• Psychiatrists blind to medication condition (+)

Threats to External Validity
Can we generalize observed relations across persons, settings and times?
Person-Units; Outcome Measures; Settings
• Highly selected sample (-)
• Only 45% of those screened were selected (-)
• Generalizable to white (89%) women (70%), highly educated (75% with a college degree or some college), who were less severely depressed (p. 974)
• Interview and self-report measures (+)
• Clinical significance recovery rates (+)
• Statistically significant findings were not consistent across measures: HRSD detected more differences in depression than BDI
• Empirical question?

Results: Summary 1/3
Paired t-tests showed stat. sig. differences (p < .001) between pre- and post-test scores on all measures for all three groups of participants (even placebo pill/CM):
• Intent-to-treat
• Completers of a minimum of 3.5 weeks
• Completers of all or most sessions: at least 12 sessions or 15 weeks (n = 155)

Results: Summary 2/3
• ANCOVAs showed no stat. sig. differences in pre-test scores on any measure for any treatment group
• Stat. sig. differences in post-test scores: BDI/HSCL-90 (Completers); HRSD/GAS (Total group, 239)

Results: Summary 3/3
Pairwise follow-up to ANCOVAs
• HSCL-90: IMI-CM > PLA-CM (Completers)
• GAS: IMI-CM > PLA-CM (Total 239 group)
• HRSD: IPT, IMI-CM > PLA-CM, trend (Total 239)
Recovery findings (clinical significance), post-test HRSD < 6
• IPT (43%), IMI-CM (42%) > PLA-CM (21%) (End Point 239)
• CBT = 36%, NS
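As a rough illustration of the chi-square comparison behind the recovery findings, the sketch below runs a Pearson chi-square test on a 4 X 2 table (treatment group by recovered / not recovered). The recovery percentages are the End Point 239 HRSD figures quoted in the slides. The group size of 60 per arm is a hypothetical placeholder, because the slides do not give per-group ns, so the result is not expected to reproduce the reported p = .04. All function and variable names are illustrative.

```python
import math

# Recovery rates (HRSD, End Point 239 sample) as quoted on the slides.
recovery_rate = {"CBT": 0.36, "IPT": 0.43, "IMI-CM": 0.42, "PLA-CM": 0.21}

# ASSUMPTION: hypothetical equal group size; the slides give no per-group ns.
HYPOTHETICAL_N_PER_GROUP = 60


def build_table(rates, n_per_group):
    """Return one [recovered, not recovered] row of counts per group."""
    table = []
    for rate in rates.values():
        recovered = round(rate * n_per_group)
        table.append([recovered, n_per_group - recovered])
    return table


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def chi_square_p_value_df3(statistic):
    """Upper-tail p-value for a chi-square variable with exactly 3 df."""
    half = statistic / 2.0
    tail = 2.0 / math.sqrt(math.pi) * math.sqrt(half) * math.exp(-half)
    return math.erfc(math.sqrt(half)) + tail


table = build_table(recovery_rate, HYPOTHETICAL_N_PER_GROUP)
statistic, df = pearson_chi_square(table)
p_value = chi_square_p_value_df3(statistic)
print(f"chi-square({df}) = {statistic:.2f}, p = {p_value:.3f}")
```

The closed-form p-value applies only to 3 degrees of freedom, which is what a 4 X 2 table yields. With the study's real group sizes, or for tables of other shapes, a statistics library routine such as `scipy.stats.chi2_contingency` would be the more natural choice.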