Class 11 & 12

Chapter 5 & Elkin et al. (1989)
Threats to Statistical Conclusion Validity
Are the observed relations among variables accurate?
Power
Unreliability of Measures
Introduces error variance
Attenuates Correlations
Unreliability of Treatment
Implementation
Specificity- Active ingredients
Fidelity of delivery
Competency
Extraneous Variance in the Experimental Setting
Heterogeneity of Participants
2
Threats to Internal Validity
Can we conclude that there is a causal relation between the IV and
the DV? Did treatment cause differences in DV across groups?
Selection
Inclusion –Exclusion criteria &
Who gets assigned to which group?
History
Attrition
What do we know about drop-outs?
Repeated Testing Effects
Reaction to Control Group Assignment
Double-blind designs in pharmaceutical studies
Placebo effects: are non-specific factors or the active ingredient responsible for observations?
Houston study
3
Department of Veterans Affairs (VA) and Baylor College of Medicine, Houston
• 180 osteoarthritis and knee pain patients randomly assigned to (New England J of Medicine, 2002):
  • Debridement: worn, torn cartilage is cut and removed with a viewing tube called an arthroscope
  • Arthroscopic lavage: bad cartilage is flushed out
  • Simulated arthroscopic surgery: small incisions were made, but no instruments were inserted and no cartilage removed
4
Findings
• During two years of follow-up:
  • Patients in all three groups reported moderate improvements in pain and ability to function.
  • Intervention groups did not report less pain or better function than the placebo group.
  • Placebo patients reported better outcomes than the debridement patients at certain points during follow-up.
• Patients were blind to type of surgery
5
Threats to Construct Validity
To what extent variables capture desired constructs
Mono-Operation Bias
(Instruments)
Mono-Method Bias
Experimenter
Expectancies
Self-Report
Clinician rated
Allegiance Effect
6
Threats to External Validity
Can we generalize observed relations across persons,
settings and times
Person-Units
Outcome Measures
Settings
7
Elkin et al: Purpose
 Test feasibility of the collaborative
clinical trial model
 Examine relative efficacy of CBT,
IPT, and Medication for Depression
8
NIMH Treatment of Depression Collaborative Research Program
• U. of Pittsburgh
• George Washington U.
• U. of Oklahoma
• 250 patients: major depressive disorder
• 28 therapists: years of experience 2-27
  • 10 psychologists
  • 18 psychiatrists
  • 71% male
9
Experimental Between-Group Designs
1. Post-Test Only Control
2. Pre-Test -- Post-Test Control
3. Solomon Four Group
(combination of 1 and 2 above)
 Factorial Design

more than one independent variable; interactions
treatment X therapist
or patient characteristic
 Dependent Sample Design (Matching)
10
Experimental Between-Group Designs
1. Post-Test Only Control
2. Pre-Test -- Post-Test Control
3. Solomon Four Group
(combination of 1 and 2 above)
 Factorial Design - Post Hoc

more than one independent variable; interactions

treatment X patient characteristic (depression level at intake)
 Dependent Sample Design (Matching)
11
IVs: Experimental Groups:
• Cognitive Behavioral Therapy
• Interpersonal Therapy
  16 individual sessions / 50 min.
• Medication + Clinical Management*
• Pill-Placebo + Clinical Management*
  1st session 55 min.; then 20 to 25 min.
* Minimal supportive therapy condition
12
Dependent Variables
Clinical Evaluator
Self Report
13
Dependent Variables
Clinical Evaluator
• Hamilton Rating Scale
Depression (HRSD)
• Global Assessment
Scale (GAS)
Self Report
• Beck Depression
Inventory (BDI)
• Hopkins Symptom
Checklist (HSCL-90)
14
Outcome Research Strategies
 Primary Analyses
 Secondary Analyses (Post-Hoc)
15
1. Treatment Package Strategy
2. Dismantling Strategy
3. Constructive Strategy
4. Parametric Strategy (structural components)
5. Comparative Outcome Strategy
6. Client and Therapist Variation Strategy: Moderation Designs
Outcome Research Strategies
• Primary Analyses
  • Treatment package
  • Comparative
• Secondary Analyses
  • Client Variation: moderation effect?
17
Outcome Research Strategies
• Secondary Analyses
  • Client Variation: moderation effect
Depression level at intake as a moderator of differences in outcomes between treatment groups
Were outcomes across treatment groups different for patients with higher versus lower levels of depression at pre-test?
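A moderation question of this kind can be pictured as a difference of differences in cell means. The sketch below is only descriptive (it is not the ANCOVA interaction test), and every score in it is hypothetical, invented for illustration rather than taken from Elkin et al.

```python
from statistics import mean

# Hypothetical post-test depression scores (lower = better); NOT study data.
scores = {
    ("treatment", "more severe"): [9, 11, 10],
    ("placebo", "more severe"): [17, 15, 16],
    ("treatment", "less severe"): [7, 8, 6],
    ("placebo", "less severe"): [8, 7, 9],
}

# Mean post-test score in each treatment x severity cell.
cell_means = {cell: mean(values) for cell, values in scores.items()}


def treatment_effect(severity):
    """Placebo mean minus treatment mean within one severity stratum."""
    return cell_means[("placebo", severity)] - cell_means[("treatment", severity)]


effect_more = treatment_effect("more severe")
effect_less = treatment_effect("less severe")

# A non-zero contrast means the treatment effect differs by severity,
# which is what a severity x treatment interaction term would test.
interaction_contrast = effect_more - effect_less
print(effect_more, effect_less, interaction_contrast)
```

If the two stratum effects were equal, the contrast would be zero and severity would not be acting as a moderator in this descriptive sense.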
18
Control Groups
• CBT
• IPT
• Medication + Clinical Management*
• Pill-Placebo + Clinical Management*
* Minimal supportive therapy condition
19
Treatments & Therapists
• Cognitive Behavioral Therapy, Interpersonal Therapy: each delivered by a different group of experienced therapists
• Medication + Clinical Management, Pill-Placebo + Clinical Management: same therapists (psychiatrists)
20
Treatments & Therapists
• Cognitive Behavioral Therapy, Interpersonal Therapy: each delivered by a different group of experienced therapists (potential confound)
• Medication + Clinical Management, Pill-Placebo + Clinical Management: same therapists, psychiatrists (safeguards internal validity, undermines generalizability)
21
Ensure Valid Treatments
 Specify the treatment(s)
 Therapist training/monitoring
 Fidelity Checks
22
Ensure Valid Treatments
 Specify the treatment(s)
 Manuals
 Therapist training/monitoring
 Fidelity Checks- therapy tapes


Collaborative Study Psychotherapy Rating Scale
(CSPRS):
Taped treatments could be discriminated 95% of
the time
23
Attrition (<12 sessions or <15 weeks)
Total: 77/239 (32%)
• CBT: 32%
• IPT: 23%
• Meds/CM: 33%
• Placebo/CM: 40%
Early terminators more depressed at pre-test than completers.
24
Which group to use in outcome analysis?
Total N = 239
• Completers: N = 155 (15 weeks or 12 sessions)
• End Point: N = 204 (at least 3.5 weeks or 4 sessions)
• End Point, Intent to Treat Group: N = 239 (last assessment or pre-test)
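The three analysis samples are nested, which a small membership function can make concrete. This is a sketch only: the function name and sample labels are hypothetical, and it reads the slide's "or" literally, so the article's exact inclusion rule should be checked before relying on it.

```python
def sample_membership(weeks, sessions):
    """Return the analysis samples a patient who entered treatment falls into.

    Thresholds follow the slide wording read literally (an assumption):
    End Point 204 = at least 3.5 weeks or 4 sessions,
    Completers = 15 weeks or 12 sessions.
    """
    # Everyone who entered treatment is in the intent-to-treat sample.
    samples = ["End Point 239 (intent to treat)"]
    if weeks >= 3.5 or sessions >= 4:
        samples.append("End Point 204")
    if weeks >= 15 or sessions >= 12:
        samples.append("Completers")
    return samples


# Hypothetical patients, for illustration only.
print(sample_membership(weeks=2, sessions=2))
print(sample_membership(weeks=5, sessions=5))
print(sample_membership(weeks=16, sessions=14))
```

Because every completer is also in both end-point samples, results from the three samples are not independent replications.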
25
Assessment Times
Pre treatment
Post Treatment
• 4, 8, 12 weeks
• Termination: 15 weeks
• Follow up: 6, 12, 18 months
26
Analyses of Pre-test/Post-test (1)
• Paired t-test to examine differences between pre-test and post-test scores (p. 974)
• How many?
27
Table 1 Completer Group:
At least 12 sessions; n=155 (page 975)
28
Analyses of Pre-test/Post-test (1)
• Paired t-test to examine differences between pre-test and post-test scores (p. 974)
• How many?
  4 treatment groups (CBT, IPT, IMI-CM, Pla-CM)
  X 4 outcome measures (HRSD, GAS, BDI, HSCL-90)
  X 3 samples (Completers; End Point 204; End Point 239)
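The "how many" question is a simple product, sketched below. The familywise error figure is an added illustration under two assumptions that are not in the slides: a per-test alpha of .05 and independent tests (these tests are not independent, since the samples overlap).

```python
from itertools import product

treatments = ["CBT", "IPT", "IMI-CM", "PLA-CM"]
measures = ["HRSD", "GAS", "BDI", "HSCL-90"]
samples = ["Completers (n=155)", "End Point 204", "End Point 239"]

# One paired t-test per treatment x measure x sample combination.
tests = list(product(treatments, measures, samples))
n_tests = len(tests)
print(n_tests)

# Assumed for illustration: alpha = .05 per test, tests independent.
alpha = 0.05
familywise_error = 1 - (1 - alpha) ** n_tests
print(round(familywise_error, 3))
```

The point of the count is that running this many tests without correction makes at least one false positive very likely.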
29
Findings – T-Tests
 P.974 right
30
31
Analyses of Post-test scores
• Use pre-test as a covariate in analyses of covariance (ANCOVAs) to compare mean post-test scores across the 4 treatment groups
• Calculate a residualized change score: the amount of variability in the post-test that is not associated with the pre-test score
• Used p<.10 in the ANCOVAs and p = .10/6 = .0167 ≈ .017 for the 6 pair-wise comparisons (Bonferroni correction, p. 974)
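The Bonferroni arithmetic can be checked in a few lines. The group labels and the .10 level come from the slides; the variable names and the example p-values are illustrative.

```python
from math import comb

groups = ["CBT", "IPT", "IMI-CM", "PLA-CM"]
omnibus_alpha = 0.10

# Number of distinct pairs among 4 groups: C(4, 2).
n_pairs = comb(len(groups), 2)

# Bonferroni: divide the overall alpha by the number of comparisons.
pairwise_alpha = omnibus_alpha / n_pairs
print(n_pairs, round(pairwise_alpha, 3))


def passes_bonferroni(p_value):
    """True if a pairwise p-value meets the corrected criterion."""
    return p_value <= pairwise_alpha


# Illustrative p-values only.
print(passes_bonferroni(0.006), passes_bonferroni(0.020))
```

A p-value between the corrected criterion and .10 would fall short of significance, which is how the slides use the word "trend".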
32
Table 1 Completer Group:
At least 12 sessions; n=155 (page 975)
33
ANCOVAS: Post test scores
• Statistically significant differences between groups in scales at post-test
• Four 3 X 4 ANCOVAs: differences across treatments in post-treatment scores in HRSD, GAS, BDI, HSCL-90
  • 3 (sites) X 4 (treatment groups)
  • Analyses reported only for treatment groups, combining them across sites
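One way to picture what using the pre-test as a covariate does is the residualized change score: regress post-test on pre-test and keep the residuals. The sketch below uses ordinary least squares with one predictor. The scores are hypothetical and the function name is illustrative; it does not reproduce the article's ANCOVA.

```python
from statistics import mean


def residualized_change(pre, post):
    """Return post-test residuals after regressing post on pre (simple OLS)."""
    pre_mean, post_mean = mean(pre), mean(post)
    sxx = sum((x - pre_mean) ** 2 for x in pre)
    sxy = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = post_mean - slope * pre_mean
    # Residual = observed post-test minus the post-test predicted from pre-test.
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical scores for six patients; NOT data from the study.
pre = [24, 18, 30, 21, 27, 19]
post = [10, 9, 20, 7, 15, 12]
residuals = residualized_change(pre, post)
print([round(r, 2) for r in residuals])
```

By construction the residuals average zero and are uncorrelated with the pre-test, so group differences in them cannot be attributed to where patients started.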
34
Co-Variates
• Pre-test scores
• Marital Status (1,2)
• Why not MANCOVAs? (p. 973)
35
Table 1 Completer Group:
At least 12 sessions; n=155 (page 975) p<.10
• BDI: no significant differences in pair-wise comparisons
36
Table 1 End Point 239 Group
CBT
IPT
IMI-CM
PLA-CM p<.10
37
Findings Pair-wise Comparisons
Completer (N = 155)
• Self-Report, BDI: pairwise NS
• Self-Report, HSCL-90-T: IMI-CM<PLA-CM (p=.006)
EP-204
• Clinical Evaluator, GAS: IMI-CM<PLA-CM (trend, p=.020 vs. .017)
EP-239
• Clinical Evaluator, HRSD: IPT, IMI-CM<PLA-CM (trend, p=.017, .018)
• Clinical Evaluator, GAS: IMI-CM<PLA-CM (p=.010)
38
Measuring Change Elkin et al. 1989
 Statistical significance
 Clinical significance
 Recovery Analysis
39
Measuring Change Elkin et al. 1989
 Statistical significance
 Differences between groups in scales at
post-test controlling for pre-test scores
 Clinical significance
 Percentage of participants that changed
from dysfunctional to functional level (using
cut-off scores)
40
Clinical Significance
 Recovery Analysis

Proportion of patients who improved vs. not
improved
 Cut Off Scores


Not Depressed: HRSD < 6 and BDI < 9
Depressed: HRSD > 6 or BDI > 9
 Statistical Analyses

Chi square: Proportion of depressed and nondepressed patients across treatment groups at
termination.
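The chi-square comparison of recovery proportions can be sketched by hand for a 2 x k table (recovered vs. not recovered, by treatment group). The counts below are hypothetical, since the slides report percentages without per-group counts, and the function name is illustrative.

```python
def chi_square_statistic(recovered, totals):
    """Pearson chi-square for a 2 x k table of recovered vs. not recovered."""
    grand_total = sum(totals)
    overall_rate = sum(recovered) / grand_total
    statistic = 0.0
    for r, n in zip(recovered, totals):
        not_recovered = n - r
        # Expected counts if every group shared the overall recovery rate.
        expected_r = n * overall_rate
        expected_nr = n * (1 - overall_rate)
        statistic += (r - expected_r) ** 2 / expected_r
        statistic += (not_recovered - expected_nr) ** 2 / expected_nr
    dof = len(totals) - 1  # (2 - 1) * (k - 1)
    return statistic, dof


# Hypothetical counts for illustration only; NOT the study's data.
groups = ["CBT", "IPT", "IMI-CM", "PLA-CM"]
recovered = [22, 26, 25, 13]
totals = [60, 60, 60, 60]

statistic, dof = chi_square_statistic(recovered, totals)
print(round(statistic, 2), dof)
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom. A significant omnibus result says only that the proportions differ somewhere, so pairwise follow-ups are still needed to say which groups differ.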
42
43
End Point 239 HRSD (p = .04)
Proportion of cases that met recovery criteria:
• CBT: 36% (ns)
• IPT: 43%
• IMI-CM: 42%
• P-CM: 21%
 Chi Square (Χ2) tests to what extent the proportion in each
group is what may be expected by chance or if it is larger or
smaller than expected…….
 IPT = IMI-CM>Placebo-CM
 CBT - % comparison was not sig. for any group
44
Completer Group on HRSD
Proportion of cases that met recovery criteria:
• CBT: 51%
• IPT: 55%
• IMI-CM: 57%
• P-CM: 29%
 Chi Square (Χ2) tests to what extent the proportion in each
group is what may be expected by chance or if it is larger or
smaller than expected…….
 IPT, IMI-CM>Placebo-CM
45
Secondary Analyses
• To examine effect of pre-treatment severity (HRSD/GAS) on outcome by treatment group
• DVs: Post-treatment scores
• Severity Criteria
  • HRSD>20: 44% of sample
  • GAS<50: 41%
• Covariate: Marital Status
46
2X4 ANCOVA (severity x treatment)
DVs- Post Test HRSD, GAS, BDI, HSCL-90
 Main Effect for
 Main Effect for
 (Interaction term)***
47
2X4 ANCOVA (severity x treatment)
DVs: Post-test HRSD, GAS, BDI, HSCL-90
• Main Effect for Severity
  • More severe: pre-test HRSD>20; GAS<50
  • Less severe pre-test
• Main Effect for Treatment
  • CBT
  • IPT
  • IMI-CM
  • P-CM
• Severity X Treatment (interaction term)***
48
Interaction Effect HRSD Severity x TG
Dependent Variables: HRSD*, GAS, BDI, HSCL-90 (p.976)
[Figure: treatment groups (CBT, IPT, IMI-CM, P-CM) compared within high and low HRSD severity for the Completer*, End Point 204*, and End Point 239^ samples]
4 sets of 3 2X4 ANCOVAs: 4 DVs, 3 sample subgroups; *p<.10; ^p<.11
49
Interaction Effect GAS Severity x TG:
Dependent Variables: HRSD, GAS, BDI, HSCL-90
[Figure: treatment groups (CBT, IPT, IMI-CM, P-CM) compared within high and low GAS severity for the Completer**, End Point 204****, and End Point 239* samples]
50
Treatment by Severity Interaction/end-point 204 sample
Higher score Negative Outcome
Higher Score Positive Outcome
51
Summary All Pairwise analyses following interaction effects p.976
• Less severe groups: no differences across treatment groups
• More severe groups
  • IPT more effective than PLA-CM in 3 instances, all on the HRSD measure in the End Point 204 sample (3 out of 4 comparisons)
  • IMI-CM more effective than PLA-CM across a number of measures (8 out of 10 comparisons)
52
Figure 2
Recovery Rates (%) endpoint /204 sample
53
Figure 2 Recovery Rates (%) endpoint /204
sample for severity groups (p.977)
 Less severe subgroups: NS differences among
treatments for all samples with HRSD or GAS.
 More severe subgroups for HRSD and GAS:
 Consistent findings across the three samples

IPT>PLA-CM 5/6 and IMI-CM>PLA-CM 6/6
54
Threats to Statistical Conclusion Validity
Are the observed relations among variables accurate?
Power and Unreliability of Measures
• Large N by group, range 34-62 +
• Outcome measures are well-known +
• Power analyses: 81-95% for medium effects +
• p<.10 for ANCOVAs and .10/6 for pairwise comparisons
Unreliability of Treatment Implementation
• Experienced therapists: 2-27 yrs, mean = 11 +
• Manuals, training per treatment group +
• Closely monitored +
• Taped sessions: 95% correctly classified +
Extraneous Variance in the Experimental Setting
• Not known for the most part
• 28 therapists with 3 to 11 patients each
• No way to control for therapist effects
• p. 980: one site CBT, another site IPT, similar to Meds/CM
Heterogeneity of Participants
• Random assignment to groups +
• Only included 45% of those screened +
• Mostly women: 70% female +
• 89% white participants +
56
Threats to Internal Validity
Can we conclude that there is a causal relation between the IV and the DV?
Selection
• Used randomization
• See factors under Heterogeneity of Participants
History
• Time frame of study not reported
• Did therapy happen at about the same time for everyone?
Attrition
• Relatively high attrition rate of 32%; about 25% was for negative reasons related to treatment (-)
• Early terminators were more depressed at intake (-)
Repeated Testing Effects
• Tested at frequent intervals: pre-test; 4, 8, and 12 weeks; termination; and 6-, 12-, and 18-month follow-up
Reaction to Control Group Assignment
• Not known, but could be the case.
• Placebo/CM experienced the highest attrition: CBT 32%, IPT 23%, Meds/CM 33%, Placebo/CM 40%
58
Threats to Construct Validity
To what extent variables capture desired constructs
Mono-Operation Bias (Instruments)
• Used 4 different outcome measures: HRSD, BDI, GAS, HSCL-90 +
• Measures of well-known psychometric properties +
Mono-Method Bias
• Used both patient self-report and clinician-completed measures +
• Measures of well-known psychometric properties +
Experimenter Expectancies
• Clinicians not blind to therapy modality
• Psychiatrist blind to Med condition +
60
Threats to External Validity
Can we generalize observed relations across persons, settings and times
Person-Units
• Highly selected sample (-)
• Only 45% of those screened were selected (-)
• Generalizable to white (89%), highly educated (75% college degree or some college) women (70%) who were less severely depressed (p.974)
Outcome Measures
• Interview and self-report measures +
• Clinical significance recovery rates +
• Statistically significant findings were not consistent across measures: HRSD detected more differences in depression than BDI
Settings
• Empirical question?
62
Results: Summary 1/3
• Paired t-tests showed statistically significant differences (p<.001) in pre-post scores on all measures for all three groups of participants (even placebo pill/CM):
  • Intent-to-treat
  • Minimum completers: at least 3.5 weeks or 4 sessions
  • Completers of all or most sessions: at least 12 sessions or 15 weeks (n=155)
63
Results: Summary 2/3
• ANCOVAs showed
  • No statistically significant differences in pre-test scores in any measure for any treatment group
  • Statistically significant differences in post-test scores
    • BDI/HSCL-90: Completers
    • HRSD/GAS: Total Group (239)
Results: Summary 3/3
• Pairwise Follow-up ANCOVA
  • HSCL-90: IMI-CM > PLA-CM (Completer)
  • GAS: IMI-CM > PLA-CM (Total 239 group)
  • HRSD: IPT, IMI-CM > PLA-CM, trend (Total 239)
• Recovery Findings (Clinical Significance), post-test HRSD<6
  • IPT (43%), IMI-CM (42%) > PLA-CM (21%) (End-Point 239)
  • CBT = 36%, NS
65