Notes 5 - Wharton Statistics Department

Stat 921 Notes 5
Reading:
Observational Studies, Chapter 2.7-2.8
I. Addendum to Notes 4
This plot illustrates how the rank sum test statistic for the
additive treatment effect model, $r_{Tsi} = r_{Csi} + \tau$, computed
from the adjusted responses $R - \tau_0 Z$, is decreasing in $\tau_0$.
II. Hodges-Lehmann Estimates (Section 2.7.2)
Hodges and Lehmann (1963, Annals of Mathematical Statistics)
developed a general method for forming point estimates for the
additive treatment effect model from a test statistic.
For the additive treatment effect model, consider a test statistic
$t(Z, R - \tau_0 Z)$ for testing $H_0: \tau = \tau_0$. That is, we subtract the
hypothesized treatment effect $\tau_0 Z$ from the observed responses
$R$ and ask whether the adjusted responses $R - \tau_0 Z$ appear to be
free of a treatment effect. The Hodges-Lehmann estimate of $\tau$
is the value $\hat{\tau}$ such that the adjusted responses $R - \hat{\tau} Z$ appear to
be exactly free of a treatment effect.
Suppose we can determine the expectation, say $\bar{t}$, of the
statistic $t(Z, R - \tau Z)$ when calculated using the correct $\tau$, that
is, when calculated from responses $R - \tau Z = r_C$ that have been
adjusted so they are free of a treatment effect. For example, in
an experiment with a single stratum and $m$ of $N$ units treated,
the rank sum statistic has expectation $\bar{t} = m(N+1)/2$ if the
treatment has no effect. This is true because, in the absence of a
treatment effect, the rank sum statistic is the sum of $m$
scores randomly selected from $N$ scores whose mean is
$(N+1)/2$.
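The expectation $m(N+1)/2$ can be checked numerically for small cases. The following is a minimal sketch, not part of the original notes: the function name rank_sum_null_mean and the sizes N = 6, m = 2 are illustrative assumptions. It enumerates every choice of m treated units and averages their rank sums, which is the null expectation when each assignment is equally likely.

```r
# Average the rank sum over all choose(N, m) equally likely assignments.
# combn(N, m, FUN = sum) sums each subset of m ranks drawn from 1..N.
rank_sum_null_mean <- function(N, m) {
  mean(combn(N, m, FUN = sum))
}

N <- 6; m <- 2                # illustrative sizes (an assumption)
rank_sum_null_mean(N, m)      # enumerated null expectation
m * (N + 1) / 2               # formula from the text
```

The two printed values should agree for any small N and m where enumeration is feasible.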
Roughly speaking, the Hodges-Lehmann estimate is the solution
to the equation $t(Z, R - \hat{\tau} Z) = \bar{t}$, that is, the $\hat{\tau}$ such that the
adjusted responses $R - \hat{\tau} Z$ appear to be exactly free of a
treatment effect in the sense that the test statistic
$t(Z, R - \hat{\tau} Z)$ exactly equals its expectation in the absence of an
effect.
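Technical complications arise because there might be no $\tau_0$, or
more than one $\tau_0$, for which $T(\tau_0) = \bar{t}(\tau_0)$, where
$T(\tau_0) = t(Z, R - \tau_0 Z)$ is the statistic computed from the adjusted
responses and $\bar{t}(\tau_0)$ is its expectation in the absence of a
treatment effect. To resolve these complications, the Hodges-Lehmann
estimator is defined for $T(\tau_0)$ a decreasing function as

$$\hat{\tau}_{HL} = \frac{\inf\{\tau_0 : \bar{t}(\tau_0) > T(\tau_0)\} + \sup\{\tau_0 : \bar{t}(\tau_0) < T(\tau_0)\}}{2}.$$

Roughly speaking, if no solution to $T(\tau_0) = \bar{t}(\tau_0)$ exists, average
the smallest $\tau_0$ that is too large and the largest $\tau_0$ that is too
small.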
For finding $\hat{\tau}$, it is useful to recall from Notes 4 that for an effect
increasing statistic, $t(Z, R - \hat{\tau} Z)$ is decreasing in $\hat{\tau}$, so $\hat{\tau}$ can be
found by the bisection method.
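A minimal sketch of the bisection idea for the rank sum statistic follows. It is an illustration under stated assumptions, not the notes' own code: the function names (rank_sum_adjusted, switch_point, hl_bisection), the tolerance, and the small data set are all hypothetical. Because the statistic is a step function of the hypothesized effect, the sketch runs two bisections, one for the largest value that is too small and one for the smallest value that is too large, and averages them.

```r
# Rank sum of the treated units after subtracting a hypothesized effect tau0
rank_sum_adjusted <- function(tau0, treated, control) {
  ranks <- rank(c(treated - tau0, control))   # average ranks if there are ties
  sum(ranks[seq_along(treated)])
}

# Bisection for the point where a monotone TRUE/FALSE condition switches.
# Requires cond(lo) to be TRUE and cond(hi) to be FALSE.
switch_point <- function(cond, lo, hi, tol) {
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    if (cond(mid)) lo <- mid else hi <- mid
  }
  (lo + hi) / 2
}

hl_bisection <- function(treated, control, tol = 1e-8) {
  m <- length(treated)
  N <- m + length(control)
  t_bar <- m * (N + 1) / 2                    # null expectation of the rank sum
  T_stat <- function(tau0) rank_sum_adjusted(tau0, treated, control)
  # At lo every adjusted treated response exceeds every control (T > t_bar);
  # at hi every adjusted treated response is below every control (T < t_bar).
  lo <- min(treated) - max(control) - 1
  hi <- max(treated) - min(control) + 1
  largest_too_small  <- switch_point(function(tau0) T_stat(tau0) >  t_bar, lo, hi, tol)
  smallest_too_large <- switch_point(function(tau0) T_stat(tau0) >= t_bar, lo, hi, tol)
  (largest_too_small + smallest_too_large) / 2
}

# Hypothetical data: approximately 4, the median of the 9 pairwise differences
hl_bisection(c(3, 7, 10), c(1, 2, 6))
```

Running two bisections rather than one matters when the number of pairwise differences is even, since the statistic then equals its expectation on a whole interval.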
For particular test statistics, there are other ways of computing
$\hat{\tau}$. For the rank sum statistic, Hodges and Lehmann (1963)
show that $\hat{\tau}$ is the median of the $m(N-m)$ pairwise differences
formed by taking each of the $m$ treated responses and subtracting
each of the $N-m$ control responses.
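The pairwise-difference form is short to write directly. This is a sketch with a hypothetical function name (hl_pairwise) and made-up data, using only base R.

```r
# Median of all m * (N - m) differences: treated response minus control response
hl_pairwise <- function(treated, control) {
  median(outer(treated, control, "-"))
}

# Hypothetical data: the 9 differences are -3, 1, 1, 2, 4, 5, 6, 8, 9
hl_pairwise(c(3, 7, 10), c(1, 2, 6))   # 4
```

The outer call builds the full m by (N - m) matrix of differences, so memory grows with the product of the group sizes.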
The wilcox.test function in R computes the Hodges-Lehmann
estimate based on the rank sum statistic when called with conf.int=TRUE:
wilcox.test(intrinsic,extrinsic,conf.int=TRUE)
Wilcoxon rank sum test with continuity correction
data: intrinsic and extrinsic
W = 404.5, p-value = 0.006431
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
1.000058 6.600008
sample estimates:
difference in location
3.499931
Warning messages:
1: In wilcox.test.default(intrinsic, extrinsic, conf.int = TRUE) :
cannot compute exact p-value with ties
2: In wilcox.test.default(intrinsic, extrinsic, conf.int = TRUE) :
cannot compute exact confidence intervals with ties
The effect of the intrinsic treatment is estimated to be 3.5.
Simulation study comparing the mean difference estimate $\hat{\tau}_{MD}$ to the
Hodges-Lehmann estimate $\hat{\tau}_{HL}$ based on the Mann-Whitney statistic
m = 25, N = 50, $\tau = 1$, 2000 simulations

Distribution of r_Ci     Bias                   Root Mean Square Error
                         MD         HL          MD          HL
N(0,1)                    0.001     -0.001        0.284      0.292
t with 3 df               0.008      0.005        0.475      0.358
Cauchy                   -9.408     -0.001      386.22       0.553
Exponential               0.002      0.010        0.289      0.190
Uniform                   0.002      0.002        0.082      0.088
Double Exponential       -0.001      0.002        0.394      0.331
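A study of this kind can be sketched in a few lines of R. The code below is an illustration only and is not the code that produced the table: the function name simulate_estimators, the seed, and the choice to treat the first m of N independent draws are assumptions, and the numbers it prints will differ from the table by simulation error.

```r
# Bias and root mean square error of the mean difference (MD) and the
# Hodges-Lehmann estimate (HL) under an additive effect tau.
# rdist(n) must return n draws of the response under control.
simulate_estimators <- function(rdist, m = 25, N = 50, tau = 1, nsim = 2000) {
  est <- replicate(nsim, {
    r_c <- rdist(N)                  # responses under control for all N units
    treated <- r_c[1:m] + tau        # first m units treated (draws are exchangeable)
    control <- r_c[(m + 1):N]
    c(MD = mean(treated) - mean(control),
      HL = median(outer(treated, control, "-")))
  })
  rbind(bias = rowMeans(est) - tau,
        rmse = sqrt(rowMeans((est - tau)^2)))
}

set.seed(921)                                     # arbitrary seed
simulate_estimators(rnorm)                        # N(0,1) responses
simulate_estimators(function(n) rt(n, df = 3))    # t with 3 df
```

Other rows of the table can be explored by passing rcauchy, rexp, or runif as rdist.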
III. Censored Outcomes
In some experiments, an outcome records the time to some
event.
- In a clinical trial, the outcome may be the time between a
  patient’s entry into the trial and the patient’s death.
- In a psychological experiment, the outcome may be the
  time lapse between administration of a stimulus by the
  experimenter and the production of a response by the
  subject.
- In a study of remedial education, the outcome may be the
  time until a certain level of proficiency in reading is
  reached.
Times may be censored in the sense that, when data analysis
begins, the event may not yet have occurred. The patient may
be alive at the close of the study. The stimulus may never elicit
a response. The student may not develop proficiency in reading
during the period under study.
If the event occurs for a unit after, say, 3 months, the unit’s
response is written 3. If the unit entered the study 3 months ago,
the event has not yet occurred, and the analysis is done
today, then the unit’s response is written 3+, signifying that the
event has not yet occurred.
Example:
Treatment: 3, 4+, 6, 8+
Control: 2, 5+, 7, 9
Gehan’s test statistic: $t(Z, r) = \sum_{s=1}^{S} \sum_{i=1}^{n_s} Z_{si} q_{si}$, where $q_{si}$ is the
number of units in stratum $s$ who definitely have outcomes less
than unit $i$ minus the number who definitely have outcomes
greater than unit $i$.
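For the example, each treated unit is compared with the control
units only; a comparison between two treated units adds 1 to one
unit and subtracts 1 from the other, so those comparisons cancel
in the sum:
For treated unit with response 3, contribution is 1-3=-2
For treated unit with response 4+, contribution is 1-0=1
For treated unit with response 6, contribution is 1-2=-1
For treated unit with response 8+, contribution is 2-0=2
Test statistic is -2+1-1+2=0.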
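The example can be reproduced with a short sketch in R. The helper names (definitely_less, gehan_scores) and the tie convention for censored times are assumptions made for illustration; the example data contain no ties, so the convention does not affect the result. The scores here count comparisons with every unit in the stratum, so the individual treated scores differ from a tally against the controls alone, but the total is the same because treated-versus-treated comparisons cancel.

```r
# TRUE if unit a definitely has a smaller outcome than unit b.
# a must be uncensored; if b is censored at tb, b's true time exceeds tb,
# so an observed ta <= tb is definitely smaller (assumed tie convention).
definitely_less <- function(ta, ca, tb, cb) {
  !ca && (ta < tb || (cb && ta <= tb))
}

# q_i = (# units definitely less than i) - (# units definitely greater than i)
gehan_scores <- function(time, censored) {
  n <- length(time)
  q <- numeric(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      if (definitely_less(time[j], censored[j], time[i], censored[i])) q[i] <- q[i] + 1
      if (definitely_less(time[i], censored[i], time[j], censored[j])) q[i] <- q[i] - 1
    }
  }
  q
}

# Treatment: 3, 4+, 6, 8+   Control: 2, 5+, 7, 9
time     <- c(3, 4, 6, 8, 2, 5, 7, 9)
censored <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
z        <- c(1, 1, 1, 1, 0, 0, 0, 0)

q <- gehan_scores(time, censored)
sum(z * q)   # 0, matching the test statistic in the example
```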
IV. Job Training Data: Comparison of Models
A good diagnostic for a treatment effect model is to compare
boxplots of the estimated potential responses under control for the
units that received treatment, evaluated at the Hodges-Lehmann
estimate, $r_C(\hat{\tau}_{HL}) \mid Z = 1$, with the responses of the units under control.
The boxplots should look very similar if the model is correct.
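# Additive Treatment Effect Model
wilcox.test(treated.r.jobtrain,control.r.jobtrain,conf.int=TRUE)
# Subtract the estimated additive effect (130.68) from the treated responses
boxplot(treated.r.jobtrain-130.68,control.r.jobtrain,
        names=c("Adjusted Treated","Control"),main="Additive Treatment Effect Model")
# Find Hodges-Lehmann estimate for Tobit model
# r_C=max(r_T-beta,0)
betagrid=seq(300,400,5)
pvalgrid=rep(0,length(betagrid))
for(i in 1:length(betagrid)){
  # Estimated responses under control for the treated units at this beta
  adjusted.treated=pmax(treated.r.jobtrain-betagrid[i],0)
  # Compare the adjusted treated responses with the observed control responses
  pvalgrid[i]=wilcox.test(adjusted.treated,control.r.jobtrain)$p.value
}
# The beta whose adjusted responses look most like the controls has the largest p-value
betagrid[which.max(pvalgrid)]
boxplot(pmax(treated.r.jobtrain-365,0),control.r.jobtrain,
        names=c("Adjusted Treated","Control"),main="Tobit model")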