Stat 921 Notes 2
Reading:
Observational Studies, Chapters 2.1-2.4.
I. Randomization Inference for Testing Hypothesis of No
Treatment Effect
Randomization Inference: Inference that only requires
randomization for its validity.
Fisher described randomization as the “reasoned basis” for
inference and the “physical basis of the validity of the test.”
Example: Researchers used 7 red and 7 black playing cards to
randomly assign 14 volunteer males with high blood pressure to
one of two diets for four weeks: a fish oil diet and a standard oil
diet. The reductions in diastolic blood pressure after four weeks
among the 14 men are shown below (based on a study by H.R.
Knapp and G.A. Fitzgerald, “The Antihypertensive Effects of
Fish Oil,” New England Journal of Medicine 320 (1989): 1037-1043):
Fish oil diet (T) : 8, 12, 10, 14, 2, 0, 0
Regular oil diet (C): -6, 0, 1, 2, -3, -4, 2
Consider the null hypothesis of no treatment effect:
$H_0: r_{Ti} = r_{Ci}$ for all $i$.
Notation:
Let $r_z$ denote the vector of potential responses for randomization
assignment $z$.
Let $Z$ denote the observed randomization assignment.
Let $r = r_Z$ denote the vector of observed responses.
Under $H_0: r_{Ti} = r_{Ci}$ for all $i$, $r_z$ is the same for all $z$ and hence
$r_z = r$.
Test Statistic:
Consider the following test statistic: the difference in sample
means for the treated and control groups.
$$t(Z, r) = \frac{Z^T r}{Z^T 1} - \frac{(1 - Z)^T r}{(1 - Z)^T 1},$$
where $1$ is the $N$-tuple of 1’s and $N$ is the number of
subjects in the trial. Let $T$ be the observed value of this test
statistic.
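As a minimal sketch, the vector form of the test statistic can be evaluated directly in R for the fish oil data and compared with the plain difference in sample means. The names r, Z, t.stat, t.obs and diff.means are chosen here for illustration only.

```r
# Fish oil data: the first 7 responses are treated, the last 7 are control
r <- c(8, 12, 10, 14, 2, 0, 0, -6, 0, 1, 2, -3, -4, 2)
Z <- c(rep(1, 7), rep(0, 7))

# t(Z, r) = Z'r / Z'1 - (1 - Z)'r / (1 - Z)'1
t.stat <- function(Z, r) {
  one <- rep(1, length(r))   # N-tuple of 1's
  sum(Z * r) / sum(Z * one) - sum((1 - Z) * r) / sum((1 - Z) * one)
}

t.obs <- t.stat(Z, r)
diff.means <- mean(r[Z == 1]) - mean(r[Z == 0])
print(c(t.obs, diff.means))
```

Both printed values equal 54/7, about 7.71.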
Suppose we would like to reject for large values of T , which
would be reasonable if we are interested in the alternative that
the treatment has a positive effect.
The p-value is the probability that the test statistic would be
greater than or equal to its actual value $T$ under the null
hypothesis of no treatment effect. This is simply the proportion
of random assignments $z$ that lead to values of $t(z, r)$ that are
greater than or equal to $T$. Namely, letting $\Omega$ be the set of all
possible random assignments $z$, the p-value is
$$\Pr_{H_0}\{t(Z, r) \ge T\} = \frac{|\{z \in \Omega : t(z, r) \ge T\}|}{|\Omega|}.$$
For the fish oil data, the observed test statistic is
$$T = \frac{1}{7}(8 + 12 + 10 + 14 + 2 + 0 + 0) - \frac{1}{7}(-6 + 0 + 1 + 2 - 3 - 4 + 2) = 7.71.$$
# Randomization test for no treatment effect using the difference
# in sample means between the treated and control subjects as
# the test statistic
# Fish oil diet data
treated.r=c(8,12,10,14,2,0,0);
control.r=c(-6,0,1,2,-3,-4,2);
# Recursive function for finding all subsets of size r from
# the set (1,...,n)
#
# Based on the following observation: Suppose you single
# out one of the elements, say the first. Then the subsets of
# size r consist of those that contain the first and those that
# do not. The first group may be generated by attaching the
# first object to each subset of size r-1 selected from the
# n-1 others, and the second group consists of all subsets of
# size r from the n-1 others
subsets=function(n,r,v=1:n){
if(r<=0) NULL else
if(r>=n) v[1:n] else
rbind(cbind(v[1],subsets(n-1,r-1,v[-1])),subsets(n-1,r,v[-1]))
}
# Function for testing no treatment effect using the test statistic,
# difference in sample mean between the treated and control group,
# rejecting for large values of the test statistic
treat.effect.samplemean.test.func=function(treated.r,control.r){
# Create vectors for r and Z, and find total number in
# experiment and number of treated subjects
r=c(treated.r,control.r);
Z=c(rep(1,length(treated.r)),rep(0,length(control.r)));
N=length(r);
m=length(treated.r);
# Observed test statistic
obs.test.stat=mean(r[Z==1])-mean(r[Z==0]);
# Compute distribution of test statistic over random
# assignments
# Matrix that contains all possible assignments of subjects
# to the treated group
subsetmat=subsets(N,m);
# The r for each subject in the treated group in the
# assignments in subsetmat
treated.r.mat=matrix(r[t(subsetmat)],ncol=m,byrow=TRUE)
# Sum of all the r’s
sum.r=sum(r)
# Mean of the treated and control groups for the assignments in subsetmat,
# and corresponding test statistic
mean.treated.r=apply(treated.r.mat,1,mean);
mean.control.r=(sum.r-(mean.treated.r)*m)/(N-m);
teststat=mean.treated.r-mean.control.r;
# p-value is the probability that the test statistic under the
# null hypothesis would be greater than or equal to the observed
# test statistic
pval=sum(teststat>=obs.test.stat)/length(teststat);
pval;
}
# Test no treatment effect for fish oil data
treat.effect.samplemean.test.func(treated.r,control.r)
[1] 0.009032634
Conclusion: There is strong evidence (p<0.01) that the fish oil
diet has an effect.
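As a cross-check on the enumeration above, here is a minimal sketch that uses base R's combn() in place of the recursive subsets() function. It relies on an observation that the notes do not state: with N and m fixed, the difference in means is an increasing function of the treated-group sum, so comparing treated sums gives the same p-value while working with whole numbers. The names all.treated, treated.sums and pval.exact are illustrative.

```r
treated.r <- c(8, 12, 10, 14, 2, 0, 0)
control.r <- c(-6, 0, 1, 2, -3, -4, 2)
r <- c(treated.r, control.r)
m <- length(treated.r)

# Each column of all.treated holds the indices of one possible treated group
all.treated <- combn(length(r), m)

# Treated-group sum of responses under each possible assignment
treated.sums <- colSums(matrix(r[all.treated], nrow = m))

# Proportion of assignments whose treated sum is at least the observed one
pval.exact <- mean(treated.sums >= sum(treated.r))
print(ncol(all.treated))   # number of possible assignments
print(pval.exact)
```

This counts 31 of the 3432 possible assignments, which agrees with the p-value of 0.009032634 reported above.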
Monte Carlo method for approximate randomization inference
For the job training experiment data from Notes 1, there are 297
treated subjects and 425 control subjects, so there are $\binom{722}{297}$ possible
random assignments, far too many to enumerate.
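To get a feel for the size of this set, here is a quick sketch using base R's lchoose(), which returns the natural logarithm of a binomial coefficient and so sidesteps any overflow concerns. The variable name log10.assignments is illustrative.

```r
# log10 of the number of ways to choose 297 treated subjects out of 722
log10.assignments <- lchoose(722, 297) / log(10)
print(log10.assignments)
```

The count has more than 200 decimal digits, so exact enumeration is out of the question.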
The p-value is
$$pval = \Pr_{H_0}\{t(Z, r) \ge T\} = \frac{|\{z \in \Omega : t(z, r) \ge T\}|}{|\Omega|} = E_{H_0}\{I[t(Z, r) \ge T]\},$$
where
$$I[\text{event}] = \begin{cases} 1 & \text{if event occurs} \\ 0 & \text{otherwise.} \end{cases}$$
The Monte Carlo method approximates this expected value by
drawing independent, randomly chosen random assignments and
computing the proportion for which $I[t(Z, r) \ge T] = 1$, i.e., for
$K$ randomly chosen random assignments $Z_1, \ldots, Z_K$,
$$pval^{MC,K} = \frac{1}{K} \sum_{k=1}^{K} I[t(Z_k, r) \ge T].$$
By the law of large numbers, $pval^{MC,K} \xrightarrow{P} pval$ as $K \to \infty$.
Furthermore, by the Central Limit Theorem, an approximate
95% CI for $pval$ is
$$pval^{MC,K} \pm 1.96 \sqrt{\frac{pval^{MC,K}\,(1 - pval^{MC,K})}{K}}.$$
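The confidence interval formula also indicates how many draws are worth making. The following sketch, with illustrative function names mc.half.width and mc.draws.needed and a hypothetical planning value p.guess, computes the half-width of the interval and the number of draws needed for a target half-width.

```r
# Half-width of the approximate 95% CI for the p-value,
# given a Monte Carlo p-value estimate p.hat based on K draws
mc.half.width <- function(p.hat, K) {
  1.96 * sqrt(p.hat * (1 - p.hat) / K)
}

# Smallest K giving a half-width of about h when the p-value is near p.guess
# (p.guess is a hypothetical planning value, not something the notes specify)
mc.draws.needed <- function(p.guess, h) {
  ceiling(1.96^2 * p.guess * (1 - p.guess) / h^2)
}

print(mc.half.width(0.05, 1000))
print(mc.half.width(0.05, 100000))
print(mc.draws.needed(0.05, 0.001))
```

Because the half-width shrinks like 1/sqrt(K), increasing K by a factor of 100 narrows the interval by a factor of 10.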
### Function for testing no treatment effect using the difference in
### sample means as the test statistic, rejecting for large values of the test
### statistic and using the Monte Carlo method with K draws
treat.effect.samplemean.montecarlo.test.func=function(treated.r,control.r,K){
# Create vectors for r and Z, and find total number in
# experiment and number of treated subjects
r=c(treated.r,control.r);
Z=c(rep(1,length(treated.r)),rep(0,length(control.r)));
N=length(r);
m=length(treated.r);
# Observed test statistic
obs.test.stat=mean(r[Z==1])-mean(r[Z==0]);
# Monte Carlo simulation
montecarlo.test.stat=rep(0,K);
for(i in 1:K){
treatedgroup=sample(1:N,m); # Draw random assignment
controlgroup=(1:N)[-treatedgroup];
# Compute test statistic for random assignment
montecarlo.test.stat[i]=mean(r[treatedgroup])-mean(r[controlgroup]);
}
# Monte Carlo p-value is proportion of randomly drawn
# test statistics that are >= observed test statistic
pval=sum(montecarlo.test.stat>=obs.test.stat)/K;
# 95% CI for true p-value based on Monte Carlo p-value
lowerci=pval-1.96*sqrt(pval*(1-pval)/K);
upperci=pval+1.96*sqrt(pval*(1-pval)/K);
list(pval=pval,lowerci=lowerci,upperci=upperci);
}
# For the Fish Oil experiment
treat.effect.samplemean.montecarlo.test.func(treated.r,control.r,100000)
$pval
[1] 0.00907
$lowerci
[1] 0.0084824
$upperci
[1] 0.0096576
# For the job training experiment
nswdata=read.table("nswdata.txt",header=TRUE);
treated.r.jobtrain=nswdata$earnings78[nswdata$treatment==1];
control.r.jobtrain=nswdata$earnings78[nswdata$treatment==0];
treat.effect.samplemean.montecarlo.test.func(treated.r.jobtrain,control.r.jobtrain,100000)
$pval
[1] 0.03107
$lowerci
[1] 0.02999459
$upperci
[1] 0.03214541
Conclusion: There is moderate evidence (p=0.03) that the job
training program has an effect.
II. General Setting for Randomized Experiments (Chapter 2.3)
There are N units available for experimentation. A unit is an
opportunity to apply or withhold the treatment. Often, a unit is a
person who will receive either the treatment or control as
determined by the experimenter. However, it may happen that it
is not possible to assign a treatment to a single person, so a
group of people form a single unit, perhaps all children in a
particular classroom or school. On the other hand, a single
person may present several opportunities to apply different
treatments as in a longitudinal study.
The N units are divided into S strata or subclasses on the basis of
covariates, where a covariate is a variable measured prior to the
assignment of treatment. The stratum to which a unit belongs is
not affected by the treatment, since the strata are formed prior to
treatment. There are $n_s$ units in stratum $s$ for $s = 1, \ldots, S$, so
$N = \sum_{s=1}^{S} n_s$.
Write $Z_{si} = 1$ if the $i$th unit in stratum $s$ receives the treatment
and write $Z_{si} = 0$ if this unit receives the control. Write $m_s$ for
the number of treated units in stratum $s$, so $m_s = \sum_{i=1}^{n_s} Z_{si}$. Finally,
write $Z$ for the $N$-dimensional column vector containing the $Z_{si}$
for all units in lexical order, that is,
$$Z = \begin{pmatrix} Z_{11} \\ Z_{12} \\ \vdots \\ Z_{1,n_1} \\ Z_{21} \\ \vdots \\ Z_{S,n_S} \end{pmatrix} = \begin{pmatrix} Z_1 \\ \vdots \\ Z_S \end{pmatrix}, \quad \text{where } Z_s = \begin{pmatrix} Z_{s1} \\ \vdots \\ Z_{s,n_s} \end{pmatrix}.$$
When there is only one stratum, we drop the s subscript.
Methods of Assigning Treatments at Random
The most commonly used assignment mechanism fixes the
number $m_s$ of treated subjects in stratum $s$. Let $\Omega$ be the set
containing the $K = \prod_{s=1}^{S} \binom{n_s}{m_s}$ possible treatment assignments
$z = \begin{pmatrix} z_1 \\ \vdots \\ z_S \end{pmatrix}$
in which $z_s$ is an $n_s$-tuple with $m_s$ ones and $n_s - m_s$
zeros. In the most common assignment mechanism, each of
these $K$ possible assignments is given the same probability,
$\Pr(Z = z) = 1/K$ for all $z \in \Omega$. This type of randomized
experiment, with equal probabilities and fixed $m_s$, will be called
a uniform randomized experiment.
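A minimal sketch of drawing one assignment in a uniform randomized experiment follows. The stratum sizes n.s and treated counts m.s are hypothetical, and draw.assignment is an illustrative helper name. Sampling m_s of the n_s units separately within each stratum gives every element of the set of possible assignments the same probability 1/K, because K is the product of the per-stratum counts.

```r
# Hypothetical design: S = 3 strata with sizes n.s and treated counts m.s
n.s <- c(4, 3, 5)
m.s <- c(2, 1, 2)

# Number of possible assignments: product over strata of choose(n_s, m_s)
K <- prod(choose(n.s, m.s))

# Draw one assignment: within each stratum, pick m_s of the n_s units at random
draw.assignment <- function(n.s, m.s) {
  z.list <- lapply(seq_along(n.s), function(s) {
    z.s <- rep(0, n.s[s])
    z.s[sample.int(n.s[s], m.s[s])] <- 1
    z.s
  })
  unlist(z.list)   # stack z_1, ..., z_S in lexical order
}

set.seed(1)
z <- draw.assignment(n.s, m.s)
stratum <- rep(seq_along(n.s), times = n.s)
print(K)
print(tapply(z, stratum, sum))   # treated count in each stratum
```

For this hypothetical design K is 6 x 3 x 10 = 180, and the treated counts per stratum match m.s by construction.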
Proposition 1: In a uniform randomized experiment, the
$Z_1, \ldots, Z_S$ are mutually independent, and $\Pr(Z_s = z_s) = 1/\binom{n_s}{m_s}$
for each $n_s$-tuple $z_s$ containing $m_s$ ones and $n_s - m_s$ zeros.
Proof: Problem 3 at the end of Chapter 2 in the textbook.
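The proposition can be illustrated numerically on a toy design. This is a sketch, not a proof: the design with two strata of sizes 3 and 2 is hypothetical, and the helper stratum.assignments is an illustrative name. The code lists every element of the set of possible assignments, gives each probability 1/K, and checks the marginal probabilities and the product rule.

```r
# Hypothetical small design: 2 strata, sizes 3 and 2, one treated unit in each
n.s <- c(3, 2)
m.s <- c(1, 1)

# All 0/1 assignments within one stratum, returned as a list of vectors
stratum.assignments <- function(n, m) {
  combn(n, m, FUN = function(idx) replace(rep(0, n), idx, 1), simplify = FALSE)
}
z1.all <- stratum.assignments(n.s[1], m.s[1])   # 3 possibilities
z2.all <- stratum.assignments(n.s[2], m.s[2])   # 2 possibilities

# Every pairing of a stratum-1 assignment with a stratum-2 assignment
omega <- expand.grid(i1 = seq_along(z1.all), i2 = seq_along(z2.all))
K <- nrow(omega)
prob <- rep(1 / K, K)     # uniform randomized experiment

# Marginal distributions of Z_1 and Z_2
marg1 <- as.numeric(tapply(prob, omega$i1, sum))
marg2 <- as.numeric(tapply(prob, omega$i2, sum))

# Joint probability equals the product of marginals for every assignment
independent <- all(abs(prob - marg1[omega$i1] * marg2[omega$i2]) < 1e-12)
print(marg1)
print(marg2)
print(independent)
```

Each marginal probability is 1/choose(n_s, m_s), here 1/3 and 1/2, and each joint probability 1/6 factors as their product.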