TESTING IF THE HETEROGENEITY DISTRIBUTION OF
A RANDOMIZED EXPERIMENT CHANGES DURING
THE EXPERIMENTAL PERIOD:
A STATISTICAL ANALYSIS OF A SOCIAL EXPERIMENT
Marcel Voia
Department of Economics, Carleton University, 1125 Colonel By Drive, Ottawa,
Ontario K1S 5B6, Canada.
E-mail: mvoia@connect.carleton.ca
Ričardas Zitikis
Department of Statistical and Actuarial Sciences, University of Western Ontario,
London, Ontario N6A 5B7, Canada.
E-mail: zitikis@stats.uwo.ca
February 2006
Abstract. The paper considers statistical tests for determining if
the heterogeneity distribution of the treatment group in a randomized experiment changes during the experimental period. Solving the
problem is of practical interest since heterogeneity changes may indicate serious selection problems in the randomized experiment. To
make the tests easily implementable in practice, we discuss estimating critical values, for which we use the bootstrap. To assess the actual
performance of the tests, we conduct simulation studies. Finally, the
tests are applied to analyze a social experiment data set, which is the
main goal of the present paper.
Classification codes: C12, D31, D63.
Key words and phrases: Stochastic dominance, Kolmogorov-Smirnov
type statistic, asymptotic distribution, bootstrap, selection, testing
for intersection.
1. Introduction and motivation
In this paper we consider statistical tests that can assist researchers in identifying
changes in distributions during experimental periods of randomized experiments.
Moreover, we argue that the tests can also identify whether there is unobserved heterogeneity. We shall now discuss these issues in detail.
When the use of a treatment is a matter of choice, selection issues might arise
even though randomization is used for allocating individuals to the treatment and
control groups. For example, those individuals who believe that they would benefit most from the treatment may disproportionately be the ones who choose to avail themselves of the treatment. Selection into the treatment group can therefore be a serious problem for a randomized experiment, and researchers should take appropriate measures in order to obtain consistent estimates of the treatment effect.
Athey and Imbens (2003) introduced an estimator of the effect of a treatment on the entire outcome distribution of a treatment program. Their estimator allows the unobserved heterogeneity to differ between the two groups and therefore allows for self-selection, or noncompliance, in one of the
groups. Thus, in the absence of a treatment, the differences between the two groups
are determined by the differences in the conditional distributions of unobserved heterogeneity in the groups. These differences can in turn be compared by employing
statistical tests for the equality of two distributions, for testing their first-order stochastic dominance (FSD), second-order stochastic dominance (SSD), higher-order
dominance, or intersection.
There is an extensive literature on testing for stochastic dominance, which essentially starts with the work of McFadden (1989), who proposes and analyzes a Kolmogorov-Smirnov-type test statistic for stochastic dominance. Subsequently,
Anderson (1996), Davidson and Duclos (2000), Barrett and Donald (2003), Whang,
Linton, and Maasoumi (2005) develop powerful statistical inferential results for stochastic dominance of any order. Horvath, Kokoszka and Zitikis (2004) contribute to
the literature by showing how to modify the statistics in order to test for stochastic
dominance over non-compact intervals.
In this paper we are interested in identifying if the heterogeneity distribution
of the individuals from the treatment group changes during the treatment period
and if the heterogeneity distribution changes between groups during the treatment
period. Using the fact that we can construct the counterfactual distribution of the
treatment in the second period (which is the distribution of the treated individuals had they not been treated; cf. Athey and Imbens (2003) for details), we can identify if the
distribution of the individuals from the treatment group changes over the treatment
period by testing if the counterfactual treatment distribution in the second period
intersects the treatment distribution at the baseline. Hence, our statistical tests
are based on checking whether a distribution dominates or intersects another one.
For example, using the test we can check if the treatment and control distributions
dominate each other or intersect during the period following the baseline.
The paper is organized as follows. In Section 2 we formulate the problem rigorously. In Section 3 we describe various tests. In Section 4 we apply the tests to different simulation designs. In Section 5 we apply the methodology to an experimental treatment data set (cf. Decker et al, 2000) and present findings. Section
6 contains concluding notes. Technical results, tables, and figures are relegated to
appendices at the end of the paper.
2. Mathematical formulation of the problem
Assume that each individual in the population of interest can be assigned to one
and only one of the two sub-groups, which correspond to control and treatment. Let
G be a random variable taking two values: G = 0 if a randomly selected individual
is assigned to the control group and G = 1 if assigned to the treatment group. (We
use the upper-case G to indicate that this random variable assigns the individuals
to groups.)
Next, there are two time periods. The first one, which we denote by t = 0, is the
time at the introduction of a certain treatment policy. The second period, which we
denote by t = 1, is the period after the introduction of the treatment policy, or the
time when the effect of the policy is measured. (We use the lower-case t to denote
the time periods since the assignment of individuals to the two time periods is not
random.)
Hence, we have the pair (G, t) that can take on one of the four possible values:
(0, 0), (0, 1), (1, 0), and (1, 1). The variable of interest is X^{(G,t)}, which can, for example, be the out-of-work time measured in weeks, as in the example we analyze later in the present paper. We shall be interested in properties of the
conditional distribution functions
F^{(g,t)}(x) := P[X^{(G,t)} ≤ x | G = g]
for various choices of g, t ∈ {0, 1}.
We assume that at the time of the random assignment (when the treatment was not yet enforced) the control and the treatment groups have the same heterogeneity distribution, which we write as F^{(0,0)} ≡ F^{(1,0)}. As for the two distributions F^{(0,1)} and F^{(1,1)}, there are three possibilities, and their descriptions are given next and followed by a discussion and further notes.
(1) The distributions are equal. This means that the treatment does not have any effect on the outcome variable. In this case we write the null hypothesis as
H_0^{(1)}: F^{(0,1)} ≡ F^{(1,1)}.
(2) One of the distributions dominates the other. We shall concentrate on the case when F^{(0,1)}(x) ≤ F^{(1,1)}(x) for all x. We formulate the corresponding null hypothesis as
H_0^{(2)}: F^{(0,1)} ≤ F^{(1,1)}.
(Data might suggest testing the null hypothesis F^{(0,1)} ≥ F^{(1,1)}, which can be done analogously by interchanging the roles of F^{(0,1)} and F^{(1,1)}.)
(3) The two distributions intersect, and there are two points y_0 and z_0 such that F^{(0,1)}(y_0) < F^{(1,1)}(y_0) and F^{(0,1)}(z_0) > F^{(1,1)}(z_0). In this case we write the null hypothesis as
H_0^{(3)}: F^{(0,1)} ⋈ F^{(1,1)}.
Several problems of practical interest can be formulated, and we now discuss them.
First, we are naturally interested in whether the distributions of the control and
the treatment groups differ after the introduction of a new policy. This requires
testing the above defined null hypothesis H_0^{(1)} against the alternative H_1^{(1)}, where H_1^{(1)} := “not H_0^{(1)}”, where we use the notation “:=” for “equality by definition”.
In order for the treatment to be effective over the whole treatment group, the
distributions of the control and the treatment groups have to be different and should
not intersect. This would ensure that the treatment has had an effect on the whole
treatment group. For example, in the context of reducing out-of-work time, we
would be interested in testing whether F^{(0,1)} ≤ F^{(1,1)} or not. In other words, we are interested in testing the above defined H_0^{(2)} against the alternative H_1^{(2)}, where H_1^{(2)} := “not H_0^{(2)}”.
On the other hand, if for some individuals the treatment is not effective, or if the
individuals consider the treatment as not worth taking (assuming that the treatment
is not mandatory), then these individuals may not be taking the treatment. Also,
it may happen that some individuals are not taking the treatment because they are
negatively affected by the fact that they have been selected for treatment and thus
decided not to follow it. Therefore, if we look at the effect of the policy during the
time period t = 1, then we shall observe that at some point the two distributions
intersect. Hence, we are interested in testing whether F^{(0,1)} and F^{(1,1)} intersect, that is, we want to test H_0^{(3)} against the alternative H_1^{(3)}, where H_1^{(3)} := “not H_0^{(3)}”.
The other interesting (and similar) problem concerns the distributions F^{(1,0)} and F^{(1,1)}. Specifically, we may be interested in testing whether the behavior of those in the treatment group has changed after the introduction of a new policy, compared to their behavior before the introduction of the policy. For this, we may want to test if F^{(1,0)} and F^{(1,1)} differ, dominate each other, or intersect. That is, just as above, we are interested in testing whether any of the null hypotheses H_0: F^{(1,0)} ≡ F^{(1,1)}, H_0: F^{(1,0)} ≤ F^{(1,1)}, or H_0: F^{(1,0)} ⋈ F^{(1,1)} holds against the corresponding alternative “not H_0”.
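As a concrete illustration of this setup, the empirical counterparts of the four conditional distribution functions can be obtained by grouping the observations on the (G, t) cells. The following minimal Python sketch is our own illustration (the array names are hypothetical, not from the paper), assuming one record per individual with its group indicator, time period, and outcome:

```python
import numpy as np

def conditional_edfs(g, t, x):
    """Empirical counterparts of F^(g,t)(x) = P[X <= x | G = g] for each (g, t) cell.

    g, t, x are equal-length arrays: group (0/1), period (0/1), outcome.
    Returns a dict mapping (g, t) to a vectorized empirical distribution function.
    Assumes every (g, t) cell contains at least one observation.
    """
    g, t, x = map(np.asarray, (g, t, x))
    edfs = {}
    for gv in (0, 1):
        for tv in (0, 1):
            cell = np.sort(x[(g == gv) & (t == tv)])
            # EDF: fraction of cell observations at or below each query point
            edfs[(gv, tv)] = lambda q, s=cell: np.searchsorted(s, q, side="right") / len(s)
    return edfs
```

The four returned functions play the roles of F^{(0,0)}, F^{(0,1)}, F^{(1,0)}, and F^{(1,1)} in the tests below.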
3. Methodology
Let X_1^{(0,1)}, . . . , X_n^{(0,1)} be independent and identically distributed random variables, each having the distribution function F^{(0,1)}. Denote the corresponding empirical distribution function by F̂^{(0,1)}. Likewise, let X_1^{(1,1)}, . . . , X_m^{(1,1)} be independent and identically distributed random variables, each having the distribution function F^{(1,1)}. Denote the corresponding empirical distribution function by F̂^{(1,1)}. We assume that all the X's are independent random variables. In other words, we consider the case of two independent populations. Furthermore, we assume that the sample sizes n and m are comparable, which is a natural assumption. Specifically, we assume that there exists a number 0 < η < 1 such that both sample sizes n and m tend to infinity in such a way that
m/(n + m) → η.
Testing H_0^{(1)} vs H_1^{(1)}. Considerations in this subsection are based on the classical Kolmogorov-Smirnov test. Namely, with the help of the parameter
κ := sup_x |F^{(0,1)}(x) − F^{(1,1)}(x)|,
we rewrite the null and the alternative hypotheses as follows:
H_0^{(1)}: κ = 0 vs H_1^{(1)}: κ > 0. (3.1)
Next, we need to construct an empirical estimator for κ and to establish its asymptotic distribution (or a bound on it) so that critical values can be calculated, or estimated. We define an estimator of κ by
κ̂ := sup_x |F̂^{(0,1)}(x) − F̂^{(1,1)}(x)|.
The estimator κ̂ is consistent (cf. Theorem 7.1 below). The asymptotic behavior of the estimator under the null and the alternative hypotheses is investigated in Theorem 7.2. Namely, based on the theorem we have that
K̂ := √(nm/(n + m)) κ̂
is an appropriate statistic for testing the null hypothesis H_0^{(1)} against the alternative H_1^{(1)}. The corresponding rejection (i.e., critical) region is R: K̂ > k_α, where k_α is the α-critical value of the (classical) Kolmogorov-Smirnov test.
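For readers who wish to implement this test, K̂ and the limiting Kolmogorov-Smirnov tail probability can be computed along the following lines. This is our own minimal sketch (function names ours), assuming the standard alternating-series representation 1 − F_KS(k) = 2 Σ_{j≥1} (−1)^{j−1} exp(−2j²k²):

```python
import numpy as np

def edf_on(sample, grid):
    """Empirical distribution function of `sample` evaluated at the points `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def ks_statistic(x, y):
    """K-hat = sqrt(nm/(n+m)) * sup_t |F_x(t) - F_y(t)|.

    Both EDFs are step functions that jump only at sample points, so the
    supremum is attained on the pooled sample."""
    n, m = len(x), len(y)
    grid = np.concatenate([x, y])
    kappa_hat = np.max(np.abs(edf_on(x, grid) - edf_on(y, grid)))
    return np.sqrt(n * m / (n + m)) * kappa_hat

def ks_tail(k, terms=100):
    """P[K > k] under the limiting Kolmogorov-Smirnov law (alternating series)."""
    if k <= 0:
        return 1.0
    j = np.arange(1, terms + 1)
    return float(2.0 * np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * j**2 * k**2)))
```

With these helpers the rejection rule K̂ > k_α is equivalent to comparing ks_tail(ks_statistic(x, y)) with the chosen level α.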
Testing H_0^{(2)} vs H_1^{(2)}. Considerations in this subsection follow those in Whang, Linton, and Maasoumi (2005). Namely, with the help of the parameter
δ := sup_x (F^{(0,1)}(x) − F^{(1,1)}(x)),
we rewrite the hypotheses H_0^{(2)} and H_1^{(2)} as
H_0^{(2)}: δ = 0 vs H_1^{(2)}: δ > 0. (3.2)
The empirical estimator of δ is
δ̂ := sup_x (F̂^{(0,1)}(x) − F̂^{(1,1)}(x)).
The estimator δ̂ is consistent (cf. Theorem 7.1). The asymptotic behavior of the estimator under the null and the alternative hypotheses is investigated in Theorem 7.3. Namely, based on the theorem we conclude that
D̂ := √(nm/(n + m)) δ̂
is an appropriate statistic for testing the null hypothesis H_0^{(2)} against the alternative H_1^{(2)}. The corresponding rejection region is R: D̂ > d_α, where d_α is the α-critical value of the maximum of a Gaussian stochastic process Γ (cf. Appendix 7 below for the definition), which depends on both F^{(0,1)} and F^{(1,1)}. Since the distributions are not, in general, identical, the critical value d_α is not distribution free and thus needs to be estimated. For this, we use the bootstrap, whose detailed description follows.
From X_1^{(0,1)}, . . . , X_n^{(0,1)} we sample with replacement and obtain n values X_1^{(0,1)*}, . . . , X_n^{(0,1)*}. Let F̂^{(0,1)*}(x) be the corresponding empirical distribution function. Next, from X_1^{(1,1)}, . . . , X_m^{(1,1)} we sample with replacement and obtain m values X_1^{(1,1)*}, . . . , X_m^{(1,1)*}. Let F̂^{(1,1)*}(x) be the corresponding empirical distribution function. With the notation above, we define the process
∆*(x) := √(nm/(n + m)) (F̂^{(0,1)*}(x) − F̂^{(1,1)*}(x)) − √(nm/(n + m)) (F̂^{(0,1)}(x) − F̂^{(1,1)}(x)),
and then, in turn, the quantity
D̂* := sup_x ∆*(x).
We repeat the above sampling procedure M times and obtain M values of D̂*. Now we are in a position to define an estimator d*_α of d_α as the smallest x such that at least 100(1 − α)% of the obtained M values of D̂* are at or below x. With the just defined d*_α, the rejection region for testing the null hypothesis H_0^{(2)} against the alternative H_1^{(2)} is R: D̂ > d*_α.
Testing H_0^{(3)} vs H_1^{(3)}. Again, our considerations follow those in Whang, Linton, and Maasoumi (2005). First we note that if there is an x_0 such that the strict inequality F^{(0,1)}(x_0) > F^{(1,1)}(x_0) holds, then the earlier introduced parameter δ is strictly positive. Likewise, the existence of an x_1 such that F^{(0,1)}(x_1) < F^{(1,1)}(x_1) results in a strictly positive value of the parameter
θ := sup_x (F^{(1,1)}(x) − F^{(0,1)}(x)).
Hence, if the two distributions F^{(0,1)} and F^{(1,1)} intersect, then the parameter
τ := min(δ, θ)
is strictly positive. In view of the discussion above, we reformulate the null hypothesis as H_0^{(3)}: τ > 0. Under the alternative, one of the two distribution functions dominates the other. Hence, the parameter τ cannot be positive; in fact, we have τ = 0, since F^{(0,1)}(x) and F^{(1,1)}(x) always coincide at x = ±∞. Hence, we reformulate the
alternative as H_1^{(3)}: τ = 0.
The way the null and alternative hypotheses appear above poses a serious problem in developing a statistical test of desired size or level. To circumvent the problem, we shall formulate our problem somewhat differently. That is, we shall test the null hypothesis
H_0^{(not 3)}: F^{(0,1)} dom F^{(1,1)},
where “F^{(0,1)} dom F^{(1,1)}” means that one of the distributions dominates the other, without specifying whether F^{(0,1)} ≤ F^{(1,1)} or F^{(0,1)} ≥ F^{(1,1)}. The alternative H_1^{(not 3)}, which is the complement of H_0^{(not 3)} by definition, coincides with the earlier specified H_0^{(3)}: F^{(0,1)} ⋈ F^{(1,1)}. Hence, if we reject the null hypothesis H_0^{(not 3)}: τ = 0, then we shall have significant evidence to claim that the two distributions F^{(0,1)} and F^{(1,1)} intersect. In summary, we shall test
H_0^{(not 3)}: τ = 0 vs H_1^{(not 3)}: τ > 0. (3.3)
We define an estimator of τ by
τ̂ := min(δ̂, θ̂),
where δ̂ is the same as above, and
θ̂ := sup_x (F̂^{(1,1)}(x) − F̂^{(0,1)}(x)).
The estimator τ̂ is consistent (cf. Theorem 7.1), and its asymptotic properties are described in Theorem 7.4. Namely, based on the theorem we have that
T̂ := √(nm/(n + m)) τ̂
is an appropriate statistic for testing the null hypothesis H_0^{(not 3)} against the alternative H_1^{(not 3)} (recall that the latter coincides with H_0^{(3)}). The corresponding rejection region is R: T̂ > t_α, where t_α is the α-critical value of a distribution (cf. Theorem 7.4) that depends on F^{(0,1)} and F^{(1,1)}. Hence, we need to estimate t_α, for which we use a bootstrap as follows.
With the same process ∆*(x) as defined earlier, let
T̂* := max(sup_x ∆*(x), sup_x (−∆*(x)))
(the maximum is not a typographical error). We repeat the above sampling procedure M times and in this way obtain M values of T̂*. Now we define the estimator t*_α as the smallest x such that at least 100(1 − α)% of the obtained M values of T̂* are at or below x. With the t*_α, the rejection region for testing H_0^{(not 3)} against H_1^{(not 3)} is R: T̂ > t*_α.
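The statistic T̂ itself requires only the two one-sided suprema. A minimal sketch (helper names ours) follows; the bootstrap replicate T̂* is obtained from the loop above by replacing sup_x ∆*(x) with max(sup_x ∆*(x), sup_x(−∆*(x))):

```python
import numpy as np

def edf_on(sample, grid):
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def tau_statistic(x, y):
    """T-hat = sqrt(nm/(n+m)) * min(delta-hat, theta-hat).

    The statistic is noticeably positive only when the two EDFs cross;
    under dominance one of the two suprema is (near) zero."""
    n, m = len(x), len(y)
    grid = np.concatenate([x, y])
    diff = edf_on(x, grid) - edf_on(y, grid)
    delta_hat = np.max(diff)   # sup_t (F^(0,1)(t) - F^(1,1)(t))
    theta_hat = np.max(-diff)  # sup_t (F^(1,1)(t) - F^(0,1)(t))
    return np.sqrt(n * m / (n + m)) * min(delta_hat, theta_hat)
```

Because both EDFs equal 1 at the largest pooled observation, both suprema are nonnegative, so T̂ ≥ 0 automatically.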
4. Simulation designs
In the present section we assess the performance of the tests discussed in the previous section, so as to gain confidence in their performance on the real data set analyzed in Section 5 below.
We assume that the simulated data come from a randomized treatment experiment, where the treatment is aimed at reducing the duration of unemployment. We consider six situations, presented in the six subsections below. The first subsection considers a linear data generating process (DGP) with no differences in distributions in the second period (no treatment effect). The second subsection considers a non-linear DGP with no differences in distributions in the second period (again, no treatment effect). The third subsection considers a linear DGP with no selection problem in the second period and differences in distribution (a treatment effect). The fourth considers a non-linear DGP with no selection problem in the second period and differences in distribution (again, a treatment effect). The fifth subsection considers a linear DGP with a selection problem in the second period. Finally, the sixth subsection considers a non-linear DGP with a selection problem in the second period. We note at the outset that in the last two cases we have different heterogeneity distributions for the control and the treatment groups in the second period (which indicates a selection problem), whereas the distributions in the first time period are the same in all six cases. To simplify considerations, in all six subsections we assume that n = m, and thus specify only n throughout.
4.1. Linear DGP with equality of two distributions in the second period.
We simulate two data sets X_1^{(0,1)}, . . . , X_n^{(0,1)} and X_1^{(1,1)}, . . . , X_n^{(1,1)} using the model
X^{(G,1)} = 16 + 4ε,
where ε ∼ N(0, 1) is a standard normal random variable. Using the simulated data, we perform the Kolmogorov-Smirnov test for the equality of the distributions F^{(0,1)} (non-treated) and F^{(1,1)} (treated). We use the formulas F^{(0,1)}(x) = F^{(1,1)}(x) = Φ((x − 16)/4) to produce the two graphs in Figure 8.1.a.
We know from Theorem 7.2 that under the null hypothesis, which is H_0^{(1)}: F^{(0,1)} = F^{(1,1)}, the test statistic K̂ asymptotically has the Kolmogorov-Smirnov distribution, which we denote by F_KS. Hence, the asymptotic P-value of the test is P*[Γ > K̂], where P* denotes the conditional distribution given the (simulated) values of X_1^{(0,1)}, . . . , X_n^{(0,1)} and X_1^{(1,1)}, . . . , X_n^{(1,1)}. (The K̂ is calculated using the simulated values.) The right-hand side of the following equality is useful for practical calculations of the asymptotic P-value:
P*[Γ > K̂] = 1 − F_KS(K̂).
In each of the four cases n = 200, n = 500, n = 1000, n = 2000, we simulate 1000 sets of random variables and in this way obtain B = 1000 values of K̂. In each of the four cases we therefore obtain 1000 values of 1 − F_KS(K̂) and draw histograms of these P-values in Figure 8.2.
If the test gives a P -value smaller than a significance level 0 < α < 0.5 (cf., e.g.,
Abadie, 2000), then we reject the given null hypothesis. Considering now the level
of significance α = 0.1, our findings show (cf. Table 8.1, row 4.1) that we do not
reject the null of equality of distributions for n = 200, 500, 1000, 2000. Also the
histograms of P -values show (cf. Figure 8.2) that as the sample size increases the
test converges to its asymptotic distribution.
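This simulation design can be reproduced along the following lines. The sketch below is our own small-scale version (B = 200 replications rather than 1000, to keep the run time short); under the null model X^{(G,1)} = 16 + 4ε the empirical rejection rate should be close to the nominal level α:

```python
import numpy as np

def edf_on(sample, grid):
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def ks_p_value(x, y, terms=100):
    """Asymptotic p-value 1 - F_KS(K-hat) for the two-sample statistic K-hat."""
    n, m = len(x), len(y)
    grid = np.concatenate([x, y])
    K = np.sqrt(n * m / (n + m)) * np.max(np.abs(edf_on(x, grid) - edf_on(y, grid)))
    j = np.arange(1, terms + 1)
    return float(2.0 * np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * j**2 * K**2)))

def rejection_rate(n=200, B=200, alpha=0.10, rng=None):
    """Fraction of simulated data sets on which F^(0,1) = F^(1,1) is rejected."""
    rng = rng if rng is not None else np.random.default_rng(0)
    rejections = 0
    for _ in range(B):
        x = 16 + 4 * rng.standard_normal(n)  # control group
        y = 16 + 4 * rng.standard_normal(n)  # treatment group (same law: no effect)
        rejections += ks_p_value(x, y) < alpha
    return rejections / B
```

Drawing a histogram of the B p-values reproduces (in miniature) the convergence pattern seen in Figure 8.2.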
4.2. Non-Linear DGP with the equality of two distributions in the second period. We simulate two data sets X_1^{(0,1)}, . . . , X_n^{(0,1)} and X_1^{(1,1)}, . . . , X_n^{(1,1)} using the (same) model
X^{(G,1)} = exp{2.2 + 0.4ε},
where ε ∼ N(0, 1). Using the simulated data, we perform the Kolmogorov-Smirnov test for the equality of the distributions F^{(0,1)} and F^{(1,1)}. Therefore, we use F^{(0,1)}(x) = F^{(1,1)}(x) = Φ((log x − 2.2)/0.4) to produce the two graphs in Figure 8.1.d.
For the non-linear DGP case, our findings show (cf. Table 8.1, row 4.2) that we
do not reject the null of equality of distributions for n = 500, 1000, 2000 at the
level of significance α = 0.1, but we reject the null of equality of distributions for
n = 200 at the level of significance α = 0.1. This result suggests that when dealing
with data that comes from a non-linear model the sample sizes need to be larger.
The histograms of P -values confirm the above results by showing (cf. Figure 8.3)
that as the sample size increases the test converges to its asymptotic distribution
but at a slower rate than in the case of the linear model above.
4.3. Linear DGP with no selection problem in the second period. Our model in this subsection is
X^{(G,1)} = 16 + 3(1 − G) + (1 − G)4ε_1 + G4ε_2
with independent random variables ε_1, ε_2 ∼ N(0, 1) and
G = 0 with probability 1/2, and G = 1 with probability 1/2.
Hence, we simulate two sets of random numbers, corresponding to the control and treatment groups, according to X^{(G,1)}|_{G=0} = 19 + 4ε_1 and X^{(G,1)}|_{G=1} = 16 + 4ε_2, respectively. Using the simulated data, we perform a test for dominance of F^{(0,1)} and F^{(1,1)}. We use the formulas F^{(0,1)}(x) = Φ((x − 19)/4) and F^{(1,1)}(x) = Φ((x − 16)/4) to produce the two graphs in Figure 8.1.b.
We simulate two data sets X_1^{(0,1)}, . . . , X_n^{(0,1)} and X_1^{(1,1)}, . . . , X_n^{(1,1)} from the distributions F^{(0,1)} and F^{(1,1)}, respectively. Using the simulated data, we perform the test that the distribution F^{(0,1)} (non-treated) is below the distribution F^{(1,1)} (treated). We know from Theorem 7.3 that under the hypothesis H_0^{(2)}: F^{(0,1)} ≤ F^{(1,1)} the test statistic D̂ is such that, asymptotically, P[D̂ > x_α] does not exceed the significance level α whenever x_α solves the equation P[Γ⁺ > x_α] = α. Hence, the critical region is (x_α, ∞). We estimate the asymptotic P-value P*[Γ⁺ > D̂] of the test using the bootstrap as follows:
P*[Γ⁺ > D̂] ≈ P*[D̂* > D̂],
where D̂* := sup_x ∆*(x) with the earlier notation ∆*(x). In each of the four cases n = 200, 500, 1000, 2000, we simulate 1000 sets of random variables and in this way obtain 1000 values of D̂. For each value of D̂, we calculate P*[D̂* > D̂] using 1000 bootstrap iterations. Hence, for each value of D̂ we have obtained a value of P*[D̂* > D̂], which is an approximate P-value of the test. To visualize the distribution of the 1000 P-values for each of the four sample sizes specified above, we have produced the histograms in Figure 8.4.
For the linear DGP case, our findings show (cf. Table 8.1, row 4.3) that we do not
reject the null of dominance of distributions for n = 500, 1000, 2000 at the level of
significance α = 0.1, but the test (mistakenly) rejects the null of dominance when
n = 200 at the level of significance α = 0.1. The histograms of P -values are given
in Figure 8.4.
4.4. Non-Linear DGP without selection problem in the second period.
The theoretical model of this subsection is
X^{(G,1)} = exp(2.7 + 0.2(1 − G) + (1 − G)0.4ε_1 + G0.4ε_2)
with ε_1, ε_2, and G as before. Hence, we have the equations X^{(G,1)}|_{G=0} = exp(2.9 + 0.4ε_1) and X^{(G,1)}|_{G=1} = exp(2.7 + 0.4ε_2). We used the formulas F^{(0,1)}(x) = Φ((log x − 2.9)/0.4) and F^{(1,1)}(x) = Φ((log x − 2.7)/0.4) to construct the distributions in Figure 8.1.e.
We perform a simulation study along the lines of the previous subsections. For the non-linear DGP case, our findings show (cf. Table 8.1, row 4.4) that we do not reject the null of dominance of distributions for n = 500, 1000, 2000 at the level of significance α = 0.1, but we reject the null of dominance of distributions for n = 200 at the level of significance α = 0.05. The histograms of P-values are given in Figure 8.5.
4.5. Linear DGP with selection problem in the second period. We simulate observations of
X^{(G,1)} = 16 + 3(1 − G) + (1 − G)ε_1 + G4ε_2
with the same independent random variables ε_1, ε_2, and G as above. Hence, we have the equations X^{(G,1)}|_{G=0} = 19 + ε_1 and X^{(G,1)}|_{G=1} = 16 + 4ε_2. Using the simulated data, we perform a test for dominance vs intersection of F^{(0,1)} and F^{(1,1)}. We use the formulas F^{(0,1)}(x) = Φ((x − 19)/1) and F^{(1,1)}(x) = Φ((x − 16)/4) to produce the two graphs in Figure 8.1.c.
We simulate two data sets X_1^{(0,1)}, . . . , X_n^{(0,1)} and X_1^{(1,1)}, . . . , X_n^{(1,1)} from the distributions F^{(0,1)} and F^{(1,1)}, respectively. Using the simulated data, we perform the test for the null hypothesis H_0^{(not 3)}: F^{(0,1)} dom F^{(1,1)}. Theorem 7.4 says that under the hypothesis H_0^{(not 3)} the test statistic T̂ is such that, asymptotically, P[T̂ > x_α] does not exceed the significance level α whenever x_α solves the equation P[max(Γ⁺, Γ⁻) > x_α] = α. The critical value x_α is not distribution free, and so the asymptotic P-value of the test, P*[max(Γ⁺, Γ⁻) > T̂], is not directly calculable. Hence, we use the bootstrap to estimate the P-value:
P*[max(Γ⁺, Γ⁻) > T̂] ≈ P*[T̂* > T̂],
where T̂* := max(sup_x ∆*(x), sup_x (−∆*(x))) with the same ∆*(x) as above.
In each of the four cases n = 200, 500, 1000, 2000, we simulate 1000 sets of random variables and obtain 1000 values of T̂. For each value of T̂, we then calculate P*[T̂* > T̂] using 1000 bootstrap iterations. Hence, for each value of T̂ we obtain a value of P*[T̂* > T̂], which is an approximate P-value of the test. To visualize the distribution of the 1000 P-values for each of the four sample sizes specified above, we produce histograms in Figure 8.6.
At the level of significance α = 0.2, our findings show (cf. Table 8.1, row 4.5) that we reject the null of dominance of distributions for n = 1000, 2000, which means that we accept the alternative of intersection of distributions. Also, at the same level of significance, we do not reject the null of dominance for n = 200, 500. The histograms of P-values are given in Figure 8.6.
Note that if we formulate the null hypothesis as equality of two cdf's, then the critical values are those of the Kolmogorov-Smirnov test considered earlier. At the level of significance α = 0.1, our findings show (cf. Table 8.1, row 4.5KS) that we reject the null of equality of distributions for n = 500, 1000, 2000, which means we accept the alternative of intersection of distributions. This note shows that it is important to consider various plausible null hypotheses and to analyze data from various angles.
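The mechanics of this design are easy to verify numerically: the two population cdfs Φ((x − 19)/1) and Φ((x − 16)/4) cross at x = 20, so the intersection statistic T̂ should be large, whereas for the no-selection design of Subsection 4.3 (equal spreads, no crossing) it should be near zero. A seeded sketch of this comparison (our own code, names ours):

```python
import numpy as np

def edf_on(sample, grid):
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def intersection_statistic(x, y):
    """T-hat = sqrt(nm/(n+m)) * min of the two one-sided EDF suprema."""
    n, m = len(x), len(y)
    grid = np.concatenate([x, y])
    diff = edf_on(x, grid) - edf_on(y, grid)
    return np.sqrt(n * m / (n + m)) * min(np.max(diff), np.max(-diff))

rng = np.random.default_rng(0)
n = 2000
control = 19 + 1 * rng.standard_normal(n)  # X^(G,1)|G=0 = 19 + eps1
treated = 16 + 4 * rng.standard_normal(n)  # X^(G,1)|G=1 = 16 + 4*eps2
T_crossing = intersection_statistic(control, treated)  # cdfs cross at x = 20: large

# For contrast, the no-selection design of Subsection 4.3 (no crossing):
dominated = 19 + 4 * rng.standard_normal(n)
T_dominance = intersection_statistic(dominated, treated)  # should be much smaller
```

Comparing T_crossing with a bootstrap critical value t*_α, as described in Section 3, completes the test.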
4.6. Non-Linear DGP with selection problem in the second period. The model of this subsection is
X^{(G,1)} = exp(2.7 + 0.2(1 − G) + (1 − G)0.1ε_1 + G0.4ε_2)
with the same independent random variables ε_1, ε_2, and G as before. Hence, we have the equations X^{(G,1)}|_{G=0} = exp(2.9 + 0.1ε_1) and X^{(G,1)}|_{G=1} = exp(2.7 + 0.4ε_2). We use the two formulas F^{(0,1)}(x) = Φ((log x − 2.9)/0.1) and F^{(1,1)}(x) = Φ((log x − 2.7)/0.4) to produce the two graphs in Figure 8.1.f.
For this case, our findings show (cf. Table 8.1, row 4.6) that we reject the null
of dominance of distributions for n = 500, 1000, 2000 at the level of significance
α = 0.2, but we do not reject the null of dominance of distributions for n = 200 at
the level of significance α = 0.2. The histograms of P -values are given in Figure 8.7.
Note that if we formulate the null hypothesis as equality of two cdf's, then at the level of significance α = 0.05, our findings show (cf. Table 8.1, row 4.6KS) that we reject the null of equality of distributions for n = 200, 500, 1000, 2000, which means we accept the alternative of intersection of distributions for all sample sizes. In this
case we can see that the null hypothesis of equality of distributions is rejected even
for (relatively) small samples.
5. Application
5.1. Data. The data we analyze is from the Job Search Assistance (JSA) Demonstration Experiment (cf. Decker et al, 2000). The experiment tested whether the JSA demonstration services would speed up re-employment and reduce the unemployment insurance (UI) benefits claimed by the demonstration participants when workers are encouraged to search more effectively and aggressively for a new job.
The demonstration was conducted in the District of Columbia (D.C.) and Florida.
The D.C. demonstration operated in a single office and served a targeted sample
of claimants from the full D.C. claimant population. Claimant selection occurred
between June 1995 and June 1996, and a total of 8,071 claimants were randomly
assigned to a control group and three alternative treatment groups. The three
service strategies developed for promoting rapid re-employment and reduced UI
spells among targeted UI claimants are:
(1) Structured Job Search Assistance (SJSA). Claimants assigned to this treatment were required to participate in an orientation, testing, a job search
workshop, and a one-on-one assessment interview. Claimants who failed to
participate in any service, unless explicitly excused, could be denied benefits. After completion of the services, claimants were required to have two
additional contacts with demonstration staff to report on their job search
progress (cf. Decker et al, 2000, p.VII).
(2) Individualized Job Search Assistance (IJSA). This treatment assigned claimants
to services based on their assessed needs. All claimants were required to
participate in an orientation and a one-on-one assessment interview. During
the assessment interview, the claimant and a demonstration staff member developed a service plan to address the claimant's needs. If the service plan included demonstration-specific services, such as testing, a job search workshop, or additional counseling, these services would become mandatory (cf.
Decker et al, 2000, p.VII).
(3) Individualized Job Search Assistance With Training (IJSA+). This treatment was identical to the second treatment, except for the inclusion of a
coordinated effort with local Economic Dislocation and Worker Adjustment
Act (EDWAA) staff to enroll interested claimants in training. During the
orientation, an EDWAA staff member discussed local opportunities for training. Training opportunities were also discussed during the assessment interview, and any claimant interested in training was scheduled to meet with an
EDWAA staff member at the demonstration office (cf. Decker et al, 2000,
p.VIII).
We consider applying the tests to the data associated with the SJSA treatment in D.C. because:
(1) The estimates obtained show that the JSA treatments reduced UI receipt significantly over the initial benefit year. The largest impact occurred in D.C. for the SJSA treatment, which reduced average UI receipt by more than a week, or by $182 per claimant (cf. Decker et al, 2000, p.83).
(2) SJSA increased the rate at which D.C. claimants exited UI throughout the
entire potential UI spell. The impact of SJSA is represented by the difference
between the exit rates for the SJSA and control groups. At the five-week
mark, the cumulative exit rate for the SJSA group was 17.7%, which was
more than 50% higher than the 11.6% rate for the control group. The absolute magnitude of this difference then remained relatively steady over time,
even though the SJSA services were received early in the UI spell (cf. Decker
et al, 2000, p.99).
(3) At the same time, SJSA was associated with a modest increase in the likelihood of being employed in each quarter (about 2 to 3 percentage points), and the estimated impacts are statistically significant in about half of the quarters (cf. Decker et al., 2000, p. 138).
The following information can help in identifying potential selection into the SJSA treatment in D.C. (cf. also Figure 8.8.a):
(1) About 78% of those who did not attend the orientation reported that they had gotten a job before their scheduled orientation (cf. Decker et al., 2000, p. 46).
(2) In D.C., 5% of claimants were excused from the orientation (cf. Decker et al., 2000, p. 48).
(3) About 15% failed to attend the orientation in D.C. Overall, about 82% of those who were not excused attended the orientation (cf. Decker et al., 2000, p. 48).
(4) About 97% of claimants who attended the orientation also attended the assessment. Attendance rates at the assessment interview were lower in SJSA (81% for D.C.) than in IJSA and IJSA+ in D.C. (cf. Decker et al., 2000, p. 49).
(5) D.C. staff may not have aggressively assigned claimants to group services because they felt that one-on-one counseling was more effective or more acceptable to demonstration participants, and because of a shortage of resources for providing group services. The D.C. office had difficulty maintaining sufficient staff trained to conduct group services and had a shortage of space for providing them. In contrast, the D.C. office had ample staff for one-on-one counseling and adequate office space for conducting one-on-one services (cf. Decker et al., 2000, p. 50).
5.2. Empirical Results. Looking at the EDFs of the treatment and control groups for the SJSA experiment (cf. Figure 8.8.a), we observe that for lower durations of unemployment the treatment dominates the control (i.e., there is a treatment effect), but for higher durations of unemployment (above 30 weeks) the treatment is dominated by the control group (i.e., there is no treatment effect). Therefore, the treatment is not uniform over the group of treated individuals, and it is possible to observe a change in unobserved heterogeneity in period t = 1 for the individuals from the treatment group. To test whether there is indeed a change in unobserved heterogeneity for the individuals from the treated group, we perform a test for the dominance vs intersection of distributions. We obtain a P-value of 0.213 (cf. Table 8.2) for this test. Alternatively, if we use the Kolmogorov-Smirnov statistic to test the equality of distributions against the alternative that the distributions intersect, we obtain a P-value of 0.195. The results of the two tests are thus similar.
Given that our sample is larger than 2000 observations, and that the simulations show that for n = 2000 the test is very close to the true value (cf. Figure 8.8.b), we conclude that the two distributions intersect, although only at a relatively high level of significance. We can also conclude that there is a change in unobserved heterogeneity in the treated group at t = 1.
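The test just described can be sketched in code. The following is a minimal illustration, not the paper's actual implementation: the function names are our own, and the pooled-resampling bootstrap (which imposes equality of the two distributions) is one simple way to approximate critical values. The statistics correspond to the sup-type quantities used in this paper: the two-sided statistic for equality, the one-sided statistics for dominance, and their minimum for the intersection test.

```python
import numpy as np

def ks_type_statistics(x, y):
    """Two-sample sup-type statistics on the pooled grid, scaled by
    sqrt(n*m/(n+m)).  Returns (K, d_plus, d_minus, T) where
      K       = sup |F_x - F_y|   (equality),
      d_plus  = sup (F_x - F_y)   (one-sided),
      d_minus = sup (F_y - F_x)   (one-sided),
      T       = min(d_plus, d_minus)  (intersection)."""
    n, m = len(x), len(y)
    grid = np.sort(np.concatenate([x, y]))
    # EDFs evaluated on the pooled sample points
    Fx = np.searchsorted(np.sort(x), grid, side="right") / n
    Fy = np.searchsorted(np.sort(y), grid, side="right") / m
    scale = np.sqrt(n * m / (n + m))
    diff = Fx - Fy
    K = scale * np.abs(diff).max()
    d_plus = scale * max(diff.max(), 0.0)
    d_minus = scale * max((-diff).max(), 0.0)
    return K, d_plus, d_minus, min(d_plus, d_minus)

def bootstrap_p_value(x, y, stat_index=3, n_boot=999, seed=None):
    """Bootstrap P-value: resample both groups from the pooled sample
    (imposing the null of equal distributions) and compare the resampled
    statistic to the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = ks_type_statistics(x, y)[stat_index]
    exceed = 0
    for _ in range(n_boot):
        xb = rng.choice(pooled, size=len(x), replace=True)
        yb = rng.choice(pooled, size=len(y), replace=True)
        if ks_type_statistics(xb, yb)[stat_index] >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

With `stat_index=3` the bootstrap is applied to the minimum of the two one-sided statistics: this quantity is large only when the EDFs cross, so a large observed value (small P-value) points to intersection rather than dominance.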
6. Conclusions
In this paper we developed the theoretical framework necessary to test whether the heterogeneity distribution of the treatment group in a randomized experiment changes during the experimental period. To make the tests easily implementable in practice, we discussed how to estimate critical values using bootstrap methodologies. To assess the performance of the tests, we conducted simulation studies. We applied the tests to analyze a social experiment data set (the SJSA experiment). The tests identify a change in unobserved heterogeneity in the treated group at t = 1. This change in unobserved heterogeneity for the treated individuals at time t = 1 is strong evidence of selection into the treatment. The selection can be explained by the burden of participating in this treatment felt by the individuals from the lower tail of the distribution (individuals with long spells of unemployment).
References
Abadie, A. (2000), Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models. NBER Technical Working Papers, National Bureau of
Economic Research.
Anderson, G. (1996), Nonparametric tests of stochastic dominance in income distributions. Econometrica, 64, 1183-1193.
Athey, S. and Imbens, G. (2003), Identification and Inference in Nonlinear Difference-in-Differences Models. NBER Technical Working Paper No. t0280.
Barrett, G.F. and Donald, S.G. (2003), Consistent tests for stochastic dominance. Econometrica, 71, 71-104.
Davidson, R. and Duclos, J.-Y. (2000), Statistical inference for stochastic dominance
and for the measurement of poverty and inequality. Econometrica 68, 1435-1464.
Decker, P.T., Olsen, R.B., Freeman, L. and Klepinger, D.H. (2000), Assisting Unemployment Insurance Claimants: The Long-Term Impacts of the Job Search Assistance Demonstration. W.E. Upjohn Institute for Employment Research, Kalamazoo, MI.
Fraker, T. and Maynard, R. (1987), The Adequacy of Comparison Group Designs
for Evaluations of Employment Related Programs. Journal of Human Resources,
22, 194-227.
Heckman, J. (1992), Randomization and Social Policy Evaluation. In Charles Manski and Irwin Garfinkle, eds., Evaluating Welfare and Training Programs (Cambridge, Mass.: Harvard University Press), 201-230.
Heckman, J. (1997), Randomization as an Instrumental Variables Estimator: A
Study of Implicit Behavioral Assumptions in One Widely-used Estimator. Journal
of Human Resources, 32, 442-462.
Heckman, J. and Hotz, J. (1989), Choosing Among Alternative Nonexperimental
Methods for Estimating the Impact of Social Programs: The Case of Manpower
Training. Journal of the American Statistical Association, 84 (408), 862-880.
McFadden, D. (1989), Testing for stochastic dominance. In: Studies in the Economics of Uncertainty (eds. T.B. Fomby and T.K. Seo). Springer-Verlag, New York.
Meyer, B., Viscusi, K. and Durbin, D. (1995), Workers' Compensation and Injury Duration: Evidence from a Natural Experiment. American Economic Review, 85, 322-340.
Schmid, F. and Trede, M. (1996a), Testing for first order stochastic dominance in either direction. Computational Statistics, 11, 165-173.
Schmid, F. and Trede, M. (1998), A Kolmogorov-type test for second-order stochastic dominance. Statistics and Probability Letters, 37, 183-193.
Shaked, M. and Shanthikumar, J.G. (1994), Stochastic Orders and their Applications. Academic Press, Boston, MA.
Whang, Y.-J., Linton, O. and Maasoumi, E. (2005), Consistent testing for stochastic dominance under general sampling schemes. Review of Economic Studies, 72.
7. Appendix: technical results and proofs
Theorem 7.1. The statistics $\hat{\kappa}$, $\hat{\delta}$, $\hat{\theta}$, and $\hat{\tau}$ are strongly (and thus weakly) consistent estimators of, respectively, $\kappa$, $\delta$, $\theta$, and $\tau$.
The proofs of the above theorem and those to be presented below are relegated
to the second half of this section.
The rest of this appendix is devoted to establishing distributional results for the three estimators $\hat{\kappa}$, $\hat{\delta}$, and $\hat{\tau}$. Throughout, we use the following Gaussian stochastic process:
\[
\Gamma(x) := \sqrt{\eta}\, B_1\bigl(F^{(0,1)}(x)\bigr) - \sqrt{1-\eta}\, B_2\bigl(F^{(1,1)}(x)\bigr),
\]
where $B_1$ and $B_2$ are two independent (standard) Brownian bridges on the interval $[0,1]$. Note that when the two distributions $F^{(0,1)}$ and $F^{(1,1)}$ are equal, then $\sup_x |\Gamma(x)|$ is not larger than $\sup_t |B(t)|$, and equals $\sup_t |B(t)|$ exactly when the distributions are continuous. The distribution of the random variable $\sup_t |B(t)|$ does not depend on any unknown parameter, has been tabulated, and is known as the Kolmogorov-Smirnov distribution.
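The Kolmogorov-Smirnov law of $\sup_t |B(t)|$ mentioned above is straightforward to approximate by simulation. The sketch below is our own illustration (the grid size and number of draws are arbitrary choices, not values used in the paper): it discretizes a Brownian bridge as $W(t) - t\,W(1)$ and recovers the familiar tabulated 95% point of about 1.358.

```python
import numpy as np

def sup_abs_brownian_bridge(n_grid=1000, rng=None):
    """One draw of sup_t |B(t)| for a Brownian bridge B on [0, 1],
    approximated on a discrete grid via B(t) = W(t) - t * W(1)."""
    rng = np.random.default_rng(rng)
    # Brownian motion on the grid: cumulative sum of scaled Gaussian steps
    steps = rng.standard_normal(n_grid) / np.sqrt(n_grid)
    w = np.concatenate([[0.0], np.cumsum(steps)])
    t = np.linspace(0.0, 1.0, n_grid + 1)
    bridge = w - t * w[-1]          # pin the path to zero at t = 1
    return np.abs(bridge).max()

rng = np.random.default_rng(12345)
draws = [sup_abs_brownian_bridge(rng=rng) for _ in range(2000)]
# Monte Carlo estimate of the 95% quantile of the Kolmogorov-Smirnov law;
# the tabulated value is approximately 1.358 (the grid approximation biases
# the estimate slightly downward).
q95 = float(np.quantile(draws, 0.95))
```

Such simulated quantiles are what the tabulated Kolmogorov-Smirnov critical values summarize; in the paper itself the critical values of the test statistics are instead estimated by bootstrap.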
Theorem 7.2. Under the null hypothesis $H_0^{(1)}$, we have that
\[
\hat{K} \to_d \Gamma, \tag{7.1}
\]
where $\Gamma := \sup_x |\Gamma(x)|$. Under the alternative hypothesis $H_1^{(1)}$, we have that
\[
\lim_{n,m\to\infty} P\left[\sqrt{\frac{nm}{n+m}}\, |\hat{\kappa} - \kappa| > x\right] \le P\bigl[\Gamma > x\bigr]. \tag{7.2}
\]
Hence, the test statistic $\hat{K}$ tends in probability to $+\infty$ under the alternative.
Theorem 7.3. Under the null hypothesis $H_0^{(2)}$, the test statistic $\hat{D}$ is such that
\[
\limsup_{n,m\to\infty} P\bigl[\hat{D} > x\bigr] \le P\bigl[\Gamma^{+} > x\bigr], \tag{7.3}
\]
where $\Gamma^{+} := \sup_x \Gamma(x)$. Under the alternative hypothesis $H_1^{(2)}$, we have that
\[
\lim_{n,m\to\infty} P\left[\sqrt{\frac{nm}{n+m}}\, (\hat{\delta} - \delta) > x\right] \le P\bigl[\Gamma > x\bigr], \tag{7.4}
\]
where $\Gamma$ is the same as in Theorem 7.2. Hence, under the alternative, the test statistic $\hat{D}$ tends in probability to $+\infty$.
Theorem 7.4. Under the null hypothesis $H_0^{(3)}$, the test statistic $\hat{T}$ is such that
\[
\limsup_{n,m\to\infty} P\bigl[\hat{T} > x\bigr] \le P\bigl[\max(\Gamma^{+}, \Gamma^{-}) > x\bigr], \tag{7.5}
\]
where $\Gamma^{-} := \sup_x(-\Gamma(x))$, and $\Gamma^{+}$ is the same as in Theorem 7.3. Under the alternative hypothesis $H_1^{(3)}$, we have that
\[
\lim_{n,m\to\infty} P\left[\sqrt{\frac{nm}{n+m}}\, (\hat{\tau} - \tau) > x\right] \le P\bigl[\Gamma > x\bigr]. \tag{7.6}
\]
Hence, the test statistic $\hat{T}$ tends in probability to $+\infty$ under the alternative.
Proof of Theorem 7.1. The strong consistency of the four estimators follows from the classical Glivenko-Cantelli lemma, which implies that the two suprema
\[
\sup_x \bigl|\widehat{F}^{(0,1)}(x) - F^{(0,1)}(x)\bigr| \quad\text{and}\quad \sup_x \bigl|\widehat{F}^{(1,1)}(x) - F^{(1,1)}(x)\bigr|
\]
converge to 0 almost surely. $\Box$
Proof of Theorem 7.2. Under the null hypothesis, we have $F^{(0,1)}(x) = F^{(1,1)}(x)$ for all $x$, and so $\hat{K} = \sup_x |\Delta(x)|$, where
\[
\Delta(x) := \sqrt{\frac{nm}{n+m}}\, \bigl(\widehat{F}^{(0,1)}(x) - F^{(0,1)}(x)\bigr) - \sqrt{\frac{nm}{n+m}}\, \bigl(\widehat{F}^{(1,1)}(x) - F^{(1,1)}(x)\bigr).
\]
Consequently, $\hat{K}$ converges in distribution to $\Gamma$. Statement (7.1) is proved. To prove statement (7.2), we first note that under the alternative we have the equality $\hat{K} = \sup_x |\Xi(x)|$, where
\[
\Xi(x) := \Delta(x) + \sqrt{\frac{nm}{n+m}}\, \bigl(F^{(0,1)}(x) - F^{(1,1)}(x)\bigr).
\]
Obviously now,
\[
\sqrt{\frac{nm}{n+m}}\, |\hat{\kappa} - \kappa| \le \sup_x |\Delta(x)| \to_d \Gamma. \tag{7.7}
\]
Statement (7.2) follows. The last note of Theorem 7.2 follows from statement (7.7) and the fact that $\sqrt{\frac{nm}{n+m}}\, \kappa \to \infty$, since $\kappa > 0$ under the alternative. $\Box$
Proof of Theorem 7.3. We first write the equation $\hat{D} = \sup_x \Xi(x)$ with the earlier defined function $\Xi(x)$. Now we note that, under the null hypothesis, $\sup_x \Xi(x)$ does not exceed $\sup_x \Delta(x)$, and we already know that the latter quantity converges in distribution to $\Gamma^{+}$. Statement (7.3) follows. To prove statement (7.4), we write the bound
\[
\sqrt{\frac{nm}{n+m}}\, |\hat{\delta} - \delta| \le \sup_x |\Delta(x)|. \tag{7.8}
\]
Statement (7.4) follows. The last note of Theorem 7.3 follows from statement (7.8) and the fact that $\sqrt{\frac{nm}{n+m}}\, \delta \to \infty$, since $\delta > 0$ under the alternative. $\Box$
Proof of Theorem 7.4. With the earlier defined function $\Xi(x)$, we have that $\hat{T}$ is the minimum between $\sup_x \Xi(x)$ and $\sup_x(-\Xi(x))$. Following the arguments in the proof of Theorem 7.3, we have that $\sup_x \Xi(x) \le \sup_x \Delta(x)$ provided that $F^{(0,1)} \le F^{(1,1)}$. If, however, $F^{(0,1)} \ge F^{(1,1)}$, then we analogously prove that $\sup_x(-\Xi(x)) \le \sup_x(-\Delta(x))$. Since we do not know which of the two cases happens, we estimate $\hat{T}$ from above by the maximum between $\sup_x \Delta(x)$ and $\sup_x(-\Delta(x))$. Since $\sup_x \Delta(x) \to_d \Gamma^{+}$ and $\sup_x(-\Delta(x)) \to_d \Gamma^{-}$, we have statement (7.5). Statement (7.6) follows from the bound
\[
\sqrt{\frac{nm}{n+m}}\, |\hat{\tau} - \tau| \le \sup_x |\Delta(x)|. \tag{7.9}
\]
The last note of Theorem 7.4 follows from statement (7.9) and the fact that $\sqrt{\frac{nm}{n+m}}\, \tau \to \infty$, since $\tau > 0$ under the alternative. $\Box$
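As a numerical sanity check on the convergence in Theorem 7.2, one can draw both samples from the same continuous distribution (the null case) and verify that the upper quantiles of $\hat{K}$ approach those of the Kolmogorov-Smirnov law. The sketch below is our own illustration with an arbitrary distribution and sample sizes, not the paper's simulation designs.

```python
import numpy as np

def khat(x, y):
    """K-hat = sqrt(nm/(n+m)) * sup_x |F-hat_x(x) - F-hat_y(x)|,
    with the supremum taken over the pooled sample points."""
    n, m = len(x), len(y)
    grid = np.sort(np.concatenate([x, y]))
    Fx = np.searchsorted(np.sort(x), grid, side="right") / n
    Fy = np.searchsorted(np.sort(y), grid, side="right") / m
    return np.sqrt(n * m / (n + m)) * np.abs(Fx - Fy).max()

rng = np.random.default_rng(7)
# Under the null both samples come from the same continuous distribution,
# so the 95% quantile of K-hat should be near the Kolmogorov-Smirnov
# asymptotic value of about 1.358.
stats = [khat(rng.exponential(size=500), rng.exponential(size=500))
         for _ in range(1000)]
q95 = float(np.quantile(stats, 0.95))
```

Under the alternatives of Theorems 7.2-7.4 the same statistics instead drift to infinity at the rate $\sqrt{nm/(n+m)}$, which is exactly why the tests are consistent.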
8. Appendix: tables and figures
Table 8.1. P-values of tests for the equality, dominance and intersection of the control and treatment distributions at t = 1. Linear and non-linear DGPs.

Subsection   n = 200   n = 500   n = 1000   n = 2000
4.1           0.165     0.525     0.839      0.914
4.2           0.062     0.291     0.685      0.783
4.3           0.073     0.462     0.716      0.801
4.4           0.003     0.222     0.723      0.808
4.5           0.270     0.208     0.143      0.085
4.6           0.213     0.140     0.104      0.057
4.5 (KS)      0.146     0.096     0.023      0.005
4.6 (KS)      0.028     0.001     0.000      0.000
Table 8.2. P-values of tests for distributional differences between the control and treatment groups for SJSA at t = 1.

                 Intersection
P-value           0.213
P-value (KS)      0.195
[Figure 8.1: six EDF panels, a)-f), plotted against the duration variable; plot content omitted in this extraction.]
Figure 8.1. EDFs as functions of duration for linear models: a) equality of distributions, b) dominance of distributions, c) intersection of distributions. EDFs as functions of duration for non-linear models: d) equality of distributions, e) dominance of distributions, f) intersection of distributions.
[Figure 8.2: four histogram panels, for N = 200, 500, 1000, 2000; plot content omitted in this extraction.]
Figure 8.2. Histograms of P-values for the equality of distributions. Linear model.
[Figure 8.3: four histogram panels, for N = 200, 500, 1000, 2000; plot content omitted in this extraction.]
Figure 8.3. Histograms of P-values for the equality of distributions. Non-linear model.
[Figure 8.4: four histogram panels, for N = 200, 500, 1000, 2000; plot content omitted in this extraction.]
Figure 8.4. Histograms of P-values for the dominance of distributions. Linear model.
[Figure 8.5: four histogram panels, for N = 200, 500, 1000, 2000; plot content omitted in this extraction.]
Figure 8.5. Histograms of P-values for the dominance of distributions. Non-linear model.
[Figure 8.6: four histogram panels, for N = 200, 500, 1000, 2000; plot content omitted in this extraction.]
Figure 8.6. Histograms of P-values for the intersection of distributions. Linear model.
[Figure 8.7: four histogram panels, for N = 200, 500, 1000, 2000; plot content omitted in this extraction.]
Figure 8.7. Histograms of P-values for the intersection of distributions. Non-linear model.
[Figure 8.8: panel a) EDFs against the SJSA treatment/control duration variables, panel b) P-value histogram; plot content omitted in this extraction.]
Figure 8.8. a) Treatment and control EDFs in the second period, b) Histograms of P-values for the SJSA treatment.