COUNTERFACTUAL MAPPING AND INDIVIDUAL TREATMENT EFFECTS
IN NONSEPARABLE MODELS WITH BINARY ENDOGENEITY*
QUANG VUONG† AND HAIQING XU‡
ABSTRACT. This paper establishes nonparametric identification of individual treatment
effects in a nonseparable model with a binary endogenous regressor. The outcome
variable may be continuous, discrete or a mixture of both, and the instrumental variable
can take binary values. We distinguish the cases where the model includes or does not
include a selection equation for the binary endogenous regressor. First, we establish
point identification of the structural function when it is continuous and strictly monotone
in the latent variable. The key to our results is the identification of a so-called “counterfactual mapping” that links each outcome with its counterfactual. This then identifies
every individual treatment effect. Second, we characterize all the testable restrictions
imposed by the model with or without the selection equation. Lastly, we generalize our
identification results to the case where the outcome variable has a probability mass in its
distribution such as when the outcome variable is censored or binary.
Keywords: Nonparametric identification, nonseparable models, discrete endogenous
variable, counterfactual mapping, individual treatment effects
Date: Saturday 16th May, 2015.
∗ A previous version of this paper was circulated under the title “Identification of nonseparable models
with a binary endogenous regressor”. We thank Jason Abrevaya, Federico Bugni, Karim Chalak, Xiaohong
Chen, Victor Chernozhukov, Andrew Chesher, Denis Chetverikov, Xavier D’Haultfoeuille, Stephen
Donald, Jinyong Hahn, Shakeeb Khan, Brendan Kline, Qi Li, Robert Lieli, Matthew Masten, Isabelle
Perrigne, Geert Ridder, Xiaoxia Shi, Robin Sickles, Elie Tamer, Edward Vytlacil and Nese Yildiz for
useful comments. We also thank seminar participants at Duke, Rice, NYU, UCLA, Princeton, Rochester,
2013 Bilkent University Annual Summer Workshop, 23rd Annual Meeting of the Midwest Econometrics
Group, 2014 Cowles Summer Conference, 2014 Cemmap Conference, and the 6th French Econometrics
Conference.
† (corresponding author) Department of Economics, New York University, 19 W. 4th Street, 6FL, New
York, NY, 10012, qvuong@nyu.edu.
‡ Department of Economics, University of Texas at Austin, Austin, TX, 78712, h.xu@austin.utexas.edu.
1. INTRODUCTION AND RELATED LITERATURE
The primary aim of this paper is to establish nonparametric identification of individual treatment effects in a nonseparable model with a binary endogenous regressor. We
focus on the case where the instrumental variable has limited variation, namely when it takes only two values. We introduce a counterfactual mapping, which links each
realized outcome with its counterfactual to identify individual treatment effects. As a
by–product, we also identify the structural function. The key idea is to exploit information from the “complier group” in the presence of endogeneity following Imbens
and Angrist (1994). The identification results in the paper are related to the methods
in Chernozhukov and Hansen (2005), Athey and Imbens (2006), and Guerre, Perrigne,
and Vuong (2009). When the outcome variable has mass points in its distribution, our
approach also generalizes Vytlacil and Yildiz (2007) to the fully nonseparable case.
The secondary objective of this paper is to investigate the identification power as well as the restrictions on observables provided by the selection equation, such as its
monotonicity. In particular, we derive an easily verifiable high level condition that is
sufficient for identification of the structural function and show that the monotonicity
of the selection equation is sufficient for such a condition. Moreover, we characterize
all the testable restrictions on observables imposed by the model with or without the
selection equation.
Nonparametric identification of nonseparable models has become central for understanding the source of identification power in structural models, especially in models
with discrete endogenous variables, see, e.g., Heckman, Smith, and Clements (1997),
Chesher (2005), Heckman and Vytlacil (2005) and the references therein. Theoretical models that admit structural relationships usually provide only qualitative properties, e.g.,
monotonicity between economic variables of interest (see e.g. Milgrom and Shannon,
1994; Athey, 2001, 2002; Reny, 2011). For the related empirical analysis, introducing
additional parametric restrictions on the functional form, especially on the form of
heterogeneity, without careful justifications may lead to spurious identification; see
Heckman and Robb (1986) and Imbens and Wooldridge (2009).
The literature on nonseparable models and treatment effects is vast, see e.g. Imbens
(2007), Imbens and Wooldridge (2009) for surveys. Our paper also considers the identification of the structural functions in nonseparable models with binary endogenous
regressors. Identification of such functions was studied in a setting with continuous
endogenous regressors; see Chesher (2003), Matzkin (2008), Imbens and Newey (2009),
D’Haultfœuille and Février (forthcoming), Torgovitsky (forthcoming), among others.
For the binary endogenous variable case, Chernozhukov and Hansen (2005), Chen,
Chernozhukov, Lee, and Newey (2014) and Chernozhukov and Hansen (2013) establish identification of the structural function without requiring a selection equation.
Further, Chesher (2005) establishes partial identification of the structural function at
some conditional quantile of the error term under local conditions on the instrumental
variable. Subsequently, Jun, Pinkse, and Xu (2011) tighten Chesher (2005)’s bounds by
strengthening Chesher’s local conditions to the full independence of the instrumental
variable.
The idea behind our identification strategy differs from the above literature. It is
based on identifying the counterfactual mapping that relates each individual outcome to
its counterfactual under the monotonicity of the outcome equation. This then identifies
the structural function. Similar to Imbens and Angrist (1994), we consider an exogenous
change in the instrumental variable, which affects the distribution of outcomes through
a subpopulation called the “complier” group. We can then identify the conditional probability distribution of the (potential) outcome for this subpopulation at each value of the
binary endogenous variable. From these two conditional distributions, we can identify
constructively the counterfactual mapping relating the quantiles of one distribution
to the other for the whole population. The idea of matching two quantile functions to
exploit limited variations of instrumental variables was introduced by Athey and Imbens (2006) in the context of policy intervention analysis with repeated cross–sectional
data in nonlinear difference–in–difference models. It was also used by Guerre, Perrigne,
and Vuong (2009) in the empirical auction literature with an endogenous number of
bidders, and exploited by D’Haultfœuille and Février (forthcoming) and Torgovitsky
(forthcoming) in a triangular model with continuous endogenous regressors.
Key among our identifying conditions is the (strict or weak) monotonicity of the
structural function, which comes from the nonseparable model literature. See e.g.
Matzkin (1999, 2003, 2008, 2015), Chesher (2003, 2005), Chernozhukov and Hansen (2005,
2013) and Chen, Chernozhukov, Lee, and Newey (2014). Without any monotonicity
restriction, Manski (1990) derives sharp bounds for the average treatment effect (ATE)
with and without instrumental variables. Using an instrumental variable and the weak
monotonicity assumption on the selection equation, Imbens and Angrist (1994) establish
point identification of the local average treatment effect (LATE). Alternatively, in a
similar setting, Heckman and Vytlacil (1999, 2005) develop the marginal treatment effect
(MTE) and establish its identification by using local variations in instrumental variables.
In this paper, we impose a strict/weak monotonicity assumption on the outcome
equation, which allows us to constructively identify the counterfactual mapping, as
well as the individual treatment effects.
In our setting, we allow the instrumental variable to take only binary values, a case
where identification at infinity obviously fails; see, e.g., Heckman (1990). When the
outcome variable is continuously distributed, we assume that the outcome equation is
strictly monotone in the error term. Under this assumption, our identification strategy
only requires a binary–valued instrumental variable. On the other hand, when the
outcome variable has a probability mass, the strict monotonicity condition is no longer appropriate given that the error term is usually assumed to be continuously distributed. We
then replace the strict monotonicity by the weak monotonicity of the outcome equation.
As a consequence, our rank condition requires more variations in the instrumental
variable, though a finite support is still allowed, thereby generalizing Vytlacil and Yildiz
(2007)’s results from weakly to fully nonseparable models.
Our method is related to the instrumental variable approach developed in Chernozhukov and Hansen (2005) and generalized by Chen, Chernozhukov, Lee, and Newey
(2014) and Chernozhukov and Hansen (2013). The instrumental variable approach does
not require a selection equation. Identification then relies on a full rank condition
of an equation system. In our setting, we exploit the identification power from the
monotonicity of some identified functions to deliver a weaker sufficient condition for
identification in a constructive way. Moreover, we characterize all the testable restrictions on observables imposed by the model with and without the selection equation,
thereby allowing researchers to assess the model from data.
The paper is organized as follows. We introduce our benchmark
model in Section 2 and we establish its identification in Section 3. We also discuss
partial identification when point identification fails. Section 4 extends our identification
argument to the case where there is no selection equation. Section 5 characterizes
the restrictions imposed on data by the model with or without the selection equation.
Section 6 generalizes our method to the case where the distribution of the outcome
variable has some mass points. Section 7 concludes.
2. THE BENCHMARK MODEL
We consider the nonseparable triangular system with a binary endogenous variable:
Y = h(D, X, e),    (1)
D = 1[m(X, Z) − η ≥ 0],    (2)
where Y is the outcome variable, D ∈ {0, 1} is a binary endogenous variable, X ∈ R^{d_X} is a vector of observed exogenous covariates and Z ∈ {z1, z2} is a binary instrumental variable for the binary endogenous variable D.1 The error terms e and η are assumed
1An early use of this triangular model is Heckman (1979) for sample selection in a parametric setting.
Moreover, considering a binary valued instrument highlights the identification power of the instrumental
variable. For the importance of having binary instruments in the treatment effects literature, see e.g.
Imbens and Wooldridge (2009) for a detailed discussion.
to be scalar valued disturbances.2 The functions h and m are unknown structural
relationships, where h is the function of interest. For every x ∈ SX , we also define the
individual treatment effect (ITE) as the function
ITE_x(e) ≡ h(1, x, e) − h(0, x, e),    (3)
where e ∈ S_e.3 See e.g. Rubin (1974) and Heckman, Smith, and Clements (1997).
Following standard convention, we refer to (1) and (2) as the outcome equation and
the selection equation, respectively. Note that (2) covers the general setting where
D = g(X, Z, η) under additional standard assumptions. Specifically, suppose that g is non–increasing and left–continuous in η. For each (x, z), let m(x, z) = inf{η ∈ R : g(x, z, η) = 0}. It follows that g(x, z, η) = 1[m(x, z) − η ≥ 0] for all (x, z). For a
detailed discussion, see e.g. Vytlacil (2002, 2006).
A running example of this model is the return to education (see e.g. Heckman, 1979;
Chesher, 2005): let Y, D and (X, Z) be ‘earnings’, ‘schooling’ and ‘demographics’, respectively. Moreover, let e be job-related ability and η be education-related talent. Intuitively, these two latent variables should be correlated with each other, which accounts for the endogeneity problem in the triangular system. The difference between demographics X
and Z is that Z affects the education level of an individual, but not the wage outcome.
For instance, Z could be some institutional features of the education system. Another
example arises when h(D, X, ·) is the equilibrium bidding strategy with risk-averse bidders, D indicates the level of competition (the number of bidders), X are characteristics of the auctioned object and e is the bidder’s private value; see e.g. Guerre, Perrigne, and Vuong (2000, 2009). This example is especially relevant as strict monotonicity and nonseparability in the scalar private value e arise from auction theory.
2Note that η being scalar is not essential and can be relaxed by the monotonicity assumption in Imbens
and Angrist (1994). When e ∈ Rk with k ≥ 2, there always exists a (known) bijection from Rk to R that
depends on neither d nor x. In the nonseparable model context, multidimensional errors have also been
considered by Hoderlein and Mammen (2007) and Kasy (2014).
3For a generic random variable W, we use S_W to denote the support of W.
In the benchmark model, we make the following assumptions.
Assumption A. Equation (1) holds where (i) h is continuous and strictly increasing in e, with e distributed as U[0, 1], and (ii) (X, Z) is independent of e.
Assumption B. Equation (2) holds where (i) Z is conditionally independent of η given ( X, e),
and (ii) conditional on X, the joint distribution of (e, η ) is absolutely continuous with respect to
Lebesgue measure.
In assumption A–(i), the continuity and strict monotonicity of h follow e.g. Matzkin
(1999, 2003), Chesher (2003) and Chernozhukov and Hansen (2005). This assumption
rules out mass points in the distribution of Y. In Section 6, we generalize our results
by allowing h to be flat inside the support of e. The uniform distribution of e on
[0, 1] is a normalization which is standard in the literature; see e.g. Chesher (2003)
and Chernozhukov and Hansen (2005).4 The essential restriction in assumption A-(i)
is the so–called rank preservation/invariance condition, see e.g. Heckman, Smith,
and Clements (1997), Chernozhukov and Hansen (2005) and Firpo (2007), among
others. For any given τ ∈ (0, 1), it requires that the relative rank/quantile of h(1, x, τ )
in the distribution of h(1, x, e) be the same as that of h(0, x, τ ) in the distribution of
h(0, x, e). In Section 5, we characterize the model restrictions on observables imposed
by assumption A. These restrictions are weak but testable. If the error term is additive,
as is widely assumed, i.e., Y = h∗ ( D, X ) + e for some real valued function h∗ , then
assumption A–(i) holds. Assumption A–(ii) is standard in the literature, see, e.g., Imbens
and Newey (2009).
To illustrate, we provide two examples of nonseparable structural functions h satisfying the strict monotonicity assumption. The first example seems to be new. The second example was considered by Guerre, Perrigne, and Vuong (2009).
4Matzkin (2003) proposes an alternative normalization on the structural function h, instead of normalizing
the distribution of e.
Example 1 (Additive Error with Generalized Heterogeneity). Let
Y = h∗ ( D, X ) + σ( D, X ) · e
for some real-valued function h∗, some positive function σ > 0, and where e has unit variance. In particular, the heterogeneity σ depends on the endogenous binary variable D. Thus, h(D, X, e) = h∗(D, X) + σ(D, X) · e.
Example 2 (First–price Auction with Risk Aversion). Consider a first price auction with I
risk averse bidders within the private value paradigm. In a symmetric equilibrium, each bidder’s
equilibrium bid has the following form
B_i = h(I, X, e_i),  ∀ i = 1, . . . , I,
where B_i is bidder i’s equilibrium bid, I ∈ {I_ℓ, I_h} is the number of bidders which reflects the level of competition, X is characteristics of the auctioned object, and e_i is the private value. In particular, the bidding strategy h is strictly increasing in private information e_i.
Moreover, assumption B specifies the selection equation. Assumption B–(i) together
with assumption A–(ii) is equivalent to e⊥ X and (e, η )⊥ Z | X.5 It is worth emphasizing
that we require the instrumental variable Z to be independent of (e, η ) (conditionally
on X), which is stronger than the local independence restriction imposed by Chesher
(2005). Moreover, it is worth noting that if the objects of interest are the individual
treatment effects (ITE) but not the structural function h, then the independence condition
in assumption A–(ii) can be relaxed to Z ⊥e| X. The latter combined with the conditional
independence in assumption B–(i) becomes equivalent to Z ⊥(e, η )| X. Under such a
weaker condition, Section 3 shows that we can still identify the counterfactual mapping
φx and hence the ITEs as our identification argument is conditional on X = x. In other
words, identification of the ITEs does not require exogeneity of X. Assumption B–(ii)
5This is because assumption A-(ii) is equivalent to e⊥ X and e⊥ Z | X. The latter condition and η ⊥ Z |( X, e)
in assumption B–(i) are equivalent to (e, η )⊥ Z | X.
on the joint distribution of (e, η ) is standard and weak. It rules out mass points in the
joint distribution of (e, η ).
Under assumption B, the function m is identified up to the distribution function F_{η|X}(·|·), i.e., m(x, z) = F_{η|X}^{-1}(E(D|X = x, Z = z)|x) for all (x, z) ∈ S_{XZ}, where F_{η|X}^{-1}(·|x) is the conditional quantile function of η given X = x. Moreover, for any x ∈ S_X and t ∈ R, we have by (e, η) ⊥ Z|X,
P(h(1, x, e) ≤ t; η ≤ m( x, z)| X = x ) = P(Y ≤ t; D = 1| X = x, Z = z);
P(h(0, x, e) ≤ t; η > m( x, z)| X = x ) = P(Y ≤ t; D = 0| X = x, Z = z).
In contrast, neither P(h(1, x, e) ≤ t; η > m(x, z)|X = x) nor P(h(0, x, e) ≤ t; η ≤ m(x, z)|X = x) is identified yet. Next, suppose there are z1, z2 ∈ S_{Z|X=x} with p(x, z1) < p(x, z2), where p(x, z) = P(D = 1|X = x, Z = z) denotes the propensity score. Then, the conditional distribution of h(d, x, e) given X = x and m(x, z1) < η ≤ m(x, z2) can be identified as follows:
For d = 0, 1 and all t ∈ R,

P(h(d, x, e) ≤ t | X = x, m(x, z1) < η ≤ m(x, z2))
= P(h(d, x, e) ≤ t; m(x, z1) < η ≤ m(x, z2) | X = x) / P(m(x, z1) < η ≤ m(x, z2) | X = x)
= [P(Y ≤ t; D = d | X = x, Z = z2) − P(Y ≤ t; D = d | X = x, Z = z1)] / [P(D = d | X = x, Z = z2) − P(D = d | X = x, Z = z1)].    (4)
The above calculation follows Imbens and Angrist (1994), who derive the conditional expectations given the complier group, i.e., {X = x, m(x, z1) < η ≤ m(x, z2)}, while we consider the full conditional distributions. An expression similar to
(4) can also be found in Imbens and Rubin (1997) and Froelich and Melly (2013). We use
(4) for the identification analysis in the next section.
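To fix ideas, the empirical analogue of (4) can be computed in a few lines. The data-generating process below is our own toy design (the functional forms and parameter values are illustrative assumptions, not from the paper); it only demonstrates that the ratio on the right-hand side of (4), which uses observables alone, recovers the complier group's outcome distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Toy design (ours, purely illustrative): e ~ U[0,1]; eta correlated with e
# creates endogeneity; Z is a binary instrument independent of (e, eta).
e = rng.uniform(size=n)
eta = e + rng.normal(scale=0.5, size=n)
Z = rng.integers(0, 2, size=n)
m = np.where(Z == 1, 1.0, 0.5)            # m(x, z1) = 0.5 < m(x, z2) = 1.0
D = (m - eta >= 0).astype(int)            # selection equation (2)
Y = np.where(D == 1, 1.0 + e, 2.0 * e)    # h(1, e) = 1 + e, h(0, e) = 2e

def complier_cdf(t, d):
    """Right-hand side of (4): conditional CDF of h(d, e) for compliers."""
    num = ((Y[Z == 1] <= t) & (D[Z == 1] == d)).mean() \
        - ((Y[Z == 0] <= t) & (D[Z == 0] == d)).mean()
    den = (D[Z == 1] == d).mean() - (D[Z == 0] == d).mean()
    return num / den

# Infeasible benchmark using the latent draws: compliers are 0.5 < eta <= 1.0.
compliers = (eta > 0.5) & (eta <= 1.0)
print(complier_cdf(1.5, 1), ((1.0 + e[compliers]) <= 1.5).mean())
```

Both printed numbers estimate P(h(1, e) ≤ 1.5 | complier group) and agree up to sampling error, even though the feasible version never observes η.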
3. IDENTIFICATION
In this section, we develop a simple approach for the identification of individual
treatment effects as well as the structural function h under assumptions A and B. In particular, our method involves a weak rank condition that is straightforward to verify: for any given x, the propensity score has a non–degenerate support, i.e., p(x, z1) ≠ p(x, z2).
Under our rank condition, the identification at infinity arguments are obviously not
satisfied (see e.g. Chamberlain, 1986; Heckman, 1990). The proposed identification strategy is constructive and derived under a binary variation of the instrumental variable,
which differs from e.g. Heckman (1990) and Chernozhukov and Hansen (2005).
The key to our identification strategy is to match h(1, x, ·) with h(0, x, ·) under weak
conditions, i.e. h(1, x, ·) = φx (h(0, x, ·)) on [0, 1], where the function φx can be identified in a constructive manner for every x ∈ SX . We call φx the counterfactual mapping because we can find the counterfactual outcome h(1, x, e) from h(0, x, e) using
φ_x for any e ∈ [0, 1]. Specifically, fix x ∈ S_X. For each y ∈ S_{h(0,x,e)}, we have φ_x(y) ≡ h(1, x, h^{-1}(0, x, y)), where h^{-1}(0, x, ·) is the inverse function of h(0, x, ·). Under
assumption A, the counterfactual mapping φx is well–defined, continuous and strictly
increasing from S_{h(0,x,e)} onto S_{h(1,x,e)}. Thus, the ITE satisfies
ITE_x(e) = D(Y − φ_x^{-1}(Y)) + (1 − D)(φ_x(Y) − Y),    (5)
conditional on X = x, where φ_x^{-1}(·) denotes the inverse of φ_x(·). In particular, ITE_x(·)
is identified as soon as φ_x(·) is. Moreover, given φ_x and any (d, x), we can also identify h(d, x, ·) as the quantile function of F_{h(d,x,e)}, following e.g. Chesher (2003).
We first provide a lemma showing that identification of φ_x(·) is necessary and sufficient for identifying h(d, x, ·) under assumption A.
Lemma 1. Suppose assumption A holds. Then, for any x ∈ S_X, h(0, x, ·) and h(1, x, ·) are identified on [0, 1] if and only if φ_x(·) is identified on S_{h(0,x,e)}.
Proof. See Appendix A.1
Lemma 1 reduces the identification of h(1, x, ·) and h(0, x, ·) to the identification of one function, namely, the counterfactual mapping φ_x(·). To see the if part, note that
conditional on X = x,
h(1, x, e) = YD + φ_x(Y)(1 − D),   h(0, x, e) = φ_x^{-1}(Y)D + Y(1 − D).    (6)
Thus, the identification of φx (·) provides the distributions of h(0, x, e) and h(1, x, e),
which further identify h(0, x, ·) and h(1, x, ·). Specifically, for d = 0, 1 we have h(d, x, e) = F_{h(d,x,e)}^{-1}(e) for every e ∈ (0, 1) by assumption A.
To identify the counterfactual mapping φx (·), we introduce a minimal rank condition,
namely, that the propensity score p( x, z) varies with z.
Assumption C (Rank Condition). For every x ∈ S_X, S_{p(X,Z)|X=x} is not a singleton.
Given that Z is binary, assumption C is equivalent to S_{Z|X=x} = {z1, z2} and p(x, z1) ≠ p(x, z2) for all x ∈ S_X. Under assumption C, we can identify the conditional distribution of h(d, x, e) given m(x, z1) < η ≤ m(x, z2) (and X = x) for d = 0, 1 by (4). Throughout, we use F_{h(d,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(·) to denote the conditional CDF of h(d, x, e) given X = x and m(x, z1) < η ≤ m(x, z2). Let C_{dx} be its support.
We have C_{dx} ⊆ S_{Y|D=d,X=x}.6 In the next assumption, we require that the complier group be sufficiently large so that these two supports are equal.
Assumption D (Support Condition). We have C_{dx} = S_{Y|D=d,X=x} for d = 0, 1 and x ∈ S_X.
By (4), assumption D is testable. Assumption D requires that, for any individual in the
population, there is an individual from the complier group who has the same outcome
and treatment status. Without assumption D, the structural function h(d, x, ·) is point identified only on the support of e given the complier group, as we discuss below.
6Because Y = h(D, X, e), this is equivalent to S_{e|m(x,z1)<η≤m(x,z2), X=x} ⊆ S_{e|D=d, X=x}. Consider d = 1. We have S_{e|m(x,z1)<η≤m(x,z2), X=x} = S_{e|m(x,z1)<η≤m(x,z2), Z=z2, X=x} by (e, η) ⊥ Z|X. But {m(x, z1) < η ≤ m(x, z2), Z = z2, X = x} ⊆ {η ≤ m(x, z2), Z = z2, X = x} = {D = 1, Z = z2, X = x} ⊆ {D = 1, X = x}. The proof is similar for d = 0.
Assumption D is equivalent to the support of e for the complier group being equal to that in the population, i.e. S_{e|m(x,z1)<η≤m(x,z2), X=x} = [0, 1].7 It is typically satisfied. For instance, a sufficient condition for assumption D is that conditional on X, (e, η) has a rectangular support such as [0, 1] × R, as often assumed for the triangular model (1)-(2).
For a generic random variable W, let Q_W be the quantile function of its distribution, i.e. Q_W(τ) = inf{w ∈ R : P(W ≤ w) ≥ τ} for any τ ∈ [0, 1]. Further, for any x, d and a generic subset A ⊆ S_η, let Q_{h(d,x,e)|X=x, η∈A} be the conditional quantile function of h(d, x, e) given X = x and η ∈ A, i.e. Q_{h(d,x,e)|X=x, η∈A}(τ) = inf{h ∈ R : P[h(d, x, e) ≤ h | X = x, η ∈ A] ≥ τ}.
Lemma 2. Suppose assumptions A to D hold. For every x ∈ S_X, let z1, z2 ∈ S_{Z|X=x} such that p(x, z1) < p(x, z2). Then, S_{h(0,x,e)} = S_{Y|D=0,X=x} and the counterfactual mapping φ_x(·) is identified: for any y0 ∈ S_{h(0,x,e)},

φ_x(y0) = Q_{h(1,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(F_{h(0,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(y0)),    (7)

where Q_{h(1,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(·) and F_{h(0,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(·) are given by (4).
Proof. See Appendix A.2
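Numerically, the quantile-matching step in (7) amounts to composing one empirical CDF with the other's quantile function. The sketch below is a stylized check with toy functional forms of our own; as a shortcut, the two complier samples are drawn directly rather than estimated through (4), which in practice is where they would come from. Since h(0, ·) and h(1, ·) are strictly increasing transforms of the same latent e, matching ranks recovers φ_x:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy complier samples (illustrative): the same latent rank e drives both arms,
# with h(0, e) = 2e and h(1, e) = 1 + e, so the true mapping is phi(y0) = 1 + y0/2.
y0_compliers = 2.0 * rng.uniform(size=100_000)   # draws of h(0, e) for compliers
y1_compliers = 1.0 + rng.uniform(size=100_000)   # independent draws of h(1, e)

def phi_hat(y0):
    """Eq. (7): quantile of h(1,e) among compliers at the rank of y0 in h(0,e)."""
    tau = (y0_compliers <= y0).mean()            # F_{h(0,x,e)|compliers}(y0)
    return np.quantile(y1_compliers, tau)        # Q_{h(1,x,e)|compliers}(tau)

# ITE of an untreated unit with outcome y0, as in (5): phi(y0) - y0.
y0 = 1.0
print(phi_hat(y0), phi_hat(y0) - y0)   # close to 1.5 and 0.5
```

The estimated mapping and the implied ITE match the true values up to sampling error, without ever pairing individual observations across the two arms.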
As a second object of interest, we also identify the individual treatment effects by (5).
It is worth emphasizing that Lemma 2 holds without the independence between X and
e, as it holds pointwise in x. On the other hand, the requirement that the rank condition
and support condition hold for any x ∈ SX in assumptions C and D, respectively,
is useful to identify every individual treatment effect in the whole population by (5).
Moreover, using the counterfactual mapping φx (·) and (6), we can identify other policy
7From the previous footnote, assumption D is equivalent to S_{e|m(x,z1)<η≤m(x,z2), X=x} = S_{e|D=d, X=x} for d = 0, 1. Thus S_{e|m(x,z1)<η≤m(x,z2), X=x} = S_{e|X=x}, which is equal to [0, 1] by assumption A. Conversely, if S_{e|m(x,z1)<η≤m(x,z2), X=x} = [0, 1], then S_{e|m(x,z1)<η≤m(x,z2), X=x} = S_{e|X=x}. Therefore, the first inclusion in the previous footnote implies S_{e|m(x,z1)<η≤m(x,z2), X=x} = S_{e|D=d, X=x} for d = 0, 1.
effects of interest, e.g.,
ATE ≡ E[h(1, X, e) − h(0, X, e)] = E{D(Y − φ_X^{-1}(Y)) + (1 − D)(φ_X(Y) − Y)},
ATT ≡ E[h(1, X, e) − h(0, X, e) | D = 1] = E[Y − φ_X^{-1}(Y) | D = 1],
QTE_x(·) ≡ Q_{h(1,x,e)}(·) − Q_{h(0,x,e)}(·) = Q_{D·Y + (1−D)·φ_X(Y) | X}(·|x) − Q_{(1−D)·Y + D·φ_X^{-1}(Y) | X}(·|x),
where ATT and QTE respectively refer to the average treatment effect on the treated and the quantile treatment effect, which have been extensively discussed in the literature; see e.g. Heckman and Vytlacil (2007) and Imbens and Wooldridge (2009).
In particular, if one imposes additivity on h, i.e. h(D, X, e) = h∗(D, X) + e, then φ_x(y) = h∗(1, x) − h∗(0, x) + y. By Lemma 2, we can show that for any τ ∈ [0, 1],

h∗(1, x) − h∗(0, x) = Q_{h(1,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(τ) − Q_{h(0,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(τ).    (8)
Equation (8) is worth discussing. First, by integrating out τ, we obtain

h∗(1, x) − h∗(0, x) = E[h(1, x, e) − h(0, x, e) | X = x, m(x, z1) < η ≤ m(x, z2)]
= [E(Y | X = x, Z = z2) − E(Y | X = x, Z = z1)] / [p(x, z2) − p(x, z1)].
Note that the RHS has the same expression as the (conditional) LATE in Imbens and
Angrist (1994). This is not surprising, since there is no unobserved heterogeneity
in individual treatment effects as h(1, x, e) − h(0, x, e) = h∗ (1, x ) − h∗ (0, x ). Thus,
individual treatment effects are the same as (conditional) LATE or conditional ATE,
which is defined as E[h(1, X, e) − h(0, X, e)| X = x ].8 Second, (8) is a useful model
restriction from the linear additivity, which suggests several ways of testing the additive
separability (in e) of the structural function h. A natural testing idea is to test that the
8From such an intuition, equation (2) is not even needed for this simple expression of h∗ (1, x ) − h∗ (0, x ).
slope of φx (·) equals 1. Alternatively, one can also test that the RHS of (8) is constant
across quantiles.
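Under additive separability, the integrated version of (8) is the familiar Wald ratio. A minimal simulation check (a toy design of our own, not from the paper) contrasts it with the naive treated–untreated comparison, which is biased by the dependence between e and η:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

e = rng.uniform(size=n) - 0.5              # centered outcome error
eta = e + rng.normal(scale=0.5, size=n)    # endogeneity: eta correlated with e
Z = rng.integers(0, 2, size=n)             # binary instrument
D = (np.where(Z == 1, 0.5, 0.0) - eta >= 0).astype(int)
Y = 2.0 * D + e                            # additive h: h*(1) - h*(0) = 2

# Wald ratio = [E(Y|Z=z2) - E(Y|Z=z1)] / [p(z2) - p(z1)], as below (8).
wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (D[Z == 1].mean() - D[Z == 0].mean())
naive = Y[D == 1].mean() - Y[D == 0].mean()   # ignores selection on e

print(wald)    # close to the true effect, 2
print(naive)   # biased downward here: D = 1 selects low eta, hence low e
```

Because treatment effects are homogeneous under additivity, the Wald ratio recovers the common effect, whereas the raw comparison of treated and untreated outcomes does not.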
By Lemma 1 and Lemma 2, the identification of h follows.
Theorem 1. Suppose that assumptions A to D hold. For every x ∈ S_X, d = 0, 1 and τ ∈ [0, 1], h(d, x, τ) is identified as the τ-th quantile of the distribution F_{h(d,x,e)}(·), where

F_{h(1,x,e)}(t) = P(YD + φ_X(Y)(1 − D) ≤ t | X = x),
F_{h(0,x,e)}(t) = P(Y(1 − D) + φ_X^{-1}(Y)D ≤ t | X = x).
The proof of theorem 1 is straightforward given the above discussion, and hence omitted.
Our identification results in Theorem 1 are different from those in a seminal paper by
Chernozhukov and Hansen (2005) who establish the identification of h(d, x, τ ) under a
full–rank condition for any given τ ∈ (0, 1). Subsequently, Chen, Chernozhukov, Lee,
and Newey (2014) and Chernozhukov and Hansen (2013) weaken this rank condition
with respect to the regions where it should hold. Our rank condition requires a binary
instrument for the identification of h(d, x, ·), which is weaker than the rank conditions
in Chernozhukov and Hansen (2005), Chen, Chernozhukov, Lee, and Newey (2014) and
Chernozhukov and Hansen (2013). A closer comparison of identification conditions
between our approach and Chernozhukov and Hansen (2005) is provided in Section 4,
where we show that our approach exploits the shape restrictions (i.e. assumption A) to
the fullest.9 Moreover, our identification is constructive and highlights the identification
power of complier groups (as well as defier groups when (2) is not assumed) that are
induced by variations in the instrumental variables.
3.1. Discussion. First, we note that the rank condition (assumption C) is minimal for
an instrumental variable Z. In contrast, the identification at infinity approach developed
in Chamberlain (1986) and Heckman (1990) involves a strong support condition on
Z. In a semiparametric setting with discrete endogenous regressors, Lewbel (1998)
9We thank Victor Chernozhukov for helpful comments.
proposes a “special regressor” approach for the identification of linear index coefficients
under a large support assumption on the special regressor. Our condition on the
instrumental variable is related to that in recent papers by D’Haultfœuille and Février
(forthcoming) and Torgovitsky (forthcoming), who study a triangular system with
continuous endogenous regressors. By extending the quantile matching approach using
a chaining idea, their approaches for the identification of the structural relationship also
allow for a binary valued instrumental variable. See also Guerre, Perrigne, and Vuong
(2009) for an early use of the chaining argument.
In Lemma 2, assumption D is a support condition that relates to the effectiveness of
the instrumental variable. Intuitively, the complier group contains all the information
on (individual) treatment effects as well as the structural function (conditional on
X = x). Lemma 2 highlights that, in addition to generating variations in the propensity
score, the instrumental variable also needs to be fully effective to point identify the
counterfactual mapping over its whole support. Otherwise, there exists a non–trivial
subgroup in the population for which we cannot use exogenous variations in the
instrumental variable to point identify their treatment effects. When assumption D fails,
we call the instrumental variable Z partially effective. In this case, the counterfactual
mapping on the support of the complier group provides bounds for potential outcomes
of the population.10
For any two generic random variables W and S, let Q*_{W|S}(τ|s) = sup{w : P(W ≤ w|S = s) ≤ τ} for any τ ∈ [0, 1]. Because {w : P(W ≤ w|S = s) ≤ τ} ∪ {w : P(W ≤ w|S = s) ≥ τ} = R, we have Q_{W|S}(τ|s) ≤ Q*_{W|S}(τ|s), with equality if Q_{W|S}(τ|s) is an inner point in the support of W given S = s. Moreover, we have Q*_{W|S}(1|s) = +∞ and Q_{W|S}(0|s) = −∞, which represent trivial bounds.
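The gap between the two generalized inverses opens up exactly at flat stretches of the CDF and closes inside jumps. A small self-contained illustration with a three-point distribution of our own choosing:

```python
import numpy as np

# Toy distribution with mass points: W takes 0, 1, 2 with probs 0.3, 0.4, 0.3.
vals = np.array([0.0, 1.0, 2.0])
probs = np.array([0.3, 0.4, 0.3])
cdf = np.cumsum(probs)                    # [0.3, 0.7, 1.0]

def q_inf(tau):
    """Q(tau) = inf{w : P(W <= w) >= tau}, the usual quantile function."""
    return vals[np.argmax(cdf >= tau)]

def q_sup(tau):
    """Q*(tau) = sup{w : P(W <= w) <= tau}, the sup-type inverse."""
    above = cdf > tau
    return vals[np.argmax(above)] if above.any() else np.inf

print(q_inf(0.5), q_sup(0.5))   # 1.0 1.0: tau falls inside the jump at w = 1
print(q_inf(0.7), q_sup(0.7))   # 1.0 2.0: tau hits a flat stretch of the CDF
```

At τ = 0.7 the CDF is flat between the mass points 1 and 2, so the two inverses bracket that flat piece; this is the slack that drives the bounds in Corollary 1 below.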
10Note that parametric assumptions on h (or on the counterfactual mapping) can also help achieve point identification of potential outcomes outside of the support C_{dx}.
Corollary 1. Suppose assumptions A to C hold. For every x ∈ S_X, let z1, z2 ∈ S_{Z|X=x} such that p(x, z1) < p(x, z2). Then, the counterfactual mapping φ_x(·) is partially identified by

φ_{x,ℓ}(y0) ≡ Q_{h(1,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(F_{h(0,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(y0)) ≤ φ_x(y0)
≤ Q*_{h(1,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(F_{h(0,x,e)|X=x, m(x,z1)<η≤m(x,z2)}(y0)) ≡ φ_{x,u}(y0)    (9)

for all y0 ∈ S_{Y|D=0,X=x}.11 Moreover, the structural function h(d, x, τ) is partially identified by

h_ℓ(d, x, τ) ≤ h(d, x, τ) ≤ h_u(d, x, τ),

where for s = ℓ, u, h_s(1, x, τ) is the τ-th quantile of the distribution P(YD + φ_{x,s}(Y)(1 − D) ≤ t | X = x); h_s(0, x, τ) is the τ-th quantile of the distribution P(ϕ_{x,s}(Y)D + Y(1 − D) ≤ t | X = x), in which ϕ_{x,s}(·) is defined in footnote 11.
Proof. See Appendix A.3.
Note that the upper and lower bounds of φx(y0) collapse to each other when y0 ∈ C◦0x, the interior of C0x. That is, φx(·) is point identified on C◦0x. A similar property applies to the structural function h(d, x, τ) when τ ∈ S◦e|X=x,m(x,z1)<η≤m(x,z2).
4. E XTENDING THE IDENTIFICATION ARGUMENT
In this section, we investigate the identification power arising from the selection
equation, as well as from variations in the instrumental variable Z. In particular, we
drop assumption B and provide a general sufficient condition for the identification of
the counterfactual mapping φx as well as the structural function h. Such a condition is
related to, but weaker than, the rank condition for global identification developed in
Chernozhukov and Hansen (2005, Theorem 2). Throughout, we maintain assumption A.
¹¹Note that (9) implies that for all y1 ∈ SY|D=1,X=x,

ϕx,ℓ(y1) ≡ Qh(0,x,e)|X=x,m(x,z1)<η≤m(x,z2) (Fh(1,x,e)|X=x,m(x,z1)<η≤m(x,z2)(y1)) ≤ φx⁻¹(y1)
≤ Q∗h(0,x,e)|X=x,m(x,z1)<η≤m(x,z2) (Fh(1,x,e)|X=x,m(x,z1)<η≤m(x,z2)(y1)) ≡ ϕx,u(y1).

By definition, we can show ϕx,ℓ(y1) = inf{y0 : φx,u(y0) ≥ y1} and ϕx,u(y1) = sup{y0 : φx,ℓ(y0) ≤ y1}.
Under assumption A, we have P[Y ≤ h(D, X, τ)|X = x, Z = z] = τ for all z ∈ SZ|X=x and τ ∈ (0, 1), which is called the main testable implication by Chernozhukov and Hansen (2005, Theorem 1). Because D is binary, we have for all z ∈ SZ|X=x,

P[Y ≤ h(1, x, τ); D = 1|X = x, Z = z] + P[Y ≤ h(0, x, τ); D = 0|X = x, Z = z] = τ. (10)
Suppose z1, z2 ∈ SZ|X=x with p(x, z1) < p(x, z2). Thus, we obtain

P[Y ≤ h(1, x, τ); D = 1|X = x, Z = z1] + P[Y ≤ h(0, x, τ); D = 0|X = x, Z = z1]
= P[Y ≤ h(1, x, τ); D = 1|X = x, Z = z2] + P[Y ≤ h(0, x, τ); D = 0|X = x, Z = z2],

i.e.,

∆0(h(0, x, τ), x, z1, z2) = ∆1(h(1, x, τ), x, z1, z2), (11)
where ∆d (·, x, z1 , z2 ) is defined for y ∈ R as
∆0 (y, x, z1 , z2 ) ≡ P [Y ≤ y; D = 0| X = x, Z = z1 ] − P [Y ≤ y; D = 0| X = x, Z = z2 ] ,
∆1 (y, x, z1 , z2 ) ≡ P [Y ≤ y; D = 1| X = x, Z = z2 ] − P [Y ≤ y; D = 1| X = x, Z = z1 ] .
Note that ∆d is (up to sign) the numerator of the RHS in (4). By definition, ∆d(·, x, z1, z2) is identified on R for d = 0, 1. Let y = h(0, x, τ) in (11) so that h(1, x, τ) = φx(y). Since τ is arbitrary, we obtain

∆0(y, x, z1, z2) = ∆1(φx(y), x, z1, z2), ∀y ∈ Sh(0,x,e). (12)
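As an illustration of how ∆0 and ∆1 behave, the following sketch (our own simulated design, not the paper's; all names and parameter values are hypothetical) computes their sample analogs in a triangular model where assumption B holds, so both functions should be increasing up to sampling noise and should reach the same limit, the complier probability:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40000

# Simulated triangular DGP: D = 1{m(Z) >= eta}, Y = h(D, e), with (e, eta)
# dependent through a common shock and Z an independent binary instrument.
common = rng.random(n)
e = (common + rng.random(n)) / 2
eta = (common + rng.random(n)) / 2
Z = rng.integers(0, 2, n)                       # z1 = 0, z2 = 1
D = (np.where(Z == 1, 0.7, 0.3) >= eta).astype(int)
Y = np.where(D == 1, 1.0 + 2.0 * e, e)          # h(1, e) = 1 + 2e, h(0, e) = e

def delta_d(y_grid, d, z1=0, z2=1):
    """Sample analog of Delta_d(y, x, z1, z2) (X suppressed)."""
    y = np.asarray(y_grid, dtype=float)
    def joint(z):
        mask = Z == z
        return ((Y[mask, None] <= y[None, :]) & (D[mask, None] == d)).mean(axis=0)
    return joint(z1) - joint(z2) if d == 0 else joint(z2) - joint(z1)

d0 = delta_d(np.linspace(0, 1, 21), 0)   # on the support of h(0, e)
d1 = delta_d(np.linspace(1, 3, 21), 1)   # on the support of h(1, e)
```

In this design both curves rise from 0 to roughly P(complier) = p(z2) − p(z1), which is the monotone shape the argument below exploits.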
Our general identification result is based on (12) and exploits the strict monotonicity of φx(·).¹² Recall that under assumption B, ∆d(y, x, z1, z2) = P[h(d, x, e) ≤ y; m(x, z1) < η ≤ m(x, z2)|X = x], which is continuous and strictly increasing in y ∈ Sh(d,x,e) under assumption D, thereby identifying φx(·) = ∆1⁻¹(∆0(·, x, z1, z2), x, z1, z2) on Sh(0,x,e). This suggests that identification of the counterfactual mapping φx can be achieved under weaker conditions than assumption B.

¹²Chernozhukov and Hansen (2005) first write (10) for Z = z1 and Z = z2 and then find sufficient conditions for the resulting system of two (nonlinear) equations to have a unique local/global solution in the two unknowns h(1, x, τ) and h(0, x, τ).
Definition 1 (Piecewise Monotone). Let g : R → R and S ⊆ R. We say that g is piecewise
weakly (strictly) monotone on the support S if S can be partitioned into a countable number of
non–overlapping intervals such that g is weakly (strictly) monotone in every interval.
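On a finite grid, Definition 1 can be checked mechanically. The following sketch (ours, purely illustrative) counts the strictly monotone pieces of a sampled function, returning None when a flat step makes strict piecewise monotonicity fail at the grid resolution:

```python
def monotone_pieces(vals):
    """Number of strictly monotone pieces in a sequence of function values
    sampled on an increasing grid; None if adjacent values are ever equal
    (a flat step, so the function is not piecewise *strictly* monotone
    at this resolution)."""
    pieces, last_sign = 1, 0
    for a, b in zip(vals, vals[1:]):
        sign = (b > a) - (b < a)      # +1 increasing step, -1 decreasing
        if sign == 0:
            return None
        if last_sign and sign != last_sign:
            pieces += 1               # direction changed: a new piece starts
        last_sign = sign
    return pieces
```

For example, the values 1, 2, 3, 2, 1 form two pieces (increasing then decreasing), while 1, 2, 2, 3 fails strictness because of the repeated value.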
Lemma 3. Suppose assumption A holds. Given x ∈ SX , suppose z1 , z2 ∈ SZ|X = x with
p( x, z1 ) < p( x, z2 ). Then, ∆1 (·, x, z1 , z2 ) is piecewise weakly (strictly) monotone on Sh(1,x,e)
if and only if ∆0 (·, x, z1 , z2 ) is piecewise weakly (strictly) monotone on Sh(0,x,e) .
Proof. See Appendix A.4.
Assumption E. Suppose ∆d (·, x, z1 , z2 ) is piecewise weakly monotone on Sh(d,x,e) , d = 0, 1.
Assumption E is weak. In particular, if ∆d (·, x, z1 , z2 ) is continuously differentiable as in
Chernozhukov and Hansen (2005, Theorem 2-i), then assumption E holds.
Lemma 4. Let x ∈ SX and z1, z2 ∈ SZ|X=x with p(x, z1) < p(x, z2). Suppose (i)
φx (·) : Sh(0,x,e) → Sh(1,x,e) is continuous and strictly increasing, and (ii) equation (12) and
assumption E hold where ∆d (·, x, z1 , z2 ) and Sh(d,x,e) are known for d = 0, 1. Then, φx (·) is
identified if and only if ∆d (·, x, z1 , z2 ) is piecewise strictly monotone on Sh(d,x,e) for some d.
Proof. See Appendix A.5.
Lemma 4 is useful when identification is based on (12) only. In this case, under the
maintained assumption E, Lemma 4 provides a necessary and sufficient condition for
the identification of φx (·).
By replacing assumptions B and D with the weaker assumption that ∆d (·, x, z1 , z2 )
be piecewise strictly monotone on SY |X = x,D=d for d = 0, 1, we now extend Lemma 2
and Theorem 1. Note that the non–flatness of ∆d (·, x, z1 , z2 ) on SY |X = x,D=d for d = 0, 1
plays a similar role as assumption D.
Theorem 2. Suppose that assumptions A and C hold. Let x ∈ SX and z1, z2 ∈ SZ|X=x with p(x, z1) < p(x, z2). Suppose ∆d(·, x, z1, z2) is piecewise strictly monotone
on SY |X = x,D=d for d = 0, 1. Then the support of h(d, x, e) is identified by Sh(d,x,e) =
SY | D=d,X = x . Moreover, φx (·) and h(d, x, ·) are identified on the supports Sh(0,x,e) and [0, 1],
respectively.
Proof. See Appendix A.6.
5. C HARACTERIZATION OF MODEL RESTRICTIONS
In empirical applications, an important question is whether to adopt the nonseparable
model (1) with or without the selection equation (2). As shown in Section 3, the selection
equation provides a simple and constructive identification result, but can introduce
additional restrictions on the data. In this section, we characterize all the restrictions
on observables imposed by the model with or without the selection equation. These
restrictions are useful for developing model selection and model specification tests.
Formally, we denote these two models by

M0 ≡ {[h, FeD|XZ] : assumption A holds};
M1 ≡ {[h, m, Feη|XZ] : assumptions A and B hold}.

To simplify, hereafter we assume SXZ = SX × {z1, z2} and p(x, z1) < p(x, z2) for all x ∈ SX. We say that a conditional distribution FYD|XZ of observables is rationalized by model M if and only if there exists a structure in M that generates FYD|XZ.
Theorem 3. A conditional distribution FYD|XZ can be rationalized by M0 if and only if
(i) FY|DXZ is a continuous conditional CDF;
(ii) for each x ∈ SX, there exists a continuous and strictly increasing mapping gx : R → R such that gx maps S(1−D)Y+Dgx⁻¹(Y)|X=x onto SDY+(1−D)gx(Y)|X=x and

∆0(·, x, z1, z2) = ∆1(gx(·), x, z1, z2). (13)

Proof. See Appendix A.7.
In Theorem 3, the key restriction is the existence of a solution to (13), which may not be
unique.
We now turn to M1 .
Theorem 4. A conditional distribution FYD|XZ rationalized by M0 can also be rationalized by
M1 if and only if for any x ∈ SX and d ∈ {0, 1}, ∆d (·, x, z1 , z2 ) is weakly increasing on R.
Proof. See Appendix A.8.
The condition in Theorem 4 strengthens Theorem 3–(ii): By Lemma 4, the fact that
∆d (·, x, z1 , z2 ) is continuous and strictly increasing on SY | D=d,X = x implies that (13) has a
unique solution gx (·) = φx (·). See also Vuong and Xu (2014) for a characterization of all
the restrictions associated with two other models which do not impose monotonicity in
the outcome equation, namely, {[h, FeD|XZ ]: e is continuously distributed on an interval
and assumption A–(ii) holds} and {[h, m, Feη |XZ ]: assumption B holds}.
6. G ENERALIZATION
This section provides another extension of Lemma 2 and Theorem 1. Specifically, we
relax the continuity and strict monotonicity assumption of h in assumption A so that
our method applies to the case where the outcome variable has a probability mass in
its distribution such as when the outcome variable is censored or binary (a challenging
situation according to Wooldridge, 2013).
Assumption A’. Equation (1) holds where (i) h is left–continuous and weakly increasing in e,
with e distributed as U [0, 1], and (ii) ( X, Z ) is independent of e.
The left–continuity of h is a normalization for the identification of the structural function
at its discontinuous points.
Because assumption A’ relaxes assumption A, we need to strengthen assumption B
to achieve point identification of h. Specifically, we require the covariates X to be
exogenous in (2). This approach was first introduced by Vytlacil and Yildiz (2007) in a
weakly nonseparable model. See also Shaikh and Vytlacil (2011).
Assumption B’. Equation (2) holds where (i) (e, η ) is independent of ( X, Z ), (ii) the distribution of (e, η ) is absolutely continuous with respect to Lebesgue measure, and (iii) S(e,η ) is a
rectangle.
Under assumption B’-(i), we can drop X = x in Fh(d,x,e)|X = x,m( x̃,z̃1 )<η ≤m( x̃,z̃2 ) . To simplify
our exposition, we assume assumption B’-(iii) which is slightly stronger than the support
condition in assumption D.
A key point from Vytlacil and Yildiz (2007) is that assumption B’ allows us to use
variations in X for the identification of h(d, x, ·) when Z has sufficient variations (more
than binary). The following rank condition is introduced for the identification of
h(1, x, ·). The identification of h(0, x, ·) can be obtained similarly.
Assumption C’ (Generalized Rank Condition). For any x ∈ SX , there exists x̃ ∈ SX such
that (i) the set S p(X,Z)|X = x
T
S p(X,Z)|X = x̃ contains at least two different values, and (ii) for
any pair τ1 , τ2 ∈ (0, 1),
h(0, x̃, τ1 ) = h(0, x̃, τ2 ) =⇒ h(1, x, τ1 ) = h(1, x, τ2 ).
Condition (i) of assumption C’ requires that there exist z1 , z2 ∈ SZ|X = x and z̃1 , z̃2 ∈
SZ|X = x̃ such that p( x, z1 ) = p( x̃, z̃1 ) < p( x, z2 ) = p( x̃, z̃2 ). Condition (ii) is testable
since it is equivalent to the following condition: For any τ1 , τ2 ∈ (0, 1),
Qh(0,x̃,e)|m( x̃,z̃1 )<η ≤m( x̃,z̃2 ) (τ1 ) = Qh(0,x̃,e)|m( x̃,z̃1 )<η ≤m( x̃,z̃2 ) (τ2 )
=⇒ Qh(1,x,e)|m(x,z1 )<η ≤m(x,z2 ) (τ1 ) = Qh(1,x,e)|m(x,z1 )<η ≤m(x,z2 ) (τ2 ).
Thus, we can verify whether there is x̃ ∈ SX satisfying assumption C’. When h(d, x, ·)
is strictly monotone in e, by setting x̃ = x, assumption C’ reduces to assumption C.
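Given estimates of the two complier quantile functions on a common τ grid, the testable condition above amounts to checking that every flat stretch of the h(0, x̃, ·) quantile function is matched by a flat stretch of the h(1, x, ·) quantile function. A minimal sketch of that check (our own helper with hypothetical inputs, not part of the paper):

```python
import numpy as np

def rank_condition_holds(q0, q1, tol=1e-8):
    """q0, q1: values of the two (estimated) complier quantile functions on a
    common increasing tau grid. Returns True when every flat step of q0 is
    also a flat step of q1 -- the discrete analog of assumption C'-(ii)."""
    flat0 = np.abs(np.diff(np.asarray(q0, dtype=float))) <= tol
    flat1 = np.abs(np.diff(np.asarray(q1, dtype=float))) <= tol
    # the implication flat0 => flat1 must hold at every grid step
    return bool(np.all(~flat0 | flat1))
```

For instance, q0 = (0, 0, 1, 2) with q1 = (1, 1, 1.5, 3) passes (the flat first step of q0 is matched), whereas q0 = (0, 0, 1) with a strictly increasing q1 fails.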
Fix x and let x̃ satisfy assumption C'. We define a generalized counterfactual mapping φx,x̃(·) by φx,x̃(y) = h(1, x, h⁻¹(0, x̃, y)) for all y ∈ Sh(0,x̃,e).¹³ If x̃ = x, then φx,x̃(·) reduces to the counterfactual mapping φx(·) in Section 3. Let z1, z2 ∈ SZ|X=x and z̃1, z̃2 ∈ SZ|X=x̃ with p(x, z1) = p(x̃, z̃1) < p(x, z2) = p(x̃, z̃2).
generalizes Lemma 2 and Theorem 1.
Theorem 5. Suppose assumptions A' and B' hold. For any x ∈ SX, suppose in addition assumption C' holds with x̃. Then φx,x̃(·) is identified by

φx,x̃(·) = Qh(1,x,e)|m(x,z1)<η≤m(x,z2) (Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(·)), (14)

and for any τ ∈ [0, 1], h(1, x, τ) is identified as the τ–th quantile of the distribution

Fh(1,x,e)(·) = P(Y ≤ ·; D = 1|X = x) + P(φx,x̃(Y) ≤ ·; D = 0|X = x̃).
Proof. See Appendix A.9.
To illustrate Theorem 5, we discuss the generalized rank condition (assumption C’)
with two examples: the first example is a fully nonseparable censored regression model
while the second example is a weakly separable binary response model. The first
example seems to be new, though special cases have been studied previously under
some parametric and/or separability assumptions. The second example was studied by
Vytlacil and Yildiz (2007).
Example 3. Consider the model

Y = h∗(D, X, e) · 1{h∗(D, X, e) ≥ 0},
D = 1{m(X, Z) ≥ η},

where h∗ is strictly increasing in e. The structural unknowns are (h∗, m, Feη).
¹³Due to the weak monotonicity of h in e, we define h⁻¹(0, x̃, y) = inf{τ : h(0, x̃, τ) ≥ y}.
Fix x ∈ SX . For d = 0, 1, let τdx solve h∗ (d, x, τdx ) = 0. W.l.o.g., let τdx ∈ (0, 1). Thus,
assumption C’ is satisfied if there exists an x̃ ∈ SX such that τ0x̃ ≤ τ1x and p( x, z1 ) =
p( x̃, z̃1 ) < p( x, z2 ) = p( x̃, z̃2 ) for some z1 , z2 ∈ SZ|X = x , z̃1 , z̃2 ∈ SZ|X = x̃ .
Example 4. Let Y and D denote a binary outcome variable and a binary endogenous regressor,
respectively. Consider
Y = 1{h∗(D, X) ≥ e},
D = 1{m(X, Z) ≥ η}.
To identify h∗ (1, x ) for some x, assumption C’ requires that there exists x̃ ∈ SX such that
h∗ (1, x ) = h∗ (0, x̃ ) and p( x, z1 ) = p( x̃, z̃1 ) < p( x, z2 ) = p( x̃, z̃2 ) for some z1 , z2 ∈ SZ|X = x ,
z̃1 , z̃2 ∈ SZ|X = x̃ . Thus, assumption C’ reduces exactly to the support condition in Vytlacil and
Yildiz (2007).
Motivated by Shaikh and Vytlacil (2011), assumption C’-(ii) can be relaxed leading to
partial identification. For generic random variables W and S, let KW |S be the Kolmogorov
(conditional) CDF defined by KW |S (w|s) = P(W < w|S = s). By definition, we have
KW |S (w|s) ≤ FW |S (w|s) with equality if w is not a mass point of W given S = s.
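The distinction between K and F only matters at mass points, as a tiny sketch (ours, with a hypothetical toy sample) makes explicit:

```python
import numpy as np

W = np.array([0.0, 0.0, 1.0, 2.0])   # a toy sample with a mass point at 0

def F(w):
    """Standard CDF, P(W <= w)."""
    return float(np.mean(W <= w))

def K(w):
    """Kolmogorov (left-limit) CDF, P(W < w)."""
    return float(np.mean(W < w))
```

At the mass point, K(0) = 0 while F(0) = 0.5; away from mass points the two functions coincide, which is exactly the equality case stated above.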
Corollary 2. Suppose assumptions A' and B' hold. For any x ∈ SX, suppose in addition assumption C'-(i) holds with x̃. Then φx,x̃(·) is partially identified by

φ̲x,x̃(·) ≡ Qh(1,x,e)|m(x,z1)<η≤m(x,z2) (Kh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(·)) ≤ φx,x̃(·)
≤ Qh(1,x,e)|m(x,z1)<η≤m(x,z2) (Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(·)) ≡ φ̄x,x̃(·),

and for any τ ∈ [0, 1], the lower and upper bounds of h(1, x, τ) are identified as the τ–th quantiles of the distributions P(Y ≤ ·; D = 1|X = x) + P(φ̲x,x̃(Y) ≤ ·; D = 0|X = x̃) and P(Y ≤ ·; D = 1|X = x) + P(φ̄x,x̃(Y) ≤ ·; D = 0|X = x̃), respectively.¹⁴
Proof. See Appendix A.10.
¹⁴This corollary was first presented at the Bilkent Workshop in Econometric Theory and Applications, 2013.
In Corollary 2, the lower and upper bounds collapse to each other if in addition C’-(ii)
holds for x̃.
Similarly to Corollary 1, we can also drop the support condition B’-(iii) to construct
bounds under assumption C’-(i).
Corollary 3. Suppose assumptions A' and B'-(i,ii) hold. For any x ∈ SX, suppose assumption C'-(i) holds with x̃. Then φx,x̃(·) is partially identified by

φLx,x̃(·) ≡ Qh(1,x,e)|m(x,z1)<η≤m(x,z2) (Kh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(·)) ≤ φx,x̃(·)
≤ Q∗h(1,x,e)|m(x,z1)<η≤m(x,z2) (Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(·)) ≡ φHx,x̃(·),

and for any τ ∈ [0, 1], the lower and upper bounds of h(1, x, τ) are identified as the τ–th quantiles of the distributions P(Y ≤ ·; D = 1|X = x) + P(φLx,x̃(Y) ≤ ·; D = 0|X = x̃) and P(Y ≤ ·; D = 1|X = x) + P(φHx,x̃(Y) ≤ ·; D = 0|X = x̃), respectively.
Proof. See Appendix A.11.
When there are multiple x̃ satisfying assumption C’-(i), we can use them jointly to
tighten our bounds in Corollaries 2 and 3 by taking the sup and the inf over x̃. See
Shaikh and Vytlacil (2011).
7. C ONCLUSION
This paper establishes nonparametric identification of the counterfactual mapping,
individual treatment effects and structural function in a nonseparable model with a
binary endogenous regressor. Our benchmark model assumes strict monotonicity in the
outcome equation and weak monotonicity in the selection equation. Our counterfactual
mapping then links each outcome with its counterfactual. Moreover, we consider two
extensions: One without a selection equation and the other with weak monotonicity in
the outcome equation.
As indicated in Section 3, the counterfactual mapping and individual treatment effects
can be identified under the weaker independence assumption Z ⊥(e, η )| X, the rank
condition (assumption C) and the support condition (assumption D). In particular,
assumption D is satisfied as soon as e and η have a rectangular support given X. Some
policy effects such as ATE can also be identified. When assumption D fails, we obtain
partial identification of the counterfactual mapping and individual treatment effects. It
is important to note that such results do not require exogeneity of X. On the other hand,
identification of the structural function h, which is necessary for some counterfactuals
in empirical IO, requires the independence of X and e in assumption A.
To conclude, one needs to develop a nonparametric estimation method for the structural function h. To fix ideas, we consider the benchmark model in Section 3 where X is discrete and Z is binary. We further assume p(x, z1) ≠ p(x, z2) for all (x, z1), (x, z2) ∈ SXZ. A natural method first estimates the counterfactual mapping φx(·) and its inverse by

φ̂x(·) = Q̂h(1,x,e)|X=x,m(x,z1)<η≤m(x,z2) (F̂h(0,x,e)|X=x,m(x,z1)<η≤m(x,z2)(·)),
φ̂x⁻¹(·) = Q̂h(0,x,e)|X=x,m(x,z1)<η≤m(x,z2) (F̂h(1,x,e)|X=x,m(x,z1)<η≤m(x,z2)(·)),

where Q̂h(d,x,e)|X=x,m(x,z1)<η≤m(x,z2)(·) is the quantile function of

F̂h(d,x,e)|X=x,m(x,z1)<η≤m(x,z2)(·) = [∑ᵢ₌₁ⁿ 1(Yi ≤ ·; Di = d; Xi = x)[1(Zi = z2) − 1(Zi = z1)]] / [∑ᵢ₌₁ⁿ 1(Di = d; Xi = x)[1(Zi = z2) − 1(Zi = z1)]].

In the second step, one estimates the structural function for any τ ∈ (0, 1) by

ĥ(1, x, τ) = Q̂YD+φx(Y)(1−D)|X=x(τ),  ĥ(0, x, τ) = Q̂Y(1−D)+φx⁻¹(Y)D|X=x(τ),

which are respectively the τ–th quantiles of

F̂YD+φx(Y)(1−D)|X=x(·) = ∑ᵢ₌₁ⁿ 1(Yi Di + φ̂x(Yi)(1 − Di) ≤ ·; Xi = x) / ∑ᵢ₌₁ⁿ 1(Xi = x),
F̂Y(1−D)+φx⁻¹(Y)D|X=x(·) = ∑ᵢ₌₁ⁿ 1(Yi(1 − Di) + φ̂x⁻¹(Yi)Di ≤ ·; Xi = x) / ∑ᵢ₌₁ⁿ 1(Xi = x).
The above estimation procedure can be easily implemented.
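As a sanity check, the two-step procedure can be coded in a few lines. The sketch below is our own simulated design, not the paper's: the DGP, the tolerances, and the monotone-rearrangement step are illustrative choices. It recovers h(1, x, τ) = 1 + 2τ from data generated with h(0, x, e) = e, h(1, x, e) = 1 + 2e and an endogenous binary treatment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated benchmark model (X held fixed and suppressed): e ~ U[0,1],
# eta = (e + u)/2 is dependent on e, Z is an independent binary instrument.
e = rng.random(n)
eta = (e + rng.random(n)) / 2
Z = rng.integers(0, 2, n)                    # z1 = 0, z2 = 1
D = (np.where(Z == 1, 0.65, 0.35) >= eta).astype(int)
Y = np.where(D == 1, 1.0 + 2.0 * e, e)       # h(1, e) = 1 + 2e, h(0, e) = e

w = np.where(Z == 1, 1.0, -1.0)              # 1(Z = z2) - 1(Z = z1)

def complier_cdf(grid, d):
    """First step: sample analog of F_{h(d,x,e)|compliers} on a grid."""
    sel = D == d
    F = ((Y[sel, None] <= grid[None, :]) * w[sel, None]).sum(0) / w[sel].sum()
    # monotone rearrangement: smooth out sampling noise in the ratio estimator
    return np.clip(np.maximum.accumulate(F), 0.0, 1.0)

Y0, Y1 = np.sort(Y[D == 0]), np.sort(Y[D == 1])
F0, F1 = complier_cdf(Y0, 0), complier_cdf(Y1, 1)

def phi_hat(y):
    """Estimated counterfactual mapping: phi(y) = Q_1(F_0(y))."""
    tau = F0[np.clip(np.searchsorted(Y0, y), 0, len(Y0) - 1)]
    return Y1[np.clip(np.searchsorted(F1, tau), 0, len(Y1) - 1)]

def h1_hat(tau):
    """Second step: tau-th quantile of Y*D + phi_hat(Y)*(1 - D)."""
    pseudo = np.where(D == 1, Y, phi_hat(Y))
    return np.sort(pseudo)[int(np.ceil(tau * n)) - 1]
```

With the true mapping φx(y) = 1 + 2y in this design, phi_hat(0.5) should be close to 2, and h1_hat(τ) should track 1 + 2τ.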
Excluding possible boundary effects, the above suggests that the estimators of the counterfactual mapping φx(·) and the structural function h are √n–consistent. Their precise asymptotic distributions can be derived using the functional delta method and the composition lemma 3.9.27 in Van der Vaart and Wellner (1996). If Z is continuous, we conjecture that φx(·) and h are still estimated at the √n–rate. This is because we can average out Z and obtain E[Fh(0,x,e)|X=x,m(x,Z1)<η≤m(x,Z2)(·)] = E[Fh(1,x,e)|X=x,m(x,Z1)<η≤m(x,Z2)(φx(·))], where Z1 and Z2 are two different observations. On the other hand, if X is continuous, the estimation of φx and h will no longer be √n–consistent. An important question is then the optimal rate for estimating these functions. Moreover, we should note that even in the benchmark model, there might exist several values x̃ ≠ x satisfying assumption C'–(i). In this case, one has other counterfactual mappings φx,x̃ that can be used to identify h. A second question is how to use this property to improve efficiency.
REFERENCES

Athey, S. (2001): "Single crossing properties and the existence of pure strategy equilibria in games of incomplete information," Econometrica, 69(4), 861–889.
Athey, S. (2002): "Monotone comparative statics under uncertainty," The Quarterly Journal of Economics, 117(1), 187–223.
Athey, S., and G. W. Imbens (2006): "Identification and inference in nonlinear difference-in-differences models," Econometrica, 74(2), 431–497.
Chamberlain, G. (1986): "Asymptotic efficiency in semi-parametric models with censoring," Journal of Econometrics, 32(2), 189–218.
Chen, X., V. Chernozhukov, S. Lee, and W. K. Newey (2014): "Local identification of nonparametric and semiparametric models," Econometrica, 82(2), 785–809.
Chernozhukov, V., and C. Hansen (2005): "An IV model of quantile treatment effects," Econometrica, 73(1), 245–261.
Chernozhukov, V., and C. Hansen (2013): "Quantile models with endogeneity," Annual Review of Economics, 5, 57–81.
Chesher, A. (2003): "Identification in nonseparable models," Econometrica, 71(5), 1405–1441.
Chesher, A. (2005): "Nonparametric identification under discrete variation," Econometrica, 73(5), 1525–1550.
D'Haultfœuille, X., and P. Février (forthcoming): "Identification of nonseparable models with endogeneity and discrete instruments," Econometrica.
Firpo, S. (2007): "Efficient semiparametric estimation of quantile treatment effects," Econometrica, 75(1), 259–276.
Froelich, M., and B. Melly (2013): "Unconditional quantile treatment effects under endogeneity," Journal of Business & Economic Statistics, 31(3), 346–357.
Guerre, E., I. Perrigne, and Q. Vuong (2000): "Optimal nonparametric estimation of first-price auctions," Econometrica, 68(3), 525–574.
Guerre, E., I. Perrigne, and Q. Vuong (2009): "Nonparametric identification of risk aversion in first-price auctions under exclusion restrictions," Econometrica, 77(4), 1193–1227.
Heckman, J. (1990): "Varieties of selection bias," The American Economic Review, 80(2), 313–318.
Heckman, J. J. (1979): "Sample selection bias as a specification error," Econometrica, pp. 153–161.
Heckman, J. J., and R. Robb (1986): Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes. Springer.
Heckman, J. J., J. Smith, and N. Clements (1997): "Making the most out of programme evaluations and social experiments: Accounting for heterogeneity in programme impacts," The Review of Economic Studies, 64(4), 487–535.
Heckman, J. J., and E. Vytlacil (2005): "Structural equations, treatment effects, and econometric policy evaluation," Econometrica, 73(3), 669–738.
Heckman, J. J., and E. J. Vytlacil (1999): "Local instrumental variables and latent variable models for identifying and bounding treatment effects," Proceedings of the National Academy of Sciences, 96(8), 4730–4734.
Heckman, J. J., and E. J. Vytlacil (2007): "Econometric evaluation of social programs, part I: Causal models, structural models and econometric policy evaluation," Handbook of Econometrics, 6, 4779–4874.
Hoderlein, S., and E. Mammen (2007): "Identification of marginal effects in nonseparable models without monotonicity," Econometrica, 75(5), 1513–1518.
Imbens, G. W. (2007): "Nonadditive models with endogenous regressors," Econometric Society Monographs, 43, 17.
Imbens, G. W., and J. D. Angrist (1994): "Identification and estimation of local average treatment effects," Econometrica, 62(2), 467–475.
Imbens, G. W., and W. K. Newey (2009): "Identification and estimation of triangular simultaneous equations models without additivity," Econometrica, 77(5), 1481–1512.
Imbens, G. W., and D. B. Rubin (1997): "Estimating outcome distributions for compliers in instrumental variables models," The Review of Economic Studies, 64(4), 555–574.
Imbens, G. W., and J. M. Wooldridge (2009): "Recent developments in the econometrics of program evaluation," Journal of Economic Literature, 47(1), 5–86.
Jun, S. J., J. Pinkse, and H. Xu (2011): "Tighter bounds in triangular systems," Journal of Econometrics, 161(2), 122–128.
Kasy, M. (2014): "Instrumental variables with unrestricted heterogeneity and continuous treatment," The Review of Economic Studies, 81(4), 1614–1636.
Lewbel, A. (1998): "Semiparametric latent variable model estimation with endogenous or mismeasured regressors," Econometrica, pp. 105–121.
Manski, C. F. (1990): "Nonparametric bounds on treatment effects," The American Economic Review, 80(2), 319–323.
Matzkin, R. L. (1999): "Nonparametric estimation of nonadditive random functions," Discussion paper, Northwestern University.
Matzkin, R. L. (2003): "Nonparametric estimation of nonadditive random functions," Econometrica, 71(5), 1339–1375.
Matzkin, R. L. (2008): "Identification in nonparametric simultaneous equations models," Econometrica, 76(5), 945–978.
Matzkin, R. L. (2015): "Estimation of nonparametric models with simultaneity," Econometrica, 83(1), 1–66.
Milgrom, P., and C. Shannon (1994): "Monotone comparative statics," Econometrica, pp. 157–180.
Reny, P. J. (2011): "On the existence of monotone pure-strategy equilibria in Bayesian games," Econometrica, 79(2), 499–553.
Rubin, D. B. (1974): "Estimating causal effects of treatments in randomized and nonrandomized studies," Journal of Educational Psychology, 66(5), 688.
Shaikh, A. M., and E. J. Vytlacil (2011): "Partial identification in triangular systems of equations with binary dependent variables," Econometrica, 79(3), 949–955.
Torgovitsky, A. (forthcoming): "Identification of nonseparable models with general instruments," Econometrica.
Van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer.
Vuong, Q., and H. Xu (2014): "Characterization of model restrictions in a triangular system," Discussion paper, New York University and University of Texas at Austin.
Vytlacil, E. (2002): "Independence, monotonicity, and latent index models: An equivalence result," Econometrica, 70(1), 331–341.
Vytlacil, E. (2006): "A note on additive separability and latent index models of binary choice: Representation results," Oxford Bulletin of Economics and Statistics, 68(4), 515–518.
Vytlacil, E., and N. Yildiz (2007): "Dummy endogenous variables in weakly separable models," Econometrica, 75(3), 757–779.
Wooldridge, J. (2013): "Control function methods in applied econometrics," Discussion paper, Michigan State University.
A PPENDIX A. P ROOFS
A.1. Proof of Lemma 1.
Proof. The only if part is straightforward by the definition of φx(·), so it suffices to show the if part. Fix x and suppose φx(·) is identified. By definition, h(1, x, ·) = φx(h(0, x, ·)) on
[0, 1]. Then, conditional on X = x, we have h(1, x, e) = YD + φx (Y )(1 − D ) and h(0, x, e) =
Y (1 − D ) + φx−1 (Y ) D. Therefore we can identify Fh(d,x,e) as follows: for all t ∈ R,
Fh(1,x,e) (t) = P(YD + φx (Y )(1 − D ) ≤ t| X = x ),
Fh(0,x,e) (t) = P(Y (1 − D ) + φx−1 (Y ) D ≤ t| X = x ).
Further, by assumption A, we have h(d, x, τ ) = Qh(d,x,e) (τ ) for all τ ∈ [0, 1].
A.2. Proof of Lemma 2.
Proof. Fix x ∈ SX and z1, z2 ∈ SZ|X=x with p(x, z1) < p(x, z2). First, we show C0x = Sh(0,x,e) by contradiction: Suppose C0x ⊊ Sh(0,x,e). Then the set E ≡ {e ∈ [0, 1] : h(0, x, e) ∉ C0x} does not belong to the complier group. Hence, for any e ∈ E, either h(1, x, e) ∉ C1x or h(0, x, e) ∉ C0x, which contradicts assumption D. Similarly, we also have C1x = Sh(1,x,e).

Because h is strictly monotone in e, we have for any τ ∈ (0, 1)

Fh(0,x,e)|X=x,m(x,z1)<η≤m(x,z2)(h(0, x, τ)) = Fe|X=x,m(x,z1)<η≤m(x,z2)(τ) = Fh(1,x,e)|X=x,m(x,z1)<η≤m(x,z2)(h(1, x, τ)).

Note that Fh(1,x,e)|X=x,m(x,z1)<η≤m(x,z2) is continuous and strictly increasing at h(1, x, τ). Then

h(1, x, τ) = Qh(1,x,e)|X=x,m(x,z1)<η≤m(x,z2) (Fh(0,x,e)|X=x,m(x,z1)<η≤m(x,z2)(h(0, x, τ))).

Let y = h(0, x, τ) ∈ C0x. Then τ = h⁻¹(0, x, y) and the above equation becomes

φx(y) = Qh(1,x,e)|X=x,m(x,z1)<η≤m(x,z2) (Fh(0,x,e)|X=x,m(x,z1)<η≤m(x,z2)(y)).
A.3. Proof of Corollary 1.
Proof. By the proof of Lemma 2, we have
Fh(0,x,e)|X = x,m(x,z1 )<η ≤m(x,z2 ) (h(0, x, τ )) = Fh(1,x,e)|X = x,m(x,z1 )<η ≤m(x,z2 ) (h(1, x, τ )).
Because Fh(1,x,e)|X=x,m(x,z1)<η≤m(x,z2) is continuous and weakly increasing at h(1, x, τ), we have
h(1, x, τ ) ≥ Qh(1,x,e)|X = x,m(x,z1 )<η ≤m(x,z2 ) Fh(0,x,e)|X = x,m(x,z1 )<η ≤m(x,z2 ) (h(0, x, τ ))
and
h(1, x, τ ) ≤ Q∗h(1,x,e)|X = x,m(x,z1 )<η ≤m(x,z2 ) Fh(0,x,e)|X = x,m(x,z1 )<η ≤m(x,z2 ) (h(0, x, τ )) .
Let y0 = h(0, x, τ) ∈ C0x. Then h(1, x, τ) = φx(y0) and the above inequalities become
φx,` (y0 ) ≤ φx (y0 ) ≤ φx,u (y0 ).
By a similar argument, we can show the result in footnote 11.
Moreover, P YD + φx,` (Y )(1 − D ) ≤ ·| X = x and P YD + φx,u (Y )(1 − D ) ≤ ·| X = x are
the upper and lower bounds of P(h(1, x, e) ≤ ·|X = x), respectively. Therefore, their τ–th quantiles serve as the lower and upper bounds of h(1, x, τ), respectively. A similar argument
holds for h(0, x, τ ).
A.4. Proof of Lemma 3.
Proof. For the if part, suppose ∆0 (·, x, z1 , z2 ) is piecewise weakly monotone on Sh(0,x,e) . Then,
by definition, Sh(0,x,e) can be partitioned into a sequence of non–overlapping intervals { Ij : j =
1, · · · , J }, where J ∈ N ∪ {+∞}, such that ∆0 (·, x, z1 , z2 ) is weakly monotone on every interval.
By assumption A, φx (·) is continuous and strictly increasing. Then, Sh(1,x,e) can be partitioned
into a sequence of non–overlapping intervals {φx(Ij) : j = 1, · · · , J}. Moreover, by (12), we have
∆1 (y, x, z1 , z2 ) = ∆0 φx−1 (y), x, z1 , z2 , ∀y ∈ Sh(1,x,e) .
Clearly, ∆1 (·, x, z1 , z2 ) is weakly monotone in every interval φx ( Ij ). The only if part can be shown
similarly.
A.5. Proof of Lemma 4.
Proof. We first show the if part. W.l.o.g., let ∆0 (·, x, z1 , z2 ) be piecewise strictly monotone on
Sh(0,x,e) , which can be partitioned into a sequence of non–overlapping intervals { Ij : j =
1, · · · , J }. On each interval Ij , due to the monotonicity, ∆0 (·, x, z1 , z2 ) has at most a countable
number of discontinuity points. Hence, Sh(0,x,e) can be further partitioned into a countable number of non–overlapping intervals such that ∆0(·, x, z1, z2) is continuous and strictly monotone
on every interval. For such a sequence of intervals, we further merge adjacent intervals if
∆0 (·, x, z1 , z2 ) is continuous and strictly monotone on the union of them.
Thus, we partition Sh(0,x,e) into a countable number of non–overlapping intervals {I′j : j = 1, · · · , J′}, where J′ ∈ N ∪ {+∞}, such that ∆0(·, x, z1, z2) is continuous and strictly monotone on every interval, and is discontinuous or achieves a local extremum at each endpoint of these intervals. By the proof of Lemma 3, (12) and the fact that φx(·) : Sh(0,x,e) → Sh(1,x,e) is continuous and strictly increasing imply that Sh(1,x,e) can be partitioned into the same number of non–overlapping intervals {φx(I′j) : j = 1, · · · , J′}, such that ∆1(·, x, z1, z2) is continuous and strictly monotone on every interval, and is discontinuous or achieves a local extremum at each endpoint of these intervals.
Moreover, for each y ∈ I′j, we solve φx(y) from (12) as

φx(y) = ∆1,j⁻¹(∆0,j(y, x, z1, z2), x, z1, z2),

where ∆0,j(·, x, z1, z2) and ∆1,j(·, x, z1, z2) are the projections of ∆0(·, x, z1, z2) and ∆1(·, x, z1, z2) on the supports I′j and φx(I′j), respectively.
For the only if part, w.l.o.g., suppose ∆0 (·, x, z1 , z2 ) is constant on a non–degenerate interval
I ⊆ Sh(0,x,e) . By the proof of Lemma 3, ∆1 (·, x, z1 , z2 ) is also constant on φx ( I ). It suffices to
construct a continuous and strictly increasing function φ̃x ≠ φx such that (12) holds for φ̃x.
Let φ̃x (y) = φx (y) for all y 6∈ I and φ̃x (y) = φx ( g(y)) for all y ∈ I, where g is an arbitrary
continuous, strictly increasing mapping from I onto I. Clearly, there are plenty of choices
for the function g. Moreover, φ̃x ≠ φx if g is not the identity mapping. By construction,
∆1 (φ̃x (y), x, z1 , z2 ) = ∆1 (φx (y), x, z1 , z2 ) holds for all y ∈ I because ∆1 (·, x, z1 , z2 ) is constant on
φx ( I ). This equation also holds for all y 6∈ I by the definition of φ̃x . Then (12) holds for φ̃x .
A.6. Proof of Theorem 2.
Proof. We first show the identification of Sh(d,x,e). Clearly, SY|X=x,D=d ⊆ Sh(d,x,e). Suppose w.l.o.g. SY|X=x,D=0 ⊊ Sh(0,x,e). Therefore, there exists an interval Ie in Se such that P(e ∈ Ie; D = 0|X = x, Z = z) = 0 and P(e ∈ Ie; D = 1|X = x, Z = z) = P(e ∈ Ie) for all z ∈ SZ|X=x. In other words, conditional on X = x and Z = z for any z ∈ SZ|X=x, all individuals with e ∈ Ie choose D = 1 almost surely. Thus, Sh(1,x,e)|e∈Ie ⊆ SY|X=x,D=1 and the latter support is identified.
For e1 < e2 ∈ Ie , note that
P(Y ≤ h(1, x, e2 ); D = 1| X = x, Z = z) − P(Y ≤ h(1, x, e1 ); D = 1| X = x, Z = z)
= P(e ∈ (e1 , e2 ]), ∀z ∈ SZ|X =x .
Hence, for z1, z2 ∈ SZ|X=x with p(x, z1) < p(x, z2), we have

∆1(h(1, x, e2), x, z1, z2) − ∆1(h(1, x, e1), x, z1, z2)
= P(Y ≤ h(1, x, e2); D = 1|X = x, Z = z2) − P(Y ≤ h(1, x, e1); D = 1|X = x, Z = z2)
− P(Y ≤ h(1, x, e2); D = 1|X = x, Z = z1) + P(Y ≤ h(1, x, e1); D = 1|X = x, Z = z1) = 0,

which contradicts the piecewise strict monotonicity of ∆1(·, x, z1, z2) on SY|X=x,D=1. Therefore, we have SY|X=x,D=d = Sh(d,x,e).
Thus, the identification of φx (·) and h(d, x, ·) follows directly from Lemmas 1 and 4.
A.7. Proof of Theorem 3.
Proof. For the only if part, let gx(·) = φx(·) on the support Sh(0,x,e). The result then follows
directly from Section 4.
For the if part, we prove by constructing a structure S = [h, FeD|XZ] that rationalizes the given
distribution FYD|XZ. Fix an arbitrary x. Let Hx(y) ≡ P(Y · (1 − D) + gx−1(Y) · D ≤ y | X = x)
and let h(0, x, τ) be its τ–th quantile. Moreover, let h(1, x, τ) = gx(h(0, x, τ)). Clearly, condition
(i) and the fact that gx is continuous and strictly increasing on the support ensure that h(d, x, ·) is
continuous and strictly increasing on [0, 1]. Further, let PS(e ≤ τ; D = d | X = x, Z = z) = P(Y ≤
h(d, x, τ); D = d | X = x, Z = z) for all z ∈ SZ|X=x and (τ, d) ∈ [0, 1] × {0, 1}, where PS denotes
the probability measure under the constructed structure S. We now show that ( X, Z )⊥e and
e ∼ U [0, 1].
By (13), we have
P(Y ≤ gx(y); D = 1 | X = x, Z = z1) + P(Y ≤ y; D = 0 | X = x, Z = z1)
= P(Y ≤ gx(y); D = 1 | X = x, Z = z2) + P(Y ≤ y; D = 0 | X = x, Z = z2)
= P(Y ≤ gx(y); D = 1 | X = x) + P(Y ≤ y; D = 0 | X = x) = Hx(y),
where the second equality holds because SZ|X=x = {z1, z2}.
Hence, for any τ ∈ [0, 1] and i = 1, 2,
PS(e ≤ τ | X = x, Z = zi) = P(Y ≤ h(0, x, τ); D = 0 | X = x, Z = zi)
+ P(Y ≤ h(1, x, τ); D = 1 | X = x, Z = zi) = Hx(h(0, x, τ)) = τ,
where the last step is by the construction of h(0, x, ·) and the strict monotonicity of Hx(·) on the
support S(1−D)Y+Dgx−1(Y)|X=x. Therefore, (X, Z)⊥e and e ∼ U[0, 1].
Now it suffices to show that the constructed structure S = [h, FeD|XZ ] generates the given
distribution FYD|XZ . This is true because for any (y, d, x, z) ∈ SYDXZ , we have
PS (Y ≤ y; D = d| X = x, Z = z) = PS (e ≤ h−1 (d, x, y); D = d| X = x, Z = z)
= P(Y ≤ y; D = d| X = x, Z = z).
The last step comes from the construction of FeD|XZ .
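The quantile construction in this proof can be illustrated numerically. In the sketch below, h(0, x, ·), gx and the treatment assignment are hypothetical choices, not the paper's (and, for simplicity, D is drawn independently of e, which the quantile algebra here does not require): the τ-th quantile of the pooled variable Y(1 − D) + gx−1(Y)D recovers h(0, x, τ), and mapping it through gx recovers h(1, x, τ):

```python
import numpy as np

# Numerical sketch of the construction (hypothetical DGP; for simplicity
# D is drawn independently of e, which the quantile algebra does not need).
rng = np.random.default_rng(0)
n = 200_000
e = rng.uniform(size=n)                     # e ~ U[0,1]
D = (rng.uniform(size=n) < 0.5).astype(int)

h0 = lambda t: np.log1p(t)                  # hypothetical h(0, x, .)
g = lambda y: 2.0 * y + 1.0                 # hypothetical mapping g_x
g_inv = lambda v: (v - 1.0) / 2.0
h1 = lambda t: g(h0(t))                     # h(1, x, .) = g_x(h(0, x, .))

Y = np.where(D == 1, h1(e), h0(e))

# H_x is the distribution of Y(1 - D) + g_x^{-1}(Y) D; its tau-th
# quantile recovers h(0, x, tau), and g_x maps it to h(1, x, tau).
pooled = Y * (1 - D) + g_inv(Y) * D
for tau in (0.1, 0.5, 0.9):
    q0 = np.quantile(pooled, tau)
    assert abs(q0 - h0(tau)) < 0.01
    assert abs(g(q0) - h1(tau)) < 0.02
```

The pooled variable equals h(0, x, e) for every observation, so its distribution is exactly Hx, as in the displayed equalities above.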
A.8. Proof of Theorem 4.
Proof. By Section 4, the only if part is trivial.
We now show the if part. Fix X = x. Suppose the condition holds for a structure S0 =
[h, FeD|XZ] satisfying assumption A. For notational simplicity, we suppress x in the following
analysis.
It suffices to construct an observationally equivalent structure S1 ∈ M1. Let η ∼ U(0, 1) and
m(z) = p(z). Hence, it suffices to construct the copula function of the joint distribution Feη.
By Theorem 3, there exists a continuous and strictly increasing function g, which maps the
support S(1− D)Y + Dg−1 (Y ) onto SDY +(1− D) g(Y ) and solves (13). Moreover, for each τ ∈ (0, 1), we
define h(1, τ) and h(0, τ) as the τ–th quantiles of the distributions P(DY + (1 − D)g(Y) ≤ ·) and
P(Dg−1(Y) + (1 − D)Y ≤ ·), respectively. Because g is a one–to–one and onto mapping from
S(1−D)Y+Dg−1(Y) to SDY+(1−D)g(Y), h(d, ·) is strictly increasing on (0, 1). Moreover,
let C = Sh−1(1,Y)|D=1 ∩ Sh−1(0,Y)|D=0. By definition, (i) ∆d(·, z1, z2) is strictly increasing on h(d, C)
and flat elsewhere; (ii) ∆0(h(0, τ), z1, z2) = ∆1(h(1, τ), z1, z2).
Moreover, for any τ ∈ (0, 1), let
P(e ≤ τ, m(z1 ) < η ≤ m(z2 )) = ∆1 (h(1, τ ), z1 , z2 ),
which is weakly increasing in h(1, τ ) ∈ SDY +(1− D) g(Y ) . Note that P(m(z1 ) < η ≤ m(z2 )) =
∆1 (∞, z1 , z2 ) = p(z2 ) − p(z1 ). Moreover, for any τ ∈ (0, 1), let
P(e ≤ τ, η ≤ m(z1)) = P(Y ≤ h(1, τ) | D = 1, Z = z1) − ∆1(h(1, τ), z1, z2),
P(e ≤ τ, η > m(z2)) = P(Y ≤ h(0, τ) | D = 0, Z = z2) − ∆0(h(0, τ), z1, z2).
By the weak monotonicity of ∆d(·, z1, z2), the constructed Feη is also monotone on its support.
Using the constructed m, h, and Feη, one can verify that the structure S1 ∈ M1 is observationally
equivalent to S0.
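The decomposition underlying this construction can be checked by simulation. The sketch below uses a hypothetical selection model D = 1{η ≤ p(z)} with η dependent on e; none of these objects are the paper's. It verifies that ∆1(h(1, τ), z1, z2) coincides with P(e ≤ τ, p(z1) < η ≤ p(z2)), the joint probability assigned in the construction above:

```python
import numpy as np

# Simulation sketch: a hypothetical selection model (not the paper's).
# D = 1{eta <= p(z)}, with eta dependent on e, so D is endogenous.
rng = np.random.default_rng(3)
n = 500_000
e = rng.uniform(size=n)                              # e ~ U[0,1]
eta = np.clip(e + rng.normal(0.0, 0.2, size=n), 0, 1)
p = {"z1": 0.3, "z2": 0.7}                           # hypothetical propensity p(z)
h1 = lambda t: 1.0 + t                               # hypothetical h(1, .)
tau = 0.5

def P1(z):
    """P(Y <= h(1, tau); D = 1 | Z = z) when D = 1{eta <= p(z)}."""
    D = eta <= p[z]
    return np.mean((h1(e) <= h1(tau)) & D)

# Delta1(h(1, tau), z1, z2) equals the joint probability assigned in the
# construction: P(e <= tau, p(z1) < eta <= p(z2)).
delta1 = P1("z2") - P1("z1")
direct = np.mean((e <= tau) & (p["z1"] < eta) & (eta <= p["z2"]))
assert abs(delta1 - direct) < 1e-9
```

The equality is an indicator-algebra identity, 1{η ≤ p(z2)} − 1{η ≤ p(z1)} = 1{p(z1) < η ≤ p(z2)}, which is why it holds exactly on the simulated draws.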
A.9. Proof of Theorem 5.
Proof. Our proof takes two steps. First, we show that
h(1, x, τ) = Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(h(0, x̃, τ))),
∀τ ∈ [0, 1].
Second, we show that the distribution of h(d, x, e) is identified, from which we identify the
function h(d, x, ·).
Fix x ∈ SX and x̃ ∈ SX satisfying assumption C'. Further, let z1, z2 ∈ SZ|X=x and z̃1, z̃2 ∈
SZ|X=x̃ be such that p(x, z1) = p(x̃, z̃1) < p(x, z2) = p(x̃, z̃2). By definition, the RHS of (14) is
weakly increasing and left–continuous. For any τ ∈ [0, 1], let ψ(0, x̃, τ ) = sup{e : h(0, x̃, e) =
h(0, x̃, τ )}. By A’, we have
Fh(0,x̃,e)|m(x̃,z̃1 )<η ≤m(x̃,z̃2 ) (h(0, x̃, τ )) = Fe|m(x̃,z̃1 )<η ≤m(x̃,z̃2 ) (ψ(0, x̃, τ )).
Therefore,
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(h(0, x̃, τ)))
= Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fe|m(x̃,z̃1)<η≤m(x̃,z̃2)(ψ(0, x̃, τ)))
= Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fe|m(x,z1)<η≤m(x,z2)(ψ(0, x̃, τ))).
The last step comes from the fact that m( x, z j ) = m( x̃, z̃ j ) for j = 1, 2. Note that
P(h(1, x, e) ≤ h(1, x, ψ(0, x̃, τ)) | m(x, z1) < η ≤ m(x, z2))
≥ P(e ≤ ψ(0, x̃, τ) | m(x, z1) < η ≤ m(x, z2)) = Fe|m(x,z1)<η≤m(x,z2)(ψ(0, x̃, τ)),
which implies
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fe|m(x,z1)<η≤m(x,z2)(ψ(0, x̃, τ))) ≤ h(1, x, ψ(0, x̃, τ)).
Moreover, for any y < h(1, x, ψ(0, x̃, τ )),
P(h(1, x, e) ≤ y | m(x, z1) < η ≤ m(x, z2))
= P(h(1, x, e) ≤ h(1, x, ψ(0, x̃, τ)) | m(x, z1) < η ≤ m(x, z2))
− P(y < h(1, x, e) ≤ h(1, x, ψ(0, x̃, τ)) | m(x, z1) < η ≤ m(x, z2))
< Fe|m(x,z1)<η≤m(x,z2)(ψ(0, x̃, τ)),
where the last inequality comes from the fact that, by A', for any y < h(1, x, ψ(0, x̃, τ)),
P(y < h(1, x, e) ≤ h(1, x, ψ(0, x̃, τ)) | m(x, z1) < η ≤ m(x, z2)) > 0.
Thus, we have that
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fe|m(x,z1)<η≤m(x,z2)(ψ(0, x̃, τ))) = h(1, x, ψ(0, x̃, τ)),
which gives us that
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(h(0, x̃, τ))) = h(1, x, ψ(0, x̃, τ)).
By the definition of ψ(0, x̃, τ) and condition (ii), we have h(1, x, ψ(0, x̃, τ)) = h(1, x, τ) for all
τ ∈ [0, 1]. Hence,
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(y)) = h(1, x, h−1(0, x̃, y))
for all y ∈ Sh(0,x̃,e) .
For the second step, it is straightforward that the distribution of h(1, x, e) is identified by
Fh(1,x,e)(y) = P(Y ≤ y; D = 1 | X = x) + P(φx,x̃(Y) ≤ y; D = 0 | X = x̃), ∀y ∈ R.
Now we show the identification of h(1, x, ·) from Fh(1,x,e) .
By definition, Qh(1,x,e)(τ) = inf{y ∈ R : P[h(1, x, e) ≤ y] ≥ τ}. Because of the weak monotonicity of h in e, we have h(1, x, u) ≤ h(1, x, τ) for all u ≤ τ. Therefore, we have that
P [h(1, x, e) ≤ h (1, x, τ )] ≥ τ.
It follows that Qh(1,x,e) (τ ) ≤ h (1, x, τ ).
Moreover, fix arbitrary y < h (1, x, τ ). Then
P[h(1, x, e) ≤ y] = P[h(1, x, e) ≤ y; e ≤ τ] = P[e ≤ τ] − P[e ≤ τ; h(1, x, e) > y] < τ,
where the first equality is because h(1, x, e) ≤ y implies e ≤ τ, and the inequality comes from
the fact that P [e ≤ τ; h(1, x, e) > y] > 0 under A’. Thus, Qh(1,x,e) (τ ) > y for all y < h(1, x, τ ).
Hence, Qh(1,x,e) (τ ) = h (1, x, τ ).
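Both steps of this argument can be illustrated numerically. In the sketch below, h(0, x̃, ·) and h(1, x, ·) are hypothetical strictly increasing functions of a common uniform e, and the conditioning event m(x, z1) < η ≤ m(x, z2) is dropped for simplicity: composing the CDF of h(0, x̃, e) with the quantile function of h(1, x, e) recovers the counterfactual mapping h(1, x, h−1(0, x̃, ·)):

```python
import numpy as np

# Sketch with hypothetical structural functions (the conditioning event
# m(x,z1) < eta <= m(x,z2) is dropped for simplicity).
rng = np.random.default_rng(1)
e = rng.uniform(size=500_000)         # common latent e ~ U[0,1]

h0 = lambda t: t ** 2                 # hypothetical h(0, x~, .)
h1 = lambda t: np.exp(t)              # hypothetical h(1, x, .)

s0 = np.sort(h0(e))                   # draws of h(0, x~, e)
s1 = np.sort(h1(e))                   # draws of h(1, x, e)

for y in (0.04, 0.25, 0.64):
    tau = np.searchsorted(s0, y) / len(s0)   # empirical F_{h(0,x~,e)}(y)
    q = np.quantile(s1, tau)                 # empirical Q_{h(1,x,e)}(tau)
    # counterfactual mapping: h(1, x, h^{-1}(0, x~, y)) = exp(sqrt(y))
    assert abs(q - np.exp(np.sqrt(y))) < 0.01
```

Here h−1(0, x̃, y) = √y and h(1, x, ·) = exp(·), so the quantile-of-CDF composition should return exp(√y), as the assertions check.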
A.10. Proof of Corollary 2.
Proof. Fix x ∈ SX and x̃ ∈ SX satisfying assumption C’-(i). Let further z1 , z2 ∈ SZ|X = x and
z̃1 , z̃2 ∈ SZ|X = x̃ such that p( x, z1 ) = p( x̃, z̃1 ) < p( x, z2 ) = p( x̃, z̃2 ). By the proof of Corollary 1,
we have
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(h(0, x̃, τ))) = h(1, x, ψ(0, x̃, τ)) ≥ h(1, x, τ),
where the last inequality comes from the fact that ψ(0, x̃, τ ) ≥ τ. Therefore,
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(y)) ≥ h(1, x, h−1(0, x̃, y)) = φx,x̃(y)
for all y ∈ Sh(0,x̃,e).
For the lower bound, let ϕ(0, x̃, τ ) = inf{e : h(0, x̃, e) = h(0, x̃, τ )}. By A’, we have
Kh(0,x̃,e)|m(x̃,z̃1 )<η ≤m(x̃,z̃2 ) (h(0, x̃, τ )) = Fe|m(x̃,z̃1 )<η ≤m(x̃,z̃2 ) ( ϕ(0, x̃, τ )).
Therefore,
Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Kh(0,x̃,e)|m(x̃,z̃1)<η≤m(x̃,z̃2)(h(0, x̃, τ)))
= Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fe|m(x̃,z̃1)<η≤m(x̃,z̃2)(ϕ(0, x̃, τ)))
= Qh(1,x,e)|m(x,z1)<η≤m(x,z2)(Fe|m(x,z1)<η≤m(x,z2)(ϕ(0, x̃, τ)))
= h(1, x, ϕ(0, x̃, τ)) ≤ h(1, x, τ),
which provides a lower bound for h(1, x, h−1(0, x̃, ·)).
Given the interval identification of φx,x̃(·), the bounds on h(1, x, y) follow from an argument similar to that in the proof of Corollary 1.
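The role of the two distributions F and K in this bounding argument can be seen in a small simulation. In the sketch below all functions are hypothetical, the conditioning on η is again dropped, and K is taken to be the left limit of the CDF (the strict-inequality version), consistent with its use above: when h(0, x̃, ·) is flat on [a, b], F picks out ψ(τ) = b while K picks out ϕ(τ) = a, delivering the bounds h(1, x, ϕ(τ)) ≤ h(1, x, τ) ≤ h(1, x, ψ(τ)):

```python
import numpy as np

# Sketch with hypothetical functions; conditioning on eta is dropped, and
# K is taken to be the left limit of the CDF (strict inequality).
rng = np.random.default_rng(2)
e = rng.uniform(size=500_000)

a, b = 0.4, 0.6                            # flat region of the hypothetical h0

def h0(t):
    """Hypothetical h(0, x~, .): flat on [a, b], increasing elsewhere."""
    return np.where(t < a, t, np.where(t > b, t - (b - a), a))

h1 = lambda t: t ** 2                      # hypothetical h(1, x, .), increasing

s1 = np.sort(h1(e))
tau = 0.5                                  # inside the flat region
y = a                                      # = h0(tau)

F = np.mean(h0(e) <= y)                    # ~ psi(tau) = b
K = np.mean(h0(e) < y)                     # ~ phi(tau) = a

upper = np.quantile(s1, F)                 # ~ h1(psi(tau)) = h1(b)
lower = np.quantile(s1, K)                 # ~ h1(phi(tau)) = h1(a)
assert lower <= h1(tau) <= upper
assert abs(upper - h1(b)) < 0.01 and abs(lower - h1(a)) < 0.01
```

Outside the flat region, ψ(τ) = ϕ(τ) = τ and the two bounds coincide, which is where point identification is restored.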
A.11. Proof of Corollary 3.
Proof. The proof simply follows the arguments in Corollaries 1 and 2.