More on Confidence Intervals for Partially Identified Parameters

Jörg Stoye*
New York University

August 6, 2007
Abstract

This paper extends Imbens and Manski's (2004) analysis of confidence intervals for interval identified parameters. For their final result, Imbens and Manski implicitly assume superefficient estimation of a nuisance parameter. This appears to have gone unnoticed before, and it limits the result's applicability. I re-analyze the problem both with assumptions that merely weaken the superefficiency condition and with assumptions that remove it altogether. Imbens and Manski's confidence region is found to be valid under weaker assumptions than theirs, yet superefficiency is required. I also provide a different confidence interval that is valid under superefficiency but can be adapted to the general case, in which case it embeds a specification test for nonemptiness of the identified set. A methodological contribution is to notice that the difficulty of inference comes from a boundary problem regarding a nuisance parameter, clarifying the connection to other work on partial identification.

Keywords: Bounds, identification regions, confidence intervals, uniform convergence, superefficiency.
JEL classification codes: C10, C14.
*I am indebted to Adam Rosen, whose insightful comments did much to improve this paper. I also thank Xiaohong Chen, Guido Imbens, Chuck Manski, and Hyungsik Roger Moon for feedback and encouragement and participants of the Econometric Society 2007 North American Summer Meeting (Duke University) for comments. Of course, any errors are mine. Address: Department of Economics, New York University, 19 W. 4th Street, New York, NY 10012, j.stoye@nyu.edu.
1 Introduction
Analysis of partial identification, that is, of models where only bounds on parameters are identified,
has become an active field of econometrics.1 Within this field, attention has only recently turned to
general treatments of estimation and inference. An important contribution in this direction is due to
Imbens and Manski (2004, IM henceforth). Their major innovation is to point out that in constructing
confidence regions for partially identified parameters, one might be interested in coverage probabilities
for the parameter rather than its “identified set.” The intuitively most obvious, and previously used,
confidence regions have nominal coverage probabilities defined for the latter, which means that they
are conservative with respect to the former. IM go on to propose a number of confidence regions
designed to cover real-valued parameters that can be asymptotically concluded to lie in an interval.
This paper refines and extends IM’s technical analysis, specifically their last result, a confidence
interval that exhibits uniform coverage of partially identified parameters if the length of the identified
interval is a nuisance parameter. IM's proof of coverage for that confidence set relies on a high-level
assumption that turns out to imply superefficient estimation of this nuisance parameter and that will
fail in many applications. I take this discovery as point of departure for a new analysis of the problem,
providing different confidence intervals that are valid with, respectively without, superefficiency.
A brief summary and overview of results goes as follows. In section 2, I describe a simplified,
and somewhat generalized, version of IM’s model, briefly summarize the relevant aspects of their
contribution, and explain the aforementioned issue. Section 3 provides a re-analysis of the problem.
To begin, I show how to construct a confidence region if the length of the identified interval is known.
This case is a simple but instructive benchmark; subsequent complications stem from the fact that the
interval’s length is generally a nuisance parameter. Section 3.2 analyses inference given supere!cient
estimation of this nuisance parameter. It reconstructs IM’s result from weaker assumptions, but also
proposes a dierent confidence region. In section 3.3, supere!ciency is dropped altogether. This
case requires a quite dierent analysis, and I propose a confidence region that adapts the last of the
previous ones and embeds a specification test for emptiness of the identified set. Section 4 concludes
and highlights connections to current research on partially identified models. The appendix contains
all proofs.
1 See Manski (2003) for a survey and Haile and Tamer (2003) as well as Honoré and Tamer (2006) for recent contributions.
2 Background
Following Woutersen (2006), I consider a simplification and generalization of IM's setup that removes some nuisance parameters. The object of interest is the real-valued parameter $\theta_0(P)$ of a probability distribution $P(X)$; $P$ must lie in a set $\mathcal{P}$ that is characterized by ex ante constraints (maintained assumptions). The random variable $X$ is not completely observable, so that $\theta_0$ may not be identified. Assume, however, that the observable aspects of $P(X)$ identify bounds $\theta_l(P)$ and $\theta_u(P)$ s.t. $\theta_0 \in [\theta_l, \theta_u]$ a.s. See the aforecited references for examples. The interval $\Theta_0 \equiv [\theta_l, \theta_u]$ will also be called identified set. Let $\Delta(P) \equiv \theta_u - \theta_l$ denote its length; obviously, $\Delta$ is identified as well. Assume that estimators $\hat\theta_l$, $\hat\theta_u$, and $\hat\Delta$ exist and are connected by the identity $\hat\Delta \equiv \hat\theta_u - \hat\theta_l$.
Confidence regions for identified sets of this type are conventionally formed as
$$CI^0_\alpha \equiv \left[\hat\theta_l - \frac{c_\alpha \hat\sigma_l}{\sqrt{N}},\; \hat\theta_u + \frac{c_\alpha \hat\sigma_u}{\sqrt{N}}\right],$$
where $\hat\sigma_l$ respectively $\hat\sigma_u$ are standard errors for $\hat\theta_l$ respectively $\hat\theta_u$, and where $c_\alpha$ is chosen s.t.
$$\Phi(c_\alpha) - \Phi(-c_\alpha) = 1 - \alpha. \qquad (1)$$
For example, $c_\alpha = \Phi^{-1}(0.975) \approx 1.96$ for a 95%-confidence interval. Under regularity conditions, $\Pr(\Theta_0 \subseteq CI^0_\alpha) \to 1 - \alpha$; see Horowitz and Manski (2000). IM's contribution is motivated by the observations that (i) one might be interested in coverage of $\theta_0$ rather than $\Theta_0$, and (ii) whenever $\Delta > 0$, then $\Pr(\theta_0 \in CI^0_\alpha) \to 1 - \alpha/2$. In words, a 90% C.I. for $\Theta_0$ is a 95% C.I. for $\theta_0$. The reason is that asymptotically, $\Delta$ is large relative to sampling error, so that noncoverage risk is effectively one-sided at $\{\theta_l, \theta_u\}$ and vanishes otherwise. One would, therefore, be tempted to construct a level $\alpha$ C.I. for $\theta_0$ as $CI^0_{2\alpha}$.2
Unfortunately, this intuition works pointwise but not uniformly over interesting specifications of $\mathcal{P}$. Specifically, $\Pr(\theta_0 \in CI^0_{2\alpha}) = 1 - 2\alpha$ if $\Delta = 0$, and also $\Pr(\theta_0 \in CI^0_{2\alpha}) \to 1 - 2\alpha$ along any local parameter sequence where $\Delta_N = o(N^{-1/2})$, i.e. if $\Delta$ becomes small relative to the sampling error. While uniformity failures are standard in econometrics, this one is unpalatable because it concerns a very salient region of the parameter space; were it neglected, one would be led to construct confidence intervals that shrink as a parameter moves from point identification to slight underidentification.3
2 To avoid uninstructive complications, I presume $\alpha \le 0.5$ throughout.
3 The problem would be avoided if $\mathcal{P}$ were restricted s.t. $\Delta$ is bounded away from 0. But such a restriction will frequently be inappropriate. For example, one cannot a priori bound from below the degree of item nonresponse in a survey or of attrition in a panel. Even in cases where $\Delta$ is known a priori, e.g. interval data, the problem arguably disappears only in a superficial sense. Were it ignored, one would construct confidence intervals that work uniformly given any model but whose performance deteriorates across models as point identification is approached.
IM therefore conclude by proposing an intermediate confidence region that takes the uniformity problem into account. It is defined as
$$CI^1_\alpha \equiv \left[\hat\theta_l - \frac{c^1_\alpha \hat\sigma_l}{\sqrt{N}},\; \hat\theta_u + \frac{c^1_\alpha \hat\sigma_u}{\sqrt{N}}\right], \qquad (2)$$
where $c^1_\alpha$ solves
$$\Phi\left(c^1_\alpha + \frac{\sqrt{N}\hat\Delta}{\max\{\hat\sigma_l, \hat\sigma_u\}}\right) - \Phi\left(-c^1_\alpha\right) = 1 - \alpha. \qquad (3)$$
Comparison with (1) reveals that calibration of $c^1_\alpha$ takes into account the estimated length of the identified set. For a 95% confidence set, $c^1_\alpha$ will be $\Phi^{-1}(0.975) \approx 1.96$ if $\hat\Delta = 0$, that is if point identification must be presumed, and will approach $\Phi^{-1}(0.95) \approx 1.64$ as $\hat\Delta$ grows large relative to sampling error. IM show uniform validity of $CI^1_\alpha$ under the following assumption.
Assumption 1 (i) There exist estimators $\hat\theta_l$ and $\hat\theta_u$ that satisfy:
$$\sqrt{N}\left(\begin{array}{c} \hat\theta_l - \theta_l \\ \hat\theta_u - \theta_u \end{array}\right) \xrightarrow{d} N\left(\left(\begin{array}{c} 0 \\ 0 \end{array}\right), \left(\begin{array}{cc} \sigma_l^2 & \rho\sigma_l\sigma_u \\ \rho\sigma_l\sigma_u & \sigma_u^2 \end{array}\right)\right)$$
uniformly in $P \in \mathcal{P}$, and there are estimators $\left(\hat\sigma_l^2, \hat\sigma_u^2, \hat\rho\right)$ that converge to their population values uniformly in $P \in \mathcal{P}$.
(ii) For all $P \in \mathcal{P}$, $\underline\sigma^2 \le \sigma_l^2, \sigma_u^2 \le \bar\sigma^2$ for some positive and finite $\underline\sigma^2$ and $\bar\sigma^2$, and $\theta_u - \theta_l \le \bar\Delta < \infty$.
(iii) For all $\epsilon > 0$, there are $v > 0$, $K$, and $N_0$ s.t. $N \ge N_0$ implies $\Pr\left(\sqrt{N}\left|\hat\Delta - \Delta\right| > K\Delta^v\right) < \epsilon$ uniformly in $P \in \mathcal{P}$.
While it is clear that uniformity can obtain only under restrictions on $\mathcal{P}$, it is important to note that $\Delta$ is not bounded from below, thus the specific uniformity problem that arises near point identification is not assumed away. Having said that, conditions (i) and (ii) are fairly standard, but (iii) deserves some explanation. It implies that $\hat\Delta$ approaches its population counterpart in a specific way. If $\Delta = 0$, then $\hat\Delta = 0$ with probability approaching 1 in finite samples, i.e. if point identification obtains, then this will be learned exactly, and the limiting distribution of $\hat\Delta$ must be degenerate. What's more, degenerate limiting distributions occur along any local parameter sequence that converges to zero, as is formally stated in the following lemma.4

Lemma 1 Assumption 1(iii) implies that $\sqrt{N}\left|\hat\Delta - \Delta_N\right| \xrightarrow{p} 0$ for any sequence of distributions $\{P_N\} \subseteq \mathcal{P}$ s.t. $\Delta_N \equiv \Delta(P_N) \to 0$.

4 This paper makes heavy use of local parameters, and to minimize confusion, I reserve the subscript $(\cdot)_N$ for deterministic functions of $N$, including local parameters; hence the use of $c^1_\alpha$ where IM used $\bar{C}_N$. Estimators are denoted by $\widehat{(\cdot)}$ throughout.
In words, assumption 1(iii) requires $\hat\Delta$ to be superefficient at $\Delta = 0$. This feature appears to not have been previously recognized; it is certainly nonstandard and might even seem undesirable.5 This judgment is moderated by the fact that, as will be shown below, the feature is fulfilled and useful in a leading application, namely estimation of a mean with missing data. Nonetheless, some difficulties remain. First, superefficiency of $\hat\Delta$ is not given in other leading applications, notably when $\hat\theta_l$ and $\hat\theta_u$ come from moment conditions. Second, assumptions 1(i)-(ii) and (iii) are mutually consistent only if additional restrictions hold. To see this, note that by assumption 1(i)-(ii),
$$\sqrt{N}\left(\hat\Delta - \Delta\right) = \sqrt{N}\left(\hat\theta_u - \theta_u\right) - \sqrt{N}\left(\hat\theta_l - \theta_l\right) \xrightarrow{d} N\left(0,\; \sigma_l^2 + \sigma_u^2 - 2\rho\sigma_l\sigma_u\right)$$
uniformly in $P$. In view of lemma 1, this is consistent with condition (iii) for sequences of distributions $P_N$ s.t. $\Delta \to 0$ only if $\sigma_l^2 - \sigma_u^2 \to 0$ and $\rho \to 1$ for all those sequences. These restrictions are again violated in important applications, and it also turns out that if they hold, then an interesting alternative to $CI^1_\alpha$ can be formulated. All in all, there is ample reason to take a second look at the inference problem.
3 Re-analysis of the Inference Problem

3.1 Inference with Known $\Delta$
I will now re-analyze the problem and provide several results that circumvent the aforementioned issues. To begin, move one step back and assume that $\Delta$ is known. More specifically, impose:

Assumption 2 (i) There exists an estimator $\hat\theta_l$ that satisfies:
$$\sqrt{N}\left[\hat\theta_l - \theta_l\right] \xrightarrow{d} N\left(0, \sigma_l^2\right)$$
uniformly in $P \in \mathcal{P}$, and there is an estimator $\hat\sigma_l^2$ that converges to $\sigma_l^2$ uniformly in $P \in \mathcal{P}$.
(ii) $\Delta \ge 0$ is known.
(iii) For all $P \in \mathcal{P}$, $\underline\sigma^2 \le \sigma_l^2 \le \bar\sigma^2$ for some positive and finite $\underline\sigma^2$ and $\bar\sigma^2$.

By symmetry, it could of course be $\theta_u$ that can be estimated. A natural application for this scenario would be inference about the mean from interval data, where the length of intervals (e.g., income brackets) does not vary on the support of $X$. Define
$$\widetilde{CI}_\alpha \equiv \left[\hat\theta_l - \frac{\tilde{c}_\alpha\hat\sigma_l}{\sqrt{N}},\; \hat\theta_l + \Delta + \frac{\tilde{c}_\alpha\hat\sigma_l}{\sqrt{N}}\right],$$
where $\tilde{c}_\alpha$ solves
$$\Phi\left(\tilde{c}_\alpha + \frac{\sqrt{N}\Delta}{\hat\sigma_l}\right) - \Phi\left(-\tilde{c}_\alpha\right) = 1 - \alpha. \qquad (4)$$

5 When Hodges originally defined a superefficient estimator, his intent was not, of course, to propose its use. For cautionary tales regarding the implicit, and sometimes inadvertent, use of superefficient estimators, see Leeb and Pötscher (2005).

Lemma 2 establishes that this confidence interval is uniformly valid.
Lemma 2 Let assumption 2 hold. Then
$$\liminf_{N\to\infty}\; \inf_{\theta}\; \inf_{P\in\mathcal{P}:\,\theta_0(P)=\theta} \Pr\left(\theta_0 \in \widetilde{CI}_\alpha\right) = 1 - \alpha.$$

Lemma 2 generalizes IM's lemma 3. It is technically new but easy to prove: The normal approximation to $\Pr\left(\theta_0 \in \widetilde{CI}_\alpha\right)$ is concave in $\theta_0$ and equals $(1-\alpha)$ if $\theta_0 \in \{\theta_l, \theta_u\}$. The main purpose of lemma 2 is as a backdrop for the case with unknown $\Delta$, when $\widetilde{CI}_\alpha$ is not feasible. As will be seen, the impossibility of estimating $\sqrt{N}\Delta$, and by implication $\tilde{c}_\alpha$, is the root cause of most complications.
3.2 Inference with Superefficiency
In this section, I assume that $\Delta$ is unknown but maintain superefficiency. I begin by showcasing the weakest (to my knowledge) assumption under which $CI^1_\alpha$ is valid.

Assumption 3 (i) There exists an estimator $\hat\theta_l$ that satisfies:
$$\sqrt{N}\left[\hat\theta_l - \theta_l\right] \xrightarrow{d} N\left(0, \sigma_l^2\right)$$
uniformly in $P \in \mathcal{P}$, and there is an estimator $\hat\sigma_l^2$ that converges to $\sigma_l^2$ uniformly in $P \in \mathcal{P}$.
(ii) There exists an estimator $\hat\Delta$ that satisfies:
$$\sqrt{N}\left[\left(\hat\theta_l + \hat\Delta\right) - \left(\theta_l + \Delta\right)\right] \xrightarrow{d} N\left(0, \sigma_u^2\right)$$
uniformly in $P \in \mathcal{P}$, and there is an estimator $\hat\sigma_u^2$ that converges to $\sigma_u^2$ uniformly in $P \in \mathcal{P}$.
(iii) There exists a sequence $\{a_N\}$ s.t. $a_N \to 0$, $a_N\sqrt{N} \to \infty$, and $\sqrt{N}\left|\hat\Delta - \Delta\right| \xrightarrow{p} 0$ for any sequence of distributions $\{P_N\} \subseteq \mathcal{P}$ with $\Delta_N \le a_N$.
(iv) For all $P \in \mathcal{P}$, $\underline\sigma^2 \le \sigma_l^2, \sigma_u^2 \le \bar\sigma^2$ for some positive and finite $\underline\sigma^2$ and $\bar\sigma^2$.

Assumption 3 models a situation where $\theta_u$ is estimated only indirectly by $\hat\theta_u \equiv \hat\theta_l + \hat\Delta$. (By symmetry, the case of directly estimating $(\hat\theta_u, \hat\Delta)$ is covered as well.) Importantly, uniform joint asymptotic normality of $\left(\hat\theta_l, \hat\theta_l + \hat\Delta\right)$ is not imposed. Furthermore, condition (iii) has been replaced with a requirement that is strictly weaker and arguably more transparent about what is really being required.

Of course, assumption 3(iii) is again a superefficiency condition, but it faithfully models a leading application and IM's motivation, namely estimation of a mean with missing data. To see this, let $\theta \equiv EX$, where $X \in [0,1]$, and assume that one observes realizations of $(D, D \cdot X)$, where $D \in \{0,1\}$ indicates whether a data point is present ($D = 1$) or missing ($D = 0$). Then the identified set for $\theta_0$ is
$$[\theta_l, \theta_u] = \left[(1-\lambda)E(X|D=1),\; (1-\lambda)E(X|D=1) + \lambda\right],$$
where $\lambda \equiv \Pr(D = 0) = 1 - ED$ (the definition as "one minus propensity score" insures consistency with previous use). The obvious estimator for $\Theta_0$ is its sample analog
$$\left[\hat\theta_l,\; \hat\theta_u\right] \equiv \Bigg[\underbrace{\frac{1}{N}\sum_{i=1}^{N} D_i X_i}_{\hat\theta_l},\;\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} D_i X_i}_{\hat\theta_l} + \underbrace{1 - \frac{1}{N}\sum_{i=1}^{N} D_i}_{\hat\Delta}\Bigg]. \qquad (5)$$
In this application, indirect estimation of $\hat\theta_u$ as $\hat\theta_l + \hat\Delta$ is, therefore, natural. Under regularity conditions, uniform convergence of $\left(\hat\theta_l, \hat\theta_l + \hat\Delta\right)$ to individually normal distributions follows from a uniform central limit theorem. Finally, it is interesting to note that $\hat\Delta$ fulfils part (iii) of both assumptions 1 and 3, making it a natural example of a superefficient estimator.
This section's first result is as follows.

Proposition 1 Let assumption 3 hold. Then
$$\liminf_{N\to\infty}\; \inf_{\theta}\; \inf_{P\in\mathcal{P}:\,\theta_0(P)=\theta} \Pr\left(\theta_0 \in CI^1_\alpha\right) = 1 - \alpha.$$
In words, assumption 3 suffices for validity of IM's interval. To understand the use of superefficiency, it is helpful to think of $CI^1_\alpha$ as a feasible version of $\widetilde{CI}_\alpha$, with $c^1_\alpha$ being an estimator of $\tilde{c}_\alpha$. Validity of $CI^1_\alpha$ would easily follow from consistency of $c^1_\alpha$, but unfortunately, such consistency does not obtain under standard assumptions: $\left(\hat\Delta - \Delta\right)$ is usually of order $O_p(N^{-1/2})$, so that $\sqrt{N}\left(\hat\Delta - \Delta\right)$ does not vanish.

This is where superefficiency comes into play. Think in terms of sequences of distributions $P_N$ that give rise to local parameters $\Delta_N$, and distinguish between sequences where $\Delta_N$ vanishes fast enough for condition (iii) to apply and sequences where this fails. In the former case, $\sqrt{N}\left(\hat\Delta - \Delta_N\right)$ does vanish, and consistency of $c^1_\alpha$ for $\tilde{c}_\alpha$ is recovered. In the latter case, $\Delta_N$ grows uniformly large relative to sampling error, so that the uniformity problem does not arise to begin with. The "naive" $CI^0_{2\alpha}$ is then a valid construction, and $CI^1_\alpha$ (as well as $\widetilde{CI}_\alpha$) is asymptotically equivalent to it.
Proposition 1 shows that $CI^1_\alpha$ is valid under conditions that weaken assumption 1 and remove some hidden restrictions. However, further investigation reveals that the interval has a nonstandard property. Its simplicity stems in part from the fact that expression (3) simultaneously calibrates $\Pr\left(\theta_l \in CI^1_\alpha\right)$ and $\Pr\left(\theta_u \in CI^1_\alpha\right)$. This is possible because $\max\{\hat\sigma_l, \hat\sigma_u\}$ is substituted where one would otherwise have to differentiate between $\hat\sigma_l$ and $\hat\sigma_u$. In effect, $c^1_\alpha$ is calibrated under the presumption that $\max\{\hat\sigma_l, \hat\sigma_u\}$ will be used as standard error at both ends of the confidence interval. Of course, this presumption is not correct: $\hat\sigma_l$ is used near $\hat\theta_l$ and $\hat\sigma_u$ near $\hat\theta_u$. As a result, the nominal size of $CI^1_\alpha$ is not $1 - \alpha$ in finite samples.6 If $\hat\sigma_l > \hat\sigma_u$, the interval will be nominally conservative at $\theta_u$, nominally invalid at $\theta_l$, and therefore nominally invalid for $\theta_0$. It also follows that $CI^1_\alpha$ is the inversion of a hypothesis test for $H_0: \theta_0 \in \Theta_0$ that is nominally biased, that is, its nominal power is larger for some points inside of $\Theta_0$ than for some points outside of it.

To be sure, this feature is a finite sample phenomenon. Under assumption 3, nominal size of $CI^1_\alpha$ will approach $1 - \alpha$ at both $\theta_l$ and $\theta_u$ as $N \to \infty$. (For future reference, note the reason for this when $\Delta_N$ is small: Superefficiency then implies that $\sigma_l \to \sigma_u$.) Nonetheless, it is of interest to notice that it can be avoided at the price of a mild strengthening of assumptions. Specifically, impose:
Assumption 4 (i) There exist estimators $\hat\theta_l$ and $\hat\theta_u$ that satisfy:
$$\sqrt{N}\left(\begin{array}{c} \hat\theta_l - \theta_l \\ \hat\theta_u - \theta_u \end{array}\right) \xrightarrow{d} N\left(\left(\begin{array}{c} 0 \\ 0 \end{array}\right), \left(\begin{array}{cc} \sigma_l^2 & \rho\sigma_l\sigma_u \\ \rho\sigma_l\sigma_u & \sigma_u^2 \end{array}\right)\right)$$
uniformly in $P \in \mathcal{P}$, and there are estimators $\left(\hat\sigma_l^2, \hat\sigma_u^2, \hat\rho\right)$ that converge to their population values uniformly in $P \in \mathcal{P}$.
(ii) For all $P \in \mathcal{P}$, $\underline\sigma^2 \le \sigma_l^2, \sigma_u^2 \le \bar\sigma^2$ for some positive and finite $\underline\sigma^2$ and $\bar\sigma^2$, and $\theta_u - \theta_l \le \bar\Delta < \infty$.
(iii) There exists a sequence $\{a_N\}$ s.t. $a_N \to 0$, $a_N N^{1/2} \to \infty$, and $\sqrt{N}\left|\hat\Delta - \Delta\right| \xrightarrow{p} 0$ for any sequence of distributions $\{P_N\} \subseteq \mathcal{P}$ with $\Delta_N \le a_N$.

Assumption 4 re-introduces joint normality and differs from assumption 1 merely by the modification of part (iii). It is fulfilled in estimation of the mean with missing data: Let $\mu_1 \equiv E(X|D=1)$, $\sigma^2 \equiv Var(X|D=1)$, $p \equiv 1 - \lambda$, and $\hat\theta_l$ and $\hat\Delta$ as in (5); then assumption 4(i) holds with $\sigma_l^2 = \sigma^2 p + p(1-p)\mu_1^2$, $\sigma_u^2 = \sigma^2 p + p(1-p)(1-\mu_1)^2$, and $\rho\sigma_l\sigma_u = \sigma^2 p - p(1-p)\mu_1(1-\mu_1)$.
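These expressions can be verified by direct moment calculations. As a sanity check of my own (assuming, purely for illustration, that $X|D=1$ is Bernoulli with parameter $q$, so that $\mu_1 = q$ and $\sigma^2 = q(1-q)$), the asymptotic moments of the sample analogs can be computed both from first principles and from the displayed formulas:

```python
def moments_direct(p, q):
    """Asymptotic (co)variances of (theta_l_hat, theta_u_hat) from first
    principles when D ~ Bernoulli(p) and X|D=1 ~ Bernoulli(q)."""
    var_dx = p * q * (1 - p * q)   # D*X ~ Bernoulli(pq)
    cov_dx_d = p * q * (1 - p)     # Cov(DX, D) = E(DX) - E(DX)E(D)
    var_d = p * (1 - p)
    var_l = var_dx                 # theta_l_hat is the sample mean of DX
    var_u = var_dx + var_d - 2 * cov_dx_d  # theta_u_hat is the mean of DX - D, plus 1
    cov_lu = var_dx - cov_dx_d     # Cov(DX, DX - D)
    return var_l, var_u, cov_lu

def moments_formula(p, q):
    """The same quantities from the closed-form expressions in the text."""
    mu1, sigma2 = q, q * (1 - q)
    var_l = sigma2 * p + p * (1 - p) * mu1 ** 2
    var_u = sigma2 * p + p * (1 - p) * (1 - mu1) ** 2
    cov_lu = sigma2 * p - p * (1 - p) * mu1 * (1 - mu1)
    return var_l, var_u, cov_lu
```

For any $(p, q)$ the two computations agree exactly, which is an algebraic identity rather than an approximation.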
Given assumption 4, one can construct a confidence region that reflects the bivariate nature of the estimation problem by taking into account the correlation between $\hat\theta_l$ and $\hat\theta_u$. Specifically, let $(c_{2l}, c_{2u})$ minimize $c_l + c_u$ subject to the constraints that
$$\Pr\left(-c_l \le z_1,\;\; \hat\rho z_1 + \sqrt{1-\hat\rho^2}\, z_2 \le c_u + \frac{\sqrt{N}\hat\Delta}{\hat\sigma_u}\right) \ge 1 - \alpha \qquad (6)$$
$$\Pr\left(-c_l - \frac{\sqrt{N}\hat\Delta}{\hat\sigma_l} \le \hat\rho z_1 + \sqrt{1-\hat\rho^2}\, z_2,\;\; z_1 \le c_u\right) \ge 1 - \alpha \qquad (7)$$
where $z_1$ and $z_2$ are independent standard normal random variables.7 In typical cases, $(c_{2l}, c_{2u})$ will be uniquely characterized by the fact that both of (6,7) hold with equality, but it is conceivable that one of the conditions is slack at the solution. Let
$$CI^2_\alpha \equiv \left[\hat\theta_l - \frac{c_{2l}\hat\sigma_l}{\sqrt{N}},\; \hat\theta_u + \frac{c_{2u}\hat\sigma_u}{\sqrt{N}}\right].$$

6 By the nominal size of $CI^1_\alpha$ at $\theta_l$, say, I mean $\int_{CI^1_\alpha} \phi_{\hat\sigma_l^2/N}\left(x - \hat\theta_l\right) dx$, where $\phi_v$ denotes the mean-zero normal density with variance $v$, i.e. its size at $\theta_l$ as predicted from sample data. Confidence regions are typically constructed by setting nominal size equal to $1 - \alpha$.

Then:
Proposition 2 Let assumption 4 hold. Then
$$\liminf_{N\to\infty}\; \inf_{\theta}\; \inf_{P\in\mathcal{P}:\,\theta_0(P)=\theta} \Pr\left(\theta_0 \in CI^2_\alpha\right) = 1 - \alpha.$$
Observe that if $\Delta$ were known, $CI^2_\alpha$ would simplify to $\widetilde{CI}_\alpha$: Knowledge of $\Delta$ would imply that $\rho = 1$, $\hat\rho = 1$, and $\hat\sigma_l = \hat\sigma_u$, which can be substituted into (6,7) to get
$$\Pr\left(-c_l \le z \le c_u + \frac{\sqrt{N}\Delta}{\hat\sigma_l}\right) \ge 1 - \alpha$$
$$\Pr\left(-c_l - \frac{\sqrt{N}\Delta}{\hat\sigma_l} \le z \le c_u\right) \ge 1 - \alpha$$
where $z$ is standard normal. The program is then solved by setting $c_{2l} = c_{2u} = \tilde{c}_\alpha$, yielding $\widetilde{CI}_\alpha$. By the same token, $CI^2_\alpha$ is asymptotically equivalent to $CI^1_\alpha$ along any parameter sequence where superefficiency applies. For parameter sequences where $\Delta$ does not vanish, all of these intervals are asymptotically equivalent anyway because they converge to $CI^0_{2\alpha}$.

In the regular case where both of (6,7) bind, $CI^2_\alpha$ has nominal size of exactly $1 - \alpha$ at both endpoints of $\Theta_0$, and accordingly corresponds to a nominally unbiased hypothesis test, under sample information that is available for the mean with missing data. This might be considered a refinement, although (i) given asymptotic equivalence, it will only matter in small samples, and (ii) nominal size must be taken with a large grain of salt due to nonvanishing estimation error in $\sqrt{N}\hat\Delta$. Perhaps the more important difference is that $CI^2_\alpha$, unlike $CI^1_\alpha$, is readily adapted to the more general case.8
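Footnote 7 notes that (6,7) admit closed-form expressions; absent those, the calibration can also be approximated by simulation. The following sketch is my own illustration, not the paper's algorithm: to keep the search one-dimensional it imposes a common critical value $c_l = c_u = c$ (a simplification; the paper minimizes $c_l + c_u$ and allows the two to differ) and finds the smallest $c$ satisfying both constraints by Monte Carlo and bisection.

```python
import math
import random

def c2_common(alpha, rho, sn_delta_l, sn_delta_u, reps=50_000, seed=1):
    """Smallest common c with c_l = c_u = c satisfying (6) and (7), by
    Monte Carlo. sn_delta_l, sn_delta_u stand for sqrt(N)*Delta_hat divided
    by sigma_l_hat and sigma_u_hat respectively."""
    rng = random.Random(seed)
    draws = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(reps)]
    s = math.sqrt(1.0 - rho * rho)

    def both_cover(c):
        hits6 = hits7 = 0
        for z1, z2 in draws:
            zu = rho * z1 + s * z2            # correlated second coordinate
            if -c <= z1 and zu <= c + sn_delta_u:         # constraint (6)
                hits6 += 1
            if -c - sn_delta_l <= zu and z1 <= c:         # constraint (7)
                hits7 += 1
        return min(hits6, hits7) >= (1.0 - alpha) * reps

    lo, hi = 0.0, 5.0   # coverage is increasing in c, so bisection applies
    for _ in range(20):
        mid = 0.5 * (lo + hi)
        if both_cover(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Consistent with the discussion above, $\hat\rho = 1$ and $\hat\Delta = 0$ reproduce (up to simulation error) the two-sided value near 1.96, while a large $\sqrt{N}\hat\Delta$ pushes the common critical value toward the one-sided 1.64.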
3.3 Inference with Joint Normality
While the superefficiency assumption was seen to have a natural application, it is of obvious interest to consider inference about $\theta_0$ without it. For a potential application, imagine that $\hat\theta_u$ and $\hat\theta_l$ derive from separate inequality moment conditions (as in Pakes et al. 2006 and Rosen 2006). I therefore now turn to the following assumption:

7 Appendix B exhibits closed-form expressions for (6,7), illustrating that they can be evaluated without simulation.
8 As an aside, this section's findings resolve questions posed to me by Adam Rosen and other readers of IM, namely, (i) why $CI^1_\alpha$ is valid even though $\rho$ is not estimated and (ii) whether estimating $\rho$ can lead to a refinement. The brief answers are: (i) Superefficiency implies $\rho = 1$ in the critical case, eliminating the need to estimate it; indeed, it is now seen that mention of $\rho$ can be removed from the assumptions. (ii) Estimating $\rho$ allows for inference that is different in finite samples but not under first-order asymptotics.
Assumption 5 (i) There are estimators $\hat\theta_l$ and $\hat\theta_u$ that satisfy:
$$\sqrt{N}\left(\begin{array}{c} \hat\theta_l - \theta_l \\ \hat\theta_u - \theta_u \end{array}\right) \xrightarrow{d} N\left(\left(\begin{array}{c} 0 \\ 0 \end{array}\right), \left(\begin{array}{cc} \sigma_l^2 & \rho\sigma_l\sigma_u \\ \rho\sigma_l\sigma_u & \sigma_u^2 \end{array}\right)\right)$$
uniformly in $P \in \mathcal{P}$, and there are estimators $\left(\hat\sigma_l, \hat\sigma_u, \hat\rho\right)$ that converge to their population values uniformly in $P \in \mathcal{P}$.
(ii) For all $P \in \mathcal{P}$, $\underline\sigma^2 \le \sigma_l^2, \sigma_u^2 \le \bar\sigma^2$ for some positive and finite $\underline\sigma^2$ and $\bar\sigma^2$, and $\theta_u - \theta_l \le \bar\Delta < \infty$.
Relative to previous assumptions, assumption 5 simply removes superefficiency. This leads to numerous difficulties. At the core of these lies the fact that sample variation in $\hat\Delta$ need not vanish as $\Delta \to 0$. This leads to boundary problems in the implicit estimation of $\Delta$. In fact, $\Delta$ is the exact example for inconsistency of the bootstrap given by Andrews (2000), and it is not possible to consistently estimate a local parameter $\Delta_N = O(N^{-1/2})$.

To circumvent this issue, I use a shrinkage estimator
$$\Delta^* \equiv \left\{\begin{array}{ll} \hat\Delta, & \hat\Delta > b_N \\ 0, & \text{otherwise,} \end{array}\right.$$
where $b_N$ is some pre-assigned sequence s.t. $b_N \to 0$ and $b_N\sqrt{N} \to \infty$. $\Delta^*$ will replace $\hat\Delta$ in the calibration of $c_\alpha$ but not in the subsequent construction of a confidence region. This will insure uniform validity, intuitively because superefficiency at $\Delta = 0$ is artificially restored. Of course, there is some price to be paid: The confidence region presented below will be uniformly valid and pointwise exact, but conservative along local parameter sequences.
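The shrinkage step is elementary to implement. In the sketch below (my own illustration), the concrete threshold $b_N = N^{-1/4}$ is only a hypothetical choice satisfying $b_N \to 0$ and $b_N\sqrt{N} = N^{1/4} \to \infty$; the paper deliberately leaves $b_N$ to the user:

```python
def delta_star(delta_hat, n, b=lambda n: n ** -0.25):
    """Shrinkage estimator: Delta* = Delta_hat if Delta_hat > b_N, else 0.
    The threshold sequence must satisfy b_N -> 0 and b_N * sqrt(N) -> infinity;
    b_N = N**(-1/4) is one hypothetical choice."""
    return delta_hat if delta_hat > b(n) else 0.0

# With N = 10_000 the threshold is b_N = 0.1, so estimates below 0.1 are
# set to zero, artificially restoring superefficiency at Delta = 0.
```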
A second modification relative to IM is that I propose to generalize not $CI^1_\alpha$ but $CI^2_\alpha$. The reason is that without superefficiency, the distortion of nominal size of $CI^1_\alpha$ will persist for large $N$ as $\Delta$ vanishes, and the interval is accordingly expected to be invalid. (Going back to the discussion that motivated $CI^2_\alpha$, the problem is that $\Delta \to 0$ does not any more imply $\sigma_l \to \sigma_u$.) Hence, let $(c_{3l}, c_{3u})$ minimize $c_l + c_u$ subject to the constraints that
$$\Pr\left(-c_l \le z_1,\;\; \hat\rho z_1 + \sqrt{1-\hat\rho^2}\, z_2 \le c_u + \frac{\sqrt{N}\Delta^*}{\hat\sigma_u}\right) \ge 1 - \alpha \qquad (8)$$
$$\Pr\left(-c_l - \frac{\sqrt{N}\Delta^*}{\hat\sigma_l} \le \hat\rho z_1 + \sqrt{1-\hat\rho^2}\, z_2,\;\; z_1 \le c_u\right) \ge 1 - \alpha \qquad (9)$$
where $z_1$ and $z_2$ are independent standard normal random variables. As before, it will typically but not necessarily be the case that both of (8,9) bind at the solution. Finally, define
$$CI^3_\alpha \equiv \left\{\begin{array}{ll} \left[\hat\theta_l - \frac{c_{3l}\hat\sigma_l}{\sqrt{N}},\; \hat\theta_u + \frac{c_{3u}\hat\sigma_u}{\sqrt{N}}\right], & \hat\theta_l - \frac{c_{3l}\hat\sigma_l}{\sqrt{N}} \le \hat\theta_u + \frac{c_{3u}\hat\sigma_u}{\sqrt{N}} \\ \varnothing, & \text{otherwise.} \end{array}\right. \qquad (10)$$
The definition reveals a third modification: If $\hat\theta_u$ is too far below $\hat\theta_l$, then $CI^3_\alpha$ is empty, which can be interpreted as rejection of the maintained assumption that $\theta_u \ge \theta_l$. In other words, $CI^3_\alpha$ embeds a specification test. IM do not consider such a test, presumably for two reasons: It does not arise in their leading application, i.e. estimation of means with missing data, and it is trivial in their framework because superefficiency implies fast learning about $\Delta$ in the critical region where $\Delta \approx 0$. But the issue is substantively interesting in other applications, and is nontrivial when $\hat\Delta < 0$ is a generic possibility. Of course, one could construct a version of $CI^3_\alpha$ that is never empty; one example would be the convex hull of $\left\{\hat\theta_l - c_{3l}\hat\sigma_l/\sqrt{N},\; \hat\theta_u + c_{3u}\hat\sigma_u/\sqrt{N}\right\}$. But realistically, samples where $\hat\theta_u$ is much below $\hat\theta_l$ would lead one to question whether $\theta_u \ge \theta_l$ holds. This motivates the specification test, which does not affect the interval's asymptotic validity. For this section's intended applications, e.g. moment inequalities, such a test seems attractive.
This section's result is the following.

Proposition 3 Let assumption 5 hold. Then
$$\liminf_{N\to\infty}\; \inf_{\theta}\; \inf_{P\in\mathcal{P}:\,\theta_0(P)=\theta} \Pr\left(\theta_0 \in CI^3_\alpha\right) = 1 - \alpha.$$
An intriguing aspect of $CI^3_\alpha$ is that it is analogous to $CI^2_\alpha$, except that it uses $\Delta^*$ and accommodates the resulting possibility that $\hat\theta_l - c_{3l}\hat\sigma_l/\sqrt{N} > \hat\theta_u + c_{3u}\hat\sigma_u/\sqrt{N}$. Together, $CI^2_\alpha$ and $CI^3_\alpha$ therefore provide a unified approach to inference for interval identified parameters: one can switch between the settings with and without superefficiency essentially by substituting $\Delta^*$ for $\hat\Delta$.
Some further remarks on $CI^3_\alpha$ are in order.

• The construction of $\Delta^*$ can be refined in two ways. First, I defined a hard thresholding estimator for simplicity, but making $\Delta^*$ a smooth function of $\hat\Delta$ would also insure validity and presumably improve performance for $\hat\Delta$ close to $b_N$. Second, the sequence $b_N$ is left to adjustment by the user. This adjustment is subject to the following trade-off: The slower $b_N$ vanishes, the less conservative $CI^3_\alpha$ is along local parameter sequences, but the quality of the uniform approximation to $\lim_{N\to\infty} \inf_\theta \inf_{P\in\mathcal{P}:\theta_0(P)=\theta} \Pr\left(\theta_0 \in CI^3_\alpha\right)$ deteriorates, and uniformity breaks down for $b_N = O(N^{-1/2})$. Fine-tuning this trade-off is a possible subject of further research.

• The event that $\Delta^* = 0$ can be interpreted as failure of a pre-test to reject $H_0: \theta_u = \theta_l$, where the size of the pre-test approaches 1 as $N \to \infty$. In this sense, the present approach is similar to the "conservative pre-test" solution to the parameter-on-the-boundary problem given by Andrews (2000, section 4). However, one should not interpret $CI^3_\alpha$ as being based on model selection. If $\theta_u = \theta_l$, then it is efficient to estimate both from the same variance-weighted average of $\hat\theta_u$ and $\hat\theta_l$ and to construct an according Wald confidence region, and a post-model selection confidence region would do just that. Unfortunately, it would be invalid if $\Delta_N = O(N^{-1/2})$. In contrast, $CI^3_\alpha$ employs a shrinkage estimator of $\Delta$ to calibrate cutoff values for implicit hypothesis tests, but not in the subsequent construction of the interval. What's more, even this first step is not easily interpreted as model selection once $\Delta^*$ is smoothed.

Having said that, there is a tight connection between the parameter-on-the-boundary issues encountered here and issues with post-model selection estimators (Leeb and Pötscher 2005), the underlying problem being discontinuity of pointwise limit distributions. See Andrews and Guggenberger (2007) for a more elaborate discussion.

• $CI^3_\alpha$ could be simplified by letting $c_{3l} = c_{3u} = \Phi^{-1}(1-\alpha)$, implying that $CI^3_\alpha = CI^0_{2\alpha}$, whenever $\Delta^* = \hat\Delta$. This would render the interval shorter without affecting its first-order asymptotics, because the transformation of $\hat\Delta$ into $\Delta^*$ suffices to insure uniformity. But it would imply that whenever $\Delta^* = \hat\Delta$, the confidence region ignores the two-sided nature of noncoverage risk and hence has nominal size below $1 - \alpha$ (although approaching $1 - \alpha$ as $N$ grows large). The improvement in interval length is due to this failure of nominal size and therefore spurious.
4 Conclusion
This paper extended Imbens and Manski's (2004) analysis of confidence regions for partially identified parameters. A brief summary of its findings goes as follows. First, I establish that one assumption used for IM's final result boils down to superefficient estimation of a nuisance parameter $\Delta$. This nature of their assumption appears to have gone unnoticed before. The inference problem is then re-analyzed with and without superefficiency. IM's confidence region is found to be valid under conditions that are substantially weaker than theirs. Furthermore, valid inference can be achieved by a different confidence region that is easily adapted to the case without superefficiency, in which case it also embeds a specification test.

A conceptual contribution beyond these findings is to recognize that the gist of the inference problem lies in estimation of $\Delta$, specifically when it is small. This insight allows for rather brief and transparent proofs. More importantly, it connects the present, very specific setting to more general models of partial identification. For example, once the boundary problem has been recognized, analogy to Andrews (2000) suggests that a straightforward normal approximation, as well as a bootstrap, will fail, whereas subsampling might work. Indeed, carefully specified subsampling techniques are known to yield valid inference for parameters identified by moment inequalities, of which the present scenario is a special case (Chernozhukov, Hong, and Tamer 2007; Romano and Shaikh 2006; Andrews and Guggenberger 2007). The bootstrap, on the other hand, does not work in the same setting, unless it is modified in several ways, one of which resembles the trick employed here (Bugni 2007). Against the backdrop of these (subsequent) results, validity of simple normal approximations in IM appears as a puzzle that is now resolved. At the same time, the updated version of these normal approximations has practical value because it provides closed-form inference for many important, if relatively simple, applications.
A Proofs
Lemma 1 The aim is to show that if $\Delta_N \to 0$, then
$$\forall \delta, \epsilon > 0,\; \exists N^*: N \ge N^* \implies \Pr\left(\sqrt{N}\left|\hat\Delta - \Delta_N\right| > \delta\right) < \epsilon.$$
Fix $\delta$ and $\epsilon$. By assumption 1(iii), there exist $\bar{N}$, $v > 0$, and $K$ s.t.
$$N \ge \bar{N} \implies \Pr\left(\sqrt{N}\left|\hat\Delta - \Delta\right| > K\Delta^v\right) < \epsilon$$
uniformly over $\mathcal{P}$. Specifically, the preceding inequality will obtain if $\Delta$ is chosen in $\left(0, \delta^{1/v}K^{-1/v}\right]$, in which case $K\Delta^v \le \delta$. Because $\Delta_N \to 0$, $\tilde{N}$ can be chosen s.t. $N \ge \tilde{N} \implies \Delta_N \le \delta^{1/v}K^{-1/v}$. Hence, the conclusion obtains by choosing $N^* = \max\{\bar{N}, \tilde{N}\}$.
Lemma 2 Parameterize $\theta_0$ as $\theta_0 = \theta_l + a\Delta$ for some $a \in [0,1]$. Then
$$\Pr\left(\theta_0 \in \widetilde{CI}_\alpha\right) = \Pr\left(\hat\theta_l - \frac{\tilde{c}_\alpha\hat\sigma_l}{\sqrt{N}} \le \theta_l + a\Delta \le \hat\theta_l + \Delta + \frac{\tilde{c}_\alpha\hat\sigma_l}{\sqrt{N}}\right)$$
$$= \Pr\left(-\tilde{c}_\alpha\frac{\hat\sigma_l}{\sigma_l} - \frac{\sqrt{N}\Delta}{\sigma_l}(1-a) \le \frac{\sqrt{N}\left(\hat\theta_l - \theta_l\right)}{\sigma_l} \le \tilde{c}_\alpha\frac{\hat\sigma_l}{\sigma_l} + \frac{\sqrt{N}\Delta}{\sigma_l}a\right)$$
$$\to \Phi\left(\tilde{c}_\alpha + \frac{\sqrt{N}\Delta}{\sigma_l}(1-a)\right) + \Phi\left(\tilde{c}_\alpha + \frac{\sqrt{N}\Delta}{\sigma_l}a\right) - 1$$
uniformly over $\mathcal{P}$. Besides uniform asymptotic normality of $\hat\theta_l$, this convergence statement uses that by uniform consistency of $\hat\sigma_l$ in conjunction with the lower bound on $\sigma_l$, $\hat\sigma_l/\sigma_l \to 1$ uniformly, and also that the derivative of the standard normal c.d.f. is uniformly bounded.

Evaluation of derivatives straightforwardly establishes that the last expression in the preceding display is strictly concave in $a$, hence it is minimized at $a \in \{0,1\} \Leftrightarrow \theta_0 \in \{\theta_l, \theta_u\}$. But in those cases, the preceding algebra simplifies to
$$\Pr\left(\theta_l \in \widetilde{CI}_\alpha\right) \to \Phi\left(\frac{\sqrt{N}\Delta}{\sigma_l} + \tilde{c}_\alpha\right) - \Phi\left(-\tilde{c}_\alpha\right) = 1 - \alpha$$
and similarly for $\theta_u$.
Preliminaries to Propositions The following proofs mostly consider sequences $\{P_N\}$ that will be identified with the implied sequences $\{\Delta_N, \theta_N\} \equiv \{\Delta(P_N), \theta_0(P_N)\}$. For ease of notation, I will generally suppress the $N$ subscript on $(\theta_l, \sigma_l, \sigma_u)$ and on estimators. Some algebraic steps treat $(\theta_l, \sigma_l, \sigma_u)$ as constant; this is w.l.o.g. because by compactness implied in part (ii) of every assumption, any sequence $\{P_N\}$ induces a sequence of values $(\theta_l, \sigma_l, \sigma_u)$ with finitely many accumulation points, and the argument can be conducted separately for the according subsequences.

I will show that $\inf_{\{\theta_N\}} \inf_{\{P_N\}:\,\theta_N \in \Theta_0(P_N)} \lim_{N\to\infty} \Pr\left(\theta_N \in CI^i_\alpha\right) \ge 1 - \alpha$, $i = 1, 2, 3$. These are pointwise limits, but because they are taken over sequences of distributions, the propositions are implied. In particular, the limits apply to sequences s.t. $(\theta_N, P_N)$ is least favorable given $\Delta_N$. Proofs present two arguments, one for the case that $\{\Delta_N\}$ is small enough and one for the case that $\{\Delta_N\}$ is large. "Small" and "large" is delimited by $a_N$ in propositions 1 and 2 and by $c_N$, to be defined later, in proposition 3. In either case, any sequence $\{P_N\}$ can be decomposed into two subsequences such that either subsequence is covered by one of the cases.
Proposition 1 Let $\Delta_N \le a_N$, then $\sqrt N\,\big|\hat\Delta - \Delta_N\big| \overset{p}{\to} 0$ by condition (iii). This furthermore implies
that
\[
\sqrt N\big(\hat\theta_u - \theta_u\big) = \sqrt N\big(\hat\theta_l - \theta_l\big) + \sqrt N\big(\hat\Delta - \Delta_N\big),
\]
which in conjunction with conditions (i)-(ii) implies that $\sigma_u = \sigma_l$, hence by (iv) that $\hat\sigma_u - \hat\sigma_l \overset{p}{\to} 0$. It
follows that
\[
c_1 + \frac{\sqrt N\,\hat\Delta}{\max\{\hat\sigma_l, \hat\sigma_u\}} - \left(c_1 + \frac{\sqrt N\,\Delta_N}{\sigma_l}\right) \overset{p}{\to} 0,
\]
but then the argument can be completed as in lemma 2.
Let $\Delta_N > a_N$, then $\sqrt N \Delta_N \to \infty$, hence $\limsup_{N\to\infty} \sqrt N\,(\theta_N - \theta_l) = \infty$ or $\limsup_{N\to\infty} \sqrt N\,(\theta_u - \theta_N) = \infty$ or both. Write
\begin{align*}
\Pr\big(\theta_N \in CI_1\big)
&= \Pr\left(\hat\theta_l - \frac{c_1\hat\sigma_l}{\sqrt N} \le \theta_N \le \hat\theta_u + \frac{c_1\hat\sigma_u}{\sqrt N}\right)\\
&= \Pr\left(-c_1\hat\sigma_l \le \sqrt N\,(\theta_N - \theta_l) + \sqrt N\big(\theta_l - \hat\theta_l\big) \le \sqrt N\,\hat\Delta + c_1\hat\sigma_u\right)\\
&= \Pr\left(-c_1\hat\sigma_l \le \sqrt N\,(\theta_N - \theta_l) + \sqrt N\big(\theta_l - \hat\theta_l\big)\right)\\
&\qquad - \Pr\left(\sqrt N\,(\theta_N - \theta_l) + \sqrt N\big(\theta_l - \hat\theta_l\big) > \sqrt N\,\hat\Delta + c_1\hat\sigma_u\right).
\end{align*}
Assume $\limsup_{N\to\infty} \sqrt N\,(\theta_N - \theta_l) < \infty$. By consistency of $\hat\Delta$, divergence of $\sqrt N \Delta_N$ implies divergence in probability of $\sqrt N\,\hat\Delta$. Thus
\[
\Pr\left(\sqrt N\,(\theta_N - \theta_l) + \sqrt N\big(\theta_l - \hat\theta_l\big) > \sqrt N\,\hat\Delta + c_1\hat\sigma_u\right)
\le \Pr\left(\sqrt N\big(\theta_l - \hat\theta_l\big) > \sqrt N\,\hat\Delta - \sqrt N\,(\theta_N - \theta_l)\right) \to 0,
\]
where the convergence statement uses that $c_1\hat\sigma_u \ge 0$ by construction and that $\sqrt N\big(\theta_l - \hat\theta_l\big)$ converges
to a random variable by assumption. It follows that
\begin{align*}
\lim_{N\to\infty} \Pr\big(\theta_N \in CI_1\big)
&= \lim_{N\to\infty} \Pr\left(-c_1\hat\sigma_l \le \sqrt N\,(\theta_N - \theta_l) + \sqrt N\big(\theta_l - \hat\theta_l\big)\right)\\
&\ge \lim_{N\to\infty} \Pr\left(-c_1\hat\sigma_l \le \sqrt N\big(\theta_l - \hat\theta_l\big)\right)\\
&= 1 - \Phi(-c_1)\\
&\ge 1 - \alpha,
\end{align*}
where the first inequality uses that $\sqrt N\,(\theta_N - \theta_l) \ge 0$, and the second inequality uses the definition of
$c_1$, as well as convergence of $\hat\sigma_l$ and $\sqrt N\big(\theta_l - \hat\theta_l\big)/\sigma_l$.
For any subsequence of $\{P_N\}$ s.t. $\sqrt N\,(\theta_u - \theta_N)$ fails to diverge, the argument is entirely symmetric.
If both diverge, coverage probability trivially converges to 1. To see that a coverage probability of
$1 - \alpha$ can be attained, consider the case of $\Delta = 0$.
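The large-$\Delta$ regime of this proof can be illustrated by simulation. The sketch below is my own construction, not the paper's: it assumes independent normal estimators with known, equal variances and uses the one-sided critical value $z_{1-\alpha}$ that $c_1$ approaches when $\sqrt N \hat\Delta$ diverges, then estimates coverage at $\theta_l$ by Monte Carlo:

```python
import math
import random

def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_quantile(p: float) -> float:
    """Standard normal quantile by bisection on [-10, 10]."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage_at_lower_bound(n=100, delta=1.0, sigma=1.0, alpha=0.05,
                            reps=5000, seed=0):
    """Monte Carlo coverage of [lhat - c*sigma/sqrt(n), uhat + c*sigma/sqrt(n)]
    at theta_l = 0, with theta_u = delta and independent normal estimators."""
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    c = z_quantile(1.0 - alpha)  # one-sided value, relevant when sqrt(n)*delta is large
    hits = 0
    for _ in range(reps):
        lhat = rng.gauss(0.0, sigma / root_n)          # centered at theta_l = 0
        uhat = delta + rng.gauss(0.0, sigma / root_n)  # centered at theta_u = delta
        if lhat - c * sigma / root_n <= 0.0 <= uhat + c * sigma / root_n:
            hits += 1
    return hits / reps
```

With $\sqrt n\,\delta/\sigma = 10$ in the defaults, the upper endpoint virtually never excludes $\theta_l$, so the simulated coverage should be close to $1 - \Phi(-c_1) = 1 - \alpha$, in line with the limit computed in the proof.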
Proposition 2 A short proof uses asymptotic equivalence to $CI_1$, which was essentially shown in
the text. The longer argument below shows why $CI_2$ will generally have exact nominal size and will
also be needed for proposition 3. To begin, let $(\tilde c_l, \tilde c_u)$ fulfil
\begin{align*}
\Pr\left(-\frac{\tilde c_l}{\sigma_l} \le z_1,\; \rho z_1 + \sqrt{1-\rho^2}\,z_2 \le \frac{\tilde c_u + \sqrt N\,\Delta}{\sigma_u}\right) &\ge 1 - \alpha\\
\Pr\left(-\frac{\tilde c_l + \sqrt N\,\Delta}{\sigma_l} \le z_1,\; \rho z_1 + \sqrt{1-\rho^2}\,z_2 \le \frac{\tilde c_u}{\sigma_u}\right) &\ge 1 - \alpha
\end{align*}
and write
\begin{align*}
&\Pr\left(\theta_l \in \left[\hat\theta_l - \frac{\tilde c_l}{\sqrt N},\; \hat\theta_u + \frac{\tilde c_u}{\sqrt N}\right]\right)
= \Pr\left(\hat\theta_l - \frac{\tilde c_l}{\sqrt N} \le \theta_l \le \hat\theta_u + \frac{\tilde c_u}{\sqrt N}\right)\\
&\qquad = \Pr\left(-\frac{\tilde c_l}{\sigma_l} \le \frac{\sqrt N\big(\theta_l - \hat\theta_l\big)}{\sigma_l},\; \frac{\sqrt N\big(\theta_u - \hat\theta_u\big)}{\sigma_u} \le \frac{\tilde c_u + \sqrt N\,\Delta}{\sigma_u}\right)\\
&\qquad \to \Pr\left(-\frac{\tilde c_l}{\sigma_l} \le z_1,\; \rho z_1 + \sqrt{1-\rho^2}\,z_2 \le \frac{\tilde c_u + \sqrt N\,\Delta}{\sigma_u}\right) \tag{11}\\
&\qquad \ge 1 - \alpha. \tag{12}
\end{align*}
Here, the convergence statement uses that by assumption,
\[
\sqrt N\big(\hat\theta_u - \theta_u\big)\,\Big|\; \sqrt N\big(\hat\theta_l - \theta_l\big) \;\overset{d}{\to}\; N\left(\rho\,\frac{\sigma_u}{\sigma_l}\,\sqrt N\big(\hat\theta_l - \theta_l\big),\; \sigma_u^2\big(1 - \rho^2\big)\right)
\]
uniformly; the algebra also uses that $\sigma_u, \sigma_l \ge \underline\sigma > 0$, so that neither can vanish.
As before, for any sequence $\{\Delta_N\}$ s.t. $\Delta_N < a_N$, superefficiency implies that $(c_l^2, c_u^2)$ is consistent
for $(\tilde c_l, \tilde c_u)$, so that validity of $CI_2$ at $\{\theta_l, \theta_u\}$ follows. Convexity of the power function over $[\theta_l, \theta_u]$
follows as before. For $\Delta_N \ge a_N$, the argument entirely resembles proposition 1. Notice finally that (12)
will bind if (6) binds, implying that $CI_2$ will then have exact nominal size at $\theta_l$. A similar argument
applies for $\theta_u$. But $(c_l^2, c_u^2)$ can minimize $(c_l + c_u)$ subject to (6,7) only if at least one of (6,7) binds,
implying that $CI_2$ is nominally exact.
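To see how a pair of critical values of this kind can be computed, here is a stdlib-Python sketch (all names are mine; population values stand in for estimates, with $\sigma_l = \sigma_u = 1$ so the coverage constraints take the bivariate-normal form used above). It evaluates the joint probability by one-dimensional Simpson quadrature and minimizes $c_l + c_u$ by a grid search over $c_l$ with bisection for the smallest feasible $c_u$:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def joint_prob(a, b, rho, n_grid=400):
    """P(-a <= z1, rho*z1 + sqrt(1-rho^2)*z2 <= b) for independent standard
    normal (z1, z2), via composite Simpson quadrature over z1 on [-a, 8]."""
    s = math.sqrt(1.0 - rho * rho)
    lo, hi = -a, 8.0
    h = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        z = lo + i * h
        w = 1.0 if i in (0, n_grid) else (4.0 if i % 2 else 2.0)
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += w * norm_cdf((b - rho * z) / s) * pdf
    return total * h / 3.0

def shortest_pair(delta_term_l, delta_term_u, rho, alpha=0.05):
    """Search for (c_l, c_u) minimizing c_l + c_u subject to both coverage
    constraints; delta_term_* stand in for sqrt(N)*Delta/sigma_*."""
    best = None
    c_l = 1.5
    while c_l <= 2.5:
        def feasible(c_u):
            return (joint_prob(c_l, c_u + delta_term_u, rho) >= 1 - alpha and
                    joint_prob(c_l + delta_term_l, c_u, rho) >= 1 - alpha)
        if feasible(6.0):
            lo_u, hi_u = 0.0, 6.0
            for _ in range(40):  # bisect to the smallest feasible c_u
                mid = 0.5 * (lo_u + hi_u)
                if feasible(mid):
                    hi_u = mid
                else:
                    lo_u = mid
            if best is None or c_l + hi_u < best[0] + best[1]:
                best = (c_l, hi_u)
        c_l += 0.02
    return best
```

In the point-identified benchmark ($\Delta = 0$, $\rho = 0$) the solution is nearly symmetric, with $c_l + c_u$ strictly below twice the naive two-sided value; for large $\sqrt N\Delta$ both cutoffs drop toward the one-sided value, consistent with the asymptotic equivalence to $CI_1$ noted above.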
Proposition 3 Let $c_N \equiv \big(N^{-1/2} b_N\big)^{1/2}$, thus $N^{1/2} c_N = \big(N^{1/2} b_N\big)^{1/2} \to \infty$, and for parameter
sequences s.t. $\Delta_N \ge c_N$, the proof is again as before. For the other case, consider $(\tilde c_l, \tilde c_u)$ as defined
in the previous proof. Convergence of $(c_l^3, c_u^3)$ to $(\tilde c_l, \tilde c_u)$ cannot be claimed. However, by uniform
convergence of estimators and uniform bounds on $(\sigma_l, \sigma_u)$, $\Pr\big(\hat\Delta \le b_N\big)$ is uniformly asymptotically
bounded below by $\Phi\big(\sqrt N\,(b_N - c_N)/(2\overline\sigma)\big) = \Phi\big(\big(N^{1/2} b_N - (N^{1/2} b_N)^{1/2}\big)/(2\overline\sigma)\big) \to 1$. Hence, $\hat\Delta^* =
0$ with probability approaching 1. Expression (11) is easily seen to increase in $\Delta$ for every $(\tilde c_l, \tilde c_u)$,
hence $CI_3$ is valid (if potentially conservative) at $\theta_l$. The argument for $\theta_u$ is similar. (Regarding
pointwise exactness of the interval, notice that $c_N \to 0$, so the conservative distortion vanishes under
pointwise asymptotics.)
Now consider $\theta_0 \equiv a\theta_l + (1-a)\theta_u$, some $a \in [0,1]$. By assumption,
\[
\sqrt N\big(a\hat\theta_l + (1-a)\hat\theta_u - \theta_0\big) \overset{d}{\to} N\big(0, \sigma_a^2\big),
\]
where $\sigma_a^2 \equiv a^2\sigma_l^2 + (1-a)^2\sigma_u^2 + 2a(1-a)\rho\sigma_l\sigma_u$. Let $\sigma_a > 0$, then
\begin{align*}
&\Pr\left(\theta_0 \in \left[\hat\theta_l - \frac{c_l^3}{\sqrt N},\; \hat\theta_u + \frac{c_u^3}{\sqrt N}\right]\right)
= \Pr\left(\hat\theta_l - \frac{c_l^3}{\sqrt N} \le \theta_0 \le \hat\theta_u + \frac{c_u^3}{\sqrt N}\right)\\
&\qquad = \Pr\left(\frac{\sqrt N}{\sigma_a}(1-a)\big(\hat\theta_l - \hat\theta_u\big) - \frac{c_l^3}{\sigma_a} \le \frac{\sqrt N}{\sigma_a}\big(\theta_0 - a\hat\theta_l - (1-a)\hat\theta_u\big) \le \frac{\sqrt N}{\sigma_a}\,a\big(\hat\theta_u - \hat\theta_l\big) + \frac{c_u^3}{\sigma_a}\right)\\
&\qquad = \Pr\left(-\frac{\sqrt N\,(1-a)\hat\Delta}{\sigma_a} - \frac{c_l^3}{\sigma_a} \le \frac{\sqrt N}{\sigma_a}\big(\theta_0 - a\hat\theta_l - (1-a)\hat\theta_u\big) \le \frac{\sqrt N\,a\hat\Delta}{\sigma_a} + \frac{c_u^3}{\sigma_a}\right).
\end{align*}
Consider varying $\Delta$, holding $(\theta_l, \sigma_l, \sigma_u, \rho)$ constant. The cutoff values $c_l$ and $c_u$ depend on $\Delta$ only
through $\hat\Delta^*$, but recall that $\Pr\big(\hat\Delta^* = 0\big) \to 1$. Also, $\frac{\sqrt N}{\sigma_a}\big(\theta_0 - a\hat\theta_l - (1-a)\hat\theta_u\big)$ is asymptotically pivotal. Hence,
the preceding probability's limit depends on $\Delta$ only through $\frac{\sqrt N}{\sigma_a}(1-a)\Delta$ and $\frac{\sqrt N}{\sigma_a}\,a\Delta$. As $\frac{\sqrt N}{\sigma_a} > 0$,
the probability is minimized by setting $\Delta = 0$. In this case, however, $\theta_0 = \theta_l$, for which coverage has
already been established. Finally, $\sigma_a = 0$ only if $a\sigma_l = (1-a)\sigma_u$ and $\rho = -1$, in which case $a\hat\theta_l + (1-a)\hat\theta_u = \theta_N + o_p\big(N^{-1/2}\big)$ for large
enough $N$, and the conclusion follows from lemma 2.
B
Closed-Form Expressions for $(c_l, c_u)$
This appendix provides a closed-form equivalent of (6,7). Specifically, these expressions can be written
as
\begin{align*}
\int_{-\infty}^{c_l/\hat\sigma_l} \Phi\left(\frac{\hat\rho}{\sqrt{1-\hat\rho^2}}\,z + \frac{c_u + \sqrt N\,\hat\Delta}{\hat\sigma_u\sqrt{1-\hat\rho^2}}\right) d\Phi(z) &\ge 1 - \alpha\\
\int_{-\infty}^{c_u/\hat\sigma_u} \Phi\left(\frac{\hat\rho}{\sqrt{1-\hat\rho^2}}\,z + \frac{c_l + \sqrt N\,\hat\Delta}{\hat\sigma_l\sqrt{1-\hat\rho^2}}\right) d\Phi(z) &\ge 1 - \alpha
\end{align*}
if $\hat\rho < 1$ and
\begin{align*}
\Phi\left(\frac{c_u + \sqrt N\,\hat\Delta}{\hat\sigma_u}\right) + \Phi\left(\frac{c_l}{\hat\sigma_l}\right) - 1 &\ge 1 - \alpha\\
\Phi\left(\frac{c_l + \sqrt N\,\hat\Delta}{\hat\sigma_l}\right) + \Phi\left(\frac{c_u}{\hat\sigma_u}\right) - 1 &\ge 1 - \alpha
\end{align*}
if $\hat\rho = 1$. There is no discontinuity at the limit because
\[
\Phi\left(\frac{\hat\rho}{\sqrt{1-\hat\rho^2}}\,z + \frac{c_i + \sqrt N\,\hat\Delta}{\hat\sigma_i\sqrt{1-\hat\rho^2}}\right) \to \mathbf 1\left\{z \ge -\frac{c_i + \sqrt N\,\hat\Delta}{\hat\sigma_i}\right\},\quad i = l, u,
\]
as $\hat\rho \to 1$. It follows that (6,7), and similarly (8,9), can be evaluated without simulation.
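As a numerical check of this appendix's closed form (a sketch with my own function names; Simpson quadrature replaces exact integration), one can evaluate the $\hat\rho < 1$ expression and verify both the independence benchmark and the claimed continuity as $\hat\rho \to 1$:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def constraint_integral(a, u, rho, n_grid=4000):
    """int_{-inf}^{a} Phi( rho*z/sqrt(1-rho^2) + u/sqrt(1-rho^2) ) dPhi(z),
    truncating the lower tail at -8 and using composite Simpson quadrature."""
    s = math.sqrt(1.0 - rho * rho)
    lo, hi = -8.0, a
    h = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        z = lo + i * h
        w = 1.0 if i in (0, n_grid) else (4.0 if i % 2 else 2.0)
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += w * norm_cdf((rho * z + u) / s) * pdf
    return total * h / 3.0
```

At $\hat\rho = 0$ the integral factors into $\Phi(u)\Phi(a)$; near $\hat\rho = 1$ it approaches $\Phi(a) + \Phi(u) - 1$, which is the $\hat\rho = 1$ expression, illustrating the continuity claim.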
References
[1] Andrews, D.W.K. (2000): "Inconsistency of the Bootstrap when a Parameter is on the Boundary of the Parameter Space," Econometrica 68: 399-405.
[2] Andrews, D.W.K. and P. Guggenberger (2007): "Validity of Subsampling and 'Plug-in Asymptotic' Inference for Parameters Defined by Moment Inequalities," mimeo, Cowles Foundation.
[3] Bugni, F.A. (2007): "Bootstrap Inference in Partially Identified Models," mimeo, Northwestern University.
[4] Chernozhukov, V., H. Hong, and E.T. Tamer (2007): "Parameter Set Inference in a Class of Econometric Models," forthcoming in Econometrica.
[5] Haile, P. and E.T. Tamer (2003): "Inference with an Incomplete Model of English Auctions," Journal of Political Economy 111: 1-50.
[6] Honoré, B.E. and E.T. Tamer (2006): "Bounds on Parameters in Panel Dynamic Discrete Choice Models," Econometrica 74: 611-629.
[7] Horowitz, J. and C.F. Manski (2000): "Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data," Journal of the American Statistical Association 95: 77-84.
[8] Imbens, G. and C.F. Manski (2004): "Confidence Intervals for Partially Identified Parameters," Econometrica 72: 1845-1857.
[9] Leeb, H. and B. Pötscher (2005): "Model Selection and Inference: Facts and Fiction," Econometric Theory 21: 21-59.
[10] Manski, C.F. (2003): Partial Identification of Probability Distributions. Springer Verlag.
[11] Pakes, A., J. Porter, K. Ho, and J. Ishii (2006): "Moment Inequalities and their Application," mimeo, Harvard University.
[12] Romano, J.P. and A.M. Shaikh (2006): "Inference for Identifiable Parameters in Partially Identified Econometric Models," mimeo, Stanford University and University of Chicago.
[13] Rosen, A.M. (2006): "Confidence Sets for Partially Identified Parameters that Satisfy a Finite Number of Moment Inequalities," mimeo, University College London.
[14] Woutersen, T. (2006): "A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters," mimeo, Johns Hopkins University.