Massachusetts Institute of Technology
Department of Economics
Working Paper Series

POSTERIOR INFERENCE IN CURVED EXPONENTIAL FAMILIES UNDER INCREASING DIMENSIONS

Alexandre Belloni
Victor Chernozhukov

Working Paper 07
March 1, 2007

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://ssrn.com/abstract=970624

POSTERIOR INFERENCE IN CURVED EXPONENTIAL FAMILIES UNDER INCREASING DIMENSIONS

By Alexandre Belloni*
IBM T.J. Watson Research Center and MIT Operations Research Center
and
By Victor Chernozhukov†
MIT Department of Economics and Operations Research Center

In this work we study the large sample properties of posterior-based inference in the curved exponential family under increasing dimension. The curved structure arises from the imposition of various restrictions, such as moment restrictions, on the model, and plays a fundamental role in various branches of data analysis. We establish conditions under which the posterior distribution is approximately normal, which in turn implies various good properties of estimation and inference procedures based on the posterior. We also discuss the multinomial model with moment restrictions, which arises in a variety of econometric applications. In our analysis, both the parameter dimension and the number of moments are increasing with the sample size.

1. Introduction. The main motivation for this paper is to obtain large sample results for posterior inference in the curved exponential family under increasing dimension. Recall that in the exponential family, the log of a density is linear in parameters $\theta \in \Theta$; in the curved exponential family, these parameters $\theta$ are restricted to lie on a curve $\eta \mapsto \theta(\eta)$ parameterized by a lower dimensional parameter $\eta \in \Psi$. There are many classical examples of densities that fall in the curved exponential family; see for example Efron [8], Lehmann and Casella [14], and Barndorff-Nielsen [1]. Curved exponential densities have also been extensively used in applications [8, 13]. An example of the condition that puts a curved structure onto an exponential family is a moment restriction of the type:

$$\int m(x,\alpha)\, f(x,\theta)\, dx = 0,$$

that restricts $\theta$ to lie on a curve that can be parameterized as $\{\theta(\eta), \eta \in \Psi\}$, where component $\eta = (\alpha, \beta)$ contains $\alpha$ and other parameters $\beta$ that are sufficient to parameterize all parameters $\theta \in \Theta$ that solve the above equation for some $\alpha$. In econometric applications, often moment restrictions represent Euler equations that result from the data $x$ being an outcome of an optimization by rational decision-makers; see e.g. Hansen and Singleton [9], Chamberlain [3], Imbens [11], and Donald, Imbens and Newey [5]. Thus, the curved exponential framework is a fundamental complement to the exponential framework, at least in certain fields of data analysis.

*Research support from a National Science Foundation grant is gratefully acknowledged. IBM Herman Goldstein Fellowship is also gratefully acknowledged.
†Research support from a National Science Foundation grant is gratefully acknowledged. Sloan Foundation Research Fellowship and Castle Krob Chair are also gratefully acknowledged.

Despite the importance of applications to high-dimensional data, theoretical properties of the curved exponential family are not as well understood as the corresponding properties of the exponential family. In this paper, we contribute to the theoretical analysis of posterior inference in curved exponential families of high dimension. We provide sufficient conditions under which consistency and asymptotic normality of the posterior are achieved when both the dimension of the parameter space and the sample size are increasing, i.e. "large." We allow for weak conditions on the priors, and expressly allow for improper priors in particular. We also study the convergence of moments and the precision with which we can estimate them. We then apply these results to the multinomial model with moment restrictions, where both the parameter dimension and the number of moments are increasing with the sample size.

The present analysis of posterior inference in the curved exponential family builds upon the previous work of Ghosal [12], who studied posterior inference in the exponential family under increasing dimension. Under sufficient growth restrictions on the dimension of the model, Ghosal showed that the posterior distributions concentrate in neighborhoods of the true parameter and can be approximated by an appropriate normal distribution. Ghosal's analysis extended in a fundamental way the classical results of Portnoy [17] for maximum likelihood methods for the exponential family with increasing dimensions.

In addition to a detailed treatment of the curved exponential family, we also establish some useful results for exponential families. In fact, we begin our analysis by first revisiting Ghosal's increasing dimension setup for the exponential family. We present several results that complement Ghosal's results in several ways: First, we amend the conditions on priors to allow for a larger set of priors, for example, improper priors; second, we use concentration inequalities for log-concave densities to sharpen the conditions under which the normal approximations apply; and third, we show that the approximation of $\alpha$-th order moments of the posterior by the corresponding moments of the normal density becomes exponentially difficult in the order $\alpha$.

The rest of the paper is organized as follows. In Section 2 we formally define the framework, assumptions, and develop results for the exponential family. In Section 3, the main section, we develop the results for the curved exponential family. In Section 4 we apply our results to the multinomial model with moment restrictions. Appendices C, D, and B collect proofs of the main results and technical lemmas.

2. Exponential Family Revisited. Assume that we have a triangular array of random samples $X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)}$. Assume further that the $X_i^{(n)}$ are independent $d^{(n)}$-dimensional vectors drawn from a $d^{(n)}$-dimensional standard exponential family whose density is defined by

(2.1)  $f(x;\theta^{(n)}) = \exp\left(\langle x, \theta^{(n)}\rangle - \psi^{(n)}(\theta^{(n)})\right),$

where $\theta^{(n)} \in \Theta^{(n)}$, an open convex set of $\mathbb{R}^{d^{(n)}}$, and $\psi^{(n)}$ is the associated normalizing function. Let $\theta_0^{(n)} \in \Theta^{(n)}$ denote the (sequence of) true parameter, which is assumed to be bounded away from the boundary of $\Theta^{(n)}$ (uniformly in $n$). For notational convenience we will suppress the superscript $(n)$, but it is understood that the associated objects are changing with $n$.

Under this framework, the posterior density of $\theta$ given the observed data $\{X_i\}_{i=1}^n$ is defined as

(2.2)  $\pi_n(\theta) \propto \pi(\theta)\prod_{i=1}^n f(X_i;\theta) = \pi(\theta)\exp\left(\left\langle \sum_{i=1}^n X_i, \theta\right\rangle - n\psi(\theta)\right),$

where $\pi(\theta)$ denotes a prior distribution on $\Theta$. As expected, we will need to impose some regularity conditions on the prior $\pi$. These conditions differ from the ones imposed in [12]. Although the same Lipschitz condition is required, we require only a relative lower bound on the value of the prior on the true parameter instead of an absolute bound (see Theorem 1). Finally, our conditions allow for improper priors, in opposition to [12]. In fact, the uninformative prior trivially satisfies our assumptions.

Our results are stated in terms of a re-centered gaussian distribution in the local parameter space. Let $\mu = \psi'(\theta_0)$ and $F = \psi''(\theta_0)$ be the mean and covariance matrix associated with the random variables $\{X_i\}$, and let $J = F^{1/2}$ be its square root (i.e., $JJ^T = F$). The re-centering is defined as

$$\Delta_n := \sqrt{n}\, J^{-1}\left(\frac{1}{n}\sum_{i=1}^n X_i - \mu\right),$$

and it follows that $E[\Delta_n] = 0$ and $E[\Delta_n\Delta_n^T] = I_d$, the identity matrix of appropriate (increasing) dimension $d$. Moreover, the posterior in the local parameter space is defined for $u \in U = \sqrt{n}J(\Theta - \theta_0)$ as

(2.3)  $\pi_n^*(u) \propto \pi_n\left(\theta_0 + n^{-1/2}J^{-1}u\right).$

In the same lines of Portnoy [17] and Ghosal [12], conditions on the growth rates of the third and fourth moments are required. More precisely, the following quantities play an important role in the analysis:

(2.4)  $B_{1n}(c) = \sup\left\{ E_\theta\left[\langle a, V\rangle^3\right] : a \in S^{d-1},\ \|J(\theta - \theta_0)\|^2 \le cd/n \right\},$

(2.5)  $B_{2n}(c) = \sup\left\{ E_\theta\left[\langle a, V\rangle^4\right] : a \in S^{d-1},\ \|J(\theta - \theta_0)\|^2 \le cd/n \right\},$

where $V$ is a random variable distributed as $J^{-1}(U - E_\theta[U])$ and $U$ has density $f(\cdot|\theta)$ as defined in (2.1). Moreover, the following combination of (2.4) and (2.5) is relevant:

(2.6)  $\lambda_n(c) := \frac{1}{6}\left(\sqrt{\frac{cd}{n}}\, B_{1n}(0) + \frac{cd}{n}\, B_{2n}(c)\right),$

where we note that $\lambda_n(c)$ is different (in fact smaller) than the one defined in [12]. This quantity is used to bound deviations from normality of the posterior in a neighborhood of the true parameter.
Next we state the main results of this work.

Theorem 1 For any constant $c > 0$ suppose that:
(i) $B_{1n}(c)\sqrt{d/n} \to 0$;
(ii) $\lambda_n(c)\, d \to 0$;
(iii) $\|F^{-1}\|\, d/n \to 0$;
(iv) the prior density $\pi$ satisfies: $\sup_\theta \ln\left[\pi(\theta)/\pi(\theta_0)\right] \le O(d)$, and
$$|\ln\pi(\theta) - \ln\pi(\theta_0)| \le K_n(c)\|\theta - \theta_0\|$$
for any $\theta$ such that $\|\theta - \theta_0\| \le \sqrt{\|F^{-1}\|cd/n}$. We require $K_n(c)\sqrt{\|F^{-1}\|cd/n} \to 0$.
Then we have asymptotic normality of the posterior density function, that is,
$$\int \left|\pi_n^*(u) - \phi_d(u; \Delta_n, I_d)\right| du \to_p 0.$$

As mentioned earlier, Theorem 1 has different assumptions on the prior than Theorem 3 of [12] has. On the other hand, it does not require the additional technical assumptions used in [12], and the growth condition of $d$ relative to the sample size $n$ is improved by $\ln d$ factors.

In some applications, as discussed in Section B, it might be desired to have stronger convergence properties than simply asymptotic normality. The following theorem provides sufficient conditions for the $\alpha$-moment convergence.

Theorem 2 For some sequence of $\alpha$ and $d \to \infty$, let
$$M_{d,\alpha} := (d+\alpha)\left(1 + \frac{\alpha\ln(d+\alpha)}{d+\alpha}\right).$$
Suppose that the following strengthening of assumptions (ii) and (iv) hold for any fixed $c$:
(ii') $[cM_{d,\alpha}]^{1+\alpha/2}\,\lambda_n(cM_{d,\alpha}/d) \to 0$;
(iv') $K_n(cM_{d,\alpha}/d)\sqrt{\|F^{-1}\|\,[cM_{d,\alpha}]^{1+\alpha}/n} \to 0$.
Then we have

(2.7)  $\int \|u\|^\alpha \left|\pi_n^*(u) - \phi_d(u;\Delta_n, I_d)\right| du \to_p 0.$

We emphasize that Theorem 2 allows for $\alpha$ and $d$ to grow as the sample size increases. Our conditions highlight the polynomial trade off between $n$ and $d$ but an exponential trade off between $n$ and $\alpha$. This suggests that the estimation of higher moments in increasing dimensions applications could be very delicate. Conditions (ii') and (iv') simplify significantly if $\alpha = o(d)$; in such case we have $M_{d,\alpha} \sim d$.

As an illustrative example consider the multinomial distribution application analyzed by Ghosal in [12]. Let $\mathcal{X} = \{x^0, x^1, \ldots, x^d\}$ be the known finite support of a multinomial random variable $X$, where $d$ is allowed to grow with sample size $n$. For each $i$ denote by $p_i$ the probability of the event $\{X = x^i\}$, which is assumed to satisfy $\max_i 1/p_i = O(d)$. The parameter space is given by $\theta = (\theta_1, \ldots, \theta_d)$ where $\theta_i = \log\left(p_i/(1 - \sum_{j=1}^d p_j)\right)$ (under the assumption on the $p_j$'s the true value of the $\theta_j$'s is bounded). The Fisher information matrix is given by $F = P - pp'$ where $P = \mathrm{diag}(p)$. Using a rank-one update formula, we have

(2.8)  $F^{-1} = P^{-1} + \dfrac{\mathbf{1}\mathbf{1}'}{1 - p'P^{-1}p},$

where $\mathbf{1}$ denotes the vector of ones. Therefore we have
$$\|F^{-1}\| \le \mathrm{trace}(F^{-1}) = \sum_{j=1}^d \frac{1}{p_j} + \frac{d}{p_0} \le O(d^2).$$
It is also possible to derive an expression for $J = F^{1/2}$ and its inverse:
$$J = P^{1/2} - \frac{pp'P^{-1/2}}{1 + \sqrt{1 - p'P^{-1}p}} \quad\text{and}\quad J^{-1} = P^{-1/2} + \frac{P^{-1/2}pp'P^{-1}}{1 - p'P^{-1}p + \sqrt{1 - p'P^{-1}p}}.$$
In order to bound $\lambda_n$ we need to bound the third and fourth moments of a random variable which define $B_{1n}$ and $B_{2n}$. Let $a \in S^{d-1}$ and $q$ be distributed as $f(\cdot;\theta)$. We have that
$$\langle a, J^{-1}(q - p)\rangle = \sum_{j=1}^d a_j p_j^{-1/2}(q_j - p_j) + \frac{\sum_{j=1}^d a_j p_j^{1/2}}{p_0 + \sqrt{p_0}}\sum_{j=1}^d (q_j - p_j).$$
Under the assumption on the $p_i$ it can be shown that $B_{1n}(c) = O(d^{3/2})$ and $B_{2n}(c) = O(d^2)$.
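
The rank-one update for $F = P - pp'$ can be sanity-checked numerically. The sketch below is only an illustration under assumptions: the probability vector `p` is hypothetical, the helper names are ours, and the closed form used for `J` is one factor satisfying $JJ' = F$ (it is checked, not taken from any library).

```python
import math

def fisher_information(p):
    """F = diag(p) - p p' for the d non-baseline cell probabilities p."""
    d = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(d)]
            for i in range(d)]

def fisher_inverse(p):
    """Rank-one update: F^{-1} = diag(1/p) + 1 1' / p_0, with p_0 = 1 - sum(p)."""
    d = len(p)
    p0 = 1.0 - sum(p)
    return [[(1.0 / p[i] if i == j else 0.0) + 1.0 / p0 for j in range(d)]
            for i in range(d)]

def square_root_factor(p):
    """Candidate J = P^{1/2} - p p' P^{-1/2} / (1 + sqrt(p_0)), so that J J' = F."""
    d = len(p)
    beta = 1.0 / (1.0 + math.sqrt(1.0 - sum(p)))
    return [[(math.sqrt(p[i]) if i == j else 0.0) - beta * p[i] * math.sqrt(p[j])
             for j in range(d)] for i in range(d)]

def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a list-of-rows matrix."""
    return [list(row) for row in zip(*a)]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical cell probabilities; the baseline cell has p_0 = 0.2.
p = [0.10, 0.25, 0.15, 0.30]
d = len(p)
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
F = fisher_information(p)
F_inv = fisher_inverse(p)
J = square_root_factor(p)

inverse_error = max_abs_diff(matmul(F, F_inv), identity)   # F F^{-1} vs I
factor_error = max_abs_diff(matmul(J, transpose(J)), F)    # J J' vs F
trace_F_inv = sum(F_inv[i][i] for i in range(d))
trace_formula = sum(1.0 / pj for pj in p) + d / (1.0 - sum(p))
```

Both errors should be at the level of floating-point rounding, and the trace of the inverse should agree with $\sum_j 1/p_j + d/p_0$, the quantity used to bound $\|F^{-1}\|$.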

The relations above were derived by Ghosal in [12], where the growth condition that $d^6(\ln d)/n \to 0$ was imposed to obtain the asymptotic normality results (the case of $\alpha = 0$). We relax this growth requirement by combining Ghosal's approach with our analysis and an uninformative (improper) prior. In this case we have $K_n(c) = 0$ and our definition of $\lambda_n$ removes the logarithmic factors. Therefore, Theorem 1 leads to a weaker growth condition in that it only requires that $d^6/n \to 0$. Moreover, the results of Theorem 2.4 of [12] now follow under the weaker growth condition that $d^6/n \to 0$, replacing the previous growth condition that $d^6(\log d)/n \to 0$. For higher moment estimation ($\alpha > 0$), the conditions of Theorem 2 are satisfied with the condition that $d^{6+\alpha+\delta}/n \to 0$ for any strictly positive value of $\delta$.

Suppose next that we are interested in allowing $\alpha$ to grow with the sample size as well. If $d$ is growing at a polynomial rate with respect to $n$, our results do not allow for $\alpha = O(\ln n)$. Some limitation along these lines should be expected since there is an exponential trade off between $\alpha$ and $n$. However, it is definitely possible to let both the dimension and $\alpha$ grow with the sample size if, for instance, we are willing to accept the condition that $\alpha = O(\sqrt{\ln n})$. Such slow growth conditions illustrate the potential limitations for the practical estimation of higher order moments.
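
To see how a growth condition of this kind behaves, the following sketch evaluates $d\,\lambda_n(c)$ along sequences $d = n^\kappa$. It is an illustration under stated assumptions only: it takes $\lambda_n(c) = \frac{1}{6}\left(\sqrt{cd/n}\,B_{1n}(0) + (cd/n)\,B_{2n}(c)\right)$ and plugs in the hypothetical exact rates $B_{1n} = d^{3/2}$ and $B_{2n} = d^2$ (the text only gives these as order bounds); the function names are ours.

```python
import math

def lambda_n(c, d, n, b1, b2):
    """lambda_n(c) = (1/6) * (sqrt(c d / n) * B_1n(0) + (c d / n) * B_2n(c))."""
    return (math.sqrt(c * d / n) * b1 + (c * d / n) * b2) / 6.0

def condition_ii(n, kappa, c=1.0):
    """d * lambda_n(c) for d = n**kappa, with assumed B_1n = d**1.5 and B_2n = d**2."""
    d = n ** kappa
    return d * lambda_n(c, d, n, d ** 1.5, d ** 2)

sample_sizes = [10 ** k for k in range(3, 10)]
# kappa = 1/7: d^6 / n -> 0, so d * lambda_n(c) should shrink with n.
slow_growth = [condition_ii(n, 1.0 / 7.0) for n in sample_sizes]
# kappa = 1/5: d^6 / n -> infinity, so d * lambda_n(c) should grow with n.
fast_growth = [condition_ii(n, 1.0 / 5.0) for n in sample_sizes]
```

Under these assumed rates the leading term of $d\,\lambda_n(c)$ is $d^3/\sqrt{n}$, which is why the threshold sits at $d^6/n$.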

3. Curved Exponential Family. Next we consider the case of a curved exponential family. Being a generalization of the canonical exponential family, its analysis has many similarities with the previous setup.

Let $X_1, X_2, \ldots, X_n$ be iid observations from a $d$-dimensional curved exponential family whose density function is given by

$$f(x;\theta) = \exp\left(\langle x, \theta(\eta)\rangle - \psi_n(\theta(\eta))\right),$$

where $\eta \in \Psi \subset \mathbb{R}^{d_1}$, $\theta : \Psi \to \Theta$, an open subset of $\mathbb{R}^d$, and $d \to \infty$ as $n \to \infty$ as before. In this section we assume that $J = I_d$ for notational convenience.

The parameter of interest is $\eta$, whose true value $\eta_0$ lies in the interior of the set $\Psi \subset \mathbb{R}^{d_1}$. The true value of $\theta$ induced by $\eta_0$ is given by $\theta_0 = \theta(\eta_0)$. The mapping $\eta \mapsto \theta(\eta)$ takes values from $\mathbb{R}^{d_1}$ to $\mathbb{R}^d$ where $c\cdot d \le d_1 \le d$, for some $c > 0$. Moreover, assume that $\eta_0$ is the unique solution to the system

$$\theta(\eta) = \theta_0.$$

Thus, the parameter $\theta$ corresponds to a high-dimensional linear parametrization of the log-density, and $\eta$ describes the lower-dimensional parametrization of the density of interest. We require the following regularity conditions on the mapping $\theta(\cdot)$.

Assumption A. For every $\kappa$, and uniformly in $\gamma \in B(0, \kappa\sqrt{d})$, there exists a linear operator $G : \mathbb{R}^{d_1} \to \mathbb{R}^d$ such that $G'G$ has eigenvalues bounded from above and away from zero, and for every $n$

(3.9)  $\sqrt{n}\left(\theta(\eta_0 + \gamma/\sqrt{n}) - \theta(\eta_0)\right) = r_{1n} + (I + R_{2n})G\gamma,$

where $\|r_{1n}\| \le \delta_{1n}$ and $\|R_{2n}\| \le \delta_{2n}$. Moreover, those coefficients are such that

(3.10)  $\delta_{1n}\, d^{1/2} \to 0$ and $\delta_{2n}\, d \to 0.$

Assumption B. There exists a strictly positive constant $\varepsilon_0$ such that for every $\eta \in \Psi$ (uniformly on $n$) we have

(3.11)  $\|\theta(\eta) - \theta(\eta_0)\| \ge \varepsilon_0\|\eta - \eta_0\|.$

Thus the mapping $\eta \mapsto \theta(\eta)$ is allowed to be nonlinear and discontinuous. For example, the additional condition of $\delta_{1n} = 0$ implies the continuity of the mapping in a neighborhood of $\eta_0$. More generally, condition (3.10) does impose that the map admits an approximate linearization in the neighborhood of $\eta_0$ whose quality is controlled by the errors $\delta_{1n}$ and $\delta_{2n}$. An example of a kind of map allowed in this framework is given in the figure.
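
The role of the linearization error in Assumption A can be illustrated on a toy map. The sketch below is a hypothetical example, not taken from the paper: it uses the smooth curve $\theta(\eta) = (\eta, \eta^2)$ with $d_1 = 1$, $d = 2$, takes $G$ to be the Jacobian at $\eta_0$, sets $R_{2n} = 0$, and computes the norm of the remainder $r_{1n} = \sqrt{n}\left(\theta(\eta_0 + \gamma/\sqrt{n}) - \theta(\eta_0)\right) - G\gamma$.

```python
import math

def theta(eta):
    """Hypothetical smooth curve eta -> (eta, eta**2) embedding R into R^2."""
    return (eta, eta * eta)

def linearization_error(eta0, gamma, n):
    """Norm of r_1n = sqrt(n) * (theta(eta0 + gamma/sqrt(n)) - theta(eta0)) - G * gamma."""
    G = (1.0, 2.0 * eta0)                      # Jacobian of theta at eta0
    step = gamma / math.sqrt(n)
    shifted, base = theta(eta0 + step), theta(eta0)
    r = [math.sqrt(n) * (a - b) - g * gamma for a, b, g in zip(shifted, base, G)]
    return math.hypot(r[0], r[1])

eta0, gamma = 0.5, 2.0
sizes = (10 ** 2, 10 ** 4, 10 ** 6)
errors = [linearization_error(eta0, gamma, n) for n in sizes]
# For this quadratic curve the remainder is exactly gamma**2 / sqrt(n).
exact = [gamma ** 2 / math.sqrt(n) for n in sizes]
```

For this particular curve the remainder shrinks at rate $n^{-1/2}$ for fixed $\gamma$, which is the kind of behavior that the bound $\|r_{1n}\| \le \delta_{1n}$ is meant to capture.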

Fig 1. This figure illustrates the mapping $\theta(\cdot)$. The (discontinuous) solid line is the mapping while the dash line represents the linear map induced by $G$. The dash-dot line represents the deviation band controlled by $r_{1n}$ and $R_{2n}$.

Again, given a prior $\pi$ on $\Theta$, the posterior of $\eta$ given the data is denoted by

$$\pi_n(\eta) \propto \pi(\theta(\eta))\prod_{i=1}^n f(X_i;\eta) = \pi(\theta(\eta))\exp\left(n\langle \bar{X}, \theta(\eta)\rangle - n\psi(\theta(\eta))\right),$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$.

Under this framework, we also define the local parameter space to describe contiguous deviations from the true parameter as
$$\gamma = \sqrt{n}(\eta - \eta_0), \quad\text{and let}\quad s = (G'G)^{-1}G'\sqrt{n}(\bar{X} - \mu)$$
(once more) be a first order approximation to the normalized maximum likelihood/extremum estimate. Again, similar bounds hold for $s$: $E[s] = 0$, $E[ss^T] = (G'G)^{-1}$, and $\|s\| = O_p(\sqrt{d})$. The posterior density of $\gamma$ over $\Gamma$, where $\Gamma = \sqrt{n}(\Psi - \eta_0)$, is
$$\pi_n^*(\gamma) = \frac{\pi(\theta(\eta_0 + n^{-1/2}\gamma))\,\ell(\gamma)}{\int_\Gamma \pi(\theta(\eta_0 + n^{-1/2}\gamma))\,\ell(\gamma)\,d\gamma},$$
where

(3.12)  $\ell(\gamma) = \exp\left(n\left\langle \bar{X}, \theta(\eta_0 + n^{-1/2}\gamma) - \theta(\eta_0)\right\rangle - n\left[\psi(\theta(\eta_0 + n^{-1/2}\gamma)) - \psi(\theta(\eta_0))\right]\right).$

By construction we have $\ell(\gamma) = Z_n(u_\gamma)$ with $u_\gamma := \sqrt{n}\left(\theta(\eta_0 + n^{-1/2}\gamma) - \theta(\eta_0)\right)$, where $u_\gamma \in U \subset \mathbb{R}^d$.

Next we first show that tails have small mass outside a $\sqrt{d}$-neighborhood in $\Gamma$. We also need an additional condition on $a_n$ as defined in (B.18) and repeated here for the reader's convenience:

$$a_n = \sup\{c : \lambda_n(c) < 1/16\}.$$

Therefore, using Lemma 2, in a neighborhood of size $\sqrt{a_n d}$ we can still bound $Z_n$ by above with a proper gaussian. In the next lemma it is required that $\log d = o(a_n)$, which is a substantially weaker condition than the one used in [12] for establishing asymptotic normality for the posterior of (regular) exponential densities, $\lambda_n(c\log d)\,d = o(1)$.

Lemma 1 Assume that (i), (ii), (iii), and (iv) hold. In addition, suppose that $\log d = o(a_n)$. Then, for some constant $k$ independent of $d$ and $d_1$, we have

$$\int_{\Gamma\setminus B(0,k\sqrt{d})} \pi\left(\theta(\eta_0) + n^{-1/2}u_\gamma\right)Z_n(u_\gamma)\,d\gamma \le o\left(\int_{\Gamma} \pi\left(\theta(\eta_0) + n^{-1/2}u_\gamma\right)Z_n(u_\gamma)\,d\gamma\right).$$

Comment 3.1 The only assumption made on $d_1$ in the previous lemma was that $d_1 \le d$. If $d_1\log d = o(d)$ the proof simplifies significantly (there is no need to define region (II)).

Next we address the consistency question for the maximum likelihood estimator associated with the curved exponential family.

Theorem 3 In addition to Assumptions A and B, suppose that $a_n \to \infty$, and (iv) hold. Then the maximum likelihood estimator $\hat\eta$ satisfies

$$\|\hat\eta - \eta_0\| = O\left(\sqrt{d/n}\right).$$

Two remarks regarding Theorem 3 are worth mentioning. First, a sufficient condition for $a_n \to \infty$ is simply $\lambda_n(c) \to 0$, stronger than the condition $\sqrt{d/n}\,B_{1n}(c) \to 0$ needed for consistency for the exponential case obtained by Ghosal in [12]. Second, our consistency result relies on the dimension of the larger model $d$.

Finally, we can state the asymptotic normality result for the curved exponential family.

Theorem 4 Suppose that Assumptions A, B, (i), (ii), (iii), and (iv) hold. In addition, suppose that $\log d = o(a_n)$. Then, asymptotic normality for the posterior density associated with the curved exponential family holds,

$$\int_\Gamma \left|\pi_n^*(\gamma) - \phi_{d_1}\left(\gamma; s, (G'G)^{-1}\right)\right| d\gamma \to_p 0.$$

4. Multinomial Model with Moment Restrictions. In this section we provide a high-level discussion of the multinomial model with moment restrictions. Let $\mathcal{X} = \{x^0, x^1, x^2, \ldots, x^d\}$ be the known finite support of a multinomial random variable $X$ which was described in Section 2. Conditions (i)-(iv) were verified in the same section.

As discussed in the introduction, it is of interest to incorporate moment restrictions into this model; see Imbens [11] for a discussion. This will lead to a curved exponential model as studied in Section 3.

The parameter of interest is $\eta \in \Psi \subset \mathbb{R}^{d_1}$, a compact set. Consider a (twice continuously differentiable) vector-valued moment function $m : \mathcal{X} \times \Psi \to \mathbb{R}^M$ such that

$$E[m(X,\eta)] = 0 \quad\text{for a unique } \eta_0 \in \Psi.$$

The case of interest consists of the cardinality $d$ of the support $\mathcal{X}$ being larger than the number of moment conditions $M$, which in turn is larger than the dimension $d_1$ of the parameter of interest $\eta$. The log-likelihood function associated with this model is

(4.13)  $l(q,\eta) = \sum_{i=1}^n\sum_{j=0}^d I\{X_i = x^j\}\ln q_j,$

for $q$ and $\eta$ such that $\sum_{j=0}^d q_j\, m(x^j,\eta) = 0$, $\sum_{j=0}^d q_j = 1$, $q \ge 0$, and $l(q,\eta) = -\infty$ if $(q,\eta)$ violates any of the moments conditions. This log-likelihood function induces the mapping $q : \Psi \to \Delta^{d}$ formally defined as

(4.14)  $q(\eta) = \arg\max_q\, l(q,\eta).$

As discussed in Section 2, the function $\theta_j(\eta) = \log\left(q_j(\eta)/q_0(\eta)\right)$ (for $j = 1, \ldots, d$) is the natural $\theta(\cdot)$ mapping. Assuming that the matrix $E[m(X,\eta)m(X,\eta)']$ is uniformly positive definite over $\eta$, Qin and Lawless [18] use the inverse function theorem to show that $q$ is a twice continuously differentiable mapping of $\eta$ in a neighborhood of $\eta_0$. In particular this implies that Assumption A holds with $\delta_{2n} = 0$ and $\delta_{1n} = O(d\,d_1(d/n))$. It suffices to have $d^{3.5}/n \to 0$.

In order to verify Assumption B, we use that the parameter $\eta$ belongs in a compact set $\Psi$, and assume that the mapping $\theta(\cdot)$ is injective (over a set that contains $\Psi$ in its interior). We refer to Newey and McFadden [16] for a discussion of primitive assumptions for identification with moment restrictions.

APPENDIX A: NOTATION

For $a, b \in \mathbb{R}^d$, their (Euclidean) inner product is denoted by $\langle a, b\rangle$, and $\|a\| = \sqrt{\langle a, a\rangle}$. The unit sphere in $\mathbb{R}^d$ is denoted by $S^{d-1} = \{v \in \mathbb{R}^d : \|v\| = 1\}$. For a linear operator $A$, the operator norm is denoted by $\|A\| = \sup\{\|Aa\| : \|a\| = 1\}$. Let $\phi_d(\cdot;\mu,V)$ denote the $d$-dimensional gaussian density function with mean $\mu$ and covariance matrix $V$.

APPENDIX B: TECHNICAL RESULTS

In this section we prove the technical lemmas needed to prove our main result in the following section. Our exposition follows the work of Ghosal [12]. For the sake of completeness we include Proposition 1, which can be found in Portnoy [17], and a specialized version of Lemma 1 of Ghosal [12]. All the remaining proofs use different techniques and weaker assumptions, which leads to a sharper analysis. In particular, we no longer require the prior to be proper, no bounds on the growth of $\det(\psi''(\theta_0))$ are imposed, and $\ln n$ and $\ln d$ do not need to be of the same order.

As mentioned earlier we follow the notation in Ghosal [12]; for $u \in \mathbb{R}^d$ let

(B.15)  $\tilde{Z}_n(u) = \exp\left(\langle u, \Delta_n\rangle - \tfrac{1}{2}\|u\|^2\right)$

and (if $\theta_0 + n^{-1/2}J^{-1}u \in \Theta$)

(B.16)  $Z_n(u) = \exp\left(\sqrt{n}\,\langle \bar{X}, J^{-1}u\rangle - n\left[\psi(\theta_0 + n^{-1/2}J^{-1}u) - \psi(\theta_0)\right]\right),$

otherwise let $Z_n(u) = \tilde{Z}_n(u) = 0$. The quantity (B.16) denotes the likelihood ratio associated with $f$ as a function of $u$. In a parallel manner, (B.15) is associated with a standard gaussian density.

We start by recalling a result on the Taylor expansion of $\psi$ which is key to controlling deviations between $\tilde{Z}(u)$ and $Z(u)$.

Proposition 1 (Portnoy [17]) Let $\psi'$ and $\psi''$ denote respectively the gradient and the Hessian of $\psi$. For any $\theta, \theta_0 \in \Theta$, there exists $\tilde\theta = \lambda\theta + (1-\lambda)\theta_0$, for some $\lambda \in [0,1]$, such that

(B.17)  $\psi(\theta) = \psi(\theta_0) + \langle\psi'(\theta_0), \theta - \theta_0\rangle + \tfrac{1}{2}\langle\theta - \theta_0, \psi''(\theta_0)(\theta - \theta_0)\rangle + \tfrac{1}{6}E_{\theta_0}\left[\langle\theta - \theta_0, U\rangle^3\right] + \tfrac{1}{24}\left\{E_{\tilde\theta}\left[\langle\theta - \theta_0, U\rangle^4\right] - 3\left(E_{\tilde\theta}\left[\langle\theta - \theta_0, U\rangle^2\right]\right)^2\right\},$

where $E_\theta[g(U)]$ denotes the expectation of $g(V - E[V])$ and $V \sim f(\cdot|\theta)$.

Based on Proposition 1 we control the pointwise deviation between $Z_n$ and $\tilde{Z}_n$ in a neighborhood of zero (i.e., in a neighborhood of the true parameter).

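
The structure of the expansion in Proposition 1 can be checked in a one-dimensional case. The sketch below is an illustration under assumptions: it uses the Poisson family in its natural parameter, where $\psi(\theta) = e^\theta$ and every cumulant equals $e^\theta$, and the two parameter values are hypothetical. It verifies that the remainder after the third-order term equals $\tfrac{1}{24}e^{\tilde\theta}(\theta - \theta_0)^4$ for some $\tilde\theta$ between $\theta_0$ and $\theta$.

```python
import math

def psi(theta):
    """Log-partition function of the Poisson family in its natural parameter."""
    return math.exp(theta)

theta0, theta1 = 0.3, 1.1          # hypothetical base point and evaluation point
h = theta1 - theta0

# For the Poisson family psi' = psi'' = third central moment = fourth cumulant
# = exp(theta), so the expansion up to third order at theta0 is:
third_order = psi(theta0) * (1.0 + h + h ** 2 / 2.0 + h ** 3 / 6.0)
remainder = psi(theta1) - third_order

# The fourth-order term (1/24) * exp(theta_tilde) * h^4 must lie between its
# values at the two endpoints of the segment [theta0, theta1].
lower = math.exp(min(theta0, theta1)) * h ** 4 / 24.0
upper = math.exp(max(theta0, theta1)) * h ** 4 / 24.0

# Solve (1/24) * exp(theta_tilde) * h^4 = remainder for the intermediate point.
theta_tilde = math.log(24.0 * remainder / h ** 4)
```

The recovered `theta_tilde` plays the role of $\tilde\theta = \lambda\theta + (1-\lambda)\theta_0$ in the proposition.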
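Lemma 2 (Essentially in Ghosal [12] or Portnoy [17]) For all $u$ such that $\|u\| \le \sqrt{cd}$, we have

$$|\ln Z_n(u) - \ln\tilde{Z}_n(u)| \le \lambda_n(c)\|u\|^2 \quad\text{and}\quad \ln Z_n(u) \le \langle\Delta_n, u\rangle - \tfrac{1}{2}\|u\|^2(1 - 2\lambda_n(c)).$$

Proof. Under our definitions,
$$(I) := |\ln Z_n(u) - \ln\tilde{Z}_n(u)| = n\left|\psi(\theta_0 + n^{-1/2}J^{-1}u) - \psi(\theta_0) - \langle\psi'(\theta_0), n^{-1/2}J^{-1}u\rangle - \tfrac{1}{2}\langle n^{-1/2}J^{-1}u, \psi''(\theta_0)\,n^{-1/2}J^{-1}u\rangle\right|.$$
Using Proposition 1 we have that $(I)$ is bounded above by
$$(I) \le n\left|\tfrac{1}{6}E_{\theta_0}\left[\langle n^{-1/2}u, V\rangle^3\right] + \tfrac{1}{24}\left\{E_{\tilde\theta}\left[\langle n^{-1/2}u, V\rangle^4\right] - 3\left(E_{\tilde\theta}\left[\langle n^{-1/2}u, V\rangle^2\right]\right)^2\right\}\right|$$
$$\le \tfrac{1}{6}\left(n^{-1/2}\|u\|^3B_{1n}(0) + n^{-1}\|u\|^4B_{2n}(c)\right) \le \lambda_n(c)\|u\|^2.$$
The second inequality follows directly from the first result.
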
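Next we show how to bound the integrated deviation between the quantities in (B.15) and (B.16) restricted to the neighborhood of zero.

Lemma 3 For any $c > 0$ we have
$$\int_{\{u : \|u\| \le \sqrt{cd}\}} |Z_n(u) - \tilde{Z}_n(u)|\,du \le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}\int\tilde{Z}_n(u)\,du.$$

Proof. Using $|e^x - e^y| \le |x - y|\max\{e^x, e^y\}$ and Lemma 2 (since $\|u\| \le \sqrt{cd}$) we have
$$|Z_n(u) - \tilde{Z}_n(u)| \le |\ln Z_n(u) - \ln\tilde{Z}_n(u)|\exp\left(\langle\Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\right) \le \lambda_n(c)\|u\|^2\exp\left(\langle\Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\right).$$
By integrating over the set $H(\sqrt{cd}) = \{u : \|u\| \le \sqrt{cd}\}$ we obtain
$$\int_{H(\sqrt{cd})}|Z_n(u) - \tilde{Z}_n(u)|\,du \le \int_{H(\sqrt{cd})}\lambda_n(c)\|u\|^2\exp\left(\langle\Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\right)du$$
$$\le cd\,\lambda_n(c)\int_{H(\sqrt{cd})}\exp\left(\langle\Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\right)du$$
$$\le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}\int_{H(\sqrt{cd})}\exp\left(\langle\Delta_n, u\rangle - \tfrac{1}{2}\|u\|^2\right)du$$
$$\le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}\int\tilde{Z}_n(u)\,du.$$

The next lemma controls the tail of $Z_n$ relative to $\tilde{Z}_n$. In order to achieve that, it makes use of a concentration inequality for log-concave density functions developed by Lovasz and Vempala in [15]. The lemma is stated with a given bound on the norm of $\Delta_n$, which is allowed to grow with the dimension. Such a bound on $\Delta_n$ can be easily obtained with probability arbitrarily close to one by standard arguments.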
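
The elementary inequality $|e^x - e^y| \le |x - y|\max\{e^x, e^y\}$ used in the proof of Lemma 3 follows from the mean value theorem. As a quick numerical sanity check (a sketch only; the grid and the tolerance are arbitrary choices of ours), one can verify it on a grid of points:

```python
import math

def exp_gap_bound_holds(x, y, tol=1e-12):
    """Check |e^x - e^y| <= |x - y| * max(e^x, e^y), allowing for rounding error."""
    lhs = abs(math.exp(x) - math.exp(y))
    rhs = abs(x - y) * max(math.exp(x), math.exp(y))
    return lhs <= rhs * (1.0 + tol) + tol

# Grid of points in [-10, 10] with spacing 0.25.
grid = [i / 4.0 for i in range(-40, 41)]
all_ok = all(exp_gap_bound_holds(x, y) for x in grid for y in grid)
```

In the proof the inequality is applied with $x = \ln Z_n(u)$ and $y = \ln\tilde{Z}_n(u)$, so that the difference of the two likelihood-type quantities is controlled by the difference of their logarithms.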
POSTERIOR INFERENCE
Lemma
k
>
4 Suppose
we have
1
I{u:\\u\\>kV^}
CURVED EXPONENTIAL
IN
< Cid and
that ||An,p
n{eo+n~^^^J-^u)Zn{u)du<
w/tereo 16max{4Ci,
A„(c)
(^supTv{u))
:
<
\\u\\
a} and
7r(^o+n""^''V"'M)Z„(u)dw<
7r(eo+n"^/2j-^u)
<
<
<
InZniu)
Under our assumptions we
e^'^i'
'
Z„,(u)du.
/
for
any u 6 H{k\/cd)'^ we have
-/cd(i-=%^/cc- v'Hcr)
Iiave
<
<
<
<
~
V
"
16 and assuming A„(c)
SHihVTdY Zn{u)du
''
j
(A„,u)-i||uf(l-2A„(c))
fc\/^v'C^-|fc2cd(l-2A„(c))
2
>
Z„(u)dM
JHikV^Y
H(fc\/S)=
Next note that A^ G H{kvcd). Moreover,
some u := /cv^w/|juj| such that
since c
/"
Te^''^''*^)
complement by H{aY. Then
its
sup
H(kV3r
Then for every
1/16.
-;"''
1/(1 -2A„(c))}.
Proof. Define H{a) := {u
we have
<
13
1/4.
\2
Using
Lemma
5.16 of
[15]
we have
{e^-^cf-'^ J Zn{u)du
2(ei-"c)^-i//f(v^)^n(«)du
2(ei-^c)^-ie'=P^''W/^(^)Z„(w)du
where we used that J Zn{u)du < 2/^/
/^>
Zn{u)du (note that k does not
appear).
>
Since c
A„(c)
^
We
16,
we have
and l/d
^
0,
77i
7?2
:= c
-»
f
Ml —
- 1 >
l/d)g
—
A„(c)
j
—
is
satisfied. In fact,
we expect
lemma could depend on n
we can have c as large as
a„ :=sup{c: A„(c)
(B.18)
(since
f).
note that the value of c in the previous
long as the condition
1
<
as
1/16}.
(B.18) characterizes a neighborhood of size $\sqrt{a_n d}$ on which the quantity $Z_n(\cdot)$ can still be bounded by a proper gaussian. Lemma 4 bounds the contribution outside this neighborhood.

We close this section with a technical lemma for bounding the difference between the expectation of a function with respect to two probability densities.

Lemma 5 Let $f$ and $g$ be two nonnegative integrable functions in $\mathbb{R}^d$ and define $I_f = \int f(u)\,du$ and $I_g = \int g(u)\,du$. Moreover, let $h$ be a third positive function and $A \subset \mathbb{R}^d$ be a set such that $h_f = \int_A h(u)f(u)\,du < \infty$. Then
$$\int_A h(u)\left|\frac{f(u)}{I_f} - \frac{g(u)}{I_g}\right|du \le \frac{\max_{u\in A}h(u) + h_f/I_f}{I_g}\int|f(u) - g(u)|\,du.$$

Proof. Simply note that
$$\int_A h(u)\left|\frac{f(u)}{I_f} - \frac{g(u)}{I_g}\right|du \le \int_A h(u)\left|\frac{f(u)}{I_f} - \frac{f(u)}{I_g}\right|du + \int_A h(u)\left|\frac{f(u)}{I_g} - \frac{g(u)}{I_g}\right|du$$
$$= \frac{|I_g - I_f|}{I_fI_g}\int_A h(u)f(u)\,du + \frac{1}{I_g}\int_A h(u)|f(u) - g(u)|\,du$$
$$\le \frac{h_f}{I_f}\cdot\frac{1}{I_g}\int|f(u) - g(u)|\,du + \frac{\max_{u\in A}h(u)}{I_g}\int_A|f(u) - g(u)|\,du.$$

APPENDIX C: PROOF OF THEOREMS 1 AND 2

Armed with Lemmas 2, 3, 4, and 5, we now show asymptotic normality and moments convergence results (respectively Theorems 1 and 2) under the appropriate growth conditions of the dimension of the parameter space with respect to the sample size.

It is easy to see that Theorem 1 follows from Theorem 2 with $\alpha = 0$; therefore its proof is omitted.

Proof of Theorem 2. Let $M_{d,\alpha} = (d+\alpha)\left(1 + \frac{\alpha\ln(d+\alpha)}{d+\alpha}\right)$. In the case that $\alpha$ is constant and $d$ grows to infinity, this simplifies to a multiple of $d$. We will be using that $\sqrt{cM_{d,\alpha}} \ge 4\|\Delta_n\|$ in the analysis (recall that $\|\Delta_n\| = O(\sqrt{d})$), where $c$ is a fixed constant.

We will divide the integral of (2.7) in two regions, $A = \{u \in \mathbb{R}^d : \|u\| \le \sqrt{cM_{d,\alpha}}\}$ and $A^c$. Thus we have

(C.19)  $\int\|u\|^\alpha|\pi_n^*(u) - \phi_d(u;\Delta_n, I_d)|\,du \le \int_A\|u\|^\alpha|\pi_n^*(u) - \phi_d(u;\Delta_n, I_d)|\,du + \int_{A^c}\|u\|^\alpha|\pi_n^*(u) - \phi_d(u;\Delta_n, I_d)|\,du.$

To bound the first term, we will use Lemma 5 with $h(u) = \|u\|^\alpha$ and $A$ as defined above. In this case, we have $h_f/I_f \le \max_{u\in A}h(u) = c^{\alpha/2}M_{d,\alpha}^{\alpha/2}$.
Therefore
/a ii"ri<(
2c'r-Mli'
I
7r{eo)Z„{u)du '/^'
^
"
- n(eo)Zr,{u)\dv
n-i/2j-iu)z„(u)
n\
u/ nv ;i
J
V
I
'"-''
7^(60)
+
;
To bound the very
^
,
+
JZn(u)du
I
fT /a l^n(w) - Zn{u)\du
last
term we apply
Lemma
with
3
Cd^a
=
cMd^a/d to
obtain
Zn{u)du Jk
J
which converges
On
to zero
assumption
over,
under our assumption
the other hand, the
n-Q/2
{iv') also
[ii').
bounded by assumption [iv'). Moreensures that the term converges to zero as follows
first
9o
l/rO:/2
2c°''-M^^ sup
term
is
+ n-i/2j-iu) -
1
/a Zn{u)du
Kn{ci,^y
-1-^0.
2c«/2i\//"''^e'^'*''<'''>^"('^'''°'
The second term
^
^
of (C.19)
is
<
J Zn{u)du
^(^0)
bounded above by
'\u\\''Z„(u)du+-
1
/
||ii||"7r(6'o+n-'/-J-^u)Z„(u)du.
JniOo)Z^{u)duJA
J Zn(u)du
JA
The
term above converges to zero by standard bounds on gaussian
first
densities for an appropriate choice of the constant c (note that c can be
chosen independently of d and a).
Finally, we bound
Thus we have
the last term. Let A^ := [u
:
\\u\\
€
[k
^cM^^a
,
(^
+
l)\/c7Wd^
(C.20)
/
\\u\\''n{eQ+n-^^^J-\:}Z„(u)du<'S2ik+'^)°'^'^^^^'^da
f Tr{eo+n-^/~J-hi)Z,,iu)du
}.
Using
Lemma
4 for each integral we have
f Tx(ea+n-^'^J-\i)Zn{u)du<
Since M^^q
(^sup7r(u)) ("e^^''^-
> max{l,Q} we have
'''"('=""'
j Zn{ii)chu\
e''''^'^-''''/^
'
oo
7:=i
by choosing
c large enough. Moreover, our definition of M^^a. also implies
that
c"/2M°/2e-cM,,,/io
provided that c
We
is
^
g^p I'^^i^ g ^
1^^
j,^_^^)
_ cM, jio") <
exp (-cA/rd,„/20)
large enough.
have that (C.20) can be bounded above by
(sup^(u))
U^^^'-^-i^'--')
I
Zn{u)dv\
e-'^^o--'^^
$= o\left(\pi(\theta_0)\int\tilde{Z}_n(u)\,du\right)$ under our assumptions. Therefore the result follows. □

APPENDIX D: PROOFS OF SECTION 3

Proof of Lemma 1. Divide $\Gamma$ into three regions:
$$(I) := B(0, kN_G), \quad (II) := \{\gamma : \max\{\|\gamma\|, \|u_\gamma\|\} \le kN_T\}\setminus B(0, kN_G), \quad (III) := \Gamma\setminus((I)\cup(II)),$$
where $k$ is chosen later to be large enough, independent of the dimensions $d$ or $d_1$. Region $(I)$ is defined to be the region where the linear approximation $G$ for $\theta(\cdot)$ is valid in the sense of Assumption A. Region $(III)$ represents the tail of the distribution; either $\gamma$ or $u_\gamma$ has large norm. Finally, region $(II)$ is an intermediary region for which $G$ is not a valid approximation but we still have interesting guarantees for deviations from normality. We point out that regions $(II)$ or $(III)$ might be highly non-convex. We will derive sufficient conditions on the values of $N_G$ and $N_T$ as a function of $d$ and $d_1$. It will be sufficient to set $N_G = \sqrt{d}$ and $N_T = \sqrt{d\log d}$.

For notational convenience we define $c_G = k^2N_G^2/d$ and $c_T = k^2N_T^2/d$. Our assumptions are such that $d\lambda_n(c_G) \to 0$ and $\lambda_n(c_T) < 1/16$.
POSTERIOR INFERENCE
u^
IN
CURVED EXPONENTIAL
We first bound the contribution of region (///)•
= kNc-^'-n € U. Using Lemma 2 we have
lnZ„(u^) < (A„,w^:
Since InZn(-)
is
concave
U
in
-
17
For any 7 e {III), define
^(1 -2A„(cG))||u-yf
and hrZ„(0)
=
by design, we have
(D,21)
MinZ„(K,) <
\nZn{u,) <
(^LllM^^lp^ <
-\]u,\\Ng
-\\u,\\Ng
y
5'
The contribution
1/16.
J {II I)
The
bounded by
'
^
integral
/(///)
By
of (///) can be
on
\BeeT^[f^o)J J[iii)
right can be
tlie
<
e-^p(-T^G||w7ll)d7
definition of (///),
other hand 7 G
bounded
5
V
J
as follows
/B(o,fc/v^)n(/7/)exp(-f iVG||u^||)(i7
7 £ B{0,Nt) D (III) imphes
implies that \\uj\\ > EoNt-
B{0,Nt)
> kMr- On
||u^||
A
the
standard bound on
the integrals yields
/(///)
exp {-'^Ng\\u4) dj
<
exp
(-'^NgNt +
Using the assumption on the
prior,
ln{kNT))
di
+ exp(-f^eoNGNT +
dilndi)
.
we can bound the contribution
of
(///) by
/
(D.22)
7r(6io)exp
Next consider 7 e
_
Cprrord
{II).
+
dj Indj
+
di
lu{kNT)
p
~ -j-o^gNt
By definition 7 G B{0, kNr)
< 1/16, we have that
\
B{0, kNc)- Under
the assumption that Xn{cT)
lnZ„(u^)
<
(A„,u-y)
Therefore, by choosing k such that
n(0{'n,)
11)
^
+ n-'^'u,)zUn,)d-f
<
'
17
- --||u^f
kNG >
8||A„||,
we have
n{eo)(sup^)f
\eee'!^(t^o)J Jill)
-
"^'°^'^^g^(v'/
exp ( {An,v.^) - ^-hu^f)
26
J
\
'^P
Ilu-vll"
2
I
16" ^" /
d"/
d-y
Again using our assumption on tlie prior and standard bounds to gaussian
we can bound the contribution of (//) by
densities,
(D.23)
Finally,
for
we show
any 7 G
(/)
bound on the
a lower
integral over (/). First note that
condition (3.10) holds and we have
+ 5n2)kNG + 5ni) C
Therefore, u^ £ BiO,{\\G\\
ity,
- y^-'^c)
Tr{9o)exp{cpriord+d, ln(l/£o)
=
u-y
ri„
+
(/
+ R2n)Gl-
B{0,2\\G\\kNG)- For simplic-
letC(;)=4|lGf7V2/d
+n
I{i)'^{^ivo)
>
^/-UyjZn{u~,)d-i
Under our assumptions exp
(3.11), |iA„||
=
0{Vd), and
lnZ„(ri„
+
(/
+
I
7r(6'o)exp
~-'^'n(c{/))\/
^^
)
'•
Furthermore, using
< kNa, we have
II7II
=
i?2„)G7)
~^
f-/\„(c(/))y'^j /(;)Z„(u^)d7
(A„,ri„
+
+
(/
/?2n)G7)
-'-^^^4^\\r:n +
>
o(l)
{I
+ R2n)Gyf
-
.'
+ (A„,G7)-^i^^#^|!G7f.
Therefore we have
/(O
n
(^('?o)
Choosing
follows since
+
n-'^hi,) Zn{u^)dy
Ng =
Vd, A^i
we have d >
di.
>
>
>
-K{e,)0
(/,,)
exp ((A„, G7) - '-^^^.\\G^\\-) d^)
7r(eo)0((l-2A„(c(;)))^'/2det(G'G)-^/2)
7t{9o)0 {expH\G\\di))
= ^\ogd
and k
sufficiently large the result
u
Proof of Theorem 3. Let $\gamma$ be such that $\hat\eta = \eta_0 + n^{-1/2}\gamma$. We will show that $\ln Z(u_\gamma) < -cd$ for any $\gamma \notin B(0, k\sqrt{d})$ where $k$ is sufficiently large. Therefore, since the contribution of the prior is bounded by (iv), the MLE satisfies $\gamma \in B(0, k\sqrt{d})$ and the result follows.

Using (D.21) with $N_G = \sqrt{d}$ we have
$$\ln Z(u_\gamma) \le -\|u_\gamma\|\sqrt{d}\,k/5 \le -\varepsilon_0k^2d/5.$$
As stated earlier, the result follows by choosing $k$ sufficiently large.

Proof of Theorem 4. Using Lemma 1 and known results for gaussian densities, we can restrict our analysis to $B(0, k\sqrt{d})$ since the remaining part has negligible mass. The remainder of the proof follows the same steps as the proof of Theorem 2.

REFERENCES

[1] O. Barndorff-Nielsen (1978). Information and exponential families in statistical theory. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Ltd., Chichester.
[2] P. J. Bickel and J. A. Yahav (1969). Some contributions to the asymptotic theory of Bayes solutions, Z. Wahrsch. Verw. Geb 11, 257-276.
[3] G. Chamberlain (1987). Asymptotic efficiency in estimation with conditional moment restrictions, Journal of Econometrics 34, no. 3, 305-334.
[4] V. Chernozhukov and H. Hong (2003). An MCMC approach to classical estimation, Journal of Econometrics 115, 293-346.
[5] S. G. Donald, G. W. Imbens, and W. K. Newey (2003). Empirical likelihood estimation and consistent tests with conditional moment restrictions. Journal of Econometrics 117, no. 1, 55-93.
[6] R. Dudley (2000). Uniform Central Limit Theorems, Cambridge Studies in advanced mathematics.
[7] B. Efron (1975). Defining the curvature of a statistical problem (with applications to second order efficiency) (with discussion). Annals of Statistics, Vol. 3, No. 6, 1189-1242.
[8] B. Efron (1978). The Geometry of Exponential Families, Annals of Statistics, Vol. 6, No. 2, 362-376.
[9] L. P. Hansen and K. J. Singleton (1982). Generalized instrumental variables estimation of nonlinear rational expectations models, Econometrica 50, no. 5, 1269-1286.
[10] I. Ibragimov and R. Has'minskii (1981). Statistical Estimation: Asymptotic Theory, Springer, Berlin.
[11] G. W. Imbens (1997). One-step estimators for over-identified generalized method of moments models. Rev. Econom. Stud. 64, no. 3, 359-383.
[12] S. Ghosal (2000). Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity. Journal of Multivariate Analysis, vol 73, 49-68.
[13] D. Hunter and M. S. Handcock. Inference in curved exponential family models for networks, to appear in Journal of Graphical and Computational Statistics.
[14] E. L. Lehmann and G. Casella (1998). Theory of point estimation. Second edition. Springer Texts in Statistics. Springer-Verlag, New York.
[15] L. Lovasz and S. Vempala. The Geometry of Logconcave Functions and Sampling Algorithms, to appear in Random Structures and Algorithms.
[16] W. K. Newey and D. McFadden (1994). Large Sample Estimation and Hypothesis Testing, Handbook of Econometrics, Volume IV, Chapter 36, edited by R.F. Engle and D.L. McFadden, Elsevier Science.
[17] S. Portnoy (1988). Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist. 16, no. 1, 356-366.
[18] J. Qin and J. Lawless (1994). Empirical Likelihood and General Estimating Equations, The Annals of Statistics, Vol. 22, No. 1, 300-325.

1101 Kitchawan Road, Building 800 Office 32-221
Yorktown Heights, NY 10598
E-mail: belloni@mit.edu
http://web.mit.edu/belloni/www

Room E52-262f, 50 Memorial Drive
Cambridge MA 02142
E-mail: vchern@mit.edu
http://web.mit.edu/vchern/www