10B A basic introduction to large deviation theory

In Chap. 10, we developed the theory of escape problems in the weak noise limit
(rare events) using a combination of WKB methods and path-integrals. In this
section, we present a basic introduction to a more fundamental mathematical approach to rare events, namely, large deviation theory. We follow closely the notes of
Touchette [3], which are a shorter version of the review [2].
Let us begin with a simple motivating example of large deviations. Consider the sum (sample mean)
$$S_n = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
where X1, X2, ... is an i.i.d. sequence of random variables generated from the probability density p(X) with X ∈ R. The joint probability density function (pdf) is
$$p(X_1, \ldots, X_n) = \prod_{j=1}^{n} p(X_j).$$
The corresponding pdf of the sum Sn is obtained as follows:
$$p_{S_n}(s) = P[S_n = s] = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, \delta\!\left(\sum_{i=1}^{n} x_i - ns\right) p(x_1, \ldots, x_n).$$
Introducing the Fourier representation of the Dirac delta function, δ(x) = ∫_{−∞}^{∞} e^{ikx} dk/2π, and using the product decomposition of the joint pdf,
$$p_{S_n}(s) = \int_{-\infty}^{\infty} \frac{dk}{2\pi}\, e^{-ikns} \prod_{j=1}^{n} \int_{-\infty}^{\infty} e^{ikx_j}\, p(x_j)\, dx_j = \int_{-\infty}^{\infty} \widetilde{p}(k)^n\, e^{-ikns}\, \frac{dk}{2\pi},$$
where p̃(k) is the characteristic function of p.
In the case of a Gaussian pdf with mean µ and variance σ²,
$$\widetilde{p}(k) = e^{ik\mu}\, e^{-\sigma^2 k^2/2},$$
and in the case of an exponential pdf on [0, ∞) with mean µ,
$$\widetilde{p}(k) = \frac{1}{1 - i\mu k}.$$
Substituting into the integral expression for p_{S_n}(s) then gives
$$p_{S_n}(s) = \int_{-\infty}^{\infty} e^{ikn(\mu - s)}\, e^{-n\sigma^2 k^2/2}\, \frac{dk}{2\pi} = \frac{1}{\sqrt{2\pi n \sigma^2}}\, e^{-n(\mu - s)^2/2\sigma^2} \qquad (10B.1)$$
for the Gaussian, and
$$p_{S_n}(s) = \int_{-\infty}^{\infty} e^{-ikns} \left[\frac{1}{1 - i\mu k}\right]^n \frac{dk}{2\pi} = \frac{1}{\mu}\oint \frac{e^{-izns/\mu}}{[1 - iz]^n}\, \frac{dz}{2\pi} = \frac{n^{n-1}}{(n-1)!} \left(\frac{s}{\mu}\right)^{n-1} \frac{e^{-ns/\mu}}{\mu},$$
where we have set z = µk in the second equality.
We have closed the contour in the lower-half complex z-plane (valid for s > 0) and used the following residue theorem for an analytic function f(z):
$$\frac{f^{(n-1)}(z_0)}{(n-1)!} = \oint \frac{f(z)}{(z - z_0)^n}\, \frac{dz}{2\pi i}.$$
We now note that in both cases the leading order behavior in n for large n can be expressed as
$$p_{S_n}(s) \approx e^{-nI(s)}, \qquad (10B.2)$$
where
$$I(s) = \frac{(s - \mu)^2}{2\sigma^2}, \qquad s \in \mathbb{R}, \qquad (10B.3)$$
for the Gaussian pdf and
$$I(s) = -\frac{1}{n}\left\{(n-1)\ln n - \ln[(n-1)!] + (n-1)\ln(s/\mu) - \ln\mu - \frac{ns}{\mu}\right\} = \frac{s}{\mu} - 1 - \ln\left(\frac{s}{\mu}\right), \qquad s \geq 0, \qquad (10B.4)$$
for the exponential pdf (after applying Stirling's formula and dropping terms that vanish as n → ∞). In both cases I(s) ≥ 0 and I(s) = 0 only when s = µ = E[X]. Since the pdf of S_n is normalized, it becomes more and more concentrated around s = µ as n → ∞, that is, p_{S_n}(s) → δ(s − µ) in the large-n limit.
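The direct calculation above can also be checked numerically. The following minimal Python sketch (not part of the original text; the mean µ and the grid of s values are arbitrary illustrative choices) compares −(1/n) ln p_{S_n}(s), computed from the exponential sample-mean pdf derived above, with the rate function (10B.4) for increasing n.

```python
# Numerical check of the exponential sample-mean LDP derived above.
# The mean mu and the grid of s values are arbitrary illustrative choices.
import numpy as np
from scipy.special import gammaln

mu = 2.0
s = np.linspace(0.5, 6.0, 12)

def log_pdf_Sn(s, n, mu):
    # log of p_{S_n}(s) = n^{n-1} s^{n-1} e^{-n s/mu} / ((n-1)! mu^n),
    # the expression obtained from the contour integral above
    return (n - 1) * np.log(n) + (n - 1) * np.log(s) - gammaln(n) \
           - n * np.log(mu) - n * s / mu

I_exact = s / mu - 1.0 - np.log(s / mu)     # rate function (10B.4)
for n in [10, 100, 1000]:
    I_n = -log_pdf_Sn(s, n, mu) / n
    print(f"n={n:5d}  max |I_n - I| = {np.max(np.abs(I_n - I_exact)):.4f}")
```

The printed maximum deviation decays like (ln n)/n, consistent with the subexponential corrections ignored in (10B.2).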
Large deviation principle
The leading order exponential form e−nI(s) found for the Gaussian and exponential
pdfs is the fundamental property of large deviation theory, which is known as the
large deviation principle. It arises in a much wider range of stochastic processes than
sums of i.i.d. random variables. Following Touchette [3], we will avoid the technical aspects of the rigorous formulation of the large deviation principle. For our
purposes, a random variable Sn or its pdf p(Sn ) satisfies a large deviation principle
(LDP) if the following limit exists:
$$\lim_{n\to\infty} -\frac{1}{n}\ln p_{S_n}(s) = I(s), \qquad (10B.5)$$
with I(s) the so-called rate function. A more rigorous definition involves probability measures on sets rather than in terms of pdfs, and gives lower and upper bounds
on these probabilities rather than a simple limit [4, 2]. One of the main goals of
large deviation theory is to identify stochastic processes that satisfy an LDP, and
to develop analytical (and numerical) methods for determining the associated rate
function. Touchette [3] identifies three approaches to establishing an LDP for a random variable Sn :
1. Direct method: Derive an explicit expression for p(Sn ) and show that it has the
form of an LDP. (This was the method used to derive the LDP for Gaussian and
exponential sample means.)
2. Indirect method: Calculate certain functions of Sn that can be used to infer that
Sn satisfies an LDP. One example is to use a generating function and apply the
Gartner-Ellis Theorem (see below).
3. Contraction method: Relate Sn to another random variable An , say, that is known
to satisfy an LDP and use this to derive an LDP for Sn .
Varadhan and Gartner-Ellis Theorems
In order to discuss the indirect method in more detail, we need to introduce two theorems. Again, we will sacrifice mathematical rigor for ease of presentation. The first theorem is due to Varadhan, and is concerned with the calculation of the functional expectation
$$W_n[f] = E[e^{nf(S_n)}] = \int_{\mathbb{R}} p_{S_n}(s)\, e^{nf(s)}\, ds. \qquad (10B.6)$$
If Sn satisfies an LDP with rate function I(s), then for large n
$$W_n[f] \approx \int_{\mathbb{R}} e^{n[f(s) - I(s)]}\, ds \approx e^{n \sup_s [f(s) - I(s)]}, \qquad (10B.7)$$
after making a saddle-point approximation. We have exploited the fact that the remaining corrections are sub-exponential in n. We now introduce a functional λ [ f ]
such that
$$\lambda[f] \equiv \lim_{n\to\infty} \frac{1}{n}\ln W_n[f] = \sup_{s\in\mathbb{R}}\{f(s) - I(s)\}. \qquad (10B.8)$$
This is a statement of the Varadhan Theorem [4], which was originally applied to bounded functions f, but also holds for unbounded functions satisfying suitable moment conditions.
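As a concrete, purely illustrative numerical check of (10B.8), the sketch below evaluates (1/n) ln W_n[f] by direct quadrature for Gaussian sample means, using the rate function (10B.3) and an arbitrarily chosen test function f; the test function and parameter values are my own, not the text's.

```python
# Illustrative check of Varadhan's formula (10B.8) for Gaussian sample means.
# The test function f and all parameters are my own choices, not the text's.
import numpy as np

mu, sigma = 0.0, 1.0
s = np.linspace(-10.0, 10.0, 20001)
ds = s[1] - s[0]
f = np.sin(s) - 0.1 * s**2              # an arbitrary smooth test function
I = (s - mu) ** 2 / (2 * sigma**2)      # Gaussian rate function (10B.3)

sup_val = np.max(f - I)                 # right-hand side of (10B.8)
for n in [5, 50, 500]:
    # pdf of S_n is N(mu, sigma^2/n); work with logarithms for stability
    log_p = -0.5 * np.log(2 * np.pi * sigma**2 / n) - n * (s - mu) ** 2 / (2 * sigma**2)
    Wn = np.sum(np.exp(log_p + n * f)) * ds     # quadrature estimate of (10B.6)
    print(f"n={n:4d}  (1/n) ln W_n = {np.log(Wn)/n: .5f}   sup[f - I] = {sup_val: .5f}")
```

The printed values approach sup_s[f(s) − I(s)] as n grows, with the remaining gap coming from the subexponential prefactors neglected in (10B.7).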
Consider the special case f (s) = ks with k ∈ R. The functional Wn [ f ] reduces to
a scaled generating function of Sn , that is,
$$W_n[k] = E[e^{nkS_n}],$$
and λ[f] reduces to the scaled cumulant generating function for Sn, λ(k), with
$$\lambda(k) \equiv \lim_{n\to\infty} \frac{1}{n}\ln E[e^{nkS_n}] = \sup_{s\in\mathbb{R}}\{ks - I(s)\}. \qquad (10B.9)$$
Hence, the Varadhan Theorem implies that if Sn satisfies an LDP with rate function I(s), then the scaled cumulant generating function λ(k) of Sn is the Legendre-Fenchel transform of I(s). It turns out that this result can be inverted, which is the basis of the Gartner-Ellis Theorem [1]: if λ(k) is differentiable, then Sn satisfies an LDP and the corresponding rate function I(s) is given by the Legendre-Fenchel transform of λ(k):
$$I(s) = \sup_{k\in\mathbb{R}}\{ks - \lambda(k)\}. \qquad (10B.10)$$
The usefulness of this result is that one can often calculate λ (k) without knowing
the full pdf p(Sn ). However, it is important to note that not all rate functions can be
calculated using the Gartner-Ellis Theorem.
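To illustrate the indirect route numerically, the following sketch (my own example, not taken from the text) considers i.i.d. Bernoulli(p) variables, for which λ(k) = ln(1 − p + p e^k) can be written down without knowing the pdf of S_n, and recovers the Cramér rate function by evaluating the Legendre-Fenchel transform (10B.10) on a grid.

```python
# Indirect (Gartner-Ellis) route for i.i.d. Bernoulli(p) variables: the scaled
# cumulant generating function is lambda(k) = ln(1 - p + p e^k), and the
# Legendre-Fenchel transform (10B.10), evaluated on a grid, should reproduce
# the known Cramer rate function. Example and parameters are my own.
import numpy as np

p = 0.3
k = np.linspace(-20.0, 20.0, 40001)      # grid for the supremum over k
lam = np.log(1 - p + p * np.exp(k))      # scaled CGF (independent of n for i.i.d. sums)

def rate_numeric(s):
    return np.max(k * s - lam)           # Legendre-Fenchel transform on the grid

def rate_exact(s):
    return s * np.log(s / p) + (1 - s) * np.log((1 - s) / (1 - p))

for s in [0.1, 0.3, 0.5, 0.8]:
    print(f"s={s:.1f}  numeric={rate_numeric(s):.5f}  exact={rate_exact(s):.5f}")
```

For s outside [0, 1] the supremum in (10B.10) is infinite, so the grid-based estimate is only meaningful for s ∈ (0, 1).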
Some properties of I(s) and λ (k)
We first discuss some properties of λ (k). By definition, λ (0) = 0, since a pdf is
normalized to unity. Moreover,
$$\lambda'(0) = \lim_{n\to\infty} \left.\frac{E[S_n e^{nkS_n}]}{E[e^{nkS_n}]}\right|_{k=0} = \lim_{n\to\infty} E[S_n], \qquad (10B.11)$$
assuming λ'(0) exists. Similarly,
$$\lambda''(0) = \lim_{n\to\infty} n\, \mathrm{Var}\, S_n. \qquad (10B.12)$$
Another important feature of λ(k) is that it is always convex. This follows from Hölder's inequality,
$$\sum_i |y_i z_i| \leq \left(\sum_j |y_j|^{1/p}\right)^{p} \left(\sum_j |z_j|^{1/q}\right)^{q}, \qquad 0 \leq p, q \leq 1, \quad p + q = 1.$$
Setting y_a = e^{nαk_1 a}, z_a = e^{n(1−α)k_2 a} and p = α, we have
$$\left\langle e^{nk_1 S_n}\right\rangle^{\alpha} \left\langle e^{nk_2 S_n}\right\rangle^{1-\alpha} \geq \left\langle e^{n(\alpha k_1 + (1-\alpha)k_2) S_n}\right\rangle.$$
Taking the logarithm of both sides gives
$$\alpha \ln\left\langle e^{nk_1 S_n}\right\rangle + (1-\alpha) \ln\left\langle e^{nk_2 S_n}\right\rangle \geq \ln\left\langle e^{n(\alpha k_1 + (1-\alpha)k_2) S_n}\right\rangle.$$
Finally, multiplying both sides by 1/n and taking the limit n → ∞ shows that
$$\alpha\lambda(k_1) + (1-\alpha)\lambda(k_2) \geq \lambda(\alpha k_1 + (1-\alpha)k_2), \qquad (10B.13)$$
which establishes that λ (k) is convex. It can also be shown from properties of the
Legendre-Fenchel transform that rate functions obtained from the Gartner-Ellis Theorem are strictly convex. We thus infer one limitation of the Gartner-Ellis Theorem,
namely, it cannot generate non-convex rate functions with more than one minimum.
Suppose that a rate function can be determined from the Legendre-Fenchel transform of λ (k) according to the Gartner-Ellis theorem. Differentiability and convexity
of λ (k) then imply that the Legendre-Fenchel transform reduces to a standard Legendre transform, that is,
$$I(s) = k(s)s - \lambda(k(s)), \qquad (10B.14)$$
with k(s) the unique root of λ'(k) = s. Differentiating the Legendre transform equation with respect to s gives
$$I'(s) = k'(s)s + k(s) - \lambda'(k(s))k'(s) = k(s), \qquad I''(s) = k'(s) = \frac{1}{\lambda''(k(s))}.$$
It follows that if λ(k) is strictly convex, λ''(k) > 0, then so is I(s). Finally, note that any rate function obtained from the Gartner-Ellis Theorem has a unique global minimum, located at the point s∗ satisfying I'(s∗) = k(s∗) = 0. This implies that
$$s^* = \lambda'(k(s^*)) = \lambda'(0) = \lim_{n\to\infty} E[S_n],$$
and I(s∗ ) = 0. Thus, in the large-n limit the pdf concentrates around the mean, which
is an expression of the law of large numbers. This suggests that one interpretation
of the LDP is that it quantifies the likelihood of a large deviation from the mean.
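These identities are easy to verify numerically. The finite-difference sketch below (again using the illustrative Bernoulli example; p and the step size h are my own choices) checks λ(0) = 0, λ'(0) = E[X], λ''(0) = Var[X], and the Legendre duality relations I'(s) = k(s) and I''(s) = 1/λ''(k(s)).

```python
# Finite-difference check of the properties listed above, using the Bernoulli
# example again (p and the step size h are my own illustrative choices).
import numpy as np

p, h = 0.3, 1e-5
lam = lambda k: np.log(1 - p + p * np.exp(k))
I = lambda s: s * np.log(s / p) + (1 - s) * np.log((1 - s) / (1 - p))

print("lambda(0)   =", lam(0.0))                                  # expect 0
print("lambda'(0)  =", (lam(h) - lam(-h)) / (2 * h), " vs E[X] =", p)
print("lambda''(0) =", (lam(h) - 2 * lam(0.0) + lam(-h)) / h**2,
      " vs Var[X] =", p * (1 - p))

# Legendre duality at a particular s: k(s) is the root of lambda'(k) = s
s = 0.5
k_s = np.log(s * (1 - p) / (p * (1 - s)))                         # explicit root for Bernoulli
lam2 = (lam(k_s + h) - 2 * lam(k_s) + lam(k_s - h)) / h**2        # lambda''(k(s))
print("I'(s)  =", (I(s + h) - I(s - h)) / (2 * h), " vs k(s) =", k_s)
print("I''(s) =", (I(s + h) - 2 * I(s) + I(s - h)) / h**2, " vs 1/lambda''(k(s)) =", 1 / lam2)
```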
The contraction principle
Let An be a random variable that has an LDP with rate function IA (a). Consider a
second random variable Bn = f (An ). Does this also satisfy an LDP and, if so, what
is its rate function? In order to address this issue, first write the pdf of Bn in terms
of the pdf of An :
$$p_{B_n}(b) = \int_{\{a:\, f(a) = b\}} p_{A_n}(a)\, da.$$
Using the LDP for An and the saddle-point method gives
$$p_{B_n}(b) \approx \int_{\{a:\, f(a) = b\}} e^{-nI_A(a)}\, da \approx \exp\left(-n \inf_{\{a:\, f(a) = b\}} I_A(a)\right).$$
Hence, p(Bn) also satisfies an LDP with rate function
$$I_B(b) = \inf_{\{a:\, f(a) = b\}} I_A(a). \qquad (10B.15)$$
The latter formula is known as the contraction principle, since f may be many-to-one, that is, there may be several a's for which b = f(a), in which case information regarding the rate function of An is "contracted" down to Bn.
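The contraction principle is simple to implement whenever the preimage set {a : f(a) = b} can be enumerated. The toy sketch below (my own example, not from the text) contracts a Gaussian rate function I_A(a) = (a − µ)²/2σ² through the map f(a) = a².

```python
# Toy illustration of the contraction principle (10B.15), using my own example:
# A_n has Gaussian rate function I_A(a) = (a - mu)^2 / (2 sigma^2) and B_n = A_n^2.
import numpy as np

mu, sigma = 1.0, 0.5
I_A = lambda a: (a - mu) ** 2 / (2 * sigma**2)

def I_B(b):
    if b < 0:
        return np.inf                      # empty preimage: the event is impossible
    roots = [np.sqrt(b), -np.sqrt(b)]      # all a with f(a) = a^2 = b
    return min(I_A(a) for a in roots)

for b in [0.0, 0.25, 1.0, 4.0]:
    print(f"b={b:4.2f}  I_B(b)={I_B(b):.4f}")
```

With µ > 0 the infimum is always attained at the positive preimage a = +√b, so I_B(b) = (√b − µ)²/2σ² for b ≥ 0, while events with b < 0 have infinite rate because they have no preimage.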
LDP for a stochastic differential equation (SDE)
We now link the general discussion of LDPs to the case of an SDE with weak noise considered in Sec. 10.2. Consider the one-dimensional SDE with additive noise
$$dX(t) = f(X(t))\, dt + \sqrt{\varepsilon}\, dW(t), \qquad X(0) = 0, \qquad (10B.16)$$
with W (t) a Wiener process. We are interested in the pdf of random paths {X(t),t ∈
[0, T ]} of duration T in the limit where the noise strength ε vanishes. Denote the
pdf by the functional p[x]. The occurrence of an LDP in the low noise limit is a
reflection of the fact that random paths of the SDE should converge in probability
to the deterministic path given by the solution to the ODE
ẋ(t) = f (x(t)),
x(0) = 0.
The path fluctuations away from the deterministic path are characterized by a functional LDP of the form
$$p[x] \approx e^{-I[x]/\varepsilon}, \qquad I[x] = \int_0^T [\dot{x}(t) - f(x(t))]^2\, dt. \qquad (10B.17)$$
This result was derived in Sec. 10.2 using path-integrals. A coarser-grained LDP
can be derived using the contraction principle. That is, if we are only interested in
the pdf p(x, T ) of the state X(T ) given x(0) = 0, then
$$p(x, T) \approx e^{-\Phi(x,T)/\varepsilon}, \qquad \Phi(x, T) = \inf_{\{x(t)\}:\, x(0) = 0,\, x(T) = x} I[x], \qquad (10B.18)$$
with Φ identified as the quasi-potential.
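A minimal numerical sketch of (10B.18), assuming the linear drift f(x) = −x and the action (10B.17) taken exactly as written: discretize the path on a uniform time grid and minimize the resulting sum over the interior points with a generic optimizer. All numerical choices (drift, grid, optimizer) are my own illustrative assumptions.

```python
# Sketch of computing the quasi-potential Phi(x, T) in (10B.18) by minimizing a
# discretized version of the action (10B.17), taken exactly as written, over
# paths with x(0) = 0 and x(T) = x. The drift f(x) = -x and all numerical
# choices (grid, optimizer) are illustrative assumptions of mine.
import numpy as np
from scipy.optimize import minimize

f = lambda x: -x            # drift of the SDE
T, N = 2.0, 100             # time horizon and number of time steps
dt = T / N
x_end = 1.0                 # target endpoint x(T)

def action(interior):
    # Discretized I[x] = sum_i [ (x_{i+1} - x_i)/dt - f(x_i) ]^2 dt,
    # with x_0 = 0 and x_N = x_end held fixed
    x = np.concatenate(([0.0], interior, [x_end]))
    xdot = np.diff(x) / dt
    return np.sum((xdot - f(x[:-1])) ** 2) * dt

x0 = np.linspace(0.0, x_end, N + 1)[1:-1]   # initial guess: straight-line path
res = minimize(action, x0, method="L-BFGS-B")
print("Phi(x=1, T=2) ~", res.fun)
```

For this linear drift the Euler-Lagrange equation of (10B.17) is ẍ = x, so the numerical minimizer can be compared against the analytical optimal path x(t) ∝ sinh t.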
Supplementary references
1. Ellis, R. S.: The theory of large deviations: From Boltzmann's 1877 calculation to equilibrium macrostates in 2D turbulence. Physica D 133, 106-136 (1999).
2. Touchette, H.: The large deviation approach to statistical mechanics. Phys. Rep. 478, 1-69 (2009).
3. Touchette, H.: A basic introduction to large deviations: Theory, applications, simulations.
arXiv:1106.4146v3 (2012).
4. Varadhan, S. R. S.: Asymptotic probabilities and differential equations. Comm. Pure Appl.
Math. 19, 261-286 (1966).