10B A basic introduction to large deviation theory

In Chap. 10, we developed the theory of escape problems in the weak noise limit (rare events) using a combination of WKB methods and path-integrals. In this section, we present a basic introduction to a more fundamental mathematical approach to rare events, namely, large deviation theory. We follow closely the notes of Touchette [3], which are a shorter version of the review [2].

Let us begin with a simple motivating example of large deviations. Consider the sum (sample mean)

$$S_n = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

where $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables generated from the probability density $p(X)$ with $X \in \mathbb{R}$. The joint probability density function (pdf) is

$$p(X_1,\ldots,X_n) = \prod_{j=1}^{n} p(X_j).$$

The corresponding pdf of the sum $S_n$ is obtained as follows:

$$p_{S_n}(s) = P[S_n = s] = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_n\, \delta\Big(\sum_{i=1}^{n} x_i - ns\Big)\, p(x_1,\ldots,x_n). \tag{10B.1}$$

Introducing the Fourier representation of the Dirac delta function, $\delta(x) = \int_{-\infty}^{\infty} e^{ikx}\,dk/2\pi$, and using the product decomposition of the joint pdf, we have

$$p_{S_n}(s) = \int_{-\infty}^{\infty} \frac{dk}{2\pi}\, e^{-ikns} \prod_{j=1}^{n} \int_{-\infty}^{\infty} e^{ikx_j} p(x_j)\,dx_j = \int_{-\infty}^{\infty} \tilde{p}(k)^n e^{-ikns}\,\frac{dk}{2\pi},$$

where $\tilde{p}(k) = \int_{-\infty}^{\infty} e^{ikx} p(x)\,dx$ is the characteristic function of $p$. In the case of a Gaussian pdf with mean $\mu$ and variance $\sigma^2$,

$$\tilde{p}(k) = e^{ik\mu} e^{-\sigma^2 k^2/2},$$

and in the case of an exponential pdf on $[0,\infty)$ with mean $\mu$,

$$\tilde{p}(k) = \frac{1}{1 - i\mu k}.$$

Substituting into the integral expression for $p_{S_n}(s)$ then gives

$$p_{S_n}(s) = \int_{-\infty}^{\infty} e^{ikn(\mu - s)} e^{-n\sigma^2 k^2/2}\,\frac{dk}{2\pi} = \frac{1}{\sqrt{2\pi n \sigma^2}}\, e^{-n(\mu - s)^2/2\sigma^2}$$

for the Gaussian, and

$$p_{S_n}(s) = \int_{-\infty}^{\infty} e^{-ikns} \left(\frac{1}{1 - i\mu k}\right)^{n} \frac{dk}{2\pi} = \frac{1}{\mu}\oint \frac{e^{-izns/\mu}}{[1 - iz]^n}\,\frac{dz}{2\pi} = \frac{1}{\mu}\,\frac{1}{(n-1)!}\left(\frac{ns}{\mu}\right)^{n-1} e^{-ns/\mu} = \frac{n^{n-1}}{(n-1)!}\,\frac{1}{\mu}\left(\frac{s}{\mu}\right)^{n-1} e^{-ns/\mu}$$

for the exponential, where $z = \mu k$. We have closed the contour in the lower-half complex $z$-plane, which encloses the pole of order $n$ at $z = -i$, and used the following residue theorem for an analytic function $f(z)$:

$$\frac{f^{(n-1)}(z_0)}{(n-1)!} = \oint \frac{f(z)}{(z - z_0)^n}\,\frac{dz}{2\pi i}.$$

We now note that in both cases the leading order behavior in $n$ for large $n$ can be expressed as

$$p_{S_n}(s) \approx e^{-nI(s)}, \tag{10B.2}$$

where

$$I(s) = \frac{(s - \mu)^2}{2\sigma^2}, \quad s \in \mathbb{R} \tag{10B.3}$$

for the Gaussian pdf and

$$I(s) = -\lim_{n\to\infty}\frac{1}{n}\left\{(n-1)\ln n - \ln[(n-1)!] + n\ln(s/\mu) - ns/\mu\right\} = \frac{s}{\mu} - 1 - \ln(s/\mu), \quad s > 0 \tag{10B.4}$$

for the exponential pdf (after applying Stirling's formula). In both cases $I(s) \geq 0$ and $I(s) = 0$ only when $s = \mu = E[X]$. Since the pdf of $S_n$ is normalized, it becomes more and more concentrated around $s = \mu$ as $n \to \infty$, that is, $p_{S_n}(s) \to \delta(s - \mu)$ in the large-$n$ limit.

Large deviation principle

The leading order exponential form $e^{-nI(s)}$ found for the Gaussian and exponential pdfs is the fundamental property of large deviation theory, which is known as the large deviation principle. It arises in a much wider range of stochastic processes than sums of i.i.d. random variables. Following Touchette [3], we will avoid the technical aspects of the rigorous formulation of the large deviation principle. For our purposes, a random variable $S_n$ or its pdf $p(S_n)$ satisfies a large deviation principle (LDP) if the following limit exists:

$$\lim_{n\to\infty} -\frac{1}{n}\ln p_{S_n}(s) = I(s), \tag{10B.5}$$

with $I(s)$ the so-called rate function. A more rigorous definition involves probability measures on sets rather than pdfs, and gives lower and upper bounds on these probabilities rather than a simple limit [2, 4].

One of the main goals of large deviation theory is to identify stochastic processes that satisfy an LDP, and to develop analytical (and numerical) methods for determining the associated rate function. Touchette [3] identifies three approaches to establishing an LDP for a random variable $S_n$:

1. Direct method: Derive an explicit expression for $p(S_n)$ and show that it has the form of an LDP. (This was the method used to derive the LDP for Gaussian and exponential sample means.)

2. Indirect method: Calculate certain functions of $S_n$ that can be used to infer that $S_n$ satisfies an LDP.
One example is to use a generating function and apply the Gärtner-Ellis Theorem (see below).

3. Contraction method: Relate $S_n$ to another random variable $A_n$, say, that is known to satisfy an LDP and use this to derive an LDP for $S_n$.

Varadhan and Gärtner-Ellis Theorems

In order to discuss the indirect method in more detail, we need to introduce two theorems. Again, we will sacrifice mathematical rigor for ease of presentation. The first theorem is due to Varadhan, and is concerned with the calculation of the functional expectation

$$W_n[f] = E[e^{nf(S_n)}] = \int_{\mathbb{R}} p_{S_n}(s)\, e^{nf(s)}\,ds. \tag{10B.6}$$

If $S_n$ satisfies an LDP with rate function $I(s)$, then for large $n$

$$W_n[f] \approx \int_{\mathbb{R}} e^{n[f(s) - I(s)]}\,ds \approx e^{n \sup_s [f(s) - I(s)]}, \tag{10B.7}$$

after making a saddle-point approximation. We have exploited the fact that the remaining corrections are sub-exponential in $n$. We now introduce a functional $\lambda[f]$ such that

$$\lambda[f] \equiv \lim_{n\to\infty}\frac{1}{n}\ln W_n[f] = \sup_{s\in\mathbb{R}}\{f(s) - I(s)\}. \tag{10B.8}$$

This is a statement of the Varadhan Theorem [4], which was originally applied to bounded functions $f$, but also holds for unbounded functions.

Consider the special case $f(s) = ks$ with $k \in \mathbb{R}$. The functional $W_n[f]$ reduces to a scaled generating function of $S_n$, that is, $W_n(k) = E[e^{nkS_n}]$, and $\lambda[f]$ reduces to the scaled cumulant generating function for $S_n$, $\lambda(k)$, with

$$\lambda(k) \equiv \lim_{n\to\infty}\frac{1}{n}\ln E[e^{nkS_n}] = \sup_{s\in\mathbb{R}}\{ks - I(s)\}. \tag{10B.9}$$

Hence, the Varadhan Theorem implies that if $S_n$ satisfies an LDP with rate function $I(s)$, then the scaled cumulant generating function $\lambda(k)$ of $S_n$ is the Legendre-Fenchel transform of $I(s)$. It turns out that this result can be inverted, which is the basis of the Gärtner-Ellis Theorem [1]: if $\lambda(k)$ is differentiable, then $S_n$ satisfies an LDP and the corresponding rate function $I(s)$ is given by the Legendre-Fenchel transform of $\lambda(k)$:

$$I(s) = \sup_{k\in\mathbb{R}}\{ks - \lambda(k)\}. \tag{10B.10}$$

The usefulness of this result is that one can often calculate $\lambda(k)$ without knowing the full pdf $p(S_n)$.
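As a rough numerical illustration of (10B.10), the sketch below approximates the Legendre-Fenchel transform by a maximum over a finite grid of $k$ values and compares it with the rate functions (10B.3) and (10B.4). It relies on the standard fact, not derived in the text, that for a sample mean of i.i.d. variables $\lambda(k) = \ln E[e^{kX}]$, which gives $\lambda(k) = \mu k + \sigma^2 k^2/2$ for the Gaussian and $\lambda(k) = -\ln(1 - \mu k)$, $k < 1/\mu$, for the exponential. The parameter values, grid ranges, and function names are assumptions made for the example.

```python
import numpy as np

MU, SIGMA = 1.0, 0.5  # illustrative parameter values (assumptions, not from the text)

def legendre_fenchel(scgf, k_grid, s_values):
    """Approximate I(s) = sup_k {k*s - lambda(k)} by a maximum over a finite k grid."""
    lam = scgf(k_grid)  # lambda(k) evaluated on the grid
    # Rows index s, columns index k; maximize over k for each s.
    return np.max(np.outer(s_values, k_grid) - lam, axis=1)

def scgf_gaussian(k):
    # lambda(k) = ln E[exp(kX)] for X ~ N(MU, SIGMA^2)
    return MU * k + 0.5 * (SIGMA * k) ** 2

def scgf_exponential(k):
    # lambda(k) = ln E[exp(kX)] for an exponential pdf with mean MU; needs k < 1/MU
    return -np.log(1.0 - MU * k)

# Gaussian sample mean: compare with (s - mu)^2 / (2 sigma^2)
s_gauss = np.linspace(-1.0, 3.0, 41)
k_gauss = np.linspace(-20.0, 20.0, 40001)
rate_gauss = legendre_fenchel(scgf_gaussian, k_gauss, s_gauss)
exact_gauss = (s_gauss - MU) ** 2 / (2.0 * SIGMA ** 2)

# Exponential sample mean: compare with s/mu - 1 - ln(s/mu)
s_exp = np.linspace(0.3, 4.0, 38)
k_exp = np.linspace(-5.0, 0.999 / MU, 60001)
rate_exp = legendre_fenchel(scgf_exponential, k_exp, s_exp)
exact_exp = s_exp / MU - 1.0 - np.log(s_exp / MU)

print("max error (Gaussian):   ", np.max(np.abs(rate_gauss - exact_gauss)))
print("max error (exponential):", np.max(np.abs(rate_exp - exact_exp)))
```

The grid maximum always lies slightly below the true supremum, so the $k$ range must contain the maximizer for every $s$ of interest; here the ranges were chosen by hand for the stated parameters.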
It is important to note, however, that not all rate functions can be calculated using the Gärtner-Ellis Theorem.

Some properties of I(s) and λ(k)

We first discuss some properties of $\lambda(k)$. By definition, $\lambda(0) = 0$, since a pdf is normalized to unity. Moreover,

$$\lambda'(0) = \lim_{n\to\infty} \left.\frac{E[S_n e^{nkS_n}]}{E[e^{nkS_n}]}\right|_{k=0} = \lim_{n\to\infty} E[S_n], \tag{10B.11}$$

assuming $\lambda'(0)$ exists. Similarly,

$$\lambda''(0) = \lim_{n\to\infty} n\,\mathrm{Var}\,S_n. \tag{10B.12}$$

Another important feature of $\lambda(k)$ is that it is always convex. This follows from an integral version of Hölder's inequality, which for sums reads

$$\sum_i |y_i z_i| \leq \Big(\sum_j |y_j|^{1/p}\Big)^{p} \Big(\sum_j |z_j|^{1/q}\Big)^{q}, \quad 0 \leq p, q \leq 1, \quad p + q = 1.$$

Setting $y_a = e^{n\alpha k_1 a}$, $z_a = e^{n(1-\alpha)k_2 a}$ and $p = \alpha$ (so that $q = 1 - \alpha$), and averaging over $a = S_n$, we have

$$\left\langle e^{nk_1 S_n}\right\rangle^{\alpha} \left\langle e^{nk_2 S_n}\right\rangle^{1-\alpha} \geq \left\langle e^{n(\alpha k_1 + (1-\alpha)k_2)S_n}\right\rangle.$$

Taking the logarithm of both sides gives

$$\alpha \ln\left\langle e^{nk_1 S_n}\right\rangle + (1-\alpha)\ln\left\langle e^{nk_2 S_n}\right\rangle \geq \ln\left\langle e^{n(\alpha k_1 + (1-\alpha)k_2)S_n}\right\rangle.$$

Finally, multiplying both sides by $1/n$ and taking the limit $n \to \infty$ shows that

$$\alpha\lambda(k_1) + (1-\alpha)\lambda(k_2) \geq \lambda(\alpha k_1 + (1-\alpha)k_2), \tag{10B.13}$$

which establishes that $\lambda(k)$ is convex. It can also be shown from properties of the Legendre-Fenchel transform that rate functions obtained from the Gärtner-Ellis Theorem are strictly convex. We thus infer one limitation of the Gärtner-Ellis Theorem, namely, it cannot generate non-convex rate functions with more than one minimum.

Suppose that a rate function can be determined from the Legendre-Fenchel transform of $\lambda(k)$ according to the Gärtner-Ellis Theorem. Differentiability and convexity of $\lambda(k)$ then imply that the Legendre-Fenchel transform reduces to a standard Legendre transform, that is,

$$I(s) = k(s)s - \lambda(k(s)), \tag{10B.14}$$

with $k(s)$ the unique root of $\lambda'(k) = s$. Differentiating the Legendre transform equation with respect to $s$ gives

$$I'(s) = k'(s)s + k(s) - \lambda'(k(s))k'(s) = k(s), \qquad I''(s) = k'(s) = \frac{1}{\lambda''(k(s))},$$

where the last equality follows from differentiating $\lambda'(k(s)) = s$ with respect to $s$. It follows that if $\lambda(k)$ is strictly convex, $\lambda''(k) > 0$, then so is $I(s)$.
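The identities $I'(s) = k(s)$ and $I''(s) = 1/\lambda''(k(s))$ can be checked numerically. The following sketch does this for the exponential sample mean, assuming the standard result $\lambda(k) = -\ln(1 - \mu k)$ for $k < 1/\mu$ (not derived in the text). The root $k(s)$ of $\lambda'(k) = s$ is found by bisection and the derivatives of $I$ are estimated by central finite differences; the value of $\mu$, the test point $s_0$, the step size, and all function names are illustrative assumptions.

```python
import numpy as np

MU = 2.0  # mean of the exponential pdf; illustrative value (assumption)

def scgf(k):
    """lambda(k) = -ln(1 - MU*k), defined for k < 1/MU."""
    return -np.log(1.0 - MU * k)

def scgf_prime(k):
    """lambda'(k)."""
    return MU / (1.0 - MU * k)

def scgf_second(k):
    """lambda''(k)."""
    return (MU / (1.0 - MU * k)) ** 2

def solve_k(s, n_iter=200):
    """Bisection for the unique root k(s) of lambda'(k) = s (lambda' is increasing)."""
    lo, hi = -1.0e3, 1.0 / MU - 1.0e-9
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if scgf_prime(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate(s):
    """Legendre transform I(s) = k(s)*s - lambda(k(s))."""
    k = solve_k(s)
    return k * s - scgf(k)

s0, h = 1.5, 1.0e-3  # test point and finite-difference step (assumptions)
k0 = solve_k(s0)
dI = (rate(s0 + h) - rate(s0 - h)) / (2.0 * h)                # estimate of I'(s0)
d2I = (rate(s0 + h) - 2.0 * rate(s0) + rate(s0 - h)) / h**2   # estimate of I''(s0)

print("I'(s0)  vs k(s0):          ", dI, k0)
print("I''(s0) vs 1/lambda''(k0): ", d2I, 1.0 / scgf_second(k0))
```

For this particular $\lambda(k)$ the root is available in closed form, $k(s) = 1/\mu - 1/s$, so the bisection is only a stand-in for cases where $\lambda'(k) = s$ has to be solved numerically.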
Finally, note that any rate function obtained from the Gärtner-Ellis Theorem has a unique global minimum $s^*$, which satisfies $I'(s^*) = k(s^*) = 0$. This implies that

$$s^* = \lambda'(k(s^*)) = \lambda'(0) = \lim_{n\to\infty} E[S_n],$$

and $I(s^*) = 0$. Thus, in the large-$n$ limit the pdf concentrates around the mean, which is an expression of the law of large numbers. This suggests that one interpretation of the LDP is that it quantifies the likelihood of a large deviation from the mean.

The contraction principle

Let $A_n$ be a random variable that has an LDP with rate function $I_A(a)$. Consider a second random variable $B_n = f(A_n)$. Does this also satisfy an LDP and, if so, what is its rate function? In order to address this issue, first write the pdf of $B_n$ in terms of the pdf of $A_n$:

$$p_{B_n}(b) = \int_{\{a:\, f(a) = b\}} p_{A_n}(a)\,da.$$

Using the LDP for $A_n$ and the saddle-point method gives

$$p_{B_n}(b) \approx \int_{\{a:\, f(a) = b\}} e^{-nI_A(a)}\,da \approx \exp\Big(-n \inf_{\{a:\, f(a) = b\}} I_A(a)\Big).$$

Hence, $p(B_n)$ also satisfies an LDP with rate function

$$I_B(b) = \inf_{\{a:\, f(a) = b\}} I_A(a). \tag{10B.15}$$

The latter formula is known as the contraction principle, since $f$ may be many-to-one, that is, there may be several $a$'s for which $b = f(a)$, in which case information regarding the rate function of $A_n$ is "contracted" down to $B_n$.

LDP for a stochastic differential equation (SDE)

We now link the general discussion of LDPs to the case of an SDE with weak noise considered in Sec. 10.2. Consider the one-dimensional SDE with additive noise

$$dX(t) = f(X(t))\,dt + \sqrt{\varepsilon}\,dW(t), \quad X(0) = 0, \tag{10B.16}$$

with $W(t)$ a Wiener process. We are interested in the pdf of random paths $\{X(t), t \in [0, T]\}$ of duration $T$ in the limit where the noise strength $\varepsilon$ vanishes. Denote the pdf by the functional $p[x]$. The occurrence of an LDP in the low noise limit is a reflection of the fact that random paths of the SDE should converge in probability to the deterministic path given by the solution to the ODE

$$\dot{x}(t) = f(x(t)), \quad x(0) = 0.$$
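The convergence of the noisy paths to the deterministic path can be seen in a simple simulation. The sketch below integrates the SDE with the Euler-Maruyama scheme and the ODE with the forward Euler scheme, and records the average over sample paths of $\sup_t |X(t) - x(t)|$ for decreasing $\varepsilon$. The drift $f(x) = 1 - x$, the time horizon, the step count, the number of paths, and the random seed are all hypothetical choices made for illustration; none of them comes from the text.

```python
import numpy as np

def drift(x):
    # Hypothetical drift f(x) = 1 - x, chosen only for illustration.
    return 1.0 - x

def mean_max_deviation(eps, T=2.0, n_steps=2000, n_paths=200, seed=0):
    """Average over paths of sup_t |X(t) - x(t)| for noise strength eps."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros(n_paths)        # noisy paths, X(0) = 0
    x_det = 0.0                  # deterministic path, x(0) = 0
    max_dev = np.zeros(n_paths)  # running maximum of |X - x| per path
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)    # Wiener increments
        x = x + drift(x) * dt + np.sqrt(eps) * dW          # Euler-Maruyama step
        x_det = x_det + drift(x_det) * dt                  # forward Euler step
        max_dev = np.maximum(max_dev, np.abs(x - x_det))
    return max_dev.mean()

EPS_VALUES = [1e-1, 1e-2, 1e-3]
DEVIATIONS = [mean_max_deviation(eps) for eps in EPS_VALUES]
for eps, dev in zip(EPS_VALUES, DEVIATIONS):
    print(f"eps = {eps:g}: mean sup-deviation = {dev:.4f}")
```

Because this drift is linear and the same seed is reused for each $\varepsilon$, the deviation scales as $\sqrt{\varepsilon}$ in this example; for a nonlinear drift one would only expect the deviations to shrink as $\varepsilon \to 0$.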
The path fluctuations away from the deterministic path are characterized by a functional LDP of the form

$$p[x] \approx e^{-I[x]/\varepsilon}, \quad I[x] = \frac{1}{2}\int_0^T [\dot{x}(t) - f(x(t))]^2\,dt, \tag{10B.17}$$

where the factor $1/2$ reflects the noise amplitude $\sqrt{\varepsilon}$ in (10B.16). This result was derived in Sec. 10.2 using path-integrals. A coarser-grained LDP can be derived using the contraction principle. That is, if we are only interested in the pdf $p(x, T)$ of the state $X(T)$ given $x(0) = 0$, then

$$p(x, T) \approx e^{-\Phi(x,T)/\varepsilon}, \quad \Phi(x, T) = \inf_{\{x(t)\}:\, x(0) = 0,\, x(T) = x} I[x], \tag{10B.18}$$

with $\Phi$ identified as the quasi-potential.

Supplementary references

1. Ellis, R. S.: The theory of large deviations: From Boltzmann's 1877 calculation to equilibrium macrostates in 2D turbulence. Physica D 133, 106-136 (1999).
2. Touchette, H.: The large deviation approach to statistical mechanics. Phys. Rep. 478, 1-69 (2008).
3. Touchette, H.: A basic introduction to large deviations: Theory, applications, simulations. arXiv:1106.4146v3 (2012).
4. Varadhan, S. R. S.: Asymptotic probabilities and differential equations. Comm. Pure Appl. Math. 19, 261-286 (1966).