A tour on the computational side of Large Deviation Theory Eric Vanden-Eijnden Courant Institute • Rare events matter - challenge for modeling and computations. • Rare events pathways are often predictable - large deviation theory (LDT). Numerical aspects: minimum action method & string method. • LDT-based importance sampling strategies - UQ and reliability Q, filtering, etc. • Beyond LDT - when entropy matters. It is easier to go down a hill than up but the view is much better at the top - H. W. Beecher Friday, September 13, 13 Unlikely (or infrequent) events matter • Some events are rare (i.e. infrequent) but have dramatic consequences - massive earthquakes, giant hurricanes, pandemics, etc. • Rare events can be less dramatic but important nonetheless - for example, electronic components or engineering devices used in automobile, aerospace and medical are required to be extremely reliable, with very small probability of failure. • Rare events always happen eventually given enough time and may be the most interesting/important aspect of the system’s dynamics - e.g. conformational changes of macromolecules, kinetic phase transitions, thermally induced magnetization reversal in micromagnets, regime changes in climate, etc. - Metastability. Friday, September 13, 13 Unlikely (or infrequent) events matter • Realistic models (SODEs, SPDEs, Markov jump processes, etc.) are often too complex to be amenable to analytical solution. • Direct numerical simulations of these models very challenging or even impossible. • Requires new computational approaches based e.g. on large deviation theory (LDT). • Requires to go beyond LDT in certain situations - entropic effects, ruggedness, etc. Friday, September 13, 13 Large Deviation Theory • The way rare events occur is often predictable: the probability of the rare event is dominated by the most likely (i.e. least unlikely) scenario for it to happen - essence of large deviation theory. Happy families are all alike; every unhappy family is unhappy in its own way -Tolstoi. • Calculation of the path of maximum likelihood (PML) reduces to a deterministic optimization problem. 1.2 1 • For example, in gradient systems (i.e. 0.8 0.6 0.4 0.2 0 1.5 −0.2 1 −0.4 0.5 −1 −0.5 0 x 0.5 1 position y systems navigating over an energy landscape), rare events are associated with barrier crossing events and follow the minimum energy path (MEP) connecting two minima of the energy potential. 0 −0.5 −1 −1.5 0 2 4 6 time Friday, September 13, 13 8 10 4 x 10 Freidlin-Wentzell Approach to LDT Wentzell-Freidlin theory of large deviations (W. & F.; Da Prato & Zabczyk) Key object: action functional 1 ST ( ) = 2 associated with S(P)DE Z T 1 ( (t))( ˙ (t) b( (t))) 2 dt 0 " " dX (t) = b(X (t))dt + p " (X " (t))dW (t) Then: Probability that {X " (t)}t2[0,T ] be close to a given path { (t)}t2[0,T ] is roughly ⇢ P sup |X " (t) (t)| < ⇣ exp( " 1 ST ( )) . 0tT Reduce estimation of probability to a minimization problem, e.g. P{X " (T ) 2 A|X " (0) = x} ⇣ exp( " where the infinum is taken over all Looking at long times, T ⇣ exp(" Friday, September 13, 13 (·) such that 1 C) 1 inf ST ( )), (0) = x and (T ) 2 A. for some C > 0, the quasi-potential is key: The role of the Quasi-Potential (QP) Looking at long time-intervals, T ⇣ exp(" V : H 2 7! R is the key quantity: 1 C) for some C > 0, the quasi-potential V (x, y) = inf inf {ST ( ) : (0) = x, (T ) = y} T Interpretation: V (x, y) = lim lim lim " T !1 !0 "!0 1 log P{⌧y" (x) T } where ⌧y" (x) is the first is the first entrance time in a -neighborhood of y: ⌧y" (x) = inf{t : X " (t) 2 B (y), X " (0) = x} Long time intervals most relevant since the e↵ects of the noise is unavoidable on these timescales. Friday, September 13, 13 Over every mountain there is a path, although it may not be seen from the valley -T. Roethke Computational side of LDT • Numerical tools (string method, minimum action method - MAM, etc.) have been developed to identify the paths by which rare events are most likely to occur. . String method and MAM evolve curves while controlling their parametrization until they converge to the MLP. Applicable to both gradient systems (where MLPs are MEPs) and non-gradient systems. 1.5 1.2 1 1 0.8 0.5 position y 0.6 0.4 0.2 0 −0.5 0 −1 −0.2 −0.4 −1 −0.5 0 x 0.5 1 −1.5 0 2 4 6 time • These tools have been applied to problems from biochemistry, material sciences, atmosphere-ocean sciences, etc. Friday, September 13, 13 8 10 4 x 10 Geometric rephrasing of QP definition (M. Heymann & E. V.-E., 2008) V (x, y) ⇣ inf inf{ST (') : '(0) = x, '(T ) = y} T nR o = 12 inf kbka sin2 (2⌘)ds : joins x to y where kuk2a = hu, uia with hu, via = ha 1 (x)u, vi if h·, ·i was the original inner-product, and ⌘ is the local value of the angle between and b. Parametrize = {'(s) : s 2 [0, 1]} so that: V (x, y) = inf = nR inf 1 0 k kb(')k k' ( a a 0 sup '(·) ✓(·) '(0)=x H(',✓)=0 '(1)=y R1 h'0 , b(')ia ) ds : '(0) = x, '(1) = y 0 , ✓ids h' 0 o where H(', ✓) is the Hamiltonian associated with the Lagrangian L(', '0 ) = k'0 b(')k2a : H(', ✓) = hb('), ✓i + 12 h✓, a(')✓i Friday, September 13, 13 Geometric rephrasing of QP definition (M. Heymann & E. V.-E., 2008) The associated Euler-Lagrange equations can be written as ⇢ 0 ' = H✓ (', ✓) = b(') + ✓ ✓ 0 = H' (', ✓) = rbT (')✓ where is determined such that 0 = H(', ✓) and we impose some (arbitray) constraint on the paramterization of , e.g. |'0 | = cst. Di↵erentiating the equation for ' and using the one for ✓, this system of equations can be written as 0= 2 0 0 '” + ' H✓' '0 + H✓✓ H' which can be reduced to 0= 2 '” P H✓' ' 0 H✓✓ H' , P = Id where we used P '0 = 0, P '00 = '00 (since |'0 | = cst) and ✓=a Friday, September 13, 13 1 ' 0 b , kbka = k'0 ka '0 ⌦ '0 |'0 |2 (?) Allen-Cahn equation in 2D - MEPs Example: Allen-Cahn in one- and two-dimensions (Faris & Jona-Lasiinio (1d); E, Ren & V.-E.) Allen-Cahn energy for u : [0, 1]2 7! R: E(u) = Z [0,1]2 1 2 |ru|2 + 1 4 1 (1 u2 )2 dx with Dirichlet boundary condition: u = +1 on the right and left edges of [0, 1]2 , u = 1 on top and bottom ones. Two minimizers of the Allen-Cahn energy Simulations by W. Ren Friday, September 13, 13 Allen-Cahn equation in 2D - MEPs Example: Allen-Cahn in one- and two-dimensions (Faris & Jona-Lasiinio (1d); E, Ren & V.-E.) Allen-Cahn energy for u : [0, 1]2 7! R: E(u) = Z [0,1]2 1 2 |ru|2 + 1 4 1 (1 u2 )2 dx with Dirichlet boundary condition: u = +1 on the right and left edges of [0, 1]2 , u = 1 on top and bottom ones. Two minimizers of the Allen-Cahn energy Simulations by W. Ren Friday, September 13, 13 Allen-Cahn equation in 2D - MLPs at finite T Example: Allen-Cahn in two-dimensions (E, Ren & V.-E.) Dynamics: ut = Action: 1 ST (') = 2 Unconstrained reversal Friday, September 13, 13 2 Z u+u T 0 Z ⌦ |'t 3 u + 2 p ' 2" ⌘(t, x) ' + '3 |2 dxdt Time-constrained reversal Allen-Cahn equation in 2D - MLPs at finite T Example: Allen-Cahn in two-dimensions (E, Ren & V.-E.) Dynamics: ut = Action: 1 ST (') = 2 Unconstrained reversal Friday, September 13, 13 2 Z u+u T 0 Z ⌦ |'t 3 u + 2 p ' 2" ⌘(t, x) ' + '3 |2 dxdt Time-constrained reversal Thermally induced magnetization reversal (W. E, W. Ren & E. V.-E., 2003: A. Kent, D. Stein & E. V.-E., 2009 ...) Main building blocks in Magnetoelectronics (used e.g. as storage devices, etc.) Small elements at superparamagnetic limit, where thermal effects become important and limit data retention time by magnetization reversal. Reversal complex due to non-uniformity in space. a) 1 S2 S3 b) C3,4 1 1 V5,6 V3,4 S1 1 V1,2 3 V2 1 V7,8 C C1,2 0 1 0.026 V24 1 m2 S4 V2 V22 5,6 2 V8 V15,16 8 V1 13,14 1 V9,10 2 V6 V2 7 S 0.025 V25 C 0.024 S5 V111,12 6 7 0 m1 1 Dynamics can be reduced to a Markov jump process on energy map, whose nodes are the energy minima and whose edges are the minimum energy paths Friday, September 13, 13 0.023 0.022 S S 1 1 E[m] 7,8 (b) 0.021 (a) 0.02 0.019 0 0.2 0.4 0.6 α 0.8 1 on: 1 Z T Z Non-gradient effects 2 0 ⌦ S ( ' ) = T |' t 2 ' ' + example: 3 2 ' | d x d t AC with c sin( y ) u p amics: u t = 2 u + u u 3 + x + advection 2 " ⌘( t , x ) in 2D with periodic BC MLPs for different shear strengths Simulations by M. Heymann and M. Tao Friday, September 13, 13 on: 1 Z T Z Non-gradient effects 2 0 ⌦ S ( ' ) = T |' t 2 ' ' + example: 3 2 ' | d x d t AC with c sin( y ) u p amics: u t = 2 u + u u 3 + x + advection 2 " ⌘( t , x ) in 2D with periodic BC minimal action 0.22 0.21 S 0.2 min 0.19 MLPs for different shear strengths Simulations by M. Heymann and M. Tao Friday, September 13, 13 0.18 0.17 0 0.2 0.4 c 0.6 0.8 1 on: 1 Z T probable path The system ZRare event and most Gradient system Most probable paths of different types Non-gradient system shear 2 3 Optimal 2 |' ' ' + ' | d x d t t Ginzburg-Landau Sheared SPDE Brief description of the numerical method example: AC with advection Non-gradient effects 2 0 ⌦ S ( ' ) = T amics: u t = 2 u + u u 3 + c sin( y ) u x + p 2 " ⌘( t , x ) in 2D with periodic BC Transition via a limit cycle instead of saddle Action⇡0.4. Dynamics near the limit cycle @ Matlab. minimal action 0.22 0.21 S 0.2 min 0.19 MLPs for different shear strengths Molei Tao and Eric Vanden-Eijnden Simulations by M. Heymann and M. Tao Friday, September 13, 13 0.18 gMAM and most probable paths 0.17 0 0.2 0.4 c 0.6 0.8 1 Unstable limit cycle along PML 14 Friday, September 13, 13 kB(u)k kB(u)k prime denotes di↵erentiation with respect to arclength-coordinate ⌧ . The dition for the velocity field u is u(x, t) = 0 for t ! 1, the final condition iliary field p is p(t = 0 ) = wx as we can see from (5.12) using integration The norm k · k is given by the inverse of on its support (which is compact domain), hence ⇣ ⌘ (with T. Grafke, R. Grauer and T. Schäfer) ▷ Random Burgers equation kf k = f F 1 ˆ 1 fˆ , (5.14) 1 Application to fluid dynamics problems L2 Key question: statistics of high gradients (responsible for dissipation) ˆ denotes the inverse Fourier transform operator and f is the Fourier trans- 0 0 0 0 u + uu = ⌫u + ⌘(x, t), h⌘(x, t)⌘(x , t )i = (x x ) (t t t x xx mparison with previous work [2, 5], it is useful to compare the above equa- ) e corresponding instanton equations in the original parametrization (using as parameter) Hamilton’s equation associated with LD action (cf. Balkovsky et al.; Chernykh & Stepanov): Z ut + uux ⌫uxx = (x x0 )p(x0 )dx0 (5.15a) 1.5 pt + upx + ⌫pxx = 0. geometric t =−1000 0.05 (5.15b) t =−100 min min 1 tmin=−10 −0.05 −20 u(x, t=0) n problem in solving the above equations with the boundary conditions ! 1 and p(0 100, x) = wx consists in the fact that the fixpoint of the 0.5 nly reached in the limit t ! 1 and not in finite time. When Chernykh 200 0 nov computed a numerical solution to the system above, they used a comtwo clever tricks 300 to mitigate this difficulty: First, they used self-similar of the heat-kernel to design a coordinate transform that leads−0.5to an expo400 aling in time, second they used the linearization of the system around zero −1 replace the boundary condition u = 0 by u = p (where the constant 0 0 500 −20 −15 −10 200 function 400 600 800 following 1000 mputed from the correlation ). In the we show that the 0 −15 −10 −5 100 200 300 15 400 500 200 400 600 Friday, September 13, 13 800 1000 −5 0 x 5 10 15 20 Two-dimensional barotropic QG equations Predictions of equilibrium statistical mecha @q @q @ + r? · rq + U(t) + = Dq ( )q + Fq @t @x @x R IM maximizes µ log µ under the constr q = the+entropy h, Z Z@ dU = h dx dy + DU (U) + FU , dt @x Canonical: Q dµ = Q0 , Microcanonical: E = Application to fluid dynamics problems Motivation Methodology Stochastic Motivation ODE model ▷ 2D Methodology Stochastic barotropic QG equation (with W. Ren & X. Yang) Here (t, x, y ) is the stream function, q(t, x, y ) the vort 2 Dissipation via Ekman drag: 1 = , 2 = eh(x, y ) the bottom topography, U(t) the zonal base flow Stochastic model Metastable states of invariant measure PDE FDT 1 1 the beta-plane dµ = Zparameter exp( "(Pedlosky Q) (E 79, E0Vallis ) dUd05). , Dissipation by Eckman drag - fixes the noise by Conclusions PDE model Stochastic ODEConclusions model Inviscid equations with D = F = 0 preserve p @q @q @ ? +r · rq + U(t) + = q + 2" ⌘(t, x, y ) @t @x @x Minimize the enstrophy Q subject to the constraint E = E0 : q= + h, Z Z p 1 dU @ min Q = U + q 2+dxeUdy ,(t), = h dx e + 2"e⇠(t) 2 U, dt @x Z 1 1 s.t. U 2 + |r |2 dx dy = E0 . 2 equilibrium distribution Most likely Z 2states of p p minimize enstrophy ( q + at2"E = cst: ⌘) dx + U(e 2"e ⇠) Using Lagrange multiplier, where (t) = µUc = , µ Z 2 2 | | dx + e U ( c = c + h). states – in QG (t),Most likely states = selective decay Z 1 enstrophy varies faster1than energy. Energy E = U2 + |r |2 dx dy , 2 Only choice leading to 2multimodal Z distribution. Enstrophy Q= U+ 1 2 |q|2 dx dy . Dynamics? choose D and F so that we formally have invariant measure (cf strategy of Landau-Lifshitz for N equations) 50 . 40 30 E N.B. the stationary solutions original system N.B.These fixing are the damping fixes the forcing of by the fluctuation-dissipation For E large enough, there are more than one solution, 20 without dissipation and forcing. theorem (FDT). out of which two are stable in absence of noise. 10 0 −15 Friday, September 13, 13 −10 −5 µ 0 5 16 Example (16-mode topography Motivation We take the domain size as Lx = parameters are chosen as = 1, coefficients are taken as = e = Application to fluid dynamics problems Methodology Stochastic ODE model ▷ 2D Stochastic PDE model barotropic QGStatistical equation Nonequilibrium Mechanics of Climate Variability (a) Conclusions (b) (c) 6 2.5 4 2.5 4 2 2 2 2 2 2 1.5 0 1.5 0 1.5 0 y 4 1 −2 1 −2 1 −2 0.5 −4 0.5 −4 0.5 −4 0 0 2 4 6 0 0 −6 2 4 6 0 0 −6 2 4 x x x (d) (e) (f ) 6 6 6 −6 6 4 2.5 4 2.5 4 2 2 2 2 2 2 1.5 0 1.5 0 1.5 0 y 2.5 y 1 −2 1 −2 1 −2 0.5 −4 0.5 −4 0.5 −4 0 0 2 4 6 0 0 −6 2 4 6 0 0 −6 2 4 x x x (g) (h) (i) 6 6 6 −6 6 4 2.5 4 2.5 4 2 2 2 2 2 2 1.5 0 1.5 0 1.5 0 y 2.5 y y 6 2.5 y y 6 y 1615 modes topography with periodic BC: 1 −2 1 −2 1 −2 0.5 −4 0.5 −4 0.5 −4 0 0 2 4 x 6 −6 0 0 2 4 x 6 −6 0 0 2 4 6 −6 x Snapshots of streamfunction along PML path from X 0 to X 1 ; the Figure 8. Snapshots of the stream function along the transition states labeled by (a) and (i) are the initial and final states, and (f ) is the saddle point. The Friday, September 13, 13 17 Markov Jump Processes • M species involved in N reactions with rates depending on system state • Reactions are fast but lead to small changes: (Lf )(x) = N X " 1 aj (x) (f (x + "⌫j ) j=1 • Hamiltonian of LDT: H(x, ✓) = N X ⇣ aj (x) eh✓,⌫j i j=1 ▷ Application f (x)) ⌘ 1 to genetic toggle switch (Roma et al.) 3.5 25 3 20 2.5 2 15 1.5 Expression of gene a inhibits expression of gene b and vice-versa 10 1 5 0.5 0 20 40 Friday, September 13, 13 60 80 100 120 140 20 40 60 80 100 120 140 160 Beyond LDT - when entropy matters • LDT can fail if entropic effects matter - many alternative paths for the event, with lower probability individually, but large one globally. • These situations require a more general approach to rare event analysis. Friday, September 13, 13 (Weinan E & V.-E. 2006, ... Schütte, Metzner & V.-E. 2009, Berezhkovskii, Hummer & Szabo 2009) Transition Path Theory • Main idea: focus on `reactive’ trajectories associated with a given transition (reaction) • Generator = transition rate matrix A B i, j 2 S = {1, 2, . . . , N } Li,j • Microscopic reversibility (detailed balance) µi Li,j = µj Lj,i • Probability distribution of reactive trajectories µR i = µi qi (1 • Probability current of reactive trajectories R fi,j = µi Li,j (qj • Committor function: Friday, September 13, 13 qi ) + qi ) Generator of loop erased reactive paths LR i,j = Li,j (qj qi = Ei (⌧B < ⌧A ) qi ) + The example of a maze No energy landscape, no clear transition state - not a mountain pass problem. Many different reactive paths from entrance to exit - the walkers wanders in the many deadends of the maze and this affects dramatically the rate of the `reaction’. Yet the current of these reactive trajectories concentrates on a single path. Committor Effective current 1 A 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 B Friday, September 13, 13 0 Reorganization of LJ cluster after self-assembly with Masha Cameron • Double funnel landscape - ground state is not accessible directly by self-assembly and requires dynamical reorganization; • Can be described by a MJP using the network calculated by Wales; icosahedron • ~1e4 local minima of potential; • ~1e4 saddle points between these minima; • Network of orbits (minimum energy paths); • Transition matrix ; Li,j = ⌫e Vi,j /" • Spectral analysis difficult (large network, many small eigenvalues). • Can be analyzed by TPT. • LDT only applicable at extremely low temperatures. At physical temperature, system remains strongly metastable, but pathways and rate of transition are different than those predicted by LDT (entropic effects). Friday, September 13, 13 octahedron Reorganization of LJ cluster after self-assembly with Masha Cameron Number of transition pathways increase with T They become shorter and go thru higher barriers Figure 3: The cartoon of the transition process at T = 0.05 from ICO (state 7) to FCC (state 1). The edges shown carry at least 10% of the total reactive flux from ICO to FCC. The blue arrows indicate the dominant representative pathway. The red arrows indicate the highest barriers: V(342,254) = 4.219 and V(958,1) = 4.272. The percentages next to each arrow show the percentages of the total reactive flux carried by the edge. The values of the committor q at some states are shown. Friday, September 13, 13 Reorganization of LJ cluster after self-assembly with Masha Cameron Number of transition pathways increase with T They become shorter and go thru higher barriers Figure 3: The cartoon of the transition process at T = 0.05 from ICO (state 7) to FCC (state 1). The edges shown carry at least 10% of the total reactive flux from ICO to FCC. The blue arrows indicate the dominant representative pathway. The red arrows indicate the highest barriers: V(342,254) = 4.219 and V(958,1) = 4.272. The percentages next to each arrow show the percentages of the total reactive flux carried by the edge. The values of the committor q at some states are shown. Friday, September 13, 13 Reorganization of LJ cluster after self-assembly with Masha Cameron T= 0.05 T= 0.15 Percentage of pathways Percentage of pathways 342 - 354 342 -354 958 - 1 958 - 1 3223 - 354 958 - 607 3886 - 354 351 - 354 3184 - 2831 396 - 1502 1208 - 3299 248 - 3299 355 - 354 4950 - 2933 396 - 5162 The highest potential barrier Friday, September 13, 13 958 - 607 3223 - 354 The highest potential barrier Conclusions • LDT gives rough estimate of probability and path of maximum likelihood by which rare event occur. • Can be integrated in importance sampling procedures to estimate the probability of the rare events, its rate of occurrence, etc. • Can be used in data assimilation techniques - allows to optimally incorporate the information from noisy observations. Also offers the possibility to prevent or precipitate the occurrence of rare events in situations where the system’s evolution can be acted upon. • In situation where entropy matters and LDT becomes inapplicable, TPT is the right framework. Thanks to Weinan E, Weiqing Ren, Matthias Heymann, Masha Cameron, Jonathan Weare, ... Thank you! Friday, September 13, 13 A few references • W. E, W. Ren, and and E. V.-E. ``String method for the study of rare events,'' Phys. Rev. B. 66 (2002), 052301. • W. E, W. Ren, and E. V.-E., ``Minimum action method for the study of rare events,'' Comm. Pure Applied Math. 52, 637-656 (2004). • M. Heymann and E. V.-E., ``The Geometric Minimum Action Method: A least action principle on the space of curves,'' Comm. Pure. App. Math. 61(8), 1052-1117 (2008). • W. E and E. V.-E., ``Toward a theory of transition paths,'' J. Stat. Phys. 123, 503-523 (2006). • W. E and E. V.-E., ``Transition-Path Theory and Path-Finding Algorithms for the Study of Rare Events,'' Annu. Rev. Phys. Chem. 61, 391-420 (2010). 26 Friday, September 13, 13