Monte Carlo Methods for
Quantum Field Theory
A D Kennedy
adk@ph.ed.ac.uk
Maxwell Institute, the University of Edinburgh
NTU, Taipei, Taiwan; 10 September, 1999
Functional Integrals and QFT

• The expectation value of an operator Ω is defined non-perturbatively by the functional integral
      ⟨Ω⟩ = (1/Z) ∫ dφ e^{−S(φ)} Ω(φ)
• The normalisation constant Z is chosen such that ⟨1⟩ = 1
• The action is S(φ)
• Defined in Euclidean space-time
• Lattice regularisation
• dφ is the appropriate functional measure,
      dφ = ∏ₓ dφ(x)
• Continuum limit: lattice spacing a → 0
• Thermodynamic limit: physical volume V → ∞
Monte Carlo methods

• Monte Carlo integration is based on the identification of probabilities with measures
• There are much better methods of carrying out low-dimensional quadrature
  – all other methods become hopelessly expensive for large dimensions
  – in lattice QFT there is one integration per degree of freedom: we are approximating an infinite dimensional functional integral
Monte Carlo methods

• Generate a sequence of random field configurations φ₁, φ₂, …, φₜ, …, φ_N chosen from the probability distribution
      P(φₜ) dφₜ = (1/Z) e^{−S(φₜ)} dφₜ
• Measure the value of Ω on each configuration and compute the average
      Ω̄ = (1/N) Σₜ₌₁^N Ω(φₜ)
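As a toy illustration of the procedure above (not from the original lectures), the following sketch estimates ⟨Ω⟩ for a single-site "field" with the assumed action S(φ) = φ²/2, for which configurations can be drawn directly; the observable Ω(φ) = φ² has ⟨Ω⟩ = 1 exactly.

```python
import math
import random

random.seed(1)

def monte_carlo_average(omega, sample_phi, n):
    """Average of Omega over n field configurations, with naive error sqrt(C2/N)."""
    values = [omega(sample_phi()) for _ in range(n)]
    mean = sum(values) / n
    c2 = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance C2
    return mean, math.sqrt(c2 / n)

# Toy "field theory": S(phi) = phi^2/2, so phi is a unit Gaussian and <phi^2> = 1.
mean, error = monte_carlo_average(lambda phi: phi * phi,
                                  lambda: random.gauss(0.0, 1.0),
                                  100_000)
print(mean, error)  # mean close to 1, error of order sqrt(2/N)
```

The error estimate here is the naive √(C₂/N) of the following slides, valid because these samples are independent.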
Central Limit Theorem

• Law of Large Numbers: ⟨Ω⟩ = lim_{N→∞} Ω̄
• Central Limit Theorem: Ω̄ = ⟨Ω⟩ + O(1/√N)
  – where the variance of the distribution of Ω̄ is σ² = C₂/N, with C₂ = ⟨(Ω − ⟨Ω⟩)²⟩
• The Laplace–DeMoivre Central Limit theorem is an asymptotic expansion for the probability distribution of Ω̄
  – distribution of values for a single sample ω = Ω(φ):
      P_Ω(ω) = ∫ dφ P(φ) δ(ω − Ω(φ))
Central Limit Theorem

• Generating function for connected moments:
      W_Ω(k) = ln ∫ dω P_Ω(ω) e^{ikω} = ln ∫ dφ P(φ) e^{ikΩ(φ)} = ln ⟨e^{ikΩ}⟩ = Σ_{n≥0} (ik)ⁿ Cₙ / n!
• The first few cumulants are
      C₀ = 0
      C₁ = ⟨Ω⟩
      C₂ = ⟨(Ω − ⟨Ω⟩)²⟩
      C₃ = ⟨(Ω − ⟨Ω⟩)³⟩
      C₄ = ⟨(Ω − ⟨Ω⟩)⁴⟩ − 3C₂²
• Note that this is an asymptotic expansion
Central Limit Theorem

• Distribution of the average of N samples:
      P_Ω̄(ω̄) = ∫ dω₁ ⋯ dω_N P_Ω(ω₁) ⋯ P_Ω(ω_N) δ(ω̄ − (1/N) Σₜ₌₁^N ωₜ)
• Connected generating function:
      W_Ω̄(k) = ln ∫ dω̄ P_Ω̄(ω̄) e^{ikω̄}
             = ln ∫ dω₁ ⋯ dω_N P_Ω(ω₁) ⋯ P_Ω(ω_N) exp[(ik/N) Σₜ₌₁^N ωₜ]
             = N ln ∫ dω P_Ω(ω) e^{ikω/N} = N W_Ω(k/N)
             = Σ_{n≥1} (ik)ⁿ Cₙ / (n! N^{n−1})
Central Limit Theorem

• Take the inverse Fourier transform to obtain the distribution
      P_Ω̄(ω̄) = (1/2π) ∫ dk e^{W_Ω̄(k)} e^{−ikω̄}
• Keeping the Gaussian part of W_Ω̄ explicit and turning the higher cumulants into derivatives acting on it:
      P_Ω̄(ω̄) = exp[−(C₃/3!N²) d³/dω̄³ + (C₄/4!N³) d⁴/dω̄⁴ − ⋯] ∫ (dk/2π) e^{−ik(ω̄−⟨Ω⟩)} e^{−C₂k²/2N}
             = exp[−(C₃/3!N²) d³/dω̄³ + (C₄/4!N³) d⁴/dω̄⁴ − ⋯] e^{−(ω̄−⟨Ω⟩)²N/2C₂} / √(2πC₂/N)
Central Limit Theorem

  [figure]
Central Limit Theorem

• Re-scale to show convergence to the Gaussian distribution:
      P_Ω̄(ω̄) dω̄ = F(ξ) dξ
• where ξ ≡ (ω̄ − ⟨Ω⟩)√N and
      F(ξ) = (e^{−ξ²/2C₂} / √(2πC₂)) [1 + C₃ ξ(ξ² − 3C₂) / (6C₂³√N) + O(1/N)]
Importance Sampling

• Integral I = ∫ dx f(x)
• Sample from a distribution ρ(x) ≥ 0 with normalisation N = ∫ dx ρ(x) = 1
• Estimator of the integral:
      I = ∫ dx ρ(x) [f(x)/ρ(x)]
• Estimator of the variance:
      V = ∫ dx ρ(x) [f(x)/ρ(x) − I]² = ∫ dx f(x)²/ρ(x) − I²
Importance Sampling

• Minimise the variance subject to the constraint N = 1, using a Lagrange multiplier λ:
      δ(V + λN)/δρ(y) = −f(y)²/ρ(y)² + λ = 0
• Optimal measure:
      ρ_opt(x) = |f(x)| / ∫ dx |f(x)|
• Optimal variance:
      V_opt = (∫ dx |f(x)|)² − (∫ dx f(x))²
Importance Sampling

• Example: f(x) = sin(x) on [0, 2π]
• Optimal weight: ρ_opt(x) = ¼|sin(x)|
• Optimal variance:
      V_opt = (∫₀^{2π} dx |sin(x)|)² − (∫₀^{2π} dx sin(x))² = 16
Importance Sampling: ∫₀^{2π} dx |sin(x)|

1 Construct a cheap approximation to |sin(x)|
2 Calculate the relative area within each rectangle
3 Choose a random number uniformly
4 Select the corresponding rectangle
5 Choose another random number uniformly
6 Select the corresponding x value within the rectangle
7 Compute |sin(x)|
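The seven steps above can be sketched as follows; this is a hypothetical implementation in which the rectangle heights are taken from |sin| at the subinterval midpoints (one of several reasonable choices).

```python
import bisect
import math
import random

random.seed(2)

f = lambda x: abs(math.sin(x))
A, B, M = 0.0, 2.0 * math.pi, 100

# Steps 1-2: cheap piecewise-constant approximation and relative areas.
width = (B - A) / M
heights = [f(A + (i + 0.5) * width) for i in range(M)]
total_area = sum(h * width for h in heights)
cumulative, acc = [], 0.0
for h in heights:
    acc += h * width / total_area
    cumulative.append(acc)

estimates = []
for _ in range(20_000):
    u = random.random()                                  # step 3: uniform number
    i = min(bisect.bisect_left(cumulative, u), M - 1)    # step 4: pick rectangle
    x = A + (i + random.random()) * width                # steps 5-6: x in rectangle
    rho = heights[i] / total_area                        # sampling density at x
    estimates.append(f(x) / rho)                         # step 7: estimator f/rho

mean = sum(estimates) / len(estimates)
variance = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
print(mean, variance)  # mean near 4, the integral of |sin| over [0, 2pi]
```

The estimator f/ρ is unbiased for any positive piecewise-constant ρ; a better rectangle approximation only lowers the variance.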
Importance Sampling: ∫₀^{2π} dx |sin(x)|

• With 100 rectangles we have V = 16.02328561
• … but we can do better!
• For the integral
      I = ∫₀^{2π} dx |sin(x)| = ∫₀^{2π} dx sin(x) sgn(sin(x))
  we have V_opt = 0
• With 100 rectangles we have V = 0.011642808
Random number generators

• Pseudorandom number generators
  – random numbers are used infrequently in Monte Carlo for QFT, compared to spin models
  – linear congruential generator: xₙ₊₁ = (a xₙ + b) mod m
  – for suitable choices of a, b, and m (see, e.g., Donald Knuth, The Art of Computer Programming)
  – usually one chooses b = 0 and m a power of 2
  – they seem to be good enough in practice
  – problems arise for spin models if m is too small a multiple of V
  – parallel implementation is trivial:
      xₙ₊V = (A xₙ + B) mod m, with A = a^V mod m and B = (Σ_{j=0}^{V−1} aʲ) b mod m
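The skip-ahead rule quoted above is easy to check numerically. A sketch with an illustrative multiplier and increment (the widely quoted Numerical Recipes pair, not necessarily the values used in the lectures):

```python
M = 2 ** 32
A = 1664525        # illustrative LCG multiplier (Numerical Recipes)
B = 1013904223     # illustrative increment

def lcg_step(x):
    return (A * x + B) % M

def skip_ahead(V):
    """Parameters for jumping V steps at once: x_{n+V} = (A^V x_n + B_V) mod m."""
    a_v = pow(A, V, M)
    b_v = (B * sum(pow(A, j, M) for j in range(V))) % M
    return a_v, b_v

V = 16
a_v, b_v = skip_ahead(V)
x = 123456789
y = x
for _ in range(V):
    y = lcg_step(y)
assert (a_v * x + b_v) % M == y  # one jump reproduces V sequential steps
```

On a parallel machine each of V processors can use the jumped generator to produce every V-th member of the same global sequence.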
Markov chains

• State space Ω
• (Ergodic) stochastic transitions P′: Ω → Ω
• Deterministic evolution of the probability distribution P: Q → Q
• The distribution converges to a unique fixed point Q̄
Proof of convergence of Markov chains

• Define a metric on the space of (equivalence classes of) probability distributions:
      d(Q₁, Q₂) = ∫ dx |Q₁(x) − Q₂(x)|
• Prove that
      d(PQ₁, PQ₂) ≤ (1 − α) d(Q₁, Q₂)
  with α > 0, so the Markov process P is a contraction mapping
• The sequence Q, PQ, P²Q, P³Q, … is Cauchy
• The space of probability distributions is complete, so the sequence converges to a unique fixed point Q̄ = lim_{n→∞} PⁿQ
Proof of convergence of Markov chains

      d(PQ₁, PQ₂) = ∫ dx |PQ₁(x) − PQ₂(x)|
                  = ∫ dx |∫ dy P(x ← y) Q₁(y) − ∫ dy P(x ← y) Q₂(y)|
• define ΔQ(y) ≡ Q₁(y) − Q₂(y) and split it into positive and negative parts, ΔQ = ΔQ₊ − ΔQ₋ with ΔQ₊, ΔQ₋ ≥ 0 (using θ(ω) + θ(−ω) = 1):
                  = ∫ dx |∫ dy P(x ← y) [ΔQ₊(y) − ΔQ₋(y)]|
• using |a − b| = a + b − 2 min(a, b),
                  = ∫ dx [∫ dy P(x ← y) |ΔQ(y)| − 2 min(∫ dy P(x ← y) ΔQ₊(y), ∫ dy P(x ← y) ΔQ₋(y))]
Proof of convergence of Markov chains

• using ∫ dx P(x ← y) = 1,
      d(PQ₁, PQ₂) ≤ ∫ dy |ΔQ(y)| − 2 (∫ dx inf_y P(x ← y)) min(∫ dy ΔQ₊(y), ∫ dy ΔQ₋(y))
• since both distributions are normalised,
      ∫ dy ΔQ(y) = ∫ dy Q₁(y) − ∫ dy Q₂(y) = 1 − 1 = 0
  so ∫ dy ΔQ₊(y) = ∫ dy ΔQ₋(y) = ½ ∫ dy |ΔQ(y)|
• therefore
      d(PQ₁, PQ₂) ≤ (1 − ∫ dx inf_y P(x ← y)) ∫ dy |ΔQ(y)| = (1 − α) d(Q₁, Q₂)
  with 0 < α ≡ ∫ dx inf_y P(x ← y) ≤ 1
Markov chains

• Use Markov chains to sample from Q
  – suppose we can construct an ergodic Markov process P which has distribution Q as its fixed point
  – start with an arbitrary state (“field configuration”)
  – iterate the Markov process until it has converged (“thermalized”)
  – thereafter, successive configurations will be distributed according to Q
    – but in general they will be correlated
  – to construct P we only need relative probabilities of states
    – we don’t need to know the normalisation of Q
    – we cannot use Markov chains to compute integrals directly
    – but we can compute ratios of integrals
Markov chains

• How do we construct a Markov process with a specified fixed point Q(x) = ∫ dy P(x ← y) Q(y)?
• Detailed balance:
      P(y ← x) Q(x) = P(x ← y) Q(y)
  – integrate w.r.t. y to obtain the fixed point condition
  – sufficient but not necessary for a fixed point
• Metropolis algorithm:
      P(x ← y) = min(1, Q(x)/Q(y))
  – consider the cases Q(x) > Q(y) and Q(x) < Q(y) separately to obtain the detailed balance condition
  – sufficient but not necessary for detailed balance
  – other choices are possible, e.g., P(x ← y) = Q(x) / (Q(x) + Q(y))
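A minimal sketch of the Metropolis step for a one-dimensional toy distribution; the unnormalised target Q(x) = e^{−x²/2} and the symmetric random-walk proposal are illustrative assumptions, chosen so the fixed point is known exactly.

```python
import math
import random

random.seed(3)

def metropolis_chain(q, x0, step, n):
    """Random-walk Metropolis: propose y, accept with probability min(1, Q(y)/Q(x)).

    Only relative probabilities enter (the normalisation of Q cancels), and a
    symmetric proposal makes the acceptance ratio enforce detailed balance."""
    x, chain = x0, []
    for _ in range(n):
        y = x + random.uniform(-step, step)
        if random.random() < min(1.0, q(y) / q(x)):
            x = y
        chain.append(x)   # rejected proposals repeat the old state
    return chain

# Unnormalised Q(x) = exp(-x^2/2): the chain should reproduce unit variance.
chain = metropolis_chain(lambda x: math.exp(-0.5 * x * x), 0.0, 1.5, 100_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(mean, var)  # near 0 and 1
```

Note that the successive states are correlated, which is the subject of the autocorrelation slides below.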
Markov chains

• Composition of Markov steps
  – let P₁ and P₂ be two Markov steps which have the desired fixed point distribution…
  – … they need not be ergodic
  – then the composition of the two steps P₁ ∘ P₂ will also have the desired fixed point…
  – … and it may be ergodic
• This trivially generalises to any (fixed) number of steps
  – for the case where P is not ergodic but Pⁿ is, the terminology “weakly ergodic” and “strongly ergodic” is sometimes used
Markov chains

• This result justifies “sweeping” through a lattice performing single site updates
  – each individual single site update has the desired fixed point because it satisfies detailed balance
  – the entire sweep therefore has the desired fixed point, and is ergodic…
  – … but the entire sweep does not satisfy detailed balance
  – of course it would satisfy detailed balance if the sites were updated in a random order…
  – … but this is not necessary
  – … and it is undesirable because it puts too much randomness into the system (more on this later)
Autocorrelations

• Exponential autocorrelations
  – the unique fixed point of an ergodic Markov process corresponds to the unique eigenvector with eigenvalue 1
  – all its other eigenvalues must lie within the unit circle
  – in particular, the largest subleading eigenvalue satisfies |λ_max| < 1
  – this corresponds to the exponential autocorrelation time
      N_exp = −1 / ln |λ_max|
Autocorrelations

• Integrated autocorrelations
  – consider the autocorrelation of some operator Ω
  – without loss of generality we may assume ⟨Ω⟩ = 0
      ⟨Ω̄²⟩ = (1/N²) Σₜ₌₁^N Σₜ′₌₁^N ⟨Ωₜ Ωₜ′⟩
            = (1/N²) [Σₜ₌₁^N ⟨Ωₜ²⟩ + 2 Σₜ₌₁^{N−1} Σₜ′₌ₜ₊₁^N ⟨Ωₜ Ωₜ′⟩]
Autocorrelations

• define the autocorrelation function
      C_Ω(ℓ) = ⟨Ωₜ Ωₜ₊ℓ⟩ / ⟨Ω²⟩
  so that
      ⟨Ω̄²⟩ = (⟨Ω²⟩/N) [1 + 2 Σ_{ℓ=1}^{N−1} (1 − ℓ/N) C_Ω(ℓ)]
• the autocorrelation function must fall faster than the exponential autocorrelation
      C_Ω(ℓ) ≤ |λ_max|^ℓ = e^{−ℓ/N_exp}
• for a sufficiently large number of samples
      ⟨Ω̄²⟩ = (⟨Ω²⟩/N) (1 + 2 A_Ω) [1 + O(N_exp/N)]
• where the integrated autocorrelation function is defined as
      A_Ω = Σ_{ℓ=1}^∞ C_Ω(ℓ)
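These formulas suggest a direct numerical estimate of A_Ω. The sketch below uses a synthetic AR(1) chain (an assumption, chosen because its autocorrelation C(ℓ) = rℓ is known exactly, so the variance inflation factor is 1 + 2A = (1+r)/(1−r)).

```python
import random

random.seed(4)

def integrated_autocorrelation(chain, max_lag):
    """A = sum_{l>=1} C(l); error bars on the mean grow by sqrt(1 + 2A)."""
    n = len(chain)
    mu = sum(chain) / n
    d = [c - mu for c in chain]
    c0 = sum(x * x for x in d) / n          # <Omega^2> estimate
    A = 0.0
    for lag in range(1, max_lag + 1):
        A += sum(d[t] * d[t + lag] for t in range(n - lag)) / ((n - lag) * c0)
    return A

# Synthetic chain x' = r*x + noise, with C(l) = r^l, so 1 + 2A = (1+r)/(1-r) = 9.
r, x, chain = 0.8, 0.0, []
for _ in range(50_000):
    x = r * x + random.gauss(0.0, 1.0)
    chain.append(x)

A = integrated_autocorrelation(chain, 60)
print(1.0 + 2.0 * A)  # should be near 9
```

In practice the cutoff max_lag must be chosen with care (a window several times N_exp), since summing noisy C(ℓ) to large lags adds variance to the estimate.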
Continuum limits

• We are not interested in lattice QFTs per se, but in their continuum limit as a → 0
  – this corresponds to a continuous phase transition of the discrete lattice model
    – for the continuum theory to have a finite correlation length (inverse mass gap) in physical units, this correlation length must diverge in lattice units
  – we expect such systems to exhibit universal behaviour as they approach a continuous phase transition
    – the nature of the continuum theory is characterised by its symmetries
    – the details of the microscopic interactions are unimportant at macroscopic length scales
    – universality is a consequence of the way that the theory scales as a → 0 while the physical correlation length is kept fixed
Continuum limits

• the nature of the continuum field theory depends on the way that physical quantities behave as the system approaches the continuum limit
• the required scaling of the parameters in the action is described by the renormalisation group equation (RGE)
  – the RGE expresses the “reparameterisation invariance” of the theory under the choice of scale at which we fix the renormalisation conditions
  – as a → 0 we expect the details of the regularisation scheme (cut-off effects, lattice artefacts) to vanish, so the effect of changing a is just an RGE transformation
    – on the lattice a renormalisation group transformation may be implemented as a “block spin” transformation of the fields
    – strictly speaking, the renormalisation “group” is a semigroup on the lattice, as blockings are not invertible
Continuum limits

• the continuum limit of a lattice QFT corresponds to a fixed point of the renormalisation group
  – at such a fixed point the form of the action does not change under an RG transformation
• the parameters in the action scale according to a set of critical exponents
• all known four dimensional QFTs correspond to trivial (Gaussian) fixed points
  – for such a fixed point the UV nature of the theory may be analysed using perturbation theory
  – monomials in the action may be classified according to their power counting dimension d:
    – d < 4: relevant (superrenormalisable)
    – d = 4: marginal (renormalisable)
    – d > 4: irrelevant (nonrenormalisable)
Continuum limits

• The behaviour of our Markov chains as the system approaches a continuous phase transition is described by its dynamical critical exponents
  – these describe how badly the system (model + algorithm) undergoes critical slowing down
  – the dynamical critical exponent z tells us how the cost of generating an independent configuration grows as the correlation length ξ of the system is taken to ∞: cost ∝ ξ^z
  – this is closely related (but not always identical) to the dynamical critical exponent for the exponential or integrated autocorrelations
Markov chains: “Coupling from the Past”

• Propp and Wilson (1996)
• Use a fixed set of random numbers
• Flypaper principle: if states coalesce they stay together forever
  – eventually, all states coalesce to some state φ with probability one
  – any state from t = −∞ will coalesce to φ
  – φ is a sample from the fixed point distribution
Global heatbath

• The ideal generator selects field configurations randomly and independently from the desired distribution
  – it is hard to construct such global heatbath generators for any but the simplest systems
  – they can be built by selecting sufficiently independent configurations from a Markov process…
  – … or better yet using CFTP, which guarantees that the samples are truly uncorrelated
  – but such generators are expensive!
Global heatbath

• For the purposes of Monte Carlo integration there is no need for the configurations to be completely uncorrelated
  – we just need to take the autocorrelations into account in our estimate of the variance of the resulting integral
  – using all (or most) of the configurations generated by a Markov process is more cost-effective than just using independent ones
  – the optimal choice balances the cost of applying each Markov step with the cost of making measurements
Local heatbath

• For systems with a local bosonic action we can build a Markov process with the fixed point distribution ∝ exp(−S) out of steps which update a single site with this fixed point
• If this update generates a new site variable value which is completely independent of its old value then it is called a local heatbath
• For free field theory we just need to generate a Gaussian-distributed random variable
Gaussian generators

• If x₁, x₂, …, xₙ are uniformly distributed random numbers, then n⁻¹ Σ_{j=1}^n xⱼ is approximately Gaussian by the Central Limit theorem
  – this is neither cheap nor accurate
Gaussian generators

• If x is uniformly distributed and f is a monotonically increasing function, then y = f(x) is distributed as
      P(y) = ∫ dx δ(y − f(x)) = 1 / f′(f⁻¹(y))
• Choosing f(x) = √(−2 ln x) we obtain P(y) = y e^{−½y²}
• Therefore generate two uniform random numbers x₁ and x₂, and set
      r = √(−2 ln x₁), θ = 2π x₂
  then
      y₁ = r cos θ, y₂ = r sin θ
  are two independent Gaussian distributed random numbers
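The construction above is the Box-Muller method; a direct transcription follows (using 1 − x₁ inside the logarithm to avoid log(0) is a small implementation convenience, not part of the slide).

```python
import math
import random

random.seed(5)

def box_muller():
    """Turn two uniform variates into two independent unit Gaussians."""
    x1, x2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - x1))   # 1 - x1 avoids log(0)
    theta = 2.0 * math.pi * x2
    return r * math.cos(theta), r * math.sin(theta)

samples = [y for _ in range(50_000) for y in box_muller()]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # near 0 and 1
```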
Gaussian generators

• Even better methods exist
  – the rectangle-wedge-tail (RWT) method
  – the ziggurat method
  – these do not require special function evaluations
  – they can be more interesting to implement for parallel computers
Local heatbath

• For pure gauge theories the field variables live on the links of the lattice and take their values in a representation of the gauge group
  – for SU(2) Creutz gave an exact local heatbath algorithm
    – it requires a rejection test: this is different from the Metropolis accept/reject step in that one must continue generating candidate group elements until one is accepted
  – for SU(3) the “quasi-heatbath” method of Cabibbo and Marinari is widely used
    – update a sequence of SU(2) subgroups
    – this is not quite an SU(3) heatbath method…
    – … but sweeping through the lattice updating SU(2) subgroups is also a valid algorithm, as long as the entire sweep is ergodic
    – for a higher acceptance rate there is an alternative SU(2) subgroup heatbath algorithm
Generalised Hybrid Monte Carlo

• In order to carry out Monte Carlo computations including the effects of dynamical fermions we would like to find an algorithm which
  – updates the fields globally
    – because single link updates are not cheap if the action is not local
  – takes large steps through configuration space
    – because small-step methods carry out a random walk, which leads to critical slowing down with a dynamical critical exponent z = 2
  – does not introduce any systematic errors
Generalised Hybrid Monte Carlo

• A useful class of algorithms with these properties is the (Generalised) Hybrid Monte Carlo (GHMC) method
  – introduce a “fictitious momentum” p corresponding to each dynamical degree of freedom q
  – find a Markov chain with fixed point ∝ exp[−H(q,p)], where H is the “fictitious Hamiltonian” H(q,p) = ½p² + S(q)
    – the action S of the underlying QFT plays the rôle of the potential in the “fictitious” classical mechanical system
    – this gives the evolution of the system in a fifth dimension, “fictitious” or computer time
  – this generates the desired distribution ∝ exp[−S(q)] if we ignore the momenta p (i.e., we take the marginal distribution)
Generalised Hybrid Monte Carlo

• The GHMC Markov chain alternates two Markov steps
  – Molecular Dynamics Monte Carlo (MDMC)
  – Partial Momentum Refreshment
• Both have the desired fixed point
• Together they are ergodic
MDMC

• If we could integrate Hamilton’s equations exactly we could follow a trajectory of constant fictitious energy
  – this corresponds to a set of equiprobable fictitious phase space configurations
  – Liouville’s theorem tells us that this also preserves the functional integral measure dp dq as required
• Any approximate integration scheme which is reversible and area preserving may be used to suggest configurations to a Metropolis accept/reject test
  – with acceptance probability min[1, exp(−δH)]
MDMC

• We build the MDMC step out of three parts
  – Molecular Dynamics (MD), an approximate integrator U(τ): (q, p) → (q′, p′) which is exactly
    – area preserving: det U* = det[∂(q′, p′)/∂(q, p)] = 1
    – reversible: F ∘ U(τ) ∘ F ∘ U(τ) = 1
  – a momentum flip F: p → −p
  – a Metropolis accept/reject step
• The composition of these gives
      (q′, p′) = [F ∘ U(τ)](q, p) if y < e^{−δH}, and (q′, p′) = (q, p) otherwise
  – with y a uniformly distributed random number in [0, 1]
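Putting the three parts together for a single degree of freedom gives the familiar HMC update. This sketch uses an illustrative quadratic action and the leapfrog as the reversible, area-preserving U(τ); the momentum flip is dropped because the momentum is refreshed completely each trajectory (θ = π/2), in which case the flips are irrelevant.

```python
import math
import random

random.seed(6)

def leapfrog(q, p, dt, nsteps, grad_s):
    """PQP leapfrog: reversible and area preserving for any step size."""
    p -= 0.5 * dt * grad_s(q)
    for _ in range(nsteps - 1):
        q += dt * p
        p -= dt * grad_s(q)
    q += dt * p
    p -= 0.5 * dt * grad_s(q)
    return q, p

def hmc(action, grad_s, q, dt, nsteps, ntraj):
    chain = []
    for _ in range(ntraj):
        p = random.gauss(0.0, 1.0)                      # momentum heatbath
        h0 = 0.5 * p * p + action(q)
        q_new, p_new = leapfrog(q, p, dt, nsteps, grad_s)
        dh = 0.5 * p_new * p_new + action(q_new) - h0
        if random.random() < min(1.0, math.exp(-dh)):   # Metropolis step
            q = q_new
        chain.append(q)
    return chain

# Toy action S(q) = q^2/2: the chain should have <q^2> = 1.
chain = hmc(lambda q: 0.5 * q * q, lambda q: q, 1.0, 0.3, 10, 20_000)
var = sum(c * c for c in chain) / len(chain)
print(var)  # near 1
```

The accept/reject step makes the algorithm exact even at this fairly coarse step size; the integrator only affects the acceptance rate.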
Partial Momentum Refreshment

• This mixes the Gaussian-distributed momenta p with Gaussian noise ξ:
      (p′, ξ′)ᵀ = [cos θ  sin θ; −sin θ  cos θ] F (p, ξ)ᵀ
• The Gaussian distribution of p is invariant under F
• the extra momentum flip F ensures that for small θ the momenta are reversed after a rejection rather than after an acceptance
• for θ = π/2 all momentum flips are irrelevant
Generalised Hybrid Monte Carlo

• Special cases
  – the Hybrid Monte Carlo (HMC) algorithm is the special case where θ = π/2
  – θ = 0 corresponds to an exact version of the Molecular Dynamics (MD) or Microcanonical algorithm (which is in general non-ergodic)
  – the L2MC algorithm of Horowitz corresponds to choosing arbitrary θ but MDMC trajectories of a single leapfrog integration step (τ = δτ); this method is also called the Kramers algorithm
  – the Langevin Monte Carlo algorithm corresponds to choosing θ = π/2 and MDMC trajectories of a single leapfrog integration step (τ = δτ)
Generalised Hybrid Monte Carlo

• Further special cases
  – the Hybrid and Langevin algorithms are approximations where the Metropolis step is omitted
  – the Local Hybrid Monte Carlo (LHMC) or Overrelaxation algorithm corresponds to updating a subset of the degrees of freedom (typically those living on one site or link) at a time
Symplectic Integrators

• Baker-Campbell-Hausdorff (BCH) formula
  – if A and B belong to any (non-commutative) algebra then e^A e^B = e^{A+B+δ}, where δ is constructed from commutators of A and B (i.e., it is in the Free Lie Algebra generated by {A, B})
  – more precisely, ln(e^A e^B) = Σ_{n≥1} cₙ, where c₁ = A + B and the higher terms are given by the recursion
      (n+1) cₙ₊₁ = ½ [A − B, cₙ] + Σ_{m≥1, 2m≤n} (B₂ₘ/(2m)!) Σ_{k₁+⋯+k₂ₘ=n, kᵢ≥1} [c_{k₁}, [⋯, [c_{k₂ₘ}, A + B] ⋯]]
  – here the B₂ₘ are Bernoulli numbers
Symplectic Integrators

• explicitly, the first few terms are
      ln(e^A e^B) = A + B + ½[A,B] + (1/12)([A,[A,B]] − [B,[A,B]]) − (1/24)[B,[A,[A,B]]] + ⋯
  – at fifth order there appear nested commutators such as [A,[A,[A,[A,B]]]], [B,[A,[A,[A,B]]]], [[A,B],[A,[A,B]]], [B,[B,[A,[A,B]]]], [[A,B],[B,[A,B]]], and [B,[B,[B,[A,B]]]], with coefficients that are multiples of 1/720
• in order to construct reversible integrators we use symmetric symplectic integrators
• the following identity follows directly from the BCH formula
      ln(e^{A/2} e^B e^{A/2}) = A + B − (1/24)([A,[A,B]] + 2[B,[A,B]]) + ⋯
  – with the analogous fifth-order nested commutators carrying coefficients that are multiples of 1/5760
Symplectic Integrators

• We are interested in finding the classical trajectory in phase space of a system described by the Hamiltonian
      H(q, p) = T(p) + S(q) = ½p² + S(q)
• the basic idea of such a symplectic integrator is to write the time evolution operator as
      exp(τ d/dt) = exp(τ [(dq/dt) ∂/∂q + (dp/dt) ∂/∂p])
                  = exp(τ [(∂H/∂p) ∂/∂q − (∂H/∂q) ∂/∂p]) ≡ e^{τĤ}
                  = exp(τ [T′(p) ∂/∂q − S′(q) ∂/∂p])
Symplectic Integrators

• define Q ≡ T′(p) ∂/∂q and P ≡ −S′(q) ∂/∂p, so that Ĥ = P + Q
• since the kinetic energy T is a function only of p and the potential energy S is a function only of q, it follows that the action of e^{δτ Q} and e^{δτ P} may be evaluated trivially:
      e^{δτ Q}: f(q, p) → f(q + δτ T′(p), p)
      e^{δτ P}: f(q, p) → f(q, p − δτ S′(q))
Symplectic Integrators

• from the BCH formula we find that the PQP symmetric symplectic integrator is given by
      U₀(δτ)^{τ/δτ} = [e^{½δτ P} e^{δτ Q} e^{½δτ P}]^{τ/δτ}
                    = exp[τ(P + Q) − (δτ² τ/24)([P,[P,Q]] + 2[Q,[P,Q]]) + O(δτ⁴)]
                    = e^{τĤ} + O(δτ²)
• in addition to conserving energy to O(δτ²), such symmetric symplectic integrators are manifestly area preserving and reversible
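The O(δτ²) energy conservation is easy to exhibit numerically. This sketch integrates the harmonic oscillator H = ½(p² + q²) (an illustrative choice) with the PQP leapfrog and checks that halving δτ reduces |δH| by roughly a factor of four.

```python
def pqp_energy_error(dt, tau):
    """|H(end) - H(start)| for S(q) = q^2/2 after trajectory length tau."""
    q, p = 1.0, 0.0
    nsteps = int(round(tau / dt))
    # PQP leapfrog: half kick, (n-1) full drift/kick pairs, final drift, half kick.
    p -= 0.5 * dt * q
    for _ in range(nsteps - 1):
        q += dt * p
        p -= dt * q
    q += dt * p
    p -= 0.5 * dt * q
    return abs(0.5 * (p * p + q * q) - 0.5)

e_coarse = pqp_energy_error(0.1, 1.0)
e_fine = pqp_energy_error(0.05, 1.0)
print(e_coarse / e_fine)  # close to 2^2 = 4
```

The residual deviation of the ratio from 4 comes from the O(δτ⁴) terms in the shadow Hamiltonian discussed on the next slides.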
Symplectic Integrators

• For each symplectic integrator there exists a Hamiltonian H′ which is exactly conserved
• For the PQP integrator we have, writing Ĥ′ = (∂H′/∂p) ∂/∂q − (∂H′/∂q) ∂/∂p,
      Ĥ′ = P + Q − (δτ²/24)([P,[P,Q]] + 2[Q,[P,Q]]) + O(δτ⁴)
  – the O(δτ⁴) term consists of degree-five nested commutators of P and Q, such as [P,[P,[P,[P,Q]]]] and [Q,[Q,[P,[P,Q]]]], with coefficients that are multiples of 1/5760, followed by O(δτ⁶) corrections
Symplectic Integrators

• substituting the explicit forms of the operators P and Q we obtain the vector field
      Ĥ′ = (∂H′/∂p) ∂/∂q − (∂H′/∂q) ∂/∂p
         = [p ∂/∂q − S′ ∂/∂p] + (δτ²/12)[2pS″ ∂/∂q + (S′S″ − p²S‴) ∂/∂p] + O(δτ⁴)
  – the O(δτ⁴) corrections (with coefficient 1/720) involve still higher derivatives of S and higher powers of p
Symplectic Integrators

• solving these differential equations for H′ we find
      H′ = H + (δτ²/24)(2p²S″ − S′²) + O(δτ⁴)
  – the O(δτ⁴) correction (with coefficient 1/720) involves terms such as p⁴S⁗ and products of S′, S″, and S‴
• note that H′ cannot be written as the sum of a p-dependent kinetic term and a q-dependent potential term
Langevin algorithm

• Consider the Hybrid Monte Carlo algorithm when we take only one leapfrog step
  – combining the leapfrog equations of motion we obtain
      φ′ = φ + δτ π − ½δτ² S′(φ)
      π′ = π − ½δτ [S′(φ) + S′(φ′)]
  – ignore the Monte Carlo step
  – recall that for θ = π/2 the fictitious momenta π are just Gaussian-distributed random noise ξ
  – let ε = δτ²; then
      φ′ = φ − ½ε S′(φ) + √ε ξ
  which is the usual form of the Langevin equation
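The Langevin update derived above can be simulated directly. For the illustrative choice S(φ) = φ²/2 the discrete chain is exactly solvable: its stationary variance is 1/(1 − ε/4) rather than 1, exhibiting the finite-step-size error of the uncorrected algorithm.

```python
import math
import random

random.seed(7)

def langevin_chain(grad_s, phi, eps, n):
    """phi' = phi - (eps/2) S'(phi) + sqrt(eps) * xi, with xi ~ N(0,1).

    No Metropolis correction is applied, so the fixed point is
    exp(-(S + Delta S)) with Delta S = O(eps)."""
    chain = []
    sqrt_eps = math.sqrt(eps)
    for _ in range(n):
        phi = phi - 0.5 * eps * grad_s(phi) + sqrt_eps * random.gauss(0.0, 1.0)
        chain.append(phi)
    return chain

# S(phi) = phi^2/2: exact variance is 1, but the finite-step chain
# equilibrates to variance 1/(1 - eps/4) = 1.0526... for eps = 0.2.
eps = 0.2
chain = langevin_chain(lambda phi: phi, 0.0, eps, 400_000)
var = sum(c * c for c in chain) / len(chain)
print(var)  # biased above 1 by about 5%
```

This bias is exactly the ΔS = O(δτ²) distortion of the fixed point analysed on the following slides.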
Inexact algorithms

• What happens if we omit the Metropolis test?
  – we still have an ergodic algorithm, so there is some unique fixed point distribution ∝ e^{−(S+ΔS)} which will be generated by the algorithm
  – the condition for this to be a fixed point is
      e^{−S(φ′)−ΔS(φ′)} = ∫ dφ dξ e^{−S(φ)−ΔS(φ)} e^{−½ξ²} δ(φ′ − U_ξ(φ))
  – for the Langevin algorithm (which corresponds to τ = δτ, a single leapfrog step) we may expand in powers of δτ and find a solution ΔS = Σₙ δτⁿ Sₙ
  – the equation determining the leading term in this expansion is the Fokker-Planck equation; the general equations are sometimes known as the Kramers-Moyal equations
Inexact algorithms

• in the general case we may change variables to (φ′, π′) = U(φ, π), whence we obtain
      e^{−S(φ′)−ΔS(φ′)} = ∫ dφ dπ e^{−H(φ,π)−ΔS(φ)} δ(φ′ − U(φ, π))
• the change of variables uses
  – area preservation: det U* = 1
  – reversibility: U⁻¹ = F ∘ U ∘ F, with F² = 1
  – invariance under the momentum flip: H ∘ F = H and S ∘ F = S
• defining δH ≡ H ∘ U⁻¹ − H and δS ≡ S ∘ U⁻¹ − S, this gives an implicit equation for ΔS,
      e^{−S(φ′)−ΔS(φ′)} = ∫ dπ′ e^{−H(φ′,π′)} e^{−(δH + ΔS ∘ U⁻¹ − ΔS)(φ′,π′)}
  which may be solved order by order in δτ
Inexact algorithms

• since H is extensive so is δH, and thus so is the connected generating function (cumulants) W defined by
      ⟨e^{−δH}⟩ = e^{W}
• we can thus show order by order in δτ that ΔS is extensive too
Inexact algorithms

• For the Langevin algorithm we have δH = O(δτ⁴), so we immediately find that ΔS = O(δτ²)
      S₂ = ⅛ Σⱼ (2 S_{jj} − S_j²)
  – (subscripts denote differentiation with respect to the field at the indicated sites)
  – the next correction S₄ (with coefficient 1/48) is built from contractions such as S_{ijjkk}, S_{ijjk}S_k, S_{ijk}S_jS_k, and S_{ijk}S_{ijk}
  – if S is local then so are all the Sₙ
  – we only have an asymptotic expansion for ΔS, and in general the subleading terms are neither local nor extensive
  – therefore, taking the continuum limit a → 0 for fixed δτ will probably not give exact results for renormalised quantities
Inexact algorithms

• For the Hybrid algorithm δH = O(δτ²) and Sₙ = O(δτ⁰), so again we find ΔS = O(δτ²)
  – we have made use of the fact that H′ is conserved and differs from H by O(δτ²)
Inexact algorithms

• What effect do such systematic errors in the distribution have on measurements?
  – large errors will occur where observables are discontinuous functions of the parameters in the action, e.g., near phase transitions
  – if we want to improve the action so as to reduce lattice discretisation effects and get a better approximation to the underlying continuum theory, then we have to ensure that δτ² ≪ aⁿ for the appropriate n
  – step size errors need not be irrelevant
Local HMC

• Consider the Gaussian model defined by the free field action
      S(φ) = ½ Σₓ [Σ_{μ=1}^d (∂_μ φ(x))² + m² φ(x)²]
• introduce the Hamiltonian H = ½π² + S(φ) on “fictitious” phase space
  – the corresponding equation of motion for the single site x is
      φ̈ₓ = −ω² φₓ + F, where ω² = 2d + m² and F = Σ_{y: |x−y|=1} φ_y
  – the solution in terms of the Gaussian-distributed random initial momentum πₓ and the initial field value φₓ is
      φₓ(t) = φₓ cos(ωt) + (F/ω²)(1 − cos(ωt)) + (πₓ/ω) sin(ωt)
Local HMC

• identify ζ ≡ 1 − cos(ωt), and note that sin(ωt) = √(ζ(2−ζ)), to get the usual Adler overrelaxation update
      φₓ → (1 − ζ) φₓ + ζ F/ω² + √(ζ(2−ζ)) πₓ/ω
• For gauge theories various overrelaxation methods have been suggested
  – Hybrid Overrelaxation: this alternates a heatbath step with many overrelaxation steps with ζ = 2
  – LHMC: this uses an analytic solution of the equations of motion for the update of a single U(1) or SO(2) subgroup at a time; in this case the equations of motion may be solved in terms of elliptic functions
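The Adler update above preserves the single-site Gaussian distribution for any 0 < ζ < 2, which can be checked numerically. In this sketch the neighbour sum F is frozen and the parameter values are illustrative assumptions.

```python
import math
import random

random.seed(8)

def adler_update(phi, F, omega2, zeta):
    """Over-relaxed single-site update for the Gaussian model."""
    omega = math.sqrt(omega2)
    eta = random.gauss(0.0, 1.0)   # plays the role of the refreshed momentum
    return ((1.0 - zeta) * phi + zeta * F / omega2
            + math.sqrt(zeta * (2.0 - zeta)) * eta / omega)

F, omega2, zeta = 3.0, 5.0, 1.7    # frozen neighbour sum; omega^2 = 2d + m^2
phi, samples = 0.0, []
for _ in range(100_000):
    phi = adler_update(phi, F, omega2, zeta)
    samples.append(phi)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # near F/omega^2 = 0.6 and 1/omega^2 = 0.2
```

ζ near 2 gives strongly anticorrelated successive values, which is what makes overrelaxation decorrelate faster than a plain heatbath sweep.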
Classical Mechanics on Group Manifolds

• Gauge fields take their values in some Lie group, so we need to define classical mechanics on a group manifold which preserves the group-invariant Haar measure
  – a Lie group G is a smooth manifold on which there is a natural mapping L: G × G → G defined by the group action
  – this induces a map called the pull-back of L, together with the corresponding maps on the tangent and cotangent bundles:
      L*: G × F → F,       L*_g f = f ∘ L_g
      L_*: G × TG → TG,    (L_{g*} v)(f) = v(L*_g f)
      L*: G × T*G → T*G,   (L*_g α)(v) = α(L_{g*} v)
  – F is the space of 0 forms, which are smooth mappings from G to the real numbers
Classical Mechanics on Group Manifolds

• a form α is left invariant if L*_g α = α
• the tangent space to a Lie group at the origin is called the Lie algebra, and we may choose a set of basis vectors eᵢ(0) which satisfy the commutation relations [eᵢ, eⱼ] = Σₖ cᵢⱼᵏ eₖ, where the cᵢⱼᵏ are the structure constants of the algebra
• we may define a set of left invariant vector fields on TG by eᵢ(g) = L_{g*} eᵢ(0)
• the corresponding left invariant dual forms θⁱ satisfy the Maurer-Cartan equations
      dθⁱ = −½ Σⱼₖ cⱼₖⁱ θʲ ∧ θᵏ
• we may therefore define a closed symplectic 2 form, which globally defines an invariant Poisson bracket, by
      ω = d(Σᵢ θⁱ pᵢ) = Σᵢ (θⁱ ∧ dpᵢ + pᵢ dθⁱ) = Σᵢ θⁱ ∧ dpᵢ − ½ Σᵢⱼₖ pᵢ cⱼₖⁱ θʲ ∧ θᵏ
Classical Mechanics on Group Manifolds

• We may now follow the usual procedure to find the equations of motion
  – introduce a Hamiltonian function (0 form) H on the cotangent bundle (phase space) over the group manifold
  – define a vector field h such that dH(y) = ω(h, y) for all y
  – the classical trajectories t → (Q(t), P(t)) are then the integral curves of h
  – in the natural basis we expand h = Σᵢ (hⁱ eᵢ + h̃ᵢ ∂/∂pᵢ) and y = Σᵢ (yⁱ eᵢ + ỹᵢ ∂/∂pᵢ), so that
      dH(y) = Σᵢ [(eᵢH) yⁱ + (∂H/∂pᵢ) ỹᵢ] = ω(h, y) = Σᵢ (hⁱ ỹᵢ − h̃ᵢ yⁱ) − Σᵢⱼₖ pᵢ cⱼₖⁱ hʲ yᵏ
Classical Mechanics on Group Manifolds

• equating coefficients of the components of y we find
      h = Σᵢ (∂H/∂pᵢ) eᵢ − Σᵢ [eᵢH − Σⱼₖ cⱼᵢᵏ pₖ (∂H/∂pⱼ)] ∂/∂pᵢ
• the equations of motion in the local coordinate basis eⱼ = Σᵢ eⱼⁱ ∂ᵢ are therefore
      Q̇ⁱ = Σⱼ (∂H/∂pⱼ) eⱼⁱ,   Ṗᵢ = −eᵢH + Σⱼₖ cⱼᵢᵏ pₖ (∂H/∂pⱼ)
• for a Hamiltonian of the form H = ½p² + S(q) these reduce to
      Q̇ʲ = Σᵢ pᵢ eᵢʲ,   Ṗⱼ = −Σₖ eⱼᵏ ∂S/∂qᵏ
  – the structure-constant term drops out here because the totally antisymmetric cⱼᵢᵏ is contracted with the symmetric combination pₖ pⱼ
Classical Mechanics on Group Manifolds

In terms of constrained variables U(q) = e^{q^i T_i}
 the representation of the generators is T_i = ∂U(g)/∂g^i |_{g=0}
 from which it follows that e_i U = T_i U
 and for the Hamiltonian H = ½ Σ_i p_i² + S(U) this leads to the
equations of motion
U̇ = P U
Ṗ = −Σ_i T_i [ (∂S/∂U_ab) (T_i U)_ab + (∂S/∂U†_ab) (U† T_i†)_ab ] ≡ −T(S′(U) U)
 for the case G = SU(n) the operator T is the projection onto
traceless antihermitian matrices
Classical Mechanics on Group Manifolds

Discrete equations of motion
 we can now easily construct a discrete PQP symmetric
integrator from these equations
P(½δτ) = P(0) − ½δτ T(S′(U(0)) U(0))
U(δτ) = exp(δτ P(½δτ)) U(0)
P(δτ) = P(½δτ) − ½δτ T(S′(U(δτ)) U(δτ))
 the exponential map from the Lie algebra to the Lie group may
be evaluated exactly using the Cayley–Hamilton theorem
– all functions of an n × n matrix M may be written as a polynomial
of degree n − 1 in M
– the coefficients of this polynomial can be expressed in terms of
the invariants (traces of powers) of M
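The Cayley–Hamilton remark can be made concrete for SU(2): a traceless antihermitian 2 × 2 matrix A satisfies A² = −|a|²·1, so every power of A collapses onto the span of {1, A}, giving exp(A) = cos|a|·1 + (sin|a|/|a|)·A exactly. A minimal sketch, with hand-rolled 2 × 2 matrix helpers (the particular matrix entries are illustrative, not from the slides):

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def axpy(c, A, B):
    # c*A + B, entrywise
    return [[c * A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# a traceless antihermitian matrix A = i (a . sigma), an element of su(2)
a1, a2, a3 = 0.3, -0.4, 0.2
A = [[1j * a3, a2 + 1j * a1],
     [-a2 + 1j * a1, -1j * a3]]
r = math.sqrt(a1 * a1 + a2 * a2 + a3 * a3)  # then A @ A = -r^2 * identity

I = [[1.0, 0.0], [0.0, 1.0]]
Z = [[0.0, 0.0], [0.0, 0.0]]

# Cayley-Hamilton closed form: a degree-1 polynomial in A
exp_poly = axpy(math.sin(r) / r, A, axpy(math.cos(r), I, Z))

# reference: truncated Taylor series sum_n A^n / n!
exp_taylor, term = I, I
for n in range(1, 25):
    term = matmul(term, A)
    term = [[t / n for t in row] for row in term]
    exp_taylor = axpy(1.0, term, exp_taylor)

err = max(abs(exp_poly[i][j] - exp_taylor[i][j])
          for i in range(2) for j in range(2))
```

For SU(3) the same idea needs a degree-2 polynomial whose coefficients follow from tr A² and tr A³, as the slide states.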
Dynamical fermions

Fermion fields are Grassmann valued
 required to satisfy the spin-statistics theorem
 even “classical” Fermion fields obey anticommutation relations
 Grassmann algebras behave like negative dimensional
manifolds
Grassmann algebras

Grassmann algebras
 linear space spanned by generators {θ₁, θ₂, θ₃, …} with
coefficients a, b, … in some field (usually the real or complex
numbers)
 algebra structure defined by nilpotency condition for elements
of the linear space: θ² = 0
– there are many elements of the algebra which are not in the linear
space (e.g., θ₁θ₂)
– nilpotency implies anticommutativity θξ + ξθ = 0
• 0 = θ² = ξ² = (θ + ξ)² = θ² + θξ + ξθ + ξ² = θξ + ξθ = 0
– anticommutativity implies nilpotency 2θ² = 0 (unless the
coefficient field has characteristic 2, i.e., 2 = 0)
Grassmann algebras
 Grassmann algebras have a natural grading corresponding to
the number of generators in a given product
– deg(1) = 0, deg(θᵢ) = 1, deg(θᵢθⱼ) = 2, …
– all elements of the algebra can be decomposed into a sum of
terms of definite grading
– the parity transform of θ is P(θ) = (−1)^{deg θ} θ
– a natural antiderivation ∂ is defined on a Grassmann algebra
• linearity: ∂(aθ + bξ) = a ∂θ + b ∂ξ
• anti-Leibniz rule: ∂(θξ) = (∂θ)ξ + P(θ)(∂ξ)
 definite integration on a Grassmann algebra is defined to be
the same as derivation
– hence a linear change of variables θᵢ = aᵢⱼξⱼ leads to the inverse Jacobian
∫dθ₁ ⋯ dθₙ = det(∂θᵢ/∂ξⱼ)⁻¹ ∫dξ₁ ⋯ dξₙ
Grassmann algebras
 Gaussians over Grassmann manifolds
– like all functions, Gaussians are polynomials over Grassmann
algebras
e^{Σᵢⱼ θᵢ aᵢⱼ θⱼ} = ∏ᵢⱼ e^{θᵢ aᵢⱼ θⱼ} = ∏ᵢⱼ (1 + θᵢ aᵢⱼ θⱼ)
– there is no reason for this function to be positive even for real
coefficients
Grassmann algebras
 Gaussian integrals over Grassmann manifolds
∫dθ₁ ⋯ dθₙ e^{Σᵢⱼ θᵢ aᵢⱼ θⱼ} = Pf(a)
– where Pf(a) is the Pfaffian of the antisymmetric matrix a
• Pf(a)² = det(a)
– if we separate the Grassmann variables into two “conjugate” sets
then we find the more familiar result
∫dθ₁ ⋯ dθₙ dθ̄₁ ⋯ dθ̄ₙ e^{Σᵢⱼ θ̄ᵢ aᵢⱼ θⱼ} = det(a)
– despite the notation, this is a purely algebraic identity
– it does not require the matrix a > 0, unlike the bosonic analogue
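The identity Pf(a)² = det(a) can be checked directly for a small antisymmetric matrix: for 4 × 4 the Pfaffian has the closed form a₀₁a₂₃ − a₀₂a₁₃ + a₀₃a₁₂. A quick numerical check (the matrix entries below are arbitrary illustrative numbers):

```python
import itertools

def det(m):
    # Leibniz formula, fine for a 4x4 matrix
    n = len(m)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1.0
        for i in range(n):
            prod *= m[i][perm[i]]
        total += sign * prod
    return total

def pfaffian4(a):
    # Pfaffian of a 4x4 antisymmetric matrix
    return a[0][1] * a[2][3] - a[0][2] * a[1][3] + a[0][3] * a[1][2]

a = [[0.0, 1.7, -0.3, 2.2],
     [-1.7, 0.0, 0.8, -1.1],
     [0.3, -0.8, 0.0, 0.5],
     [-2.2, 1.1, -0.5, 0.0]]

lhs = pfaffian4(a) ** 2
rhs = det(a)
```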
Dynamical fermions

Pseudofermions
 direct simulation of Grassmann fields is not feasible
– the problem is not that of manipulating anticommuting values in a
computer
– it is that e^{−S_F} = e^{−ψ̄Mψ} is not positive, and thus we get poor
importance sampling
 we therefore integrate out the fermion fields to obtain the
fermion determinant ∫dψ̄ dψ e^{−ψ̄Mψ} = det M
– ψ̄ and ψ always occur quadratically
– the overall sign of the exponent is unimportant
Dynamical fermions
 any operator Ω can be expressed solely in terms of the
bosonic fields
⟨Ω(ψ̄, ψ, φ)⟩ = ⟨Ω(∂/∂η, ∂/∂η̄, φ) e^{η̄ M⁻¹ η}⟩ |_{η = η̄ = 0}
– e.g., the fermion propagator is
G(x, y) = ⟨ψ(x) ψ̄(y)⟩ = M⁻¹(x, y)
Dynamical fermions
 including the determinant as part of the observable to be
measured is not feasible
⟨Ω⟩ = ⟨Ω det M⟩_{S_B} / ⟨det M⟩_{S_B}
– the determinant is extensive in the lattice volume, thus again we
get poor importance sampling
 represent the fermion determinant as a bosonic Gaussian
integral with a non-local kernel
det M ∝ ∫dφ† dφ e^{−φ† M⁻¹ φ}
 the fermion kernel must be positive definite (all its eigenvalues
must have positive real parts) otherwise the bosonic integral
will not converge
 the new bosonic fields are called “pseudofermions”
Dynamical fermions
 it is usually convenient to introduce two flavours of fermion and
to write
(det M)² = det(M†M) ∝ ∫dφ† dφ e^{−φ† (M†M)⁻¹ φ}
 this not only guarantees positivity, but also allows us to
generate the pseudofermions from a global heatbath by
applying M† to a random Gaussian distributed field
 the equations of motion for the boson (gauge) fields are
π̇ = −δS_B/δU + φ† (M†M)⁻¹ [δ(M†M)/δU] (M†M)⁻¹ φ
– using δ(A⁻¹) = −A⁻¹ (δA) A⁻¹
 the evaluation of the pseudofermion action and the
corresponding force then requires us to find the solution of a
(large) set of linear equations, (M†M) x = φ
Dynamical fermions

It is not necessary to carry out the inversions
required for the equations of motion exactly
 there is a trade-off between the cost of computing the force
and the acceptance rate of the Metropolis MDMC step

The inversions required to compute the
pseudofermion action for the accept/reject step
do need to be computed exactly, however
 we usually take “exactly” to be synonymous with “to machine
precision”
– this may well not be true, as we will discuss later
Krylov solvers

One of the main reasons why dynamical fermion lattice
calculations are feasible is the existence of very
effective numerical methods for solving large sparse
systems of linear equations
 family of iterative methods based on Krylov spaces
– Conjugate Gradients (CG, CGNE)
– BiConjugate Gradients (BiCG, BiCGstab, BiCGγ5)
 these are often introduced as exact methods
– they require O(V) iterations to find the solution
– they do not give the exact answer in practice because of rounding
errors
– they are more naturally thought of as methods for solving systems
of linear equations in an (almost) ∞-dimensional linear space
• this is what we are approximating on the lattice anyway
Krylov solvers

BiCG on a Banach space
 we want to solve the equation Ax = b on a Banach space
– this is a normed linear space
– the norm endows the space with a topology
– the linear space operations are continuous in this topology
 we solve the system exactly in a Krylov subspace
K_n = span{b, Ab, A²b, A³b, …, Aⁿb}
 there is no concept of “orthogonality” in a Banach space, so
we also need to introduce a dual Krylov space of linear
functionals on the Banach space
K̂_n = span{b̂, A†b̂, A†²b̂, A†³b̂, …, A†ⁿb̂}
 the adjoint of a linear operator is defined by
(A†b̂)(x) = b̂(Ax) ∀x
Krylov solvers
 for iteration n we choose a search direction p_{n+1}
 such that if we add a multiple of it to the previous solution
x_{n+1} = x_n + α p_{n+1}
 then the resulting residual satisfies
r_{n+1} = b − A x_{n+1} = r_n − α A p_{n+1} ∈ ker K̂_n
 we can then choose α such that r_{n+1} ∈ ker K̂_{n+1}
 it suffices to use the three term recurrence
p_{n+1} = r_n + β p_n + γ p_{n−1}
 because A p_j ∈ ker K̂_n ∀j ≤ n − 2
 this process converges in the norm topology (if the solution
exists!)
 the norm of the residual does not decrease monotonically
Krylov solvers

CG on Hilbert space
 if our topological normed linear space has a sesquilinear inner
product then it is called a Hilbert space
 this allows us to define a natural mapping from vectors to
linear functionals f : x ↦ x̂ by x̂(y) = (x, y) ∀y
 the operator A is self adjoint if A† = A, i.e., (Ax, y) = (x, Ay) ∀x, y
 if A is self adjoint then we can choose K̂_n = f(K_n)
 this is the CG algorithm
 the norm decreases monotonically in this case
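For a self-adjoint positive definite A the CG recurrences described above take only a few lines. A minimal sketch on an illustrative 3 × 3 system (plain Python; a lattice implementation would replace the dense matvec with the sparse Dirac operator):

```python
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg(A, b, tol=1e-12, maxiter=100):
    # Conjugate Gradients for a self-adjoint positive definite matrix A
    x = [0.0] * len(b)
    r = b[:]                  # residual r0 = b - A x0 with x0 = 0
    p = r[:]                  # first search direction
    rs = dot(r, r)
    for _ in range(maxiter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = cg(A, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
```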
Higher order integrators

Campostrini “wiggles”
 since U₀(δτ) = e^{δτ Ĥ + δτ³ R₀ + O(δτ⁵)} we are led to consider
the Campostrini “wiggle”
Û₀(δτ) = U₀(δτ) U₀(−αδτ) U₀(δτ) = e^{(2−α)δτ Ĥ + (2−α³)δτ³ R₀ + O(δτ⁵)}
 this can be adjusted to give an integration scheme correct to O(δτ⁴)
by choosing α³ = 2, i.e., α = 2^{1/3}
 the step size may be kept fixed by taking δτ → δτ/(2 − α)
 to get integration schemes of arbitrarily high order, all of which
are still reversible and area preserving, we use the recursive
definition
Û_{n+1}(δτ) = U_n(δτ) U_n(−α_n δτ) U_n(δτ)
= e^{(2−α_n)δτ Ĥ + (2−α_n^{2n+3})δτ^{2n+3} R_{n+1} + O(δτ^{2n+5})}
 with α_n = 2^{1/(2n+3)} and δτ → δτ/(2 − α_n)
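At n = 0 the recursion is the familiar fourth-order scheme built from three leapfrog substeps of lengths γδτ, −αγδτ, γδτ with α = 2^{1/3} and γ = 1/(2 − α). A sketch for a unit-frequency harmonic oscillator (the oscillator and step sizes are illustrative assumptions, not from the slides) shows the endpoint error dropping roughly as δτ⁴ instead of δτ²:

```python
import math

def leapfrog(q, p, dt):
    # one PQP step for H = (p^2 + q^2)/2
    p -= 0.5 * dt * q
    q += dt * p
    p -= 0.5 * dt * q
    return q, p

ALPHA = 2.0 ** (1.0 / 3.0)
GAMMA = 1.0 / (2.0 - ALPHA)

def wiggle(q, p, dt):
    # one "wiggle": substeps gamma*dt, -alpha*gamma*dt, gamma*dt
    for s in (GAMMA * dt, -ALPHA * GAMMA * dt, GAMMA * dt):
        q, p = leapfrog(q, p, s)
    return q, p

def endpoint_error(stepper, dt, t=2.0):
    q, p = 1.0, 0.0
    for _ in range(round(t / dt)):
        q, p = stepper(q, p, dt)
    return abs(q - math.cos(t))  # exact trajectory is q(t) = cos(t)

ratio_lf = endpoint_error(leapfrog, 0.2) / endpoint_error(leapfrog, 0.1)
ratio_wg = endpoint_error(wiggle, 0.2) / endpoint_error(wiggle, 0.1)
```

Halving δτ should shrink the leapfrog error by about 2² and the wiggle error by about 2⁴.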
Higher order integrators

So why aren’t they used in practice?
 consider the simplest leapfrog scheme for a single simple
harmonic oscillator with frequency ω
 for a single step we have
(q(t+δτ), p(t+δτ))ᵀ = M (q(t), p(t))ᵀ, where
M = [ 1 − ½ω²δτ²             δτ          ]
    [ −ω²δτ(1 − ¼ω²δτ²)      1 − ½ω²δτ²  ]
 the eigenvalues of this matrix are
λ± = 1 − ½(ωδτ)² ± iωδτ √(1 − ¼(ωδτ)²) = e^{±i cos⁻¹(1 − ½(ωδτ)²)}
 for ωδτ > 2 the integrator becomes unstable
Higher order integrators
 the orbits change from circles to hyperbolae
 the energy diverges exponentially in τ instead of oscillating
 for bosonic systems δH ≫ 1 for such a large integration step
size
 for light dynamical fermions there seem to be a few “modes”
which have a large force due to the small eigenvalues of the
Dirac operator, and this force plays the rôle of the frequency ω
 for these systems δτ is limited by stability rather than by the
Metropolis acceptance rate
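The ωδτ = 2 threshold is easy to see numerically: below it the leapfrog energy stays bounded, above it the trajectory hops onto hyperbolae and the energy grows exponentially. A minimal sketch (unit frequency; the particular step sizes are illustrative):

```python
def max_energy(om, dt, steps=200):
    # track the maximum of H = (p^2 + om^2 q^2)/2 along a leapfrog trajectory
    q, p = 1.0, 0.0
    hmax = 0.0
    for _ in range(steps):
        p -= 0.5 * dt * om * om * q
        q += dt * p
        p -= 0.5 * dt * om * om * q
        hmax = max(hmax, 0.5 * p * p + 0.5 * om * om * q * q)
    return hmax

stable = max_energy(1.0, 1.9)    # om*dt = 1.9 < 2: energy oscillates, stays O(1)
unstable = max_energy(1.0, 2.1)  # om*dt = 2.1 > 2: energy diverges
```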
Instabilities

Some recent full QCD
measurements on large
lattices with quite light
quarks
 large CG residual: always
unstable
 small CG residual: unstable
for δτ > 0.0075
 intermediate CG residual:
unstable for δτ > 0.0075
16³ × 32 lattice, κ = 0.1355, β = 5.2, c_SW = 2.0171
Acceptance rates

We can compute the average Metropolis
acceptance rate ⟨P_acc⟩
 area preservation implies that ⟨e^{−δH}⟩ = 1, since
(1/Z) ∫dq dp e^{−H(q,p)} e^{−δH} = (1/Z) ∫dq dp e^{−H(q′,p′)}
= (1/Z) ∫dq′ dp′ e^{−H(q′,p′)} = 1
 the probability distribution of δH has an asymptotic expansion as the
lattice volume V → ∞
P(δH) = (1/Z) ∫dq dp e^{−H} δ(H′ − H − δH)
~ (1/√(4π⟨δH⟩)) exp(−(δH − ⟨δH⟩)² / (4⟨δH⟩))
 the average Metropolis acceptance rate is thus
⟨P_acc⟩ = ⟨min(1, e^{−δH})⟩ ~ erfc(½√⟨δH⟩) = erfc(√(⟨δH²⟩/8))
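The erfc formula can be checked against a direct numerical average of min(1, e^{−δH}) over the asymptotic Gaussian distribution of δH (mean μ = ⟨δH⟩, variance 2μ, as fixed by ⟨e^{−δH}⟩ = 1). A sketch using Simpson quadrature (the values of μ are illustrative):

```python
import math

def avg_acceptance(mu, n=20001):
    # <min(1, exp(-dH))> over a Gaussian with mean mu and variance 2*mu
    sigma = math.sqrt(2.0 * mu)
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 1.0 if i in (0, n - 1) else (4.0 if i % 2 else 2.0)  # Simpson weights
        gauss = math.exp(-(x - mu) ** 2 / (4.0 * mu)) / math.sqrt(4.0 * math.pi * mu)
        total += w * gauss * min(1.0, math.exp(-x))
    return total * h / 3.0

worst = max(abs(avg_acceptance(mu) - math.erfc(0.5 * math.sqrt(mu)))
            for mu in (0.1, 0.5, 2.0))
```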
Acceptance rates

The acceptance rate is a function of the
variable x = V δτ^{4n+4} for the nth order Campostrini
“wiggle” used to generate trajectories with
τ = 1, and x = V δτ^{4n+6} for τ = δτ
Cost and dynamical critical exponents

To a good approximation, the cost C of a
Molecular Dynamics computation is
proportional to the total time for which we have
to integrate Hamilton’s equations.
 the cost per independent configuration is then
C ∝ (1 + 2A_Ω) τ / δτ
 note that the cost depends on the particular operator under
consideration
 the optimal trajectory length is obtained by minimising the cost
as a function of the parameters of the algorithm
Cost and dynamical critical exponents

While the cost depends upon the details of the
implementation of the algorithm, the way that it
scales with the correlation length ξ of the
system is an intrinsic property of the algorithm;
C ∝ ξ^z where z is the dynamical critical
exponent.
 for local algorithms the cost is independent of the trajectory
length, C ∝ 1 + 2A_Ω, and thus minimising the cost is
equivalent to minimising A_Ω
 free field theory analysis is useful for understanding and
optimising algorithms, especially if our results do not depend
on the details of the spectrum
Cost and dynamical critical exponents

HMC
 for free field theory we can show that choosing τ ∝ ξ gives
z = 1
 this is to be compared with z = 2 for constant τ
 the optimum cost for HMC is thus
C ∝ V (ξ/δτ) ∝ V^{5/4} ξ  (keeping V δτ⁴ fixed, so δτ ∝ V^{−1/4})
 for nth order Campostrini integration (if it were stable) the cost
is
C ∝ V (ξ/δτ) ∝ V^{(4n+5)/(4n+4)} ξ  (keeping V δτ^{4n+4} fixed)
Cost and dynamical critical exponents

LMC
 for free field theory we can show that choosing τ = δτ gives
z = 2
 the optimum cost for LMC is thus
C ∝ V (ξ/δτ)² ∝ V^{4/3} ξ²  (keeping V δτ⁶ fixed, so δτ ∝ V^{−1/6})
 for nth order Campostrini integration (if it were stable) the cost
is
C ∝ V (ξ/δτ)² ∝ V^{(2n+4)/(2n+3)} ξ²  (keeping V δτ^{4n+6} fixed)
Cost and dynamical critical exponents

L2MC (Kramers)
 in the approximation of unit acceptance rate we find that setting
τ = δτ and suitably tuning the momentum mixing angle we can
arrange to get z = 1
 however, if P_acc ≠ 1 then the system will carry out a random
walk backwards and forwards along a trajectory because the
momentum, and thus the direction of travel, must be reversed
upon a Metropolis rejection.
Cost and dynamical critical exponents
 a simple-minded analysis is that the average time between
rejections must be O(ξ) to achieve z = 1
 this time is approximately
Σ_{n=0}^∞ P_acc^n (1 − P_acc) n δτ = P_acc δτ / (1 − P_acc)
 for small δτ we have 1 − P_acc ∝ erf(k√(V δτ⁶)), hence we are
required to scale δτ so as to keep V δτ⁴ ξ² fixed
 this leads to a cost for L2MC of
C ∝ V (ξ/δτ) ∝ V^{5/4} ξ^{3/2}
 or using nth order Campostrini integration
C ∝ V (ξ/δτ) ∝ V^{(4n+5)/(4n+4)} ξ^{(2n+3)/(2n+2)}
 a more careful free field theory analysis leads to the same
conclusions
Cost and dynamical critical exponents

Optimal parameters for
GHMC
 (almost) analytic calculation in
free field theory
 minimum cost for GHMC
appears for acceptance
probability in the range 40%–90%
 very similar to HMC
 minimum cost for L2MC
(Kramers) occurs for
acceptance rate very close
to 1
 and this cost is much larger
Reversibility

Are HMC trajectories reversible and area
preserving in practice?
 the only fundamental source of irreversibility is the rounding
error caused by using finite precision floating point arithmetic
– for fermionic systems we can also introduce irreversibility by
choosing the starting vector for the iterative linear equation solver
time-asymmetrically
• we might do this if we want to use (some extrapolation of) the
previous solution as the starting vector
– floating point arithmetic is not associative
– it is more natural to store compact variables as scaled integers
(fixed point)
• saves memory
• does not solve the precision problem
Reversibility
 data for SU(3) gauge theory
and QCD with heavy quarks
show that rounding errors are
amplified exponentially
– the underlying continuous time
equations of motion are
chaotic
– Liapunov exponents
characterise the divergence of
nearby trajectories
– the instability in the integrator
occurs when δH ≫ 1
• zero acceptance rate
anyhow
Reversibility
 in QCD the Liapunov exponents appear to scale with β as the
system approaches the continuum limit β → ∞
– νξ = constant, where ν is the Liapunov exponent
– this can be interpreted as saying that the Liapunov exponent
characterises the chaotic nature of the continuum classical
equations of motion, and is not a lattice artefact
– therefore we should not have to worry about reversibility breaking
down as we approach the continuum limit
– caveat: data is only for small lattices, and is not conclusive
Reversibility
 data for QCD with
lighter dynamical quarks
– instability occurs close
to the region in δτ where
the acceptance rate is near
one
• may be explained
as a few “modes”
becoming unstable
because of large
fermionic force
– integrator goes
unstable if too poor an
approximation to the
fermionic force is used
16³ × 32 lattice, κ = 0.1355, β = 5.2, c_SW = 2.0171
Update ordering for LHMC

Consider the update of a single site x using the
LHMC algorithm
φ(x) ← φ(x) + δτ π(x) + ½δτ² F(x)
 the values of φ at all other sites are left unchanged
 this may be written in matrix form as φ′ = M_x φ + P_x π, where
(M_x)_yz = δ_yz − δ_xy [ ½δτ²(2d + m²) δ_yz − ½δτ² Σ_μ (δ_{y+μ̂,z} + δ_{y−μ̂,z}) ]
(P_x)_yz = δτ δ_yz δ_xy
Update ordering for LHMC
 for a complete sweep through the lattice consisting of exactly
V = L^d updates in some order x₁, x₂, …, x_V, we have
φ′ = M_{x_V} ⋯ (M_{x₂}(M_{x₁} φ + P_{x₁} π) + P_{x₂} π) ⋯ ≡ M φ + P π
 it is convenient to introduce the Fourier transformed fields φ̃
and momenta π̃
 the corresponding Fourier transformed single site update
matrix is
(M̃_x)_pq = δ_pq + (δτ²/V) e^{2πi(p−q)·x/L} [ Σ_μ (cos(2πq_μ/L) − 1) − ½m² ]
Update ordering for LHMC
 we want to compute integrated autocorrelation functions for
operators like the magnetic susceptibility χ = ⟨φ̃₀²⟩ − ⟨φ̃₀⟩²
– the behaviour of the autocorrelations of linear operators such as the
magnetisation φ̃₀ can be misleading
 in order to do this it is useful to consider quadratic operators as linear
operators on quadratic monomials in the fields, Φ̃_pq ≡ φ̃_p φ̃_q, so
χ = ⟨Φ̃₀₀⟩ − ⟨φ̃₀⟩²
– these quadratic monomials are updated according to
Φ̃′_pq = φ̃′_p φ̃′_q = (M̃ φ̃ + P̃ π̃)_p (M̃ φ̃ + P̃ π̃)_q
– so, after averaging over the momenta we have Φ̃′ = M^Q Φ̃ + P^Q
– where (P^Q)_pq = Σ_r P̃_pr P̃_qr and (M^Q)_pq,rs = M̃_pr M̃_qs
Update ordering for LHMC
 the integrated autocorrelation function for χ is
1 + 2A_χ = Σ_{t=0}^∞ C_χ(t) / C_χ(0)
– where C_χ(t) = ⟨Φ̃₀₀(t) Φ̃₀₀(0)⟩ − ⟨Φ̃₀₀⟩²
– summing the geometric series of powers of M^Q expresses this in
terms of [(1 − M^Q)⁻¹]₀₀,₀₀
Update ordering for LHMC

Random updates
 the update matrix for a sweep after averaging over the V
independent choices of update sites is just
M̃^Q = ⟨M̃^Q_{z₁} ⋯ M̃^Q_{z_V}⟩ = ((1/V) Σ_z M̃^Q_z)^V
 we thus find that 1 + 2A_χ = [(1 − M̃^Q)⁻¹]₀₀,₀₀ ∝ 1/m² for small
mass m, up to O(1/V) corrections
 which tells us that z = 2 for any choice of the overrelaxation
parameter
Update ordering for LHMC

Even/Odd updates
 in a single site update the new value of the field at x only
depends on the old value at x and its nearest neighbours, so
even sites depend only on odd sites and vice versa
– the update matrix for a sweep in Fourier space thus reduces to a
2 × 2 matrix coupling φ̃_p and φ̃_{p + (L/2)(1,…,1)}
– M̃^Q thus becomes a 3 × 3 matrix in this case
 the integrated autocorrelation function can be computed in
closed form
– it is minimised by choosing the overrelaxation parameter to be
2 − O(m), for which 1 + 2A_χ = O(1/m)
– hence z = 1
Update ordering for LHMC

Lexicographical updates
 this is a scheme in which φ(x + μ̂) is always updated before φ(x)
– except at the edges of the lattice
 a single site update depends on the new values of the
neighbouring sites in the +μ directions and the old values in
the −μ directions
– so we may write the sweep update implicitly as
φ_new = O φ_new + N φ_old + P π
– where O couples x to its already updated neighbours (in the +μ
directions) and N couples x to itself and its not yet updated
neighbours (in the −μ directions), up to corrections at the
edges of the lattice
Update ordering for LHMC
 the sweep update matrices are easily found explicitly
M = (1 − O)⁻¹ N,  P = (1 − O)⁻¹ P
 the Fourier transforms Õ and Ñ are diagonal up to surface
terms R̃, which we expect to be suppressed by a factor of 1/L
 we thus obtain a closed form for 1 + 2A_χ, which is minimised
by choosing the overrelaxation parameter to be (m² + 2d)/(m² + d)
– for which A_χ ≈ 0 and hence z = 0
 this can be achieved for most operators
– though with different values of the overrelaxation parameter
– the dynamical critical exponent for the exponential autocorrelation
time is still one at best
Chebyshev approximation

What is the best polynomial approximation p(x)
to a continuous function f(x) for x in [0,1]?
 Weierstrass’ theorem: any continuous function can be
arbitrarily well approximated by a polynomial
– Bernstein polynomials:
p_n(t) = Σ_{k=0}^n f(k/n) (n choose k) t^k (1 − t)^{n−k}
 minimise the appropriate norm
‖p − f‖_n = [ ∫₀¹ dx |p(x) − f(x)|^n ]^{1/n}  where n ≥ 1
 Chebyshev’s theorem
– the error |p(x) − f(x)| reaches its maximum at exactly d + 2 points on
the unit interval
– there is always a unique polynomial of any degree d which
minimises ‖p − f‖_∞ = max_{0≤x≤1} |p(x) − f(x)|
Chebyshev approximation
– necessity:
• suppose p - f has less than d + 2 extrema of equal magnitude
• then at most d + 1 maxima exceed some magnitude
• this defines a “gap”
• we can construct a polynomial q of degree d which has the opposite
sign to p - f at each of these maxima (Lagrange interpolation)
• and whose magnitude is smaller than the “gap”
• the polynomial p + q is then a better approximation than p to f
Chebyshev approximation
– sufficiency:
• suppose there is a polynomial p′ with ‖p′ − f‖_∞ ≤ ‖p − f‖_∞
• then |p′(xᵢ) − f(xᵢ)| ≤ |p(xᵢ) − f(xᵢ)| at each of the d + 2
extrema xᵢ of p(x) − f(x)
• therefore p′ − p must have d + 1 zeros on the unit interval
• thus p′ − p = 0 as it is a polynomial of degree d
Chebyshev approximation

Convergence is often exponential in d
 example
– the best approximation of degree d − 1 over [−1,1] to x^d is
p_d(x) = x^d − 2^{1−d} T_d(x)
– where the Chebyshev polynomials are T_d(x) = cos(d cos⁻¹ x)
– the notation comes from a different transliteration of Chebyshev!
– and the error is
|x^d − p_d(x)| = 2^{1−d} |T_d(x)| ≤ 2^{1−d} = 2 e^{−d ln 2}

Chebyshev’s theorem is easily extended to
rational approximations
 rational functions with equal degree numerator and
denominator are usually best
 convergence is still often exponential
 and rational functions usually give much better approximations
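The x^d example can be checked numerically: the error of p_d is 2^{1−d}T_d(x), which equioscillates between ±2^{1−d} at the d + 1 = (d − 1) + 2 Chebyshev points cos(kπ/d), exactly as Chebyshev's theorem demands for a degree d − 1 approximation. A sketch (d = 8 chosen arbitrarily):

```python
import math

d = 8
bound = 2.0 ** (1 - d)          # claimed max error, = 2 * exp(-d * ln 2)

def err(x):
    # x^d - p_d(x) = 2^{1-d} T_d(x), with T_d(x) = cos(d arccos x)
    return bound * math.cos(d * math.acos(x))

# alternating +/-bound at the d+1 Chebyshev extrema x_k = cos(k pi / d)
extrema = [err(math.cos(k * math.pi / d)) for k in range(d + 1)]

# the error never exceeds the bound anywhere on [-1, 1]
grid_max = max(abs(err(-1.0 + 2.0 * i / 10000)) for i in range(10001))
```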
Chebyshev approximation
 a realistic example of a rational approximation is
1/√x ≈ 0.3904603901 (x + 2.3475661045)(x + 0.1048344600)(x + 0.0073063814)
/ [(x + 0.4105999719)(x + 0.0286165446)(x + 0.0012779193)]
– this is accurate to within almost 0.1% over the range [0.003,1]
 using a partial fraction expansion of such rational functions
allows us to use a multiple mass linear equation solver, thus
reducing the cost significantly.
 the partial fraction expansion of the rational function above is
1/√x ≈ 0.3904603901 + 0.0511093775/(x + 0.0012779193)
+ 0.1408286237/(x + 0.0286165446) + 0.5964845033/(x + 0.4105999719)
– this appears to be numerically stable.
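The two quoted forms are algebraically identical and can be checked numerically against 1/√x over [0.003, 1] (the ~0.1% figure is the slide's own accuracy claim):

```python
import math

def r_prod(x):
    # factored form of the rational approximation to 1/sqrt(x)
    num = 0.3904603901 * (x + 2.3475661045) * (x + 0.1048344600) * (x + 0.0073063814)
    den = (x + 0.4105999719) * (x + 0.0286165446) * (x + 0.0012779193)
    return num / den

def r_pf(x):
    # partial fraction form, as used with a multiple mass solver
    return (0.3904603901
            + 0.0511093775 / (x + 0.0012779193)
            + 0.1408286237 / (x + 0.0286165446)
            + 0.5964845033 / (x + 0.4105999719))

xs = [0.003 * (1.0 / 0.003) ** (i / 400.0) for i in range(401)]  # log-spaced
max_rel = max(abs(r_prod(x) * math.sqrt(x) - 1.0) for x in xs)
max_diff = max(abs(r_prod(x) - r_pf(x)) for x in xs)
```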
Multibosons

Chebyshev polynomial approximations were
introduced by Lüscher in his Multiboson
method
det M ∝ det P(M)⁻¹ = ∏ᵢ det(M − ζᵢ)⁻¹ = ∏ᵢ ∫dφᵢ† dφᵢ e^{−φᵢ†(M−ζᵢ)φᵢ}
– where the ζᵢ are the roots of the polynomial approximation
P(x) ≈ 1/x
 no solution of linear equations is required
 one must store n scalar fields
 the dynamics of multiboson fields is stiff, so very small step
sizes are required for gauge field updates
Reweighting

Making Lüscher’s method exact
 one can introduce an accept/reject step
 one can reweight the configurations by the ratio
det[M P(M)]
– this factor is close to one if P(M) is a good approximation
 if one only approximates 1/x over the interval [ε,1] which does
not cover the spectrum of M, then the reweighting factor may
be large
 it has been suggested that this will help to sample
configurations which would otherwise be suppressed by small
eigenvalues of the Dirac operator
Noisy methods

Since the GHMC algorithm is correct for any
reversible and area-preserving mapping, it is
often useful to use some modified MD process
which is cheaper
 accept/reject step must use the true Hamiltonian, of course
 tune the parameters in the MD Hamiltonian to maximise the
Metropolis acceptance rate
 add irrelevant operators?
– a different kind of “improvement”
Noisy inexact algorithms

Replace the force −S′ in the leapfrog equations
of motion by some noisy estimator F̂ such that
⟨F̂⟩ = −S′
 choose the noise independently for each leapfrog step
 the equation for the shift in the fixed point distribution is again
governed by ⟨e^{−δH}⟩_{φ,π}
 for the noisy Langevin algorithm we have ⟨δH⟩_{φ,π} = O(δτ²),
but since ⟨e^{−δH}⟩ = 1 + O(δτ⁴) we obtain δS = O(δτ²)
 for the noisy Hybrid algorithm ⟨δH⟩_{φ,π} = O(δτ), so δS = O(δτ)
Noisy inexact algorithms

This method is useful for non-local actions,
such as staggered fermions with n_f ≠ 4 flavours
 here the action may be written as ¼ n_f tr ln M, and the force is
¼ n_f tr(M⁻¹ M′)
– a cheap way of estimating such a trace noisily is to use the fact
that tr Q = ⟨Σᵢⱼ ηᵢ Qᵢⱼ ηⱼ⟩ for noise η with ⟨ηᵢ ηⱼ⟩ = δᵢⱼ
– there is no need to use Gaussian noise for this estimator —
indeed Z₂ noise might well be better
 using a non area preserving irreversible update step the noisy
Hybrid algorithm can be adjusted to have ⟨δH⟩_{φ,π} = O(δτ²), and
thus δS = O(δτ²) too
• this is the Hybrid R algorithm
 Campostrini’s integration scheme may be used to produce
higher order Langevin and Hybrid algorithms with δS = O(δτ^{2n})
for arbitrary n, but this is not applicable to the noisy versions
Kennedy–Kuti algorithm

Noise without noise
 suppose it is prohibitively expensive to evaluate R = Q(φ′)/Q(φ),
but that we can compute an unbiased estimator for it cheaply,
⟨R̂⟩ = R
 if we look carefully at the proof that the Metropolis algorithm
satisfies detailed balance we see that the ratio R is used for
two quite different purposes
– it is used to give an ordering to configurations: φ′ < φ if R < 1, that
is, if Q(φ′) < Q(φ)
– it is used as the acceptance probability if φ′ < φ
 we can produce a valid algorithm using the noisy estimator R̂
for the latter rôle just by choosing another ordering for the
configurations
– a suitable ordering is provided by any cheap function f such that
the set of configurations for which f(φ′) = f(φ) has measure zero
Kennedy–Kuti algorithm
 we now define the acceptance probability as
P(φ → φ′) = (λ + μ R̂) θ(f(φ) − f(φ′)) + (μ + λ R̂) θ(f(φ′) − f(φ))
– where λ and μ are to be chosen so that 0 ≤ P(φ → φ′) ≤ 1
• taking λ = ½, μ = 0 is often a convenient choice
– detailed balance is easily established by considering the cases
f(φ) > f(φ′) and f(φ) < f(φ′) separately
– in practice the condition 0 ≤ P(φ → φ′) ≤ 1 can rarely be
satisfied exactly, but usually the number of violations can be made
(exponentially) small
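A toy two-state check of this rule with the convenient choice λ = ½, μ = 0 (downhill moves accepted with probability ½, uphill moves with probability ½R̂): averaging over the noise in the estimator, detailed balance holds exactly. The weights and the two-valued estimator below are illustrative assumptions, not from the slides; exact rational arithmetic makes the balance check an equality:

```python
from fractions import Fraction as F

Q = {1: F(1), 2: F(4, 5)}          # unnormalised target weights; f-ordering by Q

def rhat_values(a, b, delta=F(1, 4)):
    # a two-valued unbiased estimator of R = Q[b]/Q[a]
    R = Q[b] / Q[a]
    return [R - delta, R + delta]

def mean_accept(a, b):
    # lambda = 1/2, mu = 0, averaged over the estimator noise
    if Q[b] < Q[a]:                # "downhill": accept with probability 1/2
        return F(1, 2)
    vals = rhat_values(a, b)       # "uphill": accept with probability Rhat/2
    assert all(0 <= v / 2 <= 1 for v in vals)  # no violations for this choice
    return sum(vals) / (2 * len(vals))

P12 = mean_accept(1, 2)
P21 = mean_accept(2, 1)
balance = (Q[1] * P12 == Q[2] * P21)   # detailed balance in expectation
```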
Noisy fermions
 an interesting application is when we have a non-local
fermionic determinant in the desired fixed point distribution,
such as det M(φ) = exp tr ln M(φ)
– in this case we need to produce an unbiased estimator of
R = det M(φ′)/det M(φ) = exp tr[ln M(φ′) − ln M(φ)]
= exp tr ln(1 + δM M⁻¹) ≡ exp tr ln(1 + Q)
Noisy fermions

 it is easy to construct unbiased estimators for tr Q n

– let  be a vector of random numbers whose components are
chosen independently from a distribution with mean zero and
variance one (Gaussian or Z noise, for example)
ˆn 
– set Q
n

ij  j, then Qˆ n


Q
ij i


 tr Q n

ˆ n  are independent, then
– furthermore, if Q̂n and Q
Qˆ nQˆ n 



 tr Q n tr Q n

 as long as all the eigenvalues of Q lie within the unit circle
then

1
n

tr ln1   Q    tr   Q
n 1n

– similarly the exponential can be expanded as a power series
124
NTU, Taipei, Taiwan; 10 September, 1999
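For Z₂ noise the unbiasedness of the trace estimator can be verified exactly by enumerating all noise vectors: with components ±1 the average of η·Qη over all 2ⁿ vectors is precisely tr Q, since ⟨ηᵢηⱼ⟩ = δᵢⱼ. A sketch with an arbitrary illustrative 3 × 3 matrix:

```python
import itertools

Q = [[0.3, -0.2, 0.1],
     [0.4, 0.5, -0.3],
     [0.0, 0.2, -0.1]]
n = len(Q)

def quad_form(eta, M):
    return sum(eta[i] * M[i][j] * eta[j] for i in range(n) for j in range(n))

# exact average over all 2^n Z2 noise vectors: off-diagonal terms cancel
est = sum(quad_form(eta, Q)
          for eta in itertools.product((-1, 1), repeat=n)) / 2 ** n
trace = sum(Q[i][i] for i in range(n))
```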
Bhanot–Kennedy algorithm

In order to obtain an unbiased estimator for R
we sum these series stochastically
 suppose f(x) = 1 + Σ_{n=1}^∞ fₙ xⁿ where 0 ≤ pₙ ≡ f_{n+1}/fₙ ≤ 1
– our series can be transformed into this form
 this can be written in “factored form” as
f(x) = 1 + p₀x (1 + p₁x (1 + p₂x (⋯))), with p₀ = f₁
 and may be summed stochastically using the following scheme
f̂(x) = 1 + θ₀ x (1 + θ₁ x (1 + θ₂ x (⋯)))
– where the θₖ are independent Bernoulli random variables with
⟨θₖ⟩ = pₖ: with probability pₖ we continue the recursion, and with
probability 1 − pₖ we terminate it
“Kentucky” algorithm

This gets rid of the systematic errors of the
Kennedy–Kuti method by eliminating the
violations
 promote the noise to the status of dynamical fields
– suppose that we have an unbiased estimator
∫dη P(η) R̂(φ, η) = det M(φ)
– then
⟨Ω⟩ = (1/Z) ∫dφ e^{−S(φ)} det M(φ) Ω(φ)
= (1/Z) ∫dφ dη e^{−S(φ)} P(η) R̂(φ, η) Ω(φ)
= (1/Z) ∫dφ dη e^{−S(φ)} P(η) |R̂(φ, η)| sgn R̂(φ, η) Ω(φ)
= ⟨Ω sgn R̂⟩_{φ,η}
– where the last expectation is taken with respect to the measure
e^{−S(φ)} P(η) |R̂(φ, η)|
“Kentucky” algorithm
 the φ and η fields can be updated by alternating Metropolis
Markov steps
 there is a sign problem only if the Kennedy–Kuti algorithm
would have many violations
 if one constructs the estimator using the Bhanot–Kennedy
algorithm then one will need an infinite number of noise fields
in principle
– will these ever reach equilibrium?
Cost of noisy algorithms

Inexact algorithms
 these have only a trivial linear volume dependence, with z = 1
for Hybrid and z = 2 for Langevin

Noisy inexact algorithms
 the noisy trajectories deviate from the true classical trajectory
by a factor of 1 + O(δτ) for each step, or 1 + O(τ) for a
trajectory of τ/δτ steps
– this will not affect the integrated autocorrelation function as long
as this deviation remains ≪ 1
– thus the noisy Langevin algorithm should have C ∝ V ξ²
– thus the noisy Hybrid algorithm should have C ∝ V ξ²
Cost of noisy algorithms

Noisy exact algorithms
 these algorithms use noisy estimators to produce an (almost)
exact algorithm which is applicable to non-local actions
– a straightforward approach leads to a cost of C ∝ V² for exact
noisy Langevin
• it is amusing to note that this algorithm should not care what
force term is used in the equations of motion
– an exact noisy Hybrid algorithm is also possible, and for it
C ∝ V²
Cost of noisy algorithms
 these results apply only to operators (like the magnetisation)
which couple sufficiently strongly to the slowest modes of the
system. For other operators, like the energy in 2 dimensions,
we can even obtain z = 0

Summary
 too little noise increases critical slowing down because the
system is too weakly ergodic
 too much noise increases critical slowing down because the
system takes a drunkard’s walk through phase space
 to attain z = 1 for any operator (and especially for the
exponential autocorrelation time) one must be able to tune the
amount of noise suitably
PHMC

Polynomial Hybrid Monte Carlo algorithm:
instead of using Chebyshev polynomials in the
multiboson algorithm, Frezzotti & Jansen and
de Forcrand suggested using them directly in
HMC
 polynomial approximations to 1/x are typically of order 40 to 100
at present
 numerical stability problems with high order polynomial
evaluation
 polynomials must be factored
 correct ordering of the roots is important
 Frezzotti & Jansen claim there are advantages from using
reweighting
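To illustrate why the factored (product-over-roots) form matters, here is a toy Python sketch. It uses the truncated geometric series as the polynomial approximation to 1/x — far cruder than the Chebyshev polynomials used in practice, but its roots are known in closed form, so the factored evaluation can be checked against the direct sum:

```python
import cmath, math

# Toy polynomial approximation to 1/x on (0, 2): the truncated geometric
# series P_n(x) = sum_{k=0}^n (1-x)^k = (1 - (1-x)^{n+1}) / x, with
# relative error |1-x|^{n+1}.
def p_sum(x, n):
    return sum((1.0 - x) ** k for k in range(n + 1))

def p_roots(n):
    # Zeros of P_n: x_j = 1 - exp(2*pi*i*j/(n+1)), j = 1..n (x = 0 excluded).
    return [1.0 - cmath.exp(2j * math.pi * j / (n + 1)) for j in range(1, n + 1)]

def p_factored(x, n):
    # Fully factored evaluation; the leading coefficient is (-1)^n.
    # At high order the ordering of the roots controls roundoff growth.
    val = complex((-1.0) ** n)
    for r in p_roots(n):
        val *= (x - r)
    return val.real

n, x = 40, 0.5
print(p_sum(x, n), p_factored(x, n), 1.0 / x)
```

For this mild example any root ordering works; the stability and ordering issues mentioned above appear when the order is high and the spectrum nearly touches a root.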
RHMC

Rational Hybrid Monte Carlo algorithm: the
idea is similar to PHMC, but uses rational
approximations instead of polynomial ones
 much lower orders required for a given accuracy
 evaluation simplified by using partial fraction expansion and
multiple mass linear equation solvers
 1/x is already a rational function, so RHMC reduces to HMC in
this case
 can be made exact using noisy accept/reject step
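The partial-fraction evaluation can be sketched in scalar form. The shifts below are made up for illustration (not a real Remez/Zolotarev approximation); the point is that every term of the expansion needs only a shifted inverse, which a multi-shift solver delivers in one Krylov solve:

```python
# A rational function of the shape used by RHMC:
# r(x) = prod(x + a_i) / prod(x + b_i), equal degrees, rewritten as the
# partial fraction r(x) = 1 + sum_i alpha_i / (x + b_i).
a = [0.10, 0.60, 1.50]
b = [0.05, 0.40, 1.10]

def r_direct(x):
    num = den = 1.0
    for ai, bi in zip(a, b):
        num *= (x + ai)
        den *= (x + bi)
    return num / den

def residues():
    # alpha_i = prod_j (a_j - b_i) / prod_{j != i} (b_j - b_i)
    alphas = []
    for i, bi in enumerate(b):
        num = 1.0
        for aj in a:
            num *= (aj - bi)
        den = 1.0
        for j, bj in enumerate(b):
            if j != i:
                den *= (bj - bi)
        alphas.append(num / den)
    return alphas

def r_partial(x):
    # Applied to a matrix M, each term needs only (M + b_i)^{-1} phi.
    return 1.0 + sum(al / (x + bi) for al, bi in zip(residues(), b))

print(r_direct(0.7), r_partial(0.7))
```

Unlike the factored polynomial product, the partial-fraction sum adds terms of the same sign for x > 0, so it is numerically benign even at high order.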
Ginsparg–Wilson fermions

Is it possible to have chiral symmetry on the
lattice without doublers if we only insist that the
symmetry holds on shell?
 such a transformation should be of the form (Lüscher)
ψ → exp[iα γ₅(1 − ½aD)] ψ ;  ψ̄ → ψ̄ exp[iα (1 − ½aD)γ₅]
 for it to be a symmetry the Dirac operator must be invariant
D → exp[iα (1 − ½aD)γ₅] D exp[iα γ₅(1 − ½aD)] = D
 for a small transformation this implies that
(1 − ½aD)γ₅ D + D γ₅(1 − ½aD) = 0
 which is the Ginsparg–Wilson relation
γ₅ D + D γ₅ = aD γ₅ D
Neuberger’s operator

We can find a solution of the Ginsparg-Wilson
relation as follows
 let the lattice Dirac operator be of the form
aD = 1 + γ₅ γ̂₅ ;  γ̂₅† = γ̂₅ ;  aD† = 1 + γ̂₅ γ₅ = γ₅ aD γ₅
 this satisfies the GW relation if γ̂₅² = 1
 and it must also have the correct continuum limit, D = γ·∂ + m
⇒ γ̂₅ = γ₅ [a(γ·∂ + m) − 1] (1 + O(a²))
 both of these conditions are satisfied if we define (Neuberger)
γ̂₅ = γ₅ (DW − 1) [(DW − 1)†(DW − 1)]^(−1/2) = sgn[γ₅(DW − 1)]
where DW is the Wilson Dirac operator
Neuberger’s operator



There are many other possible solutions
The discontinuity is necessary
This operator is local (Lüscher)
 at least if we constrain the fields such that the plaquette < 1/30
 by local we mean a function of fast decrease, as opposed to
ultralocal which means a function of compact support
Computing Neuberger’s operator

Use polynomial approximation to Neuberger’s
operator
 high degree polynomials have numerical instabilities
 for dynamical GW fermions this leads to a PHMC algorithm

Use rational approximation
 requires solution of linear equations just to apply the operator
 for dynamical GW fermions this leads to an RHMC algorithm
 requires nested linear equation solvers in the dynamical case
 nested solvers can be avoided at the price of a much more ill-conditioned system
 attempts to combine inner and outer solves in one Krylov
space method
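As a scalar illustration of why rational methods suit the sign function: the Newton iteration x ← (x + 1/x)/2 (a standard sign-function iteration, not something specific to these slides) converges quadratically to sgn(x), and each iterate is a rational function of the starting value — the scalar analogue of applying a rational approximation to γ₅(DW − 1):

```python
def sign_newton(x, iters=20):
    # Newton iteration for the sign function: x <- (x + 1/x)/2 converges
    # to sgn(x) for any nonzero real x.  The matrix version
    # X <- (X + X^{-1})/2 is one way to compute sgn(H) for Hermitian H,
    # and needs an inverse (a linear solve) at every step.
    for _ in range(iters):
        x = 0.5 * (x + 1.0 / x)
    return x

print(sign_newton(0.003), sign_newton(-7.2))
```

Convergence is slow until |x| reaches O(1) and rapid thereafter, mirroring why small eigenvalues of γ₅(DW − 1) are the expensive ones and are often projected out explicitly.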
Computing Neuberger’s operator

Extract low-lying eigenvectors explicitly, and
evaluate their contribution to the Dirac operator
directly
 efficient methods based on Ritz functional
 very costly for dynamical fermions if there is a finite density of
zero eigenvalues in the continuum limit (Banks-Casher)
 might allow for noisy accept/reject step if we can replace the
step function with something continuous (so it has a
reasonable series expansion)

Use better approximate solution of GW relation
instead of Wilson operator
 e.g., a relatively cheap “perfect action”