Monte Carlo Methods for Quantum Field Theory A D Kennedy adk@ph.ed.ac.uk Maxwell Institute, the University of Edinburgh NTU, Taipei, Taiwan; 10 September, 1999 Functional Integrals and QFT The Expectation value of an operator is defined non-perturbatively by the Functional Integral 1 S ( ) d e ( ) Z Normalisation constant Z chosen such that The action is S() Defined in Euclidean space-time Lattice regularisation d is the appropriate functional measure 1 1 d d x Continuum limit: lattice spacing a 0 Thermodynamic limit: physical volume V 2 x NTU, Taipei, Taiwan; 10 September, 1999 Monte Carlo methods Monte Carlo integration is based on the identification of probabilities with measures There are much better methods of carrying out low dimensional quadrature all other methods become hopelessly expensive for large dimensions in lattice QFT there is one integration per degree of freedom – we are approximating an infinite dimensional functional integral 3 NTU, Taipei, Taiwan; 10 September, 1999 Monte Carlo methods Generate a sequence of random field configurations 1,2 , ,t , , N chosen from the probability distribution 1 S (t ) P(t ) dt e dt Z Measure the value of on each configuration and compute the average 1 N (t ) N t 1 4 NTU, Taipei, Taiwan; 10 September, 1999 Central Limit Theorem Law of Large Numbers lim Central Limit Theorem O N where the variance of the distribution of is C2 N C2 2 The Laplace–DeMoivre Central Limit theorem is an asymptotic expansion for the probability distribution of Distribution of values for a single sample = () P ( ) d P 5 NTU, Taipei, Taiwan; 10 September, 1999 Central Limit Theorem Generating function for connected moments W (k ) ln d P ( ) e ln d P e ik ik ln e ik n 0 ik n n! Cn The first few cumulants are C0 0 C3 C1 C4 C2 3 4 3C22 2 Note that this is an asymptotic expansion 6 NTU, Taipei, Taiwan; 10 September, 1999 Central Limit Theorem Distribution of the average of N samples 1 N P d1 d N P1 P N t N t 1 Connected generating function W (k ) ln d P ( ) eik ik N ln d1 d N P1 P N exp t N t 1 ik N N N ln e ln d P e n ik Cn k NW n 1 N n1 n! N 7 ik N NTU, Taipei, Taiwan; 10 September, 1999 Central Limit Theorem Take inverse Fourier transform to obtain distribution P 1 W k ik P dk e e 2 e C3 d 3 C4 d 4 2 3 3 4 3!N d 4!N d e 8 C3 d 3 C d4 4 2 3 3 4 3!N d 4!N d e dk 2 ik 21N ik C2 ik 2 e e 2 2 C2 N 2 C N 2 NTU, Taipei, Taiwan; 10 September, 1999 Central Limit Theorem 9 NTU, Taipei, Taiwan; 10 September, 1999 Central Limit Theorem Re-scale to show convergence to Gaussian distribution d P F d where N and 2 2C2 C3 3C2 e F 1 3 6C2 N 2C2 10 2 NTU, Taipei, Taiwan; 10 September, 1999 Importance Sampling Integral I dx f (x) Sample from distribution 0 ( x) 1 Normalisation N dx (x) 1 Probability Estimator of integral Estimator of variance f ( x) I ( x) dx ( x) 2 f ( x) f ( x) V ( x)dx I dx I2 ( x) ( x) 11 2 NTU, Taipei, Taiwan; 10 September, 1999 Importance Sampling Minimise variance Constraint N=1 Lagrange multiplier (V N ) f ( y)2 0 2 ( y ) ( y) f ( x) Optimal measure ( x ) opt dx f ( x) 12 Optimal variance Vopt dx f ( x) dx f ( x) 2 2 NTU, Taipei, Taiwan; 10 September, 1999 2 0 Importance Sampling Example: f ( x) sin( x) Optimal weight ( x) 12 sin( x) Optimal variance 2 Vopt 13 2 2 dx sin x 2 dx sin( x) dx sin( x) 16 0 0 NTU, Taipei, Taiwan; 10 September, 1999 Importance Sampling 2 0 dx sin x 1 Construct cheap approximation to |sin(x)| 2 Calculate relative area within each rectangle 3 Choose a random number uniformly 4 Select corresponding rectangle 5 Choose another random number uniformly 6 Select corresponding x value within rectangle 7 Compute |sin(x)| 14 NTU, Taipei, Taiwan; 10 September, 1999 2 0 Importance Sampling dx sin x With 100 rectangles we have V = 16.02328561 … but we can do better! 2 2 0 0 I dx sin( x) sin( x) dx sin( x) sin( x) Vopt 0 For which With 100 rectangles we have V = 0.011642808 15 NTU, Taipei, Taiwan; 10 September, 1999 Random number generators Pseudorandom number generators Random numbers used infrequently in Monte Carlo for QFT – compared to spin models Linear congruential generator xn1 axn b mod m – for suitable choice of a, b, and m (see, e.g., Donald Knuth, Art of Computer Programming) – usually chose b = 0 and m = power of 2 – seem to be good enough in practice – problems for spin models if m is too small a multiple of V – parallel implementation trivial xnV axn b mod m a aV mod m b V 1 j a b mod m j 0 16 NTU, Taipei, Taiwan; 10 September, 1999 Markov chains State space (Ergodic) stochastic transitions P’: Deterministic evolution of probability distribution P: Q Q Distribution converges to unique fixed point Q 17 NTU, Taipei, Taiwan; 10 September, 1999 Proof of convergence of Markov chains Define a metric d Q1, Q2 dx Q1 x Q2 x on the space of (equivalence classes of) probability distributions Prove that d PQ1, PQ2 1 d Q1, Q2 with > 0, so the Markov process P is a contraction mapping The sequence Q, PQ, P²Q, P³Q,… is Cauchy The space of probability distributions is complete, so the sequence converges to a unique fixed point Q lim P nQ n 18 NTU, Taipei, Taiwan; 10 September, 1999 Proof of convergence of Markov chains d PQ1, PQ2 dx PQ1 x PQ2 x dx dy P x y Q1 y dy P x y Q2 y Q y Q1 y Q2 y dx dy P x y Q y y y 1 dx dy P x y Q y Q y Q y a b a b 2 min a , b dx dy P x y Q y 2 dx min dy P x y Q y Q y 19 NTU, Taipei, Taiwan; 10 September, 1999 Proof of convergence of Markov chains d PQ1, PQ2 dx P x y 1 dy Q y 2 dx min dy P x y Q y Q y dy Q y 2 dx inf P x y min dy Q y Q y y dy Q y Q y dy Q y Q y dy Q y dy Q1 y dy Q2 y 1 1 0 dy Q y Q y 12 dy Q y dy Q y dx inf P x y dy Q y d Q1,Q2 y 0 1 dx inf P x y 1 y 20 NTU, Taipei, Taiwan; 10 September, 1999 Markov chains Use Markov chains to sample from Q Suppose we can construct an ergodic Markov process P which has distribution Q as its fixed point start with an arbitrary state (“field configuration”) iterate the Markov process until it has converged (“thermalized”) thereafter, successive configurations will be distributed according to Q – but in general they will be correlated to construct P we only need relative probabilities of states – don’t know the normalisation of Q – cannot use Markov chains to compute integrals directly – we can compute ratios of integrals 21 NTU, Taipei, Taiwan; 10 September, 1999 Markov chains How do we construct a Markov process with a specified fixed point? Q x dy P x y Q y Detailed balance P y x Q x P x y Q y integrate w.r.t. y to obtain fixed point condition sufficient but not necessary for fixed point Q x Metropolis algorithm P x y min 1, Q y consider cases Q(x)>Q(y) and Q(x)<Q(y) separately to obtain detailed balance condition sufficient but not necessary for detailed balance Q x other choices are possible, e.g., P x y Q x Q y 22 NTU, Taipei, Taiwan; 10 September, 1999 Markov chains Composition of Markov steps Let P and P be two Markov steps which have the desired fixed point distribution... … they need not be ergodic Then the composition of the two steps P P will also have the desired fixed point... … and it may be ergodic This trivially generalises to any (fixed) number of steps For the case where P is not ergodic but (P) is the terminology “weakly” and “strongly” ergodic are sometimes used 23 NTU, Taipei, Taiwan; 10 September, 1999 Markov chains This result justifies “sweeping” through a lattice performing single site updates each individual single site update has the desired fixed point because it satisfies detailed balance the entire sweep therefore has the desired fixed point, and is ergodic... … but the entire sweep does not satisfy detailed balance of course it would satisfy detailed balance if the sites were updated in a random order… … but this is not necessary … and it is undesirable because it puts too much randomness into the system (more on this later) 24 NTU, Taipei, Taiwan; 10 September, 1999 Autocorrelations Exponential autocorrelations the unique fixed point of an ergodic Markov process corresponds to the unique eigenvector with eigenvalue 1 all its other eigenvalues must lie within the unit circle in particular, the largest subleading eigenvalue is max 1 this corresponds to the exponential autocorrelation time N exp 25 1 ln max 0 NTU, Taipei, Taiwan; 10 September, 1999 Autocorrelations Integrated autocorrelations consider the autocorrelation of some operator without loss of generality we may assume 26 2 0 1 N N 2 t t N t 1 t1 N 1 N 1 N 2 2 t 2 t t N t 1 t 1 t t 1 NTU, Taipei, Taiwan; 10 September, 1999 Autocorrelations define the autocorrelation function C 2 t t 1 2 2 N 1 N C 2 N N 1 2 the autocorrelation function must fall faster that the exponential autocorrelation C max e Nexp for a sufficiently large number of samples 2 N exp 1 2 C 1 O N 1 N define the integrated autocorrelation function 27 2 1 2 A N exp 1 O N N 2 A C 1 NTU, Taipei, Taiwan; 10 September, 1999 Continuum limits We are not interested in lattice QFTs per se, but in their continuum limit as a 0 this corresponds to a continuous phase transition of the discrete lattice model – for the continuum theory to have a finite correlation length a (inverse mass gap) in physical units this correlation length must diverge in lattice units we expect such systems to exhibit universal behaviour as they approach a continuous phase transition – the nature of the continuum theory is characterised by its symmetries – the details of the microscopic interactions are unimportant at macroscopic length scales – universality is a consequence of the way that the the theory scales as a 0 while a is kept fixed 28 NTU, Taipei, Taiwan; 10 September, 1999 Continuum limits the nature of the continuum field theory depends on the way that physical quantities behave as the system approaches the continuum limit the scaling of the parameters in the action required is described by the renormalisation group equation (RGE) – the RGE states the “reparameterisation invariance” of the theory as the choice of scale at which we choose to fix the renormalisation conditions – as a 0 we expect the details of the regularisation scheme (cut off effects, lattice artefacts) to vanish, so the effect of changing a is just an RGE transformation • on the lattice the a renormalisation group transformation may be implemented as a “block spin” transformation of the fields • strictly speaking, the renormalisation “group” is a semigroup on the lattice, as blockings are not invertible 29 NTU, Taipei, Taiwan; 10 September, 1999 Continuum limits the continuum limit of a lattice QFT corresponds to a fixed point of the renormalisation group – at such a fixed point the form of the action does not change under an RG transformation the parameters in the action scale according to a set of critical exponents all known four dimensional QFTs correspond to trivial (Gaussian) fixed points – for such a fixed point the UV nature of the theory may by analysed using perturbation theory – monomials in the action may be classified according to their power counting dimension • d < 4 relevant (superrenormalisable) • d = 4 marginal (renormalisable) • d > 4 irrelevant (nonrenormalisable) 30 NTU, Taipei, Taiwan; 10 September, 1999 Continuum limits The behaviour of our Markov chains as the system approaches a continuous phase transition is described by its dynamical critical exponents these describe how badly the system (model + algorithm) undergo critical slowing down the dynamical critical exponent z tells us how the cost of generating an independent configuration grows as the correlation length of the system is taken to , cost z this is closely related (but not always identical) to the dynamical critical exponent for the exponential or integrated autocorrelations 31 NTU, Taipei, Taiwan; 10 September, 1999 Markov chains “Coupling from the Past” Propp and Wilson (1996) Use fixed set of random numbers Flypaper principle: if states coalesce they stay together forever – Eventually, all states coalesce to some state with probability one – Any state from t = - will coalesce to – is a sample from the fixed point distribution 32 NTU, Taipei, Taiwan; 10 September, 1999 Global heatbath The ideal generator selects field configurations randomly and independently from the desired distribution it is hard to construct such global heatbath generators for any but the simplest systems they can be built by selecting sufficiently independent configurations from a Markov process… … or better yet using CFTP which guarantees that the samples are truly uncorrelated but this such generators are expensive! 33 NTU, Taipei, Taiwan; 10 September, 1999 Global heatbath For the purposes of Monte Carlo integration these is no need for the configurations to be completely uncorrelated we just need to take the autocorrelations into account in our estimate of the variance of the resulting integral using all (or most) of the configurations generated by a Markov process is more cost-effective than just using independent ones the optimal choice balances the cost of applying each Markov step with the cost of making measurements 34 NTU, Taipei, Taiwan; 10 September, 1999 Local heatbath 35 For systems with a local bosonic action we can build a Markov process with the fixed point distribution exp(-S) out of steps which update a single site with this fixed point If this update generates a new site variable value which is completely independent of its old value then it is called a local heatbath For free field theory we just need to generate a Gaussian-distributed random variable NTU, Taipei, Taiwan; 10 September, 1999 Gaussian generators If x1, x2,, xn are uniformly distributed random n numbers then n1 x j is approximately Gaussian j 1 by the Central Limit theorem This is neither cheap nor accurate 36 NTU, Taipei, Taiwan; 10 September, 1999 Gaussian generators If x is uniformly distributed and f is a monotonically increasing function then f(x) is distributed as d 1 P y dx y f x y 1 f f f f 1y Choosing f x 2 ln x we obtain P y y e 21 y 2 Therefore generate two uniform random numbers x and x, set r 2 ln x1, 2 x2 y1, then r cos , y 2 r sin are two independent Gaussian distributed random numbers 37 NTU, Taipei, Taiwan; 10 September, 1999 Gaussian generators Even better methods exist the rectangle-wedge-tail (RWT) method the ziggurat method these do not require special function evaluations they can be more interesting to implement for parallel computers 38 NTU, Taipei, Taiwan; 10 September, 1999 Local heatbath For pure gauge theories the field variables live on the links of the lattice and take their values in a representation of the gauge group for SU(2) Creutz gave an exact local heatbath algorithm – it requires a rejection test: this is different from the Metropolis accept/reject step in that one must continue generating candidate group elements until one is accepted for SU(3) the “quasi-heatbath” method of Cabibbo and Marinari is widely used – update a sequence of SU(2) subgroups – this is not quite an SU(3) heatbath method… – … but sweeping through the lattice updating SU(2) subgroups is also a valid algorithm, as long as the entire sweep is ergodic – for a higher acceptance rate there is an alternative SU(2) subgroup heatbath algorithm 39 NTU, Taipei, Taiwan; 10 September, 1999 Generalised Hybrid Monte Carlo In order to carry out Monte Carlo computations including the effects of dynamical fermions we would like to find an algorithm which update the fields globally – because single link updates are not cheap if the action is not local take large steps through configuration space – because small-step methods carry out a random walk which leads to critical slowing down with a dynamical critical exponent z=2 does not introduce any systematic errors 40 NTU, Taipei, Taiwan; 10 September, 1999 Generalised Hybrid Monte Carlo A useful class of algorithms with these properties is the (Generalised) Hybrid Monte Carlo (GHMC) method introduce a “fictitious momentum” p corresponding to each dynamical degree of freedom q find a Markov chain with fixed point exp[-H(q,p)] where H is the “fictitious Hamiltonian” ½ p + S(q) – the action S of the underlying QFT plays the rôle of the potential in the “fictitious” classical mechanical system – this gives the evolution of the system in a fifth dimension, “fictitious” or computer time this generates the desired distribution exp[-S(q)] if we ignore the momenta q (i.e., the marginal distribution) 41 NTU, Taipei, Taiwan; 10 September, 1999 Generalised Hybrid Monte Carlo The GHMC Markov chain alternates two Markov steps Molecular Dynamics Monte Carlo (MDMC) Partial Momentum Refreshment 42 both have the desired fixed point together they are ergodic NTU, Taipei, Taiwan; 10 September, 1999 MDMC If we could integrate Hamilton’s equations exactly we could follow a trajectory of constant fictitious energy this corresponds to a set of equiprobable fictitious phase space configurations Liouville’s theorem tells us that this also preserves the functional integral measure dp dq as required Any approximate integration scheme which is reversible and area preserving may be used to suggest configurations to a Metropolis accept/reject test with acceptance probability min[1,exp(-H)] 43 NTU, Taipei, Taiwan; 10 September, 1999 MDMC We build the MDMC step out of three parts Molecular Dynamics (MD), an approximate integrator U : q, p q , p which is exactly q , p – area preserving det U * det 1 q , p – reversible F U F U 1 a momentum flip F : p p a Metropolis accept/reject step The composition of these gives q F U e H y 1 y e H p q p with y being a uniformly distributed random number in [0,1] 44 NTU, Taipei, Taiwan; 10 September, 1999 Partial Momentum Refreshment This mixes the Gaussian distributed momenta p with Gaussian noise p cos sin sin p F cos The Gaussian distribution of p is invariant under F the extra momentum flip F ensures that for small the momenta are reversed after a rejection rather than after an acceptance for = / 2 all momentum flips are irrelevant 45 NTU, Taipei, Taiwan; 10 September, 1999 Generalised Hybrid Monte Carlo Special cases the Hybrid Monte Carlo (HMC) algorithm is the special case where = / 2 = 0 corresponds to an exact version of the Molecular Dynamics (MD) or Microcanonical algorithm (which is in general non-ergodic) the L2MC algorithm of Horowitz corresponds to choosing arbitrary but MDMC trajectories of a single leapfrog integration step ( = ). This method is also called Kramers algorithm. The Langevin Monte Carlo algorithm corresponds to choosing = / 2 and MDMC trajectories of a single leapfrog integration step ( = ). 46 NTU, Taipei, Taiwan; 10 September, 1999 Generalised Hybrid Monte Carlo Further special cases the Hybrid and Langevin algorithms are approximations where the Metropolis step is omitted the Local Hybrid Monte Carlo (LHMC) or Overrelaxation algorithm corresponds to updating a subset of the degrees of freedom (typically those living on one site or link) at a time 47 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators Baker-Campbell-Hausdorff (BCH) formula if A and B belong to any (non-commutative) algebra then e AeB e AB , where constructed from commutators of A and B (i.e., is in the Free Lie Algebra generated by {A,B}) A B more precisely, ln e e cn where c1 A B and n 1 n 2 B 1 1 2m cn 1 2 cn , A B ck1 , , ck2m , A B n 1 2 m ! k1,,k 2 m 1 m 0 k1 k 2 m n 48 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators explicitly, the first few terms are 1 A, A, B B, A, B 1 B, A, A, B ln e Ae B A B 21 A, B 12 24 A, A, A, A, B 4B, A, A, A, B 1 6A, B , A, A, B 4B, B, A, A, B 720 2A, B , B, A, B B, B, B, A, B in order to construct reversible integrators we use symmetric symplectic integrators the following identity follows directly from the BCH formula 1 A, A, B 2B, A, B ln e A 2e B e A 2 A B 24 7A, A, A, A, B 28B, A, A, A, B 1 12A, B , A, A, B 32B, B, A, A, B 5760 16A, B , B, A, B 8B, B, B, A, B 49 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators We are interested in finding the classical trajectory in phase space of a system described by the Hamiltonian H q, p T p S q 21 p 2 S q the basic idea of such a symplectic integrator is to write the time evolution operator as dp dq d exp exp dt dt p dt q H H Hˆ exp e q p p q exp Sq T p p q 50 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators define Q T p P S q and so that Hˆ P Q q p since the kinetic energy T is a function only of p and the potential energy S is a function only of q, it follows that the Q P action of e and e may be evaluated trivially eQ : f q, p f q T p , p eP : f q, p f q, p S q 51 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators from the BCH formula we find that the PQP symmetric symplectic integrator is given by U0 e 1 P 2 e Q e 1 P 2 expP Q P,P,Q2Q,P,Q 1 24 3 O 5 1 P, P,Q 2Q, P,Q exp P Q 24 2 O 4 eH e P Q O 2 ˆ in addition to conserving energy to O(²) such symmetric symplectic integrators are manifestly area preserving and reversible 52 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators For each symplectic integrator there exists a Hamiltonian H’ which is exactly conserved. For the PQP integrator we have H H ˆ H p q q p 1 P, P,Q 2Q, P,Q 2 Q P 24 32Q, Q, P, P,Q 16P,Q , Q, P,Q 4 1 5760 28Q, P, P, P,Q 12P,Q , P, P,Q O 6 8Q, Q, Q, P,Q 7P, P, P, P,Q 53 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators substituting in the exact forms for the operations P and Q we obtain the vector field H H ˆ H p q q p p S 121 2 pS SS p 2S 2 p q p q 3 4 2 3SS p 4 p S 6 S q 4 4 5 6 1 p S 720 O 4 2 30SS 6SS p p 2 3S 2S SS 54 NTU, Taipei, Taiwan; 10 September, 1999 Symplectic Integrators solving these differential equations for H’ we find SS 2S 3S S 1 2 p 2S S 2 2 H H 24 1 p 4S 4 6 p 2 720 2 2 4 O 6 note that H’ cannot be written as the sum of a p-dependent kinetic term and a q-dependent potential term 55 NTU, Taipei, Taiwan; 10 September, 1999 Langevin algorithm Consider the Hybrid Monte Carlo algorithm when take only one leapfrog step. combining the leapfrog equations of motion we obtain S 2 S S 2 12 ignore the Monte Carlo step recall that for the fictitious momenta are just 2 Gaussian distributed random noise S which is the let , then 12 usual form of the Langevin equation 2 56 NTU, Taipei, Taiwan; 10 September, 1999 Inexact algorithms What happens if we omit the Metropolis test? we still have an ergodic algorithm, so there is some unique fixed point distribution eS S which will be generated by the algorithm the condition for this to be a fixed point is S S S S e d e d e 12 2 U for the Langevin algorithm (which corresponds to = , a single leapfrog step) we may expand in powers of and find a n solution for S n Sn the equation determining the leading term in this expansion is the Fokker-Planck equation; the general equations are sometimes known as the Kramers-Moyal equations 57 NTU, Taipei, Taiwan; 10 September, 1999 Inexact algorithms in the general case we may change variables to U whence we obtain e S S d d e H , S U d d det U* e area preservati on : det U* 1 reversibility : U 1 F U F U H S U 1 H F H S F S H H U 1 F2 1 S S U 1 d d e H e H S U F e S S d e e H S 58 21 2 e H S 1 NTU, Taipei, Taiwan; 10 September, 1999 Inexact algorithms since H is extensive so is H, and thus so is the connected generating function (cumulants) F H , e e We can thus show order by order in that S is extensive too 59 NTU, Taipei, Taiwan; 10 September, 1999 Inexact algorithms For the Langevin algorithm we have H Sn O 4 and O 2 , so we immediately find that S O 2 2S S 2 81 2S j j S 2j S2 81 2 2 j j j S4 i 481 Sijjkk 2SijjkSk SijkS j Sk 2SijkSijk – if S is local then so are all the Sn – we only have an asymptotic expansion for S, and in general the subleading terms are neither local nor extensive – therefore, taking the continuum limit a 0 limit for fixed will probably not give exact results for renormalised quantities 60 NTU, Taipei, Taiwan; 10 September, 1999 Inexact algorithms H O 2 and Sn so again we find S O 2 For the Hybrid algorithm O 0 – we have made use of the fact that H’ is conserved and differs from H by O 2 61 NTU, Taipei, Taiwan; 10 September, 1999 Inexact algorithms What effect do such systematic errors in the distribution have on measurements? large errors will occur where observables are discontinuous functions of the parameters in the action, e.g., near phase transitions if we want to improve the action so as to reduce lattice discretisation effects and get a better approximation to the underlying continuum theory, then we have to ensure that 2 «an for the appropriate n step size errors need not be irrelevant 62 NTU, Taipei, Taiwan; 10 September, 1999 Local HMC Consider the Gaussian model defined by the free field action d 2 2 2 1 S 2 x m x 1 introduce the Hamiltonian H 21 2 S on “fictitious” phase space. – the corresponding equations of motion for the single site x x 2x Fx where 2 2d m2 and F x y 1 y the solution in terms of the Gaussian distributed random initial momentum x and the initial field value x is x t x cos 63 1 cos 2 F x sin NTU, Taipei, Taiwan; 10 September, 1999 Local HMC identify 1 cos and x to get the usual Adler overrelaxation update 2 F x 1 x 2 For gauge theories various overrelaxation methods have been suggested Hybrid Overrelaxation: this alternates a heatbath step with many overrelaxation steps with = 2 LHMC: this uses an analytic solution for the equations of motion for the update of a single U(1) or SO(2) subgroup at a time. In this case the equations of motion may be solved in terms of elliptic functions 64 NTU, Taipei, Taiwan; 10 September, 1999 Classical Mechanics on Group Manifolds Gauge fields take their values in some Lie group, so we need to define classical mechanics on a group manifold which preserves the group-invariant Haar measure a Lie group G is a smooth manifold on which there is a natural mapping L: G G G defined by the group action this induces a map called the pull-back of L on the cotangent bundle defined by L* : G F F L*g f f Lg L* : G TG TG L* : G T *G T *G Lg *v f v L*g f L*g v Lg *v – F is the space of 0 forms, which are smooth mappings from G to the real numbers , 65 NTU, Taipei, Taiwan; 10 September, 1999 Classical Mechanics on Group Manifolds * a form is left invariant if L the tangent space to a Lie group at the origin is called the Lie algebra, and we may choose a set of basis vectors ei 0 which satisfy the commutation relations ei , e j cijk ek where k c ijk are the structure constants of the algebra we may define a set of left invariant vector fields on TG by ei g Lg *ei 0 the corresponding left invariant dual forms i satisfy the i i j k Maurer-Cartan equations d 21 c jk jk We may therefore define a closed symplectic 2 form which globally defines an invariant Poisson bracket by d i p i i dp i p i d i i dp i 21 p i c ijk j k i 66 i i NTU, Taipei, Taiwan; 10 September, 1999 Classical Mechanics on Group Manifolds We may now follow the usual procedure to find the equations of motion introduce a Hamiltonian function (0 form) H on the cotangent bundle (phase space) over the group manifold define a vector field h such that dH y h, y y TG the classical trajectories t Qt , Pt are then the integral curves of h: t hst in the natural basis we have i i i h h i ei h y y e y i i i p p i i i i H i i i i i i j k dH y ei H y i y h, y h y y h p c jk h y p i i jk 67 NTU, Taipei, Taiwan; 10 September, 1999 Classical Mechanics on Group Manifolds equating coefficients of the components of y we find H k k H h i ei c ji p ei H i j p i p jk p the equations of motion in the local coordinate basis e j e ij j are therefore j q H j j j i i H k H Qt i ei , Pt ckj Pt ej k k p q i p k i 2 which for a Hamiltonian of the form H f p S q reduce to H H Q tj i eij , Pt j e kj q k i p k 68 NTU, Taipei, Taiwan; 10 September, 1999 Classical Mechanics on Group Manifolds q iTi In terms of constrained variables U q e i the representation of the generators is U g Ti ei g U g g 0 i g g 0 from which it follows that ei U UTi i and for the Hamiltonian H 21 p S U leads to the i equations of motion U PU S S † UTi ab † Ti U ab T SU U P Ti Uab iab U ab 2 for the case G = SU(n) the operator T is the projection onto traceless antihermitian matrices 69 NTU, Taipei, Taiwan; 10 September, 1999 Classical Mechanics on Group Manifolds Discrete equations of motion we can now easily construct a discrete PQP symmetric integrator from these equations P 21 P 0 T S U 0 U 0 21 U exp P 21 U 0 P P 21 T S U U 21 the exponential map from the Lie algebra to the Lie group may be evaluated exactly using the Cayley-Hamilton theorem – all functions of an n n matrix M may be written as a polynomial of degree n - 1 in M – the coefficients of this polynomial can be expressed in terms of the invariants (traces of powers) of M 70 NTU, Taipei, Taiwan; 10 September, 1999 Dynamical fermions Fermion fields are Grassmann valued required to satisfy the spin-statistics theorem even “classical” Fermion fields obey anticommutation relations Grassmann algebras behave like negative dimensional manifolds 71 NTU, Taipei, Taiwan; 10 September, 1999 Grassmann algebras Grassmann algebras linear space spanned by generators {,,,…} with coefficients a, b,… in some field (usually the real or complex numbers) algebra structure defined by nilpotency condition for elements of the linear space ² = 0 – there are many elements of the algebra which are not in the linear space (e.g., ) – nilpotency implies anticommutativity + = 0 • 0 = ² = ² = ( + )² = ² + + + ² = + = 0 – anticommutativity implies nilpotency 2² = 0 (unless the coefficient field has characteristic 2, i.e., 2 = 0) 72 NTU, Taipei, Taiwan; 10 September, 1999 Grassmann algebras Grassmann algebras have a natural grading corresponding to the number of generators in a given product – deg(1) = 0, deg() = 1, deg() = 2,... – all elements of the algebra can be decomposed into a sum of terms of definite grading – the parity transform of is P 1deg – a natural antiderivation is defined on a Grassmann algebra • linearity: d(a + b) = a d + b d • anti-Leibniz rule: d() = (d) + P()(d) definite integration on a Grassmann algebra is defined to be the same as derivation – hence change of variables leads to the inverse Jacobian i d1 d n d1 d n det j 73 NTU, Taipei, Taiwan; 10 September, 1999 Grassmann algebras Gaussians over Grassmann manifolds – like all functions, Gaussians are polynomials over Grassmann algebras i aij j a e ij e i ij j 1 i aij j ij ij – there is no reason for this function to be positive even for real coefficients 74 NTU, Taipei, Taiwan; 10 September, 1999 Grassmann algebras Gaussian integrals over Grassmann manifolds d 1 d n e i aij j ij Pf aij – where Pf(a) is the Pfaffian • Pf(a)² = det(a) – if we separate the Grassmann variables into two “conjugate” sets then we find the more familiar result d1 d nd 1 d n e i aij j ij det aij – despite the notation, this is a purely algebraic identity – it does not require the matrix a > 0, unlike the bosonic analogue 75 NTU, Taipei, Taiwan; 10 September, 1999 Dynamical fermions Pseudofermions direct simulation of Grassmann fields is not feasible – the problem is not that of manipulating anticommuting values in a computer – it is that e F e M is not positive, and thus we get poor importance sampling S we therefore integrate out the fermion fields to obtain the fermion determinant d d e M det M – and always occur quadratically – the overall sign of the exponent is unimportant 76 NTU, Taipei, Taiwan; 10 September, 1999 Dynamical fermions any operator can be expressed solely in terms of the bosonic fields M , , e 0 – e.g., the fermion propagator is G ( x, y ) x y M 1( x, y ) 77 NTU, Taipei, Taiwan; 10 September, 1999 Dynamical fermions including the determinant as part of the observable to be det M S B measured is not feasible det M S B – the determinant is extensive in the lattice volume, thus again we get poor importance sampling represent the fermion determinant as a bosonic Gaussian M 1 integral with a non-local kernel det M d d e the fermion kernel must be positive definite (all its eigenvalues must have positive real parts) otherwise the bosonic integral will not converge the new bosonic fields are called “pseudofermions” 78 NTU, Taipei, Taiwan; 10 September, 1999 Dynamical fermions it is usually convenient to introduce two flavours of fermion and 1 † 2 † M M to write det M det M M d d e this not only guarantees positivity, but also allows us to generate the pseudofermions from a global heatbath by applying M † to a random Gaussian distributed field the equations for motion for the boson (gauge) fields are 1 S B † M †M 1 † 1 SB † † † M M M M M M the evaluation of the pseudofermion action and the corresponding force then requires us to find the solution of a 1 (large) set of linear equations M †M 79 NTU, Taipei, Taiwan; 10 September, 1999 Dynamical fermions It is not necessary to carry out the inversions required for the equations of motion exactly there is a trade-off between the cost of computing the force and the acceptance rate of the Metropolis MDMC step The inversions required to compute the pseudofermion action for the accept/reject step does need to be computed exactly, however we usually take “exactly” to by synonymous with “to machine precision” – this may well not be true, as we will discuss later 80 NTU, Taipei, Taiwan; 10 September, 1999 Krylov solvers One of the main reasons why dynamical fermion lattice calculations are feasible is the existence of very effective numerical methods for solving large sparse systems of linear equations family of iterative methods based on Krylov spaces – Conjugate Gradients (CG, CGNE) – BiConjugate Gradients (BiCG, BiCGstab, BiCG) these are often introduced as exact methods – they require O(V) iterations to find the solution – they do not give the exact answer in practice because of rounding errors – they are more naturally thought of as methods for solving systems of linear equations in an (almost) -dimensional linear space • this is what we are approximating on the lattice anyway 81 NTU, Taipei, Taiwan; 10 September, 1999 Krylov solvers BiCG on a Banach space we want to solve the equation Ax = b on a Banach space – this is a normed linear space – the norm endows the space with a topology – the linear space operations are continuous in this topology we solve the system exactly in a Krylov subspace K n span b, Ab, A2b, A3 b,, An b there is no concept of “orthogonality” in a Banach space, so we also need to introduce a dual Krylov space of linear functionals on the Banach space Kˆ n span bˆ, A† bˆ, A†2bˆ, A†3 bˆ,, A†nbˆ The adjoint of a linear operator is defined by A† bˆ x bˆ Ax x 82 NTU, Taipei, Taiwan; 10 September, 1999 Krylov solvers for iteration n we choose a search direction pn 1 such that if we add a multiple of it to the previous solution xn 1 xn pn 1 then the resulting residual satisfies rn 1 b Ax n 1 rn Apn 1 ker Kˆ n we can then choose such that rn 1 ker Kˆ n 1 it suffices to use the three term recurrence pn 1 rn pn pn 1 because Ap j ker Kˆ n j n 2 this process converges in the norm topology (if the solution exists!) the norm of the residual does not decrease monotonically 83 NTU, Taipei, Taiwan; 10 September, 1999 Krylov solvers CG on Hilbert space if our topological normed linear space has a sesquilinear inner product then it is called a Hilbert space this allows us to define a natural mapping from vectors to linear functionals f : x xˆ by xˆ y x, y y † the operator A is self adjoint if A A , i.e., Ax, y x, Ay x, y if A is self adjoint then we can choose Kˆ n f K n this is the CG algorithm the norm decreases monotonically in this case 84 NTU, Taipei, Taiwan; 10 September, 1999 Higher order integrators Campostrini “wiggles” Since U0 e H R0 3 O 5 we are led to consider the Campostrini “wiggle” ˆ U0 U0 U0 e 2 H R0 2 3 3 O 5 ˆ this be adjusted to give an integration scheme correct to O by choosing 3 2 5 the step size may be kept fixed by taking 2 to get integration schemes of arbitrarily high order, all of which are still reversible and area preserving, we use the recursive definition ˆ 2n 1 Un 1 n Un 1 n n Un 1 n e 2 H Rn 1 2 n n 2n 1 O n2n 3 n n with n 2n 1 2 and n 2 n 85 NTU, Taipei, Taiwan; 10 September, 1999 Higher order integrators So why aren’t they used in practice? consider the simplest leapfrog scheme for a single simple harmonic oscillator with frequency for a single step we have 2 q t q t 1 21 pt 1 3 1 1 2 pt 4 2 The eigenvalues of this matrix are 1 i 1 1 2 2 1 4 2 e i cos1 1 21 2 for > 2 the integrator becomes unstable 86 NTU, Taipei, Taiwan; 10 September, 1999 Higher order integrators the orbits change from circles to hyperbolae the energy diverges exponentially in instead of oscillating for bosonic systems H » 1 for such a large integration step size for light dynamical fermions there seem to be a few “modes” which have a large force due to the small eigenvalues of the Dirac operator, and this force plays the rôle of the frequency for these systems is limited by stability rather than by the Metropolis acceptance rate 87 NTU, Taipei, Taiwan; 10 September, 1999 Instabilities Some recent full QCD measurements on large lattices with quite light quarks large CG residual: always unstable small CG residual: unstable for > 0.0075 intermediate CG residual: unstable for > 0.0075 163 32 lattice, 0.1355, 5.2, cSW 2.0171 88 NTU, Taipei, Taiwan; 10 September, 1999 Acceptance rates We can compute the average Metropolis acceptance rate Pacc area preservation implies that e H 1 1 1 1 H H H H d d e d d e d d e e Z Z Z the probability distribution of H has an asymptotic expansion as the lattice volume V H 1 1 H PH d d e H ~ exp Z 4 H 4 H the average Metropolis acceptance rate is thus Pacc ~ erfc 21 89 H erfc 1 8 H 2 2 NTU, Taipei, Taiwan; 10 September, 1999 Acceptance rates 90 The acceptance rate is a function of the variable x V 4n 4 for the nth order Campostrini “wiggle” used to generate trajectories with = 1, and x V 4n 6 for = NTU, Taipei, Taiwan; 10 September, 1999 Cost and dynamical critical exponents To a good approximation, the cost C of a Molecular Dynamics computation is proportional to the total time for which we have to integrate Hamilton’s equations. the cost per independent configuration is then C 1 2 A note that the cost depends on the particular operator under consideration the optimal trajectory lengths obtained by minimising the cost as a function of the parameters , , and of the algorithm 91 NTU, Taipei, Taiwan; 10 September, 1999 Cost and dynamical critical exponents While the cost depends upon the details of the implementation of the algorithm, the way that it scales with the correlation length of the system is an intrinsic property of the algorithm; C z where z is the dynamical critical exponent. for local algorithms the cost is independent of the trajectory length, C 1 2 A, and thus minimising the cost is equivalent to minimising A free field theory analysis is useful for understanding and optimising algorithms, especially if our results do not depend on the details of the spectrum 92 NTU, Taipei, Taiwan; 10 September, 1999 Cost and dynamical critical exponents HMC for free field theory we can show that choosing and = / 2 gives z = 1 this is to be compared with z = 2 for constant the optimum cost for HMC is thus C V V 4 1/ 4 / V 5 / 4 for nth order Campostrini integration (if it were stable) the cost is C V V 93 1 4n 4 4n 4 / 4n 5 V 4n 4 NTU, Taipei, Taiwan; 10 September, 1999 Cost and dynamical critical exponents LMC for free field theory we can show that choosing = and = / 2 gives z = 2 the optimum cost for LMC is thus C V V 6 1/ 3 / 2 V 4 / 3 2 for nth order Campostrini integration (if it were stable) the cost is 2n 4 1 C V V 4n 6 94 2n 3 / 2 V 2n 3 2 NTU, Taipei, Taiwan; 10 September, 1999 Cost and dynamical critical exponents L2MC (Kramers) in the approximation of unit acceptance rate we find that setting = and suitably tuning we can arrange to get z = 1 however, if Pacc 1 then the system will carry out a random walk backwards and forwards along a trajectory because the momentum, and thus the direction of travel, must be reversed upon a Metropolis rejection. 95 NTU, Taipei, Taiwan; 10 September, 1999 Cost and dynamical critical exponents a simple-minded analysis is that the average time between rejections must be O() to achieve z = 1 this time is approximately n 0 Pacc n 1 Pacc n 1 Pacc Pacc for small we have 1 Pacc erf kV 6, hence we are required to scale so as to keep V 4 2 fixed This leads to a cost for L2MC of C V V 4 2 1/ 4 / V 5 / 4 3 / 2 Or using nth order Campostrini integration C V V 1 4n 4 2 4n 4 / 4 n 5 2n 3 V 4 n 4 2n 2 A more careful free field theory analysis leads to the same conclusions 96 NTU, Taipei, Taiwan; 10 September, 1999 Cost and dynamical critical exponents Optimal parameters for GHMC (almost) analytic calculation in free field theory minimum cost for GHMC appears for acceptance probability in the range 40%90% very similar to HMC minimum cost for L2MC (Kramers) occurs for acceptance rate very close to 1 and this cost is much larger 97 NTU, Taipei, Taiwan; 10 September, 1999 Reversibility Are HMC trajectories reversible and area preserving in practice? the only fundamental source of irreversibility is the rounding error caused by using finite precision floating point arithmetic – for fermionic systems we can also introduce irreversibility by choosing the starting vector for the iterative linear equation solver time-asymmetrically • we might do this if we want to use (some extrapolation of) the previous solution as the starting vector – floating point arithmetic is not associative – it is more natural to store compact variables as scaled integers (fixed point) • saves memory • does not solve the precision problem 98 NTU, Taipei, Taiwan; 10 September, 1999 Reversibility data for SU(3) gauge theory and QCD with heavy quarks show that rounding errors are amplified exponentially – the underlying continuous time equations of motion are chaotic – Liapunov exponents characterise the divergence of nearby trajectories – the instability in the integrator occurs when H » 1 • zero acceptance rate anyhow 99 NTU, Taipei, Taiwan; 10 September, 1999 Reversibility in QCD the Liapunov exponents appear to scale with as the system approaches the continuum limit – = constant – this can be interpreted as saying that the Liapunov exponent characterises the chaotic nature of the continuum classical equations of motion, and is not a lattice artefact – therefore we should not have to worry about reversibility breaking down as we approach the continuum limit – caveat: data is only for small lattices, and is not conclusive 100 NTU, Taipei, Taiwan; 10 September, 1999 Reversibility Data for QCD with lighter dynamical quarks – instability occurs close to region in where acceptance rate is near one • may be explained as a few “modes” becoming unstable because of large fermionic force – integrator goes unstable if too poor an approximation to the fermionic force is used 163 32 lattice, 0.1355, 5.2, cSW 2.0171 101 NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC Consider the update of a single site x using the LHMC algorithm F 2 x 1 x 2 the values of at all other sites are left unchanged this may be written in matrix form as M x yz yz 1 xy yz 2 y ˆ ,z y ˆ ,z 2 Px yz yz xy 102 M x Px where NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC for a complete sweep through the lattice consisting of exactly V Ld updates in some order x1, x2,, xV, we have M x1 Px1 M x2 M x1 Px1 Px2 M P it is convenient to introduce the Fourier transformed fields ~ and momenta ~ the corresponding Fourier transformed single site update matrix is 2i p q x ~ Mx 103 pq pq 1 e V L 2 2 2 cos q 1 L NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC we want to compute integrated autocorrelation functions for ~2 ~2 operators like the magnetic susceptibility 0 0 – the behaviour of the autocorrelations of linear operators such as the magnetisation ~0 can be misleading in order to do this it is useful to consider quadratic operators as linear ~ ~ ~ , so operators on quadratic monomials in the fields, pq ~ ~ 00 00 p q – these quadratic monomials are updated according to ~ ~~ M~ P~ M~ P~ pq p q p q Q~ Q ~ – so, after averaging over the momenta we have M P – where Ppq Q Ppr Pqr Q and M pq ,rs M pr Mqs r 104 NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC the integrated autocorrelation function for is 1 A t 0 t 0 C t pq M Q t 105 M t 0 ~ ~ pq 00 00, pq ~2 ~ 00 00 t 0 t 0 2 Q t 00,00 1 M Q 1 ~ ~ pq 00 2 00,00 NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC Random updates the update matrix for a sweep after averaging over the V independent choices of update sites is just V 1 ~ ~ 1 ~ M Q MzQi MzQ i 1 V zi 1 V z 1 V V V we thus find that 1 A ~ 1 MQ 1 00,00 1 2m 2 2 1 e m 2d d 1 1 0 O O m O 2 V m V which tells us that z = 2 for any choice of the overrelaxation parameter 106 NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC Even/Odd updates in a single site update the new value of the field at x only depends at the old value at x and its nearest neighbours, so even sites depend only on odd sites and vice versa – the update matrix for a sweep in Fourier space thus reduces to a 2 × 2 matrix coupling ~p and ~p 1 L,,L 2 ~ Q – M thus becomes a 3 × 3 matrix in this case the integrated autocorrelation function is d 2 7 2 16 8 4 A O m 2 9 2 2m – this is minimised by choosing 2 A – hence z = 1 107 d 1 Om m 2 m O m 2 , for which d NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC Lexicographical updates this is a scheme in which x ˆ is always updated before x – except at the edges of the lattice a single site update depends on the new values of the neighbouring sites in the + directions and the old values in the - directions newthe sweep update implicitly as O N P – so we may write – where Oxy 1 xy Rxy 108 old 1 1 N Rxy R xy xy 2 x , y ˆ 2 x , y ˆ L L updateable x,y ˆ x 1 x,y ˆ x L L 2 2 P NTU, Taipei, Taiwan; 10 September, 1999 Update ordering for LHMC the sweep update matrices are easily found explicitly M 1 N O P 1 N P ~ ~ the Fourier transforms O and N are diagonal up to surface ~ terms R , which we expect to be suppressed by a factor of 1/L 1 1 m 2 2d 1 O 2 m 2 m 2 2d L m 2 2d – which is minimised by the choice m2 d – for which A 0 and hence z = 0 we thus obtain 1 A 2 this can be achieved for most operators – though with different values of – the dynamical critical exponent for the exponential autocorrelation time is still one at best 109 NTU, Taipei, Taiwan; 10 September, 1999 Chebyshev approximation What is the best polynomial approximation p(x) to a continuous function f(x) for x in [0,1] ? Weierstrass’ theorem: any continuous function can be arbitrarily well approximated by a polynomial n k n n p x f – Bernstein polynomials: n n k t (1 t )n k k 0 1n 1 n p f dx p x f x minimise the appropriate norm n 0 where n 1 Chebyshev’s theorem – the error |p(x) - f(x)| reaches its maximum at exactly d+2 points on the unit interval – there is always a unique polynomial of any degree d which minimises p f max p x f x 0 x 1 110 NTU, Taipei, Taiwan; 10 September, 1999 Chebyshev approximation – necessity: • suppose p - f has less than d + 2 extrema of equal magnitude • then at most d + 1 maxima exceed some magnitude • this defines a “gap” • we can construct a polynomial q of degree d which has the opposite sign to p - f at each of these maxima (Lagrange interpolation) • and whose magnitude is smaller than the “gap” • the polynomial p + q is then a better approximation than p to f 111 NTU, Taipei, Taiwan; 10 September, 1999 Chebyshev approximation – sufficiency: • suppose there is a polynomial p f p f • then p xi f xi p xi f xi at each of the d+2 extrema of px f x • therefore p’ - p must have d+1 zeros on the unit interval • thus p’ - p = 0 as it is a polynomial of degree d 112 NTU, Taipei, Taiwan; 10 September, 1999 Chebyshev approximation Convergence is often exponential in d Example d – the best approximation of degree d-1 over [-1,1] to x is pd x x d 21 Td x 1 – where the Chebyshev polynomials are Td x cos d cos x d 1 – the notation comes from a different transliteration of Chebyshev! – and the error is x pd x 21 d 1 d Td x 2e d ln 2 Chebyshev’s theorem is easily extended to rational approximations rational functions with equal degree numerator and denominator are usually best convergence is still often exponential and rational functions usually give much better approximations 113 NTU, Taipei, Taiwan; 10 September, 1999 Chebyshev approximation a realistic example of a rational approximation is x 2.34756610 45 x 0.10483446 00 x 0.00730638 14 1 0.39046039 01 x 0.41059997 19 x 0.02861654 46 x 0.00127791 93 x – this is accurate to within almost 0.1% over the range [0.003,1] using a partial fraction expansion of such rational functions allows us to use a multiple mass linear equation solver, thus reducing the cost significantly. the partial fraction expansion of the rational function above is 1 0.05110937 75 0.14082862 37 0.59648450 33 0.39046039 01 x x 0.00127791 93 x 0.02861654 46 x 0.41059997 19 – this appears to be numerically stable. 114 NTU, Taipei, Taiwan; 10 September, 1999 Chebyshev approximation 115 NTU, Taipei, Taiwan; 10 September, 1999 Multibosons Chebyshev polynomial approximations were introduced by Lüscher in his Multiboson method 1 1 detM det P M det M i i † 1 det M i di e M i i i no solution of linear equations is required one must store n scalar fields the dynamics of multiboson fields is stiff, so very small step sizes are required for gauge field updates 116 NTU, Taipei, Taiwan; 10 September, 1999 Reweighting Making Lüscher’s method exact one can introduce an accept/reject step one can reweight the configurations by the ratio detM detP M – this factor is close to one if P(M) is a good approximation if one only approximates 1/x over the interval [,1] which does not cover the spectrum of M, then the reweighting factor may be large it has been suggested that this will help to sample configurations which would otherwise be suppressed by small eigenvalues of the Dirac operator 117 NTU, Taipei, Taiwan; 10 September, 1999 Noisy methods Since the GHMC algorithm is correct for any reversible and area-preserving mapping, it is often useful to use some modified MD process which is cheaper accept/reject step must use the true Hamiltonian, of course tune the parameters in the MD Hamiltonian to maximise the Metropolis acceptance rate add irrelevant operators? – a different kind of “improvement” 118 NTU, Taipei, Taiwan; 10 September, 1999 Noisy inexact algorithms Replace the force -S’ in the leapfrog equations of motion by some noisy estimator F̂ such that Fˆ S Choose the noise independently for each leapfrog step The equation for the shift in the fixed point distribution is now H S e , 1 2 For the noisy Langevin algorithm we have H O , 2 H and Sn , O , but since e O 4 we obtain S O 2 , For the noisy Hybrid algorithm H O and , Sn O 0 , so S O , 119 NTU, Taipei, Taiwan; 10 September, 1999 Noisy inexact algorithms This method is useful for non-local actions, such as staggered fermions with nf 4 flavours here the action may be written as 1 nf tr M M 1 4 nf tr ln M, and the force is – a cheap way of estimating such a trace noisily is to use the fact that tr Q ij iQij j – there is no need to use Gaussian noise for this estimator — indeed Z noise might well be better using a non area preserving irreversible update step the noisy Hybrid algorithm can be adjusted to have H , O 2 , and thus S O 2 too • this is the Hybrid R algorithm Campostrini’s integration scheme may be used to produce higher order Langevin and Hybrid algorithms with S O 2n for arbitrary n, but this is not applicable to the noisy versions 120 NTU, Taipei, Taiwan; 10 September, 1999 Kennedy–Kuti algorithm Noise without noise suppose it is prohibitively expensive to evaluate R Q Q, but that we can compute an unbiased estimator for it cheaply, Rˆ R if we look carefully at the proof that the Metropolis algorithm satisfies detailed balance we see that the ratio R is used for two quite different purposes – it is used to give an ordering to configurations: ’ < if R < 1, that is, if Q(’) < Q() – it is used as the acceptance probability if ’ < we can produce a valid algorithm using the noisy estimator R̂ for the latter rôle just by choosing another ordering for the configurations – A suitable ordering is provided by any cheap function f such that the set of configurations for which f(’) = f() has measure zero 121 NTU, Taipei, Taiwan; 10 September, 1999 Kennedy–Kuti algorithm we now define the acceptance probability as P Rˆ f f Rˆ f f – where are to be chosen so that 0 P 1 • taking 21 , 0 is often a convenient choice – detailed balance is easily established by considering the cases f() > f(‘) and f() < f(‘) separately – in practice the condition 0 P 1 can rarely be satisfied exactly, but usually the number of violation can be made (exponentially) small 122 NTU, Taipei, Taiwan; 10 September, 1999 Noisy fermions an interesting application is when we have a non-local fermionic determinant in the desired fixed point distribution, such as det M exp tr lnM – in this case we need to produce an unbiased estimator of det M exp tr ln M MM 1 R det M exp tr ln 1 MM 1 exp tr ln1 Q 123 NTU, Taipei, Taiwan; 10 September, 1999 Noisy fermions it is easy to construct unbiased estimators for tr Q n – let be a vector of random numbers whose components are chosen independently from a distribution with mean zero and variance one (Gaussian or Z noise, for example) ˆn – set Q n ij j, then Qˆ n Q ij i tr Q n ˆ n are independent, then – furthermore, if Q̂n and Q Qˆ nQˆ n tr Q n tr Q n as long as all the eigenvalues of Q lie within the unit circle then 1 n tr ln1 Q tr Q n 1n – similarly the exponential can be expanded as a power series 124 NTU, Taipei, Taiwan; 10 September, 1999 Bhanot–Kennedy algorithm In order to obtain an unbiased estimator for R we sum these series stochastically suppose f x 1 n 1fn x n where 0 pn fn 1 fn 1 – our series can be transformed into this form this can be written in “factored form” as f x 1 p0 x p1 x 2 p2 x 3 and may be summed stochastically using the following scheme k 2 fˆ x 1 p0 x p1 x p2 pk 1 x pk 1 p0 1 p1 1 p2 1 pk 125 NTU, Taipei, Taiwan; 10 September, 1999 “Kentucky” algorithm This gets rid of the systematic errors of the Kennedy—Kuti method by eliminating the violations promote the noise to the status of dynamical fields – Suppose that we have an unbiased estimator Rˆ , d p Rˆ , detM 1 1 S S ˆ d e det M d e R , Z Z 1 d d e S p Rˆ , Z 1 d d e S p Rˆ , sgnRˆ , Z sgnRˆ , , 126 NTU, Taipei, Taiwan; 10 September, 1999 “Kentucky” algorithm the and fields can be updated by alternating Metropolis Markov steps there is a sign problem only if the Kennedy—Kuti algorithm would have many violations if one constructs the estimator using the Kennedy—Bhanot algorithm then one will need an infinite number of noise fields in principle – will these ever reach equilibrium? 127 NTU, Taipei, Taiwan; 10 September, 1999 Cost of noisy algorithms Inexact algorithms these have only a trivial linear volume dependence, with z = 1 for Hybrid and z = 2 for Langevin Noisy inexact algorithms The noisy trajectories deviate from the true classical trajectory by a factor of 1 O for each step, or 1 O for a trajectory of steps – This will not affect the integrated autocorrelation function as long as «1 – Thus the noisy Langevin algorithm should have C V 2 – Thus the noisy Hybrid algorithm should have C V V 2 128 NTU, Taipei, Taiwan; 10 September, 1999 Cost of noisy algorithms Noisy exact algorithms these algorithms use noisy estimators to produce an (almost) exact algorithm which is applicable to non-local actions – a straightforward approach leads to a cost of 2 C V V 2 V 2 2 for exact noisy Langevin • it is amusing to note that this algorithm should not care what force term is used in the equations of motion – an exact noisy Hybrid algorithm is also possible, and for it C V V V 2 129 NTU, Taipei, Taiwan; 10 September, 1999 Cost of noisy algorithms these results apply only to operators (like the magnetisation) which couple sufficiently strongly to the slowest modes of the system. For other operators, like the energy in 2 dimensions, we can even obtain z = 0 Summary too little noise increases critical slowing down because the system is too weakly ergodic too much noise increases critical slowing down because the system takes a drunkard’s walk through phase space to attain z = 1 for any operator (and especially for the exponential autocorrelation time) one must be able to tune the amount of noise suitably 130 NTU, Taipei, Taiwan; 10 September, 1999 PHMC Polynomial Hybrid Monte Carlo algorithm: instead of using Chebyshev polynomials in the multiboson algorithm, Frezzotti & Jansen and deForcrand suggested using them directly in HMC polynomial approximation to 1/x are typically of order 40 to 100 at present numerical stability problems with high order polynomial evaluation polynomials must be factored correct ordering of the roots is important Frezzotti & Jansen claim there are advantages from using reweighting 131 NTU, Taipei, Taiwan; 10 September, 1999 RHMC Rational Hybrid Monte Carlo algorithm: the idea is similar to PHMC, but uses rational approximations instead of polynomial ones much lower orders required for a given accuracy evaluation simplified by using partial fraction expansion and multiple mass linear equation solvers 1/x is already a rational function, so RHMC reduces to HMC in this case can be made exact using noisy accept/reject step 132 NTU, Taipei, Taiwan; 10 September, 1999 Ginsparg–Wilson fermions Is it possible to have chiral symmetry on the lattice without doublers if we only insist that the symmetry holds on shell? such a transformation should be of the form (Lüscher) 5 1 21aD e 5 1 21aD ; e for it to be a symmetry the Dirac operator must be invariant 1 21aD 5 5 1 21aD De De D for a small transformation this implies that 1 21 aD 5D D 5 1 21 aD 0 which is the Ginsparg-Wilson relation 5D D 5 aD 5D 133 NTU, Taipei, Taiwan; 10 September, 1999 Neuberger’s operator We can find a solution of the Ginsparg-Wilson relation as follows let the lattice Dirac operator to be of the form aD 1 5ˆ5 ; ˆ5† ˆ5 ; aD† 1 ˆ5 5 5aD 5 2 this satisfies the GW relation if ˆ5 1 and it must also have the correct continuum limit D m ˆ5 5 a m 1 O a 2 both of these conditions are satisfied if we define (Neuberger) ˆ5 5 134 DW DW 1 1 DW 1 † sgn 5 Dw 1 NTU, Taipei, Taiwan; 10 September, 1999 Neuberger’s operator There are many other possible solutions The discontinuity is necessary This operator is local (Lüscher) at least if we constrain the fields such that the plaquette < 1/30 by local we mean a function of fast decrease, as opposed to ultralocal which means a function of compact support 135 NTU, Taipei, Taiwan; 10 September, 1999 Computing Neuberger’s operator Use polynomial approximation to Neuberger’s operator high degree polynomials have numerical instabilities for dynamical GW fermions this leads to a PHMC algorithm Use rational approximation requires solution of linear equations just to apply the operator for dynamical GW fermions this leads to an RHMC algorithm requires nested linear equation solvers in the dynamical case nested solvers can be avoided at the price of a much more illconditioned system attempts to combine inner and outer solves in one Krylov space method 136 NTU, Taipei, Taiwan; 10 September, 1999 Computing Neuberger’s operator Extract low-lying eigenvectors explicitly, and evaluate their contribution to the Dirac operator directly efficient methods based on Ritz functional very costly for dynamical fermions if there is a finite density of zero eigenvalues in the continuum limit (Banks-Casher) might allow for noisy accept/reject step if we can replace the step function with something continuous (so it has a reasonable series expansion) Use better approximate solution of GW relation instead of Wilson operator e.g., a relatively cheap “perfect action” 137 NTU, Taipei, Taiwan; 10 September, 1999