Monte Carlo Methods for
Quantum Field Theory
A D Kennedy
adk@ph.ed.ac.uk
Maxwell Institute, the University of Edinburgh
NTU, Taipei, Taiwan; 10 September, 1999
Functional Integrals and QFT

• The expectation value of an operator Ω is defined non-perturbatively by the functional integral
      ⟨Ω⟩ = (1/Z) ∫ dφ e^{−S(φ)} Ω(φ)
• The normalisation constant Z is chosen such that ⟨1⟩ = 1
• The action is S(φ)
• Defined in Euclidean space-time
• Lattice regularisation
• dφ is the appropriate functional measure,
      dφ = ∏ₓ dφ(x)
• Continuum limit: lattice spacing a → 0
• Thermodynamic limit: physical volume V → ∞
Monte Carlo methods

• Monte Carlo integration is based on the identification of probabilities with measures
• There are much better methods of carrying out low-dimensional quadrature
  – all other methods become hopelessly expensive for large dimensions
  – in lattice QFT there is one integration per degree of freedom: we are approximating an infinite dimensional functional integral
Monte Carlo methods

• Generate a sequence of random field configurations φ₁, φ₂, …, φₜ, …, φ_N chosen from the probability distribution
      P(φₜ) dφₜ = (1/Z) e^{−S(φₜ)} dφₜ
• Measure the value of Ω on each configuration and compute the average
      Ω̄ = (1/N) Σₜ₌₁^N Ω(φₜ)
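As a toy illustration of the procedure above (not from the original lectures), the following sketch estimates ⟨Ω⟩ for a single-site "field" with the assumed action S(φ) = φ²/2, for which configurations can be drawn directly; the observable Ω(φ) = φ² has ⟨Ω⟩ = 1 exactly.

```python
import math
import random

random.seed(1)

def monte_carlo_average(omega, sample_phi, n):
    """Average of Omega over n field configurations, with naive error sqrt(C2/N)."""
    values = [omega(sample_phi()) for _ in range(n)]
    mean = sum(values) / n
    c2 = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance C2
    return mean, math.sqrt(c2 / n)

# Toy "field theory": S(phi) = phi^2/2, so phi is a unit Gaussian and <phi^2> = 1.
mean, error = monte_carlo_average(lambda phi: phi * phi,
                                  lambda: random.gauss(0.0, 1.0),
                                  100_000)
print(mean, error)  # mean close to 1, error of order sqrt(2/N)
```

The error estimate here is the naive √(C₂/N) of the following slides, valid because these samples are independent.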
Central Limit Theorem

• Law of Large Numbers: ⟨Ω⟩ = lim_{N→∞} Ω̄
• Central Limit Theorem: Ω̄ = ⟨Ω⟩ + O(1/√N)
  – where the variance of the distribution of Ω̄ is σ² = C₂/N, with C₂ = ⟨(Ω − ⟨Ω⟩)²⟩
• The Laplace–DeMoivre Central Limit theorem is an asymptotic expansion for the probability distribution of Ω̄
  – distribution of values for a single sample ω = Ω(φ):
      P_Ω(ω) = ∫ dφ P(φ) δ(ω − Ω(φ))
Central Limit Theorem

• Generating function for connected moments:
      W_Ω(k) = ln ∫ dω P_Ω(ω) e^{ikω} = ln ∫ dφ P(φ) e^{ikΩ(φ)} = ln ⟨e^{ikΩ}⟩ = Σ_{n≥0} (ik)ⁿ Cₙ / n!
• The first few cumulants are
      C₀ = 0
      C₁ = ⟨Ω⟩
      C₂ = ⟨(Ω − ⟨Ω⟩)²⟩
      C₃ = ⟨(Ω − ⟨Ω⟩)³⟩
      C₄ = ⟨(Ω − ⟨Ω⟩)⁴⟩ − 3C₂²
• Note that this is an asymptotic expansion
Central Limit Theorem

• Distribution of the average of N samples:
      P_Ω̄(ω̄) = ∫ dω₁ ⋯ dω_N P_Ω(ω₁) ⋯ P_Ω(ω_N) δ(ω̄ − (1/N) Σₜ₌₁^N ωₜ)
• Connected generating function:
      W_Ω̄(k) = ln ∫ dω̄ P_Ω̄(ω̄) e^{ikω̄}
             = ln ∫ dω₁ ⋯ dω_N P_Ω(ω₁) ⋯ P_Ω(ω_N) exp[(ik/N) Σₜ₌₁^N ωₜ]
             = N ln ∫ dω P_Ω(ω) e^{ikω/N} = N W_Ω(k/N)
             = Σ_{n≥1} (ik)ⁿ Cₙ / (n! N^{n−1})
Central Limit Theorem

• Take the inverse Fourier transform to obtain the distribution
      P_Ω̄(ω̄) = (1/2π) ∫ dk e^{W_Ω̄(k)} e^{−ikω̄}
• Keeping the Gaussian part of W_Ω̄ explicit and turning the higher cumulants into derivatives acting on it:
      P_Ω̄(ω̄) = exp[−(C₃/3!N²) d³/dω̄³ + (C₄/4!N³) d⁴/dω̄⁴ − ⋯] ∫ (dk/2π) e^{−ik(ω̄−⟨Ω⟩)} e^{−C₂k²/2N}
             = exp[−(C₃/3!N²) d³/dω̄³ + (C₄/4!N³) d⁴/dω̄⁴ − ⋯] e^{−(ω̄−⟨Ω⟩)²N/2C₂} / √(2πC₂/N)
Central Limit Theorem

  [figure]
Central Limit Theorem

• Re-scale to show convergence to the Gaussian distribution:
      P_Ω̄(ω̄) dω̄ = F(ξ) dξ
• where ξ ≡ (ω̄ − ⟨Ω⟩)√N and
      F(ξ) = (e^{−ξ²/2C₂} / √(2πC₂)) [1 + C₃ ξ(ξ² − 3C₂) / (6C₂³√N) + O(1/N)]
Importance Sampling

• Integral I = ∫ dx f(x)
• Sample from a distribution ρ(x) ≥ 0 with normalisation N = ∫ dx ρ(x) = 1
• Estimator of the integral:
      I = ∫ dx ρ(x) [f(x)/ρ(x)]
• Estimator of the variance:
      V = ∫ dx ρ(x) [f(x)/ρ(x) − I]² = ∫ dx f(x)²/ρ(x) − I²
Importance Sampling

• Minimise the variance subject to the constraint N = 1, using a Lagrange multiplier λ:
      δ(V + λN)/δρ(y) = −f(y)²/ρ(y)² + λ = 0
• Optimal measure:
      ρ_opt(x) = |f(x)| / ∫ dx |f(x)|
• Optimal variance:
      V_opt = (∫ dx |f(x)|)² − (∫ dx f(x))²
Importance Sampling

• Example: f(x) = sin(x) on [0, 2π]
• Optimal weight: ρ_opt(x) = ¼|sin(x)|
• Optimal variance:
      V_opt = (∫₀^{2π} dx |sin(x)|)² − (∫₀^{2π} dx sin(x))² = 16
Importance Sampling: ∫₀^{2π} dx |sin(x)|

1 Construct a cheap approximation to |sin(x)|
2 Calculate the relative area within each rectangle
3 Choose a random number uniformly
4 Select the corresponding rectangle
5 Choose another random number uniformly
6 Select the corresponding x value within the rectangle
7 Compute |sin(x)|
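The seven steps above can be sketched as follows; this is a hypothetical implementation in which the rectangle heights are taken from |sin| at the subinterval midpoints (one of several reasonable choices).

```python
import bisect
import math
import random

random.seed(2)

f = lambda x: abs(math.sin(x))
A, B, M = 0.0, 2.0 * math.pi, 100

# Steps 1-2: cheap piecewise-constant approximation and relative areas.
width = (B - A) / M
heights = [f(A + (i + 0.5) * width) for i in range(M)]
total_area = sum(h * width for h in heights)
cumulative, acc = [], 0.0
for h in heights:
    acc += h * width / total_area
    cumulative.append(acc)

estimates = []
for _ in range(20_000):
    u = random.random()                                  # step 3: uniform number
    i = min(bisect.bisect_left(cumulative, u), M - 1)    # step 4: pick rectangle
    x = A + (i + random.random()) * width                # steps 5-6: x in rectangle
    rho = heights[i] / total_area                        # sampling density at x
    estimates.append(f(x) / rho)                         # step 7: estimator f/rho

mean = sum(estimates) / len(estimates)
variance = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
print(mean, variance)  # mean near 4, the integral of |sin| over [0, 2pi]
```

The estimator f/ρ is unbiased for any positive piecewise-constant ρ; a better rectangle approximation only lowers the variance.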
Importance Sampling: ∫₀^{2π} dx |sin(x)|

• With 100 rectangles we have V = 16.02328561
• … but we can do better!
• For the integral
      I = ∫₀^{2π} dx |sin(x)| = ∫₀^{2π} dx sin(x) sgn(sin(x))
  we have V_opt = 0
• With 100 rectangles we have V = 0.011642808
Random number generators

• Pseudorandom number generators
  – random numbers are used infrequently in Monte Carlo for QFT, compared to spin models
  – linear congruential generator: xₙ₊₁ = (a xₙ + b) mod m
  – for suitable choices of a, b, and m (see, e.g., Donald Knuth, The Art of Computer Programming)
  – usually one chooses b = 0 and m a power of 2
  – they seem to be good enough in practice
  – problems arise for spin models if m is too small a multiple of V
  – parallel implementation is trivial:
      xₙ₊V = (A xₙ + B) mod m, with A = a^V mod m and B = (Σ_{j=0}^{V−1} aʲ) b mod m
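The skip-ahead rule quoted above is easy to check numerically. A sketch with an illustrative multiplier and increment (the widely quoted Numerical Recipes pair, not necessarily the values used in the lectures):

```python
M = 2 ** 32
A = 1664525        # illustrative LCG multiplier (Numerical Recipes)
B = 1013904223     # illustrative increment

def lcg_step(x):
    return (A * x + B) % M

def skip_ahead(V):
    """Parameters for jumping V steps at once: x_{n+V} = (A^V x_n + B_V) mod m."""
    a_v = pow(A, V, M)
    b_v = (B * sum(pow(A, j, M) for j in range(V))) % M
    return a_v, b_v

V = 16
a_v, b_v = skip_ahead(V)
x = 123456789
y = x
for _ in range(V):
    y = lcg_step(y)
assert (a_v * x + b_v) % M == y  # one jump reproduces V sequential steps
```

On a parallel machine each of V processors can use the jumped generator to produce every V-th member of the same global sequence.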
Markov chains

• State space Ω
• (Ergodic) stochastic transitions P′: Ω → Ω
• Deterministic evolution of the probability distribution P: Q → Q
• The distribution converges to a unique fixed point Q̄
Proof of convergence of Markov chains

• Define a metric on the space of (equivalence classes of) probability distributions:
      d(Q₁, Q₂) = ∫ dx |Q₁(x) − Q₂(x)|
• Prove that
      d(PQ₁, PQ₂) ≤ (1 − α) d(Q₁, Q₂)
  with α > 0, so the Markov process P is a contraction mapping
• The sequence Q, PQ, P²Q, P³Q, … is Cauchy
• The space of probability distributions is complete, so the sequence converges to a unique fixed point Q̄ = lim_{n→∞} PⁿQ
Proof of convergence of Markov chains

      d(PQ₁, PQ₂) = ∫ dx |PQ₁(x) − PQ₂(x)|
                  = ∫ dx |∫ dy P(x ← y) Q₁(y) − ∫ dy P(x ← y) Q₂(y)|
• define ΔQ(y) ≡ Q₁(y) − Q₂(y) and split it into positive and negative parts, ΔQ = ΔQ₊ − ΔQ₋ with ΔQ₊, ΔQ₋ ≥ 0 (using θ(ω) + θ(−ω) = 1):
                  = ∫ dx |∫ dy P(x ← y) [ΔQ₊(y) − ΔQ₋(y)]|
• using |a − b| = a + b − 2 min(a, b),
                  = ∫ dx [∫ dy P(x ← y) |ΔQ(y)| − 2 min(∫ dy P(x ← y) ΔQ₊(y), ∫ dy P(x ← y) ΔQ₋(y))]
Proof of convergence of Markov chains

• using ∫ dx P(x ← y) = 1,
      d(PQ₁, PQ₂) ≤ ∫ dy |ΔQ(y)| − 2 (∫ dx inf_y P(x ← y)) min(∫ dy ΔQ₊(y), ∫ dy ΔQ₋(y))
• since both distributions are normalised,
      ∫ dy ΔQ(y) = ∫ dy Q₁(y) − ∫ dy Q₂(y) = 1 − 1 = 0
  so ∫ dy ΔQ₊(y) = ∫ dy ΔQ₋(y) = ½ ∫ dy |ΔQ(y)|
• therefore
      d(PQ₁, PQ₂) ≤ (1 − ∫ dx inf_y P(x ← y)) ∫ dy |ΔQ(y)| = (1 − α) d(Q₁, Q₂)
  with 0 < α ≡ ∫ dx inf_y P(x ← y) ≤ 1
Markov chains

• Use Markov chains to sample from Q
  – suppose we can construct an ergodic Markov process P which has distribution Q as its fixed point
  – start with an arbitrary state (“field configuration”)
  – iterate the Markov process until it has converged (“thermalized”)
  – thereafter, successive configurations will be distributed according to Q
    – but in general they will be correlated
  – to construct P we only need relative probabilities of states
    – we don’t need to know the normalisation of Q
    – we cannot use Markov chains to compute integrals directly
    – but we can compute ratios of integrals
Markov chains

• How do we construct a Markov process with a specified fixed point Q(x) = ∫ dy P(x ← y) Q(y)?
• Detailed balance:
      P(y ← x) Q(x) = P(x ← y) Q(y)
  – integrate w.r.t. y to obtain the fixed point condition
  – sufficient but not necessary for a fixed point
• Metropolis algorithm:
      P(x ← y) = min(1, Q(x)/Q(y))
  – consider the cases Q(x) > Q(y) and Q(x) < Q(y) separately to obtain the detailed balance condition
  – sufficient but not necessary for detailed balance
  – other choices are possible, e.g., P(x ← y) = Q(x) / (Q(x) + Q(y))
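A minimal sketch of the Metropolis step for a one-dimensional toy distribution; the unnormalised target Q(x) = e^{−x²/2} and the symmetric random-walk proposal are illustrative assumptions, chosen so the fixed point is known exactly.

```python
import math
import random

random.seed(3)

def metropolis_chain(q, x0, step, n):
    """Random-walk Metropolis: propose y, accept with probability min(1, Q(y)/Q(x)).

    Only relative probabilities enter (the normalisation of Q cancels), and a
    symmetric proposal makes the acceptance ratio enforce detailed balance."""
    x, chain = x0, []
    for _ in range(n):
        y = x + random.uniform(-step, step)
        if random.random() < min(1.0, q(y) / q(x)):
            x = y
        chain.append(x)   # rejected proposals repeat the old state
    return chain

# Unnormalised Q(x) = exp(-x^2/2): the chain should reproduce unit variance.
chain = metropolis_chain(lambda x: math.exp(-0.5 * x * x), 0.0, 1.5, 100_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(mean, var)  # near 0 and 1
```

Note that the successive states are correlated, which is the subject of the autocorrelation slides below.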
Markov chains

• Composition of Markov steps
  – let P₁ and P₂ be two Markov steps which have the desired fixed point distribution…
  – … they need not be ergodic
  – then the composition of the two steps P₁ ∘ P₂ will also have the desired fixed point…
  – … and it may be ergodic
• This trivially generalises to any (fixed) number of steps
  – for the case where P is not ergodic but Pⁿ is, the terminology “weakly ergodic” and “strongly ergodic” is sometimes used
Markov chains

• This result justifies “sweeping” through a lattice performing single site updates
  – each individual single site update has the desired fixed point because it satisfies detailed balance
  – the entire sweep therefore has the desired fixed point, and is ergodic…
  – … but the entire sweep does not satisfy detailed balance
  – of course it would satisfy detailed balance if the sites were updated in a random order…
  – … but this is not necessary
  – … and it is undesirable because it puts too much randomness into the system (more on this later)
Autocorrelations

• Exponential autocorrelations
  – the unique fixed point of an ergodic Markov process corresponds to the unique eigenvector with eigenvalue 1
  – all its other eigenvalues must lie within the unit circle
  – in particular, the largest subleading eigenvalue satisfies |λ_max| < 1
  – this corresponds to the exponential autocorrelation time
      N_exp = −1 / ln |λ_max|
Autocorrelations

• Integrated autocorrelations
  – consider the autocorrelation of some operator Ω
  – without loss of generality we may assume ⟨Ω⟩ = 0
      ⟨Ω̄²⟩ = (1/N²) Σₜ₌₁^N Σₜ′₌₁^N ⟨Ωₜ Ωₜ′⟩
            = (1/N²) [Σₜ₌₁^N ⟨Ωₜ²⟩ + 2 Σₜ₌₁^{N−1} Σₜ′₌ₜ₊₁^N ⟨Ωₜ Ωₜ′⟩]
Autocorrelations

• define the autocorrelation function
      C_Ω(ℓ) = ⟨Ωₜ Ωₜ₊ℓ⟩ / ⟨Ω²⟩
  so that
      ⟨Ω̄²⟩ = (⟨Ω²⟩/N) [1 + 2 Σ_{ℓ=1}^{N−1} (1 − ℓ/N) C_Ω(ℓ)]
• the autocorrelation function must fall faster than the exponential autocorrelation
      C_Ω(ℓ) ≤ |λ_max|^ℓ = e^{−ℓ/N_exp}
• for a sufficiently large number of samples
      ⟨Ω̄²⟩ = (⟨Ω²⟩/N) (1 + 2 A_Ω) [1 + O(N_exp/N)]
• where the integrated autocorrelation function is defined as
      A_Ω = Σ_{ℓ=1}^∞ C_Ω(ℓ)
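These formulas suggest a direct numerical estimate of A_Ω. The sketch below uses a synthetic AR(1) chain (an assumption, chosen because its autocorrelation C(ℓ) = rℓ is known exactly, so the variance inflation factor is 1 + 2A = (1+r)/(1−r)).

```python
import random

random.seed(4)

def integrated_autocorrelation(chain, max_lag):
    """A = sum_{l>=1} C(l); error bars on the mean grow by sqrt(1 + 2A)."""
    n = len(chain)
    mu = sum(chain) / n
    d = [c - mu for c in chain]
    c0 = sum(x * x for x in d) / n          # <Omega^2> estimate
    A = 0.0
    for lag in range(1, max_lag + 1):
        A += sum(d[t] * d[t + lag] for t in range(n - lag)) / ((n - lag) * c0)
    return A

# Synthetic chain x' = r*x + noise, with C(l) = r^l, so 1 + 2A = (1+r)/(1-r) = 9.
r, x, chain = 0.8, 0.0, []
for _ in range(50_000):
    x = r * x + random.gauss(0.0, 1.0)
    chain.append(x)

A = integrated_autocorrelation(chain, 60)
print(1.0 + 2.0 * A)  # should be near 9
```

In practice the cutoff max_lag must be chosen with care (a window several times N_exp), since summing noisy C(ℓ) to large lags adds variance to the estimate.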
Continuum limits

• We are not interested in lattice QFTs per se, but in their continuum limit as a → 0
  – this corresponds to a continuous phase transition of the discrete lattice model
    – for the continuum theory to have a finite correlation length (inverse mass gap) in physical units, this correlation length must diverge in lattice units
  – we expect such systems to exhibit universal behaviour as they approach a continuous phase transition
    – the nature of the continuum theory is characterised by its symmetries
    – the details of the microscopic interactions are unimportant at macroscopic length scales
    – universality is a consequence of the way that the theory scales as a → 0 while the physical correlation length is kept fixed
Continuum limits

• the nature of the continuum field theory depends on the way that physical quantities behave as the system approaches the continuum limit
• the required scaling of the parameters in the action is described by the renormalisation group equation (RGE)
  – the RGE expresses the “reparameterisation invariance” of the theory under the choice of scale at which we fix the renormalisation conditions
  – as a → 0 we expect the details of the regularisation scheme (cut-off effects, lattice artefacts) to vanish, so the effect of changing a is just an RGE transformation
    – on the lattice a renormalisation group transformation may be implemented as a “block spin” transformation of the fields
    – strictly speaking, the renormalisation “group” is a semigroup on the lattice, as blockings are not invertible
Continuum limits

• the continuum limit of a lattice QFT corresponds to a fixed point of the renormalisation group
  – at such a fixed point the form of the action does not change under an RG transformation
• the parameters in the action scale according to a set of critical exponents
• all known four dimensional QFTs correspond to trivial (Gaussian) fixed points
  – for such a fixed point the UV nature of the theory may be analysed using perturbation theory
  – monomials in the action may be classified according to their power counting dimension d:
    – d < 4: relevant (superrenormalisable)
    – d = 4: marginal (renormalisable)
    – d > 4: irrelevant (nonrenormalisable)
Continuum limits

• The behaviour of our Markov chains as the system approaches a continuous phase transition is described by its dynamical critical exponents
  – these describe how badly the system (model + algorithm) undergoes critical slowing down
  – the dynamical critical exponent z tells us how the cost of generating an independent configuration grows as the correlation length ξ of the system is taken to ∞: cost ∝ ξ^z
  – this is closely related (but not always identical) to the dynamical critical exponent for the exponential or integrated autocorrelations
Markov chains: “Coupling from the Past”

• Propp and Wilson (1996)
• Use a fixed set of random numbers
• Flypaper principle: if states coalesce they stay together forever
  – eventually, all states coalesce to some state φ with probability one
  – any state from t = −∞ will coalesce to φ
  – φ is a sample from the fixed point distribution
Global heatbath

• The ideal generator selects field configurations randomly and independently from the desired distribution
  – it is hard to construct such global heatbath generators for any but the simplest systems
  – they can be built by selecting sufficiently independent configurations from a Markov process…
  – … or better yet using CFTP, which guarantees that the samples are truly uncorrelated
  – but such generators are expensive!
Global heatbath

• For the purposes of Monte Carlo integration there is no need for the configurations to be completely uncorrelated
  – we just need to take the autocorrelations into account in our estimate of the variance of the resulting integral
  – using all (or most) of the configurations generated by a Markov process is more cost-effective than just using independent ones
  – the optimal choice balances the cost of applying each Markov step with the cost of making measurements
Local heatbath

• For systems with a local bosonic action we can build a Markov process with the fixed point distribution ∝ exp(−S) out of steps which update a single site with this fixed point
• If this update generates a new site variable value which is completely independent of its old value then it is called a local heatbath
• For free field theory we just need to generate a Gaussian-distributed random variable
Gaussian generators

• If x₁, x₂, …, xₙ are uniformly distributed random numbers, then n⁻¹ Σ_{j=1}^n xⱼ is approximately Gaussian by the Central Limit theorem
  – this is neither cheap nor accurate
Gaussian generators

• If x is uniformly distributed and f is a monotonically increasing function, then y = f(x) is distributed as
      P(y) = ∫ dx δ(y − f(x)) = 1 / f′(f⁻¹(y))
• Choosing f(x) = √(−2 ln x) we obtain P(y) = y e^{−½y²}
• Therefore generate two uniform random numbers x₁ and x₂, and set
      r = √(−2 ln x₁), θ = 2π x₂
  then
      y₁ = r cos θ, y₂ = r sin θ
  are two independent Gaussian distributed random numbers
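The construction above is the Box-Muller method; a direct transcription follows (using 1 − x₁ inside the logarithm to avoid log(0) is a small implementation convenience, not part of the slide).

```python
import math
import random

random.seed(5)

def box_muller():
    """Turn two uniform variates into two independent unit Gaussians."""
    x1, x2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - x1))   # 1 - x1 avoids log(0)
    theta = 2.0 * math.pi * x2
    return r * math.cos(theta), r * math.sin(theta)

samples = [y for _ in range(50_000) for y in box_muller()]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # near 0 and 1
```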
Gaussian generators

• Even better methods exist
  – the rectangle-wedge-tail (RWT) method
  – the ziggurat method
  – these do not require special function evaluations
  – they can be more interesting to implement for parallel computers
Local heatbath

• For pure gauge theories the field variables live on the links of the lattice and take their values in a representation of the gauge group
  – for SU(2) Creutz gave an exact local heatbath algorithm
    – it requires a rejection test: this is different from the Metropolis accept/reject step in that one must continue generating candidate group elements until one is accepted
  – for SU(3) the “quasi-heatbath” method of Cabibbo and Marinari is widely used
    – update a sequence of SU(2) subgroups
    – this is not quite an SU(3) heatbath method…
    – … but sweeping through the lattice updating SU(2) subgroups is also a valid algorithm, as long as the entire sweep is ergodic
    – for a higher acceptance rate there is an alternative SU(2) subgroup heatbath algorithm
Generalised Hybrid Monte Carlo

• In order to carry out Monte Carlo computations including the effects of dynamical fermions we would like to find an algorithm which
  – updates the fields globally
    – because single link updates are not cheap if the action is not local
  – takes large steps through configuration space
    – because small-step methods carry out a random walk, which leads to critical slowing down with a dynamical critical exponent z = 2
  – does not introduce any systematic errors
Generalised Hybrid Monte Carlo

• A useful class of algorithms with these properties is the (Generalised) Hybrid Monte Carlo (GHMC) method
  – introduce a “fictitious momentum” p corresponding to each dynamical degree of freedom q
  – find a Markov chain with fixed point ∝ exp[−H(q,p)], where H is the “fictitious Hamiltonian” H(q,p) = ½p² + S(q)
    – the action S of the underlying QFT plays the rôle of the potential in the “fictitious” classical mechanical system
    – this gives the evolution of the system in a fifth dimension, “fictitious” or computer time
  – this generates the desired distribution ∝ exp[−S(q)] if we ignore the momenta p (i.e., we take the marginal distribution)
Generalised Hybrid Monte Carlo

• The GHMC Markov chain alternates two Markov steps
  – Molecular Dynamics Monte Carlo (MDMC)
  – Partial Momentum Refreshment
• Both have the desired fixed point
• Together they are ergodic
MDMC

• If we could integrate Hamilton’s equations exactly we could follow a trajectory of constant fictitious energy
  – this corresponds to a set of equiprobable fictitious phase space configurations
  – Liouville’s theorem tells us that this also preserves the functional integral measure dp dq as required
• Any approximate integration scheme which is reversible and area preserving may be used to suggest configurations to a Metropolis accept/reject test
  – with acceptance probability min[1, exp(−δH)]
MDMC

• We build the MDMC step out of three parts
  – Molecular Dynamics (MD), an approximate integrator U(τ): (q, p) → (q′, p′) which is exactly
    – area preserving: det U* = det[∂(q′, p′)/∂(q, p)] = 1
    – reversible: F ∘ U(τ) ∘ F ∘ U(τ) = 1
  – a momentum flip F: p → −p
  – a Metropolis accept/reject step
• The composition of these gives
      (q′, p′) = [F ∘ U(τ)](q, p) if y < e^{−δH}, and (q′, p′) = (q, p) otherwise
  – with y a uniformly distributed random number in [0, 1]
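Putting the three parts together for a single degree of freedom gives the familiar HMC update. This sketch uses an illustrative quadratic action and the leapfrog as the reversible, area-preserving U(τ); the momentum flip is dropped because the momentum is refreshed completely each trajectory (θ = π/2), in which case the flips are irrelevant.

```python
import math
import random

random.seed(6)

def leapfrog(q, p, dt, nsteps, grad_s):
    """PQP leapfrog: reversible and area preserving for any step size."""
    p -= 0.5 * dt * grad_s(q)
    for _ in range(nsteps - 1):
        q += dt * p
        p -= dt * grad_s(q)
    q += dt * p
    p -= 0.5 * dt * grad_s(q)
    return q, p

def hmc(action, grad_s, q, dt, nsteps, ntraj):
    chain = []
    for _ in range(ntraj):
        p = random.gauss(0.0, 1.0)                      # momentum heatbath
        h0 = 0.5 * p * p + action(q)
        q_new, p_new = leapfrog(q, p, dt, nsteps, grad_s)
        dh = 0.5 * p_new * p_new + action(q_new) - h0
        if random.random() < min(1.0, math.exp(-dh)):   # Metropolis step
            q = q_new
        chain.append(q)
    return chain

# Toy action S(q) = q^2/2: the chain should have <q^2> = 1.
chain = hmc(lambda q: 0.5 * q * q, lambda q: q, 1.0, 0.3, 10, 20_000)
var = sum(c * c for c in chain) / len(chain)
print(var)  # near 1
```

The accept/reject step makes the algorithm exact even at this fairly coarse step size; the integrator only affects the acceptance rate.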
Partial Momentum Refreshment

• This mixes the Gaussian-distributed momenta p with Gaussian noise ξ:
      (p′, ξ′)ᵀ = [cos θ  sin θ; −sin θ  cos θ] F (p, ξ)ᵀ
• The Gaussian distribution of p is invariant under F
• the extra momentum flip F ensures that for small θ the momenta are reversed after a rejection rather than after an acceptance
• for θ = π/2 all momentum flips are irrelevant
Generalised Hybrid Monte Carlo

• Special cases
  – the Hybrid Monte Carlo (HMC) algorithm is the special case where θ = π/2
  – θ = 0 corresponds to an exact version of the Molecular Dynamics (MD) or Microcanonical algorithm (which is in general non-ergodic)
  – the L2MC algorithm of Horowitz corresponds to choosing arbitrary θ but MDMC trajectories of a single leapfrog integration step (τ = δτ); this method is also called the Kramers algorithm
  – the Langevin Monte Carlo algorithm corresponds to choosing θ = π/2 and MDMC trajectories of a single leapfrog integration step (τ = δτ)
Generalised Hybrid Monte Carlo

• Further special cases
  – the Hybrid and Langevin algorithms are approximations where the Metropolis step is omitted
  – the Local Hybrid Monte Carlo (LHMC) or Overrelaxation algorithm corresponds to updating a subset of the degrees of freedom (typically those living on one site or link) at a time
Symplectic Integrators

• Baker-Campbell-Hausdorff (BCH) formula
  – if A and B belong to any (non-commutative) algebra then e^A e^B = e^{A+B+δ}, where δ is constructed from commutators of A and B (i.e., it is in the Free Lie Algebra generated by {A, B})
  – more precisely, ln(e^A e^B) = Σ_{n≥1} cₙ, where c₁ = A + B and the higher terms are given by the recursion
      (n+1) cₙ₊₁ = ½ [A − B, cₙ] + Σ_{m≥1, 2m≤n} (B₂ₘ/(2m)!) Σ_{k₁+⋯+k₂ₘ=n, kᵢ≥1} [c_{k₁}, [⋯, [c_{k₂ₘ}, A + B] ⋯]]
  – here the B₂ₘ are Bernoulli numbers
Symplectic Integrators

• explicitly, the first few terms are
      ln(e^A e^B) = A + B + ½[A,B] + (1/12)([A,[A,B]] − [B,[A,B]]) − (1/24)[B,[A,[A,B]]] + ⋯
  – at fifth order there appear nested commutators such as [A,[A,[A,[A,B]]]], [B,[A,[A,[A,B]]]], [[A,B],[A,[A,B]]], [B,[B,[A,[A,B]]]], [[A,B],[B,[A,B]]], and [B,[B,[B,[A,B]]]], with coefficients that are multiples of 1/720
• in order to construct reversible integrators we use symmetric symplectic integrators
• the following identity follows directly from the BCH formula
      ln(e^{A/2} e^B e^{A/2}) = A + B − (1/24)([A,[A,B]] + 2[B,[A,B]]) + ⋯
  – with the analogous fifth-order nested commutators carrying coefficients that are multiples of 1/5760
Symplectic Integrators

• We are interested in finding the classical trajectory in phase space of a system described by the Hamiltonian
      H(q, p) = T(p) + S(q) = ½p² + S(q)
• the basic idea of such a symplectic integrator is to write the time evolution operator as
      exp(τ d/dt) = exp(τ [(dq/dt) ∂/∂q + (dp/dt) ∂/∂p])
                  = exp(τ [(∂H/∂p) ∂/∂q − (∂H/∂q) ∂/∂p]) ≡ e^{τĤ}
                  = exp(τ [T′(p) ∂/∂q − S′(q) ∂/∂p])
Symplectic Integrators

• define Q ≡ T′(p) ∂/∂q and P ≡ −S′(q) ∂/∂p, so that Ĥ = P + Q
• since the kinetic energy T is a function only of p and the potential energy S is a function only of q, it follows that the action of e^{δτ Q} and e^{δτ P} may be evaluated trivially:
      e^{δτ Q}: f(q, p) → f(q + δτ T′(p), p)
      e^{δτ P}: f(q, p) → f(q, p − δτ S′(q))
Symplectic Integrators

• from the BCH formula we find that the PQP symmetric symplectic integrator is given by
      U₀(δτ)^{τ/δτ} = [e^{½δτ P} e^{δτ Q} e^{½δτ P}]^{τ/δτ}
                    = exp[τ(P + Q) − (δτ² τ/24)([P,[P,Q]] + 2[Q,[P,Q]]) + O(δτ⁴)]
                    = e^{τĤ} + O(δτ²)
• in addition to conserving energy to O(δτ²), such symmetric symplectic integrators are manifestly area preserving and reversible
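The O(δτ²) energy conservation is easy to exhibit numerically. This sketch integrates the harmonic oscillator H = ½(p² + q²) (an illustrative choice) with the PQP leapfrog and checks that halving δτ reduces |δH| by roughly a factor of four.

```python
def pqp_energy_error(dt, tau):
    """|H(end) - H(start)| for S(q) = q^2/2 after trajectory length tau."""
    q, p = 1.0, 0.0
    nsteps = int(round(tau / dt))
    # PQP leapfrog: half kick, (n-1) full drift/kick pairs, final drift, half kick.
    p -= 0.5 * dt * q
    for _ in range(nsteps - 1):
        q += dt * p
        p -= dt * q
    q += dt * p
    p -= 0.5 * dt * q
    return abs(0.5 * (p * p + q * q) - 0.5)

e_coarse = pqp_energy_error(0.1, 1.0)
e_fine = pqp_energy_error(0.05, 1.0)
print(e_coarse / e_fine)  # close to 2^2 = 4
```

The residual deviation of the ratio from 4 comes from the O(δτ⁴) terms in the shadow Hamiltonian discussed on the next slides.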
Symplectic Integrators

• For each symplectic integrator there exists a Hamiltonian H′ which is exactly conserved
• For the PQP integrator we have, writing Ĥ′ = (∂H′/∂p) ∂/∂q − (∂H′/∂q) ∂/∂p,
      Ĥ′ = P + Q − (δτ²/24)([P,[P,Q]] + 2[Q,[P,Q]]) + O(δτ⁴)
  – the O(δτ⁴) term consists of degree-five nested commutators of P and Q, such as [P,[P,[P,[P,Q]]]] and [Q,[Q,[P,[P,Q]]]], with coefficients that are multiples of 1/5760, followed by O(δτ⁶) corrections
Symplectic Integrators

• substituting the explicit forms of the operators P and Q we obtain the vector field
      Ĥ′ = (∂H′/∂p) ∂/∂q − (∂H′/∂q) ∂/∂p
         = [p ∂/∂q − S′ ∂/∂p] + (δτ²/12)[2pS″ ∂/∂q + (S′S″ − p²S‴) ∂/∂p] + O(δτ⁴)
  – the O(δτ⁴) corrections (with coefficient 1/720) involve still higher derivatives of S and higher powers of p
Symplectic Integrators

• solving these differential equations for H′ we find
      H′ = H + (δτ²/24)(2p²S″ − S′²) + O(δτ⁴)
  – the O(δτ⁴) correction (with coefficient 1/720) involves terms such as p⁴S⁗ and products of S′, S″, and S‴
• note that H′ cannot be written as the sum of a p-dependent kinetic term and a q-dependent potential term
Langevin algorithm

• Consider the Hybrid Monte Carlo algorithm when we take only one leapfrog step
  – combining the leapfrog equations of motion we obtain
      φ′ = φ + δτ π − ½δτ² S′(φ)
      π′ = π − ½δτ [S′(φ) + S′(φ′)]
  – ignore the Monte Carlo step
  – recall that for θ = π/2 the fictitious momenta π are just Gaussian-distributed random noise ξ
  – let ε = δτ²; then
      φ′ = φ − ½ε S′(φ) + √ε ξ
  which is the usual form of the Langevin equation
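The Langevin update derived above can be simulated directly. For the illustrative choice S(φ) = φ²/2 the discrete chain is exactly solvable: its stationary variance is 1/(1 − ε/4) rather than 1, exhibiting the finite-step-size error of the uncorrected algorithm.

```python
import math
import random

random.seed(7)

def langevin_chain(grad_s, phi, eps, n):
    """phi' = phi - (eps/2) S'(phi) + sqrt(eps) * xi, with xi ~ N(0,1).

    No Metropolis correction is applied, so the fixed point is
    exp(-(S + Delta S)) with Delta S = O(eps)."""
    chain = []
    sqrt_eps = math.sqrt(eps)
    for _ in range(n):
        phi = phi - 0.5 * eps * grad_s(phi) + sqrt_eps * random.gauss(0.0, 1.0)
        chain.append(phi)
    return chain

# S(phi) = phi^2/2: exact variance is 1, but the finite-step chain
# equilibrates to variance 1/(1 - eps/4) = 1.0526... for eps = 0.2.
eps = 0.2
chain = langevin_chain(lambda phi: phi, 0.0, eps, 400_000)
var = sum(c * c for c in chain) / len(chain)
print(var)  # biased above 1 by about 5%
```

This bias is exactly the ΔS = O(δτ²) distortion of the fixed point analysed on the following slides.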
Inexact algorithms

• What happens if we omit the Metropolis test?
  – we still have an ergodic algorithm, so there is some unique fixed point distribution ∝ e^{−(S+ΔS)} which will be generated by the algorithm
  – the condition for this to be a fixed point is
      e^{−S(φ′)−ΔS(φ′)} = ∫ dφ dξ e^{−S(φ)−ΔS(φ)} e^{−½ξ²} δ(φ′ − U_ξ(φ))
  – for the Langevin algorithm (which corresponds to τ = δτ, a single leapfrog step) we may expand in powers of δτ and find a solution ΔS = Σₙ δτⁿ Sₙ
  – the equation determining the leading term in this expansion is the Fokker-Planck equation; the general equations are sometimes known as the Kramers-Moyal equations
Inexact algorithms

• in the general case we may change variables to (φ′, π′) = U(φ, π), whence we obtain
      e^{−S(φ′)−ΔS(φ′)} = ∫ dφ dπ e^{−H(φ,π)−ΔS(φ)} δ(φ′ − U(φ, π))
• the change of variables uses
  – area preservation: det U* = 1
  – reversibility: U⁻¹ = F ∘ U ∘ F, with F² = 1
  – invariance under the momentum flip: H ∘ F = H and S ∘ F = S
• defining δH ≡ H ∘ U⁻¹ − H and δS ≡ S ∘ U⁻¹ − S, this gives an implicit equation for ΔS,
      e^{−S(φ′)−ΔS(φ′)} = ∫ dπ′ e^{−H(φ′,π′)} e^{−(δH + ΔS ∘ U⁻¹ − ΔS)(φ′,π′)}
  which may be solved order by order in δτ
Inexact algorithms

• since H is extensive so is δH, and thus so is the connected generating function (cumulants) W defined by
      ⟨e^{−δH}⟩ = e^{W}
• we can thus show order by order in δτ that ΔS is extensive too
Inexact algorithms

• For the Langevin algorithm we have δH = O(δτ⁴), so we immediately find that ΔS = O(δτ²)
      S₂ = ⅛ Σⱼ (2 S_{jj} − S_j²)
  – (subscripts denote differentiation with respect to the field at the indicated sites)
  – the next correction S₄ (with coefficient 1/48) is built from contractions such as S_{ijjkk}, S_{ijjk}S_k, S_{ijk}S_jS_k, and S_{ijk}S_{ijk}
  – if S is local then so are all the Sₙ
  – we only have an asymptotic expansion for ΔS, and in general the subleading terms are neither local nor extensive
  – therefore, taking the continuum limit a → 0 for fixed δτ will probably not give exact results for renormalised quantities
Inexact algorithms

• For the Hybrid algorithm δH = O(δτ²) and Sₙ = O(δτ⁰), so again we find ΔS = O(δτ²)
  – we have made use of the fact that H′ is conserved and differs from H by O(δτ²)
Inexact algorithms

• What effect do such systematic errors in the distribution have on measurements?
  – large errors will occur where observables are discontinuous functions of the parameters in the action, e.g., near phase transitions
  – if we want to improve the action so as to reduce lattice discretisation effects and get a better approximation to the underlying continuum theory, then we have to ensure that δτ² ≪ aⁿ for the appropriate n
  – step size errors need not be irrelevant
Local HMC

• Consider the Gaussian model defined by the free field action
      S(φ) = ½ Σₓ [Σ_{μ=1}^d (∂_μ φ(x))² + m² φ(x)²]
• introduce the Hamiltonian H = ½π² + S(φ) on “fictitious” phase space
  – the corresponding equation of motion for the single site x is
      φ̈ₓ = −ω² φₓ + F, where ω² = 2d + m² and F = Σ_{y: |x−y|=1} φ_y
  – the solution in terms of the Gaussian-distributed random initial momentum πₓ and the initial field value φₓ is
      φₓ(t) = φₓ cos(ωt) + (F/ω²)(1 − cos(ωt)) + (πₓ/ω) sin(ωt)
Local HMC

• identify ζ ≡ 1 − cos(ωt), and note that sin(ωt) = √(ζ(2−ζ)), to get the usual Adler overrelaxation update
      φₓ → (1 − ζ) φₓ + ζ F/ω² + √(ζ(2−ζ)) πₓ/ω
• For gauge theories various overrelaxation methods have been suggested
  – Hybrid Overrelaxation: this alternates a heatbath step with many overrelaxation steps with ζ = 2
  – LHMC: this uses an analytic solution of the equations of motion for the update of a single U(1) or SO(2) subgroup at a time; in this case the equations of motion may be solved in terms of elliptic functions
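The Adler update above preserves the single-site Gaussian distribution for any 0 < ζ < 2, which can be checked numerically. In this sketch the neighbour sum F is frozen and the parameter values are illustrative assumptions.

```python
import math
import random

random.seed(8)

def adler_update(phi, F, omega2, zeta):
    """Over-relaxed single-site update for the Gaussian model."""
    omega = math.sqrt(omega2)
    eta = random.gauss(0.0, 1.0)   # plays the role of the refreshed momentum
    return ((1.0 - zeta) * phi + zeta * F / omega2
            + math.sqrt(zeta * (2.0 - zeta)) * eta / omega)

F, omega2, zeta = 3.0, 5.0, 1.7    # frozen neighbour sum; omega^2 = 2d + m^2
phi, samples = 0.0, []
for _ in range(100_000):
    phi = adler_update(phi, F, omega2, zeta)
    samples.append(phi)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # near F/omega^2 = 0.6 and 1/omega^2 = 0.2
```

ζ near 2 gives strongly anticorrelated successive values, which is what makes overrelaxation decorrelate faster than a plain heatbath sweep.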
Classical Mechanics on Group Manifolds

• Gauge fields take their values in some Lie group, so we need to define classical mechanics on a group manifold which preserves the group-invariant Haar measure
  – a Lie group G is a smooth manifold on which there is a natural mapping L: G × G → G defined by the group action
  – this induces a map called the pull-back of L, together with the corresponding maps on the tangent and cotangent bundles:
      L*: G × F → F,       L*_g f = f ∘ L_g
      L_*: G × TG → TG,    (L_{g*} v)(f) = v(L*_g f)
      L*: G × T*G → T*G,   (L*_g α)(v) = α(L_{g*} v)
  – F is the space of 0 forms, which are smooth mappings from G to the real numbers
Classical Mechanics on Group Manifolds

• a form α is left invariant if L*_g α = α
• the tangent space to a Lie group at the origin is called the Lie algebra, and we may choose a set of basis vectors eᵢ(0) which satisfy the commutation relations [eᵢ, eⱼ] = Σₖ cᵢⱼᵏ eₖ, where the cᵢⱼᵏ are the structure constants of the algebra
• we may define a set of left invariant vector fields on TG by eᵢ(g) = L_{g*} eᵢ(0)
• the corresponding left invariant dual forms θⁱ satisfy the Maurer-Cartan equations
      dθⁱ = −½ Σⱼₖ cⱼₖⁱ θʲ ∧ θᵏ
• we may therefore define a closed symplectic 2 form, which globally defines an invariant Poisson bracket, by
      ω = d(Σᵢ θⁱ pᵢ) = Σᵢ (θⁱ ∧ dpᵢ + pᵢ dθⁱ) = Σᵢ θⁱ ∧ dpᵢ − ½ Σᵢⱼₖ pᵢ cⱼₖⁱ θʲ ∧ θᵏ
Classical Mechanics on Group Manifolds

• We may now follow the usual procedure to find the equations of motion
  – introduce a Hamiltonian function (0 form) H on the cotangent bundle (phase space) over the group manifold
  – define a vector field h such that dH(y) = ω(h, y) for all y
  – the classical trajectories t → (Q(t), P(t)) are then the integral curves of h
  – in the natural basis we expand h = Σᵢ (hⁱ eᵢ + h̃ᵢ ∂/∂pᵢ) and y = Σᵢ (yⁱ eᵢ + ỹᵢ ∂/∂pᵢ), so that
      dH(y) = Σᵢ [(eᵢH) yⁱ + (∂H/∂pᵢ) ỹᵢ] = ω(h, y) = Σᵢ (hⁱ ỹᵢ − h̃ᵢ yⁱ) − Σᵢⱼₖ pᵢ cⱼₖⁱ hʲ yᵏ
Classical Mechanics on Group Manifolds

• equating coefficients of the components of y we find
      h = Σᵢ (∂H/∂pᵢ) eᵢ − Σᵢ [eᵢH − Σⱼₖ cⱼᵢᵏ pₖ (∂H/∂pⱼ)] ∂/∂pᵢ
• the equations of motion in the local coordinate basis eⱼ = Σᵢ eⱼⁱ ∂ᵢ are therefore
      Q̇ⁱ = Σⱼ (∂H/∂pⱼ) eⱼⁱ,   Ṗᵢ = −eᵢH + Σⱼₖ cⱼᵢᵏ pₖ (∂H/∂pⱼ)
• for a Hamiltonian of the form H = ½p² + S(q) these reduce to
      Q̇ʲ = Σᵢ pᵢ eᵢʲ,   Ṗⱼ = −Σₖ eⱼᵏ ∂S/∂qᵏ
  – the structure-constant term drops out here because the totally antisymmetric cⱼᵢᵏ is contracted with the symmetric combination pₖ pⱼ
Classical Mechanics on Group Manifolds

In terms of constrained variables U(q) = e^{q^i T_i}
 the representation of the generators is T_i = ∂U(g)/∂g^i |_{g=0}
 from which it follows that e_i U = T_i U
 and for the Hamiltonian H = ½ Σ_i p_i² + S(U) this leads to the
equations of motion
U̇ = P U
Ṗ = −Σ_i T_i [ (∂S/∂U_ab) (T_i U)_ab + (∂S/∂U†_ab) (U† T_i†)_ab ] ≡ −T(S′(U) U)
 for the case G = SU(n) the operator T is the projection onto
traceless antihermitian matrices
Classical Mechanics on Group Manifolds

Discrete equations of motion
 we can now easily construct a discrete PQP symmetric
integrator from these equations
P(½δτ) = P(0) − ½δτ T(S′(U(0)) U(0))
U(δτ) = exp(δτ P(½δτ)) U(0)
P(δτ) = P(½δτ) − ½δτ T(S′(U(δτ)) U(δτ))
 the exponential map from the Lie algebra to the Lie group may
be evaluated exactly using the Cayley–Hamilton theorem
– all functions of an n × n matrix M may be written as a polynomial
of degree n − 1 in M
– the coefficients of this polynomial can be expressed in terms of
the invariants (traces of powers) of M
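The Cayley–Hamilton remark can be made concrete for SU(2): a traceless antihermitian 2 × 2 matrix A satisfies A² = −|a|²·1, so every power of A collapses onto the span of {1, A}, giving exp(A) = cos|a|·1 + (sin|a|/|a|)·A exactly. A minimal sketch, with hand-rolled 2 × 2 matrix helpers (the particular matrix entries are illustrative, not from the slides):

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def axpy(c, A, B):
    # c*A + B, entrywise
    return [[c * A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# a traceless antihermitian matrix A = i (a . sigma), an element of su(2)
a1, a2, a3 = 0.3, -0.4, 0.2
A = [[1j * a3, a2 + 1j * a1],
     [-a2 + 1j * a1, -1j * a3]]
r = math.sqrt(a1 * a1 + a2 * a2 + a3 * a3)  # then A @ A = -r^2 * identity

I = [[1.0, 0.0], [0.0, 1.0]]
Z = [[0.0, 0.0], [0.0, 0.0]]

# Cayley-Hamilton closed form: a degree-1 polynomial in A
exp_poly = axpy(math.sin(r) / r, A, axpy(math.cos(r), I, Z))

# reference: truncated Taylor series sum_n A^n / n!
exp_taylor, term = I, I
for n in range(1, 25):
    term = matmul(term, A)
    term = [[t / n for t in row] for row in term]
    exp_taylor = axpy(1.0, term, exp_taylor)

err = max(abs(exp_poly[i][j] - exp_taylor[i][j])
          for i in range(2) for j in range(2))
```

For SU(3) the same idea needs a degree-2 polynomial whose coefficients follow from tr A² and tr A³, as the slide states.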
Dynamical fermions

Fermion fields are Grassmann valued
 required to satisfy the spin-statistics theorem
 even “classical” Fermion fields obey anticommutation relations
 Grassmann algebras behave like negative dimensional
manifolds
Grassmann algebras

Grassmann algebras
 linear space spanned by generators {θ₁, θ₂, θ₃, …} with
coefficients a, b, … in some field (usually the real or complex
numbers)
 algebra structure defined by nilpotency condition for elements
of the linear space: θ² = 0
– there are many elements of the algebra which are not in the linear
space (e.g., θ₁θ₂)
– nilpotency implies anticommutativity θξ + ξθ = 0
• 0 = θ² = ξ² = (θ + ξ)² = θ² + θξ + ξθ + ξ² = θξ + ξθ = 0
– anticommutativity implies nilpotency 2θ² = 0 (unless the
coefficient field has characteristic 2, i.e., 2 = 0)
Grassmann algebras
 Grassmann algebras have a natural grading corresponding to
the number of generators in a given product
– deg(1) = 0, deg(θᵢ) = 1, deg(θᵢθⱼ) = 2, …
– all elements of the algebra can be decomposed into a sum of
terms of definite grading
– the parity transform of θ is P(θ) = (−1)^{deg θ} θ
– a natural antiderivation ∂ is defined on a Grassmann algebra
• linearity: ∂(aθ + bξ) = a ∂θ + b ∂ξ
• anti-Leibniz rule: ∂(θξ) = (∂θ)ξ + P(θ)(∂ξ)
 definite integration on a Grassmann algebra is defined to be
the same as derivation
– hence a linear change of variables θᵢ = aᵢⱼξⱼ leads to the inverse Jacobian
∫dθ₁ ⋯ dθₙ = det(∂θᵢ/∂ξⱼ)⁻¹ ∫dξ₁ ⋯ dξₙ
Grassmann algebras
 Gaussians over Grassmann manifolds
– like all functions, Gaussians are polynomials over Grassmann
algebras
e^{Σᵢⱼ θᵢ aᵢⱼ θⱼ} = ∏ᵢⱼ e^{θᵢ aᵢⱼ θⱼ} = ∏ᵢⱼ (1 + θᵢ aᵢⱼ θⱼ)
– there is no reason for this function to be positive even for real
coefficients
Grassmann algebras
 Gaussian integrals over Grassmann manifolds
∫dθ₁ ⋯ dθₙ e^{Σᵢⱼ θᵢ aᵢⱼ θⱼ} = Pf(a)
– where Pf(a) is the Pfaffian of the antisymmetric matrix a
• Pf(a)² = det(a)
– if we separate the Grassmann variables into two “conjugate” sets
then we find the more familiar result
∫dθ₁ ⋯ dθₙ dθ̄₁ ⋯ dθ̄ₙ e^{Σᵢⱼ θ̄ᵢ aᵢⱼ θⱼ} = det(a)
– despite the notation, this is a purely algebraic identity
– it does not require the matrix a > 0, unlike the bosonic analogue
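The identity Pf(a)² = det(a) can be checked directly for a small antisymmetric matrix: for 4 × 4 the Pfaffian has the closed form a₀₁a₂₃ − a₀₂a₁₃ + a₀₃a₁₂. A quick numerical check (the matrix entries below are arbitrary illustrative numbers):

```python
import itertools

def det(m):
    # Leibniz formula, fine for a 4x4 matrix
    n = len(m)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1.0
        for i in range(n):
            prod *= m[i][perm[i]]
        total += sign * prod
    return total

def pfaffian4(a):
    # Pfaffian of a 4x4 antisymmetric matrix
    return a[0][1] * a[2][3] - a[0][2] * a[1][3] + a[0][3] * a[1][2]

a = [[0.0, 1.7, -0.3, 2.2],
     [-1.7, 0.0, 0.8, -1.1],
     [0.3, -0.8, 0.0, 0.5],
     [-2.2, 1.1, -0.5, 0.0]]

lhs = pfaffian4(a) ** 2
rhs = det(a)
```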
Dynamical fermions

Pseudofermions
 direct simulation of Grassmann fields is not feasible
– the problem is not that of manipulating anticommuting values in a
computer
– it is that e^{−S_F} = e^{−ψ̄Mψ} is not positive, and thus we get poor
importance sampling
 we therefore integrate out the fermion fields to obtain the
fermion determinant ∫dψ̄ dψ e^{−ψ̄Mψ} = det M
– ψ̄ and ψ always occur quadratically
– the overall sign of the exponent is unimportant
Dynamical fermions
 any operator Ω can be expressed solely in terms of the
bosonic fields
⟨Ω(ψ̄, ψ, φ)⟩ = ⟨Ω(∂/∂η, ∂/∂η̄, φ) e^{η̄ M⁻¹ η}⟩ |_{η = η̄ = 0}
– e.g., the fermion propagator is
G(x, y) = ⟨ψ(x) ψ̄(y)⟩ = M⁻¹(x, y)
Dynamical fermions
 including the determinant as part of the observable to be
measured is not feasible
⟨Ω⟩ = ⟨Ω det M⟩_{S_B} / ⟨det M⟩_{S_B}
– the determinant is extensive in the lattice volume, thus again we
get poor importance sampling
 represent the fermion determinant as a bosonic Gaussian
integral with a non-local kernel
det M ∝ ∫dφ† dφ e^{−φ† M⁻¹ φ}
 the fermion kernel must be positive definite (all its eigenvalues
must have positive real parts) otherwise the bosonic integral
will not converge
 the new bosonic fields are called “pseudofermions”
Dynamical fermions
 it is usually convenient to introduce two flavours of fermion and
to write
(det M)² = det(M†M) ∝ ∫dφ† dφ e^{−φ† (M†M)⁻¹ φ}
 this not only guarantees positivity, but also allows us to
generate the pseudofermions from a global heatbath by
applying M† to a random Gaussian distributed field
 the equations of motion for the boson (gauge) fields are
π̇ = −δS_B/δU + φ† (M†M)⁻¹ [δ(M†M)/δU] (M†M)⁻¹ φ
– using δ(A⁻¹) = −A⁻¹ (δA) A⁻¹
 the evaluation of the pseudofermion action and the
corresponding force then requires us to find the solution of a
(large) set of linear equations, (M†M) x = φ
Dynamical fermions

It is not necessary to carry out the inversions
required for the equations of motion exactly
 there is a trade-off between the cost of computing the force
and the acceptance rate of the Metropolis MDMC step

The inversions required to compute the
pseudofermion action for the accept/reject step
do need to be computed exactly, however
 we usually take “exactly” to be synonymous with “to machine
precision”
– this may well not be true, as we will discuss later
Krylov solvers

One of the main reasons why dynamical fermion lattice
calculations are feasible is the existence of very
effective numerical methods for solving large sparse
systems of linear equations
 family of iterative methods based on Krylov spaces
– Conjugate Gradients (CG, CGNE)
– BiConjugate Gradients (BiCG, BiCGstab, BiCGγ5)
 these are often introduced as exact methods
– they require O(V) iterations to find the solution
– they do not give the exact answer in practice because of rounding
errors
– they are more naturally thought of as methods for solving systems
of linear equations in an (almost) ∞-dimensional linear space
• this is what we are approximating on the lattice anyway
Krylov solvers

BiCG on a Banach space
 we want to solve the equation Ax = b on a Banach space
– this is a normed linear space
– the norm endows the space with a topology
– the linear space operations are continuous in this topology
 we solve the system exactly in a Krylov subspace
K_n = span{b, Ab, A²b, A³b, …, Aⁿb}
 there is no concept of “orthogonality” in a Banach space, so
we also need to introduce a dual Krylov space of linear
functionals on the Banach space
K̂_n = span{b̂, A†b̂, A†²b̂, A†³b̂, …, A†ⁿb̂}
 the adjoint of a linear operator is defined by
(A†b̂)(x) = b̂(Ax) ∀x
Krylov solvers
 for iteration n we choose a search direction p_{n+1}
 such that if we add a multiple of it to the previous solution
x_{n+1} = x_n + α p_{n+1}
 then the resulting residual satisfies
r_{n+1} = b − A x_{n+1} = r_n − α A p_{n+1} ∈ ker K̂_n
 we can then choose α such that r_{n+1} ∈ ker K̂_{n+1}
 it suffices to use the three term recurrence
p_{n+1} = r_n + β p_n + γ p_{n−1}
 because A p_j ∈ ker K̂_n ∀j ≤ n − 2
 this process converges in the norm topology (if the solution
exists!)
 the norm of the residual does not decrease monotonically
Krylov solvers

CG on Hilbert space
 if our topological normed linear space has a sesquilinear inner
product then it is called a Hilbert space
 this allows us to define a natural mapping from vectors to
linear functionals f : x ↦ x̂ by x̂(y) = (x, y) ∀y
 the operator A is self adjoint if A† = A, i.e., (Ax, y) = (x, Ay) ∀x, y
 if A is self adjoint then we can choose K̂_n = f(K_n)
 this is the CG algorithm
 the norm decreases monotonically in this case
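For a self-adjoint positive definite A the CG recurrences described above take only a few lines. A minimal sketch on an illustrative 3 × 3 system (plain Python; a lattice implementation would replace the dense matvec with the sparse Dirac operator):

```python
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg(A, b, tol=1e-12, maxiter=100):
    # Conjugate Gradients for a self-adjoint positive definite matrix A
    x = [0.0] * len(b)
    r = b[:]                  # residual r0 = b - A x0 with x0 = 0
    p = r[:]                  # first search direction
    rs = dot(r, r)
    for _ in range(maxiter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = cg(A, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
```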
Higher order integrators

Campostrini “wiggles”
 since U₀(δτ) = e^{δτ Ĥ + δτ³ R₀ + O(δτ⁵)} we are led to consider
the Campostrini “wiggle”
Û₀(δτ) = U₀(δτ) U₀(−αδτ) U₀(δτ) = e^{(2−α)δτ Ĥ + (2−α³)δτ³ R₀ + O(δτ⁵)}
 this can be adjusted to give an integration scheme correct to O(δτ⁴)
by choosing α³ = 2, i.e., α = 2^{1/3}
 the step size may be kept fixed by taking δτ → δτ/(2 − α)
 to get integration schemes of arbitrarily high order, all of which
are still reversible and area preserving, we use the recursive
definition
Û_{n+1}(δτ) = U_n(δτ) U_n(−α_n δτ) U_n(δτ)
= e^{(2−α_n)δτ Ĥ + (2−α_n^{2n+3})δτ^{2n+3} R_{n+1} + O(δτ^{2n+5})}
 with α_n = 2^{1/(2n+3)} and δτ → δτ/(2 − α_n)
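At n = 0 the recursion is the familiar fourth-order scheme built from three leapfrog substeps of lengths γδτ, −αγδτ, γδτ with α = 2^{1/3} and γ = 1/(2 − α). A sketch for a unit-frequency harmonic oscillator (the oscillator and step sizes are illustrative assumptions, not from the slides) shows the endpoint error dropping roughly as δτ⁴ instead of δτ²:

```python
import math

def leapfrog(q, p, dt):
    # one PQP step for H = (p^2 + q^2)/2
    p -= 0.5 * dt * q
    q += dt * p
    p -= 0.5 * dt * q
    return q, p

ALPHA = 2.0 ** (1.0 / 3.0)
GAMMA = 1.0 / (2.0 - ALPHA)

def wiggle(q, p, dt):
    # one "wiggle": substeps gamma*dt, -alpha*gamma*dt, gamma*dt
    for s in (GAMMA * dt, -ALPHA * GAMMA * dt, GAMMA * dt):
        q, p = leapfrog(q, p, s)
    return q, p

def endpoint_error(stepper, dt, t=2.0):
    q, p = 1.0, 0.0
    for _ in range(round(t / dt)):
        q, p = stepper(q, p, dt)
    return abs(q - math.cos(t))  # exact trajectory is q(t) = cos(t)

ratio_lf = endpoint_error(leapfrog, 0.2) / endpoint_error(leapfrog, 0.1)
ratio_wg = endpoint_error(wiggle, 0.2) / endpoint_error(wiggle, 0.1)
```

Halving δτ should shrink the leapfrog error by about 2² and the wiggle error by about 2⁴.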
Higher order integrators

So why aren’t they used in practice?
 consider the simplest leapfrog scheme for a single simple
harmonic oscillator with frequency ω
 for a single step we have
(q(t+δτ), p(t+δτ))ᵀ = M (q(t), p(t))ᵀ, where
M = [ 1 − ½ω²δτ²             δτ          ]
    [ −ω²δτ(1 − ¼ω²δτ²)      1 − ½ω²δτ²  ]
 the eigenvalues of this matrix are
λ± = 1 − ½(ωδτ)² ± iωδτ √(1 − ¼(ωδτ)²) = e^{±i cos⁻¹(1 − ½(ωδτ)²)}
 for ωδτ > 2 the integrator becomes unstable
Higher order integrators
 the orbits change from circles to hyperbolae
 the energy diverges exponentially in τ instead of oscillating
 for bosonic systems δH ≫ 1 for such a large integration step
size
 for light dynamical fermions there seem to be a few “modes”
which have a large force due to the small eigenvalues of the
Dirac operator, and this force plays the rôle of the frequency ω
 for these systems δτ is limited by stability rather than by the
Metropolis acceptance rate
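The ωδτ = 2 threshold is easy to see numerically: below it the leapfrog energy stays bounded, above it the trajectory hops onto hyperbolae and the energy grows exponentially. A minimal sketch (unit frequency; the particular step sizes are illustrative):

```python
def max_energy(om, dt, steps=200):
    # track the maximum of H = (p^2 + om^2 q^2)/2 along a leapfrog trajectory
    q, p = 1.0, 0.0
    hmax = 0.0
    for _ in range(steps):
        p -= 0.5 * dt * om * om * q
        q += dt * p
        p -= 0.5 * dt * om * om * q
        hmax = max(hmax, 0.5 * p * p + 0.5 * om * om * q * q)
    return hmax

stable = max_energy(1.0, 1.9)    # om*dt = 1.9 < 2: energy oscillates, stays O(1)
unstable = max_energy(1.0, 2.1)  # om*dt = 2.1 > 2: energy diverges
```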
Instabilities

Some recent full QCD
measurements on large
lattices with quite light
quarks
 large CG residual: always
unstable
 small CG residual: unstable
for δτ > 0.0075
 intermediate CG residual:
unstable for δτ > 0.0075
16³ × 32 lattice, κ = 0.1355, β = 5.2, c_SW = 2.0171
Acceptance rates

We can compute the average Metropolis
acceptance rate ⟨P_acc⟩
 area preservation implies that ⟨e^{−δH}⟩ = 1, since
(1/Z) ∫dq dp e^{−H(q,p)} e^{−δH} = (1/Z) ∫dq dp e^{−H(q′,p′)}
= (1/Z) ∫dq′ dp′ e^{−H(q′,p′)} = 1
 the probability distribution of δH has an asymptotic expansion as the
lattice volume V → ∞
P(δH) = (1/Z) ∫dq dp e^{−H} δ(H′ − H − δH)
~ (1/√(4π⟨δH⟩)) exp(−(δH − ⟨δH⟩)² / (4⟨δH⟩))
 the average Metropolis acceptance rate is thus
⟨P_acc⟩ = ⟨min(1, e^{−δH})⟩ ~ erfc(½√⟨δH⟩) = erfc(√(⟨δH²⟩/8))
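The erfc formula can be checked against a direct numerical average of min(1, e^{−δH}) over the asymptotic Gaussian distribution of δH (mean μ = ⟨δH⟩, variance 2μ, as fixed by ⟨e^{−δH}⟩ = 1). A sketch using Simpson quadrature (the values of μ are illustrative):

```python
import math

def avg_acceptance(mu, n=20001):
    # <min(1, exp(-dH))> over a Gaussian with mean mu and variance 2*mu
    sigma = math.sqrt(2.0 * mu)
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 1.0 if i in (0, n - 1) else (4.0 if i % 2 else 2.0)  # Simpson weights
        gauss = math.exp(-(x - mu) ** 2 / (4.0 * mu)) / math.sqrt(4.0 * math.pi * mu)
        total += w * gauss * min(1.0, math.exp(-x))
    return total * h / 3.0

worst = max(abs(avg_acceptance(mu) - math.erfc(0.5 * math.sqrt(mu)))
            for mu in (0.1, 0.5, 2.0))
```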
Acceptance rates

The acceptance rate is a function of the
variable x = V δτ^{4n+4} for the nth order Campostrini
“wiggle” used to generate trajectories with
τ = 1, and x = V δτ^{4n+6} for τ = δτ
Cost and dynamical critical exponents

To a good approximation, the cost C of a
Molecular Dynamics computation is
proportional to the total time for which we have
to integrate Hamilton’s equations.
 the cost per independent configuration is then
C ∝ (1 + 2A_Ω) τ / δτ
 note that the cost depends on the particular operator under
consideration
 the optimal trajectory length is obtained by minimising the cost
as a function of the parameters of the algorithm
Cost and dynamical critical exponents

While the cost depends upon the details of the
implementation of the algorithm, the way that it
scales with the correlation length ξ of the
system is an intrinsic property of the algorithm;
C ∝ ξ^z where z is the dynamical critical
exponent.
 for local algorithms the cost is independent of the trajectory
length, C ∝ 1 + 2A_Ω, and thus minimising the cost is
equivalent to minimising A_Ω
 free field theory analysis is useful for understanding and
optimising algorithms, especially if our results do not depend
on the details of the spectrum
Cost and dynamical critical exponents

HMC
 for free field theory we can show that choosing τ ∝ ξ gives
z = 1
 this is to be compared with z = 2 for constant τ
 the optimum cost for HMC is thus
C ∝ V (ξ/δτ) ∝ V^{5/4} ξ  (keeping V δτ⁴ fixed, so δτ ∝ V^{−1/4})
 for nth order Campostrini integration (if it were stable) the cost
is
C ∝ V (ξ/δτ) ∝ V^{(4n+5)/(4n+4)} ξ  (keeping V δτ^{4n+4} fixed)
Cost and dynamical critical exponents

LMC
 for free field theory we can show that choosing τ = δτ gives
z = 2
 the optimum cost for LMC is thus
C ∝ V (ξ/δτ)² ∝ V^{4/3} ξ²  (keeping V δτ⁶ fixed, so δτ ∝ V^{−1/6})
 for nth order Campostrini integration (if it were stable) the cost
is
C ∝ V (ξ/δτ)² ∝ V^{(2n+4)/(2n+3)} ξ²  (keeping V δτ^{4n+6} fixed)
Cost and dynamical critical exponents

L2MC (Kramers)
 in the approximation of unit acceptance rate we find that setting
τ = δτ and suitably tuning the momentum mixing angle we can
arrange to get z = 1
 however, if P_acc ≠ 1 then the system will carry out a random
walk backwards and forwards along a trajectory because the
momentum, and thus the direction of travel, must be reversed
upon a Metropolis rejection.
Cost and dynamical critical exponents
 a simple-minded analysis is that the average time between
rejections must be O(ξ) to achieve z = 1
 this time is approximately
Σ_{n=0}^∞ P_acc^n (1 − P_acc) n δτ = P_acc δτ / (1 − P_acc)
 for small δτ we have 1 − P_acc ∝ erf(k√(V δτ⁶)), hence we are
required to scale δτ so as to keep V δτ⁴ ξ² fixed
 this leads to a cost for L2MC of
C ∝ V (ξ/δτ) ∝ V^{5/4} ξ^{3/2}
 or using nth order Campostrini integration
C ∝ V (ξ/δτ) ∝ V^{(4n+5)/(4n+4)} ξ^{(2n+3)/(2n+2)}
 a more careful free field theory analysis leads to the same
conclusions
Cost and dynamical critical exponents

Optimal parameters for
GHMC
 (almost) analytic calculation in
free field theory
 minimum cost for GHMC
appears for acceptance
probability in the range 40%–90%
 very similar to HMC
 minimum cost for L2MC
(Kramers) occurs for
acceptance rate very close
to 1
 and this cost is much larger
Reversibility

Are HMC trajectories reversible and area
preserving in practice?
 the only fundamental source of irreversibility is the rounding
error caused by using finite precision floating point arithmetic
– for fermionic systems we can also introduce irreversibility by
choosing the starting vector for the iterative linear equation solver
time-asymmetrically
• we might do this if we want to use (some extrapolation of) the
previous solution as the starting vector
– floating point arithmetic is not associative
– it is more natural to store compact variables as scaled integers
(fixed point)
• saves memory
• does not solve the precision problem
Reversibility
 data for SU(3) gauge theory
and QCD with heavy quarks
show that rounding errors are
amplified exponentially
– the underlying continuous time
equations of motion are
chaotic
– Liapunov exponents
characterise the divergence of
nearby trajectories
– the instability in the integrator
occurs when δH ≫ 1
• zero acceptance rate
anyhow
Reversibility
 in QCD the Liapunov exponents appear to scale with β as the
system approaches the continuum limit β → ∞
– νξ = constant, where ν is the Liapunov exponent
– this can be interpreted as saying that the Liapunov exponent
characterises the chaotic nature of the continuum classical
equations of motion, and is not a lattice artefact
– therefore we should not have to worry about reversibility breaking
down as we approach the continuum limit
– caveat: data is only for small lattices, and is not conclusive
Reversibility
 data for QCD with
lighter dynamical quarks
– instability occurs close
to the region in δτ where
the acceptance rate is near
one
• may be explained
as a few “modes”
becoming unstable
because of large
fermionic force
– integrator goes
unstable if too poor an
approximation to the
fermionic force is used
16³ × 32 lattice, κ = 0.1355, β = 5.2, c_SW = 2.0171
Update ordering for LHMC

Consider the update of a single site x using the
LHMC algorithm
φ(x) ← φ(x) + δτ π(x) + ½δτ² F(x)
 the values of φ at all other sites are left unchanged
 this may be written in matrix form as φ′ = M_x φ + P_x π, where
(M_x)_yz = δ_yz − δ_xy [ ½δτ²(2d + m²) δ_yz − ½δτ² Σ_μ (δ_{y+μ̂,z} + δ_{y−μ̂,z}) ]
(P_x)_yz = δτ δ_yz δ_xy
Update ordering for LHMC
 for a complete sweep through the lattice consisting of exactly
V = L^d updates in some order x₁, x₂, …, x_V, we have
φ′ = M_{x_V} ⋯ (M_{x₂}(M_{x₁} φ + P_{x₁} π) + P_{x₂} π) ⋯ ≡ M φ + P π
 it is convenient to introduce the Fourier transformed fields φ̃
and momenta π̃
 the corresponding Fourier transformed single site update
matrix is
(M̃_x)_pq = δ_pq + (δτ²/V) e^{2πi(p−q)·x/L} [ Σ_μ (cos(2πq_μ/L) − 1) − ½m² ]
Update ordering for LHMC
 we want to compute integrated autocorrelation functions for
operators like the magnetic susceptibility χ = ⟨φ̃₀²⟩ − ⟨φ̃₀⟩²
– the behaviour of the autocorrelations of linear operators such as the
magnetisation φ̃₀ can be misleading
 in order to do this it is useful to consider quadratic operators as linear
operators on quadratic monomials in the fields, Φ̃_pq ≡ φ̃_p φ̃_q, so
χ = ⟨Φ̃₀₀⟩ − ⟨φ̃₀⟩²
– these quadratic monomials are updated according to
Φ̃′_pq = φ̃′_p φ̃′_q = (M̃ φ̃ + P̃ π̃)_p (M̃ φ̃ + P̃ π̃)_q
– so, after averaging over the momenta we have Φ̃′ = M^Q Φ̃ + P^Q
– where (P^Q)_pq = Σ_r P̃_pr P̃_qr and (M^Q)_pq,rs = M̃_pr M̃_qs
Update ordering for LHMC
 the integrated autocorrelation function for χ is
1 + 2A_χ = Σ_{t=0}^∞ C_χ(t) / C_χ(0)
– where C_χ(t) = ⟨Φ̃₀₀(t) Φ̃₀₀(0)⟩ − ⟨Φ̃₀₀⟩²
– summing the geometric series of powers of M^Q expresses this in
terms of [(1 − M^Q)⁻¹]₀₀,₀₀
Update ordering for LHMC

Random updates
 the update matrix for a sweep after averaging over the V
independent choices of update sites is just
M̃^Q = ⟨M̃^Q_{z₁} ⋯ M̃^Q_{z_V}⟩ = ((1/V) Σ_z M̃^Q_z)^V
 we thus find that 1 + 2A_χ = [(1 − M̃^Q)⁻¹]₀₀,₀₀ ∝ 1/m² for small
mass m, up to O(1/V) corrections
 which tells us that z = 2 for any choice of the overrelaxation
parameter
Update ordering for LHMC

Even/Odd updates
 in a single site update the new value of the field at x only
depends on the old value at x and its nearest neighbours, so
even sites depend only on odd sites and vice versa
– the update matrix for a sweep in Fourier space thus reduces to a
2 × 2 matrix coupling φ̃_p and φ̃_{p + (L/2)(1,…,1)}
– M̃^Q thus becomes a 3 × 3 matrix in this case
 the integrated autocorrelation function can be computed in
closed form
– it is minimised by choosing the overrelaxation parameter to be
2 − O(m), for which 1 + 2A_χ = O(1/m)
– hence z = 1
Update ordering for LHMC

Lexicographical updates
 this is a scheme in which φ(x + μ̂) is always updated before φ(x)
– except at the edges of the lattice
 a single site update depends on the new values of the
neighbouring sites in the +μ directions and the old values in
the −μ directions
– so we may write the sweep update implicitly as
φ_new = O φ_new + N φ_old + P π
– where O couples x to its already updated neighbours (in the +μ
directions) and N couples x to itself and its not yet updated
neighbours (in the −μ directions), up to corrections at the
edges of the lattice
Update ordering for LHMC
 the sweep update matrices are easily found explicitly
M = (1 − O)⁻¹ N,  P = (1 − O)⁻¹ P
 the Fourier transforms Õ and Ñ are diagonal up to surface
terms R̃, which we expect to be suppressed by a factor of 1/L
 we thus obtain a closed form for 1 + 2A_χ, which is minimised
by choosing the overrelaxation parameter to be (m² + 2d)/(m² + d)
– for which A_χ ≈ 0 and hence z = 0
 this can be achieved for most operators
– though with different values of the overrelaxation parameter
– the dynamical critical exponent for the exponential autocorrelation
time is still one at best
Chebyshev approximation

What is the best polynomial approximation p(x)
to a continuous function f(x) for x in [0,1]?
 Weierstrass’ theorem: any continuous function can be
arbitrarily well approximated by a polynomial
– Bernstein polynomials:
p_n(t) = Σ_{k=0}^n f(k/n) (n choose k) t^k (1 − t)^{n−k}
 minimise the appropriate norm
‖p − f‖_n = [ ∫₀¹ dx |p(x) − f(x)|^n ]^{1/n}  where n ≥ 1
 Chebyshev’s theorem
– the error |p(x) − f(x)| reaches its maximum at exactly d + 2 points on
the unit interval
– there is always a unique polynomial of any degree d which
minimises ‖p − f‖_∞ = max_{0≤x≤1} |p(x) − f(x)|
Chebyshev approximation
– necessity:
• suppose p - f has less than d + 2 extrema of equal magnitude
• then at most d + 1 maxima exceed some magnitude
• this defines a “gap”
• we can construct a polynomial q of degree d which has the opposite
sign to p - f at each of these maxima (Lagrange interpolation)
• and whose magnitude is smaller than the “gap”
• the polynomial p + q is then a better approximation than p to f
Chebyshev approximation
– sufficiency:
• suppose there is a polynomial p′ with ‖p′ − f‖_∞ ≤ ‖p − f‖_∞
• then |p′(xᵢ) − f(xᵢ)| ≤ |p(xᵢ) − f(xᵢ)| at each of the d + 2
extrema xᵢ of p(x) − f(x)
• therefore p′ − p must have d + 1 zeros on the unit interval
• thus p′ − p = 0 as it is a polynomial of degree d
Chebyshev approximation

Convergence is often exponential in d
 example
– the best approximation of degree d − 1 over [−1,1] to x^d is
p_d(x) = x^d − 2^{1−d} T_d(x)
– where the Chebyshev polynomials are T_d(x) = cos(d cos⁻¹ x)
– the notation comes from a different transliteration of Chebyshev!
– and the error is
|x^d − p_d(x)| = 2^{1−d} |T_d(x)| ≤ 2^{1−d} = 2 e^{−d ln 2}

Chebyshev’s theorem is easily extended to
rational approximations
 rational functions with equal degree numerator and
denominator are usually best
 convergence is still often exponential
 and rational functions usually give much better approximations
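The x^d example can be checked numerically: the error of p_d is 2^{1−d}T_d(x), which equioscillates between ±2^{1−d} at the d + 1 = (d − 1) + 2 Chebyshev points cos(kπ/d), exactly as Chebyshev's theorem demands for a degree d − 1 approximation. A sketch (d = 8 chosen arbitrarily):

```python
import math

d = 8
bound = 2.0 ** (1 - d)          # claimed max error, = 2 * exp(-d * ln 2)

def err(x):
    # x^d - p_d(x) = 2^{1-d} T_d(x), with T_d(x) = cos(d arccos x)
    return bound * math.cos(d * math.acos(x))

# alternating +/-bound at the d+1 Chebyshev extrema x_k = cos(k pi / d)
extrema = [err(math.cos(k * math.pi / d)) for k in range(d + 1)]

# the error never exceeds the bound anywhere on [-1, 1]
grid_max = max(abs(err(-1.0 + 2.0 * i / 10000)) for i in range(10001))
```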
Chebyshev approximation
 a realistic example of a rational approximation is
1/√x ≈ 0.3904603901 (x + 2.3475661045)(x + 0.1048344600)(x + 0.0073063814)
/ [(x + 0.4105999719)(x + 0.0286165446)(x + 0.0012779193)]
– this is accurate to within almost 0.1% over the range [0.003,1]
 using a partial fraction expansion of such rational functions
allows us to use a multiple mass linear equation solver, thus
reducing the cost significantly.
 the partial fraction expansion of the rational function above is
1/√x ≈ 0.3904603901 + 0.0511093775/(x + 0.0012779193)
+ 0.1408286237/(x + 0.0286165446) + 0.5964845033/(x + 0.4105999719)
– this appears to be numerically stable.
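The two quoted forms are algebraically identical and can be checked numerically against 1/√x over [0.003, 1] (the ~0.1% figure is the slide's own accuracy claim):

```python
import math

def r_prod(x):
    # factored form of the rational approximation to 1/sqrt(x)
    num = 0.3904603901 * (x + 2.3475661045) * (x + 0.1048344600) * (x + 0.0073063814)
    den = (x + 0.4105999719) * (x + 0.0286165446) * (x + 0.0012779193)
    return num / den

def r_pf(x):
    # partial fraction form, as used with a multiple mass solver
    return (0.3904603901
            + 0.0511093775 / (x + 0.0012779193)
            + 0.1408286237 / (x + 0.0286165446)
            + 0.5964845033 / (x + 0.4105999719))

xs = [0.003 * (1.0 / 0.003) ** (i / 400.0) for i in range(401)]  # log-spaced
max_rel = max(abs(r_prod(x) * math.sqrt(x) - 1.0) for x in xs)
max_diff = max(abs(r_prod(x) - r_pf(x)) for x in xs)
```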
Multibosons

Chebyshev polynomial approximations were
introduced by Lüscher in his Multiboson
method
det M ∝ det P(M)⁻¹ = ∏ᵢ det(M − ζᵢ)⁻¹ = ∏ᵢ ∫dφᵢ† dφᵢ e^{−φᵢ†(M−ζᵢ)φᵢ}
– where the ζᵢ are the roots of the polynomial approximation
P(x) ≈ 1/x
 no solution of linear equations is required
 one must store n scalar fields
 the dynamics of multiboson fields is stiff, so very small step
sizes are required for gauge field updates
Reweighting

Making Lüscher’s method exact
 one can introduce an accept/reject step
 one can reweight the configurations by the ratio
det[M P(M)]
– this factor is close to one if P(M) is a good approximation
 if one only approximates 1/x over the interval [ε,1] which does
not cover the spectrum of M, then the reweighting factor may
be large
 it has been suggested that this will help to sample
configurations which would otherwise be suppressed by small
eigenvalues of the Dirac operator
Noisy methods

Since the GHMC algorithm is correct for any
reversible and area-preserving mapping, it is
often useful to use some modified MD process
which is cheaper
 accept/reject step must use the true Hamiltonian, of course
 tune the parameters in the MD Hamiltonian to maximise the
Metropolis acceptance rate
 add irrelevant operators?
– a different kind of “improvement”
Noisy inexact algorithms

Replace the force −S′ in the leapfrog equations
of motion by some noisy estimator F̂ such that
⟨F̂⟩ = −S′
 choose the noise independently for each leapfrog step
 the equation for the shift in the fixed point distribution is again
governed by ⟨e^{−δH}⟩_{φ,π}
 for the noisy Langevin algorithm we have ⟨δH⟩_{φ,π} = O(δτ²),
but since ⟨e^{−δH}⟩ = 1 + O(δτ⁴) we obtain δS = O(δτ²)
 for the noisy Hybrid algorithm ⟨δH⟩_{φ,π} = O(δτ), so δS = O(δτ)
Noisy inexact algorithms

This method is useful for non-local actions,
such as staggered fermions with n_f ≠ 4 flavours
 here the action may be written as ¼ n_f tr ln M, and the force is
¼ n_f tr(M⁻¹ M′)
– a cheap way of estimating such a trace noisily is to use the fact
that tr Q = ⟨Σᵢⱼ ηᵢ Qᵢⱼ ηⱼ⟩ for noise η with ⟨ηᵢ ηⱼ⟩ = δᵢⱼ
– there is no need to use Gaussian noise for this estimator —
indeed Z₂ noise might well be better
 using a non area preserving irreversible update step the noisy
Hybrid algorithm can be adjusted to have ⟨δH⟩_{φ,π} = O(δτ²), and
thus δS = O(δτ²) too
• this is the Hybrid R algorithm
 Campostrini’s integration scheme may be used to produce
higher order Langevin and Hybrid algorithms with δS = O(δτ^{2n})
for arbitrary n, but this is not applicable to the noisy versions
Kennedy–Kuti algorithm

Noise without noise
 suppose it is prohibitively expensive to evaluate R = Q(φ′)/Q(φ),
but that we can compute an unbiased estimator for it cheaply,
⟨R̂⟩ = R
 if we look carefully at the proof that the Metropolis algorithm
satisfies detailed balance we see that the ratio R is used for
two quite different purposes
– it is used to give an ordering to configurations: φ′ < φ if R < 1, that
is, if Q(φ′) < Q(φ)
– it is used as the acceptance probability if φ′ < φ
 we can produce a valid algorithm using the noisy estimator R̂
for the latter rôle just by choosing another ordering for the
configurations
– a suitable ordering is provided by any cheap function f such that
the set of configurations for which f(φ′) = f(φ) has measure zero
Kennedy–Kuti algorithm
 we now define the acceptance probability as
P(φ → φ′) = (λ + μ R̂) θ(f(φ) − f(φ′)) + (μ + λ R̂) θ(f(φ′) − f(φ))
– where λ and μ are to be chosen so that 0 ≤ P(φ → φ′) ≤ 1
• taking λ = ½, μ = 0 is often a convenient choice
– detailed balance is easily established by considering the cases
f(φ) > f(φ′) and f(φ) < f(φ′) separately
– in practice the condition 0 ≤ P(φ → φ′) ≤ 1 can rarely be
satisfied exactly, but usually the number of violations can be made
(exponentially) small
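A toy two-state check of this rule with the convenient choice λ = ½, μ = 0 (downhill moves accepted with probability ½, uphill moves with probability ½R̂): averaging over the noise in the estimator, detailed balance holds exactly. The weights and the two-valued estimator below are illustrative assumptions, not from the slides; exact rational arithmetic makes the balance check an equality:

```python
from fractions import Fraction as F

Q = {1: F(1), 2: F(4, 5)}          # unnormalised target weights; f-ordering by Q

def rhat_values(a, b, delta=F(1, 4)):
    # a two-valued unbiased estimator of R = Q[b]/Q[a]
    R = Q[b] / Q[a]
    return [R - delta, R + delta]

def mean_accept(a, b):
    # lambda = 1/2, mu = 0, averaged over the estimator noise
    if Q[b] < Q[a]:                # "downhill": accept with probability 1/2
        return F(1, 2)
    vals = rhat_values(a, b)       # "uphill": accept with probability Rhat/2
    assert all(0 <= v / 2 <= 1 for v in vals)  # no violations for this choice
    return sum(vals) / (2 * len(vals))

P12 = mean_accept(1, 2)
P21 = mean_accept(2, 1)
balance = (Q[1] * P12 == Q[2] * P21)   # detailed balance in expectation
```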
Noisy fermions
 an interesting application is when we have a non-local
fermionic determinant in the desired fixed point distribution,
such as det M(φ) = exp tr ln M(φ)
– in this case we need to produce an unbiased estimator of
R = det M(φ′)/det M(φ) = exp tr[ln M(φ′) − ln M(φ)]
= exp tr ln(1 + δM M⁻¹) ≡ exp tr ln(1 + Q)
Noisy fermions

 it is easy to construct unbiased estimators for tr Q n

– let  be a vector of random numbers whose components are
chosen independently from a distribution with mean zero and
variance one (Gaussian or Z noise, for example)
ˆn 
– set Q
n

ij  j, then Qˆ n


Q
ij i


 tr Q n

ˆ n  are independent, then
– furthermore, if Q̂n and Q
Qˆ nQˆ n 



 tr Q n tr Q n

 as long as all the eigenvalues of Q lie within the unit circle
then

1
n

tr ln1   Q    tr   Q
n 1n

– similarly the exponential can be expanded as a power series
124
NTU, Taipei, Taiwan; 10 September, 1999
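For Z₂ noise the unbiasedness of the trace estimator can be verified exactly by enumerating all noise vectors: with components ±1 the average of η·Qη over all 2ⁿ vectors is precisely tr Q, since ⟨ηᵢηⱼ⟩ = δᵢⱼ. A sketch with an arbitrary illustrative 3 × 3 matrix:

```python
import itertools

Q = [[0.3, -0.2, 0.1],
     [0.4, 0.5, -0.3],
     [0.0, 0.2, -0.1]]
n = len(Q)

def quad_form(eta, M):
    return sum(eta[i] * M[i][j] * eta[j] for i in range(n) for j in range(n))

# exact average over all 2^n Z2 noise vectors: off-diagonal terms cancel
est = sum(quad_form(eta, Q)
          for eta in itertools.product((-1, 1), repeat=n)) / 2 ** n
trace = sum(Q[i][i] for i in range(n))
```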
Bhanot–Kennedy algorithm

In order to obtain an unbiased estimator for R
we sum these series stochastically
 suppose f(x) = 1 + Σ_{n=1}^∞ fₙ xⁿ where 0 ≤ pₙ ≡ f_{n+1}/fₙ ≤ 1
– our series can be transformed into this form
 this can be written in “factored form” as
f(x) = 1 + p₀x (1 + p₁x (1 + p₂x (⋯))), with p₀ = f₁
 and may be summed stochastically using the following scheme
f̂(x) = 1 + θ₀ x (1 + θ₁ x (1 + θ₂ x (⋯)))
– where the θₖ are independent Bernoulli random variables with
⟨θₖ⟩ = pₖ: with probability pₖ we continue the recursion, and with
probability 1 − pₖ we terminate it
“Kentucky” algorithm

This gets rid of the systematic errors of the
Kennedy–Kuti method by eliminating the
violations
 promote the noise to the status of dynamical fields
– suppose that we have an unbiased estimator
∫dη P(η) R̂(φ, η) = det M(φ)
– then
⟨Ω⟩ = (1/Z) ∫dφ e^{−S(φ)} det M(φ) Ω(φ)
= (1/Z) ∫dφ dη e^{−S(φ)} P(η) R̂(φ, η) Ω(φ)
= (1/Z) ∫dφ dη e^{−S(φ)} P(η) |R̂(φ, η)| sgn R̂(φ, η) Ω(φ)
= ⟨Ω sgn R̂⟩_{φ,η}
– where the last expectation is taken with respect to the measure
e^{−S(φ)} P(η) |R̂(φ, η)|
“Kentucky” algorithm
 the φ and η fields can be updated by alternating Metropolis
Markov steps
 there is a sign problem only if the Kennedy–Kuti algorithm
would have many violations
 if one constructs the estimator using the Bhanot–Kennedy
algorithm then one will need an infinite number of noise fields
in principle
– will these ever reach equilibrium?
Cost of noisy algorithms

Inexact algorithms
 these have only a trivial linear volume dependence, with z = 1
for Hybrid and z = 2 for Langevin

Noisy inexact algorithms
 the noisy trajectories deviate from the true classical trajectory
by a factor of 1 + O(δτ) for each step, or 1 + O(τ) for a
trajectory of τ/δτ steps
– this will not affect the integrated autocorrelation function as long
as this deviation remains ≪ 1
– thus the noisy Langevin algorithm should have C ∝ V ξ²
– thus the noisy Hybrid algorithm should have C ∝ V ξ²
Cost of noisy algorithms

Noisy exact algorithms
 these algorithms use noisy estimators to produce an (almost)
exact algorithm which is applicable to non-local actions
– a straightforward approach leads to a cost of C ∝ V² for exact
noisy Langevin
• it is amusing to note that this algorithm should not care what
force term is used in the equations of motion
– an exact noisy Hybrid algorithm is also possible, and for it
C ∝ V²
Cost of noisy algorithms
 these results apply only to operators (like the magnetisation)
which couple sufficiently strongly to the slowest modes of the
system. For other operators, like the energy in 2 dimensions,
we can even obtain z = 0

Summary
 too little noise increases critical slowing down because the
system is too weakly ergodic
 too much noise increases critical slowing down because the
system takes a drunkard’s walk through phase space
 to attain z = 1 for any operator (and especially for the
exponential autocorrelation time) one must be able to tune the
amount of noise suitably
PHMC

Polynomial Hybrid Monte Carlo algorithm:
instead of using Chebyshev polynomials in the
multiboson algorithm, Frezzotti & Jansen and
de Forcrand suggested using them directly in
HMC
 polynomial approximations to 1/x are typically of order 40 to 100
at present
 numerical stability problems with high order polynomial
evaluation
 polynomials must be factored
 correct ordering of the roots is important
 Frezzotti & Jansen claim there are advantages from using
reweighting
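To illustrate why the factored (product-over-roots) form matters, here is a toy Python sketch. It uses the truncated geometric series as the polynomial approximation to 1/x — far cruder than the Chebyshev polynomials used in practice, but its roots are known in closed form, so the factored evaluation can be checked against the direct sum:

```python
import cmath, math

# Toy polynomial approximation to 1/x on (0, 2): the truncated geometric
# series P_n(x) = sum_{k=0}^n (1-x)^k = (1 - (1-x)^{n+1}) / x, with
# relative error |1-x|^{n+1}.
def p_sum(x, n):
    return sum((1.0 - x) ** k for k in range(n + 1))

def p_roots(n):
    # Zeros of P_n: x_j = 1 - exp(2*pi*i*j/(n+1)), j = 1..n (x = 0 excluded).
    return [1.0 - cmath.exp(2j * math.pi * j / (n + 1)) for j in range(1, n + 1)]

def p_factored(x, n):
    # Fully factored evaluation; the leading coefficient is (-1)^n.
    # At high order the ordering of the roots controls roundoff growth.
    val = complex((-1.0) ** n)
    for r in p_roots(n):
        val *= (x - r)
    return val.real

n, x = 40, 0.5
print(p_sum(x, n), p_factored(x, n), 1.0 / x)
```

For this mild example any root ordering works; the stability and ordering issues mentioned above appear when the order is high and the spectrum nearly touches a root.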
RHMC

Rational Hybrid Monte Carlo algorithm: the
idea is similar to PHMC, but uses rational
approximations instead of polynomial ones
 much lower orders required for a given accuracy
 evaluation simplified by using partial fraction expansion and
multiple mass linear equation solvers
 1/x is already a rational function, so RHMC reduces to HMC in
this case
 can be made exact using noisy accept/reject step
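The partial-fraction evaluation can be sketched in scalar form. The shifts below are made up for illustration (not a real Remez/Zolotarev approximation); the point is that every term of the expansion needs only a shifted inverse, which a multi-shift solver delivers in one Krylov solve:

```python
# A rational function of the shape used by RHMC:
# r(x) = prod(x + a_i) / prod(x + b_i), equal degrees, rewritten as the
# partial fraction r(x) = 1 + sum_i alpha_i / (x + b_i).
a = [0.10, 0.60, 1.50]
b = [0.05, 0.40, 1.10]

def r_direct(x):
    num = den = 1.0
    for ai, bi in zip(a, b):
        num *= (x + ai)
        den *= (x + bi)
    return num / den

def residues():
    # alpha_i = prod_j (a_j - b_i) / prod_{j != i} (b_j - b_i)
    alphas = []
    for i, bi in enumerate(b):
        num = 1.0
        for aj in a:
            num *= (aj - bi)
        den = 1.0
        for j, bj in enumerate(b):
            if j != i:
                den *= (bj - bi)
        alphas.append(num / den)
    return alphas

def r_partial(x):
    # Applied to a matrix M, each term needs only (M + b_i)^{-1} phi.
    return 1.0 + sum(al / (x + bi) for al, bi in zip(residues(), b))

print(r_direct(0.7), r_partial(0.7))
```

Unlike the factored polynomial product, the partial-fraction sum adds terms of the same sign for x > 0, so it is numerically benign even at high order.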
Ginsparg–Wilson fermions

Is it possible to have chiral symmetry on the
lattice without doublers if we only insist that the
symmetry holds on shell?
 such a transformation should be of the form (Lüscher)
ψ → exp[iα γ₅(1 − ½aD)] ψ ;  ψ̄ → ψ̄ exp[iα (1 − ½aD)γ₅]
 for it to be a symmetry the Dirac operator must be invariant
D → exp[iα (1 − ½aD)γ₅] D exp[iα γ₅(1 − ½aD)] = D
 for a small transformation this implies that
(1 − ½aD)γ₅ D + D γ₅(1 − ½aD) = 0
 which is the Ginsparg–Wilson relation
γ₅ D + D γ₅ = aD γ₅ D
Neuberger’s operator

We can find a solution of the Ginsparg-Wilson
relation as follows
 let the lattice Dirac operator be of the form
aD = 1 + γ₅ γ̂₅ ;  γ̂₅† = γ̂₅ ;  aD† = 1 + γ̂₅ γ₅ = γ₅ aD γ₅
 this satisfies the GW relation if γ̂₅² = 1
 and it must also have the correct continuum limit, D = γ·∂ + m
⇒ γ̂₅ = γ₅ [a(γ·∂ + m) − 1] (1 + O(a²))
 both of these conditions are satisfied if we define (Neuberger)
γ̂₅ = γ₅ (DW − 1) [(DW − 1)†(DW − 1)]^(−1/2) = sgn[γ₅(DW − 1)]
where DW is the Wilson Dirac operator
Neuberger’s operator



There are many other possible solutions
The discontinuity is necessary
This operator is local (Lüscher)
 at least if we constrain the fields such that the plaquette < 1/30
 by local we mean a function of fast decrease, as opposed to
ultralocal which means a function of compact support
Computing Neuberger’s operator

Use polynomial approximation to Neuberger’s
operator
 high degree polynomials have numerical instabilities
 for dynamical GW fermions this leads to a PHMC algorithm

Use rational approximation
 requires solution of linear equations just to apply the operator
 for dynamical GW fermions this leads to an RHMC algorithm
 requires nested linear equation solvers in the dynamical case
 nested solvers can be avoided at the price of a much more ill-conditioned system
 attempts to combine inner and outer solves in one Krylov
space method
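As a scalar illustration of why rational methods suit the sign function: the Newton iteration x ← (x + 1/x)/2 (a standard sign-function iteration, not something specific to these slides) converges quadratically to sgn(x), and each iterate is a rational function of the starting value — the scalar analogue of applying a rational approximation to γ₅(DW − 1):

```python
def sign_newton(x, iters=20):
    # Newton iteration for the sign function: x <- (x + 1/x)/2 converges
    # to sgn(x) for any nonzero real x.  The matrix version
    # X <- (X + X^{-1})/2 is one way to compute sgn(H) for Hermitian H,
    # and needs an inverse (a linear solve) at every step.
    for _ in range(iters):
        x = 0.5 * (x + 1.0 / x)
    return x

print(sign_newton(0.003), sign_newton(-7.2))
```

Convergence is slow until |x| reaches O(1) and rapid thereafter, mirroring why small eigenvalues of γ₅(DW − 1) are the expensive ones and are often projected out explicitly.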
Computing Neuberger’s operator

Extract low-lying eigenvectors explicitly, and
evaluate their contribution to the Dirac operator
directly
 efficient methods based on Ritz functional
 very costly for dynamical fermions if there is a finite density of
zero eigenvalues in the continuum limit (Banks-Casher)
 might allow for noisy accept/reject step if we can replace the
step function with something continuous (so it has a
reasonable series expansion)

Use better approximate solution of GW relation
instead of Wilson operator
 e.g., a relatively cheap “perfect action”