State of Play of Dynamical Fermions
Bálint Joó
Jefferson Lab, Newport News, VA, U.S.A.
January 23, 2006

Contents I: Technology
• Rational Hybrid Monte Carlo
• Multiple Time Scale Integrators
• Scale Splitting Schemes
• Differentiable Smearing

Contents II: State of Play
• Staggered-like simulations
• Wilson-like simulations
• DWF-like simulations
• Overlap-like simulations

Contents III: Possibilities and Constraints
• Algorithmic Ideas
• JLab programme and resource constraints
• JLab future plans

Rational Hybrid Monte Carlo
• Essentially the same as Hybrid Monte Carlo BUT
• Fermionic functions in the action are replaced by rational approximations
• Good for non-local actions: eg $(M^\dagger M)^{\frac{1}{n}}$
• Inversion is a natural operation for rational functions: applying $f(M)$ is as hard as applying $f^{-1}(M)$
• Write the rational function in partial fraction form and apply it with a multi-mass solver:
\[ f(M) \approx R(M) = A \sum_i p_i \, (M + q_i)^{-1} \]

(R)HMC Algorithm
• Start with configuration $U$
• Refresh momenta (Gaussian noise) and pseudofermions
• Reversible and area preserving Molecular Dynamics to generate $U'$ in a fixed pseudofermion background
– details of MD later
• Accept/Reject with $P_{\rm acc} = \min\left(1, e^{-[H(U') - H(U)]}\right)$
• Kennedy-Kuti noisy accept/reject to correct for the rational approximation
• If accepted, the new configuration is $U'$, otherwise it is $U$.
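The (R)HMC steps listed above can be sketched on a toy problem. This is only an illustration under stated assumptions: the quadratic action, the function names (`action`, `force`, `leapfrog`, `hmc_update`) and the step-size values are hypothetical and are not taken from the talk, Chroma or CPS. Pseudofermion refresh, the rational approximation and the Kennedy-Kuti step are left out; only momentum refresh, reversible leapfrog MD and the Metropolis accept/reject are shown.

```python
import math
import random


def action(q):
    """Toy action S(q) = sum_i q_i^2 / 2 (assumed stand-in for the lattice action)."""
    return 0.5 * sum(x * x for x in q)


def force(q):
    """Force -dS/dq for the toy action."""
    return [-x for x in q]


def hamiltonian(q, p):
    """H = p^2 / 2 + S(q)."""
    return 0.5 * sum(y * y for y in p) + action(q)


def leapfrog(q, p, dt, n_steps):
    """Reversible, area preserving MD: n_steps copies of U_Q(dt/2) U_P(dt) U_Q(dt/2)."""
    q, p = list(q), list(p)
    for _ in range(n_steps):
        q = [x + 0.5 * dt * y for x, y in zip(q, p)]  # half coordinate update
        f = force(q)
        p = [y + dt * g for y, g in zip(p, f)]        # full momentum update
        q = [x + 0.5 * dt * y for x, y in zip(q, p)]  # half coordinate update
    return q, p


def hmc_update(q, dt, n_steps, rng):
    """One trajectory: refresh momenta, run MD, then Metropolis accept/reject."""
    p = [rng.gauss(0.0, 1.0) for _ in q]              # Gaussian momentum refresh
    h_old = hamiltonian(q, p)
    q_new, p_new = leapfrog(q, p, dt, n_steps)
    dh = hamiltonian(q_new, p_new) - h_old
    if dh <= 0.0 or rng.random() < math.exp(-dh):     # P_acc = min(1, exp(-dH))
        return q_new, True
    return q, False                                   # rejected: keep old configuration


if __name__ == "__main__":
    rng = random.Random(2006)
    q = [0.0] * 8
    accepted = 0
    n_traj = 1000
    for _ in range(n_traj):
        q, ok = hmc_update(q, dt=0.1, n_steps=10, rng=rng)
        accepted += ok
    print("acceptance rate:", accepted / n_traj)
```

The accept/reject step makes the update exact for any step size; the step size only controls the acceptance rate, which is why better integrators and scale splitting translate directly into cost savings.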
• Write the Hamiltonian:
\[ H = \frac{1}{2}\pi^2 + \phi^\dagger R\left[ (M^\dagger M)^{-\frac{1}{n}} \right] \phi \]
• eg $n = 2$ for 2 flavour staggered, 1 flavour Wilson, DWF
• Refresh fermions: $\phi = R'\left[ (M^\dagger M)^{\frac{1}{2n}} \right] \chi$ with Gaussian $\chi$
• $R\left[ f(M^\dagger M) \right]$ is a rational approximation to $f(M^\dagger M)$:
\[ R\left[ M^\dagger M \right] = \frac{P_m(M^\dagger M)}{Q_n(M^\dagger M)} = A \sum_i \frac{p_i}{M^\dagger M + q_i} \]

• It is easy to make the rational approximation good to machine (or solver) accuracy using the Remez algorithm
• This eliminates the need for a noisy accept/reject step to correct for the badness of the approximation
• Successfully used in 2+1 flavour ASqTAD staggered simulations and 2+1 flavour DWF simulations
• Details (read all about it)
– Horváth, Kennedy, Sint, hep-lat/9809092
– Clark, Kennedy, hep-lat/0309084
– Clark, de Forcrand, Kennedy, hep-lat/0510004
– Mike Clark's PhD thesis (available on request)

Introduction to Scale Splitting Ideas
Most advances in algorithms in 2005 came from some kind of scale splitting scheme in HMC-like algorithms.
• Lüscher's Domain Decomposition - split space
• Hasenbusch Mass Preconditioning - add an auxiliary mass scale
• Clark, Kennedy, de Forcrand - approximation coefficients split the scale
The separated scales are then simulated on different timescales.
Crucial ingredient: Multiple Timescale Integrator

Multiple Timescale Integrators
• Consider the Hamiltonian $H = \frac{1}{2}p^2 + S(q)$ over phase space states $(q, p)$.
• The time evolution operator is:
\[ U(\delta\tau) = e^{\delta\tau \frac{d}{dt}} = e^{\delta\tau (P + Q)} = e^{\delta\tau \hat{H}}, \qquad \hat{H} = P + Q \]
• Consider the symplectic coordinate and momentum update operators:
\[ U_Q(\delta\tau) = e^{\delta\tau Q} : (q, p) \to (q \oplus p\,\delta\tau,\; p) \]
\[ U_P(\delta\tau) = e^{\delta\tau P} : (q, p) \to (q,\; p \oplus \dot{S}\,\delta\tau) \]
where $U_Q$ and $U_P$ are reversible and have unit determinant
• Use the Baker-Campbell-Hausdorff formula:
\[ \exp(A)\exp(B) = \exp\left( A + B + \frac{1}{2}[A, B] + \frac{1}{12}[A, [A, B]] + \frac{1}{12}[B, [B, A]] + \ldots \right) \]

• Single time scale (leapfrog):
\[ U^{(3)}(\delta\tau) = U_Q\left(\frac{\delta\tau}{2}\right) U_P(\delta\tau)\, U_Q\left(\frac{\delta\tau}{2}\right) = \exp\left( \delta\tau \hat{H} + O(\delta\tau^3) \right) \]
• Now consider $H = \frac{1}{2}p^2 + S_1(U) + S_2(U)$ and the operators
\[ U_{P_1}(\delta\tau) = e^{\delta\tau P_1} : (q, p) \to (q,\; p \oplus \dot{S}_1\,\delta\tau) \]
\[ U_{P_2}(\delta\tau) = e^{\delta\tau P_2} : (q, p) \to (q,\; p \oplus \dot{S}_2\,\delta\tau) \]
• Now construct the operators:
\[ U_1(\delta\tau) = U_{P_1}\left(\frac{\delta\tau}{2}\right) U_Q(\delta\tau)\, U_{P_1}\left(\frac{\delta\tau}{2}\right), \qquad U_2(\delta\tau) = U_{P_2}(\delta\tau) \]

Sexton Weingarten Integrator
• Consider:
\[ U_{SW}(\delta\tau) = U_2\left(\frac{\delta\tau}{2}\right) \left[ U_1\left(\frac{\delta\tau}{n}\right) \right]^n U_2\left(\frac{\delta\tau}{2}\right) \]
• This splits the integration onto 2 time scales: $\delta\tau$ and $\delta\tau/n$.
• Can recursively introduce more scales, eg: $\delta\tau$, $\delta\tau/n_1$, $\delta\tau/(n_1 n_2)$
• Choose $n_1$ and $n_2$ so as to optimize (equalize?) the $\dot{S}_i\,\delta\tau_i$ force terms.
• Quite an old idea: (J. C. Sexton, D. H. Weingarten, Nucl. Phys. B380, 665, 1992; M. J. Peardon, J. C. Sexton, Nucl. Phys. Proc. Suppl. 119:985-987, 2003)
• New advances: how to split the fermionic part of the action

Omelyan Integrator
• Alternative to the leapfrog integrator. Originally by Omelyan, introduced to lattice QCD by de Forcrand and Takaishi (hep-lat/0505020)
• Instead of combining $U_P$ and $U_Q$ as in the leapfrog case, combine as
\[ U(\delta\tau) = U_Q(\lambda\,\delta\tau)\, U_P\left(\frac{\delta\tau}{2}\right) U_Q\left((1 - 2\lambda)\,\delta\tau\right) U_P\left(\frac{\delta\tau}{2}\right) U_Q(\lambda\,\delta\tau) \]
• Tune the coefficient $\lambda$ to minimise the error on the 3rd order term.
• Roughly a 50% improvement is gained (more solves, but increased step-size)
• Can be multi-scaled with a small amount of care

Hasenbusch Preconditioning
• Write the desired determinant as:
\[ \det\left( M^\dagger M \right) = \det\left( M_2^\dagger M_2 \right) \times \frac{\det\left( M^\dagger M \right)}{\det\left( M_2^\dagger M_2 \right)} \]
• Corresponding fermion action:
\[ S = \psi^\dagger \left( M_2^\dagger M_2 \right)^{-1} \psi + \phi^\dagger \left[ M_2 \left( M^\dagger M \right)^{-1} M_2^\dagger \right] \phi = S_1(\psi^\dagger, \psi) + S_2(\phi^\dagger, \phi) \]
• Choose $M_2$ similar to $M$ $\Rightarrow$ $M^{-1} M_2 \approx 1$, $\dot{S}_2 \approx 0$
• Now put $S_1$ and $S_2$ on different time scales ($S_2$ on the long timescale)

• Forces in $S_1$ and $S_2$ have different sizes, both smaller than before the split
• For Wilson fermions: add a small imaginary mass term to $M_2$, eg: $M_2 = M + i\rho$
– Spectrum of $M_2^\dagger M_2$ is bounded from below
• Used in
– Wilson-Clover simulations: Hasenbusch, Jansen: hep-lat/0211042
– Wilson simulations: Urbach et al. hep-lat/0506011, hep-lat/0510064
– Overlap simulations: DeGrand and Schaefer hep-lat/0412005, hep-lat/0506021, hep-lat/0508025

Nroots Preconditioning
• Write
\[ \det\left( M^\dagger M \right) = \prod_{i=1}^{n} \det\left[ \left( M^\dagger M \right)^{\frac{1}{n}} \right] \]
• with corresponding actions:
\[ S = \sum_{i=1}^{n} \psi_i^\dagger \left( M^\dagger M \right)^{-\frac{1}{n}} \psi_i \]
• This kind of determinant splitting is an old idea (Joó, Horváth, Liu, hep-lat/0112033) but is most fruitful within the context of the RHMC algorithm (Clark, Kennedy, hep-lat/0409134; Clark, Kennedy, de Forcrand, hep-lat/0510004)

• Similar in spirit to Hasenbusch preconditioning: each pseudofermion term is now better conditioned, $\kappa(M^{1/n}) = \kappa(M)^{\frac{1}{n}}$
• $(M^\dagger M)^{\frac{1}{n}}$ is approximated with a rational approximation, with coefficients from the Remez algorithm (fits extremely well with RHMC, where you do this anyway to do single flavour simulations).
– Amounts to doing a 2 flavour simulation as a 1+1 flavour RHMC
• No mass tuning needed. Taking the root divides the condition number equally between terms. (Optimally, according to Tony Kennedy)

Rational Multiple Timescaling
• Rational action:
\[ S = A \sum_i \phi^\dagger \left( \frac{p_i}{M^\dagger M + q_i} \right) \phi \]
• Rational force:
\[ F = -A\, \phi^\dagger \sum_i p_i \left( \frac{1}{M^\dagger M + q_i} \left( \dot{M}^\dagger M + M^\dagger \dot{M} \right) \frac{1}{M^\dagger M + q_i} \right) \phi \]
• Use multi shift CG - cost dominated by the smallest shift
• de Forcrand & Clark: small shifts have high cost BUT small force

[Figure: relative force and CG iterations for each partial fraction term (0 to 9). Thanks to M. Clark for figure.]

• Run different poles on different timescales
• Ratio of scales may be guessed using the $p_i$ and $q_i$
• Reduces the utility of the multi-mass solver
• But can then use a chronological solver in principle
• BUT
– smallest shift, highest cost, longest stepsize
– chronological solver least useful
• Easy to combine with Nroots preconditioning
• See de Forcrand, Clark, Kennedy (hep-lat/0510004)

Splitting Space: Domain Decomposition
• Discussed extensively by Luigi
• Application of Schwarz Domain Decomposition by Lüscher (hep-lat/0409106, hep-lat/0509152)
• Draw momenta and pseudofermion fields, then split the lattice into blocks
• Identify active links in the blocks so that the blocks decouple
• Integrate each block in MD time in the usual way, and put the boundary fields on a slower time scale.
• Accept/Reject as per normal HMC
• Perform a random translation on the links (to ensure ergodicity)

• The blocks provide a natural IR cutoff
• This reduces the occurrence of low (near singular) modes in the block Dslash-es
• Block size has to be small enough
• but want the block big enough to contain lots of active links
• Updating inter-block boundaries deals with UV effects.
• Suits 'fat blocks' - most efficient on clusters...
• Successful in: Wilson simulations, Wilson Clover simulations

Stout Link Smearing
• Smearings are often used in improved gauge actions
• But most smearings involve a non-differentiable projection into $SU(3)$
• This makes molecular dynamics difficult
• Stout Links (Morningstar and Peardon, Phys. Rev. D69 (2004) 054501, hep-lat/0311018)
• Stout links are differentiable through a recursive procedure
• Have been shown to be useful in overlap simulations (DeGrand & Schaefer)

Basic Idea
• Take the APE smearing staple sum, and close it to form a loop
• Project into the Lie algebra $su(3)$
• Exponentiate back into $SU(3)$:
– Cayley-Hamilton Theorem: $e^{iQ} = f_0 + f_1 Q + f_2 Q^2$
– $Q$ is traceless and Hermitian, so $iQ$ is traceless antihermitian (we use this in HMC a lot)
• Need time derivatives of $Q$, $Q^2$ and $f_0$, $f_1$, $f_2$.
• Well behaved if $\frac{dQ}{dt}$ is well behaved

as the adverts used to say...

DWF Tricks
• Combine DWF and PV matrices in the one flavour term (in 2+1 flavour RHMC)
• Previously had:
\[ S = \psi^\dagger \left( M^\dagger(m) M(m) \right)^{-\frac{1}{2}} \psi + \phi^\dagger \left( M(1)^\dagger M(1) \right)^{\frac{1}{2}} \phi \]
• Now consider
\[ S = \psi^\dagger \left( M^\dagger(1) M(1) \right)^{\frac{1}{4}} \left( M^\dagger(m) M(m) \right)^{-\frac{1}{2}} \left( M^\dagger(1) M(1) \right)^{\frac{1}{4}} \psi \]
• Reduces noise

Improved gauge actions
• Improved gauge actions smooth gauge fields (even on coarser lattices)
• Smooth fields improve the spectrum of the auxiliary Dirac operator $H$
• Recent study by UKQCD/RBC Collaboration (Peter's Lattice talk)

Improved 5D Operators
• Various new 5D operators have been suggested for chiral fermion physics
– Optimal DWF fermions - T. W. Chiu, hep-lat/0209153
– Alternative to DWF fermions - Neuberger, hep-lat/0005004
– Continued Fraction operator - Neuberger, hep-lat/9901003, Borici et al. hep-lat/0110070, Wenger hep-lat/0403003
– Möbius DWF fermions - Brower, Orginos, Neff, hep-lat/0409118, hep-lat/0511031

• Realisation (Edwards et al, hep-lat/0510086): all improved 5D operators are different facets of a rational approximation to the chiral fermion operator, distinguished through:
– Representation - the structure of the matrix
– Approximation - choice of coefficients
– Kernel - scaling behaviour
• Partially quenched performance study found the best contenders:
– Continued Fraction with Zolotarev coefficients
– Partial Fraction with Zolotarev coefficients
– Standard (Shamir) DWF form was least effective

[Figure: Cost vs $m_{res}$ for dynamical DWF ($L_s = 12$), $m_f = 0.020$. Vertical axis: cost normalised by unscaled Shamir DWF. Horizontal axis: $|m_{res}|$ ($m_{Shamir}/m$), from 1e-06 to 1. Series: Shamir ($\alpha = 1$, $H_T$, tanh); Scaled Shamir ($\alpha = 1.7$, $H_T$, tanh); Chiu DWF ($b_5 = 1$, $c_5 = 1$, Zolotarev), off graph; Borici ($\alpha = 1$, $H_w$, tanh); Continued Fraction ($H_w$, Zolotarev); Continued Fraction ($H_T$, Zolotarev). Labelled points: DWF $L_s = 12, 24, 32$; CFZ $L_s = 6, 8$.]

Self Criticism
• Negative $m_{res}$ is still a little troubling - could get rid of it by cooking the sign function (as done in the KY papers)
• Can't please everyone: Zolotarev coefficients give small $m_{res}$ compared to the tanh approximation, but not small enough for purists
• Need explicit handling of low modes of $H$ to make $m_{res} = 0$.
• Only preliminary DF experience - seems similar in cost to usual DWF.
• More dynamical fermion studies needed - expensive (QCDOC rack months)

4D Overlap Advances
• As the mass becomes small, changing topology becomes difficult
– eigenvalue of $H_w$ changes sign
– Step function in action $\to$ delta function in fermion force
– Acceptance goes to 0 even on a $6^4$ lattice (Szabó, Lat 2004)

[Figure: Szabó, Lat 2004]

Reflection/Refraction Integrator
• Phase space has surfaces where eigenvalues of $H_w$ change sign.
• Track eigenvalues of $H$ through MD
• At a level crossing, compute the "angle" between the momenta and the normal to the surface
• Reflect/refract accordingly:
– if $\langle N, P \rangle^2 < 2\Delta S$, then reflect: $P \leftarrow P - 2N \langle N, P \rangle$
– otherwise refract: $P \leftarrow P - N \langle N, P \rangle + N \langle N, P \rangle \sqrt{1 - 2\Delta S / \langle N, P \rangle^2}$

• The original approach had step-size errors of $O(\tau_1)$, where $\tau_1$ was the MD time needed to reach the zero-eigenvalue surface
• The approach by Cundy et al. (hep-lat/0502007) remedies this and restores errors to $O(\tau^2)$.
• Leaders in the overlap game: DeGrand and Schaefer, Cundy et al. (aka the Wuppertal Gang), and of course the original team: Fodor, Katz, Szabó et al. (aka the Hungarians)

State of the Art Wilson Simulation Algorithms
• Lüscher style Domain Decomposition
• Jansen et al: Hasenbusch mass preconditioning and multiple timescales
• Lattice sizes: $24^3 \times 48$ with $a = 0.06 - 0.08$ fm and $m_\pi \approx 294$ MeV.
• PACS-CS in Japan: to focus 14.3 Tflops of PACS-CS onto Wilson Clover simulations (eventually using Domain Decomposition)

State of the Art Staggered Simulations
• R-algorithm with 2+1 flavours currently on the NERSC archive
• Large lattices, $40^3 \times 96$ at $a = 0.09$ fm, $m_l/m_s = 0.2, 0.4$, available through the NERSC Gauge Connection (MILC and UKQCD coordination)
• Future running: 2+1 flavour RHMC, Nroots acceleration, Omelyan multi-timescale integrator on QCDOC (UKQCD and MILC using Mike Clark's code in CPS)
• Humongous lattices planned: $48^3 \times 144$ at $a = 0.06$ fm, with $m_l/m_s = 0.2, 0.4$

State of the Art Twisted Mass Simulations
• European Twisted Mass Collaboration (ETMC) (Germany, UK, France, Italy)
• State of the art code similar to Wilson: Hasenbusch mass preconditioning and multiple time scales
• Plans presented at ILDG 7: $a = 0.075 - 0.12$ fm, $L \approx 2.5$ fm, 250 MeV $\le m_\pi \le$ 500 MeV

State of the Art Domain Wall Simulations
• UKQCD-RBC QCDOC Collaboration
• O(10) QCDOC rack years of concerted and coordinated effort
• Normal Shamir formulation, 2+1 flavours, Nroots acceleration, Omelyan integrator with multiple timescales, and hardware optimized multi-mass solver (all the tricks?) in CPS
• Runs planned at ILDG 7: $16^3 \times 32$, $L_s = 8$, $a^{-1} = 1.5 - 2.2$ GeV.
• According to the Lattice 2005 contribution: $m_{res}$ is still about 30% of the lightest quark mass

State of the Art Overlap Simulations
• DeGrand and Schaefer
– in Boulder, Hasenbusch acceleration, stout smearing, improved calculation of tunnelling probability,
– but smallish lattices so far.
• The Wuppertal Gang (Cundy, Krieg, Lippert, Frommer etc)
– Thin links (?)
– Own version of the reflecting/refracting integrator, accurate to $O(\tau^2)$.
– Planned for $16^3 \times 32$ lattices according to their Nicosia write up.
• JLQCD - plans for large scale dynamical overlap using KEK BG/L

Apologies to the unmentioned
There are other people doing other things: eg Approximate Overlap Operators (Bietenholz et al), Fodor's group - the first to publish dynamical overlap simulations.

Algorithmic Games - Low Hanging Fruit
• Stout Links in Dynamical Fermion Evolution
– Structure and preliminary code ready in Chroma
– But needs debugging
– Needs usefulness tests, ie running simulations, eg Stout Link Wilson, Stout Link Clover
– Would suit a graduate student or postdoc
• More work on Continued Fraction/Partial Fraction 5D operators
– Can investigate tuning and, most importantly, need (R)HMC runs
– Need exact treatment of low eigenvalues/modes
– Resource intensive (need racks of QCDOC and human drivers)

• Using mixed 4D and 5D techniques
– Operator application in 4D, inversion in 5D
– 5D inversion solves $M\phi = \chi$
– so may need two 5D inversions to get $(M^\dagger M)^{-1}\chi$
• Hasenbusch Preconditioning
– Opens up the way for new determinant splittings, combined with...
• Multi Timescale Integrators
– In Chroma, we have a 2 timescale Sexton Weingarten integrator
– Generalize and implement for more timescales

Somewhat Higher Hanging Fruit
• Actually proper 4D Overlap Simulations
– Needs tidying of the 4D overlap code (in Chroma)
– Needs reflecting/refracting integrator, robust eigensolver techniques
– Would probably involve structural changes in Chroma
– However, this is playing catchup with Wuppertal, DeGrand etc.

Jefferson Lab Physics
• Spectroscopy, Nucleon Excited States, GPDAs, Decays
• Chiral operators don't have a positive definite 4D transfer matrix
– Wiggles in correlation functions
– Excited states difficult.
• Consider Wilson-Clover fermions in the sea, on large fine lattices
• Lüscher: Wilson-Clover simulations should improve (hep-lat/0512021)
– eg: $a = 0.08$ fm and $V = 24^3 \times 48$ with $m_\pi \approx 300$ MeV
– or $a = 0.06$ fm and $V = 32^3 \times 64$

[Figure: effective mass $a\,m_{\rm eff}$ versus $t$ for the operators $\gamma_k\gamma_5$, $\gamma_k\gamma_4$, $1$, $\gamma_k$, $\gamma_5$. From hep-lat/0601137 (thanks to J. Dudek, R. Edwards, D. Richards)]

• JLab clusters oversubscribed (current SciDAC allocations)
• Dynamical overlap development needs lots of resources,
– Lots of algorithmic expertise at JLab
– Little justification to focus limited human effort away from JLab physics
– Need good incentives (to satisfy our masters) such as:
∗ shared computing resource (actual computer time)
∗ shared human resource (especially developers, runners)

Chroma Development Plan
• Fermion sector rework to cope with Dynamical Clover (in progress)
• MD interfaces need to change (extend) to support multi timescaling
• Need to write a general multi-timescale integrator
• Inversion structure needs rework
– Make applying an inverse like applying a matrix.
– Will clean up MD stuff and propagators, and make it easier to choose inverters
• The above changes are beneficial to all DF simulations
• Will allow us to try mixed 4D-5D approaches

Current JLab Plans
• SciDAC ends this June. Currently applying for SciDAC 2.
– Infrastructure application sometime in February
– Applications for projects in late Feb, March
• Considering applying for time for DF Wilson/Clover simulations.
– Use Wilson-Clover valence for excited states (no wiggles)
– Use chiral operator for light quarks (possibly overlap valence).
• Coordination with UKQCD could be beneficial

Need to sort out issues:
• Number of flavours? (2, 1+1, 2+1, 1+1+1?)
• Anisotropy or lack thereof?
• Politics politics politics
– Coordinate production parameters
– Share data
– Share or compete on analysis?