Applying machine learning to quantum mechanics Gábor Csányi Engineering Laboratory −0.2 5 4 3 Configuration exploration: Nested Sampling 0.05 0.1 0.15 0.2 Relative temperature (kT/ε) 0.25 Force-fields by Machine Learning 0.3 -156.0 A 1:4.5 B -158.0 -160.0 15:1 -165 L2 -170 -162.0 -175 -164.0 -166.0 1:16 Free energy surface reconstruction Bayesian Inference -168.0 -180 -185 L1 -170.0 -172.0 Global -190 Log phase space volume Energy Heat capacity −0.1 Adaptive QM/MM with particle exchange Swift introduction to molecular modelling The world works according to the Schrödinger equation: @ i~ =H @t (x1 , x2 , . . .) xi 2 R 3 X 1 X Zi Zj H= r2xi + 2m |r r | i i j i ij ⌘ “matter waves” tendency to spread For fermions, antisymmetry: (. . . , xi , . . . , xj , . . .) = (. . . , xj , . . . , xi , . . .) Long range Coulomb attraction/repulsion macroscopically observable! • Range of validity: particles are electrons and atomic nuclei, stay the same • Thus far not contradicted by experiment Newtonian molecular dynamics At ambient conditions nuclear do not overlap (other than for H…), separable solution: = 1 (R1 ) 2 (R2 ) . . . Correspondence principle: quantum system reduces to a classical system: pi = h |rRi | i qi = h |Ri | i difficult electronic problem X p2 X Zi Zj i H = h |H| i = + + Vel (q1 , q2 , . . .) 2m |q q | i i j i ij | {z } “Interatomic potential” V (q1, q2, …) Typical questions “What are the defects that control material response?” Typical questions “What are the typically observed conformations?” Ingredients • Representation smoothness, faithfulness, continuity • Interpolation flexible but smooth functional form, few sensible parameters • Database predictive power non-domain specific of atomic neighbourhood of functions of configurations Function fitting with basis functions Fit a function f (x) based on observations y ⌘ {yi } at {xi } f (x) = yj = N X i=1 N X ↵i k(xi , x) ↵i k(xi , xj ) + i=1 y = (K + ↵=C 1 2 ⌫ I)↵ y ff(x) (x) = = kkTTC C 11yy 0 e.g. k(x, x ) = 2 ⌫ ij 2 |x x0 |2 /2 we regularised fit: arbitrary σ,σw,σν [K]ij ⌘ k(xi , xj ) C⌘K+ 2 ⌫I k ⌘ k(xi , x) 2 Machine learning framework: Kernel regression "(q(i) ) = N X ↵k K q(i) , q(k) k • Linear regression: (i) KDP (q , q (k) )=q (i) ·q (k) "(q(i) ) = KNN q(i) , q(k) = • Gaussian kernel (i) KSE q , q (k) |q(i) q(k) |2 + const. ⇣ X (q (i) = exp j q 2 2 j (k) 2 ⌘ ) N X (i) qj j • Neural networks X k (k) ↵ k qj = q(i) · 1D example 1 f(x) GP splines 0.5 f(x) 0 -0.5 -1 1.5 2 2.5 3 3.5 4 r Gaussians basis functions are wide! f (x) = kT C 1 y Bayesian function fitting: Gaussian Process Regression Fit a function f (x) based on observations y ⌘ {yi } at {xi } f (x) = H X wh basis functions h (x) h (x) h=1 fi = X Rih ⌘ Rih wh h Gaussian prior: P (w) = Normal(0, P (f ) = Normal(0, h (xi ) 2 w I) T 2 w RR ) " ⇠ Normal(0, Gaussian likelihood: y = f + " Joint of observations also Gaussian: P (y) = Normal(0, T 2 w RR + Expected function Var 2 ⌫ I) Var of data 2 ⌫ I) Predict the next observation using the conditional P (yN +1 |yN ) = P (yN +1 , yN )/P (yN ) f (x) = kT C 1 y Rih ⌘ h (xi ) N data points Limit of ∞ basis functions H basis functions RRT Use Gaussian basis functions, take H ! 1 , RRT is finite Prior is a Gaussian Process with ⇥ ⇤ 2 2 Cov[f (xi ), f (xj )] / exp (xi xj ) / Similarly for the observations: Cov[y(xi ), y(xj )] / exp ⇥ (xi 2 xj ) / 2 ⇤ + 2 ⌫ ij f (x) = kT C 1 y Gaussian Process connection to linear fit P (yN +1 |yN ) = P (yN +1 , yN )/P (yN ) / P (yN +1 , yN ) P (y1 . . . yN ) / Normal(0, CN ) / exp( N +1 P (yN +1 , yN ) / Normal(0, C ) / exp( CN +1 22 3 2 33 64 CN 5 4k 57 7 ⌘6 5 4 ⇥ ⇤ ⇥ ⇤ T k CN1+1 [k]i ⌘ C(xi , xN +1 ) ⌘ C(xN +1 , xN +1 ) T [yN 1 yN +1 ]CN +1 [yN 22 3 2 33 6 4 M 5 4 m5 7 7 ⌘6 4 5 ⇥ ⇤ ⇥ ⇤ T m µ µ = m = M = k T yN +1 ]) CN1 k 1 µ CN1 k 1 1 CN + mmT µ T 2 yN +1 ] = yN My + 2yT myN +1 + µyN +1 T 2 = µ(yN +1 + y m/µ) + . . . P (yN +1 |yN ) / exp( (yN +1 ȳ)2 /2ˆ 2 ) arg max P (yN +1 |yN ) = ȳ = kT CN1 yN yN +1 1 T y N CN y N ) 1 T [yN yN +1 ]CN +1 [yN ȳ = kT CN1 yN ˆ= kT CN1 k f (x) = kT C 1 y Gaussian Process Regression summary • Choose Gaussian covariance: xj )2 /2 2 ) Prior assumption X f (x) = arg max P (f |data) = ↵i K(x, x(i) ) K(xi , xj ) = exp( (xi f Maximum of posterior i ↵=C 1 2 wK + C 2 w K(xi , xi0 ) + ii0 = y⌘( 2 1 y ⌫ I) 2 ⌫ ii0 • Meaningful hyper-parameters: σ : smoothness (x-scale) of f σ w : y-scale of f σ ν: variance (noise) of input data • This is not an optimised fit, but a closed form estimate! Samples from the Gaussian prior P (f ) / Normal(0, C) f = f1.0 l = 1.0, = 1.0 f =f 2.0 l = 2.0, = 1.0 2 2 0 0 -2 -2 -4 -2 0 2 4 -4 -2 0 2 4 Samples from Gaussian posterior P (yN +1 |y) / Normal(kT C 1 y, . . .) 2 0 -2 -4 -2 0 2 4 Mean of posterior distribution ȳ(x) = kT (x)(K + T ⌫ I) 1 y |xi x0j |2 / 1 cov(y(x)) k(x, Kernel:=k(x xj ) =k e (x)(K + ⌫ I) k(x) i , x) [k(x)]i = k(x, xi ) Kij = k(xi , xj ) 2 0 -2 -4 -2 0 2 4 Generalised input data • Often direct measurements of yi not available • Consider any linear operator L(y) Compute P (yN +1 (xN +1 )|L(y)) L can be built using Σ , @/@x etc • Can deal with gradient data, sums, etc. Interpolating the solution of an eigenproblem X Zj ( 1) X 1 1X 2 rri + Hel = + 2 i |R r | |r r | j i i j ij ij @ i~ =H @t has spatial locality Weak Lowest eigenstate: E0 ⌘ Vel (R1 , R2 , . . .) = Many-body (cluster) expansion + + + + + + + + Strong + higher multi-body terms... E= X (i) "(q1 , q2 ,(i) (i) . . . , qM ) + analytic long range terms + analytic long range terms i qi are descriptors of local atomic neighbourhood Quantum mechanics is many-body 7 bcc fcc sc 6 DFT bcc Energy [eV / atom] Carbon: tight binding Tungsten: DFT, Embedded Atom Model Tungsten, Finnis-Sinclair fcc sc 5 4 3 2 1 0 -1 12 14 16 18 Volume [A3 / atom] 20 22 Quantum mechanics has some locality force F neighbourhood r Force errors around Si self interstitial far field Force errors around O in water with QM/MM r Is there a finite range atomic energy function that gives the correct total energy and forces? Traditional ideas for functional forms • Pair potentials: Lennard-Jones, RDF-derived, etc. • Three-body terms: Stillinger-Weber, MEAM, etc. • Embedded Atom (no angular dependence) 1X "i = V2 (|rij |) 2 j "i = • Bond Order Potential (BOP) Tight-binding-derived attractive term with pair-potential repulsion • ReaxFF: kitchen-sink + hundreds of parameters P j ⇢(|rij |) Representation is implicit These are NOT THE CORRECT functions. Limited accuracy, not systematic given by GOAL: potentials based on quantum mechanics Construct smooth similarity kernel directly • Overlap integral Z S(⇢i , ⇢i0 ) = ⇢i (r)⇢i0 (r)dr, • Integrate over all 3D rotations: k(⇢i , ⇢i0 ) = Z 2 S(⇢i , R̂⇢i0 ) dR̂ = cutoff: compact support Z • After LOTS of algebra: SOAP kernel k(⇢i , ⇢ ) = i0 X n,n0 ,l (i) (i0 ) pnn0 l pnn0 l dR̂ Z 2 ⇢i (r)⇢i0 (R̂r)dr pnl = pnn0 l = † cnl cnl † cnl cn0 l Smooth Overlap of Atomic Positions (SOAP) 1 SOAP l m ax = 2 l m ax = 4 l m ax = 6 0 Aver age dRMSD REF Error in distinguishing 10 10 −1 10 −2 10 4 6 8 10 n 12 14 16 Number of neighbours 18 20 Diamond Brenner potential 50 random data points 1 data point 300 configurations from ab initio MD: Brenner a) 1061 133 736 717 40 α / K−1 GAP (20ps MD) 39.8 30 Frequency / THz Frequency / THz −6 0 40 35 x 10 6 5 4 3 2 1 0 b) 25 20 15 10 DFT (QH) 500 1000 Temperature / K 1500 Thermal expansion 39.4 39.2 39 5 2000 39.6 38.8 0 Δ X Σ Γ Λ L 0 500 1000 Temperature / K Phonon spectrum 1500 Reactive potentials phase 2 Database: 0.1Å Energy / eV 6 phase 1 0.1Å 4 DFT−LDA GAP Brenner 2 0 Rhombohedral graphite 0 0.2 0.4 0.6 Reaction coordinate Diamond 0.8 1 Vacancy migration Energy / eV 4 DFT GAP 2 0 reaction coordinate How to generate databases? • Target applications: large systems • Capability of full quantum mechanics (QM): small systems QM MD on “representative” small systems: sheared primitive cell ➝ elasticity large unit cell ➝ phonons surface unit cells ➝ surface energy gamma surfaces ➝ screw dislocation vacancy in small cell ➝ vacancy vacancy @ gamma surface ➝ vacancy near dislocation Iterative refinement 1. QM MD ➝ Initial database 2. Model MD 3. QM ➝ Revised database What is the acceptable validation protocol? How far can the domain of validity be extended? Existing potentials for tungsten Error 15% 50% GAP BOP MEAM FS DFT reference 0 C11 C12 C44 Elastic const. Vacancy energy (100) (110) (111) (112) Surface energy Peierls barrier for screw dislocation glide Vacancy-dislocation binding energy (~100,000 atoms in 3D simulation box) Self-aware potentials: predicting the error • Bulk silicon: Diamond and Beta-tin phases Error Energy • Predicted error correctly signals where GAP is unreliable Outstanding problems • Accuracy on database accuracy in properties? • Database contents region of validity ? • Alloys - permutational complexity? Chemical variability? • Systematic treatment of long range effects • Electronic temperature