MODELING MATTER AT NANOSCALES
3.04. Empirical classical PES and typical procedures of optimization

Geometries from energy derivatives

General algorithm: the line search strategy
An unavoidable pathway for every kind of iterative minimization is to generate a sequence of geometries $\{R_k\}$ such that, for all $k \in \mathbb{N}$, $E(Z',R_{k+1}) \le E(Z',R_k)$, by finding optimal step sizes toward a local or global minimum of the hypersurface along a given direction. This is performed by a search strategy known as line search, meaning:
• Computing $E(Z',R_k)$ and $g(R_k)$, given $R_k$, at a certain point $k$ of the search.
• Choosing a search direction $F(R_k)$.
• Obtaining a step size $\alpha_k > 0$ that minimizes $E[Z', R_k + \alpha_k F(R_k)]$, i.e. the hypersurface at a better following location, by appropriate mathematical procedures.
• Selecting the new geometry at the following step $k+1$, i.e. $R_{k+1} = R_k + \alpha_k F(R_k)$, and repeating the cycle from $k+1$ until a step satisfies a given local convergence threshold, which defines the final optimization step.
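The cycle above can be illustrated with a minimal Python sketch (not from the original material); the toy energy function, its gradient, the Armijo backtracking parameters, and the function names are assumptions chosen only for the example.

```python
import numpy as np

def backtracking_line_search(E, g, R, F, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until an Armijo sufficient-decrease condition holds.
    E: energy function, g: gradient function, R: current geometry,
    F: search direction (e.g. -g(R)). All names are illustrative."""
    alpha = alpha0
    E0, g0 = E(R), g(R)
    while E(R + alpha * F) > E0 + c * alpha * np.dot(g0, F):
        alpha *= rho
    return alpha

def line_search_minimize(E, g, R0, tol=1e-6, max_steps=500):
    """Generic iterative minimization: direction, step size, update, repeat."""
    R = np.asarray(R0, dtype=float)
    for _ in range(max_steps):
        grad = g(R)
        if np.linalg.norm(grad) < tol:      # convergence threshold on |g|
            break
        F = -grad                           # simplest choice of direction
        alpha = backtracking_line_search(E, g, R, F)
        R = R + alpha * F                   # R_{k+1} = R_k + alpha_k F(R_k)
    return R

# Toy quadratic "hypersurface" with its minimum at (1, 2)
E = lambda R: (R[0] - 1.0) ** 2 + 2.0 * (R[1] - 2.0) ** 2
g = lambda R: np.array([2.0 * (R[0] - 1.0), 4.0 * (R[1] - 2.0)])
print(line_search_minimize(E, g, np.zeros(2)))   # ≈ [1.0, 2.0]
```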
First order treatments

Simple gradient methods (first order): steepest descent
The method of steepest descent makes use only of gradient information on the potential energy surface in order to determine the search direction $F$:

$F(R_k) = -g(R_k)$

Then each optimization step $k+1$ depends on the previous one following:

$R_{k+1} = R_k - \alpha_k\, g(R_k)$

The inconvenience is that, with an exact line search, the gradient at each new point is orthogonal to the previous search direction (and hence to the previous gradient):

$g(R_{k+1}) \cdot F(R_k) = 0$

This produces a "zig-zag" approach to the minimum that wastes much computing time.

Simple gradient methods (first order): non-linear conjugate gradient
The non-linear conjugate gradient method can be considered a correction to the method of steepest descent. It still makes use only of gradient information on the potential energy surface. The first direction is:

$F(R_1) = -g(R_1)$

but subsequent steps behave as:

$F(R_{k+1}) = -g(R_{k+1}) + \beta_{k+1} F(R_k)$

where $\beta_{k+1}$ is a real scalar coefficient. As in the case of steepest descent, each optimization step $k+1$ depends on the previous one following:

$R_{k+1} = R_k + \alpha_k\, F(R_k)$

To calculate $\beta_{k+1}$, several formulas depending on the gradients themselves are used, such as Fletcher-Reeves:

$\beta_{k+1}^{FR} = \dfrac{g^T(R_{k+1})\, g(R_{k+1})}{g^T(R_k)\, g(R_k)}$

or Polak-Ribière:

$\beta_{k+1}^{PR} = \dfrac{g^T(R_{k+1})\,\left[g(R_{k+1}) - g(R_k)\right]}{g^T(R_k)\, g(R_k)}$

or even others.
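A minimal non-linear conjugate gradient sketch in Python with both update formulas; the toy energy surface, the crude backtracking step, and the function names are assumptions made only for illustration.

```python
import numpy as np

def nonlinear_cg(E, g, R0, kind="PR", tol=1e-6, max_steps=500):
    """Non-linear conjugate gradient with Fletcher-Reeves ("FR") or
    Polak-Ribiere ("PR") beta; E is the energy and g its gradient."""
    R = np.asarray(R0, dtype=float)
    grad = g(R)
    F = -grad                                    # F(R_1) = -g(R_1)
    for _ in range(max_steps):
        if np.linalg.norm(grad) < tol:
            break
        # crude Armijo backtracking for the step size alpha_k > 0
        alpha, E0 = 1.0, E(R)
        while E(R + alpha * F) > E0 + 1e-4 * alpha * (grad @ F) and alpha > 1e-12:
            alpha *= 0.5
        R_new = R + alpha * F                    # R_{k+1} = R_k + alpha_k F(R_k)
        grad_new = g(R_new)
        if kind == "FR":                         # Fletcher-Reeves
            beta = (grad_new @ grad_new) / (grad @ grad)
        else:                                    # Polak-Ribiere
            beta = (grad_new @ (grad_new - grad)) / (grad @ grad)
        F = -grad_new + beta * F                 # new conjugate direction
        R, grad = R_new, grad_new
    return R

# Toy anisotropic quadratic surface with its minimum at (3, -1)
E = lambda R: (R[0] - 3.0) ** 2 + 10.0 * (R[1] + 1.0) ** 2
g = lambda R: np.array([2.0 * (R[0] - 3.0), 20.0 * (R[1] + 1.0)])
print(nonlinear_cg(E, g, np.zeros(2)))           # ≈ [3.0, -1.0]
```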
Simple gradient methods (first order): non-linear conjugate gradient
The big advantage of the non-linear conjugate gradient method is that, while in the steepest descent method

$g(R_{k+1}) \cdot F(R_k) = 0,$

here the consecutive search directions become non-orthogonal:

$F(R_{k+1}) \cdot F(R_k) \neq 0$

This means a more straightforward pathway to minima, avoiding zig-zags.

It must be observed that non-linear conjugate gradient methods use the history of the gradient development in a certain way, which means that curvatures (Hessians) are implicitly taken into account. The advantages are:
• rather simple formulas for updating the direction vector;
• although slightly more complicated than steepest descent, it converges faster;
• it is more reliable than simpler methods.

A good optimization strategy is to perform initially a few steepest descent steps, to reach a conformation near the energy minimum, and then to finish with conjugate gradient steps until convergence, as sketched below.
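A minimal sketch of this hybrid strategy, assuming a NumPy/SciPy environment; the toy energy surface, the fixed pre-relaxation step, the number of steepest descent steps, and the choice of SciPy's built-in "CG" minimizer are all illustrative assumptions, not part of the original material.

```python
import numpy as np
from scipy.optimize import minimize

# Toy energy surface and its analytical gradient (illustrative only)
E = lambda R: (R[0] - 1.0) ** 4 + (R[0] - 1.0) ** 2 + 5.0 * (R[1] + 2.0) ** 2
g = lambda R: np.array([4.0 * (R[0] - 1.0) ** 3 + 2.0 * (R[0] - 1.0),
                        10.0 * (R[1] + 2.0)])

# Phase 1: a few crude steepest descent steps to get near the minimum
R = np.array([3.0, 3.0])
for _ in range(10):
    R = R - 0.05 * g(R)          # fixed small step, only for pre-relaxation

# Phase 2: finish with a non-linear conjugate gradient minimizer
result = minimize(E, R, jac=g, method="CG", tol=1e-8)
print(result.x)                  # ≈ [1.0, -2.0]
```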
Second order treatments

Gradient and Hessian path to optimize geometries
The Taylor series can be truncated if $R_f$ is a stationary point of the system, or even lies in its nearest surroundings. Moreover, the gradients must vanish at the equilibrium or any other stationary geometry. Then:

$E(Z',R_f) \approx E(Z',R_{eq}) + \tfrac{1}{2}\, q^T H(R_{eq})\, q$

As the derivative with respect to the position of any center $a$ at geometry $R_f$ is the gradient of the energy at that coordinate, it can also be expanded as a Taylor series:

$g_a(R_f) = \left.\dfrac{\partial E(Z',R)}{\partial r_a}\right|_{R_f} \approx \left.\dfrac{\partial E(Z',R)}{\partial r_a}\right|_{R} + \sum_{b}^{N} \left.\dfrac{\partial^2 E(Z',R)}{\partial r_a\,\partial r_b}\right|_{R}\,(r_b^f - r_b)$

Therefore, the full gradient matrix at the $R_f$ geometry in matrix notation is:

$g(R_f) = g(R) + H(R)\, q$

When $R_f$ is the equilibrium geometry, $g(R_f) = g(R_{eq}) = 0$ by definition, and therefore gradients and curvatures become related by the displacements $q$ as:

$g(R) = -H(R)\, q$

The Hessian matrix is non-singular when $HH^{-1} = H^{-1}H = I_n$, where $I_n$ is the identity matrix of order $n$. Therefore, if the Hessian is non-singular, the displacement matrix can be expressed as the product of the inverse Hessian and the gradient matrix:

$q = -H^{-1}(R)\, g(R)$

Since $q = R_{eq} - R$, then:

$R_{eq} = R - H^{-1}(R)\, g(R)$

Remembering the steps of the line search procedure,

$R_{k+1} = R_k - \alpha_k\, g(R_k),$

it arises that a good choice for the step size is the inverse Hessian (now a matrix rather than a scalar) whenever it is non-singular:

$\alpha_k = H^{-1}(R_k)$

This provides a way to find the equilibrium geometry of any system by a process in which an arbitrary geometry is progressively changed downwards on the hypersurface, guided by the curvature provided by the Hessian, until $g(R)$ becomes 0. This is a general principle for finding optimized geometries guided by the gradient path and the curvature of the hypersurface at each point.

Gradient and Hessian methods (second order): Newton-Raphson
The Newton-Raphson scheme is based on the principle of using first derivatives of a function to find extrema and second derivatives to define the condition of maximum or minimum. Therefore, the general recipe for the iterative term defining the next step is:

$R_{k+1} = R_k - H^{-1}(R_k)\, g(R_k)$

Observe that, like the steepest descent method, Newton-Raphson steps against the gradient, but now scaled by the inverse Hessian.
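A minimal Newton-Raphson sketch in Python; the analytic energy, gradient, and Hessian of the toy surface, and the function names, are assumptions made only for illustration.

```python
import numpy as np

def newton_raphson(g, H, R0, tol=1e-10, max_steps=50):
    """Newton-Raphson geometry step: R_{k+1} = R_k - H^{-1}(R_k) g(R_k).
    g returns the gradient vector, H the (non-singular) Hessian matrix."""
    R = np.asarray(R0, dtype=float)
    for _ in range(max_steps):
        grad = g(R)
        if np.linalg.norm(grad) < tol:
            break
        # solve H q = -g instead of forming the inverse explicitly
        q = np.linalg.solve(H(R), -grad)
        R = R + q
    return R

# Toy anharmonic surface E = (x-1)^4 + (x-1)^2 + 3(y+1)^2, minimum at (1, -1)
g = lambda R: np.array([4.0 * (R[0] - 1.0) ** 3 + 2.0 * (R[0] - 1.0),
                        6.0 * (R[1] + 1.0)])
H = lambda R: np.array([[12.0 * (R[0] - 1.0) ** 2 + 2.0, 0.0],
                        [0.0, 6.0]])
print(newton_raphson(g, H, np.array([2.0, 2.0])))   # ≈ [1.0, -1.0]
```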
Gradient and Hessian methods (second order): Newton-Raphson
Newton-Raphson methods provide good output properties and fast convergence if started near the equilibrium structure (either local or global). However, they need modifications if started far away from the solution. Another inconvenience is that the inverse Hessian is expensive to calculate.

Gradient and Hessian methods (second order): pseudo-Newton-Raphson
As it is not always possible (as fast as desired) to compute $H(R_k)$ at each iteration step, there are pseudo-Newton-Raphson (quasi-Newton) methods that use an approximate matrix:

$B(R_k) \approx H^{-1}(R_k)$

becoming:

$R_{k+1} = R_k - B(R_k)\, g(R_k)$

while ensuring that $B_k$ is always positive definite and that

$\lim_{k\to\infty} B_k = H_k^{-1}$

One case of pseudo-Newton-Raphson procedure is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) procedure, used by default in important program packages (a sketch is given after the convergence criteria below). Writing the update for the approximate Hessian, whose inverse is then applied, with $q_k = R_{k+1} - R_k$:

$B(R_{k+1}) = B(R_k) - \dfrac{B(R_k)\, q_k\, q_k^T\, B(R_k)}{q_k^T\, B(R_k)\, q_k} + \dfrac{\left[g(R_{k+1}) - g(R_k)\right]\left[g(R_{k+1}) - g(R_k)\right]^T}{\left[g(R_{k+1}) - g(R_k)\right]^T q_k}$

The evaluation of $B(R_1)$ is performed by particular formulas or by another method, such as those previously described.

Pseudo-Newton-Raphson methods serve to speed up convergence without requiring the high computational resources needed to obtain the Hessian matrix at each optimization step. In certain cases, such as large systems with too many degrees of freedom, it can be better to perform steepest descent or conjugate gradient procedures.

Convergences
There are several convergence criteria, which are not mutually exclusive. If $\varepsilon$ is a convenient threshold:
→ Mathematical: $|g(R_k)| \le \varepsilon$ and $H_k$ is positive definite (all of its eigenvalues are positive).
→ Gradient only: $|g(R_k)| \le \varepsilon$, although the minimum is not guaranteed.
→ Both gradient and displacements: $|g(R_k)| \le \varepsilon$ and $|R_{k+1} - R_k| \le \varepsilon$.
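A minimal BFGS-style sketch combining the quasi-Newton update with a gradient-plus-displacement convergence test; the toy surface, the unit initial guess for $B(R_1)$, and the crude backtracking step are assumptions made only for illustration (production implementations use more careful line searches and safeguards).

```python
import numpy as np

def bfgs_optimize(E, g, R0, eps=1e-8, max_steps=200):
    """Quasi-Newton (BFGS) optimization with an approximate Hessian B ~ H.
    Convergence requires both |g| and |R_{k+1} - R_k| below the threshold eps."""
    R = np.asarray(R0, dtype=float)
    B = np.eye(R.size)                        # crude initial guess for B(R_1)
    grad = g(R)
    for _ in range(max_steps):
        step = np.linalg.solve(B, -grad)      # quasi-Newton step direction
        # simple backtracking so the energy actually decreases
        alpha, E0 = 1.0, E(R)
        while E(R + alpha * step) > E0 and alpha > 1e-12:
            alpha *= 0.5
        q = alpha * step                      # displacement q_k = R_{k+1} - R_k
        R_new = R + q
        grad_new = g(R_new)
        y = grad_new - grad                   # gradient difference
        # BFGS update of the approximate Hessian (skipped if ill-conditioned)
        if abs(y @ q) > 1e-12 and abs(q @ B @ q) > 1e-12:
            B = B - np.outer(B @ q, B @ q) / (q @ B @ q) + np.outer(y, y) / (y @ q)
        converged = np.linalg.norm(grad_new) < eps and np.linalg.norm(q) < eps
        R, grad = R_new, grad_new
        if converged:
            break
    return R

# Toy surface with its minimum at (2, 0.5)
E = lambda R: (R[0] - 2.0) ** 4 + (R[0] - 2.0) ** 2 + 4.0 * (R[1] - 0.5) ** 2
g = lambda R: np.array([4.0 * (R[0] - 2.0) ** 3 + 2.0 * (R[0] - 2.0),
                        8.0 * (R[1] - 0.5)])
print(bfgs_optimize(E, g, np.zeros(2)))       # ≈ [2.0, 0.5]
```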
Coordinates
In practice, all matrices are treated in the 3N-dimensional space of the x, y and z (or $x_1$, $x_2$, $x_3$) coordinates of each nucleus or center of reference. This means that the Taylor series for the energy can be expressed in terms of the X matrix in place of the R matrix:

$E(Z',X_f) = E(Z',X) + \sum_{n=1}^{3N} \left.\dfrac{\partial E(Z',X)}{\partial x_n}\right|_X (x_n^f - x_n) + \dfrac{1}{2!}\sum_{n=1}^{3N}\sum_{m=1}^{3N} \left.\dfrac{\partial^2 E(Z',X)}{\partial x_n\,\partial x_m}\right|_X (x_n^f - x_n)(x_m^f - x_m) + \dfrac{1}{3!}\sum_{n=1}^{3N}\sum_{m=1}^{3N}\sum_{p=1}^{3N} \left.\dfrac{\partial^3 E(Z',X)}{\partial x_n\,\partial x_m\,\partial x_p}\right|_X (x_n^f - x_n)(x_m^f - x_m)(x_p^f - x_p) + \ldots$

It must be observed that a molecule with 2 atoms has 3N = 6 Cartesian coordinates ($x_1$ to $x_6$), although only one dimension (the internuclear distance) is significant for its geometry:
→ 3 molecular dimensions ($x_1$, $x_2$, $x_3$) refer to the translation of the system with respect to any external point.
→ 2 molecular dimensions ($x_4$, $x_5$) refer to its rotational motion.
→ Molecules with more than 2 atoms (non-cylindrical, i.e. non-linear, symmetry) require an additional coordinate to deal with rotations.

For the expression in terms of Cartesian coordinates

$X_{eq} = X - H^{-1}(X)\, g(X)$

to be valid, the Hessian must be non-singular, i.e. $HH^{-1} = H^{-1}H = I$. The problem is that the 3N Hessian matrix in terms of Cartesian coordinates is singular, because the translation (involving $x_1$, $x_2$, $x_3$) and rotation (involving $x_4$, $x_5$, $x_6$) of the complete system do not change the potential energy: they only contribute an overall kinetic energy common to the whole nanoscopic system.

The system's geometry is governed by the potential energy among its components, which depends only on their relative (internal) coordinates. Therefore, there is no potential energy gradient, nor Hessian, associated with the translational or rotational motion of the system as a whole.

The solution is to transform the full Cartesian coordinate matrix X into an internal, or reduced, Cartesian coordinate matrix X* containing only the 3N − 6 geometry-significant terms (or 3N − 5 in the case of cylindrical systems):

$X^*_{eq} = X^* - H^{-1}(X^*)\, g(X^*)$

The H(X*) Hessian matrix is non-singular, and then all previous considerations hold for geometry optimizations.
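The singularity of the full Cartesian Hessian can be illustrated numerically; the harmonic bond potential, force constant, and finite-difference scheme below are assumptions chosen only to make the point visible for a diatomic (linear) case.

```python
import numpy as np

def bond_energy(x, k=500.0, r0=1.0):
    """Harmonic bond between two atoms; x is a flat 6-vector (x1..x6)."""
    r = np.linalg.norm(x[:3] - x[3:])
    return 0.5 * k * (r - r0) ** 2

def num_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian (illustrative scheme only)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            xpp = x.copy(); xpp[i] += h; xpp[j] += h
            xpm = x.copy(); xpm[i] += h; xpm[j] -= h
            xmp = x.copy(); xmp[i] -= h; xmp[j] += h
            xmm = x.copy(); xmm[i] -= h; xmm[j] -= h
            H[i, j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# Diatomic at its equilibrium distance along the z axis
x_eq = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
H = num_hessian(bond_energy, x_eq)
print(np.linalg.eigvalsh(H))
# 5 near-zero eigenvalues (3 translations + 2 rotations) make the 3N Cartesian
# Hessian singular; only 3N - 5 = 1 internal degree of freedom carries curvature.
```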
Finding gradients and Hessians
There are two main ways to compute gradients and Hessians:
– Analytically, meaning the evaluation of formulas for the total energy derivatives obtained from the ones used for the hypersurface function.
– Numerically, using series of evaluated points and their corresponding inter- and extrapolations.

Finding gradients and Hessians: numerical approaches
A good algorithm to use when only the energy is known and analytical gradients cannot be obtained is that of Fletcher-Powell (FP). It builds up an internal list of gradients by keeping track of the energy changes from one step to the next. The Fletcher-Powell algorithm is usually the method of choice when energy gradients cannot be computed.
1. Fletcher, R., A new approach to variable metric algorithms. The Computer Journal 1970, 13 (3), 317-322.
2. Fletcher, R.; Powell, M. J. D., A rapidly convergent descent method for minimization. The Computer Journal 1963, 6 (2), 163-168.

Some of the most efficient algorithms are the so-called quasi-Newton algorithms, which assume a quadratic potential surface near minima. The Berny algorithm internally builds up a second-derivative Hessian matrix.

Finding gradients and Hessians: numeric and analytic approaches
Another good algorithm is the geometric direct inversion in the iterative subspace (GDIIS) algorithm. The general DIIS procedure used in GAMESS, which includes both numerical and analytical derivatives, is described in the series:
• Hamilton, T. P.; Pulay, P., Direct inversion in the iterative subspace (DIIS) optimization of open-shell, excited-state, and small multiconfiguration SCF wave functions. J. Chem. Phys. 1986, 84 (10), 5728-5734.
• Pulay, P., Improved SCF convergence acceleration. J. Comput. Chem. 1982, 3 (4), 556-560.
• Pulay, P., Convergence acceleration of iterative sequences. The case of SCF iteration. Chem. Phys. Lett. 1980, 73 (2), 393-398.
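When analytical derivatives are unavailable, gradients and Hessian elements can be estimated from energy points alone by finite differences, as a crude stand-in for the interpolation-based schemes mentioned above; the step sizes and the toy energy below are illustrative assumptions.

```python
import numpy as np

def numerical_gradient(E, X, h=1e-5):
    """Central-difference gradient from energy evaluations only."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for n in range(X.size):
        Xp, Xm = X.copy(), X.copy()
        Xp[n] += h
        Xm[n] -= h
        grad[n] = (E(Xp) - E(Xm)) / (2.0 * h)
    return grad

def numerical_hessian(E, X, h=1e-4):
    """Central-difference Hessian built from the numerical gradient."""
    X = np.asarray(X, dtype=float)
    n = X.size
    H = np.zeros((n, n))
    for m in range(n):
        Xp, Xm = X.copy(), X.copy()
        Xp[m] += h
        Xm[m] -= h
        H[:, m] = (numerical_gradient(E, Xp) - numerical_gradient(E, Xm)) / (2.0 * h)
    return 0.5 * (H + H.T)          # symmetrize to clean up numerical noise

# Toy energy: the numerical derivatives reproduce the analytical ones
E = lambda X: X[0] ** 2 + 3.0 * X[0] * X[1] + 2.0 * X[1] ** 2
print(numerical_gradient(E, [1.0, 1.0]))   # ≈ [5.0, 7.0]
print(numerical_hessian(E, [1.0, 1.0]))    # ≈ [[2, 3], [3, 4]]
```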
Finding gradients and Hessians: refinements near minima and transition states
The Newton-Raphson approach can be expressed, as before, in terms of an internal or reduced Cartesian coordinate matrix at a stationary point:

$q^* = -H^{-1}(X^*)\, g(X^*)$

For the sake of simplicity it can be written as:

$Hq + g = 0$

A linear transformation is usefully performed to treat the variables:

$H' = U^* H U = \mathrm{diag}(h_1, \ldots, h_n)$

Then the corresponding displacements are:

$q' = U^* q$

and the gradients:

$g' = U^* g$

Such $h_i$, $q'_i$ and $g'_i$ variables are called the local principal modes, or axes, and the Newton-Raphson step along each principal axis is then:

$q'_i = -\dfrac{g'_i}{h_i}$

Then, for stationary points, $g'_i = 0$ and:

$0 < h_1 \le h_2 \le \ldots \le h_n$ for minima
$0 > h_1 \ge h_2 \ge \ldots \ge h_n$ for maxima
$h_1 \le \ldots \le h_a < 0 < h_{a+1} \le \ldots \le h_n$ for a saddle point of order a

This leads to stepping procedures where:

$q'_i = -\dfrac{g'_i}{h_i - \lambda}$

where $\lambda$ is an appropriately chosen shift parameter. Depending on the value of $\lambda$, the sign of each $(h_i - \lambda)$ will be positive or negative, and hence the direction of the step $q'_i$ will be opposite to or toward the direction of the gradient. All these expressions facilitate the use of algebra to determine the best $\lambda$. These approaches are known as eigenvector following procedures. They are excellent for finding accurate minima and saddle points.
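A minimal sketch of a single shifted Newton-Raphson step in the principal-mode representation; the example Hessian and gradient, and the simple rule used to pick the shift (just below the lowest eigenvalue when a minimum is sought), are assumptions for illustration, not the full eigenvector following algebra.

```python
import numpy as np

def shifted_nr_step(H, g, lam):
    """One Newton-Raphson step along the local principal modes with shift lam:
    q'_i = -g'_i / (h_i - lam), transformed back to the original coordinates."""
    h, U = np.linalg.eigh(H)        # h_i eigenvalues, U columns = principal axes
    g_prime = U.T @ g               # g' = U* g
    q_prime = -g_prime / (h - lam)  # shifted step along each principal axis
    return U @ q_prime              # back-transform the displacement

# Example: an indefinite Hessian (one negative eigenvalue, as near a saddle)
H = np.array([[1.0, 0.4],
              [0.4, -0.5]])
g = np.array([0.2, -0.1])

h_min = np.linalg.eigvalsh(H).min()
lam = h_min - 0.1                   # lam below all h_i: every (h_i - lam) > 0,
q = shifted_nr_step(H, g, lam)      # so the step goes downhill along all modes
print(q)
```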
• Banerjee, A.; Adams, N.; Simons, J.; Shepard, R., Search for stationary points on surfaces. J. Phys. Chem. 1985, 89 (1), 52-57.
• Simons, J.; Joergensen, P.; Taylor, H.; Ozment, J., Walking on potential energy surfaces. J. Phys. Chem. 1983, 87 (15), 2745-2753.
• Cerjan, C. J.; Miller, W. H., On finding transition states. J. Chem. Phys. 1981, 75 (6), 2800-2806.

Summary
An interesting paper describing how all this matter works in recent program packages is:

Comments
The QM/MM challenge is also being addressed. This is still an active field of research.

References
A comprehensive illustration: Cramer, C. J., Essentials of Computational Chemistry: Theories and Models. 2nd ed.; John Wiley & Sons: Chichester, 2004; p 596.
An appendix in: Szabo, A.; Ostlund, N. S., Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory. 1st ed., revised; McGraw-Hill: New York, 1989; p 466.