Supplementary Information: Exploring Chemical Reactivity of Complex Systems with Path-Based Coordinates. Role of the Distance Metric

Kirill Zinovjev, Iñaki Tuñón*

Departament de Química Física, Universitat de València, 46100 Burjassot (Spain)

*To whom correspondence should be addressed: ignacio.tunon@uv.es

Distance metrics used in Figure 1

Figure 1 was obtained using an arbitrary reference path in a two-dimensional space and three different metric tensors (the Euclidean metric, a constant metric and a variable metric), defined as follows:

a) Euclidean metric: $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

b) Constant metric: $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

c) Variable metric: the metric tensor was smoothly interpolated between the following three matrices:
Reactants: $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$; TS: $\begin{pmatrix} 1 & -0.5 \\ -0.5 & 2 \end{pmatrix}$; Products: $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

Note that this is a toy model, introduced to illustrate the role of the metric, that does not correspond to any real system.

Convergence of the string

Figure S1. Convergence of the string, monitored through the root-mean-square displacement (RMSD) of the nodes as a function of simulation time.

Definition of the s coordinate

The s coordinate is defined from the converged minimum free energy path (MFEP) in the following steps. First, the positions of the string nodes are averaged over the 10-50 ps time range of the string simulation. Then a least-squares cubic spline interpolation of the string is performed. Describing the path with cubic splines has two advantages: i) least-squares interpolation smooths out any roughness that might lead to numerical problems, and ii) an analytic, continuous description of the path is more convenient to work with than a set of discrete points. The next step is to calculate the average distance metric tensor at different points on the path. During the string calculations, the values of the local metric tensor ($\tilde{\mathbf{M}}$) and the current position in collective-variable (CV) space were saved at each step of the dynamics for each of the walkers.
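The node averaging and least-squares cubic spline fit described in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the array layout, the helper names `average_nodes` and `lsq_cubic_spline`, the uniform initial parameterization τ in [0, 1] and the number of interior knots are all hypothetical. The spline is written in a truncated power basis rather than in B-splines; for a given set of knots both span the same space of cubic splines.

```python
import numpy as np


def average_nodes(traj, times, t_start=10.0, t_end=50.0):
    """Average string-node positions over a time window.

    traj  : array (T, n_nodes, n_cv), node positions at each saved step
    times : array (T,), simulation time of each step (ps)
    """
    mask = (times >= t_start) & (times <= t_end)
    return traj[mask].mean(axis=0)


def lsq_cubic_spline(tau, y, n_interior_knots=4):
    """Least-squares cubic spline fit of y(tau), one spline per column of y.

    Basis: 1, x, x^2, x^3 and (x - k)_+^3 for equally spaced interior
    knots k (a hypothetical choice). Returns a function that evaluates
    the fitted spline at new parameter values.
    """
    knots = np.linspace(tau.min(), tau.max(), n_interior_knots + 2)[1:-1]

    def basis(x):
        x = np.asarray(x, dtype=float)
        cols = [np.ones_like(x), x, x**2, x**3]
        cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
        return np.stack(cols, axis=-1)

    # Fewer basis functions than nodes: the fit smooths instead of interpolating
    coef, *_ = np.linalg.lstsq(basis(tau), y, rcond=None)
    return lambda x: basis(x) @ coef
```

For example, with `nodes = average_nodes(traj, times)` and `tau = np.linspace(0.0, 1.0, nodes.shape[0])`, the call `path = lsq_cubic_spline(tau, nodes)` returns a function such that `path(x)` gives CV-space positions at parameter values `x`. Keeping the number of basis functions well below the number of nodes is what removes the roughness mentioned above.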
This yields W×T $\tilde{\mathbf{M}}$ observations (W, number of walkers; T, number of time steps), roughly equally distributed along the path. To calculate the average metric tensor as a function of the position on the path, $\mathbf{M}(\tau)$, where τ is an initial, arbitrary parameterization, the path is split into 100 bins and each $\tilde{\mathbf{M}}$ is assigned to the closest bin in CV space. $\mathbf{M}(\tau)$ is then obtained by averaging within each bin, and a least-squares cubic spline interpolation is performed for each element of the tensor. The components of the 9×9 metric tensors are provided in Table S1.

Once the path and the metric have been fitted to analytic functions, the path has to be reparameterized so that the value of the s coordinate corresponds to the arc-length of the path under the non-constant distance metric. To do this, the arc-length is obtained as a function of τ:

$$t(\tau') = \int_0^{\tau'} \frac{dt}{d\tau}\, d\tau = \int_0^{\tau'} \left| \frac{d\mathbf{z}(\tau)}{d\tau} \right|_{\mathbf{M}^{-1}(\tau)} d\tau \qquad \text{(S1)}$$

where $\mathbf{z}(\tau)$ is the position of the path in CV space and $|\mathbf{v}|_{\mathbf{M}^{-1}} = \sqrt{\mathbf{v}^{\mathrm{T}}\mathbf{M}^{-1}\mathbf{v}}$.

Finally, since there is a one-to-one mapping between the initial parameterization τ and the arc-length t, it is straightforward to construct the inverse function τ(t) and then to obtain equidistant points along the path, together with the corresponding value of the metric tensor at each of them. As in Refs. 6 and 7, the path is also extrapolated into the reactant and product basins to avoid numerical instabilities; the metric tensor in the extrapolated tails is constant and equal to its value at the corresponding endpoint of the path. The free energy profile corresponding to the converged string is plotted in Figure S2 as a function of τ, the arc-length t and the coordinate s. For a constant distance metric the procedure is the same, except that the calculation of $\mathbf{M}(\tau)$ is omitted and the constant value is used directly.

Table S1.
Components of the 9×9 metric tensors used in the s0, sav and sM coordinates. Rows and columns follow the order of the coordinates: (1) d(C1-H2), (2) d(H2-C3), (3) d(O5-C6), (4) d(C1-C6), (5) d(C4-O5), (6) d(C3-C4), (7) hyb(C1), (8) hyb(C3), (9) hyb(C6).

s0:
         (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
(1)  1.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
(2)  0.00000  1.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
(3)  0.00000  0.00000  1.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
(4)  0.00000  0.00000  0.00000  1.00000  0.00000  0.00000  0.00000  0.00000  0.00000
(5)  0.00000  0.00000  0.00000  0.00000  1.00000  0.00000  0.00000  0.00000  0.00000
(6)  0.00000  0.00000  0.00000  0.00000  0.00000  1.00000  0.00000  0.00000  0.00000
(7)  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  1.00000  0.00000  0.00000
(8)  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  1.00000  0.00000
(9)  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  1.00000

sav:
         (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
(1)  1.07538 -0.03902  0.00000 -0.02161  0.00000  0.00000 -0.07991  0.00000  0.02149
(2) -0.03902  1.07538  0.00000  0.00000  0.00000 -0.00284  0.00000 -0.07263  0.00000
(3)  0.00000  0.00000  0.14576 -0.01770 -0.00755  0.00000  0.02218  0.00000 -0.07892
(4) -0.02161  0.00000 -0.01770  0.16651  0.00000  0.00000  0.01516  0.00000  0.01505
(5)  0.00000  0.00000 -0.00755  0.00000  0.14576 -0.04354  0.00000  0.00289  0.00000
(6)  0.00000 -0.00284  0.00000  0.00000 -0.04354  0.16651  0.00000  0.01472  0.00000
(7) -0.07991  0.00000  0.02218  0.01516  0.00000  0.00000  0.10873  0.00000 -0.04667
(8)  0.00000 -0.07263  0.00000  0.00000  0.00289  0.01472  0.00000  0.34662  0.00000
(9)  0.02149  0.00000 -0.07892  0.01505  0.00000  0.00000 -0.04667  0.00000  0.25172

sM (at TS):
         (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
(1)  1.07538 -0.78394  0.00000 -0.02214  0.00000  0.00000 -0.08237  0.00000  0.02175
(2) -0.78394  1.07538  0.00000  0.00000  0.00000 -0.00848  0.00000 -0.08122  0.00000
(3)  0.00000  0.00000  0.14576 -0.02824 -0.02616  0.00000  0.02209  0.00000 -0.08135
(4) -0.02214  0.00000 -0.02824  0.16651  0.00000  0.00000  0.01708  0.00000  0.02259
(5)  0.00000  0.00000 -0.02616  0.00000  0.14576 -0.03990  0.00000  0.00703  0.00000
(6)  0.00000 -0.00848  0.00000  0.00000 -0.03990  0.16651  0.00000  0.01888  0.00000
(7) -0.08237  0.00000  0.02209  0.01708  0.00000  0.00000  0.10871  0.00000 -0.04729
(8)  0.00000 -0.08122  0.00000  0.00000  0.00703  0.01888  0.00000  0.34244  0.00000
(9)  0.02175  0.00000 -0.08135  0.02259  0.00000  0.00000 -0.04729  0.00000  0.25596

Figure S2. Free energy profile (red line) corresponding to the converged string for the isochorismate reaction as a function of t0 (a), the arc-length L (b) and the coordinate s (c). In panel (c) the PMF (green line) is also shown.

Parameters for Umbrella Sampling Simulations

One of the results of the string method calculation is the free energy profile along the converged path (see Figure S2). Although it is not a potential of mean force (PMF), this profile is expected to be a good approximation to it (as discussed in Ref. 7 and observed in Figure S2c). Therefore, one can take advantage of this profile to anticipate the shape of the PMF. This information can be used to define the biasing potentials of the umbrella sampling (US) windows so that the underlying free energy is flattened as much as possible. Usually, the biasing potential is harmonic:

$$V_i(s) = \frac{K_i}{2}\left(s - s_i^0\right)^2 \qquad \text{(S3)}$$

The force constants ($K_i$) and reference positions ($s_i^0$) of the biasing potential in window i should be chosen so that the sampling accumulated over the whole set of US windows is as uniform as possible. If the number of windows is sufficiently large, one can assume that the probability density sampled within each window is normally distributed. Therefore, to obtain uniform sampling after summing the histograms of all windows, the corresponding Gaussians should be equally spaced, with a standard deviation equal to the spacing between windows:

$$\mu_i = \frac{i-1}{N-1}\,L; \qquad \sigma_i = \frac{L}{N-1} \qquad \text{(S4)}$$

where N is the number of windows, L is the whole range of the reaction coordinate (RC), measured from its lower end, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the distribution of window i, respectively.
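The uniformity argument behind eq. (S4) can be checked numerically with the short sketch below. It assumes, as eq. (S4) implies, that the reaction coordinate is measured from the lower end of its range, so that μ_1 = 0 and μ_N = L; the function names are hypothetical. The code places N Gaussians according to eq. (S4) and sums their normalized densities: away from the ends of the range the sum is flat up to a very small ripple, because each standard deviation equals the spacing.

```python
import numpy as np


def window_gaussians(L, N):
    """Target means and standard deviations of the N US windows, eq. (S4)."""
    i = np.arange(1, N + 1)
    mu = (i - 1) / (N - 1) * L
    sigma = np.full(N, L / (N - 1))
    return mu, sigma


def summed_density(s, mu, sigma):
    """Sum over windows of normalized Gaussian densities, evaluated at s."""
    s = np.asarray(s, dtype=float)[:, None]
    z = (s - mu) / sigma
    g = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    return g.sum(axis=1)


mu, sigma = window_gaussians(L=2.0, N=21)
interior = np.linspace(0.5, 1.5, 101)
density = summed_density(interior, mu, sigma)
# Each Gaussian integrates to 1 and they are spaced by sigma, so the
# interior density should be close to 1 / spacing (ratio close to 1).
print(density.min() * sigma[0], density.max() * sigma[0])
```

Spacing the windows by more than σ_i makes the ripple in the summed density grow rapidly, which is why eq. (S4) ties the standard deviation to the window spacing.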
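If the underlying free energy can be approximated by the free energy profile A(s), then the force constants ($K_i$) and reference positions ($s_i^0$) of the US windows can be estimated as follows:

$$K_i = \frac{k_B T}{\sigma_i^2} - \frac{d^2 A(\mu_i)}{ds^2}; \qquad s_i^0 = \mu_i + \frac{1}{K_i}\frac{dA(\mu_i)}{ds} \qquad \text{(S5)}$$

These expressions follow from expanding A(s) to second order around $\mu_i$: the biased free energy $A(s) + V_i(s)$ is then harmonic with curvature $K_i + d^2A(\mu_i)/ds^2$, which gives a Gaussian of variance $k_B T/(K_i + d^2A(\mu_i)/ds^2)$, and its minimum lies at $\mu_i$ when $dA(\mu_i)/ds + K_i(\mu_i - s_i^0) = 0$.

Figure S3 shows the expected and observed probability densities obtained during the US simulations performed to obtain the PMF of the isochorismate reaction. It can be concluded that the harmonic biasing potentials with the parameters given by eq. (S5) produce a homogeneous sampling of the whole range of values of the reaction coordinate.

Figure S3. Expected and observed normalized probability density during the US simulations.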
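Equation (S5) can be evaluated with a minimal sketch like the one below, assuming that the free energy profile A(s) from the string is available on a grid and that its derivatives are estimated by finite differences. The function name `us_parameters` and the use of `np.gradient` are illustrative choices, not taken from this work. The sketch also checks that the resulting force constant is positive, which requires the curvature of A at μ_i to be smaller than k_BT/σ_i².

```python
import numpy as np


def us_parameters(s_grid, A_grid, mu, sigma, kT):
    """Force constants K_i and reference positions s_i^0 from eq. (S5).

    s_grid, A_grid : tabulated free energy profile A(s)
    mu, sigma      : target mean and standard deviation of each window (S4)
    kT             : thermal energy, in the same energy units as A
    """
    # First and second derivatives of A by second-order finite differences
    dA = np.gradient(A_grid, s_grid, edge_order=2)
    d2A = np.gradient(dA, s_grid, edge_order=2)
    # Derivatives evaluated at the target window means
    dA_mu = np.interp(mu, s_grid, dA)
    d2A_mu = np.interp(mu, s_grid, d2A)

    K = kT / sigma**2 - d2A_mu
    if np.any(K <= 0.0):
        # Smaller sigma (i.e. more windows) raises kT/sigma^2
        raise ValueError("curvature of A exceeds kT/sigma^2: use more windows")
    s0 = mu + dA_mu / K
    return K, s0
```

For a quadratic profile the finite differences are exact, so the returned $K_i$ and $s_i^0$ make the biased profile $A(s) + V_i(s)$ stationary at $\mu_i$ with curvature $k_BT/\sigma_i^2$, as required by the derivation of eq. (S5).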