Localization of transition state ensembles for reactions of arbitrary complexity

Supplemental Material

Kirill Zinovjev, Iñaki Tuñón*

Departament de Química Física, Universitat de València, 46100 Burjassot (Spain)

*To whom correspondence should be addressed: ignacio.tunon@uv.es

S1. Collective variables

1. d(A,A') – distance

The standard Euclidean distance between two points (atoms) in 3D space:

\[ d(A,A') = \left( \sum_{i \in \{x,y,z\}} (A_i - A'_i)^2 \right)^{1/2} \tag{1} \]

2. hyb(A) – hybridization coordinate

Calculated as the point-plane distance between the central atom A and the plane defined by the three substituents (S_1, S_2, S_3) bonded to A in the sp2 state:

\[ \mathrm{hyb}(A) = \hat{n} \cdot (A - S_1) \tag{2} \]

\[ \hat{n} = \frac{(S_2 - S_1) \times (S_3 - S_1)}{\left| (S_2 - S_1) \times (S_3 - S_1) \right|} \tag{3} \]

n̂ is a unit vector normal to the plane.

3. ESP(A) – electrostatic potential

The electrostatic potential over a given atom (Na+ or Cl-) is given as:

\[ \mathrm{ESP}(A) = \sum_{X_i \neq A, A'} \frac{V_{ESP}(|A - X_i|)}{q_A} \tag{4} \]

Here q_A is the charge of atom A. The summation runs over all atoms in the system except atom A itself and the other atom in the NaCl pair (A'), so the direct electrostatic interaction between Na+ and Cl- is excluded.

Instead of the standard Coulomb potential, the damped shifted potential from Fennell et al. (J. Chem. Phys. 124, 234104 (2006), eq. 18) was used:

\[ V_{ESP}(r) = q_A q_B \left[ \frac{\mathrm{erfc}(\alpha r)}{r} - \frac{\mathrm{erfc}(\alpha R_c)}{R_c} + \left( \frac{\mathrm{erfc}(\alpha R_c)}{R_c^2} + \frac{2\alpha}{\pi^{1/2}} \frac{\exp(-\alpha^2 R_c^2)}{R_c} \right) (r - R_c) \right], \quad r \le R_c \tag{5} \]

α was set to 0.2 and R_c to 12 Å. The value of V_ESP(r) for r > R_c is 0.

4. ESF(A) – projected electrostatic force

This coordinate takes the electrostatic force acting on a given atom (Na+ or Cl-) and measures the component that lies along the Na-Cl vector:

\[ \mathrm{ESF}(A) = \sum_{X_i \neq A, A'} \frac{F_{ESP}(|A - X_i|)}{q_A} \, \frac{(A - X_i) \cdot (A - A')}{|A - X_i| \, |A - A'|} \tag{6} \]

where

\[ F_{ESP}(r) = -\frac{d V_{ESP}(r)}{dr} \tag{7} \]

V_ESP is the same as for the ESP coordinate.

5. WatBridges(A,A') – number of water bridges

The number of water bridges formed between the Na+ and Cl- atoms is obtained as follows:

\[ \mathrm{WatBridges}(A,A') = \sum_{i=1}^{n_{wat}} c_A(|A - O_i|) \, c_{A'}(|A' - O_i|) \tag{8} \]

Here the summation runs over the oxygen atoms of all the water molecules in the system.
In practice we used the same cutoff as for the ESP and ESF coordinates and ignored water molecules that are more than 12 Å away. The c() function is a connectivity index (Phys. Rev. Lett. 90, 238302 (2003)):

\[ c_A(r) = \frac{1}{1 + (r / r_0)^n} \tag{9} \]

The connectivity index is approximately 1 when r < r0, decreases rapidly as r approaches r0, and is approximately 0 when r > r0. The value of n was set to 16, and r0 was 2.94 Å and 3.34 Å for Na+ and Cl-, respectively.

6. WatDensity(A,A') – water density between atoms

The water density was calculated in a similar manner as in Mullen et al. (J. Chem. Theory Comput. 10, 659 (2014)):

\[ \mathrm{WatDensity}(A,A') = \sum_{i=1}^{n_{wat}} \frac{1}{(2\pi\sigma^2)^{3/2}} \exp\left( -\frac{\left| (A + A')/2 - O_i \right|^2}{2\sigma^2} \right) \tag{10} \]

The summation is done as for the WatBridges coordinate: over the oxygen atoms of the water molecules within the 12 Å cutoff. The numerator of the exponent is the squared distance between the midpoint of the NaCl atom pair and an oxygen atom; σ was set to 1.85 Å.

7. Shell(A,n) – number of water molecules in the solvation shell

The Shell coordinate counts the number of water molecules that lie within a given distance of a given atom:

\[ \mathrm{Shell}(A,n) = \sum_{i=1}^{n_{wat}} c_{A,n}(|A - O_i|) \tag{11} \]

Here we use the same connectivity index function as in WatBridges (Eq. (9)). Atom A (Na+ or Cl-) and the argument n define the value of r0:

n               1      2      3
r0 (Na+) (Å)    2.94   5.07   7.40
r0 (Cl-) (Å)    3.34   5.80   7.78

The solvation shell radii are taken from Ballard et al. (J. Phys. Chem. B 116, 13490 (2012)). The values used here for r0 are set 0.4 Å smaller than the actual radii so that the connectivity index is close to 0 when a water molecule is at the shell boundary.

S2. Confidence intervals for committor histograms, transmission coefficients and importance analysis

To calculate the confidence intervals we used the bootstrapping technique. It allows the properties of an arbitrary statistic of a given sample to be estimated without any information on the underlying distribution.
The main idea of the method is that a large number of new samples (bootstrap samples) are generated from the given sample. Then, for each bootstrap sample, the statistic of interest is calculated. The distribution of the obtained values provides the properties of interest (mean, standard deviation, etc.).

In the case of the committor distribution and the transmission coefficient the situation is more complex, since the quantities of interest are obtained by two separate sampling procedures. First, a set of structures that represent the transition state ensemble is sampled from molecular dynamics simulations. Second, for each structure a set of trajectories is generated by sampling the initial velocities from the Maxwell-Boltzmann distribution. To properly calculate the confidence intervals, each bootstrap sample should be obtained following the same procedure.

In all 5 cases presented in the text (2 for the IPL and 3 for the NaCl systems) the original sample consisted of 400 structures with 50 trajectories for each structure, and 10000 bootstrap samples were obtained. For each bootstrap sample, 400 structures were first sampled with replacement from the original ones. Then, for each structure, 50 trajectories were sampled with replacement from the trajectories obtained for this particular structure during the committor analysis. The committor histogram and transmission coefficient were then obtained for this bootstrap sample. The results were saved and used to calculate means (μ) and standard deviations (σ). For simplicity, the 95% confidence intervals were calculated based on the quantiles of the normal distribution:

\[ 95\%\,\mathrm{CI} = \mu \pm 1.96\,\sigma \tag{12} \]

In the case of the coefficients in the importance analysis, only one sampling is done (Eq. 26). However, the bootstrapping technique assumes statistical independence of the observations. Since the expression within the brackets of Eq. 26 is calculated from a molecular dynamics trajectory, the observations should be separated by an interval during which the autocorrelation function decays to 0. In all the cases studied in this paper this interval was around 50 fs. Therefore, to calculate the confidence intervals, the observations were taken at 50 fs intervals.
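The two-level resampling described above for the committor histograms can be illustrated with a minimal Python sketch. It is not the code used in this work: the function name `two_level_bootstrap`, the array layout (one row per structure, one 0/1 outcome per trajectory), the number of histogram bins, and the synthetic input data are all assumptions made for illustration. The number of bootstrap samples is reduced from 10000 to 200 only to keep the example fast.

```python
import numpy as np


def two_level_bootstrap(outcomes, n_boot=1000, n_bins=10, seed=0):
    """Bootstrap a committor histogram with two resampling levels.

    outcomes: (n_structures, n_trajectories) array of 0/1 flags
    (hypothetical layout); 1 means the trajectory committed to products.
    Returns the mean histogram and the lower/upper 95% CI bounds.
    """
    rng = np.random.default_rng(seed)
    n_struct, n_traj = outcomes.shape
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    hists = np.empty((n_boot, n_bins))
    for b in range(n_boot):
        # Level 1: resample structures with replacement.
        s_idx = rng.integers(0, n_struct, size=n_struct)
        # Level 2: for each drawn structure, resample its own trajectories.
        t_idx = rng.integers(0, n_traj, size=(n_struct, n_traj))
        resampled = outcomes[s_idx[:, None], t_idx]
        # Committor estimate per structure, then its normalized histogram.
        committor = resampled.mean(axis=1)
        hists[b], _ = np.histogram(committor, bins=edges, density=True)
    mean = hists.mean(axis=0)
    std = hists.std(axis=0, ddof=1)
    # Normal-quantile interval, as in Eq. (12).
    return mean, mean - 1.96 * std, mean + 1.96 * std


# Synthetic stand-in data (assumption): 400 structures, 50 trajectories each.
rng = np.random.default_rng(1)
p_true = rng.beta(5, 5, size=400)
outcomes = (rng.random((400, 50)) < p_true[:, None]).astype(float)
mean, lo, hi = two_level_bootstrap(outcomes, n_boot=200)
```

Under the same assumptions, a transmission coefficient could be bootstrapped by replacing the histogram step with the corresponding estimator evaluated on the resampled trajectories.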