Supplementary Material for: A Supervised Fitting Approach to Force Field Parametrization with Application to the SIBFA Polarizable Force Field

Mike Devereux, Nohad Gresh, Jean-Philip Piquemal and Markus Meuwly

1.1 Fitting Protocol using I-NoLLS

During I-NoLLS fitting, the general strategy after each evaluation of the Jacobian matrix J was first to examine the results of its singular value decomposition (SVD). Often, a large reduction in the total error was predicted using only the first few singular directions. The distance in parameter space added by each additional singular direction was weighed against the predicted reduction in the total error it would provide, with the aim of finding the largest possible improvement for the smallest possible parameter change. Early in the fitting process, a large Levenberg-Marquardt constant was also used to reduce the step size away from the current parameter values, and the constraints given in Eq. 13 of the main text ensured that values did not drift too far from their initial SBK or hand-fitted values. After the singular directions and the Levenberg-Marquardt constant had been selected, a simple reduction factor could also be applied to scale the chosen parameter step down before it was submitted for testing. The chosen step was then trialed: if it reduced the total error, it was accepted and the next evaluation of J began. If it failed or increased the total error, the selection of singular components, the Levenberg-Marquardt constant and the reduction factor were made more conservative, and new steps were tested until the total error decreased. Without these measures, an erroneous step that either greatly increased the total error or caused SIBFA to crash during the energy evaluation was highly likely, which highlights the advantage of a supervised approach over a fully automated procedure for the current application.
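The supervised step selection described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the I-NoLLS implementation: the function names (truncated_lm_step, predicted_error, supervised_iteration), the residual vector r (model minus reference energies) and the fixed schedule of increasingly conservative settings are hypothetical. The schedule stands in for choices the user makes interactively, residual_fn is assumed to raise an exception when an energy evaluation fails, and the constraints of Eq. 13 of the main text are not modelled. The step keeps only the first n_keep singular directions of J and replaces each inverse singular value 1/s by the damped value s/(s² + λ).

```python
import numpy as np


def truncated_lm_step(J, r, n_keep, lm_lambda, reduction=1.0):
    """Step dp that reduces ||r + J dp||^2 using only the first n_keep
    singular directions of J, damped by a Levenberg-Marquardt constant
    and scaled down by a reduction factor."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    U, s, Vt = U[:, :n_keep], s[:n_keep], Vt[:n_keep]
    # Damped inverse singular values: s / (s^2 + lambda) instead of 1 / s.
    coeffs = (s / (s**2 + lm_lambda)) * (U.T @ r)
    return -reduction * (Vt.T @ coeffs)


def predicted_error(J, r, step):
    """Total squared error predicted by the linearised model r + J dp."""
    return float(np.sum((r + J @ step) ** 2))


def supervised_iteration(params, residual_fn, jacobian_fn, schedule):
    """One evaluation of J followed by trial steps.

    `schedule` (hypothetical) lists (n_keep, lm_lambda, reduction) settings
    from boldest to most conservative, standing in for a user's choices.
    Parameter constraints (Eq. 13 of the main text) are not modelled here.
    Returns (new_params, accepted).
    """
    r = residual_fn(params)
    J = jacobian_fn(params)
    current = float(r @ r)
    for n_keep, lm_lambda, reduction in schedule:
        trial = params + truncated_lm_step(J, r, n_keep, lm_lambda, reduction)
        try:
            r_trial = residual_fn(trial)
        except (ValueError, FloatingPointError):
            continue  # assumed: a failed energy evaluation raises; reject the step
        if float(r_trial @ r_trial) < current:
            return trial, True  # accepted: the caller re-evaluates J next
    return params, False  # no setting reduced the total error


# Toy problem: a straight-line fit stands in for SIBFA vs reference energies.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.9])


def residual(p):
    return A @ p - b


def jacobian(p):
    return A


# Step length vs predicted error as singular directions are added.
r0 = residual(np.zeros(2))
for k in (1, 2):
    step = truncated_lm_step(A, r0, k, lm_lambda=0.0)
    print(k, np.linalg.norm(step), predicted_error(A, r0, step))

# Hypothetical schedule: fewer directions, more damping, smaller steps.
schedule = [(2, 1e-3, 1.0), (1, 1.0, 0.5), (1, 10.0, 0.25)]
p = np.zeros(2)
for _ in range(10):
    p, accepted = supervised_iteration(p, residual, jacobian, schedule)
    if not accepted:
        break
print(p, float(residual(p) @ residual(p)))
```

Printing the step length next to the predicted error for one and two singular directions mimics the trade-off examined after each evaluation of J. The final loop accepts a step only if it lowers the total error and otherwise falls back to a more conservative setting, stopping when none of them helps.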
Rejecting steps that increased the total error also saved considerable time by reducing the total number of evaluations of J required. Towards the end of the process the fit typically becomes more linear, the predicted improvement from including all singular components after SVD becomes increasingly small, and little user supervision is required. The fit is considered converged when no further significant reduction in the total error is possible, as judged from the variance (σ²) reported by I-NoLLS.

2.1 Energies of the (H₂O)ₙ and [Mg(H₂O)ₙ]²⁺ validation complexes

Fig. S1. SIBFA vs. RVS energies for the series of (H₂O)ₙ (top) and [Mg(H₂O)ₙ]²⁺ (bottom) complexes used in the validation set. SIBFA energies were computed with parameter sets P1 (black circles), P2 (green diamonds), P3 (red squares) and P4 (blue triangles). All energy components are included (electrostatic, repulsion, polarization, charge transfer and total binding energy). The black dashed line at y = x is a guide to the eye.

2.2 Parameters adjusted during I-NoLLS fitting

The table below lists the parameters adjusted by I-NoLLS during fitting, together with their final values for models P1, P2, P3 and P4. Parameter names are those used in the SIBFA code, which is available from the authors upon request. Colored values relate specifically to H₂O (green), formamide (red) or imidazole (blue). The parameters are defined below for each energy contribution.

The effective radii (six-letter codes) have a 'w' in the second position; a first letter 'r', 's' or 'p', or 't' denotes a radius used for Erep, Epol or Ect, respectively. The effective radii used for the penetration term of EMTP* have 'pp' as their first two letters. The role of these radii is described in Ref. S1 for Erep, Epol and Ect, and in Refs. S2 and S3 for EMTP*.

EMTP*. Cnumpe, dnumpe, cdipnu and paramc are the parameters used for the penetration term Epen (see, e.g., Ref. S2).
The first two correspond to the parameters γ and δ of equation 3, and cdipnu corresponds to χ of equations 7 and 8.

Erep. Cofrea and cofreb are the multiplicative constants of the S²/R and S²/R² components of Erep, respectively, and alfrea and alfreb are the corresponding Gaussian exponents. Details of the Erep expressions are given in Refs. S3 and S4.

Epol. Rampol and vampol are the multiplicative factor and the exponent of the Gaussian function used to screen the electrostatic field exerted on a given ligand. They correspond to parameters E and F of equation 13 of Ref. S1. They are ligand-specific and are therefore listed first for water, then for formamide and imidazole.

Ect. Cwhydr is the effective radius of polar H atoms acting as electron acceptors in the charge-transfer contribution. It corresponds to parameter UM* of equation 18 of Ref. S1. Prop and alphf are the multiplicative constant and the exponent of the charge-transfer contribution for complexes that do not involve a metal cation; they appear in equation 16 of Ref. S4. Procat and proro3cat are cation-specific constants: the first is the multiplicative factor of Ect when the cation acts as the electron acceptor, and the second is the 'self-potential' of the cation. They correspond to parameters SM and FM of equations 15 and 17 of Ref. S1.

For both Erep and Ect, the dwlpi values are small increments or decrements of the effective radius of a heavy atom along the direction of its π lone pairs in formamide and imidazole.

Mg(II). The parameters eg, ppen, pw and egg are the Mg(II) effective radii used for Erep, EMTP, Epol and Ect, respectively. Pkm1, pkm8, pkm612 and pkm712 are the multiplicative constants used for the Mg-H, Mg-O, Mg-C and Mg-N pairs, respectively.

References.
S1. Gresh, N. J. Comput. Chem. 1995, 16, 856.
S2. Piquemal, J.-P.; Gresh, N.; Giessner-Prettre, C. J. Phys. Chem. A 2003, 107, 10353.
S3. Piquemal, J.-P.; Chevreau, H.; Gresh, N. J. Chem. Theory Comput. 2007, 3, 824.
S4. Gresh, N. J. Phys. Chem. A 1997, 101, 8680.

Parameter                    P1       P2       P3       P4
EMTP*
  ppoxyg                  1.410    1.410    1.303    1.306
  pphydr                  1.100    1.100    1.179    1.179
  ppcjug                  1.605    1.605    1.499    1.499
  ppnypr                  1.450    1.500    1.439    1.460
  ppocar                  1.440    1.440    1.382    1.381
  ppen                    0.755    0.755    0.702    0.702
  cnumpe                  2.440    2.440    2.437    2.443
  dnumpe                  2.250    2.250    2.208    2.214
  paramc                  1.420    1.420    1.420    1.420
  cdipnu                  2.450    2.450    2.568    2.567
  coefpe                  1.000    1.000    0.932    0.933
Erep
  rwoxyg                  1.448    1.448    1.486    1.476
  rwhydr                  1.240    1.240    1.299    1.339
  rwcjug                  1.550    1.550    1.423    1.422
  rwnpyr                  1.650    1.650    1.730    1.688
  rwocar                  1.480    1.450    1.461    1.488
  eg                      1.265    1.285    1.197    1.368
  pkm1                   11.700    7.500   11.658    7.419
  pkm8                   14.000    9.400   14.571    9.766
  pkm612                 11.600    8.020   11.892    8.199
  pkm712                 12.900    8.250   11.564    7.743
  cofrea (×10^4)          4.400    3.244    4.107    3.043
  cofreb (×10^4)          4.500    3.244    4.185    3.151
  alfrea                  9.440    9.140    9.781    9.484
  alfreb                 14.000   14.000   14.898   14.159
  dwlpi                   0.000    0.000    0.002    0.023
  dwlpi                   0.000    0.000    0.028    0.077
  dwlpi2                  0.000    0.000   -0.072   -0.092
  dwlpi                   0.000    0.000   -0.014   -0.071
Epol
  pwoxyg                  1.448    1.448    1.463    1.465
  swoxyg                  1.500    1.500    1.473    1.476
  pwhydr                  1.300    1.300    1.267    1.269
  swhydr                  1.200    1.200    1.217    1.217
  pwcara                  1.900    1.900    1.899    1.900
  swcara                  1.900    1.900    1.988    1.987
  pwnitr                  1.650    1.650    1.684    1.683
  swnitr                  1.600    1.600    1.526    1.529
  pwocar                  1.480    1.480    1.444    1.445
  swocar                  1.425    1.425    1.337    1.337
  pw                      1.265    1.285    1.331    1.347
  rampol (H₂O)            0.680    0.630    0.689    0.694
  vampol (H₂O)            1.400    1.450    1.631    1.649
  rampol (formamide)      0.680    0.654    0.667    0.667
  vampol (formamide)      1.400    1.420    1.564    1.572
  rampol (imidazole)      1.050    1.050    0.729    0.727
  vampol (imidazole)      1.650    1.650    1.602    1.606
Ect
  twcarb                  1.700    1.700    1.758    1.701
  twnitr                  1.650    1.650    1.622    1.556
  twocar                  1.450    1.450    1.541    1.370
  twoxyg                  1.500    1.500    1.597    1.578
  cwhydr                  1.700    1.700    1.813    1.821
  prop                    0.660    0.660    0.771    0.776
  alphf                   9.500    9.500    9.414    9.572
  proro3cat               2.501    3.751    2.472    3.637
  procat                  0.950    0.591    0.934    0.451
  egg                     2.009    2.509    2.029    2.389
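The constraints of Eq. 13 of the main text were meant to keep parameters close to their starting values, so a quick check of how far the fitted values actually moved can be made directly from the parameter table. The Python sketch below computes fractional changes for a handful of rows. The row selection and the names TABLE, PAIRS and relative_changes are illustrative only, and pairing P3 with P1 and P4 with P2 is an assumption suggested by the closely matching Mg(II) constants (e.g. pkm1), not something this supplement states.

```python
# Values copied from the parameter table: (P1, P2, P3, P4).
# dwlpi rows are left out because their starting values are zero.
TABLE = {
    "rwhydr": (1.240, 1.240, 1.299, 1.339),
    "rwcjug": (1.550, 1.550, 1.423, 1.422),
    "pkm712": (12.900, 8.250, 11.564, 7.743),
    "rampol (imidazole)": (1.050, 1.050, 0.729, 0.727),
    "twocar": (1.450, 1.450, 1.541, 1.370),
    "cwhydr": (1.700, 1.700, 1.813, 1.821),
    "procat": (0.950, 0.591, 0.934, 0.451),
}

# Assumed pairing of starting and fitted models (column indices into TABLE rows).
PAIRS = {"P3 vs P1": (0, 2), "P4 vs P2": (1, 3)}


def relative_changes(table, start, end):
    """Fractional change of each parameter between two model columns."""
    return {name: (vals[end] - vals[start]) / vals[start] for name, vals in table.items()}


for label, (start, end) in PAIRS.items():
    changes = relative_changes(TABLE, start, end)
    largest = max(changes, key=lambda name: abs(changes[name]))
    print(f"{label}: largest relative change is {largest} ({changes[largest]:+.1%})")
```

For the rows selected here, the largest change under both pairings is in rampol for imidazole, a decrease of about 31%; extending TABLE to every row with a nonzero starting value gives the full picture.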