Determination of force field parameters

a) Van der Waals (vdw) parameters

As described in the Methods section, a reasonable distance between OG and CT in the productive docking geometry of the substrate is 1.5 Å. In practice, we enhanced the vdw interaction to enforce this inter-atomic distance. The potential energy of the vdw interaction is calculated in AD4 using Equations (S1)-(S3).

$$V_{vdw} = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] \qquad \text{(S1)}$$

$$\sigma = \frac{1}{2}\left(\sigma_A + \sigma_B\right) \qquad \text{(S2)}$$

$$\varepsilon = \sqrt{\varepsilon_A \varepsilon_B} \qquad \text{(S3)}$$

According to Equation S2, the original σOG (3.2 Å) and σCT (4.0 Å) values should be reduced so that the atomic σ values sum to 3.0 Å, which gives the ideal equilibrium OG-CT distance. However, the individual vdw radii must be chosen carefully to avoid side effects. For instance, if σCT is reduced to 0.5 Å while σOG remains relatively large (2.5 Å), the atoms neighbouring CT exert a large repulsion on the OG atom when the substrate is docked in the ground state (Figure S1). This raises the docking energy of the productive docking conformation and makes it unfavorable in an energy-based docking program.

We also note that σCT should be less than 2.0 Å, which is equal to the vdw radius of the hydrogen atom. Otherwise, the hydroxyl hydrogen atom of the substrate, with its small radius and higher positive charge, docks near OG in preference to CT (Figure S2b). In practice, a σCT value of 1.0 Å and a σOG value of 2.0 Å were applied in our docking process.

In addition, the original non-bonded interaction between CT and OG is relatively weak and cannot produce a productive docking geometry with a reasonable OG-CT distance (1.5-1.7 Å). Consequently, we increased the ε values of CT and OG (to 30 kcal/mol and 70 kcal/mol, respectively) to overcome the high system energy of the TI docking model (Figure S3).
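The effect of the modified parameters can be illustrated with a minimal Python sketch. It assumes that Equation S1 is the standard 12-6 form and that Equations S2 and S3 are the arithmetic-mean rule for σ and the geometric-mean rule for ε. The function names are illustrative only, and any weighting that the docking program applies to the vdw term is not included, so the printed values are raw pair energies and are not expected to reproduce the docking energies quoted in the text.

```python
import math

def combine(sigma_a, sigma_b, eps_a, eps_b):
    """Pair parameters: arithmetic-mean sigma (S2), geometric-mean epsilon (S3)."""
    return 0.5 * (sigma_a + sigma_b), math.sqrt(eps_a * eps_b)

def v_vdw(r, sigma, eps):
    """Unweighted 12-6 pair energy (S1) in kcal/mol; r and sigma in angstrom."""
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (x6 * x6 - x6)

# Modified values quoted in the text: sigma_OG = 2.0, sigma_CT = 1.0 (angstrom),
# epsilon_OG = 70, epsilon_CT = 30 (kcal/mol).
sigma_og_ct, eps_og_ct = combine(2.0, 1.0, 70.0, 30.0)

# For this functional form the energy minimum lies at 2**(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0) * sigma_og_ct

for r in (1.5, 1.7, 2.0):
    print(f"r = {r:.1f} A   V = {v_vdw(r, sigma_og_ct, eps_og_ct):8.2f} kcal/mol")
print(f"pair sigma = {sigma_og_ct:.2f} A, minimum at r = {r_min:.2f} A")
```

Under these assumptions the pair σ is 1.5 Å, and the well depth is set by the geometric mean of the two enlarged ε values, which is why increasing ε steepens the energy rise on either side of the preferred OG-CT distance.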
As a result, the OG-CT distance ranged from 1.5 to 1.7 Å in most docking solutions, and the potential energy rose steeply when the inter-atomic distance deviated further (e.g., by approximately 3.5 kcal/mol when the distance increased from 1.7 Å to 2.0 Å).

b) Parameters for catalytic hydrogen bonds

Another important feature of the productive docking geometry of the substrate is that the three catalytically important hydrogen bonds (H-bonds) between the substrate and the active site of the enzyme must be present (Figure 2). However, these H-bonds, especially the one between the O2 atom of the substrate and the HE atom of the protonated His residue, were usually absent in docking poses obtained with the original AD4 force field settings. This may be due to the inherently high system energy of the catalytic transition state, similar to the situation shown in Figure S3. Accordingly, we strengthened these catalytic H-bond interactions and tested a number of ε values (H-bond well depth) to find the most appropriate parameter. In practice, an ε value of 30 kcal/mol was applied to obtain the productive docking geometry. Under this setting, the docking energy increases by approximately 5.0 kcal/mol when one catalytic H-bond is absent. This penalty is large enough to downgrade or eliminate non-productive docking poses in the docking ranks.

Receptor preparation

The lipase structures in the active open conformation were obtained from the Protein Data Bank (PDB) as suggested by Tyagi and Pleiss (2006), and the esterase structures were built by homology modeling with SWISS-MODEL (Arnold et al., 2006).

First, the ligand and other non-protein molecules were removed from the protein structure. The missing hydrogen atoms were then added, and the atomic partial charges of the protein were calculated using the Dock Prep utility in Chimera (Pettersen et al., 2004).
The catalytic histidine residue was protonated and carried a total charge of +1. The hydrogen of the serine hydroxyl group was removed manually, and the partial charges of the residue were recalculated using the restrained electrostatic potential (RESP) method (Bayly et al., 1993) based on electrostatic potentials calculated with GAUSSIAN 03 (Frisch et al., 2003); the total residue charge was -1. Finally, the receptor files were prepared in pdbqt format with the ADT program (Sanner, 1999). A special atom type with modified vdw parameters (ε = 70 kcal/mol, σ = 2.0 Å) was assigned to the serine OG atom. The atom was also set as a non-hydrogen-bonding type in order to eliminate possible hydrogen bonds to the polar hydrogens of the substrates (Figure S2b).

Ligand preparation

All substrates were prepared in their ester forms to facilitate comparison of the results (Table 1). The 3D structures of the substrates were generated with ChemOffice (CambridgeSoft Corporation). Both the ground state (GS) form and the high-energy tetrahedral intermediate (TI) of each substrate were included. The substrate structures were then minimized and the atomic partial charges were computed using Chimera. The molecular charge of the substrate was neutral in the GS form and -1 in the TI form. The final ligand files were prepared in pdbqt format with ADT.

Each chiral substrate had two enantiomers in the ground state and four stereoisomeric forms in the TI form, owing to an additional asymmetric center in the molecule (Figure S4). As a result, each substrate had 6 different structures to be docked, and the whole substrate library contained 486 molecules in total. These ligands can be classified into GS, (R/S,R)-TI and (R/S,S)-TI forms (Figure S4). In the TI notation, the first R/S is the original substrate enantioform, and the second letter indicates the configuration of the substrate tetrahedral carbon atom (CT).
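The count of six structures per chiral substrate follows from the notation above and can be enumerated in a short Python sketch. The label strings, including the "(R)-GS" style used for the ground state, are illustrative assumptions and not the naming used in the authors' files.

```python
from itertools import product

def stereo_forms():
    """Return illustrative labels for the six structures docked per chiral substrate."""
    # Two ground-state enantiomers.
    ground_states = [f"({e})-GS" for e in "RS"]
    # First letter: substrate enantiomer; second letter: configuration at CT.
    intermediates = [f"({e},{ct})-TI" for e, ct in product("RS", repeat=2)]
    return ground_states + intermediates

forms = stereo_forms()
print(len(forms), forms)
```

The (R,R)-TI and (S,R)-TI entries together make up the (R/S,R)-TI class, and the (R,S)-TI and (S,S)-TI entries make up the (R/S,S)-TI class.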
For all ligands, the vdw well depth (ε) of the CT atom was adjusted to 30 kcal/mol and the collision radius (σ) was set to 1.0 Å. In addition, the ε value of the catalytic hydrogen bonds was increased to 30 kcal/mol as described above.

Docking parameters

A search grid box of appropriate size was set prior to docking (Table S1). The box center was set exactly at the OG atom of the catalytic serine. For each substrate, 100 possible docking conformations in the receptor were obtained with the Lamarckian genetic algorithm (LGA), using the system time as the random seed. The population size was 150 and the maximum number of energy evaluations was 25,000,000. The maximum number of generations was 27,000, the gene mutation rate was 0.02 and the crossover rate was 0.8 for the LGA search. Other search parameters (step size, energy outside the grid and so on) were left at their default settings. Finally, the docking free energy (ΔGdocking) was calculated as the sum of the vdw, hydrogen bond, electrostatic, desolvation and torsion terms (Equation S4). The resulting docking solutions were ranked by docking energy from high to low.

$$\Delta G_{docking} = \Delta G_{vdw} + \Delta G_{hb} + \Delta G_{elec} + \Delta G_{desolv} + \Delta G_{tor} \qquad \text{(S4)}$$

Screening for the productive docking geometries

As described in the Methods section, the productive docking geometry of a substrate must meet two criteria: 1) the distance between the serine OG atom and the substrate CT atom should not exceed 1.7 Å; 2) all catalytic hydrogen bonds (H-bonds) must be formed between enzyme and substrate (Figure 2).

In practice, we used a Perl script to handle the extensive screening work. For each candidate docking pose of a substrate, the coordinates of the CT atom and of the enzyme OG atom were extracted to calculate the OG-CT distance.
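The screening logic can be sketched as follows. This is a Python illustration, not the authors' Perl script. The function names and the input layout (coordinate tuples in Å) are assumptions, the H-bond limits are the distance and angle constraints given in this section (hydrogen-to-acceptor distance of 1.2 to 2.7 Å, donor-hydrogen-acceptor angle of at least 120°), and reading coordinates from the docking output is not shown.

```python
import math

def angle_deg(donor, hydrogen, acceptor):
    """Donor-hydrogen-acceptor angle in degrees, measured at the hydrogen atom."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def has_hbond(donor, hydrogen, acceptor, d_min=1.2, d_max=2.7, min_angle=120.0):
    """Apply the distance and angle constraints to one candidate H-bond."""
    d = math.dist(hydrogen, acceptor)
    return d_min <= d <= d_max and angle_deg(donor, hydrogen, acceptor) >= min_angle

def is_productive(og, ct, hbonds, max_og_ct=1.7):
    """hbonds: iterable of (donor, hydrogen, acceptor) coordinate triples."""
    return math.dist(og, ct) <= max_og_ct and all(has_hbond(*t) for t in hbonds)

# Hypothetical coordinates, for illustration only.
pose_ok = is_productive((0.0, 0.0, 0.0), (1.6, 0.0, 0.0),
                        [((5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (8.0, 0.0, 0.0))])
print(pose_ok)
```

A pose is kept only when both criteria hold, so a single missing catalytic H-bond or an OG-CT distance above 1.7 Å is enough to reject it.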
Meanwhile, the presence of each catalytically important H-bond was checked using a distance constraint (the distance between the hydrogen atom and the H-bond acceptor ranges from 1.2 to 2.7 Å) and an angle constraint (the H-bond donor, hydrogen atom, H-bond acceptor angle is equal to or larger than 120°), as shown in Figure S5.

Statistical analysis

The statistical analysis was performed with SPSS 10.0 software (SPSS Inc.). The prediction error distributions of the GS and TI models were analyzed by nonparametric tests (K-S test and two-independent-samples test). The influence of the substrate torsion level and of the E value of the enzyme/substrate pair on the prediction was evaluated by analysis of variance (ANOVA). The correlation analysis was performed with the Pearson correlation method.

Results and discussions

Ground state versus tetrahedral intermediates

In the search for possible productive poses of the substrate, the GS substrates showed better docking performance than the substrates in TI form. Most of the well-oriented GSs (127/138) had the highest score in the ranks of docking energy. In the TI model, 95% of the correctly docked (R/S,S)-TIs ranked among the top 5 docking solutions, whereas only 43% of the (R/S,R)-TIs were scored within this range and approximately 30% (43/138) of the (R/S,R)-TIs ranked outside the top 50. In addition, different proportions of (R/S,R)-TI and (R/S,S)-TI were observed among the scoring conformations that were used to calculate the modified docking energy difference (ΔΔG′docking). These scoring conformations were mostly (R/S,S)-TI (119/138), which agrees well with previous structural studies of the conformation of substrate analogues in enzymes (Derewenda et al., 1992; Uppenberg et al., 1995).

To represent enzyme enantioselectivity, the ΔΔG′docking values were calculated by Equation (2) and compared with the activation free energy difference (ΔΔG≠).
Among all 69 enzyme/substrate docking pairs (Table 1), the experimental ΔΔG≠ values ranged from -4.02 kcal/mol to 4.09 kcal/mol. The predicted ΔΔG′docking ranged from -7.36 kcal/mol to 2.74 kcal/mol in the GS model (K=1.03), and from -4.8 kcal/mol to 2.8 kcal/mol in the TI model (K=1.22). The overall prediction error (EP) distributions of the GS and TI models are shown in Figure S6. The errors of both models were normally distributed, and no significant difference between them was observed at the 0.05 significance level (two-independent-samples test). However, the TI model showed a slightly smaller error range (-4 to 4.5 kcal/mol) than the GS model (±5 kcal/mol) and was more accurate in prediction. In the TI model, 48% of the predictions had an error lower than 1 kcal/mol, and the average error was 1.2 kcal/mol. In the GS model, 29% of the predictions had an error lower than 1 kcal/mol, and the average error was 2.3 kcal/mol.

It has been widely reported that docking with high-energy substrate intermediates represents enzyme activity and enantioselectivity markedly better than docking in the GS model (Hermann et al., 2006; Tyagi and Pleiss, 2006; Juhl et al., 2009). This might be because the intermediate structure adopts a docking geometry that closely resembles the naturally occurring one.

As shown in Figure S7, the high-energy TI usually had its chiral groups located exactly in the so-called specificity pockets that are responsible for enantiomeric recognition (Orrenius et al., 1998). The docked GS substrate, by comparison, had its alcohol and acid moieties oriented differently. Consequently, its chiral groups might be docked outside the binding pockets and give a different docking energy.
The docking results also revealed that a substrate docked as (R/S,S)-TI usually had a lower docking free energy than the same substrate docked as (R/S,R)-TI, in agreement with previous structural and simulation studies (Uppenberg et al., 1995; Orrenius et al., 1998). As shown in Figure S7, the docked (R/S,S)-TI had both its acyl and alcohol moieties well oriented towards the specificity pockets. By comparison, the acid and alcohol parts of the (R/S,R)-TI adopt the opposite orientation, which could lead to severe steric repulsion from the protein, as observed in other studies (Uppenberg et al., 1995; Orrenius et al., 1998; Ema, 2004).

References:

Arnold K, Bordoli L, Kopp J, Schwede T. 2006. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22:195-201.

Bayly CI, Cieplak P, Cornell WD, Kollman PA. 1993. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J Phys Chem 97:10269-10280.

Derewenda U, Brzozowski AM, Lawson DM, Derewenda ZS. 1992. Catalysis at the interface: the anatomy of a conformational change in a triglyceride lipase. Biochem 31:1532-1541.

Ema T. 2004. Mechanism of enantioselectivity of lipases and other synthetically useful hydrolases. Curr Org Chem 8:1009-1025.

Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE. 2003. Gaussian 03, Revision B.04, Gaussian, Inc., Pittsburgh PA, Revision C.02 was used for the O3LYP, TPSS and B1B95 calculations.

Orrenius C, Hæffner F, Rotticci D, Öhrner N, Norin T, Hult K. 1998. Chiral recognition of alcohol enantiomers in acyl transfer reactions catalysed by Candida antarctica lipase B. Biocatal Biotransfor 16:1-15.

Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. 2004. UCSF Chimera - A visualization system for exploratory research and analysis.
J Comput Chem 25:1605-1612.

Sanner MF. 1999. Python: A programming language for software integration and development. J Mol Graphics Mod 17:57-61.

Tyagi S, Pleiss J. 2006. Biochemical profiling in silico - Predicting substrate specificities of large enzyme families. J Biotech 124:108-116.

Uppenberg J, Öhrner N, Norin M, Hult K, Kleywegt GJ, Patkar S, Waagen V, Anthonsen T, Jones TA. 1995. Crystallographic and molecular-modeling studies of lipase B from Candida antarctica reveal a stereospecificity pocket for secondary alcohols. Biochem 34:16838-16851.

Table:

Table S1. Grid size used for the substrate docking conformation search for the 7 enzymes

Enzyme   Grid points (X, Y, Z)   Grid spacing (Å)
BTL      50, 60, 50              0.375
CALB     50, 60, 50              0.375
CRL      50, 60, 60              0.375
HLL      60, 60, 50              0.375
RML      60, 60, 60              0.375
PFE      50, 60, 50              0.375
PPE      50, 60, 50              0.375

Figure legends:

Figure S1. The atoms neighbouring the carbonyl carbon (CT) exert a large repulsion on the OG atom if OG retains a large vdw radius.

Figure S2. Schematic representation of incorrect TI docking geometries in the ELE. a) The long distance between the serine OG atom and the substrate CT atom disrupts one catalytic hydrogen bond; b) the serine OG atom can form a hydrogen bond to a polar hydrogen of the substrate.

Figure S3. Increasing the vdw interaction between OG and CT makes the substrate dock productively. In this figure, "A" (green arrow) is the vdw interaction between OG and CT, and "B" (red arrow) indicates the steric repulsion between the protein and the substrate molecule. In the productive docking geometry, the substrate molecule is closer to the protein than in the non-productive one and hence suffers higher repulsion (B). This docking pose therefore has a high energy and is excluded by the docking process, which prefers low-energy poses.
Increasing the attractive vdw interaction (A) reduces the system energy and gives the productive docking geometry a higher rank among the docking solutions.

Figure S4. Schematic presentation of the substrate GS, (R/S,R)-TI and (R/S,S)-TI forms used in docking. R* indicates the chiral group in the substrate.

Figure S5. Distance and angle constraints used to confirm H-bonds (modified from the Deep View software manual, point 101).

Figure S6. Prediction error (EP) distribution in the docking-mediated prediction with the substrate in ground state (GS) and tetrahedral intermediate (TI) form. The errors were calculated with Equation (4). The y axis shows the number of enzyme/substrate pairs within a given error range (x axis). Values of zero correspond to the complex pairs that are accurately predicted.

Figure S7. Productive docking geometry of the substrate in GS, (R/S,S)-TI and (R/S,R)-TI form. The catalytic residues (Ser105 and His224) and the oxyanion hole (Thr40 and Gln106) of CALB are shown to illustrate the substrate orientation. In the enzyme, the GS substrate docks in two distinct conformations (GS1 and GS2); GS1 has a lower docking energy than GS2 and dominates the scoring conformations. In the TI notation, the first R/S is the original substrate enantioform, and the second letter indicates the configuration of the substrate tetrahedral carbon atom (CT).

[Figures S1 to S7]