Supplemental Results

Validation of BSO-PG and BSO-PGL Procedures on Native Structures with Randomized Sidechain Orientations

| | Steric-only libraries, fixed receptor | Steric-only libraries, randomized receptor | CGEN libraries, fixed receptor | CGEN libraries, randomized receptor |
|---|---|---|---|---|
| Average RMSD (Å) | 1.09 | 1.25 | 2.70 | 2.62 |
| Fraction <1.5Å RMSD | 0.82 | 0.71 | 0.42 | 0.40 |
| Fraction <2.0Å RMSD | 0.85 | 0.74 | 0.48 | 0.46 |
| Fraction <3.0Å RMSD | 0.92 | 0.91 | 0.58 | 0.61 |

Table S1 – Testing of BSO-PG methodology. The functional groups of ligands and the set of all protein sidechains within 5Å of the ligand were optimized by sidechain packing. In the fixed receptor case the ligand functional groups were assigned a random rotamer before the optimization. In the randomized receptor case, all protein sidechains were also assigned a random rotamer before the optimization. Heavy-atom RMSD of ligand functional groups to their native position is reported.

To validate the BSO-PG procedure, the algorithm was tested for its ability to correctly predict the ligand functional group positions given either a fixed protein receptor or a receptor with randomized sidechain orientations (table S1). The tests used the same test set described in the methods. Briefly, this is a set of 86 proteins for which either apo or cross-docking crystal structures are available in addition to a holo structure. In this case, however, only the holo structure was used. For each structure two predictions were made. In both predictions the functional groups of the ligand were assigned to random rotamer states. In the first prediction, a fixed protein receptor was used. So long as the rotamer library contains the ligand conformation of interest and the ligand contains no solvent-exposed regions, this should be successful nearly 100% of the time, as the receptor will rarely have more than a single cavity large enough for each functional group without rearranging sidechains.
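The quantities reported in table S1 (average heavy-atom RMSD and the fraction of cases under each RMSD cutoff) can be illustrated with a minimal sketch. The function names `heavy_atom_rmsd` and `summarize` and the toy coordinates are assumptions made for illustration and are not taken from the original work. The sketch also assumes that predicted and native structures share the receptor coordinate frame, so no superposition is performed.

```python
import math

def heavy_atom_rmsd(predicted, native):
    """RMSD between two equal-length lists of (x, y, z) heavy-atom coordinates.

    Assumption: both structures are in the same (receptor) frame, so the
    atoms are compared in place without any superposition.
    """
    if len(predicted) != len(native) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for p, n in zip(predicted, native):
        total += sum((a - b) ** 2 for a, b in zip(p, n))
    return math.sqrt(total / len(predicted))

def summarize(rmsds, thresholds=(1.5, 2.0, 3.0)):
    """Mean RMSD and the fraction of cases strictly below each threshold (in Å)."""
    mean = sum(rmsds) / len(rmsds)
    fractions = {t: sum(r < t for r in rmsds) / len(rmsds) for t in thresholds}
    return mean, fractions

# Toy, made-up coordinates (not from the test set): every atom shifted 1 Å in x.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in native]
print(heavy_atom_rmsd(predicted, native))  # 1.0
```

Calling `summarize` on the per-structure RMSDs of a test set would give one column of a table laid out like table S1.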
Using the steric-only libraries, this optimization with a fixed receptor produced an average heavy-atom RMSD of 1.09Å for the ligand functional groups, with 82% of cases within 1.5Å and 85% within 2.0Å RMSD (table S1). Visual inspection of the remaining cases showed that the larger RMSDs were primarily caused by ligand rearrangement in solvent-accessible areas. When all of the protein sidechains within 5.0Å were randomized and then re-optimized, the results worsened slightly to 1.25Å average RMSD, with 71% of cases predicted to within 1.5Å and 74% to within 2.0Å RMSD.

For comparison, libraries were also built with an established sampling method designed for high-throughput screening with docking-style energy functions. Each ligand was sampled with the CGEN sampling algorithm, which is used for predicting ligand flexibility in the GLIDE docking algorithm (Friesner, Banks et al. 2004). This sampling was then converted into a rotamer library for that ligand using the same core definition as described in the steric-based approach. One hundred thousand conformers were generated and up to 10,000 for each functional group were placed in the rotamer library. As the steric-only rotamer libraries could contain millions of possible rotamers before steric screening against the protein backbone conformation, the CGEN libraries are substantially smaller, but should contain only rotamers more likely to be found in energetically probable structures. This produced average heavy-atom RMSDs in the ligand functional groups of 2.70Å and 2.62Å with a fixed and a randomized protein, respectively (table S1). Visual inspection revealed that many of the ligands were in energetically unfavorable conformations because the native ligand conformation was not included in the rotamer library. It is also notable that results actually improved when the receptor binding site was randomized and allowed to re-optimize.
This could be because, while the CGEN rotamer library did not contain the native conformation, it contained a near-native conformation that could be accommodated in the receptor structure if some of the surrounding sidechains were optimized for it. The CGEN approach was designed for use with lower-resolution docking-style energy functions and is therefore probably not ideal for this purpose. The CGEN approach does speed up the optimization of the ligand functional groups substantially, as there are an order of magnitude fewer rotamers to evaluate. However, as the bulk of the time for optimizing the protein sidechains around a flexible ligand is taken up by the protein sidechains themselves, this gain is not significant as part of the overall algorithm.

The BSO-PGL algorithm was tested in a similar way to BSO-PG using the same test set. In this case the ligand core position was randomized in addition to the ligand functional group orientations. The ligand core was placed in an arbitrary position within 1.4Å of translation and 25° of rigid-body rotation of the native position for each of the 86 test cases. For each test case the BSO-PGL algorithm was used to attempt to recreate the native structure with either a fixed protein receptor or a protein receptor in which all of the sidechains within 5Å were assigned to random rotamer states. A search area of 1.5Å and 30° of rigid-body movement was set for the ligand, and 10 clusters were scored for each ligand-receptor pair. Surprisingly, with the steric-only rotamer libraries the average RMSD was lower with a randomized receptor than with a fixed receptor (table S2). This could be due to the same reasons that the results improved in the BSO-PG case with CGEN rotamer libraries. Since only 10 positions are scored with the full energy function with full ligand and sidechain flexibility, there is probably a degree of undersampling.
A fixed receptor does not allow a near-native ligand core position to score well, as the receptor is not allowed to adapt to it, whereas a flexible receptor is able to adapt and produce an energetically favorable conformation. With the steric-only rotamer libraries, the average RMSD was 2.02Å with a fixed receptor and 1.93Å with a randomized receptor, with 65% and 69% of the test set, respectively, predicted to within 2.0Å heavy-atom RMSD (table S2). Again, results were worse with the CGEN rotamer libraries than with the steric-only rotamer libraries, for the same reasons as in the BSO-PG case.

| | Steric-only libraries, fixed receptor | Steric-only libraries, randomized receptor | CGEN libraries, fixed receptor | CGEN libraries, randomized receptor |
|---|---|---|---|---|
| Average RMSD (Å) | 2.02 | 1.93 | 2.58 | 2.68 |
| Fraction <1.5Å RMSD | 0.44 | 0.53 | 0.39 | 0.39 |
| Fraction <2.0Å RMSD | 0.65 | 0.69 | 0.52 | 0.52 |
| Fraction <3.0Å RMSD | 0.81 | 0.78 | 0.65 | 0.65 |

Table S2 – Testing of BSO-PGL methodology. The positions and conformations of ligands and the protein sidechains within 5Å of the ligand were optimized using the BSO-PGL hierarchical prediction method. In the fixed receptor case the ligand functional groups were assigned a random rotamer before the optimization and the ligand core was placed at a random position within 1.4Å and 25° of the native conformation. In the randomized receptor case, all protein sidechains were also assigned a random rotamer before the optimization. Heavy-atom RMSD of ligands to their native position is reported.

Further testing was conducted on the size of the region that could be effectively sampled by the BSO-PGL algorithm, again using this set of simplified test systems. Just as in hierarchical loop prediction, there should be a relationship between the size of the sample space and the number of clusters required. For a constant number of clusters, 10 in this case, the sampling space was expanded from 0.5Å to 2.5Å in displacement and from 10° to 50° in rotation.
The ligand was placed at a random position within a space slightly smaller (by 0.1Å and 5°) than the search space, with all the protein sidechains and ligand functional groups assigned to random conformations. The heavy-atom RMSD of the highest-scoring cluster to the native holo structure was then calculated (table S3). As the search space expanded, the scoring became less accurate. Since a constant number of clusters is scored, the conformations within each cluster become more varied as the search space increases. Within each cluster, the RMSD of the central core region to the cluster representative was calculated before refinement. The maximum RMSD is a measure of the variation within the cluster that the scoring method must tolerate to produce accurate results. The average value over all the clusters generated is presented in table S3. When limited to 10 clusters, a search space of 2.0Å and 40°, corresponding to a within-cluster RMSD of 0.56Å, appears to be the largest capable of generating fairly accurate results, as accuracy begins to decrease more sharply at this stage. Any larger search space would most likely require more clusters to be scored to maintain the resolution required for accurate results. A search space of 1.5Å and 30° was used in this paper so as to be more conservative and to reduce the effect of the number of clusters scored on the final results. The RMSD variation within clusters for the BSO-PGL algorithm presented in table 5 was also calculated, and approximately equal variation (0.44Å) was found there as for the equivalent search space (1.5Å, 30°) in table S3.
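The within-cluster variation measure discussed above (the maximum core RMSD of any member to its cluster representative, averaged over clusters) can be sketched as follows. This is only an illustration under stated assumptions: the clustering itself is taken as given, the names `max_rmsd_to_representative` and `mean_cluster_spread` are hypothetical, and the coordinates are made up rather than drawn from the test set.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates, no superposition."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))

def max_rmsd_to_representative(representative, members):
    """Largest RMSD of any cluster member's core atoms to the representative."""
    return max(rmsd(m, representative) for m in members)

def mean_cluster_spread(clusters):
    """Average, over clusters, of the maximum member-to-representative RMSD.

    `clusters` is a list of (representative, members) pairs (assumed layout).
    """
    spreads = [max_rmsd_to_representative(rep, mem) for rep, mem in clusters]
    return sum(spreads) / len(spreads)

# Toy, made-up core coordinates (not data from the paper).
def shifted(coords, dx):
    return [(x + dx, y, z) for x, y, z in coords]

rep = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
clusters = [
    (rep, [shifted(rep, 0.3), shifted(rep, 0.4)]),  # spread of about 0.4 Å
    (rep, [shifted(rep, 0.1), shifted(rep, 0.2)]),  # spread of about 0.2 Å
]
print(round(mean_cluster_spread(clusters), 2))  # 0.3
```

With a fixed number of clusters, a larger search space puts more diverse poses into each cluster, so this value would be expected to grow, which is the trend the text describes.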

| Steric-only rotamer libraries: sampling range for ligand | 0.5Å, 10° | 1.0Å, 20° | 1.5Å, 30° | 2.0Å, 40° | 2.5Å, 50° |
|---|---|---|---|---|---|
| Maximum RMSD to cluster representative within each group (Å) | 0.14 | 0.23 | 0.42 | 0.56 | 0.70 |
| Average RMSD (Å) | 1.58 | 1.80 | 1.93 | 2.12 | 2.35 |
| Fraction <1.5Å RMSD | 0.76 | 0.53 | 0.53 | 0.50 | 0.43 |
| Fraction <2.0Å RMSD | 0.86 | 0.71 | 0.69 | 0.65 | 0.57 |
| Fraction <3.0Å RMSD | 0.91 | 0.83 | 0.78 | 0.76 | 0.72 |

Table S3 – Success of the BSO-PGL algorithm for sampling regions of different sizes. The positions and conformations of ligands and the protein sidechains within 5Å of the ligand were optimized using the BSO-PGL hierarchical prediction method. The ligand functional groups and all protein sidechains were assigned a random rotamer before the optimization. Heavy-atom RMSD of ligands to their native position is reported. The RMSD within each cluster before refinement is also reported.
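The random starting placement of the ligand core within a translation and rotation limit (for example 1.4Å and 25° for the 1.5Å, 30° search space) could be generated along the following lines. The source does not state how the random poses were drawn, so the choices here are assumptions: the shift is uniform within a sphere, the rotation uses a random axis with a uniformly drawn angle, and the rotation is applied about the ligand centroid. The function name `perturb` and the toy coordinates are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Uniformly distributed direction, from a normalized Gaussian triple."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def rotate(point, axis, angle):
    """Rodrigues' formula: rotate `point` about unit `axis` by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    kx, ky, kz = axis
    x, y, z = point
    dot = kx * x + ky * y + kz * z
    cross = (ky * z - kz * y, kz * x - kx * z, kx * y - ky * x)
    return tuple(p * c + cr * s + k * dot * (1.0 - c)
                 for p, cr, k in zip(point, cross, axis))

def perturb(coords, max_shift, max_angle_deg, rng):
    """Rigid-body move: rotate about the centroid, then translate.

    Assumed sampling scheme (not from the source): shift uniform in a sphere
    of radius `max_shift`, rotation angle uniform in [0, max_angle_deg].
    """
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    axis = random_unit_vector(rng)
    angle = math.radians(rng.uniform(0.0, max_angle_deg))
    direction = random_unit_vector(rng)
    # Cube root makes the displacement uniform over the sphere's volume.
    distance = max_shift * rng.random() ** (1.0 / 3.0)
    shift = [d * distance for d in direction]
    moved = []
    for p in coords:
        local = tuple(p[i] - centroid[i] for i in range(3))
        r = rotate(local, axis, angle)
        moved.append(tuple(r[i] + centroid[i] + shift[i] for i in range(3)))
    return moved

# Toy, made-up ligand core coordinates in Å.
core = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]
start = perturb(core, max_shift=1.4, max_angle_deg=25.0, rng=random.Random(0))
```

Because the rotation is taken about the centroid, the centroid displacement equals the sampled shift and so stays within `max_shift`, while all interatomic distances in the core are preserved.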