SUPPLEMENTARY INFORMATION

Title: Plate-Based Diversity Subset Screening: An Efficient Paradigm for High Throughput Screening of a Large Screening File
Journal: Molecular Diversity
Authors: Andrew S. Bell, Joseph Bradley, Jeremy R. Everett, Michelle Knight, Jens Loesel, John Mathias, David McLoughlin, James Mills, Robert E. Sharp, Christine Williams, Terence P. Wood
Corresponding Authors: Jeremy R. Everett (j.r.everett@greenwich.ac.uk); Jens Loesel (Jens.Loesel@ecotoxchem.co.uk)

BCUT cell occupancy | Total number of BCUT cells | Total number of compounds | Number of cells covered by PBDS | Number of cells double-covered by PBDS | % of cells covered by PBDS | % of cells double-covered by PBDS
1 | 6,968 | 6,968 | 1,301 | N/A | 19% | N/A
2 | 4,379 | 8,757 | 1,508 | 163 | 34% | 4%
3 | 2,482 | 7,447 | 1,149 | 263 | 46% | 11%
4 | 1,994 | 7,975 | 1,105 | 377 | 55% | 19%
5 | 1,514 | 7,570 | 960 | 413 | 63% | 27%
6 | 1,185 | 7,110 | 828 | 407 | 70% | 34%
7 | 948 | 6,636 | 697 | 407 | 74% | 43%
8 | 882 | 7,056 | 698 | 387 | 79% | 44%
9 | 740 | 6,660 | 609 | 390 | 82% | 53%
Totals | 21,092 | 66,180 | | | |

Supplementary Table 1: the profile of low-occupancy BCUT cells (1 to 9 compounds per cell) in the Pfizer screening file at the time PBDS was constructed, and the coverage of those cells by PBDS. N/A = not applicable, as single-occupancy cells cannot be double-covered. Although the PBDS design algorithm did not target these low-occupancy cells, their coverage by PBDS is enriched relative to random selection.
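The two percentage columns of Supplementary Table 1 are ratios of the count columns: % covered = cells covered by PBDS / total cells at that occupancy, and % double-covered = cells double-covered by PBDS / total cells. A minimal Python sketch that recomputes them from the printed counts is shown below; the names `ROWS` and `coverage_table` are illustrative only, and the code reproduces the table's arithmetic, not the PBDS design algorithm.

```python
from typing import List, Optional, Tuple

# Counts transcribed from Supplementary Table 1:
# (BCUT cell occupancy, total cells, cells covered by PBDS, cells double-covered by PBDS).
# Double coverage is undefined for single-occupancy cells, so it is None there.
ROWS: List[Tuple[int, int, int, Optional[int]]] = [
    (1, 6968, 1301, None),
    (2, 4379, 1508, 163),
    (3, 2482, 1149, 263),
    (4, 1994, 1105, 377),
    (5, 1514, 960, 413),
    (6, 1185, 828, 407),
    (7, 948, 697, 407),
    (8, 882, 698, 387),
    (9, 740, 609, 390),
]


def coverage_table(
    rows: List[Tuple[int, int, int, Optional[int]]]
) -> List[Tuple[int, float, Optional[float]]]:
    """Return (occupancy, % of cells covered, % of cells double-covered or None) per row."""
    result = []
    for occupancy, cells, covered, double in rows:
        pct_covered = 100.0 * covered / cells
        pct_double = None if double is None else 100.0 * double / cells
        result.append((occupancy, pct_covered, pct_double))
    return result


if __name__ == "__main__":
    for occupancy, pct_covered, pct_double in coverage_table(ROWS):
        double_txt = "N/A" if pct_double is None else f"{pct_double:.0f}%"
        print(f"occupancy {occupancy}: {pct_covered:.0f}% covered, {double_txt} double-covered")
```

Rounded to whole percentages, the recomputed values agree with both % columns of the table.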
Plate-Based Diversity Screening (PBDS) Supplementary Information C 9-Feb-16 Page 1 of 6

HTS campaign | Method of series definition | Number of series identified from primary HTS data | Mean series size | Minimum series size | Maximum series size | Median series size | Number of series contained in PBDS | Relative recall % (PBDS/HTS)
Beta2 | BCI Cluster | 468 | 9.0 | 5 | 57 | 7 | 273 | 58%
Beta2 | Murcko | 123 | 13.2 | 5 | 193 | 8 | 71 | 58%
Beta2 | BCUT | 213 | 10.1 | 5 | 51 | 7 | 128 | 60%
D3 | BCI Cluster | 499 | 9.4 | 5 | 35 | 8 | 291 | 58%
D3 | Murcko | 162 | 11.2 | 5 | 126 | 7 | 83 | 51%
D3 | BCUT | 240 | 10.2 | 5 | 63 | 8 | 133 | 55%
MC3 | BCI Cluster | 444 | 9.8 | 5 | 40 | 8 | 253 | 57%
MC3 | Murcko | 86 | 10.7 | 5 | 74 | 8 | 38 | 44%
MC3 | BCUT | 211 | 9.7 | 5 | 47 | 7 | 117 | 56%
HIV RT | BCI Cluster | 455 | 8.6 | 5 | 41 | 7 | 181 | 40%
HIV RT | Murcko | 124 | 10.8 | 5 | 96 | 7 | 52 | 42%
HIV RT | BCUT | 209 | 10.8 | 5 | 205 | 7 | 89 | 43%
V1A | BCI Cluster | 1426 | 10.4 | 5 | 56 | 9 | 614 | 43%
V1A | Murcko | 484 | 11.9 | 5 | 247 | 7 | 201 | 42%

Supplementary Table 2: the distribution of the number of primary hit series found by full-file singleton HTS in five representative campaigns and the properties of those series, together with the number and % of those series that would have been found had the Plate-Based Diversity Subset (PBDS) been screened instead of the full Pfizer screening file. Three different methods of series definition were used: BCI fingerprints, Murcko scaffolds and BCUTs. A minimum series size of 5 compounds was defined in advance. The five targets were: the human beta-2 adrenergic receptor (Beta2), the human D3 dopamine receptor (D3), the human melanocortin 3 receptor (MC3), the HIV reverse transcriptase enzyme (HIV RT) and the human V1a vasopressin receptor (V1A).

Supplementary Figure 1: a) the % single-coverage of the target BCUT space plotted against the number of screening plates selected, for each of 17 sequential iterations of the plate selection process.
As the process progresses from the random starting point (magenta) to the final, 17th iteration (light blue), the single-coverage of the target BCUT space improves markedly, such that more BCUT space is covered by fewer plates. b) the corresponding % double-coverage of the target BCUT space achieved in the same optimisation. The process converged by iteration 17.

Supplementary Figure 2: the distribution of the number of hydrogen bond donors per molecule in the Plate-Based Diversity Subset, PBDS.

Supplementary Figure 3: the distribution of the number of hydrogen bond acceptors (sum of nitrogen and oxygen atoms) in each molecule of the Plate-Based Diversity Subset, PBDS.

Supplementary Figure 4: the distribution of molecular weight across the compounds in the Plate-Based Diversity Subset, PBDS.

Supplementary Figure 5: the distribution of the number of rotatable bonds across the compounds in the Plate-Based Diversity Subset, PBDS.

Supplementary Figure 6: the distribution of topological polar surface area (calculated by the Ertl method, in Å²) across the compounds in the Plate-Based Diversity Subset, PBDS.

[Supplementary Figure 7, panels a) and b): % of series retrieved (y-axis, 0 to 100) against number of compounds in iteration (x-axis, 0 to 40,000); legend: BCI cluster, Murcko, BCUT, Hit rate.]

Supplementary Figure 7: a) the additional % of primary hit series retrieved for the PBDS relative to the full-file HTS for the beta-2 adrenergic receptor assay (with three different series definitions), plotted against the number of compounds added in a single iteration.
The compounds in the iteration were added in descending order of their score from a Bayesian model of the probability of beta-2 activity. The Hit rate line (light blue) plots the % of the full-file primary screen hits that are found by screening the PBDS plus the single iteration. b) the corresponding additional % of primary series retrieved, and the hit rate, for the GDRS relative to the full-file HTS for the same assay.
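The retrieval curves described for Supplementary Figure 7 can be illustrated with a small sketch: compounds outside the PBDS are ranked by descending model score, the top n are added to the PBDS, and the % of full-file series and primary hits recovered is recomputed for each n. Everything here is an assumption for illustration: the function names, the toy data, and in particular the rule that a series counts as retrieved when at least `min_members` of its members are in the screened set (the criterion actually used is not stated in this supplement). The `scores` dictionary merely stands in for the Bayesian model output; no model is implemented. Under that assumed rule, the n = 0 series percentage plays the role of the relative recall (PBDS/HTS) column of Supplementary Table 2, and the "additional" % plotted in Figure 7 would be each later value minus the n = 0 value.

```python
from typing import Dict, List, Set, Tuple


def retrieval(series: Dict[str, Set[str]], hits: Set[str], screened: Set[str],
              min_members: int = 1) -> Tuple[float, float]:
    """Return (% of series retrieved, % of primary hits retrieved) for one screened set.

    Assumption: a series is retrieved when at least `min_members` of its
    member compounds are in the screened set.
    """
    found = sum(1 for members in series.values() if len(members & screened) >= min_members)
    pct_series = 100.0 * found / len(series)
    pct_hits = 100.0 * len(hits & screened) / len(hits)
    return pct_series, pct_hits


def recall_curve(series: Dict[str, Set[str]], hits: Set[str], pbds: Set[str],
                 scores: Dict[str, float], step: int,
                 min_members: int = 1) -> List[Tuple[int, float, float]]:
    """Screen the subset plus the top-n non-subset compounds, for n = 0, step, 2*step, ..."""
    # Rank compounds outside the subset by descending (hypothetical) model score.
    ranked = sorted((c for c in scores if c not in pbds), key=lambda c: scores[c], reverse=True)
    curve = []
    for n in range(0, len(ranked) + 1, step):
        screened = set(pbds) | set(ranked[:n])
        pct_series, pct_hits = retrieval(series, hits, screened, min_members)
        curve.append((n, pct_series, pct_hits))
    return curve


if __name__ == "__main__":
    # Toy data, purely illustrative (not taken from the paper).
    series = {"S1": {"a", "b"}, "S2": {"c", "d"}, "S3": {"e"}}
    hits = {"a", "b", "c", "d", "e", "f"}
    pbds = {"a", "x"}
    scores = {"a": 0.1, "x": 0.2, "c": 0.9, "e": 0.5, "y": 0.7, "f": 0.3}
    for n, pct_series, pct_hits in recall_curve(series, hits, pbds, scores, step=2):
        print(f"n={n}: {pct_series:.0f}% of series, {pct_hits:.0f}% of hits")
```

Keeping the series rule as a parameter makes it easy to see how sensitive the curves are to the definition of "retrieved".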