Privileged Substructures Revisited: Target Community-Selective Scaffolds

Jürgen Bajorath
Life Science Informatics
University of Bonn

Privileged Substructures

First postulated by Evans et al. in 1988, based on the observation that many cholecystokinin antagonists contained conserved substructures not frequently seen in other active compounds

Since then, the search for target class-privileged chemotypes has continued in medicinal chemistry

Generally accepted definition:
- Recurrent fragments in ligands of a given target family
- Selective at the family level, but not for individual targets

Evans BE et al. J. Med. Chem. 1988, 31, 2235-2246

Privileged Substructures

The existence of truly target family-privileged substructures has remained controversial

Intrinsic limitation: the search for privileged substructures has been based on frequency-of-occurrence analysis of preselected substructures

Conclusion often drawn: a substructure might occur with high frequency among ligands of a particular target family but also act on other families

Privileged Substructures

Are target family-privileged substructures truly privileged?

Target Family Set                   # Compounds   # Substructures
GPCR class A                        21620         1190
Ligand gated ion channels           3792          297
Nuclear hormone receptors (NHRs)    2176          121
Protein kinases                     1079          101
Serine proteases                    3015          323

Schnur DM et al. J. Med. Chem. 2006, 49, 2000-2009

Privileged Substructures

Are target family-privileged substructures truly privileged?

Rows: target family substructure sets. Columns: ligand sets.

Substructure set                    GPCR   Ion channels   NHRs   Protein kinases   Serine proteases   Random cpd sets
GPCR class A                        -      26%            10%    11%               17%                46%
Ligand gated ion channels           47%    -              15%    19%               92%                99%
Nuclear hormone receptors (NHRs)    40%    30%            -      17%               15%                45%
Protein kinases                     48%    34%            16%    -                 20%                57%
Serine proteases                    25%    11%            7%     91%               -                  37%

Schnur DM et al. J. Med. Chem. 2006, 49, 2000-2009

Changing the Analysis Concept

Do molecular scaffolds exist that exclusively occur in ligands of individual target families?

[Figure labels: Peptidases, GPCRs, Kinases, ...]
- Bemis & Murcko framework (scaffold)
- Large-scale distribution in target families

Departing from frequency-of-occurrence analysis of preselected substructures

Systematic compound data mining taking all available activity annotations into account

Hierarchical Scaffolds

[Figure labels: Compound; (1) R-groups; Framework; (2) Ring System; (3) Linker]

Bemis GW and Murcko MA. J. Med. Chem. 1996, 39, 2887-2893

Public Data Source - BindingDB

BindingDB database:
- Public repository of activity information of small molecules
- ~31,000 compound entries with ~57,000 activity annotations
- 17,745 compounds active against human targets extracted

Analysis Strategy - Compound Sets

Target pair sets:
- Active compounds are organized into target pair sets
- A set contains all compounds active against two individual targets (i.e. compounds might belong to multiple sets)

BindingDB target pair sets:
- Sets obtained for 520 pairs of targets that share >= 5 compounds
- 6,343 compounds active against 259 human targets

PubChem confirmatory bioassays:
- Only 3 relevant human target pairs meet the >= 5 compound criterion

Compound-Based Target Network

520 target pairs are visualized in a network representation
- Nodes: targets
- Edges: target pair sets
- Edge width: number of shared compounds

Densely connected communities
- 18 communities
- >= 4 targets
- Different target families

[Network figure: communities labeled 1 to 18]

Community-Selective Scaffolds

520 human target pair sets (6,343 BDB compounds; 259 targets); 18 target communities

206 community-selective scaffolds:
- Exclusively act in a single community
- With 5 - 45 compounds/scaffold (av. ~12)
- Yielding 147 distinct carbon skeletons (topological diversity)

Adding Selectivity Information

For each compound active against a target pair (targets A and B), its target selectivity (TS) is calculated as:

    TS = pKi(A) - pKi(B)

Compound |TS| values range from 0 to 6.86
- 0: equal potency, no selectivity
- 6.86: potency difference of nearly 7 orders of magnitude, i.e.
highly selective for one target over another

Selectivity profiles of scaffolds
- Community-based
- Target-based

Selectivity Profiles

Community-based selectivity profile:
- For each scaffold found in a given community:
  - All corresponding compounds active against any target pair in this community are pooled
  - The median of their absolute TS values is determined (median |TS|)

Target-based selectivity profile:
- For each scaffold active against a given target:
  - All corresponding compounds active against this target are pooled
  - Selectivity against any other target is calculated
  - The median of their TS values is determined (median TS)

Community Selectivity of Scaffolds

Scaffold / Community heat map:
- Columns: target communities
- Rows: scaffolds
- Color spectrum: median |TS|
  - Red: scaffold yields many compounds with different potency against individual targets
  - Yellow: scaffold does not yield selective compounds

Non-selective scaffolds
- Occur in multiple communities

Community-selective scaffolds
- Exclusively occur in one community

Target Selectivity of Scaffolds

Scaffold / Target heat map:
- Columns: targets in a community
- Rows: scaffolds
- Cell: the scaffold represents >= 5 compounds active against the target
- Color spectrum: median TS
  - Red (positive): more selective for the target over others in the community
  - Yellow (negative): more selective for other members of the community

Target Selectivity of Scaffolds

Community 3: 16 serine proteases

Different scaffolds display the same selectivity profile, e.g.
Factor Xa/Thrombin

Scaffolds with no apparent target selectivity

Number of scaffolds per target varies
- Factor Xa: 17; Thrombin: 18
- Tryptase: 0; Hepsin: 0

Target Selectivity Ranking

Community-selective scaffolds are ranked according to median |TS|
- 37 scaffolds with at least half of their compounds having >= 100-fold potency differences against >= 2 community targets
- 111 scaffolds with target-selective tendency

[Plot: scaffolds ranked by median |TS|; axis marks at 5.2, 2, 1, 0]

Community-Selective Scaffolds

[Figure: two example scaffolds with target heat maps]
- Rank 3: median |TS| 4.03
- Rank 98: median |TS| 1.10
- Targets shown: DPP4, DPP8; CA1, CA2, CA3, CA4, CA5A, CA5B, CA6, CA7, CA9, CA12, CA14
- Color spectrum: median TS
  - Red: high potential to yield target-selective compounds
  - Yellow: low potential

Selectivity Searching (MDDR)

[Figure labels: Thrombin, FXa]

Highly selective for FXa over other serine proteases

Selectivity Searching

[Figure labels: Caspase 7, Caspase 3]

Inhibit both caspase 3 and 7 with nM potency; ~200-fold selective over caspases 1, 6, 8

Extending the Analysis: ChemblDB

Recent public domain database: ChemblDB
- ~500,000 compounds with activity information
- 32,848 compounds with high-confidence annotations active against 671 human targets

High-confidence activity annotations:
- Target confidence level: 9
- Interaction type: D(irect)

ftp://ftp.ebi.ac.uk/pub/databases/chembl/latest/

ChemblDB vs. BindingDB

Comparison at different levels:
- Active compounds (human targets)
- Scaffolds
- Network
- Community-selective scaffolds
- Topologically distinct scaffolds

             ChemblDB   BDB      Overlap
Compounds    32,848     17,745   3,589
Scaffolds    12,902     6,291    1,409

ChemblDB vs. BindingDB

[Network figure: BDB and CDB target networks; legend: shared targets, unique targets; labeled regions: GPCRs, tyrosine kinases]

ChemblDB vs. BindingDB

                          ChemblDB   BDB   Overlap
Community-selective       311        206   34
Topologically distinct    227        147   85

Community-Selective Scaffolds

Distribution in drugs?
- DrugBank: 1,247 approved drugs with 726 unique scaffolds
- Only 11 overlap with the 206 community-selective BDB scaffolds
- Community-selective scaffolds are currently underrepresented in drugs; opportunities for further chemical exploration

Conclusions

- The existence of target class-privileged substructures has remained controversial over the years
- From putative privileged substructures to confirmed target community-selective scaffolds through systematic data mining
- Community-selective scaffolds are abundant and topologically diverse
- A subset of community-selective scaffolds displays a notable tendency to produce compounds with different target selectivity
- BDB and CDB contain complementary target and scaffold information

Acknowledgments

Ye Hu
Anne Mai Wassermann
Eugen Lounkine
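
Worked sketch (target pair sets, TS, and median |TS|)

The target pair sets, the TS calculation, and the community-based selectivity profile described in the slides can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the record layout, the toy pKi values, the scaffold labels s1 and s2, and all function names are hypothetical and are not taken from the original analysis. Scaffolds are assumed to be precomputed, and all pair sets passed in are treated as belonging to a single community.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

# Hypothetical toy records: (compound_id, scaffold_id, target, pKi).
RECORDS = [
    ("c1", "s1", "FXa", 9.0), ("c1", "s1", "Thrombin", 6.0),
    ("c2", "s1", "FXa", 8.0), ("c2", "s1", "Thrombin", 7.0),
    ("c3", "s2", "FXa", 7.0), ("c3", "s2", "Thrombin", 7.0),
    ("c4", "s2", "FXa", 6.5),  # single target: joins no pair set
]


def index_records(records):
    """Return ({compound: {target: pKi}}, {compound: scaffold})."""
    pki = defaultdict(dict)
    scaffold_of = {}
    for compound, scaffold, target, value in records:
        pki[compound][target] = value
        scaffold_of[compound] = scaffold
    return dict(pki), scaffold_of


def target_pair_sets(pki, min_shared=5):
    """Collect compounds per target pair; keep pairs with >= min_shared compounds."""
    pair_sets = defaultdict(set)
    for compound, potencies in pki.items():
        # A compound active against n targets joins n*(n-1)/2 pair sets.
        for a, b in combinations(sorted(potencies), 2):
            pair_sets[(a, b)].add(compound)
    return {p: c for p, c in pair_sets.items() if len(c) >= min_shared}


def target_selectivity(pki_a, pki_b):
    """TS = pKi(A) - pKi(B); positive values favor target A."""
    return pki_a - pki_b


def median_abs_ts(pki, scaffold_of, pair_sets):
    """Median |TS| per scaffold, pooling all compounds over the given pair sets."""
    pooled = defaultdict(list)
    for (a, b), compounds in pair_sets.items():
        for compound in compounds:
            ts = target_selectivity(pki[compound][a], pki[compound][b])
            pooled[scaffold_of[compound]].append(abs(ts))
    return {s: median(v) for s, v in sorted(pooled.items())}


pki, scaffold_of = index_records(RECORDS)
pairs = target_pair_sets(pki, min_shared=2)  # toy threshold; the slides use >= 5
profile = median_abs_ts(pki, scaffold_of, pairs)
print(profile)
```

Because pKi is a base-10 logarithmic scale, a |TS| of 2 corresponds to a 100-fold potency difference, which is the threshold quoted in the Target Selectivity Ranking slide. A target-based profile would follow the same pattern, keeping the sign of TS and pooling per scaffold and target instead of per scaffold.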