Supplementary Materials - TNLIST, Department of Automation

Global optimization-based inference of chemogenomic features
Zu, S. et al.

Supplementary Materials

Songpeng Zu^{1,a}, Ting Chen^{2}, Shao Li^{1,*}

^1 Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 10084, China
^2 Molecular and Computational Biology Program, Department of Biological Science, University of Southern California, Los Angeles, California 90089, USA
^a Contact: zsp07@mails.tsinghua.edu.cn

Table of Contents

Supplementary Materials
    Data Sources
    Methods
        The EM framework
        Derivation of EM algorithm in our model
        The Association Method
        Variance Estimation of the EM results
        Combinations of drug chemical substructures
    Estimation of fn and fp
    Results of the predicted drug-domain interactions
    Reference
Data Sources

The drug-protein interactions, drug substructures, and protein domains were obtained from Tabei et al. (2012) [1]. A total of 1862 drugs are represented by 881-dimensional binary vectors of chemical substructures from the PubChem database [2], and 1554 proteins are represented by 876-dimensional binary vectors of protein domains from the Pfam database [3]. There are 4809 known interactions between these drugs and proteins. We deleted the chemical substructures and protein domains that never appeared in any drug or protein, and we merged the substructures (or domains) that appeared in exactly the same drugs (or proteins).

The drug-domain interactions were extracted from the PDB database with the script from Kruger et al. (2012) [5]. We kept only proteins with multiple domains. In the end, 53 drug-protein interaction pairs with recorded drug-domain interactions were used.

Methods

The EM framework

Here we propose a probabilistic model to infer the substructure-domain interactions. It is inspired by the work in [7] and rests on the following assumptions:

1. The interactions between drug chemical substructures and protein domains within a drug-target pair are independent of one another.
2. A drug and a protein interact if and only if at least one pair of a chemical substructure of the drug and a domain of the protein interacts.

Let $Y_i$ denote drug i, $P_j$ denote protein j, $Z_m$ denote chemical substructure m, and $D_n$ denote domain n. Let $ZD_{mn}^{(ij)}$ indicate whether chemical substructure m from drug i and protein domain n from protein j interact (value one) or not (value zero). We use $\theta_{mn}$ to denote the interaction probability of chemical substructure m and protein domain n, that is,

$$\theta_{mn} = \Pr\left(ZD_{mn}^{(ij)} = 1\right)$$

Our aim is then to estimate the interaction probabilities $\theta = \{\theta_{mn}\}$ of the chemical substructures and the protein domains.
Under Assumptions 1 and 2, we obtain

$$\Pr(YP_{ij} = 1 \mid \theta) = 1 - \prod_{ZD_{mn}^{(ij)}} (1 - \theta_{mn})$$

where the product runs over all substructure-domain pairs (m, n) formed by drug i and protein j, and $YP_{ij}$ indicates whether drug $Y_i$ and protein $P_j$ interact: $YP_{ij} = 1$ if they interact and 0 otherwise.

Because many drug-protein interactions remain unknown, YP cannot be identified directly with the observed drug-protein interaction data. We therefore use $O = \{O_{ij}\}$ (where $O_{ij}$ is one if an interaction between drug i and protein j is observed, and zero otherwise) to represent the given drug-protein interactions. To connect YP with O, we introduce two parameters, the false negative rate fn and the false positive rate fp, defined as

$$fp = \Pr(O_{ij} = 1 \mid YP_{ij} = 0)$$
$$fn = \Pr(O_{ij} = 0 \mid YP_{ij} = 1)$$

Then

$$
\begin{aligned}
\Pr(O_{ij} = 1 \mid \theta) &= \sum_{t = 0, 1} \Pr(O_{ij} = 1 \mid YP_{ij} = t) \Pr(YP_{ij} = t \mid \theta) \\
&= (1 - fn) \Pr(YP_{ij} = 1 \mid \theta) + fp \left(1 - \Pr(YP_{ij} = 1 \mid \theta)\right)
\end{aligned}
$$

Moreover, the log-likelihood function, i.e., the log of the total probability of the observed drug-protein interaction data, is

$$l(\theta) = \log \Pr(O \mid \theta) = \log \left( \prod_{i,j} \Pr(O_{ij} = 1 \mid \theta)^{O_{ij}} \Pr(O_{ij} = 0 \mid \theta)^{1 - O_{ij}} \right)$$

in which $\theta = (\{\theta_{mn}\}, fn, fp)$, where fn and fp are predefined.

Our aim is to estimate θ by maximum likelihood estimation (MLE). However, because we do not know whether $ZD_{mn}^{(ij)}$ is 1 or 0 (that is, whether chemical substructure m from drug i interacts with protein domain n from protein j), this is a missing data problem. It is natural to solve it with the EM algorithm [6], as follows.

• The E step is:

$$
\begin{aligned}
E\left(ZD_{mn}^{(ij)} \mid O, \theta^{(t-1)}\right) &= E\left(ZD_{mn}^{(ij)} \mid O_{ij}, \theta^{(t-1)}\right) \\
&= \Pr\left(ZD_{mn}^{(ij)} = 1 \mid O_{ij}, \theta^{(t-1)}\right) \\
&= \frac{\Pr\left(ZD_{mn}^{(ij)} = 1, O_{ij} \mid \theta^{(t-1)}\right)}{\Pr\left(O_{ij} \mid \theta^{(t-1)}\right)} \\
&= \frac{\theta_{mn}^{(t-1)} (1 - fn)^{O_{ij}} fn^{1 - O_{ij}}}{\Pr\left(O_{ij} \mid \theta^{(t-1)}\right)}
\end{aligned}
$$

• The M step is:

$$\theta_{mn}^{(t)} = \frac{1}{N_{mn}} \sum_{i,j} E\left(ZD_{mn}^{(ij)} \mid O, \theta^{(t-1)}\right)$$

Note that $N_{mn}$ is the total number of drug-protein pairs that contain chemical substructure m and protein domain n, and the sum runs over those pairs.
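The E and M steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `em_step` and `log_likelihood`, the matrix names `A` (drugs by substructures), `B` (proteins by domains) and `O` (drugs by proteins), and the clipping constant `eps` are all hypothetical choices made for this example. The sketch relies on the fact that the product over the substructure-domain pairs of a drug-protein pair can be computed as a sum of logarithms through two matrix products.

```python
import numpy as np


def _prob_obs1(theta, A, B, fn, fp, eps):
    """Pr(O_ij = 1 | theta) for every drug-protein pair (i, j)."""
    # log(1 - theta_mn), clipped so that theta_mn = 1 does not give log(0).
    log_miss = np.log(np.clip(1.0 - theta, eps, 1.0))
    # A @ log_miss @ B.T sums log(1 - theta_mn) over the pairs (m, n)
    # present in drug i and protein j, so p_yp is Pr(YP_ij = 1 | theta).
    p_yp = 1.0 - np.exp(A @ log_miss @ B.T)
    return (1.0 - fn) * p_yp + fp * (1.0 - p_yp)


def log_likelihood(theta, A, B, O, fn, fp, eps=1e-12):
    """Log-likelihood l(theta) of the observed interaction matrix O."""
    p1 = np.clip(_prob_obs1(theta, A, B, fn, fp, eps), eps, 1.0 - eps)
    return float(np.sum(O * np.log(p1) + (1 - O) * np.log(1.0 - p1)))


def em_step(theta, A, B, O, fn, fp, eps=1e-12):
    """One EM update of theta (substructures x domains).

    A: (n_drugs, M) binary matrix, A[i, m] = 1 if drug i has substructure m
    B: (n_proteins, N) binary matrix, B[j, n] = 1 if protein j has domain n
    O: (n_drugs, n_proteins) binary matrix of observed interactions
    """
    p1 = _prob_obs1(theta, A, B, fn, fp, eps)
    p_obs = np.where(O == 1, p1, 1.0 - p1)            # Pr(O_ij | theta)
    # E step weight: (1 - fn)^O_ij * fn^(1 - O_ij) / Pr(O_ij | theta).
    w = np.where(O == 1, 1.0 - fn, fn) / np.clip(p_obs, eps, None)
    # Sum of E[ZD_mn^(ij)] over the pairs (i, j) that contain (m, n).
    expected = theta * (A.T @ w @ B)
    # N_mn: number of drug-protein pairs containing substructure m and domain n.
    n_mn = np.outer(A.sum(axis=0), B.sum(axis=0))
    # M step: average the expectations; entries with N_mn = 0 are set to 0.
    return np.where(n_mn > 0, expected / np.maximum(n_mn, 1), 0.0)
```

Calling `em_step` repeatedly from an initial `theta` and monitoring `log_likelihood` gives a simple stopping rule, for example stopping once the increase falls below a small tolerance. The values passed for `fn` and `fp` are fixed inputs here, in line with the statement that they are predefined.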
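Derivation of EM algorithm in our model

Here we show how the EM algorithm is derived for our model. In general, the EM algorithm involves two steps:

• E step: $Q(\theta, \theta^{(t)}) = E_{Y \mid X, \theta^{(t)}}\left[\log \Pr(X, Y \mid \theta)\right]$
• M step: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta, \theta^{(t)})$

Here X is the observed, incomplete data, Y is the latent data, and (X, Y) is the complete data. In our model,

$$
\begin{aligned}
Q(\theta, \theta^{(t)}) &= E_{ZD \mid O, \theta^{(t)}}\left[\log \Pr(ZD, O \mid \theta)\right] \\
&= E_{ZD \mid O, \theta^{(t)}}\left[\sum_{i,j} \log\left(\Pr\left(O_{ij} \mid ZD^{(ij)}\right) \Pr\left(ZD^{(ij)} \mid \theta\right)\right)\right] \\
&= \sum_{i,j} E_{ZD \mid O, \theta^{(t)}}\left[\log \Pr\left(O_{ij} \mid ZD^{(ij)}\right)\right] + \sum_{i,j} E_{ZD \mid O, \theta^{(t)}}\left[\log \Pr\left(ZD^{(ij)} \mid \theta\right)\right]
\end{aligned}
$$

Note that the first sum does not depend on θ, while the last sum can be rewritten as follows:

$$
\begin{aligned}
\text{last sum} &= \sum_{i,j} E_{ZD \mid O, \theta^{(t)}}\left[\log \prod_{m,n} \Pr\left(ZD_{mn}^{(ij)} \mid \theta\right)\right] \\
&= \sum_{i,j} \sum_{m,n} E_{ZD \mid O, \theta^{(t)}}\left[\log\left(\theta_{mn}^{ZD_{mn}^{(ij)}} (1 - \theta_{mn})^{1 - ZD_{mn}^{(ij)}}\right)\right]
\end{aligned}
$$

Then,

$$
\begin{aligned}
\frac{\partial Q(\theta, \theta^{(t)})}{\partial \theta_{mn}} &= \sum_{i,j} E_{ZD \mid O, \theta^{(t)}}\left[\frac{\partial}{\partial \theta_{mn}} \log\left(\theta_{mn}^{ZD_{mn}^{(ij)}} (1 - \theta_{mn})^{1 - ZD_{mn}^{(ij)}}\right)\right] \\
&= \sum_{i,j} E_{ZD \mid O, \theta^{(t)}}\left[\frac{ZD_{mn}^{(ij)}}{\theta_{mn}} - \frac{1 - ZD_{mn}^{(ij)}}{1 - \theta_{mn}}\right]
\end{aligned}
$$

Setting this expression to zero and multiplying through by $\theta_{mn}(1 - \theta_{mn})$ gives $\sum_{i,j} \left(E\left[ZD_{mn}^{(ij)}\right] - \theta_{mn}\right) = 0$ over the $N_{mn}$ drug-protein pairs that contain substructure m and domain n, so $\theta_{mn}$ equals the average of these expectations, which is our EM procedure.

The Association Method

One problem of the EM algorithm is that it converges to a local maximum of the likelihood, and different initial values usually lead to different local maxima. Instead of choosing the initial values at random many times, we used the association method to choose them. It is a local way to evaluate the interaction probabilities of drug substructures and protein domains:

$$\theta_{mn} = \frac{I_{mn}}{N_{mn}}$$

in which $I_{mn}$ is the number of interacting drug-protein pairs that contain the pair of chemical substructure $Z_m$ and protein domain $D_n$, and $N_{mn}$ is the total number of drug-protein pairs that contain this pair. This method has two limitations.

• First, it computes the chemical substructure-protein domain interactions locally, which means that it ignores the other substructure-domain interactions in the same drug-protein pairs. For example, suppose drug $Y_i$ containing substructures $\{Z_m, Z_y\}$ interacts with both protein $P_j$ containing domains $\{D_n, D_y\}$ and protein $P_k$ containing domains $\{D_n, D_c\}$, and that substructure $Z_m$ and protein domain $D_n$ do not appear in any other drugs and proteins, respectively. Then $\theta_{mn} = 2/2 = 1$. This ignores other possible interactions between substructures and protein domains, such as substructure $Z_y$ interacting with protein domain $D_c$. Therefore, to infer drug substructure-protein domain interactions, we should consider all the drug-protein interactions and all the interactions between drug substructures and protein domains.
• Second, this method relies on the accuracy of the observed data. However, current drug-protein data are largely incomplete.

Variance Estimation of the EM results

The natural way to estimate the variance of the maximum likelihood estimate is as follows [9]:

$$\operatorname{var}(\hat{\theta}) \approx \frac{1}{I(\hat{\theta})}$$

where I(θ) is the observed information,

$$I(\theta) = -\frac{d^2 \log \Pr(x \mid \theta)}{d\theta^2}$$

In our situation, we derive the observed information below. The superscript (mn) and the starred index (i, j)* indicate that only drug-protein pairs (i, j) containing both substructure m and domain n are considered; the terms of all other pairs do not depend on $\theta_{mn}$.

$$
\begin{aligned}
I(\theta_{mn}) &= -\frac{\partial^2}{\partial \theta_{mn}^2} \log \Pr(O \mid \theta) \\
&= -\frac{\partial^2}{\partial \theta_{mn}^2} \sum_{i,j} \left( O_{ij}^{(mn)} \log \Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right) + \left(1 - O_{ij}^{(mn)}\right) \log \Pr\left(O_{ij}^{(mn)} = 0 \mid \theta\right) \right)
\end{aligned}
$$

Since

$$
\begin{aligned}
&\frac{\partial}{\partial \theta_{mn}} \sum_{i,j} \left( O_{ij}^{(mn)} \log \Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right) + \left(1 - O_{ij}^{(mn)}\right) \log \Pr\left(O_{ij}^{(mn)} = 0 \mid \theta\right) \right) \\
&\quad = \sum_{i,j} \left( \frac{O_{ij}^{(mn)}}{\Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right)} \frac{\partial}{\partial \theta_{mn}} \Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right) + \frac{1 - O_{ij}^{(mn)}}{\Pr\left(O_{ij}^{(mn)} = 0 \mid \theta\right)} \frac{\partial}{\partial \theta_{mn}} \Pr\left(O_{ij}^{(mn)} = 0 \mid \theta\right) \right)
\end{aligned}
$$

and

$$\frac{\partial}{\partial \theta_{mn}} \Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right) = \frac{\partial}{\partial \theta_{mn}} \left( (1 - fn) \Pr\left(YP_{ij}^{(mn)} = 1 \mid \theta\right) + fp \left(1 - \Pr\left(YP_{ij}^{(mn)} = 1 \mid \theta\right)\right) \right)$$

where

$$
\begin{aligned}
\frac{\partial}{\partial \theta_{mn}} \Pr\left(YP_{ij}^{(mn)} = 1 \mid \theta\right) &= \frac{\partial}{\partial \theta_{mn}} \left( 1 - \prod_{ZD_{st}^{(ij)} \text{ with } (mn)} (1 - \theta_{st}) \right) \\
&= \prod_{ZD_{st}^{(ij)} \text{ without } (mn)} (1 - \theta_{st}) \\
&= \prod_{ZD_{st}^{(ij)*} \neq mn} (1 - \theta_{st})
\end{aligned}
$$

we have

$$\delta_{mn}^{(ij)} = \frac{\partial}{\partial \theta_{mn}} \Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right) = (1 - fn - fp) \prod_{ZD_{st}^{(ij)*} \neq mn} (1 - \theta_{st})$$

Also, let

$$\mu_{mn}^{(ij)} = \Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right)$$

We then get

$$\frac{\partial}{\partial \theta_{mn}} \sum_{i,j} \left( O_{ij}^{(mn)} \log \Pr\left(O_{ij}^{(mn)} = 1 \mid \theta\right) + \left(1 - O_{ij}^{(mn)}\right) \log \Pr\left(O_{ij}^{(mn)} = 0 \mid \theta\right) \right) = \sum_{i,j} \left( \frac{O_{ij}^{(mn)}}{\mu_{mn}^{(ij)}} - \frac{1 - O_{ij}^{(mn)}}{1 - \mu_{mn}^{(ij)}} \right) \delta_{mn}^{(ij)}$$

Then

$$\frac{\partial}{\partial \theta_{mn}} \sum_{i,j} \left( \frac{O_{ij}^{(mn)}}{\mu_{mn}^{(ij)}} - \frac{1 - O_{ij}^{(mn)}}{1 - \mu_{mn}^{(ij)}} \right) \delta_{mn}^{(ij)} = \sum_{i,j} \delta_{mn}^{(ij)} \frac{\partial}{\partial \theta_{mn}} \left( \frac{O_{ij}^{(mn)}}{\mu_{mn}^{(ij)}} - \frac{1 - O_{ij}^{(mn)}}{1 - \mu_{mn}^{(ij)}} \right)$$

Note that

$$\frac{\partial}{\partial \theta_{mn}} \delta_{mn}^{(ij)} = 0$$

Besides,

$$\frac{\partial}{\partial \theta_{mn}} \mu_{mn}^{(ij)} = \delta_{mn}^{(ij)}$$

$$\frac{\partial}{\partial \theta_{mn}} \left( \frac{O_{ij}^{(mn)}}{\mu_{mn}^{(ij)}} - \frac{1 - O_{ij}^{(mn)}}{1 - \mu_{mn}^{(ij)}} \right) = -\frac{O_{ij}^{(mn)}}{\left(\mu_{mn}^{(ij)}\right)^2} \delta_{mn}^{(ij)} - \frac{1 - O_{ij}^{(mn)}}{\left(1 - \mu_{mn}^{(ij)}\right)^2} \delta_{mn}^{(ij)}$$

Finally, we have

$$I(\theta_{mn}) = \sum_{(i,j)*} \left(\delta_{mn}^{(ij)}\right)^2 \left( \frac{O_{ij}^{(mn)}}{\left(\mu_{mn}^{(ij)}\right)^2} + \frac{1 - O_{ij}^{(mn)}}{\left(1 - \mu_{mn}^{(ij)}\right)^2} \right)$$

Combinations of drug chemical substructures

Different drug chemical substructures may function as one unit in drug-protein interactions. To estimate the combined behavior of two drug chemical substructures in drug-protein interactions, we add pairs of drug chemical substructures as new "drug chemical substructures" and handle them with our probabilistic model.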
Instead of considering all pairs of drug chemical substructures, we first use a filter method to select the pairs that appear significantly often in the drug-protein interactions. The filter method has two steps. First, we use the hypergeometric distribution to test whether the number of co-occurrences of two drug chemical substructures is significant. Second, we remove the pairs of drug chemical substructures that also co-occur significantly in randomly selected compounds. We follow this procedure because we are only interested in combinations of drug chemical substructures that can interact with proteins but do not often co-exist in the compound chemical space. The randomly selected compounds are from the ChEMBL database; in total, over 9,000 compounds represent the compound chemical space. We use the Bonferroni adjustment for multiple testing correction. Note that, owing to the PubChem substructure definitions, we only consider the pairs with SMARTS records. Finally, we select 1870 pairs of drug chemical substructures for learning.

Estimation of fn and fp

In our model, two parameters, fn and fp, must be predefined. According to our model,

$$
\begin{aligned}
fn &= \Pr(O_{ij} = 0 \mid YP_{ij} = 1) \\
&= 1 - \frac{\Pr(O_{ij} = 1, YP_{ij} = 1)}{\Pr(YP_{ij} = 1)} \\
&\geq 1 - \frac{\Pr(O_{ij} = 1)}{\Pr(YP_{ij} = 1)} \\
&\approx 1 - \frac{\text{number of observed interactions among real interactions}}{\text{number of the real drug-protein interactions}}
\end{aligned}
$$

It has been shown that the average number of target proteins per drug is about 6.3 [8]. Then we get

$$fn \geq 1 - \frac{4809}{1863 \times 6.3} \approx 1 - 0.41 = 0.59$$

We can estimate fp, which equals $\Pr(O_{ij} = 1 \mid YP_{ij} = 0)$, in a similar way:

$$
\begin{aligned}
fp &= \Pr(O_{ij} = 1 \mid YP_{ij} = 0) \\
&= \frac{\Pr(O_{ij} = 1, YP_{ij} = 0)}{\Pr(YP_{ij} = 0)} \\
&\leq \frac{\Pr(O_{ij} = 1)}{\Pr(YP_{ij} = 0)} \\
&\approx \frac{\text{number of interaction pairs}}{\text{number of total pairs} - \text{number of interaction pairs}} \\
&= \frac{4809}{1863 \times 1554 - 4809} \\
&\leq 1.67 \times 10^{-3}
\end{aligned}
$$

To analyze the robustness of our model to these parameters, we used five-fold cross-validation to measure the recovery of drug-protein interactions under different combinations of fn and fp. The performance of recovering drug-protein interactions remained stable across the combinations of fn and fp. The procedure is as follows: (i) split the original drug-target interactions equally into five folds; (ii) each time, select one fold as the test set and use the others as the training set for our model; (iii) after learning, evaluate the test set by the area under the receiver operating characteristic curve (AUC). The curve is generated by plotting the false positive rate on the x-axis against the true positive rate on the y-axis. Note that the negative samples are randomly selected from the drug-protein pairs with no known interaction, since we do not have real negative samples.

Results of the predicted drug-domain interactions
Protein        Compound     Protein domain  k value  Prediction
(UniProt ID)   (PubChem ID) (Pfam ID)                by GIFT
ITAL_HUMAN     53232        PF00092         1.00     FALSE
LKHA4_HUMAN    445154       PF01433         1.00     FALSE
LKHA4_HUMAN    90334        PF01433         1.00     FALSE
SRC_HUMAN      311          PF00017         1.00     FALSE
SRC_HUMAN      867          PF00017         1.00     FALSE
SRC_HUMAN      971          PF00017         1.00     FALSE
CATS_HUMAN     5287799      PF00112         1.00     FALSE
LDHA_HUMAN     974          PF02866         0.70     FALSE
LKHA4_HUMAN    72172        PF01433         0.90     FALSE
PDE5A_HUMAN    110634       PF00233         1.00     FALSE
SRC_HUMAN      5287544      PF00017         0.81     FALSE
ANDR_HUMAN     261000       PF00104         0.91     TRUE
ANDR_HUMAN     3371         PF00104         0.90     TRUE
ANDR_HUMAN     5803         PF00104         0.80     TRUE
ANDR_HUMAN     5920         PF00104         0.83     TRUE
DNMT1_HUMAN    439155       PF00145         1.00     TRUE
ESR1_HUMAN     5280961      PF00104         1.00     TRUE
MMP3_HUMAN     1990         PF00413         1.00     TRUE
MMP8_HUMAN     1990         PF00413         1.00     TRUE
NOS3_HUMAN     2733         PF02898         1.00     TRUE
PDE10_HUMAN    4680         PF00233         1.00     TRUE
RXRA_HUMAN     82146        PF00104         1.00     TRUE
THRB_HUMAN     2332         PF00089         1.00     TRUE
TPA_HUMAN      2332         PF00089         1.00     TRUE
ADH1A_HUMAN    5287890      PF08240         0.53     TRUE
ADH1B_HUMAN    347402       PF08240         0.57     TRUE
ADH1B_HUMAN    80654        PF08240         0.58     TRUE
ANDR_HUMAN     10635        PF00104         0.91     TRUE
ANDR_HUMAN     56069        PF00104         0.84     TRUE
ANDR_HUMAN     6013         PF00104         0.91     TRUE
DHSO_HUMAN     132302       PF08240         0.53     TRUE
ESR1_HUMAN     448577       PF00104         1.00     TRUE
ESR1_HUMAN     449205       PF00104         1.00     TRUE
ESR1_HUMAN     449207       PF00104         1.00     TRUE
ESR1_HUMAN     449209       PF00104         1.00     TRUE
ESR1_HUMAN     5035         PF00104         1.00     TRUE
ESR1_HUMAN     5757         PF00104         1.00     TRUE
ESR1_HUMAN     5870         PF00104         1.00     TRUE
ESR2_HUMAN     5280961      PF00104         1.00     TRUE
ESR2_HUMAN     5757         PF00104         1.00     TRUE
GCR_HUMAN      55245        PF00104         0.90     TRUE
GCR_HUMAN      5743         PF00104         1.00     TRUE
NOS3_HUMAN     1893         PF02898         1.00     TRUE
OTC_HUMAN      124992       PF00185         0.50     TRUE
PDE5A_HUMAN    110635       PF00233         1.00     TRUE
PDE5A_HUMAN    5212         PF00233         1.00     TRUE
PRGR_HUMAN     261000       PF00104         1.00     TRUE
PRGR_HUMAN     4369524      PF00104         1.00     TRUE
PRGR_HUMAN     5994         PF00104         1.00     TRUE
PRGR_HUMAN     6230         PF00104         1.00     TRUE
ROCK1_HUMAN    3064778      PF00069         0.94     TRUE
RXRA_HUMAN     444795       PF00104         1.00     TRUE
THRB_HUMAN     5326608      PF00089         1.00     TRUE

Table 1. Results of the prediction on drug-domain interactions. The k value is the proportion of binding site residues lying within the protein domain out of the total number of binding site residues. When the k value was larger than 0.5, we treated the drug as interacting with the domain. TRUE means that the drug was predicted by GIFT to interact with the domain.

Reference

1. Tabei, Y., Pauwels, E., Stoven, V., Takemoto, K., & Yamanishi, Y. (2012). Identification of chemogenomic features from drug–target interaction networks using interpretable classifiers. Bioinformatics, 28(18), i487-i494.
2. Bolton, E., Wang, Y., Thiessen, P. A., & Bryant, S. H. (2008). PubChem: Integrated platform of small molecules and biological activities. Chapter 12 in Annual Reports in Computational Chemistry, Volume 4, pp. 217-240. Elsevier: Oxford, UK.
3. Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E. L. L., Eddy, S. R., Bateman, A., & Finn, R. D. (2012). The Pfam protein families database. Nucleic Acids Research, 40(Database issue), D290-D301.
4. Finn, R. D., Miller, B. L., Clements, J., & Bateman, A. (2014). iPfam: a database of protein family and domain interactions found in the Protein Data Bank. Nucleic Acids Research, 42(D1), D364-D373.
5. Kruger, F. A., Rostom, R., & Overington, J. P. (2012). Mapping small molecule binding data to structural domains. BMC Bioinformatics, 13(Suppl 17), S11.
6. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1-38.
7. Deng, M., Mehta, S., Sun, F., & Chen, T. (2002). Inferring domain–domain interactions from protein–protein interactions. Genome Research, 12(10), 1540-1548.
8. Mestres, J., Gregori-Puigjane, E., Valverde, S., & Sole, R. V. (2008). Data completeness - the Achilles heel of drug-target networks. Nature Biotechnology, 26(9), 983-984.
9. Efron, B., & Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika, 65(3), 457-483.
