Bioinformatics in Natural Product Research
Chun-Wei Tung (童俊維)
School of Pharmacy and Ph.D. Program in Toxicology, Kaohsiung Medical University
cwtung@kmu.edu.tw
http://cwtung.kmu.edu.tw
2014/11/28 @ Department of Computer Science and Engineering, Yuan Ze University

More money, fewer drugs
Annually, the North American and European pharmaceutical industries invest more than US$20 billion to identify and develop new drugs, about 22% of which is spent on screening assays and toxicity testing.
Sandra et al. (2004) EMBO reports, 5, 837-842

What happened?
[Figure: a compound that is effective and non-toxic in A may or may not be effective and non-toxic in B. Labels: healthy, species difference, individual difference]

Toxicities leading to drug withdrawal from the US market
Wilke et al. (2007) Nature Reviews Drug Discovery 6, 904-916

Hints for future drug design
• Economical and fast methods are required for drug discovery
• In addition to efficacy, toxicity/safety should be evaluated
• Species and individual differences
• Bioinformatics!!!

Computer-aided drug design
• Target identification
• Toxicity screening
• Bioactive compound screening

Protein-ligand docking
• Predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.

Database for bioactive compound screening
• C.W. Tung*, Y.C. Lin, H.S. Chang, C.C. Wang, I.S. Chen, J.L. Jheng and J.H. Li (2014) Database, 2014, bau055.
• C.W. Tung* (2014) Current Computer-Aided Drug Design. (In press)
• Y.C. Lin, C.C. Wang, I.S. Chen, J.L. Jheng, J.H. Li and C.W. Tung* (2013) The Scientific World Journal, 2013, 736386.
[Diagram: Target identification, Toxicity screening, Bioactive compound screening]

Plant-derived drugs
• Plants are valuable resources for the development of therapeutic agents
  • E.g. Traditional Chinese medicine
• The current global market for plant-derived drugs is worth >20 billion
• More than 60% of drugs are natural products, derivatives or natural product mimics
  • E.g. Willow is a natural source of aspirin (anti-inflammatory and antiplatelet)
• However, only 10–15% of plant species have been explored for developing clinically important drugs

Opportunity
• Taiwan is rich in plant diversity
  • Owing to its unique geographical features and location
• Indigenous/endemic plants in Taiwan
  • Precious sources of novel pharmacologically active compounds
• Many studies identified novel compounds without further investigation of bioactivity
• A curated database of Taiwan indigenous plants is desirable!

TIPdb: Taiwan indigenous plant database
[Diagram: TIPdb links taxonomy, bioactivity, reference, 2D structure, and 3D structure data. Labels: KNApSAcK; Merck Molecular Force Field (MMFF94); Balloon & DG-AMMOS]
Afendi et al. (2012) Plant Cell Physiol., 53, e1.
Puranen et al. (2010) J. Comput. Chem., 31, 1722–1732.
Lagorce et al. (2009) BMC Chem. Biol., 9, 6.

• TIPdb is a structured and searchable database of anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan.
• 1,116 Taiwan indigenous plants
• 8,800 non-redundant 3D structures of phytochemicals
• 5,243 records of anticancer, antiplatelet, and antituberculosis activities
• http://cwtung.kmu.edu.tw/tipdb

Lipinski's rule of five
• Four criteria derived from analyzing the physicochemical properties of >2000 drugs
• Molecular weight ≤500 Dalton
• Octanol–water partition coefficient logP ≤5
• H-bond donors ≤5
• H-bond acceptors ≤10

No. | Activity | No. of chemicals | No. of activity records
1 | Cytotoxic | 494 | 2481
2 | Anti-Platelet | 339 | 2448
3 | Anti-Tuberculosis | 233 | 302

Target identification
• C.W. Tung* (2012) BMC Bioinformatics, 13, 40. (Highly Accessed)
• C.W. Tung* (2013) Journal of Theoretical Biology, 336, 11-17.
[Diagram: Target identification, Toxicity screening, Bioactive compound screening]

Prokaryotic ubiquitin-like protein (Pup)
• The first post-translational protein modifier identified in prokaryotes
• 64 amino acids
• Important signal for the selective degradation of proteins
Julie Maupin-Furlow, Nature Reviews Microbiology 10, 100-111 (February 2012) doi:10.1038/nrmicro2696

PupDB
[Diagram: PupDB integrates protein data (sites, function, sequence), Gene Ontology, references, and 3D structures, with tools for Browse, Search, and BLAST]

Statistics
• 1391 proteins
  • 268 with known pupylation sites
  • 1123 without known pupylation sites

BLAST tool
• Predict putative pupylation sites based on sequence similarity
• http://cwtung.kmu.edu.tw/pupdb

Sequence-based prediction of pupylation sites
• No consensus motif

Aims
• Identify discriminant features for pupylation sites
• Develop prediction methods for pupylation sites
• Analyze the preference of pupylation sites
• Analyze the affected functions

Composition of k-spaced amino acid pairs
• CKSAAP with k=0, 1, 2, 3 and 4 is used to encode pupylation and non-pupylation sites as 2205-dimensional feature vectors (441 pair frequencies for each of the 5 values of k).
• Considering the pair of A and C, the k-spaced amino acid pairs for k=0, 1, 2, 3 and 4 are represented as AC, AxC, AxxC, AxxxC and AxxxxC, respectively.
• For each k, the feature vector is
$$\left(\frac{N_{AA}}{N_{total}}, \frac{N_{AC}}{N_{total}}, \ldots, \frac{N_{\_\_}}{N_{total}}\right)_{441}$$

System flow
1. Rank the importance of CKSAAPs using the χ² test
2. Search for the top p CKSAAPs giving the highest cross-validation AUC
3. Search for the optimal window size giving the highest AUC
[Flowchart: PupDB; training dataset (162 proteins with 183 pupylation sites); test dataset (20 proteins with 29 pupylation sites); wrapper-based feature selection; support vector machine (SVM) using RBF kernel; optimal window size (example sequence LASDFKASDFSAL with window sizes 5 and 9); training and test]

10-fold cross-validation
• iPUP is better than GPS-PUP (∆AUC=8%)

Independent test
• iPUP is 6% (AUC) better than GPS-PUP
[Figure: bar chart comparing iPUP and GPS-PUP on balanced accuracy, accuracy, sensitivity, specificity, precision, MCC, and AUC; y-axis from 0 to 90]

Feature importance
• C-terminal space-containing pairs (5/25=20%)
• The percentage of pupylation sites among lysines near the C-terminal end is 14.43% (14/97), nearly twice the 7.95% (212/2666) among all lysines
• In contrast, the percentage of pupylation sites among lysines near the N-terminal end is 4.21%, much smaller than the 7.95% among all lysines

Overrepresented amino acid pairs
• Amino acid pairs with positive values are overrepresented in pupylation sites. In contrast, negative values mean overrepresentation in non-pupylation sites.
• Gene set enrichment analysis
  • -> Identify functions regulated by pupylation

Toxicity screening
• C.W. Tung* and J.L. Jheng (2014) Neurocomputing, 145, 68-74.
• C.W. Tung* (2014) Lecture Notes in Computer Science, 8626, 1-9.
• C.W. Tung* (2013) Lecture Notes in Computer Science, 7986, 231-241.
[Diagram: Target identification, Toxicity screening, Bioactive compound screening]

Toxicity screening
• Too many chemicals, too few experimental data
• Computational methods are potential alternatives to experiments
  • Based on the analysis of previous knowledge and experimental data
• Prediction of non-genotoxic hepatocarcinogens
Wilke et al.
(2007) Nature Reviews Drug Discovery 6, 904-916

Chemical hepatocarcinogenesis
• Exposure -> initiation, promotion and progression
• Carcinogenic chemicals
  • Genotoxic carcinogenicity: directly interacts with DNA (mutagenic)
  • Non-genotoxic carcinogenicity: non-mutagenic
[Diagram: direct versus indirect effects on DNA]

Experimental methods
• Genotoxic hepatocarcinogenicity
  • Several short-term in vitro and in vivo assays
• Non-genotoxic hepatocarcinogenicity
  • 2-year rodent bioassays
  • Labor-intensive, time-consuming and expensive
• It is desirable to develop alternative methods to efficiently prioritize chemicals with potential non-genotoxic hepatocarcinogenicity for further studies

Quantitative Structure-Activity Relationship (QSAR)
• Chemical structure descriptors
[Diagram labels: Non-genotoxic hepatocarcinogenicity? Genotoxic hepatocarcinogenicity]

Toxicogenomics
• Toxicogenomics (TGx)
  • Gene expression profile (transcriptome data)
  • Microarray
• Performance better than QSAR (Liu et al., 2011; Yamada et al., 2012; Uehara et al., 2008)
[Diagram: DNA, QSAR (structure level); RNA, TGx (transcriptome level)]

Motivation
• Non-genotoxic carcinogenicity could be caused by chemical-protein interactions
[Diagram: DNA, genotoxic carcinogenicity (in vitro assays); RNA, non-genotoxic carcinogenicity (TGx); Protein, non-genotoxic carcinogenicity (chemical-protein interaction)]

Aims
• To develop computational methods based on chemical-protein interaction (CPI)
• To identify the critical proteins for assessing non-genotoxic hepatocarcinogenicity
• To compare the CPI method with QSAR and TGx

Dataset
• NCTRlcdb: a National Center for Toxicological Research liver cancer database, 999 chemicals (Young et al., 2004)
  • Liver carcinogen (273)
  • Other carcinogen (293)
  • Non-carcinogen (304)
  • Other (129)
• 62 chemicals with available TGx data (Natsoulis et al., 2008), labeled as direct DNA damage or other mechanism
• Positive: non-genotoxic hepatocarcinogen
• Negative: genotoxic hepatocarcinogen (direct DNA damage, liver carcinogen) + non-carcinogen
• Training dataset: 8 positive and 32 negative chemicals
• Independent dataset (the same as Liu et al., 2011): 5 positive and 17 negative chemicals

Chemical-protein interactions
• STITCH (Search Tool for Interactions of Chemicals)
• STITCH is a resource to explore known and predicted interactions of chemicals and proteins
• Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature
• STITCH contains interactions between 300,000 small molecules and 2.6 million proteins from 1,133 organisms
• This study uses interactions from Rattus norvegicus

Example: Acetaminophen
• A widely used over-the-counter analgesic (pain reliever) and antipyretic (fever reducer)

Protein | Experimental | Database | Textmining | Combined score
10116.ENSRNOP00000055369 | 0 | 150 | 777 | 806
10116.ENSRNOP00000055898 | 0 | 0 | 170 | 170
10116.ENSRNOP00000055899 | 0 | 0 | 157 | 157
10116.ENSRNOP00000055979 | 0 | 150 | 0 | 150
10116.ENSRNOP00000056924 | 0 | 0 | 190 | 190
10116.ENSRNOP00000057452 | 0 | 150 | 154 | 259
10116.ENSRNOP00000059889 | 0 | 150 | 0 | 150
10116.ENSRNOP00000059937 | 0 | 0 | 157 | 157
10116.ENSRNOP00000060007 | 0 | 150 | 0 | 150
10116.ENSRNOP00000060118 | 0 | 0 | 204 | 204
10116.ENSRNOP00000060699 | 0 | 150 | 0 | 150
10116.ENSRNOP00000060976 | 279 | 0 | 0 | 279

Example: Chemical-chemical interaction

Chemical1 | Chemical2 | Similarity | Experimental | Database | Textmining | Combined score
CID149837371 | CID100000312 | 0 | 900 | 0 | 0 | 900
CID149835969 | CID100033005 | 0 | 0 | 0 | 211 | 211
CID146173085 | CID100000868 | 0 | 0 | 900 | 409 | 939
CID149786972 | CID100000193 | 791 | 900 | 0 | 0 | 900
CID149786972 | CID100002024 | 566 | 0 | 0 | 127 | 127
CID149786966 | CID100000312 | 0 | 900 | 0 | 0 | 900

Combined score
• The individual scores for a given chemical–protein or chemical–chemical interaction are combined into one overall score (von Mering, 2005)
• i.e.
Combined Score
• Bayesian scoring scheme
• $S = 1 - \prod_i (1 - S_i)$

Decision tree
• Simple and interpretable classifier
• Capable of generating interpretable rules for better understanding of biological problems
• C5.0, an improved version of C4.5 with smaller trees and less computation time, is applied in this study
• R package C50 (Kuhn and Weston, 2012)

Prediction performance

Model type | Classifier | Feature selection | #Feature | 5-CV Acc.
CPI | C5.0 | Information gain | 1 | 0.82
QSAR* | NCC | Wrapper-based (mRMR) | 15 | 0.76
TGx-1 day* | NCC | Wrapper-based (mRMR) | 90 | 0.87
TGx-3 day* | NCC | Wrapper-based (mRMR) | 90 | 0.87
TGx-5 day* | NCC | Wrapper-based (mRMR) | 90 | 0.90
* Model performance from Liu et al. (2011)

Independent test

Model type | #Feature | Acc. | Sen. | Spe. | MCC
CPI | 1 | 0.86 | 0.40 | 1.00 | 0.580
QSAR* | 15 | 0.55 | 0.20 | 0.65 | -0.138
TGx-1 day* | 90 | 0.77 | 0.40 | 0.88 | 0.307
TGx-3 day* | 90 | 0.77 | 0.20 | 0.94 | 0.206
TGx-5 day* | 90 | 0.82 | 0.60 | 0.88 | 0.482

Learning knowledge from whole dataset
• IF a chemical interacts with ABCC3 THEN non-genotoxic hepatocarcinogen
[Decision tree leaves: non-genotoxic hepatocarcinogen; genotoxic hepatocarcinogen + non-carcinogen]

ABCC3: ATP-binding cassette, subfamily C, member 3
• A member of the ATP-binding cassette (ABC) transporters, which transport various molecules across membranes
• Also known as the canalicular multispecific organic anion transporter 2; it exhibits drug transmembrane transporter activity -> critical for drug transport, multidrug resistance and bile acid transport pathways

Difference between CPI and TGx
• CPI-database scores of positive chemicals are significantly different from those of negative chemicals for ABCC3 (p < 0.05)
• p-values
  • TGx-1d: 0.26
  • TGx-3d: 0.30
  • TGx-5d: 0.41

Summary
• The mechanism of action of non-genotoxic hepatocarcinogenicity might involve complex regulations of proteins and chemicals
• This study presents a novel CPI-based method and demonstrates its effectiveness for biomarker identification and its superior prediction performance
• Compared to TGx methods requiring assessment of 100 gene expression values and 5 to 28-day experiments, the identified single biomarker could be more cost-effective and time-saving

Further improvement
• Protein-ligand docking
  • Distinguishable features of ABCC3 interactions between non-genotoxic hepatocarcinogenic and other chemicals
• Construction of a larger dataset
  • Only a few non-genotoxic hepatocarcinogens are defined (often with inconsistent definitions)
  • A more objective and larger dataset is required!
  • Mutagenicity (Ames test) data are readily available for a large number of chemicals

Prediction of Ames-negative hepatocarcinogens
• The Ames test is useful for identifying mutagenic carcinogens with an accuracy of 80% (Zeiger, 1998; Benigni et al., 2010)
• However, 48% of Ames-negative chemicals are carcinogens (Cunningham, 2012)
• Additional bioassays do not help in detecting carcinogens among Ames-negative chemicals (Zeiger, 2010)
• The assessment of Ames-negative hepatocarcinogens still depends on 2-year rodent bioassays
• Alternative methods!!

Computational methods for non-genotoxic hepatocarcinogens
• Quantitative structure-activity relationship (QSAR)
  • Slightly better than random (Accuracy=55%) (Liu et al., 2011)
• Toxicogenomics method (TGx)
  • Microarray data are only available for a small number of chemicals
• Chemical-protein interaction (CPI) and chemical-chemical interaction (CCI)
  • CPI > CCI >= TGx > QSAR (Tung, 2013; Tung and Jheng, 2014)
• The results are based on a small dataset consisting of only 62 chemicals
• A larger dataset needs to be collected for developing computational models

Motivation
[Diagram: chemicals split by Ames test result. Ames(+): accuracy = 80%; Ames(-): accuracy = 52%]
• The assessment of Ames-negative hepatocarcinogens still depends on 2-year rodent bioassays
• Alternative methods!!
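The accuracy, sensitivity, specificity and MCC values quoted in the slides above can be reproduced from confusion-matrix counts. The sketch below is a minimal illustration, not the authors' code. The counts for the CPI model (2 true positives, 3 false negatives, 17 true negatives, 0 false positives) are inferred here from the reported independent dataset of 5 positive and 17 negative chemicals with Sen. = 0.40 and Spe. = 1.00. The function names are hypothetical. The second helper assumes that the 52% accuracy for Ames(-) chemicals corresponds to calling every Ames-negative chemical a non-carcinogen when 48% of them are carcinogens.

```python
from math import sqrt


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative sample.
    """
    total = tp + fn + tn + fp
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # MCC is defined as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def all_negative_accuracy(positive_fraction: float) -> float:
    """Accuracy obtained by predicting every sample as negative."""
    return 1.0 - positive_fraction


# Counts inferred from the reported CPI independent-test row (an assumption).
cpi = classification_metrics(tp=2, fn=3, tn=17, fp=0)
for name, value in cpi.items():
    print(f"{name}: {value:.3f}")

# 48% of Ames-negative chemicals are carcinogens (Cunningham, 2012).
print(f"all-negative accuracy: {all_negative_accuracy(0.48):.2f}")
```

With these inferred counts the accuracy is 19/22 and the MCC comes out at about 0.58, in line with the CPI row of the independent-test table.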
Aims
• Collect a relatively large dataset
• Determine the best features for predicting Ames-negative hepatocarcinogens based on a decision tree algorithm
• Acquire decision rules for interpretation

Dataset
• NCTRlcdb: a National Center for Toxicological Research liver cancer database, 999 chemicals (Young et al., 2004)
  • Liver carcinogen (273)
  • Other carcinogen (293)
  • Non-carcinogen (304)
  • Other (129)
• 166 Ames-negative chemicals: 73 hepatocarcinogens and 93 noncarcinogens
  • 100 chemicals (60%, training set): model construction
  • 33 chemicals (20%, validation set): feature selection
  • 33 chemicals (20%, test set): independent test

Feature selection
• Step 1) Features with near-zero variance were removed
  • Baseline model
• Step 2) The minimum redundancy-maximum relevancy (mRMR) method (De Jay, 2013) is used to rank feature importance
• Step 3) A sequential backward feature elimination algorithm is applied to iteratively remove the lowest-ranked features, selecting the feature subset that gives the highest 10-fold cross-validation (10-CV) accuracy
  • Model based on the selected feature subset

Results of feature selection

Method | Number of features | Training (10-CV) | Validation
CCI-baseline | 223 | 64% | 72.73%
CCI-feature selection | 11 | 70% | 84.85%
QSAR-baseline | 612 | 49% | 57.58%
QSAR-feature selection | 27 | 69% | 72.73%

[Figure: accuracy (y-axis, 50% to 75%) versus number of selected features (x-axis, 2 to 52) for CCI and QSAR]

In addition to the mRMR method, three other methods (chi-square test, random forest variable importance, and Relief) were also evaluated and gave worse validation accuracies of 72.73%, 69.70% and 69.70%, respectively.
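Step 3 of the feature selection procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature ranking is taken as given (for example from mRMR), the function `backward_elimination` and the toy scoring function are hypothetical, and in practice `score_subset` would return the 10-fold cross-validation accuracy of a decision tree trained on the given features.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    ranked_features: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Drop the lowest-ranked feature one at a time and keep the best subset.

    ranked_features must be ordered from most to least important.
    Ties are resolved in favour of the smaller subset.
    """
    current = list(ranked_features)
    best_subset = list(current)
    best_score = score_subset(current)
    while len(current) > 1:
        current = current[:-1]  # remove the feature with the lowest rank
        score = score_subset(current)
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-in for cross-validation accuracy (hypothetical): two informative
# features raise the score, every extra noisy feature lowers it slightly.
INFORMATIVE = {"f1", "f2"}


def toy_score(features: List[str]) -> float:
    useful = sum(1 for f in features if f in INFORMATIVE)
    noisy = len(features) - useful
    return 0.5 + 0.1 * useful - 0.01 * noisy


selected, accuracy = backward_elimination(["f1", "f2", "f3", "f4", "f5"], toy_score)
print(selected, round(accuracy, 2))
```

Scoring subsets on cross-validation of the training set, and then checking the chosen subset on the separate validation set, mirrors the 60/20/20 split described in the dataset slide.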
Independent test

Metric | Validation CCI | Validation QSAR | Test CCI | Test QSAR
Accuracy (%) | 84.85 | 72.73 | 75.76 | 69.70
Sensitivity (%) | 78.57 | 57.14 | 50.00 | 71.43
Specificity (%) | 89.47 | 84.21 | 94.74 | 68.42
Precision (%) | 84.62 | 72.73 | 87.50 | 62.50
AUC | 0.8421 | 0.7030 | 0.7180 | 0.6880

[Figures: bar charts of accuracy, sensitivity, specificity, precision, and AUC for CCI and QSAR on the validation set and on the independent test set; y-axis from 0.40 to 1.00]

Decision tree and rules
• Five decision rules corresponding to five leaf nodes can be derived from the decision tree
• In brief,
  • IF a chemical interacts with one of the four chemicals THEN hepatocarcinogen (27 hepatocarcinogens are correctly predicted)
  • Otherwise, noncarcinogen (55 noncarcinogens are correctly predicted, with 18 misclassified hepatocarcinogens)

Decision tree

CID | Name | Note
CID000007579 | di-(4-aminophenyl)ether | Ames-positive carcinogens
CID000006324 | ethane |
CID000005897 | 2-acetylaminofluorene |
CID000187790 | deoxyguanosine | Ames-positive carcinogens

Summary
• Computational methods for hepatocarcinogenicity are important for efficient drug development compared with the traditional 2-year rodent bioassays
• This study developed an alternative method for predicting Ames-negative hepatocarcinogens
  • A decision tree-based method using CCI information and mRMR feature selection
• The prediction model performs well, with validation and test accuracies of 85% and 76%, respectively
• The acquired simple decision rules are useful for identifying Ames-negative hepatocarcinogens with high specificity and precision

Future works
• Target identification (pupylation)
  • Pupylation is a potential target for Mycobacterium tuberculosis
• Toxicity screening (hepatotoxicity)
• Bioactive compound screening (TIPdb)
• Apply advanced machine learning algorithms
• Screening of pupylation inhibitors
• Other toxicities
• Experimental validation

Predicting potential effects induced by
maleic acid

Maleic acid
• Maleic acid is the cis-isomer of butenedioic acid, used as a fragrance ingredient and pH adjuster in beauty products or cosmetics
• It is used in the manufacture of polymer products, including food packaging, and is listed as a legal indirect component in foods in both the United States and the European Union countries
• The oral LD50 values of maleic acid are 708 and 2400 mg/kg in rat and mouse, respectively
• Maleic anhydride, which is rapidly converted to maleic acid on contact with water, has been illegally added to modified starch to enhance favorable properties, such as elasticity
• The adulteration of modified starch with maleic anhydride raises concern about long-term human oral exposure to maleic acid, especially in Taiwan

Reported toxicity
• Nephrotoxicity in rabbits, rats and dogs
• Renal tubular injury and cell necrosis in the proximal tubules of treated rats
• Interference with renal proximal Na+ and H+ transport and inhibition of the activity of proximal tubule Na-K-ATPase and H-ATPase
• However, the toxicological effects of maleic acid on human health are still largely unknown

Aims
• Identify maleic acid-interacting proteins
• Infer functions, pathways and diseases affected by maleic acid
• Predict the ADMET profile of maleic acid

System flow
[Flowchart: CPI data from the STITCH database (101 proteins); Gene Ontology term enrichment analysis; pathway enrichment analysis; disease inference]
Davis et al. (2013) Nucleic Acids Res.

Chemical-GO term inference
[Diagram: chemical-gene interactions link the chemical to genes (THRB, AR, PPARA, ..., TGFB1); gene-GO term associations and enrichment analysis give GO terms with corrected p-values: Response to chemical (7.64e-162), Developmental process (9.37e-156), Membrane (1.39e-152), ..., Catalytic activity (2.41e-147)]

Gene Ontology (GO) terms
• The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data
• Many genes/proteins have Gene Ontology (GO) annotations that provide information about their associated biological processes, molecular functions, and cellular components
• The significance of enrichment was calculated using the hypergeometric distribution and adjusted for multiple testing using the Bonferroni method
• http://geneontology.org/

Enrichment analysis
• Identify functional annotations that are over-represented
• Hypergeometric distribution
  • K: the number of genes with the term t
  • N: the total number of genes
  • n: the number of selected genes
  • k: the number of selected genes with the term t

Bonferroni correction
• p<0.05 for 20 tests
  • p(at least one significant result) = 1 − p(no significant results)
  • p(at least one significant result) = $1 - (1 - 0.05)^{20}$
  • p(at least one significant result) = 0.64
• Bonferroni correction
  • p < 0.05/20 = 0.0025

Molecular functions (Top 10)

GO level | GO term name | GO term ID | Corrected p-value | No. of genes
4 | Glutamate receptor activity | GO:0008066 | 1.16E−53 | 23
5 | Ionotropic glutamate receptor activity | GO:0004970 | 4.40E−33 | 15
9 | Extracellular-glutamate-gated ion channel activity | GO:0005234 | 2.09E−32 | 15
3 | Transmembrane signaling receptor activity | GO:0004888 | 2.17E−31 | 42
2 | Signaling receptor activity | GO:0038023 | 4.88E−30 | 42
1 | Molecular transducer activity | GO:0060089 | 6.18E−29 | 44
2 | Signal transducer activity | GO:0004871 | 6.18E−29 | 44
1 | Receptor activity | GO:0004872 | 7.98E−29 | 43
8 | Excitatory extracellular ligand-gated ion channel activity | GO:0005231 | 1.84E−26 | 16
7 | Extracellular ligand-gated ion channel activity | GO:0005230 | 4.76E−23 | 16

Cellular component (Top 10)

GO level | GO term name | GO term ID | Corrected p-value | No. of genes
3 | Intrinsic to plasma membrane | GO:0031226 | 1.78E−40 | 49
2 | Plasma membrane part | GO:0044459 | 3.92E−37 | 53
2 | Plasma membrane | GO:0005886 | 3.70E−34 | 67
4 | Integral to plasma membrane | GO:0005887 | 4.20E−34 | 44
2 | Cell periphery | GO:0071944 | 1.28E−33 | 67
1 | Synapse part | GO:0044456 | 2.66E−29 | 28
3 | Ionotropic glutamate receptor complex | GO:0008328 | 4.08E−28 | 15
2 | Synaptic membrane | GO:0097060 | 4.22E−28 | 24
1 | Synapse | GO:0045202 | 2.22E−25 | 28
3 | Postsynaptic membrane | GO:0045211 | 1.96E−24 | 21

Biological process (Top 10)

GO level | GO term name | GO term ID | Corrected p-value | No. of genes
4 | Synaptic transmission | GO:0007268 | 1.08E−34 | 37
4 | Transmission of nerve impulse | GO:0019226 | 1.21E−32 | 37
3 | Multicellular organismal signaling | GO:0035637 | 3.51E−32 | 37
3 | Cell–cell signaling | GO:0007267 | 6.65E−32 | 41
3 | System process | GO:0003008 | 5.02E−29 | 45
1 | Cellular process | GO:0009987 | 1.04E−28 | 93
4 | Neurological system process | GO:0050877 | 2.33E−28 | 40
5 | Glutamate receptor signaling pathway | GO:0007215 | 2.75E−26 | 16
2 | Single-organism metabolic process | GO:0044710 | 4.13E−23 | 50
1 | Multicellular organismal process | GO:0032501 | 8.69E−23 | 63

Chemical-pathway inference
[Diagram: chemical-gene interactions link the chemical to genes (THRB, AR, PPARA, ..., TGFB1); gene-pathway associations and enrichment analysis give pathways with corrected p-values: Metabolism (1.14e-171), Pathway in cancer (4.88e-40), PPAR signaling pathway (4.30e-39), ..., Developmental biology (1.93e-37)]

Pathways (Top 10)

Pathway | Pathway ID | Corrected p-value
Neuroactive ligand-receptor interaction | KEGG:04080 | 1.35E−47
Glutamatergic synapse | KEGG:04724 | 2.91E−37
Signal transduction | REACT:111102 | 1.89E−18
Neuronal system | REACT:13685 | 5.11E−16
Calcium signaling pathway | KEGG:04020 | 1.45E−11
Long-term potentiation | KEGG:04720 | 4.36E−11
Metabolism | REACT:111217 | 4.57E−08
Metabolic pathways | KEGG:01100 | 1.16E−07
Cyanoamino acid metabolism | KEGG:00460 | 3.95E−07
Amyotrophic lateral sclerosis (ALS) | KEGG:05014 | 9.69E−07

Chemical-disease inference
[Diagram: chemical-gene interactions link the chemical to genes (THRB, AR, PPARA, ..., TGFB1); gene-disease associations give diseases with inference scores: Kidney Disease (2.28), Hypertension (16.21), Carcinoma, Hepatocellular (63.70), ..., Fatty liver (14.04)]

Inference Score
• The degree of similarity between CTD chemical–gene–disease networks and a similar scale-free random network
• Many biological networks, such as disease and metabolic networks, have been shown to be scale-free random networks [Barabasi et al. (1999) Science]
• Inference score = log(p1*p2)
  • p1: the first statistic takes into account the connectivity of the chemical and disease along with the number of genes used to make the inference
  • p2: the second statistic takes into account the connectivity of each of the genes used to make the inference
• King et al. (2012) PLoS One

Disease (Selected)

Category | Disease name | Disease MeSH ID | Corrected p-value | No. of genes
Mental disorder | Mental disorders | MESH:D001523 | 1.08E−23 | 34
Mental disorder | Mental disorders diagnosed in childhood | MESH:D019952 | 8.56E−18 | 22
Mental disorder | Schizophrenia and disorders with psychotic features | MESH:D019967 | 1.88E−16 | 16
Mental disorder | Substance-related disorders | MESH:D019966 | 2.80E−16 | 22
Mental disorder | Cocaine-related disorders | MESH:D019970 | 6.97E−12 | 11
Nervous system disease | Epilepsy | MESH:D004827 | 1.47E−16 | 18
Nervous system disease | Central nervous system diseases | MESH:D002493 | 9.37E−16 | 27
Nervous system disease | Brain diseases | MESH:D001927 | 9.86E−15 | 25
Nervous system disease | Nervous system diseases | MESH:D009422 | 1.79E−12 | 33
Cardiovascular disease | Vascular diseases | MESH:D014652 | 2.84E−05 | 14
Cancer | Neoplasms | MESH:D009369 | 8.71E−05 | 23

Proteins related to mental disorder

Predicted ADMET profile of maleic acid

Model | Result | Probability
Blood–brain barrier | Y | 0.9017
Human intestinal absorption | Y | 0.8740
P-glycoprotein substrate | N | 0.8006
P-glycoprotein inhibitor | N | >0.9808
Renal organic cation transporter | N | 0.9583
CYP inhibitory | Low | 0.9899
Human ether-a-go-go-related gene (hERG) inhibition, a prediction for arrhythmias | Weak/Non | >0.9836
Carcinogens | N | 0.5130

Summary
• GO analyses indicated that maleic acid could influence glutamate receptor activity and signal transmission in the nervous system
• Maleic acid was inferred to be associated with mental disorders, nervous system diseases, cardiovascular disease, and cancers in humans
• The predictions from QSAR models also suggested that maleic acid could penetrate into the brain after consumption
• This study provides both the potential risks and the mechanisms of applying maleic acid in food products
• The approach can identify potential risks of poorly characterized chemicals

Acknowledgment
• Dr. Chia-Chi Wang
• Dr. Ying-Chi Lin
• Dr. Ih-Sheng Chen
• Dr. Hsun-Shuo Chang
• Dr. Jih-Heng Li
• Jhao-Liang Jheng
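Supplementary sketch: the enrichment analysis and Bonferroni correction used in the maleic acid case study can be illustrated as follows. This is a minimal standard-library sketch, not the pipeline used in the study. It assumes the usual one-sided (over-representation) hypergeometric test with the symbols N, K, n and k from the enrichment analysis slide. The function names and the numbers in the example call are hypothetical.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: total number of genes, K: genes annotated with term t,
    n: selected genes, k: selected genes annotated with term t.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def family_wise_error_rate(alpha: float, tests: int) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** tests


def bonferroni_threshold(alpha: float, tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / tests


def bonferroni_adjust(p_value: float, tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * tests)


# Worked numbers from the Bonferroni slide: 20 tests at p < 0.05.
print(round(family_wise_error_rate(0.05, 20), 2))
print(bonferroni_threshold(0.05, 20))

# Hypothetical example: 1000 background genes, 50 carry term t,
# 100 genes are selected and 15 of them carry term t.
p = enrichment_p_value(N=1000, K=50, n=100, k=15)
print(p, bonferroni_adjust(p, tests=20))
```

Exact integer binomial coefficients are used so that very small tail probabilities are not lost to rounding before the final division.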