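Department of Computer Science and Engineering, Yuan Ze University

Applications of Bioinformatics in Natural Product Drug Development
Bioinformatics in Natural Product Research
Chun-Wei Tung
School of Pharmacy and Ph.D. Program in Toxicology, Kaohsiung Medical University
cwtung@kmu.edu.tw
http://cwtung.kmu.edu.tw
2014/11/28 (ROC 103/11/28) @ Department of Computer Science and Engineering, Yuan Ze University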
More money, fewer drugs
Annually, the North American and
European pharmaceutical industries
invest more than US$20 billion to identify
and develop new drugs, about 22% of
which is spent on screening assays and
toxicity testing
Sandra et al. (2004) EMBO Reports, 5, 837-842
What happened?
[Figure: A: effective/non-toxic -> B (healthy): effective? non-toxic?; species difference; individual difference]
Toxicities leading to drug withdrawal from the US market
Wilke et al. (2007) Nature Reviews Drug Discovery 6, 904-916
Hints for future drug design
• Economical and fast methods are required for drug discovery
• In addition to efficacy, toxicity/safety should be evaluated
• Species and individual differences must be considered
• Bioinformatics!!!
Computer-aided drug design
[Figure: Target identification; Bioactive compound screening; Toxicity screening]
Protein-ligand docking
• Predicts how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure
[Figure: protein-ligand docking]
Database for bioactive compound screening
[Figure: Target identification; Toxicity screening; Bioactive compound screening]
• C.W. Tung*, Y.C. Lin, H.S. Chang, C.C. Wang, I.S. Chen, J.L. Jheng and J.H. Li (2014) Database, 2014, bau055.
• C.W. Tung* (2014) Current Computer-Aided Drug Design. (In press)
• Y.C. Lin, C.C. Wang, I.S. Chen, J.L. Jheng, J.H. Li and C.W. Tung* (2013) The Scientific World Journal, 2013, 736386.
Plant-derived drugs
• Plants are valuable resources for the development of therapeutic agents
• E.g., traditional Chinese medicine
• The current global market for plant-derived drugs is worth more than 20 billion
• More than 60% of drugs are natural products, their derivatives, or natural-product mimics
• E.g., willow bark is the natural source of salicin, the precursor from which aspirin was developed
• Anti-inflammatory and antiplatelet effects
• However, only 10-15% of plant species have been explored for developing clinically important drugs
Opportunity
• Taiwan has a rich diversity of plants
• Owing to its unique geographical features and location
• Indigenous/endemic plants in Taiwan
• Valuable sources of novel pharmacologically active compounds
• Many studies identified novel compounds without further investigating their bioactivity
• A curated database of Taiwan indigenous plants is desirable!
TIPdb: Taiwan indigenous plant database
[Figure: TIPdb data sources: bioactivity, 2D structure, references, taxonomy, KNApSAcK (Afendi et al. (2012) Plant Cell Physiol., 53, e1); 3D structures (Balloon & DG-AMMOS; Merck Molecular Force Field, MMFF94; Puranen et al. (2010) J. Comput. Chem., 31, 1722-1732; Lagorce et al. (2009) BMC Chem. Biol., 9, 6)]
• TIPdb is a structured and searchable database of anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan
• 1,116 Taiwan indigenous plants
• 8,800 non-redundant 3D structures of phytochemicals
• 5,243 records of anticancer, antiplatelet, and antituberculosis activities
• http://cwtung.kmu.edu.tw/tipdb
Lipinski’s rule of five
• Four criteria derived from analyzing the physicochemical properties of more than 2,000 drugs
• Molecular weight ≤ 500 Da
• Octanol-water partition coefficient logP ≤ 5
• H-bond donors ≤ 5
• H-bond acceptors ≤ 10
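A minimal Python sketch of the rule of five, assuming the four descriptors have already been computed by a cheminformatics toolkit. The `Descriptors` class and the function names are illustrative, not from any library. It uses the commonly cited thresholds (MW ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors and 10 acceptors) and the common convention that a compound with at most one violation is still considered drug-like; the aspirin descriptor values are approximate.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physicochemical descriptors for one compound (illustrative container)."""
    mol_weight: float  # Daltons
    logp: float        # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five criteria the compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Commonly used convention: tolerate at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


aspirin = Descriptors(mol_weight=180.16, logp=1.19, h_donors=1, h_acceptors=4)  # approximate values
print(lipinski_violations(aspirin), is_drug_like(aspirin))  # [] True
```

Such a filter could be applied to the phytochemicals in a database like TIPdb to flag compounds likely to have poor oral absorption before further screening.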
Activity
Activity | No. of chemicals | No. of records
1 Cytotoxic | 494 | 2481
2 Anti-Platelet | 339 | 2448
3 Anti-Tuberculosis | 233 | 302
Target identification
[Figure: Target identification; Toxicity screening; Bioactive compound screening]
• C.W. Tung* (2012) BMC Bioinformatics, 13, 40. (Highly Accessed)
• C.W. Tung* (2013) Journal of Theoretical Biology, 336, 11-17.
Prokaryotic ubiquitin-like protein (Pup)
• The first post-translational protein modifier identified in prokaryotes
• 64 amino acids
• An important signal for the selective degradation of proteins
Julie Maupin-Furlow, Nature Reviews Microbiology 10, 100-111 (February 2012), doi:10.1038/nrmicro2696
PupDB
[Figure: PupDB contents: proteins (sites, function, sequence), Gene Ontology, 3D structure, references; tools: Browse, Search, BLAST]
Statistics
• 1391 proteins
• 268 with known pupylation sites
• 1123 without known pupylation sites
BLAST tool
• Predicts putative pupylation sites based on sequence similarity
http://cwtung.kmu.edu.tw/pupdb
Sequence-based prediction of pupylation sites
• No consensus motif
Aims
• Identify discriminant features for pupylation sites
• Develop prediction methods for pupylation sites
• Analyze the preference of pupylation sites
• Analyze the affected functions
Composition of k-spaced amino acid pairs
• CKSAAP with k=0, 1, 2, 3 and 4 is used to encode pupylation and nonpupylation sites as 2205-dimensional feature vectors.
• Considering the pair of A and C, the k-spaced amino acid pairs for k=0,
1, 2, 3 and 4 are represented as AC, AxC, AxxC, AxxxC and AxxxxC,
respectively.
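• For each k, a window is encoded as the 441-dimensional vector
$\left(\frac{N_{AA}}{N_{total}}, \frac{N_{AC}}{N_{total}}, \ldots, \frac{N_{\_\_}}{N_{total}}\right)_{441}$
where $N_{XY}$ is the number of k-spaced pairs XY in the window and $N_{total}$ is the total number of k-spaced pairs; 441 = 21 × 21 pairs over the 20 amino acids plus the gap symbol "_", and five values of k give 5 × 441 = 2205 features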
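A minimal Python sketch of the CKSAAP encoding above, assuming a 21-letter alphabet (20 standard amino acids plus "_" for positions padded beyond the protein ends) and $N_{total} = L - k - 1$ pairs for a window of length L. The helper `extract_window` and its padding convention are illustrative assumptions, not the iPUP implementation.

```python
from itertools import product

# Assumed alphabet: 20 standard amino acids plus '_' for padding beyond sequence ends.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY_"
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 21 * 21 = 441 pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def extract_window(sequence: str, center: int, size: int) -> str:
    """Return a window of odd length `size` centered on 0-based `center`, padded with '_'."""
    half = size // 2
    padded = "_" * half + sequence + "_" * half
    return padded[center:center + size]


def cksaap(window: str, k_max: int = 4) -> list[float]:
    """Encode a window as (k_max + 1) * 441 normalized k-spaced pair counts."""
    features: list[float] = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        n_total = len(window) - k - 1  # number of k-spaced pairs in the window
        for i in range(max(n_total, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in PAIR_INDEX:  # skip pairs containing non-standard residues
                counts[PAIR_INDEX[pair]] += 1
        features.extend(c / n_total if n_total > 0 else 0.0 for c in counts)
    return features


# Window of size 9 around the lysine (index 5) in the example sequence
window = extract_window("LASDFKASDFSAL", 5, 9)
print(window, len(cksaap(window)))  # ASDFKASDF 2205
```

Padding with "_" keeps every window the same length, so lysines near either terminus still produce pairs involving the gap symbol; such gap-containing pairs are what allow a classifier to learn positional preferences near the protein ends.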
System flow
1. Rank the importance of CKSAAP features using the χ² test
2. Search for the top p CKSAAP features giving the highest cross-validation AUC
3. Search for the optimal window size giving the highest AUC
[Figure: PupDB is split into a training dataset (162 proteins with 183 pupylation sites) and a test dataset (20 proteins with 29 pupylation sites). Training uses wrapper-based feature selection with a support vector machine (SVM) using the RBF kernel and a search for the optimal window size (e.g., sizes 5 and 9 around the lysine in LASDFKASDFSAL); the final model is then evaluated on the test dataset.]
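Steps 1 and 2 of the flow above can be sketched with scikit-learn, assuming a non-negative feature matrix `X` (e.g., CKSAAP frequencies) and binary labels `y`. This is a simplified illustration, not the iPUP code: the χ² ranking is computed once on all training data (which leaks a little information into the cross-validation), the SVM uses default RBF hyperparameters, and step 3 would wrap this function in an outer loop over candidate window sizes.

```python
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def select_top_p(X: np.ndarray, y: np.ndarray, candidate_ps: list[int], cv: int = 10):
    """Rank features by chi-squared score, then pick the top-p subset with the best CV AUC."""
    chi2_scores, _ = chi2(X, y)  # chi2 requires non-negative features
    order = np.argsort(np.nan_to_num(chi2_scores))[::-1]  # highest score first
    best_p, best_auc = None, -np.inf
    for p in candidate_ps:
        columns = order[:p]
        model = SVC(kernel="rbf")  # default C and gamma; these would normally be tuned
        auc = cross_val_score(model, X[:, columns], y, cv=cv, scoring="roc_auc").mean()
        if auc > best_auc:
            best_p, best_auc = p, auc
    return best_p, float(best_auc), order[:best_p]
```

Because the search is "wrapped" around the classifier's own cross-validated AUC, the chosen p reflects what the SVM can exploit rather than the χ² scores alone.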
10-fold cross-validation
• iPUP outperforms GPS-PUP (ΔAUC = 8%)
[Figure annotation: 6% (AUC) better than GPS-PUP]
Independent test
• 6% (AUC) better than GPS-PUP
[Figure: bar chart (%) comparing iPUP and GPS-PUP on balanced accuracy, accuracy, sensitivity, specificity, precision, MCC and AUC]
Feature importance
• Pairs containing the C-terminal space symbol: 5/25 (20%)
• Among lysines near the C-terminal end, 14.43% (14/97) are pupylation sites, nearly twice the 7.95% (212/2666) observed for all lysines
• In contrast, only 4.21% of lysines near the N-terminal end are pupylation sites, much lower than the 7.95% for all lysines
Overrepresented amino acid pairs
Amino acid pairs with positive values are overrepresented in pupylation sites; negative values indicate overrepresentation in non-pupylation sites.
• Gene set enrichment analysis
• -> Identify functions regulated by pupylation
[Figure: Target identification; Toxicity screening; Bioactive compound screening]
Toxicity screening
• C.W. Tung* and J.L. Jheng (2014) Neurocomputing, 145, 68-74.
• C.W. Tung* (2014) Lecture Notes in Computer Science, 8626, 1-9.
• C.W. Tung* (2013) Lecture Notes in Computer Science, 7986, 231-241.
Toxicity screening
• Too many chemicals, too few experimental data
• Computational methods are potential alternatives to experiments
• They are based on the analysis of previous knowledge and experimental data
• Example: prediction of non-genotoxic hepatocarcinogens
Wilke et al. (2007) Nature Reviews Drug Discovery 6, 904-916
Chemical hepatocarcinogenesis
• Exposure -> initiation, promotion and progression
• Carcinogenic chemicals
• Genotoxic carcinogens: interact directly with DNA (mutagenic)
• Non-genotoxic carcinogens: non-mutagenic
[Figure: direct (genotoxic) vs. indirect (non-genotoxic) effects on DNA]
Experimental methods
• Genotoxic hepatocarcinogenicity
• Several short-term in vitro and in vivo assays
• Non-genotoxic hepatocarcinogenicity
• 2-year rodent bioassays
• Labor-intensive, time-consuming and expensive
• Alternative methods are needed to efficiently prioritize chemicals with potential non-genotoxic hepatocarcinogenicity for further study
Quantitative Structure-Activity Relationship (QSAR)
• Chemical structure descriptors
[Figure: can structure descriptors separate non-genotoxic from genotoxic hepatocarcinogenicity?]
Toxicogenomics
• Toxicogenomics (TGx)
• Gene expression profile (Transcriptome data)
• Microarray
• Reported to outperform QSAR (Uehara et al., 2008; Liu et al., 2011; Yamada et al., 2012)
[Figure: DNA: QSAR (structure level); RNA: TGx (transcriptome level)]
Motivation
• Non-genotoxic carcinogenicity could be caused by chemical-protein
interactions
[Figure: DNA: genotoxic carcinogenicity (in vitro assays); RNA: non-genotoxic carcinogenicity (TGx); Protein: non-genotoxic carcinogenicity (chemical-protein interaction)]
Aims
• To develop computational methods based on chemical-protein interactions (CPI)
• To identify the critical proteins for assessing non-genotoxic hepatocarcinogenicity
• To compare the CPI method with QSAR and TGx
Dataset
• NCTRlcdb: National Center for Toxicological Research liver cancer database
[Figure: dataset construction]
• NCTRlcdb (Young et al., 2004): 999 chemicals
• Liver carcinogen (273), other carcinogen (293), non-carcinogen (304), other (129)
• Mechanism categories: direct DNA damage, other mechanism
• 62 chemicals with available TGx data (Natsoulis et al., 2008)
• Positive: non-genotoxic hepatocarcinogens
• Negative: genotoxic hepatocarcinogens + non-carcinogens
• Training dataset: 8 positive and 32 negative chemicals
• Independent dataset (the same as Liu et al., 2011): 5 positive and 17 negative chemicals
Chemical-protein interactions
• STITCH (Search Tool for Interactions of Chemicals)
• STITCH is a resource to explore known and predicted interactions of chemicals and proteins
• Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature
• STITCH contains interactions between 300,000 small molecules and 2.6 million proteins from 1,133 organisms
• This study uses interactions from Rattus norvegicus
Example: Acetaminophen
• A widely used over-the-counter
analgesic (pain reliever) and
antipyretic (fever reducer)
Protein | Experimental | Database | Text mining | Combined score
10116.ENSRNOP00000055369 | 0 | 150 | 777 | 806
10116.ENSRNOP00000055898 | 0 | 0 | 170 | 170
10116.ENSRNOP00000055899 | 0 | 0 | 157 | 157
10116.ENSRNOP00000055979 | 0 | 150 | 0 | 150
10116.ENSRNOP00000056924 | 0 | 0 | 190 | 190
10116.ENSRNOP00000057452 | 0 | 150 | 154 | 259
10116.ENSRNOP00000059889 | 0 | 150 | 0 | 150
10116.ENSRNOP00000059937 | 0 | 0 | 157 | 157
10116.ENSRNOP00000060007 | 0 | 150 | 0 | 150
10116.ENSRNOP00000060118 | 0 | 0 | 204 | 204
10116.ENSRNOP00000060699 | 0 | 150 | 0 | 150
10116.ENSRNOP00000060976 | 279 | 0 | 0 | 279
Example: Chemical-chemical interaction
Chemical 1 | Chemical 2 | Similarity | Experimental | Database | Text mining | Combined score
CID149837371 | CID100000312 | 0 | 900 | 0 | 0 | 900
CID149835969 | CID100033005 | 0 | 0 | 0 | 211 | 211
CID146173085 | CID100000868 | 0 | 0 | 900 | 409 | 939
CID149786972 | CID100000193 | 791 | 900 | 0 | 0 | 900
CID149786972 | CID100002024 | 566 | 0 | 0 | 127 | 127
CID149786966 | CID100000312 | 0 | 900 | 0 | 0 | 900
Combined score
• The individual scores for a given chemical–protein or chemical–
chemical interaction are combined into one overall score (von Mering,
2005)
• i.e. Combined Score
• Bayesian scoring scheme
• $S = 1 - \prod_{i} (1 - S_i)$
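A minimal sketch of the combination rule above, assuming channel scores on STITCH's 0-1000 scale. Applied to the first acetaminophen protein (database 150, text mining 777) it gives about 810 rather than the listed 806; the small gap suggests the database applies further corrections (for example, for a prior probability) that this sketch omits.

```python
from math import prod


def combined_score(channel_scores: list[float]) -> float:
    """Combine per-channel scores (0-1000 scale) as S = 1 - prod_i (1 - S_i)."""
    probabilities = [s / 1000 for s in channel_scores]  # rescale to 0-1
    return 1000 * (1 - prod(1 - p for p in probabilities))


# Channels: experimental, database, text mining (first acetaminophen row)
print(round(combined_score([0, 150, 777])))  # 810
```

The rule treats channels as independent evidence: the combined score is the probability that at least one channel is correct, so it is never lower than the best single channel.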
Decision tree
• Simple and interpretable classifier
• Capable of generating interpretable rules for better understanding of
biological problems
• C5.0, an improved version of C4.5 that builds smaller trees in less computation time, is used in this study
• R package C50 (Kuhn and Weston, 2012)
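Decision trees such as C4.5 and C5.0 choose splits that reduce class entropy, and the CPI model below uses information gain for feature selection. A minimal sketch of information gain for a threshold split on one numeric feature (for example, an interaction score); note that C4.5 normalizes this quantity into a gain ratio, which the sketch omits, and the scores and labels are made up.

```python
from collections import Counter
from math import log2


def entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values: list[float], labels: list[int], threshold: float) -> float:
    """Entropy reduction from splitting samples into values <= threshold and > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)


# Made-up interaction scores for six chemicals (1 = positive class, 0 = negative class)
scores = [0, 0, 150, 0, 800, 900]
labels = [0, 0, 0, 0, 1, 1]
print(information_gain(scores, labels, threshold=150))  # ≈ 0.918, the full label entropy
```

A split that separates the classes perfectly recovers all of the label entropy, which is why a single highly informative protein can yield a one-feature tree.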
Prediction performance
Model type | Classifier | Feature selection | #Features | 5-CV Acc.
CPI | C5.0 | Information gain | 1 | 0.82
QSAR* | NCC | Wrapper-based (mRMR) | 15 | 0.76
TGx-1 day* | NCC | Wrapper-based (mRMR) | 90 | 0.87
TGx-3 day* | NCC | Wrapper-based (mRMR) | 90 | 0.87
TGx-5 day* | NCC | Wrapper-based (mRMR) | 90 | 0.90
* Model performance from Liu et al. (2011)
Independent test
Model type | #Features | Acc. | Sen. | Spe. | MCC
CPI | 1 | 0.86 | 0.40 | 1.00 | 0.580
QSAR* | 15 | 0.55 | 0.20 | 0.65 | -0.138
TGx-1 day* | 90 | 0.77 | 0.40 | 0.88 | 0.307
TGx-3 day* | 90 | 0.77 | 0.20 | 0.94 | 0.206
TGx-5 day* | 90 | 0.82 | 0.60 | 0.88 | 0.482
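The metrics in these tables follow from the confusion matrix. A minimal sketch with the standard definitions; for the QSAR row, the counts TP = 1, FN = 4, TN = 11, FP = 6 are inferred from its sensitivity and specificity on the 5 positive and 17 negative independent-test chemicals, and they reproduce the listed accuracy (0.55) and MCC (-0.138).

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics for a binary classifier."""
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    precision = tp / (tp + fp) if tp + fp else math.nan
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": mcc,
    }


# Counts inferred for the QSAR row (5 positives, 17 negatives)
m = binary_metrics(tp=1, fp=6, tn=11, fn=4)
print(round(m["accuracy"], 2), round(m["mcc"], 3))  # 0.55 -0.138
```

With only 5 positives, accuracy is dominated by the negatives; MCC and balanced accuracy expose that the QSAR model is no better than chance here.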
Learning knowledge from whole dataset
[Figure: decision tree learned from the whole dataset, with leaves "Non-genotoxic hepatocarcinogen" and "Genotoxic hepatocarcinogen + non-carcinogen"]
• IF a chemical interacts with ABCC3 THEN non-genotoxic hepatocarcinogenicity
ABCC3: ATP-binding cassette, subfamily C,
member 3
• A member of the ATP-binding cassette (ABC) transporters, which transport various molecules across membranes
• Also known as canalicular multispecific organic anion transporter 2; exhibits drug transmembrane transporter activity
-> Critical for drug transport, multidrug resistance and bile acid transport pathways
Difference between CPI and TGx
• For ABCC3, the CPI database scores of positive chemicals differ significantly from those of negative chemicals (p < 0.05)
• p-values for the TGx data:
• TGx-1d: 0.26
• TGx-3d: 0.30
• TGx-5d: 0.41
Summary
• The mechanism of action of non-genotoxic hepatocarcinogenicity might involve complex regulation of proteins and chemicals
• This study presents a novel CPI-based method and demonstrates its effectiveness for biomarker identification and its superior prediction performance
• Compared with TGx methods, which require measuring 100 gene expression values and 5- to 28-day experiments, the identified single biomarker could be more cost-effective and time-saving
Further improvement
• Protein-ligand docking
• Distinguishable features of ABCC3 interactions between non-genotoxic
hepatocarcinogenic and other chemicals
• Construction of a larger dataset
• Only a few non-genotoxic hepatocarcinogens are defined (often with inconsistent definitions)
• A more objective and larger dataset is required!
• Mutagenicity (Ames test) data are readily available for a large number of chemicals
Prediction of Ames-negative
hepatocarcinogens
• The Ames test identifies mutagenic carcinogens with an accuracy of 80% (Zeiger, 1998; Benigni et al., 2010)
• However, 48% of Ames-negative chemicals are carcinogens (Cunningham, 2012)
• Additional bioassays do not help detect carcinogens among Ames-negative chemicals (Zeiger, 2010)
• The assessment of Ames-negative hepatocarcinogens still depends on 2-year rodent bioassays
• Alternative methods!!
Computational methods for non-genotoxic
hepatocarcinogens
• Quantitative structure-activity relationship (QSAR)
• Slightly better than random (Accuracy=55%) (Liu et al., 2011)
• Toxicogenomics method (TGx)
• Microarray data are only available for a small number of chemicals
• Chemical-protein interaction (CPI) and chemical-chemical interaction
(CCI)
• CPI > CCI >= TGx > QSAR (Tung, 2013; Tung and Jheng, 2014)
• These results are based on a small dataset of only 62 chemicals
• A larger dataset is needed for developing computational models
Motivation
[Figure: chemicals split by Ames test result: Ames(+) accuracy = 80%; Ames(-) accuracy = 52%]
• The assessment of Ames-negative hepatocarcinogens still depends on 2-year rodent bioassays
• Alternative methods!!
Aims
• Collect a relatively large dataset
• Determine the best features for predicting Ames-negative hepatocarcinogens using a decision tree algorithm
• Acquire decision rules for interpretation
Dataset
• NCTRlcdb: National Center for Toxicological Research liver cancer database
[Figure: dataset construction]
• NCTRlcdb (Young et al., 2004): 999 chemicals
• Liver carcinogen (273), other carcinogen (293), non-carcinogen (304), other (129)
• 166 Ames-negative chemicals: 73 hepatocarcinogens and 93 noncarcinogens
• 100 chemicals (60% training set) -> model construction
• 33 chemicals (20% validation set) -> feature selection
• 33 chemicals (20% test set) -> independent test
Feature selection
• Step 1) Features with near-zero variance are removed
• Baseline model
• Step 2) The minimum redundancy-maximum relevance (mRMR) method (De Jay et al., 2013) is used to rank feature importance
• Step 3) A sequential backward feature elimination algorithm iteratively removes the lowest-ranked features to select the feature subset giving the highest 10-fold cross-validation (10-CV) accuracy
• Model based on the selected feature subset
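The three steps can be sketched in Python with scikit-learn. This is an illustration under stated assumptions, not the study's implementation: `VarianceThreshold` stands in for a near-zero-variance filter, the mRMR ranking is a simple greedy "relevance minus mean redundancy" score computed with `mutual_info_score` on discrete features (not the mRMRe method cited above), and scikit-learn's CART `DecisionTreeClassifier` stands in for C5.0.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def remove_low_variance(X: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Step 1 (stand-in): keep only features whose variance exceeds `threshold`."""
    return VarianceThreshold(threshold=threshold).fit_transform(X)


def mrmr_rank(X: np.ndarray, y: np.ndarray) -> list[int]:
    """Step 2: greedy mRMR ranking (relevance minus mean redundancy) for discrete features."""
    n_features = X.shape[1]
    relevance = np.array([mutual_info_score(y, X[:, j]) for j in range(n_features)])
    ranked = [int(np.argmax(relevance))]
    remaining = set(range(n_features)) - set(ranked)
    while remaining:
        def mrmr_score(j: int) -> float:
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in ranked])
            return relevance[j] - redundancy

        best = max(remaining, key=mrmr_score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def backward_elimination(X: np.ndarray, y: np.ndarray, ranked: list[int], cv: int = 10):
    """Step 3: repeatedly drop the lowest-ranked feature; keep the subset with the best CV accuracy."""
    subset = list(ranked)
    best_subset, best_acc = list(subset), -1.0
    while subset:
        model = DecisionTreeClassifier(random_state=0)
        acc = cross_val_score(model, X[:, subset], y, cv=cv, scoring="accuracy").mean()
        if acc > best_acc:  # ties keep the larger subset found earlier
            best_subset, best_acc = list(subset), acc
        subset.pop()  # remove the lowest-ranked remaining feature
    return best_subset, float(best_acc)
```

Continuous CCI scores would first need discretizing (for example, by binning) before `mutual_info_score` applies; the binning scheme is another assumption.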
Results of feature selection
Method | Number of features | Training (10-CV) | Validation
CCI-baseline | 223 | 64% | 72.73%
CCI-feature selection | 11 | 70% | 84.85%
QSAR-baseline | 612 | 49% | 57.58%
QSAR-feature selection | 27 | 69% | 72.73%
[Figure: accuracy versus number of selected features (2-52) for CCI and QSAR]
In addition to mRMR, three other ranking methods (the chi-square test, random forest variable importance, and Relief) were evaluated and gave lower validation accuracies of 72.73%, 69.70%, and 69.70%, respectively.
Independent test
[Figure: bar charts of accuracy, sensitivity, specificity, precision and AUC for CCI and QSAR on the validation and independent test sets]
Metric | Validation CCI | Validation QSAR | Test CCI | Test QSAR
Accuracy (%) | 84.85 | 72.73 | 75.76 | 69.70
Sensitivity (%) | 78.57 | 57.14 | 50.00 | 71.43
Specificity (%) | 89.47 | 84.21 | 94.74 | 68.42
Precision (%) | 84.62 | 72.73 | 87.50 | 62.50
AUC | 0.8421 | 0.7030 | 0.7180 | 0.6880
Decision tree and rules
• Five decision rules, corresponding to the five leaf nodes, can be derived from the decision tree
• In brief:
• IF a chemical interacts with one of the four chemicals
• THEN hepatocarcinogen
• (27 hepatocarcinogens correctly predicted)
• Otherwise, noncarcinogen
• (55 noncarcinogens correctly predicted, with 18 misclassified hepatocarcinogens)
Decision tree
CID | Name | Note
CID000007579 | di-(4-aminophenyl)ether | Ames-positive carcinogens
CID000006324 | ethane |
CID000005897 | 2-acetylaminofluorene |
CID000187790 | deoxyguanosine | Ames-positive carcinogens
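The simplified rule ("interacts with any of the four chemicals -> hepatocarcinogen") can be written as a tiny classifier. This is a sketch under assumptions: the input format (a dict from partner CID to a STITCH combined score) and the `min_score` cutoff are hypothetical, while the actual tree uses learned score thresholds across five leaves. If the reported counts (27 + 55 correct, 18 misclassified) cover the 100 training chemicals, the rule's training accuracy is 82%.

```python
# CIDs of the four chemicals in the decision tree table above.
KEY_CHEMICALS = frozenset({"CID000007579", "CID000006324", "CID000005897", "CID000187790"})


def predict_ames_negative(cci_scores: dict[str, float], min_score: float = 0.0) -> str:
    """Hypothetical input: partner CID -> combined interaction score; min_score is an assumed cutoff."""
    interacts = any(cci_scores.get(cid, 0.0) > min_score for cid in KEY_CHEMICALS)
    return "hepatocarcinogen" if interacts else "noncarcinogen"


print(predict_ames_negative({"CID000005897": 900}))  # hepatocarcinogen
print(predict_ames_negative({"CID100000312": 900}))  # noncarcinogen
```

Because positives are called only on an interaction with a small set of partners, the rule favors specificity over sensitivity, matching the high specificity and precision reported for the test set.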
Summary
• Computational methods for hepatocarcinogenicity are important for efficient drug development compared with traditional 2-year rodent bioassays
• This study developed an alternative method for predicting Ames-negative hepatocarcinogens
• A decision tree-based method using CCI information and mRMR feature selection
• The prediction model performs well, with validation and test accuracies of 85% and 76%, respectively
• The acquired simple decision rules are useful for identifying Ames-negative hepatocarcinogens with high specificity and precision
Future works
[Figure: Target identification (pupylation); Toxicity screening (hepatotoxicity); Bioactive compound screening (TIPdb)]
• Pupylation is a potential target for Mycobacterium tuberculosis
• Apply advanced machine learning algorithms
• Screening of pupylation inhibitors
• Other toxicities
• Experimental validation
Predicting potential effects induced by maleic acid
Maleic acid
• Maleic acid is the cis isomer of butenedioic acid and is used as a fragrance ingredient and pH adjuster in beauty products and cosmetics
• It is used in the manufacture of polymer products, including food packaging, and is listed as a legal indirect food component in both the United States and the European Union
• The oral LD50 values of maleic acid are 708 and 2400 mg/kg in rats and mice, respectively
• Maleic anhydride, which is rapidly converted to maleic acid on contact with water, has been illegally added to modified starch to enhance favorable properties such as elasticity
• The adulteration of modified starch with maleic anhydride raises concern about long-term human oral exposure to maleic acid, especially in Taiwan
Reported toxicity
• Nephrotoxicity in rabbits, rats and dogs
• Renal tubular injury and cell necrosis in the proximal tubules of treated rats
• Interferes with renal proximal Na+ and H+ transport and inhibits the activity of proximal tubule Na-K-ATPase and H-ATPase
• However, the toxicological effects of maleic acid on human health are still largely unknown
Aims
• Identify maleic acid-interacting proteins
• Infer functions, pathways and diseases affected by maleic acid
• Predict the ADMET profile of maleic acid
System flow
[Figure: STITCH database -> CPI data (101 proteins) -> Gene Ontology term enrichment analysis, pathway enrichment analysis, and disease inference]
Davis et al. (2013) Nucleic Acids Res.
Chemical-GO term inference
[Figure: chemical -> chemical-gene interactions -> genes (e.g., THRB, AR, PPARA, TGFB1) -> gene-GO term associations -> enrichment analysis -> GO terms ranked by corrected p-value (e.g., Response to chemical, Developmental process, Membrane, Catalytic activity; corrected p-values from 7.64e-162 to 2.41e-147)]
Gene Ontology (GO) terms
• The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics, together with gene product annotation data
• Many genes/proteins have Gene Ontology (GO) annotations that provide information about their associated biological processes, molecular functions, and cellular components
• The significance of enrichment was calculated with the hypergeometric distribution and adjusted for multiple testing using the Bonferroni method
• http://geneontology.org/
Enrichment analysis
• Identify functional annotations that are over-represented
• Hypergeometric distribution
• K: the number of genes with the term t
• N: the number of total genes
• n: the number of selected genes
• k: the number of selected genes with the term t
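With these definitions, the enrichment p-value is the upper tail of the hypergeometric distribution, $P(X \ge k) = \sum_{i=k}^{\min(n,K)} \binom{K}{i}\binom{N-K}{n-i} / \binom{N}{n}$, i.e., the probability of seeing at least k genes with term t among n selected genes by chance. A minimal exact-arithmetic sketch using only the standard library (the function name and toy numbers are illustrative):

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N genes, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers: 20 genes in total, 5 annotated with term t, 4 selected, all 4 annotated
print(hypergeom_upper_tail(k=4, n=4, K=5, N=20))  # equals 5 / 4845
```

Summing exact integer counts before the single division avoids the rounding error that accumulates when many tiny probabilities are added as floats, which matters for p-values as small as those in the GO tables.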
Bonferroni correction
• Threshold p < 0.05 for 20 independent tests, with all null hypotheses true
• p(at least one significant result) = 1 - p(no significant results)
• p(at least one significant result) = 1 - (1 - 0.05)^20
• p(at least one significant result) = 0.64
• Bonferroni correction
• p < 0.05/20 = 0.0025
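The arithmetic above (assuming independent tests) and the Bonferroni adjustment, which multiplies each p-value by the number of tests and caps the result at 1, can be checked in a few lines; the function names are illustrative.

```python
def family_wise_error(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-corrected p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(round(family_wise_error(0.05, 20), 2))  # 0.64
print(bonferroni_adjust([0.001, 0.02, 0.6]))  # compare each adjusted value with 0.05
```

Comparing adjusted p-values with 0.05 is equivalent to comparing raw p-values with 0.05/m; the "corrected p-value" columns in the tables that follow are reported in this adjusted form.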
Molecular functions (Top 10)
Molecular function (MF)
GO level | GO term name | GO term ID | Corrected p-value | No. of genes
4 | Glutamate receptor activity | GO:0008066 | 1.16E−53 | 23
5 | Ionotropic glutamate receptor activity | GO:0004970 | 4.40E−33 | 15
9 | Extracellular-glutamate-gated ion channel activity | GO:0005234 | 2.09E−32 | 15
3 | Transmembrane signaling receptor activity | GO:0004888 | 2.17E−31 | 42
2 | Signaling receptor activity | GO:0038023 | 4.88E−30 | 42
1 | Molecular transducer activity | GO:0060089 | 6.18E−29 | 44
2 | Signal transducer activity | GO:0004871 | 6.18E−29 | 44
1 | Receptor activity | GO:0004872 | 7.98E−29 | 43
8 | Excitatory extracellular ligand-gated ion channel activity | GO:0005231 | 1.84E−26 | 16
7 | Extracellular ligand-gated ion channel activity | GO:0005230 | 4.76E−23 | 16
Cellular component (Top 10)
Cellular component (CC)
GO level | GO term name | GO term ID | Corrected p-value | No. of genes
3 | Intrinsic to plasma membrane | GO:0031226 | 1.78E−40 | 49
2 | Plasma membrane part | GO:0044459 | 3.92E−37 | 53
2 | Plasma membrane | GO:0005886 | 3.70E−34 | 67
4 | Integral to plasma membrane | GO:0005887 | 4.20E−34 | 44
2 | Cell periphery | GO:0071944 | 1.28E−33 | 67
1 | Synapse part | GO:0044456 | 2.66E−29 | 28
3 | Ionotropic glutamate receptor complex | GO:0008328 | 4.08E−28 | 15
2 | Synaptic membrane | GO:0097060 | 4.22E−28 | 24
1 | Synapse | GO:0045202 | 2.22E−25 | 28
3 | Postsynaptic membrane | GO:0045211 | 1.96E−24 | 21
Biological process (Top 10)
Biological process (BP)
GO level | GO term name | GO term ID | Corrected p-value | No. of genes
4 | Synaptic transmission | GO:0007268 | 1.08E−34 | 37
4 | Transmission of nerve impulse | GO:0019226 | 1.21E−32 | 37
3 | Multicellular organismal signaling | GO:0035637 | 3.51E−32 | 37
3 | Cell–cell signaling | GO:0007267 | 6.65E−32 | 41
3 | System process | GO:0003008 | 5.02E−29 | 45
1 | Cellular process | GO:0009987 | 1.04E−28 | 93
4 | Neurological system process | GO:0050877 | 2.33E−28 | 40
5 | Glutamate receptor signaling pathway | GO:0007215 | 2.75E−26 | 16
2 | Single-organism metabolic process | GO:0044710 | 4.13E−23 | 50
1 | Multicellular organismal process | GO:0032501 | 8.69E−23 | 63
Chemical-pathway inference
[Figure: chemical -> chemical-gene interactions -> genes (e.g., THRB, AR, PPARA, TGFB1) -> gene-pathway associations -> enrichment analysis -> pathways ranked by corrected p-value (e.g., Metabolism, Pathway in cancer, PPAR signaling pathway, Developmental biology; corrected p-values from 1.14e-171 to 1.93e-37)]
Pathways (Top 10)
Pathway | Pathway ID | Corrected p-value
Neuroactive ligand-receptor interaction | KEGG:04080 | 1.35E−47
Glutamatergic synapse | KEGG:04724 | 2.91E−37
Signal transduction | REACT:111102 | 1.89E−18
Neuronal system | REACT:13685 | 5.11E−16
Calcium signaling pathway | KEGG:04020 | 1.45E−11
Long-term potentiation | KEGG:04720 | 4.36E−11
Metabolism | REACT:111217 | 4.57E−08
Metabolic pathways | KEGG:01100 | 1.16E−07
Cyanoamino acid metabolism | KEGG:00460 | 3.95E−07
Amyotrophic lateral sclerosis (ALS) | KEGG:05014 | 9.69E−07
Chemical-disease inference
[Figure: chemical -> chemical-gene interactions -> genes (e.g., THRB, AR, PPARA, TGFB1) -> gene-disease associations -> diseases ranked by inference score (e.g., Kidney disease; Hypertension; Carcinoma, hepatocellular; Fatty liver; inference scores shown: 2.28, 16.21, 63.70, 14.04)]
Inference Score
• The degree of similarity between CTD chemical-gene-disease networks and a similar scale-free random network
• Many biological networks, such as disease and metabolic networks, have been shown to be scale-free random networks [Barabasi et al. (1999) Science]
• Inference score = -log(p1 × p2)
• p1: the first statistic accounts for the connectivity of the chemical and disease, along with the number of genes used to make the inference
• p2: the second statistic accounts for the connectivity of each of the genes used to make the inference
• King et al. (2012) PLoS One
Disease (Selected)
Category | Disease name | Disease MeSH ID | Corrected p-value | No. of genes
Mental disorder | Mental disorders | MESH:D001523 | 1.08E−23 | 34
Mental disorder | Mental disorders diagnosed in childhood | MESH:D019952 | 8.56E−18 | 22
Mental disorder | Schizophrenia and disorders with psychotic features | MESH:D019967 | 1.88E−16 | 16
Mental disorder | Substance-related disorders | MESH:D019966 | 2.80E−16 | 22
Mental disorder | Cocaine-related disorders | MESH:D019970 | 6.97E−12 | 11
Nervous system disease | Epilepsy | MESH:D004827 | 1.47E−16 | 18
Nervous system disease | Central nervous system diseases | MESH:D002493 | 9.37E−16 | 27
Nervous system disease | Brain diseases | MESH:D001927 | 9.86E−15 | 25
Nervous system disease | Nervous system diseases | MESH:D009422 | 1.79E−12 | 33
Cardiovascular disease | Vascular diseases | MESH:D014652 | 2.84E−05 | 14
Cancer | Neoplasms | MESH:D009369 | 8.71E−05 | 23
Proteins related to mental disorder
Predicted ADMET profile of maleic acid
Model | Result | Probability
Blood-brain barrier | Y | 0.9017
Human intestinal absorption | Y | 0.8740
P-glycoprotein substrate | N | 0.8006
P-glycoprotein inhibitor | N | >0.9808
Renal organic cation transporter | N | 0.9583
CYP inhibitory | Low | 0.9899
Human ether-a-go-go-related gene (hERG) inhibition, a predictor of arrhythmias | Weak/Non | >0.9836
Carcinogens | N | 0.5130
Summary
• GO analyses indicated that maleic acid could influence glutamate receptor activity and signal transmission in the nervous system
• Maleic acid was inferred to be associated with mental disorders, nervous system diseases, cardiovascular disease, and cancers in humans
• Predictions from QSAR models also suggested that maleic acid could penetrate into the brain after consumption
• This study provides insight into both the potential risks and the mechanisms of using maleic acid in food products
• The approach can identify potential risks of poorly characterized chemicals
Acknowledgment
• Dr. Chia-Chi Wang
• Dr. Ying-Chi Lin
• Dr. Ih-Sheng Chen
• Dr. Hsun-Shuo Chang
• Dr. Jih-Heng Li
• Jhao-Liang Jheng