Lecture 12

Today’s topics
•General discussion on systems biology
•Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
•Self-organizing mapping (SOM)
What is systems biology?
Each lab or group has its own definition of systems biology.
This is because systems biology requires understanding and integrating different branches of science and different levels of omics information, and individual labs and groups work in different areas.
Theoretical target: understanding life as a system.
Practical targets: serving humanity by developing new-generation medical tests, drugs, foods, fuels, materials, sensors, logic gates, ...
Bioinformatics
[Figure: Integration of omics to define elements (genome, mRNAs, proteins, metabolites). Panels: Genome; Transcriptome, with activation (+) and repression (-); Proteome and Interactome, with proteins grouped into function units; Metabolome, measured by FT-MS and mapped onto metabolic pathways (Metabolite 1 to Metabolite 6). Outcomes: understanding the organism as a system (systems biology) and understanding species-species relations (survival strategy).]
Metabolomics: comprehensive and global analysis of diverse metabolites produced in cells and organisms.
Modelling can be extended to plant-human interaction.
[Figure: Plant systems biology (plant omics of medicinal herbs and prescriptions: metabolomics, proteome, interactome, transcriptome, ...) is connected to human systems biology (human omics: metabolomics, proteome, interactome, transcriptome, ...) through physiological activity and therapeutic usage, drawing on traditional and modern knowledge of medicinal plants. Together they form plant-human interacted systems biology.]
Okada, T., Afendi, FM., Amin, M., Takahashi, H., Nakamura, K., Kanaya, S.,
Current Computer Aided Drug Design, 179-196, 10, (2010)
(1) Comprehensive understanding of each layer
Each omics layer is represented as a data matrix X with N rows and M columns:
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1M} \\
x_{21} & x_{22} & \cdots & x_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NM}
\end{pmatrix}
\]
Methods:
•Principal component analysis
•BL-SOM
•DPClus
•...
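As an illustration of how one layer's data matrix X can be summarized, the following is a minimal sketch of principal component analysis restricted to the first component. It uses power iteration on the covariance matrix rather than a full eigendecomposition, and the toy matrix and the function name `first_principal_component` are assumptions made for this example, not part of the lecture material.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, scores) of the first principal component."""
    n, m = len(rows), len(rows[0])
    # Centre each column (variable) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (M x M).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, provided the start vector is not orthogonal to it.
    v = [1.0] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: projection of each centred sample onto the direction.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, scores

# Toy matrix X: 4 samples (rows) x 2 variables (columns), made up for illustration.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(X)
```

Because the second toy variable is exactly twice the first, the direction comes out proportional to (1, 2), and the scores order the four samples along that line.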
(2) Relation between layers
The response y (therapeutic usage, physiological activity, etc.) is modelled as a function of the composition data X (herb composition, metabolites in herbs):
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix},
\qquad
y = f(X),
\qquad
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1M} \\
x_{21} & x_{22} & \cdots & x_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NM}
\end{pmatrix}
\]
Mathematical modelling:
•Partial least squares
•Multiple regression analysis
•Discriminant analysis
(1,2) Multivariate analysis for plant-human interaction
•Partial least squares modelling
•Principal component analysis
•BL-SOM (batch-learning self-organizing map)
•DPClus (network clustering)
•...
These methods are applied to metabolomics, transcriptomics, and the other omics layers of plant and human systems biology.
This situation can be extended to plant-human interaction.
(3) Knowledge systematization of the interaction between humans and plants
•Database
(4) Systems biology for plant-human interaction
[Figure: the same diagram as on the earlier slides, linking plant systems biology and human systems biology through physiological activity and therapeutic usage into plant-human interacted systems biology.]
[1] Responsibility for synergistic activity
[2] Reduction of side effects in medication, given the complexity of diseases that arise from multifactorial causes
[3] Metabolites in plants interact with multiple target proteins in humans
-> regulate gene expression
-> lead to dynamic state changes in the human metabolome and physiological activity.
Metabolomics approach for determining growth-specific
metabolites based on FT-ICR-MS
[1] Metabolomics
[Figure: Tissue samples from each species are measured by MS. The resulting metabolite information (molecular weight and formula, fragmentation pattern, experimental information) is matched against the species-metabolite relation DB, which links species to their metabolites, in order to interpret the metabolome.]
Data processing, from FT-MS data acquisition in a time-series experiment to assessment of cellular conditions
(a) Metabolite quantities for time-series experiments
(b) Data preprocessing and construction of the data matrix
(c) Classification of ions into metabolite-derivative groups
(d) Annotation of ions as metabolites
(e) Assessment of cellular condition by metabolite composition
[Figure: E. coli growth curve, OD600 against time (0 to 800 min), sampled at time points T1 to T8; mass spectra on an m/z axis showing the monoisotopic peak (M) and the isotope peak (M+1).]
(b) Data matrix: rows are the time points (time 1 to time 8), columns are the metabolites (labelled metab. 1 to metab. 200 in the figure).
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1M} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2M} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Nj} & \cdots & x_{NM}
\end{pmatrix}
\]
(d) Example annotations:
Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli
664.1095 | 665.1168 | C21H29N7O14P2 | 665.1248 | 0.0080 | NADH | Escherichia coli
... | ... | ... | ... | ... | ... | ...
Software is provided by T. Nishioka (Kyoto Univ./Keio Univ.)
(c) Classification of ions into metabolite-derivative groups (DPClus)
[Figure: Correlation network for individual ions, clustered with DPClus. Nodes are labelled M-1 to M-17 and PG1 to PG10; clusters are labelled 1-1 to 1-6 and 2-1 to 2-3.]
Intensity ratio between the monoisotopic peak (M) and the isotope peak (M+1) -> number of carbons in the molecular formula.
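The link between the (M+1)/M intensity ratio and the carbon count can be sketched as follows. This is a rough illustration, assuming that the M+1 peak comes only from 13C (natural abundance of about 1.07 %) and ignoring the smaller contributions of 2H, 15N and 17O; the function name and the example intensities are hypothetical.

```python
# Natural abundance of 13C (about 1.07 %); 12C makes up the rest.
C13_ABUNDANCE = 0.0107

def estimate_carbon_count(intensity_m, intensity_m_plus_1):
    """Estimate the number of carbons from the (M+1)/M intensity ratio.

    Assumes the M+1 peak is caused by a single 13C substitution only.
    """
    ratio = intensity_m_plus_1 / intensity_m
    # Each carbon contributes roughly p / (1 - p) to the ratio.
    per_carbon = C13_ABUNDANCE / (1.0 - C13_ABUNDANCE)
    return round(ratio / per_carbon)

# Hypothetical intensities: an (M+1) peak at 8.7 % of the monoisotopic peak.
carbons = estimate_carbon_count(100.0, 8.7)
```

With these made-up intensities the estimate is 8 carbons, which would be consistent with a formula such as C8H16O2. For heteroatom-rich molecules the estimate should be treated as approximate.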
(d) Annotation of ions as metabolites using KNApSAcK DB
Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
253.2137 | 254.2210 | C16H30O2 | 254.2246 | 0.0036 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
253.2185 | 254.2258 | C16H30O2 | 254.2246 | 0.0012 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | Oleic acid | Escherichia coli
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | cis-11-Octadecanoic acid | Lactobacillus plantarum
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | omega-Cycloheptylundecanoic acid | Alicyclobacillus acidocaldarius
297.2410 | 298.2482 | C18H34O3 | 298.2508 | 0.0026 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2467 | 298.2540 | C18H34O3 | 298.2508 | 0.0032 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2516 | 298.2589 | C18H34O3 | 298.2508 | 0.0081 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
321.0506 | 322.0579 | C10H15N2O8P | 322.0566 | 0.0013 | dTMP | Escherichia coli K12
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | AMP | Escherichia coli
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | 3'-AMP | Escherichia coli
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | dGMP | Escherichia coli
401.0168 | 402.0241 | C10H16N2O11P2 | 402.0229 | 0.0012 | dTDP | Escherichia coli
402.9962 | 404.0035 | C9H14N2O12P2 | 404.0022 | 0.0013 | UDP | Escherichia coli
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | Adenosine 3',5'-bisphosphate | Escherichia coli
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | ADP | Escherichia coli
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | dGDP | Escherichia coli
454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18A2 | Actinomadura spiralis MI178-34F18
454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18C2 | Actinomadura spiralis MI178-34F18
458.1112 | 459.1185 | C15H22N7O8P | 459.1267 | 0.0083 | Phosmidosine B | Streptomyces sp. strain RK-16
495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin A | Streptomyces murayamaensis sp. nov.
495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin C | Streptomyces murayamaensis sp. nov.
505.9908 | 506.9981 | C10H16N5O13P3 | 506.9957 | 0.0023 | ATP, dGTP | Escherichia coli
547.0756 | 548.0829 | C16H26N2O15P2 | 548.0808 | 0.0020 | dTDP-L-rhamnose | Escherichia coli
565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-glucose | Escherichia coli
565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-galactose | Escherichia coli
606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-mannosamine | Escherichia coli
606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-glucosamine | Escherichia coli
618.0897 | 619.0970 | C17H27N5O16P2 | 619.0928 | 0.0042 | ADP-L-glycero-beta-D-mannoheptopyranose | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli
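The annotation step can be illustrated with a small mass-matching sketch. It assumes that the ions are singly charged [M-H]- ions, so the neutral mass is the detected m/z plus one proton mass; this is an inference from the roughly 1.0073 offset between the first two columns, not something stated on the slide. The candidate list is a two-entry stand-in for a database query, and the function names are hypothetical rather than the KNApSAcK interface.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074,
                "O": 15.994915, "P": 30.973762, "Cl": 34.968853}
PROTON = 1.007276  # mass of a proton (u)

def exact_mass(formula):
    """Monoisotopic mass of a formula such as 'C8H16O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def annotate(detected_mz, candidates, tolerance=0.01):
    """Return (name, formula, error) for candidates within the tolerance.

    Assumption: singly charged negative ion [M-H]-, so the neutral mass
    is the detected m/z plus one proton mass.
    """
    neutral = detected_mz + PROTON
    hits = []
    for name, formula in candidates:
        error = abs(neutral - exact_mass(formula))
        if error <= tolerance:
            hits.append((name, formula, round(error, 4)))
    return hits

# Two candidates taken from the table; a real search would query a database.
candidates = [("Glyoxylic acid", "C2H2O3"), ("Octanoic acid", "C8H16O2")]
hits = annotate(72.9878, candidates)
```

For the detected m/z of 72.9878 only glyoxylic acid falls within the tolerance, with an error of about 0.0053, which matches the first row of the table. Isomers that share one formula (for example AMP, 3'-AMP and dGMP) cannot be separated by mass alone, which is why several candidates appear for one ion.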
(e) Estimation of cell condition as a function of the composition of metabolites
[Figure: growth curve, OD600 against time (0 to 800 min), with measurement points T1 to T8.]
PLS (partial least squares regression model) extracts important combinations of metabolites when N (biological conditions) << M (metabolites).
X: metabolite matrix, N = 8 measurement points by M = 220 metabolites
Y: responses (cell condition), N = 8 by K = 1
\[
Y(\text{cell density}) = a_1 x_1 + \cdots + a_j x_j + \cdots + a_M x_M
\]
x_j: the quantity of the j-th metabolite
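To show how coefficients a_j can be obtained when there are fewer samples than metabolites, here is a minimal sketch of PLS regression with a single latent component (PLS1). It is a simplified stand-in, not the software used in the lecture: the data are made up, the function name is hypothetical, and a real analysis would normally use several components chosen by cross-validation.

```python
import math

def pls1_one_component(X, y):
    """Coefficients a_j of a one-component PLS1 model y ~ sum_j a_j * x_j.

    X is a list of N rows with M values each; y is a list of N responses.
    Both are mean-centred internally, so no intercept is returned.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in metabolite space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = Xc w, then regress y on the single score vector.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    # With one component the regression coefficients reduce to q * w.
    return [q * wj for wj in w]

# Made-up data: 3 time points (N) and 4 metabolites (M), so N < M.
X = [[1.0, 9.0, 5.0, 2.0],
     [2.0, 6.0, 5.0, 4.0],
     [3.0, 3.0, 5.0, 6.0]]
y = [0.1, 1.0, 1.9]   # hypothetical OD600 values
a = pls1_one_component(X, y)
```

In this toy example the first and fourth metabolites rise with cell density and get positive coefficients, the second falls and gets a negative coefficient, and the constant third metabolite gets zero. The sign of each a_j is what the lecture uses to separate the growth phases.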
(e) Assessment of cellular condition by metabolite composition
Detection of stage-specific metabolites (PLS model of OD600 on metabolite intensities)
\[
y(\text{OD600 cell density}) = a_1 x_1 + \cdots + a_j x_j + \cdots + a_M x_M
\]
x_j: the quantity of the j-th metabolite
a_j > 0: stationary-phase-dominant metabolites
a_j < 0: exponential-phase-dominant metabolites
[Figure: metabolites ranked by coefficient a_j, on an axis from 0.1 to -0.15. Red: E. coli metabolites; black: other bacterial metabolites. PG labels are supported by MS/MS analyses.]
Labelled on the a_j > 0 side (80 metabolites): dTDP-6-deoxy-L-mannose; Parasperone A; UDP-glucose, UDP-galactose; UDP-N-acetyl-D-glucosamine; UDP-N-acetyl-D-mannosamine; Lenthionine; omega-Cycloheptylnonanoate; omega-Cycloheptylundecanoate, cis-11-Octadecanoic acid; UDP; Octanoic acid; dTMP, dGMP, 3'-AMP; NADH; PG2, 4, 6, 8, 10
Labelled on the a_j < 0 side (120 metabolites): Argyrin G; omega-Cycloheptyl-alpha-hydroxyundecanoate; ATP, dGTP; dTDP; Glyoxylate; PG1, 3, 5, 7, 9; and, toward a_j = -0.15, ADP, Adenosine 3',5'-bisphosphate, dGDP; ADP-(D,L)-glycero-D-manno-heptose; NAD
10 phosphatidylglycerols (PGs) detected by MS/MS spectra
[Figure: structures of unsaturated PGs (exponential phase) and cyclopropanated PGs (stationary phase); cyclopropane formation converts the former into the latter.]
(b) Relation of mass differences among PG1 to PG10 (marker molecules)
Cluster | PG | Composition | Type
1 | PG5 | 30:1 (14:0, 16:1) | unsaturated
1 | PG1 | 32:1 (16:0, 16:1) | unsaturated
1 | PG3 | 34:1 (16:0, 18:1) | unsaturated
1 | PG6 | 31:0 (14:0, c17:0) | cyclopropanated
1 | PG2 | 33:0 (16:0, c17:0) | cyclopropanated
1 | PG4 | 35:0 (16:0, c19:0) | cyclopropanated
2 | PG7 | 34:2 (16:1, 18:1) | unsaturated
2 | PG9 | 36:2 (18:1, 18:1) | unsaturated
2 | PG8 | 35:1 (16:1, c19:0) | cyclopropanated
2 | PG10 | 37:1 (18:1, c19:0) | cyclopropanated
Observed mass differences between related PGs:
∆(CH2)2: 28.0281, 28.0315, 28.0298, 28.0237, 28.0330, 28.0314
CFA: 14.0170, 14.0187, 14.0110, 14.0181, 14.0197
US: 2.0138, 2.0051
Cyclopropane formation of PGs occurs in the transition from exponential to stationary phase.
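The three kinds of mass difference can be checked against theoretical values with a short sketch. It assumes the usual reading of the labels: ∆(CH2)2 is a two-carbon chain extension (C2H4), CFA is cyclopropane fatty acid formation (addition of CH2), and US is one degree of unsaturation (2 H). The tolerance of 0.02 and the function name are choices made for this example.

```python
# Monoisotopic masses (u) of hydrogen and carbon.
H, C = 1.007825, 12.0

# Theoretical mass differences for the three relations between PGs.
DIFFERENCES = {
    "delta(CH2)2": 2 * C + 4 * H,  # two-carbon chain extension, about 28.0313
    "CFA": C + 2 * H,              # cyclopropanation (+CH2), about 14.0157
    "US": 2 * H,                   # one unsaturation (2 H), about 2.0157
}

def classify_difference(observed, tolerance=0.02):
    """Return the label of the closest theoretical difference, or None."""
    label, theory = min(DIFFERENCES.items(),
                        key=lambda kv: abs(kv[1] - observed))
    return label if abs(theory - observed) <= tolerance else None

# Observed differences quoted on the slide.
labels = [classify_difference(v) for v in (28.0281, 14.0170, 2.0138)]
```

All observed values listed on the slide fall within 0.02 of one of the three theoretical differences, which is what allows the ten PGs to be arranged into a consistent network.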
Self-organizing maps
Time-series data
[Figure: growth curve (log scale) against time, sampled at stages 1, 2, ..., j, ..., T.]
When a time-series microarray experiment is measured, the gene expression profiles are represented by a D x T matrix, with one row vector x_i per gene and one column per stage:
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1T} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2T} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{iT} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{D1} & x_{D2} & \cdots & x_{Dj} & \cdots & x_{DT}
\end{pmatrix}
= \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_i \\ \vdots \\ \mathbf{x}_D \end{pmatrix}
\]
T: number of time-series microarray experiments
D: number of genes in a microarray
SOM makes it possible to examine gene similarity and stage similarity simultaneously.
Multivariate analysis of the expression matrix
•Expression similarity (between genes)
•Stage similarity (between time points)
•SOM examines the expression similarity of genes and the stage similarity simultaneously, which reveals states and state transitions.
BL-SOM is available at
http://kanaya.aist-nara.ac.jp/SOM/
SOM was developed by Prof. Teuvo Kohonen in the early 1980s.
Multi-dimensional input vectors are mapped onto a two-dimensional array of nodes.
In the original SOM, the output depends on the input order of the vectors.
To remove this problem, Prof. Kanaya developed BL-SOM (batch-learning SOM):
[1] Initial model vectors are determined by PCA of the data.
[2] The learning process of BL-SOM makes the output independent of the order of the input vectors.
SOM Algorithm
Source: “Clustering Challenges in Biological Networks” edited by S.
Butenko et. al.
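Since the algorithm figures are not reproduced in this text, the following is a minimal sketch of a batch-style SOM update, intended only to illustrate the idea of order-independent learning. It is not the BL-SOM implementation and not the algorithm from the cited source: it uses a one-dimensional chain of nodes instead of a two-dimensional array, and it replaces the PCA-based initialization with evenly spaced start points between the data extremes. All names and data are assumptions made for this example.

```python
import math

def best_matching_unit(nodes, vector):
    """Index of the node closest to the vector (squared Euclidean distance)."""
    dists = [sum((n[d] - vector[d]) ** 2 for d in range(len(vector)))
             for n in nodes]
    return dists.index(min(dists))

def batch_som(data, n_nodes=4, epochs=20, radius=1.0):
    """Train a 1-D batch SOM and return the node weight vectors."""
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Deterministic start: nodes evenly spaced between the data extremes
    # (a simple stand-in for the PCA-based initialization of BL-SOM).
    nodes = [[lo[d] + (hi[d] - lo[d]) * k / (n_nodes - 1) for d in range(dim)]
             for k in range(n_nodes)]
    for epoch in range(epochs):
        # Neighbourhood width shrinks over the epochs, with a floor of 0.5.
        sigma = max(radius * (1.0 - epoch / epochs), 0.5)
        # Batch step 1: find the best-matching node for every input vector.
        bmus = [best_matching_unit(nodes, v) for v in data]
        new_nodes = []
        for k in range(n_nodes):
            # Batch step 2: each node becomes a weighted mean of all inputs,
            # weighted by a Gaussian of the lattice distance to their BMU.
            h = [math.exp(-((k - b) ** 2) / (2 * sigma ** 2)) for b in bmus]
            total = sum(h)
            if total == 0.0:
                new_nodes.append(nodes[k])  # no input in reach: keep weights
                continue
            new_nodes.append([sum(h[i] * data[i][d] for i in range(len(data))) / total
                              for d in range(dim)])
        nodes = new_nodes
    return nodes

# Made-up data: two well-separated groups of three 2-D vectors.
data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
        [5.0, 5.0], [5.3, 5.1], [5.1, 5.4]]
nodes = batch_som(data)
```

Because every update uses all input vectors at once, presenting the data in a different order gives the same map up to floating-point rounding. An online SOM, which updates after each vector, does not have this property.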
Self-organizing mapping (summary)
[1] Detection method for transition points in gene expression and metabolite quantity based on the batch-learning self-organizing map (BL-SOM)
[2] Diversity of metabolites in species -> species-metabolite relation database
[Figure: each gene i is a point (x_i1, x_i2, ..., x_iT) in the T-dimensional expression space with axes X1, X2, ..., XT, one axis per time-series microarray experiment. The D x T expression matrix is the one defined for the time-series data.]
Arrangement of lattice points in the multi-dimensional expression space: lattice points are optimized to reflect the data distribution.
Gene classification: genes are classified into their nearest lattice points.
[Figure: gene i (x_i1, x_i2, ..., x_iT) and the lattice points in the expression space with axes X1, X2, ..., XT.]
Gene classification: genes with similar expression profiles are clustered into identical or neighbouring lattice points.
Feature mapping: in the k-th condition (k = 1, 2, ..., T), lattice points containing only highly expressed genes (e.g. X_k > Th(k)) are colored red, and lattice points containing only lowly expressed genes (e.g. X_k < -Th(k)) are colored blue. This gives one feature map per time point, X1 (Time 1), X2 (Time 2), X3 (Time 3), ..., XT (Time T), so the stages of the time series can be compared visually.
•The SOM is a non-linear projection of the multi-dimensional expression profiles of genes.
•The original dimension is conserved in the individual lattice points.
•Several types of information are stored in the SOM.
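The coloring rule for a feature map can be written as a short sketch. It assumes that expression values are centred (for example log ratios), so that a symmetric threshold of plus or minus Th is meaningful, and it uses a single threshold per condition. The gene names, lattice assignments and values are invented for illustration.

```python
def color_lattice_points(assignments, expression, k, threshold):
    """Color each lattice point for condition k.

    assignments: dict gene -> lattice point id
    expression:  dict gene -> list of expression values (one per condition)
    Returns dict lattice point -> "red", "blue" or "none".
    """
    # Collect the expression values in condition k for every lattice point.
    members = {}
    for gene, point in assignments.items():
        members.setdefault(point, []).append(expression[gene][k])
    colors = {}
    for point, values in members.items():
        if all(v > threshold for v in values):
            colors[point] = "red"      # only highly expressed genes
        elif all(v < -threshold for v in values):
            colors[point] = "blue"     # only lowly expressed genes
        else:
            colors[point] = "none"     # mixed or unremarkable
    return colors

# Hypothetical assignment of five genes to three lattice points.
assignments = {"g1": 0, "g2": 0, "g3": 1, "g4": 2, "g5": 2}
expression = {"g1": [1.5, -0.2], "g2": [2.0, 0.1], "g3": [-1.8, 0.3],
              "g4": [1.2, -1.1], "g5": [-0.4, -1.6]}
map_time1 = color_lattice_points(assignments, expression, k=0, threshold=1.0)
map_time2 = color_lattice_points(assignments, expression, k=1, threshold=1.0)
```

Comparing `map_time1` with `map_time2` shows which lattice points change color between two time points, which is the visual comparison the feature maps are meant to support.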
Estimation of transition points: Bacillus subtilis (LB medium)
(Data: Kazuo Kobayashi, Naotake Ogasawara (NAIST))
[Figure: feature maps for stages 1 to 8; growth curve in LB medium, cell density (OD600, log scale from 0.001 to 10) against time (0 to 1000 min); log(probability density) for the stages, on a scale from 0 (high probability) to -2000 (low probability).]
SOM for time-series expression profile: a state transition point is observed between stages 3 and 4.
Integrated analysis of gene expression profile and metabolite quantity data of Arabidopsis thaliana (sulfur deficiency vs. control; data provided by the K. Saito and M. Hirai group (PSC))
Nakamura et al (2004)
[Figure: feature maps for genes and for metabolites (m/z) in leaf and root, showing the state transition. Lattice points with large differences between 12 and 24 h are colored: blue, decreased; red, increased. Mass error is given in ppm.]
Accurate molecular weights -> candidate metabolites corresponding to the accurate molecular weights
3. Species-metabolite relation database
Download sites of BL-SOM
Riken: http://prime.psc.riken.jp/
NAIST: http://kanaya.naist.jp/SOM/
Application of BL-SOM to “-omics”
Genome
Kanaya et al., Gene, 276, 89-99 (2001)
Abe et al., Genome Res., 13, 693-702, (2003)
Abe et al., J.Earth Simulator, 6, 17-23, (2003)
Abe et al., DNA Res., 12, 281-290. (2005)
Transcriptome
Hasegawa et al., Plant Methods, 2:5:1-18 (2006)
Metabolome
Kim et al., J. Exp.Botany, 58, 415-424, (2007)
Fukusaki et al., J.Biosci.Bioeng., 100, 347-354, (2005)
Transcriptome and Metabolome
Hirai, M. Y., M. Klein, et al. J.Biol. Chem., 280, 25590-5 (2005)
Hirai, M. Y., M. Yano, et al. Proc Natl Acad Sci U S A 101, 10205-10 (2004)
Morioka, R, et al., BMC Bioinformatics, 8, 343, (2007)
Yano et al., J.Comput. Aided Chem.,7,125-136 (2007)
…
…
Some other popular clustering/classification algorithms:
K-means clustering
Support vector machines
Summary of bioinformatics tools developed in our laboratory
http://kanaya.naist.jp/~skanaya/Web/JTop.html
All software and databases are freely accessible via the Web.
Metabolomics
-- MS data processing
Transcriptome and Metabolomics Profiling
-- estimation of transition points
Species-metabolite DB
Network analysis: PPI
Transcriptomics
-- Statistics, Profiling, …
Some websites where we can find different types of data and links to other databases:
www.geneontology.org
www.genome.ad.jp/kegg
www.ncbi.nlm.nih.gov
www.ebi.ac.uk/databases
http://www.ebi.ac.uk/uniprot/
http://www.yeastgenome.org/
http://mips.helmholtz-muenchen.de/proj/ppi/
http://www.ebi.ac.uk/trembl
http://dip.doe-mbi.ucla.edu/dip/Main.cgi
www.ensembl.org