Today's topics
• General discussion on systems biology
• Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
• Self-organizing mapping (SOM)

What is systems biology?
Each lab or group has its own definition of systems biology. Systems biology requires understanding and integrating different branches of science and different levels of omics information, and individual labs and groups work in different areas.
Theoretical target: understanding life as a system.
Practical targets: serving humanity by developing new generations of medical tests, drugs, foods, fuels, materials, sensors, logic gates, and more.

[Figure: Bioinformatics. Omics layers (genome, mRNAs, proteins, metabolites) are integrated to define elements:
• genome and transcriptome, with activation (+) and repression (−) relations;
• proteome and interactome, as functional units;
• metabolome measured by FT-MS, as metabolic pathways.
Together these lead to understanding the organism as a system (systems biology).]
Metabolomics is the comprehensive and global analysis of the diverse metabolites produced in cells and organisms. It also helps in understanding species-species relations (survival strategy).

[Figure: Plant-human interacted systems biology. Plant omics of medicinal herbs and prescriptions (metabolomics, proteome, interactome, transcriptome) are connected with human omics, physiological activity, and therapeutic usage. The links draw on traditional and modern knowledge of medicinal plants.]
Modelling can be extended to plant-human interaction (Okada, T., Afendi, F. M., Amin, M., Takahashi, H., Nakamura, K., Kanaya, S., Current Computer Aided Drug Design, 179-196, 10 (2010)).
(1) Comprehensive understanding of each layer: principal component analysis (PCA), BL-SOM, DPClus, ….

(2) Relations between layers, by mathematical modelling: partial least squares (PLS), multiple regression analysis, discriminant analysis.
Two kinds of data are related by a function f:
• the herb composition (metabolites in herbs) forms a data matrix X;
• the physiological activity, therapeutic usage, etc. form a response vector y.

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \qquad \mathbf{y} = f(X)$$

Row i of X and the value y_i describe the ith of N samples (e.g. herbs). Column j of X corresponds to the jth of M metabolites.

(1, 2) Multivariate analysis: PLS modelling, PCA, BL-SOM (metabolomics, transcriptomics, …), DPClus (network clustering), ….
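For layer (1), PCA summarizes the N × M matrix X by a few orthogonal directions. Here is a minimal sketch, under these assumptions:
• rows are samples (e.g. herbs) and columns are metabolites;
• the data are mean-centred without scaling;
• the function name `pca` and the toy values are made up for illustration.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int = 2):
    """PCA of an N x M matrix (rows = samples, columns = metabolites) via SVD."""
    Xc = X - X.mean(axis=0)                      # centre each metabolite column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates, N x k
    loadings = Vt[:n_components].T                      # metabolite weights, M x k
    ratio = s[:n_components] ** 2 / np.sum(s ** 2)      # explained-variance ratio
    return scores, loadings, ratio

# Hypothetical composition data: 5 herbs x 4 metabolites (made-up values).
X = np.array([[1.0, 2.0, 0.5, 3.0],
              [1.2, 1.8, 0.7, 2.9],
              [3.0, 0.5, 2.5, 1.0],
              [2.8, 0.7, 2.4, 1.2],
              [2.0, 1.2, 1.5, 2.0]])
scores, loadings, ratio = pca(X)
print(scores.shape, loadings.shape)   # (5, 2) (4, 2)
print(ratio)
```

Samples that lie close together in `scores` have similar metabolite compositions. Large entries in a column of `loadings` mark the metabolites that drive that component.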
(3) Knowledge systematization of the interaction between humans and plants: databases.

(4) Systems biology for plant-human interaction:
[1] Identifying what is responsible for synergistic activity.
[2] Reducing side effects in medication for complex diseases arising from multifactorial causes.
[3] Understanding how plant metabolites act in humans: they interact with multiple target proteins and regulate gene expression. This leads to dynamic state changes in the human metabolome and in physiological activity.
Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS

[1] Metabolomics
Tissue samples of each species are analysed by MS. This yields metabolite information: molecular weight and formula, and fragmentation pattern. That information is combined with experimental information (species) and matched against a species-metabolite relation database to interpret the metabolome.

Data processing: from FT-MS data acquisition in a time-series experiment to assessment of cellular conditions
(a) Metabolite quantities in a time-series experiment: E. coli is sampled at time points T1 to T8 along its growth curve (OD600 vs. time, 0 to 800 min).
(b) Data preprocessing and construction of the data matrix.
(c) Classification of ions into metabolite-derivative groups.
(d) Annotation of ions as metabolites (for example glyoxylic acid, octanoic acid, NAD, and NADH in E. coli).
(e) Assessment of cellular condition by metabolite composition.

(b) Data matrix
Detected m/z values are matched to theoretical m/z values. The ion intensities are then arranged with one row per time point (time 1 to time 8) and one column per metabolite:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2M} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nj} & \cdots & x_{NM} \end{pmatrix}$$

Here x_{ij} is the quantity of metabolite j at time point i. The software is provided by T. Nishioka (Kyoto Univ./Keio Univ.).

(c) Classification of ions into metabolite-derivative groups (DPClus)
A correlation network is built over the individual ions (M-1 to M-17 and PG1 to PG10 in the example) and clustered with DPClus. The intensity ratio between the monoisotopic ion (M) and its isotope ion (M+1) gives the number of carbons in the molecular formula (11 in the example).

(d) Annotation of ions as metabolites using the KNApSAcK DB
Exact masses and errors are in Da. The second column is the detected m/z plus 1.0073.

Detected m/z | m/z + 1.0073 | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
253.2137 | 254.2210 | C16H30O2 | 254.2246 | 0.0036 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
253.2185 | 254.2258 | C16H30O2 | 254.2246 | 0.0012 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | Oleic acid | Escherichia coli
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | cis-11-Octadecenoic acid | Lactobacillus plantarum
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | omega-Cycloheptylundecanoic acid | Alicyclobacillus acidocaldarius
297.2410 | 298.2482 | C18H34O3 | 298.2508 | 0.0026 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2467 | 298.2540 | C18H34O3 | 298.2508 | 0.0032 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2516 | 298.2589 | C18H34O3 | 298.2508 | 0.0081 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
321.0506 | 322.0579 | C10H15N2O8P | 322.0566 | 0.0013 | dTMP | Escherichia coli K12
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | AMP; 3'-AMP; dGMP | Escherichia coli
401.0168 | 402.0241 | C10H16N2O11P2 | 402.0229 | 0.0012 | dTDP | Escherichia coli
402.9962 | 404.0035 | C9H14N2O12P2 | 404.0022 | 0.0013 | UDP | Escherichia coli
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | Adenosine 3',5'-bisphosphate; ADP; dGDP | Escherichia coli
454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18A2; Antibiotic MI 178-34F18C2 | Actinomadura spiralis MI178-34F18
458.1112 | 459.1185 | C15H22N7O8P | 459.1267 | 0.0083 | Phosmidosine B | Streptomyces sp. strain RK-16
495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin A; Kinamycin C | Streptomyces murayamaensis sp. nov.
505.9908 | 506.9981 | C10H16N5O13P3 | 506.9957 | 0.0023 | ATP; dGTP | Escherichia coli
547.0756 | 548.0829 | C16H26N2O15P2 | 548.0808 | 0.0020 | dTDP-L-rhamnose | Escherichia coli
565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-glucose; UDP-D-galactose | Escherichia coli
606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-mannosamine; UDP-N-acetyl-D-glucosamine | Escherichia coli
618.0897 | 619.0970 | C17H27N5O16P2 | 619.0928 | 0.0042 | ADP-L-glycero-beta-D-mannoheptopyranose | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli

(e) Estimation of cell condition as a function of metabolite composition
[Figure: E. coli growth curve (OD600 vs. time, 0 to 800 min) with sampling points T1 to T8.]
PLS (partial least squares regression) extracts important combinations of metabolites.
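The slides state only that PLS regresses cell density on the metabolite intensity matrix. They do not say which PLS algorithm, scaling, or number of components was used. Below is a minimal sketch, assuming mean-centred data and the standard NIPALS PLS1 algorithm for a single response. The function name `pls1`, the OD600 values, and the intensity matrix are hypothetical.

```python
import numpy as np

def pls1(X: np.ndarray, y: np.ndarray, n_components: int = 2):
    """NIPALS PLS1: return (a, b0) such that y is approximated by X @ a + b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centred copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)             # weight vector over metabolites
        t = E @ w                          # latent score for each time point
        tt = t @ t
        p = E.T @ t / tt                   # X loading
        qa = (f @ t) / tt                  # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - qa * t                     # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    a = W @ np.linalg.solve(P.T @ W, q)    # regression coefficients a_1 ... a_M
    return a, y_mean - x_mean @ a

# Hypothetical data: OD600 at 8 time points and 20 metabolite intensities (N << M).
y = np.array([0.05, 0.1, 0.3, 0.8, 1.6, 2.5, 3.0, 3.1])
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(8, 20))
X[:, 0] += y        # metabolite 0 accumulates as the culture grows
X[:, 1] -= y        # metabolite 1 is depleted as the culture grows

a, b0 = pls1(X, y, n_components=2)
print("stationary-phase dominant (a_j > 0):", np.flatnonzero(a > 0))
print("exponential-phase dominant (a_j < 0):", np.flatnonzero(a < 0))
```

With a single component, each coefficient a_j is a positive multiple of the covariance between metabolite j and OD600. In that case the a_j > 0 / a_j < 0 reading is direct. Additional components refine the fit but can add small coefficients driven by noise.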
The number of biological conditions (measurement points) is much smaller than the number of metabolites (N << M). Here N = 8 time points and M = 220 metabolites. There is K = 1 response: the cell condition, measured as cell density (OD600). PLS expresses the response as a linear combination of metabolite quantities:

$$y(\text{cell density}) = a_1 x_1 + \cdots + a_j x_j + \cdots + a_M x_M$$

Here x_j is the quantity of the jth metabolite.

(e) Assessment of cellular condition by metabolite composition
Detection of stage-specific metabolites (PLS model of OD600 on metabolite intensities):
• a_j > 0: stationary-phase dominant metabolites
• a_j < 0: exponential-phase dominant metabolites

[Figure: PLS coefficients a_j (about -0.15 to 0.1), with selected ions checked by MS/MS analyses; E. coli metabolites in red, other bacterial metabolites in black.]

Stationary-phase dominant (80 metabolites):
• nucleotide sugars: dTDP-6-deoxy-L-mannose; UDP-glucose/UDP-galactose; UDP-N-acetyl-D-glucosamine/UDP-N-acetyl-D-mannosamine;
• fatty acids: omega-cycloheptylnonanoate; omega-cycloheptylundecanoate/cis-11-octadecenoic acid; octanoic acid;
• nucleotides and cofactors: UDP; dTMP/dGMP/3'-AMP; NADH;
• other: parasperone A; lenthionine; PG2, 4, 6, 8, 10.

Exponential-phase dominant (120 metabolites):
• nucleotides: ATP/dGTP; dTDP; ADP/adenosine 3',5'-bisphosphate/dGDP; ADP-(D,L)-glycero-D-manno-heptose;
• cofactor: NAD;
• other: argyrin G; omega-cycloheptyl-alpha-hydroxyundecanoate; glyoxylate; PG1, 3, 5, 7, 9.

Phosphatidylglycerols (PGs) detected by MS/MS spectra
[Figure: structures of unsaturated PGs (exponential phase) and cyclopropanated PGs (stationary phase).]

Relation of mass differences among the PG1 to PG10 marker molecules. Δ(CH2)2 means acyl chains longer by two CH2 groups; CFA means cyclopropane fatty acid formation; US means unsaturation.

Cluster 1:
• PG5 30:1 (14:0, 16:1) → PG1 32:1 (16:0, 16:1): Δ(CH2)2, 28.0281
• PG1 32:1 → PG3 34:1 (16:0, 18:1): Δ(CH2)2, 28.0315
• PG6 31:0 (14:0, c17:0) → PG2 33:0 (16:0, c17:0): Δ(CH2)2, 28.0298
• PG2 33:0 → PG4 35:0 (16:0, c19:0): Δ(CH2)2, 28.0237

Cluster 2:
• PG7 34:2 (16:1, 18:1) → PG9 36:2 (18:1, 18:1): Δ(CH2)2, 28.0330
• PG8 35:1 (16:1, c19:1) → PG10 37:1 (18:1, c19:0): Δ(CH2)2, 28.0314

Other differences between markers:
• unsaturated to cyclopropanated (CFA) differences: 14.0170, 14.0187, 14.0110, 14.0181, 14.0197
• unsaturation (US) differences: 2.0138, 2.0051

Cyclopropane formation in PGs occurs in the transition from the exponential to the stationary phase.
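Several steps above rest on accurate-mass arithmetic:
• the carbon count from the M/M+1 isotope ratio in step (c);
• the matching of ions to formulas in step (d);
• the CH2, C2H4, and H2 differences that link the PG markers.

The helpers below sketch these calculations. They are not the Nishioka or KNApSAcK software. The assumptions are:
• ions are [M-H]-, which matches the table's constant +1.0073 offset;
• the carbon estimate ignores M+1 contributions from H, N, and O;
• the tolerances, the function names, and the mini-database `DB` are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "P": 30.97376163, "S": 31.97207117, "Cl": 34.96885268}
PROTON = 1.00727646        # mass of H+ (u)
C13_ABUNDANCE = 0.0107     # natural abundance of 13C

def monoisotopic_mass(formula: str) -> float:
    """Exact mass of a formula such as 'C21H27N7O14P2'."""
    return sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

def estimate_carbons(m1_over_m: float) -> int:
    """Carbon count from the (M+1)/M intensity ratio, using 13C only (ratio = n*p/(1-p))."""
    return round(m1_over_m * (1 - C13_ABUNDANCE) / C13_ABUNDANCE)

def annotate(mz: float, db, tol: float = 0.01):
    """Candidates whose exact mass is within tol (Da) of mz + PROTON (assumes [M-H]- ions)."""
    neutral = mz + PROTON
    hits = [(name, f, abs(monoisotopic_mass(f) - neutral)) for name, f in db]
    return sorted((h for h in hits if h[2] <= tol), key=lambda h: h[2])

# Reference mass differences used to relate the PG marker molecules.
DELTAS = {"C2H4: acyl chain longer by two CH2": monoisotopic_mass("C2H4"),
          "CH2: cyclopropanation (CFA)": monoisotopic_mass("CH2"),
          "H2: one degree of unsaturation (US)": monoisotopic_mass("H2")}

def classify_difference(delta: float, tol: float = 0.015):
    """Name of the nearest reference difference, or None if none lies within tol (Da)."""
    name, ref = min(DELTAS.items(), key=lambda kv: abs(kv[1] - delta))
    return name if abs(ref - delta) <= tol else None

# Hypothetical mini-database built from entries in the annotation table.
DB = [("Glyoxylic acid", "C2H2O3"), ("Octanoic acid", "C8H16O2"),
      ("Oleic acid", "C18H34O2"), ("AMP", "C10H14N5O7P"),
      ("dGMP", "C10H14N5O7P"), ("NAD", "C21H27N7O14P2")]

print(annotate(346.0570, DB))      # AMP and dGMP, error about 0.0012
print(estimate_carbons(0.119))     # 11
for d in (28.0281, 14.0170, 2.0138):
    print(d, classify_difference(d))
```

Isomers such as AMP and dGMP share a formula, so accurate mass alone cannot separate them. This is why the slides list several candidates per ion and turn to MS/MS.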
Self-organizing maps

Time-series data
[Figure: growth curve (cell density vs. time) divided into stages 1, 2, …, j, …, T.]
When time-series microarray data are measured, the gene expression profiles form a matrix. T is the number of time-series microarray experiments and D is the number of genes on the microarray:

$$\begin{array}{c|cccccc} \text{Stage} & \text{Gene}_1 & \text{Gene}_2 & \cdots & \text{Gene}_i & \cdots & \text{Gene}_D \\ \hline 1 & x_{11} & x_{21} & \cdots & x_{i1} & \cdots & x_{D1} \\ 2 & x_{12} & x_{22} & \cdots & x_{i2} & \cdots & x_{D2} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ j & x_{1j} & x_{2j} & \cdots & x_{ij} & \cdots & x_{Dj} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ T & x_{1T} & x_{2T} & \cdots & x_{iT} & \cdots & x_{DT} \\ \hline & \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_i & \cdots & \mathbf{x}_D \end{array}$$

SOM makes it possible to examine expression similarity of genes and stage similarity simultaneously, and so to follow state transitions.

BL-SOM is available at http://kanaya.aist-nara.ac.jp/SOM/
• SOM was developed by Prof. Teuvo Kohonen in the early 1980s.
• Multi-dimensional data (input vectors) are mapped onto a two-dimensional array of nodes.
• In the original SOM, the output depends on the order in which the input vectors are presented. To remove this problem, Prof. Kanaya developed BL-SOM (batch-learning SOM):
[1] Initial model vectors are determined by PCA of the data.
[2] The batch-learning process makes the output independent of the order of the input vectors.

SOM algorithm
[Figure: SOM algorithm. Source: "Clustering Challenges in Biological Networks", edited by S. Butenko et al.]
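The two BL-SOM ideas above, PCA-based initialization and batch learning, can be sketched as below. This is a generic batch SOM, not Kanaya's released BL-SOM code. The following are assumptions:
• the Gaussian neighbourhood and its geometric shrinking schedule;
• the 6 × 4 lattice;
• the names `pca_init`, `bl_som`, and `feature_map`;
• the threshold value.

Input vectors are one row per gene, i.e. the transpose of the stage × gene table. `feature_map` implements the red/blue rule given in the SOM summary, with a made-up threshold.

```python
import numpy as np

def pca_init(X, rows, cols):
    """Spread rows x cols lattice nodes over the plane of the first two principal components."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    # Fix each PC's sign so the result does not depend on the order of the input rows.
    Vt = Vt * np.sign(Vt[np.arange(len(Vt)), np.abs(Vt).argmax(axis=1)])[:, None]
    sd = s[:2] / np.sqrt(len(X))                  # spread of the data along PC1 and PC2
    u = np.linspace(-1.0, 1.0, rows)[:, None, None]
    v = np.linspace(-1.0, 1.0, cols)[None, :, None]
    grid = mean + 2 * sd[0] * u * Vt[0] + 2 * sd[1] * v * Vt[1]
    return grid.reshape(rows * cols, -1)          # (rows*cols, T)

def bl_som(X, rows=6, cols=4, n_iter=20, sigma0=2.0, sigma_end=0.5):
    """Batch-learning SOM: every epoch updates all nodes from all inputs at once."""
    nodes = pca_init(X, rows, cols)
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    lattice_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)  # (K, K)
    for it in range(n_iter):
        # Shrink the neighbourhood radius geometrically from sigma0 to sigma_end.
        sigma = sigma0 * (sigma_end / sigma0) ** (it / max(n_iter - 1, 1))
        d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=-1)             # (N, K)
        bmu = d2.argmin(axis=1)                   # best-matching node of each gene
        h = np.exp(-lattice_d2[:, bmu] / (2 * sigma ** 2))                        # (K, N)
        nodes = (h @ X) / h.sum(axis=1, keepdims=True)   # neighbourhood-weighted means
    bmu = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return nodes.reshape(rows, cols, -1), bmu

def feature_map(X, bmu, n_nodes, k, th):
    """+1 (red) if every gene on a node has x_k > th, -1 (blue) if every one has x_k < -th."""
    colors = np.zeros(n_nodes, dtype=int)
    for node in range(n_nodes):
        vals = X[bmu == node, k]
        if vals.size and np.all(vals > th):
            colors[node] = 1
        elif vals.size and np.all(vals < -th):
            colors[node] = -1
    return colors

# Hypothetical data: 40 genes x 5 time points.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
nodes, bmu = bl_som(X)
colors = feature_map(X, bmu, n_nodes=24, k=0, th=0.5)
print(colors.reshape(6, 4))
```

Each epoch recomputes every node from all genes at once. As a result, shuffling the rows of X leaves the trained map unchanged up to rounding, which is the order-independence property BL-SOM is designed for.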
Self-organizing mapping (summary)
[1] A method for detecting transition points in gene expression and metabolite quantity, based on batch-learning SOM (BL-SOM).
[2] Diversity of metabolites in species: species-metabolite relation database.

Arrangement of lattice points in the multi-dimensional expression space (axes X1, X2, …, XT; gene i is the point (x_{i1}, x_{i2}, …, x_{iT})):
• Lattice points are optimized to reflect the data distribution.
• Gene classification: each gene is assigned to its nearest lattice point, so genes with similar expression profiles are clustered to identical or nearby lattice points.

Feature mapping (X1 = time 1, X2 = time 2, …, XT = time T):
• For the kth condition, lattice points containing only highly expressed genes are colored red, and those containing only lowly expressed genes are colored blue. For example, red where x_k > Th(k) and blue where x_k < −Th(k), for k = 1, 2, …, T.
• The SOM is a non-linear projection of multi-dimensional gene expression profiles. The original dimensions are conserved in the individual lattice points, so several types of information are stored in the SOM.
• The feature maps allow each stage of the time-series data to be compared visually.

Estimation of transition points: Bacillus subtilis in LB medium (data: Kazuo Kobayashi, Naotake Ogasawara (NAIST))
[Figure: growth curve (cell density, OD600, vs. time in min) divided into stages 1 to 8, and the SOM of the time-series expression profiles colored by log probability density, from high to low.]
A state transition point is observed between stages 3 and 4.

Integrated analysis of gene expression profiles and metabolite quantity data of Arabidopsis thaliana (sulfur deficiency vs. control; data provided by the K. Saito and M. Hirai group (PSC))
[Figure: state transitions and feature maps for genes and metabolites (m/z) in leaf and root. Lattice points with large differences between 12 and 24 h are marked: blue, decreased; red, increased.]
Metabolites are identified from accurate molecular weights, with the error rate in ppm. Candidate metabolites corresponding to each accurate molecular weight are retrieved from the species-metabolite relation database (Nakamura et al. (2004)).

Download sites of BL-SOM
• RIKEN: http://prime.psc.riken.jp/
• NAIST: http://kanaya.naist.jp/SOM/

Application of BL-SOM to "-omics"
Genome
• Kanaya et al., Gene, 276, 89-99 (2001)
• Abe et al., Genome Res., 13, 693-702 (2003)
• Abe et al., J. Earth Simulator, 6, 17-23 (2003)
• Abe et al., DNA Res., 12, 281-290 (2005)
Transcriptome
• Hasegawa et al., Plant Methods, 2:5, 1-18 (2006)
Metabolome
• Kim et al., J. Exp. Botany, 58, 415-424 (2007)
• Fukusaki et al., J. Biosci. Bioeng., 100, 347-354 (2005)
Transcriptome and metabolome
• Hirai et al., J. Biol. Chem., 280, 25590-25595 (2005)
• Hirai et al., Proc. Natl. Acad. Sci. USA, 101, 10205-10210 (2004)
• Morioka et al., BMC Bioinformatics, 8, 343 (2007)
• Yano et al., J. Comput. Aided Chem., 7, 125-136 (2007)
…

Some other popular clustering/classification algorithms: k-means clustering, support vector machines.

Summary of bioinformatics tools developed in our laboratory (http://kanaya.naist.jp/~skanaya/Web/JTop.html)
All software and databases are freely accessible via the Web.
• Metabolomics: MS data processing
• Transcriptome and metabolome profiling: estimation of transition points
• Species-metabolite DB
• Network analysis: PPI
• Transcriptomics: statistics, profiling, …

Some websites where we can find different types of data and links to other databases
• www.geneontology.org
• www.genome.ad.jp/kegg
• www.ncbi.nlm.nih.gov
• www.ebi.ac.uk/databases
• http://www.ebi.ac.uk/uniprot/
• http://www.yeastgenome.org/
• http://mips.helmholtz-muenchen.de/proj/ppi/
• http://www.ebi.ac.uk/trembl
• http://dip.doe-mbi.ucla.edu/dip/Main.cgi
• www.ensembl.org