Bioinformatics Today
Prof. A.S. Kolaskar
Advisor, National Knowledge Commission

Data Escalation
• Full genome sequences
  – Prokaryotes ~350
  – Eukaryotes ~125
  – Viruses ~2,000
• Proteomes ~50
• Microarray data ~80
• 3-D Structures ~2,500
• Metabolomes ~10
~50 TB, doubles every twelve months

Data Explosion in Biology
[Figure: timeline 1990, 2000, 2004. Labels: Metabolic Pathways, Pharmacogenomics, Proteins, Medical Data Growth, Human Genome, Petabytes of Data, SNPs, Combinatorial Chemistry, External Research Partnerships, HTS, The Internet, Growth in Clinical Trials, ESTs, Mergers and Acquisitions.]

Biodiversity Information System
• Species2000 Database
  – Species 2000 & Integrated Taxonomic Information System (ITIS) have assembled 37 taxonomic databases (880,000 species). Also available on a free CD-ROM.
• Real-time access to the Species 2000 array of 26 online taxonomic databases (450,000 species)

Indian Bioresource Information Network (IBIN)
• Jeevsampada
• Sasyasampada
• Matsyasampada
• Plants of India
• Microbial Species Information System
• Forest Resource Information System

Matsya Sampada
Freshwater fin fishes and shell fishes: data on 1,849 species include literature, distribution, common name, coloration, images, synonyms, remarks, references, morphology, economic importance, size, etc. The datasets are collated at the National Bureau of Fish Genetic Resources, Lucknow.
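The fields listed for Matsya Sampada suggest a simple per-species record, and a vernacular search can be pictured as a lookup keyed on region and name. The following is a minimal sketch under that assumption: `SpeciesRecord`, `build_vernacular_index` and `vernacular_search` are hypothetical names, not the IBIN schema or API, and the two sample entries use only common names that appear elsewhere in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SpeciesRecord:
    """Hypothetical record holding a few of the fields listed for Matsya Sampada."""
    scientific_name: str
    # region or language -> vernacular names used there
    common_names: Dict[str, List[str]] = field(default_factory=dict)
    synonyms: List[str] = field(default_factory=list)


def build_vernacular_index(records: List[SpeciesRecord]) -> Dict[Tuple[str, str], List[str]]:
    """Map (region, lower-cased vernacular name) to matching scientific names."""
    index: Dict[Tuple[str, str], List[str]] = {}
    for rec in records:
        for region, names in rec.common_names.items():
            for name in names:
                index.setdefault((region, name.lower()), []).append(rec.scientific_name)
    return index


def vernacular_search(index: Dict[Tuple[str, str], List[str]], region: str, name: str) -> List[str]:
    """Case-insensitive lookup; returns an empty list when nothing matches."""
    return index.get((region, name.strip().lower()), [])


# Sample entries restricted to names shown in the deck.
records = [
    SpeciesRecord(
        "Mystus montanus",
        {"English": ["Wynaad Mystus"], "Kerala": ["Vari kallencoori"], "Maharashtra": ["Shingat"]},
        synonyms=["Bagrus montanus", "Macrones montanus"],
    ),
    SpeciesRecord(
        "Mystus cavasius",
        {"English": ["Gangetic Mystus"], "Maharashtra": ["Shingta"]},
    ),
]

index = build_vernacular_index(records)
print(vernacular_search(index, "Maharashtra", "Shingat"))
```

Keying the index on region as well as name keeps near-identical vernacular names such as Shingat and Shingta apart, and lets the same name in two regions resolve to different species.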
Searching Matsyasampada
• Vernacular Search
• Scientific Search
• Major Category
• Chosen: Vernacular Search
  – Search a vernacular name
  – Select Language: Maharashtra
  – Drop-down menu: Marathi names of all the fish
  – Chosen: Shingat

Information Available
• Classification
• Common names
• Synonym
• Images
• Distribution
• Morphology
• References

Taxonomic Information
Taxonomic Hierarchy
Class        Actinopterygii
SubClass     Neopterygii
SuperOrder   Ostariophysi
Order        Siluriformes
Family       Bagridae
SubFamily    -
Genus        Mystus
Species      montanus

Shingat
• You are looking for 'Mystus montanus'
• Common Names
  – English: Wynaad Mystus
  – Kerala: Vari kallencoori
  – Maharashtra: Shingat

Morphology (Mystus montanus)
• Morphology: Body elongate and compressed, its depth about 5 times in standard length. Head depressed; occipital process narrow, about 4 times as long as broad, reaching to basal bone of dorsal fin; median longitudinal groove on head not extending to occipital process.
• Color: In life, silvery above with a tinge of yellow along abdomen; a silvery line along the flank terminating in a dark spot at base of caudal fin; one or two light bands along the side above lateral line; a bluish spot on shoulder. Fins tinged with green.
• Fin formulae: D 7; A iii 9; P I 6; V i 5
• Fishery: This species, which attains a length of 15 cm, is of minor interest to fisheries.

Synonyms
Synonym                            Reference      Year
Bagrus montanus                    Jerdon         1849
Macrones montanus                  (Jerdon)       1849
Macrones montanus dibrugarensis    Chaudhuri      1913
Mystus montanus                    (Jerdon)       1849
Mystus vittatus dibrugarensis      (Chaudhuri)    1913

References
Economic Importance: -
Author: Talwar and Jhingran
Book: Inland Fisheries vol. 2
Page no.: 567

Images

Distribution

Shingta
• You are looking for 'Mystus cavasius'
• Common Names
• Andhra Pradesh: Muti-jhella Nahara-jella Thella jella
• Assam: Barsingarah Singarah
• Bihar: Palwa Tengra
• English: Gangetic Mystus
• Karnataka: Nai-kirle
• Maharashtra: Katima Khirkirya Shingta
• Orissa: Guntea Kontia Tengra
• Punjab: Kinger
• Tamil Nadu: Cutta Nai-kelunti Solai-kelunti Vazhappu Vella-kellette
• Uttar Pradesh: Kala-tenguah Kavasi Shingti Singhara Tenguah
• West Bengal: Kabasi-tengra Tengra

Images

Distribution

Explore & Have Fun
http://www.ibin.co.in/

Understanding Biology using Metabolic Pathways Database

Development of Metabolic Pathways and related tools
• Some of the major metabolic pathway databases are:
  – KEGG
  – Boehringer Mannheim
  – WIT
  – Ginsburg Malaria Database
  – BioCyc
  – PUMP-E

KEGG

KEGG: LIMITATIONS
• Pathways are composites of pathways found in many organisms (it is unclear which subpathways occur in a specific organism)
• Static visualization
• No detailed information about enzymes (inhibitors, subunits)
• No literature citations, no comments

Boehringer Mannheim: LIMITATIONS
• No query system
• Overloaded with information

WIT
• Better representation, but readability still needs to be improved

PUMP-E: Salient Features
• Dynamic representation of pathways
• Dynamically building the organism-specific pathways from genomic data
• Development of software for
  – Automated data updating (Perl scripts)
  – Reformatting and organization of relevant information from different databases
  – Drawing pathway diagrams
  – Comparison of pathways
  – Visualization of ligands, enzymes
  – Prediction of enzyme-substrate interactions
• URL: http://202.41.70.51/mpe/

PUMP-E

Comparison of Metabolic Pathways
Case studies: Malarial parasite, Mosquito & Human

Metabolome of Plasmodium falciparum
• Metabolic pathways of Plasmodium falciparum are known to be stage-specific.
• Asexual blood-stage parasites depend on glycolysis and conversion of pyruvate to lactate to derive energy.
• MS-MS studies carried out by Florens et al. (2002) revealed that the gametocyte and sporozoite stages of the malarial parasite contain peptides of enzymes known to be involved in the mitochondrial TCA cycle and oxidative phosphorylation.

Chromosomal locations of TCA cycle enzymes in Plasmodium falciparum
(Hs = Homo sapiens, Ag = Anopheles gambiae, Pf = Plasmodium falciparum)

Enzyme                                            Hs    Ag    Pf
Citrate synthase                                  12    3L    10
Aconitase                                         22    3R    13
Isocitrate dehydrogenase                          2     2L    13
Alpha-ketoglutarate dehydrogenase (E1)            7     2R    8
Alpha-ketoglutarate dehydrogenase (E2)            14    3L    13
Alpha-ketoglutarate dehydrogenase (E3)            7     3L    12
Succinyl CoA ligase                               13    2L    14
Succinate dehydrogenase (Cyt b560) (SDHC)         1     3L    -
Succinate dehydrogenase (Cyt b small) (SDHD)      11    X     -
Succinate dehydrogenase (flavoprotein) (SDHA)     5     3L    10
Succinate dehydrogenase (iron-sulfur) (SDHB)      1     2L    12
Fumarase                                          1*    2R*   9**
Malate dehydrogenase                              7     3R    6

• * Class II non-iron-dependent Fumarase
• ** Class I iron-dependent Fumarase

Comparison of TCA cycle enzymes of Plasmodium falciparum, Anopheles gambiae and Homo sapiens
[Figure: bar chart "Comparison of TCA cycle"; y-axis 0 to 90; x-axis Enzyme (SCoA, SDH_cytb560, SDH_cytb, SDH_flavo, SDH_iron, Fumarase, MDH, Cit_syn, Aconitase, ICDH, AKGDH1, AKGDH2, AKGDH3); series: Query: Human, Database: Anopheles; Query: Human, Database: Plasmodium; Query: Anopheles, Database: Plasmodium.]
• Plasmodium contains only two SDH subunits, in contrast to 4 SDH subunits in human & Anopheles
• Fumarase class I is present in Plasmodium, whereas Fumarase class II is present in human & Anopheles

TCA cycle: Comparison of proteome of host, vector and parasite revealed…
• TCA cycle-specific enzymes of Homo sapiens and Anopheles gambiae have a high degree of sequence identity.
• The Aconitase and Fumarase enzymes of Plasmodium falciparum show very low similarity with their human and mosquito counterparts.
• An iron regulatory protein that has a C-terminal domain similar to Aconitase is present in Plasmodium, and it likely carries out the function of the Aconitase enzyme.
• Fumarase (Class I), an iron-dependent enzyme, is present in Plasmodium, whereas Fumarase (Class II), a non-iron-dependent enzyme, is present in human and mosquito.
• Succinate dehydrogenase in Plasmodium contains only two subunits, in contrast to its human & mosquito counterparts, which have four subunits.

Comparative Metabolomics
Bacterial Identification

Total number of pathways in bacteria under study as per BioCyc 9.1

Organism name                          Phylum           Genome Size (Mbp)   Total number of pathways
Agrobacterium tumefaciens              Proteobacteria   5.673462            207
Bacillus anthracis                     Firmicutes       5.22729             254
Bacillus subtilis                      Firmicutes       4.21463             145
Caulobacter crescentus                 Proteobacteria   4.01695             176
Chlamydia trachomatis                  Chlamydiae       1.04252             61
Escherichia coli                       Proteobacteria   4.63968             198
Francisella tularensis                 Proteobacteria   1.89282             184
Haemophilus influenzae                 Proteobacteria   1.83014             127
Helicobacter pylori                    Proteobacteria   1.66787             123
Mycoplasma pneumoniae                  Firmicutes       0.816394            48
Mycobacterium tuberculosis CDC1551     Actinobacteria   4.40384             186
Mycobacterium tuberculosis H37Rv       Actinobacteria   4.41153             184
Shigella flexneri                      Proteobacteria   4.6072              179
Treponema pallidum                     Spirochaetes     1.13801             56
Vibrio cholerae                        Proteobacteria   4.03346             207

Pathways identical in all 15 bacteria under study with respect to E. coli
• Aspartate biosynthesis and degradation
• Gluconeogenesis
• Glycolysis
• Pentose-Phosphate-Cycle
• Purine nucleotides de novo biosynthesis I
• De novo biosynthesis of pyrimidine ribonucleotides
• Salvage pathway of adenine, hypoxanthine and their nucleotides
• Salvage pathways of pyrimidine ribonucleotides

Importance of identical pathways
• Aspartate is a constituent of proteins and participates in several biosynthesis pathways, such as de novo biosynthesis of pyrimidine ribonucleotides, purine nucleotide de novo biosynthesis I, NAD biosynthesis I and pantothenate biosynthesis. Approximately 27 percent of the cell's nitrogen flows through aspartate.
• Gluconeogenesis is a process by which glucose is generated.
  Glucose is an important source of energy.
• Glycolysis is one of the most important metabolic processes. It is known to be present in all types of organisms.
• Pentose-Phosphate-Cycle is one of the essential pathways of central metabolism. This pathway is an important source of NADPH.
• Bases and nucleosides that are formed during degradation of RNA and DNA can be recovered through salvage pathways and reconverted into nucleotides.
• All the above pathways are essential for the survival of an organism.

Pathways similar but not identical in all 15 bacteria under study with respect to E. coli
• Glycine biosynthesis I
• TCA
• FormylTHF biosynthesis I
• Superpathway of biosynthesis of aspartate and asparagine
• Colanic acid building blocks biosynthesis

Comparison of pathways with respect to E. coli

Organism name (total pathways)              Identical Pathways   Similar Pathways   Pathways Absent
Agrobacterium tumefaciens (207)             114                  19                 65
Bacillus anthracis (254)                    94                   32                 72
Bacillus subtilis (145)                     107                  21                 70
Caulobacter crescentus (176)                90                   22                 86
Chlamydia trachomatis (61)                  43                   13                 142
Francisella tularensis (184)                76                   28                 94
Haemophilus influenzae (127)                98                   18                 82
Helicobacter pylori (123)                   59                   27                 112
Mycoplasma pneumoniae (48)                  33                   10                 155
Mycobacterium tuberculosis CDC1551 (186)    96                   21                 81
Mycobacterium tuberculosis H37Rv (184)      94                   23                 81
Shigella flexneri (179)                     114                  24                 60
Treponema pallidum (56)                     39                   16                 143
Vibrio cholerae (207)                       113                  24                 61

In every row, Identical + Similar + Absent = 198, the total number of E. coli pathways.
Additional Pathways: 9, 56, 9 (only these three values survive; their rows cannot be identified)

Hamming Distance Calculations
• Identical Pathways (0):
  – Start and end products are identical; intermediate steps are the same.
• Similar Pathways (1):
  – Start and end products are identical; intermediate steps are different.
• Pathways are absent (2):
  – Start or end products are not the same.

Metabolic pathway path profile
Columns represent the n pathways and rows represent the 15 bacteria under study. Each column corresponds to a particular type of pathway. 2 denotes that the pathway follows the same path, 1 denotes that it follows a different path, and 0 denotes absence of the pathway.
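A path profile of this kind can be compared row by row. The sketch below is one possible reading, not the authors' implementation: it takes the Hamming distance between two organisms to be the number of pathway columns whose codes differ, using the 2/1/0 profile coding described above. The organism names, pathway count and profile values are hypothetical and chosen only for illustration; the deck does not state whether mismatches were counted this way or weighted by the 0/1/2 scores.

```python
from itertools import combinations

# Hypothetical path profile: rows are organisms, columns are pathways.
# Codes follow the profile description: 2 = same path, 1 = different path,
# 0 = pathway absent. These values are made up, not taken from the study.
PROFILE = {
    "organism_A": [2, 2, 1, 0],
    "organism_B": [2, 1, 1, 0],
    "organism_C": [2, 0, 2, 2],
}


def hamming_distance(row_a, row_b):
    """Count the columns at which two profile rows carry different codes."""
    if len(row_a) != len(row_b):
        raise ValueError("profile rows must cover the same pathways")
    return sum(a != b for a, b in zip(row_a, row_b))


def distance_matrix(profile):
    """Return a symmetric dict {(name_1, name_2): distance} over all organisms."""
    names = sorted(profile)
    dist = {(name, name): 0 for name in names}
    for a, b in combinations(names, 2):
        d = hamming_distance(profile[a], profile[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


distances = distance_matrix(PROFILE)
for a, b in combinations(sorted(PROFILE), 2):
    print(f"{a} vs {b}: {distances[(a, b)]}")
```

A distance matrix of this shape is the usual input to a tree-building or clustering step; which method was used for the profile-based tree is not stated in the deck.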
This represents a part of the organism-specific metabolic pathway path profile.

Metabolic pathway path profile based tree

Acknowledgements
• Shweta Kohli
• Manjari
• Ms. Deshpande
• Sangita Sawant