Microbial Research Commons Including Viruses
Prof. A.S. Kolaskar
Bioinformatics Center, University of Pune, Pune, India

Introduction
• Increasing research in life sciences and biotechnology in Indian universities and national research institutions
• Increased need for microbial and genetic resources
• Establishment of microbial and other biological culture collections in universities and research institutions

Culture Collections in India
• Microbial Type Culture Collection and Gene Bank (MTCC), Chandigarh – recognized by the World Intellectual Property Organization (WIPO) as an International Depository Authority
• National Collection of Industrial Microorganisms (NCIM), Pune – cultures are deposited for patenting
• Virus cultures at the National Institute of Virology (NIV)
• National Facility for Animal Tissue and Cell Culture, Pune
• Anaerobic Bacterial Resource Center (ABRC), Hyderabad
• National Collection of Dairy Cultures, Karnal
• National Fungal Culture Collection of India, Pune
• University of Mumbai, Food and Fermentation Technology Division
21 culture collections from India are registered with the WDCM (World Data Centre for Microorganisms).

Thailand Network of Culture Collections
• Biotech Culture Collection (BCC) – 3430
• Department of Medical Sciences Thailand (DMST) – 442
• Department of Agriculture (DOA) – 1163
• Thailand Institute of Scientific and Technological Research – 515

Issues
• Limited characterization
• Very few cultures are characterized at the DNA fingerprinting level
• Data are not fully computerized, and the information is not available on the web
• Duplication of cultures in the repository
• Most repositories follow a Material Transfer Agreement (MTA) similar to that of ATCC
• No systems are in place to detect or prevent misuse of the MTA
• Redistribution of cultures at an informal level
• Very few scientists are conversant with taxonomic classification, even at the national culture collections
• Issues related to biosafety and national security are not given due importance

PUMP-E: Salient Features
• Dynamic representation of pathways
• Dynamic building of organism-specific pathways from genomic data
• Development of software for
– Automated data updating (Perl scripts)
– Reformatting and organization of relevant information from different databases
– Drawing pathway diagrams
– Comparison of pathways
– Visualization of ligands and enzymes
– Prediction of enzyme-substrate interactions
• URL: http://202.41.70.51/mpe/

Approaches
• Data acquisition and integration
• Dynamic visualization of metabolic pathways
• Query interface
• Molecular visualization
• Structure prediction of proteins
• Simulation of 3D structures of enzymes and metabolites

PUMP-E architecture (figure): database of enzymes, organisms, reactions, compounds, genes, pathways and homology models; user-friendly query interface with keyword search; dynamic generation of the queried pathway; molecular viewer

Source Databases for Data Acquisition
• Sequence databases: TIGR, NCBI, EBI
• Metabolite database: KEGG
• Metabolic pathway database: KEGG
• 3D structure database: PDB
• Enzyme databases: KEGG, ExPASy, IUBMB, BRENDA
• Kinetics data: NIST
• Organism list: GOLD
• Motifs, patterns and signatures: PROSITE

PUMP-E Front End and Query System
• Web-based query interface
• Supports complex, advanced queries
• Developed using ASP, HTML and Java
• Tested with testing tools such as WinRunner and TestDirector
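The slides do not say how PUMP-E builds organism-specific pathways from genomic data. The sketch below is a minimal illustration under one assumption: a reference pathway counts as present in an organism when every enzyme (EC number) on it is annotated in that organism's genome. Pathways with only some of their enzymes annotated are reported as candidate gaps. The pathway names, the EC lists in `REFERENCE_PATHWAYS`, and the helper `organism_pathways` are all hypothetical. This is not PUMP-E code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical reference pathways: pathway name -> EC numbers of the enzymes
# catalysing its steps, in order. Real definitions would come from KEGG or BioCyc.
REFERENCE_PATHWAYS: Dict[str, List[str]] = {
    "pathway_A": ["2.7.1.1", "5.3.1.9", "2.7.1.11"],
    "pathway_B": ["4.1.2.13", "5.3.1.1"],
}


def organism_pathways(
    genome_ec_numbers: Iterable[str],
    reference: Dict[str, List[str]] = REFERENCE_PATHWAYS,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split reference pathways into those fully covered by the genome's
    annotated enzymes and those only partly covered (with the missing ECs)."""
    annotated = set(genome_ec_numbers)
    complete: Dict[str, List[str]] = {}
    partial: Dict[str, List[str]] = {}
    for name, steps in reference.items():
        missing = [ec for ec in steps if ec not in annotated]
        if not missing:
            complete[name] = steps  # every step has an annotated enzyme
        elif len(missing) < len(steps):
            partial[name] = missing  # candidate gaps, possibly annotation errors
        # pathways with no annotated enzyme at all are treated as absent
    return complete, partial


if __name__ == "__main__":
    complete, partial = organism_pathways(["2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"])
    print(sorted(complete))  # ['pathway_A']
    print(partial)           # {'pathway_B': ['5.3.1.1']}
```

Reporting partial pathways separately, instead of dropping them, reflects the point made later in the slides: errors in gene-level annotation translate into errors in metabolic pathways.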
Total number of pathways in the bacteria under study, as per BioCyc 9.1

Organism name | Phylum | Genome size (Mbp) | Total number of pathways
Agrobacterium tumefaciens | Proteobacteria | 5.673462 | 207
Bacillus anthracis | Firmicutes | 5.22729 | 254
Bacillus subtilis | Firmicutes | 4.21463 | 145
Caulobacter crescentus | Proteobacteria | 4.01695 | 176
Chlamydia trachomatis | Chlamydiae | 1.04252 | 61
Escherichia coli | Proteobacteria | 4.63968 | 198
Francisella tularensis | Proteobacteria | 1.89282 | 184
Haemophilus influenzae | Proteobacteria | 1.83014 | 127
Helicobacter pylori | Proteobacteria | 1.66787 | 123
Mycoplasma pneumoniae | Firmicutes | 0.816394 | 48
Mycobacterium tuberculosis CDC1551 | Actinobacteria | 4.40384 | 186
Mycobacterium tuberculosis H37Rv | Actinobacteria | 4.41153 | 184
Shigella flexneri | Proteobacteria | 4.6072 | 179
Treponema pallidum | Spirochaetes | 1.13801 | 56
Vibrio cholerae | Proteobacteria | 4.03346 | 207

Hamming Distance Calculations
• Identical pathways (0): start and end products are identical; intermediate steps are the same
• Similar pathways (1): start and end products are identical; intermediate steps are different
• Pathway absent (2): the start or end products are not the same

Metabolic pathway path profile
Columns represent the n pathways and rows represent the 15 bacteria under study. Each column corresponds to a particular type of pathway. A value of 2 denotes that the pathway follows the same path, 1 that it follows a different path, and 0 that the pathway is absent. The profile shown is part of the organism-specific metabolic pathway path profile.
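The three distance classes and the profile encoding can be stated compactly. The sketch below is minimal and rests on several assumptions that are not from the slides:
• Each pathway is an ordered list of compounds: start product, intermediates, end product.
• Each organism's pathway is compared with a reference version of the same pathway.
• The profile value is 2 minus the distance. This reconciles the two encodings above: distance 0 maps to profile 2, and absence (distance 2) maps to profile 0.

The function names are hypothetical. `profile_distance` shows one plausible way to compare two organisms' rows when building a profile-based tree.

```python
from typing import Dict, List, Optional, Sequence

Pathway = Sequence[str]  # ordered compounds: start product, intermediates, end product


def pathway_distance(ref: Optional[Pathway], other: Optional[Pathway]) -> int:
    """0 = identical, 1 = same start and end products but different
    intermediates, 2 = absent (missing, or start/end product differs)."""
    if not ref or not other:
        return 2
    if ref[0] != other[0] or ref[-1] != other[-1]:
        return 2
    return 0 if list(ref) == list(other) else 1


def path_profile(
    reference: Dict[str, Pathway], organisms: Dict[str, Dict[str, Pathway]]
) -> Dict[str, List[int]]:
    """One row per organism, one column per reference pathway (sorted by name).
    Assumed encoding: profile value = 2 - distance, so 2 = same path,
    1 = different path, 0 = absent."""
    columns = sorted(reference)
    return {
        org: [2 - pathway_distance(reference[p], paths.get(p)) for p in columns]
        for org, paths in organisms.items()
    }


def profile_distance(row_a: List[int], row_b: List[int]) -> int:
    """Hamming distance between two organisms' profile rows."""
    return sum(a != b for a, b in zip(row_a, row_b))


if __name__ == "__main__":
    reference = {"p1": ["A", "B", "C"], "p2": ["X", "Y", "Z"]}
    organisms = {
        "org1": {"p1": ["A", "B", "C"], "p2": ["X", "Y", "Z"]},
        "org2": {"p1": ["A", "D", "C"]},  # p1 by another route; p2 missing
    }
    profile = path_profile(reference, organisms)
    print(profile)  # {'org1': [2, 2], 'org2': [1, 0]}
    print(profile_distance(profile["org1"], profile["org2"]))  # 2
```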
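The comparison of genus Bacillus with E. coli on the following slides amounts to set differences over pathway names. The sketch below is minimal and assumes two things:
• Each organism's BioCyc pathways are available as a set of names.
• "Present in genus Bacillus" means present in at least one of the Bacillus genomes compared.

The helper names and the toy inputs are hypothetical. The pathway names are borrowed from the slides, but their membership below is invented.

```python
from typing import Dict, Set


def absent_from_group(reference: Set[str], group: Dict[str, Set[str]]) -> Set[str]:
    """Pathways of the reference organism found in no member of the group."""
    return reference - set().union(*group.values())


def group_only(reference: Set[str], group: Dict[str, Set[str]]) -> Set[str]:
    """Pathways found in at least one group member but not in the reference."""
    return set().union(*group.values()) - reference


if __name__ == "__main__":
    # Toy input: names come from the slides, memberships are invented.
    e_coli = {"Taurine degradation", "D-allose degradation", "shared pathway"}
    bacillus = {
        "bacillus_1": {"shared pathway", "Octane oxidation"},
        "bacillus_2": {"shared pathway", "Denitrification pathway"},
    }
    print(sorted(absent_from_group(e_coli, bacillus)))
    # ['D-allose degradation', 'Taurine degradation']
    print(sorted(group_only(e_coli, bacillus)))
    # ['Denitrification pathway', 'Octane oxidation']
```

If "present in genus Bacillus" instead meant present in every Bacillus genome, the union in `group_only` would become an intersection.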
Metabolic pathway path-profile-based tree

Comparison of Pathways from Genus Bacillus with E. coli
• Bacillus anthracis
• Bacillus cereus 10987
• Bacillus subtilis
• Bacillus cereus ZK
• Bacillus anthracis Sterne
• Bacillus halodurans C-125
• Bacillus anthracis strain A2012
• Bacillus licheniformis ATCC 14580
• Bacillus anthracis Ames Ancestor
• Bacillus cereus ATCC 14579
The 198 pathways of E. coli are compared with BioCyc pathway data for each of these organisms.

Pathways absent in genus Bacillus, present in E. coli
• Electron transport (aerobic and anaerobic)
• Phenylethylamine degradation
• L-lyxose degradation
• Pyridoxal 5'-phosphate salvage pathway
• Superpathway of pyridoxal 5'-phosphate biosynthesis and salvage
• D-allose degradation
• Fructoselysine degradation
• Taurine degradation

Effect of pathways absent in genus Bacillus
• Because the L-lyxose degradation pathway is absent in genus Bacillus, these bacteria cannot use L-lyxose as a source of energy.
• Bacteria of genus Bacillus cannot use D-allose as a sole carbon source because the D-allose degradation pathway is absent.
• Under sulfate starvation, bacteria of genus Bacillus cannot use taurine as a sulfur source because the taurine degradation pathway is absent.
• Bacillus cannot grow on fructoselysine or psicoselysine as the sole carbon source because the fructoselysine degradation pathway is absent.

Pathways present in genus Bacillus, absent in E. coli
• 2-Nitropropane degradation
• Denitrification pathway
• Folate transformations
• Formaldehyde assimilation
• Methanogenesis from acetate
• Octane oxidation
• Spermine biosynthesis
• Xylulose monophosphate cycle

Effect of pathways absent in E. coli
• The xylulose monophosphate cycle and methanogenesis from acetate are characteristic pathways of methanogenic microbes, and E. coli is not a methanogen.
Hence these pathways are absent in E. coli.
• E. coli cannot reduce nitrate to dinitrogen because the denitrification pathway is absent.
• Formaldehyde produced by methanotrophic bacteria from the oxidation of methane and methanol is assimilated through the formaldehyde assimilation pathway. This pathway is absent in E. coli because E. coli is not methanotrophic.

Issues
• Taxonomic classification follows NCBI, so errors in that classification can creep in
• No standard system to represent metabolic pathways
• Errors in gene-level annotation translate into errors in metabolic pathways
• The usefulness of metabolic pathways for characterizing microbes is not exploited

Animal Virus Information System
Data Entry Format
• Initial data forms from: International Catalogue of Arboviruses; ICTV code for the description of virus characters and ICTV reports; WHO centre, Munich database; scientific literature
• Data extracted from EMBL, NBRF-PIR, HDB and EM pictures from the primary source → partially filled forms (online/offline)
• Additional information for unfilled fields from Medline/literature → fully filled forms
• Validation by experts → data updating → data entry through the data entry software

Signature peptide sequences for animal virus families
(Family, genus, protein: peptides; X = any residue, F/Y = either residue at that position)
• Togaviridae, Alphavirus, structural polyprotein: AYEHXXV/TXPN
• Filoviridae, Filovirus, nucleocapsid protein; Iridoviridae, Lymphocystivirus and Iridovirus, capsid protein; Papovaviridae, Papillomavirus, L1 protein: PQLSAIALGVAT, AHGSTLAGVNV, GEQYQQLREAA, TSXFIDXAT, IEKXXYGG, SRXGDYXL, CKYPDF/Y, GHPLF/YNKV/L
• Papovaviridae, Polyomavirus, coat proteins VP1 and VP2: PDPXXNEN, GVGPLCK, QVEEVR, WXLPLXLGLYG
• Arenaviridae, Arenavirus, surface glycoprotein; Flaviviridae, Flavivirus, nonstructural protein 1: MLXKEYXXRQXXTP, PTHXHIXGXXCPXPHR, LXLXGRSC, CWYXMEIRP
• Flaviviridae, Flavivirus, envelope glycoprotein: DRGWGNXCGXFGKG
• Adenoviridae, hexon protein: FKPYSGTA, GVLAGQ, PNYCFPL, NPFNHHRN

Species-specific peptides
Family: Flaviviridae; protein: envelope glycoproteins
Viruses: St. Louis encephalitis virus, Murray Valley encephalitis virus, Japanese encephalitis virus, West Nile virus, Kunjin virus, Langat virus, Yellow fever virus, Powassan virus, Dengue type 1 virus, Dengue type 2 virus, Dengue type 3 virus, Dengue type 4 virus, Tick-borne encephalitis virus, Louping ill virus

Peptide | Unique up to number of mismatches
VNPFISTGGAN | 3
EGRPAT | 0
VTANPYVASSTA | 3
LDVRMINIEA[S/V]Q | 3
TTKATGWIIQK | 3
STKATGRTILKE | 3
DGAEAWNEAGR | 3
FTCEDKK | 0
VGFSGTRP | 0
MRVTKDTN[D/G][N/S]NL | 3
KDNQDWNSVE | 3
GTVLVQV | 0
GTIVIRV | 0
TEATQL | 0
GTILIKV | 0
TTAKEVA | 0
GTTVVKV | 0
GFLTSVGKA | 0
NPHWNNVER | 0

VirGen: Comparative genomics and data mining of viral genomes
Browse VirGen at http://bioinfo.ernet.in/virgen/virgen.html

Salient Features of VirGen
• Organizes genomic data in a structured fashion, navigating from the family down to an isolate
• Full genomes of viruses
• Compilation of representative genome entries for every viral species (Virus Taxonomy, 7th report of the ICTV)
• Complete annotation of every genomic entry
• Graphical representation of genome organization
• Generation of alternative names of proteins
• On-the-fly genome comparisons using BLAST2
• Multiple sequence alignment (MSA) of genomes, proteomes and individual proteins
• Whole-genome phylogeny
• Prediction of B-cell epitopes

VirGen Home: menu to browse viral families; navigation bar; search using keywords and motifs; genome analysis and comparative genomics resources; guided tour and help

Genome Sample Record in VirGen: tabular display of genome annotation; retrieval of sequences in FASTA format; "alternate names" of proteins

Browsing the Module of Whole Genome Phylogenetic Trees
Most parsimonious tree of genus Flavivirus. Input data: whole genome; method: DNA parsimony; bootstrapping: 1000 replicates

Case Study: Insertions in Pestivirus 1
• The 891–1787 bp region remains unannotated using the representative strain
• What is the origin of the insert?
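The signature and species-specific peptides above use a compact notation. The sketch below is minimal and rests on these assumptions:
• X stands for any residue.
• "V/T" or "[S/V]" lists alternative residues at a single position.
• "Unique up to N mismatches" means the peptide occurs exactly in the target virus's protein but matches no ungapped window of any other virus's protein with N or fewer mismatches.

The function names are hypothetical. The sequences in the usage lines are toy strings, not real envelope proteins.

```python
import re
from typing import List, Optional, Sequence, Set

Position = Optional[Set[str]]  # None stands for X (any residue)


def parse_signature(signature: str) -> List[Position]:
    """Parse e.g. 'AYEHXXV/TXPN' or 'LDVRMINIEA[S/V]Q': X = any residue,
    'V/T' or '[V/T]' = either residue at one position."""
    sig = signature.replace("[", "").replace("]", "")
    positions: List[Position] = []
    i = 0
    while i < len(sig):
        options = {sig[i]}
        i += 1
        while i + 1 < len(sig) and sig[i] == "/":  # absorb '/B' alternatives
            options.add(sig[i + 1])
            i += 2
        positions.append(None if options == {"X"} else options)
    return positions


def to_regex(signature: str) -> "re.Pattern[str]":
    """Compile the signature into a regular expression (X becomes '.')."""
    parts = ["." if p is None else "[" + "".join(sorted(p)) + "]"
             for p in parse_signature(signature)]
    return re.compile("".join(parts))


def min_mismatches(signature: str, protein: str) -> int:
    """Fewest mismatching positions over all ungapped windows of the protein."""
    positions = parse_signature(signature)
    k = len(positions)
    if k > len(protein):
        return k  # no full-length window: count every position as a mismatch
    best = k
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        mismatches = sum(
            1 for allowed, aa in zip(positions, window)
            if allowed is not None and aa not in allowed
        )
        best = min(best, mismatches)
    return best


def is_species_specific(signature: str, target: str,
                        others: Sequence[str], max_mismatches: int) -> bool:
    """Exact hit in the target, and every other protein needs more than
    max_mismatches substitutions to match (assumed meaning of the table)."""
    return min_mismatches(signature, target) == 0 and all(
        min_mismatches(signature, other) > max_mismatches for other in others
    )


if __name__ == "__main__":
    print(bool(to_regex("AYEHXXV/TXPN").search("MKAYEHKLTQPNG")))  # True
    print(min_mismatches("GTILIKV", "GTIVIRV"))  # 2
    print(is_species_specific("GTILIKV", "AAGTILIKVAA",
                              ["GTIVIRV", "GTTVVKV"], 1))  # True
```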
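VirGen's Flavivirus tree was built by DNA parsimony with bootstrapping. The sketch below illustrates two of the underlying ideas. It is not VirGen's or any phylogenetics package's code.
• Fitch's algorithm scores a fixed rooted binary tree by the minimum number of substitutions the tree requires.
• A bootstrap pseudo-replicate resamples alignment columns with replacement.

A full analysis would also search over tree topologies for each replicate and count how often each clade recurs. That search is not shown. The tree shapes, leaf names and the four-sequence alignment are toy assumptions.

```python
import random
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name, or (left, right) subtree


def fitch_site(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column: returns the
    candidate state set at this node and the substitutions counted below it."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], column)
    right_set, right_cost = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one substitution


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total substitutions required by the tree, summed over all columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


def bootstrap_replicate(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """One pseudo-replicate: alignment columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    alignment = {"a": "ACGT", "b": "ACGA", "c": "TCGA", "d": "TCGA"}
    print(parsimony_score((("a", "b"), ("c", "d")), alignment))  # 2
    print(parsimony_score((("a", "c"), ("b", "d")), alignment))  # 3
    replicate = bootstrap_replicate(alignment, random.Random(1))
    print({k: len(v) for k, v in replicate.items()})  # {'a': 4, 'b': 4, 'c': 4, 'd': 4}
```

The lower score marks (("a", "b"), ("c", "d")) as the more parsimonious of the two toy topologies.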
• BLAST against VirGen confirmed the non-viral origin of the insert
• BLAST against GenBank produced a significant match with a Bos taurus J-domain protein

Issues
• The ICTV classification and the information available in the published literature do not always match
• No standard method to describe viral isolates/strains
• Electron micrographs and other image data are not readily available, making identification difficult and inaccurate
• Recombination occurs much faster in viruses than in bacteria and other microbes
• Host/vector information needs to be described in a standard vocabulary
• Immunological properties and therapeutic options are only minimally available in the databases

Suggestions
• Devise measures to build confidence among underdeveloped and developing nations that their resources will not be exploited
• Networking and consortia among scientists, curators of culture collections, and policy makers from developed and developing countries
• Standardize material transfer agreements, taking national security and biosafety into consideration
• Create awareness about open access and open educational resources
• Lobby policy makers to make the outcomes of government-funded research publicly available
• Encourage scientists to publish in open access journals
• Organize training programs by international experts to improve the quality of culture collections and databases
• Improve access to specialized culture collections

National Knowledge Commission
• The National Knowledge Commission (NKC) was constituted in 2005 as a high-level advisory body to the Prime Minister of India. The Commission has a mandate to guide policy and direct reforms in key areas such as education, science and technology, agriculture, industry and e-governance. Easy access to knowledge, creation and preservation of knowledge systems, dissemination of knowledge and better knowledge services are core concerns of the Commission.
National Knowledge Commission (diagram): Access, Creation, Services, Concepts, Applications

NKC Working Model
• Identify focus areas/target groups
• Consultations, formal and informal
• Background research and analysis
• Constitution of working groups
• Internal deliberations of the NKC
• Finalization of recommendations
• Submission to the PM
• Widespread dissemination
• Implementation