Microbial Research Commons Including Viruses

Prof. A.S. Kolaskar
Bioinformatics Center
University of Pune
Pune, India
Introduction
• Increasing research in life sciences and
biotechnology in Indian Universities and
national research institutions
• Increased need for microbial and genetic
resources
• Establishment of microbial and other
biological culture collections in universities
and research institutions
Culture Collections In India
• Microbial Type Culture Collection and Gene Bank
(MTCC), Chandigarh – recognized by the World Intellectual
Property Organization (WIPO) as an International
Depositary Authority
• National Collection of Industrial Microorganisms (NCIM),
Pune – Cultures are deposited for patenting
• Virus cultures at National Institute of Virology (NIV)
• National Facility for Animal Tissue and Cell Culture, Pune
• Anaerobic Bacterial Resource Center
(ABRC), Hyderabad
• National Collection of Dairy Cultures,
Karnal
• National Fungal Culture Collection of
India, Pune
• University of Mumbai Food and
Fermentation Technology Division
21 culture collections from India are registered with the World Data Centre for Microorganisms (WDCM)
Thailand Network of Culture Collections
• Biotech Culture Collection (BCC) – 3430
• Department of Medical Sciences Thailand
(DMST) – 442
• Department of Agriculture (DOA) – 1163
• Thailand Institute of Scientific and
Technological Research – 515
Issues
• Limited characterization
• Very few cultures characterized at the DNA fingerprinting level
• Data not fully computerized, and information not available on the web
• Duplication of cultures in the repositories
• Most repositories follow a Material Transfer Agreement (MTA) similar to
that of ATCC
• No systems in place to detect or prevent misuse of the MTA
• Redistribution of cultures at an informal level
• Very few scientists conversant with taxonomic classification, even at
the national culture collections
• Issues related to biosafety and national security not given due
importance
PUMP-E: Salient Features
• Dynamic Representation of pathways
• Dynamically building the organism-specific pathways
from genomic data
• Development of Software for
– Automated data updating (Perl scripts)
– Reformatting and organization of relevant
information from different databases
– Drawing pathway diagrams
– Comparison of pathways
– Visualization of ligands and enzymes
– Prediction of enzyme-substrate interactions
• URL: http://202.41.70.51/mpe/
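The slide states that PUMP-E builds organism-specific pathways dynamically from genomic data but does not say how. A minimal sketch, assuming each reference pathway is stored as a list of EC numbers and that a pathway is called present when a chosen fraction of its enzymes is annotated in the genome; the function name `infer_pathways`, the `min_coverage` threshold and the toy pathways are all hypothetical, not PUMP-E code or data.

```python
# Hypothetical sketch: infer organism-specific pathways from annotated EC numbers.
# Not PUMP-E's implementation; the data model and coverage rule are assumptions.

def infer_pathways(reference, genome_ecs, min_coverage=1.0):
    """Return {pathway: sorted list of missing EC numbers} for every reference
    pathway whose enzyme coverage in the genome is at least min_coverage."""
    present = {}
    for name, ecs in reference.items():
        required = set(ecs)
        if not required:
            continue  # skip pathways with no enzymatic steps recorded
        found = required & genome_ecs
        if len(found) / len(required) >= min_coverage:
            # Keep the missing enzymes so annotation gaps stay visible.
            present[name] = sorted(required - genome_ecs)
    return present


# Illustrative input only: pathway names and EC assignments are made up.
reference = {
    "pathway_A": ["1.1.1.1", "2.7.1.1"],
    "pathway_B": ["4.1.2.13", "5.3.1.1", "1.2.1.12"],
}
genome_ecs = {"1.1.1.1", "2.7.1.1", "4.1.2.13", "5.3.1.1"}

print(infer_pathways(reference, genome_ecs))       # {'pathway_A': []}
print(infer_pathways(reference, genome_ecs, 0.6))  # {'pathway_A': [], 'pathway_B': ['1.2.1.12']}
```

A strict threshold of 1.0 only reports fully annotated pathways; lowering it tolerates missing gene annotations at the cost of possible false positives, which is why the missing enzymes are returned alongside each pathway.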
Approaches
• Data acquisition & integration
• Dynamic visualization of metabolic pathways
• Query interface
• Molecular visualization
• Structure prediction of proteins
• Simulation of 3D structures of enzymes and metabolites
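PUMP-E (architecture diagram): a central database of enzyme, organism, reaction, compound, gene, pathway and homology-model data, accessed through a user-friendly query interface that supports search by keywords, dynamically generates the queried pathway and includes a molecular viewer.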
Source Databases for Data Acquisition
• Sequence databases: TIGR, NCBI, EBI
• Metabolite database: KEGG
• Metabolic pathway database: KEGG
• 3D structure database: PDB
• Enzyme databases: KEGG, ExPASy, IUBMB, BRENDA
• Kinetics data: NIST
• Organism list: GOLD
• Motifs, patterns & signatures: PROSITE
PUMP-E: Front End and Query System
• Web-based query interface
• Supports complex advanced queries
• Developed using ASP, HTML and Java
• Tested with testing tools such as WinRunner and TestDirector
PUMP-E
Total number of pathways in the bacteria under study, as per BioCyc 9.1

Organism name                         Phylum           Genome size (Mbp)   Total number of pathways
Agrobacterium tumefaciens             Proteobacteria   5.673462            207
Bacillus anthracis                    Firmicutes       5.22729             254
Bacillus subtilis                     Firmicutes       4.21463             145
Caulobacter crescentus                Proteobacteria   4.01695             176
Chlamydia trachomatis                 Chlamydiae       1.04252             61
Escherichia coli                      Proteobacteria   4.63968             198
Francisella tularensis                Proteobacteria   1.89282             184
Haemophilus influenzae                Proteobacteria   1.83014             127
Helicobacter pylori                   Proteobacteria   1.66787             123
Mycoplasma pneumoniae                 Firmicutes       0.816394            48
Mycobacterium tuberculosis CDC1551    Actinobacteria   4.40384             186
Mycobacterium tuberculosis H37Rv      Actinobacteria   4.41153             184
Shigella flexneri                     Proteobacteria   4.6072              179
Treponema pallidum                    Spirochaetes     1.13801             56
Vibrio cholerae                       Proteobacteria   4.03346             207
Hamming Distance Calculations
• Identical pathways (0):
– Start and end products are identical;
intermediate steps are the same.
• Similar pathways (1):
– Start and end products are identical;
intermediate steps are different.
• Pathway absent (2):
– Start or end products are not the same.
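The scoring rules above can be written as a small function. A minimal sketch, assuming each organism's version of a pathway is stored as an ordered list of compounds from start to end product (a hypothetical representation) and that two organisms are compared by summing the per-pathway scores; `pathway_distance`, `organism_distance` and the toy compound names are illustrative only.

```python
from typing import Dict, List, Optional, Sequence


def pathway_distance(p: Optional[Sequence[str]], q: Optional[Sequence[str]]) -> int:
    """0 = identical, 1 = similar (same start/end, different intermediates),
    2 = pathway absent in one organism or different start/end products."""
    if not p or not q:
        return 2  # pathway missing from one of the two organisms
    if p[0] != q[0] or p[-1] != q[-1]:
        return 2  # start or end products differ
    return 0 if list(p) == list(q) else 1


def organism_distance(a: Dict[str, List[str]], b: Dict[str, List[str]]) -> int:
    """Sum the per-pathway scores over every pathway present in either organism
    (an assumed aggregation; the slides only define the per-pathway score)."""
    return sum(pathway_distance(a.get(name), b.get(name)) for name in set(a) | set(b))


# Toy data: compound names are placeholders, not real pathway intermediates.
org1 = {"pathway_X": ["S", "I1", "I2", "P"], "pathway_Y": ["T", "Q"]}
org2 = {"pathway_X": ["S", "I3", "P"]}
print(organism_distance(org1, org2))  # 3 (pathway_X similar = 1, pathway_Y absent = 2)
```

Because only pathways present in at least one of the two organisms are scored, a pathway missing from both contributes nothing to the distance.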
Metabolic pathway path profile
Columns represent the n pathways and rows represent the 15 bacteria under study; each column
corresponds to a particular pathway. A value of 2 denotes that the pathway follows the same path, 1 that it follows
a different path, and 0 that the pathway is absent. The figure shows part of the organism-specific metabolic
pathway path profile.
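Rows of such a profile can be compared with a plain Hamming distance, i.e. the number of pathway columns in which two organisms carry different codes. A minimal sketch, assuming each row is a list of 0/1/2 codes in a shared column order; the organism rows below are made-up values, not the study's profile, and `pairwise_distances` is a hypothetical helper.

```python
from itertools import combinations


def hamming(u, v):
    """Number of pathway columns where two profile rows carry different codes."""
    if len(u) != len(v):
        raise ValueError("profile rows must cover the same pathway columns")
    return sum(x != y for x, y in zip(u, v))


def pairwise_distances(profiles):
    """Hamming distance for every unordered pair of organisms."""
    return {(a, b): hamming(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}


# Made-up rows (2 = same path, 1 = different path, 0 = absent); not study data.
profiles = {
    "organism_1": [2, 2, 2, 0],
    "organism_2": [2, 2, 1, 0],
    "organism_3": [1, 0, 2, 2],
}
print(pairwise_distances(profiles))
```

Plain Hamming distance counts a 2-versus-1 difference the same as a 2-versus-0 difference; the resulting matrix is the kind of input a distance-based tree method takes, though the slides do not state which method produced the profile-based tree.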
Metabolic pathway path profile based tree
Comparison of Pathways from Genus Bacillus with E. coli
Bacillus anthracis
Bacillus cereus 10987
Bacillus subtilis
Bacillus cereus Zk
Bacillus anthracis Sterne
Bacillus halodurans C-125
Bacillus anthracis strain A2012
Bacillus licheniformis ATCC 14580
Bacillus anthracis Ames Ancestor
Bacillus cereus ATCC 14579
The 198 pathways of E. coli were compared with the BioCyc pathway data
for each of these organisms
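The two lists that follow are set differences between the E. coli pathway set and the pathway sets of the Bacillus genomes. A minimal sketch, assuming each genome's BioCyc pathways have already been collected as a set of names; "present in the genus" is ambiguous on the slides, so the hypothetical `require_all` flag switches between "in at least one member" and "in every member". The genome labels and pathway sets are toy data.

```python
from typing import Dict, Set, Tuple


def compare_with_genus(reference: Set[str], genus: Dict[str, Set[str]],
                       require_all: bool = False) -> Tuple[Set[str], Set[str]]:
    """Return (pathways absent from every genus member but present in the
    reference, pathways present in the genus but absent from the reference)."""
    members = list(genus.values())
    if not members:
        return set(reference), set()
    in_any_member = set.union(*members)
    # "Present in the genus": in at least one member, or in every member if require_all.
    genus_pathways = set.intersection(*members) if require_all else in_any_member
    absent_in_genus = reference - in_any_member
    present_only_in_genus = genus_pathways - reference
    return absent_in_genus, present_only_in_genus


# Toy data for illustration only; not the BioCyc pathway sets.
ecoli = {"D-allose degradation", "taurine degradation", "glycolysis"}
genus = {
    "genome_A": {"glycolysis", "denitrification"},
    "genome_B": {"glycolysis", "spermine biosynthesis"},
}
absent, only = compare_with_genus(ecoli, genus)
print(sorted(absent))  # ['D-allose degradation', 'taurine degradation']
print(sorted(only))    # ['denitrification', 'spermine biosynthesis']
```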
Pathways absent in genus Bacillus but
present in E. coli
• Electron transport (aerobic and anaerobic)
• Phenylethylamine degradation
• L-lyxose degradation
• Pyridoxal 5’-phosphate salvage pathway
• Superpathway of pyridoxal 5’-phosphate
biosynthesis and salvage
• D-allose degradation
• Fructoselysine degradation
• Taurine degradation
Effect of pathways absent in genus Bacillus
• Because the L-lyxose degradation pathway is absent in genus Bacillus,
these bacteria cannot utilize L-lyxose as a source of energy
• D-allose cannot be utilized as a sole carbon source by bacteria of genus
Bacillus because the D-allose degradation pathway is absent
• Under sulfate starvation, bacteria of genus Bacillus cannot utilize
taurine as a sulfur source because the taurine degradation pathway is
absent
• Bacillus cannot grow on fructoselysine or psicoselysine as the sole carbon
source because the fructoselysine degradation pathway is absent
Pathways present in genus Bacillus but
absent in E. coli
• 2-Nitropropane degradation
• Denitrification pathway
• Folate transformations
• Formaldehyde assimilation
• Methanogenesis from acetate
• Octane oxidation
• Spermine biosynthesis
• Xylulose monophosphate cycle
Effect of pathways absent in E. coli
• The xylulose monophosphate cycle and methanogenesis from acetate are
characteristic pathways of methanogenic bacteria, and E. coli is not
methanogenic; hence these pathways are absent in E. coli
• E. coli cannot reduce nitrate to dinitrogen because it lacks the
denitrification pathway
• Formaldehyde produced by methanotrophic bacteria from the oxidation of
methane and methanol is assimilated through the formaldehyde assimilation
pathway. This pathway is absent in E. coli, which is not methanotrophic
Issues
• Taxonomic classification follows NCBI, so errors
in that classification can propagate
• No standard system to represent
metabolic pathways
• Errors in annotation at the gene level translate
into errors in metabolic pathways
• The usefulness of metabolic pathways for
characterization of microbes has not been
exploited
Animal Virus Information System
Data Entry Format (flowchart)
• Initial data forms are drawn from the International Catalogue of Arboviruses, the ICTV code for the Description of Virus Characters and ICTV reports, the WHO centre, Munich database, and the scientific literature
• Data are extracted from EMBL, NBRF-PIR, HDB and EM pictures from primary sources, giving partially filled forms (online/offline)
• Additional information for unfilled fields is taken from Medline/literature, giving fully filled forms
• Forms are validated by experts
• Data are entered and updated through data entry software
Signature peptide sequences for animal virus families
Family
Genus
Protein
Peptide
Togaviridae
Alphavirus
Structural polyprotein
AYEHXXV/TXPN
Filoviridae
Filovirus
Nucleocapsid protein
Iridoviridae
Lymphocystivirus
Iridovirus
Capsid protein
Papovaviridae
Papillomavirus
L1 protein
PQLSAIALGVAT
AHGSTLAGVNV
GEQYQQLREAA
TSXFIDXAT
IEKXXYGG
SRXGDYXL
CKYPDF/Y
GHPLF/YNKV/L
Polyomavirus
Coat protein VP1
Coat protein VP2
PDPXXNEN
GVGPLCK
QVEEVR
WXLPLXLGLYG
Arenaviridae
Arenavirus
Surface glycoprotein
Flaviviridae
Flavivirus
Non structural protein 1
MLXKEYXXRQXXTP
PTHXHIXGXXCPXPHR
LXLXGRSC
CWYXMEIRP
Envelope glycoprotein
DRGWGNXCGXFGKG
Hexon protein
FKPYSGTA
GVLAGQ
PNYCFPL ,NPFNHHRN
Adenoviridae
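The signature peptides use a compact pattern notation that can be turned into regular expressions for scanning protein sequences. A minimal sketch, assuming (the slides do not define it) that X stands for any residue and that a slash joins alternative residues at a single position, as in "AYEHXXV/TXPN" or "GHPLF/YNKV/L"; bracketed forms such as "[S/V]" are treated the same way. `signature_to_regex` and the test sequence are hypothetical.

```python
import re


def signature_to_regex(signature: str) -> re.Pattern:
    """Compile a signature such as 'AYEHXXV/TXPN' into a regular expression.
    Assumed notation: X = any residue; 'F/Y' = F or Y at one position."""
    s = signature.replace("[", "").replace("]", "").replace(" ", "")
    positions = []  # one list of allowed residues per peptide position
    i = 0
    while i < len(s):
        if s[i] == "/":
            if not positions or i + 1 >= len(s):
                raise ValueError(f"dangling '/' in {signature!r}")
            positions[-1].append(s[i + 1])  # extra option for the previous position
            i += 2
        else:
            positions.append([s[i]])
            i += 1
    parts = []
    for allowed in positions:
        if allowed == ["X"]:
            parts.append(".")  # wildcard position
        elif len(allowed) == 1:
            parts.append(re.escape(allowed[0]))
        else:
            parts.append("[" + "".join(allowed) + "]")
    return re.compile("".join(parts))


pattern = signature_to_regex("AYEHXXV/TXPN")
print(pattern.pattern)                   # AYEH..[VT].PN
match = pattern.search("MMAYEHGGTQPNK")  # made-up test sequence
print(match.group() if match else None)  # AYEHGGTQPN
```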
Species-specific peptides (Family: Flaviviridae; Protein:
Envelope glycoproteins)
Virus
St. Louis encephalitis virus
Murray Valley encephalitis virus
Japanese encephalitis virus
West Nile virus
Kunjin virus
Langat virus
Yellow fever virus
Powassan virus
Dengue type 1 virus
Dengue type 2 virus
Dengue type 3 virus
Dengue type 4 virus
Tick-borne encephalitis virus
Louping ill virus
Peptide                  Unique up to number of mismatches
VNPFISTGGAN              3
EGRPAT                   0
VTANPYVASSTA             3
LDVRMINIEA[S/V]Q         3
TTKATGWIIQK              3
STKATGRTILKE             3
DGAEAWNEAGR              3
FTCEDKK                  0
VGFSGTRP                 0
MRVTKDTN[D/G][N/S]NL     3
KDNQDWNSVE               3
GTVLVQV                  0
GTIVIRV                  0
TEATQL                   0
GTILIKV                  0
TTAKEVA                  0
GTTVVKV                  0
GFLTSVGKA                0
NPHWNNVER                0
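One plausible reading of "unique up to N mismatches" is that no envelope sequence from another virus matches the peptide with N or fewer substitutions. A minimal sketch under that assumption, using ungapped sliding windows; `min_mismatches`, `is_unique` and the other-species sequence are hypothetical, and bracketed alternatives such as [S/V] are not handled here.

```python
from typing import Iterable, Optional


def min_mismatches(peptide: str, sequence: str) -> Optional[int]:
    """Smallest number of substitutions between the peptide and any
    same-length window of the sequence (no gaps); None if it does not fit."""
    k = len(peptide)
    if k == 0 or len(sequence) < k:
        return None
    return min(
        sum(a != b for a, b in zip(peptide, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    )


def is_unique(peptide: str, other_species: Iterable[str], max_mismatches: int) -> bool:
    """True if no sequence from another species matches the peptide with
    max_mismatches or fewer substitutions (assumed definition of uniqueness)."""
    for seq in other_species:
        d = min_mismatches(peptide, seq)
        if d is not None and d <= max_mismatches:
            return False
    return True


others = ["AAGTILVQVAA"]  # made-up sequence from a "different" virus
print(min_mismatches("GTVLVQV", others[0]))  # 1
print(is_unique("GTVLVQV", others, 0))       # True: no exact match elsewhere
print(is_unique("GTVLVQV", others, 1))       # False: a 1-mismatch hit exists
```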
VirGen
Comparative genomics & data
mining of viral genomes
Browse VirGen at
http://bioinfo.ernet.in/virgen/virgen.html
Salient Features of VirGen
• Organizes genomic data in a structured hierarchy, navigating from the
family down to an individual isolate
• Full genomes of viruses
• Compilation of representative genome entries for every viral species
(Virus Taxonomy, 7th report of ICTV)
• Complete annotation of every genomic entry
• Graphical representation of genome organization
• Generation of alternative names of proteins
• On-the-fly genome comparisons using BLAST2
• Multiple Sequence Alignment (MSA) of genomes, proteomes and
individual proteins
• Whole genome phylogeny
• Prediction of B-cell epitopes
VirGen Home (screenshot): menu to browse viral families, navigation bar, search using keywords & motifs, genome analysis & comparative genomics resources, guided tour & help
Genome Sample Record in VirGen (screenshot): tabular display of genome annotation, retrieval of the sequence in FASTA format, and ‘alternate names’ of proteins
Browsing the Module of Whole Genome Phylogenetic Trees
Most parsimonious tree of genus Flavivirus
Input data: Whole genome
Method: DNA parsimony
Bootstrapping: 1000 replicates
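The slide names DNA parsimony with 1000 bootstrap replicates but not the software used. As an illustration of those two ideas only, the sketch below scores fixed tree topologies with Fitch's small-parsimony algorithm and resamples alignment columns with replacement. Real bootstrap analyses rebuild the best tree for each replicate and count clade frequencies; here, as a simplification, two hypothetical topologies are compared directly. The alignment and tree shapes are toy data.

```python
import random
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left, right) pair


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass for one alignment column (binary tree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Minimum number of changes needed on this topology, summed over columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


aln = {"A": "AC", "B": "AC", "C": "GT", "D": "GT"}  # toy alignment
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
rng = random.Random(1)
replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
support = sum(parsimony_score(t1, r) < parsimony_score(t2, r) for r in replicates)
print(parsimony_score(t1, aln), parsimony_score(t2, aln), support)  # 2 4 1000
```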
Case Study: Insertions in Pestivirus 1
The 891-1787 bp region remains
unannotated when the
representative strain is used
What is the
origin of the
insert?
BLAST against VirGen confirmed the non-viral origin of the insert
BLAST against GenBank produced a significant
match with Bos taurus J-domain protein
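Spotting a region such as 891-1787 bp amounts to finding stretches of the genome not covered by any annotated feature. A minimal sketch, assuming features are given as 1-based inclusive (start, end) coordinates; the feature list and genome length below are hypothetical, chosen only so that one gap matches the coordinates on the slide, and `unannotated_regions` is an illustrative helper, not a VirGen function.

```python
def unannotated_regions(features, genome_length, min_length=1):
    """Return 1-based inclusive (start, end) gaps not covered by any feature,
    keeping only gaps of at least min_length bases. Overlaps are allowed."""
    gaps = []
    pos = 1  # first base not yet covered
    for start, end in sorted(features):
        if start > pos and start - pos >= min_length:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if genome_length >= pos and genome_length - pos + 1 >= min_length:
        gaps.append((pos, genome_length))
    return gaps


# Hypothetical annotation of a 6500 bp genome (not real Pestivirus coordinates).
features = [(1, 890), (1788, 4000), (3900, 6000)]
print(unannotated_regions(features, 6500))                  # [(891, 1787), (6001, 6500)]
print(unannotated_regions(features, 6500, min_length=600))  # [(891, 1787)]
```

Each gap can then be extracted and searched with BLAST, first against a viral collection and then against a general database, mirroring the case study above.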
Issues
• ICTV classification and information available in published
literature do not always match
• No standard method to describe viral isolates/strains
• Electron micrographs and other image data are not
readily available, making identification difficult and
inaccurate
• Recombination occurs much faster in viruses than in
bacteria and other microbes
• Host/vector information needs to be described in
standard language
• Immunological properties and therapeutic options are
scarcely available in the databases
Suggestions
• Devise measures to build confidence amongst underdeveloped and
developing nations that their resources will not be exploited
• Build networks and consortia among scientists, curators of culture
collections and policy makers from developed and developing countries
• Standardize material transfer agreements, taking into
consideration national security and biosafety
• Create awareness about open access and open educational
resources
• Lobby policy makers to make the outcomes of government-funded
research publicly available
• Encourage scientists to publish in open access journals
• Organize training programs by international experts to improve
quality of culture collections and databases
• Improve access to specialized culture collections
National Knowledge Commission
• The National Knowledge Commission (NKC)
was constituted in 2005 as a high-level advisory
body to the Prime Minister of India. The
Commission has been given a mandate to guide
policy and direct reforms, focusing on certain
key areas such as education, science and
technology, agriculture, industry, e-governance
etc. Easy access to knowledge, creation and
preservation of knowledge systems,
dissemination of knowledge and better
knowledge services are core concerns of the
commission.
National Knowledge Commission
Focus areas (diagram): Access, Creation, Services, Concepts, Applications
NKC Working Model
• Identify focus areas/target groups
• Consultations – formal and informal
• Background research and analysis
• Constitution of Working Groups
• Internal deliberations of NKC
• Finalization of recommendations
• Submission to the Prime Minister
• Widespread dissemination
• Implementation