American Society for Microbiology 2008 - Edwards @ SDSU

ASM General Meeting, Boston.
Annotating Metagenomes Using the NMPDR
Rob Edwards
See also poster B-179 (126B), Aziz et al.
Department of Computer Sciences, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
www.nmpdr.org
www.theseed.org
How much has been sequenced?
[Figure: number of known sequences by year, marking the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and the start of environmental sequencing]
How much will be sequenced?
[Figure: projected sequencing milestones: all cultured bacteria; one genome from every species; most major microbial environments; 100 people; everybody in Boston; everybody in the USA]
The Problem
How do you generate consistent and
accurate annotations for metagenomes?
The SEED Family
Annotations using subsystems
• FIG has developed the notion of a subsystem: a generalization of a “pathway” as a collection of functional roles jointly involved in a biological process or complex
• FIG has extended subsystems into FIGfams: protein families whose members perform the same function
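The subsystem idea maps naturally onto a small data structure. The sketch below is a hypothetical Python model, not the SEED's own code: a subsystem holds a list of functional roles plus a "spreadsheet" mapping each genome to the genes that fill each role (roughly what a populated subsystem shows). The class name, method names, and the example role and gene IDs are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """Hypothetical model: a named set of functional roles that act together."""

    name: str
    roles: list[str]
    # genome ID -> {functional role -> gene IDs assigned to that role}
    spreadsheet: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def populate(self, genome: str, role: str, gene: str) -> None:
        """Record that `gene` in `genome` is annotated with `role`."""
        if role not in self.roles:
            raise ValueError(f"{role!r} is not a role of {self.name!r}")
        self.spreadsheet.setdefault(genome, {}).setdefault(role, []).append(gene)

    def missing_roles(self, genome: str) -> list[str]:
        """Roles with no gene assigned in `genome` (gaps worth a closer look)."""
        filled = self.spreadsheet.get(genome, {})
        return [role for role in self.roles if not filled.get(role)]


# Illustrative usage with made-up role and gene names.
trp = Subsystem("Tryptophan synthesis (example)", ["TrpA", "TrpB", "TrpD"])
trp.populate("genome_1", "TrpA", "gene_17")
print(trp.missing_roles("genome_1"))  # ['TrpB', 'TrpD']
```

In this sketch, keeping roles separate from genes is what makes gaps visible: a genome either has a gene filling each role or shows that role as missing.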
Wikipedia Metabolism
http://en.wikipedia.org/wiki/Portal:Metabolism
Subsystems make up metabolism
SEED Viewer
Populated Subsystem
Subsystems Are Not Just Pathways
• genome context (virulence islands, prophages, conserved gene clusters)
• virulence mechanism
• enzymatic activity
• cellular localization
• predicted or measured co-regulation
• common phenotype
• combinations of criteria
Automated Annotations of Complete Genomes
http://rast.nmpdr.org/
• Automated, user-initiated processing
• Takes 1-7 hours, depending on the size and complexity of the genome
• ~1,500 external submissions, including 150 genomes not yet publicly released
• Reannotation of >500 genomes completed
• 789 users, 160 organizations
Automated Annotations of Complete Metagenomes
http://metagenomics.theseed.org/
MG-RAST Server
• Accurate and consistent annotations in a few days
• Automatic metabolic reconstruction
• Freely available after registration
Metagenome Annotation
Automated pipeline:
– upload sequences in FASTA format, with or without quality scores
– remove exact duplicate sequences (a 454 artefact)
– renumber sequences (a mapping to the original IDs is provided)
– BLAST against the SEED non-redundant (nr) database and 16S rDNA
– annotations and metabolic reenactment
– taxonomic summary
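The first preprocessing steps of the pipeline (removing exact duplicates, then renumbering with a mapping back to the original IDs) can be sketched as below. This is a minimal illustration, assuming that an "exact duplicate" means an identical sequence compared case-insensitively and that new IDs are simple consecutive integers; MG-RAST's actual rules and ID format may differ, and quality scores are ignored here. `read_fasta` and `dereplicate_and_renumber` are hypothetical names.

```python
from __future__ import annotations


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: list[tuple[str, str]] = []
    header: str | None = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def dereplicate_and_renumber(
    records: list[tuple[str, str]],
) -> tuple[list[tuple[str, str]], dict[str, str]]:
    """Drop exact duplicate sequences (first copy wins) and assign new numeric IDs.

    Returns the renumbered records and a mapping new_id -> original header.
    """
    seen: set[str] = set()
    kept: list[tuple[str, str]] = []
    mapping: dict[str, str] = {}
    for header, seq in records:
        key = seq.upper()  # assumption: duplicates compared case-insensitively
        if key in seen:
            continue
        seen.add(key)
        new_id = str(len(kept) + 1)
        kept.append((new_id, seq))
        mapping[new_id] = header
    return kept, mapping
```

Returning the mapping alongside the renumbered reads mirrors the slide's point that users can trace each renumbered sequence back to its original identifier.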
Metagenome Metabolic Reenactment
Phylogenomics
Comparing Metagenomes to Genomes (or other metagenomes!)
Metabolic potential in environments
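Comparing the metabolic potential of one sample with a genome or another metagenome usually starts from per-function hit counts. The sketch below normalizes counts to relative abundances and computes the Bray-Curtis dissimilarity between two profiles. Bray-Curtis is a swapped-in, commonly used ecological distance chosen for illustration, not necessarily the statistic MG-RAST uses; the category names and counts are made up.

```python
from __future__ import annotations


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw hit counts per functional category into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile contains no hits")
    return {category: n / total for category, n in counts.items()}


def bray_curtis(p: dict[str, float], q: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    categories = set(p) | set(q)
    diff = sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)
    total = sum(p.get(c, 0.0) + q.get(c, 0.0) for c in categories)
    return diff / total if total else 0.0


# Hypothetical subsystem hit counts for two samples.
sample_a = relative_abundance({"Motility": 30, "Sulfur": 10})
sample_b = relative_abundance({"Motility": 10, "Sulfur": 30})
print(bray_curtis(sample_a, sample_b))  # 0.5
```

Normalizing first matters because metagenomes differ greatly in size; comparing raw counts would mostly measure sequencing depth.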
MG-RAST computation
[Figure: hours of compute time vs. input size (MB); roughly 19 hours of compute per input megabyte]
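Because the relationship on this slide is roughly linear, a back-of-the-envelope estimate is a single multiplication. This sketch assumes the ~19 hours/MB rate applies uniformly to any input and that the work splits perfectly across CPUs; both are simplifications, and the function names are hypothetical.

```python
HOURS_PER_MB = 19.0  # approximate single-CPU rate from the slide


def estimate_cpu_hours(input_mb: float, hours_per_mb: float = HOURS_PER_MB) -> float:
    """Linear estimate of single-CPU compute hours for `input_mb` megabytes of input."""
    if input_mb < 0:
        raise ValueError("input size must be non-negative")
    return input_mb * hours_per_mb


def ideal_wall_clock_hours(cpu_hours: float, n_cpus: int) -> float:
    """Best-case elapsed time if the work were split perfectly across `n_cpus`."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    return cpu_hours / n_cpus


hours = estimate_cpu_hours(10)            # a hypothetical 10 MB upload
print(hours)                              # 190.0
print(ideal_wall_clock_hours(hours, 10))  # 19.0
```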
How much so far?
~200 GS20
~200 FLX
~200 Sanger
676 metagenomes
10,012,793,995 bp (10 Gbp)
Average: ~15 Mbp per metagenome
Compute time (on a single CPU):
190,243 hours = 7,926 days = 21 years
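The numbers on this slide are internally consistent, which a few lines of arithmetic confirm. The only assumption beyond the slide is that one megabyte of input corresponds to roughly 10^6 bases (one byte per base, FASTA headers ignored).

```python
TOTAL_BP = 10_012_793_995
N_METAGENOMES = 676
CPU_HOURS = 190_243
HOURS_PER_MB = 19  # approximate rate from the compute-time slide

avg_mbp = TOTAL_BP / N_METAGENOMES / 1e6  # mean metagenome size in Mbp
days = CPU_HOURS // 24                    # whole days
years = days // 365                       # whole years
# Assumption: 1 MB of input ~ 10**6 bases.
implied_hours = round(TOTAL_BP / 1e6 * HOURS_PER_MB)

print(f"{avg_mbp:.1f} Mbp per metagenome")  # 14.8 Mbp per metagenome
print(days, years)                          # 7926 21
print(implied_hours)                        # 190243
```

The last line shows that the 190,243-hour total is simply the ~19 hours/MB rate applied to the full 10 Gbp.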
Lots of sequences
all pyrosequencing
From Sequences To Environments
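[Figure: CDA plot of metagenome functional profiles (axes: CDA 60.2%, CDA 21.7%); functional groups shown: stress, membrane transport, sulfur, signaling, capsule, motility, phosphorus, RNA, respiration; environments shown: mine, saltern, coral, fish, marine, microbialites, animals, freshwater]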
Dinsdale et al, Nature 200
Upcoming Features
• More user options (removing sequences, E-value and percent-identity cutoffs, etc.)
• More databases (ACLAME, human, etc.)
• More user-generated content (mashups) via web services and a published API
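User-selectable E-value and percent-identity cutoffs amount to filtering BLAST hits. The sketch below assumes hits arrive in NCBI BLAST's standard 12-column tabular format (`-outfmt 6`); the default cutoffs and the `Hit`, `parse_blast_tab`, and `filter_hits` names are hypothetical and do not describe MG-RAST's actual option set.

```python
from __future__ import annotations

from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_blast_tab(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST tabular output (-outfmt 6, default 12 columns), skipping comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))


def filter_hits(hits: Iterable[Hit], max_evalue: float = 1e-5,
                min_pident: float = 0.0, min_length: int = 0) -> list[Hit]:
    """Keep hits that pass the user-chosen E-value, identity, and length cutoffs."""
    return [h for h in hits
            if h.evalue <= max_evalue
            and h.pident >= min_pident
            and h.length >= min_length]


rows = [
    "read1\tprotA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t180.0",
    "read2\tprotB\t40.0\t50\t30\t1\t1\t50\t1\t50\t0.01\t30.0",
]
print([h.query for h in filter_hits(parse_blast_tab(rows), min_pident=50.0)])  # ['read1']
```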
Accessing Data via Web Services
Thanks:
Bahador Nosrat
SDSU
Workshops
Free workshops on NMPDR, RAST, MG-RAST, and SEED
Upcoming workshops: Greece, Argonne, Urbana-Champaign, San Diego
Contact Leslie McNeil (lkmcneil@ncsa.uiuc.edu) or visit http://www.nmpdr.org/
Acknowledgements
Metagenomics Annotation Server FIG
Rick Stevens
Ross Overbeek
Daniel Paarmann
Veronika Vonstein
Folker Meyer
Annotators
Bob Olson
Mark D'Souza
Statistics & Web services
Liz Dinsdale
Dana Hall
Environmental Genomics
Beltran Rodriguez-Brito
Forest Rohwer
Bahador Nosrat
and the labs that
provided sequence