Biodiversity initiative

Biodiversity initiative: Integrating Taxonomy, Genomics and Biodiversity
[Figure: three images combined as "+ + = ?????"]
Speaker: Benjamin Linard
Alfried Vogler Team
1/8
Arthropod metagenomics
1 sample: 480 beetle specimens captured in Borneo
• DNA extraction (all 480 beetles)
• PCR barcodes: est. 288 species
• Pool DNA by volume (mixed sample)
• 1x Illumina MiSeq run (8.5 Gb)
• De novo assembly into contigs
• Mitochondrial contigs (mitochondrial DNA)
2/8
Results of Alex Crampton-Platt
~5% beetle mitochondrial DNA
Shotgun output
No. reads                             33,796,432
Est. proportion mitochondrial reads   4.94%
Complete mitogenomes                  35
Partial mitogenomes > 10 kb           85
Partial mitogenomes 2-10 kb           420
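As a rough check on the figures above, the estimated number of mitochondrial reads follows from the read count and the estimated proportion. A minimal sketch; the variable names are illustrative and do not come from the slides:

```python
# Figures taken from the shotgun output table.
total_reads = 33_796_432
mito_fraction = 0.0494  # est. proportion of mitochondrial reads (4.94%)

# Estimated number of mitochondrial reads.
mito_reads = total_reads * mito_fraction

# Assembled mitochondrial sequences: complete + partial (>10 kb) + partial (2-10 kb).
mitogenome_counts = {"complete": 35, "partial_gt_10kb": 85, "partial_2_10kb": 420}
total_mitogenomes = sum(mitogenome_counts.values())

print(f"~{mito_reads:,.0f} mitochondrial reads")
print(f"{total_mitogenomes} complete or partial mitogenomes")
```

This gives roughly 1.67 million mitochondrial reads and 540 complete or partial mitogenomes.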
[Figure: biomass vs. number of reads, points labelled by taxon: Harpalinae, Buprestidae, Tenebrionidae, Coccinellidae, Chrysomelidae, Curculionidae]
Log(Biomass) = -2.37 + 0.85 × log(No. Reads)
P < 0.001; F(1,84) = 73.32; R² = 0.47
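To illustrate how the fitted line could be used, here is a minimal sketch that predicts log(biomass) from a read count. The slide does not state the logarithm base or the biomass units, so base 10 is an assumption here, and the read counts are hypothetical:

```python
import math

INTERCEPT = -2.37  # from the slide
SLOPE = 0.85       # from the slide


def predict_log_biomass(n_reads: int) -> float:
    """Predicted log(biomass) for a read count (assumes log base 10)."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return INTERCEPT + SLOPE * math.log10(n_reads)


# Hypothetical read counts for three species in the soup.
for n in (100, 1_000, 10_000):
    print(n, round(predict_log_biomass(n), 2))
```

With R² = 0.47 the fit explains about half of the variance, so such predictions are only a coarse abundance indicator.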
3/8
Genomic information?
• ~5% beetle mitochondria: taxonomy, abundance
• ~95% genomic information (~45% is Coleoptera DNA): genomic analyses, functional information?
[Figure: homologous DNA between 4 beetle metagenomic samples, mapped on Tribolium castaneum chromosome 3. Axes: position vs. number of homologous contigs. Tracks: homologous contig, % GC. Legend: chromosome region with known sequence; NNNN region (unresolved sequence).]
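The % GC track of such a plot can be computed along a sequence in fixed windows. A minimal sketch, assuming non-overlapping windows and treating N bases (unresolved sequence) as missing data; the function names and the toy sequence are illustrative, not taken from the talk:

```python
from typing import List, Optional, Tuple


def gc_percent(seq: str) -> Optional[float]:
    """Percent G+C among resolved bases (A, C, G, T); None if nothing is resolved."""
    seq = seq.upper()
    resolved = sum(seq.count(base) for base in "ACGT")
    if resolved == 0:
        return None  # e.g. an NNNN region
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / resolved


def windowed_gc(seq: str, window: int) -> List[Tuple[int, Optional[float]]]:
    """(start position, % GC) for consecutive non-overlapping windows."""
    return [(i, gc_percent(seq[i:i + window])) for i in range(0, len(seq), window)]


# Toy sequence with an unresolved stretch in the middle.
toy = "GGCCAATT" + "NNNN" + "GCGCGCAT"
for start, gc in windowed_gc(toy, 4):
    print(start, gc)
```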
4/8
Arthropod metagenomics
Computational requirements to analyse 1 arthropod soup:
Server: ~128 GB RAM, 24 cores Xeon 2.4 GHz
Assemblies:
Type            RAM (GB)   Time (6 cores)   Disk (GB)
Mitochondrial   < 10       < 12 hours       < 30
Total DNA       ~ 100      ~ 5 days         ~ 300-500
(in the best case... when data complexity is manageable by current algorithms)
Several assemblies (~1.5 per week)
[Figure: "Our last DNA assembly", runs marked Successful / Aborted]
One assembly at a time, unpredictable risk of memory overload
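The "one assembly at a time" constraint follows from the table: memory, not CPU, is the limiting resource for total DNA assemblies. A minimal sketch of that arithmetic, using the server figures and the upper bounds from the table; the function name is illustrative:

```python
SERVER_RAM_GB = 128  # from the slide
SERVER_CORES = 24    # from the slide


def max_concurrent(ram_per_job_gb: int, cores_per_job: int = 6) -> int:
    """Number of jobs that fit on the server given RAM and core needs per job."""
    by_ram = SERVER_RAM_GB // ram_per_job_gb
    by_cores = SERVER_CORES // cores_per_job
    return min(by_ram, by_cores)


print("Mitochondrial assemblies in parallel:", max_concurrent(10))
print("Total DNA assemblies in parallel:", max_concurrent(100))
```

With 6 cores per job the CPU would allow four parallel jobs, but a ~100 GB total DNA assembly leaves room for only one on a 128 GB server.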
Genomic analyses:
Type                    RAM (GB)   Time (6 cores)   Disk (GB)
Homology / alignments   < 2        ~ 5 days         < 10
Statistics / graphs     < 2        ~ 1 day          < 2
Need support of an SQL database, currently ~300 GB.
Biodiversity & functional analysis: future?
(for the analysis of 1 arthropod soup)
5/8
Growth of the analysis pipeline!
More CPU to perform complete metagenomic analyses:
• MIGS standard (D. Field et al., 2008)
• MINIMESS standard (J. Raes et al., 2007)
~1,000 arthropod transcriptomes/genomes
~50 beetle species transcriptomes
~50 beetle draft/complete genomes
Long-term perspective: disk space consuming...
• more reference genomes
• larger / more complex databases
Arthropod biodiversity
n traps per site (soil, canopy, ground…), N plots: many soup analyses...
• Mitochondrial analysis: not a problem
• Full DNA analysis...
  → Which computational power could we access?
  → More computations to answer more interesting questions
• Pooling DNA?
  → We will lose metagenomic resolution...
  → More complex assemblies
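To give a sense of scale, the sampling design can be combined with the throughput of ~1.5 total DNA assemblies per week quoted earlier. A minimal sketch; the trap and plot counts are hypothetical, and treating every trap in every plot as one soup is an assumption:

```python
ASSEMBLIES_PER_WEEK = 1.5  # throughput quoted in the talk ("~1.5 per week")


def weeks_for_design(traps_per_site: int, plots: int) -> float:
    """Weeks of total DNA assembly for a design, one soup per trap per plot."""
    soups = traps_per_site * plots  # assumption: each plot is sampled by every trap type
    return soups / ASSEMBLIES_PER_WEEK


# Hypothetical design: 3 trap types (soil, canopy, ground) and 10 plots.
print(weeks_for_design(3, 10), "weeks of assembly time on the current server")
```

Under these assumptions, 30 soups would occupy the current server for about 20 weeks of total DNA assembly alone.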
6/8
7/8
A source of DNA collection
Metadata (1 arthropod soup)
• Soup metadata: 125 species, French Guiana
• Soup metadata: 24 species, Congo
• Soup metadata: 24 unidentified Tenebrionidae from Madagascar
General:
• Sampling location, date, methods...
Mitochondrial information:
• Identified species/taxa
• Abundance
• Other identified species (plants, fungi...)
Genomic information:
• Identified genes
• Identified functions (sugar degradation)
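The three metadata groups listed above could be held in one record per soup. A minimal sketch of such a record; the class name, field names and example values are hypothetical and do not describe an existing NHM schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SoupMetadata:
    # General
    location: str
    date: str
    methods: str
    # Mitochondrial information: species/taxon -> abundance estimate
    species_abundance: Dict[str, float] = field(default_factory=dict)
    other_species: List[str] = field(default_factory=list)  # plants, fungi...
    # Genomic information
    genes: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)

    @property
    def n_species(self) -> int:
        """Number of identified species/taxa in the soup."""
        return len(self.species_abundance)


# Hypothetical example record.
soup = SoupMetadata(
    location="Madagascar",
    date="unknown",
    methods="unknown",
    species_abundance={"Tenebrionidae sp. 1": 0.6, "Tenebrionidae sp. 2": 0.4},
    functions=["sugar degradation"],
)
print(soup.n_species, "species in the soup from", soup.location)
```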
Access: scientific community, NHM data portal, Kemu
• Soup metadata: 53 species + abundance data
• Query: “I want all NHM collection data concerning the species X”
• Result: 8 specimens, 4 images, 2 metagenomic samples ...
Data storage? NHM databases integration?
In NHM: integration, links, queries ...
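The kind of query quoted above ("all NHM collection data concerning the species X") could look as follows once soup records sit beside other collection records. A minimal sketch using an in-memory SQLite database; the table layout and the rows are hypothetical and are not the schema of the NHM data portal or Kemu:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE record (
        id INTEGER PRIMARY KEY,
        species TEXT NOT NULL,
        kind TEXT NOT NULL  -- 'specimen', 'image' or 'metagenomic sample'
    )
    """
)

# Hypothetical rows mirroring the example counts on the slide.
rows = (
    [("species X", "specimen")] * 8
    + [("species X", "image")] * 4
    + [("species X", "metagenomic sample")] * 2
    + [("species Y", "specimen")]
)
conn.executemany("INSERT INTO record (species, kind) VALUES (?, ?)", rows)


def collection_summary(species: str) -> dict:
    """Count records of each kind for one species."""
    cur = conn.execute(
        "SELECT kind, COUNT(*) FROM record WHERE species = ? GROUP BY kind ORDER BY kind",
        (species,),
    )
    return dict(cur.fetchall())


print(collection_summary("species X"))
```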
Thank you for your attention.
BEETLE SOUP, your daily source of DNA!