Biodiversity initiative: Integrating Taxonomy, Genomics and Biodiversity
+ + = ?????
Speaker: Benjamin Linard
Alfried Vogler Team

[1/8]

Arthropod metagenomics [2/8]

1 SAMPLE: 480 beetle specimens captured in Borneo (mixed sample)
  - PCR barcodes: est. 288 species
  - DNA extraction: all 480 beetles
  - Pool DNA by volume
  - 1x Illumina MiSeq run (8.5 Gb)
  - De novo assembly into contigs
  - Mitochondrial contigs (mitochondrial DNA)

Results of Alex Crampton-Platt [3/8]

~5% beetle mitochondrial DNA

Shotgun output
  No. reads                               33,796,432
  Est. proportion mitochondrial reads     4.94%
  Complete mitogenomes                    35
  Partial mitogenomes > 10 kb             85
  Partial mitogenomes 2-10 kb             420

[Figure: biomass vs. number of reads; labelled groups: Harpalinae, Buprestidae, Tenebrionidae, Coccinellidae, Chrysomelidae, Curculionidae]
log(Biomass) = -2.37 + 0.85 x log(No. reads)
P < 0.001; F(1,84) = 73.32; R2 = 0.47

Genomic information? [4/8]

~5% beetle mitochondria: taxonomy, abundance
~95% genomic information, ~45% is Coleoptera DNA: genomic analyses, functional information?

[Figure: homologous DNA between 4 beetle metagenomic samples, mapped on Tribolium castaneum, chromosome 3. Axes: position, # homologous contigs. Tracks: homologous contig, % GC. Legend: chromosome region with known sequence; NNNN region (unresolved sequence)]

Arthropod metagenomics [5/8]

Computational requirements to analyse 1 arthropod soup
Server: ~128 GB RAM, 24 cores, Xeon 2.4 GHz

Assemblies:
  Type             RAM (GB)    Time (6 cores)    Disk (GB)
  Mitochondrial    < 10        < 12 hours        < 30
  Total DNA        ~ 100       ~ 5 days          ~ 300-500
  (in the best case... when data complexity is manageable by current algorithms)

[Figure: several assemblies (~1.5 per week); legend: successful, aborted; label: our last DNA assembly]
One assembly at a time, unpredictable risk of memory overload.

Genomic analyses:
  Type                    RAM (GB)    Time (6 cores)    Disk (GB)
  Homology/alignments     < 2         ~ 5 days          < 10
  Statistics/graphs       < 2         ~ 1 day           < 2
  Needs the support of an SQL database, currently ~300 GB.

Biodiversity & functional analysis
Future? (for the analysis of 1 arthropod soup)

Growth of the analysis pipeline! [6/8, 7/8]

More CPU to perform complete metagenomic analyses:
  • MIGS standard (D. Field et al., 2008)
  • MINIMESS standard (J. Raes et al., 2007)

~1,000 arthropod transcriptomes/genomes
~50 beetle species transcriptomes
~50 beetle draft/complete genomes

Long-term perspective: disk space consuming...
  - more reference genomes
  - larger/more complex databases

Arthropod biodiversity: n traps per site (soil, canopy, ground...), N plots
Many soup analyses...
  - Mitochondrial analysis: not a problem
  - Full DNA analysis... What computational power could we access?
More computations to answer more interesting questions.
Pooling DNA? We would lose metagenomic resolution... and get more complex assemblies.

A source of DNA collection [8/8]

Metadata (1 arthropod soup):
  General: sampling location, date, methods...
  Mitochondrial information: identified species/taxa, abundance, other identified species (plants, fungi...)
  Genomic information: identified genes, identified functions (sugar degradation)

Example soups:
  Soup metadata: 125 species, French Guiana
  Soup metadata: 24 species, Congo
  Soup metadata: 24 unidentified Tenebrionidae from Madagascar
  Soup metadata: 53 species + abundance data

Access: scientific community, NHM data portal, Kemu
"I want all NHM collection data concerning the species X"
  -> 8 specimens, 4 images, 2 metagenomic samples ...

Data storage in NHM?
NHM databases integration? Integration, links, queries ...

Thank you for your attention.
BEETLE SOUP, Your daily source of DNA!
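
Supplementary sketch for the results slide: the shotgun figures and the biomass regression can be checked with a few lines of Python. This is only an illustration, not the team's pipeline. The function names are hypothetical, and the slide gives neither the logarithm base nor the biomass units, so base 10 is assumed here and can be swapped.

```python
import math

# Figures quoted on the results slide.
TOTAL_READS = 33_796_432      # No. reads in the shotgun output
MITO_FRACTION = 0.0494        # est. proportion of mitochondrial reads (4.94%)


def estimated_mito_reads(total_reads=TOTAL_READS, fraction=MITO_FRACTION):
    """Approximate number of mitochondrial reads in the run."""
    return round(total_reads * fraction)


def predicted_log_biomass(n_reads, log=math.log10):
    """Apply the slide's regression: log(Biomass) = -2.37 + 0.85 * log(No. reads).

    ASSUMPTION: base-10 logarithm by default; the slide does not state the base.
    The result stays on the log scale, in whatever biomass units the study used.
    """
    return -2.37 + 0.85 * log(n_reads)


if __name__ == "__main__":
    print("Estimated mitochondrial reads:", estimated_mito_reads())
    for reads in (100, 1_000, 10_000):
        print(reads, "reads -> predicted log(biomass):",
              round(predicted_log_biomass(reads), 3))
```

With R2 = 0.47, the fitted line explains about half of the variance, so any single prediction should be read as a rough trend rather than a per-species estimate.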