Current Challenges in Metagenomics: an Overview
Chandan Pal
17th December, GoBiG Meeting

Why Metagenomics?
• Study of microbial communities
• Many microbes are unculturable or difficult to grow in the lab
• Understanding microbial populations and diversity

Metagenomic Workflow
• Environmental sampling
• Library preparation
• Sequencing (NGS platforms)
• Quality control of raw reads (duplicated reads, low-quality reads)
• Read alignment to reference databases
• Assembling the reads into longer contigs
• Predicting genes and assigning functions to the genes

Challenges: sampling and bioinformatics analysis
• Sampling (biodiversity richness, evenness, extreme environments, DNA extraction)
• Analysing replicates (soil samples, water samples etc.)
• Identification of microbial communities
  - Based on knowledge of similar organisms
  - Variability among closely related species (e.g. E. coli K12 vs O157:H7)

Challenges
• Large volume of sequencing data: data storage, processing, lack of supercomputing resources, no comprehensive pipeline
• Sequence artifacts (sequence clustering: CD-HIT, SEED; errors: SLP, PyroNoise, AmpliconNoise)
• Metagenomic assembly (gene rearrangements)
  - Highly conserved sequences from different species
  - Assembling all reads into a single sequence
  - Lateral gene transfer
  - Low coverage
  - Speed (GPU, cloud computing)
• Gene prediction (frameshifts at low coverage) and functional annotation (homology-based, depends on public databases)

Challenges
• Quality and context of metadata (meaningful comparison between studies)
• Integrative approach
• Working on the server or cloud computing
• Data submission (one Illumina run = 600 Gb = 4-5 times fastq raw files)
• Reverse metagenomics approach
• Biotechnological applications
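
Example: quality control of raw reads
The workflow step on quality control of raw reads (duplicated reads, low-quality reads) can be illustrated with a minimal Python sketch. It is not the pipeline used in the talk. It assumes 4-line FASTQ records with Phred+33 quality encoding, treats only exact sequence matches as duplicates, and uses an illustrative mean-quality cutoff of 20. The names parse_fastq, mean_phred and quality_control are hypothetical helpers introduced here, and the four reads are made-up toy data.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(text: str) -> Iterator[Read]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def quality_control(reads: Iterable[Read], min_mean_q: float = 20.0) -> List[Read]:
    """Drop exact duplicate sequences and reads with low mean quality.

    The cutoff of 20 is an illustrative assumption, not a recommendation.
    A sequence is only remembered once a copy of it has been kept, so a
    good-quality copy is not lost because a poor copy appeared first.
    """
    seen = set()
    kept = []
    for header, seq, qual in reads:
        if seq in seen:
            continue  # duplicated read
        if mean_phred(qual) < min_mean_q:
            continue  # low-quality read
        seen.add(seq)
        kept.append((header, seq, qual))
    return kept


# Toy input: read2 duplicates read1, read3 has very low quality.
example = """
@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
IIIIIIII
@read3
TTGACCAA
+
########
@read4
GGCATTCA
+
FFFFFFFF
"""

kept = quality_control(parse_fastq(example))
print([header for header, _, _ in kept])  # ['@read1', '@read4']
```

Real data sets are far too large to hold every sequence in an in-memory set, which is one reason the clustering tools and computing resources listed under Challenges matter. Dedicated tools also handle near-duplicates and sequencing errors, which this exact-match sketch ignores.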