Current Challenges in Metagenomics: an Overview

Chandan Pal
17th December, GoBiG Meeting
Why Metagenomics?
• Study of microbial communities directly from environmental samples
• Many microbes are unculturable or difficult to grow in the lab
• Understanding microbial populations and diversity
Metagenomic Workflow
• Environmental sampling
• Library preparation
• Sequencing (NGS platforms)
• Quality control of raw reads (removing duplicated and low-quality reads)
• Read alignment to reference databases
• Assembling the reads into longer contigs
• Predicting genes and assigning functions to them
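The quality-control step above removes duplicated and low-quality reads. The following is a minimal Python sketch of that idea. It assumes Sanger/Illumina 1.8+ quality encoding (ASCII offset 33). The thresholds (mean Phred ≥ 20, length ≥ 30 bp) are illustrative choices and are not taken from the talk. The function names are hypothetical. Real QC tools also trim adapters and low-quality read ends.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Sanger / Illumina 1.8+ encoding (ASCII offset 33)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record], min_mean_q: float = 20.0,
                 min_len: int = 30) -> List[Record]:
    """Drop exact-duplicate sequences, short reads and reads with low mean quality.
    The thresholds are illustrative defaults, not values from the talk."""
    seen = set()
    kept: List[Record] = []
    for header, seq, qual in records:
        if seq in seen:
            continue  # exact duplicate of a read that was already kept
        if len(seq) < min_len or mean_phred(qual) < min_mean_q:
            continue  # too short, or low average base quality
        seen.add(seq)
        kept.append((header, seq, qual))
    return kept


# Toy input: 'I' encodes Q40 and '#' encodes Q2 with offset 33.
example = [
    "@r1", "ACGT" * 10, "+", "I" * 40,
    "@r2", "ACGT" * 10, "+", "I" * 40,   # exact duplicate of r1
    "@r3", "TTGCA" * 8, "+", "#" * 40,   # low quality
]
print([h for h, _, _ in filter_reads(parse_fastq(example))])  # ['@r1']
```

Only exact duplicates are removed here. Near-duplicates and PCR artifacts need the clustering-style approaches discussed under sequence artifacts.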
Challenges: from sampling to bioinformatics analysis
• Sampling (biodiversity richness and evenness, extreme environments, DNA extraction)
• Analysing replicates (soil samples, water samples)
• Identification of microbial communities
- Relies on knowledge of similar (reference) organisms
- Variability among closely related species and strains
(e.g. E. coli K-12 vs O157:H7)
• Large volumes of sequencing data (storage, processing),
lack of supercomputing resources, no comprehensive …
• Sequence artifacts (sequence clustering: CD-HIT, SEED;
sequencing errors: SLP, PyroNoise, AmpliconNoise)
• Metagenomic assembly (gene rearrangements)
- Highly conserved sequences shared by different species
- Assembling all reads into a single sequence
- Lateral gene transfer
- Low coverage
- Speed (GPU, cloud computing)
• Gene prediction (frameshifts at low coverage) and functional
annotation (homology-based, depends on public databases)
• Quality and context of metadata (needed for meaningful
comparison between studies)
• Integrative approach
• Working on servers or with cloud computing
• Data submission (one Illumina run = 600 Gb =
4–5 times the raw FASTQ files)
• Reverse metagenomics approach
• Biotechnological applications
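The sampling challenge above mentions biodiversity richness and evenness. The following is a minimal Python sketch that assumes per-taxon read counts are already available. It takes richness S as the number of observed taxa, the Shannon index H' = -Σ p_i ln p_i, and Pielou's evenness J = H'/ln S. The choice of these indices, the function names and the toy counts are illustrative assumptions. The talk does not prescribe them.

```python
import math
from typing import Dict


def richness(counts: Dict[str, int]) -> int:
    """Observed richness S: number of taxa with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts: Dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def pielou_evenness(counts: Dict[str, int]) -> float:
    """Pielou's evenness J = H' / ln(S); undefined (NaN here) when S <= 1."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else float("nan")


# Hypothetical read counts per taxon for two samples with the same richness.
even = {"taxon_A": 25, "taxon_B": 25, "taxon_C": 25, "taxon_D": 25}
skewed = {"taxon_A": 97, "taxon_B": 1, "taxon_C": 1, "taxon_D": 1}
for name, sample in [("even", even), ("skewed", skewed)]:
    print(name, richness(sample), round(pielou_evenness(sample), 3))
```

Both samples have S = 4. J is 1 for the even sample and much lower for the skewed one. This is why richness alone is not enough to compare communities.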
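The sequence-artifact challenge above lists clustering tools such as CD-HIT. The following is a simplified Python sketch of greedy incremental clustering in that spirit: sequences are visited longest first, and each joins the first cluster representative it matches above an identity threshold. This is not CD-HIT's implementation, which uses short-word filtering and alignment. The naive ungapped identity, the 0.9 threshold and the function names are hypothetical simplifications.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Naive ungapped identity: matching positions from the first base,
    divided by the length of the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / n


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longer sequences are visited first; each sequence
    joins the first representative it matches at >= threshold, else founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no representative is close enough
    return clusters


# Hypothetical reads: r2 differs from r1 at one of 10 bases, r4 is a prefix of r1.
reads = {"r1": "ACGTACGTAC", "r2": "ACGTACGTAA", "r3": "TTTTGGGGCC", "r4": "ACGTACGT"}
print(greedy_cluster(reads))  # {'r1': ['r1', 'r2', 'r4'], 'r3': ['r3']}
```

Raising the threshold to 1.0 puts r2 in its own cluster. The threshold therefore decides whether a single sequencing error is absorbed or inflates the apparent diversity.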
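The assembly challenge above notes that highly conserved sequences from different species complicate assembly. The following toy Python sketch uses a de Bruijn graph, one common assembly approach. The talk does not name an assembler, so this choice is an assumption. The reads, the small k = 4 and the function names are hypothetical. Real assemblers use much larger k and also handle sequencing errors and uneven coverage.

```python
from collections import defaultdict
from typing import Dict, List, Set


def de_bruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Toy de Bruijn graph: each k-mer in a read adds an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph: Dict[str, Set[str]]) -> List[str]:
    """Nodes with more than one successor, where a contig walk becomes ambiguous."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Hypothetical reads from two species sharing the conserved core "GATTACA".
reads = ["TTGATTACACC", "AAGATTACAGG"]
graph = de_bruijn(reads, k=4)
print(branching_nodes(graph))  # ['ACA']
```

Paths from the two species merge at "GAT" and split again only at "ACA", so the conserved core collapses into one shared path. Nothing in the graph itself says which flank belongs to which species, and this is how chimeric contigs arise.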
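The gene-prediction challenge above mentions frameshifts at low coverage. The following minimal Python sketch scans only the three forward frames for the longest ATG…stop ORF. It is a deliberate simplification: real gene predictors also scan the reverse strand, use alternative start codons and rely on statistical models. The sequences and function name are hypothetical. The sketch shows how a single deleted base, for example from a sequencing error, can hide an otherwise complete gene.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Longest ATG...stop ORF on the forward strand as (start, end), end exclusive.
    Only the three forward frames and the ATG start codon are considered."""
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # ORF closed; look for the next ATG
    return best


gene = "ATG" + "GCT" * 10 + "TAA"   # hypothetical 36-bp gene
shifted = gene[:15] + gene[16:]     # one base deleted, e.g. a sequencing error
print(longest_orf(gene))     # (0, 36)
print(longest_orf(shifted))  # None
```

After the deletion, the original TAA stop codon is no longer in frame with the ATG. No complete ORF is found, even though the gene is almost entirely present.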