presentation - Joachim De Schrijver


Sequencing technology
› Roche/454 GS-FLX (‘454’)
› Illumina

Prokaryotic profiling
› De novo genome sequencing
› Metagenomics
› SNP profiling
› Species quantification

Viral profiling
› De novo genome sequencing

Classic chain-terminator sequencing
Dye chain-terminator sequencing
Next-generation sequencing

Next-gen sequencing principle
› Massively parallel
› Add nucleotides (A, C, G, T)
› Detect the signal

Roche/454 GS-FLX+ (‘454’)
› Pyrosequencing
  - Problems with homopolymers (e.g. AAAAAA)
› Long-read sequencing: 500-1000 bp
› Variable sequencing length
› 1 million reads/run, ~1 Gb/run
› Sequencing speed: ~ 1 day/run
› Next-next generation: IonTorrent PGM/Proton
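Why homopolymers are a problem: in pyrosequencing the nucleotides are flowed over the DNA one at a time in a fixed cycle, and the light signal is roughly proportional to the number of identical bases built in during that flow. A run like AAAAAA gives one tall peak whose height has to be read as "6". A toy, noise-free sketch, assuming a TACG flow order chosen only for illustration:

```python
FLOW_ORDER = "TACG"  # assumed flow cycle, for illustration only


def flowgram(read: str, flow_order: str = FLOW_ORDER, max_cycles: int = 100) -> list[tuple[str, int]]:
    """Idealised, noise-free flowgram: number of bases incorporated per nucleotide flow."""
    signals = []
    pos = 0
    for _ in range(max_cycles):
        for base in flow_order:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))  # signal height ~ homopolymer length
            if pos == len(read):
                return signals
    return signals


def decode(signals: list[tuple[str, float]]) -> str:
    """Rebuild the read by rounding each flow's (possibly noisy) signal to a base count."""
    return "".join(base * round(height) for base, height in signals)


read = "TAAAAAAGC"
flows = flowgram(read)
print(flows)
print(decode(flows))  # TAAAAAAGC
# A slightly weak signal for the 6-mer (5.4 instead of 6) is read as 5 A's
noisy = [(b, 5.4) if (b, h) == ("A", 6) else (b, h) for b, h in flows]
print(decode(noisy))  # TAAAAAGC: one A lost in the homopolymer
```

Single bases are robust to small signal noise, but the longer the homopolymer, the harder it is to tell 6 from 5 or 7 incorporated bases.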

Illumina
› Sequencing by synthesis
› Short-read sequencing: 36, 72, …, 150 bp
› Fixed sequencing length
› 1 billion reads/run, ~100 Gb/run (= 33x the human genome!)
› Sequencing speed: 3-10 days, depending on read length
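The "33x the human genome" figure is ordinary coverage arithmetic: total sequenced bases divided by genome size. A sketch with the numbers from this slide; the ~100 bp read length (implied by 1 billion reads giving ~100 Gb) and the ~3 Gb human genome size are assumptions:

```python
def coverage(n_reads: float, read_length_bp: float, genome_size_bp: float) -> float:
    """Mean coverage (sequencing depth) = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


# 1 billion reads of ~100 bp (~100 Gb) against a ~3 Gb human genome (assumed size)
human = coverage(n_reads=1e9, read_length_bp=100, genome_size_bp=3e9)
print(f"{human:.1f}x")  # 33.3x
```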

SOLiD
› Short-read sequencing (similar to Illumina)

454 and Illumina
› Price per run: ~$10,000
› Price per machine: $200,000-500,000

› Supporting IT hardware
› Peripheral devices such as fragmentation instruments, PCR equipment …
› Negotiating power…

Use service centers!
› Nxtgnt (BE), GATC (EU), BaseClear (NL), BGI …
› No overhead cost, no maintenance etc.
› Cheaper

Next-generation sequencing has become 2nd-generation sequencing
Next-next-generation sequencing is almost there: 3rd-generation sequencing

› Helicos: True Single Molecule Sequencing
› IonTorrent/Life: Cheap and fast
› Nanopore: Unlimited read size
› …

The evolution of sequencing technology goes hand in hand with the evolution of
› IT infrastructure/hardware
› Analysis software

Hardware
› 1 Illumina run ≈ 100 Gb text file ≈ a 5-million-page book
› Processing power and storage are an issue!

Software
› Mapping to the human genome: ‘a couple of hours’

Prokaryotic genomics 101
› Prokaryotes = bacteria + archaea
› Prokaryotic genomes
  - Large circular genome (0.5-10 Mb): the ‘chromosome’
  - Small plasmids (1-1000 kb), e.g. carrying virulence factors, antibiotic resistance …
  - (Almost) no introns
  - Easy ORF annotation
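Because prokaryotic genes have (almost) no introns, a first-pass gene search can simply look for open reading frames: an ATG start codon followed, in the same frame, by a stop codon (TAA, TAG or TGA). A minimal sketch that scans only the forward strand; the min_codons cut-off is an arbitrary illustrative value, and real annotation tools also scan the reverse strand, consider alternative start codons and use statistical gene models:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return (start, end) of ORFs on the forward strand: ATG ... stop codon.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14)]: ATG AAA TTT TAG
```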

1953: Watson/Crick discover the DNA double helix
1977: First complete genome: bacteriophage φX174
1995: First genome of a free-living organism: H. influenzae
2001: First draft of the human genome
2006: >200 complete bacterial genomes
2012: An uncountable number of bacterial genomes have been sequenced using next-gen sequencing

Complete bacterial genomes used to be
› Expensive
› Difficult to obtain
› ‘Nature’ or ‘Science’ work
› This remained the case until the invention of next-generation sequencing

Using next-generation sequencing, de novo sequencing has become
› Relatively easy
› Relatively cheap
› Routine research

Already >10 complete bacterial genomes published in 2012
› More than just an assembly!

Practical
1. Get some DNA from an isolated species of interest
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Assemble (~1 h)
  - Pure de novo assembly
  - Guided assembly
5. Annotate the genome (days-weeks)

Assembly: multiple ‘short’ reads → 1 long sequence
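A toy illustration of the assembly idea, assuming error-free reads from one strand: greedy overlap merging, which repeatedly joins the two sequences with the longest suffix/prefix overlap. This is the greedy strategy in its simplest form; Velvet, for example, works with a de Bruijn graph instead, and real assemblers must also deal with sequencing errors, repeats and both strands. The function names and the min_overlap parameter (assumed to be at least 1) are illustrative:

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge the pair of contigs with the largest overlap until no overlap is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best = (0, -1, -1)  # (overlap length, left contig index, right contig index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGGAT"]))  # ['ATGGCGTACGGAT']
```

The greedy choice is fast but can go wrong in repeats, which is one reason several assembly packages with different strategies exist.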

Existing software
› Velvet
› SSAKE
› Newbler
› …
Source: Nature 2009, MacLean et al.

Relatively cheap
› Sequencing cost: depending on coverage
  - Illumina, 30x, 5 Mb genome: $10-$100
  - 454, 30x, 5 Mb genome: $1000-$5000
› Equipment
  - IT infrastructure, sequencing equipment, people …
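The cost figures follow from the same coverage arithmetic: required bases = coverage × genome size, and the cost is the fraction of a run those bases take up. A sketch assuming a 5 Mb bacterial genome (bacterial genomes are 0.5-10 Mb) and the ~100 Gb and ~$10,000 per Illumina run quoted earlier; it ignores library preparation and assumes the run is shared with other multiplexed samples:

```python
def sequencing_cost(genome_size_bp: float, coverage: float,
                    run_yield_bp: float, run_price: float) -> float:
    """Cost of the share of one run needed to reach the target coverage."""
    bases_needed = genome_size_bp * coverage  # e.g. 5 Mb x 30 = 150 Mb
    return bases_needed / run_yield_bp * run_price


# 5 Mb genome at 30x on an Illumina run (~100 Gb, ~$10,000 per run)
cost = sequencing_cost(genome_size_bp=5e6, coverage=30, run_yield_bp=100e9, run_price=10_000)
print(f"${cost:.0f}")  # $15
```

With 454's ~1 Gb/run, the same 150 Mb takes about 15% of a run, which is why the 454 figure is so much higher; the 454 run price itself is not given on the slides.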

Relatively easy
› Need for IT support
› No out-of-the-box standard solution for everything
› Several different software packages for assembly

De novo genome assembly
› Study of a single species
› Need for species isolation

Metagenomics analysis
› Study of a community of species
› No need for isolation (culturing bias!)
› Study the collective gene pool and function of the community/ecology
› No need to attribute functions to individual species

Practical
1. Get bacterial DNA or RNA from a sample
  - Soil
  - Gut/fecal
  - Ocean water (e.g. Craig Venter)
  - …
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Map to a database of known genes (1 day)
5. Annotate/analyse the community (weeks)
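Step 4 can be illustrated with a toy read classifier: each read is assigned to the gene with which it shares the most k-mers, and the read counts per gene form a functional profile of the community. This k-mer voting is a simple stand-in for the alignment tools used in practice; the gene names, sequences and k = 5 below are hypothetical:

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gene_profile(reads: list[str], gene_db: dict[str, str], k: int = 5) -> Counter:
    """Assign each read to the gene sharing the most k-mers with it; count reads per gene."""
    index = {name: kmers(seq, k) for name, seq in gene_db.items()}
    counts = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        best_gene, best_hits = None, 0
        for name, gene_kmers in index.items():
            hits = len(read_kmers & gene_kmers)
            if hits > best_hits:
                best_gene, best_hits = name, hits
        counts[best_gene or "unassigned"] += 1  # reads matching no gene
    return counts


# Hypothetical mini "database of known genes" and reads
gene_db = {"geneA": "ATGGCTAGCTAGGCTTACG", "geneB": "TTGACCGGTAACCGTTAGC"}
reads = ["GCTAGCTAGG", "CCGGTAACCG", "AAAAAAAAAA"]
print(gene_profile(reads, gene_db))
```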

2010: Giant panda genome (2nd carnivore)
› No umami taste receptor → no meat affinity
› The panda is more a dog than a bear
› The panda is a carnivore eating bamboo!
Still 2010!: panda ‘microbiome’
  - The gut microbiome of the panda reveals the presence of bamboo/cellulose-degrading pathways


A clinical example: the gut microbiome can predict diabetes and malnourishment
› Brown et al., Plos One (2011)
› Valladares et al., Plos One (2010)
› Gupta et al., Gut Pathogens (2011)

Classical SNP analysis - practical
1. Design PCR primers
2. Generate amplicons
3. Re-sequence using long-read sequencing
  - Conserve ‘SNP blocks’
4. Detect SNPs
5. Correlate SNPs with drug resistance, severity of symptoms …

Amplicon resequencing is the same for human, prokaryotic and viral analyses
  - Many standardized out-of-the-box solutions available
  - Very simple analysis
  - Watch out for overkill…
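Once amplicon reads are aligned, step 4 (detect SNPs) comes down to comparing each position with the reference and keeping the differences seen in most reads, so that isolated sequencing errors are ignored. A minimal sketch, assuming the reads are already aligned to the start of the amplicon without insertions or deletions; the 80% threshold and the sequences are illustrative, not a standard:

```python
from collections import Counter


def call_snps(reference: str, reads: list[str], min_fraction: float = 0.8) -> list[tuple[int, str, str]]:
    """Return (position, reference base, read base) where most reads disagree with the reference."""
    snps = []
    for pos, ref_base in enumerate(reference):
        column = Counter(read[pos] for read in reads if pos < len(read))  # bases at this position
        if not column:
            continue
        base, count = column.most_common(1)[0]
        if base != ref_base and count / sum(column.values()) >= min_fraction:
            snps.append((pos, ref_base, base))
    return snps


reference = "ACGTACGTAC"
# Four reads carry A->T at position 4; one read has a sequencing error at position 7
reads = ["ACGTTCGTAC"] * 4 + ["ACGTACGAAC"]
print(call_snps(reference, reads))  # [(4, 'A', 'T')]
```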

› Don’t use a bazooka to kill a fly!
› Throughput can be too high

Profile the coding region of hepatitis C (Lauck et al. 2012)

Use next-generation sequencing to predict the optimal HIV therapy (Thielen et al. 2012)

Imagine the following research questions
› Which (known) species/groups are present in a certain sample?
› Does this composition change with a certain treatment, a change of conditions, between patients etc.?

No need for de novo genome sequencing
No metagenomics: species instead of functions

Prokaryotes carry the 16S rDNA gene, coding for ribosomal RNA
  - The 16S rDNA region is ~1.5 kb long
  - 16S rDNA is specific for each species/strain
  - Theoretical: 4^1500 ≈ 10^903 possibilities
  - In practice: 16S rDNA sequences are known for millions of species
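The theoretical number of 16S variants follows from log10(4^1500) = 1500 × log10(4) ≈ 903.1. Python integers have arbitrary precision, so the exponent can be checked directly:

```python
import math

n_sequences = 4 ** 1500          # all possible 1,500 bp sequences over A, C, G, T
print(len(str(n_sequences)))     # 904 digits, i.e. about 1.2 x 10^903
print(1500 * math.log10(4))      # ~903.09, the exponent of 10
```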


16S rDNA can be isolated from different species using universal PCR primers
› Isolate/amplify different regions using the same primers
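Universal primers usually contain degenerate positions, written with IUPAC ambiguity codes (e.g. Y = C or T), so that one primer binds the slightly different 16S sequences of many species. A small sketch of matching such a primer; the primer GTGYCAG and the sequences are hypothetical examples, not real 16S primers:

```python
import re

# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_pattern(primer: str) -> re.Pattern:
    """Compile a (possibly degenerate) primer into a regex, one character class per position."""
    return re.compile("".join(f"[{IUPAC[b]}]" for b in primer.upper()))


def primer_sites(seq: str, primer: str) -> list[int]:
    """0-based positions where the primer matches seq exactly (forward strand only)."""
    pattern = primer_pattern(primer)
    seq = seq.upper()
    return [i for i in range(len(seq) - len(primer) + 1) if pattern.match(seq, i)]


# The same degenerate primer binds two hypothetical species variants (C vs T at the Y position)
print(primer_sites("AAGTGCCAGTT", "GTGYCAG"))  # [2]
print(primer_sites("TTGTGTCAGAA", "GTGYCAG"))  # [2]
print(primer_sites("AAGTGACAGTT", "GTGYCAG"))  # []
```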

Compare the isolated sequences against a database of known sequences

Practical procedure
1. Sample an environment and isolate DNA
2. Do a universal PCR amplification
3. Sequence using long-read sequencing: the longer the better!
4. Obtain sequences
5. Map sequences against a reference database
6. Annotate the data

Example: the Antarctica project
› Which parameters determine the composition of bacterial communities in Antarctic lakes?
› 20 different samples/lakes
› Sequence 16S rDNA genes
› 1 x 454 run (1 million 500 bp sequences)
› Map all sequences back to the RDP database

Analyse the data using computing power
› Compare different locations
  - Is species A present in location 1, location 2, …?
› Assess the distribution in a single location
  - How dominant is the most dominant species in location 1?
  - How many species are in location 1?
  - …

Visualize!

Analyse different samples at different taxonomic levels
› Include the taxonomic tree of life of bacteria
› Use a ‘taxonomy browser’
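Analysing at different taxonomic levels amounts to summing read counts over the lineage of each classified sequence. A sketch, assuming each classification is stored as a semicolon-separated lineage from domain to genus (an assumed format, not necessarily what a given classifier outputs); the counts are hypothetical:

```python
from collections import Counter

RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def aggregate(counts: dict[str, int], rank: str) -> Counter:
    """Sum read counts per taxon at the given rank; lineages too short are 'unclassified'."""
    depth = RANKS.index(rank)
    totals = Counter()
    for lineage, n in counts.items():
        parts = lineage.split(";")
        taxon = parts[depth] if depth < len(parts) else "unclassified"
        totals[taxon] += n
    return totals


# Hypothetical read counts per lineage (the last one is only classified down to phylum)
counts = {
    "Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Polaromonas": 60,
    "Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Flavobacterium": 30,
    "Bacteria;Cyanobacteria": 10,
}
print(aggregate(counts, "phylum"))
print(aggregate(counts, "genus"))
```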

Analyse a single location

Compare different locations

Analysis                 Lab work difficulty      Analysis difficulty
De novo genome           ++ (isolate)             +
Metagenomics             +                        +++ (pathways etc.)
SNP                      +++ (design primers)     ++ (correlate)
Species quantification   ++ (universal primers)   ++

Viral profiling
› Viral profiling = prokaryotic profiling, but…
  - Cheaper
  - Faster
  - Easier
› De novo genome sequencing = OK
› Don’t spend $10,000 on a 100 kb genome!
› Multiplexing/pooling capacity is limited!

Watch out for overkill
› An Illumina run can be split into 8 lanes
› >20 samples per lane can be combined
  - Still >100 Mb per sample…
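The arithmetic behind "still >100 Mb per sample", using the ~100 Gb per Illumina run quoted earlier, 8 lanes and 20 samples per lane; the 100 kb viral genome is the example from the viral profiling slide:

```python
RUN_YIELD_BP = 100e9     # ~100 Gb per Illumina run
LANES = 8
SAMPLES_PER_LANE = 20
VIRAL_GENOME_BP = 100e3  # 100 kb viral genome

per_sample = RUN_YIELD_BP / LANES / SAMPLES_PER_LANE
print(f"{per_sample / 1e6:.0f} Mb per sample")          # 625 Mb per sample
print(f"{per_sample / VIRAL_GENOME_BP:.0f}x coverage")  # 6250x coverage
```

Even split 160 ways, each sample still gets thousands-fold coverage of a 100 kb genome, far more than needed.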
Thanks for your attention!
joachim.deschrijver@ugent.be