Next Generation Sequencing
Miluše Hroudová
Laboratory of Genomics and Bioinformatics
Institute of Molecular Genetics of the ASCR, v.v.i.
The presentation is supported from the project OP EC CZ.1.07/2.3.00/30.0027
“Founding the Centre of Transgenic Technologies”
Outline
• Introduction to Next Generation Sequencing (NGS)
• Material - DNA / RNA (types, characteristics, applications)
- genomics x transcriptomics
• Technologies - Principles
- Workflow
- Parameters
• Data analysis (basic pipeline)
• Project example (IMG)
• Technology progression
Basic Terms
• Base-pair - basic building block of double-stranded DNA, unit of DNA segment
length (bp)
• Read - continuous sequence produced by sequencer
• Coverage - the number of short reads that overlap each other within a specific
genomic region (how many times the particular base or region is read)
• Consensus sequence - idealized sequence in which each position represents the
base most often found when many sequences are compared
• Contig - set of overlapping segments (reads) of DNA sequences forming
continuous consensus sequence
• Assembly - aligning and merging fragments of DNA sequence (reads, contigs) in
order to reconstruct the original sequence
• Scaffold - set of linked non-contiguous series of genomic sequences, consisting of
contigs separated by gaps of known length
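To make the coverage and consensus definitions above concrete, here is a minimal Python sketch. It assumes the reads have already been placed at known 0-based offsets (in practice an aligner or assembler would supply these). The toy reads and the helper names `pileup`, `coverage` and `consensus` are illustrative and do not come from the slides.

```python
from collections import Counter

def pileup(reads, length):
    """Collect the bases observed at each position of a region.

    `reads` is a list of (start, sequence) pairs with 0-based start
    offsets; the placement of each read is assumed to be known.
    """
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length:
                columns[pos][base] += 1
    return columns

def coverage(columns):
    """Coverage: how many reads overlap each position."""
    return [sum(col.values()) for col in columns]

def consensus(columns, gap="N"):
    """Most frequent base per position; `gap` where nothing was read."""
    return "".join(col.most_common(1)[0][0] if col else gap for col in columns)

# Toy data: four overlapping reads, one of them with a single mismatch.
reads = [
    (0, "ACGTAC"),
    (2, "GTACGT"),
    (4, "ACGTTA"),
    (4, "ACCTTA"),  # C instead of G at position 6, simulating a read error
]
cols = pileup(reads, 10)
depth = coverage(cols)
cons = consensus(cols)
print(depth)
print(cons)
```

In this toy case the mismatching base is outvoted because the position is covered by three reads, which is why higher coverage gives a more reliable consensus.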
Next Generation Sequencing Introduction
• Modern high-throughput DNA sequencing technologies
• Massive, parallel, rapid ...
• Decreasing price, time, workflow complexity, error rate
• Increasing data quantity and quality, read length (data storage
capacity), repertoire of bioinformatics tools
• Wide range of applications
• Third Generation Sequencing (single molecule, real time, in situ ...)
Input Material, Target Sequence
DNA (eukaryotic, prokaryotic, viral)
• De novo genome seq
• Resequencing (ChIP-Seq)
• Amplicon seq (16S)
• Sequence capture
• Base modification detection
• Genomic variations
=> Genomics
Genomics
• Area of genetics that concerns the sequencing and analysis of an organism’s genetic
information
• DNA sequencing + bioinformatics => sequence, assemble and analyze the function and
structure of genomes (the complete set of DNA within a single cell of an organism)
Bacterial genome
Human genome
Input Material, Target Sequence
RNA
• Total RNA
- Coding RNA (4 % of total): pre-mRNA (hnRNA) => mRNA
- Functional RNA (96 % of total): pre-rRNA => rRNA, pre-tRNA => tRNA (all organisms)
- Functional RNA, eukaryotes only: snRNA, snoRNA, miRNA, siRNA
• RNA Seq (Whole Transcriptome Shotgun Seq – WTSS, normalized)
• SNPs detection
• RNA species other than mRNA
• Quantitative seq (without normalization)
=> Transcriptomics
Transcriptomics
• Study of the transcriptome - the complete set of RNA transcripts produced from
the genome, under specific circumstances at particular place and time
• Methods: RT PCR, Microarrays, mRNA seq
mRNA sequencing procedure
• Total RNA => mRNA (polyA mRNA selection or rRNA depletion)
• mRNA => fragmented mRNA (temperature-based fragmentation)
• Fragmented mRNA => cDNA (reverse transcription)
• cDNA => normalized cDNA (normalization, optional)
• cDNA => cDNA library (library preparation: adapter ligation, size selection; as in the DNA sequencing procedure)
• cDNA library => sequencing run => raw data (reads)
RNA Quality
• quality of the starting total RNA - RNA integrity number (RIN)
• RIN < 7 => unequal read distribution along 5’ and 3’ ends => bad sequencing results
(Figure: Agilent Bioanalyzer traces and 454 read distribution, shown as number of reads, for RIN > 9 and RIN < 7)
cDNA synthesis
Total RNA (ug)
mRNA with polyA 3’end
SMARTer II A Oligo:
5’-AAGCAGTGGTATCAACGCAGAGTACGCGGG-3’
Modified CDS Primer
5’-AAGCAGTGGTATCAACGCAGAGTTTTTGTTTTTTTCTTTTTTTTTTVN-3’
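The Modified CDS Primer ends in the degenerate bases `VN`. In IUPAC nucleotide notation V stands for A, C or G (anything but T) and N for any base, which is what anchors the oligo(dT) stretch at the start of the polyA tail. The short Python sketch below expands the degenerate positions and finds the 5’ part that the two oligos share. The function names are illustrative only, and the `IUPAC` table covers just the symbols that occur in these two sequences.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes needed for the oligos on this slide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "V": "ACG",   # any base except T
    "N": "ACGT",  # any base
}

def expand(seq):
    """Return every concrete sequence encoded by a degenerate oligo."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in seq))]

def shared_prefix(a, b):
    """Longest common 5' prefix of two sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

SMARTER_OLIGO = "AAGCAGTGGTATCAACGCAGAGTACGCGGG"
CDS_PRIMER = "AAGCAGTGGTATCAACGCAGAGTTTTTGTTTTTTTCTTTTTTTTTTVN"

variants = expand(CDS_PRIMER)
common = shared_prefix(SMARTER_OLIGO, CDS_PRIMER)
print(len(variants), "primer variants")
print("shared 5' sequence:", common)
```

The degenerate primer is therefore a mixture of 3 x 4 = 12 concrete oligonucleotides, and both oligos start with the same 23-base sequence.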
cDNA normalization
abundant transcripts
TRIMMER cDNA normalization kit (Evrogen)
rare transcripts
DSN = duplex-specific nuclease
Sequencing Principles
• Sequencing by Synthesis
• Sanger/Dideoxy chain termination (Life Technologies, Applied Biosystems)
• Pyrosequencing (Roche/454)
• Reversible terminator (Illumina)
• Ion proton semiconductor (Life Technologies)
• Zero Mode Waveguide (Pacific Biosciences)
• Sequencing by Oligo Ligation Detection
• SOLiD (Applied Biosystems)
• Other
• Asynchronous virtual terminator chemistry - HeliScope (Helicos)
Current Sequencing Platforms
• Roche/454 (GS FLX+/GS Junior)
• Illumina Genome Analyzer (HiSeq/MiSeq/NextSeq)
• Life Technologies (3500 Genetic Analyzer,
Ion Torrent Proton/PGM)
• Pacific Biosciences (PACBIO RSII)
• Applied Biosystems (SOLiD, 3730xl DNA Analyzer)
Sanger (3500 GA, 3730xl DNA Analyzer)
Sequencing by synthesis
Oligo Ligation Detection (SOLiD)
Sequencing
by ligation
Reversible Terminator (HiSeq, MiSeq, NextSeq)
Cluster generation on a flow-cell surface
Reversible Terminator (HiSeq, MiSeq, NextSeq)
Sequencing by synthesis
Pyrosequencing (GS FLX, GS Junior)
Sequencing by synthesis
Sequencing Matrices
• Sanger, 96-well, 8 capillaries: 96 x 600 bp / 24 h, 1400 €
• Pyrosequencing, 2 regions: 1,000,000 x 600 bp / 20 h, 5500 €
• Reversible terminator, MiSeq: 10,000,000 x 250 bp / 40 h, 1150 €
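The three run formats can be compared by multiplying out the figures given above. The following Python sketch does only that arithmetic: reads x read length gives the nominal yield, which is then divided by run time and set against price. The dictionary keys and the `summarize` helper are illustrative names. The numbers are the nominal values from this slide, not measured yields.

```python
# Nominal run parameters as listed on the slide.
runs = {
    "Sanger (96-well)":          {"reads": 96,         "read_bp": 600, "hours": 24, "eur": 1400},
    "Pyrosequencing (2 regions)": {"reads": 1_000_000,  "read_bp": 600, "hours": 20, "eur": 5500},
    "Reversible terminator (MiSeq)": {"reads": 10_000_000, "read_bp": 250, "hours": 40, "eur": 1150},
}

def summarize(run):
    """Nominal yield in Mb, yield per hour, and price per Mb for one run."""
    total_mb = run["reads"] * run["read_bp"] / 1e6
    return {
        "total_mb": total_mb,
        "mb_per_hour": total_mb / run["hours"],
        "eur_per_mb": run["eur"] / total_mb,
    }

summary = {name: summarize(run) for name, run in runs.items()}
for name, s in summary.items():
    print(f"{name}: {s['total_mb']:.4g} Mb, "
          f"{s['mb_per_hour']:.3g} Mb/h, {s['eur_per_mb']:.3g} EUR/Mb")
```

On these nominal figures a Sanger plate yields about 0.06 Mb, a pyrosequencing run 600 Mb and a MiSeq run 2500 Mb, so the price per megabase differs by several orders of magnitude between the first and the last.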
General Workflow
• Nucleic acid isolation/purification
• RNA – selection of particular RNA species, cDNA synthesis
• DNA – fragmentation, size selection (shotgun x paired end)
• Seq library preparation (platform-specific adaptor ligation, indexes)
• Amplification of seq library (DNA-binding beads and other carriers)
• Sequencing run set up
• Image processing (images => sequence + quality information)
• Data analysis (assembly, mapping, annotation ...)
• Special tricks for amplicons, SeqCap, ChIP-Seq, small RNAs ...
(Steps are carried out partly by the user and partly by the sequencing service.)
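The image-processing step in this workflow delivers each read as a sequence plus per-base quality information. As a hedged illustration, the Python sketch below assumes the reads arrive as four-line FASTQ records with Phred+33 quality encoding, which the slides do not specify (454 instruments, for example, write SFF files natively). A Phred score Q corresponds to an error probability of 10^(-Q/10). The function names and the mean-quality threshold of 20 are illustrative choices.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]

def error_probability(q):
    """Probability that a base with Phred score q is wrong."""
    return 10 ** (-q / 10)

def parse_fastq(text):
    """Yield (name, sequence, scores) from simple four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        name, seq, _plus, qual = lines[i:i + 4]
        yield name[1:], seq, phred_scores(qual)

# Two toy records: 'I' encodes Q40, '#' encodes Q2.
example = """@read1
ACGT
+
IIII
@read2
ACGT
+
##I#
"""

records = list(parse_fastq(example))
# Keep reads whose mean quality is at least 20 (an arbitrary example cut-off).
kept = [name for name, seq, quals in records
        if sum(quals) / len(quals) >= 20]
print(kept)
```

Filtering on mean quality is only one of several possible criteria. Trimming low-quality ends or masking individual bases are common alternatives.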
Pyrosequencing workflow
• Library preparation: fragmentation, adaptor ligation
• Emulsion PCR amplification
• Bead deposition onto PicoTiter Plate (PTP)
Paired-end x Mate-pair
• Paired-end – sequencing from both fragment ends (< 1 kb)
• Mate-pair – longer (3-20 kb) molecules circularized via internal adapter
Mate-pair types
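Parameters Comparison
PacBio RSII
• Sequencing mechanism: sequencing by synthesis
• Read length: > 4000 bp
• Accuracy: 99.999%
• Reads: 0.06 M
• Output data per run: 1.6 GB
• Time per run: 30 min – 3 hours
• Advantages: read length, fast, no amplification, real-time record
• Disadvantages: low throughput, low accuracy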
Liu et al. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology. 251364.
Parameters Comparison of Benchtop Variants
Junior
70 Mb
700 bp
18 hours
2 days
Pyrosequencing
Minimize hands-on time,
increase emPCR reproducibility
µg
On/Off instrument
Liu et al. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology. 251364.
Applications and Suitable Seq Type
• de novo DNA/RNA seq – Illumina, Roche/454 (PE), PacBio
• Resequencing – SOLiD, Illumina
• SNPs detection – Roche/454, PacBio (x InDels variation – Illumina, SOLiD)
• Sequence capture - Illumina
• Sanger - low-coverage sequencing of individual positions and regions (e.g.,
diagnostic genotyping) or the sequencing of virus- and phage-sized genomes
• Ion Torrent – short amplicons
• SOLiD - quantitative applications, small RNAs, epigenomics
• HeliScope – quantitative applications
• Combination of methods
Data Analysis, Assembly, Annotation
• technology-compatible software (user friendly, ineffective)
• general, free access software (search for optimal tool)
• user developed (lack of qualified bioinformaticians)
• combination of different platforms data x problems with assemblers
• platform-specific errors, incompatible software parameters
• multiple data filtering procedures
Machine/Service Availability
• IMG – Roche/454 GS FLX+ (full run including library prep 5500 €/0.7 GB)
- Illumina NextSeq (next year?)
• Illumina MiSeq – IEM AS CR, GeneCore EMBL (1150 €/ 10 GB)
• Illumina – GeneCore EMBL (HiSeq lane 100 bp PE 2500 €/200 GB)
• Ion Torrent - GeneCore EMBL, TU Liberec
• PacBio – Netherlands (Macrogen), Germany, Switzerland
• Beijing Genomics Institute (BGI, China) – Illumina HiSeq 2000
- Roche GS FLX+
- SOLiD 4
- Ion Torrent
- Sanger 3730xl DNA Analyzer
Our Sequencing Projects
Platforms: GS FLX+ (Roche 454); HiSeq2000/MiSeq (Illumina)
• Amplicon seq (environmental samples, 16S rDNA genes)
• De novo genome sequencing (bacteria, protozoa, platyhelminthes, plants ...)
• Metagenomics (simple bacterial consortia x complex environmental samples)
• Transcriptomics (protozoa, cnidarians, insects, human cancer research ...)
• Sequence capture (human cancer research, animal population genetics ...)
Beckman CEQ 2000XL
• minor sequencing analyses
Transcriptomics (Evo-Devo Studies)
Craspedacusta sowerbyi
Six and Pou genes
early evolution
Hroudova et al. 2012. PLoS ONE, 7(4): e36420
De Novo Genome Seq
Achromobacter xylosoxidans
• isolated from biphenyl contaminated soil
• 2-chlorobenzoate and 2,5-dichlorobenzoate degrader
Strnad et al. 2011. J Bacteriol 193: 791-792
Metagenomics
ecosystem => total DNA => DNA fragments => sequencing => analysis
(Figure labels: At. ferrooxidans, F. myxofaciens, others)
Metagenomic Research Examples
• Cow rumen and biotechnology: fishing out genes for cellulose biodegradation
• Lean vs. obese phenotype
• Functional profiling and comparison of nine biomes
• Microbiome transplantation
Amplicon Sequencing
• 16S rDNA genes
• bacterial consortia actively degrading biphenyl, benzoate, and naphthalene
in a long-term contaminated soil
Uhlik et al. 2012. PLoS ONE, 7(7): e40653
Sequencing Hot Today and Near Future
• Single-Molecule Real-Time seq (SMRT) – PacBio
(without amplification necessary for signal detection)
• Single cell DNA/RNA seq based on micro/nanofluidics technology
(without WGA based on MDA - Φ29 DNA polymerase)
• Nanopores – Oxford Nanopore Technologies
(reduced enzymatic steps,
electric current based detection)
• Silicon based nanopores - IBM
• Human genome (30x) under 1000 $ already announced by Illumina
(HiSeq X Ten)
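To see what "30x" means in raw data, the expected coverage can be estimated as C = N x L / G (number of reads times read length, divided by genome size). The Python sketch below applies this to the 30x human genome mentioned above. The genome size of about 3.1 Gb and the 150 bp read length are assumptions made for illustration and are not stated on the slide. The function names are likewise hypothetical.

```python
import math

def expected_coverage(n_reads, read_bp, genome_bp):
    """Average coverage C = N * L / G."""
    return n_reads * read_bp / genome_bp

def reads_needed(target_coverage, read_bp, genome_bp):
    """Smallest number of reads giving at least the target coverage."""
    return math.ceil(target_coverage * genome_bp / read_bp)

HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate haploid genome size
READ_BP = 150                    # assumption: read length in bp

n = reads_needed(30, READ_BP, HUMAN_GENOME_BP)
total_gb = n * READ_BP / 1e9
print(f"{n:,} reads, about {total_gb:.0f} Gb of sequence")
print(f"coverage check: {expected_coverage(n, READ_BP, HUMAN_GENOME_BP):.1f}x")
```

Under these assumptions a single 30x genome requires roughly 620 million reads, or about 93 Gb of sequence, which is the scale of throughput such an announcement implies.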
Before You Start Planning Seq Experiment
• sufficient sample source
• targeted application/platform
• computational capacity (storage, back up, operations)
• bioinformatics support
Take-away message
• NGS - high-throughput, massive, parallel, rapid DNA sequencing
• Third generation – single molecule, real time, reduced chemistry
• Basic NGS principles – synthesis, ligation
• Basic workflow
sample - fragmentation - library prep - seq run - data analysis
• Applications – de novo seq, reseq, amplicons, SeqCap, RNA seq
(quantitative expression analysis x normalized cDNA seq)
• Choose the right application and prepare the sample appropriately
• Basic data analysis pipeline
image acquisition, quality metrics - filtering - contig building - annotation
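The contig-building step of this pipeline can be illustrated with a toy example. The Python sketch below uses a naive greedy strategy: it repeatedly merges the two sequences with the longest exact suffix/prefix overlap. This is a simplified stand-in chosen for brevity and is not the algorithm used by any particular assembler. It ignores sequencing errors, reverse complements and repeats, and the reads and the minimum overlap of 3 bases are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest exact overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping toy reads plus one unrelated read.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "CCCAAAG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The three overlapping reads collapse into a single contig while the unrelated read stays separate, which mirrors how an assembly of real data ends up as a set of contigs rather than one sequence.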
Acknowledgement
Laboratory of Genomics and Bioinformatics, IMG AS CR, Prague
Čestmír Vlček
Václav Pačes
Jan Pačes
Hynek Strnad
Michal Kolář
Jakub Rídl
Šárka Pinkasová
• Laboratory of Transcriptional Regulation, IMG (Dr. Zbyněk Kozmik)
• Core facility of Genomics and Bioinformatics, IMG (Mgr. Šárka Kocourková, Mgr. Marcela Vedralová)
• GeneCore, EMBL, Heidelberg (Dr. Vladimír Beneš)
• Roche CR (Diagnostic Division), Genetica CR (Illumina Division)
Miluše Hroudová
Institute of Molecular Genetics of the ASCR, v.v.i.
hroudova@img.cas.cz