• Chain termination method (Sanger et al., 1977): the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains that terminate at specific nucleotide positions.
• Chemical degradation method (Maxam and Gilbert, 1977): the sequence of a double-stranded DNA molecule is determined by treating it with chemicals that cut the molecule at specific nucleotide positions.
Chain termination method
• Labelling the chain-terminating ddNTPs themselves (dye-terminator sequencing) permits sequencing in a single reaction rather than four separate ones.
• Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye (ddA green, ddT red, ddG yellow and ddC blue), each with its own fluorescence emission wavelength.
• The fragment terminating at each base position is detected on the gel by a laser beam that excites its dye.
• Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay of automated sequencing.
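A minimal sketch of how a dye-terminator read is put together from the detected fragments, assuming an idealised trace in which each fragment's length and dye colour are already known (real base callers work from continuous fluorescence traces). The colour-to-base mapping follows the scheme listed above; the function name `call_sanger_read` is hypothetical.

```python
# Colour-to-base mapping as listed above (ddA green, ddT red, ddG yellow, ddC blue).
DYE_TO_BASE = {"green": "A", "red": "T", "yellow": "G", "blue": "C"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def call_sanger_read(peaks):
    """Turn detected fragments into a read.

    peaks: iterable of (fragment_length, dye_colour) tuples, one per fragment.
    Shorter fragments migrate faster, so sorting by length reproduces the order
    in which the laser detects them; each dye names the ddNTP that ended the chain.
    """
    ordered = sorted(peaks, key=lambda peak: peak[0])
    synthesized = "".join(DYE_TO_BASE[colour] for _, colour in ordered)
    # The synthesized strand is complementary to the template, so the template
    # (written 5'->3') is the reverse complement of the synthesized strand.
    template = "".join(COMPLEMENT[base] for base in reversed(synthesized))
    return synthesized, template


synthesized, template = call_sanger_read([(3, "blue"), (1, "green"), (2, "red"), (4, "yellow")])
print(synthesized, template)  # ATCG CGAT
```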
Capillary electrophoresis
View of a dye-terminator read
The Sanger method can sequence only 1000–1200 bp in one reaction.
Genome sequencing
1970s: Bacteriophage
1992: the first eukaryotic chromosome sequence (yeast)
1995: the bacterium Haemophilus influenzae, followed by several other bacteria and archaea
Many eukaryotes, including several plants and their pathogens
2006: Human genome
Until 2006, all genome sequencing used Sanger chemistry
Human Genome Project
Genomic DNA is broken into fragments enzymatically or mechanically
Cloned into sequencing vectors
Sequenced individually
Numerous DNA fragments sequenced: the birth of genome informatics and next-generation sequencing
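Putting the individually sequenced fragments back together relies on overlaps between them. A minimal greedy sketch, assuming error-free reads from a repeat-free genome; production assemblers use overlap or de Bruijn graphs and handle sequencing errors and repeats, which this toy does not. `overlap` and `greedy_assemble` are hypothetical names.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]))
```

With repeats longer than the reads, greedy merging can join the wrong copies, which is one reason coverage and read length matter.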
The core philosophy of the massively parallel sequencing used in next-generation sequencing (NGS) is adapted from shotgun sequencing.
NGS: the entire genome is broken into small pieces
DNA fragments are ligated to designated adapters
Massively parallel sequencing during DNA synthesis (sequencing-by-synthesis)
Coverage: the number of short reads that overlap one another within a specific genomic region
Sufficient coverage is critical for accurate assembly of the genomic sequence.
To ensure the correct identification of genetic variants, short-read coverage of at least 30× is recommended in whole-genome scans
(Zhang et al., 2011. J Genet Genomics, 38:95-109)
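Average coverage follows from simple arithmetic: C = N × L / G for N reads of length L over a genome of size G (the Lander–Waterman relation). A minimal sketch, including per-base depth for reads whose start positions are already known; the 3 Gb genome size and 100 bp read length in the usage line are round illustrative figures, not values from the source, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads N needed to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def per_base_coverage(read_starts, read_length, genome_size):
    """Depth at every position, given 0-based start positions of aligned reads."""
    depth = [0] * genome_size
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth


# Reads needed for the recommended 30x over an assumed 3 Gb genome with 100 bp reads.
print(reads_needed(30, 100, 3_000_000_000))  # 900000000
```

Average coverage hides local gaps, which is why per-base depth is checked as well.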
Next generation sequencing
• Enables a genome to be sequenced within hours to days.
• The 454 FLX Pyrosequencer from Roche Applied Sciences was the first next-generation sequencer to become commercially available, in 2004.
• The Solexa 1G Genetic Analyzer from Illumina was commercialized in 2006.
• The SOLiD (Supported Oligonucleotide Ligation and Detection) System from Applied Biosystems was launched in 2007.
Next-next generation or third generation sequencing
• Single molecule sequencing
Platforms on NGS technologies
Currently available
Platform | Amplification | Read length | Throughput | Sequencing chemistry
Roche/GS-FLX Titanium | Emulsion PCR | 400–600 bp | 500 Mbp/run | Pyrosequencing
Illumina/HiSeq 2000, HiScan | Bridge PCR (cluster PCR) | 2 × 100 bp | 200 Gbp/run | Reversible terminators
ABI/SOLiD 5500xl | Emulsion PCR | 50–100 bp | >100 Gbp/run | Sequencing-by-ligation (octamers)
Polonator/G.007 | Emulsion PCR | 26 bp | 8–10 Gbp/run | Sequencing-by-ligation (monomers)
Helicos/Heliscope | No | 35 (25–55) bp | 21–37 Gbp/run | True single-molecule sequencing (tSMS)
In development (mostly without amplification; throughput N/A)
Platform | Read length | Approach
Pacific BioSciences/RS | 1000 bp | Single-molecule real time (SMRT)
Visigen Biotechnologies | N/A | Single-molecule sequencing by synthesis
U.S. Genomics | >100 Kbp | Single-molecule mapping
Genovoxx | N/A | Single-molecule sequencing by synthesis
Oxford Nanopore Technologies | N/A | Nanopores/exonuclease-coupled
NABsys | N/A | Nanopores
Electronic BioSciences | N/A | Nanopores
BioNanomatrix/nanoAnalyzer | 400 Kbp | Nanochannel arrays
GE Global Research | N/A | Closed complex/nanoparticle
IBM | N/A | Nanopores
LingVitae | N/A | Nanopores
Complete Genomics | 70 bp | DNA nanoball arrays
base4innovation | N/A | Nanostructure arrays
CrackerBio | N/A | Nanowells
Reveo | N/A | Nano-knife edge
Intelligent BioSystems | N/A | Electronics
LightSpeed Genomics | N/A | Direct-read sequencing by EM
Sequencer | Company | Read length × number of reads | Main application
3130XL | Applied Biosystems | 700 bp × 96 | Specific targets (PCR products, clones)
GS-FLX Titanium | Roche | 400 bp × 1 million | De novo sequencing
Genome Analyzer | Illumina | 100 bp × 2 billion | Re-sequencing (can also be used for de novo sequencing)
SOLiD | Applied Biosystems | 50 bp × 2.4 billion | Re-sequencing (can also be used for de novo sequencing)
Roche GS-FLX 454 Genome Sequencer
Longest short reads (600 bp) among all the NGS platforms
Generates ~400–600 Mb of sequence reads per run, useful for de novo assembly of microbes in metagenomics
Raw base accuracy reported is very good (over 99%)
Chemistry
• Nucleotide incorporation releases pyrophosphate (PPi)
• ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5′-phosphosulfate (APS).
• This ATP fuels the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP.
• The light produced in the luciferase-catalyzed reaction is detected by a camera and analyzed by software.
• Unincorporated nucleotides and ATP are degraded by apyrase, and the reaction can restart with another nucleotide.
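Because the light output is proportional to the number of nucleotides incorporated, a flowgram (one signal per nucleotide flow) can be decoded by rounding each normalised signal to a homopolymer length. A minimal sketch, assuming signals already normalised so that 1.0 corresponds to one incorporation and assuming a cyclic TACG flow order; real base callers also correct for noise and incomplete extension. Both function names are hypothetical.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Decode normalised light intensities (1.0 ~ one incorporation) into bases."""
    bases = []
    for i, signal in enumerate(flowgram):
        n = round(signal)  # light is proportional to the number of bases incorporated
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases)


def simulate_flowgram(strand, n_flows, flow_order=FLOW_ORDER):
    """Ideal, noise-free flowgram for the strand being synthesised."""
    signals, pos = [], 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        n = 0
        # A homopolymer run is incorporated in a single flow.
        while pos < len(strand) and strand[pos] == nucleotide:
            n += 1
            pos += 1
        signals.append(float(n))
    return signals


print(simulate_flowgram("TTACGGA", 8))
print(call_bases([2.1, 0.9, 1.05, 1.8, 0.1, 1.2, 0.05, 0.0]))  # TTACGGA
```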
Illumina/Solexa Genome Analyzer
Superior data quality and appropriate read lengths have made it the system of choice for many genome sequencing projects.
The majority of published NGS papers have used the Genome Analyzer. It uses a proprietary reversible terminator-based method that enables detection of single bases as they are incorporated into growing DNA strands.
A fluorescently labeled terminator is imaged as each dNTP is added, and is then cleaved to allow incorporation of the next base.
Since all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias.
The end result is true base-by-base sequencing that enables the industry’s most accurate data for a broad range of applications.
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk
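Per-cycle base calling for one cluster can be sketched as picking the brightest of the four dye channels imaged after each incorporation. This assumes intensities are already extracted and corrected for cross-talk and phasing; the channel order, the purity score (brightest / (brightest + second brightest)) and the function names are illustrative choices, not a description of Illumina's actual software.

```python
CHANNELS = "ACGT"  # assumed: one fluorescence channel per base, in this order


def call_cycle(intensities):
    """Call one base from the four channel intensities imaged in one cycle."""
    ranked = sorted(range(len(CHANNELS)), key=lambda k: intensities[k], reverse=True)
    best, second = intensities[ranked[0]], intensities[ranked[1]]
    # Illustrative purity score: close to 1.0 when one dye clearly dominates.
    purity = best / (best + second) if best + second > 0 else 0.0
    return CHANNELS[ranked[0]], purity


def call_illumina_read(cycles):
    """cycles: list of [A, C, G, T] intensities, one list per sequencing cycle."""
    bases, purities = [], []
    for intensities in cycles:
        base, purity = call_cycle(intensities)
        bases.append(base)
        purities.append(purity)
    return "".join(bases), purities


read, purities = call_illumina_read([[900, 50, 30, 20], [10, 20, 800, 40], [30, 700, 60, 10]])
print(read)  # AGC
```

Because one base is read per cycle, a homopolymer needs as many cycles as it has bases, unlike flow-based pyrosequencing.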
ABI SOLiD platform
The latest model, the 5500xl SOLiD system (previously known as SOLiD 4hq), can generate over 2.4 billion reads per run with a raw base accuracy of 99.94%.
The SOLiD 4 platform probably provides the best data quality as a result of its sequencing-by-ligation approach, but the DNA library preparation procedures prior to sequencing can be tedious and time-consuming.
Preferred for re-sequencing rather than de novo sequencing.
(Zhang et al., 2011)
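One well-documented ingredient of SOLiD's sequencing-by-ligation, not described on the slide, is two-base ("colour-space") encoding: each colour reports a pair of adjacent bases, so every base is interrogated twice. A minimal sketch, assuming the common representation in which bases are coded A=0, C=1, G=2, T=3 and the colour of a dinucleotide is the XOR of the two codes; the function names are hypothetical.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """One colour (0-3) per overlapping dinucleotide."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild bases from a known first base; each base depends on the one before."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


print(encode_colors("ATGGCA"))  # [3, 1, 0, 3, 1]
print(decode_colors("A", [3, 1, 0, 3, 1]))  # ATGGCA
```

A true SNP changes two adjacent colours (compare `encode_colors("ATGGCA")` with `encode_colors("ATCGCA")`), whereas a single colour-call error changes every downstream decoded base; this difference is what lets colour-space analysis separate variants from measurement errors.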
Sample Preparation
Nucleic acid isolation
Double-stranded cDNA synthesis
Rapid library preparation
Fragmentation (nebulization/shearing) into smaller fragments of 400–1000 bp
Addition of adapters
Removal of small fragments (<300 bp)
Library Quality Assessment
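The library preparation steps above (fragmentation, adapter addition, removal of short fragments) can be mimicked in a few lines. A toy sketch: random breakpoints drawn from an exponential distribution stand in for nebulization, `ADAPTER_A`/`ADAPTER_B` are placeholder strings rather than real adapter sequences, and the mean fragment size of 600 bp is an assumed parameter chosen to fall inside the 400–1000 bp range quoted above. All function names are hypothetical.

```python
import random


def shear(sequence, mean_fragment=600, seed=0):
    """Break a sequence at random points (a crude stand-in for nebulization)."""
    rng = random.Random(seed)
    fragments, pos = [], 0
    while pos < len(sequence):
        size = max(1, int(rng.expovariate(1 / mean_fragment)))
        fragments.append(sequence[pos:pos + size])
        pos += size
    return fragments


def size_select(fragments, min_len=300):
    """Drop fragments shorter than min_len (the <300 bp removal step)."""
    return [f for f in fragments if len(f) >= min_len]


def add_adapters(fragments, adapter_a="ADAPTER_A", adapter_b="ADAPTER_B"):
    """Flank each fragment with placeholder adapters."""
    return [adapter_a + f + adapter_b for f in fragments]


genome = "".join(random.Random(1).choice("ACGT") for _ in range(20000))
library = add_adapters(size_select(shear(genome)))
print(len(library), "adapter-ligated fragments")
```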
Emulsion based clonal amplification (emPCR)
• Preparation of reagents and of emulsion oil
• Preparation of amplification mix (addition of additive, amplification mix, primers, enzyme mix and PPiase)
• DNA library capture (one DNA molecule per bead, and one bead per aqueous microreactor, insulated from other beads by the surrounding oil)
• Emulsification (shaking the captured library to form a water-in-oil mixture)
• Amplification (emulsified beads are clonally amplified)
• Bead recovery and enrichment
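The "one molecule of DNA per bead" condition is statistical rather than exact. If library molecules distribute randomly, the number captured per bead is approximately Poisson-distributed with mean λ, the molecule-to-bead ratio. A minimal sketch under that assumption; λ is a hypothetical tuning parameter, not a value from the protocol, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a bead receives exactly k library molecules."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bead_loading(lam):
    """Fractions of beads that are empty, clonal (one template) or mixed (two or more)."""
    empty = poisson_pmf(0, lam)
    clonal = poisson_pmf(1, lam)
    mixed = 1.0 - empty - clonal
    return empty, clonal, mixed


for lam in (0.2, 1.0):
    empty, clonal, mixed = bead_loading(lam)
    print(f"lambda={lam}: empty={empty:.3f} clonal={clonal:.3f} mixed={mixed:.3f}")
```

Lowering λ raises the share of DNA-carrying beads that are clonal at the cost of more empty beads, which is one reason the enrichment step is needed: it removes beads that carry no amplified DNA.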
Sequencing
Clonally amplified fragments are loaded onto a PicoTiterPlate device for sequencing (the well diameter allows only one bead per well)
After sequencing enzymes are added, the fluidics subsystem of the sequencing instrument flows individual nucleotides in a fixed order across all wells
Incorporation of one (or more) nucleotide(s) complementary to the template strand produces a chemiluminescent signal that is recorded by the CCD camera within the instrument
During the nucleotide flows, thousands of beads, each carrying millions of copies of a single-stranded DNA molecule, are sequenced in parallel
Each 10-h sequencing run typically produces over 1,000,000 flowgrams (one flowgram per bead)
Base calling (to check quality of each read)
Trimming primer sequence
Production of contigs
NGS platforms under development (third-generation sequencers)
Aim: sequencing single DNA molecules (without amplification) and providing accurate data with long reads
i) Fluorescence-based single-molecule sequencing (Pacific Biosciences; US Genomics)
ii) Nanotechnologies for single-molecule sequencing (Oxford Nanopore Technologies, NABsys, BioNanomatrix, Electronic BioSciences, CrackerBio)
iii) Electronic detection for single-molecule sequencing (Reveo, Intelligent BioSystems)
iv) Electron microscopy for single-molecule sequencing (LightSpeed Genomics, Halcyon Molecular, ZS Genetics)
Single Molecule Sequencing
(Helicos Biosciences, USA)
Billions of single molecules of sample DNA are captured on an application-specific proprietary surface, where they serve as templates for sequencing-by-synthesis.
Polymerase and one fluorescently labeled nucleotide (C, G, A or T) are added.
The polymerase catalyzes the sequence-specific incorporation of fluorescent nucleotides into nascent complementary strands on all the templates.
After a wash step, which removes all free nucleotides, the incorporated nucleotides are imaged and their positions recorded.
The fluorescent group is removed in a highly efficient cleavage process, leaving behind the incorporated nucleotide.
The process continues through each of the other three bases.
Multiple four-base cycles produce complementary strands more than 25 bases long on billions of templates, providing a read of more than 25 bases from each individual template.
Ion Sequencing
(Rothberg et al., Life Technologies; Nature, July 2011)
A non-optical method of genome sequencing
Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on a massively parallel semiconductor-sensing device, the ion chip.
The ion chip contains 1.2 million ion-sensitive wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions.
The system's performance was demonstrated by sequencing three bacterial genomes and one human genome.
World's smallest solid-state pH meter
DNA is fragmented, ligated to adapters, and clonally amplified onto beads.
Sequencing primers and DNA polymerase are then bound to the templates, and the beads are pipetted into the chip's loading port. Individual beads are loaded into individual sensor wells by spinning; the well depth allows only a single bead to occupy a well.
All four nucleotides are provided in a stepwise fashion during an automated run. When the nucleotide in the flow is complementary to the template base directly downstream of the sequencing primer, it is incorporated into the nascent strand by the bound polymerase.
This extends the sequencing primer by one base (or more, if a homopolymer stretch lies directly downstream of the primer) and results in hydrolysis of the incoming nucleotide triphosphate, which causes the net liberation of a single proton for each nucleotide incorporated during that flow.
The released protons shift the pH of the surrounding solution in proportion to the number of nucleotides incorporated in the flow (0.02 pH units per single-base incorporation). This shift is detected by the sensor at the bottom of each well, converted to a voltage and digitized by off-chip electronics. Signal generation and detection take place over 4 s.
After the flow of each nucleotide, a wash is used to ensure nucleotides do not remain in the well.
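Because the pH shift scales with the number of incorporated nucleotides, each flow's signal can be turned into a homopolymer length. A minimal sketch, assuming idealised shifts that can simply be divided by the 0.02 pH-units-per-base figure quoted above and rounded (real base callers also model noise and incomplete extension); the magnitude of the shift is used because released protons lower the pH. The flow order `TACGTACG` in the usage line and the function names are hypothetical.

```python
PH_PER_BASE = 0.02  # approximate pH shift per single-base incorporation, as quoted above


def incorporations_from_ph(delta_ph, ph_per_base=PH_PER_BASE):
    """Number of nucleotides incorporated in one flow, from the measured pH shift."""
    return round(abs(delta_ph) / ph_per_base)


def call_ion_read(flow_order, ph_shifts):
    """One pH shift per flow; a zero shift means the flowed nucleotide was not incorporated."""
    return "".join(
        nucleotide * incorporations_from_ph(shift)
        for nucleotide, shift in zip(flow_order, ph_shifts)
    )


print(call_ion_read("TACGTACG", [0.04, 0.021, 0.0, 0.019, 0.002, 0.06, 0.02, 0.0]))  # TTAGAAAC
```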
Mining NGS data to obtain meaningful information
An average NGS experiment generates gigabytes to terabytes of raw data.
Existing bioinformatics tools fall into several general categories:
(1) alignment of reads to a reference sequence; (2) de novo assembly; (3) reference-based assembly; (4) genetic variation detection (such as SNVs and indels); (5) genome annotation; (6) utilities for data analysis.
The most important step in NGS data analysis is successful assembly or alignment of reads to a reference genome.
After successful alignment and assembly, the next step is to interpret the large number of putative novel genetic variants (or mutations), which may be present by chance.
Recognition of functional variants is at the center of NGS data analysis and bioinformatics.
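Read alignment and variant detection, categories (1) and (4) above, can be illustrated with a deliberately naive sketch: each read is placed at the reference position with the fewest mismatches (ungapped, brute-force search), the aligned bases are piled up, and a position is reported as an SNV when the majority base differs from the reference. Real aligners use indexed searches (for example Burrows–Wheeler based) and variant callers use statistical models; the thresholds `max_mismatches` and `min_depth` and all function names are hypothetical.

```python
from collections import Counter


def best_hit(read, reference):
    """Ungapped alignment: position with the fewest mismatches (ties -> leftmost)."""
    best_pos, best_mm = None, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = sum(1 for a, b in zip(read, reference[pos:pos + len(read)]) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def call_snvs(reads, reference, max_mismatches=2, min_depth=3):
    """Pile up aligned reads and report (position, reference base, majority base)."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        pos, mm = best_hit(read, reference)
        if pos is None or mm > max_mismatches:
            continue  # treat as unaligned
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    snvs = []
    for i, counts in enumerate(pileup):
        if sum(counts.values()) >= min_depth:
            base, _ = counts.most_common(1)[0]
            if base != reference[i]:
                snvs.append((i, reference[i], base))
    return snvs


reference = "ACGTACGTTAGCCGATAGGC"
reads = ["ACGTACGT", "ACGTTATC", "GTTATCCG", "TATCCGAT"]  # sample carries G->T at position 10
print(call_snvs(reads, reference))  # [(10, 'G', 'T')]
```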