Single Molecule Sequencing

Next generation sequencing: an overview

A I Bhat

Indian Institute of Spices Research

Calicut

DNA sequencing

• Chain termination method (Sanger et al., 1977): the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains that terminate at specific nucleotide positions.

• Chemical degradation method (Maxam and Gilbert, 1977): the sequence of a double-stranded DNA molecule is determined by treatment with chemicals that cut the molecule at specific nucleotide positions.

Chain termination method

Dye-terminator sequencing

• Utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction

• Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye (ddA green, ddT red, ddG yellow and ddC blue), each fluorescing at a different wavelength.

• The fragment terminating at each base position is detected on the gel by a powerful laser beam.

• Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay of automated sequencing.
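The principle behind the bullets above can be sketched as a toy model in Python. It is an illustration only, not a model of the real chemistry: the function names are hypothetical, the template is assumed to be given in the order the polymerase reads it, and every possible termination position is assumed to be observed exactly once.

```python
# Toy model of dye-terminator sequencing (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(template):
    """Strand synthesized against the template, written in synthesis order."""
    return "".join(COMPLEMENT[base] for base in template)

def terminated_fragments(template):
    """Every prefix of the new strand, as if a labelled ddNTP ended each one."""
    new_strand = complement_strand(template)
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_from_fragments(fragments):
    """Sort fragments by length, as electrophoresis does, and read the
    terminal (dye-labelled) base of each fragment in turn."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))

template = "ATGCCGTA"
fragments = terminated_fragments(template)
print(read_from_fragments(fragments))  # TACGGCAT
```

Reading the last base of each fragment from shortest to longest recovers the synthesized strand, which is the complement of the template.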

Capillary electrophoresis

View of dye-terminator read

Sanger method can sequence only 1000–1200 bp in one reaction

Genome sequencing

1970s: Bacteriophage

1995: the bacterium Haemophilus influenzae

Followed by several other bacteria and archaea

1992: the first eukaryotic chromosome sequence (yeast)

Many eukaryotes, including several plants and their pathogens

2006: Human genome

Until 2006, all genome sequencing used Sanger chemistry

Shotgun sequencing

Human Genome Project

Genomic DNA is enzymatically or mechanically broken down

Cloned into sequencing vectors

Sequenced individually

Numerous fragments of DNA sequenced – BIRTH OF GENOME INFORMATICS AND NEXT GENERATION SEQUENCING

Whole genome sequencing

The core philosophy of massive parallel sequencing used in next-generation sequencing (NGS) is adapted from shotgun sequencing

NGS: breaking the entire genome into small pieces

Ligating DNA to designated adapters

Massively parallel sequencing during DNA synthesis (sequencing-by-synthesis)

Coverage (number of short reads that overlap each other within a specific genomic region)

Sufficient coverage is critical for accurate assembly of the genomic sequence.

To ensure the correct identification of genetic variants, short-read coverage of at least 30× is recommended in whole-genome scans

(Zhang et al., 2011. J Genet Genomics, 38:95-109)
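As a rough worked example of the coverage figure above, mean coverage can be estimated as (number of reads × read length) / genome size. The sketch below uses hypothetical function names and an assumed genome size of 3 Gbp with 100 bp reads; neither value comes from the slides.

```python
import math

def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base of the genome."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

genome_size = 3_000_000_000   # assumed: roughly human-sized genome, in bp
read_length = 100             # assumed: short-read length, in bp

n = reads_needed(30, read_length, genome_size)
print(n)                                           # 900000000
print(mean_coverage(n, read_length, genome_size))  # 30.0
```

This is a mean only; actual coverage varies along the genome, so some regions will fall below the target.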

Next generation sequencing

• Enables a genome to be sequenced within hours to days.

• The 454 FLX Pyrosequencer from Roche Applied Sciences was the first next-generation sequencer to become commercially available, in 2004.

• The Solexa 1G Genetic Analyzer from Illumina was commercialized in 2006.

• The SOLiD (Supported Oligonucleotide Ligation and Detection) System from Applied Biosystems was launched in 2007.

Next-next generation or third generation sequencing

• Single molecule sequencing

Platforms on NGS technologies

Technology | Amplification | Read length | Throughput | Sequencing method

Currently available

Roche/GS-FLX Titanium | Emulsion PCR | 400-600 bp | 500 Mbp/run | Pyrosequencing
Illumina/HiSeq 2000, HiScan | Bridge PCR (cluster PCR) | 2 x 100 bp | 200 Gbp/run | Reversible terminators
ABI/SOLiD 5500xl | Emulsion PCR | 50-100 bp | >100 Gbp/run | Sequencing-by-ligation (octamers)
Polonator/G.007 | Emulsion PCR | 26 bp | 8-10 Gbp/run | Sequencing-by-ligation (monomers)
Helicos/Heliscope | No | 35 (25-55) bp | 21-37 Gbp/run | True single-molecule sequencing (tSMS)

In development

Pacific BioSciences/RS | No | 1000 bp | N/A | Single-molecule real time (SMRT)
Visigen Biotechnologies | No | >100 Kbp | N/A |
U.S. Genomics | No | N/A | N/A | Single-molecule mapping
Genovoxx | No | N/A | N/A | Single-molecule sequencing by synthesis
Oxford Nanopore Technologies | No | 35 bp | N/A | Nanopores/exonuclease-coupled
NABsys | No | N/A | N/A | Nanopores
Electronic BioSciences | No | N/A | N/A | Nanopores
BioNanomatrix/nanoAnalyzer | No | 400 Kbp | N/A | Nanochannel arrays
GE Global Research | No | N/A | N/A | Closed complex/nanoparticle
IBM | No | N/A | N/A | Nanopores
LingVitae | No | N/A | N/A | Nanopores
Complete Genomics |  | 70 bp | N/A | DNA nanoball arrays
base4innovation | No | N/A | N/A | Nanostructure arrays
CrackerBio | No | N/A | N/A | Nanowells
Reveo | No | N/A | N/A | Nano-knife edge
Intelligent BioSystems | No | N/A | N/A | Electronics
LightSpeed Genomics | No | N/A | N/A | Direct-read sequencing by EM

Next (2nd) generation platforms

Instrument | Vendor | Read length x reads per run | Typical use
3130XL | Applied Biosystems | 700 bp x 96 | Specific targets (PCR products, clones)
GS-FLX Titanium | Roche | 400 bp x 1 million | De novo sequencing
Genome Analyser | Illumina | 100 bp x 2 billion | Re-sequencing (de novo sequencing possible)
SOLiD | Applied Biosystems | 50 bp x 2.4 billion | Re-sequencing (de novo sequencing possible)
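The "read length x number of reads" figures listed above translate into bases per run by simple multiplication. The sketch below does that arithmetic; the dictionary layout and function name are hypothetical, and the result is an upper bound that assumes every read reaches the full stated length.

```python
# Read length (bp) and reads per run, as listed for each platform above.
platforms = {
    "3130XL": (700, 96),
    "GS-FLX Titanium": (400, 1_000_000),
    "Genome Analyser": (100, 2_000_000_000),
    "SOLiD": (50, 2_400_000_000),
}

def bases_per_run(read_length_bp, reads_per_run):
    """Upper bound on sequenced bases per run."""
    return read_length_bp * reads_per_run

for name, (length, reads) in platforms.items():
    gbp = bases_per_run(length, reads) / 1e9
    print(f"{name}: {gbp:g} Gbp per run")
```

On these figures the capillary instrument yields about 67 kbp per run, while the Genome Analyser and SOLiD reach 200 Gbp and 120 Gbp respectively.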

Roche GS-FLX 454 Genome Sequencer

Longest short reads (600 bp) among all the NGS platforms

Generates ~400–600 Mb of sequence reads per run; useful for de novo assembly of microbes in metagenomics

Raw base accuracy reported is very good (over 99%)

Chemistry

• Nucleotide incorporation releases pyrophosphate (PPi)

• ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5′-phosphosulfate.

• This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in proportion to the amount of ATP.

• The light produced in the luciferase-catalyzed reaction is detected by a camera and analyzed by software.

• Unincorporated nucleotides and ATP are degraded by apyrase, and the reaction can restart with another nucleotide.

Illumina/Solexa Genome Analyzer

Superior data quality and proper read lengths have made it the system of choice for many genome sequencing projects.

The majority of published NGS papers have used the Genome Analyzer. It uses a proprietary reversible terminator-based method that detects single bases as they are incorporated into growing DNA strands.

A fluorescently-labeled terminator is imaged as each dNTP is added and then cleaved to allow incorporation of the next base.

Since all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias.

The end result is true base-by-base sequencing that enables the industry’s most accurate data for a broad range of applications.

Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

Solexa-based Whole Genome Sequencing

ABI SOLiD platform

The latest model is the 5500xl SOLiD system (previously known as SOLiD4hq)

Can generate over 2.4 billion reads per run with a raw base accuracy of 99.94%

The SOLiD4 platform probably provides the best data quality as a result of its sequencing-by-ligation approach, but the DNA library preparation procedures prior to sequencing can be tedious and time-consuming.

Preferred for re-sequencing rather than de novo sequencing.

(Zhang et al., 2011)
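Raw base accuracies such as the 99.94% quoted above are often restated as Phred-style quality scores, Q = -10 · log10(error probability). The slides do not use this scale; the short sketch below is added only to show the conversion, and the function name is hypothetical.

```python
import math

def phred_quality(accuracy):
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred-style score."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)

print(round(phred_quality(0.9994), 1))  # 32.2
print(round(phred_quality(0.99), 1))    # 20.0
```

On this scale an accuracy of 99.94% corresponds to roughly Q32, that is, about 6 expected errors per 10,000 bases.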

Next generation sequencing using Roche 454

Sample Preparation

Nucleic acid isolation

Double-stranded cDNA synthesis

Rapid library preparation

Fragmentation (nebulization/shearing) into smaller fragments of 400 to 1000 bp

Addition of adapters

Removal of small fragments (<300 bp)

Library Quality Assessment

Emulsion based clonal amplification (emPCR)

• Preparation of reagents and of emulsion oil

• Preparation of amplification mix (addition of additive, amplification mix, primers, enzyme mix and PPiase)

• DNA library capture (one molecule of DNA per bead and one bead per aqueous microreactor, insulated from other beads by the surrounding oil)

• Emulsification (shaking the captured library to form a water-in-oil mixture)

• Amplification (emulsified beads are clonally amplified)

• Bead recovery and enrichment
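The "one molecule of DNA per bead" requirement in the capture step can be explored with a simple probability sketch. It assumes, purely for illustration, that library molecules land on beads independently so that the count per bead follows a Poisson distribution; that model and the function names are assumptions, not part of the Roche protocol described here.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def bead_loading(mean_molecules_per_bead):
    """Fractions of beads carrying zero, one, or several library molecules."""
    empty = poisson_pmf(0, mean_molecules_per_bead)
    single = poisson_pmf(1, mean_molecules_per_bead)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}

for ratio in (0.1, 0.5, 1.0):
    fractions = bead_loading(ratio)
    print(ratio, {name: round(value, 3) for name, value in fractions.items()})
```

Under this assumption, a low molecule-to-bead ratio keeps mixed (multi-template) beads rare at the cost of many empty beads, which is one way to read the later enrichment step.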

Sequencing

Clonally amplified fragments loaded onto a PicoTiter Plate device for sequencing (diameter of Plate wells allow only one bead per well)

After addition of sequencing enzymes, fluidics subsystem of sequencing instrument flows individual nucleotides in a fixed order across all wells

Addition of one (or more) nucleotide(s) complementary to the template strand results in a chemiluminescent signal recorded by the CCD camera within the instrument

During nucleotide flow, thousands of beads, each carrying millions of copies of a single-stranded DNA molecule, are sequenced in parallel

Each 10-h sequencing run will typically produce over 1,000,000 flowgrams (one flowgram per bead)
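The flow-and-detect cycle described above can be mimicked for a single bead with a small simulation. This is a sketch under stated assumptions: the flow order "TACG" and the function name are chosen for illustration, and the signal is idealized as the exact number of bases incorporated in each flow (real light intensities are noisy).

```python
def simulate_flowgram(synthesized, flow_order="TACG", max_flows=400):
    """Return one (nucleotide, signal) pair per flow for one bead.

    `synthesized` is the strand being built on the template. The signal is
    the number of bases added in that flow: 0 if the flowed nucleotide does
    not match the next base, more than 1 for a homopolymer run.
    """
    flowgram = []
    position = 0
    flow = 0
    while position < len(synthesized) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        while position < len(synthesized) and synthesized[position] == nucleotide:
            incorporated += 1
            position += 1
        flowgram.append((nucleotide, incorporated))
        flow += 1
    return flowgram

print(simulate_flowgram("TTAGC"))
# [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
```

The signal of 2 in the first flow shows why homopolymer stretches are read as a single, stronger signal rather than as separate bases.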

Base calling (to check quality of each read)

Trimming primer sequence

Production of contigs
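The last two steps, primer trimming and contig production, can be illustrated with a deliberately simple sketch. The greedy overlap merging shown here is a textbook toy and not the assembler shipped with the instrument; the primer sequence, reads, minimum overlap, and function names are all made up for the example.

```python
def trim_primer(read, primer):
    """Remove the primer if the read starts with it."""
    return read[len(primer):] if read.startswith(primer) else read

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_contigs(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

primer = "GACT"  # hypothetical primer sequence
raw_reads = ["GACTATGCGT", "GACTGCGTAC", "GACTTACGGA"]
reads = [trim_primer(r, primer) for r in raw_reads]
print(greedy_contigs(reads))  # ['ATGCGTACGGA']
```

Greedy merging can misassemble repeats, which is one reason sufficient coverage and longer reads matter for de novo assembly.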

NGS platforms under development (3rd generation sequencers)

Aim: single DNA molecule sequencing (without amplification)

Provides accurate data with long reads

i) Fluorescence-based single molecule sequencing (Pacific Biosciences; US Genomics)

ii) Nanotechnologies for single molecule sequencing (Oxford Nanopore Technologies, NABsys, BioNanomatrix, Electronic BioSciences, CrackerBio)

iii) Electronic detection for single molecule sequencing (Reveo, Intelligent BioSystems)

iv) Electron microscopy for single molecule sequencing (LightSpeed Genomics, Halcyon Molecular, ZS Genetics)

Single Molecule Sequencing

(Helicos Biosciences, USA)

Billions of single molecules of sample DNA are captured on an application-specific proprietary surface and serve as templates for the sequencing-by-synthesis process

Polymerase and one fluorescently labeled nucleotide (C, G, A or T) are added.

The polymerase catalyzes the sequence-specific incorporation of fluorescent nucleotides into nascent complementary strands on all the templates.

After a wash step, which removes all free nucleotides, the incorporated nucleotides are imaged and their positions recorded.

The fluorescent group is removed in a highly efficient cleavage process, leaving behind the incorporated nucleotide.

The process continues through each of the other three bases.

Multiple four-base cycles result in complementary strands greater than 25 bases in length synthesized on billions of templates—providing a greater than 25-base read from each of those individual templates.

Ion Sequencing

(Rothberg et al., Life technologies: Nature, July 2011)

Non-optical method of DNA sequencing of genomes

Sequence data obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip

The ion chip contains 1.2 million ion-sensitive wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions.

Performance of the system was shown by sequencing three bacterial genomes and one human genome

World’s smallest solid state pH meter

DNA is fragmented, ligated to adapters, and clonally amplified onto beads.

Sequencing primers and DNA polymerase are then bound to the templates and pipetted into the chip’s loading port. Individual beads are loaded into individual sensor wells by spinning. The well depth allows only a single bead to occupy a well.

All four nucleotides are provided in a stepwise fashion during an automated run. When the nucleotide in the flow is complementary to the template base directly downstream of the sequencing primer, the nucleotide is incorporated into the nascent strand by the bound polymerase.

This increases the length of the sequencing primer by one base (or more, if a homopolymer stretch is directly downstream of the primer) and results in the hydrolysis of the incoming nucleotide triphosphate, which causes the net liberation of a single proton for each nucleotide incorporated during that flow.

The release of protons produces a shift in the pH of the surrounding solution proportional to the number of nucleotides incorporated in the flow (0.02 pH units per single base incorporation). This is detected by the sensor on the bottom of each well, converted to a voltage and digitized by off-chip electronics. Signal generation and detection occur over 4 s.

After the flow of each nucleotide, a wash is used to ensure nucleotides do not remain in the well.
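Because the pH shift is stated to be proportional to the number of bases incorporated (0.02 pH units per base), a per-flow base count can in principle be recovered by dividing the measured shift by that step. The sketch below assumes a hypothetical flow order "TACG" and made-up measurements; real base calling works on the digitized voltage trace and corrects for noise and signal decay, which this ignores.

```python
PH_SHIFT_PER_BASE = 0.02  # pH units per incorporated base, as stated above

def bases_from_ph_shifts(ph_shifts, flow_order="TACG"):
    """Turn one pH shift per nucleotide flow into a called sequence."""
    called = []
    for flow, shift in enumerate(ph_shifts):
        nucleotide = flow_order[flow % len(flow_order)]
        count = max(0, round(shift / PH_SHIFT_PER_BASE))
        called.append(nucleotide * count)
    return "".join(called)

# Hypothetical measurements for four flows (T, A, C, G).
shifts = [0.021, 0.000, 0.058, 0.019]
print(bases_from_ph_shifts(shifts))  # TCCCG
```

The third flow shows the homopolymer case: a shift close to 0.06 is read as three identical bases.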

Sequencing methods

Mining NGS data to obtain meaningful information

Average NGS experiment generates gigabytes to terabytes of raw data

Existing bioinformatics tools fall into several general categories:

(1) alignment of reads to a reference sequence

(2) de novo assembly

(3) reference-based assembly

(4) genetic variation detection (such as SNV, indel)

(5) genome annotation

(6) utilities for data analysis

The most important step in NGS data analysis is successful assembly or alignment of reads to a reference genome.
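To make the alignment step concrete, here is a minimal seed-and-verify sketch: index every k-mer of the reference, look up the first k bases of a read, then count mismatches over the full read. It is a teaching toy with hypothetical names and parameters, not a substitute for production aligners, and it handles neither indels nor reverse-complement reads.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map each k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def align_read(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for each acceptable placement of the read."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
print(align_read("TAGCCGAT", reference, index, k=4))  # [(4, 0)]
print(align_read("TAGCCTAT", reference, index, k=4))  # [(4, 1)]
```

The second read aligns with one mismatch, which is exactly the kind of difference that downstream variant detection has to interpret.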

After successful alignment and assembly the next step is to interpret the large number of putative novel genetic variants (or mutations) present by chance

Recognition of functional variants is at the center of the NGS data analysis and bioinformatics
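A first pass at spotting candidate variants from aligned reads can be sketched as a pileup: count the bases observed at each reference position and report positions where a non-reference base dominates. The thresholds, names, and data below are hypothetical, the alignments are assumed to be gap-free, and deciding whether a reported variant is functional is a separate problem that this does not address.

```python
from collections import Counter

def call_snvs(reference, alignments, min_depth=3, min_fraction=0.8):
    """Report (position, ref_base, alt_base, depth) for candidate SNVs.

    `alignments` is a list of (start_position, read_sequence) pairs.
    """
    pileup = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust this position
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants

reference = "ACGTTAGCCG"
alignments = [(0, "ACGTAAG"), (2, "GTAAGCC"), (3, "TAAGCCG"), (1, "CGTAAGC")]
print(call_snvs(reference, alignments))  # [(4, 'T', 'A', 4)]
```

The depth filter is where coverage enters: positions supported by only one or two reads are skipped because a sequencing error there would look the same as a real variant.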

Thanks
