Next Generation Sequencing
This lecture will give a brief history of DNA sequencing, the development of new next generation sequencing (NGS) technologies and the core principles of the methods. It will also discuss their applications in modern molecular genetic settings.
Learning Outcomes
On successful completion of the lecture, students should be able to:
Describe the basis of the term “Next Generation Sequencing” with respect to Sanger sequencing.
Describe the core principles of the Illumina NGS process, including detailed knowledge of the four-step process: library construction, cluster generation, sequencing by synthesis and data analysis.
Describe in detail the principles of some applications of NGS, including Whole Exome Sequencing (WES) and RNA-seq.
What is the history of DNA sequencing and its significance? (3)
Brief history: DNA sequencing has evolved from the early manual methods to automated technologies.
Key milestone: The Human Genome Project (1990–2003) used traditional Sanger sequencing to map the human genome.
Impact: It revolutionized genetics research, but was very expensive, costing billions of dollars to complete.
What was the Human Genome Project and what were its key features? (4)
Timeline: 1990–2003
Size of the Genome: approximately 3 billion base pairs
Sequencing Method: Traditional Sanger Sequencing
Significance: The project produced the first human genome sequence, which became a foundation for genetics research, at a cost of billions of dollars.
How has DNA sequencing advanced since the Human Genome Project? (2)
Time Efficiency: What used to take years is now achievable in just a day.
Cost Reduction: Sequencing is now far more affordable than the initial multi-billion-dollar investment.
What is Next Generation Sequencing (NGS) and how does it differ from traditional methods? (3)
NGS: A collection of modern sequencing technologies that allow for massively parallel sequencing.
Advantage: NGS is faster, cheaper, and more scalable than traditional methods like Sanger sequencing.
Application: Used in various fields including genomics, cancer research, and personalized medicine.
What is Polymerase Chain Reaction (PCR) and why is it important for DNA sequencing? (4)
Principle: PCR is used to amplify specific regions of DNA by copying the target sequence.
Process: Primers flank the target region, and each cycle of PCR ideally doubles the amount of target DNA.
Role in Sequencing: Amplifies sufficient DNA for sequencing or other molecular applications.
Significance: It is a fundamental technique for enabling DNA sequencing.
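The doubling described above compounds quickly. The short Python sketch below computes the expected copy number after a given number of cycles; the `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling), and the simple exponential model ignores the plateau that real reactions eventually reach.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Expected target copies after `cycles` rounds of PCR.

    efficiency = 1.0 is the ideal case in which every cycle doubles the
    target; real reactions run below this and eventually plateau, which
    this simple exponential model does not capture.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


# Ideal doubling: one template molecule after 30 cycles
print(pcr_copies(1, 30))  # 1073741824.0
# An assumed, less-than-perfect 90% efficiency per cycle
print(round(pcr_copies(1, 30, efficiency=0.9)))
```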
Who invented Sanger Sequencing and when? (2)
Inventor: Fred Sanger
Year: 1977
What are the key components and principles of Sanger Sequencing? (5)
Cycle Sequencing: Sanger sequencing is based on cycle sequencing.
PCR-Based: It relies on a PCR product as input.
Primers: Specific primers are required to initiate the sequencing reaction.
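Modified Nucleotides: Uses chain-terminating nucleotides (dideoxynucleotides, ddNTPs) carrying nucleotide-specific colour tags.
Purpose of Modification: A small proportion of these modified nucleotides is mixed with normal nucleotides, so each new strand stops at a random position. The resulting fragments of every length are separated by size, and the colour of each fragment's final base allows the sequence to be read base by base.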
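To make the chain-termination idea concrete, here is a toy Python simulation, not a model of any real instrument: each new strand stops when a chain terminator is incorporated (with an assumed probability per position), and sorting the fragments by length, as size separation does, recovers the sequence one base at a time. The dye colours, termination probability and function names are hypothetical, and strand orientation is ignored, so the read comes out as the complement of the template.

```python
import random

# Base pairing used to build the new (complementary) strand
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye colours, one per chain-terminating ddNTP
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def synthesise(template, p_terminate, rng):
    """Copy the template until a chain-terminating ddNTP is incorporated."""
    strand = []
    for base in template:
        strand.append(COMPLEMENT[base])
        if rng.random() < p_terminate:  # a ddNTP was added: the chain stops
            break
    return "".join(strand)


def sanger_read(template, n_molecules=5000, p_terminate=0.05, seed=1):
    """Simulate many extensions, sort fragments by length, read terminal bases."""
    rng = random.Random(seed)
    last_base_by_length = {}
    for _ in range(n_molecules):
        fragment = synthesise(template, p_terminate, rng)
        last_base_by_length[len(fragment)] = fragment[-1]
    # Like size separation: shortest fragment first
    return "".join(last_base_by_length[n] for n in sorted(last_base_by_length))


read = sanger_read("ATGCGTACGTTAGC")
print(read)  # TACGCATGCAATCG
print([DYE[b] for b in read][:4])
```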
What are the limitations of Sanger Sequencing? (4)
Throughput: Low throughput, as one reaction yields one sequence.
Length: Read length is limited to about 800 base pairs per reaction.
Speed: It is slow compared with modern sequencing methods.
Cost: Cost per base is high compared with newer technologies.
What are the strengths of Sanger Sequencing? (2)
Accuracy: Sanger sequencing is highly accurate, at about 99.99% per base (an error rate of roughly 0.01%).
Reliability: It was the gold standard for DNA sequencing until the late 2000s.
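The accuracy figure converts directly into an error rate and an expected number of errors per read. The sketch below also reports the Phred quality score, Q = -10 log10(error probability), a standard sequencing convention that the lecture does not itself cover; the 800 bp read length is taken from the limitations listed above.

```python
import math


def error_rate(accuracy):
    """Per-base error probability for a per-base accuracy between 0 and 1."""
    return 1.0 - accuracy


def expected_errors(accuracy, read_length):
    """Expected number of incorrect bases in one read of the given length."""
    return error_rate(accuracy) * read_length


def phred_quality(accuracy):
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_rate(accuracy))


acc = 0.9999  # 99.99% per-base accuracy
print(f"error rate: {error_rate(acc):.4%}")
print(f"expected errors in an 800 bp read: {expected_errors(acc, 800):.2f}")
print(f"Phred quality: Q{phred_quality(acc):.0f}")
```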
What are the main applications of Sanger Sequencing? (3)
SNPs and Mutations: Used to identify single nucleotide polymorphisms (SNPs) and mutations.
Monogenic Diseases: Commonly used for identifying mutations causing monogenic diseases.
Single Gene Testing: Frequently used for single-gene tests, such as CFTR in cystic fibrosis.
[Figure: a Sanger sequencing machine]
What are the technological advancements in DNA sequencing since the end of the Human Genome Project? (3)
Cost Reduction: The cost of DNA sequencing has dramatically decreased.
Rate of Decrease: Since 2007, sequencing costs have fallen faster than Moore's law would predict (Moore's law observes that the number of transistors on a chip, and roughly computing power, doubles about every two years).
Technological Advancements: Significant improvements have been made in sequencing technologies, increasing throughput and accuracy.
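For a sense of scale, a doubling every two years compounds to a 32-fold improvement over a decade. The helper functions below are illustrative only and contain no real cost data; costs falling "faster than Moore's law" means the real decline exceeded what this formula predicts over the same period.

```python
def moores_law_factor(years, doubling_time=2.0):
    """Improvement factor predicted by a doubling every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)


def cost_under_moores_law(start_cost, years):
    """Cost after `years` if it only halved at the pace of Moore's law."""
    return start_cost / moores_law_factor(years)


print(moores_law_factor(10))  # 32.0
```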
What was the significance of 454 pyrosequencing in the development of Next Generation Sequencing (NGS)? (3)
Introduction: 454 pyrosequencing, launched commercially in 2005, marked the beginning of the new NGS methods.
Throughput: It increased DNA sequencing throughput by orders of magnitude compared with earlier methods.
Impact: This advancement set the stage for further improvements in sequencing technology, making high-throughput sequencing more accessible.
What is Solexa sequencing-by-synthesis (SBS) and how did it contribute to NGS development? (3)
Development: Solexa SBS was developed at the end of 2005.
Method: It is a sequencing-by-synthesis (SBS) method: complementary DNA strands are synthesized one base per cycle, and each incorporated base is detected by imaging before the next cycle begins.
Contribution: Solexa's technology formed the foundation for Illumina's SBS sequencing, a dominant method in the sequencing market today.
What is the current dominant method in DNA sequencing and why? (2)
Illumina SBS: Illumina's sequencing-by-synthesis (SBS) technology is the leading method in the sequencing market.
Reason for Dominance: Illumina's SBS provides high-throughput, cost-effective, and accurate sequencing, making it the preferred choice in research and clinical settings.
How has Next Generation Sequencing (NGS) impacted Sanger sequencing in the lab? (2)
Replacement: NGS has largely replaced Sanger sequencing for almost all sequencing tests in the lab.
Advantage: NGS is faster, more cost-effective, and offers higher throughput compared to Sanger sequencing.
What is whole genome sequencing (WGS) and how is it used in NGS? (3)
Definition: Whole genome sequencing (WGS) involves sequencing the entire DNA of an organism, including all its genes and non-coding regions.
Application: Used to comprehensively analyze genetic variations across the entire genome, including mutations and structural changes.
Significance: WGS provides complete genetic information, making it useful for diverse research, including personalized medicine and disease studies.
What is whole exome sequencing (WES) and how does it differ from whole genome sequencing? (4)
Definition: Whole exome sequencing (WES) focuses on sequencing the exons, which are the protein-coding regions of the genome.
Difference from WGS: Unlike WGS, which sequences the entire genome, WES targets only the exonic regions, which make up about 1-2% of the entire genome.
Efficiency: WES is more cost-effective than WGS while still providing important insights into genetic variations that may lead to diseases.
Application: Commonly used to study genetic disorders, as most disease-related mutations are found in the exons.
What are the core principles of Next-Generation Sequencing (NGS)? (4)
DNA Library Construction: DNA is prepared by chopping it into small fragments.
Cluster Generation: DNA fragments are amplified to form clusters.
Sequencing-by-Synthesis: The DNA fragments are sequenced base by base as complementary strands are synthesized.
Data Analysis: Sequencing data is processed and analyzed for genetic information.
What are the steps involved in performing Next-Generation Sequencing (NGS)? (4)
DNA Library Construction: DNA is fragmented, adapted, and prepared for sequencing.
Cluster Generation: Fragments are amplified to create clusters for sequencing.
Sequencing-by-Synthesis: DNA is sequenced by synthesizing complementary strands one base per cycle and detecting each incorporated base.
Data Analysis: The data is processed and analyzed for genetic information.
What is the process of DNA library construction in NGS? (3)
Fragmentation: DNA is sheared into small fragments, typically around 300 base pairs long.
Methods of Shearing: Shearing can be done chemically, enzymatically, or physically (e.g., sonication).
Library: The resulting collection of random fragments, typically made from DNA extracted from a patient's blood sample, forms the DNA library used for sequencing.
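Random fragmentation can be sketched as placing break points independently along a sequence. This toy Python version assumes a single genome copy and a simple geometric length model averaging about 300 bp; real libraries are made from many genome copies, so fragments overlap, and size selection is often used to narrow the length distribution.

```python
import random


def shear(dna, mean_size=300, seed=0):
    """Cut a DNA sequence at random positions into fragments.

    Each position between two bases becomes a break point with probability
    1 / mean_size, so fragment lengths average roughly `mean_size` bp.
    """
    rng = random.Random(seed)
    p_break = 1.0 / mean_size
    fragments, start = [], 0
    for i in range(1, len(dna)):
        if rng.random() < p_break:
            fragments.append(dna[start:i])
            start = i
    fragments.append(dna[start:])
    return fragments


# A random 100 kb stand-in for patient DNA (hypothetical)
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(100_000))
frags = shear(genome)
print(len(frags), sum(map(len, frags)) / len(frags))
```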
What steps are involved in preparing the DNA fragments for NGS sequencing? (3)
End Repair: The ends of the sheared DNA fragments are repaired.
Adenine Overhangs: A single adenine (A) nucleotide is added to the 3' ends of the repaired fragments (A-tailing).
Adapter Ligation: Adapters with a complementary thymine (T) overhang are ligated to the A-tailed fragments, creating a library of adapter-ligated fragments.
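A common explanation for the A-overhang/T-overhang design is that single-base sticky ends only join when they are complementary: an A-tailed fragment ligates to a T-overhang adapter, but fragments cannot join each other and adapters cannot join each other. The toy model below, with a hypothetical `Molecule` type, checks just that rule.

```python
from dataclasses import dataclass

PAIR = {"A": "T", "T": "A"}


@dataclass(frozen=True)
class Molecule:
    name: str
    overhang: str  # single-base 3' overhang, "" if blunt


def can_ligate(x, y):
    """Sticky ends join only if their single-base overhangs are complementary."""
    return bool(x.overhang) and PAIR.get(x.overhang) == y.overhang


def a_tail(fragment):
    """After end repair the fragment is blunt; add a single 3' A overhang."""
    return Molecule(fragment.name, "A")


insert = a_tail(Molecule("insert", ""))
adapter = Molecule("adapter", "T")

print(can_ligate(insert, adapter))   # True: A pairs with T
print(can_ligate(insert, insert))    # False: fragments cannot join each other
print(can_ligate(adapter, adapter))  # False: adapters cannot self-ligate
```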
What is the function of adapters in DNA library construction? (3)
Sequencing Primer Sites: Adapters contain regions where sequencing primers can bind.
Anchors for Attachment: Adapters contain P5 and P7 anchors that attach the DNA fragments to the flow cell.
Role in Sequencing: The adapters enable proper attachment of fragments to the sequencing platform for synthesis.
What is the purpose of cluster generation in Next-Generation Sequencing (NGS)? (2)
Amplification: Cluster generation amplifies each DNA fragment into a cluster of many identical copies.
Stronger Signal: The amplification ensures that the DNA fragments produce a signal strong enough to be detected during sequencing.
How are DNA library fragments prepared for cluster generation in NGS? (3)
Hybridization: The DNA library fragments are hybridized to the flow cell surface.
Challenge: The signal from a single DNA molecule is too weak to be detected reliably, making amplification necessary.
Goal: To amplify each fragment into a cluster large enough for detection.
What is bridge amplification and how does it contribute to cluster generation? (3)
Bridge Amplification: A process where DNA fragments are amplified on the flow cell surface to form clusters.
Formation of Clusters: Each cluster originates from a single DNA molecule, leading to the creation of billions of clusters.
Visualization: The resulting clusters are now large enough to be visualized and used in the sequencing process.
What is the outcome of successful cluster generation in NGS? (2)
Large Clusters: Many billions of amplified DNA clusters are formed from individual DNA library molecules.
Sequencing Readiness: The clusters are now visible and the flow cell is ready to be loaded onto the sequencing platform for sequencing.
How is a flow cell used in the sequencing process of NGS? (3)
Sequencing Machine: The sequencing machine processes a flow cell containing lanes.
Multiple Samples: Each lane of the flow cell can hold multiple DNA samples, which are indexed with DNA barcodes contained in the adapters.
DNA Libraries: DNA libraries are deposited onto the flow cell, where they undergo the next steps of amplification and sequencing.
What happens to the DNA libraries once deposited onto the flow cell in NGS? (3)
Bridge Amplification: The DNA libraries are subjected to bridge amplification, where DNA fragments form clusters on the flow cell.
Cluster Formation: Amplification results in the formation of "clusters," each originating from a single DNA fragment.
Sequencing Readiness: Once clusters are formed, the flow cell is ready for sequencing.
What is sequencing-by-synthesis in NGS? (2)
Sequencing Process: Sequencing-by-synthesis (SBS) is the method used to sequence the DNA fragments in the clusters on the flow cell.
Synthesis: During SBS, nucleotides are incorporated into the growing DNA strand one at a time, and each base is read as it is added.
How does DNA barcoding play a role in NGS sequencing? (2)
Indexed Samples: DNA samples are indexed with a unique DNA barcode, which is included in the adapters.
Sample Identification: These barcodes allow for multiple samples to be sequenced in the same lane, helping to differentiate and identify each sample during analysis.
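Separating pooled reads by barcode (demultiplexing) can be sketched as a lookup with mismatch tolerance. The sample sheet, barcodes and reads below are hypothetical, and the sketch assumes each read has already been split into its index and insert; allowing one mismatch works here only because every pair of barcodes differs at several positions.

```python
from collections import defaultdict

# Hypothetical sample sheet: index (barcode) sequence -> sample name
SAMPLE_SHEET = {"ACGTAC": "patient_1", "TGCATG": "patient_2", "GATCGA": "patient_3"}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_sheet, max_mismatches=1):
    """Assign each (index, insert) pair to a sample by its barcode.

    A read is assigned when exactly one barcode lies within
    `max_mismatches`; otherwise it goes to 'undetermined'.
    """
    bins = defaultdict(list)
    for index, insert in reads:
        hits = [sample for barcode, sample in sample_sheet.items()
                if len(barcode) == len(index)
                and hamming(barcode, index) <= max_mismatches]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(insert)
    return dict(bins)


reads = [("ACGTAC", "TTGACCA"), ("TGCATT", "GGCATTA"), ("NNNNNN", "CATTAGG")]
print(demultiplex(reads, SAMPLE_SHEET))
```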
What are the key features of the Sequencing-by-Synthesis (SBS) method? (3)
Modified Bases: The four nucleotides (A, T, C, G) each carry a reversible chain terminator and a different fluorescent dye.
Single Nucleotide Incorporation: DNA polymerase incorporates one nucleotide at a time into the growing DNA strand.
Controlled Process: The incorporation is done in a controlled manner, one nucleotide per cycle.
What happens during each cycle of Sequencing-by-Synthesis? (5)
Single Nucleotide Incorporation: DNA polymerase incorporates a single nucleotide into the growing strand.
Flow Cell Wash: The flow cell is washed to remove unincorporated nucleotides.
Imaging: The flow cell is imaged, and the fluorescent colour of each cluster identifies the incorporated base.
Cleavage: The terminator group and fluorescent dye are chemically cleaved off, allowing the next nucleotide to be added.
Repeat: The process is repeated for each cycle to build the full sequence.
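The cycle above can be traced for a single cluster with the toy Python model below. It assumes perfect chemistry (every molecule in the cluster extends by exactly one base per cycle) and a hypothetical dye-to-base colour scheme; it also shows why the number of cycles sets the read length.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye assignment: one colour per reversible-terminator nucleotide
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(template, n_cycles):
    """Toy model of one cluster: one base incorporated and imaged per cycle."""
    read = []
    for cycle in range(min(n_cycles, len(template))):
        # Incorporation: polymerase adds exactly one complementary,
        # dye-labelled nucleotide; its terminator blocks further extension.
        added = COMPLEMENT[template[cycle]]
        # Wash: unincorporated nucleotides are flushed away (not modelled).
        # Imaging: the cluster's fluorescent colour identifies the added base.
        colour = DYE[added]
        read.append(COLOUR_TO_BASE[colour])
        # Cleavage: dye and terminator removed, ready for the next cycle.
    return "".join(read)


print(sequence_by_synthesis("TACGGATC", n_cycles=5))  # ATGCC
```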
How are the bases imaged during Sequencing-by-Synthesis? (3)
Camera Imaging: After each cycle, a camera captures images of the clusters on the flow cell surface.
Fluorescent Colors: Each base carries a dye with a distinct fluorescent signal, allowing the incorporated nucleotide to be identified.
Conversion: For each cluster, the colour recorded in each cycle's image is converted into a nucleotide base call (A, C, G, or T).
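Converting each cycle's image into a base call can be approximated by picking the brightest of the four dye channels for a cluster. The intensities below are invented for illustration, and real base-calling software also corrects for effects such as crosstalk between channels and molecules within a cluster falling out of step (phasing), which this sketch omits.

```python
BASES = "ACGT"


def call_base(intensities):
    """Call the base whose dye channel is brightest for one cluster in one cycle."""
    return max(BASES, key=lambda base: intensities[base])


def call_read(cycles):
    """Turn one cluster's per-cycle channel intensities into a read."""
    return "".join(call_base(c) for c in cycles)


# Hypothetical intensities for one cluster over three cycles
cluster = [
    {"A": 0.9, "C": 0.1, "G": 0.05, "T": 0.2},
    {"A": 0.1, "C": 0.2, "G": 0.85, "T": 0.1},
    {"A": 0.2, "C": 0.7, "G": 0.1, "T": 0.3},
]
print(call_read(cluster))  # AGC
```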
How many cycles does Sequencing-by-Synthesis typically involve? (1)
Cycle Number: Sequencing typically involves between 50 and 600 cycles, depending on the desired read length, because each cycle reads one base.
What happens after sequencing-by-synthesis in the sequencing process? (2)
Machine DNA Base Calls: The sequencing machine makes DNA base calls, identifying the nucleotide (A, C, G, or T) for each base in the sequence.
Short-Read Sequences: Millions of short-read sequences are generated, each representing a fragment of the original DNA library.
[Figure: Illumina sequencers]
What is involved in the analysis of NGS data? (4)
Short-Read Assembly: Short-read sequences from the sequencing machine need to be pieced together, similar to solving a jigsaw puzzle.
Mapping to Reference Genome: Sequence reads are mapped to the reference genome to locate their positions.
Consensus Sequence: The goal is to generate a consensus sequence that represents the original DNA sample.
Genetic Variant Detection: The consensus sequence is compared against a reference human genome to identify genetic variants.
How is the analysis of NGS data achieved? (2)
Dedicated Software: Specialized software tools are used to map and assemble the sequence reads.
Bioinformatics Tools: Bioinformatics tools are employed to compare the consensus sequence to the reference genome and detect variants.
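The mapping, consensus and variant-detection steps can be illustrated on a tiny example. This Python sketch uses a brute-force scan instead of an indexed aligner, a majority vote per position instead of a statistical variant caller, and a hypothetical reference and read set in which the sample carries a single C>T change.

```python
from collections import Counter
from typing import List, Optional, Tuple


def map_read(read: str, reference: str, max_mismatches: int = 1) -> Optional[int]:
    """Leftmost reference position where the read fits with few mismatches.

    A brute-force scan for illustration; real aligners index the reference.
    """
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        if sum(r != g for r, g in zip(read, window)) <= max_mismatches:
            return pos
    return None


def consensus(reads: List[str], reference: str) -> str:
    """Pile mapped reads onto the reference; take the majority base per position."""
    piles = [Counter() for _ in reference]
    for read in reads:
        pos = map_read(read, reference)
        if pos is None:
            continue  # unmapped read: skipped in this sketch
        for offset, base in enumerate(read):
            piles[pos + offset][base] += 1
    # 'N' marks positions that no read covers
    return "".join(p.most_common(1)[0][0] if p else "N" for p in piles)


def call_variants(cons: str, reference: str) -> List[Tuple[int, str, str]]:
    """(position, reference base, sample base) wherever the two differ."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, cons))
            if c != "N" and c != r]


reference = "GATTACACCGTAGCTT"  # hypothetical reference
reads = ["GATTAC", "TACACT", "CACTGT", "TGTAGC", "TAGCTT"]  # hypothetical reads
cons = consensus(reads, reference)
print(cons)                            # GATTACACTGTAGCTT
print(call_variants(cons, reference))  # [(8, 'C', 'T')]
```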
What are the main differences between Next-Generation Sequencing (NGS) and Sanger Sequencing? (4)
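Readout Type: NGS produces a digital readout (each read is a discrete, countable sequence), whereas Sanger sequencing produces an analogue readout (a fluorescence trace in which signals from different molecules overlap).
Number of Sequences: Sanger sequencing produces one sequence read per reaction, while NGS generates a consensus sequence from multiple reads.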
Speed and Throughput: NGS offers high throughput and faster sequencing, whereas Sanger sequencing is slower and more limited in output.
Cost: NGS is generally more cost-effective for large-scale sequencing compared to Sanger sequencing, which is more expensive per read.
What is Whole Exome Sequencing (WES) and why is it used? (4)
Human Genome Overview: The human genome contains approximately 21,000 protein-coding genes.
Focus on Exons: WES targets the exons of genes, which make up 1-2% of the genome and are responsible for protein coding.
Pathogenic Mutations: Around 80% of pathogenic mutations are found in protein-coding regions (exons).
Cost-Effectiveness: Sequencing the exome is more efficient and cheaper than sequencing the whole genome, with the exome costing £200-£300 compared to £1,000 for a full genome.
How is Whole Exome Sequencing (WES) achieved? (3)
Target Enrichment: Specific regions of interest (exons) are captured using probes (baits).
Capture Technique: Baits are designed to bind to the target regions, enriching the sample for the exonic sequences.
Exome Size: The exome typically covers around 50 Mb of the genome, much smaller than the entire genome (~3 billion base pairs).
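A quick check of the figures above: a 50 Mb exome is about 1.7% of a 3 billion bp genome (consistent with the 1-2% quoted earlier), so reaching the same depth requires roughly 60-fold fewer sequenced bases. The quoted price difference is much smaller than 60-fold, presumably because costs such as library preparation and target capture do not shrink with the amount sequenced.

```python
EXOME_BP = 50e6   # ~50 Mb exome target (figure from the text above)
GENOME_BP = 3e9   # ~3 billion bp human genome

fraction = EXOME_BP / GENOME_BP
print(f"exome fraction of genome: {fraction:.1%}")        # 1.7%
print(f"fold fewer bases to sequence: {GENOME_BP / EXOME_BP:.0f}")  # 60
```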
What are the steps involved in exome data analysis? (5)
Sequence Read Alignment: Aligning the short sequence reads from the sequencing machine to the reference genome to identify where the reads map.
Target Coverage: Evaluating how well the target regions (exons) are covered by the sequencing process.
Variant Calling: Identifying genetic variants such as single nucleotide polymorphisms (SNPs) or insertions/deletions (indels) in the aligned sequencing data.
Variant Annotation: Providing functional context to variants, such as whether they affect protein-coding sequences or regulatory regions.
Reporting: Generating reports that summarize the identified variants, their potential impact, and how they relate to known databases.
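The target coverage step is often summarized as the mean depth over the target and the fraction of target bases reaching a minimum depth. In the sketch below the run size, the on-target rate and the 20x threshold are hypothetical values chosen for illustration.

```python
def mean_depth(n_reads, read_length, target_bp, on_target=1.0):
    """Average sequencing depth: bases landing on target / target size."""
    return n_reads * read_length * on_target / target_bp


def fraction_covered(depths, min_depth=20):
    """Fraction of target positions sequenced at least `min_depth` times."""
    return sum(d >= min_depth for d in depths) / len(depths)


# Hypothetical run: 50 million 100 bp reads, 80% on target, 50 Mb exome
print(mean_depth(50_000_000, 100, 50_000_000, on_target=0.8))  # 80.0
# Hypothetical per-position depths across a small target region
print(fraction_covered([35, 12, 60, 25, 0, 48]))
```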
What is the focus of Whole Exome Sequencing? (2)
Protein Coding Mutations: Whole exome sequencing primarily targets mutations within the exons, which are the regions of the genome that code for proteins.
Exons: These are the sequences that will be analyzed because most pathogenic mutations occur within these coding regions.
How is a patient’s DNA used in Whole Exome Sequencing? (3)
Exome Sequencing: The patient’s DNA sample is subjected to exome sequencing to capture and analyze the protein-coding regions.
Consensus Sequence: After sequencing, a consensus sequence is generated, representing the most likely sequence of the patient’s exome.
Mutation Identification: The sequencing results are analyzed to identify any mutations or genetic variations, such as a heterozygous mutation in a gene like CFTR.
What does a heterozygous mutation indicate? (2)
Heterozygous Mutation: This means one copy of a gene has a mutation, while the other copy is normal. This can influence the patient’s health depending on the gene and mutation.
Example: Cystic fibrosis is recessive, so a single heterozygous pathogenic CFTR mutation usually indicates a carrier, who is typically unaffected but can pass the mutation on to their children.
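Because NGS counts individual reads, a heterozygous germline variant typically appears in roughly half of the reads covering that position. The function below turns read counts into a rough genotype; its 0.2 and 0.8 cut-offs are illustrative assumptions, not thresholds from any real variant caller.

```python
def genotype(ref_reads, alt_reads, het_range=(0.2, 0.8)):
    """Rough genotype from the fraction of reads carrying the variant allele.

    The cut-offs are illustrative only; real callers use probabilistic models.
    """
    vaf = alt_reads / (ref_reads + alt_reads)  # variant allele fraction
    low, high = het_range
    if vaf < low:
        return "homozygous reference"
    if vaf <= high:
        return "heterozygous"
    return "homozygous variant"


print(genotype(ref_reads=48, alt_reads=52))  # heterozygous
print(genotype(ref_reads=1, alt_reads=60))   # homozygous variant
```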
How is Exome Sequencing applied in disease research? (4)
Disease-Affected Families: Exome sequencing is particularly useful when studying families with individuals affected by a genetic disease.
NGS in Disease Gene Identification: Next-Generation Sequencing (NGS) is used to identify the disease-causing genes in affected individuals.
Variant Profiling: Exome sequencing is performed on affected individuals, and their genetic variants are compared.
Mutation Identification: By comparing the variants, researchers attempt to identify the shared mutation that could be responsible for the disease.
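The comparison of variants between family members can be sketched with set operations. This minimal version assumes a fully penetrant condition in which unaffected relatives do not carry the causal variant (as in a dominant model); recessive conditions, where unaffected parents are carriers, would need genotype-aware filtering instead. The variant IDs are hypothetical.

```python
def candidate_variants(affected, unaffected=()):
    """Variants found in every affected relative and in no unaffected relative."""
    shared = set.intersection(*affected)  # new set; inputs are not modified
    for relative in unaffected:
        shared -= relative
    return shared


# Hypothetical variant IDs called from each person's exome
affected = [{"v1", "v2", "v3"}, {"v2", "v3", "v4"}, {"v2", "v3", "v5"}]
unaffected = [{"v3", "v6"}]
print(candidate_variants(affected, unaffected))  # {'v2'}
```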
What is RNA-seq and how does it work? (3)
RNA-seq Purpose: RNA-seq is a technique used to study gene expression by analyzing the RNA from cells or tissue samples.
cDNA Conversion: Before sequencing, RNA is converted into complementary DNA (cDNA) by reverse transcription, and the cDNA is used to build a sequencing library of the sample.
NGS for RNA: NGS is then used to sequence the cDNA, enabling the determination of which genes are actively expressed.
How is gene expression quantified in RNA-seq? (3)
Sequencing Reads: The number of sequencing reads mapping to each gene reflects how abundantly that gene is expressed, once counts are normalized for sequencing depth (and, when comparing different genes, for gene length).
Quantification of Expression: This helps quantify the expression levels of thousands of genes in the sample.
Gene Expression Differences: By comparing sequencing results, RNA-seq can identify differences in gene expression across various experimental conditions.
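Read counts are only comparable between samples after accounting for how deeply each sample was sequenced. In this hypothetical example, GENE_X has 4 times more reads in condition B, but B was sequenced twice as deeply, so after scaling to counts per million (CPM) the change is 2-fold (a log2 fold change of about 1). Real differential-expression analyses add statistical testing across replicates, which this sketch omits.

```python
import math
from collections import Counter


def cpm(counts):
    """Counts per million: rescale by library size so samples are comparable."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(B / A) per gene; the pseudocount avoids dividing by zero."""
    genes = set(cpm_a) | set(cpm_b)
    return {g: math.log2((cpm_b.get(g, 0.0) + pseudocount)
                         / (cpm_a.get(g, 0.0) + pseudocount))
            for g in genes}


# Hypothetical gene assignments of the reads from two conditions
counts_a = Counter(["GENE_X"] * 30 + ["GENE_Y"] * 70)
counts_b = Counter(["GENE_X"] * 120 + ["GENE_Y"] * 80)
fc = log2_fold_change(cpm(counts_a), cpm(counts_b))
print({g: round(v, 2) for g, v in sorted(fc.items())})
```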
What can RNA-seq reveal about gene isoforms? (2)
Gene Isoforms: RNA-seq can help discover distinct isoforms of genes, i.e. alternative transcripts of the same gene (produced, for example, by alternative splicing) that may have different functions.
Differential Regulation: It can also show how different isoforms are differentially regulated and expressed under various conditions.