MOLECULAR BIOLOGY – PCR, sequencing, Genomics

MOLECULAR BIOLOGY TECHNIQUES II.
Polymerase Chain Reaction – PCR
DNA sequencing

MOLECULAR BIOLOGY – PCR
Amplification of specific DNA fragments
• Synthetically derived DNA
• Cloning and/or isolation from a genomic library
Both are possible, but neither is the most convenient method, e.g. they are costly and/or labour intensive.

Polymerase Chain Reaction (PCR)
A mechanism to exponentially amplify a specific DNA fragment in a test tube, using the principles of specific DNA base-pairing and DNA replication and employing these in repeated cycles.

Reaction components:
• DNA containing the fragment to be amplified (e.g. genomic DNA or cDNA)
• Two single-stranded (ss) oligonucleotide primers specific to the DNA sequence of the desired fragment*
• Purified DNA polymerase (Klenow fragment)
• Deoxyribonucleotide triphosphates (dNTPs)
• Buffer solution (with required Mg2+ and K+ cations)

THERMAL CYCLING (x25-35):
• ~94 °C - Denaturation step
• ~60 °C - Primer annealing step
• 37 °C - Extension step

REPEATED THERMAL CYCLING initiates new rounds of DNA replication that can use the products of the previous round as template, thus exponentially amplifying the target DNA fragment.

* The oligonucleotide primer sequences must be complementary to the DNA sequence flanking the fragment to be amplified, and the two primers must match sequence on opposite strands of that fragment - see next slide

[Figure: one PCR thermal cycle. The dsDNA fragment to be amplified is denatured at 94 °C, the primers anneal at ~60 °C, and DNApol (Klenow) extends from the 3’ end of each primer at 37 °C.]
With each repeated THERMAL CYCLE (denaturation, annealing & extension) the amount of target dsDNA doubles.

PCR’s DNApol problem!
[Figure: primitive PCR machine (3 water baths)]
DNApol (Klenow fragment) is killed by the heat.
THERMAL CYCLING: INITIAL DENATURATION; then e.g. x30 cycles of DENATURATION (94 °C), ANNEALING (60 °C) and EXTENSION (37 °C); then TERMINAL EXTENSION
Expensive Klenow had to be added after every thermal cycle!
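
The doubling of target dsDNA with each thermal cycle can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes an idealised reaction in which a fixed fraction of templates is copied in every cycle (a fraction of 1.0 gives perfect doubling), which a real reaction only approximates before reagents run out. The function name `pcr_copies` and its `efficiency` parameter are hypothetical and are not part of the lecture material.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised number of target copies after `cycles` thermal cycles.

    efficiency = 1.0 means every template is copied in every cycle
    (perfect doubling); this is an assumption, not a measured value.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    # Each cycle multiplies the copy number by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (1, 10, 25, 30, 35):
        print(f"{n:2d} cycles: {pcr_copies(1, n):,.0f} copies from one template")
```

Under these assumptions, 30 cycles turn a single template into 2^30 (about one billion) copies, which is why 25-35 cycles are normally enough.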
Yellowstone National Park Thermal Springs
Isolation of thermophilic bacteria: Thermus aquaticus (50-80 °C)
Has an extremely heat-stable (t1/2 > 40 mins at 95 °C) DNA polymerase
Taq polymerase is ideally suited to PCR!

Thermostable DNA polymerases and PCR
The isolation of Taq polymerase permitted the automation of PCR thermal cycling, as fresh DNApol did not need to be added after every cycle!
HOWEVER: Taq polymerase lacks proofreading activity (3’-5’ exonuclease) and has a high error rate.

DNA polymerase: error rate (misincorporated nucleotides)
Klenow: 1 : 50 000
Taq polymerase: 1 : 9 000
Pfu polymerase: 1 : 1 300 000

Stratagene Inc. isolated a DNA polymerase from the hyperthermophilic archaeon (a prokaryote of the domain Archaea) Pyrococcus furiosus, found in the marine sediment associated with ocean thermal vents.
Pfu polymerase is extremely heat stable (the optimum growth temperature of Pyrococcus furiosus is 100 °C).
Crucially, Pfu polymerase has proofreading activity and has the lowest error rate of any known thermostable polymerase.
Pfu polymerase is IDEALLY suited to PCR applications where high-fidelity amplification of DNA is required (although it is more expensive than Taq polymerase).

A typical PCR protocol
Reaction mix: template DNA, sequence-specific sense and antisense oligonucleotide primers, thermostable DNApol (e.g. Taq or Pfu), dNTPs & PCR buffer

STEP: TEMP; TIME; NOTES
INITIAL DENATURATION: 94-96 °C; 2-3 mins; ensures all template DNA is single stranded (some DNApol require a ‘hot start’ for activation, e.g. Pfu)
DENATURATION: 94-96 °C; 0.5-2 mins; longer denaturation ensures more single-stranded DNA and better efficiency, at the cost of enzyme stability
ANNEALING: ~60 °C; 0.5-2 mins; a higher temperature increases product specificity (less chance of mismatches forming) but lowers potential yield. Set 15-25 °C below the melting temperature (Tm) of the annealed primer
EXTENSION: ~72 °C; ~1 min/kb; Taq extension rate = 150 nucleotides per second (Pfu is slower)
TERMINAL EXTENSION: ~72 °C; 5-10 mins; allows any incomplete products to be finished
The denaturation, annealing and extension steps are repeated x25-30.

‘Invention’ of PCR
KARY B. MULLIS, Cetus Corporation
1983 PCR discovery
1985 published, patent pending
1987 patented
1993 Nobel prize

K. Kleppe, E. Ohtsuka, R. Kleppe, I. Molineux and H. G. Khorana. Studies on polynucleotides XCVI. Repair replication of short synthetic DNA's as catalyzed by DNA polymerases. Journal of Molecular Biology, Volume 56, Issue 2, 14 March 1971, Pages 341-361. Institute for Enzyme Research of the University of Wisconsin, Madison, Wisc. 53706, U.S.A. Received 20 July 1970.
[Photos: Dr. Kjell Kleppe, H.G. Khorana]
Mullis would have been ‘aware’ of the work of Kleppe and Khorana. Although their method did not amplify DNA, it is generally accepted that their research was a ‘primer’ for PCR’s discovery.

‘Polymerase chain reaction (PCR)’ amplification of DNA - video/tutorial
http://www.sumanasinc.com/webcontent/animations/content/pcr.html

Experimental uses of PCR
Introduction of specific and useful DNA sequences
Sequence specific (i.e.
complementary) DNA oligonucleotide primer with a non-complementary yet useful 5’ sequence
PCR incorporates the useful DNA sequence into the PCR product, for example:
• Generation of restriction enzyme sites for cloning
• EPITOPE TAG: addition of extra protein-coding DNA sequence for a ‘tag’ that can be used experimentally to detect or purify a protein

Experimental uses of PCR
Introduction of specific mutations within recombinant DNA: ‘directed mutagenesis’
Mutagenic primer ([T] is the mismatched base; the template specifies A at this position):
5’ TGCTGTGATGT[T]GCTGATGCTGAATGC 3’
3’ CGCACGACACTACATCGACTACGACTTACGACGCTACAAGTTCATGAC 5’
[Figure residue: amino-acid letters R T T L H R L R L T T L Q V H D Q shown alongside the protein-coding DNA sequence (cDNA)]

Experimental uses of PCR
Degenerate PCR

Experimental uses of PCR
Nested PCR: two consecutive rounds of PCR, the second using a second pair of primers with annealing sites within the products produced by the first pair of primers.
Some DNA fragments can be difficult to amplify by PCR (because of potential secondary structures, or spurious products arising from primers binding other, off-target DNA). Nested PCR will increase the yield of true target DNA.

Experimental uses of PCR
Detecting SNPs by PCR
SNP-specific primer on the other allele (3’ end of the primer mismatched with the template: no amplification):
5’ GCTGTGATGTAGCTGATGCTGAATG 3’
3’ TCGATCGCACGACACTACATCGACTACGACTTAAGACGCTACAA 5’
SNP-specific primer on the SNP allele (3’ end of the primer matches: amplification):
5’ GCTGTGATGTAGCTGATGCTGAATGCTGCGATGTT 3’
3’ TCGATCGCACGACACTACATCGACTACGACTTACGACGCTACAA 5’

Detection of SNPs is important for:
• diagnosing certain genetic diseases arising from a ‘point mutation’, e.g. sickle cell anaemia (Hb gene E6V)
• identifying linked traits, e.g. SNPs in the Apolipoprotein E gene are associated with increased risk of Alzheimer’s disease

Inverse PCR
A method to amplify a particular DNA region (e.g. containing a gene) with only partial sequence information
DNA is digested with a restriction enzyme that does not cut in the known region
N.B.
relies on being able to cut DNA with ‘restriction’ enzymes that only cut at specific DNA sequences - see lecture 8
The compatible ends generated are ligated into a circle
The DNA is re-linearised by digestion with a restriction enzyme recognising a site within the known sequence
The unknown DNA can now be PCR amplified using primers specific to the known sequence at each end
PREVIOUSLY UNKNOWN DNA SEQUENCE CAN BE DETERMINED BY SEQUENCING FROM THE KNOWN FLANKS
THE DNA SEQUENCE WILL REVEAL WHERE THE UNKNOWN FRAGMENTS WERE ORIGINALLY LIGATED (i.e. LEFT AND RIGHT)

MICROSATELLITE SEQUENCES
Sequence repeats: (A)n (CA)n (CAG)n (CAGT)n
Variable Number of Tandem Repeats (VNTR)
[Figure: two dsDNA molecules, a and b, each drawn 5’-3’ over 3’-5’]
AFLP – amplified fragment length polymorphism
DNA fingerprinting

Experimental uses of PCR
Reverse Transcription PCR (RT-PCR)
mRNA:
5’ CCGAGTAGCTAGGAACTGATGAATGTCGATCGCACGACACTACATCGACTACGACTTAAGACGCTACAATCGATCGCACGACACTACATCG
ACTACGACTTACGACGCTACAATTGAGGTCGATGA...CCCCATGAGGGTGTGACCCGACATGACATGACATTGAGGCACAAATCAATGTA
GAAAAAAAAAAAAAAAAAAAAAAAAAA 3’
TTTTTTTTTTT (oligo-dT primer annealed to the poly(A) tail)
Reverse transcription
cDNA:
5’ TTTTTTTTTTTTTTTTTTTTTTTTTCTACATTGATTTGTGCCTCAATGTCATGTCATGTCGGGTCACACCCTCATGGGG...
TCATCGACCTCAATTGTAGCGTCGTAAGTCGTAGTCGATGTAGTGTCGTGCGATCGATTGTAGCGTCTTAAGTCGTAGTCGATGTAGTGTC
GTGCGATCGACATTCATCAGTTCCTAGCTACTCGG 3’
Normal PCR
Presence of a DNA product reveals the presence of the mRNA in the original sample
However, quantitative rather than qualitative results may be required

Real-time PCR (Quantitative PCR or Q-PCR)
[Figure: General PCR kinetics. Product (y-axis) against PCR cycles (x-axis) for samples 1. and 2.; plateau due to exhaustion of reagents]
Measurements of abundance must be taken in the exponential phase of the PCR
If the number of PCR cycles used were not in the exponential phase, one could mistake samples 1. and 2.
as being of equal concentration
Continuous measurement of product synthesis would be preferable, i.e. measurements in ‘real time’

Real-time PCR (Quantitative PCR or Q-PCR)
SYBR green-based Q-PCR assay
• dsDNA intercalating dye
• fluoresces green under blue light
• only emits fluorescence when bound to double-stranded DNA
Under PCR cycling conditions (denaturation, annealing, extension), SYBR green fluorescence can be measured at the end of either the annealing* or extension step after every PCR cycle and used to calculate the amount of DNA in the sample
* Measurements are usually taken at the end of the primer annealing step

‘Real-time PCR (Q-PCR)’ using SYBR green-based assay - video/tutorial
http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/real-time-pcr.html

Real-time PCR (Quantitative PCR or Q-PCR)
Fluorescent hybridisation probe-based methods (e.g. TaqMan probes)
Probe: a DNA sequence complementary to the DNA sequence of the target molecule, carrying a fluorescent reporter group and a fluorescence quencher; it is added together with the other PCR reagents
At each ANNEALING step, the probe and primers hybridise with the target/product DNA
Molecular proximity of the quencher prevents reporter fluorescence
During the EXTENSION step the annealed probe is digested by Taq DNApol (5’-3’ exonuclease activity)
Reporter fluorescence is no longer quenched and is used to quantify the DNA present

‘Real-time PCR (Q-PCR)’ using fluorescent molecular probes - video/tutorial
http://www.biosearchtech.com/support/videos/realtime-pcr-probe-animation-video.aspx
http://www.scanelis.com/webpages.aspx?rID=679

MOLECULAR BIOLOGY – sequencing
DNA SEQUENCING (i.e.
determining the order of the four possible deoxynucleotides in one of the DNA strands and, by inference, the order on the other strand)

Dideoxynucleotide triphosphate chain terminator / Sanger DNA sequencing
The DNA backbone comprises phosphodiester bonds between the 5’ and 3’ carbon atoms of the deoxyribose moieties of consecutive deoxynucleotides
Addition of a further deoxynucleotide to a growing DNA strand, during DNA synthesis, requires a free 3’-OH group
However, incorporation of a chemically modified dideoxynucleotide (ddNTP), lacking a 3’-OH group, prevents further polymerisation and hence TERMINATES DNA synthesis
Sanger realised such ‘chain termination’ could be exploited to reveal the sequence of a specific/target DNA molecule, but how?

Dideoxynucleotide triphosphate chain terminator / Sanger DNA sequencing
Reaction mix: target DNA, oligonucleotide primer & DNApol, plus dGTP, dTTP, dATP, dCTP and ddGTP (the ddGTP is radioactively labelled)
Primer annealed to the target DNA:
5’-CTGGGATACTGTACTAGC-3’
3’-GGACCCTATGACATGATCGATGAATTGGAAACTAGCTAGATCGGCACGAG-5’
Extension products, each terminated by a ddG:
5’-CTGGGATACTGTACTAGCACTTAACCTTTG
5’-CTGGGATACTGTACTAGCACTTAACCTTTGATCG
5’-CTGGGATACTGTACTAGCACTTAACCTTTGATCGATCTAG
5’-CTGGGATACTGTACTAGCACTTAACCTTTGATCGATCTAGCCG
Generation of a series of differently sized fragments, synthesised from the target DNA molecule, that all end with radio-labelled dideoxy-G (specified by C in the target DNA)

Repeat the reaction using the three other radio-labelled ddNTPs
Each reaction contains target DNA, oligonucleotide primer & DNApol, all four dNTPs (dGTP, dTTP, dATP, dCTP) and one ddNTP:
G reaction: ddGTP
A reaction: ddATP
T reaction: ddTTP
C reaction: ddCTP
Now have a complete population of DNA fragments of varying length (at one-nucleotide resolution), derived from the target DNA, that end with one of four radio-labelled dideoxynucleotides

Polyacrylamide DNA sequencing gel + autoradiography film (lanes G, A, T, C), fragments from top of gel to bottom:
ACTTAACCTTTGATCGATCTAGCCG
ACTTAACCTTTGATCGATCTAGCC
ACTTAACCTTTGATCGATCTAGC
ACTTAACCTTTGATCGATCTAG
ACTTAACCTTTGATCGATCTA
ACTTAACCTTTGATCGATCT
ACTTAACCTTTGATCGATC
ACTTAACCTTTGATCGAT
ACTTAACCTTTGATCGA
ACTTAACCTTTGATCG
ACTTAACCTTTGATC
ACTTAACCTTTGAT
ACTTAACCTTTGA
ACTTAACCTTTG
ACTTAACCTTT
ACTTAACCTT
ACTTAACCT
ACTTAACC
ACTTAAC
ACTTAA
ACTTA
ACTT
ACT
AC
A
Read off the DNA sequence from bottom to top (5’-3’ on the newly synthesised strand). Reverse complement for the other strand

‘Dideoxynucleotide triphosphate chain terminator / Sanger DNA sequencing’ principle - videos/tutorials
http://spine.rutgers.edu/cellbio/assets/flash/dideoxy.htm
http://smcg.cifn.unam.mx/enp-unam/03EstructuraDelGenoma/animaciones/secuencia.swf

Automation of the Sanger DNA sequencing method using fluorescently labelled ddNTPs
Each ddNTP variant (ddGTP, ddCTP, ddATP and ddTTP) is conjugated to a specific fluorescent group, allowing the 4 reactions to be pooled in one tube and then electrophoresed in the same lane
The specific fluorescence signature of each band indicates which nucleotide is at that position in the target DNA
The process can be highly automated using ‘capillary tube electrophoresis’ coupled to automatic fluorescence detectors (~1 kb max)
[Figures: automatic DNA sequence analyzers; principle of automated DNA sequencing, showing the capillary electrophoretic tubing and the detector]

How to sequence a human genome - video/tutorial
Featuring a description of automated fluorescence-based DNA sequencing
http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTDV026689.htm

MOLECULAR BIOLOGY – PCR, sequencing
Why not
try to deduce the sequence of larger segments of DNA... Genes... Chromosomal regions... Whole chromosomes... Entire genomes?

1990 Human Genome Project (HGP)
Complete sequencing of the whole human genome within 15 years

Whole Genome Shotgun DNA Sequencing
1. Human genome (blood donors)
2. Isolation of genomic DNA
3. Cloning of the genomic DNA fragments (i.e. to build a genomic DNA library consisting of BACs, 200 kb)
4. Mapping of BACs to known sequence markers (i.e. identifying which part of the genome each BAC comes from)
5. Mapped BACs (i.e. in the correct order on the chromosome)
6. Fragmentation of BAC clones and construction of BAC sub-clone libraries (typically cloned into bacteriophage; ~2 kb)
7. Sanger-based sequencing of the sub-clones (from either end)
8. Sequence alignment of overlapping sequences from the various sub-clones to reconstitute the entire BAC DNA sequence
9. Repeated iterations of sub-clone sequencing (to give sequence depth, i.e. confidence) and BAC reconstitution, for all the BACs covering the entire genome
Publication of a draft sequence in 2000 and a complete sequence in 2003!

The human genome is rich in repetitive sequences, which makes it ambiguous where such reads overlap (???):
AAAAAAAAAAAAAAAAAAAAAAAA
GTCCTGCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAGCTTGGCTCACATAGT

[Photos: J. Craig Venter, Francis Collins, President William J. Clinton]
Now many hundreds of different species’ genomes have been shotgun sequenced

The politics of sequencing the human genome!!!
The HGP was founded as an international, publicly funded consortium effort to sequence all the bases of the human genome within 15 years at a cost of $3 billion
It aimed to provide free and open access to all the data as a resource for research biologists
During the 1990s a number of groups had placed patents on genes that they had cloned, setting a commercial precedent/incentive for whole genome sequencing

J. Craig Venter – founder of ‘CELERA Genomics’
In 1998 he launched a commercial bid to sequence the human genome and secure gene patents $$$$$
Thus began a race between Celera and the publicly funded HGP to publish the complete genome sequence.
It was eventually decided that patents on genes were not legal, but both projects ended up publishing at the same time

How the genome was ‘won’ for all of humanity and not for ‘profit’!
Storage of the human genome DNA sequence (3.3 billion base-pairs):
• 3300 books of 1000 pages with 1000 bp per page
• 1 data CD (786 MB; 2 bits per bp)

MOLECULAR BIOLOGY – Genome sequencing
How to sequence a human genome by shotgun sequencing - video/tutorial
http://www.genome.gov/19519278#al-3

NEXT GENERATION DNA SEQUENCING (NextGen DNASeq)
Ultra-high throughput, with many millions of sequence reads per reaction, allowing genomic-scale experimentation and analysis in single experiments!
Examples of NextGen DNASeq technologies
• Illumina (Solexa) sequencing
• Lynx Therapeutics' Massively Parallel Signature Sequencing (MPSS)
• Polony sequencing
• 454 pyrosequencing
• SOLiD sequencing
• Ion semiconductor sequencing (e.g.
Ion Torrent)
• DNA nanoball sequencing
• HeliScope(TM) single molecule sequencing
• Single Molecule SMRT(TM) sequencing
• Single Molecule real time (RNAP) sequencing
• Nanopore DNA sequencing
• VisiGen Biotechnologies approach

Illumina based DNA sequencing
DNA sample preparation
Adapters are ligated to the ends of the fragmented (~300 bp) DNA or cDNA sample in a 2-step process:
1. ligation of the same oligonucleotides to both ends
2. PCR-based amplification, adding a unique DNA sequence (a specific DNA sequence adapter) at each end (i.e. pink and blue in figure)
Sample DNA attachment to flow cell surface
The sample DNA adapters base-pair with complementary oligos (pink or blue) fixed to the surface of the flow cell
The sample DNA is therefore primed for copying, resulting in a copy of the sample DNA being immobilised on the flow cell surface (the original sample DNA is washed away)

Bridge amplification
The adapter sequences (pink or blue) at the free end of the immobilised copies of the sample DNA are free to base-pair with other neighbouring oligos that are fixed to the surface of the flow cell
Such ‘bridge’ interactions prime another round of DNA copying. The result is two complementary copies of the original sample DNA immobilised on the slide in proximity to each other

Cluster formation
Repeated cycles of bridge amplification lead to the generation of clusters of copies of the original sample DNA
The cluster contains copies of both strands of the original DNA (i.e. the sequence and its complement). Therefore, prior to cluster sequencing, one strand is removed by cleaving with a restriction enzyme that recognises a sequence within either the pink or blue adapter.
The flow cell surface is covered in several million dense clusters, each representing one original DNA molecule in the sample
The actual sequencing reaction utilises ‘reversible chain terminator fluorescent dNTPs’

Illumina based DNA sequencing
Sequencing DNA clusters one base at a time
A mix of sequencing primers (complementary to one of the adapter sequences), DNA polymerase and differentially fluorescently labelled reversible chain terminator dNTPs (A, C, T and G) is added to the flow cell
Depending on the first nucleotide in the cluster, a specific fluorescent reversible chain terminator dNTP is incorporated, leading to a stop in DNA synthesis!
After washing unincorporated nucleotides away, a laser excites the flow cell and the emitted fluorescence shows which of the four fluorescent chain terminator dNTPs was incorporated in each cluster, i.e. it decodes the first sequenced base
Once an image recording the first nucleotide to be incorporated in each cluster has been taken, both the fluorescent dye and the blocking group that prevents extension of the DNA are removed (hence ‘reversible chain terminator dNTPs’) and the cycle is repeated

Sequential sequencing rounds one base at a time
It is possible to get up to 50 base-pairs of good sequence per cluster, and there are millions of different clusters!
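
The cycle described above (incorporate one blocked, labelled nucleotide, image it, unblock, repeat) can be sketched in a few lines. This is a toy model under stated assumptions, not Illumina's software: the names `sequence_by_synthesis` and `total_yield` are hypothetical, the template is assumed to be written 3’ to 5’ so that the read comes out 5’ to 3’, and base calling is assumed to be error-free.

```python
# Watson-Crick pairing: the labelled dNTP incorporated opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Build a read one base per cycle against a single-stranded template.

    Each loop iteration stands for one full cycle: a single reversible
    chain terminator is incorporated, imaged, then unblocked.
    """
    read = []
    for position in range(min(cycles, len(template))):
        incorporated = COMPLEMENT[template[position]]  # only the matching dNTP pairs
        read.append(incorporated)                      # the imaged colour gives the base call
    return "".join(read)


def total_yield(clusters: int, read_length: int) -> int:
    """Bases obtained per run = number of clusters x bases read per cluster."""
    return clusters * read_length


if __name__ == "__main__":
    print(sequence_by_synthesis("TACGGT", 4))
    # 5 million clusters is an illustrative figure, not a specification.
    print(total_yield(5_000_000, 50))
```

The second function shows why short reads are still useful: even at 50 bases per cluster, millions of clusters read in parallel give hundreds of millions of bases per run.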
The principles of ‘Illumina-based’ next generation DNA sequencing - videos
http://www.illumina.com/technology/sequencing_technology.ilmn
http://www.youtube.com/watch?v=77r5p8IBwJk

ION PERSONAL GENOME MACHINE SEQUENCER
NextGen DNASeq Ion Torrent - video/tutorial
http://lifetech-it.hosted.jivesoftware.com/videos/1016

Craig Venter Institute Sorcerer II expedition
“Our researchers discovered at least 1,800 new species and more than 1.2 million new genes from the Sargasso Sea”
Intensive horizontal gene transfer