Genomics & Biotechnology

Michael D. Kane, PhD
Asst. Professor, Department of Computer & Information Technology
Lead Genomic Scientist, Bindley Bioscience Center
Purdue University
Adjunct Asst. Professor of Pharmacology
Ohio Northern University

1. Genomics Review
2. Single Nucleotide Polymorphisms (SNPs)
3. Basics of DNA Detection
4. SNP Discovery
5. SNP Detection
6. Biotechnologies
7. Data Formats
8. Genomic Data serving as Clinical Decision Support

Genomics Review
DNA is Information Storage

Genomics Review
[Figure: “Zipped Files”, Decompression, “Executable Files”]

Genomics Review
DNA is Double Stranded: one strand is the “coding strand”, and the other strand stabilizes the DNA sequence when it is not in use.
Double-stranded DNA is very durable in our environment.

Genomics Review
Anti-parallel Configuration
The top strand is ALWAYS written 5’ to 3’.
When DNA is written to a file, only the top strand is recorded; the bottom strand is implied.

5’-AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG-3’
   ||||||||||||||||||||||||||||||||||||||
3’-TCAGCACTAGACGATTTACAGAGCTTCAAGCTACGATC-5’

Courier font is preferred for writing sequence data since letter spacing is independent of character content.

FASTA File Format
This is how genomic information is stored in the computer world.
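A FASTA record, like the one that follows, is a header line starting with ">" followed by lines of sequence for the top strand only. As a minimal sketch (the function names and the two-line demo record are hypothetical, chosen only for illustration), the record can be read and the implied bottom strand derived like this:

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    # Sequence lines are joined; line breaks carry no meaning in FASTA.
    return {name: "".join(chunks) for name, chunks in records.items()}


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bottom_strand(top):
    """Bottom strand written 3'->5', base for base under the top strand."""
    return top.translate(_COMPLEMENT)


def reverse_complement(top):
    """Bottom strand rewritten in the conventional 5'->3' direction."""
    return bottom_strand(top)[::-1]


# Hypothetical demo record (first 20 bases of the duplex shown above).
EXAMPLE = """>demo_record hypothetical two-line entry
AGTCGTGATC
TGCTAAATGT
"""

records = parse_fasta(EXAMPLE)
top = records["demo_record hypothetical two-line entry"]
print(top)                      # top strand, 5'->3'
print(bottom_strand(top))       # implied bottom strand, 3'->5'
print(reverse_complement(top))  # same bottom strand, 5'->3'
```

The sketch assumes unambiguous A/C/G/T bases; IUPAC ambiguity codes would need a larger complement table.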
>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTCTATGCCCGCCTCTTCGACTT CCTCGTGGAGGCCATCAACCGTGCTATGCAGAAACCCCAGGAAGAGTACAGCATCGGTGTGCTGGACATT TACGGCTTCGAGATCTTCCAGAAAAATGGCTTCGAGCAGTTTTGCATCAACTTCGTCAATGAGAAGCTGC AGCAAATCTTTATCGAACTTACCCTGAAGGCCGAGCAGGAGGAGTATGTGCAGGAAGGCATCCGCTGGAC TCCAATCCAGTACTTCAACAACAAGGTCGTCTGTGACCTCATCGAAAACAAGCTGAGCCCCCCAGGCATC ATGAGCGTCTTGGACGACGTGTGCGCCACCATGCACGCCACGGGCGGGGGAGCAGACCAGACACTGCTGC AGAAGCTGCAGGCGGCTGTGGGGACCCACGAGCATTTCAACAGCTGGAGCGCCGGCTTCGTCATCCACCA CTACGCTGGCAAGGTCTCCTACGACGTCAGCGGCTTCTGCGAGAGGAACCGAGACGTTCTCTTCTCCGAC CTCATAGAGCTGATGCAGTCCAGTGACCAGGCCTTCCTCCGGATGCTCTTCCCCGAGAAGCTGGATGGAG ACAAGAAGGGGCGCCCCAGCACCGCCGGCTCCAAGATCAAGAAACAAGCCAACGACCTGGTGGCCACACT GATGAGGTGCACACCCCACTACATCCGCTGCATCAAACCCAACGAGACCAAGCACGCCCGAGACTGGGAG GAGAACAGAGTCCAGCACCAGGTGGAATACCTGGGCCTGAAGGAAAACATCAGGGTGCGCAGAGCCGGCT TCGCCTACCGCCGCCAGTTCGCCAAATTCCTGCAGAGGTATGCCATTCTGACCCCCGAGACGTGGCCGCG GTGGCGTGGGGACGAACGCCAGGGCGTCCAGCACCTGCTTCGGGCGGTCAACATGGAGCCCGACCAGTAC CAGATGGGGAGCACCAAGGTCTTTGTCAAGAACCCAGAGTCGCTTTTCCTCCTGGAGGAGGTGCGAGAGC GAAAGTTCGATGGCTTTGCCCGAACCATCCAGAAGGCCTGGCGGCGCCACGTGGCTGTCCGGAAGTACGA GGAGATGCGGGAGGAAGCTTCCAACATCCTGCTGAACAAGAAGGAGCGGAGGCGCAACAGCATCAATCGG AACTTCGTCGGGGACTACCTGGGGCTGGAGGAGCGGCCCGAGCTGCGTCAGTTCCTGGGCAAGAAGGAGC GGGTGGACTTCGCCGATTCGGTCACCAAGTACGACCGCCGCTTCAAGCCCATCAAGCGGGACTTGATCCT GACGCCCAAGTGTGTGTATGTGATTGGGCGAGAGAAGATGAAGAAGGGACCTGAGAAAGGTCCAGTGTGT GAAATCTTGAAGAAGAAATTGGACATCCAGGCTCTGCGGGGGGTCTCCCTCAGCACGCGACAGGACGACT TCTTCATCCTCCAAGAGGATGCCGCCGACAGCTTCCTGGAGAGCGTCTTCAAGACCGAGTTTGTCAGCCT TCTGTGCAAGCGCTTCGAGGAGGCGACGCGGAGGCCCCTGCCCCTCACCTTCAGCGACACACTACAGTTT CGGGTGAAGAAGGAGGGCTGGGGCGGTGGCGGCACCCGCAGCGTCACCTTCTCCCGCGGCTTCGGCGACT TGGCAGTGCTCAAGGTTGGCGGTCGGACCCTCACGGTCAGCGTGGGCGATGGGCTGCCCAAGAACTCCAA GCCTACCGGAAAGGGATTGGCCAAGGGTAAACCTCGGAGGTCGTCCCAAGCCCCTACCCGGGCGGCCCCT 
GGCGCCCCCCAAGGCATGGATCGAAATGGGGCCCCCCTCTGCCCACAGGGGGGGGCCCCCTGCCCCCTGG AGAAATTCATTTGGCCCAGGGGGCACCCACAGGCCTCCCCGGCCCTCCGTCCACATCCCTGGGATGCCAG CAGACGACCCCGGGCACGTCCGCCCTCAGAGCACAACACAGAATTCCTCAACGTGCCTGACCAGGGGATG GCCGGCATGCAGAGGAAGCGCAGCGTGGGGCAACGGCCAGTGCCTGTGGGCCGACCCAAGCCCCAGCCTC GGACACATGGTCCCAGGTGCCGGGCCCTATACCAGTACGTGGGCCAAGATGTGGACGAGCTGAGCTTCAA CGTGAACGAGGTCATTGAGATCCTCATGGAAGATCCCTCGGGCTGGTGGAAGGGCCGGCTTCACGGCCAG GAGGGCCTTTTCCCAGGAAACTACGTGGAGAAGATCTGAGCTGGGCCCTGGGATACTGCCTTCTCTTTCG CCCGCCTATCTGCCTGCCGGCCTGGTGGGGAGCCAGGCCCTGCCAATGAAAGCCTCGTTTACCTGGGCTG CAATAGCCTAAAAGTCCAATCCTTTGGCCTCCAGTCCTTGCCCAGGCCCTGGGTCACCAGGTCACTGGTG CAGCCCCCGCCCCTGGGCCCTGGTTTTCCTCCAACATCACACCTGCTGCCCATTGTCCAAAACTGTGTGT GTCAAAGGGGACTAACAGCAGAATTTACCTCCCAACTGCCATGTGATTAAGAAATGGGTCTTGAGTCCTG TGCTGTTGGCAAAGTTCCAGGCACAGTTGGGGAGGGGGGGCCGGAATCCGC

Single Nucleotide Polymorphisms (SNPs)
An ontological perspective
Mutation
SNP
Change in the base sequence of DNA
Inherited or spontaneous
Primary Cause of a Disease or Disorder
Predisposes Carrier to Disease/Disorder
Confers Disease Resistance to Carrier
Effect of Base Change is Unknown

Single Nucleotide Polymorphisms (SNPs)
Typically, a SNP in a gene that encodes a drug metabolism enzyme will decrease the activity of the enzyme, thereby altering how well the body clears the drug.
The Area Under the Curve (AUC) is a common representation of drug metabolism kinetics.
A normal (“mock”) patient’s AUC (solid line, lower left) following a standard oral dose of warfarin shows the changes in drug plasma concentration over time. Warfarin is metabolized to 7-hydroxywarfarin by the oxidative metabolism enzyme 2C9, which is the primary mechanism for warfarin clearance. Two variant alleles have a reduced capability for metabolizing warfarin: CYP2C9*2 and CYP2C9*3, with 11% and 7% frequency in the Caucasian population, respectively. Patients who are homozygous for these variant alleles (i.e.
patients have two variant copies of the 2C9 gene) experience a 65% decrease in drug clearance rate [29] (dotted line, lower left). Note that the presence of a variant allele raises drug plasma concentrations above the minimum toxic concentration and markedly increases the risk of an adverse drug response.

Single Nucleotide Polymorphisms (SNPs)
Not every SNP in a CYP gene (a gene that encodes a P450 enzyme) reduces enzyme activity. There are examples of:
1. SNPs in the gene’s promoter region that increase or decrease gene expression levels, thereby altering the total amount of P450 enzyme in the liver.
2. SNPs in the CYP gene that do NOT have any effect on clearance rates for a particular drug.

Single Nucleotide Polymorphisms (SNPs)
Discovering SNPs and linking these to altered metabolism effects.
Biotechnology: DNA sequencing of a cohort of people (ethnicity is important).
A SNP in a CYP gene is discovered (i.e. an altered DNA sequence is found).
The population frequency of the new SNP is determined.
Molecular biology methods are used to express the altered P450 in a non-clinical model.
The effect of the SNP on enzyme activity is studied (in the test tube). Note that this is only useful for nonsynonymous SNPs.
Three cohorts of people are evaluated (normal, heterozygous, and homozygous for the allelic variant), each dosed with a known drug (substrate) in a classic pharmacokinetic study.
The effect of the SNP is reported and used as the rationale for additional studies with other known substrates. In this case, this may involve DNA studies in a cohort of patients already taking the drug who are experiencing altered efficacy or toxicity profiles.

Where do we get DNA sequence information?
DNA Sequencing Methods: conversion of biological/bioanalytical data into sequence information.
NOTE: There are automated, high-throughput sequencing centers that COMPLETELY automate (with robotics and information systems) DNA sequencing, preliminary identification, and publishing.
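The first discovery steps listed earlier (sequence a cohort, find an altered base, determine its population frequency) can be sketched in a few lines. This is only an illustration under stated assumptions: the sequences are already aligned and of equal length, the function name `find_snps` is hypothetical, the ten-person cohort is made up, and the two 22-base sequences are the normal and altered examples used later in these slides.

```python
from collections import Counter


def find_snps(reference, cohort):
    """Return {position: {base: frequency}} for positions that vary.

    cohort is a list of (copy_a, copy_b) pairs, one pair per person,
    because each person carries two copies of the gene.
    Positions are 1-based; all sequences are assumed pre-aligned.
    """
    chromosomes = [copy for person in cohort for copy in person]
    snps = {}
    for i, ref_base in enumerate(reference):
        counts = Counter(seq[i] for seq in chromosomes)
        if set(counts) != {ref_base}:          # some copy differs here
            total = sum(counts.values())
            snps[i + 1] = {base: n / total for base, n in counts.items()}
    return snps


NORMAL = "AGATGCTCGATAATGATCGCTA"
VARIANT = "AGATGCTCGAGAATGATCGCTA"

# Hypothetical cohort: 7 homozygous normal, 2 heterozygous, 1 homozygous variant.
cohort = [(NORMAL, NORMAL)] * 7 + [(NORMAL, VARIANT)] * 2 + [(VARIANT, VARIANT)]

result = find_snps(NORMAL, cohort)
print(result)
```

Note that the frequency is counted per gene copy (20 copies for 10 people), not per person.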
DNA Sequencing (old method)

5’-AAACCAGGCCGATAAGGTACTACACGAAAAAAA-3’
                          TTTTTTT

dATP dCTP dTTP dGTP + 32P-ddATP 32P-ddCTP 32P-ddTTP 32P-ddGTP

Step 1. Extend the complementary sequence using “free” nucleotides with limiting amounts of radioactive “terminating” nucleotides.
Step 2. Run the product out on an electrophoresis gel.
Step 3. Place the gel against radiographic film and develop.

Gel lanes: A G C T

AAACCAGGCCGATAAGGTACTACACGAAAAAAA
|||||||||||||||||||||||||||||||||
TTTGGTCCGGCTATTCCATGATGTGCTTTTTTT
 TTGGTCCGGCTATTCCATGATGTGCTTTTTTT
  TGGTCCGGCTATTCCATGATGTGCTTTTTTT
   GGTCCGGCTATTCCATGATGTGCTTTTTTT
    GTCCGGCTATTCCATGATGTGCTTTTTTT
     TCCGGCTATTCCATGATGTGCTTTTTTT
      CCGGCTATTCCATGATGTGCTTTTTTT
       CGGCTATTCCATGATGTGCTTTTTTT
        GGCTATTCCATGATGTGCTTTTTTT
         GCTATTCCATGATGTGCTTTTTTT
          CTATTCCATGATGTGCTTTTTTT
           TATTCCATGATGTGCTTTTTTT
            ATTCCATGATGTGCTTTTTTT

DNA Sequencing (new method)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAsequencing.html

DNA Sequencing – SNP Discovery

IUPAC code | Meaning
A | A
C | C
G | G
T | T
M | A or C
R | A or G
W | A or T
S | C or G
Y | C or T
K | G or T
V | A or C or G
H | A or C or T
D | A or G or T
B | C or G or T
N | G or A or T or C

IUPAC = International Union of Pure and Applied Chemistry

DNA sequencing can be used for the detection of known SNPs, but other more efficient, cost-effective, high-throughput biotechnology methods have been developed (and continue to be developed).

Basics of DNA Detection
The Key to DNA Detection is “Sequence-Specific Affinity”

5’-GTCATTGCCAA-3’
3’-CAGTAACGGTT-5’

“GC” content (base pairing) generally dictates the thermodynamics of complementary binding.
Tm = Melting Temperature

Basics of DNA Detection
“TARGET” is the fluorescence-labeled DNA derived from the patient.
“PROBE” is DNA attached to a fixed position.

Basics of DNA Detection
Three Major Methods of SNP Detection:
1) RFLP
2) Hybridization
3) Single-Base Extension
These biotechnology assays concatenate (A) a DNA sample preparation step, and (B) an analytical-instrument detection step.
Keep in mind that these SNP assays are aimed at KNOWN SNPs, and are developed to determine if the patient’s DNA sample is one of three states:
i) Homozygous normal
ii) Heterozygous (one normal, one altered base)
iii) Homozygous abnormal (both bases are altered)

Basics of DNA Detection
2 copies of every CYP gene

Homozygous (NORMAL)
…AGATGCTCGATAATGATCGCTA…
…TCTACGAGCTATTACTAGCGAT…

…AGATGCTCGATAATGATCGCTA…
…TCTACGAGCTATTACTAGCGAT…

Heterozygous
…AGATGCTCGATAATGATCGCTA…
…TCTACGAGCTATTACTAGCGAT…

…AGATGCTCGAGAATGATCGCTA…
…TCTACGAGCTCTTACTAGCGAT…

Homozygous (ABNORMAL)
…AGATGCTCGAGAATGATCGCTA…
…TCTACGAGCTCTTACTAGCGAT…

…AGATGCTCGAGAATGATCGCTA…
…TCTACGAGCTCTTACTAGCGAT…

We will use CYP2C9*3 (7% frequency in Caucasian population) for our examples…
What does a population frequency of 7% mean?
How many people (out of 1,000) would be heterozygous for CYP2C9*3? 70
How many people (out of 1,000) would be homozygous for CYP2C9*3? 5
How many people (out of 1,000) would be at risk for decreased CYP2C9 activity (*2 = 11%; *3 = 7%)?
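One common way to explore questions like these is the Hardy-Weinberg model, which is an assumption introduced here and not something the slides state. The sketch below treats 7% as an allele frequency (the fraction of gene copies that are *3) and assumes random mating; under that reading the expected number of homozygotes per 1,000 is about 5, while the expected number of heterozygotes comes out higher than it would if 7% were read as the fraction of people carrying the variant. The function names are hypothetical, and "at risk" is taken to mean carrying at least one *2 or *3 copy.

```python
def hardy_weinberg(q, population=1000):
    """Expected genotype counts for a variant allele of frequency q.

    Assumes Hardy-Weinberg proportions: p^2, 2pq, q^2 with p = 1 - q.
    """
    p = 1.0 - q
    return {
        "homozygous_normal": p * p * population,
        "heterozygous": 2 * p * q * population,
        "homozygous_variant": q * q * population,
    }


def carriers_of_any_variant(variant_freqs, population=1000):
    """Expected number of people with at least one variant copy.

    Everyone except those with two normal copies is counted.
    """
    normal = 1.0 - sum(variant_freqs)
    return (1.0 - normal * normal) * population


star3 = hardy_weinberg(0.07)
at_risk = carriers_of_any_variant([0.11, 0.07])

for genotype, count in star3.items():
    print(f"{genotype}: {count:.1f} per 1,000")
print(f"carrying *2 or *3: {at_risk:.1f} per 1,000")
```

In the two-allele call for *3, every non-*3 copy (including *2) is lumped into the "normal" class, so the numbers answer only the *3 questions.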
CYP Family | Allele | Nucleotide Change | Enzyme Activity Change | Associated Drug Concentration Change
1A2 | CYP1A2*1C | -3860 G>C | Decreases | Increases
2C9 | CYP2C9*3A | 1075 A>C | Decreases | Increases
3A4 | CYP3A4*18A | 878 T>C | Increases | Decreases

>gi|13699817|ref|NM_000771.2| Homo sapiens cytochrome P450, family 2, subfamily C, polypeptide 9 (CYP2C9), mRNA
ATGGATTCTCTTGTGGTCCTTGTGCTCTGTCTCTCATGTTTGCTTCTCCTTTCACTCTGGAGACAGAGCT CTGGGAGAGGAAAACTCCCTCCTGGCCCCACTCCTCTCCCAGTGATTGGAAATATCCTACAGATAGGTAT TAAGGACATCAGCAAATCCTTAACCAATCTCTCAAAGGTCTATGGCCCGGTGTTCACTCTGTATTTTGGC CTGAAACCCATAGTGGTGCTGCATGGATATGAAGCAGTGAAGGAAGCCCTGATTGATCTTGGAGAGGAGT TTTCTGGAAGAGGCATTTTCCCACTGGCTGAAAGAGCTAACAGAGGATTTGGAATTGTTTTCAGCAATGG AAAGAAATGGAAGGAGATCCGGCGTTTCTCCCTCATGACGCTGCGGAATTTTGGGATGGGGAAGAGGAGC ATTGAGGACCGTGTTCAAGAGGAAGCCCGCTGCCTTGTGGAGGAGTTGAGAAAAACCAAGGCCTCACCCT GTGATCCCACTTTCATCCTGGGCTGTGCTCCCTGCAATGTGATCTGCTCCATTATTTTCCATAAACGTTT TGATTATAAAGATCAGCAATTTCTTAACTTAATGGAAAAGTTGAATGAAAACATCAAGATTTTGAGCAGC CCCTGGATCCAGATCTGCAATAATTTTTCTCCTATCATTGATTACTTCCCGGGAACTCACAACAAATTAC TTAAAAACGTTGCTTTTATGAAAAGTTATATTTTGGAAAAAGTAAAAGAACACCAAGAATCAATGGACAT GAACAACCCTCAGGACTTTATTGATTGCTTCCTGATGAAAATGGAGAAGGAAAAGCACAACCAACCATCT GAATTTACTATTGAAAGCTTGGAAAACACTGCAGTTGACTTGTTTGGAGCTGGGACAGAGACGACAAGCA CAACCCTGAGATATGCTCTCCTTCTCCTGCTGAAGCACCCAGAGGTCACAGCTAAAGTCCAGGAAGAGAT TGAACGTGTGATTGGCAGAAACCGGAGCCCCTGCATGCAAGACAGGAGCCACATGCCCTACACAGATGCT GTGGTGCACGAGGTCCAGAGGTACATTGACCTTCTCCCCACCAGCCTGCCCCATGCAGTGACCTGTGACA TTAAATTCAGAAACTATCTCATTCCCAAGGGCACAACCATATTAATTTCCCTGACTTCTGTGCTACATGA CAACAAAGAATTTCCCAACCCAGAGATGTTTGACCCTCATCACTTTCTGGATGAAGGTGGCAATTTTAAG AAAAGTAAATACTTCATGCCTTTCTCAGCAGGAAAACGGATTTGTGTGGGAGAAGCCCTGGCCGGCATGG AGCTGTTTTTATTCCTGACCTCCATTTTACAGAACTTTAACCTGAAATCTCTGGTTGACCCAAAGAACCT TGACACCACTCCAGTTGTCAATGGATTTGCCTCTGTGCCGCCCTTCTACCAGCTGTGCTTCATTCCTGTC TGAAGAAGAGCAGATGGCCTGGCTGCTGCTGTGCAGTCCCTGCAGCTCTCTTTCCTCTGGGGCATTATCC ATCTTTGCACTATCTGTAATGCCTTTTCTCACCTGTCATCTCACATTTTCCCTTCCCTGAAGATCTAGTG
AACATTCGACCTCCATTACGGAGAGTTTCCTATGTTTCACTGTGCAAATATATCTGCTATTCTCCATACT CTGTAACAGTTGCATTGACTGTCACATAATGCTCATACTTATCTAATGTAGAGTATTAATATGTTATTAT TAAATAGAGAAATATGATTTGTGTATTATAATTCAAAGGCATTTCTTTTCTGCATGATCTAAATAAAAAG CATTATTATTTGCTG

Nonsynonymous mutations in CYP2C9 with functional effects

Allele | Nucleotide change in cDNA | Amino acid change | Enzymatic activity
CYP2C9*2 | 430C>T | Arg144Cys | Decrease: an approximately 50% decrease of the maximum rate of metabolism (Vmax) and 30–50% lower turnover (kcat) of S-warfarin
CYP2C9*3 | 1075A>C | Ile359Leu | Decrease: a markedly higher Km and lower intrinsic clearance with an approximately 90% decrease of S-warfarin
CYP2C9*4 | 1076T>C | Ile359Thr | Decrease: 72–81% reduction of intrinsic clearance of diclofenac
CYP2C9*5 | 1080C>G | Asp360Glu | Decrease: intrinsic clearance of warfarin approximately 10% of wild type
CYP2C9*6 | del818A | Frame shift | Null
CYP2C9*8 | 449G>A | Arg150His | Increase: more than two-fold increase in the intrinsic clearance of tolbutamide
CYP2C9*11 | 1003C>T | Arg335Trp | Decrease: a three-fold increase in the Km and more than a two-fold decrease in the intrinsic clearance of tolbutamide
CYP2C9*12 | 1465C>T | Pro489Ser | Decrease: a modest decrease in the Vmax and the intrinsic clearance of tolbutamide
CYP2C9*13 | 269T>C | Leu90Pro | Decrease: decreased activity toward all studied CYP2C9 substrates
CYP2C9*14 | 374G>A | Arg125His | Decrease: 80–90% lower catalytic activity toward tolbutamide
CYP2C9*15 | 485C>A | Ser162X | Null
CYP2C9*16 | 895A>G | Thr299Ala | Decrease: 80–90% lower catalytic activity toward tolbutamide
CYP2C9*17 | 1144C>T | Pro382Ser | Decrease: modest 30 to 40% decrease in catalytic activity toward tolbutamide
CYP2C9*19 | 1362G>C | Gln454His | Decrease: modest 30 to 40% decrease in catalytic activity toward tolbutamide

Only nonsynonymous mutations with demonstrated functional activity are listed. Those whose functional activity has not been examined are not listed.
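The link between the "Nucleotide change in cDNA" and "Amino acid change" columns is simple arithmetic: cDNA position n falls in codon (n - 1) div 3 + 1. The sketch below works through three entries from the table, reading the reference codons (CGT at positions 430 to 432, ATT at 1075 to 1077) from the CYP2C9 mRNA record above, which starts at the ATG. The helper names are hypothetical, the codon table holds only the handful of standard-code entries needed here, and the 1077T>C change is a made-up example included solely to show a synonymous result.

```python
# Subset of the standard genetic code: only the codons used below.
AMINO = {
    "CGT": "Arg", "TGT": "Cys",
    "ATT": "Ile", "CTT": "Leu", "ACT": "Thr", "ATC": "Ile",
}


def codon_of(cdna_position):
    """Map a 1-based cDNA position to (codon number, offset 0..2 in codon)."""
    index = cdna_position - 1
    return index // 3 + 1, index % 3


def describe(cdna_position, ref_codon, new_base):
    """Return (amino acid change, 'synonymous' or 'nonsynonymous')."""
    number, offset = codon_of(cdna_position)
    alt_codon = ref_codon[:offset] + new_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = AMINO[ref_codon], AMINO[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return f"{ref_aa}{number}{alt_aa}", kind


print(describe(430, "CGT", "T"))    # CYP2C9*2, 430C>T
print(describe(1075, "ATT", "C"))   # CYP2C9*3, 1075A>C
print(describe(1076, "ATT", "C"))   # CYP2C9*4, 1076T>C
print(describe(1077, "ATT", "C"))   # hypothetical third-base change
```

A full implementation would use the complete 64-codon table and handle stop codons, deletions, and frame shifts, none of which this sketch attempts.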
[Figure: Missense mutations with functional effects mapped in the crystal structure of human CYP2C9 protein bound with warfarin (PDB: 1OG5). S-warfarin and heme are shown in the skeleton model in pink and red, respectively. Amino acid residues are shown in the sphere mode with colors.]

Biotechnologies - PCR
Essentially all SNP detection methods use PCR (Polymerase Chain Reaction) as a “sample preparation” step to DRAMATICALLY INCREASE or AMPLIFY the small DNA region under investigation.
PCR is by far the most common DNA molecular biology technique, and is used for gene cloning, gene sequencing, and most DNA analysis methods, BUT it can ONLY be used in known genomic regions and models (i.e. the DNA sequence under investigation must already have been sequenced to use PCR).

PCR Concept: Amplification of a relatively short piece of DNA for manipulation or sequencing.
Driving phenomena of PCR: Heating and Cooling
Heating: Double-stranded DNA “comes apart” when heated to near boiling. This is also called “denaturing” or “melting”.
Cooling: Complementary DNA “comes together” when cooled. This is also called “renaturing”, “annealing” or “hybridizing”.
[Figure: Double-Stranded DNA becomes Single-Stranded DNA on HEATING and returns to Double-Stranded DNA on COOLING.]

Molecular Basis of PCR: Polymerase Activity
A Polymerase is an enzyme that synthesizes DNA.
1) DNA can ONLY be synthesized using the complementary strand!
2) Polymerases synthesize DNA in the 5’ to 3’ direction!

5’-GTCGATGTCTGATCAATTGGGCTGATCATGTCGATGATGCTAGAAT-3’
                                   3’-CTACGATCTTA-5’

5’-GTCGATGTCTGATCAATTGGGCTGATCATGTCGATGATGCTAGAAT-3’
                         ACTAGTACAGCTACTACGATCTTA-5’

PCR uses the following reagents to AMPLIFY sections of DNA…
1) DNA template
2) Polymerase
3) Free Nucleotides (which are incorporated during DNA synthesis)
4) PCR Primers

Primers are two short pieces of DNA (each with a unique sequence) that are complementary to the two different strands of the DNA template.
In line diagrams, the primers are drawn as arrows that point in the direction of DNA synthesis (toward the 3’ end).
[Figure: Double-Stranded DNA is separated into Single-Stranded DNA by HEATING; the PCR Primers (arrows) flank the section of the DNA template that will be amplified.]

Double-Stranded DNA
5’-GGATGGAACACTGGGGGGAGCCGATACCCAGGACAGGGCAGTCCTGGAGGCAACCGTTATCCACCTCAGGGAGGGGGTGGCTGGGGT-3’
3’-CCTACCTTGTGACCCCCCTCGGCTATGGGTCCTGTCCCGTCAGGACCTCCGTTGGCAATAGGTGGAGTCCCTCCCCCACCGACCCCA-5’

HEAT (95ºC, 30 seconds): Single-Stranded DNA
5’-GGATGGAACACTGGGGGGAGCCGATACCCAGGACAGGGCAGTCCTGGAGGCAACCGTTATCCACCTCAGGGAGGGGGTGGCTGGGGT-3’

3’-CCTACCTTGTGACCCCCCTCGGCTATGGGTCCTGTCCCGTCAGGACCTCCGTTGGCAATAGGTGGAGTCCCTCCCCCACCGACCCCA-5’

COOL (60ºC, 30 seconds): PCR Primer Annealing
5’-GGATGGAACACTGGGGGGAGCCGATACCCAGGACAGGGCAGTCCTGGAGGCAACCGTTATCCACCTCAGGGAGGGGGTGGCTGGGGT-3’
                                                                    3’-CCCTCCCCCACCGACCCCA-5’

5’-GGATGGAACACTGGGGGGA-3’
3’-CCTACCTTGTGACCCCCCTCGGCTATGGGTCCTGTCCCGTCAGGACCTCCGTTGGCAATAGGTGGAGTCCCTCCCCCACCGACCCCA-5’

HEAT (72ºC, 30 seconds): Polymerase Elongation
5’-GGATGGAACACTGGGGGGAGCCGATACCCAGGACAGGGCAGTCCTGGAGGCAACCGTTATCCACCTCAGGGAGGGGGTGGCTGGGGT-3’
                    CTCGGCTATGGGTCCTGTCCCGTCAGGACCTCCGTTGGCAATAGGTGGAGTCCCTCCCCCACCGACCCCA-5’

5’-GGATGGAACACTGGGGGGAGCCGATACCCAGGACAGGGCAGTCCTGGAGGCAACCGTTATCCACCTCAGGGA
3’-CCTACCTTGTGACCCCCCTCGGCTATGGGTCCTGTCCCGTCAGGACCTCCGTTGGCAATAGGTGGAGTCCCTCCCCCACCGACCCCA-5’

DNA Synthesis after 1 “cycle” of PCR = 1 double-stranded DNA is now 2 “copies”
5’-GGATGGAACACTGGGGGGAGCCGATACCCAGGACAGGGCAGTCCTGGAGGCAACCGTTATCCACCTCAGGGAGGGGGTGGCTGGGGT-3’
3’-CCTACCTTGTGACCCCCCTCGGCTATGGGTCCTGTCCCGTCAGGACCTCCGTTGGCAATAGGTGGAGTCCCTCCCCCACCGACCCCA-5’

5’-GGATGGAACACTGGGGGGAGCCGATACCCAGGACAGGGCAGTCCTGGAGGCAACCGTTATCCACCTCAGGGAGGGGGTGGCTGGGGT-3’
3’-CCTACCTTGTGACCCCCCTCGGCTATGGGTCCTGTCCCGTCAGGACCTCCGTTGGCAATAGGTGGAGTCCCTCCCCCACCGACCCCA-5’

“THERMOCYCLING”
[Figure: Temperature vs. Time profile, repeated for each cycle.]
1) Denaturing Step: 95ºC, 30 sec.
2) Primer Annealing Step: 60ºC, 30 sec.
3) Elongation Step: 72ºC, 30 sec.

Most PCR applications use 30 cycles (2^30 ≈ 1.07 billion), representing an amplification of about 1 billion fold.

Basics of DNA Detection
Three Major Methods of SNP Detection:
1) RFLP
2) Hybridization
3) Single-Base Extension
These biotechnology assays concatenate (A) a DNA sample preparation step, and (B) an analytical-instrument detection step.
Keep in mind that these SNP assays are aimed at KNOWN SNPs, and are developed to determine if the patient’s DNA sample is one of three states:
i) Homozygous normal
ii) Heterozygous (one normal, one altered base)
iii) Homozygous abnormal (both bases are altered)

Biotechnologies - RFLP
Restriction Fragment Length Polymorphism (RFLP, sometimes called PCR-RFLP) is used to distinguish DNA samples by differences in their nucleotide sequences.
1) The DNA region that harbors the known SNP is amplified using PCR.
2) The PCR product (short double-stranded DNA) is treated (digested or cut) with a restriction enzyme, which cuts DNA at specific sequence sites.
3) The results of the restriction enzyme digestion are analyzed to determine the number and/or size of the resulting DNA strands.
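Steps 2 and 3 above can be sketched in silico. The example uses the KpnI recognition site (GGTACC) and the 32-base CYP2C9*1 and CYP2C9*3 excerpts given in the RFLP example that follows; the function names are hypothetical. Because it digests only that short excerpt rather than the 105 bp PCR product, the fragment sizes it reports (32, 17, 15) differ from the 105/85/20 bp bands on the slide, but the pattern of one, three, or two bands per genotype is the same.

```python
KPNI_SITE = "GGTACC"
KPNI_CUT_OFFSET = 5  # KpnI cuts after GGTAC, i.e. GGTAC^C


def digest(seq, site=KPNI_SITE, cut_offset=KPNI_CUT_OFFSET):
    """Cut one strand at every occurrence of the site; return fragments."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def band_sizes(copy_a, copy_b):
    """Distinct fragment lengths seen on a gel for a person's two gene copies."""
    sizes = set()
    for copy in (copy_a, copy_b):
        sizes.update(len(fragment) for fragment in digest(copy))
    return sorted(sizes, reverse=True)


STAR1 = "GAGGTCCAGAGGTACATTGACCTTCTCCCCAC"  # CYP2C9*1 excerpt (no KpnI site)
STAR3 = "GAGGTCCAGAGGTACCTTGACCTTCTCCCCAC"  # CYP2C9*3 excerpt (A>C creates GGTACC)

for name, genotype in [("*1/*1", (STAR1, STAR1)),
                       ("*1/*3", (STAR1, STAR3)),
                       ("*3/*3", (STAR3, STAR3))]:
    print(name, band_sizes(*genotype))
```

Only the top strand is scanned, which is sufficient here because GGTACC reads the same on both strands.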
[Figure: RFLP workflow, steps 1 and 2 (Restriction Enzyme Digestion).]

Biotechnologies - RFLP
Using CYP2C9*3 (7% frequency in Caucasian population)…
>gi|13699817|ref|NM_000771.2| Homo sapiens cytochrome P450, family 2, subfamily C, polypeptide 9 (CYP2C9), mRNA
(Same sequence as the NM_000771.2 record listed earlier.)

Biotechnologies - RFLP
CYP2C9*1  GAGGTCCAGAGGTACATTGACCTTCTCCCCAC
CYP2C9*3  GAGGTCCAGAGGTACCTTGACCTTCTCCCCAC
Restriction Enzyme: Kpn I, which cuts at GGTACC

Biotechnologies - RFLP
PCR product = 105 base pairs, which spans the variant site.
After KpnI digestion…

Genotype | Fragments | # of DNA Fragments
CYP2C9*1/*1 | 105 bp | 1
CYP2C9*1/*3 | 105 bp + 85 bp + 20 bp | 3
CYP2C9*3/*3 | 85 bp + 20 bp | 2

Biotechnologies - Hybridization
In a hybridization-based SNP assay, the difference in DNA sequence must be sufficient to disrupt “natural” double-stranded renaturing / annealing / hybridization. This is accomplished by using relatively short DNA “capture probes”.
In long strands of DNA (>30 bp), a single mismatched base pair is NOT sufficient to disrupt the formation of a double-stranded DNA “hybrid”.

…TAGTCGCTAGATGATCG…
…ATCAGCGAGCTACTAGC…

Note: This is NOT a SNP! It is just an example of double-stranded DNA with a mismatched base pair.

Biotechnologies – Hybridization
DNA Microarray Technology
1) PCR is used to generate a short DNA strand that harbors the variant position.
2) PCR uses a “primer” with a fluorescent “tag” for detection.
3) PCR products are “hybridized” to the microarray surface, then analyzed.
[Figure: PCR Primers (arrows) flank the section of the DNA template that will be amplified.]

Biotechnologies – Hybridization
DNA Microarray Technology
[Figure: Fluoro-PCR product with the SNP location marked; Microarray = 1” x 3” glass slide.]
These 2 “spots” each contain a different short DNA strand that is “complementary” to CYP2C9*1 or CYP2C9*2.
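The mismatched duplex shown in the hybridization slide above can be checked programmatically, which also makes the argument for short capture probes concrete: one mismatch is a much larger share of a short duplex than of a long one. This is a rough sketch only. The function names and the probe lengths in the loop are hypothetical, and the mismatch fraction is a crude proxy, not a thermodynamic model of duplex stability.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(top, bottom):
    """1-based positions where bottom is not the complement of top.

    top is written 5'->3'; bottom is written 3'->5' directly beneath it,
    exactly as the two strands are drawn on the slide.
    """
    return [i + 1
            for i, (t, b) in enumerate(zip(top, bottom))
            if PAIR[t] != b]


def mismatch_fraction(length, mismatches=1):
    """Share of base pairs in a duplex that are mismatched."""
    return mismatches / length


TOP = "TAGTCGCTAGATGATCG"
BOTTOM = "ATCAGCGAGCTACTAGC"

positions = mismatch_positions(TOP, BOTTOM)
print("mismatch at position(s):", positions)

# Hypothetical duplex lengths, from a short capture probe to a long strand.
for length in (len(TOP), 30, 100):
    print(f"{length} bp duplex: {mismatch_fraction(length):.1%} mismatched")
```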