Role of Genetic Polymorphisms in Responses to Toxic Agents
• Definitions
• “Forward genetics” and toxicology
• “Reverse genetics” and toxicology
• Genetic markers
• SNPs and their use in toxicology
• Ethical, Legal and Social Issues (ELSI)

“Toxicology is concerned with the interaction between xenobiotics and biological molecules directly or indirectly coded in the DNA, and can be regarded as a branch of GENETICS.” Michael F.W. Festing (2001)

Gregor Mendel (1822 – 1884)

TERMINOLOGY
Gene: A sequence of DNA bases that encodes a protein
Allele: One of the alternative forms of a DNA sequence at a given locus
Locus: Physical location of an allele on a chromosome
Linkage: Proximity of two alleles on a chromosome
Marker: An allele of known position on a chromosome
Distance: Number of base pairs between two alleles
centiMorgan: Probabilistic distance between two alleles
Phenotype: An outward, observable character (trait)
Genotype: The internally coded, inheritable information
Penetrance: No. of individuals with the phenotype / No. of individuals carrying the allele
Modified from M.F. Ramoni, Harvard Medical School

The 80s Revolution and the Human Genome Project
Genetic polymorphisms: naturally occurring DNA markers that identify regions of the genome and vary among individuals
The intuition that polymorphisms could be used as markers sparked the revolution.
On February 12, 2001 the Human Genome Project announced the completion of a first draft of the human genome and declared: “A SNP map promises to revolutionize both mapping diseases and tracing human history”
SNPs are Single Nucleotide Polymorphisms: subtle variations of the human genome across individuals
Modified from M.F.
Ramoni, Harvard Medical School

DISTANCES ON A GENETIC MAP
• Physical distances between alleles are measured in base pairs
• But the recombination frequency is not constant along the chromosome
• A useful measure of distance is based on the probability of recombination: the Morgan
• A distance of 1 centiMorgan (cM) between two alleles means that they have a 1% chance of being separated by recombination in each meiosis
• In humans, a genetic distance of 1 cM is roughly equal to a physical distance of 1 million base pairs (1 Mb)
Modified from M.F. Ramoni, Harvard Medical School

MORE TERMINOLOGY
Physical maps: maps in base pairs
Human physical map: 3000 Mb (megabases)
Genetic maps: maps in centiMorgans
Human male map length: 2851 cM
Human female map length: 4296 cM
Correspondence between maps: male cM ~ 1.05 Mb; female cM ~ 0.70 Mb
Modified from M.F. Ramoni, Harvard Medical School

Simple and Complex Traits
Single-gene (Mendelian) diseases:
– Autosomal dominant (Huntington)
– Autosomal recessive (cystic fibrosis)
– X-linked dominant (Rett)
– X-linked recessive (Lesch-Nyhan)
Today, over 400 single-gene diseases have been identified
Problem: traits don’t always follow single-gene models
Complex trait: phenotype/genotype interaction
– Multiple cause: multiple genes at several loci determine a phenotype in conjunction with non-genetic factors (accidents of development, social factors, environment, infections, other factors)
– Multiple effect: one gene causes more than one phenotype
Modified from M.F. Ramoni, Harvard Medical School

Genetic Markers
Even though we share most of our DNA, there are variations (polymorphisms).
Polymorphic: two or more forms of the same gene or genetic marker exist, each too common in a population to be attributable merely to a new mutation
Classes of polymorphic genetic markers:
Single Nucleotide Polymorphisms (SNPs): single-base differences in a population
Microsatellites: short tandem repeats (e.g.
GATA, 2 – 6 bp long)
Minisatellites: simple sequence repeats (10 – 40 bp long)
Variable Number of Tandem Repeats (VNTRs): the number of repeats may vary
Restriction Fragment Length Polymorphisms (RFLPs): presence/absence of a restriction site
Deletions, Duplications, Insertions: alterations at the chromosome level
Complex haplotypes: combinations of the above

Genetic Markers
Coding: Single Nucleotide Polymorphisms; Restriction Fragment Length Polymorphisms; Deletions, Duplications, Insertions
Non-coding: Microsatellites; Minisatellites; Variable Number of Tandem Repeats; Restriction Fragment Length Polymorphisms; Single Nucleotide Polymorphisms; Deletions, Duplications, Insertions

Genetic Markers
• Polymorphisms (allelic variations) are essential to:
– Study inheritance patterns
– Map phenotypes and anchor genes to the genetic map by cosegregation analysis
– Determine change in function: resistant/sensitive populations
• Genetically determined variability among humans is due to differences in only about 0.1% of the genomic sequence!
• Polymorphisms can be silent, or be exhibited at the level of:
– Morphology
– Protein
– DNA

Chromosomal rearrangements: Deletions, Duplications, Insertions
Deletions: a part is lost, for example abc → ac
Insertions: a part is added, for example ac → abc
Duplications: can be tandem, for example abc → abbc, or not, for example abc → abcabc
Reversals: a part is turned around, head to tail, for example abc → cba
Transpositions: two parts change places, for example abcd → acbd
Figure: insertion and deletion

Copy Number Variability (CNVs)
• CNVs are DNA segments of 1 kb or larger that are present in a variable number of copies compared with a reference genome. CNVs can have dramatic phenotypic consequences by altering gene dosage, disrupting coding sequences, or perturbing long-range gene regulation.
• There are several well-known examples of CNV, including CYP2A6, CYP2D6, GSTM1, GSTT1, SULT1A1, SULT1A3, UGT2B17, and also the nearby UGT2B7, UGT2B10 and UGT2B11 genes.
All these genes are deleted at a relatively high frequency in at least one ethnic group. In addition, CYP2A6, CYP2D6, SULT1A1 and SULT1A3 can also present duplications and even multiduplications.
Pharmacology & Therapeutics 116, Issue 3, 2007, Pages 496–526

Minisatellites
• Original DNA fingerprinting technique
• Relies on stretches of tandemly repeated sequences (usually 15 - 100 bp)
• Alleles show high variability in the number of repeats
Genotyping using minisatellites:
• Digest genomic DNA
• Run out on a gel
• Southern blot and probe with radiolabelled repeat DNA
• Each individual shows a set of bands unique to them, although each band is shared with one of their parents

Microsatellites
• Number of repeats varies greatly between individuals
• Make up to 10-15% of the mammalian genome
• Believed to have no function
• Have high mutation rates
• Used in forensic analysis
• Can be amplified by PCR: the fragments generated differ in length because they contain different numbers of repeats
Microsatellites are highly polymorphic because of strand “slippage” during DNA replication

Restriction Fragment Length Polymorphisms (RFLPs)
• Consider two alleles having slightly different sequences:
Allele 1: GAATTC / CTTAAG
Allele 2: GCATTC / CGTAAG
EcoRI (recognition site GAATTC) will cut the first allele but not the second

Single nucleotide polymorphisms (SNPs)
Variations of a single base between individuals:
• The most common form of genetic variation in humans
• Thought to be a major cause of differences among individuals in drug response, disease susceptibility...
• By convention, the less common allele must occur in at least 1% of the population
• Occur roughly every 500-1000 bp
• About 50,000 – 100,000 SNPs in coding sequences
SNPs may occur in coding or regulatory regions:
cSNP: SNP occurring in a coding region
rSNP: SNP occurring in a regulatory region
sSNP: coding SNP with no change in the amino acid (synonymous)
Modified from M.F.
Ramoni, Harvard Medical School

Single nucleotide polymorphisms (SNPs)
• Two bases at each locus, one on each homologous chromosome
• Although any of the four bases could in principle occur at a position, almost all SNPs are biallelic: only two alternative bases are observed
• A SNP is therefore a variable with two states:
Major allele: the more frequent allele
Minor allele: the less frequent allele
• For each polymorphic locus, an individual can be:
Homozygous for the major allele
Heterozygous (major/minor allele)
Homozygous for the minor allele

The role of SNP analysis through all stages of drug development
Target Identification: disease association studies identify SNPs in candidate genes. The proteins encoded by such genes may represent novel drug targets.
Target Validation: population analysis determines the level of variation within a candidate gene. Several SNPs in one gene generate a large number of potential target variants, and such candidates can be eliminated.
Lead Identification: screens can be developed to identify lead compounds that interact with each variant of the drug target.
Lead Validation: biological assays can be performed that incorporate different lead compounds and all variants of the target protein.
Lead Optimization: knowledge of polymorphisms affecting the target can be used to develop drugs that work efficiently over a broader group of patients, or to identify drugs that work more efficiently in specific genotypes.
Preclinical Testing: animal models can be developed incorporating all known variants of the target to provide more accurate predictions of drug efficacy in humans.
Clinical Trials: trials can be carried out with groups of patients selected on the basis of genotype, to specifically test for adverse drug reactions at particular doses.

SNP discovery and SNP genotyping
SNP discovery: detection of novel polymorphisms
• DNA sequencing
• In silico: comparing the sequences of genomic clones or ESTs deposited in public and proprietary databases
• Single-strand conformation polymorphism analysis
SNP genotyping:
identification of specific alleles at a known polymorphism
1. Allele discrimination: allele-specific PCR, allele-specific single-base primer extension (minisequencing), allele-specific ligation, allele-specific enzymatic cleavage, etc.
2. Detection of the allele(s) of interest in a given DNA sample: fluorescence detection, fluorescence resonance energy transfer, fluorescence polarization, mass spectrometry, etc.
See details in: Twyman RM & Primrose SB, Techniques patents for SNP genotyping. Pharmacogenomics 4:67-79 (2003)

Toxicology ≈ Genetics
There is substantial polymorphism in genes that determine the response to xenobiotics, both in humans and in animals. This has important implications for toxicology and pharmacology:
• adverse drug reactions cause thousands of deaths each year, and many of them are associated with susceptible phenotypes
• are we protecting the most sensitive members of the human population when occupational/environmental exposure limits are established?
• how should strain differences in susceptibility be accounted for in animal studies (1000-fold differences have been reported for the TCDD LD50 in rats)?
• genotyping individuals from DNA in a blood sample is becoming increasingly easy, so it is possible to genotype people for loci that are thought to control susceptibility to certain drugs/xenobiotics
Adapted, in part, from M.F.W. Festing, Tox. Lett.
120:293-300 (2001)

…loci that are thought to control susceptibility to certain drugs/xenobiotics:
Before we can correctly interpret genotyping results we need to:
• gain a much better understanding of the genetics of susceptibility
• know the mode of action of xenobiotics
Problem: relatively little research is done on the genetics of susceptibility, and toxicologists in general seem to be unaware of the extent of genetic variation in response among the experimental animals they use
Problem: the modes of action of the overwhelming majority of established toxic substances are still largely unknown (to say nothing of the many compounds now being developed)
Adapted, in part, from M.F.W. Festing, Tox. Lett. 120:293-300 (2001)

Genotype-Phenotype Interactions in Complex Biological Systems
Figure labels: Age, Environment (adapted from: Huang, 2002)

“The classical interaction of exposure with phase I and phase II XME metabolism, and risk of developing cancer. High exposure to a foreign chemical, combined with rapid metabolic activation and slow conjugation, should put an individual at a high risk of developing cancer. Low or negligible exposure, in combination with slow rates of activation and rapid rates of conjugation, should lead to a low risk of developing environmentally caused cancer.” (XME: xenobiotic-metabolizing enzyme)
Figure: aromatic amines and heterocyclic amines → N-oxidation → O-acetylation → reactive metabolites (acetoxy derivatives) → cancer; rapid, intermediate and slow acetylators
From: Hulla et al. Toxicol. Sci. (1999)

NAT1 and NAT2: J Cancer Res Clin Oncol. 2011 Nov;137(11):1661-7; Lung Cancer. 2011 Aug;73(2):153-7.
Figure: Survival in women with epithelial ovarian cancer
From: Introduction to Biochemical Toxicology 3rd Edition (2001) p. 128
From: Strange et al. Toxicol. Lett.
(2000)
• Several GST gene families have been identified
• Individuals with null genotypes are detoxification-deficient and more likely to suffer formation of carcinogen-DNA adducts and/or mutations
• In general, GSTM1-null and GSTT1-null individuals are considered high-risk

Genetics in Toxicology
“Forward Genetics”: Phenotype (e.g., toxic symptoms, cancer) → genes that control susceptibility/resistance → studying mechanisms of action
“Reverse Genetics”: Genotype (gene knockout, polymorphism, etc.) → phenotype → studying mechanisms of action
Adapted, in part, from M.F.W. Festing, Tox. Lett. 120:293-300 (2001)

“Forward Genetics” and Toxicology
Different animal strains nearly always respond differently to the same agent/dose, unless the toxic insult is so dramatic that all the animals die very quickly.
Examples of strain differences (rats) in response to xenobiotics:
3,2’-dimethyl-4-aminobiphenyl, prostate tumors: 48% F344, 41% ACI, 13% LEW, 7% CD, 0% Wistar
N-methyl-N’-nitro-N-nitrosoguanidine (MNNG), stomach adenocarcinomas: 67% WKY, 60% S-D, 53% LEW, 23% Wistar, 6% F344
There is no such thing as an “animal strain that is particularly susceptible/resistant to carcinogenesis”!
Adapted, in part, from M.F.W. Festing, Tox. Lett. 120:293-300 (2001)

Current Approach
Figure: animal studies in CD-1 mice (pharmaceutical industry) and B6C3F1 mice (National Toxicology Program) give a single genome-based risk prediction for a genetically diverse human population

“Forward Genetics” and Toxicology
Designing an IDEAL “forward genetics” animal study for investigating genetic variability in response to a toxic agent:
• Survey the known facts about susceptibility in different strains of rodents
• Small numbers of animals (4-6 per strain) of several strains should be used to characterize the response to the toxic agent “X”
• At least 5 strains should be studied
• Dose levels should be selected to elicit a suitable response
• Endpoints should be quantitative (e.g., number of tumors)
Adapted, in part, from M.F.W. Festing, Tox. Lett.
120:293-300 (2001)

Parental strains and derivation of five major types of mouse genetic resources
Each of the sequenced strains is shown in a different color depending on its origin. The four wild-derived strains, denoted by asterisks, are CAST/EiJ (M. m. castaneus) in red, PWD/PhJ (M. m. musculus) in blue, MOLF/EiJ (M. m. molossinus) in purple, and WSB/EiJ (M. m. domesticus) in green. The remaining 12 classical laboratory strains are shown in green, reflecting the predominant contribution of the M. m. domesticus subspecies to these strains. The shade of green denotes the different origins of the classical strains: darker shades denote strains of Swiss origin (FVB/NJ and NOD/LtJ), yellow-green denotes a strain of Asian origin (KK/HlJ), and an intermediate shade denotes Castle or C57-related strains (129S1/SvImJ, A/J, AKR/J, BALB/cBy, C3H/HeJ, DBA/2J, BTBR T+tf/J, and NZW/LacJ). The figure also shows schematically the derivation process for five types of resources: recombinant inbred lines (BXD), chromosome substitution strains (B.P), the Collaborative Cross (CC), heterogeneous stocks (Northport HS), and the laboratory strain diversity panel (LSDP).
Mamm Genome. 2007 July; 18(6): 473–481

Recombinant inbred strains (RIs)
Figure: fully inbred female C57BL/6J (B) × male DBA/2J (D) → isogenic F1 → heterogeneous F2 → 20 generations of brother-sister matings → inbred, isogenic siblings in each strain of the BXD RI set (BXD1, BXD2, …, BXD80). Recombined chromosomes are needed for mapping. Image credit: genenetwork.org
• Once susceptible and resistant strains have been identified, the loci responsible can be mapped
• In mice, recombinant inbred strains (susceptible × resistant) can be generated
• A set of RI strains can be tested for susceptibility to agent “X”
• Once the phenotypes have been established, the mice can be genotyped to determine which loci cosegregate with susceptibility/resistance
From Zhou et al.
(2005)
Problems:
• a large number of animals is needed (100-300, or more)
• the resolution of genetic mapping is only about ± 20 cM (the mouse genome has ~50K genes and ~1900 cM; 1 cM ≈ 2 Mb), so the identified locus can contain ~500 genes
Adapted, in part, from M.F.W. Festing, Tox. Lett. 120:293-300 (2001)

“Collaborative Cross”: The Resource for Forward Genetics Research
Images from Threadgill DW
Single strain, constant genotype: vary the environment (e.g., treatment: control, 10 mg, 25 mg, 50 mg, 100 mg)
Many strains (Strain 1 … Strain 7), varied genotype: fix the environment (same treatment), vary the genotype
Total mouse SNPs = ~40M (M. m. musculus, M. m. domesticus, M. m. castaneus)
Total human SNPs = ~20M

Profiling Liver Toxicity to APAP (acetaminophen) in a Genetically Diverse Population
Figure panels: dose response to liver injury, ALT (24 h); multi-strain profiling of APAP-induced liver injury: % liver necrosis (24 h), reduced GSH (4 h), ALT (24 h), ALT (4 h); dose response to liver injury (4 h) vs survival (24 h)

“Reverse Genetics” and Toxicology
A knockout or over-expressor animal strain, or animals with known polymorphism(s) in important genetic regions → dose with a chemical(s) → evaluate the phenotype
Looks MUCH easier than a “Forward Genetics” experiment! Let’s do it!
Problems:
• if a mutant to non-mutant comparison is being made, the genetic backgrounds MUST be identical!
• if the strains have been crossed, care is needed to ensure that the observed differences are not due to a gene closely linked to the gene of interest
• genes do not act alone! Several loci may be important, and their effects can be additive or epistatic
Adapted, in part, from M.F.W. Festing, Tox. Lett.
120:293-300 (2001)

Figure: PPARα (+/+) + Wy-14,643 (11 months) vs PPARα (-/-) + Wy-14,643 (11 months)
Peters et al., Carcinogenesis, 1997

Peroxisome Proliferators: Species Differences
• Mouse and rat: highly responsive
• Marmoset: does not respond
• Guinea pig: no peroxisome proliferation, but hypolipidaemia
• Humans: believed to be unresponsive, but hypolipidaemia
• PPARα exists in mouse, rat, guinea pig and human
• In humans:
– Lower hepatic levels of PPARα
– Lower ligand binding activity
– Different structure (polymorphisms)
– Different PP response elements (PPREs) in DNA
– Presence of proteins competing for PPREs
– Expression of a dominant-negative form of PPARα
Figure: … across mouse inbred strains
Palmer et al., Molecular Pharmacology, 1998

Activation of PPARα in mouse and human hepatocytes
Figure: untreated and Wy-14,643-treated hepatocytes (6, 24 hr in culture); upregulated and downregulated genes; Gene Ontology and Gene Set Analysis
• Wy-14,643 treatment causes major changes in gene expression in human and mouse hepatocytes
• Limited overlap in the response to Wy-14,643 at the individual gene level, but major overlap at the pathway level

Well-studied genetic variants in human disease
From Taylor et al. Trends Mol Med 7:507-512 (2001)

Most drug-metabolizing enzymes exhibit clinically relevant genetic polymorphisms. Essentially all of the major human enzymes responsible for modification of functional groups [phase I reactions (left)] or conjugation with endogenous substituents [phase II reactions (right)] exhibit common polymorphisms at the genomic level; those enzyme polymorphisms that have already been associated with changes in drug effects are separated from the corresponding pie charts. The percentage of phase I and phase II metabolism of drugs that each enzyme contributes is estimated by the relative size of each section of the corresponding chart.
ADH, alcohol dehydrogenase; ALDH, aldehyde dehydrogenase; CYP, cytochrome P450; DPD, dihydropyrimidine dehydrogenase; NQO1, NADPH:quinone oxidoreductase or DT diaphorase; COMT, catechol O-methyltransferase; GST, glutathione S-transferase; HMT, histamine methyltransferase; NAT, N-acetyltransferase; STs, sulfotransferases; TPMT, thiopurine methyltransferase; UGTs, uridine 5'-diphosphate glucuronosyltransferases. From Evans WE and Relling MV Science 286:487 (1999).

Cytochrome P450 genotyping
From: Flockhart DA and Webb DJ. Lancet (1998)

FDA OKs Genetic Test Linked to Warfarin
Sep 17 2007
WASHINGTON (AP) - A genetic test that can reveal which patients are especially sensitive to the blood-thinner warfarin won federal approval Monday. Such screenings could prevent thousands of complications each year, health officials estimate. The approval of the test comes a month after warfarin, sold under the brand name Coumadin and in generic forms, became the first widely used drug to include genetic testing information on its label. The information can help doctors determine how best to prescribe the drug. An estimated one-third of patients process the drug differently than do most others, exposing them to a higher risk of bleeding. Research suggests that most of that sensitivity is due to variations in two genes. The new test, made by Nanosphere Inc. of Northbrook, Ill., can detect some of those variants. One of the genes produces an enzyme that helps the body metabolize warfarin and other medicines; the second produces the blood-clotting protein that warfarin blocks.

Human Cytochrome P450 2C9 with bound Warfarin
Nature 424, 464-468 (2003)
Image Source: www.pharmgkb.org

POPULATION-BASED GWAS AND TOXICOLOGY: DRUG-INDUCED ADVERSE EFFECT STUDIES
866,399 markers; 51 cases of flucloxacillin drug-induced liver injury (DILI); 282 matched controls
Daly et al. HLA-B*5701 genotype is a major determinant of drug-induced liver injury due to flucloxacillin. Nat Genet. 2009 Jul;41(7):816-9.
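The flucloxacillin study compares how often a marker is carried by 51 cases and by 282 matched controls, while testing 866,399 markers at once. The minimal Python sketch below shows that logic for a single marker. It rests on three simplifying assumptions: a carrier versus non-carrier 2×2 table, a Pearson chi-square test without continuity correction, and a Bonferroni genome-wide threshold. The function names and the carrier counts in the usage lines are hypothetical. This is not the analysis used by Daly et al.

```python
import math


def carrier_association_test(case_carriers, n_cases, control_carriers, n_controls):
    """Pearson chi-square (1 df, no continuity correction) and odds ratio
    for a carrier/non-carrier x case/control 2x2 table."""
    a = case_carriers                  # cases carrying the marker
    b = n_cases - case_carriers        # cases without the marker
    c = control_carriers               # controls carrying the marker
    d = n_controls - control_carriers  # controls without the marker
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the 2x2 table needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    return chi2, p_value, odds_ratio


def bonferroni_threshold(alpha, n_markers):
    """Per-marker level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_markers


if __name__ == "__main__":
    N_MARKERS, N_CASES, N_CONTROLS = 866_399, 51, 282  # study design figures from the slide
    threshold = bonferroni_threshold(0.05, N_MARKERS)   # about 5.8e-08
    # Hypothetical carrier counts, for illustration only (not the study's data).
    chi2, p, odds_ratio = carrier_association_test(40, N_CASES, 20, N_CONTROLS)
    print(f"threshold={threshold:.2e} chi2={chi2:.1f} p={p:.2e} OR={odds_ratio:.1f}")
    print("genome-wide significant" if p < threshold else "not significant")
```

Fisher's exact test or an allele-count test would be reasonable alternatives. The Bonferroni line shows why a genome-wide scan needs p-values far below the usual 0.05.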
The FDA Abacavir Warning (July 24, 2008)
Abacavir (marketed as Ziagen) and Abacavir-containing Medications
FDA reviewed data from two studies that support a recommendation for pre-therapy screening for the presence of the HLA-B*5701 allele and the selection of alternative therapy in positive subjects. Genetic tests for HLA-B*5701 are available and all patients should be screened for the HLA-B*5701 allele before starting or restarting treatment with abacavir or abacavir-containing medications. Development of clinically suspected abacavir HSR (hypersensitivity reaction) requires immediate and permanent discontinuation of abacavir therapy in all patients, including patients negative for HLA-B*5701.

Genomenewsnetworks.org: The genomes of more than 180 organisms have been sequenced since 1995.

Ultra High Throughput Sequencing – Towards the “$1,000 Genome”
Illumina® “SOLEXA” Genome Analyzer; Roche® 454 Genome Sequencer
Image credits: Seqanswer.com; Roche.com & Nature Biotechnology; Illumina.com
DNA sequencing, transcriptome analysis, gene regulation and control

Ultra High Throughput Sequencing – Enabling GWAS Studies
Image credit: amateurbrainsurgery.com
Genome-wide plots of available GWAS results for all associations with P ≤ 0.0001 (BMC Medical Genetics 2009)
Image credits: compgen.unc.edu; www.niehs.nih.gov/crg/; ornl.gov
From: Strange et al. Toxicol. Lett. (2000)

Toxicogenetics: what’s next?
Goal: once we find all polymorphisms in genes important for the metabolism/detoxification of xenobiotics, we can link them to particular drug or chemical toxicity and identify susceptible populations
Problem: simple research questions generate erroneous results (e.g., CYP2D6 polymorphisms and lung cancer, CYP2E1 polymorphisms and alcoholic liver disease)
Problem: biological complexity of mechanisms, ethnic variation, clinical heterogeneity, etc.: could both positive and negative results be true?
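Part of the first problem above is a multiple-testing effect. When many markers are tested, some will pass a threshold such as P = 0.0001 by chance alone. The minimal Python sketch below assumes that every marker is truly null and that the tests are independent. This is a simplification, because linked markers are correlated. The 500,000-marker scan in the usage lines is a hypothetical example.

```python
import math


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of truly null tests that reach P <= alpha."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests: int, alpha: float) -> float:
    """1 - (1 - alpha) ** n_tests for independent null tests, computed with
    log1p/expm1 so it stays accurate when alpha is tiny."""
    return -math.expm1(n_tests * math.log1p(-alpha))


if __name__ == "__main__":
    n_markers = 500_000  # hypothetical genome-wide scan
    for alpha in (1e-4, 0.05 / n_markers):
        print(f"alpha={alpha:.1e}: "
              f"expected false hits={expected_false_positives(n_markers, alpha):.2f}, "
              f"P(at least one)={prob_at_least_one_false_positive(n_markers, alpha):.3f}")
```

Under these assumptions, such a scan would be expected to produce about 50 chance hits at P ≤ 0.0001. At the Bonferroni level of 0.05/500,000, the probability of even one chance hit falls to about 5%.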
Linking complex trait diseases to genetic polymorphisms requires (Todd, 1999):
• large sample sizes and small p-values
• an initial study plus several replications
• genetic associations that make biological sense
• physiologically meaningful data supporting a functional role of the polymorphism in question

Ethical, Legal and Social Issues in toxicogenetics are as complex as the studies of polymorphisms themselves
http://genomics.unc.edu/articles/elsi_article.htm
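Todd's first requirement, large samples combined with small p-values, can be made concrete with a standard power calculation. The Python sketch below assumes a case-control comparison of carrier frequencies, analysed with a two-sided two-proportion z-test (normal approximation, no continuity correction, equal group sizes). The following inputs are illustrative assumptions, not values from the slides:
• 30% versus 20% carrier frequencies
• 80% power
• the conventional genome-wide level of 5 × 10⁻⁸

```python
import math
from statistics import NormalDist


def n_per_group(p_cases: float, p_controls: float, alpha: float, power: float = 0.8) -> int:
    """Subjects per group needed to detect p_cases != p_controls with a
    two-sided two-proportion z-test (normal approximation, equal groups)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    p_bar = (p_cases + p_controls) / 2    # pooled proportion under the null
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_cases * (1 - p_cases)
                                      + p_controls * (1 - p_controls)))
    return math.ceil(numerator ** 2 / (p_cases - p_controls) ** 2)


if __name__ == "__main__":
    for alpha in (0.05, 5e-8):  # single-test level vs conventional genome-wide level
        n = n_per_group(0.30, 0.20, alpha)
        print(f"alpha={alpha:g}: {n} cases and {n} controls")
```

Under these assumptions, the required group size rises from roughly 300 to roughly 1,500 when the threshold moves from 0.05 to 5 × 10⁻⁸. It grows further as the difference in carrier frequency shrinks.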