Lecture 26: Tests of Neutrality 2
April 17, 2014

Last Time
  Sequence data and quantification of variation
  Infinite sites model
  Nucleotide diversity (π)
  Sequence-based tests of neutrality
    Ewens-Watterson Test
    Tajima’s D

Today
  Hudson-Kreitman-Aguade Test
  Synonymous versus Nonsynonymous substitutions
  McDonald-Kreitman

Hudson-Kreitman-Aguade Test
  Under neutrality, divergence between species should be of the same magnitude as variation within species, because both depend on the neutral mutation rate
  Provides a correction factor for mutation rates at different sites
  Complex goodness of fit test
  Perform test for loci under selection and supposedly neutral loci

Hudson-Kreitman-Aguade (HKA) test (Hamilton 266)

                                        Adh      Control locus
  Polymorphism within species (S/m)     0.101    0.022
  Divergence between species (D/m)      0.056    0.052
  Ratio (within/between)                1.80     0.42
  χ²                                    6.09
  p-value                               0.016

[Figure: Teosinte, Maize, Maize w/TBR mutation. Image: http://www.nsf.gov/news/mmg/media/images/corn-and-teosinte_h1.jpg. Mauricio 2001; Nature Reviews Genetics 2, 376]

Problem 3. Files utr_mays.arp, utr_par.arp, exon_mays.arp, and exon_par.arp contain sequence data from the 5’ untranslated region and from an exon of the teosinte branched1 (tb1) gene of maize (Zea mays ssp. mays) and its most likely wild progenitor Zea mays ssp. parviglumis.

  File             Region of tb1    Subspecies
  utr_mays.arp     5’ UTR           mays
  utr_par.arp      5’ UTR           parviglumis
  exon_mays.arp    exon             mays
  exon_par.arp     exon             parviglumis

For each of these regions of tb1 and for each subspecies:
a) Use Arlequin to determine the number of segregating sites (S) and calculate the nucleotide diversity (π). What can you infer by comparing nucleotide diversity between the two subspecies for each region?
b) Use Arlequin to perform the tests of neutrality developed by Ewens-Watterson and Tajima. Interpret and discuss the results both statistically and biologically.
c) Interpret and discuss the results from the following 2 HKA tests:

Test A
                                     tb1 5’ untranslated region    Average of control loci
  Polymorphism within subspecies     0.00093                       0.01996
  Divergence between subspecies      0.05255                       0.02242
  χ²                                 13.58
  p-value                            0.001

Test B
                                     tb1 translated region         Average of control loci
  Polymorphism within subspecies     0.00243                       0.01996
  Divergence between subspecies      0.01273                       0.02242
  χ²                                 2.70
  p-value                            0.26

HKA Example: Teosinte Branched

Using Synonymous Substitutions to Control for Factors Other Than Selection
  dN/dS or Ka/Ks Ratios

Types of Mutations (Polymorphisms)
  Synonymous versus Nonsynonymous SNP
  First and second position SNP often changes amino acid
  Third position SNP often synonymous
    UCA, UCU, UCG, and UCC all code for Serine
  The majority of nucleotide positions are nonsynonymous
  Not all amino acid changes affect fitness: allozymes

Synonymous & Nonsynonymous Substitutions
  Synonymous substitution rate can be used to set neutral expectation for nonsynonymous rate
  dS is the rate of synonymous substitutions per synonymous site
  dN is the rate of nonsynonymous substitutions per nonsynonymous site
  ω = dN/dS
  If ω = 1, neutral evolution
  If ω < 1, purifying selection
  If ω > 1, positive Darwinian selection
  For human genes, ω ≈ 0.1

Complications in Estimating dN/dS
  Multiple mutations in a codon give multiple possible paths
    CGT(Arg)->AGA(Arg) can occur as:
    CGT(Arg)->AGT(Ser)->AGA(Arg)
    CGT(Arg)->CGA(Arg)->AGA(Arg)
  Two types of nucleotide base substitutions result in SNPs, transitions and transversions, and they are not equally likely
    (http://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html)
  Back-mutations are invisible
  Complex evolutionary models using likelihood and Bayesian approaches must be used to estimate dN/dS (also called KA/KS or KN/KS depending on method) (PAML package)

dN/dS ratios for 363 mouse-rat comparisons
  Most genes show purifying selection (dN/dS < 1)
  Some evidence of positive selection, especially in genes related to immune system
    interleukin-3: mast cells and bone marrow cells in immune system
  Hartl and Clark 2007

McDonald-Kreitman Test
  Conceptually similar to HKA test
  Uses only one gene
  Contrasts the ratio of nonsynonymous to synonymous divergence between species with the ratio of nonsynonymous to synonymous polymorphism within species
  Gene provides internal control for evolution rates and demography

Application of McDonald-Kreitman Test
  Aligned 11,624 gene sequences between human and chimp
  Calculated synonymous and nonsynonymous substitutions between species (Divergence) and within humans (SNPs)
  Identified 304 genes showing evidence of positive selection (blue) and 814 genes showing purifying selection (red) in humans
  Positive selection: defense/immunity, apoptosis, sensory perception, and transcription factors
  Purifying selection: structural and housekeeping genes
  Bustamante et al. 2005. Nature 437, 1153-1157

[Figure: Genes showing purifying (red) or positive (blue) selection in the human genome based on the McDonald-Kreitman Test. Bustamante et al. 2005. Nature 437, 1153-1157]

Problem 4. Calculate the ω = dN/dS ratio based on the following 2 DNA sequences:

  5’-ATG GTT CAT TTT ACC GGA CGA AGT CGA TTA-3’
  5’-ATG GAT CAC TTG ACC GCA CGA AGT AGA TTA-3’

What does the value of ω indicate?

Problem 5. GRADUATE STUDENTS ONLY: Search the literature for an example of an application of one of the tests for departures from neutrality. Describe the question that the test is addressing, the results, and the authors’ interpretation of the results. Receive two points of extra credit if you can find a case in which the test is inappropriately applied and/or interpreted. Please send the paper to Rose when you submit your report.
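
The HKA tables earlier in these notes compare polymorphism within (sub)species to divergence between them at a test locus and at control loci. As a rough aid for reading those tables, the sketch below recomputes the within/between ratios from the tabulated values. It is only arithmetic on the numbers given above, not the HKA goodness-of-fit test itself, and the function and dictionary names are hypothetical.

```python
# Within/between ratios for the HKA tables in these notes.
# This reproduces only the "Ratio" row; it is NOT the HKA chi-square test.

def within_between_ratio(polymorphism, divergence):
    """Ratio of within-species polymorphism to between-species divergence."""
    return polymorphism / divergence

# (polymorphism, divergence) pairs copied from the tables in the notes.
tables = {
    "Adh": (0.101, 0.056),
    "Adh control locus": (0.022, 0.052),
    "tb1 5' UTR": (0.00093, 0.05255),
    "tb1 translated region": (0.00243, 0.01273),
    "tb1 control loci (average)": (0.01996, 0.02242),
}

ratios = {name: within_between_ratio(p, d) for name, (p, d) in tables.items()}

for name, r in ratios.items():
    print(f"{name:28s} {r:.3f}")
```

The Adh and control-locus ratios round to the 1.80 and 0.42 shown in the Adh table. For tb1, the 5’ UTR ratio is far below that of the control loci, which is the contrast that Test A evaluates formally.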
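
The notes state that likelihood or Bayesian models (e.g. PAML) are needed to estimate dN/dS properly. To make the definitions of dS, dN and ω concrete, here is a much simpler counting sketch applied to the two Problem 4 sequences. It is an illustration under stated assumptions, not the method the course requires: sites are counted per codon in the style of Nei-Gojobori, changes to stop codons are treated as nonsynonymous, only codon pairs differing at a single position are handled (so the multiple-path problem is sidestepped), and a Jukes-Cantor correction is applied. All function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites in a codon: per position, the fraction of the three
    possible single-base changes that leave the amino acid unchanged.
    Assumption: changes to a stop codon count as nonsynonymous."""
    total = 0.0
    for pos in range(3):
        same = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                same += 1
        total += same / 3
    return total


def site_counts(codons1, codons2):
    """Synonymous (S) and nonsynonymous (N) sites, averaged over both sequences."""
    s = (sum(map(syn_sites, codons1)) + sum(map(syn_sites, codons2))) / 2
    return s, 3 * len(codons1) - s


def count_differences(codons1, codons2):
    """Count synonymous and nonsynonymous codon differences.
    Simplification: codon pairs with more than one difference are rejected,
    because they would require averaging over mutational paths."""
    syn = nonsyn = 0
    for c1, c2 in zip(codons1, codons2):
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff == 0:
            continue
        if ndiff > 1:
            raise ValueError(f"{c1}/{c2}: more than one difference, not handled")
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences p (needs p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


seq1 = "ATG GTT CAT TTT ACC GGA CGA AGT CGA TTA".split()
seq2 = "ATG GAT CAC TTG ACC GCA CGA AGT AGA TTA".split()

S, N = site_counts(seq1, seq2)
syn_diffs, nonsyn_diffs = count_differences(seq1, seq2)
dS = jukes_cantor(syn_diffs / S)
dN = jukes_cantor(nonsyn_diffs / N)
omega = dN / dS

print(f"S = {S:.3f}, N = {N:.3f}")
print(f"synonymous differences = {syn_diffs}, nonsynonymous differences = {nonsyn_diffs}")
print(f"dS = {dS:.3f}, dN = {dN:.3f}, omega = {omega:.3f}")
```

With only ten codons the estimate is very noisy, and a different treatment of stop codons or of transition/transversion bias would change the numbers, so the output is best read as a check on hand calculations rather than a definitive value.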
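
The McDonald-Kreitman test described in the notes reduces to a 2x2 table: nonsynonymous versus synonymous changes, counted as fixed differences between species and as polymorphisms within species. The sketch below shows one common way to evaluate such a table, using the neutrality index and a two-sided Fisher's exact test. Both choices are assumptions, since the notes do not say which statistic is used, and the counts are illustrative only and not taken from these notes.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds). Under neutrality NI is expected to be near 1."""
    return (pn / ps) / (dn / ds)


# Illustrative counts (an assumption for this example, not data from the notes).
dn, ds = 7, 17   # fixed differences: nonsynonymous, synonymous
pn, ps = 2, 42   # polymorphisms:     nonsynonymous, synonymous

ni = neutrality_index(dn, ds, pn, ps)
p_value = fisher_exact_two_sided(dn, ds, pn, ps)

print(f"Neutrality index = {ni:.3f}")
print(f"Fisher's exact p = {p_value:.4f}")
```

An index below 1 reflects an excess of nonsynonymous fixed differences, the pattern associated with positive selection, while an index above 1 reflects an excess of nonsynonymous polymorphism, as expected under purifying selection against mildly deleterious variants.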