Assignment 9: Evolutionary Evidence for Function: Selection
BMMB 551: Genomics
The purpose of this assignment is to give you some experience with tools and resources for finding evidence of recent selection.
I. Ka/Ks, also called dN/dS, ratio test
The fundamental idea of comparing the rates of change at synonymous and nonsynonymous sites to infer selection, as we discussed in class, is straightforward. However, when a codon differs at more than one position, several mutational pathways could have generated those differences, and all of them have to be considered (most simply by giving them equal weights). The method of Nei and Gojobori handles this complication effectively, and web servers are available that compute the dN/dS (or Ka/Ks) ratio using it. The method is described on pages 418-419 of Nei and Gojobori’s 1986 paper on a simple way to compute dN/dS ratios. Alternatively, you can read a similar presentation at http://pubmlst.org/software/analysis/start/manual/dsdn.shtml
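The pathway-averaging idea can be sketched in a few lines of Python. This is a minimal illustration, not the web server's implementation: the function names (syn_sites, codon_diffs, nei_gojobori) are hypothetical, the standard genetic code is assumed, pathways that pass through a stop codon are simply discarded, and codons containing gaps, ambiguous bases, or stops are skipped.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order; "*" marks stop codons.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Number of synonymous sites in a codon: for each position, the
    fraction of the three possible base changes that keep the amino acid."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    s += 1 / 3
    return s


def codon_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences between two codons,
    averaged over all mutational pathways with equal weights."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = []
    for order in permutations(positions):
        current, syn, nonsyn = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[step] == "*":
                break  # assumption: discard pathways that hit a stop codon
            if CODON_TABLE[step] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
        else:
            paths.append((syn, nonsyn))
    if not paths:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    return -0.75 * log(1 - 4 * p / 3) if p < 0.75 else float("nan")


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, codon-aware coding sequences."""
    S = sd = nd = 0.0
    codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if "*" in (CODON_TABLE.get(c1, "*"), CODON_TABLE.get(c2, "*")):
            continue  # skip gaps, ambiguous bases, and stop codons
        codons += 1
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        d = codon_diffs(c1, c2)
        sd += d[0]
        nd += d[1]
    N = 3 * codons - S
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(nd / N), jukes_cantor(sd / S)


# TTT -> GTA has two pathways: via GTT (1 nonsyn + 1 syn) or via TTA (2 nonsyn).
print(codon_diffs("TTT", "GTA"))
```

Calling nei_gojobori on each pair of sequences in the multifasta file returns dN and dS for that pair; their ratio can then be compared with the server's output.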
(1) Download (or cut-and-paste) this multifasta file from the Angel website; it is in the folder named “Assignment 9: Files for the assignment”.
RNASE1_1B_D1l_Dl1B_Mmul_Hs_multifasta.txt
This file contains the sequences of the coding regions of the cDNA for four pancreatic RNases. I have entered them as aligned sequences, and the alignment is codon-aware. Specifically, the macaque sequence is shorter than the others due to a deletion, which is shown as a series of dashes in the FASTA file and has been placed so that codons are not interrupted. The RNase sequences include two from Douc langur (RNASE1 and RNASE1B), and RNASE1 from Macaca mulatta and from Homo sapiens. The Douc langur RNASE1B is the same as “neoPR” from the presentation on adaptive evolution in RNases from a leaf-eating monkey.
As always, you can instead gather sequences you want to study and make a multifasta file.
Be sure to make the alignment “codon aware”.
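If you build your own file, a quick check along the following lines can catch alignments that are not codon aware. It is only a sketch under simple assumptions: the helper names (read_fasta, codon_aware_problems) are hypothetical, gaps are written as dashes, and every sequence starts in reading frame at its first column.

```python
def read_fasta(text):
    """Parse multifasta text into {header: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def codon_aware_problems(records):
    """List reasons why an alignment is not codon aware (empty if none)."""
    problems = []
    if len({len(s) for s in records.values()}) > 1:
        problems.append("sequences differ in aligned length")
    for name, seq in records.items():
        if len(seq) % 3:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            # A gap is acceptable only if it removes the whole codon.
            if "-" in codon and codon != "---":
                problems.append(f"{name}: gap interrupts codon {i // 3 + 1} ({codon})")
    return problems


good = ">a\nATGGCT---TAA\n>b\nATGGCTGCTTAA\n"
bad = ">a\nATGGCT---TAA\n>b\nATGGC--TTTAA\n"
print(codon_aware_problems(read_fasta(good)))
print(codon_aware_problems(read_fasta(bad)))
```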
(2) Use online resources to analyze the rate of substitutions at synonymous and nonsynonymous sites.
Bergen (Norway) Center for Computational Science: http://services.cbu.uib.no/tools/kaks
This has a pretty clean interface and several options to work with. If you just input the multifasta file I provided, then in the second window (“Tree”), use the option to “Calculate phylogenetic tree from prealigned sequences”. The defaults for Ka/Ks parameters are fine.
(3) Report on what you find in the output. I am interested in learning both (a) how well you understand the method, and (b) what you can deduce biologically from the information about Ka, Ks, and the ratio. Which genes appear to be under purifying (negative) selection, and which show evidence of adaptive change (positive selection)? A paper on “Adaptive evolution of a duplicated pancreatic ribonuclease gene in a leaf-eating monkey” from Zhang, Zhang and Rosenberg (2002, Nature Genetics) is provided in the Assignment 9 folder to help you interpret the results. Do your results recapitulate those in the Zhang et al. paper?
If not, discuss factors that could contribute to the differences. If your results do recapitulate those in the paper, then explain what those results are telling you biologically.
II. Signatures of recent selection based on allele frequencies and haplotype homozygosity.
(1) Pick at least one locus in which variants are reported to be adaptive, or pick some that you want to examine for signals of recent selection.
E.g.:
Light skin color in Europeans SLC24A5 chr15:45,840,001-46,940,000
Lamason RL, Mohideen MA, Mest JR, Wong AC, Norton HL, Aros MC, Jurynec MJ, Mao
X, Humphreville VR, Humbert JE, Sinha S, Moore JL, Jagadeeswaran P, Zhao W, Ning
G, Makalowska I, McKeigue PM, O'donnell D, Kittles R, Parra EJ, Mangini NJ, Grunwald
DJ, Shriver MD, Canfield VA, Cheng KC. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science. 2005 Dec 16;310(5755):1782-6.
Lactose tolerance in Europeans LCT chr2:135,314,001-136,800,000
Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes
M, Reich DE, Hirschhorn JN. Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet. 2004 Jun;74(6):1111-20. Epub 2004 Apr 26.
Brain size MCPH1 chr8:6,180,001-6,580,000 (This one is controversial.)
Evans PD, Gilbert SL, Mekel-Bobrov N, Vallender EJ, Anderson JR, Vaez-Azizi LM,
Tishkoff SA, Hudson RR, Lahn BT. Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans. Science. 2005 Sep 9;309(5741):1717-20.
(2) Look for signatures of recent selection at your chosen locus or loci (the coordinates above include surrounding DNA to give you some context) in the
UCSC Genome Browser.
Here is a shared browser session: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=rosshardison&hgS_otherUserSessionName=RecentSelection_iHS_TajimaD_hg18
Note that the tracks suggested here are available only on the Mar. 2006 (NCBI36/hg18) assembly of the reference human genome, under “Variation and Repeats”.
Suggested tracks to start with: integrated haplotype scores (iHS) or HGDP XP-EHH: per-continent Cross-Population Extended Haplotype Homozygosity (XP-EHH). (These produce similar results.)
Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS,
Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37.
Tajima's D: a measure of genetic diversity at the nucleotide level that compares the observed nucleotide diversity to that expected under the assumption that all polymorphisms are neutral and that the population size is constant.
Carlson CS, Thomas DJ, Eberle M, Livingston R, Rieder M, Nickerson DA. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. 2005 Nov;15(11):1553-65.
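To make the Tajima's D comparison concrete, here is a minimal sketch that computes the statistic from a small set of aligned sequences. It is not how the browser track was produced: the function name tajimas_d is hypothetical, the normalizing constants are the standard ones from Tajima's original formulation, and columns containing anything other than A, C, G, or T are ignored.

```python
from collections import Counter
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (needs at least 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    pairs = n * (n - 1) / 2
    seg_sites = 0   # S: number of segregating (polymorphic) sites
    pi = 0.0        # observed diversity: mean pairwise differences
    for column in zip(*(s.upper() for s in seqs)):
        counts = Counter(column)
        if set(counts) - set("ACGT"):
            continue  # assumption: skip columns with gaps or ambiguous bases
        if len(counts) > 1:
            seg_sites += 1
            differing = (n * n - sum(c * c for c in counts.values())) / 2
            pi += differing / pairs
    if seg_sites == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Diversity expected from S under neutrality and constant population size.
    expected = seg_sites / a1
    return (pi - expected) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Only rare variants (singletons): observed diversity falls below expectation.
rare = ["AAAA", "TAAA", "ATAA", "AATA"]
# Variants at intermediate frequency: observed diversity exceeds expectation.
balanced = ["AAA", "AAA", "TTT", "TTT"]
print(tajimas_d(rare), tajimas_d(balanced))
```

An excess of rare variants gives a negative D and an excess of intermediate-frequency variants gives a positive D, which is the contrast to look for in the browser track.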
(3) Report on what you observe (screenshots are useful) and what it means biologically. I am interested in learning both (a) how well you understand the methods, and (b) what you can deduce biologically from the information.
Connecting your observations to the readings is always good.