PROBLEM SET 4
POPULATION GENETICS

PART I: The goal of this part of Problem Set 4 is to add to your understanding of the regulation of phenotypic variation in populations. By reviewing some ideas in quantitative population genetics, you will explore the influence of various factors (dominance, allele frequency, and genotype values) on the frequency distribution of quantitative traits. The fundamental “theorem” of quantitative genetics is Vp = Vg + Ve: phenotypic variance is the sum of genotypic and environmental variances.

The exercise consists of two parts. First, you will review theoretical expectations for the distribution of phenotypes under various assumptions. Second, you will generate samples from populations using a Monte-Carlo technique.

First, however, some preliminaries. Assume the following genetic model for a quantitative trait:

Y = G + E

Genotype    Frequency    Mean Value
A1A1        p^2          a
A1A2        2pq          b
A2A2        q^2          c

where Y is the phenotypic value, G is the value of the genotype at an autosomal locus, and E is the random environmental effect; A1 and A2 are the alleles at the locus, with gene frequencies p and q (p + q = 1). The quantities a, b, and c are the mean values (e.g. size) of genotypes A1A1, A1A2, and A2A2, which have frequencies p^2, 2pq, and q^2 respectively. Random environmental effects, E, are assumed to have a normal distribution with mean zero and environmental variance Ve.

The probability density function of a phenotype Y would be:

f(Y) = p[A1A1]*f(Y|A1A1) + p[A1A2]*f(Y|A1A2) + p[A2A2]*f(Y|A2A2)
     = p^2*g(Y,a,Ve) + 2pq*g(Y,b,Ve) + q^2*g(Y,c,Ve)

where g(Y,m,V) is the probability density function of a variable Y with mean m and variance V. The formula for the probability density function of a normal distribution, g(Y,a,Ve), is:

$$g(Y, a, V_e) = \frac{1}{\sqrt{2 \times 3.1416 \times V_e}} \exp\left(-\frac{(Y - a)^2}{2 V_e}\right)$$

The parameters of this distribution are p, a, b, c and Ve.
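
The mixture density above can be checked numerically before it is built into a worksheet. The following Python sketch is only an illustration under stated assumptions: the function names (normal_pdf, phenotype_density, density_table) and the parameter values in the usage lines are hypothetical and not part of the assignment, and math.pi is used in place of the rounded constant 3.1416.

```python
import math


def normal_pdf(y, mean, var):
    """Normal density g(Y, m, V) with mean `mean` and variance `var`."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def phenotype_density(y, p, a, b, c, ve):
    """Mixture density f(Y) = p^2 g(Y,a,Ve) + 2pq g(Y,b,Ve) + q^2 g(Y,c,Ve)."""
    q = 1.0 - p
    return (p * p * normal_pdf(y, a, ve)
            + 2.0 * p * q * normal_pdf(y, b, ve)
            + q * q * normal_pdf(y, c, ve))


def density_table(p, a, b, c, ve, y_min, y_max, n_points=21):
    """Return n_points evenly spaced (Y, f(Y)) pairs between y_min and y_max."""
    step = (y_max - y_min) / (n_points - 1)
    return [(y_min + i * step, phenotype_density(y_min + i * step, p, a, b, c, ve))
            for i in range(n_points)]


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    for y, f in density_table(p=0.5, a=10.0, b=20.0, c=30.0, ve=4.0,
                              y_min=6.0, y_max=34.0):
        print(f"{y:6.2f}  {f:.5f}")
```

Because the three genotype frequencies sum to 1, the area under f(Y) should also be 1 for any valid parameter set, which is a convenient sanity check on a tabulated curve.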
The following relationships may be useful for evaluating the dependence of the phenotypic distribution on its parameters:

Overall genotypic mean:

G = a*p^2 + 2b*pq + c*q^2

Total genetic variance:

Vg = a^2*p^2 + 2b^2*pq + c^2*q^2 - G^2
   = 2pq[p(a - b) + q(b - c)]^2 + p^2*q^2*[a - 2b + c]^2
   = Va + Vd

where Va is called the additive variance and Vd is called the variance due to dominance.

Heritability:

$$h_g^2 = \frac{V_g}{V_p} \quad \text{and} \quad h_a^2 = \frac{V_a}{V_p}$$

Degree of dominance:

$$x = \frac{a - 2b + c}{a - c}$$

PART Ia: Set up a spreadsheet to calculate Vg, Va, Vd, Vp, h_a^2, h_g^2 and x given a set of parameter values for p, a, b, c, and Ve. The spreadsheet should also contain a graph of f(Y) vs Y built from at least 20 pairs of data. The range of Y should extend 2 standard deviations (the square root of Ve) beyond the maximum genotypic range on either side. Evaluate the probability density function while varying the parameter values to obtain three types of phenotypic distributions (trimodal, bimodal, and unimodal).

PART Ib: Create a Visual Basic module to generate three sets of 500 phenotypes using these three sets of parameter values. Use our standard function for generating a standard normal random variable to add environmental variability, and use random numbers to assign the genotype of each of the 500 individuals. Remember our standard function is:

R = 2 * RND - 1
ZR = log((1 + R)/(1 - R)) / 1.82

From each of the three samples of 500 individuals, generate a histogram of frequency vs Y. The histograms should contain a minimum of 15 bins.

PART II: The purposes of this part of the assignment are to study the effect of selection and/or mutation on a large randomly mating population and the possible effect of population size on variation of gene frequencies in small populations.

PART IIa. Effects of micro-evolutionary forces on genotype frequencies. You will create a spreadsheet model to explore the effects of selection and mutation on the genetic structure of populations.
Assume a trait is controlled by two alleles (a and A) at an autosomal locus, with the initial gene frequency of "a" being r and that of "A" being (1 - r); W1, W2, and W3 are the relative fitness values for the aa, aA, and AA genotypes; m is the forward mutation rate of A to a, and n is the reverse mutation rate of a to A. The initial allele frequency of a is obtained from the genotype frequencies:

$$r = S + \frac{T}{2}$$

where S is the frequency of aa, T the frequency of Aa, and U the frequency of AA. Selection and mutation act on the allele frequency of the next generation as follows:

$$r_{G+1} = \frac{S W_1 + \frac{T W_2}{2}}{\bar{W}} (1 - m - n) + m, \quad \text{where} \quad \bar{W} = S W_1 + T W_2 + U W_3$$

The genotype frequencies in the next generation assume random mating:

$$S_{G+1} = r_{G+1}^2, \qquad T_{G+1} = 2 r_{G+1} (1 - r_{G+1}), \qquad U_{G+1} = (1 - r_{G+1})^2$$

Your spreadsheet model should calculate the initial gene frequency of “a” from the initial genotype frequencies. Set the relative fitness values of the three genotypes (W1, W2, and W3) and the mutation rates (m and n) as constants. The spreadsheet should show the genotype frequencies and the frequency of the “a” allele for at least 100 generations.

1. Show that your predicted genotype frequencies will achieve Hardy-Weinberg equilibrium in one generation in the absence of selection and mutation.
2. Show the effect of variation in genotype fitness on allele and genotype frequencies. At a minimum, obtain results for W2 > W1 and W2 > W3, and for W3 < W2 and W3 < W1 or W1 < W2 and W1 < W3. What is the effect of mutation on the steady state obtained?
3. Assuming that W1 = 0 and both W2 and W3 are 1, show that the ultimate steady-state frequency of allele “a” is a function of the mutation rates. Find this function.

PART IIb. Random genetic drift. Random genetic drift can be simulated by using a Monte-Carlo approach. You will create a second model (using Visual Basic) that simulates the effects of random mating in finite populations. For each individual, you first randomly select a genotype for each parent and then the allele contributed by each parent.
The sampling is repeated for each individual in the population for as many generations as desired. The following six assumptions will facilitate model building:

1. The model describes only single-locus, two-allele genetic systems (i.e. three genotypes, aa, Aa, and AA, with frequencies of S, T, and U);
2. No selection occurs;
3. Genotypes of the next generation are determined only by the genotypes of the present generation;
4. Number of offspring per generation is constant;
5. Mating is random; and
6. Genotype frequencies are equal for males and females.

Let N be the number of offspring per generation. The number of parents is thus 2N. For each of 2N random numbers, determine the parent's genotype by the following convention: for random number R, if R is less than or equal to S, then the genotype is aa. If R is less than S+T, then the genotype is aA. Otherwise the genotype is AA. For a heterozygote parent (aA), another random number must be selected to choose the allele: if R is less than 0.5 then the allele is A; otherwise it is a.

The Visual Basic model should accept a variable population size and should continue until either 100 generations have elapsed or the frequency of “a” reaches 0 or 1. Plot the frequency of the "a" allele over time for population sizes of 10, 20 and 30 individuals. Repeat the drift simulation three times to find the average time required for fixation to occur as a function of generation size. Finally, create a plot of average time to fixation versus population size for a range of sizes from 10 to 100 individuals.

SUBMIT:

An EXCEL file with
PART I
a. PART Ia worksheet(s) and PART Ib module (10 points)
b. Plots of 3 phenotypic distributions in PART Ia (5 points)
c. Three histograms in PART Ib (5 points)
PART II
a. Models for PART IIa and IIb and their outputs (10 points).
b. Two plots: One plot of the allele frequencies under genetic drift and a second plot of time to fixation as a function of generation size (10 points).

A document file with the following discussion points:
PART I
A discussion comparing theoretical and simulated samples of frequencies of phenotypes. You should include insights you have gained about the influence of genotype values, dominance, and environmental variability on the observability of underlying genotypic frequencies in populations. You must elaborate on the values of heritability and dominance. Use your textbook from Population Biology to discuss them. (20 points).
PART II
A discussion of the relative effects of micro-evolutionary forces and genetic drift on variation in the genetic structure of populations. The discussion must include reference to the required analyses (1 to 3) in Part IIa. (20 points).
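
The Part IIa recursion can also be cross-checked outside the spreadsheet. The Python sketch below is one possible way to iterate the selection and mutation update followed by random mating; it is not the required spreadsheet model. The function names (next_allele_freq, hardy_weinberg, simulate) and the starting values in the usage lines are hypothetical, chosen only for illustration.

```python
def next_allele_freq(S, T, U, w1, w2, w3, m, n):
    """Frequency of allele a after one round of selection, then mutation.

    S, T, U are the frequencies of aa, Aa, AA; w1, w2, w3 their relative
    fitnesses; m is the A -> a mutation rate and n the a -> A rate.
    """
    w_bar = S * w1 + T * w2 + U * w3           # mean fitness
    r_sel = (S * w1 + 0.5 * T * w2) / w_bar    # frequency of a after selection
    return r_sel * (1.0 - m - n) + m           # frequency of a after mutation


def hardy_weinberg(r):
    """Genotype frequencies (aa, Aa, AA) under random mating."""
    return r * r, 2.0 * r * (1.0 - r), (1.0 - r) ** 2


def simulate(S, T, U, w1, w2, w3, m, n, generations=100):
    """Return a list of (r, S, T, U) tuples, one per generation (index 0 = start)."""
    history = [(S + 0.5 * T, S, T, U)]
    for _ in range(generations):
        r = next_allele_freq(S, T, U, w1, w2, w3, m, n)
        S, T, U = hardy_weinberg(r)
        history.append((r, S, T, U))
    return history


if __name__ == "__main__":
    # Hypothetical start: no heterozygotes, no selection, no mutation.
    for gen, row in enumerate(simulate(0.5, 0.0, 0.5, 1.0, 1.0, 1.0, 0.0, 0.0,
                                       generations=3)):
        print(gen, row)
```

Keeping the allele-frequency update and the random-mating step in separate functions mirrors the two equations of Part IIa, so each can be compared against the corresponding spreadsheet column.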