PROBLEM SET 4
POPULATION GENETICS
PART I: The goal of this part of Problem Set 4 is to add to your understanding of the
regulation of phenotypic variation in populations. By reviewing some ideas in
quantitative population genetics, you will explore the influence of various factors
(dominance, allele frequency, and genotype values) on the frequency distribution of
quantitative traits. The fundamental “theorem” of quantitative genetics is
Vp = Vg + Ve,
or phenotypic variance is the sum of genotypic and environmental variances.
The exercise will consist of two parts. First, you will review theoretical
expectations for distribution of phenotypes under various assumptions. Second, you will
generate samples from populations using a Monte-Carlo technique. First, however, some
preliminaries.
Assume the following genetic model for a quantitative trait:
Y = G + E

Genotype    Frequency    Mean Value
A1A1        p^2          a
A1A2        2pq          b
A2A2        q^2          c

where Y is the phenotypic value, G is the mean value of the genotype at an autosomal locus, and E is the
random environmental effect; A1 and A2 are the alleles at the locus with gene frequencies
p and q (p + q = 1). The quantities a, b, and c are mean values (e.g. size) of genotypes
A1A1, A1A2, and A2A2, which have frequencies p^2, 2pq and q^2 respectively. Random
environmental effects, E, are assumed to have a normal distribution with mean zero and
an environmental variance, Ve. The probability density function of a phenotype Y would
be:
f(Y) = P[A1A1]*f(Y|A1A1) + P[A1A2]*f(Y|A1A2) + P[A2A2]*f(Y|A2A2)
     = p^2*g(Y,a,Ve) + 2pq*g(Y,b,Ve) + q^2*g(Y,c,Ve)
where g(Y,m,V) is a probability density function of a variable (Y) with mean m and
variance V. The formula for the probability density function of a normal distribution,
g(Y,a,Ve), is:
$$g(Y, a, V_e) = \frac{1}{\sqrt{2 \pi V_e}} \exp\left[-\frac{(Y - a)^2}{2 V_e}\right]$$
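As a quick way to check spreadsheet values of f(Y), the mixture density can be sketched in a few lines of Python. This is only an illustration: the assignment itself asks for a spreadsheet, and the function names `normal_pdf` and `phenotype_pdf` are hypothetical.

```python
import math

def normal_pdf(y, mean, var):
    """g(Y, m, V): normal density with mean m and variance V."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def phenotype_pdf(y, p, a, b, c, ve):
    """f(Y): three normal densities weighted by the genotype frequencies p^2, 2pq, q^2."""
    q = 1.0 - p
    return (p * p * normal_pdf(y, a, ve)
            + 2.0 * p * q * normal_pdf(y, b, ve)
            + q * q * normal_pdf(y, c, ve))
```

Because the three weights sum to 1, f(Y) integrates to 1 for any valid p and any Ve > 0, which is a useful sanity check on a spreadsheet column.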
The parameters of this distribution are p, a, b, c and Ve. The following
relationships may be useful for evaluating the dependence of the phenotypic distribution
on its parameters:
Overall genotypic mean:
$$\bar{G} = a p^2 + 2 b p q + c q^2$$
Total genetic variance:
$$V_g = a^2 p^2 + 2 b^2 p q + c^2 q^2 - \bar{G}^2$$
$$= 2pq\,[p(a - b) + q(b - c)]^2 + p^2 q^2 [a - 2b + c]^2$$
$$= V_a + V_d$$
where Va is called the additive variance and Vd is called the variance due to
dominance.
Heritability:
$$h_g^2 = \frac{V_g}{V_p}, \qquad h_a^2 = \frac{V_a}{V_p}$$
and,
Degree of dominance:
$$x = \frac{a - 2b + c}{a - c}$$
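The relationships above can be cross-checked with a short Python sketch. It is an illustration only, written in Python rather than the spreadsheet the assignment requires. The function name `variance_components` and the returned dictionary keys are hypothetical, and returning NaN for x when a = c is an assumption made here because the ratio is undefined in that case.

```python
def variance_components(p, a, b, c, ve):
    """Compute the genotypic mean, variance components, heritabilities and dominance."""
    q = 1.0 - p
    g_mean = a * p * p + 2.0 * b * p * q + c * q * q
    vg = a * a * p * p + 2.0 * b * b * p * q + c * c * q * q - g_mean ** 2
    va = 2.0 * p * q * (p * (a - b) + q * (b - c)) ** 2   # additive variance
    vd = (p * q * (a - 2.0 * b + c)) ** 2                 # dominance variance
    vp = vg + ve                                          # Vp = Vg + Ve
    # Degree of dominance is undefined when a == c (assumption: report NaN).
    x = (a - 2.0 * b + c) / (a - c) if a != c else float("nan")
    return {"G": g_mean, "Vg": vg, "Va": va, "Vd": vd, "Vp": vp,
            "hg2": vg / vp, "ha2": va / vp, "x": x}
```

Comparing `Vg` with `Va + Vd` for a few parameter sets is a simple way to confirm that the two expressions for the genetic variance agree.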
PART Ia: Set up a spreadsheet to calculate Vg, Va, Vd, Vp, ha^2, hg^2 and x given a set of
parameter values for p, a, b, c, and Ve. The spreadsheet should also contain a graph of
f(Y) vs Y from at least 20 pairs of data. The range of Y should extend 2
standard deviations (sqrt(Ve)) beyond the maximum genotypic range on each side. Evaluate the
probability density function by varying parameter values to obtain three types of
phenotypic distributions (trimodal, bimodal, and unimodal).
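One way to check whether a parameter set gives a trimodal, bimodal, or unimodal curve is to evaluate f(Y) on an evenly spaced grid and count the interior local maxima. The Python sketch below is an illustration under stated assumptions: the names `y_grid` and `count_modes` are hypothetical, the grid must be fine enough to resolve the peaks, and the assignment itself expects this to be done in a spreadsheet.

```python
import math

def phenotype_pdf(y, p, a, b, c, ve):
    """f(Y): mixture of three normals with Hardy-Weinberg weights."""
    q = 1.0 - p
    def g(mean):
        return math.exp(-(y - mean) ** 2 / (2.0 * ve)) / math.sqrt(2.0 * math.pi * ve)
    return p * p * g(a) + 2.0 * p * q * g(b) + q * q * g(c)

def y_grid(a, b, c, ve, n_points=41):
    """Evenly spaced Y values from 2 SD below the smallest to 2 SD above the largest genotype mean."""
    sd = math.sqrt(ve)
    lo = min(a, b, c) - 2.0 * sd
    hi = max(a, b, c) + 2.0 * sd
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]

def count_modes(values):
    """Count interior grid points that are strictly higher than both neighbours."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

# Example: widely separated genotype means with a small Ve.
ys = y_grid(0.0, 5.0, 10.0, 0.25, n_points=121)
modes = count_modes([phenotype_pdf(y, 0.5, 0.0, 5.0, 10.0, 0.25) for y in ys])
```

Counting maxima on a grid can miss a mode that falls between grid points, so the result is best read alongside the plotted curve.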
PART Ib: Create a Visual Basic module to generate three sets of 500 phenotypes using
these three sets of parameter values. Use our standard function for generating a standard
normal random variable to add environmental variability, and use random numbers to assign
a genotype to each of the 500 individuals. Remember, our standard function is:
R = 2* RND - 1
ZR = log ((1+R)/(1-R))/1.82
From each of three samples of 500 individuals, generate a histogram of frequency vs Y.
The histograms should contain a minimum of 15 bins.
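The sampling procedure can be prototyped in Python before writing the Visual Basic module. The sketch below is illustrative only: the function names are hypothetical, Python's `random.Random` stands in for `RND`, and the redraw when the uniform number is exactly 0 is an added guard (an assumption, not part of the course function) to avoid taking the log of zero.

```python
import math
import random

def approx_standard_normal(rng):
    """The course's standard function: ZR = log((1+R)/(1-R))/1.82 with R = 2*RND - 1."""
    r = 2.0 * rng.random() - 1.0
    while r <= -1.0:                 # guard: RND == 0 would give log(0)
        r = 2.0 * rng.random() - 1.0
    return math.log((1.0 + r) / (1.0 - r)) / 1.82

def sample_phenotypes(n, p, a, b, c, ve, rng):
    """Draw n phenotypes Y = G + E with genotype frequencies p^2, 2pq, q^2."""
    q = 1.0 - p
    sd = math.sqrt(ve)
    sample = []
    for _ in range(n):
        u = rng.random()
        if u < p * p:
            g = a                    # A1A1
        elif u < p * p + 2.0 * p * q:
            g = b                    # A1A2
        else:
            g = c                    # A2A2
        sample.append(g + sd * approx_standard_normal(rng))
    return sample

def histogram(values, n_bins=15):
    """Counts per equal-width bin between the smallest and largest value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts
```

For example, `histogram(sample_phenotypes(500, 0.5, 0.0, 5.0, 10.0, 0.25, random.Random(1)))` returns 15 bin counts that sum to 500.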
PART II: The purposes of this part of the assignment are to study the effect of
selection and/or mutation on a large randomly mating population and the possible effect
of population size on variation of gene frequencies in small populations.
PART IIa. Effects of micro-evolutionary forces on genotype frequencies. You will
create a spreadsheet model to explore the effects of selection and mutation on the genetic
structure of populations. Assume a trait is controlled by two alleles (a and A) at an
autosomal locus, with the initial gene frequency of "a" being r and that of "A" being (1-r);
W1, W2, and W3 are the relative fitness values for the aa, aA, and AA genotypes; m is
the forward mutation rate of A to a; and n is the reverse mutation rate of a to A. The
initial allele frequency of a is obtained from the genotype frequencies:
$$r = S + \frac{T}{2}$$
where S is the frequency of aa, T the frequency of Aa, and U the frequency of AA.
Selection and mutation act on the allele frequency of the next generation as follows:
$$r_{G+1} = \frac{S W_1 + \frac{T W_2}{2}}{\bar{W}} \, (1 - m - n) + m$$
where
$$\bar{W} = S W_1 + T W_2 + U W_3$$
The genotype frequencies in the next generation assume random mating:
$$S_{G+1} = r_{G+1}^2$$
$$T_{G+1} = 2 r_{G+1} (1 - r_{G+1})$$
$$U_{G+1} = (1 - r_{G+1})^2$$
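The recursion can be sketched in Python as a cross-check on the spreadsheet columns. This is a minimal illustration: the function names are hypothetical, and the spreadsheet remains the required deliverable.

```python
def next_allele_freq(S, T, U, w1, w2, w3, m, n):
    """Frequency of allele a in the next generation after selection, then mutation."""
    w_bar = S * w1 + T * w2 + U * w3               # mean fitness
    r_selected = (S * w1 + 0.5 * T * w2) / w_bar   # frequency of a after selection
    return r_selected * (1.0 - m - n) + m          # mutation: A -> a at rate m, a -> A at rate n

def random_mating(r):
    """Genotype frequencies (S, T, U) for aa, aA, AA under random mating."""
    return r * r, 2.0 * r * (1.0 - r), (1.0 - r) ** 2

def iterate(S, T, U, w1, w2, w3, m, n, generations=100):
    """Return one (r, S, T, U) tuple per generation, starting with generation 0."""
    history = [(S + T / 2.0, S, T, U)]
    for _ in range(generations):
        r = next_allele_freq(S, T, U, w1, w2, w3, m, n)
        S, T, U = random_mating(r)
        history.append((r, S, T, U))
    return history
```

Each tuple in the returned list corresponds to one spreadsheet row, so the two can be compared generation by generation.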
Your spreadsheet model should calculate the initial gene frequency of “a” from the initial
genotype frequencies. Set the relative fitness values of the three genotypes (W1, W2, and
W3) and the mutation rates (m and n) as constants. The spreadsheet should show the
genotype frequencies and the frequency of the “a” allele for at least 100 generations.
1. Show that your predicted genotype frequencies will achieve Hardy-Weinberg
equilibrium in one generation in the absence of selection and mutation.
2. Show the effect of variation in genotype fitness on allele and genotype frequencies.
At a minimum, obtain results for W2>W1 and W2>W3, and for W3<W2 and W3<W1
or W1<W2 and W1<W3. What is the effect of mutation on the steady state obtained?
3. Assuming that W1=0 and both W2 and W3 are 1, show that the ultimate steady-state
frequency of allele “a” is a function of the mutation rates. Find this function.
PART IIb. Random genetic drift. Random genetic drift can be simulated by using a
Monte-Carlo approach. You will create a second model (using Visual Basic) that
simulates effects of random mating in finite populations. For each individual, you first
randomly select a genotype for each parent and then the allele contributed by each parent.
The sampling is repeated for each individual in the population for as many generations as
desired. The following six assumptions will facilitate model building:
1. The model describes only single-locus, two-allele genetic systems (i.e., three genotypes,
aa, Aa, and AA, with frequencies of S, T, and U);
2. No selection occurs;
3. Genotypes of the next generation are determined only by the genotypes of the present
generation;
4. Number of offspring per generation is constant;
5. Mating is random; and
6. Genotype frequencies are equal for males and females.
Let N be the number of offspring per generation. The number of parents is thus 2N. For
each of 2N random numbers, determine the parent's genotype by the following convention:
For random number R, if R is less than or equal to S, then the genotype is aa. Otherwise, if R is less
than S+T, then the genotype is aA. Otherwise the genotype is AA. For a heterozygote
parent (aA), another random number must be selected to choose the allele: if R is less
than 0.5 then the allele is A; otherwise it is a.
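The convention for choosing a parent's genotype and the allele it contributes can be sketched in Python as follows. The function names `draw_parent_genotype` and `draw_allele` are hypothetical, and `random.Random` stands in for the Visual Basic `RND` function that the assignment expects.

```python
import random

def draw_parent_genotype(S, T, rng):
    """Map one uniform random number onto a genotype using cumulative frequencies."""
    r = rng.random()
    if r <= S:
        return "aa"
    if r < S + T:
        return "aA"
    return "AA"

def draw_allele(genotype, rng):
    """Homozygotes pass on their only allele; heterozygotes need a second random number."""
    if genotype == "aa":
        return "a"
    if genotype == "AA":
        return "A"
    return "A" if rng.random() < 0.5 else "a"

# Example: one offspring needs two parents, each contributing one allele.
rng = random.Random(0)
offspring = [draw_allele(draw_parent_genotype(0.25, 0.5, rng), rng) for _ in range(2)]
```

Only the frequencies S and T are passed in, because U = 1 - S - T is implied by the final branch.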
The Visual Basic model should accept variable population size and should continue until
either 100 generations have elapsed or the frequency of “a” reaches 0 or 1. Plot the
frequency of the "a" allele over time for population sizes of 10, 20 and 30 individuals.
Repeat the drift simulation three times to find the average time required for fixation to
occur as a function of population size. Finally, create a plot of average time to fixation
versus population size for a range of sizes of 10 to 100 individuals.
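Before writing the Visual Basic version, the whole drift loop can be prototyped in Python. The sketch below is illustrative: all function names are hypothetical, `random.Random` replaces `RND`, and the 100-generation cap is exposed as a parameter so that its effect can be examined.

```python
import random

def parent_contributes_a(S, T, rng):
    """Pick one parent by genotype frequency, then the allele it passes on."""
    r = rng.random()
    if r <= S:                       # aa parent always passes on a
        return True
    if r < S + T:                    # aA parent: A if the second draw is < 0.5, else a
        return rng.random() >= 0.5
    return False                     # AA parent always passes on A

def simulate_drift(n_offspring, S, T, rng, max_generations=100):
    """Return the frequency of allele a in each generation, starting with generation 0."""
    freqs = [S + T / 2.0]
    while len(freqs) <= max_generations and 0.0 < freqs[-1] < 1.0:
        n_aa = n_het = 0
        for _ in range(n_offspring):
            copies = parent_contributes_a(S, T, rng) + parent_contributes_a(S, T, rng)
            if copies == 2:
                n_aa += 1
            elif copies == 1:
                n_het += 1
        S, T = n_aa / n_offspring, n_het / n_offspring
        freqs.append(S + T / 2.0)
    return freqs

def time_to_fixation(freqs):
    """Generations elapsed when a was lost or fixed, or None if the run ended first."""
    return len(freqs) - 1 if freqs[-1] in (0.0, 1.0) else None
```

A run that reaches the generation cap without losing or fixing the allele returns `None` from `time_to_fixation`; how such runs are counted in the average is a choice to state explicitly in the discussion.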
SUBMIT:
An EXCEL file with
PART I
a. PART Ia worksheet(s) and PART Ib module (10 points)
b. Plots of 3 phenotypic distributions in PART Ia (5 points)
c. Three histograms in PART Ib (5 points)
PART II
a. Models for PART IIa and IIb and their outputs (10 points).
b. Two plots: One plot of the allele frequencies under genetic drift and a second
plot of time to fixation as a function of population size (10 points).
A document file with the following discussion points:
PART I
A discussion comparing theoretical and simulated samples of frequencies of
phenotypes. You should include insights you have gained about the influence of
genotype values, dominance, and environmental variability on the observability of
underlying genotypic frequencies in populations. You must elaborate on the values of
heritability and dominance. Use your textbook from Population Biology to discuss
them. (20 points).
PART II
A discussion of the relative effects of micro-evolutionary forces and genetic drift on
variation in genetic structure of populations. The discussion must include reference to
the required analyses (1 to 3) in Part IIa. (20 points).