BIOL 464/GEN 535 Population Genetics

advertisement
Name: Answer Key
BIOL 464/GEN 535 Population Genetics
Fall 2011
Test # 2, 11/2/2011
Please answer all questions unless otherwise specified. Show as much of your work as you can, including which
formulas you use (even if actual calculations are done with software). Use the back of the page and additional
sheets as necessary.
1. Indicate the correct relationship between the terms or parameters in the table, all else being equal. (15 points)
Term 1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Genetic drift in large populations
HT
Census population size
Number of alleles prior to bottleneck
FST with high gene flow between populations
Variance in allele frequencies among
completely isolated populations undergoing
genetic drift
FST among populations with very small
effective population sizes
Average allele frequency among many isolated
subpopulations before genetic drift
Genetic match probability with genotyping
error included
Genetic match probability with population
structure included
FRT
Match probability for 10 loci with 2 alleles
with equal population frequencies
Time to coalescence for first coalescent event
for a locus with six alleles
Average time to fixation of a favored allele in a
large population
Number of possible rooted trees for a group of
taxa
≤ or ≥
or =
≤
≥
≥
≥
≤
≥
≥
=
≥
≥
≤
≥
≤
≥
≥
Term 2
Genetic drift in small populations
Average HS
Effective population size
Number of alleles after bottleneck
FST with low gene flow between populations
Variance in allele frequencies within
completely isolated populations undergoing
genetic drift
FST among populations with very large
effective population sizes
Average allele frequency among many isolated
subpopulations after genetic drift
Genetic match probability without considering
genotyping error
Genetic match probability without considering
population structure
FST
Match probability for 2 loci with 10 alleles
with equal population frequencies
Time to coalescence for fifth coalescent event
for a locus with six alleles
Average time to fixation of a favored allele in a
small population
Number of possible unrooted trees for a group
of taxa
2. The following phylogeny and Structure bar plot was derived from 96 microsatellite genotypes for 69 dog
breeds (Parker et al. 2004)
Parker et al. 2004. Genetic Structure of the Purebred Domestic Dog. Science 304: 1160-1164.
A. Based on the phylogenetic tree above, mark the following statements True or False (5 points)
1.
2.
3.
4.
5.
The Akita is more closely related to the Siberian Husky than to the Afghan Hound. T F False
The Akita is more closely related to the Chow Chow than to the Shiba Inu. T F True
The Akita is more closely related to the Shiba Inu than to the Chinese Shar-Pei. T F True
The Basenji is more closely related to the Siberian Husky than to the Saluki. T F False
The Basenji is more closely related to the Saluki than to the Akita. T F True
B. Based on the figures on the previous page, propose a reasonable hypothesis for the origin and evolution of
the Alaskan Malamute (marked with circle). (5 points)
Based on the phylogenetic tree, it is reasonable to assume that the Alaskan Malamute is derived from the Gray
wolf. However, the results of the Structure analysis suggest that there has been recent admixture between the
modern gray wolf and another Malamute ancestor that is more closely-related to domesticated dog breeds.
Therefore, the Malamute breed probably results from recent hybridization, either through directed breeding
efforts or through secondary contact between feral dogs and wolves.
C. Why do you think the phylogenetic analysis failed to resolve the relationships among most of the highly
domesticated breeds? (5 points)
Domesticated dog breeds are probably highly similar across most of their genomes due to their rather recent
evolution from a common ancestor as well as the very specific breeding targets that drove the crossing efforts,
targeting relatively small portions of the dog genomes. Therefore, the most discriminant markers may not have
been used in this phylogenetic analysis, which was based on noncoding microsatellites. More importantly,
modern breeds are the result of extensive cross-hybridization among different lineages, which is difficult to
recreate with a simple, bifurcating tree such as that which results from a Neighbor-Joining analysis.
D. What additional analyses would you perform with this same set of genetic data to evaluate substructure and
relationships among the breeds? (5 points)
The first, and probably most important analysis, is to determine if Structure will detect further subdivision in
this dataset using k>2 a priori groups. The most likely number of groups could be determined from the
likelihood ratios derived from the mean posterior probabilities of each run, or by the delta-k method of Evanno
et al. 2005. After this initial run, subsequent runs of structure should be performed on each of the groups
identified in the full dataset. This procedure should be continued until k=1 is the best-supported grouping for
each subgroup. This would provide a much more fine-grained view of the relationships among the groups, and
in principle should be able to accommodate admixture among the breeds.
Many student answered this question by stating that they would perform some form of AMOVA on the breeds.
This approach is likely to be inferior to the phylogenetic approaches because it wouldn’t readily recapitulate the
hierarchical relationships among the breeds, and it would still be subject to the same limitations that are listed
above for the phylogenetic analysis.
Answers that suggested some form of additional data gathering received partial credit. This almost always helps
enhance resolution.
3. A Fish and Game warden detains a fisherman suspected of catching trout illegally in Blackwater Creek,
where fishing is prohibited. The fisherman claims to have caught the fish legally in a recreational lake that is
routinely stocked. The warden, having taken BIOL 464, decides to resolve the issue with microsatellites.
Based on the following profile, calculate an appropriate likelihood ratio. Be sure to state the two hypotheses
that you are comparing, and present a proper interpretation of the results. (15 points)
Locus
Fish Genotype
Blackwater Creek (native)
Allele Frequencies
0.68
0.68
0.59
0.32
0.69
0.31
1
121
121
2
135
139
3
148
154
See the Excel sheet for the calculations.
Tygart Lake (stocked) Allele
Frequencies
0.24
0.24
0.07
0.09
0.21
0.06
Hypothesis 1: The fish comes from the Blackwater Creek population.
Hypothesis 2: The fish comes from Tygart Lake (or some other location stocked from the same ponds)
Equations for determining the probability of observing a genotype in a given population:
Pk l  p
2
i l
Pk l  2 pil p jl
m
P   Pk
k 1
P(G|H1)= (0.68)2(2)(0.59)(0.32)(2)(0.69)(0.31) = 0.0747
P(G|H1)= (0.24)2(2)(0.07)(0.09)(2)(0.21)(0.06) = 1.83x10-5
L(H1, H 2 | G) = LR =
P(G | H1 )
0.0747
=
= 4084.1
P(G | H 2 ) 1.83x10-5
It approximately 4084 times more likely that the fish came from Blackwater Creek than from Tygart Lake.
Therefore, it appears that the angler should be fined for illegally catching protected fish.
4. Blackwater Creek in problem 3 has a highly degraded trout population due to historical overfishing and
habitat degradation. The current population size is estimated to be 50. Only two alleles are detected for
Locus 3, as shown in the table.
a. What is the probability that allele 148 will eventually become fixed in this population due to
genetic drift? (5 points)
This is simply the allele frequency: 0.69
u(q)  q0
b. What is the mean expected time to fixation for allele 148 in the Blackwater Creek
population? (10 points)
You can either use the diffusion approximation for this, or you can do the simulations in Populus, as we did in
the lab.
Diffusion approximation:
-4N(1- p)ln(1- p) -4(50)(1- 0.69)ln(1- 0.69)
=
=105.24
p
0.69
It will take approximately 105 generations for this allele to reach fixation.
T(p) =
c. What are the key assumptions of this calculation? What would cause underestimation of the
time to fixation? What could cause overestimation? (5 points)
The primary assumption is that census population size is a reasonable approximation of the drift
effective population size. This essentially means that Hardy-Weinberg conditions are met
currently (i.e., random mating, no immigration, no selection, equal sex ratio), and that there
haven’t been gigantic fluctuations in population size historically.
5. The following table contains data from locus 3 for 7 trout populations.
a. Assuming equal population sizes, estimate the proportion of genetic variation that is due to
differentiation among populations. (10 points)
Population
Blackwater Creek
Poplar Creek
Cheat River
Monongahela River
New River
HSi
0.47
0.29
0.40
0.48
0.45
Gauley River
Shavers Fork
0.47
0.31
HT = 0.42
HS 
D
1 m
D  HT  H S
GST  ST
( H Si ) ST

HT
m i 1
See excel sheet for the calculations.
HS=0.41
GST=0.023
Approximately 2.3% of the total genetic variation is due to differentiation among subpopulations
b. What can you infer about the degree of connectedness and genetic exchange among these
populations? (5 points)
The low amount of differentiation among populations suggests that there is a high degree of connectivity
historically. This may not be reflective of current amounts of genetic exchange, because there may not currently
be an equilibrium between genetic drift and mutation, a prerequisite for drawing inferences about connectedness
from population structure estimates.
c. Based on this calculation, would you revise your conclusions in problem 4? Why? (5 points)
Problem 4 should certainly be revisited because the drift calculation assumes that the creek is isolated from
other sources of genetic material. Fixation might be much slower than what we predicted based on an effective
population size of 50.
Short Answer. Graduate students answer any 3. Undergraduates answer any 2. Please aim for no more than a
paragraph (less than 150 words) per answer. (Grads: 10 points each; Undergrads: 15 points each)
6. Explain how genetic drift plays a central role in Wright’s Shifting Balance theory.
Wright’s shifting balance theory posits that populations are divided into isolated demes that can become
differentiated through random genetic drift at the many loci that make up the genome. These multi-locus
genotypes will therefore vary extensively and somewhat randomly, thereby sampling a varied fitness landscape.
Selection will tend to push demes up local adaptive peaks, while drift can allow demes to move off of local
peaks, cross fitness valleys, and subsequently reach other adaptive slopes. Selection will then push the deme to
a new peak. Low levels of gene exchange among demes will enable occasional bouts of mass selection, when
multilocus genotypes on particularly high adaptive peaks will sweep through the entire population, enabling
directional evolution of the entire population.
7. A. How does genetic drift affect the efficiency of selection in small populations?
Genetic drift has a two-fold effect on the efficiency of selection. On the one hand, alleles with higher marginal
fitness can be pushed to selection much more quickly in small populations than in large ones. However, if
4NeS<1, drift plays a predominant role and alleles with lower marginal fitness can become fixed by chance
alone, thereby controverting the effects of directional selection within a population.
B. How does the level of dominance affect this relationship?
The level of dominance will somewhat accentuate the effects of genetic drift by protecting alleles from selection
in heterozygotes, in cases where the allele with higher marginal fitness is also dominant. In cases where the
allele with lower marginal fitness is dominant, fixation should occur more quickly.
8. What is the difference between cladistic and phenetic approaches to phylogenetic analysis? What are the
advantages and disadvantages of each?
Cladistic approaches are essentially distance-based approaches to phylogenetic analysis that rely upon
construction of bifurcating trees that hierarchically group taxa that are progressively less closely related. The
advantages of this approach are that it is objective, typically relies upon a readily-understood quantitative model
and is generally computationally tractable.
Phenetic approaches typically have an underlying evolutionary model to represent the inheritance of characters
from common ancestors. Different tree topologies are compared based on the number of required character
state changes to arrive at the final state of the operational taxonomic units. The principle of parsimony drives
most methods, with the assumption being that the scenario requiring the smallest number of steps is the most
likely one. Phenetic approaches are favored by many because they have an underlying evolutionary model and
therefore explicitly represent a biological hypothesis. Evolutionary changes can be readily mapped onto
phenetic trees, and evolutionary hypotheses can be directly generated and assessed. A disadvantage is that the
methods are generally more computationally intensive than cladistics approaches, and they are perceived to be
subjective by some practitioners.
9. Compare and contrast coalescent and phylogenetic approaches to the study of evolution. How does one
depend on the other?
These approaches are highly related in that they both try to reconstruct ancestral relationships based on extant
taxa based on inference about common ancestors. Phylogenetic approaches reconstruct nodes of a phylogenetic
tree that represent putative ancestral states, and they typically rely upon multiple character states. Coalescent
trees typically refer to a specific molecule or entity, and they attempt to reconstruct demographic events that
lead back in time to the coalescence of different forms of that molecule or entity (usually alleles). In many ways
coalescent approaches are more flexible than phylogenetic approaches because they have explicit underlying
population models with parameters depicting historical demographic events. Phylogenetic approaches assume
coalescence of the measured character states are concordant with the evolutionary events represented in the
phylogeny of the taxa. This is not always the case, such as when gene trees do not match species trees due to
incomplete lineage sorting, essentially when coalescence occurs following population subdivision leading to the
evolution of the two taxa.
10. List at least three possible types of errors in paternity analysis, and explain how each of these can these
be overcome.
1. Genotyping Error causes false exclusion of the true father. This can be overcome by using likelihoodbased approaches that account for genotyping error by allowing some degree of mismatch between
paternal and offspring genotypes.
2. No males are compatible with the offspring. This could be caused by gene flow from unsampled males.
This can be circumvented by sampling more males until the true father is found.
3. Cryptic gene flow. Compatible males are identified within the sampled population, but the true father is
an unsampled male that happened to produce a gamete that is compatible with the genotypes of the
sampled males. This can be minimized by adding genetic power and sampling more males.
4. Multiple males are compatible. This can be circumvented by using more genetic markers and/or by
incorporating priors into the likelihood calculation that account for differential probability of paternity
(e.g., in the case of plants, fecundity and proximity are good predictors of paternity)
11. What does the following equation mean, what are its underlying assumptions, and what are some
alternatives to this approach?
Nm 
1 FST
4 FST
This equation relates the effective population size and the migration rate to the degree of population
differentiation. One underlying assumption is that there is an equilibrium between mutation and genetic drift,
such that genetic differentiation is primarily an inverse function of degree of connectedness. Another
assumption is that mutation is negligible and can be safely ignored. Finally, the simplified form of this equation
assumes that the migration rate is relatively low, such that the m2 and 2m terms can be disgregarded.
The approach is potentially flawed, primarily because the equilibrium assumption and island models often do
not hold in real populations. An alternative is to measure gene flow between populations directly, either using
parentage analysis techniques or some form of mark-recapture experiment. It also may be possible to examine
contemporary movement using population assignment techniques.
Download