BIOL 464/GEN 535 Population Genetics
Spring 2014
Exam # 1, 02/12/2014
Formulas and Definitions
Probability
Definition: Probability is a quantitative measure of one’s belief in the occurrence of a future event.
Sample-Point Method
The Sample-Point Method is a simple way to find the probability of an event. This method involves the following
conceptual steps:
1) Define the sample space S of an experiment by listing all sample points (i.e., all possible outcomes of the
experiment).
2) Assign probabilities P_i to all sample points in S, making sure that $\sum_i P_i = 1$.
3) Determine which sample points constitute the event of interest and sum their probabilities to find the
probability of that event. If all sample points have equal probabilities, the probability of event A can also be
calculated as P(A) = na / N, where na is the number of points constituting event A and N is the total number of
sample points.
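The three steps above can be sketched in Python. The example is an illustration, not part of the original handout: it assumes a cross between two A1A2 heterozygotes in which each of the four (maternal allele, paternal allele) combinations has probability 1/4, and the function name `sample_point_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def sample_point_probability(sample_space, event):
    """Sum the probabilities of the sample points that satisfy `event`.

    sample_space: dict mapping each sample point to its probability.
    event: function returning True for points that belong to the event.
    """
    # Step 2 check: the probabilities of all sample points must sum to 1.
    assert sum(sample_space.values()) == 1
    # Step 3: add up the probabilities of the points that make up the event.
    return sum(prob for point, prob in sample_space.items() if event(point))


# Step 1: list all sample points as (maternal allele, paternal allele).
# Assumed example: an A1A2 x A1A2 cross with equally likely gametes.
gametes = ["A1", "A2"]
space = {point: Fraction(1, 4) for point in product(gametes, gametes)}


def is_heterozygote(point):
    return point[0] != point[1]


print(sample_point_probability(space, is_heterozygote))  # 1/2
```

Because all four points are equally likely here, the same answer follows from P(A) = na / N = 2 / 4.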
Combinatorics
MN rule: The mn rule states that with m elements from one group and n elements from another group, it is possible to
form m × n pairs containing one element from each group.
Permutations: An ordered arrangement of distinct objects is called a permutation, and the total number of ways of
ordering n objects taken r at a time is:
$$P_r^n = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!},$$
where n! = n × (n-1) × (n-2) ×...× 2 × 1 (remember also that 0! = 1).
Combinations: Unordered sets of r elements chosen without replacement from n available elements are called
combinations, and the total number of combinations can be calculated using the formula:
$$C_r^n = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
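As a quick numerical check of the counting rules, the sketch below evaluates the permutation and combination formulas directly from factorials. The function names are hypothetical; the standard library functions `math.perm` and `math.comb` (Python 3.8 and later) should return the same values.

```python
import math


def permutations_count(n, r):
    """Number of ordered arrangements of r objects chosen from n: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)


def combinations_count(n, r):
    """Number of unordered sets of r objects chosen from n: n! / (r! (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


print(permutations_count(5, 2))  # 20
print(combinations_count(5, 2))  # 10
print(3 * 4)                     # mn rule: 3 and 4 elements give 12 pairs
```

Each combination of r objects can be ordered in r! ways, which is why the number of permutations is r! times the number of combinations.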
Union and Intersection of Events
Definition: The union of events A and B, denoted by A ∪ B, contains all sample points that fall within A or B or both
within A and B (i.e., union = A or B is true).
Definition: The intersection of events A and B, denoted by A ∩ B, contains all sample points that fall within both A
and B (i.e., intersection = A and B are both true).
Conditional Probability
The conditional probability of an event A, given that event B has occurred, can be found as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Independent Events and the Multiplicative Law of Probability
Definition: Two events A and B are said to be independent if P( A | B ) = P( A) or P( B | A) = P( B ). If events A and B
are independent:
P( A ∩ B ) = P( A) P( B ).
Mutually Exclusive Events and the Additive Law of Probability
Definition: Two events A and B are said to be mutually exclusive if P( A ∩ B ) = 0.
The probability of the union of two events can be calculated using the additive law of probability:
P( A ∪ B) = P( A) + P( B) − P( A ∩ B),
and if events A and B are mutually exclusive:
P( A ∪ B) = P( A) + P( B).
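The laws above can be checked on a small example. This sketch assumes a fair six-sided die (an illustration that is not in the handout), with A = "the roll is even" and B = "the roll is greater than 3"; the helper name `prob` is hypothetical.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                # sample space of a fair die
A = {x for x in outcomes if x % 2 == 0}    # event A: even roll
B = {x for x in outcomes if x > 3}         # event B: roll greater than 3


def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(outcomes))


# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Additive law: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Independence would require P(A and B) = P(A) P(B)
independent = prob(A & B) == prob(A) * prob(B)

print(p_a_given_b, p_union, independent)  # 2/3 2/3 False
```

Here P(A | B) = 2/3 differs from P(A) = 1/2, so A and B are not independent, and they are not mutually exclusive because P(A ∩ B) = 1/3 is not zero.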
Binomial Probability Distribution
Definition: A random variable is a quantity that can take any of a number of possible real values. A random variable
can be characterized by a probability distribution (a formula, a table, or a graph), which specifies the possible values of
the random variable, and gives the probability associated with each value.
The Binomial Probability Distribution can be used to characterize any random variable that satisfies all of the
following conditions:
1. The values taken by the variable depend on a number of independent and identical trials.
2. Each trial results in one of two outcomes (arbitrarily called success and failure).
3. The random variable of interest is the number of successes observed.
The Binomial Probability Distribution is given by the following formula:
$$P(Y = y) = \binom{n}{y} s^{y} f^{\,n-y},$$
where y is the number of successes, n is the number of trials, s is the probability of a success on each trial, and f = 1 − s is the probability of failure on each trial (remember also that $\binom{n}{y} = C_y^n = \frac{n!}{y!\,(n-y)!}$).
Estimating Allele Frequencies
Definition: A locus is codominant if both alleles (or their products) in a heterozygous individual can be detected. With
dominant loci, heterozygous individuals cannot be distinguished from individuals homozygous for the dominant allele.
Codominant loci: Let N11, N12, and N22 be the counts of genotypes A1A1, A1A2, and A2A2, respectively, from a
sample of N diploid individuals (N = N11 + N12 + N22). If p is the allele frequency of allele A1 and q is the allele
frequency of allele A2, these allele frequencies can be found as:
$$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N} = \frac{2N_{11} + N_{12}}{2N},$$
$$q = \frac{N_{22} + \frac{1}{2}N_{12}}{N} = \frac{2N_{22} + N_{12}}{2N}.$$
Based on the properties of the Binomial Probability Distribution, the standard errors of the allele frequency estimates
are:
$$SE_p = \sqrt{\frac{p(1-p)}{2N}},$$
$$SE_q = \sqrt{\frac{q(1-q)}{2N}}.$$
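The codominant estimates can be sketched as below. The function name and the genotype counts (30, 50, 20) are made up for illustration. Because p(1 − p) = q(1 − q) when q = 1 − p, the two standard errors are equal, so a single value is returned.

```python
import math


def codominant_allele_freqs(n11, n12, n22):
    """Estimate p, q, and their common standard error from genotype counts."""
    n = n11 + n12 + n22                    # number of diploid individuals
    p = (2 * n11 + n12) / (2 * n)          # frequency of allele A1
    q = (2 * n22 + n12) / (2 * n)          # frequency of allele A2
    se = math.sqrt(p * (1 - p) / (2 * n))  # binomial SE over 2N sampled alleles
    return p, q, se


# Hypothetical sample: 30 A1A1, 50 A1A2, 20 A2A2
p, q, se = codominant_allele_freqs(30, 50, 20)
print(p, q, round(se, 4))  # p = 0.55, q = 0.45, SE about 0.0352
```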
Dominant loci: When heterozygous individuals are indistinguishable from individuals homozygous for the dominant
allele, estimating allele frequencies is not so simple. If we assume that the population of interest is in Hardy-Weinberg
Equilibrium, the frequency of the recessive allele (A2) can be estimated as:
$$q = \sqrt{\frac{N_{22}}{N}},$$
and the frequency of allele A1 as p = 1 − q. The standard error for the frequency of A2 is:
$$SE_q = \sqrt{\frac{1-q^2}{4N}}.$$
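For a dominant locus, the same calculation can be sketched under the Hardy-Weinberg assumption stated above. The function name and the counts (16 recessive homozygotes among 100 individuals) are hypothetical.

```python
import math


def dominant_allele_freqs(n22, n):
    """Estimate allele frequencies at a dominant locus, assuming HWE.

    n22: number of recessive homozygotes (A2A2); n: total sample size.
    """
    q = math.sqrt(n22 / n)                   # frequency of recessive allele A2
    p = 1 - q                                # frequency of dominant allele A1
    se_q = math.sqrt((1 - q**2) / (4 * n))   # standard error of q
    return p, q, se_q


# Hypothetical sample: 16 recessive homozygotes out of 100 individuals
p, q, se_q = dominant_allele_freqs(16, 100)
print(round(p, 3), round(q, 3), round(se_q, 4))  # 0.6 0.4 0.0458
```

With the same N, this standard error is generally larger than the codominant one, because heterozygotes carry no directly observable information about A2.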
Hypothesis Testing
Definition: The P-value is the probability of obtaining a result as extreme as or more extreme than the one observed, if the null hypothesis is correct.
Testing for Departures from Hardy-Weinberg Equilibrium (HWE)
The Chi-square test statistic is calculated as:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$
where O and E are the observed and expected numbers of a particular genotype, and k is the number of possible genotypes. The degrees of freedom associated with the test are:
df = k – number of parameters estimated from the data – 1.
Remember that this test should always be done using the numbers of genotypes, not their frequencies. Also, as a rule
of thumb, this test is only appropriate when the expected number of genotypes in each category is greater than 5.
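A sketch of the test for one locus with two codominant alleles is given below. The function name and genotype counts are hypothetical. With k = 3 genotypes and one parameter (p) estimated from the data, df = 3 − 1 − 1 = 1. The P-value line uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(χ²/2)), so it applies to this df = 1 case only.

```python
import math


def hwe_chi_square(n11, n12, n22):
    """Chi-square test for departure from HWE at a two-allele codominant locus."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)   # allele frequency estimated from the data
    q = 1 - p
    observed = [n11, n12, n22]
    # Expected genotype numbers (not frequencies) under HWE
    expected = [n * p**2, n * 2 * p * q, n * q**2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1      # k - parameters estimated (p) - 1
    p_value = math.erfc(math.sqrt(chi2 / 2))  # valid for df = 1 only
    return chi2, df, p_value


# Hypothetical sample: 30 A1A1, 50 A1A2, 20 A2A2
chi2, df, p_value = hwe_chi_square(30, 50, 20)
print(round(chi2, 4), df, round(p_value, 2))  # 0.0102 1 0.92
```

In this made-up sample the expected numbers (30.25, 49.5, 20.25) all exceed 5, so the rule of thumb above is satisfied.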
Measuring Genetic Variation
Definition: Heterozygosity can be thought of as the probability that a randomly sampled individual will have two
different alleles at a given locus. The Observed Heterozygosity (HO) is simply the proportion of heterozygotes in a
sample. The Expected Heterozygosity (HE), which is the heterozygosity expected under HWE, is also called Gene
Diversity and can be calculated using the formula:
$$H_E = 1 - \sum_{i=1}^{n} p_i^2,$$
where $\sum_{i=1}^{n} p_i^2$ is the homozygosity expected under HWE, or the sum of the squared allele frequencies of all n alleles detected in a population. A slight correction to this formula is made to account for the sampling error in small samples, so the widely used formula for estimating gene diversity is:
$$H_E = \frac{2N}{2N-1}\left(1 - \sum_{i=1}^{n} p_i^2\right).$$
The Number of Alleles (n) is often used as a summary statistic of the amount of genetic variation in a sample, but a
more informative parameter is the Effective Number of Alleles (ne), which is the number of alleles a population would
have if all alleles had equal frequencies. The Effective Number of Alleles is measured as:
$$n_e = \frac{1}{\sum_{i=1}^{n} p_i^2},$$
where the denominator is the homozygosity expected under HWE.
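Gene diversity and the effective number of alleles can be sketched as follows. The function names and the allele frequencies are hypothetical; passing a sample size N applies the small-sample correction 2N / (2N − 1).

```python
def gene_diversity(freqs, n_individuals=None):
    """Expected heterozygosity, optionally corrected for sample size N."""
    homozygosity = sum(p**2 for p in freqs)  # expected homozygosity under HWE
    h_e = 1 - homozygosity
    if n_individuals is not None:
        two_n = 2 * n_individuals
        h_e *= two_n / (two_n - 1)           # small-sample correction
    return h_e


def effective_number_of_alleles(freqs):
    """Number of equally frequent alleles giving the same homozygosity."""
    return 1 / sum(p**2 for p in freqs)


freqs = [0.5, 0.3, 0.2]                      # hypothetical allele frequencies
print(round(gene_diversity(freqs), 4))               # 0.62
print(round(gene_diversity(freqs, 50), 4))           # 0.6263
print(round(effective_number_of_alleles(freqs), 4))  # 2.6316
```

With three alleles at unequal frequencies, n_e is about 2.63, below the observed n = 3; the two are equal only when all alleles have the same frequency.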
Estimating Inbreeding from Population Data
Definition: The inbreeding coefficient (f) measures the probability that two alleles at a randomly chosen locus in an
individual are Identical By Descent (IBD).
The genotype frequencies in a population experiencing inbreeding are:
$$D = p^2 + pqf,$$
$$H = 2pq - 2pqf = 2pq(1-f),$$
$$R = q^2 + pqf.$$
From the formula for the frequency of heterozygotes, we can express the level of inbreeding as:
$$f = 1 - \frac{H}{2pq}.$$
Thus, assuming that departures from Hardy-Weinberg Equilibrium (HWE) are caused entirely by inbreeding, the level
of inbreeding in that population can be estimated from the fixation index:
$$F = 1 - \frac{H_O}{H_E} = \frac{H_E - H_O}{H_E},$$
where HO is the observed heterozygosity and HE is the expected heterozygosity.
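The relationship between the genotype frequencies and the fixation index can be checked with a short round trip: generate D, H, and R for a chosen f, then recover f from the heterozygosity. Function names and input values are hypothetical.

```python
def genotype_freqs_with_inbreeding(p, f):
    """Genotype frequencies (D, H, R) for allele frequency p and inbreeding f."""
    q = 1 - p
    d = p**2 + p * q * f        # A1A1
    h = 2 * p * q * (1 - f)     # A1A2
    r = q**2 + p * q * f        # A2A2
    return d, h, r


def fixation_index(h_obs, h_exp):
    """F = (H_E - H_O) / H_E"""
    return (h_exp - h_obs) / h_exp


p, f = 0.6, 0.25                         # hypothetical values
d, h, r = genotype_freqs_with_inbreeding(p, f)
h_exp = 2 * p * (1 - p)                  # expected heterozygosity under HWE
print(round(d + h + r, 10))              # 1.0
print(round(fixation_index(h, h_exp), 10))  # 0.25
```

Inbreeding changes genotype frequencies but not allele frequencies, so D + H + R still equals 1 and the recovered F matches the f that was put in.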
Inbreeding Equilibrium in Organisms with Mixed Mating Systems
Heterozygosity is halved with each generation of strict self-fertilization:
$$H_t = \frac{1}{2} H_{t-1} = \left(\frac{1}{2}\right)^{t} H_0.$$
In organisms with mixed mating systems, a proportion (S) of the progeny is produced by self-fertilization, whereas the
remaining progeny (T = 1 − S) are produced through outcrossing. The heterozygosity after one generation of mixed
mating will be:
$$H_t = T \cdot 2pq + S\,\frac{H_{t-1}}{2}.$$
When inbreeding equilibrium is reached, there is no change in heterozygosity (Ht = Ht − 1 = Heq). The equilibrium
heterozygosity in organisms with mixed mating systems is:
$$H_{eq} = \frac{4pqT}{2-S} = \frac{4pq(1-S)}{2-S} \quad \text{and} \quad P_{eq} = p^2 + \frac{Spq}{2-S}.$$
The equilibrium level of inbreeding can then be calculated as:
$$f_{eq} = \frac{S}{2-S}.$$
Therefore, assuming that a population is at inbreeding equilibrium and the departures from HWE are caused entirely
by self-fertilization, the rate of self-fertilization can be estimated from the relationship:
$$F = \frac{H_E - H_O}{H_E} = \frac{S}{2-S}.$$
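The approach to inbreeding equilibrium can be sketched by iterating the mixed-mating recursion and comparing the result with the closed-form equilibrium. The last function rearranges F = S / (2 − S) into S = 2F / (1 + F); that rearrangement is not written out in the handout but follows algebraically. All function names and input values are hypothetical.

```python
def iterate_heterozygosity(h0, p, s, generations):
    """Apply H_t = T * 2pq + S * H_(t-1) / 2 for a number of generations."""
    q = 1 - p
    t = 1 - s                       # outcrossing rate T
    h = h0
    for _ in range(generations):
        h = t * 2 * p * q + s * h / 2
    return h


def equilibrium_heterozygosity(p, s):
    """H_eq = 4pq(1 - S) / (2 - S)"""
    q = 1 - p
    return 4 * p * q * (1 - s) / (2 - s)


def selfing_rate_from_f(f):
    """Solve F = S / (2 - S) for S, which gives S = 2F / (1 + F)."""
    return 2 * f / (1 + f)


p, s = 0.5, 0.5                     # hypothetical allele frequency and selfing rate
print(round(iterate_heterozygosity(0.5, p, s, 50), 6))  # 0.333333
print(round(equilibrium_heterozygosity(p, s), 6))       # 0.333333
print(round(selfing_rate_from_f(1 / 3), 6))             # 0.5
```

The distance from equilibrium shrinks by a factor of S/2 each generation, so the recursion converges quickly unless S is close to 1.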
Estimating Inbreeding and Kinship from Pedigrees
Chain-Counting Technique: A chain for a given common ancestor starts with one parent of the individual for which f
is estimated, goes up to the common ancestor, and comes back down to the other parent. The inbreeding coefficient is:
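$$f = \left(\frac{1}{2}\right)^{N},$$
where N is the number of individuals in the chain, counting both parents and the common ancestor.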
When there are multiple common ancestors, f is the sum over the number of different chains (m):
$$f = \sum_{i=1}^{m} \left(\frac{1}{2}\right)^{N_i}.$$
Finally, when the common ancestors are themselves inbred, the inbreeding coefficient is calculated as:
$$f = \sum_{i=1}^{m} \left(\frac{1}{2}\right)^{N_i} \left(1 + f_{CA(i)}\right),$$
where fCA(i) is the inbreeding coefficient of the i-th common ancestor.
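The chain-counting formula can be sketched as below. This assumes that the exponent for each chain is the number of individuals in that chain, counting both parents and the common ancestor; the function name and the pedigrees are illustrative only.

```python
def inbreeding_from_chains(chains):
    """Sum (1/2)**N_i * (1 + f_CA(i)) over all chains.

    chains: list of (n_individuals_in_chain, f_of_common_ancestor) pairs.
    Assumption: N_i counts both parents and the common ancestor.
    """
    return sum(0.5**n * (1 + f_ca) for n, f_ca in chains)


# Offspring of half sibs: one common ancestor, chain of 3 individuals.
print(inbreeding_from_chains([(3, 0.0)]))            # 0.125

# Offspring of full sibs: two common ancestors, each with a chain of 3.
print(inbreeding_from_chains([(3, 0.0), (3, 0.0)]))  # 0.25

# Offspring of first cousins: two common ancestors, each with a chain of 5.
print(inbreeding_from_chains([(5, 0.0), (5, 0.0)]))  # 0.0625
```

The same function gives the kinship coefficient of a pair when it is applied to the chains of their hypothetical offspring.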
The kinship coefficient for a pair of individuals is the inbreeding coefficient of their hypothetical offspring. This is the
probability that an allele sampled from each will be identical by descent.
Relatedness is twice the kinship coefficient, and estimates the fraction of alleles that are identical by descent.
Relatedness can be estimated using molecular markers in natural populations, based on sharing of alleles between
pairs of individuals. These estimates must be corrected for allele frequencies to compensate for errors due to identity
by state.
Calculating Changes in Allele Frequencies after Selection
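$$\Delta p = p' - p = \frac{pq\,[\,p(\omega_{11} - \omega_{12}) - q(\omega_{22} - \omega_{12})\,]}{\bar{\omega}},$$
$$\Delta q = q' - q = -\Delta p = \frac{pq\,[\,q(\omega_{22} - \omega_{12}) - p(\omega_{11} - \omega_{12})\,]}{\bar{\omega}},$$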
where p and p’ are the frequencies of allele A1 before and after selection, q and q’ are the frequencies of allele A2
before and after selection, and the remaining part of the notation is defined in the following table:
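Genotype | A1A1 | A1A2 | A2A2 | Sum
Fitness | $\omega_{11}$ | $\omega_{12}$ | $\omega_{22}$ |
Frequency before selection | $p^2$ | $2pq$ | $q^2$ | 1
Frequency × Fitness | $p^2\omega_{11}$ | $2pq\,\omega_{12}$ | $q^2\omega_{22}$ | $\bar{\omega}$
Frequency after selection | $p^2\omega_{11}/\bar{\omega}$ | $2pq\,\omega_{12}/\bar{\omega}$ | $q^2\omega_{22}/\bar{\omega}$ | 1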
A convenient way to express the fitness coefficients of the three genotypes in terms of the selection coefficient (s) and
the level of dominance (h) is:
Genotype | A1A1 | A1A2 | A2A2
Fitness | $\omega_{11}$ | $\omega_{12}$ | $\omega_{22}$
Fitness in terms of s and h | 1 | 1 − hs | 1 − s
If we substitute the fitness coefficients expressed in terms of s and h into the equations describing the change of allele
frequencies after selection, these equations simplify to:
$$\Delta p = \frac{pqs\,[\,ph + q(1-h)\,]}{1 - 2pqhs - q^2 s},$$
$$\Delta q = \frac{-pqs\,[\,ph + q(1-h)\,]}{1 - 2pqhs - q^2 s}.$$
Forms of Selection: (where A1 is defined as the allele that confers higher fitness in a two-allele system)
h=0; Recessive deleterious
h=1; Dominant deleterious
0<h<1; Partial dominance, incomplete dominance, or general dominance
h=0.5; Additivity
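The general and simplified expressions for Δp can be compared numerically. In this sketch the function names and the values p = 0.3, s = 0.1, h = 0.5 are hypothetical; the fitnesses 1, 1 − hs, and 1 − s are substituted into the general formula to confirm that both forms agree.

```python
def delta_p_general(p, w11, w12, w22):
    """Change in the frequency of A1 after one generation of selection."""
    q = 1 - p
    # Mean fitness: sum of (frequency x fitness) over the three genotypes
    w_bar = p**2 * w11 + 2 * p * q * w12 + q**2 * w22
    return p * q * (p * (w11 - w12) - q * (w22 - w12)) / w_bar


def delta_p_from_s_h(p, s, h):
    """Same change, with fitnesses written as 1, 1 - hs, 1 - s."""
    q = 1 - p
    return p * q * s * (p * h + q * (1 - h)) / (1 - 2 * p * q * h * s - q**2 * s)


p, s, h = 0.3, 0.1, 0.5                      # hypothetical values (additive case)
general = delta_p_general(p, 1, 1 - h * s, 1 - s)
simplified = delta_p_from_s_h(p, s, h)
print(round(general, 6), round(simplified, 6))  # 0.01129 0.01129
```

For 0 ≤ h ≤ 1 and s > 0 the numerator is positive, so the favored allele A1 increases in frequency whenever both alleles are present.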
Overdominance
To describe overdominance (i.e., advantage of heterozygotes over all homozygotes), a more convenient way to express
the fitness coefficients of the genotypes at a locus with two alleles is:
Genotype | A1A1 | A1A2 | A2A2
Fitness | $\omega_{11}$ | $\omega_{12}$ | $\omega_{22}$
Fitness in terms of s1 and s2 | 1 − s1 | 1 | 1 − s2
where s1 and s2 are the selection disadvantages of A1A1 and A2A2 with respect to A1A2.
In terms of h and s, overdominance occurs when h<0. In this case, the change in allele frequency is
$$\Delta q = \frac{pq\,[\,s_1 p - s_2 q\,]}{1 - s_1 p^2 - s_2 q^2}.$$
Setting Δq = 0 (that is, $s_1 p = s_2 q$) gives the equilibrium frequencies of A2 and A1:
$$q_{eq} = \frac{s_1}{s_1 + s_2}, \qquad p_{eq} = \frac{s_2}{s_1 + s_2}.$$
Relative to the equilibrium point, the change in allele frequency becomes:
$$\Delta q = \frac{-pq\,(s_1 + s_2)(q - q_{eq})}{\bar{\omega}},$$
so that when $q < q_{eq}$, Δq is positive, and when $q > q_{eq}$, Δq is negative. Selection moves the allele frequency toward equilibrium in both cases.
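The stability of the overdominant equilibrium can be illustrated by iterating Δq from starting points on either side of it. The sketch uses the value of q at which the Δq expression above equals zero, s1 / (s1 + s2); the function names and the coefficients s1 = 0.2 and s2 = 0.3 are hypothetical.

```python
def delta_q_overdominance(q, s1, s2):
    """Change in the frequency of A2 with fitnesses 1 - s1, 1, 1 - s2."""
    p = 1 - q
    w_bar = 1 - s1 * p**2 - s2 * q**2   # mean fitness
    return p * q * (s1 * p - s2 * q) / w_bar


def iterate_overdominance(q0, s1, s2, generations):
    """Apply the recursion q' = q + delta_q for a number of generations."""
    q = q0
    for _ in range(generations):
        q += delta_q_overdominance(q, s1, s2)
    return q


s1, s2 = 0.2, 0.3                        # hypothetical selection coefficients
q_zero = s1 / (s1 + s2)                  # value of q where delta_q = 0
print(round(q_zero, 6))                                      # 0.4
print(round(iterate_overdominance(0.1, s1, s2, 500), 6))     # 0.4
print(round(iterate_overdominance(0.9, s1, s2, 500), 6))     # 0.4
```

Both trajectories end at the same internal frequency, which is what makes overdominance a mechanism that maintains both alleles in the population.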
Underdominance
To describe underdominance (i.e., advantage of both homozygotes over heterozygotes), a more convenient way to
express the fitness coefficients of the genotypes at a locus with two alleles is:
Genotype | A1A1 | A1A2 | A2A2
Fitness | $\omega_{11}$ | $\omega_{12}$ | $\omega_{22}$
Fitness in terms of s1 and s2 | 1 + s1 | 1 | 1 + s2
where s1 and s2 are the selection advantages of A1A1 and A2A2 over A1A2.
In terms of h and s, underdominance occurs when h>1. In this case, the change in allele frequency is
$$\Delta q = \frac{pq\,[\,s_2 q - s_1 p\,]}{1 + s_1 p^2 + s_2 q^2}.$$
Setting Δq = 0 (that is, $s_2 q = s_1 p$) gives the equilibrium frequencies of A2 and A1:
$$q_{eq} = \frac{s_1}{s_1 + s_2}, \qquad p_{eq} = \frac{s_2}{s_1 + s_2}.$$
Relative to the equilibrium point, the change in allele frequency becomes:
$$\Delta q = \frac{pq\,(s_1 + s_2)(q - q_{eq})}{\bar{\omega}},$$
so that when $q < q_{eq}$, Δq is negative, and when $q > q_{eq}$, Δq is positive. Selection moves the allele frequency away from equilibrium in both cases.
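The instability of the underdominant equilibrium can be sketched in the same way. Starting just below or just above the value of q at which the Δq expression above equals zero, s1 / (s1 + s2), the allele A2 is lost or fixed. Function names and the coefficients s1 = 0.2 and s2 = 0.3 are hypothetical.

```python
def delta_q_underdominance(q, s1, s2):
    """Change in the frequency of A2 with fitnesses 1 + s1, 1, 1 + s2."""
    p = 1 - q
    w_bar = 1 + s1 * p**2 + s2 * q**2   # mean fitness
    return p * q * (s2 * q - s1 * p) / w_bar


def iterate_underdominance(q0, s1, s2, generations):
    """Apply the recursion q' = q + delta_q for a number of generations."""
    q = q0
    for _ in range(generations):
        q += delta_q_underdominance(q, s1, s2)
    return q


s1, s2 = 0.2, 0.3                        # hypothetical selection coefficients
# delta_q = 0 at q = s1 / (s1 + s2) = 0.4; start slightly on either side.
print(round(iterate_underdominance(0.39, s1, s2, 1000), 6))  # 0.0
print(round(iterate_underdominance(0.41, s1, s2, 1000), 6))  # 1.0
```

Which allele is fixed therefore depends on the starting frequency rather than on which homozygote has the higher fitness.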