BIOL 464/GEN 535 Population Genetics, Spring 2014
Exam #1, 02/12/2014
Formulas and Definitions

Probability

Definition: Probability is a quantitative measure of one's belief in the occurrence of a future event.

Sample-Point Method

The Sample-Point Method is a simple way to find the probability of an event. This method involves the following conceptual steps:

1) Define the sample space $S$ of an experiment by listing all sample points (i.e., all possible outcomes of the experiment).
2) Assign probabilities $P_i$ to all sample points in $S$, making sure that $\sum_i P_i = 1$.
3) Determine which sample points constitute the event of interest and sum their probabilities to find the probability of that event.

If all sample points have equal probabilities, the probability of event $A$ can also be calculated as $P(A) = n_a / N$, where $n_a$ is the number of points constituting event $A$ and $N$ is the total number of sample points.

Combinatorics

MN rule: The mn rule states that with $m$ elements from one group and $n$ elements from another group, it is possible to form $m \times n$ pairs containing one element from each group.

Permutations: An ordered arrangement of distinct objects is called a permutation, and the total number of ways of ordering $n$ objects taken $r$ at a time is:

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!},$$

where $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$ (remember also that $0! = 1$).

Combinations: Unordered sets of $r$ elements chosen without replacement from $n$ available elements are called combinations, and the total number of combinations can be calculated using the formula:

$$C_r^n = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$

Union and Intersection of Events

Definition: The union of events $A$ and $B$, denoted by $A \cup B$, contains all sample points that fall within $A$, within $B$, or within both (i.e., union = $A$ or $B$ is true).

Definition: The intersection of events $A$ and $B$, denoted by $A \cap B$, contains all sample points that fall within both $A$ and $B$ (i.e., intersection = $A$ and $B$ are both true).
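The Sample-Point Method, the counting rules, and the union and intersection definitions above can be checked on a small example. The sketch below is illustrative only: the Aa x Aa cross, the event names, and the helper functions are assumptions chosen for this example and are not part of the formula sheet. It assumes all four gamete pairs are equally likely.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def probability(event, sample_space):
    """P(A) = n_a / N, valid only when all sample points are equally likely."""
    return Fraction(len(event), len(sample_space))


def permutations_count(n, r):
    """Number of ordered arrangements of n objects taken r at a time."""
    return factorial(n) // factorial(n - r)


def combinations_count(n, r):
    """Number of unordered sets of r elements chosen from n without replacement."""
    return factorial(n) // (factorial(r) * factorial(n - r))


# Step 1: sample space for a hypothetical Aa x Aa cross (one gamete per parent).
# By the mn rule there are 2 x 2 = 4 (maternal, paternal) gamete pairs.
sample_space = set(product("Aa", repeat=2))

# Step 2: each point is assumed to have probability 1/4 (equal segregation).

# Step 3: events are subsets of the sample space.
heterozygote = {pt for pt in sample_space if pt[0] != pt[1]}
carries_a = {pt for pt in sample_space if "a" in pt}

p_het = probability(heterozygote, sample_space)
p_union = probability(heterozygote | carries_a, sample_space)         # A or B
p_intersection = probability(heterozygote & carries_a, sample_space)  # A and B

print(p_het, p_union, p_intersection)
print(permutations_count(5, 2), combinations_count(5, 2))
```

Representing events as Python sets makes the union and intersection literal set operations, which keeps the code close to the definitions.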
Conditional Probability

The conditional probability of an event $A$, given that event $B$ has occurred, can be found as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Independent Events and the Multiplicative Law of Probability

Definition: Two events $A$ and $B$ are said to be independent if $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$.

If events $A$ and $B$ are independent:

$$P(A \cap B) = P(A)\,P(B).$$

Mutually Exclusive Events and the Additive Law of Probability

Definition: Two events $A$ and $B$ are said to be mutually exclusive if $P(A \cap B) = 0$.

The probability of the union of two events can be calculated using the additive law of probability:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B),$$

and if events $A$ and $B$ are mutually exclusive:

$$P(A \cup B) = P(A) + P(B).$$

Binomial Probability Distribution

Definition: A random variable is a quantity that can take any of a number of possible real values. A random variable can be characterized by a probability distribution (a formula, a table, or a graph), which specifies the possible values of the random variable and gives the probability associated with each value.

The Binomial Probability Distribution can be used to characterize any random variable that satisfies all of the following conditions:

1. The values taken by the variable depend on a number of independent and identical trials.
2. Each trial results in one of two outcomes (arbitrarily called success and failure).
3. The random variable of interest is the number of successes observed.

The Binomial Probability Distribution is given by the following formula:

$$P(Y = y) = \binom{n}{y} s^{y} f^{\,n-y},$$

where $y$ is the number of successes, $n$ is the number of trials, $s$ is the probability of a success on each trial, and $f = 1 - s$ is the probability of failure on each trial (remember also that $\binom{n}{y} = C_y^n = \frac{n!}{y!\,(n-y)!}$).

Estimating Allele Frequencies

Definition: A locus is codominant if both alleles (or their products) in a heterozygous individual can be detected.
With dominant loci, heterozygous individuals cannot be distinguished from individuals homozygous for the dominant allele.

Codominant loci: Let $N_{11}$, $N_{12}$, and $N_{22}$ be the counts of genotypes $A_1A_1$, $A_1A_2$, and $A_2A_2$, respectively, from a sample of $N$ diploid individuals ($N = N_{11} + N_{12} + N_{22}$). If $p$ is the allele frequency of allele $A_1$ and $q$ is the allele frequency of allele $A_2$, these allele frequencies can be found as:

$$p = \frac{N_{11} + \frac{1}{2}N_{12}}{N} = \frac{2N_{11} + N_{12}}{2N},$$

$$q = \frac{N_{22} + \frac{1}{2}N_{12}}{N} = \frac{2N_{22} + N_{12}}{2N}.$$

Based on the properties of the Binomial Probability Distribution, the standard errors of the allele frequency estimates are:

$$SE_p = \sqrt{\frac{p(1-p)}{2N}}, \qquad SE_q = \sqrt{\frac{q(1-q)}{2N}}.$$

Dominant loci: When heterozygous individuals are indistinguishable from individuals homozygous for the dominant allele, estimating allele frequencies is not so simple. If we assume that the population of interest is in Hardy-Weinberg Equilibrium, the frequency of the recessive allele ($A_2$) can be estimated as:

$$q = \sqrt{\frac{N_{22}}{N}},$$

and the frequency of allele $A_1$ as $p = 1 - q$. The standard error for the frequency of $A_2$ is:

$$SE_q = \sqrt{\frac{1 - q^2}{4N}}.$$

Hypothesis Testing

Definition: The P-value is the probability of obtaining a result as extreme as or more extreme than the one observed, if the null hypothesis is correct.

Testing for Departures from Hardy-Weinberg Equilibrium (HWE)

The Chi-square test statistic is calculated as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ and $E_i$ are the observed and expected numbers of a particular genotype, and $k$ is the number of possible genotypes. The degrees of freedom associated with the test are:

df = $k$ - (number of parameters estimated from the data) - 1.

Remember that this test should always be done using the numbers of genotypes, not their frequencies. Also, as a rule of thumb, this test is only appropriate when the expected number of genotypes in each category is greater than 5.
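The allele frequency estimators and the Chi-square test for HWE above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the genotype counts (30, 40, 30) are hypothetical, the locus has two alleles, and one parameter ($p$) is estimated from the data, so df = 3 - 1 - 1 = 1.

```python
from math import sqrt


def codominant_freqs(n11, n12, n22):
    """Return (p, q, SE) from genotype counts at a codominant two-allele locus."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)
    q = (2 * n22 + n12) / (2 * n)
    se = sqrt(p * q / (2 * n))  # SE_p equals SE_q because p(1-p) = q(1-q)
    return p, q, se


def dominant_freqs(n22, n):
    """Return (p, q, SE_q) at a dominant locus, assuming HWE."""
    q = sqrt(n22 / n)
    se_q = sqrt((1 - q * q) / (4 * n))
    return 1 - q, q, se_q


def hwe_chi_square(n11, n12, n22):
    """Chi-square statistic for departure from HWE, computed on counts."""
    n = n11 + n12 + n22
    p, q, _ = codominant_freqs(n11, n12, n22)
    observed = [n11, n12, n22]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1  # k genotypes, one estimated parameter (p)
    return chi2, df, expected


# Hypothetical sample of 100 individuals.
chi2, df, expected = hwe_chi_square(30, 40, 30)
print(codominant_freqs(30, 40, 30))
print(expected, chi2, df)
```

With these counts the expected numbers are 25, 50, and 25, all above the rule-of-thumb threshold of 5, and the statistic is 4.0 on 1 degree of freedom, which exceeds the conventional 0.05 critical value of 3.84.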
Measuring Genetic Variation

Definition: Heterozygosity can be thought of as the probability that a randomly sampled individual will have two different alleles at a given locus.

The Observed Heterozygosity ($H_O$) is simply the proportion of heterozygotes in a sample.

The Expected Heterozygosity ($H_E$), which is the heterozygosity expected under HWE, is also called Gene Diversity and can be calculated using the formula:

$$H_E = 1 - \sum_{i=1}^{n} p_i^2,$$

where $\sum_{i=1}^{n} p_i^2$ is the homozygosity expected under HWE, or the sum of the squared allele frequencies of all $n$ alleles detected in a population. A slight correction to this formula is made to account for the sampling error in small samples, so the widely used formula for estimating gene diversity is:

$$H_E = \frac{2N}{2N - 1}\left(1 - \sum_{i=1}^{n} p_i^2\right).$$

The Number of Alleles ($n$) is often used as a summary statistic of the amount of genetic variation in a sample, but a more informative parameter is the Effective Number of Alleles ($n_e$), which is the number of equally frequent alleles that would give the same expected homozygosity as that observed in the population. The Effective Number of Alleles is measured as:

$$n_e = \frac{1}{\sum_{i=1}^{n} p_i^2},$$

where the denominator is the homozygosity expected under HWE.

Estimating Inbreeding from Population Data

Definition: The inbreeding coefficient ($f$) measures the probability that two alleles at a randomly chosen locus in an individual are Identical By Descent (IBD).

The genotype frequencies in a population experiencing inbreeding are:

$$D = p^2 + pqf,$$
$$H = 2pq - 2pqf = 2pq(1 - f),$$
$$R = q^2 + pqf.$$

From the formula for the frequency of heterozygotes, we can express the level of inbreeding as:

$$f = 1 - \frac{H}{2pq}.$$

Thus, assuming that departures from Hardy-Weinberg Equilibrium (HWE) are caused entirely by inbreeding, the level of inbreeding in that population can be estimated from the fixation index:

$$F = 1 - \frac{H_O}{H_E} = \frac{H_E - H_O}{H_E},$$

where $H_O$ is the observed heterozygosity and $H_E$ is the expected heterozygosity.
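The diversity statistics and the fixation index above reduce to a few arithmetic operations on allele frequencies. The following is a minimal sketch: the function names, the three allele frequencies, and the observed heterozygosity are hypothetical values chosen for illustration.

```python
def expected_heterozygosity(freqs, n_individuals=None):
    """Gene diversity H_E = 1 - sum(p_i^2).

    If n_individuals (N diploid individuals) is given, apply the
    small-sample correction 2N / (2N - 1).
    """
    h_e = 1 - sum(p * p for p in freqs)
    if n_individuals is not None:
        two_n = 2 * n_individuals
        h_e *= two_n / (two_n - 1)
    return h_e


def effective_number_of_alleles(freqs):
    """n_e = 1 / sum(p_i^2), the reciprocal of the expected homozygosity."""
    return 1 / sum(p * p for p in freqs)


def fixation_index(h_obs, h_exp):
    """F = (H_E - H_O) / H_E."""
    return (h_exp - h_obs) / h_exp


# Hypothetical locus with three alleles.
freqs = [0.5, 0.3, 0.2]
h_e = expected_heterozygosity(freqs)
n_e = effective_number_of_alleles(freqs)
f_hat = fixation_index(h_obs=0.31, h_exp=h_e)
print(h_e, n_e, f_hat)
```

Here three alleles are present but $n_e$ is about 2.63, because the alleles are not equally frequent. An observed heterozygosity of half the expected value gives $F$ close to 0.5.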
Inbreeding Equilibrium in Organisms with Mixed Mating Systems

Heterozygosity is halved with each generation of strict self-fertilization:

$$H_t = \frac{1}{2} H_{t-1} = \left(\frac{1}{2}\right)^{t} H_0.$$

In organisms with mixed mating systems, a proportion ($S$) of the progeny is produced by self-fertilization, whereas the remaining progeny ($T = 1 - S$) are produced through outcrossing. The heterozygosity after one generation of mixed mating will be:

$$H_t = T\,2pq + S\,\frac{H_{t-1}}{2}.$$

When inbreeding equilibrium is reached, there is no change in heterozygosity ($H_t = H_{t-1} = H_{eq}$). The equilibrium heterozygosity in organisms with mixed mating systems is:

$$H_{eq} = \frac{T\,4pq}{2 - S} = \frac{(1 - S)\,4pq}{2 - S},$$

and the equilibrium frequency of the $A_1A_1$ homozygote is:

$$P_{eq} = p^2 + \frac{Spq}{2 - S}.$$

The equilibrium level of inbreeding can then be calculated as:

$$f_{eq} = \frac{S}{2 - S}.$$

Therefore, assuming that a population is at inbreeding equilibrium and the departures from HWE are caused entirely by self-fertilization, the rate of self-fertilization can be estimated from the relationship:

$$F = \frac{H_E - H_O}{H_E} = \frac{S}{2 - S},$$

which rearranges to $S = 2F / (1 + F)$.

Estimating Inbreeding and Kinship from Pedigrees

Chain-Counting Technique: A chain for a given common ancestor starts with one parent of the individual for which $f$ is estimated, goes up to the common ancestor, and comes back down to the other parent. The inbreeding coefficient is:

$$f = \left(\frac{1}{2}\right)^{N},$$

where $N$ is the number of individuals in the chain. When there are multiple common ancestors, $f$ is the sum over the number of different chains ($m$):

$$f = \sum_{i=1}^{m} \left(\frac{1}{2}\right)^{N_i}.$$

Finally, when the common ancestors are themselves inbred, the inbreeding coefficient is calculated as:

$$f = \sum_{i=1}^{m} \left(\frac{1}{2}\right)^{N_i} \left(1 + f_{CA(i)}\right),$$

where $f_{CA(i)}$ is the inbreeding coefficient of the $i$-th common ancestor.

The kinship coefficient for a pair of individuals is the inbreeding coefficient of their hypothetical offspring. This is the probability that two alleles, one sampled from each individual, are identical by descent.

Relatedness is twice the kinship coefficient, and estimates the fraction of alleles that are identical by descent.
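The chain-counting sum and the selfing relationships above can be worked through with exact fractions. This is a sketch under an explicit assumption: $N_i$ is taken as the number of individuals in each chain, counting both parents and the common ancestor. The function names and the full-sib example are illustrative and are not part of the sheet.

```python
from fractions import Fraction


def chain_inbreeding(chains):
    """Sum (1/2)^N_i * (1 + f_CA(i)) over chains.

    chains: iterable of (n_individuals_in_chain, f_of_common_ancestor).
    Assumption: N_i counts every individual in the chain, including
    both parents and the common ancestor.
    """
    return sum(Fraction(1, 2) ** n * (1 + f_ca) for n, f_ca in chains)


def equilibrium_inbreeding(selfing_rate):
    """f_eq = S / (2 - S)."""
    return selfing_rate / (2 - selfing_rate)


def selfing_rate_from_f(f):
    """Rearrangement of F = S / (2 - S): S = 2F / (1 + F)."""
    return 2 * f / (1 + f)


# Hypothetical offspring of full sibs: two non-inbred common ancestors
# (the grandparents), each chain being sib - grandparent - sib (N = 3).
kinship_full_sibs = chain_inbreeding([(3, 0), (3, 0)])
relatedness_full_sibs = 2 * kinship_full_sibs

# Half of the progeny produced by selfing.
f_eq = equilibrium_inbreeding(Fraction(1, 2))
print(kinship_full_sibs, relatedness_full_sibs, f_eq, selfing_rate_from_f(f_eq))
```

The full-sib case gives a kinship coefficient of 1/4 and a relatedness of 1/2. A selfing rate of 1/2 gives $f_{eq} = 1/3$, and feeding that value back into the rearranged formula recovers $S = 1/2$.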
Relatedness can be estimated using molecular markers in natural populations, based on sharing of alleles between pairs of individuals. These estimates must be corrected for allele frequencies to compensate for errors due to identity by state.

Calculating Changes in Allele Frequencies after Selection

$$\Delta p = p' - p = \frac{pq\,[\,p(\omega_{11} - \omega_{12}) - q(\omega_{22} - \omega_{12})\,]}{\bar{\omega}},$$

$$\Delta q = q' - q = -\Delta p = \frac{pq\,[\,q(\omega_{22} - \omega_{12}) - p(\omega_{11} - \omega_{12})\,]}{\bar{\omega}},$$

where $p$ and $p'$ are the frequencies of allele $A_1$ before and after selection, $q$ and $q'$ are the frequencies of allele $A_2$ before and after selection, and the remaining part of the notation is defined in the following table:

$$\begin{array}{lcccc}
\text{Genotype} & A_1A_1 & A_1A_2 & A_2A_2 & \text{Sum} \\
\text{Fitness} & \omega_{11} & \omega_{12} & \omega_{22} & \\
\text{Frequency before selection} & p^2 & 2pq & q^2 & 1 \\
\text{Frequency} \times \text{Fitness} & p^2\omega_{11} & 2pq\,\omega_{12} & q^2\omega_{22} & \bar{\omega} \\
\text{Frequency after selection} & \dfrac{p^2\omega_{11}}{\bar{\omega}} & \dfrac{2pq\,\omega_{12}}{\bar{\omega}} & \dfrac{q^2\omega_{22}}{\bar{\omega}} & 1
\end{array}$$

A convenient way to express the fitness coefficients of the three genotypes in terms of the selection coefficient ($s$) and the level of dominance ($h$) is:

$$\begin{array}{lccc}
\text{Genotype} & A_1A_1 & A_1A_2 & A_2A_2 \\
\text{Fitness} & \omega_{11} & \omega_{12} & \omega_{22} \\
\text{Fitness in terms of } s \text{ and } h & 1 & 1 - hs & 1 - s
\end{array}$$

If we substitute the fitness coefficients expressed in terms of $s$ and $h$ into the equations describing the change of allele frequencies after selection, these equations simplify to:

$$\Delta p = \frac{pqs\,[\,ph + q(1 - h)\,]}{1 - 2pqhs - q^2 s},$$

$$\Delta q = \frac{-pqs\,[\,ph + q(1 - h)\,]}{1 - 2pqhs - q^2 s}.$$

Forms of Selection (where $A_1$ is defined as the allele that confers higher fitness in a two-allele system):

h = 0: recessive deleterious
h = 1: dominant deleterious
0 < h < 1: partial dominance, incomplete dominance, or general dominance
h = 0.5: additivity

Overdominance

To describe overdominance (i.e., advantage of heterozygotes over all homozygotes), a more convenient way to express the fitness coefficients of the genotypes at a locus with two alleles is:

$$\begin{array}{lccc}
\text{Genotype} & A_1A_1 & A_1A_2 & A_2A_2 \\
\text{Fitness} & \omega_{11} & \omega_{12} & \omega_{22} \\
\text{Fitness in terms of } s_1 \text{ and } s_2 & 1 - s_1 & 1 & 1 - s_2
\end{array}$$

where $s_1$ and $s_2$ are the selection disadvantages of $A_1A_1$ and $A_2A_2$ with respect to $A_1A_2$.
In terms of $h$ and $s$, overdominance occurs when $h < 0$. In this case, the change in allele frequency is:

$$\Delta q = \frac{pq\,[\,s_1 p - s_2 q\,]}{1 - s_1 p^2 - s_2 q^2}.$$

The equilibrium frequencies of $A_2$ and $A_1$ can be calculated as:

$$q_{eq} = \frac{s_1}{s_1 + s_2}, \qquad p_{eq} = \frac{s_2}{s_1 + s_2}.$$

Relative to the equilibrium point, the change in allele frequency becomes:

$$\Delta q = \frac{-pq\,(s_1 + s_2)(q - q_{eq})}{\bar{\omega}},$$

so that when $q < q_{eq}$, $\Delta q$ is positive, and when $q > q_{eq}$, $\Delta q$ is negative. Selection moves the allele frequency toward equilibrium in both cases.

Underdominance

To describe underdominance (i.e., advantage of both homozygotes over heterozygotes), a more convenient way to express the fitness coefficients of the genotypes at a locus with two alleles is:

$$\begin{array}{lccc}
\text{Genotype} & A_1A_1 & A_1A_2 & A_2A_2 \\
\text{Fitness} & \omega_{11} & \omega_{12} & \omega_{22} \\
\text{Fitness in terms of } s_1 \text{ and } s_2 & 1 + s_1 & 1 & 1 + s_2
\end{array}$$

where $s_1$ and $s_2$ are the selection advantages of $A_1A_1$ and $A_2A_2$ over $A_1A_2$. In terms of $h$ and $s$, underdominance occurs when $h > 1$. In this case, the change in allele frequency is:

$$\Delta q = \frac{pq\,[\,s_2 q - s_1 p\,]}{1 + s_1 p^2 + s_2 q^2}.$$

The equilibrium frequencies of $A_2$ and $A_1$ can be calculated as:

$$q_{eq} = \frac{s_1}{s_1 + s_2}, \qquad p_{eq} = \frac{s_2}{s_1 + s_2}.$$

Relative to the equilibrium point, the change in allele frequency becomes:

$$\Delta q = \frac{pq\,(s_1 + s_2)(q - q_{eq})}{\bar{\omega}},$$

so that when $q < q_{eq}$, $\Delta q$ is negative, and when $q > q_{eq}$, $\Delta q$ is positive. Selection moves the allele frequency away from equilibrium in both cases.
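The stable and unstable equilibria described above can be checked numerically by iterating the general expression for the change in the frequency of $A_2$ under selection. The sketch below is illustrative: the function names, the fitness values ($s_1 = 0.2$, $s_2 = 0.3$), the starting frequencies, and the number of generations are assumptions chosen for the example.

```python
def delta_q(q, w11, w12, w22):
    """One-generation change in the frequency of A2 under selection."""
    p = 1 - q
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22  # mean fitness
    return p * q * (q * (w22 - w12) - p * (w11 - w12)) / w_bar


def iterate(q0, w11, w12, w22, generations):
    """Apply delta_q repeatedly and return the final frequency of A2."""
    q = q0
    for _ in range(generations):
        q += delta_q(q, w11, w12, w22)
    return q


s1, s2 = 0.2, 0.3            # hypothetical selection coefficients
q_eq = s1 / (s1 + s2)        # predicted equilibrium frequency of A2 (0.4)

# Overdominance: fitnesses 1 - s1, 1, 1 - s2. Equilibrium is stable.
over_from_low = iterate(0.1, 1 - s1, 1.0, 1 - s2, 500)
over_from_high = iterate(0.9, 1 - s1, 1.0, 1 - s2, 500)

# Underdominance: fitnesses 1 + s1, 1, 1 + s2. Equilibrium is unstable.
under_from_below = iterate(q_eq - 0.01, 1 + s1, 1.0, 1 + s2, 500)
under_from_above = iterate(q_eq + 0.01, 1 + s1, 1.0, 1 + s2, 500)

print(q_eq, over_from_low, over_from_high)
print(under_from_below, under_from_above)
```

Under overdominance both starting points converge on $q_{eq}$. Under underdominance a small displacement below $q_{eq}$ leads to loss of $A_2$, and a small displacement above it leads to fixation of $A_2$.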