Elizabeth Hom
Epi 516
November 16, 2011
Homework #4

Question 1

a) The values of P(A) and P(B) were given. I calculated P(a) = 1 - P(A) and P(b) = 1 - P(B). Then I used Excel to calculate the theoretical range of the linkage disequilibrium coefficient D_AB for the four scenarios:

Scenario   P(A)   P(a)   P(B)   P(b)
1          0.5    0.5    0.5    0.5
2          0.95   0.05   0.95   0.05
3          0.95   0.05   0.05   0.95
4          0.5    0.5    0.95   0.05

Scenario   -P(A)*P(B)   -P(a)*P(b)   P(a)*P(B)   P(A)*P(b)
1          -0.25        -0.25        0.25        0.25
2          -0.9025      -0.0025      0.0475      0.0475
3          -0.0475      -0.0475      0.0025      0.9025
4          -0.475       -0.025       0.475       0.025

Range of D_AB:

Scenario   Minimum = max(-P(A)*P(B), -P(a)*P(b))   Maximum = min(P(a)*P(B), P(A)*P(b))
1          -0.25                                   0.25
2          -0.0025                                 0.0475
3          -0.0475                                 0.0025
4          -0.025                                  0.025

Based on the range of D_AB, here is the theoretical range of the absolute value of the linkage disequilibrium coefficient, |D_AB|, for the four scenarios:

Scenario   Range of |D_AB|
1          [0, 0.25]
2          [0, 0.0475]
3          [0, 0.0475]
4          [0, 0.025]

b) D reaches its theoretical maximum value when D = P(a)*P(B) or D = P(A)*P(b), whichever is smaller. We can combine this with the definition of the linkage disequilibrium coefficient, D = P(AB) - P(A)*P(B).

In the first case, at the theoretical maximum value:
D = P(a)*P(B) = P(AB) - P(A)*P(B)
P(AB) = P(a)*P(B) + P(A)*P(B) = [P(a) + P(A)]*P(B)
P(AB) = P(B)
Because P(B) = P(AB) + P(aB), this means that P(aB) = 0. Thus, one of the four possible haplotypes is not present in this population.

In the second case, at the theoretical maximum value:
D = P(A)*P(b) = P(AB) - P(A)*P(B)
P(AB) = P(A)*P(b) + P(A)*P(B) = P(A)*[P(b) + P(B)]
P(AB) = P(A)
Because P(A) = P(AB) + P(Ab), this means that P(Ab) = 0. Thus, one of the four possible haplotypes is not present in the population.

This makes sense because when D reaches its maximum value, the two loci can be thought of as being in "complete linkage disequilibrium." When complete linkage disequilibrium occurs, at least one haplotype does not occur in the population, so at most 3 of the 4 possible haplotypes are present.
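The bounds on D and the missing-haplotype result can also be checked numerically. The following is a minimal Python sketch, not the Excel workbook used for this homework; the function names d_bounds and haplotype_freqs are hypothetical and introduced only for illustration. It assumes the standard relations P(AB) = P(A)*P(B) + D, P(Ab) = P(A)*P(b) - D, P(aB) = P(a)*P(B) - D and P(ab) = P(a)*P(b) + D.

```python
def d_bounds(p_A, p_B):
    """Return (D_min, D_max) for given allele frequencies P(A) and P(B)."""
    p_a, p_b = 1 - p_A, 1 - p_B
    d_min = max(-p_A * p_B, -p_a * p_b)
    d_max = min(p_a * p_B, p_A * p_b)
    return d_min, d_max


def haplotype_freqs(p_A, p_B, d):
    """Haplotype frequencies implied by the allele frequencies and D."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return {
        "AB": p_A * p_B + d,
        "Ab": p_A * p_b - d,
        "aB": p_a * p_B - d,
        "ab": p_a * p_b + d,
    }


# (P(A), P(B)) for scenarios 1-4 from part a)
scenarios = {1: (0.5, 0.5), 2: (0.95, 0.95), 3: (0.95, 0.05), 4: (0.5, 0.95)}

for k, (p_A, p_B) in scenarios.items():
    d_min, d_max = d_bounds(p_A, p_B)
    freqs = haplotype_freqs(p_A, p_B, d_max)
    # Haplotypes whose frequency is (numerically) zero when D = D_max
    missing = [h for h, f in freqs.items() if abs(f) < 1e-12]
    print(f"Scenario {k}: D in [{d_min:.4f}, {d_max:.4f}], "
          f"missing at D_max: {missing}")
```

In every scenario at least one of Ab or aB has frequency zero at the maximum, which matches the derivation above.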
When two loci are in complete linkage disequilibrium, we can imagine that they cannot be separated by recombination, and thus one haplotype is missing.

Question 2

a) For SNP1, I calculated the allele frequencies of A1 and A2:
FA1 = (nA1A1 + ½ nA1A2)/N = (115 + 0.5*119)/260 = 0.67
FA2 = 1 - FA1 = 1 - 0.67 = 0.33

To determine whether SNP1 was in Hardy-Weinberg equilibrium, I used a Chi-square test. To calculate the Chi-square statistic, I used the formula:

\[ X^2 = \sum_{\text{genotypes}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \]

The expected counts under Hardy-Weinberg equilibrium are N*FA1^2, 2*N*FA1*FA2 and N*FA2^2. The values below are consistent with unrounded allele frequencies rather than the two-decimal values shown above. I used Excel to calculate the separate terms for each genotype:

Genotype          n_A1A1     n_A1A2     n_A2A2
Observed count    115        119        26
Expected count    117.1163   114.7673   28.11635
Chi-square term   0.038243   0.156104   0.1593

I summed these separate terms together:
X2 = 0.038243 + 0.156104 + 0.1593 = 0.353647

Using a Chi-square distribution with 1 degree of freedom, I calculated:
P(X2 > 0.353647) = 1 - P(X2 < 0.353647) = 1 - 0.4479443 = 0.5520557

I used the criterion that p < 0.05 in order to reject the null hypothesis. In this case, because p = 0.55 > 0.05, I fail to reject the null hypothesis that SNP1 is in Hardy-Weinberg equilibrium.

To determine whether SNP2 was in Hardy-Weinberg equilibrium, I also used a Chi-square test. For SNP2, I calculated the allele frequencies of B1 and B2:
FB1 = (nB1B1 + ½ nB1B2)/N = (47 + 0.5*125)/260 = 0.42
FB2 = 1 - FB1 = 1 - 0.42 = 0.58

I used the same Chi-square formula as for SNP1 and again used Excel to calculate the separate terms for each genotype:

Genotype          n_B1B1     n_B1B2     n_B2B2
Observed count    47         125        88
Expected count    46.11635   126.7673   87.11635
Chi-square term   0.016932   0.024639   0.008963

I summed these separate terms together:
X2 = 0.016932 + 0.024639 + 0.008963 = 0.050534

Using a Chi-square distribution with 1 degree of freedom, I calculated:
P(X2 > 0.050534) = 1 - P(X2 < 0.050534) = 1 - 0.1778632 = 0.8221368

I used the criterion that p < 0.05 in order to reject the null hypothesis. In this case, because p = 0.82 > 0.05, I fail to reject the null hypothesis that SNP2 is in Hardy-Weinberg equilibrium.
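The Hardy-Weinberg tests for SNP1 and SNP2 can be reproduced outside Excel. The following is a minimal Python sketch under stated assumptions: the function name hwe_chi_square is hypothetical, and the p-value assumes 1 degree of freedom, for which the Chi-square upper tail equals erfc(sqrt(X2/2)), so only the standard library is needed.

```python
import math


def hwe_chi_square(n_11, n_12, n_22):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (frequency of allele 1, chi-square statistic, p-value with 1 df).
    """
    n = n_11 + n_12 + n_22
    p = (n_11 + 0.5 * n_12) / n          # allele 1 frequency
    q = 1 - p                            # allele 2 frequency
    observed = (n_11, n_12, n_22)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Genotype counts from the two tables above
for name, counts in {"SNP1": (115, 119, 26), "SNP2": (47, 125, 88)}.items():
    freq, chi2, p_value = hwe_chi_square(*counts)
    print(f"{name}: allele freq = {freq:.4f}, X2 = {chi2:.6f}, p = {p_value:.4f}")
```

Because the sketch keeps the allele frequencies unrounded, its expected counts agree with the Excel values in the tables rather than with the two-decimal frequencies.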
b) My null hypothesis is that the two loci, SNP1 and SNP2, are in linkage equilibrium and are therefore not linked. My alternative hypothesis is that the two loci are not in linkage equilibrium and are linked.

The observed haplotype counts were calculated as follows:
A1B1 = 36 + 36 + 62 + 5 + 28 = 167
A1B2 = 62 + 17 + 17 + 24 + 62 = 182
A2B1 = 5 + 24 + 6 + 6 + 11 = 52
A2B2 = 28 + 62 + 11 + 9 + 9 = 119

The expected counts were calculated using the allele frequencies from part (a) and the fact that 520 haplotypes (2*260 offspring) were observed in the sample:
E(A1B1) = Nhaplotypes*P(A1)*P(B1) = 520*0.67*0.42
E(A1B2) = Nhaplotypes*P(A1)*P(B2) = 520*0.67*0.58
E(A2B1) = Nhaplotypes*P(A2)*P(B1) = 520*0.33*0.42
E(A2B2) = Nhaplotypes*P(A2)*P(B2) = 520*0.33*0.58
The allele frequencies are shown rounded to two decimals; the results below are consistent with unrounded values.

Here are the observed and expected haplotype counts:

Haplotype         A1B1         A1B2         A2B1         A2B2
Observed          167          182          52           119
Expected          146.9827     202.0173     72.01731     98.98269
Chi-square term   2.72612102   1.98345682   5.56383764   4.04810778

I summed these separate terms together:
X2 = 2.73 + 1.98 + 5.56 + 4.05 = 14.32

Using a Chi-square distribution with 1 degree of freedom, I calculated:
P(X2 > 14.32152327) = 1 - P(X2 < 14.32152327) = 1 - 0.9998459 = 0.0001540929

I used the criterion that p < 0.05 in order to reject the null hypothesis. In this case, because p = 0.00015 < 0.05, I reject the null hypothesis that the two loci are in linkage equilibrium and conclude that they are in linkage disequilibrium.

c) For the observed data, here are the linkage disequilibrium parameter estimates (again shown with rounded allele frequencies but consistent with unrounded values):

i. Linkage disequilibrium coefficient:
\[ \hat{D} = P(A_1B_1) - P(A_1)P(B_1) = \frac{167}{520} - (0.67 \times 0.42) = 0.038495 \]

ii. Normalized coefficient, for the case \hat{D} > 0:
\[ \hat{D}' = \frac{\hat{D}}{\min(p_{A_2}p_{B_1},\; p_{A_1}p_{B_2})} = \frac{0.038495}{\min(0.33 \times 0.42,\; 0.67 \times 0.58)} = \frac{0.038495}{\min(0.1385,\; 0.3885)} = 0.277951 \]

iii. Squared correlation:
\[ \hat{r}^2 = \frac{\hat{D}^2}{P_{A_1}(1 - P_{A_1})\,P_{B_1}(1 - P_{B_1})} = \frac{(0.038495)^2}{0.67 \times 0.33 \times 0.42 \times 0.58} = 0.027541 \]
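The linkage equilibrium test and the three estimates can be recomputed directly from the four haplotype counts. The following is a minimal Python sketch, not the Excel calculation used above; the function name ld_stats is hypothetical. It assumes both loci are polymorphic, uses 1 degree of freedom for the p-value, and for the case D < 0 (which does not arise in these data) it assumes the bound min(P(A1)*P(B1), P(A2)*P(B2)) from Question 1 and reports the absolute value |D'|.

```python
import math


def ld_stats(n_A1B1, n_A1B2, n_A2B1, n_A2B2):
    """Estimate D, |D'|, r^2 and a 1-df chi-square test from haplotype counts."""
    n = n_A1B1 + n_A1B2 + n_A2B1 + n_A2B2
    p_A1 = (n_A1B1 + n_A1B2) / n
    p_B1 = (n_A1B1 + n_A2B1) / n
    p_A2, p_B2 = 1 - p_A1, 1 - p_B1

    d = n_A1B1 / n - p_A1 * p_B1
    # Largest |D| attainable for these allele frequencies, by sign of D
    if d >= 0:
        d_limit = min(p_A2 * p_B1, p_A1 * p_B2)
    else:
        d_limit = min(p_A1 * p_B1, p_A2 * p_B2)
    d_prime = abs(d) / d_limit
    r2 = d ** 2 / (p_A1 * p_A2 * p_B1 * p_B2)

    observed = (n_A1B1, n_A1B2, n_A2B1, n_A2B2)
    expected = (n * p_A1 * p_B1, n * p_A1 * p_B2,
                n * p_A2 * p_B1, n * p_A2 * p_B2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return {"D": d, "D_prime": d_prime, "r2": r2,
            "chi2": chi2, "p_value": p_value}


# Observed haplotype counts from part b)
result = ld_stats(167, 182, 52, 119)
for key, value in result.items():
    print(f"{key} = {value:.6g}")
```

As a consistency check on these numbers, the Chi-square statistic equals the number of haplotypes times the squared correlation: 520 * 0.027541 is approximately 14.32.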