Homework 4 Solutions (Elizabeth Hom)

Elizabeth Hom
Epi 516
November 16, 2011
Homework #4
Question 1
(a)
The values of P(A) and P(B) were given.
I calculated P(a) = 1 - P(A) and P(b) = 1 - P(B). Then I used Excel to calculate the theoretical range of the linkage disequilibrium coefficient D_AB for the four scenarios:
Scenario   P(A)   P(a)   P(B)   P(b)
1          0.5    0.5    0.5    0.5
2          0.95   0.05   0.95   0.05
3          0.95   0.05   0.05   0.95
4          0.5    0.5    0.95   0.05
Scenario   -P(A)*P(B)   -P(a)*P(b)   P(a)*P(B)   P(A)*P(b)
1          -0.25        -0.25        0.25        0.25
2          -0.9025      -0.0025      0.0475      0.0475
3          -0.0475      -0.0475      0.0025      0.9025
4          -0.475       -0.025       0.475       0.025
Range of D_AB
Scenario   Minimum = Max(-P(A)P(B), -P(a)P(b))   Maximum = Min(P(a)P(B), P(A)P(b))
1          -0.25                                  0.25
2          -0.0025                                0.0475
3          -0.0475                                0.0025
4          -0.025                                 0.025
Based on the range of D_AB, here is the theoretical range of the absolute value of the linkage disequilibrium coefficient, |D_AB|, for the four scenarios (both endpoints are attainable, so the intervals are closed):
Scenario   Range of |D_AB|
1          [0, 0.25]
2          [0, 0.0475]
3          [0, 0.0475]
4          [0, 0.025]
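The Excel calculation above can be reproduced in a few lines. The following is a minimal Python sketch (the function name `ld_range` and the `scenarios` dictionary are illustrative, not part of the assignment) that applies D_min = max(-P(A)P(B), -P(a)P(b)) and D_max = min(P(a)P(B), P(A)P(b)) to the four scenarios and reports the largest attainable |D_AB|:

```python
def ld_range(p_A: float, p_B: float) -> tuple[float, float]:
    """Return (D_min, D_max) for allele frequencies P(A) and P(B)."""
    p_a, p_b = 1 - p_A, 1 - p_B
    d_min = max(-p_A * p_B, -p_a * p_b)
    d_max = min(p_a * p_B, p_A * p_b)
    return d_min, d_max


# (P(A), P(B)) for the four scenarios in the table above
scenarios = {1: (0.5, 0.5), 2: (0.95, 0.95), 3: (0.95, 0.05), 4: (0.5, 0.95)}
for k, (p_A, p_B) in scenarios.items():
    d_min, d_max = ld_range(p_A, p_B)
    # |D| can take any value from 0 up to the larger of |D_min| and D_max
    print(k, round(d_min, 4), round(d_max, 4), round(max(-d_min, d_max), 4))
```

Rounding to four decimals only hides floating-point noise such as 1 - 0.95 = 0.050000000000000044.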
b)
D reaches its theoretical maximum value when D = min[P(a)*P(B), P(A)*P(b)], that is, when D = P(a)*P(B) or D = P(A)*P(b), whichever is smaller.
We can also use the definition of the linkage disequilibrium coefficient,
D=P(AB)-[P(A)*P(B)]
In one case, at the theoretical maximum value,
D= [P(a)*P(B)] = P(AB)-[P(A)*P(B)]
P(AB) = [P(a)*P(B)] + [P(A)*P(B)] = P(B)*[P(a) + P(A)] = P(B)
Since P(AB) + P(aB) = P(B), this means that P(aB) = 0. Thus, one of the four possible haplotypes is not present in this population.
In another case, at the theoretical maximum value,
D= [P(A)*P(b)] = P(AB)-[P(A)*P(B)]
P(AB) = [P(A)*P(b)] + [P(A)*P(B)] = P(A)*[P(b) + P(B)] = P(A)
Since P(AB) + P(Ab) = P(A), this means that P(Ab) = 0. Thus, one of the four possible haplotypes is not present in the population.
This makes sense because when D reaches its maximum value, the two loci can be thought of as being in "complete linkage disequilibrium." Under complete linkage disequilibrium, at least one haplotype does not occur in the population, so at most 3 of the 4 possible haplotypes are present. When 2 loci are in complete linkage disequilibrium, we can imagine that the 2 loci have not been separated by recombination, and thus one haplotype is missing.
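To check this argument numerically, here is a small Python sketch (the helper `haplotype_freqs` is hypothetical) that rebuilds the four haplotype frequencies from P(A), P(B) and D using P(AB) = P(A)P(B) + D, P(Ab) = P(A)P(b) - D, P(aB) = P(a)P(B) - D and P(ab) = P(a)P(b) + D, then sets D to its maximum for scenario 4 from part (a):

```python
def haplotype_freqs(p_A: float, p_B: float, D: float) -> dict[str, float]:
    """Haplotype frequencies implied by allele frequencies and D = P(AB) - P(A)P(B)."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return {
        "AB": p_A * p_B + D,
        "Ab": p_A * p_b - D,
        "aB": p_a * p_B - D,
        "ab": p_a * p_b + D,
    }


# Scenario 4 from part (a): P(A) = 0.5, P(B) = 0.95
p_A, p_B = 0.5, 0.95
d_max = min((1 - p_A) * p_B, p_A * (1 - p_B))  # = P(A)P(b) here
freqs = haplotype_freqs(p_A, p_B, d_max)
print({h: round(f, 4) for h, f in freqs.items()})
```

Here D_max = P(A)P(b) = 0.025, so Ab is the haplotype that disappears, matching the second case above, while the remaining three frequencies still sum to 1.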
Question 2
a)
For SNP1, I calculated the allele frequencies of A1 and A2:
FA1 = (nA1A1 + ½ nA1A2) /N = (115+ 0.5*119)/260 = 0.67
FA2 = 1-FA1 = 1-0.67 = 0.33
To determine if SNP1 was in Hardy-Weinberg Equilibrium, I used a Chi-square test. To
calculate the Chi-square statistic, I used the formula:
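$$X^2 = \sum_{\text{genotypes}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
where, under Hardy-Weinberg equilibrium, the expected counts are N*FA1^2, 2N*FA1*FA2 and N*FA2^2 for A1A1, A1A2 and A2A2, with N = 260.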
I used Excel to calculate the separate terms for each genotype:
Genotype            n_A1A1     n_A1A2     n_A2A2
Observed count      115        119        26
Expected count      117.1163   114.7673   28.11635
Chi-square term     0.038243   0.156104   0.1593
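I summed these separate terms together: X2 = 0.038 + 0.156 + 0.159 = 0.353647
With 1 degree of freedom (3 genotype classes, minus 1, minus 1 for the estimated allele frequency), I calculated: P(X2 > 0.353647) = 1 - P(X2 < 0.353647) = 1 - 0.4479443 = 0.5520557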
I used the criterion p < 0.05 to reject the null hypothesis. Because p = 0.55 > 0.05 in this case, I fail to reject the null hypothesis that SNP1 is in Hardy-Weinberg equilibrium.
To determine if SNP2 was in Hardy-Weinberg Equilibrium, I also used a Chi-square test. For
SNP2, I calculated the allele frequencies of B1 and B2:
FB1 = (nB1B1 + ½ nB1B2) /N = (47+ 0.5*125)/260 = 0.42
FB2 = 1 - FB1 = 1 - 0.42 = 0.58
To calculate the Chi-square statistic, I used the formula:
$$X^2 = \sum_{\text{genotypes}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
I used Excel to calculate the separate terms for each genotype:
Genotype            n_B1B1     n_B1B2     n_B2B2
Observed count      47         125        88
Expected count      46.11635   126.7673   87.11635
Chi-square term     0.016932   0.024639   0.008963
I summed these separate terms together: X2 = 0.017 + 0.025 + 0.009= 0.0505
With 1 degree of freedom, I calculated: P(X2 > 0.050534) = 1 - P(X2 < 0.050534) = 1 - 0.1778632 = 0.8221368
I used the criterion p < 0.05 to reject the null hypothesis. Because p = 0.82 > 0.05 in this case, I fail to reject the null hypothesis that SNP2 is in Hardy-Weinberg equilibrium.
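The Excel steps for both SNPs can be sketched in Python as below. This is an illustrative sketch assuming the genotype counts in the two tables above; the function name `hwe_chi_square` is hypothetical, and the upper-tail probability for 1 degree of freedom uses the identity P(X2 > x) = erfc(sqrt(x/2)) in place of Excel's chi-square distribution function:

```python
import math


def hwe_chi_square(n11: int, n12: int, n22: int) -> tuple[float, float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (frequency of allele 1, chi-square statistic, p-value with 1 df).
    """
    n = n11 + n12 + n22
    p = (n11 + 0.5 * n12) / n  # frequency of allele 1
    q = 1 - p
    observed = (n11, n12, n22)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # A chi-square variable with 1 df is Z^2, so P(X^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


for name, counts in {"SNP1": (115, 119, 26), "SNP2": (47, 125, 88)}.items():
    freq, chi2, p_value = hwe_chi_square(*counts)
    print(name, round(freq, 2), round(chi2, 4), round(p_value, 2))
```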
b)
My null hypothesis is that the two loci, SNP1 and SNP2, are in linkage equilibrium and are therefore not linked. My alternative hypothesis is that the two loci are not in linkage equilibrium and are linked.
The observed haplotype counts were calculated as follows:
A1B1 = 36+36+62+5+28 = 167
A1B2 = 62+17+17+24+62 = 182
A2B1 = 5+24+6+6+11 = 52
A2B2 = 28+62+11+9+9 = 119
The expected counts were calculated from the allele frequencies in part (a) and the fact that 520 haplotypes in total (2 haplotypes for each of the 260 offspring) were observed in the sample. Excel used the unrounded allele frequencies, so the expected counts in the table below differ slightly from the rounded products shown here:
E(A1B1) = Nhaplotypes*P(A1)*P(B1) = 520*0.67*0.42
E(A1B2) = Nhaplotypes*P(A1)*P(B2) = 520*0.67*0.58
E(A2B1) = Nhaplotypes*P(A2)*P(B1) = 520*0.33*0.42
E(A2B2) = Nhaplotypes*P(A2)*P(B2) = 520*0.33*0.58
Here are the observed and expected haplotype counts:
Haplotype           A1B1         A1B2         A2B1         A2B2
Observed count      167          182          52           119
Expected count      146.9827     202.0173     72.01731     98.98269
Chi-square term     2.72612102   1.98345682   5.56383764   4.04810778
I summed these separate terms together: X2 =2.73+1.98+5.56+4.05=14.32
With 1 degree of freedom (a 2x2 table of haplotypes), I calculated: P(X2 > 14.32152327) = 1 - P(X2 < 14.32152327) = 1 - 0.9998459 = 0.0001540929
I used the criterion p < 0.05 to reject the null hypothesis. Because p = 0.00015 is far below 0.05, I reject the null hypothesis that the 2 loci are in linkage equilibrium and conclude that the 2 loci are not in linkage equilibrium.
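A minimal Python sketch of the same linkage-equilibrium test, assuming the haplotype counts 167, 182, 52 and 119 from the table above (the function name `ld_chi_square` is hypothetical). The allele frequencies are taken from the haplotype margins, (167 + 182)/520 = 349/520 and (167 + 52)/520 = 219/520, which are exactly the unrounded values from part (a):

```python
import math


def ld_chi_square(n11: int, n12: int, n21: int, n22: int):
    """Pearson chi-square test of linkage equilibrium for two biallelic loci.

    Counts are given in the order A1B1, A1B2, A2B1, A2B2.
    Returns (expected counts, chi-square statistic, p-value with 1 df).
    """
    n = n11 + n12 + n21 + n22
    p_a1 = (n11 + n12) / n  # P(A1) from the haplotype margins
    p_b1 = (n11 + n21) / n  # P(B1) from the haplotype margins
    p_a2, p_b2 = 1 - p_a1, 1 - p_b1
    observed = (n11, n12, n21, n22)
    # Expected counts under linkage equilibrium: N * P(Ai) * P(Bj)
    expected = (n * p_a1 * p_b1, n * p_a1 * p_b2, n * p_a2 * p_b1, n * p_a2 * p_b2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


expected, chi2, p_value = ld_chi_square(167, 182, 52, 119)
print([round(e, 4) for e in expected], round(chi2, 4), p_value)
```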
c)
For the observed data, here are the linkage disequilibrium parameter estimates:
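(All three estimates use the unrounded allele frequencies P(A1) = 349/520 = 0.67115 and P(B1) = 219/520 = 0.42115.)
i.
$$\hat{D} = \hat{P}(A_1B_1) - \hat{P}(A_1)\hat{P}(B_1) = \frac{167}{520} - (0.67115 \times 0.42115) = 0.038495$$
ii.
$$\hat{D}' = \frac{\hat{D}_{A_1B_1}}{\min(p_{A_2}\,p_{B_1},\; p_{A_1}\,p_{B_2})} \quad \text{if } \hat{D} > 0$$
$$\hat{D}' = \frac{0.038495}{\min(0.32885 \times 0.42115,\; 0.67115 \times 0.57885)} = \frac{0.038495}{\min(0.1385,\; 0.3885)} = 0.277951$$
iii.
$$\hat{r}^2 = \frac{\hat{D}^2}{p_{A_1}(1-p_{A_1})\,p_{B_1}(1-p_{B_1})} = \frac{(0.038495)^2}{0.67115 \times 0.32885 \times 0.42115 \times 0.57885} = 0.027541$$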
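The three estimates can be reproduced with the following Python sketch (the function name `ld_measures` is hypothetical). Only the D > 0 case is needed here; for D < 0 the sketch divides by D_min = max(-P(A1)P(B1), -P(A2)P(B2)) instead, an addition beyond this homework that keeps D' positive in both cases:

```python
def ld_measures(n11: int, n12: int, n21: int, n22: int) -> tuple[float, float, float]:
    """Estimate D, D' and r^2 for haplotype A1B1 from haplotype counts.

    Counts are given in the order A1B1, A1B2, A2B1, A2B2.
    """
    n = n11 + n12 + n21 + n22
    p_a1 = (n11 + n12) / n
    p_b1 = (n11 + n21) / n
    p_a2, p_b2 = 1 - p_a1, 1 - p_b1
    d = n11 / n - p_a1 * p_b1
    # D' divides D by its theoretical limit given the allele frequencies
    if d >= 0:
        d_limit = min(p_a2 * p_b1, p_a1 * p_b2)
    else:
        d_limit = max(-p_a1 * p_b1, -p_a2 * p_b2)
    d_prime = d / d_limit
    r2 = d ** 2 / (p_a1 * p_a2 * p_b1 * p_b2)
    return d, d_prime, r2


d, d_prime, r2 = ld_measures(167, 182, 52, 119)
print(round(d, 6), round(d_prime, 6), round(r2, 6))
# For a 2x2 haplotype table, the Pearson chi-square statistic equals N * r^2
print(round(520 * r2, 2))
```

The last line is a useful cross-check: 520 x 0.027541 is approximately 14.32, the chi-square statistic from part (b).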