2002-09-24: Linkage Analysis II

Lecture 9: Linkage Analysis II
Date: 9/24/02
 Unknown linkage phase
 Mixture of linkage phase
 Mixture of self and random mating
Unknown Linkage Phase for
Backcross
[Figure: three backcross pedigrees labelled coupling, repulsion, and no information; in the no-information pedigree two haplotypes are marked "?".]
Unknown Linkage Phase F2
[Figure: three F2 pedigrees labelled coupling-coupling, repulsion-repulsion, and coupling-repulsion; the coupling-repulsion case is dealt with later.]
Determining Linkage Phase:
F2-CD
Goal: Calculate the likelihood for an F2 with one codominant and one
dominant locus. Show that the coupling and repulsion likelihoods
are symmetric about 0.5.
1. Determine the possible gametes and their probabilities.
Assume coupling of A and B in both parents.

Gamete     Probability
AB         (1-q)/2
Ab         q/2
aB         q/2
ab         (1-q)/2

2. Determine the observable genotypes and their probabilities.

Genotype   Probability
AAB-       (1-q^2)/4
AAbb       q^2/4
AaB-       (1-q+q^2)/2
Aabb       q(1-q)/2
aaB-       q(2-q)/4
aabb       (1-q)^2/4
Determining Linkage Phase:
F2-CD
3. Write an expression for the likelihood, then the log likelihood.

\[
L_C(q) = \left[\frac{1-q^2}{4}\right]^{f_1}
         \left[\frac{q^2}{4}\right]^{f_2}
         \left[\frac{1-q+q^2}{2}\right]^{f_3}
         \left[\frac{q(1-q)}{2}\right]^{f_4}
         \left[\frac{q(2-q)}{4}\right]^{f_5}
         \left[\frac{(1-q)^2}{4}\right]^{f_6}
\]

Dropping the terms that do not depend on q:

\[
l_C(q) = f_1 \log(1-q^2) + 2 f_2 \log q + f_3 \log(1-q+q^2)
       + f_4 \log[q(1-q)] + f_5 \log[q(2-q)] + 2 f_6 \log(1-q)
\]
4. Repeat the whole process now assuming repulsion phase
and obtain expression for lR(q).
5. Confirm lC(q)=lR(1-q).
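Step 5 can also be checked numerically. The following is a minimal sketch, assuming both parents are Ab/aB in the repulsion model; the function names and the counts in the usage lines are made up for illustration. It builds the observable genotype probabilities by enumerating gamete pairs rather than copying the closed forms, and it keeps the constant terms that the log likelihood above drops.

```python
import math
from itertools import product


def gamete_probs(q, phase):
    """Gamete frequencies for a double heterozygote.

    phase is "coupling" (AB/ab) or "repulsion" (Ab/aB); q is the
    recombination fraction.
    """
    nonrec, rec = (1 - q) / 2, q / 2
    if phase == "coupling":
        return {"AB": nonrec, "ab": nonrec, "Ab": rec, "aB": rec}
    return {"Ab": nonrec, "aB": nonrec, "AB": rec, "ab": rec}


def observable_class(g1, g2):
    """Collapse a gamete pair to what is observed: A codominant, B dominant."""
    locus_a = "".join(sorted(g1[0] + g2[0]))          # "AA", "Aa" or "aa"
    locus_b = "B-" if "B" in (g1[1], g2[1]) else "bb"
    return locus_a + locus_b


def f2_cd_probs(q, phase):
    """Observable F2 genotype probabilities when both parents share `phase`."""
    gametes = gamete_probs(q, phase)
    probs = {}
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        key = observable_class(g1, g2)
        probs[key] = probs.get(key, 0.0) + p1 * p2
    return probs


def log_likelihood(counts, q, phase):
    """Multinomial log likelihood (constants in q kept) for 0 < q < 1."""
    probs = f2_cd_probs(q, phase)
    return sum(n * math.log(probs[k]) for k, n in counts.items())


# Illustrative counts only (not data from the lecture).
counts = {"AAB-": 30, "AAbb": 2, "AaB-": 55, "Aabb": 8, "aaB-": 10, "aabb": 20}
print(log_likelihood(counts, 0.2, "coupling"))
print(log_likelihood(counts, 0.8, "repulsion"))
```

The two printed values should agree, since the repulsion gamete table is the coupling table with q replaced by 1-q.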
Symmetry Around 0.5
[Figure: log likelihood (0 to -8000) plotted against recombinant fraction (0.01 to 0.91) for the coupling-phase and repulsion-phase models.]
An Ad Hoc Linkage Phase
Determination Method I
• When the likelihood surfaces for the coupling and repulsion phases are symmetric about 0.5 (backcross, and F2 with one codominant marker), a single test is sufficient.
• Calculate the G statistic under the coupling assumption (use lC(q)).
  • If it is significant and the estimate of q is below 0.5, then the linkage is coupling.
  • If it is significant and the estimate of q is above 0.5, then the linkage is repulsion.
  • If it is not significant, no determination can be made.
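Method I can be sketched for a backcross as follows. This is an illustration under stated assumptions, not the lecture's own code: it uses the usual binomial backcross log likelihood under coupling, a 5% chi-squared critical value with one degree of freedom (3.841) as the significance threshold, and hypothetical function names.

```python
import math

CHI2_1DF_5PCT = 3.841  # assumed threshold: chi-squared, 1 df, 5% level


def backcross_loglik(q, n_nonrec, n_rec):
    """Coupling-phase backcross log likelihood (constant terms dropped)."""
    return n_rec * math.log(q) + n_nonrec * math.log(1 - q)


def phase_method_one(n_parental, n_other, critical=CHI2_1DF_5PCT):
    """Ad hoc Method I.

    n_parental: offspring that would be non-recombinant under coupling.
    n_other:    offspring that would be recombinant under coupling.
    Returns (decision, q_hat, G).
    """
    n = n_parental + n_other
    q_hat = n_other / n                       # may exceed 0.5 under coupling
    q_eval = min(max(q_hat, 1e-9), 1 - 1e-9)  # keep the logs finite
    g = 2 * (backcross_loglik(q_eval, n_parental, n_other)
             - backcross_loglik(0.5, n_parental, n_other))
    if g <= critical:
        return "undetermined", q_hat, g
    return ("coupling" if q_hat < 0.5 else "repulsion"), q_hat, g


print(phase_method_one(45, 5))
print(phase_method_one(5, 45))
print(phase_method_one(26, 24))
```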
An Ad Hoc Linkage Phase
Determination Method II
• Use when the likelihood surface is not symmetric (e.g. F2 with dominant markers).
• Calculate GC under the coupling model and GR under the repulsion model.
• If either is significant and
  • GC > GR, then the linkage is coupling.
  • GR > GC, then the linkage is repulsion.
• Otherwise, no determination can be made.
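The Method II decision rule is short enough to write down directly. In this sketch the function name is hypothetical and the default critical value (3.841, chi-squared with one degree of freedom at the 5% level) is an assumption; the slide does not state a threshold.

```python
def phase_method_two(g_coupling, g_repulsion, critical=3.841):
    """Ad hoc Method II: compare G statistics from the two phase models."""
    if max(g_coupling, g_repulsion) <= critical:
        return "undetermined"        # neither test is significant
    if g_coupling > g_repulsion:
        return "coupling"
    if g_repulsion > g_coupling:
        return "repulsion"
    return "undetermined"            # exact tie


print(phase_method_two(12.4, 1.7))
```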
Statistical Phase Determination:
Error
• There is a high chance of making an error when linkage is loose.
• When q<0.3, the chance of error is small even with sample sizes of ~20, except for F2-DD.
• An F2-DD cross needs a sample size >100 to keep the error down.
• The sample size needed decreases as linkage becomes tighter.
Once Linkage Phase
Determined
 Once linkage phase has been determined, the
analysis continues as before.
 Assume linkage phase is now known and do
a phase-known analysis.
Phase-Unknown Gametes
[Figure: pedigree with an AaBb father; gametes produced by father: AB, ab, Ab, aB; genotypes shown: AaBb, aabb, Aabb, aaBb.]
• There are multiple reasons why you may not know phase.
• One reason is that grandparents are unavailable.
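Likelihood for Phase-Unknown
Gametes
Let X be the count of AB and ab gametes.
Let Y be the count of Ab and aB gametes.

\[
\begin{aligned}
L(q) = P(\text{data} \mid q)
  &= P(\text{data}, \text{coupled} \mid q) + P(\text{data}, \text{repulsion} \mid q) \\
  &= P(\text{data} \mid \text{coupled}, q)\,P(\text{coupled})
   + P(\text{data} \mid \text{repulsion}, q)\,P(\text{repulsion}) \\
  &= \tfrac{1}{2}\, q^{Y}(1-q)^{X} + \tfrac{1}{2}\, q^{X}(1-q)^{Y}
\end{aligned}
\]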
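As a worked illustration with made-up counts (X = 5 and Y = 1 are assumptions, not data from the lecture), take the phase-unknown likelihood as an equal-weight mixture of the coupling and repulsion terms and evaluate it at q = 0.1 and at q = 0.5:

\[
L(0.1) = \tfrac{1}{2}(0.1)^{1}(0.9)^{5} + \tfrac{1}{2}(0.1)^{5}(0.9)^{1}
       = 0.0295245 + 0.0000045 = 0.029529
\]

\[
L(0.5) = \tfrac{1}{2}(0.5)^{6} + \tfrac{1}{2}(0.5)^{6} = 0.015625
\]

Nearly all of L(0.1) comes from the coupling term. Swapping X and Y leaves L(q) unchanged, so the data alone cannot say which phase produced them.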

Distribution of the Log
Likelihood Ratio Test Statistic
• Unfortunately, the test statistic
G = 2(ln L1 – ln L0)
does not have a regular distribution under the null hypothesis of no linkage.
• Numerical approximation of the distribution is required.
• On the other hand, there is usually insufficient data in one family to get a significant test statistic.
Distribution When There Are
Multiple Families

\[
\ln L(q) = \sum_{\text{families}} \ln\left[\tfrac{1}{2}\, q^{Y}(1-q)^{X} + \tfrac{1}{2}\, q^{X}(1-q)^{Y}\right]
\]

where X and Y are the gamete counts within each family.

• The distribution of G approaches a 50:50 mixture of a probability mass at 0 and a chi-squared distribution with one degree of freedom. In other words, we can simply perform a one-tailed chi-square test to test linkage when large numbers of families are included in the study.
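The multi-family test can be sketched as below. This is an illustration under assumptions: each family contributes an equal-weight mixture of the coupling and repulsion terms, the maximum is found by a simple grid search on (0, 0.5] (the likelihood is symmetric about 0.5), and the p-value uses the 50:50 mixture, that is, half the chi-squared (1 df) tail area. The function names and the family counts are hypothetical.

```python
import math


def family_loglik(q, x, y):
    """Phase-unknown log likelihood for one family with counts X=x, Y=y."""
    return math.log(0.5 * q ** y * (1 - q) ** x + 0.5 * q ** x * (1 - q) ** y)


def total_loglik(q, families):
    """Sum of the per-family log likelihoods; families is a list of (x, y)."""
    return sum(family_loglik(q, x, y) for x, y in families)


def linkage_test(families, grid_size=500):
    """Return (q_hat, G, p_value) for the phase-unknown linkage test."""
    grid = [0.5 * (i + 1) / grid_size for i in range(grid_size)]  # ends at 0.5
    q_hat = max(grid, key=lambda q: total_loglik(q, families))
    g = 2 * (total_loglik(q_hat, families) - total_loglik(0.5, families))
    g = max(g, 0.0)
    # P(chi-squared_1 > g) = erfc(sqrt(g / 2)); the mixture halves it.
    p_value = 0.5 * math.erfc(math.sqrt(g / 2)) if g > 0 else 1.0
    return q_hat, g, p_value


# Illustrative (X, Y) counts for four families.
print(linkage_test([(5, 1), (1, 6), (4, 0), (0, 5)]))
```

A grid search is used here only to keep the sketch dependency-free; any one-dimensional optimizer would do.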
General Analysis with Missing
Information: Step 1
Offspring genotypes: AaBb, aabb, Aabb, aaBb
1. Identify all possible mating types that could produce these
offspring and their expected frequency. (Retain phase
information).
All Possible Mating Types

Mating Type        Expected Frequency
AB/ab x AB/ab      (2p1p2q1q2)^2
AB/ab x Ab/aB      2(2p1p2q1q2)^2
Ab/aB x Ab/aB      (2p1p2q1q2)^2
AB/ab x Ab/ab      2(2p1p2q1q2)(2p1p2q2q2)
Ab/aB x Ab/ab      2(2p1p2q1q2)(2p1p2q2q2)
AB/ab x aB/ab      2(2p1p2q1q2)(2p2p2q1q2)
Ab/aB x aB/ab      2(2p1p2q1q2)(2p2p2q1q2)
AB/ab x ab/ab      2(2p1p2q1q2)(p2p2q2q2)
Ab/aB x ab/ab      2(2p1p2q1q2)(p2p2q2q2)
Ab/ab x aB/ab      2(2p1p2q2q2)(2p2p2q1q2)
General Analysis with Missing
Information : Step 2
2. Conditional on parental mating type, calculate the
probability of each offspring genotype.
Probability of Offspring
Conditional on Mating Type
e.g. AB/ab x Ab/aB

Columns: gametes of the AB/ab parent. Rows: gametes of the Ab/aB parent.

                AB (1-q)/2     Ab q/2         aB q/2         ab (1-q)/2
AB q/2          0.25q(1-q)     0.25q^2        0.25q^2        0.25q(1-q)
Ab (1-q)/2      0.25(1-q)^2    0.25q(1-q)     0.25q(1-q)     0.25(1-q)^2
aB (1-q)/2      0.25(1-q)^2    0.25q(1-q)     0.25q(1-q)     0.25(1-q)^2
ab q/2          0.25q(1-q)     0.25q^2        0.25q^2        0.25q(1-q)
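The conditional probabilities for any phase-known mating type can be generated by enumerating gamete pairs. The sketch below assumes two biallelic loci with recombination fraction q and parents written as pairs of phased haplotypes; the function names are hypothetical.

```python
from collections import defaultdict


def gametes(haplotypes, q):
    """Gamete frequencies for a parent given as two phased haplotypes,
    e.g. ("AB", "ab") for coupling or ("Ab", "aB") for repulsion."""
    h1, h2 = haplotypes
    freq = defaultdict(float)
    freq[h1] += (1 - q) / 2            # non-recombinant gametes
    freq[h2] += (1 - q) / 2
    freq[h1[0] + h2[1]] += q / 2       # recombinant gametes
    freq[h2[0] + h1[1]] += q / 2
    return dict(freq)


def offspring_probs(parent1, parent2, q):
    """P(offspring genotype | mating type) for unphased offspring genotypes."""
    probs = defaultdict(float)
    for g1, p1 in gametes(parent1, q).items():
        for g2, p2 in gametes(parent2, q).items():
            genotype = ("".join(sorted(g1[0] + g2[0]))
                        + "".join(sorted(g1[1] + g2[1])))
            probs[genotype] += p1 * p2
    return dict(probs)


# AB/ab x Ab/aB at q = 0.1
for genotype, p in sorted(offspring_probs(("AB", "ab"), ("Ab", "aB"), 0.1).items()):
    print(genotype, round(p, 4))
```

Because recombination in a parent that is homozygous at either locus changes nothing, the same function also covers mating types such as AB/ab x ab/ab.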
General Analysis with Missing
Information : Step 3

\[
P(AaBb \mid AB/ab \times Ab/aB) = 4 \times 0.25\,q(1-q) = q(1-q)
\]

3. Calculate the unconditional probability of each
offspring genotype.

\[
P(AaBb) = \sum_{\text{mating types}} P(AaBb \mid \text{mating type})\,P(\text{mating type})
\]
General Analysis with Missing
Information : Step 4
4. Sum the log-likelihood contributions over all possible
offspring genotypes.

\[
l(q) = \sum_{j \,\in\, \text{offspring genotypes}} f_j \log P(j)
\]
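Steps 3 and 4 can be sketched together. The example is an illustration only: it mixes two mating types (AB/ab x ab/ab and Ab/aB x ab/ab) with equal weights of 0.5, which is an assumption made for the demonstration, and the function names and counts are hypothetical.

```python
import math


def unconditional_probs(mating_types):
    """Step 3: sum P(genotype | mating type) * P(mating type).

    mating_types: list of (probability_of_mating_type, conditional_table),
    where conditional_table maps offspring genotype -> probability.
    """
    out = {}
    for weight, conditional in mating_types:
        for genotype, p in conditional.items():
            out[genotype] = out.get(genotype, 0.0) + weight * p
    return out


def log_likelihood(counts, probs):
    """Step 4: sum of f_j * log P(j) over observed offspring genotypes."""
    return sum(n * math.log(probs[g]) for g, n in counts.items() if n > 0)


q = 0.1
coupling = {"AaBb": (1 - q) / 2, "aabb": (1 - q) / 2, "Aabb": q / 2, "aaBb": q / 2}
repulsion = {"AaBb": q / 2, "aabb": q / 2, "Aabb": (1 - q) / 2, "aaBb": (1 - q) / 2}

probs = unconditional_probs([(0.5, coupling), (0.5, repulsion)])
counts = {"AaBb": 4, "aabb": 3, "Aabb": 1, "aaBb": 0}   # illustrative counts
print(probs)
print(log_likelihood(counts, probs))
```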
General Analysis with Missing
Information : Step 5
5. The log-likelihood ratio statistic is asymptotically a
50:50 mixture of a point mass at 0 and a chi-squared distribution
with one degree of freedom.

\[
G = 2(\ln L_1 - \ln L_0)
\]
Mixture of Linkage Phase
• A mixture of linkage phase results when the two parents have different phases. Consider the F2 with coupling-repulsion parents.
AB/ab x Ab/aB
Mixture of Linkage Phase:
Expected Genotype Frequency

Genotype   Count   Expected Frequency   Pi(R|G)
AABB       f1      0.25q(1-q)           0.5
AABb       f2      0.25(1-q)^2          q^2/[(1-q)^2+q^2]
AAbb       f3      0.25q(1-q)           0.5
AaBB       f4      0.25(1-q)^2          q^2/[(1-q)^2+q^2]
AaBb       f5      q(1-q)               0.5
Aabb       f6      0.25(1-q)^2          q^2/[(1-q)^2+q^2]
aaBB       f7      0.25q(1-q)           0.5
aaBb       f8      0.25(1-q)^2          q^2/[(1-q)^2+q^2]
aabb       f9      0.25q(1-q)           0.5
Mixture of Linkage Phase: Log
Likelihood

\[
\log L(q) = (f_1 + f_3 + f_5 + f_7 + f_9)\log q + (N + f_2 + f_4 + f_6 + f_8)\log(1-q)
\]

Analytic MLE available:

\[
\hat{q} = \frac{f_1 + f_3 + f_5 + f_7 + f_9}{2N}
\]
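The closed-form estimate is easy to compute and to check against the log likelihood it maximizes. In this sketch the counts are made up, the function names are hypothetical, and the genotype classes are assumed to be in the order AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb (f1 to f9). The log likelihood used is (f1+f3+f5+f7+f9) log q + (N + f2+f4+f6+f8) log(1-q), where N is the total count.

```python
import math


def mixed_phase_mle(f):
    """Analytic estimate q_hat = (f1 + f3 + f5 + f7 + f9) / (2N)."""
    if len(f) != 9:
        raise ValueError("expected nine genotype counts f1..f9")
    n = sum(f)
    odd = f[0] + f[2] + f[4] + f[6] + f[8]
    return odd / (2 * n)


def mixed_phase_loglik(q, f):
    """Log likelihood for the coupling-repulsion F2, constants dropped."""
    n = sum(f)
    odd = f[0] + f[2] + f[4] + f[6] + f[8]
    even = f[1] + f[3] + f[5] + f[7]
    return odd * math.log(q) + (n + even) * math.log(1 - q)


counts = [3, 20, 2, 22, 10, 21, 3, 19, 2]   # illustrative counts only
q_hat = mixed_phase_mle(counts)
print(q_hat, mixed_phase_loglik(q_hat, counts))
```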
Mixture of Self and Random
Mating (MSR)
• Controlled crosses are not always available.
• Crosses resulting from open-pollinated populations frequently are available. These lead to MSR.
• Assume loci A and B are linked in coupling phase with recombination fraction q.
• Assume alleles a and A at locus A, and b and B at locus B.
• Assume u and v are the frequencies of A and B in the pollen pool (e.g. the frequency of a is 1-u).
• Assume linkage equilibrium in the pollen.
MSR - Expected Frequencies
for Codominant Alleles

Genotype   Count   Outcross                       Self
AABB       f1      0.5uv(1-q)                     0.25(1-q)^2
AABb       f2      0.5u[(1-v)(1-q)+vq]            0.5q(1-q)
AAbb       f3      0.5u(1-v)q                     0.25q^2
AaBB       f4      0.5v[(1-u)(1-q)+uq]            0.5q(1-q)
AaBb       f5      0.5[1-q-(1-2q)(u+v-2uv)]       0.5[(1-q)^2+q^2]
Aabb       f6      0.5(1-v)(u-2uq+q)              0.5q(1-q)
aaBB       f7      0.5(1-u)vq                     0.25q^2
aaBb       f8      0.5(1-u)(v-2vq+q)              0.5q(1-q)
aabb       f9      0.5(1-u)(1-v)(1-q)             0.25(1-q)^2
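The outcross frequencies can be reproduced by pairing maternal gametes with pollen gametes. The sketch assumes the maternal plant is the coupling double heterozygote AB/ab and that pollen haplotype frequencies are products of allele frequencies (linkage equilibrium); the function name is hypothetical.

```python
def outcross_frequencies(q, u, v):
    """Expected outcross progeny frequencies for an AB/ab mother.

    q: recombination fraction; u, v: pollen-pool frequencies of A and B.
    """
    maternal = {"AB": (1 - q) / 2, "Ab": q / 2, "aB": q / 2, "ab": (1 - q) / 2}
    pollen = {"AB": u * v, "Ab": u * (1 - v),
              "aB": (1 - u) * v, "ab": (1 - u) * (1 - v)}
    freqs = {}
    for gm, pm in maternal.items():
        for gp, pp in pollen.items():
            genotype = ("".join(sorted(gm[0] + gp[0]))
                        + "".join(sorted(gm[1] + gp[1])))
            freqs[genotype] = freqs.get(genotype, 0.0) + pm * pp
    return freqs


for genotype, p in sorted(outcross_frequencies(0.1, 0.3, 0.6).items()):
    print(genotype, round(p, 5))
```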
MSR – Log Likelihood
Function

\[
\log L(q) = \sum_{i=1}^{9} f_i \log\left[t\,p_{oi} + (1-t)\,p_{si}\right]
\]

• t is the probability of outcrossing (vs. selfing).
• p_oi is the expected frequency of type i progeny from outcrossing.
• p_si is the expected frequency of type i progeny from selfing.
• q enters through the expected frequencies given in the previous table.
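The MSR log likelihood is a weighted sum over progeny classes, with each class probability a mixture of its outcross and self frequencies. A minimal sketch follows; the function name is hypothetical and the expected frequencies are passed in by the caller rather than hard-coded. The three-class numbers in the usage lines are toy values, not the nine-class table.

```python
import math


def msr_log_likelihood(f, p_out, p_self, t):
    """Sum over classes of f_i * log(t * p_oi + (1 - t) * p_si).

    f:      observed counts per progeny class
    p_out:  expected class frequencies under outcrossing
    p_self: expected class frequencies under selfing
    t:      outcrossing rate, 0 <= t <= 1
    """
    if not (len(f) == len(p_out) == len(p_self)):
        raise ValueError("f, p_out and p_self must have the same length")
    return sum(fi * math.log(t * po + (1 - t) * ps)
               for fi, po, ps in zip(f, p_out, p_self) if fi > 0)


# Toy three-class example.
print(msr_log_likelihood([2, 3, 5], [0.2, 0.3, 0.5], [0.5, 0.25, 0.25], 0.4))
```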
Estimating Allelic Frequencies
in Pollen Pool (u and v)
• Use a single locus, say A.
• Consider heterozygous maternal plants (Aa).
• Write an expression for the log-likelihood in the MSR population.
• Condition on the outcrossing rate t.
• Solve analytically for the MLE of u.
Estimating the Outcrossing
Rate t
• The prior analysis conditioned on the outcrossing rate t.
• Unfortunately, a heterozygous (Aa) mother is necessary to determine linkage but is the least informative for t.
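The single-locus estimate of u, conditional on t, can be worked out as follows. This derivation is a sketch under stated assumptions: the mother is Aa, an outcrossed seed receives A from the pollen with probability u, and n_AA, n_Aa, n_aa are hypothetical names for the observed progeny counts. The progeny probabilities are

\[
P(AA) = \frac{tu}{2} + \frac{1-t}{4}, \qquad
P(Aa) = \frac{1}{2}, \qquad
P(aa) = \frac{t(1-u)}{2} + \frac{1-t}{4}
\]

Setting the derivative of \( n_{AA}\log P(AA) + n_{aa}\log P(aa) \) with respect to u to zero gives \( n_{AA}\,P(aa) = n_{aa}\,P(AA) \), and since \( P(AA) + P(aa) = 1/2 \),

\[
\hat{u} = \frac{1}{2} + \frac{n_{AA} - n_{aa}}{2t\,(n_{AA} + n_{aa})}
\]

truncated to the interval [0, 1] if necessary. The data enter only through the product \( t(u - 1/2) \), so progeny of Aa mothers cannot separate t from u, which is in line with the remark that the heterozygous mother is the least informative for t.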
MSR - Estimating
Recombination Fraction q I
• EM: Calculate the conditional probabilities of recombination given the genotype.

\[
q_{n+1} = \frac{1}{N}\sum_{i=1}^{9} f_i\left[\left(p_{oi}^{Ab} + p_{oi}^{aB}\right)t + \left(p_{si}^{Ab} + p_{si}^{aB}\right)(1-t)\right]
\]

• NR: Calculate the score and information.

\[
S(q) = \sum_{i=1}^{9} f_i\,\frac{d \log\left[(1-t)\,p_{si} + t\,p_{oi}\right]}{dq}
\]

\[
I(q) = -E\left[\sum_{i=1}^{9} f_i\,\frac{d^2 \log\left[(1-t)\,p_{si} + t\,p_{oi}\right]}{dq^2}\right]
\]
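A one-parameter Newton-Raphson iteration can be sketched generically. Two departures from the slide are assumptions of this sketch: the score and information are approximated by central finite differences instead of analytic derivatives, and the observed information is used in place of the expected information. The function name is hypothetical, and the usage lines apply it to a simple binomial log likelihood rather than to the MSR likelihood.

```python
import math


def newton_raphson(loglik, q0, tol=1e-8, max_iter=50, h=1e-4):
    """Maximize loglik(q) on (0, 1) by Newton-Raphson.

    Score and observed information come from central finite differences
    with step h; iterates are clamped so q - h and q + h stay inside (0, 1).
    """
    q = q0
    for _ in range(max_iter):
        score = (loglik(q + h) - loglik(q - h)) / (2 * h)
        info = -(loglik(q + h) - 2 * loglik(q) + loglik(q - h)) / h ** 2
        if info <= 0:
            break                      # surface not concave here; stop
        q_new = min(max(q + score / info, 1e-3), 1 - 1e-3)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q


# Toy check: 20 recombinants out of 100, so the maximum is at q = 0.2.
def binomial_loglik(q):
    return 20 * math.log(q) + 80 * math.log(1 - q)


print(newton_raphson(binomial_loglik, 0.25))
```

To use this for MSR, `loglik` would be a function of q that evaluates the mixture log likelihood at fixed t, u, and v.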
MSR - Estimating q, u, and v
EM
• Pick initial estimates (u0, v0, q0).
• Calculate expected gametic frequencies in selfed and outcrossed populations conditional on current estimates and observed genotype frequencies (t f_i p_oig).
• Calculate the MLE for (u1, v1, q1).
• Iterate.
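MSR - Estimating q, u, and v
(NR)

\[
S(q,u,v) =
\begin{pmatrix}
\dfrac{\partial L}{\partial u} \\[2ex]
\dfrac{\partial L}{\partial v} \\[2ex]
\dfrac{\partial L}{\partial q}
\end{pmatrix},
\qquad
I(q,u,v) = -
\begin{pmatrix}
\dfrac{\partial^2 L}{\partial u^2} & \dfrac{\partial^2 L}{\partial u\,\partial v} & \dfrac{\partial^2 L}{\partial u\,\partial q} \\[2ex]
\dfrac{\partial^2 L}{\partial u\,\partial v} & \dfrac{\partial^2 L}{\partial v^2} & \dfrac{\partial^2 L}{\partial q\,\partial v} \\[2ex]
\dfrac{\partial^2 L}{\partial u\,\partial q} & \dfrac{\partial^2 L}{\partial q\,\partial v} & \dfrac{\partial^2 L}{\partial q^2}
\end{pmatrix}
\]

\[
\begin{pmatrix} u_{n+1} \\ v_{n+1} \\ q_{n+1} \end{pmatrix}
=
\begin{pmatrix} u_n \\ v_n \\ q_n \end{pmatrix}
+ \frac{1}{N}\, I^{-1} S
\]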
MSR – Linkage Information
• Linkage information content is sensitive to allele frequencies when outcrossing is high.
• Linkage information content decreases rapidly as the allelic frequencies approach 0.5.
• When linkage is tight, MSR provides less information relative to F2 than when linkage is loose, but tight linkage is always more informative than loose linkage.
MSR - Bias and Variance
• Bias and mean square error are higher for dominant markers than for codominant markers.
• Bias and mean square error are acceptable for q<0.2 only when the dominant allele frequency is less than the true q.
• When the dominant allele frequency is > 0.5, there is a high negative bias on q.
• Allele frequency cannot be accurately estimated when the true frequency is <0.1 or >0.5 and outcrossing is low.
Summary
• Unknown linkage phase
  • Reducing the problem to a phase-known problem
  • Likelihood when phase unknown
• Likelihood for general pedigree with missing information
• Likelihood for mixture of linkage phase
• Mixture of Self and Random mating (MSR)