TPJ_4679_sm_DataS4

Data S4. Inferring the genotypes of the TH RIL population and the parental cultivars by the maximum likelihood method
When the RIL population was genotyped, novel polymorphic markers that were not present in the parents were frequently detected. To use these novel markers in constructing the genetic map, the corresponding parental genotypes must be established. We therefore used the maximum likelihood method to infer the genotypes of the TH RIL population and of the parents, as follows:
1. Calculation of the recombination rate between two markers. For any two linked markers (A/a, B/b) with recombination rate r, four genotypes, AABB, aabb, AAbb and aaBB, can be detected in an RIL population, with expected frequencies of (1 – r)/2, (1 – r)/2, r/2 and r/2, respectively.
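As a quick check of these expected frequencies, the Python sketch below returns the four class frequencies for a given r (the function name `expected_frequencies` is hypothetical, not part of the original analysis). The two non-recombinant classes share 1 – r and the two recombinant classes share r, so the frequencies sum to 1.

```python
from typing import Dict


def expected_frequencies(r: float) -> Dict[str, float]:
    """Expected frequencies of the four two-locus RIL genotypes for
    recombination rate r (hypothetical helper illustrating step 1)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return {
        "AABB": (1 - r) / 2,  # non-recombinant class
        "aabb": (1 - r) / 2,  # non-recombinant class
        "AAbb": r / 2,        # recombinant class
        "aaBB": r / 2,        # recombinant class
    }


freqs = expected_frequencies(0.1)
print(freqs)
```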
For an RIL population of n inbred lines, let f1, f2, f3 and f4 be the observed numbers of lines carrying the four genotypes, respectively (n = f1 + f2 + f3 + f4). The likelihood function is then:
$$L(r) = \frac{n!}{f_1!\,f_2!\,f_3!\,f_4!}\left(\frac{1-r}{2}\right)^{f_1+f_2}\left(\frac{r}{2}\right)^{f_3+f_4}$$
After logarithmic transformation, the formula becomes:
In ( L(r )  C  ( f 1  f 2) ln(
1 r
r
)  ( f 3  f 4) ln( )
2
2
C is a constant.
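The counts and the log-likelihood can be sketched in Python as follows. The coding of marker scores as 1 (upper-case allele, e.g. A or B), 0 (lower-case allele) or None (missing) is an assumption made for illustration, as are the function names `count_genotypes` and `log_likelihood`. The constant C is evaluated with `math.lgamma`, using ln k! = lgamma(k + 1).

```python
import math
from typing import Optional, Sequence, Tuple

Counts = Tuple[int, int, int, int]


def count_genotypes(m1: Sequence[Optional[int]],
                    m2: Sequence[Optional[int]]) -> Counts:
    """Return (f1, f2, f3, f4): numbers of RILs scored AABB, aabb, AAbb, aaBB.

    Assumed coding: 1 = upper-case allele, 0 = lower-case allele,
    None = missing (lines with a missing score are skipped).
    """
    f = {(1, 1): 0, (0, 0): 0, (1, 0): 0, (0, 1): 0}
    for g1, g2 in zip(m1, m2):
        if (g1, g2) in f:
            f[(g1, g2)] += 1
    return f[(1, 1)], f[(0, 0)], f[(1, 0)], f[(0, 1)]


def log_likelihood(r: float, counts: Counts) -> float:
    """ln L(r) = C + (f1 + f2) ln((1 - r)/2) + (f3 + f4) ln(r/2)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    f1, f2, f3, f4 = counts
    n = f1 + f2 + f3 + f4
    # C = ln(n! / (f1! f2! f3! f4!)), using lgamma(k + 1) = ln(k!)
    c = math.lgamma(n + 1) - sum(math.lgamma(f + 1) for f in counts)
    return c + (f1 + f2) * math.log((1 - r) / 2) + (f3 + f4) * math.log(r / 2)


counts = count_genotypes([1, 1, 0, 0, 1, 0, None], [1, 1, 0, 1, 0, 0, 1])
print(counts, log_likelihood(0.2, counts))
```

For these seven made-up lines, the last one is skipped because of its missing score, leaving counts = (2, 2, 1, 1).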
Under the null hypothesis that the two markers are unlinked, r = 0.5, and the log-likelihood ratio of a given r against r = 0.5 is:
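$$\ln\frac{L(r)}{L(0.5)} = (f_1+f_2)\left[\ln\frac{1-r}{2} - \ln\frac{1}{4}\right] + (f_3+f_4)\left[\ln\frac{r}{2} - \ln\frac{1}{4}\right]$$
since each of the four genotypes has an expected frequency of 1/4 when r = 0.5. Equivalently,
$$\ln\frac{L(r)}{L(0.5)} = (f_1+f_2)\left[\ln(1-r) - \ln 0.5\right] + (f_3+f_4)\left[\ln r - \ln 0.5\right]$$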
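A short numerical check of this ratio in Python, with hypothetical function names: under r = 0.5 every genotype class has expected frequency 1/4, so the ratio can be written either with ln((1 – r)/2), ln(r/2) and ln(1/4), or equivalently with ln(1 – r), ln r and ln 0.5. The sketch evaluates both forms and confirms that they agree and that the ratio is 0 at r = 0.5. The counts (40, 40, 10, 10) are illustrative values only, not data from the TH population.

```python
import math
from typing import Tuple

Counts = Tuple[int, int, int, int]


def llr_quarter_form(r: float, counts: Counts) -> float:
    """ln[L(r)/L(0.5)] written with ln((1 - r)/2), ln(r/2) and ln(1/4)."""
    f1, f2, f3, f4 = counts
    return ((f1 + f2) * (math.log((1 - r) / 2) - math.log(0.25))
            + (f3 + f4) * (math.log(r / 2) - math.log(0.25)))


def llr_half_form(r: float, counts: Counts) -> float:
    """Equivalent form written with ln(1 - r), ln(r) and ln(0.5)."""
    f1, f2, f3, f4 = counts
    return ((f1 + f2) * (math.log(1 - r) - math.log(0.5))
            + (f3 + f4) * (math.log(r) - math.log(0.5)))


example = (40, 40, 10, 10)  # illustrative counts only
for r in (0.1, 0.2, 0.5):
    print(r, llr_quarter_form(r, example), llr_half_form(r, example))
```

The constant C cancels in the ratio, which is why neither function needs the factorial term.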
The value of r in this formula is increased from 0.001 to 1.0 in steps of 0.001, and the value that gives the largest likelihood ratio is taken as the estimated recombination rate between the two markers. If r < 0.5, the two markers are assumed to be linked in coupling phase (AB/ab); otherwise, they are assumed to be linked in repulsion phase (Ab/aB).
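The grid search can be sketched in Python as below, assuming the counts (f1, f2, f3, f4) are already available; the names `estimate_r` and `_class_term` are hypothetical. Because ln(1 – r) is undefined at r = 1.0, a genotype class whose expected frequency is 0 contributes minus infinity when it is observed and nothing when its count is 0. Setting the derivative of ln L(r) to zero gives the closed-form estimate r = (f3 + f4)/n, so the grid search should return a grid point adjacent to that value.

```python
import math
from typing import Tuple

Counts = Tuple[int, int, int, int]


def _class_term(count: int, p: float) -> float:
    """Contribution of one pooled genotype class to ln[L(r)/L(0.5)]."""
    if count == 0:
        return 0.0            # an unobserved class adds nothing
    if p == 0.0:
        return -math.inf      # an observed class cannot have probability 0
    return count * (math.log(p) - math.log(0.25))


def estimate_r(counts: Counts, step: float = 0.001) -> Tuple[float, float, str]:
    """Grid search over r = step, 2*step, ..., 1.0 for the largest ratio.

    Returns (r_hat, ln[L(r_hat)/L(0.5)], linkage phase).
    """
    f1, f2, f3, f4 = counts
    f_par, f_rec = f1 + f2, f3 + f4      # non-recombinant and recombinant
    best_r, best_llr = step, -math.inf
    for k in range(1, round(1.0 / step) + 1):
        r = k * step
        value = _class_term(f_par, (1 - r) / 2) + _class_term(f_rec, r / 2)
        if value > best_llr:
            best_r, best_llr = r, value
    phase = "AB/ab" if best_r < 0.5 else "Ab/aB"
    return best_r, best_llr, phase


print(estimate_r((40, 40, 10, 10)))   # coupling example
print(estimate_r((10, 10, 40, 40)))   # repulsion example
```

With the illustrative counts (40, 40, 10, 10) the closed form gives 20/100 = 0.2, and the grid search lands on r = 0.2 in coupling phase; swapping the classes gives r = 0.8 in repulsion phase.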
2. Deduction of parental genotypes. Using the approach above, the likelihood ratio is calculated for every pair of markers, and the markers are then divided into groups according to threshold values of this ratio. Taking each group as a unit, we infer the parental genotypes as follows.
(a) Starting from the pair of markers with the largest likelihood ratio in a group, we randomly select one of the two markers and assign it an arbitrary parental genotype. The parental genotype of the other marker is then determined from the recombination rate between the two markers calculated in step 1 (coupling, AB/ab, if r < 0.5; repulsion, Ab/aB, otherwise).
(b) The markers in each group are divided into two subgroups: (I) Mg, the markers whose parental genotype has been assigned; and (II) Mu, the markers whose parental genotype has not yet been assigned.
(c) The pair of markers mgi (from Mg) and muj (from Mu) with the largest likelihood ratio is identified. The parental genotype of muj is then inferred from the recombination rate between the two markers and the parental genotype of mgi.
(d) muj is moved from the Mu subgroup to the Mg subgroup. Steps (c) and (d) are repeated until the Mu subgroup is empty.
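Step 2 can be sketched in Python under several assumptions the text does not specify. Markers are joined into one group whenever their pairwise likelihood ratio reaches a chosen threshold (a single-linkage rule implemented with union-find; the rule and the threshold value 3.0 are hypothetical). Pairwise likelihood ratios and recombination rates are supplied as dictionaries keyed by marker pairs, and a parental genotype is encoded as True when parent 1 carries the upper-case allele (e.g. AA) and parent 2 the lower-case allele (aa), False for the reverse. All function names and marker values are made up for illustration.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def _get(table: Dict[Pair, float], a: str, b: str) -> float:
    """Look up a symmetric pairwise value stored under (a, b) or (b, a)."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def linkage_groups(markers: Sequence[str], pair_llr: Dict[Pair, float],
                   threshold: float) -> List[List[str]]:
    """Join markers whose pairwise ratio is >= threshold (hypothetical rule)."""
    parent = {m: m for m in markers}

    def find(m: str) -> str:
        while parent[m] != m:
            parent[m] = parent[parent[m]]   # path halving
            m = parent[m]
        return m

    for (a, b), value in pair_llr.items():
        if value >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra
    groups: Dict[str, List[str]] = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())


def infer_parental_genotypes(group: Sequence[str], pair_llr: Dict[Pair, float],
                             pair_r: Dict[Pair, float],
                             rng: Optional[random.Random] = None) -> Dict[str, bool]:
    """Steps (a)-(d): True means parent 1 carries the upper-case allele."""
    if rng is None:
        rng = random.Random()
    if len(group) == 1:
        return {group[0]: True}

    def same_parent(a: str, b: str) -> bool:
        # coupling (r < 0.5, AB/ab): one parent carries both upper-case alleles
        return _get(pair_r, a, b) < 0.5

    # (a) start from the pair with the largest likelihood ratio in the group
    pairs = [(a, b) for i, a in enumerate(group) for b in group[i + 1:]]
    a, b = max(pairs, key=lambda p: _get(pair_llr, *p))
    first, second = (a, b) if rng.random() < 0.5 else (b, a)
    assigned = {first: True}                      # arbitrary assignment
    assigned[second] = True if same_parent(first, second) else False
    # (b) Mg = keys of `assigned`, Mu = `unassigned`
    unassigned = [m for m in group if m not in assigned]
    # (c)-(d) assign the best-linked Mu marker, move it to Mg, repeat
    while unassigned:
        g, u = max(((g, u) for g in assigned for u in unassigned),
                   key=lambda p: _get(pair_llr, *p))
        assigned[u] = assigned[g] if same_parent(g, u) else not assigned[g]
        unassigned.remove(u)
    return assigned


markers = ["m1", "m2", "m3", "m4", "m5"]
pair_llr = {("m1", "m2"): 30.0, ("m1", "m3"): 20.0, ("m2", "m3"): 18.0,
            ("m1", "m4"): 5.0, ("m2", "m4"): 6.0, ("m3", "m4"): 4.0,
            ("m1", "m5"): 0.2, ("m2", "m5"): 0.1, ("m3", "m5"): 0.3,
            ("m4", "m5"): 0.05}
pair_r = {("m1", "m2"): 0.05, ("m1", "m3"): 0.9, ("m2", "m3"): 0.85,
          ("m1", "m4"): 0.3, ("m2", "m4"): 0.25, ("m3", "m4"): 0.7}
groups = linkage_groups(markers, pair_llr, threshold=3.0)
genotypes: Dict[str, bool] = {}
for grp in groups:
    genotypes.update(infer_parental_genotypes(grp, pair_llr, pair_r, random.Random(0)))
print(groups, genotypes)
```

Because the first assignment in step (a) is arbitrary, flipping every value within a group gives an equally valid result; only the assignments relative to other markers in the same group carry information.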