Data S4. Inferring the genotypes of the TH RIL population and the parental cultivars by the maximum likelihood method

When the RIL population was genotyped, novel polymorphic markers that did not appear in the parents were detected frequently. To use these novel markers for construction of the genetic map, it is necessary to establish the corresponding genotypes of the parents. Therefore, the maximum likelihood method was used to infer the genotypes of the TH RIL population and the parents as follows:

1. Calculation of the rate of recombination between two markers.

For any two linked markers (A/a, B/b) with recombination rate r, four genotypes, AABB, aabb, AAbb and aaBB, can be detected in an RIL population, with expected frequencies of (1 - r)/2, (1 - r)/2, r/2 and r/2, respectively. For an RIL population that consists of n inbred lines, let f1, f2, f3 and f4 be the observed numbers of lines with each of the four genotypes (n = f1 + f2 + f3 + f4). The likelihood function is then:

$$L(r) = \frac{n!}{f_1!\, f_2!\, f_3!\, f_4!} \left(\frac{1-r}{2}\right)^{f_1+f_2} \left(\frac{r}{2}\right)^{f_3+f_4}$$

After logarithmic transformation, the formula becomes:

$$\ln L(r) = C + (f_1+f_2)\ln\frac{1-r}{2} + (f_3+f_4)\ln\frac{r}{2}$$

where $C = \ln\frac{n!}{f_1!\, f_2!\, f_3!\, f_4!}$ is a constant that does not depend on r. If the two markers are assumed not to be linked, r equals 0.5 and every genotype has an expected frequency of 0.25, so the log-likelihood ratio relative to the unlinked case is:

$$\ln\frac{L(r)}{L(0.5)} = (f_1+f_2)\left(\ln\frac{1-r}{2} - \ln 0.25\right) + (f_3+f_4)\left(\ln\frac{r}{2} - \ln 0.25\right)$$

The likelihood ratio is evaluated for values of r from 0.001 to 1.0 in increments of 0.001, and the value of r that gives the largest likelihood ratio is taken as the estimate of the recombination rate between the two markers. If the value of r is <0.5, the two markers are assumed to be linked as AB/ab; otherwise, they are assumed to be linked as Ab/aB.

2. Deduction of parental genotypes.
Using the above approach, the likelihood ratio is calculated for every pair of markers, and all the markers are then divided into groups according to certain threshold values. Taking these groups as units, we infer the parental genotypes as follows.

(a) Beginning with the two markers that have the largest likelihood ratio in a group, we select one of the two markers at random and assign it an arbitrary parental genotype. We then determine the parental genotype of the other marker according to the recombination rate calculated in step 1.

(b) The markers in each group are divided into two subgroups: (I) Mg, the markers for which the parental genotype has been assigned; and (II) Mu, the markers for which the parental genotype has not yet been assigned.

(c) The pair of markers mgi (from Mg) and muj (from Mu) with the largest likelihood ratio is identified. The parental genotype of muj is then inferred from the parental genotype of mgi and the rate of recombination between the two markers.

(d) muj is added to the Mg subgroup and removed from the Mu subgroup. Step (c) is repeated until the Mu subgroup is empty.
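As an illustration of step 1, the grid search over r can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names `log_likelihood_ratio` and `estimate_r` are hypothetical, and the ratio is taken relative to r = 0.5, where each of the four genotype classes has an expected frequency of 0.25. Because the constant C cancels in the ratio, only the counts f1 + f2 and f3 + f4 are needed.

```python
import math


def log_likelihood_ratio(r, f1, f2, f3, f4):
    """Return ln(L(r) / L(0.5)) for genotype counts f1..f4 (hypothetical helper).

    f1, f2 count the AABB and aabb lines; f3, f4 count the AAbb and aaBB lines.
    ln((1 - r)/2) - ln(0.25) simplifies to ln(2(1 - r)), and
    ln(r/2) - ln(0.25) simplifies to ln(2r).
    """
    same = f1 + f2        # lines carrying AABB or aabb
    swapped = f3 + f4     # lines carrying AAbb or aaBB
    total = 0.0
    if same:
        if r >= 1.0:
            # ln(0) is undefined; the likelihood is zero at this grid point.
            return -math.inf
        total += same * math.log(2.0 * (1.0 - r))
    if swapped:
        total += swapped * math.log(2.0 * r)
    return total


def estimate_r(f1, f2, f3, f4, step=0.001):
    """Scan r = step, 2*step, ..., 1.0 and keep the r with the largest ratio."""
    best_r, best_llr = None, -math.inf
    for k in range(1, round(1.0 / step) + 1):
        r = k * step
        llr = log_likelihood_ratio(r, f1, f2, f3, f4)
        if llr > best_llr:
            best_r, best_llr = r, llr
    return best_r, best_llr


# Illustrative counts only (not data from the study).
r_hat, llr_hat = estimate_r(45, 45, 5, 5)
linked_as = "AB/ab" if r_hat < 0.5 else "Ab/aB"
```

When both f1 + f2 and f3 + f4 are non-zero, the log-likelihood is maximised at r = (f3 + f4)/n, so the grid search returns the grid point closest to that value; the scan simply mirrors the procedure described in the text.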
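Steps (a) to (d) of the parental genotype deduction amount to a greedy procedure that can be sketched as below. The sketch rests on several assumptions that are not in the original text: the function name `infer_parental_phase` is hypothetical; a parental genotype is encoded as +1 or -1 (which parent contributed the allele arbitrarily labelled "A"); the pairwise likelihood ratios and recombination rates for one marker group are supplied as a precomputed dictionary; and the seed marker is chosen deterministically rather than at random so that the result is reproducible.

```python
def infer_parental_phase(markers, pairwise):
    """Greedy assignment of parental genotypes within one marker group.

    markers  : iterable of marker names in the group
    pairwise : dict mapping (marker_x, marker_y) -> (likelihood_ratio, r)
    Returns a dict marker -> +1 or -1 (assumed encoding of parental origin).
    """
    # Make the pairwise table symmetric so lookups work in either order.
    stats = {}
    for (x, y), value in pairwise.items():
        stats[(x, y)] = value
        stats[(y, x)] = value

    # (a) Seed with the pair that has the largest likelihood ratio; the first
    # marker gets an arbitrary genotype (+1), the second follows from r.
    (first, second), (_, r) = max(pairwise.items(), key=lambda item: item[1][0])
    phase = {first: 1}
    phase[second] = phase[first] if r < 0.5 else -phase[first]

    # (b) Mg is the set of keys of `phase`; Mu is everything else.
    unassigned = set(markers) - set(phase)

    # (c) and (d): repeatedly take the best (assigned, unassigned) pair.
    while unassigned:
        best = max(
            ((g, u) for g in phase for u in unassigned if (g, u) in stats),
            key=lambda pair: stats[pair][0],
            default=None,
        )
        if best is None:
            break  # no pairwise information links the remaining markers
        g, u = best
        r = stats[(g, u)][1]
        # r < 0.5: same parental origin as g; otherwise the opposite one.
        phase[u] = phase[g] if r < 0.5 else -phase[g]
        unassigned.remove(u)
    return phase


# Illustrative values only (not data from the study).
example_pairs = {
    ("m1", "m2"): (36.8, 0.10),
    ("m2", "m3"): (30.0, 0.90),
    ("m1", "m3"): (10.0, 0.80),
    ("m3", "m4"): (25.0, 0.05),
    ("m1", "m4"): (5.0, 0.70),
    ("m2", "m4"): (8.0, 0.85),
}
example_phase = infer_parental_phase(["m1", "m2", "m3", "m4"], example_pairs)
```

In this example m1 and m2 seed the group, m3 is then attached through m2 (r = 0.90, so it receives the opposite genotype), and m4 is attached through m3 (r = 0.05, so it receives the same genotype as m3). Because the first genotype is arbitrary, the result is defined only up to swapping the two parents across the whole group.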