Supplemental data:

An S-PLUS (2000) function called simulSEL was developed to reproduce the steps leading to a
typical breeding population.
The principle of the simulations is as follows (Figure 3).
Figure 3 here
Marker-genotype construction: Before the start of the breeding process, we considered only
a few founder lines, with full linkage disequilibrium across their whole genome, hence between
all the markers and the QTLs. We also imposed that, at this generation, no founder line shared
any allele with any other. Thus, line 1 carried only markers and QTLs coded 1,
line 2 only markers and QTLs coded 2, and the last founder line (the NPth) carried only
markers and QTLs coded NP all along its genome. Markers were evenly spaced every d
centiMorgans along the chromosomes. On chromosome 1, we simulated a single QTL
between two markers. In addition to this QTL of interest, we simulated on the other
chromosomes Npoly randomly located QTLs (which might therefore be linked to one another or not) with
random effects to represent the polygenic contribution to the trait values.
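The founder construction described above can be sketched in a few lines. This is only an illustrative Python sketch, not the simulSEL code: the function name make_founders and the arguments n_chrom and chrom_length_cm are hypothetical, and QTL positions are left out (a QTL would simply carry the same founder code as its flanking markers).

```python
import numpy as np

def make_founders(n_founders, n_chrom, chrom_length_cm, d):
    """Return marker positions (cM) on one chromosome and a founder-by-locus allele matrix.

    Founder k carries allele code k at every locus, so all loci are in full
    linkage disequilibrium and no two founders share an allele.
    """
    # One marker every d cM (the small epsilon keeps the last marker at the chromosome end).
    positions = np.arange(0.0, chrom_length_cm + 1e-9, d)
    n_loci = n_chrom * positions.size
    codes = np.arange(1, n_founders + 1)                    # allele codes 1..NP
    genotypes = np.repeat(codes[:, None], n_loci, axis=1)   # same code all along the genome
    return positions, genotypes

# Hypothetical values: NP = 4 founders, 2 chromosomes of 100 cM, d = 10 cM.
positions, founders = make_founders(n_founders=4, n_chrom=2, chrom_length_cm=100.0, d=10.0)
```

Because the lines are fixed, one allele code per locus is enough to describe a founder.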
First generation of crosses: Starting from the founder-line generation, circular crosses, i.e. 1*2, 2*3,
… , (NP-1)*NP, NP*1, were performed. We then derived lines by self-pollination to obtain new
fixed lines (an initial cross followed by repeated self-pollinations to return to fixed lines
is called a breeding cycle). NP mixed sub-populations of the same size (same contribution of
all founders in this first generation) were derived, giving the “G0” generation.
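As an illustration of one breeding cycle, the sketch below lists the circular crosses and derives one line from a two-parent cross by repeated self-pollination. It is a simplified sketch under stated assumptions, not the simulSEL implementation: the function names are hypothetical, a single chromosome is simulated, and recombination between adjacent markers uses the Haldane map function, which the text does not specify.

```python
import numpy as np

def circular_crosses(n_founders):
    """Crosses 1*2, 2*3, ..., (NP-1)*NP, NP*1: every founder is used exactly twice."""
    return [(i, i % n_founders + 1) for i in range(1, n_founders + 1)]

def gamete(hap_a, hap_b, rec_prob, rng):
    """One meiotic product of a diploid plant with haplotypes hap_a and hap_b."""
    # A crossover occurs in each marker interval with probability rec_prob.
    switches = rng.random(hap_a.size - 1) < rec_prob
    # Random starting strand, then switch strand at every crossover.
    phase = np.concatenate(([rng.integers(2)], switches)).cumsum() % 2
    return np.where(phase == 0, hap_a, hap_b)

def derive_line(parent1, parent2, d_cm, n_self, rng):
    """Cross two fixed parents, then self the offspring n_self times (single seed descent)."""
    r = 0.5 * (1.0 - np.exp(-2.0 * d_cm / 100.0))   # Haldane map function (assumption)
    hap_a, hap_b = parent1.copy(), parent2.copy()   # the F1 carries one haplotype of each parent
    for _ in range(n_self):
        # Both gametes come from the same plant (self-pollination).
        hap_a, hap_b = gamete(hap_a, hap_b, r, rng), gamete(hap_a, hap_b, r, rng)
    return hap_a, hap_b

crosses = circular_crosses(5)
rng = np.random.default_rng(0)
p1, p2 = np.full(11, 1), np.full(11, 2)             # founders 1 and 2, 11 markers
hap_a, hap_b = derive_line(p1, p2, d_cm=10.0, n_self=5, rng=rng)
```

At a locus where the two parents differ, the expected heterozygosity is halved by each self-pollination, so after five generations about (1/2)^5, roughly 3%, of such loci are still expected to be heterozygous. The derived lines are therefore close to, but not strictly, fixed in this sketch.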
Quantitative trait: The parameters for the creation of the quantitative trait are the QTL
heritability (h²QTL) and the heritability of the polygenes (h²poly). The heritability of the QTL is
the variance of the QTL divided by the total variance, while the heritability of
the polygenes is the variance of the whole set of polygenes divided by the
total variance.
We created at the founder line generation the NP allele effects for the QTL and the effects of
the Npoly polygenes. The NP possible effects of the QTL (for example, 20 founder lines
means also 20 different alleles at the QTL and thus 20 different effects) were drawn from a
normal distribution with mean 0 and variance 1. Then the QTL variance (VarQTL) was
calculated at the true QTL position (according to the QTL allele information and the
corresponding effect), and the NP effects for each of the Npoly polygenes were drawn from
a normal distribution with mean 0 and variance [VarQTL*(h²poly/h²QTL)]/Npoly. Finally, the
true variance accounted for by the polygenes was computed (VarPoly), and random normally
distributed noise with variance σ²e = VarQTL*(1/h²QTL - 1) - VarPoly was added to simulate
phenotypic values of the trait.
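The noise variance follows from the definition of h²QTL: the total variance is VarQTL/h²QTL, and subtracting VarQTL and VarPoly from it leaves σ²e. The Python sketch below walks through these steps for a set of fixed lines. It is an illustration under assumptions, not the authors' code: the function and argument names are hypothetical, variances are taken over the lines passed in, and the floor of σ²e at zero is a guard added here in case the realized VarPoly exceeds its target.

```python
import numpy as np

def simulate_trait(qtl_alleles, poly_alleles, n_founders, h2_qtl, h2_poly, rng):
    """Simulate phenotypes of fixed lines.

    qtl_alleles:  (n_lines,) founder allele codes 1..NP at the QTL
    poly_alleles: (n_lines, n_poly) founder allele codes at each polygene
    """
    n_poly = poly_alleles.shape[1]
    # NP allele effects at the QTL, drawn from N(0, 1).
    qtl_effects = rng.normal(0.0, 1.0, n_founders)
    g_qtl = qtl_effects[qtl_alleles - 1]
    var_qtl = g_qtl.var()
    # NP effects per polygene, drawn from N(0, VarQTL * (h2_poly / h2_qtl) / Npoly).
    sd_poly = np.sqrt(var_qtl * (h2_poly / h2_qtl) / n_poly)
    poly_effects = rng.normal(0.0, sd_poly, (n_poly, n_founders))
    # Element [line, p] picks the effect of the allele carried at polygene p.
    g_poly = poly_effects[np.arange(n_poly), poly_alleles - 1].sum(axis=1)
    var_poly = g_poly.var()
    # Residual variance; the floor at 0 is an assumption of this sketch.
    var_e = max(var_qtl * (1.0 / h2_qtl - 1.0) - var_poly, 0.0)
    noise = rng.normal(0.0, np.sqrt(var_e), qtl_alleles.size)
    return g_qtl + g_poly + noise, var_qtl, var_poly, var_e

# Hypothetical example: 200 lines, NP = 4 founders, 10 polygenes.
rng = np.random.default_rng(1)
qtl = rng.integers(1, 5, size=200)
poly = rng.integers(1, 5, size=(200, 10))
y, var_qtl, var_poly, var_e = simulate_trait(qtl, poly, 4, h2_qtl=0.1, h2_poly=0.4, rng=rng)
```

Because the polygene effects are random draws, the realized VarPoly differs from its target VarQTL*(h²poly/h²QTL), which is why the noise variance is computed from the realized value rather than from h²poly.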
Overlapping generations and matrix of crosses: After the genotypes and phenotypes of the lines
of the “G0” generation were obtained, virtual breeding programs were conducted. These
consisted of choosing a certain number of parents, crossing them according to a “matrix of
crosses”, then deriving a certain number of progenies by self-pollination from each two-parent
cross. Thus, each fixed progeny had only two fixed parents, and could be a half-sib or full-sib of
other individuals of this generation, or unrelated to them.
Moreover, two particularities common to most plant breeding schemes were
implemented:
- the “overlapping” choice of the parents: not all parents were necessarily drawn
  from the last generation only; a proportion of them (a parameter) could originate
  from older ones
- the influence of a matrix of crosses on the structure of the progeny resulting from a
  breeding program, which in turn influenced the effective population size
The design of crosses at the beginning of a breeding program could be seen as a geometric
series, since the representation of parents in the selected progeny is uneven, L-shaped rather
than random. For example, if a given line, say X, is recognized to be the most elite line at a
given period (with the best agronomic performance in a range of environments), X will
typically be crossed to many other lines to fully exploit its genetic value. After self-pollination
and selection, a certain number of lines coming from this parent X will still remain at the end
of the breeding cycle, and will form one of the largest half-sib families (containing possibly
some full-sibs when a specific cross is particularly outstanding). In contrast, a non-elite plant
with a very specific trait of interest but with low agronomic performance may also be used to
initiate crosses, but on a smaller scale. Some of its offspring might also be selected but to a
much smaller extent.
A matrix of crosses was implemented to reproduce the formation of half-sib and full-sib
families during a cycle of breeding, hence taking into account unbalanced contributions of
parents to the final population. The parents were sorted according to their phenotype, i.e. by
their “breeding interest”, from the best to the worst, along the first row and first column of the
matrix of crosses. For example, for a breeding scenario with crosses between 100 parents,
100*99/2=4950 crosses are possible. If we want to create only 500 progenies, we have to
make only some of these crosses. As we want to obtain certain relationships through the
crosses, we rank the parents by their interest and make more crosses with the most
interesting parents. The number of progenies derived from the cross between parent i and
parent j was 0 (the most common situation), 1, 2, 3 or 4 (when a cross was made, a single
progeny was the most common outcome in advanced breeding generations).
The “overlapping” option drew 80% of the parents from the most recent generation of
breeding, and 20% from all the older generations (accounting for 10% of the resulting
progeny).
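One possible way to build such a matrix of crosses is sketched below. The text does not give the exact rule used in simulSEL, so everything specific here is an assumption made for illustration: the function names, the geometric weight `ratio` applied to parent rank, the use of the product of the two parents' weights as the probability of a cross, and the rejection step that caps each cross at 4 progenies.

```python
import numpy as np

def split_parents(n_parents, frac_recent=0.8):
    """Number of parents taken from the latest generation and from older ones."""
    n_recent = int(round(frac_recent * n_parents))
    return n_recent, n_parents - n_recent

def cross_matrix(n_parents, n_progeny, ratio, rng, max_per_cross=4):
    """Upper-triangular matrix of progeny counts; parents are ranked best first."""
    weights = ratio ** np.arange(n_parents)     # geometric decay with rank (assumption)
    i, j = np.triu_indices(n_parents, k=1)      # the n(n-1)/2 possible crosses
    p = weights[i] * weights[j]
    p = p / p.sum()
    counts = np.zeros(i.size, dtype=int)
    remaining = n_progeny
    while remaining > 0:
        k = rng.choice(i.size, p=p)             # draw one cross, favouring top-ranked parents
        if counts[k] < max_per_cross:           # at most 4 progenies per cross
            counts[k] += 1
            remaining -= 1
    m = np.zeros((n_parents, n_parents), dtype=int)
    m[i, j] = counts
    return m

n_recent, n_older = split_parents(100)          # 80 recent parents, 20 older ones
rng = np.random.default_rng(2)
m = cross_matrix(n_parents=100, n_progeny=500, ratio=0.97, rng=rng)
contribution = m.sum(axis=0) + m.sum(axis=1)    # progenies per parent, best parent first
```

With these settings most of the 4950 cells stay at 0 and the contribution of a parent falls off with its rank, which gives the uneven, L-shaped representation of parents described above. A `ratio` closer to 1 flattens the contributions; a smaller value concentrates the progeny on a few elite parents.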
Performing n breeding cycles: A loop over successive breeding cycles was performed and the
parents of crosses, the resulting progenies and the phenotypic data were stored. All available
generations were used to build the next cycle. The last breeding cycle, NG, was used for QTL
detection.
It should be noted that all the allele frequencies were equal at the beginning, which was no longer
the case after many generations because of genetic drift, non-panmictic mating and/or
selection. NP alleles with different effects were possible at each QTL and at each
marker at “G0”, but their number was also reduced after NG cycles of breeding. Finally, all the
markers and QTLs were in full linkage disequilibrium at G0 but were no longer so after many
recombinations (5 generations of self-pollination per cycle, and NG cycles before the mapping
generation).
The simulSEL function is freely available upon request from the authors.