Supplementary Figures

Supplementary Material
Extended Results
The first design considered is a backcross, doubled haploid population, or set of biparental RILs;
for our purposes the only issue is whether the observed marker values exactly reflect the
underlying genotypes. Generally we see an improvement in power over random sampling for
both SPCLUST and maxRec, but this improvement is not highly significant (Supp. Fig. 1a).
All three clustering approaches perform well for SPCLUST, although no approach is
uniformly superior. Altering the marker density does not have a large effect on power.
Similarly, all the methods produce samples with greater diversity than simple random sampling (Supp. Fig.
1b).
For the backcross designs, we also explored the effect of missing data on the selection
of subsamples. We compared the various methods when 10% of the data were missing
completely at random, when the missing data had been imputed, and when the original full
data were analyzed, and found little difference in results. All methods were robust to this level
of missing data; there was no significant decrease in power. Indeed, we find that the selective
phenotyping approaches tend to select lines with less missing data than average, which may
explain the robustness to moderate levels of missing data.
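The missing-data scenario above can be illustrated with a minimal sketch. It assumes a hypothetical 200-line by 50-marker genotype matrix coded 0/1 and masks each call independently with probability 0.10, which is one way to generate data missing completely at random. The function names, the matrix dimensions, and the seed are illustrative assumptions, not the simulation code used for the reported results.

```python
import random

def mask_mcar(genotypes, rate, rng):
    """Set each genotype call to None independently with probability `rate`."""
    return [[None if rng.random() < rate else g for g in line] for line in genotypes]

def missing_fraction(line):
    """Proportion of marker calls that are missing for one line."""
    return sum(g is None for g in line) / len(line)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical backcross-style data: 200 lines x 50 markers, coded 0/1.
genotypes = [[rng.randint(0, 1) for _ in range(50)] for _ in range(200)]
masked = mask_mcar(genotypes, 0.10, rng)

per_line = [missing_fraction(line) for line in masked]
overall = sum(per_line) / len(per_line)
print(f"overall missing rate: {overall:.3f}")
print(f"least / most missing line: {min(per_line):.2f} / {max(per_line):.2f}")
```

Even under a uniform masking rate, the per-line missing fraction varies, so a selection method that relies on observed genotypes can end up favouring lines with fewer missing calls.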
The second design considered is an F2 intercross. Here we compare SRS and
SPCLUST with MMA with respect to selective power and diversity when markers are either
codominant or dominant. Supp. Figures 2a and 2b show the power and diversity respectively
for these two marker types under different genetic maps. In this situation MMA and
SPCLUST-PAM exhibit the highest power of all the methods, significantly better than SRS
and SPCLUST-WARD. Using dominant rather than codominant markers results in an
approximately 25% loss in power, though the general ranking of the methods remains
unchanged. Although MMA produces samples with high power under F2 designs, it does not
produce the most diverse samples. As was consistently true throughout simulations,
SPCLUST produces samples which contain the most genetically diverse individuals.
MMA-selected samples are similar in diversity to SRS for codominant markers.
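The information lost with dominant markers can be seen in a small sketch. It assumes, purely for illustration, that allele B is dominant at every marker, so that the heterozygote AB and the homozygote BB give the same observed score; the labels `"A"` and `"B_"` are hypothetical codes, not the coding used in the simulations.

```python
def to_dominant(genotype):
    """Collapse a codominant F2 call (AA, AB, BB) to a dominant score.

    Assumption: allele B is dominant, so AB and BB cannot be distinguished.
    """
    return "A" if genotype == "AA" else "B_"

codominant = ["AA", "AB", "BB", "AB"]
dominant = [to_dominant(g) for g in codominant]

print(dominant)
print("codominant classes:", len(set(codominant)))
print("dominant classes:  ", len(set(dominant)))
```

Three genotype classes collapse to two, which is the source of the reduced information per marker under dominant scoring.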
The third and final design we consider is a MAGIC 4-way cross. No previously
proposed methods are applicable to such a design, so we compared SPCLUST only to
random sampling. For this design, most genetic markers (including SNPs) will be
incompletely informative due to the increased number of parents in the population. We
compare the performance of SPCLUST for fully informative markers with the more typical
situation of biallelic markers. First we consider power and diversity for fully informative
markers under these designs for three different sampling proportions (25%, 50% and 75%)
(Supp. Fig. 3a and 3b). SPCLUST exhibits small increases in power relative to SRS, although
for the most part these differences are not significant. As expected from the other simulations,
SPCLUST samples exhibit greater diversity than SRS samples. The decrease in diversity with
sample size reflects the definition of diversity as the minimum distance between any two
members of the sample. Larger samples are thus more likely to contain similar individuals.
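The effect of sample size on a minimum-distance diversity score can be sketched as follows. The distance used here, the proportion of mismatching marker calls, is an assumption made for illustration, as are the simulated data. The sketch uses nested subsets, for which the minimum pairwise distance can never increase as lines are added; for independently selected samples the decrease is only a tendency.

```python
import random
from itertools import combinations

def mismatch_distance(a, b):
    """Proportion of markers at which two lines differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def diversity(sample):
    """Diversity score: minimum pairwise distance within the sample."""
    return min(mismatch_distance(a, b) for a, b in combinations(sample, 2))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical population: 60 lines x 40 biallelic markers coded 0/1.
lines = [[rng.randint(0, 1) for _ in range(40)] for _ in range(60)]

# Nested samples of increasing size (25%, 50% and 75% of the 60 lines).
scores = {n: diversity(lines[:n]) for n in (15, 30, 45)}
for n, score in scores.items():
    print(f"n={n}: diversity={score:.3f}")
```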
For biallelic markers, we find that there is an overall decrease in power relative to
fully informative markers (compare Supp. Fig. 3a with Supp. Fig. 4a). This is due to the
greater uncertainty as to whether two individuals share alleles in the biallelic case. As the
density of markers increases, the power approaches that seen for fully informative markers.
This is to be expected since information is more closely shared between neighbouring
markers. We find that in general the same rankings hold across methods as found for fully
informative markers. Little change is seen in diversity values (Supp. Fig. 4b).
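The uncertainty introduced by biallelic markers in a 4-way cross can be illustrated with a toy example. It assumes four founders labelled A to D and a single hypothetical SNP at which founders A and B carry allele 0 and founders C and D carry allele 1; both the labels and the allele pattern are illustrative assumptions.

```python
from itertools import product

founders = "ABCD"
# Hypothetical SNP: founders A and B carry allele 0, C and D carry allele 1.
snp_allele = {"A": 0, "B": 0, "C": 1, "D": 1}

# All ordered pairs of founder origins for two lines at this locus.
pairs = list(product(founders, repeat=2))

# Pairs that look identical at the SNP, and the subset that truly
# share the same founder allele.
same_snp = [(f, g) for f, g in pairs if snp_allele[f] == snp_allele[g]]
same_founder = [(f, g) for f, g in same_snp if f == g]

print("pairs matching at the SNP:       ", len(same_snp))
print("of which share the same founder: ", len(same_founder))
```

Under this allele pattern only half of the pairs that match at the SNP share a founder, whereas a fully informative marker would identify the founder directly. Neighbouring markers with different allele patterns help resolve the ambiguity, which is consistent with power recovering as marker density increases.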
Chara x Glenlea Population
Previous composite interval mapping analysis of this trait on the Chara x Glenlea population
(Huang and George 2009) detected QTL on chromosomes 4B, 7A, and 7B. Hence our
analysis asks whether these QTL would still have been detected had selective
phenotyping been used to reduce the number of individuals chosen from the population.
Table 1 shows the maximum LOD score achieved on each of these chromosomes for
specific selection methods and sample sizes. In general, the LOD score increases with sample
size, as we would expect. Using a significance threshold of 3 for the LOD score, all selection
methods successfully identify the QTL on Chromosome 7A. The QTL on Chromosome 4B
is only detected by SRS and SPCLUST-WARD at a sample size of 88 individuals (50% of
population), while all methods detect it for the largest selected sample size of 132 individuals
(75% of population). SPCLUST has the greatest ability to detect the QTL on Chromosome
7B. This is detected at all three sample sizes by SPCLUST-PAM and SPCLUST-AVG. The
remaining methods detect this QTL as long as at least half the population is sampled.
MAGIC Population
We analyzed the trait ‘plant height’ in the full dataset and selected samples of the MAGIC 4-way population. At a significance threshold of 0.00017, we detected QTL on seven
chromosomes in the full data: 1B, 2B, 2D, 4A, 4B, 4D, and 5B (see Huang et al. (2012b) for
full analysis). The two largest QTL, on Chr 4B and 4D, represent known dwarfing genes
(Rht-B1 and Rht-D1) for the trait (Keyes et al. 1989), while the QTL on 2D may be related to
the flowering gene for photoperiod sensitivity, PPD-D1.
We selected samples ranging in size from 100 to 1000 plants in steps of 100. At the
smallest sample size, SPCLUST-WARD detects a significant QTL on chromosome 4D
(p=8.35e-6) and a marginally significant QTL on 2D (p=6.9e-4). SPCLUST-AVG
detects only a QTL on 4D, and SRS detects no QTL. At the next smallest sample size,
SPCLUST-WARD and SPCLUST-AVG both detect significant QTL on Chromosomes 4B
and 4D and a marginally significant QTL on 2B (p=4.8e-4 and p=3.6e-4, respectively); SRS detects only a
QTL on Chromosome 4D. Figure 5 depicts the QTL profiles for SPCLUST-AVG, SRS and
the full analysis for a sample size of 200. For the remaining sample sizes, all three selection
methods detect the QTL associated with the dwarfing genes, along with varying QTL on
other chromosomes. Hence it is primarily in the smaller sample sizes that the benefits of
selective phenotyping are realized.
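The distinction between significant and marginally significant QTL can be checked against the stated threshold with a short sketch. It uses only the threshold and the two p-values quoted above for SPCLUST-WARD at the smallest sample size; the variable names are illustrative, and no cut-off for "marginal" significance is assumed beyond the p-value exceeding the threshold.

```python
import math

THRESHOLD = 0.00017  # significance threshold quoted in the text

# p-values quoted in the text for SPCLUST-WARD at a sample size of 100.
reported = {"4D": 8.35e-6, "2D": 6.9e-4}

def neg_log10(p):
    """Convert a p-value to the -log10 scale often used in QTL profiles."""
    return -math.log10(p)

# True where the reported p-value falls below the significance threshold.
calls = {chrom: p < THRESHOLD for chrom, p in reported.items()}

for chrom, p in reported.items():
    print(f"{chrom}: p={p:.2e}, -log10(p)={neg_log10(p):.2f}, "
          f"below threshold: {calls[chrom]}")
print(f"threshold on -log10 scale: {neg_log10(THRESHOLD):.2f}")
```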
Supplementary Figures
Supp. Fig. 1 For markers genotyped in a 200 line backcross population (a) 95% confidence
intervals for QTL mapping power at three levels of marker density and (b) boxplots of
diversity scores in selected subsets of size 100. Dots in boxplots indicate median diversity
scores.
Supp. Fig. 2 For codominant (red) and dominant (green) markers genotyped on a 200 line
F2 intercross population (a) 95% confidence interval for QTL mapping power at three levels
of marker density and (b) boxplots of diversity scores (for marker density of 4 cM) in selected
subsets of size 50. Dots in boxplots indicate median diversity scores.
Supp. Fig. 3 For fully informative markers genotyped in subsets of an 800 line MAGIC 4-way population (a) 95% confidence interval for QTL mapping power at three levels of marker
density and (b) boxplots of diversity scores (for marker density of 4 cM) in selected subsets
of size 200 (red), 400 (green), 600 (blue). Dots in boxplots indicate median diversity scores.
Supp. Fig. 4 For biallelic markers genotyped in subsets of an 800 line MAGIC 4-way
population (a) 95% confidence interval for QTL mapping power at three levels of marker
density and (b) boxplots of diversity scores (for marker density of 4 cM) in selected subsets
of size 200 (red), 400 (green), 600 (blue). Dots in boxplots indicate median diversity scores.