Supplementary Material

Extended Results

The first design considered is a backcross, a doubled haploid population, or a set of biparental RILs; for our purposes the only issue is whether the observed marker values exactly reflect the underlying genotypes. Generally we see an improvement in power over random sampling for both SPCLUST and maxRec, but this improvement is not highly significant (Supp. Fig. 1a). All three clustering approaches perform well for SPCLUST, although no approach is uniformly superior. Altering the marker density does not have a large effect on power. Similarly, all the methods yield greater diversity than simple random sampling (SRS) (Supp. Fig. 1b).

For the backcross designs, we also explored the effect of missing data on the selection of subsamples. We compared the methods when 10% of the data were missing completely at random, when the missing data had been imputed, and when the original full data were analyzed, and found little difference in results. All methods were robust to this level of missing data; there was no significant decrease in power. Indeed, we find that the selective phenotyping approaches tend to select lines with less missing data than average, which may explain the robustness to moderate levels of missing data.

The second design considered is an F2 intercross. Here we compare SRS and SPCLUST with MMA with respect to selective power and diversity when markers are either codominant or dominant. Supp. Figures 2a and 2b show the power and diversity, respectively, for these two marker types under different genetic maps. In this situation MMA and SPCLUST-PAM exhibit the highest power of all the methods, significantly better than SRS and SPCLUST-WARD. Using dominant rather than codominant markers results in an approximately 25% loss in power, though the general ranking of the methods remains unchanged. Although MMA produces samples with high power under F2 designs, it does not produce the most diverse samples.
As was consistently true throughout the simulations, SPCLUST produces the samples containing the most genetically diverse individuals. MMA-selected samples are similar in diversity to SRS samples for codominant markers.

The third and final design we consider is a MAGIC 4-way cross. No previously proposed methods are applicable to such a design, so we compared SPCLUST only to random sampling. For this design, most genetic markers (including SNPs) will be incompletely informative due to the increased number of parents in the population. We compare the performance of SPCLUST for fully informative markers with the more typical situation of biallelic markers. First we consider power and diversity for fully informative markers under this design for three different sampling proportions (25%, 50% and 75%) (Supp. Fig. 3a and 3b). SPCLUST exhibits small increases in power relative to SRS, although for the most part these differences are not significant. As expected from the other simulations, SPCLUST samples exhibit greater diversity than SRS samples. The decrease in diversity with increasing sample size reflects the definition of diversity as the minimum distance between any two members of the sample: larger samples are more likely to contain similar individuals.

For biallelic markers, we find an overall decrease in power relative to fully informative markers (compare Supp. Fig. 3a with Supp. Fig. 4a). This is due to the greater uncertainty as to whether two individuals share alleles in the biallelic case. As the density of markers increases, the power approaches that seen for fully informative markers. This is to be expected, since information is more closely shared between neighbouring markers. In general, the same rankings hold across methods as were found for fully informative markers. Little change is seen in diversity values (Supp. Fig. 4b).
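The dependence of diversity on sample size noted above can be illustrated with a minimal sketch. It assumes a simple mismatch (Hamming-type) distance between marker profiles and randomly generated biallelic genotypes; the distance measure, the simulated data, and the function names `mismatch_distance` and `diversity` are illustrative assumptions, not the measures or code used in the analysis.

```python
import random
from itertools import combinations


def mismatch_distance(a, b):
    """Proportion of markers at which two lines differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity(genotypes, sample):
    """Diversity of a sample: minimum pairwise distance among its members."""
    return min(
        mismatch_distance(genotypes[i], genotypes[j])
        for i, j in combinations(sample, 2)
    )


# Hypothetical data: 40 lines scored at 30 biallelic markers (0/1 coding).
rng = random.Random(1)
genotypes = [[rng.randint(0, 1) for _ in range(30)] for _ in range(40)]

# Nested random samples at 25%, 50% and 75% of the population:
# each larger sample contains every line of the smaller one.
order = rng.sample(range(40), 40)
scores = {n: diversity(genotypes, order[:n]) for n in (10, 20, 30)}

for n, score in scores.items():
    print(f"sample size {n}: diversity {score:.3f}")
```

Because the samples here are nested, the score for the larger sample can never exceed that of the smaller one: adding lines can only introduce closer pairs, never remove them.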
Chara x Glenlea Population

Previous composite interval mapping analysis of this trait on the Chara x Glenlea population (Huang and George 2009) detected QTL on chromosomes 4B, 7A, and 7B. Hence in our analysis we are interested in whether these QTL would have been detected had selective phenotyping been used to reduce the number of individuals chosen from the population. Table 1 shows the maximum LOD score achieved on each of these chromosomes for specific selection methods and sample sizes. In general, the LOD score increases with sample size, as we would expect. Using a significance threshold of 3 for the LOD score, all selection methods successfully identify the QTL on chromosome 7A. At a sample size of 88 individuals (50% of the population), the QTL on chromosome 4B is detected only by SRS and SPCLUST-WARD, while all methods detect it at the largest selected sample size of 132 individuals (75% of the population). SPCLUST has the greatest ability to detect the QTL on chromosome 7B, which is detected at all three sample sizes by SPCLUST-PAM and SPCLUST-AVG. The remaining methods detect this QTL as long as at least half the population is sampled.

MAGIC Population

We analyzed the trait ‘plant height’ in the full dataset and in selected samples of the MAGIC 4-way population. At a significance threshold of 0.00017, we detected QTL on seven chromosomes in the full data: 1B, 2B, 2D, 4A, 4B, 4D, and 5B (see Huang et al. (2012b) for the full analysis). The two largest QTL, on chromosomes 4B and 4D, represent known dwarfing genes (Rht-B1 and Rht-D1) for the trait (Keyes et al. 1989), while the QTL on 2D may be related to the flowering gene for photoperiod sensitivity, PPD-D1. We selected samples ranging in size from 100 to 1000 plants in steps of 100. At the smallest sample size, SPCLUST-WARD detects a significant QTL on chromosome 4D with p-value 8.35e-6, and a marginally significant QTL on 2D (p=6.9e-4). SPCLUST-AVG detects only a QTL on 4D, and SRS detects no QTL.
At the next smallest sample size, SPCLUST-WARD and SPCLUST-AVG both detect significant QTL on chromosomes 4B and 4D and a marginally significant QTL on 2B (p=4.8e-4 and p=3.6e-4, respectively); SRS detects only a QTL on chromosome 4D. Figure 5 depicts the QTL profiles for SPCLUST-AVG, SRS and the full analysis for a sample size of 200. For the remaining sample sizes, all three selection methods detect the QTL associated with the dwarfing genes, along with varying QTL on other chromosomes. Hence it is primarily at the smaller sample sizes that the benefits of selective phenotyping are realized.

Supplementary Figures

Supp. Fig. 1 For markers genotyped in a 200-line backcross population: (a) 95% confidence intervals for QTL mapping power at three levels of marker density and (b) boxplots of diversity scores in selected subsets of size 100. Dots in boxplots indicate median diversity scores.

Supp. Fig. 2 For codominant (red) and dominant (green) markers genotyped on a 200-line F2 intercross population: (a) 95% confidence intervals for QTL mapping power at three levels of marker density and (b) boxplots of diversity scores (for marker density of 4 cM) in selected subsets of size 50. Dots in boxplots indicate median diversity scores.

Supp. Fig. 3 For fully informative markers genotyped in subsets of an 800-line MAGIC 4-way population: (a) 95% confidence intervals for QTL mapping power at three levels of marker density and (b) boxplots of diversity scores (for marker density of 4 cM) in selected subsets of size 200 (red), 400 (green), 600 (blue). Dots in boxplots indicate median diversity scores.

Supp. Fig. 4 For biallelic markers genotyped in subsets of an 800-line MAGIC 4-way population: (a) 95% confidence intervals for QTL mapping power at three levels of marker density and (b) boxplots of diversity scores (for marker density of 4 cM) in selected subsets of size 200 (red), 400 (green), 600 (blue). Dots in boxplots indicate median diversity scores.