Characterizing genomic variation of Arabidopsis thaliana: the roles of geography and climate

Jesse R. Lasky1,2, David L. Des Marais1, John K. McKay3, James H. Richards4, Thomas E. Juenger1, and Timothy H. Keitt1

Supporting Information document

1 University of Texas at Austin, Section of Integrative Biology, 1 University Station C0900, Austin, Texas 78712-0253
2 Corresponding author
3 Colorado State University, Bioagricultural Sciences and Pest Management, Campus delivery 1177, Fort Collins, CO 80523
4 University of California, Davis, Land, Air and Water Resources, One Shields Avenue, Davis, CA 95616

INTRODUCTORY NOTES

Life history variation

Climate-driven natural selection depends on the timing of climate events relative to temporal life cycle patterns (Korves et al. 2007). Arabidopsis has an annual but varied life history. Among other factors, the speed of Arabidopsis growth and development is limited by temperature (Wilczek et al. 2009), water availability, and day length (Corbesier & Coupland 2005, Lempe et al. 2005). Some plants are rapid-cycling, growing mostly in the spring and summer and over-wintering as seeds. Other plants germinate in the fall and overwinter as rosettes, flowering in the spring (i.e. winter annuals). Whether a plant overwinters as a seed or as a rosette is a major categorical difference in life history that affects selective pressures (Donohue 2005). Ignoring this variation could hinder studies of climate-genome correlations. Thus, information on the life history and periods of vegetative growth of accessions could greatly improve identification of important climate variables, though the natural life history of most accessions is unknown.

Much of Arabidopsis life history variation may correlate with genetic variation in flowering time observed in common gardens. Many accessions have a genetically determined requirement for prolonged exposure to cold (i.e.
vernalization) in order to flower; these are likely winter annuals (Koornneef et al. 1998). In common gardens these accessions fail to flower or flower late in the absence of vernalization (i.e. late-flowering accessions), while other accessions can cycle rapidly without vernalization (i.e. early-flowering accessions). Sensitivity to vernalization is largely under the genetic control of a well-studied signaling pathway (Michaels & Amasino 1999, Johanson et al. 2000, Caicedo et al. 2004, Stinchcombe et al. 2004, Shindo et al. 2005). However, accessions that typically flower early in common gardens can overwinter as rosettes and flower in the spring (i.e. behave as winter annuals) if germinated sufficiently late in fall (Wilczek et al. 2009).

Nevertheless, additional traits are correlated with flowering time variation, suggesting there are broad ecological and life history differences between early and late-flowering accessions that affect how they experience climate. Ecologically important traits such as drought response (McKay et al. 2003), plant defense, and ion concentrations (Atwell et al. 2010) covary with common garden flowering times. When plants are forced to germinate in the fall, late-flowering accessions have higher over-winter survival than early-flowering accessions, but when plants germinate in the spring, early-flowering accessions have greater fecundity than late-flowering accessions (Korves et al. 2007). This germination time by flowering time interaction supports the hypothesis that late-flowering accessions are more likely to be winter annuals than early-flowering accessions. To account for these functional differences, we stratified analyses by genetic variation in flowering time to capture ecologically relevant life history variation between accessions over-wintering as seeds and those over-wintering as rosettes.
While predicting natural life history based on a limited number of common garden experiments is an imperfect technique, we expect to capture enough variation to reveal major selective pressures.

METHODS

Predicting flowering time

Because flowering time data were only available for 476 of the 1,307 accessions with genomic data, we used data from the 476 to predict flowering time variation in the remaining accessions. The structure of flowering time variation among accessions was assessed with data from 13 common garden experiments in different environments, all without vernalization (Figure S2, Table S2; Shindo et al. 2005, Zhao et al. 2007, Atwell et al. 2010, Li et al. 2010, Kenney et al. unpublished data). We used experiments without vernalization because vernalization accelerates the flowering of late-flowering putative winter annuals (Koornneef et al. 1998), which would have limited our ability to distinguish life histories. Conditions in the 13 experiments ranged from short to long days, from natural spring temperatures in Sweden to natural summer temperatures in Spain, and from well-watered to drought conditions.

We conducted principal components analysis on the 67 accessions that appeared in all 13 experiments. The first principal component explained 72.7% of the variation, while the second principal component explained only 12.6% of the variation. All experiments were positively correlated with the first principal component (Figure S3). The distribution of accessions along the first axis was bimodal, indicating that a large portion of flowering time variation across environmental conditions is explained by early vs. late flowering status (Figure S4). We divided the 67 accessions into two categories using k-means clustering and then assigned all 476 accessions from the 13 experiments to the k-means category with the closest centroid in flowering time space.
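
The clustering and assignment step can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the two-cluster k-means is a plain Lloyd's iteration, and the rule for accessions missing from some experiments (squared distance averaged over the experiments in which they were grown) is an assumption, since the text does not say how missing experiments were handled. All function names and the toy flowering times are hypothetical.

```python
def mean_sq_dist(x, c):
    """Mean squared distance over experiments where x has data (None = not grown)."""
    pairs = [(a, b) for a, b in zip(x, c) if a is not None]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

def kmeans2(points, n_iter=50):
    """Two-cluster k-means (Lloyd's algorithm) on accessions with complete data.
    Starts from the records with the smallest and largest mean flowering time."""
    order = sorted(points, key=lambda p: sum(p) / len(p))
    centroids = [list(order[0]), list(order[-1])]
    for _ in range(n_iter):
        groups = [[], []]
        for p in points:
            k = 0 if mean_sq_dist(p, centroids[0]) <= mean_sq_dist(p, centroids[1]) else 1
            groups[k].append(p)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    # Order centroids so that index 0 is the earlier-flowering cluster.
    centroids.sort(key=lambda c: sum(c) / len(c))
    return centroids

def assign(x, centroids):
    """Label an accession by its closest centroid (assumed rule for missing data)."""
    d_early = mean_sq_dist(x, centroids[0])
    d_late = mean_sq_dist(x, centroids[1])
    return "early" if d_early <= d_late else "late"

# Hypothetical standardized flowering times (days) in three experiments.
complete = [[5, 8, 6], [7, 10, 9], [6, 9, 7],
            [40, 55, 50], [45, 60, 52], [38, 50, 47]]
centroids = kmeans2(complete)

# Accessions grown in only one of the three experiments.
label_a = assign([None, 12, None], centroids)
label_b = assign([42, None, None], centroids)
```

Averaging the squared distance over the available experiments keeps accessions with few records comparable to those with many; other choices, such as imputing missing values first, would be equally defensible.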
Our categorization of the 476 accessions with known flowering time was then used as a training set to predict flowering time category in the remaining accessions. Our prediction of flowering time was based on SNP variation in genomic regions identified in the original studies. Loci associated with flowering time variation changed among experiments (Atwell et al. 2010); thus we selected candidate loci from 12 experiments to increase the robustness of our model to environmental variation (no candidates were identified in one experiment). We selected SNPs identified by Atwell et al. (2010, eight experiments) and Li et al. (2010, four experiments) as associated with flowering time variation. We also included all SNPs within 100 kb of FRI and FLC, two interacting genes in the vernalization-sensitivity pathway (Koornneef et al. 1998, Michaels & Amasino 1999, Stinchcombe et al. 2004), giving 857 total SNPs as predictor variables (Table S3).

We modeled flowering category with support vector machines (SVM), a type of classification model. SVM are highly robust as they divide the embedding space of the data into cluster regions (the “support” of the classes) rather than attempting to classify based on statistical moment calculations within and among classes as in traditional methods. SVM finds the optimal hyperplane, defined by the support vectors, for separating response classes in the space of predictor variables. Computational biologists are increasingly using SVM to predict phenotype from genotype because of their accuracy and flexibility in high-dimensional space (Ben-Hur et al. 2008). Epistatic interactions among genes (i.e. non-additive effects) have strong effects on flowering time in Arabidopsis (Juenger et al. 2005, Shindo et al. 2005). SVM can account for interactions and nonlinearity by using nonlinear basis functions to create support vectors.
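
To illustrate why a nonlinear basis function matters for epistasis, the toy Python sketch below uses a two-locus "XOR" pattern in which the phenotype depends only on the interaction between loci. It is an illustration under assumptions, not the model used here: a kernel perceptron is swapped in for the SVM because it fits in a few lines of standard-library code, and the genotype coding, class labels, and all names are hypothetical. The point is only that a radial basis function kernel can separate classes that no linear boundary can.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def linear_kernel(x, z):
    """Dot product plus 1; the constant plays the role of an intercept."""
    return sum(a * b for a, b in zip(x, z)) + 1.0

def train_kernel_perceptron(X, y, kernel, max_epochs=100):
    """Kernel perceptron (a stand-in for an SVM): count mistakes per training point."""
    alpha = [0] * len(X)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            score = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * score <= 0:  # misclassified, or exactly on the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(x, X, y, alpha, kernel):
    score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
    return 1 if score > 0 else -1

def accuracy(X, y, kernel):
    alpha = train_kernel_perceptron(X, y, kernel)
    hits = sum(predict(x, X, y, alpha, kernel) == yi for x, yi in zip(X, y))
    return hits / len(X)

# Hypothetical two-locus genotypes (0/1 alleles in inbred lines).
# Class +1 only when the two loci differ: a purely epistatic effect.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]

rbf_acc = accuracy(X, y, rbf_kernel)
lin_acc = accuracy(X, y, linear_kernel)
```

With the radial basis function kernel all four genotypes are classified correctly, whereas the linear kernel cannot reach full accuracy on this pattern however long it trains.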
We used a radial basis function, which offers a reasonable tradeoff between flexible modeling and over-fitting with a large number of parameters (Hsu et al. 2010). We conducted a grid search for values of the two SVM tuning parameters to find the most accurate classification of flowering time category. Accuracy was assessed using four-fold cross-validation on the training data. A model with the best values of tuning parameters predicted with 91% accuracy in cross-validation. This model was then used to predict categories for all 1,307 accessions. SVM were implemented in the R package “e1071,” which interfaces to libsvm (Chang & Lin 2001, R Core Development Team 2010).

We previously predicted flowering time category in the manner described above for 29 accessions using only data from experiments X1, X2, X5, X8, and X13. The 29 accessions were not present in any of the five experiments. We planted 12 seeds from each line in randomized blocks and grew them under long days (16 hr) in our growth chamber at 22 ºC/18 ºC day/night. Two lines had very poor germination (25%) and were removed from the experiment. Trays of 18 plants were randomized three times weekly and scored daily for flowering. Mean flowering time for accessions was bimodal; those in the first cluster were considered early-flowering and those in the second cluster were considered late-flowering (Figure S5). Twenty-four of the 27 accessions had experimental flowering times consistent with our flowering time category predictions based on whether they fell into the first or second clump of flowering times (Figure S4).

To estimate the genomic differentiation of our predicted flowering time categories, we measured the genetic distance between predicted categories compared to the distance between random categorizations. At each SNP, we calculated an estimate of Fst (the proportion of genetic variation found between groups) between early and late-flowering groups (Weir and Hill 2002).
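
As a rough sketch of this kind of differentiation statistic and of a label-permutation test built on it, the Python code below computes a per-SNP Fst for two groups and compares the genome-wide mean against random regroupings of the same sizes. It rests on several assumptions: the simple ratio (Ht - Hs)/Ht stands in for the Weir and Hill (2002) estimator, alleles are coded 0/1 with one allele per inbred accession, only the mean is used as a test statistic, and all names and genotypes are hypothetical.

```python
import random

def fst_two_groups(g1, g2):
    """Simple (Ht - Hs) / Ht for one biallelic SNP; NOT the Weir & Hill estimator."""
    n1, n2 = len(g1), len(g2)
    p1, p2 = sum(g1) / n1, sum(g2) / n2
    p_bar = (sum(g1) + sum(g2)) / (n1 + n2)
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                         # monomorphic SNP carries no signal
    h_s = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / (n1 + n2)
    return (h_t - h_s) / h_t

def mean_fst(genotypes, labels):
    """genotypes: one list of 0/1 alleles per SNP; labels: 0/1 group per accession."""
    total = 0.0
    for snp in genotypes:
        g1 = [a for a, lab in zip(snp, labels) if lab == 0]
        g2 = [a for a, lab in zip(snp, labels) if lab == 1]
        total += fst_two_groups(g1, g2)
    return total / len(genotypes)

def permutation_p(genotypes, labels, n_perm=1000, seed=0):
    """Share of random groupings (same group sizes) with a larger mean Fst."""
    rng = random.Random(seed)
    observed = mean_fst(genotypes, labels)
    shuffled = list(labels)
    larger = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)              # shuffling labels preserves group sizes
        if mean_fst(genotypes, shuffled) > observed:
            larger += 1
    return observed, larger / n_perm

# Hypothetical data: 12 accessions (6 per group) and 2 SNPs.
labels = [0] * 6 + [1] * 6
genotypes = [
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
]
observed, p_value = permutation_p(genotypes, labels)
```

The same loop extends to percentile statistics by collecting the per-SNP values for each grouping instead of only their mean.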
The 857 SNPs used to classify categories and neighboring SNPs within 100 kb were excluded (11,599 excluded). We calculated the mean and 50th, 95th, and 99th percentiles of genome-wide Fst to use as test statistics. We randomly classified accessions into two groups, equal in size to the predicted categories, and calculated Fst at each SNP, repeating this process 1,000 times. The proportions of random groupings with test statistics larger than our predicted grouping were taken as empirical p-values for rejecting null hypotheses that classification was random with respect to genome-wide Fst.

RDA on life history groups

We conducted RDA and variance partitioning on life history groups. When we tested significance of variance partitioning on life history groups, collection sites were permuted among groups of accessions collected at the same site, within the same flowering time category (Legendre & Legendre 1998). In order to identify candidate environmental gradients underlying local adaptation, we conducted RDA on each life history group. In a subset of RDA we first removed PCNM spatial variables as a method of controlling for population structure. Climate variables were then used to explain SNP residuals (i.e. partial RDA). We identified important climate variables for each life history group by calculating Px as described in the main text.

RESULTS

Predicted phenology

Using SVM with 857 SNPs as predictor variables, we predicted 1,035 accessions to be early-flowering and 272 to be late-flowering. Among Eurasian accessions that we studied with RDA, 755 and 248 were predicted to be early and late-flowering, respectively. Late-flowering accessions were most common in northern Europe (Figure S16). The predicted flowering time categories explained highly significant portions of genomic variation among the full panel of 1,304 accessions.
The observed mean and 50th, 95th, and 99th percentiles of Weir's θ between flowering time categories were greater than the same respective statistics for all 1,000 random groupings.

RDA

Compared to analyses among all accessions, climate and space explained a similar portion of SNP variation for early-flowering accessions (20.1%), although the figure was much greater for late-flowering accessions (39.5%; Table S4, Figure S14). The observed portions of variation explained by climate variables and by PCNM were greater than the portions explained by each set in all of 1,000 permuted data sets, for all accessions and for both flowering time groups (all permutation tests p < 0.001).

Among early-flowering accessions, the coefficient of variation of monthly precipitation explained the most genomic variation (Table S6). After removing the effect of spatial structure, June and growing season monthly precipitation explained the greatest portion. Minimum winter temperatures explained the greatest portion among late-flowering accessions (Table S7). When accounting for spatial structure, minimum summer and growing season temperatures explained the greatest portion.

The climate variables explaining the greatest amount of SNP variation differed between predicted flowering time groups (Figure S15). However, there was a weak but significant positive correlation between the variation explained by each climate variable in early vs. late-flowering accessions (Spearman's rank correlation, ρ = 0.27, p = 0.02). After removing spatial structure, the correlation between variance explained in early vs. late-flowering accessions became stronger (Spearman's rank correlation, ρ = 0.52, p < 0.00001). The greatest outliers from this correlation were summer and growing season precipitation variables, which explained the most SNP variation among early-flowering accessions but explained relatively little variation among late-flowering accessions.
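
The rank correlation used to compare the two flowering time groups can be sketched in a few lines of Python. This is a minimal illustration, assuming tied values receive their average rank and that Spearman's ρ is computed as the Pearson correlation of the ranks; the function names are hypothetical and the input values are made up for the example, not taken from Tables S6 or S7.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up values standing in for the variation explained by five climate
# variables in the early-flowering and late-flowering groups.
px_early = [1.0, 2.0, 3.0, 4.0, 5.0]
px_late = [5.0, 6.0, 7.0, 8.0, 7.0]
rho = spearman_rho(px_early, px_late)
```

Because only ranks enter the calculation, the statistic is insensitive to the large difference in the absolute amount of variation explained in the two groups.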
DISCUSSION

Results for flowering time groups suggest that climate imposes different selective pressures on flowering time groups and that groups experience climate differently. Climate explained the most genomic variation among late-flowering accessions, much more than among early-flowering accessions. Early-flowering accessions may have relatively weaker climate-genome correlations if their rapid life cycle during favorable conditions allows them to escape many climate selective pressures, while late-flowering accessions are subject to greater annual climate variability and selection (McKay et al. 2003). Additionally, these strong climate-genetic correlations are likely the result of consistent life history among late-flowering accessions. When populations have the same life history, a given season's climate should affect the same life stage in all populations (e.g. winter conditions affect rosette survival), resulting in consistent mechanisms of local adaptation. Late-flowering accessions may be consistently winter annuals, while early-flowering accessions have more varied phenology (Wilczek et al. 2009).

Climate-genome correlations differed between flowering time groups in ways consistent with known functional variation associated with flowering time. Climate-genome relationships differed between these categories, although after removing spatial effects the two categories became more similar. However, growing season and warm season precipitation variables were major outliers from this pattern, explaining much more variation among early-flowering than late-flowering accessions. If early-flowering accessions behave as spring and summer annuals, they are more likely to experience warm season precipitation variation than are late-flowering, putative winter annual, plants. Additionally, early-flowering accessions may be more sensitive to drought because they have lower water-use efficiency than late-flowering accessions (McKay et al. 2003).
Flowering time variation itself may be involved in local adaptation to drought; rapid flowering allows plants to escape drought.

Genomic variation between flowering time categories extended well beyond the flowering time-associated loci used to classify accessions. This variation is likely associated with population structure, known functional genetic divergence (McKay et al. 2003, Korves et al. 2007, Atwell et al. 2010), and differential patterns of adaptation to climate between categories (Figures S16 & S17). Flowering time may mediate patterns of local adaptation to specific climate variables, but flowering time covaries with other traits that together may represent different life history strategies adapted to local climatic stress.

References

Atwell, S., Huang, Y. S., Vilhjalmsson, B. J., Willems, G., Horton, M., Li, Y., Meng, D., et al. (2010). Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature, 465, 627-631.
Ben-Hur, A., Ong, C. S., Sonnenburg, S., Schölkopf, B., & Rätsch, G. (2008). Support Vector Machines and Kernels for Computational Biology. PLoS Comput Biol, 4, e1000173.
Caicedo, A. L., Stinchcombe, J. R., Olsen, K. M., Schmitt, J., & Purugganan, M. D. (2004). Epistatic interaction between Arabidopsis FRI and FLC flowering time genes generates a latitudinal cline in a life history trait. Proc Nat Acad Sci USA, 101, 15670-15675.
Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines [WWW document]. URL http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
Corbesier, L., & Coupland, G. (2005). Photoperiodic flowering of Arabidopsis: integrating genetic and physiological approaches to characterization of the floral stimulus. Plant, Cell & Env, 28, 54-66.
Donohue, K. (2005). Niche construction through phenological plasticity: life history dynamics and ecological consequences. New Phyt, 166, 83-92.
Horton, M., Hancock, A. M., Huang, Y. S., Toomajian, C., Atwell, S., Muliyati, N. W., et al. In review. The pattern of linkage disequilibrium and selection in Arabidopsis thaliana.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector classification. [WWW document] URL http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Johanson, U., West, J., Lister, C., Michaels, S., Amasino, R., & Dean, C. (2000). Molecular Analysis of FRIGIDA, a Major Determinant of Natural Variation in Arabidopsis Flowering Time. Science, 290, 344-347.
Juenger, T., Sen, S., Stowe, K., & Simms, E. (2005). Epistasis and genotype-environment interaction for quantitative trait loci affecting flowering time in Arabidopsis thaliana. Genetica, 123, 87-105.
Kenney, A., McKay, J. K., Richards, J. H., & Juenger, T. E. Unpublished work. Environment-dependent selection on drought escape in Arabidopsis thaliana – the evolutionary significance of global quantitative variation in δ13C, flowering phenology, and phenotypic plasticity.
Koornneef, M., Alonso-Blanco, C., Peeters, A. J. M., & Soppe, W. (1998). Genetic control of flowering time in Arabidopsis. Annu Rev Plant Physiol Plant Mol Biol, 49, 345-370.
Korves, T. M., Schmid, K. J., Caicedo, A. L., Mays, C., Stinchcombe, J. R., Purugganan, M. D., et al. (2007). Fitness Effects Associated with the Major Flowering Time Gene FRIGIDA in Arabidopsis thaliana in the Field. Am Nat, 169, E141-E157.
Legendre, P., & Legendre, L. (1998). Numerical Ecology. 2nd edn. Elsevier, New York.
Lempe, J., Balasubramanian, S., Sureshkumar, S., Singh, A., Schmid, M., & Weigel, D. (2005). Diversity of flowering responses in wild Arabidopsis thaliana strains. PLoS Genet, 1, e6.
Li, Y., Huang, Y., Bergelson, J., Nordborg, M., & Borevitz, J. O. (2010). Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. Proc Nat Acad Sci USA, online early edition, 1-6.
McKay, J. K., Richards, J. H., & Mitchell-Olds, T. (2003). Genetics of drought adaptation in Arabidopsis thaliana: I. Pleiotropy contributes to genetic correlations among ecological traits. Mol Ecol, 12, 1137-1151.
Michaels, S. D., & Amasino, R. M. (1999). FLOWERING LOCUS C Encodes a Novel MADS Domain Protein That Acts as a Repressor of Flowering. Plant Cell Online, 11, 949-956.
Shindo, C., Aranzana, M. J., Lister, C., Baxter, C., Nicholls, C., Nordborg, M., et al. (2005). Role of FRIGIDA and FLOWERING LOCUS C in Determining Variation in Flowering Time of Arabidopsis. Plant Physiol, 138, 1163-1173.
Weir, B. S., & Hill, W. G. (2002). Estimating F-statistics. Ann Rev Genet, 36, 721-750.
Zhao, K., Aranzana, M. J., Kim, S., Lister, C., Shindo, C., Tang, C., et al. (2007). An Arabidopsis example of association mapping in structured samples. PLoS Genet, 3, e4.

Table S1. All climate variables for which data were obtained. All were used in variance partitioning to determine the portion of genomic variation explained by climatic variation. Numbers indicate calendar months.

Prec. 1, Prec. 2, Prec. 3, Prec. 4, Prec. 5, Prec. 6, Prec. 7, Prec. 8, Prec. 9, Prec. 10, Prec. 11, Prec. 12
Inter-ann. CV prec. 1, Inter-ann. CV prec. 2, Inter-ann. CV prec. 3, Inter-ann. CV prec. 4, Inter-ann. CV prec. 5, Inter-ann. CV prec. 6, Inter-ann. CV prec. 7, Inter-ann. CV prec. 8, Inter-ann. CV prec. 9, Inter-ann. CV prec. 10, Inter-ann. CV prec. 11, Inter-ann. CV prec. 12
Min. temp. 1, Min. temp. 2, Min. temp. 3, Min. temp. 4, Min. temp. 5, Min. temp. 6, Min. temp. 7, Min. temp. 8, Min. temp. 9, Min. temp. 10, Min. temp. 11, Min. temp. 12
Mean temp. 1, Mean temp. 2, Mean temp. 3, Mean temp. 4, Mean temp. 5, Mean temp. 6, Mean temp. 7, Mean temp. 8, Mean temp. 9, Mean temp. 10, Mean temp. 11, Mean temp. 12
Max. temp. 1, Max. temp. 2, Max. temp. 6, Max. temp. 7, Max. temp. 8, Max. temp. 9, Max. temp. 10, Max. temp. 11, Max. temp. 12
VPD 1, VPD 2, VPD 3, VPD 4, VPD 5, VPD 6, VPD 7, VPD 8, VPD 9, VPD 10, VPD 11, VPD 12
Gr. seas. length, Mean prec. gr. seas., CV prec. gr. seas., Total prec. gr. seas., Mean month. min. temp. gr. seas., Mean temp. gr. seas., Mean month. max. temp. gr. seas., Mean VPD gr. seas., Mean inter-ann. CV prec. gr. seas.
Ann. mean temp., Mean diurnal temp. range, Isothermality, S.D. month. temp., Max. temp. warmest mo., Min. temp. coldest mo., Temp. ann. range, Mean temp. wettest quart., Mean temp. driest quart., Mean temp. warmest quart., Mean temp. coldest quart.
Ann. prec., Prec. wettest mo., Prec. driest mo., CV month. prec., Prec. wettest quart., Prec. driest quart., Prec. warmest quart., Prec. coldest quart.
Winter PAR, Spring PAR, Summer PAR

Table S2. Flowering time experiments used as training data in SVM model. Day length was either constant or followed natural variation for a given location. Threshold FT gives the flowering time after which accessions were considered late-flowering, standardized so that the first flowering individual in each experiment had flowering time = 0.

Experiment #  Reference           Day length (hrs)  Natural light conditions   Temperature (ºC)  N accessions  Notes
X1            Zhao et al. 2007    16                n/a                        18                167
X2            Zhao et al. 2007    8                 n/a                        18                162
X3            Atwell et al. 2010  16                n/a                        10                194
X4            Atwell et al. 2010  16                n/a                        16                193
X5            Atwell et al. 2010  16                n/a                        22                193
X6            Zhao et al. 2007    16                n/a                        23                137
X7            Shindo et al. 2005  n/a               52º37' N, Oct. – March     20-22             153
X8            Atwell et al. 2010  16                n/a                        20                166
X9            Li et al. 2010      n/a               41º43' N, March – July     5-27              445           simulated natural day length and temperature
X10           Li et al. 2010      n/a               55º43' N, May – Sep.       5-21              445           simulated natural day length and temperature
X11           Li et al. 2010      n/a               41º43' N, Apr. – Sep.      7-28              445           simulated natural day length and temperature
X12           Li et al. 2010      n/a               55º43' N, June – Nov.      5-21              445           simulated natural day length and temperature

Table S3. SNPs used as predictor variables in SVM models of flowering time (TAIR 10). See attached file.

Table S4. Proportion of total SNP variation explained by climate and spatial variables. Regional groupings taken from Horton et al. (2012).

R2 (adjusted)
Accessions              Climate + Space  Climate | Space  Space | Climate  Climate ∩ Space  Residual  N accessions  N locations
All                     0.226            0.057            0.069            0.100            0.774     1003          447
Early-flowering         0.201            0.050            0.056            0.096            0.799     755           315
Late-flowering          0.395            0.221            0.093            0.082            0.605     248           165
Regional groups
Britain & Ireland       0.051            0.033            0.009            0.009            0.949     174           89
France                  0.168            0.016            0.000            0.152            0.832     204           29
Central Europe          0.191            0.120            0.006            0.065            0.809     96            69
Central-Eastern Europe  0.071            0.029            0.001            0.041            0.929     156           25

Table S5. Bioclim abbreviations from the WorldClim data set. Taken from http://www.worldclim.org/bioclim .

BIO1 = Annual Mean Temperature
BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))
BIO3 = Isothermality (BIO2/BIO7) (* 100)
BIO4 = Temperature Seasonality (standard deviation *100)
BIO5 = Max Temperature of Warmest Month
BIO6 = Min Temperature of Coldest Month
BIO7 = Temperature Annual Range (BIO5-BIO6)
BIO8 = Mean Temperature of Wettest Quarter
BIO9 = Mean Temperature of Driest Quarter
BIO10 = Mean Temperature of Warmest Quarter
BIO11 = Mean Temperature of Coldest Quarter
BIO12 = Annual Precipitation
BIO13 = Precipitation of Wettest Month
BIO14 = Precipitation of Driest Month
BIO15 = Precipitation Seasonality (Coefficient of Variation)
BIO16 = Precipitation of Wettest Quarter
BIO17 = Precipitation of Driest Quarter
BIO18 = Precipitation of Warmest Quarter
BIO19 = Precipitation of Coldest Quarter

Table S6. Climate variables and the percent of SNP variation among early-flowering accessions they explain in RDA (100*Px). Only the top 10 climate variables are shown for each RDA. Note changes in climate variables important to late-flowering (Table S7) versus early-flowering accessions.

Early-flowering, RDA on raw SNPs
Climate variable               Percent of SNP variation explained
CV monthly prec.               5.77
S.D. mean monthly temp.        5.74
Max. March temp.               5.69
Mean March temp.               5.67
Grow. seas. length             5.67
Mean temp. wettest quart.      5.60
Max. February temp.            5.55
January prec.                  5.52
Max. December temp.            5.51
November prec.                 5.51

Early-flowering, partial RDA after removing spatial effects
Climate variable               Percent of SNP variation explained
June prec.                     0.99
Mean monthly prec. grow. seas. 0.98
Prec. warmest quart.           0.97
Prec. wettest quart.           0.94
Prec. wettest month            0.93
May prec.                      0.91
August prec.                   0.88
September prec.                0.83
July prec.                     0.83
CV monthly prec. grow. seas.   0.82

Table S7. Climate variables and the percent of SNP variation among late-flowering accessions they explain in RDA (100*Px). Only the top 10 climate variables are shown for each RDA. Note changes in climate variables important to early-flowering (Table S6) versus late-flowering accessions.

Late-flowering, RDA on raw SNPs
Climate variable                     Percent of SNP variation explained
Min. January temp.                   11.26
Min. November temp.                  11.24
Min. temp. coldest month             11.20
Min. December temp.                  11.19
Mean January temp.                   11.19
Mean December temp.                  11.17
Min. February temp.                  11.15
Min. March temp.                     11.15
Min. April temp.                     11.14
Mean November temp.                  11.14

Late-flowering, partial RDA after removing spatial effects
Climate variable                     Percent of SNP variation explained
Min. July temp.                      2.37
Mean monthly min. temp. grow. seas.  2.29
Mean diurnal temp. range             2.25
Min. June temp.                      2.24
Mean monthly max. temp. grow. seas.  2.19
CV monthly prec.                     2.18
Min. August temp.                    2.15
Mean monthly temp. grow. seas.       2.08
July prec.                           2.07
CV monthly prec. grow. seas.         2.04

Figure S1. Correlation matrix between values of climatic variables at the 389 unique collection locations in Eurasia. Precipitation variables are shown in blue, temperature variables in red, VPD variables in purple, growing season variables in green, Bioclim derived variables in black, and PAR in orange.

Figure S2. Flowering times of accessions from 13 experiments used to train a genetic SVM model of early vs. late-flowering phenotype. Accessions were split into two groups with k-means clustering.
Clusters are shown in blue and red; cluster means are shown as black dots. Flowering times for accessions included in experiments neighboring on the plot are shown as lines; other accessions are shown as circles. The x-axis indicates the experiment name, light, and temperature conditions (see Table S2 for details). Flowering time is standardized so that the first flowering individual in each experiment had flowering time = 0.

Figure S3. The first two principal components of flowering time in the absence of vernalization. Thirteen experiments of varied conditions included 67 accessions. Accessions are identified by abbreviations in gray; experiments are identified by green arrows (see Table S2 for a description of each). Axes are scaled based on the variation in flowering time they explain.

Figure S4. Histogram of the distribution of accessions along the first principal component of flowering time variation shown in Figure S3.

Figure S5. Standardized flowering time for 27 accessions that were used to validate previous flowering time predictions. The first plant to flower was considered day 0.

Figure S6. Portion of SNP variation explained (Px) by PCNM eigenvectors (only those with positive eigenvalues are shown). Eigenvectors are ranked by eigenvalue, with the greatest eigenvalue on the left, declining as one moves right along the x-axis.

Figure S7. Portion of SNP variation explained by PCNM eigenvectors (Px) vs. Moran's I for each eigenvector. Larger values of I indicate larger-scale spatial autocorrelation.

Figure S8. The first two RDA axes for all accessions combined. Climate variables with the strongest correlation to each quadrant are shown.

Figure S9. The first two RDA axes for all accessions combined. Spatial structure variables were first removed in partial RDA. Climate variables with the strongest correlation to each quadrant are shown.

Figure S10. The first two RDA axes for early-flowering accessions. Climate variables with the strongest correlation to each quadrant are shown.

Figure S11. The first two RDA axes for early-flowering accessions after removing spatial structure. Climate variables with the strongest correlation to each quadrant are shown.

Figure S12. The first two RDA axes for late-flowering accessions. Climate variables with the strongest correlation to each quadrant are shown.

Figure S13. The first two RDA axes for late-flowering accessions. Spatial structure variables were first removed with partial RDA. Climate variables with the strongest correlation to each quadrant are shown.

Figure S14. Venn diagrams of variance partitioning results for early and late-flowering accessions. Circles represent the proportion of SNP variation explained by climate and spatial structure. Unexplained SNP variation is represented by the white surrounding the circles.

Figure S15. Comparison of the SNP variation explained by climate variables (Px) in early vs. late-flowering accessions. Spearman's rank correlations are indicated in red. The gray line shows a least-squares fit to the correlation between variables. See supplemental material for a key of Bioclim variable abbreviations (Table S5).

Figure S16. Distribution of flowering time groups across the Eurasian sample. April maximum temperature is shown, as it was the climate variable explaining the most raw SNP variation in RDA on all accessions.