Characterizing genomic variation of Arabidopsis thaliana: the roles of geography and
climate
Jesse R. Lasky1,2, David L. Des Marais1, John K. McKay3, James H. Richards4,
Thomas E. Juenger1, and Timothy H. Keitt1
Supporting Information document
1 University of Texas at Austin, Section of Integrative Biology, 1 University Station C0900, Austin, Texas 78712-0253
2 Corresponding author
3 Colorado State University, Bioagricultural Sciences and Pest Management, Campus delivery 1177, Fort Collins, CO 80523
4 University of California, Davis, Land, Air and Water Resources, One Shields Avenue, Davis, CA 95616
Supporting Information
INTRODUCTORY NOTES
Life history variation
Climate-driven natural selection depends on the timing of climate events relative to
temporal life cycle patterns (Korves et al. 2007). Arabidopsis has an annual but varied
life history. Among other factors, Arabidopsis speed of growth and development is
limited by temperature (Wilczek et al. 2009), water availability, and day length
(Corbesier & Coupland 2005, Lempe et al. 2005). Some plants are rapid-cycling, growing
mostly in the spring and summer and over-wintering as seeds. Other plants germinate in
the fall and overwinter as rosettes, flowering in the spring (i.e. winter annuals). Whether a
plant overwinters as a seed or as a rosette is a major categorical difference in life history
that affects selective pressures (Donohue 2005). Ignoring this variation could hinder
studies of climate-genome correlations. Thus information on the life history and periods
of vegetative growth of accessions could greatly improve identification of important
climate variables, though the natural life history of most accessions is unknown.
Much of Arabidopsis life history variation may correlate to genetic variation in
flowering time observed in common gardens. Many accessions have a genetically
determined requirement for prolonged exposure to cold (i.e. vernalization) in order to
flower; these likely are winter annuals (Koornneef et al. 1998). In common gardens these
accessions fail to flower or flower late in the absence of vernalization (i.e. late-flowering
accessions), while other accessions can rapidly cycle without vernalization (i.e. early-flowering
accessions). Sensitivity to vernalization is largely under the genetic control of a
well-studied signaling pathway (Michaels & Amasino 1999, Johanson et al. 2000,
Caicedo et al. 2004, Stinchcombe et al. 2004, Shindo et al. 2005). However, accessions
that typically flower early in common gardens can overwinter as rosettes and flower in
the spring (i.e. behave as winter annuals) if germinated sufficiently late in fall (Wilczek et
al. 2009). Nevertheless, additional traits are correlated to flowering time variation,
suggesting there are broad ecological and life history differences between early and late-flowering
accessions that affect how they experience climate. Ecologically important
traits such as drought response (McKay et al. 2003), plant defense, and ion concentrations
(Atwell et al. 2010) covary with common garden flowering times. When plants are forced
to germinate in the fall, late-flowering accessions have higher over-winter survival than
early-flowering, but when plants germinate in the spring early-flowering accessions have
greater fecundity than late-flowering (Korves et al. 2007). This germination time by
flowering time interaction supports the hypothesis that late-flowering accessions are more
likely to be winter annuals than early-flowering accessions. To account for these
functional differences, we stratified analyses by genetic variation in flowering time to
capture ecologically relevant life history variation between accessions over-wintering as
seeds and those over-wintering as rosettes. While predicting natural life history based on
a limited number of common garden experiments is an imperfect technique, we expect to
capture enough variation to reveal major selective pressures.
METHODS
Predicting flowering time
Because flowering time data were only available for 476 of the 1,307 accessions with
genomic data, we used data from the 476 to predict flowering time variation in the
remaining accessions. The structure of flowering time variation among accessions was
assessed with data from 13 common garden experiments of different environments, all
without vernalization (Figure S2, Table S2; Shindo et al. 2005, Zhao et al. 2007, Atwell
et al. 2010, Li et al. 2010, Kenney et al. unpublished data). We used experiments without
vernalization because vernalization accelerates the flowering of late-flowering putative
winter annuals (Koornneef et al. 1998) which would have limited our ability to
distinguish life histories. Conditions in the 13 experiments ranged from short to long
days, from natural spring temperatures in Sweden to natural summer temperatures in
Spain, and from well-watered to drought conditions. We conducted principal components
analysis on the 67 accessions that appeared in all 13 experiments. The first principal
component explained 72.7% of the variation, while the second principal component
explained only 12.6% of the variation. All experiments were positively correlated to the
first principal component (Figure S3). The distribution of accessions along the first axis
was bimodal, indicating that a large portion of flowering time variation across
environmental conditions is explained by early vs. late flowering status (Figure S4). We
divided the 67 accessions into two categories using k-means clustering and then assigned
all 476 accessions from the 13 experiments into the k-means category with the closest
centroid in flowering time space. Our categorization of the 476 accessions with known
flowering time was then used as a training set to predict flowering time category in the
remaining accessions.
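The two-step categorization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy flowering times, the deterministic starting centroids, and the use of a root-mean-square distance over only the experiments in which an accession was grown are all assumptions made for the example.

```python
import math

def kmeans_two_groups(points, n_iter=100):
    """Lloyd's k-means with k = 2 on complete flowering-time vectors."""
    # Assumed deterministic start: the accessions with the smallest and
    # largest mean flowering time seed the two clusters.
    ordered = sorted(points, key=lambda p: sum(p) / len(p))
    centroids = [list(ordered[0]), list(ordered[-1])]
    for _ in range(n_iter):
        groups = [[], []]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            groups[dists.index(min(dists))].append(p)
        # New centroid = per-experiment mean; an empty cluster keeps its centroid.
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    # Return the centroid with the lower mean flowering time first (early, late).
    return sorted(centroids, key=lambda c: sum(c) / len(c))

def nearest_centroid(record, centroids):
    """Label an accession using only the experiments it appeared in.
    `record` holds None for experiments without data (at least one value needed)."""
    best, best_d = None, float("inf")
    for label, c in zip(("early", "late"), centroids):
        sq = [(x - m) ** 2 for x, m in zip(record, c) if x is not None]
        d = math.sqrt(sum(sq) / len(sq))  # root-mean-square distance
        if d < best_d:
            best, best_d = label, d
    return best

# Hypothetical standardized flowering times in three experiments.
complete = [[5, 8, 6], [7, 10, 9], [6, 9, 7], [60, 75, 70], [65, 80, 72], [58, 70, 68]]
centroids = kmeans_two_groups(complete)
label_a = nearest_centroid([None, 12, None], centroids)  # grown in one experiment only
label_b = nearest_centroid([55, None, 66], centroids)    # grown in two experiments
```

Averaging the squared differences over the available experiments keeps distances comparable between accessions that were grown in different numbers of experiments.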
Our prediction of flowering time was based on SNP variation in genomic regions
identified in the original studies. Loci associated with flowering time variation changed
among experiments (Atwell et al. 2010), thus we selected candidate loci from 12
experiments to increase the robustness of our model to environmental variation (no
candidates identified in one experiment). We selected SNPs identified by Atwell et al.
(2010, eight experiments) and Li et al. (2010, four experiments) as associated with
flowering time variation. We also included all SNPs within 100 kb of FRI and FLC, two
interacting genes in the vernalization-sensitivity pathway (Koornneef et al. 1998,
Michaels & Amasino 1999, Stinchcombe et al. 2004), giving 857 total SNPs as predictor
variables (Table S3).
We modeled flowering category with support vector machines (SVM), a type of
classification model. SVM are highly robust as they divide the embedding space of the
data into cluster regions – the “support” of the classes – rather than attempting to classify
based on statistical moment calculations within and among classes as in traditional
methods. SVM finds the optimal hyperplane (support vectors) for separating response
classes in the space of predictor variables. Computational biologists are increasingly
using SVM to predict phenotype from genotype because of their accuracy and flexibility
in high-dimensional space (Ben-Hur et al. 2008).
Epistatic interactions among genes (i.e. non-additive effects) have strong effects
on flowering time in Arabidopsis (Juenger et al. 2005, Shindo et al. 2005). SVM can
account for interactions and nonlinearity by using nonlinear basis functions to create
support vectors. We used a radial basis function, which offers a reasonable tradeoff
between flexible modeling and over-fitting with a large number of parameters (Hsu et al.
2010).
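As a rough illustration of the radial basis function, the sketch below computes the kernel in the form used by libsvm, K(x, y) = exp(-gamma * ||x - y||^2). The 0/1 genotype coding, the four-SNP vectors, and the value of gamma are hypothetical choices for the example.

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2) for two genotype vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical genotypes at four SNPs, coded 0/1 for the two alleles.
acc_a = [0, 1, 1, 0]
acc_b = [0, 1, 0, 0]  # differs from acc_a at one SNP
acc_c = [1, 0, 0, 1]  # differs from acc_a at all four SNPs

k_ab = rbf_kernel(acc_a, acc_b, gamma=0.5)  # exp(-0.5 * 1)
k_ac = rbf_kernel(acc_a, acc_c, gamma=0.5)  # exp(-0.5 * 4)
```

Because the exponential of a sum is a product, each mismatching SNP multiplies the similarity by a constant factor rather than subtracting a fixed amount, which is one way to see why a classifier built on this kernel is not restricted to additive SNP effects.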
We conducted a grid search for values of the two SVM tuning parameters to find
the most accurate classification of flowering time category. Accuracy was assessed using
four-fold cross-validation on the training data. A model with the best values of tuning
parameters predicted with 91% accuracy in cross-validation. This model was then used to
predict categories for all 1,307 accessions. SVM were implemented in the R package
“e1071,” which interfaces to libsvm (Chang & Lin 2001, R Core Development Team
2010).
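The tuning procedure can be sketched generically as below. This is an illustration under stated assumptions, not the e1071 workflow itself: `kernel_vote` is a simple stand-in classifier (an RBF-weighted vote, not an SVM), the genotypes and the grid values are hypothetical, and the function names are invented for the example. In practice the `fit_predict` slot would wrap an SVM fit, with the grid covering both tuning parameters.

```python
import itertools
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_accuracy(fit_predict, params, X, y, folds):
    """Mean held-out accuracy of `fit_predict` across the folds."""
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(params,
                            [X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        hits = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)

def grid_search(fit_predict, grid, X, y, k=4, seed=0):
    """Return (best_params, best_accuracy) over all grid combinations."""
    folds = k_fold_indices(len(y), k, seed)
    best = (None, -1.0)
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        acc = cv_accuracy(fit_predict, params, X, y, folds)
        if acc > best[1]:
            best = (params, acc)
    return best

def kernel_vote(params, X_train, y_train, X_test):
    """Stand-in classifier (NOT an SVM): RBF-weighted vote of training labels."""
    preds = []
    for x in X_test:
        votes = {}
        for xt, yt in zip(X_train, y_train):
            sq = sum((a - b) ** 2 for a, b in zip(x, xt))
            votes[yt] = votes.get(yt, 0.0) + math.exp(-params["gamma"] * sq)
        preds.append(max(votes, key=votes.get))
    return preds

# Hypothetical 0/1 genotypes at three SNPs for eight accessions.
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0],
     [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
y = ["early"] * 4 + ["late"] * 4
grid = {"gamma": [2 ** -3, 2 ** -1, 2 ** 1]}
folds = k_fold_indices(len(y), 4)
best_params, best_acc = grid_search(kernel_vote, grid, X, y, k=4)
```

Reusing the same folds for every grid point means that differences in accuracy reflect the parameter values rather than a different random split.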
We previously predicted flowering time category in the manner described above
for 29 accessions using only data from experiments X1, X2, X5, X8, and X13. The 29
accessions were not present in any of the five experiments. We planted 12 seeds from
each line in randomized blocks and grew them under long days (16 h) in our growth
chamber at 22ºC/18ºC day/night. Two lines had very poor germination (25%) and were
removed from the experiment. Trays of 18 plants were randomized three times weekly
and scored daily for flowering. Mean flowering time for accessions was bimodal; those in
the first cluster were considered early-flowering and those in the second cluster were
considered late-flowering (Figure S5). Twenty-four of the 27 accessions had
experimental flowering times consistent with our flowering time category predictions
based on whether they fell into the first or second clump of flowering times (Figure S4).
To estimate the genomic differentiation of our predicted flowering time
categories, we measured the genetic distance between predicted categories compared to
the distance between random categorizations. At each SNP, we calculated θ (an estimate
of Fst, or the proportion of genetic variation found between groups) between early and
late-flowering groups (Weir and Hill 2002). The 857 SNPs used to classify categories
and neighboring SNPs within 100 kb were excluded (11,599 excluded). We calculated
the mean and 50th, 95th, and 99th percentiles of genome-wide θ to use as test statistics. We
randomly classified accessions into two groups, equal in size to the predicted categories,
and calculated θ at each SNP, repeating this process 1,000 times. The proportions of
random groupings with test statistics larger than our predicted grouping were taken as
empirical p-values for rejecting null hypotheses that classification was random with
respect to genome-wide θ.
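The permutation test can be sketched as below. Two simplifications are assumed for illustration: a simple heterozygosity-based Fst, (Ht - Hs) / Ht, is swapped in for the Weir and Hill (2002) estimator, and only the genome-wide mean is used as the test statistic (percentiles would be handled the same way). The genotypes and group labels are hypothetical.

```python
import random

def snp_fst(alleles, groups):
    """Simple Fst = (Ht - Hs) / Ht at one biallelic SNP.
    alleles: 0/1 per accession; groups: True/False per accession."""
    p = sum(alleles) / len(alleles)
    ht = 2 * p * (1 - p)  # expected heterozygosity in the pooled sample
    if ht == 0:
        return 0.0        # monomorphic SNP carries no signal
    hs = 0.0
    for flag in (True, False):
        grp = [a for a, g in zip(alleles, groups) if g == flag]
        q = sum(grp) / len(grp)
        hs += len(grp) / len(alleles) * 2 * q * (1 - q)  # size-weighted within-group term
    return (ht - hs) / ht

def mean_fst(genotypes, groups):
    """Genome-wide mean; `genotypes` is a list of per-SNP allele lists."""
    return sum(snp_fst(snp, groups) for snp in genotypes) / len(genotypes)

def permutation_p(genotypes, groups, n_perm=1000, seed=1):
    """Proportion of random groupings whose statistic exceeds the observed one."""
    rng = random.Random(seed)
    observed = mean_fst(genotypes, groups)
    shuffled = list(groups)
    larger = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling labels keeps the group sizes fixed
        if mean_fst(genotypes, shuffled) > observed:
            larger += 1
    return observed, larger / n_perm

# Hypothetical data: three SNPs typed in eight inbred accessions.
genotypes = [[1, 1, 1, 1, 0, 0, 0, 0],
             [1, 1, 1, 0, 0, 0, 0, 1],
             [1, 0, 1, 0, 1, 0, 1, 0]]
groups = [True] * 4 + [False] * 4  # True = predicted early-flowering
observed, p_value = permutation_p(genotypes, groups)
```

Shuffling the labels rather than redrawing them preserves the sizes of the two groups, matching the requirement that random groups be equal in size to the predicted categories.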
RDA on life history groups
We conducted RDA and variance partitioning on life history groups. When we tested
significance of variance partitioning on life history groups, collection sites were permuted
among groups of accessions collected at the same site, within the same flowering time
category (Legendre & Legendre 1998). In order to identify candidate environmental
gradients underlying local adaptation we conducted RDA on each life history group. In a
subset of RDA we first removed PCNM spatial variables as a method of controlling for
population structure. Climate variables were then used to explain SNP residuals (i.e.
partial RDA). We identified important climate variables for each life history group by
calculating Px as described in the main text.
RESULTS
Predicted phenology
Using SVM with 857 SNPs as predictor variables, we predicted 1,035 accessions to be
early-flowering and 272 to be late-flowering. Among Eurasian accessions that we studied
with RDA, 755 and 248 were predicted to be early and late-flowering, respectively. Late-flowering
accessions were most common in northern Europe (Figure S16). The predicted
flowering time categories explained highly significant portions of genomic variation
among the full panel of 1,304 accessions. The observed mean and 50th, 95th, and 99th
percentiles of Weir's θ between flowering time categories were greater than the same
respective statistics for all 1,000 random groupings.
RDA
Compared to analyses among all accessions, climate and space explained a similar
portion of SNP variation for early-flowering accessions (20.1%), although the figure was
much greater for late-flowering accessions (39.5%; Table S4, Figure S14). The observed
portions of variation explained by climate variables and by PCNM were greater than the
portions explained by each set in all of 1,000 permuted data sets, for all accessions and
for both flowering time groups (all permutation tests p < 0.001).
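The fractions reported in Table S4 are linked by simple arithmetic, sketched below under the usual variance-partitioning identities for three models (climate only, space only, and both). The marginal values 0.157 and 0.169 are not reported in the source; they are back-calculated here from the rounded "All" row of Table S4 purely for illustration, and the function name is hypothetical.

```python
def partition(r2_climate, r2_space, r2_both):
    """Split adjusted R2 values from three models into unique and shared fractions."""
    return {
        "climate_given_space": r2_both - r2_space,    # unique to climate
        "space_given_climate": r2_both - r2_climate,  # unique to space
        "shared": r2_climate + r2_space - r2_both,    # jointly explained
        "residual": 1 - r2_both,                      # unexplained
    }

# Back-calculated marginal R2 values for all accessions (assumption, see above).
fractions = partition(r2_climate=0.157, r2_space=0.169, r2_both=0.226)
explained = (fractions["climate_given_space"]
             + fractions["space_given_climate"]
             + fractions["shared"])
```

The three explained fractions sum to the combined climate and space value, so the 39.5% reported for late-flowering accessions is the total of its unique climate, unique space, and shared components.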
Among early-flowering accessions, the coefficient of variation of monthly
precipitation explained the most genomic variation (Table S6). After removing the effect
of spatial structure, June and growing season monthly precipitation explained the greatest
portion. Minimum winter temperatures explained the greatest portion among late-flowering
accessions (Table S7). When accounting for spatial structure, minimum
summer and growing season temperatures explained the greatest portion.
The climate variables explaining the greatest amount of SNP variation differed
between predicted flowering time groups (Figure S15). However, there was a weak but
significant positive correlation between the variation explained by each climate variable
in early vs. late-flowering accessions (Spearman's rank correlation, ρ = 0.27, p = 0.02).
After removing spatial structure, the correlation between variance explained in early vs.
late-flowering accessions became stronger (Spearman's rank correlation, ρ = 0.52, p <
0.00001). The greatest outliers from this correlation were summer and growing season
precipitation variables, which explained the most SNP variation among early-flowering
accessions but explained relatively little variation among late-flowering accessions.
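The comparison between groups rests on Spearman's rank correlation, which is the Pearson correlation of the ranks of the two sets of values. A minimal sketch follows; the five pairs of values are hypothetical and do not correspond to the analysis reported here.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical variation explained by five climate variables in each group.
px_early = [1.0, 2.0, 3.0, 4.0, 5.0]
px_late = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(px_early, px_late)
tied = average_ranks([10, 20, 20, 30])
```

Working on ranks makes the statistic insensitive to the large difference in magnitude between the two groups, so only the ordering of climate variables is compared.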
DISCUSSION
Results for flowering time groups suggest that climate imposes different selective
pressures on flowering time groups and that groups experience climate differently.
Climate explained the most genomic variation among late-flowering accessions, much
more than among early-flowering accessions. Early-flowering accessions may have
relatively weaker climate-genome correlations if their rapid life-cycle during favorable
conditions allows them to escape many climate selective pressures, while late-flowering
accessions are subject to greater annual climate variability and selection (McKay et al.
2003). Additionally, these strong climate-genetic correlations are likely the result of
consistent life history among late-flowering accessions. When populations have the same
life history, a given season's climate should affect the same life stage in all populations
(e.g. winter conditions affect rosette survival) resulting in consistent mechanisms of local
adaptation. Late-flowering accessions may be consistently winter annuals, while early-flowering accessions have more varied phenology (Wilczek et al. 2009).
Climate-genome correlations differed between flowering time groups in ways
consistent with known functional variation associated with flowering time. Climate-genome
relationships differed between these categories, although after removing spatial
effects the two categories became more similar. However, growing season and warm
season precipitation variables were major outliers from this pattern, explaining much
more variation among early-flowering than late-flowering accessions. If early-flowering
accessions behave as spring and summer annuals they are more likely to experience
warm season precipitation variation than are late-flowering, putative winter annual,
plants. Additionally, early-flowering accessions may be more sensitive to drought
because they have lower water-use efficiency than late-flowering accessions (McKay et
al. 2003). Flowering time variation itself may be involved in local adaptation to drought;
rapid flowering allows plants to escape drought. Genomic variation between flowering time categories
extended well beyond the flowering time-associated loci used to classify accessions. This
variation is likely associated with population structure, known functional genetic
divergence (McKay et al. 2003, Korves et al. 2007, Atwell et al. 2010), and differential
patterns of adaptation to climate between categories (Figures S16 & S17).
Flowering time may mediate patterns of local adaptation to specific climate
variables, but flowering time covaries with other traits that together may represent
different life history strategies adapted to local climatic stress.
References
Atwell, S., Huang, Y. S., Vilhjalmsson, B. J., Willems, G., Horton, M., Li, Y., Meng, D.,
et al. (2010). Genome-wide association study of 107 phenotypes in Arabidopsis
thaliana inbred lines. Nature, 465, 627-631.
Ben-Hur, A., Ong, C. S., Sonnenburg, S., Schölkopf, B., & Rätsch, G. (2008). Support
Vector Machines and Kernels for Computational Biology. PLoS Comput Biol, 4,
e1000173.
Caicedo, A. L., Stinchcombe, J. R., Olsen, K. M., Schmitt, J., & Purugganan, M. D.
(2004). Epistatic interaction between Arabidopsis FRI and FLC flowering time
genes generates a latitudinal cline in a life history trait. Proc Nat Acad Sci USA,
101, 15670-15675.
Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines
[WWW document]. URL http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
Corbesier, L., & Coupland, G. (2005). Photoperiodic flowering of Arabidopsis:
integrating genetic and physiological approaches to characterization of the floral
stimulus. Plant, Cell & Env, 28, 54-66.
Donohue, K. (2005). Niche construction through phenological plasticity: life history
dynamics and ecological consequences. New Phyt, 166, 83-92.
Horton, M., Hancock, A. M., Huang, Y. S., Toomajian, C., Atwell, S., Muliyati, N. W., et
al. In review. The pattern of linkage disequilibrium and selection in Arabidopsis
thaliana.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector
classification. [WWW document] URL
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Johanson, U., West, J., Lister, C., Michaels, S., Amasino, R., & Dean, C. (2000).
Molecular Analysis of FRIGIDA, a Major Determinant of Natural Variation in
Arabidopsis Flowering Time. Science, 290, 344-347.
Juenger, T., Sen, S., Stowe, K., & Simms, E. (2005). Epistasis and genotype-environment
interaction for quantitative trait loci affecting flowering time in Arabidopsis
thaliana. Genetica, 123, 87-105.
Kenney, A., McKay, J. K., Richards, J. H., & Juenger, T.E. Unpublished work.
Environment-dependent selection on drought escape in Arabidopsis thaliana – the
evolutionary significance of global quantitative variation in δ13C, flowering
phenology, and phenotypic plasticity.
Koornneef, M., Alonso-Blanco, C., Peeters, A. J. M., & Soppe, W. (1998). Genetic
control of flowering time in Arabidopsis. Annu Rev Plant Physiol Plant Mol Biol,
49, 345-370.
Korves, T. M., Schmid, K. J., Caicedo, A. L., Mays, C., Stinchcombe, J. R., Purugganan,
M. D., et al. (2007). Fitness Effects Associated with the Major Flowering Time
Gene FRIGIDA in Arabidopsis thaliana in the Field. Am Nat, 169, E141-E157.
Legendre, P., & Legendre, L. (1998). Numerical Ecology. 2nd edn. Elsevier, New York.
Lempe, J., Balasubramanian, S., Sureshkumar, S., Singh, A., Schmid, M., & Weigel, D.
(2005). Diversity of flowering responses in wild Arabidopsis thaliana strains.
PLoS Genet, 1, e6.
Li, Y., Huang, Y., Bergelson, J., Nordborg, M., & Borevitz, J. O. (2010). Association
mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana.
Proc Nat Acad Sci USA, online early edition, 1-6.
McKay, J. K., Richards, J. H., & Mitchell-Olds, T. (2003). Genetics of drought
adaptation in Arabidopsis thaliana: I. Pleiotropy contributes to genetic
correlations among ecological traits. Mol Ecol, 12, 1137-1151.
Michaels, S. D., & Amasino, R. M. (1999). FLOWERING LOCUS C Encodes a Novel
MADS Domain Protein That Acts as a Repressor of Flowering. Plant Cell Online,
11, 949-956.
Shindo, C., Aranzana, M. J., Lister, C., Baxter, C., Nicholls, C., Nordborg, M., et al.
(2005). Role of FRIGIDA and FLOWERING LOCUS C in Determining
Variation in Flowering Time of Arabidopsis. Plant Physiol, 138, 1163-1173.
Weir, B. S., & Hill, W. G. (2002). Estimating F-statistics. Ann Rev Genet, 36, 721-750.
Zhao, K., Aranzana, M. J., Kim, S., Lister, C., Shindo, C., Tang, C., et al. (2007). An
Arabidopsis example of association mapping in structured samples. PLoS Genet,
3, e4.
Table S1. All climate variables for which data were obtained. All were used in variance
partitioning to determine the portion of genomic variation explained by climatic
variation. Numbers indicate calendar months.
Prec. 1
Prec. 2
Prec. 3
Prec. 4
Prec. 5
Prec. 6
Prec. 7
Prec. 8
Prec. 9
Prec. 10
Prec. 11
Prec. 12
Inter-ann. CV prec. 1
Inter-ann. CV prec. 2
Inter-ann. CV prec. 3
Inter-ann. CV prec. 4
Inter-ann. CV prec. 5
Inter-ann. CV prec. 6
Inter-ann. CV prec. 7
Inter-ann. CV prec. 8
Inter-ann. CV prec. 9
Inter-ann. CV prec. 10
Inter-ann. CV prec. 11
Inter-ann. CV prec. 12
Min. temp. 1
Min. temp. 2
Min. temp. 3
Min. temp. 4
Min. temp. 5
Min. temp. 6
Min. temp. 7
Min. temp. 8
Min. temp. 9
Min. temp. 10
Min. temp. 11
Min. temp. 12
Mean temp. 1
Mean temp. 2
Mean temp. 3
Mean temp. 4
Mean temp. 5
Mean temp. 6
Mean temp. 7
Mean temp. 8
Mean temp. 9
Mean temp. 10
Mean temp. 11
Mean temp. 12
Max. temp. 1
Max. temp. 2
Max. temp. 6
Max. temp. 7
Max. temp. 8
Max. temp. 9
Max. temp. 10
Max. temp. 11
Max. temp. 12
VPD 1
VPD 2
VPD 3
VPD 4
VPD 5
VPD 6
VPD 7
VPD 8
VPD 9
VPD 10
VPD 11
VPD 12
Gr. seas. length
Mean prec. gr. seas.
CV prec. gr. seas.
Total prec. gr. seas.
Mean month. min. temp. gr. seas.
Mean temp. gr. seas.
Mean month. max. temp. gr. seas.
Mean VPD gr. seas.
Mean inter-ann. CV prec. gr. seas.
Ann. mean temp.
Mean diurnal temp. range
Isothermality
S.D. month. temp.
Max. temp. warmest mo.
Min. temp. coldest mo.
Temp. ann. range
Mean temp. wettest quart.
Mean temp. driest quart.
Mean temp. warmest quart.
Mean temp. coldest quart.
Ann. prec.
Prec. wettest mo.
Prec. driest mo.
CV month. prec.
Prec. wettest quart.
Prec. driest quart.
Prec. warmest quart.
Prec. coldest quart.
Winter PAR
Spring PAR
Summer PAR
Table S2. Flowering time experiments used as training data in SVM model. Day length
was either constant or followed natural variation for a given location. Threshold FT gives
the flowering time after which accessions were considered late-flowering, standardized
so that the first flowering individual in each experiment had flowering time = 0.
Experiment #  Reference           Day length (hrs)  Natural light conditions  Temperature (ºC)  N accessions  Notes
X1            Zhao et al. 2007    16                n/a                       18                167
X2            Zhao et al. 2007    8                 n/a                       18                162
X3            Atwell et al. 2010  16                n/a                       10                194
X4            Atwell et al. 2010  16                n/a                       16                193
X5            Atwell et al. 2010  16                n/a                       22                193
X6            Zhao et al. 2007    16                n/a                       23                137
X7            Shindo et al. 2005  n/a               52º37' N, Oct. – March    20-22             153
X8            Atwell et al. 2010  16                n/a                       20                166
X9            Li et al. 2010      n/a               41º43' N, March – July    5-27              445           simulated natural day length and temperature
X10           Li et al. 2010      n/a               55º43' N, May – Sep.      5-21              445           simulated natural day length and temperature
X11           Li et al. 2010      n/a               41º43' N, Apr. – Sep.     7-28              445           simulated natural day length and temperature
X12           Li et al. 2010      n/a               55º43' N, June – Nov.     5-21              445           simulated natural day length and temperature
Table S3. SNPs used as predictor variables in SVM models of flowering time (TAIR 10).
See attached file.
Table S4. Proportion of total SNP variation explained by climate and spatial variables.
Regional groupings taken from Horton et al. (2012).
The first five data columns give adjusted R2.
Accessions              Climate + Space  Climate | Space  Space | Climate  Climate ∩ Space  Residual  N accessions  N locations
All                     0.226            0.057            0.069            0.100            0.774     1003          447
Early-flowering         0.201            0.050            0.056            0.096            0.799     755           315
Late-flowering          0.395            0.221            0.093            0.082            0.605     248           165
Regional groups
Britain & Ireland       0.051            0.033            0.009            0.009            0.949     174           89
France                  0.168            0.016            0.000            0.152            0.832     204           29
Central Europe          0.191            0.120            0.006            0.065            0.809     96            69
Central-Eastern Europe  0.071            0.029            0.001            0.041            0.929     156           25
Table S5. Bioclim abbreviations from the WorldClim data set. Taken from
http://www.worldclim.org/bioclim .
BIO1 = Annual Mean Temperature
BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))
BIO3 = Isothermality (BIO2/BIO7) (* 100)
BIO4 = Temperature Seasonality (standard deviation *100)
BIO5 = Max Temperature of Warmest Month
BIO6 = Min Temperature of Coldest Month
BIO7 = Temperature Annual Range (BIO5-BIO6)
BIO8 = Mean Temperature of Wettest Quarter
BIO9 = Mean Temperature of Driest Quarter
BIO10 = Mean Temperature of Warmest Quarter
BIO11 = Mean Temperature of Coldest Quarter
BIO12 = Annual Precipitation
BIO13 = Precipitation of Wettest Month
BIO14 = Precipitation of Driest Month
BIO15 = Precipitation Seasonality (Coefficient of Variation)
BIO16 = Precipitation of Wettest Quarter
BIO17 = Precipitation of Driest Quarter
BIO18 = Precipitation of Warmest Quarter
BIO19 = Precipitation of Coldest Quarter
Table S6. Climate variables and the percent of SNP variation among early-flowering
accessions they explain in RDA (100*Px). Only the top 10 climate variables are shown
for each RDA. Note changes in climate variables important to late-flowering (Table S7)
versus early-flowering accessions.
Early-flowering, RDA on raw SNPs
Climate variable                Percent of SNP variation explained
CV monthly prec.                5.77
S.D. mean monthly temp.         5.74
Max. March temp.                5.69
Mean March temp.                5.67
Grow. seas. length              5.67
Mean temp. wettest quart.       5.60
Max. February temp.             5.55
January prec.                   5.52
Max. December temp.             5.51
November prec.                  5.51

Early-flowering, partial RDA after removing spatial effects
Climate variable                Percent of SNP variation explained
June prec.                      0.99
Mean monthly prec. grow. seas.  0.98
Prec. warmest quart.            0.97
Prec. wettest quart.            0.94
Prec. wettest month             0.93
May prec.                       0.91
August prec.                    0.88
September prec.                 0.83
July prec.                      0.83
CV monthly prec. grow. seas.    0.82
Table S7. Climate variables and the percent of SNP variation among late-flowering
accessions they explain in RDA (100*Px). Only the top 10 climate variables are shown
for each RDA. Note changes in climate variables important to early-flowering (Table S6)
versus late-flowering accessions.
Late-flowering, RDA on raw SNPs
Climate variable                     Percent of SNP variation explained
Min. January temp.                   11.26
Min. November temp.                  11.24
Min. temp. coldest month             11.20
Min. December temp.                  11.19
Mean January temp.                   11.19
Mean December temp.                  11.17
Min. February temp.                  11.15
Min. March temp.                     11.15
Min. April temp.                     11.14
Mean November temp.                  11.14

Late-flowering, partial RDA after removing spatial effects
Climate variable                     Percent of SNP variation explained
Min. July temp.                      2.37
Mean monthly min. temp. grow. seas.  2.29
Mean diurnal temp. range             2.25
Min. June temp.                      2.24
Mean monthly max. temp. grow. seas.  2.19
CV monthly prec.                     2.18
Min. August temp.                    2.15
Mean monthly temp. grow. seas.       2.08
July prec.                           2.07
CV monthly prec. grow. seas.         2.04
Figure S1. Correlation matrix between values of climatic variables at the 389 unique
collection locations in Eurasia. Precipitation variables are shown in blue, temperature
variables are shown in red, VPD variables are shown in purple, growing season variables
are shown in green, and Bioclim derived variables are shown in black, PAR is shown in
orange.
Figure S2. Flowering times of accessions from 13 experiments used to train a genetic
SVM model of early vs. late-flowering phenotype. Accessions were split into two groups
with k-means clustering. Clusters are shown in blue and red, cluster means are shown as
black dots. Flowering times for accessions included in experiments neighboring on the
plot are shown as lines; other accessions are shown as circles. The x-axis indicates the
experiment name, light, and temperature conditions (see Table S2 for details). Flowering
time is standardized so that the first flowering individual in each experiment had
flowering time = 0.
Figure S3. The first two principal components of flowering time in the absence of
vernalization. Thirteen experiments of varied conditions included 67 accessions.
Accessions are identified by abbreviations in gray, experiments are identified by green
arrows (see Table S2 for description of each). Axes are scaled based on the variation in
flowering time they explain.
Figure S4. Histogram of the distribution of accessions along the first principal
component of flowering time variation shown in Figure S3.
Figure S5. Standardized flowering time for 27 accessions that were used to validate
previous flowering time predictions. The first plant to flower was considered day 0.
Figure S6. Portion of SNP variation explained (Px) by PCNM eigenvectors (only those
with positive eigenvalues are shown). Eigenvectors are ranked by eigenvalue, with the
greatest eigenvalue on the left declining as one moves right along the x-axis.
Figure S7. Portion of SNP variation explained by PCNM eigenvectors (Px) vs. Moran's I
for each eigenvector. Larger values of I indicate spatial autocorrelation at a larger scale.
Figure S8. The first two RDA axes for all accessions combined. Climate variables with
the strongest correlation to each quadrant are shown.
Figure S9. The first two RDA axes for all accessions combined. Spatial structure
variables were first removed in partial RDA. Climate variables with the strongest
correlation to each quadrant are shown.
Figure S10. The first two RDA axes for early-flowering accessions. Climate variables
with the strongest correlation to each quadrant are shown.
Figure S11. The first two RDA axes for early-flowering accessions after removing
spatial structure. Climate variables with the strongest correlation to each quadrant are
shown.
Figure S12. The first two RDA axes for late-flowering accessions. Climate variables
with the strongest correlation to each quadrant are shown.
Figure S13. The first two RDA axes for late-flowering accessions. Spatial structure
variables were first removed with partial RDA. Climate variables with the strongest
correlation to each quadrant are shown.
Figure S14. Venn diagrams of variance partitioning results for early and late-flowering
accessions. Circles represent the proportion of SNP variation explained by climate and
spatial structure. Unexplained SNP variation is represented by the white surrounding the
circles.
Figure S15. Comparison of the SNP variation explained by climate variables (Px) in
early vs. late-flowering accessions. Spearman's rank correlations are indicated in red. The
gray line shows a least-squares fit to the correlation between variables. See supplemental
material for a key of Bioclim variable abbreviations (Table S5).
Figure S16. Distribution of flowering time groups across the Eurasian sample. April
maximum temperature is shown, as it was the climate variable explaining the most raw
SNP variation in RDA on all accessions.