PHYLOGENETIC CONSERVATISM IN PLANT PHENOLOGY

T. Jonathan Davies1*, Elizabeth M. Wolkovich2, Nathan J. B. Kraft3, Nicolas Salamin4,5, Jenica M. Allen6, Toby R. Ault7, Julio L. Betancourt8, Kjell Bolmgren9,10, Elsa E. Cleland11, Benjamin I. Cook12,13, Theresa M. Crimmins14, Susan J. Mazer15, Gregory J. McCabe16, Stephanie Pau17, Jim Regetz17, Mark D. Schwartz18 & Steven Travers19

1 Department of Biology, McGill University, Montreal, QC, Canada
2 Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
3 Department of Biology, University of Maryland, College Park, MD 20742, USA
4 Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
5 Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland
6 Department of Ecology & Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
7 National Center for Atmospheric Research, Boulder, Colorado
8 U.S. Geological Survey, Reston, VA
9 Theoretical Population Ecology and Evolution, Lund University, Lund, Sweden
10 Swedish University of Agricultural Sciences, Swedish National Phenology Network, Sweden
11 Ecology, Behavior & Evolution Section, University of California San Diego, La Jolla, CA 92103, USA
12 NASA Goddard Institute for Space Studies, New York, New York
13 Ocean and Climate Physics, Lamont-Doherty Earth Observatory, Palisades, New York
14 USA National Phenology Network, Tucson, Arizona
15 Department of Ecology, Evolution and Marine Biology, University of California – Santa Barbara, Santa Barbara, California
16 U.S. Geological Survey, Denver, Colorado
17 National Center for Ecological Analysis and Synthesis, Santa Barbara, California
18 Department of Geography, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin
19 Department of Biological Sciences, North Dakota State University, Fargo, North Dakota

Supporting Information: Construction of phylogeny from molecular sequence data, Author contributions, Table S1, Figure S1 and Newick tree file generated using Phylomatic.

SUPPORTING INFORMATION

Construction of phylogeny from molecular sequence data

An initial survey of the genetic data available in GenBank for the species in the phenology dataset indicated that seven DNA regions (cpDNA: atpB, matK, ndhF, rbcL, trnL-F; ntDNA: lfy, phyB) maximised the number of genera included without introducing more than 10% missing data in the supermatrix. Every genus was considered monophyletic and, for each DNA region, we selected the longest DNA sequence available for each genus. This resulted in a combined DNA matrix of 1246 genera, which is available from the Dryad database (doi:10.5061/dryad.td03p886).

Multiple alignments were done with the software MAFFT using default options. However, we took a taxonomic approach to alignment, first creating a separate matrix for each plant family in the dataset. Once multiple alignments were done for every plant family, we used profile alignment to merge family-level alignments into taxonomically wider matrices. These merging steps again followed the taxonomic hierarchy.

Maximum likelihood (ML) analyses were done in RAxML, using a GTR+G model for each of the seven DNA regions analysed. The support of the resulting tree was assessed using 100 bootstrap replicates. Because of the size of the DNA matrix, we used the ML tree to fix the topology during the divergence time analyses and set a single GTR model of evolution for the whole DNA matrix.
This reduced the complexity of the MCMC analyses and allowed better sampling of the other model parameters, in particular the dates of the nodes of the phylogenetic tree. We further constrained the ages of several nodes using the fossil calibrations listed in Smith and Donoghue (2008) that corresponded to lineages present in the ML tree. Normal prior distributions were used for each calibration, with means set to the fossil dates from Smith and Donoghue (2008) and standard deviations arbitrarily set to 5.0 million years. We ran a single MCMC chain for 100 × 10^6 generations, sampling parameters every 1000 generations. We repeated the analysis twice and used Tracer to assess convergence of the posterior distribution and effective sample sizes. All analyses were performed on the Vital-IT facilities of the Swiss Institute of Bioinformatics.

LITERATURE CITED

Smith, S. & Donoghue, M. (2008) Rates of molecular evolution are linked to life history in flowering plants. Science 322: 86-89.

Author contributions

All authors contributed to editing of the manuscript. In addition, EMW and TJD conceived the idea; EMW, NJBK and TJD performed analyses and wrote the paper; BIC, JR and NS performed additional analyses.

Table S1. Strength and significance of phylogenetic signal in times of first flower [FF], first leaf [FL] and variance in first flower (var[FF]) within sites, estimated on the Phylomatic tree.

Site  FF Kthinned ±s.d.  FL Kthinned ±s.d.  var(FF) Kthinned ±s.d.
Arnell 1877  0.695±0
Arnell  0.342±0.013**
BCI  0.443±0.013*
Concord  0.514±0.023**  0.342±0.01**
Fargo  0.487±0.019**  0.239±0.008
Fitter  0.593±0.032**  0.247±0.011**
Gothic  0.459±0.021**  0.456±0.029
Gunnar  0.772±0.031*  0.952±0.163
Harvard  1.366±0.203**
Kochmer  0.374±0.009**
Konza  0.637±0.030**  0.219±0.008
Luquillo  0.576±0.020*  0.333±0.005
OPG  0.516±0  0.963±0*
Robertson  0.572±0.026**
Sevilleta  0.364±0.028*  0.430±0.020  0.290±0.012
Soederstroem  0.318±0.005**  0.673±0.017  0.317±0.005
UWM  0.627±0  0.559±0.001  0.207±0.003  0.550±0
Washington D. C.  0.352±0.023**  0.232±0.009*
WPS  0.616±0.020  0.459±0.011**

* = K significantly different from random at P < 0.05, ** = K significantly different from random at P < 0.01

FIGURE S1. Phylogenetic distribution of day of year for FF at Harvard. Branches are shaded in proportion to the weighted average of descendent tips (contrast with Figure 1 and S2).

FIGURE S2. Phylogenetic distribution of day of year for FF at Chinnor. Branches are shaded in proportion to the weighted average of descendent tips (contrast with Figure 1 and S1).

FIGURE S3. Histograms illustrating phylogeographic clustering of floras within sites. Vertical red lines indicate the observed phylogenetic diversity (summed phylogenetic branch lengths) captured by the set of species within each site. Frequency histograms represent the phylogenetic diversity expected when the same number of species is re-sampled at random from the global dataset (1000 randomisations). Analyses were conducted using the ML phylogenetic topology.

FIGURE S4. High resolution image of the complete species-level ML phylogenetic tree with species names shaded by day of year of FF; red indicates events occurring towards the start of the year and violet indicates events occurring towards the end of the year.
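
The randomisation test summarised in Figure S3 can be sketched as follows. This is a minimal Python illustration and not the code used for the analyses: the toy tree, the child-to-parent dictionary layout and all function names are assumptions made for the example. Phylogenetic diversity is computed here as the summed lengths of the branches on the paths from the sampled tips to the root.

```python
import random

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TOY_TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "D": ("n2", 2.0),
}
TIPS = ["A", "B", "C", "D"]


def phylogenetic_diversity(tree, species):
    """Sum the lengths of all branches linking the given tips to the root."""
    seen = set()   # branches already counted, identified by their child node
    total = 0.0
    for tip in species:
        node = tip
        # Walk towards the root until reaching it or an already counted branch.
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def null_pd(tree, pool, n_species, n_rand=1000, seed=0):
    """PD of n_rand random draws of n_species tips from the global pool."""
    rng = random.Random(seed)
    return [
        phylogenetic_diversity(tree, rng.sample(pool, n_species))
        for _ in range(n_rand)
    ]


def lower_tail_fraction(observed, null_values):
    """Fraction of random draws with PD no greater than the observed value."""
    return sum(v <= observed for v in null_values) / len(null_values)


if __name__ == "__main__":
    observed = phylogenetic_diversity(TOY_TREE, ["A", "B"])  # a "site" flora
    null = null_pd(TOY_TREE, TIPS, n_species=2)
    print(observed, lower_tail_fraction(observed, null))
```

An observed value that falls in the lower tail of the random draws corresponds to a vertical line to the left of the histogram, that is, a flora that captures less phylogenetic diversity than expected for its species richness.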