Supporting Information Methods S1–S5, Figs S1 & S2, Tables S1, S2, S4–S6

Methods S1 Oleoresin drymass transformation

Due to strong positive skew (Fig. 1a), oleoresin drymass was transformed with a Box-Cox power transformation (Box & Cox, 1964) according to the equation:

$tdm = (dm^{\lambda} - 1)/\lambda$

where tdm is transformed oleoresin drymass, dm is the untransformed drymass in g 24 h$^{-1}$, and $\lambda$ is the transformation parameter. Transformed oleoresin drymass residuals were checked for normality with quantile-quantile plots using values of $\lambda$ that varied in increments of 0.05 from 0.3 to 0.6. Residuals from quantitative genetic analyses approximated a normal distribution with $\lambda = 0.4$.

Methods S2 Likelihood ratio tests for correlations

Likelihood ratio tests (LRTs) were used to assess whether correlation estimates (Table 2, Figs 3, 4) differed significantly from zero. The test statistic for the LRT was:

$D = 2[\log(L)_{full} - \log(L)_{null}]$

where $\log(L)_{full}$ is the log-likelihood of the full model, in which the correlation was estimated from the data, and $\log(L)_{null}$ is the log-likelihood of the null model, in which the correlation was fixed to zero (Gilmour et al., 2009). Under the null hypothesis of zero correlation, the test statistic was assumed to be $\chi^2$-distributed with degrees of freedom equal to the difference in the number of covariance parameters estimated in the full versus null models (Table S1).

Methods S3 Preselection of SNP loci for association testing

Single nucleotide polymorphisms were selected for BAMD association analyses according to the magnitude of their effects on additive genetic variation in tdm within and across sites. The additive genetic effects of all 4854 SNP loci were estimated with three methods:

Additive genetic variance reduction: Each SNP locus was analyzed independently with the SAS v. 9.3 Mixed procedure (SAS Institute, Cary, NC) and ranked according to the percent reduction in additive genetic variance of a full model that includes the effect of the kth SNP locus relative to a reduced model without the SNP.

Full model: $y_{ijkl} = \mu + r_j + l_k + a_{il} + f_l + \varepsilon_{ijkl}$

Reduced model: $y_{ijl} = \mu + r_j + a_{il} + f_l + \varepsilon_{ijl}$

where $y_{ijl}$ is the observed phenotype of the ith clone within the jth rep and lth family, $\mu$ is the overall mean, $r_j$ is the fixed effect of replicate j = 1, …, Nreps, $l_k$ is the fixed effect of SNP locus k, $f_l$ is the random effect of family l = 1, …, Nfamilies ~ $N(0, \sigma^2_f)$, $a_{il}$ is the random additive genetic effect of clone i = 1, …, Nclones ~ $N(0, \sigma^2_a)$, and $\varepsilon_{ij(k)l}$ is the random error ~ $N(0, \sigma^2_{\varepsilon})$. The SNP loci that absorbed the largest percentage of additive genetic variance were ranked most highly according to this method:

$\%reduction = [\sigma^2_{a,reduced} - \sigma^2_{a,full}]/\sigma^2_{a,reduced} \times 100$

Ridge regression: Following Resende et al. (2012), the effects of all 4854 polymorphic loci were estimated in R (R Core Team, 2012) with the ridge regression model:

$y = 1\mu + Zg + \varepsilon$

where y is a vector of deregressed EBVs of tdm with variance $\sigma^2_p$, $\mu$ is the overall mean of y, Z is an incidence matrix of marker genotypes standardized to have mean 0 and variance 1 (missing SNP genotypes were assigned a value of 0), g is a vector of random marker effects ~ $N(0, \sigma^2_g I)$, and $\varepsilon$ is the vector of residuals. Marker effects were shrunk towards zero with the parameter $\lambda = (1/\sigma^2_e) \times [\sigma^2_a/N_{SNPs}]$, where $\sigma^2_a$ is the additive genetic variance and $\sigma^2_e = \sigma^2_p - \sigma^2_a$.

Bayes C: All 4854 SNPs were fit to deregressed EBVs of tdm with Bayes C on the Bioinformatics to Implement Genomic Selection server (bigs.ansci.iastate.edu/bigsgui/). The Bayes C regression model is similar to the one employed for ridge regression, except that Bayes C uses a Markov chain Monte Carlo algorithm to estimate the proportion of SNPs with zero effect ($\pi$), and then estimates the effects of the remaining SNPs with the prior assumption that their effects are multivariate Student's t distributed (Habier et al., 2011).
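As an illustration of two steps described in Methods S3, the sketch below ranks SNPs by percent reduction in additive genetic variance and standardizes one locus's genotype codes before ridge regression. It is a minimal sketch, not the SAS or R code used in the study. The function names, the SNP labels, and the variance values are hypothetical, and the use of the population standard deviation for scaling is an assumption.

```python
from statistics import fmean, pstdev

def percent_reduction(var_a_reduced, var_a_full):
    """Percent of additive genetic variance absorbed by fitting one SNP."""
    return (var_a_reduced - var_a_full) / var_a_reduced * 100.0

def rank_snps(var_a_reduced, var_a_full_by_snp):
    """Return SNP names sorted from largest to smallest percent reduction."""
    scores = {snp: percent_reduction(var_a_reduced, var_full)
              for snp, var_full in var_a_full_by_snp.items()}
    return sorted(scores, key=scores.get, reverse=True)

def standardize_genotypes(genotypes):
    """Scale one SNP's genotype codes to mean 0 and variance 1.

    Missing genotypes (None) are assigned 0, i.e. the standardized mean.
    Assumption: the population standard deviation is used for scaling.
    """
    observed = [g for g in genotypes if g is not None]
    mean, sd = fmean(observed), pstdev(observed)
    if sd == 0:  # monomorphic locus carries no information
        return [0.0] * len(genotypes)
    return [0.0 if g is None else (g - mean) / sd for g in genotypes]

# Hypothetical variance estimates, for illustration only (not study results).
ranking = rank_snps(0.50, {"snp_A": 0.48, "snp_B": 0.40, "snp_C": 0.45})
print(ranking)
print(standardize_genotypes([0, 1, 2, None]))
```

In practice the additive variance estimates would come from fitting the full and reduced mixed models for each locus; here they are passed in as plain numbers so that only the ranking arithmetic is shown.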
Methods S4 Tests for subpopulation structure

The program STRUCTURE v. 2.3.3 (Hubisz et al., 2009) was used to detect subpopulation structure within CCLONES. Using SNP data from 4854 loci and allowing for admixture, the likelihoods of models with K = 2 through K = 30 subpopulations were compared. No likelihood peak was detected, indicating the absence of subpopulation structure in CCLONES. Therefore, it was unnecessary to include an effect of subpopulation in the association model.

Methods S5 Predicting gains from selection

Predicted additive genetic values of transformed oleoresin drymass (tdmPV) for clones within each site were calculated by adding the overall site means ($\mu_i$, where i = 1, …, Nsites) to the best linear unbiased predictions of clonal breeding values ($tdmEBV_j$, where j = 1, …, Nclones). Predicted additive genetic values of tdm were then transformed back into the original drymass units (g 24 h$^{-1}$) with the formula:

$drymass_{ij} = ((\lambda \times tdmPV_{ij}) + 1)^{1/\lambda}$

where $\lambda$ is the transformation parameter. Estimated F1 genetic gains in oleoresin drymass under varying selection intensities (Table S6) were calculated by dividing the mean additive genetic value of the top 10%, 5%, and 1% of clones by the mean of additive genetic values of the entire population at each site.
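The back-transformation and the fold-increase calculation in Methods S5 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the site mean, and the breeding values are hypothetical, and the ratio of means is assumed to be taken on the back-transformed (g per 24 h) scale. Only the transformation parameter 0.4 comes from Methods S1.

```python
LAMBDA = 0.4  # Box-Cox transformation parameter reported in Methods S1

def box_cox(dm, lam=LAMBDA):
    """Forward transformation: tdm = (dm**lam - 1) / lam, for dm > 0."""
    return (dm ** lam - 1.0) / lam

def back_transform(tdm_pv, lam=LAMBDA):
    """Inverse transformation: drymass = (lam * tdmPV + 1) ** (1 / lam).

    Valid only when lam * tdm_pv + 1 > 0.
    """
    return (lam * tdm_pv + 1.0) ** (1.0 / lam)

def fold_increase(genetic_values, top_fraction):
    """Mean of the top fraction of values divided by the overall mean."""
    ranked = sorted(genetic_values, reverse=True)
    n_selected = max(1, round(top_fraction * len(ranked)))
    selected = ranked[:n_selected]
    return (sum(selected) / n_selected) / (sum(ranked) / len(ranked))

# Hypothetical site mean and clonal breeding values on the transformed scale.
site_mean = -0.5
ebvs = [-0.30, -0.10, 0.00, 0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.60]

# Predicted value = site mean + EBV, then back to the original drymass units.
drymass = [back_transform(site_mean + ebv) for ebv in ebvs]
print(fold_increase(drymass, 0.10))  # gain from breeding the top 10% of clones
```

Taking the ratio after back-transformation keeps both means positive; on the transformed scale the site means are negative, so a ratio of means there would not be interpretable as a fold-increase.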
Table S1 Formulas for the estimation of heritabilities and correlations

Parameter (the formulas of the Estimate column are not recoverable here; the surviving symbols are $\hat\sigma^2_a$, $\hat\sigma^2_f$, $\hat\sigma^2_c$, $\hat\sigma^2_p$)
Total phenotypic variance within sites
Total phenotypic variance across sites
Additive genetic heritability
Non-additive genetic heritability
Broad-sense heritability
Single-site incomplete block variance proportion
Across-site incomplete block variance proportion
Random error variance proportion
Site-by-genotype variance proportion
Additive genetic correlation
Total genetic correlation
Environmental correlation
Site-by-genotype correlation
Phenotypic correlation within sites

$\hat\sigma^2$ = estimated variance component, $\hat\sigma_{1,2}$ = estimated covariance between traits 1 and 2. Subscripts: p = phenotypic, b = incomplete block, a = additive genetic effect of clone, f = non-additive genetic effect of family, c = non-additive genetic effect of clone, sa = site-by-additive genetic, sf = site-by-family, e = error, pwithin = phenotypic variance within sites, pacross = phenotypic variance across sites.

Table S2 Contrasts of average transformed oleoresin drymass between sites and years measured in the Pinus taeda CCLONES population

Site 1     Site 2     Site 1 mean ± SE   Site 2 mean ± SE   F-test NumDF   F-test DenDF   F        P
CUT        NAS yr 7   -0.82 ± 0.025      -0.34 ± 0.017      1              676.4          251.91   <0.001
CUT        PAL        -0.82 ± 0.025      -0.51 ± 0.020      1              561.8          91.19    <0.001
NAS yr 7   NAS yr 6   -0.33 ± 0.017      -0.62 ± 0.015      1              3428.4         142.53   <0.001
NAS yr 7   PAL        -0.33 ± 0.017      -0.51 ± 0.020      1              672.5          44.39    <0.001

Means of tdm were contrasted between sites and years in ASReml (Gilmour et al., 2009) with approximate F-tests according to the model: $y = \mu$ + site + rep + incblock + error. Site abbreviations: CUT = Cuthbert, GA; NAS = Nassau, FL; PAL = Palatka, FL.

Table S4 Site-specific associations with transformed oleoresin drymass in the Pinus taeda CCLONES population

Site (tdm, yr 7)   # sig. loci   % unique
Cuthbert           19            79%
Nassau             20            55%
Palatka            22            68%
Total              51            81%

The percentage of associations that were site-specific was determined by repeating association analyses with only the 722 clones that were present at all three sites and a common set of 157 mapped SNPs in which adjacent loci were more than 10 cM apart (12.5 cM average distance between adjacent loci). % unique: percentage of significant associations that were detected only at individual sites.

Table S5 Tests for associations between transformed oleoresin drymass and SNPs within sequences similar to terpenoid biosynthetic genes in the Pinus taeda CCLONES population

Enzyme                                                     Gene    Query GI#                                        #hits   #sig.
1-deoxy-D-xylulose-5-phosphate synthase                    DXS     215478267, 215478265, 215478269 (DXS and DXR)    2       0
1-deoxy-D-xylulose 5-phosphate reductoisomerase            DXR     (listed with DXS)                                0       -
2-C-methyl-D-erythritol 4-phosphate cytidyltransferase     MECT    73672048                                         0       -
4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase   CMEK    73672044, 73672046                               0       -
2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase      MECS    40849972                                         1       0
1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase    HDS     186532616                                        1       0
1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase   IDS     126697259, 126697261                             1       0
Geranyl pyrophosphate synthase                             GPPS    307950754                                        0       -
Geranyl-geranyl pyrophosphate synthase                     GGPPS   17352450                                         3       0
Terpene synthase                                           TPS     28894481, 59800271, 59800265                     1       0
Abietadienol/abietadienal oxidase (cyp450)                 AO                                                       7       2

Sequences from terpenoid biosynthetic genes were blasted against the loblolly pine EST database used for SNP discovery. Enzyme and gene nomenclature for the genes in the 1-deoxy-D-xylulose-5-phosphate pathway (first seven rows) follows Kim et al. (2008). Query GI#: GenBank identifier of query sequences. #hits: number of BLASTX hits (E-value cutoff $10^{-8}$) to ESTs containing SNPs. #sig.: number of SNPs significantly associated with transformed oleoresin drymass.
Table S6 Predicted F1 gains from selection for increased oleoresin flow in the Pinus taeda CCLONES population

                      Fold-increase,   Fold-increase,   Fold-increase,
Site       h$^2$      breed top 10%    breed top 5%     breed top 1%
CUT        0.1375     1.618            1.736            1.977
NAS yr 6   0.3039     1.856            2.051            2.409
NAS yr 7   0.2398     1.801            1.977            2.333
PAL        0.1184     1.536            1.614            1.768
ALL        0.1177     1.608            1.717            1.918

h$^2$: narrow-sense heritability. Fold-increase in oleoresin drymass 24 h$^{-1}$ under increasing selection intensities was estimated by dividing the mean of the predicted genetic values of the selected clones by the mean predicted genetic value of the entire CCLONES population.

Fig. S1 Oleoresin wetmass and drymass comparisons. Samples were stored at 4°C prior to obtaining wetmass to minimize volatilization of monoterpenes. Oleoresin drymass was obtained after lyophilizing the samples for 3 d. (A) Phenotypic correlation between oleoresin wetmass and drymass. Observations where wetmass >> drymass (rainwater contamination?) or drymass >> wetmass (phenotyping errors?) were reweighed. (B) A comparison of across-site heritability of oleoresin wetmass and drymass. Error bars are 1 SE of the variance proportion estimates.

Fig. S2 A comparison of SNP preselection methods with cross-validation. Prior to association analysis of tdm from each site, the effects of all 4854 polymorphic loci on tdm EBVs were estimated with three methods (additive variance reduction, ridge regression, and Bayes C). Four hundred SNP loci, including those with the largest effects in preselection plus 16 SNPs in putative terpene biosynthetic genes (Table S5), were included in BAMD association analyses. The efficiency with which these methods selected SNPs linked to the causative polymorphisms was assessed by comparing how accurately the SNPs found to be significant with BAMD predicted deregressed EBVs of tdm in 10-fold cross-validation.
The significant SNPs were added to the prediction model in decreasing order of their average effect on tdm EBVs among 10 random partitions of the CCLONES population.

References

Box GEP, Cox DR. 1964. An analysis of transformations. Journal of the Royal Statistical Society, Series B 26(2): 211-252.

Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18): 3674-3676.

Eckert AJ, van Heerwaarden J, Wegrzyn JL, Nelson CD, Ross-Ibarra J, Gonzalez-Martinez SC, Neale DB. 2010b. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics 185(3): 969-982.

Gilmour AR, Gogel BJ, Cullis BR, Thompson R. 2009. ASReml user guide release 3.0. Hemel Hempstead, HP1 1ES, UK: VSN International Ltd.

Habier D, Fernando RL, Kizilkaya K, Garrick DJ. 2011. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12: 186.

Hubisz MJ, Falush D, Stephens M, Pritchard JK. 2009. Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9(5): 1322-1332.

Kim SM, Kuzuyama T, Kobayashi A, Sando T, Chang YJ, Kim SU. 2008. 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (IDS) is encoded by multicopy genes in gymnosperms Ginkgo biloba and Pinus taeda. Planta 227(2): 287-298.

R Core Team. 2012. R: A Language and Environment for Statistical Computing. Vienna, Austria. http://www.R-project.org/

Resende MFR, Munoz P, Acosta JJ, Peter GF, Davis JM, Grattapaglia D, Resende MDV, Kirst M. 2012. Accelerating the domestication of trees using genomic selection: Accuracy of prediction models across ages and environments. New Phytologist 193(3): 617-624.