Supporting Information Methods S1–S5, Figs S1 & S2, Tables S1, S2, S4–S6
Methods S1 Oleoresin drymass transformation
Due to strong positive skew (Fig. 1a), oleoresin drymass was transformed with a
Box-Cox power transformation (Box & Cox, 1964) according to the equation
$tdm = (dm^{\lambda} - 1)/\lambda$, where tdm is transformed oleoresin drymass, dm is the untransformed
drymass in g 24 h⁻¹, and $\lambda$ is the transformation parameter. Residuals of transformed oleoresin
drymass were checked for normality with quantile-quantile plots using values of
$\lambda$ from 0.3 to 0.6 in increments of 0.05. Residuals from quantitative genetic
analyses approximated a normal distribution with $\lambda = 0.4$.
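The transformation and the grid of candidate parameter values can be sketched in Python. This is a minimal illustration, not the authors' code: the function names and the toy drymass values are hypothetical, and sample skewness is used here only as a quick stand-in for inspecting quantile-quantile plots of model residuals.

```python
from statistics import mean, pstdev


def box_cox(dm, lam):
    """Box-Cox power transform: tdm = (dm**lam - 1) / lam, for dm > 0 and lam != 0."""
    if dm <= 0 or lam == 0:
        raise ValueError("requires dm > 0 and lam != 0")
    return (dm ** lam - 1.0) / lam


def skewness(values):
    """Population skewness (third standardized moment)."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


# Candidate lambdas from 0.30 to 0.60 in steps of 0.05, as described in the text.
lambda_grid = [round(0.30 + 0.05 * i, 2) for i in range(7)]

# Hypothetical right-skewed drymass values (g per 24 h), for illustration only.
toy_drymass = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.5, 2.5, 6.0]

# Assumption: skewness of the transformed values as a rough normality proxy.
skew_by_lambda = {
    lam: skewness([box_cox(dm, lam) for dm in toy_drymass]) for lam in lambda_grid
}
```

In the study itself the choice of parameter was based on residuals from the quantitative genetic models, so a check like this on raw values would only be a first screen.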
Methods S2 Likelihood ratio tests for correlations
Likelihood ratio tests (LRT) were used to assess whether correlation estimates
(Table 2, Figs 3, 4) differed significantly from zero. The test statistic for the LRT was
$D = 2[\log(L)_{full} - \log(L)_{null}]$, where $\log(L)_{full}$ is the log-likelihood of the full model in which
the correlation was estimated from the data and $\log(L)_{null}$ is the log-likelihood of the null
model in which the correlation was fixed to zero (Gilmour et al., 2009). Under the null
hypothesis of zero correlation, the test statistic was assumed to be $\chi^2$-distributed with
degrees of freedom equal to the difference in the number of covariance parameters estimated in
the full versus null models (Table S1).
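The test can be sketched with the Python standard library alone. This is an illustrative sketch rather than the ASReml procedure: the function names and the log-likelihood values are hypothetical, and the chi-square tail probability uses the closed-form expressions that hold for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        terms = [half ** k / math.factorial(k) for k in range(df // 2)]
        return math.exp(-half) * sum(terms)
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    terms = [
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    ]
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return D = 2 * (logL_full - logL_null) and its chi-square p-value."""
    d = 2.0 * (loglik_full - loglik_null)
    return d, chi2_sf(d, df)


# Hypothetical log-likelihoods; one correlation fixed to zero gives df = 1.
d_stat, p_value = likelihood_ratio_test(-1520.3, -1523.1, df=1)
```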
Methods S3 Preselection of SNP loci for association testing
Single nucleotide polymorphisms were selected for BAMD association analyses
according to the magnitude of their effects on additive genetic variation in tdm within and
across sites. The additive genetic effects of all 4854 SNP loci were estimated with three
methods:
Additive genetic variance reduction: Each SNP locus was analyzed independently with
the SAS v. 9.3 Mixed procedure (SAS Institute, Cary, NC) and ranked according to the
percent reduction in additive genetic variance of a full model that includes the effect of
the kth SNP locus relative to a reduced model without the SNP.
Full model: $y_{ijkl} = \mu + r_j + l_k + a_{il} + f_l + \varepsilon_{ijkl}$
Reduced model: $y_{ijl} = \mu + r_j + a_{il} + f_l + \varepsilon_{ijl}$
where $y_{ijl}$ is the observed phenotype of the ith clone within the jth replicate and lth family, $\mu$ is the
overall mean, $r_j$ is the fixed effect of replicate $j = 1, \ldots, N_{reps}$, $l_k$ is the fixed effect of SNP
locus k, $f_l$ is the random effect of family $l = 1, \ldots, N_{families}$, with $f_l \sim N(0, \sigma^2_f)$, $a_{il}$ is the random
additive genetic effect of clone $i = 1, \ldots, N_{clones}$, with $a_{il} \sim N(0, \sigma^2_a)$, and $\varepsilon_{ij(k)l}$ is the random error $\sim N(0, \sigma^2)$.
The SNP loci that absorbed the largest percentage of additive genetic variance
(%reduction = $[\sigma^2_{a,reduced} - \sigma^2_{a,full}] / \sigma^2_{a,reduced} \times 100$) were ranked most highly according to this
method.
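The ranking step can be illustrated with a short Python sketch. The variance estimates and SNP names below are hypothetical, and the sketch assumes a single reduced-model estimate shared by all SNPs; in practice the estimates would come from the fitted mixed models.

```python
def percent_reduction(var_a_reduced, var_a_full):
    """Percent of additive genetic variance absorbed by the SNP effect."""
    return (var_a_reduced - var_a_full) / var_a_reduced * 100.0


# Hypothetical additive variance estimate from the reduced model (no SNP effect).
var_a_reduced = 0.050

# Hypothetical additive variance estimates from the full model of each SNP.
var_a_full_by_snp = {"snp_A": 0.049, "snp_B": 0.045, "snp_C": 0.050}

reduction_by_snp = {
    snp: percent_reduction(var_a_reduced, var_full)
    for snp, var_full in var_a_full_by_snp.items()
}

# SNPs absorbing the most additive variance are ranked first.
ranked_snps = sorted(reduction_by_snp, key=reduction_by_snp.get, reverse=True)
```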
Ridge regression: Following Resende et al. (2012), the effects of all 4854 polymorphic
loci were estimated in R (R Core Team, 2012) with the ridge regression model:
$y = 1\mu + Zg + \varepsilon$
where y is a vector of deregressed EBVs of tdm with variance $\sigma^2_p$, $\mu$ is the overall mean
of y, Z is an incidence matrix of marker genotypes standardized to have mean 0 and
variance 1 (missing SNP genotypes were assigned a value of 0), and g is a vector of random
marker effects $\sim N(0, \sigma^2_g I)$. Marker effects were shrunk towards zero with the parameter
$\lambda = 1/\sigma^2_e \times [\sigma^2_a / N_{SNPs}]$, where $\sigma^2_a$ is the additive genetic variance and $\sigma^2_e = \sigma^2_p - \sigma^2_a$.
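The genotype coding described for Z can be sketched as follows. This is an assumption-laden illustration rather than the authors' R code: the function name is hypothetical, genotypes are assumed to be coded 0/1/2 with `None` for missing calls, the population (not sample) standard deviation is used, and missing calls are set to 0 after standardization, which is equivalent to imputing the marker mean.

```python
from statistics import mean, pstdev


def standardize_genotypes(column):
    """Scale one marker column to mean 0 and variance 1; missing calls become 0."""
    observed = [g for g in column if g is not None]
    m = mean(observed)
    s = pstdev(observed)
    if s == 0:
        raise ValueError("monomorphic marker cannot be standardized")
    return [0.0 if g is None else (g - m) / s for g in column]


# Hypothetical genotype calls for one SNP across six clones.
snp_column = [0, 1, 2, None, 1, 0]
z_column = standardize_genotypes(snp_column)
```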
Bayes C: All 4854 SNP were fit to deregressed EBVs of tdm with Bayes C on the
Bioinfomatics to Implement Genomic Selection server (bigs.ansci.iastate.edu/bigsgui/).
The Bayes C regression model is similar to one employed for ridge regression except
that Bayes C utilizes a Monte Carlo Markov chain to estimate the proportion of SNPs
with zero effect (), and then estimates the effects of the remaining SNPs with the prior
assumption that their effects are multivariate Student’t t distributed (Habier et al., 2011).
Methods S4 Tests for subpopulation structure
The program STRUCTURE v. 2.3.3 (Hubisz et al., 2009) was used to detect
subpopulation structure within CCLONES. Using SNP data from 4854 loci and allowing
for admixture, the likelihoods of models with K = 2 through K = 30 subpopulations were
compared. No likelihood peak was detected, indicating the absence of subpopulation
structure in CCLONES. Therefore, it was unnecessary to include an effect of
subpopulation in the association model.
Methods S5 Predicting gains from selection
Predicted additive genetic values of transformed oleoresin drymass (tdmPV) for
clones within each site were calculated by adding the overall site means ($\mu_i$, where
$i = 1, \ldots, N_{sites}$) to the best linear unbiased predictions of clonal breeding values ($tdmEBV_j$,
where $j = 1, \ldots, N_{clones}$). Predicted additive genetic values of tdm were then transformed
back into the original drymass units (g 24 h⁻¹) with the formula
$drymass_{ij} = (\lambda \cdot tdmPV_{ij} + 1)^{1/\lambda}$, where $\lambda$ is the transformation parameter. Estimated F1 genetic gains under
varying selection intensities (Table S6) were calculated by dividing the mean additive genetic
value of the top 10%, 5%, and 1% of clones by the mean additive genetic value of
the entire population at each site.
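The back-transformation and the fold-increase calculation can be sketched in Python. The site mean, breeding values, and function names are hypothetical; the sketch also assumes that the ratio is taken on the back-transformed (drymass) scale and that the number of selected clones is obtained by rounding, neither of which is stated explicitly above.

```python
from statistics import mean


def back_transform(tdm_pv, lam=0.4):
    """Invert the Box-Cox transform: drymass = (lam * tdm_pv + 1) ** (1 / lam)."""
    base = lam * tdm_pv + 1.0
    if base <= 0:
        raise ValueError("lam * tdm_pv + 1 must be positive")
    return base ** (1.0 / lam)


def fold_increase(drymass_values, top_fraction):
    """Mean of the top fraction of clones divided by the population mean."""
    ranked = sorted(drymass_values, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))  # assumed rounding rule
    return mean(ranked[:n_top]) / mean(ranked)


# Hypothetical site mean and clonal breeding values on the transformed scale.
site_mean = -0.5
clone_ebvs = [-0.30, -0.15, -0.05, 0.0, 0.05, 0.10, 0.20, 0.35, 0.50, 0.80]

# tdmPV = site mean + EBV, then back to drymass units.
drymass = [back_transform(site_mean + ebv) for ebv in clone_ebvs]
gain_top_10pct = fold_increase(drymass, 0.10)
```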
Table S1 Formulas for the estimation of heritabilities and correlations
Parameter | Estimate
--- | ---
Total phenotypic variance within sites |
Total phenotypic variance across sites |
Additive genetic heritability |
Non-additive genetic heritability |
Broad-sense heritability |
Single-site incomplete block variance proportion |
Across-site incomplete block variance proportion |
Random error variance proportion |
Site-by-genotype variance proportion |
Additive genetic correlation |
Total genetic correlation |
Environmental correlation |
Site-by-genotype correlation |
Phenotypic correlation within sites |
[Estimate column: the formulas are not recoverable from the source; the only surviving symbols are $\hat{\sigma}^2_a$, $\hat{\sigma}^2_f$, $\hat{\sigma}^2_c$ and $\hat{\sigma}^2_p$.]
$\sigma^2$: estimated variance component; $\sigma_{1,2}$: estimated covariance between traits 1 and 2.
Subscripts: p – phenotypic, b – incomplete block, a – additive genetic effect of clone, f –
non-additive genetic effect of family, c – non-additive genetic effect of clone, sa – site-by-additive genetic, sf – site-by-family, e – error, pwithin – phenotypic variance within
sites, pacross – phenotypic variance across sites.
Table S2 Contrasts of average transformed oleoresin drymass between sites and years
measured in the Pinus taeda CCLONES population
Site 1 | Site 2 | Site 1 mean ± SE | Site 2 mean ± SE | F-test NumDF | F-test DenDF | F | P
--- | --- | --- | --- | --- | --- | --- | ---
CUT | NAS yr 7 | -0.82 ± 0.025 | -0.34 ± 0.017 | 1 | 676.4 | 251.91 | <0.001
CUT | PAL | -0.82 ± 0.025 | -0.51 ± 0.020 | 1 | 561.8 | 91.19 | <0.001
NAS yr 7 | NAS yr 6 | -0.33 ± 0.017 | -0.62 ± 0.015 | 1 | 3428.4 | 142.53 | <0.001
NAS yr 7 | PAL | -0.33 ± 0.017 | -0.51 ± 0.020 | 1 | 672.5 | 44.39 | <0.001
Means of tdm were contrasted between sites and years in ASReml (Gilmour et al., 2009)
with approximate F-tests according to the model: $y = \mu + site + rep + incblock + error$.
Site abbreviations: CUT – Cuthbert, GA, NAS – Nassau, FL, PAL – Palatka, FL.
Table S4 Site-specific associations with transformed oleoresin drymass in the Pinus
taeda CCLONES population
tdm – yr 7 | # sig. loci | % unique
--- | --- | ---
Cuthbert | 19 | 79%
Nassau | 20 | 55%
Palatka | 22 | 68%
Total | 51 | 81%
The percentage of associations that were site-specific was determined by repeating
association analyses with only 722 clones that were present at all three sites and a
common set of 157 mapped SNPs where adjacent loci were greater than 10 cM apart
(12.5 cM average distance between adjacent loci). % unique – percentage of significant
associations that were detected only at individual sites.
Table S5 Tests for associations between transformed oleoresin drymass and SNPs within
sequences similar to terpenoid biosynthetic genes in the Pinus taeda CCLONES
population
Enzyme | Gene | Query GI# | #hits | #sig.
--- | --- | --- | --- | ---
1-deoxy-D-xylulose 5-phosphate synthase | DXS | 215478267, 215478265, 215478269 (listed jointly for DXS and DXR; the split between the two rows is not recoverable) | 2 | 0
1-deoxy-D-xylulose 5-phosphate reductoisomerase | DXR | (see DXS row) | 0 | -
2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase | MECT | 73672048 | 0 | -
4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase | CMEK | 73672044, 73672046 | 0 | -
2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase | MECS | 40849972 | 1 | 0
1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase | HDS | 186532616 | 1 | 0
1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase | IDS | 126697259, 126697261 | 1 | 0
Geranyl pyrophosphate synthase | GPPS | 307950754 | 0 | -
Geranyl-geranyl pyrophosphate synthase | GGPPS | 17352450 | 3 | 0
Terpene synthase | TPS | 28894481, 59800271, 59800265 | 1 | 0
Abietadienol/abietadienal oxidase (cyp450) | AO | | 7 | 2
Sequences from terpenoid biosynthetic genes were used as BLAST queries against the loblolly pine EST
database used for SNP discovery. Enzyme and gene nomenclature for the genes in the 1-deoxy-D-xylulose 5-phosphate pathway (first seven rows) follows Kim et al. (2008).
Query GI#: GenBank identifier of query sequences. #hits: number of BLASTX hits (E-value cutoff $10^{-8}$) to ESTs containing SNPs. #sig.: number of SNPs significantly
associated with transformed oleoresin drymass.
Table S6 Predicted F1 gains from selection for increased oleoresin flow in the Pinus
taeda CCLONES population
Site | $h^2$ | Fold-increase, breed top 10% | Fold-increase, breed top 5% | Fold-increase, breed top 1%
--- | --- | --- | --- | ---
CUT | 0.1375 | 1.618 | 1.736 | 1.977
NAS yr 6 | 0.3039 | 1.856 | 2.051 | 2.409
NAS yr 7 | 0.2398 | 1.801 | 1.977 | 2.333
PAL | 0.1184 | 1.536 | 1.614 | 1.768
ALL | 0.1177 | 1.608 | 1.717 | 1.918
$h^2$ – narrow-sense heritability. Fold-increase in oleoresin drymass 24 h⁻¹ under
increasing selection intensities was estimated by dividing the mean of the predicted
genetic values of the selected clones by the mean predicted genetic value of the entire
CCLONES population.
Fig. S1 Oleoresin wetmass and drymass comparisons. Samples were stored at 4 °C prior
to obtaining wetmass to minimize volatilization of monoterpenes. Oleoresin drymass
was obtained after lyophilizing the samples for 3 d. (A) Phenotypic correlation between
oleoresin wetmass and drymass. Observations where wetmass >> drymass (rainwater
contamination?) or drymass >> wetmass (phenotyping errors?) were reweighed. (B) A
comparison of across-site heritability of oleoresin wetmass and drymass. Error bars are 1
SE of the variance proportion estimates.
Fig. S2 A comparison of SNP preselection methods with cross-validation. Prior to
association analysis of tdm from each site, the effects of all 4854 polymorphic loci on
tdm EBVs were estimated with three methods (additive variance reduction, ridge
regression, and Bayes C). Four hundred SNP loci, including those with the largest
effects in preselection plus 16 SNPs in putative terpene biosynthetic genes (Table S5), were
included in BAMD association analyses. The efficiency with which these methods selected
SNPs linked to the causative polymorphisms was assessed by comparing how accurately
the SNPs found to be significant with BAMD predicted deregressed EBVs of tdm in 10-fold
cross-validation. The significant SNPs were added to the prediction model in
decreasing order of their average effect on tdm EBVs among 10 random partitions of the
CCLONES population.
References
Box GEP, Cox DR. 1964. An analysis of transformations. Journal of the Royal Statistical Society, Series B 26(2): 211-252.
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18): 3674-3676.
Eckert AJ, van Heerwaarden J, Wegrzyn JL, Nelson CD, Ross-Ibarra J, Gonzalez-Martinez SC, Neale DB. 2010b. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics 185(3): 969-982.
Gilmour AR, Gogel BJ, Cullis BR, Thompson R. 2009. ASReml user guide release 3.0. Hemel Hempstead, HP1 1ES, UK: VSN International Ltd.
Habier D, Fernando RL, Kizilkaya K, Garrick DJ. 2011. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12: 186.
Hubisz MJ, Falush D, Stephens M, Pritchard JK. 2009. Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9(5): 1322-1332.
Kim SM, Kuzuyama T, Kobayashi A, Sando T, Chang YJ, Kim SU. 2008. 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (IDS) is encoded by multicopy genes in gymnosperms Ginkgo biloba and Pinus taeda. Planta 227(2): 287-298.
R Core Team. 2012. R: A Language and Environment for Statistical Computing. Vienna, Austria. http://www.R-project.org/
Resende MFR, Munoz P, Acosta JJ, Peter GF, Davis JM, Grattapaglia D, Resende MDV, Kirst M. 2012. Accelerating the domestication of trees using genomic selection: Accuracy of prediction models across ages and environments. New Phytologist 193(3): 617-624.