Association Genetics of Traits Controlling Lignin and Cellulose Biosynthesis in Black
Cottonwood (Populus trichocarpa, Salicaceae) Secondary Xylem
Jill L. Wegrzyn (1), Andrew J. Eckert (2,3), Minyoung Choi (2), Jennifer M. Lee (2), Brian J. Stanton
(4), Robert Sykes (5), Mark F. Davis (5), Chung-Jui Tsai (6), and David B. Neale (1,3,7,8,9)
(1)Department of Plant Sciences, University of California at Davis, Davis, CA 95616
(2)Section of Evolution and Ecology, University of California at Davis, Davis, CA 95616
(3)Center for Population Biology, University of California at Davis, Davis, CA 95616
(4)Genetic Resources Conservation Program, GreenWood Resources, Portland, OR
(5)National Renewable Energy Lab, Golden, CO
(6)School of Forestry and Natural Resources, and Department of Genetics, University of Georgia, Athens, GA
(7)Bioenergy Research Center (BERC), University of California at Davis, Davis, CA 95616
(8)Institute of Forest Genetics, USDA Forest Service, Davis, CA 95616
(9) Author for correspondence:
David B. Neale
Department of Plant Sciences
Mail Stop 6
University of California, Davis
Davis, CA 95616
(530) 754-8431
dbneale@ucdavis.edu
Abstract
Recent interest in poplars as a source of renewable energy, combined with the vast genomic resources
available, has enabled further examination of the genetic diversity underlying the lignin and cellulose
biosynthetic pathways. In this study, an association genetics approach was used to examine individual
genes and alleles at the loci responsible for complex traits controlling lignin and cellulose quality and
quantity in black cottonwood (Populus trichocarpa Torr. & A. Gray). Forty candidate genes of the
lignin and cellulose biosynthetic pathways were resequenced in a panel of 15 unrelated individuals to
identify single nucleotide polymorphisms (SNPs). A total of 1,536 SNPs were subsequently genotyped
in a clonally replicated population (448 clones). The association population (1,080 trees) was
phenotyped using pyrolysis molecular beam mass spectrometry (pyMBMS). Both single marker and
haplotype-based association tests were implemented to identify associations for composite traits
representing lignin content, syringyl/guaiacyl ratio, and C6 sugars. A general linear model approach,
including population structure estimates as covariates, was implemented for each marker-trait pair. This
study identified 27 highly significant associations (FDR Q < 0.10) unique across 40 candidate genes in
three composite traits. Of these, five associations were found to be in the coding region of the candidate
genes, with two being nonsynonymous. Haplotype-based associations were performed on 181 amplicons
across the 40 genes. For lignin content and C6 sugars, 23 significant haplotypes within 11 genes were
discovered. The majority of markers (56%) in all three composite traits were characterized as
having additive modes of gene action. These associations provide insight into the genetic components
of complex traits involved in the lignin and cellulose biosynthetic pathways in black cottonwood.
Introduction
Forest trees are a potential source of net-zero carbon emission lignocellulosic biofuels. Production of
biofuels involves collection of biomass, deconstruction of cell wall polymers into component sugars
(pretreatment and saccharification), and conversion of these sugars to ethanol (fermentation) (Rubin
2008). Woody bioenergy crops from which biomass is derived have not been domesticated for this
purpose and the current methods for lignocellulosic saccharification and fermentation are inefficient.
The recent need to develop viable fuel alternatives is now taking advantage of genomics resources and
technologies to discover the potential gain that can be achieved through breeding. Traits of interest in
trees with applications in bioenergy include: growth rate, branching habit, stem thickness, and cell-wall
chemistry (Stettler et al. 1996; Bradshaw et al. 2000). As a commercial species, black cottonwood
(Populus trichocarpa Torr. & A. Gray) and its hybrids have already proven themselves to be valuable as
a renewable energy resource. Rapid growth, moderate genome size, woody tissues, and economic
importance make black cottonwood an ideal model organism to examine biofuels-related traits
(Bradshaw et al. 2000). Black cottonwood possesses tremendous genetic and phenotypic diversity, is
obligate outcrossing, able to hybridize with many other species, and easily clonally propagated (Davis,
2008). To further complement the advantages of this species as a short rotation woody crop, black
cottonwood is the first tree and bioenergy feedstock to have its genome sequenced and annotated.
Derived from a single wild individual (Nisqually-1), the genome sequence represents an estimated
45,500 genes across 19 chromosomes (Tuskan et al. 2006). In addition to the genome, resources such as
controlled cross-populations, cross-species molecular markers, expressed sequence tag (EST)
collections, and full-length cDNAs are available to the research community (Ralph et al. 2006; Strauss
and Martin 2004; Tuskan et al. 2006).
Improvement of biofuels feedstocks focuses on increasing both the relative carbon partitioning in woody
tissues above ground and cellulose accessibility for enzymatic digestion (Ragauskas et al. 2006). As
with other woody species, the major components of black cottonwood secondary cell walls are
cellulose, hemicellulose, and lignin (Harris et al. 2008). Lignin inhibits saccharification in processes
aimed at producing simple sugars for fermentation to ethanol. Many studies have been focused on the
molecular biology of wood and secondary wall formation (Plomion et al. 2001; Schrader et al. 2004;
Sterky et al. 1998, 2004). The pathways and genes involved in lignin and cellulose biosynthesis and
microfibril deposition are increasingly well understood through biochemical analysis and expression
studies (Whetten et al. 1998; Plomion et al. 2001; Li et al. 2003; Peter and Neale 2004; Schrader et al.
2004; Boerjan 2005; Oakley et al. 2007). The specific roles of genes in these pathways have been
verified through forward and reverse genetic mutation studies (Dixon and Reddy 2003; Ralph et al.
2007; Davis 2008). A relatively unexplored area of research is to identify the natural allelic variation
controlling phenotype variation and to exploit this variation in breeding.
A major goal of population and quantitative genetics is the identification of polymorphisms responsible
for phenotypic variation (Feder and Mitchell-Olds, 2003; Stinchcombe and Hoekstra 2007). Many traits
of interest in forest trees, such as wood quality, are complex in nature and occur later in development
(Groover 2007). Recent advances in high-throughput marker technologies, combined with the wealth of
genomic resources available to species like black cottonwood, enable closer examination of the number
and effect sizes of genes responsible for traits of interest through complex trait dissection using
association mapping. Tree species are ideal for association mapping as they are predominantly
outcrossing and have large, relatively unstructured populations, resulting in high levels of nucleotide
diversity and low linkage disequilibrium (LD) (Neale and Savolainen 2004; Gonzalez-Martinez et al.
2006). Significant associations between SNPs within candidate genes have been established in forest
trees. Associations with wood quality traits in Eucalyptus (Thumma et al. 2005), wood quality and
drought tolerance traits in loblolly pine (Gonzalez-Martinez et al. 2007, 2008), bud phenology traits in
European poplar (Ingvarsson et al. 2008), and cold-hardiness related traits in coastal Douglas-fir (Eckert
et al. 2009a) have been identified. In general, individual SNPs explain a small portion of the phenotypic
variance (0.5%-5.0%), which is consistent with the complex nature of these traits.
In this study, statistical models were applied to perform association tests and account for population
structure in 579 SNPs from 40 candidate genes involved in lignocellulosic cell wall synthesis in black
cottonwood. Single-marker and haplotype-based tests were performed to identify associations with
natural variation in composite traits evaluating lignin and cellulose content.
Materials and Methods
Association Population and Phenotypic Data
Focal Species
The native range of black cottonwood covers large sections of western North America, primarily
inhabiting floodplains and river margins (Kelleher et al. 2007). The range extends from Kodiak Island
along Cook Inlet to latitude 62° 30' N., to southeast Alaska and British Columbia to the forested
areas of Washington, Oregon, and to the mountains in southern California and northern Baja
California (lat. 31° N.). It is also found inland, generally on the west side of the Rocky Mountains, in
British Columbia, western Alberta, western Montana, and northern Idaho. Scattered small
populations have been noted in southeastern Alberta, eastern Montana, western North Dakota,
western Wyoming, Utah, and Nevada. Black cottonwood grows up to elevations of 2100 m.
Association Population
As part of a long-term Populus x generosa hybridization program, GreenWood Resources (Portland,
OR) assembled a collection of 1,189 black cottonwood clones from 101 provenances from 12 river
drainages located west of the Cascade Mountains between 48° 56' N latitude (Nooksack River,
Whatcom County, Washington) and 43° 47' N latitude (Middle Fork, Willamette River, Lane County,
Oregon) during the period 1990 through 1999 (Figure 1). The collection was established in clone banks
where it was annually coppiced to remove C-effects from planting stock used in the establishment of
clonally replicated field trials in 1994, 1996, 1999, and 2003. All four trials were planted at an alluvial
site on the lower Columbia River floodplain at Westport, Oregon (46° 08' N). The soil is deep and
moderately well drained, with a loam to silt loam surface overlying a sandy loam to fine sand horizon.
Annual precipitation averages 2,034 mm and the average maximum temperature during the April-September growing season is 20°C.
Sample Preparation and Wood Chemistry Phenotyping
Wood samples were collected from a subset of 448 clones representing all of the original provenances.
Two 5 mm increment cores were taken with a Haglof increment borer from the bark to the pith of up to three ramets per clone
growing in the four Westport clone trials (Figure 1B, Table S1). Cores were extracted at diameter at
breast height (1.37 m) and placed in a –8ºC freezer until sectioning. Sample preparation consisted of
removing the two outermost complete growth rings of each core due to different ages of the trees.
Ground wood samples (~4 mg) were prepared in stainless steel sample cups, and pyrolyzed using a
Frontier Pyrolyzer, PY2020iD (Frontier Laboratories, LTD). Pyrolysis was performed at 500°C using
helium carrier gas flowing at 2.0 L/min (at STP). The transfer line connecting the pyrolysis unit to the
molecular beam mass spectrometer (MBMS) was heated to approximately 400°C. The pyrolysis vapors
were expanded through a ruby sampling orifice that was mated directly to the faceplate of the MBMS.
Total pyrolysis time was 30 s, although the pyrolysis reaction was completed in less than 12 s. A
custom-built molecular-beam mass spectrometer using an Extrel™ Model TQMS C50 mass
spectrometer was used for pyrolysis vapor analysis. Mass spectral data from mass to charge ratio (m/z)
30-450 were acquired on a Merlin data acquisition system using 22.5 eV electron impact ionization.
Using this system, both light gases and heavy tars are sampled simultaneously and in real time. The
mass spectrum of the pyrolysis vapor provides a rapid, semi-quantitative depiction of the molecular
fragments. Data analysis was performed using the Unscrambler v. 9.7 (CAMO A/S, Trondheim,
Norway).
Resequencing, SNP Discovery, and Genotyping
Candidate Gene Selection
Forty candidate genes associated with lignocellulosic cell wall development were selected for
resequencing (Table 1). These include 22 genes from 11 gene families involved in lignin biosynthesis
and polymerization, six genes from four families involved in one-carbon metabolism associated with
lignin biosynthesis, and 12 genes from five families involved in cellulose biosynthesis and microfibril
deposition. The corresponding gene models were obtained from the JGI Poplar Genome Assembly v. 1.1
and manually curated (Table 1).
DNA Isolation, Primer Design, and Resequencing
Leaf tissue from the diversity panel of 15 unrelated poplar clones (one ramet/clone), selected to
represent the latitudinal range of the entire clone collection, was sampled as leaf punches, dried with
silica gel, and shipped at room temperature to DNA Landmarks (Quebec, Canada) for DNA extraction
utilizing their proprietary micro-scale protocol. All DNA extractions were standardized to 2.5 ng/µl for
resequencing. The same protocol was used to extract DNA for the 448 clones, with all extractions
standardized to 50 ng/µl prior to genotyping.
Primers were designed at Ampure Agencourt Bioscience Corporation (Beverly, MA) utilizing custom
software against the Poplar Genome Assembly v. 1.1. Genomic sequences covering the entire protein-coding regions, including introns and 1,000 bp upstream and 300 bp downstream noncoding sequences,
were retrieved for primer design. The program was set to design primers every 700 bp which yielded
517 primer pairs across the 40 genes. Of these, Agencourt utilized in-house software to select 200 nonoverlapping primer pairs based on a quality metric representing the redundancy in the genome and how
likely the amplicon is to be a homopolymer locus. The best-scoring pairs were tagged with M13F
(GTAAAACGACGGCCAGT) and M13R (CAGGAAACAGCTATGACC) primers for high-throughput
sequencing.
Genomic DNA was amplified in 384-well format PCR setup. Each PCR reaction contained 10 ng DNA,
1X HotStar buffer, 0.8 mM dNTPs, 1 mM MgCl2, 0.2U HotStar enzyme (Qiagen, Valencia, CA) and 0.2
μM forward and reverse primers in a 10 μl reaction. PCR cycling parameters were: one cycle of 95ºC for
15 min, 35 cycles of 9C for 20 s, 60°C for 30 s and 72°C for 1 min, followed by one cycle of 72°C for
3 min. The resultant PCR products were purified using solid phase reversible immobilization chemistry
followed by dye-terminator fluorescent sequencing with universal M13 primers. PCR for sequencing
was initiated at 95°C for 15 mins followed by: 40 cycles for 10 s, 50 cycles for 5 s, and finally, 60 cycles
for 2 mins 30 s. Dye-terminator removal was performed using SPRI. Bidirectional Sanger sequencing of
PCR fragments was carried out via capillary electrophoresis using ABI Prism 3730xl DNA analyzers
(Applied Biosystems, Foster City, CA).
SNP Discovery and Selection
Sanger resequencing produced a total of 202 amplicons representing 40 genes. The package, PineSAP
(Pine Sequence Alignment and SNP Identification) (Wegrzyn et al. 2009), applied a combination of
ProbConsRNA (Do et al. 2005), Polyphred (Nickerson et al. 1997), Polybayes (Marth et al. 1999),
and machine learning techniques to align sequences from 195 of the 202 amplicons and
computationally identify 1,485 polymorphisms (an average of 7 SNPs/amplicon). SNP detection of
the resulting calls was based on information gathered on quality scores, coverage, and alignment
metrics computed during the sequence alignments. The identified polymorphisms and their
flanking sequences were formatted for the GoldenGate assay (Illumina, San Diego, CA) and
submitted to their in-house software package responsible for assigning design scores. An
additional 1,233 SNPs from 232 genes were identified for population structure inference through eSNP
methods utilizing ESTs from male and female catkin tissue aligned to the reference genome (Unneberg
et al. 2005). To construct the 1,536-SNP assay, we selected 948 high scoring SNPs from the 40
lignin/cellulose genes and 588 high scoring eSNPs from the 232 catkins ESTs.
SNP Genotyping
Genotyping was carried out using the Illumina GoldenGate SNP genotyping platform (Landegren et al.
1998; Oliphant et al. 2002; Fan et al. 2003; Eckert et al. 2009b) at the DNA Technologies Core Facility
(UC Davis). The assay involves generating templates with specific target and address sequences using
allele-specific extension followed by ligation and amplification with universal primers. Fluorescent
products are hybridized to coded beads on an array matrix and signal intensities are subsequently
determined using the BeadArray Reader (Illumina). Signal intensities are quantified and matched to
specific alleles using BeadStudio v. 3.1.14 (Illumina). Manual adjustments to genotypic clusters were
made when necessary. For inclusion of SNPs into the final data set, we used thresholds of 0.20 and 0.60
for the GenCall50 (GC50) and call rate (CR) indices, respectively (Table S2). These are established
quality metrics that have been used to evaluate Illumina genotyping data (Pavy et al. 2008; Eckert et al.
2009b). The scores reflect the quality of the genotypic clusters (GC50) and the fraction of the samples that had
a genotype defined for a particular SNP (CR).
Tests for Association
Genetic Diversity, Population Structure and Linkage Disequilibrium
For each SNP, we estimated expected and observed heterozygosity, Wright’s inbreeding coefficient
(FIS) and hierarchical fixation indices using the Genetics and hierfstat packages available in R (Warnes
and Leisch 2006; Goudet 2005; R Development Core Team, 2007). We excluded those SNPs with |FIS|
> 0.25 from further analyses. The significance of multilocus fixation indices was tested via
bootstrapping across loci (n = 10,000 replicates) to obtain 99% confidence intervals (99% CI). Patterns
of population structure were further examined using principal components analysis (PCA). Population
structure coefficients were estimated using Eigenstrat v. 2.0 (Price et al. 2006). For association analyses,
a Q-matrix defined by significant principal components (PCs) as assessed using the Tracy-Widom
distribution (Patterson et al. 2006) was utilized. Cluster membership was determined via hierarchical
cluster analysis using Ward’s linkage and Euclidean distances on the significant PCs. The number of
clusters was identified as k+1, where k is the number of significant PCs. We identified FST outliers
using the bivariate distribution of expected heterozygosity and FST among inferred clusters observed for
the 297 eSNPs to define the genome-wide expectation of background levels of genetic structure. Lignin
SNPs falling outside this distribution were identified as FST outliers.
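The per-SNP filter on the inbreeding coefficient described at the start of this section can be illustrated with a short sketch. It assumes the simple moment form FIS = 1 - Ho/He for a single biallelic SNP; the R packages cited above may apply sample-size corrections that are not reproduced here, and the function names and genotype encoding (two-character strings such as "AG") are hypothetical.

```python
from collections import Counter


def fis(genotypes):
    """Return (observed_het, expected_het, F_IS) for one SNP.

    genotypes: iterable of two-character strings such as "AA" or "AG";
    missing calls may be passed as None and are skipped.
    Assumption: F_IS = 1 - Ho/He with no small-sample correction.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    if n == 0:
        raise ValueError("no genotype calls")
    allele_counts = Counter(allele for g in calls for allele in g)
    total_alleles = 2 * n
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)
    h_exp = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    # Observed heterozygosity: fraction of heterozygous individuals
    h_obs = sum(1 for g in calls if g[0] != g[1]) / n
    if h_exp == 0.0:
        # Monomorphic SNP: F_IS is undefined; reported as 0 in this sketch
        return h_obs, h_exp, 0.0
    return h_obs, h_exp, 1.0 - h_obs / h_exp


def passes_fis_filter(genotypes, max_abs_fis=0.25):
    """Keep a SNP only if |F_IS| does not exceed the threshold (0.25 in the text)."""
    return abs(fis(genotypes)[2]) <= max_abs_fis
```

For example, a SNP with genotype counts 25 AA, 50 AG, and 25 GG has Ho = He = 0.5 and FIS = 0, so it would be retained, whereas a SNP with no heterozygotes at intermediate allele frequency has FIS = 1 and would be excluded.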
Linkage disequilibrium (LD) was measured as the squared correlation of allele frequencies, r2 (Hill and
Robertson 1968), which is affected both by recombination and by differences in allele frequencies
between sites. The r2 value between pairs of informative SNP sites in candidate genes was calculated
using the Genetics package in R (Warnes and Leisch 2006; R Development Core Team, 2007). Patterns
of LD were investigated among SNPs from 39 of the 40 candidate genes. CesA1A was not included in
this analysis due to physical annotation differences in the reference genome. To assess the extent of LD
in the sequenced genomic regions, the decay of LD with physical distance (base pairs) between SNP
sites within each candidate locus and over all candidate genes was evaluated by nonlinear regression
analysis of r2 values (Remington et al. 2001). The expectation of r2 for low mutation rates and taking
into account sample size is given by:
E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right] \left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n(2 + C)(11 + C)}\right]
where C is the population recombination parameter (C = 4Ner) and n is the sample size. We
replaced C by C x distance in base pairs when fitting the formula to our data using the nonlinear
regression (nls) function in R (R Development Core Team, 2007).
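The LD decay fit can be sketched as follows, assuming the expectation intended here is the Hill and Weir (1988) form used by Remington et al. (2001). The analysis itself used the nls function in R; the grid search below is only a simple stand-in for that nonlinear regression, and the function names and the grid of candidate values are hypothetical.

```python
def expected_r2(c, n):
    """Expected r^2 for low mutation rates, adjusted for sample size.

    c: population recombination parameter for the interval (4*Ne*r)
    n: sample size (number of sampled sequences)
    Assumption: Hill and Weir (1988) form as used by Remington et al. (2001).
    """
    core = (10.0 + c) / ((2.0 + c) * (11.0 + c))
    correction = 1.0 + ((3.0 + c) * (12.0 + 12.0 * c + c * c)) / (
        n * (2.0 + c) * (11.0 + c)
    )
    return core * correction


def fit_c_per_bp(distances, r2_values, n, grid):
    """Least-squares estimate of the per-bp recombination parameter.

    C is replaced by (c_per_bp * distance in bp), as described in the text.
    A grid search is used here instead of nls purely for illustration.
    """
    best_c, best_sse = None, None
    for c_bp in grid:
        sse = sum(
            (r2 - expected_r2(c_bp * d, n)) ** 2
            for d, r2 in zip(distances, r2_values)
        )
        if best_sse is None or sse < best_sse:
            best_c, best_sse = c_bp, sse
    return best_c
```

Given pairwise distances and r2 values pooled across loci, `fit_c_per_bp(distances, r2_values, n, [i / 1000.0 for i in range(1, 101)])` returns the grid value that minimizes the residual sum of squares; the fitted curve is then `expected_r2(c_per_bp * distance, n)`.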
Statistical Models
Single marker models were utilized for all SNP-trait combinations. A general linear model (GLM) was
fitted to each trait-SNP combination (Yu et al. 2006), with SNP markers as fixed effects and elements of
the Q-matrix as covariates. P-values were generated for each test using 10,000 permutations of
genotypes with respect to phenotypic trait values. All analyses were conducted using TASSEL v. 2.0.1
(Bradbury et al. 2007). Corrections for multiple testing were performed using the positive false
discovery rate (FDR) method (Storey 2002; Storey and Tibshirani 2003). All the necessary data
to perform these analyses are available in Files S1 and S2. Modes of gene action were quantified using
the ratio of dominance (d) to additive (a) effects estimated from least square means for each genotypic
class. Partial or complete dominance was defined as values in the range of 0.50 < |d/a| < 1.25, while
additive effects were defined as values in the range -0.50 ≤ d/a ≤ 0.50. Values of |d/a| > 1.25 were
equated with over- or underdominance.
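The classification of gene action can be illustrated with a minimal sketch. It assumes the conventional definitions a = (mean of one homozygote - mean of the other homozygote) / 2 and d = heterozygote mean - midpoint of the two homozygote means, computed from the least square means of each genotypic class. The function name is hypothetical, and because the text does not assign the boundary value |d/a| = 1.25 to a class, this sketch places it with dominance.

```python
def gene_action(mean_aa, mean_ab, mean_bb):
    """Classify mode of gene action from genotypic class means.

    Returns (d/a ratio, label). Thresholds follow the text:
      |d/a| <= 0.50          additive
      0.50 < |d/a| <= 1.25   partial or complete dominance (boundary is an assumption)
      |d/a| > 1.25           over- or underdominance
    """
    a = (mean_bb - mean_aa) / 2.0             # additive effect
    d = mean_ab - (mean_aa + mean_bb) / 2.0   # dominance deviation
    if a == 0.0:
        # Ratio is undefined when the homozygote means are equal
        return None, "undefined"
    ratio = d / a
    if abs(ratio) <= 0.50:
        label = "additive"
    elif abs(ratio) <= 1.25:
        label = "partial or complete dominance"
    else:
        label = "over- or underdominance"
    return ratio, label
```

For example, class means of 20, 21, and 22 give d/a = 0 (additive), whereas means of 20, 24, and 22 give d/a = 3 (overdominance).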
Haplotypes were inferred and their frequencies were estimated using the modified expectation
maximization (EM) method of haplotype inference included in the haplo.stats (v. 2.0.1) program
available in R (Schaid et al. 2002; R Development Core Team, 2007). Singleton alleles were ignored
when constructing the haplotypes, and haplotypes with frequency less than five were also discarded.
Output in the form of global-score statistics and haplotype-specific scores were derived from
generalized linear models. Corrections for multiple testing were performed using the positive false
discovery rate (FDR) method (Storey 2002; Storey and Tibshirani 2003).
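The multiple-testing correction can be sketched as below. This is a simplified version of the q-value procedure of Storey and Tibshirani (2003): the proportion of true null hypotheses (pi0) is estimated at a single fixed tuning value (lambda = 0.5) rather than by the smoothing step of the published method, and pi0 falls back to 1 when the estimate is unusable. The function name and these simplifications are assumptions made for illustration.

```python
def q_values(p_values, lam=0.5):
    """Return q-values in the same order as the input P-values.

    Assumption: pi0 is estimated at one fixed lambda; if the estimate is
    zero or exceeds one, pi0 is set to 1 (the conservative choice).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Proportion of tests estimated to be truly null
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))
    if pi0 <= 0.0 or pi0 > 1.0:
        pi0 = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[idx] / rank)
        q[idx] = running_min
    return q
```

Tests with q-values below the chosen threshold (Q < 0.10 in this study) would then be reported as significant.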
Results
Phenotype
Wood samples were analyzed using pyrolysis molecular beam mass spectrometry (pyMBMS). The
intensities of the major peaks assigned to lignin were summed in order to estimate the lignin content,
Syringyl/Guaiacyl (S/G) ratios, C5 sugars, and C6 sugars across the range of samples (Table 2). Lignin
content was calculated with peaks at mass to charge ratio (m/z) 124, 137, 138, 150, 152, 164, and 178;
these were summed and then averaged for the different samples. S/G ratios were determined by
summing S peaks at m/z 154, 167, 168, 182, 194, 208, and 210 then dividing by the sum of G peaks at
m/z 124, 137, 138, 150, 164, and 178. C5 sugars were calculated as the sum of the peaks at m/z 57, 73,
85, 96, 114. Likewise, C6 sugars were calculated as the sum of the following peaks at m/z 57, 60, 73,
98, 126, 144. Visualization of each phenotype demonstrated a strongly bimodal distribution for the C5
trait as opposed to the distributions for the other three composite traits, which were approximately
normal. As a result, C5 was not included in subsequent analyses. S/G ratios ranged from 1.2-2.4 while
lignin content ranged from 15.8-27.5%.
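The composite traits defined above can be written out as a short sketch. It assumes each sample's spectrum is available as a mapping from m/z value to peak intensity; the data structure and function name are hypothetical, and any normalization of intensities or averaging across replicate samples is left out.

```python
# Peak assignments (m/z) as listed in the text
LIGNIN_MZ = (124, 137, 138, 150, 152, 164, 178)
S_MZ = (154, 167, 168, 182, 194, 208, 210)
G_MZ = (124, 137, 138, 150, 164, 178)
C5_MZ = (57, 73, 85, 96, 114)
C6_MZ = (57, 60, 73, 98, 126, 144)


def composite_traits(spectrum):
    """Compute composite traits for one spectrum.

    spectrum: dict mapping m/z (int) to peak intensity (float);
    peaks absent from the dict are treated as zero intensity.
    """
    def peak_sum(mz_values):
        return sum(spectrum.get(mz, 0.0) for mz in mz_values)

    g_total = peak_sum(G_MZ)
    return {
        "lignin": peak_sum(LIGNIN_MZ),
        # S/G ratio: summed syringyl peaks divided by summed guaiacyl peaks
        "sg_ratio": peak_sum(S_MZ) / g_total if g_total else float("nan"),
        "c5": peak_sum(C5_MZ),
        "c6": peak_sum(C6_MZ),
    }
```

Note that m/z 57 and 73 contribute to both the C5 and C6 sums, and that the lignin sum differs from the G sum only by the peak at m/z 152.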
Genotyping Results
The 1,536 SNPs chosen for genotyping using the Illumina GoldenGate platform represent 948 from 40
candidate genes (20 gene families and 202 amplicons) with seven to 65 SNPs per gene, and 588 from
the 232 catkins ESTs (Table 1). Of the 1,536 SNPs, 876 (57%) yielded data consistent with our quality
thresholds (579 candidate gene SNPs and 297 eSNPs). A conversion rate of 61% (579 SNPs) was
observed among the 948 SNPs from the resequenced 40 lignin/cellulose candidate genes as opposed to
51% for the eSNPs. The median GC50 score across all usable SNPs was 0.71 and the median CR score
was 0.72. Quality scores across the genotyped loci are summarized in Table S2. Distribution of the
quality metrics for genotyped SNPs, grouped by dataset, is shown in Figure S1. The majority of the 579
successfully genotyped SNPs were silent, with nonsynonymous SNPs accounting for 19% of the total.
Population Structure
Principal components analysis on the 448 clones using 297 eSNPs revealed four significant PCs,
explaining 10% of the overall variance. From these four PCs, five clusters were formed using
hierarchical clustering with Ward’s linkage method. All five clusters illustrated a latitudinal trend, with
the Columbia River delineating a major geographical north-south separation (Fig. 1C). These five
clusters also illustrated significant genetic structuring as estimated using FST, as well as significant
differences among means for the three composite traits. The average FST was low for both sets, but
greater for the lignocellulosic SNPs (FST = 0.034, 99% CI: 0.028-0.042) as opposed to the eSNPs (FST =
0.013, 99% CI: 0.011-0.016). A comparison of the distribution of FST for each set revealed that 7
genes had values of FST greater than any observed for the eSNPs (Fig. S4). These outliers were
concentrated within the CesA3A, CAD, SUSY1, 4CL1, CesA2B, TUB15 and CesA1B genes (Fig. S4).
Polymorphisms within these genes had values of FST approximately five- to 10-fold greater than the
multilocus average. Cluster one, which was distributed primarily south of the Columbia River, also
had significantly different means for lignin, S/G and C6 (ANOVA: P < 2.0 x 10^-6; Tukey multiple
comparison tests: P < 0.01). Additional summaries of genetic diversity across all SNPs and clusters are
given in Table S3 and Figures S1, S2, S3, and S4.
Linkage Disequilibrium
All r2 values were pooled to assess the overall behavior of LD for the candidate genes and to
estimate the genome-wide degree of LD in black cottonwood. Figure 2B shows the extent of LD
across the sequenced regions. The fitted curve indicates that LD is generally low in black
cottonwood, rapidly decaying by over 50% (from 0.50 to 0.20) within a distance of ~200 bp
(Figure 2B, 2C). Within candidate genes, the average distance associated with LD decline to
r2 = 0.1 varies from c. 200 to c. 600 bp (Figure 2A).
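Each pooled value summarized above is a squared correlation between a pair of SNP sites. As a rough illustration only, the sketch below computes r2 as the squared Pearson correlation between unphased genotypes coded as counts of one allele (0, 1, or 2). This genotype-correlation shortcut is an assumption of the sketch and is not necessarily the estimator implemented in the R package used for the analysis; the function name is hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors.

    x, y: sequences of allele counts (0, 1, 2) for the same individuals,
    with None marking a missing call. Returns NaN if either SNP is
    monomorphic among the individuals genotyped at both sites.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals genotyped at both sites")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)
```

Values computed for all informative SNP pairs within a gene, together with the physical distance between the two sites, form the input to the decay analysis.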
Overall summary of single SNP and haplotype based associations
A total of 1,734 (579 SNPs x 3 traits) single marker association tests were performed. Of these, 65 were
significant at the threshold of P < 0.05. Multiple test corrections using the FDR method reduced this
number to 37 at a significance threshold of Q < 0.10. A total of 13 lignin content, 1 S/G, and 23 C6
sugar content associations were identified (Table 3). The 37 associations represent 27 unique SNPs from
40 candidate genes. Many of the 37 SNPs that exhibited significant associations with at least one trait
were consistent with codominance (Table 4). Four of the 34 markers for which dominance and additive
effects could be calculated were consistent with overdominance (|d/a| > 1.25). The remaining 30
markers were split between modes of gene action that were codominant (|d/a| < 0.50, 25) or partially to
fully dominant (0.50 < |d/a| < 1.25, 5). Most effects were small to moderate and accounted for 10% to
78% of the phenotypic standard deviation.
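The P-values behind each single marker test were obtained by permuting genotypes with respect to trait values. A much simplified sketch of that idea follows. It uses a one-way ANOVA F statistic across genotype classes and omits the population structure covariates of the actual general linear model, so it is not the TASSEL procedure; the function names, the default number of permutations, and the seed argument are hypothetical.

```python
import random


def f_statistic(genotypes, phenotypes):
    """One-way ANOVA F statistic of phenotype across genotype classes."""
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(phenotypes) / n
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    ss_within = sum(
        (y - sum(v) / len(v)) ** 2 for v in groups.values() for y in v
    )
    if ss_within == 0.0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(genotypes, phenotypes, n_perm=10000, seed=None):
    """Permutation P-value: share of shuffles with F at least as large as observed.

    Assumption: no covariates; genotype labels are shuffled freely.
    """
    rng = random.Random(seed)
    observed = f_statistic(genotypes, phenotypes)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(shuffled, phenotypes) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)
```

With 10,000 permutations the smallest attainable P-value under this scheme is roughly 1 x 10^-4, which bounds the resolution of any single test before the FDR correction is applied.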
Among haplotype-based associations, a total of 181 amplicons were analyzed (after the removal of
singletons) and 17 amplicons from 13 unique genes were significant with a global significance threshold
of P < 0.05 (Table 5). Multiple test corrections using the FDR method reduced this number to 14
amplicons (13 unique genes and 71 haplotypes) at a global significance threshold of Q < 0.10.
Lignin Associations
Lignin composition was represented by averaging values of guaiacyl precursor peaks. A total of 13
significant single marker associations were found for nine candidate genes associated with lignin content
(Table 3). Three of the significant marker-trait associations were located in the coding region and 10 in
the non-coding region. Two of the significant associations were nonsynonymous (C4H1, CESA2A) and
one synonymous (HCT6). Individually, each of the 13 markers explained a small portion of the
phenotypic variance, with effects ranging from 1.2% to 3.8%.
Eleven significant haplotype associations from 10 unique genes were identified for lignin content (Table
5). Eight amplicons, representing seven unique genes, had at least one significant haplotype after
multiple test corrections (Table 5). Three of the amplicons did not have significant individual haplotypes
and included regions of three candidate genes (CCR, CesA2B, and TUA5). From the eight amplicons
with at least one significant haplotype, just one (SUSY1) was supported with a single marker association
in the same trait (SUSY1_02-108). The remaining five candidate genes (4CL1, 4CL3, CesA1B, CesA3A,
and HCT1) had at least one supporting single marker association with a P value < 0.05 before multiple
test corrections.
S/G Ratio Associations
The S/G ratio phenotype is the ratio of the summed intensities of the seven S peaks to those of the six G peaks. Analysis of the S/G trait
resulted in one significant marker-trait association (Table 3). This marker is non-coding and explained a
small portion of the phenotypic variance (3.2%). Haplotype-based tests did not reveal significant
associations.
C6 Sugar Associations
C6 sugars were represented by summing the values of six peaks. A total of 23 significant associations
were found in 13 candidate genes associated with C6 sugars (Table 3). Four of the significant marker-trait associations were located in coding regions. Three of these SNPs are synonymous for three
different candidate genes (CESA1A, C4H2, HCT6) and one significant association was nonsynonymous
(CESA2A). Four marker-trait associations in two candidate genes were highly significant and unique
only to the C6 phenotype (SUSY1, CESA1B). All 23 markers explain a small portion of the phenotypic
variance, with individual effects ranging from 1.1% to 3.7%.
A total of three amplicons representing three unique candidate genes (4CL1, CesA1A, and SAM1) were
significant in terms of haplotype-based associations with C6 (Table 5). All three amplicons were highly
significant (Q < 0.05) with respect to C6 sugars and contained at least one significant individual
haplotype after multiple test corrections (Q < 0.10). One candidate gene (CesA1A) contained a
significant single marker association in the same amplicon and associated with the same trait
(CesA1A_12-40).
Discussion
Hybridization, molecular breeding and genetic engineering efforts are all under consideration to
improve wood-based ethanol production. Strategies for the domestication of forest trees using
either conventional or novel molecular breeding approaches are centered around the exploitation
of existing genetic diversity. Over the past few decades, genetic maps have been made for many
forest tree species and QTLs have been mapped for a range of traits, such as wood properties, with
the aim of using genetic markers linked to QTLs to apply marker-based breeding programs
(Brown et al. 2003). The lack of resolution in mapping candidate genes and QTL alleles can be
overcome by association genetics, using natural populations in which the long evolutionary
history has decreased the extent of LD in populations (Neale and Savolainen 2004). An important
prerequisite for association mapping is the availability of large allelic variation in the population.
LD describes a key aspect of genetic variation in natural populations of plants. This study is the
first examination of genome-wide LD in black cottonwood and enables comparison with other
poplars. We examined LD across 39 of the candidate genes (Figure 2B, 2C) and observed a rapid
decay of LD within just a few hundred bp, indicating the potential of association genetics to
identify genes responsible for variation in the trait. Previous studies in both P. tremula (five genes)
and P. nigra (nine genes), showed a similar rapid decay of LD (Ingvarsson et al. 2005; Chu et al.
2009).
This study examined both single marker associations and haplotype-based tests to account for
information present in the associations between markers as well as directly between a SNP and
the trait. Given the structure of our data, a natural way to apply the knowledge of LD within and
between genes is to perform haplotype-based association tests. The power of a single marker
association test is often limited because LD information contained in flanking markers is ignored.
Intuitively, haplotypes (which are essentially a collection of ordered markers) may be more
powerful than individual, non-ordered markers. This study demonstrates that the use of
haplotypes can significantly increase the ability to map traits of interest.
Candidate genes known to be involved in lignocellulosic cell wall development were examined for
genetic associations. There are two major steps of lignin biosynthesis in plants: monolignol
biosynthesis and the subsequent polymerization of lignin monomers to form polymers. This
biochemical pathway is highly conserved throughout vascular plants, and many of the enzymes
have been identified and characterized (Boerjan et al. 2003; Xu et al. 2009). The cellulose
biosynthesis pathway involves the synthesis and assembly of β-1,4 glucan chains
at the RTC, and their orderly deposition to form cell wall microfibrils. Although several candidate
genes have been identified, the precise molecular mechanism of cellulose biosynthesis and
microfibril deposition in plants is still not clearly understood. Genetic improvement of lignin and
cellulose biosynthesis in trees continues to be a major research priority. Similar to other
commercial applications for black cottonwood, modified lignin structure (chemical reactivity), and
increased cellulose content are desirable traits. Mechanisms that can increase C6 sugar content
and decrease C5 sugar content of hemicelluloses are favorable for fermentation.
The monolignol biosynthetic pathway involves many intermediates and enzymes (Boerjan et al. 2003).
The first step in the process is the deamination of phenylalanine by phenylalanine ammonia-lyase (PAL), which produces cinnamic acid. PAL is encoded by a small multigene family (Appert et al.
1994; Osakabe et al. 1995; Cochrane et al. 2004), and five isoforms have been annotated in the poplar
genome (Tsai et al. 2006). In this study, markers in PAL2, PAL4, and PAL5 were genotyped. A single-marker non-coding association was identified with PAL2 that explained 1.4% of the phenotypic
variation in C6 sugars (Table 3). In aspen (P. tremuloides) stem, PAL2 transcripts have been localized to
developing xylem cells, consistent with its involvement in lignin biosynthesis (Kao et al. 2002).
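The proportion of phenotypic variation attributed to a marker, such as the 1.4% reported above for PAL2, can be illustrated as the ratio of the between-genotype sum of squares to the total sum of squares. The sketch below assumes a simple one-way classification by genotype with no covariates for population structure, which may differ from the model fitted in this study; the function name and the toy input values are hypothetical.

```python
def variance_explained(phenotypes_by_genotype):
    """Fraction of phenotypic variation explained by genotype class.

    `phenotypes_by_genotype` maps a genotype label (e.g. "AA") to a list
    of trait values for the trees carrying it (hypothetical layout).
    Returns SS_between / SS_total from a one-way classification.
    """
    all_values = [v for vals in phenotypes_by_genotype.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        raise ValueError("trait shows no variation")
    # Each genotype class contributes n * (class mean - grand mean)^2
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in phenotypes_by_genotype.values()
    )
    return ss_between / ss_total


# Toy values only: genotype fully determines the trait, then not at all
fully_explained = variance_explained({"AA": [1.0, 1.0], "AG": [3.0, 3.0]})
not_explained = variance_explained({"AA": [1.0, 3.0], "AG": [1.0, 3.0]})
```

A value of 0.014 from such a calculation would correspond to a marker explaining 1.4% of the phenotypic variation, which is typical of the small individual effects expected for complex traits.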
C4H catalyzes the first oxidative reaction in phenylpropanoid metabolism, namely, the conversion
of cinnamic acid to p-coumaric acid (Sewalt et al. 1997). Three C4H genes have been characterized
in black cottonwood (Lu et al. 2006). C4H1 is proposed to be associated with G lignin deposition
while C4H2 is thought to be involved in S lignin biosynthesis (Lu et al. 2006). Four unique single
marker associations were identified in the C4H1 and C4H2 genes examined in this study. A
significant non-synonymous association in exon 1 of C4H1 with lignin showed a mode of
gene action consistent with additive effects (Table 3; Figure 4). The C allele at C4H1_02-219 is the
minor allele and causes a histidine (H) to proline (P) amino acid substitution. Heterozygotes for the
marker had a lignin content that was intermediate between the two homozygote classes
(21.9% for A/A, 22.7% for A/C, 23.2% for C/C). A similar study in European maize identified a
nonsynonymous SNP in the first exon of C4H1 associated with forage quality traits (Anderson et al.
2008). Physiological studies of these genes describe unique functions for the isoforms within the
lignin biosynthetic pathway.
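The mode of gene action can be illustrated from the three genotypic means reported above for C4H1_02-219. The sketch below computes the additive effect (a) as half the difference between the homozygote means and the dominance deviation (d) as the departure of the heterozygote mean from the homozygote midpoint. The function name and the |d/a| < 0.5 cutoff for calling an effect additive are assumptions made for illustration, not necessarily the criteria applied in this study.

```python
def gene_action(mean_hom_major, mean_het, mean_hom_minor):
    """Return (a, d): additive effect and dominance deviation.

    a = half the difference between the two homozygote means
    d = heterozygote mean minus the midpoint of the homozygote means
    """
    midpoint = (mean_hom_major + mean_hom_minor) / 2.0
    a = (mean_hom_minor - mean_hom_major) / 2.0
    d = mean_het - midpoint
    return a, d


# Percent lignin for A/A, A/C and C/C at C4H1_02-219 (values from the text)
a, d = gene_action(21.9, 22.7, 23.2)

# Assumed cutoff: |d/a| below 0.5 is treated as additive gene action
ratio = abs(d / a)
is_additive = ratio < 0.5
```

With these means, a is about 0.65 percentage points of lignin, d is about 0.15, and |d/a| is about 0.23, which falls within the assumed additive range.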
4-coumarate:CoA ligase (4CL), which catalyzes the formation of CoA esters of p-coumaric acid and
its derivatives, has a pivotal role in channeling phenylpropanoid precursors into different
downstream pathways, each leading to a variety of functionally distinct end products (Harding et
al. 2002). 4CL is also encoded by multigene families, with 5 isoforms annotated in the poplar
genome (Tsai et al. 2006). While we were unable to identify significant single marker associations
in 4CL1, 4CL3, and 4CL5, significant associations with haplotypes in 4CL1 and 4CL3 were observed
for both lignin and C6 traits. Of the five haplotypes (spanning 389 bp) in 4CL1_01, two significant
associations demonstrated an effect on C6 sugar content (35.1% for AGA and 34.1% for AAA). In
lignin composition, two haplotypes of 4CL1_11 demonstrated a difference of > 1% in lignin
composition (Table 5; Figure 5B). Three single markers in 4CL_11 at P < 0.05 were found to be in
linkage disequilibrium, and their individual genotypic effects on lignin composition were small in
comparison to the spanning haplotype block (Figure 5B). Reducing 4CL expression in transgenic
poplar has resulted in significant reductions of lignin, ranging from 5% to 45% (Hu et al. 1999; Li
et al. 2003).
Hydroxycinnamoyl-CoA transferase (HCT) is the most recently identified enzyme in monolignol
biosynthesis and belongs to a large family of acyltransferases (Hoffmann et al., 2003). It catalyzes
the conversion of p-coumaroyl-CoA and caffeoyl-CoA to their corresponding shikimate or quinate
esters. Two of the six annotated HCT genes in the Populus genome (HCT1 and HCT6) are expressed in
developing xylem (Tsai et al. 2006). HCT6_13-225 was a significant synonymous marker in both
lignin and C6 (Table 3). Two significant haplotypes in HCT1_12 were associated with lignin
composition (Table 5). HCT has not been transgenically manipulated in poplar; however, RNAi-mediated silencing of HCT in conifers (Pinus radiata), which do not produce S lignin, had a strong impact
on lignin content (42% reduction), monolignol composition, and interunit linkage distribution
(Wagner et al. 2007). A similar study of HCT in Arabidopsis showed a reduction in lignin content
and an increased G lignin deposition (Hoffmann et al. 2007).
P-coumaroyl-CoA shikimate proceeds through a series of transformations into caffeoyl-CoA
shikimate, caffeoyl-CoA, feruloyl-CoA, and coniferaldehyde by the action of the enzymes p-coumaroyl-CoA 3’-hydroxylase (C3’H), HCT, caffeoyl-CoA O-methyltransferase (CCoAOMT), and
cinnamoyl-CoA reductase (CCR), respectively. CCoAOMT, catalyzing the methylation of caffeoyl-CoA to feruloyl-CoA, is critical in maintaining lignin structural integrity (Meyermans et al., 2000;
Zhong et al., 2000). In the two independent studies referenced, antisense down-regulation of
CCoAOMT1 in transgenic hybrid poplar (P. tremula x P. alba) resulted in reduced lignin content as well
as altered S/G ratio. In this study, markers from CCoAOMT1 and CCoAOMT2 were genotyped.
CCoAOMT1 had one significant non-coding SNP associated with C6 sugar content (Table 3).
Cinnamoyl-CoA reductase (CCR) catalyzes the conversion of hydroxycinnamoyl-CoA esters (p-coumaroyl-CoA, feruloyl-CoA, sinapoyl-CoA) into their corresponding cinnamyl aldehydes (Pichon
et al. 1998). Downregulation of CCR in transgenic poplar (P. tremula x P. alba) is associated with
up to 50% reduced lignin content (Leple et al. 2007). In this study, a single non-coding two-state
marker in CCR was found to be strongly associated with lignin composition (Table 3). A different
amplicon in the same gene (CCR_12) was globally significant in terms of haplotype associations
but did not report any significant individual haplotypes (Table 5). Haplotype associations were
previously identified in eucalyptus with CCR in relation to wood property traits (Thumma et al.
2005).
Coniferaldehyde can be converted to coniferyl alcohol by the action of CAD or to 5-hydroxyconiferaldehyde and sinapyl aldehyde by the action of ferulate 5-hydroxylase (F5H) and caffeic/5-hydroxyferulic acid O-methyltransferase (COMT).
CAD catalyzes the reduction of p-hydroxycinnamaldehydes into their corresponding alcohols and is the last enzyme in monolignol
biosynthesis. In this study, CAD_04-185, a non-coding marker, illustrated patterns of gene action
consistent with additive effects in relation to S/G and C6 sugars. This was the only single marker
association identified with S/G ratio. Three of the nine individual haplotypes (spanning 407 bp) in the
same amplicon of CAD were significant for lignin composition. Differences in genotypic effects on
lignin content were minimal (22.2% for CAAAAT, 22.8% for CATAAT, and 22.5% for GATAAT).
The CAD gene family has been extensively studied in Arabidopsis, rice, and poplar (Bakarat et al.
2009). Down-regulation of CAD in transgenic poplar did not affect the overall lignin content and
composition, but led to an increased incorporation of the hydroxycinnamaldehydes into the lignin
(Baucher et al. 1996; Pilate et al. 2002). Field trials of the CAD-deficient transgenic poplar showed
improved Kraft pulping performance (Pilate et al. 2002).
COMT was originally thought to be a bifunctional enzyme that sequentially methylated caffeic and
5-hydroxyferulic acids. More recently, it has been shown to act downstream in monolignol
biosynthesis by methylating the aldehyde and alcohol backbones (Osakabe et al. 1999; Parvathi et
al. 2001). In this study, markers from COMT1 and COMT2 were successfully genotyped (Table 1). A
single non-coding COMT2 marker was identified as significant with C6 sugar content (Table 3).
Suppression of COMT in both P. tremula x P. alba and P. tremuloides lines did not change lignin
content but resulted in a reduction of the S/G lignin ratio (due to a decrease of S and an increase of
G), as well as the incorporation of an abnormal 5-hydroxyguaiacyl unit into the lignin (Van
Doorsselaere et al. 1995; Tsai et al. 1998).
After their biosynthesis, monolignols are transported from the cytoplasm to the cell wall and
polymerized to a lignin matrix. The molecular mechanisms and the proteins responsible for
transport and polymerization are not fully characterized. In the cell wall, the monolignols are
oxidized to their radicals and polymerized. Laccases (Lac), peroxidases and other phenol oxidases
have long been thought to be involved in this polymerization (Baucher et al., 2003), but conclusive
evidence for their role is still lacking. In our study, we examined Lac1a, Lac2, and Lac90a. Lac1a
was found to have two non-coding single marker associations with C6 sugars (Table 3). In
poplars, several laccases (Ranocha et al., 1999) were cloned and characterized. At least eight of
these laccases were identified in association with lignin biosynthetic pathways by microarray
analysis (Andersson-Gunnerås et al. 2006). Subsequent studies with antisense Lac3 in transgenic
hybrid poplar showed little variation in lignin content; however, the soluble phenolics and
structure of the secondary wall were altered (Ranocha et al. 2002).
Variation in the quantity and quality of cellulose in plants is suspected to result primarily from the
enzymatic activities of different types of cellulose synthases (CesAs) (Haigler and Blanton, 1996).
The CesA gene family contains 17 members in the sequenced poplar genome, five of which are highly
expressed during wood formation (Djerbi et al. 2004; Joshi et al. 2004; Suzuki et al. 2006; Kumar et al.
2009). All five isoforms were evaluated for association in this study (CesA1A, CesA1B, CesA2A,
CesA2B, and CesA3A), and all had at least one single marker or haplotype association (Table 1). The
same nonsynonymous marker in the 6th exon of CesA2A was strongly associated with both lignin and C6
sugar traits. The G allele at CesA2A is the minor allele and causes an isoleucine (I) to valine (V) amino
acid substitution (Table 3). The genotypic effects of the two-state SNP are shown in Figure 3B. For lignin,
the differences in content between genotypes were significant (22% for AA and 23.6% for AG); the same is true for
C6 sugar content (34.9% for AA and 32.1% for AG). Three single marker associations between CesA1B
and lignin composition were identified (Table 3; Figure 3). Two of these three non-coding SNPs were
also associated with C6 sugar content. CesA1B_10 had one significant haplotype associated with lignin
composition. CesA1A had two non-coding and one synonymous association (CesA1A_12-40) for C6
sugars. One of the non-coding SNPs (CesA1A_20-226) was also associated with lignin content. CesA3A
had two different amplicons with significant haplotype associations with lignin. Three significant
haplotypes from six were highly associated in CesA1A_12 (spanning 183 bp) and their genotypic effects
on C6 were also significant (33.6% for AGA, 34.2% for AAA, 35.3% for GAG) (Table 5).
CesA proteins in the RTC use cytosolic uridine diphosphate (UDP)-glucose as substrate, which is
provided directly by particulate sucrose synthase (SUSY) (Haigler et al. 2001). This enzyme
produces UDP-glucose and fructose from sucrose and UDP. Of the six SUSY genes annotated in the
poplar genome, only two were highly expressed in wood-forming tissues based on microarray analysis
(Geisler-Lee et al. 2006; Meng et al. 2006). In this study, amplicons from SUSY1 were successfully
genotyped (Table 1). Single-marker tests with SUSY1 revealed six non-coding associations with C6 and
two with lignin composition (Table 3). Two of the three individual haplotypes (spanning 386 bp)
identified in SUSY1_02 were significant. Genotypic differences between haplotypes were observed for
lignin composition (21.8% for AAAA and 22.9% for TGGG) (Table 5). Three of the four markers that
compose the SUSY1_02 haplotype are in strong LD (Figure 5A). Recently, over-expression of SUSY in
transgenic poplar has led to an increase in both cellulose production and cellulose crystallinity (Coleman
et al. 2009), supporting the previous suggestion that SUSY could be one of the limiting steps of cellulose
biosynthesis (Tang and Strum, 1999; Haigler et al. 2001).
This study represents the most comprehensive evaluation of LD and genetic association in poplars.
High-throughput genotyping technologies and the vast genomic resources in black cottonwood allowed
a large number of candidate genes to be evaluated for associations with the lignocellulosic cell wall
development. The genes studied are those known to be associated with these pathways and ones that
have been extensively studied for commercial applications, such as pulp and feedstock production, and
are now being further evaluated for improvement in relation to biofuels production. Given the rapid
decay of within-gene LD in black cottonwood and the high coverage of amplicons across each
gene, it is likely the numerous polymorphisms identified are in close proximity to the causative
SNPs and the haplotype associations accurately reflect the information present in the associations
between markers. This study demonstrates that a forward genetics approach (association
genetics) can be used to discover naturally occurring allelic variation in genes associated with
commercially important traits, in this case, lignin and cellulose biosynthesis. Many of the same
genes were implicated using reverse genetics approaches; however, the association approach
provides estimates of the size of effects of these alleles on a phenotype. Understanding the size of
the effects as well as the existing variation is critical in applying the knowledge gained on a
particular SNP to marker-based breeding programs with goals to increase cellulose yield and
therefore cellulosic ethanol production. Given the increasing efficiency and lowering costs of
sequencing and genotyping technologies, the goal of resequencing the genome and relating the
polymorphisms to a trait of interest is now feasible.
Acknowledgements:
We thank Charles Nicolet and Vanessa Rashbrook for performing the SNP genotyping, and John
Liechty and Benjamin Figueroa for bioinformatics support. Funding for this project was made
available through the Chevron Technology Ventures-UC Davis Biofuels Project.
ANDERSSON-GUNNERAS, S., E. J. MELLEROWICZ, J. LOVE, B. SEGERMAN, Y. OHMIYA et al., 2006 Biosynthesis
of cellulose-enriched tension wood in Populus: global analysis of transcripts and metabolites
identifies biochemical and developmental regulators in secondary wall biosynthesis. Plant
Journal 45: 144-165.
APPERT, C., E. LOGEMANN, K. HAHLBROCK, J. SCHMID and N. AMRHEIN, 1994 Structural and Catalytic
Properties of the 4 Phenylalanine Ammonia-Lyase Isoenzymes from Parsley (Petroselinum-Crispum Nym). European Journal of Biochemistry 225: 491-499.
BATE, N. J., J. ORR, W. T. NI, A. MEROMI, T. NADLERHASSAR et al., 1994 Quantitative Relationship
between Phenylalanine Ammonia-Lyase Levels and Phenylpropanoid Accumulation in
Transgenic Tobacco Identifies a Rate-Determining Step in Natural Product Synthesis.
Proceedings of the National Academy of Sciences of the United States of America 91: 7608-7612.
BAUCHER, M., B. CHABBERT, G. PILATE, J. VANDOORSSELAERE, M. T. TOLLIER et al., 1996 Red xylem and
higher lignin extractability by down-regulating a cinnamyl alcohol dehydrogenase in poplar.
Plant Physiology 112: 1479-1490.
BAUCHER, M., C. HALPIN, M. PETIT-CONIL and W. BOERJAN, 2003 Lignin: Genetic engineering and impact
on pulping. Critical Reviews in Biochemistry and Molecular Biology 38: 305-350.
BEAUMONT, M. A., W. ZHANG and D. J. BALDING, 2002 Approximate Bayesian computation in population
genetics. Genetics 162: 2025-2035.
BISHAI, J. M., W. MITZNER, C. G. TANKERSLEY and E. M. WAGNER, 2007 PEEP-induced changes in
epithelial permeability in inbred mouse strains. Respir Physiol Neurobiol 156: 340-344.
BOERJAN, W., 2005 Biotechnology and the domestication of forest trees. Curr Opin Biotechnol 16: 159-166.
BOERJAN, W., J. RALPH and M. BAUCHER, 2003 Lignin biosynthesis. Annual Review of Plant Biology 54:
519-546.
BOUDET, A. M., S. KAJITA, J. GRIMA-PETTENATI and D. GOFFNER, 2003 Lignins and lignocellulosics: a
better control of synthesis for new and improved uses. Trends Plant Sci 8: 576-581.
BRADBURY, P. J., Z. ZHANG, D. E. KROON, T. M. CASSTEVENS, Y. RAMDOSS et al., 2007 TASSEL: software
for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633-2635.
BRADSHAW, H. D., R. CEULEMANS, J. DAVIS and R. STETTLER, 2000 Emerging Model Systems in Plant
Biology: Poplar (Populus) as a Model Forest Tree. Journal of Plant Growth Regulation 19: 306-313.
BROWN, G. R., D. L. BASSONI, G. P. GILL, J. R. FONTANA, N. C. WHEELER et al., 2003 Identification of
quantitative trait loci influencing wood property traits in loblolly pine (Pinus taeda L.). III. QTL
Verification and candidate gene mapping. Genetics 164: 1537-1546.
BROWN, G. R., G. P. GILL, R. J. KUNTZ, C. H. LANGLEY and D. B. NEALE, 2004 Nucleotide diversity and
linkage disequilibrium in loblolly pine. Proc Natl Acad Sci U S A 101: 15255-15260.
CHU, Y., X. SU, Q. HUANG and X. ZHANG, 2009 Patterns of DNA sequence variation at candidate gene
loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms.
Genetica 137: 141-150.
COCHRANE, F. C., L. B. DAVIN and N. G. LEWIS, 2004 The Arabidopsis phenylalanine ammonia lyase
gene family: kinetic characterization of the four PAL isoforms. Phytochemistry 65: 1557-1564.
COLEMAN, H. D., J. YAN and S. D. MANSFIELD, 2009 Sucrose synthase affects carbon partitioning to
increase cellulose production and altered cell wall ultrastructure. Proceedings of the National
Academy of Sciences 106: 13118-13123.
DAVIS, J. M., 2008 Genetic Improvement of Poplar (Populus spp.) as a Bioenergy Crop, pp. 397-419 in
Genetic Improvement of Bioenergy Crops, edited by W. VERMERRIS. Springer New York, New
York.
DAVISON, B. H., S. R. DRESCHER, G. A. TUSKAN, M. F. DAVIS and N. P. NGHIEM, 2006 Variation of S/G ratio
and lignin content in a Populus family influences the release of xylose by dilute acid hydrolysis.
Appl Biochem Biotechnol 129-132: 427-435.
DIXON, R. A., and M. S. S. REDDY, 2003 Biosynthesis of monolignols. Genomic and reverse genetic
approaches. Phytochemistry Reviews 2: 289-306.
DJERBI, S., M. LINDSKOG, L. ARVESTAD, F. STERKY and T. T. TEERI, 2005 The genome sequence of black
cottonwood (Populus trichocarpa) reveals 18 conserved cellulose synthase (CesA) genes.
Planta 221: 739-746.
DO, C. B., M. S. P. MAHABHASHYAM, M. BRUDNO and S. BATZOGLOU, 2005 ProbCons: Probabilistic
consistency-based multiple sequence alignment. Genome Research 15: 330-340.
ECKERT, A. J., A. D. BOWER, J. L. WEGRZYN, B. PANDE, K. D. JERMSTAD et al., 2009 Association genetics of
coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness
related traits. Genetics 182: 1289-1302.
ECKERT, A. J., J. L. WEGRZYN, B. PANDE, K. D. JERMSTAD, J. M. LEE et al., 2009 Multilocus patterns of
nucleotide diversity and divergence reveal positive selection at candidate genes related to cold
hardiness in coastal Douglas Fir (Pseudotsuga menziesii var. menziesii). Genetics 183: 289-298.
ECKERT, C. G., K. E. SAMIS and S. C. LOUGHEED, 2008 Genetic variation across species' geographical
ranges: the central-marginal hypothesis and beyond. Mol Ecol 17: 1170-1188.
ELKIND, Y., R. EDWARDS, M. MAVANDAD, S. A. HEDRICK, O. RIBAK et al., 1990 Abnormal-Plant
Development and down-Regulation of Phenylpropanoid Biosynthesis in Transgenic Tobacco
Containing a Heterologous Phenylalanine Ammonia-Lyase Gene. Proceedings of the National
Academy of Sciences of the United States of America 87: 9057-9061.
FAN, J. B., A. OLIPHANT, R. SHEN, B. G. KERMANI, F. GARCIA et al., 2003 Highly parallel SNP genotyping.
Cold Spring Harb Symp Quant Biol 68: 69-78.
FEDER, M. E., and T. MITCHELL-OLDS, 2003 Evolutionary and ecological functional genomics. Nat Rev
Genet 4: 651-657.
FRANKE, R., M. R. HEMM, J. W. DENAULT, M. O. RUEGGER, J. M. HUMPHREYS et al., 2002 Changes in
secondary metabolism and deposition of an unusual lignin in the ref8 mutant of Arabidopsis.
Plant Journal 30: 47-59.
FU, Y. X., and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693-709.
GARCIA, M. V., and P. K. INGVARSSON, 2007 An excess of nonsynonymous polymorphism and extensive
haplotype structure at the PtABI1B locus in European aspen (Populus tremula): a case of
balancing selection in an obligately outcrossing plant? Heredity 99: 381-388.
GEISLER-LEE, J., M. GEISLER, P. M. COUTINHO, B. SEGERMAN, N. NISHIKUBO et al., 2006 Poplar
carbohydrate-active enzymes. Gene identification and expression analyses. Plant Physiology
140: 946-962.
GILL, G. P., G. R. BROWN and D. B. NEALE, 2003 A sequence mutation in the cinnamyl alcohol
dehydrogenase gene associated with altered lignification in loblolly pine. Plant Biotechnol J 1:
253-258.
GONZALEZ-MARTINEZ, S. C., E. ERSOZ, G. R. BROWN, N. C. WHEELER and D. B. NEALE, 2006 DNA sequence
variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics 172: 1915-1926.
GONZALEZ-MARTINEZ, S. C., D. HUBER, E. ERSOZ, J. M. DAVIS and D. B. NEALE, 2008 Association genetics
in Pinus taeda L. II. Carbon isotope discrimination. Heredity 101: 19-26.
GONZALEZ-MARTINEZ, S. C., N. C. WHEELER, E. ERSOZ, C. D. NELSON and D. B. NEALE, 2007 Association
genetics in Pinus taeda L. I. Wood property traits. Genetics 175: 399-409.
GOUDET, J., 2005 HIERFSTAT, a package for R to compute and test hierarchical F-statistics.
Molecular Ecology Notes 5: 184-186.
GROOVER, A. T., 2007 Will genomics guide a greener forest biotech? Trends Plant Sci 12: 234-238.
GUESS, H. A., and W. J. EWENS, 1972 Theoretical and simulation results relating to the neutral allele
theory. Theor Popul Biol 3: 434-447.
HAIGLER, C. H., and R. L. BLANTON, 1996 New hope for old dreams: Evidence that plant cellulose
synthase genes have finally been identified. Proceedings of the National Academy of Sciences
of the United States of America 93: 12082-12085.
HAIGLER, C. H., M. IVANOVA-DATCHEVA, P. S. HOGAN, V. V. SALNIKOV, S. HWANG et al., 2001 Carbon
partitioning to cellulose synthesis. Plant Molecular Biology 47: 29-51.
HARDING, S. A., J. LESHKEVICH, V. L. CHIANG and C. J. TSAI, 2002 Differential substrate inhibition couples
kinetically distinct 4-coumarate : coenzyme A ligases with spatially distinct metabolic roles in
quaking aspen. Plant Physiology 128: 428-438.
HILL, W. G., and A. ROBERTSON, 1968 Effects of Inbreeding at Loci with Heterozygote Advantage.
Genetics 60: 615-&.
HOFFMANN, B., B. CHABBERT, B. MONTIES and T. SPECK, 2003 Mechanical, chemical and X-ray analysis
of wood in the two tropical lianas Bauhinia guianensis and Condylocarpon guianense:
variations during ontogeny. Planta 217: 32-40.
HOFFMANN, L., S. BESSEAU, P. GEOFFROY, C. RITZENTHALER, D. MEYER et al., 2004 Silencing of
hydroxycinnamoyl-coenzyme A shikimate/quinate hydroxycinnamoyltransferase affects
phenylpropanoid biosynthesis. Plant Cell 16: 1446-1465.
HOFFMANN, L., S. MAURY, F. MARTZ, P. GEOFFROY and M. LEGRAND, 2003 Purification, cloning, and
properties of an acyltransferase controlling shikimate and quinate ester intermediates in
phenylpropanoid metabolism. Journal of Biological Chemistry 278: 95-103.
HU, W. J., S. A. HARDING, J. LUNG, J. L. POPKO, J. RALPH et al., 1999 Repression of lignin biosynthesis
promotes cellulose accumulation and growth in transgenic trees. Nature Biotechnology 17:
808-812.
HUDSON, P. J., A. P. DOBSON, I. M. CATTADORI, D. NEWBORN, D. T. HAYDON et al., 2002 Trophic interactions
and population growth rates: describing patterns and identifying mechanisms. Philos Trans R
Soc Lond B Biol Sci 357: 1259-1271.
INGVARSSON, P. K., 2005 Nucleotide polymorphism and linkage disequilibrium within and among natural
populations of European aspen (Populus tremula L., Salicaceae). Genetics 169: 945-953.
INGVARSSON, P. K., 2009 Natural selection on synonymous and non-synonymous mutations shape
patterns of polymorphism in Populus tremula. Mol Biol Evol.
INGVARSSON, P. K., M. V. GARCIA, V. LUQUEZ, D. HALL and S. JANSSON, 2008 Nucleotide polymorphism
and phenotypic associations within and around the phytochrome B2 Locus in European aspen
(Populus tremula, Salicaceae). Genetics 178: 2217-2226.
JOSHI, C. P., S. BHANDARI, P. RANJAN, U. C. KALLURI, X. LIANG et al., 2004 Genomics of cellulose
biosynthesis in poplars. New Phytologist 164: 53-61.
KAO, Y. Y., S. A. HARDING and C. J. TSAI, 2002 Differential expression of two distinct phenylalanine
ammonia-lyase genes in condensed tannin-accumulating and lignifying cells of quaking aspen.
Plant Physiology 130: 796-807.
KELLEHER, C. T., R. CHIU, H. SHIN, I. E. BOSDET, M. I. KRZYWINSKI et al., 2007 A physical map of the highly
heterozygous Populus genome: integration with the genome sequence and genetic map and
analysis of haplotype variation. Plant J 50: 1063-1078.
KUMAR, M., S. THAMMANNAGOWDA, V. BULONE, V. CHIANG, K. H. HAN et al., 2009 An update on the
nomenclature for the cellulose synthase genes in Populus. Trends in Plant Science 14: 248-254.
LANDEGREN, U., M. NILSSON and P. Y. KWOK, 1998 Reading bits of genetic information: methods for
single-nucleotide polymorphism analysis. Genome Res 8: 769-776.
LAPIERRE, C., B. POLLET, M. PETIT-CONIL, G. TOVAL, J. ROMERO et al., 1999 Structural alterations of lignins
in transgenic poplars with depressed cinnamyl alcohol dehydrogenase or caffeic acid O-methyltransferase activity have an opposite impact on the efficiency of industrial kraft pulping.
Plant Physiology 119: 153-163.
LEPLE, J. C., R. DAUWE, K. MORREEL, V. STORME, C. LAPIERRE et al., 2007 Downregulation of cinnamoyl-coenzyme A reductase in poplar: multiple-level phenotyping reveals effects on cell wall
polymer metabolism and structure. Plant Cell 19: 3669-3691.
LI, L., Y. ZHOU, X. CHENG, J. SUN, J. M. MARITA et al., 2003 Combinatorial modification of multiple lignin
traits in trees through multigene cotransformation. Proc Natl Acad Sci U S A 100: 4939-4944.
LI, Y., S. KAJITA, S. KAWAI, Y. KATAYAMA and N. MOROHOSHI, 2003 Down-regulation of an anionic
peroxidase in transgenic aspen and its effect on lignin characteristics. J Plant Res 116: 175-182.
LU, S. F., Y. H. ZHOU, L. G. LI and V. L. CHIANG, 2006 Distinct roles of cinnamate 4-hydroxylase genes in
Populus. Plant and Cell Physiology 47: 905-914.
MACKAY, J., D. R. DIMMEL and J. J. BOON, 2001 Pyrolysis mass spectral characterization of wood from
CAD-deficient pine. Journal of Wood Chemistry and Technology 21: 19-29.
MARCHINI, J., L. R. CARDON, M. S. PHILLIPS and P. DONNELLY, 2004 The effects of human population
structure on large genetic association studies. Nat Genet 36: 512-517.
MARJORAM, P., J. MOLITOR, V. PLAGNOL and S. TAVARE, 2003 Markov chain Monte Carlo without
likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.
MARTH, G. T., I. KORF, M. D. YANDELL, R. T. YEH, Z. J. GU et al., 1999 A general approach to single-nucleotide polymorphism discovery. Nature Genetics 23: 452-456.
MENG, M., M. GEISLER, H. JOHANSSON, E. J. MELLEROWICZ, S. KARPINSKI et al., 2007 Differential
tissue/organ-dependent expression of two sucrose- and cold-responsive genes for UDP-glucose pyrophosphorylase in Populus. Gene 389: 186-195.
MEYERMANS, H., K. MORREEL, C. LAPIERRE, B. POLLET, A. DE BRUYN et al., 2000 Modifications in Lignin
and Accumulation of Phenolic Glucosides in Poplar Xylem upon Down-regulation of Caffeoyl-Coenzyme A O-Methyltransferase, an Enzyme Involved in Lignin Biosynthesis. J. Biol. Chem.
275: 36899-36909.
NEALE, D. B., and O. SAVOLAINEN, 2004 Association genetics of complex traits in conifers. Trends Plant
Sci 9: 325-330.
NICKERSON, D. A., V. O. TOBE and S. L. TAYLOR, 1997 PolyPhred: Automating the detection and
genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic
Acids Research 25: 2745-2751.
OAKLEY, R. V., Y. S. WANG, W. RAMAKRISHNA, S. A. HARDING and C. J. TSAI, 2007 Differential expansion
and expression of alpha- and beta-tubulin gene families in Populus. Plant Physiol 145: 961-973.
OLIPHANT, A., D. L. BARKER, J. R. STUELPNAGEL and M. S. CHEE, 2002 BeadArray technology: enabling an
accurate, cost-effective approach to high-throughput genotyping. Biotechniques Suppl: 56-58,
60-51.
OSAKABE, K., C. C. TSAO, L. G. LI, J. L. POPKO, T. UMEZAWA et al., 1999 Coniferyl aldehyde 5-hydroxylation and methylation direct syringyl lignin biosynthesis in angiosperms. Proceedings
of the National Academy of Sciences of the United States of America 96: 8955-8960.
OSAKABE, Y., Y. OHTSUBO, S. KAWAI, Y. KATAYAMA and N. MOROHOSHI, 1995 Structure and Tissue-Specific Expression of Genes for Phenylalanine Ammonia-Lyase from a Hybrid Aspen,
Populus-Kitakamiensis. Plant Science 105: 217-226.
PARVATHI, K., F. CHEN, D. J. GUO, J. W. BLOUNT and R. A. DIXON, 2001 Substrate preferences of O-methyltransferases in alfalfa suggest new pathways for 3-O-methylation of monolignols. Plant
Journal 25: 193-202.
PAVY, N., B. PELGAS, S. BEAUSEIGLE, S. BLAIS, F. GAGNON et al., 2008 Enhancing genetic mapping of
complex genomes through the design of highly-multiplexed SNP arrays: application to the
large and unsequenced genomes of white spruce and black spruce. BMC Genomics 9: 21.
PETER, G., and D. NEALE, 2004 Molecular basis for the evolution of xylem lignification. Curr Opin Plant
Biol 7: 737-742.
PICHON, M., I. COURBOU, M. BECKERT, A. M. BOUDET and J. GRIMA-PETTENATI, 1998 Cloning and
characterization of two maize cDNAs encoding Cinnamoyl-CoA Reductase (CCR) and
differential expression of the corresponding genes. Plant Molecular Biology 38: 671-676.
PILATE, G., E. GUINEY, K. HOLT, M. PETIT-CONIL, C. LAPIERRE et al., 2002 Field and pulping performances
of transgenic trees with altered lignification. Nature Biotechnology 20: 607-612.
PINCON, G., M. CHABANNES, C. LAPIERRE, B. POLLET, K. RUEL et al., 2001 Simultaneous down-regulation
of caffeic/5-hydroxy ferulic acid-O-methyltransferase I and cinnamoyl-coenzyme a reductase in
the progeny from a cross between tobacco lines homozygous for each transgene.
Consequences for plant development and lignin synthesis. Plant Physiology 126: 145-155.
PLOMION, C., G. LEPROVOST and A. STOKES, 2001 Wood formation in trees. Plant Physiol 127: 1513-1523.
POKE, F. S., R. E. VAILLANCOURT, R. C. ELLIOTT and J. B. REID, 2003 Sequence variation in two lignin
biosynthesis genes, cinnamoyl CoA reductase (CCR) and cinnamyl alcohol dehydrogenase 2
(CAD2). Molecular Breeding 12: 107-118.
PRICE, A. L., N. J. PATTERSON, R. M. PLENGE, M. E. WEINBLATT, N. A. SHADICK et al., 2006 Principal
components analysis corrects for stratification in genome-wide association studies. Nature
Genetics 38: 904-909.
RAGAUSKAS, A. J., C. K. WILLIAMS, B. H. DAVISON, G. BRITOVSEK, J. CAIRNEY et al., 2006 The path forward
for biofuels and biomaterials. Science 311: 484-489.
RALPH, J., T. AKIYAMA, H. KIM, F. C. LU, P. F. SCHATZ et al., 2006 Effects of coumarate 3-hydroxylase
down-regulation on lignin structure. Journal of Biological Chemistry 281: 8843-8853.
RALPH, S., C. ODDY, D. COOPER, H. YUEH, S. JANCSIK et al., 2006 Genomics of hybrid poplar (Populus
trichocarpa × deltoides) interacting with forest tent caterpillars (Malacosoma disstria):
normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray
for the study of insect-induced defences in poplar. Mol Ecol 15: 1275-1297.
RANOCHA, P., M. CHABANNES, S. CHAMAYOU, S. DANOUN and A. JAUNEAU, 2002 Laccase down-regulation
causes alterations in phenolic metabolism and cell wall structure in poplar. Plant Physiol. 129:
145.
RANOCHA, P., G. MCDOUGALL, S. HAWKINS, R. STERJIADES, G. BORDERIES et al., 1999 Biochemical
characterization, molecular cloning and expression of laccases - a divergent gene family - in
poplar. European Journal of Biochemistry 259: 485-495.
REMINGTON, D. L., J. M. THORNSBERRY, Y. MATSUOKA, L. M. WILSON, S. R. WHITT et al., 2001 Structure of
linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U
S A 98: 11479-11484.
RUBIN, E. M., 2008 Genomics of cellulosic biofuels. Nature 454: 841-845.
SCHAID, D. J., C. M. ROWLAND, D. E. TINES, R. M. JACOBSON and G. A. POLAND, 2002 Score tests for
association between traits and haplotypes when linkage phase is ambiguous. Am J Hum
Genet 70: 425-434.
SCHRADER, J., J. NILSSON, E. MELLEROWICZ, A. BERGLUND, P. NILSSON et al., 2004 A high-resolution
transcript profile across the wood-forming meristem of poplar identifies potential regulators of
cambial stem cell identity. Plant Cell 16: 2278-2292.
SEWALT, V. J. H., W. T. NI, H. G. JUNG and R. A. DIXON, 1997 Lignin impact on fiber degradation:
Increased enzymatic digestibility of genetically engineered tobacco (Nicotiana tabacum) stems
reduced in lignin content. Journal of Agricultural and Food Chemistry 45: 1977-1983.
STERKY, F., R. R. BHALERAO, P. UNNEBERG, B. SEGERMAN, P. NILSSON et al., 2004 A Populus EST
resource for plant functional genomics. Proceedings of the National Academy of Sciences of
the United States of America 101: 13951-13956.
STERKY, F., S. REGAN, J. KARLSSON, M. HERTZBERG, A. ROHDE et al., 1998 Gene discovery in the wood-forming tissues of poplar: analysis of 5,692 expressed sequence tags. Proc Natl Acad Sci U S
A 95: 13330-13335.
STINCHCOMBE, J. R., and H. E. HOEKSTRA, 2008 Combining population genomics and quantitative
genetics: finding the genes underlying ecologically important traits. Heredity 100: 158-170.
STOREY, J. D., 2002 A direct approach to false discovery rates. Journal of the Royal Statistical Society,
Series B (Methodological) 64: 479-498.
STOREY, J. D., and R. TIBSHIRANI, 2003 Statistical significance for genomewide studies. Proc Natl Acad
Sci U S A 100: 9440-9445.
STRAUSS, S. H., and F. M. MARTIN, 2004 Poplar genomics comes of age. New Phytologist 164: 1-4.
SUZUKI, S., L. G. LI, Y. H. SUN and V. L. CHIANG, 2006 The cellulose synthase gene superfamily and
biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa.
Plant Physiology 142: 1233-1245.
TAYLOR, G., 2002 Populus: arabidopsis for forestry. Do we need a model tree? Ann Bot 90: 681-689.
THUMMA, B. R., M. F. NOLAN, R. EVANS and G. F. MORAN, 2005 Polymorphisms in cinnamoyl CoA
reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics
171: 1257-1265.
TSAI, C. J., S. A. HARDING, T. J. TSCHAPLINSKI, R. L. LINDROTH and Y. YUAN, 2006 Genome-wide analysis
of the structural genes regulating defense phenylpropanoid metabolism in Populus. New
Phytologist 172: 47-62.
TSAI, C. J., J. L. POPKO, M. R. MIELKE, W. J. HU, G. K. PODILA et al., 1998 Suppression of O-methyltransferase gene by homologous sense transgene in quaking aspen causes red-brown
wood phenotypes. Plant Physiology 117: 101-112.
TUSKAN, G. A., S. DIFAZIO, S. JANSSON, J. BOHLMANN, I. GRIGORIEV et al., 2006 The genome of black
cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596-1604.
UNNEBERG, P., M. STROMBERG and F. STERKY, 2005 SNP discovery using advanced algorithms and
neural networks. Bioinformatics 21: 2528-2530.
VAN DOORSSELAERE, J., M. BAUCHER, E. CHOGNOT, B. CHABBERT, M. T. TOLLIER et al., 1995 A novel lignin
in poplar trees with a reduced caffeic acid 5-hydroxyferulic acid O-methyltransferase activity.
Plant Journal 8: 855-864.
WAGNER, A., J. RALPH, T. AKIYAMA, H. FLINT, L. PHILLIPS et al., 2007 Exploring lignification in conifers by
silencing hydroxycinnamoyl-CoA : shikimate hydroxycinnamoyltransferase in Pinus radiata.
Proceedings of the National Academy of Sciences of the United States of America 104: 11856-11861.
WEGRZYN, J. L., J. M. LEE, J. LIECHTY and D. B. NEALE, 2009 PineSAP--sequence alignment and SNP
identification pipeline. Bioinformatics 25: 2609-2610.
WHETTEN, R. W., J. J. MACKAY and R. R. SEDEROFF, 1998 Recent Advances in Understanding Lignin
Biosynthesis. Annu Rev Plant Physiol Plant Mol Biol 49: 585-609.
WULLSCHLEGER, S. D., S. JANSSON and G. TAYLOR, 2002 Genomics and forest biology: Populus emerges
as the perennial favorite. Plant Cell 14: 2651-2655.
XU, Z., D. ZHANG, J. HU, X. ZHOU, X. YE et al., 2009 Comparative genome analysis of lignin biosynthesis
gene families across the plant kingdom. BMC Bioinformatics 10 Suppl 11: S3.
YU, X. Q., H. W. MEI, L. J. LUO, G. L. LIU, H. Y. LIU et al., 2006 Dissection of additive, epistatic effect and
Q x E interaction of quantitative trait loci influencing stigma exsertion under water stress in rice.
Yi Chuan Xue Bao 33: 542-550.
ZHAO, H. Y., J. H. WEI, J. Y. ZHANG, H. R. LIU, T. WANG et al., 2002 Lignin biosynthesis by suppression of
two O-methyltransferases. Chinese Science Bulletin 47: 1092-1095.
ZHONG, R. Q., W. H. MORRISON, D. S. HIMMELSBACH, F. L. POOLE and Z. H. YE, 2000 Essential role of
caffeoyl coenzyme A O-methyltransferase in lignin biosynthesis in woody poplar plants. Plant
Physiology 124: 563-577.
Table 1. Details of Candidate Genes selected for Resequencing
Gene Family | Gene | JGI Gene Model | Amplicons | SNPs Targeted | SNPs Converted
4-coumarate:CoA ligase (4CL) | 4CL1+ | estExt_fgenesh4_pg.C_1210004 | 10 | 42 | 28
4-coumarate:CoA ligase (4CL) | 4CL3+ | grail3.0100002702 | 3 | 9 | 5
4-coumarate:CoA ligase (4CL) | 4CL5 | fgenesh4_pg.C_LG_III001773 | 3 | 19 | 15
coumarate 3-hydroxylase (C3H) | C3H3 | fgenesh4_pg.C_LG_VI000268 | 3 | 10 | 10
cinnamate 4-hydroxylase (C4H) | C4H1* | grail3.0094002901 | 2 | 9 | 7
cinnamate 4-hydroxylase (C4H) | C4H2* | estExt_fgenesh4_pg.C_LG_XIII0519 | 3 | 15 | 10
cinnamyl alcohol dehydrogenase (CAD) | CAD*+ | estExt_Genewise1_v1.C_LG_IX2359 | 7 | 56 | 30
cinnamoyl-CoA reductase (CCR) | CCR*+ | estExt_fgenesh4_kg.C_LG_III0056 | 5 | 21 | 17
cellulose synthase (CesA) | CesA1A*+ | gw1.XI.3218.1 | 10 | 43 | 27
cellulose synthase (CesA) | CesA1B*+ | eugene3.00040363 | 10 | 65 | 39
cellulose synthase (CesA) | CesA2A* | gw1.XVIII.3152.1 | 6 | 25 | 19
cellulose synthase (CesA) | CesA2B*+ | estExt_Genewise1_v1.C_LG_VI2188 | 6 | 32 | 18
cellulose synthase (CesA) | CesA3A+ | eugene3.00002636 | 7 | 38 | 29
hydroxycinnamoyl-CoA quinate/shikimate hydroxycinnamoyltransferase (HCT) | HCT1*+ | fgenesh4_pg.C_LG_III001559 | 5 | 33 | 18
hydroxycinnamoyl-CoA quinate/shikimate hydroxycinnamoyltransferase (HCT) | HCT6* | eugene3.02080010 | 5 | 27 | 18
cellulase (KOR) | KOR1* | estExt_fgenesh4_pg.C_LG_I0683 | 4 | 12 | 9
laccase (LAC) | LAC1A* | estExt_fgenesh4_pg.C_LG_XVI1027 | 5 | 15 | 13
laccase (LAC) | LAC2 | estExt_fgenesh4_pg.C_LG_VIII0541 | 2 | 7 | 5
laccase (LAC) | LAC90A | estExt_fgenesh4_pm.C_LG_VIII0291 | 5 | 29 | 16
phenylalanine ammonia-lyase (PAL) | PAL2* | estExt_fgenesh4_pg.C_LG_VIII0293 | 5 | 17 | 11
phenylalanine ammonia-lyase (PAL) | PAL4 | gw1.X.2713.1 | 4 | 24 | 11
phenylalanine ammonia-lyase (PAL) | PAL5 | estExt_fgenesh4_pg.C_LG_X2023 | 4 | 24 | 11
S-adenosylmethionine synthetase (SAMS) | SAM1*+ | eugene3.00080928 | 3 | 20 | 12
serine hydroxymethyltransferase (SHMT) | SHMT1 | eugene3.00012227 | 3 | 17 | 9
serine hydroxymethyltransferase (SHMT) | SHMT3 | grail3.0003095602 | 6 | 26 | 14
serine hydroxymethyltransferase (SHMT) | SHMT6 | estExt_fgenesh4_pm.C_880008 | 3 | 14 | 10
sucrose synthase (SUSY) | SUSY1*+ | estExt_fgenesh4_pm.C_LG_XVIII0009 | 6 | 24 | 16
alpha-tubulin (TUA) | TUA1 | gw1.II.3483.1 | 3 | 12 | 9
alpha-tubulin (TUA) | TUA5+ | eugene3.00090803 | 3 | 19 | 11
beta-tubulin (TUB) | TUB15 | estExt_Genewise1_v1.C_LG_I1970 | 4 | 27 | 18
beta-tubulin (TUB) | TUB16 | estExt_fgenesh4_pm.C_LG_IX0457 | 3 | 13 | 7
beta-tubulin (TUB) | TUB9 | eugene3.00010909 | 3 | 18 | 12
caffeoyl CoA O-methyltransferase (CCoAOMT) | CoAOMT1* | grail3.0001059501 | 3 | 14 | 7
caffeoyl CoA O-methyltransferase (CCoAOMT) | CoAOMT2 | estExt_fgenesh4_pm.C_LG_I1023 | 3 | 24 | 16
caffeate O-methyltransferase (COMT) | COMT1 | estExt_fgenesh4_pm.C_LG_XV0035 | 4 | 26 | 14
caffeate O-methyltransferase (COMT) | COMT2* | estExt_fgenesh4_pm.C_LG_XII0129 | 4 | 17 | 13
ferulate 5-hydroxylase (F5H) | F5H1 | estExt_fgenesh4_pm.C_570058 | 4 | 14 | 9
ferulate 5-hydroxylase (F5H) | F5H2 | eugene3.00071182 | 3 | 9 | 5
glycine decarboxylase complex, H (gdcH) | gdcH1 | estExt_fgenesh4_pg.C_LG_XII1299 | 6 | 36 | 22
glycine decarboxylase complex, T (gdcT) | gdcT2 | eugene3.02520018 | 3 | 10 | 8
*Genes with significant single marker associations
+Genes with significant haplotype-based associations
SNPs Targeted – SNPs identified and sent for genotyping on the Illumina Golden Gate assay
SNPs Converted – SNPs successfully genotyped on the Illumina Golden Gate assay
Table 2. Major Peak Assignments from the Pyrolysis Molecular Beam Mass Spectrometry
m/z | Assignment | (S) or (G) Precursor
57, 73, 85, 96, 114 | C5 sugars |
57, 60, 73, 98, 126, 144 | C6 sugars |
124 | guaiacol | G
137 | ethylguaiacol, homovanillin, coniferyl alcohol | G
138 | methylguaiacol | G
150 | vinylguaiacol, coumaryl alcohol | G
152 | 4-ethylguaiacol, vanillin | G
154 | syringol | S
164 | allyl- + propenyl guaiacol | G
167 | ethylsyringol, syringylacetone, propiosyringone | S
168 | 4-methyl-2,6-dimethoxyphenol | S
178 | coniferyl aldehyde | G
180 | coniferyl alcohol, vinylsyringol, -D-glucose | G, S
182 | syringaldehyde | S
194 | 4-propenylsyringol | S
208 | sinapyl aldehyde | S
210 | sinapyl alcohol | S
m/z – mass to charge ratio
S – syringyl peaks
G – guaiacyl peaks
Table 3. List of significant marker-trait pairs after a correction for multiple testing (FDR Q ≤ 0.10)
Trait | Gene Symbol | SNP | F | P | N | R2 | Q
Lignin | C4H1_04-219 | [A:C]ns | 4.6766 | 0.0013 | 433 | 0.0187 | 0.0395
Lignin | C4H2_09-169 | [A:C]nc | 9.8329 | 0.0001 | 433 | 0.0384 | 0.0178
Lignin | CCR_08-554 | [A:G]nc | 5.9541 | 0.0014 | 435 | 0.0119 | 0.0395
Lignin | CesA1A_20-226 | [A:G]nc | 4.0516 | 0.0015 | 432 | 0.0163 | 0.0402
Lignin | CesA1B_02-87 | [A:G]nc | 4.0226 | 0.0024 | 427 | 0.0163 | 0.0482
Lignin | CesA1B_04-127 | [A:C]nc | 5.7095 | 0.001 | 432 | 0.0227 | 0.0288
Lignin | CesA1B_08-261 | [A:G]nc | 5.4417 | 0.001 | 434 | 0.0216 | 0.0288
Lignin | CesA2A_08-38 | [A:G]ns | 8.6111 | 0.0011 | 431 | 0.0172 | 0.0288
Lignin | HCT1_03-246 | [A:G]nc | 3.9879 | 0.0027 | 434 | 0.0159 | 0.0482
Lignin | HCT6_13-225 | [A:G]s | 5.4364 | 0.0001 | 433 | 0.0217 | 0.0178
Lignin | SUSY1_02-108 | [A:T]nc | 6.7751 | 0.0001 | 433 | 0.0268 | 0.0178
Lignin | SUSY1_10-258 | [A:C]nc | 6.7036 | 0.0001 | 433 | 0.0265 | 0.0178
Lignin | SUSY1_14-94 | [A:G]nc | 3.7898 | 0.0027 | 434 | 0.0152 | 0.0482
S/G | CAD_04-185 | [A:T]nc | 6.5211 | 0.001 | 325 | 0.0322 | 0.0288
C6 | C4H2_09-169 | [A:C]nc | 9.9962 | 0.003 | 433 | 0.0373 | 0.0444
C6 | C4H2_12-151 | [A:G]s | 3.535 | 0.0027 | 429 | 0.0137 | 0.0487
C6 | CAD_04-185 | [A:T]nc | 3.4754 | 0.0033 | 325 | 0.0175 | 0.0487
C6 | CesA1A_02-481 | [A:C]nc | 5.6615 | 0.008 | 432 | 0.0216 | 0.0766
C6 | CesA1A_12-40 | [A:G]s | 3.593 | 0.0033 | 433 | 0.0138 | 0.0487
C6 | CesA1A_20-226 | [A:G]nc | 5.7281 | 0.003 | 432 | 0.0218 | 0.0444
C6 | CesA1B_04-127 | [A:C]nc | 3.4124 | 0.0037 | 432 | 0.0131 | 0.0487
C6 | CesA1B_08-261 | [A:G]nc | 3.4197 | 0.0035 | 434 | 0.0131 | 0.0487
C6 | CesA2A_08-38 | [A:G]ns | 8.1264 | 0.008 | 431 | 0.0157 | 0.0766
C6 | CesA2B_01-162 | [A:C]nc | 3.5186 | 0.0026 | 431 | 0.0135 | 0.0487
C6 | CoAOMT1_08-313 | [A:G]nc | 7.2144 | 0.002 | 431 | 0.0272 | 0.0344
C6 | COMT2_10-423 | [A:C]nc | 3.3459 | 0.0046 | 397 | 0.014 | 0.0487
C6 | HCT6_13-225 | [A:G]s | 3.1681 | 0.0044 | 433 | 0.0122 | 0.0487
C6 | LAC1a_03-98 | [A:G]nc | 4.6733 | 0.0013 | 424 | 0.0184 | 0.0487
C6 | LAC1a_11-493 | [A:G]nc | 2.8918 | 0.0049 | 433 | 0.0111 | 0.0487
C6 | PAL2_04-212 | [A:G]nc | 3.5212 | 0.0021 | 432 | 0.0135 | 0.0487
C6 | SAM1_09-195 | [A:T]nc | 4.1603 | 0.0015 | 422 | 0.0162 | 0.0487
C6 | SUSY1_02-108 | [A:T]nc | 5.4253 | 0.004 | 433 | 0.0207 | 0.0366
C6 | SUSY1_02-396 | [A:G]nc | 3.0401 | 0.0042 | 430 | 0.0118 | 0.0487
C6 | SUSY1_02-503 | [A:G]nc | 3.1539 | 0.0031 | 432 | 0.0122 | 0.0487
C6 | SUSY1_10-258 | [A:C]nc | 4.3819 | 0.0014 | 433 | 0.0168 | 0.0487
C6 | SUSY1_14-128 | [A:T]nc | 3.2779 | 0.0035 | 434 | 0.0126 | 0.0487
C6 | SUSY1_14-94 | [A:G]nc | 3.2779 | 0.0045 | 434 | 0.0126 | 0.0487
ns Nonsynonymous polymorphism.
s Synonymous polymorphism.
nc Noncoding polymorphism.
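The Q values in Table 3 are false discovery rate estimates. As a rough illustration of how an FDR adjustment ranks and rescales a set of P values, the sketch below applies the Benjamini-Hochberg step-up adjustment. This is a simpler, swapped-in procedure and is not claimed to be the one that produced the tabulated Q values (the reference list cites the q-value approach of Storey 2002 and Storey and Tibshirani 2003). The function name `bh_adjust` and the example P values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (illustrative, not Storey's q-value)."""
    m = len(pvalues)
    # Indices of the tests ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four marker-trait tests.
example_p = [0.01, 0.04, 0.03, 0.20]
example_adj = bh_adjust(example_p)
# Tests retained at an FDR threshold of 0.10.
retained = [i for i, q in enumerate(example_adj) if q <= 0.10]
```

Under a threshold of 0.10, only tests whose adjusted value does not exceed 0.10 would be reported, which mirrors the filtering rule stated in the table caption.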
Table 4. List of marker effects for significant marker-trait pairs
Trait | SNP | 2a [b] | d [c] | d/a | 2a/s_p [d] | Frequency [e] | α [f]
Lignin | C4H1_04-219 | 1.1128 | -0.2097 | -0.3769 | 0.8912 | 0.17 (C) | 0.7467
Lignin | C4H2_09-169 | 5.4356 | -2.8933 | -1.0646 | 4.3533 | 0.01 (A) | 5.2266
Lignin | CesA1A_20-226 | 0.8812 | -0.2481 | -0.5631 | 0.7057 | 0.32 (A) | 0.3124
Lignin | CesA1B_02-87 | 0.7162 | 0.1487 | 0.4154 | 0.5736 | 0.30 (G) | -0.7869
Lignin | CesA1B_04-127 | 0.8864 | 0.1855 | 0.4185 | 0.7099 | 0.28 (A) | -0.6741
Lignin | CesA1B_08-261 | 0.8609 | 0.1475 | 0.3427 | 0.6895 | 0.28 (A) | -0.5574
Lignin | HCT1_03-246 | 1.7016 | -0.0340 | -0.0400 | 1.3628 | 0.07 (A) | 1.4719
Lignin | HCT6_13-225 | 1.1007 | -0.8865 | -1.6108 | 0.8815 | 0.12 (G) | 0.8653
Lignin | SUSY1_02-108 | 0.6518 | 0.5331 | 1.6356 | 0.5220 | 0.13 (T) | 0.3869
Lignin | SUSY1_10-258 | 0.4200 | 0.7762 | 3.6963 | 0.3363 | 0.07 (A) | 0.2358
Lignin | SUSY1_14-94 | 1.8197 | -0.1772 | -0.1947 | 1.4574 | 0.06 (A) | 1.5948
S/G | CAD_04-185 | 0.1655 | 0.0268 | 0.3236 | 0.7762 | 0.33 (A) | -0.4801
C6 | C4H2_09-169 | 9.3773 | 4.0891 | 0.8721 | 4.4291 | 0.01 (A) | -9.3344
C6 | C4H2_12-151 | 0.9478 | -0.2340 | -0.4938 | 0.4477 | 0.45 (G) | -0.7508
C6 | CAD_04-185 | 1.3827 | 0.3715 | 0.5373 | 0.6531 | 0.33 (A) | -8.8239
C6 | CesA1A_02-481 | 4.4213 | 2.1152 | 0.9568 | 2.0883 | 0.08 (C) | -4.2124
C6 | CesA1A_12-40 | 2.0992 | 0.2777 | 0.2645 | 0.9915 | 0.13 (A) | -1.8498
C6 | CesA1A_20-226 | 1.6669 | 0.5787 | 0.6943 | 0.7873 | 0.32 (A) | -1.1166
C6 | CesA1B_04-127 | 1.5740 | -0.3659 | -0.4649 | 0.7435 | 0.28 (A) | 0.7040
C6 | CesA1B_08-261 | 1.5529 | -0.2997 | -0.3860 | 0.7335 | 0.28 (A) | 0.8447
C6 | CesA2B_01-162 | 1.2053 | 0.2626 | 0.4358 | 0.5693 | 0.30 (A) | -0.9912
C6 | CoAOMT1_08-313 | 0.9396 | -0.4899 | -1.0429 | 0.4438 | 0.47 (A) | -0.0247
C6 | COMT2_10-423 | 1.3815 | -0.1172 | -0.1697 | 0.6525 | 0.44 (C) | -2.4781
C6 | HCT6_13-225 | 1.8043 | 1.2473 | 1.3826 | 0.8522 | 0.12 (G) | -1.7190
C6 | LAC1a_03-98 | 2.5171 | 0.2932 | 0.2329 | 1.1889 | 0.14 (G) | -2.7865
C6 | LAC1a_11-493 | 1.3099 | 0.0851 | 0.1299 | 0.6187 | 0.26 (A) | 0.6407
C6 | PAL2_04-212 | 0.9397 | -1.5024 | -3.1978 | 0.4438 | 0.05 (G) | -1.0696
C6 | SAM1_09-195 | 0.8770 | 0.7918 | 1.8058 | 0.4142 | 0.32 (A) | -1.4465
C6 | SUSY1_02-108 | 1.4921 | -0.6568 | -0.8804 | 0.7047 | 0.13 (T) | -1.3030
C6 | SUSY1_02-396 | 1.0363 | -0.1138 | -0.2196 | 0.4895 | 0.23 (G) | -1.0285
C6 | SUSY1_02-503 | 0.9770 | -0.1478 | -0.3024 | 0.4615 | 0.23 (G) | -0.8377
C6 | SUSY1_10-258 | 1.5603 | -0.8029 | -1.0292 | 0.7370 | 0.07 (A) | -1.4921
C6 | SUSY1_14-128 | 3.6518 | 0.5908 | 0.3236 | 1.7248 | 0.06 (A) | -3.3928
C6 | SUSY1_14-94 | 3.6518 | 0.5908 | 0.3236 | 1.7248 | 0.06 (A) | -3.3928
[b] Calculated as the difference between the phenotypic means observed within each homozygous class ($2a = |G_{BB} - G_{bb}|$, where $G_{ij}$ is the trait mean in the ijth genotypic class).
[c] Calculated as the difference between the phenotypic mean observed within the heterozygous class and the average phenotypic mean across both homozygous classes ($d = G_{Bb} - 0.5(G_{BB} + G_{bb})$, where $G_{ij}$ is the trait mean in the ijth genotypic class).
[d] $s_p$, standard deviation for the phenotypic trait under consideration.
[e] Allele frequency of either the derived or minor allele. SNP alleles corresponding to the frequency listed are given in parentheses.
[f] The additive effect was calculated as $\alpha = p_B(G_{BB}) + p_b(G_{Bb}) - G$, where $G$ is the overall trait mean, $G_{ij}$ is the trait mean in the ijth genotypic class and $p_i$ is the frequency of the ith marker allele. These values were always calculated with respect to the minor allele.
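The footnote definitions for Table 4 can be collected into one small calculation. The sketch below is a minimal illustration that follows the formulas as written in the footnotes; the function name `marker_effects`, its argument names, and the numeric inputs are hypothetical, and the frequency of the second allele is assumed to be one minus the frequency of the first.

```python
def marker_effects(g_BB, g_Bb, g_bb, p_B, trait_mean, trait_sd):
    """Marker-effect summaries following the Table 4 footnote definitions.

    g_BB, g_Bb, g_bb: trait means within the three genotypic classes.
    p_B: frequency of allele B (p_b is assumed to be 1 - p_B).
    trait_mean, trait_sd: overall trait mean and phenotypic standard deviation.
    """
    two_a = abs(g_BB - g_bb)                # difference between homozygote means
    d = g_Bb - 0.5 * (g_BB + g_bb)          # heterozygote deviation from midpoint
    a = two_a / 2.0
    p_b = 1.0 - p_B
    additive = p_B * g_BB + p_b * g_Bb - trait_mean  # additive effect, footnote f
    return {
        "2a": two_a,
        "d": d,
        "d/a": d / a if a > 0 else float("nan"),
        "2a/s_p": two_a / trait_sd,
        "additive": additive,
    }


# Hypothetical genotype-class means for a single SNP.
effects = marker_effects(24.0, 22.5, 22.0, p_B=0.25, trait_mean=22.4, trait_sd=1.25)
```

The same arithmetic can be used to check a row of the table: for C4H1_04-219 under lignin, 2a = 1.1128 and d = -0.2097 give d/a = -0.2097 / 0.5564 ≈ -0.3769, which agrees with the tabulated ratio.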
Table 5. List of haplotypes with significant associations to phenotype after a correction for multiple testing (FDR Q ≤ 0.10)
Amplicon | Trait | P | Q | Haplotypes | Single marker associations
4CL1_11 | lignin | 0.0042 | 0.0539 | 3 | 4CL1_11-108 (0.2278)a
4CL3_14 | lignin | 0.0021 | 0.0519 | 6 | 4CL3_13-464 (0.2041)a
CAD_04 | lignin | 0.0065 | 0.0578 | 9 | *CAD_04-185 (S/G, C6)
CCR_12 | lignin | 0.0060 | 0.0578 | 4 | CCR_12-366 (0.2168)a
CesA1B_10 | lignin | 0.0038 | 0.0539 | 6 | CesA1B_10-41 (0.3726)a
CesA2B_16 | lignin | 0.0055 | 0.0576 | 2 | CesA2B_16-423 (0.2967)a
CesA3A_09 | lignin | 0.0018 | 0.0519 | 5 | CesA3A_09-93 (0.2068)a
CesA3A_13 | lignin | 0.0022 | 0.0519 | 7 | CesA3A_13-535 (0.1777)a
HCT1_12 | lignin | 0.0016 | 0.0519 | 3 | HCT1_12-156 (0.1828)a
SUSY1_02 | lignin | 0.0053 | 0.0576 | 3 | *SUSY1_02-108 (lignin, C6); *SUSY1_02-396, SUSY1_02-503 (C6)
TUA5_09 | lignin | 0.0027 | 0.0521 | 7 | TUA5_09-73 (0.1899)a
4CL1_01 | C6 | 0.0000 | 0.0018 | 5 | 4CL1_01-468 (0.1668)a
CesA1A_12 | C6 | 0.0005 | 0.0231 | 6 | *CesA1A_12-40 (C6)
SAM1_07 | C6 | 0.0008 | 0.0239 | 5 | SAM1_07-480 (0.2874)a
Significant Haplotypes: TGC, AGC, CGT, GGT, GGA, CAAAAT, CATAAT, GATAAT, 0, AGA, 0, TAAAAA, CGGAA, CAAAT, CGGCT, CAACT, AA, GA, AAAA, TGGG, 0, AGA, AAA, AGA, AAA, GAG, AGAA, GGAA
Haplotype Frequency: 0.31, 0.94, 0.02, 0.09, 0.22, 0.03, 0.01, 0.02, 0.15, 0.01, 0.15, 0.04, 0.02, 0.57, 0.73, 0.08, 0.77, 0.13, 0.12, 0.02, 0.09, 0.04, 0.20, 0.01, 0.30
*Significant single marker associations (FDR Q ≤ 0.10) listed with the associated traits
a Single marker associations with the lowest Q value relating to the significant haplotype-trait association
Figure Legends
Figure 1: Descriptive information about the distribution, sampling localities, and population
structure across the range of black cottonwood. (A) Range map for black cottonwood. (B) Sample
locations across Oregon and Washington. Each point denotes a single tree (n = 448). (C) Population
structure estimates across the sampled range of black cottonwood. Colors designate the five
significant genetic clusters detected using PCA. Multiple colors denote points with multiple clones
assigned to different genetic clusters.
Figure 2: (A) Decay of LD with distance in base pairs between sites in two candidate genes: SUSY1
and C4H1. Squared correlation coefficients of allele frequencies (r²) are plotted against distance in base pairs.
The fitted curve represents the trend of decay of LD. (B) Decay of LD with distance in base pairs
between sites pooled across 39 genes. (C) Decay of LD across all candidate genes for the first 400
base pairs of the range presented in (B).
Figure 3: (A-C) An example of marker effects in the CesA1B gene on the lignin content phenotype.
Each marker explains a small portion of the phenotypic variance (r² ≈ 2-3%) and is consistent with
an additive model of gene action. Whiskers in the box plots represent 1.5 times the interquartile
range. (D) The 39 SNPs genotyped for the CesA1B gene are illustrated relative to the reference
gene model; the 3 of those 39 that were significant are indicated with an asterisk. Solid boxes
denote UTR, solid lines are introns, and open boxes indicate exons in the gene model.
Figure 4: Marker effects of the significant non-synonymous SNPs found in C4H1 and CesA2A. (A)
The C4H1_04-219 non-synonymous marker in the first exon of the C4H1 gene shows patterns
of gene action consistent with additive effects. The C allele at C4H1_04-219 causes a histidine (H) to
proline (P) amino acid substitution. (B) The CesA2A_08-38 non-synonymous marker is located in
the 6th exon of the CesA2A gene. This SNP is significant for both the lignin content and C6 traits. For
lignin content, the homozygote decreases the percent content, while for C6 the sugar content is
elevated. The G allele at CesA2A_08-38 is the derived state and is responsible for an isoleucine (I) to
valine (V) amino acid substitution. In both gene models, solid boxes denote UTR, solid lines are
introns, and open boxes indicate exons.
Figure 5: Haplotype and single marker associations are illustrated for SUSY1 and 4CL1. (A) The
genotypic effects of the three proposed haplotypes (two significant) of SUSY1 are shown. The
haplotypes yield significantly different median phenotypic values for the lignin content trait. The marker
effects of four significant single marker associations are also shown. SUSY1_02-108 is significant with
respect to lignin. The remaining markers are significant with respect to the related trait, C6 sugars. All
four markers are in LD with one another. (B) The genotypic effects of the three haplotypes (two
significant) of 4CL1 are shown. The significant haplotypes yield different median phenotypic values for
the lignin content trait. No significant single marker associations were identified after correction for
multiple testing; however, the box plots for single markers with P < 0.05 are shown. Two of the three
markers are in LD with one another.
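The LD statistic plotted in Figure 2 is r². As a minimal sketch of how such a value can be obtained for one pair of biallelic SNPs, the code below computes the squared Pearson correlation between genotype dosages (0, 1, or 2 copies of a chosen allele). This genotype-based estimate is an assumption made for illustration; the source does not state whether r² was computed from genotypes or from phased haplotype frequencies. The function name `genotype_r2` and the example genotypes are hypothetical.

```python
def genotype_r2(x, y):
    """Squared correlation between two SNPs coded as allele dosages (0/1/2).

    Individuals with a missing call (None) at either SNP are skipped.
    Returns NaN when either SNP is monomorphic among the retained individuals.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical dosages for six trees at two SNPs; None marks a missing call.
snp_1 = [0, 1, 2, 0, 1, 2]
snp_2 = [0, 1, 2, 0, None, 2]
r2_example = genotype_r2(snp_1, snp_2)
```

Plotting such pairwise values against the distance in base pairs between the two sites gives the kind of scatter to which a decay curve is fitted.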
SUPPORTING INFORMATION
TABLE S1. Sample localities for the 448 individuals used for association mapping in P. trichocarpa
Sample ID | Latitude | Longitude
349 | 44.733 | -123.067
359 | 47.850 | -121.817
365 | 47.050 | -122.700
367 | 47.050 | -122.700
457 | . | .
460 | 47.067 | -122.200
521 | . | .
552 | 47.067 | -122.200
1862 | 45.750 | -122.833
1863 | 45.750 | -122.833
1909 | 45.583 | -122.383
1921 | 45.583 | -122.200
1922 | 45.583 | -122.200
1944 | 45.583 | -122.000
1950 | 45.583 | -121.917
1983 | 46.083 | -122.833
1984 | 46.083 | -122.833
2022 | 45.750 | -122.667
2028 | 45.833 | -122.833
2037 | 45.833 | -122.833
2045 | 46.083 | -123.917
2048 | 46.083 | -123.917
2063 | 46.167 | -123.000
2066 | 46.167 | -123.000
2076 | 46.333 | -123.417
2092 | 46.167 | -123.333
2103 | 46.000 | -123.000
2116 | 46.000 | -123.083
2118 | 46.000 | -123.083
2136 | 46.167 | -122.917
2151 | 46.333 | -122.917
2159 | 46.333 | -122.917
2161 | 46.333 | -122.667
2165 | 46.333 | -122.667
2166 | 46.333 | -122.667
2175 | 46.333 | -122.667
2204 | 45.617 | -122.583
2213 | 45.617 | -122.583
2220 | 43.750 | -122.500
2228 | 43.750 | -122.500
2236 | 46.167 | -123.083
2257 | 45.567 | -123.000
2283 | 45.833 | -123.117
2299 | 45.833 | -122.733
2322 | 45.750 | -122.750
2325 | 45.333 | -122.717
2327 | 45.333 | -122.717
2343 | 45.250 | -122.867
2345 | 45.083 | -123.033
2351 | 45.083 | -123.033
2356 | 45.083 | -123.033
2358 | 44.833 | -123.167
2361 | 44.833 | -123.167
2365 | 44.833 | -123.167
2368 | 44.833 | -123.167
2392 | 44.700 | -123.167
2393 | 44.700 | -123.167
2402 | 44.733 | -123.083
2405 | 44.733 | -123.083
2408 | 44.733 | -123.083
2428 | 45.417 | -122.600
2451 | 45.567 | -122.667
2480 | 45.317 | -122.567
2501 | 45.317 | -122.567
2505 | 45.283 | -122.633
2506 | 45.283 | -122.633
2515 | 45.283 | -122.633
2518 | 45.083 | -122.750
2525 | 45.083 | -122.750
2529 | 45.083 | -122.750
2532 | 44.417 | -123.333
2534 | 44.417 | -123.333
2538 | 44.417 | -123.333
2549 | 44.250 | -123.250
2551 | 44.250 | -123.250
2572 | 44.083 | -123.167
2573 | 44.083 | -123.167
2577 | 44.083 | -123.167
2583 | 44.000 | -122.917
2590 | 44.000 | -122.917
2591 | 44.000 | -122.917
2597 | 44.000 | -122.917
2616 | 44.333 | -123.417
2631 | 44.517 | -123.200
2654 | 45.167 | -123.167
2679 | 45.200 | -123.083
2683 | 45.200 | -123.083
2686 | 45.200 | -123.083
2716 | 46.667 | -121.750
2720 | 46.667 | -121.750
2727 | 46.500 | -122.000
2731 | 46.500 | -122.000
2884 | 45.583 | -122.383
2889 | 45.583 | -122.383
2892 | 45.583 | -122.383
2896 | 45.583 | -122.383
2897 | 46.100 | -122.967
4530 | 46.667 | -121.667
4531 | 46.500 | -122.000
4579 | 46.717 | -121.717
4580 | 46.567 | -121.667
4583 | 46.567 | -121.667
4584 | 46.717 | -121.717
4585 | 46.567 | -121.667
4588 | 45.583 | -122.383
4593 | 44.733 | -123.050
4594 | 45.833 | -121.883
4603 | 45.583 | -122.383
4605 | 46.183 | -123.150
4606 | 45.667 | -122.717
4608 | 45.667 | -122.717
4610 | 45.967 | -122.817
4611 | 47.050 | -122.900
6808 | 46.183 | -123.533
6816 | 46.183 | -123.533
6828 | 46.150 | -123.200
6831 | 45.900 | -122.733
6841 | 45.933 | -122.817
6848 | 45.933 | -122.817
6855 | 45.733 | -122.767
6858 | 45.733 | -122.767
6874 | 46.150 | -123.333
6880 | 46.267 | -123.450
6891 | 42.917 | -122.950
6903 | 45.583 | -122.383
6909 | 45.583 | -122.383
6915 | 45.800 | -122.750
6926 | 45.533 | -122.350
6931 | 45.533 | -122.350
6936 | 45.567 | -122.333
6946 | 45.567 | -122.333
6952 | 46.533 | -122.100
6958 | 46.533 | -122.100
6962 | 46.533 | -121.900
6966 | 46.533 | -121.900
6969 | 46.533 | -121.900
6972 | 46.550 | -122.267
6977 | 46.550 | -122.267
6981 | 46.333 | -122.917
6995 | 45.983 | -122.533
6997 | 45.983 | -122.533
6998 | 45.983 | -122.533
6999 | 45.983 | -122.533
7006 | 46.533 | -122.483
7014 | 46.533 | -122.483
7017 | 46.033 | -122.300
7019 | 46.033 | -122.300
7021 | 46.033 | -122.300
7030 | 46.433 | -122.850
7042 | 45.900 | -123.150
7044 | 45.900 | -123.150
7059 | 45.900 | -123.083
7062 | 45.900 | -123.083
7067 | 46.150 | -123.200
7069 | 45.567 | -122.450
7071 | 45.633 | -122.717
7073 | 44.167 | -122.950
7074 | 45.567 | -122.450
7075 | 44.167 | -122.950
7076 | 44.167 | -122.950
7077 | 48.867 | -121.867
7079 | 45.483 | -121.867
7087 | 44.333 | -123.233
7088 | 44.333 | -123.233
7091 | 44.333 | -123.233
7094 | 44.333 | -123.233
7096 | 45.833 | -121.883
7098 | 46.133 | -123.333
7109 | 45.617 | -122.667
7117 | 45.483 | -122.683
7118 | 47.700 | -121.350
7123 | 47.500 | -121.783
7124 | 47.500 | -121.783
7126 | 43.867 | -122.817
7128 | 43.467 | -122.683
7131 | 45.583 | -122.417
7133 | 45.850 | -122.750
7136 | 44.733 | -123.050
7137 | 44.733 | -123.050
7138 | 44.733 | -123.050
7139 | 44.733 | -123.050
7140 | 45.383 | -122.600
7141 | 45.383 | -122.600
7142 | 45.383 | -122.600
7144 | 45.650 | -122.750
7149 | 45.150 | -122.533
7151 | 45.150 | -122.533
7152 | 45.783 | -122.533
7632 | 45.750 | -122.817
7637 | 45.467 | -122.683
7639 | 45.583 | -122.417
7640 | 45.583 | -122.417
7647 | 46.133 | -123.333
7649 | 45.533 | -122.383
7973 | 45.550 | -122.400
7974 | 45.550 | -122.400
7975 | 45.550 | -122.400
7976 | 46.117 | -123.000
7978 | 46.117 | -123.000
7979 | 46.117 | -123.000
7981 | 46.133 | -123.250
7982 | 46.133 | -123.250
7983 | 46.117 | -123.000
7984 | 46.117 | -123.000
7985 | 46.117 | -123.000
7987 | 46.083 | -122.883
7988 | 46.083 | -122.883
7989 | 46.083 | -122.883
7990 | 46.083 | -122.883
7991 | 46.000 | -122.867
7992 | 46.000 | -122.867
7993 | 46.000 | -122.867
7994 | 46.000 | -122.867
7996 | 46.100 | -123.183
8401 | 46.933 | -122.600
8407 | 46.933 | -122.600
8415 | 46.750 | -122.033
8423 | 46.767 | -122.183
8431 | 46.767 | -122.183
8435 | 46.300 | -122.783
8436 | 46.300 | -122.783
8445 | 46.733 | -121.900
8452 | 46.733 | -121.900
8467 | 46.933 | -122.550
8468 | 46.933 | -122.550
8469 | 47.000 | -123.517
8470 | 47.000 | -123.517
8493 | 47.000 | -123.400
8505 | 45.400 | -122.500
8513 | 45.400 | -122.500
8516 | 45.383 | -122.400
8527 | 45.367 | -122.367
8534 | 44.633 | -122.883
8540 | 44.633 | -122.883
8552 | 44.500 | -122.817
8561 | 44.800 | -122.783
8579 | 44.533 | -122.900
8581 | 44.533 | -122.900
8585 | 44.750 | -122.467
8601 | 44.783 | -122.617
8608 | 44.117 | -122.567
8611 | 44.117 | -122.567
8628 | 44.133 | -122.567
8631 | 44.133 | -123.067
8639 | 44.133 | -123.067
9577 | 47.167 | -122.383
9578 | 47.167 | -122.383
9579 | 47.167 | -122.383
9580 | 47.167 | -122.383
9581 | 47.167 | -122.383
9582 | 47.167 | -122.383
9583 | 47.167 | -122.383
9584 | 47.167 | -122.383
9585 | 47.867 | -122.633
9586 | 47.867 | -122.633
9587 | 47.867 | -122.633
9588 | 47.450 | -123.033
9589 | 47.450 | -123.033
9590 | 47.450 | -123.033
9591 | 47.450 | -123.033
9592 | 44.750 | -122.867
9593 | 44.750 | -122.867
9594 | 44.750 | -122.867
9595 | 44.750 | -122.867
9596 | 44.750 | -122.867
9597 | 44.800 | -123.233
9598 | 47.067 | -123.733
9756 | 48.817 | -122.217
9757 | 48.817 | -122.217
9758 | 48.817 | -122.217
9759 | 48.817 | -122.217
9760 | 48.817 | -122.217
9761 | 48.817 | -122.217
9762 | 48.817 | -122.217
9763 | 48.817 | -122.217
9764 | 48.817 | -122.217
9765 | 48.817 | -122.217
9766 | 48.817 | -122.217
9767 | 48.817 | -122.217
9768 | 48.717 | -122.200
9769 | 48.717 | -122.200
9770 | 48.717 | -122.200
9771 | 48.717 | -122.200
9772 | 48.717 | -122.200
9773 | 48.717 | -122.200
9774 | 48.717 | -122.200
9775 | 48.717 | -122.200
9776 | 48.717 | -122.200
9777 | 48.717 | -122.200
9778 | 48.717 | -122.200
9779 | 48.717 | -122.200
9780 | 48.917 | -122.067
9781 | 48.917 | -122.067
9782 | 48.917 | -122.067
9783 | 48.917 | -122.067
9784 | 48.917 | -122.067
9785 | 48.917 | -122.067
9786 | 48.917 | -122.067
9787 | 48.917 | -122.067
9788 | 48.917 | -122.067
9789 | 48.917 | -122.067
9790 | 48.917 | -122.067
9791 | 48.917 | -122.067
9792 | 48.500 | -122.217
9793 | 48.500 | -122.217
9794 | 48.500 | -122.217
9795 | 48.500 | -122.217
9796 | 48.500 | -122.217
9797 | 48.500 | -122.217
9798 | 48.500 | -122.217
9799 | 48.500 | -122.217
9801 | 48.500 | -122.217
9802 | 48.500 | -122.217
9803 | 48.500 | -122.217
9804 | 48.517 | -122.050
9805 | 48.517 | -122.050
9806 | 48.517 | -122.050
9807 | 48.517 | -122.050
9808 | 48.517 | -122.050
9809 | 48.517 | -122.050
9810 | 48.517 | -122.050
9811 | 48.517 | -122.050
9812 | 48.517 | -122.050
9813 | 48.517 | -122.050
9814 | 48.517 | -122.050
9815 | 48.517 | -122.050
9816 | 48.533 | -121.750
9817 | 48.533 | -121.750
9818 | 48.533 | -121.750
9819 | 48.533 | -121.750
9820 | 48.533 | -121.750
9821 | 48.533 | -121.750
9822 | 48.533 | -121.750
9823 | 48.533 | -121.750
9824 | 48.533 | -121.750
9825 | 48.533 | -121.750
9826 | 48.533 | -121.750
9827 | 48.533 | -121.750
9828 | 47.733 | -121.933
9829 | 47.733 | -121.933
9830 | 47.733 | -121.933
9831 | 47.733 | -121.933
9832 | 47.733 | -121.933
9833 | 47.733 | -121.933
9834 | 47.733 | -121.933
9836 | 47.733 | -121.933
9837 | 47.733 | -121.933
9838 | 47.733 | -121.933
9839 | 47.733 | -121.933
9840 | 47.733 | -121.983
9841 | 47.733 | -121.983
9842 | 47.733 | -121.983
9843 | 47.733 | -121.983
9844 | 47.733 | -121.983
9845 | 47.733 | -121.983
9846 | 47.733 | -121.983
9847 | 47.733 | -121.983
9848 | 47.733 | -121.983
9849 | 47.733 | -121.983
9850 | 47.733 | -121.983
9851 | 47.733 | -121.983
9852 | 47.683 | -121.917
9853 | 47.683 | -121.917
9854 | 47.683 | -121.917
9855 | 47.683 | -121.917
9857 | 47.683 | -121.917
9858 | 47.683 | -121.917
9859 | 47.683 | -121.917
9860 | 47.683 | -121.917
9861 | 47.683 | -121.917
9862 | 47.683 | -121.917
9863 | 47.683 | -121.917
9864 | 47.200 | -121.933
9865 | 47.200 | -121.933
9866 | 47.200 | -121.933
9867 | 47.200 | -121.933
9868 | 47.200 | -121.933
9869 | 47.200 | -121.933
9870 | 47.200 | -121.933
9871 | 47.200 | -121.933
9872 | 47.200 | -121.933
9873 | 47.200 | -121.933
9874 | 47.200 | -121.933
9875 | 47.200 | -122.050
9876 | 47.200 | -122.050
9878 | 47.200 | -122.050
9879 | 47.200 | -122.050
9880 | 47.200 | -122.050
9882 | 47.200 | -122.050
9883 | 47.200 | -122.050
9884 | 47.200 | -122.050
9885 | 47.200 | -122.050
9886 | 47.200 | -122.050
9887 | 47.083 | -122.233
9888 | 47.083 | -122.233
9889 | 47.083 | -122.233
9890 | 47.083 | -122.233
9891 | 47.083 | -122.233
9892 | 47.083 | -122.233
9893 | 47.083 | -122.233
9894 | 47.083 | -122.233
9895 | 47.083 | -122.233
9896 | 47.083 | -122.233
9897 | 47.083 | -122.233
9898 | 47.083 | -122.233
9899 | 47.117 | -122.117
9900 | 47.117 | -122.117
9901 | 47.117 | -122.117
9903 | 47.117 | -122.117
9904 | 47.117 | -122.117
9905 | 47.117 | -122.117
9906 | 47.117 | -122.117
9907 | 47.117 | -122.117
9908 | 47.117 | -122.117
9909 | 47.117 | -122.117
9910 | 47.117 | -122.117
9911 | 47.100 | -122.200
9912 | 47.100 | -122.200
9913 | 47.100 | -122.200
9914 | 47.100 | -122.200
9915 | 47.100 | -122.200
9916 | 47.100 | -122.200
9917 | 47.100 | -122.200
9918 | 47.100 | -122.200
9919 | 47.100 | -122.200
9920 | 47.100 | -122.200
9921 | 47.100 | -122.200
9947 | 46.050 | -121.933
9948 | 46.050 | -121.933
9949 | 46.050 | -121.933
9950 | 46.050 | -121.933
9951 | 46.050 | -121.933
9952 | 46.050 | -121.933
9953 | 46.150 | -123.333
9954 | 46.150 | -123.333
9955 | 46.150 | -123.333
9956 | 46.150 | -123.333
9957 | 46.150 | -123.333
9958 | 46.150 | -123.333
9959 | 45.950 | -121.950
9960 | 45.950 | -121.950
9961 | 45.950 | -121.950
9962 | 45.950 | -121.950
9963 | 45.950 | -121.950
9964 | 45.950 | -121.950
9965 | 45.950 | -122.817
9966 | 45.950 | -122.817
9967 | 45.950 | -122.817
9968 | 45.950 | -122.817
9969 | 45.950 | -122.817
9970 | 45.950 | -122.817
9971 | 45.950 | -122.817
10072 | 46.267 | -123.450
TABLE S2. Summaries of quality scores across genotyped SNP loci
Metric | SNP set | Dataset | Mean | SD | 95% interval
GenTrain | Control | Full | 0.6856 | 0.1359 | 0.3091-0.8874
GenTrain | Control | Reduced | 0.6919 | 0.1290 | 0.3244-0.8881
GenTrain | Focal | Full | 0.6568 | 0.1569 | 0.2676-0.8627
GenTrain | Focal | Reduced | 0.6688 | 0.1530 | 0.2659-0.8692
Cluster Separation | Control | Full | 0.4210 | 0.1856 | 0.1500-0.9331
Cluster Separation | Control | Reduced | 0.4233 | 0.1814 | 0.1494-0.9359
Cluster Separation | Focal | Full | 0.3599 | 0.1478 | 0.1399-0.6849
Cluster Separation | Focal | Reduced | 0.3666 | 0.1476 | 0.1399-0.6808
GenCall50 | Control | Full | 0.6791 | 0.1437 | 0.2756-0.8869
GenCall50 | Control | Reduced | 0.6850 | 0.1376 | 0.2798-0.8879
GenCall50 | Focal | Full | 0.6495 | 0.1601 | 0.2484-0.8560
GenCall50 | Focal | Reduced | 0.6613 | 0.1564 | 0.2443-0.8563
Call Rate | Control | Full | 0.9859 | 0.0281 | 0.9060-1.0000
Call Rate | Control | Reduced | 0.9875 | 0.0237 | 0.9242-1.0000
Call Rate | Focal | Full | 0.9836 | 0.0514 | 0.8894-1.0000
Call Rate | Focal | Reduced | 0.9859 | 0.0469 | 0.9159-1.0000
Numbers are given for the full dataset and for the reduced dataset with FIS outliers removed. The full dataset
consisted of 297 control SNPs and 579 focal SNPs, while the reduced dataset consisted of 247 control SNPs and 530 focal SNPs. The 95% interval is
defined by the 2.5% and 97.5% quantiles.
TABLE S3. Summaries of genotyped SNPs for focal (n = 530) and control SNPs (n = 247).
Samples
Missing (%)
MAF
HE
HO
FIS
HWE (%)
Focal
448
1.56 (4.72)
0.21 (0.15)
0.28 (0.16)
0.27 (0.16)
0.04 (0.08)
97.36 (77.17)
Cluster 1
All
64
1.30 (5.59)
0.23 (0.18)
0.29 (0.17)
0.28 (0.18)
0.01 (0.15)
99.81 (93.02)
Cluster 2
80
0.07 (0.14)
0.20 (0.17)
0.27 (0.17)
0.29 (0.20)
-0.03 (0.17)
96.23 (89.81)
Cluster 3
82
1.14 (4.73)
0.20 (0.16)
0.27 (0.17)
0.27 (0.18)
0.03 (0.21)
97.55 (90.19)
Cluster 4
113
1.15 (4.58)
0.19 (0.16)
0.26 (0.17)
0.25 (0.18)
0.03 (0.12)
97.54 (92.08)
Cluster 5
109
2.25 (5.20)
0.21 (0.17)
0.27 (0.17)
0.28 (0.19)
0.01 (0.18)
96.04 (85.09)
448
1.26 (2.35)
0.23 (0.14)
0.31 (0.15)
0.30 (0.15)
0.03 (0.08)
95.95 (80.16)
Cluster 1
64
1.23 (2.42)
0.25 (0.16)
0.33 (0.15)
0.33 (0.16)
-0.01 (0.14)
99.59 (94.33)
Cluster 2
80
1.19 (2.61)
0.22 (0.14)
0.30 (0.16)
0.30 (0.16)
0.02 (0.14)
100.00 (90.69)
Cluster 3
82
1.02 (2.41)
0.23 (0.14)
0.31 (0.15)
0.30 (0.16)
0.02 (0.12)
99.59 (93.52)
Cluster 4
113
1.01 (2.40)
0.22 (0.14)
0.30 (0.16)
0.29 (0.16)
0.03 (0.14)
100.00 (91.49)
Cluster 5
109
1.76 (2.96)
0.22 (0.15)
0.31 (0.16)
0.30 (0.16)
0.02 (0.13)
100.00 (87.85)
Control
All
Listed are averages (one standard deviation) across loci unless otherwise noted. Outliers with respect to FIS were removed prior to these
calculations. Numbers in bold have 99% bootstrap (n = 10,000 replicates) confidence intervals for the mean across loci that do not include zero.
Numbers listed under HWE are the percent of loci consistent with HWE at a Bonferroni corrected significance threshold of P = 0.000094. Values
in parentheses are the percent of loci consistent with HWE at a significance threshold of P = 0.05. FIS, Wright’s fixation index; HE, expected
heterozygosity; HO, observed heterozygosity; HWE, Hardy-Weinberg Equilibrium; MAF, minor allele frequency.
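The per-locus quantities summarized in Table S3 can be illustrated with a minimal sketch. It assumes a biallelic SNP, the textbook definitions HE = 2p(1 - p), HO = fraction of heterozygotes, and FIS = 1 - HO/HE, and no small-sample correction; the estimators actually used for the table are not stated here, and the genotype counts below are hypothetical. The stated Bonferroni threshold (P = 0.000094) is consistent with 0.05 divided by the 530 focal SNPs, which is an inference rather than something the table note spells out.

```python
def locus_summary(n_AA, n_Aa, n_aa):
    """Return (MAF, HE, HO, FIS) for one biallelic SNP from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    maf = min(p, 1 - p)               # minor allele frequency
    h_exp = 2 * p * (1 - p)           # expected heterozygosity under HWE
    h_obs = n_Aa / n                  # observed heterozygosity
    f_is = 1 - h_obs / h_exp          # undefined for monomorphic loci (h_exp == 0)
    return maf, h_exp, h_obs, f_is


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical genotype counts for 448 clones (not taken from the study).
maf, h_exp, h_obs, f_is = locus_summary(250, 160, 38)
threshold = bonferroni_threshold(0.05, 530)   # assumed: alpha = 0.05, 530 tests
print(round(maf, 3), round(h_exp, 3), round(h_obs, 3), round(f_is, 3))
print(f"{threshold:.6f}")
```

A positive FIS in this sketch indicates fewer heterozygotes than expected under Hardy-Weinberg proportions; a negative value indicates an excess.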
FIGURE S1.–Distribution of quality metrics for genotyped SNPs grouped by dataset (control: 297 SNPs, focal: 579 SNPs). Values are assessed per
SNP across all samples and higher values indicate higher quality for all metrics. The distributions for these metrics when FIS outliers were removed
were qualitatively similar (data not shown). A comparison of these distributions when FIS outliers were or were not included is located in Table S1.
(A). The distributions for the GenTrain score were similar across control and focal SNPs. (B). The distributions for the cluster separation score
were similar across control and focal SNPs, with the control SNPs forming slightly tighter clusters. (C). The distributions for the GenCall50 (GC50)
score were similar across control and focal SNPs. This metric is assigned to each genotype call at a SNP, thus the GC50 score represents the median
value of this metric across all samples typed at a particular SNP. (D). The distributions for the Call Rate were similar across control and focal SNPs.
This metric is the complement of the fraction of missing data per locus.
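The two per-SNP metrics defined in panels (C) and (D) can be sketched as follows. This is an illustration under stated assumptions: the per-genotype scores are held in a plain list, a missing call is represented by `None`, and the median is taken over the called samples only. The function name and the example scores are hypothetical and do not reproduce the genotyping software's own computation.

```python
from statistics import median


def snp_quality(gencall_scores):
    """Return (call_rate, gc50) for one SNP.

    gencall_scores: one score per sample; None marks a missing genotype call.
    """
    called = [s for s in gencall_scores if s is not None]
    # Call rate is the complement of the fraction of missing data at this locus.
    call_rate = len(called) / len(gencall_scores)
    # GC50 is the median of the per-genotype score across typed samples.
    gc50 = median(called) if called else None
    return call_rate, gc50


# Hypothetical scores for five samples at one SNP, one of them missing.
scores = [0.91, 0.85, None, 0.78, 0.88]
call_rate, gc50 = snp_quality(scores)
print(call_rate, gc50)
```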
FIGURE S2.–Cluster assignments illustrated across pairwise plots of the four significant principal components (PCs) derived using principal
components analysis (PCA) as implemented in the EIGENSOFT software package (cf. http://genepath.med.harvard.edu/~reich/Software.htm).
Clusters were formed using hierarchical clustering with Ward’s method on the four significant PCs. The geographical distribution of cluster
assignments is shown in Figure 1C.
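The clustering step can be sketched with a naive implementation of Ward's minimum-variance criterion, in which the two clusters merged at each step are those whose union gives the smallest increase in within-cluster sum of squares. This is a toy illustration, not the software used for the figure: the PCA step is omitted, the input points stand in for per-sample PC scores, and the function name and coordinates are hypothetical.

```python
def ward_clusters(points, k):
    """Agglomerate points into k clusters using Ward's criterion.

    points: list of equal-length coordinate tuples (e.g. PC scores per sample).
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > k:
        pairs = [(merge_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]   # j > i, so index i is unaffected
    return clusters


# Hypothetical two-dimensional scores for five samples.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
toy_clusters = ward_clusters(toy_points, 3)
print(toy_clusters)
```

The pairwise search makes this sketch cubic in the number of samples, which is acceptable for illustration but not for large panels.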
FIGURE S3.–Summaries of population genetic parameters across all samples and samples placed into clusters. For each plot, the dashed line
represents all samples while colors designate the five significant genetic clusters detected using PCA. Further summaries are given in Table S2. (A).
Distributions of Wright’s inbreeding coefficient (FIS) for the 247 control SNPs were similar across all clusters. (B). Distributions of FIS for the 530
focal SNPs were similar across all clusters. (C). Distributions for the minor allele frequency for the control SNPs were broadly similar across all
clusters. Clusters four and five were the most different, with cluster four having a pronounced spike in its density around a MAF of 0.30. (D).
Distributions for the minor allele frequency for the focal SNPs were similar across all clusters.
FIGURE S4.–Differentiation among inferred genetic clusters for Populus trichocarpa reveals FST outliers within the set of focal SNPs. The gray colors
denote a two dimensional (2D) boxplot of the relationship between expected heterozygosity and FST for the 247 control SNPs. The lightest color
denotes the extreme for this 2D distribution (i.e. no values for FST were observed greater than that bounding line for any value of expected
heterozygosity). Points represent individual focal SNPs, with colored points differentiating outlier SNPs within the same candidate gene locus.
Plotted to the right are the distributions for FST for the focal and control SNPs. The average FST was low for both sets, but greater for the focal (FST =
0.034) than for the control (FST = 0.013) SNPs. Plotted above are the distributions for expected heterozygosity for the focal and control SNPs.
The distribution for the control SNPs illustrates a larger ascertainment bias, which is apparent from the high density centered on values of expected
heterozygosity in the range of 0.40-0.50. This distribution should be broadly U-shaped under neutrality without ascertainment bias.
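The relationship between expected heterozygosity and FST plotted in this figure can be illustrated for a single biallelic SNP. The sketch below assumes a simple heterozygosity-based definition, FST = (HT - HS)/HT, with clusters weighted by sample size; the estimator used for the figure is not stated in this caption, so this is one common choice rather than the authors' method. The cluster sizes follow Table S3, while the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs, sizes):
    """FST = (HT - HS) / HT for one biallelic SNP.

    freqs: frequency of one allele in each cluster.
    sizes: number of samples in each cluster (used as weights).
    """
    total = sum(sizes)
    weights = [n / total for n in sizes]
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_total = 2 * p_bar * (1 - p_bar)                                     # HT
    h_within = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))   # HS
    return (h_total - h_within) / h_total   # undefined if the SNP is monomorphic


cluster_sizes = [64, 80, 82, 113, 109]            # sample sizes from Table S3
toy_freqs = [0.30, 0.25, 0.20, 0.22, 0.18]        # hypothetical allele frequencies
fst = fst_biallelic(toy_freqs, cluster_sizes)
print(round(fst, 4))
```

Identical allele frequencies in every cluster give an FST of zero under this definition, and clusters fixed for alternative alleles give an FST of one.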
FIGURE S5.–Cluster assignment is correlated with phenotypic traits. Colors designate groups based on Tukey multiple comparison tests. A global
significance threshold of P = 0.01 was assumed for all tests. In all cases, the mean of cluster one was significantly different from means for the
remaining four clusters. Shown are boxplots for each cluster. The whiskers extend to the data extremes in all cases. (A). The effect of cluster
assignment on the C6 phenotype. Differences in the mean among clusters are significant (ANOVA: F3,443 = 12.290, P = 1.722E-09). (B). The effect
of cluster assignment on the lignin (%) phenotype. Differences in the mean among clusters are significant (ANOVA: F3,443 = 8.388, P = 1.573E-06). (C). The
effect of cluster assignment on the S/G phenotype. Differences in the mean among clusters are significant (ANOVA: F3,443 = 9.456, P = 2.419E-07).
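The F statistic reported for each panel comes from a one-way analysis of variance, which compares the variation among cluster means with the variation within clusters. The sketch below is a generic illustration on hypothetical values, not a reanalysis of the study's phenotypes; the function name is an assumption, and the Tukey multiple comparison step is not shown.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of phenotype values per cluster.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical phenotype values for three groups.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(toy_groups)
print(f_stat, df_between, df_within)
```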