Association Genetics of Traits Controlling Lignin and Cellulose Biosynthesis in Black Cottonwood (Populus trichocarpa, Salicaceae) Secondary Xylem

Jill L. Wegrzyn (1), Andrew J. Eckert (2,3), Minyoung Choi (2), Jennifer M. Lee (2), Brian J. Stanton (4), Robert Sykes (5), Mark F. Davis (5), Chung-Jui Tsai (6), and David B. Neale (1,3,7,8,9)

(1) Department of Plant Sciences, University of California at Davis, Davis, CA 95616
(2) Section of Evolution and Ecology, University of California at Davis, Davis, CA 95616
(3) Center for Population Biology, University of California at Davis, Davis, CA 95616
(4) Genetic Resources Conservation Program, Greenwood Resources, Portland, OR
(5) National Renewable Energy Lab, Golden, CO
(6) School of Forestry and Natural Resources, and Department of Genetics, University of Georgia, Athens, GA
(7) Bioenergy Research Center (BERC), University of California at Davis, Davis, CA 95616
(8) Institute of Forest Genetics, USDA Forest Service, Davis, CA 95616
(9) Author for correspondence: David B. Neale, Department of Plant Sciences, Mail Stop 6, University of California, Davis, Davis, CA 95616, (530) 754-8431, dbneale@ucdavis.edu

Abstract

Recent interest in poplars as a source of renewable energy, combined with the vast genomic resources available, has enabled further examination of the genetic diversity underlying the lignin and cellulose biosynthetic pathways. In this study, an association genetics approach was used to examine individual genes and alleles at the loci responsible for complex traits controlling lignin and cellulose quality and quantity in black cottonwood (Populus trichocarpa Torr. & A. Gray). Forty candidate genes of the lignin and cellulose biosynthetic pathways were resequenced in a panel of 15 unrelated individuals to identify single nucleotide polymorphisms (SNPs). A total of 1,536 SNPs were subsequently genotyped in a clonally replicated population (448 clones).
The association population (1,080 trees) was phenotyped using pyrolysis molecular beam mass spectrometry (pyMBMS). Both single marker and haplotype-based association tests were implemented to identify associations for composite traits representing lignin content, syringyl/guaiacyl ratio, and C6 sugars. A general linear model approach, including population structure estimates as covariates, was implemented for each marker-trait pair. This study identified 27 highly significant associations (FDR Q < 0.10) unique across 40 candidate genes in three composite traits. Of these, five associations were found to be in the coding region of the candidate genes, with two being nonsynonymous. Haplotype-based associations were performed on 181 amplicons across the 40 genes. For lignin content and C6 sugars, 23 significant haplotypes within 11 genes were discovered. The majority of markers (56%) in all three composite traits were characterized as having additive modes of gene action. These associations provide insight into the genetic components of complex traits involved in the lignin and cellulose biosynthetic pathways in black cottonwood. Introduction Forest trees are a potential source of net-zero carbon emission lignocellulosic biofuels. Production of biofuels involves collection of biomass, deconstruction of cell wall polymers into component sugars (pretreatment and saccharification), and conversion of these sugars to ethanol (fermentation) (Rubin 2008). Woody bioenergy crops from which biomass is derived have not been domesticated for this purpose and the current methods for lignocellulosic saccharification and fermentation are inefficient. The recent need to develop viable fuel alternatives is now taking advantage of genomics resources and technologies to discover the potential gain that can be achieved through breeding. 
Traits of interest in trees with applications in bioenergy include: growth rate, branching habit, stem thickness, and cell-wall chemistry (Stettler et al. 1996; Bradshaw et al. 2000). As a commercial species, black cottonwood (Populus trichocarpa Torr. & A. Gray) and its hybrids have already proven themselves to be valuable as a renewable energy resource. Rapid growth, moderate genome size, woody tissues, and economic importance make black cottonwood an ideal model organism to examine biofuels-related traits (Bradshaw et al. 2000). Black cottonwood possesses tremendous genetic and phenotypic diversity, is obligate outcrossing, able to hybridize with many other species, and easily clonally propagated (Davis, 2008). To further complement the advantages of this species as a short rotation woody crop, black cottonwood is the first tree and bioenergy feedstock to have its genome sequenced and annotated. Derived from a single wild individual (Nisqually-1), the genome sequence represents an estimated 45,500 genes across 19 chromosomes (Tuskan et al. 2006). In addition to the genome, resources such as controlled cross-populations, cross-species molecular markers, expressed sequence tag (EST) collections, and full-length cDNAs are available to the research community (Ralph et al. 2006; Strauss and Martin 2004; Tuskan et al. 2006). Improvement of biofuels feedstocks focuses on increasing both the relative carbon partitioning in woody tissues above ground and cellulose accessibility for enzymatic digestion (Ragauskas et al. 2006). As with other woody species, the major components of black cottonwood secondary cell walls are cellulose, hemicellulose, and lignin (Harris et al. 2008). Lignin inhibits saccharification in processes aimed at producing simple sugars for fermentation to ethanol. Many studies have been focused on the molecular biology of wood and secondary wall formation (Plomion et al. 2001; Schrader et al. 2004; Sterky et al. 1998, 2004). 
The pathways and genes involved in lignin and cellulose biosynthesis and microfibril deposition are increasingly well understood through biochemical analysis and expression studies (Whetten et al. 1998; Plomion et al. 2001; Li et al. 2003; Peter and Neale 2004; Schrader et al. 2004; Boerjan 2005; Oakley et al. 2007). The specific roles of genes in these pathways have been verified through forward and reverse genetic mutation studies (Dixon and Reddy 2003; Ralph et al. 2007; Davis 2008). A relatively unexplored area of research is the identification of the natural allelic variation controlling phenotypic variation and the exploitation of this variation in breeding. A major goal of population and quantitative genetics is the identification of polymorphisms responsible for phenotypic variation (Feder and Mitchell-Olds 2003; Stinchcombe and Hoekstra 2007). Many traits of interest in forest trees, such as wood quality, are complex in nature and occur later in development (Groover 2007). Recent advances in high-throughput marker technologies, combined with the wealth of genomic resources available for species like black cottonwood, enable closer examination of the number and effect sizes of genes responsible for traits of interest through complex trait dissection using association mapping. Tree species are ideal for association mapping as they are predominantly outcrossing and have large, relatively unstructured populations, resulting in high levels of nucleotide diversity and low linkage disequilibrium (LD) (Neale and Savolainen 2004; Gonzalez-Martinez et al. 2006). Significant associations between SNPs within candidate genes and phenotypic traits have been established in forest trees. Associations with wood quality traits in Eucalyptus (Thumma et al. 2005), wood quality and drought tolerance traits in loblolly pine (Gonzalez-Martinez et al. 2007, 2008), bud phenology traits in European poplar (Ingvarsson et al. 2008), and cold-hardiness related traits in coastal Douglas-fir (Eckert et al.
2009a) have been identified. In general, individual SNPs explain a small portion of the phenotypic variance (0.5%-5.0%), which is consistent with the complex nature of these traits. In this study, statistical models that account for population structure were applied to test 579 SNPs from 40 candidate genes involved in lignocellulosic cell wall synthesis in black cottonwood for association with wood chemistry traits. Single-marker and haplotype-based tests were performed to identify associations with natural variation in composite traits evaluating lignin and cellulose content.

Materials and Methods

Association Population and Phenotypic Data

Focal Species

The native range of black cottonwood covers large sections of western North America, primarily inhabiting floodplains and river margins (Kelleher et al. 2007). The range extends from Kodiak Island along Cook Inlet to latitude 62° 30′ N, through southeast Alaska and British Columbia to the forested areas of Washington and Oregon, and to the mountains of southern California and northern Baja California (lat. 31° N). It is also found inland, generally on the west side of the Rocky Mountains, in British Columbia, western Alberta, western Montana, and northern Idaho. Scattered small populations have been noted in southeastern Alberta, eastern Montana, western North Dakota, western Wyoming, Utah, and Nevada. Black cottonwood grows at elevations of up to 2,100 m.

Association Population

As part of a long-term Populus × generosa hybridization program, GreenWood Resources (Portland, OR) assembled a collection of 1,189 black cottonwood clones from 101 provenances from 12 river drainages located west of the Cascade Mountains between 48° 56′ N latitude (Nooksack River, Whatcom County, Washington) and 43° 47′ N latitude (Middle Fork, Willamette River, Lane County, Oregon) during the period 1990 through 1999 (Figure 1).
The collection was established in clone banks where it was annually coppiced to remove C-effects from planting stock used in the establishment of clonally replicated field trials in 1994, 1996, 1999, and 2003. All four trials were planted at an alluvial site on the lower Columbia River floodplain at Westport, Oregon (46° 08′ N). The soil is deep and moderately well drained, with a loam to silt loam surface overlaying a sandy loam to fine sand horizon. Annual precipitation averages 2,034 mm and the average maximum temperature during the April to September growing season is 20°C.

Sample Preparation and Wood Chemistry Phenotyping

Wood samples were collected from a subset of 448 clones representing all of the original provenances. Two 5 mm increment cores were taken with a Haglof increment borer from the bark to the pith of up to three ramets per clone growing in the four Westport clone trials (Figure 1B, Table S1). Cores were extracted at breast height (1.37 m) and placed in a -8°C freezer until sectioning. Sample preparation consisted of removing the two outermost complete growth rings of each core due to the different ages of the trees. Ground wood samples (~4 mg) were prepared in stainless steel sample cups and pyrolyzed using a Frontier Pyrolyzer, PY2020iD (Frontier Laboratories, Ltd.). Pyrolysis was performed at 500°C using helium carrier gas flowing at 2.0 L/min (at STP). The transfer line connecting the pyrolysis unit to the molecular beam mass spectrometer (MBMS) was heated to approximately 400°C. The pyrolysis vapors were expanded through a ruby sampling orifice that was mated directly to the faceplate of the MBMS. Total pyrolysis time was 30 s, although the pyrolysis reaction was completed in less than 12 s. A custom-built molecular-beam mass spectrometer using an Extrel™ Model TQMS C50 mass spectrometer was used for pyrolysis vapor analysis.
Mass spectral data from mass-to-charge ratio (m/z) 30-450 were acquired on a Merlin data acquisition system using 22.5 eV electron impact ionization. Using this system, both light gases and heavy tars are sampled simultaneously and in real time. The mass spectrum of the pyrolysis vapor provides a rapid, semi-quantitative depiction of the molecular fragments. Data analysis was performed using the Unscrambler v. 9.7 (CAMO A/S, Trondheim, Norway).

Resequencing, SNP Discovery, and Genotyping

Candidate Gene Selection

Forty candidate genes associated with lignocellulosic cell wall development were selected for resequencing (Table 1). These include 22 genes from 11 gene families involved in lignin biosynthesis and polymerization, six genes from four families involved in one-carbon metabolism associated with lignin biosynthesis, and 12 genes from five families involved in cellulose biosynthesis and microfibril deposition. The corresponding gene models were obtained from the JGI Poplar Genome Assembly v. 1.1 and manually curated (Table 1).

DNA Isolation, Primer Design, and Resequencing

Leaf tissue from the diversity panel of 15 unrelated poplar clones (one ramet/clone), selected to represent the latitudinal range of the entire clone collection, was sampled as leaf punches, dried with silica gel, and shipped at room temperature to DNA Landmarks (Quebec, Canada) for DNA extraction using their proprietary micro-scale protocol. All DNA extractions were standardized to 2.5 ng/µl for resequencing. The same protocol was used to extract DNA for the 448 clones, with all extractions standardized to 50 ng/µl prior to genotyping. Primers were designed at Ampure Agencourt Bioscience Corporation (Beverly, MA) utilizing custom software against the Poplar Genome Assembly v. 1.1. Genomic sequences covering the entire protein-coding regions, including introns and 1,000 bp upstream and 300 bp downstream noncoding sequences, were retrieved for primer design.
The program was set to design primers every 700 bp which yielded 517 primer pairs across the 40 genes. Of these, Agencourt utilized in-house software to select 200 nonoverlapping primer pairs based on a quality metric representing the redundancy in the genome and how likely the amplicon is to be a homopolymer locus. The best-scoring pairs were tagged with M13F (GTAAAACGACGGCCAGT) and M13R (CAGGAAACAGCTATGACC) primers for high-throughput sequencing. Genomic DNA was amplified in 384-well format PCR setup. Each PCR reaction contained 10 ng DNA, 1X HotStar buffer, 0.8 mM dNTPs, 1 mM MgCl2, 0.2U HotStar enzyme (Qiagen, Valencia, CA) and 0.2 μM forward and reverse primers in a 10 μl reaction. PCR cycling parameters were: one cycle of 95ºC for 15 min, 35 cycles of 9C for 20 s, 60C for 30 s and 72C for 1 min, followed by one cycle of 72C for 3 min. The resultant PCR products were purified using solid phase reversible immobilization chemistry followed by dye-terminator fluorescent sequencing with universal M13 primers. PCR for sequencing was initiated at 95°C for 15 mins followed by: 40 cycles for 10 s, 50 cycles for 5 s, and finally, 60 cycles for 2 mins 30 s. Dye-terminator removal was performed using SPRI. Bidirectional Sanger sequencing of PCR fragments was carried out via capillary electrophoresis using ABI Prism 3730xl DNA analyzers (Applied Biosystems, Foster City, CA). SNP Discovery and Selection Sanger resequencing produced a total of 202 amplicons representing 40 genes. The package, PineSAP (Pine Sequence Alignment and SNP Identification) (Wegrzyn et al. 2009), applied a combination of ProbConsRNA (Do et al. 2005), Polyphred (Nickerson et al. 1997), Polybayes (Marth et al. 1999), and machine learning techniques to align sequences from 195 of the 202 amplicons and computationally identify 1,485 polymorphisms (an average of 7 SNPs/amplicon). 
SNP detection was based on quality scores, coverage, and alignment metrics computed during the sequence alignments. The identified polymorphisms and their flanking sequences were formatted for the GoldenGate assay (Illumina, San Diego, CA) and submitted to Illumina's in-house software package, which assigns design scores. An additional 1,233 SNPs from 232 genes were identified for population structure inference through eSNP methods utilizing ESTs from male and female catkin tissue aligned to the reference genome (Unneberg et al. 2005). To construct the 1,536-SNP assay, we selected 948 high-scoring SNPs from the 40 lignin/cellulose genes and 588 high-scoring eSNPs from the 232 catkin ESTs.

SNP Genotyping

Genotyping was carried out using the Illumina GoldenGate SNP genotyping platform (Landegren et al. 1998; Oliphant et al. 2002; Fan et al. 2003; Eckert et al. 2009b) at the DNA Technologies Core Facility (UC Davis). The assay involves generating templates with specific target and address sequences using allele-specific extension followed by ligation and amplification with universal primers. Fluorescent products are hybridized to coded beads on an array matrix and signal intensities are subsequently determined using the BeadArray Reader (Illumina). Signal intensities are quantified and matched to specific alleles using BeadStudio v. 3.1.14 (Illumina). Manual adjustments to genotypic clusters were made when necessary. For inclusion of SNPs in the final data set, we used thresholds of 0.20 and 0.60 for the GenCall50 (GC50) and call rate (CR) indices, respectively (Table S2). These are established quality metrics that have been used to evaluate Illumina genotyping data (Pavy et al. 2008; Eckert et al. 2009b). The scores reflect the quality of the genotypic clusters (GC50) and the fraction of samples that had a genotype defined for a particular SNP (CR).
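The inclusion rule stated above can be sketched as a simple filter. This is an illustration only: the record type, the SNP labels, and the choice to treat both thresholds as inclusive lower bounds are assumptions, not details given in the text.

```python
from dataclasses import dataclass

GC50_MIN = 0.20       # GenCall50 threshold stated in the text
CALL_RATE_MIN = 0.60  # call rate threshold stated in the text


@dataclass(frozen=True)
class SnpQuality:
    """Quality summary for one genotyped SNP (hypothetical record layout)."""
    name: str
    gc50: float
    call_rate: float


def passes_quality(snp, gc50_min=GC50_MIN, call_rate_min=CALL_RATE_MIN):
    # Assumption: both thresholds are inclusive lower bounds.
    return snp.gc50 >= gc50_min and snp.call_rate >= call_rate_min


def filter_snps(snps):
    """Keep only the SNPs that meet both quality thresholds."""
    return [s for s in snps if passes_quality(s)]
```

A SNP failing either index is dropped, so a well-clustered marker with a poor call rate is excluded just like a poorly clustered one.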
Tests for Association Genetic Diversity, Population Structure and Linkage Disequilibrium For each SNP, we estimated expected and observed heterozygosity, Wright’s inbreeding coefficient (FIS) and hierarchical fixation indices using the Genetics and hierfstat packages available in R (Warnes and Leisch 2006; Goudet 2005; R Development Core Team, 2007). We excluded those SNPs with |FIS| > 0.25 from further analyses. The significance of multilocus fixation indices was tested via bootstrapping across loci (n = 10,000 replicates) to obtain 99% confidence intervals (99% CI). Patterns of population structure were further examined using principal components analysis (PCA). Population structure coefficients were estimated using Eigenstrat v. 2.0 (Price et al. 2006). For association analyses, a Q-matrix defined by significant principal components (PCs) as assessed using the Tracy-Widom distribution (Patterson et al. 2006) was utilized. Cluster membership was determined via hierarchical cluster analysis using Ward’s linkage and Euclidean distances on the significant PCs. The number of clusters was identified as k+1, where k is the number of significant PCs. We identified FST outliers using the bivariate distribution of expected heterozygosity and FST among inferred clusters observed for the 297 eSNPs to define the genome-wide expectation of background levels of genetic structure. Lignin SNPs falling outside this distribution were identified as FST outliers. Linkage disequilibrium (LD) was measured as the squared correlation of allele frequencies, r 2 (Hill and Robertson 1968), which is affected both by recombination and by differences in allele frequencies between sites. The r2 value between pairs of informative SNP sites in candidate genes was calculated using the Genetics package in R (Warnes and Leisch 2006; R Development Core Team, 2007). Patterns of LD were investigated among SNPs from 39 of the 40 candidate genes. 
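As a rough illustration of the LD measure, the sketch below computes r2 for a pair of biallelic sites and evaluates an expected decay curve. Two assumptions are made: phased haplotypes coded 0/1 are used for simplicity (the analysis itself relied on the Genetics package in R, which works from genotypes), and the expectation of r2 is taken to have the Hill and Weir (1988) form used by Remington et al. (2001). All function and parameter names are hypothetical.

```python
def r_squared(haplotypes):
    """Squared allele-frequency correlation between two biallelic sites.

    haplotypes: list of (allele_site1, allele_site2) tuples coded 0/1.
    Assumes phased data, which is a simplification for illustration.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def expected_r2(c_per_bp, distance_bp, n):
    """Expected r2 at a given distance (assumed Hill and Weir 1988 form).

    c_per_bp: population recombination parameter per base pair.
    n: sample size.
    """
    c = c_per_bp * distance_bp  # C replaced by C x distance, as in the text
    first = (10 + c) / ((2 + c) * (11 + c))
    second = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (n * (2 + c) * (11 + c))
    return first * second
```

In a fit, `c_per_bp` would be the free parameter estimated by nonlinear regression of observed r2 on distance; the sketch only evaluates the curve.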
CesA1A was not included in this analysis due to physical annotation differences in the reference genome. To assess the extent of LD in the sequenced genomic regions, the decay of LD with physical distance (base pairs) between SNP sites within each candidate locus and over all candidate genes was evaluated by nonlinear regression analysis of r2 values (Remington et al. 2001). The expectation of r2 for low mutation rates, taking sample size into account, is given by:

$$E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n(2 + C)(11 + C)}\right]$$

where C is the population recombination parameter (ρ = 4Ner) and n is the sample size. We replaced C with C × distance (in base pairs) when fitting the formula to our data using the nonlinear regression (nls) function in R (R Development Core Team, 2007).

Statistical Models

Single marker models were utilized for all SNP-trait combinations. A general linear model (GLM) was fitted to each trait-SNP combination (Yu et al. 2006), with SNP markers as fixed effects and elements of the Q-matrix as covariates. P-values were generated for each test using 10,000 permutations of genotypes with respect to phenotypic trait values. All analyses were conducted using TASSEL v. 2.0.1 (Bradbury et al. 2007). Corrections for multiple testing were performed using the positive false discovery rate (FDR) method (Storey 2002; Storey and Tibshirani 2003). All the data necessary to perform these analyses are available in Files S1 and S2. Modes of gene action were quantified using the ratio of dominance (d) to additive (a) effects estimated from least-squares means for each genotypic class. Partial or complete dominance was defined as values in the range 0.50 < |d/a| < 1.25, while additive effects were defined as values in the range -0.50 ≤ d/a ≤ 0.50. Values of |d/a| > 1.25 were equated with over- or underdominance. Haplotypes were inferred and their frequencies were estimated using the modified expectation maximization (EM) method of haplotype inference included in the haplo.stats (v.
2.0.1) program available in R (Schaid et al. 2002; R Core Development Team, 2007). Singleton alleles were ignored when constructing the haplotypes, and haplotypes with frequency less than five were also discarded. Output in the form of global-score statistics and haplotype-specific scores were derived from generalized linear models. Corrections for multiple testing were performed using the positive false discovery rate (FDR) method (Storey 2002; Storey and Tibshirani 2003). Results Phenotype Wood samples were analyzed using pyrolysis molecular beam mass spectrometry (pyMBMS). The intensities of the major peaks assigned to lignin were summed in order to estimate the lignin content, Syringyl/Guaiacyl (S/G) ratios, C5 sugars, and C6 sugars across the range of samples (Table 2). Lignin content was calculated with peaks at mass to charge ratio (m/z) 124, 137, 138, 150, 152, 164, and 178; these were summed and then averaged for the different samples. S/G ratios were determined by summing S peaks at m/z 154, 167, 168, 182, 194, 208, and 210 then dividing by the sum of G peaks at m/z 124, 137, 138, 150, 164, and 178. C5 sugars were calculated as the sum of the peaks at m/z 57, 73, 85, 96, 114. Likewise, C6 sugars were calculated as the sum of the following peaks at m/z 57, 60, 73, 98, 126, 144. Visualization of each phenotype demonstrated a strongly bimodal distribution for the C5 trait as opposed to the distributions for the other three composite traits, which were approximately normal. As a result, C5 was not included in subsequent analyses. S/G ratios ranged from 1.2-2.4 while lignin content ranged from 15.8-27.5%. Genotyping Results The 1,536 SNPs chosen for genotyping using the Illumina GoldenGate platform represent 948 from 40 candidate genes (20 gene families and 202 amplicons) with seven to 65 SNPs per gene, and 588 from the 232 catkins ESTs (Table 1). 
Of the 1,536 SNPs, 876 (57%) yielded data consistent with our quality thresholds (579 candidate gene SNPs and 297 eSNPs). A conversion rate of 61% (579 SNPs) was observed among the 948 SNPs from the 40 resequenced lignin/cellulose candidate genes, as opposed to 51% for the eSNPs. The median GC50 score across all usable SNPs was 0.71 and the median CR score was 0.72. Quality scores across the genotyped loci are summarized in Table S2. The distribution of the quality metrics for genotyped SNPs, grouped by dataset, is shown in Figure S1. The majority of the 579 successfully genotyped SNPs were silent, with nonsynonymous SNPs accounting for 19% of the total.

Population Structure

Principal components analysis on the 448 clones using 297 eSNPs revealed four significant PCs, explaining 10% of the overall variance. From these four PCs, five clusters were formed using hierarchical clustering with Ward's linkage method. All five clusters illustrated a latitudinal trend, with the Columbia River delineating a major geographical north-south separation (Fig. 1C). These five clusters also showed significant genetic structuring as estimated using FST, as well as significant differences among means for the three composite traits. The average FST was low for both sets, but greater for the lignocellulosic SNPs (FST = 0.034, 99% CI: 0.028-0.042) than for the eSNPs (FST = 0.013, 99% CI: 0.011-0.016). A comparison of the distribution of FST for each set revealed that seven genes had values of FST greater than any observed for the eSNPs (Fig. S4). These outliers were concentrated within the CesA3A, CAD, SUSY1, 4CL1, CesA2B, TUB15 and CesA1B genes (Fig. S4). Polymorphisms within these genes had values of FST approximately five- to 10-fold greater than the multiple-locus average. Cluster one, which was distributed primarily south of the Columbia River, also had significantly different means for lignin, S/G and C6 (ANOVA: P < 2.0 × 10^-6; Tukey multiple comparison tests: P < 0.01).
Additional summaries of genetic diversity across all SNPs and clusters are given in (Table S3, Figures: S1, S2, S3, and S4). Linkage Disequilibrium All r2 values were pooled to assess the overall behavior of LD for the candidate genes and to estimate the genome-wide degree of LD in black cottonwood. Figure 2B shows the extent of LD across the sequenced regions. The fitted curve indicates that LD is generally low in black cottonwood, rapidly decaying by over 50% (from 0.50 to 0.20) within a distance of ~200 bp (Figure 2B, 2C). Within candidate genes, the average distance associated with LD decline to r2 = 0.1 varies from c. 200 to c. 600 bp (Figure 2A). Overall summary of single SNP and haplotype based associations A total of 1,734 (579 SNPs x 3 traits) single marker association tests were performed. Of these, 65 were significant at the threshold of P < 0.05. Multiple test corrections using the FDR method reduced this number to 37 at a significance threshold of Q < 0.10. A total of 13 lignin content, 1 S/G, and 23 C6 sugar content associations were identified (Table 3). The 37 associations represent 27 unique SNPs from 40 candidate genes. Many of the 37 SNPs that exhibited significant associations with at least one trait were consistent with codominance (Table 4). Four of the 34 markers for which dominance and additive effects could be calculated were consistent with overdominance (|d/a| > 1.25). The remaining 30 markers were split between modes of gene action that were codominant (|d/a| < 0.50, 25) or partially to fully dominant (0.50 < |d/a| < 1.25, 5). Most effects were small to moderate and accounted for 10% to 78% of the phenotypic standard deviation. Among haplotype-based associations, a total of 181 amplicons were analyzed (after the removal of singletons) and 17 amplicons from 13 unique genes were significant with a global significance threshold of P < 0.05 (Table 5). 
Multiple test corrections using the FDR method reduced this number to 14 amplicons (13 unique genes and 71 haplotypes) at a global significance threshold of Q < 0.10.

Lignin Associations

Lignin composition was represented by averaging values of guaiacyl precursor peaks. A total of 13 significant single marker associations were found for nine candidate genes associated with lignin content (Table 3). Three of the significant marker-trait associations were located in the coding region and 10 in the non-coding region. Two of the significant associations were nonsynonymous (C4H1, CESA2A) and one was synonymous (HCT6). Individually, each of the 13 markers explained a small portion of the phenotypic variance, with effects ranging from 1.2% to 3.8%. Eleven significant haplotype associations from 10 unique genes were identified for lignin content (Table 5). Eight amplicons, representing seven unique genes, had at least one significant haplotype after multiple test corrections (Table 5). Three of the amplicons did not have significant individual haplotypes and included regions of three candidate genes (CCR, CesA2B, and TUA5). Of the eight amplicons with at least one significant haplotype, just one (SUSY1) was supported by a single marker association for the same trait (SUSY1_02-108). The remaining five candidate genes (4CL1, 4CL3, CesA1B, CesA3A, and HCT1) had at least one supporting single marker association with a P value < 0.05 before multiple test corrections.

S/G Ratio Associations

The S/G ratio phenotype is the ratio of the summed intensities of the seven S peaks to those of the six G peaks. Analysis of the S/G trait resulted in one significant marker-trait association (Table 3). This marker is non-coding and explained a small portion of the phenotypic variance (3.2%). Haplotype-based tests did not reveal significant associations.

C6 Sugar Associations

C6 sugars were represented by summing the values of six peaks.
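The composite traits can be illustrated with a short sketch that sums peak intensities at the m/z values listed in the Phenotype section. The spectrum layout (a mapping from m/z to intensity) and the function names are assumptions, and any normalization or averaging across replicate spectra is left out.

```python
# m/z peak lists as given in the Phenotype section of the text
LIGNIN_PEAKS = (124, 137, 138, 150, 152, 164, 178)
S_PEAKS = (154, 167, 168, 182, 194, 208, 210)
G_PEAKS = (124, 137, 138, 150, 164, 178)
C6_PEAKS = (57, 60, 73, 98, 126, 144)


def peak_sum(spectrum, peaks):
    """Sum intensities at the given m/z values; missing peaks count as 0."""
    return sum(spectrum.get(mz, 0.0) for mz in peaks)


def composite_traits(spectrum):
    """Composite traits for one spectrum (dict of m/z -> intensity).

    Assumption: intensities are already on a comparable scale; averaging
    over replicate samples is not shown here.
    """
    g_total = peak_sum(spectrum, G_PEAKS)
    if g_total == 0:
        raise ValueError("G peak intensities sum to zero; S/G is undefined")
    return {
        "lignin": peak_sum(spectrum, LIGNIN_PEAKS),
        "sg_ratio": peak_sum(spectrum, S_PEAKS) / g_total,
        "c6": peak_sum(spectrum, C6_PEAKS),
    }
```

With a flat spectrum of unit intensity at every listed peak, the lignin composite is 7, the S/G ratio is 7/6, and the C6 composite is 6, which simply reflects the number of peaks in each list.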
A total of 23 significant associations were found in 13 candidate genes associated with C6 sugars (Table 3). Four of the significant marker-trait associations were located in coding regions. Three of these SNPs are synonymous, in three different candidate genes (CESA1A, C4H2, HCT6), and one significant association was nonsynonymous (CESA2A). Four marker-trait associations in two candidate genes were highly significant and unique to the C6 phenotype (SUSY1, CESA1B). All 23 markers explain a small portion of the phenotypic variance, with individual effects ranging from 1.1% to 3.7%. A total of three amplicons representing three unique candidate genes (4CL1, CesA1A, and SAM1) were significant in terms of haplotype-based associations with C6 (Table 5). All three amplicons were highly significant (Q < 0.05) with respect to C6 sugars and contained at least one significant individual haplotype after multiple test corrections (Q < 0.10). One candidate gene (CesA1A) contained a significant single marker association in the same amplicon and associated with the same trait (CesA1A_12-40).

Discussion

Hybridization, molecular breeding and genetic engineering efforts are all under consideration to improve wood-based ethanol production. Strategies for the domestication of forest trees using either conventional or novel molecular breeding approaches center on the exploitation of existing genetic diversity. Over the past few decades, genetic maps have been made for many forest tree species and QTLs have been mapped for a range of traits, such as wood properties, with the aim of using genetic markers linked to QTLs in marker-based breeding programs (Brown et al. 2003). The lack of resolution in mapping candidate genes and QTL alleles can be overcome by association genetics, using natural populations in which the long evolutionary history has decreased the extent of LD (Neale and Savolainen 2004).
An important prerequisite for association mapping is the availability of large allelic variation in the population. LD describes a key aspect of genetic variation in natural populations of plants. This study is the first examination of genome-wide LD in black cottonwood and enables comparison with other poplars. We examined LD across 39 of the candidate genes (Figure 2B, 2C) and observed a rapid decay of LD within just a few hundred bp, indicating the potential of association genetics to identify genes responsible for variation in the trait. Previous studies in both P. tremula (five genes) and P. nigra (nine genes), showed a similar rapid decay of LD (Ingvarsson et al. 2005; Chu et al. 2009). This study examined both single marker associations and haplotype-based tests to account for information present in the associations between markers as well as directly between a SNP and the trait. Given the structure of our data, a natural way to apply the knowledge of LD within and between genes is to perform haplotype-based association tests. The power of a single marker association test is often limited because LD information contained in flanking markers is ignored. Intuitively, haplotypes (which are essentially a collection of ordered markers) may be more powerful than individual, non-ordered markers. This study demonstrates that the use of haplotypes can significantly increase the ability to map traits of interest. Candidate genes known to be involved in lignocellulosic cell wall development were examined for genetic associations. There are two major steps of lignin biosynthesis in plants: monolignol biosynthesis and the subsequent polymerization of lignin monomers to form polymers. This biochemical pathway is highly conserved throughout vascular plants, and many of the enzymes have been identified and characterized (Boerjan et al. 2003; Xu et al. 2009). 
Cellulose biosynthesis involves the synthesis and assembly of β-1,4 glucan chains at the RTC and their orderly deposition to form cell wall microfibrils. Although several candidate genes have been identified, the precise molecular mechanism of cellulose biosynthesis and microfibril deposition in plants is still not clearly understood. Genetic improvement of lignin and cellulose biosynthesis in trees continues to be a major research priority. As with other commercial applications for black cottonwood, modified lignin structure (chemical reactivity) and increased cellulose content are desirable traits. Mechanisms that can increase C6 sugar content and decrease C5 sugar content of hemicelluloses are favorable for fermentation.
The monolignol biosynthetic pathway involves many intermediates and enzymes (Boerjan et al. 2003). The first step in the process is the deamination of phenylalanine by phenylalanine ammonia-lyase (PAL), which produces cinnamic acid. PAL is encoded by a small multigene family (Appert et al. 1994; Osakabe et al. 1995; Cochrane et al. 2004), and five isoforms have been annotated in the poplar genome (Tsai et al. 2006). In this study, markers in PAL2, PAL4, and PAL5 were genotyped. A single-marker non-coding association was identified with PAL2 that explained 1.4% of the phenotypic variation in C6 sugars (Table 3). In aspen (P. tremuloides) stem, PAL2 transcripts have been localized to developing xylem cells, consistent with its involvement in lignin biosynthesis (Kao et al. 2002).
C4H catalyzes the first oxidative reaction in phenylpropanoid metabolism, namely, the conversion of cinnamic acid to p-coumaric acid (Sewalt et al. 1997). Three C4H genes have been characterized in black cottonwood (Lu et al. 2006). C4H1 is proposed to be associated with G lignin deposition while C4H2 is thought to be involved in S lignin biosynthesis (Lu et al. 2006).
Four unique single marker associations were identified in the C4H1 and C4H2 genes examined in this study. A significant nonsynonymous association in exon 1 of C4H1 with lignin demonstrated modes of gene action consistent with additive effects (Table 3; Figure 4). The C allele at C4H1_02-219 is the minor allele and causes a histidine (H) to proline (P) amino acid substitution. Heterozygotes for the marker had a percent value of lignin composition that was intermediate between the two homozygote classes (21.9% for A/A, 22.7% for A/C, 23.2% for C/C). A similar study in European maize identified a nonsynonymous SNP in the first exon of C4H1 associated with forage quality traits (Anderson et al. 2008). Physiological studies of these genes describe unique functions for the isoforms within the lignin biosynthetic pathway.
4-coumarate:CoA ligase (4CL), which catalyzes the formation of CoA esters of p-coumaric acid and its derivatives, has a pivotal role in channeling phenylpropanoid precursors into different downstream pathways, each leading to a variety of functionally distinct end products (Harding et al. 2002). 4CL is also encoded by a multigene family, with five isoforms annotated in the poplar genome (Tsai et al. 2006). While we were unable to identify significant single marker associations in 4CL1, 4CL3, and 4CL5, significant associations with haplotypes in 4CL1 and 4CL3 were observed for both lignin and C6 traits. Of the five haplotypes (spanning 389 bp) in 4CL1_01, two significant associations demonstrated an effect on C6 sugar content (35.1% for AGA and 34.1% for AAA). For lignin composition, two haplotypes of 4CL1_11 demonstrated a difference of > 1% (Table 5; Figure 5B). Three single markers in 4CL1_11 at P < 0.05 were found to be in linkage disequilibrium, and their individual genotypic effects on lignin composition were small in comparison to the spanning haplotype block (Figure 5B).
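The additive pattern reported above for C4H1_02-219 can be checked from the three genotype means (21.9% for A/A, 22.7% for A/C, 23.2% for C/C) using the standard quantitative-genetic additive effect a and dominance deviation d. The sketch below is illustrative only and is not the authors' analysis. The classification cutoffs on |d/a| (0.5 and 1.25) are an assumption based on a common convention, and the function name is hypothetical.

```python
def gene_action(mean_aa, mean_ab, mean_bb,
                additive_cutoff=0.5, dominant_cutoff=1.25):
    """Classify mode of gene action from three genotype means.

    a = half the difference between the two homozygote means
    d = heterozygote mean minus the homozygote midpoint
    The cutoffs on |d/a| are assumed conventions, not values from the paper.
    """
    a = (mean_bb - mean_aa) / 2.0
    d = mean_ab - (mean_aa + mean_bb) / 2.0
    if a == 0:
        raise ValueError("homozygote means are equal; d/a is undefined")
    ratio = d / a
    if abs(ratio) <= additive_cutoff:
        mode = "additive"
    elif abs(ratio) <= dominant_cutoff:
        mode = "dominant"
    else:
        mode = "over- or underdominant"
    return a, d, ratio, mode


# Lignin genotype means for C4H1_02-219 as reported in the text (percent).
a, d, ratio, mode = gene_action(21.9, 22.7, 23.2)
print(f"a = {a:.2f}, d = {d:.2f}, d/a = {ratio:.2f}, mode = {mode}")
```

Under these assumed cutoffs, the heterozygote lies close to the homozygote midpoint, which is what an additive classification expresses.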
Reducing 4CL expression in transgenic poplar has resulted in significant reductions of lignin, ranging from 5% to 45% (Hu et al. 1999; Li et al. 2003).
Hydroxycinnamoyl-CoA transferase (HCT) is the most recently identified enzyme in monolignol biosynthesis and belongs to a large family of acyltransferases (Hoffmann et al. 2003). It catalyzes the conversion of p-coumaroyl-CoA and caffeoyl-CoA to their corresponding shikimate or quinate esters. Two of the six annotated HCT genes in the Populus genome (HCT1 and HCT6) are expressed in developing xylem (Tsai et al. 2006). HCT6_13-225 was a significant synonymous marker in both lignin and C6 (Table 3). Two significant haplotypes in HCT1_12 were associated with lignin composition (Table 5). HCT has not been transgenically manipulated in poplar; however, RNAi-mediated silencing of HCT in a conifer (Pinus radiata), which does not produce S lignin, had a strong impact on lignin content (42% reduction), monolignol composition, and interunit linkage distribution (Wagner et al. 2007). A similar study of HCT in Arabidopsis showed a reduction in lignin content and an increased G lignin deposition (Hoffmann et al. 2007).
P-coumaroyl-CoA shikimate proceeds through a series of transformations into caffeoyl-CoA shikimate, caffeoyl-CoA, feruloyl-CoA, and coniferaldehyde by the action of the enzymes p-coumaroyl-CoA 3’-hydroxylase (C3’H), HCT, caffeoyl-CoA O-methyltransferase (CCoAOMT), and cinnamoyl-CoA reductase (CCR), respectively. CCoAOMT, catalyzing the methylation of caffeoyl-CoA to feruloyl-CoA, is critical in maintaining lignin structural integrity (Meyermans et al. 2000; Zhong et al. 2000). In the two independent studies referenced, antisense down-regulation of CCoAOMT1 in transgenic hybrid poplar (P. tremula x P. alba) resulted in reduced lignin content as well as altered S/G ratio. In this study, markers from CCoAOMT1 and CCoAOMT2 were genotyped. CCoAOMT1 had one significant non-coding SNP associated with C6 sugar content (Table 3).
Cinnamoyl-CoA reductase (CCR) catalyzes the conversion of hydroxycinnamoyl-CoA esters (p-coumaroyl-CoA, feruloyl-CoA, sinapoyl-CoA) into their corresponding cinnamyl aldehydes (Pichon et al. 1998). Downregulation of CCR in transgenic poplar (P. tremula x P. alba) is associated with up to 50% reduced lignin content (Leple et al. 2007). In this study, a single non-coding two-state marker in CCR was found to be strongly associated with lignin composition (Table 3). A different amplicon in the same gene (CCR_12) was globally significant in terms of haplotype associations but did not yield any significant individual haplotypes (Table 5). Haplotype associations with CCR were previously identified in eucalyptus in relation to wood property traits (Thumma et al. 2005).
Coniferaldehyde can be converted to coniferyl alcohol by the action of CAD or to 5-hydroxyconiferaldehyde and sinapyl aldehyde by the action of ferulate 5-hydroxylase (F5H) and caffeic/5-hydroxyferulic acid O-methyltransferase (COMT). CAD catalyzes the reduction of p-hydroxycinnamaldehydes into their corresponding alcohols and is the last enzyme in monolignol biosynthesis. In this study, CAD_04-185, a non-coding marker, showed patterns of gene action consistent with additive effects in relation to S/G and C6 sugars. This was the only single marker association identified with S/G ratio. Three of the nine individual haplotypes (spanning 407 bp) in the same amplicon of CAD were significant for lignin composition. Differences in genotypic effects on lignin content were minimal (22.2% for CAAAAT, 22.8% for CATAAT, and 22.5% for GATAAT). The CAD gene family has been extensively studied in Arabidopsis, rice, and poplar (Barakat et al. 2009). Down-regulation of CAD in transgenic poplar did not affect the overall lignin content and composition, but led to an increased incorporation of the hydroxycinnamaldehydes into the lignin (Baucher et al. 1996; Pilate et al. 2002).
Field trials of the CAD-deficient transgenic poplar showed improved Kraft pulping performance (Pilate et al. 2002).
COMT was originally thought to be a bifunctional enzyme that sequentially methylated caffeic and 5-hydroxyferulic acids. More recently, it has been shown to act downstream in monolignol biosynthesis by methylating the aldehyde and alcohol backbones (Osakabe et al. 1999; Parvathi et al. 2001). In this study, markers from COMT1 and COMT2 were successfully genotyped (Table 1). A single non-coding COMT2 marker was identified as significant with C6 sugar content (Table 3). Suppression of COMT in both P. tremula x P. alba and P. tremuloides lines did not change lignin content but resulted in a reduction of the S/G lignin ratio (due to a decrease of S and an increase of G), as well as the incorporation of an abnormal 5-hydroxyguaiacyl unit into the lignin (Van Doorsselaere et al. 1995; Tsai et al. 1998).
After their biosynthesis, monolignols are transported from the cytoplasm to the cell wall and polymerized into a lignin matrix. The molecular mechanisms and the proteins responsible for transport and polymerization are not fully characterized. In the cell wall, the monolignols are oxidized to their radicals and polymerized. Laccases (Lac), peroxidases and other phenol oxidases have long been thought to be involved in this polymerization (Baucher et al. 2003), but conclusive evidence for their role is still lacking. In our study, we examined Lac1a, Lac2, and Lac90a. Lac1a was found to have two non-coding single marker associations with C6 sugars (Table 3). In poplars, several laccases have been cloned and characterized (Ranocha et al. 1999). At least eight of these laccases were identified in association with lignin biosynthetic pathways by microarray analysis (Andersson-Gunnerås et al. 2006).
Subsequent studies with antisense Lac3 in transgenic hybrid poplar showed little variation in lignin content; however, the soluble phenolics and structure of the secondary wall were altered (Ranocha et al. 2002).
Variation in the quantity and quality of cellulose in plants is suspected to be a primary result of the enzymatic activities of different types of cellulose synthases (CesAs) (Haigler and Blanton 1996). The CesA gene family contains 17 members in the sequenced poplar genome, five of which are highly expressed during wood formation (Djerbi et al. 2004; Joshi et al. 2004; Suzuki et al. 2006; Kumar et al. 2009). All five isoforms were evaluated for association in this study (CesA1A, CesA1B, CesA2A, CesA2B, and CesA3A), and all had at least one single marker or haplotype association (Table 1). The same nonsynonymous marker in the sixth exon of CesA2A was strongly associated with both lignin and C6 sugar traits. The G allele at CesA2A is the minor allele and causes an isoleucine (I) to valine (V) amino acid substitution (Table 3). The genotypic effects of the two-state SNP are shown in Figure 3B. For lignin, the difference in content between genotypes was significant (22% for AA and 23.6% for AG); the same is true for C6 sugar content (34.9% for AA and 32.1% for AG). Three single marker associations between CesA1B and lignin composition were identified (Table 3; Figure 3). Two of these three non-coding SNPs were also associated with C6 sugar content. CesA1B_10 had one significant haplotype associated with lignin composition. CesA1A had two non-coding associations and one synonymous association (CesA1A_12-40) for C6 sugars. One of the non-coding SNPs (CesA1A_20-226) was also associated with lignin content. CesA3A had two different amplicons with significant haplotype associations with lignin. Three of the six haplotypes in CesA1A_12 (spanning 183 bp) were significantly associated, and their genotypic effects on C6 were also significant (33.6% for AGA, 34.2% for AAA, 35.3% for GAG) (Table 5).
CesA proteins in the RTC use cytosolic uridine diphosphate (UDP)-glucose as substrate, which is provided directly by particulate sucrose synthase (SUSY) (Haigler et al. 2001). This enzyme produces UDP-glucose and fructose from sucrose and UDP. Of the six SUSY genes annotated in the poplar genome, only two were highly expressed in wood-forming tissues based on microarray analysis (Geisler-Lee et al. 2006; Meng et al. 2006). In this study, amplicons from SUSY1 were successfully genotyped (Table 1). Single-marker tests with SUSY1 revealed six non-coding associations with C6 and two with lignin composition (Table 3). Two of the three individual haplotypes (spanning 386 bp) identified in SUSY1_02 were significant. Genotypic differences between haplotypes were observed for lignin composition (21.8% for AAAA and 22.9% for TGGG) (Table 5). Three of the four markers that compose the SUSY1_02 haplotype are in strong LD (Figure 5A). Recently, over-expression of SUSY in transgenic poplar has led to an increase in both cellulose production and cellulose crystallinity (Coleman et al. 2009), supporting the previous suggestion that SUSY could be one of the limiting steps of cellulose biosynthesis (Tang and Sturm 1999; Haigler et al. 2001).
This study represents the most comprehensive evaluation of LD and genetic association in poplars. High-throughput genotyping technologies and the vast genomic resources in black cottonwood allowed a large number of candidate genes to be evaluated for associations with lignocellulosic cell wall development. The genes studied are those known to be associated with these pathways and ones that have been extensively studied for commercial applications, such as pulp and feedstock production, and are now being further evaluated for improvement in relation to biofuels production.
Given the rapid decay of within-gene LD in black cottonwood and the high coverage of amplicons across each gene, it is likely that the numerous polymorphisms identified are in close proximity to the causative SNPs and that the haplotype associations accurately reflect the information present in the associations between markers. This study demonstrates that a forward genetics approach (association genetics) can be used to discover naturally occurring allelic variation in genes associated with commercially important traits, in this case, lignin and cellulose biosynthesis. Many of the same genes were implicated using reverse genetics approaches; however, the association approach provides estimates of the size of effects of these alleles on a phenotype. Understanding the size of the effects as well as the existing variation is critical in applying the knowledge gained on a particular SNP to marker-based breeding programs with goals to increase cellulose yield and therefore cellulosic ethanol production. Given the increasing efficiency and falling costs of sequencing and genotyping technologies, the goal of resequencing the genome and relating the polymorphisms to a trait of interest is now feasible.
Acknowledgements: We thank Charles Nicolet and Vanessa Rashbrook for performing the SNP genotyping, and John Liechty and Benjamin Figueroa for bioinformatics support. Funding for this project was made available through the Chevron Technology Ventures-UC Davis Biofuels Project.
ANDERSSON-GUNNERAS, S., E. J. MELLEROWICZ, J. LOVE, B. SEGERMAN, Y. OHMIYA et al., 2006 Biosynthesis of cellulose-enriched tension wood in Populus: global analysis of transcripts and metabolites identifies biochemical and developmental regulators in secondary wall biosynthesis. Plant Journal 45: 144-165.
APPERT, C., E. LOGEMANN, K. HAHLBROCK, J. SCHMID and N. AMRHEIN, 1994 Structural and Catalytic Properties of the 4 Phenylalanine Ammonia-Lyase Isoenzymes from Parsley (Petroselinum crispum Nym). European Journal of Biochemistry 225: 491-499.
BATE, N. J., J. ORR, W. T. NI, A. MEROMI, T. NADLERHASSAR et al., 1994 Quantitative Relationship between Phenylalanine Ammonia-Lyase Levels and Phenylpropanoid Accumulation in Transgenic Tobacco Identifies a Rate-Determining Step in Natural Product Synthesis. Proceedings of the National Academy of Sciences of the United States of America 91: 7608-7612.
BAUCHER, M., B. CHABBERT, G. PILATE, J. VANDOORSSELAERE, M. T. TOLLIER et al., 1996 Red xylem and higher lignin extractability by down-regulating a cinnamyl alcohol dehydrogenase in poplar. Plant Physiology 112: 1479-1490.
BAUCHER, M., C. HALPIN, M. PETIT-CONIL and W. BOERJAN, 2003 Lignin: Genetic engineering and impact on pulping. Critical Reviews in Biochemistry and Molecular Biology 38: 305-350.
BEAUMONT, M. A., W. ZHANG and D. J. BALDING, 2002 Approximate Bayesian computation in population genetics. Genetics 162: 2025-2035.
BISHAI, J. M., W. MITZNER, C. G. TANKERSLEY and E. M. WAGNER, 2007 PEEP-induced changes in epithelial permeability in inbred mouse strains. Respir Physiol Neurobiol 156: 340-344.
BOERJAN, W., 2005 Biotechnology and the domestication of forest trees. Curr Opin Biotechnol 16: 159-166.
BOERJAN, W., J. RALPH and M. BAUCHER, 2003 Lignin biosynthesis. Annual Review of Plant Biology 54: 519-546.
BOUDET, A. M., S. KAJITA, J. GRIMA-PETTENATI and D. GOFFNER, 2003 Lignins and lignocellulosics: a better control of synthesis for new and improved uses. Trends Plant Sci 8: 576-581.
BRADBURY, P. J., Z. ZHANG, D. E. KROON, T. M. CASSTEVENS, Y. RAMDOSS et al., 2007 TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633-2635.
BRADSHAW, H. D., R. CEULEMANS, J. DAVIS and R. STETTLER, 2000 Emerging Model Systems in Plant Biology: Poplar (Populus) as a Model Forest Tree. Journal of Plant Growth Regulation 19: 306-313.
BROWN, G. R., D. L. BASSONI, G. P. GILL, J. R. FONTANA, N. C. WHEELER et al., 2003 Identification of quantitative trait loci influencing wood property traits in loblolly pine (Pinus taeda L.). III. QTL Verification and candidate gene mapping. Genetics 164: 1537-1546.
BROWN, G. R., G. P. GILL, R. J. KUNTZ, C. H. LANGLEY and D. B. NEALE, 2004 Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc Natl Acad Sci U S A 101: 15255-15260.
CHU, Y., X. SU, Q. HUANG and X. ZHANG, 2009 Patterns of DNA sequence variation at candidate gene loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms. Genetica 137: 141-150.
COCHRANE, F. C., L. B. DAVIN and N. G. LEWIS, 2004 The Arabidopsis phenylalanine ammonia lyase gene family: kinetic characterization of the four PAL isoforms. Phytochemistry 65: 1557-1564.
COLEMAN, H. D., J. YAN and S. D. MANSFIELD, 2009 Sucrose synthase affects carbon partitioning to increase cellulose production and altered cell wall ultrastructure. Proceedings of the National Academy of Sciences 106: 13118-13123.
DAVIS, J. M., 2008 Genetic Improvement of Poplar (Populus spp.) as a Bioenergy Crop, pp. 397-419 in Genetic Improvement of Bioenergy Crops, edited by W. VERMERRIS. Springer New York, New York.
DAVISON, B. H., S. R. DRESCHER, G. A. TUSKAN, M. F. DAVIS and N. P. NGHIEM, 2006 Variation of S/G ratio and lignin content in a Populus family influences the release of xylose by dilute acid hydrolysis. Appl Biochem Biotechnol 129-132: 427-435.
DIXON, R. A., and M. S. S. REDDY, 2003 Biosynthesis of monolignols. Genomic and reverse genetic approaches. Phytochemistry Reviews 2: 289-306.
DJERBI, S., M. LINDSKOG, L. ARVESTAD, F. STERKY and T. T. TEERI, 2005 The genome sequence of black cottonwood (Populus trichocarpa) reveals 18 conserved cellulose synthase (CesA) genes. Planta 221: 739-746.
DO, C. B., M. S. P. MAHABHASHYAM, M. BRUDNO and S. BATZOGLOU, 2005 ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research 15: 330-340.
ECKERT, A. J., A. D. BOWER, J. L. WEGRZYN, B. PANDE, K. D. JERMSTAD et al., 2009 Association genetics of coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics 182: 1289-1302.
ECKERT, A. J., J. L. WEGRZYN, B. PANDE, K. D. JERMSTAD, J. M. LEE et al., 2009 Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas Fir (Pseudotsuga menziesii var. menziesii). Genetics 183: 289-298.
ECKERT, C. G., K. E. SAMIS and S. C. LOUGHEED, 2008 Genetic variation across species' geographical ranges: the central-marginal hypothesis and beyond. Mol Ecol 17: 1170-1188.
ELKIND, Y., R. EDWARDS, M. MAVANDAD, S. A. HEDRICK, O. RIBAK et al., 1990 Abnormal-Plant Development and down-Regulation of Phenylpropanoid Biosynthesis in Transgenic Tobacco Containing a Heterologous Phenylalanine Ammonia-Lyase Gene. Proceedings of the National Academy of Sciences of the United States of America 87: 9057-9061.
FAN, J. B., A. OLIPHANT, R. SHEN, B. G. KERMANI, F. GARCIA et al., 2003 Highly parallel SNP genotyping. Cold Spring Harb Symp Quant Biol 68: 69-78.
FEDER, M. E., and T. MITCHELL-OLDS, 2003 Evolutionary and ecological functional genomics. Nat Rev Genet 4: 651-657.
FRANKE, R., M. R. HEMM, J. W. DENAULT, M. O. RUEGGER, J. M. HUMPHREYS et al., 2002 Changes in secondary metabolism and deposition of an unusual lignin in the ref8 mutant of Arabidopsis. Plant Journal 30: 47-59.
FU, Y. X., and W. H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693-709.
GARCIA, M. V., and P. K. INGVARSSON, 2007 An excess of nonsynonymous polymorphism and extensive haplotype structure at the PtABI1B locus in European aspen (Populus tremula): a case of balancing selection in an obligately outcrossing plant? Heredity 99: 381-388.
GEISLER-LEE, J., M. GEISLER, P. M. COUTINHO, B. SEGERMAN, N. NISHIKUBO et al., 2006 Poplar carbohydrate-active enzymes. Gene identification and expression analyses. Plant Physiology 140: 946-962.
GILL, G. P., G. R. BROWN and D. B. NEALE, 2003 A sequence mutation in the cinnamyl alcohol dehydrogenase gene associated with altered lignification in loblolly pine. Plant Biotechnol J 1: 253-258.
GONZALEZ-MARTINEZ, S. C., E. ERSOZ, G. R. BROWN, N. C. WHEELER and D. B. NEALE, 2006 DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics 172: 1915-1926.
GONZALEZ-MARTINEZ, S. C., D. HUBER, E. ERSOZ, J. M. DAVIS and D. B. NEALE, 2008 Association genetics in Pinus taeda L. II. Carbon isotope discrimination. Heredity 101: 19-26.
GONZALEZ-MARTINEZ, S. C., N. C. WHEELER, E. ERSOZ, C. D. NELSON and D. B. NEALE, 2007 Association genetics in Pinus taeda L. I. Wood property traits. Genetics 175: 399-409.
GOUDET, J., 2005 HIERFSTAT, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes 5: 184-186.
GROOVER, A. T., 2007 Will genomics guide a greener forest biotech? Trends Plant Sci 12: 234-238.
GUESS, H. A., and W. J. EWENS, 1972 Theoretical and simulation results relating to the neutral allele theory. Theor Popul Biol 3: 434-447.
HAIGLER, C. H., and R. L. BLANTON, 1996 New hope for old dreams: Evidence that plant cellulose synthase genes have finally been identified. Proceedings of the National Academy of Sciences of the United States of America 93: 12082-12085.
HAIGLER, C. H., M. IVANOVA-DATCHEVA, P. S. HOGAN, V. V. SALNIKOV, S. HWANG et al., 2001 Carbon partitioning to cellulose synthesis. Plant Molecular Biology 47: 29-51.
HARDING, S. A., J. LESHKEVICH, V. L. CHIANG and C. J. TSAI, 2002 Differential substrate inhibition couples kinetically distinct 4-coumarate : coenzyme A ligases with spatially distinct metabolic roles in quaking aspen. Plant Physiology 128: 428-438.
HILL, W. G., and A. ROBERTSON, 1968 Effects of Inbreeding at Loci with Heterozygote Advantage. Genetics 60: 615-&.
HOFFMANN, B., B. CHABBERT, B. MONTIES and T. SPECK, 2003 Mechanical, chemical and X-ray analysis of wood in the two tropical lianas Bauhinia guianensis and Condylocarpon guianense: variations during ontogeny. Planta 217: 32-40.
HOFFMANN, L., S. BESSEAU, P. GEOFFROY, C. RITZENTHALER, D. MEYER et al., 2004 Silencing of hydroxycinnamoyl-coenzyme A shikimate/quinate hydroxycinnamoyltransferase affects phenylpropanoid biosynthesis. Plant Cell 16: 1446-1465.
HOFFMANN, L., S. MAURY, F. MARTZ, P. GEOFFROY and M. LEGRAND, 2003 Purification, cloning, and properties of an acyltransferase controlling shikimate and quinate ester intermediates in phenylpropanoid metabolism. Journal of Biological Chemistry 278: 95-103.
HU, W. J., S. A. HARDING, J. LUNG, J. L. POPKO, J. RALPH et al., 1999 Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees. Nature Biotechnology 17: 808-812.
HUDSON, P. J., A. P. DOBSON, I. M. CATTADORI, D. NEWBORN, D. T. HAYDON et al., 2002 Trophic interactions and population growth rates: describing patterns and identifying mechanisms. Philos Trans R Soc Lond B Biol Sci 357: 1259-1271.
INGVARSSON, P. K., 2005 Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169: 945-953.
INGVARSSON, P. K., 2009 Natural selection on synonymous and non-synonymous mutations shape patterns of polymorphism in Populus tremula. Mol Biol Evol.
INGVARSSON, P. K., M. V. GARCIA, V. LUQUEZ, D. HALL and S. JANSSON, 2008 Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 Locus in European aspen (Populus tremula, Salicaceae). Genetics 178: 2217-2226.
JOSHI, C. P., S. BHANDARI, P. RANJAN, U. C. KALLURI, X. LIANG et al., 2004 Genomics of cellulose biosynthesis in poplars. New Phytologist 164: 53-61.
KAO, Y. Y., S. A. HARDING and C. J. TSAI, 2002 Differential expression of two distinct phenylalanine ammonia-lyase genes in condensed tannin-accumulating and lignifying cells of quaking aspen. Plant Physiology 130: 796-807.
KELLEHER, C. T., R. CHIU, H. SHIN, I. E. BOSDET, M. I. KRZYWINSKI et al., 2007 A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation. Plant J 50: 1063-1078.
KUMAR, M., S. THAMMANNAGOWDA, V. BULONE, V. CHIANG, K. H. HAN et al., 2009 An update on the nomenclature for the cellulose synthase genes in Populus. Trends in Plant Science 14: 248-254.
LANDEGREN, U., M. NILSSON and P. Y. KWOK, 1998 Reading bits of genetic information: methods for single-nucleotide polymorphism analysis. Genome Res 8: 769-776.
LAPIERRE, C., B. POLLET, M. PETIT-CONIL, G. TOVAL, J. ROMERO et al., 1999 Structural alterations of lignins in transgenic poplars with depressed cinnamyl alcohol dehydrogenase or caffeic acid O-methyltransferase activity have an opposite impact on the efficiency of industrial kraft pulping. Plant Physiology 119: 153-163.
LEPLE, J. C., R. DAUWE, K. MORREEL, V. STORME, C. LAPIERRE et al., 2007 Downregulation of cinnamoyl-coenzyme A reductase in poplar: multiple-level phenotyping reveals effects on cell wall polymer metabolism and structure. Plant Cell 19: 3669-3691.
LI, L., Y. ZHOU, X. CHENG, J. SUN, J. M. MARITA et al., 2003 Combinatorial modification of multiple lignin traits in trees through multigene cotransformation. Proc Natl Acad Sci U S A 100: 4939-4944.
LI, Y., S. KAJITA, S. KAWAI, Y. KATAYAMA and N. MOROHOSHI, 2003 Down-regulation of an anionic peroxidase in transgenic aspen and its effect on lignin characteristics. J Plant Res 116: 175-182.
LU, S. F., Y. H. ZHOU, L. G. LI and V. L. CHIANG, 2006 Distinct roles of cinnamate 4-hydroxylase genes in Populus. Plant and Cell Physiology 47: 905-914.
MACKAY, J., D. R. DIMMEL and J. J. BOON, 2001 Pyrolysis mass spectral characterization of wood from CAD-deficient pine. Journal of Wood Chemistry and Technology 21: 19-29.
MARCHINI, J., L. R. CARDON, M. S. PHILLIPS and P. DONNELLY, 2004 The effects of human population structure on large genetic association studies. Nat Genet 36: 512-517.
MARJORAM, P., J. MOLITOR, V. PLAGNOL and S. TAVARE, 2003 Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100: 15324-15328.
MARTH, G. T., I. KORF, M. D. YANDELL, R. T. YEH, Z. J. GU et al., 1999 A general approach to single-nucleotide polymorphism discovery. Nature Genetics 23: 452-456.
MENG, M., M. GEISLER, H. JOHANSSON, E. J. MELLEROWICZ, S. KARPINSKI et al., 2007 Differential tissue/organ-dependent expression of two sucrose- and cold-responsive genes for UDP-glucose pyrophosphorylase in Populus. Gene 389: 186-195.
MEYERMANS, H., K. MORREEL, C. LAPIERRE, B. POLLET, A. DE BRUYN et al., 2000 Modifications in Lignin and Accumulation of Phenolic Glucosides in Poplar Xylem upon Down-regulation of Caffeoyl-Coenzyme A O-Methyltransferase, an Enzyme Involved in Lignin Biosynthesis. J. Biol. Chem. 275: 36899-36909.
NEALE, D. B., and O. SAVOLAINEN, 2004 Association genetics of complex traits in conifers. Trends Plant Sci 9: 325-330.
NICKERSON, D. A., V. O. TOBE and S. L. TAYLOR, 1997 PolyPhred: Automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Research 25: 2745-2751.
OAKLEY, R. V., Y. S. WANG, W. RAMAKRISHNA, S. A. HARDING and C. J. TSAI, 2007 Differential expansion and expression of alpha- and beta-tubulin gene families in Populus. Plant Physiol 145: 961-973.
OLIPHANT, A., D. L. BARKER, J. R. STUELPNAGEL and M. S. CHEE, 2002 BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques Suppl: 56-58, 60-61.
OSAKABE, K., C. C. TSAO, L. G. LI, J. L. POPKO, T. UMEZAWA et al., 1999 Coniferyl aldehyde 5-hydroxylation and methylation direct syringyl lignin biosynthesis in angiosperms. Proceedings of the National Academy of Sciences of the United States of America 96: 8955-8960.
OSAKABE, Y., Y. OHTSUBO, S. KAWAI, Y. KATAYAMA and N. MOROHOSHI, 1995 Structure and Tissue-Specific Expression of Genes for Phenylalanine Ammonia-Lyase from a Hybrid Aspen, Populus-Kitakamiensis. Plant Science 105: 217-226.
PARVATHI, K., F. CHEN, D. J. GUO, J. W. BLOUNT and R. A. DIXON, 2001 Substrate preferences of O-methyltransferases in alfalfa suggest new pathways for 3-O-methylation of monolignols. Plant Journal 25: 193-202.
PAVY, N., B. PELGAS, S. BEAUSEIGLE, S. BLAIS, F. GAGNON et al., 2008 Enhancing genetic mapping of complex genomes through the design of highly-multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce. BMC Genomics 9: 21.
PETER, G., and D. NEALE, 2004 Molecular basis for the evolution of xylem lignification. Curr Opin Plant Biol 7: 737-742.
PICHON, M., I. COURBOU, M. BECKERT, A. M. BOUDET and J. GRIMA-PETTENATI, 1998 Cloning and characterization of two maize cDNAs encoding Cinnamoyl-CoA Reductase (CCR) and differential expression of the corresponding genes. Plant Molecular Biology 38: 671-676.
PILATE, G., E. GUINEY, K. HOLT, M. PETIT-CONIL, C. LAPIERRE et al., 2002 Field and pulping performances of transgenic trees with altered lignification. Nature Biotechnology 20: 607-612.
PINCON, G., M. CHABANNES, C. LAPIERRE, B. POLLET, K. RUEL et al., 2001 Simultaneous down-regulation of caffeic/5-hydroxy ferulic acid-O-methyltransferase I and cinnamoyl-coenzyme a reductase in the progeny from a cross between tobacco lines homozygous for each transgene. Consequences for plant development and lignin synthesis. Plant Physiology 126: 145-155.
PLOMION, C., G. LEPROVOST and A. STOKES, 2001 Wood formation in trees. Plant Physiol 127: 1513-1523.
POKE, F. S., R. E. VAILLANCOURT, R. C. ELLIOTT and J. B. REID, 2003 Sequence variation in two lignin biosynthesis genes, cinnamoyl CoA reductase (CCR) and cinnamyl alcohol dehydrogenase 2 (CAD2). Molecular Breeding 12: 107-118.
PRICE, A. L., N. J. PATTERSON, R. M. PLENGE, M. E. WEINBLATT, N. A. SHADICK et al., 2006 Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38: 904-909.
RAGAUSKAS, A. J., C. K. WILLIAMS, B. H. DAVISON, G. BRITOVSEK, J. CAIRNEY et al., 2006 The path forward for biofuels and biomaterials. Science 311: 484-489.
RALPH, J., T. AKIYAMA, H. KIM, F. C. LU, P. F. SCHATZ et al., 2006 Effects of coumarate 3-hydroxylase down-regulation on lignin structure. Journal of Biological Chemistry 281: 8843-8853.
RALPH, S., C. ODDY, D. COOPER, H. YUEH, S. JANCSIK et al., 2006 Genomics of hybrid poplar (Populus trichocarpa x deltoides) interacting with forest tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray for the study of insect-induced defences in poplar. Mol Ecol 15: 1275-1297.
RANOCHA, P., M. CHABANNES, S. CHAMAYOU, S. DANOUN and A. JAUNEAU, 2002 Laccase down-regulation causes alterations in phenolic metabolism and cell wall structure in poplar. Plant Physiol. 129: 145.
RANOCHA, P., G. MCDOUGALL, S. HAWKINS, R. STERJIADES, G. BORDERIES et al., 1999 Biochemical characterization, molecular cloning and expression of laccases - a divergent gene family - in poplar. European Journal of Biochemistry 259: 485-495.
REMINGTON, D. L., J. M. THORNSBERRY, Y. MATSUOKA, L. M. WILSON, S. R. WHITT et al., 2001 Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A 98: 11479-11484.
RUBIN, E. M., 2008 Genomics of cellulosic biofuels. Nature 454: 841-845.
SCHAID, D. J., C. M. ROWLAND, D. E. TINES, R. M. JACOBSON and G. A. POLAND, 2002 Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 70: 425-434.
SCHRADER, J., J. NILSSON, E. MELLEROWICZ, A. BERGLUND, P. NILSSON et al., 2004 A high-resolution transcript profile across the wood-forming meristem of poplar identifies potential regulators of cambial stem cell identity. Plant Cell 16: 2278-2292.
SEWALT, V. J. H., W. T. NI, H. G. JUNG and R. A. DIXON, 1997 Lignin impact on fiber degradation: Increased enzymatic digestibility of genetically engineered tobacco (Nicotiana tabacum) stems reduced in lignin content. Journal of Agricultural and Food Chemistry 45: 1977-1983.
STERKY, F., R. R. BHALERAO, P. UNNEBERG, B. SEGERMAN, P. NILSSON et al., 2004 A Populus EST resource for plant functional genomics. Proceedings of the National Academy of Sciences of the United States of America 101: 13951-13956.
STERKY, F., S. REGAN, J. KARLSSON, M. HERTZBERG, A. ROHDE et al., 1998 Gene discovery in the wood-forming tissues of poplar: analysis of 5,692 expressed sequence tags. Proc Natl Acad Sci U S A 95: 13330-13335.
STINCHCOMBE, J. R., and H. E. HOEKSTRA, 2008 Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits. Heredity 100: 158-170.
STOREY, J. D., 2002 A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B (Methodological) 64: 479-498.
STOREY, J. D., and R. TIBSHIRANI, 2003 Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100: 9440-9445.
STRAUSS, S. H., and F. M. MARTIN, 2004 Poplar genomics comes of age. New Phytologist 164: 1-4.
SUZUKI, S., L. G. LI, Y. H. SUN and V. L. CHIANG, 2006 The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa. Plant Physiology 142: 1233-1245. TAYLOR, G., 2002 Populus: arabidopsis for forestry. Do we need a model tree? Ann Bot 90: 681-689. THUMMA, B. R., M. F. NOLAN, R. EVANS and G. F. MORAN, 2005 Polymorphisms in cinnamoyl CoA reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics 171: 1257-1265. TSAI, C. J., S. A. HARDING, T. J. TSCHAPLINSKI, R. L. LINDROTH and Y. YUAN, 2006 Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus. New Phytologist 172: 47-62. TSAI, C. J., J. L. POPKO, M. R. MIELKE, W . J. HU, G. K. PODILA et al., 1998 Suppression of Omethyltransferase gene by homologous sense transgene in quaking aspen causes red-brown wood phenotypes. Plant Physiology 117: 101-112. TUSKAN, G. A., S. DIFAZIO, S. JANSSON, J. BOHLMANN, I. GRIGORIEV et al., 2006 The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596-1604. UNNEBERG, P., M. STROMBERG and F. STERKY, 2005 SNP discovery using advanced algorithms and neural networks. Bioinformatics 21: 2528-2530. VAN DOORSSELAERE, J., M. BAUCHER, E. CHOGNOT, B. CHABBERT, M. T. TOLLIER et al., 1995 A novel lignin in poplar trees with a reduced caffeic acid 5-hydroxyferulic acid O-methyltransferase activity. Plant Journal 8: 855-864. WAGNER, A., J. RALPH, T. AKIYAMA, H. FLINT, L. PHILLIPS et al., 2007 Exploring lignification in conifers by silencing hydroxycinnamoyl-CoA : shikimate hydroxycinnamoyltransferase in Pinus radiata. Proceedings of the National Academy of Sciences of the United States of America 104: 1185611861. WEGRZYN, J. L., J. M. LEE, J. LIECHTY and D. B. NEALE, 2009 PineSAP--sequence alignment and SNP identification pipeline. Bioinformatics 25: 2609-2610. WHETTEN, R. W ., J. J. MACKAY and R. R. 
SEDEROFF, 1998 Recent Advances in Understanding Lignin Biosynthesis. Annu Rev Plant Physiol Plant Mol Biol 49: 585-609. WULLSCHLEGER, S. D., S. JANSSON and G. TAYLOR, 2002 Genomics and forest biology: Populus emerges as the perennial favorite. Plant Cell 14: 2651-2655. XU, Z., D. ZHANG, J. HU, X. ZHOU, X. YE et al., 2009 Comparative genome analysis of lignin biosynthesis gene families across the plant kingdom. BMC Bioinformatics 10 Suppl 11: S3. YU, X. Q., H. W . MEI, L. J. LUO, G. L. LIU, H. Y. LIU et al., 2006 Dissection of additive, epistatic effect and Q x E interaction of quantitative trait loci influencing stigma exsertion under water stress in rice. Yi Chuan Xue Bao 33: 542-550. ZHAO, H. Y., J. H. W EI, J. Y. ZHANG, H. R. LIU, T. WANG et al., 2002 Lignin biosynthesis by suppression of two O-methyltransferases. Chinese Science Bulletin 47: 1092-1095. ZHONG, R. Q., W . H. MORRISON, D. S. HIMMELSBACH, F. L. POOLE and Z. H. YE, 2000 Essential role of caffeoyl coenzyme A O-methyltransferase in lignin biosynthesis in woody poplar plants. Plant Physiology 124: 563-577. Table 1. 
Details of Candidate Genes selected for Resequencing

Gene: 4CL1+, 4CL3+, 4CL5, C3H3, C4H1*, C4H2*, CAD*+, CCR*+, CesA1A*+, CesA1B*+, CesA2A*, CesA2B*+, CesA3A+, HCT1*+, HCT6*, KOR1*, LAC1A*, LAC2, LAC90A, PAL2*, PAL4, PAL5, SAM1*+, SHMT1, SHMT3, SHMT6, SUSY1*+, TUA1, TUA5+, TUB15, TUB16, TUB9, CoAOMT1*, CoAOMT2, COMT1, COMT2*, F5H1, F5H2, gdcH1, gdcT2

Gene Family: 4-coumarate:CoA ligase (4CL); coumarate 3-hydroxylase (C3H); cinnamate 4-hydroxylase (C4H); cinnamyl alcohol dehydrogenase (CAD); cinnamoyl-CoA reductase (CCR); cellulose synthase (CesA); hydroxycinnamoyl-CoA quinate/shikimate hydroxycinnamoyltransferase (HCT); cellulase (KOR); laccase (LAC); phenylalanine ammonia-lyase (PAL); S-adenosylmethionine synthetase (SAMS); Serine hydroxymethyltransferase (SHMT); sucrose synthase (SUSY); alpha-tubulin (TUA); beta-tubulin (TUB); caffeoyl CoA O-methyltransferase (CCoAOMT); caffeate O-methyltransferase (COMT); ferulate 5-hydroxylase (F5H); glycine decarboxylase complex, H (gdcH); glycine decarboxylase complex, T (gdcT)

Amplicons: 10 3 3 3 2 3 7 5 10 10 6 6 7 5 5 4 5 2 5 5 4 4 3 3 6 3 6 3 3 4 3 3 3 3 4 4 4 3 6 3

SNPs Targeted: 42 9 19 10 9 15 56 21 43 65 25 32 38 33 27 12 15 7 29 17 24 24 20 17 26 14 24 12 19 27 13 18 14 24 26 17 14 9 36 10

SNPs Converted: 28 5 15 10 7 10 30 17 27 39 19 18 29 18 18 9 13 5 16 11 11 11 12 9 14 10 16 9 11 18 7 12 7 16 14 13 9 5 22 8

JGI Gene Model: estExt_fgenesh4_pg.C_1210004, grail3.0100002702, fgenesh4_pg.C_LG_III001773, fgenesh4_pg.C_LG_VI000268, grail3.0094002901, estExt_fgenesh4_pg.C_LG_XIII0519, estExt_Genewise1_v1.C_LG_IX2359, estExt_fgenesh4_kg.C_LG_III0056, gw1.XI.3218.1, eugene3.00040363, gw1.XVIII.3152.1, estExt_Genewise1_v1.C_LG_VI2188, eugene3.00002636, fgenesh4_pg.C_LG_III001559, eugene3.02080010, estExt_fgenesh4_pg.C_LG_I0683, estExt_fgenesh4_pg.C_LG_XVI1027, estExt_fgenesh4_pg.C_LG_VIII0541, estExt_fgenesh4_pm.C_LG_VIII0291, estExt_fgenesh4_pg.C_LG_VIII0293, gw1.X.2713.1, estExt_fgenesh4_pg.C_LG_X2023, eugene3.00080928, eugene3.00012227, grail3.0003095602, estExt_fgenesh4_pm.C_880008, estExt_fgenesh4_pm.C_LG_XVIII0009, gw1.II.3483.1, eugene3.00090803, estExt_Genewise1_v1.C_LG_I1970, estExt_fgenesh4_pm.C_LG_IX0457, eugene3.00010909, grail3.0001059501, estExt_fgenesh4_pm.C_LG_I1023, estExt_fgenesh4_pm.C_LG_XV0035, estExt_fgenesh4_pm.C_LG_XII0129, estExt_fgenesh4_pm.C_570058, eugene3.00071182, estExt_fgenesh4_pg.C_LG_XII1299, eugene3.02520018

*Genes with significant single marker associations
+Genes with significant haplotype-based associations
SNPs Targeted – SNPs identified and sent for genotyping on the Illumina GoldenGate assay
SNPs Converted – SNPs successfully genotyped on the Illumina GoldenGate assay

Table 2. Major Peak Assignments from the Pyrolysis Molecular Beam Mass Spectrometry

m/z                        Assignment                                         (S) or (G) Precursor
57, 73, 85, 96, 114        C5 sugars
57, 60, 73, 98, 126, 144   C6 sugars
124                        guaiacol                                           G
137                        ethylguaiacol, homovanillin, coniferyl alcohol     G
138                        methylguaiacol                                     G
150                        vinylguaiacol, coumaryl alcohol                    G
152                        4-ethylguaiacol, vanillin                          G
154                        syringol                                           S
164                        allyl-*propenyl guaiacol                           G
167                        ethylsyringol, syringylacetone, propiosyringone    S
168                        4-methyl-2,6-dimethoxyphenol                       S
178                        coniferyl aldehyde                                 G
180                        coniferyl alcohol, vinylsyringol, -D-glucose       G, S
182                        syringaldehyde                                     S
194                        4-propenylsyringol                                 S
208                        sinapylaldehyde                                    S
210                        sinapylalcohol                                     S

m/z – mass to charge ratio
S – syringyl peaks
G – guaiacyl peaks

Table 3.
List of significant marker-trait pairs after a correction for multiple testing (FDR Q ≤ 0.10)

Trait    Gene Symbol      SNP       F       P       N    R2      Q
Lignin   C4H1_04-219      [A:C]ns   4.6766  0.0013  433  0.0187  0.0395
Lignin   C4H2_09-169      [A:C]nc   9.8329  0.0001  433  0.0384  0.0178
Lignin   CCR_08-554       [A:G]nc   5.9541  0.0014  435  0.0119  0.0395
Lignin   CesA1A_20-226    [A:G]nc   4.0516  0.0015  432  0.0163  0.0402
Lignin   CesA1B_02-87     [A:G]nc   4.0226  0.0024  427  0.0163  0.0482
Lignin   CesA1B_04-127    [A:C]nc   5.7095  0.001   432  0.0227  0.0288
Lignin   CesA1B_08-261    [A:G]nc   5.4417  0.001   434  0.0216  0.0288
Lignin   CesA2A_08-38     [A:G]ns   8.6111  0.0011  431  0.0172  0.0288
Lignin   HCT1_03-246      [A:G]nc   3.9879  0.0027  434  0.0159  0.0482
Lignin   HCT6_13-225      [A:G]s    5.4364  0.0001  433  0.0217  0.0178
Lignin   SUSY1_02-108     [A:T]nc   6.7751  0.0001  433  0.0268  0.0178
Lignin   SUSY1_10-258     [A:C]nc   6.7036  0.0001  433  0.0265  0.0178
Lignin   SUSY1_14-94      [A:G]nc   3.7898  0.0027  434  0.0152  0.0482
S/G      CAD_04-185       [A:T]nc   6.5211  0.001   325  0.0322  0.0288
C6       C4H2_09-169      [A:C]nc   9.9962  0.003   433  0.0373  0.0444
C6       C4H2_12-151      [A:G]s    3.535   0.0027  429  0.0137  0.0487
C6       CAD_04-185       [A:T]nc   3.4754  0.0033  325  0.0175  0.0487
C6       CesA1A_02-481    [A:C]     5.6615  0.008   432  0.0216  0.0766
C6       CesA1A_12-40     [A:G]s    3.593   0.0033  433  0.0138  0.0487
C6       CesA1A_20-226    [A:G]nc   5.7281  0.003   432  0.0218  0.0444
C6       CesA1B_04-127    [A:C]nc   3.4124  0.0037  432  0.0131  0.0487
C6       CesA1B_08-261    [A:G]nc   3.4197  0.0035  434  0.0131  0.0487
C6       CesA2A_08-38     [A:G]ns   8.1264  0.008   431  0.0157  0.0766
C6       CesA2B_01-162    [A:C]nc   3.5186  0.0026  431  0.0135  0.0487
C6       CoAOMT1_08-313   [A:G]nc   7.2144  0.002   431  0.0272  0.0344
C6       COMT2_10-423     [A:C]nc   3.3459  0.0046  397  0.014   0.0487
C6       HCT6_13-225      [A:G]s    3.1681  0.0044  433  0.0122  0.0487
C6       LAC1a_03-98      [A:G]nc   4.6733  0.0013  424  0.0184  0.0487
C6       LAC1a_11-493     [A:G]nc   2.8918  0.0049  433  0.0111  0.0487
C6       PAL2_04-212      [A:G]nc   3.5212  0.0021  432  0.0135  0.0487
C6       SAM1_09-195      [A:T]nc   4.1603  0.0015  422  0.0162  0.0487
C6       SUSY1_02-108     [A:T]nc   5.4253  0.004   433  0.0207  0.0366
C6       SUSY1_02-396     [A:G]nc   3.0401  0.0042  430  0.0118  0.0487
C6       SUSY1_02-503     [A:G]nc   3.1539  0.0031  432  0.0122  0.0487
C6       SUSY1_10-258     [A:C]nc   4.3819  0.0014  433  0.0168  0.0487
C6       SUSY1_14-128     [A:T]nc   3.2779  0.0035  434  0.0126  0.0487
C6       SUSY1_14-94      [A:G]nc   3.2779  0.0045  434  0.0126  0.0487

ns Nonsynonymous polymorphism. s Synonymous polymorphism. nc Noncoding polymorphism.

Table 4. List of marker effects for significant marker-trait pairs

Trait    SNP              2a (b)   d (c)    d/a      2a/s_p (d)  Frequency (e)  α (f)
Lignin   C4H1_04-219      1.1128   -0.2097  -0.3769  0.8912      0.17 (C)       0.7467
Lignin   C4H2_09-169      5.4356   -2.8933  -1.0646  4.3533      0.01 (A)       5.2266
Lignin   CesA1A_20-226    0.8812   -0.2481  -0.5631  0.7057      0.32 (A)       0.3124
Lignin   CesA1B_02-87     0.7162   0.1487   0.4154   0.5736      0.30 (G)       -0.7869
Lignin   CesA1B_04-127    0.8864   0.1855   0.4185   0.7099      0.28 (A)       -0.6741
Lignin   CesA1B_08-261    0.8609   0.1475   0.3427   0.6895      0.28 (A)       -0.5574
Lignin   HCT1_03-246      1.7016   -0.0340  -0.0400  1.3628      0.07 (A)       1.4719
Lignin   HCT6_13-225      1.1007   -0.8865  -1.6108  0.8815      0.12 (G)       0.8653
Lignin   SUSY1_02-108     0.6518   0.5331   1.6356   0.5220      0.13 (T)       0.3869
Lignin   SUSY1_10-258     0.4200   0.7762   3.6963   0.3363      0.07 (A)       0.2358
Lignin   SUSY1_14-94      1.8197   -0.1772  -0.1947  1.4574      0.06 (A)       1.5948
S/G      CAD_04-185       0.1655   0.0268   0.3236   0.7762      0.33 (A)       -0.4801
C6       C4H2_09-169      9.3773   4.0891   0.8721   4.4291      0.01 (A)       -9.3344
C6       C4H2_12-151      0.9478   -0.2340  -0.4938  0.4477      0.45 (G)       -0.7508
C6       CAD_04-185       1.3827   0.3715   0.5373   0.6531      0.33 (A)       -8.8239
C6       CesA1A_02-481    4.4213   2.1152   0.9568   2.0883      0.08 (C)       -4.2124
C6       CesA1A_12-40     2.0992   0.2777   0.2645   0.9915      0.13 (A)       -1.8498
C6       CesA1A_20-226    1.6669   0.5787   0.6943   0.7873      0.32 (A)       -1.1166
C6       CesA1B_04-127    1.5740   -0.3659  -0.4649  0.7435      0.28 (A)       0.7040
C6       CesA1B_08-261    1.5529   -0.2997  -0.3860  0.7335      0.28 (A)       0.8447
C6       CesA2B_01-162    1.2053   0.2626   0.4358   0.5693      0.30 (A)       -0.9912
C6       CoAOMT1_08-313   0.9396   -0.4899  -1.0429  0.4438      0.47 (A)       -0.0247
C6       COMT2_10-423     1.3815   -0.1172  -0.1697  0.6525      0.44 (C)       -2.4781
C6       HCT6_13-225      1.8043   1.2473   1.3826   0.8522      0.12 (G)       -1.7190
C6       LAC1a_03-98      2.5171   0.2932   0.2329   1.1889      0.14 (G)       -2.7865
C6       LAC1a_11-493     1.3099   0.0851   0.1299   0.6187      0.26 (A)       0.6407
C6       PAL2_04-212      0.9397   -1.5024  -3.1978  0.4438      0.05 (G)       -1.0696
C6       SAM1_09-195      0.8770   0.7918   1.8058   0.4142      0.32 (A)       -1.4465
C6       SUSY1_02-108     1.4921   -0.6568  -0.8804  0.7047      0.13 (T)       -1.3030
C6       SUSY1_02-396     1.0363   -0.1138  -0.2196  0.4895      0.23 (G)       -1.0285
C6       SUSY1_02-503     0.9770   -0.1478  -0.3024  0.4615      0.23 (G)       -0.8377
C6       SUSY1_10-258     1.5603   -0.8029  -1.0292  0.7370      0.07 (A)       -1.4921
C6       SUSY1_14-128     3.6518   0.5908   0.3236   1.7248      0.06 (A)       -3.3928
C6       SUSY1_14-94      3.6518   0.5908   0.3236   1.7248      0.06 (A)       -3.3928

b Calculated as the difference between the phenotypic means observed within each homozygous class (2a = |G_BB - G_bb|, where G_ij is the trait mean in the ijth genotypic class).
c Calculated as the difference between the phenotypic mean observed within the heterozygous class and the average phenotypic mean across both homozygous classes (d = G_Bb - 0.5(G_BB + G_bb), where G_ij is the trait mean in the ijth genotypic class).
d s_p, standard deviation for the phenotypic trait under consideration.
e Allele frequency of either the derived or minor allele. SNP alleles corresponding to the frequency listed are given in parentheses.
f The additive effect was calculated as α = p_B(G_BB) + p_b(G_Bb) - G, where G is the overall trait mean, G_ij is the trait mean in the ijth genotypic class and p_i is the frequency of the ith marker allele. These values were always calculated with respect to the minor allele.

Table 5.
List of haplotypes with significant associations to phenotype after a correction for multiple testing (FDR Q ≤ 0.10)

Amplicon    Trait   P       Q       Haplotypes  Single marker associations
4CL1_11     lignin  0.0042  0.0539  3           4CL1_11-108 (0.2278)a
4CL3_14     lignin  0.0021  0.0519  6           4CL3_13-464 (0.2041)a
CAD_04      lignin  0.0065  0.0578  9           *CAD_04-185 (S/G, C6)
CCR_12      lignin  0.0060  0.0578  4           CCR_12-366 (0.2168)a
CesA1B_10   lignin  0.0038  0.0539  6           CesA1B_10-41 (0.3726)a
CesA2B_16   lignin  0.0055  0.0576  2           CesA2B_16-423 (0.2967)a
CesA3A_09   lignin  0.0018  0.0519  5           CesA3A_09-93 (0.2068)a
CesA3A_13   lignin  0.0022  0.0519  7           CesA3A_13-535 (0.1777)a
HCT1_12     lignin  0.0016  0.0519  3           HCT1_12-156 (0.1828)a
SUSY1_02    lignin  0.0053  0.0576  3           *SUSY1_02-108 (lignin, C6); *SUSY1_02-396, SUSY1_02-503 (C6)
TUA5_09     lignin  0.0027  0.0521  7           TUA5_09-73 (0.1899)a
4CL1_01     C6      0.0000  0.0018  5           4CL1_01-468 (0.1668)a
CesA1A_12   C6      0.0005  0.0231  6           *CesA1A_12-40 (C6)
SAM1_07     C6      0.0008  0.0239  5           SAM1_07-480 (0.2874)a

Significant Haplotypes: TGC AGC CGT GGT GGA CAAAAT CATAAT GATAAT 0 AGA 0 TAAAAA CGGAA CAAAT CGGCT CAACT AA GA AAAA TGGG 0 AGA AAA AGA AAA GAG AGAA GGAA

Haplotype Frequency: 0.31 0.94 0.02 0.09 0.22 0.03 0.01 0.02 0.15 0.01 0.15 0.04 0.02 0.57 0.73 0.08 0.77 0.13 0.12 0.02 0.09 0.04 0.20 0.01 0.30

*Significant single marker associations (FDR Q ≤ 0.10) listed with the associated traits
a Single marker associations with the lowest Q value relating to the significant haplotype-trait association

Figure Legends

Figure 1: Descriptive information about the distribution, sampling localities, and population structure across the range of black cottonwood. (A) Range map for black cottonwood. (B) Sample locations across Oregon and Washington. Each point denotes a single tree (n = 448). (C) Population structure estimates across all the sampled range of black cottonwood. Colors designate the five significant genetic clusters detected using PCA. Multiple colors denote points with multiple clones assigned to different genetic clusters.
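The marker-effect definitions given in the Table 4 footnotes (2a, d, d/a, 2a/s_p and the allele effect) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the genotype-class means in the usage lines are invented for illustration, and p_b is assumed to equal 1 - p_B for a biallelic SNP.

```python
def marker_effects(mean_BB, mean_Bb, mean_bb, trait_sd):
    """Effect summaries for one biallelic SNP, following the Table 4 footnotes.

    mean_BB, mean_bb: trait means of the two homozygous classes.
    mean_Bb: trait mean of the heterozygous class.
    trait_sd: phenotypic standard deviation of the trait (s_p).
    """
    two_a = abs(mean_BB - mean_bb)            # 2a = |G_BB - G_bb|
    d = mean_Bb - 0.5 * (mean_BB + mean_bb)   # d = G_Bb - 0.5(G_BB + G_bb)
    a = two_a / 2.0
    d_over_a = d / a if a > 0 else float("nan")
    return {"2a": two_a, "d": d, "d/a": d_over_a, "2a/sp": two_a / trait_sd}


def allele_effect(freq_B, mean_BB, mean_Bb, overall_mean):
    """alpha = p_B * G_BB + p_b * G_Bb - G, assuming p_b = 1 - p_B."""
    return freq_B * mean_BB + (1.0 - freq_B) * mean_Bb - overall_mean


# Hypothetical genotype-class means (illustrative only, not values from the study).
effects = marker_effects(mean_BB=24.0, mean_Bb=23.2, mean_bb=23.0, trait_sd=1.25)
alpha = allele_effect(freq_B=0.2, mean_BB=24.0, mean_Bb=23.2, overall_mean=23.1)

# Consistency check on one published row (C4H1_04-219, lignin): d / (2a / 2).
table_ratio = -0.2097 / (1.1128 / 2.0)
```

Recomputing d/a from the tabulated 2a and d in this way is a quick check that the columns of Table 4 are internally consistent.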
Figure 2: (A) Decay of LD with distance in base pairs between sites in two candidate genes: SUSY1 and C4H1. Squared allele frequency correlations (r2) are plotted against distance in base pairs. The fitted curve represents the trend of decay of LD. (B) Decay of LD with distance in base pairs between sites pooled across 39 genes. (C) Decay of LD across all candidate genes over the first 400 base pairs of the data presented in (B).

Figure 3: (A-C) An example of marker effects in the CesA1B gene on the lignin content phenotype. Each marker explains a small portion of the phenotypic variance (r2 of 2-3%) and is consistent with an additive model of gene action. Whiskers in the box plots represent 1.5 times the interquartile range. (D) Illustrated are the 39 SNPs genotyped for the CesA1B gene relative to the reference gene model, as well as the 3 of those 39 that were significant (indicated with an asterisk). Solid boxes denote UTR, solid lines are introns, and open boxes indicate exons in the gene model.

Figure 4: Marker effects of the significant non-synonymous SNPs found in C4H1 and CesA2A. (A) The C4H1_04-219 non-synonymous marker in the first exon of the C4H1 gene shows a pattern of gene action consistent with additive effects. The C allele at C4H1_04-219 causes a histidine (H) to proline (P) amino acid substitution. (B) The CesA2A_08-38 non-synonymous marker is located in the 6th exon of the CesA2A gene. This SNP is significant for both the lignin content and C6 traits. For lignin content, the homozygote decreases the percent content, whereas for C6 the sugar content is elevated. The G allele at CesA2A_08-38 is the derived state and is responsible for an isoleucine (I) to valine (V) amino acid substitution. In both gene models, solid boxes denote UTR, solid lines are introns, and open boxes indicate exons.

Figure 5: Haplotype and single marker associations are illustrated for SUSY1 and 4CL1. (A) The genotypic effects of the three proposed haplotypes (two significant) of SUSY1 are shown. The haplotypes yield significantly different median phenotypic values for the lignin content trait. The marker effects of four significant single marker associations are also shown. SUSY1_02-108 is significant with respect to lignin. The remaining markers are significant with respect to the related trait, C6 sugars. All four markers are in LD with one another. (B) The genotypic effects of the three haplotypes (two significant) of 4CL1 are shown. The significant haplotypes yield different median phenotypic values for the lignin content trait. No significant single marker associations were identified after correction for multiple testing; however, the box plots for single markers with P < 0.05 are shown. Two of the three markers are in LD with one another.

SUPPORTING INFORMATION

TABLE S1. Sample localities for the 448 individuals used for association mapping in P. trichocarpa

Sample ID Latitude Longitude
349 44.733 -123.067 359 47.850 -121.817 365 47.050 -122.700 367 47.050 -122.700 457 . . 460 47.067 -122.200 521 . .
552 47.067 -122.200 1862 45.750 -122.833 1863 45.750 -122.833 1909 45.583 -122.383 1921 45.583 -122.200 1922 45.583 -122.200 1944 45.583 -122.000 1950 45.583 -121.917 1983 46.083 -122.833 1984 46.083 -122.833 2022 45.750 -122.667 2028 45.833 -122.833 2037 45.833 -122.833 2045 46.083 -123.917 2048 46.083 -123.917 2063 46.167 -123.000 2066 46.167 -123.000 2076 46.333 -123.417 2092 46.167 -123.333 2103 46.000 -123.000 2116 46.000 -123.083 2118 46.000 -123.083 2136 46.167 -122.917 2151 46.333 -122.917 2159 46.333 -122.917 2161 46.333 -122.667 2165 46.333 -122.667 2166 46.333 -122.667 2175 46.333 -122.667 2204 45.617 -122.583 2213 45.617 -122.583 2220 43.750 -122.500 2228 43.750 -122.500 2236 46.167 -123.083 2257 45.567 -123.000 2283 45.833 -123.117 2299 45.833 -122.733 2322 45.750 -122.750 2325 45.333 -122.717 2327 45.333 -122.717 2343 45.250 -122.867 2345 45.083 -123.033 2351 45.083 -123.033 2356 45.083 -123.033 2358 44.833 -123.167 2361 44.833 -123.167 2365 44.833 -123.167 2368 44.833 -123.167 2392 44.700 -123.167 2393 44.700 -123.167 2402 44.733 -123.083 2405 44.733 -123.083 2408 44.733 -123.083 2428 45.417 -122.600 2451 45.567 -122.667 2480 45.317 -122.567 2501 45.317 -122.567 2505 45.283 -122.633 2506 45.283 -122.633 2515 45.283 -122.633 2518 45.083 -122.750 2525 45.083 -122.750 2529 45.083 -122.750 2532 44.417 -123.333 2534 44.417 -123.333 2538 44.417 -123.333 2549 44.250 -123.250 2551 44.250 -123.250 2572 44.083 -123.167 2573 44.083 -123.167 2577 44.083 -123.167 2583 44.000 -122.917 2590 44.000 -122.917 2591 44.000 -122.917 2597 44.000 -122.917 2616 44.333 -123.417 2631 44.517 -123.200 2654 45.167 -123.167 2679 45.200 -123.083 2683 45.200 -123.083 2686 45.200 -123.083 2716 46.667 -121.750 2720 46.667 -121.750 2727 46.500 -122.000 2731 46.500 -122.000 2884 45.583 -122.383 2889 45.583 -122.383 2892 45.583 -122.383 2896 45.583 -122.383 2897 46.100 -122.967 4530 46.667 -121.667 4531 46.500 -122.000 4579 46.717 -121.717 4580 46.567 -121.667 4583 46.567 -121.667 4584 
46.717 -121.717 4585 46.567 -121.667 4588 45.583 -122.383 4593 44.733 -123.050 4594 45.833 -121.883 4603 45.583 -122.383 4605 46.183 -123.150 4606 45.667 -122.717 4608 45.667 -122.717 4610 45.967 -122.817 4611 47.050 -122.900 6808 46.183 -123.533 6816 46.183 -123.533 6828 46.150 -123.200 6831 45.900 -122.733 6841 45.933 -122.817 6848 45.933 -122.817 6855 45.733 -122.767 6858 45.733 -122.767 6874 46.150 -123.333 6880 46.267 -123.450 6891 42.917 -122.950 6903 45.583 -122.383 6909 45.583 -122.383 6915 45.800 -122.750 6926 45.533 -122.350 6931 45.533 -122.350 6936 45.567 -122.333 6946 45.567 -122.333 6952 46.533 -122.100 6958 46.533 -122.100 6962 46.533 -121.900 6966 46.533 -121.900 6969 46.533 -121.900 6972 46.550 -122.267 6977 46.550 -122.267 6981 46.333 -122.917 6995 45.983 -122.533 6997 45.983 -122.533 6998 45.983 -122.533 6999 45.983 -122.533 7006 46.533 -122.483 7014 46.533 -122.483 7017 46.033 -122.300 7019 46.033 -122.300 7021 46.033 -122.300 7030 46.433 -122.850 7042 45.900 -123.150 7044 45.900 -123.150 7059 45.900 -123.083 7062 45.900 -123.083 7067 46.150 -123.200 7069 45.567 -122.450 7071 45.633 -122.717 7073 44.167 -122.950 7074 45.567 -122.450 7075 44.167 -122.950 7076 44.167 -122.950 7077 48.867 -121.867 7079 45.483 -121.867 7087 44.333 -123.233 7088 44.333 -123.233 7091 44.333 -123.233 7094 44.333 -123.233 7096 45.833 -121.883 7098 46.133 -123.333 7109 45.617 -122.667 7117 45.483 -122.683 7118 47.700 -121.350 7123 47.500 -121.783 7124 47.500 -121.783 7126 43.867 -122.817 7128 43.467 -122.683 7131 45.583 -122.417 7133 45.850 -122.750 7136 44.733 -123.050 7137 44.733 -123.050 7138 44.733 -123.050 7139 44.733 -123.050 7140 45.383 -122.600 7141 45.383 -122.600 7142 45.383 -122.600 7144 45.650 -122.750 7149 45.150 -122.533 7151 45.150 -122.533 7152 45.783 -122.533 7632 45.750 -122.817 7637 45.467 -122.683 7639 45.583 -122.417 7640 45.583 -122.417 7647 46.133 -123.333 7649 45.533 -122.383 7973 45.550 -122.400 7974 45.550 -122.400 7975 45.550 -122.400 7976 
46.117 -123.000 7978 46.117 -123.000 7979 46.117 -123.000 7981 46.133 -123.250 7982 46.133 -123.250 7983 46.117 -123.000 7984 46.117 -123.000 7985 46.117 -123.000 7987 46.083 -122.883 7988 46.083 -122.883 7989 46.083 -122.883 7990 46.083 -122.883 7991 46.000 -122.867 7992 46.000 -122.867 7993 46.000 -122.867 7994 46.000 -122.867 7996 46.100 -123.183 8401 46.933 -122.600 8407 46.933 -122.600 8415 46.750 -122.033 8423 46.767 -122.183 8431 46.767 -122.183 8435 46.300 -122.783 8436 46.300 -122.783 8445 46.733 -121.900 8452 46.733 -121.900 8467 46.933 -122.550 8468 46.933 -122.550 8469 47.000 -123.517 8470 47.000 -123.517 8493 47.000 -123.400 8505 45.400 -122.500 8513 45.400 -122.500 8516 45.383 -122.400 8527 45.367 -122.367 8534 44.633 -122.883 8540 44.633 -122.883 8552 44.500 -122.817 8561 44.800 -122.783 8579 44.533 -122.900 8581 44.533 -122.900 8585 44.750 -122.467 8601 44.783 -122.617 8608 44.117 -122.567 8611 44.117 -122.567 8628 44.133 -122.567 8631 44.133 -123.067 8639 44.133 -123.067 9577 47.167 -122.383 9578 47.167 -122.383 9579 47.167 -122.383 9580 47.167 -122.383 9581 47.167 -122.383 9582 47.167 -122.383 9583 47.167 -122.383 9584 47.167 -122.383 9585 47.867 -122.633 9586 47.867 -122.633 9587 47.867 -122.633 9588 47.450 -123.033 9589 47.450 -123.033 9590 47.450 -123.033 9591 47.450 -123.033 9592 44.750 -122.867 9593 44.750 -122.867 9594 44.750 -122.867 9595 44.750 -122.867 9596 44.750 -122.867 9597 44.800 -123.233 9598 47.067 -123.733 9756 48.817 -122.217 9757 48.817 -122.217 9758 48.817 -122.217 9759 48.817 -122.217 9760 48.817 -122.217 9761 48.817 -122.217 9762 48.817 -122.217 9763 48.817 -122.217 9764 48.817 -122.217 9765 48.817 -122.217 9766 48.817 -122.217 9767 48.817 -122.217 9768 48.717 -122.200 9769 48.717 -122.200 9770 48.717 -122.200 9771 48.717 -122.200 9772 48.717 -122.200 9773 48.717 -122.200 9774 48.717 -122.200 9775 48.717 -122.200 9776 48.717 -122.200 9777 48.717 -122.200 9778 48.717 -122.200 9779 48.717 -122.200 9780 48.917 -122.067 9781 
48.917 -122.067 9782 48.917 -122.067 9783 48.917 -122.067 9784 48.917 -122.067 9785 48.917 -122.067 9786 48.917 -122.067 9787 48.917 -122.067 9788 48.917 -122.067 9789 48.917 -122.067 9790 48.917 -122.067 9791 48.917 -122.067 9792 48.500 -122.217 9793 48.500 -122.217 9794 48.500 -122.217 9795 48.500 -122.217 9796 48.500 -122.217 9797 48.500 -122.217 9798 48.500 -122.217 9799 48.500 -122.217 9801 48.500 -122.217 9802 48.500 -122.217 9803 48.500 -122.217 9804 48.517 -122.050 9805 48.517 -122.050 9806 48.517 -122.050 9807 48.517 -122.050 9808 48.517 -122.050 9809 48.517 -122.050 9810 48.517 -122.050 9811 48.517 -122.050 9812 48.517 -122.050 9813 48.517 -122.050 9814 48.517 -122.050 9815 48.517 -122.050 9816 48.533 -121.750 9817 48.533 -121.750 9818 48.533 -121.750 9819 48.533 -121.750 9820 48.533 -121.750 9821 48.533 -121.750 9822 48.533 -121.750 9823 48.533 -121.750 9824 48.533 -121.750 9825 48.533 -121.750 9826 48.533 -121.750 9827 48.533 -121.750 9828 47.733 -121.933 9829 47.733 -121.933 9830 47.733 -121.933 9831 47.733 -121.933 9832 47.733 -121.933 9833 47.733 -121.933 9834 47.733 -121.933 9836 47.733 -121.933 9837 47.733 -121.933 9838 47.733 -121.933 9839 47.733 -121.933 9840 47.733 -121.983 9841 47.733 -121.983 9842 47.733 -121.983 9843 47.733 -121.983 9844 47.733 -121.983 9845 47.733 -121.983 9846 47.733 -121.983 9847 47.733 -121.983 9848 47.733 -121.983 9849 47.733 -121.983 9850 47.733 -121.983 9851 47.733 -121.983 9852 47.683 -121.917 9853 47.683 -121.917 9854 47.683 -121.917 9855 47.683 -121.917 9857 47.683 -121.917 9858 47.683 -121.917 9859 47.683 -121.917 9860 47.683 -121.917 9861 47.683 -121.917 9862 47.683 -121.917 9863 47.683 -121.917 9864 47.200 -121.933 9865 47.200 -121.933 9866 47.200 -121.933 9867 47.200 -121.933 9868 47.200 -121.933 9869 47.200 -121.933 9870 47.200 -121.933 9871 47.200 -121.933 9872 47.200 -121.933 9873 47.200 -121.933 9874 47.200 -121.933 9875 47.200 -122.050 9876 47.200 -122.050 9878 47.200 -122.050 9879 47.200 -122.050 9880 
47.200 -122.050 9882 47.200 -122.050 9883 47.200 -122.050 9884 47.200 -122.050 9885 47.200 -122.050 9886 47.200 -122.050 9887 47.083 -122.233 9888 47.083 -122.233 9889 47.083 -122.233 9890 47.083 -122.233 9891 47.083 -122.233 9892 47.083 -122.233 9893 47.083 -122.233 9894 47.083 -122.233 9895 47.083 -122.233 9896 47.083 -122.233 9897 47.083 -122.233 9898 47.083 -122.233 9899 47.117 -122.117 9900 47.117 -122.117 9901 47.117 -122.117 9903 47.117 -122.117 9904 47.117 -122.117 9905 47.117 -122.117 9906 47.117 -122.117 9907 47.117 -122.117 9908 47.117 -122.117 9909 47.117 -122.117 9910 47.117 -122.117 9911 47.100 -122.200 9912 47.100 -122.200 9913 47.100 -122.200 9914 47.100 -122.200 9915 47.100 -122.200 9916 47.100 -122.200 9917 47.100 -122.200 9918 47.100 -122.200 9919 47.100 -122.200 9920 47.100 -122.200 9921 47.100 -122.200 9947 46.050 -121.933 9948 46.050 -121.933 9949 46.050 -121.933 9950 46.050 -121.933 9951 46.050 -121.933 9952 46.050 -121.933 9953 46.150 -123.333 9954 46.150 -123.333 9955 46.150 -123.333 9956 46.150 -123.333 9957 46.150 -123.333 9958 46.150 -123.333 9959 45.950 -121.950 9960 45.950 -121.950 9961 45.950 -121.950 9962 45.950 -121.950 9963 45.950 -121.950 9964 45.950 -121.950 9965 45.950 -122.817 9966 45.950 -122.817 9967 45.950 -122.817 9968 45.950 -122.817 9969 45.950 -122.817 9970 45.950 -122.817 9971 45.950 -122.817 10072 46.267 -123.450 TABLE S2. 
Summaries of quality scores across genotyped SNP loci

GenTrain
  Control    0.6856  0.1359  0.3091-0.8874
             0.6919  0.1290  0.3244-0.8881
  Focal      0.6568  0.1569  0.2676-0.8627
             0.6688  0.1530  0.2659-0.8692
Cluster Separation
  Control    0.4210  0.1856  0.1500-0.9331
             0.4233  0.1814  0.1494-0.9359
  Focal      0.3599  0.1478  0.1399-0.6849
             0.3666  0.1476  0.1399-0.6808
GenCall50
  Control    0.6791  0.1437  0.2756-0.8869
             0.6850  0.1376  0.2798-0.8879
  Focal      0.6495  0.1601  0.2484-0.8560
             0.6613  0.1564  0.2443-0.8563
Call Rate
  Control    0.9859  0.0281  0.9060-1.0000
             0.9875  0.0237  0.9242-1.0000
  Focal      0.9836  0.0514  0.8894-1.0000
             0.9859  0.0469  0.9159-1.0000

Numbers are given for the full dataset (upper row) and for the reduced dataset with FIS outliers removed (lower row). The full dataset consisted of 297 control SNPs and 579 focal SNPs, while the reduced dataset consisted of 247 control SNPs and 530 focal SNPs. The 95% interval is defined by the 2.5% and 97.5% quantiles.

TABLE S3. Summaries of genotyped SNPs for focal (n = 530) and control SNPs (n = 247).
Samples Missing (%) MAF HE HO FIS HWE (%) Focal 448 1.56 (4.72) 0.21 (0.15) 0.28 (0.16) 0.27 (0.16) 0.04 (0.08) 97.36 (77.17) Cluster 1 All 64 1.30 (5.59) 0.23 (0.18) 0.29 (0.17) 0.28 (0.18) 0.01 (0.15) 99.81 (93.02) Cluster 2 80 0.07 (0.14) 0.20 (0.17) 0.27 (0.17) 0.29 (0.20) -0.03 (0.17) 96.23 (89.81) Cluster 3 82 1.14 (4.73) 0.20 (0.16) 0.27 (0.17) 0.27 (0.18) 0.03 (0.21) 97.55 (90.19) Cluster 4 113 1.15 (4.58) 0.19 (0.16) 0.26 (0.17) 0.25 (0.18) 0.03 (0.12) 97.54 (92.08) Cluster 5 109 2.25 (5.20) 0.21 (0.17) 0.27 (0.17) 0.28 (0.19) 0.01 (0.18) 96.04 (85.09) 448 1.26 (2.35) 0.23 (0.14) 0.31 (0.15) 0.30 (0.15) 0.03 (0.08) 95.95 (80.16) Cluster 1 64 1.23 (2.42) 0.25 (0.16) 0.33 (0.15) 0.33 (0.16) -0.01 (0.14) 99.59 (94.33) Cluster 2 80 1.19 (2.61) 0.22 (0.14) 0.30 (0.16) 0.30 (0.16) 0.02 (0.14) 100.00 (90.69) Cluster 3 82 1.02 (2.41) 0.23 (0.14) 0.31 (0.15) 0.30 (0.16) 0.02 (0.12) 99.59 (93.52) Cluster 4 113 1.01 (2.40) 0.22 (0.14) 0.30 (0.16) 0.29 (0.16) 0.03 (0.14) 100.00 (91.49) Cluster 5 109 1.76 (2.96) 0.22 (0.15) 0.31 (0.16) 0.30 (0.16) 0.02 (0.13) 100.00 (87.85) Control All Listed are averages (one standard deviation) across loci unless otherwise noted. Outliers with respect to FIS were removed prior to these calculations. Numbers in bold have 99% bootstrap (n = 10,000 replicates) confidence intervals for the mean across loci that do not include zero. Numbers listed under HWE are the percent of loci consistent with HWE at a Bonferroni corrected significance threshold of P = 0.000094. Values in parentheses are the percent of loci consistent with HWE at a significance threshold of P = 0.05. FIS, Wright’s fixation index; HE, expected heterozygosity; HO, observed heterozygosity; HWE, Hardy-Weinberg Equilibrium; MAF, minor allele frequency. FIGURE S1.–Distribution of quality metrics for genotyped SNPs grouped by dataset (control: 297 SNPs, focal: 579 SNPs). Values are assessed per SNP across all samples and higher values indicate higher quality for all metrics. 
The distributions for these metrics when FIS outliers were removed were qualitatively similar (data not shown). A comparison of these distributions when FIS outliers were or were not included is located in Table S1. (A). The distributions for the GenTrain score were similar across control and focal SNPs. (B). The distributions for the cluster separation score were similar across control and focal SNPs, with the control SNPs forming slightly tighter clusters. (C). The distributions for the GenCall50 (GC50) score were similar across control and focal SNPs. This metric is assigned to each genotype call at a SNP, thus the GC50 score represents the median value of this metric across all samples typed at a particular SNP. (D). The distributions for the Call Rate were similar across control and focal SNPs. This metric is the complement of the fraction of missing data per locus. FIGURE S2.–Cluster assignments illustrated across pairwise plots of the four significant principal components (PCs) derived using principal components analysis (PCA) encapsulated in the EIGENSOFT computer package (cf. http://genepath.med.harvard.edu/~reich/Software.htm). Clusters were formed using hierarchical clustering with Ward’s method on the four significant PCs. The geographical distribution of cluster assignments is shown in Figure 1C. FIGURE S3.–Summaries of population genetic parameters across all samples and samples placed into clusters. For each plot, the dashed line represents all samples while colors designate the five significant genetic clusters detected using PCA. Further summaries are given in Table S2. (A). Distributions of Wright’s inbreeding coefficient (FIS) for the 247 control SNPs were similar across all clusters. (B). Distributions of FIS for the 530 focal SNPs were similar across all clusters. (C). Distributions for the minor allele frequency for the control SNPs were broadly similar across all clusters. 
Clusters four and five were the most different, with cluster four having a pronounced spike in its density around a MAF of 0.30. (D). Distributions for the minor allele frequency for the focal SNPs were similar across all clusters.

FIGURE S4.–Differentiation among inferred genetic clusters for Populus trichocarpa reveals FST outliers within the set of focal SNPs. The gray colors denote a two-dimensional (2D) boxplot of the relationship between expected heterozygosity and FST for the 247 control SNPs. The lightest color denotes the extreme for this 2D distribution (i.e., no values for FST were observed greater than that bounding line for any value of expected heterozygosity). Points represent individual focal SNPs, with colored points differentiating outlier SNPs within the same candidate gene locus. Plotted to the right are the distributions for FST for the focal and control SNPs. The average FST was low for both sets, but greater for the focal (FST = 0.034) than for the control (FST = 0.013) SNPs. Plotted above are the distributions for expected heterozygosity for the focal and control SNPs. The distribution for the control SNPs illustrates a larger ascertainment bias, which is apparent from the high density centered on values of expected heterozygosity in the range of 0.40-0.50. This distribution should be broadly U-shaped under neutrality without ascertainment bias.

FIGURE S5.–Cluster assignment is correlated with phenotypic traits. Colors designate groups based on Tukey multiple comparison tests. A global significance threshold of P = 0.01 was assumed for all tests. In all cases, the mean of cluster one was significantly different from means for the remaining four clusters. Shown are boxplots for each cluster. The whiskers extend to the data extremes in all cases. (A). The effect of cluster assignment on the C6 phenotype. Differences in the mean among clusters are significant (ANOVA: F3,443 = 12.290, P = 1.722E-09). (B).
The effect of cluster assignment on the lignin (%) phenotype. Differences in the mean among clusters are significant (ANOVA: F3,443 = 8.388, P = 1.573E-06). (C). The effect of cluster assignment on the S/G phenotype. Differences in the mean among clusters are significant (ANOVA: F3,443 = 9.456, P = 2.419E-07).
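
For readers who wish to see how an F statistic of the kind reported in Figure S5 is formed, the following is a minimal sketch of a one-way ANOVA across groups, written with the Python standard library only. The function name `one_way_anova` and the toy values are hypothetical illustrations and are not data or code from this study. The P-value is not computed here, because it requires the F distribution from a statistics library.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of trait values.

    Hypothetical helper for illustration; each inner list holds the trait
    values of the samples assigned to one cluster.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Between-group sum of squares: group size times the squared
    # deviation of the group mean from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: squared deviations of each value
    # from its own group mean.
    ss_within = 0.0
    for g in groups:
        group_mean = fmean(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy trait values for three hypothetical clusters (not study data).
toy_clusters = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova(toy_clusters)
print(f"F{df_between},{df_within} = {f_stat:.3f}")
```

The statistic is the ratio of the between-cluster mean square to the within-cluster mean square, so larger values indicate that cluster means differ more than expected from the variation within clusters.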