Supporting Information Legends

Supporting Figure S1. Evidence supporting MAKER-P gene annotations in PG29. The MAKER-derived annotation edit distance (AED) measures the goodness of fit between an annotation and its supporting evidence, and was obtained by running the MAKER-P gene predictor on the PG29 V3 draft assembly (Campbell, et al. 2014). AED values range from 0 (complete concordance) to 1 (lack of supporting evidence). Over three quarters of the high-confidence PG29 V3 genes have an AED <0.2, consistent with the selection of this gene subset based on transcript presence in the PG29 transcriptome assembly.

Supporting Figure S2. Intron lengths derived from MAKER-P transcript models and experimental transcripts. We plotted the cumulative distribution function (CDF) of intron lengths from MAKER-P (green line) and of intron lengths derived from sequence alignments of 42,440 white spruce cDNA clones (blue line; Rigault, et al. 2011 and GCAT v3.3, https://web.gydle.com/smartforests/gcat) onto the draft PG29 V3 genome. Intron lengths differed between MAKER-P (n = 124,951 introns; Mean length = 646.70 ± 1,218 bp; Median length = 278 bp; Maximum length = 44,115 bp) and the measurements derived from cDNAs (n = 60,265 introns; Mean length = 4,934 ± 14,311 bp; Median length = 202 bp; Maximum length = 377,332 bp).

Supporting Figure S3. NMF consensus clustering of high-confidence genes. a) From the NMF rank survey (Gaujoux and Seoighe 2010), both the cophenetic correlation coefficient and the average silhouette width suggest a four-cluster solution. b) The consensus membership heatmap indicates that the 8 tissue samples were unambiguously grouped into 4 categories. c) Sixty-five high-confidence P450s were subjected to hierarchical clustering and fitted into the NMF consensus groups, and their expression (RPKM) was plotted with the pheatmap R package. The range of RNA-seq measured expression is wide and varies across tissue types.
d) A subset of only ten P450s is within the top 5% of genes whose expression was discriminatory for at least one sample group.

Supporting Figure S4. Phylogeny of gymnosperm terpene synthases. Phylogeny of gymnosperm terpene synthase proteins, with Physcomitrella patens ent-kaurene synthase as the root. The phylogeny was created with FastTree 2 after protein alignment with MAFFT, and visualized with FigTree. Picea glauca proteins are shown in red, and names ending with P denote putative pseudogenes. Bootstrap values are indicated at nodes.

Supporting Figure S5. Phylogeny of the white spruce cytochrome P450 family. A phylogeny of 307 white spruce P450s is shown with CYP51G used as the root. CYP IDs ending in P are putative pseudogenes. The phylogeny was created with FastTree 2 after protein alignment with MAFFT, and visualized with FigTree. Labels indicate the ten plant P450 clans. Areas of conifer- or gymnosperm-specific expansion are colored as follows: CYP76AAs, blue; CYP736s, red; CYP750s, orange; CYP720Bs, green; CYP86s, olive; and CYP716Bs, purple.

Supporting Figure S6. Distribution of white spruce P450s across the eleven plant P450 clans. Plant P450s have been annotated into eleven major clans based on common ancestry, each containing one or more families (Nelson and Werck-Reichhart 2011). The number of white spruce members found in each family is shown, with the portion of putative pseudogenes shown in white. No white spruce P450s were identified for clan CYP746, which to date has only been identified in Chlamydomonas reinhardtii (Nelson and Werck-Reichhart 2011).

Supporting Figure S7. Gymnosperm- and conifer-specific cytochrome P450 subfamilies. Phylogeny of white spruce P450s with orthologues available on NCBI. The phylogeny was created with FastTree 2 after protein alignment with MAFFT, and visualized with FigTree. CYP IDs ending in P are putative pseudogenes. Bootstrap values are indicated at nodes.

Supporting Figure S8. Phylogeny of HMG-R proteins in plants.
The phylogeny was created with FastTree 2 after protein alignment of the Picea glauca and PlantCyc proteins with MAFFT, and visualized with CLC Main Workbench. Picea glauca protein names ending with P are putative pseudogenes. Bootstrap values are indicated at nodes. Labels for Acrogymnospermae are shown in red, Magnoliophyta are shown in black, and other Streptophyta are shown in blue.

Supporting Figure S9. Phylogeny of isoprenyl diphosphate synthase proteins in plants. The phylogeny was created with FastTree 2 after protein alignment of the Picea glauca and PlantCyc proteins with MAFFT, and visualized with CLC Main Workbench. The phylogeny was rooted between the FPPS and the GPPS/GGPPS clades. Picea glauca protein names ending with P are putative pseudogenes. Bootstrap values are indicated at nodes. Labels for Acrogymnospermae are shown in red, Magnoliophyta are shown in black, other Streptophyta are shown in blue, and Chlorophyta are shown in green.

Supporting Figure S10. N20 and N50 length statistics for WS77111 pre-unitig assemblies across a range of k values. In separate experiments, white spruce genome WS77111 sequencing data were assembled up to the k-mer graph stage (pre-unitig; k-mers overlapping by k-1 bases) with ABySS v1.3.7 using a range of k values (k=96, 104, 112, 116, 120 and 128). We identified the optimal k for assembly (k=116) as the value giving the most contiguous pre-unitig assembly, as measured by the N50 length contiguity statistic (blue line, N50=1,426 bp) and the N20 length (red line, N20=3,819 bp).

Supporting Figure S11. Derivation of a high-confidence gene set based on detected gene expression in eight white spruce tissue and organ samples. A reference P. glauca transcriptome was assembled from RNA-seq reads sequenced from 8 PG29 tissues. Only putative transcript sequences containing complete open reading frames were aligned (BLAST expect value < e-20) against MAKER-P gene predictions.
Choosing the best hit for each gene model (shown above with a thick black line), we classified 16.4K gene models as high-confidence. Expression profiles were further obtained by aligning RNA-seq reads from each independent tissue library against those models.

Supporting Table S1. White spruce WS77111 sequence data.

Supporting Table S2. ABySS resources required at various assembly stages of building the 20 Gbp white spruce WS77111 V1 draft genome.

Supporting Table S3. Statistics for WS77111 V1 pre-unitig assemblies across a range of k values.

Supporting Table S4. Execution, runtime and resources required at different stages of the DIDA read alignment framework.

Supporting Table S5. White spruce WS77111 V1 ABySS v1.3.7 assembly statistics.

Supporting Table S6. N50 length (kbp) comparisons between white spruce individuals PG29 V3 and WS77111 V1 at various stages of the ABySS assembly.

Supporting Table S7. White spruce PG29 RNA-seq transcriptome sequence reads.

Supporting Table S8. White spruce assembly completeness, as measured by CEGMA analyses.

Supporting Table S9. Assembly quality control (QC) using cDNA and sequence capture resources.

Supporting Table S10. Exact repeat content in the V3 PG29 genome assembly.

Supporting Table S11. ABySS-Bloom sequence identity calculations between various draft genome assemblies.

Supporting Table S12. Structural variation (S.V.) in WS77111 V1 relative to PG29 V3 by PAVfinder (Chiu, et al. in preparation) analysis of whole-scaffold alignments.

Supporting Table S13. White spruce genome re-scaffolding – scaffold assembly statistics.
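
For readers unfamiliar with the N50 and N20 length statistics reported in Supporting Figure S10 and Supporting Tables S3 and S6, the following minimal Python sketch illustrates the standard definition: the Nx length is the length of the shortest sequence in the smallest set of longest sequences that together contain at least x% of the assembled bases. The function name `nx_length` and the toy lengths are hypothetical and for illustration only; this is not the ABySS implementation, and assembler reports may additionally apply a minimum sequence length cutoff that is not modelled here.

```python
from typing import Iterable


def nx_length(lengths: Iterable[int], x: float) -> int:
    """Return the Nx length of a set of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated; the Nx length is the length of the sequence at which the
    running total first reaches x percent of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")

    threshold = sum(ordered) * x / 100.0
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    return ordered[-1]  # guard against floating-point rounding at x = 100


if __name__ == "__main__":
    # Hypothetical toy assembly, not data from this study.
    toy_lengths = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    print("N50 =", nx_length(toy_lengths, 50))
    print("N20 =", nx_length(toy_lengths, 20))
```

Because N20 is reached after accumulating fewer of the longest sequences than N50, the N20 length is never smaller than the N50 length for the same assembly, in line with the values reported for k=116 (N20=3,819 bp; N50=1,426 bp).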