Transcriptome sequencing, characterization and polymorphism detection in Big Sagebrush (Artemisia tridentata) subspecies

Prabin Bajgain, Joshua Udall (BYU, Provo); Bryce Richardson (USDA-RMRS, Provo)

Big Sagebrush (Artemisia tridentata)
- Ecologically one of the most important shrub species in the intermountain United States
- Three widespread main subspecies: ssp. tridentata (basin ecotype), ssp. vaseyana (mountain ecotype) and ssp. wyomingensis (Wyoming ecotype); two less common subspecies: ssp. spiciformis and ssp. xericensis
- Numerous mammals, insects and birds depend on big sagebrush for food and shelter; some are obligates, others semi-obligates
- Human encroachment and wildfires followed by cheatgrass invasion threaten big sagebrush habitat and the species that depend on it

Goals
- Create a reliable and relatively large sequence database for big sagebrush
- Develop markers from the gene sequences
- Make the data publicly available for population, ecological and evolutionary studies

Entrez records (NCBI Taxonomy Browser, Feb 17 2011)
Database name     Subtree links   Direct links
Nucleotide        32              31
Protein           14              14
Popset            3               3
SNP*              20,953          20,953
PubMed Central    34              34
Taxonomy          2               1
* Bajgain et al., ‘Transcriptome characterization and polymorphism detection in subspecies of Artemisia tridentata (big sagebrush)’ (in press)

Workflow
- RNA extraction and cDNA library prep
- 454 sequencing (sspp. tridentata and vaseyana)
- Illumina sequencing (ssp. wyomingensis)
- Sequence assembly
- Pfam and BLASTx search
- Marker detection
- Gene annotation (using BLASTx results)
- SNP mapping
- Secondary metabolite genes
- Hybridization theory

EST sequence assembly
Summary report of individual and combined de novo assembly
Assembly                  Sequences    Count       Average length (bp)   Total bases
ssp. tridentata (basin)   Reads        823,392     403.91                332,578,737
                          Singletons   191,745     403.62                77,391,754
                          Contigs      20,357      716                   14,587,705
ssp. vaseyana (mtn)       Reads        702,001     333.13                233,854,535
                          Singletons   179,189     331.51                59,402,844
                          Contigs      20,250      624                   12,641,189
Combined                  Reads        1,525,393   371.34                566,433,272
                          Singletons   275,866     370.18                102,121,262
                          Contigs      29,541      796                   23,521,465

Assembly annotation
• BLASTx against the NR protein database, e-value cutoff 1e-15
• Blast2GO used for annotation
• 21,436 sequences (72.6%) had BLASTx hits
[Figure: Gene Ontology distribution by Biological Process, Molecular Function and Cellular Component]

Secondary metabolite genes
Pathway / enzyme                                                     No. of hits (ssp. tridentata)   No. of hits (ssp. vaseyana)
MEP pathway
  1-deoxy-D-xylulose 5-phosphate synthase (DXS)                      51                              100
  1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR)              83                              118
  2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT)     22                              22
  4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK)            63                              126
  2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCS)        22                              22
  1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (HDS)      0                               0
  1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (HDR)     20                              12
  isopentenyl diphosphate/dimethylallyl diphosphate isomerase (IDI)  36                              20
  isopentenyl diphosphate/dimethylallyl diphosphate synthase (IDS)   0                               0
MVA pathway
  acetoacetyl-coenzyme A thiolase (AACT)                             39                              21
  3-hydroxy-3-methylglutaryl coenzyme A synthase (HMGS)              0                               0
  3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR)             0                               0
  mevalonate kinase (MK)                                             0                               50
  phosphomevalonate kinase (PMK)                                     0                               0
  mevalonate diphosphate decarboxylase (MDC)                         20                              0
Coumarin biosynthesis pathway
  phenylalanine ammonia lyase                                        29                              45
  cinnamate 4-hydroxylase                                            28                              70
  4-coumarate CoA ligase                                             322                             215

SNP detection
• SNP = single nucleotide polymorphism
• Parameters: minimum 8x coverage; 90% nucleotide frequency; 20% minor allele frequency
• 20,952 ‘true’ SNPs with an average coverage of 20x
[Figure: Distribution of the number of SNPs by read coverage depth (x-axis: SNP coverage depth, 8x to 60x; y-axis: number of SNPs, 0 to 2,500)]

SNP detection in ssp. wyomingensis
Sample                  tridentata SNPs   vaseyana SNPs   Both SNP types   Total
Montana wyomingensis    138               306             251              695
Utah wyomingensis       157               424             458              1,039
• Suggests an origin of tetraploid ssp. wyomingensis through mixed ancestry
• ssp. wyomingensis is more similar to ssp. vaseyana

SSR detection
• SSR = simple sequence repeat (microsatellite)
• Parameters:
  • Minimum repeat number by SSR motif length (motif length-repeats): 2-7, 3-5, 4-5, 5-5, 6-5, 7-5, 8-5, 9-4, 10-4
  • 100 bp interruption distance
• 1,003 SSRs in ssp. tridentata (basin)
• 507 SSRs in ssp. vaseyana (mtn)
[Figure: Frequency and distribution of SSRs in two big sagebrush subspecies (number of repeats in ssp. tridentata and ssp. vaseyana by repeat motif: di, tri, tetra, penta, hexa)]

From here?
- Evolution, intermixing and subsequent evolution of big sagebrush subspecies
- Phylogenetic relationships among big sagebrush populations distributed in the intermountain US
- Sequence capture approach (~350 genes, 55 populations)
- Common garden studies to look at variation among the populations
- Later, link traits with genes in Artemisia tridentata populations

Acknowledgements
- Funding: USDA-FS, RMRS, National Fire Plan, GBNPSIP
- Rich Cronn, Jared Price, Nancy Shaw, Covey Jones, Brian Knaus, Felix Jimenez, Scott Yourstone
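Each average length in the EST assembly summary table is the total number of bases divided by the sequence count. The short Python sketch below uses values copied from the table and recomputes those averages. It also checks that the combined read set is the sum of the two 454 read sets. The `ASSEMBLY` dictionary and the function names are illustrative only and are not part of the original analysis. Singletons and contigs are not expected to add up, because the combined assembly was built de novo from the pooled reads.

```python
"""Sanity-check arithmetic for the EST assembly summary table (values copied from the poster)."""

# (count, total bases) for each row of the table, per assembly.
ASSEMBLY = {
    "tridentata": {"reads": (823_392, 332_578_737),
                   "singletons": (191_745, 77_391_754),
                   "contigs": (20_357, 14_587_705)},
    "vaseyana": {"reads": (702_001, 233_854_535),
                 "singletons": (179_189, 59_402_844),
                 "contigs": (20_250, 12_641_189)},
    "combined": {"reads": (1_525_393, 566_433_272),
                 "singletons": (275_866, 102_121_262),
                 "contigs": (29_541, 23_521_465)},
}


def average_length(count: int, total_bases: int) -> float:
    """Mean sequence length in bases: total bases divided by the number of sequences."""
    return total_bases / count


def reads_are_additive(table: dict) -> bool:
    """The combined assembly pools both read sets, so its read count and base total
    should equal the per-subspecies sums (singletons and contigs need not add up)."""
    parts = [table[s]["reads"] for s in ("tridentata", "vaseyana")]
    combined = table["combined"]["reads"]
    return (sum(p[0] for p in parts), sum(p[1] for p in parts)) == combined


if __name__ == "__main__":
    for assembly, rows in ASSEMBLY.items():
        for row, (count, bases) in rows.items():
            print(f"{assembly:10s} {row:10s} {average_length(count, bases):8.2f}")
    print("reads additive:", reads_are_additive(ASSEMBLY))
```

Recomputing the contig averages suggests they were truncated rather than rounded. For example, 14,587,705 / 20,357 is about 716.6, and the table reports it as 716.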
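In the assembly annotation step, only BLASTx hits at or below the stated e-value cutoff of 1e-15 are kept. Below is a minimal sketch of that filter. It assumes BLAST+ tabular output (`-outfmt 6` with default columns), where the query ID is in column 1 and the e-value in column 11. The function names and the example rows are hypothetical. Blast2GO's own mapping and GO annotation steps are not reproduced.

```python
import csv
from typing import Iterable

EVALUE_CUTOFF = 1e-15  # e-value cutoff stated on the poster


def queries_with_hits(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> set:
    """Return IDs of query sequences with at least one hit at or below `cutoff`.

    Assumes BLAST+ tabular output (-outfmt 6, default columns):
    column 1 is the query ID and column 11 is the e-value.
    """
    hits = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any comment lines
        if float(row[10]) <= cutoff:
            hits.add(row[0])
    return hits


def percent_annotated(n_with_hits: int, n_total: int) -> float:
    """Share of sequences with a qualifying hit, in percent."""
    return 100.0 * n_with_hits / n_total


if __name__ == "__main__":
    example = [  # hypothetical rows in -outfmt 6 layout
        "contig_1\tsubject_A\t85.2\t210\t31\t0\t1\t630\t5\t214\t3e-98\t355",
        "contig_2\tsubject_B\t41.0\t60\t35\t1\t10\t190\t20\t79\t2e-06\t48.1",
    ]
    print(queries_with_hits(example))  # prints {'contig_1'}
```

The reported 72.6% agrees with 21,436 hits out of the 29,541 combined contigs, since 21,436 / 29,541 ≈ 72.6%. The poster does not state that denominator explicitly, however.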
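The SNP detection panel lists three thresholds but does not say how they combine. This sketch assumes one plausible reading, which the source does not confirm:
- A site needs at least 8 reads in total.
- In each subspecies, the most common nucleotide must make up at least 90% of that subspecies' reads.
- The two subspecies must differ in that nucleotide.
- The minor allele must account for at least 20% of the pooled reads.

The sketch encodes this reading using per-subspecies nucleotide counts (`collections.Counter`) at one aligned position. The name `is_true_snp` is hypothetical.

```python
from collections import Counter

MIN_DEPTH = 8          # "8x coverage": minimum pooled read depth
MIN_MAJOR_FRAC = 0.90  # "90% nucleotide frequency": ASSUMED to apply within each subspecies
MIN_MAF = 0.20         # "20% minor allele frequency": ASSUMED to apply to the pooled reads


def majority(counts: Counter) -> tuple:
    """Most common nucleotide and its share of the reads in `counts`."""
    base, n = counts.most_common(1)[0]
    return base, n / sum(counts.values())


def is_true_snp(trid: Counter, vase: Counter) -> bool:
    """Apply the three thresholds to nucleotide counts at one aligned position.

    `trid` and `vase` map nucleotides to read counts for ssp. tridentata and
    ssp. vaseyana. How the thresholds are combined is an assumption.
    """
    if sum(trid.values()) == 0 or sum(vase.values()) == 0:
        return False  # both subspecies must be represented
    pooled = trid + vase
    depth = sum(pooled.values())
    if depth < MIN_DEPTH:
        return False
    t_base, t_frac = majority(trid)
    v_base, v_frac = majority(vase)
    if t_base == v_base or min(t_frac, v_frac) < MIN_MAJOR_FRAC:
        return False  # subspecies must be near-fixed for different nucleotides
    minor_frac = min(pooled[t_base], pooled[v_base]) / depth
    return minor_frac >= MIN_MAF
```

For example, `is_true_snp(Counter({"A": 10}), Counter({"G": 9, "A": 1}))` passes. It has 20 reads, each subspecies is at least 90% one nucleotide, and the minor allele frequency is 9/20.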
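The ssp. wyomingensis SNP table sorts SNPs by which parental allele appears in each wyomingensis sample. Below is a sketch of such a tally. It assumes each SNP is given as three items: its ssp. tridentata allele, its ssp. vaseyana allele, and the set of nucleotides observed in wyomingensis reads at that site. This input format and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple


def classify_site(trid_allele: str, vase_allele: str, observed: Set[str]) -> Optional[str]:
    """Label one diagnostic SNP by which parental allele(s) a wyomingensis sample carries."""
    has_t = trid_allele in observed
    has_v = vase_allele in observed
    if has_t and has_v:
        return "both"
    if has_t:
        return "tridentata"
    if has_v:
        return "vaseyana"
    return None  # neither parental allele observed: not counted


def tally(sites: Iterable[Tuple[str, str, Set[str]]]) -> Counter:
    """Count sites per label; `sites` yields (tridentata allele, vaseyana allele, observed bases)."""
    counts = Counter()
    for trid_allele, vase_allele, observed in sites:
        label = classify_site(trid_allele, vase_allele, observed)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    toy_sites = [("A", "G", {"A"}), ("C", "T", {"T"}), ("G", "A", {"G", "A"}), ("T", "C", {"G"})]
    print(tally(toy_sites))
```

Under this reading, the "Both SNP types" column counts sites where a wyomingensis sample carries both parental alleles. That is the pattern a mixed-ancestry origin of the tetraploid would be expected to leave.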
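The SSR detection panel has two parameters. The first uses the "motif length-minimum repeats" notation found, for example, in the MISA Perl script's configuration. The second is a 100 bp interruption distance for merging nearby SSRs into compound SSRs. The poster does not name its tool, so this is an independent regex-based illustration of those two rules, not the pipeline actually used. The sketch:
- finds perfect repeats only;
- skips motifs that are themselves repeats of a shorter unit, such as AA or ATAT;
- does not search for mononucleotide repeats.

`find_ssrs`, `merge_compound` and `SSR` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

# Minimum tandem repeat number per motif length, as listed on the poster.
MIN_REPEATS = {2: 7, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5, 9: 4, 10: 4}
MAX_INTERRUPTION = 100  # bp allowed between two SSRs merged into one compound SSR


@dataclass
class SSR:
    start: int    # 0-based start (inclusive)
    end: int      # 0-based end (exclusive)
    motif: str
    repeats: int


def is_primitive(motif: str) -> bool:
    """False for motifs that are repeats of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> List[SSR]:
    """Find perfect SSRs meeting MIN_REPEATS, sorted by start position."""
    seq = seq.upper()
    found = []
    for k, n in MIN_REPEATS.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                found.append(SSR(m.start(), m.end(), m.group(1), (m.end() - m.start()) // k))
    return sorted(found, key=lambda s: s.start)


def merge_compound(ssrs: List[SSR], max_gap: int = MAX_INTERRUPTION) -> List[List[SSR]]:
    """Group SSRs (sorted by start) whose gap to the previous group is <= max_gap."""
    groups: List[List[SSR]] = []
    for s in ssrs:
        if groups and s.start - max(x.end for x in groups[-1]) <= max_gap:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups
```

A real SSR finder may report repeats in a different frame or count compound SSRs differently. Counts from this sketch would therefore not necessarily reproduce the 1,003 and 507 totals.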