
Transcriptome sequencing, characterization and polymorphism detection in Big Sagebrush (Artemisia tridentata) subspecies
Prabin Bajgain, Joshua Udall (BYU, Provo)
Bryce Richardson (USDA-RMRS, Provo)
Big Sagebrush (Artemisia tridentata)
- Ecologically, one of the most important shrub species in the intermountain United States
- Three main widespread subspecies: ssp. tridentata (basin ecotype), ssp. vaseyana (mountain ecotype), ssp. wyomingensis (Wyoming ecotype); two less common subspecies: ssp. spiciformis, ssp. xericensis
- Numerous mammals, insects and birds depend on big sagebrush for food and shelter; some are obligates, others semi-obligates
- Human encroachment and wildfires followed by cheatgrass invasion are threatening big sagebrush habitat and the species that depend on it
Goals
- Create a reliable and relatively large sequence database for big sagebrush
- Develop markers on the gene sequences
- Make the data publicly available for population, ecological and evolutionary studies
Entrez records
Database name     Subtree links   Direct links
Nucleotide        32              31
Protein           14              14
Popset            3               3
SNP*              20,953          20,953
PubMed Central    34              34
Taxonomy          2               1
Source: NCBI Taxonomy Browser, Feb 17 2011
* Bajgain et al., ‘Transcriptome characterization and polymorphism detection in subspecies of Artemisia tridentata (big sagebrush)’ (in press)
Workflow
- RNA extraction and cDNA library prep
- 454-sequencing (sspp. tridentata & vaseyana)
- Illumina sequencing (ssp. wyomingensis)
- Sequence assembly
- Pfam & BLASTx search
- Marker detection
- Gene annotation (using BLASTx results)
- SNP mapping
- Secondary metabolite genes
- Hybridization theory
EST Sequence assembly
Assembly                  Category     Count       Average length   Total bases
ssp. tridentata (basin)   Reads        823,392     403.91           332,578,737
ssp. tridentata (basin)   Singletons   191,745     403.62           77,391,754
ssp. tridentata (basin)   Contigs      20,357      716              14,587,705
ssp. vaseyana (mtn)       Reads        702,001     333.13           233,854,535
ssp. vaseyana (mtn)       Singletons   179,189     331.51           59,402,844
ssp. vaseyana (mtn)       Contigs      20,250      624              12,641,189
Combined                  Reads        1,525,393   371.34           566,433,272
Combined                  Singletons   275,866     370.18           102,121,262
Combined                  Contigs      29,541      796              23,521,465
Summary report of individual and combined de novo assembly
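The three numeric columns of the assembly summary are linked by simple arithmetic: average length is total bases divided by count. The sketch below re-derives the averages from the slide's own figures as a consistency check. The dictionary layout and the function name `mean_length` are illustrative choices, not part of the original analysis.

```python
# (count, reported average length, total bases), copied from the summary table
ASSEMBLY = {
    ("tridentata", "reads"):      (823_392, 403.91, 332_578_737),
    ("tridentata", "singletons"): (191_745, 403.62, 77_391_754),
    ("tridentata", "contigs"):    (20_357, 716, 14_587_705),
    ("vaseyana", "reads"):        (702_001, 333.13, 233_854_535),
    ("vaseyana", "singletons"):   (179_189, 331.51, 59_402_844),
    ("vaseyana", "contigs"):      (20_250, 624, 12_641_189),
    ("combined", "reads"):        (1_525_393, 371.34, 566_433_272),
    ("combined", "singletons"):   (275_866, 370.18, 102_121_262),
    ("combined", "contigs"):      (29_541, 796, 23_521_465),
}


def mean_length(count, total_bases):
    """Average sequence length implied by a count and a base total."""
    return total_bases / count


if __name__ == "__main__":
    for (assembly, category), (count, reported, total) in ASSEMBLY.items():
        derived = mean_length(count, total)
        print(f"{assembly:10s} {category:10s} reported={reported} derived={derived:.2f}")
```

The combined read count is also the sum of the two subspecies read counts, which the same table makes easy to confirm.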
Assembly annotation
• BLASTx:
  • against NR protein database
  • e-value of 1e-15
• BLAST2GO for annotation
• 21,436 (72.6%) sequences had hits
[Figure panels: Biological Process, Molecular Function, Cellular Component]
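As an illustration of the e-value filter, the sketch below keeps the best hit per query from BLAST tabular output and computes a hit rate. It assumes the standard 12-column tabular format (query ID in column 1, subject ID in column 2, e-value in column 11); the slides do not say which output format was used, and the function names are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-15  # threshold stated on the slide


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest e-value hit at or below the cutoff.

    Assumes 12-column BLAST tabular rows; this layout is an assumption.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best


def hit_rate(n_with_hits, n_queries):
    """Percentage of query sequences that have at least one accepted hit."""
    return 100.0 * n_with_hits / n_queries
```

With the slide's figures, `hit_rate(21436, 29541)` rounds to 72.6, which suggests the percentage was taken over the 29,541 contigs of the combined assembly.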
Secondary metabolite genes
No. of hits per enzyme, listed as ssp. tridentata / ssp. vaseyana
MEP pathway
- 1-deoxy-D-xylulose 5-phosphate synthase (DXS): 51 / 100
- 1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXP): 83 / 118
- 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCT): 22 / 22
- 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK): 63 / 126
- 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCS): 22 / 22
- 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (HDS): 0 / 0
- 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (HDR): 20 / 12
- isopentenyl diphosphate/dimethylallyl diphosphate isomerase (IDI): 36 / 20
- isopentenyl diphosphate/dimethylallyl diphosphate synthase (IDS): 0 / 0
MVA pathway
- acetoacetyl-coenzyme A thiolase (AACT): 39 / 21
- 3-hydroxy-methylglutaryl coenzyme A synthase (HMGS): 0 / 0
- 3-hydroxy-methylglutaryl coenzyme A reductase (HMGR): 0 / 0
- mevalonate kinase (MK): 0 / 50
- phosphomevalonate kinase (PMK): 0 / 0
- mevalonate diphosphate decarboxylase (MDC): 20 / 0
Coumarin biosynthesis pathway
- phenylalanine ammonia lyase: 29 / 45
- cinnamate 4-hydroxylase: 28 / 70
- 4-coumarate CoA ligase: 322 / 215
SNP detection
• SNP = Single Nucleotide Polymorphism
• parameters:
• 8x coverage; 90% nucleotide frequency; 20% minor allele frequency
• 20,952 ‘true’ SNPs, average coverage 20x
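A minimal sketch of how the coverage and minor-allele thresholds above might be applied at a single aligned position. The input format (a string of read bases at one position) and the function name are assumptions for illustration. The 90% nucleotide frequency criterion is left out because the slide does not say what it is measured against.

```python
from collections import Counter

MIN_COVERAGE = 8               # 8x coverage, from the slide
MIN_MINOR_ALLELE_FREQ = 0.20   # 20% minor allele frequency, from the slide


def passes_snp_filter(bases):
    """Check one aligned position against the coverage and minor-allele thresholds.

    `bases` holds the read bases observed at that position, e.g. "AAAAAAGG".
    Characters other than A, C, G, T are ignored (an assumption of this sketch).
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < MIN_COVERAGE or len(counts) < 2:
        return False  # too shallow, or only one allele observed
    minor_count = counts.most_common(2)[1][1]  # count of the second most common base
    return minor_count / depth >= MIN_MINOR_ALLELE_FREQ
```

For example, a position covered by six A reads and two G reads has depth 8 and a minor allele frequency of 0.25, so it passes; nine A reads and one G read give 0.10 and fail.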
[Figure: Distribution of the number of SNPs by read coverage depth. x-axis: SNP coverage depth (8 to 60); y-axis: Number of SNPs (0 to 2500)]
SNP detection
Population              tridentata SNPs   vaseyana SNPs   Both SNP types   Total
Montana wyomingensis    138               306             251              695
Utah wyomingensis       157               424             458              1,039
• suggests origin of tetraploid ssp. wyomingensis via mixed ancestry
• more similar to ssp. vaseyana
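The comparison behind these two points can be made explicit by converting the counts in the table to percentages of each population's total. The sketch below does only that arithmetic; the dictionary keys and the function name `snp_shares` are illustrative.

```python
# SNP counts in ssp. wyomingensis samples, copied from the table
WYOMINGENSIS_SNPS = {
    "Montana": {"tridentata": 138, "vaseyana": 306, "both": 251},
    "Utah": {"tridentata": 157, "vaseyana": 424, "both": 458},
}


def snp_shares(counts):
    """Return each SNP type as a percentage of the population total, to one decimal."""
    total = sum(counts.values())
    return {snp_type: round(100 * n / total, 1) for snp_type, n in counts.items()}


if __name__ == "__main__":
    for population, counts in WYOMINGENSIS_SNPS.items():
        print(population, sum(counts.values()), snp_shares(counts))
```

In both populations the vaseyana-type share is more than twice the tridentata-type share, and a large fraction of positions carry both types, which is the pattern the slide reads as mixed ancestry.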
SSR detection
• parameters
• 2-7 3-5 4-5 5-5 6-5 7-5 8-5 9-4 10-4 (SSR motif length – repeat number)
• 100 bp interruption distance
• 1,003 SSRs in basin
• 507 SSRs in mtn
[Figure: Frequency and distribution of SSRs in two big sagebrush subspecies. x-axis: Repeat motif (di, tri, tetra, penta, hexa); y-axis: Number of repeats (0 to 800); series: tridentata, vaseyana]
From here?
- Evolution, intermixing and more evolution of big sagebrush subspecies
- Phylogenetic relationship among big sagebrush populations distributed in the intermountain US
- Sequence capture approach (~350 genes, 55 populations)
- Common garden studies to look at variation among the populations
- Later, link traits with genes in Artemisia tridentata populations
Acknowledgements
- Funding: USDA-FS, RMRS, National Fire Plan, GBNPSIP
Rich Cronn
Jared Price
Nancy Shaw
Covey Jones
Brian Knaus
Felix Jimenez
Scott Yourstone