mec13026-sup-0009-AppendixS1

Focal species descriptions
Among the Asteraceae represented in our dataset are some of the most
noxious weeds and invaders in the temperate world, including the six focal invasive
taxa of our study. The species in the genus Centaurea, including C. solstitialis, C.
stoebe ssp. micranthos, and C. diffusa, are native to Europe and Asia and have
successfully invaded North American rangelands, making Centaurea the most
abundant noxious weed genus in western North America (Lejeune & Seastedt 2001).
In fact, Centaurea is one of only 15 plant genera significantly more likely to contain
invasive species in North America than expected by chance (Kuester et al. 2014).
Cirsium arvense Scop. is native to temperate Eurasia, where it is one of the foremost
noxious agricultural weeds (Schroeder et al. 1993). It has spread to four other
continents and has become one of the most prevalent weeds in North America
(Moore 1975). Species in the genus Ambrosia (ragweeds), including A. artemisiifolia
and A. trifida, are native to North America and have become abundant colonizers of
disturbed habitats across temperate North America, Europe and Asia (Bassett &
Crompton 1975, 1982). Ragweeds are agricultural weeds, and their allergenic pollen
is one of the primary causes of hay fever (Laaidi et al. 2003). Lastly, thirteen
members of the genus Helianthus (sunflowers), and in particular H. annuus, are
globally introduced or invasive (Chamberlain & Szocs 2013; EOL 2014). The high
levels of gene flow between members of this genus make it a primary concern for
transgene escape and the evolution of a “super weed” (discussed in Lai et al. 2012).
Paralog removal
To remove potential paralogs from our alignments, we used a tree-based
approach. Briefly, we constructed gene trees for each orthogroup using RAxML
version 8.0.6 (Stamatakis 2006). jModelTest 2.1.4 was run for each gene to guide the
selection of the nucleotide substitution model (Darriba et al. 2012). To improve the
resulting gene tree, we used TreeFix v1.1.8 (Wu et al. 2013), which uses the
species tree topology to guide the reconstruction of the gene tree. Given a gene tree,
TreeFix finds a “statistically equivalent” tree that minimizes a species tree-based
cost function. We then used the program NOTUNG v2.6 (Durand et al. 2006) to
compare the reconciled gene tree with the species tree (see species tree
reconstruction below) and identify likely paralogs. Sequences identified as
paralogs by this method were removed from the alignments. We retained at most
two sequences from the same species when they were present in an alignment.
Although this method may also eliminate genes crossing species boundaries, we
preferred this conservative approach to avoid excessive numbers of false positives
resulting from misidentification of orthology.
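The final filtering step described above can be illustrated with a minimal sketch. It assumes that the set of paralog-flagged sequences has already been obtained from the reconciliation step, and that sequence identifiers follow a hypothetical "species|transcript" naming scheme. The function names, and the choice to keep the first two sequences per species in input order, are illustrative assumptions and are not taken from the original pipeline.

```python
from collections import defaultdict


def species_of(seq_id):
    # Hypothetical naming convention: "<species>|<transcript>".
    return seq_id.split("|", 1)[0]


def drop_paralogs(seq_ids, flagged_paralogs, max_per_species=2):
    """Remove flagged paralogs, then cap the number of sequences per species.

    Which sequences to keep when a species has more than `max_per_species`
    is an assumption here: the first ones in input order are retained.
    """
    kept = []
    per_species = defaultdict(int)
    for seq_id in seq_ids:
        if seq_id in flagged_paralogs:
            continue  # flagged by the tree reconciliation step
        species = species_of(seq_id)
        if per_species[species] >= max_per_species:
            continue  # species already has the maximum number of sequences
        per_species[species] += 1
        kept.append(seq_id)
    return kept


# Illustrative identifiers only.
alignment_ids = ["Csol|t1", "Csol|t2", "Csol|t3", "Hann|t1", "Hann|t2"]
flagged = {"Hann|t2"}
kept_ids = drop_paralogs(alignment_ids, flagged)
```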
Pairwise comparisons of native and introduced transcriptomes
We ran CODEML on the protein-coding regions in runmode -2 (pairwise
comparison), with the F3X4 codon frequency model. Prior to the analysis we
eliminated all alignments with average percent identity below 50%, as these likely
represented misalignments (this procedure was applied to all alignments, including
those used in the branch-site and site-specific analyses). We also removed columns
with missing data or gaps and retained only sequences with at least 150
nucleotides of coding region. As low divergence leads to uncertain dN/dS ratio
estimates, cases where dS was below 0.01 were excluded. We also discarded
orthogroups showing dS or dN > 2, indicating saturation of substitutions.
Orthogroups with dN/dS > 1 were considered candidates for positive selection.
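The thresholds in this paragraph can be written out as a short sketch. The function names are hypothetical, and several details are assumptions: identity is computed as the mean proportion of matching positions over all sequence pairs, gapped or missing positions are removed codon by codon to preserve the reading frame, and the order in which the filters are applied is not specified in the text.

```python
from itertools import combinations


def strip_incomplete_codons(seqs, missing="-?Nn"):
    """Drop every codon column that contains a gap or missing character."""
    length = min(len(s) for s in seqs)
    length -= length % 3  # ignore a trailing partial codon
    keep = [
        i for i in range(0, length, 3)
        if all(ch not in missing for s in seqs for ch in s[i:i + 3])
    ]
    return ["".join(s[i:i + 3] for i in keep) for s in seqs]


def mean_pairwise_identity(seqs):
    """Average percent identity over all pairs of equal-length sequences."""
    scores = [
        100.0 * sum(x == y for x, y in zip(a, b)) / len(a)
        for a, b in combinations(seqs, 2)
        if len(a) > 0
    ]
    return sum(scores) / len(scores) if scores else 0.0


def classify_pair(dn, ds, min_ds=0.01, max_rate=2.0):
    """Apply the dS and dN thresholds, then flag dN/dS > 1."""
    if ds < min_ds:
        return "excluded: low divergence"
    if ds > max_rate or dn > max_rate:
        return "excluded: saturated"
    return "candidate" if dn / ds > 1 else "not candidate"


# Toy alignment, far shorter than the 150-nucleotide minimum used in the study.
cleaned = strip_incomplete_codons(["ATGAAA---CCC", "ATGAAGTTTCCC"])
identity = mean_pairwise_identity(cleaned)
```

In a real run, an alignment would be kept only if its identity is at least 50% and its cleaned sequences are at least 150 nucleotides long. `classify_pair` would then be applied to the dN and dS values reported by CODEML.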
PAML branch and branch-site models
As gaps and other alignment errors have been known to generate false
positives in dN/dS analyses (Fletcher & Yang 2010; Markova-Raina & Petrov 2011),
each orthogroup and/or site inferred to be under positive selection was visually
inspected in AliView 1.07 (Larsson 2014), and any cases of potential misalignment
(e.g. errors associated with indels) were removed from the analysis. Furthermore,
because nucleotide changes inferred to be under positive selection did not occur in
all taxa designated as foreground, we performed an additional set of PAML analyses
on the remaining orthogroups, setting as foreground only the taxa in which the
changes of interest were present.
Gene Ontology analysis
We assigned GO terms to each orthogroup based on the GO mappings of the
top A. thaliana hits and removed redundant GO terms. To identify the biological
processes with which rapidly evolving genes were associated, we performed a
GO enrichment analysis using topGO (Alexa et al. 2006). All orthogroups that were
not significant in tests of positive selection were used as background. Significance
for each individual GO identifier was computed with Fisher's exact test. As GO terms
are non-independent, we used the parent-child method, which determines
overrepresentation of terms in the context of annotations to the term's parents
(Grossmann et al. 2007). This approach reduces the dependencies between the
individual terms and avoids producing false positives.
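The per-term test can be illustrated with a small stand-alone sketch of a one-sided Fisher's exact test, computed as the upper tail of the hypergeometric distribution. This is not the topGO implementation: the function name and the counts in the usage example are hypothetical. In the parent-child method, the same calculation would be made with the universe restricted to orthogroups annotated to the term's parents, rather than all orthogroups.

```python
from math import comb


def fisher_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected items are drawn at random
    from n_universe items, of which n_annotated carry the GO term."""
    max_overlap = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_selected - i)
        for i in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 orthogroups in total, 5 annotated to the term,
# 5 under positive selection, all 5 of which carry the annotation.
p_value = fisher_enrichment_p(10, 5, 5, 5)
```

Here the universe is the union of the significant orthogroups and the non-significant background, matching the description above.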
References
Alexa A, Rahnenführer J, Lengauer T (2006) Improved scoring of functional groups
from gene expression data by decorrelating GO graph structure. Bioinformatics,
22, 1600–1607.
Bassett IJ, Crompton CW (1975) The biology of Canadian weeds. 11. Ambrosia
artemisiifolia L. and A. psilostachya D.C. Canadian Journal of Plant Science, 55,
463–476.
Durand D, Halldórsson B, Vernot B (2006) A hybrid micro–macroevolutionary
approach to gene tree reconstruction. Journal of Computational Biology, 13,
320–335.
Fletcher W, Yang Z (2010) The effect of insertions, deletions, and alignment errors
on the branch-site test of positive selection. Molecular Biology and Evolution,
27, 2257–2267.
Grossmann S, Bauer S, Robinson PN, Vingron M (2007) Improved detection of
overrepresentation of Gene-Ontology annotations with parent child analysis.
Bioinformatics, 23, 3024–3031.
Laaidi M, Laaidi K, Besancenot J-P, Thibaudon M (2003) Ragweed in France: An
invasive plant and its allergenic pollen. Annals of Allergy, Asthma & Immunology,
91, 195–201.
Larsson A (2014) AliView: a fast and lightweight alignment viewer and editor for
large datasets. Bioinformatics, 1–3.
Lejeune K, Seastedt T (2001) Centaurea species: The forb that won the west.
Conservation Biology, 15, 1568–1574.
Markova-Raina P, Petrov D (2011) High sensitivity to aligner and high rate of false
positives in the estimates of positive selection in the 12 Drosophila genomes.
Genome Research, 21, 863–874.
Moore R (1975) The biology of Canadian weeds. 13. Cirsium arvense (L.). Canadian
Journal of Plant Science, 55, 1033–1048.
Schroeder D, Stinson CSA, Station E (1993) A European weed survey in 10 major
crop systems to identify targets for biological control, 33, 449–459.
Wu Y, Rasmussen D, Bansal M, Kellis M (2013) TreeFix: Statistically informed gene
tree error correction using species trees. Systematic Biology, 62, 110–120.