emi12259-sup-0001-methods

Supplementary Methods
Hybridization efficiency was calculated from fundamental thermodynamic principles. The hybridization efficiency (Effhyb), defined as the fraction of template bound by a primer at a given temperature (Mathews et al., 1999), was calculated as follows. First, the program UNAFold (Markham and Zuker, 2008) was used to determine the Gibbs free energy change (ΔG°) of hybridization between primer and template (pt), as well as the free energies of primer folding (pf), target folding (tf) at the target site, and primer-dimer (pd) formation. Because UNAFold allows for a correction factor based on the concentration of ions present in the reaction, the molar concentrations of monovalent cations ([Mon+]; e.g., [K+]) and divalent cations ([Div2+]; e.g., [Mg2+]) were converted to a sodium-equivalent concentration ([Naeq+]) according to Eq. 1 (Owczarzy et al., 2008), which also takes into account the molar concentration of deoxyribonucleotide triphosphates ([dNTPs]).

$$[\mathrm{Na}_{eq}^{+}] = [\mathrm{Mon}^{+}] + 3.33\sqrt{[\mathrm{Div}^{2+}] - [\mathrm{dNTPs}]} \tag{1}$$
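
As a minimal sketch of the ion correction in Eq. 1, the conversion can be written as a small Python function. It reads Eq. 1 with the square root taken over the free divalent cation concentration ([Div2+] minus [dNTPs]). The function name, the clamping of a negative difference to zero, and the example concentrations are illustrative assumptions, not values or behavior taken from the study.

```python
import math


def sodium_equivalent(mon: float, div: float, dntps: float) -> float:
    """Sodium-equivalent concentration (mol/L) following Eq. 1.

    mon, div and dntps are molar concentrations of monovalent cations,
    divalent cations and dNTPs. Clamping the free divalent concentration
    at zero is an assumption of this sketch, not part of Eq. 1.
    """
    free_div = max(div - dntps, 0.0)
    return mon + 3.33 * math.sqrt(free_div)


# Hypothetical reaction: 50 mM K+, 1.5 mM Mg2+, 0.8 mM total dNTPs.
na_eq = sodium_equivalent(mon=0.05, div=0.0015, dntps=0.0008)
print(f"[Na_eq+] = {na_eq:.4f} M")
```

The resulting value would be the ionic concentration supplied to the free energy calculation.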
Gibbs free energies were then used to calculate equilibrium constants for all reactions using the fundamental relationship shown in Eq. 2, where R is the ideal gas constant (0.00199 kcal/(mol·K)) and T is the annealing temperature in Kelvin.

$$K = \exp\left(-\frac{\Delta G^{\circ}}{RT}\right) \tag{2}$$
An effective equilibrium constant (Keff) was calculated according to Eq. 3 (Mathews et al., 1999) as a function of the side-reaction equilibrium constants (Kpf, Ktf, Kpd), the equilibrium constant governing primer-template duplex formation (Kpt), and the molar concentration of primers (P0) present in the PCR reaction.

$$K_{eff} = \frac{4 K_{pt} K_{pd} P_0}{\left(-1 - K_{pf} + \sqrt{(1 + K_{pf})^{2} + 8 P_0 K_{pd}}\right)(1 + K_{tf})} \tag{3}$$
Finally, Effhyb was determined directly from the effective equilibrium constant (Keff) using Eq. 4, which assumes that the primer concentration is much greater than the template concentration (Yilmaz and Noguera, 2004).

$$\mathrm{Eff}_{hyb} = \frac{P_0 K_{eff}}{1 + P_0 K_{eff}} \tag{4}$$
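
The two unambiguous steps of this calculation, Eq. 2 and Eq. 4, can be sketched in Python as follows. The function names are hypothetical, and the sketch takes Keff as an input rather than computing it, so it is only an illustration of how the quantities relate, not the authors' implementation.

```python
import math

R_KCAL = 0.00199  # ideal gas constant in kcal/(mol*K), as given in the text


def equilibrium_constant(delta_g: float, temp_c: float) -> float:
    """Eq. 2: K = exp(-dG / (R*T)), with dG in kcal/mol and T in Celsius."""
    temp_k = temp_c + 273.15
    return math.exp(-delta_g / (R_KCAL * temp_k))


def hybridization_efficiency(k_eff: float, primer_conc: float) -> float:
    """Eq. 4: fraction of template bound, assuming primer is in large excess."""
    x = primer_conc * k_eff
    return x / (1.0 + x)


# Hypothetical example: a duplex with dG = -10 kcal/mol at 64 C and
# 400 nM primer, ignoring side reactions (K_eff taken equal to K_pt).
k_pt = equilibrium_constant(-10.0, 64.0)
print(hybridization_efficiency(k_pt, 4e-7))
```

Because Eq. 4 depends only on the product of P0 and Keff, the efficiency is 50% exactly when Keff equals 1/P0.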
Quantitative PCR (qPCR) was used to measure elongation efficiency of mismatched primers. A strand of template DNA is copied during a PCR cycle when a primer hybridizes to the strand and is subsequently elongated by a DNA polymerase. In this way amplification can be broken into two successive events: hybridization and elongation. We chose to represent elongation efficiency in a similar manner to hybridization efficiency, as the fraction of primers hybridized to the template that are elongated during a PCR cycle. Hence, the overall amplification efficiency (Effamp) can be described as the product of hybridization (Effhyb) and elongation efficiency (Effelong) according to Eq. 5.

$$\mathrm{Eff}_{amp} = \mathrm{Eff}_{hyb} \cdot \mathrm{Eff}_{elong} \tag{5}$$
Using qPCR it is only possible to measure amplification efficiency; therefore, elongation efficiency can be measured directly only when hybridization efficiency is 100%. Thus, in order to measure elongation efficiency with qPCR, we chose to perform experiments at 50°C, and we used primers, either perfectly matched or mismatched, that had greater than 99.9% hybridization efficiency at that temperature. These elongation efficiency measurements were later used to make predictions at higher annealing temperatures because the extension temperature remained constant at 72°C, and it has been previously shown that the rate of mismatched elongation is largely temperature invariant (Innis et al., 1995).
By defining elongation efficiency of perfect match primers to be 100%, we measured elongation efficiency of mismatched primers relative to their perfect match counterpart. To obtain this measurement, the same initial concentration of template was separately amplified, in triplicate, with either perfect match primers or a primer set containing one mismatched primer (Fig. S2). A delay of threshold cycle (Ct) of the amplification with the mismatched primer relative to the perfect match primer set (ΔCt) was obtained from these experiments. Elongation efficiency was then calculated according to Eq. 6, which empirically approximates the exponential nature of PCR amplifications and takes into account the fact that each PCR cycle performed with a mismatched primer creates a new template that will be perfectly matched in subsequent cycles. This equation also reflects the assumption that Effhyb is equal to 100% in the experiments.

$$\mathrm{Eff}_{elong} = 2^{-\Delta C_t} \tag{6}$$
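
Eqs. 5 and 6 can be illustrated with a short Python sketch. Averaging the triplicate Ct values before taking the difference is an assumption made here for illustration (the text does not state how replicates were combined), and the function names and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def elongation_efficiency(ct_mismatch: Sequence[float],
                          ct_perfect: Sequence[float]) -> float:
    """Eq. 6: Eff_elong = 2**(-dCt), with dCt the delay of the mismatched set.

    Replicate Ct values are averaged first (an assumption of this sketch).
    """
    delta_ct = mean(ct_mismatch) - mean(ct_perfect)
    return 2.0 ** (-delta_ct)


def amplification_efficiency(eff_hyb: float, eff_elong: float) -> float:
    """Eq. 5: overall efficiency as the product of the two steps."""
    return eff_hyb * eff_elong


# Hypothetical triplicates: the mismatched primer set lags by three cycles.
eff = elongation_efficiency([23.0, 23.0, 23.0], [20.0, 20.0, 20.0])
print(eff)  # a three-cycle delay corresponds to 2**-3
```

A ΔCt of zero gives an elongation efficiency of 100%, consistent with the definition used for perfect match primers.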
The melting temperatures of primer/template duplexes were calculated from Effhyb equations. A primer’s Effhyb as a function of temperature follows a profile with a 100% Effhyb plateau at low temperatures, a 0% plateau at high temperatures, and a sigmoidal decrease in efficiency as temperature increases between the two plateaus. The melting temperature (Tm) is the point at which Effhyb is 50%. Accordingly, the melting point of primer/template duplexes was numerically calculated during primer design by finding the temperature at which Effhyb in Eq. 4 becomes 50%.
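
One way to find such a temperature numerically is bisection on the efficiency curve. The sketch below is an illustration under simplifying assumptions: it uses a two-state model in which ΔG° = ΔH − TΔS and side reactions are ignored (so Keff is taken equal to Kpt), and the enthalpy, entropy, and primer concentration are hypothetical values. The root-finding method actually used in the study is not specified in the text.

```python
import math
from typing import Callable

R_KCAL = 0.00199  # kcal/(mol*K)


def two_state_efficiency(temp_c: float, delta_h: float, delta_s: float,
                         primer_conc: float) -> float:
    """Eff_hyb (Eq. 4) with K from Eq. 2 and dG = dH - T*dS (assumed model)."""
    temp_k = temp_c + 273.15
    delta_g = delta_h - temp_k * delta_s
    x = primer_conc * math.exp(-delta_g / (R_KCAL * temp_k))
    return x / (1.0 + x)


def melting_temperature(eff: Callable[[float], float], lo: float = 0.0,
                        hi: float = 100.0, tol: float = 1e-4) -> float:
    """Bisection for the temperature (Celsius) at which eff(T) crosses 0.5."""
    if not (eff(lo) > 0.5 > eff(hi)):
        raise ValueError("efficiency does not cross 50% within [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eff(mid) > 0.5:
            lo = mid  # still on the high-efficiency side of the curve
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical duplex: dH = -150 kcal/mol, dS = -0.4 kcal/(mol*K), 400 nM primer.
tm = melting_temperature(lambda t: two_state_efficiency(t, -150.0, -0.4, 4e-7))
print(f"Tm = {tm:.2f} C")
```

Bisection is sufficient here because the efficiency decreases monotonically with temperature between the two plateaus.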
Perfect match primers were designed to have an average predicted Tm of 65°C (Table S6), and as discussed above, experiments to estimate elongation efficiency were carried out using an annealing temperature of 50°C in order to be at or near the 100% Effhyb plateau. In order to experimentally assess whether this requirement was met, qPCR experiments were conducted using the perfect match primers with annealing temperatures varying from 50°C to 75°C (Fig. S2). Table S5 reports the observed temperature at which amplification was delayed by no more than two cycles with respect to the 100% efficiency plateau. These temperatures were between 59°C and 70°C, confirming that using 50°C as the annealing temperature for the rest of the study was sufficient to reach the 100% Effhyb plateau.
Primers specific to bacterial and archaeal genera were designed to take advantage of mismatches at or near the primer's 3' end. The primer design algorithm begins with a set of aligned DNA sequences for designing primers. We chose to use the RDP (Cole et al., 2009) database (version 10.30) of “good quality” 16S rRNA gene sequences, which included 2,342,448 Bacteria and 117,236 Archaea. We removed 30,085 possibly chimeric sequences with DECIPHER (Wright et al., 2012) using full-length sequence parameters. The full set of DNA sequences was grouped into categories, in our case according to phylogenetic similarity where each genus formed its own group. Unclassified groups were omitted from the analysis, which resulted in a dataset of 1,640,974 bacterial sequences classified into 1,834 genera, and 47,091 archaeal sequences classified into 109 genera.
Next, a set of overlapping k-mers, termed “tiles,” of 27 nucleotides in length was formed from each group of aligned sequences. For each group, up to 10 tiles representing the most common permutations at each position within the sequence were selected for further analysis. The aim was to select the smallest possible number of tiles that represented at least 90% of the sequences within each group (genus). This tiling approach is preferable to using consensus sequences for each group, because it results in less sequence ambiguity and minimizes the possibility of designs that target minor sequence variations that may be the result of either sequencing errors or chimeras not detectable with the chimera-removal tool (Wright et al., 2012). The compromise is that this approach may eliminate from the design real sequence variations that are only minor components of the targeted group. The complete set of tiles selected above became the set of subsequences from which primers were designed to target a specific group while avoiding other non-target groups.
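
The tile selection described above can be sketched as a greedy pick of the most frequent k-mers at one alignment position. This is a simplified illustration rather than the program's actual code: the function name is hypothetical, the input is assumed to be one tile string per sequence in the group, and alignment gaps and ambiguity codes are not handled.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def select_tiles(tiles: Sequence[str], min_coverage: float = 0.9,
                 max_tiles: int = 10) -> Tuple[List[str], float]:
    """Pick the most common tiles until min_coverage of sequences is reached.

    Returns the selected tiles and the fraction of sequences they represent.
    At most max_tiles are returned, so the coverage target may not be met.
    """
    if not tiles:
        raise ValueError("no tiles supplied for this position")
    counts = Counter(tiles)
    selected: List[str] = []
    covered = 0
    for tile, n in counts.most_common(max_tiles):
        selected.append(tile)
        covered += n
        if covered / len(tiles) >= min_coverage:
            break
    return selected, covered / len(tiles)


# Hypothetical group of 20 sequences with three variants at one position
# (3-mers are used instead of 27-mers to keep the example short).
variants = ["AAA"] * 12 + ["AAC"] * 7 + ["AAG"] * 1
print(select_tiles(variants))
```

Taking variants in order of decreasing frequency yields the smallest set that reaches the coverage target, and it leaves out rare variants such as the single "AAG" sequence in the example.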
In the next step, a set of potential forward and reverse primers meeting the following constraints was designed for each target site: First, a minimum set of perfect match primers must cover a specified fraction of the group, while requiring a minimum percentage of the group’s sequences to include that region. For example, we chose to design forward and reverse primers that covered at least 90% of their genus with up to 4 permutations, while requiring at least 20% of the group’s sequences to overlap the target site (not all sequences span the entire gene). Second, since about 97% of sequence information in the RDP database is contained between Escherichia coli positions 27 and 1406, we only considered tiles in this range in order to obtain a more holistic representation of target and non-target diversity. Third, tiles were also constrained to have no more than 4 runs of a single base (e.g., AAAA), or 4 di-nucleotide repeats (e.g., ACACACAC), as this may result in false priming due to entropic effects.
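
The third constraint can be checked with regular expressions, as in the sketch below. It rests on one reading of the rule, namely that a run of up to 4 identical bases or up to 4 consecutive di-nucleotide units is tolerated and anything longer is rejected. That interpretation, the function name, and the restriction to the unambiguous bases A, C, G, and T are all assumptions of this example.

```python
import re


def has_long_repeats(seq: str, max_run: int = 4, max_dinuc: int = 4) -> bool:
    """True if seq has a single-base run or di-nucleotide repeat over the limit."""
    # A base followed by at least max_run copies of itself: run > max_run.
    run = re.compile(r"([ACGT])\1{%d,}" % max_run)
    # A di-nucleotide followed by at least max_dinuc copies of itself.
    dinuc = re.compile(r"([ACGT]{2})\1{%d,}" % max_dinuc)
    return bool(run.search(seq) or dinuc.search(seq))


print(has_long_repeats("ACGTAAAAGT"))    # run of exactly 4 A's
print(has_long_repeats("ACGTAAAAAGT"))   # run of 5 A's
print(has_long_repeats("ACACACACACGT"))  # 5 consecutive AC units
```

If the stricter reading is intended (rejecting runs of 4 themselves), the limits can be lowered to 3 without changing the code.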
Finally, to create primers from each selected tile, the primer’s length was adjusted, base-by-base, starting from 17 nucleotides long and adding one nucleotide to the 5’-end until a defined minimum hybridization efficiency was met. In this manner, primer permutations are given different lengths such that each permutation has a similar efficiency at the annealing temperature in order to minimize bias. For this study we chose to have primers with at least 80% hybridization efficiency at an annealing temperature of 64°C. PCR primers are often designed for maximum (100%) efficiency at the annealing temperature; however, such a high binding strength is disadvantageous for discriminating between target and non-target templates. After these steps, any positions within the sequences for which the selected tiles could not meet these initial primer design constraints were eliminated from the set of possible target sites. Typically, about 25-75% of 16S positions were viable as candidate target sites after applying the above constraints.
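
The length adjustment can be sketched as a simple loop. In this illustration the tile is assumed to be written 5' to 3' with the primer's 3' end fixed at the tile's 3' end, and `efficiency` is a hypothetical callable standing in for the thermodynamic calculation of Effhyb at the annealing temperature. The toy efficiency function in the example is not a physical model.

```python
from typing import Callable, Optional


def fit_primer_length(tile: str, efficiency: Callable[[str], float],
                      min_eff: float = 0.8, start_len: int = 17) -> Optional[str]:
    """Extend a primer toward the 5' end until efficiency reaches min_eff.

    Returns the shortest qualifying primer, or None if even the full tile
    falls short (the position would then be dropped as a target site).
    """
    for length in range(start_len, len(tile) + 1):
        primer = tile[-length:]  # keep the 3' end, add bases on the 5' side
        if efficiency(primer) >= min_eff:
            return primer
    return None


# Toy stand-in for Eff_hyb: grows with primer length only (hypothetical).
def toy_efficiency(primer: str) -> float:
    return min(1.0, len(primer) / 24.0)


tile = "ACGTTGCAAGCTTGACCGATGCATGCA"  # a made-up 27-nt tile
primer = fit_primer_length(tile, toy_efficiency)
print(primer, len(primer))
```

Stopping at the first length that meets the threshold keeps each permutation close to the target efficiency rather than at the 100% plateau.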
The amplification efficiency of a primer hybridized to a perfectly matched template assumed an elongation efficiency of 100%, while a systematic set of mismatch rules was created to predict the elongation efficiency of non-targets. Mismatched terminal nucleotides were evaluated using the averaged penultimate base pairing information derived from Fig. 2 (Table S1). Furthermore, elongation efficiencies of single mismatches and indels in positions 2-4 were based on average efficiencies measured in those positions (Fig. 3). Double mismatches and double indels in positions 1-6 were assigned zero efficiency, as multiple mismatches cause both decreased elongation and hybridization efficiencies at the high annealing temperatures typically used in PCR.
Next, each viable set of primers was scored with the objective of minimizing potential cross-amplification with non-target groups. To accomplish this, the primers were hybridized in silico to all other non-target tiles in the same alignment position and elongated in silico to determine their predicted amplification efficiency. The potential amplification efficiency for each primer at the annealing temperature (64°C) was estimated from hybridization and elongation efficiencies according to Eq. 5. Then, the overall in silico amplification efficiency for a set of forward and reverse primers was approximated by the geometric mean of the forward and reverse efficiencies according to Eq. 7.

$$\mathrm{Eff}_{obs} = \sqrt{\mathrm{Eff}_{f\_primer} \cdot \mathrm{Eff}_{r\_primer}} \tag{7}$$
After performing all in silico amplifications of non-target groups, amplifications with at least 0.1% predicted efficiency were recorded. A specificity score for each potential primer set was then calculated as the sum of the predicted amplification efficiencies for all non-target groups. Potential forward and reverse primer sets were ranked according to their ability to discriminate between their target group and all other non-target groups. This ranked set of primers was screened to eliminate primer combinations that would produce amplicons of undesirable length. In our case, we chose amplicons to be between 300 and 1,200 base pairs so that the PCR product would be long enough for downstream sequencing. Furthermore, a check for potential primer-dimer artifacts was conducted with every combination of forward and reverse primers in the ranked set.
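
The scoring and screening steps can be illustrated with the following sketch. It assumes that the specificity score sums only the recorded cross-amplifications (those at or above 0.1%), which is one reading of the text. The function names, the input layout, and the example efficiencies are hypothetical.

```python
import math
from typing import Dict, Tuple


def pair_efficiency(eff_forward: float, eff_reverse: float) -> float:
    """Eq. 7: geometric mean of forward and reverse primer efficiencies."""
    return math.sqrt(eff_forward * eff_reverse)


def specificity_score(non_targets: Dict[str, Tuple[float, float]],
                      min_recorded: float = 0.001) -> Tuple[float, Dict[str, float]]:
    """Sum predicted efficiencies over non-target groups (lower is better).

    non_targets maps a group name to its (forward, reverse) amplification
    efficiencies from Eq. 5. Only values >= min_recorded are kept.
    """
    recorded = {}
    for group, (eff_f, eff_r) in non_targets.items():
        eff = pair_efficiency(eff_f, eff_r)
        if eff >= min_recorded:
            recorded[group] = eff
    return sum(recorded.values()), recorded


def amplicon_ok(length_bp: int, lo: int = 300, hi: int = 1200) -> bool:
    """Amplicon length screen using the 300-1,200 bp window from the text."""
    return lo <= length_bp <= hi


# Hypothetical non-target genera with (forward, reverse) efficiencies.
score, hits = specificity_score({
    "GenusA": (0.5, 0.5),
    "GenusB": (1e-4, 1e-4),   # below the 0.1% cutoff, not recorded
    "GenusC": (0.04, 0.01),
})
print(score, sorted(hits))
```

Because the geometric mean is zero whenever either primer fails to amplify, a single strongly discriminating primer is enough to exclude a non-target group from the score.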
As a final step, a more thorough search was conducted using the top ranked primer sets. The sequence of each primer was again checked against every non-target tile, but this time for every position upstream and downstream of the target site. This search ensured the primer would not unexpectedly cross-hybridize to a position other than the target site on a non-target DNA template. While it may seem unlikely that cross-hybridizations would occur away from the target site, we found examples in which this occurred, and thus, we included this rule for thoroughness. Scores for each primer set were updated based on this exhaustive search, and the optimal primer set was chosen as the one that would amplify the target group while minimizing the in silico amplification of non-targets. In practice, it was not always possible to find an ideal primer set with no predicted amplification of any non-target. In such cases, the predicted amplification efficiency of each potential cross-amplification was recorded for further consideration.
The genus-specific primers developed in this study have been made available online (http://DECIPHER.cee.wisc.edu). For modest primer design tasks we have also made our program accessible online as part of a web tool. In our experience the program’s outputs are adequate for immediate application, but we recommend that PCR conditions be optimized with a temperature gradient using the target template, as is typically performed in PCR. This is especially necessary if using primers designed by the program with an induced mismatch, because they may have a lower melting temperature with the target template. The primer design program can be used for the identification of primer sets (forward and reverse) that minimize potential non-target hybridizations for any DNA sequence and any level of taxonomic or group clustering. Help documentation available with the program assists the user in setting the appropriate input parameters for their PCR experiment and interpreting the program’s output.
REFERENCES
Cole, J.R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R.J. et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research 37: D141-D145.
Innis, M.A., Gelfand, D.H., and Sninsky, J.J. (eds) (1995) PCR Strategies. San Diego, CA: Academic Press, Inc.
Markham, N.R., and Zuker, M. (2008) UNAFold: software for nucleic acid folding and hybridization. In Methods in Molecular Biology: Humana Press, Totowa, NJ, pp. 3-31.
Mathews, D., Burkard, M., Freier, S., Wyatt, J., and Turner, D. (1999) Predicting Oligonucleotide Affinity to Nucleic Acid Targets. RNA 5: 1458-1469.
Owczarzy, R., Moreira, B.G., You, Y., Behlke, M.A., and Walder, J.A. (2008) Predicting stability of DNA duplexes in solutions containing magnesium and monovalent cations. Biochemistry 47: 5336-5353.
Wright, E.S., Yilmaz, L.S., and Noguera, D.R. (2012) DECIPHER, a Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Applied and Environmental Microbiology 78: 717-725.
Yilmaz, L.S., and Noguera, D.R. (2004) Mechanistic approach to the problem of hybridization efficiency in fluorescent in situ hybridization. Applied and Environmental Microbiology 70: 7126-7139.