Supplementary Material
Here we provide details of our model for quantitative heteroduplex analysis of SNPs. First we determine an expression for the difference curve obtained by subtracting the melting curves of samples of different genotypes mixed with homozygous reference DNA at fraction $x$. This curve is given by the difference in heteroduplex fractions of the mixtures times the difference between a weighted average of the homoduplex curves and the mean of the heteroduplex curves. When the melting curves of wild-type and mutant homozygotes are indistinguishable, the weighted average of the homoduplex curves is just the common homozygote curve. In this case, the reference DNA fraction that maximizes the separation of the difference curves of all three genotypes of bi-allelic diploid DNA is $x = 1/7$.
Samples of different genotypes are designated W for wild type, M for homozygous mutant, and H for heterozygous mutant. The forward and reverse strands of the wild-type amplicon are designated w and w', respectively, with m and m' for the homozygous mutant strands. We assume that the reference DNA is wild type and that amplification preserves the relative proportion of all strand species. Let $x$ be the homozygous reference DNA fraction at the beginning of PCR. If all strands are fully extended during PCR and no strand re-association occurs, Table 1 gives the homoduplex fractions at the end of PCR.
Table 1. Homoduplex fractions without strand re-association
Genotype | [ww'] | [mm']
W | $1$ | $0$
M | $x$ | $1-x$
H | $x + \tfrac12(1-x) = \tfrac12(1+x)$ | $\tfrac12(1-x)$
If a final denaturation and reannealing step occurs prior to melting, we assume that duplexes form randomly, regardless of whether they are perfectly complementary ([ww'], [mm']) or only approximately complementary ([wm'], [mw']). The resulting fraction of each duplex is the product of the individual strand fractions (each given by the corresponding homoduplex fraction in Table 1). Table 2 lists the duplex fractions after re-association. They are obtained by multiplying out the expressions in each row of Table 1 in binomial ("FOIL") form, much like a Punnett square. Since each row of Table 1 sums to 1, so do the rows of Table 2 (if $a+b=1$, then $(a+b)(a+b)=aa+ab+ba+bb=1$).
Table 2. Duplex fractions after denaturation and reannealing of PCR product
Genotype | [ww'] | [wm'] | [mw'] | [mm']
W | $1$ | $0$ | $0$ | $0$
M | $x^2$ | $x(1-x)$ | $x(1-x)$ | $(1-x)^2$
H | $\tfrac14(1+x)^2$ | $\tfrac14(1-x^2)$ | $\tfrac14(1-x^2)$ | $\tfrac14(1-x)^2$
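The following is a minimal Python sketch of the random-reannealing rule behind Tables 1 and 2. As in the text, it assumes that each duplex fraction is the product of the forward-strand and reverse-strand fractions. The names `strand_fractions`, `duplex_fractions` and `SAMPLE_WILD` are illustrative and do not come from the original work. Exact `Fraction` arithmetic makes the row-sum check exact rather than approximate.

```python
from fractions import Fraction

# Wild-type strand fraction contributed by each sample genotype (illustrative).
SAMPLE_WILD = {"W": Fraction(1), "M": Fraction(0), "H": Fraction(1, 2)}

def strand_fractions(genotype, x):
    """Table 1: fractions of wild-type (w, w') and mutant (m, m') strands when
    wild-type reference DNA makes up fraction x of the template."""
    wild = x + (1 - x) * SAMPLE_WILD[genotype]
    return wild, 1 - wild

def duplex_fractions(genotype, x):
    """Table 2: random reannealing, so each duplex fraction is the product of
    its forward-strand and reverse-strand fractions."""
    w, m = strand_fractions(genotype, x)
    return {"ww'": w * w, "wm'": w * m, "mw'": m * w, "mm'": m * m}

x = Fraction(1, 7)
for g in "WMH":
    d = duplex_fractions(g, x)
    print(g, {k: str(v) for k, v in d.items()}, "row sum:", sum(d.values()))
```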




Our next assumption is that the melting curve of a mixture of duplexes is the sum of the individual duplex melting curves, weighted in proportion to their relative concentrations. We identify the duplex fluorescence vs. temperature melting curves by their two strands, $F_{ww'}$, $F_{wm'}$, $F_{mw'}$, $F_{mm'}$. Weighting them by the duplex fractions from Table 2 gives Table 3, the theoretical melting curves of the various genotypes mixed with reference DNA at fraction $x$. Note that when $x=0$, these expressions reduce to the wild-type and mutant homozygote melting curves and to the heterozygote curve. The heterozygote curve is the equally weighted sum (one-quarter each) of the four duplex curves, two homoduplexes and two heteroduplexes, so its overall heteroduplex content is one-half. We may interchange the words homozygote and homoduplex when referring to their melting curves, but not heterozygote and heteroduplex.
Table 3. Melting curves of different genotypes mixed with homozygous reference DNA at fraction $x$
Genotype | Melting curve
W | $F_{ww'}(T)$
M | $x^2F_{ww'}(T) + x(1-x)F_{wm'}(T) + x(1-x)F_{mw'}(T) + (1-x)^2F_{mm'}(T)$
H | $\tfrac14(1+x)^2F_{ww'}(T) + \tfrac14(1-x^2)F_{wm'}(T) + \tfrac14(1-x^2)F_{mw'}(T) + \tfrac14(1-x)^2F_{mm'}(T)$
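The weighted-sum assumption behind Table 3 can be sketched as follows. The duplex curves here are hypothetical logistic placeholders with made-up melting temperatures. They are not the HFE curves of Fig. S1; only the weighting by duplex fractions reflects the model.

```python
import math

def duplex_weights(genotype, x):
    """Table 2 duplex fractions (random reannealing), as floats."""
    w = x + (1 - x) * {"W": 1.0, "M": 0.0, "H": 0.5}[genotype]
    u = 1 - w
    return {"ww'": w * w, "wm'": w * u, "mw'": u * w, "mm'": u * u}

def toy_curve(tm, width=1.0):
    """Hypothetical duplex melting curve: normalized fluorescence that falls
    sigmoidally from 1 to 0 around tm (degrees C). Placeholder, not data."""
    return lambda t: 1.0 / (1.0 + math.exp((t - tm) / width))

# Hypothetical melting temperatures: identical homoduplexes, less stable heteroduplexes.
curves = {"ww'": toy_curve(80.0), "wm'": toy_curve(78.5),
          "mw'": toy_curve(78.0), "mm'": toy_curve(80.0)}

def genotype_curve(genotype, x):
    """Table 3: weighted sum of the duplex curves, weights = duplex fractions."""
    weights = duplex_weights(genotype, x)
    return lambda t: sum(weights[d] * curves[d](t) for d in curves)

for g in "WMH":
    curve = genotype_curve(g, 1 / 7)
    print(g, [round(curve(t), 3) for t in (76.0, 78.0, 80.0, 82.0)])
```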




Fig. S1 shows theoretically computed examples of the four duplex curves and the three genotype curves for the HFE amplicons of the manuscript, corresponding to the experimental examples in Fig. 2 of the manuscript. The two homozygous genotypes and the two homoduplexes all have identical melting curves, so only four distinct melting curves are visible. These curves are obtained by inverting the van 't Hoff equation to express temperature in terms of product and reactant concentrations,
$T = \dfrac{\Delta H}{\Delta S - R\ln\dfrac{[\mathrm{dsDNA}]}{[\mathrm{ssDNA}_1][\mathrm{ssDNA}_2]}}$,
with $\Delta S$ and $\Delta H$ obtained from nearest-neighbor approximations for the duplexes. The thermodynamic parameters are described in [S1, S2] and implemented by MeltingWizard, available at http://dnawizards.path.utah.edu.
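As a rough sketch of the van 't Hoff inversion, assume a two-state, non-self-complementary duplex with equal total strand concentrations. Under that assumption, one can compute the temperature at which a given fraction of the duplex remains annealed. The $\Delta H$, $\Delta S$ and concentration values below are hypothetical round numbers, chosen only to give a plausible melting temperature. They are not the nearest-neighbor parameters for the HFE amplicons, and this is not the MeltingWizard implementation.

```python
import math

R = 1.987  # gas constant, cal/(mol K)

def melt_temperature(f, dH, dS, c_total):
    """Temperature (K) at which a fraction f of the duplex remains annealed,
    from T = dH / (dS - R ln([ds] / ([ss1][ss2]))).

    Assumes equal total concentrations c_total (M) of the two strands, so
    [ds] = f * c_total and [ss1] = [ss2] = (1 - f) * c_total.
    dH in cal/mol and dS in cal/(mol K), both negative for duplex formation.
    """
    ds = f * c_total
    ss = (1 - f) * c_total
    K = ds / (ss * ss)
    return dH / (dS - R * math.log(K))

# Hypothetical parameters; in practice dH and dS would be nearest-neighbor sums.
dH, dS, c = -150_000.0, -420.0, 1e-7
for f in (0.9, 0.5, 0.1):
    T = melt_temperature(f, dH, dS, c)
    print(f"fraction annealed {f:.1f}: T = {T - 273.15:.1f} C")
```

Sweeping `f` from near 1 down to near 0 and plotting `f` against the returned temperature traces out a single duplex melting curve.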
Subtracting the melting curves in the lower two rows of Table 3 from the melting curve in
the upper row gives the theoretical difference curves shown in Table 4. The difference curve
between mixtures with mutant homozygous and heterozygous samples is the difference of
these differences. Note that the sum of the coefficients of melting curves in each row of Table
4 is zero, since we have subtracted rows with coefficients adding to one.
Table 4. Difference curves
Genotypes | Difference curve
W-M | $(1-x^2)F_{ww'}(T) - x(1-x)F_{wm'}(T) - x(1-x)F_{mw'}(T) - (1-x)^2F_{mm'}(T)$
W-H | $\left(1-\tfrac14(1+x)^2\right)F_{ww'}(T) - \tfrac14(1-x^2)F_{wm'}(T) - \tfrac14(1-x^2)F_{mw'}(T) - \tfrac14(1-x)^2F_{mm'}(T)$
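The subtraction that produces Table 4, and its zero row sums, can be checked with exact arithmetic. This sketch rebuilds each row by subtracting Table 2 rows and compares the result with the closed forms written in Table 4. The helper names are hypothetical.

```python
from fractions import Fraction

def table2_row(genotype, x):
    """Duplex fractions (ww', wm', mw', mm') from Table 2 at reference fraction x."""
    sample_wild = {"W": 1, "M": 0, "H": Fraction(1, 2)}[genotype]
    w = x + (1 - x) * sample_wild
    mu = 1 - w
    return (w * w, w * mu, mu * w, mu * mu)

def difference_coefficients(g1, g2, x):
    """Coefficients of F_ww', F_wm', F_mw', F_mm' in the curve g1 - g2."""
    return tuple(a - b for a, b in zip(table2_row(g1, x), table2_row(g2, x)))

def table4_closed_form(x):
    """The W-M and W-H rows of Table 4 as written."""
    q = Fraction(1, 4)
    w_minus_m = (1 - x**2, -x * (1 - x), -x * (1 - x), -(1 - x)**2)
    w_minus_h = (1 - q * (1 + x)**2, -q * (1 - x**2), -q * (1 - x**2), -q * (1 - x)**2)
    return w_minus_m, w_minus_h

for x in (Fraction(0), Fraction(1, 7), Fraction(2, 3)):
    wm, wh = table4_closed_form(x)
    assert difference_coefficients("W", "M", x) == wm
    assert difference_coefficients("W", "H", x) == wh
    print(x, "row sums:", sum(wm), sum(wh))
```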

In Table 5, we factor the heteroduplex content out of the expression in each row of Table 4. The coefficients of the heteroduplex curves $F_{wm'}$ and $F_{mw'}$ in Table 4 are the same, each equal to minus one-half of the total heteroduplex content. The coefficients of $F_{wm'}$ and $F_{mw'}$ in Table 5 are therefore both $-\tfrac12$. The coefficients in each row of Table 4 sum to zero, and these two heteroduplex coefficients sum to $-1$. The homoduplex coefficients $m_{ww'}(x)$ and $m_{mm'}(x)$ in Table 5 must therefore sum to $+1$, and the same holds for $h_{ww'}(x)$ and $h_{mm'}(x)$. Note that the coefficient of $F_{ww'}$ in Table 4 is positive while that of $F_{mm'}$ is negative. Explicitly, $m_{ww'}(x)=\frac{1+x}{2x}$, $m_{mm'}(x)=-\frac{1-x}{2x}$, $h_{ww'}(x)=\frac{3+x}{2(1+x)}$ and $h_{mm'}(x)=-\frac{1-x}{2(1+x)}$.
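Table 5. Factored form of difference curves
Genotypes | Difference curve (factored form)
W-M | $2x(1-x)\left[\left(m_{ww'}(x)F_{ww'}(T)+m_{mm'}(x)F_{mm'}(T)\right)-\tfrac12\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]$
W-H | $\tfrac12(1-x^2)\left[\left(h_{ww'}(x)F_{ww'}(T)+h_{mm'}(x)F_{mm'}(T)\right)-\tfrac12\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]$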

That is, $m_{ww'}(x)+m_{mm'}(x)=1$ and $h_{ww'}(x)+h_{mm'}(x)=1$. These are the conditions for the coefficients of a weighted average (a convex combination, or linear interpolation, when both weights are nonnegative). Table 5 says that each difference curve is the total heteroduplex content times the difference between a weighted average of the homoduplex curves and the mean heteroduplex curve.
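Dividing the homoduplex coefficients of Table 4 by the heteroduplex content gives the weights of Table 5 explicitly. This sketch is valid for $0 < x < 1$, where the heteroduplex content is nonzero; its names are illustrative. It confirms that each pair of weights sums to one, and printing the weights shows their individual values and signs.

```python
from fractions import Fraction

def factored_weights(x):
    """Homoduplex weights of Table 5: Table 4 coefficients of F_ww' and F_mm'
    divided by the heteroduplex content of each row (requires 0 < x < 1)."""
    m_content = 2 * x * (1 - x)               # heteroduplex content, W-M row
    h_content = Fraction(1, 2) * (1 - x**2)   # heteroduplex content, W-H row
    m_ww = (1 - x**2) / m_content
    m_mm = -(1 - x)**2 / m_content
    h_ww = (1 - Fraction(1, 4) * (1 + x)**2) / h_content
    h_mm = -Fraction(1, 4) * (1 - x)**2 / h_content
    return m_ww, m_mm, h_ww, h_mm

for x in (Fraction(1, 7), Fraction(1, 3), Fraction(2, 3)):
    m_ww, m_mm, h_ww, h_mm = factored_weights(x)
    print(f"x={x}: m_ww'={m_ww}, m_mm'={m_mm}, h_ww'={h_ww}, h_mm'={h_mm}")
```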
We are most interested in the situation where the homozygous mutant melting curve is difficult to distinguish from the wild-type melting curve, so we focus first on the case where the two are taken to be identical. Nearest-neighbor thermodynamics may predict that the melting curves of the two homoduplexes are the same. That does not imply that the melting curves of the two heteroduplexes are the same; in fact, that would be an unlikely coincidence. Also, although the relative concentrations of the two heteroduplexes in these mixtures are always equal, the relative concentrations of the homoduplexes are not.

What is special about this situation is that all weighting factors give the same result. When $F_{ww'}(T)=F_{mm'}(T)$, $aF_{ww'}(T)+bF_{mm'}(T)$ is the same for any $a$ and $b$ with $a+b=1$. This allows us to replace each of the weighting factors in Table 5 ($m_{ww'}$, $m_{mm'}$, $h_{ww'}$ and $h_{mm'}$) by $\tfrac12$. Doing so completely separates the heteroduplex dependence from the temperature dependence of the difference curves.

Table 6 shows that, for identical homozygotes, the melting curve difference is the heteroduplex content difference times a universal curve. This universal curve equals the mean homoduplex curve minus the mean heteroduplex curve. Fig. S1 also shows the universal difference curve and the difference curve between unmixed homozygous and heterozygous samples. The latter is one-half of the former, in accordance with the theory.
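Table 6. Difference curves when the homozygote curves are equal, $F_{ww'}(T)=F_{mm'}(T)$
Genotypes | Difference curve when $F_{ww'}(T)=F_{mm'}(T)$
W-M | $2x(1-x)\left[\tfrac12\left(F_{ww'}(T)+F_{mm'}(T)\right)-\tfrac12\left(F_{wm'}(T)+F_{mw'}(T)\right)\right] = m(x)F(T)$
W-H | $\tfrac12(1-x^2)\left[\tfrac12\left(F_{ww'}(T)+F_{mm'}(T)\right)-\tfrac12\left(F_{wm'}(T)+F_{mw'}(T)\right)\right] = h(x)F(T)$
M-H | $\left(\tfrac32x^2-2x+\tfrac12\right)\left[\tfrac12\left(F_{ww'}(T)+F_{mm'}(T)\right)-\tfrac12\left(F_{wm'}(T)+F_{mw'}(T)\right)\right] = \left(h(x)-m(x)\right)F(T)$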
Here $m(x)=2x(1-x)$ is the heteroduplex content of the mixture with a mutant homozygous sample, and $h(x)=\tfrac12(1-x^2)$ is the heteroduplex content of the mixture with a heterozygous sample. The universal curve $F(T)=\tfrac12\left(F_{ww'}(T)+F_{mm'}(T)\right)-\tfrac12\left(F_{wm'}(T)+F_{mw'}(T)\right)$ is the difference between the mean homoduplex curve and the mean heteroduplex curve. The mixture with wild type contains no heteroduplex, so $m(x)$ and $h(x)$ are also the differences in heteroduplex content between their respective mixtures and wild type. We have also obtained the difference curve between homozygous mutant and heterozygous samples by subtracting the differences of each from wild type: M-H = (W-H)-(W-M).
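Here is a numerical check of the factorization in Table 6, assuming identical homoduplex curves. The logistic curves are hypothetical stand-ins for $F_{ww'}=F_{mm'}$, $F_{wm'}$ and $F_{mw'}$. The identities follow algebraically from Tables 3 and 6, so any choice with $F_{ww'}=F_{mm'}$ should satisfy them.

```python
import math

def m(x):
    """Heteroduplex content of the mixture with a homozygous mutant sample."""
    return 2 * x * (1 - x)

def h(x):
    """Heteroduplex content of the mixture with a heterozygous sample."""
    return 0.5 * (1 - x**2)

def logistic(tm):
    """Hypothetical duplex melting curve centered at tm (placeholder data)."""
    return lambda t: 1.0 / (1.0 + math.exp(t - tm))

F_ww = F_mm = logistic(80.0)                  # identical homoduplex curves
F_wm, F_mw = logistic(78.5), logistic(78.0)   # hypothetical heteroduplex curves

def F(t):
    """Universal curve: mean homoduplex curve minus mean heteroduplex curve."""
    return 0.5 * (F_ww(t) + F_mm(t)) - 0.5 * (F_wm(t) + F_mw(t))

def mixture(x, sample_wild):
    """Table 3 curve at reference fraction x; sample_wild is the sample's
    wild-type strand fraction (1 for W, 0 for M, 0.5 for H)."""
    w = x + (1 - x) * sample_wild
    u = 1 - w
    return lambda t: w * w * F_ww(t) + w * u * F_wm(t) + u * w * F_mw(t) + u * u * F_mm(t)

x, t = 1 / 7, 79.0
W, M, H = mixture(x, 1.0), mixture(x, 0.0), mixture(x, 0.5)
print("W-M:", W(t) - M(t), "vs m(x)F(t):", m(x) * F(t))
print("W-H:", W(t) - H(t), "vs h(x)F(t):", h(x) * F(t))
print("M-H:", M(t) - H(t), "vs (h-m)F(t):", (h(x) - m(x)) * F(t))
```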
When the homozygous mutant and wild-type melting curves are identical (Table 6), we can explicitly determine the reference DNA fraction $x$ that maximizes the separation between the melting curves of mixtures of that fraction with different genotypes. The separation between any such pair of curves is proportional to the magnitude of the heteroduplex content difference between the mixtures. We have computed this difference for all three pairs as $m(x)$, $h(x)$ and $|h(x)-m(x)|$, which are plotted in Fig. S2. Any quantitative measure of separation, such as the area between the curves or their maximum separation, will be proportional to the appropriate function. We will measure the ability to distinguish all genotypes by the maximum separation between the closest pair. The separation between a pair of mixtures is their heteroduplex content difference times the maximum value of $F(T)$, which we will call $F$.

For example, consider the reference DNA fraction $x=0$, where samples are unmixed. Wild-type and homozygous mutant samples then have heteroduplex content 0, and heterozygous samples have heteroduplex content $\tfrac12$. The maximum difference between either homozygous curve and the heterozygous curve is $\tfrac12F$, but the maximum difference between the two homozygous curves is zero.

To maximize our ability to distinguish all three genotypes, we seek the reference DNA fraction $x$ that maximizes the smallest of the three absolute heteroduplex content difference functions. To find this value, we first arrange the three functions in increasing order on each interval of $x$ over which their order is fixed.
Table 7. Order of absolute heteroduplex content differences on intervals of $x$
Interval | Smallest, $s(x)$ | Middle | Largest
$0<x<\tfrac17$ | $m(x)$ | $h(x)-m(x)$ | $h(x)$
$\tfrac17<x<\tfrac13$ | $h(x)-m(x)$ | $m(x)$ | $h(x)$
$\tfrac13<x<1$ | $m(x)-h(x)$ | $h(x)$ | $m(x)$
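The ordering in Table 7 can be spot-checked at one sample point inside each interval, along with the crossings at $x=1/7$ and $x=1/3$. In this sketch, whose names are illustrative, the label `|h-m|` stands for the absolute difference. That difference is $h(x)-m(x)$ below $1/3$ and $m(x)-h(x)$ above it.

```python
from fractions import Fraction

def m(x):
    return 2 * x * (1 - x)               # mixture with homozygous mutant sample

def h(x):
    return Fraction(1, 2) * (1 - x * x)  # mixture with heterozygous sample

def ordered_differences(x):
    """Labels of the three absolute heteroduplex-content differences,
    sorted from smallest to largest at reference fraction x."""
    diffs = {"m": m(x), "|h-m|": abs(h(x) - m(x)), "h": h(x)}
    return sorted(diffs, key=diffs.get)

# One sample point inside each interval of Table 7.
for x in (Fraction(1, 14), Fraction(1, 5), Fraction(1, 2)):
    print(x, ordered_differences(x))

# Boundaries: m = |h-m| at x = 1/7, and m = h (so |h-m| = 0) at x = 1/3.
print(m(Fraction(1, 7)), abs(h(Fraction(1, 7)) - m(Fraction(1, 7))))
print(m(Fraction(1, 3)), h(Fraction(1, 3)))
```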

The second column gives the function $s(x)$ we wish to maximize over the full interval of reference DNA fractions $x$. In Fig. S2, this is the union of the lowest curves on each interval:
- $m(x)$ from $x=0$ to $x=\tfrac17$,
- $h(x)-m(x)$ from $x=\tfrac17$ to $x=\tfrac13$,
- $m(x)-h(x)$ from $x=\tfrac13$ to $x=1$.

Fermat's theorem states that if a function $s(x)$ has a local extremum at an interior point $x=a$, then $s'(a)=0$ or $s'(a)$ does not exist. This confirms our visual intuition that the maximum of $s(x)$ can occur only where the slope of its tangent is zero, where it has no well-defined tangent, or at an endpoint of the interval. The only place where $s'(x)=0$ is halfway between the roots $\tfrac13$ and $1$ of $m(x)-h(x)$ (which is quadratic on this interval), i.e., at $x=\tfrac23$.
This corresponds to adding twice as much wild-type DNA as unknown DNA. It gives a separation of $\tfrac16F$ between the heterozygous and homozygous SNP curves; the separations between the wild-type melting curve and the other two melting curves are larger. The only places where $s(x)$ is not differentiable are where it changes form, i.e., at $x=\tfrac17$ and $x=\tfrac13$. We compare the values of $s$ at these points, at $x=\tfrac23$, and at the endpoints: $s(\tfrac17)=\tfrac{12}{49}$, $s(\tfrac13)=0$, $s(\tfrac23)=\tfrac16$ and $s(0)=s(1)=0$. The optimal mixture fraction is therefore $x=\tfrac17$, as indicated in Fig. S2.

For this reference DNA fraction (at the temperature of maximum separation), the melting curves of mixtures with heterozygous and wild-type samples are $\tfrac{24}{49}F$ apart. This is barely reduced from the separation $\tfrac12F=\tfrac{24}{48}F$ when the same samples were not mixed. What we have gained concerns the melting curve of a mixture with a homozygous mutant sample. When the samples were not mixed, this curve overlapped the wild-type curve. At this reference DNA fraction, it lies exactly halfway between the melting curves of mixtures with wild-type and heterozygous samples, $\tfrac{12}{49}F$ away from both at the temperature of maximum separation.
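The candidate-point argument above translates directly into a short computation. The sketch evaluates $s(x)$ at the endpoints, at the kinks $x=1/7$ and $x=1/3$, and at the stationary point $x=2/3$, then takes the largest. A coarse exact grid search is included as an independent sanity check; the grid spacing of 1/2000 is an arbitrary choice.

```python
from fractions import Fraction

def m(x):
    return 2 * x * (1 - x)               # mixture with homozygous mutant sample

def h(x):
    return Fraction(1, 2) * (1 - x * x)  # mixture with heterozygous sample

def s(x):
    """Smallest of the three absolute heteroduplex-content differences."""
    return min(m(x), h(x), abs(h(x) - m(x)))

# Candidates from Fermat's theorem: endpoints, the kinks of s at 1/7 and 1/3,
# and the stationary point 2/3 of the quadratic piece on (1/3, 1).
candidates = [Fraction(0), Fraction(1, 7), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
for x in candidates:
    print(f"s({x}) = {s(x)}")

best = max(candidates, key=s)
print("optimal x =", best)
print("W-M:", m(best), " W-H:", h(best), " M-H:", h(best) - m(best))

# Independent sanity check on an exact grid with spacing 1/2000.
grid_best = max((Fraction(i, 2000) for i in range(2001)), key=s)
print("grid maximum near x =", float(grid_best))
```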
In retrospect, we can give a simple heuristic explanation for this value once we recognize that it corresponds to adding one part wild-type DNA to six parts unknown sample. As we saw above, the melting curves are optimally separated when the homozygous mutant curve is equally separated from the wild-type and heterozygous melting curves. The heteroduplex content of the mixture with a homozygous sample must therefore be exactly half that of the mixture with a heterozygous sample. The ratio of 1 part wild type to 6 parts unknown is optimal for the following reason:
- Divide 6 into equal parts (3+3, representing the heterozygous sample strands).
- Add 1 to one of the parts (4 = 3 wild-type sample strands plus one reference strand).
- Multiply, (3)(4) = 12, to represent the heteroduplexes formed.

This gives exactly twice the product of the original number (6, representing the homozygous SNP strands) and one (the reference strand). So at the simplest level, it is because $(\tfrac12\cdot 6)(\tfrac12\cdot 6+1)=2\cdot(6)(1)$ that the optimal reference DNA fraction is $x=\tfrac17$. This is visualized in the animation at
http://www.math.utah.edu/~palais/optimal_mixing.html
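The heuristic can also be written as a short algebraic derivation. It takes as given the condition stated above, that the homozygous mixture's heteroduplex content is half that of the heterozygous mixture, with $m(x)$ and $h(x)$ as defined for Table 6:

```latex
% Optimal separation: the M mixture lies halfway between W (content 0) and H.
m(x) = \tfrac12 h(x)
\;\Longleftrightarrow\;
2x(1-x) = \tfrac14 (1-x)(1+x)
\;\Longleftrightarrow\;
8x = 1 + x \quad (x \neq 1)
\;\Longleftrightarrow\;
x = \tfrac17 .
% In strand counts (1 reference part, 6 sample parts, 7^2 = 49 pairings):
% m = 2(6)(1)/49 = 12/49,   h = 2(3)(4)/49 = 24/49.
```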
References for the Supplementary Material
[S1] J. SantaLucia Jr., A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proc. Natl. Acad. Sci. USA 95 (1998) 1460-1465.
[S2] N. Peyret, P.A. Seneviratne, H.T. Allawi, J. SantaLucia Jr., Nearest neighbor thermodynamics of DNA with A·A, C·C, G·G, and T·T mismatches, Biochemistry 38 (1999) 3468-3477.