Supplementary Material

Here we provide details of our model for quantitative heteroduplex analysis of SNPs. First we determine an expression for the difference curve obtained by subtracting the melting curves of samples of different genotypes mixed with homozygous reference DNA at fraction $x$. This curve is given by the difference in heteroduplex fractions of the mixtures times a weighted average of the homoduplex curves minus the mean of the heteroduplex curves. When the melting curves of wild type and mutant homozygotes are indistinguishable, the weighted average of the homoduplex curves is just the common homozygote curve. In this case the reference DNA fraction that maximizes the separation of the difference curves of all three genotypes of bi-allelic diploid DNA is $x=\frac{1}{7}$.

Samples of different genotypes are designated W for wild type, M for homozygous mutant, and H for heterozygous mutant. The forward and reverse strands of the wild type amplicon are designated $w$ and $w'$, respectively, with $m$ and $m'$ for the homozygous mutant strands. We assume that the genotype of the reference DNA is wild type and that amplification preserves the relative proportion of all strand species. Let $x$ be the homozygous reference DNA fraction at the beginning of PCR. If all strands are fully extended during PCR and no strand reassociation occurs, Table 1 gives the homoduplex fractions at the end of PCR.

Table 1. Homoduplex fractions without strand reassociation

| Genotype | $[ww']$ | $[mm']$ |
| --- | --- | --- |
| W | $1$ | $0$ |
| M | $x$ | $1-x$ |
| H | $x+\frac{1}{2}(1-x)=\frac{1}{2}(1+x)$ | $\frac{1}{2}(1-x)$ |

If a final denaturation and reannealing step occurs prior to melting, we assume that duplexes are formed randomly, independent of whether they are perfectly complementary ($[ww']$, $[mm']$) or only approximately complementary ($[wm']$, $[mw']$). The resulting fraction of each of these duplexes is the product of the fractions of its two strands (each given by the corresponding duplex fraction in Table 1).
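The product rule just described can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names `strand_fractions` and `reannealed_duplex_fractions` are hypothetical, and exact rational arithmetic is used only so that the row sums can be compared with 1 without rounding.

```python
from fractions import Fraction


def strand_fractions(genotype, x):
    """Return (wild-type, mutant) strand fractions after PCR, as in Table 1."""
    x = Fraction(x)
    if genotype == "W":
        return Fraction(1), Fraction(0)
    if genotype == "M":
        return x, 1 - x
    if genotype == "H":
        return (1 + x) / 2, (1 - x) / 2
    raise ValueError(f"unknown genotype: {genotype!r}")


def reannealed_duplex_fractions(genotype, x):
    """Random reassociation: each duplex fraction is a product of strand fractions."""
    w, m = strand_fractions(genotype, x)
    return {"ww'": w * w, "wm'": w * m, "mw'": m * w, "mm'": m * m}


if __name__ == "__main__":
    x = Fraction(1, 7)  # example reference DNA fraction
    for genotype in "WMH":
        fractions = reannealed_duplex_fractions(genotype, x)
        row = ", ".join(f"[{k}] = {v}" for k, v in fractions.items())
        print(f"{genotype}: {row}; row sum = {sum(fractions.values())}")
```

Because the two strand fractions of each genotype sum to 1, the four products necessarily sum to 1 as well, whatever value of $x$ is chosen.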
Table 2 lists the duplex fractions after reassociation, obtained by multiplying the expressions in Table 1 in binomial ("FOIL") form, much like a Punnett square. Since each row of Table 1 sums to 1, so does each row of Table 2: if $a+b=1$, then $(a+b)(a+b)=aa+ab+ba+bb=1$.

Table 2. Duplex fractions after denaturation and reannealing of PCR product

| Genotype | $[ww']$ | $[wm']$ | $[mw']$ | $[mm']$ |
| --- | --- | --- | --- | --- |
| W | $1$ | $0$ | $0$ | $0$ |
| M | $x^2$ | $x(1-x)$ | $x(1-x)$ | $(1-x)^2$ |
| H | $\frac{1}{4}(1+x)^2$ | $\frac{1}{4}(1-x^2)$ | $\frac{1}{4}(1-x^2)$ | $\frac{1}{4}(1-x)^2$ |

Our next assumption is that the melting curve of a mixture of duplexes is the weighted sum of the individual duplex melting curves, in proportion to their relative concentrations. Identifying the duplex fluorescence vs. temperature melting curves by their two strands, $F_{ww'}$, $F_{wm'}$, $F_{mw'}$, $F_{mm'}$, and weighting them by the duplex concentrations from Table 2 gives Table 3, the theoretical melting curves of the various genotypes mixed with reference DNA at fraction $x$. Note that when $x=0$, these expressions reduce to the wild type and mutant homozygote melting curves and to the heterozygote curve, which is the equally weighted sum of one-quarter of each duplex (two homoduplexes and two heteroduplexes), so its overall heteroduplex content is one-half. We may interchange the words homozygote and homoduplex when referring to their melting curves, but not heterozygote and heteroduplex.

Table 3. Melting curves of different genotypes mixed with homozygous reference DNA at fraction $x$

| Genotype | Melting curve |
| --- | --- |
| W | $F_{ww'}(T)$ |
| M | $x^2F_{ww'}(T)+x(1-x)F_{wm'}(T)+x(1-x)F_{mw'}(T)+(1-x)^2F_{mm'}(T)$ |
| H | $\frac{1}{4}(1+x)^2F_{ww'}(T)+\frac{1}{4}(1-x^2)F_{wm'}(T)+\frac{1}{4}(1-x^2)F_{mw'}(T)+\frac{1}{4}(1-x)^2F_{mm'}(T)$ |

Theoretically computed examples of the four duplex curves and the three genotype curves for the HFE amplicons of the manuscript, corresponding to the experimental examples in Fig. 2 of the manuscript, are shown in Fig. S1. The two homozygous genotypes and two homoduplexes all have identical melting curves, so only four distinct melting curves are visible.
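The weighted-sum assumption behind Table 3 can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the sigmoid curve shape, the melting temperatures, and the names `sigmoid_curve`, `DUPLEX_CURVES`, and `genotype_curve` are hypothetical stand-ins and are not the nearest-neighbor curves of Fig. S1.

```python
import math


def sigmoid_curve(tm, width=1.0):
    """Hypothetical two-state curve: fraction of duplex still intact at temperature T."""
    return lambda T: 1.0 / (1.0 + math.exp((T - tm) / width))


# Made-up duplex curves; the homoduplexes are taken to be identical and the
# heteroduplexes to melt at lower temperatures (values in deg C are illustrative).
DUPLEX_CURVES = {
    "ww'": sigmoid_curve(84.0),
    "mm'": sigmoid_curve(84.0),
    "wm'": sigmoid_curve(81.0),
    "mw'": sigmoid_curve(80.5),
}


def duplex_fractions(genotype, x):
    """Duplex fractions after reannealing (products of strand fractions, Table 2)."""
    w = {"W": 1.0, "M": x, "H": 0.5 * (1.0 + x)}[genotype]
    m = 1.0 - w
    return {"ww'": w * w, "wm'": w * m, "mw'": m * w, "mm'": m * m}


def genotype_curve(genotype, x, T, curves=DUPLEX_CURVES):
    """Table 3: concentration-weighted sum of the individual duplex curves."""
    fractions = duplex_fractions(genotype, x)
    return sum(fractions[d] * curves[d](T) for d in fractions)


if __name__ == "__main__":
    for T in (78.0, 81.0, 84.0, 87.0):
        values = [genotype_curve(g, 0.0, T) for g in "WMH"]
        print(T, [round(v, 4) for v in values])
```

With $x=0$ and identical homoduplex curves, the W and M values printed at each temperature coincide, while the H value is the equally weighted average of all four duplex curves.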
The curves in Fig. S1 are obtained by inverting the van 't Hoff equation for temperature in terms of product and reactant concentrations,

$$T=\frac{\Delta H}{\Delta S-R\ln\dfrac{[\mathrm{dsDNA}]}{[\mathrm{ssDNA}_1][\mathrm{ssDNA}_2]}},$$

with $\Delta S$ and $\Delta H$ (the entropy and enthalpy of duplex formation) obtained from nearest-neighbor approximations for the duplexes and thermodynamic parameters described in [S1, S2] and implemented by MeltingWizard, available at http://dnawizards.path.utah.edu.

Subtracting the melting curves in the lower two rows of Table 3 from the melting curve in the upper row gives the theoretical difference curves shown in Table 4. The difference curve between mixtures with mutant homozygous and heterozygous samples is the difference of these differences. Note that the sum of the coefficients of melting curves in each row of Table 4 is zero, since we have subtracted rows with coefficients adding to one.

Table 4. Difference curves

| Genotypes | Difference curve |
| --- | --- |
| W-M | $(1-x^2)F_{ww'}(T)-x(1-x)F_{wm'}(T)-x(1-x)F_{mw'}(T)-(1-x)^2F_{mm'}(T)$ |
| W-H | $\left(1-\frac{1}{4}(1+x)^2\right)F_{ww'}(T)-\frac{1}{4}(1-x^2)F_{wm'}(T)-\frac{1}{4}(1-x^2)F_{mw'}(T)-\frac{1}{4}(1-x)^2F_{mm'}(T)$ |

In Table 5, we factor out the heteroduplex content from the expressions in each row of Table 4. Since the coefficients of the heteroduplex curves $F_{wm'}$ and $F_{mw'}$ in Table 4 are the same (both equal to minus one-half of the total heteroduplex content), the coefficients of $F_{wm'}$ and $F_{mw'}$ in Table 5 are both $-\frac{1}{2}$. Since the coefficients in each row of Table 4 sum to zero, the homoduplex coefficients $m_{ww'}$ and $m_{mm'}$ in Table 5 must sum to $+1$, and the same holds for $h_{ww'}$ and $h_{mm'}$. (For $0<x<1$ the coefficient of $F_{ww'}$ in Table 4 is positive and that of $F_{mm'}$ is negative, so the weights on $F_{ww'}$ are positive and the weights on $F_{mm'}$ are negative.)

Table 5. Factored form of difference curves

| Genotypes | Difference curve (factored form) |
| --- | --- |
| W-M | $2x(1-x)\left[\left(m_{ww'}(x)F_{ww'}(T)+m_{mm'}(x)F_{mm'}(T)\right)-\frac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]$ |
| W-H | $\frac{1}{2}(1-x^2)\left[\left(h_{ww'}(x)F_{ww'}(T)+h_{mm'}(x)F_{mm'}(T)\right)-\frac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]$ |

That is, $m_{ww'}(x)+m_{mm'}(x)=1$ and $h_{ww'}(x)+h_{mm'}(x)=1$. These are the conditions on the coefficients of a weighted average (an affine combination; when both weights are also nonnegative it is called a convex combination or linear interpolation).
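The weights in Table 5 are not written out explicitly in the text. As a worked check, offered here as a sketch rather than as part of the original derivation, they can be recovered by dividing the homoduplex coefficients of Table 4 by the heteroduplex contents $2x(1-x)$ and $\frac{1}{2}(1-x^2)$, for $0<x<1$:

```latex
\begin{align*}
m_{ww'}(x) &= \frac{1-x^2}{2x(1-x)} = \frac{1+x}{2x}, &
m_{mm'}(x) &= \frac{-(1-x)^2}{2x(1-x)} = -\frac{1-x}{2x}, \\[4pt]
h_{ww'}(x) &= \frac{1-\tfrac{1}{4}(1+x)^2}{\tfrac{1}{2}(1-x^2)} = \frac{3+x}{2(1+x)}, &
h_{mm'}(x) &= \frac{-\tfrac{1}{4}(1-x)^2}{\tfrac{1}{2}(1-x^2)} = -\frac{1-x}{2(1+x)}, \\[4pt]
m_{ww'}(x)+m_{mm'}(x) &= \frac{(1+x)-(1-x)}{2x} = 1, &
h_{ww'}(x)+h_{mm'}(x) &= \frac{(3+x)-(1-x)}{2(1+x)} = 1 .
\end{align*}
```

With the signs as they appear in Table 4, the weights on $F_{mm'}$ come out negative, so each bracket is a weighted average only in the sense that its weights sum to one. This distinction disappears when $F_{ww'}(T)=F_{mm'}(T)$, because then only the sum of the weights matters.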
Table 5 says that difference curves are described by the total heteroduplex content times a weighted average of the homoduplex curves minus the mean heteroduplex curve. Since we are most interested in the situation when it is difficult to distinguish the homozygous mutant melting curve from the wild type melting curve, we focus first on the case when they are taken to be identical. When nearest-neighbor thermodynamics predicts that the melting curves of two homoduplexes are the same, it does not imply that the melting curves of the two heteroduplexes are the same; in fact, that would be an unlikely coincidence. And though the relative concentrations of the two heteroduplexes in these mixtures are always equal, the relative concentrations of the homoduplexes are not. What is unique to this situation is that all choices of weighting factors give the same result: when $F_{ww'}(T)=F_{mm'}(T)$, then $aF_{ww'}(T)+bF_{mm'}(T)$ is the same for any $a$ and $b$ with $a+b=1$. This allows us to replace each of the weighting factors in Table 5 ($m_{ww'}$, $m_{mm'}$, $h_{ww'}$, and $h_{mm'}$) by $\frac{1}{2}$, and completely separate the heteroduplex dependence from the temperature dependence of the difference curves.

Table 6 shows that in the case of identical homozygotes, the melting curve difference is given by the heteroduplex content difference times a universal curve equal to the mean homoduplex curve minus the mean heteroduplex curve. This universal difference curve and the difference curve between unmixed homozygous and heterozygous samples are also shown in Fig. S1. The latter is one-half of the former, in accordance with the theory.

Table 6.
Difference curves when homozygote curves are equal, $F_{ww'}(T)=F_{mm'}(T)$

| Genotypes | Difference curve when $F_{ww'}(T)=F_{mm'}(T)$ |
| --- | --- |
| W-M | $2x(1-x)\left[\frac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right)-\frac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]=m(x)F(T)$ |
| W-H | $\frac{1}{2}(1-x^2)\left[\frac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right)-\frac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]=h(x)F(T)$ |
| M-H | $\left(\frac{3}{2}x^2-2x+\frac{1}{2}\right)\left[\frac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right)-\frac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)\right]=\left(h(x)-m(x)\right)F(T)$ |

Here $m(x)=2x(1-x)$ is the heteroduplex content of the mixture with a mutant homozygous sample, $h(x)=\frac{1}{2}(1-x^2)$ is the heteroduplex content of the mixture with a heterozygous sample, and $F(T)=\frac{1}{2}\left(F_{ww'}(T)+F_{mm'}(T)\right)-\frac{1}{2}\left(F_{wm'}(T)+F_{mw'}(T)\right)$ is the difference of the mean homoduplex curve and the mean heteroduplex curve. Since there is no heteroduplex content in the mixture with wild type, $m(x)$ and $h(x)$ are also the differences in heteroduplex content between their respective mixtures and wild type. We have also obtained the expression for the difference curve between homozygous mutant and heterozygous samples by subtracting the differences of each from wild type: M-H = (W-H) - (W-M).

In the case of identical homozygous mutant and wild type melting curves (Table 6) we can explicitly determine the reference DNA fraction $x$ which maximizes the separation between melting curves corresponding to mixtures of that fraction with different genotypes. This is because the separation between any such pair of curves is proportional to the magnitude of the heteroduplex content difference between the mixtures, which we have computed for all three pairs as $m(x)$, $h(x)$, and $|h(x)-m(x)|$, and which are plotted in Fig. S2. Any quantitative measure of separation, such as the area between curves or their maximum separation, will be proportional to the appropriate function. We will measure the ability to distinguish all genotypes by the maximum separation between the closest pair. The separation between a pair of mixtures is given by their heteroduplex content difference times the maximum value of $F(T)$, which we will call $F$.
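The three heteroduplex content differences can be tabulated exactly with a few lines of code. This is a minimal sketch under the definitions of Table 6, $m(x)=2x(1-x)$ and $h(x)=\frac{1}{2}(1-x^2)$; the function names and the dictionary keys are hypothetical.

```python
from fractions import Fraction


def m(x):
    """Heteroduplex content of the mixture with a homozygous mutant sample."""
    return 2 * x * (1 - x)


def h(x):
    """Heteroduplex content of the mixture with a heterozygous sample."""
    return (1 - x * x) / 2


def separations(x):
    """Magnitudes of the heteroduplex content differences for each genotype pair."""
    x = Fraction(x)
    return {"W-M": m(x), "W-H": h(x), "M-H": abs(h(x) - m(x))}


if __name__ == "__main__":
    for x in (Fraction(0), Fraction(1, 7), Fraction(1, 3), Fraction(2, 3)):
        row = ", ".join(f"{pair}: {value}" for pair, value in separations(x).items())
        print(f"x = {x}: {row}")
```

Multiplying any of these values by $F$ gives the corresponding maximum separation between the melting curves of the two mixtures.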
For example, when the reference DNA fraction $x=0$ and samples are unmixed, wild type and homozygous mutant samples have heteroduplex content equal to 0, and heterozygous samples have heteroduplex content equal to $\frac{1}{2}$. The maximum difference between either homozygous curve and the heterozygous curve is $\frac{1}{2}F$, but the maximum difference between the two homozygous curves is zero.

To maximize our ability to distinguish all three genotypes, we seek the value of the reference DNA fraction $x$ which maximizes the smallest of the three absolute heteroduplex content difference functions. To find this value, we first place the three absolute heteroduplex content difference functions in increasing order, which depends on $x$, on each interval where the order differs.

Table 7. Order of absolute heteroduplex content difference on intervals of $x$

| Interval | Smallest, $s(x)$ | Middle | Largest |
| --- | --- | --- | --- |
| $0<x<\frac{1}{7}$ | $m(x)$ | $h(x)-m(x)$ | $h(x)$ |
| $\frac{1}{7}<x<\frac{1}{3}$ | $h(x)-m(x)$ | $m(x)$ | $h(x)$ |
| $\frac{1}{3}<x<1$ | $m(x)-h(x)$ | $h(x)$ | $m(x)$ |

The second column gives the function $s(x)$ we wish to maximize across the full interval of reference DNA fractions $x$. In Fig. S2, this is the union of the lowest curves on each interval: $m(x)$ from $x=0$ to $x=\frac{1}{7}$, $h(x)-m(x)$ from $x=\frac{1}{7}$ to $x=\frac{1}{3}$, and $m(x)-h(x)$ from $x=\frac{1}{3}$ to $x=1$. A theorem of Fermat, which states that if a function $s(x)$ has a local extremum at $x=a$ then $s'(a)=0$ or $s'(a)$ does not exist, confirms our visual intuition that the maximum of $s(x)$ can only occur where the slope of its tangent is zero or where it does not have a well-defined tangent. The only place $s'(x)=0$ is halfway between the roots $\frac{1}{3}$ and $1$ of $m(x)-h(x)$ (as $s(x)$ is quadratic on this interval), i.e., at $x=\frac{2}{3}$. This corresponds to adding twice as much wild-type DNA as there was unknown DNA and gives a separation of $\frac{1}{6}F$ between the heterozygous and homozygous SNP curves. The separations between the wild-type melting curve and the other two melting curves are larger.
The only places $s(x)$ is not differentiable are where it changes form, i.e., at $x=\frac{1}{7}$ and $x=\frac{1}{3}$. Comparing the values at these points and at $x=\frac{2}{3}$, namely $s(\frac{1}{7})=\frac{12}{49}$, $s(\frac{1}{3})=0$, and $s(\frac{2}{3})=\frac{1}{6}$, we find the optimal mixture fraction occurs at $x=\frac{1}{7}$, as indicated in Fig. S2. For this reference DNA fraction (at the temperature of maximum separation) the melting curves of mixtures with heterozygous and wild type samples are $\frac{24}{49}F$ apart, barely reduced from the separation $\frac{1}{2}F=\frac{24}{48}F$ when the same samples were not mixed. What we have gained is that at this reference DNA fraction, instead of overlapping the wild type curve as it did when the samples were not mixed, the melting curve of a mixture with a homozygous mutant sample is exactly halfway between the melting curves of mixtures with wild type and heterozygous samples, $\frac{12}{49}F$ away from both at the temperature of maximum separation.

In retrospect, we can give a simple heuristic explanation for this value when we recognize that it corresponds to adding one part wild-type DNA to six parts unknown sample. As we saw above, the melting curves will be optimally separated when the homozygous mutant curve is equally separated from both the wild-type and heterozygous melting curves, so the heteroduplex content of the mixture with a homozygous sample must be exactly half that of the mixture with a heterozygous sample. The ratio of 1 part wild type to 6 parts unknown is optimal because when we divide 6 into equal parts (3+3, representing the heterozygous sample strands), add 1 to one of the parts (4 = 3 wild type sample strands plus 1 reference strand), and multiply $(3)(4)=12$ to represent the heteroduplexes formed, we obtain exactly twice the product of the original number (6, representing the homozygous SNP strands) and one (the reference strand). So at the simplest level, it is because $\left(\frac{1}{2}\cdot 6\right)\left(\frac{1}{2}\cdot 6+1\right)=2\,(6)(1)$ that the optimal reference DNA fraction is $\frac{1}{7}$.
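The comparison of candidate points can be reproduced exactly, and cross-checked by brute force, with the sketch below. It assumes the identical-homozygote case, with $m(x)=2x(1-x)$ and $h(x)=\frac{1}{2}(1-x^2)$; the function name `smallest_separation` and the grid size are arbitrary illustrative choices.

```python
from fractions import Fraction


def smallest_separation(x):
    """s(x): the smallest of the three heteroduplex content differences."""
    m = 2 * x * (1 - x)
    h = (1 - x * x) / 2
    return min(m, h, abs(h - m))


# Candidates named in the text: the two kinks of s(x) and its stationary point.
candidates = [Fraction(1, 7), Fraction(1, 3), Fraction(2, 3)]
best = max(candidates, key=smallest_separation)

# Independent brute-force check on a rational grid; 2100 is a multiple of 7
# and of 3, so the candidate points lie exactly on the grid.
N = 2100
grid_best = max((Fraction(k, N) for k in range(N + 1)), key=smallest_separation)

if __name__ == "__main__":
    for x in candidates:
        print(f"s({x}) = {smallest_separation(x)}")
    print("best candidate:", best)
    print("best grid point:", grid_best)
    # Heuristic from the text: half of 6 times (half of 6 plus 1) equals 2 * 6 * 1.
    print((6 // 2) * (6 // 2 + 1) == 2 * 6 * 1)
```

The grid search is only a sanity check on the calculus argument; it cannot by itself rule out a larger value between grid points.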
The optimal mixing ratio is visualized in the animation at http://www.math.utah.edu/~palais/optimal_mixing.html

References for the Supplementary Material

[S1] J. SantaLucia Jr., A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proc. Natl. Acad. Sci. USA 95 (1998) 1460-1465.

[S2] N. Peyret, P.A. Seneviratne, H.T. Allawi, J. SantaLucia Jr., Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A·A, C·C, G·G, and T·T mismatches, Biochemistry 38 (1999) 3468-3477.