Title: Digital Genotyping and Haplotyping with Polymerase Colonies

Rob Mitra et al.
Harvard Medical School, Lipper Center for Computational Genetics, 200 Longwood Ave., Boston, MA 02115.

116107171 2/13/2016 3:00:00 AM Page 1 of 30

Abstract: The polymerase colony (polony) technology amplifies multiple individual DNA molecules in a thin acrylamide gel on the surface of a glass microscope slide. In each resulting polony of double-stranded DNA, one strand is covalently attached to the gel. We genotype these polonies by performing single base extensions with dye-labeled nucleotides, and we demonstrate the accurate quantitation of two allelic variants using this technology. We also show that polony technology can be used to directly determine the phase, or haplotype, of two single nucleotide polymorphisms (SNPs). We correctly determined the genotype and phase of three different pairs of SNPs. In one case, the distance between the two SNPs is 45 kilobases, the largest distance achieved to date without separating the chromosomes by cloning or somatic cell fusion. The results indicate that polony genotyping and haplotyping may play an important role in understanding genetic variation.

Introduction

One goal of genomic science is to find genetic variation that predicts susceptibility to disease. Individuals who have been identified as being at risk could then change their diet, lifestyle, or environment to reduce their chances of developing disease. For patients who have already developed disease, genetic markers could guide the choice of therapy to increase the likelihood of a successful outcome.

Most researchers who study DNA variation are focusing on single nucleotide polymorphisms (SNPs), as these are the most common variations in the human population.
By studying candidate genes and performing genomewide linkage studies, scientists are trying to home in on the “causative SNP”, the SNP that alters gene function and increases the risk of disease. However, recent studies[Drysdale, 2000 #1; Hoehe, 2000 #24] suggest that, for some genes, there may not be one single SNP that is responsible for altering protein function or expression and thereby causing disease, but, instead, multiple SNPs that interact to alter function or expression[Davidson, 2000 #20]. Furthermore, the altered phenotype occurs only when these SNPs are present on the same chromosome, so one must determine the haplotype of these SNPs to find a correlation with the observed phenotypes. In these cases, then, we have traded the notion of a causative SNP for that of a causative haplotype.

What existing technologies allow one to determine the haplotype, or phase, of a pair of SNPs? Currently, the most common approach is to first genotype the SNPs to acquire unphased data from multiple related individuals and then to infer the haplotype computationally[Stephens, 2001 #4; Clark, 1998 #1; Excoffier, 1995 #2; Hawley, 1995 #3; Hoehe, 2000 #24; Niu, 2002 #6]. The development of this methodology has greatly increased the power of both linkage studies and candidate gene studies. However, the computational inference of haplotypes has been estimated to be only 75-95% accurate[Stephens, 2001 #4; Niu, 2002 #6; Tishkoff, 2000 #13], making this technique an unlikely candidate for use in a clinical setting, as well as presenting challenges when used as a research tool.

Daly et al.[Daly, 2001 #27] and others[Patil, 2001 #31; Gabriel, 2002 #28] have shown that SNPs tend to be inherited in larger haplotype blocks than previously thought and that there are relatively few variants of each block. This observation has sparked a public effort to characterize all common haplotypes in the human population[Couzin, 2002 #32].
Prior knowledge of common haplotype blocks may make it easier to infer the phase of SNPs that lie within the same haplotype block[Zhang, 2002 #18]. However, even with this knowledge, it will be difficult to accurately predict the haplotype of two SNPs that are in different haplotype blocks because these two SNPs will not typically be in linkage disequilibrium. These cases may be quite common since the genome contains about 100 kb per gene and the average haplotype block is only 22 kilobases in European and Asian populations and 11 kilobases in Yoruban and African-American populations[Gabriel, 2002 #28]. This point is well illustrated by two known mutations, R347->H and A970->D, in the CFTR gene[Clain, 2001 #17] that are separated by 65 kbp. When present in cis, they interact to produce more severe symptoms of cystic fibrosis than when present in trans, or when only one mutation is present.

If haplotypes are to be used in the clinic as a prognostic marker, a direct molecular haplotyping technology is necessary. Current methods for the direct determination of haplotypes have clear limitations. Allele-specific PCR[Michalatos-Beloin, 1996 #14] and single-molecule PCR[Ruano, 1990 #2] require significant optimization and cannot routinely determine the phase of SNPs separated by more than 10-15 kilobases. Atomic force microscopy[Woolley, 2000 #21] is an interesting alternative, but it is unclear how easily this technology can be scaled up, and it requires expensive equipment not commonly found in a molecular biology laboratory. Methods in which chromosomal fragments are cloned into BACs or in which somatic cell hybrids[Patil, 2001 #31; Douglas, 2001 #22] are made are not cost effective when one is interested in phasing a small number of SNPs for a large number of samples[Douglas, 2001 #22], as would be necessary for a clinical diagnostic.
Here, we present a method to determine haplotypes using polymerase colony (polony) technology, a technology in which a large number of individual DNA molecules are cloned, amplified, and analyzed on a glass microscope slide[Mitra, 1999 #3]. We determined the phase of three different pairs of SNPs up to 45 kilobases apart. In principle, distances of hundreds of megabases (whole chromosomes) are possible. This technology requires very little DNA as input: we show that a buccal swab provides enough DNA to perform hundreds of reactions. We also demonstrate that a large number of polony assays can be performed on a single microscope slide, reducing the cost per assay.

As a prerequisite to determining haplotypes, it was necessary to demonstrate that polony technology could be used to determine genotypes. Polony genotyping also has many applications, such as detecting loss of heterozygosity[Zhou, 2001 #7], quantifying allelic imbalance[Yan, 2002 #9], and detecting rare somatic mutations in a background of wild-type DNA[Lizardi, 1998 #34]. Therefore, in addition to demonstrating haplotyping, we also present data that demonstrate the utility of polony genotyping for these applications.

Materials and Methods

Polony Amplification

Template (50 to 100,000 molecules) was added to the polony amplification mixture [10 mM Tris-HCl pH 8.3, 50 mM KCl, 0.01% gelatin, 1.5 mM MgCl2, 200 µM dNTPs, 10 U JumpStart Taq (Sigma), 5.91% acrylamide, 0.09% bis-acrylamide, 0.5 µM forward primer (with acrydite group), 0.5 µM reverse primer, 0.1% Tween 20, 0.2% BSA]. Ammonium persulfate and TEMED were added to a final concentration of 0.083%. A 15 µm thick gel was poured on a glass microscope slide that was partially covered with a teflon coating (Erie Scientific). The teflon coating served as a spacer between the glass surface of the slide and a glass coverslip (20 mm x 30 mm no. 2 - Fisher Scientific).
The gel was allowed to polymerize under argon for 30 minutes. The coverslip was overlaid with mineral oil and the slide was cycled using the following program: denaturation (2 minutes at 94 °C), 40 cycles (30 s at 94 °C, 30 s at 56 °C, 1 min at 72 °C), and extension (2 min at 72 °C). After cycling, the mineral oil was removed by rinsing the slides in hexane. Polony amplification for the haplotyping reactions was performed as above except that four primers were used (two forward and two reverse primers) at a concentration of 0.25 µM each.

The polony protocol was modified to amplify polonies in gels that contained the cleavable crosslinker DATD (see Results). Instead of polymerizing the acrylamide gel with the template and PCR reagents present, we polymerized the gel first and later diffused in the DNA template molecules and PCR reagents. In this protocol, we made the gel mix [Tris-HCl pH 8.3, 50 mM KCl, 0.01% gelatin, 1.5 mM MgCl2, 7.6% acrylamide, 0.36% DATD, 0.036% bis-acrylamide, 0.5 µM acrydite-modified reverse primer, 0.1% Tween 20, 0.2% BSA], and then added ammonium persulfate and TEMED to a final concentration of 0.083%. We poured a 15 µm thick gel on a glass microscope slide that was partially covered with a teflon coating (Erie Scientific). The teflon coating served as a spacer between the glass surface of the slide and a glass coverslip (20 mm x 30 mm no. 2 - Fisher Scientific). The gel was allowed to polymerize under argon for 30 minutes. The slides were washed in deionized water, allowed to dry, and stored under vacuum until use. To perform polony amplification, we took PCR amplification mix [500 to 5 x 10^4 molecules/µl template, Tris-HCl pH 8.3, 50 mM KCl, 0.01% gelatin, 0.2% BSA, 0.1% Tween 20, 0.5 µM primer PR1 pcr2.1-R, 200 µM dNTPs, 0.335 units/µl JumpStart Taq], covered the polymerized gel with it for 2 minutes, and then removed excess fluid. The gel was covered with 35 µl of mineral oil and then with a coverslip.
The slides were cycled as follows: denaturation (2 minutes at 94 °C), 44 cycles (30 s at 94 °C, 45 s at 56 °C, 90 s at 72 °C). After amplification, the DATD crosslinker was cleaved by treating the slides with 100 mM NaIO4 for 15 minutes at room temperature. Next, we washed the slides in deionized water for 5 minutes, in inactivation buffer [50 mM ethanolamine, 100 mM Tris-HCl pH 9.0, 0.1% SDS] for 30 minutes at room temperature, and in deionized water for 5 minutes.

Denaturing Polony Gels

After polony amplification, the unattached DNA strand was removed by incubating in 70 °C denaturing buffer [70% formamide, 1x SSC] and electrophoresing in 0.5x TBE with 42% urea for 1 hour at 5-10 V/cm. The slides were then washed 2 x 4 minutes in wash buffer 1 [10 mM Tris-HCl pH 7.5, 50 mM KCl, 2 mM EDTA, 0.01% Triton X-100]. For polony haplotyping reactions, after the first SBE, the extended primers were removed by washing in 70 °C denaturing buffer and the slides were washed 2 x 5 minutes in dH2O.

Single Base Extension (SBE) Reactions

The SBE reactions used in the polony haplotyping and genotyping experiments in this study were carried out using fluorescent deoxynucleotides. To do so, the acrylamide gel was covered with a frame seal chamber (MJ Research) and annealing mix [0.25 µM SBE primer, 6x SSPE, 0.01% Triton X-100] was added over the gel. The slides were heated at 94 °C for 2 minutes, then at 56 °C for 15 minutes. We removed unannealed primer by washing the slides 2 x 4 minutes in wash buffer 1 and then equilibrated the slides in 1x Klenow buffer [10 mM Tris-HCl pH 7.5, 10 mM MgCl2]. Next, we covered the gel with 40 microliters of extension mix [1x Klenow buffer, Klenow exo- polymerase X units/µl, E. coli single-stranded binding protein, 1 µM Cy3- or Cy5-labeled deoxynucleotide] for two minutes and then washed the slides in wash buffer 1.
The slides were scanned on a scanning confocal microscope designed for microarrays (ScanArray 5000, GSI Lumonics).

SBE reactions with dideoxynucleotides were performed as follows. The acrylamide gel was covered with a frame seal chamber (MJ Research) and annealing mix [0.25 µM sequencing primer, 6x SSPE, 0.01% Triton X-100] was added over the gel. The slides were heated at 94 °C for 2 minutes, then at 56 °C for 15 minutes. Unannealed primer was removed by washing the slides 2 x 4 minutes in wash buffer 1. The slides were equilibrated in 1x AmpliTaq FS buffer [10 mM Tris-HCl pH 8.0, 50 mM KCl, 1.5 mM MgCl2]. Next, the gel was covered with 40 microliters of extension mix [1x AmpliTaq FS buffer, 2 µM FITC-12-ddUTP, 2 µM ROX-ddCTP, 2 µM Cy5-ddATP, 2 µM Cy3-ddGTP, AmpliTaq FS X units/µl, E. coli single-stranded binding protein]. The gel was covered with a frame seal chamber and heated to 55 °C for 4 minutes. The slides were washed in wash buffer 1 and scanned on a scanning confocal microscope designed for microarrays (ScanArray 5000, GSI Lumonics).

Image Analysis

Images of polony gels were acquired in TIF format. The images were filtered using a Wiener filter and a median filter to remove speckle and noise. The background was subtracted and polonies were computationally identified using the ImageQuantNT software package. This package quantified the fluorescent intensity of each polony and output the data as a text file. Overlapping polonies were identified using a MATLAB script, HAPCALL, which is available at http://arep.med.harvard.edu.
For the polony genotyping experiment in which the relative abundances of two alleles are measured, the images were smoothed, polonies were identified, and their genotypes were determined using the MATLAB script polony_call.m, also available at http://arep.med.harvard.edu.

Oligonucleotides and Patient DNA

All primers used to perform the polony amplification reactions were designed using Primer 3 software[Rozen, 1998 #41]. We found it was necessary to set the following parameters in order to obtain good results: PRIMER_OPT_SIZE=25, PRIMER_MIN_SIZE=19, PRIMER_MAX_SIZE=30, PRIMER_OPT_TM=70, PRIMER_MIN_TM=64, PRIMER_MAX_TM=73, PRIMER_MAX_DIFF_TM=5, PRIMER_MIN_GC=45, PRIMER_MAX_GC=80. For some experiments, the parameter PRIMER_PRODUCT_SIZE_RANGE=90-100 was used. All other parameters were set to default values.

The names and sequences of the oligonucleotides used to amplify polonies are as follows:
Locus containing SNP DK438: Primer DK438AP.1.FM 5’ QCATTGAGTCCTTACTGTGCACACAGCTC 3’; Primer DK438AP.1.R 5’ GGGGGAAATCCACTGAGCTAAATTGC 3’.
Locus containing SNP DK445-2: Primer DK445-2AP.1.F 5’ GGTCCCCACCTAGGCCTCTGTGTTA 3’; Primer DK445-2AP.1.RM 5’ QTGAGTCCCTCAAACCCCTTTCTTCTG 3’.
Locus containing SNP DK331: Primer DK331AP.1.FM 5’ QTGTTGGTATGGCAGAATGTAGCATGG 3’; Primer DK331AP.1.R 5’ GGCGGTGAGAAAAGGTTTTAATGG 3’.
Locus containing SNP C/T –13910: Primer IN13L126PS2F 5’ GGCCTCTGCGCTGGCAATACAG 3’; Primer In13l126ps2RM 5’ QCCTCGTGGAATGCAGGGCTCAA 3’.
Locus containing SNP G/A –22018: Primer In9L125ps3FM 5’ QGATGTCCTTAAAAACAGCATTCTCAGC 3’; Primer In9L125ps3r 5’ CCATGTTGGCCAGGCTGGTCTC 3’.
Model Templates for SBE Quantitation: Primer PR1-RM 5’ QCTGCCCCGGGTTCCTCATTCTCT 3’; Primer PR1pcr2.1-R 5’ CCATGTAAGCCCACTGCAAGCTACC 3’; INSERT jays primers here; PR1-F 5’ CCACTACGCCTCCGCTTTCCTCTC 3’.

The following oligonucleotides were used as primers for the single base extension reactions: Primer Seq 438 5’ GAGCTAAATTGCACATAACTTAGTAACAGGCTTA 3’; Primer Seq 445-2 5’ ACCTAGGCCTCTGTGTTAGTCTGTTTTCA 3’; Primer Seq 331 5’ ACCTAGGCCTCTGTGTTAGTCTGTTTTCA 3’; Primer In9L102ps2R 5’ GGGACAAAGGTGTGAGCCACCG 3’; Primer SeqIN13ps2 5’ GGCCTCTGCGCTGGCAATACAGATAAGATAATGTAG 3’; Primer Hybe 010129-1GA 5’ TATGGGCAGTCGGTGATAGAGTGGTGGA 3’. INSERT JAYS PRIMER HERE.

Patient DNA used to haplotype SNPs DK438, DK445-2, and DK331 was obtained from the Coriell Institute. Patient DNA used to haplotype the SNPs G/A –22018 and C/T –13910 was purified from buccal swabs using the MasterAmp buccal swab DNA extraction kit (Epicentre).

Results

Principles Underlying Polony Haplotyping.

Our approach is shown in Figure 1. One hundred to five hundred genome equivalents of patient DNA are diluted into a mixture of acrylamide monomer, bis cross-linker, and PCR reagents. Two pairs of primers are included in this mixture, one pair flanking the first SNP of interest (Figure 1 inset), the other pair flanking the second SNP of interest, and this mixture is used to pour a thin (15 micrometer) acrylamide gel on a glass microscope slide.
Because the concentration of patient DNA is so low, the chromosomes are well separated from each other on the surface of the slide. PCR is then performed using a PCR machine designed to accommodate slides. Each chromosome is amplified at two loci by the PCR reaction, and the acrylamide matrix prevents the amplification products from diffusing very far. As a result, double-stranded DNA accumulates around the chromosome, forming two overlapping polonies, each amplified from a different region of the same chromosome molecule. A key feature of this protocol is the use of modified primers in the PCR reaction that covalently attach one strand of the amplified DNA to the acrylamide matrix[Rehman, 1999 #196]. This feature allows the unattached strand to be removed from all polonies by heating and washing the slide, leaving single-stranded templates for the subsequent single base extension (SBE) reactions that will determine the genotypes of the two SNPs. After genotyping all polonies, the phase of the SNPs is then determined by identifying overlapping polonies.

Proof-of-Principle

To perform the protocol described above, it was first necessary to establish that (i) multiple polonies could be amplified from a single molecule of DNA and (ii) single base extension reactions[Pastinen, 1996 #4; Pastinen, 1997 #36; Syvanen, 1994 #35; Dubiley, 1999 #38] could be performed on DNA covalently attached to the acrylamide gel. To confirm that two polonies could be amplified from a single DNA molecule, we first cut a circular plasmid template (Figure 2c) with EcoRI to make it a linear molecule. We then amplified this linear template in a polony reaction using two sets of PCR primers, each primer pair chosen to amplify a different region of the template molecule (designated regions A and B in figure 2). After amplification, the polonies were made single stranded by heating the slide and washing away the unattached DNA strand.
Next, two dye-labeled oligonucleotides were hybridized to the gel (figure 2a). The oligonucleotide complementary to DNA sequence located in region A was labeled with a Cy5 molecule (red), and the oligonucleotide complementary to DNA sequence located in region B was labeled with a Cy3 molecule (green). In a separate control reaction, we cleaved the circular plasmid with two restriction endonucleases, EcoRI and NcoI, so that region A and region B were no longer on the same molecule of DNA (figure 2b). When the singly cut plasmid was used as the template for polony amplification, numerous overlapping polonies could be identified after the hybridization, as is evident from the large number of yellow polonies in Figure 2a. The doubly cut plasmid produced few polonies that overlapped. The polonies that did overlap did so only near their edges and were the result of two separate DNA molecules falling near each other when the gel was poured. These results demonstrate that a single DNA molecule can give rise to two overlapping polonies. We determined the efficiency of amplification for the two primer pairs to be 85% and 81% (see Methods).

We next characterized the specificity of single base extension (SBE) on acrylamide-immobilized DNA. We used a single dye-labeled deoxynucleotide or dideoxynucleotide to extend primer:template duplexes by one base in a DNA polymerase catalyzed reaction. We performed four reactions for each nucleotide tested to determine the specificity of the SBE reaction for the correct base relative to all possible mismatches. The results are shown in Table 1. SBE reactions with both fluorescent deoxynucleotides and dideoxynucleotides showed good discrimination for the correct base. We chose to use fluorescent deoxynucleotides in our SBE reactions because they performed somewhat better and are less expensive than fluorescent dideoxynucleotides.

Polony Haplotyping on Patient DNA.
We determined the phase of two SNPs that are 11.8 kilobases apart on chromosome 7. These SNPs, DK438, a T->C mutation in intron 4 of the CFTR gene, and DK445-2, a T->C mutation in intron 9 of the CFTR gene, have been previously characterized [Keen, Housman unpublished], and their phase is known because they are in strong linkage disequilibrium with one another. We used two patient samples that were heterozygous at these loci and amplified both loci in a polony reaction. Next, we performed an SBE reaction to genotype the SNP DK438. The results for one sample are shown in Figure 2a. Green polonies correspond to the wild-type allele (T) and red polonies correspond to the mutant allele (C). The primers were stripped from the slides and we performed a second SBE reaction to genotype the SNP DK445-2. Again, green polonies correspond to the wild-type allele (T) and red polonies correspond to the mutant allele (C). The images were merged, and overlapping polonies were identified computationally (see Methods) and are circled in the figure. We found 22 overlapping polonies (Table 1), all of which indicated the correct haplotype (the base T at DK438 in cis with the base T at DK445-2).

When polonies are amplified in a polony haplotyping reaction, overlapping polonies can also arise by chance: two different DNA molecules may happen to land very close to each other when the gel is poured, so that the polonies amplified from them overlap even though they did not originate from the same molecule of DNA. Pairs of overlapping polonies that occur in this fashion provide no information about the phase of the two SNPs being queried. To confirm that this phenomenon did not cause an error in our called haplotypes, we estimated the probability that the results we observed could have occurred by this process.
This probability is a function of the density of polonies on the slide, the number of observed overlapping polonies, the maximum distance between the centers of two polonies that are called overlapping, and the number of overlapping polonies that predict the same haplotype (see Methods). For the two samples used in the polony haplotyping reaction, we found the p-values to be 6e-22 and 6e-24 (Table 1), indicating that we can have high confidence in the called haplotype.

One feature of the polony haplotyping technology is that it requires very little patient DNA: a buccal swab should collect enough DNA for many reactions. To test this hypothesis, we collected buccal swabs from 5 subjects and haplotyped two SNPs recently shown to be involved in hypolactasia[Enattah, 2002 #42]. These SNPs, a G->A variant 22018 bases upstream of the gene MCM6 (designated G/A -22018) and a C->T variant 13910 bases upstream of the gene MCM6 (designated C/T -13910), are in strong linkage disequilibrium, so we expected to find only one of the two possible haplotypes. The results of these polony haplotyping reactions are summarized in Table 2. For all samples, the G variant at -22018 was found to be on the same chromosome as the C variant at -13910, consistent with the predicted linkage. For some samples, not every pair of overlapping polonies called the same haplotype. For example, in the patient NR, 32 pairs of overlapping polonies indicated the correct haplotype, but 3 pairs indicated the other haplotype. These dissenting pairs were most likely amplified from different template molecules and happened to overlap because the DNA was plated at a relatively high density. In spite of these occasional overlapping pairs with a dissenting prediction, the calculated p-values (Table 2) indicate that we can have high confidence in the called haplotype.
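To give a feel for how plating density and the overlap-distance cutoff affect the number of coincidental overlaps, here is a minimal sketch. It is not the calculation used for the tables. It assumes that polony centers from the two loci are scattered independently and uniformly over the gel and that edge effects can be ignored. The function name, the 600 mm^2 gel area (taken from the 20 mm x 30 mm coverslip), the polony counts, and the 0.05 mm cutoff are all illustrative assumptions.

```python
import math


def expected_chance_overlaps(n_locus1, n_locus2, gel_area_mm2, max_center_dist_mm):
    """Expected number of coincidental overlapping pairs.

    Assumes (hypothetically) that polony centers are placed uniformly and
    independently on the gel, and ignores edge effects. A pair is counted as
    overlapping when the two centers are closer than max_center_dist_mm.
    """
    if gel_area_mm2 <= 0 or max_center_dist_mm < 0:
        raise ValueError("area must be positive and distance non-negative")
    # Probability that one specific pair of centers falls within the cutoff.
    pair_probability = math.pi * max_center_dist_mm ** 2 / gel_area_mm2
    # Every locus-1 polony can pair with every locus-2 polony.
    return n_locus1 * n_locus2 * pair_probability


# Illustrative numbers only: 300 polonies per locus on a 600 mm^2 gel,
# with a 0.05 mm center-to-center cutoff.
print(expected_chance_overlaps(300, 300, 600.0, 0.05))
```

Under these assumptions the expected count grows with the product of the two polony counts and with the square of the cutoff distance, which is why denser plating produces more dissenting pairs.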
The two pairs of SNPs that were haplotyped in the above examples were separated by 11.8 kb and 8.1 kb of genomic sequence, respectively. However, in principle, one should be able to phase two SNPs separated by any distance, as long as the sample is not sheared or degraded to the extent that most of the DNA molecules in the sample are too short to contain both SNPs. To assess the degree of degradation in our samples, we performed agarose gel electrophoresis and chose the sample that contained the smallest amount of low molecular weight fragments. We estimated the average fragment size of this sample to be greater than 80 kb. Next, we performed polony haplotyping to phase two SNPs, DK331 and DK445-2, in the CFTR gene that are separated by 45 kilobases. There were 153 polonies amplified at the locus surrounding SNP DK331 (figure 4a) and 175 polonies amplified at the locus surrounding SNP DK445-2 (figure 4b). We identified 34 overlapping polonies from the merged scans, and 32 of these indicated the same haplotype. From this, we conclude that the A variant at SNP DK331 is on the same chromosome as the T variant at DK445-2 (p = 2e-9).

Considerations When Working With Small Polonies.

The cost of a polony reaction is related to the size of the polonies amplified in the reaction. Smaller polonies mean that less area on the slide is needed to haplotype SNPs, and therefore a smaller volume of the necessary reagents is used per reaction. We have previously reported a strong dependence of polony size on the length of the amplified PCR product, i.e., longer PCR products result in smaller polonies[Mitra, 1999 #3]. In that study, polonies were detected by staining the DNA with an intercalating dye, SYBR Green I.
However, as we developed this haplotyping technology, we found that the SBE reaction is not efficient on polony amplification products greater than 500 base pairs in length, limiting our ability to use small polonies for genotyping or haplotyping reactions. We hypothesized that this phenomenon is due to secondary structure in the single-stranded DNA that is stabilized by virtue of its location inside the pore of an acrylamide gel. Therefore, we reasoned that if we amplified in acrylamide gels containing the cleavable cross-linker DATD and then loosened the gel by periodate cleavage of the cross-linker, we could improve the efficiency of the SBE reaction in the acrylamide gel. To test this idea, we amplified two 917 base pair templates in an acrylamide gel with (10:1) DATD:bis cross-linker using a slightly modified amplification protocol (see Methods). We cleaved the DATD cross-linker and performed an SBE reaction with Cy5 (red) labeled dATP and then with Cy3 (green) labeled dGTP (figure 4a). The Cy5/Cy3 ratios of 100 polonies are plotted in figure 4b. The polonies show a clear bimodal distribution, with the difference in Cy5/Cy3 ratios between the two groups measured to be 600 +/- xx. The polonies in figure 5b are 50 µm in radius; however, we have previously shown that polonies as small as 6 µm can be amplified using a DNA template of approximately the same length {Why weren’t these?}. These results suggest that it should be possible to perform {How many?} many polony haplotyping reactions on a single slide, or in modified 384-well plates[Bell, 2002 #39].

Polony Genotyping.

In addition to haplotyping, the results presented above demonstrate the ability to genotype a large number of single DNA molecules. This ability can be used to accurately determine the ratio of two alleles of a gene, which is important for detecting allelic imbalance, loss of heterozygosity, and somatic mutations.
We tested the ability to measure small changes in the ratio of two alleles by mixing genomic DNA from two inbred strains of mouse in various ratios and performing polony amplification followed by SBE. Polonies were counted automatically by in-house software, and the results are shown in Figure 1-5. Significantly, the observed ratios (blue) match the expected ratios (pink), with errors of less than +/-7.5% (95% confidence level, error bars representing 3 standard deviations).

Discussion

The polony haplotyping reactions performed in this study used two pairs of primers to amplify two different polony products from the same molecule. This strategy was employed because of the large distance (8-45 kb) between the pairs of SNPs haplotyped. For SNPs that are 1-4 kilobases apart, another feasible strategy is to use one pair of primers to amplify both loci in a single polony and then genotype both SNPs.

In the strategy we employed, two polonies must be amplified from a single DNA molecule. However, not every template molecule included in the reaction must give rise to two polonies for the haplotype to be determined correctly. In our experiments, the polony efficiency, defined as the probability that a DNA molecule will give rise to a polony, ranged from 25% to 80% (check these figures). Because a large number {How large} of polonies were analyzed, accurate haplotypes were obtained. Increasing the polony efficiency would decrease the number of polonies that need to be analyzed. These observations raise two questions: (i) what are the parameters that affect the polony efficiency, and (ii) what is the relationship between the accuracy of polony haplotyping, the polony efficiency, the number of polonies analyzed, and the plating density of the polonies?

We have found that the presence of ungelled acrylamide in the polony gel during thermal cycling decreases the polony efficiency.
We hypothesize that acrylamide monomer reacts with DNA at high temperature to form an adduct, so that PCR primers that flank the adduct site will not produce a polony. This hypothesis is supported by the observation that ungelled acrylamide monomer strongly inhibits in-tube PCR, even when 1 ng of DNA is included as template (data not shown). If the acrylamide is first gelled, however, the reaction proceeds normally. For this reason, we degassed all reagents used in the polony amplification and we polymerized the gels under argon. In an alternate protocol, we poured the acrylamide gels on the slide, washed away any ungelled acrylamide, diffused in the templates and PCR reagents, and then performed the polony amplification (see Methods).

We have also found that polony efficiency depends on the design of the primer pair used to amplify the locus. We used the Primer3[Rozen, 1998 #41] program to design our primers, optimizing the selection parameters to maximize polony efficiency. These parameters are listed in the Methods section.

DNA fragmentation or degradation can also result in low measurements of polony efficiency. If a number of DNA molecules in the polony amplification do not contain both loci, then there will be fewer overlapping polonies, and therefore the polony efficiency measurement will be lower than expected. This may explain why the DNA samples from buccal swabs (Table 2), which were purified using a fast but relatively crude protocol, displayed a lower fraction of overlapping polonies than the other DNA samples, which were purified by more standard methods (see Methods).

{This paragraph will discuss the relationship between the polony efficiency, number of polonies amplified, and polony density}

For some applications it is desirable to phase a large number, 100-10,000, of SNPs along a chromosome.
We plan to multiplex the polony amplification so that a large number of SNPs can be haplotyped in a single reaction, and then perform many SBE reactions in order to obtain the entire haplotype. PCR reactions can be multiplexed M=30-fold or more on a routine basis[Kokoris, 2000 #40]. Polony amplification should be even more amenable to multiplexing because mispriming events between any two of the attached primers or among early amplification products are rare. Multiplexing would not require any increase in the average polony efficiency because phasing is transitive. If SNPs A, B and C are heterozygous and we know the phase of SNP A and SNP B and the phase of SNP B and SNP C, then we know the phase of SNP A and SNP C. This means that, in the haplotyping reaction, it is not necessary that any one DNA molecule give rise to M overlapping polonies, only that enough pairs of overlapping polonies are amplified to infer the total haplotype for all SNPs. Next, we would phase another set of M SNPs, some of which are located further along the chromosome. This set of SNPs would also include SNPs phased in the previous haplotyping reaction so that the relative phase of the two sets of SNPs could be determined. (To achieve phasing between the two sets, at least one SNP included in both reactions must be heterozygous in the patient sample being tested.) Using polonies of 50 micron radius, as shown in figure x, we should be able to fit 100 reactions on a glass microscope slide. Alternatively, modified 384-well plates as described could be used[Bell, 2002 #39].

Polony technology provides an inexpensive, flexible method for determining haplotypes. To our knowledge, no molecular haplotyping technology has previously demonstrated the phasing of two SNPs more than 20 kilobases apart[McDonald, 2002 #5] without separating the chromosomes by cloning or somatic cell fusion.
Here we phase two SNPs 45 kilobases apart, and there are no apparent barriers to working with larger distances. Polony haplotyping also requires very small amounts of patient DNA, simplifying its collection. Polony haplotyping should be a valuable tool in understanding genetic variation.

The polony genotyping and haplotyping techniques described here are inherently digital: each polony provides 1 bit of information. The power of digital genotyping for molecular biology has been demonstrated by a related technology, Digital PCR, which has found applications in detecting loss of heterozygosity, measuring allelic skewing, and detecting rare mutations. The polony genotyping presented here should further extend the utility of digital genotyping, since millions[Mitra] of polonies can be counted on a single slide, resulting in extremely accurate molecular quantitation and sensitivity.

Use fewer digits in Table 2 p-values.

Figure 2 notes: setting brightness 60, contrast 84. No zoom.
Smoothed image (use smooth_tif.m)

Figure 5 (panels A, B, C). Plot title: Allelic Discrimination. Vertical axis: Log Ratio. Horizontal axis: Rank.

Figure 6. Vertical axis: Polony Count Ratio. Horizontal axis: Template Ratio.

Table 1

Patient  DK438 polonies  DK445-2 polonies  Overlapping polonies  CT calls  CC calls  Called haplotype  P-value
14611    94              92                24                    24        0         CT                5.96E-08
11321    65              77                22                    22        0         CT                2.38E-07

Table 2

Patient  Intron 13  Intron 9  Overlapping polonies  CC calls  CT calls  Called haplotype  P-value
NR       325        328       49                    46        3         CC                3.49054E-11
RM       295        334       35                    32        3         CC                2.08849E-07
DJ       232        234       38                    38        0         CC                3.63798E-12
VB       286        290       25                    21        4         CC                0.00045526
BW       274        274       11                    10        1         CC                0.005859375
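The p-values listed in Tables 1 and 2 match a one-sided binomial tail: the chance that at least the observed number of overlapping pairs support the called haplotype if each pair were equally likely to support either haplotype. The sketch below checks that arithmetic. This is an inference from the tabulated numbers, not a statement of the HAPCALL calculation, which the text says also depends on polony density and overlap distance. The function name is hypothetical.

```python
from math import comb


def same_haplotype_tail(n_overlapping, n_supporting):
    """One-sided binomial tail P(X >= n_supporting), X ~ Binomial(n_overlapping, 0.5).

    Assumed null model: each overlapping pair is a coincidence and supports
    either haplotype with probability 1/2.
    """
    if not 0 <= n_supporting <= n_overlapping:
        raise ValueError("n_supporting must lie between 0 and n_overlapping")
    favorable = sum(
        comb(n_overlapping, k) for k in range(n_supporting, n_overlapping + 1)
    )
    return favorable / 2 ** n_overlapping


# (patient, overlapping pairs, pairs supporting the called haplotype),
# copied from Tables 1 and 2.
TABLE_ROWS = [
    ("14611", 24, 24),
    ("11321", 22, 22),
    ("NR", 49, 46),
    ("RM", 35, 32),
    ("DJ", 38, 38),
    ("VB", 25, 21),
    ("BW", 11, 10),
]

for patient, n_pairs, n_support in TABLE_ROWS:
    print(f"{patient}: p = {same_haplotype_tail(n_pairs, n_support):.6g}")
```

Exact integer binomial coefficients from `math.comb` avoid floating-point underflow for the smallest tails.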