Large Scale SNP Scanning on Human Chromosome Y and DNA Pooling Study Using Unlabeled Probes

Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah 84132

Abstract

High-throughput SNP scanning is an important tool for genome studies. Genotyping of known mutations and scanning for unknown ones by high-resolution melting analysis with unlabeled probes is simple, rapid, and inexpensive, requiring only PCR, an unlabeled oligonucleotide, LCGreen Plus, and melting instrumentation. The method works on the single-sample HR-1, the 384-sample LightScanner and the LightCycler. We used synthetic PCR constructs to demonstrate the detection of all possible SNP base changes by high-resolution melting analysis. In all cases heterozygotes were easily identified because the heteroduplex formed by the probe oligonucleotide and the mismatched amplicon strand altered the shape of the melting curve. Chromosome Y is an effective and simple target for evolution studies. Thirty-five SNP markers distributed along the human Y chromosome were characterized in 192 individuals from south India on a 384-well LightScanner. DNA pooling is a practical way to reduce the cost of large-scale evolution or association studies. Pooling allows population allele frequencies to be measured with far fewer PCR reactions and genotyping assays than are required when individuals are genotyped one by one. We have developed an unlabeled probe/high-resolution melting methodology, together with analysis software, to determine SNP frequencies in a pooled DNA sample. Complementary and mismatched amplicon strands were mixed in ratios from 0% to 100% and melted, and the quantification software was calibrated using this model system. We repeated this analysis using two genomic DNA samples, one wild type and one homozygous for the G542X (G to T) mutation in the cystic fibrosis gene.
When these samples were mixed in different ratios and analyzed using this methodology, the software correctly determined the allele ratio in the mixture to within 2% over the range of 0% to 100% of one allele. The method was also applied to a pool of ninety-six human genomic DNA samples that had previously been genotyped individually at eight SNP markers on chromosome Y. The analysis software determined the allele frequencies to within 2% across a range of frequencies from 3% to 23%. This method is simple, fast and inexpensive for the determination of SNP allele fraction.

Introduction

Single nucleotide polymorphisms (SNPs) are the most common source of human genetic variation. Genotyping large numbers of SNPs in linkage, association and evolution studies will aid the understanding of complex disease traits, including many common human diseases, drug responses and human evolution (1). These applications require reliable and economical methods for high-throughput SNP genotyping.

SNP genotyping methods are either gel-based or non-gel-based. Single-strand conformation polymorphism analysis (2) is one of the most widely used gel-based methods for mutation detection. Oligonucleotide Ligation Assay (OLA) and mini sequencing (3) are also gel-based genotyping techniques. Gel-based methods are still widely used in many labs for small numbers of samples, although they are labor intensive and require experience and technical skill for analysis. Non-gel-based high-throughput genotyping techniques are developing rapidly. Pyrosequencing, which detects nucleotide incorporation by light emission, and DNA microarray genotyping can genotype large numbers of SNPs simultaneously. High-throughput genotyping methods using fluorescently labeled oligonucleotides include TaqMan (4), Hybridization probe (5), Simple probe (6), Invader assay (7) and allele-specific ligation (8) genotyping.
Previously, we developed a non-gel-based genotyping technique that does not require fluorescently labeled probes (Refs here?). This technique uses melting of unlabeled oligonucleotide probes and PCR products in the presence of a high-resolution double-stranded DNA dye, LCGreen Plus. In addition, a 3’ end blocked oligonucleotide is used (the probe? Asymmetric PCR?). In this paper, we use this technique for high-throughput genotyping and for genome-wide association studies using DNA pooling. The PCR may be performed on any 384-well thermocycler, and the melting carried out on the inexpensive “LightScanner” machine.

Genome-wide association studies are necessary to identify genes underlying certain complex diseases. Many genetic diseases have yet to be located on the human genome for reasons that include multiple loci and incomplete penetrance. To pinpoint these loci to particular chromosomal regions, association studies, which compare allele frequencies between affected individuals (probands) and controls, must be performed across the entire human genome. At a spacing of approximately 0.4 cM between markers, 10,000 microsatellite markers would be necessary to fully saturate the genome (). For a study of 1000 probands and 1000 controls, 20 million genotypings would be required (). DNA pooling could greatly reduce the genotyping burden and speed up initial gene mapping studies.

Techniques previously used for analyzing SNP allele fraction by DNA pooling include amplification and cleavage at the SNP site (), primer extension (), amplification with allele-specific primers (), detection of conformational changes (), hybridization of PCR products to microarrays (), DHPLC () and Pyrosequencing (). The accuracy of the allele frequency estimates obtained with these techniques is about 2-5%.

We demonstrate the technology we have developed for high-throughput genotyping using unlabeled probes to study Chromosome Y evolution, by genotyping 35 SNPs in 192 samples from south India.
We also use this method to measure SNP allele fraction for the cystic fibrosis mutation (G542X) in pooled DNA amplified from samples of known genotype. The technique is fast, easy to design and inexpensive, with sensitivity and accuracy of 1-2%.

Method

DNA samples of chromosome Y

The 192 DNA samples used in the analysis were collected in Tamil Nadu, south India, and obtained from Brian Mowry’s lab in Queens Centre for Mental Health Research, Wacol, Brisbane, Australia. All samples are from control individuals, collected as part of a larger study of complex disease, and are not associated with known disease phenotypes. DNA was extracted from established cell lines using standard protocols.

Genotyping SNP Markers on Chromosome Y

The protocols for genotyping many of the 237 polymorphic sites that were analyzed on chromosome Y have been published (Underhill et al. 2000, 2001; Hammer et al. 2001). Thirty-five SNP markers were chosen for the south Indian evolution study. The 35 markers are listed in Table 1.

Multiplex PCR

Four- to six-plex PCR was used for the first PCR. Multiplex PCR was performed on the Peltier Thermal Cycler PTC-200 (MJ Research) in 96-well plates. The PCR reaction contained 1.5 µM Mg++, 0.4 U Taq polymerase, 2 mM dNTP, 0.5 µM forward and reverse primers for the four to six multiplexed markers and 12.5 ng human genomic DNA. The PCR conditions were 94°C for 3 minutes followed by 25 cycles of 94°C for 15 seconds, 52°C for 15 seconds and 72°C for 15 seconds.

A 1/1000 dilution of the multiplex PCR product was used as template for nested asymmetric PCR, which amplified an individual marker in the presence of its own probe. Each marker has two alleles, and probes matching each allele were used for the SNP typing. The PCR reaction contained 2.0 µM Mg++, 0.4 U Taq polymerase, 2 mM dNTP, 0.05 µM forward primer, 0.5 µM reverse primer, 0.5 µM probe with a phosphorylated 3’ end and 1/1000 multiplex PCR product.
The PCR was performed on the same thermal cycler in 384-well plates under the following conditions: 94°C for 2 minutes followed by 25 cycles of 94°C for 5 seconds, 52°C for 5 seconds and 72°C for 10 seconds.

Data analyses of SNP genotyping

After PCR, melting curve analysis was performed on the LightScanner. The temperature was raised from 50°C to 90°C at a rate of 0.1°C/second in the Automatic mode. The process takes only 5 minutes (at 0.1°C/second, a 40°C ramp seems like 400 seconds = 6 2/3 minutes). The software CTWTool1-18-03 was used to analyze the melting curve data (using older software for this?). The probes for the two genotypes were run side by side, and the genotype was determined by comparing the melting curves of the two probes. (Manually? Wouldn’t automatic clustering be more suited to high-throughput?)

Genomic DNA allele fraction and DNA pooling

Human genomic DNA of cystic fibrosis wild type (CFTR 542 G) and homozygous mutant (CFTR 542 T) genotypes was used for the allele fraction study. The cystic fibrosis mutation (G542X) is a single base change in exon 11 (G>T). The two genotypes were mixed in ratios from 0% to 100% in 10% increments, and in 2% increments from 0 to 10%, 20 to 30%, 45 to 55%, 70 to 80% and 90 to 100%. A wild type probe with a 3’ end phosphate (-P) was used for the allele fraction test: 5’-CAATATAGTTCTTGGAGAAGGTGGAATC-P-3’. The primers and asymmetric PCR conditions were described by Zhou et al ().

Ninety-six samples of genotyped human genomic DNA were pooled together, and 50 ng of pooled DNA was used to determine the population frequency by the unlabeled probe technique. By comparing the estimated allele fraction with the actual allele fraction we were able to determine the sensitivity of this technique. Plots of the derivative melting curves of the pooled samples, -dF/dT, were generated from the melting curve analysis by the software.

Software for allele fraction (Bob Palais)

Background fluorescence was removed from raw fluorescence vs.
temperature data using an exponential model for the background, fit to the slope of the raw fluorescence curve in two temperature regions, one below and one above the probe melting temperatures for both genotypes. The resulting melting curves were normalized to the 0-100% range and differentiated using the polynomial least-squares fit (Savitzky-Golay) method. Allele fraction was determined by linear interpolation: the peak heights of the unknown sample, melted in the presence of unlabeled probes matching each genotype, were compared with those of pure samples and of neighboring standard calibration curves having synthetically determined allele fractions melted under the same conditions, and the values obtained were combined in an equally weighted average.

Results

SNP marker selection

Over the past 15 years, DNA polymorphisms have been widely used to reconstruct human evolutionary history. Mitochondrial DNA was originally used for this purpose, because its high mutation rate produced numerous polymorphisms and the absence of recombination facilitated their interpretation. Thirty-five SNP markers that represent a set of sequence variants from the south Indian population were chosen for genotyping.

Multiplex PCR

In human genetic studies, such as searches for genetic disease loci and tumor suppressor genes and studies of human evolution, the supply of human genomic DNA is often limited. Multiplex PCR amplifies different loci at the same time from the same amount of human DNA, and so saves large quantities of genomic DNA. Multiplex PCR can simultaneously amplify up to ten different amplicons. In this paper we focus on using unlabeled probes to genotype SNPs, so only six-plex PCR was performed. The purpose of multiplex PCR is to enrich the loci that need to be genotyped. The multiplex enrichment is then followed by nested asymmetric PCR, which makes it easy to probe the genotypes of individual loci (Figure 1).
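The curve processing described under "Software for allele fraction" (normalization to the 0-100% range followed by Savitzky-Golay differentiation) can be illustrated with a minimal sketch. It is not the analysis software used in this study. It assumes evenly spaced temperatures and a curve whose background has already been removed, so the exponential background fit is omitted. The function names, the window half-width and the synthetic curve are hypothetical choices made for illustration.

```python
import math


def normalize(fluorescence):
    """Rescale a background-corrected melting curve to the 0-100% range."""
    lo, hi = min(fluorescence), max(fluorescence)
    if hi == lo:
        raise ValueError("flat curve cannot be normalized")
    return [100.0 * ((f - lo) / (hi - lo)) for f in fluorescence]


def negative_derivative(temps, fluorescence, half_width=3):
    """Return (temperature, -dF/dT) pairs from a Savitzky-Golay style fit.

    For evenly spaced temperatures, the slope of a least-squares line or
    quadratic through 2*half_width + 1 points is sum(i * F[k+i]) divided by
    (sum(i*i) * step). Edge points without a full window are skipped.
    """
    step = (temps[-1] - temps[0]) / (len(temps) - 1)
    norm = sum(i * i for i in range(-half_width, half_width + 1)) * step
    curve = []
    for k in range(half_width, len(temps) - half_width):
        slope = sum(i * fluorescence[k + i]
                    for i in range(-half_width, half_width + 1)) / norm
        curve.append((temps[k], -slope))
    return curve


# Hypothetical single-transition curve melting at 65 C (illustration only),
# sampled every 0.1 C from 50 C to 90 C as in the melting protocol.
temps = [50.0 + 0.1 * i for i in range(401)]
raw = [1.0 / (1.0 + math.exp(t - 65.0)) for t in temps]
peak_temp, peak_height = max(negative_derivative(temps, normalize(raw)),
                             key=lambda point: point[1])
print(round(peak_temp, 1), round(peak_height, 2))
```

The peak of the -dF/dT curve marks the melting temperature of the transition, and the peak height is the quantity compared against calibration standards when estimating allele fraction.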
Asymmetric PCR and unlabeled probes

Many different techniques detect mutations or SNPs through the use of probes. TaqMan, Hybridization probes and Simple probes are the most common. These techniques need one or two fluorescent labels at the end of the probe. Zhou et al. developed a technique that detects mutations using probes without fluorescent labels (). The key to this technique is the use of asymmetric PCR and melting of the product with an unlabeled probe in the presence of the high-resolution double-stranded DNA dye, LCGreen Plus.

Asymmetric PCR amplifies one strand much more than the complementary strand. The probe is complementary to the overproduced strand, and its 3’ end is blocked to prevent extension. After symmetric PCR, when hybridization with the unlabeled probe is promoted, the probe and the strand that shares its sequence compete to anneal with the opposite strand, and the derivative melting curve shows the amplicon peak but not the probe peak. After asymmetric PCR, when hybridization is promoted, annealing of forward and reverse amplicon strands is limited by the lower concentration of one strand, leaving an excess of the other strand as single-stranded DNA. These single strands anneal to the unlabeled probe. As a result, the derivative melting curve of an asymmetric PCR product has a lower temperature peak where probes melt from single amplicon strands, and a higher temperature peak where amplicons melt. Melting a 384-well plate takes only five minutes.

Genotype determination

All SNP markers on chromosome Y have two alleles, and we typed both possibilities. Genomic DNA from chromosome Y carries only one allele, so the probe melting curve always shows a single peak, either a 100% perfect match peak or a 100% mismatch peak, in contrast with 50% of each in the case of heterozygous DNA from other chromosomes.
The melting temperature of the probe that is perfectly complementary to the sample genotype is 3 to 5°C higher than that of the probe with one mismatched base. Hence, the probe displaying the higher melting temperature determines the genotype by complementarity (Figure 1). The matched and mismatched probe melting curves may be easily distinguished visually by a human technician, or by automatic clustering or classification. Different genotypes can also be distinguished from the amplicon melting curves if the SNP is a small deletion (Figure 2) or an A, T vs. C, G change (Figure 3). This allows a double confirmation of the genotyping.

To confirm the genotypes obtained with the unlabeled probe technique, we chose samples from each SNP marker for sequencing. Samples were chosen as follows: for a SNP marker without any variation we chose the most ambiguous sample, and for a SNP marker with variation we chose the most ambiguous sample of each genotype. The sequencing results were in 100% agreement with the genotypes determined by unlabeled probes. Twenty-four of the most important markers for the south Indian population were tested on 192 samples by SNP short technique (?). Only one operator error was made in 4,608 genotypes.

High-throughput genotyping with unlabeled probes is fast, taking about five minutes after PCR. The unlabeled probe is a 20-30 base oligonucleotide with the 3’ end blocked. The unlabeled probe is very stable: it can be stored at room temperature for a few years with no light sensitivity or other degradation. Unlabeled probe design is not sensitive to GC content, which gives it more flexibility than the TaqMan probe, Hybridization probe and Simple probe. The cost of an unlabeled probe is significantly lower than that of a fluorescently labeled probe. The data analysis is very simple as well.
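The genotype call described above, in which the probe with the higher melting temperature identifies the allele, can be sketched as follows. This is a minimal illustration and not the software used in this study. The function name, the 2°C minimum separation used to flag ambiguous samples (chosen to sit below the 3 to 5°C difference reported here) and the example melting temperatures are all assumptions.

```python
def call_genotype(tm_probe_1, tm_probe_2, allele_1, allele_2, min_delta=2.0):
    """Call a Y-chromosome (single-allele) genotype from two probe reactions.

    tm_probe_1 / tm_probe_2 are the probe melting temperatures (in C) measured
    for the same sample with the probe matching allele_1 and the probe matching
    allele_2. The perfectly matched probe melts at the higher temperature, so
    the higher Tm names the allele. min_delta is a hypothetical threshold: a
    smaller separation is reported as "ambiguous" rather than called.
    """
    delta = tm_probe_1 - tm_probe_2
    if abs(delta) < min_delta:
        return "ambiguous"
    return allele_1 if delta > 0 else allele_2


# Hypothetical probe melting temperatures for three samples at one marker.
samples = {
    "sample_A": (66.1, 62.0),   # probe 1 melts higher: allele_1
    "sample_B": (61.8, 65.9),   # probe 2 melts higher: allele_2
    "sample_C": (64.0, 63.5),   # too close to call
}
calls = {name: call_genotype(tm1, tm2, "C", "T")
         for name, (tm1, tm2) in samples.items()}
print(calls)
```

Samples flagged as ambiguous by such a rule would be natural candidates for the confirmatory sequencing described above.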
On chromosome Y, we genotyped 35 SNP markers in 192 samples, for a total of 6,720 genotypes.

DNA pooling by unlabeled probe

An allele fraction study was performed to determine the sensitivity of the unlabeled probe technique. Human genomic wild type DNA (CFTR 542 G) and cystic fibrosis mutant DNA (CFTR 542 T) were mixed in different ratios. We created mixtures of wild type and mutant genomic DNA from 0-100% wild type in 10% increments. As the proportion of wild type DNA increases, the height of the perfect match (G::C) peak increases and the height of the mismatch peak clearly decreases (Figure 3A). We also prepared mixtures in 2% increments of wild type genomic DNA from 0-10% (Figure 3B), 20-30%, 45-55% (Figure 3C), 70-80%, and 90-100%. With each 2% increase in wild type DNA, the perfect match peak height consistently increases and the mismatch peak height consistently decreases. The estimated and actual allele frequencies are shown in Table 2. The allele fraction can be determined by the unlabeled probe technique to within 2%.

After the analysis software was calibrated in this manner, 96 human genomic DNA samples, each of known genotype at a marker with two alleles, were mixed as a “pool.” Samples taken from the mixture were hybridized with probes complementary to each genotype, and high-resolution melting and analysis were then performed. The known allele fraction of the mixture was correctly estimated to within +/- 2%, based on the analysis software’s comparison of the heights of the probe melting peaks of the mixture with those of samples of pure genotype. The software is automatic and simple to use. It can also call the genotypes in the high-throughput evolution study automatically and with high accuracy. This is important when considering thousands, and ultimately millions, of genotyping experiments.
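The allele fraction estimate described in this paper, linear interpolation of probe peak heights between calibration standards followed by an equally weighted average over the two probes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis software itself. The function names and all peak heights are hypothetical, and each calibration set is assumed to be monotonic in peak height.

```python
def interpolate_fraction(peak_height, standards):
    """Linearly interpolate allele fraction from a probe peak height.

    standards is a list of (allele_fraction, peak_height) pairs measured on
    mixtures of known composition. Heights outside the calibrated range are
    clamped to the nearest standard.
    """
    pts = sorted(standards, key=lambda p: p[1])
    if peak_height <= pts[0][1]:
        return pts[0][0]
    if peak_height >= pts[-1][1]:
        return pts[-1][0]
    for (f0, h0), (f1, h1) in zip(pts, pts[1:]):
        if h0 <= peak_height <= h1:
            if h1 == h0:
                return 0.5 * (f0 + f1)
            return f0 + (f1 - f0) * (peak_height - h0) / (h1 - h0)
    raise ValueError("peak height not bracketed by standards")


def estimate_allele_fraction(height_probe_1, height_probe_2,
                             standards_probe_1, standards_probe_2):
    """Equally weighted average of the estimates from the two probes."""
    return 0.5 * (interpolate_fraction(height_probe_1, standards_probe_1)
                  + interpolate_fraction(height_probe_2, standards_probe_2))


# Hypothetical calibration: (wild type fraction, perfect match peak height).
wild_type_probe = [(0.0, 0.10), (0.5, 0.55), (1.0, 1.00)]
mutant_probe = [(0.0, 1.00), (0.5, 0.50), (1.0, 0.05)]

# Hypothetical peak heights measured on a pooled sample.
fraction = estimate_allele_fraction(0.325, 0.75, wild_type_probe, mutant_probe)
print(round(fraction, 3))
```

Using the neighboring standards on either side of the unknown keeps the interpolation local, so the estimate does not depend on the peak height response being linear over the whole 0-100% range.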