Large-Scale SNP Scanning on Human Chromosome Y and DNA Pooling Study by Unlabeled Probe

Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah 84132

Abstract

High-throughput SNP scanning is an important tool for genome studies. We used synthetic PCR constructs to demonstrate the detection of all possible SNP base changes. LCGreen Plus was included in the PCR reaction, and high-resolution melting analysis was completed within five minutes after amplification. In all cases, heterozygotes were easily identified because the heteroduplex formed between the probe oligonucleotide and the amplicon altered the shape of the melting curve. Analysis of known mutations by high-resolution melting with unlabeled probes is simple, rapid, and inexpensive. It requires only PCR, an unlabeled oligonucleotide, LCGreen Plus, and melting instrumentation. The method works on the single-sample HR-1, the 384-sample LightScanner, and the LightCycler. Chromosome Y is a simple and effective target for evolution studies. Thirty-five SNP markers distributed along the human Y chromosome were characterized in 192 individuals from south India on a 384-well LightScanner. DNA pooling is a practical way to reduce the cost of large-scale evolution or association studies. Pooling allows population allele frequencies to be measured with far fewer PCR reactions and genotyping assays than are required when individuals are genotyped one by one. We developed an unlabeled probe/high-resolution melting methodology, together with analysis software, to determine SNP allele frequencies in pooled DNA samples. Complementary and mismatched amplicon strands were mixed in ratios from 0% to 100% and melted, and the software was optimized on this model system. We repeated the analysis using two genomic DNAs homozygous for a G to T mutation in the cystic fibrosis gene. When the two DNAs were mixed in different ratios and analyzed with this methodology, the software correctly determined the proportion of the G and T alleles in each mixture to an accuracy of 2% over the range of 0% to 100% of one allele. The method was also applied to a pool of ninety-six human genomic DNA samples that had previously been genotyped individually at eight SNP markers on chromosome Y. The software determined the allele frequencies to within 2% across a range of frequencies from 3% to 23%. This method is simple, fast, and inexpensive for determining SNP allele fractions.

Introduction

Single nucleotide polymorphisms (SNPs) are the most common source of human genetic variation. Genotyping large numbers of SNPs in linkage, association, and evolution studies will aid in understanding complex traits, including many common human diseases and drug responses, as well as human evolution (1). These applications require reliable and economical methods for high-throughput SNP genotyping.

SNP genotyping methods can be divided into gel-based and non-gel-based methods. Single-strand conformation polymorphism analysis (2) is one of the most widely used gel-based methods for mutation detection. The oligonucleotide ligation assay (OLA) and minisequencing (3) are also gel-based. Gel-based methods are still widely used in many laboratories for small numbers of samples, although they are labor intensive and their analysis requires experience and technical skill. Non-gel-based high-throughput genotyping techniques are developing rapidly. Pyrosequencing, single-base extension with fluorescence detection, and DNA microarrays can genotype large numbers of SNPs. High-throughput methods based on fluorescently labeled oligonucleotides include TaqMan (4), hybridization probes (5), Simple probe (6), the Invader assay (7), and allele-specific ligation (8).
We developed a non-gel-based genotyping technique that uses an unlabeled (non-fluorescent) oligonucleotide probe, blocked at its 3' end, together with homogeneous melting of the PCR products in the presence of the double-stranded DNA dye LCGreen Plus. In this paper, we apply this unlabeled-probe technique to high-throughput genotyping. The PCR can be performed on any 384-well thermal cycler, and melting is carried out on an inexpensive instrument, the LightScanner. Evolution studies of chromosome Y require large numbers of SNP genotypes and are therefore a good application for unlabeled probes. We genotyped 35 SNPs in 192 samples from south India with this technique.

Genome-wide association studies are necessary to identify the genes underlying many complex diseases. Many disease genes have yet to be located in the human genome, for reasons that include multiple contributing loci and incomplete penetrance. To map these loci to particular chromosomal regions, association studies, which compare allele frequencies between affected individuals (probands) and controls, must be performed across the entire genome. With markers spaced approximately 0.4 cM apart, 10,000 microsatellite markers would be needed to fully saturate the genome (). For a study of 1,000 probands and 1,000 controls, this would require 20 million genotypes (10,000 markers × 2,000 individuals) (). DNA pooling could greatly reduce this genotyping burden and speed up initial gene mapping studies. Several techniques have been used to measure SNP allele fractions in pooled DNA, including amplification and cleavage at the SNP site (), primer extension (), amplification with allele-specific primers (), detection of conformational changes (), hybridization of PCR products to microarrays (), DHPLC (), and pyrosequencing (). These techniques estimate allele frequencies with an accuracy of about 2-5%. We have developed an unlabeled-probe method to measure SNP allele fractions. The technique is fast, easy to design, and inexpensive.
The sensitivity and accuracy of the method are 1-2%.

Method

DNA samples for chromosome Y

The 192 DNA samples used in the analysis were collected in Tamil Nadu, South India. They were provided by Brian Mowry's laboratory at the Queensland Centre for Mental Health Research, Wacol, Brisbane, Australia. All samples are from control individuals who were collected as part of a larger study of complex disease, and none is associated with a known disease phenotype. DNA was extracted from established cell lines using standard protocols.

Genotyping SNP Markers on Chromosome Y

Protocols for genotyping many of the 237 polymorphic sites that have been analyzed on chromosome Y have been published (Underhill et al. 2000, 2001; Hammer et al. 2001). Thirty-five SNP markers were chosen for the south Indian evolution study. They are listed in Table 1.

Multiplex PCR

The first PCR was a four- to six-plex multiplex PCR, performed in 96-well plates on a PTC-200 Peltier Thermal Cycler (MJ Research). Each reaction contained 1.5 mM Mg++, 0.4 U Taq polymerase, 2 mM dNTP, 0.5 µM of each forward and reverse primer for the four to six loci, and 12.5 ng human genomic DNA. Cycling conditions were 94 °C for 3 min, followed by 25 cycles of 94 °C for 15 s, 52 °C for 15 s, and 72 °C for 15 s.

One-thousandth of the multiplex PCR product was then used as template for a nested asymmetric PCR that amplifies a single marker together with its own unlabeled probe. Each marker has two alleles, and a probe matching each allele was used for SNP typing. Each nested reaction contained 2.0 mM Mg++, 0.4 U Taq polymerase, 2 mM dNTP, 0.05 µM forward primer, 0.5 µM reverse primer, 0.5 µM probe blocked with a 3' phosphate, and one-thousandth of the multiplex PCR product. The nested PCR was performed on the same thermal cycler in 384-well plates under the following conditions: 94 °C for 2 min, followed by 25 cycles of 94 °C for 5 s, 52 °C for 5 s, and 72 °C for 10 s.

Data analysis of SNP genotyping

After PCR, melting curve analysis was performed on the LightScanner.
Samples were melted from 50 °C to 90 °C at 0.1 °C/s in Automatic mode. The process takes only 5 minutes. Melting curve data were analyzed with the CTWTool-1-18-03 software. The two allele-specific probes for each marker were run side by side, and the genotype was determined by comparing the melting curves of the two probes.

Genomic DNA allele fraction and DNA pooling

The cystic fibrosis mutation G542X is a single-base change (G to T) in exon 11 of the CFTR gene. Wild-type and homozygous mutant human genomic DNA were used for the allele fraction study. Wild-type DNA (CFTR 542 G) and homozygous mutant DNA (CFTR 542 T) were mixed in ratios from 0% to 100% in 10% increments, and in 2% increments over the ranges 0-10%, 20-30%, 45-55%, 70-80%, and 90-100%. A wild-type probe blocked with a 3' phosphate (-P) was used for the allele fraction test: 5'-CAATATAGTTCTTGGAGAAGGTGGAATC-P-3'. The primers and asymmetric PCR conditions were as described by Zhou et al. (). For the pooling study, ninety-six individually genotyped human genomic DNA samples were pooled, and 50 ng of the pooled DNA was used to determine the population allele frequency with the unlabeled-probe technique. By comparing the estimated and actual allele fractions, we were able to determine the sensitivity of the technique. The software generated derivative melting plots (-dF/dT) from the melting curve data.

Software for allele fraction (Bob Palais)

Results

SNP marker selection

Over the past 15 years, DNA polymorphisms have been widely used to reconstruct human evolutionary history. Mitochondrial DNA was used for this purpose first, because its high mutation rate produces numerous polymorphisms and its lack of recombination simplifies their interpretation. The non-recombining portion of the Y chromosome offers the same advantage for tracing paternal lineages. Thirty-five SNP markers representing sequence variants found in the south Indian population were chosen for genotyping.
Multiplex PCR

In human genetic studies, such as searches for disease genes or tumor suppressor genes and studies of human evolution, the amount of genomic DNA available is often limited. Multiplex PCR amplifies several loci simultaneously from the same DNA input, and therefore conserves genomic DNA. Multiplex PCR can amplify up to ten different amplicons simultaneously. Because this paper focuses on genotyping SNPs with unlabeled probes, we limited multiplexing to six loci ("six-times deep"). The purpose of the multiplex PCR is to enrich the loci to be genotyped. Each locus can then be genotyped easily by nested PCR (Figure 1).

Asymmetric PCR and unlabeled probe

Many techniques detect mutations or SNPs with probes. TaqMan, hybridization probes, and Simple probe are the most common. These techniques require one or two fluorescent labels on the probe. Zhou et al. developed an unlabeled probe technique (). Its key elements are asymmetric PCR in the presence of an unlabeled probe and melting in the double-stranded DNA dye LCGreen Plus. Asymmetric PCR amplifies one strand much more than its complementary strand. The probe, whose 3' end is blocked, is complementary to this excess strand. In symmetric PCR, the amplicon strand with the same sequence as the probe competes with the probe for annealing to the complementary strand during renaturation. As a result, the melting curve shows the amplicon peak but not the probe peak. After asymmetric PCR, the forward and reverse amplicon strands anneal during renaturation, leaving an excess of single-stranded DNA that anneals to the unlabeled probe. The melting curve of an asymmetric PCR therefore shows both the probe peak and the amplicon peak. Melting a 384-well plate takes only five minutes.
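The strand imbalance that makes the probe peak visible can be illustrated with a deliberately idealized, primer-limited model of asymmetric PCR. This is a sketch under stated assumptions, not a model used in this study. It assumes 100% amplification efficiency, no mispriming or primer-dimers, and no effect of the probe or dye. The function name `asymmetric_pcr` and the starting template concentration of 0.001 nM are hypothetical. The primer levels (0.5 µM excess and 0.05 µM limiting, i.e. 500 nM and 50 nM) are taken from the nested PCR conditions in the Method section, where the forward primer was limiting.

```python
from typing import Tuple


def asymmetric_pcr(template_nm: float, excess_primer_nm: float,
                   limiting_primer_nm: float, cycles: int) -> Tuple[float, float]:
    """Idealized primer-limited PCR (100% efficiency, no side reactions assumed).

    Each cycle, every template strand is copied as long as the primer that
    extends it is still available. Returns (excess_strand_nm, limiting_strand_nm):
    the strand made from the excess primer and the strand made from the
    limiting primer.
    """
    excess_strand = limiting_strand = template_nm
    p_excess, p_limiting = excess_primer_nm, limiting_primer_nm
    for _ in range(cycles):
        # The excess primer copies the limiting strand and vice versa; both copies
        # are computed from the strand amounts present at the start of the cycle.
        new_excess = min(limiting_strand, p_excess)
        new_limiting = min(excess_strand, p_limiting)
        p_excess -= new_excess
        p_limiting -= new_limiting
        excess_strand += new_excess
        limiting_strand += new_limiting
    return excess_strand, limiting_strand


# Primer levels from the nested PCR (0.5 uM excess, 0.05 uM limiting); template is hypothetical.
excess, limiting = asymmetric_pcr(template_nm=0.001, excess_primer_nm=500.0,
                                  limiting_primer_nm=50.0, cycles=25)
single_stranded = excess - limiting  # unpaired after renaturation; free to bind the probe
print(f"excess {excess:.1f} nM, limiting {limiting:.1f} nM, "
      f"single-stranded {single_stranded:.1f} nM")
# prints: excess 500.0 nM, limiting 50.0 nM, single-stranded 450.0 nM
```

In this toy model, both strands double each cycle until the limiting primer runs out (around cycle 16 with the assumed template). After that point, only the excess strand keeps growing, linearly rather than exponentially. Most of that strand is therefore left single-stranded after renaturation and is available to the probe. With equal primer concentrations, the model returns equal amounts of both strands, which corresponds to the symmetric case in which no probe peak is seen.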
Genotype determination

Each SNP marker on chromosome Y has two alleles, and we typed both possibilities with allele-specific probes. Because males carry a single Y chromosome, each probe's melting curve shows only one probe peak: either a perfect-match peak or a mismatch peak. The perfectly matched probe melts 3 to 5 °C higher than the mismatched probe, so the probe with the higher melting temperature identifies the genotype (Figure 1). The melting curves of the two probes are clearly and easily distinguished. The amplicon melting curves also differ between genotypes when the SNP is a small deletion (Figure 2) or an A/T versus C/G change (Figure 3). This provides a second, independent confirmation of the genotype.

To confirm the unlabeled-probe genotypes, we sequenced selected samples for each SNP marker. For markers that showed no variation, we sequenced the least clear-cut sample. For markers that showed variation, we sequenced one sample of each genotype, again choosing the least clear-cut samples. The sequencing results matched the unlabeled-probe genotypes in 100% of cases. Twenty-four of the markers most important for the south Indian population were tested on the 192 samples with the SNP short technique. Only one operator error was made in the resulting 4,608 genotypes (24 markers × 192 samples).

High-throughput genotyping with unlabeled probes is fast: analysis takes only five minutes after PCR. The unlabeled probe is a 20- to 30-nucleotide oligonucleotide with a blocked 3' end. It is very stable. Because it carries no fluorophore, it is not light-sensitive and can be stored at room temperature for several years. Unlabeled probe design does not need to consider GC content, which gives it more flexibility than TaqMan probes, hybridization probes, and Simple probe. An unlabeled probe also costs significantly less than a fluorescently labeled probe, and the data analysis is simple.
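The genotype call described above (compare the two allele-specific probes and take the one that melts higher) can be sketched in a few lines. This is a minimal illustration, not the CTWTool software used in the study. The sketch takes three steps:

1. Estimate -dF/dT by finite differences.
2. Take the temperature of the tallest derivative point inside a probe-melting window as the probe Tm.
3. Call the allele whose probe melts higher.

The function names, the 55-75 °C probe window, and the allele labels G and A are hypothetical. So is the synthetic two-transition melting curve, which uses a 30% probe and 70% amplicon signal. The 2 °C minimum Tm difference for a confident call is also an assumption, chosen to sit below the 3 to 5 °C match/mismatch difference reported above.

```python
import math
from typing import List, Optional, Sequence, Tuple


def neg_derivative(temps: Sequence[float], fluor: Sequence[float]) -> List[float]:
    """Estimate -dF/dT: central differences inside, one-sided differences at the ends."""
    n = len(temps)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append(-(fluor[hi] - fluor[lo]) / (temps[hi] - temps[lo]))
    return out


def probe_tm(temps: Sequence[float], fluor: Sequence[float],
             window: Tuple[float, float]) -> float:
    """Temperature of the tallest -dF/dT point inside the probe-melting window."""
    deriv = neg_derivative(temps, fluor)
    lo, hi = window
    _, tm = max((d, t) for t, d in zip(temps, deriv) if lo <= t <= hi)
    return tm


def call_genotype(tm_a: float, tm_b: float, allele_a: str, allele_b: str,
                  min_delta: float = 2.0) -> Optional[str]:
    """Hemizygous (chromosome Y) call: the probe that melts higher is the perfect match.

    Returns None when the two probe Tms differ by less than min_delta (degC),
    i.e. the sample is ambiguous and should be re-run or sequenced.
    """
    delta = tm_a - tm_b
    if abs(delta) < min_delta:
        return None
    return allele_a if delta > 0 else allele_b


def synthetic_curve(probe_tm_c: float, amplicon_tm_c: float = 82.0,
                    width: float = 0.6) -> Tuple[List[float], List[float]]:
    """Hypothetical curve with probe and amplicon transitions, 50-90 degC in 0.1 degC steps."""
    temps = [50.0 + 0.1 * i for i in range(401)]

    def still_annealed(t: float, tm: float) -> float:
        # Logistic fraction of duplex remaining at temperature t.
        return 1.0 / (1.0 + math.exp((t - tm) / width))

    fluor = [0.3 * still_annealed(t, probe_tm_c) + 0.7 * still_annealed(t, amplicon_tm_c)
             for t in temps]
    return temps, fluor


window = (55.0, 75.0)
temps, f_g = synthetic_curve(probe_tm_c=66.0)  # probe for allele G: perfect match
_, f_a = synthetic_curve(probe_tm_c=62.0)      # probe for allele A: mismatch
print(call_genotype(probe_tm(temps, f_g, window), probe_tm(temps, f_a, window), "G", "A"))
# prints: G
```

Returning `None` instead of forcing a call mirrors the confirmation step described above, where the least clear-cut samples were the ones selected for sequencing.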
Overall, we genotyped 35 chromosome Y SNP markers in 192 samples, for a total of 6,720 genotypes (35 × 192).

DNA pooling by unlabeled probe

To determine the sensitivity of unlabeled probes for allele fraction measurement, wild-type human genomic DNA (CFTR 542 G) and cystic fibrosis mutant DNA (CFTR 542 T) were mixed in different ratios. We first created mixtures containing 0-100% wild-type DNA in 10% increments. As the proportion of wild-type DNA increases, the height of the perfect-match (G::C) probe peak increases and the height of the mismatch peak clearly decreases (Figure 3A). We also prepared mixtures in 2% increments of wild-type DNA over the ranges 0-10% (Figure 3B), 20-30%, 45-55% (Figure 3C), 70-80%, and 90-100%. With each 2% increase in wild-type DNA, the perfect-match peak height increases and the mismatch peak height decreases consistently. The estimated and actual allele frequencies are shown in Table 2. Allele fractions determined with the unlabeled probe were within 2% of the actual values.

Ninety-six individually genotyped human genomic DNA samples were mixed into a single pool. Each probe targets a SNP with two possible alleles, and the probe for either allele could be used to measure allele frequency. (Bob Palais is going to write this part.)
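The pooling analysis software is not described in this section. What follows is therefore only a hypothetical sketch of one way to turn derivative melting curves into an allele fraction, not the software developed for this study. The sketch makes the following assumptions:

- Within a probe-melting window, the mismatched probe duplex melts below a chosen split temperature and the perfectly matched duplex melts above it.
- -dF/dT is integrated on each side of the split with the trapezoid rule. This uses peak areas rather than the peak heights discussed above.
- The matched share of the total area is the raw allele fraction.
- Because the two peaks overlap, the raw value is biased. It is corrected with a least-squares line fitted to standards of known composition, analogous to the wild-type/mutant mixing series described above.

The function names, the 55-73 °C window, the 64 °C split, and the synthetic logistic melting curves are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def neg_derivative(temps: Sequence[float], fluor: Sequence[float]) -> List[float]:
    """Estimate -dF/dT: central differences inside, one-sided differences at the ends."""
    n = len(temps)
    return [-(fluor[min(i + 1, n - 1)] - fluor[max(i - 1, 0)])
            / (temps[min(i + 1, n - 1)] - temps[max(i - 1, 0)]) for i in range(n)]


def raw_match_fraction(temps: Sequence[float], fluor: Sequence[float],
                       window: Tuple[float, float], split: float) -> float:
    """Share of the probe -dF/dT area above `split` (the perfect-match side)."""
    d = neg_derivative(temps, fluor)
    lo, hi = window
    low = high = 0.0
    for i in range(len(temps) - 1):
        t0, t1 = temps[i], temps[i + 1]
        if t0 < lo or t1 > hi:
            continue  # outside the probe-melting window
        area = 0.5 * (d[i] + d[i + 1]) * (t1 - t0)  # trapezoid rule
        if 0.5 * (t0 + t1) < split:
            low += area   # mismatched duplex melts at the lower temperature
        else:
            high += area  # perfectly matched duplex melts higher
    return high / (high + low)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def calibrated_fraction(raw: float, a: float, b: float) -> float:
    """Invert the calibration raw = a + b * true and clamp the result to [0, 1]."""
    return min(1.0, max(0.0, (raw - a) / b))


def mixture_curve(match_fraction: float, tm_match: float = 66.0,
                  tm_mismatch: float = 62.0, width: float = 0.6):
    """Hypothetical probe-region melting curve of a DNA mixture, 50-90 degC in 0.1 degC steps."""
    temps = [50.0 + 0.1 * i for i in range(401)]

    def still_annealed(t: float, tm: float) -> float:
        return 1.0 / (1.0 + math.exp((t - tm) / width))

    fluor = [match_fraction * still_annealed(t, tm_match)
             + (1.0 - match_fraction) * still_annealed(t, tm_mismatch) for t in temps]
    return temps, fluor


WINDOW, SPLIT = (55.0, 73.0), 64.0
standards = [i / 10 for i in range(11)]  # 0%, 10%, ..., 100% perfectly matched allele
raws = [raw_match_fraction(*mixture_curve(p), WINDOW, SPLIT) for p in standards]
a, b = fit_line(standards, raws)
unknown = raw_match_fraction(*mixture_curve(0.23), WINDOW, SPLIT)
print(f"raw {unknown:.3f}, calibrated {calibrated_fraction(unknown, a, b):.3f}")
```

For a synthetic pool containing 23% of the matched allele, the raw area ratio comes out near 0.25, because the tails of the overlapping peaks cross the split. The calibrated value returns close to 0.23. This is why a calibration series of known mixtures is useful in this kind of analysis.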