Large Scale SNP Scanning on Human Chromosome Y
and DNA Pooling Study by Unlabeled Probe
Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah 84132
Abstract
High-throughput SNP scanning is an important tool for genome studies. We have used
synthetic PCR constructs to demonstrate the detection of all possible SNP base changes.
LCGreen Plus was included in the PCR reaction and high-resolution melting analysis was
performed five minutes after amplification. In all cases heterozygotes were easily
identified because the resulting heteroduplex, formed by the probe oligonucleotide and
the amplicon, altered the shape of the melting curve. Analysis of known mutations using
high-resolution melting analysis and unlabeled probes is simple, rapid, and inexpensive.
This only requires PCR, an unlabeled oligonucleotide, LCGreen Plus, and melting
instrumentation. This method works on the single-sample HR-1, the 384-sample
LightScanner and the LightCycler. Chromosome Y is an effective and simple target for
evolution studies. Thirty-five SNP markers, distributed along the human Y chromosome,
have been characterized in 192 individuals of south India on a 384-well LightScanner.
DNA pooling is a practical way to reduce the cost of large-scale evolution or association
studies. Pooling allows the population allele frequencies to be measured using far fewer
PCR reactions and genotyping assays than required when genotyping individuals one by
one. We have developed an unlabeled probe/high resolution melting methodology
together with analysis software to determine SNP frequencies in the pooled DNA sample.
Different ratios of complementary and mismatched amplicon strands from 0% to 100%
were mixed and melted and software was optimized using this model system. We
repeated this analysis using two genomic DNAs, one homozygous wild type (G) and one
homozygous mutant (T) at the G542X site of the cystic fibrosis gene. When the two were
mixed in different ratios and analyzed using this methodology, the software correctly
determined the allele ratio in the mixture to an accuracy of 2% over the range of 0% to
100% of one allele. This
method was also applied to a pool of ninety-six human genomic DNA samples, which
previously had been genotyped individually at eight SNP markers on chromosome Y.
The analysis software was able to determine the allele frequencies to within 2% accuracy
across a range of frequencies from 3% to 23%. This method is very simple, fast and
inexpensive for the determination of SNP allele fraction.
Introduction
Single nucleotide polymorphisms (SNPs) are the most common source of human genetic
variation. Genotyping large numbers of SNPs in linkage, association and evolution
studies will aid in the understanding of complex disease traits, including many
common human diseases, drug responses and human evolution (1). These applications
require reliable and economical methods for high-throughput SNP genotyping.
SNP genotyping methods include gel-based and non-gel-based genotyping.
Single-strand conformation polymorphism analysis (2) is one of the most widely used
gel-based methods for mutation detection. The Oligonucleotide Ligation Assay (OLA) and
mini-sequencing (3) are also gel-based. Gel-based genotyping methods are still widely
used in many labs for small numbers of samples, though they are labor intensive and
require experience and technical skill for analysis. Non-gel-based high-throughput
genotyping techniques are developing rapidly. Pyrosequencing, which uses single-base
extension with fluorescence detection, and DNA microarray genotyping can handle
large numbers of SNPs. High-throughput genotyping methods that use fluorescently
labeled oligonucleotides include TaqMan (4), Hybridization probe (5), Simple probe
(6), Invader assay (7) and allele-specific ligation (8). We developed a
non-gel-based genotyping technique. This technique uses a probe without a fluorescent
label in conjunction with homogeneous melting of PCR products in the presence of a
double-stranded DNA dye called LCGreen Plus. The probe is an oligonucleotide blocked at
the 3’ end. In this paper, we apply this unlabeled probe technique to high-throughput
genotyping. The PCR may be performed on any 384-well thermal cycler, and the melting
is carried out on an inexpensive instrument called the “LightScanner.” Chromosome Y
evolution is a good application for large-scale SNP genotyping with unlabeled probes:
35 SNPs were genotyped in 192 samples from south India. Genome-wide
association studies are necessary to identify genes underlying certain complex diseases.
The genes for many genetic diseases have yet to be located on the human genome, for
reasons that include multiple loci and incomplete penetrance. To pinpoint these loci to
particular regions of the chromosomes, association studies, which compare allele
frequencies between affected individuals (probands) and controls, must be performed
across the entire human genome. With approximately 0.4 cM between markers, 10,000
microsatellite markers would be necessary to fully saturate the genome (). For a study of
1000 probands and 1000 controls, 20 million genotypings would be required (). DNA
pooling could greatly reduce the genotyping burden and speed up the initial gene
mapping studies. A few techniques have been used to measure SNP allele fraction in
pooled DNA, such as amplification and cleavage at the SNP site (), primer extension (),
amplification with allele-specific primers (), detection of conformational changes (),
hybridization of PCR products to microarrays (), DHPLC () and Pyrosequencing (). The
allele frequency estimates from these techniques are accurate to within about 2-5%. We
have developed the unlabeled-probe technology to measure SNP allele fraction. This
technique is fast, easy to design and inexpensive. Its sensitivity and accuracy are
between 1% and 2%.
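The genotyping burden quoted above can be reproduced with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the pooled design (one pool of probands, one pool of controls, one assay per pool per marker) is an assumption, not a design stated in this paper.

```python
def individual_genotypings(n_markers, n_probands, n_controls):
    """Assays needed when every individual is typed at every marker."""
    return n_markers * (n_probands + n_controls)


def pooled_assays(n_markers, n_pools=2, replicates=1):
    """Assays needed when each pool is typed once (or a few times) per marker.

    n_pools=2 and replicates=1 are assumptions made for this illustration.
    """
    return n_markers * n_pools * replicates


markers = 10_000  # markers needed to saturate the genome, as quoted in the text
individual = individual_genotypings(markers, 1000, 1000)
pooled = pooled_assays(markers)

print(individual)            # 20000000, the 20 million quoted in the text
print(pooled)                # 20000 under the assumed two-pool design
print(individual // pooled)  # 1000-fold fewer assays
```

Replicate measurements per pool would raise the pooled count proportionally, but it remains far below the individual-genotyping total.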
Method
DNA sample of chromosome Y
The 192 DNA samples used in the analysis were collected in Tamil Nadu, South India, and
came from Brian Mowry’s lab at the Queens Centre for Mental Health Research, Wacol,
Brisbane, Australia. All samples are from control individuals who were collected as part
of a larger study of complex disease and are not associated with known disease
phenotypes. DNA was extracted from established cell lines using standard protocols.
Genotyping SNP Markers on Chromosome Y
The protocols for genotyping many of the 237 polymorphic sites that have been analyzed
on chromosome Y have been published (Underhill et al. 2000, 2001; Hammer et al.
2001). Thirty-five SNP markers were chosen for the south Indian evolution study. The 35
markers are listed in Table 1.
Multiplex PCR
Four- to six-fold multiplex PCR was used for the first PCR. Multiplex PCR was
performed on the Peltier Thermal Cycler PTC-200 (MJ Research) in 96-well plates. The
PCR reaction contained 1.5 µM Mg++, 0.4 U Taq polymerase, 2 mM dNTP, 0.5 µM of the 4 to
6 multiplexed forward and reverse primers and 12.5 ng human genomic DNA. The PCR
conditions were 94 °C for 3 minutes followed by 25 cycles of 94 °C for 15 seconds,
52 °C for 15 seconds and 72 °C for 15 seconds. One-thousandth of the multiplex PCR
product was used as template for nested asymmetric PCR to amplify an individual marker
with its own probe. Each marker has two alleles, and probes matching both alleles were
used for SNP typing. The nested PCR reaction contained 2.0 µM Mg++, 0.4 U Taq
polymerase, 2 mM dNTP, 0.05 µM forward primer, 0.5 µM reverse primer, 0.5 µM probe
with the 3’ end phosphorylated and 1/1000 of the multiplex PCR product. The PCR was
performed on the same thermal cycler in 384-well plates under the following conditions:
94 °C for 2 minutes followed by 25 cycles of 94 °C for 5 seconds, 52 °C for 5 seconds
and 72 °C for 10 seconds.
Data analyses of SNP genotyping
After PCR, melting curve analysis was performed on the LightScanner. Samples were
melted from 50 °C to 90 °C at a rate of 0.1 °C/second in Automatic mode. The process
takes only 5 minutes. The software CTWTool-1-18-03 was used to analyze the melting
curve data. Probes for the two alleles were run side by side for each sample, and the
genotype was determined by comparing the melting curves of the two probes.
Genomic DNA allele fraction and DNA pooling
The cystic fibrosis mutation G542X is a single base change in exon 11 (G to T).
Human genomic DNA, homozygous wild type and homozygous mutant at this site, was used
for the allele fraction study. Wild type DNA (CFTR 542 G) and homozygous mutant DNA
(CFTR 542 T) were mixed in ratios from 0% to 100% in 10% increments, and in 2%
increments from 0 to 10%, 20 to 30%, 45 to 55%, 70 to 80% and 90 to 100%. A wild type
probe with a 3’ end phosphate (-P) was used for the allele fraction test,
5’–CAATATAGTTCTTGGAGAAGGTGGAATC-P-3’. The primers and asymmetric PCR conditions were
described by Zhou et al ().
Ninety-six samples of genotyped human genomic DNA were pooled together. 50 ng of
pooled DNA was used to determine the population frequency with the unlabeled-probe
technique. By comparing the estimated allele fraction with the actual allele fraction, we
were able to determine the sensitivity of this technique.
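As an illustration of the comparison between estimated and actual allele fraction, the sketch below computes the actual fraction in a pool from the individual genotypes. It assumes that every sample contributes an equal amount of DNA to the pool and that each sample carries a single Y-chromosome allele. The function names, the example genotypes and the example estimate are hypothetical, not data from this study.

```python
def pool_allele_fraction(genotypes, allele):
    """Fraction of pooled chromosomes carrying `allele`.

    Assumes haploid (Y chromosome) genotypes and equal DNA input per sample.
    """
    if not genotypes:
        raise ValueError("the pool must contain at least one sample")
    return sum(1 for g in genotypes if g == allele) / len(genotypes)


def absolute_error(estimated, actual):
    """Absolute difference between estimated and actual allele fraction."""
    return abs(estimated - actual)


# Hypothetical pool of 96 samples: 3 carry allele "A", 93 carry allele "G".
genotypes = ["A"] * 3 + ["G"] * 93
actual = pool_allele_fraction(genotypes, "A")

estimated = 0.04  # hypothetical estimate from the melting analysis
error = absolute_error(estimated, actual)
print(f"actual={actual:.4f} estimated={estimated:.4f} error={error:.4f}")
```

With 96 samples, each individual changes the pool fraction by about 1%, so an accuracy of 2% corresponds to roughly two individuals in the pool.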
Plots of –dF/dT were generated from the melting curve analysis by the software.
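The –dF/dT plot is the negative derivative of fluorescence with respect to temperature, so a melting transition appears as a peak at the melting temperature. The sketch below is a minimal illustration using central differences on a synthetic sigmoid curve. It is not the CTWTool software, and the function names and the synthetic data are assumptions made for the example.

```python
import math


def negative_derivative(temps, fluor):
    """Return interior temperatures and -dF/dT computed by central differences."""
    if len(temps) != len(fluor) or len(temps) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mid_temps, deriv = [], []
    for i in range(1, len(temps) - 1):
        d_temp = temps[i + 1] - temps[i - 1]
        d_fluor = fluor[i + 1] - fluor[i - 1]
        mid_temps.append(temps[i])
        deriv.append(-d_fluor / d_temp)
    return mid_temps, deriv


def peak_temperature(temps, deriv):
    """Temperature at which -dF/dT is largest (a simple estimate of Tm)."""
    best = max(range(len(deriv)), key=deriv.__getitem__)
    return temps[best]


# Synthetic melting curve: fluorescence falls sigmoidally around 65 °C.
temps = [50 + 0.5 * i for i in range(81)]  # 50 to 90 °C in 0.5 °C steps
fluor = [1.0 / (1.0 + math.exp(t - 65.0)) for t in temps]

mid_temps, curve = negative_derivative(temps, fluor)
print(peak_temperature(mid_temps, curve))  # 65.0
```

Real melting data are noisy, so smoothing before differentiation would normally be needed; that step is omitted here.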
Software for allele fraction (Bob Palais)
Results
SNP marker selection
Over the past 15 years, DNA polymorphisms have been widely used to reconstruct
human evolutionary history. Mitochondrial DNA was originally used for this purpose,
because the high mutation rate produced numerous polymorphisms and the absence of
recombination facilitated their interpretation. Thirty-five SNP markers that represent a
set of sequence variants from the south Indian population were chosen to carry out the
genotyping.
Multiplex PCR
In human genetic studies, such as searches for genetic disease genes and tumor
suppressor genes and studies of human evolution, the supply of human genomic DNA is
always limited. Multiplex PCR can amplify different loci at the same time from the same
amount of human DNA, consequently saving large quantities of human genomic DNA.
Multiplex PCR can simultaneously amplify up to ten different amplicons. In this paper
we focus on using unlabeled probes to genotype SNPs, so the multiplex PCR is only
performed “six-times deep.” The purpose of multiplex PCR is to enrich the loci that need
to be genotyped. Genotyping the individual loci by nested PCR is then straightforward
(Figure 1).
Asymmetric PCR and unlabeled probe
There are many different techniques to detect mutations or SNPs through the use of
probes. TaqMan, Hybridization probe and Simple probe are the most common
techniques. These techniques need one or two fluorescent labels at the end of the probe.
Zhou et al. developed an unlabeled probe technique (). The key to this technique is
asymmetric PCR with an unlabeled probe and melting with the double-stranded DNA dye
LCGreen Plus. Asymmetric PCR amplifies one strand much more than the complementary
strand. The probe runs in the opposite direction to the overproduced strand, so it can
anneal to it, and its 3’ end is blocked. When the products are renatured after PCR with
the unlabeled probe present, the amplicon strand with the same orientation as the probe
competes with the probe for annealing to the opposite strand. The melting curve after
symmetric PCR therefore shows the amplicon peak but not the probe peak. After asymmetric
PCR, the forward and reverse amplicon strands anneal during renaturation, leaving an
excess of single-stranded DNA. These single strands anneal to the unlabeled probe, so
the melting curve after asymmetric PCR shows both the probe peak and the amplicon peak.
Melting a 384-well plate takes only five minutes.
Genotype determination
Every SNP marker on chromosome Y has two alleles, and we used probes for both.
Each male carries a single Y chromosome, so only one allele is present and the probe
melting curve shows only one peak, either a 100% perfect match peak or a 100% mismatch
peak. When the melting curves of the two probes are compared, the perfect match probe
melts 3 to 5 °C higher than the mismatch probe. Hence, the probe displaying the higher
melting temperature determines the genotype (Figure 1). The two probes’ melting curves
are clear and easy to distinguish. The amplicon melting curves also differ between
genotypes if the SNP is a small deletion (Figure 2) or an A, T vs. C, G change
(Figure 3). This allows for a double confirmation of the genotyping.
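The decision rule described above (the probe with the higher melting temperature is the perfect match) can be sketched as follows. This is only an illustration: the function name, the `min_delta` threshold and the example temperatures are hypothetical, and the analysis in this study was done by comparing melting curves by eye with the software named in the Method section.

```python
def call_genotype(tm_by_probe, min_delta=1.0):
    """Call a haploid genotype from the probe melting temperatures.

    tm_by_probe maps each allele to the Tm (in °C) of the probe designed
    for that allele. The allele whose probe melts higher is reported.
    min_delta is an assumed safety margin: if the two Tm values differ by
    less than this, the call is left undetermined (None).
    """
    if len(tm_by_probe) != 2:
        raise ValueError("expected exactly two probes, one per allele")
    (allele_1, tm_1), (allele_2, tm_2) = tm_by_probe.items()
    if abs(tm_1 - tm_2) < min_delta:
        return None
    return allele_1 if tm_1 > tm_2 else allele_2


# Hypothetical sample: the C probe melts about 4 °C above the T probe.
print(call_genotype({"C": 66.2, "T": 62.1}))  # C
print(call_genotype({"C": 64.0, "T": 64.3}))  # None
```

Since the text reports a 3 to 5 °C difference between matched and mismatched probes, a margin of about 1 °C leaves room for measurement noise without rejecting true calls.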
To confirm the genotyping by the unlabeled probe technique, we chose samples
from each SNP marker for sequencing. Samples were chosen as follows: for SNP markers
that showed no variation, we chose the least clear sample; for SNP markers that showed
variation, we chose one sample of each genotype, again selecting the least clear ones.
The sequencing results agreed with the unlabeled probe genotypes in 100% of cases.
Twenty-four of the most important markers for the south Indian population were tested on
192 samples by the SNP short technique. Only one operator error was made in 4,608
genotyping assays (24 markers × 192 samples).
High-throughput genotyping with unlabeled probes is fast. Melting takes only five
minutes after PCR. The unlabeled probe is a 20 to 30 base oligonucleotide with the 3’ end
blocked. It is very stable and can be stored at room temperature for a few years, with
no sensitivity to light. The unlabeled probe design does not need to consider
the GC content, which gives it more flexibility than the TaqMan probe, Hybridization
probe and Simple probe. The cost of an unlabeled probe is significantly lower than that
of a fluorescently labeled probe. The data analysis is very simple as well. On
chromosome Y, we have genotyped 35 SNP markers on 192 samples, which equals 6,720
genotypes.
DNA pooling by unlabeled probe
An allele fraction study was done to determine the sensitivity of unlabeled probes.
Human genomic wild type DNA (CFTR 542 G) and cystic fibrosis mutant DNA (CFTR 542 T)
were mixed in different ratios. We created mixtures in 10% increments of wild type DNA
from 0-100% of genomic DNA. As the proportion of wild type DNA increases, the height of
the perfect match (G::C) peak increases and the height of the mismatch peak clearly
decreases (Figure 3A). We also made mixtures in 2% increments of wild type genomic DNA
from 0-10% (Figure 3B), 20-30%, 45-55% (Figure 3C), 70-80%, and 90-100%. With each 2%
increase in wild type DNA, the perfect match peak height consistently increases and the
mismatch peak height consistently decreases. The estimated and actual allele frequencies
are shown in Table 2. Allele fractions can be determined by unlabeled probe to within 2%.
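One simple way to turn the two probe peak heights into an allele fraction is to calibrate the peak-height ratio against mixtures of known composition and interpolate. The sketch below illustrates that idea only. It is not the analysis software used in this study, and the function names, the calibration standards and the peak heights are all hypothetical.

```python
def peak_ratio(h_match, h_mismatch):
    """Share of the total probe signal found in the perfect match peak."""
    total = h_match + h_mismatch
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return h_match / total


def estimate_fraction(ratio, standards):
    """Interpolate an allele fraction from calibration standards.

    standards is a list of (known_fraction, measured_ratio) pairs in which
    the measured ratio rises with the known fraction. Ratios outside the
    calibrated range are clamped to the nearest standard.
    """
    points = sorted(standards, key=lambda p: p[1])
    if ratio <= points[0][1]:
        return points[0][0]
    if ratio >= points[-1][1]:
        return points[-1][0]
    for (f0, r0), (f1, r1) in zip(points, points[1:]):
        if r0 <= ratio <= r1:
            if r1 == r0:
                return (f0 + f1) / 2
            return f0 + (f1 - f0) * (ratio - r0) / (r1 - r0)


# Hypothetical standards: known wild type fraction vs. measured peak ratio.
standards = [(0.0, 0.05), (0.5, 0.55), (1.0, 0.95)]

ratio = peak_ratio(3.0, 7.0)  # hypothetical peak heights from a mixture
fraction = estimate_fraction(ratio, standards)
print(round(fraction, 3))
```

Calibration is used here because the two peaks need not have equal height at a 50:50 mixture; a raw peak ratio alone would then be biased.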
Ninety-six samples of genotyped human genomic DNA were mixed as a “pool.” There
are two possible alleles for each SNP, and a probe for either allele could be used to
test allele frequency. (Bob Palais is going to write this part).