ChroY_revised2 - Department of Mathematics, University of Utah

Large Scale SNP Scanning on Human Chromosome Y
and DNA Pooling Study Using Unlabeled Probes
Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah 84132
Abstract
High-throughput SNP scanning is an important tool for genome studies. Genotyping of
known mutations and scanning for unknown ones using high-resolution melting analysis
and unlabeled probes is simple, rapid, and inexpensive, requiring only PCR, an unlabeled
oligonucleotide, LCGreen Plus, and melting instrumentation. This method works on the
single-sample HR-1, the 384-sample LightScanner and the LightCycler. We have used
synthetic PCR constructs to demonstrate the detection of all possible SNP base changes
by high-resolution melting analysis. In all cases heterozygotes were easily identified
because the resulting heteroduplex, formed by the probe oligonucleotide and the
mismatched amplicon strand, altered the shape of the melting curve. Chromosome
Y is an effective and simple target for evolution studies. Thirty-five SNP markers
distributed along the human Y chromosome have been characterized in 192 individuals
from south India on a 384-well LightScanner. DNA pooling is a practical way to reduce
the cost of large-scale evolution or association studies. Pooling allows the population
allele frequencies to be measured using far fewer PCR reactions and genotyping assays
than required when genotyping individuals one by one. We have developed an unlabeled
probe/high resolution melting methodology together with analysis software to determine
SNP frequencies in a pooled DNA sample. Different ratios of complementary and
mismatched amplicon strands from 0% to 100% were mixed and melted and
quantification software was optimized (calibrated?) using this model system. We
repeated this analysis using two genomic DNA samples, one homozygous wild type and one
homozygous for a G to T mutation in the cystic fibrosis gene. When the samples were mixed
in different ratios and analyzed using this methodology, the software correctly determined
the fraction of each allele in the mixture to within 2% over the range of 0% to 100% of one allele.
This method was also applied to a pool of ninety-six human genomic DNA samples,
which previously had been genotyped individually at eight SNP markers on chromosome
Y. The analysis software was able to determine the allele frequencies to within 2%
accuracy across a range of frequencies from 3% to 23%. This method is very simple, fast
and inexpensive for the determination of SNP allele fraction.
Introduction
Single nucleotide polymorphisms (SNPs) are the most common source of human genetic
variation. Genotyping large numbers of SNPs in linkage, association and
evolution studies will aid in the understanding of complex disease traits, including many
common human diseases, drug responses and human evolution (1). These applications
require reliable and economical methods for high-throughput SNP genotyping.
SNP genotyping methods include gel-based genotyping and non-gel-based genotyping.
Single-strand conformation polymorphism analysis (2) is one of the most widely used
gel-based methods for mutation detection. Oligonucleotide Ligation Assay (OLA) and
minisequencing (3) are also gel-based genotyping techniques. Gel-based genotyping methods
are still widely used in many labs for small numbers of samples, though they are labor
intensive and require experience and technical skill for analysis. Non-gel-based
high-throughput genotyping techniques are developing rapidly. Pyrosequencing, which
detects nucleotide incorporation by light emission, and DNA microarray genotyping
can genotype large numbers of SNPs simultaneously. High-throughput genotyping
methods using fluorescently labeled oligonucleotides include TaqMan (4), Hybridization
probe (5), Simple probe (6), Invader assay (7) and allele-specific ligation (8) genotyping.
Previously, we developed a non-gel-based genotyping technique that does not require
fluorescently labeled probes (Refs here?). This technique uses melting of unlabeled
oligonucleotide probes and PCR products in the presence of a high-resolution
double-stranded DNA dye, LCGreen Plus. In addition, a 3’ end blocked oligonucleotide is used
(the probe? Asymmetric PCR?). In this paper, we use this technique for high-throughput
genotyping and genome-wide association studies using DNA pooling. The PCR may be
performed on any 384-well thermocycler, and the melting carried out on the inexpensive
“LightScanner” machine.
Genome-wide association studies are necessary to identify genes underlying certain
complex diseases. The genes for many genetic diseases have yet to be located on the human
genome for reasons that include their multiple loci and incomplete penetrance. To pinpoint these
loci to particular regions of the chromosomes, association studies, which
compare allele frequency between affected individuals (probands) and controls, must be
performed across the entire human genome. With approximately 0.4 cM between
markers, 10,000 microsatellite markers would be necessary to fully saturate the genome
(). For a study of 1000 probands and 1000 controls, 20 million genotypings would be
required (). DNA pooling could greatly reduce the genotyping burden and speed up the
initial gene mapping studies. Techniques previously used for analyzing SNP allele
fraction by DNA pooling include amplification and cleavage at the SNP site (), primer
extension (), amplification with allele-specific primers (), detection of conformational
changes (), hybridization of PCR products to microarrays (), DHPLC () and
Pyrosequencing (). Allele frequency estimates from these techniques are accurate to
within about 2-5%.
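The genotyping burden described above can be checked with a short calculation. This is only a sketch: the function name and the `pools_per_group` parameter are illustrative assumptions, and it assumes one pool each for probands and controls with no replicate pools, which a real study would likely add.

```python
def genotyping_counts(n_probands, n_controls, n_markers, pools_per_group=1):
    """Return (individual, pooled) assay counts for an association study.

    pools_per_group is a hypothetical parameter: the number of pools made
    from the probands and, separately, from the controls.
    """
    # Individual genotyping: every person is typed at every marker.
    individual = (n_probands + n_controls) * n_markers
    # Pooled genotyping: every pool is typed once at every marker.
    pooled = 2 * pools_per_group * n_markers
    return individual, pooled


individual, pooled = genotyping_counts(1000, 1000, 10_000)
print(individual, pooled)
```

With the figures quoted in the text (1000 probands, 1000 controls, 10,000 markers), individual genotyping needs 20 million assays, while a single pool per group needs 20,000.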
We demonstrate the technology we have developed for high-throughput genotyping
using unlabeled probes to study Chromosome Y evolution, by genotyping 35 SNPs in
192 samples from south India. We also use this method to measure SNP allele fraction
for the cystic fibrosis mutation (G542X) in pooled DNA amplified from samples of
known genotype. The technique is fast, easy to design and inexpensive, with sensitivity
and accuracy of 1-2%.
Method
DNA samples of chromosome Y
The 192 DNA samples used in the analysis were collected in Tamil Nadu, South India, and
were obtained from Brian Mowry’s lab at the Queens Centre for Mental Health Research,
Wacol, Brisbane, Australia. All samples are from control individuals that were collected
as part of a larger study of complex disease and are not associated with known disease
phenotypes. DNA was extracted from established cell lines using standard protocols.
Genotyping SNP Markers on Chromosome Y
The protocols for genotyping many of the 237 polymorphic sites which were analyzed on
chromosome Y have been published (Underhill et al. 2000, 2001; Hammer et al. 2001).
35 SNP markers were chosen for the south Indian evolution study. The 35 markers are
listed in Table 1.
Multiplex PCR
Four- to six-plex PCR was used for the first round of amplification. Multiplex PCR was
performed on a Peltier Thermal Cycler PTC-200 (MJ Research) in 96-well plates. The
PCR reaction contained 1.5uM Mg++, 0.4U Taq polymerase, 2mM dNTP, 0.5uM of the 4 to 6
multiplexed forward and reverse primers and 12.5ng human genomic DNA. The PCR
conditions were 94°C for 3 minutes followed by 25 cycles of 94°C for 15 seconds, 52°C for
15 seconds and 72°C for 15 seconds. One-thousandth of the multiplex PCR product was used for nested
asymmetric PCR to amplify an individual marker with its own probe. There were
two polymorphisms (alleles?) for each marker. Probes matching both genotypes were used for
the SNP typing. The nested PCR reaction contained 2.0uM Mg++, 0.4U Taq polymerase, 2mM dNTP,
0.05uM forward primer, 0.5uM reverse primer, 0.5uM probe with a phosphorylated 3’ end
and 1/1000 of the multiplex PCR product. The PCR was performed on the same thermal cycler
in 384-well plates under the following conditions: 94°C for 2 minutes followed by 25
cycles of 94°C for 5 seconds, 52°C for 5 seconds and 72°C for 10 seconds.
Data analyses of SNP genotyping
After PCR, melting curve analysis was performed on the LightScanner. The temperature was
raised from 50°C to 90°C at a rate of 0.1°C/second in the Automatic mode. The process
takes only 5 minutes (seems like 400 seconds = 6 2/3 minutes). The software CTWTool1-18-03
was used to analyze the melting curve data (using older software for this?). Probes for the
two genotypes were run side by side for genotyping. The genotype
was determined by comparing the melting curves of the two probes. (Manually? Wouldn’t
automatic clustering be more suited to high-throughput?)
Genomic DNA allele fraction and DNA pooling
Human genomic DNA of cystic fibrosis wild type (CFTR 542 G) and homozygous
mutant (CFTR 542 T) genotypes was used for the allele fraction study. The cystic
fibrosis mutation (G542X) is a single base change in exon 11 (G>T). The two genotypes
were mixed in ratios from 0% to 100% in 10% increments, and in 2% increments from 0 to
10%, 20 to 30%, 45 to 55%, 70 to 80% and 90 to 100%. A wild type probe with a 3’ end
phosphate (-P) was used for the allele fraction test: 5’–
CAATATAGTTCTTGGAGAAGGTGGAATC-P-3’. The primers and asymmetric PCR
conditions were described by Zhou et al. ().
Ninety-six samples of genotyped human genomic DNA were pooled together. 50ng
pooled DNA was used to determine the population frequency by the unlabeled probe
technique. By comparing estimated allele fraction and actual allele fraction we were able
to determine the sensitivity of this technique.
Plots of the derivative melting curves of the pooled samples, –dF/dT, were generated
from the melting curve analysis by the software.
Software for allele fraction (Bob Palais)
Background fluorescence was removed from the raw fluorescence vs. temperature data using
an exponential model for the background, fit to the slope of the raw fluorescence curve in
two temperature regions, one below and one above the probe melting temperatures for
both genotypes. The resulting melting curves were normalized to the 0-100% range and
differentiated using the polynomial least-squares fit (Savitzky-Golay) method. The unknown
sample was melted in the presence of unlabeled probes matching each genotype, and its probe
peak heights were compared with those of pure samples and of neighboring calibration
standards with known, synthetically prepared allele fractions melted under the same
conditions. Allele fraction was determined by linear interpolation between these standards,
and the values obtained were combined in an equally weighted average.
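The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the analysis software itself: the function names, the choice of fitting windows, the min/max normalization and the 5-point quadratic Savitzky-Golay window are all assumptions, since the text does not specify them. The background is modeled as B(T) = C exp(a (T - T_L)), so its slope at two temperatures determines a and C.

```python
import math


def window_slope(temps, fluor, lo, hi):
    """Least-squares slope of fluorescence vs. temperature inside [lo, hi].

    Returns (mean temperature of the window, slope).
    """
    pts = [(t, f) for t, f in zip(temps, fluor) if lo <= t <= hi]
    n = len(pts)
    mt = sum(t for t, _ in pts) / n
    mf = sum(f for _, f in pts) / n
    num = sum((t - mt) * (f - mf) for t, f in pts)
    den = sum((t - mt) ** 2 for t, _ in pts)
    return mt, num / den


def remove_exponential_background(temps, fluor, low_region, high_region):
    """Subtract B(T) = C*exp(a*(T - T_L)) fitted from slopes in two regions.

    Assumes both regions lie outside the probe melting transition, so the
    measured slope there is due to background only and has the same sign.
    """
    t_lo, s_lo = window_slope(temps, fluor, *low_region)
    t_hi, s_hi = window_slope(temps, fluor, *high_region)
    a = math.log(s_hi / s_lo) / (t_hi - t_lo)  # B'(T_R) / B'(T_L) = exp(a*(T_R - T_L))
    c = s_lo / a                               # B'(T_L) = a * C
    return [f - c * math.exp(a * (t - t_lo)) for t, f in zip(temps, fluor)]


def normalize(curve):
    """Rescale a curve to the 0-100% range (min/max scaling is an assumption)."""
    lo, hi = min(curve), max(curve)
    return [100.0 * (v - lo) / (hi - lo) for v in curve]


def savgol_neg_derivative(temps, curve):
    """-dF/dT by a 5-point quadratic Savitzky-Golay filter (uniform spacing)."""
    h = temps[1] - temps[0]
    weights = (-2, -1, 0, 1, 2)  # first-derivative coefficients, divided by 10*h
    out = []
    for i in range(2, len(curve) - 2):
        d = sum(w * curve[i + k] for w, k in zip(weights, range(-2, 3))) / (10 * h)
        out.append(-d)
    return temps[2:-2], out
```

Peak heights read from the resulting -dF/dT curves would then feed the interpolation against calibration standards.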
Results
SNP marker selection
Over the past 15 years, DNA polymorphisms have been widely used to reconstruct
human evolutionary history. Mitochondrial DNA was originally used for this purpose,
because the high mutation rate produced numerous polymorphisms, and the absence of
recombination facilitated their interpretation. Thirty-five SNP markers that represent a
set of sequence variants from the south Indian population were chosen to carry out the
genotyping.
Multiplex PCR
For human genetic studies, such as searches for human genetic disease genes and tumor
suppressor genes, and human evolution studies, human genomic DNA samples are always
limited. Multiplex PCR can amplify different loci at the same time, using the
same amount of human DNA, consequently saving large quantities of human genomic
DNA. Multiplex PCR can simultaneously amplify up to ten different
amplicons. In this paper we focus on using unlabeled probes to genotype SNPs;
consequently only six-plex PCR is performed. The purpose of multiplex PCR is to enrich
the loci that need to be genotyped. The multiplex enrichment is then
followed by nested asymmetric PCR, which allows the genotypes of individual loci to be
probed easily (Figure 1).
Asymmetric PCR and unlabeled probes
There are many different techniques to detect mutations or SNPs through the use of
probes. TaqMan, Hybridization probes and Simple probes are the most common
techniques. These techniques need one or two fluorescent labels at the end of the probe.
Zhou et al. have developed a technique to detect mutations using probes without
fluorescent labels (). The key to this technique is the use of asymmetric PCR and
melting of the product with an unlabeled probe in the presence of the high-resolution
double-stranded DNA dye, LCGreen Plus. Asymmetric PCR amplifies one strand
much more than the complementary strand. The probe is complementary to this excess strand,
and its 3’ end is blocked (to prevent extension). After PCR, when the unlabeled probe is
added and hybridization is promoted, the probe and the strand that shares its sequence
compete to anneal with the opposite strand. The derivative melting curve of a
symmetric PCR product shows the amplicon peak but not the probe peak. After
asymmetric PCR, when hybridization is promoted, annealing of forward and reverse
amplicon strands is limited by the lower concentration of one strand, leaving an excess of
the other strand as single-stranded DNA. These single strands of DNA anneal to the
unlabeled probe. After this process, the derivative melting curve of an asymmetric PCR
product has a lower temperature peak where probes melt from single amplicon strands,
and a higher temperature peak where amplicons melt. The process of 384-well plate
melting only takes five minutes.
Genotype determination
All SNP markers of chromosome Y have two allele types. We have typed both
possibilities. Genomic DNA from chromosome Y has only one allele so the probe
melting curve always shows only one peak, either a 100% perfect match peak or 100%
mismatch peak, in contrast with 50% of each in the case of heterozygous DNA from
other chromosomes. The melting temperature of the probe that is perfectly
complementary to the sample genotype is 3 to 5°C higher than that of the probe
with one base mismatched. Hence, the probe displaying the higher melting temperature
determines the genotype by complementarity (Figure 1). The matched and mismatched
probe melting curves may be easily distinguished visually by a human technician, or by
automatic clustering or classification. Different genotypes can also be distinguished
based on the amplicon melting curves if the SNP is a small deletion (Figure 2) or an A, T vs.
C, G change (Figure 3). This allows for a double confirmation of the genotyping.
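The two-probe comparison described above could be automated along the following lines. This is a hedged sketch rather than the software used in the study: the function names, the probe temperature window and the `min_delta` threshold for flagging ambiguous samples are hypothetical. Only the 3 to 5°C difference between matched and mismatched probes comes from the text.

```python
def peak_temperature(temps, neg_dfdt, window):
    """Temperature of the highest -dF/dT value inside the probe melting window."""
    lo, hi = window
    candidates = [(d, t) for t, d in zip(temps, neg_dfdt) if lo <= t <= hi]
    return max(candidates)[1]


def call_genotype(tm_probe_a, tm_probe_b, min_delta=1.5):
    """Call the allele whose probe melts at the higher temperature.

    min_delta (in degrees C) is an assumed cutoff: samples whose two probe
    melting temperatures differ by less than this are flagged for review.
    """
    delta = tm_probe_a - tm_probe_b
    if abs(delta) < min_delta:
        return "ambiguous"
    return "A" if delta > 0 else "B"


# Example: probe A melts 4 degrees C higher than probe B, so the sample carries allele A.
print(call_genotype(66.0, 62.0))
```

Because a Y-chromosome sample carries a single allele, exactly one of the two probes should be a perfect match, which is what makes this simple comparison sufficient.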
To confirm the genotyping obtained using the unlabeled probe technique, we chose
samples from each SNP marker for sequencing. Samples were chosen for
sequencing as follows: for a SNP marker without any variation, we chose
the most ambiguous sample; for a SNP marker with variation, we chose the
most ambiguous sample of each genotype. The sequencing results were in 100%
agreement with the genotyping results from unlabeled probes. Twenty-four of the most
important markers for the south Indian population were tested on 192 samples by SNP short
technique (?). Only one operator error was made out of 4,608 genotypes.
High-throughput genotyping with unlabeled probes is fast, taking about five
minutes after PCR. The unlabeled probe is a 20-30bp oligonucleotide with the 3’ end
blocked. The unlabeled probe is very stable: it can be stored at room temperature for a
few years with no light sensitivity or other degradation. Unlabeled probe design is not
sensitive to GC content, which gives it more flexibility than the TaqMan probe,
Hybridization probe and the Simple probe. The cost of an unlabeled probe is
significantly lower than a fluorescently labeled probe. The data analysis is very simple as
well. On chromosome Y, we have genotyped 35 SNP markers on 192 samples, for a total
of 6,720 genotypes determined.
DNA pooling by unlabeled probe
An allele fraction study was performed to determine the sensitivity of
unlabeled probes. Human genomic wild type DNA (CFTR 542 G) and cystic fibrosis
mutant DNA (CFTR 542 T) were mixed in different ratios. We created mixtures of
wild type and mutant genomic DNA from 0-100% wild type in 10% increments. As the
wild type DNA fraction increases, the perfect match (G::C) peak height increases and the
mismatch peak height clearly decreases (Figure 3A). We also prepared mixtures in 2% increments of
wild type genomic DNA from 0-10% (Figure 3B), 20-30%, 45-55% (Figure 3C), 70-80%, and
90-100%. With each 2% increase in wild type DNA, the perfect match peak height
consistently increases and the mismatch peak height decreases. The estimated
and actual allele frequencies are shown in Table 2. The allele fraction can be determined
by the unlabeled probe technique to within 2%.
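The way peak heights map to an allele fraction can be illustrated with a small interpolation routine. This is a sketch under stated assumptions: the function names are hypothetical, the calibration heights in the example are invented for illustration and are not data from this study, and peak height is assumed to vary monotonically with allele fraction between neighboring standards.

```python
def interpolate_fraction(height, standards):
    """Linearly interpolate allele fraction (%) from a probe peak height.

    standards: (allele_fraction_percent, peak_height) pairs measured on
    mixtures of known composition.
    """
    pts = sorted(standards)
    for (f0, h0), (f1, h1) in zip(pts, pts[1:]):
        if h0 != h1 and min(h0, h1) <= height <= max(h0, h1):
            return f0 + (f1 - f0) * (height - h0) / (h1 - h0)
    raise ValueError("peak height outside calibration range")


def estimate_allele_fraction(match_height, mismatch_height,
                             match_standards, mismatch_standards):
    """Equally weighted average of the estimates from the two probe peaks."""
    estimates = [
        interpolate_fraction(match_height, match_standards),
        interpolate_fraction(mismatch_height, mismatch_standards),
    ]
    return sum(estimates) / len(estimates)


# Illustrative calibration values only (not measured data).
match_std = [(0, 0.0), (50, 40.0), (100, 100.0)]
mismatch_std = [(0, 90.0), (50, 50.0), (100, 0.0)]
print(estimate_allele_fraction(20.0, 70.0, match_std, mismatch_std))
```

Using closely spaced standards, such as the 2% increments described above, keeps each interpolation interval short, which matters if the relationship between peak height and allele fraction is not linear over the full range.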
After calibrating the analysis software in this manner, 96 samples of human genomic
DNA, each of known genotype (one of two possible genotypes), were mixed as a “pool.”
Samples taken from the mixture were hybridized with probes complementary to each
genotype, and high-resolution melting and analysis were then performed. The known allele
fraction of the mixture was correctly estimated to within +/- 2%, based on the analysis
software’s comparison of the height of probe melting peaks of the mixture with those of
samples of pure genotype. The software is automatic and simple to use. It is also possible
to call the genotypes in the high-throughput evolution study automatically and with high
accuracy using the software. This is important when considering thousands, and
ultimately millions, of genotyping experiments.