Large Scale SNP Scanning on Human Chromosome Y
and DNA Pooling Study Using Unlabeled Probes
Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah 84132
Abstract
High-throughput SNP scanning is an important tool for genome studies. Genotyping of
known mutations and scanning for unknown ones using high-resolution melting analysis
and unlabeled probes is simple, rapid, and inexpensive, requiring only PCR, an unlabeled
oligonucleotide, LCGreen Plus, and melting instrumentation. This method works on the
single-sample HR-1, the 384-sample LightScanner and the LightCycler. We have used
synthetic PCR constructs to demonstrate the detection of all possible SNP base changes
by high-resolution melting analysis. In all cases heterozygotes were easily identified
because the heteroduplex formed by the probe oligonucleotide and the
mismatched amplicon strand altered the shape of the melting curve. Chromosome
Y is an effective and simple target for evolution studies. Thirty-five SNP markers
distributed along the human Y chromosome have been characterized in 192 individuals
from south India on a 384-well LightScanner. DNA pooling is a practical way to reduce
the cost of large-scale evolution or association studies. Pooling allows the population
allele frequencies to be measured using far fewer PCR reactions and genotyping assays
than required when genotyping individuals one by one. We have developed an unlabeled
probe/high resolution melting methodology together with analysis software to determine
SNP frequencies in a pooled DNA sample. Different ratios of complementary and
mismatched amplicon strands from 0% to 100% were mixed and melted and
quantification software was calibrated using this model system. We
repeated this analysis using two genomic DNA samples, one homozygous for each allele of
a G to T mutation (G542X) in the cystic fibrosis gene. When the samples were mixed in
different ratios and analyzed using this methodology, the software correctly determined
the fraction of each allele in the mixture to an accuracy of 2% over the range of 0% to
100% of one allele.
This method was also applied to a pool of ninety-six human genomic DNA samples,
which previously had been genotyped individually at eight SNP markers on chromosome
Y. The analysis software was able to determine the allele frequencies to within 2%
accuracy across a range of frequencies from 3% to 23%. This method is very simple, fast
and inexpensive for the determination of SNP allele fraction.
Introduction
Single nucleotide polymorphisms (SNPs) are the most common source of human genetic
variation. Genotyping large numbers of SNPs in linkage, association and
evolution studies will aid in the understanding of complex disease traits, including many
common human diseases, drug responses and human evolution (1, 2, 3). These
applications require reliable and economical methods for high-throughput SNP
genotyping.
SNP genotyping methods include gel-based genotyping and non-gel-based genotyping.
Single-strand conformation polymorphism analysis is one of the most widely used
gel-based methods for mutation detection (4). Oligonucleotide Ligation Assay (OLA) and
mini-sequencing are also gel-based genotyping techniques (5, 6). Gel-based genotyping
methods are still widely used in many labs for small numbers of samples, though they are
labor intensive and require experience and technical skill for analysis. Non-gel-based
high-throughput genotyping techniques are developing rapidly. Pyrosequencing, which
detects pyrophosphate release during single-base extension (7, 8), and DNA microarray
genotyping can genotype large numbers of SNPs simultaneously (9, 10, 11).
High-throughput genotyping methods using fluorescently labeled oligonucleotides include
TaqMan (12, 13), Hybridization probe (14, 15), Simple probe (16), Invader assay (17)
and allele-specific ligation (18) genotyping.
Previously, we developed a non-gel-based genotyping technique that does not need
fluorescently labeled probes (19, 20). This technique uses melting of unlabeled
oligonucleotide probes and PCR products in the presence of a high-resolution double
stranded DNA dye, LCGreen Plus. In addition, a 3’ end blocked oligonucleotide and
asymmetric PCR are used. In this paper, we use this technique for high-throughput
genotyping and genome-wide association studies using DNA pooling. The PCR may be
performed on any 384-well thermocycler, and the melting carried out on the inexpensive
“LightScanner” machine.
Genome-wide association studies are necessary to identify genes underlying certain
complex diseases. Many disease genes have yet to be located on the human genome
for reasons that include multiple loci and incomplete penetrance. To pinpoint these
loci to particular regions of the chromosomes, association studies, which
compare allele frequencies between affected individuals (probands) and controls, must be
performed across the entire human genome. With approximately 0.4 cM between
markers, 10,000 microsatellite markers would be necessary to fully saturate the genome.
For a study of 1000 probands and 1000 controls, 20 million genotypings would be
required (20). DNA pooling could greatly reduce the genotyping burden and speed up
the initial gene mapping studies (21, 22). Techniques previously used for analyzing SNP
allele fraction by DNA pooling include amplification and cleavage at the SNP site (23),
primer extension (24), amplification with allele-specific primers (25), detection of
conformational changes (26), hybridization of PCR products to microarrays (27), DHPLC
(28) and Pyrosequencing (29). Allele frequency estimates from these
techniques are accurate to within about 2-5%.
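The genotyping burden quoted above follows from simple multiplication. The short Python sketch below reproduces the 20 million figure and compares it with a pooled design. The pooled design (one pool for probands and one for controls, one assay per marker per pool) and the parameter name `pools_per_group` are illustrative assumptions, not details given in the text.

```python
def genotyping_assays(n_markers, n_probands, n_controls,
                      pooled=False, pools_per_group=1):
    """Count genotyping assays needed for an association study.

    pools_per_group is a hypothetical design parameter (assumption):
    the number of DNA pools prepared for each of the two groups.
    """
    if pooled:
        # One assay per marker per pool, two groups (probands and controls).
        return n_markers * 2 * pools_per_group
    # One assay per marker per individual.
    return n_markers * (n_probands + n_controls)


individual = genotyping_assays(10_000, 1000, 1000)
pooled = genotyping_assays(10_000, 1000, 1000, pooled=True)
print(individual)  # 20000000
print(pooled)      # 20000
```

Under these assumptions pooling reduces the assay count by a factor equal to the number of individuals per pool; replicate pools or replicate assays would raise the pooled count proportionally.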
We demonstrate the technology we have developed for high-throughput genotyping using
unlabeled probes to study Chromosome Y evolution, by genotyping 35 SNPs in 192
samples from south India. We also use this method to measure SNP allele fraction for the
cystic fibrosis mutation (G542X) in pooled DNA amplified from samples of known
genotype. The technique is fast, easy to design and inexpensive, with sensitivity and
accuracy of 1-2%.
Method
DNA samples of chromosome Y
The 192 DNA samples used in the analysis were collected in Tamil Nadu, South India,
and came from Brian Mowry’s lab at the Queens Centre for Mental Health Research,
Wacol, Brisbane, Australia. All samples are from control individuals who were
recruited as part of a larger study of complex disease and are not associated with
known disease phenotypes.
Genotyping SNP Markers on Chromosome Y
The protocols for genotyping many of the 237 polymorphic sites which were analyzed on
chromosome Y have been published (30, 31, 32). 35 SNP markers were chosen for the
south Indian evolution study. The 35 markers are listed in online supplementary table 1.
Multiplex PCR
Four- to six-plex PCR was used for the first round of PCR. Multiplex PCR was
performed on a Peltier Thermal Cycler PTC-200 (MJ Research) in 96-well plates. The
PCR reaction contained 1.5 µM Mg++, 0.4 U Taq polymerase, 2 mM dNTP, 0.5 µM of the
four to six multiplexed forward and reverse primers, and 12.5 ng human genomic DNA.
The PCR conditions were 94 °C for 3 minutes followed by 25 cycles of 94 °C for
15 seconds, 52 °C for 15 seconds and 72 °C for 15 seconds. One-thousandth of the
multiplex PCR product was used for nested asymmetric PCR to amplify an individual
marker with its own probe. There were two possible alleles for each marker, and probes
matching both alleles were used for the SNP typing. The nested PCR reaction contained
2.0 µM Mg++, 0.4 U Taq polymerase, 2 mM dNTP, 0.05 µM forward primer, 0.5 µM reverse
primer, 0.5 µM probe with a phosphorylated 3’ end, and 1/1000 of the multiplex PCR
product. The PCR was performed on the same thermal cycler in 384-well plates under the
following conditions: 94 °C for 2 minutes followed by 25 cycles of 94 °C for 5 seconds,
52 °C for 5 seconds, and 72 °C for 10 seconds.
Data analyses of SNP genotyping
After PCR, melting curve analysis was performed on the LightScanner. The temperature was
raised from 50 °C to 90 °C at a rate of 0.1 °C/second in the Automatic mode. The process
takes about 5 minutes. The software HiR2 was used to analyze the melting curve data.
Probes for the two genotypes were run side by side, and the genotype was
determined by comparing the melting curves of the two probes.
Genomic DNA allele fraction and DNA pooling
Human genomic DNA of cystic fibrosis wild type (CFTR 542 G) and homozygous
mutant (CFTR 542 T) genotypes was used for the allele fraction study. The cystic
fibrosis mutation (G542X) is a single base change on exon 11 (G>T). The two genotypes
were mixed in ratios from 0% to 100% in 10% increments and in 2% increments from 0 to
10%, 20 to 30%, 45 to 55%, 70 to 80% and 90 to 100%. A wild type probe blocked with a
3’ phosphate (-P) was used for the allele fraction test:
5’-CAATATAGTTCTTGGAGAAGGTGGAATC-P-3’. The primers and asymmetric PCR
conditions were described by Zhou et al (19).
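The mixture series described above can be enumerated with a few lines of Python. This is a minimal sketch; it assumes that each 2% series starts at the lower bound of its range (so the 45 to 55% series runs 45, 47, ..., 55), which the text does not state explicitly.

```python
def mixture_series():
    """Wild-type percentages for the allele fraction mixtures.

    Assumption: every 2% series starts at the lower bound of its range.
    """
    coarse = set(range(0, 101, 10))  # 0-100% in 10% increments
    fine_ranges = [(0, 10), (20, 30), (45, 55), (70, 80), (90, 100)]
    fine = {p for lo, hi in fine_ranges for p in range(lo, hi + 1, 2)}
    # Points shared by both series (for example 0% and 10%) are listed once.
    return sorted(coarse | fine)


series = mixture_series()
print(len(series))
print(series)
```

Points that belong to both the coarse and the fine series are counted once, so the list gives the number of distinct mixtures under this assumption.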
Ninety-six samples of genotyped human genomic DNA were pooled together. 50ng
pooled DNA was used to determine the population frequency by the unlabeled probe
technique. By comparing estimated allele fraction and actual allele fraction we were able
to determine the sensitivity of this technique.
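The comparison of estimated and actual allele fraction can be sketched as follows. Because the markers are on chromosome Y, each sample carries a single allele, so the actual fraction is a simple count. The sketch assumes that every sample contributes the same amount of DNA to the pool; the allele labels and counts below are hypothetical illustration values, not data from this study.

```python
def pool_allele_fraction(genotypes, allele):
    """Fraction of a haploid (Y-chromosome) pool carrying `allele`.

    Assumes each sample contributes an equal amount of DNA to the pool.
    """
    return sum(1 for g in genotypes if g == allele) / len(genotypes)


def within_tolerance(estimated, actual, tol=0.02):
    """True when the estimate lies within `tol` (absolute) of the actual value."""
    return abs(estimated - actual) <= tol


# Hypothetical pool of 96 samples: 22 carry allele "T", 74 carry allele "C".
genotypes = ["T"] * 22 + ["C"] * 74
actual = pool_allele_fraction(genotypes, "T")
print(round(actual, 3))                # 0.229
print(within_tolerance(0.24, actual))  # True
```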
Plots of the derivative melting curves of the pooled samples, –dF/dT, were generated
from the melting curve analysis by the software.
Software for allele fraction
Background fluorescence was removed from the raw fluorescence vs. temperature data using
an exponential model for the background, fit to the slope of the raw fluorescence curve in
two temperature regions, one below and one above the probe melting temperatures for
both genotypes. The resulting melting curves were normalized to the 0-100% range and
differentiated using the polynomial least-squares fit (Savitzky-Golay) method. The
unknown sample was melted in the presence of unlabeled probes matching each genotype,
and its peak heights were compared with those of pure samples and of neighboring
calibration standards of known, synthetically prepared allele fraction melted under the
same conditions. Allele fraction was determined by linear interpolation between these
standards, and the values obtained were combined in an equally weighted average.
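The analysis steps above can be sketched in plain Python. This is a sketch under stated assumptions and not the analysis software itself: background removal is omitted, the Savitzky-Golay derivative uses a fixed 5-point quadratic window on evenly spaced temperatures, peak height is assumed to rise monotonically with allele fraction, and all function names are hypothetical.

```python
def normalize(fluor):
    """Rescale a background-corrected melting curve to the 0-100% range."""
    lo, hi = min(fluor), max(fluor)
    return [100.0 * (f - lo) / (hi - lo) for f in fluor]


def sg_negative_derivative(temps, fluor):
    """-dF/dT from a 5-point quadratic Savitzky-Golay filter.

    Assumes evenly spaced temperatures; returns (temperature, -dF/dT)
    pairs for the interior points only.
    """
    h = temps[1] - temps[0]
    weights = (-2, -1, 0, 1, 2)  # first-derivative weights, divided by 10*h
    result = []
    for i in range(2, len(fluor) - 2):
        slope = sum(w * fluor[i + k] for w, k in zip(weights, range(-2, 3)))
        result.append((temps[i], -slope / (10.0 * h)))
    return result


def peak_height(deriv, t_min, t_max):
    """Largest -dF/dT value inside a probe melting window."""
    return max(v for t, v in deriv if t_min <= t <= t_max)


def interpolate_fraction(height, standards):
    """Linear interpolation between neighboring calibration standards.

    standards: (known_fraction, peak_height) pairs; heights are assumed
    to increase with fraction. Values outside the range are clamped.
    """
    pts = sorted(standards, key=lambda p: p[1])
    if height <= pts[0][1]:
        return pts[0][0]
    if height >= pts[-1][1]:
        return pts[-1][0]
    for (f0, h0), (f1, h1) in zip(pts, pts[1:]):
        if h0 <= height <= h1 and h1 > h0:
            return f0 + (f1 - f0) * (height - h0) / (h1 - h0)
    return pts[-1][0]


def average_estimates(frac_a_from_probe_a, frac_b_from_probe_b):
    """Equally weighted average of the two probe-based estimates of allele A."""
    return 0.5 * (frac_a_from_probe_a + (1.0 - frac_b_from_probe_b))
```

In use, each probe's peak height from the unknown sample would be passed to `interpolate_fraction` with that probe's calibration standards, and the two results combined with `average_estimates`. Averaging one estimate with the complement of the other is one reading of the "equally weighted average" in the text, and is an assumption.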
Results
SNP marker selection
Over the past 15 years, DNA polymorphisms have been widely used to reconstruct
human evolutionary history. Mitochondrial DNA was originally used for this purpose,
because the high mutation rate produced numerous polymorphisms, and the absence of
recombination facilitated their interpretation. Thirty-five SNP markers that represent a
set of sequence variants from the south Indian population were chosen to carry out the
genotyping.
Multiplex PCR
For human genetic studies, such as the search for human disease genes and tumor suppressor
genes and studies of human evolution, the supply of human genomic DNA is often
limited. Multiplex PCR amplifies different loci at the same time from the
same aliquot of human DNA, and so saves large quantities of human genomic
DNA. Multiplex PCR can simultaneously amplify up to ten different
amplicons. In this paper we focus on using unlabeled probes to genotype SNPs,
so at most six-plex PCR was performed. The purpose of the multiplex PCR is to enrich
the loci that need to be genotyped. The multiplex enrichment is then followed by
nested asymmetric PCR for easy probing of the genotypes of individual loci.
Asymmetric PCR and unlabeled probes
There are many different techniques to detect mutations or SNPs through the use of
probes. TaqMan, Hybridization probes and Simple probes are the most common
techniques. These techniques need one or two fluorescent labels at the end of the probe.
Zhou et al. have developed a technique to detect mutations using probes without
fluorescent labels (19, 20). The key to this technique is the use of asymmetric PCR and
melting of the product with an unlabeled probe in the presence of the high-resolution
double stranded DNA dye, LCGreen Plus. Asymmetric PCR amplifies one strand
much more than the complementary strand. The probe direction is opposite to this strand,
and its 3’ end is blocked to prevent extension. After PCR, when the unlabeled probe is
added and hybridization is promoted, the probe and the strand that shares its sequence
compete to anneal with the opposite strand. The derivative melting curve of a
symmetric PCR product shows the amplicon peak but not the probe peak. After
asymmetric PCR, when hybridization is promoted, annealing of forward and reverse
amplicon strands is limited by the lower concentration of one strand, leaving an excess of
single-stranded DNA. These single strands anneal to the unlabeled probe. As a result,
the derivative melting curve of an asymmetric PCR product has a lower
temperature peak where probes melt from single amplicon strands, and a higher
temperature peak where amplicons melt (19). Melting a 384-well plate takes only
five minutes.
Genotype determination
All SNP markers of chromosome Y have two allele types. We have typed both
possibilities. Genomic DNA from chromosome Y has only one allele so the probe
melting curve always shows only one peak, either a 100% perfect match peak or 100%
mismatch peak, in contrast with 50% of each in the case of heterozygous DNA from
other chromosomes. The melting temperature of the probe that is perfectly
complementary to the sample genotype is 3 to 5 °C higher than that of the probe
with a one-base mismatch. Hence, the probe displaying the higher melting temperature
determines the genotype by complementarity. The matched and mismatched probe
melting curves may be easily distinguished visually by a human technician, or by
automatic clustering or classification. Different genotypes can also be distinguished
based on the amplicon melting curves if the SNP is a small deletion (Figure 1) or an A, T vs.
C, G change (Figure 2). This allows for a double confirmation of the genotyping.
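The two-probe comparison described above can be sketched as a simple decision rule. This is a minimal illustration, not the clustering or classification used in the study: the function names, the allele labels "A" and "B", and the 1.5 °C decision threshold are hypothetical choices, picked only to sit below the 3 to 5 °C difference reported in the text.

```python
def peak_temperature(temps, neg_deriv, t_min, t_max):
    """Temperature of the highest -dF/dT value inside the probe melting window."""
    region = [(d, t) for t, d in zip(temps, neg_deriv) if t_min <= t <= t_max]
    return max(region)[1]


def call_haploid_genotype(tm_probe_a, tm_probe_b, min_delta=1.5):
    """Call a Y-chromosome genotype from the melting temperatures of two probes.

    The probe that matches the sample melts at the higher temperature.
    min_delta (in degrees C) is a hypothetical threshold below which the
    call is reported as ambiguous.
    """
    delta = tm_probe_a - tm_probe_b
    if delta >= min_delta:
        return "A"
    if delta <= -min_delta:
        return "B"
    return "ambiguous"


print(call_haploid_genotype(66.0, 62.0))  # A
print(call_haploid_genotype(61.5, 65.8))  # B
print(call_haploid_genotype(64.0, 64.5))  # ambiguous
```

Samples flagged as ambiguous by such a rule would be natural candidates for confirmation by sequencing.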
To confirm the genotyping obtained using the unlabeled probe technique, we chose
samples from each SNP marker for sequencing. Samples were chosen as follows: for a
SNP marker without any variation we chose the most ambiguous sample, and for a SNP
marker with variation we chose the most ambiguous sample of each genotype. The
sequencing results were in 100% agreement with the genotyping by unlabeled probes.
Twenty-four of the most important markers for the south Indian population were also
tested on the 192 samples by the ABI Prism SNaPshot™ technique. Only one operator
error was made in the 4,608 genotypes.
High-throughput genotyping with unlabeled probes is fast, taking about five
minutes after PCR. The unlabeled probe is a 20-30 base oligonucleotide with the 3’ end
blocked. The unlabeled probe is very stable: it can be stored at room temperature for a
few years without light-induced or other degradation. Unlabeled probe design is not
sensitive to GC content, which gives it more flexibility than the TaqMan probe,
Hybridization probe and Simple probe. The cost of an unlabeled probe is
significantly lower than that of a fluorescently labeled probe. The data analysis is
also very simple. On chromosome Y, we genotyped 35 SNP markers on 192 samples, for a total
of 6,720 genotypes.
DNA pooling by unlabeled probe
An allele fraction study was carried out to determine the sensitivity of
unlabeled probes. Human genomic wild type DNA (CFTR 542 G) and cystic fibrosis
mutant DNA (CFTR 542 T) were mixed in different ratios. We created mixtures of wild
type and mutant genomic DNA from 0-100% wild type in 10% increments. As the wild
type DNA fraction increases, the perfect match (G::C) peak height increases and the mismatch
peak height clearly decreases (Figure 3A). We also prepared mixtures in 2% increments of wild type
genomic DNA from 0-10% (Figure 3B), 20-30%, 45-55%, 70-80%, and 90-100%. With each
2% increase in wild type DNA, the perfect match peak height consistently increases and the
mismatch peak height consistently decreases. The estimated and actual allele fractions
are shown in Table 1. The allele fraction can be determined by the unlabeled probe
technique to within 2%.
After calibrating the analysis software in this manner, 96 samples of human genomic
DNA, each having one of two known genotypes, were mixed as a “pool.”
Samples taken from the mixture were hybridized with probes complementary to each
genotype, then high-resolution melting and analysis were performed. The known allele
fraction of the mixture was correctly estimated to within +/- 2%, based on the analysis
software’s comparison of the height of probe melting peaks of the mixture with those of
samples of pure genotype. The software is automatic and simple to use. It is also possible
to call the genotypes in the high-throughput evolution study automatically with the
software with high accuracy. This is important when we are considering thousands, and
ultimately millions of genotyping experiments.
References
1. Risch NJ, Searching for genetic determinants in the new millennium. 2000.
Nature. 405(6788):847-56.
2. Akey J., Jin L. and Xiong M. 2001. Haplotypes vs single marker linkage
disequilibrium tests: what do we gain? Eur. J. Hum. Genet. 9(4):291-300.
3. Zollner S, von Haeseler A. 2000. A coalescent approach to study linkage
disequilibrium between single-nucleotide polymorphisms. Am J Hum Genet.
66(2):615-28.
4. Neibergs HL, Dietz AB, Womack JE. 1993. Single-strand conformation
polymorphisms (SSCPs) detected in five bovine genes. Anim Genet. 24(2):81-4.
5. Landegren, U., Kaiser, R., Sanders, J. and Hood, L., 1988. A ligase-mediated gene
detection technique. Science 241, 1077–1080.
6. Bernat M., Titos F. and Clària J., 2002. Genet. Mol. Res. 1 (1): 72-78.
7. Ronaghi, M., Karamohamed, S., Petterson, B., Uhlen, M., Nyrén, P. 1996. Real-time
DNA sequencing using detection of pyrophosphate release. Analytical
Biochemistry. 242, 84-89.
8. Alderborn A, Kristofferson A, Hammerling U. 2000. Determination of
single-nucleotide polymorphisms by real-time pyrophosphate DNA sequencing. Genome
Res 10: 1249-1258.
9. Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J. 2005. Independence
and reproducibility across microarray platforms. 2(5):337-44.
10. Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E,
Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka
AE, Kawasaki E, Martinez-Murillo F, Morsberger L, Lee H, Petersen D,
Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W. 2005.
Multiple-laboratory comparison of microarray platforms. 2(5):345-50.
11. Jares P, Campo E. 2006. Genomic platforms for cancer research: potential
diagnostic and prognostic applications in clinical oncology. 8(3):161-72.
12. Onay VU, Briollais L, Knight JA, Shi E, Wang Y, Wells S, Li H, Rajendram I,
Andrulis IL, Ozcelik H. 2006. SNP-SNP interactions in breast cancer
susceptibility. 6(1):114. (Livak KJ, Flood SJ, Marmaro J, Giusti W, Deetz K.
1995. Oligonucleotides with fluorescent dyes at opposite ends provide a
quenched probe system useful for detecting PCR product and nucleic acid
hybridization. 4(6):357-62.)
13. Callegaro A, Spinelli R, Beltrame L, Bicciato S, Caristina L, Censuales S, De
Bellis G, Battaglia C. 2006. Algorithm for automatic genotype calling of single
nucleotide polymorphisms using the full course of TaqMan real-time data.
34(7):e56.
14. Bernard, P., Lay, M. and Wittwer. 1998. Integrated amplification and detection of
the C677T point mutation in the methylenetetrahydrofolate reductase gene by
fluorescence resonance energy transfer and probe melting curves. Anal. Biochem.
255:101–107.
15. Liew M, Nelson L, Margraf R, Mitchell S, Erali M, Mao R, Lyon E, Wittwer C.
2006. Genotyping of human platelet antigens 1 to 6 and 15 by high-resolution
amplicon melting and conventional hybridization probes. 8(1):97-104.
16. Crockett AO, Wittwer CT. 2001. Fluorescein-labeled oligonucleotides for
real-time PCR: using the inherent quenching of deoxyguanosine nucleotides.
290(1):89-97.
17. Olivier M, Chuang LM, Chang MS, Chen YT, Pei D, Ranade K, de Witte A,
Allen J, Tran N, Curb D, Pratt R, Neefs H, de Arruda Indig M, Law S, Neri B,
Wang L, Cox DR. 2002. High-throughput genotyping of single nucleotide
polymorphisms using new biplex invader technology. 30(12):e53.
18. Vincent O. Tobe, Scott L. Taylor and Deborah A. Nickerson. 1996.
Single-well genotyping of diallelic sequence variations by a two-color ELISA-based
oligonucleotide ligation assay. Nucleic Acids Research, Vol. 24, No. 19.
19. Zhou L, Myers AN, Vandersteen JG, Wang L, Wittwer CT. 2004. Closed-tube
genotyping with unlabeled oligonucleotide probes and a saturating DNA dye.
Clin Chem. 50(8):1328-35.
20. Zhou L, Wang L, Palais R, Pryor R, Wittwer CT. 2005. High-resolution DNA
melting analysis for simultaneous mutation scanning and genotyping in solution.
Clin Chem. 51(10):1770-7.
21. Collins H.E., Li H., Inda S.E., Anderson J., Laiho K., Tuomilehto J., Seldin M.F.
2000. A simple and accurate method for determination of microsatellite total
allele content differences between DNA pools. Human Genetics. Volume 106,
Number 2: 218 – 226.
22. Sham P, Bader JS, Craig I, O'Donovan M, Owen M. 2002. DNA Pooling: a tool
for large-scale association studies. Nat Rev Genet. 3(11):862-71.
23. Breen G, Harold D, Ralston S, Shaw D, St Clair D. 2000. Determining SNP
allele frequencies in DNA pools. Biotechniques. 28(3):464-6, 468, 470.
24. Norton N, Williams NM, Williams HJ, Spurlock G, Kirov G, Morris DW,
Hoogendoorn B, Owen MJ, O'Donovan MC. 2002. Universal, robust, highly
quantitative SNP allele frequency measurement in DNA pools. Hum Genet.
110(5):471-8.
25. McClay JL, Sugden K, Koch HG, Higuchi S, Craig IW. 2002. High-throughput
single-nucleotide polymorphism genotyping by fluorescent competitive
allele-specific polymerase chain reaction (SNiPTag). Anal Biochem. 301(2):200-6.
26. Kawana F, Sawae Y, Sahara T, Tanaka S, Debari K, Shimizu M, Sasaki T. 2001.
Porcine enamel matrix derivative enhances trabecular bone regeneration during
wound healing of injured rat femur. Anat Rec. 264(4):438-46.
27. Uhl GR, Liu QR, Walther D, Hess J, Naiman D. 2001. Polysubstance
abuse-vulnerability genes: genome scans for association, using 1,004 subjects and 1,494
single-nucleotide polymorphisms. Am J Hum Genet. 69(6):1290-300.
28. Wolford JK, Blunt D, Ballecer C, Prochazka M. 2000. High-throughput SNP
detection by using DNA pooling and denaturing high performance liquid
chromatography (DHPLC). Hum Genet. 107(5):483-7.
29. Nordfors L, Jansson M, Sandberg G, Lavebratt C, Sengul S, Schalling M, Arner
P. 2002. Large-scale genotyping of single nucleotide polymorphisms by
Pyrosequencing and validation against the 5' nuclease (TaqMan)
assay. Hum Mutat. 19(4):395-401.
30. Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E,
Bonne-Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR,
Mehdi SQ, Seielstad MT, Wells RS, Piazza A, Davis RW, Feldman MW,
Cavalli-Sforza LL, Oefner PJ. 2000. Y chromosome sequence variation and the history
of human populations. Nat Genet. 26(3):358-61.
31. Underhill PA, Passarino G, Lin AA, Marzuki S, Oefner PJ, Cavalli-Sforza LL,
Chambers GK. 2001. Maori origins, Y-chromosome haplotypes and implications
for human history in the Pacific. Hum Mutat. 17(4):271-80.
32. Hammer MF, Karafet TM, Redd AJ, Jarjanazi H, Santachiara-Benerecetti S,
Soodyall H, Zegura SL. 2001. Hierarchical patterns of global human
Y-chromosome diversity. Mol Biol Evol. 18(7):1189-203.