High Throughput SNP Discovery & Genotyping

Perry Cregan
Soybean Genomics and Improvement Lab
USDA, ARS, BARC-West
Beltsville, Maryland
Agricultural Research Service
Single Nucleotide Polymorphism
- A working definition -
Single base changes between homologous
DNA fragments
+
Small insertions and deletions (indels)
1  ..GAATCTTATTATCTATACTATACATAATTATATACTAAT-GGGTATTGTTCTTAT..
   ..CTTAGAATAATAGATATGATATGTATTAATATATGATTA-CCCATAACAAGAATA..
2  ..GAATCTTATTATCTATGCTATACATAATTATATACTAATAGGGTATTGTTCTTAT..
   ..CTTAGAATAATAGATACGATATGTATTAATATATGATTATCCCATAACAAGAATA..
SNP: A/G substitution (T/C on the complementary strand)
SNP (Indel): insertion/deletion of a single A (T on the complementary strand)
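To make the working definition concrete, here is a minimal Python sketch. It is not part of the original pipeline. It walks two already-aligned sequences column by column and labels each difference as a substitution or an indel, treating "-" as an alignment gap. Run on the top strands of sequences 1 and 2 above, it reports the A/G substitution at position 17 and the single-base indel at position 40 (1-based, ignoring the ".." ellipses).

```python
def classify_differences(seq1: str, seq2: str, gap: str = "-"):
    """Compare two pre-aligned sequences of equal length.

    Returns (position, base1, base2, kind) tuples, where position is 1-based
    and kind is "substitution" or "indel" (one side is a gap character).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for pos, (b1, b2) in enumerate(zip(seq1, seq2), start=1):
        if b1 == b2:
            continue
        kind = "indel" if gap in (b1, b2) else "substitution"
        diffs.append((pos, b1, b2, kind))
    return diffs


# Top strands of sequences 1 and 2 from the example, without the ".." ellipses.
s1 = "GAATCTTATTATCTATACTATACATAATTATATACTAAT-GGGTATTGTTCTTAT"
s2 = "GAATCTTATTATCTATGCTATACATAATTATATACTAATAGGGTATTGTTCTTAT"
for diff in classify_differences(s1, s2):
    print(diff)
```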
Initial SNP Discovery and Mapping
SNP discovery using Sanger re-sequencing
- Mostly genic
- BAC-end and BAC subclones
SNP genotyping and mapping
- Sequenom mass spectrometer
- Luminex flow cytometer
- Illumina Inc. GoldenGate™ assay
SNP Discovery in Soybean Unigenes
Re-Sequencing
- Design PCR primers to existing 3'-unigene sequence
- Identify sequence tagged sites (STSs) visually (agarose gel) and by sequence analysis
- Determine sequence quality (PHRED) and align sequence traces (PHRAP) from six diverse soybean genotypes
- Analyze assemblies with SNP discovery software for SNP discovery in redundant sequence
- Analysis of haplotype variation and databasing
In silico
- From existing expressed sequence tag (EST) data
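PHRED reports base quality as Q = -10 log10(P_error), so Q20 corresponds to a 1% chance that a base call is wrong. The sketch below is a deliberately simplified stand-in for the SNP discovery step. It is not the PolyBayes algorithm or any real package API. It scans an aligned set of sequences from several genotypes column by column, ignores bases below an assumed quality cutoff (Q20 here), and reports columns where the remaining confident bases disagree. The data layout (one aligned string plus one list of PHRED scores per genotype) is hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """PHRED quality Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def candidate_snp_columns(aligned, min_q=20):
    """aligned: {genotype: (aligned_seq, [phred_q per column])}; '-' marks a gap.

    Returns (column, {genotype: base}) with 0-based columns, for columns where
    the genotypes passing the quality cutoff do not all carry the same base.
    A gap counts as an allele, so indels are reported too.
    """
    length = len(next(iter(aligned.values()))[0])
    hits = []
    for col in range(length):
        calls = {}
        for genotype, (seq, quals) in aligned.items():
            if quals[col] >= min_q:  # keep only confident base calls
                calls[genotype] = seq[col]
        if len(set(calls.values())) > 1:
            hits.append((col, calls))
    return hits


# Hypothetical three-genotype alignment; g3's last base is low quality (Q10).
example = {
    "g1": ("ACGTA", [30, 30, 30, 30, 30]),
    "g2": ("ACCTA", [30, 30, 30, 30, 30]),
    "g3": ("ACGTT", [30, 30, 30, 30, 10]),
}
print(candidate_snp_columns(example))
```

Only column 2 is reported. The apparent A/T difference in the last column is discarded because g3's base there falls below Q20. In the six-genotype re-sequencing design, `aligned` would simply hold six entries.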
Initial Assessment of PCR Primers Designed to Soybean Unigenes
Figure: Discovery of SNPs in aligned DNA sequence data using PolyBayes in the Consed viewer
DNA Sequence Alignment for Single Nucleotide Polymorphism (SNP) Discovery in Soybean
SNP Discovery in Soybean Unigenes
Primer sets designed and tested ................................ 9459
Primer sets producing a single PCR product ..................... 6290 (66.5%)
High-quality sequence data for all 6 SNP discovery genotypes ... 4240 (44.8%)
Genes with at least one SNP .................................... 2032 (21.5%)
Data from: Choi et al. (2007) Genetics 176: 685-696
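Each percentage in the table is relative to the 9459 primer sets tested. The short calculation below reproduces those figures. As an extra derived view, it also gives the step-to-step success rates. For example, 2032 of the 4240 sites with complete high-quality sequence, about 47.9%, contained at least one SNP.

```python
# Counts from the table above (Choi et al. 2007).
STEPS = [
    ("Primer sets designed and tested", 9459),
    ("Single PCR product", 6290),
    ("High-quality sequence for all 6 genotypes", 4240),
    ("Genes with at least one SNP", 2032),
]


def funnel(steps):
    """Return (label, count, % of first step, % of previous step), rounded to 0.1."""
    total = previous = steps[0][1]
    rows = []
    for label, count in steps:
        rows.append((label, count,
                     round(100 * count / total, 1),
                     round(100 * count / previous, 1)))
        previous = count
    return rows


for label, count, overall, stepwise in funnel(STEPS):
    print(f"{label:<42} {count:>5} {overall:6.1f}% {stepwise:6.1f}%")
```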
SNP Analysis Using the Illumina, Inc. GoldenGate™ Assay
- A Three-Step Process -
1. Allele-Specific Extension and Ligation
2. PCR Amplification
3. Hybridization to the Universal Sentrix® Array Matrix
Allele-Specific Extension and Ligation
Figure: allele-specific extension (polymerase) and ligation (ligase) on genomic DNA at a [T/C] SNP. Allele-specific oligos ending in A or G carry Universal PCR Sequences 1 and 2. A locus-specific oligo carries an illumiCode address and Universal PCR Sequence 3.
Custom Oligo Pool All (OPA)
- 96-1,536 SNPs multiplexed
- Total oligos in reaction: 288-4,608 (three oligos per SNP)
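The oligo totals follow from the assay layout. Each SNP needs two allele-specific oligos (one per allele, carrying Universal PCR Sequence 1 or 2) and one locus-specific oligo (carrying the illumiCode address and Universal PCR Sequence 3). So 96 SNPs need 288 oligos and 1,536 SNPs need 4,608. The toy manifest builder below illustrates that arithmetic. The `Oligo` record, SNP names, and illumiCode numbers are hypothetical and do not represent Illumina's file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Oligo:
    snp_id: str
    kind: str    # "ASO" (allele-specific) or "LSO" (locus-specific)
    detail: str  # allele base for an ASO, illumiCode address for an LSO


def build_opa(snps):
    """snps: iterable of (snp_id, (allele_1, allele_2), illumicode) tuples."""
    pool = []
    for snp_id, (allele_1, allele_2), illumicode in snps:
        pool.append(Oligo(snp_id, "ASO", allele_1))          # tailed with Universal PCR Sequence 1
        pool.append(Oligo(snp_id, "ASO", allele_2))          # tailed with Universal PCR Sequence 2
        pool.append(Oligo(snp_id, "LSO", f"#{illumicode}"))  # address + Universal PCR Sequence 3
    return pool


# Hypothetical SNP lists at the two multiplexing extremes.
for n_snps in (96, 1536):
    snps = [(f"SNP{i}", ("A", "G"), i) for i in range(1, n_snps + 1)]
    print(n_snps, "SNPs ->", len(build_opa(snps)), "oligos")
```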
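PCR Amplification
Figure: the extended and ligated product is the amplification template for PCR with common primers (Cy3-labeled Universal Primer 1, Cy5-labeled Universal Primer 2, and Universal Primer P3); the example product carries illumiCode #561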
Hybridization to Sentrix® Array Matrix
Figure: amplified products hybridize through their illumiCode sequences to matching bead-bound probes on the Sentrix® Array Matrix; example SNPs #217, #561, and #1024 with genotype calls A/A, G/G, and C/T
The Illumina BeadStation 500G permits high-throughput analysis of thousands of SNP DNA markers in hundreds of genotypes in less than one week.
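After hybridization, each bead type reports a Cy3 signal (allele 1, from Universal Primer 1) and a Cy5 signal (allele 2, from Universal Primer 2). One common way to summarize the pair is the angle theta = (2/π)·atan(Cy5/Cy3). Theta runs from 0 (only allele 1) to 1 (only allele 2). The sketch below calls genotypes with fixed theta cutoffs and a minimum total intensity. Those cutoffs are illustrative assumptions. This is not Illumina's GenCall software, which fits genotype clusters for each SNP.

```python
import math


def call_genotype(cy3, cy5, alleles=("A", "G"), min_signal=200.0, lo=0.2, hi=0.8):
    """Return e.g. 'A/A', 'A/G', 'G/G', or 'NC' (no call).

    cy3: intensity for allele 1; cy5: intensity for allele 2.
    min_signal, lo, and hi are hypothetical cutoffs for illustration.
    """
    if cy3 + cy5 < min_signal:
        return "NC"  # too little signal to make a call
    theta = (2 / math.pi) * math.atan2(cy5, cy3)  # 0 = all Cy3, 1 = all Cy5
    a1, a2 = alleles
    if theta < lo:
        return f"{a1}/{a1}"
    if theta > hi:
        return f"{a2}/{a2}"
    return f"{a1}/{a2}"


print(call_genotype(1500, 60))   # A/A
print(call_genotype(800, 750))   # A/G
print(call_genotype(40, 1200))   # G/G
print(call_genotype(50, 40))     # NC
```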
Genetic Mapping
Three mapping populations of 89 individuals each
Total markers = 5621
– 1008 SSR
– 3959 SNP
– 637 RFLP
– 14 Classical
– 3 Isozyme
Total Map length 2393.7 centiMorgans
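Map distances in centiMorgans (cM) are obtained from recombination fractions through a mapping function. The slide does not say which function was used. Haldane (no crossover interference) and Kosambi (allowing for interference) are two standard choices, and both are sketched below for illustration only.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM for recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM, which allows for crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))


for r in (0.01, 0.10, 0.25):
    print(f"r={r:.2f}  Haldane={haldane_cm(r):6.2f} cM  Kosambi={kosambi_cm(r):6.2f} cM")
```

At small r both functions give roughly 100·r cM, which is why 1 cM corresponds to about 1% recombination. They diverge as r grows.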
A Set of 1536 SNPs with Maximal Genome Coverage and High Minor Allele Frequency
3110 working GoldenGate assays
All SNPs have been genetically mapped
All SNPs analyzed on diverse Exotic and
Elite soybean germplasm lines
- 96 diverse Asian introductions from China, Korea, and Japan, collected from 22-50° N and 104-140° E
- 96 N. American released cultivars, selected based upon a cluster analysis using pedigree data to maximize diversity
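A panel like this balances two goals named in the title: high minor allele frequency (MAF) in the germplasm and even coverage of the genetic map. The greedy procedure below is one simple way to balance them. It is offered only as an illustration, not as the procedure actually used. It computes MAF from genotype calls, drops low-MAF SNPs, then takes SNPs in order of decreasing MAF, skipping any that fall within a minimum distance of an already chosen SNP on the same linkage group. The thresholds and input layout are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """calls: genotype strings such as 'A/A' or 'A/G'; 'NC' (no call) is skipped."""
    counts = Counter()
    for call in calls:
        if call != "NC":
            counts.update(call.split("/"))
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or monomorphic
    return min(counts.values()) / total  # assumes a biallelic SNP


def select_snps(snps, n, min_maf=0.1, min_gap_cm=1.0):
    """Greedily pick up to n SNPs.

    snps: (snp_id, linkage_group, position_cM, maf) tuples. Higher-MAF SNPs are
    taken first; a SNP is skipped if a chosen SNP on the same linkage group lies
    closer than min_gap_cm. Thresholds are illustrative assumptions.
    """
    candidates = sorted((s for s in snps if s[3] >= min_maf), key=lambda s: -s[3])
    chosen = []
    for snp in candidates:
        if len(chosen) == n:
            break
        too_close = any(c[1] == snp[1] and abs(c[2] - snp[2]) < min_gap_cm for c in chosen)
        if not too_close:
            chosen.append(snp)
    return sorted(chosen, key=lambda s: (s[1], s[2]))


# Hypothetical candidates on two soybean linkage groups.
panel = [("s1", "A1", 0.0, 0.40), ("s2", "A1", 0.5, 0.30), ("s3", "A1", 5.0, 0.20),
         ("s4", "B1", 0.0, 0.05), ("s5", "B1", 10.0, 0.45)]
print([s[0] for s in select_snps(panel, n=3)])  # ['s1', 's3', 's5']
```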
The Costs of Reagents for Whole Genome Scans of 96 Genotypes Using an Optimized Set of 1536 SNPs
Figure: reagent cost in dollars per set of 96 genotypes (y-axis, $0-$12,000) versus number of sets of 96 genotypes (x-axis, 0-70)
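The chart's data values were not preserved in this text. As an illustration only, consider a one-time cost (such as a custom oligo pool) shared across every set of 96 genotypes run, plus a constant per-set reagent cost. Under that model, the average cost per set falls toward the per-set floor as more sets are run. Both dollar figures below are hypothetical placeholders, not the costs shown on the slide.

```python
def cost_per_set(n_sets: int, one_time_cost: float, per_set_cost: float) -> float:
    """Average reagent cost per set of 96 genotypes when a one-time cost is amortized."""
    if n_sets < 1:
        raise ValueError("need at least one set")
    return one_time_cost / n_sets + per_set_cost


# Hypothetical placeholder costs, for illustration only.
ONE_TIME = 8_000.0
PER_SET = 1_500.0
for n in (1, 5, 10, 35, 70):
    print(f"{n:>2} sets: ${cost_per_set(n, ONE_TIME, PER_SET):,.0f} per set")
```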
Accelerated SNP Discovery
Creation of a Reduced Representation Genome Library
– Digest genomic DNA with a combination of five blunt-end restriction endonucleases
– Select a combination of restriction enzymes such that approx. 5% of the genome is present in the 110-140 bp fraction
Solexa sequence analysis of the Reduced Representation Library
SNP discovery via alignment of the Solexa reads with the Williams 82 whole genome sequence from the DOE Joint Genome Institute (JGI)
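Whether an enzyme combination puts about 5% of the genome in the 110-140 bp fraction can be estimated in silico before any wet-lab work. The sketch below cuts a sequence at every recognition site of a set of blunt cutters and reports the fraction of bases in fragments of the target size. The three enzymes shown are real blunt cutters (HaeIII GG^CC, AluI AG^CT, RsaI GT^AC) chosen only as examples. The slide does not name the five enzymes used. Because all three sites are palindromic, searching one strand is enough. A real estimate would run over the whole reference genome.

```python
import re

# Illustrative blunt-end cutters: name -> (recognition site, cut offset within the site).
# The slide does not name its five enzymes; these three are examples only.
EXAMPLE_ENZYMES = {
    "HaeIII": ("GGCC", 2),  # GG^CC
    "AluI": ("AGCT", 2),    # AG^CT
    "RsaI": ("GTAC", 2),    # GT^AC
}


def cut_positions(seq, enzymes):
    """Sorted cut coordinates; the lookahead lets overlapping sites all match."""
    cuts = set()
    for site, offset in enzymes.values():
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fraction_in_window(seq, enzymes, lo=110, hi=140):
    """Fraction of bases that end up in fragments of lo..hi bp (inclusive)."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    sizes = [end - start for start, end in zip(bounds, bounds[1:])]
    return sum(s for s in sizes if lo <= s <= hi) / len(seq)


# Toy sequence with one HaeIII site and one AluI site; fragments of 62, 124, and 62 bp.
toy = "T" * 60 + "GGCC" + "A" * 120 + "AGCT" + "T" * 60
print(fraction_in_window(toy, EXAMPLE_ENZYMES))  # 0.5
```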
Creation of a Reduced Representation Genome Library
Figure: gel image. Lanes: New England BioLabs 100 bp and 50 bp ladders; PI 468916 genomic DNA (4 µl = 50 ng); mix of 5 genomic DNAs (12 µl = 50 ng). Size markers: 200, 150, 100, and 50 bp
Solexa Resequencing Results – PI 468916 Reduced Representation Library
Occurrences per 33mer   Unique 33mers  33-base reads    Total bases
500 plus                        1,293      2,142,203     70,692,699
300-500                         1,414        536,701     17,711,133
100-299                         6,561      1,097,771     36,226,443
35-99                          15,119        871,270     28,751,910
20-34                          14,510        374,818     12,368,994
15-20                          12,040        206,542      6,815,886
11-14                          15,645        192,046      6,337,518
9-10                           15,234        143,581      4,738,173
7-8                            29,215        216,367      7,140,111
6                              26,648        159,888      5,276,304
5                              43,105        215,525      7,112,325
4                              72,955        291,820      9,630,060
3                             130,555        391,665     12,924,945
2                             259,225        518,450     17,108,850
1                           1,312,518      1,312,518     43,313,094
TOTAL                       1,956,037      8,671,165    286,148,445
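Each row of the table groups distinct 33mers by how many times they were read. Within a single-count row, reads equal unique 33mers times the count (259,225 × 2 = 518,450). Total bases equal reads × 33 (8,671,165 × 33 = 286,148,445). The sketch below shows that tally, assuming the reads are available as plain strings. It reports exact counts rather than the binned ranges used on the slide.

```python
from collections import Counter


def occurrence_table(reads, read_length=33):
    """Map occurrence count -> (unique sequences, reads, bases)."""
    per_sequence = Counter(reads)              # times each distinct read was seen
    by_count = Counter(per_sequence.values())  # distinct reads sharing each count
    return {
        occ: (n_unique, n_unique * occ, n_unique * occ * read_length)
        for occ, n_unique in sorted(by_count.items())
    }


reads = ["ACGT", "ACGT", "TTTT", "GGCC", "GGCC", "GGCC"]
print(occurrence_table(reads, read_length=4))
# {1: (1, 1, 4), 2: (1, 2, 8), 3: (1, 3, 12)}
```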
Figure: Solexa reads aligned at the position of a SNP; green arrows indicate reads that are unique to one genome position
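The figure emphasizes reads that are unique to one genome position. The minimal pileup sketch below trusts only those reads when calling a SNP against the reference. It assumes alignments are already available as (start, read, number_of_hits) tuples, and its depth and allele-fraction cutoffs are hypothetical. Because soybean lines are highly inbred, it looks for positions where nearly all unique reads disagree with the Williams 82 reference base, rather than for heterozygous sites.

```python
from collections import Counter


def call_snps(ref, alignments, min_reads=3, min_fraction=0.9):
    """Return (position, ref_base, read_base, depth) for candidate SNPs (0-based positions).

    alignments: (start, read_seq, n_hits) tuples, with start 0-based on ref and
    n_hits the number of genome positions the read aligned to. Only reads with
    n_hits == 1 are counted. min_reads and min_fraction are illustrative cutoffs.
    """
    pileup = [Counter() for _ in ref]
    for start, read, n_hits in alignments:
        if n_hits != 1:
            continue  # skip reads that map to more than one position
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(ref):
                pileup[pos][base] += 1
    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_reads:
            continue
        base, n = counts.most_common(1)[0]
        if base != ref[pos] and n / depth >= min_fraction:
            snps.append((pos, ref[pos], base, depth))
    return snps


reference = "ACGTACGTAC"
reads = [(2, "GTTCG", 1)] * 3 + [(2, "GTGCG", 5)]  # last read is repetitive and ignored
print(call_snps(reference, reads))  # [(4, 'A', 'T', 3)]
```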
Conclusions
Approx. 20,000 SNPs discovered in 6000+ sequence tagged sites via Sanger re-sequencing
Linkage analysis: in the near future we will have an optimized set of 1536 GoldenGate assays for high-throughput QTL analysis
The Solexa analysis will greatly accelerate SNP discovery
The Illumina Infinium assay will provide an order of magnitude greater genotyping capacity
Association Analysis
- The Illumina Infinium assay will allow estimates of linkage disequilibrium across the soybean genome
- Creating an association panel of 2400 cultivated soybean genotypes
- Phenotyping will become the limiting factor
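Linkage disequilibrium between two SNPs is commonly summarized as r², the squared correlation between their alleles. For inbred soybean lines, each line can be treated as carrying a single haplotype, so r² can be computed directly from allele calls. The minimal sketch below does that, with missing calls passed as None. It is an illustration, not the analysis pipeline planned for the Infinium data.

```python
import math


def ld_r2(snp1, snp2):
    """r^2 between two biallelic SNPs scored on homozygous (inbred) lines.

    snp1, snp2: one allele call per line, e.g. ["A", "G", ...]; None = missing.
    Returns math.nan if either SNP is monomorphic among the usable lines.
    """
    pairs = [(a, b) for a, b in zip(snp1, snp2) if a is not None and b is not None]
    if not pairs:
        return math.nan
    n = len(pairs)
    allele_a, allele_b = pairs[0]  # reference alleles; r^2 does not depend on this choice
    p_a = sum(a == allele_a for a, _ in pairs) / n
    p_b = sum(b == allele_b for _, b in pairs) / n
    p_ab = sum(a == allele_a and b == allele_b for a, b in pairs) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else math.nan


print(ld_r2(["A", "A", "G", "G"], ["C", "C", "T", "T"]))  # 1.0 (complete LD)
print(ld_r2(["A", "A", "G", "G"], ["C", "T", "C", "T"]))  # 0.0 (no LD)
```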
Collaborators
David Hyten & Lakshmi Matukumalli, USDA-ARS, Beltsville, MD
Qijian Song & Eun-Young Hwang, Univ. of Maryland
Ik-Young Choi, Seoul National University, South Korea
James Specht, Univ. of Nebraska
Randy Shoemaker, Steve Cannon & Michelle Graham, USDA-ARS, Ames, IA
Greg May & Andrew Farmer, NCGR, Santa Fe, NM
Randall Nelson, USDA-ARS, Urbana, IL
Tommy Carter, Jr., USDA-ARS, Raleigh, NC
Kevin Chase & K. Gordon Lark, Univ. of Utah
Funding Support
USDA-ARS, United Soybean Board