General SolCAP Powerpoint Presentation

SolCAP
Solanaceae Coordinated Agricultural Project
SNP Development for Elite Potato Germplasm
David Douches, Walter De Jong, Robin Buell, David Francis,
John Hamilton, Lukas Mueller, Allen Van Deynze
Funding
USDA/AFRI
This project is supported by the Agriculture and Food Research Initiative Applied Plant
Genomics CAP Program of USDA’s National Institute of Food and Agriculture.
What is SolCAP?
The SolCAP project is a coordinated agricultural
project that links together people from public institutions,
private institutions and industries who are dedicated to the
improvement of the Solanaceae crops: potato and
tomato.
Through innovative research, education and extension
the SolCAP project will focus on providing significant
benefits to both the consumer and the environment.
The SolCAP project is supported by the Agriculture and Food
Research Initiative Applied Plant Genomics CAP Program of the
USDA’s National Institute of Food and Agriculture
SolCAP Project Participants
Lead Institution: Michigan State University
New York: Cornell University
Oregon: Oregon State University; Cedar Lake Research and Consulting
Maryland: USDA/ARS Beltsville
Idaho: USDA/ARS, University of Idaho
California: UC Davis; Campbells R&D
Minnesota: University of Minnesota
West Virginia: West Virginia State University
Wisconsin: USDA/ARS, University of Wisconsin
North Carolina: North Carolina State University
Michigan: Michigan State University
Florida: University of Florida
Ohio: Ohio State University
Commercial Solanaceae Production
US: $5.38 billion product value (1.6 million acres)
Potato Breeding Bottlenecks & Challenges
• Tetraploid genetics
• Narrow genetic base
• Small populations
• Many pests
• Multi-trait evaluation
– Quality
– Resistance
– Agronomy
• Market differentiation
Potato Breeding Bottlenecks & Challenges
• Lack of markers in elite germplasm
• Selection is mostly phenotype-based
• Market-defining carbohydrate (CHO) traits are
difficult to select for at early generation
stages.
• Breeder needs to combine the
market-driven quality with the
agronomic performance and host
plant resistance needed by the
growers.
The Potato Genome Sequencing Consortium
• The Potato Genome Sequencing Consortium (PGSC) has
collaborated to sequence the genomes of two species:
Solanum tuberosum (RH) and Solanum phureja (DM1-3 516
R44).
• First potato genome assembly http://www.potatogenome.net
The Potato Genome Sequencing Consortium
Whole genome shotgun sequencing – Hybrid approach
using three sequencing technologies
Metrics: 850 Mb
• V3: 9,171 scaffolds (717.5Mb) & 58,998 contigs (9.7Mb)
• N50 scaffold size: 1,318,511bp
• N90 scaffold size: 253,760 bp
Available at:
Potatogenome.net
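N50 and N90 summarize assembly contiguity: with scaffolds sorted from longest to shortest, N50 is the length of the scaffold at which the running total first reaches 50% of the assembled bases (N90 uses 90%). A minimal Python sketch of that calculation follows. The function name `nx` and the toy lengths are illustrative only and are not PGSC data.

```python
def nx(lengths, fraction):
    """Return the Nx statistic: the length L such that sequences of
    length >= L together contain at least `fraction` of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]

# Toy scaffold lengths (hypothetical, not taken from the assembly)
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = nx(toy_scaffolds, 0.5)  # running total 100 + 80 = 180 >= 150
toy_n90 = nx(toy_scaffolds, 0.9)  # running total 100 + 80 + 60 + 40 = 280 >= 270
print(toy_n50, toy_n90)
```

Applied to the V3 scaffold lengths, the same function would be expected to reproduce the N50 and N90 values quoted on the slide.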
Annotating the Potato Genome
• Identified genes
• Sequenced transcriptome from 29 different DM tissues
• Analyzing the genes and their expression currently
In Solanaceae There is a Major Gap
Between Genomic Information and Breeding
• Potato breeding is based upon phenotypes, not
genotypes, even though the genome is being
sequenced.
• Marker assisted breeding (MAB) is not widely practiced
due to a lack of genetic markers linked to traits of
interest.
• SolCAP is providing a translational genomics strategy.
Primary Research Objective
• To reduce the gap between genomics and breeding
SolCAP will provide infrastructure to link allelic variation of
SNPs in genes to valuable traits.
– Identify up to 10,000 SNPs for potato in elite germplasm
– Combine eSNPs w/ Illumina sequence-identified SNPs
– 75% of the SNPs distributed throughout the genome
– 25% of the SNPs targeted to candidate genes and genetic markers
– Genotype germplasm panels and mapping populations with
Illumina Infinium platform
Plan of Work
• Develop extensive sequence data of expressed genes,
and identify SNP markers associated with candidate
genes for CHO and vitamin biosynthetic pathways.
• Collect standardized phenotypic data of panel and 4x
mapping population across multiple environments for
potato.
• Address regional, individual program and emerging
needs through a small grants program that supports
SNP genotyping of additional mapping populations.
• Create integrated, breeder-focused resources for
genotypic and phenotypic analysis by leveraging existing
databases and resources at SGN and MSU.
Potato SNP Marker Discovery
1. Existing eSNPs from Kennebec, Bintje and Shepody ESTs
2. New Illumina GAII sequencer-identified SNPs from important
processing cultivar transcriptomes:
Atlantic – high solids chip-processor
Snowden – low reducing sugar storage chip processor
Premier Russet – low reducing sugar; frozen proc.
In Silico Sanger Identified SNPs (eSNPs)
                                                       Tomato        Potato
Total # of Transcript Assemblies:                      48,945        70,344
Total bp length of Transcript Assemblies:          33,916,704    49,859,202
Total # Transcript Assemblies w/ putative SNPs:         5,198         7,722
Total bp length of Transcript Assemblies w/ SNPs:   6,347,780     8,872,526
Total # of putative SNP positions:                     16,531        57,705
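One way to compare the two crops is putative SNP positions per kb of SNP-containing transcript assembly. The short Python sketch below derives that figure from the counts above. The density metric itself is not reported on the slide; it is an illustrative calculation.

```python
# Counts copied from the eSNP summary above.
esnp_stats = {
    "Tomato": {"snp_positions": 16_531, "bp_with_snps": 6_347_780},
    "Potato": {"snp_positions": 57_705, "bp_with_snps": 8_872_526},
}

def snps_per_kb(snp_positions, bp_length):
    """Putative SNP positions per 1,000 bp of sequence."""
    return 1000 * snp_positions / bp_length

density = {
    crop: snps_per_kb(s["snp_positions"], s["bp_with_snps"])
    for crop, s in esnp_stats.items()
}
for crop, value in density.items():
    print(f"{crop}: {value:.2f} putative SNPs per kb")
```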
Sanger-derived Potato eSNPs
- Intra-varietal and inter-varietal
- Bulk of sequence data from ESTs
- http://solanaceae.plantbiology.msu.edu/analyses_snp.php
cDNA Libraries for Sequencing
Using Illumina Genome Analyzer II
Potato cultivars:
•Snowden
•Atlantic
•Premier Russet
Tissues: Tuber, Leaf, Flower, Callus
•Isolate RNA from these 4 tissues
•Pool in equimolar amounts
•Construct normalized cDNA to reduce representation of
abundant transcripts
SNP Workflow
Library creation/QC
GAII sequencing
(single and paired end)
Assembly
Data Collection
Analysis: transcriptome complexity
SNP calling/validation
Data Analysis of Illumina
cDNA Reads: Potato
Sample       Total Clusters   Total Reads   PF Passed Clusters   % PF Passed Clusters   Total PF Reads   Actual Reads
Atlantic 1        7,601,277    15,202,554            6,382,748                  83.97       12,765,496
Atlantic 2       10,544,542    21,089,084            9,252,168                  87.74       18,504,336     30,185,186
Premier 1         7,812,394    15,624,788            6,652,121                  85.15       13,304,242
Premier 2        11,678,379    23,356,758            9,999,926                  85.63       19,999,852     31,949,096
Snowden 1         7,996,418    15,992,836            6,837,553                  85.51       13,675,106
Snowden 2        11,781,671    23,563,342           10,393,322                  88.22       20,786,644     33,288,120
(Actual Reads is listed once per cultivar.)
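The columns of the run statistics above are related by simple arithmetic: in these data each cluster contributes two reads, and % PF is the share of clusters that pass the filter. The Python sketch below recomputes the derived columns from the cluster counts. The function name and the `reads_per_cluster=2` default are assumptions inferred from the table, not a description of the Illumina pipeline.

```python
# (total clusters, PF clusters) copied from the run statistics above
runs = {
    "Atlantic 1": (7_601_277, 6_382_748),
    "Atlantic 2": (10_544_542, 9_252_168),
    "Premier 1": (7_812_394, 6_652_121),
    "Premier 2": (11_678_379, 9_999_926),
    "Snowden 1": (7_996_418, 6_837_553),
    "Snowden 2": (11_781_671, 10_393_322),
}

def pf_summary(total_clusters, pf_clusters, reads_per_cluster=2):
    """Recompute total reads, % PF clusters and PF reads for one run."""
    return {
        "total_reads": total_clusters * reads_per_cluster,
        "pct_pf": 100 * pf_clusters / total_clusters,
        "pf_reads": pf_clusters * reads_per_cluster,
    }

for sample, (total, pf) in runs.items():
    s = pf_summary(total, pf)
    print(f"{sample}: {s['total_reads']:,} reads, "
          f"{s['pct_pf']:.2f}% PF, {s['pf_reads']:,} PF reads")
```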
De Novo Velvet Assemblies of Potato
Illumina Sequences
Minimum contig length of 150bp:
Variety    Total Gb   Transcriptome Size (Mb)   No. Contigs   N50 (bp)   Maximum Contig (Kb)
Atlantic        1.8                      38.4         45215       1192                  11.2
Premier         1.9                      38.2         54917        826                   6.6
Snowden         2.0                      38.2         58754        775                   6.9
Velvet Assemblies of Potato
Illumina Sequences
Alignment of S. tuberosum GAII-transcriptome contigs to
the PGSC draft genome sequence from DM1-3 516 R44:
• Atlantic:
– 45214 contigs
– 32520 align with GMAP(95%id, 50%cov)
– 27106 align with GMAP(95%id, 90%cov)
• Premier:
– 54917 contigs
– 41497 align with GMAP (95%id, 50%cov)
– 37297 align with GMAP (95%id, 90%cov)
• Snowden:
– 58754 contigs
– 44479 align with GMAP (95%id, 50%cov)
– 40708 align with GMAP (95%id, 90%cov)
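The two thresholds above (95% identity with either 50% or 90% coverage) amount to a simple filter over per-contig alignment summaries. The Python sketch below illustrates the idea on toy records. The `Alignment` class and its field names are hypothetical and do not reflect GMAP's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    contig_id: str
    percent_identity: float  # % identical bases over the aligned region
    percent_coverage: float  # % of the contig length that aligned

def passes(aln, min_identity=95.0, min_coverage=50.0):
    """True if the alignment meets both thresholds."""
    return (aln.percent_identity >= min_identity
            and aln.percent_coverage >= min_coverage)

# Toy alignments (hypothetical values)
alignments = [
    Alignment("contig_1", 99.1, 97.0),
    Alignment("contig_2", 96.4, 62.5),
    Alignment("contig_3", 91.0, 99.0),
]
relaxed = [a.contig_id for a in alignments if passes(a, 95.0, 50.0)]
strict = [a.contig_id for a in alignments if passes(a, 95.0, 90.0)]

# Share of Atlantic contigs aligned at 95% identity, 50% coverage,
# using the counts reported above.
atlantic_fraction_50cov = 32520 / 45214
print(relaxed, strict, f"{atlantic_fraction_50cov:.1%}")
```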
Identify intra-varietal SNPs
Query       SNPs     Filtered SNPs
Atlantic    224748   150669
Premier     265673   181800
Snowden     258872   166253
[Figure: A/C SNP]
Filtered SNP counts
Filtering on SNP quality and 1 SNP/ 150bp window
Ref        Query       d 10    d 20    d 30    d 40    d 50    d 60   d 100
Atlantic   Atlantic   21336   17509   14493   12150   10277    8673    4435
Atlantic   Premier    21789   18050   15084   12477   10584    8919    4620
Atlantic   Snowden    19997   16518   13694   11378    9689    8048    4173
Premier    Atlantic   21117   17096   14106   11785    9790    8222    4228
Premier    Premier    22951   18431   15016   12377   10300    8703    4371
Premier    Snowden    20972   16846   13709   11357    9479    7873    4113
Snowden    Atlantic   20777   16998   13984   11619    9647    8131    4186
Snowden    Premier    22101   17888   14701   12068   10124    8650    4223
Snowden    Snowden    21083   16963   13792   11218    9359    7735    3896
Design SNPs for the Illumina Infinium Platform
SNPs from:
Final SNP 10K array content selected from 69,011 SNPs
that pass the filtering and design criteria for the Infinium®
platform using the following criteria:
-Read Depth: 20 reads min, 255 reads max
-Biallelic based on all available sequence
-Within exons (map to DM1-3 draft genome sequence);
specifically, 50 bp from exon/intron junction
-Max 1 SNP within 50 bp of candidate SNP
-Preferred SNPs that were intervarietal
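The design criteria above can be read as a chain of per-SNP checks followed by a preference ordering. A minimal Python sketch under that reading is given below. The record fields are hypothetical, "50 bp from exon/intron junction" is interpreted as at least 50 bp, and this is not the project's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class CandidateSNP:
    snp_id: str
    read_depth: int
    n_alleles: int               # alleles seen across all available sequence
    in_exon: bool                # maps within an exon of the DM1-3 draft genome
    dist_to_exon_boundary: int   # bp to the nearest exon/intron junction
    neighbors_within_50bp: int   # other SNPs within 50 bp of this one
    intervarietal: bool

def passes_design(snp, min_depth=20, max_depth=255,
                  min_junction_dist=50, max_neighbors=1):
    """Apply the listed filters; thresholds follow the slide."""
    return (min_depth <= snp.read_depth <= max_depth
            and snp.n_alleles == 2
            and snp.in_exon
            and snp.dist_to_exon_boundary >= min_junction_dist
            and snp.neighbors_within_50bp <= max_neighbors)

def select_for_array(snps):
    """Keep passing SNPs, listing intervarietal SNPs first (stable sort)."""
    passing = [s for s in snps if passes_design(s)]
    return sorted(passing, key=lambda s: not s.intervarietal)

# Toy candidates (hypothetical values)
toy = [
    CandidateSNP("snp_a", 40, 2, True, 120, 0, False),
    CandidateSNP("snp_b", 12, 2, True, 120, 0, True),   # read depth too low
    CandidateSNP("snp_c", 60, 2, True, 75, 1, True),
    CandidateSNP("snp_d", 60, 3, True, 75, 0, True),    # not biallelic
]
selected_ids = [s.snp_id for s in select_for_array(toy)]
print(selected_ids)
```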
Candidate Genes For Genotyping
-2009/10: a community call for genes to be placed on the
potato and tomato platforms (assuming SNPs could be
designed)
-Strong response from the community via web page
submissions, direct solicitations and email solicitations
-~1800 sequences were identified by project personnel and the
community for this targeted SNP discovery (note: this set
includes redundant sequences)
-In potato, > 700 candidate genes have a SNP that passes our
filtering criteria
SNPs found in candidate genes
1065 candidates with no SNPs
160 candidates with 1 SNP
135 candidates with 2 SNPs
102 candidates with 3 SNPs
100 candidates with 4 SNPs
48 candidates with 5 SNPs
175 candidates with 6-10 SNPs
54 candidates with 11-31 SNPs
We want up to 5 SNPs per candidate gene.
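The class counts above can be tallied to check them against the "~1800 sequences" and "> 700 candidate genes" figures, and to see how many candidates exceed the target of 5 SNPs. A short Python sketch (variable names are illustrative):

```python
# Number of candidates in each SNP-count class, copied from the slide.
candidates_by_snp_count = {
    "0": 1065, "1": 160, "2": 135, "3": 102,
    "4": 100, "5": 48, "6-10": 175, "11-31": 54,
}

total_candidates = sum(candidates_by_snp_count.values())
with_at_least_one_snp = total_candidates - candidates_by_snp_count["0"]
# Candidates with more SNPs than the target of 5 per gene
above_target = candidates_by_snp_count["6-10"] + candidates_by_snp_count["11-31"]

print(total_candidates, with_at_least_one_snp, above_target)
```

The totals (1839 candidates, 774 with at least one SNP) are consistent with the approximate figures quoted for the community call.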
SNPs in some key candidate genes
Candidate gene                                                 SNPs
Sucrose-phosphate-synthase                                       20
Soluble starch synthase 3, chloroplastic/amyloplastic            18
Acid invertase                                                   16
Granule-bound starch synthase 2, chloroplastic/amyloplastic      10
Glucose-6-phosphate isomerase                                    10
Sucrose synthase                                                 10
Isoamylase isoform 2                                              8
Sucrose transporter                                               8
Beta-amylase                                                      6
Sucrose synthase                                                  6
Granule-bound starch synthase 1                                   6
Phosphoglucomutase                                                6
Spacing and gene region coverage
We expect approximately 25% of the SNPs will be mapped to
candidate genes, 10% to SNPs from known genetic markers,
and 65% to genes distributed across scaffolds, primarily those
anchored to the DM1‐3 516R44 S. phureja draft genome.
2769 SNPs in candidate genes
508 SNPs in genetic markers
6723 SNPs will come from throughout the genome
How much of the genome is represented?
~650 Mb of the genome will be covered (~850 Mb genome)
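The planned counts can be converted to shares of the array and compared with the expected 25% / 10% / 65% split. The Python sketch below does that arithmetic and also expresses the ~650 Mb covered as a fraction of the ~850 Mb genome. It is a worked calculation from the slide's numbers, nothing more.

```python
# Planned SNP counts per category, copied from the slide.
planned = {
    "candidate genes": 2769,
    "genetic markers": 508,
    "genome-wide": 6723,
}

total_snps = sum(planned.values())
shares = {name: 100 * n / total_snps for name, n in planned.items()}

# Approximate genome representation (both values are rounded on the slide).
genome_fraction = 650 / 850

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
print(f"genome represented: {genome_fraction:.0%}")
```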
Validation
High Resolution Melting
• Tested: 48 primers
• Validation (75%)
• Problems with technical
replicates
GoldenGate Bead Express
96 x 480 samples
Selected 32 SNPs total per variety
(96 total)
Validation rate ~85%
Illumina Output: Good
SNP Validation: Dosage calls
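                               Premier   Atlantic   Snowden   Rio Grande
BBBB                                14          6         7           13
AAAA                                17          5         6           21
hets (no dosage class given)         3          5         3
BBBA                                13         30        26
AABB                                21         31        28
AAAB                                20         11        19
Total hets                          57         77        76           59
no data                              6          6         6
nulliplex                           31         11        13
simplex                             33         41        45
duplex                              21         31        28
(Rio Grande: only BBBB, AAAA and one further value, 59, are given; its placement under Total hets is inferred.)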
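The dosage classes on this slide group the five possible tetraploid genotypes: both homozygotes (AAAA, BBBB) count as nulliplex, AAAB and BBBA as simplex, and AABB as duplex. The Python sketch below encodes that grouping and re-derives the Premier class totals. The grouping is inferred from the slide's totals, and the function name is illustrative.

```python
from collections import Counter

def dosage_class(genotype):
    """Classify a biallelic tetraploid genotype written with A and B,
    in any allele order (for example 'AAAB', 'BBBA', 'AABB')."""
    if len(genotype) != 4 or set(genotype) - {"A", "B"}:
        raise ValueError(f"not a biallelic tetraploid genotype: {genotype!r}")
    minor_copies = min(genotype.count("A"), genotype.count("B"))
    return {0: "nulliplex", 1: "simplex", 2: "duplex"}[minor_copies]

# Premier genotype counts as read from the slide
premier_counts = {"BBBB": 14, "AAAA": 17, "BBBA": 13, "AABB": 21, "AAAB": 20}

premier_classes = Counter()
for genotype, n in premier_counts.items():
    premier_classes[dosage_class(genotype)] += n
print(dict(premier_classes))
```

The resulting totals match the nulliplex, simplex and duplex values listed for Premier.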
SNP segregation in 4x Russet
Mapping population (Premier x Rio Grande)
Cross
Duplex x Duplex
Duplex x Nulliplex
Duplex x Simplex
10
2
11
10
4
22
Nulliplex x Duplex
Nulliplex x Nulliplex
Nulliplex x Simplex
2
28
6
28
14
Simplex x Duplex
Simplex x Nulliplex
Simplex x Simplex
Simplex x Het
Total
11
8
11
5
11
5
94
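Parental dosage determines how a SNP is expected to segregate in a tetraploid cross. The Python sketch below computes expected offspring dosage frequencies under a standard tetrasomic model (each gamete receives a random pair of the four homologous chromosomes, with no double reduction). That model is an assumption chosen for illustration; the slide does not state which model was used. Here dosage means the number of copies (0 to 4) of one allele.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def gamete_distribution(dosage):
    """Probability that a 2x gamete carries 0, 1 or 2 copies of the allele,
    for a 4x parent with `dosage` copies (0 to 4)."""
    chromosomes = [1] * dosage + [0] * (4 - dosage)
    counts = Counter(a + b for a, b in combinations(chromosomes, 2))
    return {copies: Fraction(n, 6) for copies, n in counts.items()}

def offspring_distribution(dosage_p1, dosage_p2):
    """Expected frequencies of offspring dosage for a cross of two parents."""
    out = Counter()
    g1 = gamete_distribution(dosage_p1)
    g2 = gamete_distribution(dosage_p2)
    for (c1, p1), (c2, p2) in product(g1.items(), g2.items()):
        out[c1 + c2] += p1 * p2
    return dict(out)

simplex_x_nulliplex = offspring_distribution(1, 0)  # expected 1:1
duplex_x_nulliplex = offspring_distribution(2, 0)   # expected 1:4:1
print(simplex_x_nulliplex)
print(duplex_x_nulliplex)
```

Under this model a simplex x nulliplex SNP segregates like a presence/absence marker, whereas a duplex x nulliplex SNP yields three dosage classes that can only be separated if dosage is scored.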
Pair-wise Comparison of SNPs
                                       Non-segregating   Segregating SNPs (%)z
Cross                        Ploidy    SNPs (%)           I        II
W2310-3 x Kalkaska           4X        22.4              37.6     40.0
MSG227-2 x Jacqueline Lee    4X        16.5              51.8     31.8
Atlantic x Superior          4X         5.9              51.8     42.4
Stirling x 12601ad1          4X        25.9              37.6     36.5
B1829-5 x Atlantic           4X        11.5              18.8     69.8
BER 63 x DM1-3               2X        79.3              20.7      0
BER 83 x DM1-3               2X        78.8              21.2      0
84SD22 x DM1-3               2X        46.0              54.0      0
MCR205 x DM1-3               2X        76.7              23.3      0
DI x DM1-3                   2X        85                15        0
08675-21 x 09901-01          2X        53.8              46.2      0
RH x SH                      2X        59                41        0
zI = segregation not dependent on scoring dosage; II = segregation dependent on scoring
dosage
Percent Heterozygosity
[Figure: % heterozygosity by clone, 96 SNPs x 96 potato lines]
Potato Panel SNP Heterozygosity
[Figure: % of clones in each range (y-axis, 0.0 to 70.0) against range of % heterozygosity (x-axis, 1-10% to 90-100%), for diploid clones and tetraploid clones]
SNP Heterozygosity Extremes
Tetraploids, 80-90%          % Heterozygosity
All Red                      81.2
CF77154-1                    83.5
CO95051-5W                   84.7
Snowden                      84.7
Atlantic                     85.9
CO97215-2P/P                 85.9

Tetraploids, <50%            % Heterozygosity
Chunshu No4                  27.1
P1                           28.2
MSL512-6                     44.7
Inca Gold                    47.1
P2                           48.2
NDSU clone 4                 48.2

Diploids, 50-60%             % Heterozygosity
C5                           52.9
84SD22                       56.7
MCD500036                    69.4

Diploids, 1-10%              % Heterozygosity
DM1-3                        0
ber265857                    5.9
CMM6-3                       7.8
CMM 1T                       8.2
CMM243503                    8.2
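Percent heterozygosity for a clone is the share of its SNP calls that carry both alleles. A minimal Python sketch is given below. It assumes genotype strings such as 'AB' (2x) or 'AAAB' (4x) and assumes that missing calls are excluded from the denominator; the slides do not say how missing data were handled.

```python
def percent_heterozygosity(genotypes):
    """Percent of called SNPs that are heterozygous.

    `genotypes` is a list of calls such as 'AA', 'AB', 'AAAB' or 'AABB';
    None or '' marks a missing call and is skipped (an assumption).
    """
    called = [g for g in genotypes if g]
    if not called:
        raise ValueError("no genotype calls")
    hets = sum(1 for g in called if len(set(g)) > 1)
    return 100 * hets / len(called)

# Toy clones (hypothetical calls)
toy_tetraploid = ["AAAB", "AAAA", "AABB", None, "BBBB"]
toy_doubled_monoploid = ["AA", "BB", "AA", "BB"]
het_4x = percent_heterozygosity(toy_tetraploid)
het_dm = percent_heterozygosity(toy_doubled_monoploid)
print(het_4x, het_dm)
```

A fully homozygous line such as the doubled monoploid DM1-3 is expected to score 0%, as reported above.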
Potato Germplasm Panel
• Panel structure (350 clones)
– Top 50 N. American varieties
– Historical varieties
– Advanced US breeding lines
– Non-US germplasm
– Genetic stocks
• Population analyses
– Association mapping
– Historical relationship
– Hypothesis testing for trait
associations
– Parental selection
– Resolve population structure
• Phenotypic screening for
additional traits outside of
SolCAP
• Phenotypic evaluation
– Key traits: specific gravity,
sucrose, glucose, Vitamin C,
maturity, tuber shape, tuber
number, etc.
– Additional traits determined by
breeding community
– Data curated at SGN
SNP comparison across potato germplasm
panel: resolving population structure
MSU Breeding Program varieties
Group Phureja clones cluster separately from elite germplasm
Wild species cluster separately from Phureja and Tuberosum
SNP Genotyping Consortium
• Potato 10K (~9100 SNPs) Illumina Infinium chip
• a core set of SNPs in standard germplasm panels in tomato
and potato.
• Over 3000 genotyping samples were ordered
• Consortium’s efforts resulted in securing a 24% discount per
sample beyond what would have been possible with one
contributor ($85/sample)
• The barrier to entry for many institutions was lowered, as
they were able to access this tool with only a 48 sample
commitment.
• Illumina saw orders from each of the
three major world regions.
• More SNPs?
SolCAP SNP Genotyping
• ~9100 SNPs for elite potato germplasm
• 2010 SolCAP Goal:
1,152 potato x 9,100 SNPs
• potato germplasm panel: 350
• 4x russet mapping population: 200
• 2x mapping population: 160
• Community SNP genotyping:
– 2 populations: 350
What makes up the Potato Germplasm Panel
Phenotypic Evaluation?
• Clonal Study (CS)
– 250 clones
– 2 reps X 10 hills
– OR, WI, NY
• Russet Mapping Population (MP)
– Rio Grande X Premier Russet
– 200 progeny
– 2 reps X 10 hills
– ID, NC, MN
[Map: CS and MP trial locations; states in blue = participants in SolCAP]
Potato Germplasm Panel
• To be field tested 2 years X 3 major environments for
potato production.
• Evaluation of specific gravity, glucose and sucrose, chip
color, skin type, shape, vine maturity, tuber number,
tuber shape, vitamin C, internal defects, bruising,
anthocyanins and biotic resistances.
Genotyping the core collections will
impact strategies for translation
• Potential translational approaches:
– 1) introgression from other populations (domesticated or wild)
– 2) selection for coupling phase recombinants to establish
linkage blocks of favorable alleles (e.g. disease resistance loci)
– 3) population development designed to maximize variation w/in
market classes
– 4) association approaches
– 5) whole genome approaches
• Other translational strategies will emerge under other
CAPs or through innovation in public research.
Russet 4x Mapping Population
• Evaluate russet mapping population traits (Yencho, Novy,
Sowokinos, Thill, Gupta, Haynes) (2009-2011)
– Key traits: specific gravity, sucrose, glucose, Vitamin C,
maturity, tuber shape, tuber number, etc.
• Genetic Mapping (Van Deynze, De Jong, Douches)
– Genotyping 9100 SNPs
• QTL Analysis (Haynes)
– Identify markers associated with key traits
• MAS/MAB (Marker Assisted Selection / Breeding)
– Validation of QTL in additional mapping populations
– Use markers in new breeding populations
Databases and Resources
• Integrated, breeder-focused resources for genotypic and
phenotypic analysis at SGN and MSU.
– http://solcap.msu.edu
– http://solanaceae.plantbiology.msu.edu/
– http://solgenomics.net/
SolCAP Education and Extension Objectives
• Team-taught distance-learning graduate level course in
translational genomics at Cornell University
• Yearly workshops for breeders to integrate genotype-based breeding strategies with elite germplasm
• Use eXtension.org to develop a Community of Practice
for plant breeders, called Plant Breeding and Genomics,
across all CAPs (Barley, Wheat, Conifer, RosBreed,
Bean, Onion)
SolCAP PAA Workshop
• August 15, 2010 Corvallis, Oregon
• Hands-on computer lab format
• Topics
– Potato genome analysis: Robin Buell
– Tetraploid QTL analysis: Christine Hackett
– Use of Illumina Genome studio: Allen Van
Deynze
PB&GWorks Web community
http://pbgworks.hort.oregonstate.edu/
SolCAP has created PBGworks, a web community within eXtension.org.
Plant breeders, basic scientists, seed industry professionals, agricultural
professionals, extension specialists and others can publish content and
network.
Target audience: The practicing plant breeder.
Our long-term goal is to provide:
• Start-to-finish examples of marker-assisted
selection applications
• Resource pages including protocols, software
tutorials, and up-to-date contact information for
companies offering genetic services
• Improved access to genetic resources through
the "breeder's toolbox"
Potato SNP Summary
• In silico Sanger eSNPs: potato: 57,705 eSNPs
• ~75,000 potato SNPs from 5.7 Gb of GAII transcriptome
sequence (69,011 SNPs passed Infinium design)
• ~650 Mb of the genome will be covered by SNPs
• Validation suggests SNPs can be called in broader germplasm
• Dosage reads of SNPs will optimize SNP genotyping of 4x
mapping populations
• Reference sequence of DM1-3 516R44 is permitting
bioinformatic optimization of pipelines rather than relying on
empirical validation.
Germplasm Panel SNP Genotyping
• SSR-based genetic map
– 2 years
– 200 markers
• 17 markers/chromosome
– $5/ data point
– Not dense enough for 4x
mapping
– Markers may be linked to
traits
• SNP-based genetic map
– < 1 week
– 9,100 markers
• >700 markers/chromosome
– < 2 ¢ / data point
– Dense enough for 4x
mapping
– Markers are in genes
– Markers robust enough
for broader germplasm
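The per-chromosome and per-data-point figures in this comparison follow from simple division. The Python sketch below reproduces them, using 12 chromosomes (the basic chromosome number of potato) and, as an assumption, the $85 per-sample price quoted for the genotyping consortium. Whether that price is the basis for the "< 2 ¢" figure is not stated in the slides.

```python
N_CHROMOSOMES = 12  # basic chromosome number of potato (x = 12)

def markers_per_chromosome(n_markers, n_chromosomes=N_CHROMOSOMES):
    """Average number of markers per chromosome."""
    return n_markers / n_chromosomes

def cents_per_data_point(price_per_sample_usd, n_markers):
    """Cost of one marker genotype for one sample, in US cents."""
    return 100 * price_per_sample_usd / n_markers

ssr_density = markers_per_chromosome(200)    # about 17 per chromosome
snp_density = markers_per_chromosome(9100)   # more than 700 per chromosome
snp_cost = cents_per_data_point(85, 9100)    # below 2 cents (assumed $85/sample)
print(f"{ssr_density:.1f} {snp_density:.1f} {snp_cost:.2f}")
```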
Outcomes for Breeding from SolCAP
• A genome-wide set of markers and
bioinformatic tools accessible by
breeders
– Breeders will access germplasm for
crossing based upon SNP polymorphism
and linked QTL of interest
– design crosses complementary for QTL
and traits, and then use MAB in early
generation selection.
Outcomes for Breeding from SolCAP
• Better understanding of the allelic
variation influencing CHOs
– Design crosses to create improved
sugar and starch levels and starch
quality.
– Crosses designed to manipulate and
select variation within existing elite
populations or introgress novel
alleles from wild germplasm.
– More predictable and directed
breeding effort for processing and
fresh market traits.
SolCAP Acknowledgments
Collaborators, OSU
David Francis
Matt Robbins
Sung-Chur Sim
Troy Aldrich
Collaborators, MSU
David Douches
C Robin Buell
John Hamilton
Kelly Zarka
Collaborators, Cornell
Walter De Jong
Lukas Mueller
Joyce van Eck
Collaborators, UCD
Allen Van Deynze
Kevin Stoffel
Alex Kozic
Jeanette Martins
Collaborators,
Oregon State
Alex Stone
John McQueen
Roger Leigh
Others:
Michael Coe
Sanwen Huang
Acknowledgments: PGSC
BGI-Shenzhen, China (Sanwen Huang, Ruiqiang Li, Xun Xu, Wei Fan, Peixiang Ni, Hongmei Zhu, Desheng Mu,
Bicheng Yang, Jian Wang and Jun Wang); Center Bioengineering RAS, Russia (Boris Kuznetsov); Central
Potato Research Institute, India (Swarup Chakrabarti, V.U. Patil, Shashi Rawat and S.K. Pandey); Chinese
Academy of Agricultural Sciences, China (Sanwen Huang, Zhonghua Zhang and Dongyu Qu); University of
Dundee, United Kingdom (Dan Bolser and David Martin); ENEA, Italian National Agency for New
Technologies, Energy and the Environment, Italy (Giovanni Giuliano and Gaetano Perrotta); Imperial College
London, United Kingdom (Gerard Bishop); International Potato Center (CIP), Peru (Merideth Bonierbale, Marc
Ghislain and Reinhard Simon); Institute of Biochemistry and Biophysics (PAS), Poland (Wlodzimierz Zagorski,
Jacek Hennig, Pawel Szczesny, Piotr Zielenkiewicz and Robert Gromadka); Instituto Nacional de Tecnología
Agropecuaria (INTA), Argentina (Gabriela Massa, Leandro Barreiro and Sergio Feingold); Instituto de
Investigaciones Agropecuarias (INIA), Chile (Boris Sagredo, Alex Di Genova and Nilo Mejía); Michigan State
University, USA (Robin Buell, David Douches, Steven Lundback, Alicia Massa, and Brett Whitty); New Zealand
Institute for Plant & Food Research, New Zealand (Jeanne Jacobs, Mark Fiers and Susan Thomson); Scottish
Crop Research Institute, United Kingdom (Glenn Bryan, David Marshall, Robbie Waugh and Sanjeev Kumar
Sharma); Teagasc Agriculture and Food Development Authority, Ireland (Dan Milbourne, Istvan Nagy and
Marialaura Destefanis); Universidad Peruana Cayetano Heredia, Peru (Gisella Orjeda, Frank Guzman, Michael
Torres, Tomas Miranda, German de la Cruz, Roberto Lozano and Olga Ponce); University of Wisconsin, USA
(Jiming Jiang and Marina Iovene); Virginia Polytechnic Institute & State University, USA (Richard E. Veilleux);
Wageningen University, The Netherlands (Bas te Lintel Hekkert, Christian Bachem, Erwin Datema, Jan de Boer,
Richard Visser, Roeland van Ham, Theo Borm and Xiaomin Tang)
Funding at MSU for potato genomics: National Science Foundation
Visit us at http://solcap.msu.edu/
EXTRAS
What is a SNP?
Single-nucleotide polymorphism
(SNP, pronounced snip)
SNP is a DNA sequence variation occurring
when a single nucleotide — A, T, C, or G —
in the genome differs between members of a
species
SNPs may fall within coding sequences of
genes, non-coding regions of genes, or in the
intergenic regions between genes.
SNPs within a coding sequence may or may
not change the amino acid sequence of the
protein that is produced.
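Whether a coding SNP changes the protein depends on the codon it falls in. The Python sketch below builds the standard genetic code and classifies a single-base change within one codon. The function name and the example codons are illustrative and are not SolCAP SNPs.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def snp_effect(codon, position, alt_base):
    """Classify a SNP at `position` (0, 1 or 2) of a DNA codon.

    Returns (reference amino acid, alternate amino acid, label), where
    '*' denotes a stop codon.
    """
    ref_aa = CODON_TABLE[codon]
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    label = "synonymous" if alt_aa == ref_aa else "nonsynonymous"
    return ref_aa, alt_aa, label

silent = snp_effect("GAA", 2, "G")    # GAA -> GAG, both glutamate
changing = snp_effect("GAA", 1, "T")  # GAA -> GTA, glutamate to valine
print(silent, changing)
```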
Hawkeye Viewer – Visualizing
SNPs
G/T SNP