SolCAP: Solanaceae Coordinated Agricultural Project
SNP Development for Elite Potato Germplasm

David Douches, Walter De Jong, Robin Buell, David Francis, John Hamilton, Lukas Mueller, Allen Van Deynze

Funding: USDA/AFRI
This project is supported by the Agriculture and Food Research Initiative Applied Plant Genomics CAP Program of USDA’s National Institute of Food and Agriculture.

What is SolCAP?
The SolCAP project is a coordinated agricultural project that links people from public institutions, private institutions and industry who are dedicated to improving the Solanaceae crops potato and tomato. Through innovative research, education and extension, the SolCAP project will focus on providing significant benefits to both the consumer and the environment.

SolCAP Project Participants
Lead Institution: Michigan State University
• New York: Cornell University
• Oregon: Oregon State University; Cedar Lake Research and Consulting
• Maryland: USDA/ARS Beltsville
• Idaho: USDA/ARS; University of Idaho
• California: UC Davis; Campbells R&D
• Minnesota: University of Minnesota
• West Virginia: West Virginia State University
• Wisconsin: USDA/ARS; University of Wisconsin
• North Carolina: North Carolina State University
• Michigan: Michigan State University
• Florida: University of Florida
• Ohio: Ohio State University

Commercial Solanaceae Production
US: $5.38 billion product value (1.6 million acres)

Potato Breeding Bottlenecks & Challenges
• Tetraploid genetics
• Narrow genetic base
• Small populations
• Many pests
• Multi-trait evaluation
– Quality
– Resistance
– Agronomy
• Market differentiation

Potato Breeding Bottlenecks & Challenges (continued)
• Lack of markers in elite germplasm
• Mostly a phenotype-based process
• Market-defining carbohydrate (CHO) traits are difficult to select for at early generation stages.
• The breeder needs to combine market-driven quality with the agronomic performance and host plant resistance needed by growers.

The Potato Genome Sequencing Consortium
• The Potato Genome Sequencing Consortium (PGSC) has collaborated to sequence the genomes of two species: Solanum tuberosum (RH) and Solanum phureja (DM1-3 516 R44).
• First potato genome assembly: http://www.potatogenome.net

The Potato Genome Sequencing Consortium (continued)
Whole genome shotgun sequencing
– Hybrid approach using three sequencing technologies
Metrics (genome size: 850 Mb)
• V3: 9,171 scaffolds (717.5 Mb) & 58,998 contigs (9.7 Mb)
• N50 scaffold size: 1,318,511 bp
• N90 scaffold size: 253,760 bp
Available at: Potatogenome.net

Annotating the Potato Genome
• Identified genes
• Sequenced transcriptome from 29 different DM tissues
• Currently analyzing the genes and their expression

In Solanaceae There is a Major Gap Between Genomic Information and Breeding
• Potato breeding is based on phenotypes, not genotypes, even though the genome is being sequenced.
• Marker assisted breeding (MAB) is not widely practiced due to a lack of genetic markers linked to traits of interest.
• SolCAP is providing a translational genomics strategy.

Primary Research Objective
• To reduce the gap between genomics and breeding, SolCAP will provide infrastructure to link allelic variation of SNPs in genes to valuable traits.
– Identify up to 10,000 SNPs for potato in elite germplasm
– Combine eSNPs with Illumina sequence-identified SNPs
– 75% of the SNPs distributed throughout the genome
– 25% of the SNPs targeted to candidate genes and genetic markers
– Genotype germplasm panels and mapping populations with the Illumina Infinium platform

Plan of Work
• Develop extensive sequence data of expressed genes, and identify SNP markers associated with candidate genes for CHO and vitamin biosynthetic pathways.
• Collect standardized phenotypic data of the panel and 4x mapping population across multiple environments for potato.
• Address regional, individual program and emerging needs through a small grants program that supports SNP genotyping of additional mapping populations.
• Create integrated, breeder-focused resources for genotypic and phenotypic analysis by leveraging existing databases and resources at SGN and MSU.

Potato SNP Marker Discovery
1. Existing eSNPs from Kennebec, Bintje and Shepody ESTs
2. New Illumina GAII sequencer-identified SNPs from important processing cultivar transcriptomes:
– Atlantic: high solids chip-processor
– Snowden: low reducing sugar storage chip processor
– Premier Russet: low reducing sugar; frozen processing

In Silico Sanger Identified SNPs (eSNPs)
                                                          Tomato        Potato
Total # of Transcript Assemblies                          48,945        70,344
Total bp length of Transcript Assemblies              33,916,704    49,859,202
Total # Transcript Assemblies w/ putative SNPs             5,198         7,722
Total bp length of Transcript Assemblies w/ SNPs       6,347,780     8,872,526
Total # of putative SNP positions                         16,531        57,705

Sanger-derived Potato eSNPs
- Intra-varietal and inter-varietal
- Bulk of sequence data from ESTs
- http://solanaceae.plantbiology.msu.edu/analyses_snp.php

cDNA Libraries for Sequencing Using Illumina Genome Analyzer II
Potato cultivars:
• Snowden
• Atlantic
• Premier Russet
Tissues: tuber, leaf, flower, callus
• Isolate RNA from these 4 tissues
• Pool in equimolar amounts
• Construct normalized cDNA to reduce representation of abundant transcripts

SNP Workflow
• Library creation/QC
• GAII sequencing (single and paired end)
• Assembly
• Data collection
• Analysis: transcriptome complexity
• SNP calling/validation

Data Analysis of Illumina cDNA Reads: Potato
Sample      Total Clusters  Total Reads  PF Passed Clusters  % PF Passed Clusters  Total PF Reads  Actual Reads
Atlantic 1       7,601,277   15,202,554           6,382,748                 83.97      12,765,496
Atlantic 2      10,544,542   21,089,084           9,252,168                 87.74      18,504,336    30,185,186
Premier 1        7,812,394   15,624,788           6,652,121                 85.15      13,304,242
Premier 2       11,678,379   23,356,758           9,999,926                 85.63      19,999,852    31,949,096
Snowden 1        7,996,418   15,992,836           6,837,553                 85.51      13,675,106
Snowden 2       11,781,671   23,563,342          10,393,322                 88.22      20,786,644    33,288,120

De Novo Velvet Assemblies of Potato Illumina Sequences
Minimum contig length of 150 bp:
Variety   Total Gb  No. Contigs  N50 (bp)  Maximum Contig (Kb)  Transcriptome Size (Mb)
Atlantic       1.8        45215      1192                 11.2                     38.2
Premier        1.9        54917       826                  6.6                     38.2
Snowden        2.0        58754       775                  6.9                     38.4

Velvet Assemblies of Potato Illumina Sequences
Alignment of S. tuberosum GAII-transcriptome contigs to the PGSC draft genome sequence from DM1-3 516 R44:
• Atlantic:
– 45214 contigs
– 32520 align with GMAP (95% id, 50% cov)
– 27106 align with GMAP (95% id, 90% cov)
• Premier:
– 54917 contigs
– 41497 align with GMAP (95% id, 50% cov)
– 37297 align with GMAP (95% id, 90% cov)
• Snowden:
– 58754 contigs
– 44479 align with GMAP (95% id, 50% cov)
– 40708 align with GMAP (95% id, 90% cov)

Identify intra-varietal SNPs
Query       SNPs  Filtered SNPs
Atlantic  224748         150669
Premier   265673         181800
Snowden   258872         166253
[Figure: example of an A/C SNP]

Filtered SNP counts
Filtering on SNP quality and 1 SNP/150 bp window
Ref       Query      d 10   d 20   d 30   d 40   d 50   d 60  d 100
Atlantic  Atlantic  21336  17509  14493  12150  10277   8673   4435
Atlantic  Premier   21789  18050  15084  12477  10584   8919   4620
Atlantic  Snowden   19997  16518  13694  11378   9689   8048   4173
Premier   Atlantic  21117  17096  14106  11785   9790   8222   4228
Premier   Premier   22951  18431  15016  12377  10300   8703   4371
Premier   Snowden   20972  16846  13709  11357   9479   7873   4113
Snowden   Atlantic  20777  16998  13984  11619   9647   8131   4186
Snowden   Premier   22101  17888  14701  12068  10124   8650   4223
Snowden   Snowden   21083  16963  13792  11218   9359   7735   3896

Design SNPs for the Illumina Infinium Platform
Final SNP 10K array content selected from 69,011 SNPs that pass the filtering and design criteria for the Infinium® platform using the following criteria:
- Read depth: 20 reads min, 255 reads max
- Biallelic based on all available sequence
- Within exons (map to DM1-3 draft genome sequence); specifically, 50 bp from exon/intron junction
- Max 1 SNP within 50 bp of candidate SNP
- Preferred SNPs that were intervarietal

Candidate Genes For Genotyping
- 2009/10: a community call for genes to be placed on the potato and tomato platforms (assuming SNPs could be designed)
- Had strong response by the community: web page submissions, direct solicitations, email solicitations
- ~1800 sequences were identified by project personnel and the community for this targeted SNP discovery; note: represents redundant sequences
- In potato, > 700 candidate genes have a SNP that passes our filtering criteria

SNPs found in candidate genes
• 1065 candidates with no SNPs
• 160 candidates with 1 SNP
• 135 candidates with 2 SNPs
• 102 candidates with 3 SNPs
• 100 candidates with 4 SNPs
• 48 candidates with 5 SNPs
• 175 candidates with 6-10 SNPs
• 54 candidates with 11-31 SNPs
We want up to 5 SNPs per candidate gene.

SNPs in some key candidate genes
Gene                                                           SNPs
Sucrose-phosphate-synthase                                       20
Soluble starch synthase 3, chloroplastic/amyloplastic            18
Acid invertase                                                   16
Granule-bound starch synthase 2, chloroplastic/amyloplastic      10
Glucose-6-phosphate isomerase                                    10
Sucrose synthase                                                 10
Isoamylase isoform 2                                              8
Sucrose transporter                                               8
Beta-amylase                                                      6
Sucrose synthase                                                  6
Granule-bound starch synthase 1                                   6
Phosphoglucomutase                                                6

Spacing and gene region coverage
We expect approximately 25% of the SNPs will be mapped to candidate genes, 10% to SNPs from known genetic markers, and 65% to genes distributed across scaffolds, primarily those anchored to the DM1-3 516R44 S. phureja draft genome.
• 2769 SNPs in candidate genes
• 508 SNPs in genetic markers
• 6723 SNPs will come from throughout the genome

How much of the genome is represented?
~650 Mb of the genome will be covered (~850 Mb genome)

Validation
High Resolution Melting
• Tested: 48 primers
• Validation (75%)
• Problems with technical replicates
GoldenGate BeadXpress
• 96 x 480 samples
• Selected 32 SNPs total per variety (96 total)
• Validation rate ~85%
[Figure: Illumina Output: Good]

SNP Validation: Dosage calls
Genotype    Premier  Atlantic  Snowden  Rio Grande
BBBB             14         6        7          13
AAAA             17         5        6          21
hets              3         5        3
BBBA             13        30       26
AABB             21        31       28
AAAB             20        11       19
Total hets       57        77       76
no data           6         6        6

Dosage class  Premier  Atlantic  Snowden
nulliplex          31        11       13
simplex            33        41       45
duplex             21        31       28
One further value, 59, follows the Snowden values on the extracted slide; its row and column are not recoverable.

SNP segregation in 4x Russet Mapping population (Premier x Rio Grande)
Cross classes and values, grouped as they appear on the extracted slide (the assignment of values to individual crosses is not recoverable):
Duplex x Duplex, Duplex x Nulliplex, Duplex x Simplex: 10 2 11 10 4 22
Nulliplex x Duplex, Nulliplex x Nulliplex, Nulliplex x Simplex: 2 28 6 28 14
Simplex x Duplex, Simplex x Nulliplex, Simplex x Simplex, Simplex x Het: 11 8 11 5 11 5
Total: 94

Pair-wise Comparison of SNPs
                                      Non-segregating  Segregating SNPs (%)z
Cross                         Ploidy         SNPs (%)        I       II
W2310-3 x Kalkaska            4X                 22.4     37.6     40.0
MSG227-2 x Jacqueline Lee     4X                 16.5     51.8     31.8
Atlantic x Superior           4X                  5.9     51.8     42.4
Stirling x 12601ad1           4X                 25.9     37.6     36.5
B1829-5 x Atlantic            4X                 11.5     18.8     69.8
BER 63 x DM1-3                2X                 79.3     20.7        0
BER 83 x DM1-3                2X                 78.8     21.2        0
84SD22 x DM1-3                2X                 46.0     54.0        0
MCR205 x DM1-3                2X                 76.7     23.3        0
DI x DM1-3                    2X                   85       15        0
08675-21 x 09901-01           2X                 53.8     46.2        0
RH x SH                       2X                   59       41        0
z I = segregation not dependent on scoring dosage; II = segregation dependent on scoring dosage

Percent Heterozygosity
[Figure: % heterozygosity by clone, 96 SNPs x 96 potato lines]

Potato Panel SNP Heterozygosity
[Figure: % of clones in each range (y-axis, 0 to 70%) against range of % heterozygosity (x-axis, 10% bins from 1-10% to 90-100%); series: diploid clones, tetraploid clones]

SNP Heterozygosity Extremes
Tetraploids, 80-90%
Clone          % Heterozygosity
All Red                    81.2
CF77154-1                  83.5
CO95051-5W                 84.7
Snowden                    84.7
Atlantic                   85.9
CO97215-2P/P               85.9

Tetraploids, <50%
Clone          % Heterozygosity
Chunshu No4                27.1
P1                         28.2
MSL512-6                   44.7
Inca Gold                  47.1
P2                         48.2
NDSU clone 4               48.2

Diploids, 50-60%
Clone          % Heterozygosity
C5                         52.9
84SD22                     56.7
MCD500036                  69.4

Diploids, 1-10%
Clone          % Heterozygosity
DM1-3                         0
ber265857                   5.9
CMM6-3                      7.8
CMM 1T                      8.2
CMM243503                   8.2

Potato Germplasm Panel
• Panel structure (350 clones)
– Top 50 N. American varieties
– Historical varieties
– Advanced US breeding lines
– Non-US germplasm
– Genetic stocks
• Population analyses
– Association mapping
– Historical relationship
– Hypothesis testing for trait associations
– Parental selection
– Resolve population structure
• Phenotypic screening for additional traits outside of SolCAP
• Phenotypic evaluation
– Key traits: specific gravity, sucrose, glucose, Vitamin C, maturity, tuber shape, tuber number, etc.
– Additional traits determined by breeding community
– Data curated at SGN

SNP comparison across potato germplasm panel: resolving population structure
[Figure: clustering of the germplasm panel by SNP genotype, with these annotations]
• MSU Breeding Program varieties
• Group Phureja clones cluster separately from elite germplasm
• Wild species cluster separately from Phureja and Tuberosum

SNP Genotyping Consortium
• Potato 10K (~9100 SNPs) Illumina Infinium chip
• A core set of SNPs in standard germplasm panels in tomato and potato
• Over 3000 genotyping samples were ordered
• The Consortium’s efforts secured a 24% discount per sample beyond what would have been possible with one contributor ($85/sample)
• The barrier to entry for many institutions was lowered, as they were able to access this tool with only a 48 sample commitment.
• Illumina saw orders from each of the three major world regions.
• More SNPs?

SolCAP SNP Genotyping
• ~9100 SNPs for elite potato germplasm
• 2010 SolCAP Goal: 1,152 potato x 9,100 SNPs
• Potato germplasm panel: 350
• 4x russet mapping population: 200
• 2x mapping population: 160
• Community SNP genotyping:
– 2 populations: 350

What makes up the Potato Germplasm Panel Phenotypic Evaluation?
• Clonal Study (CS)
– 250 clones
– 2 reps X 10 hills
– OR, WI, NY
• Russet Mapping Population (MP)
– Rio Grande X Premier Russet
– 200 progeny
– 2 reps X 10 hills
– ID, NC, MN
[Map: CS and MP trial sites marked by state. States in blue = Participants in SolCAP]

Potato Germplasm Panel
• To be field tested 2 years X 3 major environments for potato production.
• Evaluation of specific gravity, glucose and sucrose, chip color, skin type, shape, vine maturity, tuber number, tuber shape, vitamin C, internal defects, bruising, anthocyanins and biotic resistances.

Genotyping the core collections will impact strategies for translation
• Potential translational approaches:
– 1) introgression from other populations (domesticated or wild)
– 2) selection for coupling phase recombinants to establish linkage blocks of favorable alleles (e.g. disease resistance loci)
– 3) population development designed to maximize variation within market classes
– 4) association approaches
– 5) whole genome approaches
• Other translational strategies will emerge under other CAPs or through innovation in public research.

Russet 4x Mapping Population
• Evaluate russet mapping population traits (Yencho, Novy, Sowokinos, Thill, Gupta, Haynes) (2009-2011)
– Key traits: specific gravity, sucrose, glucose, Vitamin C, maturity, tuber shape, tuber number, etc.
• Genetic Mapping (Van Deynze, De Jong, Douches)
– Genotyping 9100 SNPs
• QTL Analysis (Haynes)
– Identify markers associated with key traits
• MAS/MAB (Marker Assisted Selection / Breeding)
– Validation of QTL in additional mapping populations
– Use markers in new breeding populations

Databases and Resources
• Integrated, breeder-focused resources for genotypic and phenotypic analysis at SGN and MSU.
– http://solcap.msu.edu
– http://solanaceae.plantbiology.msu.edu/
– http://solgenomics.net/

SolCAP Education and Extension Objectives
• Team-taught distance-learning graduate level course in translational genomics at Cornell University
• Yearly workshops for breeders to integrate genotype-based breeding strategies with elite germplasm
• Use eXtension.org to develop a Community of Practice for plant breeders, called Plant Breeding and Genomics, across all CAPs (Barley, Wheat, Conifer, RosBreed, Bean, Onion)

SolCAP PAA Workshop
• August 15, 2010, Corvallis, Oregon
• Hands-on computer lab format
• Topics
– Potato genome analysis: Robin Buell
– Tetraploid QTL analysis: Christine Hackett
– Use of Illumina GenomeStudio: Allen Van Deynze

PB&GWorks Web community
http://pbgworks.hort.oregonstate.edu/
SolCAP has created PBGworks, a web community within eXtension.org. Plant breeders, basic scientists, seed industry professionals, agricultural professionals, extension specialists and others can publish content and network.
Target audience: the practicing plant breeder.
Our long-term goal is to provide:
• Start-to-finish examples of marker-assisted selection applications
• Resource pages including protocols, software tutorials, and up-to-date contact information for companies offering genetic services
• Improved access to genetic resources through the "breeder's toolbox"

Potato SNP Summary
• In silico Sanger eSNPs: potato: 57,705 eSNPs
• ~75,000 potato SNPs from 5.7 Gb of GAII transcriptome sequence (69,011 SNPs passed Infinium design)
• ~650 Mb of the genome will be covered by SNPs
• Validation suggests SNPs can be called in broader germplasm
• Dosage reads of SNPs will optimize SNP genotyping of 4x mapping populations
• The reference sequence of DM1-3 516R44 permits bioinformatic optimization of pipelines rather than reliance on empirical validation.
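
The summary point about dosage reads in 4x mapping populations can be illustrated with a short calculation. The sketch below is not part of the SolCAP pipeline; it assumes random chromosome segregation in an autotetraploid with no double reduction, counts dosage as the number of copies of the B allele (0 to 4), and uses hypothetical function names (`gamete_distribution`, `offspring_distribution`). Under those assumptions it derives the expected offspring dosage ratios for a cross between two parents of known dosage.

```python
from fractions import Fraction
from math import comb

PLOIDY = 4  # autotetraploid potato; an assumption of this sketch


def gamete_distribution(dosage: int) -> dict[int, Fraction]:
    """Probability that a 2x gamete carries k copies of allele B.

    Assumes random chromosome segregation without double reduction,
    so the gamete is a random draw of 2 of the parent's 4 homologs
    (a hypergeometric distribution).
    """
    if not 0 <= dosage <= PLOIDY:
        raise ValueError("dosage must be between 0 and 4")
    half = PLOIDY // 2
    total = comb(PLOIDY, half)
    dist = {}
    for k in range(half + 1):
        ways = comb(dosage, k) * comb(PLOIDY - dosage, half - k)
        if ways:
            dist[k] = Fraction(ways, total)
    return dist


def offspring_distribution(dosage_p1: int, dosage_p2: int) -> dict[int, Fraction]:
    """Expected offspring dosage frequencies for a cross of two parents."""
    result: dict[int, Fraction] = {}
    for k1, p1 in gamete_distribution(dosage_p1).items():
        for k2, p2 in gamete_distribution(dosage_p2).items():
            result[k1 + k2] = result.get(k1 + k2, Fraction(0)) + p1 * p2
    return dict(sorted(result.items()))


if __name__ == "__main__":
    # Dosage of B: 0 = nulliplex (AAAA), 1 = simplex (AAAB), 2 = duplex (AABB)
    crosses = {
        "Simplex x Nulliplex": (1, 0),
        "Duplex x Nulliplex": (2, 0),
        "Simplex x Simplex": (1, 1),
        "Duplex x Duplex": (2, 2),
    }
    for name, (p1, p2) in crosses.items():
        print(name, offspring_distribution(p1, p2))
```

A simplex x nulliplex marker segregates 1:1 and can be scored as presence or absence, whereas duplex x duplex progeny fall into five dosage classes that can only be separated if dosage is called, which is one reason dosage reads matter for tetraploid mapping.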

Germplasm Panel SNP Genotyping
• SSR-based genetic map
– 2 years
– 200 markers
• 17 markers/chromosome
– $5/data point
– Not dense enough for 4x mapping
– Markers may be linked to traits
• SNP-based genetic map
– < 1 week
– 9,100 markers
• >700 markers/chromosome
– < 2 ¢/data point
– Dense enough for 4x mapping
– Markers are in genes
– Markers robust enough for broader germplasm

Outcomes for Breeding from SolCAP
• A genome-wide set of markers and bioinformatic tools accessible by breeders
– Breeders will access germplasm for crossing based upon SNP polymorphism and linked QTL of interest
– Design crosses complementary for QTL and traits, and then use MAB in early generation selection.

Outcomes for Breeding from SolCAP (continued)
• Better understanding of the allelic variation influencing CHOs
– Design crosses to create improved sugar and starch levels and starch quality.
– Crosses designed to manipulate and select variation within existing elite populations or introgress novel alleles from wild germplasm.
– More predictable and directed breeding effort for processing and fresh market traits.

SolCAP Acknowledgments
Collaborators, OSU: David Francis, Matt Robbins, Sung-Chur Sim, Troy Aldrich
Collaborators, MSU: David Douches, C Robin Buell, John Hamilton, Kelly Zarka
Collaborators, Cornell: Walter De Jong, Lukas Mueller, Joyce van Eck
Collaborators, UCD: Allen Van Deynze, Kevin Stoffel, Alex Kozic, Jeanette Martins
Collaborators, Oregon State: Alex Stone, John McQueen, Roger Leigh
Others: Michael Coe, Sanwen Huang

Funding: USDA/AFRI
This project is supported by the Agriculture and Food Research Initiative Applied Plant Genomics CAP Program of USDA’s National Institute of Food and Agriculture.

Acknowledgments: PGSC
• BGI-Shenzhen, China (Sanwen Huang, Ruiqiang Li, Xun Xu, Wei Fan, Peixiang Ni, Hongmei Zhu, Desheng Mu, Bicheng Yang, Jian Wang and Jun Wang)
• Center Bioengineering RAS, Russia (Boris Kuznetsov)
• Central Potato Research Institute, India (Swarup Chakrabarti, V.U. Patil, Shashi Rawat and S.K. Pandey)
• Chinese Academy of Agricultural Sciences, China (Sanwen Huang, Zhonghua Zhang and Dongyu Qu)
• University of Dundee, United Kingdom (Dan Bolser and David Martin)
• ENEA, Italian National Agency for New Technologies, Energy and the Environment, Italy (Giovanni Giuliano and Gaetano Perrotta)
• Imperial College London, United Kingdom (Gerard Bishop)
• International Potato Center (CIP), Peru (Merideth Bonierbale, Marc Ghislain and Reinhard Simon)
• Institute of Biochemistry and Biophysics (PAS), Poland (Wlodzimierz Zagorski, Jacek Hennig, Pawel Szczesny, Piotr Zielenkiewicz and Robert Gromadka)
• Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina (Gabriela Massa, Leandro Barreiro and Sergio Feingold)
• Instituto de Investigaciones Agropecuarias (INIA), Chile (Boris Sagredo, Alex Di Genova and Nilo Mejía)
• Michigan State University, USA (Robin Buell, David Douches, Steven Lundback, Alicia Massa, and Brett Whitty)
• New Zealand Institute for Plant & Food Research, New Zealand (Jeanne Jacobs, Mark Fiers and Susan Thomson)
• Scottish Crop Research Institute, United Kingdom (Glenn Bryan, David Marshall, Robbie Waugh and Sanjeev Kumar Sharma)
• Teagasc Agriculture and Food Development Authority, Ireland (Dan Milbourne, Istvan Nagy and Marialaura Destefanis)
• Universidad Peruana Cayetano Heredia, Peru (Gisella Orjeda, Frank Guzman, Michael Torres, Tomas Miranda, German de la Cruz, Roberto Lozano and Olga Ponce)
• University of Wisconsin, USA (Jiming Jiang and Marina Iovene)
• Virginia Polytechnic Institute & State University, USA (Richard E. Veilleux)
• Wageningen University, The Netherlands (Bas te Lintel Hekkert, Christian Bachem, Erwin Datema, Jan de Boer, Richard Visser, Roeland van Ham, Theo Borm and Xiaomin Tang)

Funding at MSU for potato genomics: National Science Foundation

Visit us at http://solcap.msu.edu/

EXTRAS

What is a SNP?
A single-nucleotide polymorphism (SNP, pronounced "snip") is a DNA sequence variation that occurs when a single nucleotide (A, T, C, or G) in the genome differs between members of a species.
SNPs may fall within coding sequences of genes, non-coding regions of genes, or the intergenic regions between genes.
SNPs within a coding sequence may or may not change the amino acid sequence of the protein that is produced.

Hawkeye Viewer – Visualizing SNPs
[Figure: Hawkeye Viewer display of a G/T SNP]
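
The SNP definition above can be made concrete with a small example. This is a minimal sketch, not the SolCAP SNP-calling pipeline: it assumes two already-aligned, equal-length sequences with no gaps, uses the standard genetic code, and the function names (`find_snps`, `classify_coding_snp`) and example sequences are hypothetical. It shows how a G/T difference in a coding sequence may or may not change the encoded amino acid.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


def classify_coding_snp(codon_a: str, codon_b: str) -> str:
    """Label a codon pair as 'synonymous' or 'nonsynonymous'."""
    aa_a = CODON_TABLE[codon_a.upper()]
    aa_b = CODON_TABLE[codon_b.upper()]
    return "synonymous" if aa_a == aa_b else "nonsynonymous"


if __name__ == "__main__":
    # Hypothetical aligned coding sequences that differ by G/T SNPs.
    allele_1 = "ATGGGGGAT"
    allele_2 = "ATGGGTTAT"
    for pos, a, b in find_snps(allele_1, allele_2):
        start = pos - pos % 3  # first base of the codon containing the SNP
        effect = classify_coding_snp(
            allele_1[start:start + 3], allele_2[start:start + 3]
        )
        print(f"position {pos}: {a}/{b} SNP, {effect}")
```

In this example the G/T change in the third position of GGG leaves glycine unchanged, while the G/T change in the first position of GAT replaces aspartate with tyrosine.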