Update on the NSA SNP project - National Sunflower Association

Update on the NSA SNP project
Dr. Venkatramana Pedagaraju – Molecular Breeding and Genomics Technology Manager
Dr. Brent Hulke – Research Geneticist
NSA Sunflower Chip
Categories | Identified SNPs/InDels | # Single Bead Assay
Fixed variants | 8313 | 6323
>1 SNP/contig | 5361 | 2072
RAD clustering to common EST | 1167 | 430
Het variants | 1557 | 1175
Total | 16398 | 10000
Synthesis failure | | -1277
Final set | | 8723
Sunflower Genotyping Panel
Diversity Panel: Mycogen, NuFlower, Advanta, Genosys, Seeds 2000, CHS, USDA (1134 samples)
Mapping Panel: HA89 x RHA464, F1 selfed; 141 F2 lines genotyped
24x1 HD NSA Bead Chip
A1 well: reproducibility controls (3 unique lines selected from the sequencing panel)
A12 well: heterozygous controls (F1 hybrids)
Infinium Work Flow
Data Analysis using GenomeStudio Software
[Figure: example of a good SNP shown as a Cartesian plot and a polar plot, with AA, AB, and BB clusters and their call regions]
Challenges Posed by Deletions, Nearby Polymorphisms & Paralog Sequences
Creating Project-Specific Cluster Files
Improved call rates
[Figure: clustering with all samples compared with clustering for a specific project]
Performance of SNP markers across various Diversity Panels
[Figure: chart of # marker loci (axis from 6200 to 8400) by project; legend: Projects Combined, Specific Projects]
Reproducibility
Reproducibility is based on the replicate pairs identified in the sample manifest. The metric is
calculated from the number of matching allele calls between replicates. The markers displayed
99.54% reproducibility.
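As an illustration of the metric, here is a minimal sketch that compares the calls of one replicate pair. It is a simplified genotype-level comparison, not the GenomeStudio calculation itself; the function name, the "--" no-call symbol, and the choice to skip no-calls are assumptions made for this example.

```python
def reproducibility(calls_a, calls_b, no_call="--"):
    """Fraction of matching calls between two replicates of the same sample.

    Markers where either replicate has no call are skipped (an assumption
    of this sketch; the real software may treat them differently).
    """
    compared = 0
    matched = 0
    for a, b in zip(calls_a, calls_b):
        if a == no_call or b == no_call:
            continue
        compared += 1
        if a == b:
            matched += 1
    return matched / compared if compared else float("nan")


# Hypothetical calls for six markers on a replicate pair.
rep1 = ["AA", "AB", "BB", "AA", "--", "AB"]
rep2 = ["AA", "AB", "BB", "AB", "AA", "AB"]
print(f"{reproducibility(rep1, rep2):.2%}")
```

Averaging this value over all replicate pairs in the manifest would give a panel-wide figure of the kind reported above.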
Mendelian Consistency
Mendelian Consistency is based on the trios identified in the sample manifest.
The metric is calculated from the number of genotypes that match Mendelian
inheritance between a child and each of its parents.
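A minimal sketch of such a check for one marker across several trios follows. It assumes biallelic calls written as "AA", "AB", or "BB" and tests whether the child's genotype can be formed from one allele of each parent; the function names are hypothetical and the exact GenomeStudio computation may differ.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child genotype can be built from one allele per parent."""
    possible = {"".join(sorted(m + f)) for m, f in product(mother, father)}
    return "".join(sorted(child)) in possible


def mendelian_consistency(trios):
    """Share of (child, mother, father) trios that pass the check."""
    results = [is_mendelian_consistent(c, m, f) for c, m, f in trios]
    return sum(results) / len(results)


# Hypothetical trios for a single marker: (child, mother, father).
trios = [
    ("AB", "AA", "BB"),  # consistent: A from mother, B from father
    ("AA", "AA", "BB"),  # inconsistent: father can only pass on B
    ("AB", "AB", "AB"),  # consistent
    ("BB", "AB", "AA"),  # inconsistent: father can only pass on A
]
print(mendelian_consistency(trios))
```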
Summary
NSA member | Status
Advanta | Reported
USDA (Brent) | Reported
USDA (Lili) | Completed
Seeds 2000 | Reported
Genosys | Completed
CHS | Reported
Mycogen | Reported
NuFlower | Data analysis in progress
May Agro | DNA isolated
Conclusions
• Out of a total of 16398 SNPs identified, a subset of 8723
SNPs was successfully validated across a wide range of
sunflower breeding lines.
• Deletions, nearby polymorphisms, and the presence of
paralog sequences cause the locus success rate to vary
among different breeding lines.
• About 91% of SNPs were successfully scored in the
sunflower diversity panel and linkage mapping
population.
• Approximately 5500 polymorphic loci were identified in
the USDA bi-parental mapping population.
Future Directions
• Develop a SNP-based genetic map using genotypic data derived from the USDA mapping
population (HA89 x RHA464).
• Constitute a standard panel of 384 sunflower SNP markers for
routine use across a range of breeding projects (diversity analysis,
genomic selection, QTL mapping, trait introgression programs),
based on the criteria below:
– Highly polymorphic and informative in any panel of
sunflower germplasm (MAF > 0.05)
– Uniformly distributed across the sunflower genome
– Easily scorable in GenomeStudio, producing automatic
genotype calls
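The first two criteria can be sketched in code. The example below computes minor allele frequency (MAF) from biallelic calls and then keeps the highest-MAF marker in each map bin, which is one simple way to approximate uniform genome coverage. The record fields (`lg`, `pos_cm`, `maf`), the 10 cM bin size, and the function names are all assumptions for illustration; the scorability criterion is not modeled, and the result is not trimmed to exactly 384 markers.

```python
def minor_allele_frequency(genotypes):
    """MAF from calls 'AA', 'AB', 'BB'; any other value is ignored."""
    a = b = 0
    for g in genotypes:
        if g in ("AA", "AB", "BB"):
            a += g.count("A")
            b += g.count("B")
    total = a + b
    return min(a, b) / total if total else 0.0


def select_panel(markers, bin_size_cm=10.0, min_maf=0.05):
    """Keep the highest-MAF marker per (linkage group, bin) with MAF > min_maf."""
    best = {}
    for m in markers:
        if m["maf"] <= min_maf:
            continue  # not informative enough
        key = (m["lg"], int(m["pos_cm"] // bin_size_cm))
        if key not in best or m["maf"] > best[key]["maf"]:
            best[key] = m
    return sorted(best.values(), key=lambda m: (m["lg"], m["pos_cm"]))


# Hypothetical markers with linkage group, map position, and MAF.
markers = [
    {"name": "snp1", "lg": 1, "pos_cm": 2.0, "maf": 0.30},
    {"name": "snp2", "lg": 1, "pos_cm": 7.5, "maf": 0.45},
    {"name": "snp3", "lg": 1, "pos_cm": 12.0, "maf": 0.03},
    {"name": "snp4", "lg": 1, "pos_cm": 18.0, "maf": 0.20},
    {"name": "snp5", "lg": 2, "pos_cm": 4.0, "maf": 0.10},
]
print([m["name"] for m in select_panel(markers)])
```

Choosing the bin size so that the number of bins across all linkage groups is close to 384 would bring the panel to the intended size.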
Future Prospects with SNPs
1. Mapping of SNPs to linkage groups
defined by the SSR map
2. Development of a 384 marker suite for
background selection in trait capture and
genomic selection
3. Development of a suite of trait specific
markers (may be included in the 384)
4. Genomic selection concept and practice
Trait specific markers
• Obtained two ways:
– Association mapping with Phase II germplasm from
all companies and USDA
• Use existing inbred lines to find markers for traits
• Strong possibility for IMISUN, SURES, HO, Pl6, Pl8, R-gene,
recessive branching, and confection traits
– Two parent mapping
• Will happen for RHA 464 rust gene and Plarg gene as part of
Lili’s mapping
• Other traits, like other rust, vert resistance will need to be
started new or translated from existing populations with prior
SSR data
Trait specific markers
• Markers from any type of discovery
method can be put together on a Bead
Express assay, which is either part of the
384 Bead with random markers for
genomic and background selection, or will
stand on its own (48 Bead?)
Genomic Selection
• Using a moderate set of markers (384) to
statistically associate with previous
breeding data, to provide a way to make
early selections before you have field info
• Instead of just field measurement of traits,
you can preselect lines based on marker
data, and put only the “best” to field testing
Genomic Selection
• What is the ideal use of this to a breeder?
– Take information from your own yield trials and
apply it to new breeding lines
– Standard set of random markers (like a 384 SNP
bead) that are equally distributed over genome
(divides genome into “blocks” or “bins”)
– Only marker-assisted system with “pipeline”
characteristics like a breeding program
[Figure: conceptual bins for a chromosome, with vertical bars as SNPs]
Genomic Selection – “training”
• Breeder has a population that has good potential
to produce exceptional lines
• Data is collected on existing breeding lines for a
quantitative trait over many locations (yield, oil)
• A moderately sized marker set (384) is
regressed statistically against the data
• Markers are random effects
– Marker significance is not determined individually, but
as the full set of markers together
– All markers are included in the selection model,
however, each has a different weighting (importance)
for selection (called Estimated Breeding Values)
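One common way to fit all markers jointly as random effects is ridge regression, in which every marker gets a weight that is shrunk toward zero. The sketch below is a small pure-Python version under stated assumptions: genotypes are coded as 0, 1, or 2 copies of one allele, the shrinkage parameter `lam` is arbitrary, and all function names are hypothetical. It is not presented as the method used in the project.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def train_marker_effects(genotypes, phenotypes, lam=1.0):
    """Ridge regression: solve (Z'Z + lam*I) w = Z'y on centered data."""
    n, p = len(genotypes), len(genotypes[0])
    mean_y = sum(phenotypes) / n
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y = [v - mean_y for v in phenotypes]
    ztz = [[sum(z[i][a] * z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    zty = [sum(z[i][a] * y[i] for i in range(n)) for a in range(p)]
    return mean_y, col_means, solve(ztz, zty)


def predict(genotype, mean_y, col_means, effects):
    """Predicted trait value: mean plus the weighted, centered marker dosages."""
    return mean_y + sum((g - c) * e for g, c, e in zip(genotype, col_means, effects))


# Hypothetical training set: 6 lines x 3 markers, with yield data.
genotypes = [[0, 2, 1], [1, 0, 2], [2, 1, 0], [2, 2, 1], [0, 0, 2], [1, 1, 1]]
yields = [8.0, 12.0, 13.0, 12.0, 10.0, 11.0]
mean_y, col_means, effects = train_marker_effects(genotypes, yields)
print(effects)
print(predict([2, 0, 1], mean_y, col_means, effects))
```

No marker is tested for significance on its own here; every marker stays in the model with its own weight, matching the description above.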
Genomic Selection – “selection”
[Flowchart]
Input: data from previous YT with EBVs calculated for SNPs
1. Elite x Elite cross: a very narrow-based population for short-term improvement and
rapid inbred extraction
2. F1 plant
3. F2 plants (large number, >100): analyze with OPA as seedlings and select the top 30%.
Pick the plants most likely to have the phenotype of interest by selecting those with
the best marker profile. Simple and straightforward.
4. F3. Alternatively, advance a large number of lines by SSD to F4 or F5 and
analyze with SNPs to fix genes and improve predictions.
5. …
6. Testcross to tester lines, and evaluate in field
7. Finished inbred; commercial hybrid development
Data from YT is used to “tweak” the model for the next generation.
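The seedling selection step can be sketched as scoring each F2 plant from its marker profile and keeping the top 30%. The marker weights, plant identifiers, and 0/1/2 genotype coding below are hypothetical values chosen for illustration only.

```python
import math


def marker_score(genotype, effects):
    """Sum of marker weights times allele dosage (0, 1, or 2)."""
    return sum(g * e for g, e in zip(genotype, effects))


def select_top_fraction(plants, effects, fraction=0.30):
    """Return the ids of the best-scoring plants, keeping at least one."""
    ranked = sorted(plants, key=lambda item: marker_score(item[1], effects), reverse=True)
    n_keep = max(1, math.ceil(fraction * len(ranked)))
    return [plant_id for plant_id, _ in ranked[:n_keep]]


# Hypothetical marker weights from a trained model and ten F2 seedlings.
effects = [1.5, -0.5]
f2_plants = [
    ("p1", [2, 0]), ("p2", [0, 2]), ("p3", [1, 1]), ("p4", [2, 1]), ("p5", [1, 0]),
    ("p6", [0, 0]), ("p7", [2, 2]), ("p8", [1, 2]), ("p9", [0, 1]), ("p10", [1, 1]),
]
print(select_top_fraction(f2_plants, effects))
```

Only the selected plants would be advanced, and their later yield trial data could be added to the training set to refit the weights.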
Where is GS best used?
• Excellent technique if you want to maximize
selection accuracy and rate of genetic gain on a
pop. by pop. basis.
– Inference space is the population(s) of interest
– Different populations have different gene structure,
thus different EBVs for each bin in each population
will improve gain from selection
• Excellent technique if data is routinely generated
for the trait of interest (e.g. yield data will always
be generated in plant breeding)
Time course for Genomic Selection
1. Assemble prior information – yield trials,
special trait trials, on all lines tested the last
few years
2. Get these same lines genotyped with 384
markers of equal genome distribution
3. “Train” your model and find the value of each
marker
4. Take your newest germplasm, genotype
5. Use markers to assess which lines are the most
likely to be released, and field test those
Thanks for your support!