Lecture Outline
1. What is molecular breeding
2. Historical review
3. Whole-genome strategies
4. Genotyping
5. Phenotyping
6. Envirotyping (etyping)
7. Marker-trait association analysis
8. Marker-assisted selection
9. Decision support systems
10. Outlook

Evolution of Genotyping (1980-2010s)
• Systems: from gels to chips and sequencing (GBS)
• Throughput: from single assays to millions of data points
• Resolution: from 10-30 cM between markers to many markers per gene
• Cost per data point: from several US dollars to 1/1000 of a cent
• Marker type: morphological → cytological → protein → DNA (RFLP, RAPD, AFLP, SSR, SNP)
Xu 2010, Molecular Plant Breeding, CABI

Molecular Basis of DNA Markers
A single-nucleotide polymorphism (SNP) is a DNA sequence variation that occurs when a single nucleotide (A, T, C or G) in the genome differs between members of a biological species. (Wikipedia)
Copy-number variations (CNVs), a form of structural variation, are alterations of the genome's DNA that result in the cell having an abnormal number of copies of one or more sections of the DNA.
Revised from: Xu 2010, Molecular Plant Breeding, CABI

Presence/Absence Variation (PAV)
Figure: chromosome distribution of a sequence that is present in sample A and absent in sample B.
As a result of PAV, many genes cannot be mapped by regular linkage mapping with SNP markers.

The Concept of Haplotypes and Its Development
A haplotype is a group of genes within an organism that was inherited together from a single parent. It can describe a pair of genes inherited together from one parent on one chromosome, or all of the genes on a chromosome that were inherited together from a single parent. These genes are inherited together because of genetic linkage. The term "haplotype" can also refer to the inheritance of a cluster of single-nucleotide polymorphisms (SNPs), which are variations at single positions in the DNA sequence among individuals.
Haplotypes can be defined at increasing scales:
• Haplotypes of SNPs within a functional region
• Haplotypes of SNPs within a gene
• Haplotypes of SNPs within a chromosome
• Haplotypes of SNPs across the whole genome

From SNPs to Haplotypes and Tag SNPs
SNPs: four copies of a chromosome region (Chromosomes 1-4) differ at three SNPs (SNP1, SNP2, SNP3):
Chromosome 1: AACACGCCA … TTCGGGGTC … AGTCGACCG …
Chromosome 2: AACACGCCA … TTCGAGGTC … AGTCAACCG …
Chromosome 3: AACATGCCA … TTCGGGGTC … AGTCAACCG …
Chromosome 4: AACACGCCA … TTCGGGGTC … AGTCGACCG …
Haplotypes and tag SNPs: Individuals 01-12 carry four haplotypes, which are distinguished by three tag SNPs (A/G, T/C and C/G at positions 4, 8 and 15):
Individuals 01-03: CTCAAAGTACGGTTCAGGCA (Haplotype 1)
Individuals 04-06: CTCAAAGCACGGTTGAGGCA (Haplotype 2)
Individuals 07-09: CTCGAAGTACGGTTCAGGCA (Haplotype 3)
Individuals 10-12: CTCAAAGCACGGTTCAGGCA (Haplotype 4)

SNP Genotyping Platforms: Which Is the Winner?
Criteria: number of markers, throughput, cost, data delivery and service.

Genotyping by Arraying (Chips)
● Three Illumina 1536-SNP chips: Illumina-Cornell-CIMMYT collaboration (Yan et al. 2009; Yan et al. 2010)
● Illumina MaizeSNP50 BeadChip:
  - up to 56,110 SNPs, about 1 SNP per 40 kb
  - covers 19,540 genes, about 2 SNPs per gene
  - functionally tested with over 30 diverse maize lines
  - developed by Illumina in collaboration with TraitGenetics, INRA and Syngenta

SNP Genotyping by Array Tape (Douglas Scientific)
The Array Tape platform includes:
• Nexar inline liquid handling system
• Soellex thermal cycler
• Araya inline fluorescence scanner
• Centrifuge
• Kraken
• SNPline XL System
Features:
• High-throughput data: processes 400 384-well plates per day (about 150,000 data points)
• Low running cost: very small reaction volumes save 80-90% of reaction reagents
• Modular design: Nexar micro-volume liquid transfer system, Soellex high-throughput PCR system and Araya scanning system
• Particularly suited to analyzing many samples with few markers

Genotyping by Sequencing (GBS)
GBS technology:
• detects a wider range of polymorphisms: SNPs plus small indels
• requires no prior SNP discovery or validation
• is applicable to any species or population

GBS Approaches
• Sequencing the entire genome of each individual is simple but expensive.
• Several existing methods each enrich for a portion of the genome, which is then sequenced.
• Enrichment is most often achieved by restriction enzyme (RE) digestion.
• Because recognition sites are only 4, 6 or 8 bp long, existing methods have limited "tunability" (the ability to adjust how much of the genome is sampled).
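The "tunability" limit follows from simple arithmetic: in a random sequence with equal base frequencies, a specific n-bp site is expected once every 4^n bp, so switching from a 4-bp to a 6-bp to an 8-bp enzyme changes site density in 16-fold jumps (256 bp, 4,096 bp, 65,536 bp). The Python sketch below illustrates this under loudly stated assumptions: a simulated uniform random genome (real genomes have biased base composition, methylation and repeats), fragments approximated as the distance between consecutive site starts (the exact cut offset and the end pieces are ignored), and an arbitrary 100-500 bp size-selection window. MseI (TTAA), PstI (CTGCAG) and NotI (GCGGCCGC) serve only as familiar examples of 4-, 6- and 8-bp recognition sequences; they are not claimed to be the enzymes used by any of the methods cited here.

```python
import random

# Familiar examples of 4-, 6- and 8-bp recognition sequences (illustrative choice only).
ENZYMES = {"MseI": "TTAA", "PstI": "CTGCAG", "NotI": "GCGGCCGC"}


def expected_spacing(site: str) -> int:
    """Expected bp between occurrences of `site` in a random sequence with equal base frequencies."""
    return 4 ** len(site)


def site_positions(genome: str, site: str) -> list[int]:
    """0-based start of every occurrence of `site`, overlapping matches included."""
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start)
        start = genome.find(site, start + 1)
    return positions


def fragment_lengths(genome: str, site: str) -> list[int]:
    """Simplification: a fragment spans from one site start to the next.

    Cut offsets and the pieces before the first / after the last site are ignored.
    """
    pos = site_positions(genome, site)
    return [b - a for a, b in zip(pos, pos[1:])]


def count_in_window(lengths: list[int], lo: int, hi: int) -> int:
    """Number of fragments that would survive a lo-hi bp size selection."""
    return sum(lo <= n <= hi for n in lengths)


if __name__ == "__main__":
    # Assumption: a 1 Mb uniform random "genome"; real genomes are far from random.
    genome = "".join(random.Random(1).choices("ACGT", k=1_000_000))
    for name, site in ENZYMES.items():
        lengths = fragment_lengths(genome, site)
        print(
            f"{name}: {len(site)} bp site, expected spacing {expected_spacing(site):,} bp, "
            f"{len(lengths)} fragments, {count_in_window(lengths, 100, 500)} in 100-500 bp"
        )
```

Each step up in site length cuts the expected number of fragments by about 16-fold, which is why RE choice alone gives only a coarse control over how much of the genome ends up in the size-selected library.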
References (GBS approaches): Huang et al. 2010, Nature Genetics; Andolfatto et al. 2011, Genome Research; Elshire et al. 2011, PLoS ONE; Davey et al. 2011, Nature Reviews Genetics

Genotyping-by-Sequencing (GBS)
Created for high-throughput, semi-automated genotyping.
Workflow: sample plants → isolate DNA → restriction digest → ligate adaptors (each carries a sequencing adaptor and a barcode and joins the genomic DNA fragment through its sticky ends) → pool and amplify → sequence
Advantages:
• One-step SNP discovery and genotyping
• Simple protocol; no reference required
• Large numbers of SNPs found cheaply
• Broadly applicable
Drawbacks:
• False SNPs from sequencing errors
• Missing data from stochastic sampling
Images: Qiagen, Illumina, Elshire et al. 2011, PLoS ONE

GBS pipeline:
1. Restriction digestion
2. Adaptor ligation
3. Pooled library construction
4. Fragment size selection
5. Sequencing
6. Quality control
7. Sequence alignment
8. HMM model fitting
9. Downstream analysis
Andolfatto et al. 2011, Genome Research

GBS: Competitive Landscape
(1) Commercialized by Floragenex Inc.
(2) Not disclosed; Data2Bio's proprietary technology
From P. S. Schnable

Maize GBS 2.7 Build
• Trained on 32K taxa, including extensive CIMMYT material (landraces and diverse breeding materials)
• 45K taxa now scored with the build
• 960K core SNPs
• Production Tags On Physical Map (TOPM) file for one-step SNP calling available at panzea.org (imputation and calling in 15 min)
Ed Buckler, personal communication

Genotyping by Whole-Genome Sequencing
Sequencing everything!
• Resequencing to discover SNPs, haplotypes and tag SNPs
• Tag SNPs can be developed to represent haplotypes; each tag SNP represents one haplotype fragment.
• A set of tag SNPs can be developed to represent whole-genome diversity.
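The tag SNP idea can be made concrete with the twelve-individual haplotype example from earlier in this lecture. The Python sketch below (all function names are hypothetical, written only for this illustration) finds the polymorphic columns in the aligned sequences, groups individuals into haplotypes, and then searches exhaustively for the smallest subset of SNPs that still separates every haplotype. Exhaustive search is only practical for a handful of SNPs; genome-scale tag SNP selection is usually based on pairwise linkage disequilibrium (r²) thresholds instead, which this sketch does not implement.

```python
from itertools import combinations

# The four 20-bp sequences from the tag SNP figure; Individuals 01-03, 04-06,
# 07-09 and 10-12 each share one of them.
HAPLOTYPE_SEQS = [
    "CTCAAAGTACGGTTCAGGCA",
    "CTCAAAGCACGGTTGAGGCA",
    "CTCGAAGTACGGTTCAGGCA",
    "CTCAAAGCACGGTTCAGGCA",
]
SEQS = {f"{i + 1:02d}": HAPLOTYPE_SEQS[i // 3] for i in range(12)}


def polymorphic_sites(seqs: dict[str, str]) -> list[int]:
    """0-based columns where the aligned sequences do not all carry the same base."""
    length = len(next(iter(seqs.values())))
    return [i for i in range(length) if len({s[i] for s in seqs.values()}) > 1]


def haplotypes(seqs: dict[str, str], sites: list[int]) -> dict[str, list[str]]:
    """Group individuals by their alleles at `sites`; keys are allele strings."""
    groups: dict[str, list[str]] = {}
    for ind, s in seqs.items():
        key = "".join(s[i] for i in sites)
        groups.setdefault(key, []).append(ind)
    return groups


def minimal_tag_snps(seqs: dict[str, str], sites: list[int]) -> list[int]:
    """Smallest subset of `sites` that still separates every haplotype.

    Exhaustive search over subsets: fine for a few SNPs, infeasible at genome scale.
    """
    n_haplotypes = len(haplotypes(seqs, sites))
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            if len(haplotypes(seqs, list(subset))) == n_haplotypes:
                return list(subset)
    return list(sites)


if __name__ == "__main__":
    sites = polymorphic_sites(SEQS)
    print("SNP positions (1-based):", [i + 1 for i in sites])
    for n, (alleles, inds) in enumerate(haplotypes(SEQS, sites).items(), start=1):
        print(f"Haplotype {n}: {alleles} in individuals {', '.join(inds)}")
    print("Tag SNPs (1-based):", [i + 1 for i in minimal_tag_snps(SEQS, sites)])
```

On these sequences the script reports SNPs at positions 4, 8 and 15, four haplotypes (ATC, ACG, GTC, ACC) and a minimal tag set containing all three SNPs: dropping any one of them leaves two haplotypes identical (for example, without position 15, the sequences of Individuals 04-06 and 10-12 both read AC).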
Approaches to Reduce Cost and Increase Scale in Genotyping
• Seed-based DNA genotyping
• Efficient sample tracking
• Selective genotyping and pooled DNA analysis
• Integrated diversity analysis, genetic mapping and MAS
• Breeding strategies for simultaneous improvement of multiple traits

Seed DNA-based Genotyping in Maize
① Soaking
② Sampling
③ Grinding
④ DNA extraction
⑤ PCR and genotyping
⑥ Tracking back and planting
Gao et al. 2008, Mol Breed 22:477–494
Tools: automatic seed chipping; laser-assisted seed selection

Selective Genotyping: QTL Effects and Population/Tail Sizes
Figure: four panels for population sizes N = 200, 500, 1000 and 3000; each plots a 0-100 scale across tail sizes of 100, 50, 30 and 15, with separate curves for QTL effects of 20%, 15%, 10%, 5%, 3% and 1%.
Sun et al. 2010, Mol Breed 26:493–511

Bulked or Pooled DNA Analysis
Figure A: plants from the two tails of the population distribution (high tail and low tail, or resistant R and susceptible S plants) are selected and their DNA is pooled; the pools can be genotyped with PCR markers, chips, DNA sequencing or RNA sequencing.
Figure B: allele frequency (0.0-1.0) in the two pools. Markers linked to the target gene show different allele frequencies in the two pools, whereas unlinked markers do not.
Xu 2010, Molecular Plant Breeding, CABI
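The logic of selective genotyping and pooled DNA analysis can be checked with a small simulation. The Python sketch below assumes a hypothetical homozygous (DH/RIL-like) population with one QTL whose additive effect equals one residual standard deviation, one marker linked to it at a recombination frequency of 0.05 and one unlinked marker. It ranks plants by phenotype, takes the top and bottom 10% as the high and low tails, and compares allele frequencies in the two pools. Pool frequencies are computed by averaging individual genotypes, i.e. an idealized pool with no allele-frequency estimation error. All function names and parameter values are illustrative and are not taken from Sun et al. 2010 or Xu 2010.

```python
import random


def simulate_population(n, effect, r_linked, rng):
    """Hypothetical homozygous (DH/RIL-like) population.

    Each plant carries allele 0 or 1 at one QTL, at a marker linked to it with
    recombination frequency r_linked, and at an unlinked marker (r = 0.5).
    Phenotype = effect * QTL allele + standard normal residual.
    Returns a list of (phenotype, linked_marker, unlinked_marker) tuples.
    """
    plants = []
    for _ in range(n):
        qtl = rng.randint(0, 1)  # 1 = allele that increases the trait
        # The linked marker matches the QTL allele unless a recombination occurred.
        linked = qtl if rng.random() >= r_linked else 1 - qtl
        unlinked = rng.randint(0, 1)  # independent of the QTL
        phenotype = effect * qtl + rng.gauss(0.0, 1.0)
        plants.append((phenotype, linked, unlinked))
    return plants


def tail_pools(plants, fraction):
    """Return (high_tail, low_tail): the top and bottom `fraction` of plants by phenotype."""
    ranked = sorted(plants, key=lambda p: p[0])
    k = max(1, int(len(ranked) * fraction))
    return ranked[-k:], ranked[:k]


def allele_freq(pool, column):
    """Idealized pool: frequency of allele 1 at a marker column (1 = linked, 2 = unlinked)."""
    return sum(p[column] for p in pool) / len(pool)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    plants = simulate_population(n=1000, effect=1.0, r_linked=0.05, rng=rng)
    high, low = tail_pools(plants, fraction=0.10)
    for name, col in (("linked", 1), ("unlinked", 2)):
        print(f"{name:8s} high={allele_freq(high, col):.2f} low={allele_freq(low, col):.2f}")
```

With these settings the linked marker's allele frequency differs strongly between the high and low pools, while the unlinked marker's two frequencies stay close to each other, which is the contrast the bulked-analysis figure relies on. Lowering `effect`, moving `r_linked` toward 0.5 or shrinking `n` weakens that contrast.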