Genetic Diversity and the Effects of Artificial Selection in Maize
Maize Diversity Project Team

Molecular Diversity
• How has selection shaped molecular diversity in maize?
• What is the relationship of selected genes to agronomic traits?
• Goal: identify genes exhibiting selection (domestication, agronomic improvement, and local adaptation)
• Community resource: SNP marker collection
[Photos: Teosinte, Landraces, Inbreds/Hybrids. Photos courtesy J. Doebley]

Major predictions for the model
• The genes that have contributed most to maize improvement, i.e. those that have experienced the strongest history of selection, have the least genetic variability left to contribute to crop improvement by classical breeding.
• These genes will not be detected in standard QTL experiments because all lines will contain similar alleles.

Can we develop genomics screens to identify genes that have undergone selection?
• Invariant SSR approach (Vigouroux et al. 2002 PNAS 99:9650)
• Directly contrast sequence diversity among teosintes and inbreds (Wright et al. 2005 Science 308:1310)
• Are genes with low inbred diversity enriched for selected genes? (Yamasaki et al. 2005 Plant Cell 17:2859)
mcmullenm@missouri.edu for .pdfs

Summary of Sequencing on Random Genes (Irie Vroh Bi, Masanori Yamasaki, Kate Houchins)
MPZ inbreds:
• Temperate: B73(2), Mo17(2), Hp301, Il14H, Ky21, M37W, Oh43
• Tropical: CML69, CML247, CML322, CML333, KUI3, KUI11, NC350
• 1095 alignments, 6169 SNPs
MPZ inbreds + 16 teosinte partial inbreds:
• 774 alignments
• 3463 SNPs in MPZ inbreds, 6136 SNPs in teosintes

Sequence statistics for 1095 genes for diverse maize inbred lines

           All Maize   Temperate   Tropical
N          13.1        6.7         6.6
L          280.4       292.2       290.8
Total L    307034      310306      308816
S          5.6         4.3         4.2
Total S    6169        4560        4427
π          0.0067      0.0065      0.0061

N = number of sequences, L = length of alignment, S = number of segregating sites, π = average number of pairwise differences per bp. N, L, and S are per-alignment averages.

Inbred-Teosinte Sequence Summary
• Number of alignments >5 in both sets: 774
• Average sample size, inbreds: 12.0
• Average sample size, teosinte: 12.7
• Average alignment length: 294
• Total SNPs in inbreds: 3463
• Total SNPs in teosintes: 6136

Diversity in maize inbreds vs. teosinte
[Figure: θ inbreds (y-axis, 0 to 0.07) plotted against θ teosintes (x-axis, 0 to 0.08)]
• Average θ.inbred/θ.teosinte: 0.57
• Excluding θ.inbred = 0 values: 0.63

To identify the selected genes we need new statistical approaches
• There are two models: a selection model and a bottleneck model
• We must estimate the size of the bottleneck
• For each model, we estimate the probability of the data given the model (the likelihood) for each gene
• This is very simulation- and computation-intensive!
• This approach allows us to estimate the proportion of genes under selection and to identify the candidates

Two models
To be considered selected, a gene needs to fail the neutral model and be accepted by the selected model.
[Figure: diagrams of the neutral and selected models, each labeled with the parameters Na, Nb, Np, t1, and t2]

Genes significant for selection

Locus          S inb.   S teo.   Probability of being in selected class   Annotated BLAST hit
scl394_p3**    0        27       0.74                                     Arabidopsis thaliana L28 ribosomal protein
scl491_p3**    0        13       0.62                                     Maize dihydrodipicolinate synthase
scl405_p3**    0        12       0.59                                     Unknown expressed protein
scl427_p2*     0        16       0.54                                     A. thaliana DNAJ heat shock protein
scl526_p3**    1        16       0.54                                     Maize hexokinase
scl499_p5**    0        12       0.51                                     Unknown expressed protein
scl512_p1**    0        16       0.51                                     Triticum adenylosuccinate synthetase
scl536_p4**    0        17       0.49                                     Oryza sativa putative acetyl transferase
scl531_p4**    0        11       0.46                                     Oryza sativa putative auxin-induced protein
scl457_p4*     0        7        0.45                                     Oryza sativa putative growth factor

On a genomic scale…
• Assume 40,000 genes in maize
• 40,000 x 0.04 = 1600 selected genes
• Before genome scans, 11 genes had been identified as selected by population genetic approaches
• By sequencing 1000 genes, we have ~30 novel candidates
• These genes need to be divided between domestication and improvement

What genes show evidence of selection?
• Genes involved in amino acid synthesis or metabolism
• Genes involved in growth response
• Transcription factors and signal transduction components
• Unique genes with no significant BLAST homologies

Are genes with low inbred diversity enriched for domestication and improvement candidates? (Masanori Yamasaki)
• Chose 35 genes with no diversity among the MPZ inbred set.
• Sequenced the same region in 16 haploid landrace samples, 16 teosinte partial inbreds, and a Tripsacum dactyloides sample.
• Performed Hudson-Kreitman-Aguadé (HKA) tests for selection on inbreds, landraces, and teosintes against the neutral genes adh1, glb1, fus6, and bz2.
• Performed coalescent simulations of domestication (CS) of inbreds vs. teosintes and landraces vs. teosintes.
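The HKA tests and coalescent simulations above operate on per-gene diversity summaries such as S and π (defined in the sequence statistics table) and θ. The following is a minimal sketch of how these could be computed from an alignment. It assumes equal-length sequences with no gaps or missing data, and it uses Watterson's estimator for θ, which is an assumption: the slides do not say which θ estimator was used. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns carrying more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per bp."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / len(pairs) / length


def watterson_theta(seqs):
    """Watterson's theta per bp: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i.

    Assumption: the slides do not name the estimator behind their theta.
    """
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * length)


# Hypothetical toy alignment: 4 sequences, 10 bp, 2 variable columns.
toy = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGAACGTAC",
]

print("S     =", segregating_sites(toy))
print("pi    =", nucleotide_diversity(toy))
print("theta =", round(watterson_theta(toy), 4))
```

For the toy alignment, S is 2 and π is 0.1. A gene with no diversity among the inbreds, like the 35 chosen above, would give S = 0 and therefore π = θ = 0 in the inbred sample.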
[Figure: π (y-axis) by nucleotide position in bp (x-axis), one panel per gene: ARF, Amino Acid Transporter, GTP-binding Protein, Unknown, F-box (circadian clock), Ankyrin repeat, Fruit protein, Chromatin remodeling]

HKA test results: inbreds

Unigene    N    L       S    P value in HKAtotal   P value in HKAsilent   Candidate status   Homology search
AY108876   14   1,055   1    < 0.0068 **           < 0.0120 *             Selected Gene      Amino acid transporter
AY107195   14   3,119   1    < 0.0058 **           < 0.0087 **            Selected Gene      Auxin response factor
AY110109   14   1,466   1    < 0.0051 **           < 0.0054 **            Selected Gene      GTP-binding protein
AY105060   14   1,090   0    < 0.0041 **           < 0.0053 **            Selected Gene
AY108178   14   1,259   0    < 0.0054 **           < 0.0082 **            Selected Gene
AY106616   14   2,745   84   < 0.4395              < 0.2205               -                  Ankyrin repeat-like protein
AY107952   14   2,469   23   < 0.1193              < 0.0927               -                  Putative fruit protein, Oxidoreductase
AY106371   14   1,574   4    < 0.0094 **           < 0.0061 **            Selected Gene

The source also lists the homology entries "Circadian clock" and "Putative methyl-binding domain protein" without a recoverable row assignment.

HKA test results: teosintes

Unigene    N    L       S    P value in HKAtotal   P value in HKAsilent
AY108876   16   1,026   13   < 0.0433 *            < 0.1849
AY107195   11   3,097   81   < 0.5889              < 0.6761
AY110109   14   1,355   43   < 0.3613              < 0.5321
AY105060   15   1,112   59   < 0.7005              < 0.7631
AY108178   13   1,224   54   < 0.3233              < 0.4719
AY106616   7    2,619   97   < 0.6859              < 0.7214
AY107952   14   2,599   38   < 0.1453              < 0.1678
AY106371   15   1,615   65   < 0.4603              < 0.4047

Do genes exhibiting signatures of selection control agronomic traits? (Sherry Flint-Garcia)
• Hypothesis: manipulation of the expression of domestication and improvement genes will alter key agronomic traits
• Methods: use genetic and transgenic approaches to examine teosinte, exotic, and inbred alleles
• Test case: amino acid composition in kernels
• Evidence for selection for cysteine synthase, chorismate mutase, dihydrodipicolinate synthase, and hexokinase

To what extent has diversity in amino acid synthesis genes been reduced by selection?
(Sherry Flint-Garcia)
• Whitt et al., 2002 demonstrated that 3 of 6 genes in the starch synthesis pathway in maize show solid evidence of artificial selection
• Evidence for selection for cysteine synthase, chorismate mutase, dihydrodipicolinate synthase, and hexokinase from random sequencing
• Chose 16 additional genes for important steps in amino acid synthesis, sequenced them in teosintes, landraces, and inbreds, and conducted tests of selection

[Figure: kernel amino acid composition in Teosinte (n = 7), Landraces (n = 11), and Maize (n = 27). One panel shows percent of total amino acid for Alanine, Arginine, Aspartic Acid, Cysteine, Glutamic Acid, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, and Valine; the other shows Total Amino Acid as percent of kernel weight. Each comparison (Teosinte vs. Landraces, Teosinte vs. Inbred Lines) is marked ** or ns.]

[Figure: amino acid biosynthesis pathway diagram linking glucose, glycolytic and TCA cycle intermediates, and nitrogen assimilation (NO3–, NO2–, NH4) to the amino acids. Labeled enzymes and genes: PAL, cysteine synthase, 2-isopropylmalate synthase, chorismate mutase, anthranilate synthase β, acetohydroxy acid synthase, asparagine synthetase, threonine deaminase, aspartate kinase, cystathionine γ-synthase, tryptophan synthase β1, aspartate aminotransferase, DHDP synthase, glutamate dehydrogenase, proline dehydrogenase, SAM synthetase I, SAM synthetase II, nitrate reductase, hexokinase (N:C sensing), and ntl1 (nitrogen regulating protein).]

Sequencing candidate genes
• Goal is to sequence 1000 candidate genes in all inbreds for the 25DL, 16 teosintes, 2 Tripsacum, and W22 R-std
• Shared responsibility of the E. Buckler and M. McMullen laboratories
• Develop SNP (or sequence) based assays for association analysis
• Develop a mechanism to accept candidate gene suggestions from outside the project
• www.panzea.org

[Figure: diversity levels of 100%, 80%, and 60% shown for gene classes labeled 38,000 genes, 1,000 genes, and 1,000 genes]

Implications for GEM
• For the vast majority of genes, inbred lines retain on average 60% of the common diversity of teosinte and 80% of the diversity of landraces. The loss of diversity is therefore a problem specific to particular genes and traits rather than a general problem.
• Most of the diversity lost in unselected genes is in rare alleles and is therefore hard to capture.
• Our studies to date have not addressed specific adaptation, possibly a more important justification for GEM than limited diversity per se.
• It is hard for me to think about how to tap diversity for specific adaptation without considering diversity in a trait context.
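
One way to arrive at a retention figure like the roughly 60% quoted above is to average per-gene ratios of inbred to teosinte diversity, as in the average inbred-to-teosinte diversity ratio reported earlier (0.57, or 0.63 when genes with zero inbred diversity are excluded). The sketch below illustrates that averaging only. The function name and the diversity values are made up for illustration and are not project data, and skipping genes with zero teosinte diversity is an assumption.

```python
def mean_diversity_ratio(theta_inbred, theta_teosinte, drop_zero_inbred=False):
    """Average of per-gene inbred/teosinte diversity ratios.

    Genes with zero teosinte diversity are skipped because the ratio is
    undefined (an assumption; the slides do not say how these were handled).
    """
    ratios = []
    for inbred, teosinte in zip(theta_inbred, theta_teosinte):
        if teosinte == 0:
            continue
        if drop_zero_inbred and inbred == 0:
            continue
        ratios.append(inbred / teosinte)
    if not ratios:
        raise ValueError("no genes left to average")
    return sum(ratios) / len(ratios)


# Hypothetical per-gene diversity values, not taken from the project.
theta_inbred = [0.000, 0.004, 0.010, 0.006]
theta_teosinte = [0.010, 0.008, 0.010, 0.012]

all_genes = mean_diversity_ratio(theta_inbred, theta_teosinte)
nonzero_only = mean_diversity_ratio(
    theta_inbred, theta_teosinte, drop_zero_inbred=True
)
print(f"all genes:             {all_genes:.2f}")
print(f"excluding zero inbred: {nonzero_only:.2f}")
```

Dropping the genes with no inbred diversity raises the average, which is the same direction as the shift from 0.57 to 0.63 in the slides. Those zero-diversity genes are the ones where loss of diversity is gene-specific rather than general.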