Title: Development of sequence-based community tools for Setaria viridis, a model genetic system for C4 grasses

1. Description

The goal of this proposal is to build on the foundation already established at DOE-JGI (http://www.phytozome.net/foxtailmillet.php) to develop genomic resources for Setaria viridis.

Team members:
PI: Thomas P. Brutnell, Boyce Thompson Institute
Co-PI: Jeffrey Bennetzen, University of Georgia
Co-PI: Andrew Doust, Oklahoma State University
Co-PI: Katrien Devos, University of Georgia
Co-PI: Elizabeth Kellogg, University of Missouri, St. Louis
Co-PI: Todd Mockler, Oregon State University

We have identified three primary goals. Detailed justifications for each objective are given in the Justification section.

1. Sequence 50 diverse accessions of S. viridis and map polymorphisms relative to the S. italica reference genome. To date, approximately 160 S. viridis accessions from 15 countries, including 60 from the US, have been collected by Co-PIs Kellogg and Devos, and these accessions are currently being genotyped by Devos using SSR markers. A subset of 50 inbred lines that captures the greatest diversity in the collection will be nominated for deep sequencing (20x coverage) to fine-map genetic variation throughout the genome. Estimated total sequence: 500 Mb x 20x coverage x 50 lines = 500 Gb.

2. Sequence a subset of NMU- and fast neutron-mutagenized lines to define the nature and frequency of mutations in these populations. To characterize the efficacy of an ongoing mutagenesis program, we propose deep sequencing (20x coverage) of a subset of M1 plants and their M2 progeny to assess the distribution, nature and frequency of mutations across the genome. This depth allows for a relatively high proportion of unmappable reads while still providing a minimum effective coverage of 8-10x. Characterization of M1 plants will reveal somatic mutation rates, whereas germinal transmission rates will be determined in the M2 progeny.
We anticipate that most mutations identified in the NMU-mutagenized populations will be G-to-A (G:C to A:T) transitions, as NMU is an alkylating agent. Fast neutron-mutagenized plants are expected to carry primarily deletion alleles of varying size, distributed throughout the genome. Estimated total sequence: 12 plants x 500 Mb x 20x coverage x 3 treatments = 360 Gb.

3. Sequence an S. italica x S. viridis segregating population to map all breakpoints within the population and determine parental polymorphisms. To accelerate the discovery and characterization of genes underlying major QTL, we propose sequencing 500 members of a founder F2 population that is being self-pollinated to create new recombinant inbred populations. Each of the 500 individuals will be sequenced to 0.5x coverage to identify genetic breakpoints and to generate a high-density SNP map for both parental genomes. Although each individual will be sequenced to a relatively low depth, the combined depth across the population is 250x (500 x 0.5x); because each parental allele is carried by roughly half of the F2 chromosomes at any locus, each parental genome will be covered to a depth of about 125x. At this depth, nearly all SNPs and small indels distinguishing the two parental genotypes will be identified, and these can then be imputed onto the genetic map once breakpoints are determined. This sequencing effort will thus provide the foundation for fine-mapping studies. Estimated total sequence: 500 plants x 500 Mb x 0.5x coverage = 125 Gb.

Total sequencing requested: 985 Gb (500 + 360 + 125 Gb).

2. Justification

S. viridis is a small-stature, rapid-cycling grass species that is closely related to some of the most promising bioenergy feedstock grasses, including its sister taxon switchgrass (Panicum virgatum) and other closely related panicoids such as Miscanthus (Miscanthus x giganteus), sorghum (Sorghum bicolor) and sugarcane (Saccharum officinarum). It is a rapidly emerging genetic model system for studying C4 photosynthesis (Brutnell et al. (2010) Plant Cell 22:2537-2544), abiotic and biotic stress responses (Li and Brutnell (2011) J Exp Bot, in press) and domestication (Doust et al. (2009) Plant Physiol 149:137-141), all essential areas of research for developing low-input, high-yielding bioenergy feedstocks that will be grown widely throughout the US. DOE-JGI has recently produced ~8.3x coverage of the ~500 Mb genome of the closely related food and feed crop Setaria italica, and efforts are currently underway to sequence the S. viridis genome. We have identified three specific goals, justified as follows.

1. Sequence 50 diverse accessions of S. viridis. Co-PIs Kellogg and Devos are currently collecting S. viridis accessions throughout the US and worldwide to study the global diversity of S. viridis and the history, origin and population structure of S. viridis introductions into North America. They will also examine the effects of selection for herbicide resistance on genome evolution. Because S. viridis is generally considered one of the most adaptable plant species in the world, these lines will also serve as a foundation for association mapping studies, as a source of rare allelic variants and as founders for the development of additional recombinant inbred populations.

2. Sequence a subset of NMU- and fast neutron-mutagenized lines. Brutnell has recently conducted a large-scale NMU mutagenesis of S. viridis (3000 M1 plants propagated) and is conducting a fast-neutron mutagenesis of S. viridis seeds (3000 M1 seeds mutagenized). The Bennetzen lab has performed a fast-neutron mutagenesis of S. italica seeds. These three populations will serve as foundations for community forward and reverse genetic screens. Sequence-based characterization of the populations will therefore show how useful they are as resources for genetic screens and will inform future mutagenesis programs.

3. Sequence an S. italica x S. viridis segregating population to map breakpoints. Co-PIs Bennetzen and Devos have created a number of crosses between Yugu1 (the foxtail millet accession sequenced by JGI) and selected S. viridis accessions. We propose sequencing F2 individuals from one of these founder populations to define genetic breakpoints, determine heterozygosity/homozygosity and map all parental SNPs and small indels. This will create a large mapping population that will be used to extend previous QTL analyses by co-PIs Doust and Kellogg on a small F3 segregating population developed by co-PI Devos from a cross between wild (A10, S. viridis) and cultivated (B100, S. italica) accessions (e.g. Doust et al. (2004) Proc Natl Acad Sci 101:9045-9050; Wang et al. (1998) Theor Appl Genet 96:31-36). A small F7 RIL population from this cross was also developed by Panaud and colleagues, and co-PI Devos has generated a high-density SNP map for the RILs (manuscript in prep) that has been used to detect QTL for plant architecture, biomass accumulation and flowering time (co-PI Doust, unpublished). Comparative genomic analysis has suggested candidate genes that may determine several of these traits, but further analysis requires large mapping populations such as the one proposed for sequencing here.

3. Utilization

The sequence data generated will serve multiple purposes, including:
1) Provide a high-density SNP map across 50 diverse accessions of S. viridis. This will guide the development of recombinant inbred populations and serve as a foundation for association mapping experiments.
2) Enable highly accurate determination of mutation density in the chemical and fast-neutron mutagenesis populations of Setaria. These data will be used to guide future mutagenesis programs and to establish forward and reverse genetic screening protocols to mine for specific allelic variants.
3) Define recombination breakpoints and the majority of parental polymorphisms in an F2 population.
These data will be used to create one of the most highly resolved mapping populations yet developed for any plant species. Overall, this sequencing effort will have applications in plant breeding, genetics and genomics and will greatly facilitate the development of community tools for S. viridis.

4. Community interest

The sequence data generated will be used widely by a growing community of Setaria researchers. This includes a major international research community that is using Setaria as a model to understand C4 photosynthesis; industry and academic scientists using Setaria as a model for C4 bioenergy grasses such as sugarcane, switchgrass, sorghum and Miscanthus; and academic and industry scientists who currently work with Zea mays (maize) and are looking for a closely related and readily transformable model (Brutnell et al. (2010) Plant Cell 22:2537-2544; Doust et al. (2009) Plant Physiol 149:137-141; Li and Brutnell (2011) J Exp Bot, in press).

5. DOE mission

This sequencing effort will have its most immediate impact on DOE's mission for alternative energy production. We are developing tools for S. viridis specifically with the bioenergy feedstock community in mind. S. viridis is a member of the same tribe as switchgrass (Paniceae) and is sister to Miscanthus, sugarcane, maize and sorghum. Although Brachypodium distachyon has been advanced as a model for understanding the grass cell wall, as a C3 grass it has limited utility for understanding the photosynthetic limitations of the bioenergy grasses, which almost exclusively use C4 photosynthesis. S. viridis therefore has great potential as the model system for examining carbon-nitrogen balance, biomass accumulation, biotic and abiotic stress tolerance and water-use efficiency in C4 grasses. This is particularly relevant for Miscanthus and switchgrass, which are recalcitrant to genetic analysis because of their large size, polyploidy, long generation times, sterility (Miscanthus) and self-incompatibility (switchgrass).

6. Sample preparation

Brutnell and Mockler have extensive experience in Illumina library construction (e.g. Filichkin et al. (2010) Genome Res 20:45-58; Li et al. (2010) Nat Genet 42:1060-1067) and will construct the libraries for this project. In particular, Brutnell proposes to generate an indexed library for sequencing the F2 progeny on the Illumina HiSeq 2000 platform, as well as the libraries for sequencing the NMU- and fast neutron-mutagenized populations. Mockler proposes to generate Illumina libraries for the 50 diverse accessions of S. viridis. Both the Mockler and Brutnell labs will also assist in data analysis. Mockler has developed automated informatics pipelines for sequence variant detection from Illumina genomic resequencing datasets. These pipelines have been used in the JGI Brachypodium distachyon resequencing project (PI John Vogel; manuscript in preparation) and the snow leopard genome project (Mockler, Irizarry et al., manuscript in preparation). The Mockler and Brutnell labs are happy to collaborate with JGI on the bioinformatics efforts associated with this proposed project.
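The sequencing estimates in the Description follow from simple arithmetic (number of samples x ~500 Mb genome x fold coverage), and the 125x per-parent depth in the F2 follows from pooling reads across individuals. The minimal Python sketch below recomputes these figures as a cross-check. The function names are hypothetical and illustrative only (they are not part of any JGI or lab pipeline), and the halving step assumes Mendelian segregation, so that at any locus roughly half of the sequenced F2 chromosomes carry each parental allele.

```python
"""Sketch: recompute the sequencing estimates given in the Description.

Design values (genome size, sample counts, fold coverage) come from the
proposal text; the helper names are hypothetical, for illustration only.
"""

GENOME_MB = 500  # approximate Setaria genome size in megabases (from the proposal)


def total_gb(samples: int, coverage: float, genome_mb: float = GENOME_MB) -> float:
    """Raw sequence required in gigabases: samples x genome size x fold coverage."""
    return samples * genome_mb * coverage / 1000


def per_parent_depth(individuals: int, coverage: float) -> float:
    """Expected depth on each parental genome in an F2 population.

    Assumption: Mendelian segregation, so at any locus about half of the
    sequenced F2 chromosomes carry each parent's allele.
    """
    pooled = individuals * coverage  # combined depth across all individuals
    return pooled / 2


# One entry per goal; goal 2 is 12 plants in each of 3 mutagenesis treatments.
requests = {
    "Goal 1: 50 diverse S. viridis accessions at 20x": total_gb(50, 20),
    "Goal 2: 12 plants x 3 treatments at 20x": total_gb(12 * 3, 20),
    "Goal 3: 500 F2 individuals at 0.5x": total_gb(500, 0.5),
}

if __name__ == "__main__":
    for goal, gb in requests.items():
        print(f"{goal}: {gb:.0f} Gb")
    print(f"Total requested: {sum(requests.values()):.0f} Gb")
    print(f"Per-parent depth in the F2: {per_parent_depth(500, 0.5):.0f}x")
```

Running the script prints 500, 360 and 125 Gb for the three goals, a 985 Gb total and a 125x per-parent depth, matching the figures requested in the Description. The sketch deliberately ignores unmappable reads; the effective coverage after mapping (8-10x minimum for the mutagenized lines) would be lower than the raw fold coverage used here.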