Project summary Jeffrey K. Conner, Michigan State University, PI; Shin-Han Shiu, Michigan State University, Co-PI; Yongli Xiao, The Institute for Genomic Research (TIGR), Co-PI Scientific objectives and approaches: Radish (Raphanus) is an important crop, a major agricultural pest weed on six continents, and an invasive species of natural areas, especially in California. Radish is a model system for studies of ecology and evolution, with major past and ongoing work on population and molecular genetics, plant-insect interactions (both pollination and herbivory), quantitative genetics of floral and life history traits, natural selection through both male and female fitness, adaptation to global change, and the possible role of transgene escape and natural hybridization in the creation of more weedy and invasive genotypes. Thus, we have very broad and deep knowledge of how radish interacts with its abiotic and biotic environment from basic ecology and evolutionary genetics to issues of fundamental applied importance. The wealth of ecological and evolutionary background in this species makes it an excellent candidate to understand adaptation at the molecular genetic level as well as address the applied issues; however, rapid progress in this area is currently hampered by the lack of radish sequence information. In addition, the taxonomic position of radish, as a close relative of Brassica and a more distant member of the same family as Arabidopsis, makes it an ideal candidate for comparative genomics among closely related plant species. We propose to sequence two cDNA libraries, one from the crop and one from wild radish, from both the 5’ and 3’ ends, to produce abundant EST and full-length cDNA sequence data. We will identify orthologs in Brassica and Arabidopsis, and initiate comparative genomic studies in several key areas including evolution in polyploids, gene retention and loss after duplication, and rates of adaptive evolution in an outcrossing plant. 
We will mine these data for codominant markers that will enable a number of research groups to understand adaptation of native, weedy, and invasive radish to its environment through fine-scale genetic mapping. The cDNA sequence will also facilitate future studies of the mechanisms of phenotypic plasticity, e.g. induction of anti-herbivore defensive chemicals, through measurements of differential gene expression.

Broader impacts: Sequence data of any kind for radish are sorely lacking, so the sequences we generate will greatly facilitate the work of the radish research community, and will likely attract additional ecologists and evolutionary biologists to this species. This project will establish a collaboration between Kellogg Biological Station (KBS), a leading ecological and evolutionary field station that includes an NSF Long-Term Ecological Research (LTER) site as well as an ongoing K-12 educational partnership, and TIGR, a leading structural genomics center. Such collaborations are unusual, but the marriage of modern genomics with modern field ecology and evolution will greatly advance our understanding of both areas, as well as educational opportunities for students at all levels and their teachers. Because KBS is a key member of the Great Lakes and Central US Ecological Observatory, a member of the Consortium of Regional Ecological Observatories, this collaboration could impact the NSF National Ecological Observatory Network program as well. We will work with high school biology teachers and graduate fellows who are part of the NSF-funded KBS GK-12 project (Conner is a co-PI) to develop a classroom unit on the use of genetic tools in ecological, environmental, and genomic science.

Project Description

Progress report: NSF DBI-0312656 Large-Scale Analysis of Novel Arabidopsis Genes Predicted by Comparative Genomics; (P.I. C. D. Town; co-P.I. YL. Xiao; 9/03-8/06).
This project was funded to verify the structure of, and produce full-length cDNAs in a Gateway recombination vector for 2,000 Arabidopsis genes that were either annotated as hypothetical or not annotated initially but subsequently predicted by comparative genomics (http://www.tigr.org/tdb/hypos/). The project is currently being completed through a no-cost extension. To date, we have processed over 2,300 genes through the RACE/structure pipeline, validating the structure of ~1,500 annotated genes and providing experimental support for ~500 novel genes. Approximately 2,100 ORFs have been targeted by our highly efficient FL-cDNA pipeline yielding more than 2,000 Gateway entry clones which include hypothetical genes, novel genes and several hundred low-expression genes that have functional annotation but for which there is no evidence for expression in the public ATH1 GeneChip data. Due to the sensitivity of our ORF cloning pipeline and the richness of the cDNA populations employed, we were able to generate Gateway entry clones for many of the “non-expressed” genes. At the start of the project, the ORF clones were produced only in the closed configuration (i.e. with a stop codon) as per our original project description. However, around the mid-point of the project we adopted a degenerate primer strategy developed by Pierre Hilson and colleagues and for the last 800 targets have produced ORF clones in both open and closed configurations (Underwood et al. 2006). Approximately 2/3 of our clones have been deposited at the Arabidopsis Biological Resource Center (ABRC) and the rest are being re-arrayed to complete this process. TIGR produced 152,680 sequence reads for this project and total clear range of sequences is 78,061,332 bp. This community service project has also generated ~5,000 GenBank submissions and one publication with another in preparation. Overview of the genus Raphanus The genus Raphanus (radish) includes the cultivated radish, R. 
sativus, and one of the world’s most economically important weeds, R. raphanistrum (Holm et al. 1997). Raphanus is a model system in plant reproductive ecology and evolution, particularly in the areas of pollination and herbivory (e.g., Agrawal 1998; Agrawal et al. 2002; Agrawal et al. 2004; Bett and Lydiate 2003; Conner 2002; Conner et al. 2003a; Devlin and Ellstrand 1990; Irwin and Strauss 2005; Irwin et al. 2003; Mazer and Schick 1991; Morgan and Conner 2001; Snow et al. 2001; Stanton et al. 1986). The genus originated in the Mediterranean region. The crop radish, R. sativus, may have had multiple origins, probably derived from R. raphanistrum or a recent common ancestor. All radish species or subspecies are highly interfertile, with little segregation distortion or disruption of chromosome pairing in crosses between R. sativus and R. raphanistrum (Bett and Lydiate 2003), and in California most wild radish is the result of hybridization between these two species, so several authors have suggested that all Raphanus are in fact one species (references). In the last 200 years weedy R. raphanistrum has spread to every continent except Antarctica, is an increasingly serious agricultural pest in 17 countries (Holm et al. 1997), and is the worst dicot agricultural weed in southwest Australia (Warwick and Francis; R. Cousens, pers. comm.). The Raphanus genome has nine chromosomes, is fairly small compared to other angiosperms (estimated at 573 Mbp; Johnston et al. 2005; map distance 915 cM; Bett and Lydiate 2003), and is very closely related to the Brassica A and C genomes (Warwick and Black 1991). The most recent treatment of the large (over 3000 species) and important family Brassicaceae divides most of the family into three large and well-supported clades (Beilstein et al. 2006). One of these clades includes Arabidopsis and Capsella (both currently being sequenced), and Brassica and Raphanus are in one of the other clades. 
These latter two are sister genera, having shared a common ancestor between 0.9 and 2.2 mya (Yang et al. 1999; Yang et al. 2002), and crosses between them have been conducted for some time (creating amphidiploid Raphanobrassica; Williams and Hill 1986). The Arabidopsis/Capsella and Raphanus/Brassica clades probably diverged 15 to 20 million years ago, and sequence similarity between Brassica and Arabidopsis ranges from 75% to 90% in exons.

Specific aims

1. To generate full-length cDNA sequence that can be mapped and for which orthologues can be found for comparative genomics in Brassica, Arabidopsis, and Capsella.
2. To mine the sequence data for gene-based codominant markers (5’UTR-SSR, EST-SSR, SNP, CAPs, dCAPs, intron-spanning markers) for use in mapping and other studies in comparative genomics, ecology, and evolution.
3. To establish the degree of sequence similarity between radish and its relatives with genomes that are either sequenced or for which large-scale sequencing is planned or in progress. To analyze the sequence data for gene content, relationships to genes in other plant species, and patterns of gene duplication, loss and retention, as well as to test hypotheses about the nature of selection on plant genes.

Background

Importance of radish as a crop and weed

Radish was already an important crop in ancient Egypt over 5000 years ago, and was likely independently domesticated in China over 2000 years ago (Snow and Campbell 2005). The value of the US radish crop in 2000-01 was $50 million (www.ers.usda.gov/briefing/Vegetables/vegpdf/Radishes.pdf), and radish is certainly far more important in Asia, where a large variety of radishes are grown for their edible roots (including daikon), and others for leaves used as fodder or for seedpods eaten by people (rat-tail radish; Snow and Campbell 2005). Wild radish is a major pest of cereals and other crops worldwide, especially winter wheat, and is a serious weed in at least 17 countries (Holm et al. 1997).
It is the most damaging weed in small grains in the southwestern US (refs in Schroeder 1989, Warwick and Francis), where it can reduce winter wheat and canola yields by up to 40% as well as contaminate seed stock (refs in Warwick and Francis; www.ag.ndsu.nodak.edu/aginfo/entomology/ndpiap/Canola_GS/23weeds.htm). Radish is becoming a more serious pest, especially in the US and Australia, for at least two reasons. First, wild radish has evolved resistance to a variety of herbicides in Australia and South Africa (www.weedscience.org). Second, the increasing use of low-tillage practices to reduce soil erosion in the US makes wild radish harder to control (Culpepper et al. 2005). On the other hand, wild radish is sometimes used as a “green manure” to help control other weeds through the allelopathic chemicals it produces (Norsworthy 2001; 2005).

The annual weed and crop radish have evolved from winter annual ancestors. Recent work in Conner’s lab on native European populations of R. raphanistrum shows that these populations are winter annuals, forming a tight rosette with many leaves and bolting and flowering only after a cold treatment. This is in contrast to the populations of weedy radish that have been studied to date in a number of labs, which form little or no rosette and bolt and flower quickly. The crop radish also does not form a rosette and often flowers quickly; delayed flowering is a major goal for radish breeders (Curtis et al. 2002; Curtis 2001; Snow and Campbell 2005). Thus, a major shift has occurred in the life history of radish under domestication and in becoming a serious worldwide weed. Having a genetic map of radish will greatly facilitate finding the genes that underlie the evolution of a weed from its wild progenitor. A molecular genetic understanding of this shift would provide fundamental insights into crop domestication, weed evolution, and life-history evolution in plants in general.
Finding the gene loci responsible for this shift in radish will be greatly facilitated by the wealth of knowledge of the genetics of flowering in its close relative A. thaliana, including many candidate genes such as CO, FT, and gigantea (refs); gigantea has been used to produce a later-flowering crop radish (Curtis et al. 2002). Note that Arabidopsis also has annual and winter annual genotypes. To our knowledge, Brassica rapa is the only other serious weed species currently being sequenced, and Brassica and Raphanus each have the added advantage of containing both major crops and major weeds.

Wild radish is an invasive species of wild habitats in California. The California Invasive Plant Council (www.cal-ipc.org) lists Raphanus sativus as an invasive of moderate distribution but limited impact to date. However, given that radish has greatly increased its distribution over the last 20 years or so, it is a major concern for the future. Norman Ellstrand’s group has studied radish ecology and evolution for over 20 years (see below). They have found that the currently invasive radish in California is actually a hybrid between crop (R. sativus) and weedy (R. raphanistrum) radish, and that it has caused the extinction of both progenitor species in the wild. The invasive populations share a specific combination of traits from the crop and weedy ancestors, and the invasive is transgressive for one fitness-related trait: fruit weight is far greater in the hybrids than in either parent (Hegde et al. 2006). Ellstrand (pers. comm.) would use the markers developed by our proposed work, and the subsequent radish genetic map, to find the genes and chromosome segments from each of the two parental species that affect the invasiveness of wild radish in California.

Radish is a model system for assessing the potential for transgenes inserted into crops to escape and increase the spread of weedy and invasive relatives.
Allison Snow (Ohio State University) has established 15 replicate populations of experimentally produced crop-wild radish hybrids in northern lower Michigan, and plans to submit an NSF LTREB proposal to expand this work to other locations (A. Snow, pers. comm.). The crop/weed hybrids had lower F1 fitness, but crop genes persisted over three years in the field (Snow et al. 2001). Snow (pers. comm.) would use the markers developed from the proposed sequencing to determine the specific genes and chromosomal segments from the crop that are retained in the hybrid weedy populations.

Radish as a model system in ecology and evolution

Below we give some examples of the diversity of ecological and evolutionary work on radish. The underlying themes of all of this research are adaptation to the biotic and abiotic environments (both natural and human-impacted) and some of the key traits involved in this adaptation; the breadth and depth of this work demonstrate that radish is one of the few true model systems in ecology and evolutionary biology.

Plant-Insect interactions

The interactions between angiosperms and insects are some of the most important determinants of ecosystem structure and function, due to the dominance of these two groups in terms of numbers, biomass, and diversity. Herbivory is the main antagonistic plant-insect interaction, and pollination the main mutualism. Both have been extremely well studied in wild radish.

Herbivory is one of the most important challenges that plants face, and a major challenge for agriculture. Herbivory decreases female fitness (seed production) in radish. This decrease in fitness is known to occur in response to both chewing insects like caterpillars (Lehtila and Strauss) and sucking insects like aphids (Snow and Stanton 1988), and the spatial and temporal patterns of leaf damage within a plant affect the magnitude of the resulting decrease in female fitness (Mauricio et al. 1993).
Radish has evolved multiple induced defenses against this herbivory; the fitness costs, benefits, and quantitative genetics of these plastic responses to herbivory are extraordinarily well known. Induced responses to herbivory are an important type of adaptive phenotypic plasticity, in which plants produce more defensive chemicals or structures after damage by herbivores. Feeding by herbivores on radish increases the density of defensive hairs (trichomes) on the leaves as well as the concentration of toxic chemicals (glucosinolates) in the leaves, and these increased defenses reduce subsequent herbivory by both chewing and sucking herbivores and increase plant fitness relative to non-induced control plants (Agrawal 1998; Agrawal 1999; Agrawal et al. 2002). However, the induced defense has a cost, as the fitness of induced plants is decreased in the absence of later attack by herbivores (Agrawal et al. 1999b). The induced resistance was even transmitted to offspring through a maternal effect (Agrawal 2001; Agrawal et al. 1999a). The level of glucosinolate induction is heritable, demonstrating that continued selection for induction will result in continued evolution of this trait (Agrawal et al. 2002). The genomic tools enabled by the proposed cDNA sequencing would provide the basis for a much-needed radish genetic map and allow this work on induced defenses in radish to be taken to the next level. For example, the mechanisms of the inducible defenses could be uncovered by examining differences in gene expression between plants damaged by herbivores and others protected from damage. This is similar to work that an NSF Minority Postdoctoral Fellow in Conner’s lab, Gabriela Bidart-Bouzat, is undertaking in Arabidopsis. A genetic map would be the first step toward finding the genes underlying resistance to herbivory.

Pollination is a key mutualism for angiosperms, and is crucial for reproduction in crops, weeds, and native plants.
Most studies of plant-pollinator interactions have been on plants that are specialized, that is, have only one or a few closely related pollinator species, but many, perhaps most, plants are more generalized in their pollination. Radish is perhaps the best-studied of these generalist pollination systems. Radish has floral color polymorphisms, and in both R. raphanistrum (yellow and white flowers) and R. sativus (purple, pink, and white flowers) different taxa of pollinators have different color preferences (Kay 1976; Kay 1978; Kay 1982; Stanton 1987). The different pollinator taxa also vary in their preference for floral size and number, and in their efficiency in removing and depositing pollen (Conner et al. 1995; Conner and Rush 1996). Conner’s lab would use a radish genetic map to find the genes affecting pollinator attraction and efficiency in radish, an intraspecific analogue to the work by Schemske and Bradshaw on crosses between two species of Mimulus (Bradshaw et al. 1998; Bradshaw et al. 1995; Bradshaw and Schemske 2003; Schemske and Bradshaw 1999).

Mechanisms of adaptation

The rate of adaptation of a complex (quantitative) phenotypic trait is determined by two elements: the strength of natural selection, often quantified as the selection gradient (β), and the G matrix containing the additive genetic variances and covariances among the traits. The latter are often expressed in their more familiar standardized versions, heritability and genetic correlations respectively. We have extraordinarily broad and deep knowledge of the strength of natural selection and the G matrix for floral and life-history traits in wild radish, perhaps more so than for any other plant species.

Natural selection

Seed size is an important determinant of success in native as well as weedy and invasive plants; the causes and fitness consequences of seed size have been well studied in radish.
Maureen Stanton (UC Davis) has shown that there are both developmental and genetic components to seed size variation (Nakamura and Stanton 1989; Stanton 1984a), and that the developmental processes led to six-fold variation in seed size within single radish fruits. This within-fruit variation has strong fitness consequences in the field, as larger seeds were more likely to sprout, grew faster, and made more flowers than smaller seeds from the same fruit (Stanton 1984b). These differences resulted in differences in lifetime female fitness (Stanton 1985), a key evolutionary parameter.

Selection through differences in male fitness (seed-siring success) is a crucial component of adaptive evolution in plants, but has been well studied only in radish. Half of all nuclear genes transmitted across generations pass through pollen or sperm, that is, male function, but the vast majority of ecological and evolutionary studies of selection and fitness in plants measure only female fitness (numbers of seeds produced). Actual male fitness, estimated as the number of seeds sired using genetic marker-based paternity analysis, has been measured in wild radish more often than in any other plant; indeed, wild radish was one of the first plants in which this was ever accomplished (Devlin et al. 1992; Devlin and Ellstrand 1990; Stanton et al. 1986). As a result, we know more about how herbivory and pollination affect lifetime male and female fitness, and more about selection on floral traits through male and female fitness, in wild radish than we do for any other plant. For example, the work of Stanton’s group and Conner’s group shows that selection on floral color (Stanton et al. 1986; Stanton et al. 1989) and floral morphology (Conner et al. 2003b; Conner et al. 1996a; Conner et al. 1996b; Morgan and Conner 2001; Stanton et al. 1991) is often stronger through male fitness than through female fitness.
Strauss and Conner’s labs have shown that leaf damage by herbivores can affect attractiveness of the plant to pollinators and resulting male fitness (two refs+Lehtila and Strauss 1999). A key component of male fitness is pollen competition; we know more about pollen competition and its fitness effects in radish than perhaps any other plant. Diane Marshall of the University of New Mexico has been examining the processes that govern the success of pollen from different males deposited on the same flower for twenty years. She has found that multiple paternity within single wild radish fruits is common, and the relative success of pollen from different males is nonrandom, consistent across maternal plants, and occurs at least in part through interference competition (Ellstrand and Marshall 1986; Marshall 1988; Marshall 1998; Marshall et al. 2000; Marshall and Diggle 2001; Marshall and Ellstrand 1985; Marshall and Ellstrand 1986; Marshall and Ellstrand 1988; Marshall and Ellstrand 1989; Marshall and Folsom 1992; Marshall et al. 1996; Marshall and Fuller 1994; Marshall and Oliveras 2001). Work in Maureen Stanton’s group at UC Davis has shown that pollen competitive ability is both heritable (Snow and Mazer 1988) and strongly affected by the environment (Young and Stanton 1990). Marshall, Karron, and Snow have shown that the deposition of pollen from multiple donors on a flower affects both maternal and offspring fitness (Karron and Marshall 1990; Karron and Marshall 1993; Marshall and Whittaker 1989; Snow 1990). Marshall would use genomic tools to measure gene expression in the pollen and stigmas in response to different pollination treatments (D. Marshall, pers. comm.). Genetic variance and covariance (G-matrix) Genetic correlations do not cause the expected evolutionary constraint in wild radish. 
Constraints on adaptive evolution, defined as anything that slows or prevents the attainment of an optimally adapted phenotype, have been a topic of major interest since the publication of Gould and Lewontin (1979). Genetic correlations among traits have often been invoked as a likely cause of constraint (e.g., Arnold 1992; Clark 1987; Maynard Smith et al. 1985). The genetic correlation between the filament and corolla tube in R. raphanistrum flowers is one of the highest ever reported in nature (Conner and Via 1993), is caused by pleiotropy (Conner 2002), and is stable across environments, populations, and related species (Conner et al., submitted). Thus, this correlation should cause an evolutionary constraint, that is, a slowing of the evolution of the most adaptive combination of traits. However, contrary to this prediction, artificial selection produced rapid independent evolution of these traits, with little evidence for a constraint (Conner et al., submitted). Stanton and Young (1994) reported very similar results for petal size and pollen number in R. sativus.

We already have extraordinarily broad and deep knowledge about wild radish floral evolution, including pollinator-mediated selection based on lifetime male and female fitness measured in six field seasons at two field sites, multiple quantitative genetic analyses conducted in both the field and greenhouse, and phylogenetic comparative studies across the family Brassicaceae. Therefore, the logical next step is an understanding of the molecular genetics of these traits, but this will be difficult without more comprehensive sequence data. To facilitate future QTL (linkage) and association (linkage disequilibrium) mapping, a dense molecular map is required. An EST sequencing project using cDNA from multiple samples would provide the infrastructure for developing resources such as an expression microarray and a linkage map.
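The constraint argument in this section can be made concrete with the standard multivariate response-to-selection equation, in which the predicted change in trait means is the product of the G matrix and the selection gradient. The sketch below is illustrative only: the variances, the genetic correlation of 0.85, and the selection gradient are made-up values, not estimates from radish, and the function names are hypothetical.

```python
# Multivariate response to selection: delta_z = G * beta.
# All numbers are hypothetical and chosen only to illustrate the idea.

def g_matrix(var1, var2, r_g):
    """Build a 2x2 G matrix from two additive variances and a genetic correlation."""
    cov = r_g * (var1 * var2) ** 0.5
    return [[var1, cov], [cov, var2]]

def response_to_selection(G, beta):
    """Return the predicted per-generation change in trait means (G times beta)."""
    return [sum(g_ij * b_j for g_ij, b_j in zip(row, beta)) for row in G]

# Selection favoring an increase in trait 1 and a decrease in trait 2,
# i.e. selection acting against the direction of the genetic correlation.
beta = [0.5, -0.5]

for r_g in (0.0, 0.85):
    G = g_matrix(1.0, 1.0, r_g)
    print("r_g =", r_g, "delta_z =", response_to_selection(G, beta))
```

With these assumed values, a genetic correlation of 0.85 shrinks the predicted response in each trait to about 15% of the response expected with no correlation, which is the kind of slowing that the constraint hypothesis predicts.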
Importance of radish for comparative genomics In the family Brassicaceae, Arabidopsis thaliana has been fully sequenced, and sequencing projects are underway for A. lyrata as well as the very closely related Capsella rubella by the Joint Genome Institute. Sequencing projects are also underway for Brassica rapa and B. oleracea (Ayele et al. 2005; Yang et al. 2005), which are very close to Raphanus (see Overview above). Having sequence data available for Raphanus would provide the comparative genomics community with the ability to make hierarchical comparisons between replicate pairs of closely related genera that are more distantly related to each other, but still close enough for comparisons across the two pairs. Sequences from species pairs can be used to determine if genome-wide trends are consistent across related lineages; the data for these kinds of analyses are not currently available in plants. Specifically, the availability of Raphanus sequence will (a) improve gene annotation and facilitate the identification of novel coding and RNA genes, (b) allow the detection of positive and lineage-specific selection on plant genes, and (c) provide crucial details on gene gain and loss patterns in plant gene families. Gene discovery and annotation through sequence conservation: The availability of multiple genomes greatly facilitates gene prediction, because evolutionary conservation can be used to identify likely functional regions (Brent and Guigo 2004). For protein coding genes, dual genome (e.g. TWINSCAN, Korf et al. 2001) and multiple genome (e.g. phylo-HMM, Siepel and Haussler 2004) gene finders have been developed that significantly out-perform prediction programs that use a single genome. Thus, the proposed project will greatly facilitate dicot gene prediction using dual or multiple genome gene finders. In addition to protein coding genes, substantial RNA genes are likely present in the unannotated regions of eukaryote genomes (Meyers et al. 2006). 
Recent whole-genome tiling array studies have revealed candidate expression signals in intergenic regions in humans (e.g. Kapranov et al. 2002; Bertone et al. 2004), Arabidopsis (Yamada et al. 2003; Stolc et al. 2005), rice (Li et al. 2006), and Drosophila (Stolc et al. 2004). The proposed project will generate cDNA information useful for identifying novel RNA genes and will assist in the validation of putative RNA genes found in other organisms, particularly in plants.

Nature of selection on plant gene sequences: Comparisons of DNA polymorphism within species to divergence between species allow the identification of positively selected genes as well as the differentiation of weak from strong purifying selection (Hudson et al. 1987; McDonald and Kreitman 1991; Sawyer and Hartl 1992). In species such as Drosophila melanogaster, several studies have shown that a substantial number of protein-coding genes have experienced positive selection (Fay et al. 2002; Smith 2002; Sawyer et al. 2003). In humans, 9% of loci analyzed show rapid amino acid evolution (Bustamante et al. 2005). On the other hand, studies of Arabidopsis thaliana populations show that most substitutions are deleterious (Bustamante et al. 2002). The differences between Drosophila and Arabidopsis have been attributed to the primarily selfing Arabidopsis mating system (Bustamante et al. 2002). Therefore, to see whether plant genes experience positive selection as genes in other species do, sequences from natural populations of outbreeding species like radish will be necessary. The proposed project will generate cDNA sequences both within and between radish species that will facilitate the identification of positive selection on genes in plants. The availability of sequences from multiple species will also allow the identification of genes experiencing positive selection in a lineage-specific fashion (Clark et al. 2003).
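The polymorphism-versus-divergence comparison described above can be sketched as a 2x2 table in the style of McDonald and Kreitman (1991). The example below is a minimal illustration, not the analysis pipeline of this proposal: the function name and the counts are hypothetical, and a real analysis would add a significance test on the table (for example, Fisher's exact test).

```python
# McDonald-Kreitman style summary of a 2x2 table of site counts.
# The counts used below are hypothetical, not data from radish or any other species.

def mk_summary(pn, ps, dn, ds):
    """Summarize polymorphism and divergence counts for one gene.

    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    dn, ds: nonsynonymous and synonymous fixed differences between species.
    Returns (neutrality_index, alpha).
    """
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("ps, dn and ds must be non-zero")
    # Ratio of nonsynonymous to synonymous changes, within versus between species.
    neutrality_index = (pn / ps) / (dn / ds)
    # Estimated proportion of nonsynonymous substitutions fixed by positive selection.
    alpha = 1.0 - neutrality_index
    return neutrality_index, alpha

ni, alpha = mk_summary(pn=4, ps=40, dn=10, ds=20)
print("neutrality index:", ni, "alpha:", alpha)
```

A neutrality index below 1 (alpha above 0) indicates an excess of nonsynonymous divergence, as expected under positive selection; an index above 1 indicates an excess of nonsynonymous polymorphism, as expected when many segregating variants are weakly deleterious.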
These positively selected genes are candidate targets for adaptive evolution to the biotic and abiotic environments in radish.

Polyploidy, gene loss and retention: Polyploidy has occurred extensively in angiosperms and is recognized as a key factor in the evolution of plants and their genomes (Wendel 2000). Gene loss occurs frequently in polyploids; for example, more than 80% of genes were lost after the most recent polyploidization event in the Arabidopsis lineage (Blanc and Wolfe 2004). The high gene loss rate is corroborated by a sequence analysis of a 2.2 Mb region representing triplicated genome segments of Brassica oleracea, which are each paralogous with one another and homologous with a segmentally duplicated region of the Arabidopsis thaliana genome (Town et al. 2006). Nonetheless, some gene families are preferentially retained, which suggests that they are important in plant-specific adaptations (Blanc and Wolfe 2004; Shiu 2004; Shiu et al. 2005). The two clades of Brassicaceae discussed above differ in ploidy level, with the Brassica/Raphanus clade having undergone a genome triplication after diverging from the clade containing Arabidopsis and Capsella. Because of the higher rate of gene duplication in plants compared to other organisms, independent gene losses have also occurred at higher rates, obscuring orthologous relationships. With Raphanus cDNA sequences in hand, phylogenomic approaches can be applied to infer gene gain and loss events in gene families, providing a better understanding of the factors that contribute to duplicate gene retention. Raphanus diverged from Brassica only ~1-2 million years ago. Therefore, having Raphanus sequences would facilitate comparative studies of the consequences of polyploidy at a much shorter time scale than has been possible previously.
The broad and deep knowledge of adaptive traits in Raphanus discussed above should facilitate making the link between the genes that are preferentially retained and adaptation to the natural environment.

Rationale and Significance

• Radish is a model system in ecology and evolution, an important crop, an invasive species of natural areas, and a serious agricultural weed worldwide. Given the wealth of ecological and evolutionary work that has been conducted on radish, a modest investment in sequence data for radish would have a large payoff in all these areas.
• Very little sequence data of any kind are available for radish; therefore, cDNA sequence, maps, polymorphic markers, and markers in genes of known function (identified by orthology with Arabidopsis) would greatly facilitate radish research. The many research groups that study radish are in need of modern molecular genetic tools.
• The phylogenetic position of radish, as the sister genus to Brassica but in an entirely different clade from Arabidopsis and Capsella within the Brassicaceae, means that sequence data from radish would provide the comparative genomics community with an unprecedented opportunity to make hierarchical comparisons. The ability to make these comparisons will greatly facilitate gene annotation and prediction, as well as the identification of genes under selection.
• The two clades differ in the number of whole genome duplication events, with ~3 rounds in the Raphanus/Brassica lineage and two rounds in the Arabidopsis lineage. Having sequence data from replicate pairs of species will offer unique insights into the consistency of patterns of gene retention and loss, and a better understanding of the nature of selection on plant genes.

Research Plan

Approach: We propose to produce full-length cDNA sequence libraries from the two named species of radish, the crop radish R. sativus and the native and weedy radish, R. raphanistrum.
We will sequence, from both 5’ and 3’ ends, 50,000 clones from each of two normalized cDNA libraries of pooled tissue. Thus, a total of 100,000 clones from both libraries will be sequenced from both ends, for an overall total of 200,000 reads. This sequencing should generate at least 30,000 unique cDNA sequences. These sequences will be mined to generate a variety of gene-based codominant markers or marker candidates including 5’UTR-SSRs, EST-SSRs, SNPs, CAPs, and dCAPs. We will sequence from both ends to maximize the numbers of full-length cDNAs recovered, as well as to maximize the numbers of highly polymorphic markers discovered. This work will generate or enable the generation of three general classes of markers, listed below in decreasing order of level of polymorphism and increasing level of transferability across species: 1. SSR from 5’ UTR. The 5’ UTR has been shown to be by far the richest source of SSR markers in Arabidopsis, with almost 2400 SSRs found per MB, compared with less than 1000/MB in introns, 3’UTR, and genomic DNA (Lawson and Zhang 2006). Because these regions are untranslated, they should be at least as highly variable as SSR derived from genomic DNA, but they also should show lower transferability across species. Thus, these markers will be used for studies within radish, including within-population studies of the biologically important traits described above. For within-population mapping of outbred species, the most highly polymorphic markers are necessary, which means SSRs are the markers of choice. SSRs derived from genomic DNA are notoriously difficult to transfer between even closely related species, especially in plants (Whitton et al. 1997). Indeed, Conner’s lab screened 450 publicly available microsatellites from Brassica and found only about 25 that amplified well and were interpretable in radish. Of these, only 12 were informative in one outbred cross. 
Therefore, sequencing of radish directly is necessary to produce many informative SSRs. Besides serving as highly polymorphic markers for mapping and other studies in radish ecology and evolution, some of the SSRs we uncover may provide functional information as well, because recent research has shown that SSRs function in development and gene regulation (Fondon and Garner 2004; Karlin and Burge 1996; Li et al. 2004; Meloni et al. 1998). 2. SSR from translated regions (EST-SSR), plus SNPs, CAPs, and dCAPs. Based on studies from five cereal species (Kantety et al. 2002) and six species and subspecies of Medicago (Eujayl et al. 2004), SSR markers located in coding regions (EST-SSR) should be both polymorphic (>70 were polymorphic in Medicago; Eujayl et al. 2004), although not as polymorphic as SSR from the 5’UTR, and more transferable among closely related species than SSR from genomic DNA or UTR. Our sequencing should also uncover a large number of SNPs, many of which can be converted to CAPs and dCAPs markers (refs). These should also be transferable among closely related species. Thus, these markers will be most useful for comparative mapping between radish and Brassica. 3. Intron-spanning markers: For comparative mapping with the more distantly related species in Arabidopsis and Capsella, primers located within exons but that span introns that vary in length across species (Choi et al. 2004) would be most useful. We will predict the position of radish introns by aligning radish sequences with a corresponding genomic sequence of Arabidopsis, and primers will be designed to anneal in exon sequences and to amplify across intron regions, which will likely harbor ample length variation across species. These primer sequences will be provided to the community for screening. The resulting sequence data will be a valuable resource for researchers studying Brassicaceae species as well as for the comparative genomics community in general. 
In addition to marker identification, we will also initiate comparative analysis with several other plant genomes to generate insights on the evolution of plant genomes. Specifically, we will construct transcript assemblies (TAs), identify potential orthologous sequences from reference genomes including Arabidopsis, Brassica, poplar, and rice, determine the gene gain and loss patterns in various gene families, and examine the nature of selection on plant genes.

DNA substrate and sequencing strategy: We will construct two normalized cDNA libraries:
1. Four R. sativus cultivars pooled
2. Four R. raphanistrum populations pooled, two weedy and two native.
One of the weedy populations will be the well-studied NY? population from North America (refs), and the other will be from southwestern Australia, where wild radish is a very serious pest. The two native populations will be from France and Spain and represent the landra and maritimus subspecies respectively. We chose this sampling scheme so that we would uncover ample genetic variation both within and among libraries, while keeping the plant material in both libraries closely related enough to give a high frequency of sequence matching. Although the two libraries are constructed from different named species, recall from the Overview above that several authors have proposed that R. sativus and R. raphanistrum are actually the same species and that R. sativus was domesticated from R. raphanistrum. This means that there should be a low percentage of sequence divergence between our two libraries. The libraries will contain ample genetic variation, as each plant sampled will be highly heterozygous (since Raphanus is self-incompatible), the cultivars chosen will be highly divergent (ref? see above), and there is substantial neutral marker differentiation between the weedy and native R. raphanistrum (Sahli and Conner, in prep.).
The variation in our libraries represents natural variation (among the two native populations), variation due to domestication (among cultivars of R. sativus and between the two libraries, because R. sativus was likely domesticated from R. raphanistrum), and variation due to the evolution of a serious agricultural pest (within the R. raphanistrum library). The use of multiple cultivars of R. sativus and two populations each of weedy and native R. raphanistrum means that the cDNA sequences we will generate will be more representative of the genus Raphanus in general, and of the crop and weedy radishes specifically. By sequencing separate libraries for the two named species, we will be able to assign sequence variants unambiguously to each. In future work, researchers will be able to assign variation within libraries to the different cultivars, populations, or subspecies simply by designing PCR primers to amplify the variable regions and screening the plant populations in question. We will collect tissue from a variety of plant parts at different developmental stages, focusing particularly on newly-formed flower buds and shoot apical meristems; this will ensure that we get transcripts from developmental genes and the genes affecting the floral traits discussed above. Conner’s lab will grow the plant material and isolate total RNA using RNeasy Plant Mini kits (Qiagen); we have seeds from all of these populations and experience collecting tissues into liquid nitrogen and isolating RNA with the RNeasy kits.

Library construction
Libraries will be constructed using the normalization services of Evrogen (www.evrogen.com). Evrogen combines the full-length Smart technique to capture full-length sequences (Zhu et al. 2001) with a proprietary normalization strategy using a novel duplex-specific nuclease (Shagin et al. 2002). Isolated total RNA will be sent to Evrogen.
Normalized double-stranded cDNA generated by Evrogen will be directionally ligated into SfiI A/B sites of pDNR-LIB (Clontech) and transformed into GC5 High Eff Competent Cells (Gene Choice) at TIGR. The titer of each library will be checked before colony picking and sequencing. Most recently, this strategy in Medicago EST sequencing resulted in 40-60% near full-length cDNAs in various libraries. Therefore, this approach should be able to generate a high yield of novel ESTs, including a high percentage of full-length cDNAs, from both crop and wild radish cDNA libraries.

Sequence Quality and Quantity
Sequencing will be carried out at the TIGR affiliate organization, the J. Craig Venter Science Foundation Joint Technology Center (JTC). JTC has a state-of-the-art facility and is one of the world's leading DNA sequencing organizations in terms of capacity, cost effectiveness and scientific expertise. JTC employs robotics, LIMS tracking and 100 Applied Biosystems 3730xl automated DNA analyzers, among the most advanced sequencing machines available. The JTC’s current capacity is greater than 52 million sequence reads (lanes) per year. Current average read lengths are at least 700 bp (sequence quality equivalent to phred 20), and recent EST projects have sequenced with 80% to 90% efficiency. Approximately 200,000 total sequence reads with an average read length of at least 700 bp will be generated from both ends of 100,000 cDNA clones from the crop and wild radish libraries. In the first year, the normalized cDNA libraries will be constructed and pilot sequencing of about 1000 clones from each library will be completed in order to assess the quality of both libraries. The production of EST sequences will be accomplished in the rest of the first year and the first half of the second year. Base-callers will be used to provide quality values for each base produced.
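As a point of reference for the quality values and read counts quoted above, the following minimal Python sketch shows the standard phred definition, Q = -10 log10(P), together with the read-yield arithmetic. The function names and the 85% success rate used in the example are illustrative assumptions (the text gives a range of 80% to 90%); this is not the JTC pipeline.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a phred quality value Q to a base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a base-call error probability P to a phred quality value."""
    return -10 * math.log10(p)


def count_q20_bases(quals, threshold=20):
    """Count bases in a read whose quality value meets the threshold."""
    return sum(1 for q in quals if q >= threshold)


def expected_successful_reads(clones: int, ends: int = 2,
                              success_rate: float = 0.85) -> int:
    """Expected successful reads when each clone is read from `ends` ends.

    The default success rate is an assumed value inside the 80-90% range.
    """
    return round(clones * ends * success_rate)


if __name__ == "__main__":
    # phred 20 corresponds to a 1% chance that the base call is wrong.
    print(phred_to_error_prob(20))
    # 100,000 clones read from both ends at an assumed 85% success rate.
    print(expected_successful_reads(100_000))
```

Under these assumptions, 100,000 clones read from both ends give 200,000 attempted reads, of which roughly 170,000 would be expected to succeed.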
Our daily QC reports evaluate production success using several summary statistics, including number of reads, sequencing success rate, read lengths, and average quality values (see Appendix A3 for details). All the sequences will be cleaned, including trimming of vector and adaptor sequences, removal of all low-quality sequence and any contamination, and then will be assembled and clustered to generate a radish gene index or transcript assemblies (Lee et al. 2005; Quackenbush et al. 2001; Quackenbush et al. 2000). We estimate based on our experience that the project should produce about 30,000 unique sequences, both tentative consensus sequences (TCs) and singletons. There are currently only 94 EST sequences from radish in GenBank (as of 06/01/2006). Therefore, the immediate outcome of this project will be a significant increase in the number of radish ESTs, which will greatly enrich the genomic resources available to the radish research community. The analysis of all the sequences from this project will be completed during the remainder of the second year.

Table 1. Summary of sequencing cost
Type of sequence to be generated: ESTs
Direct sequence cost: $98,000
Total number of sequencing reads budgeted: 200,000
Total number of successful sequencing reads: 170,000
Anticipated sequence read length (in phred20 bases): 721
Anticipated paired end rate: 85%
Estimated cost of library preparation: $6,000
Estimated cost per phred20 base: $0.0008
Estimated cost per finished base: $0.0012
* JTC direct cost is $0.49 for random reads. Per lane cost to TIGR of $0.70 includes JTC indirect costs and is excluded from TIGR indirect costs – see budget justification

Analysis
A. Content of radish transcriptome and orthologous group identification
The EST sequences generated by the proposed study will provide a wealth of information on gene content in radish.
All the sequences will be cleaned, including trimming of vector and adaptor sequences, removal of all low-quality sequence and any contamination, and then will be assembled by a modified CAP3 program (Huang and Madan 1999) and clustered to generate a radish gene index or transcript assemblies (TAs) (Lee et al. 2005; Quackenbush et al. 2001; Quackenbush et al. 2000). We estimate based on our experience that the project should produce about 30,000 unique sequences, both tentative consensus sequences (TCs) and singletons. All TAs, including TCs and singletons, will be searched using the basic local alignment search tool (BLAST; Altschul et al. 1990) against the TIGR non-identical amino acid (niaa) database, which is made up of all proteins available from GenBank (http://www.ncbi.nlm.nih.gov), PIR (http://pir.georgetown.edu), SWISS-PROT (http://www.expasy.ch/sprot), and TIGR's CMR database, the Omniome (http://cmr.tigr.org). These searches will enable us to annotate all transcript assemblies, identify possible novel ones from radish, and discover whether crop and wild radish differ in their transcript assemblies. At the same time, this search will identify possible full-length cDNA sequences and untranslated regions (UTRs) by looking for the in-frame ATG position relative to the start codon of the matched protein. From our recent Medicago EST study, we estimate that at least 40% of our sequences will be full-length cDNA; these will constitute an invaluable resource for gene annotation, gene prediction and functional genomic studies (Alexandrov et al. 2006; Urbanek et al. 2005; Xiao et al. 2005). Since radish has an estimated genome size of 573 Mbp (Johnston et al. 2005), repetitive elements such as transposons likely constitute a large part of the radish genome, but transposable elements (TEs) have never been studied in this species.
To distinguish transcribed transposon sequences from radish genes, the sequences generated will be searched against a TIGR database of plant TE peptide sequences using BLASTX, which will identify the TE content of our radish ESTs, including class I (RNA) elements and class II (DNA) elements (Kuhl et al. 2004). The orientations of ESTs that match will be inspected to determine whether the ESTs were products of directionally cloned transcripts, genomic contamination, or read-through from neighboring retrotransposons (Elrouby and Bureau 2001). Orthologous groups will be identified using phylogeny-based approaches (Shiu et al. 2005). First, gene family clusters will be constructed by Markov Clustering (Van Dongen 2000) using annotated protein sequences from the reference species A. thaliana, poplar, and rice. Additional plant genome information, such as that for A. lyrata, Capsella, and Brassica species, will be incorporated as it becomes available. Phylogenetic trees of all family clusters will be constructed as in Shiu et al. (2006). All the TAs will be mapped to the tri-species gene family trees by identifying the best matches of each TA in the three reference species. Each gene family tree and its associated radish TA mapping information will then be superimposed onto the species tree of Arabidopsis, radish, poplar, and rice to identify orthologous groups based on maximum parsimony.

B. Data mining for the three classes of markers
The Raphanus ESTs will be mined to generate the three general classes of markers, in decreasing order of level of polymorphism and increasing level of transferability across species (see above): (a) SSR from 5’ UTR, (b) SSR from translated regions (EST-SSR), plus SNPs, CAPs, and dCAPs, and (c) intron-spanning markers. Below we outline how SSR and exons will be identified and how SNPs and some of the variation in SSRs can be uncovered from the Raphanus EST sequences.
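To make the SSR screening step of this section concrete, the sketch below scans a sequence for perfect di- to hexanucleotide repeats with a regular expression. It is a simplified stand-in for illustration only, not the MISA program itself; the function name and the minimum repeat counts are assumptions chosen for the example, and mononucleotide runs such as poly A/T tails are deliberately ignored.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, n_units) for each perfect SSR in seq.

    Coordinates are 0-based and end-exclusive. Motifs that are themselves
    repeats of a shorter unit (e.g. 'AAA' or 'ATAT') are skipped so that
    each repeat is reported once, under its shortest motif.
    """
    seq = seq.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        # One captured unit followed by at least n_min - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            is_compound = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound:
                continue
            n_units = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, n_units))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence containing an (AT)7 and a (GAA)5 repeat.
    example = "GGC" + "AT" * 7 + "CCGTC" + "GAA" * 5 + "TT"
    for hit in find_ssrs(example):
        print(hit)
```

Primers flanking each reported interval could then be designed in a separate step, as the text describes for Primer 3.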
Screening for further SSR variation as well as intron-length variation will be left for our future work or other investigators (all information below will be made publicly available). Transcript assemblies will be screened for simple sequence repeats (SSRs) using the MISA program (Thiel et al. 2003), which removes poly A/T tracts, identifies microsatellites, and finally, can design primers for experimental verification of the detected microsatellites using Primer 3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi). We will conduct an analysis similar to that of Lawson and Zhang (2006) on the radish cDNA sequence generated to compare the frequency of SSR among 3’UTR, 5’UTR, and exons. Although EST sequences will not contain intron-spanning variation, we can lay the groundwork for identifying it by identifying exons in radish ESTs. Based on the orthologous groups defined in the previous section, the putative orthologs in Arabidopsis, poplar, and rice for each radish EST and TA will be identified. Based on the alignments of ESTs to orthologous gene proteins, we will extract the translated sequences from ESTs. Each translated EST sequence will then be used to search against orthologous gene nucleotide sequences of the reference species. In cases where the protein-to-nucleotide alignment is interrupted by a gap longer than the pre-defined threshold for each reference species, the alignment breakpoints are regarded as exon boundaries. The threshold is defined as the number of base pairs that is smaller than 99% of the introns in a reference species. Sequence variation (SSR and SNPs) will be identified by comparing different TAs or ESTs. First, we will map all TAs to the annotated genes of Arabidopsis or poplar based on sequence similarity (> 80% identical, over 300 nucleotides aligned). In cases where multiple TAs are mapped to the same gene in Arabidopsis or poplar and the identities between these TAs are >= 90%, these TAs are regarded as potential variants.
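The mapping and grouping rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input layout (tuples of TA, gene, percent identity, aligned length), the pairwise identity lookup, and all identifiers are hypothetical, and "over 300 nucleotides aligned" is read here as an aligned length of at least 300.

```python
from collections import defaultdict
from itertools import combinations

MIN_MAP_IDENTITY = 80.0  # TA-to-gene percent identity must exceed this
MIN_MAP_LENGTH = 300     # assumed reading: at least 300 aligned nucleotides
MIN_TA_IDENTITY = 90.0   # TA-to-TA percent identity for potential variants


def group_potential_variants(hits, ta_identity):
    """Group TAs that map to the same reference gene into variant pairs.

    hits: iterable of (ta_id, gene_id, pct_identity, aligned_nt).
    ta_identity: dict mapping frozenset({ta_a, ta_b}) to percent identity.
    Returns a dict: gene_id -> list of (ta_a, ta_b) potential variant pairs.
    """
    tas_by_gene = defaultdict(set)
    for ta_id, gene_id, pct_identity, aligned_nt in hits:
        if pct_identity > MIN_MAP_IDENTITY and aligned_nt >= MIN_MAP_LENGTH:
            tas_by_gene[gene_id].add(ta_id)

    variants = {}
    for gene_id, tas in tas_by_gene.items():
        pairs = [
            (a, b)
            for a, b in combinations(sorted(tas), 2)
            if ta_identity.get(frozenset((a, b)), 0.0) >= MIN_TA_IDENTITY
        ]
        if pairs:
            variants[gene_id] = pairs
    return variants


if __name__ == "__main__":
    # Hypothetical mapping results for four TAs against two reference genes.
    hits = [
        ("TA1", "geneA", 95.0, 450),
        ("TA2", "geneA", 88.0, 320),
        ("TA3", "geneA", 79.0, 500),  # fails the identity filter
        ("TA4", "geneB", 92.0, 250),  # fails the length filter
    ]
    identity = {frozenset(("TA1", "TA2")): 93.5}
    print(group_potential_variants(hits, identity))
```

How many pairs such a grouping returns depends mainly on the 90% identity threshold applied between TAs.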
This threshold is chosen based on the sequence identity distribution of paralogs originating from the most recent whole genome duplication in the Brassica-Raphanus lineage (Town et al. 2006). If the TAs are mapped in tandem configuration in Arabidopsis or poplar, the associated TAs will be excluded since they may also represent tandemly duplicated paralogs. If the differences between two TAs are indels that overlap with introns, then they will be regarded as alternatively spliced variants and discarded as well. The remaining TAs form a number of “orthologous groups” with Arabidopsis, poplar, and rice protein genes as described above. In orthologous groups containing TAs from different species of Raphanus, sequence variation will be identified from alignments of each group. Sequencing errors will be checked by evaluating the quality value of the various base pairs from CAP3 assembly and the quality values from these bases of each component EST generated by TIGR sequencing. For TAs that do not map to Arabidopsis or poplar genes, single linkage clusters of TAs will be generated with an identity threshold of 90% and an alignment length threshold of 300 bp; each cluster is regarded as an orthologous group. Differences between libraries will be regarded as distinct variants only if >80% of the TAs within each library have the same nucleotide. While the between TA approach can identify rapidly accumulated sequence variation between Raphanus species, the relatively low identity threshold for transcript assembly precludes the identification of relatively subtle differences between the sequenced libraries. Therefore, we will map each EST to the reference species, identify ESTs in the same orthologous group but from different libraries, and identify variations among species if >80% of the ESTs within each library have the same indel or substitution. Sequencing error will be evaluated by checking quality value of these bases as described above. C. 
Gene gain/loss inference and lineage-specific selection
Gene duplications and losses will be identified by the reconciled tree approach, in which gene family trees constructed in section A will be superimposed on the species tree (Page and Charleston 1997). The results will provide information on gene gain and loss events that occurred in the Arabidopsis lineage after its divergence from the Raphanus-Brassica lineage. The phylogenetic trees generated will also provide the framework for comparison of evolutionary rates in the Arabidopsis and Raphanus-Brassica lineages. For each orthologous group tree containing Raphanus, Arabidopsis, and poplar sequences, the number of synonymous (ds) and non-synonymous (dn) substitutions in each branch will be estimated using PAML (Yang 1997) and RateEstimator (Hanada and Shiu, unpublished). Using poplar sequence as an outgroup, significant differences in dn/ds will be the criterion for detecting lineage-specific selection. Genes currently or recently experiencing positive selection will have a dn/ds value significantly greater than one; we will use this criterion to identify positively selected genes in radish. In this framework, we will identify genes that are experiencing common selection pressure among the Brassicaceae species analyzed as well as genes subject to lineage-specific selection. Since two related species will be sequenced, we are particularly interested in identifying genes with contrasting selection regimes between species. In the cultivated radish, this will identify candidate domestication genes. Similarly, genes under positive selection in weedy radish are possible contributors to their success as weeds. Finally, to see if genes in outbred plants experience positive selection at the same frequency as genes in inbred plants, we will determine the sequence polymorphism and variation in Raphanus as outlined in section B to estimate the number of positively selected genes.
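The dn/ds criterion in section C can be illustrated with a short Python sketch. The text does not say which significance test will be used; the example assumes a likelihood-ratio test between a null model with dn/ds fixed at one and an alternative model with dn/ds free, with twice the log-likelihood difference compared against a chi-square distribution with one degree of freedom. The function names and the numeric inputs are hypothetical stand-ins for values that a program such as PAML would report.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates (dn/ds)."""
    if ds <= 0:
        # No synonymous change: the ratio is undefined or unbounded.
        return math.inf if dn > 0 else math.nan
    return dn / ds


def lrt_pvalue_1df(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood-ratio test with one degree of freedom.

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def is_candidate_positive_selection(dn, ds, lnl_null, lnl_alt, alpha=0.05):
    """Flag a branch when dn/ds > 1 and the free-ratio model fits better."""
    return omega(dn, ds) > 1.0 and lrt_pvalue_1df(lnl_null, lnl_alt) < alpha


if __name__ == "__main__":
    # Hypothetical branch estimates and model log-likelihoods.
    print(omega(0.12, 0.04))
    print(lrt_pvalue_1df(-1000.0, -995.0))
    print(is_candidate_positive_selection(0.12, 0.04, -1000.0, -995.0))
```

Because thousands of orthologous groups would be tested, a correction for multiple testing would normally be applied to the resulting p-values before declaring genes positively selected.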
Utility of the sequence generated to the broader community In future proposals we will use the sequence generated to produce a radish linkage map using at least 200 highly polymorphic SSR markers for use in radish QTL studies. We will also screen these markers, as well as more conserved markers if necessary (which would also be added to the Raphanus map), in Brassica oleracea and Arabidopsis for comparative mapping. We will further use the cDNA sequences to develop radish microarrays for gene expression studies. These tools will likely attract additional researchers to radish, as well as enable current radish research to take the next key steps, such as: •Determine QTL underlying interactions between radish and its environment, particularly herbivores, pollinators, and human-induced global environmental changes such as temperature and CO2. •Measure selection on individual QTL through differences in both male and female fitness. •Study induced defensive responses to herbivory, as well as phenotypic plasticity in general, at the mechanistic level in terms of differences in gene expression between different environmental conditions. •Uncover the genetic changes that have led to the evolution of the agricultural pest ecotype in R. raphanistrum, the California invasive hybrid populations, and the potential for transgene escape from crop to weedy radish. Plan to Integrate Research and Education: The two graduate students at MSU will spend time working on the project in both MSU labs; thus, Shiu’s student will learn about evolutionary ecology and ecological genetics in Conner’s lab, and conversely, Conner’s student will learn about genomics and bioinformatics in Shiu’s lab. In addition, both MSU grads will travel together twice/year to TIGR, to participate and learn about high throughput sequencing and the databases and bioinformatics analysis programs that TIGR develops and uses. 
This cross-cutting interdisciplinary training is unusual for students in both evolutionary ecology and bioinformatics, and will help them to be more innovative and multidisciplinary in their future work. The project will also involve participation by high school students, undergraduates, and K-12 teachers. The PIs will actively recruit underrepresented minorities and women, and provide opportunities for authorship on papers for all participants. Conner has been successful in these endeavors in the past: over 70% of the more than 150 undergraduates who have done research in Conner’s lab have been women, and Conner has mentored research projects by two African Americans (one a woman), two Latinas, and one female Pacific Islander. Conner currently has an NSF minority postdoctoral fellow, and the Conner lab will continue to sponsor high school interns from the Battle Creek and Kalamazoo Math and Science Centers. Conner is a co-PI on the NSF-funded GK-12 project at KBS (http://www.kbs.msu.edu/GK-12/Index.htm; nsf.gov/funding/pgm_summ.jsp?pims_id=5472&org=DGE&from=fund), which pairs graduate students in ecology and evolution with teachers in K-12 classrooms. One of the goals is to help the teachers develop more inquiry-based activities in the classroom; to this end, we will work with the teachers and graduate fellows to develop at least one classroom unit on the use of genetic tools in ecology and environmental science, with an emphasis on environmental issues related to this proposal such as the evolution of weedy and invasive plants and adaptation to global environmental change. We also plan to submit an application for at least one NSF RET supplement if this proposal is funded, so that one or more teachers can get first-hand research experience working on this project. KBS has a number of involved teachers from over a dozen local school districts to draw from, and has hosted several RET supplement projects over the last few years.
Shiu has established a collaboration with the East Lansing Public Library (ELPL) to create an outreach program focused on the process of science, facts on evolution, and the prospects of genomics. ELPL has extensive experience in hosting outreach programs for all age groups and in attracting a broad audience in central Michigan. Since all current programs in ELPL focus on literature, theater, and fine arts, the science program will be a unique opportunity to educate the public about science, evolution and genomics, fulfilling the NSF’s goal of broad dissemination to enhance scientific and technological understanding.

References cited
Agrawal, A. A. 1998. Induced responses to herbivory and increased plant performance. Science 279:1201-1202. Agrawal, A. A. 1999. Induced responses to herbivory in wild radish: Effects on several herbivores and plant fitness. Ecology 80:1713-1723. Agrawal, A. A. 2001. Transgenerational consequences of plant responses to herbivory: An adaptive maternal effect? The American Naturalist 157:555-569. Agrawal, A. A., J. K. Conner, M. T. J. Johnson, and R. Wallsgrove. 2002. Ecological genetics of an induced plant defense against herbivores: additive genetic variance and costs of phenotypic plasticity. Evolution 56:2206-2213. Agrawal, A. A., J. K. Conner, and J. R. Stinchcombe. 2004. Evolution of plant resistance and tolerance to frost damage. Ecology Letters 7:1199–1208. Agrawal, A. A., C. Laforsch, and R. Tollrian. 1999a. Transgenerational induction of defenses in animals and plants. Nature 401:60-63. Agrawal, A. A., S. Y. Strauss, and M. J. Stout. 1999b. Costs of induced responses and tolerance to herbivory in male and female fitness components of wild radish. Evolution 53:1093-1104. Alexandrov, N. N., M. E. Troukhan, V. V. Brover, T. Tatarinova, R. B. Flavell, and K. A. Feldmann. 2006. Features of Arabidopsis genes and genome discovered using full-length cDNAs. Plant Mol Biol 60:69-85. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J.
Lipman. 1990. Basic local alignment search tool. J Mol Biol 215:403-10. Arnold, S. J. 1992. Constraints on phenotypic evolution. Am. Nat. 140:S85-S107. Ayele, M., B. J. Haas, N. Kumar, H. Wu, Y. L. Xiao, S. Van Aken, T. R. Utterback, J. R. Wortman, O. R. White, and C. D. Town. 2005. Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis. Genome Research 15:487-495. Beilstein, M. A., I. A. Al-Shehbaz, and E. A. Kellogg. 2006. Brassicaceae phylogeny and trichome evolution. American Journal of Botany 93:607-619. Bertone, P., V. Stolc, T. E. Royce, J. S. Rozowsky, A. E. Urban, X. Zhu, J. L. Rinn, W. Tongprasit, M. Samanta, S. Weissman, M. Gerstein, and M. Snyder. 2004. Global identification of human transcribed sequences with genome tiling arrays. Science 306:2242-6. Bett, K. E., and D. J. Lydiate. 2003. Genetic analysis and genome mapping in Raphanus. Genome 46:423-430. Blanc, G., and K. H. Wolfe. 2004. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16:1679-91. Bradshaw, H. D., Jr., K. G. Otto, B. E. Frewen, J. K. McKay, and D. W. Schemske. 1998. Quantitative trait loci affecting differences in floral morphology between two species of monkeyflower (Mimulus). Genetics 149:367-382. Bradshaw, H. D., Jr., S. M. Wilbert, K. G. Otto, and D. W. Schemske. 1995. Genetic mapping of floral traits associated with reproductive isolation in monkeyflowers (Mimulus). Nature 376:762-765. Bradshaw, H. D., and D. W. Schemske. 2003. Allele substitution at a flower colour locus produces a pollinator shift in monkeyflowers. Nature 426:176-178. Brent, M. R., and R. Guigo. 2004. Recent advances in gene structure prediction. Curr Opin Struct Biol 14:264-72. Bustamante, C. D., A. Fledel-Alon, S. Williamson, R. Nielsen, M. T. Hubisz, S. Glanowski, D. M. Tanenbaum, T. J. White, J. J. Sninsky, R. D. Hernandez, D. Civello, M. D. Adams, M. Cargill, and A. G. Clark. 2005. 
Natural selection on protein-coding genes in the human genome. Nature 437:1153-7. Bustamante, C. D., R. Nielsen, S. A. Sawyer, K. M. Olsen, M. D. Purugganan, and D. L. Hartl. 2002. The cost of inbreeding in Arabidopsis. Nature 416:531-4. Choi, H. K., D. Kim, T. Uhm, E. Limpens, H. Lim, J. H. Mun, P. Kalo, R. V. Penmetsa, A. Seres, O. Kulikova, B. A. Roe, T. Bisseling, G. B. Kiss, and D. R. Cook. 2004. A sequence-based genetic map of Medicago truncatula and comparison of marker colinearity with M. sativa. Genetics 166:1463-1502. Clark, A. G. 1987. Genetic correlations: The quantitative genetics of evolutionary constraints. Pp. 25-45 in V. Loeschcke, ed. Genetic Constraints on Adaptive Evolution. Springer-Verlag, Berlin. Clark, A. G., S. Glanowski, R. Nielsen, P. D. Thomas, A. Kejariwal, M. A. Todd, D. M. Tanenbaum, D. Civello, F. Lu, B. Murphy, S. Ferriera, G. Wang, X. G. Zheng, T. J. White, J. J. Sninsky, M. D. Adams, and M. Cargill. 2003. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960-1963. Conner, J., and S. Via. 1993. Patterns of phenotypic and genetic correlations among morphological and life history traits in wild radish, Raphanus raphanistrum. Evolution 47:704-711. Conner, J. K. 2002. Genetic mechanisms of floral trait correlations in a natural population. Nature 420:407-410. Conner, J. K., R. Davis, and S. Rush. 1995. The effect of wild radish floral morphology on pollination efficiency by four taxa of pollinators. Oecologia 104:234-245. Conner, J. K., R. Franks, and C. Stewart. 2003a. Expression of additive genetic variances and covariances for wild radish floral traits: comparison between field and greenhouse environments. Evolution 57:487-495. Conner, J. K., A. M. Rice, C. Stewart, and M. T. Morgan. 2003b. Patterns and mechanisms of selection on a family-diagnostic trait: Evidence from experimental manipulation and lifetime fitness selection gradients. Evolution 57:480-486. Conner, J. K., and S. Rush. 1996.
Effects of flower size and number on pollinator visitation to wild radish, Raphanus raphanistrum. Oecologia 105:509-516. Conner, J. K., S. Rush, and P. Jennetten. 1996a. Measurements of natural selection on floral traits in wild radish (Raphanus raphanistrum). I. Selection through lifetime female fitness. Evolution 50:1127-1136. Conner, J. K., S. Rush, S. Kercher, and P. Jennetten. 1996b. Measurements of natural selection on floral traits in wild radish (Raphanus raphanistrum). II. Selection through lifetime male and total fitness. Evolution 50:1137-1146. Devlin, B., J. Clegg, and N. C. Ellstrand. 1992. The effect of flower production on male reproductive success in wild radish populations. Evolution 46:1030-1042. Devlin, B., and N. C. Ellstrand. 1990. Male and female fertility variation in wild radish, a hermaphrodite. Am. Nat. 136:87-107. Ellstrand, N. C., and D. L. Marshall. 1986. Patterns of multiple paternity in populations of Raphanus sativus. Evolution 40:837-842. Elrouby, N., and T. E. Bureau. 2001. A novel hybrid open reading frame formed by multiple cellular gene transductions by a plant long terminal repeat retroelement. J Biol Chem 276:41963-8. Eujayl, I., M. K. Sledge, L. Wang, G. D. May, K. Chekhovskiy, J. C. Zwonitzer, and M. A. R. Mian. 2004. Medicago truncatula EST-SSRs reveal cross-species genetic markers for Medicago spp. Theoretical and Applied Genetics 108:414-422. Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2002. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024-6. Fondon, J. W., 3rd, and H. R. Garner. 2004. Molecular origins of rapid and continuous morphological evolution. Proc Natl Acad Sci U S A 101:18058-63. Gould, S. J., and R. C. Lewontin. 1979. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. Lond. B 205:581-598. Hegde, S. G., J. D. Nason, J. M. Clegg, and N. C. Ellstrand. 2006. 
The evolution of California's wild radish has resulted in the extinction of its progenitors. Evolution 60:, in press. Holm, L., J. Doll, E. Holm, J. Pancho, and J. Herberger. 1997. World Weeds. Natural Histories and Distribution. Wiley, New York. Huang, X., and A. Madan. 1999. CAP3: A DNA sequence assembly program. Genome Res 9:868-77. Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-9. Irwin, R. E., and S. Y. Strauss. 2005. Flower color microevolution in wild radish: Evolutionary response to pollinator-mediated selection. American Naturalist 165:225-237. Irwin, R. E., S. Y. Strauss, S. Storz, A. Emerson, and G. Guibert. 2003. The role of herbivores in the maintenance of a flower color polymorphism in wild radish. Ecology 84:1733-1743. Johnston, J. S., A. E. Pepper, A. E. Hall, Z. J. Chen, G. Hodnett, J. Drabek, R. Lopez, and H. J. Price. 2005. Evolution of genome size in Brassicaceae. Annals of Botany 95:229-235. Kantety, R. V., M. La Rota, D. E. Matthews, and M. E. Sorrells. 2002. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Molecular Biology 48:501-510. Kapranov, P., S. E. Cawley, J. Drenkow, S. Bekiranov, R. L. Strausberg, S. P. Fodor, and T. R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296:916-9. Karlin, S., and C. Burge. 1996. Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development. Proc Natl Acad Sci U S A 93:1560-5. Karron, J. D., and D. L. Marshall. 1990. Fitness consequences of multiple paternity in wild radish, Raphanus sativus. Evolution 44:260-268. Karron, J. D., and D. L. Marshall. 1993. Effects of environmental variation on fitness of singly and multiply sired progenies of Raphanus sativus (Brassicaceae). Am. J. of Botany 80:1407-1412. Kay, Q. O. N. 1976. 
Preferential pollination of yellow-flowered morphs of Raphanus raphanistrum by Pieris and Eristalis spp. Nature 261:230-232. Kay, Q. O. N. 1978. The role of preferential and assortative pollination in the maintenance of flower colour polymorphisms. Pp. 175-190 in A. J. Richards, ed. The Pollination of Flowers by Insects. Academic Press, New York. Kay, Q. O. N. 1982. Intraspecific discrimination by pollinators and its role in evolution. Pp. 9-28 in J. A. Armstrong, J. M. Powell and A. J. Richards, eds. Pollination and Evolution. Royal Botanic Gardens, Sydney. Korf, I., P. Flicek, D. Duan, and M. R. Brent. 2001. Integrating genomic homology into gene structure prediction. Bioinformatics 17 Suppl 1:S140-8. Kuhl, J. C., F. Cheung, Q. Yuan, W. Martin, Y. Zewdie, J. McCallum, A. Catanach, P. Rutherford, K. C. Sink, M. Jenderek, J. P. Prince, C. D. Town, and M. J. Havey. 2004. A unique set of 11,008 onion expressed sequence tags reveals expressed sequence and genomic differences between the monocot orders Asparagales and Poales. Plant Cell 16:114-25. Lawson, M. J., and L. Zhang. 2006. Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biology 7:R14. Lee, Y., J. Tsai, S. Sunkara, S. Karamycheva, G. Pertea, R. Sultana, V. Antonescu, A. Chan, F. Cheung, and J. Quackenbush. 2005. The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res 33:D71-4. Lehtila, K., and S. Y. Strauss. 1999. Effects of foliar herbivory on male and female reproductive traits of wild radish, Raphanus raphanistrum. Ecology 80:116-124. Li, L., X. Wang, V. Stolc, X. Li, D. Zhang, N. Su, W. Tongprasit, S. Li, Z. Cheng, J. Wang, and X. W. Deng. 2006. Genome-wide transcription analyses in rice using tiling microarrays. Nat Genet 38:124-9. Li, Y. C., A. B. Korol, T. Fahima, and E. Nevo. 2004. Microsatellites within genes: structure, function, and evolution. Mol Biol Evol 21:991-1007. Marshall, D. L. 
1988. Postpollination effects on seed paternity: Mechanisms in addition to microgametophyte competition operate in wild radish. Evolution 42:1256-1266. Marshall, D. L. 1998. Pollen donor performance can be consistent across maternal plants in wild radish (Raphanus sativus, Brassicaceae): a necessary condition for the action of sexual selection. American Journal of Botany 85:1389-1397. Marshall, D. L., J. J. Avritt, M. Shaner, and R. L. Saunders. 2000. Effects of pollen load size and composition on pollen donor performance in wild radish, Raphanus sativus (Brassicaceae). Amer. J. Bot. 87:1619-1627. Marshall, D. L., and P. K. Diggle. 2001. Mechanisms of differential pollen donor performance in wild radish, Raphanus sativus (Brassicaceae). Amer. J. Bot. 88:242-257. Marshall, D. L., and N. C. Ellstrand. 1985. Proximal causes of multiple paternity in wild radish, Raphanus sativus. Am. Nat. 126:596-605. Marshall, D. L., and N. C. Ellstrand. 1986. Sexual selection in Raphanus sativus: experimental data on nonrandom fertilization, maternal choice, and consequences of multiple paternity. Am. Nat. 127:446-461. Marshall, D. L., and N. C. Ellstrand. 1988. Effective mate choice in wild radish: Evidence for selective seed abortion and its mechanism. Am. Nat. 131:739-756. Marshall, D. L., and N. C. Ellstrand. 1989. Regulation of mate number in fruits of wild radish. Am. Nat. 133:751-765. Marshall, D. L., and M. W. Folsom. 1992. Mechanisms of nonrandom mating in wild radish. Pp. 91-118 in R. Wyatt, ed. Ecology and Evolution of Plant Reproduction. Routledge, Chapman & Hall, Inc., NY. Marshall, D. L., M. W. Folsom, C. Hatfield, and T. Bennett. 1996. Does interference competition among pollen grains occur in wild radish? Evolution 50:1842-1848. Marshall, D. L., and O. S. Fuller. 1994. Does nonrandom mating among wild radish plants occur in the field as well as in the greenhouse? Am. J. of Botany 81:439-445. Marshall, D. L., and D. M. Oliveras. 2001. 
Does differential seed siring success change over time or with pollination history in wild radish, Raphanus sativus (Brassicaceae)? American Journal of Botany 88:2232-2242. Marshall, D. L., and K. L. Whittaker. 1989. Effects of pollen donor identity on offspring quality in wild radish, Raphanus sativus. Amer. J. Bot. 76:1081-1088. Mauricio, R., M. D. Bowers, and F. A. Bazzaz. 1993. Pattern of leaf damage affects fitness of the annual plant Raphanus sativus (Brassicaceae). Ecology 74:2066-2071. Maynard Smith, J., R. Burian, S. Kauffman, P. Alberch, J. Campbell, B. Goodwin, R. Lande, D. Raup, and L. Wolpert. 1985. Developmental constraints and evolution. Quart. Rev. Biol. 60:265-287. Mazer, S. J., and C. T. Schick. 1991. Constancy of population parameters for life-history and floral traits in Raphanus sativus L. II. Effects of planting density on phenotype and heritability estimates. Evolution 45:1888-1907. McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-4. Meloni, R., V. Albanese, P. Ravassard, F. Treilhou, and J. Mallet. 1998. A tetranucleotide polymorphic microsatellite, located in the first intron of the tyrosine hydroxylase gene, acts as a transcription regulatory element in vitro. Hum Mol Genet 7:423-8. Meyers, B. C., F. F. Souret, C. Lu, and P. J. Green. 2006. Sweating the small stuff: microRNA discovery in plants. Curr Opin Biotechnol 17:139-46. Morgan, M. T., and J. K. Conner. 2001. Using genetic markers to directly estimate male selection gradients. Evolution 55:272-281. Nakamura, R. R., and M. L. Stanton. 1989. Embryo growth and seed size in Raphanus sativus: Maternal and paternal effects in vivo and in vitro. Evolution 43:1435-1443. Page, R. D., and M. A. Charleston. 1997. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol Phylogenet Evol 7:231-40. Quackenbush, J., J. Cho, D. Lee, F. Liang, I. Holt, S. Karamycheva, B. Parvizi, G. Pertea, R. 
Sultana, and J. White. 2001. The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res 29:159-64. Quackenbush, J., F. Liang, I. Holt, G. Pertea, and J. Upton. 2000. The TIGR gene indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Res 28:141-5. Sawyer, S. A., and D. L. Hartl. 1992. Population genetics of polymorphism and divergence. Genetics 132:1161-76. Sawyer, S. A., R. J. Kulathinal, C. D. Bustamante, and D. L. Hartl. 2003. Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection. J Mol Evol 57 Suppl 1:S154-64. Schemske, D. W., and H. D. Bradshaw. 1999. Pollinator preference and the evolution of floral traits in monkeyflowers (Mimulus). Proc. Natl. Acad. Sci. 96:11910-11915. Shagin, D. A., D. V. Rebrikov, V. B. Kozhemyako, I. M. Altshuler, A. S. Shcheglov, P. A. Zhulidov, E. A. Bogdanova, D. B. Staroverov, V. A. Rasskazov, and S. Lukyanov. 2002. A novel method for SNP detection using a new duplex-specific nuclease from crab hepatopancreas. Genome Res 12:1935-42. Shiu, S.-H., M.-C. Shih, and W. H. Li. 2005. Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol, in press. Shiu, S. H., J. K. Byrnes, R. Pan, P. Zhang, and W. H. Li. 2006. Role of positive selection in the retention of duplicate genes in mammalian genomes. Proc Natl Acad Sci USA 103:2232-6. Siepel, A., and D. Haussler. 2004. Computational identification of evolutionarily conserved exons. Pp. 177-186. RECOMB 2004: Proceedings of the Eighth Annual International Conference on Computational Molecular Biology: 2004 March 27–31; San Diego. ACM Press, New York. Snow, A., and S. J. Mazer. 1988. Gametophytic selection in Raphanus raphanistrum: A test for heritable variation in pollen competitive ability. Evolution 42:1065-1075. Snow, A. A. 1990. 
Effects of pollen-load size and number of donors on sporophyte fitness in wild radish (Raphanus raphanistrum). Am. Nat. 136:742-758. Snow, A. A., and L. G. Campbell. 2005. Can feral radishes become weeds? Pp. 193-207 in J. Gressel, ed. Crop Ferality and Volunteerism. Taylor and Francis, Boca Raton. Snow, A. A., and M. L. Stanton. 1988. Aphids limit fecundity of a weedy annual (Raphanus sativus). Amer. J. Bot. 75:589-593. Snow, A. A., K. L. Uthus, and T. M. Culley. 2001. Fitness of hybrids between weedy and cultivated radish: Implications for weed evolution. Ecological Applications 11:934-943. Stanton, M., and H. J. Young. 1994. Selecting for floral character associations in wild radish, Raphanus sativus L. J. Evol. Biol. 7:271-285. Stanton, M. L. 1984a. Developmental and genetic sources of seed weight variation in Raphanus raphanistrum L. (Brassicaceae). Amer. J. Bot. 71:1090-1098. Stanton, M. L. 1984b. Seed variation in wild radish: effect of seed size on components of seedling and adult fitness. Ecology 65:1105-1112. Stanton, M. L. 1985. Seed size and emergence time within a stand of wild radish (Raphanus raphanistrum L.): the establishment of a fitness hierarchy. Oecologia 67:524-531. Stanton, M. L. 1987. Reproductive biology of petal color variants in wild populations of Raphanus sativus: I. Pollinator response to color morphs. Amer. J. Bot. 74:178-187. Stanton, M. L., A. A. Snow, and S. N. Handel. 1986. Floral evolution: attractiveness to pollinators increases male fitness. Science 232:1625-1627. Stanton, M. L., A. A. Snow, S. N. Handel, and J. Bereczky. 1989. The impact of a flower-color polymorphism on mating patterns in experimental populations of wild radish (Raphanus raphanistrum L.). Evolution 43:335-346. Stanton, M. L., H. J. Young, N. C. Ellstrand, and J. M. Clegg. 1991. Consequences of floral variation for male and female reproduction in experimental populations of wild radish, Raphanus sativus L. Evolution 45:268-280. Stolc, V., Z. Gauhar, C. 
Mason, G. Halasz, M. F. van Batenburg, S. A. Rifkin, S. Hua, T. Herreman, W. Tongprasit, P. E. Barbano, H. J. Bussemaker, and K. P. White. 2004. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 306:655-60. Stolc, V., M. P. Samanta, W. Tongprasit, H. Sethi, S. Liang, D. C. Nelson, A. Hegeman, C. Nelson, D. Rancour, S. Bednarek, E. L. Ulrich, Q. Zhao, R. L. Wrobel, C. S. Newman, B. G. Fox, G. N. Phillips, Jr., J. L. Markley, and M. R. Sussman. 2005. Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc Natl Acad Sci U S A 102:4453-8. Thiel, T., W. Michalek, R. K. Varshney, and A. Graner. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical and Applied Genetics 106:411-422. Town, C. D., F. Cheung, R. Maiti, J. Crabtree, B. J. Haas, J. R. Wortman, E. E. Hine, R. Althoff, T. S. Arbogast, L. J. Tallon, M. Vigouroux, M. Trick, and I. Bancroft. 2006. Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell 18:1348-1359. Underwood, B. A., R. Vanderhaeghen, R. Whitford, C. D. Town, and P. Hilson. 2006. Simultaneous high-throughput recombinational cloning of open reading frames in closed and open configurations. Plant Biotechnology Journal 4:317-324. Urbanek, P., J. Paces, and V. Paces. 2005. An approach towards experimental cDNA sequence determination of predicted genes: an example from Arabidopsis U3-55k homologues. Gene 358:67-72. Van Dongen, S. M. 2000. Graph clustering by flow simulation. Pp. 169. University of Utrecht. Warwick, S. I., and L. D. Black. 1991. Molecular systematics of Brassica and allied genera (Subtribe Brassicinae, Brassiceae) -- chloroplast genome and cytodeme congruence. Theoretical and Applied Genetics 82:81-92. Wendel, J. F. 2000. 
Genome evolution in polyploids. Plant Mol Biol 42:225-49. Whitton, J., L. H. Rieseberg, and M. C. Ungerer. 1997. Microsatellite loci are not conserved across the Asteraceae. Molecular Biology and Evolution 14:204-209. Williams, P. H., and C. B. Hill. 1986. Rapid-cycling populations of Brassica. Science 232:1385-1389. Xiao, Y. L., S. R. Smith, N. Ishmael, J. C. Redman, N. Kumar, E. L. Monaghan, M. Ayele, B. J. Haas, H. C. Wu, and C. D. Town. 2005. Analysis of the cDNAs of hypothetical genes on Arabidopsis chromosome 2 reveals numerous transcript variants. Plant Physiol 139:1323-37. Yamada, K., J. Lim, J. M. Dale, H. Chen, P. Shinn, C. J. Palm, A. M. Southwick, H. C. Wu, C. Kim, M. Nguyen, P. Pham, R. Cheuk, G. Karlin-Newmann, S. X. Liu, B. Lam, H. Sakano, T. Wu, G. Yu, M. Miranda, H. L. Quach, M. Tripp, C. H. Chang, J. M. Lee, M. Toriumi, M. M. Chan, C. C. Tang, C. S. Onodera, J. M. Deng, K. Akiyama, Y. Ansari, T. Arakawa, J. Banh, F. Banno, L. Bowser, S. Brooks, P. Carninci, Q. Chao, N. Choy, A. Enju, A. D. Goldsmith, M. Gurjal, N. F. Hansen, Y. Hayashizaki, C. Johnson-Hopson, V. W. Hsuan, K. Iida, M. Karnes, S. Khan, E. Koesema, J. Ishida, P. X. Jiang, T. Jones, J. Kawai, A. Kamiya, C. Meyers, M. Nakajima, M. Narusaka, M. Seki, T. Sakurai, M. Satou, R. Tamse, M. Vaysberg, E. K. Wallender, C. Wong, Y. Yamamura, S. Yuan, K. Shinozaki, R. W. Davis, A. Theologis, and J. R. Ecker. 2003. Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302:842-6. Yang, T. J., J. S. Kim, K. B. Lim, S. J. Kwon, J. A. Kim, M. Jin, J. Y. Park, M. H. Lim, H. I. Kim, S. H. Kim, Y. P. Lim, and B. S. Park. 2005. The Korea Brassica Genome Project: A glimpse of the Brassica genome based on comparative genome analysis with Arabidopsis. Comparative and Functional Genomics 6:138-146. Yang, Y. W., K. N. Lai, P. Y. Tai, D. P. Ma, and W. H. Li. 1999. 
Molecular phylogenetic studies of Brassica, Rorippa, Arabidopsis and allied genera based on the internal transcribed spacer region of 18S-25S rDNA. Molecular Phylogenetics and Evolution 13:455-462. Yang, Y. W., P. Y. Tai, Y. Chen, and W. H. Li. 2002. A study of the phylogeny of Brassica rapa, B. nigra, Raphanus sativus, and their related genera using noncoding regions of chloroplast DNA. Molecular Phylogenetics and Evolution 23:268-275. Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555-6. Young, H. J., and M. L. Stanton. 1990. Influence of environmental quality on pollen competitive ability in wild radish. Science 248:1631-1633. Zhu, Y. Y., E. M. Machleder, A. Chenchik, R. Li, and P. D. Siebert. 2001. Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques 30:892-7.