PROJECT DESCRIPTION: Association mapping of a candidate domestication gene in Physalis philadelphica (Solanaceae)

Prior NSF Support

DBI-0110069, “Genomic Analysis of Water Use Efficiency”, 9/1/01-8/31/05, $341,185 (subcontract to UNC-CH). On this award, TJV is a key collaborator and the PIs include J Comstock (project leader), S McCouch and B Martin. The goals are to map quantitative trait loci for water use efficiency (WUE) in intra- and inter-specific crosses of tomato and rice using the relative abundance of stable carbon isotopes as a proxy for WUE. To date, key environmental parameters for control of the phenotype have been determined, screens of WUE have been made in a large number of potential parental lines, and initial whole-genome scans have been carried out in multiple permanent mapping populations of both species. Current work focuses on (1) verification of initial candidate QTL to pursue for fine-mapping and construction of near-isogenic lines and (2) the development of markers and genetic material to achieve those ends. Publications describing the findings of the first year are currently in preparation. Several student and teacher interns have participated in the project so far. The web site has primers on the science behind the project, progress updates on scientific and outreach goals, announcements about upcoming outreach activities, and more: http://isotope.bti.cornell.edu.

DBI-0227314, “Tools for Plant Comparative Genomics”, 9/1/02-8/31/07, $992,655. On this award, TJV is the principal investigator. The goals are to integrate a number of inter-related problems in comparative genomics, including gene family evolution, gene order evolution, and the divergence of expression patterns among duplicated genes. Analyses of comprehensive sequence and map datasets from a phylogenetically diverse set of publicly available plant genome projects are being made available from an interactive web database.
An inquiry-based learning module in genome science is being developed for secondary school students in collaboration with professional educators. In the first few months of the project, we have made progress building a code base for the database and curating pilot data. We have improved upon computational and statistical methods for comparative mapping (manuscript submitted to the 11th Annual Conference on Intelligent Systems in Molecular Biology, authors: P Calabrese, S Chakravarty and T Vision) and are presently preparing two manuscripts: one on a novel algorithm for inference of ancestral gene order (authors: J Huan, J Prins, W Wang and T Vision) and one on an applied study of evolution in the Arabidopsis Aux/IAA and ARF gene families (authors: D Remington, T Vision and J Reed).

Introduction

Our major crop plants have been greatly modified from their wild ancestors through the process of selection known as domestication. Changes seen under domestication include both those that directly serve human needs and preferences (e.g. increased allocation to edible parts) and those that provide a presumed competitive advantage under cultivation (e.g. change of growth habit, lack of seed dormancy, reduced seed shattering). Independent domestication of different crops of the same family or even of the same genus has sometimes resulted in similar phenotypic changes among closely related taxa. Particularly striking examples of convergent domestication have been documented in the families Poaceae and Solanaceae. Interestingly, in these two families, some domestication traits appear to be underlain, at least in part, by genetic variation at homologous loci.

The independently domesticated cereal grains sorghum, rice and maize (in the Poaceae) share numerous domestication traits, among them larger grain size, reduced disarticulation of mature inflorescences (shattering), and day-length insensitivity of flowering.
Quantitative trait locus (QTL) mapping, coupled with the application of homologous genetic markers, has revealed that many of the loci responsible for the quantitative variation in these three traits map to homologous regions in the three species (Paterson et al. 1995). Such loci have been dubbed ‘homologous QTLs’.

In the Solanaceae, the independent domestication of several different crops (including tomato, pepper, eggplant and tomatillo) has resulted in extreme increases in fruit weight relative to the nearest wild relatives (as much as 500-fold). In tomato, over twelve QTL associated with the difference in fruit weight between wild and domesticated species have been identified (Grandillo et al. 1999). The most important of these QTL, fw2.2, is responsible for approximately 30% of the mean difference in fruit weight (averaged over crosses involving different wild relatives). The gene responsible for this effect, ORFX, has recently been cloned and characterized (Alpert et al. 1995; Frary et al. 2000). Major QTL affecting fruit traits in pepper (Ben Chaim et al. 2001) and eggplant (Doganlar et al. 2002) cosegregate with ORFX, suggesting that this one gene may be a major fruit weight QTL in all three species. Three other fruit weight QTL show homology between tomato and either pepper or eggplant (Table 1). Similar patterns have been observed in other traits; not only are many of the same loci responsible for segregating variation in these different studies, but their effect sizes and dominance relationships are also similar (Ben Chaim et al. 2001, Doganlar et al. 2002).

Table 1. Locus names and QTL effect sizes for putative homologous QTLs from tomato (Eshed and Zamir 1995), pepper (Ben Chaim et al. 2001) and eggplant (Doganlar et al. 2002). %C is percent difference of homozygous introgression relative to control. %PVE is percent phenotypic variance explained. Entries marked by a period (.) indicate no significant QTL at the homologous location. Each column is one set of putatively homologous QTL.

Tomato     Locus    fw2.2    fw3.1    fw9.2    fw11.1
           %C       20       13       9        10
Pepper     Locus    fw2.1    fw3.2    .        .
           %PVE     10       15       .        .
Eggplant   Locus    fw2.1    .        fw9.1    fw11.1
           %PVE     23       .        33       19

This pattern poses an evolutionary puzzle, since the identities and relative contributions of loci that contribute to variation in a highly polygenic trait are expected to be more idiosyncratic. Under certain conditions, uniform selection even on two samples from the same population will result in greater divergence in allele frequencies at QTL than under random drift alone (Cohan 1984, but see Lynch 1986). Tomato, pepper and eggplant had undergone tens of millions of years of divergence prior to the recent (<10,000 yr old) application of artificial selection on fruit size. In such species, which differ not only in allele frequency but even in allelic composition at potential QTL, it is not at all obvious why the genetic architecture of a novel trait should converge under uniform selection.

There are a number of other examples of homologous QTLs, most of which come from studies of traits related to domestication (e.g. Fatokun et al 1992, Koinange et al 1996, Shoemaker et al 1996, Osborn et al 1997). This is not surprising, since domestication traits in agriculturally important species have been the focus of the lion’s share of intensive QTL mapping and comparative mapping studies outside of humans. For the same reasons, domestication-related traits provide convenient experimental systems for exploring the problem further. Despite the focus on domestication traits, findings from these studies should be of relevance to understanding the evolution of genetic architectures in complex traits more generally. And understanding the phenomenon of QTL homology would help guide future efforts to dissect complex traits, suggesting strategies for the efficient use of candidate genes and model organisms (such as Drosophila melanogaster and Arabidopsis thaliana).
Homologous QTLs could, in fact, provide a toe-hold into studies of the genetic basis for ecologically important traits in natural systems.

One important question is: “How frequent is the phenomenon of QTL homology?” There are certainly examples where QTLs do not correspond across taxa. An interesting case is that of disease resistance loci in three Solanaceous species: tomato, potato and pepper. While the positions of loci conferring pathogen resistance, broadly defined, correspond across the species, the positions of loci specific to particular pathogens do not (Grube et al 2000). Apart from such striking cases, the finding of an absence of QTL homology is inherently less likely to be reported. Thus, the generality of the phenomenon may easily be inflated by publication bias. Even if it is a general phenomenon, there has been no attempt as of yet to synthesize the literature to estimate how frequently QTL homology does or does not occur or to search for patterns in its occurrence that could give clues as to its underlying causes.

Another important question is: “Do homologous QTLs arise from variation in homologous genes?” The difficulty is that line-cross studies typically map QTL to intervals 5-20 centimorgans in size. Such a large interval may easily contain hundreds of genes. This provides grounds for reasonable doubt that homologous QTL are necessarily underlain by homologous genes. Why would it be otherwise? One possibility, particularly relevant to domestication traits, is that selection on multiple traits can cause interactions between the genetic factors underlying each. Imagine genetic variation for trait A is confined to a single locus while that for trait B is present at many loci throughout the genome. The single trait A locus that responds to selection may influence which loci respond to selection for trait B.
All other things being equal, those loci for trait B that are linked to the trait A locus (in coupling with respect to selection) will contribute more to the selection response. Provided that there are major loci underlying a sufficient number of traits that show correspondence across species, this could provide a mechanism by which QTLs may occur at homologous loci without being underlain by the same genes. There is, in fact, some evidence for linkage among domestication genes for different traits in the cereal grains (Cai and Morishima 2002). But regardless of whether this particular model is correct, the point is that it remains to be demonstrated empirically that homologous QTLs either are, or are not, underlain by homologous genes.

Motivated by these questions, we propose to further study the genetic architecture of fruit weight in the Solanaceae. This is an especially interesting trait because of its highly polygenic nature (Grandillo et al 1999). One species in the family for which this trait has not yet been studied is Physalis philadelphica, which is commonly known as tomatillo or husk tomato. A chloroplast DNA phylogeny indicates that Physalis diverged prior to the most recent common ancestor of the clade that contains Lycopersicon (tomato), Capsicum (pepper) and Solanum (eggplant) (Olmstead and Palmer 1992). Tomatillo is a fruit crop which was domesticated in Mesoamerica in pre-Columbian times (Montes Hernandez and Aguirre Rivera, 1994) and is still important in Mexican cuisine. The average fruit weight varies over two orders of magnitude among different genotypes (Montalvo Hernandez, 1998). It is not yet known what loci contribute to this variation or whether they are homologous to those in tomato, pepper and eggplant. Thus, tomatillo provides a useful test of the generality of the phenomenon of homologous QTLs.

Tomatillo is also an excellent system in which to directly test the contribution of ORFX to fruit weight variation.
In tomato, pepper and eggplant, all of which are self-compatible, crosses between inbred lines have been used to generate populations for dissecting complex traits (Mackay 2001, and see below). By contrast, most genotypes of tomatillo possess gametophytic self-incompatibility and thus are obligate outcrossers (Pandey 1957). As a result, it is possible to identify QTL using association mapping, which takes advantage of naturally occurring patterns of linkage disequilibrium between markers and QTLs. With association mapping, because linkage disequilibrium is much less extensive than in a line-cross population, one can directly assess the contribution of polymorphisms within a candidate locus to variation in a trait. In addition, the use of association mapping to identify the causative polymorphism(s) in tomatillos is aided by the fact that small- and large-fruited landraces are grown in close proximity to one another in many regions of Mexico. There is high fertility in the F1 progeny of crosses between diverse self-incompatible genotypes of tomatillo independent of the fruit weight phenotype (Hudson 1986, Peña Lomeli 1998). Thus, there has likely been genetic exchange between small- and large-fruited populations for many generations and, so, linkage disequilibrium should decay steeply around the causative polymorphism(s). This will allow us to localize such polymorphisms very precisely should they be present at the candidate locus.

Objectives

The primary scientific objective of the proposed research is to evaluate the contribution of the P. philadelphica ORFX homolog (PpORFX), and neighboring loci, to phenotypic variation in fruit weight.

1. Fruit weight (and other domestication) phenotypes will be measured in a large, geographically diverse collection of genotypes in a common garden experiment.
2. Nucleotide polymorphisms at PpORFX will be screened and then scored in these same genotypes.
3. Association mapping will be used to fine-map QTL linked to polymorphisms at PpORFX that contribute to fruit weight variation.
4. In order to control for the potentially confounding effects of population structure, we will also score multilocus microsatellite genotypes in these individuals and incorporate the resulting estimates of population structure into the association mapping analysis.
5. Sequence data will be obtained for three unlinked loci that are not domestication candidates in order to determine the background patterns of nucleotide variability and linkage disequilibrium decay.

In a parallel study, the contribution of the larger chromosomal region containing ORFX, as well as those containing other Solanaceous fruit weight markers, is being measured in an F2 cross between large- and small-fruited genotypes. Together, these studies will allow us to determine whether the ORFX-containing region contributes to fruit weight variation in tomatillo and whether the causative polymorphism resides in or near ORFX.

Research Plan

Tomatillo germplasm collection

Tomatillo is actively cultivated in 21 Mexican states and uncultivated (though not truly wild) populations can also be found in south-central Mexico. The tomatillo is usually grown in small-scale traditional agricultural systems. It has been grown on an industrial scale only within the last 15 years and has not yet received extensive scientific breeding attention (Moriconi et al., 1990). U.S. cultivation is largely in California, but acreage is increasing in several other southern U.S. states. Hundreds of seed collections, covering the wide range of fruit weight available in tomatillo, are available from germplasm banks in the U.S., Mexico and Costa Rica. Though phenotypically diverse, these seed collections are mainly from the central-western states of Mexico (e.g. Jalisco, México, Michoacán and Puebla), close to major scientific centers.
Judging from herbarium collections, many states hosting diverse populations of tomatillo (namely, the states of Guerrero, Oaxaca and Chiapas) are substantially under-represented in these collections. With support from the Plant Exchange Office of the U.S. Department of Agriculture (USDA), we spearheaded a collecting expedition for P. philadelphica germplasm in central-southern Mexico that took place in late October and early November of 2002. Other principal investigators included Dr. Larry Robertson, curator of the Solanaceae germplasm collection for the USDA, Ofelia Vargas Ponce from the Universidad de Guadalajara, Mexico and Dr. Aureliano Peña Lomelí from the Universidad Autónoma Chapingo, Mexico. Samples were taken from ninety collection sites, including (in descending order of importance) the states of Oaxaca, Chiapas, Jalisco, Guerrero, Puebla, Michoacán and Hidalgo. Table 2 shows the number of documented accessions available from various sources at the present time, not including commercial cultivars. Many of these accessions have already been obtained by our laboratory for use in the present study. Based on discussion with our collaborators in Mexico, our experience from the 2002 expedition, statistical analysis of locality data from herbarium records, the results of prior agronomic field trials in the U.S. and Mexico, and other published sources, we believe the accessions that we have obtained provide fairly comprehensive coverage of the geographical and phenotypic diversity in the species.

Table 2. Numbers of accessions of P. philadelphica seed available from the national germplasm banks prior to Fall 2002 and additional collections made by the 2002 expedition. In many cases, multiple accessions have been collected from single populations.

Germplasm bank        Accessions
CATIE, Costa Rica     43
GRIN, U.S.A.          18
BANGEV, Mexico        391
2002 expedition       105

Field measurements of fruit-related traits
A field plot will be established during the spring and summer of 2004 at the Central Crops Research Station (CCRS) in Clayton, North Carolina to phenotypically evaluate 100-200 tomatillo accessions (the core set) under common environmental conditions. CCRS, which is owned and managed by the state of North Carolina, will provide the equipment, expertise and manpower for greenhouse germination, plot preparation, transplanting, irrigation, fertilization, and pest/pathogen control. We will measure fruit-related traits on accessions that have been chosen to cover the range of the species both geographically and (where prior information is available) in fruit weight. We will measure a suite of additional domestication-related traits (Table 3) in order to measure character correlations and generate a useful dataset for future studies. Summer interns will participate in the harvest, phenotyping, and data analysis from this experiment.

Sequencing and genotyping of the PpORFX homolog

For this study, PpORFX and portions of the flanking intergenic spacer regions will be sequenced in a select sample of genotypes from the core set to identify insertion-deletion and single nucleotide polymorphisms. It is important to include the intergenic spacers since variation in non-coding cis-regulatory sequences could affect the expression of ORFX and thus contribute to fruit weight variation. In tomato, nucleotide differences in the 5’ upstream sequence of the ORFX gene appear to be responsible for the functional difference between the wild and domestic alleles (Frary et al., 2000, Nesbitt and Tanksley 2002). A sample of these polymorphisms will then be genotyped in the full core set for use in association mapping. Having haplotype sequences of PpORFX early in the project will be helpful for planning of the genotyping task since they will enable us to measure the pattern of linkage disequilibrium decay and select an appropriate marker density for association mapping (Remington et al 2001).
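To illustrate the quantity behind "linkage disequilibrium decay", the following is a minimal sketch, not part of the proposed analysis. It computes the squared allelic correlation r² between two biallelic sites from phased haplotypes. The choice of r² as the LD statistic, the 0/1 allele coding, the function name `ld_r2`, and the four example haplotypes are all illustrative assumptions.

```python
def ld_r2(haplotypes, i, j):
    """Squared allelic correlation (r^2) between biallelic sites i and j.

    haplotypes: equal-length strings of '0'/'1', one per phased haplotype
    (hypothetical coding: '1' marks the derived or minor allele).
    """
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j  # classical LD coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Hypothetical haplotypes: sites 0 and 1 always co-occur, site 2 is independent.
haps = ["000", "001", "110", "111"]
print(ld_r2(haps, 0, 1))  # complete LD
print(ld_r2(haps, 0, 2))  # no LD
```

Computing this statistic for every pair of polymorphic sites and plotting it against the distance between them would give one view of the decay curve from which a marker density could be chosen.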
These sequences will also allow us to estimate the power of the association mapping analysis. Cloning of short- and long-range PCR products will be used to obtain haplotype data, since heterozygosity is likely to be high in this outcrossing species. The genotyping methodology to be used on the full core set (for the association study itself) will depend upon the pattern of polymorphism that we find at the locus.

Table 3. Traits to be measured before and at harvest in field trials of core set. Italics indicate traits identified as differing between cultivated and uncultivated accessions by Montes Hernandez (1989).

Categories: Plant; Seed; Fruit; Flower; Stem; Leaf
Traits (in the order listed): Height; Growth habit; Germination; Number per fruit; Weight; Days to fruiting; Fresh and dry weight; Diameter; Volume; Specific weight; Color; Pedicel length; Number per plant; Calyx color; Days to flowering; Pedicel length; Length and width of corolla; No. of nodes; Internode length; Color; Trichome density; Length; Width; Number of teeth

A fragment of PpORFX has already been isolated (A Habel and TJV, unpublished results). A pair of degenerate primers was designed based upon the predicted amino acid sequence of the tomato ORFX gene product and a homologous expressed sequence tag from Petunia hybrida, a distant relative in the Solanaceae (Olmstead and Palmer 1992). Among the products obtained was a 750 bp fragment with 64% identity at the amino acid level to portions of the first and second exons of tomato ORFX. Twenty-two single nucleotide and six indel polymorphisms have been identified in this fragment among three closely related tomatillo accessions. To isolate the remainder of the gene and the flanking intergenic spacers, we are using inverse PCR (Ochman et al. 1988) and GeneWalker libraries (BD Biosciences).

It is worth pointing out that the function of the ORFX gene product is not well understood at this time.
The protein was initially thought, based on protein-structure threading, to be a distant homolog of the human Ras oncogene (Frary et al 2000) but subsequent analysis has cast doubt on this (TJV, unpublished). Though homologs have been identified in the expressed sequence tag libraries of other plant species, and distant homologs are present in tomato, there are no unambiguous homologs of known function (Frary et al 2000). The protein is chiefly expressed prior to anthesis in developing ovaries; the large-fruited allele is responsible for an increase in cell number, though not cell size, due to a heterochronic shift in expression timing (Cong et al 2002). Should our work demonstrate the role of ORFX in fruit weight variation in tomatillo, it would likely motivate studies to elucidate the function of ORFX by examination of the knockout phenotypes of its two close Arabidopsis homologs.

Additional sequence data

In addition to the haplotype data from PpORFX, we will obtain equivalent data for smaller regions from three additional unlinked loci in order to determine whether patterns of polymorphism and linkage disequilibrium are at all unusual at PpORFX. This could provide an insight into the history of selection at the PpORFX locus. Selection is of interest both because of the potential role of PpORFX in domestication and also because prior balancing selection could conceivably contribute to QTL orthology. If a locus with a long history of balancing selection contributes to variation in a trait that suddenly comes under directional selection, then that locus may contribute a large proportion of the initial response to selection by virtue of its having built up numerous, functionally different alleles. There are several well-known cases of interspecific polymorphisms due to very long-standing balancing selection (e.g. Ioerger et al 1990).
Where such a balancing polymorphism is present, then convergent directional selection in multiple species might drive convergent allele frequency changes at orthologous loci. This explanation is entirely speculative, and it relies on arguable assumptions about both the frequency of balancing selection in nature and the contribution of standing variation to selection response. Yet, the explanation is consistent with the finding in tomato that the large-fruited ORFX allele appears to have diverged from the extant lineage of small-fruited alleles millions of years ago, long prior to domestication (Nesbitt and Tanksley 2002). For this reason, it would be desirable to test for the presence of balancing selection at PpORFX should the locus prove to be associated with fruit weight variation. Though not all tests of selective neutrality at a locus require an outgroup sequence (e.g. Tajima 1989), several do (e.g. McDonald and Kreitman 1991, Fay and Wu 2000). In order to have the data needed to perform such tests, we will also sequence the homologs of these loci from at least one other Physalis species.

Conserved Orthologous Sequences (COS) markers are sequence-tagged markers for mapped, conserved single-copy genes in tomato (Fulton et al 2002). The position of each COS ortholog in Arabidopsis thaliana is known (www.sgn.cornell.edu). Thus, the COS markers are a useful starting point for comparative mapping in other eudicotyledonous species. Steven Tanksley (Cornell University) has kindly provided us with a number of primer pairs that amplify single-copy tomatillo orthologs of different COS markers. We have selected four pairs for use in this study which provide amplification products of sufficient length and are on chromosomes in tomato other than chromosome two, where ORFX is located (Table 4). Three of the primer pairs will be used to amplify the loci that are to be sequenced for this section of the project.

Table 4. Loci to be used for study of background levels of polymorphism and linkage disequilibrium. Listed are the chromosome for the COS marker in tomato and the amplicon size in tomatillo.

COS ID   Chromosome   Amplicon size   Putative function
T0142    11           1000-1650 bp    lipid/fatty-acid/isoprenoid metabolism
T0161    9            650-850 bp      MRP-like ABC transporter
T0687    6            650-850 bp      unknown
T1347    7            850-1000 bp     possible apospory-associated protein

Line-cross mapping

In a parallel study, not included within the scope of this proposal, we are using line-cross methodology to determine whether the region containing ORFX contributes to fruit weight variation in tomatillo. In the next section, we describe the association mapping approach to determine whether ORFX itself, rather than a QTL with which it is in LD in the line-cross progeny, underlies any of the variation.

The use of line-cross methods for studying the genetic basis of quantitative variation within and between species is well established (Mackay 2001). An experimental cross induces linkage disequilibrium (LD) between loci that differ between the parents. If one measures the trait(s) of interest and scores markers throughout the genome (at 5-20 cM spacing) in a collection of segregating progeny (e.g. F2), then one can test whether a particular locus explains a significant fraction of the phenotypic variation among the progeny. QTL of small effect may evade detection, but major QTL (that explain 20% or more of the variation) can be reliably detected in most designs.

We are isolating markers in tomatillo for the major fruit weight loci in tomato, pepper and eggplant and measuring the fruit weight variation explained by segregation at these markers in an F2 cross between large- and small-fruited genotypes. PpORFX is one of the markers to be used; additional markers are being obtained by screening an existing tomatillo callus cDNA library (M.
Robertson and TJV, unpublished) using heterologous probes obtained from the tomato expressed sequence tag collection. Phenotypic and genotypic data will be obtained for >100 individuals so as to provide sufficient power to detect QTL of moderate effect, and the traits measured will be the same as those in Table 3.

Association mapping

The regions to which QTL are mapped by line-cross methods are typically 5-20 cM in length. In organisms such as P. philadelphica, such regions are too coarse to implicate specific genes. To isolate the gene itself by positional cloning, the typical strategy is to obtain large numbers of additional recombinants in the region, and to eliminate segregating background variation, by progressing through several more generations of genotyping and phenotyping. Once the QTL-containing region has been narrowed down to a manageable number of candidate genes, these are then typically tested for phenotypic effect using transgenic techniques (Mackay 2001). Positional cloning is thus laborious, expensive, risky, and difficult to implement in non-model organisms.

Instead, the contribution of a candidate locus to variation in a trait of interest can be directly determined by measuring the association between polymorphisms at the locus and trait variation in a sample of naturally occurring genotypes (Risch, 2000; Risch and Merikangas, 1996). By ‘naturally occurring’, we mean a sample of genotypes with alleles having a sufficiently deep coalescent history that recombination will have broken down any LD present over long distances. Thus, while it is not necessary to score the causative polymorphism itself in order to see an association, it is necessary to score a polymorphism sufficiently close to the causative one that LD is present between them. With association mapping, one can identify a region containing a causative polymorphism (i.e. QTL) with much finer resolution than with a line-cross QTL experiment.
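As a minimal sketch of what "measuring the association" at a single polymorphism can mean, the Python function below returns the fraction of phenotypic variance explained by allele dosage, computed as the squared Pearson correlation. The additive-effect model, the function name `variance_explained`, and the six data points are assumptions made for illustration. The sketch makes no correction for population structure and is not the test proposed for this study.

```python
def variance_explained(dosages, phenotypes):
    """Squared Pearson correlation between allele dosage and phenotype.

    dosages: copies (0, 1 or 2) of one allele carried by each individual.
    phenotypes: trait values (e.g. fruit weight) for the same individuals.
    Assumes purely additive effects and an unstructured sample.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need paired observations for at least two individuals")
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    s_yy = sum((y - mean_y) ** 2 for y in phenotypes)
    if s_gg == 0 or s_yy == 0:
        raise ValueError("marker and trait must both vary in the sample")
    return s_gy * s_gy / (s_gg * s_yy)


# Hypothetical data: six plants, dosage of one allele and fruit weight in grams.
dosage = [0, 0, 1, 1, 2, 2]
weight = [10.0, 14.0, 22.0, 18.0, 30.0, 26.0]
print(round(variance_explained(dosage, weight), 3))
```

A scored polymorphism in LD with the causative one would show a weaker version of the same signal, which is why marker density matters.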
Association mapping is greatly facilitated by having a limited candidate region to start with, because whole-genome association mapping requires an enormous number of markers and must make severe corrections for multiple tests. A voluminous literature on association mapping has accumulated in the last several years, particularly in the field of human medical genetics.

Population structure

An important caveat with association mapping is that, in structured populations, LD between two polymorphisms may arise even in the absence of physical linkage. Such associations will create false positives in a mapping experiment. The effect can be understood by considering unlinked alleles that are, because of population structure, both at high frequency in one subpopulation and low frequency in another. Even if allele A affects the measured phenotype but B does not, locus B will appear to be associated with the phenotype. This problem has received a good deal of attention in the human genetics community and a number of solutions have been proposed (Devlin and Roeder 1999, Pritchard et al 2000b, Reich and Goldstein 2001).

Pritchard et al. (2000b) introduced a popular test for association that corrects for population structure. The first step is to score the genotypes in the association mapping sample at a number of unlinked marker loci. A simple statistical test for population structure in the sample is then performed (Pritchard et al. 2000a). If structure is detected, then one can estimate the number of subpopulations and the proportion of the genome of each individual that is derived from each subpopulation. This matrix of estimated proportions is incorporated into the test for association at the candidate locus, thereby not only correcting for population structure but also allowing one to detect associations that are in different phase in different subpopulations.
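The confounding effect described above can be made concrete with a short calculation. The sketch below is an illustration only, not the Pritchard et al. test. It computes the LD coefficient D between alleles at two unlinked loci when subpopulations are pooled, assuming linkage equilibrium within each subpopulation. The function name `pooled_ld` and the allele frequencies are hypothetical.

```python
def pooled_ld(weights, freq_a, freq_b):
    """LD coefficient D between alleles A and B in a pooled sample.

    weights: proportion of the sample drawn from each subpopulation (sums to 1).
    freq_a, freq_b: frequency of allele A and of allele B in each subpopulation.
    Assumes the two loci are in linkage equilibrium within every subpopulation.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mean_a = sum(w * a for w, a in zip(weights, freq_a))
    mean_b = sum(w * b for w, b in zip(weights, freq_b))
    # Within a subpopulation the AB haplotype frequency is simply a * b.
    freq_ab = sum(w * a * b for w, a, b in zip(weights, freq_a, freq_b))
    return freq_ab - mean_a * mean_b


# Hypothetical frequencies: both alleles common in subpopulation 1, rare in 2.
print(pooled_ld([0.5, 0.5], [0.9, 0.1], [0.9, 0.1]))
# Same allele-A frequency everywhere: no spurious association arises.
print(pooled_ld([0.5, 0.5], [0.5, 0.5], [0.9, 0.1]))
```

Under these assumptions the pooled D equals the covariance of the two allele frequencies across subpopulations, so it is nonzero only when both loci differ in frequency between subpopulations.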
Pritchard and Rosenberg (1999) have shown using simulations that a limited number of microsatellite loci (15-20) is sufficient to detect stratification under two different models of population structure. Therefore, we propose to develop 15-20 microsatellite markers in tomatillo and to genotype these in the core set. If population structure is detected in our sample, we will incorporate the estimated population structure into the association mapping analysis using the method of Pritchard et al. (2000b) as later modified by Thornsberry et al. (2001) to accommodate quantitative traits.

Enriched-microsatellite library

To date, we have constructed an enriched-microsatellite library using the double enrichment method of Fleischer and Loew (1995). Since plant microsatellites tend to be AT-rich (Cardle et al. 2000, Gupta and Varshney 2000, Lagercrantz et al. 1993, Powell et al. 1996), we have used ten different biotin-labeled 30mer oligonucleotide probes for enrichment: (ACT)10, (AGT)10, (AAG)10, (ATC)10, (ATG)10, (TTC)10, (AAT)10, (TTA)10, (TTG)10 and (AAC)10. These probes are complementary to all thirty AT-rich trimeric repeat motifs because each one is complementary to three different overlapping motifs. For instance, probe (ACT)10 is complementary to microsatellites composed of the motifs ACT, CTA and TAC. We focused on trimeric repeats because they are the most abundant repeats within genes (Cardle et al 2000) and because they are relatively easy to score. After the second enrichment step, four of the probes yielded good smears in the appropriate size range: (ACT)10, (AAT)10, (AAG)10, and (TTG)10. The next step will be to clone these fragments (we are using the pBluescript II KS (+) Vector from Stratagene), end-sequence the inserts using universal primers, identify those clones containing unique triplet repeats, and design PCR primers to amplify them. The microsatellites will then be screened for polymorphism in a small (10-20) set of genotypes.
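The probe design described above (ten trimer probes, each matching three overlapping motifs) can be checked by enumeration. The sketch below lists the three reading registers of each probe's repeat unit and compares their union with the set of AT-rich trimers. The working definition of "AT-rich" used here (at least two A or T bases, excluding the single-base runs AAA and TTT) is an assumption adopted for the illustration, as are the function names.

```python
from itertools import product

# Probe repeat units as listed in the text.
PROBES = ["ACT", "AGT", "AAG", "ATC", "ATG", "TTC", "AAT", "TTA", "TTG", "AAC"]


def rotations(motif):
    """Return the set of cyclic rotations (reading registers) of a repeat unit."""
    return {motif[k:] + motif[:k] for k in range(len(motif))}


def is_at_rich_trimer(motif):
    """Assumed definition: at least two A/T bases and not a single-base run."""
    at_count = sum(base in "AT" for base in motif)
    return at_count >= 2 and len(set(motif)) > 1


# Union of the registers matched by all ten probes.
covered = set().union(*(rotations(p) for p in PROBES))
# Every trimer over ACGT that meets the assumed AT-rich definition.
at_rich = {
    "".join(t) for t in product("ACGT", repeat=3) if is_at_rich_trimer("".join(t))
}

print(sorted(rotations("ACT")))          # ['ACT', 'CTA', 'TAC']
print(len(covered), covered == at_rich)  # 30 True
```

Under this definition the ten probes account for exactly the thirty motifs, with the probes falling into five pairs whose repeat units are reverse complements of one another.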
Those that are sufficiently polymorphic will be used for the analysis of population structure. For genotyping, the amplicons will be separated in denaturing acrylamide gels, stained with ethidium bromide, and scored manually.

Anticipated results and scientific importance

The proposed study will provide a definitive answer to the question: “Does PpORFX itself contribute to fruit weight variation in tomatillo?” In answering this question, we will generate the following resources:

1. A set of 15-20 polymorphic AT-rich trimeric microsatellites
2. Fruit weight and a suite of other domestication-related trait measurements in the core set
3. Multilocus microsatellite genotypes for the core set
4. Haplotype data for four loci, including PpORFX, from a subset of the core

In addition to the experimental work described above, we anticipate reviewing the literature pertaining to homologous QTLs and evaluating models that can shed light on the phenomenon.

While the primary emphasis of the study is on the contribution of PpORFX to variation in fruit weight in tomatillo, the data obtained from this study will also be of use in answering other questions. The microsatellite genotypes will help to elucidate the phylogeographic structure of P. philadelphica and possibly shed light on the genetic history of its domestication. The patterns of sequence polymorphism and LD decay in tomatillo are of interest in their own right, as evolutionary geneticists currently have such data from only a limited number of plant species. There is considerable interest in the association mapping of candidate loci underlying complex traits in non-model organisms and natural populations. For that reason, it is desirable to have quantitative data on the scale of LD decay in an obligate outcrossing plant such as tomatillo.
If the results suggest that PpORFX does, in fact, contribute to fruit weight variation in tomatillo, they would lay the groundwork for tests of the cause of the phenomenon of QTL orthology. Physalis is an attractive system for these studies, as a number of species are native to North Carolina and would thus be accessible for field manipulations. The results of these studies could have consequences for many areas of evolutionary genetics by providing novel insights into the evolution and genetic architecture of complex traits.

Recruitment, education and training plan

The personnel on this project would include the PI (TJV), at least one postdoctoral investigator, rotating graduate students from the Department of Biology at UNC-CH, at least one undergraduate thesis student, and, should recruitment be successful, one or more interns from the two summer undergraduate research programs that are active at UNC-CH (see below). This will create opportunities for mentoring experiences at many levels. In addition to participating in the research itself, students and postdocs are expected to participate in lab meetings and departmental activities, to attend seminars and interact with other scientists at UNC-CH and neighboring institutions (particularly Duke University and North Carolina State University), to present their work at scientific meetings, to write and peer review journal articles, to mentor younger students, and to gain experience in teaching, where possible.

There are a number of special programs at UNC-CH through which students may be recruited for this project. Since the programs mentioned here cover all expenses for the students, additional funds are not included in the budget for this project. Two programs, in particular, draw undergraduate students to the UNC-CH campus. One is the NIH-funded Summer Undergraduate Research Experience (SURE).
SURE is a competitive program drawing from a national applicant pool that provides opportunities for students to carry out independent research projects under the guidance of UNC-CH faculty mentors during the summer months. Through meetings with guest scientists, the program promotes awareness of the diversity of research areas, especially areas of current major biological importance. The program also conducts workshops and field trips that provide information and career guidance about research and other types of science professions in academia, government, and the private sector. SURE is intended for students with a genuine desire to pursue careers in experimental research in the biological and chemical sciences. Students contemplating other occupations in which familiarity with experimental science would be valuable, e.g., science teaching, are encouraged to apply. SURE is particularly interested in students having no prior research experience and students from groups underrepresented in the sciences. Preference is given to individuals completing their junior year.

UNC-CH also hosts a Summer Pre-Graduate Research Experience (SPGRE) program. Like SURE, the SPGRE program offers students throughout the country the opportunity to work full-time on research projects under the direction of UNC-CH faculty members. The program is designed for students aiming to pursue graduate studies, particularly Ph.D. degrees. SPGRE is more targeted than SURE toward students from underrepresented groups such as the African American, Native American, Mexican American, and Puerto Rican populations. Students are expected to produce a paper as their finished product and to present their work at the end-of-the-program poster session. A number of students make oral presentations of their work during the course of their participation in the program. The program also provides financial support to undergraduates enrolled at UNC-CH to conduct research during the academic year.
Additional resources allow us to recruit graduate students and postdocs to satisfy our research and training mission. The Department of Biology administers a training grant in plant genomics awarded by the statewide administration of the University of North Carolina. It provides $450,000 over three years, and is currently in its first year. Students recruited to this program will have the opportunity to participate in this project as rotation students in their first year. The training grant also helps us to foster a sense of community among the graduate students in plant genetics, of which there are now a considerable number at UNC-CH.

At the postdoctoral level, UNC-CH participates in a program named Seeding Postdoctoral Innovators in Research and Education (SPIRE). SPIRE's mission is to provide rounded training to future scientist researchers and educators while ensuring that the science professions reflect the nation's racial and gender diversity. SPIRE fellows spend 2/3 of their time doing original research in the lab setting of their choice at UNC-Chapel Hill. They are expected to publish in peer-reviewed journals, present research findings at national and international science meetings and participate in journal clubs and laboratory meetings. In addition, fellows spend 1/3 of their time teaching courses and mentoring students at one of seven historically minority universities (HMUs) in North Carolina. Under the mentorship of faculty and staff at UNC-CH and the partner HMUs, SPIRE fellows develop an undergraduate course within their area of expertise and teach this course at one of the HMU campuses. While at the HMUs, fellows work closely with a faculty partner who provides guidance and mentoring. In addition to obtaining research experience and formal and practical training in science education, SPIRE fellows participate in various professional development workshops.
SPIRE fellows administer a Distinguished Scholars seminar series for the larger university community and organize an Annual Symposium. As a consequence of these programs, UNC-CH provides an excellent and diverse training environment.

Students will join the existing project team, which includes, in addition to the PI, Maria Chacon, a postdoctoral associate with expertise in the genetic structure of domesticated plants in Mesoamerica. Dr. Chacon organized and carried out the recent tomatillo germplasm collection trip in Mexico and has been developing an enriched tomatillo microsatellite library. The team also includes a Biology undergraduate, Matthew Robinson, who joined the laboratory last year as a sophomore. He has been developing a tomatillo cDNA library and will be mapping fruit weight QTLs in a large-fruited x small-fruited tomatillo F2 population for his independent research project.

Timeline of activities

Initial activities (Fall and Winter 2003) will focus on the continued development of microsatellite primers (including screening) and the sequencing of the PpORFX region. In Spring and Summer of 2004, the common garden experiment will be established at Central Crops Field Station and phenotypic data will be collected on the core set. At that time, we will also collect DNA for subsequent genotyping and scoring of polymorphisms at the PpORFX locus. In the following Fall and Winter of 2004, we will finish genotyping the microsatellites, scoring the polymorphisms at PpORFX, and sequencing the three unlinked loci in the reduced set of genotypes. The Spring and Summer of 2005 will be devoted to analysis and follow-up work.