project description - Biology Department | UNC Chapel Hill

PROJECT DESCRIPTION:
Association mapping of a candidate domestication gene in Physalis philadelphica (Solanaceae)
Prior NSF Support
DBI-0110069, “Genomic Analysis of Water Use Efficiency”, 9/1/01-8/31/05, $341,185
(subcontract to UNC-CH). On this award, TJV is a key collaborator and the PIs include J
Comstock (project leader), S McCouch and B Martin. The goals are to map quantitative trait loci
for water use efficiency (WUE) in intra- and inter-specific crosses of tomato and rice using the
relative abundance of stable carbon isotopes as a proxy for WUE. To date, key environmental
parameters for control of the phenotype have been determined, screens of WUE have been
made in a large number of potential parental lines, and initial whole-genome scans have been
carried out in multiple permanent mapping populations of both species. Current work focuses on
(1) verification of initial candidate QTL to pursue for fine-mapping and construction of near-isogenic lines and (2) the development of markers and genetic material to achieve those ends.
Publications describing the findings of the first year are currently in preparation. Several student
and teacher interns have participated in the project so far. The web site has primers on the
science behind the project, progress updates on scientific and outreach goals, announcements
about upcoming outreach activities, and more: http://isotope.bti.cornell.edu.
DBI-0227314, “Tools for Plant Comparative Genomics”, 9/1/02-8/31/07, $992,655. On this
award, TJV is the principal investigator. The goals are to integrate a number of inter-related
problems in comparative genomics, including gene family evolution, gene order evolution, and
the divergence of expression patterns among duplicated genes. Analyses of comprehensive
sequence and map datasets from a phylogenetically diverse set of publicly available plant
genome projects are being made available through an interactive web database. An inquiry-based
learning module in genome science is being developed for secondary school students in
collaboration with professional educators. In the first few months of the project, we have made
progress building a code base for the database and curating pilot data. We have improved upon
computational and statistical methods for comparative mapping (manuscript submitted to the 11th
Annual Conference on Intelligent Systems in Molecular Biology, authors: P Calabrese, S
Chakravarty and T Vision) and are presently preparing two manuscripts: one describing a novel algorithm
for inference of ancestral gene order (authors: J Huan, J Prins, W Wang and T Vision) and the other
an applied study of evolution in the Arabidopsis Aux/IAA and ARF gene families (authors: D
Remington, T Vision and J Reed).
Introduction
Our major crop plants have been greatly modified from their wild ancestors through the process
of selection known as domestication. Changes seen under domestication include both those
that directly serve human needs and preferences (e.g. increased allocation to edible parts) and
those that provide a presumed competitive advantage under cultivation (e.g. change of growth
habit, lack of seed dormancy, reduced seed shattering). Independent domestication of different
crops of the same family or even of the same genus has sometimes resulted in similar
phenotypic changes among closely related taxa. Particularly striking examples of convergent
domestication have been documented in the families Poaceae and Solanaceae. Interestingly, in
these two families, some domestication traits appear to be underlain, at least in part, by genetic
variation at homologous loci.
The independently domesticated cereal grains sorghum, rice and maize (in the Poaceae) share
numerous domestication traits, among them larger grain size, reduced disarticulation of mature
inflorescences (shattering), and day-length-insensitive flowering. Quantitative trait locus
(QTL) mapping, coupled with the application of homologous genetic markers, has revealed that
many of the loci responsible for the quantitative variation in these three traits map to homologous
regions in the three species (Paterson et al. 1995). Such loci have been dubbed ‘homologous
QTLs’.
In the Solanaceae, the independent domestication of several different crops (including tomato,
pepper, eggplant and tomatillo) has resulted in extreme increases in fruit weight relative to the
nearest wild relatives (as much as 500-fold). In tomato, over twelve QTL associated with the
difference in fruit weight between wild and domesticated species have been identified (Grandillo
et al. 1999). The most important of these QTL, fw2.2, is responsible for approximately 30% of
the mean difference in fruit weight (averaged over crosses involving different wild relatives). The
gene responsible for this effect, ORFX, has recently been cloned and characterized (Alpert et al.
1995; Frary et al. 2000). Major QTL affecting fruit traits in pepper (Ben Chaim et al. 2001) and
eggplant (Doganlar et al. 2002) cosegregate with ORFX, suggesting that this one gene may be a
major fruit weight QTL in all three species. Three other fruit weight QTL show homology
between tomato and either pepper or eggplant (Table 1). Similar patterns have been observed
in other traits; not only are many of the same loci responsible for segregating variation in these
different studies, but their effect sizes and dominance relationships are also similar (Ben Chaim
et al. 2001, Doganlar et al. 2002).
Table 1. Locus names and QTL effect sizes for putative homologous QTLs from tomato (Eshed
and Zamir 1995), pepper (Ben Chaim et al. 2001) and eggplant (Doganlar et al. 2002). %C is the
percent difference of the homozygous introgression relative to the control. %PVE is the percent of
phenotypic variance explained. Entries marked by a period (.) indicate no significant QTL at the
homologous location.
Tomato locus  %C   Pepper locus  %PVE  Eggplant locus  %PVE
fw2.2         20   fw2.1         10    fw2.1           23
fw3.1         13   fw3.2         15    .               .
fw9.2         9    .             .     fw9.1           33
fw11.1        10   .             .     fw11.1          19
This pattern poses an evolutionary puzzle, since the identities and relative contributions of loci
that contribute to variation in a highly polygenic trait are expected to be more idiosyncratic.
Under certain conditions, uniform selection even on two samples from the same population will
result in greater divergence in allele frequencies at QTL than under random drift alone (Cohan
1984, but see Lynch 1986). Tomato, pepper and eggplant had undergone tens of millions of
years of divergence prior to the recent (<10,000 yr old) application of artificial selection on fruit
size. In such species, which differ not only in allele frequency but even in allelic composition at
potential QTL, it is not at all obvious why the genetic architecture of a novel trait should converge
under uniform selection.
There are a number of other examples of homologous QTLs, most of which come from studies
of traits related to domestication (eg Fatokun et al 1992, Koinange et al 1996, Shoemaker et al
1996, Osborn et al 1997). This is not surprising, since domestication traits in agriculturally
important species have been the focus of the lion’s share of intensive QTL mapping and
comparative mapping studies outside of humans. For the same reasons, domestication-related
traits provide convenient experimental systems for exploring the problem further. Despite the
focus on domestication traits, findings from these studies should be of relevance to
understanding the evolution of genetic architectures in complex traits more generally. And
understanding the phenomenon of QTL homology would help guide future efforts to dissect
complex traits, suggesting strategies for the efficient use of candidate genes and model
organisms (such as Drosophila melanogaster and Arabidopsis thaliana). Homologous QTLs
could, in fact, provide a toe-hold into studies of the genetic basis for ecologically important traits
in natural systems.
One important question is: “How frequent is the phenomenon of QTL homology?” There are
certainly examples where QTLs do not correspond across taxa. An interesting case is that of
disease resistance loci in three Solanaceous species: tomato, potato and pepper. While the
positions of loci conferring pathogen resistance, broadly defined, correspond across the species,
the positions of loci specific to particular pathogens do not (Grube et al 2000). Apart from such
striking cases, the finding of an absence of QTL homology is inherently less likely to be
reported. Thus, the generality of the phenomenon may easily be inflated by publication bias.
Even if it is a general phenomenon, there has been no attempt as yet to synthesize the
literature to estimate how frequently QTL homology does or does not occur, or to search for
patterns in its occurrence that could give clues as to its underlying causes.
Another important question is: “Do homologous QTLs arise from variation in homologous
genes?” The difficulty is that line-cross studies typically map QTL to intervals 5-20 centimorgans
in size. Such a large interval may easily contain hundreds of genes. This provides grounds for
reasonable doubt that homologous QTL are necessarily underlain by homologous genes. Why
would it be otherwise? One possibility, particularly relevant to domestication traits, is that
selection on multiple traits can cause interactions between the genetic factors underlying each.
Imagine genetic variation for trait A is confined to a single locus while that for trait B is present at
many loci throughout the genome. The single trait A locus that responds to selection may
influence which loci respond to selection for trait B. All other things being equal, those loci for
trait B that are linked to the trait A locus (in coupling with respect to selection) will contribute
more to the selection response. Provided that there are major loci underlying a sufficient
number of traits that show correspondence across species, this could provide a mechanism by
which QTLs may occur at homologous loci without being underlain by the same genes. There
is, in fact, some evidence for linkage among domestication genes for different traits in the cereal
grains (Cai and Morishima 2002). But regardless of whether this particular model is correct, the
point is that it remains to be demonstrated empirically that homologous QTLs either are, or are
not, underlain by homologous genes.
Motivated by these questions, we propose to further study the genetic architecture of fruit weight
in the Solanaceae. This is an especially interesting trait because of its highly polygenic nature
(Grandillo et al 1999). One species in the family for which this trait has not yet been studied is
Physalis philadelphica, which is commonly known as tomatillo or husk tomato. A chloroplast
DNA phylogeny indicates that Physalis diverged prior to the most recent common ancestor of the
clade that contains Lycopersicon (tomato), Capsicum (pepper) and Solanum (eggplant)
(Olmstead and Palmer 1992). Tomatillo is a fruit crop which was domesticated in Mesoamerica
in pre-Columbian times (Montes Hernandez and Aguirre Rivera, 1994) and is still important in
Mexican cuisine. The average fruit weight varies over two orders of magnitude among different
genotypes (Montalvo Hernandez, 1998). It is not yet known what loci contribute to this variation
or whether they are homologous to those in tomato, pepper and eggplant. Thus, tomatillo
provides a useful test of the generality of the phenomenon of homologous QTLs.
Tomatillo is also an excellent system in which to directly test the contribution of ORFX to fruit
weight variation. In tomato, pepper and eggplant, all of which are self-compatible, crosses
between inbred lines have been used to generate populations for dissecting complex traits
(Mackay 2001, and see below). By contrast, most genotypes of tomatillo possess gametophytic
self-incompatibility and thus are obligate outcrossers (Pandey 1957). As a result, it is possible to
identify QTL using association mapping, which takes advantage of naturally occurring patterns of
linkage disequilibrium between markers and QTLs. With association mapping, because linkage
disequilibrium is much less extensive than in a line cross population, one can directly assess the
contribution of polymorphisms within a candidate locus to variation in a trait. In addition, the use
of association mapping to identify the causative polymorphism(s) in tomatillos is aided by the fact
that small and large-fruited landraces are grown in close proximity to one another in many
regions of Mexico. F1 progeny of crosses between diverse self-incompatible genotypes of tomatillo are highly fertile, regardless of the fruit weight phenotypes of the parents (Hudson 1986,
Peña Lomeli 1998). Thus, there has likely been genetic exchange between small- and large-fruited populations for many generations, and so linkage disequilibrium should decay steeply around
the causative polymorphism(s). This will allow us to localize such polymorphisms very precisely
should they be present at the candidate locus.
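The expectation that LD decays steeply rests on a classical result. Under random mating, with no selection or drift, the disequilibrium D between two loci with recombination fraction r shrinks by a factor (1 - r) each generation, so D_t = D_0(1 - r)^t. The sketch below works through that arithmetic. The Haldane map function, the starting D and the number of generations are assumptions chosen for illustration. They are not estimates for tomatillo.

```python
import math


def haldane_r(distance_cm: float) -> float:
    """Recombination fraction for a map distance in centimorgans (Haldane map function)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def ld_after(d0: float, r: float, generations: int) -> float:
    """Expected linkage disequilibrium after t generations of random mating:
    D_t = D_0 * (1 - r)**t (assumes no selection, drift or mutation)."""
    return d0 * (1.0 - r) ** generations


# Hypothetical scenario: D_0 = 0.25 between a marker and a causative site, followed by
# 100 generations of exchange between small- and large-fruited populations.
for cm in (0.01, 0.1, 1.0, 10.0):
    r = haldane_r(cm)
    print(f"{cm:>6} cM  r = {r:.5f}  D_100 = {ld_after(0.25, r, 100):.4f}")
```

Under these assumptions, D between sites 0.01 cM apart is nearly unchanged after 100 generations. At 1 cM it falls to roughly a third of its starting value, and at 10 cM it is essentially gone.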
Objectives
The primary scientific objective of the proposed research is to evaluate the contribution of the P.
philadelphica ORFX homolog (PpORFX), and neighboring loci, to phenotypic variation in fruit
weight.
1. Fruit weight (and other domestication) phenotypes will be measured in a large,
geographically diverse collection of genotypes in a common garden experiment.
2. Nucleotide polymorphisms at PpORFX will be screened and then scored in these same
genotypes.
3. Association mapping will be used to fine-map QTL linked to polymorphisms at PpORFX
that contribute to fruit weight variation.
4. In order to control for the potentially confounding effects of population structure, we will
also score multilocus microsatellite genotypes in these individuals and incorporate the
resulting estimates of population structure into the association mapping analysis.
5. Sequence data will be obtained for three unlinked loci that are not domestication
candidates in order to determine the background patterns of nucleotide variability and
linkage disequilibrium decay.
In a parallel study, the contribution of the larger chromosomal region containing ORFX, as well
as those containing other Solanaceous fruit weight markers, is being measured in an F2 cross
between large- and small-fruited genotypes. Together, these studies will allow us to determine
whether the ORFX-containing region contributes to fruit weight variation in tomatillo and whether
the causative polymorphism resides in or near ORFX.
Research Plan
Tomatillo germplasm collection
Tomatillo is actively cultivated in 21 Mexican states and uncultivated (though not truly wild)
populations can also be found in south-central Mexico. The tomatillo is usually grown in small-scale traditional agricultural systems. It has been grown on an industrial scale only within the
last 15 years and has not yet received extensive scientific breeding attention (Moriconi et al.,
1990). U.S. cultivation is largely in California, but acreage is increasing in several other southern
U.S. states.
Hundreds of seed collections, covering the wide range of fruit weight available in tomatillo, are
available from germplasm banks in the U.S., Mexico and Costa Rica. Though phenotypically
diverse, these seed collections are mainly from the central-western states of Mexico (e.g.
Jalisco, México, Michoacán and Puebla), close to major scientific centers. Judging from
herbarium collections, many states hosting diverse populations of tomatillo (namely, the states of
Guerrero, Oaxaca and Chiapas) are substantially under-represented in these collections.
With support from the Plant Exchange Office of the U.S. Department of Agriculture (USDA), we
spearheaded a collecting expedition for P. philadelphica germplasm in central-southern Mexico
that took place in late October and early November of 2002. Other principal investigators
included Dr. Larry Robertson, curator of the Solanaceae germplasm collection for the USDA,
Ofelia Vargas Ponce from the Universidad de Guadalajara, Mexico and Dr Aureliano Peña
Lomelí from the Universidad Autónoma Chapingo, Mexico. Samples were taken from ninety
collection sites, including (in descending order of importance) the states of Oaxaca, Chiapas,
Jalisco, Guerrero, Puebla, Michoacán and Hidalgo. Table 2 shows the number of documented
accessions available from various sources at the present time, not including commercial
cultivars. Many of these accessions have already been obtained by my laboratory for use in the
present study. Based on discussion with our collaborators in Mexico, our experience from the
2002 expedition, statistical analysis of locality data from herbarium records, the results of prior
agronomic field trials in the U.S. and Mexico, and other published sources, we believe the
accessions that we have obtained provide fairly comprehensive coverage of the geographical
and phenotypic diversity in the species.
Table 2. Numbers of accessions of P. philadelphica seed available from the national germplasm
banks prior to Fall 2002 and additional collections made by the 2002 expedition. In many cases,
multiple accessions have been collected from single populations.
Germplasm bank      Accessions
CATIE, Costa Rica   43
GRIN, U.S.A.        18
BANGEV, Mexico      391
2002 expedition     105
Field measurements of fruit-related traits
A field plot will be established during the spring and summer of 2004 at the Central Crops
Research Station (CCRS) in Clayton, North Carolina to phenotypically evaluate 100-200
tomatillo accessions (the core set) under common environmental conditions. CCRS, which is
owned and managed by the state of North Carolina, will provide the equipment, expertise and
manpower for greenhouse germination, plot preparation, transplanting, irrigation, fertilization,
and pest/pathogen control. We will measure fruit-related traits on accessions that have been
chosen to cover the range of the species both geographically and (where prior information is
available) in fruit weight. We will measure a suite of additional domestication-related traits
(Table 3) in order to measure character correlations and generate a useful dataset for future
studies. Summer interns will participate in the harvest, phenotyping, and data analysis from this
experiment.
Sequencing and genotyping of the PpORFX homolog
For this study, PpORFX and portions of the flanking intergenic spacer regions will be sequenced
in a select sample of genotypes from the core set to identify insertion-deletion and single
nucleotide polymorphisms. It is important to include the intergenic spacers since variation in
non-coding cis-regulatory sequences could affect the expression of ORFX and thus contribute to
fruit weight variation. In tomato, nucleotide differences in the 5’ upstream region of the ORFX gene
appear to be responsible for the functional difference between the wild and domestic alleles
(Frary et al., 2000, Nesbitt and Tanksley 2002). A sample of these polymorphisms will then be
genotyped in the full core set for use in association mapping. Having haplotype sequences of
PpORFX early in the project will be helpful for planning of the genotyping task since they will
enable us to measure the pattern of linkage disequilibrium decay and select an appropriate
marker density for association mapping (Remington et al 2001). These sequences will also
allow us to estimate the power of the association mapping analysis. Cloning of short and long-range PCR products will be used to obtain haplotype data, since heterozygosity is likely to be
high in this outcrossing species. The genotyping methodology to be used on the full core set (for
the association study itself) will depend upon the pattern of polymorphism that we find at the
locus.
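The early PpORFX haplotypes could be used to gauge LD decay and choose marker density (Remington et al 2001). The sketch below shows one way to do this. It computes the standard r² statistic for each pair of biallelic sites and reports the values against physical distance. It assumes phased haplotypes coded as strings of 0 and 1, with one character per polymorphic site. The coding and the toy haplotypes are hypothetical.

```python
import math
from itertools import combinations


def r_squared(haplotypes, i, j):
    """r^2 between biallelic sites i and j, from phased haplotypes coded as '0'/'1' strings."""
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j  # coefficient of linkage disequilibrium D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else math.nan


def ld_by_distance(haplotypes, positions):
    """(distance in bp, r^2) for every pair of sites, sorted by distance; monomorphic sites skipped."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(haplotypes, i, j)
        if not math.isnan(r2):
            pairs.append((abs(positions[j] - positions[i]), r2))
    return sorted(pairs)


# Hypothetical haplotypes at three polymorphic sites (positions in bp)
haps = ["000", "011", "100", "111"]
print(ld_by_distance(haps, [10, 250, 900]))
```

In practice one would plot r² against distance, or fit a decay curve. Association markers would then be spaced so that adjacent markers retain appreciable r².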
Table 3. Traits to be measured before and at harvest in field trials of core set. Italics indicate
traits identified as differing between cultivated and uncultivated accessions by Montes
Hernandez (1989).
Category  Traits
Plant     Height; Growth habit
Seed      Germination; Number per fruit; Weight
Fruit     Days to fruiting; Fresh and dry weight; Diameter; Volume; Specific weight; Color; Pedicel length; Number per plant; Calyx color
Flower    Days to flowering; Pedicel length; Length and width of corolla
Stem      No. of nodes; Internode length; Color; Trichome density
Leaf      Length; Width; Number of teeth
A fragment of PpORFX has already been isolated (A Habel and TJV, unpublished results). A
pair of degenerate primers was designed based upon the predicted amino acid sequence of the
tomato ORFX gene product and a homologous expressed sequence tag from Petunia hybrida, a
distant relative in the Solanaceae (Olmstead and Palmer 1992). Among the products obtained
was a 750 bp fragment with 64% identity at the amino acid level to portions of the first and second
exons of tomato ORFX. Twenty-two single nucleotide and six indel polymorphisms have been
identified in this fragment among three closely related tomatillo accessions. To isolate the
remainder of the gene and the flanking intergenic spacers, we are using inverse PCR (Ochman
et al. 1988) and GeneWalker libraries (BD Biosciences).
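The fragment was isolated with degenerate primers designed from a predicted protein sequence. When designing further primers, it may help to score candidate sites by their coding degeneracy. The sketch below does this under two stated assumptions. First, it counts the distinct DNA sequences that encode a peptide exactly under the standard genetic code. That count is a lower bound for real IUPAC mixed-base primers, because a single mixed-base string for Leu, Arg or Ser must include extra codons. Second, it ignores the usual truncation of the 3'-terminal codon. The function names and the toy protein are hypothetical.

```python
# Number of codons for each amino acid in the standard genetic code (61 sense codons)
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def coding_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode `peptide` exactly."""
    total = 1
    for aa in peptide.upper():
        total *= CODONS_PER_AA[aa]
    return total


def least_degenerate_window(protein: str, k: int):
    """(start index, peptide, degeneracy) of the length-k window with the lowest degeneracy."""
    starts = range(len(protein) - k + 1)
    best = min(starts, key=lambda s: coding_degeneracy(protein[s:s + k]))
    return best, protein[best:best + k], coding_degeneracy(protein[best:best + k])


# Hypothetical protein fragment; real primers would target ~6-8 residue windows
print(least_degenerate_window("LLLMWMLL", 3))
```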
It is worth pointing out that the function of the ORFX gene product is not well understood at this
time. The protein was initially thought, based on protein-structure threading, to be a distant
homolog of the human Ras oncogene (Frary et al 2000) but subsequent analysis has cast doubt
on this (TJV, unpublished). Though homologs have been identified in the expressed sequence
tag libraries of other plant species, and distant homologs are present in tomato, there are no
unambiguous homologs of known function (Frary et al 2000). The protein is chiefly expressed
prior to anthesis in developing ovaries; the large-fruited allele is responsible for an increase in
cell number, though not cell size, due to a heterochronic shift in expression timing (Cong et al
2002). Should our work demonstrate the role of ORFX in fruit weight variation in tomatillo, it
would likely motivate studies to elucidate the function of ORFX by examination of the knockout
phenotypes of its two close Arabidopsis homologs.
Additional sequence data
In addition to the haplotype data from PpORFX, we will obtain equivalent data for smaller
regions from three additional unlinked loci in order to determine whether patterns of
polymorphism and linkage disequilibrium are at all unusual at PpORFX. This could provide an
insight into the history of selection at the PpORFX locus. Selection is of interest both because of
the potential role of PpORFX in domestication and also because prior balancing selection could
conceivably contribute to QTL orthology. If a locus with a long history of balancing selection
contributes to variation in a trait that suddenly comes under directional selection, then that locus
may contribute a large proportion of the initial response to selection by virtue of its having built
up numerous, functionally different alleles. There are several well-known cases of interspecific
polymorphisms due to very long-standing balancing selection (e.g. Ioerger et al 1990). Where
such a balancing polymorphism is present, then convergent directional selection in multiple
species might drive convergent allele frequency changes at orthologous loci. This explanation is
entirely speculative, and it relies on arguable assumptions about both the frequency of balancing
selection in nature and the contribution of standing variation to selection response. Yet, the
explanation is consistent with the finding in tomato that the large fruited ORFX allele appears to
have diverged from the extant lineage of small-fruited alleles millions of years ago, long prior to
domestication (Nesbitt and Tanksley 2002). For this reason, it would be desirable to test for the
presence of balancing selection at PpORFX should the locus prove to be associated with fruit
weight variation. Though not all tests of selective neutrality at a locus require an outgroup
sequence (eg Tajima 1989), several do (eg McDonald and Kreitman 1991, Fay and Wu 2000).
In order to have the data needed to perform such tests, we will also sequence the homologs of
these loci from at least one other Physalis species.
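Of the tests mentioned above, Tajima's D (Tajima 1989) needs no outgroup. It compares the mean number of pairwise differences with the number of segregating sites. Positive values indicate an excess of intermediate-frequency variants, which is one signature consistent with balancing selection, although demographic history can produce the same pattern. The sketch below computes the statistic, assuming aligned sequences of equal length with no gaps or missing data. In practice, an established population-genetics package and coalescent simulations for significance would be preferable.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free DNA sequences of equal length (Tajima 1989)."""
    n = len(seqs)
    seg_sites = [k for k in range(len(seqs[0])) if len({seq[k] for seq in seqs}) > 1]
    num_seg = len(seg_sites)
    # The variance term is zero for n = 3, so at least four sequences are required
    if n < 4 or num_seg == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    pairs = list(combinations(seqs, 2))
    # pi: mean number of nucleotide differences per pair of sequences
    pi = sum(sum(a[k] != b[k] for k in seg_sites) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Watterson's estimator is num_seg / a1; D standardizes the difference pi - theta_W
    return (pi - num_seg / a1) / math.sqrt(e1 * num_seg + e2 * num_seg * (num_seg - 1))


# Hypothetical alignments: all-singleton variation vs. intermediate-frequency variation
print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
print(tajimas_d(["AA", "AA", "TT", "TT"]))
```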
Conserved Orthologous Sequences (COS) markers are sequence-tagged markers for mapped,
conserved single-copy genes in tomato (Fulton et al 2002). The position of each COS ortholog
in Arabidopsis thaliana is known (www.sgn.cornell.edu). Thus, the COS markers are a useful
starting point for comparative mapping in other eudicotyledonous species. Steven Tanksley
(Cornell University) has kindly provided us with a number of primer pairs that amplify single-copy
tomatillo orthologs of different COS markers. We have selected four pairs for use in this study
which provide amplification products of sufficient length and are on chromosomes in tomato
other than chromosome two, where ORFX is located (Table 4). Three of the primer pairs will be
used to amplify the loci that are to be sequenced for this section of the project.
Table 4. Loci to be used for study of background levels of polymorphism and linkage
disequilibrium. Listed are the chromosome for the COS marker in tomato, the amplicon size
in tomatillo, and the putative function of the locus.
COS ID  Chromosome  Amplicon size  Putative function
T0142   11          1000-1650 bp   lipid/fatty-acid/isoprenoid metabolism
T0161   9           650-850 bp     MRP-like ABC transporter
T0687   6           650-850 bp     unknown
T1347   7           850-1000 bp    possible apospory-associated protein
Line-cross mapping
In a parallel study, not included within the scope of this proposal, we are using line-cross
methodology to determine whether the region containing ORFX contributes to fruit weight
variation in tomatillo. In the next section, we describe the association mapping approach to
determine whether ORFX itself, rather than a QTL to which it is in LD in the line-cross progeny,
underlies any of the variation.
The use of line cross methods for studying the genetic basis of quantitative variation within and
between species is well established (Mackay 2001). An experimental cross induces linkage
disequilibrium (LD) between loci that differ between the parents. If one measures the trait(s) of
interest and scores markers throughout the genome (at 5-20 cM spacing) in a collection of
segregating progeny (e.g. F2), then one can test whether a particular locus explains a significant
fraction of the phenotypic variation among the progeny. QTL of small effect may evade
detection, but major QTL (that explain 20% or more of the variation) can be reliably detected in
most designs.
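The single-marker test described above can be sketched as a one-way ANOVA on the three F2 genotype classes at one marker. This assumes a single scored marker. Interval or composite interval mapping, which is more commonly used, is not shown. The genotype labels and fruit weights are hypothetical.

```python
from statistics import mean


def single_marker_test(genotypes, phenotypes):
    """One-way ANOVA of phenotype on marker genotype class.
    Returns (R^2, F): R^2 is the fraction of phenotypic variance explained by the marker."""
    grand = mean(phenotypes)
    classes = sorted(set(genotypes))
    ss_total = sum((y - grand) ** 2 for y in phenotypes)
    ss_between = 0.0
    for g in classes:
        ys = [y for gg, y in zip(genotypes, phenotypes) if gg == g]
        ss_between += len(ys) * (mean(ys) - grand) ** 2
    ss_within = ss_total - ss_between
    df_between = len(classes) - 1
    df_within = len(phenotypes) - len(classes)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between / ss_total, f_stat


# Hypothetical F2 data: marker genotype and fruit weight (g) per plant
geno = ["AA", "AA", "Aa", "Aa", "aa", "aa"]
weight = [10, 12, 8, 10, 6, 8]
print(single_marker_test(geno, weight))
```

The F statistic would be compared with an F distribution on (classes - 1, N - classes) degrees of freedom. For a genome scan, the significance threshold would need to be adjusted for the number of markers tested.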
We are isolating markers in tomatillo for the major fruit weight loci in tomato, pepper and
eggplant and measuring the fraction of fruit weight variation explained by segregation at these markers
in an F2 cross between large- and small-fruited genotypes. PpORFX is one of the markers to be
used; additional markers are being obtained by screening an existing tomatillo callus cDNA
library (M. Robertson and TJV, unpublished) using heterologous probes obtained from the
tomato expressed sequence tag collection. Phenotypic and genotypic data will be obtained for
>100 individuals so as to provide sufficient power to detect QTL of moderate effect, and the traits
measured will be the same as those in Table 3.
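A rough approximation can make "sufficient power" concrete. It treats the test for a single QTL as a 1-degree-of-freedom chi-square test. For a locus explaining a fraction q of phenotypic variance, the noncentrality parameter is about N·q/(1 - q). This is a sketch under simplifying assumptions: an additive effect only, a marker at the QTL, and normal residuals. An F2 analysis that also fits dominance uses 2 df, and a genome scan needs a much stricter threshold than a single test. The numbers it produces should therefore be read as optimistic.

```python
import math
from statistics import NormalDist


def approx_power(pve: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a 1-df test for a locus explaining fraction `pve` of the variance.

    The noncentral chi-square(1) with noncentrality n * pve / (1 - pve) is the square of a
    shifted standard normal, so power = Phi(sqrt(ncp) - z) + Phi(-sqrt(ncp) - z),
    where z is the (1 - alpha/2) quantile of the standard normal.
    """
    std = NormalDist()
    z = std.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(n * pve / (1.0 - pve))
    return std.cdf(shift - z) + std.cdf(-shift - z)


# Hypothetical: QTL explaining 5%, 10% and 20% of fruit weight variance, 100 F2 plants,
# with a stringent per-test alpha standing in for a genome-wide threshold
for pve in (0.05, 0.10, 0.20):
    print(pve, round(approx_power(pve, 100, alpha=0.001), 3))
```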
Association mapping
The regions to which QTL are mapped by line cross methods are typically 5-20 cM in length. In
organisms such as P. philadelphica, such regions are too coarse to implicate specific genes. To
isolate the gene itself by positional cloning, the typical strategy is to obtain large numbers of
additional recombinants in the region, and to eliminate segregating background variation, by
progressing through several more generations of genotyping and phenotyping. Once the QTL-containing region has been narrowed down to a manageable number of candidate genes, these
are then typically tested for phenotypic effect using transgenic techniques (Mackay 2001).
Positional cloning is thus laborious, expensive, risky, and difficult to implement in non-model
organisms.
Instead, the contribution of a candidate locus to variation in a trait of interest can be directly
determined by measuring the association between polymorphisms at the locus and trait variation
in a sample of naturally occurring genotypes (Risch, 2000; Risch and Merikangas, 1996). By
‘naturally occurring’, we mean a sample of genotypes with alleles having a sufficiently deep
coalescent history that recombination will have broken down any LD present over long
distances. Thus, while it is not necessary to score the causative polymorphism itself in order to
see an association, it is necessary to score a polymorphism sufficiently close to the causative
one that LD is present between them. With association mapping, one can identify a region
containing a causative polymorphism (i.e. QTL) with much finer resolution than with a line-cross
QTL experiment. Association mapping is greatly facilitated by having a limited candidate region
to start with, because whole-genome association mapping requires an enormous number of
markers and must make severe corrections for multiple tests. A voluminous literature on
association mapping has accumulated in the last several years, particularly in the field of human
medical genetics.
Population structure
An important caveat with association mapping is that, in structured populations, LD between two
polymorphisms may arise even in the absence of physical linkage. Such associations will create
false positives in a mapping experiment. The effect can be understood by considering unlinked
alleles that are, because of population structure, both at high frequency in one subpopulation
and low frequency in another. Even if allele A affects the measured phenotype but B does not,
locus B will appear to be associated with the phenotype. This problem has received a good deal
of attention in the human genetics community and a number of solutions have been proposed
(Devlin and Roeder 1999, Pritchard et al 2000b, Reich and Goldstein 2001).
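The confounding described above is easy to reproduce with a toy example, in which all values are hypothetical. An unlinked allele B has no effect on the phenotype within either of two subpopulations. But B is common in the subpopulation with the higher mean phenotype, so B and phenotype are strongly correlated in the pooled sample.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def subpopulation(mean_pheno, n_carriers, n_noncarriers):
    """Carrier status at unlinked locus B (1/0) and phenotypes for one subpopulation.
    Within each B class the phenotypes alternate mean + 1, mean - 1, so B has no effect here."""
    b, y = [], []
    for status, count in ((1, n_carriers), (0, n_noncarriers)):
        for k in range(count):
            b.append(status)
            y.append(mean_pheno + (1 if k % 2 == 0 else -1))
    return b, y


b1, y1 = subpopulation(20, 18, 2)   # B common, high mean phenotype
b2, y2 = subpopulation(10, 2, 18)   # B rare, low mean phenotype
print("within subpopulation 1:", pearson(b1, y1))
print("within subpopulation 2:", pearson(b2, y2))
print("pooled sample:", pearson(b1 + b2, y1 + y2))
```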
Pritchard et al. (2000b) introduced a popular test for association that corrects for population
structure. The first step is to score the genotypes in the association mapping sample at a
number of unlinked marker loci. A simple statistical test for population structure in the sample is
then performed (Pritchard et al. 2000a). If structure is detected, then one can estimate the
number of subpopulations and the proportion of the genome of each individual that is derived
from each subpopulation. This matrix of estimated proportions is incorporated into the test for
association at the candidate locus, thereby not only correcting for population structure but also
allowing one to detect associations that are in different phase in different subpopulations.
Pritchard and Rosenberg (1999) have shown using simulations that a limited number of
microsatellite loci (15-20) is sufficient to detect stratification under two different models of
population structure. Therefore, we propose to develop 15-20 microsatellite markers in tomatillo
and to genotype these in the core set. If population structure is detected in our sample, we will
incorporate the estimated population structure into the association mapping analysis using the
method of Pritchard et al. (2000b) as later modified by Thornsberry et al. (2001) to accommodate
quantitative traits.
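The idea of folding estimated ancestry into the association test can be sketched as an ordinary least-squares regression. Phenotype is regressed on the candidate-marker genotype, with ancestry proportions as covariates. This is a simplified stand-in, not the exact procedure of Pritchard et al. (2000b) or Thornsberry et al. (2001). The ancestry values and phenotypes are hypothetical, constructed so that the marker has no effect within either subpopulation. With K subpopulations the K ancestry proportions sum to one, so only K - 1 of them can enter the model alongside an intercept.

```python
def ols(X, y):
    """Ordinary least-squares coefficients for y = X b, solving the normal equations
    (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting (small problems only)."""
    p = len(X[0])
    aug = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical sample: two subpopulations; q = estimated ancestry proportion from
# subpopulation 1 (1 or 0 here for simplicity; real estimates are usually fractional).
# Within each subpopulation the candidate allele B has no effect on the phenotype.
rows = []
for q, mean_pheno, n_b1, n_b0 in ((1.0, 20.0, 18, 2), (0.0, 10.0, 2, 18)):
    for b, count in ((1.0, n_b1), (0.0, n_b0)):
        for k in range(count):
            rows.append((b, q, mean_pheno + (1.0 if k % 2 == 0 else -1.0)))

y = [row[2] for row in rows]
naive = ols([[1.0, b] for b, _, _ in rows], y)
adjusted = ols([[1.0, b, q] for b, q, _ in rows], y)
print("effect of B, ignoring structure:", naive[1])
print("effect of B, ancestry as covariate:", adjusted[1])
```

Ignoring structure assigns B a large spurious effect. With the ancestry covariate, the estimated effect of B drops to essentially zero.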
Enriched-microsatellite library
To date, we have constructed an enriched-microsatellite library using the double enrichment
method of Fleischer and Loew (1995). Since plant microsatellites tend to be AT-rich (Cardle et
al. 2000, Gupta and Varshney 2000, Lagercrantz et al. 1993, Powell et al. 1996), we have used
ten different biotin-labeled 30mer oligonucleotide probes for enrichment: (ACT)10, (AGT)10,
(AAG)10, (ATC)10, (ATG)10, (TTC)10, (AAT)10, (TTA)10, (TTG)10 and (AAC)10. These probes are
complementary to all thirty AT-rich trimeric repeat motifs because each one is complementary to
three different overlapping motifs. For instance, probe (ACT)10 is complementary to
microsatellites composed of the motifs ACT, CTA and TAC. We focused on trimeric repeats
because they are the most abundant repeats within genes (Cardle et al. 2000) and because they
are relatively easy to score. After the second enrichment step, four of the probes yielded good
smears in the appropriate size range: (ACT)10, (AAT)10, (AAG)10, and (TTG)10.
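The claim that ten probes cover all thirty AT-rich trimeric motifs can be checked by enumeration. The sketch below makes three assumptions. "AT-rich" means that at least two of the three bases are A or T. Homopolymer units such as AAA are excluded. Each probe covers the three cyclic rotations of its repeat unit, following the convention used in the text. The function names are hypothetical.

```python
from itertools import product

# The ten enrichment probes named in the text (repeat units only).
PROBES = ["ACT", "AGT", "AAG", "ATC", "ATG", "TTC", "AAT", "TTA", "TTG", "AAC"]


def rotations(motif):
    """All cyclic rotations of a repeat unit, e.g. ACT -> {ACT, CTA, TAC}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def at_rich_trimers(min_at=2):
    """Non-homopolymer trinucleotides with at least `min_at` A or T bases (assumed definition)."""
    motifs = set()
    for bases in product("ACGT", repeat=3):
        motif = "".join(bases)
        if len(set(motif)) > 1 and sum(b in "AT" for b in motif) >= min_at:
            motifs.add(motif)
    return motifs


covered = [rotations(p) for p in PROBES]
all_covered = set().union(*covered)
print(len(at_rich_trimers()), len(all_covered))  # 30 30
print(all_covered == at_rich_trimers())  # True
```

Because each rotation class has three members and the ten classes do not overlap, the ten probes account for exactly 3 x 10 = 30 motifs.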
The next step will be to clone these fragments (we are using the pBluescript II KS (+) Vector
from Stratagene), end-sequence the inserts using universal primers, identify those clones
containing unique triplet repeats, and design PCR primers to amplify them. The microsatellites
will then be screened for polymorphism in a small (10-20) set of genotypes. Those that are
sufficiently polymorphic will be used for the analysis of population structure. For genotyping, the
amplicons will be separated in denaturing acrylamide gels, stained with ethidium bromide, and
scored manually.
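The screening steps above have two parts: finding trimeric repeats in the end-sequenced inserts, and judging polymorphism from allele sizes scored in a small panel. Both could be sketched as follows. The minimum repeat length of five units and the function names are illustrative assumptions, not choices stated in this proposal. So is the use of expected heterozygosity (1 - Σp_i², where p_i is the frequency of allele i) as the measure of polymorphism.

```python
import re
from collections import Counter


def find_trimer_repeats(seq, min_units=5):
    """Return (start, unit, copies) for perfect trinucleotide repeats in a read.

    min_units is a hypothetical threshold; only uninterrupted repeats are found.
    """
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) > 1:  # skip mononucleotide runs such as AAAAAA...
            hits.append((m.start(), unit, len(m.group(0)) // 3))
    return hits


def expected_heterozygosity(alleles):
    """Gene diversity, 1 - sum(p_i^2), from a list of scored allele sizes."""
    counts = Counter(alleles)
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

For example, a read in which ACT is repeated seven times yields a single hit. A locus whose alleles are all the same size has a heterozygosity of 0 and would be dropped from the population-structure panel.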
Anticipated results and scientific importance
The proposed study will provide a definitive answer to the question: “Does PpORFX itself
contribute to fruit weight variation in tomatillo?” In answering this question, we will generate the
following resources:
1. A set of 15-20 polymorphic AT-rich trimeric microsatellites
2. Fruit weight and a suite of other domestication-related trait measurements in the core set
3. Multilocus microsatellite genotypes for the core set
4. Haplotype data for four loci, including PpORFX, from a subset of the core set
In addition to the experimental work described above, we plan to review the literature on
homologous QTLs and to evaluate models that may account for QTL orthology.
While the primary emphasis of the study is on the contribution of PpORFX to variation in fruit
weight in tomatillo, the data obtained from this study will also be of use in answering other
questions. The microsatellite genotypes will help to elucidate the phylogeographic structure of
P. philadelphica and possibly shed light on the genetic history of its domestication. The patterns
of sequence polymorphism and LD decay in tomatillo are of interest in their own right, as
evolutionary geneticists currently have such data for only a limited number of plant species.
There is considerable interest in the association mapping of candidate loci underlying complex
traits in non-model organisms and natural populations. For that reason, it is desirable to have
quantitative data on the scale of LD decay in an obligate outcrossing plant such as tomatillo.
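One way to quantify LD decay from the haplotype data is sketched below. It computes r², a standard pairwise LD measure, for every pair of biallelic sites and pairs each value with the physical distance between the two sites. Plotting r² against distance then shows how quickly LD decays. The choice of r², the input format (phased haplotype strings with one character per segregating site), and the function names are assumptions for illustration.

```python
def r_squared(haplotypes, i, j):
    """r^2 between biallelic sites i and j, from phased haplotype strings.

    The allele carried by the first haplotype is used as the reference at each site.
    Returns NaN if either site is monomorphic.
    """
    a_ref, b_ref = haplotypes[0][i], haplotypes[0][j]
    n = len(haplotypes)
    p_a = sum(h[i] == a_ref for h in haplotypes) / n
    p_b = sum(h[j] == b_ref for h in haplotypes) / n
    p_ab = sum(h[i] == a_ref and h[j] == b_ref for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def ld_by_distance(haplotypes, positions):
    """(distance in bp, r^2) for every pair of sites, sorted by distance."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            pairs.append((positions[j] - positions[i], r_squared(haplotypes, i, j)))
    return sorted(pairs)
```

In an obligate outcrosser, r² would be expected to fall off over shorter distances than in a selfing species. The sketch only measures that decline. It does not model it.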
If the results indicate that PpORFX does contribute to fruit weight variation in tomatillo, they
would lay the groundwork for testing the causes of QTL orthology.
Physalis is an attractive system for these studies, as a number of species are native to North
Carolina and would thus be accessible for field manipulations. The results of these studies could
have consequences for many areas of evolutionary genetics by providing novel insights into the
evolution and genetic architecture of complex traits.
Recruitment, education and training plan
The personnel on this project would include the PI (TJV), at least one postdoctoral investigator,
rotating graduate students from the Department of Biology at UNC-CH, at least one
undergraduate thesis student, and, should recruitment be successful, one or more interns from
the two summer undergraduate research programs that are active at UNC-CH (see below). This
will create opportunities for mentoring experiences at many levels. In addition to participating in
the research itself, students and postdocs are expected to participate in lab meetings and
departmental activities; to attend seminars and interact with other scientists at UNC-CH and
neighboring institutions (particularly Duke University and North Carolina State University); to
present their work at scientific meetings; to write and peer review journal articles; to mentor
younger students; and to gain teaching experience where possible.
There are a number of special programs at UNC-CH through which students may be recruited
for this project. Since the programs mentioned here cover all expenses for the students,
additional funds are not included in the budget for this project. Two programs, in particular,
draw undergraduate students to the UNC-CH campus. One is the NIH-funded Summer
Undergraduate Research Experience (SURE). SURE is a competitive program drawing from a
national applicant pool that provides opportunities for students to carry out independent research
projects under the guidance of UNC-CH faculty mentors during the summer months. Through
meetings with guest scientists, the program promotes awareness of the diversity of research
areas, especially areas of current major biological importance. The program also conducts
workshops and field trips that provide information and career guidance about research and other
types of science professions in academia, government, and the private sector. SURE is
intended for students with a genuine desire to pursue careers in experimental research in the
biological and chemical sciences. Students contemplating other occupations in which familiarity
with experimental science would be valuable, e.g., science teaching, are encouraged to apply.
SURE is particularly interested in students having no prior research experience and students
from groups underrepresented in the sciences. Preference is given to individuals completing
their junior year.
UNC-CH also hosts a Summer Pre-Graduate Research Experience (SPGRE) program. Like
SURE, the SPGRE program offers students throughout the country the opportunity to work full-time
on research projects under the direction of UNC-CH faculty members. The program is designed
for students aiming to pursue graduate studies, particularly Ph.D. degrees. SPGRE is more
targeted than SURE toward students from underrepresented groups such as the African
American, Native American, Mexican American, and Puerto Rican populations. Students are
expected to produce a paper as their final product and to present their work at the end-of-the-program poster session. A number of students make oral presentations of their work
during the course of their participation in the program. The program also provides financial
support to undergraduates enrolled at UNC-CH to conduct research during the academic year.
Additional resources allow us to recruit graduate students and postdocs to satisfy our research
and training mission. The Department of Biology administers a training grant in plant genomics
awarded by the statewide administration of the University of North Carolina. It provides
$450,000 over three years, and is currently in its first year. Students recruited to this program
will have the opportunity to participate in this project as rotation students in their first year. The
training grant also helps us to foster a sense of community among the graduate students in plant
genetics, of whom there are now a considerable number at UNC-CH.
At the postdoctoral level, UNC-CH participates in a program named Seeding Postdoctoral
Innovators in Research and Education (SPIRE). SPIRE's mission is to provide well-rounded training
to future researchers and educators while ensuring that the science professions reflect
the nation's racial and gender diversity. SPIRE fellows spend 2/3 of their time doing original
research in the lab setting of their choice at UNC-Chapel Hill. They are expected to publish in
peer-reviewed journals, present research findings at national and international science meetings
and participate in journal clubs and laboratory meetings. In addition, fellows spend 1/3 of their
time teaching courses and mentoring students at one of seven historically minority universities
(HMUs) in North Carolina. Under the mentorship of faculty and staff at UNC-CH and the partner
HMUs, SPIRE fellows develop an undergraduate course within their area of expertise and teach
this course at one of the HMU campuses. While at the HMUs, fellows work closely with a faculty
partner who provides guidance and mentoring. In addition to obtaining research experience and
formal and practical training in science education, SPIRE fellows participate in various
professional development workshops. SPIRE fellows administer a Distinguished Scholars
seminar series for the larger university community and organize an Annual Symposium.
As a consequence of these programs, UNC-CH provides an excellent and diverse training
environment. Students will join the existing project team, which includes, in addition to the PI,
Maria Chacon, a postdoctoral associate with expertise in the genetic structure of domesticated
plants in Mesoamerica. Dr. Chacon organized and carried out the recent tomatillo germplasm
collection trip in Mexico and has been developing an enriched tomatillo microsatellite library.
The team also includes a Biology undergraduate, Matthew Robinson, who joined the laboratory
last year as a sophomore. Mr. Robinson has been developing a tomatillo cDNA library and, for his
independent research project, will map fruit weight QTLs in an F2 population derived from a
large- x small-fruited tomatillo cross.
Timeline of activities
Initial activities (Fall and Winter 2003) will focus on the continued development of microsatellite
primers (including screening) and the sequencing of the PpORFX region. In Spring and Summer
of 2004, the common garden experiment will be established at Central Crops Field Station and
phenotypic data will be collected on the core set. At that time, we will also collect DNA for
subsequent genotyping and scoring of polymorphisms at the PpORFX locus. In Fall and Winter
2004, we will finish genotyping the microsatellites, scoring the polymorphisms at
PpORFX, and sequencing the three unlinked loci in the reduced set of genotypes. The Spring
and Summer of 2005 will be devoted to analysis and follow-up work.