Development of multilocus nuclear intronic markers for a phylogeographic study of
Platystemon californicus
Nathan Poslusny
Biology 98
End-of-Semester Progress Report
Spring 2005
Advisor: Todd Vision, Department of Biology
Abstract
Platystemon californicus is an obligately outcrossing annual plant that inhabits
both isolated serpentine habitats (magnesium- and iron-rich soils) and a variety of
non-serpentine habitats. The objective of this study is to develop multilocus nuclear intronic
markers to analyze the phylogeny of Platystemon and infer the evolutionary history of
serpentine colonization. In this study, the plant comparative genomic database, Phytome,
was used to find conserved protein sequences from plant taxa related to Platystemon that
could be used to design degenerate primers. Degenerate primers were designed to span
introns using the COnsensus-Degenerate Hybrid Oligonucleotide Primer (CODEHOP)
strategy and were optimized for PCR conditions. This study shows that new
developments in comparative sequence analysis tools may aid the design of intronic
markers that could be broadly applicable for systematic and phylogeographic studies.
Introduction
The objective of this study is to design multilocus nuclear intronic markers for use
in a phylogeographic study of Platystemon californicus. Platystemon (Papaveraceae) is
an obligately outcrossing annual plant distributed from southern Oregon to Baja
California and east to Utah and Arizona (Vision, 1998). It inhabits both isolated
serpentine habitats, characterized by ultramafic rocks (magnesium- and iron-rich) and
nutrient-poor soils, and a variety of non-serpentine soils (Vision, 1998). Primarily, we
would like to determine whether current serpentine populations of Platystemon were the
product of a single colonization or multiple independent colonizations from neighboring
non-serpentine populations.
These two alternative colonization histories can be distinguished from each other
by comparing the phylogenetic relationships of serpentine and non-serpentine
populations. If present serpentine populations are the result of a single colonization event
and subsequent long-distance dispersal between serpentine populations, then all the
serpentine populations should be descendants of a single common ancestral population.
If present serpentine populations are the result of multiple independent colonizations, then
they should be more closely related to nearby non-serpentine populations than to
geographically distant serpentine populations.
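The two predictions can be checked mechanically once a rooted population tree is available. The following Python sketch is illustrative only: the nested-tuple tree format, the population labels, and both helper functions are hypothetical and are not part of the study.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some subtree contains exactly the populations in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical population labels: S = serpentine, N = non-serpentine.
serpentine = {"S1", "S2", "S3"}

# Single colonization: all serpentine populations share one ancestor.
single_origin = (("S1", ("S2", "S3")), ("N1", ("N2", "N3")))

# Multiple colonizations: each serpentine population groups with a neighbor.
multiple_origins = (("S1", "N1"), (("S2", "N2"), ("S3", "N3")))

print(is_monophyletic(single_origin, serpentine))     # True
print(is_monophyletic(multiple_origins, serpentine))  # False
```

In practice the comparison would also need to account for statistical support of each clade, which this sketch ignores.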
Previous phylogenetic results using chloroplast and nuclear DNA sequences
suggest that serpentine populations of Platystemon have been independently colonized
from non-serpentine populations (Vision, 1998). However, these results are inconclusive
because of the limitations of the genetic markers used in the study. The chloroplast DNA
sequences were from the Type 1 intron within the anticodon of trnL(UAA) and the
intergenic spacer between trnL(UAA) and trnF(GAA) from the single-copy region of the
chloroplast genome. These sequences were shown to be insufficiently polymorphic to
distinguish between the alternative colonization hypotheses and resulted in a cpDNA tree
of poor resolution. The nuclear DNA sequences examined were from the internal
transcribed spacers (ITS) between the 18S and 5.8S or the 5.8S and 26S rRNA coding
regions, known as ITS1 and ITS2, respectively. The rapidly evolving ITS regions and
the highly conserved rRNA coding regions form tandemly repeated elements that occupy
thousands of loci in the plant genome (Avise 2004). Although repeats may be subject to
the homogenization effects of concerted evolution (Avise 2004), the ITS regions in
Platystemon exhibited only partial homogenization and some tandem repeat sequences
evolved independently of all others. This made it difficult to interpret and draw
phylogenetic conclusions from the ITS sequences.
Multilocus nuclear intronic markers may circumvent many of the problems
associated with the chloroplast and ITS markers used in the previous study.
Nuclear intronic markers are amplified with polymerase chain reaction (PCR) primers
that are anchored in conserved exons and span target introns (Figure 1). They are useful
markers for phylogenetic studies because introns are under less selective constraint than
exons and therefore evolve quickly. Differences in nucleotide sequences within intronic regions can be
used to infer patterns of divergence and phylogenetic relationships between species or
conspecific populations. The use of multiple independent loci for phylogenetic analyses
is important because it increases the probability of constructing a true species phylogeny
and decreases the probability of recovering gene trees that only represent the evolution of
the single locus (Crow et al. 2004, Maddison 1997, Nichols 2001).
The challenge with using multilocus nuclear intronic markers is that they are
difficult to design because limited DNA sequence data exists for most plant species.
Nuclear intronic markers can only be designed for species in which the DNA sequence
data is available and the position of introns relative to the exons is known. The lack of
sequence data for many species, such as Platystemon, has deterred the development of
these markers for phylogenetic analyses of these species. However, new comparative
sequence analysis tools and degenerate primer design strategies may aid in the design of
markers for plant species with unknown target sequences. Using these tools, degenerate
primers can be designed for an unknown target sequence based on a consensus of
available DNA sequences from conserved genomic regions of related plant taxa.
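As an illustration of how a consensus of related sequences can define a degenerate primer site, the following Python sketch collapses a gap-free nucleotide alignment into IUPAC ambiguity codes. It is not the CODEHOP algorithm, which works from protein alignments. The function name and the example sequences are hypothetical.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned):
    """Collapse equal-length, gap-free aligned DNA sequences into one IUPAC string."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*(seq.upper() for seq in aligned))
    return "".join(IUPAC[frozenset(column)] for column in columns)


# Made-up aligned sequences from three hypothetical related taxa.
example = ["ATGGCA", "ATGGCT", "ATAGCC"]
print(degenerate_consensus(example))  # ATRGCH
```

Columns where all taxa agree stay as a single base, and variable columns become the ambiguity code covering every observed base.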
In this study, the plant comparative genomic database Phytome
(www.phytome.org) was used to search and align unipeptides (predicted protein
sequences from EST contigs) within gene families of plant species related to
Platystemon. The most conserved unipeptide sequences were used to design degenerate
primers using the COnsensus-Degenerate Hybrid Oligonucleotide Primer (CODEHOP)
strategy (Rose et al. 1998). Sequences amplified with the degenerate primers will then be
used to design specific (nondegenerate) primers to sequence serpentine and non-serpentine
populations sampled across the geographic distribution of Platystemon. The ultimate goal is to construct
phylogenetic trees from these sequences to infer the history of colonization of
Platystemon from non-serpentine to serpentine populations.
Methods
Selection of Genes and Primer Design
The PCR design pipeline used in this study was initially developed by Dr.
Stephanie Hartmann and utilizes the Phytome database. This database contains protein
sequences predicted from aligned EST contigs and clustered into gene families. A search
was made on Phytome to generate gene families that are present in Eschscholzia
californica (California poppy), the closest related species to Platystemon in the Phytome
database, and Arabidopsis thaliana, a species for which the intronic regions of the
genome have been predicted. Gene families were selected on several criteria: they had
no more than one Arabidopsis member or Eschscholzia member, they had sufficient
sequence overlap between Arabidopsis and Eschscholzia (> 80 amino acids), and they
had long uninterrupted stretches of unipeptide sequence (> 50 amino acids) with no
more than four gaps of fewer than 20 amino acids each. A Perl script written by Jason
Phillips was used to obtain the DNA sequences for the selected gene families from EST
contigs stored in the Phytome database.
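The selection criteria above can be expressed as a simple filter. The Python sketch below is one reading of those criteria, not the pipeline actually used. The record fields, the function name, and the example records are hypothetical, and the gap limit is left as a parameter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneFamily:
    """Summary of one gene family; all field names are hypothetical."""
    family_id: str
    n_arabidopsis: int          # Arabidopsis members in the family
    n_eschscholzia: int         # Eschscholzia members in the family
    overlap_aa: int             # aligned overlap between the two species
    longest_stretch_aa: int     # longest uninterrupted unipeptide stretch
    gap_lengths_aa: List[int] = field(default_factory=list)


def passes_criteria(fam: GeneFamily, max_gaps: int = 4) -> bool:
    """Apply the selection criteria described in the text."""
    return (
        fam.n_arabidopsis == 1           # present, with no more than one member
        and fam.n_eschscholzia == 1
        and fam.overlap_aa > 80
        and fam.longest_stretch_aa > 50
        and len(fam.gap_lengths_aa) <= max_gaps
        and all(gap < 20 for gap in fam.gap_lengths_aa)
    )


# Made-up records for illustration; these are not Phytome entries.
families = [
    GeneFamily("fam_A", 1, 1, 120, 75, [3, 8]),
    GeneFamily("fam_B", 3, 1, 200, 90, []),    # too many Arabidopsis members
    GeneFamily("fam_C", 1, 1, 95, 60, [25]),   # one gap is too long
]
selected = [fam.family_id for fam in families if passes_criteria(fam)]
print(selected)  # ['fam_A']
```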
Intron positions within the gene family were based on Arabidopsis gene models
obtained from The Institute for Genomic Research (TIGR; www.tigr.org). The positions of intron
junctions within the Arabidopsis sequence were assumed to be conserved for all the
taxa, including Platystemon. Conserved exons were used to design degenerate PCR
primers using the CODEHOP (http://blocks.fhcrc.org/codehop) strategy (Rose et al.
1998). In the CODEHOP program, all the settings were left as default, unipeptide
sequences were unweighted, and the codon usage was set to Eschscholzia californica.
The CODEHOP primers were aligned to the original DNA nucleotide sequences using
Cinema 5 Version 0.2.1. The CODEHOP primers did not consistently match the
consensus of DNA nucleotides at each position. Therefore, modified primers were
designed by eye to better correspond with the consensus DNA nucleotide
sequences and the DNA nucleotide sequence of Eschscholzia. The primer position
predicted by CODEHOP was not modified.
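A CODEHOP primer combines a non-degenerate 5' consensus clamp with a short degenerate 3' core (Rose et al. 1998). The Python sketch below shows one way to split a primer into these two regions and to count how many distinct sequences the degenerate positions encode. The helper names and the example primer are hypothetical, and the example is not one of the primers designed in this study.

```python
from math import prod

# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_SIZE[base] for base in primer.upper() if not base.isspace())


def split_clamp_core(primer, core_len):
    """Split a 5'-3' primer into its 5' clamp and its 3' core of core_len bases.

    core_len must be a positive integer smaller than the primer length.
    """
    seq = "".join(primer.split()).upper()
    return seq[:-core_len], seq[-core_len:]


# Hypothetical primer: 20-base clamp followed by a 12-base degenerate core.
primer = "GCTGAAGCTTCTCTGCTCTT TGGNTTYCAYGC"
clamp, core = split_clamp_core(primer, 12)
print(clamp, degeneracy(clamp))  # GCTGAAGCTTCTCTGCTCTT 1
print(core, degeneracy(core))    # TGGNTTYCAYGC 16
```

Keeping the degeneracy confined to the short 3' core is what limits the number of primer species in the pool.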
DNA Extraction
Dried, pressed samples of Platystemon were collected between 3/28/95 and
11/20/97 by Dr. Todd Vision. The samples were collected from a total of 41 serpentine
and non-serpentine soils across the geographic distribution of Platystemon. DNA was
extracted from these samples using a Qiagen DNeasy Plant Mini Kit according to the
suggested protocol. The DNA was quantified using the PicoGreen dsDNA Quantification
Kit (Molecular Probes) on a Tecan GENios fluorometer with Magellan version
3.11 software.
PCR Amplification
Primers were synthesized by MWG-Biotech. Each set of primers was optimized
with three magnesium concentrations (2.0 mM, 2.5 mM, and 3.0 mM) across six different
temperatures ranging from 60.0 to 69.7 °C. The PCR cocktails (25 µL) consisted of 10X
Promega PCR buffer, 10 µM of each primer, 10 µM of dNTP, 1 unit of Taq DNA
polymerase (Promega), 2.0-3.0 mM of MgCl2, and 5-20 ng of DNA template. These
amplification reactions were performed using an MJ Research PTC-225 thermal cycler.
The PCR amplification test conditions were: 94 °C for 1 min, followed by 30 cycles of 94 °C
for 1 min, 60.0-69.7 °C for 1 min, and 72 °C for 90 sec, and then 72 °C for 2 min. PCR products were
run on 1.5% agarose gels for 45 minutes at 94 V and visualized with ethidium bromide
staining.
Sequencing
Amplified PCR products were prepared for sequencing using the Qiagen
QIAquick PCR purification Kit according to the suggested protocol. The isolated PCR
product was quantified using the same method as for template DNA. PCR products were
sequenced using the ABI Prism dGTP Big Dye Terminator v3.1 Cycle Sequencing Ready
Reaction Kit. Sequenced products were sent to the Evolgen Sequencing Facility at
UNC-Chapel Hill to be analyzed using the ABI 3100-Avant Genetic Analyzer (Applied
Biosystems). The nucleotide sequences were analyzed and edited using Vector NTI
(InforMax Inc.).
Results
Extractions were performed on 16 of the 41 samples from populations across the
geographic distribution of Platystemon for use in the initial testing of nuclear intronic
markers. DNA was successfully extracted from the dried, pressed samples at
concentrations ranging from 0.202 to 34.187 ng/µL. All of the DNA extractions amplified
successfully in a PCR with the control primers ITS 4 and ITS 5, which amplify the ITS 1
sequence for most plants.
Phytome retrieved a total of 2077 gene families that were found in both
Arabidopsis and Eschscholzia. Approximately 380 of these gene families were
searched, and 17 were selected for further examination based on the
previously specified criteria (Table 1). Of the 17 families, three were
excluded from primer design because the exons were either too degenerate or too short or
because introns were only present in regions where sequence data was limited to
Arabidopsis.
A total of seven primer pairs ranging from 33-39 bp were designed for this study
(Table 2). Each primer consisted of a short degenerate 3’ core region (11-12 bp) and a
non-degenerate 5’ consensus clamp region (18-25 bp) according to the CODEHOP
design strategy (Rose et al. 1998). The primers had a G+C content of 33.7-56.6% and a Tm of
65.7-73.9 °C. The primers were designed for six different exons within five gene
families.
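The report does not state how G+C content was computed for primers containing degenerate positions. One possible approach, sketched below in Python, is to report the minimum and maximum G+C percentage over all sequences that a degenerate primer encodes. The function name and the example primer are hypothetical, and this is not necessarily the calculation behind the values reported above.

```python
# Bases represented by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def gc_range(primer):
    """Return (min %, max %) G+C over all sequences a degenerate primer encodes."""
    seq = "".join(primer.split()).upper()
    # A position always counts toward G+C only if every allowed base is G or C.
    lowest = sum(all(b in "GC" for b in IUPAC_BASES[code]) for code in seq)
    # A position can count toward G+C if at least one allowed base is G or C.
    highest = sum(any(b in "GC" for b in IUPAC_BASES[code]) for code in seq)
    return 100 * lowest / len(seq), 100 * highest / len(seq)


# Hypothetical 8-base example: G, C and S are always G/C; R and Y may be.
print(gc_range("ATGCRYWS"))  # (37.5, 62.5)
```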
Primer pairs for gene families 4442, 7181, and 7734 amplified a PCR product
(Table 2). The other primers failed to amplify with the tested PCR conditions, but
different PCR conditions are being tested. Only the PCR product for the gene family
7734 has been sequenced, but it could not be verified as the target locus because the
sequencing results indicated that multiple loci were amplified.
Discussion
We have developed a novel method to design multilocus nuclear intronic markers
for an orphan species, one lacking sequence data of its own, using a consensus of available
sequences from conserved genomic regions of related plant taxa. Using this method, we have
identified 14 gene families out of 380 as possible candidates for PCR primer design and have
designed seven PCR primer pairs for six conserved introns within five of these gene families.
Of the seven degenerate PCR primer pairs, four successfully amplified a PCR
product. The nucleotide sequence data for 7734.2a indicate that multiple loci have been
amplified. Therefore, the PCR product will be cloned to isolate the correct amplification product,
if present, from the others. Once the PCR product has been verified as the target locus,
specific primers will be designed. Using these primers, PCR assays will be performed on
samples from a wide geographic range to verify that the target locus is sufficiently
polymorphic for use in the phylogenetic study. Future plans include developing primers
for the other gene families and optimizing PCR conditions for these primers.
Polymorphisms within these intronic regions will be used to study the serpentine
colonization history of Platystemon.
Although more research is needed to assess the effectiveness of this method in
designing multilocus nuclear intronic markers, this approach could be broadly applicable
for systematic and phylogeographic studies. The lack of sequence data for most plant
species limits the types and number of molecular markers that can be used to assess
phylogenetic relationships between species or conspecific populations. However, this
study shows that new developments in comparative sequence analysis tools may aid the
design of new intronic markers.
References
Avise, J.C. (1994) Molecular Markers, Natural History, and Evolution. Chapman and
Hall: New York.
Crow, K.D., Kanamoto, Z., Bernardi, G. (2004) Molecular phylogeny of the hexagrammid
fish using a multi-locus approach. Molecular Phylogenetics and Evolution 32: 986-997.
Maddison, W.P. (1997) Gene Trees in Species Trees. Systematic Biology 46: 523-536.
Nichols, R. (2001) Gene Trees and species trees are not the same. Trends in Ecology &
Evolution 16: 358-363.
Rose, T. et al. (1998) Consensus-degenerate hybrid oligonucleotide primers for
amplification of distantly related sequences. Nucleic Acids Research 26: 1628-1635.
Vision, T.J. (1998) Differentiation Among Serpentine Populations of Platystemon
californicus: Environmental, Historical, and Genetic Influences. PhD dissertation:
Princeton University.
Acknowledgements
I thank Dr. Todd Vision for allowing me to work in the Vision lab, Dr. Stephanie
Hartmann for designing the primer design pipeline, Dr. Maria Tsompana for her guidance
with the lab protocols, Jason Phillips for writing the Perl script that accessed DNA
sequence data from Phytome, Dr. Paul Gabrielson for his advice on lab technique, and
Dr. Eric Ganko for his advice in writing this paper.
Table 1: The Phytome identification numbers of the gene families initially selected for
primer design and the corresponding Eschscholzia californica and Arabidopsis thaliana
unipeptide sequence identification numbers. Asterisks indicate the gene families that
were unable to be used for primer design because of short or degenerate exons or no
introns.
Gene Family   Eschscholzia californica   Arabidopsis thaliana   Comments
Phytome ID    Unipeptide ID              Unipeptide ID
4011          Ecal673                    Atha51630              8 exons; Primers designed for intron 3
4442          Ecal1337                   Atha47999              2 exons; Primers designed for intron 1
4593          Ecal1225                   Atha29868              3 highly conserved exons (25-40 codons)
4802          Ecal3802                   Atha45172              4 conserved exons
4897          Ecal5462                   Atha33336              10 conserved exons; short exons (~20 codons)
5318          Ecal1080                   Atha48559              3 conserved exons
5532          Ecal1675                   Atha35375              2 conserved exons
6259          Ecal3427                   Atha73110              5 conserved exons
6794          Ecal1940                   Atha50946              5 conserved exons
7181          Ecal4770                   Atha33492              2 conserved exons; Primer designed for intron 1
7437          Ecal849                    Atha34640              8 conserved exons; Primers designed for introns 4-8
7734          Ecal812                    Atha37444              3 conserved exons; Primers designed for intron 2
6721          Ecal1448                   Atha34173              4 conserved exons
8372          Ecal5273                   Atha64047              3 conserved exons
5926*         Ecal1183                   Atha37340              No introns
7209*         Ecal841                    Atha37680              No introns
7984*         Ecal5237                   Atha31332              Exons are too degenerate to design primers
Table 2: Degenerate primers and optimal PCR conditions.
Gene Family ID  Target Intron(s)  Primer   Sequence (5’-3’)                            PCR Product Size (bp)  Optimal PCR Conditions (°C / mM Mg2+)
4011            3                 Forward  MGRTCHWATG ATGAARTYGC WATWTGTTCC TTY        N/A                    N/A
                                  Reverse  DACRWAAACA CCAKKYAGRT CYTGHARATT
4442            1                 Forward  GCDTTCAAGG TRTATGAAAG AGGTRTYRAG ATATTCAAR  ~500                   50.0/3.0
                                  Reverse  AAATATCTCM GCTGCWCGRG MWATYTCRTA
7181            1                 Forward  TGAMGCTGGW GCTTCTCTRC TCTTTGGTTT CYT        N/A                    N/A
                                  Reverse  MATGGGYGAG AMCCKWGCAA GAAGRACAAA YTY CCA
7437            4, 5, 6           Forward  TCCTGYCCWA AGGDTCDNTR AGACAYAGR GA          N/A                    N/A
                                  Reverse  CTDCCDGCWC GRAGAAKAGT HGGCAYCCAT CCD
7437            7, 8              Forward  CCWCCWRRAA AGYTKGARCT YTTCTCWTAY GAR        N/A                    N/A
                                  Reverse  CARDATCDTC TTCARRTCVC CHRABTGVAM HCCAGTGTT
7734            2                 Forward  YRTBTCDGCW GCDTTYCGYC GYTCAGCTGA TGCDYT     ~300                   60.6/2.5
                                  Reverse  RGBDGWRTCC CAAADYTCAA TRTCCARCCA TTT
7734            2                 Forward  YTDGTGGTKGA AGGTYTVDST GATTTTGGAA AYRT      ~300                   60.6/3.0
                                  Reverse  RGBDGWRTCC CAAADYTCAA TRTCCARCCA TTT
Figure 1: A. Multilocus nuclear intronic markers are designed to be anchored in
conserved exon regions and span target introns. B. Some markers span multiple intronic
regions because intermediate exons were too degenerate or too short for primer design.
These markers were required to be less than 2,000 bp to facilitate sequencing.
A. Conserved Exon 1 | Intron | Conserved Exon 2
B. Conserved Exon 1 | Intron 1 | Exon 2 | Intron 2 | Conserved Exon 3
Figure 2: PCR Primer Design Pipeline
Phytome was used to query for gene families found in Arabidopsis and Eschscholzia.
Result: 2077 candidate gene families.
380 gene families were searched for the specified criteria:
-Must contain only one Arabidopsis or Eschscholzia member
-Sufficient overlap between Arabidopsis and Eschscholzia (> 80 amino acids)
-Long uninterrupted stretches of unipeptide sequence (> 50 amino acids)
-No more than 5 gaps within unipeptide sequence
-Gaps less than 20 amino acids long
Result: 17 gene families met the selected criteria.
Intron positions were predicted.
Result: 15 gene families had introns within unipeptide sequences.
Realigned sequences around introns and visually identified conserved exons. Predicted
degenerate PCR primers for target intronic regions from conserved exons using the
CODEHOP Primer Design Strategy.
Result: Degenerate PCR primers were predicted for 14 gene families.
Modified CODEHOP-predicted degenerate primers to better correspond with the consensus
DNA sequence and the Eschscholzia DNA sequence. The predicted primer position was not
modified.
Result: 7 degenerate PCR primer pairs have been designed for 5 gene families.
Each primer pair was optimized for Mg2+ concentration and temperature.
Result: 4 degenerate PCR primer pairs have amplified a PCR product.
Future Research
-Clone and sequence small sample of genotypes
-Design specific primers
-Amplify and sequence from full collection
-Analyze phylogeny