Development of multilocus nuclear intronic markers for a phylogeographic study of Platystemon californicus

Nathan Poslusny
Biology 98 End-of-Semester Progress Report, Spring 2005
Advisor: Todd Vision, Department of Biology

Abstract

Platystemon californicus is an obligately outcrossing annual plant that inhabits both isolated serpentine habitats (magnesium- and iron-rich soils) and a variety of non-serpentine habitats. The objective of this study is to develop multilocus nuclear intronic markers to analyze the phylogeny of Platystemon and infer the evolutionary history of its colonization of serpentine soils. The plant comparative genomic database Phytome was used to find conserved protein sequences from plant taxa related to Platystemon that could be used to design degenerate primers. Degenerate primers spanning introns were designed using the COnsensus-Degenerate Hybrid Oligonucleotide Primer (CODEHOP) strategy, and PCR conditions were optimized for each primer pair. This study shows that new comparative sequence analysis tools may aid the design of intronic markers that could be broadly applicable to systematic and phylogeographic studies.

Introduction

The objective of this study is to design multilocus nuclear intronic markers for use in a phylogeographic study of Platystemon californicus. Platystemon (Papaveraceae) is an obligately outcrossing annual plant distributed from southern Oregon to Baja California and east to Utah and Arizona (Vision, 1998). It inhabits both isolated serpentine habitats, characterized by ultramafic rocks (rich in magnesium and iron) and nutrient-poor soils, and a variety of non-serpentine soils (Vision, 1998). Primarily, we would like to determine whether current serpentine populations of Platystemon are the product of a single colonization or of multiple independent colonizations from neighboring non-serpentine populations.
These two alternative colonization histories can be distinguished by comparing the phylogenetic relationships of serpentine and non-serpentine populations. If present serpentine populations are the result of a single colonization event followed by long-distance dispersal among serpentine sites, then all serpentine populations should be descendants of a single common ancestral population. If they are the result of multiple independent colonizations, then each should be more closely related to nearby non-serpentine populations than to geographically distant serpentine populations.

Previous phylogenetic results using chloroplast and nuclear DNA sequences suggest that serpentine populations of Platystemon were colonized independently from non-serpentine populations (Vision, 1998). However, these results are inconclusive because of the limitations of the genetic markers used. The chloroplast DNA sequences came from the Type 1 intron within the anticodon of trnL(UAA) and the intergenic spacer between trnL(UAA) and trnF(GAA), both in the single-copy region of the chloroplast genome. These sequences were insufficiently polymorphic to distinguish between the alternative colonization hypotheses and produced a poorly resolved cpDNA tree. The nuclear DNA sequences examined were the internal transcribed spacers (ITS) between the 18S and 5.8S and between the 5.8S and 26S rRNA coding regions, known as ITS1 and ITS2, respectively. The rapidly evolving ITS regions and the highly conserved rRNA coding regions form tandemly repeated units present in thousands of copies in the plant genome (Avise 2004). Although such repeats may be subject to the homogenizing effects of concerted evolution (Avise 2004), the ITS regions in Platystemon exhibited only partial homogenization, and some tandem repeats evolved independently of all others.
This made the ITS sequences difficult to interpret and to draw phylogenetic conclusions from.

Multilocus nuclear intronic markers may circumvent many of the problems associated with the chloroplast and ITS markers used in the previous study. Nuclear intronic markers are amplified with polymerase chain reaction (PCR) primers that are anchored in conserved exons and span the target introns (Figure 1). They are useful for phylogenetic studies because introns are under less selective constraint than exons and therefore evolve quickly. Differences in nucleotide sequence within intronic regions can be used to infer patterns of divergence and phylogenetic relationships among species or conspecific populations. Using multiple independent loci is important because it increases the probability of recovering the true species phylogeny and decreases the probability of recovering gene trees that reflect only the history of a single locus (Crow et al. 2004, Maddison 1997, Nichols 2001).

The challenge with multilocus nuclear intronic markers is that they are difficult to design, because limited DNA sequence data exist for most plant species. Such markers can only be designed for species for which DNA sequence data are available and the positions of introns relative to exons are known. The lack of sequence data for many species, such as Platystemon, has hindered the development of these markers for their phylogenetic analysis. However, new comparative sequence analysis tools and degenerate primer design strategies may aid the design of markers for plant species with unknown target sequences. With these tools, degenerate primers can be designed for an unknown target sequence from a consensus of available DNA sequences from conserved genomic regions of related plant taxa.
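As a simplified illustration of the consensus idea in the preceding paragraph, the Python sketch below collapses each column of a DNA alignment into the IUPAC ambiguity code that covers every nucleotide observed there. It is not CODEHOP, which works from blocks of aligned protein sequences and a codon-usage table and keeps a non-degenerate 5' clamp. The function name and example sequences are hypothetical, and the sketch assumes the input contains only A, C, G, T and the gap character '-'.

```python
# IUPAC ambiguity code for each set of observed nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned_seqs):
    """Return one IUPAC code per alignment column.

    Gap characters ('-') are ignored; a column containing only gaps yields '-'.
    Assumes sequences contain only A, C, G, T and '-'.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_seqs):
        bases = frozenset(b.upper() for b in column if b != "-")
        consensus.append(IUPAC[bases] if bases else "-")
    return "".join(consensus)


if __name__ == "__main__":
    # Three hypothetical aligned exon fragments from related taxa.
    print(degenerate_consensus(["ATGAAGTTC", "ATGAAATTC", "ATGAAGCTC"]))  # ATGAARYTC
```

Columns where all taxa agree stay non-degenerate, while variable columns become ambiguity codes, so long conserved stretches of the consensus are the natural places to anchor primers.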
In this study, the plant comparative genomic database Phytome (www.phytome.org) was used to search and align unipeptides (protein sequences predicted from EST contigs) within gene families from plant species related to Platystemon. The most conserved unipeptide sequences were used to design degenerate primers with the COnsensus-Degenerate Hybrid Oligonucleotide Primer (CODEHOP) strategy (Rose et al. 1998). Sequences obtained with these degenerate primers will then be used to design specific (nondegenerate) primers for sequencing serpentine and non-serpentine populations sampled across the geographic range of Platystemon. The ultimate goal is to construct phylogenetic trees from these sequences to infer the history of colonization of serpentine soils by Platystemon from non-serpentine populations.

Methods

Selection of Genes and Primer Design

The primer design pipeline used in this study (Figure 2) was initially developed by Dr. Stephanie Hartmann and uses the Phytome database, which contains protein sequences predicted from aligned EST contigs and clustered into gene families. Phytome was queried for gene families present in both Eschscholzia californica (California poppy), the closest relative of Platystemon in the database, and Arabidopsis thaliana, a species for which intron positions have been predicted genome-wide. Gene families were selected according to several criteria: they had no more than one Arabidopsis member and no more than one Eschscholzia member; they had sufficient sequence overlap between Arabidopsis and Eschscholzia (> 80 amino acids); and they had long uninterrupted stretches of unipeptide sequence (> 50 amino acids), with no more than four gaps, each shorter than 20 amino acids. A Perl script written by Jason Phillips was used to retrieve the DNA sequences of the selected gene families from EST contigs stored in Phytome.
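The selection criteria above amount to a simple filter over per-family summary statistics. The sketch below assumes those statistics have already been computed from the Phytome alignments; the `GeneFamily` record and its field names are hypothetical and are not part of Phytome or of the pipeline used here. The gap limit is a parameter because the Methods give four gaps while Figure 2 gives five.

```python
from dataclasses import dataclass, field


@dataclass
class GeneFamily:
    """Hypothetical per-family summary; field names are illustrative only."""
    family_id: int
    n_arabidopsis: int     # Arabidopsis members in the family
    n_eschscholzia: int    # Eschscholzia members in the family
    overlap_aa: int        # Arabidopsis/Eschscholzia unipeptide overlap (amino acids)
    longest_run_aa: int    # longest uninterrupted unipeptide stretch (amino acids)
    gap_lengths_aa: list[int] = field(default_factory=list)  # length of each gap


def passes_criteria(fam: GeneFamily, max_gaps: int = 4, max_gap_len: int = 20) -> bool:
    """Apply the gene-family selection criteria to one family."""
    return (
        # Families come from a query for both species, so "no more than one"
        # member per species means exactly one.
        fam.n_arabidopsis == 1
        and fam.n_eschscholzia == 1
        and fam.overlap_aa > 80
        and fam.longest_run_aa > 50
        and len(fam.gap_lengths_aa) <= max_gaps
        and all(gap < max_gap_len for gap in fam.gap_lengths_aa)
    )


if __name__ == "__main__":
    candidates = [
        GeneFamily(1, 1, 1, overlap_aa=120, longest_run_aa=64, gap_lengths_aa=[5, 12]),
        GeneFamily(2, 2, 1, overlap_aa=150, longest_run_aa=90),
        GeneFamily(3, 1, 1, overlap_aa=95, longest_run_aa=55, gap_lengths_aa=[25]),
    ]
    print([f.family_id for f in candidates if passes_criteria(f)])  # [1]
```

Family 2 fails because it has two Arabidopsis members, and family 3 fails because one gap is too long.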
Intron positions within each gene family were inferred from Arabidopsis gene models obtained from The Institute for Genomic Research (TIGR; www.tigr.org). The positions of intron-exon junctions in Arabidopsis were assumed to be conserved in all taxa, including Platystemon. Conserved exons were used to design degenerate PCR primers with the CODEHOP strategy (http://blocks.fhcrc.org/codehop; Rose et al. 1998). In the CODEHOP program, all settings were left at their defaults, unipeptide sequences were unweighted, and codon usage was set to Eschscholzia californica. The CODEHOP primers were aligned to the original DNA sequences using Cinema 5 version 0.2.1. Because the CODEHOP primers did not consistently match the consensus nucleotide at each position, modified primers were designed by eye to better match both the consensus DNA sequence and the Eschscholzia sequence. The primer positions predicted by CODEHOP were not changed.

DNA Extraction

Dried, pressed samples of Platystemon were collected by Dr. Todd Vision between March 28, 1995 and November 20, 1997 from a total of 41 serpentine and non-serpentine sites across the geographic range of Platystemon. DNA was extracted from these samples using a Qiagen DNeasy Plant Mini Kit according to the manufacturer's protocol. DNA was quantified with the PicoGreen dsDNA Quantification Kit (Molecular Probes) on a Tecan GENios fluorometer using Magellan version 3.11 software.

PCR Amplification

Primers were synthesized by MWG-Biotech. PCR conditions for each primer pair were optimized across three MgCl2 concentrations (2.0, 2.5, and 3.0 mM) and six annealing temperatures ranging from 60.0 to 69.7 °C. Each 25 µL PCR cocktail contained 10X Promega PCR buffer, 10 µM of each primer, 10 µM dNTPs, 1 unit of Taq DNA polymerase (Promega), 2.0-3.0 mM MgCl2, and 5-20 ng of template DNA.
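The optimization design above crosses three MgCl2 concentrations with six annealing temperatures, giving 18 conditions per primer pair. The sketch below enumerates that grid and, as a worked C1V1 = C2V2 example, computes the volume of MgCl2 stock needed per 25 µL reaction. It assumes evenly spaced temperatures (a gradient block sets its own column temperatures) and a hypothetical 25 mM MgCl2 stock; neither value comes from this study.

```python
from itertools import product

MGCL2_MM = [2.0, 2.5, 3.0]  # MgCl2 concentrations tested (mM)


def annealing_temps(low: float = 60.0, high: float = 69.7, n: int = 6) -> list[float]:
    """n annealing temperatures (°C) from low to high.

    Assumes even spacing; a real gradient block sets its own column temperatures.
    """
    step = (high - low) / (n - 1)
    return [round(low + i * step, 2) for i in range(n)]


def optimization_grid() -> list[tuple[float, float]]:
    """Every (annealing temperature, MgCl2 mM) combination for one primer pair."""
    return list(product(annealing_temps(), MGCL2_MM))


def mgcl2_volume_ul(final_mm: float, reaction_ul: float = 25.0, stock_mm: float = 25.0) -> float:
    """Stock volume (µL) giving final_mm in reaction_ul, from C1*V1 = C2*V2.

    stock_mm = 25.0 is a hypothetical stock concentration.
    """
    return final_mm * reaction_ul / stock_mm


if __name__ == "__main__":
    print(annealing_temps())
    print(len(optimization_grid()))  # 18
    for mg in MGCL2_MM:
        print(f"{mg} mM MgCl2: {mgcl2_volume_ul(mg):.1f} µL of stock per reaction")
```

With a 25 mM stock and a 25 µL reaction, the stock volume in microliters equals the target concentration in millimolar (2.0 µL for 2.0 mM, and so on), which makes the three magnesium conditions easy to pipette.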
Amplification reactions were performed on an MJ Research PTC-225 thermal cycler. The test cycling program was 94 °C for 1 min; 30 cycles of 94 °C for 1 min, 60.0-69.7 °C for 1 min, and 72 °C for 90 s; and a final extension at 72 °C for 2 min. PCR products were run on 1.5% agarose gels for 45 minutes at 94 V and visualized by ethidium bromide staining.

Sequencing

Amplified PCR products were prepared for sequencing using the Qiagen QIAquick PCR Purification Kit according to the manufacturer's protocol. The purified PCR product was quantified by the same method as the template DNA. PCR products were sequenced using the ABI Prism dGTP BigDye Terminator v3.1 Cycle Sequencing Ready Reaction Kit, and the reactions were run on an ABI 3100-Avant Genetic Analyzer (Applied Biosystems) at the Evolgen Sequencing Facility at UNC-Chapel Hill. The nucleotide sequences were analyzed and edited using Vector NTI (InforMax Inc.).

Results

DNA was extracted from 16 of the 41 samples, drawn from populations across the geographic range of Platystemon, for initial testing of the nuclear intronic markers. The dried, pressed samples yielded DNA concentrations of 0.202-34.187 ng/µL. All extractions amplified successfully in a control PCR with primers ITS4 and ITS5, which amplify the ITS region (ITS1, 5.8S, and ITS2) in most plants.

Phytome retrieved a total of 2077 gene families present in both Arabidopsis and Eschscholzia. Approximately 380 of these gene families were searched, and 17 were selected for further examination based on the criteria described in the Methods (Table 1). Of these 17 families, three were excluded from primer design because their exons were too degenerate or too short, or because introns occurred only in regions where sequence data were limited to Arabidopsis. A total of seven primer pairs, with primers 33-39 nucleotides long, were designed for this study (Table 2).
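Two properties of a degenerate primer follow directly from its IUPAC string: its degeneracy (how many distinct oligonucleotides the mixture contains) and its G+C content. The sketch below treats every encoded sequence as equally represented and averages G+C content over them; that convention is an assumption and may differ from how the G+C values reported here were calculated. The function names are illustrative.

```python
from math import prod

# Nucleotides represented by each IUPAC code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC_BASES[code]) for code in primer.upper())


def expected_gc_percent(primer: str) -> float:
    """Mean G+C content (%) over all encoded sequences.

    Each ambiguous position contributes the fraction of its bases that are G or C.
    """
    gc = sum(
        sum(base in "GC" for base in IUPAC_BASES[code]) / len(IUPAC_BASES[code])
        for code in primer.upper()
    )
    return 100 * gc / len(primer)


if __name__ == "__main__":
    # First forward primer listed in Table 2, with the spaces removed.
    fwd = "MGRTCHWATGATGAARTYGCWATWTGTTCCTTY"
    print(len(fwd), degeneracy(fwd), round(expected_gc_percent(fwd), 1))  # 33 768 35.9
```

Degeneracy grows multiplicatively with each ambiguous position, which is why CODEHOP confines degeneracy to a short 3' core and relies on a consensus clamp for the rest of the primer.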
Each primer consisted of a short degenerate 3' core region (11-12 nt) and a non-degenerate 5' consensus clamp region (18-25 nt), following the CODEHOP design strategy (Rose et al. 1998). The primers had G+C contents of 33.7-56.6% and melting temperatures (Tm) of 65.7-73.9 °C. The primers targeted six intronic regions within five gene families. Primer pairs for gene families 4442, 7181, and 7734 amplified a PCR product (Table 2). The other primer pairs failed to amplify under the tested PCR conditions; additional conditions are being tested. Only the PCR product for gene family 7734 has been sequenced, and it could not be verified as the target locus because the sequencing results indicated that multiple loci had been amplified.

Discussion

We have developed a novel method to design multilocus nuclear intronic markers for an orphan species, using a consensus of available sequences from conserved genomic regions of related plant taxa. Using this method, we identified 14 of the roughly 380 gene families searched as candidates for PCR primer design and designed seven primer pairs targeting six conserved intronic regions within five of these gene families. Of the seven degenerate primer pairs, four amplified a PCR product. The sequence data for 7734.2a indicate that multiple loci were amplified, so this PCR product will be cloned to isolate the target amplicon, if present, from the others. Once the product has been verified as the target locus, specific primers will be designed. These primers will be used in PCR assays on samples from a wide geographic range to verify that the target locus is sufficiently polymorphic for the phylogenetic study. Future plans include developing primers for the other gene families and optimizing PCR conditions for them. Polymorphisms within these intronic regions will be used to study the serpentine colonization history of Platystemon.
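The contrast drawn in the Introduction can be phrased as a monophyly check on an estimated rooted tree: under a single colonization, the serpentine populations should form one clade; under multiple independent colonizations, they should not. The sketch below performs that check on a tree written as nested Python tuples. The tree format and population labels are hypothetical, and a real analysis would estimate trees with phylogenetic software, account for uncertainty in the topology, and combine evidence across loci rather than rely on a single gene tree.

```python
from typing import Union

# A rooted tree is either a leaf (population name) or a tuple of child subtrees.
Tree = Union[str, tuple]


def collect_clades(tree: Tree, clades: list) -> frozenset:
    """Append the leaf set of every node (leaves included) to `clades`.

    Returns the leaf set of `tree` itself.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def is_monophyletic(tree: Tree, taxa) -> bool:
    """True if some node's descendant leaves are exactly `taxa`."""
    clades: list = []
    collect_clades(tree, clades)
    return frozenset(taxa) in clades


if __name__ == "__main__":
    serpentine = {"S1", "S2"}
    # Each serpentine population groups with a nearby non-serpentine one.
    independent = ((("S1", "N1"), ("S2", "N2")), "N3")
    # Serpentine populations share a single common ancestor.
    single = ((("S1", "S2"), "N1"), "N2")
    print(is_monophyletic(independent, serpentine))  # False
    print(is_monophyletic(single, serpentine))       # True
```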
Although more research is needed to assess how effective this method is for designing multilocus nuclear intronic markers, the approach could be broadly applicable to systematic and phylogeographic studies. The lack of sequence data for most plant species limits the types and number of molecular markers that can be used to assess phylogenetic relationships among species or conspecific populations. However, this study shows that new developments in comparative sequence analysis tools may aid the design of new intronic markers.

References

Avise, J.C. (1994) Molecular Markers, Natural History, and Evolution. Chapman and Hall: New York.
Crow, K.D., Kanamoto, Z., Bernardi, G. (2004) Molecular phylogeny of the hexagrammid fish using a multi-locus approach. Molecular Phylogenetics and Evolution 32: 986-997.
Maddison, W.P. (1997) Gene trees in species trees. Systematic Biology 46: 523-536.
Nichols, R. (2001) Gene trees and species trees are not the same. Trends in Ecology & Evolution 16: 358-363.
Rose, T. et al. (1998) Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Research 26: 1628-1635.
Vision, T.J. (1998) Differentiation among serpentine populations of Platystemon californicus: environmental, historical, and genetic influences. PhD dissertation, Princeton University.

Acknowledgements

I thank Dr. Todd Vision for allowing me to work in the Vision lab, Dr. Stephanie Hartmann for designing the primer design pipeline, Dr. Maria Tsompana for her guidance with the lab protocols, Jason Phillips for writing the Perl script that accessed DNA sequence data from Phytome, Dr. Paul Gabrielson for his advice on lab technique, and Dr. Eric Ganko for his advice on writing this paper.

Table 1: The Phytome identification numbers of the gene families initially selected for primer design and the corresponding Eschscholzia californica and Arabidopsis thaliana unipeptide sequence identification numbers.
Asterisks indicate gene families that could not be used for primer design because of short or degenerate exons or a lack of introns.

Gene family  E. californica unipeptide  A. thaliana unipeptide  Comments
4011         Ecal673                    Atha51630               8 exons; primers designed for intron 3
4442         Ecal1337                   Atha47999               2 exons; primers designed for intron 1
4593         Ecal1225                   Atha29868               3 highly conserved exons (25-40 codons)
4802         Ecal3802                   Atha45172               4 conserved exons
4897         Ecal5462                   Atha33336               10 conserved exons; short exons (~20 codons)
5318         Ecal1080                   Atha48559               3 conserved exons
5532         Ecal1675                   Atha35375               2 conserved exons
6259         Ecal3427                   Atha73110               5 conserved exons
6794         Ecal1940                   Atha50946               5 conserved exons
7181         Ecal4770                   Atha33492               2 conserved exons; primer designed for intron 1
7437         Ecal849                    Atha34640               8 conserved exons; primers designed for introns 4-8
7734         Ecal812                    Atha37444               3 conserved exons; primers designed for intron 2
6721         Ecal1448                   Atha34173               4 conserved exons
8372         Ecal5273                   Atha64047               3 conserved exons
5926*        Ecal1183                   Atha37340               No introns
7209*        Ecal841                    Atha37680               No introns
7984*        Ecal5237                   Atha31332               Exons too degenerate to design primers

Table 2: Degenerate primers and optimal PCR conditions.
Gene family  Target intron(s)  PCR product size (bp)  Optimal PCR conditions (°C / mM Mg2+)
4011         3                 N/A                    N/A
4442         1                 ~500                   50.0 / 3.0
7181         1                 N/A                    N/A
7437         4, 5, 6           N/A                    N/A
7437         7, 8              N/A                    N/A
7734         2                 ~300                   60.6 / 2.5
7734         2                 ~300                   60.6 / 3.0

Forward primers (5'-3'), in row order:
MGRTCHWATG ATGAARTYGC WATWTGTTCC TTY GCDTTCAAGG TRTATGAAAG AGGTRTYRAG ATATTCAAR TGAMGCTGGW GCTTCTCTRC TCTTTGGTTT CYT TCCTGYCCWA AGGDTCDNTR AGACAYAGR GA CCWCCWRRAA AGYTKGARCT YTTCTCWTAY GAR YRTBTCDGCW GCDTTYCGYC GYTCAGCTGA TGCDYT YTDGTGGTKGA AGGTYTVDST GATTTTGGAA AYRT

Reverse primers (5'-3'), in row order:
DACRWAAACA CCAKKYAGRT CYTGHARATT AAATATCTCM GCTGCWCGRG MWATYTCRTA MATGGGYGAG AMCCKWGCAA GAAGRACAAA YTY CCA CTDCCDGCWC GRAGAAKAGT HGGCAYCCAT CCD CARDATCDTC TTCARRTCVC CHRABTGVAM HCCAGTGTT RGBDGWRTCC CAAADYTCAA TRTCCARCCA TTT RGBDGWRTCC CAAADYTCAA TRTCCARCCA TTT

Figure 1: A. Multilocus nuclear intronic markers are anchored in conserved exon regions and span target introns. B. Some markers span multiple intronic regions because intermediate exons were too degenerate or too short for primer design; these markers were required to be shorter than 2,000 bp to facilitate sequencing.
A. Conserved exon 1 | Intron | Conserved exon 2
B. Conserved exon 1 | Intron 1 | Exon 2 | Intron 2 | Conserved exon 3

Figure 2: PCR primer design pipeline.
1. Phytome was queried for gene families found in both Arabidopsis and Eschscholzia: 2077 candidate gene families.
2. 380 gene families were searched for the specified criteria: only one Arabidopsis or Eschscholzia member; sufficient overlap between Arabidopsis and Eschscholzia (> 80 amino acids); long uninterrupted stretches of unipeptide sequence (> 50 amino acids); no more than 5 gaps within the unipeptide sequence; gaps shorter than 20 amino acids.
3. 17 gene families met the selected criteria.
4. Intron positions were predicted; 15 gene families had introns within unipeptide sequences.
5. Sequences around introns were realigned and conserved exons were identified visually.
6. Degenerate PCR primers for target intronic regions were predicted from conserved exons using the CODEHOP primer design strategy; primers were predicted for 14 gene families.
7. CODEHOP-predicted degenerate primers were modified to better correspond with the consensus DNA sequence and the Eschscholzia DNA sequence; the predicted primer position was not modified.
8. 7 degenerate PCR primer pairs have been designed for 5 gene families.
9. Each primer pair was optimized for Mg2+ concentration and annealing temperature.
10. 4 degenerate PCR primer pairs have amplified a PCR product.
Future research: clone and sequence a small sample of genotypes; design specific primers; amplify and sequence from the full collection; analyze phylogeny.