14 Plant Molecular Systematics

© 2010 Elsevier Inc. All rights reserved. 10.1016/B978-0-12-374380-0.50014-2

ACQUISITION OF MOLECULAR DATA
DNA SEQUENCE DATA
    Polymerase Chain Reaction
    DNA Sequencing Reaction
    Types of DNA Sequence Data
    Analysis of DNA Sequence Data
RESTRICTION SITE ANALYSIS (RFLPs)
ALLOZYMES
MICROSATELLITE DNA
RANDOM AMPLIFIED POLYMORPHIC DNA (RAPDs)
AMPLIFIED FRAGMENT LENGTH POLYMORPHISM (AFLPs)
REVIEW QUESTIONS
EXERCISES
REFERENCES FOR FURTHER STUDY

Molecular systematics encompasses a series of approaches in which phylogenetic relationships are inferred using information from macromolecules of the organisms under study. Specifically, the types of molecular data acquired include those from DNA sequences, DNA restriction sites, allozymes, microsatellites, RAPDs, and AFLPs. (The use of data from other, generally smaller molecules, such as secondary compounds in plants, is usually relegated to the field of "chemosystematics" and will not be reviewed here.) A revolution in inferring the phylogenetic relationships of life is occurring with the use of molecular data. The following is a review of the types of data, methods of acquisition, and methods of analysis of molecular systematics.

ACQUISITION OF MOLECULAR DATA

Plant samples from which DNA is to be isolated may be acquired by various means. It is vital always to collect a voucher specimen, properly mounted and accessioned in an accredited herbarium, to serve as documentation for any molecular systematic study (see Chapter 17). Live samples may be collected and immediately subjected to chemical processing, e.g., for allozyme analysis (see later discussion). For many DNA methods, pieces of leaves (from which chloroplast, mitochondrial, and nuclear DNA can be isolated) are removed from the live plant and immediately dried, typically in a container of silica gel. Alternatively, plant samples may be frozen or placed in concentrated extraction buffer. With any of these procedures, DNA is usually preserved intact. Usable DNA is often successfully isolated from dried herbarium sheets, attesting to the "toughness" of the molecule.

DNA SEQUENCE DATA

Perhaps the most important method for inferring phylogenetic relationships of life is that of acquiring DNA sequences. DNA sequence data refers to the sequence of nucleotides (adenine = A, cytosine = C, guanine = G, or thymine = T; Figure 14.1) in a particular region of the DNA of a given taxon. Comparisons of homologous regions of DNA among the taxa under study yield the characters and character states that are used to infer relationships in phylogenetic analyses. The first step of acquiring DNA sequence data is to identify a particular region of DNA to be compared between species. Much prior research goes into identifying these regions and determining their efficacy in phylogenetic analysis.

Figure 14.1 Molecular structure of the four DNA nucleotides. Adenine and guanine are chemically similar purines; cytosine and thymine are chemically similar pyrimidines.

POLYMERASE CHAIN REACTION

After a gene sequence of interest is identified, the DNA from a given plant sample is first isolated and purified by various chemical procedures. Following this, the DNA sequences of interest are amplified using the polymerase chain reaction (or PCR). The invention of this technology was crucial to modern DNA sequencing, as it permitted rapid and efficient DNA amplification, that is, the replication of a stretch of DNA into many thousands of copies. The polymerase chain reaction works as follows (see Figure 14.2). Prior research establishes the occurrence of relatively short regions of DNA that flank (occur at each end of) the gene or DNA sequence of interest and that are both unique (not occurring elsewhere in the genome) and conserved (i.e., invariable) in all taxa to be investigated.
Unit III Systematic evidence and descriptive terminology

Figure 14.2 Polymerase chain reaction, using cycle sequencing to produce multiple copies of a stretch of DNA. (Panel labels: sample DNA; solution heated, DNA denatures; temperature lowered, primers anneal to conserved regions; free nucleotides, catalyzed by DNA polymerase, bind to primers; DNA strands replicated; DNA renatures; repeat cycle.)

These short, conserved, flanking regions are used as a template for the synthesis of multiple, complementary copies, known as primers. Primers ideally are constructed such that they do not bind with one another. In the polymerase chain reaction, a solution is prepared, made up of the isolated and purified DNA of a sample; multiple copies of primers; free nucleotides; DNA polymerase molecules (typically Taq polymerase, which can tolerate heat); and buffer and salts. This solution is heated to a point at which the sample DNA denatures, whereby the two strands of DNA separate from one another. Once the sample DNA has denatured and the temperature is lowered, the primers in solution may bind with the corresponding, complementary DNA of the sample (Figure 14.2). Following binding of the primer to the sample DNA, individual nucleotides in solution attach to the 3′ end of the primer, with the sample DNA acting as a template; DNA polymerase catalyzes this reaction.
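The primer logic just described can be sketched in a few lines of Python. This is only an illustrative model, not a laboratory protocol: the function names (`reverse_complement`, `amplicon`, `ideal_copies`), the short toy sequences, and the assumption that every cycle exactly doubles the number of copies are all hypothetical simplifications introduced here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (5'->3')."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))


def amplicon(template, forward_primer, reverse_primer):
    """Return the stretch of `template` (top strand, 5'->3') bounded by the two
    primers, or None if either primer site is missing.

    The forward primer matches the top strand directly; the reverse primer
    anneals to the top strand, so its reverse complement is searched for.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    rc = reverse_complement(reverse_primer)
    end = template.find(rc, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(rc)]


def ideal_copies(cycles, starting_copies=1):
    """Copies after `cycles` rounds, ASSUMING perfect doubling each cycle."""
    return starting_copies * 2 ** cycles


# Toy template: conserved flanking regions around a short region of interest.
template = "CC" + "ATCGGTTAGCGA" + "ACGTACGT" + "AATTAGCTCCAAT" + "GG"
product = amplicon(template, "ATCGGTT", "ATTGGAG")
print(product, len(product))
print(ideal_copies(20))  # idealized yield from one starting molecule
```

Real reactions fall short of perfect doubling, so the copy count should be read as an upper bound under the stated assumption.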
A second primer, at the opposite end of the DNA sequence of interest, is used for the complementary, denatured DNA strand. Thus, the two denatured strands of DNA are replicated. After replication, the solution is cooled to allow for annealing of the replicated DNA with the complementary DNA single strands. This is followed by heating to the point of DNA denaturation, and repeating the process. A typical PCR reaction can produce more than a million copies of DNA in a matter of hours.

DNA SEQUENCING REACTION

After DNA is replicated, it is sequenced. The most common sequencing technology involves a machine that reads fluorescent dyes with a laser detector. The production of dye-labeled DNA is very similar to DNA replication using the PCR. The replicated DNA is placed into solution with DNA polymerase, primers, free nucleotides, and a small concentration of synthesized compounds called dideoxynucleotides (discussed below) that are each attached to a different type of fluorescent dye. As in the polymerase chain reaction, the sample DNA is heated until the double helix unwinds and the two complementary DNA chains separate (Figure 14.3). At this point, a primer attaches to a conserved region of one of the strands of DNA, and free nucleotides in solution join to the 3′ end of the primer, using the sample DNA as a template and catalyzed by DNA polymerase (Figure 14.3). Thus, a replicated copy of the DNA strand begins to form. However, at some point a dideoxynucleotide joins to the new strand instead of a normal nucleotide. The dideoxynucleotides (dideoxyadenine, dideoxycytosine, dideoxyguanine, and dideoxythymine) resemble the four nucleotides, except that they lack a hydroxyl group (at the 3′ position). Once a dideoxynucleotide is joined to the chain, absence of the hydroxyl group prevents the DNA polymerase from joining it to anything else. Thus, with the addition of a dideoxynucleotide, synthesis of the new DNA strand terminates (Figure 14.3).
The ratio of dideoxynucleotides to nucleotides in the reaction mixture is carefully set and is such that the concentration of dideoxynucleotides is always much smaller than that of normal nucleotides. Thus, the dideoxynucleotides may terminate the new DNA strand at any point along the gene being replicated. For example, some of the new DNA strands will be the length of the primer plus one additional base (in this case the dideoxynucleotide); some will be the primer length plus two bases (a nucleotide plus the terminal dideoxynucleotide); some will be the primer length plus three bases (two nucleotides plus the terminal dideoxynucleotide); etc. There are many thousands, if not millions, of copies of the sample DNA. Thus, there will be an equivalent number of newly replicated DNA strands, of all different lengths. The final step of DNA sequencing entails subjecting the DNA strands to electrophoresis, in which the DNA is loaded onto a flat gel plate or in a thin capillary subjected to an electric current. Because the phosphate components of nucleic acids give DNA a net negative charge, the molecules are attracted to the positive pole. The DNA strands migrate through the medium over time, the amount of migration inversely proportional to the molecular weight of the strand (i.e., lighter strands migrate further). Each strand is terminated with a dideoxynucleotide to which a fluorescent dye is attached; each of the four dideoxynucleotides has a different type of fluorescent dye, which (upon excitation) emits light of a different wavelength. Thus, as the multiple copies of DNA of one particular length migrate along the gel or capillary, the wavelength of emitted light is detected and recorded as a peak, which measures the light intensity. Because a given emitted wavelength (“color”) is determined by one of the four dideoxynucleotides, the corresponding nucleotide can be inferred and its position identified by the timing of migration of the DNA strands. 
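The read-out logic described above can be illustrated with a small Python sketch. It assumes, as the text states, that terminated strands of every possible length are present. The function names are hypothetical, and the example strand is the one drawn in Figure 14.3 (primer A-T-C-G-G-T-T followed by A-G-C-G-T). The model ignores dye chemistry and peak intensities entirely.

```python
import random


def chain_terminated_fragments(new_strand, primer_len):
    """One (length, terminal base) pair for each point after the primer at
    which a dideoxynucleotide could have ended synthesis."""
    return [(n, new_strand[n - 1])
            for n in range(primer_len + 1, len(new_strand) + 1)]


def read_sequence(fragments):
    """Order fragments from shortest to longest (the order in which they pass
    the detector) and read the dye-labeled terminal base of each."""
    return "".join(base for _, base in sorted(fragments))


new_strand = "ATCGGTTAGCGT"   # primer (7 bases) plus 5 newly added bases
fragments = chain_terminated_fragments(new_strand, primer_len=7)
random.shuffle(fragments)      # fragments are mixed together in solution
print(read_sequence(fragments))
```

Shuffling before sorting is only there to make the point that electrophoresis, not the order of synthesis, recovers the base order.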
In this way, the sequence of nucleotides of the DNA strand can be inferred (Figure 14.3).

Figure 14.3 DNA sequencing reactions. A* = dideoxyadenine; C* = dideoxycytosine; G* = dideoxyguanine; T* = dideoxythymine. (Panel labels: sample DNA, many copies; add primer molecules, nucleotides, DNA polymerase, dideoxynucleotides; solution heated, DNA denatures; a single primer anneals to a conserved region of one strand of sample DNA; first nucleotide, catalyzed by DNA polymerase, binds to primer strand; second nucleotide binds to primer strand; at random, a dideoxynucleotide (C* in this case) binds to primer strand, terminating reaction; new DNA strands denatured from sample DNA; after numerous reactions, new DNA strands separated by electrophoresis. Electrophoresis: electric current applied; DNA strands migrate to (+) pole, inversely to molecular weight; DNA strands scanned during migration; peaks of wavelengths correspond to fluorescent dyes attached to specific dideoxynucleotides. In the example, strands A-T-C-G-G-T-T-A* through A-T-C-G-G-T-T-A-G-C-G-T* are read as A, G, C, G, T.)

TYPES OF DNA SEQUENCE DATA

For plants, the three basic types of DNA sequence data stem from the three major sources of DNA: nuclear (nDNA), chloroplast (cpDNA), and mitochondrial (mtDNA). Nuclear DNA is, of course, transmitted from parent(s) to offspring by nuclear division (meiosis or mitosis) via sexual or asexual (somatic) reproduction. Chloroplasts and mitochondria, however, replicate and divide independently of the nucleus and may be transmitted to offspring in a different fashion. For example, in angiosperms these organelles are usually (with some exceptions) sexually transmitted only maternally, being retained in the egg but excluded in sperm cells. (In conifers, interestingly, chloroplast DNA is transmitted paternally, not maternally.)

Sequence data from chloroplast DNA have proven very useful in elucidating both lower and higher level relationships. The basic structure of chloroplast DNA for a flowering plant, with coding genes indicated, is shown in Figure 14.4. Like most organelle and prokaryotic DNA, chloroplast DNA is circular. Curiously, most angiosperms have a region of chloroplast DNA known as the inverted repeat, which is the mirror image of the corresponding region (Figure 14.4). Some of the more commonly sequenced chloroplast DNA genes are listed in Table 14.1, although many more have been utilized.

Figure 14.4 Molecular structure of the chloroplast DNA of tobacco (Nicotiana tabacum), 155,939 base pairs. Note large single-copy region (LSC), small single-copy region (SSC), and the two inverted repeats (IRA and IRB). Also note location of atpB, rbcL, matK, and ndhF genes (see Table 14.1). (Redrawn from Wakasugi, T., M. Sugita, T. Tsudzuki, and M. Sugiura. 1998. Updated gene map of tobacco chloroplast DNA. Plant Molecular Biology Reporter 16: 231–241, by permission.)

TABLE 14.1 Some chloroplast genes that have been used in plant molecular systematics, after Soltis et al. 1998.

GENE    LOCATION    FUNCTION
atpB    Large single-copy region of chloroplast    Beta subunit of ATP synthase, which functions in the synthesis of ATP via proton translocation
rbcL    Large single-copy region of chloroplast    Large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO), which functions in the initial fixation of carbon dioxide in the dark reactions
matK    Large single-copy region of chloroplast    Maturase, which functions in splicing type II introns from RNA transcripts
ndhF    Small single-copy region of chloroplast    Subunit of chloroplast NADH dehydrogenase, which functions in converting NADH to NAD+ + H+, driving various reactions of respiration

In addition to coding genes of chloroplast DNA, the sequences between genes, known as intergenic spacers, may be used in phylogenetic analyses. Intergenic spacer regions often show a higher degree of variability than the coding genes, making the former more useful for analyses at a lower taxonomic level, such as species or infraspecies. A list of some commonly used chloroplast intergenic spacers is seen in Table 14.2.

Nuclear DNA sequencing has been used to a lesser degree in plant systematics. Some nuclear genes such as alcohol dehydrogenase (Adh), which has traditionally been used in allozyme studies, are becoming more frequently used. One of the more useful types of nuclear DNA sequences has been the internal transcribed spacer (ITS) region, which occurs in multiple copies (as opposed to the single copies found for most protein-coding genes). The ITS region lies between the 18S and 26S nuclear ribosomal DNA (nrDNA); the ITS region is divided into two subregions, ITS1 and ITS2, separated by the 5.8S nrDNA (Figure 14.5A). ITS sequence data has been most valuable for inferring phylogenetic relationships at a lower level, e.g., between closely related species. However, it has also been used in elucidating higher level relationships. (See Baldwin et al. 1995.)

A related DNA sequence region is the external transcribed spacer (ETS) region. The ETS region lies between 26S and 18S nrDNA, adjacent to the latter (Figure 14.5B). (The entire region, including both the ETS and the nontranscribed spacer region (NTS), is known as the intergenic spacer region, or IGS; see Figure 14.5B.) The ETS region contains even more sequence variation than ITS and is useful in analyses at lower taxonomic levels. (See Baldwin and Markos 1998.)

ANALYSIS OF DNA SEQUENCE DATA

DNA sequence data is converted to characters and character states to be used in phylogenetic analyses.
First, the sequences of a given length of DNA are aligned, such that homologous nucleotide positions (e.g., corresponding to the same codon position of a given gene) are arranged in corresponding columns (Figure 14.6). For some genes that are relatively conserved, alignment is straightforward, as all taxa have the same number of nucleotides per gene. For other genes or DNA segments, some taxa may have one or more additions, deletions, inversions, or translocations relative to other taxa. The occurrence of these mutations, and/or the occurrence of considerable homoplasy among taxa, can make alignment of DNA sequences difficult. In addition, multiple copies of a gene can make homology assessment difficult. Various computer algorithms can be used to automatically align sequences of the taxa being studied, but these have assumptions that must be carefully assessed.

Generally, in using DNA sequence data in a phylogenetic analysis, a character is equivalent to the nucleotide position, and a character state of that character is the specific nucleotide at that position (there being four possible character states, corresponding to the four nucleotides; see Figure 14.6). A large number (often the great majority) of nucleotide positions are generally invariable among taxa, and some of the variable ones are often uninformative by being autapomorphic for a given taxon; thus, relatively few sites are informative and therefore useful in phylogenetic reconstruction (Figure 14.6).

However, a major addition, deletion, inversion, or translocation can in itself be identified as an evolutionary novelty (apomorphy), used in grouping lineages together. For example, members of the Faboideae (of the Fabaceae) lack, by deletion, one of the inverted repeats found in the chloroplasts of most angiosperms (see Figure 14.4). Chromosomal mutations such as these may be coded separately from single base differences (e.g., as in the example of Figure 14.6) and may be given relatively greater weight in inferring relationships.

TABLE 14.2 Some chloroplast intergenic spacer regions that have been used in plant molecular systematics, after Shaw et al. 2005, 2007.

CHLOROPLAST INTERGENIC SPACER REGIONS
3’rps16-5’trnK
3’trnK-matK intron
3’trnV-ndhC
5’rpS12-rpL20
atpI-atpH
matK-5’trnK intron
ndhA intron
ndhF-rpl32
ndhJ-trnF
petL-psbE
psaI-accD
psbA-3’trnK
psbB-psbH
psbD-trnT
psbJ-petA
psbM-trnD
rpl14-rps8-infA-rpl36
rpl16 intron
rpl32-trnL
rpoB-trnC
rps16 intron
rps4-trnT
trnC-ycf6
trnD-trnT
trnG intron
trnH-psbA
trnL intron
trnL-trnF
trnQ-5’rps16
trnS-rps4
trnS-trnfM
trnS-trnG
trnT-trnL
ycf6-psbM

Figure 14.5 A. Internal transcribed spacers (ITSs) of nuclear ribosomal DNA, illustrating the ITS region and flanking subunits (18S nrDNA, ITS1, 5.8S nrDNA, ITS2, 26S nrDNA), and showing the orientations and locations of primer sites (LEU1, ITS5, ITS3, ITS2, ITS4, C28KJ). After Baldwin et al. (1995). B. External transcribed spacer (ETS) of the intergenic spacer (IGS) region (26S nrDNA, NTS, ETS, 18S nrDNA), also showing orientations and locations of primer sites (ETS-Hel-1, ETS-Hel-2, 26S-IGS, 18S-E, 18S-IGS, 18S-ETS). After Baldwin and Markos (1998).

Several types of weighting schemes may be applied to molecular data. For protein-encoding genes, the codon position may be differentially weighted. For example, because of redundancy of the genetic code, the third codon position is generally more labile (a change more likely to have occurred randomly) than the second, and the second may be more labile than the first.
Thus, the first and second codon positions may be given relatively greater weight, respectively (such as a weight of 10 for the first codon position, 5 for the second position, and 1 for the third position). The logic here is that a change in codon position 1 or 2 is less likely to have occurred at random within a taxon and more likely represents evolutionary novelties that are shared among taxa. Weighting by codon position may be based on empirical data. For a given data set, the number of changes occurring for codon positions 1, 2, and 3 may be used (inversely) to establish the relative weights.

Another weighting parameter that may be used with DNA sequence data concerns transitions versus transversions. Transitions are evolutionary changes from one purine to another purine (A → G or G → A) or from one pyrimidine to another pyrimidine (C → T or T → C); see Figure 14.1. Transversions are evolutionary changes from a purine to a pyrimidine (A → C, A → T, G → C, or G → T) or from a pyrimidine to a purine (C → A, C → G, T → A, or T → G). Weighting using transitions versus transversions may be based on empirical data. For a given data set, the relative frequency of transitions versus transversions may be used (inversely) to establish the relative weights. For example, for a given group under study, if transitions occur 5× more frequently than transversions, the latter may be given a weight of 5 and the former a weight of 1, as illustrated in the step matrix of Figure 14.7.

DNA Alignment (positions 81–121; position numbers read vertically)

          00000000000000000001111111111111111111111
          88888888899999999990000000000111111111122
          12345678901234567890123456789012345678901
Taxon 1   GCCTAGCCAAAGCTCTTCCAAGGTGACTCTCAGTTCAAGCT
Taxon 2   GCCTAGCCAAAGCTCTTCCAAGCTGACTCTCA------GCT
Taxon 3   GCCTAGCCTAAGCTCAACCAAGGTGTCTCTCAGTTCAAGCT
Taxon 4   GCCTAGCCTAAGCTCTTCCAAGGTGTCTCTCAGTTCAAGCT
Taxon 5   GCCTAGCCAAAGCTCTTCCAAGCTGACTCTCA------GCT
Taxon 6   CCCTAGCCAAAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT
Taxon 7   CCCTAGCCAAAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT
Taxon 8   GCCTAGCCTAAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT

Character Coding

          1  2  3  4  5  6
Taxon 1   2  0  3  2  0  4
Taxon 2   2  0  3  1  0  5
Taxon 3   2  3  0  2  3  4
Taxon 4   2  3  3  2  3  4
Taxon 5   2  0  3  1  0  5
Taxon 6   1  0  3  1  0  4
Taxon 7   1  0  3  1  0  4
Taxon 8   2  3  3  1  0  4

Figure 14.6 Example of alignment of DNA sequences of 41 nucleotide sites (positions 81–121) from eight taxa. Variable nucleotide sites are in bold. Note deletion of six bases in taxon 2 and taxon 5. Possible character coding of variable sites is seen in the second table. Coding of nucleotides is as follows: A = state 0; C = state 1; G = state 2; T = state 3. In this example, the deletion is coded as a single binary character (character 6), coded differently from nucleotides, as state 4 = deletion absent and state 5 = deletion present.

These weighting schemes may be viewed as a simplified component of a process that may be quite complex, taking into account, e.g., rate of base substitution, base frequency, and branch length in determining an evolutionary model. Evolutionary models are commonly used in maximum likelihood and Bayesian analyses. (See Chapter 2.)

DNA sequence data can also be used to evaluate the secondary structure of a molecule. Thus, nucleotide differences that result in major changes in the conformation of the product (whether ribosomal RNA or protein) may have a much greater physiological effect than those that do not and might receive a higher weight. Computer algorithms can evaluate this to some degree.

     A  G  C  T
A    0  1  5  5
G    1  0  5  5
C    5  5  0  1
T    5  5  1  0

Figure 14.7 Step matrix of nucleotide changes, showing weighting scheme in which transversions are given a weight 5 times greater than that of transitions.

Parsimony, maximum likelihood, and Bayesian methods are commonly used to infer phylogenetic relationships using DNA sequence data (Chapter 2). The most robust hypotheses of relationship are generally those using a large taxon sampling and sequence data from multiple (e.g., anywhere from 3 to 20+) genes and/or sequence regions.
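Two ideas from this section lend themselves to a short Python sketch: picking out variable and informative sites in an alignment, and building a transition/transversion step matrix like that of Figure 14.7. The function names are hypothetical. The alignment is transcribed from Figure 14.6 as it appears in this text. Here "informative" is taken to mean that at least two states each occur in at least two taxa, the usual parsimony criterion, which is an assumption about what the chapter intends. Gap columns are simply skipped rather than coded as a separate character.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def step_cost(a, b, ti=1, tv=5):
    """Cost of a change a -> b: 0 if identical, `ti` for a transition,
    `tv` for a transversion."""
    if a == b:
        return 0
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return ti if same_class else tv


def step_matrix(ti=1, tv=5, order="AGCT"):
    """Square matrix of step costs, rows and columns in `order`."""
    return [[step_cost(a, b, ti, tv) for b in order] for a in order]


def classify_sites(alignment, first_position=1):
    """Return (variable, informative) position lists; gaps ('-') are ignored."""
    seqs = list(alignment.values())
    variable, informative = [], []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs if s[i] != "-")
        if len(counts) > 1:
            variable.append(first_position + i)
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative.append(first_position + i)
    return variable, informative


ALIGNMENT = {
    "Taxon 1": "GCCTAGCCAAAGCTCTTCCAAGGTGACTCTCAGTTCAAGCT",
    "Taxon 2": "GCCTAGCCAAAGCTCTTCCAAGCTGACTCTCA------GCT",
    "Taxon 3": "GCCTAGCCTAAGCTCAACCAAGGTGTCTCTCAGTTCAAGCT",
    "Taxon 4": "GCCTAGCCTAAGCTCTTCCAAGGTGTCTCTCAGTTCAAGCT",
    "Taxon 5": "GCCTAGCCAAAGCTCTTCCAAGCTGACTCTCA------GCT",
    "Taxon 6": "CCCTAGCCAAAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT",
    "Taxon 7": "CCCTAGCCAAAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT",
    "Taxon 8": "GCCTAGCCTAAGCTCTTCCAAGCTGACTCTCAGTTCAAGCT",
}

variable, informative = classify_sites(ALIGNMENT, first_position=81)
print("variable:", variable)
print("informative:", informative)
for row in step_matrix():
    print(row)
```

Sites that are variable but not informative here are those at which a single taxon differs from all the others, i.e., autapomorphies.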
RESTRICTION SITE ANALYSIS (RFLPs)

A restriction site is a sequence of approximately 6–8 base pairs of DNA that is recognized and bound by a given restriction enzyme. These restriction enzymes, of which there are many, have been isolated from bacteria. Their natural function is to inactivate invading viruses by cleaving the viral DNA. Restriction enzymes known as type II recognize restriction sites and cleave the DNA at particular locations within or near the restriction site. An example is the restriction enzyme EcoRI (named after E. coli, from which it was first isolated), which recognizes the DNA sequence seen in Figure 14.8 and cleaves the DNA at the sites indicated by the arrows in this figure.

G↓A-A-T-T-C
C-T-T-A-A↑G

Figure 14.8 A DNA restriction site, cleaved (at arrows) by the restriction enzyme EcoRI.

Restriction fragment length polymorphism, or RFLP, refers to differences between taxa in restriction sites, and therefore in the lengths of fragments of DNA following cleavage with restriction enzymes. For example, Figure 14.9 shows, for two hypothetical species, amplified DNA lengths of 10,000 base pairs that are subjected to ("digested with") the restriction enzyme EcoRI. Note that, after a reaction with the EcoRI enzyme, the DNA of species A is cleaved into three fragments, corresponding to two EcoRI restriction sites, whereas that of species B is cleaved into four fragments, corresponding to three EcoRI restriction sites. The relative locations of these restriction sites on the DNA can be mapped; one possibility is seen at the bottom of Figure 14.9. (Note that there are other possibilities for this map; precise mapping requires additional work.)

Additional restriction enzymes can be used. Figure 14.10 illustrates how each of the DNA fragments from the EcoRI digests can be digested with the BamHI restriction enzyme, yielding different fragments for the two species.
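The relationship between mapped restriction sites and fragment lengths can be sketched in Python. The helper names are hypothetical, and the DNA is assumed to be linear. The site positions (in base pairs) are taken from the possible maps drawn in Figures 14.9 and 14.10. The `offset=1` default reflects EcoRI cutting after the first G of GAATTC.

```python
def cut_positions(sequence, site="GAATTC", offset=1):
    """Top-strand cut positions: `offset` bases into each occurrence of `site`."""
    cuts, i = [], sequence.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = sequence.find(site, i + 1)
    return cuts


def fragment_lengths(total_length, cut_sites):
    """Lengths of the pieces of a linear DNA cut at the given positions."""
    bounds = [0] + sorted(cut_sites) + [total_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Possible maps from Figures 14.9 and 14.10 (positions in bp on 10,000 bp DNA).
ECORI = {"A": [5000, 6000], "B": [3000, 5000, 6000]}
BAMHI = {"A": [3500, 5300], "B": [3400, 5300, 7800]}

for species in ("A", "B"):
    single = fragment_lengths(10_000, ECORI[species])
    double = fragment_lengths(10_000, ECORI[species] + BAMHI[species])
    print(species, "EcoRI:", single, "EcoRI + BamHI:", double)

# On an actual sequence, sites are located first, then fragments computed.
toy = "AAGAATTCTTGAATTCAA"
print(fragment_lengths(len(toy), cut_positions(toy)))
```

Going in the other direction, from observed fragment lengths back to a map, is the harder problem and generally has more than one solution, which is why the text calls the drawn maps only one possibility.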
These data can be added to the original in preparing a map (one possible map is shown in the lower part of Figure 14.10).

Restriction site fragment data can be coded as characters and character states in a phylogenetic analysis. For example, given that the restriction site maps of Figure 14.10 are correct, the presence or absence of these sites can be coded as characters, as seen in Figure 14.11.

Restriction site analysis yields far less information than complete DNA sequencing, accounting only for the presence or absence of sites 6–8 base pairs long. It has the advantage, however, of surveying considerably larger segments of DNA. With improved and less expensive sequencing techniques, it is nonetheless less valuable and less often used than in the past.

ALLOZYMES

Allozymes are different molecular forms of an enzyme that correspond to different alleles of a common gene (locus). (This is not to be confused with isozymes, which are forms of an enzyme that are derived from separate genes or loci.) Allozymes are traditionally detected using electrophoresis, in which the enzymes are extracted and placed on a medium (e.g., starch) through which an electric current runs (similar to gel electrophoresis in DNA sequencing). A given enzyme will migrate toward one pole or the other depending on its charge. Similarly, different allozymes of an enzyme will migrate differentially because they differ slightly in amino acid composition and therefore have somewhat different electrical charges. Allozymes subjected to electrophoresis are identified with a stain specific to that enzyme, and the bands are marked by their relative position on the electrophoresis medium.

Allozymes have traditionally been used to assess genetic variation within a population or species, but they can also be used as data in phylogenetic analyses of closely related species, e.g., species within a monophyletic genus.
Figure 14.12A illustrates an example of electrophoretic allozyme banding data for five species and an outgroup. There are several ways to code polymorphic allozyme data. One way is to code each allele as a character and the presence or absence of that allele as a character state. A second way to code allozyme data is to treat the locus (corresponding to the gene coding for the enzyme) as the character and all unique combinations of alleles as character states (as in Figure 14.12B). The number of state changes between these unique allelic combinations can be a default of one. However, another method of coding is to treat the loss of each allele as one state change and the gain of an allele as a separate state change. Thus, the number of state changes between different allelic combinations can vary, as seen in Figure 14.12C. Step matrices (see Chapter 2) are used to code these in a cladistic analysis. Yet another way to code allozyme data is to take into account the frequency of alleles present in a given taxon. For example, by this method, species A, which has allele X present with a frequency of 95% and allele Y with a frequency of 5%, would receive a different coding from species B, which has the same alleles, but in frequencies of 55% and 45%, respectively.

MICROSATELLITE DNA

Microsatellites are regions of DNA that contain tandem repeats of a short nucleotide motif (usually 2–5 base pairs long), an example being TGTGTG, in which a two-base-pair motif repeats. The regions are termed tandem repeats; if they vary within a population or species, they are called variable-number tandem repeats (VNTR). (Other designations and acronyms are used, depending on the particular field of study.) These tandem repeats can be located all across the genome; at a given location (locus), the repeat will tend to be of a certain length.
However, individuals within or between populations may vary in the number of tandem repeats at a given locus (or even show allelic variation) because of irregularities in crossing-over and replication. Thus, variable-number tandem repeats can be used as a genetic marker. Microsatellites are identified by constructing primers that flank the tandem repeats and then using PCR technology.

Figure 14.9 Example of restriction site analysis of species A and B, using restriction enzyme EcoRI. Note differences in fragment lengths. Possible restriction site maps of species A and B are shown in the lower portion of the figure. (Both species: DNA 10,000 bp long, digested with EcoRI. Species A fragments: 5000, 4000, and 1000 bp; possible map with EcoRI sites at 5000 and 6000. Species B fragments: 4000, 3000, 2000, and 1000 bp; possible map with EcoRI sites at 3000, 5000, and 6000.)

Figure 14.10 Example of restriction site analysis of species A and B, using restriction enzyme EcoRI, followed by restriction enzyme BamHI. Possible restriction site maps of species A and B are shown in the lower portion of the figure. (Species A fragments after both digests: 4000, 3500, 1500, 700, and 300 bp; possible map with BamHI at 3500, EcoRI at 5000, BamHI at 5300, EcoRI at 6000. Species B fragments: 3000, 2200, 1800, 1600, 700, 400, and 300 bp; possible map with EcoRI at 3000, BamHI at 3400, EcoRI at 5000, BamHI at 5300, EcoRI at 6000, BamHI at 7800.)

CHARACTERS   EcoRI   BamHI   BamHI   EcoRI   BamHI   EcoRI   BamHI
TAXA         3000    3400    3500    5000    5300    6000    7800
Species A    −       −       +       +       +       +       −
Species B    +       +       −       +       +       +       +

Figure 14.11 Character coding of restriction site map data of Figure 14.10, derived by presence or absence of EcoRI or BamHI sites at specific locations along DNA.
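As a rough illustration of how microsatellite alleles differ, the following Python sketch counts the longest run of a repeat motif in two toy alleles that share the same flanking sequence. The function name, the flanking sequences, and the alleles themselves are invented for the example. In practice the allele is scored from the length of the PCR product, not by reading the repeat directly.

```python
import re


def longest_tandem_repeat(seq, motif):
    """Largest number of back-to-back copies of `motif` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


LEFT_FLANK, RIGHT_FLANK = "ACGTA", "CCATC"   # hypothetical conserved flanks
allele_1 = LEFT_FLANK + "TG" * 6 + RIGHT_FLANK
allele_2 = LEFT_FLANK + "TG" * 9 + RIGHT_FLANK

for name, allele in (("allele 1", allele_1), ("allele 2", allele_2)):
    print(name, "repeats:", longest_tandem_repeat(allele, "TG"),
          "product length:", len(allele))
```

Because the flanks are identical, the difference in product length (6 bp here) comes entirely from the three extra copies of the two-base motif.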
(The primers are initially identified for a species by the time-consuming process of synthesizing genetic probes of a tandem repeat, screening DNA for binding to these probes, and sequencing these regions to design primers that flank the tandem repeats.) Once the primers are identified, PCR can be used to quickly generate multiple copies of the tandem repeat DNA, the length of which (for a given individual at a given locus or allele) can be determined by gel electrophoresis. (See example in Figure 14.13.)

Microsatellite analysis can generate data quickly and efficiently (once the primers are identified for a given group) for a large number of individuals. It is most often used for population studies, e.g., to assess genetic variation or homozygosity. Its use in systematics is largely in examining relationships within a species (such as to assess infraspecific classifications) or between very closely related species.

RANDOM AMPLIFIED POLYMORPHIC DNA (RAPDs)

Another method of identifying genetic markers is to use a randomly synthesized primer to amplify DNA by PCR. In this method, the primer will anneal to complementary regions at various locations in the isolated DNA. If another complementary site is present on the opposing DNA strand at a distance that is not too great (i.e., within the limits of PCR), then the reaction will amplify this region of DNA (Figure 14.14). Because many sections of DNA complementary to the primer may occur, the PCR will yield DNA strands of many different lengths, which can be size-separated by electrophoresis. Because even closely related individuals may show some sequence variation that could determine potential primer sites, these different individuals will show different amplification products. Thus, RAPD refers to using randomly generated primers for the amplification of DNA to identify polymorphic DNA regions of different individuals or taxa. (See example in Figure 14.14.)
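The RAPD logic described above can be sketched in Python as follows. This is a toy model under stated assumptions: the primer and the two template sequences are invented for illustration, annealing is treated as exact string matching (real primers tolerate some mismatch), and `max_len` is a hypothetical stand-in for "the limits of PCR".

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_all(seq, motif):
    """Start indices of every occurrence of `motif` in `seq`."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def rapd_products(template, primer, max_len=3000):
    """Sizes of the fragments a single primer could amplify from `template`.

    `template` is the top strand, 5'->3'. Where the top strand contains the
    primer sequence, the primer anneals to the bottom strand and extends to
    the right. Where the top strand contains the primer's reverse complement,
    the primer anneals to the top strand and extends to the left. A product
    forms when such a pair of sites faces inward within `max_len`.
    """
    k = len(primer)
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    sizes = []
    for f in forward_sites:
        for r in reverse_sites:
            size = r + k - f
            if r >= f + k and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


# Hypothetical toy templates: species B has lost the nearer reverse site
# (CGAT -> CGTT) and amplifies from a more distant one instead.
PRIMER_X = "ATCG"
SPECIES_A = "GG" + "ATCG" + "T" * 10 + "CGAT" + "GG"
SPECIES_B = "GG" + "ATCG" + "T" * 10 + "CGTT" + "T" * 6 + "CGAT" + "GG"

print("Species A products:", rapd_products(SPECIES_A, PRIMER_X))
print("Species B products:", rapd_products(SPECIES_B, PRIMER_X))
```

In this sketch a single base change in one primer site is enough to alter the set of amplification products, which is the kind of polymorphism that RAPD banding patterns record.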
RAPDs, like microsatellites, may often be used for within-species genetic studies, but may also be successfully employed in phylogenetic studies to address relationships within a species or between closely related species. However, RAPD analysis has two major disadvantages: results are difficult to replicate (being very sensitive to PCR conditions), and the homology of similar bands in different taxa may be unclear.

AMPLIFIED FRAGMENT LENGTH POLYMORPHISM (AFLPs)

This method is similar to that of identifying RFLPs in that a restriction enzyme is used (Figure 14.15A) to cut DNA into numerous, smaller pieces, each of which (because of the action of the restriction enzyme) terminates in a characteristic nucleotide sequence (Figure 14.15B). However, the numerous, cut DNA fragments are then modified by binding to each end (using DNA ligase) a synthesized, double-stranded piece of DNA, known as a primer adapter (Figure 14.15C). The primer adapters are designed to attach to the cut ends (matching the end sequences left by the restriction enzyme). Primers are then constructed that bind to the primer adapters and amplify the DNA fragments using a polymerase chain reaction (Figure 14.15D). Electrophoresis separates the amplified DNA fragments that exhibit length polymorphism (hence, AFLP), enabling the recognition of numerous genetic markers.

AFLP data are more experimentally replicable than are RAPD data and can be used to identify genetic differences among individuals using large pieces of DNA. AFLP has one disadvantage in that so many fragments may be generated that it is hard to distinguish them on an electrophoretic gel. However, a slight modification of the primers used may limit the number of fragments that are amplified, enabling them to be more easily identified. AFLP is largely used for population genetics studies, but has been used in studies of closely related species and even, in some cases, for higher-level, cladistic analyses.
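The AFLP steps described above can be sketched in Python under several simplifying assumptions. The two sequences are invented, only EcoRI (recognition sequence GAATTC, cut after the first G) is used, and adapter ligation and PCR are not simulated: a fragment counts as amplifiable if both of its ends were produced by the enzyme. The `selective_base` filter models one common form of primer modification (primers that extend one extra nucleotide into the fragment); the text does not specify the modification, so treat that part as an assumption.

```python
SITE = "GAATTC"  # EcoRI recognition sequence; the cut falls between G and AATTC
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def digest(seq, site=SITE, cut_offset=1):
    """Cut the top strand of linear DNA at every site; return pieces in order."""
    pieces, start = [], 0
    hit = seq.find(site)
    while hit != -1:
        pieces.append(seq[start:hit + cut_offset])
        start = hit + cut_offset
        hit = seq.find(site, hit + 1)
    pieces.append(seq[start:])
    return pieces


def aflp_bands(seq, selective_base=None):
    """Sorted lengths of fragments cut at both ends (adapters fit both ends).

    Each such fragment reads AATTC...G on the top strand. With a selective
    base, a fragment is kept only if the first base inside the site matches
    it at both ends: top-strand position 5 on the left, and the complement
    of the second-to-last top-strand base on the right.
    """
    internal = digest(seq)[1:-1]  # drop the two pieces with an uncut end
    if selective_base is not None:
        internal = [
            frag for frag in internal
            if len(frag) >= 7
            and frag[5] == selective_base
            and COMPLEMENT[frag[-2]] == selective_base
        ]
    return sorted(len(frag) for frag in internal)


# Hypothetical individuals: individual 2 has a point mutation (GAATTC -> GAATAC)
# that removes the middle EcoRI site.
SPACER_1 = "ACCCCT"
SPACER_2 = "T" + "G" * 7 + "A"
IND_1 = "CCTT" + SITE + SPACER_1 + SITE + SPACER_2 + SITE + "TTCC"
IND_2 = "CCTT" + SITE + SPACER_1 + "GAATAC" + SPACER_2 + SITE + "TTCC"

print("Individual 1 bands:", aflp_bands(IND_1))
print("Individual 2 bands:", aflp_bands(IND_2))
print("Individual 1, selective base A:", aflp_bands(IND_1, selective_base="A"))
```

Losing one restriction site merges two fragments into one longer fragment, which appears as a length polymorphism between the two individuals, and the selective base reduces the number of bands that would need to be resolved on a gel.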
Figure 14.12 Allozyme data. A. Hypothetical allozyme banding data for taxa A–E and Outgroup and enzymes GOT, IDH, and PGI. B. Coding of data using the locus as the character and unique allelic combinations as character states. C. One possible coding of data (after Mabee and Humphries, 1993). Diagrams illustrating number of state changes between character states (state number in parentheses); each loss or gain of an allele counts as one step change.

Panel B coding:

TAXA   GOT   IDH   PGI
A      1     2     0
B      1     2     1
C      1     1     2
D      1     2     3
E      0     0     2
OUT    0     0     0

Panel C character states (alleles, with state number in parentheses): GOT: 18 (0), 22 (1). IDH: 17 (0), 21 (1), 17,21 (2). PGI: 27 (0), 35 (1), 31 (2), 27,31 (3).

Figure 14.13 Microsatellite data. Primers were constructed to flank regions of tandem repeats. Note that the tandem repeat region of species A is longer than that of species B and is thus a genetic difference between the two.

Figure 14.14 RAPDs data. In this example the same randomly generated primer (Primer X) anneals at different sites in the DNA of species A and B, resulting in amplified DNA of different lengths (2400 bp and 1560 bp in the figure), a genetic difference between the two taxa.

Figure 14.15 AFLP technique. A. Add restriction enzyme (e.g., EcoRI) to cleave isolated DNA. B. DNA cleaved into fragments. C. Add primer adapters (+ DNA ligase). D. Amplify with PCR primers. The letters “N~N-” represent a length of nucleotides.

REVIEW QUESTIONS

1. Name the specific types of data used in studies of molecular systematics.
2. How are samples used to acquire molecular data typically processed?
3. Why is collection of a voucher specimen in molecular studies essential?
4. What do DNA sequence data refer to?
5. Explain the polymerase chain reaction and its importance in molecular systematics.
6. What is a primer?
7. Explain the basic process of automated DNA sequencing. What is the significance of dideoxynucleotides?
8. What are the three major types of DNA used in DNA sequence (and other molecular) studies?
9. In chloroplast DNA, what are the large single-copy region, small single-copy region, and inverted repeats?
10. Name some useful chloroplast genes used in plant molecular systematics.
11. What is the internal transcribed spacer region (ITS) and what is its efficacy in plant molecular systematics?
12. How does the external transcribed spacer region (ETS) differ from ITS and what is the advantage of these data?
13. What is DNA alignment, and what are potential problems with this?
14. In general, what are the characters and character states for DNA sequence data?
15. Name the ways that DNA sequence data may be weighted in a cladistic analysis.
16. What factors do models of evolution take into account as used in maximum likelihood or Bayesian analyses?
17. What is a restriction site?
18. What does restriction fragment length polymorphism (RFLP) refer to?
19. How are RFLP data acquired and how are they used in a cladistic analysis?
20. What is an allozyme?
21. How are allozyme data acquired?
22. Explain the different ways to code allozyme data in a cladistic analysis.
23. What are microsatellites and how are these data obtained?
24. What are random amplified polymorphic DNAs (RAPDs) and how are these data obtained?
25. Describe the technique for generating amplified fragment length polymorphisms (AFLPs), citing how this differs from that of generating RFLPs.

EXERCISES

1. If possible, get a demonstration of the various techniques of molecular systematics, e.g., DNA extraction and sequencing. Consider a special topics project in which you define a problem and use these techniques to acquire the data to answer the problem.
2. Access GenBank (http://www.ncbi.nih.gov/Genbank) and acquire molecular data on a particular group of your choice. Consider analyzing these data using phylogenetic inference software (see Chapter 2).
3. Peruse journal articles in plant systematics, e.g., American Journal of Botany, Annals of the Missouri Botanical Garden, Systematic Botany, or Taxon, and note those that describe the use of molecular data in relation to systematic studies. Identify the techniques used, data acquired, and problems addressed.

REFERENCES FOR FURTHER STUDY

Baldwin, B. G., M. J. Sanderson, J. M. Porter, M. F. Wojciechowski, C. S. Campbell, and M. J. Donoghue. 1995. The ITS region of nuclear ribosomal DNA: A valuable source of evidence on angiosperm phylogeny. Annals of the Missouri Botanical Garden 82: 247–277.
Baldwin, B. G., and S. Markos. 1998. Phylogenetic utility of the external transcribed spacer (ETS) of 18S-26S rDNA: Congruence of ETS and ITS trees of Calycadenia (Compositae). Molecular Phylogenetics and Evolution 10: 449–463.
Hillis, D. M., C. Moritz, and B. K. Mable. 1996. Molecular Systematics, 2nd ed. Sinauer, Sunderland, Massachusetts.
Mabee, P. M., and J. Humphries. 1993. Coding polymorphic data: Examples from allozymes and ontogeny. Systematic Biology 42: 166–181.
Shaw, J., E. B. Lickey, J. T. Beck, S. B. Farmer, W. Liu, J. Miller, K. C. Siripun, C. T. Winder, E. E. Schilling, and R. L. Small. 2005. The tortoise and the hare II: Relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. American Journal of Botany 92: 142–166.
Shaw, J., E. B. Lickey, E. E. Schilling, and R. L. Small. 2007. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: The tortoise and the hare III. American Journal of Botany 94: 275–288.
Small, R. L., J. A. Ryburn, R. C. Cronn, T. Seelanan, and J. F. Wendel. 1998. The tortoise and the hare: Choosing between noncoding plastome and nuclear Adh sequences for phylogenetic reconstruction in a recently diverged plant group. American Journal of Botany 85: 1301–1315.
Soltis, P. S., D. E. Soltis, and J. J. Doyle (eds.). 1992. Molecular Systematics of Plants. Chapman and Hall, New York.
Soltis, D. E., P. S. Soltis, and J. J. Doyle (eds.). 1998. Molecular Systematics of Plants II: DNA Sequencing. Kluwer Academic, Boston.