DNA Barcoding of Molecular Life Science Center Plant Species

We often walk past the plants that surround MLSC, but hardly notice them. Over the next few lab periods, DNA will be isolated from these plants, amplified by PCR, and sent to a company for sequencing. The DNA sequences will then be used to place these plants into a phylogenetic tree and identify which Family each plant belongs to. Recall from BIOL 211 that living things are classified by the following levels: Kingdom, Phylum, Class, Order, Family, Genus, and Species. Related organisms share similar traits, including DNA sequence; thus, DNA sequences can be used to classify living things. Organisms with many similarities are closely related, while those with more differences are more distantly related. DNA sequences used for classification are termed DNA Barcodes, since they are analogous to the UPC barcodes that are scanned in the grocery store to identify a product.

Two specific regions of DNA are quite useful for classification: cytochrome c oxidase subunit I (COI) and RuBisCo large subunit (rbcL). The COI gene resides on the mitochondrial genome and is essential for electron transport during respiration. The rbcL gene is located on the chloroplast genome and is essential for the fixation of CO2 in the first step of the Calvin cycle of photosynthesis. Both mitochondrial and chloroplast genomes are highly abundant since there are multiple genomes per organelle and multiple organelles per cell. This allows a researcher to isolate DNA from small amounts of tissue and still have enough template DNA for PCR. Also, both COI and rbcL are highly conserved since they perform essential functions. Researchers can thus use one pair of PCR primers for a wide range of species. Primers are designed from regions with little to no variation, but there are more DNA substitutions in the regions between the primers. This more variable region is amplified by PCR as shown in Figure 1, below.
[Figure 1 diagram: the rbcL for and rbcL rev primers flank a region with variable nucleotides in different species. PCR: Denature, Anneal, Extend, 30X, in the presence of Taq polymerase and deoxyNTPs.]

Figure 1. PCR amplification of the rbcL DNA barcode gene using chloroplast DNA as the template. The two primers used to prime DNA synthesis are named rbcL for and rbcL rev. PCR (Polymerase Chain Reaction) is used to amplify the region between the two primers. Primers are designed from highly conserved regions and anneal to the chloroplast genome of most plant species. The region between the primers is more variable, so PCR products amplified from different species will have sequence differences in the region between the two primers. dNTPs indicates a mixture of deoxyATP, deoxyCTP, deoxyGTP and deoxyTTP.

Fungi and animal classification is done using the COI DNA barcode; however, rbcL is the DNA barcode for plants since COI variation in plants is too low to distinguish different species. The PCR products are then sequenced by GeneWiz, a company in San Diego, which makes the DNA sequences available on their website. The sequences are then compared to a set of reference rbcL sequences to determine which reference is most similar. The computational tools for performing these comparisons are found on the Blue Line of the DNA Subway, an online educational tool developed by the DNA Learning Center at Cold Spring Harbor, NY. The general logic for DNA sequence comparison is shown in Figure 2.

[Figure 2 diagram: the sequences shown are ACGTGCTAGA, ACGTGCCAGA, ATGTGCCAGA, ACGTACCAGA, GCGTACCAAA, ACGTACCAAA and ACGTCCCAAA, with changed nucleotides marked by circles and triangles.]

Figure 2. Evolutionary changes in DNA sequence. The ancestral DNA sequence is shown on the left. Two independent nucleotide changes, marked by circles, occur that differentiate two lineages. Recall that a change that defines a new lineage is termed a synapomorphy. Over time, new nucleotide changes, marked by triangles, occur to further differentiate these lineages.
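The comparison logic of Figure 2 can be sketched in Python by counting the positions at which two aligned sequences differ. This is only an illustration: the function name is hypothetical, the first sequence listed is assumed to be the ancestral one, and the DNA Subway itself relies on MUSCLE and PHYLIP NJ rather than on a simple count like this.

```python
def count_differences(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Sequences as printed in Figure 2; the first one is assumed to be ancestral.
figure2_sequences = [
    "ACGTGCTAGA",
    "ACGTGCCAGA",
    "ATGTGCCAGA",
    "ACGTACCAGA",
    "GCGTACCAAA",
    "ACGTACCAAA",
    "ACGTCCCAAA",
]

ancestral = figure2_sequences[0]
for seq in figure2_sequences[1:]:
    # Fewer differences suggest a closer relationship to the ancestral sequence.
    print(seq, count_differences(ancestral, seq))
```

Sequences with fewer differences from one another would be placed closer together in a tree, which is the idea behind the alignment and tree-building steps on Day 3.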
The International Barcode of Life (iBOL) campaign organizes barcoding efforts for projects that focus on certain types of organisms (e.g. ants, sharks, mosquitos) or certain types of environments (e.g. marine, coral reef, polar). The iBOL website (ibol.org) has lots of interesting information; check out the About Us tab.

This lab will be divided into three sections.
1. Obtain a small leaf sample from one of the plants commonly found on campus and isolate total DNA (nuclear, mitochondrial and chloroplast).
2. Amplify DNA by PCR using the rbcL primers, then run a gel to verify the presence of PCR product (580 bp) before sending it for sequencing. The gel can be viewed on BeachBoard.
3. Analyze sequence data using the DNA Subway Blue Line; discover which reference barcode is most closely related to your plant species. Identify the family that your species belongs to!

Concepts Reinforced in this Lab:
DNA Barcodes are small regions of organelle DNA that can identify an organism. The gene encoding RuBisCo large subunit (rbcL) is the DNA barcode used for plants.
During the process of evolution, more DNA substitutions accumulate between distantly related species, while closely related species differ by a smaller number of DNA substitutions.
The polymerase chain reaction (PCR) amplifies DNA that is located between two primers. In this case, chloroplast DNA is the template that is amplified by the rbcL forward and rbcL reverse primers.
The PCR products are sequenced by dideoxy sequencing, which utilizes dideoxy nucleotides that lack a 3’-OH group and serve as chain terminators.
DNA sequences can then be compared to reference DNA barcodes to place a DNA sequence into a phylogenetic tree.

Experimental Procedures

Day 1: Genomic DNA Isolation
1. Eight different plant species surrounding the MLSC building will be used in this exercise. A numbered sign has been posted next to each plant type.
When instructed to do so by your instructor, you will view the plants selected for this experiment and CHOOSE ONE for tissue collection. Be sure to take one P1000 pipette tip, one 1.5ml microfuge tube and a marker with you before you leave the lab.
2. Using the wide end of a P1000 pipette tip, punch a disc of plant tissue into a 1.5ml microfuge tube. This can be most effectively done by “sandwiching” leaf tissue between the tip end and the tube opening and then pushing the tip into the tube. Close the cap on the microfuge tube and return to lab. The tip can be discarded in any trash receptacle. Be sure to label your sample of plant tissue with the number on the sign and your group name.
3. After returning to the lab, add 100μl of Nuclei Lysis Buffer to your tube.
4. Thoroughly grind the tissue in the tube using a clean blue plastic pestle for 1 minute.
5. Add an additional 500μl of Nuclei Lysis Buffer to the tube and then incubate the tube at 65°C for 15 minutes by placing it in any of the hot baths labeled “65°C” located around the room. Be sure to wear the insulated gloves when adding and removing your sample from the bath.
6. Add 3μl of RNase Solution to the tube, close the cap, invert several times to mix the contents and incubate the tube at 37°C for 15 minutes by placing it in any of the baths labeled “37°C” located around the room.
7. Add 200μl of Protein Precipitation Solution to the tube, close the cap and then vigorously shake the tube for 5 seconds to mix the contents.
8. Incubate the tube on ice for 5 minutes.
9. Place your tube in a microcentrifuge across from another lab group’s tube to balance the centrifuge. Make sure the tubes have their cap hinges pointing outwards. Once all groups’ tubes have been added, centrifuge the tubes at maximum speed for 4 minutes.
10. While the tubes are spinning, label 1 new microfuge tube. When the centrifugation is complete, gently transfer 600μl of supernatant from your tube into the new tube.
Do not mix up the samples and be sure to use a new pipette tip. Also, do not disturb the pellet. After the transfer, the tube with the pellet may be discarded into the room trash bins or beakers labeled “used pipette tips”.
11. Add 600μl of isopropanol to the collected supernatant, close the cap, and invert the tube several times to mix the contents. The isopropanol will cause the DNA to come out of solution.
12. Place your tube in a microcentrifuge across from another lab group’s tube, with the cap hinges pointing outwards. Once all groups’ tubes have been added, centrifuge the tubes at maximum speed for 1 minute. The DNA is now in the small pellet that is formed at the bottom of the tube.
13. Carefully pour off (decant) the supernatant into one of the beakers labeled “Used Isopropanol and Ethanol”. These beakers can be found near the sinks. Add 600μl of 70% ethanol to the tube, close the cap and gently invert the tube three times to wash the salt from the DNA pellet.
14. Place your tube in a microcentrifuge across from another group’s tube, with the cap hinges pointing outwards. Once all groups’ tubes have been added, centrifuge the tubes at maximum speed for 1 minute.
15. Carefully decant the supernatant (into one of the beakers labeled “Used Isopropanol and Ethanol”) and then use a fresh tip on a P100 to remove the residual 70% ethanol, being very careful not to disturb the pellet, which may or may not be visible. The pellet will be located on the side of the tube with the cap hinge, so don’t let the pipette tip touch this part of the tube.
16. Allow the pellet to air dry for 15 minutes. During this time leave the cap open and rest the tube on its side.
17. Add 100μl of DNA Rehydration Solution to the DNA pellet, close the cap and allow the tube to incubate overnight at 4°C. The next day your TA will move your samples to a -20°C freezer for storage.

Day 2: Polymerase Chain Reaction and Gel Electrophoresis

PCR Setup
1. Allow your genomic sample to thaw and then mix it by gently tapping the tube with your finger.
2. Spin your tube briefly in a microcentrifuge to pull the contents down to the bottom, being sure to use another group’s tube as a balance.
3. Obtain one PCR tube containing Ready-To-Go PCR “beads”. Label the tube with your group and sample numbers.
4. Add 23μl of the Primer/Loading Dye Mix and allow the beads to dissolve for 1 minute.
5. Using a fresh tip, add 2μl of genomic sample to the PCR tube.
6. Place the tube in the thermal cycler for amplification. Following amplification, spin your tube down briefly in the microcentrifuge.

Agarose Gel and Electrophoresis Buffer Preparation (one agarose gel per two pairs of students)
1. Make 350 mL of 1X TBE (Tris Base/Boric Acid/EDTA) from 10X TBE stock. Transfer 40 mL of the 1X TBE to a flask and keep the remaining 310 mL in the large graduated cylinder.
2. To the flask containing the 40 mL of 1X TBE, add the appropriate weight in grams of agarose to make 2% agarose (in this 40 mL of 1X TBE solution). Check your calculations with your lab instructor before continuing.
3. Microwave for 1 minute. CAUTION: Flask will be hot. Use proper gloves for removal. Carefully swirl and view the molten agarose to ensure that it has fully melted.
4. Allow to cool to 65°C and then bring the flask to your lab instructor, who will dispense 2.0 μL of ethidium bromide into your flask. SUSPECTED CARCINOGEN: WEAR GLOVES!
5. Pour your gel and add the combs to make the wells. Allow to solidify for about 20 minutes. Check with your instructor.
6. Turn the gel (after comb removal) to the proper orientation in the electrophoresis unit and add the remaining electrophoresis buffer to the apparatus.
7. Load your gel as follows:
--Lane 1: 4μl of group 1 PCR-amplified plant sample
--Lane 2: 4μl molecular weight standard
--Lane 3: 4μl of group 2 PCR-amplified plant sample
8. Run gel at 120 V for 20 minutes.
9. Your laboratory instructor will assist you in interpreting your gel results. For each sample you should have one band of approximately 580 bp. For comparison, the molecular weight standard has fragments of 100, 250, 400, 800 and 1,500 bp in length. Your instructor will post an image of your gel on BeachBoard.
10. Provide the tube with your PCR product to your lab instructor for sequencing. Make sure your tube is CLEARLY labeled with your lab group’s initials and plant number. If your PCR was successful, two sequencing reactions will be set up, one with the rbcL for primer and one with the rbcL rev primer. 10 μl of PCR product will be used in each dideoxy sequencing reaction.

Day 3: Interpreting Sequencing Results and Phylogenetic Tree Building
1. Set up an account with DNA Subway (dnasubway.iplantcollaborative.org). This should be done the week before the data are analyzed since account setup takes 24 hours. Each lab group should set up one account with a shared username and password.
2. Log in to DNA Subway, and click on the blue box towards the left of the screen, Determine Sequence Relationships.
3. Under Barcoding, select rbcL, then select Import trace files from DNALC. The sequences obtained by GeneWiz are automatically uploaded to the DNALC.
4. Search for the name provided by your TA, and click on the tracking number to get a list of sequences.
5. Select your sequences. There should be an F and R for your DNA sample. Click on Add Selected Files, and let the green bar go across the screen until the files are uploaded.
6. Give a Project title that includes your group name and plant numbers. Adding a description is optional. Click on Continue.
7. Now the blue line is apparent. There are three hubs: Assemble Sequences, Add Sequences and Analyze Sequences. We will take the DNA Subway down the Assemble Sequences branch line first.
8. Click on Sequence Viewer to see your two DNA sequences. By moving the gray bar below the sequences, you can see the entire sequence.
At the beginning and end there will be many N’s. N’s are read by the ABI machines when there are multiple strong peaks in one location and a base cannot be called. There should not be many N’s in the middle of your sequence.
9. You can view the electropherogram by clicking the View Trace icon next to the sequence name. The four differently colored traces show the signal for the four bases: green = A, red = T, blue = C and black = G. A black peak, by itself, is called a G by the sequencing machine software.
10. Just below the called letters is the Quality score, shown by vertical blue bars. This is measured by the Phred Score. A Phred Score of 20 signifies that the probability that the base was called incorrectly is 1 in 100. Phred scores above 20 are considered reliable, and the horizontal blue line marks a Phred score of 20. Click on the View Trace icon for each sequence; be sure to note if you get a “Low Quality Score Alert”. A sequence with this alert may need to be removed from the analysis.
11. Close the Sequence Viewer box and click on Sequence Trimmer. This will trim the N’s off of the ends of each sequence. When done, click again to see the trimmed sequences. Close the Sequence Trimmer box.
12. Click on Pair Builder and the Pair Builder box opens up. For each DNA sample, there should be an F and R. Click on the box to the right of the sequence for the F and R of your DNA sequence. Once you check the two boxes, a window will pop up that asks “Pair them?” Click yes.
13. For the R sequence, click on the blue F to the right of the sequence, and it will turn into a red R. This means that the reverse complement of this sequence will be used in the sequence alignment. This is done so that both sequences are on the same strand.
14. If you had a low quality score alert on a certain sequence, do not include it in the pair building. Just use the high quality sequence in your analysis below.
15. Click on SAVE to save your pair.
Then click on Consensus Builder to view the consensus DNA sequence based on the DNA sequences that were obtained.
16. The Consensus Editor will open when Consensus Builder is clicked. If your DNA sequences are high quality, there will be few or no yellow mismatches. If there are many mismatches, it will be necessary to remove a low quality sequence from the analysis. In the Editor, you can change the name of your sequence and give it the common name provided by your TA for the species you analyzed. Click Edit Name, change the name, and then Save.
17. Now you are ready to take the DNA Subway down the Add Sequences branch line. Skip the first two stops and click on “Reference Data”.
18. Click on Common plants, then Add ref data, and close the box.
19. Now you are ready to take the DNA Subway down the Analyze Sequences branch line. Click on Select Data. Click Select all to add the User data and the Reference data set (Common plants) to be used in the analysis. Click on Save Selections, which will close the box. The Common plants are a wide variety of angiosperm (covered seeds) and gymnosperm (naked seeds: ginkgo, pine) plants, plus a fern (a seedless vascular plant, grouped with the gymnosperms in this exercise). The angiosperms contain representatives from both monocotyledons (asparagus, wheat, corn and rice are examples) and dicotyledons (magnolia, sunflower and broccoli are examples). If one of your sequences was low quality and NOT used in pair building, then be sure to choose the high quality sequence in the Select Data window.
20. Click on MUSCLE to align the sequences. The light will flash yellow and red while the program is running and then change to solid green with a white V when it is ready to be viewed. Click on MUSCLE to open the Alignment Viewer box. Your lab group’s sequence will show up on the alignment with vertical bars that show nucleotide differences. More closely related species will be near one another in the MUSCLE alignment. Close the Alignment Viewer box, and click on PHYLIP NJ.
21. View the output of PHYLIP NJ by clicking on the icon when a white V is visible on a green background. A phylogenetic tree built by Neighbor Joining (NJ) will be visible.

Questions:
Print and attach your final NJ tree and answer the following questions:

Technical Questions
1. During the genomic DNA isolation, Protein Precipitation Solution was added to the tube, and then the tube was centrifuged and the supernatant was transferred to a fresh tube. Explain what was happening to proteins during these steps and why these steps are needed for purifying genomic DNA.
2. Which primers were used for PCR? Why did these primers work for PCR amplification in all the plant species located outside of MLSC?
3. Why is ethidium bromide added to the agarose gel solution? While the gel was running, was the DNA visible? What must be done to visualize the DNA fragments in the gel?
4. What does an N in a DNA sequence signify?
5. Why did the reverse complement of the R sequence need to be used in the analysis? Draw out a DNA sequence with 5’ and 3’ ends to answer this question.

Phylogenetic Tree Questions
1. Look at the tree as a whole. Do the gymnosperms, monocotyledons and dicotyledons form different clades? Show these three groups on your tree (see point 19 above). In some cases, one representative from one group may be in an odd place, likely due to the low number of sequences in this analysis. Why would it help to have more species in the analysis, for instance many more monocotyledons?
2. The numbers on the NJ tree estimate the probability, as a percentage, that the relationships are correct. 100 means that if the analysis were done 100 times, the separation adjacent to the number would always be made. If the value is 45, this means the separation would be observed 45/100 times. Values must be greater than 50 to be considered high confidence. Circle the numbers that separate gymnosperms from angiosperms and monocotyledons from dicotyledons. Are these values above 50?
3. Look for the closest relatives to your DNA sequence. These may be in the same family as your sequence or could be in a closely related family. Using the common name of the species that was used for DNA isolation and sequencing, search for the correct family using the internet. The Common plants reference list only has a subset of families, so you may not have a representative from your family in this phylogenetic tree. Report which family is closest and the correct family for your species. If you were studying this family in more detail, what types of reference barcodes would provide a more meaningful tree?
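
As a numerical aside to the Phred Score described in Day 3, step 10: the Phred score Q and the probability P that a base was called incorrectly are related by Q = -10 log10(P), so a score of 20 corresponds to 1 error in 100. The Python sketch below converts between the two; the function names are illustrative and are not part of the DNA Subway.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for an error probability p, where 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be greater than 0 and at most 1")
    return -10 * math.log10(p)


for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    # Each increase of 10 Phred units makes a miscall ten times less likely.
    print(f"Phred {q}: about 1 error in {round(1 / p)} base calls")
```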