ASSET Student Laboratory

Collecting Wild Tetrahymena

For hundreds of years, humans have been compiling information about the plants, animals and other organisms that share their world. At the most basic level, it’s simply interesting and exciting to know what’s there. But that information can also give us insight into how those organisms came to be there and how they’re related to other organisms. With the tools of molecular biology, scientists can now answer questions that we couldn’t even have imagined asking two hundred years ago.

Tetrahymena is a ciliated protist that occurs naturally in fresh water. Only about 50 µm long itself, it eats things that are even smaller, using its cilia to sweep them into its mouth. Generally these smaller objects are bacteria, so areas in which there are lots of bacteria are good places to find Tetrahymena. Since bacteria are decomposers, places where there is dead plant or animal material in the water are likely sites for finding them.

Tetrahymena has been used in research labs to study the structure of chromosomes and the catalytic ability of RNA, among many other things. It’s easy to grow in the lab and it’s eukaryotic, so it’s an ideal model organism for research. But surprisingly little is known about where it lives and how it’s distributed around the world in natural ecosystems. Up to now, relatively few biologists have been involved in finding out where it lives and how widespread it is. The goal of this lab is to help build a record of where Tetrahymena lives by collecting water samples in your area and looking for Tetrahymena.

Identification of protists is tricky because they’re so small. Many are similar-looking and the differences between species can be so subtle that it takes a practiced eye to spot them. Fortunately, molecular biology has provided tools that let us use small differences in genetic sequences to distinguish between members of different species.
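One way to picture how small differences in genetic sequences can separate species is to count the positions at which two sequences agree. The short Python sketch below is only an illustration. The three sequences are made up, the function name percent_identity is hypothetical, and the code assumes the sequences are already lined up and are the same length. Real search tools do the lining up for you.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up 20-letter sequences, for illustration only.
reference = "ACGTACGTACGTACGTACGT"
close_relative = "ACGTACGTACGAACGTACGT"    # 1 difference from the reference
distant_relative = "ACGAACGTTCGAACGTACCT"  # 4 differences from the reference

print(percent_identity(reference, close_relative))    # 95.0
print(percent_identity(reference, distant_relative))  # 80.0
```

The sequence with fewer differences scores higher, which is the pattern you would expect for a closer relative.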
If you can find the organism, you can use molecular biology tools to figure out what type of organism it is. Tetrahymena is one of a growing number of organisms that have had their entire genomes sequenced. In other words, scientists know the ACGT spelling of all the genes in the organism. All of that information is entered in a database that’s freely accessible. If you collected a protist from a pond in your area, extracted its DNA, and determined its ACGT sequence, you could use a computer to compare the sequence of your organism to the sequences of all the other organisms in the database. The more closely related two different organisms are, the more similar their DNA sequences will be. If you found that your organism was, in fact, Tetrahymena, you could report your discovery to scientists who are interested in knowing more about the distribution of the organism in the wild.

2012 NIH SEPA ASSET Program Field Research BioInformatics Student Protocol

Overview

The goal of this research project is threefold: 1) to determine whether or not Tetrahymena is present in the water you sample, 2) to isolate DNA from your “Tetrahymena” so it can be sequenced, and 3) to use the sequence to find the most likely identity of the organism you've collected. Is it actually Tetrahymena, and if so, what species is it? Scientists are interested in the results you get, so it's important to work carefully and keep detailed records so you can report everything you've discovered.

To do this, you will work through the six parts of the lab listed below. Since PCR and sequencing require special equipment, you’ll send your samples out to a lab for those two steps (unless your classroom has a PCR machine, in which case only the sequencing needs to be done in a specialized lab), but you’ll do the rest yourself.
• sample collection
• isolation of Tetrahymena
• DNA extraction
• PCR
• DNA sequencing
• bioinformatics – comparison of your Tetrahymena sequence to sequences in genetic databases

Scientists use molecular biology techniques every day. In the field of human biology, knowing a DNA sequence can give information about whether or not a person carries a genetic disorder, how likely a person is to develop a particular type of cancer, what medication will be most effective in treating a disease in a particular patient, or which murder suspect left evidence at the scene of a crime.

In the field of evolutionary biology, sequence information is used to determine how closely related organisms are. Once we know how closely related organisms are, we can draw conclusions about their evolutionary history. The general rule is that the more similar the DNA sequences of two organisms are, the more closely related they are. (If you think about it, DNA copying is pretty faithful from generation to generation, but occasionally copying mistakes are made. The more time and generations there are separating two organisms, the more likely it is that there will be differences.)

All organisms on earth have a common ancestor. In the case of you and your brother, the most recent common ancestors are pretty obvious – your parents. In the case of you and your pet mouse, there’s still a common ancestor, but that ancestor existed far back in history, and may not have looked very much like either you or your mouse. You and your brother are not identical in terms of your DNA sequence (unless you’re identical twins), but your DNA sequences will be 99.9% the same. You and your pet mouse are closer to 85% identical. From an evolutionary standpoint, this means that the common ancestor you and your pet mouse share existed much further back in time than the common ancestor you and your brother share. (Interestingly, most of the differences between you and your mouse are in non-coding regions of the DNA.
Only a small portion of the differences are in areas that make you very different from your mouse. This is one of the reasons that mice make good model organisms for testing drugs and other substances. Where it really matters in the DNA, mice and humans are very similar.)

So, our goal is to find some organisms that appear to be Tetrahymena, to extract their DNA, determine the DNA sequence, and figure out how closely related they are to Tetrahymena thermophila, the organism used by researchers in the lab.

Day 1: Collecting Samples

Tetrahymena have been found in the USA, Canada, Mexico, Central and South America, the Caribbean, Europe, China, Australia, Africa and the Pacific Islands. In the United States, up to this point, the areas where most Tetrahymena strains have been collected have often been linked to locations in which researchers live and work. Student research will help to fill the gaps in our knowledge of where Tetrahymena live and what species occur in nature.

The best collection results will come during warmer weather – late May through early October. It's not known for sure what Tetrahymena do during the winter, but their numbers drop in the fall and rebound the following spring. They are more likely to be found in still water than in fast-moving water.

Tetrahymena eat mostly bacteria. Look for them where they feed, so try for an area that has emergent vegetation. Bacteria are important decomposers, so decaying plant and animal material will probably mean more bacteria and, therefore, more Tetrahymena. Stir up the bottom a little by dragging your collection bag across the bottom when you sample. You will probably have some mud in your water, but you don't want a bag full of mud.

Feel free to take more than one sample in a pond. Since Tetrahymena are attracted to areas where bacteria numbers are high, you may find many Tetrahymena in one area (perhaps around a dead fish?)
and none only yards away. If you're sampling in the same pond, just spread your sampling sites out.

The sampling tool is a golf ball retriever, fitted with a one-quart Ziploc freezer bag. You only need about 100 mL of sample – less than a quarter of a bag. A bag that’s half-full will be easiest to work with and some air space at the top is good for organisms that need oxygen.

After retrieving your sample, either in the field or back in the lab, place your collecting vial into the conical tube and secure it as shown in the photo. Put the conical tube into the bag and seal it. Working through the sealed bag, move the conical tube until it fills with water. Let the bag and collection set-up sit overnight.

If you’re really eager to know if your sample is likely to be positive, you can take a look at any point. Use a transfer pipette to remove a milliliter or so from the bag and place it in half a Petri dish (it doesn't need to be sterile). Place the dish under a microscope and look to see what you have. You'll probably need to use your 10x objective to see protists (some are big enough to see with the 4x lens).

If you’re collecting in the summer or fall, the immediately obvious organisms are likely to be Daphnia, copepods, worms, larvae, etc. Protists are smaller than these. Tetrahymena look like tiny footballs that swim in relatively straight lines until they bump into something. They are smaller than Paramecium but move in much the same way. At this magnification, you won't see bacteria, but there are plenty of them in the water.

If you’re collecting in the spring, you won’t see the big impressive critters, but there should be tiny organisms swimming around. As the weather gets warmer and summer nears, there will be more large organisms. Since Tetrahymena are nearly transparent, dark field illumination is used in the lab for viewing them, but you can see them with a regular microscope, especially if you turn the light down a bit.

Where did you collect that specimen?
You can pinpoint the location with a GPS unit if you have access to one. Some cell phones and PDAs have GPS capability. If not, it's easy to use Google Earth to get the location. You just have to have a good sense of precisely where you took your sample. Mark it on a road map if you aren't sure you'll remember. Make a note of landmarks like nearby buildings, roadways or the shape of the shoreline. A sketch of the area will help you to remember exactly where you were when you’re looking at a map later.

You can use either Google Earth or Google Maps to get the GPS coordinates. Use the directions below that apply in your case.

If Google Earth is loaded on your computer, click on the icon to get started. Double click on the map of the U.S. and then on the map of your state. This will zoom you in and center the expanded map on your location. Keep zooming in until you can locate the pond you sampled. You can use the hand tool to drag the map around until you locate exactly where you took your sample. Choose the yellow “Add Placemark” tool from the menu at the top. A yellow pushpin and a new window will appear. Drag the pushpin to the spot where you collected and the coordinates will appear in the new window. Copy the coordinates into Data Table 1.

If you don’t have Google Earth, connect to the Internet, go to the main Google website (www.google.com) and click on "Maps" at the top of the page. This should take you to a map of the US. Choose "Satellite" in the upper right corner of the map. This will give you an actual aerial view, making it easier to locate physical features like ponds. Double click on the map of the U.S. and then on the map of your state. This will zoom you in and center the expanded map on your location. Keep zooming in until you can locate the pond you sampled. You can also use the hand tool to drag the map around until you locate exactly where you took your sample.
The pointer turns into a hand when you click on the map. Right click (control click on a Mac) on the site and choose "What's here?". The GPS coordinates for the site you just clicked on will appear right above the map window and the site will be marked on the map. Copy the coordinates into Data Table 1.

If you want to make sure you have the right coordinates, re-open Google Maps or Google Earth and paste in your coordinates. Press Enter and it will take you straight to the site you just identified.

Give a brief description of the site. Is it surrounded by overhanging trees or exposed to bright sunlight? About how big is it? Was it cold and rainy or warm and sunny when you collected? What is the land around it used for – residences, industry, recreation, farmland?

Data Table 1.
GPS Coordinates of Collection Site:
Description of Site:

Day 2: Getting Tetrahymena out of the mix

The next step is to isolate Tetrahymena. Along with all the obvious large organisms, there's a huge population of bacteria and fungi. The small collection vial was filled with a mixture that scientists use to grow Tetrahymena in the lab. The idea is to attract the relatively small number of Tetrahymena from the bag of water into the vial to make it easier to isolate them. Unfortunately, bacteria and fungi also grow well in the same mixture. You will transfer the contents of the small vial to tubes that contain growth medium plus antibiotics that will help to limit the growth of bacteria and fungi. Protists like Tetrahymena are not affected by the antibiotics. (Remember that protists are eukaryotic just like you. They are much more similar to you than bacteria are. Antibiotics are useful because they interfere with bacterial growth but not eukaryotic growth.)

Materials
• Zip-lock bags with collection vials
• tube rack
• forceps
• waterproof marker
• tubes with Neff and antibiotics
• alcohol
• paper towels

Procedure
1.
Label your Neff tube with your initials, the date and the location where you collected your water sample.
2. Open your Zip-lock bag and remove the conical tube containing your collection vial. Re-seal the bag so it won’t spill.
3. Remove the vial from the conical tube and blot it dry with a clean paper towel. Take a clean piece of paper towel and create a work space for the next step.
4. Dip your forceps into alcohol to sterilize them. While they are air-drying, look at the open end of the vial. Poke the tips of the dry forceps through the filter paper covering the opening in the white ring. Hook the curved end of the forceps under the white ring. Lift the white ring up and out by rotating the forceps. You may have to go around the ring several times to get it all the way out. Put the ring and the forceps on the paper towel.
5. Pour the contents of the vial into a tube containing Neff and antibiotics. Place the cap on the tube, but don’t tighten it all the way. This will allow air to circulate.
6. Clear up your work space as your teacher directs.
7. Leave the tube in the rack at room temperature and check on it periodically for the next few days. Record your observations in Data Table 2. Is it clear or cloudy? Can you see any evidence of growth? What does the evidence look like?

Data Table 2. First Transfer to Neff.
Incubation Time (hrs)     Observations
____ hours
____ hours
____ hours

Day 3: Getting Rid of Contaminants - Again

When you captured Tetrahymena (or other protists) and transferred them into growth medium, you also transferred contaminants. It was unavoidable because they were in the water sample. The antibiotics were added to keep the contaminants from growing rapidly and overwhelming the Tetrahymena. Fungi are particularly hard to get rid of because they form spores that may have actually been protected inside Tetrahymena’s food vacuoles.
If the spores aren’t digested, they are excreted with other waste and may germinate (begin to grow again) in the growth medium. Waiting a few days should ensure that the spores are eliminated. If you’re lucky, by making a second transfer to fresh medium, you will leave the spores behind and transfer such a small number of bacteria and other contaminants that the Tetrahymena will be able to eat them and you’ll end up with a culture in which the only living organism is the Tetrahymena. That’s your goal. It’s possible that you’ll have to make a third transfer if the contaminants seem to be getting the upper hand again. You and your teacher will make that decision. If your culture looks healthy after one or two transfers, you’ll be ready to extract DNA and send the DNA to ASSET for sequencing.

Materials
• tubes with Neff and antibiotics
• waterproof marker
• micropipette (20-200 µL)
• tube rack
• sterile dropper pipette
• tips for micropipette
• glass slide marker

Procedure
1. Place your culture and a fresh Neff tube into a rack. Label the fresh tube the same way you labeled the first one.
2. Without opening it, look at your sterile pipette and find the 1 mL mark. Open it as directed by your teacher, starting from the bulb end. Don’t let the tip touch your skin or any other surface. Draw up about 1 mL from the culture tube and transfer it to your fresh Neff tube. Cap the old culture tube tightly, and cap the new one loosely.
3. Observe again over the next few days and record your observations in Data Table 3.

Data Table 3. Second Transfer to Neff.
Incubation Time (hrs)     Observations
____ hours
____ hours
____ hours

Day 4: DNA Extraction

In order to determine whether or not the ciliates you have are actually Tetrahymena, you will need to extract their DNA and have it sequenced. To extract the DNA, you will use a chemical (Chelex) that will break up the cells and prevent damage to the DNA.
When a cell breaks up, its DNA is unprotected and comes into contact with enzymes called nucleases that the cell would ordinarily use to break down foreign DNA from viruses or bacteria, to make repairs to its own DNA, or to break down and recycle DNA from dead or dying cells. Fortunately, these enzymes usually need co-factors, like metal ions, in order to function. Chelex is a chelating agent, a substance that binds these metal ions, especially the very important Mg2+ ions, and keeps them from activating the DNA-degrading enzymes. It also attracts other polar components of the cytoplasm. When the Chelex and all the other substances that have attached to it settle to the bottom of the tube, the DNA is left in the solution above it.

The solution contains a lot of other substances besides DNA, but the next step will be to make many copies of the DNA for sequencing using a technique called PCR, or polymerase chain reaction. This technique only works on DNA and will ignore all the other substances in the solution, so it’s OK that the DNA isn’t completely pure. The PCR reaction needs to use some of those same metal ions, so it’s important to get as much of the Chelex as possible out of the mixture. Simple settling will remove most of it, but there may be small particles that don’t settle out. Using a centrifuge will help to remove these. When a tube spins very fast in a centrifuge, particles that are large compared to water, DNA and other molecules in the solution will be forced into a pellet at the end of the tube. When the liquid above the pellet is removed carefully, the Chelex particles will remain in the pellet and the liquid will be free of all but the tiniest Chelex particles. This liquid mixture contains the DNA and is ready to be sent back to ASSET for PCR.
Materials
• your culture
• micropipette (20-200 µL)
• micropipette tips
• 5% Chelex
• 55ºC water bath
• 100ºC water bath
• floating microfuge racks
• 2 microfuge tubes
• microfuge tube rack
• mini centrifuge
• marker
• microscope
• microscope slide
• cover slips
• waste beaker

Procedure
1. Label one of your microfuge tubes “extraction.” Label the second tube “final extract.” Put your initials and the date on both and put them in the rack.
2. Set your micropipette for 100 µL, put a tip on it, and transfer 100 µL of your culture to the “extraction” tube. Discard the tip in the waste beaker. Re-set the micropipette to 200 µL, put on a fresh tip, and transfer 200 µL of the freshly shaken Chelex mixture to the “extraction” tube. (If one partner shakes the Chelex tube and opens it for the partner with the micropipette, the Chelex should stay suspended pretty well. Work quickly so it doesn’t settle out too much.)
3. Place the “extraction” tube in the floating rack. (You’ll probably be using the same rack as other students, so you may have to wait until others are ready.) Place the floating rack in the 55ºC water bath for 30 minutes. During this time, the Chelex will break up the cells and bind up many of the substances that might break down the DNA.
4. While you are waiting for the Chelex to work, examine a sample of your culture under a microscope. Use your micropipette to transfer 20 µL from your culture tube to a slide. If you are going to use your 4X or 10X objectives, you won’t need a cover slip, but if you go to the 40X objective, make sure you cover the drop with a cover slip. Look for moving cells and sketch what you see with the 10X objective in the space provided. If you have time, switch to the highest power and make a sketch at that level too.
5. After the 30-minute incubation is done, transfer the floating rack to the 100ºC water bath for 8 minutes.
This step will denature (deactivate) enzymes that might break down the DNA. While your tube is in the boiling water bath, read the next two steps and get ready to do them quickly.
6. Place your tube in the mini centrifuge, making sure it’s balanced by other tubes. (“Balanced” means that there’s another tube with the same amount of liquid right across from yours. If you turn on the centrifuge when it’s not balanced, it won’t spin smoothly and may actually be dangerous.) When the tubes are ready, turn the centrifuge on for 3 minutes.
7. Set your micropipette for 100 µL, put on a new tip and, immediately after the centrifuge stops, draw 100 µL of clear liquid from above the Chelex pellet in the bottom of the tube. Don’t get any Chelex into the tip. (If you do, just put the liquid back into the tube and spin it again.) If you can’t get 100 µL without picking up Chelex, take a little less than 100 µL. Transfer the liquid into the “final extract” tube.
8. Put your “final extract” tube into the rack your teacher has set up for the class. These are the tubes that will be sent to ASSET for sequencing. Although it doesn’t look like much in the tube, there’s enough for 50 PCR reactions!
9. Put your culture tube into the rack your teacher has set up for the class. The scientists at ASSET maintain a collection of different Tetrahymena strains from all over the world. If your sample turns out to be Tetrahymena, some of your cells will be added to the collection and might be used for research in the future.
10. Your “extraction” tube can go into the waste beaker with your tips. Clean up the rest of your work space.

What Happens to the DNA Next?

The PCR reaction requires a machine that automatically repeats a temperature cycle over and over. Because PCR machines are delicate, you will ship your DNA to ASSET and scientists there will run the PCR and send the DNA to be sequenced.
Here’s what will happen. Your DNA sample will be mixed with all the other components that are needed to make DNA:
• nucleotides, the A, C, G and T building blocks of DNA,
• DNA polymerase, the enzyme that will put the nucleotides together,
• primers, short bits of DNA that attach to the part of your DNA we’re interested in copying. They give the polymerase a place to begin making a copy, and
• buffer, a solution containing those important Mg2+ ions and other chemicals that keep the pH of the mixture just right for the enzyme to work.

Once all of these chemicals are mixed in a tube, the tube is put into the PCR machine. The machine is called a thermal cycler because it automatically repeats the same cycle of temperature changes over and over. Each cycle has three steps:
• denaturation – in this step the DNA mixture is heated to about 98ºC for about 10 seconds, causing the double strands to come apart and making the DNA single-stranded.
• annealing – the DNA is cooled to about 50ºC for 30 seconds, allowing the primers to find and stick to their complementary sequences on the DNA.
• elongation – the DNA is heated up to about 70ºC for 60 seconds, allowing the polymerase to add nucleotides to the primers and make a much longer piece of DNA (polymerase can add about a thousand nucleotides per minute). The polymerase uses your DNA to tell it the order in which to add nucleotides to the new pieces, so it’s not just making random DNA – it’s making an exact copy of your DNA.

This cycle of three time and temperature conditions is repeated 35 times, so by the time the PCR reaction is done, your DNA molecules have been doubled 35 times. This means that, theoretically, each DNA molecule in the original sample could produce tens of billions of copies (2 multiplied by itself 35 times is about 34 billion). Since the process isn’t 100% efficient, it probably won’t make quite that many, but it will certainly make millions of copies. Eventually, the raw materials will be used up and the enzyme will become less efficient after many cycles of heating and cooling.
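The doubling arithmetic is easy to check for yourself. The Python sketch below is an illustration only, not part of the ASSET procedure. The function name pcr_copies is hypothetical, and the 80% efficiency figure is an assumed value chosen just to show what "not 100% efficient" does to the total.

```python
def pcr_copies(start_molecules: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies after a number of PCR cycles.

    efficiency is the fraction of molecules copied in each cycle:
    1.0 means perfect doubling, 0.0 means no copying at all.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_molecules * (1.0 + efficiency) ** cycles


perfect = pcr_copies(1, 35)                    # every molecule doubles each cycle
imperfect = pcr_copies(1, 35, efficiency=0.8)  # 0.8 is an assumed, illustrative value

print(f"perfect doubling: {perfect:,.0f} copies")  # 34,359,738,368
print(f"80% efficiency:   {imperfect:.2e} copies")
```

Even with the assumed 80% efficiency, a single starting molecule still gives hundreds of millions of copies, which fits the "certainly millions" estimate above.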
Once the PCR is done, the copied DNA will be sent to a special lab that has an automated sequencer. This machine is able to take the copies of DNA and make even more copies. This time, however, when the PCR reaction is run, special chain-stopping nucleotides are used. The PCR tube will contain a mixture of normal A, C, G, and T molecules, but also some chain-stopping A, C, G and T molecules. When the copying starts, if polymerase puts a normal A into place, the chain can keep growing, but as soon as a chain-stopping A is used, the chain ends with that A. The places where the chain-stopping nucleotides are added are random, so you end up with pieces of DNA that go from very short (the chain stopped soon after it started) to very long (the chain stopped at or near the end of the piece of DNA being copied). The same thing happens with the chain-stopping G’s, C’s and T’s.

In the sequencing process, these pieces are made, sorted by size, and their ending nucleotides are identified. If a piece that’s 3 nucleotides long ends in G, the third nucleotide must be G. If a piece that’s 6 nucleotides long ends in A, the sixth nucleotide must be A, and so on. A computer will immediately convert all that information into a sequence, so your data will be available the day after the DNA is sent for sequencing.

The sequence you get back will not be the whole sequence of Tetrahymena. That sequence has been worked out, but it’s way too long for us to use. The sequence information you will get pertains to three small areas of the Tetrahymena genome:
• H3H4 – a short section of DNA that occurs between two genes for histones 3 and 4. Histones are proteins that help coil up the DNA so it’s ready to be divided into two “packages” for two separate new cells. The PCR product made with the primers used is about 699 bp long. (The abbreviation bp stands for “base pair.” A base pair is a pair of complementary nucleotides – an A paired with a T for example. So, 699 bp means that the double-stranded DNA sequence is 699 nucleotide pairs long.)
• 5.8s ITS – This is a non-coding region between the structural ribosomal components. The 5.8s ITS sequence is transcribed as part of a common precursor transcript, but is excised and rapidly degraded during ribosomal RNA maturation. The PCR product made with the primers used is about 150 bp.
• cox1 – a section that codes for part of cytochrome oxidase I, an enzyme that is important for cellular respiration. This DNA is actually in the mitochondria, not in the nucleus. The PCR product made with the primers used is about 2000 bp. The cox1 gene has been widely used for “DNA bar-coding,” that is, using short sequences of DNA to scan for differences that allow biologists to identify organisms, sort of the way a scanner in the grocery store looks at the barcode on a cereal box to tell what it is and what its price should be.

Three separate PCR reactions are run, one using H3H4 primers, one using 5.8s primers, and the third using cox1 primers. The primers used in the PCR process are specially chosen to “look for” these small regions. Scientists have already studied these regions and chosen primers that will automatically bind to the beginning of these areas in the DNA you extracted. These regions are chosen because they’re found in all Tetrahymena cells and are important enough that they don’t change much, yet they have experienced occasional mutations. (Remember that in order to use DNA sequence information to tell how closely related different organisms are, we have to be able to see differences.) So, when the PCR reaction is run using the H3H4 primers, only that section of the whole Tetrahymena genome will be copied. When the second PCR reaction is run with the 5.8s primers, only that section will be copied. And when the third is run, only the cox1 section will be copied.
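To see why a pair of primers copies only one section of a genome, it can help to imitate the search on a computer. The Python sketch below is a toy model under stated assumptions: the template and the 6-letter primers are made up (real primers are usually closer to 20 letters, and these are not the ASSET primers), the function names are hypothetical, and the code requires a perfect match, which real PCR does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the stretch of template bounded by the two primers, or None.

    The forward primer matches the template strand directly. The reverse
    primer binds the opposite strand, so on the template we look for its
    reverse complement somewhere after the forward primer site.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Made-up template: flanking DNA, then the target region, then more flanking DNA.
template = "TTTT" + "ACGGTC" + "AAACCCGGGTTT" + "GATTCA" + "CCCC"
amplicon = find_amplicon(template, "ACGGTC", "TGAATC")

print(amplicon)       # ACGGTCAAACCCGGGTTTGATTCA
print(len(amplicon))  # 24
```

Only the stretch between the two primer sites comes back. The flanking T's and C's are left out, just as the rest of the genome is left uncopied in a real reaction.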
If your sample contains DNA, the DNA will be sent for sequencing and you will get a report back that gives the sequences of these three short regions.

Day 5: Bioinformatics – Using Sequence Information to Make an Identification

At this point, you have received your sequence information from ASSET. It’s time to take it to the database used by scientists all over the world and see what the sequence tells you. You will be going to the National Center for Biotechnology Information (NCBI) website.

NCBI maintains a public database (called GenBank) of genetic information that has been contributed by scientists from all over the world. The information is freely available to anyone. GenBank was created in 1982 and, in recent years, has been doubling in size every 18 months. Scientists who work out the sequence for a particular gene, or for a particular region of DNA in the organism they study, send their results to GenBank, where they are registered and made available to the public. As more information is gathered about the gene, the region of the DNA, or the organism it came from, it’s added to the database in the form of annotations. Other public databases also exist and can be searched using the NCBI tools.

A complete human DNA sequence was finished in 2003. It’s like a book that’s billions of letters long (the only letters being A, C, G and T) with no spaces or punctuation to tell us where a gene begins or ends. Many clever scientists who are interested in finding the answers to questions about genetic diseases or evolutionary history go into the field of bioinformatics, where they use computer tools to search the genetic data for answers to important questions.

The complete genomes for many other organisms are also available now, including that of Tetrahymena. Since the databases are constantly growing, a search done today may yield different results than one done using the same sequence tomorrow.
Most of the information will be exactly the same, but newly added information may show up on the later run.

Materials
• DNA sequence information
• computer with access to the Internet

Running a BLAST
1. Connect to the internet and go to: http://www.ncbi.nlm.nih.gov/
2. On the right side of the page, under “Popular Resources,” choose “BLAST”. BLAST stands for Basic Local Alignment Search Tool. It is a tool that will allow you to enter your sequence and search all the genetic databases that are linked under NCBI. It will report the sequences that it finds that are most closely aligned to yours. In other words, it will report the best matches.
3. Where it asks you to choose a BLAST program to run, choose “nucleotide blast.”
4. There will be several steps involved in running your search:
   a. Open the document that contains your sequence information from ASSET.
   b. Copy the H3H4 sequence from your document, just the ACGT portion, not the title or any other information. Paste it into the window that says: “Enter accession number(s), gi(s), or FASTA sequence(s).” You are entering a FASTA sequence. Make sure you see it in the box.
   c. Just below the sequence box, there’s a box for entering the “Job title.” Enter “H3H4” and your initials.
   d. Scroll down to the “Choose search set” box and pull down to “nucleotide collection.” The pull down window shows the major collections of data that you might want to search. If you were studying the amino acid sequence of a protein, you could use the Protein Data Bank. But your data is a nucleotide sequence, so the nucleotide collection is the place to search.
   e. Scroll down the page a bit to the “Program Selection” section and tell it to Optimize for “Somewhat similar sequences.” (The default setting is “Highly similar sequences.” If you wanted to look at samples from two different humans, for example, the default would make sense because the sequences would probably be almost identical. By choosing “Somewhat similar sequences,” you allow the program to report bigger differences.
Since your organism might not even be Tetrahymena thermophila, it makes sense that there could be significant differences. A BLAST search that only looks for highly similar sequences might not give you any results at all.)
f. Scroll down to the bottom of the page and click on “Show results in new window.” This will bring up a separate window so you can look at both windows at the same time.
g. Just below that box, click on “Algorithm parameters.” A number of new choices appear. Under “Filters and Masking,” uncheck “low complexity regions.” Some areas of DNA have many repeats of the same nucleotides over and over. The default setting tells the search tool to ignore these types of sequences. You will be searching for a relatively short sequence that’s not repetitive, so you won’t need to filter these out. Filtering sometimes speeds up the search process, but it may block matches that could be important. If time isn’t an issue (and it won’t be for our short sequence), there’s no need to filter.
h. Now you’re ready to search, so click on the big blue “BLAST” button at the bottom of the page. A status window will come up. It tells you that it’s searching and how long it’s been since you submitted your BLAST request. It shouldn’t take much more than 30 seconds or so. When it’s done searching, a new window will come up displaying your results.

Interpreting your results
1. First check to make sure you’re looking at what you actually submitted. In the upper left corner, you should see the job title you entered. Below that, it should say that you did a “nucleotide” search for a sequence that was about “480” nucleotides long. To the right of that information, there’s a report telling which databases were searched and what program was used.
2. Below this, you’ll see a graphic representation of the search results.
The top horizontal red bar (the one labeled “Query”) represents the DNA sequence you pasted into the search box. Each of the bars below it indicates a match between your sequence and a sequence in the databases. The best match is at the top, probably shown in red. As you go down, the matches get weaker because there are more differences.
3. Scroll down the chart and notice that as you go down, the bar color goes from red to blue. (Note the color code right above your query bar. As the match gets weaker, the color changes.) You may also notice gaps in the bars. These are areas where your sample and the sample from the database didn’t match because one of the two was missing DNA. If the line picks up again, the BLAST tool found a match later in the sequence.
4. Mouse over the first match and notice that a brief description of the sequence your DNA matched appears in the box above your query bar.
5. Scroll down below the chart and you’ll see a table. Each row in the table corresponds to a bar on the graphic display, and they’re in the same order. The “Description” column gives a brief description of the sequence from the database, usually starting with the name of the organism it came from. A quick scan down the list can tell you whether your sequence is likely to match a member of the genus Tetrahymena, or even Tetrahymena thermophila itself.
6. There are three pieces of data to look at. (Check Data Table 4. You’ll be entering some of this data as you go along.)
a. max score (S) – This score comes from a mathematical calculation that takes a number of factors into consideration, like the length of the sequences, the number of identical pairs, whether there were gaps, etc. The higher the score, the better the match.
b. expect value (E) – E estimates how many matches this good you would expect to find purely by chance, without any evolutionary relationship, in a database of this size. When E is very small, it is effectively the probability that the match occurred randomly. For very short sequences, the likelihood of finding a random match is pretty high.
You are searching a 480-nucleotide sequence, so the likelihood of a perfect random match should be really small (close to zero). The smaller the E number, the more confident you can be that this is a good match. (A value of “1e-15” means that the probability of a random match is about 1 x 10^-15, or one in 10^15. That’s one in a quadrillion.)
c. max identity – This is also called the “% identity” and refers to the percentage of the nucleotides in the query and subject sequences that match exactly. The higher the percent, the better the match.
7. Click on the top red bar of the top entry in the graph. You’re now looking at details of the first match, perhaps something like the match below. (If you scroll down from this point, there will be a similar entry for each of the matches.) The sequence you pasted into the BLAST is the “Query” and the “Subject” is the matching DNA from the database. BLAST has lined up the two so you can see where the differences are. Look at the sample below and notice the following:
a. The first 12 nucleotides are exactly the same. You can tell at a glance because there’s a vertical bar between the letters. Where there’s no match, there’s no vertical bar. Find a spot where an A in one sequence is a G in the other.
b. “N” means that there was a nucleotide present, but the sequencing reaction wasn’t able to tell what it was.
c. A hyphen in one of the sequences indicates a “gap.” This means that one sequence has a nucleotide at that position and the other does not. The computer inserted the gap so the rest of the two sequences would line up and the match could continue. There are two in the sequence below.
d. Note that the “Query” sequence (your sequence) starts at nucleotide #11. The “Subject” sequence (the match found by BLAST) starts at #44. This means that the sequences you are comparing had different starting points. BLAST found the point where they started matching and reported from that point on.
In this case, which sequence was longer?

emb|X17135.1| Tetrahymena malaccensis histone H3II and histone H4II intergenic DNA
Length=559

Score = 762 bits (844), Expect = 0.0
Identities = 450/467 (96%), Gaps = 2/467 (0%)
Strand=Plus/Plus

Query  11   TTCTGGCGGCCT-GGNAGCGAGTTATTTTCTGGGGGCCTTAGCACCAGGTGGACTTTCTA  69
            |||||||||||| || |||||||||||||||||||||||||||||||| |||||||||||
Sbjct  44   TTCTGGCGGCCTTGGAAGCGAGTTATTTTCTGGGGGCCTTAGCACCAG-TGGACTTTCTA  102

Query  70   GCAGTTTATTTAGTTCTAGCCATTTTTGCTTATGTATTTATAGTGGATTGTCTTTTTGAC  129
            |||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||
Sbjct  103  GCAGTTTATTTAGTTCTAGCCATTTTTGCTTATGTATTTATAGTGGGTTGTCTTTTTGAC  162

Query  130  TTTTCTTTTGAAGGTTATTATTTTTTTTTAATAAAATTCTTTATCGACAACAATTAGGGC  189
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  163  TTTTCTTTTGAAGGTTATTATTTTTTTTTAATAAAATTCTTTATCGACAACAATTAGGGC  222

Query  190  AAGATCATTTGAAATGTTTGGCATAATCCTGGAAAGAGAAGATATGAACAATTTTGATTG  249
            |||||||||||||||||||||| |||||||||| ||| ||||||||| ||||||||||||
Sbjct  223  AAGATCATTTGAAATGTTTGGCTTAATCCTGGATAGACAAGATATGATCAATTTTGATTG  282

Query  250  GATGATTTGAAAGGAAATCAGATTTTTGAGATTTTATCCAATCAAATTTGAGATCTCCGA  309
            |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
Sbjct  283  GATGATTTGAAAGGAAATCAGATTTTTGAGATTTTATCCAATCAGATTTGAGATCTCCGA  342

Query  310  GCAATTTGGATAATTAAATAATATTAAAAAAAAAGAGATCTTTCCCCAAAGACGATAATC  369
             ||||||||||||||| ||| |||||||||||||||||||||||||||||||| ||||||
Sbjct  343  TCAATTTGGATAATTAGATATTATTAAAAAAAAAGAGATCTTTCCCCAAAGACTATAATC  402

Query  370  ATTAAAACAAAAATAAATAATCTAATTAAAAATAACAATAAAAAAATAATAATCCAGCAA  429
            ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct  403  ATTAAAACAAAAATAAATAATCTAATTAAAAATAATAATAAAAAAATAATAATCCAGCAA  462

Query  430  AAATGGCCGGTGGAAAAGGTGGTAAAGGTATGGGTAAGGNCGGANCC  476
            ||||||||||||| ||||||||||||||||||||||||| |||| ||
Sbjct  463  AAATGGCCGGTGGTAAAGGTGGTAAAGGTATGGGTAAGGTCGGAGCC  509

8. Let’s say you want to know more about your top entry. Click on the blue “Accession Number” in the upper left corner.
When a scientist submits a DNA sequence to the database, it is given an accession number that can be used to find the sequence and any other information about it that has been submitted. Clicking the accession number opens a new page that gives all the information stored in the database about this DNA sequence. Answer the remaining questions in Data Table 4 for your sequence. (Remember that blue text is probably a link to other information, and that you have information search tools available on your computer.)
9. Repeat the process for your other DNA sequences, the 5.8s and cox1. Enter your data in Data Table 4.

Data Table 4. BLAST Search Data.

                                                   | H3H4 sequence | 5.8sITS sequence | cox1 sequence
Collection Site Name                               |               |                  |
Collection Site GPS Coordinates                    |               |                  |
Job Title for BLAST                                |               |                  |
Max Score (S)                                      |               |                  |
E Value                                            |               |                  |
% Identity                                         |               |                  |
Accession Number                                   |               |                  |
Organism the best match in the database came from  |               |                  |
What author(s) submitted the sequence?             |               |                  |
What was the date of their earliest publication?   |               |                  |
How many different journals are listed?            |               |                  |
What super kingdom does the organism belong to?    |               |                  |
What family does it belong to?                     |               |                  |
How many nucleotides are in the sequence?          |               |                  |

Conclusion Questions
1. Tetrahymena is a tiny ciliated protist. It feeds by using cilia to sweep particles from the water into its oral groove. Why do you think it’s more likely that you’ll find Tetrahymena in still water than in a fast-moving stream?
2. Why did your teacher suggest collecting near decaying vegetation, even though Tetrahymena doesn’t eat plants?
3. Draw a picture of the most interesting organism you saw in your water sample. Describe in words what made it interesting to you.
4. Why was it necessary to use antibiotics in isolating Tetrahymena-like protists from the water sample?
5. Why would anyone want to know the GPS coordinates of the collection site?
6.
During the DNA extraction, what was the function of the Chelex?
7. Why did you transfer the Chelex mixture to a boiling water bath?
8. If a PCR reaction prepared from evidence left at the scene of a crime started with three strands of DNA from a suspect, how many copies of a section of that DNA would there be after 35 PCR cycles?
9. In the PCR reaction, what is the function of the DNA polymerase? Most enzymes would be deactivated by the high temperatures needed in the PCR reaction. Using Google, find out how molecular biologists found a heat-resistant enzyme.
10. Why does a PCR reaction need primers?
11. Why does only a small portion of the DNA get copied during the PCR reaction?
12. The pieces of DNA made in a sequencing reaction were sorted by gel electrophoresis. The final, chain-stopping nucleotide for each piece is shown on the image of the gel.
a. What is the ending nucleotide of the shortest piece? ________
b. What is the ending nucleotide of the longest piece? ________
c. What would the sequence of the original DNA be? _________________
13. In your BLAST search, what organism was the closest match based on the H3H4 sequence? the 5.8s sequence? the cox1 sequence?
14. In your BLAST search, what was the % identity for the best match based on H3H4? based on 5.8s? based on cox1?
15. Imagine that you ran BLAST searches on DNA sequences from three different organisms. BLAST compared your sequence data to an organism called Pentahymena arcticophila and reported the following data. Answer the questions below based on the data.

Collection site  | H3H4 % Identity | H3H4 E value | 5.8s % Identity | 5.8s E value | cox1 % Identity | cox1 E value
North Pond       | 100%            | 0            | 100%            | 0            | 97%             | 0
Johnson Lake     | 98%             | 0            | 97%             | 0            | 87%             | 9.8e-178
Smith’s Birdbath | 97%             | 0            | 92%             | 0            | 99%             | 0

a. Which of the samples contained organisms that were the most similar to Pentahymena arcticophila?
b. Which of the samples contained organisms that were the most similar to each other?
c.
Of the three DNA sequences, which seems to be “most highly conserved” (i.e., it has changed the least over time)?
d. Explain your choice for “c”.
16. Deciding exactly how much difference between two sequences would be needed to indicate that they came from two different species is tricky. Even with DNA sequence evidence, scientists aren’t always sure where to draw the line. If you had cox1 sequence data from two similar, but non-identical organisms, what other evidence would you want to look at before deciding that they belonged to different species? List at least five other pieces of evidence you’d want to look at:
a.
b.
c.
d.
e.
17. You’ve just used BLAST to compare your sequences with the sequences scientists have submitted to the database. You can also search for other information. Go back to the NCBI home page: http://www.ncbi.nlm.nih.gov/ In the box at the top of the page, enter “cytochrome oxidase”. (“All Databases” should be chosen in the pull-down menu.) Click on “Search.” The page that comes up lists all references to cytochrome oxidase found in all the public databases.
a. How many Nucleotide sequence records are there? (Don’t click on it – you’ll have to wait a long time for those records to load.)
b. Click on the link to “Structure.” What do you notice about the links that come up?
c. The link to OMIM gives information about human inheritance. Click on it and get a list of scientific journal articles that mention cytochrome oxidase. Scan down the list and find the name of the first disease mentioned in an article title.
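
Optional extension: checking “% identity” by hand
The “Identities” and “Gaps” numbers that BLAST reports (see steps 6c and 7 of Interpreting your results) can be recounted from any aligned pair of sequences. The Python sketch below is one way to do that count; it is not how BLAST itself is written. It uses only the first row of the Tetrahymena malaccensis sample alignment. The function name alignment_stats is made up for this example, and the sketch assumes that an “N” counts as a match only if both sequences show an N at that position.

```python
def alignment_stats(query, subject):
    """Count identities and gaps in two aligned sequences.

    Both strings must already be aligned (same length, '-' marks a gap),
    the way BLAST prints its Query and Sbjct rows.
    Returns (identities, gaps, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # An identity is a column where both letters are the same nucleotide.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # A gap is a column where either sequence shows a hyphen.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, gaps, len(query)


# First row of the sample alignment (Query 11-69, Sbjct 44-102).
query = "TTCTGGCGGCCT-GGNAGCGAGTTATTTTCTGGGGGCCTTAGCACCAGGTGGACTTTCTA"
subject = "TTCTGGCGGCCTTGGAAGCGAGTTATTTTCTGGGGGCCTTAGCACCAG-TGGACTTTCTA"

identities, gaps, length = alignment_stats(query, subject)
percent_identity = 100 * identities / length
print(f"Identities = {identities}/{length} ({percent_identity:.0f}%), "
      f"Gaps = {gaps}/{length}")
```

For this one row the script prints Identities = 57/60 (95%), Gaps = 2/60. Counting all eight rows of the sample the same way and adding them up gives the 450/467 identities and 2 gaps shown in the BLAST report.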