Tetrahymena Project Directions

ASSET Student Laboratory
Collecting Wild Tetrahymena
For hundreds of years, humans have been compiling information about the plants, animals and other organisms
that share their world. At the most basic level, it’s simply interesting and exciting to know what’s there. But
that information can also give us insight into how those organisms came to be there and how they’re related to
other organisms. With the tools of molecular biology, scientists can now answer questions that we couldn’t even
have imagined asking two hundred years ago.
Tetrahymena is a ciliated protist that occurs naturally in fresh water. At only about 50 µm long, it eats things that
are even smaller, using its cilia to sweep them into its mouth. Generally these smaller objects are bacteria.
Areas in which there are lots of bacteria are good places to find Tetrahymena. Since bacteria are decomposers,
places where there is dead plant or animal material in the water are likely sites for finding them.
Tetrahymena has been used in research labs to study the structure of chromosomes and the catalytic ability of
RNA, among many other things. It’s easy to grow in the lab and it’s eukaryotic, so it’s an ideal model organism
for research. But surprisingly little is known about where it lives and how it’s distributed around the world in
natural ecosystems. Up to now, relatively few biologists have been involved in finding out where it lives and
how widespread it is. The goal of this lab is to help fill in that picture by collecting water samples in
your area and looking for Tetrahymena.
Identification of protists is tricky because they’re so small. Many are similar-looking and the differences
between species can be so subtle that it takes a practiced eye to spot them. Fortunately, molecular biology has
provided tools that let us use small differences in genetic sequences to distinguish between members of different
species. If you can find the organism, you can use molecular biology tools to figure out what type of organism
it is.
Tetrahymena is one of a growing number of organisms that have had their entire genomes sequenced. In other
words, scientists know the ACGT spelling of all the genes in the organism. All of that information is entered in
a database that’s freely accessible. If you collected a protist from a pond in your area, extracted its DNA, and
determined its ACGT sequence, you could use a computer to compare the sequence of your organism to the
sequences of all the other organisms in the database. The more closely related two different organisms are, the
more similar their DNA sequences will be.
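One simple way to put a number on “how similar” two sequences are is percent identity: the share of positions where both sequences have the same letter. The Python sketch below assumes the two sequences are already lined up (aligned) and are the same length, and the example sequences are made up. Real comparison tools such as BLAST also handle gaps and partial overlaps, which this sketch does not.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percent of positions where two aligned sequences match.

    Assumes both sequences are already aligned and the same length;
    gaps and partial overlaps are not handled here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare letter by letter, ignoring upper/lower case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Made-up sequences for illustration only, not real Tetrahymena data.
    pond_sample = "ACGTTAGCCA"
    reference = "ACGTTAGGCA"
    print(percent_identity(pond_sample, reference))  # 90.0
```

Here the two sequences differ at one position out of ten, so they are 90% identical.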
If you found that your organism was, in fact, Tetrahymena, you could report your discovery to scientists who
are interested in knowing more about the distribution of the organism in the wild.
Overview
The goal of this research project is threefold: 1) to determine whether or not Tetrahymena is present in the water
you sample, 2) to isolate DNA from your “Tetrahymena” so it can be sequenced, and 3) to use the sequence to
find the most likely identity of the organism you've collected. Is it actually Tetrahymena, and if so, what
species is it? Scientists are interested in the results you get, so it's important to work carefully and keep detailed
records so you can report everything you've discovered.
©2012 NIH SEPA ASSET Program
Field Research BioInformatics Student Protocol
To do this, the lab is divided into six parts. Since PCR and sequencing require special equipment, you’ll
send your samples out to a lab for those two steps (unless your classroom has a PCR machine, in which case
only the sequencing needs to be done in a specialized lab), but you’ll do the rest yourself.
• sample collection
• isolation of Tetrahymena
• DNA extraction
• PCR
• DNA sequencing
• bioinformatics – comparison of your Tetrahymena sequence to sequences in genetic databases
Scientists use molecular biology techniques every day. In the field of human biology, knowing a DNA
sequence can give information about whether or not a person carries a genetic disorder, how likely a person is to
have a particular type of cancer, what medication will be most effective in treating a disease in a particular
patient, or which murder suspect left evidence at the scene of a crime.
In the field of evolutionary biology, sequence information is used to determine how closely related organisms
are. Once we know how closely related organisms are, we can draw conclusions about their evolutionary
history. The general rule is that the more similar the DNA sequences of two organisms are, the more
closely related they are. (If you think about it, DNA copying is pretty faithful from generation to generation,
but occasionally copying mistakes are made. The more time and generations there are separating two
organisms, the more likely it is that there will be differences.)
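The reasoning in parentheses can be turned into a rough estimate. The Python sketch below uses a simplified model, which is an assumption and not a measured result: each DNA position is miscopied with the same small probability in every generation, independently along each of the two lines of descent, and any miscopy counts as a difference. Two organisms whose lines split t generations ago are separated by 2t rounds of copying, so the chance that a position still matches is (1 - rate) raised to the power 2t.

```python
def expected_fraction_different(mutation_rate: float, generations: int) -> float:
    """Simplified estimate of the fraction of DNA positions that differ
    between two lineages that split `generations` generations ago.

    Assumes each position is miscopied independently with probability
    `mutation_rate` per generation in each lineage, and counts any
    miscopy as a difference (back-mutations are ignored).
    """
    if not 0.0 <= mutation_rate <= 1.0:
        raise ValueError("mutation_rate must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # A position still matches only if it was copied correctly in every
    # generation along both lineages: 2 * generations copying events.
    unchanged = (1.0 - mutation_rate) ** (2 * generations)
    return 1.0 - unchanged


if __name__ == "__main__":
    rate = 1e-8  # illustrative per-position, per-generation rate (an assumption)
    for gens in (1, 1_000, 1_000_000):
        print(gens, expected_fraction_different(rate, gens))
```

Running it shows the expected fraction of differing positions growing as the number of generations grows, which is the pattern described above. The rate used is only an illustrative value.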
All organisms on earth have a common ancestor. In the case of you and your brother, the most recent common
ancestors are pretty obvious – your parents. In the case of you and your pet mouse, there’s still a common
ancestor, but that ancestor existed far back in history, and may not have looked very much like either you or
your mouse. You and your brother are not identical in terms of your DNA sequence (unless you’re identical
twins), but your DNA sequences will be 99.9% the same. You and your pet mouse are closer to 85% identical in
the protein-coding regions of your DNA.
From an evolutionary standpoint, this means that the common ancestor you and your pet mouse share existed
much further back in time than the common ancestor you and your brother share. (Interestingly, most of the
differences between you and your mouse are in non-coding regions of the DNA. Only a small portion of the
differences are in areas that make you very different from your mouse. This is one of the reasons that mice
make good model organisms for testing drugs and other substances. Where it really matters in the DNA, mice
and humans are very similar.)
So, our goal is to find some organisms that appear to be Tetrahymena, to extract their DNA, determine the DNA
sequence, and figure out how closely related they are to Tetrahymena thermophila, the organism used by
researchers in the lab.
Day 1: Collecting Samples
Tetrahymena have been found in the USA, Canada, Mexico, Central and South America, the Caribbean, Europe,
China, Australia, Africa and the Pacific Islands. In the United States, up to this point, the places where
Tetrahymena strains have been collected have often been close to where researchers live and work.
Student research will help to fill the gaps in our knowledge of where Tetrahymena live and what species occur
in nature.
The best collection results will come during warmer weather – late May through early October. It's not known
for sure what they do during the winter, but their numbers drop in the fall and rebound the following spring.
They are more likely to be found in still water than in fast-moving water.
Tetrahymena eat mostly bacteria, so look for them where they feed: try an area that has emergent
vegetation. Bacteria are important decomposers, so decaying plant and animal material will probably mean
more bacteria and, therefore, more Tetrahymena. Stir up the bottom a little by dragging your collection bag
across the bottom when you sample. You will probably have some mud in your water, but you don't want a bag
full of mud.
Don't worry about taking more than one sample in a pond. Since Tetrahymena are attracted to areas where
bacteria numbers are high, you may find many Tetrahymena in one area (perhaps around a dead fish?) and none
only yards away. If you're sampling in the same pond, just spread your sampling sites out.
The sampling tool is a golf ball retriever, fitted with a one-quart Ziploc freezer bag. You only need about 100
mL of sample – less than a quarter of a bag. A bag that’s half-full will be easiest to work with and some air
space at the top is good for organisms that need oxygen.
After retrieving your sample, either in the field or back in the lab, place your collecting vial into the conical tube
and secure it as shown in the photo. Put the conical tube into the bag and seal it. Working through the sealed
bag, move the conical tube until it fills with water. Let the bag and collection set-up sit overnight.
If you’re really eager to know if your sample is likely to be positive, you can take a look at any point. Use a
transfer pipette to remove a milliliter or so from the bag and place it in half a Petri dish (doesn't need to be
sterile). Place the dish under a microscope and look to see what you have. You'll probably need to use your
10x objective to see protists (some are big enough to see with the 4x lens). If you’re collecting in the summer
or fall, the immediately obvious organisms are likely to be Daphnia, copepods, worms, larvae, etc. Protists are
smaller than these. Tetrahymena look like tiny footballs that swim in relatively straight lines until they bump
into something. They are smaller than Paramecium but move in much the same way. At this magnification,
you won't see bacteria, but there are plenty of them in the water. If you’re collecting in the spring, you won’t
see the big impressive critters, but there should be tiny organisms swimming around. As the weather gets
warmer and summer nears, there will be more large organisms.
Since Tetrahymena are nearly transparent, dark field illumination is used in the lab for viewing them, but you
can see them with a regular microscope, especially if you turn the light down a bit.
Where did you collect that specimen?
You can pinpoint the location with a GPS unit if you have access to one. Some cell phones and PDAs have
GPS capability. If not, it's easy to use Google Earth to get the location. You just have to have a good sense of
precisely where you took your sample. Mark it on a road map if you aren't sure you'll remember. Make a note
of landmarks like nearby buildings, roadways or the shape of the shoreline. A sketch of the area will help you
to remember exactly where you were when you’re looking at a map later.
You can use either Google Earth or Google Maps to get the GPS coordinates. Use the directions below that
apply in your case.
If Google Earth is loaded on your computer, click on the icon to get started. Double click on the map of the
U.S. and then on the map of your state. This will zoom you in and center the expanded map on your location.
Keep zooming in until you can locate the pond you sampled. You can use the hand tool to drag the map around
until you locate exactly where you took your sample. Choose the yellow “Add Placemark” tool from the menu
at the top. A yellow pushpin and a new window will appear. Drag the pushpin to the spot where you collected
and the coordinates will appear in the new window. Copy the coordinates into Data Table 1.
If you don’t have Google Earth, connect to the Internet, go to the main Google website (www.google.com) and
click on "Maps" at the top of the page. This should take you to a map of the US. Choose "Satellite" in the
upper right corner of the map. This will give you an actual aerial view, making it easier to locate physical
features like ponds. Double click on the map of the U.S. and then on the map of your state. This will zoom you
in and center the expanded map on your location. Keep zooming in until you can locate the pond you sampled.
You can also use the hand tool to drag the map around until you locate exactly where you took your sample.
The pointer turns into a hand when you click on the map. Right click (control click on a Mac) on the site and
choose "What's here?". The GPS coordinates for the site you just clicked on will appear right above the map
window and the site will be marked on the map. Copy the coordinates into Data Table 1.
If you want to make sure you have the right coordinates, re-open Google Maps or Google Earth and paste in
your coordinates. Press Enter and it will take you straight to the site you just identified.
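GPS units and Google Earth may display coordinates as degrees, minutes and seconds (for example 42° 30' 0" N), while Google Maps shows decimal degrees. To convert, remember that one degree is 60 minutes and one minute is 60 seconds, and that south latitudes and west longitudes are written as negative decimal numbers. Here is a minimal Python sketch of the conversion; the function name and the example coordinates are illustrative, not from a real collection site.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees, minutes and seconds to signed decimal degrees.

    hemisphere must be "N", "S", "E" or "W". South latitudes and west
    longitudes come out negative, matching how decimal coordinates are written.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be N, S, E or W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be at least 0 and less than 60")
    # 1 degree = 60 minutes = 3600 seconds
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


if __name__ == "__main__":
    # Made-up coordinates for illustration, not a real sampling site.
    lat = dms_to_decimal(42, 30, 0, "N")
    lon = dms_to_decimal(76, 15, 0, "W")
    print(f"{lat:.6f}, {lon:.6f}")  # 42.500000, -76.250000
```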
Give a brief description of the site. Is it surrounded by overhanging trees or exposed to bright sunlight? About
how big is it? Was it cold and rainy or warm and sunny when you collected? What is the land around it used
for – residences, industry, recreation, farmland?
Data Table 1.
GPS Coordinates of Collection Site        Description of Site
__________________________________        __________________________________
Day 2: Getting Tetrahymena out of the mix
The next step is to isolate Tetrahymena. Along with all the obvious large organisms, there's a huge population
of bacteria and fungi. The small collection vial was filled with a mixture that scientists use to grow
Tetrahymena in the lab. The idea is to attract the relatively small number of Tetrahymena from the bag of water
into the vial to make it easier to isolate them. Unfortunately, bacteria and fungi also grow well in the same
mixture. You will transfer the contents of the small vial to tubes that contain growth medium plus antibiotics
that will help to limit the growth of bacteria and fungi. Protists like Tetrahymena are not affected by the
antibiotics. (Remember that protists are eukaryotic just like you. They are much more similar to you than
bacteria are. Antibacterial antibiotics are useful because they interfere with bacterial growth but not eukaryotic
growth. Fungi are eukaryotes too, so an antifungal agent has to act on features that are particular to fungal cells.)
Materials
Zip-lock bags with collection vials
tube rack
forceps
waterproof marker
tubes with Neff and antibiotics
alcohol
paper towels
Procedure
1. Label your Neff tube with your initials, the date and the location where you collected your water sample.
2. Open your Zip-lock bag and remove the conical tube containing your collection vial. Re-seal the bag so it
won’t spill.
3. Remove the vial from the conical tube and blot it dry with a clean paper towel. Lay down another clean paper
towel as a work space for the next step.
4. Dip your forceps into alcohol to sterilize them. While they are air-drying, look at the open end of the vial.
Poke the tips of the dry forceps through the filter paper covering the opening in the white ring. Hook the
curved end of the forceps under the white ring. Lift the white ring up and out by rotating the forceps. You
may have to go around the ring several times to get it all the way out. Put the ring and the forceps on the
paper towel.
5. Pour the contents of the vial into a tube containing Neff and antibiotics. Place the cap on the tube, but don’t
tighten it all the way. This will allow air to circulate.
6. Clean up your work space as your teacher directs.
7. Leave the tube in the rack at room temperature and check on it periodically for the next few days. Record
your observations in Data Table 2. Is it clear or cloudy? Can you see any evidence of growth? What does
the evidence look like?
Data Table 2. First Transfer to Neff.
Incubation Time (hrs)        Observations
____ hours
____ hours
____ hours
Day 3: Getting Rid of Contaminants - Again
When you captured Tetrahymena (or other protists) and transferred them into growth medium, you also
transferred contaminants. It was unavoidable because they were in the water sample. The antibiotics were
added to keep the contaminants from growing rapidly and overwhelming the Tetrahymena. Fungi are
particularly hard to get rid of because they form spores that may have actually been protected inside
Tetrahymena’s food vacuoles. If the spores aren’t digested, they are excreted with other waste and may
germinate (begin to grow again) in the growth medium. Waiting a few days should ensure that the spores are
eliminated. If you’re lucky, by making a second transfer to fresh medium, you will leave the spores behind and
transfer such a small number of bacteria and other contaminants that the Tetrahymena will be able to eat them
and you’ll end up with a culture in which the only living organism is the Tetrahymena. That’s your goal.
It’s possible that you’ll have to make a third transfer if the contaminants seem to be getting the upper hand
again. You and your teacher will make that decision. If your culture looks healthy after one or two transfers,
you’ll be ready to extract DNA and send the DNA to ASSET for sequencing.
Materials
tubes with Neff and antibiotics
waterproof marker
micropipette (20-200 µL)
tube rack
sterile dropper pipette
tips for micropipette
glass slide
marker
Procedure
1. Place your culture and a fresh Neff tube into a rack. Label the fresh tube the same way you labeled the first one.
2. Without opening it, look at your sterile pipette and find the 1 mL mark. Open it as directed by your teacher,
starting from the bulb end. Don’t let the tip touch your skin or any other surface. Draw up about 1 mL from
the culture tube and transfer it to your fresh Neff tube. Cap the old culture tube tightly, and cap the new one
loosely.
3. Observe again over the next few days and record your observations in Data Table 3.
Data Table 3. Second Transfer to Neff.
Incubation Time (hrs)        Observations
____ hours
____ hours
____ hours
Day 4: DNA Extraction
In order to determine whether or not the ciliates you have are actually Tetrahymena, you will need to extract
their DNA and have it sequenced. In order to extract DNA, you will use a chemical (Chelex) that will break up
the cells and prevent damage to the DNA. When a cell breaks up, its DNA is unprotected and comes into
contact with enzymes called nucleases that the cell would ordinarily use to break down foreign DNA from
viruses or bacteria, to make repairs to its own DNA, or to break down and recycle DNA from dead or dying
cells. Fortunately, these enzymes usually need co-factors, like metal ions, in order to function. Chelex is a
chelating agent, a substance that binds these metal ions, especially the very important Mg2+ ions, and keeps
them from activating the DNA-degrading enzymes. It also attracts other polar components of the cytoplasm.
When the Chelex and all the other substances that have attached to it settle to the bottom of the tube, the DNA
is left in the solution above it. The solution contains a lot of other substances besides DNA, but the next step
will be to make many copies of the DNA for sequencing using a technique called PCR or polymerase chain
reaction. This technique only works on DNA and will ignore all the other substances in the solution, so it’s OK
that the DNA isn’t completely pure.
The PCR reaction needs to use some of those same metal ions, so it’s important to get as much of the Chelex as
possible out of the mixture. Simple settling will remove most of it, but there may be small particles that don’t
settle out. Using a centrifuge will help to remove these. When a tube spins very fast in a centrifuge, particles
that are large compared to water, DNA and other molecules in the solution will be forced into a pellet at the end
of the tube. When the liquid above the pellet is removed carefully, the Chelex particles will remain in the pellet
and the liquid will be free of all but the tiniest Chelex particles. This liquid mixture contains the DNA and is
ready to be sent back to ASSET for PCR.
Materials
your culture
micropipette (20-200 µL)
micropipette tips
5% Chelex
55ºC water bath
100ºC water bath
floating microfuge racks
2 microfuge tubes
microfuge tube rack
mini centrifuge
marker
microscope
microscope slide
cover slips
waste beaker
Procedure
1. Label one of your microfuge tubes “extraction.” Label the second tube “final extract.” Put your initials and
the date on both and put them in the rack.
2. Set your micropipette for 100 µL, put a tip on it, and transfer 100 µL of your culture to the “extraction”
tube. Discard the tip in the waste beaker. Reset the micropipette to 200 µL, put on a fresh tip, and transfer
200 µL of the freshly shaken Chelex mixture to the “extraction” tube. (If one partner shakes the Chelex tube
and opens it for the partner with the micropipette, the Chelex should stay suspended pretty well. Work
quickly so it doesn’t settle out too much.)
3. Place the “extraction” tube in the floating rack. (You’ll probably be using the same rack as other students,
so you may have to wait until others are ready.) Place the floating rack in the 55ºC water bath for 30
minutes. During this time, the Chelex will break up the cells and bind up many of the substances that might
break down the DNA.
4. While you are waiting for the Chelex to work, examine a sample of your culture under a microscope. Use your
micropipette to transfer 20 µL from your culture tube to a slide. If you are going to use your 4X or 10X objectives,
you won’t need a cover slip, but if you go to the 40X objective, make sure you cover the drop with a cover slip.
Look for moving cells and sketch what you see with the 10X objective in the space to the right. If you have time,
switch to the highest power and make a sketch at that level too.
5. After the 30-minute incubation is done, transfer the floating rack to the 100ºC water bath for 8 minutes.
This step will denature (deactivate) enzymes that might break down the DNA. While your tube is in the
boiling water bath, read the next two steps and get ready to do them quickly.
6. Place your tube in the mini centrifuge, making sure it’s balanced by other tubes. (“Balanced” means that
there’s another tube with the same amount of liquid right across from yours. If you turn on the centrifuge
when it’s not balanced, it won’t spin smoothly and may actually be dangerous.) When the tubes are ready,
turn the centrifuge on for 3 minutes.
7. Set your micropipette for 100 µL, put on a new tip, and, immediately after the centrifuge stops, draw
100 µL of clear liquid from above the Chelex pellet in the bottom of the tube. Don’t get any Chelex into the
tip. (If you do, just put the liquid back into the tube and spin it again.) If you can’t get 100 µL without
picking up Chelex, take a little less than 100 µL. Transfer the liquid into the “final extract” tube.
8. Put your “final extract” tube into the rack your teacher has set up for the class. These are the tubes that will
be sent to ASSET for sequencing. Although it doesn’t look like much in the tube, there’s enough for 50
PCR reactions!
9. Put your culture tube into the rack your teacher has set up for the class. The scientists at ASSET maintain a
collection of different Tetrahymena strains from all over the world. If your sample turns out to be
Tetrahymena, some of your cells will be added to the collection and might be used for research in the future.
10. Your “extraction” tube can go into the waste beaker with your tips. Clean up the rest of your work space.
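In step 2, 200 µL of 5% Chelex is mixed with 100 µL of culture, so the Chelex in the “extraction” tube ends up more dilute than the stock. A minimal Python sketch of that dilution arithmetic (C1 × V1 = C2 × V2), assuming the two volumes simply add together; the function name is illustrative.

```python
def final_concentration(stock_percent: float, stock_volume_ul: float, added_volume_ul: float) -> float:
    """Concentration after mixing a stock with another liquid (C1 * V1 = C2 * V2).

    Assumes the two volumes simply add together.
    """
    total_volume_ul = stock_volume_ul + added_volume_ul
    return stock_percent * stock_volume_ul / total_volume_ul


if __name__ == "__main__":
    # Step 2: 200 uL of 5% Chelex plus 100 uL of culture.
    chelex = final_concentration(5.0, 200.0, 100.0)
    print(f"total volume: {200 + 100} uL, Chelex: {chelex:.1f}%")
```

The mixture is 300 µL in total, with Chelex at about 3.3%.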
What Happens to the DNA Next?
The PCR reaction requires a machine that automatically repeats a temperature cycle over and over. Because
PCR machines are delicate, you will ship your DNA to ASSET and scientists there will run the PCR and send
the DNA to be sequenced. Here’s what will happen. Your DNA sample will be mixed with all the other
components that are needed to make DNA:
• nucleotides, the A, C, G and T building blocks of DNA,
• DNA polymerase, the enzyme that will put the nucleotides together,
• primers, short bits of DNA that attach to the part of your DNA we’re interested in copying. They give
the polymerase a place to begin making a copy, and
• buffer, a solution containing those important Mg2+ ions and other chemicals that keep the pH of the
mixture just right for the enzyme to work.
Once all of these chemicals are mixed in a tube, the tube is put into the PCR machine. The machine is called a
thermal cycler because it automatically repeats the same cycle of temperature changes over and over.
• denaturation – in this step the DNA mixture is heated to about 98ºC for about 10 seconds, causing the
double strands to come apart and making the DNA single-stranded.
• annealing – the DNA is cooled to about 50ºC for 30 seconds, allowing the primers to find and stick to
their complementary sequences on the DNA.
• elongation – the DNA is heated up to about 70ºC for 60 seconds, allowing the polymerase to add
nucleotides to the primers and make a much longer piece of DNA (polymerase can add about a thousand
nucleotides per minute). The polymerase uses your DNA to tell it the order in which to add nucleotides
to the new pieces, so it’s not just making random DNA – it’s making an exact copy of your DNA.
This cycle of three time and temperature conditions is repeated 35 times, so by the time the PCR reaction is
done, your DNA molecules have been doubled 35 times. This means that, theoretically, each DNA molecule in
the original sample could produce billions of copies. Since the process isn’t 100% efficient, it probably won’t
make quite that many, but it will certainly make millions of copies. Eventually, the raw materials will be used
up and the enzyme will become less efficient after many cycles of heating and cooling.
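The doubling arithmetic above can be checked directly. In this Python sketch, each cycle multiplies the number of DNA molecules by (1 + efficiency), where an efficiency of 1.0 means perfect doubling. The 0.8 efficiency in the example is an illustrative assumption, not a measured value, and the formula ignores the slowdown that happens when raw materials run out.

```python
def pcr_copies(starting_molecules: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the number of DNA molecules after a number of PCR cycles.

    efficiency = 1.0 means every molecule is copied in every cycle (perfect
    doubling); lower values mean some molecules are missed each cycle.
    Does not model the plateau that occurs when reagents are used up.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_molecules * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    perfect = pcr_copies(1, 35)
    realistic = pcr_copies(1, 35, efficiency=0.8)  # assumed efficiency
    print(f"perfect doubling: {perfect:,.0f} copies")  # 34,359,738,368
    print(f"80% efficiency:   {realistic:,.0f} copies")
```

With perfect doubling, one starting molecule gives 2 to the 35th power, about 34 billion copies. Even at the assumed 80% efficiency, the result is still hundreds of millions.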
Once the PCR is done, the copied DNA will be sent to a special lab that has an automated sequencer. This
machine takes the copies of DNA and makes even more copies, but this time special chain-stopping nucleotides
are added to the reaction. The tube will contain a mixture of normal A, C, G, and T molecules, but also some
chain-stopping A, C, G and T molecules. When the copying starts, if polymerase puts a normal A into place,
the chain can keep growing, but as soon as a chain-stopping A is used, the chain ends with that A. The places
where the chain-stopping nucleotides are added are random, so you end up with pieces of DNA that go from very
short (the chain stopped soon after it started) to very long (the chain stopped at or near the end of the piece
of DNA being copied). The same thing happens with the chain-stopping G’s, C’s and T’s.
In the sequencing process, these pieces are made, sorted by size, and their ending nucleotides are identified. If a
piece that’s 3 nucleotides long ends in G, the third nucleotide must be G. If a piece that’s 6 nucleotides long
ends in A, the sixth nucleotide must be A, and so on. A computer will immediately convert all that
information into a sequence, so your data will be available the day after the DNA is sent for sequencing.
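The logic of reading the fragments can be simulated. This Python sketch is a simplified model, an assumption rather than a description of how a real sequencer works inside: for each copied molecule, every added nucleotide has a fixed chance of being a chain-stopping version. The sketch then sorts the stopped fragments by length and reads the last letter at each length. The strand and the stopping chance are made up for illustration.

```python
import random


def make_fragments(strand: str, n_molecules: int, stop_chance: float,
                   rng: random.Random) -> list:
    """Simulate chain-stopping copying of `strand`.

    For each molecule, every added nucleotide has probability `stop_chance`
    of being a chain-stopping version, which ends that chain. Returns a list
    of (length, last_base) pairs for every chain that was stopped. Chains
    that reach the end without stopping are skipped in this simplified model.
    """
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(strand, start=1):
            if rng.random() < stop_chance:
                fragments.append((position, base))
                break
    return fragments


def read_sequence(fragments: list, length: int) -> str:
    """Sort fragments by size and read the last base at each length.

    A fragment of length k that ends in G means position k is G.
    Lengths that were never observed are reported as 'N' (unknown).
    """
    last_base_at = {}
    for size, base in sorted(fragments):
        last_base_at[size] = base
    return "".join(last_base_at.get(k, "N") for k in range(1, length + 1))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    strand = "ATGCGTACGTTAGC"  # made-up example, not real sequence data
    frags = make_fragments(strand, n_molecules=5000, stop_chance=0.1, rng=rng)
    print(read_sequence(frags, len(strand)))
```

With enough molecules, every length from 1 to the end shows up at least once, so the reconstructed sequence matches the strand that was copied.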
The sequence you get back will not be the whole sequence of Tetrahymena. That sequence has been worked
out, but it’s way too long for us to use. The sequence information you will get pertains to three small areas of
the Tetrahymena genome:
• H3H4 – a short section of DNA that occurs between two genes for histones 3 and 4. Histones are
proteins that help coil up the DNA so it’s ready to be divided into two “packages” for two separate new cells.
The PCR product made with the primers used is about 699 bp long. (The abbreviation bp stands for
“base pair.” A base pair is a pair of complementary nucleotides – an A paired with a T, for example. So,
699 bp means that the double-stranded DNA is 699 base pairs long, with 699 nucleotides in each strand.)
• 5.8s ITS – This is a non-coding region between the structural ribosomal components. The 5.8s ITS
sequence is transcribed as part of a common precursor transcript, but is excised and rapidly degraded
during ribosomal RNA maturation. The PCR product made with the primers used is about 150 bp.
• cox1 – a section that codes for part of cytochrome oxidase I, an enzyme that is important for cellular
respiration. This DNA is actually in the mitochondria, not in the nucleus. The PCR product made with
the primers used is about 2000 bp. It’s been widely used for “DNA bar-coding,” that is, using short
sequences of DNA to scan for differences that allow biologists to identify organisms, sort of the way a
scanner in the grocery store looks at the barcode on a cereal box to tell what it is and what its price
should be.
Three separate PCR reactions are run, one using H3H4 primers, one using 5.8s primers, and the third using cox1
primers. The primers used in the PCR process are specially chosen to “look for” these small regions. Scientists
have already studied these regions and chose primers that bind at either end of these
areas in the DNA you extracted. These regions were chosen because they’re found in all Tetrahymena cells and
are important, so they change very little, yet they have experienced occasional mutations. (Remember
that in order to use DNA sequence information to tell how closely related different organisms are, we have to be
able to see differences.)
So, when the PCR reaction is run using the H3H4 primers, only that section of the whole Tetrahymena genome
will be copied. When the second PCR reaction is run with the 5.8s primers, only that section will be copied.
And when the third is run, only the cox1 section will be copied. If your sample contains DNA, the DNA will be
sent for sequencing and you will get a report back that gives the sequences of these three short regions.
Day 5: Bioinformatics – Using Sequence Information to Make an Identification
At this point, you have received your sequence information from ASSET. It’s time to take it to the database
used by scientists all over the world and see what the sequence tells you.
You will be going to the National Center for Biotechnology Information (NCBI) website. NCBI maintains a
public database (called GenBank) of genetic information that has been contributed by scientists from all over
the world. The information is freely available to anyone. It was created in 1982 and, in recent years, has been
doubling in size every 18 months. Scientists who work out the sequence for a particular gene, or for a particular
region of DNA in the organism they study, send their results to GenBank, where they are registered and made
available to the public. As more information is gathered about the gene, the region or the DNA, or the organism
it came from, it’s added to the database in the form of annotations. Other public databases also exist and can be
searched using the NCBI tools.
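Doubling every 18 months adds up quickly. Here is a small Python sketch of the arithmetic, assuming the 18-month doubling time stays constant (real growth rates vary from year to year); the function name is illustrative.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return how many times larger a database becomes after `years`
    if it doubles in size every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (1.5, 3, 10):
        print(years, round(growth_factor(years), 1))
```

At that rate, the database would be twice as large after 18 months, four times as large after 3 years, and roughly 100 times as large after 10 years.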
A complete human DNA sequence was finished in 2003. It’s like a book that’s billions of letters long (the only
letters being A, C, G and T) with no spaces or punctuation to tell us where a gene begins or ends. Many clever
scientists who are interested in finding the answers to questions about genetic diseases or evolutionary history
go into the field of bioinformatics, where they use computer tools to search the genetic data for answers to
important questions.
The complete genomes for many other organisms are also available now, including Tetrahymena. Since the
databases are constantly growing, a search done today may yield different results than one done using the same
sequence tomorrow. Most of the information will be exactly the same, but newly added information may show
up on the later run.
Materials
DNA sequence information
computer with access to the Internet
Running a BLAST
1. Connect to the internet and go to: http://www.ncbi.nlm.nih.gov/
2. On the right side of the page, under “Popular Resources,” choose “BLAST”. BLAST stands for Basic Local
Alignment Search Tool. It is a tool that will allow you to enter your sequence and search all the genetic
databases that are linked under NCBI. It will report the sequences that it finds that are most closely aligned
to yours. In other words, it will report the best matches.
3. Where it asks you to choose a BLAST program to run, choose “nucleotide blast.”
4. There will be several steps involved in running your search:
a. Open the document that contains your sequence information from ASSET.
b. Copy the H3H4 sequence from your document, just the ACGT portion, not the title or any other
information. Paste it into the window that says: “Enter accession number(s), gi(s), or FASTA
sequence(s).” You are entering a FASTA sequence. Make sure you see it in the box.
c. Just below the sequence box, there’s a box for entering the “Job title.” Enter “H3H4” and your
initials.
d. Scroll down to the “Choose search set” box and pull down to “nucleotide collection.” The pull down
window shows the major collections of data that you might want to search. If you were studying the
amino acid sequence of a protein, you could use the Protein Data Bank. But your data is a
nucleotide sequence, so that’s the place to search.
e. Scroll down the page a bit to the “Program Selection” section and set Optimize for to “Somewhat
similar sequences.” (The default setting is “Highly similar sequences.” That would make sense if you
were comparing samples from two different humans, for example, because the sequences would
probably be almost identical. The “Somewhat similar” setting starts each match from a shorter stretch
of identical nucleotides, so it can report sequences with bigger differences. Since your organism might
not even be Tetrahymena thermophila, there could be significant differences, and a BLAST search that
only looks for highly similar sequences might not give you any results at all.)
f. Scroll down to the bottom of the page and click on “Show results in new window.” This will bring
up a separate window so you can look at both windows at the same time.
g. Just below that box, click on “Algorithm parameters.” A number of new choices appear. Under
“Filters and Masking,” uncheck “Low complexity regions.” Some stretches of DNA repeat the same
nucleotide or short pattern over and over, and the default setting tells the search tool to ignore these
stretches. You will be searching with a relatively short sequence that isn’t repetitive, so you don’t
need to filter them out. Filtering can speed up a search, but it may hide matches that could be
important. Since time isn’t an issue for a short sequence like yours, there’s no need to filter.
h. Now you’re ready to search, so click on the big blue “BLAST” button at the bottom of the page. A
status window will come up. It tells you that it’s searching and how long it’s been since you
submitted your BLAST request. It shouldn’t take much more than 30 seconds or so. When it’s done
searching, a new window will come up displaying your results.
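Step 4b asks you to paste only the sequence letters, not the title. If your file holds a title line, that cleanup can be sketched in Python as below. This assumes the ASSET file follows the usual FASTA layout (a title line beginning with “>” followed by lines of sequence) and contains only A, C, G, T and N; `clean_fasta` is a hypothetical helper, not part of BLAST. Checking the length of the result is a quick way to confirm you copied the whole sequence before comparing it with the length BLAST reports.

```python
ALLOWED = set("ACGTN")  # assumption: the sequence file uses only A, C, G, T and N


def clean_fasta(text: str) -> str:
    """Keep only the sequence letters from a FASTA record (hypothetical helper)."""
    letters = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the '>' title line
        for ch in line.upper():
            if ch in ALLOWED:
                letters.append(ch)
            elif ch.isspace() or ch.isdigit():
                continue  # ignore spacing and position numbers some files include
            else:
                raise ValueError(f"unexpected character {ch!r} in sequence")
    return "".join(letters)


if __name__ == "__main__":
    record = ">H3H4 sample (made-up title)\nttctggcggcct ggnagcgag\nTTATTTTCTG\n"
    seq = clean_fasta(record)
    print(seq)
    print(len(seq), "nucleotides")
```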
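Why does “Somewhat similar sequences” find more distant matches? BLAST starts each alignment from a short stretch of identical letters (a “word,” or seed) that the two sequences share, and then extends it. As far as we know, NCBI’s default word size is 28 for megablast (“Highly similar sequences”) and 11 for blastn (“Somewhat similar sequences”); treat those numbers as assumptions and check them under “Algorithm parameters.” The Python sketch below counts shared words between two made-up sequences that are 95% identical, with one difference every 20 nucleotides. It illustrates only the seeding idea, not BLAST’s full search, and the helper names are hypothetical.

```python
import random

# Swap each base for a different one, so every changed position is a true mismatch.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def words(seq: str, word_size: int) -> set[str]:
    """All overlapping words of the given length in seq."""
    return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}


def shared_words(query: str, subject: str, word_size: int) -> int:
    """Count query positions whose word also occurs somewhere in subject (possible seeds)."""
    subject_words = words(subject, word_size)
    return sum(
        1
        for i in range(len(query) - word_size + 1)
        if query[i:i + word_size] in subject_words
    )


def mutate_every(seq: str, spacing: int) -> str:
    """Change every spacing-th base, so no identical run is longer than spacing - 1."""
    bases = list(seq)
    for i in range(spacing - 1, len(bases), spacing):
        bases[i] = SUBSTITUTE[bases[i]]
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the made-up sequence is repeatable
    query = "".join(rng.choice("ACGT") for _ in range(200))
    subject = mutate_every(query, 20)  # 10 changes in 200 letters: 95% identical
    for word_size in (28, 11):  # assumed megablast and blastn default word sizes
        print(f"word size {word_size}: {shared_words(query, subject, word_size)} shared words")
```

With one difference every 20 nucleotides, no run of identical letters is longer than 19, so no 28-letter word is shared, while many 11-letter words are. A search that needs 28-letter seeds would find nothing to extend here, which is the situation step e warns about.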
Interpreting your results
1. First check to make sure you’re looking at what you actually submitted. In the upper left corner, you should
see the job title you entered. Below that, it should say that you did a “nucleotide” search for a sequence that
was about “480” nucleotides long. To the right of that information, there’s a report telling which databases
were searched and what program was used.
2. Below this, you’ll see a graphic representation of the search results. The top horizontal red bar (the one
labeled “Query”) represents the DNA sequence you pasted into the search box. Each of the bars below it
indicates a match between your sequence and sequences in the databases. The best match is at the top,
probably shown in red. As you go down, the matches get weaker because there are more differences.
3. Scroll down the chart and notice that the bar color changes from red to blue as you go down. (The color
key right above your query bar shows what each color means; as the match gets weaker, the color changes.)
You may also notice gaps in the bars. These are stretches where your sequence and the database sequence
didn’t line up, for example because one of the two was missing some DNA. If the bar starts again after a gap,
BLAST found more matching sequence farther along.
4. Mouse over the first match and notice that a brief description of the sequence your DNA matched appears in
the box above your query bar.
5. Scroll down below the chart and you’ll see a table. Each row in the table corresponds to a bar on the
graphic display, and they’re in the same order. The “Description” column gives a brief description of the
sequence from the database, usually starting with the name of the organism it came from. A quick scan
down the list can tell you whether you are likely to match a member of the genus Tetrahymena, or even
Tetrahymena thermophila itself.
6. There are three pieces of data to look at. (Check Data Table 4. You’ll be entering some of this data as you
go along.)
a. max score (S) – This score is calculated from the alignment. It takes into account how long the
aligned stretch is, how many nucleotide pairs are identical, and how many mismatches and gaps
there are. Matches add to the score; mismatches and gaps subtract from it. The higher the score,
the better the match.
b. expect value (E) – E is the number of matches this good that you would expect to find purely by
chance, without any evolutionary relationship, when searching a database of this size. For very
short sequences, a chance match is fairly likely. You are searching with a 480-nucleotide sequence,
so the chance of a good match arising at random is extremely small, and BLAST may even report
it as 0.0. The smaller the E value, the more confident you can be that this is a real match. (A value
of “1e-15” means 1 × 10⁻¹⁵: you would expect a chance match this good only about once in 10¹⁵,
or one quadrillion, searches like this one.)
c. max identity – Also called “% identity,” this is the percentage of positions in the aligned region
where the query and subject sequences have exactly the same nucleotide. The higher the
percentage, the better the match.
7. Click on the top red bar of the top entry in the graph. You’re now looking at the details of the first match,
which may look something like the sample below. (If you scroll down from this point, you’ll find a similar
entry for each of the other matches.) The sequence you pasted into BLAST is the “Query,” and the “Subject”
is the matching DNA from the database. BLAST has lined up the two so you can see where they differ.
Look at the sample below and notice the following:
a. The first 12 nucleotides are exactly the same. You can tell at a glance because there’s a vertical
bar between the letters. Where there’s no match, there’s no vertical bar. Find a spot where an A
in one sequence is a G in the other.
b. “N” means that there was a nucleotide present, but the sequencing reaction wasn’t able to tell
what it was.
c. A hyphen in one of the sequences marks a “gap.” It means the other sequence has a nucleotide at
that spot that this one lacks. BLAST inserted the gap so the two sequences could stay lined up and
the match could continue. There are two gaps in the sample below, one in each sequence.
d. Note that the “Query” sequence (your sequence) starts at nucleotide #11. The “Subject”
sequence (the match found by BLAST) starts at #44. This means that the sequences you are
comparing had different starting points. BLAST found the point where they started matching
and reported from that point on. In this case, which sequence was longer?
emb|X17135.1| Tetrahymena malaccensis histone H3II and histone H4II intergenic DNA
Length=559

 Score = 762 bits (844), Expect = 0.0
 Identities = 450/467 (96%), Gaps = 2/467 (0%)
 Strand=Plus/Plus

Query  11   TTCTGGCGGCCT-GGNAGCGAGTTATTTTCTGGGGGCCTTAGCACCAGGTGGACTTTCTA  69
            |||||||||||| || |||||||||||||||||||||||||||||||| |||||||||||
Sbjct  44   TTCTGGCGGCCTTGGAAGCGAGTTATTTTCTGGGGGCCTTAGCACCAG-TGGACTTTCTA  102

Query  70   GCAGTTTATTTAGTTCTAGCCATTTTTGCTTATGTATTTATAGTGGATTGTCTTTTTGAC  129
            |||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||
Sbjct  103  GCAGTTTATTTAGTTCTAGCCATTTTTGCTTATGTATTTATAGTGGGTTGTCTTTTTGAC  162

Query  130  TTTTCTTTTGAAGGTTATTATTTTTTTTTAATAAAATTCTTTATCGACAACAATTAGGGC  189
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  163  TTTTCTTTTGAAGGTTATTATTTTTTTTTAATAAAATTCTTTATCGACAACAATTAGGGC  222

Query  190  AAGATCATTTGAAATGTTTGGCATAATCCTGGAAAGAGAAGATATGAACAATTTTGATTG  249
            |||||||||||||||||||||| |||||||||| ||| ||||||||| ||||||||||||
Sbjct  223  AAGATCATTTGAAATGTTTGGCTTAATCCTGGATAGACAAGATATGATCAATTTTGATTG  282

Query  250  GATGATTTGAAAGGAAATCAGATTTTTGAGATTTTATCCAATCAAATTTGAGATCTCCGA  309
            |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
Sbjct  283  GATGATTTGAAAGGAAATCAGATTTTTGAGATTTTATCCAATCAGATTTGAGATCTCCGA  342

Query  310  GCAATTTGGATAATTAAATAATATTAAAAAAAAAGAGATCTTTCCCCAAAGACGATAATC  369
             ||||||||||||||| ||| |||||||||||||||||||||||||||||||| ||||||
Sbjct  343  TCAATTTGGATAATTAGATATTATTAAAAAAAAAGAGATCTTTCCCCAAAGACTATAATC  402

Query  370  ATTAAAACAAAAATAAATAATCTAATTAAAAATAACAATAAAAAAATAATAATCCAGCAA  429
            ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct  403  ATTAAAACAAAAATAAATAATCTAATTAAAAATAATAATAAAAAAATAATAATCCAGCAA  462

Query  430  AAATGGCCGGTGGAAAAGGTGGTAAAGGTATGGGTAAGGNCGGANCC  476
            ||||||||||||| ||||||||||||||||||||||||| |||| ||
Sbjct  463  AAATGGCCGGTGGTAAAGGTGGTAAAGGTATGGGTAAGGTCGGAGCC  509
8. Let’s say you want to know more about your top entry. Click on the blue “Accession Number” in the upper
left corner. When a scientist submits a DNA sequence to the database, it is given an accession number that
can be used to find the sequence and any other information about it that has been submitted. This opens up
a new page that gives all the information stored in the database about this DNA sequence. Answer the
remaining questions in Data Table 4 for your sequence. (Remember that blue text is probably a link to other
information, and that you have information search tools available on your computer.)
9. Repeat the process for your other DNA sequences, the 5.8s and cox1. Enter your data in Data Table 4.
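The “Identities” and “Gaps” lines in the sample output can be reproduced by walking along the two aligned rows column by column, which also rebuilds the row of vertical bars. Here is a minimal Python sketch; it assumes both rows are the same length and that a column containing N never counts as an identity (the sample alignment treats N as a mismatch). `summarize_alignment` is a hypothetical helper name, and rounding to a whole percent is an assumption rather than BLAST’s documented rule.

```python
def summarize_alignment(query: str, subject: str) -> dict:
    """Count identities and gaps in two aligned rows (hypothetical helper)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must be the same length")
    identities = gaps = 0
    match_line = []
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            gaps += 1                # a hyphen in either row is a gap column
            match_line.append(" ")
        elif q == s and q != "N":    # assumption: N never counts as an identity
            identities += 1
            match_line.append("|")
        else:
            match_line.append(" ")   # mismatch, including N against a base
    length = len(query)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": round(100 * identities / length),  # whole-percent rounding is an assumption
        "match_line": "".join(match_line),
    }


if __name__ == "__main__":
    # First row of the sample alignment (Query 11-69 against Sbjct 44-102).
    query = "TTCTGGCGGCCT-GGNAGCGAGTTATTTTCTGGGGGCCTTAGCACCAGGTGGACTTTCTA"
    subject = "TTCTGGCGGCCTTGGAAGCGAGTTATTTTCTGGGGGCCTTAGCACCAG-TGGACTTTCTA"
    result = summarize_alignment(query, subject)
    print(query)
    print(result["match_line"])
    print(subject)
    print(f"Identities = {result['identities']}/{result['length']} "
          f"({result['percent_identity']}%), Gaps = {result['gaps']}/{result['length']}")
```

For this first row it reports 57 identities and 2 gaps out of 60 columns. Joining all eight rows of the sample and running them through the same function should give the 450/467 shown in the sample header.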
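The max score and E value from step 6 are linked. BLAST turns the raw score (the number in parentheses after the bit score) into a bit score, S' = (lambda × S - ln K) / ln 2, and the E value is then E = m × n × 2^-S', where m is the query length and n is the total length of the database, so every extra bit halves E. The Python sketch below applies these Karlin-Altschul formulas. The values lambda = 0.625 and K = 0.41 are assumptions for blastn scoring with match +2, mismatch -3, gap open 5 and gap extend 2; with them, the raw score 844 from the sample converts to about 762 bits, the value shown. The database size is hypothetical, and real BLAST uses slightly shorter “effective” lengths for m and n.

```python
import math

# Assumed Karlin-Altschul parameters for blastn scoring with match +2, mismatch -3,
# gap open 5, gap extend 2. Other scoring schemes use different values.
LAMBDA = 0.625
K = 0.41


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Convert a raw alignment score S to a bit score: S' = (lam * S - ln k) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_length: int, database_length: float) -> float:
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2 ** -S'.

    Real BLAST uses "effective" lengths that are slightly shorter; this sketch does not.
    """
    return query_length * database_length * 2.0 ** (-bits)


if __name__ == "__main__":
    bits = bit_score(844)   # raw score in parentheses in the sample alignment
    print(round(bits))      # 762, matching "Score = 762 bits" in the sample
    db_size = 1e12          # hypothetical database size in nucleotides
    print(f"{expect_value(bits, 480, db_size):.1e}")  # vanishingly small; the sample shows 0.0
    print(f"{expect_value(40, 480, db_size):.1f}")    # a weak 40-bit hit could easily be chance
```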
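Data Table 4. BLAST Search Data.

                                                   H3H4        5.8sITS     cox1
                                                   sequence    sequence    sequence
Collection Site Name                               __________  __________  __________
Collection Site GPS Coordinates                    __________  __________  __________
Job Title for BLAST                                __________  __________  __________
Max Score (S)                                      __________  __________  __________
E Value                                            __________  __________  __________
% Identity                                         __________  __________  __________
Accession Number                                   __________  __________  __________
Organism the best match in the database came from  __________  __________  __________
What author(s) submitted the sequence?             __________  __________  __________
What was the date of their earliest publication?   __________  __________  __________
How many different journals are listed?            __________  __________  __________
What super kingdom does the organism belong to?    __________  __________  __________
What family does it belong to?                     __________  __________  __________
How many nucleotides are in the sequence?          __________  __________  __________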
Conclusion Questions
1. Tetrahymena is a tiny ciliated protist. It feeds by using cilia to sweep particles from the water
into its oral groove. Why do you think it’s more likely that you’ll find Tetrahymena in still water
than in a fast-moving stream?
2. Why did your teacher suggest collecting near decaying vegetation, even though Tetrahymena
doesn’t eat plants?
3. Draw a picture of the most interesting organism you saw in your water sample. Describe in
words what made it interesting to you.
4. Why was it necessary to use antibiotics in isolating Tetrahymena-like protists from the water
sample?
5. Why would anyone want to know the GPS coordinates of the collection site?
6. During the DNA extraction, what was the function of the Chelex?
7. Why did you transfer the Chelex mixture to a boiling water bath?
8. If a PCR reaction from evidence left at the scene of a crime contained three strands of DNA from
a suspect, how many copies of a section of that DNA would there be after 35 PCR cycles?
9. In the PCR reaction, what is the function of the DNA polymerase? Most enzymes would be
deactivated by the high temperatures needed in the PCR reaction. Using Google, find out how
molecular biologists found a heat-resistant enzyme.
10. Why does a PCR reaction need primers?
11. Why does only a small portion of the DNA get copied during the PCR reaction?
12. The pieces of DNA made in a sequencing reaction were sorted by gel electrophoresis. The final,
chain-stopping nucleotide for each piece is shown on the image of the gel.
a. What is the ending nucleotide of the shortest piece?
________
b. What is the ending nucleotide of the longest piece?
________
c. What would the sequence of the original DNA be?
_________________
13. In your BLAST search, what organism was the closest match based on the H3H4 sequence? the
5.8s sequence? the cox1 sequence?
14. In your BLAST search, what was the % identity for the best match based on H3H4? based on
5.8s? based on cox1?
15. Imagine that you ran BLAST searches on DNA sequences from three different organisms.
BLAST compared your sequence data to an organism called Pentahymena arcticophila and
reported the following data. Answer the questions below based on the data.
Collection site     H3H4                    5.8s                    cox1
                    % Identity  E value     % Identity  E value     % Identity  E value
North Pond          100%        0           100%        0           97%         0
Johnson Lake        98%         0           97%         0           87%         9.8e-178
Smith’s Birdbath    97%         0           92%         0           99%         0
a. Which of the samples contained organisms that were the most similar to Pentahymena
arcticophila?
b. Which of the samples contained organisms that were the most similar to each other?
c. Of the three DNA sequences, which seems to be “most highly conserved” (i.e., it’s
changed the least over time)?
d. Explain your choice for “c”.
16. Deciding exactly how much difference between two sequences would be needed to indicate that
they came from two different species is tricky. Even with DNA sequence evidence, scientists
aren’t always sure where to draw the line.
If you had cox1 sequence data from two similar but non-identical organisms, what other evidence
would you want to look at before deciding that they belonged to different species? List at least
five other pieces of evidence you’d want to look at:
a.
b.
c.
d.
e.
17. You’ve just used BLAST to compare your sequences with the sequences scientists have submitted
to the database. You can also search for other information. Go back to the NCBI home page:
http://www.ncbi.nlm.nih.gov/
In the box at the top of the page, enter “cytochrome oxidase.” (“All Databases” should be chosen
in the pull-down menu.) Click on “Search.” The page that comes up shows how many entries
mention cytochrome oxidase in each of the NCBI databases.
a. How many Nucleotide sequence records are there? (Don’t click on it – you’ll have to
wait a long time for those records to load.)
b. Click on the link to “Structure.” What do you notice about the links that come up?
c. The link to OMIM gives information about human inheritance. Click on it and get a list
of scientific journal articles that mention cytochrome oxidase. Scan down the list and
find the name of the first disease mentioned in an article title.