Step 6) Analyzing your Mitochondrial DNA Sequences
Trimming your mitochondrial DNA sequence using 4Peaks.
In order to determine how related you are to your classmates, to Rarely Reclusive,
and to the human ancient and modern groups in the Dolan DNA database, you will
need to first determine whether or not your sequence is of sufficient quality for the
analysis, and then to trim your sequence so that only good quality data is used. The
4Peaks program that you used during your yeast experiments will work very well
for this purpose.
Since 4Peaks is a Mac-only program, you will need to do the first two stages of this
analysis on a Mac (your own, if you have one, or a departmental laptop if you do
not). You can then do the MEGA 6 and Dolan DNA analysis on any computer
platform.
1. Examining the primary data of the mitochondrial control region




Log in to the class Blackboard Site. Choose “Content” in the left menu bar.
Open the “Mitochondrial DNA Sequences” folder and download your
sequence file (labeled “TT(#)”) to the desktop.
Open the file using the program “4Peaks”.
At this point you should see peaks of four different colors. There is also an
X-axis that consists of a time unit of measurement, a nucleotide number, and
a nucleotide (designated A, C, G, or T). There are a few things to note:
o at many points there is only one peak, and the color of that peak
determines the nucleotide indicated on the X-axis. The sequence at
these points is of high quality, and can be used for phylogenetic
analysis. A sample is shown below:
o at some points, there is more than one peak, or the peak will look very
broad or misshapen (see bases near the beginning, for example). This
is due to technical problems and makes it difficult or impossible to
predict the base that is at that position on the DNA strand. This is
why there is occasionally an “N” instead of an A, C, G, or T. “N”
designates that the identity of the base at this position was not clearly
determined. Sequence with too many “Ns” cannot be used in
phylogenetic analysis. An example is shown below, although some
sequences will certainly look even worse than this:

4Peaks may also try to call sequences when the peaks are obviously of poor
quality, as shown below. Areas with poor quality peaks should NOT be
included in your sequence analysis.

Look at your sequence. Does it have a section with good sequence, or are
there N’s or poor quality peaks all the way through it? If there is good
sequence available, highlight (select) the good sequence by clicking on the
first base of the good stretch, and then holding and dragging to the right
until you have selected the last base of the good stretch. If your sequence
is not of sufficient quality, do not worry. Simply work with a sequence
from your team that is.

Select "Crop" from the drop-down menu in the lower left hand corner of the
window (the ‘gear’ icon). The sequence that remains in the 4Peaks window
is now just the good quality DNA.
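
If you want to double-check your crop, the short Python sketch below finds the longest stretch of a sequence that contains no “N” calls. It is only a rough aid and not part of 4Peaks: the function name and the example sequence are made up for illustration, and the sketch cannot see peak shape, so you still need to inspect the peaks by eye.

```python
def longest_clean_stretch(seq):
    """Return (start, end) of the longest run with no 'N'; end is exclusive."""
    best_start, best_end = 0, 0
    run_start = 0
    # A trailing 'N' is added as a sentinel so the final run is also checked.
    for i, base in enumerate(seq.upper() + "N"):
        if base == "N":
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = i + 1
    return best_start, best_end


# Hypothetical raw base calls with poor-quality ends (not real class data).
raw = "NNANCGNTTAGGCATCAGGTACCNNAGNT"
start, end = longest_clean_stretch(raw)
trimmed = raw[start:end]
print(start, end, trimmed)
```

The sketch keeps only one continuous stretch, which mirrors what you do by hand when you click and drag across the good region before choosing "Crop".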

Do NOT close your cropped file when you move to the next step – you will
need it again soon!

Use “Export” to rename this cropped file and to save it to the desktop.
Change the file extension from .txt to .fas. This step must be done or the
MEGA 6 program will not recognize the sequences.
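
For reference, a FASTA file is plain text: one header line that begins with “>” followed by the sequence itself. The Python sketch below writes a file in that layout. It is an illustration only: the helper name, the sample name “TT1”, and the file name are assumptions, and it writes to a temporary folder rather than to your desktop. Whether your 4Peaks export already contains a header line is something you should check by opening the file in a text editor.

```python
import tempfile
from pathlib import Path


def write_fasta(name, sequence, path, width=60):
    """Write one sequence in FASTA layout: a '>' header, then wrapped bases."""
    # Remove spaces and line breaks, and use upper-case letters throughout.
    seq = "".join(sequence.split()).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    path.write_text(">" + name + "\n" + "\n".join(lines) + "\n")


# Hypothetical example; the name and bases are placeholders.
with tempfile.TemporaryDirectory() as folder:
    fasta_path = Path(folder) / "TT1_cropped.fas"
    write_fasta("TT1", "ttaggcatca ggtacc", fasta_path)
    fasta_text = fasta_path.read_text()

print(fasta_text)
```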

Repeat this process (cropping and exporting) for each member of your
team. If a team member did not get good sequence of their own, simply
choose one of the other class sequences.

AT THIS STAGE, IF YOU ARE NOT YET USING YOUR OWN
COMPUTER, EMAIL THE CROPPED FILES TO THE TEAM
MEMBER WHO BROUGHT THE COMPUTER, AND THEN
CONTINUE ON WITH THE 4PEAKS ANALYSIS BELOW.
2. BLAST Analysis of your cropped sequence




Under the gear icon in the lower left corner, choose “BLAST Sequence”
and then “Nucleotide (BLASTn)”. This is the same program that you used for
your yeast DNA sequence comparisons, except we are now using the
entire sequence database, rather than just yeast.
You should see your sequence in the query window. Click on “BLAST”.
Does your sequence appear to be human mitochondrial DNA? How do
you know? What is the percent coverage, the percent identity and the E
value? What do these values mean? Write the answers to these questions
in the final lab worksheet found at the end of this handout.
Repeat this process for all of the members of your team. Report the data
for all of your team members in the final report sheet.
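
To help you think about what two of these values mean, here is a small Python sketch that computes percent identity and percent coverage for one aligned region. It assumes the usual definitions: identity is the number of matching positions divided by the length of the alignment, and coverage is the share of your query sequence that falls inside the alignment. The function name and the sequences are invented for illustration. The E value is not computed here; it estimates how many matches this good you would expect to find by chance in a database of that size, so values very close to zero indicate a meaningful match.

```python
def blast_style_stats(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent coverage) for one aligned region.

    Both aligned strings must be the same length; '-' marks a gap.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must be the same length")
    matches = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    identity = 100 * matches / len(aligned_query)
    # Count only real query bases (not gaps) toward coverage.
    covered = sum(q != "-" for q in aligned_query)
    coverage = 100 * covered / query_length
    return identity, coverage


# Hypothetical case: a 20-base query of which 10 bases aligned to the hit,
# with a single mismatch inside the aligned region.
identity, coverage = blast_style_stats("ACGTACGTAC", "ACGTACGAAC", 20)
print(identity, coverage)
```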
Step 7) Phylogenetic Analysis of the mitochondrial control region
Now it is time to find out how closely related you are to Rarely Reclusive, to other
members of your team, and to students from last year. You will do this using the
MEGA 6 program.


Switch computers and download all sequences to the desktop, if
needed.
Open MEGA 6 by double-clicking on its icon. You will see a window that
looks like this:

Begin your analysis by clicking on “Align” and then clicking on “Edit/Build
Alignment” from the drop-down menu. Choose “Create a New Alignment” and
click ‘Ok’, and then “DNA”. You should see a window that looks like this:

Import your sequences. Under “Edit” choose “Insert Sequence From File”.
Make sure that the ‘Select Files of Type’ window at the bottom shows
“FASTA”. Select your sequence, and click on “Open”. Your sequence should
now appear in the Alignment Explorer Window. Repeat this step for each
sequence from your team, and for Rarely Reclusive (Rarely.fas).
As you import each sequence, you will notice that it comes in ‘selected’ (all
blue). You can unselect (and reselect) the sequence by clicking on the sequence
name to the left. You should notice that each base is a different color when the
sequences are not selected.
“Aligning” sequences makes sure that you are comparing the sequences of the
same parts of a gene with each other. The sequences above are not aligned (yet),
so even though they are all mitochondrial DNA, they do not look like they
match. When you ask MEGA to align sequences, it will search for large areas of
identity and similarity, and then slide the genes around until those areas are
matched to each other. The match will never be perfect (unless you have two
identical sequences, of course), so the program is really looking for the ‘best
match’.

Align your sequences by selecting all of the imported sequences (shift click),
and then clicking on “Alignment” and “Align by ClustalW” in the Alignment
Explorer window. Click “Ok” to accept the default parameters. When you align
your sequences, you should now see some of them match pretty well, while others
may not. The computer will also insert short gaps into the sequences if it needs
to in order to make things align better. In the aligned sequences below, the
first three sequences are identical, while the next two are aligned, but are
obviously NOT perfect matches.
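
The idea of sliding sequences and inserting gaps to find the ‘best match’ can be sketched in a few lines of Python. The example below is a simple pairwise global alignment (the Needleman-Wunsch approach), not the ClustalW method that MEGA runs, which aligns many sequences at once. The scoring values and the two short sequences are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Align two sequences end to end and return them with '-' for gaps."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # pair two bases
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Two hypothetical fragments that differ by one missing base.
aligned_a, aligned_b = global_align("ACGTTGCA", "ACGTGCA")
print(aligned_a)
print(aligned_b)
```

Notice that the shorter sequence receives a single gap so that the bases on either side line up, which is the same behavior you see in the Alignment Explorer window.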

Export your alignment. Once the alignment is complete, save and export the
current alignment session to the desktop by selecting Data | Export Alignment
from the Alignment Explorer window main menu. Choose “MEGA” format and
give the file an appropriate name, such as "MTalignment_team3.meg". This will
allow the current alignment session to be used in the next step.
So, who is most closely related to Rarely Reclusive? You will answer this
question using UPGMA analysis. UPGMA is a distance-based method: it counts the
nucleotide differences between each pair of individuals, and then builds the
tree by repeatedly joining the two most similar sequences (or groups of
sequences). You will notice as you play around with MEGA that there is also a
specific “Maximum Parsimony” phylogeny program. However, in this analysis, we
do not want to penalize sequences for being of different lengths. Remember
that each of you trimmed your sequences to different lengths based on sequence
quality, not based on evolution, so we do NOT want the program thinking length
differences are significant! UPGMA analysis will only use the bases that all
of the sequences have in common when it constructs the tree.
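
To make the UPGMA idea concrete, the Python sketch below works through a tiny example. It first drops every alignment column in which any sequence has a gap (so trimmed ends are ignored), then measures the fraction of differing bases between each pair, and finally joins the closest pair over and over until one tree remains. This is a simplified illustration, not MEGA's own code: the student names and sequences are invented, the simple proportion of differences is an assumed distance measure, and branch lengths are left out.

```python
from itertools import combinations


def complete_deletion(aligned):
    """Keep only the columns where every sequence has an A, C, G, or T."""
    keep = [col for col in zip(*aligned.values())
            if all(base in "ACGT" for base in col)]
    return {name: "".join(col[k] for col in keep)
            for k, name in enumerate(aligned)}


def p_distance(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names, dist):
    """Join the closest clusters repeatedly; return the nested grouping."""
    sizes = {n: 1 for n in names}
    d = dict(dist)
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = "(" + a + "," + b + ")"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical aligned sequences; '-' marks ends that were trimmed away.
aligned = {
    "Rarely":   "ACGTACGTACGT",
    "StudentA": "ACGTACGTTCGT",
    "StudentB": "ACGTTCGAAC--",
    "StudentC": "--GTTCGAACGT",
}
clean = complete_deletion(aligned)
dist = {frozenset((a, b)): p_distance(clean[a], clean[b])
        for a, b in combinations(clean, 2)}
tree = upgma(list(clean), dist)
print(tree)
```

In this made-up data set, StudentB and StudentC are identical over the shared region even though they were trimmed differently, and StudentA groups with Rarely.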

Generate a UPGMA Phylogenetic Tree. Go back to the MEGA 6 Window and
click on “Phylogeny” and then “Construct/Test UPGMA Tree.”
o Choose your saved .meg file and open it.
o Click on “Compute”. You should now see a phylogenetic tree in a
window that looks something like this:
You can save the file as a .pdf image under the “Image” menu. The image can
then be pasted into your Word document for your final lab report (question 2). If
you prefer, you can also simply draw your tree in the space provided.
Which student in your analysis is most closely related to Rarely Reclusive?
Step 8) Phylogenetic Analysis – Human evolution and your place in the
human family.
Well, you can't all be most closely related to Rarely, so who are you related to?
1. Follow this link (http://www.bioservers.org/bioserver/) to the Dolan DNA
Bioserver login page. Click ‘Enter’ under the Sequence Server Site.
2. Click on 'Manage Groups' from the menu in the center of the top of the page.
You will now see a window that looks like this:
3. Click on the upper right pull-down menu (sequence sources). You should
now see a variety of choices. Choose 'modern human mt DNA'. Click on the
boxes to the left of each of the sequences to select them, and then click on
'ok'.
4. Click on 'Manage Groups' again. This time select 'Ancient Human mt DNA'
and click on all of the boxes, and then 'ok'. You have now uploaded a whole
bunch of mtDNA sequences for analysis. If you wish, you can also select
and upload ‘Neanderthal mt DNA’ in the same way.
5. To upload your own sequence, go to your FASTA sequence, select only the
sequence itself, and copy it.
6. In the sequence server window, choose 'Create Sequence'. Give your
sequence a name, and then paste your sequence data into the window. Click
'ok'.
7. You will now see your sequence added to the rest. Do the same thing for all
of your group members.
8. To compare your group members to the available sequences, click on the
box to the left of your sequences, and then next to any other sequences that
interest you (you'll see lots of drop-down possibilities within each group).
You may only select up to ten total sequences (including your group
sequences). Have fun, but also use sequences that make sense, given what
you know about your ancestry.
9. After you have selected your sequences (up to 10) for analysis, find the word 'compare'
in the gray bar menu, and choose 'phylogenetic tree' from the drop-down
box. Click on "Compare". After your sequences are analyzed, a popup
window will show you your tree.
10. Choose 'phenogram' and 'yes' (to make the tree branch lengths proportional
to the evolutionary distances). You will see something that looks like the
picture below, but that contains the individuals and species that you chose
to analyze.
11. Phenograms such as this one can provide a ton of information. For example,
one thing that this phenogram shows is that Lake Mungo Man and African
American #1 share a common ancestor, to which Lake Mungo Man is more
closely related.
12. Use the Grab program (scissors on the dock) and then choose “Capture” and
“selection” (or perform a screen capture of the selection on your Windows
computer). Select the phylogenetic tree by drawing a box around it and
releasing the mouse. You will now see a window with the tree image in it.
Next click on ‘copy’ to copy the image, and then go to your lab report and
paste the image into the report (question #3).
Genetics Final Lab Report, Fall 2012. Bioinformatics and Human Evolution
Names:
1) In the spaces below, type or write the BLAST Values for your team’s
mitochondrial DNA sequence(s) and provide a brief description, in your own
words, of what those values mean in terms of your sequence matches with human
mitochondrial DNA.
% Coverage:
E value:
% Identity:
2) Paste or draw your MEGA 6 Phylogenetic Tree in the space below. Which
classmate is most closely related to Rarely Reclusive? Justify your answer.
3) Paste or draw your Phylogenetic Tree from the Dolan DNA Server in the space
below. Describe, in your own words, the evolutionary relationships shown in the
tree.