Molecular ecology and evolution (BIO 648, 10 p)

Internet-based sequence analyses 2014-03-19
Molecular ecology and evolution (BIOR25)
1. Internet-based sequence analyses
The aim of this DryLab is to use the internet (GenBank) to find cytochrome b gene sequences from primates, align these sequences, and measure some basic distances between them.
With the tools you learn in this DryLab, you will be able to compare DNA sequences that you have generated yourself with sequences from other species stored in databases such as GenBank. The same methods can also be used to identify species from DNA sequences.
1.1 The approach in this lab is to find DNA sequences of the cytochrome b gene from a few primate species (apes and monkeys), align these sequences, and calculate sequence divergence at the DNA and amino acid levels. Known sequences can be found in GenBank through the Entrez Nucleotide database at NCBI (and you can later use the same methods and tools to analyse sequences from your own lab work).
Common name           Scientific name
Human                 Homo sapiens
Chimpanzee            Pan troglodytes
Gorilla               Gorilla gorilla
Sumatran orangutan    Pongo pygmaeus abelii
Red-cheeked gibbon    Hylobates gabriellae
Rhesus macaque        Macaca mulatta
GenBank homepage:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
1.2 Get cytochrome b sequences from different primate species. Type “gorilla gorilla mitochondrion complete” in the search box, choose “Nucleotide” and click Go to search GenBank. To find what you want, you sometimes have to experiment with the keywords. Matching sequences will then be displayed. Choose, for example, “Gorilla gorilla gorilla mitochondrion, complete genome” (double-click the blue text). The next page shows the whole nucleotide sequence of the gorilla mitochondrion (at the end of the page), and you have to find where the cytochrome b sequence begins. Scroll down until you find the following text:
     gene            14171..15311
                     /gene="CYTB"
                     /db_xref="GeneID:6742684"
Click on “gene” to select the cyt b sequence, i.e. nucleotides 14171 to 15311.
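GenBank feature coordinates such as 14171..15311 are 1-based and include both end positions, so the selected cyt b region is 15311 - 14171 + 1 = 1141 nucleotides long. A minimal Python sketch of the same selection, assuming the whole mitochondrial genome is held as a plain string (the function name `extract_feature` is hypothetical):

```python
def extract_feature(genome: str, start: int, end: int) -> str:
    """Return the region start..end given in GenBank's 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates lie outside the sequence")
    # Python slices are 0-based and exclude the end index,
    # so only the start has to be shifted by one.
    return genome[start - 1:end]


print(extract_feature("ACGTACGT", 2, 4))  # CGT
print(15311 - 14171 + 1)  # 1141, the length of the cyt b feature
```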
1.3 Copy from GenBank to BioEdit. The next step is to transfer the sequences from GenBank to BioEdit (the program for sequence analysis). Before you can paste a sequence into BioEdit, you have to convert it to the so-called FASTA format (a “>” sign followed by the sequence name and description on line one, and the actual sequence from line two onwards).
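To make the FASTA layout concrete, here is a small Python sketch that writes and reads such records. Wrapping the sequence at 60 characters per line is a common convention, not a requirement of the format, and the function names are hypothetical:

```python
from typing import Dict, Optional


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record: '>' plus the header on line one, the sequence from line two."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:  # sequence line belonging to the current record
            records[name] += line
    return records


record = to_fasta("Gorilla_gorilla", "ATG" * 30)
print(record)
print(read_fasta(record))
```

This simple parser keeps only the last record if two headers are identical.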
- Click on “FASTA” (lower right).
- Select the nucleotide sequence (including the first line) and copy it (Ctrl+C).
The next step is to paste the sequence into BioEdit.
- Open the program BioEdit. [If you do not have BioEdit installed, download and install the latest version from http://www.mbio.ncsu.edu/bioedit/bioedit.html.] Go to “File” → “New alignment”.
- Make sure that “Mode:” in the upper left corner of BioEdit is set to “Edit” and “Insert”.
- Place the cursor in the box for the new alignment and click, then move the mouse pointer up and click on “File” → “Import from Clipboard”. BioEdit then automatically converts the FASTA text to a continuous nucleotide sequence and puts it in your alignment file.
- Change the name with “Sequence” → “Rename” → “Edit Title”. Avoid blank spaces and symbols in the name! This makes it easier to find the sequence in a phylogenetic tree later on.
- Save the file (on student computers: in GU-Student/My Documents).
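The renaming advice can be expressed as a simple rule. The Python sketch below (a hypothetical helper, not a BioEdit feature) replaces every run of spaces or other symbols with a single underscore, assuming that letters, digits and underscores are the only characters you want to keep:

```python
import re


def safe_name(title: str) -> str:
    """Turn a GenBank title into a name without blanks or symbols."""
    # Collapse each run of characters other than letters, digits and '_' into one '_'.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", title)
    return cleaned.strip("_")


print(safe_name("Gorilla gorilla gorilla mitochondrion, complete genome"))
# Gorilla_gorilla_gorilla_mitochondrion_complete_genome
```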
1.4. Add cytochrome b gene sequences from another 5-6 species to your alignment file. You can do this either by searching for taxonomic names / gene regions as above, or by running a BLAST search with one cytochrome b sequence, e.g. the gorilla sequence (see 1.5). Note that some of the options explored in MEGA later in this exercise do not work with fewer than four sequences.
1.5 Execute a BLAST (Basic Local Alignment Search Tool) search:
- Go to GenBank (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide) and find BLAST (in the menu under Nucleotide Tools). Clicking it takes you to the BLAST page.
- Select “nucleotide BLAST”. You now see an open field where you paste your copied sequence. Select the database “Nucleotide collection (nr/nt)”.
- Click on the “BLAST” button (lower left). The search may take up to a few minutes! The result page starts with an illustration of the “Distribution of 100 Blast hits on the Query Sequence”, followed by a long list with the names of the hits (sequences that show some similarity to your sequence). Hits are ranked by similarity, with the most similar sequence at the top of the list.
- What do the E-value and Query coverage mean?
- Do a new BLAST search, but type in an arbitrary sequence of about 100 nucleotides (or copy and paste the one below), and change “Optimize for” to “somewhat similar sequences (blastn)”:
gagggctttcggtatgcttgcacacattccggttcggctgcgtggtgcagatgacagatagcagatagacccttgtgtgtgcgaaatgtgtgcgagagcagagagatttccatttggccattggacccttggtaattgggaaacctta
- Compare the results with those from the BLASTed gorilla sequence (Query coverage, E-value).
- BLAST the gorilla sequence again and continue. Choose more than four sequences from different species and add them to the file with the gorilla sequence. This can be done in several ways. The simplest is to do the same as you did with the gorilla sequence and add them one by one. A quicker way is to download all your chosen sequences in one batch. To do this, select the sequences you want by ticking the box next to the sequence name. Click Download (at the top of the list of sequences), tick “FASTA (aligned sequences)” and then click Continue. Save the .txt file and open it in BioEdit. Select the sequences, then copy and paste them into the file with the gorilla sequence. Change the names of the sequences to something handier than the GenBank names.
- Make sure all your sequences are aligned and of the same length. If not, insert or delete bases at the beginning so that the sequences match each other, and then trim the sequences so that they all have the same length.
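Once the sequences start at the same position, the final trimming step amounts to cutting all of them to the length of the shortest one. A Python sketch under that assumption (it does not align anything, and the helper names are hypothetical):

```python
from typing import Dict


def all_same_length(seqs: Dict[str, str]) -> bool:
    """True if every sequence in the alignment has the same length."""
    return len({len(s) for s in seqs.values()}) == 1


def trim_to_common_length(seqs: Dict[str, str]) -> Dict[str, str]:
    """Cut every sequence at the end so that all match the shortest one."""
    shortest = min(len(s) for s in seqs.values())
    return {name: s[:shortest] for name, s in seqs.items()}


alignment = {"Gorilla": "ATGACCCCAATACG", "Human": "ATGACCCCAATA"}  # toy data
print(all_same_length(alignment))                         # False
print(all_same_length(trim_to_common_length(alignment)))  # True
```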
1.7. Open MEGA5 (a free program that can be downloaded from http://www.megasoftware.net/).
1.8. Open your file [Open a File/Session] and answer the following questions as they appear:
- “How do you want to open the file?”: Analyze
- “Nucleotide sequences”: OK
- “Protein-coding DNA”: Yes
- Select the genetic code “Vertebrate mitochondrial”.
To look at the data:
- Click Data / Explore Active Data
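Choosing “Vertebrate mitochondrial” matters because this genetic code differs from the standard one: AGA and AGG are stop codons, ATA codes for methionine and TGA for tryptophan. MEGA applies the code internally; the Python sketch below only illustrates what the setting changes, using the amino acid string of NCBI translation table 2 with codons listed in TCAG order (the helper names are hypothetical):

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial); the 64 codons are
# listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...), which product() reproduces.
AA_TABLE2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_TABLE2)}


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop codon, 'X' an unknown codon."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


print(translate("ATATGAAGA"))  # MW* (the standard code would give I*R)
```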
1.9. Explore the buttons at the top of the Sequence Data Explorer window (and try to understand what they mean):
- C: Conserved sites
- V: Variable sites
- Pi: Parsimony-informative sites
- S: Singleton sites
- 0: 0-fold degenerate sites
- 2: 2-fold degenerate sites
- 4: 4-fold degenerate sites
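The first four categories can be counted directly from the alignment columns. The Python sketch below follows the usual definitions (a parsimony-informative site has at least two character states that each occur in at least two sequences; a singleton site is variable but not informative). It assumes an ungapped alignment of equal-length sequences, so its counts may not match MEGA's when gaps or missing data are present:

```python
from collections import Counter
from typing import Dict, List


def classify_sites(seqs: List[str]) -> Dict[str, int]:
    """Count conserved, variable, parsimony-informative and singleton sites."""
    counts = {"C": 0, "V": 0, "Pi": 0, "S": 0}
    for column in zip(*seqs):  # one tuple of bases per alignment position
        freq = Counter(column)
        if len(freq) == 1:
            counts["C"] += 1
            continue
        counts["V"] += 1
        # Number of states shared by at least two sequences.
        shared_states = sum(1 for n in freq.values() if n >= 2)
        if shared_states >= 2:
            counts["Pi"] += 1
        else:
            counts["S"] += 1
    return counts


toy = ["AAGT", "AAGT", "ACTT", "ACGA"]  # toy alignment, not real data
print(classify_sites(toy))  # {'C': 1, 'V': 3, 'Pi': 1, 'S': 2}
```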
1.10. Compute DNA distances
- Click Data / Quit Data Viewer
- In the menu, go to Distance / Compute Pairwise Distances
- In the Options Summary, select “Model/Method = p-distance”, “Substitution Type = Nucleotide” and “Gaps/Missing Data Treatment = Pairwise deletion”
- Interpret the distances between the species.
- Do this separately for the 1st, 2nd and 3rd codon positions. Why are the distances not the same?
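The p-distance is the proportion of compared sites at which two sequences differ, p = d/n. Here is a Python sketch of that calculation with pairwise deletion and an optional restriction to one codon position. It assumes both sequences are aligned and start at the first base of a codon, and treating '-', '?' and 'N' as gaps or missing data is an assumption that may differ from MEGA's handling:

```python
from typing import Optional

GAP_OR_MISSING = set("-?N")


def p_distance(a: str, b: str, codon_position: Optional[int] = None) -> float:
    """Proportion of differing sites between two aligned sequences.

    codon_position (1, 2 or 3) restricts the comparison to that position.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if codon_position is not None and i % 3 != codon_position - 1:
            continue
        if x in GAP_OR_MISSING or y in GAP_OR_MISSING:
            continue  # pairwise deletion: skip this site for this pair only
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else float("nan")


seq1, seq2 = "ATGACC", "ATAACT"  # toy codons: both differences at 3rd positions
for pos in (1, 2, 3):
    print(pos, p_distance(seq1, seq2, pos))
print("all", p_distance(seq1, seq2))
```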
1.11. Compute amino acid distances
- In the menu, go to Distance / Compute Pairwise Distances and choose “Substitution Type = Amino acid” and “Model/Method = p-distance”.
- Compare these distances with the DNA distances you obtained.
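A toy example (not real data) of how the two kinds of distance can differ: two third-position substitutions that are both synonymous under the vertebrate mitochondrial code change the DNA p-distance but leave the amino acid p-distance at zero. The sketch uses NCBI translation table 2, assumes equal-length, ungapped uppercase sequences, and its helper names are hypothetical:

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial), codons in TCAG order.
AA_TABLE2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_TABLE2)}


def translate(dna: str) -> str:
    """Translate complete codons of an uppercase DNA string."""
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def p(a: str, b: str) -> float:
    """Uncorrected p-distance for two equal-length, ungapped sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


dna1, dna2 = "CTAACC", "CTGACT"  # Leu-Thr in both, with different 3rd bases
print(p(dna1, dna2))                        # about 0.333 (2 of 6 sites differ)
print(translate(dna1), translate(dna2))     # LT LT
print(p(translate(dna1), translate(dna2)))  # 0.0
```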