Exercise4_2015


Exercise 4 Multiple sequence alignment, genetic distances, phylogenies

Porter et al. (1996) studied the evolutionary relationships of mammals using exon 28 sequences of the von Willebrand factor gene. Their study included a human pseudogene of the von Willebrand factor gene, which was placed on the human-chimp branch of the phylogenetic tree (Fig. 3 below, ψ-Homo is the pseudogene). A third, more distantly related primate, Galago, was included in the study, and it was concluded that this lineage had already diverged before the pseudogene appeared. No other primates were included in that study, but it would be interesting to know in more detail where in the primate branch the pseudogene originated. Your task is to fetch von Willebrand factor gene sequences from GenBank and study this question using phylogenies.

(Original paper: Calvin A. Porter, Morris Goodman, and Michael J. Stanhope (1996) Evidence on Mammalian Phylogeny from Sequences of Exon 28 of the von Willebrand Factor Gene. Molecular Phylogenetics and Evolution 5: 89-101.)

Below are accession numbers for the ten primate sequences plus three outgroup species.

Taxon                      Acc. no.
Homo sapiens               M25851
Pan troglodytes            U31620
Hylobates lar              AJ410300
Macaca mulatta             AJ410302
Cercopithecus solatus      AJ410301
Ateles belzebuth           AF061059
Pithecia pithecia          AJ410298
Cebus apella               AJ410297
Callithrix jacchus         AJ410299
Lemur catta                AJ410292
Galeopterus variegatus     U31606
Aplodontia rufa            AJ224662
Canis lupus familiaris     L16903

At the end of this document you can find information about the taxonomy of the species.

1) Fetching the sequences & sequence alignment

Open MEGA. Choose ”Align” and ”Query DataBanks”, and a familiar NCBI page will open.

Enter the accession number in the search field and start the search.

The button for adding sequences in the MEGA Web Browser adds the sequence to MEGA, but only after you have first found the correct sequence and opened its sequence file (the program will tell you if you click at the wrong stage).

- Fetch the sequences from the “Nucleotide” database one by one using the accession numbers listed above. Use the taxon name as the sequence label in MEGA.

- Add the human VWF pseudogene, accession number M60676. This sequence is far too long (21 033 bp) compared to the other sequences, so it is wise to shorten it already at this stage: enter 7582 as the starting point and 8960 as the ending point.
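The same retrieval can also be scripted outside MEGA. The following is a minimal sketch, assuming NCBI's E-utilities efetch endpoint and its seq_start and seq_stop parameters; the helper name efetch_url is hypothetical, and the code only builds the request URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, start=None, stop=None):
    """Build an efetch URL that returns one nucleotide record as FASTA.

    efetch_url is a hypothetical helper written for this exercise.
    start and stop are 1-based, inclusive coordinates.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    if start is not None and stop is not None:
        params["seq_start"] = start
        params["seq_stop"] = stop
    return EFETCH + "?" + urlencode(params)


# The pseudogene record, restricted to the region used in the exercise.
pseudogene_url = efetch_url("M60676", 7582, 8960)

# Length of the retained region (both end points are included).
region_length = 8960 - 7582 + 1
```

Opening such a URL in a browser, or with urllib.request.urlopen, should return the sequence in FASTA format. Please respect NCBI's usage guidelines if you fetch many records in a loop.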

- When you have included all the sequences, close the MEGA Web Browser window. You will return to the ”Alignment Explorer” window, where you can see the sequences stacked on top of each other.

Note the ”Undo” button, which cancels the last action. It can save your day…

At this point the sequences are only stacked starting from their first nucleotides; the actual alignment still has to be done. MEGA uses another program, ClustalW, for sequence alignment (ClustalW is also available as an independent program, but we will use it within MEGA).

- Choose ”Alignment” and ”Align by ClustalW”. Include all sequences. A new window opens where you can choose parameter values (click ”Help” for more information about the parameters). Keep the default settings.

- The alignment must always be checked ”by eye”, i.e. go through the whole alignment from start to end. Use your own judgment: could some parts be aligned better? Gap positions in particular should be checked. This is an exon sequence, so all gap lengths should be divisible by three because of the codon structure. This does not apply to the pseudogene, which is no longer functional.

If necessary, make corrections:

-click the nucleotide and press space > makes a gap on the left side

-click on the gap and press backspace > removes the gap
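The gap-length check can be sketched in a few lines of Python. This is an illustration only, assuming each aligned sequence is available as a string with '-' as the gap character; the function names are hypothetical. Gaps at the very start or end of a sequence are skipped, because they reflect differences in sequence length rather than indels.

```python
import re


def internal_gap_runs(aligned_seq):
    """Return (start, length) for each internal run of '-' (0-based start)."""
    runs = []
    for match in re.finditer(r"-+", aligned_seq):
        if match.start() == 0 or match.end() == len(aligned_seq):
            continue  # leading or trailing end gap, not an indel
        runs.append((match.start(), match.end() - match.start()))
    return runs


def frame_breaking_gaps(aligned_seq):
    """Internal gap runs whose length is not a multiple of three."""
    return [(start, n) for start, n in internal_gap_runs(aligned_seq) if n % 3 != 0]


example = "--ATG---CCC-TT-"
flagged = frame_breaking_gaps(example)
```

In the example string the three-base gap is accepted and the single-base internal gap is flagged. For the functional exon sequences a flagged gap is a place to re-examine by eye; for the pseudogene it may be genuine.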

Trim the ends of the alignment: keep as the first column the first nucleotide position that is present in all sequences, and do the same at the end (select the region you want to delete with the mouse and press delete).
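The trimming rule can be written out as a small Python sketch, assuming the alignment is held as a dictionary of equal-length strings with '-' for gaps (trim_alignment is a hypothetical name). Note that this sketch does not try to keep the reading frame, so the result should still be checked against the protein translation.

```python
def trim_alignment(seqs):
    """Cut the alignment to the span between the first and the last column
    in which every sequence has a nucleotide (no gap)."""
    length = len(next(iter(seqs.values())))
    complete = [
        i for i in range(length)
        if all(seq[i] != "-" for seq in seqs.values())
    ]
    if not complete:
        raise ValueError("no column is present in all sequences")
    first, last = complete[0], complete[-1]
    return {name: seq[first:last + 1] for name, seq in seqs.items()}


toy = {
    "a": "--ATGCC-",
    "b": "CAATG-CT",
    "c": "-AATGCCT",
}
trimmed = trim_alignment(toy)
```

Internal gaps, such as the one in sequence "b" of the toy data, are left untouched; only the ragged ends are removed.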

After trimming, check “Translated Protein Sequences” (standard genetic code) and make sure that there are no stop codons (*) in the amino acid sequences (except in the pseudogene) and that the amino acids are aligned nicely. Check especially the edges of the gaps. “?” is displayed if the codon is interrupted by indels: check at the nucleotide level whether it would be reasonable to move the gap.
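What the translated view shows can be imitated with a short Python sketch. It assumes the trimmed alignment starts in reading frame 1 and uses the standard genetic code; the function names are hypothetical, and the handling of partial codons ("?") is a simplification of MEGA's display.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base changes slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_aligned(seq):
    """Translate an aligned nucleotide sequence codon by codon.

    '---' becomes '-', any codon that is interrupted by a gap or contains
    an unknown base becomes '?', and stop codons become '*'.
    """
    protein = []
    usable = len(seq) - len(seq) % 3  # ignore an incomplete last codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3].upper()
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "?"))
    return "".join(protein)


def stop_positions(seq):
    """0-based codon positions of stop codons in the translation."""
    return [i for i, aa in enumerate(translate_aligned(seq)) if aa == "*"]
```

A non-empty result from stop_positions for any sequence other than the pseudogene suggests that the reading frame or a gap placement needs another look.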

- Save the alignment (file extension *.mas).

- Save it also in MEGA format (”Data” > ”Export” > ”Mega file”; file extension *.meg, protein coding nucleotide sequence data).

- Open the alignment in the ”Sequence Data Explorer” window in MEGA.

Check that the alignment looks okay.

2) Estimating sequence differences

For estimating genetic distances and constructing phylogenies the appropriate substitution model for the sequences has to be determined. For this purpose we need to take a look at the nucleotide frequencies, transition-transversion ratio and nucleotide substitutions.

- Remove the pseudogene for this step (why?) by removing the tick mark on the left side of the sequence name.

- From ”Statistics” you will find ”Nucleotide composition” and ”Nucleotide pair frequencies” (directional, 16 pairs). Make a summary of the findings, and use the “Choosing the substitution model” diagram from the exercise slides. Which model fits best?
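To see what these statistics measure, here is a minimal Python sketch that computes base frequencies and counts transitions and transversions between two aligned sequences. It is not MEGA's implementation; the function names are hypothetical, and positions with gaps or ambiguous bases are simply skipped.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_frequencies(seqs):
    """Relative frequencies of A, C, G and T over all given sequences."""
    counts = Counter(b for seq in seqs for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def classify_pair_sites(seq1, seq2):
    """Count identical, transitional and transversional sites
    between two aligned sequences."""
    result = Counter()
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        if a == b:
            result["identical"] += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            result["transition"] += 1  # A<->G or C<->T
        else:
            result["transversion"] += 1  # purine <-> pyrimidine
    return result
```

Roughly equal base frequencies and similar numbers of transitions and transversions point towards simpler models, whereas unequal frequencies or a clear excess of transitions point towards models that account for them.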

Click the pseudogene back. Estimate pairwise distances for the sequences in ”Distances” > ”Compute Pairwise Distances”. First use ”p-distance” and then the model you chose for the data.

Compare the results:

- are there differences between the substitution models?

- examine the distances between the pseudogene and the other sequences: are you able to conclude something already now?
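The difference between the p-distance and a model-based distance can be illustrated with a short Python sketch. The Jukes-Cantor and Kimura 2-parameter formulas are used here only as examples, since the model you choose for the data may be a different one; the function names are hypothetical, and sites with gaps are ignored pairwise.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of compared sites at which two aligned sequences differ."""
    pairs = [
        (a, b) for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor correction: d = -3/4 * ln(1 - 4p/3)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(P, Q):
    """Kimura 2-parameter distance from the proportions of transitional (P)
    and transversional (Q) differences:
    d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q)."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Because the corrections estimate multiple substitutions at the same site, the corrected distance is larger than the p-distance, and the gap between the two grows as the sequences become more divergent.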

3) Constructing a phylogenetic tree

Check the level of variation:

- how many variable sites are there? [V]

- how many parsimony informative sites? [Pi]

- how many monomorphic sites?
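The three site categories can be made concrete with a small Python sketch. It assumes the usual definition of a parsimony-informative site (at least two different nucleotides, each occurring in at least two sequences) and ignores gaps and ambiguous bases when classifying a column, which may differ from the gap treatment selected in MEGA; site_summary is a hypothetical name.

```python
from collections import Counter


def site_summary(seqs):
    """Classify alignment columns as conserved (monomorphic) or variable,
    and split variable columns into parsimony-informative and singleton."""
    summary = Counter()
    for column in zip(*(seq.upper() for seq in seqs)):
        states = Counter(b for b in column if b in "ACGT")
        if len(states) <= 1:
            summary["conserved"] += 1
            continue
        summary["variable"] += 1
        shared_states = sum(1 for n in states.values() if n >= 2)
        if shared_states >= 2:
            summary["parsimony_informative"] += 1
        else:
            summary["singleton"] += 1
    return summary


toy_alignment = ["AAGA", "AAGC", "ACTA", "ACTG"]
toy_summary = site_summary(toy_alignment)
```

In the toy alignment the first column is conserved, the second and third are parsimony-informative, and the fourth is variable but not informative, because only one of its nucleotides occurs in more than one sequence.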

- Make two trees, a Neighbor-Joining tree and a Maximum Parsimony tree, in “Phylogeny”. For the NJ tree, use the substitution model that you chose earlier. Both methods can use bootstrapping for statistical testing of the tree. In NJ, use 1000 bootstrap replications; in MP, use 100 replications.
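The resampling step behind bootstrapping can be sketched in Python as follows. This shows only how pseudo-replicate alignments are produced by drawing columns with replacement; the tree building itself is left to MEGA. The function names are hypothetical, and the alignment is assumed to be a dictionary of equal-length strings.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Draw alignment columns with replacement to build one pseudo-replicate
    of the same length as the original alignment."""
    length = len(next(iter(seqs.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in seqs.items()}


def bootstrap_replicates(seqs, n, seed=0):
    """Generate n pseudo-replicate alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(seqs, rng) for _ in range(n)]
```

In a full analysis a tree is built from every replicate, and the bootstrap value of a clade is the percentage of replicate trees in which that clade appears.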

(Screenshot of the tree window, with the buttons for rooting, the style of the tree and flipping marked.)

If the root of the tree appears to be in the wrong place, move it to where it belongs based on the taxonomic information (below). Try also different styles and flipping the order of the branches.

Compare the results of the two methods (NJ and MP). What kind of bootstrap values do you get?

What can you conclude about the origin of the pseudogene in the primate branch?

Lineages of the species:

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

ORGANISM Pan troglodytes

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Pan

ORGANISM Hylobates lar

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hylobatidae; Hylobates

ORGANISM Macaca mulatta

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Cercopithecidae; Cercopithecinae; Macaca

ORGANISM Cercopithecus solatus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Cercopithecidae; Cercopithecinae; Cercopithecus

ORGANISM Ateles belzebuth

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Platyrrhini; Atelidae; Atelinae; Ateles

ORGANISM Pithecia pithecia

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Platyrrhini; Pitheciidae; Pitheciinae; Pithecia

ORGANISM Cebus apella

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Platyrrhini; Cebidae; Cebinae; Cebus

ORGANISM Callithrix jacchus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Platyrrhini; Cebidae; Callitrichinae; Callithrix

ORGANISM Lemur catta

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Strepsirrhini; Lemuriformes; Lemuridae; Lemur

ORGANISM Galeopterus variegatus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Dermoptera; Cynocephalidae; Galeopterus

ORGANISM Aplodontia rufa

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Aplodontidae; Aplodontia

ORGANISM Canis lupus familiaris

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Carnivora; Caniformia; Canidae; Canis
