Gld manuscript

Homework for Bioinformatics I – Phylogeny I – parsimony analysis.
Hi, everyone.
Here is a homework assignment to get you all comfortable with a
variety of issues we’ve worked on so far. The following dataset is a multiple
sequence alignment (MSA) of ca. 300 bp upstream of the GLD gene in several
species of Drosophila. GLD has a highly variable pattern of expression in the
reproductive glands of male and female, developing and adult, Drosophila.
Our dataset has seven species; since this is an exercise you will do by hand,
we need to pare it down to five so that you have a reasonable number to work with.
We will do some further analyses with this dataset later, which is why it is a
little larger than the part you will use here.
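Why five species rather than seven? For n taxa, the number of possible unrooted, fully bifurcating trees is the standard count (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), which grows very quickly. A minimal Python sketch of that count, assuming binary unrooted trees (the function name is only for illustration):

```python
def count_unrooted_trees(n: int) -> int:
    """Number of unrooted, fully bifurcating trees for n labelled taxa.

    Illustrative helper (hypothetical name). Computes (2n - 5)!!, the
    product of the odd numbers 1 * 3 * 5 * ... * (2n - 5), for n >= 3.
    """
    if n < 3:
        raise ValueError("need at least 3 taxa")
    total = 1
    for k in range(3, n + 1):
        total *= 2 * k - 5  # adding the k-th taxon multiplies the count by (2k - 5)
    return total


for n in range(3, 8):
    print(n, count_unrooted_trees(n))
```

Five taxa give 15 trees, the number the assignment asks you to start from; all seven species would give 945.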
1. First, eliminate Drosophila sechellia and D. melanogaster from
consideration. We’ll work on the other five species.
2. Second, ignore any position in the sequence that has a gap (we’ll go
back to think about gaps after we are done).
3. Identify and label all of the phylogenetically informative characters.
4. Using the brute-force approach we used in class, identify the most
parsimonious unrooted tree or trees for the five species erecta,
mauritiana, simulans, teissieri, and yakuba. You will have to begin with
the 15 unrooted trees for five species. You may want to check the
listing in Chapter 10 if you are not sure you have them all.
5. How many steps long is the most parsimonious tree? What is the
next-shortest tree, in terms of steps?
6. Root the tree, using simulans and mauritiana as the outgroup to the
other three. Draw and annotate your rooted tree.
7. Answer the following questions:
   a. Identify two of your phylogenetically informative characters that
   have no homoplasy given your dataset, and two that do have
   homoplasy (if any such characters exist). Point them out to me on
   a copy of the alignment. Which of the three kinds of homoplasy
   do you observe?
   b. Now go back to each of the insertion-deletion (indel) regions in
   the dataset. Select two that look interesting and use your tree
   to evaluate the evolutionary history of those indels. Use tree jargon
   like the kind I have used in class to describe their history.
   c. These species have been separated by up to 50 million years of
   evolutionary history. What do you think would be a good strategy
   for using this analysis, and any other information you could bring to
   bear, to predict and test which factors influence the expression
   patterns of Drosophila GLD? (Just a brief idea or two, not a big
   essay.)
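The brute-force search can be double-checked by computer once you have done it by hand. The sketch below (all names are hypothetical) enumerates the 15 unrooted trees for five taxa and scores each with Fitch's algorithm, a standard way to count the minimum number of steps; it is not necessarily the exact by-hand procedure from class. To stay short it uses only four ungapped, informative columns from the first alignment block (positions 15, 33, 50 and 55, counted from that block's first base), so its tree lengths are not the answer for the whole dataset.

```python
# Sketch: brute-force parsimony for five taxa. All names here are hypothetical.
TAXA = ["erecta", "mauritiana", "simulans", "teissieri", "yakuba"]

# Four ungapped, parsimony-informative columns from the first alignment block
# (positions counted from that block's first base). A full analysis would
# include every informative column in the alignment.
CHARACTERS = {
    15: {"erecta": "A", "mauritiana": "A", "simulans": "A", "teissieri": "C", "yakuba": "C"},
    33: {"erecta": "G", "mauritiana": "T", "simulans": "T", "teissieri": "G", "yakuba": "G"},
    50: {"erecta": "A", "mauritiana": "G", "simulans": "G", "teissieri": "A", "yakuba": "A"},
    55: {"erecta": "A", "mauritiana": "T", "simulans": "T", "teissieri": "A", "yakuba": "A"},
}


def unrooted_five_taxon_trees(taxa):
    """Yield all 15 unrooted binary trees for five taxa.

    Each such tree has the shape ((a,b), c, (d,e)): choose the central
    taxon c (5 ways), then split the other four into two cherries (3 ways).
    """
    for center in taxa:
        rest = [t for t in taxa if t != center]
        for partner in rest[1:]:
            cherry1 = (rest[0], partner)
            cherry2 = tuple(t for t in rest if t not in cherry1)
            yield cherry1, center, cherry2


def fitch_join(left, right):
    """One Fitch step: intersect the state sets if possible, else take the union (+1 step)."""
    common = left & right
    return (common, 0) if common else (left | right, 1)


def fitch_steps(tree, column):
    """Minimum number of changes for one character on one unrooted tree.

    The tree is rooted on the branch leading to the central taxon; Fitch
    length does not depend on where the root is placed.
    """
    (a, b), c, (d, e) = tree
    s1, n1 = fitch_join({column[a]}, {column[b]})
    s2, n2 = fitch_join({column[d]}, {column[e]})
    s3, n3 = fitch_join(s1, s2)
    _, n4 = fitch_join(s3, {column[c]})
    return n1 + n2 + n3 + n4


def tree_length(tree):
    """Total parsimony length of a tree over all characters."""
    return sum(fitch_steps(tree, col) for col in CHARACTERS.values())


def homoplastic_characters(tree):
    """Characters that need more than (number of states - 1) steps on this tree."""
    return [pos for pos, col in CHARACTERS.items()
            if fitch_steps(tree, col) > len(set(col.values())) - 1]


trees = list(unrooted_five_taxon_trees(TAXA))
scored = sorted((tree_length(t), t) for t in trees)
for length, tree in scored:
    print(length, tree, homoplastic_characters(tree))
```

With only these four columns, a single tree needs 4 steps and two trees tie at 5; adding the remaining informative columns will change the totals. Any character listed in the last column of the output needs extra steps on that tree, which is a quick way to flag homoplasy for part a. Rooting with the outgroup does not change the step counts, because Fitch length is the same wherever the root is placed.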
This assignment is likely to take you several hours to complete, so I will
give you almost a week. Please plan to turn it in to me in class on THURSDAY,
NOVEMBER 10.
Cheers,
Claude
erecta
-299 TGACGTCTTAGCTGAAGCTAGGGGTGCTTTAAGAGAGTTTTGCAACACTAGAAAATATTCT
melanogaster
-296 TGACGTCTTAGCCGAAGTCAGGGGTGCTTAAAGAAAGTTTTACAACACTAGACCATATTCA
mauritiana
-296 TGACGTCTTAGCCGAAGTCAGGGGTGCTTTAATAAAGTTTTACAACACTGGAAATTATTCA
sechellia
-296 TGACGTCTTAGCCGAAGTCAGGGGTGCTTTAATAAAGTTTTACAACACTAGAAAATATTCA
simulans
-297 TGACGTCTTAGCCGAAGTCAGGGGTGCTTTAATAAAGTTTTACAACACTGGAAATTATTCA
teissieri
-328 TGACGTCTTAGCCTCAGTCAGGGGTGCTTTACGAAAGTTGTACAACACTAGAAAATATTCC
yakuba
-312 TGACGTCTTAGCCGCAGTCAGGGGTGCTTTCTGAAAGTTATACAACACTAGAAAATGTCCA
consensus
************
** **********
* **** * ******* **
* *
erecta
TAATA
melanogaster
ACGTAAGAAATAATA
mauritiana
TAAAA
sechellia
TAATA
simulans
TAATA
teissieri
TAATA
yakuba
TAATA
consensus
*** *
*
I
II
III
erecta
-238 TGAGTA---------------------GTAATAAATAATAC-GAAATACGTTAG--
melanogaster
-235 TGAGTAAAGGGTT------------GAGTAATAA--AATACATAAA
mauritiana
-235 TGAGTAAGGGGTT------------AAGAATTAA--AATACATAAATACGTAAG--
sechellia
-235 TGAGTAAGGGGCT------------AAGAATTAA--AATACATAACTACGTAAG--
simulans
-236 TGAGTAAGGGGTT------------AAGAATTAA--AATACATAAATACGTAAG--
teissieri
-266 TGAGTAAAGGGTTGAGTAGAAAATAAATAATTAA-TAATAC-GCAATGCGTAAG--
yakuba
-251 TGAGTAAGGGGTTGGGTAGAAAATAAGTAATTAA-TAATAC-GCAATACGTAAG--
consensus
******
* ***
*****
*
*** **
IV
V
erecta
-201 ACAATT----ATGCAGAGTTTAAAGGGAAGTGGAAATAGGCTGTGTAAAATTGCACCAAT
melanogaster
-188 ATAATA-------CAGATTCTAAAAGTTATTAG----------GTAAAATTTAGACCAAT
mauritiana
-190 ATAATA-------CAGATTCTAAAAGTTATCGG----------GTAAAGTTTAGACCAAT
sechellia
-190 AAAATA-------CAGATTCTAAAAGTTATTGG----------GTAAAATTTGGCTCAAT
simulans
-191 ATAATA-------CAGATTCTAAAAGTTATTGG----------CTAAAATTTAGACCAAT
teissieri
-209 TTAATATCCGATACCGGTTTTAAAAGAGATTGGAAATAGGCTGGGTAAAATTTATACCAAT
yakuba
-194 ATAATAATAATACCGATTTTAAAAGAGATTGGCAATAGGCTGTGTAAAATTTATACCAAT
consensus
***
* * * **** * *
*
**** **
****
TTAGAat (negative strand)
TTAGAcc
CCAAT
VI
VII
VIII
erecta
-145 TTACTTACCTACT-CGTTGCAA-GCTTCAAAAGCT-TTCGCCTCAGACCAAGTCTCAGA
melanogaster
-145 TTA--GACCTACT-CATTGCAAACACTCAAAAGCT-CCCGATTCAGACCAAGTTTCAGA
mauritiana
-147 TTACTTAACTACT-CATTTCAAACACTCAAAAGCT-CCCGCTTCAGACCAAGTTTCAGA
sechellia
-147 TTACTTAACTACT-CATTTCAAACACTCAAAAGCT-CCCGCTTCAGAACAAGTTTCAGA
simulans
-148 TTACTTACCTACTCCATTTCAAACACTCAAAAGCT-CCCGCTTCAGACCAAGTTTCAGA
teissieri
-148 TTACTTACCTACTCATTGCAAACACTCAAAAGCTTCCAAGCTTCAGACCAAGTTTCAGA
yakuba
-134 TTACTTACCTACT-CATTGCAAGCACTCAAAAGCTTC-AAGC-----------TTCAGA
consensus
***
* ***** * ** ***
*********
*
*****
TCAGAcc
TCAGA
erecta
-89 GAGCGCAGCTTTGGCCCAGCTTTAAGCTGTCTTTCGTTGAGTTTGAGCTTTTCGCCAG
melanogaster
-90 GAGCGCAGCTTTGCGGCCAGCTTTAAGCTGTCTTTCGTTGAGTTCGAGCTTTTCGTCAG
mauritiana
-90 GAGCACAGCTTTGTGGTCAGCTTTAAGCTGTCTTTCGTTGAGTTCGAGCTTTTCGCCAG
sechellia
-90 GAGCACAGCTTTGCGGTCAGCTTTAAGCTATCTTTCGTTGAGTTCGAGCTTTTCGCCAG
simulans
-90 GAGCGCAGCTTTGCGTTCAGCTTTAAGCTGTCTTTCGTTGAGTTCGAGCTTTTCGCCTG
teissieri
-89 GAGCGCAGCTCTGGGACCATCTTAAAGCTGTCTTTCGTTGAGTTTGAGCTTTTAGCCAG
yakuba
-88 GAGCGCAGCTTTGGGCCAGCTTAAAGCTCTCTTTCGTTGAGTTTGAGCTTTTGGCCAG
consensus
**** ***** ** * ** *** ***** ************** ******** * *
*
Gpal half-site
Gpal
Gpal half-site
erecta
-31 TTTAAAAAGACTGGCGCCTGCTGGCCAGAAGC
melanogaster
-31 TTTAAAAAGACTGGCGCCTGCTGGTCAGAAGC
mauritiana
-31 TTTAAAAAGACGGGCGCCTGCTGGCCAGAAGC
sechellia
-31 TTTAAAAAGACGGGCGCCTGCTGGCCAGAAGC
simulans
-31 TTTAAAAAGACGGGCGCCTGCTGGCCAGAAGC
teissieri
-30 TTTAAATAGAC-GGCGCCTAGTGGCCAGAAGC
yakuba
-30 TTTAAAAAGAC-GGCGCCTGCTGGCCAGAAGC
consensus
****** **** ******* *** *******
TATA
+1
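Screening columns for the gap rule and for phylogenetically informative characters can also be automated. A minimal sketch, assuming the five-species rows of the first and last alignment blocks are aligned exactly as printed: it skips any column containing a gap and reports columns where at least two different bases each occur in at least two species, the usual definition of a parsimony-informative site. Positions are counted from the first base of each block, not from the upstream coordinates printed beside the sequences. The other blocks are left out because several of their printed rows differ in length, so their gaps cannot be trusted here. Function and variable names are hypothetical.

```python
# Sketch: find gapped and parsimony-informative columns. Names are hypothetical.
from collections import Counter

# Five-species rows copied from the first and last alignment blocks.
FIRST_BLOCK = {
    "erecta":     "TGACGTCTTAGCTGAAGCTAGGGGTGCTTTAAGAGAGTTTTGCAACACTAGAAAATATTCT",
    "mauritiana": "TGACGTCTTAGCCGAAGTCAGGGGTGCTTTAATAAAGTTTTACAACACTGGAAATTATTCA",
    "simulans":   "TGACGTCTTAGCCGAAGTCAGGGGTGCTTTAATAAAGTTTTACAACACTGGAAATTATTCA",
    "teissieri":  "TGACGTCTTAGCCTCAGTCAGGGGTGCTTTACGAAAGTTGTACAACACTAGAAAATATTCC",
    "yakuba":     "TGACGTCTTAGCCGCAGTCAGGGGTGCTTTCTGAAAGTTATACAACACTAGAAAATGTCCA",
}
LAST_BLOCK = {
    "erecta":     "TTTAAAAAGACTGGCGCCTGCTGGCCAGAAGC",
    "mauritiana": "TTTAAAAAGACGGGCGCCTGCTGGCCAGAAGC",
    "simulans":   "TTTAAAAAGACGGGCGCCTGCTGGCCAGAAGC",
    "teissieri":  "TTTAAATAGAC-GGCGCCTAGTGGCCAGAAGC",
    "yakuba":     "TTTAAAAAGAC-GGCGCCTGCTGGCCAGAAGC",
}


def screen_columns(seqs):
    """Return (gapped, informative) lists of 1-based column positions.

    seqs maps taxon name -> aligned sequence; all rows must have equal length.
    """
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    gapped, informative = [], []
    for i, column in enumerate(zip(*seqs.values()), start=1):
        if "-" in column:
            gapped.append(i)  # ignored for now, per the assignment's gap rule
            continue
        counts = Counter(column)
        # Informative: at least two states, each shared by at least two taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(i)
    return gapped, informative


print("first block:", screen_columns(FIRST_BLOCK))
print("last block:", screen_columns(LAST_BLOCK))
```

For the first block this reports no gapped columns and informative columns 15, 33, 50 and 55. For the last block it flags column 12 as gapped and finds no informative columns: the three remaining variable columns there each differ in a single species, so they cannot favour one tree over another.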