File - Chad M. Hodge

Homework 3 – sequence alignments
Chad Hodge – u0584663
BMI 6030 – Dr. Eilbeck
1.
If you were looking for a mutation between your protein and a paralogous protein, would you try a global or a
local alignment first? (2 points)
a. I think global would be the best one to try first.
b. Explain your reasoning. (2 points)
i. I would choose a global alignment because global alignments try to align proteins from the
beginning to the end of the sequence. Since the mutations will be small, the global alignment
will show where insertions, deletions, or substitutions occur. Local alignment is
meant for a smaller region of comparison, such as a motif, whereas global alignment is meant for
matches over a larger region, such as an entire sequence.
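The difference between the two approaches can be sketched with a small scoring example. This is only an illustration: the scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions, not part of the assignment, and real protein alignments would use a substitution matrix and affine gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: all of a must be aligned against all of b."""
    # Row 0: aligning a prefix of b against nothing costs one gap per letter.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, so an alignment can restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy sequences that share only the motif "ACGT".
print(global_score("TTTTACGT", "ACGTCCCC"), local_score("TTTTACGT", "ACGTCCCC"))
```

With sequences that share only a short motif, the local score rewards the motif alone, while the global score is pulled down by the unrelated flanking letters. For two paralogs that are similar end to end, the two scores are close, and the global alignment accounts for every position.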
2.
You search a 100-nucleotide long transcript against an EST database. The bit score of the best alignment
was 37 bits. The effective database size was 100,000,000 nucleotides. The effective query size was 80
nucleotides. K = 0.059.
a. Is the hit significant? (2 points)
i. No, because the P value (about 0.057) is greater than the 0.05 significance threshold.
b. What is its E-value? (3 points)
i. The Karlin-Altschul equation for this is normally shown as: $E = Kmn\,e^{-\lambda S}$. However, since
that equation calls for a raw score, and we have a bit score, I would suspect we should
use this formula: $E = m'n'\,2^{-S'}$
Solving this equation, we get this:
$E = 80 \times 100{,}000{,}000 \times 2^{-37}$
$E = 0.05820766091$
c. What is its P value? (3 points)
i. The equation is this: $P = 1 - e^{-E}$
Solving for this, we get:
$P = 1 - e^{-0.05820766091}$
$P = 0.05654599142$
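The arithmetic in 2b and 2c can be checked with a few lines of Python. This is a minimal sketch; the function and parameter names are made up for illustration. It uses the bit-score form of the equation, so K is not needed: the bit score already folds in K and λ through S' = (λS - ln K) / ln 2.

```python
import math


def evalue_from_bit_score(bit_score, eff_query_len, eff_db_len):
    """E = m' * n' * 2^(-S'), where S' is the bit score."""
    return eff_query_len * eff_db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e_value):
    """P = 1 - e^(-E); expm1 keeps precision when E is small."""
    return -math.expm1(-e_value)


# Values from the question: 37 bits, m' = 80, n' = 100,000,000.
e = evalue_from_bit_score(37, 80, 100_000_000)
p = pvalue_from_evalue(e)
print(f"E = {e:.11f}")
print(f"P = {p:.11f}")
```

For small E the two numbers are nearly equal, since 1 - e^(-E) is approximately E when E is much less than 1.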
3.
You want to look for distant homologs of your favorite Human protein in the Mouse Genome sequence.
a. Which BLAST should you use (blastn, blastx, tblastn, tblastx, blastp)? (1 point)
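i. We are comparing a protein (human) to a nucleotide sequence (mouse), across species, so we
should use tblastn, which translates the nucleotide database in all six reading frames. If we
started from a nucleotide sequence for our protein instead, we could use tblastx to compare
translated nucleotide to translated nucleotide, but this can introduce errors: a genomic
sequence contains introns that are removed by splicing, and back-translating a protein to
nucleotides is ambiguous because most amino acids have more than one codon.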
b. Which BLOSUM Matrix(s) should you use? (3 points)
i. BLOSUM62. I chose this because mice are a model organism and are fairly similar to
humans, so this could be a good starting point.
c. Which BLOSUM Matrix would you use for searching the E. coli genome sequence with a human
protein? (2 points)
i. Most likely BLOSUM45, because humans and E. coli are widely divergent. We could also
try BLOSUM62 if we think the gene is conserved between the two.
d. Why? (2 points)
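i. The lower the BLOSUM number, the lower the percent identity of the sequences used to
build the matrix, so lower-numbered matrices are better suited to divergent sequences.
Species that diverged from humans long ago have less sequence in common with humans
today, and E. coli certainly falls into that category.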
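The program choice in 3a comes down to the type of the query and the type of the database. The lookup below is only a sketch of that reasoning: the table and the helper name `programs_for` are illustrative and are not part of any BLAST software.

```python
# Each program: (query type, database type, what gets translated).
BLAST_PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide", "nothing"),
    "blastp":  ("protein",    "protein",    "nothing"),
    "blastx":  ("nucleotide", "protein",    "query"),
    "tblastn": ("protein",    "nucleotide", "database"),
    "tblastx": ("nucleotide", "nucleotide", "query and database"),
}


def programs_for(query_type, db_type):
    """Return the BLAST programs that accept this query/database pairing."""
    return sorted(
        name
        for name, (query, database, _translated) in BLAST_PROGRAMS.items()
        if query == query_type and database == db_type
    )


# A human protein searched against the mouse genome sequence.
print(programs_for("protein", "nucleotide"))
# A nucleotide query against a nucleotide database has two options.
print(programs_for("nucleotide", "nucleotide"))
```

A protein query against a nucleotide database leaves tblastn as the only match, which agrees with the answer to 3a.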