Sequence comparison: Dynamic programming

Sequence comparison:
More dynamic programming
Genome 559: Introduction to Statistical
and Computational Genomics
Prof. William Stafford Noble
One-minute responses
• WAY TOO FAST. Please walk around more during sample
problems. I was completely lost.
• Today I felt a bit lost. Most times I was still trying to figure out one
slide or problem, while the class was on the next one.
• It was fast today, but after the reading I was prepared to take things
more quickly and I understood things much better today.
• I enjoyed class today. I thought it moved at a great pace.
• I thought the pace was good today.
• I liked the pace of the lecture – even though you said we spent too
much time on the dynamic programming, it gave me time to
understand.
• The pace is great and gives me time to explore.
• I thought this lecture built nicely on the last lecture. I struggled last
class but it clicked today.
One-minute responses
• The matrix exercise was very helpful, even though I’m not fully clear on
how it works yet.
• I found today’s class time much more understandable.
• I struggled a little bit to grasp the matrix, but by the end I had it. The
pace and numerous examples helped.
• The DP matrix was simple to grasp after computing one or two matrix
values, so the portion of the lecture could go faster.
• I like the sample problems.
• Dynamic programming reminded me of sudoku, which was fun.
• Going through the alignment table helped a lot.
• It was nice to do examples with DNA sequences.
• I’m feeling a lot better about it all. I really like going through examples.
• Again, the small steps with programming problems helped, although the
first problem was overly challenging (when explained in a different way it
was fine).
• I was a little confused when writing the program. I think more practice is
required. The practice problems will help.
• Today’s class was much better since we had appropriate reading first. The
sample problems were interesting since they actually relate to biology.
One-minute responses
• Is there a place to get more samples of simple code to
use to help see patterns of how this works? Or is there
plenty in the book?
– There are lots of examples in the book. And of course, you can
easily find lots of examples on the web. For a reference book
with examples, try Python Cookbook, by Martelli, Ravenscroft
and Ascher.
• I’m a little fuzzy about how dynamic programming differs
from other sorts of programming, but everything else
was really clear.
– The term “dynamic programming” predates computers. There is
no relationship between this use of the word “programming” and
what we are learning to do in Python.
DP matrix
            G     A     A     T     C
      0    -4    -8   -12   -16   -20
C    -4    -5    -9   -13   -12    -6
A    -8    -4     5     1    -3    -7
T   -12    -8     1     0    11     7
A   -16   -12     2    11     7     6
C   -20   -16    -2     7    11    17
Three legal moves
• A diagonal move aligns a character from
the left sequence with a character from the
top sequence.
• A vertical move introduces a gap in the
sequence along the top edge.
• A horizontal move introduces a gap in the
sequence along the left edge.
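Example, with GAATC along the top and CATAC along the left: the alignment
below uses one vertical move (the gap in GA-ATC) and one horizontal move
(the gap in CATA-C).
GA-ATC
CATA-C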
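One way to read the three moves is as instructions that build an alignment one column at a time. The sketch below is a minimal Python illustration of that idea. The function name `moves_to_alignment` and the letter codes D, V and H are assumptions made for this example; they are not part of the lecture.

```python
def moves_to_alignment(top, left, moves):
    """Build an aligned pair of strings from a path of moves.

    moves is a string of 'D' (diagonal), 'V' (vertical) and 'H' (horizontal),
    read from the upper left corner of the matrix to the lower right.
    These letter codes are hypothetical, chosen for this sketch.
    """
    top_out, left_out = [], []
    i = j = 0  # i counts characters used from left, j from top
    for move in moves:
        if move == "D":    # align one character from each sequence
            top_out.append(top[j])
            left_out.append(left[i])
            i += 1
            j += 1
        elif move == "V":  # use a left character; gap in the top sequence
            top_out.append("-")
            left_out.append(left[i])
            i += 1
        elif move == "H":  # use a top character; gap in the left sequence
            top_out.append(top[j])
            left_out.append("-")
            j += 1
        else:
            raise ValueError("unknown move: " + move)
    if i != len(left) or j != len(top):
        raise ValueError("path does not end in the lower right corner")
    return "".join(top_out), "".join(left_out)


print(moves_to_alignment("GAATC", "CATAC", "DDVDHD"))
# ('GA-ATC', 'CATA-C')
```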
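Multiple solutions
GA-ATC    GAAT-C    GAAT-C    GAAT-C
CATA-C    CA-TAC    C-ATAC    -CATAC
• When a program returns a sequence alignment, it may not be the only
best alignment. Each of the four alignments of GAATC and CATAC shown
here scores 17, the value in the lower right corner of the DP matrix.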
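To see where the ties come from, the sketch below fills the matrix and then follows every move that reproduces a cell's score, rather than just one. It is a rough Python illustration, not the method of any particular alignment program. The scoring function `assumed_score` (match 10, A/G or C/T mismatch 0, other mismatches -5) and the gap penalty of -4 are assumptions inferred from the values in the DP matrix of this lecture; the slides do not state them.

```python
def assumed_score(a, b):
    # Assumption: inferred from the DP matrix values, not stated in the slides.
    if a == b:
        return 10
    if {a, b} in ({"A", "G"}, {"C", "T"}):
        return 0
    return -5


def all_optimal_alignments(top, left, s, d):
    """Return (best score, list of all best alignments as (top, left) pairs)."""
    rows, cols = len(left), len(top)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        F[i][0] = i * d
    for j in range(1, cols + 1):
        F[0][j] = j * d
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(left[i - 1], top[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    def walk(i, j):
        # Yield every alignment of top[:j] and left[:i] that scores F[i][j].
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            for t, l in walk(i - 1, j - 1):      # diagonal move
                yield t + top[j - 1], l + left[i - 1]
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            for t, l in walk(i - 1, j):          # vertical move
                yield t + "-", l + left[i - 1]
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            for t, l in walk(i, j - 1):          # horizontal move
                yield t + top[j - 1], l + "-"

    return F[rows][cols], list(walk(rows, cols))


best, alignments = all_optimal_alignments("GAATC", "CATAC", assumed_score, -4)
print(best)
for top_aln, left_aln in alignments:
    print(top_aln)
    print(left_aln)
    print()
```

Under these assumed scores the sketch reports a best score of 17 and the same four alignments listed above.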
DP in equation form
• Align sequences x and y.
• F is the DP matrix; s is the substitution
matrix; d is the linear gap penalty.
$$F(0,0) = 0$$

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
DP in equation form
[Figure: the cell F(i,j) is reached by three arrows: one from F(i-1,j-1) with score s(x_i, y_j), and one each from F(i-1,j) and F(i,j-1) with score d.]
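The recurrence translates almost directly into two nested loops. The following is a minimal Python sketch under a few assumptions: the function name `fill_dp_matrix` is made up for this example, rows follow x and columns follow y (a layout choice made here), and the first row and column are filled with the gap term alone, since the slide states only F(0,0) = 0. That border rule matches the matrices shown in this lecture. The substitution values in `s_example` are those of the simple example in this lecture.

```python
def fill_dp_matrix(x, y, s, d):
    """Return F, where F[i][j] is the best score for x[:i] aligned with y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):      # first column: only gaps are possible
        F[i][0] = F[i - 1][0] + d
    for j in range(1, m + 1):      # first row: only gaps are possible
        F[0][j] = F[0][j - 1] + d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    return F


def s_example(a, b):
    # Substitution scores from the simple example: 2 for a match,
    # -5 for A/G or C/T, -7 for every other mismatch.
    if a == b:
        return 2
    if {a, b} in ({"A", "G"}, {"C", "T"}):
        return -5
    return -7


for row in fill_dp_matrix("AGC", "AAG", s_example, -5):
    print(row)
# [0, -5, -10, -15]
# [-5, 2, -3, -8]
# [-10, -3, -3, -1]
# [-15, -8, -8, -6]
```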
A simple example
Find the optimal alignment of AAG and AGC.
Use a gap penalty of d=-5.

Substitution matrix s:
      A     C     G     T
A     2    -7    -5    -7
C    -7     2    -7    -5
G    -5    -7     2    -7
T    -7    -5    -7     2

Start with 0 in the upper left corner (AAG along the top, AGC along the left):
            A     A     G
      0
A
G
C

Fill in the first row and the first column:
            A     A     G
      0    -5   -10   -15
A    -5
G   -10
C   -15

Fill in the remaining cells:
            A     A     G
      0    -5   -10   -15
A    -5     2    -3    -8
G   -10    -3    -3    -1
C   -15    -8    -8    -6
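As a worked check of one cell, take the entry in row G, column G of the filled matrix. Its diagonal neighbor holds -3, the cell above holds -8 and the cell to the left holds -3, so with s(G,G) = 2 and d = -5 the three candidates are:

```latex
\[
\max\{\, -3 + 2,\; -8 + (-5),\; -3 + (-5) \,\}
  = \max\{\, -1,\; -13,\; -8 \,\}
  = -1
\]
```

The diagonal candidate wins, which agrees with the value -1 in that cell.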
Traceback
• Start from the lower right corner and trace back
to the upper left.
• Each arrow introduces one character at the end
of each aligned sequence.
• A horizontal move puts a gap in the left
sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each
sequence.
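The traceback rules can be sketched in Python as a loop that walks from the lower right corner to the upper left and records one column per move. This is a minimal illustration with assumed names (`traceback`, `F_example`, `s_example`). It takes a matrix that is already filled in, here the one from the simple example in this lecture, and when several moves tie it prefers diagonal, then vertical, so it returns only one of the best alignments.

```python
def s_example(a, b):
    # Substitution scores from the simple example: 2 for a match,
    # -5 for A/G or C/T, -7 for every other mismatch.
    if a == b:
        return 2
    if {a, b} in ({"A", "G"}, {"C", "T"}):
        return -5
    return -7


# Filled DP matrix for AAG (top) and AGC (left), copied from the example.
F_example = [
    [0, -5, -10, -15],
    [-5, 2, -3, -8],
    [-10, -3, -3, -1],
    [-15, -8, -8, -6],
]


def traceback(F, top, left, s, d):
    """Return one best alignment as (aligned top, aligned left)."""
    i, j = len(left), len(top)   # start in the lower right corner
    top_aln, left_aln = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left[i - 1], top[j - 1]):
            top_aln.append(top[j - 1])     # diagonal: one character from each
            left_aln.append(left[i - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            top_aln.append("-")            # vertical: gap in the top sequence
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])     # horizontal: gap in the left sequence
            left_aln.append("-")
            j -= 1
    # Columns were collected from the end of the alignment, so reverse them.
    return "".join(reversed(top_aln)), "".join(reversed(left_aln))


print(traceback(F_example, "AAG", "AGC", s_example, -5))
# ('AAG-', '-AGC')
```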
A simple example
Find the optimal alignment of AAG and AGC.
Use a gap penalty of d=-5.

Cells on the traceback paths:
            A     A     G
      0    -5
A           2    -3
G                      -1
C                      -6

Optimal alignments (score -6):
AAG-    AAG-
-AGC    A-GC
Traceback problem #1
Write down the alignment corresponding to the circled score (shown in brackets).
            G     A     A     T     C
      0    -4    -8   -12   -16   -20
C    -4    -5    -9   -13   -12    -6
A    -8    -4   [5]     1    -3    -7
T   -12    -8     1     0    11     7
A   -16   -12     2    11     7     6
C   -20   -16    -2     7    11    17

Solution #1
GA
CA
Traceback problem #2
Write down three alignments corresponding to the circled score (shown in brackets).
            G     A     A     T     C
      0    -4    -8   -12   -16   -20
C    -4    -5    -9   -13   -12    -6
A    -8    -4     5     1    -3  [-7]
T   -12    -8     1     0    11     7
A   -16   -12     2    11     7     6
C   -20   -16    -2     7    11    17

Solution #2
GAATC    GAATC    GAATC
-CA--    C-A--    CA---
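A quick way to check a traceback answer is to score the alignment column by column and compare the total with the matrix entry. The sketch below does this in Python. The names `alignment_score` and `assumed_score` are made up for this example, and the scores themselves (match 10, A/G or C/T mismatch 0, other mismatches -5, gap -4) are an assumption inferred from the matrix values; the slides do not state them.

```python
def assumed_score(a, b):
    # Assumption: inferred from the DP matrix values, not stated in the slides.
    if a == b:
        return 10
    if {a, b} in ({"A", "G"}, {"C", "T"}):
        return 0
    return -5


def alignment_score(top_aln, left_aln, s, d):
    """Add up the score of each column of a gapped alignment."""
    if len(top_aln) != len(left_aln):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for a, b in zip(top_aln, left_aln):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += d          # gap column
        else:
            total += s(a, b)    # aligned pair of characters
    return total


print(alignment_score("GA", "CA", assumed_score, -4))        # 5
for left_aln in ("-CA--", "C-A--", "CA---"):
    print(alignment_score("GAATC", left_aln, assumed_score, -4))  # -7 each time
```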