Lab 3: Basic Database Similarity Searching

Review of Dynamic Programming
(Figure: scoring grid with SEQUENCE 1 and SEQUENCE 2 along its two axes; the box being scored is highlighted in yellow.)
We want to calculate the score for the yellow box.
The final score that we fill in the yellow box will be the SUM of two other scores, which we’ll call MATCH and MAX.
Let’s try it…
Dynamic Programming
Score = Sum of MatchScore + MAX
(Figure: grid with one axis labelled j-4, j-3, j-2, j-1, j and the other labelled i-4, i-3, i-2, i-1, i.)
Fill in the Table from the top left hand corner!
Match Score: whether the sequence matches at that location (1 for match / 0 for non match)
MAX (the highest of the following three)
1. The score in the box at position i-1, j-1
2. The highest score in the row i-x, j-1 (where 2<=x<i)
3. The highest score in the column i-1, j-y (where 2<=y<j)
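The rule above can also be written as a single recurrence. In this sketch, S(i, j) stands for the score in box i, j and m(i, j) for the Match Score; both symbols are notation assumed here, not taken from the slides. A maximum over boxes that do not exist (first row or first column) is taken to be 0.

```latex
\[
S(i,j) \;=\; m(i,j) \;+\; \max\Bigl\{\, S(i-1,\,j-1),\;
\max_{2 \le x < i} S(i-x,\,j-1),\;
\max_{2 \le y < j} S(i-1,\,j-y) \Bigr\},
\qquad
m(i,j) =
\begin{cases}
1 & \text{if the residues at } i, j \text{ match} \\
0 & \text{otherwise}
\end{cases}
\]
```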
Dynamic Programming – Filling in the Table!
FILL in the Table from the top left hand corner!
    A  B  B  C  D
A   1
B
C
C
C
The MATCH score is assigned based on whether the residues at position i, j (i.e. the yellow box) match.
In this case, the residues at i, j are A and A, which match. Therefore, the MATCH score is 1.
Since there are no boxes up and to the left of this one (no i-1 or j-1), we don’t have to worry about the MAX part of the score.
Dynamic Programming – Filling in the Table!
    A  B  B  C  D
A   1  0
B
C
C
C
Moving one square to the right.
In this case, the residues at i, j are B and A, which do not match. Therefore, the MATCH score is 0.
Again, there are no boxes up and to the left of this one, so we don’t have to worry about the MAX part of the score.
Dynamic Programming – Filling in the Table!
    A  B  B  C  D
A   1  0  0  0  0
B   0
C   0
C   0
C   0
We can fill in the rest of the first column and first row.
Dynamic Programming – Filling in the Table!
Let’s move to the 2nd row
Score = Sum of MatchScore + MAX
    A  B  B  C  D
A   1  0  0  0  0
B   0  2
C   0
C   0
C   0
MAX (the highest of the following three)
1. The score in the box at position i-1, j-1
2. The highest score in the row i-x, j-1 (where
2<=x<i)
3. The highest score in the column i-1, j-y (where
2<=y<j)
In this case there is no 2 or 3 to
consider
MatchScore = 1
MAX = 1
Score = 1 + 1 = 2
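The calculation for this one box can be sketched in Python. This is a minimal illustration, not part of the original lab: the helper name cell_score, the 0-indexed (row, column) layout, and the use of None for boxes not yet filled in are all assumptions made here.

```python
def cell_score(table, top, side, r, c):
    """Score for the box in row r, column c (0-indexed): MatchScore + MAX."""
    match = 1 if side[r] == top[c] else 0
    candidates = [0]  # MAX counts as 0 when there are no boxes up and to the left
    if r > 0 and c > 0:
        candidates.append(table[r - 1][c - 1])                     # 1. the diagonal box
        candidates.extend(table[k][c - 1] for k in range(r - 1))   # 2. further back in the same line as (i-x, j-1)
        candidates.extend(table[r - 1][k] for k in range(c - 1))   # 3. further back in the same line as (i-1, j-y)
    return match + max(candidates)


top = "ABBCD"   # sequence along the top of the table
side = "ABCCC"  # sequence down the side of the table
table = [
    [1, 0, 0, 0, 0],
    [0, None, None, None, None],
    [0, None, None, None, None],
    [0, None, None, None, None],
    [0, None, None, None, None],
]

# Second row, second column: B against B.
print(cell_score(table, top, side, 1, 1))  # MatchScore 1 + MAX 1
```

For this box, only the diagonal candidate exists, which is why the slide says there is no 2 or 3 to consider.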
Dynamic Programming – Filling in the Table!
Moving across the row
Score = Sum of MatchScore + MAX
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2
C   0
C   0
C   0
MAX (the highest of the following three)
1. The score in the box at position i-1, j-1
2. The highest score in the row i-x, j-1 (where
2<=x<i)
3. The highest score in the column i-1, j-y (where
2<=y<j)
MatchScore = 1
MAX = 1
Score = 1 + 1 = 2
Dynamic Programming – Filling in the Table!
Moving across the row again!
Score = Sum of MatchScore + MAX
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2  1  1
C   0
C   0
C   0
MAX (the highest of the following three)
1. The score in the box at position i-1, j-1
2. The highest score in the row i-x, j-1 (where
2<=x<i)
3. The highest score in the column i-1, j-y (where
2<=y<j)
MatchScore = 0
MAX = 1
Score = 0 + 1 = 1
We can fill in the last square using the same method: its score is also 1.
Dynamic Programming – Filling in the Table!
Moving to the next row
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2  1  1
C   0  1
C   0
C   0
MAX (the highest of the following three)
1. The score in the box at position i-1, j-1
2. The highest score in the row i-x, j-1 (where
2<=x<i)
3. The highest score in the column i-1, j-y (where
2<=y<j)
MatchScore = 0
MAX = 1
Score = 0 + 1 = 1
Dynamic Programming – Filling in the Table!
Moving across the row
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2  1  1
C   0  1  2
C   0
C   0
MAX (the highest of the following three)
1. The score in the box at position i-1, j-1
2. The highest score in the row i-x, j-1 (where
2<=x<i)
3. The highest score in the column i-1, j-y (where
2<=y<j)
MatchScore = 0
MAX = 2
Score = 0 + 2 = 2
Dynamic Programming – Filling in the Table!
Moving across the row again
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2  1  1
C   0  1  2  3  2
C   0
C   0
MAX (the highest of the following three)
1. The score in the box at position i-1, j-1
2. The highest score in the row i-x, j-1 (where
2<=x<i)
3. The highest score in the column i-1, j-y (where
2<=y<j)
MatchScore = 1
MAX = 2 (two candidate boxes tie at 2)
Score = 1 + 2 = 3
We can fill in the last square in similar
fashion
Dynamic Programming – Filling in the Table!
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2  1  1
C   0  1  2  3  2
C   0  1  2  3  3
C   0  1  2  3
We can fill in the remaining squares!
Dynamic Programming – Filling in the Table!
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2  1  1
C   0  1  2  3  2
C   0  1  2  3  3
C   0  1  2  3  3
The LAST Square!
MATCH = 0
MAX = 3
Score = 0+3 = 3
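The whole fill procedure, from the top left hand corner to the last square, can be sketched as a short Python function. This is an illustration under assumptions made here (the name fill_table, 0-indexed rows and columns, one sequence along the top and the other down the side), not code from the lab.

```python
def fill_table(top, side):
    """Fill the scoring table row by row, starting at the top left corner."""
    rows, cols = len(side), len(top)
    table = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            match = 1 if side[r] == top[c] else 0
            best = 0  # MAX counts as 0 in the first row and first column
            if r > 0 and c > 0:
                best = table[r - 1][c - 1]            # 1. the diagonal box
                for k in range(r - 1):                # 2. boxes (i-x, j-1) with x >= 2
                    best = max(best, table[k][c - 1])
                for k in range(c - 1):                # 3. boxes (i-1, j-y) with y >= 2
                    best = max(best, table[r - 1][k])
            table[r][c] = match + best
    return table


# Print the table for the example sequences, one labelled row per line.
for label, row in zip("ABCCC", fill_table("ABBCD", "ABCCC")):
    print(label, *row)
```

Run on the example sequences, this should reproduce the completed table above, including the score of 3 in the last square.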
QUESTIONS?
    A  B  B  C  D
A   1  0  0  0  0
B   0  2  2  1  1
C   0  1  2  3  2
C   0  1  2  3  3
C   0  1  2  3  3
Traceback Protocol
Used to get the alignment from the filled in table.
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3
Start in the lower right corner.
You can only move to the largest number that is UP and TO THE LEFT.
Alignment so far:
D
D
Traceback Protocol
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3
VD
VD
All 3 paths start like this.
But, moving up and to the left from the square with score 2, we have two possible choices, both of which are up and to the left, and contain equal values.
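One way to read the "largest number up and to the left" rule is to look only at the boxes that fed the MAX term of the current box and keep those with the highest score. The Python sketch below works under that assumption; the helper name best_predecessors and the 0-indexed (row, column) positions are choices made here, and the table is copied from the slide.

```python
# Filled table for AATVD (top) against AVVD (side), copied from the slide.
TABLE = [
    [1, 1, 0, 0, 0],  # A
    [0, 1, 1, 2, 1],  # V
    [0, 1, 1, 2, 2],  # V
    [0, 1, 1, 1, 3],  # D
]


def best_predecessors(table, r, c):
    """Highest-scoring boxes among those that could supply MAX for (r, c)."""
    if r == 0 or c == 0:
        return []  # nothing lies up and to the left
    candidates = [(r - 1, c - 1)]                          # the diagonal box
    candidates += [(k, c - 1) for k in range(r - 1)]       # boxes (i-x, j-1), x >= 2
    candidates += [(r - 1, k) for k in range(c - 1)]       # boxes (i-1, j-y), y >= 2
    top_score = max(table[i][j] for i, j in candidates)
    return [(i, j) for i, j in candidates if table[i][j] == top_score]


# From the square with score 2 (second V row, V column):
print(best_predecessors(TABLE, 2, 3))  # [(1, 2), (1, 1)]
```

The two boxes returned both hold a score of 1, which is the tie described on the slide.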
Traceback Protocol
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3
Red:
TVD
VVD
Yellow:
ATVD
V-VD
We now have two possible alignments: red and yellow.
Yellow has only one more square it can access.
The red alignment can branch off again, however.
Traceback Protocol
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3

AATVD
-AVVD

AATVD
AV-VD

AATVD
A-VVD

These are the 3 possible paths through the matrix, in other words, the 3 possible alignments.
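As a quick check, each of these alignments should contain as many matching columns as the score in the lower right corner of the table (3). A minimal Python sketch, with count_matches as a hypothetical helper name:

```python
def count_matches(top_row, bottom_row):
    """Count columns where both rows hold the same residue (gaps never match)."""
    return sum(1 for a, b in zip(top_row, bottom_row) if a == b and a != "-")


alignments = [
    ("AATVD", "-AVVD"),
    ("AATVD", "AV-VD"),
    ("AATVD", "A-VVD"),
]

for top_row, bottom_row in alignments:
    print(top_row, bottom_row, count_matches(top_row, bottom_row))
```

All three alignments give 3 matches, the same as the final score in the table.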
Traceback Protocol
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3
Every time a diagonal line “skips” a box (i.e. does not lead into the box immediately to the upper left (i-1, j-1)), we insert a gap into the alignment.
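The gap rule can be sketched in Python by walking a path of boxes from the upper left to the lower right and writing a gap wherever a row or column is skipped. The function name render_alignment and the 0-indexed (row, column) path format are assumptions made for this illustration.

```python
def render_alignment(top, side, path):
    """Turn a traceback path into two aligned strings.

    path lists the aligned boxes as 0-indexed (row, column) pairs,
    ordered from the upper left to the lower right.
    """
    top_out, side_out = [], []
    prev_r, prev_c = -1, -1
    for r, c in path:
        for k in range(prev_c + 1, c):   # skipped columns: top residue over a gap
            top_out.append(top[k])
            side_out.append("-")
        for k in range(prev_r + 1, r):   # skipped rows: gap over a side residue
            top_out.append("-")
            side_out.append(side[k])
        top_out.append(top[c])           # the aligned pair for this box
        side_out.append(side[r])
        prev_r, prev_c = r, c
    for k in range(prev_c + 1, len(top)):    # residues left over after the last box
        top_out.append(top[k])
        side_out.append("-")
    for k in range(prev_r + 1, len(side)):
        top_out.append("-")
        side_out.append(side[k])
    return "".join(top_out), "".join(side_out)


# The path that skips the T column between the two V rows.
print(render_alignment("AATVD", "AVVD", [(0, 0), (1, 1), (2, 3), (3, 4)]))
# ('AATVD', 'AV-VD')
```

Here the step from (1, 1) to (2, 3) skips column 2 (the T), so a gap is written opposite the T.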
Traceback Protocol
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3

AATVD
-AVVD

AATVD
AV-VD

AATVD
A-VVD
Traceback Protocol
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3
Is this possible??
AATV-D
-A-VVD
Optimal alignment??
QUESTIONS??
    A  A  T  V  D
A   1  1  0  0  0
V   0  1  1  2  1
V   0  1  1  2  2
D   0  1  1  1  3

AATVD
-AVVD

AATVD
AV-VD

AATVD
A-VVD