Calign: aligning sequences with restricted affine gap penalties
Kun-Mao Chao
Motivation
Given a genomic DNA sequence, it is still an open problem to determine its coding region, i.e. the region consisting of exons and introns. Comparing a cDNA sequence with the genomic DNA sequence helps to identify the coding region.
Figure: genomic DNA (2 strands; Exon1, Intron1, Exon2, Intron2, Exon3 span the coding region) → RNA synthesis (transcription) → RNA (1 strand) → splicing → mRNA (cDNA) → protein synthesis (translation) → protein.
Preliminaries
Input: Two sequences $A = a_1 a_2 \cdots a_M$ and $B = b_1 b_2 \cdots b_N$, where without loss of generality $N \ge M$.
Output: An alignment of A and B.
When aligning a cDNA sequence with a genomic DNA sequence, it might be more appropriate to penalize each long gap with a constant penalty.
Restricted affine gap penalties: an insertion gap of $k \le l$ symbols is penalized $a + kb$, and an insertion gap longer than $l$ symbols is penalized the constant $a + lb$; equivalently, an insertion gap of length $k$ costs $a + \min(k, l)\,b$.
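The restricted penalty fits in a one-line function. A minimal Python sketch, assuming integer parameters a, b, l as on this slide and k as the gap length (the name `restricted_gap_cost` is hypothetical):

```python
def restricted_gap_cost(k: int, a: int, b: int, l: int) -> int:
    """Hypothetical helper: cost of an insertion gap of length k.

    a + k*b for k <= l, and the constant a + l*b for any longer gap.
    """
    if k <= 0:
        return 0  # no gap, no penalty
    return a + min(k, l) * b
```

With the arbitrary illustrative values a = 10, b = 1, l = 20, a 5-symbol insertion costs 15, while a 5000-symbol intron costs 30, the same as a 20-symbol gap.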
O(MN) algorithm
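$S(i,j)$ denotes the minimum cost of any alignment between $a_1 a_2 \cdots a_i$ and $b_1 b_2 \cdots b_j$.
$D(i,j)$ denotes the minimum cost of any alignment between $a_1 a_2 \cdots a_i$ and $b_1 b_2 \cdots b_j$ ending with a deletion.
$I(i,j)$ and $I'(i,j)$ are defined similarly for alignments ending with an insertion; in $I'(i,j)$ the final insertion gap is charged the constant penalty $a + lb$.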
O(MN) algorithm (cont.)
$$D(i,j) = \min\begin{cases} D(i-1,j) + b \\ S(i-1,j) + a + b \end{cases}$$
$$I(i,j) = \min\begin{cases} I(i,j-1) + b \\ S(i,j-1) + a + b \end{cases}$$
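O(MN) algorithm (cont.)
$$I'(i,j) = \min\begin{cases} I'(i,j-1) \\ S(i,j-1) + a + lb \end{cases}$$
$$S(i,j) = \min\begin{cases} S(i-1,j-1) + k(a_i,b_j) \\ D(i,j) \\ I(i,j) \\ I'(i,j) \end{cases}$$
where $k(a_i,b_j)$ is the cost of aligning $a_i$ with $b_j$.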
Figure: the dynamic-programming grid with corners (0,0), (0,N), (M,0) and (M,N), showing how D(i,j), I(i,j), I'(i,j) and S(i,j) at cell (i,j) are obtained from D(i-1,j), S(i-1,j), I(i,j-1), I'(i,j-1), S(i,j-1) and S(i-1,j-1).
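The four recurrences translate directly into a row-by-row dynamic program. The Python sketch below is a minimal illustration, not the author's implementation; it assumes $k(a_i,b_j) = 0$ for a match and a hypothetical mismatch cost r otherwise, $S(0,0) = 0$, and every other undefined entry equal to infinity. The function name `calign_cost` is hypothetical, and it returns only the optimal cost $S(M,N)$ (no traceback).

```python
import math


def calign_cost(A, B, a, b, l, r):
    """O(MN) cost of aligning A (e.g. cDNA) with B (e.g. genomic DNA).

    Hypothetical sketch: matches cost 0, mismatches cost r, deletion gaps
    cost a + k*b, insertion gaps cost a + min(k, l)*b.
    """
    M, N = len(A), len(B)
    INF = math.inf

    def table():
        return [[INF] * (N + 1) for _ in range(M + 1)]

    S, D, I, Ip = table(), table(), table(), table()  # Ip plays the role of I'
    S[0][0] = 0
    for i in range(M + 1):
        for j in range(N + 1):
            if i == 0 and j == 0:
                continue
            if i > 0:  # alignments ending with a deletion of a_i
                D[i][j] = min(D[i - 1][j] + b, S[i - 1][j] + a + b)
            if j > 0:  # alignments ending with an insertion of b_j
                I[i][j] = min(I[i][j - 1] + b, S[i][j - 1] + a + b)
                # Capped gap: opening charges a + l*b once, extending is free.
                Ip[i][j] = min(Ip[i][j - 1], S[i][j - 1] + a + l * b)
            diag = INF
            if i > 0 and j > 0:
                diag = S[i - 1][j - 1] + (0 if A[i - 1] == B[j - 1] else r)
            S[i][j] = min(diag, D[i][j], I[i][j], Ip[i][j])
    return S[M][N]
```

Only insertion gaps (extra symbols of B, such as introns of the genomic sequence) are capped; deletions keep the ordinary affine cost a + kb, following the recurrence for D.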
O(NC) algorithm
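Tables D, I, I' and S have the diagonalwise monotonically nondecreasing property: along diagonal k, i.e. the cells $(i, k+i)$, the entries never decrease as i grows.
Let $D''(k,c)$, $I''(k,c)$, $I'''(k,c)$ and $S''(k,c)$ be the largest row i such that $D(i,k+i) = c$, $I(i,k+i) = c$, $I'(i,k+i) = c$ and $S(i,k+i) = c$, respectively.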
D"(k,c)  max
 D"(k+1,c-b)+1

S"(k+1,c-a-b)+1
O(NC) algorithm (cont.)
$$I''(k,c) = \max\begin{cases} I''(k-1,c-b) \\ S''(k-1,c-a-b) \end{cases}$$
$$I'''(k,c) = \max\begin{cases} I'''(k-1,c) \\ S''(k-1,c-a-lb) \end{cases}$$
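O(NC) algorithm (cont.)
$$i = \max\begin{cases} S''(k,c-r) + 1 \\ D''(k,c) \\ I''(k,c) \\ I'''(k,c) \end{cases}$$
$$S''(k,c) = i + \mathrm{snake}(i, k+i)$$
where r is the cost of a mismatch; matches cost nothing, so runs of matches are absorbed by the snake.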
O(NC) algorithm (cont.)
$$\mathrm{snake}(i,j) = \max\{\, z : a_{i+1} \cdots a_{i+z} = b_{j+1} \cdots b_{j+z} \,\}$$
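Figure: diagonals k+1, k and k-1, showing how D''(k,c), I''(k,c) and I'''(k,c) are obtained from D''(k+1,c-b), S''(k+1,c-a-b), I''(k-1,c-b), S''(k-1,c-a-b), I'''(k-1,c) and S''(k-1,c-a-lb), and how S''(k,c) is obtained from row i (which also uses S''(k,c-r)) by following snake(i,k+i).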
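The diagonal recurrences can be sketched in Python as follows, again only as a minimal illustration under stated assumptions: costs a, b, r, l are integers with b ≥ 1 and r ≥ 1, matches cost 0 and mismatches cost r, and each table entry is read as the largest row on diagonal k whose cost is at most c (−∞ when there is none), which is one way to handle cost levels that no cell attains exactly. It also relies on the diagonalwise monotonicity stated above. Names such as `calign_onc`, `Spp` and the dictionary layout are hypothetical; the sketch sweeps all diagonals −M..N at every cost level and is not tuned to meet the O(NC) bound.

```python
import math

NEG = -math.inf  # marks "no row on this diagonal reaches cost <= c"


def snake(A, B, i, j):
    """Largest z with a[i+1..i+z] == b[j+1..j+z] (1-based, as in the slides)."""
    z = 0
    while i + z < len(A) and j + z < len(B) and A[i + z] == B[j + z]:
        z += 1
    return z


def calign_onc(A, B, a, b, l, r):
    """Optimal cost by sweeping cost levels c = 0, 1, 2, ... (hypothetical sketch).

    Assumes integer costs with b >= 1 and r >= 1, match cost 0, mismatch cost r.
    Each dict maps (k, c) to the largest row i on diagonal k (cells (i, k+i))
    whose entry in D, I, I' or S is at most c.
    """
    M, N = len(A), len(B)
    Dpp, Ipp, Ippp, Spp = {}, {}, {}, {}

    def get(table, k, c):
        # Negative cost levels and diagonals never filled in are unreachable.
        return table.get((k, c), NEG) if c >= 0 else NEG

    c = 0
    while True:
        # Ascending k: I'''(k, c) reads I'''(k-1, c) from this same level.
        for k in range(-M, N + 1):
            last = min(M, N - k)  # last valid row on diagonal k
            Dpp[k, c] = min(max(get(Dpp, k + 1, c - b),
                                get(Spp, k + 1, c - a - b)) + 1, last)
            Ipp[k, c] = min(max(get(Ipp, k - 1, c - b),
                                get(Spp, k - 1, c - a - b)), last)
            Ippp[k, c] = min(max(get(Ippp, k - 1, c),
                                 get(Spp, k - 1, c - a - l * b)), last)
            i = max(get(Spp, k, c - r) + 1, Dpp[k, c], Ipp[k, c], Ippp[k, c])
            if k == 0:
                i = max(i, 0)  # origin (0, 0) has S(0, 0) = 0 <= c
            i = min(i, last)
            # Slide along matching symbols for free: S''(k, c) = i + snake(i, k+i).
            Spp[k, c] = i + snake(A, B, i, k + i) if i != NEG else NEG
        if Spp[N - M, c] == M:  # cell (M, N) reached at cost c
            return c
        c += 1
```

The cost returned is the first c for which $S''(N-M, c)$ reaches row M, i.e. the cell (M,N); on small inputs it can be cross-checked against the O(MN) table.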