Calign: aligning sequences with restricted affine gap penalties
Kun-Mao Chao

Motivation

Given a genomic DNA sequence, it is still an open problem to determine its coding region, i.e. the region consisting of exons and introns. Comparing cDNA with genomic DNA helps in understanding the coding region.

[Figure: genomic DNA (2 strands) with a coding region made of Exon1, Intron1, Exon2, Intron2, Exon3 → RNA synthesis (transcription) → RNA (1 strand) → splicing → mRNA (cDNA) → protein synthesis (translation) → protein.]

Preliminaries

Input: Two sequences A = a1a2…aM and B = b1b2…bN, where without loss of generality N ≥ M.
Output: An alignment of A and B.

When aligning a cDNA sequence with a genomic DNA sequence, it might be more appropriate to penalize each long gap with a constant penalty.

Restricted affine gap penalties: an insertion gap of more than l symbols is penalized a + lb, where a is the gap-opening penalty and b is the gap-extension penalty per symbol.

O(MN) algorithm

S(i,j) denotes the minimum cost of any alignment between a1a2…ai and b1b2…bj.
D(i,j) denotes the minimum cost of any alignment between a1a2…ai and b1b2…bj ending with a deletion.
I(i,j) and I'(i,j) are defined similarly to D(i,j) for alignments ending with an insertion. I'(i,j) covers insertion gaps charged the constant penalty a + lb, so extending such a gap costs nothing.

\[
D(i,j) = \min \left\{ \begin{array}{l} D(i-1,j) + b \\ S(i-1,j) + a + b \end{array} \right.
\qquad
I(i,j) = \min \left\{ \begin{array}{l} I(i,j-1) + b \\ S(i,j-1) + a + b \end{array} \right.
\]

\[
I'(i,j) = \min \left\{ \begin{array}{l} I'(i,j-1) \\ S(i,j-1) + a + lb \end{array} \right.
\qquad
S(i,j) = \min \left\{ \begin{array}{l} S(i-1,j-1) + k(a_i,b_j) \\ D(i,j) \\ I(i,j) \\ I'(i,j) \end{array} \right.
\]

[Figure: the dynamic-programming grid with corners (0,0), (0,N), (M,0) and (M,N), showing the entries that D(i,j), I(i,j), I'(i,j) and S(i,j) are computed from.]

O(NC) algorithm

Tables D, I, I' and S have the diagonalwise monotonically nondecreasing property: their values never decrease along a diagonal j - i = k.

Let D"(k,c), I"(k,c), I'"(k,c) and S"(k,c) be the largest row i such that D(i,k+i) = c, I(i,k+i) = c, I'(i,k+i) = c and S(i,k+i) = c, respectively.

\[
D''(k,c) = \max \left\{ \begin{array}{l} D''(k+1,c-b) + 1 \\ S''(k+1,c-a-b) + 1 \end{array} \right.
\]

\[
I''(k,c) = \max \left\{ \begin{array}{l} I''(k-1,c-b) \\ S''(k-1,c-a-b) \end{array} \right.
\qquad
{I'}''(k,c) = \max \left\{ \begin{array}{l} {I'}''(k-1,c) \\ S''(k-1,c-a-lb) \end{array} \right.
\]

\[
i = \max \left\{ \begin{array}{l} S''(k,c-r) + 1 \\ D''(k,c) \\ I''(k,c) \\ {I'}''(k,c) \end{array} \right.
\qquad
S''(k,c) = i + \mathrm{snake}(i,k+i)
\]

Here r is taken to be the cost of a mismatch, with matches costing 0, so that a run of matching symbols extends a diagonal without increasing the cost.
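\[
\mathrm{snake}(i,j) = \max \{\, z : a_{i+1} \ldots a_{i+z} = b_{j+1} \ldots b_{j+z} \,\}
\]

[Figure: diagonals k+1, k and k-1, showing how D"(k,c), I"(k,c), I'"(k,c) and S"(k,c) are obtained from D"(k+1,c-b), S"(k+1,c-a-b), I"(k-1,c-b), S"(k-1,c-a-b), I'"(k-1,c), S"(k-1,c-a-lb) and S"(k,c-r), with the final extension from row i along snake(i,k+i).]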
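As a minimal sketch of the O(MN) recurrences and of snake(i,j), the Python below fills the four tables row by row and returns S(M,N). Several things here are assumptions made for illustration and are not given on the slides: the boundary conditions, the default values of a, b and l, the substitution cost k (0 for a match, r for a mismatch), and the names restricted_affine_cost and snake.

```python
import math


def snake(A, B, i, j):
    """Largest z with a_{i+1}..a_{i+z} == b_{j+1}..b_{j+z} (1-based symbols).

    In 0-based Python strings this compares A[i:] with B[j:].
    """
    z = 0
    while i + z < len(A) and j + z < len(B) and A[i + z] == B[j + z]:
        z += 1
    return z


def restricted_affine_cost(A, B, gap_open=4, gap_extend=1, cap=10, mismatch=3):
    """Minimum alignment cost S(M, N) under restricted affine gap penalties.

    gap_open = a, gap_extend = b, cap = l, mismatch = r (assumed values).
    Deletions cost a + k*b; insertions cost a + min(k, l)*b.
    """
    M, N = len(A), len(B)
    INF = math.inf
    S = [[INF] * (N + 1) for _ in range(M + 1)]
    D = [[INF] * (N + 1) for _ in range(M + 1)]       # ends with a deletion
    Ins = [[INF] * (N + 1) for _ in range(M + 1)]     # I: ordinary affine insertion
    InsCap = [[INF] * (N + 1) for _ in range(M + 1)]  # I': insertion charged a + l*b
    S[0][0] = 0  # assumed boundary: the empty alignment costs nothing

    for i in range(M + 1):
        for j in range(N + 1):
            if i == 0 and j == 0:
                continue
            if i > 0:
                D[i][j] = min(D[i - 1][j] + gap_extend,
                              S[i - 1][j] + gap_open + gap_extend)
            if j > 0:
                Ins[i][j] = min(Ins[i][j - 1] + gap_extend,
                                S[i][j - 1] + gap_open + gap_extend)
                # Extending a capped gap is free; opening it costs a + l*b.
                InsCap[i][j] = min(InsCap[i][j - 1],
                                   S[i][j - 1] + gap_open + cap * gap_extend)
            best = min(D[i][j], Ins[i][j], InsCap[i][j])
            if i > 0 and j > 0:
                k = 0 if A[i - 1] == B[j - 1] else mismatch
                best = min(best, S[i - 1][j - 1] + k)
            S[i][j] = best
    return S[M][N]


if __name__ == "__main__":
    cdna = "AAAATTTT"
    genomic = "AAAA" + "C" * 50 + "TTTT"   # a 50-symbol "intron"
    print(restricted_affine_cost(cdna, genomic))            # 14 = a + l*b
    print(restricted_affine_cost(cdna, genomic, cap=1000))  # 54 = a + 50*b
```

With these assumed costs, a 50-symbol insertion between two exactly matching blocks costs a + lb = 14, whereas making l very large (which recovers ordinary affine penalties) gives a + 50b = 54. This is the behaviour the restricted scheme is meant to provide for intron-sized gaps.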