Pairwise profile alignment
Usman Roshan
BNFO 601

Protein families
• PFAM: http://pfam.sanger.ac.uk/
• Family alignments can be used to search for new members in a database

Profile-sequence alignment
• Given a family alignment, how can we align it to a sequence?
• First, we compute a profile of the alignment.
• We then align the profile to the sequence using standard dynamic programming.
• However, we need to describe how to align a profile vector to a nucleotide or residue.

Profile
• A profile can be described by a set of vectors of nucleotide/residue frequencies.
• For each position i of the alignment, we compute the normalized frequency of nucleotides A, C, G, and T.

Aligning a profile vector to a nucleotide
• ClustalW/MUSCLE
  – Let f be the profile vector
  – $\mathrm{Score}(f, j) = \sum_{i \in \{A,C,G,T\}} f_i \, S(i, j)$
  – where S(i, j) is the substitution scoring matrix

Aligning a profile vector to a nucleotide
• PSI-BLAST
• $\mathrm{Score}(f, i) = \log(Q_i / P_i)$
• $P_i$ is the background probability of nucleotide i
• $q_{ij}$ is a matrix of match/mismatch probabilities
• Define $g_i$ as $g_i = \sum_{j \in \{A,C,G,T\}} \frac{f_j}{P_j} \, q_{ij}$
• and $Q_i$ as $Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$ ($\alpha$, $\beta$ are constants)
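
The steps above can be sketched in Python. This is an illustrative sketch, not code from the slides. All function names are hypothetical. Assumptions: gap characters are ignored when normalizing column frequencies; S(i, j) is a toy +1/-1 match/mismatch matrix; the background P is uniform; q_ij is a toy joint-probability matrix whose marginals equal P; alpha = beta = 1; and the dynamic programming step is a global alignment with a linear gap penalty of -1.

```python
import math

ALPHABET = "ACGT"

def compute_profile(alignment):
    """One frequency vector per alignment column (assumption: gaps are ignored)."""
    profile = []
    for column in zip(*alignment):
        counts = {c: 0 for c in ALPHABET}
        for ch in column:
            if ch in counts:
                counts[ch] += 1
        total = sum(counts.values())
        profile.append({c: (counts[c] / total if total else 0.0) for c in ALPHABET})
    return profile

def substitution(i, j, match=1.0, mismatch=-1.0):
    """Toy substitution matrix S(i, j); the values are assumed for illustration."""
    return match if i == j else mismatch

def score_sum(f, j):
    """ClustalW/MUSCLE-style score: sum over i of f_i * S(i, j)."""
    return sum(f[i] * substitution(i, j) for i in ALPHABET)

# Assumed toy parameters for the PSI-BLAST-style score.
P_UNIFORM = {c: 0.25 for c in ALPHABET}
Q_TOY = {i: {j: (0.19 if i == j else 0.02) for j in ALPHABET} for i in ALPHABET}

def score_log_odds(f, i, P=P_UNIFORM, q=Q_TOY, alpha=1.0, beta=1.0):
    """PSI-BLAST-style score log(Q_i / P_i), with Q_i mixing f_i and g_i."""
    g_i = sum(f[j] / P[j] * q[i][j] for j in ALPHABET)
    Q_i = (alpha * f[i] + beta * g_i) / (alpha + beta)
    return math.log(Q_i / P[i])

def align_score(profile, seq, gap=-1.0, column_score=score_sum):
    """Best global profile-sequence alignment score (linear gap penalty)."""
    m, n = len(profile), len(seq)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for a in range(1, m + 1):
        dp[a][0] = a * gap          # profile columns aligned to gaps
    for b in range(1, n + 1):
        dp[0][b] = b * gap          # sequence characters aligned to gaps
    for a in range(1, m + 1):
        for b in range(1, n + 1):
            dp[a][b] = max(
                dp[a - 1][b - 1] + column_score(profile[a - 1], seq[b - 1]),
                dp[a - 1][b] + gap,
                dp[a][b - 1] + gap,
            )
    return dp[m][n]

# Usage: build a profile from a small family alignment, then score a sequence.
family = ["ACGT", "ACGA", "AC-T"]
profile = compute_profile(family)
print(align_score(profile, "ACGT"))
print(align_score(profile, "ACGT", column_score=score_log_odds))
```

Passing the column score as a parameter keeps the dynamic programming step unchanged whichever of the two scoring rules is used.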