FCH 532 Lecture 8 Exam on Friday Chapter 7 Sequence alignments • An alignment score (AS) is used to determine if there is any relationship. • 10 for every identity except Cys which scores 20 • Subtract 25 for every gap. • The normalized alignment score (NAS) by dividing the AS by the number of residues of the shortest of the two polypeptides in the alignment and multiplying by 100. • Example Human hemoglobin and myoglobin. Page 195 Figure 7-27 The optical alignments of human myoglobin (Mb, 153 residues) and the human hemoglobin a chain (Hba, 141 residues). Hemoglobina is 141 and myoglobin is 153 AS = number of identities X 10 + 20 for Cys -number of gaps = 37 identities X 10 + 20 (Cys) - (1gap X 25)= 365 NAS = AS/number of residues for shortest polypeptide =365/141*100 = 259 Page 195 Figure 7-28 A guide to the significance of normalized alignment scores (NAS) in the comparison of peptide sequences. Alignments are weighted according to the likelihood of substitution • Realistic way of assigning the probability of occurrence (weight) for a substitution is to look at the physical similarity of amino acids. • Dayhoff measured a number of residue exchanges for closely related proteins and determined their relative frequency of the 20 X 19/2 = 190 different possible residue changes. • This number is divided by 2 to account for the fact that A B and B A are equally likely. • These data can be used to create a square matrix (20 X 20) • The elements (20 properties per side) Mij, indicate the probability that, in a related sequence, amino acid i will replace amino acid j after an evolutionary interval (usually one PAM unit). • PAM-1 matrix. PAM matrix • • • • • • • • • Mutation probability can be determined for other evolutionary distances. PAM-N matrix is made bt multiplying the matrix by itself N times ([M]N). Relatedness odds matrix - Rij = Mij/fi fi = probability that the amino acid i will occur in the second sequence by chance. Rij = probability that amino acid i will replace amino acid j or vice versa every time i or j is encountered in the sequence. When two polypeptides are compared with each other, the Rij values for each position are multiplied to give the relatedness odds. For example A-B-C-D-E-F and P-Q-R-S-T-U, relatedness odds = RAP X RBQ X RCR X RDS X RET X RFU Log odds substitution matrix - is made by taking the log of the relatedness odds. Log odds need to be maximized to get the best alignment. Table 7-7 The PAM-250 Log Odds Substitution Matrix. Page 196 All elements multiplied by 10. Each diagonal element indicates the mutability of the corresponding amino acid. Neutral score = 0. Sequence alignment • Make a matrix with the log odds values associated with the amino acids at the appropriate positions. • Example use a PAM-250 log odds matrix with a 10 peptide horizontal and 11 peptide vertical. • The alignment of these two peptides must have at least one gap assuming a significant alignment can be found. • This is called a comparison matrix Page 197 Figure 7-29a Use of the Needleman-Wunsch alignment algorithm [alignment of 10-residue peptide (horizontal) with 11-residue peptide (vertical)]. (a) Comparison matrix. Needleman-Wunsch algorithm • Needleman and Wunsch constructed an algorithm to find the best alignment between 2 polypeptides. • Start at the lower right corner of the matrix (C-termini) at position M and N (these correspond to the 10th and 11th amino acid residues) and add the value to the position M-1, N-1 in the matrix. • Add to each element of the matrix the largest number from the row or column to the lower right of each element proceeding right to left, bottom to top. Page 197 Figure 7-29b Use of the Needleman-Wunsch alignment algorithm [alignment of 10-residue peptide (horizontal) with 11-residue peptide (vertical)]. (b) Transforming the matrix. Page 197 •Add to each element of the matrix the largest number from the row or column to the lower right of each element proceeding right to left, bottom to top. Page 197 •Add to each element of the matrix the largest number from the row or column to the lower right of each element proceeding right to left, bottom to top. Page 197 •Add to each element of the matrix the largest number from the row or column to the lower right of each element proceeding right to left, bottom to top. Page 197 •Add to each element of the matrix the largest number from the row or column to the lower right of each element proceeding right to left, bottom to top. Page 197 Figure 7-29c Use of the Needleman-Wunsch alignment algorithm [alignment of 10-residue peptide (horizontal) with 11-residue peptide (vertical)]. (c) Transformed matrix. Page 197 Figure 7-29d Use of the Needleman-Wunsch alignment algorithm [alignment of 10-residue peptide (horizontal) with 11-residue peptide (vertical)]. (d) Alignment. • • • • • • • Gap penalties If there are gaps in the alignment, the gap penalty must be applied. Gaps for a long gap are penalized slightly more than short gaps. a + bk a = penalty for opening the gap b = penalty for for extending gap by one residue k = length of gap between residues Empirical studies suggest a = -8 and b= -2 are appropriate values for the PAM-250 matrix. • Final alignment score for Fig. 7-29d (1-residue and 2-residue gap) is 41-(8 + 2 X 1) - (8 + 2 X 2) = 19 • • • • • • • • • Other algorithms Heuristic algorithms - algorithms that make educated guesses to increase the speed of the program used to make alignments. Heuristic algorithms are based on how proteins evolve. Risk= may get suboptimal results. PAM-250 Matrix is based on extrapolation: calculation assumes 1 PAM unit of evolutionary distance is the same as 250 PAM units. Because proteins can evolve at different rates this may not always be true. Another logs odd substitution matrix based on ~2000 blocks of aligned sequence from ~500 groups of related proteins calculated. For ungapped alignments, the best matrix is called BLOSUM62 (block substution matrix; 62 indicates that all blocks of alighned polypeptides in which there are >62% identity are weighted as a single seequence in order to reduce contributions from closely related sequences. For gapped alignments BLOSUM50 performs better. Both matrices are more sensitive than those based on PAM-250. BLAST • BLAST (basic local alignment search tool) and FASTA use different search philosophies. • BLAST (http://www.ncbi.nlm.nih.gob/BLAST/) performs pairwise alignments up to user-selected number of subject sequences in the selected database(s) most similar to the input query sequence. • Can align vs ~900,000 peptide sequences in the database. • Pairwise alignments are found using BLOSUM62 and listed according to decreasing statistical significance. • Alignments show both identical residues and similar residues between the query sequence and aligned sequence and gaps will be indicated. • Assigns “E” value - expected value = number of expected results by chance. • The higher the E value, the less significant. Page 199 Figure 7-30 Examples of peptide sequence alignments. FASTA • FASTA (http://www.ebi.ac.uk/fasta33/) allows users to choose the substitution matrix (PAM, BLOSUM) the default is BLOSUM50. • Allows user to choose the gap penalty parameters. • Allows user to choose ktup (k-tuple) value of 1 or 2 = number of consecutive residues in “words” that FASTA uses to search for identities. • The smaller the ktup value, the more sensitive the alignment. CLUSTAL • Multiple sequence alignment -To make alignments with more than 2 sequences. • CLUSTAL (http://www2.ebi.ac.uk/clustalw/) • User can select matrix and gap penalties. • Finds all possible pairwise alignments. • Starting with the highest scoring pairwise alignment, realigns remaining sequence. • Should be looked at carefully. Page 199 Figure 7-30 Examples of peptide sequence alignments. Chemical synthesis of oligonucleotides • Basic strategy is similar to polypeptide synthesis. • Protected nucleotide is coupled to growing end of oligonucleotide chain. • Protecting group is removed. • Process repeated until desired oligo has been synthesized. • Current method is the phosphoramidite method • Nonaqueous reaction sequence. • 4 steps. Page 208 Page 208 1. Dimethoxytrityl (DMTr) protecting group at the 5’ end is removed with trichloroacetic acid (Cl3CCOOH) Page 208 2. The 5’ end of the oligo is couple to the 3’ phosphoramidite derivative. Tetrazole is used as coupling agent. Page 208 3. Any unreacted 5’ end group is capped by acetylation to block its extension. Page 208 4. The phosphite triester group from the coupling step is oxidized with I2 to the phosphotriester. Treated with NH4OH to remove blocking groups. DNA Chips • Determination of the whole genomes from several organisms allows us to ask significant questions about the function of all the genes. • Under what circumstances and to what extent is each gene expressed under specific conditions? • How do gene products interact to yield a functional organism? • What are the consequences of variant genes? • DNA chips (microarrays, gene chips) can be used for global analysis of gene expression during biological responses. • Arrays of different DNA oligonucleotides anchored to a glass or nylon substrate in a grid. • ~1 million oligonucleuotides can by simultaneously synthesized using photolithography and DNA synthesis. Page 209 Figure 7-38 A DNA chip. DNA Chips • Photolithography-oligonucleotides are synthesized with photochemically removable protective groups at the 5’ end. • Function in a similar manner as the DMTr group in conventional synthesis. • For the synthesis of a specific oligonucleotide, utilize masks that protect specific oligos from being exposed to light while those that are to be extended are exposed to light. (deprotection) • The chip is then incubated with a solution of activated nucleotide that couples only to the deprotected oligos. • Excess is washed away and the process is repeated. • Nanoliter sized droplets of reagents are applied using a device similar to an ink jet printer. Page 210 Figure 7-39 The photolithographic synthesis of a DNA chip. Applications: SNPs • Can be used to examine single nucleotide polymorphisms (SNPs) • L-residue oligos are arranged in an array of L columns by 4 rows for a total of 4L sequences. • The probe in the Mth column has the standard sequence with the exception of the probes Mth position where it has a different base (A,C,G, or T) in each row. • One probe is standard whereas the other three in each column differ by one base pairs. • The probe array is hybridized with complementary DNA or RNA and variations in hybridization due to the SNPs can be rapidly determined. Applications: Expression profiles • DNA features put onto a chip and the level of expression of the corresponding genes in a tissue of interest can be determined by the degree of hybridization of its fluorescently labeled mRNA or cDNA population. • Used to generate an expression profile - pattern of expression. • Can be done with mRNA isolated under different growth conditions. • Can check how specific genes are affected. • Example: cyclin gene expression in different tissues of the same organism. Page 211 Figure 7-40 Variation in the expression of genes that encode proteins known as cyclins (Section 34-4C) in human tissues.