Lecture 8

FCH 532 Lecture 8
Exam on Friday
Chapter 7
Sequence alignments
• An alignment score (AS) is used to determine whether two
sequences are related.
• Score 10 for every identity, except Cys, which scores 20.
• Subtract 25 for every gap.
• The normalized alignment score (NAS) is obtained by dividing
the AS by the number of residues in the shorter of the
two polypeptides in the alignment and multiplying by
100.
• Example: human hemoglobin and myoglobin.
Page 195
Figure 7-27 The optimal alignment of human myoglobin
(Mb, 153 residues) and the human hemoglobin α chain (Hbα,
141 residues).
Hemoglobin α has 141 residues and myoglobin has 153.
AS = (number of identities × 10) + 20 (Cys) − (number of gaps × 25)
= (37 identities × 10) + 20 (Cys) − (1 gap × 25) = 365
NAS = (AS / number of residues in the shorter polypeptide) × 100
= (365 / 141) × 100 = 259
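The hemoglobin/myoglobin calculation can be checked with a short Python sketch. It assumes that "37 identities × 10 + 20 (Cys)" means 37 non-Cys identities plus one Cys identity; the function and parameter names are illustrative and do not come from the textbook.

```python
def alignment_score(non_cys_identities, cys_identities, gaps):
    """AS: 10 per identity, 20 per Cys identity, minus 25 per gap."""
    return 10 * non_cys_identities + 20 * cys_identities - 25 * gaps


def normalized_alignment_score(score, shorter_length):
    """NAS: AS divided by the length of the shorter polypeptide, times 100."""
    return score / shorter_length * 100


# Values from the slide: Hb alpha (141 residues) vs. Mb (153 residues).
AS = alignment_score(non_cys_identities=37, cys_identities=1, gaps=1)
NAS = normalized_alignment_score(AS, min(141, 153))
print(AS, round(NAS))  # 365 259
```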
Page 195
Figure 7-28 A guide to the significance of normalized
alignment scores (NAS) in the comparison of peptide
sequences.
Alignments are weighted according to
the likelihood of substitution
• A realistic way of assigning the probability of occurrence
(weight) for a substitution is to look at the physical similarity
of amino acids.
• Dayhoff counted residue exchanges in closely related
proteins and determined the relative frequencies of the
20 × 19/2 = 190 different possible residue changes.
• The product 20 × 19 is divided by 2 because A → B and
B → A are treated as equally likely.
• These data can be used to create a square matrix (20 × 20).
• The elements Mij (20 amino acids per side) indicate the
probability that, in a related sequence, amino acid i will
replace amino acid j after an evolutionary interval (usually
one PAM unit, the interval over which 1% of residues change).
• This is the PAM-1 matrix.
PAM matrix
• Mutation probabilities can be determined for other evolutionary distances.
• The PAM-N matrix is made by multiplying the PAM-1 matrix by itself N times ([M]^N).
• Relatedness odds matrix: Rij = Mij/fi
• fi = probability that amino acid i will occur in the second sequence by
chance.
• Rij = probability that amino acid i will replace amino acid j, or vice versa,
every time i or j is encountered in the sequence.
• When two polypeptides are compared with each other, the Rij values for
each position are multiplied to give the relatedness odds.
• For example, for A-B-C-D-E-F and P-Q-R-S-T-U, relatedness odds = RAP ×
RBQ × RCR × RDS × RET × RFU
• Log odds substitution matrix: made by taking the log of each element of
the relatedness odds matrix, so that scores are added instead of multiplied.
• Log odds need to be maximized to get the best alignment.
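The PAM-N and log odds steps can be sketched in plain Python. This is only an illustration: the two-letter alphabet, the numbers in `M1`, and the frequencies `f` are hypothetical, and the base-10 logarithm with a factor of 10 is an assumption chosen to match the scaling quoted for Table 7-7.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Multiply matrix m by itself n times, i.e. compute [M]^N."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def log_odds(m, freqs):
    """Element [i][j] = 10 * log10(R_ij), with R_ij = M_ij / f_i."""
    size = len(m)
    return [[10 * math.log10(m[i][j] / freqs[i]) for j in range(size)]
            for i in range(size)]


# Hypothetical "PAM-1" for a 2-letter alphabet: M1[i][j] is the probability
# that residue j is replaced by residue i after one PAM unit (columns sum to 1).
M1 = [[0.99, 0.01],
      [0.01, 0.99]]
f = [0.5, 0.5]  # hypothetical background frequencies

M250 = mat_pow(M1, 250)  # toy analogue of PAM-250
S = log_odds(M250, f)    # toy log odds substitution matrix
```

In this toy matrix the diagonal log odds are positive and the off-diagonal ones negative, which mirrors the neutral score of 0 in the real table: substitutions more likely than chance score above 0, less likely ones below.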
Table 7-7
The PAM-250 Log Odds Substitution Matrix.
Page 196
All elements multiplied by 10.
Each diagonal element indicates
the mutability of the
corresponding amino acid.
Neutral score = 0.
Sequence alignment
• Make a matrix with the log odds values associated with
the amino acids at the appropriate positions.
• Example: use the PAM-250 log odds matrix with a 10-residue
peptide along the horizontal axis and an 11-residue peptide
along the vertical axis.
• The alignment of these two peptides must have at least
one gap, assuming a significant alignment can be found.
• The resulting matrix is called a comparison matrix.
Page 197
Figure 7-29a Use of the Needleman-Wunsch alignment
algorithm [alignment of 10-residue peptide (horizontal) with
11-residue peptide (vertical)]. (a) Comparison matrix.
Needleman-Wunsch algorithm
• Needleman and Wunsch constructed an algorithm to
find the best alignment between 2 polypeptides.
• Start at the lower right corner of the matrix (C-termini),
at position M, N (these correspond to the 10th and
11th amino acid residues), and add its value to the
element at position M-1, N-1 in the matrix.
• Add to each element of the matrix the largest number
from the row or column to the lower right of that
element, proceeding right to left, bottom to top.
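The summation rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the textbook's worked example: the two short peptides are hypothetical, the comparison matrix uses 1 for an identity and 0 otherwise instead of PAM-250 values, and no gap penalty is applied.

```python
def comparison_matrix(vertical, horizontal):
    """Toy comparison matrix: 1 for an identity, 0 otherwise."""
    return [[1 if v == h else 0 for h in horizontal] for v in vertical]


def transform(comparison):
    """Add to each element the largest value in the row or column that
    starts at its lower-right neighbour, working right to left and
    bottom to top. The last row and last column are left unchanged."""
    rows, cols = len(comparison), len(comparison[0])
    t = [row[:] for row in comparison]
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 2, -1, -1):
            row_part = t[i + 1][j + 1:]                           # row below, to the right
            col_part = [t[r][j + 1] for r in range(i + 1, rows)]  # next column, downward
            t[i][j] += max(row_part + col_part)
    return t


# Hypothetical peptides: 4 residues horizontal, 3 residues vertical.
C = comparison_matrix("ACD", "ABCD")
T = transform(C)
# The best alignment starts at the largest value in the first row or column.
best = max(T[0] + [row[0] for row in T])
print(T)     # [[3, 2, 1, 0], [1, 1, 2, 0], [0, 0, 0, 1]]
print(best)  # 3
```

Here the best score of 3 corresponds to aligning A-CD with ABCD (three identities and one gap); the alignment itself is read off by following the largest values from the upper left toward the lower right.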
Page 197
Figure 7-29b Use of the Needleman-Wunsch alignment
algorithm [alignment of 10-residue peptide (horizontal) with
11-residue peptide (vertical)]. (b) Transforming the matrix.
Page 197
Figure 7-29c Use of the Needleman-Wunsch alignment
algorithm [alignment of 10-residue peptide (horizontal) with
11-residue peptide (vertical)]. (c) Transformed matrix.
Page 197
Figure 7-29d Use of the Needleman-Wunsch alignment
algorithm [alignment of 10-residue peptide (horizontal) with
11-residue peptide (vertical)]. (d) Alignment.
Gap penalties
• If there are gaps in the alignment, a gap penalty must be applied.
• Long gaps are penalized slightly more than short gaps.
• Gap penalty = a + bk
• a = penalty for opening the gap
• b = penalty for extending the gap by one residue
• k = length of the gap in residues
• Empirical studies suggest a = -8 and b = -2 are appropriate values
for the PAM-250 matrix (subtract 8 for opening a gap and 2 per residue).
• Final alignment score for Fig. 7-29d (one 1-residue gap and one 2-residue gap)
is 41 - (8 + 2 × 1) - (8 + 2 × 2) = 19
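The penalty formula and the final score for Fig. 7-29d can be reproduced with a few lines of Python. The function name and the list of gap lengths are illustrative; the values 41, a = -8, and b = -2 are the ones given on the slide.

```python
def gap_penalty(k, a=-8, b=-2):
    """Affine gap penalty a + b*k for a gap of k residues (a negative number)."""
    return a + b * k


raw_score = 41        # summed log odds before gap penalties (from the slide)
gap_lengths = [1, 2]  # one 1-residue gap and one 2-residue gap

final_score = raw_score + sum(gap_penalty(k) for k in gap_lengths)
print(final_score)  # 19
```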
Other algorithms
• Heuristic algorithms: algorithms that make educated guesses to
increase the speed of the program used to make alignments.
• Heuristic algorithms are based on how proteins evolve.
• Risk: may give suboptimal results.
• The PAM-250 matrix is based on extrapolation: the calculation assumes that
the substitution pattern seen over 1 PAM unit of evolutionary distance
also holds over 250 PAM units.
• Because proteins can evolve at different rates, this may not always be true.
• Another log odds substitution matrix was calculated from ~2000 blocks of
aligned sequences from ~500 groups of related proteins.
• For ungapped alignments, the best matrix is called BLOSUM62 (block
substitution matrix; 62 indicates that all blocks of aligned polypeptides in
which there is >62% identity are weighted as a single sequence in
order to reduce contributions from closely related sequences).
• For gapped alignments, BLOSUM50 performs better.
• Both matrices are more sensitive than PAM-250.
BLAST
• BLAST (basic local alignment search tool) and FASTA use
different search philosophies.
• BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) performs pairwise
alignments of the input query sequence against up to a user-selected
number of the most similar subject sequences in the selected database(s).
• Can align the query against ~900,000 peptide sequences in the database.
• Pairwise alignments are found using BLOSUM62 and listed
according to decreasing statistical significance.
• Alignments show both identical residues and similar residues
between the query sequence and aligned sequence and gaps will
be indicated.
• Assigns an “E” value (expected value) = the number of alignments with
an equal or better score expected by chance.
• The higher the E value, the less significant the alignment.
Page 199
Figure 7-30
Examples of peptide sequence alignments.
FASTA
• FASTA (http://www.ebi.ac.uk/fasta33/) allows users to
choose the substitution matrix (PAM or BLOSUM); the
default is BLOSUM50.
• Allows the user to choose the gap penalty parameters.
• Allows the user to choose a ktup (k-tuple) value of 1 or 2 =
number of consecutive residues in the “words” that FASTA
uses to search for identities.
• The smaller the ktup value, the more sensitive the
alignment.
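The effect of ktup can be illustrated with a small Python sketch that only shows the word-matching idea; it is not FASTA's actual implementation, and the two peptide sequences are hypothetical.

```python
def ktup_words(seq, k):
    """All words of k consecutive residues in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_words(query, subject, k):
    """Words of length k that occur in both sequences."""
    return ktup_words(query, k) & ktup_words(subject, k)


query = "MKVLAT"    # hypothetical query peptide
subject = "MKILAS"  # hypothetical database peptide

print(sorted(shared_words(query, subject, 1)))  # ['A', 'K', 'L', 'M']
print(sorted(shared_words(query, subject, 2)))  # ['LA', 'MK']
```

With ktup = 1 every identical residue counts as a match, so more candidate identities are found; with ktup = 2 only pairs of consecutive identical residues count, which finds fewer candidates.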
CLUSTAL
• Multiple sequence alignment: aligns more than
two sequences.
• CLUSTAL (http://www2.ebi.ac.uk/clustalw/)
• User can select the matrix and gap penalties.
• Finds all possible pairwise alignments.
• Starting with the highest scoring pairwise alignment,
realigns the remaining sequences.
• The resulting alignments should be inspected carefully.
Chemical synthesis of oligonucleotides
• Basic strategy is similar to polypeptide synthesis.
• Protected nucleotide is coupled to growing end of
oligonucleotide chain.
• Protecting group is removed.
• Process repeated until desired oligo has been synthesized.
• Current method is the phosphoramidite method
• Nonaqueous reaction sequence.
• 4 steps.
Page 208
1. The dimethoxytrityl (DMTr) protecting group at the 5′ end
is removed with trichloroacetic acid (Cl3CCOOH).
2. The 5′ end of the oligo is coupled to the 3′
phosphoramidite derivative of the next nucleotide. Tetrazole
is used as the coupling agent.
3. Any unreacted 5′ end group is capped by acetylation
to block its extension.
4. The phosphite triester group from the coupling step is
oxidized with I2 to the phosphotriester.
The completed oligonucleotide is treated with NH4OH to remove
the blocking groups.
DNA Chips
• Determination of whole genome sequences from several organisms
allows us to ask significant questions about the function of all the
genes.
• Under what circumstances and to what extent is each gene
expressed?
• How do gene products interact to yield a functional organism?
• What are the consequences of variant genes?
• DNA chips (microarrays, gene chips) can be used for global
analysis of gene expression during biological responses.
• Arrays of different DNA oligonucleotides anchored to a glass or
nylon substrate in a grid.
• ~1 million oligonucleotides can be synthesized simultaneously
using photolithography and DNA synthesis.
Page 209
Figure 7-38
A DNA chip.
DNA Chips
• Photolithography: oligonucleotides are synthesized with
photochemically removable protecting groups at the 5′ end.
• These groups function in the same way as the DMTr group in
conventional synthesis.
• For the synthesis of a specific oligonucleotide, masks protect
some oligos from light while those that are to be extended
are exposed to light (deprotection).
• The chip is then incubated with a solution of activated nucleotide
that couples only to the deprotected oligos.
• Excess is washed away and the process is repeated.
• Nanoliter-sized droplets of reagents are applied using a device
similar to an inkjet printer.
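The mask, deprotect, and couple cycle can be mimicked with a toy Python simulation. This is a sketch under simplifying assumptions: the spot coordinates and target sequences are hypothetical, sequences are written in order of addition, and each cycle is reduced to "expose the spots that need this base next, then couple it".

```python
def synthesize_chip(targets):
    """targets maps a spot (row, column) to the sequence wanted there.
    Returns the finished chip and the number of mask/coupling cycles used."""
    chip = {spot: "" for spot in targets}
    longest = max(len(seq) for seq in targets.values())
    cycles = 0
    for position in range(longest):
        for base in "ACGT":
            # The mask exposes (deprotects) only spots whose next base is `base`.
            exposed = [spot for spot, seq in targets.items()
                       if position < len(seq) and seq[position] == base]
            if not exposed:
                continue
            # The activated nucleotide couples only to the deprotected spots.
            for spot in exposed:
                chip[spot] += base
            cycles += 1
    return chip, cycles


targets = {(0, 0): "ACG", (0, 1): "AGG", (1, 0): "TCG"}  # hypothetical layout
chip, cycles = synthesize_chip(targets)
print(chip == targets, cycles)  # True 5
```

Because all spots that need the same base at the same step share one cycle, the number of cycles grows with oligo length (at most 4 per position) rather than with the number of spots.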
Page 210
Figure 7-39
The photolithographic synthesis of a DNA
chip.
Applications: SNPs
• Can be used to examine single nucleotide
polymorphisms (SNPs)
• L-residue oligos are arranged in an array of L columns
by 4 rows for a total of 4L sequences.
• Each probe in the Mth column has the standard
sequence except at its Mth position,
where it has a different base (A, C, G, or T) in each row.
• Thus, in each column one probe is the standard sequence,
whereas the other three differ from it by one base.
• The probe array is hybridized with complementary DNA
or RNA and variations in hybridization due to the SNPs
can be rapidly determined.
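The 4 × L probe layout can be generated with a short Python sketch. The standard sequence used here is hypothetical and the function name is illustrative.

```python
def snp_probe_array(standard):
    """Return 4 rows (A, C, G, T) by L columns of probes. The probe in
    column m equals the standard sequence except that position m carries
    the base assigned to its row."""
    return [[standard[:m] + base + standard[m + 1:] for m in range(len(standard))]
            for base in "ACGT"]


standard = "GATTACA"  # hypothetical standard sequence, L = 7
array = snp_probe_array(standard)

total = sum(len(row) for row in array)
# Each column holds the standard sequence exactly once.
standard_per_column = [sum(array[r][m] == standard for r in range(4))
                       for m in range(len(standard))]
print(total, standard_per_column)  # 28 [1, 1, 1, 1, 1, 1, 1]
```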
Applications: Expression profiles
• DNA sequences representing genes are placed on a chip, and the
level of expression of the corresponding genes in a tissue of
interest is determined from the degree of
hybridization of the tissue's fluorescently labeled mRNA or cDNA
population.
• Used to generate an expression profile - pattern of
expression.
• Can be done with mRNA isolated under different growth
conditions.
• Can check how specific genes are affected.
• Example: cyclin gene expression in different tissues of
the same organism.
Page 211
Figure 7-40 Variation in the expression of genes that
encode proteins known as cyclins (Section 34-4C) in human
tissues.