Part II (Protein Structure).

advertisement
PROTEIN STRUCTURE
NAME: ANUSHA
INTRODUCTION
• Frederick Sanger was awarded his first Nobel
Prize for determining the amino acid sequence
of insulin, the protein needed by people
suffering from Diabetics.
• Each protein has a different functionality, like
the brain cells need a different protein to
function than the liver cells do.
Amino Acids vs Nucleic Acids
1. They are composed of one
carboxyl group and one or
more amino group
2. There are 20 different types
of amino acids
3. They are the building blocks
of all proteins.
4. They are linked by peptide
bonds to form protein.
5. Example:
1. They are composed of sugar,
nitrogenous base and
phosphate group.
2. The two types of nucleic acids
are DNA and RNA.
3. DNA is the genetic material.
RNA is mainly responsible for
protein synthesis and is
genetic material in some
viruses.
Protein Sequencing and Identification
• There are two computational problems for
protein sequencing :
1. De novo protein sequencing
2. Protein identification.
• Example: A biologist wants to determine the
protein that form the DNA Polymerase
complex in Rats.
Rats Genome and gene Sequence:
Mass Spectrometer
• Protease, e.g. Trypsin, breaks the protein into peptides.
• A Tandem Mass Spectrometer further breaks the peptides
down into fragment ions and measures the mass of each
piece.
• Mass Spectrum of a peptide is a collection of masses of
these fragments
• Mass Spectrometer electrically accelerates the
fragmented ions; heavier ions accelerate slower than
lighter ones.
• Mass Spectrometers measure mass/charge ratio of an ion.
Mass Spectrometry
• First the components are
separated by
Electrophoresis.
• The isolated proteins are
digested by Trypsin to
produce peptide
fragments with relative
molecular masses.
Protein Backbone:
Breaking the Protein:
• Trypsin breaks after Lys
and Arg residues.
• Given a typical amino
acid composition, a
protein of 500 residues
yields about 50 Tryptic
fragments.
• The mass spectrometer
measures the masses of
the fragments with very
high accuracy
Peptide Sequencing Problem
• Tandem Mass Spectrometry (MS/MS): mainly
generates partial N- terminal and C- terminal
peptides.
• Spectrum consists of different ion types
because peptides can be broken in several
parts.
• Chemical noise often complicates the
spectrum.
• Represented in 2D :mass/charge axis vs
intensity
Tandem Mass Spectrum: An Example
N-terminal and C-terminal peptides
• While breaking
into
&
it may lose some small parts of
&
,
results in fragments of a lower mass.
• For example, the peptides
might lose water
(H2O ), and the peptides
loses an ammonia
(NH3 ).
• The resulting Masses detected by the spectrometer
will be equal to the mass of
minus the mass of
H2O, and mass of
minus the NH3 , etc
Terminal peptides and ion types
Fragment pattern of peptide
Two different types of fragment ions b-ions and y-ions. When the
carbon nitrogen bond breaks in the spectrometer each of these
ion type will lose water or ammonia or both.
Theoretical Mass Spectrum
Mass differences corresponding to the
amino acids
Peptide Sequencing Problem:
Algorithm
Spectrum Graph
• This is one of the approach for solving the Peptide
Sequencing Problem.
• In this approach we construct a graph from the
experimental spectrum.
• Example :
• Consider an Experimental spectrum S = {s1,…..,sq} Nterminal ions.
• We generate K different guesses for each of masses in
the experimental spectrum.
• Every guess s = x – δj where x is the mass of some
partial peptide and 1<= j <= k.
• For every mass x in the experimental spectrum ,there
are k guesses for the mass x of some partial peptide : s
+ δ1,s + δ2,…….,s + δk.
• Each mass in the experimental spectrum is transformed
into a set of k vertices in the spectrum graph.
• The vertex for δi for the mass s is labeled with mass
s + δi
• Then we connect two vertices u and v in the graph by the
directed edge(u,v), if the mass of v is larger than that of u.
• If we add a vertex at 0 and a vertex at parent mass m. Then
we have to find a path from 0 to m.
• The spectrum graph may have at most qk+2 vertices.
• Edges of the spectrum graph by the amino acid whose
mass is equal to difference between vertex masses.
• This shows that the Peptide Sequencing problem as one of
the finding the “correct” path in the set of all the paths
between the two vertices in the directed acyclic graph.
Spectrum Graph
Protein Identification via Database
Search
• De novo peptide sequencing is invaluable for
identification of unknown proteins
• However , de novo algorithm are designed for
working with high quality spectra with good
fragmentation and without modification.
• Another approach is to compare a spectrum
against a set of known spectra in a database.
Protein Identification Problem
• Input : A database of proteins, an experimental spectrum S, a set of
ions types ∆, and a parent mass m.
• Output: A protein of mass m from the database with the best match
to spectrum S.
MS/MS Database search
• Database search in the mass-spectrometry has been very
successful in identification of already know proteins.
• Experimental spectrum can be compared with the theoretical
spectra database peptides to find the best fit.
• SEQUEST is one for the popular algorithm it determines
whether a database entry matches an experimental spectrum.
The basic approach of this algorithm is just a linear search
through the database.
• The drawback to MS/MS database search algorithm like
SEQUEST is that peptides in a cell are often slightly different
from the canonical peptides present in the database.
Modified Protein Identification
Problem
• Input :
Experimental spectrum S
Database of Peptides
parameter k( number of modification)
A set of ion types ∆
Parent mass m.
• Output :
A protein of mass m with the best match to spectrum S
that is at most K modifications away from an entry in the
database .
• The drawback of the modified protein identification problem is that
very similar peptides may have very different spectra
• Goal : define a notion of spectral similarity that correlates well with
the sequence similarity.
• If peptides are few modification s apart, then the spectral similarity
between them should be high.
Shared peak count
Spectral Convolution
• It is the number of masses common to both
spectral S1 and S2. is simply
..
• MS/MS database search algorithms that
maximizes
, is the theoretical
spectrum and is the experimental spectrum.
• If peptides P1 and P2 differ by only one mutation
with amino acid difference δ = m(p2) – m(p1)
then
is expected to have approximately
equal peaks at x =0 and x = δ.
Example:
be a spectrum of peptide P, and assume that P produces only b-ions
Let:
and
Which of the peptides fits the S best?
Shared peak count : since both S’ and S’’ have 5 peaks in common with S.
Spectral convolution : S ϴ S’ and S ϴ S’’ have strong peaks of same heights at 0
and 5.
This reveals that both P’ and P ‘’ can be obtained from P by single mutation with
mass difference of 5
S1 and S2 are the theoretical spectra of the peptide PRTEIN and PRTEYN respectively. The
Elements in the spectral convolution that have multiplicity > 2 are shaded, while the
elements with multiplicity = 2 are circled. The high multiplicity element 0 are shaded in red
, other higher element 50 in green due to the shift in the masses by δ = 50 due to the mutation
of I and Y in PRTEIN
Spectral convolution
Protein Folding
• DNA to RNA to Protein
• Protein folding.
• Why do Protein Folds? Why is folding
Important?
Conclusion
•
•
•
•
•
•
Tandem Mass spectrometry
De novo Peptide Sequencing
Spectrum Graph
Protein identification via Database Search
Spectral Convolution
Protein Folding.
References
• An introduction to Bioinformatics Algorithm
by Neil C. Jones and Pavel A. Pevzner.
• Introduction to bioinformatics – Arthur M.Lesk
Thank You
Download