
An indispensable tool for systems biology:
mass spectrometry-based proteomics
Wing Yung

Mass spectrometry is an analytical technique that can identify
and quantify thousands of proteins from complex samples. To
date, proteomics, the large-scale analysis of proteins, has been
combined with mass spectrometry to analyze quantitative protein
profiles, protein-protein interactions and post-translational
modifications of proteins. By providing diverse, quantitative,
high-quality proteomic data, mass spectrometry-based proteomics
has become an essential component of systems biology, which
seeks to understand biological processes and to develop
predictive models of biological systems.
Keywords: Protein, proteomics, mass spectrometry, systems
biology
I. INTRODUCTION
Proteomics is a systematic study of the many and diverse
properties of proteins in parallel, with the aim of providing
detailed descriptions of the structure, function and control of
biological systems in health and disease. In general, proteomics
deals with the large-scale determination of gene and cellular
function directly at the protein level. It is a particularly rich
source of biological information because proteins are involved
in almost all biological activities, and it therefore contributes
greatly to our understanding of biological systems. Proteomics,
an analysis process, is an essential component of the systems
biology approach, which seeks to describe biological systems
comprehensively through the integration of diverse types of data
and to allow computational simulation of complex biological
systems in the future [3].
Proteomics can be divided into three main areas: 1) protein
micro-characterization for large-scale identification of proteins
and their post-translational modifications; 2) differential
display proteomics for comparison of protein levels, with
potential application in a wide range of diseases; and 3) studies
of protein-protein interactions using techniques such as mass
spectrometry or the yeast two-hybrid system [1].
The ability of mass spectrometry (MS) to identify ever
smaller amounts of protein from increasingly complex
mixtures is a primary driving force in proteomics, as described
by Tyers et al in Nature [2]. Patterson et al defined mass
spectrometry as the accurate measurement of the mass of charged
analytes (ions); in the context of proteomics, analytes are
usually peptides or, less frequently, proteins. A mass
spectrometer measures the mass-to-charge ratio of charged
species under vacuum and comprises an ion source, a mass
analyzer and a detector. MS-based proteomics is a discipline
made possible by the availability of gene and genome sequence
databases and protein ionization methods. MS-based proteomics
has established itself as an indispensable technology to
interpret the information encoded in genomes. So far, protein
analysis (primary sequence, post-translational modifications or
protein-protein interactions) by MS has been most successful
when applied to small sets of proteins isolated in specific
functional contexts [3].
In this paper, I will first describe what mass spectrometry
is and how it can identify and quantify thousands of proteins. I
will then review the applications of combined MS and
proteomics to the analysis of protein profiles, protein-protein
interactions and post-translational modifications. The paper
concludes with the key challenges and perspectives of MS-based
proteomics, especially as it acts as an input (proteomic data)
and/or a component (proteomic analysis) of systems biology to
achieve a system-level understanding of cells or networks of
cells [8].
II. MASS SPECTROMETRY
A. Ionization
Mass spectrometry (MS) is carried out in the gas phase on
ionized analytes. By definition, a mass spectrometer consists
of an ion source, a mass analyzer that measures the
mass-to-charge ratio of the ionized analytes, and a detector
that registers the number of ions at each mass-to-charge value.
Electrospray ionization (ESI) and matrix-assisted laser
desorption ionization (MALDI) are the two techniques most
commonly used to ionize proteins or peptides for mass
spectrometry.
Patterson et al described MALDI as a process by which
ion formation is promoted by short laser pulses: the sample is
embedded in a matrix that promotes ionization and deposited on
a sample plate in the source; a laser fired at the sample, which
is co-crystallized with the matrix, results in the desorption
of the analyte from the sample plate and its ionization [3].
Mann et al stated in Annual Review of Biochemistry that the
precise nature of the ionization process in MALDI is still
largely unknown and that the signal intensities depend on the
incorporation of the peptides into crystals, their likelihood of
capturing and/or retaining a proton during the desorption
process, and a number of other factors, including suppression
effects in peptide mixtures. Proteins generally undergo
fragmentation to some extent during MALDI, resulting in broad
peaks and loss in sensitivity; therefore MALDI is mostly applied
to the analysis of peptides [6].
Unlike MALDI, ESI takes place at atmospheric pressure and
is therefore very gentle, without fragmentation of analyte ions
in the gas phase. The molecules are transferred into the MS with
high efficiency for analysis. A wide range of compounds,
especially proteins, can be analyzed by ESI. Large ions are
typically multiply charged, which brings them into the
mass-to-charge range of typical mass spectrometers. The
distribution of charges gives rise to the typical multiple-charge
envelopes. These spectra can be simplified by deconvolution, an
algorithm that sums the signal intensity into a single peak at
the molecular weight of the analyte [6].
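The deconvolution step described above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular instrument software; it assumes that every peak in the envelope belongs to the same analyte, that neighbouring peaks differ by exactly one charge, and that each charge is carried by a proton. The function names and the example mass are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_from_mass(mass, charge):
    """m/z of an [M + zH]z+ ion for a neutral mass given in Da."""
    return (mass + charge * PROTON) / charge


def deconvolve(peaks):
    """Collapse one charge envelope into a single (mass, intensity) peak.

    peaks: list of (m/z, intensity) tuples. Assumes all peaks come from the
    same analyte and that neighbouring peaks differ by exactly one charge.
    """
    peaks = sorted(peaks)  # ascending m/z means descending charge
    (mz_low, _), (mz_high, _) = peaks[-2], peaks[-1]
    # Two neighbours with charges z + 1 and z determine z directly.
    z_last = round((mz_low - PROTON) / (mz_high - mz_low))

    total = 0.0
    weighted_mass = 0.0
    for offset, (mz, intensity) in enumerate(reversed(peaks)):
        z = z_last + offset
        weighted_mass += intensity * z * (mz - PROTON)
        total += intensity
    # Intensity-weighted mean mass and the summed intensity of the envelope.
    return weighted_mass / total, total


if __name__ == "__main__":
    true_mass = 16951.5  # hypothetical protein mass in Da
    charges = [(10, 20.0), (11, 60.0), (12, 100.0), (13, 70.0), (14, 30.0)]
    envelope = [(mz_from_mass(true_mass, z), i) for z, i in charges]
    mass, intensity = deconvolve(envelope)
    print(f"deconvolved mass = {mass:.1f} Da, summed intensity = {intensity:.0f}")
```

The charge of the highest m/z peak follows from two neighbouring peaks because both must correspond to the same neutral mass; once one charge is known, the rest of the envelope is assigned by counting.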
B. Mass Analyzer
According to Aebersold et al, the mass analyzer is central to
MS technology. In the context of proteomics, its key
parameters are sensitivity, resolution, mass accuracy and the
ability to generate information-rich ion mass spectra. There are
three basic types of mass analyzer used in mass
spectrometry-based proteomics: time-of-flight (TOF), quadrupole
and three-dimensional trapping field (ion trap, Fourier
transform ion cyclotron resonance) analyzers. The analyzers can
be used stand-alone or put together in tandem to take advantage
of the strengths of each [4].
C. MALDI Time-of-Flight MS
Aebersold et al stated that because of its simplicity, excellent
mass accuracy, high resolution and sensitivity, MALDI-TOF
(figure 1) is usually the first approach used to identify
proteins, by what is known as peptide-mass mapping or
peptide-mass fingerprinting. In this approach, the mass spectrum
of the eluted peptide mixture is acquired, which results in a
peptide-mass fingerprint of the protein being studied. Because
mass mapping requires a purified target protein, this approach
is commonly used with prior protein fractionation by
two-dimensional gel electrophoresis (2DE) [4]. The mass spectrum
is obtained by MALDI, which results in a TOF distribution of the
peptides comprising the mixture. The reason for using peptides
rather than proteins is that gel-separated proteins are
difficult to elute and to analyze by mass spectrometry, and the
molecular weight of a protein alone is not sufficient for
database identification. In contrast, peptides are easily eluted
from gels, and even a small set of peptides from a protein
provides sufficient information for identification [1].
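A minimal Python sketch of peptide-mass fingerprinting follows, to make the matching step concrete. It assumes digestion with trypsin (cleavage after K or R, but not before P) with no missed cleavages, singly protonated peptides, and a simple count of matching masses as the score; real search engines use more elaborate scoring. The function names, the mass tolerance and the sequences used in the usage note are illustrative assumptions.

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier for the [M+H]+ ion


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mh(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol=0.2):
    """Number of observed masses within tol Da of a theoretical peptide mass."""
    theoretical = [peptide_mh(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def rank_proteins(observed, database, tol=0.2):
    """Rank database entries (name -> sequence) by number of matching masses."""
    scores = {name: count_matches(observed, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with a hypothetical two-entry database such as `{"protein_A": "MKWVTFISLLLLFSSAYSRGVFRR", "protein_B": "GSHMAEQKLISEEDLRPGK"}`, passing the masses of a few tryptic peptides of `protein_A` to `rank_proteins` places `protein_A` first. Splitting on an empty regular-expression match requires Python 3.7 or later.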
Mann et al noted that some of the peptide ions decay because
of the energy imparted during the desorption process, and that
this has been used to obtain structural information in a
technique termed post-source decay. It can provide important
structural information but does not serve as a general peptide
sequencing method because it is not sufficiently sensitive or
simple to control [6].
Figure 1 Schematic of the MALDI process (A) and a MALDI-TOF
instrument (B). The mass-to-charge ratio is related to the time
it takes an ion to reach the detector; lighter ions arrive first.
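The relation between mass-to-charge ratio and flight time can be sketched in Python for an idealized linear TOF analyzer. The sketch assumes that every ion acquires kinetic energy z·e·U in the source and then drifts through a field-free tube of length L, so that t = L·sqrt(m / (2·z·e·U)); the default drift length and accelerating voltage are illustrative values, not instrument specifications.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms


def flight_time(mz, drift_length_m=1.0, accel_voltage_v=20000.0):
    """Flight time in seconds for an ion of given m/z (Da per unit charge)."""
    return drift_length_m * math.sqrt(
        mz * DALTON / (2.0 * E_CHARGE * accel_voltage_v)
    )


def mz_from_time(t, drift_length_m=1.0, accel_voltage_v=20000.0):
    """Invert flight_time: recover m/z from a measured flight time."""
    return 2.0 * E_CHARGE * accel_voltage_v * (t / drift_length_m) ** 2 / DALTON


if __name__ == "__main__":
    for mz in (500.0, 1000.0, 2000.0):
        print(f"m/z {mz:7.1f} -> {flight_time(mz) * 1e6:.2f} microseconds")
```

Because the flight time scales with the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector.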
D. ESI Quadrupole MS
The quadrupole is a mass filter that consists of four rods
to which an oscillating electric field is applied and that lets
only ions of a certain mass-to-charge ratio pass through. To
date, most peptide sequencing experiments are performed on
triple quadrupole instruments that consist of three sections:
two mass-separating quadrupole sections separated by a central
quadrupole (or higher multipole) section whose function is to
contain the ions during fragmentation. Quadrupole mass
spectrometers are capable of unit mass resolution and mass
accuracy of 0.1-1 Da, and they excel at quantitative
measurements. The triple quadrupole can be programmed for a
variety of scan modes in addition to isolating a peptide and
then obtaining a mass spectrum of its fragments
(collision-induced dissociation (CID) spectra) [6].
Figure 2 Schematic of a quadrupole TOF instrument. Ions
are separated in Q1 and dissociated in q2. They enter the TOF
analyzer through a grid and are pulsed into the reflector and
onto the detector, where they are recorded.
In recent years the triple quadrupole has begun to be
complemented by the quadrupole time-of-flight (TOF)
instrument (figure 2), in which the third quadrupole section is
replaced by a TOF analyzer. Its main advantages are high mass
accuracy and high resolution, resulting in unambiguous
determination of charge state and very high specificity in
database searches.
In another method, the peptides are ionized by ESI directly
from the liquid phase. The peptide ions are sprayed into a
tandem mass spectrometer, which is able to resolve the peptides
in the mixture and isolate one species at a time. Its main
advantage over MALDI fingerprinting is that sequence
information derived from several peptides is much more specific
for the identification of a protein than a list of peptide
masses. The fragmentation data can be used to search not only
protein sequence databases but also nucleotide databases, such
as expressed sequence tag databases and even raw genomic
sequence databases [1].
E. Liquid Chromatography and Tandem MS
Liquid chromatography (LC) coupled to tandem mass
spectrometry, called LC-MS/MS, is a powerful technique for the
analysis of proteins and peptides, including complicated
mixtures containing hundreds of proteins whose abundances vary
by orders of magnitude. LC-MS/MS can be used alone or in
combination with 1DE, 2DE or other protein purification
techniques [6].
Before LC-MS/MS can be used, two technical issues need to
be addressed: 1) the relationship between the amount of analyte
present and the measured signal intensity is complex and
incompletely understood, so MS is inherently a poor quantitative
device; and 2) the amount of data collected by the method is
huge and its analysis daunting. The question of what constitutes
an identified protein in an LC-MS/MS experiment has been
difficult to answer. It is therefore important that computer
programs that use robust and transparent statistical principles
to estimate the probability that a peptide or protein is present
in the sample are further developed, widely tested and
applied [4].
Mann et al added that when mixtures of proteins are digested
proteolytically and the resulting peptides are analyzed by
LC-MS/MS, it is no longer possible to identify proteins by
peptide mass profiling, because the peptides from different
proteins are scrambled and multiplexed. Instead, tandem mass
spectral data must be generated and interpreted. Software
control and automation accelerate the acquisition of MS/MS
data, and hundreds of MS/MS spectra can be generated in a
single run [6]. Data analysis tools are essential in these
high-capacity experiments.
III. MS-BASED PROTEOMICS
MS-based proteomics has established itself as an
indispensable technology to interpret the information encoded
in genomes. So far, protein analysis (primary sequence,
post-translational modification (PTM) or protein-protein
interactions) by MS has been most successful when applied to
small sets of proteins isolated in specific functional contexts.
The systematic analysis of the much larger number of proteins
expressed in a cell, an explicit goal of proteomics, is now also
rapidly advancing.
A. Protein profiling
MS-based proteomics has interfaced well with 1) the
generation of protein-protein linkage maps; 2) the use of
protein identification technology to annotate and correct
genomic DNA sequences; and 3) the use of quantitative
methods to analyze protein expression profiles as a function of
cellular state, as an aid to inferring cellular function.
Stable-isotope dilution and LC-MS/MS are used to accurately
detect changes in quantitative protein profiles and to infer
biological function from the observed patterns [4].
The most valuable information on the system being studied
is obtained from those proteins that are expressed differentially
in a matrix of proteins of unchanged expression; therefore,
proteomic technologies for detecting differences in protein
profiles need to be quantitative. According to stable isotope
dilution theory, two peptides of identical chemical structure
that differ in mass because they differ in isotopic composition
are expected to generate identical specific signals in a mass
spectrometer [3].
Gygi et al described an approach based on a class of new
chemical reagents termed isotope-coded affinity tags (ICATs)
and tandem mass spectrometry (LC-MS/MS with sequence
database searching). The reagent exists in two forms, heavy
(d8) and light (d0). The approach is based on two principles.
First, a short sequence of contiguous amino acids from a
protein contains sufficient information to identify that unique
protein. Second, pairs of peptides tagged with the light and
heavy reagents are chemically identical and serve as ideal
mutual internal standards for accurate quantification. In MS,
the ratio between the intensities of the lower and upper mass
components of these pairs of peaks provides an accurate
measure of the relative abundance of the peptides (and hence
the proteins) in the original cell pools, because the MS
intensity response to a given peptide is independent of the
isotopic composition of the ICAT reagents [5].
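The pairing of light and heavy peaks can be sketched in Python as follows. This is a simplified illustration rather than the procedure of Gygi et al: it assumes that the heavy and light reagents differ by eight deuterium-for-hydrogen substitutions, that the charge state and the number of tagged residues per peptide are already known, and that the peak list is centroided. The function name and tolerance are hypothetical.

```python
# Mass difference between the heavy (d8) and light (d0) reagent:
# eight hydrogens replaced by deuterium, about 8.05 Da per label.
D8_SHIFT = 8 * (2.014102 - 1.007825)


def icat_ratios(peaks, charge, labels=1, tol=0.02):
    """Return (light m/z, light/heavy intensity ratio) for each ICAT pair.

    peaks: list of (m/z, intensity) tuples. The charge and the number of
    labels are assumed known and equal for all peptides in the scan.
    """
    spacing = labels * D8_SHIFT / charge  # expected m/z gap within a pair
    pairs = []
    for mz_light, i_light in peaks:
        for mz_heavy, i_heavy in peaks:
            if abs(mz_heavy - mz_light - spacing) <= tol and i_heavy > 0:
                pairs.append((mz_light, i_light / i_heavy))
    return pairs


if __name__ == "__main__":
    # Hypothetical doubly charged scan: one ICAT pair and one unpaired peak.
    scan = [(500.00, 1000.0), (500.00 + D8_SHIFT / 2, 500.0), (620.30, 300.0)]
    for mz, ratio in icat_ratios(scan, charge=2):
        print(f"pair at m/z {mz:.2f}: light/heavy = {ratio:.2f}")
```

A ratio above 1 indicates that the peptide is more abundant in the cell pool labeled with the light reagent; averaging the ratios of several peptides from the same protein would give a protein-level estimate.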
B. Analysis of post-translational modifications
Proteins are converted to their mature form through a
complicated sequence of post-translational processing events.
Many of the post-translational modifications (PTMs) are
regulatory and reversible. Pandey et al stated that one of
the unique features of proteomics studies is the ability to
analyze the PTMs of proteins. Phosphorylation, glycosylation
and sulphation, as well as many other modifications, are
extremely important for protein function, as they can determine
activity, stability, localization and turnover. MS is the
proteomic method of choice for determining protein
modifications, although the task is more difficult than protein
identification [1].
Mann et al said that MS methods have been refined to use
peptide mapping with different enzymes to "cover" as much of
the protein sequence as possible. Protein modifications are then
determined by computer-aided interpretation. For the analysis of
some types of PTMs, specific MS techniques have been developed
that scan the peptides derived from a protein for the presence
of a particular modification. However, the analysis of
regulatory modifications remains a challenge because of the
size and ionizability of the peptides bearing the modifications
and their fragmentation behavior in the MS [4].
Given the difficulties of identifying all modifications in a
single protein, scanning for proteome-wide modifications is at
present not comprehensive. Another approach is to instruct the
database search algorithm to match potentially modified
peptides as well, instead of searching the database only for
non-modified peptides. In summary, the experiment is divided
into the identification of a set of proteins based on
non-modified peptides, followed by a search of only these
proteins for modified peptides [4].
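The two-stage search described above can be sketched in Python. The sketch is illustrative only: it assumes that phosphorylation of S, T or Y (a mass shift of about 79.966 Da per phosphate group) is the only modification considered, that theoretical unmodified peptide masses are already available, and that matching is done on peptide mass alone rather than on fragment spectra. The function name, tolerance and example masses are hypothetical.

```python
PHOSPHO = 79.96633  # mass added by one phosphate group (HPO3), in Da


def two_pass_search(observed, database, tol=0.02, max_mods=2):
    """Identify proteins first, then look for modified peptides in them only.

    observed: list of measured peptide masses in Da.
    database: maps protein name -> {peptide sequence: unmodified mass in Da}.
    Returns (identified proteins, list of (protein, peptide, n_phospho)).
    """
    def hit(mass):
        return any(abs(mass - o) <= tol for o in observed)

    # Pass 1: keep proteins with at least one unmodified peptide observed.
    identified = [prot for prot, peps in database.items()
                  if any(hit(m) for m in peps.values())]

    # Pass 2: for identified proteins only, also try phosphorylated forms.
    modified = []
    for prot in identified:
        for pep, mass in database[prot].items():
            sites = sum(pep.count(res) for res in "STY")
            for n in range(1, min(sites, max_mods) + 1):
                if hit(mass + n * PHOSPHO):
                    modified.append((prot, pep, n))
    return identified, modified


if __name__ == "__main__":
    # Hypothetical peptide masses, chosen only to exercise both passes.
    db = {
        "protA": {"AASK": 361.20, "GTLR": 445.27},
        "protB": {"SSYK": 500.00},
    }
    masses = [361.20, 445.27 + PHOSPHO, 500.00 + PHOSPHO]
    print(two_pass_search(masses, db))
```

Restricting the second pass to proteins that were already identified keeps the number of candidate masses small, which is the motivation for dividing the experiment into two stages; the cost is that a protein seen only through modified peptides is missed.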
Overall, many challenges remain in the large-scale mapping
of PTMs, but it is clear that MS-based proteomics can make a
unique contribution in this area, for example systematic
quantitative measurements of PTMs by stable-isotope labeling.
C. Analysis of protein interactions
Most proteins exert their function by way of protein-protein
interactions, and enzymes are often held in tightly controlled
regions of the cell by such interactions. A key question for any
protein is therefore to which proteins it binds. MS-based
affinity approaches, such as those combined with the ICAT
strategy of Gygi et al [5], have the advantages that the fully
processed and modified protein can serve as the bait, that the
interactions take place in the native environment and cellular
location, and that multicomponent complexes can be isolated and
analyzed in a single operation. However, because many
biologically relevant interactions are of low affinity,
transient and generally dependent on the specific cellular
environment in which they occur, MS-based methods in a
straightforward affinity experiment detect only a subset of the
protein interactions that actually occur. Bioinformatics
methods, correlation of MS data with those obtained by other
methods, or iterative MS measurements, possibly in conjunction
with chemical cross-linking, can often help to further elucidate
direct interactions and the overall topology of multi-protein
complexes [4].
Mann et al described an epitope-tagging strategy, an example
of functional proteomics, used to identify interacting
proteins. The cDNA of interest is first cloned into a vector
that provides an epitope tag. This is followed by transfection
of the tagged "bait" into the cell of interest. The cells are
then lysed and the lysates purified by affinity purification
using an antibody against the epitope. Proteins bound
specifically to the bait protein are eluted by competitive
elution using a peptide that corresponds to the epitope. The
proteins are then resolved by gel electrophoresis, followed by
mass spectrometric identification. Mann et al went on to
describe a modification of this approach that uses two epitope
tags. This may decrease binding to nonspecific proteins as well
as improve the recovery of the protein complex [6].
Pandey et al believed that functional proteomics will
provide a wealth of protein-protein interaction data, which will
probably be its most important and immediate impact on
biological science. Because proteins are one step closer to
function than genes are, these studies frequently lead directly
to biological discoveries or hypotheses [1].
IV. CHALLENGES & EXPECTATIONS
As proteins are involved in essentially all biological
functions and clinical conditions, MS and proteomics will have
an even greater impact on biology and medicine than they have
had so far. MS-based proteomics interfaces particularly well
with cell biological studies of specific protein functions.
This success is built on the proven potential of MS techniques
to rapidly identify almost any protein, to analyze that protein
for the presence of post-translational modifications, and to
determine how and with what other biomolecules that protein
interacts [4]. The shift in focus from the analysis of selected
isolated proteins to proteome-wide analyses poses as yet unmet
challenges, as follows.
Proteomic MS, unlike protein MS, often collects large
amounts of data in the absence of hypotheses concerning
specific proteins or activities. Successful proteomics
experiments need to be designed in such a way that they can
take advantage of the power of statistics for interpretation. To
achieve this goal, carefully controlled repeat studies and the
generation of models describing the source, magnitude and
distribution of error will be essential. MS-based proteomics
results in large amounts of data. Data collection at a volume
and quality that is consistent with the use of statistical
methods is a significant limitation of proteomics today. In a
typical LC-MS/MS experiment, approximately 1000 CID spectra can
be acquired per hour, so it would take a long time to analyze
complete proteomes. High-throughput collection of consistently
high-quality data remains a challenge in proteomics. The
analysis and interpretation of the enormous volumes of proteomic
data is also a challenge. In order to compare data from
different labs, proteomic tools in the areas of data analysis,
data storage, visualization and communication are critical. The
publication of the large data sets generated by proteomics
experiments and of the information contained therein poses a
significant challenge [4].
I agree with Patterson et al that three challenges need to
be addressed in order for proteomics to have a substantial
impact on the systems biology model. The first challenge is the
enormous complexity of the proteome. The second challenge
is the limited throughput of today's proteomic platforms:
iterative, systematic measurements on differentially perturbed
systems demand a sample throughput that is not matched by
current proteomic platforms. The third challenge is the lack of
a general technique for the absolute quantification of proteins,
which would eliminate the need for a reference sample [3].
V. CONCLUSION & PERSPECTIVE
MS-based proteomics is an essential component of systems
biology research because proteins are rich in information that
has turned out to be extremely valuable for the description of
biological processes. This includes protein abundances, linkage
maps to other proteins or other types of biomolecules
including DNA and lipids, activities, modification states,
subcellular location and more. Unfortunately, with the
exception of quantitative protein profiles and protein-protein
interactions (discussed above), none of these properties can
currently be measured systematically, quantitatively and with
high throughput [7].
According to Patterson et al, in the short term MS-based
proteomics can also be expected to provide partial data sets of
sufficient quality, density and information content to provide
the basis for generating sophisticated models of biological
processes that will be able to simulate system properties such
as adaptation and robustness, which may not be apparent from
the analysis of isolated elements of a system [3].
In the long term, researchers expect that combining different
genomic and proteomic results obtained from the same biological
system will substantially increase understanding of complex
biological processes. For instance, mRNA expression profiles
and protein expression profiles seem to be largely
complementary and therefore contribute to a more refined
description of the system than each observation by itself is
able to provide [6]. Ideker et al stated that the ultimate goal
of systems biology is the integration of data to represent and
simulate the physiology of the cell. More specifically, systems
biology based on diverse and high-quality proteomic data will
define functional biological modules, reveal previously
unrecognized connections between biochemical processes and
modules, and generate new hypotheses that can be tested by the
targeted generation of more proteomic data [8].
REFERENCES
[1] A. Pandey, et al "Proteomics to study genes and genomes" Nature, Vol.
405, 15 June 2000, pp. 837-846
[2] M. Tyers, et al "From genomics to proteomics" Nature, Vol. 422, 13
March 2003, pp. 193-196
[3] S. Patterson, et al "Proteomics: the first decade and beyond" Nature
Genetics supplement, Vol. 33, March 2003, pp. 311-321
[4] R. Aebersold, et al "Mass spectrometry-based proteomics" Nature, Vol.
422, 13 March 2003, pp. 198-207
[5] S. Gygi, et al "Quantitative analysis of complex protein mixtures using
isotope-coded affinity tags" Nature Biotechnology, Vol. 17, October
1999, pp. 994-999
[6] M. Mann, et al "Analysis of proteins and proteomes by mass
spectrometry" Annu. Rev. Biochem., Vol. 70, 2001, pp. 437-473
[7] T. Ideker, et al "Integrated genomic and proteomic analyses of a
systematically perturbed metabolic network" Science, Vol. 292, 2001,
pp. 929-934
[8] T. Ideker, et al "A new approach to decoding life: systems biology"
Annu. Rev. Genomics Hum. Genet., Vol. 2, 2001, pp. 343-372