Proteomic studies of the environmentally important methanotroph Methylocella silvestris
Konstantinos Thalassinos, Vibhuti Patel, Susan Slade, Nisha Patel, Joanne
Connoly, Andrew Crombie, Colin Murrell, James Scrivens
Human genome: 20,000 – 25,000 genes 1
A single gene can give rise to many different proteins by the process of alternative splicing
In complex genes, alternative splicing can generate dozens or even hundreds of different mRNA isoforms 2
It is estimated that almost 50% of all proteins contain one or more post-translational modifications 3
1 International Human Genome Sequencing Consortium (2004). Finishing the euchromatic sequence of the human genome. Nature 431 (7011), 931–945.
2 Missler, M. and Sudhof, T. C. (1998). Neurexins: three genes and 1001 products. Trends Genet. 14, 20–26.
3 Apweiler, R., Hermjakob, H. and Sharon, N. (1999). On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta 1473, 4–8.
Proteomics is the global characterisation of protein products (sequence, post-translational modifications, protein-protein interactions) expressed by a given genome at a specific point in time
Unlike the genome, the proteome is a dynamic entity
Biological
Dynamic range
4 orders of magnitude in cells and 10 orders of magnitude in plasma
Post-translational modifications
Alternative splicing
Technical
Sample requirements
Complex, time-consuming sample preparation
Days of experimental time
Pedrioli, P. G., et al., (2004). A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol. 22, 1459–1466.
Profiling
Identify which proteins are present in a sample
Differential
Probe for changes in protein expression levels between a number of environmental states
Identify the presence and map the position of all post-translational modifications present on each protein
A mass spectrometer separates ions according to their mass-to-charge ( m/z ) ratio
Direct Sample Introduction
Liquid Chromatography
ESI
MALDI
Quadrupole
TOF
Ion Trap
FT
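The m/z relationship stated above can be illustrated with a minimal sketch. It assumes positive-mode ionisation by protonation (as is typical for peptides in ESI and MALDI); the function name and the example mass of 1500 Da are illustrative only and do not come from the slides.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz(neutral_mass: float, charge: int) -> float:
    """Return the m/z of the ion [M + zH]z+ for a neutral monoisotopic mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same hypothetical 1500 Da peptide appears at different m/z values
# depending on how many protons it carries.
for z in (1, 2, 3):
    print(f"z={z}: m/z = {mz(1500.0, z):.4f}")
```

Multiply charged ions, which are common in ESI, therefore appear at lower m/z than the singly charged ion of the same molecule.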
MS/MS fragment ion nomenclature
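As a hedged illustration of the nomenclature, the sketch below computes singly charged b ions (which retain the N-terminus) and y ions (which retain the C-terminus) for an unmodified peptide. The function name and the example peptide PEPTIDE are illustrative assumptions; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide: str):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i-1] is the b_i ion and y[i-1] is the y_i ion, for i = 1 .. n-1.
    """
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = fragment_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Complementary pairs (b_i and y_(n-i)) always sum to the neutral peptide mass plus two protons, which is a useful sanity check on a spectrum annotation.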
Limitations of Current Database Search Programs
Finding a peptide match after a database search is easy, but knowing whether it is correct is not
It is almost always possible to match an MS/MS spectrum to a peptide in the database
Incorrect matches often (but not always) result from use of low quality peptide MS/MS data to search the database
Actual peptide sequence is not in the database searched (under the search conditions used)
Probability of a false positive assignment is much higher for proteins identified with only one peptide (known as one-hit wonders)
According to publishing guidelines, more than one peptide per protein is required
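The "more than one peptide per protein" rule can be sketched as a simple filter. This is a minimal illustration, not the search software's own logic; the input layout (protein, peptide pairs) and all identifiers are hypothetical.

```python
from collections import defaultdict


def proteins_with_min_peptides(matches, min_peptides=2):
    """Keep proteins supported by at least `min_peptides` distinct peptides.

    `matches` is an iterable of (protein_id, peptide_sequence) pairs.
    """
    peptides_by_protein = defaultdict(set)
    for protein, peptide in matches:
        peptides_by_protein[protein].add(peptide)
    return {
        protein: sorted(peptides)
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }


# Hypothetical search output: protB is a one-hit wonder and is dropped.
matches = [
    ("protA", "LVNELTEFAK"),
    ("protA", "YLYEIAR"),
    ("protA", "YLYEIAR"),  # repeated spectrum, same peptide
    ("protB", "AEFVEVTK"),
]
print(proteins_with_min_peptides(matches))
```

Counting distinct peptide sequences rather than spectra avoids treating repeated observations of one peptide as independent evidence.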
Acidophilic aerobic methanotroph
Methanotrophs use methane as their sole carbon and energy source
Key position in the global methane cycle
Effects of multi-carbon substrates on activity
Soluble methane monooxygenase (sMMO)
The recently discovered genus Methylocella is capable of utilising certain multi-carbon compounds as well as methane
To measure changes in protein expression of Methylocella silvestris under varying growth conditions.
Relate changes in proteome to important biological pathways.
Compare existing methods and new approaches using the same instrumentation and software without bias.
[Figure: chemical structures of the growth substrates methane and acetate]
M. silvestris genome recently published
Predict all open reading frames
Use custom Perl scripts to create appropriate FASTA formatted database
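The slides mention custom Perl scripts, which are not shown. As a stand-in, the following Python sketch illustrates the general idea of writing predicted open reading frames to a FASTA-formatted database. The record layout, identifiers and sequence are hypothetical.

```python
def to_fasta(records, width=60):
    """Format (identifier, description, sequence) tuples as FASTA text.

    Each entry starts with a '>' header line, followed by the sequence
    wrapped at `width` residues per line.
    """
    lines = []
    for identifier, description, sequence in records:
        lines.append(f">{identifier} {description}".rstrip())
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical predicted ORF (75 residues, so it wraps onto two lines).
records = [("orf0001", "hypothetical protein", "MKV" * 25)]
print(to_fasta(records), end="")
```

Search engines generally read the text after '>' up to the first space as the protein accession, so keeping identifiers free of spaces is a sensible precaution.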
SDS-PAGE gels
No quantitation
iTRAQ
Uses 4 isobaric tags for quantitation
MS^E (Identity^E)
Uses an internal standard for quantitation
Database search results saved in MySQL database
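Storing search results in a relational database makes cross-method comparisons a matter of simple queries. The sketch below uses SQLite from the Python standard library as a stand-in for MySQL so that it runs without a server; the table name, columns and rows are assumptions for illustration and do not reflect the schema actually used.

```python
import sqlite3


def store_results(conn, rows):
    """Create a (hypothetical) identification table and insert result rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS identification ("
        "method TEXT, protein TEXT, peptide_count INTEGER, coverage REAL)"
    )
    conn.executemany("INSERT INTO identification VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def proteins_by_method(conn, method, min_peptides=1):
    """Return sorted protein IDs found by `method` with enough peptides."""
    cursor = conn.execute(
        "SELECT protein FROM identification "
        "WHERE method = ? AND peptide_count >= ? ORDER BY protein",
        (method, min_peptides),
    )
    return [row[0] for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
store_results(conn, [
    ("iTRAQ", "protA", 3, 12.0),      # made-up example rows
    ("iTRAQ", "protB", 1, 4.5),
    ("IdentityE", "protA", 9, 48.0),
])
print(proteins_by_method(conn, "iTRAQ", min_peptides=2))
```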
[Figure: proteins identified by gels, iTRAQ and Identity^E. Panels: including single-peptide identifications; two peptides or more]
Replicate analyses of iTRAQ-labelled samples
Log(e) ratio plot of common proteins expressed under methane and acetate growth.
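The values behind such a plot can be sketched as follows. The direction of the ratio (acetate over methane), the function name and the example abundances are assumptions made for illustration; the slides do not state them.

```python
import math


def log_e_ratios(methane, acetate):
    """Natural-log abundance ratio (acetate / methane) for shared proteins.

    Both arguments map protein ID -> positive abundance estimate.
    Proteins seen under only one condition are skipped.
    """
    common = methane.keys() & acetate.keys()
    return {p: math.log(acetate[p] / methane[p]) for p in sorted(common)}


# Made-up abundances: protA doubles, protB halves, protC is not shared.
methane = {"protA": 10.0, "protB": 8.0, "protC": 5.0}
acetate = {"protA": 20.0, "protB": 4.0}
for protein, ratio in log_e_ratios(methane, acetate).items():
    print(f"{protein}: ln ratio = {ratio:+.3f}")
```

On a log scale, a two-fold increase and a two-fold decrease are symmetric about zero, which is why log ratios are preferred over raw ratios for this kind of plot.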
Metric                                          Gels          iTRAQ            Identity^E
Protein loading                                 14 µg         800 – 1000 µg    0.5 – 0.75 µg
Total experimental time                         4 days        6 days           Less than 3 days
Total instrument time                           30-40 hours   30-40 hours      6 hours per sample
Number of proteins identified                   95            171              399
Average number of peptides per protein          3.7           2                10
Average sequence coverage                       17.2 %        11.5 %           50.4 %
Dynamic range covered by relative quantitation  -             3                4
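Sequence coverage, one of the metrics compared above, is the percentage of a protein's residues covered by at least one identified peptide. A minimal sketch, assuming exact substring matches and using a made-up sequence:

```python
def sequence_coverage(protein: str, peptides) -> float:
    """Percentage of residues in `protein` covered by any of `peptides`."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue of this occurrence as covered.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Overlapping peptides are counted once per residue, not twice.
print(sequence_coverage("ABCDEFGHIJ", ["ABC", "CDE"]))
```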
All the methodologies employed provided good profiling coverage of the respective proteome.
iTRAQ and Identity^E both provide information on protein identity and changes in expression.
Identity^E
More confident protein identifications
Lower protein requirements
Significantly lower instrument demands
Significantly reduced sample preparation time
Provides a stand-alone quantitative estimate of the proteins present in any given sample
A comparison of labelling and label-free mass spectrometry-based proteomics approaches.
Konstantinos Thalassinos, Vibhuti Patel, Susan Slade,
Nisha Patel, Joanne Connoly, Andrew Crombie, Colin
Murrell, James Scrivens.
Journal of Proteome Research
Continue the studies on the effect of growth substrate on the proteome of M. silvestris.
Relate results back to M. silvestris cell biochemistry, in particular the pathways of multi-carbon substrate assimilation
Distinct protein profiles for each substrate:
Certain proteins only expressed under methane.
Certain proteins only expressed under acetate.
Significantly lower levels of key enzyme soluble methane monooxygenase (sMMO) when grown under acetate.
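Substrate-specific profiles of this kind can be derived with set operations on the two lists of identified proteins. The function name and protein identifiers below are placeholders, not results from the study.

```python
def substrate_specific(methane_ids, acetate_ids):
    """Split identifications into methane-only, acetate-only and common sets."""
    methane, acetate = set(methane_ids), set(acetate_ids)
    return {
        "methane_only": methane - acetate,
        "acetate_only": acetate - methane,
        "common": methane & acetate,
    }


# Placeholder identifiers for illustration.
profile = substrate_specific(
    ["protA", "protB", "protC"],
    ["protB", "protC", "protD"],
)
for group, proteins in profile.items():
    print(group, sorted(proteins))
```

Absence from one list may reflect low abundance rather than no expression, so "only expressed" is best read as "only detected" unless supported by quantitation.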
Kyoto Encyclopedia of Genes and Genomes (KEGG)
Manually curated database of biological pathways.
KEGG Automatic Annotation Server (KAAS)
Functional annotation of genes by BLAST comparisons against KEGG database.
Develop a program to map the KAAS results back onto KEGG pathways http://www.genome.jp/kegg/
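One possible shape for such a program is sketched below. It assumes that the KAAS output is a tab-delimited list of gene identifiers and KO numbers (with the KO column empty for unassigned genes) and that a KO-to-pathway lookup is available separately. All identifiers in the example are invented placeholders, not real KO or pathway entries.

```python
from collections import defaultdict


def parse_kaas(text):
    """Parse 'gene<TAB>KO' lines; genes without a KO assignment are skipped."""
    gene_to_ko = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) >= 2 and parts[1]:
            gene_to_ko[parts[0]] = parts[1]
    return gene_to_ko


def genes_per_pathway(gene_to_ko, ko_to_pathways):
    """Group genes under every pathway that their KO number belongs to."""
    pathways = defaultdict(list)
    for gene, ko in sorted(gene_to_ko.items()):
        for pathway in ko_to_pathways.get(ko, []):
            pathways[pathway].append(gene)
    return dict(pathways)


# Placeholder data: 'KO_x' and 'pathway_1' style names are made up.
kaas_text = "gene1\tKO_x\ngene2\t\ngene3\tKO_y\n"
ko_to_pathways = {"KO_x": ["pathway_1"], "KO_y": ["pathway_1", "pathway_2"]}
print(genes_per_pathway(parse_kaas(kaas_text), ko_to_pathways))
```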
MudPIT
Profiling Quantitation
Proteome
Label with isobaric tags
Tryptic digest
Strong cation exchange chromatography
LC-ESI-MS/MS
Database searches
Analyse ratios of iTRAQ tags (114, 115, 116, 117)
Protein identification
Relative levels of identified proteins
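The final step of this workflow, turning tag intensities into relative protein levels, can be sketched as below. Using channel 114 as the reference and the median across peptides as the protein-level summary are illustrative assumptions, as are the intensities; the slides do not specify how ratios were combined.

```python
from statistics import median

REPORTER_CHANNELS = (114, 115, 116, 117)


def reporter_ratios(intensities, reference=114):
    """Ratio of each iTRAQ tag intensity to the reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel intensity must be positive")
    return {ch: intensities[ch] / ref for ch in REPORTER_CHANNELS}


def protein_ratios(peptide_spectra, reference=114):
    """Median per-channel ratio over all peptide spectra of one protein."""
    per_peptide = [reporter_ratios(s, reference) for s in peptide_spectra]
    return {ch: median(r[ch] for r in per_peptide) for ch in REPORTER_CHANNELS}


# Made-up tag intensities from three peptides of one protein.
spectra = [
    {114: 100.0, 115: 200.0, 116: 50.0, 117: 100.0},
    {114: 80.0, 115: 160.0, 116: 40.0, 117: 88.0},
    {114: 50.0, 115: 110.0, 116: 25.0, 117: 45.0},
]
print(protein_ratios(spectra))
```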
Waters Identity^E and Expression^E
500 ng sample loading plus an internal standard
Approximately 2 hours analysis/sample
Quantitation is based on relationship between ESI signal response and protein concentration
The average ESI signal response of the 3 most intense tryptic peptides per mole of protein is constant (CV ±10%) $
Quantify at the protein level (gross changes) or peptide level (minor fluctuations in protein expression)
$ Silva et al., Absolute Quantification of Proteins by LCMS^E, MCP vol. 5 issue 1 (2006) 144-156
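Because the average response of the three most intense peptides is roughly constant per mole, a protein's amount can be estimated by scaling against the internal standard. The sketch below is a simplified illustration of that idea, not the vendor software's implementation; function names, intensities and the 50 fmol standard amount are hypothetical.

```python
def top3_mean(intensities):
    """Mean intensity of the three most intense peptides of a protein."""
    top = sorted(intensities, reverse=True)[:3]
    if len(top) < 3:
        raise ValueError("at least three peptide intensities are required")
    return sum(top) / 3


def estimate_amount(protein_intensities, standard_intensities, standard_amount):
    """Estimate protein amount in the same units as `standard_amount`."""
    response_ratio = top3_mean(protein_intensities) / top3_mean(standard_intensities)
    return standard_amount * response_ratio


# Made-up peptide intensities; the standard is assumed spiked at 50 fmol.
protein = [300.0, 200.0, 100.0, 50.0]
standard = [600.0, 400.0, 200.0]
print(estimate_amount(protein, standard, standard_amount=50.0))
```

Proteins with fewer than three quantifiable peptides cannot be estimated this way, which the sketch signals with an error rather than a guess.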
Protein identifications common to iTRAQ and Identity^E
Proteins identified in the iTRAQ and Identity^E experiments, including one-hit wonders.