MATERIALS AND METHODS OVERVIEW

A Proteomic Approach to an Analysis of the Virion Structure of the Marine Bacteriophage S-PM2.
Julia E. Jackson; Konstantinos Thalassinos; Susan E. Slade; Martha R. Clokie; Nicholas H. Mann; James H. Scrivens.
University of Warwick, Coventry, United Kingdom.
OVERVIEW
[Figure 2 pie chart values: 14%; 5%]
Methods
Proteins were predicted from the genomic sequence of S-PM2 using three programs (the Expasy
Translate Tool for translation in all six reading frames, plus the gene prediction programs
Glimmer and GeneMarkS) and the output was converted to a Fasta format suitable for database
searching.
[Figure 2 pie chart label: Unknown]
Construction of the protein database
In total 24 structural proteins were identified from the cyanophage S-PM2 and 19 from the
bacteriophage T4 shown in Tables 1 and 2 and represented diagrammatically in Figure 4.
[Figure 2 pie chart label: Similarity to other phage or bacteria apart from T4]
The raw genomic sequence of S-PM2 was translated in all six reading frames using the
Translate tool from the Expasy web server (http://ca.expasy.org/). Perl scripts transformed the
output to a Fasta formatted database. Open Reading Frames (ORFs) with a molecular weight
greater than 300 Da were included.
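The translation and filtering step can be sketched as follows. This is a minimal Python illustration, not the Perl scripts used in the study: it assumes that an ORF is any stop-to-stop stretch of a reading frame, that molecular weight is computed from average residue masses, and that the function names (`six_frame_orfs`, `to_fasta`) and the Fasta header layout are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Average residue masses in Da; one water is added per polypeptide chain.
AVERAGE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def average_mass(protein):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_MASS[aa] for aa in protein) + WATER


def six_frame_orfs(dna, min_mass=300.0):
    """Yield (strand, frame, protein) for stop-to-stop ORFs above min_mass."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            # Codons containing ambiguous bases are translated as "X".
            protein = "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                              for i in range(frame, len(seq) - 2, 3))
            for segment in protein.split("*"):
                if segment and "X" not in segment and average_mass(segment) > min_mass:
                    yield strand, frame + 1, segment


def to_fasta(orfs):
    """Format (strand, frame, protein) tuples as Fasta text."""
    lines = []
    for n, (strand, frame, protein) in enumerate(orfs, start=1):
        lines.append(f">orf{n} strand={strand} frame={frame}")
        lines.append(protein)
    return "\n".join(lines)
```

For example, `print(to_fasta(six_frame_orfs("ATGGCTTGGTAA")))` writes one Fasta record per retained ORF. A 300 Da threshold keeps almost every translated fragment of three or more residues, which is why six-frame translation yields far more candidate ORFs than a dedicated gene predictor.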
[Figure 2 pie chart value: 56%]
Purpose
To undertake a holistic proteomic study of a novel cyanobacterial virus (S-PM2).
In parallel, the well-characterised proteins of the Escherichia coli virus T4 were identified and
compared to their distantly related “homologues” in S-PM2.
MATERIALS AND METHODS
Protein identification
[Figure 2 pie chart labels: Very similar to T4; Similarity to hypothetical proteins from uncharacterised genomes; 25%]
Protein preparation and identification
Figure 2. BLAST homology search of cyanophage S-PM2 genes showing similarities to
genes in other organisms.
Mass spectrometric analysis and subsequent protein identification were achieved for both T4
and S-PM2 using both MALDI-MS and LC-ESI-MS/MS techniques.
Results
We have successfully identified 19 T4 head, neck, baseplate, tail, and tail fibre proteins using
our methodology.
Over 65% of the proposed structural proteins from S-PM2 have been characterised.
We have identified 24 Open Reading Frames from S-PM2 encoding structural proteins of
which a substantial number show no similarity to any other phage, including the tail fibre
proteins in S-PM2 which confer host specificity.
Genes were predicted using GeneMarkS (http://opal.biology.gatech.edu/GeneMark/) and
Glimmer (http://www.tigr.org/software/glimmer/). Again the output was converted to a Fasta
formatted database by use of Perl scripts and all three databases were added to SwissProt.
This study aims to identify the structural proteins of S-PM2 where little or no homology exists
with known DNA sequences. The work began at the nucleic acid level, by predicting genes and
their products, and then used a mass spectrometric approach for protein identification; the
information generated can be used to annotate the genome (see Figure 3).
A mass spectrometry-based proteomic approach was taken in the identification of gel-resolved, purified phage proteins from both S-PM2 and T4.
The amino acid sequences obtained from the identified S-PM2 proteins were compared with
the predicted proteins generated by a combination of the three programs.
The focus of the study then returned to the genome, annotating the identified genes with
their newly identified functions.
INTRODUCTION
S-PM2 and T4 virus particles were purified using a CsCl gradient and the proteins solubilised
in Laemmli buffer prior to resolution on a 1D SDS-PAGE gel, stained with Coomassie G-250.
Protein bands were excised and processed using a MassPrep robotic protein handling
system (Waters Micromass MS Technologies, U.K.). Protein samples were destained,
reduced, alkylated with iodoacetamide, digested with trypsin and the resultant peptides
extracted according to standard protocols described by the supplier.
The tryptic peptides were characterised by means of matrix assisted laser desorption
ionisation MS on a M@LDI-LR instrument (Waters Micromass MS Technologies, U.K.). The
tryptic extract was mixed with matrix (alpha-cyano-4-hydroxycinnamic acid) prior to spotting
on the target plate. An external calibration was performed over the mass / charge (m/z) range
of 800-3000, and adrenocorticotropic hormone (ACTH) fragment 18-39 was used to correct
for calibration drift.
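The peptide masses expected in this m/z window can be estimated by digesting a predicted protein in silico. The sketch below is an illustration under stated assumptions rather than the search performed by the instrument software: it assumes the common trypsin rule (cleavage after K or R, except before P), no missed cleavages, singly protonated ions, and carbamidomethylated cysteine (from the iodoacetamide alkylation). The function names are hypothetical.

```python
# Monoisotopic residue masses in Da.
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # added to each cysteine after iodoacetamide alkylation


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mz(peptide, alkylated=True):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    mass = sum(MONO_MASS[aa] for aa in peptide) + WATER
    if alkylated:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass + PROTON


def fingerprint(protein, low=800.0, high=3000.0):
    """Return (peptide, m/z) pairs that fall inside the calibrated m/z range."""
    pairs = [(p, peptide_mz(p)) for p in tryptic_peptides(protein)]
    return [(p, mz) for p, mz in pairs if low <= mz <= high]
```

Calling `fingerprint(sequence)` on each entry of a predicted-protein database gives the theoretical mass list against which an observed MALDI spectrum could be compared.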
In addition, we are very confident that we have identified the unique tail fibres from S-PM2
which confer host specificity and are encoded by ORFs 174 and 176.
S-PM2 ORF | T4 name | T4 Function/domain | Identified in S-PM2 | Identified in T4
108 | gp23 | major capsid protein | Yes | Yes
107 | gp22 | prohead core protein | Yes | No
110 | gp3 | tail completion protein | Yes | No
95 | gp15 | proximal tail sheath stabilizer | Yes | Yes
102 | gp18 | contractile tail sheath protein | Yes | Yes
103 | gp19 | tail tube protein | Yes | Yes
104 | gp20 | portal vertex protein | Yes | Yes
93 | gp13 | neck protein | Yes | Yes
94* | gp14 | neck protein | No | No
83 | gp8 | baseplate wedge subunit | Yes | Yes
80 | gp6 | baseplate wedge subunit | Yes | No
79* | gp25 | baseplate wedge subunit | No | No
201* | gp53 | baseplate wedge component | No | No
211 | gp5 | baseplate hub subunit and tail lysozyme | Yes | Yes
206* | gp26 | base plate hub subunit | No | No
202 | gp48 | base plate, tail tube associated | Yes | Yes
Experimental
The tryptic extracts were also analysed by means of nano-LC-ESI-MS/MS on a Q-Tof Ultima
Global with in-line CapLC system (Waters Micromass MS Technologies, U.K.). The tryptic
extract was desalted using an in-line C18 precolumn cartridge (Dionex, U.S.A.) and the
peptides further resolved on a 75 µm C18 PepMap column (Dionex, U.S.A.) using an
increasing acetonitrile concentration gradient.
Table 1. Structural proteins predicted to be in S-PM2 on the basis of similarity to those
found in T4. A * designates that we have not identified them using current methodologies.
[Figure 3 workflow labels: Protein extract; Proteins Separation (1D / 2D gels); Enzymatic Digestion using Trypsin; Predict Proteins from genome; Create Databases of predicted proteins; Link back to genome]
In contrast, S-PM2 (Figure 1, right), a cyanophage that infects the marine bacterium
Synechococcus, was first isolated in 1993 (Wilson et al.) and, although its impact on natural
populations of Synechococcus is still not known, it is thought to have a significant effect. The
genome of S-PM2 has recently been sequenced (Millard et al., Mann et al., 2003 and 2004 in
prep.) and shows little homology to other known viral genomes thus making identification of
viral proteins more complicated; see Figure 2. Consequently, a combined bioinformatics and
proteomics approach was undertaken to solve this complex problem.
[Figure 3 workflow labels: Database Search (Use existing database searching software); Protein Identification]
Figure 3. Overview of a holistic proteomic study.
A combination of genomic, proteomic and bioinformatic approaches has proved highly
successful in the identification of proteins from a novel virus that shows little similarity to the
distantly related coliphage T4.
Over 50% of the structural proteins from T4 have been identified including the host-specific tail
fibres.
During the identification of S-PM2 proteins, no T4 sequences were identified and, conversely,
no S-PM2 sequences were identified during T4 protein analysis. This indicates that these
“homologous” proteins are quite dissimilar at the amino acid level.
T4 Function/domain
Hypothesised function
A small number of hydrophobic proteins have yet to be identified from either the T4 or S-PM2
samples.
82 | - | - | baseplate
S-PM2 Gene Prediction Results
86 | - | 2 repeats indicative of folding | baseplate
We propose to utilise alternate methodologies, including intact protein separation by 2D liquid
chromatography, to identify the proteins that remain elusive to our gel-resolved sample
preparation methods.
The Expasy Translate Tool predicted over 3700 ORFs from the S-PM2 genome. GeneMarkS
predicted 217 proteins and Glimmer 202 for S-PM2, of which 189 are almost identical in
sequence. The Glimmer prediction contains 13 unique proteins while the GeneMarkS
contains 28.
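The overlap figures quoted above are the result of a set comparison between the two prediction outputs: 189 shared plus 13 unique gives the 202 Glimmer predictions, and 189 shared plus 28 unique gives the 217 GeneMarkS predictions. A minimal sketch of such a comparison follows. It assumes, for simplicity, that predictions are matched by exact sequence identity, whereas the source describes the shared set as "almost identical", so a real comparison would need a similarity threshold. The function name and the toy sequences are hypothetical.

```python
def compare_predictions(first, second):
    """Count proteins shared between two prediction sets and unique to each."""
    first, second = set(first), set(second)
    return {
        "shared": len(first & second),
        "only_first": len(first - second),
        "only_second": len(second - first),
    }


# Toy data standing in for the two predicted-protein lists.
glimmer = ["MAW", "MKL", "MTT"]
genemarks = ["MAW", "MKL", "MPP", "MQQ"]
summary = compare_predictions(glimmer, genemarks)
```

Here `summary["shared"] + summary["only_first"]` recovers the size of the first list, which is the same consistency check applied to the counts in the text.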
87 | - | no GTOP info | baseplate
89 | - | fibrinogen domain | wac
90 | - | α-helices, pentraxin domain | head
Thus using a combination of gene prediction tools, we propose that a total of 239 ORFs are
encoded by the S-PM2 genome, of which approximately 37 are believed to encode structural
proteins.
91 | - | - | baseplate
146 | - | - | head
RESULTS
[Figure 3 workflow labels: Acquire Data; Use MS and MS/MS data to search against predicted databases]
CONCLUSIONS
S-PM2 ORF | Equivalent T4 protein | T4 Function/domain | Hypothesised function
174 | gp34 | - | long tail fibre
176 | gp37 | 3x repeat | long tail fibre
221 | - | - | baseplate
223 | - | protein kinase | baseplate
225 | gp12 | repeats | short tail fibre
MALDI, ESI MS
Figure 1. Electron micrographs of T4 (left) and the distantly related S-PM2 phage (right).
The genome was annotated accordingly upon confirmation that a protein product had been
translated from a specified region of S-PM2 nucleic acid.
S-PM2 ORF
CapLC-ESI-MS/MS
Tail fibres have been shown to have unusual folding patterns (trimeric beta-helix fibres).
BetaWrap (http://betawrap.lcs.mit.edu/) predicts such folds for ORFs 174 and 176. In T4 and
KVP40 (a T4-like, broad-host-range vibriophage) tail fibres are encoded by the negative
strand. ORFs 174 and 176 are encoded by the negative strand in S-PM2. Furthermore, the
proteins encoded by ORFs 174 and 176 in S-PM2 are of similar molecular mass to proteins
gp37 and gp34 from T4.
We have identified the unique tail fibres from S-PM2 which have no similarity to any other
phage currently under study.
ProteinLynx Global Server 2.0 (Waters Micromass MS Technologies, U.K.) was used to
interrogate the data obtained from both MALDI-MS and LC-ESI-MS/MS experiments.
Theoretical
Additional evidence for the nature of the proteins from ORFs 174 and 176 has been obtained
through structural studies of their homologues.
Over 65% of the structural proteins expected from S-PM2 have now been identified.
Genome
T4 is a virus (phage) that infects the enteric bacterium Escherichia coli; see Figure 1 (left).
Extensive studies have fully characterised the genes and proteins involved in host infection,
phage DNA replication and phage protein translation prior to the self-assembly of mature viral
particles for release from the cell.
We have successfully identified the corresponding tail fibre proteins gp34 (anchors tail fibre to
baseplate) and gp37 (involved in host attachment) in our T4 samples.
Figure 4. A diagram of the predicted structural proteins
in S-PM2.
The position of these proteins in T4 has been re-drawn
using information from Miller et al. (2003) and Leiman
et al. (2004).
Solid areas represent proteins that we detected by
mass spectrometry, hatched areas represent proteins
that we did not detect.
Table 2. Structural proteins identified in S-PM2 that are not present in the NCBI or T4-like
phage databases, with suggested functions.
The authors have submitted the study presented here to the Journal of Molecular Biology for
publication, entitled “Mass spectrometry and bioinformatic analysis of structural proteins in the
marine cyanovirus S-PM2 and the enteric coliphage T4”.
REFERENCES
Leiman, P. G., Chipman, P. R., Kostyuchenko, V. A., Mesyanzhinov, V. V., and Rossmann, M.
G. (2004). Three-Dimensional Rearrangement of Proteins in the Tail of Bacteriophage T4 on
Infection of Its Host. Cell 118: 419-429.
Mann, N.H., Cook, A., Millard, A., Bailey, S. and Clokie, M. (2003). Marine ecosystems:
Bacterial photosynthesis genes in a virus. Nature 424: 741.
Mann, N.H. et al. (2005). The genome of S-PM2, a “photosynthetic” T4-type bacteriophage
that infects marine Synechococcus strains. Journal of Bacteriology 187 (9): 3188-3200.
Millard, A., Clokie, M., Shub, D.A. and Mann, N.H. (2004). Genetic organization of the psbAD
region in phages infecting marine Synechococcus strains. Proceedings of the National
Academy of Sciences 101 (30): 11007-11012.
Wilson, W. H., Joint, I. R., Carr, N. G. et al. (1993). Isolation and molecular characterization of
five marine cyanophages propagated on Synechococcus sp. strain WH7803. Appl. Environ.