A Proteomic Approach to an Analysis of the Virion Structure of the Marine Bacteriophage S-PM2

Julia E. Jackson; Konstantinos Thalassinos; Susan E. Slade; Martha R. Clokie; Nicholas H. Mann; James H. Scrivens. University of Warwick, Coventry, United Kingdom.

OVERVIEW

Purpose
To undertake a holistic proteomic study of a novel cyanobacterial virus (S-PM2). In parallel, the well-characterised proteins of the Escherichia coli virus T4 were identified and compared to their distantly related “homologues” in S-PM2.

Methods
The genomic sequence of S-PM2 was translated in all six reading frames (Expasy Translate Tool) and genes were predicted with Glimmer and GeneMarkS; the output was converted to a Fasta format suitable for database searching. Mass spectrometric analysis and subsequent protein identification were achieved for both T4 and S-PM2 using both MALDI-MS and LC-ESI-MS/MS techniques.

Results
We have successfully identified 19 T4 head, neck, baseplate, tail, and tail fibre proteins using our methodology. Over 65% of the proposed structural proteins from S-PM2 have been characterised. We have identified 24 Open Reading Frames from S-PM2 encoding structural proteins, of which a substantial number show no similarity to any other phage, including the tail fibre proteins in S-PM2 which confer host specificity. In total, 24 structural proteins were identified from the cyanophage S-PM2 and 19 from the bacteriophage T4; they are shown in Tables 1 and 2 and represented diagrammatically in Figure 4. In addition, we are very confident that we have identified the unique tail fibres from S-PM2, which confer host specificity and are encoded by ORFs 174 and 176.

Table 1. Structural proteins predicted to be in S-PM2 on the basis of similarity to those found in T4. An asterisk (*) marks proteins that we have not identified using current methodologies.

| S-PM2 ORF | T4 name | T4 Function/domain | Identified in S-PM2 | Identified in T4 |
|---|---|---|---|---|
| 108 | gp23 | major capsid protein | Yes | Yes |
| 107 | gp22 | prohead core protein | Yes | No |
| 110 | gp3 | tail completion protein | Yes | No |
| 95 | gp15 | proximal tail sheath stabilizer | Yes | Yes |
| 102 | gp18 | contractile tail sheath protein | Yes | Yes |
| 103 | gp19 | tail tube protein | Yes | Yes |
| 104 | gp20 | portal vertex protein | Yes | Yes |
| 93 | gp13 | neck protein | Yes | Yes |
| 94* | gp14 | neck protein | No | No |
| 83 | gp8 | baseplate wedge subunit | Yes | Yes |
| 80 | gp6 | baseplate wedge subunit | Yes | No |
| 79* | gp25 | baseplate wedge subunit | No | No |
| 201* | gp53 | baseplate wedge component | No | No |
| 211 | gp5 | baseplate hub subunit and tail lysozyme | Yes | Yes |
| 206* | gp26 | baseplate hub subunit | No | No |
| 202 | gp48 | baseplate, tail tube associated | Yes | Yes |

INTRODUCTION

This study aims to identify the structural proteins of S-PM2 where little or no homology exists with known DNA sequences. The work began at the nucleic acid level, predicting genes and their products; mass spectrometric protein identification then generated information that can be used to annotate the genome (see Figure 3). A mass spectrometry-based proteomic approach was taken to identify gel-resolved, purified phage proteins from both S-PM2 and T4. The amino acid sequences obtained from the identified S-PM2 proteins were compared with the predicted proteins generated by a combination of the three programs. The study then returned to the genome, annotating the identified genes with their newly identified function.

Figure 2. BLAST homology search of cyanophage S-PM2 genes showing similarities to genes in other organisms. [Pie chart; categories: very similar to T4; similarity to other phage or bacteria apart from T4; similarity to hypothetical proteins from uncharacterised genomes; unknown. Segment values as printed: 14%, 5%, 56%, 25%.]

MATERIALS AND METHODS

Protein preparation and identification
S-PM2 and T4 virus particles were purified using a CsCl gradient and the proteins solubilised in Laemmli buffer prior to resolution on a 1D SDS-PAGE gel, stained with Coomassie G-250. Protein bands were excised and processed using a MassPrep robotic protein handling system (Waters Micromass MS Technologies, U.K.). Protein samples were destained, reduced, alkylated with iodoacetamide and digested with trypsin, and the resultant peptides were extracted according to standard protocols described by the supplier.

The tryptic peptides were characterised by matrix-assisted laser desorption ionisation (MALDI) MS on a M@LDI-LR instrument (Waters Micromass MS Technologies, U.K.). The tryptic extract was mixed with matrix (alpha-cyano-4-hydroxycinnamic acid) prior to spotting on the target plate. An external calibration was performed over the mass/charge (m/z) range 800-3000, and adrenocorticotropic hormone (ACTH) fragment 18-39 was used to correct for calibration drift.

The tryptic extracts were also analysed by nano-LC-ESI-MS/MS on a Q-Tof Ultima Global with in-line CapLC system (Waters Micromass MS Technologies, U.K.). The tryptic extract was desalted using an in-line C18 precolumn cartridge (Dionex, U.S.A.) and the peptides were further resolved on a 75 µm C18 PepMap column (Dionex, U.S.A.) using an increasing acetonitrile concentration gradient.

Construction of the protein database
The raw genomic sequence of S-PM2 was translated in all six reading frames using the Translate tool from the Expasy web server (http://ca.expasy.org/). Perl scripts transformed the output into a Fasta-formatted database. Open Reading Frames (ORFs) with a molecular weight greater than 300 Da were included. Genes were also predicted using GeneMarkS (http://opal.biology.gatech.edu/GeneMark/) and Glimmer (http://www.tigr.org/software/glimmer/). Again, the output was converted to Fasta-formatted databases by Perl scripts, and all three databases were added to SwissProt.
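The six-frame translation and Fasta conversion described under Construction of the protein database can be sketched in Python as follows. This is a minimal illustration, not the Perl scripts or the Expasy Translate tool used in the study. It assumes that an ORF is any stop-to-stop stretch of a translated frame, that the standard genetic code applies, and that the 300 Da cutoff refers to average molecular weight. All function names and the `SPM2` header prefix are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Average residue masses in Da; one water (18.015 Da) is added per chain.
AVG_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unrecognised codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def average_mass(protein):
    """Average molecular weight in Da ('X' residues contribute nothing)."""
    return sum(AVG_MASS.get(aa, 0.0) for aa in protein) + WATER


def six_frame_orfs(dna, min_mass=300.0):
    """Yield (frame, index, sequence) for stop-to-stop ORFs above min_mass."""
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for n, segment in enumerate(protein.split("*")):
                if segment and average_mass(segment) > min_mass:
                    yield f"{strand}{offset + 1}", n, segment


def to_fasta(records, name="SPM2", width=60):
    """Format ORF records as Fasta text with wrapped sequence lines."""
    lines = []
    for frame, n, seq in records:
        lines.append(f">{name}_frame{frame}_orf{n}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_fasta(six_frame_orfs("ATGTGGTGGTAA")))
```

A low mass cutoff keeps almost every stop-to-stop fragment, so the resulting database is large but does not depend on any gene model. That property matters when a genome shows little homology to known sequences.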
Figure 1. Electron micrographs of T4 (left) and the distantly related S-PM2 phage (right).

In contrast to the well-characterised T4, S-PM2 (Figure 1, right), a cyanophage that infects the marine bacterium Synechococcus, was first isolated in 1993 (Wilson et al.). Its impact on natural populations of Synechococcus is still not known, but it is thought to have a significant effect. The genome of S-PM2 has recently been sequenced (Millard et al., Mann et al., 2003 and 2004 in prep.) and shows little homology to other known viral genomes, which makes identification of viral proteins more complicated; see Figure 2. Consequently, a combined bioinformatics and proteomics approach was undertaken to solve this complex problem.

RESULTS

S-PM2 Gene Prediction Results
The Expasy Translate Tool predicted over 3700 ORFs from the S-PM2 genome. GeneMarkS predicted 217 proteins and Glimmer 202 for S-PM2, of which 189 are almost identical in sequence. The Glimmer prediction contains 13 unique proteins while the GeneMarkS prediction contains 28. Thus, using a combination of gene prediction tools, we propose that a total of 239 ORFs are encoded by the S-PM2 genome, of which approximately 37 are believed to encode structural proteins.

We have identified the unique tail fibres from S-PM2, which have no similarity to any other phage currently under study. Tail fibres have been shown to have unusual folding patterns (trimeric beta-helix fibres). BetaWrap (http://betawrap.lcs.mit.edu/) predicts such folds for ORFs 174 and 176. In T4 and KVP40 (a T4-like, broad-host-range vibriophage) tail fibres are encoded by the negative strand. ORFs 174 and 176 are likewise encoded by the negative strand in S-PM2. Furthermore, the proteins encoded by ORFs 174 and 176 in S-PM2 are of similar molecular mass to proteins gp37 and gp34 from T4.

Table 2.

| S-PM2 ORF | Equivalent T4 protein | T4 Function/domain | Hypothesised function |
|---|---|---|---|
| 82 | - | - | baseplate |
| 86 | - | 2 repeats indicative of folding | baseplate |
| 87 | - | no GTOP info | baseplate |
| 89 | - | fibrinogen domain | wac |
| 90 | - | α-helices, pentraxin domain | head |
| 91 | - | - | baseplate |
| 146 | - | - | head |
| 174 | gp34 | - | long tail fibre |
| 176 | gp37 | 3x repeat | long tail fibre |
| 221 | - | - | baseplate |
| 223 | - | protein kinase | baseplate |
| 225 | gp12 | repeats | short tail fibre |

CONCLUSIONS

A combination of genomic, proteomic and bioinformatic approaches has proved highly successful in the identification of proteins from a novel virus that shows little similarity to the distantly related coliphage T4. Over 50% of the structural proteins from T4 have been identified, including the host-specific tail fibres. During the identification of S-PM2 proteins, no T4 sequences were identified and, conversely, no S-PM2 sequences were identified during T4 protein analysis. This indicates that these “homologous” proteins are in fact quite dissimilar at the amino acid level.

A small number of hydrophobic proteins have yet to be identified from either the T4 or S-PM2 samples. We propose to utilise alternate methodologies, including intact protein separation by 2D liquid chromatography, to identify the proteins that remain elusive to our gel-resolved sample preparation methods.

Figure 3. Overview of a holistic proteomic study. [Flow diagram; recoverable labels: protein extract; protein separation (1D / 2D gels); enzymatic digestion using trypsin; MALDI, ESI MS; CapLC-ESI-MS/MS; acquire data; predict proteins from genome; create databases of predicted proteins; use existing database; database searching software; use MS and MS/MS data to search against predicted databases; protein identification; link back to genome.]

ProteinLynx Global Server 2.0 (Waters Micromass MS Technologies, U.K.) was used to interrogate the data obtained from both MALDI-MS and LC-ESI-MS/MS experiments. The genome was annotated accordingly upon confirmation that a protein product had been translated from a specified region of S-PM2 nucleic acid.
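The principle behind searching MALDI-MS data against the predicted-protein databases can be sketched as a peptide mass fingerprint match. The Python below is an illustration only and is not the algorithm used by ProteinLynx Global Server. It assumes full tryptic cleavage after K or R (but not before P), carbamidomethylated cysteine from the iodoacetamide step, monoisotopic [M+H]+ values, and the m/z 800-3000 window of the external calibration. The 50 ppm tolerance and all function names are hypothetical.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # mass added to each Cys by iodoacetamide


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ of a peptide with carbamidomethylated cysteines."""
    residues = sum(MONO[aa] for aa in peptide)
    return residues + peptide.count("C") * CARBAMIDOMETHYL + WATER + PROTON


def match_peaks(observed, protein, tol_ppm=50.0, mz_range=(800.0, 3000.0)):
    """Return (observed m/z, peptide) pairs that agree within tol_ppm."""
    low, high = mz_range
    theoretical = [(peptide_mh(p), p) for p in tryptic_digest(protein)]
    theoretical = [(m, p) for m, p in theoretical if low <= m <= high]
    matches = []
    for obs in observed:
        for mass, peptide in theoretical:
            if abs(obs - mass) / mass * 1e6 <= tol_ppm:
                matches.append((obs, peptide))
    return matches


if __name__ == "__main__":
    toy_protein = "GGKWWWWWRAAK"  # made-up sequence, for illustration only
    peak = peptide_mh("WWWWWR")
    print(match_peaks([peak, 1500.0], toy_protein))
```

Under this scheme a predicted ORF is supported when several observed peptide masses match its theoretical digest. MS/MS data add sequence-level evidence to that match.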
T4 is a virus (phage) that infects the enteric bacterium Escherichia coli; see Figure 1 (left). Extensive studies have fully characterised the genes and proteins involved in host infection, phage DNA replication and phage protein translation prior to the self-assembly of mature viral particles for release from the cell.

Additional evidence for the nature of the proteins from ORFs 174 and 176 has been obtained through structural studies of their homologues. We have successfully identified the corresponding tail fibre proteins gp34 (anchors tail fibre to baseplate) and gp37 (involved in host attachment) in our T4 samples. Over 65% of the structural proteins expected from S-PM2 have now been identified.

Figure 4. A diagram of the predicted structural proteins in S-PM2. The position of these proteins in T4 has been re-drawn using information from Miller et al. (2003) and Leiman et al. (2004). Solid areas represent proteins that we detected by mass spectrometry; hatched areas represent proteins that we did not detect.

Table 2. Structural proteins identified in S-PM2 that are not present in NCBI or T4-like phage database with suggested function.

The authors have submitted the study presented here to the Journal of Molecular Biology for publication, entitled “Mass spectrometry and bioinformatic analysis of structural proteins in the marine cyanovirus S-PM2 and the enteric coliphage T4”.

REFERENCES

Leiman, P. G., Chipman, P. R., Kostyuchenko, V. A., Mesyanzhinov, V. V. and Rossmann, M. G. (2004). Three-Dimensional Rearrangement of Proteins in the Tail of Bacteriophage T4 on Infection of Its Host. Cell 118: 419-429.

Mann, N. H., Cook, A., Millard, A., Bailey, S. and Clokie, M. (2003). Marine ecosystems: Bacterial photosynthesis genes in a virus. Nature 424: 741.

Mann, N. H. et al. (2005). The genome of S-PM2, a “photosynthetic” T4-type bacteriophage that infects marine Synechococcus strains. Journal of Bacteriology 187 (9): 3188-3200.

Millard, A., Clokie, M., Shub, D. A. and Mann, N. H. (2004). Genetic organization of the psbAD region in phages infecting marine Synechococcus strains. Proceedings of the National Academy of Sciences 101 (30): 11007-11012.

Wilson, W. H., Joint, I. R., Carr, N. G. et al. (1993). Isolation and molecular characterization of five marine cyanophages propagated on Synechococcus sp. strain WH7803. Appl. Environ.