Supplementary Methods

Chimeras taking shape: potential functions of proteins
encoded by chimeric RNA transcripts
Milana Frenkel-Morgenstern1, Vincent Lacroix2, Iakes Ezkurdia1, Yishai
Levin3, Jaime Prilusky4, Angela del Pozo1, Michael Tress1, Roderic
Guigo5 and Alfonso Valencia1*
1 Structural Biology and BioComputing Program, Spanish National Cancer Research
Centre (CNIO), Madrid, 28029, Spain.
2 UMR CNRS 5558, Laboratoire de Biométrie et Biologie Evolutive, INRIA Bamboo,
Université Claude Bernard, Villeurbanne, 69100, France.
3 Mass-Spectrometry Unit, Weizmann Institute of Science, Rehovot, 76100, Israel.
4 Bioinformatics Unit, Weizmann Institute of Science, Rehovot, 76100, Israel.
5 Centre for Genomic Regulation (CRG), C/ Dr Aiguader 88, 08003, Barcelona, Spain.
*Corresponding author: Alfonso Valencia, Structural Biology and BioComputing
Program, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain.
Email: avalencia@cnio.es
Proteomics Data Availability:
The data associated with this manuscript may be downloaded from the ProteomeCommons.org
Tranche network (www.proteomecommons.org)
(Access password: chimera), using the following hashes:
Prostate cell line (HTB-81):
fWVcKoaR2QlHayG5mPyEgdrkmZQyuGEWLh5360/WMGvjwD3V5D+EagZbxk9ssGis8OrdFth3ypYSr
/j9GJvpMi9ibO4AAAAAAAACtg==
Breast cell line (HTB-22):
o3DSLLM33aGCl1heZFwEFCdaP+XAUjJE1yuEyGhp0XelqrCFirAlKyqu4Ln0IuaU5Q/6WOb7j7zymG1
X9fOBJ/Fk5xUAAAAAAAADAQ==
Ovary cell line (HTB-161):
CM9ceZc516XdmAC98xpvG+02ISpL4tGzbnCz+70f+1afp24R56R7kaYv1/9WkE/hwNHeYXY0qLcP/y
Tl4hI+j0ZngNgAAAAAAnkQTA==
These hashes can be used to verify exactly which files were published as part of this manuscript's
dataset and to check that the data have not changed since publication.
Scaffold Viewer, which can be downloaded free of charge, is required to view the spectra.
Shotgun proteomics experiments
To detect chimeric proteins, we employed ‘bottom-up’ shotgun proteomics using
2-dimensional liquid chromatography coupled with high-resolution tandem mass
spectrometry. The platform was operated in data-independent mode as described by Levin
et al. (Levin et al. 2011). The data were searched against a concatenated protein sequence
database: the human Swiss-Prot database and a list of all chimeric ESTs from ChimerDB
(Kim et al. 2010) translated in six frames.
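As an illustration of what "translated in six frames" means, the following minimal Python sketch translates a nucleotide sequence in the three forward and three reverse-complement frames using the standard genetic code. It is not the procedure or tool used in this study; the function names, the frame labels (+1 to -3) and the choice to write unknown codons as X are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame label (+1..+3, -1..-3) to its translation."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Stop codons appear as `*` in the output; whether a search database keeps the full frame or splits it at stop codons is a separate choice that this sketch does not make.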
Cell Lines and Total Proteome Extraction
Three human cancer cell lines were subjected to proteomic analysis: the MCF7 human
breast epithelial cell line derived from mammary gland adenocarcinoma (HTB-22™), the
OVCAR-3 human epithelial cell line derived from ovary (HTB-161™) and the DU-145
human epithelial carcinoma cell line derived from prostate (HTB-81™). The cells were grown in
the media indicated in Table S2. In each case, cells were harvested upon reaching
80-100% confluence (~5×10⁶ cells per 75 cm² flask). Briefly, the growth medium was
aspirated and the cells were rinsed gently with cold 1xPBS before being scraped into 1 ml
cold 1xPBS and transferred into a microfuge tube. Next, the cells were spun at 14,000g for
10 min at 4°C and the pellet was re-suspended in RIPA buffer (50 mM Tris-HCl pH 8, 150 mM
NaCl, 1% NP-40, 0.5% sodium deoxycholate and 0.1% SDS, with protease inhibitors)
at a ratio of roughly 1:7 (30 µl of cells to 210 µl of RIPA). Total protein
concentration was measured by the BCA method according to the manufacturer’s
instructions. The sample was diluted so that the final concentration was 2 µg/µl and the
total volume was at least 200 µl. Protein samples were stored at -80°C.
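The dilution step is a direct application of C1·V1 = C2·V2. The sketch below computes how much lysate and diluent would be combined to reach 2 µg/µl in 200 µl, given a measured stock concentration. The function name and the example stock concentration are hypothetical, and the diluent itself is not specified in the text.

```python
def dilution_volumes(stock_ug_per_ul, target_ug_per_ul=2.0, final_volume_ul=200.0):
    """Return (lysate_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. Raises ValueError if the stock is already more
    dilute than the target, since dilution cannot concentrate a sample.
    """
    if stock_ug_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ug_per_ul < target_ug_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    lysate_ul = target_ug_per_ul * final_volume_ul / stock_ug_per_ul
    diluent_ul = final_volume_ul - lysate_ul
    return lysate_ul, diluent_ul


# Example with an assumed BCA result of 8 µg/µl
lysate_ul, diluent_ul = dilution_volumes(8.0)
```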
Sample Preparation
Proteins in the cell lysates were reduced by addition of dithiothreitol (Sigma; 5 mM) and
incubation for 30 min at 60°C and then alkylated by addition of iodoacetamide (Sigma;
10 mM) and incubation in the dark for 30 min at 21°C. The proteins were then digested
by incubation with trypsin (Promega; Madison, WI, USA) for 16 hours at 37°C, added at
a ratio of 1:50 (w/w trypsin/protein). Digestions were stopped by the addition of 1%
trifluoroacetic acid (TFA). The samples were stored at -80°C in aliquots.
Liquid Chromatography
ULC/MS grade solvents were used for all chromatographic steps. Each sample was
loaded using split-less nano-Ultra Performance Liquid Chromatography (10kpsi
nanoAcquity; Waters, Milford, MA, USA) in high-pH/low-pH reversed phase (RP)
2-dimensional liquid chromatography mode. 15μg of digested protein from each sample
was loaded onto a C18 column (XBridge, 0.3x50mm, 5μm particles, Waters). The
following two buffers were combined: (A) 20mM ammonium formate, pH 10 and (B)
acetonitrile (ACN). Peptides were released from the column using a step gradient:
6.9%B, 10.4%B, 12.1%B, 13.5%B, 14.7%B, 15.9%B, 17.3%B, 18.8%B, 20.9%B and
65%B. Each fraction flowed directly to the second dimension of chromatography. The
buffers used in the low pH RP were: (A) H2O + 0.1% formic acid and (B) ACN + 0.1%
formic acid. Desalting of samples was performed online using a reverse-phase C18
trapping column (180µm i.d., 20mm length, 5µm particle size, Waters). Then the
peptides were separated using a C18 T3 HSS nano-column (75µm i.d., 150mm length,
1.8µm particle size, Waters) run at 0.4µL/minute. Finally, peptides were eluted from the
column and loaded onto the mass spectrometer using the following protocol: 3% to
30%B over 60min, 30% to 95%B over 5min, 95% maintained for 7min (and then back to
initial conditions).
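The second-dimension gradient can be written as a table of (time, %B) breakpoints. The sketch below assumes linear ramps between breakpoints and time zero at the start of the gradient; the names `GRADIENT` and `percent_b` are hypothetical, and re-equilibration to initial conditions is not modelled because its duration is not stated.

```python
# (time in min, %B) breakpoints: 3-30%B over 60 min, 30-95%B over 5 min,
# then 95%B held for 7 min
GRADIENT = [(0.0, 3.0), (60.0, 30.0), (65.0, 95.0), (72.0, 95.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

For example, `percent_b(30)` returns 16.5, the midpoint of the 3% to 30% ramp.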
Mass Spectrometry
The nanoLC was coupled online through a nanoESI emitter (7 cm length, 10 µm tip;
New Objective; Woburn, MA, USA) to a quadrupole ion mobility time-of-flight mass
spectrometer (Synapt G2 HDMS, Waters) tuned to 20,000 mass resolution (full width at
half height). Data were acquired using MassLynx version 4.1 in HDMSE positive ion
mode, in which the quadrupole was set to transfer all ions. The ions were separated in the
T-Wave ion mobility chamber and transferred into the collision cell. Collision energy
was alternated from low to high throughout the acquisition time. In low-energy (MS1)
scans, the collision energy was set to 5 eV and this was ramped from 27 to 50 eV for
high-energy scans. For both scans, the mass range was set to 50 – 2,000 Da with a scan
time set to 1 second. A reference compound (Glu-Fibrinopeptide B; Sigma) was infused
continuously for external calibration using a LockSpray and scanned every 30 seconds.
Data Processing, Searching and Analysis
Raw data processing and database searching were performed using ProteinLynx Global
Server (IdentityE) version 2.5. Database searching was carried out using the Ion
Accounting algorithm described by Li et al. (Li et al. 2009). Briefly, the algorithm detects
the 250 most abundant peptides and performs an initial pass through the database in order
to identify these peptides (with a mass tolerance of 7 ppm for precursor ions and 15 ppm for
fragment ions). These peptides are depleted from the database before the remaining
peptides are sought in the database. The cycle continues to the next most abundant
peptides, which are identified and then depleted from the database. These tentative
peptide identifications are ranked and scored based on how well they conform to 14
predetermined models of specific physicochemical attributes (such as retention time and
fragmentation prediction, fragment to precursor ratios, etc.). Trypsin was set as the
protease, one missed cleavage was allowed and the fixed modification was set to
carbamidomethylation of cysteines. Variable modifications included oxidation of
methionine.
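The search parameters above (trypsin, one missed cleavage, fixed carbamidomethylation of cysteine, variable methionine oxidation, 7 ppm precursor tolerance) can be illustrated with a minimal Python sketch. This is not the Ion Accounting algorithm or any part of ProteinLynx Global Server; it only shows an in silico digest and a ppm tolerance check. The function names are hypothetical, and the cleavage rule (after K or R, but not before P) is the common convention rather than something stated in the text.

```python
import re

# Monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBAMIDOMETHYL = 57.02146  # fixed modification on every cysteine
OXIDATION = 15.99491        # variable modification on methionine


def tryptic_peptides(protein, missed_cleavages=1):
    """Cleave after K/R (not before P) and allow up to N missed cleavages."""
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for n in range(missed_cleavages + 1):
        for i in range(len(fragments) - n):
            peptides.append("".join(fragments[i:i + n + 1]))
    return peptides


def monoisotopic_mass(peptide, oxidized_met=0):
    """Neutral monoisotopic mass with carbamidomethylated cysteines."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    mass += OXIDATION * oxidized_met
    return mass


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=7.0):
    """True if the observed mass is within tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm
```

The same `within_tolerance` check would apply to fragment ions with `tol_ppm=15.0`.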
Data were searched against a target database: the human Swiss-Prot protein
database (version 2011.05) concatenated with all chimeric ESTs (translated in six frames) from
ChimerDB (Kim et al. 2010). All reversed sequences were used as a decoy set. The
criteria for protein identification were set to a minimum of three fragments per peptide, five
fragments per protein and a minimum peptide score of 6.7, which corresponds to a false
discovery rate (FDR) of 1% (Figure S1). The approach for setting the minimum
identification score is based on the Peptide Prophet method reported by Keller et al.
(Keller et al. 2002; Nesvizhskii et al. 2003).
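To make the decoy idea concrete, the sketch below builds reversed decoy sequences and finds the lowest score cutoff at which the estimated FDR stays within a limit. It uses simple target-decoy counting (decoys divided by targets above the cutoff), which is a simplification and not the Peptide Prophet statistical model cited above. The function names, the `REV_` prefix and the input layout are assumptions, and tied scores are not treated specially.

```python
def reversed_decoys(targets):
    """Return a decoy database made by reversing every target sequence."""
    return {f"REV_{name}": seq[::-1] for name, seq in targets.items()}


def score_threshold_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    matches: iterable of (score, is_decoy) pairs.
    FDR at a cutoff is estimated as decoy hits / target hits at or above it.
    Returns None if no cutoff satisfies the limit.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```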
Targeted analysis in selected reaction monitoring (SRM) mode
We confirmed two chimeric peptides using liquid chromatography mass spectrometry in
selected reaction monitoring (SRM) mode. This technique is widely used in proteomics
for targeted analysis (Addona et al. 2009; Picotti et al. 2010; Stergachis et al. 2011). The
two peptides were synthesized (JPT Peptide Technologies) with heavy isotopic labels,
C-terminal R (13C6, 15N4) or C-terminal K (13C6, 15N2), and added to the cell lysates
prior to analysis.
Sample Preparation
An aliquot was taken from the digested samples prepared as described above. Samples
were diluted to 0.5 µg/µL in 97:3 H2O:ACN + 0.1% TFA.
Liquid Chromatography
ULC/MS grade solvents were used for all chromatographic steps. Each sample was
loaded using split-less nano-Ultra Performance Liquid Chromatography (10kpsi
nanoAcquity; Waters, Milford, MA, USA). The mobile phase was: (A) H2O + 0.1%
formic acid and (B) ACN + 0.1% formic acid. Desalting of samples was performed
online using a reverse-phase C18 trapping column (180µm i.d., 20mm length, 5µm
particle size; Waters). The peptides in samples were separated using a C18 T3 HSS
nano-column (75µm i.d., 150mm length, 1.8µm particle size; Waters) run at 0.4µL/minute.
Peptides were eluted from the column and into the mass spectrometer using the following
gradient: 3% to 30%B over 40min, 30% to 95%B over 5min, maintained at 95% for 7min
and then back to initial conditions.
Mass Spectrometry
The nanoLC was coupled online through a nanoESI emitter (7 cm length, 10 µm tip;
New Objective; Woburn, MA, USA) to a tandem quadrupole mass spectrometer (Xevo
TQ-S, Waters Corp.). Data were acquired in selected reaction monitoring mode using MassLynx
4.1. Data were then imported into Skyline (Maclean et al. 2010; MacLean et al. 2010) for
final processing and evaluation. The signal-to-noise ratio was calculated by the root-mean-square
method in MassLynx software (Waters) with no extra processing. The minimum criterion was a
signal-to-noise ratio of 5:1.
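A root-mean-square signal-to-noise calculation can be sketched as follows. The exact computation performed by the vendor software is not described here, so this is an assumption: noise is taken as the RMS deviation of a peak-free baseline region from its mean, and signal as the peak height above that mean. The function names and example values are hypothetical.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def signal_to_noise(peak_height, noise_region):
    """Peak height above baseline divided by the RMS noise of the baseline."""
    baseline = sum(noise_region) / len(noise_region)
    noise = rms([v - baseline for v in noise_region])
    if noise == 0:
        raise ValueError("noise region has zero variance")
    return (peak_height - baseline) / noise


def passes_criterion(peak_height, noise_region, min_ratio=5.0):
    """Apply the 5:1 minimum signal-to-noise criterion."""
    return signal_to_noise(peak_height, noise_region) >= min_ratio
```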
References
Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, Spiegelman CH, Zimmerman LJ,
Ham AJ, Keshishian H et al. 2009. Multi-site assessment of the precision and reproducibility of
multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 27(7):
633-641.
Keller A, Nesvizhskii AI, Kolker E, Aebersold R. 2002. Empirical statistical model to estimate the
accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74(20):
5383-5392.
Kim P, Yoon S, Kim N, Lee S, Ko M, Lee H, Kang H, Kim J. 2010. ChimerDB 2.0--a knowledgebase for
fusion genes updated. Nucleic Acids Res 38(Database issue): D81-85.
Levin Y, Hradetzky E, Bahn S. 2011. Quantification of proteins using data-independent analysis (MSE) in
simple and complex samples: a systematic evaluation. Proteomics 11(16): 3273-3287.
Li GZ, Vissers JP, Silva JC, Golick D, Gorenstein MV, Geromanos SJ. 2009. Database searching and
accounting of multiplexed precursor and product ion spectra from the data independent analysis of
simple and complex peptide mixtures. Proteomics 9(6): 1696-1719.
Maclean B, Tomazela DM, Abbatiello SE, Zhang S, Whiteaker JR, Paulovich AG, Carr SA, Maccoss MJ.
2010. Effect of collision energy optimization on the measurement of peptides by selected reaction
monitoring (SRM) mass spectrometry. Anal Chem 82(24): 10116-10124.
MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, Kern R, Tabb DL, Liebler
DC, MacCoss MJ. 2010. Skyline: an open source document editor for creating and analyzing
targeted proteomics experiments. Bioinformatics 26(7): 966-968.
Nesvizhskii AI, Keller A, Kolker E, Aebersold R. 2003. A statistical model for identifying proteins by
tandem mass spectrometry. Anal Chem 75(17): 4646-4658.
Picotti P, Rinner O, Stallmach R, Dautel F, Farrah T, Domon B, Wenschuh H, Aebersold R. 2010.
High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat
Methods 7(1): 43-46.
Stergachis AB, Maclean B, Lee K, Stamatoyannopoulos JA, Maccoss MJ. 2011. Rapid empirical discovery
of optimal peptides for targeted proteomics. Nat Methods.