Supplementary Methods

Chimeras taking shape: potential functions of proteins encoded by chimeric RNA transcripts

Milana Frenkel-Morgenstern1, Vincent Lacroix2, Iakes Ezkurdia1, Yishai Levin3, Jaime Prilusky4, Angela del Pozo1, Michael Tress1, Roderic Guigo5 and Alfonso Valencia1*

1 Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain.
2 UMR CNRS 5558, Laboratoire de Biométrie et Biologie Evolutive, INRIA Bamboo, Université Claude Bernard, Villeurbanne, 69100, France.
3 Mass-Spectrometry Unit, Weizmann Institute of Science, Rehovot, 76100, Israel.
4 Bioinformatics Unit, Weizmann Institute of Science, Rehovot, 76100, Israel.
5 Centre for Genomic Regulation (CRG), C/ Dr Aiguader 88, 08003, Barcelona, Spain.

*Corresponding author: Alfonso Valencia, Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain. Email: avalencia@cnio.es

Proteomics Data Availability

The data associated with this manuscript may be downloaded from the ProteomeCommons.org Tranche network (www.proteomecommons.org) (access password: chimera), using the following hashes:

Prostate cell line (HTB-81):
fWVcKoaR2QlHayG5mPyEgdrkmZQyuGEWLh5360/WMGvjwD3V5D+EagZbxk9ssGis8OrdFth3ypYSr/j9GJvpMi9ibO4AAAAAAAACtg==

Breast cell line (HTB-22):
o3DSLLM33aGCl1heZFwEFCdaP+XAUjJE1yuEyGhp0XelqrCFirAlKyqu4Ln0IuaU5Q/6WOb7j7zymG1X9fOBJ/Fk5xUAAAAAAAADAQ==

Ovary cell line (HTB-161):
CM9ceZc516XdmAC98xpvG+02ISpL4tGzbnCz+70f+1afp24R56R7kaYv1/9WkE/hwNHeYXY0qLcP/yTl4hI+j0ZngNgAAAAAAnkQTA==

These hashes can be used to verify exactly which files were published as part of this manuscript's dataset, and to check that the data have not changed since publication. Scaffold Viewer, which can be downloaded free of charge, is required to view the spectra.
Shotgun proteomics experiments

To find evidence for chimeric proteins, we employed 'bottom-up' shotgun proteomics using two-dimensional liquid chromatography coupled with high-resolution tandem mass spectrometry. The platform was operated in data-independent mode as described by Levin et al. (Levin et al. 2011). The data were searched against a concatenated protein sequence database comprising the human Swiss-Prot database and the list of all chimeric ESTs from ChimerDB (Kim et al. 2010), translated in six frames.

Cell Lines and Total Proteome Extraction

Three human cancer cell lines were subjected to proteomic analysis: the MCF7 human breast epithelial cell line derived from a mammary gland adenocarcinoma (HTB-22™), the OVCAR-3 human epithelial cell line derived from ovary (HTB-161™) and the DU-145 human epithelial carcinoma cell line derived from prostate (HTB-81™). The cells were grown in the media indicated in Table S2. In each case, cells were harvested upon reaching 80-100% confluence (~5×10⁶ cells per 75 cm² flask). Briefly, the growth medium was aspirated and the cells were rinsed gently with cold 1×PBS before being scraped into 1 ml of cold 1×PBS and transferred into a microfuge tube. Next, the cells were spun at 14,000 g for 10 min at 4°C and the pellet was re-suspended in RIPA buffer (50 mM Tris-HCl pH 8, 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate and 0.1% SDS, with protease inhibitors) at a ratio of roughly 1:7 (30 µl of cells to 210 µl of RIPA). Total protein concentration was measured by the BCA method according to the manufacturer's instructions. The sample was diluted so that the final concentration was 2 µg/µl and the total volume was at least 200 µl. Protein samples were stored at -80°C.

Sample Preparation

Proteins in the cell lysates were reduced by addition of dithiothreitol (Sigma; 5 mM) and incubation for 30 min at 60°C, and then alkylated by addition of iodoacetamide (Sigma; 10 mM) and incubation in the dark for 30 min at 21°C.
The proteins were then digested by incubation with trypsin (Promega; Madison, WI, USA), added at a ratio of 1:50 (w/w trypsin/protein), for 16 hours at 37°C. Digestions were stopped by the addition of 1% trifluoroacetic acid (TFA). The samples were stored at -80°C in aliquots.

Liquid Chromatography

ULC/MS grade solvents were used for all chromatographic steps. Each sample was loaded using split-less nano-Ultra Performance Liquid Chromatography (10 kpsi nanoAcquity; Waters, Milford, MA, USA) in high-pH/low-pH reversed phase (RP) two-dimensional liquid chromatography mode. A total of 15 µg of digested protein from each sample was loaded onto a C18 column (XBridge, 0.3 × 50 mm, 5 µm particles, Waters). The following two buffers were combined for the first (high-pH) dimension: (A) 20 mM ammonium formate, pH 10 and (B) acetonitrile (ACN). Peptides were released from the column using a step gradient: 6.9% B, 10.4% B, 12.1% B, 13.5% B, 14.7% B, 15.9% B, 17.3% B, 18.8% B, 20.9% B and 65% B. Each fraction flowed directly to the second dimension of chromatography. The buffers used in the low-pH RP dimension were: (A) H2O + 0.1% formic acid and (B) ACN + 0.1% formic acid. Desalting of samples was performed online using a reverse-phase C18 trapping column (180 µm i.d., 20 mm length, 5 µm particle size, Waters). The peptides were then separated using a C18 T3 HSS nano-column (75 µm i.d., 150 mm length, 1.8 µm particle size, Waters) run at 0.4 µL/minute. Finally, peptides were eluted from the column and loaded onto the mass spectrometer using the following gradient: 3% to 30% B over 60 min, 30% to 95% B over 5 min, 95% B maintained for 7 min, and then back to initial conditions.

Mass Spectrometry

The nanoLC was coupled online through a nanoESI emitter (7 cm length, 10 µm tip; New Objective; Woburn, MA, USA) to a quadrupole ion mobility time-of-flight mass spectrometer (Synapt G2 HDMS, Waters) tuned to a mass resolution of 20,000 (full width at half height).
Data were acquired using MassLynx version 4.1 in HDMSE positive ion mode, in which the quadrupole was set to transfer all ions. The ions were separated in the T-Wave ion mobility chamber and transferred into the collision cell. Collision energy was alternated between low and high throughout the acquisition time. In low-energy (MS1) scans the collision energy was set to 5 eV; in high-energy scans it was ramped from 27 to 50 eV. For both scan types, the mass range was set to 50-2,000 Da with a scan time of 1 second. A reference compound (Glu-Fibrinopeptide B; Sigma) was infused continuously for external calibration using a LockSpray and scanned every 30 seconds.

Data Processing, Searching and Analysis

Raw data processing and database searching were performed using ProteinLynx Global Server (IdentityE) version 2.5. Database searching was carried out using the Ion Accounting algorithm described by Li et al. (Li et al. 2009). Briefly, the algorithm detects the 250 most abundant peptides and performs an initial pass through the database in order to identify these peptides (with a mass tolerance of 7 ppm for precursor ions and 15 ppm for fragment ions). These peptides are depleted from the database before the remaining peptides are sought. The cycle then continues with the next most abundant peptides, which are identified and depleted from the database in turn. The tentative peptide identifications are ranked and scored based on how well they conform to 14 predetermined models of specific physicochemical attributes (such as retention time, fragmentation prediction and fragment-to-precursor ratios). Trypsin was set as the protease, one missed cleavage was allowed, and carbamidomethylation of cysteines was set as a fixed modification. Variable modifications included oxidation of methionine.
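The search settings above (trypsin, one missed cleavage, fixed carbamidomethylation of cysteine, variable oxidation of methionine, 7 ppm precursor tolerance) can be made concrete with a minimal Python sketch. It is not the ProteinLynx Global Server implementation: the function names, the example sequence and the "no cleavage before proline" rule are assumptions made for illustration, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBAMIDOMETHYL = 57.02146  # fixed modification, applied to every cysteine
OXIDATION = 15.99491        # variable modification on methionine


def cleavage_pieces(protein):
    """Split after K or R, except when the next residue is P (assumed rule)."""
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces


def tryptic_peptides(protein, missed_cleavages=1):
    """Return all peptides with at most `missed_cleavages` missed sites."""
    pieces = cleavage_pieces(protein)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide, oxidized_met=0):
    """Neutral monoisotopic mass with carbamidomethylated cysteines."""
    if oxidized_met > peptide.count("M"):
        raise ValueError("more oxidations requested than methionines present")
    mass = WATER + sum(RESIDUE_MASS[residue] for residue in peptide)
    mass += CARBAMIDOMETHYL * peptide.count("C")
    mass += OXIDATION * oxidized_met
    return mass


def within_ppm(observed, theoretical, tolerance_ppm):
    """True if the relative mass error is within the given ppm tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    for peptide in tryptic_peptides("MKACDRPEKGGR", missed_cleavages=1):
        print(peptide, round(peptide_mass(peptide), 4))
```

With the hypothetical sequence MKACDRPEKGGR, the arginine followed by proline is not cleaved, so the digest yields MK, ACDRPEK and GGR plus the two single-missed-cleavage peptides MKACDRPEK and ACDRPEKGGR. A candidate match would then be accepted only if `within_ppm(observed, peptide_mass(peptide), 7)` holds for the precursor.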
Data were searched against a target database consisting of the human Swiss-Prot protein database (version 2011.05) concatenated with all chimeric ESTs (translated in six frames) from ChimerDB (Kim et al. 2010). All reversed sequences were used as a decoy set. The criteria for protein identification were a minimum of three fragments per peptide, five fragments per protein and a minimum peptide score of 6.7, which corresponds to a false discovery rate (FDR) of 1% (Figure S1). The approach for setting the minimum identification score is based on reports by Keller et al. and is termed PeptideProphet (Keller et al. 2002; Nesvizhskii et al. 2003).

Targeted analysis in selected reaction monitoring (SRM) mode

We confirmed two chimeric peptides using liquid chromatography-mass spectrometry in selected reaction monitoring (SRM) mode. This technique is widely used in proteomics for targeted analysis (Addona et al. 2009; Picotti et al. 2010; Stergachis et al. 2011). The two peptides were synthesized (JPT Peptide Technologies) with heavy isotopic labels, either C-terminal R (13C6, 15N4) or C-terminal K (13C6, 15N2), and added to the cell lysates prior to analysis.

Sample Preparation

An aliquot was taken from the digested samples prepared as described above. Samples were diluted to 0.5 µg/µL in 97:3 H2O:ACN + 0.1% TFA.

Liquid Chromatography

ULC/MS grade solvents were used for all chromatographic steps. Each sample was loaded using split-less nano-Ultra Performance Liquid Chromatography (10 kpsi nanoAcquity; Waters, Milford, MA, USA). The mobile phase was: (A) H2O + 0.1% formic acid and (B) ACN + 0.1% formic acid. Desalting of samples was performed online using a reverse-phase C18 trapping column (180 µm i.d., 20 mm length, 5 µm particle size; Waters). The peptides were separated using a C18 T3 HSS nano-column (75 µm i.d., 150 mm length, 1.8 µm particle size; Waters) run at 0.4 µL/minute.
Peptides were eluted from the column into the mass spectrometer using the following gradient: 3% to 30% B over 40 min, 30% to 95% B over 5 min, 95% B maintained for 7 min, and then back to initial conditions.

Mass Spectrometry

The nanoLC was coupled online through a nanoESI emitter (7 cm length, 10 µm tip; New Objective; Woburn, MA, USA) to a tandem quadrupole mass spectrometer (Xevo TQ-S, Waters Corp.). Data were acquired in selected reaction monitoring mode using MassLynx 4.1. Data were then imported into Skyline (MacLean et al. 2010a; MacLean et al. 2010b) for final processing and evaluation. The signal-to-noise ratio was calculated by the root-mean-square method in MassLynx (Waters) with no additional processing. The minimum acceptance criterion was a signal-to-noise ratio of 5:1.

References

Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, Spiegelman CH, Zimmerman LJ, Ham AJ, Keshishian H et al. 2009. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 27(7): 633-641.

Keller A, Nesvizhskii AI, Kolker E, Aebersold R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74(20): 5383-5392.

Kim P, Yoon S, Kim N, Lee S, Ko M, Lee H, Kang H, Kim J. 2010. ChimerDB 2.0--a knowledgebase for fusion genes updated. Nucleic Acids Res 38(Database issue): D81-85.

Levin Y, Hradetzky E, Bahn S. 2011. Quantification of proteins using data-independent analysis (MSE) in simple and complex samples: a systematic evaluation. Proteomics 11(16): 3273-3287.

Li GZ, Vissers JP, Silva JC, Golick D, Gorenstein MV, Geromanos SJ. 2009. Database searching and accounting of multiplexed precursor and product ion spectra from the data independent analysis of simple and complex peptide mixtures. Proteomics 9(6): 1696-1719.

MacLean B, Tomazela DM, Abbatiello SE, Zhang S, Whiteaker JR, Paulovich AG, Carr SA, MacCoss MJ. 2010a. Effect of collision energy optimization on the measurement of peptides by selected reaction monitoring (SRM) mass spectrometry. Anal Chem 82(24): 10116-10124.

MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, Kern R, Tabb DL, Liebler DC, MacCoss MJ. 2010b. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26(7): 966-968.

Nesvizhskii AI, Keller A, Kolker E, Aebersold R. 2003. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75(17): 4646-4658.

Picotti P, Rinner O, Stallmach R, Dautel F, Farrah T, Domon B, Wenschuh H, Aebersold R. 2010. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods 7(1): 43-46.

Stergachis AB, MacLean B, Lee K, Stamatoyannopoulos JA, MacCoss MJ. 2011. Rapid empirical discovery of optimal peptides for targeted proteomics. Nat Methods.
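Illustrative note on the decoy-based score cutoff. The Data Processing, Searching and Analysis section states that reversed sequences served as the decoy set and that a minimum peptide score was chosen to give a 1% FDR. The Python sketch below shows one simple way such a cutoff can be estimated, by counting decoy and target hits above a threshold. It is an assumption-laden illustration, not the PeptideProphet statistical model or the authors' procedure; the function names, the `REV_` prefix and the example scores are hypothetical.

```python
def reversed_decoys(proteins):
    """Build one decoy per target by reversing its sequence.

    `proteins` maps an identifier to an amino acid sequence; the "REV_"
    prefix is a hypothetical naming convention for decoy entries.
    """
    return {"REV_" + name: sequence[::-1] for name, sequence in proteins.items()}


def fdr_at(threshold, hits):
    """Estimate FDR as decoy hits / target hits at or above `threshold`.

    `hits` is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def min_score_for_fdr(hits, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    All observed scores are scanned because the estimate is not
    guaranteed to change monotonically with the threshold.
    Returns None if no threshold qualifies.
    """
    best = None
    for threshold in sorted({score for score, _ in hits}, reverse=True):
        if fdr_at(threshold, hits) <= max_fdr:
            best = threshold
    return best


if __name__ == "__main__":
    # Hypothetical scores, used only to exercise the functions.
    example_hits = [(9.0, False), (8.5, False), (8.0, False), (7.0, True), (6.0, False)]
    print(reversed_decoys({"P1": "MKAC"}))
    print(min_score_for_fdr(example_hits, max_fdr=0.10))
```

In practice the hit list would come from searching the concatenated target and reversed databases, and the resulting threshold would play the role of the minimum peptide score reported above.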