BCB430Y1 Project Proposal – Mass spectrometry
Bimal Ramdoyal
Student ID: 996407552

What is Mass Spectrometry?

Mass spectrometry (MS) is a technique that measures the mass-to-charge ratio of charged particles. Consider a molecule X introduced into the mass spectrometer. Once inside, it is ionized by colliding with a beam of electrons. The ions then travel at varying speeds, and the selected ones hit the detector at the end of the instrument. We use this technique to filter out a desired set of molecules from a collection of molecules. The mass and abundance of certain fragments give us hints about the structure of molecule X.

Scoring the data

The data is stored in the ProHits database. Currently we use two methods to score the data, namely SAINT and COMPASS. To use either scoring method, the data has to be extracted from the database and converted into input files that are compatible with that method.

To find the probability of an interaction between a prey protein i and a bait j, SAINT creates separate distributions, each specific to one bait-prey pair. The spectral counts for each bait-prey pair are modeled as a mixture of two components, true and false interactions. From the spectral counts, we can calculate the probabilities of a true and of a false interaction, which are then used to calculate the prior probability of true interactions. These prior probabilities are in turn used to calculate the SAINT probabilities, from which the Bayesian false discovery rate (FDR) is estimated.

COMPASS is composed of an automated MS/MS data processing component and a protein function/annotation component, which together form a platform for analyzing proteomic data. COMPASS uses two LC-MS/MS datasets and has different scoring metrics (ST, DT, and WDT). COMPASS creates a matrix in which the rows are the unique proteins identified across all the experiments and the columns represent the baits used in those experiments.
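The matrix layout described above can be sketched as follows. This is a minimal illustration in Python, not part of COMPASS itself. The function name build_count_matrix, the (bait, prey, count) input triples, and the choice to sum counts when a bait-prey pair appears more than once are all assumptions made for this example.

```python
from collections import defaultdict


def build_count_matrix(interactions):
    """Arrange (bait, prey, spectral_count) triples into a prey-by-bait matrix.

    Hypothetical helper for illustration only. Rows are the unique prey
    proteins and columns are the baits, both in sorted order. Repeated
    bait-prey pairs are summed, which is an assumption of this sketch.
    """
    totals = defaultdict(int)
    for bait, prey, count in interactions:
        totals[(prey, bait)] += count

    preys = sorted({prey for prey, _ in totals})
    baits = sorted({bait for _, bait in totals})
    # Bait-prey pairs that were never observed get a count of 0.
    matrix = [[totals.get((prey, bait), 0) for bait in baits] for prey in preys]
    return preys, baits, matrix
```

The function returns the row labels, the column labels, and the matrix as a list of rows, so a scoring step can look up any cell by prey and bait index.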
Each cell in the COMPASS matrix is the total spectral count for a specific interacting protein. Once the matrix is built, we can calculate the scores for each interactor for each bait.

Proposal

1. Getting the data

To score the data, it must first be fetched from the ProHits database and converted into the input format required by SAINT or COMPASS. A Perl-based interface will be created that allows the user to select a list of experiments by experiment ID from a sorted list and to generate a tab-delimited text file of experiment IDs and a T/C flag. This file is then passed as input to another Perl script (export_saint.pl or export_compass.pl). The script export_saint.pl will create three input files (bait.dat, prey.dat, inter.dat) in a format that SAINT can run. If the scoring method chosen is COMPASS, one of the output files from export_saint.pl (inter.dat) has to be converted (using the homebrewed R script) into an m × n matrix, after which the scores can be computed by COMPASS. Currently, the three input files required for SAINT can be generated by the ProHits user interface, but our task is to use the new interface to feed the input file into a slightly modified export_saint.pl script that generates the three required data files. (Available at: http://code.google.com/p/prohits/source/browse/trunk/Prohits/script/export_saint.pl)

2. Interface

A user interface, titled Scoring Module, will be built to allow users to create a new analysis set, which is a tab-delimited text file consisting of experiment IDs and T/C flags. The experiments can be sorted by experiment ID, experiment name, userid, or datetime. Once the user selects a list of experiments, another interface will allow him/her to select the T/C flag and generate the analysis set. After the form is submitted, the analysis set will be written to a text file.
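Writing the analysis set could look roughly like the sketch below. The proposal's scripts are Perl, and Python is used here only for illustration. The function name write_analysis_set and the validation rule are assumptions, not part of ProHits.

```python
import csv

VALID_FLAGS = {"T", "C"}


def write_analysis_set(experiments, path):
    """Write (experiment_id, flag) pairs as a tab-delimited analysis set.

    Hypothetical helper for illustration only. Every flag must be "T" or "C".
    All rows are checked before the file is opened, so a bad flag does not
    leave a partially written file behind.
    """
    experiments = list(experiments)
    for experiment_id, flag in experiments:
        if flag not in VALID_FLAGS:
            raise ValueError(
                f"experiment {experiment_id}: flag must be T or C, got {flag!r}"
            )

    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(experiments)
```

Each output line holds one experiment ID and its flag separated by a tab, which is the layout the export scripts are described as taking as input.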
Here is an example of the analysis set, with the first column containing the experiment IDs and the second column the T/C flag:

101	T
407	C
408	C
634	T

Using the interface, the user will also have the option to select a scoring method and pass this input file to the appropriate script for that method. The three files discussed above (bait.dat, prey.dat and inter.dat) will then be created and can be run by SAINT or converted to be run by COMPASS. The same interface will have a section that allows the user to compare the output of both methods (saved into the html folder and available through download links) and their adjustment parameters.

3. Database performance

Running queries that involve many experiments can be expensive, so a view-based approach will be used to reduce the time needed to query the data. A view called “vScoringModule” will be created to hold the data needed for the Scoring Module interface.

4. Documentation

The results will be documented, and a tutorial manual will be provided to help new users get comfortable with the user interfaces.

Goals:

The main goals of this project are to create a set of user interfaces that allow users to select experiments from a sortable list, choose their preferred scoring method, run the scripts through the interface itself, and download the resulting files. Furthermore, the interface should enable users to compare scoring methods and see their performance, filter results by category, analyze different sets of data using different metrics, and use the module as a benchmarking tool.

References:

Current Scoring Module interface v1.0: http://tin.emililab.edu/Prohits/analyst/scoring_module/scoring_module.cgi

Current version of export_saint.pl: http://code.google.com/p/prohits/source/browse/trunk/Prohits/script/export_saint.pl?spec=svn37&r=37

http://falcon.hms.harvard.edu/ipmsmsdbs/cgi-bin/tutorial.cgi#scores

Defining the Human Deubiquitinating Enzyme Interaction Landscape. Mathew E. Sowa, Eric J. Bennett, Steven P. Gygi, and J. Wade Harper. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2716422/

Analysis and validation of proteomic data generated by tandem mass spectrometry. Alexey I Nesvizhskii, Olga Vitek, and Ruedi Aebersold. http://www.nature.com/nmeth/journal/v4/n10/abs/nmeth1088.html

SAINT: probabilistic scoring of affinity purification–mass spectrometry data. Hyungwon Choi, Brett Larsen, Zhen-Yuan Lin, Ashton Breitkreutz, Dattatreya Mellacheruvu, Damian Fermin, Zhaohui S Qin, Mike Tyers, Anne-Claude Gingras, and Alexey I Nesvizhskii. http://www.nature.com/nmeth/journal/v8/n1/full/nmeth.1541.html

Mass spectrometry-based proteomics. Aebersold R, Mann M. http://www.ncbi.nlm.nih.gov/pubmed/12634793