BCB430Y1_Proposal2

BCB430Y1
Project Proposal – Mass spectrometry
Bimal Ramdoyal
Student ID: 996407552
What is Mass Spectrometry?
Mass spectrometry (MS) is a technique that measures the mass-to-charge ratio of charged particles.
Consider a molecule X introduced into the mass spectrometer. Once inside, it is ionized by colliding
with a beam of electrons. The resulting ions travel at varying speeds, and the selected ones hit the
detector at the end of the instrument. We use this technique to filter a desired set of molecules out of
a collection of molecules. The masses and abundances of certain fragments give us hints about the
structure of molecule X.
Scoring the data
The data is stored in the ProHits database. Currently we use two methods to score the data, namely
SAINT and COMPASS. To use either scoring method, the data has to be extracted from the database
and converted into input files that are compatible with that method.
In the case of SAINT, to find the probability of an interaction between a prey protein i and a bait j, it
creates separate distributions, each specific to a bait-prey pair. The spectral counts for each bait-prey
pair are modeled as a mixture of two components, true and false interactions. From the spectral counts,
we can calculate the probabilities of a true and a false interaction, which are then used to calculate the
prior probability of true interactions. These prior probabilities are then used to calculate SAINT
probabilities, which in turn are used to estimate the Bayesian false discovery rate (FDR).
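The mixture idea can be illustrated with a minimal Python sketch. It is not SAINT's actual model: it assumes a plain two-component Poisson mixture, and the rates, the fixed prior, and the spectral counts are hypothetical values chosen only for illustration.

```python
import math

def poisson_pmf(count, rate):
    """Probability of observing `count` spectra under a Poisson with mean `rate`."""
    return math.exp(-rate) * rate ** count / math.factorial(count)

def posterior_true(count, rate_true, rate_false, prior_true):
    """Posterior probability that a bait-prey pair is a true interaction."""
    t = prior_true * poisson_pmf(count, rate_true)            # true component
    f = (1.0 - prior_true) * poisson_pmf(count, rate_false)   # false component
    return t / (t + f)

def bayesian_fdr(probabilities, threshold):
    """Expected fraction of false interactions among pairs scoring >= threshold."""
    selected = [p for p in probabilities if p >= threshold]
    if not selected:
        return 0.0
    return sum(1.0 - p for p in selected) / len(selected)

# Hypothetical spectral counts for four bait-prey pairs.
counts = [0, 2, 8, 15]
probs = [posterior_true(c, rate_true=10.0, rate_false=1.0, prior_true=0.1)
         for c in counts]
fdr = bayesian_fdr(probs, threshold=0.9)
```

In this sketch, pairs with higher spectral counts receive higher probabilities, and the FDR at a given threshold is the average of (1 - probability) over the pairs that pass it.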
COMPASS is composed of an automated MS/MS data processing component and a protein
function/annotation component, which together form a platform for analyzing proteomic data. COMPASS
uses 2 LC-MS/MS datasets and has different scoring metrics (ST, DT, and WDT). COMPASS creates a
matrix in which the rows are the unique proteins identified across all the experiments and the columns
represent each bait used for those experiments. Each cell in the matrix is the Total Spectral Count for a
specific interacting protein. Once the matrix is built, we can calculate the scores for each interactor for
each bait.
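The matrix construction described above can be sketched in Python as follows. This only builds the matrix of total spectral counts; it does not compute any COMPASS score, and the bait and prey names are hypothetical.

```python
def build_count_matrix(observations):
    """Build a proteins x baits matrix of total spectral counts.

    observations: iterable of (bait, prey, spectral_count) tuples.
    Returns (proteins, baits, matrix), where matrix[i][j] is the total
    spectral count of proteins[i] over all experiments using baits[j].
    """
    proteins = sorted({prey for _, prey, _ in observations})
    baits = sorted({bait for bait, _, _ in observations})
    row = {p: i for i, p in enumerate(proteins)}
    col = {b: j for j, b in enumerate(baits)}
    matrix = [[0] * len(baits) for _ in proteins]
    for bait, prey, count in observations:
        matrix[row[prey]][col[bait]] += count
    return proteins, baits, matrix

# Hypothetical observations; BAIT_A / PREY_1 appears in two experiments.
observations = [
    ("BAIT_A", "PREY_1", 12),
    ("BAIT_A", "PREY_2", 3),
    ("BAIT_B", "PREY_1", 5),
    ("BAIT_A", "PREY_1", 4),
]
proteins, baits, matrix = build_count_matrix(observations)
```

Cells for prey proteins never observed with a given bait stay at zero, so every row has one entry per bait.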
Proposal
1. Getting the data
To score the data, the data must first be fetched from the ProHits database and converted into a
different input format for SAINT and for COMPASS. A Perl-based interface will be created that will allow
the user to select a list of experiments, based on the experiment ID, from a sorted list, and generate a
tab-delimited text file of experiment IDs and a T/C flag. This file is then passed as input to another Perl
script (export_saint.pl or export_compass.pl). The script export_saint.pl will create 3 input files (bait.dat,
prey.dat, inter.dat) in a format that SAINT can run. If the scoring method chosen is COMPASS, then one
of the output files from export_saint.pl (inter.dat) has to be converted (using an in-house R script) into
an m × n matrix, and then the scores can be computed by COMPASS.
Currently, the 3 input files required for SAINT can be generated through the ProHits user interface, but
our task is to use the new interface to pass the input file to a slightly modified export_saint.pl script
that generates the 3 required data files.
(Available on: http://code.google.com/p/prohits/source/browse/trunk/Prohits/script/export_saint.pl)
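As a rough illustration of this export step, the Python sketch below derives the contents of the three files from an analysis set and a list of hits. The column layouts are assumptions made for the example (bait.dat: experiment ID, bait, T/C flag; prey.dat: prey name; inter.dat: experiment ID, bait, prey, spectral count), and the actual export_saint.pl output may differ. All record values are hypothetical.

```python
def build_saint_inputs(analysis_set, hits):
    """Return the text of bait.dat, prey.dat and inter.dat (assumed layouts).

    analysis_set: dict mapping experiment ID -> 'T' or 'C'.
    hits: list of (experiment_id, bait, prey, spectral_count) tuples.
    """
    # Keep only hits from experiments listed in the analysis set.
    selected = [h for h in hits if h[0] in analysis_set]
    bait_rows = sorted({(exp, bait, analysis_set[exp])
                        for exp, bait, _, _ in selected})
    prey_rows = sorted({prey for _, _, prey, _ in selected})
    bait_dat = "".join(f"{exp}\t{bait}\t{flag}\n"
                       for exp, bait, flag in bait_rows)
    prey_dat = "".join(f"{prey}\n" for prey in prey_rows)
    inter_dat = "".join(f"{exp}\t{bait}\t{prey}\t{count}\n"
                        for exp, bait, prey, count in selected)
    return {"bait.dat": bait_dat, "prey.dat": prey_dat, "inter.dat": inter_dat}

analysis_set = {"101": "T", "407": "C"}
hits = [
    ("101", "BAIT_A", "PREY_1", 12),
    ("101", "BAIT_A", "PREY_2", 3),
    ("407", "CTRL", "PREY_2", 1),
    ("999", "BAIT_Z", "PREY_9", 7),   # not in the analysis set, so skipped
]
files = build_saint_inputs(analysis_set, hits)
```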
2. Interface
A user interface, titled Scoring Module, will be built to allow users to create a new analysis set, a
tab-delimited text file consisting of the experiment ID and the T/C flag. The experiments can be
sorted by experiment ID, experiment name, userid, or datetime.
Once the user selects a list of experiments, another interface will allow them to select the T/C flag
for each experiment and generate the analysis set.
After submitting the form, the analysis set will be created in a text file. Here is an example of the
analysis set with the first column containing the experiment IDs and the second column the T/C flag:
101	T
407	C
408	C
634	T
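A minimal Python sketch of reading and validating such an analysis set is given below. The function name and the validation rules (numeric experiment ID, flag restricted to T or C) are assumptions for illustration, not part of the existing ProHits scripts.

```python
def parse_analysis_set(text):
    """Parse tab-delimited 'experiment ID<TAB>T/C flag' lines into tuples."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 tab-separated fields")
        experiment_id, flag = fields[0].strip(), fields[1].strip()
        if not experiment_id.isdigit():
            raise ValueError(f"line {line_number}: bad experiment ID {experiment_id!r}")
        if flag not in ("T", "C"):
            raise ValueError(f"line {line_number}: flag must be T or C, got {flag!r}")
        entries.append((int(experiment_id), flag))
    return entries

analysis_set = parse_analysis_set("101\tT\n407\tC\n408\tC\n634\tT\n")
```

Rejecting malformed lines at this stage keeps bad input from reaching the export scripts.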
Using the interface, the user will also have the option to select a scoring method and pass this input
file to the appropriate script for that method. Then the 3 files discussed above (bait.dat, prey.dat and
inter.dat) will be created and can be run by SAINT or modified to be run by COMPASS. The same
interface will have a section that will allow the user to compare the output of both methods (saved into
the html folder and available through download links) and to adjust parameters.
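The method-based dispatch could look like the Python sketch below. The script names come from this proposal, but the command-line convention (the analysis set path as the only argument) is an assumption, and the helper name is hypothetical.

```python
# Map each scoring method to the export script named in this proposal.
EXPORT_SCRIPTS = {
    "SAINT": "export_saint.pl",
    "COMPASS": "export_compass.pl",
}

def build_export_command(method, analysis_set_path):
    """Return the command (as an argument list) for the chosen scoring method.

    Assumes each script takes the analysis set path as its only argument.
    """
    key = method.strip().upper()
    if key not in EXPORT_SCRIPTS:
        raise ValueError(f"unknown scoring method: {method!r}")
    return ["perl", EXPORT_SCRIPTS[key], analysis_set_path]

command = build_export_command("saint", "analysis_set.txt")
```

The returned list can be handed to `subprocess.run(command, check=True)` to execute the script, which avoids building a shell string from user input.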
3. Database performance
Running queries that involve many experiments can be resource-intensive, so a view-based approach will
be used to reduce the time needed to query the data. A view called “vScoringModule” will be created
that will hold the data needed for the Scoring Module interface.
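The sketch below shows what such a view might look like, using Python's built-in sqlite3 module as a stand-in for the ProHits database. Only the view name comes from this proposal; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables standing in for the ProHits schema.
CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT,
                         user_id TEXT, created TEXT);
CREATE TABLE hit (experiment_id INTEGER, bait TEXT, prey TEXT,
                  spectral_count INTEGER);
INSERT INTO experiment VALUES
    (101, 'exp_a', 'user1', '2011-01-10'),
    (407, 'ctrl_a', 'user2', '2011-02-03');
INSERT INTO hit VALUES
    (101, 'BAIT_A', 'PREY_1', 12),
    (101, 'BAIT_A', 'PREY_2', 3),
    (407, 'CTRL', 'PREY_2', 1);
-- The view joins experiment metadata with hits, so the interface
-- can query one object instead of repeating the join.
CREATE VIEW vScoringModule AS
SELECT e.id AS experiment_id, e.name AS experiment_name, e.user_id,
       e.created, h.bait, h.prey, h.spectral_count
FROM experiment AS e
JOIN hit AS h ON h.experiment_id = e.id;
""")

rows = conn.execute(
    "SELECT experiment_id, prey, spectral_count FROM vScoringModule "
    "WHERE experiment_id = ? ORDER BY prey",
    (101,),
).fetchall()
```

The interface would then select from vScoringModule and sort or filter by experiment ID, experiment name, userid, or datetime without touching the underlying tables directly.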
4. Documentation
The results will be documented, and a tutorial manual will be provided to help new users get
comfortable with the user interfaces.
Goals:
The main goals of this project are to create a set of user interfaces that will allow users to select
experiments from a sortable list, choose their preferred scoring methods, run the scripts through the
interface itself, and download the resulting files. Furthermore, the interface should enable users to
compare scoring methods, filter results by category, assess performance, and analyze different sets of
data using different metrics, so that it can also serve as a benchmarking tool.
References:
Current Scoring Module interface v1.0:
http://tin.emililab.edu/Prohits/analyst/scoring_module/scoring_module.cgi
Link to current version of export_saint.pl:
http://code.google.com/p/prohits/source/browse/trunk/Prohits/script/export_saint.pl?spec=svn37&r=37
http://falcon.hms.harvard.edu/ipmsmsdbs/cgi-bin/tutorial.cgi#scores
Sowa ME, Bennett EJ, Gygi SP, Harper JW. Defining the Human Deubiquitinating Enzyme Interaction Landscape.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2716422/
Nesvizhskii AI, Vitek O, Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry.
http://www.nature.com/nmeth/journal/v4/n10/abs/nmeth1088.html
Choi H, Larsen B, Lin ZY, Breitkreutz A, Mellacheruvu D, Fermin D, Qin ZS, Tyers M, Gingras AC, Nesvizhskii AI.
SAINT: probabilistic scoring of affinity purification–mass spectrometry data.
http://www.nature.com/nmeth/journal/v8/n1/full/nmeth.1541.html
Aebersold R, Mann M. Mass spectrometry-based proteomics.
http://www.ncbi.nlm.nih.gov/pubmed/12634793