Flowgram

1. Preprocessing
   Data 01 (mzxml), Data 02 (mzxml)
     -> Pre-processing the data
     -> Data 01 (mat file), Data 02 (mat file)

2. Training data & model parameters generation
   Common Data 01 & Data 02 Tandem Peptide List
     -> Training Data & Model Parameters Generation
     -> Training Dataset, Model Parameters

3. Alignment part
   Data 01 & Data 02 Testing List (different from training peptides), Model Parameters
     -> Align Testing List
     -> Alignment Result

4. Aligning Result Verification
   Alignment Result excel file, Input Ground truth file
     -> Verification
     -> Result

Readme

1. The pre-processing script, main_pre_processing.m, takes the raw LC/MS data files to be aligned (in mzxml format), parses them, and saves the LC/MS level-one scans in two Matlab data files named data02levelone.mat and data03levelone.mat. These two files are the input to both the training data and model parameter generation script, main_training.m, and the alignment testing script, main_testing.m.

2. The training data and model parameters generation script, main_training.m, is used to obtain the training peptides for the Alignment R (AR) statistics model and the Alignment Time (AT) model. The input should be a peplist xls file that lists the common Tandem MS identified peptides from the two data files to be aligned, data02 and data03 (a demo file of such a list is TandemCommonPeplist.xlsx). The training peptides are chosen based on the conditions outlined in the paper. The output is a training dataset (a list of peptides) and the model parameters. The training dataset is saved as an excel file (for example, CommonTandemTrainingPeplistInput.xlsx), which contains the peptide sequence, the charge state in data02, the Tandem MS identified time point in data02, the charge state in data03, and the Tandem MS identified time point in data03. The model parameter file (training_parameter.mat) contains all the parameters for the Gamma AR model and the Student-t AT model.
The names of the input and output files can be specified by the user in the main_training.m script.

3. The alignment testing script, main_testing.m, is used to align the testing datasets. The input is a peptide list annotated with peptide sequence and charge state information; see CommonTandemTestingPeplistInput.xlsx for an example. Elution time information is optional: the columns data02ms2timepoint and data03ms2timepoint can be set to all zeros. The input parameters are loaded from the training_parameter.mat file. The alignment result is an excel file that contains 8 columns:

   Column 1: pepseqence, the same as in the input file.
   Column 2: XIC02exist, which indicates whether the peptide can be found in data02 (1 if it exists, 0 if it does not).
   Column 3: XIC03exsit, which has the same meaning as column 2, for data03.
   Column 4: XIC0203exist, which indicates whether the peptide can be found in both datasets.
   Columns 5 and 6: T02start and T02end, which give the start and end of the peptide's elution time in data02.
   Columns 7 and 8: the start and end of the peptide's elution time in data03.

   See CommonTandemMsInspectTestingPeplistAlignmentResult.xlsx for an example. The input and output file names can be modified in the main_testing.m script.

4. The verification script, main_verification.m, is used to verify the alignment result when the testing excel file contains the tandem MS time points (see CommonTandemTestingPeplistInput.xlsx for an example). The data02ms2timepoint and data03ms2timepoint columns are taken as the ground truth. The script compares the intervals that LABAHT detected against the ground truth. The result is saved in Detection_verification.mat, in which the 1st column is for data02 (1 for detected, 0 for not detected) and the 2nd column is for data03.
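
The comparison described in step 4 can be pictured with the minimal sketch below. It is not the code in main_verification.m. It assumes that a peptide counts as detected when its ground-truth tandem MS time point falls inside the reported [start, end] elution interval, a criterion the text above does not state explicitly. The variable name ms2time and all numbers are made up for illustration.

```matlab
% Hypothetical example values, not taken from any real data file.
T02start = [10.2; 35.0; 50.1];   % reported elution start times in data02
T02end   = [11.0; 36.4; 51.3];   % reported elution end times in data02
ms2time  = [10.5; 40.0; 51.3];   % ground-truth tandem MS time points

% Assumed criterion: a peptide is flagged as detected (1) when the
% ground-truth time point lies inside the closed interval [start, end],
% and as not detected (0) otherwise.
detected = double(ms2time >= T02start & ms2time <= T02end);

disp(detected.');                % one 1/0 flag per peptide
```

With these made-up numbers the flags are 1, 0, 1: the second time point lies outside its interval, and the third sits exactly on the interval end, which the closed-interval assumption treats as detected.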