Supplementary Material

Methods

Programs were written in Visual Basic .NET using Microsoft Visual Studio 2008. For fast sorting of the selected peptide masses, the Visual Basic List(Of Long) class was used. Masses with five-digit precision and indices containing sequence information (protein number, start position in the sequence, and length of the peptide) were concatenated and converted to a long integer value, and these values were then sorted using the Sort method of the class. After sorting, the long integer values were converted back to peptide masses and the corresponding peptide sequence information was associated with them. The algorithm was implemented in the DXMSMS Match Sorting program, which is available as a free download at http://www.creativemolecules.com/CM_Software.htm.

Experimental crosslinking data were generated as described previously [1, 2]. Briefly, proteins were crosslinked with the isotopically coded amine-reactive crosslinker CBDPS-H8/D8 (Creative Molecules Inc.) and digested with proteinase K; the crosslinked peptides were enriched using monomeric avidin and analyzed by ESI-LC-MS and MS/MS using an Orbitrap mass spectrometer and the Mass Tags acquisition method.

Mass spectrometric analysis was carried out with a nano-HPLC system (Easy-nLC II, ThermoFisher) coupled to the ESI source of an LTQ Orbitrap Velos mass spectrometer (ThermoFisher Scientific). Samples were injected onto a 100 μm ID, 360 μm OD IntegraFrit trap column (New Objective Inc.) packed with Magic C18AQ (5 µm particle size, 100 Å pore size, Bruker-Michrom) and desalted by washing for 15 min at 300 nl/min with 0.1% (v/v) formic acid.
Peptides were subsequently injected onto a 75 μm ID, 360 μm OD IntegraFrit analytical column packed with Magic C18AQ (5 µm particle size, 100 Å pore size), equilibrated with 95% solvent A (2% (v/v) acetonitrile, 98% (v/v) water, 0.1% (v/v) formic acid) and 5% solvent B (90% (v/v) acetonitrile, 10% (v/v) water, 0.1% (v/v) formic acid). Peptides were separated at a flow rate of 300 nl/min using a 70-minute gradient (0–60 min: 4–40% solvent B, 60–62 min: 40–80% solvent B, 62–70 min: 80% solvent B).

MS data were acquired with Xcalibur (ver. 2.1.0.1140) with the Mass Tags and Dynamic Exclusion precursor selection methods enabled in the global data-dependent settings. For CBDPS-H8/D8, a mass difference of 8.05824 Da between the light and heavy isotopic forms was used in the Mass Tags setting. Mass Tags and inclusion list runs used the Top 3 method. MS scans (m/z range from 400 to 2000) and MS/MS scans were acquired in the Orbitrap mass analyzer at 60000 and 30000 resolution, respectively. Fragment ions for MS/MS acquisition were produced by collision-induced dissociation (CID) at a normalized collision energy of 35% for 10 ms at activation q = 0.25. Data analysis was performed with DXMSMS Match of ICC-CLASS [3].

To implement the algorithm, all of the peptide masses are first sorted into a one-dimensional array. The search is performed as two nested loops whose indices start from opposite ends of the array and move in opposite directions (Figure 1S). The outer loop starts from the top of the array and the inner loop starts from the bottom. The inner loop is set to start from MP2 plus the mass tolerance and to exit after reaching the array element equal to MP2 minus the mass tolerance. Compared to conventional search algorithms, this strategy reduces the number of iterations required to search n peptides from ~0.5 n² to ~6.8 n (Figure 2S).
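As an illustration only, the sorting and search steps described above can be sketched in Python (the original programs were written in Visual Basic .NET, and this is not the DXMSMS Match code). The function names pack_key, unpack_key, sort_by_packed_key and find_pairs are hypothetical. The single integer peptide index (standing in for the concatenated protein number, start position and length), the nine-digit index field, and the symmetric tolerance window are assumptions made for the sketch.

```python
from typing import List, Tuple

INDEX_DIGITS = 9  # assumed width of the index field; not taken from the source


def pack_key(mass: float, peptide_index: int) -> int:
    """Concatenate a mass (five decimal places) and an index into one integer."""
    return round(mass * 10**5) * 10**INDEX_DIGITS + peptide_index


def unpack_key(key: int) -> Tuple[float, int]:
    """Recover the mass and the index from a packed key."""
    mass_part, peptide_index = divmod(key, 10**INDEX_DIGITS)
    return mass_part / 10**5, peptide_index


def sort_by_packed_key(peptides: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Sort (mass, index) records by sorting plain integers, then unpack them."""
    keys = sorted(pack_key(mass, idx) for mass, idx in peptides)
    return [unpack_key(key) for key in keys]


def find_pairs(masses: List[float], m_obs: float, m_cl: float,
               tol: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i <= j, whose masses plus the crosslinker
    mass m_cl lie within tol of m_obs. `masses` must be sorted ascending."""
    pairs = []
    hi = len(masses) - 1              # inner-loop start, only ever moves down
    for i, m_i in enumerate(masses):  # outer loop: Mi increases
        target = m_obs - m_cl - m_i   # expected mass of the second peptide
        # Skip masses that are too heavy; they stay too heavy for later Mi.
        while hi >= i and masses[hi] > target + tol:
            hi -= 1
        if hi < i:                    # no partner left at or above position i
            break
        j = hi
        # Inner loop: walk down until the mass falls below the window.
        while j >= i and masses[j] >= target - tol:
            pairs.append((i, j))
            j -= 1
    return pairs


if __name__ == "__main__":
    records = sort_by_packed_key([(400.0, 4), (200.0, 1), (500.0, 5), (300.0, 3)])
    sorted_masses = [mass for mass, _ in records]
    print(find_pairs(sorted_masses, m_obs=1000.0, m_cl=300.0, tol=0.01))
```

Because the start position of the inner loop never moves back up, each outer-loop step only visits the few array elements inside the tolerance window, which is the source of the roughly linear iteration count reported in the text.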

Figure 1S. Illustration of the fast mass matching algorithm. Calculated peptide masses are sorted in ascending order in a one-dimensional array. The sums of possible peptide masses are tested pairwise (Mi + Mj) to see whether they fit the experimentally determined mass (Mobs). The search is organized as two nested loops. For each Mi (upper box, outer loop), which corresponds to MP1, only masses in the vicinity of Mj (i.e., masses between Mj+k and Mj-l, the start and end points of the inner loop) are searched (lower box). The inner loop index (j+k) starts from the position where the previous examination of the MP2 values ended, and ends with j-l (i.e., when the element of the array with mass equal to MP2 minus the mass tolerance is reached). As the algorithm proceeds, the peptide mass Mi increases while the peptide mass Mj decreases (arrows). In this way, for each Mi, only elements from j+k to j-l are searched instead of the entire array (from 1 to n). This results in a significant reduction in the number of iterations needed and a significant improvement in search speed.

[Figure 1S diagram: sorted array M1 ... Mi ... Mj-l ... Mj ... Mj+k ... Mn; match condition: |Mobs − (Mi + Mj + MCL)| < mass tolerance]

Figure 2S. Dependence of the ratio of the number of iterations required for the conventional method to the number of iterations required for the fast sorting algorithm on the size of the protein database. The ratio was calculated as the number of iterations necessary for the conventional algorithm ((n+1)n/2) divided by the number of iterations used in the fast sorting algorithm for the peptide selections shown in Table 1, where n is the number of peptides selected from the protein database in each case. The calculated curve for the conventional method is shown in Figure 2Sa, and the number of iterations required is approximately 0.5 n². A linear regression on the data for the fast sorting method gives ~6.8 n iterations (Figure 2Sb).
Thus the calculated number of iterations needed is reduced by a factor of ~0.07 n (i.e., 0.5 n²/6.8 n ≈ 0.07 n), Figure 2Sc.

[Figure 2S, panel a: number of iterations without fast sort vs. n, number of peptides (0 to 1000000); fit: y = 0.500n², R² = 1]
[Figure 2S, panel b: number of iterations with fast sort vs. n, number of peptides (0 to 1000000); fit: y = 6.78n + 51786, R² = 0.9603]
[Figure 2S, panel c: ratio of iterations with and without fast sort vs. n, number of peptides (0 to 1000000); fit: y = 0.074n + 252.64, R² = 0.9805]

References
[1] Petrotchenko, E. V., Serpa, J. J., Berjanskii, M., Suriyamongkol, B. P., et al., Use of Proteinase K Non-Specific Digestion for Selective and Comprehensive Identification of Interpeptide Crosslinks: Application to Prion Proteins, Mol. Cell. Proteomics 2012, 11, M111.013524.
[2] Petrotchenko, E. V., Serpa, J. J., Borchers, C. H., An Isotopically-coded CID-cleavable biotinylated crosslinker for structural proteomics, Mol. Cell. Proteomics 2011, doi:10.1074/mcp.M110.001420.
[3] Petrotchenko, E. V., Borchers, C. H., Isotopically-Coded Cleavable CrossLinking Analysis Software Suite (ICC-CLASS) for the automated analysis of MALDI- and ESI-LC-MS-MS/MS crosslinking data, BMC Bioinformatics, submitted.