Supplemental data

Figure S-1. Determination of the threshold value. The black, blue and red lines represent the difference RTP − RFP for SVM-All, SVM-Long and SVM-Short, respectively. For each SVM, the ROC curve that yields the largest AUC value was used.

Figure S-2. Average AUC values from five-fold cross-validation and the standard deviation of the assessment. The bar chart shows the average of the AUC values obtained from five-fold cross-validation tests. These values were obtained by randomly shuffling the proteins in the training and test datasets five times. Error bars represent the standard deviation of the assessment of each SVM.

Table S-I. Characteristics of SVMs with RBF and linear kernels.

Kernel   AUC value   Training time [sec]   Prediction time [sec]
RBF      0.653       1.57                  123.44
Linear   0.687       0.26                  0.01

The SVMs were trained with the DS-All data set. The SVM with an RBF kernel function was trained with a window size of 13 and an RBF kernel parameter of 0.01. The SVM with a linear kernel was trained with a window size of 13. AUC values were obtained from the prediction results of each SVM in a 5-fold cross-validation test using all of the linker sequences contained in DS-All. Training and prediction times were measured for a 1000-residue sequence.

Table S-II. Dependence of the AUC value on training conditions.
(a) Training window size

Window size   SVM-All   SVM-Long   SVM-Short
5             0.63301   0.64392    0.60388
9             0.67908   0.69613    0.58210
13            0.68657   0.70826    0.58606
17            0.68458   0.71206    0.58960
21            0.68337   0.70651    0.58058
25            0.68260   0.69387    0.57528
29            0.67953   0.68548    0.56506

(b) Smoothing window size

Smoothing window size   SVM-All   SVM-Long   SVM-Short
5                       0.69202   0.72288    0.60828
9                       0.69345   0.72820    0.59048
13                      0.69093   0.72999    0.55635
17                      0.68516   0.72935    0.51879
21                      0.67909   0.72482    0.49677
25                      0.67402   0.71819    0.48987
29                      0.66920   0.70966    0.49744
33                      0.66504   0.70069    0.50413
37                      0.65873   0.68958    0.49726

AUC dependence on (a) the training window size and (b) the smoothing window size. AUC values were calculated with a 5-fold cross-validation test, and SVM performance was assessed with DS-All.

Table S-III. Rank-dependent prediction performance of the SVM predictors.

Rank   Predictor   Sensitivity   Specificity
1      SVM-Long    0.440         0.549
1      SVM-Joint   0.458         0.556
1      DSC         0.393         0.419
2      SVM-Long    0.560         0.425
2      SVM-Joint   0.597         0.436
2      DSC         0.569         0.322
3      SVM-Long    0.611         0.371
3      SVM-Joint   0.685         0.357
3      DSC         0.718         0.285

The top one, two and three ranked predicted domain linkers (Rank 1, 2 and 3, respectively) were used to calculate sensitivity and specificity. Details of the calculation are given in Table II.

Table S-IV. Termini-offset-dependent prediction performance of the SVM predictors.

Predictor   Offset [residues]   Sensitivity   Specificity
SVM-All     0                   0.532         0.345
SVM-All     20                  0.556         0.367
SVM-All     40                  0.560         0.409
SVM-All     60                  0.532         0.424
SVM-Long    0                   0.546         0.368
SVM-Long    20                  0.551         0.373
SVM-Long    40                  0.560         0.425
SVM-Long    60                  0.546         0.456
SVM-Short   0                   0.426         0.261
SVM-Short   20                  0.449         0.284
SVM-Short   40                  0.500         0.344
SVM-Short   60                  0.486         0.367
SVM-Joint   0                   0.574         0.375
SVM-Joint   20                  0.574         0.378
SVM-Joint   40                  0.597         0.436
SVM-Joint   60                  0.579         0.461

Prediction performance was calculated after removing zero (none) to 60 residues at the N and C termini of each protein sequence. The values reported in Tables I to III (and throughout the paper) use an offset of 40 residues, which is the value used in Armadillo.
Sensitivity and specificity were calculated using the predicted domain linkers ranked 1 and 2. Details of the calculation are the same as those reported in Table II.
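Figure S-1 uses the difference RTP − RFP to determine the threshold value. The Python sketch below shows one way such a threshold could be chosen. It assumes, beyond what the caption states, that RTP and RFP are the true- and false-positive rates along the ROC curve, that a residue is predicted as a linker when its score is at or above the threshold, and that the chosen threshold is the one maximizing RTP − RFP. The function `best_threshold` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, RTP - RFP) for the threshold maximizing RTP - RFP.

    Hypothetical helper. A residue is predicted as a linker when its score is
    >= threshold; labels use 1 for linker residues and 0 otherwise.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("labels must contain both linker and non-linker residues")

    best_t, best_diff = float("nan"), float("-inf")
    # Try every distinct score as a candidate threshold (a simple O(n^2) scan).
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != 1)
        rtp, rfp = tp / n_pos, fp / n_neg  # true- and false-positive rates
        if rtp - rfp > best_diff:
            best_t, best_diff = t, rtp - rfp
    return best_t, best_diff
```

For example, `best_threshold([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0])` returns a threshold of 0.8, where RTP − RFP = 2/3.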
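Figure S-2 averages AUC values over five random shuffles of the proteins between the training and test sets for five-fold cross-validation. A minimal Python sketch of the two bookkeeping steps follows, assuming that folds are formed at the protein level and that the reported spread is a sample standard deviation (neither detail is stated above). The names `protein_folds` and `summarize_auc` are hypothetical, and SVM training and AUC computation are omitted.

```python
import random
import statistics
from typing import Dict, List, Sequence


def protein_folds(protein_ids: Sequence[str], k: int = 5, seed: int = 0) -> List[List[str]]:
    """Randomly partition protein IDs into k folds (hypothetical helper).

    Splitting by protein keeps every residue of a protein in the same fold,
    so no protein contributes to both training and test data.
    """
    ids = list(protein_ids)
    random.Random(seed).shuffle(ids)  # use a different seed for each repeated shuffle
    return [ids[i::k] for i in range(k)]


def summarize_auc(aucs: Sequence[float]) -> Dict[str, float]:
    """Mean and sample standard deviation of AUC values from repeated runs."""
    return {"mean": statistics.mean(aucs), "sd": statistics.stdev(aucs)}
```

Calling `protein_folds` with seeds 0 to 4 would give five partitions; the AUC values from the repeated runs would then be passed to `summarize_auc`.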
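Table S-II(b) varies the size of a smoothing window. As an illustration only, the Python sketch below assumes the smoothing is a centered moving average of per-residue SVM output values, truncated at the sequence ends; the smoothing function actually used is not specified here, and `smooth_scores` is a hypothetical name.

```python
from typing import List, Sequence


def smooth_scores(scores: Sequence[float], window: int) -> List[float]:
    """Centered moving average of per-residue scores (hypothetical helper).

    The window must be odd so it is centered on each residue; near the
    termini, only the residues that exist inside the window are averaged.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        segment = scores[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

With a window of 3, `smooth_scores([0.0, 3.0, 0.0, 3.0, 0.0], 3)` gives `[1.5, 1.0, 2.0, 1.0, 1.5]`. All window sizes in Table S-II(b) are odd, so a centered window is well defined for each of them.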
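Table S-IV removes 0 to 60 residues at each terminus before assessing predictions. A minimal Python sketch of that filtering step is shown below, assuming each predicted linker is represented by the 0-based index of one residue and that predictions falling inside the excluded terminal regions are simply discarded. The function `drop_terminal_predictions` is a hypothetical name.

```python
from typing import List, Sequence


def drop_terminal_predictions(positions: Sequence[int], seq_len: int, offset: int = 40) -> List[int]:
    """Keep predicted linker positions (0-based) that lie at least `offset`
    residues away from both the N and C termini (hypothetical helper)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return [p for p in positions if offset <= p < seq_len - offset]
```

With the default offset of 40, `drop_terminal_predictions([10, 45, 120, 170], 200)` keeps `[45, 120]`; an offset of 0 keeps every prediction.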