Supplemental data
Figure S-1. Determination of the threshold value. The black, blue and red lines represent the
difference between the RTP and RFP values (RTP - RFP) for SVM-All, SVM-Long and SVM-Short,
respectively. For each SVM, the ROC curve that yielded the largest AUC value was used.
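The threshold determination in Figure S-1 can be illustrated with a minimal sketch. It is not the authors' code: it assumes that RTP and RFP are the true-positive and false-positive rates along the ROC curve, and that the threshold is taken where RTP - RFP is largest. The function name `best_threshold` and the toy scores are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, RTP - RFP) for the threshold maximizing RTP - RFP.

    scores: per-residue SVM outputs (assumed: higher means more linker-like).
    labels: 1 for linker residues, 0 otherwise.
    A residue is called positive when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")

    best_t, best_diff = None, float("-inf")
    # Every distinct score is a candidate threshold (one ROC point each).
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        diff = tp / n_pos - fp / n_neg  # RTP - RFP at this threshold
        if diff > best_diff:
            best_t, best_diff = t, diff
    return best_t, best_diff


# Toy data (hypothetical): seven residues, three of them linkers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0, 0]
threshold, diff = best_threshold(scores, labels)
```

On the toy data the maximum lies at a threshold of 0.6, where all three positives and one of four negatives are selected.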
Figure S-2. Averaged AUC values calculated by five-fold cross-validation and the standard
deviation of the assessment. The bar chart shows the average of the AUC values obtained from
five-fold cross-validation tests. These values were calculated by randomly shuffling the
proteins in the test and training datasets five times. Error bars represent the standard
deviation of the assessment of each SVM.
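The procedure behind Figure S-2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each of the five shuffles yields one AUC value pooled over its five folds, and that the error bar is the standard deviation over those five values. The names `auc`, `repeated_kfold` and the `evaluate` callback are hypothetical.

```python
import random
import statistics


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(items, evaluate, k=5, repeats=5, seed=0):
    """Shuffle `items` `repeats` times and run k-fold CV each time.

    evaluate(train, test) is a hypothetical callback that trains on `train`
    and returns (scores, labels) for `test`.
    Returns (mean, standard deviation) of the per-shuffle AUC values.
    """
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        shuffled = items[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        scores, labels = [], []
        for i, test in enumerate(folds):
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            s, y = evaluate(train, test)
            scores.extend(s)
            labels.extend(y)
        per_repeat.append(auc(scores, labels))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)


# Toy stand-in for an SVM: the "score" is simply the stored feature value.
def evaluate(train, test):
    return [x for x, _ in test], [y for _, y in test]


items = [(0.1, 0), (0.2, 0), (0.35, 1), (0.4, 0), (0.5, 1),
         (0.6, 0), (0.7, 1), (0.8, 1), (0.9, 1), (0.3, 0)]
mean_auc, sd_auc = repeated_kfold(items, evaluate)
```

Because the toy scorer ignores the training set, every shuffle gives the same pooled AUC and the standard deviation is zero; a real classifier would vary between shuffles.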
Table S-I. Characteristics of SVMs with RBF and linear kernel.
Kernel    AUC value    Training time [sec]    Prediction time [sec]
RBF       0.653        1.57                   123.44
Linear    0.687        0.26                   0.01
The SVMs were trained with the DS-All data set. The SVM with an RBF kernel function
was trained with a window size of 13 and a γ of 0.01. The SVM with a linear kernel was trained
with a window size of 13. AUC values were obtained from the prediction results of each SVM
with a 5-fold cross-validation test using all of the linker sequences contained in DS-All. The
prediction and training times were measured for a 1000-residue sequence.
Table S-II. Dependence of the AUC value on training conditions.
(a) Training window size (entries are AUC values)
Window size    SVM-All    SVM-Long    SVM-Short
5              0.63301    0.64392     0.60388
9              0.67908    0.69613     0.5821
13             0.68657    0.70826     0.58606
17             0.68458    0.71206     0.5896
21             0.68337    0.70651     0.58058
25             0.6826     0.69387     0.57528
29             0.67953    0.68548     0.56506
(b) Smoothing window size (entries are AUC values)
Smoothing window size    SVM-All    SVM-Long    SVM-Short
5                        0.69202    0.72288     0.60828
9                        0.69345    0.7282      0.59048
13                       0.69093    0.72999     0.55635
17                       0.68516    0.72935     0.51879
21                       0.67909    0.72482     0.49677
25                       0.67402    0.71819     0.48987
29                       0.6692     0.70966     0.49744
33                       0.66504    0.70069     0.50413
37                       0.65873    0.68958     0.49726
AUC dependence on (a) the training window size and (b) the smoothing window size. The AUC
value was calculated with a 5-fold cross-validation test. The SVM performances were assessed with
DS-All.
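The smoothing window in part (b) can be pictured with a short sketch. The supplement does not state how smoothing was performed, so this assumes a centered moving average over the per-residue SVM outputs, with the window truncated at the sequence ends. The function name `smooth` is hypothetical.

```python
def smooth(scores, window):
    """Centered moving average of per-residue scores.

    window: odd number of residues averaged around each position.
    Near the termini the window is truncated to the residues that exist
    (an assumption; other edge treatments are possible).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# A single sharp peak is spread over its neighbours.
example = smooth([0, 0, 3, 0, 0], 3)
```

Under this assumption, a wide window flattens narrow peaks, which would be in line with the drop in AUC seen for SVM-Short at larger smoothing windows.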
Table S-III: Rank dependent prediction performance of the SVM predictors.
rank 1
Predictor    Sensitivity    Specificity
SVM-Long     0.440          0.549
SVM-Joint    0.458          0.556
DSC          0.393          0.419

rank 2
Predictor    Sensitivity    Specificity
SVM-Long     0.560          0.425
SVM-Joint    0.597          0.436
DSC          0.569          0.322

rank 3
Predictor    Sensitivity    Specificity
SVM-Long     0.611          0.371
SVM-Joint    0.685          0.357
DSC          0.718          0.285
The top one, two and three ranked predicted domain linkers were used for calculating
sensitivity and specificity. Details of the calculation are given in Table II.
Table S-IV: Termini offset dependent prediction performance of the SVM predictors.
Predictor    Offset    Sensitivity    Specificity
SVM-All      0         0.532          0.345
SVM-All      20        0.556          0.367
SVM-All      40        0.56           0.409
SVM-All      60        0.532          0.424
SVM-Long     0         0.546          0.368
SVM-Long     20        0.551          0.373
SVM-Long     40        0.56           0.425
SVM-Long     60        0.546          0.456
SVM-Short    0         0.426          0.261
SVM-Short    20        0.449          0.284
SVM-Short    40        0.5            0.344
SVM-Short    60        0.486          0.367
SVM-Joint    0         0.574          0.375
SVM-Joint    20        0.574          0.378
SVM-Joint    40        0.597          0.436
SVM-Joint    60        0.579          0.461
The prediction performances are calculated by removing 0 (none) to 60 residues at the N and
C termini of each protein sequence. The values reported in Tables I-III (and throughout the
paper) use an offset of 40 residues, which is the value used in Armadillo. The sensitivity and
specificity are calculated using predicted domain linkers with ranks 1 and 2. Details of the
calculation are the same as those reported in Table II.
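The termini offset can be illustrated with a minimal sketch. It assumes that "removing" residues means discarding any predicted linker position that falls within the offset of either terminus; the supplement does not spell this out. The function name `trim_termini` and the example positions are hypothetical.

```python
def trim_termini(predictions, length, offset):
    """Keep predicted linker positions outside the terminal regions.

    predictions: 1-based residue positions of predicted linkers.
    length: number of residues in the protein sequence.
    offset: number of residues ignored at each of the N and C termini.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return [p for p in predictions if offset < p <= length - offset]


# Hypothetical predictions for a 300-residue protein.
predicted = [10, 41, 150, 260, 261, 295]
kept = trim_termini(predicted, length=300, offset=40)
```

With an offset of 40, only positions 41 to 260 of the 300-residue example remain eligible; an offset of 0 keeps every prediction.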