file - BioMed Central

Additional File 1
Appendix A Processes used to develop the predictive model. (a) In the model
development phase, bias-controlled datasets were generated from the Tokyo dataset by
randomly sampling individuals with replacement. Each generated dataset
contained approximately equal numbers of AxLN-negative and AxLN-positive patients.
ADTree models were then developed using the generated datasets under different
conditions, such as the number of nodes in an ADTree and the number of ADTrees in a
prediction model. The model yielding the best area under the receiver operating
characteristic (ROC) curve (AUC) value on the Kyoto dataset was selected. (b) The
Seoul dataset, which was not used in the development phase, was used to evaluate the
selected model.
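The bias-controlled sampling step can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the field name axln_positive, and the choice of drawing exactly the same number of patients from each class are hypothetical (the caption says only "approximately equal").

```python
import random

def balanced_bootstrap(records, label_key="axln_positive", n_per_class=None, seed=0):
    """Draw a class-balanced sample with replacement.

    records     : list of dicts, one per patient (hypothetical layout)
    label_key   : name of the boolean outcome field (assumed name)
    n_per_class : patients drawn from each class; defaults to the
                  size of the larger class (an assumption)
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    if n_per_class is None:
        n_per_class = max(len(positives), len(negatives))
    # random.choices samples with replacement, so the same patient
    # can appear more than once in the generated dataset.
    sample = rng.choices(positives, k=n_per_class) + rng.choices(negatives, k=n_per_class)
    rng.shuffle(sample)
    return sample

# Toy data: 10 positive and 90 negative patients (made-up numbers).
toy = [{"id": i, "axln_positive": i < 10} for i in range(100)]
generated = balanced_bootstrap(toy, seed=1)
```

Calling the function with different seeds would yield the several generated datasets from which candidate models are built.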
Appendix B ADTree-based prediction models. One of the ADTree-based prediction
models is shown in Figure 1. The final prediction score represents the mean score of the
five ADTree models. The probability of lymph node metastasis (%) is determined using
the formula (score_pred – score_min) / (score_max – score_min) × 100, where score_pred
is the predicted final score and score_min and score_max are the theoretical minimum
and maximum final scores, respectively.
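As a minimal sketch of this formula, the function below averages the per-tree scores and rescales the mean to a percentage. The function name and the numeric inputs are illustrative only; the actual theoretical minimum and maximum scores are not given in this file.

```python
def metastasis_probability(tree_scores, score_min, score_max):
    """Rescale the mean ADTree score to a probability in percent.

    tree_scores : scores of the individual ADTree models
    score_min   : theoretical minimum of the final (mean) score
    score_max   : theoretical maximum of the final (mean) score
    """
    # Final prediction score = mean of the individual tree scores.
    score_pred = sum(tree_scores) / len(tree_scores)
    return (score_pred - score_min) / (score_max - score_min) * 100.0

# Made-up values: five tree scores averaging 0.0, range -1.0 to +1.0.
example = metastasis_probability([0.0, 0.0, 0.0, 0.0, 0.0], -1.0, 1.0)
```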
Appendix C Calculation of the predictive score in each ADTree model. An individual
tree’s score is determined by taking the value of the top node and adding those of the
applicable child nodes. When child nodes are linked to a parent node by a dashed line,
the scores of all child nodes are included in the calculation. When child nodes are linked
to a parent node by a solid line, only the score of the child node that fulfills the branching
condition is included in the calculation. The nodes in red were involved in the
calculation when a patient had the following features: A, No; B, No; C, 1; D, Z; E, Yes;
and F, 5. When a node has a missing value, the range of the predictive score can be
calculated by considering both child branches; e.g., when F is missing, the
range is from –0.2 to 0.6.
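The scoring rule can be sketched as a small recursive function. The class names and the tree layout below are assumptions made for illustration: the node values are chosen so that the example patient scores +0.6 and a missing F gives the range –0.2 to 0.6, but the arrangement of nodes is hypothetical and not taken from the figure.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Prediction:
    """Node carrying a score; all of its decision nodes are evaluated (dashed links)."""
    value: float
    splitters: List["Splitter"] = field(default_factory=list)

@dataclass
class Splitter:
    """Decision node; only the branch matching the condition is followed (solid links)."""
    feature: str
    test: Callable[[object], bool]
    if_true: Prediction
    if_false: Prediction

def score_range(node, patient):
    """Return (lowest, highest) possible score; equal when no value is missing."""
    lo = hi = node.value
    for s in node.splitters:
        x = patient.get(s.feature)
        if x is None:
            # Missing value: consider both branches and keep the extremes.
            t_lo, t_hi = score_range(s.if_true, patient)
            f_lo, f_hi = score_range(s.if_false, patient)
            lo += min(t_lo, f_lo)
            hi += max(t_hi, f_hi)
        else:
            branch = s.if_true if s.test(x) else s.if_false
            b_lo, b_hi = score_range(branch, patient)
            lo += b_lo
            hi += b_hi
    return lo, hi

# Hypothetical layout (assumption); values mirror the worked example.
node_e = Splitter("E", lambda v: v, Prediction(-0.3), Prediction(0.5))
node_c = Splitter("C", lambda v: v >= 2, Prediction(-0.4), Prediction(0.4))
node_d = Splitter("D", lambda v: v in ("X", "Y"), Prediction(-0.4), Prediction(0.2))
node_f = Splitter("F", lambda v: v > 4, Prediction(0.5), Prediction(-0.3))
node_a = Splitter("A", lambda v: v, Prediction(1.0, [node_e]), Prediction(-0.2, [node_c, node_d]))
node_b = Splitter("B", lambda v: v, Prediction(0.5), Prediction(-0.2, [node_f]))
root = Prediction(-0.1, [node_a, node_b])

patient = {"A": False, "B": False, "C": 1, "D": "Z", "E": True, "F": 5}
full = score_range(root, patient)
missing_f = score_range(root, {**patient, "F": None})
```

With all features present both ends of the range coincide; with F missing the two ends differ by the spread between the F branches.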
Appendix D Calibration plots of the ADTree-based model for the Kyoto (a) and Seoul
(b) datasets. Patients were divided into quintiles according
to their predicted probability, and the mean predicted probability and the actual
frequency of lymph node metastasis were plotted for each quintile (triangles). A
polynomial formula for calibration correction was developed using the Kyoto dataset
and applied to the Seoul dataset. Circular dots show the corrected prediction.
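One way to sketch the quintile calibration and the polynomial correction is shown below. Several details are assumptions, since the caption does not state them: the polynomial degree (2 here), fitting to the quintile points rather than to individual patients, and clipping the corrected value to the range 0 to 1. The function names are hypothetical.

```python
def quintile_points(pred, outcome, n_bins=5):
    """Return (mean predicted probability, observed frequency) per quintile."""
    pairs = sorted(zip(pred, outcome))
    n = len(pairs)
    points = []
    for k in range(n_bins):
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_freq = sum(o for _, o in chunk) / len(chunk)
        points.append((mean_pred, obs_freq))
    return points

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients in ascending powers."""
    n = degree + 1
    # Normal equations A c = b for the monomial basis.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def make_correction(points, degree=2):
    """Fit on one dataset's quintile points; return a function usable on another."""
    coeffs = polyfit([p for p, _ in points], [o for _, o in points], degree)
    def correct(p):
        value = sum(c * p ** i for i, c in enumerate(coeffs))
        return min(1.0, max(0.0, value))  # keep the result a valid probability
    return correct

# Made-up development data: observed frequency follows p squared.
dev_points = [(0.1, 0.01), (0.3, 0.09), (0.5, 0.25), (0.7, 0.49), (0.9, 0.81)]
correct = make_correction(dev_points)
```

The correction function fitted on the development points would then be applied unchanged to predictions from the validation dataset.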
Appendix E AUC values and the number of nodes in the pruning analysis. The X-axis
shows the number of ADTrees in the prediction model. Black, white and gray bars
represent the AUC values of the Tokyo (cross validation), Kyoto and Seoul datasets,
respectively. The line shows the number of nodes in the prediction model. For cross
validation, the Tokyo dataset was randomly split in a 9:1 ratio. A model was
developed using the larger subset and evaluated using the smaller subset. This process
was repeated 10 times and the AUC was calculated based on the evaluated data.
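The cross-validation procedure can be sketched as below. This is one possible reading, not necessarily the authors' exact protocol: the data are shuffled once and partitioned into 10 folds (each giving a 9:1 split), and a single AUC is computed on the pooled held-out predictions. Repeating independent random 9:1 splits would be an equally plausible reading. The fit argument is a hypothetical stand-in for ADTree training.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(data, fit, n_folds=10, seed=0):
    """data: list of (features, label); fit: train_list -> scoring function."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    labels, scores = [], []
    for k, held_out in enumerate(folds):
        # Train on the other nine folds, evaluate on the held-out fold.
        train = [row for j, fold in enumerate(folds) if j != k for row in fold]
        model = fit(train)
        for x, y in held_out:
            labels.append(y)
            scores.append(model(x))
    # Pool all held-out predictions and compute one AUC (assumption).
    return auc(labels, scores)

# Toy example: the single feature separates the classes perfectly,
# and the "model" simply returns the feature as its score.
toy = [(i, int(i >= 50)) for i in range(100)]
cv_auc = cross_validated_auc(toy, fit=lambda train: (lambda x: x))
```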
Appendix F ROC curves of the ADTree model, the MSKCC nomogram and the
Russells Hall Hospital scoring system using the Seoul dataset (n = 131). The AUC
values were 0.777 (95% CI: 0.689–0.864, P < 0.001) for the ADTree model, 0.664 (95%
CI: 0.560–0.768, P = 0.0033) for the MSKCC nomogram and 0.620 (95% CI:
0.509–0.731, P = 0.0032) for the Russells Hall Hospital scoring system.
Appendix A
[Figure: flowchart. (a) Model development phase: Start, Tokyo dataset, bias-controlled dataset generation, bias-controlled datasets, model development, model selection (Kyoto dataset). (b) Validation phase: Seoul dataset, validation.]
Appendix B
[Figure: four ADTree diagrams, panels (a) to (d). Decision nodes in each panel are listed below in their numbered order; y/n branches lead to scored prediction nodes.]
(a) 1: US tumor, echogenic halo; 2: MMG calcification, distribution (grouped/regional, linear/segmental); 3: PE, skin dimpling; 4: US tumor, D/W ratio ≥ 0.84; 5: US tumor, max size ≥ 2.9 cm; 6: MMG calcification, figure (fine branching, pleomorphic, amorphous, small round); 7: US LN, max size ≥ 1.1 cm; 8: US tumor, number (multifocal, none/one); 9: Age ≥ 60 years; 10: US tumor, D/W ratio ≥ 0.54; 11: US tumor, max size ≥ 0.9 cm; 12: US tumor, interruption of anterior border; 13: US tumor, max size ≥ 1.7 cm.
(b) 1: US LN, max size ≥ 1.1 cm; 2: US tumor, D/W ratio ≥ 0.66; 3: US tumor, max size ≥ 2.4 cm; 4: PE, nipple discharge; 5: Pathology, HER2 ≥ 0.250; 6: US tumor, interruption of anterior border; 7: US tumor, max size ≥ 1.9 cm; 8: MMG calcification, distribution (linear/segmental, grouped/regional); 9: Age ≥ 50 years; 10: Age ≥ 40 years; 11: Histological grade ≥ 2; 12: PE, skin dimpling; 13: Age ≥ 80 years.
Appendix B (continued)
(c) 1: US LN, max size ≥ 1.1 cm; 2: US tumor, echogenic halo; 3: PE, skin dimpling; 4: US tumor, D/W ratio ≥ 0.509; 5: US tumor, max size ≥ 2.5 cm; 6: US tumor, max size ≥ 1.2 cm; 7: MMG calcification, figure (amorphous/small round, fine branching/pleomorphic); 8: Age ≥ 60 years; 9: US tumor, number ≥ 0.750; 10: US tumor, D/W ratio ≥ 0.68; 11: Histological grade = 3; 12: Histological grade ≥ 2; 13: US tumor, max size ≥ 2.5 cm.
(d) 1: US LN, max size ≥ 1.1 cm; 2: PE, skin dimpling; 3: US tumor, echogenic halo; 4: Age ≥ 50 years; 5: BMI ≥ 25; 6: US tumor, D/W ratio ≥ 0.71; 7: US tumor, max size ≥ 2.5 cm; 8: US tumor, max size ≥ 1.6 cm; 9: US tumor, max size ≥ 1.7 cm; 10: MMG calcification, figure (fine branching/pleomorphic/amorphous, small round); 11: US tumor, interruption of anterior border; 12: US tumor, max size ≥ 1.5 cm; 13: MMG calcification, figure (pleomorphic/amorphous/small round, fine branching).
Appendix C
[Figure: example ADTree with root score –0.1 and decision nodes A (y/n), B (y/n), C (≥2/<2), D (X, Y/Z), E (n/y) and F (>4/≤4).]
Worked example: –0.1 (Root) – 0.2 (A) – 0.2 (B) + 0.4 (C) + 0.2 (D) + 0.5 (F) = +0.6
Appendix D
[Figure: calibration plots for (a) Kyoto and (b) Seoul. X-axis: predicted probability (0.0 to 1.0). Y-axis: actual frequency (0.0 to 1.0). Series: prediction, corrected prediction, ideal.]
Appendix E
[Figure: bar and line chart. X-axis: number of ADTrees (1 to 5). Y-axes: AUC (0.60 to 0.90) and number of nodes (0 to 16). Bars: Tokyo (CV), Kyoto, Seoul.]
Appendix F
[Figure: ROC curves. X-axis: 100 – specificity (%). Y-axis: sensitivity (%). Curves: ADTree, nomogram, scoring.]