SUPPLEMENTARY MATERIAL
Clinical prediction from structural brain MRI scans: A large-scale empirical study

Mert R. Sabuncu (1,2) and Ender Konukoglu (1), for the Alzheimer’s Disease Neuroimaging Initiative*

(1) Athinoula A. Martinos Center for Biomedical Imaging, Harvard Medical School/Massachusetts General Hospital, Charlestown, MA
(2) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA
Supplementary Table S1
List of univariate measurements, such as the thickness or volume of an anatomical ROI, that
were most frequently associated with each variable of interest across the one hundred 5-fold
cross-validation sessions.
Supplementary Figure S1
Prediction performance metrics for each variable and MVPA algorithm, estimated via 5-fold
cross-validation. (Panels a-c) Binary variables; (Panel d) Continuous variables.
(Panel a) Area under the receiver operating characteristic curve (AUC), computed using
MATLAB’s perfcurve function (MATLAB 8.2, The MathWorks Inc., Natick, MA, 2013).
(Panel b) True positive rate, or sensitivity. (Panel c) True negative rate, or specificity.
(Panel d) Pearson’s correlation values. The MVPA algorithms are abbreviated as
follows: N for neighborhood approximation forest, S for SVMs, and R for RVMs. The
number after each letter denotes the feature type (1: aseg, 2: aparc, 3: aseg+aparc, 4: thick).
The shaded gray color indicates statistical significance (-log10 p-value), where the p-value
is computed via DeLong’s method and the colorbar is shown in Panel d.
Associations with a p-value less than 0.01 are shown in red.
[Figure S1: Panels (a)-(d)]
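For reference, the following MATLAB sketch illustrates how the Panel (a)-(d) quantities can be computed for a single cross-validation fold. This is not the code used in the study; the labels, scores and decision threshold below are synthetic placeholders.

    % Synthetic stand-ins for one cross-validation fold (not the study's data).
    rng(0);
    y_true  = [ones(50,1); zeros(50,1)];       % hypothetical binary ground truth
    y_score = y_true + 0.8*randn(100,1);       % hypothetical classifier scores

    % Panel (a): area under the ROC curve via perfcurve.
    [~, ~, ~, auc] = perfcurve(y_true, y_score, 1);

    % Panels (b) and (c): sensitivity and specificity at an illustrative threshold.
    y_pred = y_score > 0.5;
    sens = mean(y_pred(y_true == 1));          % true positive rate
    spec = mean(~y_pred(y_true == 0));         % true negative rate

    % Panel (d): Pearson's correlation for a continuous variable.
    y_cont = randn(100,1);
    y_hat  = y_cont + 0.5*randn(100,1);        % hypothetical regression predictions
    r = corr(y_hat, y_cont);

    % Note: DeLong's method for the AUC p-values is not a built-in MATLAB
    % function and requires a separate implementation.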
Supplementary Figure S2
The difference between the performance estimates of the three algorithms over all
predicted variables and feature types. (Panel a) Correct Classification Ratio (CCR)
differences with respect to RVM, which has the highest average accuracy. All three
classifiers are statistically equivalent. (Panel b) Normalized RMSE differences with
respect to SVM, which offers the lowest average RMSE. NAF and SVM are statistically
equivalent, and both offer significantly better regression accuracy than RVM. On each box,
the central mark is the median, the edges of the box are the 25th and 75th percentiles,
the whiskers extend to the most extreme data points not considered outliers (which
corresponds to 99.3% data coverage if the data are normally distributed), and outliers
are plotted individually. Plots were generated with MATLAB’s boxplot function, with
the ‘notch’ option turned on and default settings otherwise.
[Figure S2: Panels (a)-(b)]
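As a rough illustration, and not the study’s code, notched box plots of this kind can be produced in MATLAB as follows; the CCR matrix below is a synthetic placeholder standing in for the per-variable, per-feature performance estimates.

    % Hypothetical CCR values: rows = variable/feature-type combinations,
    % columns = classifiers in the order {NAF, SVM, RVM} (synthetic placeholders).
    rng(0);
    ccr = 0.6 + 0.1*rand(48, 3);

    % Panel (a)-style plot: notched box plots of CCR differences with respect to RVM.
    ccr_diff = [ccr(:,1) - ccr(:,3), ccr(:,2) - ccr(:,3)];
    boxplot(ccr_diff, 'Notch', 'on', 'Labels', {'NAF - RVM', 'SVM - RVM'});
    ylabel('CCR difference');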
Supplementary Figure S3
The difference between the performance estimates of the MVPA models that employ the
four image feature types, over all predicted variables and algorithms. (Panel a) Correct
Classification Ratio (CCR) differences with respect to Feature 3 (aseg+aparc),
which yields the highest average accuracy. Features 3 and 4 are statistically equivalent
and offer slightly better accuracies than Features 1 and 2. (Panel b) Normalized RMSE
differences with respect to Feature 4 (thick), which yields the lowest average NRMSE.
Features 1 and 3 yield performances that are statistically equivalent to Feature 4. On each
box, the central mark is the median, the edges of the box are the 25th and 75th percentiles,
the whiskers extend to the most extreme data points not considered outliers (which
corresponds to 99.3% data coverage if the data are normally distributed), and outliers
are plotted individually. Plots were generated with MATLAB’s boxplot function, with
the ‘notch’ option turned on and default settings otherwise.
[Figure S3: Panels (a)-(b)]
Supplementary Figure S4
Algorithm versus feature range values. For each variable, we computed the algorithm
range as the difference between the best and worst performance metrics (CCR for
classification, Panel a; NRMSE for regression, Panel b) across the three
algorithms (SVM, RVM and NAF), while fixing the feature type. These values were then
averaged over feature types. Similarly, for each variable, the feature range was defined as
the difference between the best and worst performance metrics across the four feature
types, while fixing the algorithm type. These values were then averaged over the
algorithms.
[Figure S4: Panels (a)-(b)]
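A minimal MATLAB sketch of this range computation, using a hypothetical array of synthetic CCR values rather than the study’s results, is given below.

    % Hypothetical CCR array: variables x algorithms x feature types (synthetic values).
    rng(0);
    ccr = 0.5 + 0.3*rand(12, 3, 4);

    % Algorithm range: best minus worst over the three algorithms (dimension 2),
    % then averaged over the four feature types.
    algRange = mean(squeeze(max(ccr, [], 2) - min(ccr, [], 2)), 2);

    % Feature range: best minus worst over the four feature types (dimension 3),
    % then averaged over the three algorithms.
    featRange = mean(squeeze(max(ccr, [], 3) - min(ccr, [], 3)), 2);

    % For regression, the same computation applies with the analogous NRMSE array.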