Multimodal MR Imaging Model to Predict Tumor Infiltration in Patients with Gliomas

APPENDIX (On-line Only)

APPENDIX 1. STATISTICAL METHODS

Overview: As previously indicated, the goal of our study was to evaluate whether data from a combination of advanced MR imaging techniques could be used to predict tumor infiltration in patients with gliomas better than a single imaging technique. A three-step analytical modeling process was undertaken to achieve this goal.

Principal component analyses: Because of the small sample size and the large number of imaging variables, the first step in the model building process was to reduce the dimensionality of the imaging dataset without losing important information about each of the 12 imaging variables. This step was accomplished by way of a principal components analysis (PCA). PCA uses orthogonal projections of the multivariate data to produce a set of linearly independent principal component composite scores (pc-composite scores), which together capture all of the information associated with the complete multivariate dataset. The number of orthogonal projections is equal to the number of variables in the multivariate dataset. Thus, for the study at hand we used a PCA to generate 12 pc-composite scores per nuclear density measurement. Each pc-composite score was derived as a linear combination of the values of the precontrast MPRAGE, T2, FLAIR, DWI, DTI, DSC, PWI, post-contrast MPRAGE, and axial T1 spin echo imaging variables associated with the nuclear density measurement.

Supplemental Table 1. Principal component non-standardized coefficients (i.e., the weights given to the individual predictors)

| Predictor | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 | PC12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T1 | 0.05830 | 0.44323 | -0.86202 | 0.23875 | -0.00630 | -0.00117 | 0.00470 | -0.00062 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| fa | 0.00082 | -0.00090 | -0.00103 | 0.00421 | 0.01733 | -0.01466 | -0.17951 | 0.98348 | 0.00184 | 0.00022 | -0.00179 | 0.00006 |
| fa num | 0.00000 | 0.00000 | 0.00000 | 0.00001 | 0.00000 | -0.00005 | -0.00057 | 0.00086 | 0.22320 | 0.37934 | 0.82340 | 0.35819 |
| fa denom | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00004 | -0.00016 | -0.00123 | 0.32903 | 0.77806 | -0.23945 | -0.47857 |
| Mean Diff. | 0.00000 | 0.00000 | 0.00000 | 0.00000 | -0.00001 | 0.00006 | 0.00016 | -0.00117 | 0.08694 | 0.29929 | -0.51017 | 0.80162 |
| T2 | -0.96708 | -0.19246 | -0.13901 | 0.09161 | 0.00199 | 0.00114 | -0.00069 | -0.00006 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| K2 | -0.00001 | -0.00001 | -0.00002 | -0.00007 | 0.00010 | 0.00004 | 0.00073 | 0.00187 | -0.91343 | 0.40144 | 0.06639 | -0.00857 |
| rCBF | -0.00681 | 0.01967 | 0.01531 | 0.00064 | -0.04356 | 0.04308 | 0.98131 | 0.18056 | 0.00117 | 0.00004 | 0.00015 | 0.00002 |
| rCBV corr. | -0.19794 | 0.57266 | 0.06273 | -0.78983 | -0.06910 | -0.00826 | -0.01643 | 0.00223 | 0.00004 | -0.00002 | -0.00001 | 0.00000 |
| rCBV uncorr. | -0.14875 | 0.66192 | 0.48231 | 0.55080 | 0.05720 | 0.00823 | -0.01899 | -0.00548 | -0.00007 | 0.00002 | 0.00000 | 0.00000 |
| rMTT | -0.00042 | -0.00030 | 0.01033 | 0.02728 | -0.19516 | -0.97970 | 0.03506 | -0.00487 | -0.00003 | 0.00005 | -0.00005 | 0.00000 |
| TTP | 0.00336 | -0.00584 | 0.02636 | 0.08148 | -0.97551 | 0.19489 | -0.05400 | 0.00991 | -0.00012 | 0.00003 | -0.00001 | -0.00001 |

Univariate analyses: The second step in the model building process was to identify the pc-composite scores that were linearly associated with nuclear density. This step was accomplished by conducting a set of univariate generalized estimating equation regression (GEER) analyses in which, for each of the 12 sets of pc-composite scores, nuclear density served as the GEER model response variable and the set of pc-composite scores served as the GEER model predictor variable.
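As a rough illustration of how a pc-composite score is formed from the weights in Supplemental Table 1, the sketch below stores the PC1 and PC2 coefficient columns as plain Python lists and computes a score as a weighted sum of the 12 imaging values. The names `PREDICTORS`, `dot`, and `pc_score` are hypothetical, and the use of raw (uncentered) imaging values is an assumption based on the form of Equations 2 and 3. The original analysis was run in Spotfire Splus, so this is not the authors' code.

```python
# Predictor order shared by both coefficient lists (as in Supplemental Table 1).
PREDICTORS = ["T1", "fa", "fa num", "fa denom", "mean diff", "T2",
              "K2", "rCBF", "rCBV corr", "rCBV uncorr", "rMTT", "TTP"]

# Non-standardized coefficients for PC1 and PC2, copied from Supplemental Table 1.
PC1 = [0.05830, 0.00082, 0.00000, 0.00000, 0.00000, -0.96708,
       -0.00001, -0.00681, -0.19794, -0.14875, -0.00042, 0.00336]
PC2 = [0.44323, -0.00090, 0.00000, 0.00000, 0.00000, -0.19246,
       -0.00001, 0.01967, 0.57266, 0.66192, -0.00030, -0.00584]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a * b for a, b in zip(u, v))


def pc_score(coefficients, measurement):
    """Weighted sum of the imaging values for one nuclear density measurement.

    Assumption: raw values are used without centering or scaling.
    `measurement` must follow the order given in PREDICTORS.
    """
    if len(measurement) != len(PREDICTORS):
        raise ValueError("expected one value per predictor")
    return dot(coefficients, measurement)
```

With these lists, `dot(PC1, PC1)` and `dot(PC2, PC2)` are both close to 1 and `dot(PC1, PC2)` is close to 0, which is what orthogonal projections of unit length should give, up to the rounding of the tabulated coefficients.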
With regard to the GEER model specification, the Gaussian distribution was the assumed underlying distribution of the nuclear density measurements. Because each patient had multiple nuclear density measurements, the GEE model variance-covariance parameter estimates were derived by way of the Huber and White sandwich variance-covariance estimator, which accounts for intra-patient measurement correlation in the hypothesis testing process. With regard to hypothesis testing, a p ≤ 0.05 decision rule was used as the criterion for rejecting the null hypothesis that there was no linear association between the pc-composite score and the nuclear density measurement. The adequacy of the univariate model to predict nuclear density was assessed via the coefficient of determination (R²).

Supplemental Figure 1. Principal component coefficients for PC1-PC12. [Figure: "Principal Component Coefficients"; 12 panels (PC 1 to PC 12), each showing the coefficients of T1, fa, fa num, fa denom, mean diff, T2, K2, rCBF, rCBV corr, rCBV uncorr, rMTT, and TTP on a scale from -1.0 to 1.0.]

Supplemental Table 2. Univariate generalized estimating equation analysis (modeling only marginal associations).

| Predictor | Degrees of Freedom | Chi-Square | P-value | R² |
|---|---|---|---|---|
| PCS 1 | 1 | 0.16 | 0.688 | 0.01 |
| PCS 2 | 1 | 1.36 | 0.244 | 0.04 |
| PCS 3 | 1 | 0.68 | 0.408 | 0.01 |
| PCS 4 | 1 | 0.78 | 0.376 | 0.01 |
| PCS 5 | 1 | 0.03 | 0.854 | 0.00 |
| PCS 6 | 1 | 5.31 | 0.021 | 0.05 |
| PCS 7 | 1 | 1.16 | 0.281 | 0.05 |
| PCS 8 | 1 | 0.56 | 0.453 | 0.02 |
| PCS 9 | 1 | 0.00 | 0.981 | 0.00 |
| PCS 10 | 1 | 5.75 | 0.016 | 0.08 |
| PCS 11 | 1 | 0.06 | 0.811 | 0.00 |
| PCS 12 | 1 | 3.97 | 0.046 | 0.19 |

Multivariate analyses: The third and final step in the model building process was to use the pc-composite scores that were found to be statistically associated with nuclear density in step 2 to build a multivariate model to predict nuclear density. Two multivariate GEER models were examined. The first model included the pc-composite scores that were found to be associated with nuclear density in step 2 of the model building process, while the second model included linear and nonlinear restricted cubic spline functions of the same pc-composite scores. As in the univariate analyses, the Gaussian distribution was the assumed underlying distribution of the nuclear density measurements, and the GEE model variance-covariance parameter estimates were derived by way of the Huber and White sandwich variance-covariance estimator. With regard to hypothesis testing, the GEE-modified version of the generalized Wald test was used to compare the two models.
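The P-values in Supplemental Table 2 can be cross-checked against the reported chi-square statistics. The sketch below assumes that each statistic is referred to a chi-square distribution with 1 degree of freedom, as the table's degrees-of-freedom column suggests; the function and variable names are hypothetical. It uses the identity P(χ²₁ > x) = erfc(√(x/2)), so only the Python standard library is needed.

```python
import math


def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # For 1 df, chi-square is a squared standard normal, so the tail
    # probability equals the two-sided normal tail: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# (chi-square, reported P-value) for the three scores with p <= 0.05,
# copied from Supplemental Table 2.
REPORTED = {
    "PCS 6": (5.31, 0.021),
    "PCS 10": (5.75, 0.016),
    "PCS 12": (3.97, 0.046),
}


def recomputed_pvalues(reported=REPORTED):
    """Recompute each P-value from its rounded chi-square statistic."""
    return {name: chi2_1df_pvalue(chi2) for name, (chi2, _p) in reported.items()}
```

Because the tabulated statistics are rounded to two decimals, the recomputed values agree with the table only approximately, and the agreement is loosest for statistics near zero, where the P-value changes quickly.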
A p ≤ 0.05 decision rule was used as the criterion for rejecting the null hypothesis that the predictive information gained by allowing for non-linear associations between the pc-composite scores and nuclear density was no greater than what would be expected by chance alone. The same hypothesis testing strategy was used to determine which pc-composite scores were included in the final multivariate GEER model. The final regression model is described by Equation 1 (Eq. 1), where E(Density | X) denotes the predicted nuclear density given the values of PC10 and PC12, which are given in Equations 2 and 3, respectively.

Eq 1.

$$
\begin{aligned}
E(\text{Density} \mid X) ={} & 1857.0969 + 322.9077\,(\mathrm{PC10} \times 1000) \\
& - 1996.2808\,(\mathrm{PC10} \times 1000 - 5.2251)_+^3 \\
& + 5038.5490\,(\mathrm{PC10} \times 1000 - 5.4618)_+^3 \\
& - 3043.2683\,(\mathrm{PC10} \times 1000 - 5.6171)_+^3 \\
& - 1100.6736\,(\mathrm{PC12} \times 10000) \\
& + 14148.0490\,(\mathrm{PC12} \times 10000 + 0.5485)_+^3 \\
& - 24336.1410\,(\mathrm{PC12} \times 10000 + 0.4333)_+^3 \\
& + 10188.0910\,(\mathrm{PC12} \times 10000 + 0.2734)_+^3
\end{aligned}
$$

Note that $(u)_+^3 = 0$ if the quantity $u$ within the parentheses is $\le 0$; otherwise $(u)_+^3 = u^3$. Here

Eq 2.

$$
\begin{aligned}
\mathrm{PC10} ={} & (4.17 \times 10^{-6} \cdot \text{T1}) + (2.24 \times 10^{-4} \cdot \text{FA}) + (3.79 \times 10^{-1} \cdot \text{FA numerator}) \\
& + (7.78 \times 10^{-1} \cdot \text{FA denominator}) + (2.99 \times 10^{-1} \cdot \text{Mean Diffusivity}) + (4.01 \times 10^{-1} \cdot \text{K2}) \\
& + (3.96 \times 10^{-5} \cdot \text{rCBF}) + (-1.13 \times 10^{-6} \cdot \text{T2}) + (-1.99 \times 10^{-5} \cdot \text{rCBV}_{\text{corrected}}) \\
& + (1.85 \times 10^{-5} \cdot \text{rCBV}_{\text{uncorrected}}) + (5.18 \times 10^{-5} \cdot \text{rMTT}) + (3.0 \times 10^{-5} \cdot \text{TTP})
\end{aligned}
$$

and

Eq 3.

$$
\begin{aligned}
\mathrm{PC12} ={} & (-8.78 \times 10^{-8} \cdot \text{T1}) + (5.59 \times 10^{-5} \cdot \text{FA}) + (3.58 \times 10^{-1} \cdot \text{FA numerator}) \\
& + (-4.76 \times 10^{-1} \cdot \text{FA denominator}) + (8.02 \times 10^{-1} \cdot \text{Mean Diffusivity}) + (7.39 \times 10^{-8} \cdot \text{T2}) \\
& + (-8.57 \times 10^{-3} \cdot \text{K2}) + (1.58 \times 10^{-5} \cdot \text{rCBF}) + (-8.77 \times 10^{-9} \cdot \text{rCBV}_{\text{corrected}}) \\
& + (-6.535 \times 10^{-7} \cdot \text{rCBV}_{\text{uncorrected}}) + (3.92 \times 10^{-6} \cdot \text{rMTT}) + (-5.25 \times 10^{-6} \cdot \text{TTP})
\end{aligned}
$$

Model adequacy, with respect to the final model's ability to predict nuclear density, was assessed based on a bias-corrected version of the multiple coefficient of determination (R²).
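To make the structure of Eq. 1 concrete, the following sketch evaluates it for given values of PC10 and PC12. It assumes that the "3+" notation denotes the truncated cube described in the note under Eq. 1 (zero when the argument is not positive, the cube otherwise). The function names `cube_plus` and `predicted_density` are hypothetical, and the coefficients are copied from Eq. 1 as printed.

```python
def cube_plus(u):
    """Truncated cube: u**3 when u is positive, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def predicted_density(pc10, pc12):
    """Evaluate Eq. 1 for one pair of pc-composite scores.

    pc10 and pc12 are the unscaled scores from Eq. 2 and Eq. 3; the
    rescaling by 1000 and 10000 is applied here, as in Eq. 1.
    """
    a = pc10 * 1000.0
    b = pc12 * 10000.0
    return (
        1857.0969
        + 322.9077 * a
        - 1996.2808 * cube_plus(a - 5.2251)
        + 5038.5490 * cube_plus(a - 5.4618)
        - 3043.2683 * cube_plus(a - 5.6171)
        - 1100.6736 * b
        + 14148.0490 * cube_plus(b + 0.5485)
        - 24336.1410 * cube_plus(b + 0.4333)
        + 10188.0910 * cube_plus(b + 0.2734)
    )
```

When the rescaled PC10 is at or below 5.2251 and the rescaled PC12 is at or below -0.5485, every truncated cube is zero and the prediction reduces to the linear part, 1857.0969 + 322.9077(PC10 x 1000) - 1100.6736(PC12 x 10000). The cubic terms only contribute once a score passes its first knot.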
The bias-corrected R² was estimated via the bootstrap validation function "validate" of the HMISC library of Spotfire Splus version 8.3 (TIBCO Inc., Palo Alto, CA). The bias-corrected R² essentially represents the predicted R² after subtracting the optimism in the observed value of R², which arises because the model parameter estimates were optimized to predict the observed values of the response variable.

Statistical software: The principal components analysis and the univariate and multivariate generalized estimating equation modeling were conducted using the Spotfire Splus version 8.3 (TIBCO Inc., Palo Alto, CA) statistical package.
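The idea of subtracting optimism from the observed R² can be sketched with a small bootstrap routine. This is a simplified illustration, not a reimplementation of the "validate" function: it assumes a single predictor fitted by ordinary least squares and resamples individual observations, so it ignores the within-patient correlation handled in the study. All function names and defaults are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def r_squared(a, b, xs, ys):
    """R-squared of the line (a, b) evaluated on the data (xs, ys)."""
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


def optimism_corrected_r2(xs, ys, n_boot=200, seed=0):
    """Return (apparent R2, estimated optimism, corrected R2)."""
    rng = random.Random(seed)
    n = len(xs)
    a, b = fit_line(xs, ys)
    apparent = r_squared(a, b, xs, ys)
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # degenerate resample: the fit or R2 is undefined
        ba, bb = fit_line(bx, by)
        # Optimism of this replicate: fit quality on the data the model was
        # trained on minus fit quality of the same model on the original data.
        gaps.append(r_squared(ba, bb, bx, by) - r_squared(ba, bb, xs, ys))
    if not gaps:
        raise ValueError("no usable bootstrap resamples")
    optimism = sum(gaps) / len(gaps)
    return apparent, optimism, apparent - optimism
```

For data that lie exactly on a line, the apparent and corrected values are both 1 because there is nothing to overfit. With noisy data the corrected value is usually somewhat lower than the apparent one, and the gap tends to grow as the sample gets smaller relative to the number of fitted parameters.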