
Supplementary Material
METHODS
We performed a post-hoc analysis to derive and test the optimal cut-point to separate between
no steatosis and any steatosis (mild, moderate, or severe) for the MRI Rosetta Stone Project
data set presented in this manuscript. The cut-point that provided the greatest AUROC was
determined to be the optimal cut-point. Because of the potential for over-fitting that occurs
when a threshold is tested in the same population in which it was derived, we also tested the
cut-point along with the existing published cut-points via simulations using data from two prior
pediatric studies with histology. Because the distribution of liver fat in a population is an
important determinant of the biomarker’s ability to separate between normal and abnormal, we
chose data sets that represented important scenarios that would be encountered in clinical
research and/or patient care:
1. General population: Studies of epidemiology or genetics will often require a general
population sample. Therefore, to represent the general population, we used data from
the Study of Child and Adolescent Liver Epidemiology (SCALE), which included 954 children in the
County of San Diego who had an autopsy with clinical data and liver histology(1).
2. Suspected NAFLD: A second scenario in which one would want to be able to use MRI to
determine the presence or absence of fatty liver is the overweight child with elevated ALT. In
order to represent the clinical case of suspected NAFLD, we used data from a study that
included 347 children referred from primary care to pediatric gastroenterology for suspected
NAFLD(31).
Imputation of MRI PDFF Values
Both of these data sets had liver histology with steatosis graded in the same manner as in
the MRI Rosetta Stone Project, but did not include liver MRI. Therefore, the MRI PDFF values
were imputed using the Markov Chain Monte Carlo method. First, the means, distribution type,
and uncertainty were calculated for the relationship between MRI fat fraction and the factors found
to be statistically significant predictors of MRI fat fraction in the Rosetta cohort, using
$W_{bij} = (1/P_{ij}) \times (k, A, G, \mathrm{BMIz})$,
where $P_{ij}$ = initial probability of selection; $k$ = adjustment for sub-sampling;
$A$ = age; $G$ = gender; BMIz = BMI Z score. The model addressed several components,
including covariate effects, sex, nonlinearity associated with age, and BMI Z scores. A set of
confidence intervals was created for imputing the likely range of MRI fat fraction for each
individual subject. Each subject's confidence intervals underwent a simulation of 1000 bootstrap
samples, and an MRI fat fraction value was randomly selected from the simulation.
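The final selection step can be illustrated with a minimal sketch. It is one possible reading of the description, not the procedure actually used: the function name `impute_pdff`, the interval bounds, and the uniform draw within the interval are all assumptions made for illustration, since the distribution used for the simulation is not stated here.

```python
import random

def impute_pdff(lower, upper, n_sim=1000, rng=None):
    """Simulate n_sim candidate PDFF values in [lower, upper] and pick one at random.

    Assumption: candidates are drawn uniformly within the subject's interval;
    the actual distribution used in the study is not specified in the text.
    """
    rng = rng or random.Random()
    simulated = [rng.uniform(lower, upper) for _ in range(n_sim)]
    # Randomly select a single simulated value as the imputed PDFF.
    return rng.choice(simulated)

# Hypothetical interval (in % PDFF) for one subject.
rng = random.Random(0)
value = impute_pdff(2.0, 6.5, rng=rng)
```

Passing a seeded `random.Random` instance keeps the imputation reproducible across runs.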
Quality of Model Assessment
To assess the quality of the model (and thus the robustness of our conclusions), we used 10-fold
cross-validation and an additional bootstrapping procedure (N = 10,000 bootstrap
samples). Briefly, for 10-fold cross-validation, the original sample was partitioned into 10
subsamples. Of the 10 subsamples, a single subsample was retained as the validation data for
testing the model, and the remaining 9 subsamples were used as training data. The
cross-validation process was then repeated, with each of the remaining 9 subsamples used in turn as the
validation data, so that every subsample served as validation data exactly once.
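The partitioning described above can be sketched as follows. This is a generic illustration of 10-fold cross-validation using only the Python standard library; the helper name `k_fold_splits`, the sample size of 25, and the random seed are hypothetical and are not taken from the study.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f takes every k-th shuffled index starting at position f,
    # so fold sizes differ by at most one.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        val_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, val_idx

# Hypothetical sample of 25 subjects split into 10 folds.
splits = list(k_fold_splits(n_samples=25, k=10))
```

Each subject index appears in exactly one validation fold and in the training data of the other nine iterations.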
Sensitivity and Specificity Calculations
We calculated the sensitivity and specificity for the optimal cut-point derived in the MRI Rosetta
Stone Project along with the four previously published MRI fat fraction threshold values used in
Aim 3. These thresholds were tested in the MRI Rosetta Stone Project data set and in the
simulated MRI PDFF values generated for the general population and the clinical scenario of
suspected NAFLD. The area under the receiver operating characteristic curve (AUROC) was then
calculated for each MRI PDFF threshold using the DeLong method.
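As a rough illustration of these calculations, the sketch below computes sensitivity and specificity for a given PDFF threshold, together with the AUROC of the resulting dichotomized test, which for a single operating point equals the mean of sensitivity and specificity. It does not implement the DeLong method. The function names, the toy data, and the convention that a PDFF at or above the threshold counts as positive are assumptions for illustration only.

```python
def sens_spec(pdff, has_steatosis, threshold):
    """Sensitivity and specificity when PDFF >= threshold is called positive.

    Assumption: values equal to the threshold are classified as positive.
    """
    pairs = list(zip(pdff, has_steatosis))
    tp = sum(1 for x, y in pairs if y and x >= threshold)
    fn = sum(1 for x, y in pairs if y and x < threshold)
    tn = sum(1 for x, y in pairs if not y and x < threshold)
    fp = sum(1 for x, y in pairs if not y and x >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def single_threshold_auroc(sensitivity, specificity):
    # A dichotomized test has one ROC operating point; the area under the
    # two-segment curve through it is the mean of sensitivity and specificity.
    return (sensitivity + specificity) / 2

# Hypothetical toy data (PDFF in %, histology as steatosis yes/no).
pdff = [1.2, 2.8, 3.1, 4.0, 3.9, 6.5, 9.8, 15.2]
has_steatosis = [False, False, False, False, True, True, True, True]

for threshold in (3.5, 5.5):
    se, sp = sens_spec(pdff, has_steatosis, threshold)
    print(threshold, se, sp, single_threshold_auroc(se, sp))
```

In this toy data the lower threshold gains sensitivity at the cost of specificity, which is the kind of trade-off a threshold comparison is meant to expose.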
RESULTS
The optimal MRI PDFF cut-point in the MRI Rosetta Stone Project to separate between no
steatosis and any steatosis (mild, moderate, or severe) was 3.5%. As shown in Supplementary
Table 1, this threshold yielded a sensitivity of 95% and a specificity of 83%. However, both
sensitivity and specificity were reduced when the threshold of 3.5% was tested in the simulation
using the general population or a clinical population with suspected NAFLD. The widely used
threshold of 5.5% had a similar AUROC in both the MRI Rosetta Stone Project data set (0.88)
and the general population data set (0.87); however, this threshold performed considerably worse in
the setting of suspected NAFLD (0.64). For all of the thresholds, the data in Supplementary
Table 1 show the trade-offs in sensitivity and specificity that depend on the threshold selected and the
target population in which it is applied.