sim_6165-sup0001-Xanthakis_Online Supplemental

Online Supplemental Material

Online Supplement A. Detailed explanation of the simulation design

We simulated 4,000 replications (datasets) across 4 different scenarios (1,000 replications per scenario; Table 1), with each dataset having a sample size of n = 5,000. Each replication represents a simulated “current study” dataset, containing randomly generated values of the standard risk factors, the new biomarker risk factor, and an event incidence indicator for each of the 5,000 subjects in the simulated dataset.

The event rates in the simulated datasets are shown in Table 1. For example, in the first scenario we forced the event rate to range from approximately 5% to 35% across the 1,000 simulated datasets by drawing the probability of event incidence from the N(0.2, 0.05²) distribution (5% and 35% lie 3 standard deviations below and above the mean of 20%). Once the event rate was randomly generated for a given simulated dataset, we generated the event incidence status (yes/no) for the 5,000 subjects in the dataset. Risk factor values were then generated separately for each of these 5,000 subjects in each replication.

Specifically, each simulated dataset contained 4 standard risk factors (X1, X2, X3, X4) and the new biomarker (W). The means of the 4 standard risk factors were generated randomly from 4 different sets of normal distributions (upper part of Table 1), producing the 4 simulation scenarios. Once the means (µ1, µ2, µ3, µ4) of the 4 risk factors were generated from the normal distributions assigned to the scenario (Table 1), we generated values of the random variables Xi from the N(µi, 1²) distribution, i = 1, 2, 3, 4, for subjects with events (middle part of Table 1). Values of W for subjects with events in the simulated dataset were generated from the N(1, 1²) distribution. Values of the risk factors X1, X2, X3, X4, and W for subjects without events were generated independently from the N(0, 1²) distribution.

Online Supplement B. Example of a simulated dataset

As an example of how a given replication was generated in the first simulation scenario, where the event rates ranged from 5% to 35%, one simulated value of the event rate could be 15%, corresponding to 750 events and 4,250 non-events. In the same simulation scenario, the distributions of µ1, µ2, µ3, and µ4 were N(0.3, 0.3²), N(0.5, 0.3²), N(0.7, 0.3²), and N(0.9, 0.3²), respectively. One of the 1,000 generated vectors (µ1, µ2, µ3, µ4) could be (0.25, 0.45, 0.75, 0.85). Therefore, we generated values of X1, X2, X3, X4, and W for the 750 people with events from the distributions N(0.25, 1²), N(0.45, 1²), N(0.75, 1²), N(0.85, 1²), and N(1, 1²), respectively. Values of X1, X2, X3, X4, and W for the 4,250 people without events were generated from the N(0, 1²) distribution. Pairwise correlations among X1, X2, X3, and X4 were set to 0.1, while the correlation between W and each of X1, X2, X3, and X4 was set to 0.2.

We also created “published study datasets” using the same simulation design as described above for the current study datasets. Specifically, we generated 1,000 “published” datasets to match the number of current study datasets, each with a sample size of n = 5,000. We used logistic regression to estimate the regression coefficients for X1, X2, X3, and X4 in each “published” dataset, and applied these “published” coefficients to the current study datasets to create the “published” risk score.

Finally, to establish the “typical reference” values of the discrimination metrics used, we generated a dataset of n = 1,000,000 with a 20% event rate (i.e., 200,000 people with the event and 800,000 people without the event), intended to represent a hypothetical population (lower part of Table 1).
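The two-stage generation of one replication (replication-level draws of the event rate and the means, then subject-level draws) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code. It assumes the scenario-1 parameters given above, assumes the number of events equals the drawn event rate times n (rounded, as in the 15% to 750 example), and assumes the stated correlations (0.1 among X1 to X4, 0.2 between W and each Xi) apply within both the event and non-event groups. The helper names `draw_replication_parameters` and `draw_subjects` are hypothetical.

```python
import numpy as np

N_SUBJECTS = 5_000
EVENT_RATE_MEAN, EVENT_RATE_SD = 0.20, 0.05   # event rate ~ N(0.2, 0.05^2)
MU_CENTERS = np.array([0.3, 0.5, 0.7, 0.9])   # scenario 1: mu_i ~ N(center_i, 0.3^2)
MU_SD = 0.3
W_EVENT_MEAN = 1.0                            # W ~ N(1, 1^2) among events

# Correlation matrix for (X1, X2, X3, X4, W). With unit variances it is also
# the covariance matrix: 0.1 among the X's, 0.2 between W and each X.
CORR = np.full((5, 5), 0.1)
CORR[4, :] = CORR[:, 4] = 0.2
np.fill_diagonal(CORR, 1.0)


def draw_replication_parameters(rng):
    """Stage 1 (hypothetical helper): parameters fixed within one dataset."""
    event_rate = rng.normal(EVENT_RATE_MEAN, EVENT_RATE_SD)
    # Assumption: number of events = rate * n, rounded (15% of 5,000 -> 750).
    n_events = int(round(event_rate * N_SUBJECTS))
    mu = rng.normal(MU_CENTERS, MU_SD)        # one mean per standard risk factor
    return event_rate, n_events, mu


def draw_subjects(rng, n_events, mu):
    """Stage 2 (hypothetical helper): values of X1..X4, W and event status."""
    event_means = np.append(mu, W_EVENT_MEAN)
    events = rng.multivariate_normal(event_means, CORR, size=n_events)
    # Assumption: non-events share the same correlations, centered at 0.
    nonevents = rng.multivariate_normal(np.zeros(5), CORR,
                                        size=N_SUBJECTS - n_events)
    y = np.r_[np.ones(n_events), np.zeros(N_SUBJECTS - n_events)]
    return np.vstack([events, nonevents]), y


rng = np.random.default_rng(2024)

# A random replication from scenario 1.
event_rate, n_events, mu = draw_replication_parameters(rng)
X_rand, y_rand = draw_subjects(rng, n_events, mu)

# The worked example: 750 events with mu = (0.25, 0.45, 0.75, 0.85).
X_ex, y_ex = draw_subjects(rng, 750, np.array([0.25, 0.45, 0.75, 0.85]))
# Sample means in the event group; these should lie near the target means.
print(np.round(X_ex[y_ex == 1].mean(axis=0), 2))
```

A full scenario would call these two functions 1,000 times with fresh draws, and the “published” datasets could be produced the same way.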
In this hypothetical population, the values of the risk factors X1-X4 for people with the event were generated from a multivariate normal distribution with mean vector {0.3, 0.5, 0.7, 0.9} and variance vector {1, 1, 1, 1}. Values of the new biomarker W for subjects with events were generated from the N(1, 1²) distribution. Values of the risk factors X1, X2, X3, X4, and W for subjects without events were generated independently from the N(0, 1²) distribution (lower part of Table 1). As in the simulated datasets, pairwise correlations among X1, X2, X3, and X4 were set to 0.1, while the correlation between W and each of X1, X2, X3, and X4 was set to 0.2.

We centered the risk score at its mean value in each replicated dataset and used the following models:

1a. Model with X1, X2, X3, X4 (current study data)
1b. Model with X1, X2, X3, X4, W (current study data)
2a. Model with the risk score estimated from the current study data
2b. Model with the risk score and W, estimated from the current study data
3a. Model with the “published” risk score (from “published” data)
3b. Model with the “published” risk score (from “published” data) and W, estimated from the current study data

The goal was to assess the effect of adding a new variable (W) to a model that includes the individual standard risk factors X1, X2, X3, X4 using current study data, versus adding it to a model that includes a risk score (“published” or from current study data). Using the model enumeration above, we compared the mean NRI, the mean IDI, and the differences in mean C-statistic between:

- sets of models (1a, 1b) and (2a, 2b)
- sets of models (1a, 1b) and (3a, 3b)
- sets of models (2a, 2b) and (3a, 3b)

Online Supplement C. Additional results

We present here additional results to complement Table 3 of the manuscript.

Table. Estimation difference resulting from the use of the three methods for estimating the predictive utility of a new biomarker

Metric                       Hyp. pop.  Method 1  Method 2  Method 3  Diff. M1  Diff. M2  Diff. M3
SD = 0.3
  Difference in C-statistic  0.035      0.033     0.032     0.054     0.002     0.003     0.018
  NRI                        0.529      0.526     0.515     0.568     0.003     0.014     0.038
  IDI                        0.060      0.058     0.056     0.073     0.002     0.005     0.013
SD = 0.7
  Difference in C-statistic  0.035      0.027     0.025     0.102     0.008     0.010     0.066
  NRI                        0.529      0.525     0.504     0.657     0.005     0.026     0.127
  IDI                        0.060      0.053     0.049     0.105     0.007     0.011     0.045

Hyp. pop. = hypothetical population; Diff. Mk = estimation difference from method k, i.e., the absolute difference between the method-k estimate and the hypothetical-population value.
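The model comparisons listed in Supplement B, whose metrics are summarized in the table above, can be sketched for a single replication as follows. This is a hedged illustration under several assumptions not stated in the supplement: logistic models are fitted by plain Newton-Raphson rather than a statistics package; a risk score is the linear predictor in X1 to X4 (intercept dropped) from a logistic fit, centered at its mean; models 3a and 3b re-estimate a coefficient for the “published” score in the current data; the NRI is the category-free (continuous) version; and both datasets use fixed means (0.3, 0.5, 0.7, 0.9) with a 20% event rate instead of replication-level draws. All function names are hypothetical.

```python
import numpy as np

# Correlations for (X1..X4, W): 0.1 among the X's, 0.2 between W and each X.
CORR = np.full((5, 5), 0.1)
CORR[4, :] = CORR[:, 4] = 0.2
np.fill_diagonal(CORR, 1.0)


def simulate(rng, n=5_000, mu=(0.3, 0.5, 0.7, 0.9), event_rate=0.2):
    """Columns X1..X4, W. Events centered at (mu, 1); non-events at 0."""
    n_ev = int(round(event_rate * n))
    ev = rng.multivariate_normal(list(mu) + [1.0], CORR, size=n_ev)
    nev = rng.multivariate_normal(np.zeros(5), CORR, size=n - n_ev)
    return np.vstack([ev, nev]), np.r_[np.ones(n_ev), np.zeros(n - n_ev)]


def fit_logistic(Z, y, n_iter=25):
    """Logistic regression with intercept; maximum likelihood via Newton-Raphson."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        hessian = A.T @ (A * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, A.T @ (y - p))
    return beta


def predict(beta, Z):
    """Predicted event probabilities from fitted coefficients."""
    A = np.column_stack([np.ones(len(Z)), Z])
    return 1.0 / (1.0 + np.exp(-A @ beta))


def c_statistic(p, y):
    """C-statistic via the Mann-Whitney rank formula (assumes no tied predictions)."""
    ranks = np.argsort(np.argsort(p)) + 1
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)


def idi(p_old, p_new, y):
    """Change in discrimination slope (mean p in events minus mean p in non-events)."""
    ev, ne = y == 1, y == 0
    return (p_new[ev].mean() - p_old[ev].mean()) - (p_new[ne].mean() - p_old[ne].mean())


def continuous_nri(p_old, p_new, y):
    """Category-free NRI: net up-moves among events plus net down-moves among non-events."""
    up, down = p_new > p_old, p_new < p_old
    ev, ne = y == 1, y == 0
    return (up[ev].mean() - down[ev].mean()) + (down[ne].mean() - up[ne].mean())


rng = np.random.default_rng(11)
current, y = simulate(rng)          # "current study" dataset
published, y_pub = simulate(rng)    # "published" dataset
X, W = current[:, :4], current[:, 4]

# Risk scores: linear predictor in X1..X4 (intercept dropped), centered at the mean.
score_cur = X @ fit_logistic(X, y)[1:]
score_pub = X @ fit_logistic(published[:, :4], y_pub)[1:]
score_cur -= score_cur.mean()
score_pub -= score_pub.mean()

model_pairs = {
    "(1a,1b)": (X, np.column_stack([X, W])),
    "(2a,2b)": (score_cur, np.column_stack([score_cur, W])),
    "(3a,3b)": (score_pub, np.column_stack([score_pub, W])),
}
results = {}
for name, (Z_old, Z_new) in model_pairs.items():
    p_old = predict(fit_logistic(Z_old, y), Z_old)
    p_new = predict(fit_logistic(Z_new, y), Z_new)
    results[name] = (c_statistic(p_new, y) - c_statistic(p_old, y),
                     continuous_nri(p_old, p_new, y),
                     idi(p_old, p_new, y))
    print(name, "dC, NRI, IDI =", np.round(results[name], 3))
```

Repeating this over the 1,000 replications of a scenario and averaging would give mean values of the kind compared in Supplement B.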
