International Journal of Medical Informatics 178 (2023) 105192
Contents lists available at ScienceDirect
International Journal of Medical Informatics
journal homepage: www.elsevier.com/locate/ijmedinf
Development and validation of a prediction model for evaluating
extubation readiness in preterm infants
Wongeun Song a,d,1, Young Hwa Jung b,c,1, Jihoon Cho a, Hyunyoung Baek a, Chang Won Choi b,c,*, Sooyoung Yoo a,*
a Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
b Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
c Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
d Department of Health Science and Technology, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea
ARTICLE INFO
Keywords:
Extubation readiness
Preterm infants
Prediction model

ABSTRACT
Successful early extubation benefits preterm infants not only in terms of short-term respiratory morbidity and survival but also in terms of long-term neurodevelopmental outcomes. However, no consensus exists regarding the optimal protocol or guidelines for assessing extubation readiness in preterm infants; consequently, the decision to extubate is made almost entirely at the attending physician's discretion. We identified robust, quantitative predictors of success or failure of the first planned extubation attempt before 36 weeks of postmenstrual age in preterm infants (<32 weeks of gestational age) and used these predictors to develop a prediction model for evaluating extubation readiness. Extubation success was defined as the absence of reintubation within 72 h after extubation. This observational cohort study used data from preterm infants admitted to the neonatal intensive care unit of Seoul National University Bundang Hospital in South Korea between July 2003 and June 2019 to identify predictors and to develop and test a predictive model for extubation readiness. Data from preterm infants in the Medical Information Mart for Intensive Care (MIMIC-III) database from 2001 to 2008 were used for external validation. Using a machine learning model with predictors such as demographics, periodic vital signs, ventilator settings, and respiratory indices, the area under the receiver operating characteristic curve and the average precision of our model were 0.805 (95% confidence interval [CI], 0.802–0.809) and 0.917, respectively, in internal validation and 0.715 (95% CI, 0.713–0.717) and 0.838, respectively, in external validation. Our prediction model (NExt-Predictor) demonstrated high performance in assessing extubation readiness in both internal and external validation.
Abbreviations: AUROC, area under the receiver operating characteristic curve; BPD, bronchopulmonary dysplasia; CNB, complement naïve Bayesian; CI, confidence interval; DL, deep learning; EHRs, electronic health records; ET-CPAP, endotracheal continuous positive airway pressure; XGB, extreme gradient boosting; GA, gestational age; GBM, gradient boosting model; HFNC, high-flow nasal cannula; LOCF, last observation carried forward; ML, machine learning; MAP, mean airway pressure; MV, mechanical ventilation; MIMIC-III, Medical Information Mart for Intensive Care; N-CPAP, nasal continuous positive airway pressure; NExt-Predictor, neonatal extubation readiness predictor; NIPPV, nasal intermittent positive pressure ventilation; NPV, negative predictive value; PDP, partial dependence plot; PEEP, positive end-expiratory pressure; PPV, positive predictive value; PMA, postmenstrual age; ROX, ratio of the SpO2/FiO2 ratio to the respiratory rate; RSS, respiratory severity score; SNUBH, Seoul National University Bundang Hospital; SBT, spontaneous breathing trial; SGD, stochastic gradient descent; SHAP, SHapley Additive exPlanations.
* Corresponding authors at: Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea (C.W. Choi); Healthcare ICT Research Center, Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam, Republic of Korea (S. Yoo).
E-mail addresses: choicw@snu.ac.kr (C.W. Choi), yoosoo0@snubh.org (S. Yoo).
¹ These authors contributed equally to this work.
https://doi.org/10.1016/j.ijmedinf.2023.105192
Received 26 March 2023; Received in revised form 13 July 2023; Accepted 8 August 2023
Available online 12 August 2023
1386-5056/© 2023 Elsevier B.V. All rights reserved.

1. Introduction
Extremely preterm infants often require endotracheal intubation and mechanical ventilation (MV), particularly during the first few days or weeks after birth, because of lung immaturity, weak respiratory drive, and surfactant deficiency. Although MV remains an important life-saving intervention, prolonged MV is associated with an increased risk of bronchopulmonary dysplasia (BPD), neurodevelopmental impairment, and mortality [1,2]. Thus, early extubation of preterm infants is desirable. However, in a recent multicenter study of preterm infants with a birth weight of ≤ 1,250 g [3], the failure rate of the first planned extubation ranged from 10% at 24 h post-extubation to nearly 50% by discharge. Extubation failure and subsequent reintubation were associated with a 10- to 12-day increase in the duration of MV and with higher rates of BPD and mortality, primarily because of prolonged exposure to MV [4]. However, reintubation after planned extubation is also associated with an increased risk of respiratory morbidity and mortality independent of cumulative MV duration and other known confounders [4–7].
Despite the importance of successful early extubation, no consensus protocols or guidelines exist to help physicians determine the optimal timing of extubation in preterm infants. In clinical practice, extubation timing is left to the attending physician's discretion, which results in substantial inter-physician and inter-center variation [1]. Several studies have sought objective predictors or prediction models for extubation readiness [2,5,6,8–11]. However, their clinical applicability is limited because most were conducted with small sample sizes and without external validation [4].
Recently, spontaneous breathing trials (SBTs) have gained popularity for determining extubation readiness in extremely preterm infants [12]. However, SBTs show low specificity, and many extremely preterm infants experience adverse events during them [13]. Shalish et al. [14] reported that the combinations of clinical events used to define a passed or failed SBT predicted extubation success with low accuracy.
Few studies have investigated physiological signals as predictors of extubation readiness in preterm infants. In small-scale studies, variability in heart or respiratory rate improved the prediction of extubation readiness when combined with SBTs [15,16].
We aimed to identify physiological signals that differentiate extubation success from failure and to use novel data-driven features derived from these signals to develop a robust, field-applicable prediction model that estimates the probability of extubation readiness. We named this model the neonatal extubation readiness predictor (NExt-Predictor) and compared its performance with that of existing models.
2. Methods
2.1. Data source
This retrospective cohort study used electronic health records (EHRs) and periodic vital signs from Seoul National University Bundang Hospital (SNUBH) collected from July 2003 to June 2019 and from the Medical Information Mart for Intensive Care (MIMIC-III) database [17], restricted to preterm infants admitted to the neonatal intensive care unit (NICU). The development and internal validation cohorts were constructed by splitting the SNUBH data temporally into 2003–2016 and 2017–2019, respectively. The external validation cohort was constructed from the MIMIC-III database for the period from 2001 to 2008 (Fig. 1). This study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines [18]. The study was reviewed and approved by the Institutional Review Board (IRB) of SNUBH (X-2205–759-901). Because the data were de-identified, the requirement for informed consent was waived. Data collection, processing, and release for the MIMIC-III database were approved by the IRBs of Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology [17].
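The temporal split described above can be sketched as a simple mapping from data source and calendar period to cohort. This is a minimal illustration, not the authors' code: the `Admission` record and the `assign_cohort` helper are hypothetical, and splitting on the admission date (rather than, say, the extubation date) is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Admission:
    """Hypothetical minimal record for one NICU admission."""
    patient_id: str
    source: str  # "SNUBH" or "MIMIC-III"
    admitted: date


def assign_cohort(adm: Admission) -> Optional[str]:
    """Map an admission to a study cohort by data source and calendar period.

    Assumption: the SNUBH split is applied to the admission date.
    """
    if adm.source == "MIMIC-III":
        # MIMIC-III dates are shifted for de-identification, so no calendar
        # filter is applied here; the whole extract is external validation.
        return "external_validation"
    if adm.source == "SNUBH":
        if date(2003, 7, 1) <= adm.admitted <= date(2016, 12, 31):
            return "development"
        if date(2017, 1, 1) <= adm.admitted <= date(2019, 6, 30):
            return "internal_validation"
    return None  # outside the study period or unknown source


records = [
    Admission("s1", "SNUBH", date(2010, 5, 1)),
    Admission("s2", "SNUBH", date(2018, 3, 9)),
    Admission("m1", "MIMIC-III", date(2150, 1, 20)),
]
print([assign_cohort(r) for r in records])
# ['development', 'internal_validation', 'external_validation']
```

A temporal rather than random split means the internal validation cohort also tests robustness to drift in practice over time, which is consistent with the ventilator-setting differences between the two SNUBH periods reported in the Results.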
2.2. Eligibility criteria and population
This study included preterm infants born before 32 weeks of gestational age (GA) who received MV through an endotracheal tube and underwent their first extubation attempt before 36 weeks of postmenstrual age (PMA). Patients with major congenital anomalies or other airway problems and those extubated after less than 6 h of MV were excluded. The 6-h minimum was chosen to exclude cases in which intubation was performed for specific procedures, such as MRI, or for surgery for conditions such as retinopathy of prematurity or inguinal hernia. Patients who underwent unplanned extubation were also excluded. The time of extubation was defined as the index time (t = 0), from which reintubation (extubation failure) was assessed. The observation window extended from the time of admission to the index time. Fig. 1 shows a flowchart of the development, internal validation, and external validation cohorts.
Fig. 1. Development and internal and external validation cohorts to identify predictors and develop predictive models. (A) Development cohort and internal validation cohort. (B) External validation cohort.
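The eligibility rules in Section 2.2 reduce to a single predicate over an infant's first extubation attempt. The sketch below is illustrative only; the `FirstExtubation` fields and the `is_eligible` name are hypothetical, and treating exactly 6 h of MV as eligible is an assumption (the text excludes only infants extubated after less than 6 h).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstExtubation:
    """Hypothetical summary of an infant's first extubation attempt."""
    ga_weeks: float                  # gestational age at birth (weeks)
    pma_at_extubation_weeks: float   # postmenstrual age at first extubation (weeks)
    mv_hours: float                  # endotracheal MV duration before extubation (h)
    planned: bool                    # False for self-extubation or tube displacement
    anomaly_or_airway_problem: bool  # major congenital anomaly or other airway problem


def is_eligible(ep: FirstExtubation) -> bool:
    """Apply the inclusion and exclusion criteria of Section 2.2."""
    return (
        ep.ga_weeks < 32
        and ep.pma_at_extubation_weeks < 36
        and ep.mv_hours >= 6  # shorter runs are mostly procedural intubations
        and ep.planned
        and not ep.anomaly_or_airway_problem
    )


print(is_eligible(FirstExtubation(27.0, 29.5, 120.0, True, False)))  # True
print(is_eligible(FirstExtubation(27.0, 29.5, 4.0, True, False)))    # False
```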
2.3. Outcome
The primary outcome was the success or failure of the first planned extubation in preterm infants. Extubation success was defined as no reintubation within 72 h of the planned extubation. Reintubations within 10 min of extubation were excluded from the analysis because it was difficult to distinguish extubation failure from unplanned extubation caused by self-extubation or a misplaced endotracheal tube. Further details are provided in the eMethods section of the Supplementary Material.

2.4. Predictors
Clinical data and physiological signals recorded before extubation (and before reintubation, in cases of extubation failure) were extracted to identify potential predictors. We selected routinely obtained vital signs, including heart rate, respiratory rate, body temperature (BT), oxygen saturation (SpO2), and blood pressure. Potential predictors also included GA, birth weight, PMA at extubation, male sex, pre-extubation blood gases (pH and pCO2), and ventilator settings, such as the fraction of inspired oxygen (FiO2), positive end-expiratory pressure (PEEP), mean airway pressure (MAP), and ventilator frequency. Respiratory indices, namely the SpO2/FiO2 (SF) ratio [19], the ratio of the SF ratio to the respiratory rate (ROX) [20], and the respiratory severity score (RSS), were also included.
Missing values were imputed using the last observation carried forward (LOCF), for two reasons. First, LOCF minimizes distortion of the data distribution: because this study used signal variability as a feature, we needed to limit the risk of introducing artificial biases or altering the statistical properties of the data. Second, most missing values were in the ventilator data. Vital signs were collected automatically, so almost no vital-sign data were lost, whereas ventilator settings were entered into the electronic medical record only when the clinician recorded a significant change. Reflecting this clinical workflow, we assumed that a missing ventilator value was likely to equal the previous entry and that any unrecorded intermediate change was negligible. Time-domain analysis methods were applied to the physiological variables measured periodically from admission to the index time to generate predictors (eTable 1 in the Supplement). We named the resulting predictor set NExt-Predictor.
To evaluate the effectiveness of NExt-Predictor, the prediction model proposed by Gupta et al. [2] was selected as the baseline model; it was recalibrated to our dataset, and its performance was compared with that of NExt-Predictor. The baseline model uses GA, age in days at extubation, FiO2, RSS, weight, and pH as predictors. Following the method of Gupta et al., FiO2, pH, and body weight were taken as the last values measured before extubation, and RSS as the highest value during the 24 h before extubation.
SHapley Additive exPlanations (SHAP) [25] were used to evaluate the interpretability and feature importance of the potential predictors, and partial dependence plots (PDPs) were used to analyze predictor cutoffs and marginal effects.

2.5. Statistical analysis for feature selection
The statsmodels and tableone Python libraries were used for statistical analyses [21,22]. Propensity score matching was used to identify candidate features that differed significantly between the outcome and control cohorts [23]. Univariate analyses controlling for gestational age and birth weight, which can strongly influence the extubation success rate of preterm infants, were performed to determine the adjusted odds ratios (ORs) and marginal effects of all predictors. For the baseline characteristics, continuous variables at the time of extubation were compared using Student's t-test and categorical variables using the chi-square test.

2.6. Prediction model development and validation
The models used in this study were logistic regression, random forest, gradient boosting model (GBM), decision tree, stochastic gradient descent (SGD) classifier, complement naïve Bayesian (CNB), and extreme gradient boosting (XGB) [24], all of which have demonstrated high performance in recent studies. The development cohort was split into training (80%) and test (20%) datasets by stratified sampling. For hyperparameter optimization of each model, 10-fold cross-validation and a grid search based on the area under the receiver operating characteristic curve (AUROC) were applied. The categories and value ranges of the model hyperparameters are listed in eTable 2 (Supplement). Model selection was performed only on the training dataset to prevent leakage from the validation datasets.

2.7. Performance evaluation metrics
The following metrics were used to assess model discrimination: accuracy, AUROC, area under the precision-recall curve (PRC), positive predictive value (PPV), and negative predictive value (NPV). To compare the discriminative abilities of the prediction models, we calculated the 95% CI of the AUROC and performed decision curve analysis. Because the classes were highly imbalanced, the trained prediction model was also compared with two random guessers: the uninformed guesser was a classifier that did not follow the outcome distribution within the cohort (extubation success rate, 83%), and the informed guesser was a baseline classifier that followed the outcome distribution of each cohort. The goodness of fit between predicted and observed probabilities was evaluated using the calibration belt method [26].

3. Results
3.1. Study population
In the SNUBH database, 678 infants met the inclusion criteria. Of these, 481 (71%) were included in the development cohort and 197 (29%) in the internal validation cohort (Fig. 1, Table 1, and eTable 3). In the development cohort, the mean (standard deviation [SD]) GA was 28.5 (2.2) weeks and birth weight was 1,117 (358) g; 402 patients (84%) were successfully extubated. In the internal validation cohort, the mean (SD) GA was 28.8 (2.2) weeks and birth weight was 1,179 (375) g; 163 patients (83%) were successfully extubated (Tables 1 and 2 and eTable 3). No statistically significant differences in demographic characteristics were observed between the development and internal validation cohorts, except for birth weight (P = 0.049) (Table 2). However, the ventilator settings measured closest to extubation differed significantly between the development and internal validation cohorts: FiO2 (0.25 [0.06] vs. 0.24 [0.04], P = 0.035), PEEP (4.99 [0.66] vs. 5.29 [0.72], P < 0.001), and frequency (24.76 [10.00] vs. 34.34 [5.43], P < 0.001). In contrast, SpO2 and RSS did not differ significantly between the two cohorts. From the MIMIC-III database, 802 patients were included in the external validation cohort; their mean (SD) GA was 29.2 (2.3) weeks and birth weight was 1,343 (429) g, and 668 patients (83%) were successfully extubated.

Table 1
Baseline characteristics of the extubation success and failure groups (development cohort).
Characteristic | Extubation success group (n = 402) | Extubation failure group (n = 79) | P-value
Patient characteristics | | |
Gestational age, mean (SD), weeks | 28.8 (2.1) | 27.1 (2.1) | <0.001
Birth weight, mean (SD), g | 1160.5 (352.2) | 897.0 (342.2) | <0.001
Sex, male, n (%) | 213 (53.0) | 52 (65.8) | 0.048
PMA at extubation, mean (SD), weeks | 30.6 (2.5) | 29.4 (2.9) | <0.001
Ventilation variables | | |
FiO2 | 0.24 (0.05) | 0.28 (0.08) | <0.001
PEEP, cm H2O | 5.0 (0.7) | 5.1 (0.6) | 0.370
MAP, cm H2O | 7.7 (1.4) | 8.2 (1.6) | 0.233
Frequency, rpm | 24.6 (9.9) | 25.5 (10.5) | 0.486
Oxygen saturation, % | 97.2 (3.9) | 96.0 (4.5) | 0.025
Ventilation variables post-extubation | | |
N-CPAP, n (%) | 322 (80.1) | 63 (79.7) |
NIPPV, n (%) | 24 (6.0) | 9 (11.4) |
HFNC, n (%) | 22 (5.5) | 7 (8.9) |
FiO2 | 0.26 (0.07) | 0.33 (0.10) |
PEEP, cm H2O | 5.2 (0.8) | 5.4 (0.8) |
MAP, cm H2O | 8.3 (1.6) | 9.6 (2.5) |
Frequency, rpm | 35.5 (12.4) | 25.5 (13.0) |
P-values for the post-extubation ventilation variables were reported as <0.001, 0.077, 0.089, and 0.495.
Abbreviations: HFNC, high-flow nasal cannula; MAP, mean airway pressure; N-CPAP, nasal continuous positive airway pressure; NIPPV, nasal intermittent positive pressure ventilation; PEEP, positive end-expiratory pressure; PMA, postmenstrual age.
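As a rough consistency check on Table 1, two of its P-values can be approximately recomputed from the reported summary statistics. This sketch assumes a pooled-variance Student t-test, with a normal approximation to the t distribution (reasonable here with 479 degrees of freedom), and a chi-square test with Yates' continuity correction; the Methods do not specify whether a continuity correction was applied, and because the published means and SDs are rounded, other rows should not be expected to reproduce exactly.

```python
import math


def student_t_p(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t statistic with a two-sided normal-approximation P-value."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, math.erfc(abs(t) / math.sqrt(2))


def chi2_yates_p(a, b, c, d):
    """Chi-square test with Yates' correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Gestational age: success 28.8 (2.1), n = 402 vs. failure 27.1 (2.1), n = 79.
t_ga, p_ga = student_t_p(28.8, 2.1, 402, 27.1, 2.1, 79)
# Male sex: rows = success/failure group, columns = male/female counts.
chi2_sex, p_sex = chi2_yates_p(213, 402 - 213, 52, 79 - 52)
print(f"GA: t = {t_ga:.2f}, P = {p_ga:.1e}")
print(f"Male sex: chi2 = {chi2_sex:.2f}, P = {p_sex:.3f}")
```

With these inputs the chi-square P-value comes out at about 0.048, in line with the value reported for male sex, and the GA comparison gives a P-value far below 0.001.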
3.2. Predictors of extubation success
Based on multivariate analysis, we generated candidate predictors and selected the predictors of extubation success (eTables 5 and 6). We also examined the contribution and interpretability of each predictor using SHAP and PDP; the probability contribution of each predictor and the mutual dependency between predictors were calculated using PDP (Fig. 2).
In the PDP analysis, the probability of extubation success correlated positively with both the mean and the sequential difference mean of heart rate within the 12-h observation window before extubation. The SF ratio correlated positively with extubation success when it showed small variation and a high mean within the observation window. In contrast, heart rate showed high variability, but the probability of extubation success increased as it approached a stationary state. These results suggest that both the variation and the average of the predictors, which were the major determinants in previous studies, played important roles in determining extubation success in our study [1,3,27]. The probability of successful extubation decreased rapidly when the SF ratio fell below 319.29. Furthermore, FiO2 had a high SHAP value (0.105), but the PDP showed no discriminative power at FiO2 values of 0.4 or less. The contributions of the other predictors (PEEP, BT, and diastolic blood pressure [DBP]) did not exceed 0.1, indicating negligible contributions. The sequential mean difference of the SF ratio had a high coefficient value, indicating that the probability of extubation success decreased as the sequential mean difference of the SF ratio increased (Fig. 2A, B). Heart rate variability was associated with successful extubation (Fig. 2C, D, and E).

3.3. Model performance
We created 331 feature sets and 993 predictive models to develop and validate the extubation readiness model. Table 3 compares the models, including the baseline model [2] and NExt-Predictor. NExt-Predictor demonstrated high discriminative ability in internal validation (AUROC, 0.892; 95% CI, 0.890–0.895) and in external validation (AUROC, 0.766; 95% CI, 0.765–0.768). The AUROC of NExt-Predictor exceeded the 95% CIs of the baseline models without overlap. In external validation, the performance of NExt-Predictor did not degrade significantly compared with internal validation (Fig. 3 and Table 3).
In contrast, the AUROCs of the baseline models did not exceed 0.700. The AUROCs of the uninformed and informed guessers were both approximately 0.500, and their accuracy did not exceed that of our model. Decision curve analysis showed that NExt-Predictor provided greater net benefit than did the clinicians' decisions (gray line) and the other models (eFig. 2 in the Supplementary Material). The calibration belt analysis of NExt-Predictor indicated reasonable internal and external calibration, with P-values above 0.05 in all cohorts (eFig. 2).
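The paper does not state how the 95% CIs of the AUROC were obtained. One common choice is the percentile bootstrap, sketched below as an assumption rather than the authors' procedure. AUROC is computed as the Mann–Whitney probability that a randomly chosen success receives a higher score than a randomly chosen failure (ties count one half), which also shows why a guesser whose scores are unrelated to the outcome has an expected AUROC of 0.5.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels: Sequence[int], scores: Sequence[float],
                       n_boot: int = 1000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUROC, resampling patients with replacement."""
    auroc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in cohort size, which is fine for a few hundred infants; for larger data a rank-based formula or a library routine would be preferable.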
4. Discussion
In this study, we proposed an extubation readiness prediction model (NExt-Predictor) that can be easily and safely applied in clinical practice. Furthermore, we performed external validation to assess whether NExt-Predictor generalizes to other clinical settings. The AUROC and decision curve of NExt-Predictor demonstrated high discriminative power in both validation cohorts. Moreover, when extubation success was defined as no reintubation within 7 days rather than 72 h, our model performed similarly, without performance degradation (AUROC, 0.885 in the internal validation cohort and 0.784 in the external validation cohort).
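The outcome definition in Section 2.3 and the 7-day sensitivity analysis mentioned above differ only in the length of the reintubation window. A minimal labeling sketch follows; it assumes timestamps for extubation and the first subsequent reintubation are available, the function and argument names are hypothetical, and it drops reintubations within 10 min, which the study excluded as indistinguishable from unplanned extubation.

```python
from datetime import datetime, timedelta
from typing import Optional


def label_extubation(
    extubated_at: datetime,
    reintubated_at: Optional[datetime],
    window_hours: float = 72,
    exclusion_minutes: float = 10,
) -> Optional[str]:
    """Label a planned extubation as 'success' or 'failure', or None if excluded.

    Assumes infants without a recorded reintubation were followed for at least
    `window_hours`; censoring (death, transfer) is not handled here.
    """
    if reintubated_at is None:
        return "success"
    gap = reintubated_at - extubated_at
    if gap <= timedelta(minutes=exclusion_minutes):
        return None  # excluded: likely self-extubation or a misplaced tube
    if gap <= timedelta(hours=window_hours):
        return "failure"
    return "success"


t0 = datetime(2018, 3, 9, 8, 0)
reintubation = t0 + timedelta(hours=100)
print(label_extubation(t0, reintubation))                    # success (72-h window)
print(label_extubation(t0, reintubation, window_hours=168))  # failure (7-day window)
```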
Table 2
Comparison of the demographic characteristics in the development, internal validation, and external validation cohorts.
Characteristic | Development cohort (n = 481) | Internal validation cohort (n = 197) | P-value | External validation cohort (n = 802) | P-value
Patient characteristics | | | | |
Gestational age, mean (SD), weeks | 28.5 (2.2) | 28.8 (2.2) | 0.058 | 29.2 (2.3) | <0.001*
Birth weight, mean (SD), g | 1117.2 (358.2) | 1179.0 (374.6) | 0.049 | 1342.7 (429.0) | <0.001*
Sex, male, n (%) | 265 (55.1) | 109 (55.3) | 1.000 | 366 (45.6) | 1.000
PMA at extubation, mean (SD), weeks | 30.4 (2.6) | 30.9 (2.2) | 0.010 | 30.9 (1.8) | <0.001*
Ventilation variables | | | | |
FiO2, mean (SD) | 0.25 (0.06) | 0.24 (0.04) | 0.035 | 0.22 (0.04) | <0.001*
PEEP, mean (SD), cm H2O | 4.99 (0.66) | 5.29 (0.72) | <0.001 | 5.24 (0.44) | <0.001*
MAP, mean (SD), cm H2O | 7.83 (1.44) | 8.18 (1.39) | 0.050 | 5.70 (1.26) | <0.001*
Frequency, mean (SD), rpm | 24.76 (10.00) | 34.34 (5.43) | <0.001 | 49.20 (13.34) | <0.001*
Oxygen saturation, mean (SD), % | 97.03 (3.98) | 97.39 (3.28) | 0.214 | 96.82 (2.82) | 0.060*
Physiological variables | | | | |
Measured weight, mean (SD), g | 1271.5 (381.2) | 1322.5 (348.7) | 0.095 | 1403.7 (381.1) | <0.001*
Heart rate, mean (SD), bpm | 152.10 (15.28) | 153.56 (14.94) | 0.228 | 152.06 (16.22) | 0.425*
Respiratory rate, mean (SD), rpm | 49.87 (14.41) | 50.12 (13.26) | 0.819 | 44.78 (16.48) | <0.001*
Systolic blood pressure, mean (SD), mmHg | 62.08 (11.45) | 66.54 (10.89) | <0.001 | 65.65 (9.20) | <0.001*
Diastolic blood pressure, mean (SD), mmHg | 37.15 (9.88) | 39.15 (8.98) | 0.008 | 38.22 (7.76) | 0.010*
Mean blood pressure, mean (SD), mmHg | 46.62 (9.66) | 48.44 (8.56) | 0.021 | 48.10 (7.59) | 0.010*
Body temperature, mean (SD), ℃ | 36.93 (0.28) | 36.99 (0.29) | 0.015 | 36.18 (0.32) | <0.001*
* One-way analysis of variance (ANOVA) test.

Fig. 2. PDP and PDP interaction plots for the SF ratio and heart rate. (A) Mean of the SpO2/FiO2 ratio. (B) Sequential difference mean of the SpO2/FiO2 ratio. (C) Dispersion of the heart rate. (D) Mean of the heart rate. (E) Trend of the heart rate.

Because the extubation success rates at both institutions were already high, we additionally used decision curve analysis to assess the extent to which the predictive model could improve clinical decisions. The net benefit of the model was high at a threshold of 0.5, although the gray line representing the actual clinical decisions at the two institutions also showed a high net benefit. In previous studies, this type of graph has been interpreted as showing that the benefit gained by predictive models over "treat all" is partial [28,29]. However, "treat all" in this study represented the clinicians' actual decisions rather than disease prevalence, which means that our model compared favorably with clinical decision-making and could contribute to increasing extubation success.
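Net benefit, the quantity plotted in a decision curve, weighs true positives against false positives at a threshold probability p_t: NB = TP/n − (FP/n) × p_t/(1 − p_t). Here a "positive" is taken to be a recommendation to extubate, and because every infant in these cohorts was in fact extubated, the clinicians' curve coincides with the "treat all" line. The sketch below is a minimal illustration under those assumptions, not the authors' implementation; the function names are hypothetical.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of extubating every infant whose predicted success probability >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Net benefit of extubating everyone, i.e. the clinicians' decisions in this setting."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


labels = [1, 1, 1, 0]  # 1 = extubation success
probs = [0.9, 0.8, 0.3, 0.2]
print(net_benefit(labels, probs, 0.5), net_benefit_treat_all(labels, 0.5))  # 0.5 0.5
```

For example, with the 83% success rate seen in these cohorts, extubating everyone yields 0.83 − 0.17 × (0.5/0.5) = 0.66 at a threshold of 0.5, so a model adds value at that threshold only if its net benefit exceeds 0.66.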
In the NICU, the decision to extubate preterm infants is based almost entirely on physician judgment, resulting in substantial variation in extubation practices and frequent failures. Although estimates vary between studies, only 60% to 73% of extremely low birth weight infants are successfully extubated [29]. Preterm infants whose extubation fails are exposed to additional risks, including respiratory deterioration and alterations in cerebral blood flow and oxygenation. Extubation failure and subsequent reintubation are associated with a 10- to 12-day increase in MV duration [4,7,30], and prolonged MV increases the risk of BPD and neurodevelopmental impairment [30–32]. However, in a small subset of preterm infants, reintubation itself is associated with an increased risk of BPD or death independent of MV duration [4]. Taken together, knowing the optimal timing of extubation is crucial for improving the short- and long-term outcomes of preterm infants. Although tools have been developed to predict extubation readiness in preterm infants, reliable methods are still lacking in clinical practice.
Table 3
Performance of NExt-Predictor and the comparison models for predicting extubation readiness.
Classifier | ACC | AUROC (95% CI) | PRC | SENS | SPEC | PPV | NPV
Development cohort | | | | | | |
NExt-Predictor | | | | | | |
LR | 0.851 | 0.783 (0.780–0.795) | 0.951 | 0.630 | 0.800 | 0.945 | 0.286
XGB | 0.855 | 0.827 (0.825–0.829) | 0.962 | 0.812 | 0.675 | 0.932 | 0.397
GBM | 0.874 | 0.869 (0.867–0.871) | 0.971 | 0.734 | 0.800 | 0.952 | 0.356
RF | 0.870 | 0.883 (0.881–0.884) | 0.976 | 0.837 | 0.775 | 0.953 | 0.466
SGD | 0.851 | 0.780 (0.778–0.782) | 0.949 | 0.635 | 0.775 | 0.939 | 0.281
DT | 0.864 | 0.778 (0.774–0.779) | 0.926 | 0.741 | 0.650 | 0.920 | 0.315
CNB | 0.841 | 0.718 (0.716–0.721) | 0.924 | 0.580 | 0.775 | 0.934 | 0.253
Baseline model | | | | | | |
Gupta D's Model 2 | 0.811 | 0.581 | 0.850 | 0.610 | 0.584 | 0.887 | 0.218
Recalibrated LR* | 0.841 | 0.691 (0.688–0.694) | 0.915 | 0.771 | 0.512 | 0.895 | 0.293
Internal validation cohort | | | | | | |
NExt-Predictor | | | | | | |
LR | 0.819 | 0.892 (0.890–0.895) | 0.954 | 0.770 | 0.872 | 0.957 | 0.506
XGB | 0.828 | 0.815 (0.812–0.718) | 0.929 | 0.897 | 0.574 | 0.886 | 0.600
GBM | 0.801 | 0.799 (0.796–0.803) | 0.918 | 0.851 | 0.617 | 0.892 | 0.527
RF | 0.828 | 0.836 (0.833–0.839) | 0.939 | 0.914 | 0.511 | 0.874 | 0.615
SGD | 0.787 | 0.818 (0.815–0.821) | 0.930 | 0.782 | 0.702 | 0.907 | 0.465
DT | 0.805 | 0.766 (0.763–0.770) | 0.895 | 0.681 | 0.907 | 0.907 | 0.533
CNB | 0.782 | 0.799 (0.796–0.803) | 0.913 | 0.730 | 0.745 | 0.914 | 0.427
Baseline model | | | | | | |
Gupta D's Model 2 | 0.795 | 0.677 | 0.836 | 0.697 | 0.651 | 0.876 | 0.378
Recalibrated LR* | 0.779 | 0.767 (0.762–0.770) | 0.892 | 0.862 | 0.500 | 0.862 | 0.500
External validation cohort | | | | | | |
NExt-Predictor | | | | | | |
LR | 0.713 | 0.766 (0.765–0.768) | 0.882 | 0.734 | 0.685 | 0.848 | 0.518
XGB | 0.714 | 0.683 (0.681–0.684) | 0.823 | 0.868 | 0.346 | 0.761 | 0.524
GBM | 0.705 | 0.653 (0.652–0.655) | 0.798 | 0.782 | 0.475 | 0.781 | 0.477
RF | 0.715 | 0.720 (0.718–0.721) | 0.848 | 0.864 | 0.360 | 0.764 | 0.526
SGD | 0.703 | 0.705 (0.703–0.707) | 0.836 | 0.611 | 0.720 | 0.839 | 0.436
DT | 0.709 | 0.641 (0.640–0.742) | 0.784 | 0.876 | 0.311 | 0.753 | 0.511
CNB | 0.712 | 0.712 (0.710–0.713) | 0.823 | 0.797 | 0.517 | 0.798 | 0.516
Baseline model | | | | | | |
Gupta D's Model 2 | 0.693 | 0.499 | 0.816 | 0.709 | 0.555 | 0.791 | 0.445
Recalibrated LR* | 0.699 | 0.668 (0.666–0.669) | 0.824 | 0.754 | 0.466 | 0.771 | 0.443
Abbreviations: ACC, accuracy; PRC, area under the precision-recall curve; SENS, sensitivity; SPEC, specificity; RF, random forest classifier; SGD, stochastic gradient descent classifier; XGB, extreme gradient boosting; GBM, gradient boosting machine; DT, decision tree classifier; CNB, complement-naïve Bayesian; LR, logistic regression; NA, not available; * classifier with the predictors of Gupta et al.

Fig. 3. Receiver operating characteristic curves of NExt-Predictor (logistic regression model) in the (A) development, (B) internal validation, and (C) external validation cohorts.

SBTs have gained attention as objective tests of extubation readiness because they are easy to perform and require no special equipment [1,4]. However, few studies have been conducted in preterm populations. A meta-analysis of these studies [27] concluded that preterm infants should be extubated directly from low ventilator settings without a trial of ET-CPAP. Some studies have reported that SBTs have low specificity for predicting extubation success [9,14,16]. After SBTs were implemented in clinical practice, the extubation failure rate did not change significantly, raising doubts about their efficacy [15,27]. The major issue with SBT evaluation is its subjective interpretation, which limits reproducibility [1,33,34]. Shalish et al. [14] recently demonstrated that 57% of infants exhibited signs of clinical instability during SBTs, which did not improve extubation prediction. Therefore, a new predictor is needed that complements subjective judgment without invasive interventions. In our study, commonly used ventilator setting parameters (PEEP, FiO2, MAP, and RSS) did not differentiate between extubation success and failure; these parameters are influenced by individual physician preferences and have shown conflicting results in other studies [4,15]. In contrast, the SF ratio emerged as a robust predictor of extubation success in our study. It had a high SHAP value and good discriminative performance, reflecting the patient's oxygenation capacity. Lower heart rate variability was also associated with a higher likelihood of extubation failure, consistent with previous studies on cardiovascular variability and extubation readiness [16]. Although signal variability analysis has been used to predict sepsis and unexpected mortality in the NICU [35], its application to other clinical outcomes in preterm infants is limited by the difficulty of distinguishing normal physiological from pathological variation. Some studies have explored heart or respiratory rate variability during SBTs to predict extubation failure [36,37]. In our study, heart rate variability in the 12 h before extubation served as a predictor of extubation failure without the need for an SBT. The SF ratio can be calculated easily and noninvasively from SpO2 and FiO2 and has already proven to be a reliable proxy for the PaO2/FiO2 ratio and a good predictor of noninvasive ventilation failure in children with acute respiratory distress syndrome [38–40]. Our findings highlight the critical role of slight differences in pulmonary oxygenation capacity for successful extubation in preterm infants.
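The SF ratio and the related indices from Section 2.4 are simple arithmetic on routinely charted values, and the 12-h window statistics discussed above can be computed in the same pass. This sketch makes several assumptions: SpO2 is in percent and FiO2 is a fraction (consistent with the SF cutoff of about 319 reported in Section 3.2), RSS is taken as MAP × FiO2 (its conventional definition), and the "sequential difference mean" is taken to be the mean absolute difference between consecutive samples. The exact feature definitions are in eTable 1, which is not reproduced here, and all function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import fmean, stdev
from typing import Dict, List, Sequence, Tuple


def sf_ratio(spo2_percent: float, fio2: float) -> float:
    """SpO2/FiO2 ratio, e.g. SpO2 97% on FiO2 0.25 gives 388."""
    return spo2_percent / fio2


def rox_index(spo2_percent: float, fio2: float, resp_rate: float) -> float:
    """ROX: SF ratio divided by respiratory rate (breaths/min)."""
    return sf_ratio(spo2_percent, fio2) / resp_rate


def rss(map_cm_h2o: float, fio2: float) -> float:
    """Respiratory severity score, taken here as MAP x FiO2."""
    return map_cm_h2o * fio2


def window_values(samples: Sequence[Tuple[datetime, float]],
                  extubated_at: datetime, hours: float = 12) -> List[float]:
    """Values recorded in the `hours` before extubation (t = 0), in time order."""
    start = extubated_at - timedelta(hours=hours)
    return [v for t, v in samples if start <= t < extubated_at]


def time_domain_features(values: Sequence[float]) -> Dict[str, float]:
    """Mean, SD, and mean absolute successive difference of one window."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return {
        "mean": fmean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,
        "seq_diff_mean": fmean(diffs) if diffs else 0.0,
    }


t0 = datetime(2018, 3, 9, 8, 0)
heart_rate = [(t0 - timedelta(hours=h), v)
              for h, v in [(13, 100.0), (11, 150.0), (6, 160.0), (1, 155.0)]]
print(time_domain_features(window_values(heart_rate, t0)))
# {'mean': 155.0, 'sd': 5.0, 'seq_diff_mean': 7.5}
```

The sample recorded 13 h before extubation falls outside the 12-h window and is ignored, which is the behavior a bedside implementation would need when recomputing features as new vital signs arrive.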
This study has several strengths. Firstly, we identified time-series domain predictors that can be obtained in real time at the bedside. NExt-Predictor demonstrated high performance, with an AUROC of 0.892 in internal validation. Secondly, we performed external validation using data from the MIMIC-III database to mitigate selection bias and demonstrate the generalizability of NExt-Predictor. Thirdly, NExt-Predictor does not rely on ventilator settings, which allows for uniform predictive performance across different centers and among physicians with different ventilator-weaning strategies and settings for extubation decisions.
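The AUROC reported above can be read as the probability that a randomly chosen infant from one outcome class receives a higher predicted score than a randomly chosen infant from the other class. The sketch below makes that definition explicit with the rank-based (Mann-Whitney) estimate, counting ties as one half, and adds a percentile bootstrap interval of the kind shown in parentheses in the comparison table. The function names, the bootstrap settings, and the assumption that intervals were obtained by bootstrapping are illustrative only and are not the study's documented procedure.

```python
import random


def auroc(labels, scores):
    """Rank-based (Mann-Whitney) AUROC: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (hypothetical default settings)."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample cases with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    y = [0, 0, 0, 1, 1, 1]
    p = [0.1, 0.3, 0.6, 0.4, 0.7, 0.9]
    print(auroc(y, p), bootstrap_ci(y, p, n_boot=200))
```

For real analyses, scikit-learn's `roc_auc_score` returns the same point estimate for binary labels; the quadratic pure-Python loop is shown only to make the definition explicit.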
However, this study also has limitations. Firstly, the data from the MIMIC-III database were collected between 2001 and 2008 and may be considered outdated. Secondly, the small sample sizes typical of extubation readiness studies may lead to underestimating the performance of state-of-the-art classifiers such as deep learning models. Hence, a multicenter study with a larger population is needed to develop a more robust model for predicting extubation readiness with enhanced performance and broader applicability. Thirdly, even with external validation, prospective clinical studies are required to confirm the clinical efficacy of NExt-Predictor.
5. Conclusion
Although a few studies have assessed extubation readiness in preterm infants, no widely accepted tool for assessing extubation readiness in this population exists yet. In this study, we developed and evaluated an extubation readiness prediction model for preterm infants using EHRs and vital sign databases. The model can assist physicians in determining the optimal timing of extubation for preterm infants in clinical practice without requiring a specific procedure or specialized device.
6. Summary Table
What is already known
• Early extubation in preterm infants has advantages, but there is currently no consensus on the most effective guideline.
• The use of SBT has gained popularity as a method of determining extubation readiness.
• Limited research has explored the predictive value of physiological signals in determining extubation readiness.
What this paper adds
• The SF ratio and its variability were identified as predictors that could be used to quantitatively evaluate extubation readiness.
• The prediction model could assist clinicians in determining extubation readiness without a specific procedure or a specialized device.
• NExt-Predictor demonstrated high performance in both internal and external validation.
CRediT authorship contribution statement
Wongeun Song: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft. Young Hwa Jung: Conceptualization, Formal analysis, Methodology, Validation, Writing – original draft. Jihoon Cho: Data curation. Hyunyoung Baek: Data curation. Chang Won Choi: Conceptualization, Funding acquisition, Investigation, Resources, Supervision, Writing – review & editing. Sooyoung Yoo: Conceptualization, Investigation, Project administration, Resources, Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
Funding
This study was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HI18C0022).
Author contributions
WS analyzed the patient data and developed the prediction model. YHJ and CWC designed the study. JC participated in the data analysis, and HB curated the data and events. WS, YHJ, CWC, and SY were major contributors to the writing of the manuscript. SY and CWC supervised the study. All authors have read and approved the final version of the manuscript.
Appendix A. Supplementary material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijmedinf.2023.105192.
References
[1] H. Al-Mandari, W. Shalish, E. Dempsey, M. Keszler, P.G. Davis, G. Sant’Anna, International survey on periextubation practices in extremely preterm infants, Arch. Dis. Child. Fetal Neonatal Ed. 100 (2015) F428, https://doi.org/10.1136/archdischild-2015-308549.
[2] D. Gupta, R.C. Greenberg, A. Sharma, G. Natarajan, M. Cotton, R. Thomas, S. Chawla, A predictive model for extubation readiness in extremely preterm infants, J. Perinatol. 39 (2019) 1663–1669, https://doi.org/10.1038/s41372-019-0475-x.
[3] W. Shalish, L. Kanbar, M. Keszler, S. Chawla, L. Kovacs, S. Roa, B.A. Panaitescu, A. Laliberte, D. Precup, K. Brown, R.E. Kearney, G.M. Sant’Anna, Patterns of reintubation in extremely preterm infants: a longitudinal cohort study, Pediatr. Res. 83 (2018) 969–975, https://doi.org/10.1038/pr.2017.330.
[4] W. Shalish, M. Keszler, P.G. Davis, G.M. Sant’Anna, Decision to extubate extremely preterm infants: art, science or gamble? Arch. Dis. Child. Fetal Neonatal Ed. 107 (2022) 105–112, https://doi.org/10.1136/archdischild-2020-321282.
[5] B.J. Manley, L.W. Doyle, L.S. Owen, P.G. Davis, Extubating extremely preterm infants: predictors of success and outcomes following failure, J. Pediatr. 173 (2016) 45–49, https://doi.org/10.1016/j.jpeds.2016.02.016.
[6] S. Chawla, G. Natarajan, S. Shankaran, B. Carper, L.P. Brion, M. Keszler, W.A. Carlo, et al., Markers of successful extubation in extremely preterm infants, and morbidity after failed extubation, J. Pediatr. 189 (2017) 113–119.e2, https://doi.org/10.1016/j.jpeds.2017.04.050.
[7] W. Shalish, L. Kanbar, L. Kovacs, S. Chawla, M. Keszler, S. Roa, S. Panaitescu, The impact of time interval between extubation and reintubation on death or bronchopulmonary dysplasia in extremely preterm infants, J. Pediatr. 205 (2019) 70–76.e2, https://doi.org/10.1016/j.jpeds.2018.09.062.
[8] A. Mikhno, C.M. Ennett, Prediction of extubation failure for neonates with respiratory distress syndrome using the MIMIC-II clinical database, Conf. Proc. IEEE Eng. Med. Biol. Soc. (2012) 5094–5097, https://doi.org/10.1109/EMBC.2012.6347139.
[9] S. Chawla, G. Natarajan, M. Gelmini, S.N.J. Kazzi, Role of spontaneous breathing trial in predicting successful extubation in premature infants, Pediatr. Pulmonol. 48 (2013) 443–448, https://doi.org/10.1002/ppul.22623.
[10] W. Shalish, L.J. Kanbar, S. Roa, C.A. Robles-Rubio, L. Kovacs, S. Chawla, M. Keszler, et al., Prediction of extubation readiness in extremely preterm infants by the automated analysis of cardiorespiratory behavior: study protocol, BMC Pediatr. 17 (2017) 167, https://doi.org/10.1186/s12887-017-0911-z.
[11] P. Gourdeau, L. Kanbar, W. Shalish, G. Sant’Anna, R. Kearney, D. Precup, Feature selection and oversampling in analysis of clinical data for extubation readiness in extreme preterm infants, Conf. Proc. IEEE Eng. Med. Biol. Soc. (2015) 4427–4430, https://doi.org/10.1109/EMBC.2015.7319377.
[12] M. Beltempo, T. Isayama, M. Vento, K. Lui, S. Kusuda, L. Lehtonen, G. Sjörs, et al., Respiratory management of extremely preterm infants: an international survey, Neonatology 114 (2018) 28–36, https://doi.org/10.1159/000487987.
[13] A.M. Nakato, D.D.F.C. Ribeiro, A.C. Simão, R.P. Da Silva, P. Nohama, Impact of spontaneous breathing trials in cardiorespiratory stability of preterm infants, Respir. Care 66 (2021) 286–291, https://doi.org/10.4187/respcare.07955.
[14] W. Shalish, L. Kanbar, L. Kovacs, S. Chawla, M. Keszler, S. Rao, S. Latremouille, et al., Assessment of extubation readiness using spontaneous breathing trials in extremely preterm neonates, JAMA Pediatr. 174 (2020) 178–185, https://doi.org/10.1001/jamapediatrics.2019.4868.
[15] R.F. Teixeira, A.C.A. Carvalho, R.D. de Araujo, F.C.S. Veloso, S.B. Kassar, A.M.C. Medeiros, Spontaneous breathing trials in preterm infants: systematic review and meta-analysis, Respir. Care 66 (2021) 129–137, https://doi.org/10.4187/respcare.07928.
[16] J. Kaczmarek, C.O.F. Kamlin, C.J. Morley, P.G. Davis, G.M. Sant’Anna, Variability of respiratory parameters and extubation readiness in ventilated neonates, Arch. Dis. Child. Fetal Neonatal Ed. 98 (2013) F70–F73, https://doi.org/10.1136/fetalneonatal-2011-301340.
[17] A.E.W. Johnson, T.J. Pollard, L. Shen, L.H. Lehman, M. Feng, M. Ghassemi, B. Moody, et al., MIMIC-III, a freely accessible critical care database, Sci. Data 3 (2016) 160035, https://doi.org/10.1038/sdata.2016.35.
[18] K.G.M. Moons, D.G. Altman, J.B. Reitsma, J.P.A. Ioannidis, P. Macaskill, E.W. Steyerberg, A.J. Vickers, et al., Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration, Ann. Intern. Med. 162 (2015) W1–W73, https://doi.org/10.7326/M14-0698.
[19] T.W. Rice, A.P. Wheeler, G.R. Bernard, D.L. Hayden, D.A. Schoenfeld, L.B. Ware, Comparison of the Spo2/Fio2 ratio and the Pao2/Fio2 ratio in patients with acute lung injury or ARDS, Chest 132 (2007) 410–417, https://doi.org/10.1378/chest.07-0617.
[20] O. Roca, J. Messika, B. Caralt, M. García-de-Acilu, B. Sztrymf, J.D. Ricard, J.R. Masclans, Predicting success of high-flow nasal cannula in pneumonia patients with hypoxemic respiratory failure: the utility of the ROX index, J. Crit. Care 35 (2016) 200–205, https://doi.org/10.1016/j.jcrc.2016.05.022.
[21] S. Seabold, J. Perktold, Statsmodels: econometric and statistical modeling with Python, in: Proceedings of the 9th Python in Science Conference (2010) 57–61, https://doi.org/10.25080/Majora-92bf1922-011.
[22] T.J. Pollard, A.E.W. Johnson, J.D. Raffa, R.G. Mark, Tableone: an open source Python package for producing summary statistics for research papers, JAMIA Open 1 (2018) 26–31, https://doi.org/10.1093/jamiaopen/ooy012.
[23] P.R. Rosenbaum, D.B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika 70 (1983) 41–55, https://doi.org/10.1093/biomet/70.1.41.
[24] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016) 785–794.
[25] S.M. Lundberg, S.I. Lee, A unified approach to interpreting model predictions, Adv. Neural Inf. Process. Syst. 30 (2017).
[26] S. Finazzi, D. Poole, D. Luciani, P.E. Cogo, G. Bertolini, Calibration belt for quality-of-care assessment based on dichotomous outcomes, PLoS One 6 (2011) e16110, https://doi.org/10.1371/journal.pone.0016110.
[27] W. Shalish, S. Latremouille, J. Papenburg, G.M. Sant’Anna, Predictors of extubation readiness in preterm infants: a systematic review and meta-analysis, Arch. Dis. Child. Fetal Neonatal Ed. 104 (2019) F89–F97, https://doi.org/10.1136/archdischild-2017-313878.
[28] M.S. Pepe, K.F. Kerr, G. Longton, Z. Wang, Testing for improvement in prediction model performance, Stat. Med. 32 (2013) 1467–1482, https://doi.org/10.1002/sim.5727.
[29] B. Van Calster, et al., Reporting and interpreting decision curve analysis: a guide for investigators, Eur. Urol. 74 (2018) 796–804, https://doi.org/10.1016/j.eururo.2018.08.038.
[30] B. Guy, M.E. Dye, L. Richards, S.O. Guthrie, L.D. Hatch, Association of time of day and extubation success in very low birthweight infants: a multicenter cohort study, J. Perinatol. 41 (2021) 2532–2536, https://doi.org/10.1038/s41372-021-01168-6.
[31] E.A. Jensen, S.B. DeMauro, M. Kornhauser, Z.H. Aghai, J.S. Greenspan, K.C. Dysart, Effects of multiple ventilation courses and duration of mechanical ventilation on respiratory outcomes in extremely low-birth-weight infants, JAMA Pediatr. 169 (2015) 1011–1017, https://doi.org/10.1001/jamapediatrics.2015.2401.
[32] M.C. Walsh, B.H. Morris, L.A. Wrage, B.R. Vohr, W.K. Poole, J.E. Tyson, L.L. Wright, et al., Extremely low birthweight neonates with protracted ventilation: mortality and 18-month neurodevelopmental outcomes, J. Pediatr. 146 (2005) 798–804, https://doi.org/10.1016/j.jpeds.2005.01.047.
[33] R.J.S. Vliegenthart, A.H. van Kaam, C.S.H. Aarnoudse-Moens, A.G. van Wassenaer, W. Onland, Duration of mechanical ventilation and neurodevelopment in preterm infants, Arch. Dis. Child. Fetal Neonatal Ed. 104 (2019) F631–F635, https://doi.org/10.1136/archdischild-2018-315993.
[34] H. Zein, A. Baratloo, A. Negida, S. Safari, Ventilator weaning and spontaneous breathing trials; an educational review, Emerg. (Tehran) 4 (2016) 65–71, https://doi.org/10.22037/AAEM.V4I2.222.
[35] S. Godard, C. Henry, P. Westgaard, N. Scales, S.M. Brown, K. Burns, S. Mehta, et al., Practice variation in spontaneous breathing trial performance and reporting, Can. Respir. J. (2016), https://doi.org/10.1155/2016/9848942.
[36] B.A. Sullivan, C. McClure, J. Hicks, D.E. Lake, J.R. Moorman, K.D. Fairchild, Early heart rate characteristics predict death and morbidities in preterm infants, J. Pediatr. 174 (2016) 57–62, https://doi.org/10.1016/j.jpeds.2016.03.042.
[37] J. Kaczmarek, S. Chawla, C. Marchica, M. Dwaihy, L. Grundy, G.M. Sant’Anna, Heart rate variability and extubation readiness in extremely preterm infants, Neonatology 104 (2013) 42–48, https://doi.org/10.1159/000347101.
[38] J. Kaczmarek, C.O. Kamlin, C.J. Morley, P.G. Davis, G.M. Sant’Anna, Variability of respiratory parameters and extubation readiness in ventilated neonates, Arch. Dis. Child. Fetal Neonatal Ed. 98 (2013) F70–F73, https://doi.org/10.1136/fetalneonatal-2011-301340.
[39] R.G. Khemani, N.R. Patel, R.D. Bart, C.J.L. Newth, Comparison of the pulse oximetric saturation/fraction of inspired oxygen ratio and the Pao2/fraction of inspired oxygen ratio in children, Chest 135 (2009) 662–668, https://doi.org/10.1378/chest.08-2239.
[40] M. Pons-Odena, D. Palanca, V. Modesto, E. Estaban, D. González-Lamuño, R. Carreras, A. Palomeque, SpO2/FiO2 as a predictor of noninvasive ventilation failure in children with hypoxemic respiratory insufficiency, J. Pediatr. Intensive Care 2 (2013) 111–119, https://doi.org/10.3233/PIC-13059.