International Journal of Medical Informatics 178 (2023) 105192
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/ijmedinf

Development and validation of a prediction model for evaluating extubation readiness in preterm infants

Wongeun Song a,d,1, Young Hwa Jung b,c,1, Jihoon Cho a, Hyunyoung Baek a, Chang Won Choi b,c,*, Sooyoung Yoo a,*

a Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
b Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
c Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea
d Department of Health Science and Technology, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea

ARTICLE INFO

Keywords: Extubation readiness; Preterm infants; Prediction model

ABSTRACT

Successful early extubation has advantages not only in terms of short-term respiratory morbidities and survival but also in terms of long-term neurodevelopmental outcomes in preterm infants. However, no consensus exists regarding the optimal protocol or guidelines for extubation readiness in preterm infants. Therefore, the decision to extubate preterm infants was almost entirely at the attending physician’s discretion. We identified robust and quantitative predictors of success or failure of the first planned extubation attempt before 36 weeks of postmenstrual age in preterm infants (<32 weeks gestational age) and developed a prediction model for evaluating extubation readiness using these predictors. Extubation success was defined as the absence of reintubation within 72 h after extubation.
This observational cohort study used data from preterm infants admitted to the neonatal intensive care unit of Seoul National University Bundang Hospital in South Korea between July 2003 and June 2019 to identify predictors and develop and test a predictive model for extubation readiness. Data from preterm infants included in the Medical Information Mart for Intensive Care (MIMIC-III) database between 2001 and 2008 were used for external validation. Using a machine learning model with predictors such as demographics, periodic vital signs, ventilator settings, and respiratory indices, the area under the receiver operating characteristic curve and average precision were 0.805 (95% confidence interval [CI], 0.802–0.809) and 0.917, respectively, in the internal validation and 0.715 (95% CI, 0.713–0.717) and 0.838, respectively, in the external validation. Our prediction model (NExt-Predictor) demonstrated high performance in assessing extubation readiness in both internal and external validations.

Abbreviations: AUROC, area under the receiver operating characteristic curve; BPD, bronchopulmonary dysplasia; CNB, complement naïve Bayesian; CI, confidence interval; DL, deep learning; EHRs, electronic health records; ET-CPAP, endotracheal continuous positive airway pressure; XGB, extreme gradient boosting; GA, gestational age; GBM, gradient boosting model; HFNC, high-flow nasal cannula; LOCF, last observation carried forward; ML, machine learning; MAP, mean airway pressure; MV, mechanical ventilation; MIMIC-III, Medical Information Mart for Intensive Care; N-CPAP, nasal continuous positive airway pressure; NExt-Predictor, neonatal extubation readiness predictor; NIPPV, nasal intermittent positive pressure ventilation; NPV, negative predictive value; PDP, partial dependence plot; PEEP, positive end-expiratory pressure; PPV, positive predictive value; PMA, postmenstrual age; ROX, ratio of SpO2/FiO2 to respiratory rate; RSS, respiratory severity score; SNUBH, Seoul National University Bundang Hospital; SBT, spontaneous breathing trials; SGD, stochastic gradient descent; SHAP, SHapley Additive exPlanation.

* Corresponding authors at: Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea (C.W. Choi); Healthcare ICT Research Center, Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam, Republic of Korea (S. Yoo).
E-mail addresses: choicw@snu.ac.kr (C. Won Choi), yoosoo0@snubh.org (S. Yoo).
1 These authors contributed equally to this work.
https://doi.org/10.1016/j.ijmedinf.2023.105192
Received 26 March 2023; Received in revised form 13 July 2023; Accepted 8 August 2023; Available online 12 August 2023
1386-5056/© 2023 Elsevier B.V. All rights reserved.

1. Introduction

Extremely preterm infants often require endotracheal intubation and mechanical ventilation (MV), particularly during the first few days or weeks after birth, due to lung immaturity, weak respiratory drive, and surfactant deficiency. Although MV remains an important life-saving management method, prolonged MV is associated with an increased risk of bronchopulmonary dysplasia (BPD), neurodevelopmental impairment, and mortality [1,2]. Thus, early extubation of preterm infants is desirable. However, in a recent multicenter study [3], the failure rate of the first planned extubation ranged from 10% at 24 h after extubation to nearly 50% by discharge in preterm infants with a birth weight of ≤ 1,250 g. Extubation failure and subsequent reintubation were associated with an increased duration of MV by 10 to 12 days, BPD, and mortality, primarily because of prolonged exposure to MV [4]. However, reintubation after planned extubation is associated with an increased risk of respiratory morbidity and mortality, independent of the cumulative MV duration and other known confounders [4–7].

Despite the importance of early successful extubation, no consensus protocols or guidelines exist to aid physicians in determining the optimal timing for extubation in preterm infants. In clinical practice, extubation timing is determined at the attending physician’s discretion, which results in significant inter-physician and inter-center variations [1]. Studies have been conducted to determine objective predictors or prediction models for extubation readiness [2,5,6,8–11]. However, the clinical application of these methods is limited because most studies have been conducted with small sample sizes and without external validation [4].

Recently, spontaneous breathing trials (SBT) have gained popularity in determining extubation readiness in extremely preterm infants [12]. However, SBT shows low specificity, and many extremely preterm infants experience adverse events during SBT [13]. Shalish et al. [14] reported that the combination of clinical events used to define a passed or failed SBT had low accuracy in predicting extubation success. Few studies have investigated physiological signals as predictors of extubation readiness in preterm infants.
When used in combination with SBT, variations in heart or respiratory rate improved the predictive ability for extubation readiness in small-scale studies [15,16].

We aimed to identify physiological signals that differentiate extubation success from failure and to apply novel data-driven features to develop a robust and field-applicable prediction model for calculating the probability of extubation readiness based on the physiological signals identified. We named our model the neonatal extubation readiness predictor (NExt-Predictor), which uses physiological signals. We compared and evaluated its performance with that of existing models.

2. Methods

2.1. Data source

This retrospective cohort study was conducted using electronic health records (EHRs) and periodic vital signs of the Seoul National University Bundang Hospital (SNUBH) collected from July 2003 to June 2019 and the Medical Information Mart for Intensive Care (MIMIC-III) database, targeting preterm infants admitted to the neonatal intensive care unit (NICU) [17]. The development and internal validation cohorts were constructed by splitting the SNUBH database into the periods 2003 to 2016 and 2017 to 2019, respectively. The external validation cohort was constructed using the MIMIC-III database for the period from 2001 to 2008 (Fig. 1). This study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) reporting guidelines [18].

Furthermore, this study was reviewed and approved by the Institutional Review Board (IRB) of SNUBH (X-2205-759-901). Because the data source for this study was de-identified, a waiver of consent was granted. Approval for data collection, processing, and release from the MIMIC-III database was granted by the IRB of the Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology [17].

2.2. Eligibility criteria and population

This study included preterm infants born earlier than 32 weeks of gestational age (GA) who received MV through an endotracheal tube and underwent their first extubation attempt before 36 weeks of postmenstrual age (PMA). Patients with major congenital anomalies or other airway problems and those who were extubated after < 6 h of MV were excluded. We selected patients who had been intubated for more than 6 h to exclude cases where intubation was performed for specific examinations such as MRI or surgeries for conditions such as retinopathy of prematurity or inguinal hernia. Patients who underwent unplanned extubation were excluded. Extubation time was defined as the index date (t = 0) to verify reintubation or extubation failure. An observation window was set from the time of admission to the index date. Fig. 1 shows a flowchart of the development and internal and external validation cohorts.

Fig. 1. Development and internal and external validation cohorts to identify predictors and develop predictive models. (A) Development cohort and internal validation cohort. (B) External validation cohort.

2.6. Prediction model development and validation

The models used in this study were logistic regression, random forest, gradient boosting model (GBM), decision tree, stochastic gradient descent (SGD) classifier, complement naïve Bayesian (CNB), and extreme gradient boosting (XGB) [24], all of which have demonstrated high performance in recent studies. The development cohort was divided by stratified splitting into training (80%) and test (20%) datasets. For hyperparameter optimization of each model, 10-fold cross-validation and a grid search based on the area under the receiver operating characteristic curve (AUROC) were applied. The categories and value ranges of the applied model hyperparameters are listed in eTable 2 (Supplement).
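The 80/20 stratified split and 10-fold cross-validation described above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the fixed random seed, and the round-robin fold assignment are assumptions made for illustration, and the toy labels only mirror the class counts of the development cohort (402 successes, 79 failures). A grid search would then score each hyperparameter setting by its mean AUROC across these folds.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test size
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(labels, indices, k=10, seed=0):
    """Assign each index to one of k folds, dealing each class round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Toy labels (hypothetical data): 1 = extubation success, 0 = failure.
labels = [1] * 402 + [0] * 79
train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(labels, train_idx)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

Stratifying keeps the minority (failure) class represented in every fold, which matters when only about one in six labels is a failure.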
Model selection was performed only on the training dataset to prevent leakage from the validation datasets.

2.3. Outcome

The primary outcome was the success or failure of the first planned extubation in preterm infants. Extubation success was defined as no reintubation within 72 h of the planned extubation. Reintubation within 10 min of extubation was excluded from the analysis because discriminating between unplanned extubation due to self-extubation or a misplaced endotracheal tube and extubation failure was difficult. Further details are included in the eMethods section of the Supplementary Material.

2.4. Predictors

The clinical data and physiological signals before extubation and reintubation (in cases of extubation failure) were extracted to identify potential predictors. We selected routinely obtained vital signs, including heart rate, respiratory rate, body temperature (BT), oxygen saturation (SpO2), and blood pressure. Potential predictors included GA, birth weight, PMA at the time of extubation, male sex, pre-extubation blood gases (pH and pCO2), and ventilator settings, such as the fraction of inspired oxygen (FiO2), positive end-expiratory pressure (PEEP), mean airway pressure (MAP), and frequency. Moreover, respiratory indices, such as the SpO2/FiO2 (SF) ratio [19], the ratio of SpO2/FiO2 to the respiratory rate (ROX) [20], and the respiratory severity score (RSS), were included.

Missing values were imputed using the last observation carried forward (LOCF). LOCF was chosen for two reasons. First, it minimizes distortion of the data distribution. Because this study used the variability of variables as predictors, we needed to reduce the risk of introducing artificial biases or altering the statistical properties of the data. Second, most of the missing variables were ventilation input data. Vital signs were collected automatically; therefore, there was almost no data loss. The ventilation input data were entered into the EMR only when the clinician reported a significant change. Reflecting this clinical practice, we assumed that a missing value was likely to be similar to the previous value and that, even if it had actually changed in the interim, the difference was negligible.

Time-domain analysis methods were applied to the physiological variables measured periodically from admission to the index date to generate predictors (eTable 1 in the Supplement). We named this predictor NExt-Predictor.

2.7. Performance evaluation metrics

To evaluate the effectiveness of NExt-Predictor, the prediction model proposed by Gupta et al. [2] was selected as the baseline model. We then compared the performance of NExt-Predictor with that of the baseline model, which was recalibrated to our dataset. The baseline model used GA, extubation days, FiO2, RSS, weight, and pH as predictors. Following the method of Gupta et al., the time points of the baseline model predictors were as follows: for FiO2, pH, and body weight, the values measured at the last time point before extubation were used; for RSS, the highest value during the 24 h before extubation was used.

SHapley Additive exPlanation (SHAP) [25] was applied to evaluate the interpretability and feature importance of the potential predictors. A partial dependence plot (PDP) was used to analyze the predictor cutoffs and marginal effects. The following metrics were used to assess model discrimination: accuracy, AUROC, area under the precision-recall curve (reported as PRC in Table 3), positive predictive value (PPV), and negative predictive value (NPV). To compare the discriminating abilities of the prediction models, we calculated the 95% CI of the AUROC and performed decision curve analysis. Because the classes were highly imbalanced, the trained prediction model was compared with two random guessers.
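The discrimination metrics named above follow directly from a confusion matrix and from pairwise score comparisons. The sketch below is illustrative only and is not the authors' code: the function names and the toy labels and scores are hypothetical, extubation success is assumed to be the positive class, and both classes are assumed to be present so that no denominator is zero.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = extubation success, 0 = extubation failure.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold, chosen for illustration

metrics = confusion_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(metrics, area)
```

With imbalanced classes, a high PPV can coexist with a low NPV, which is why the threshold-free AUROC is reported alongside the threshold-dependent metrics.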
The uninformed guesser was a classifier that did not follow the class distribution within the cohort (extubation success rate of 83%), and the informed guesser was a baseline classifier that followed the distribution of each cohort. Using the calibration belt method, we evaluated the goodness of fit between the predicted and observed probabilities [26].

2.5. Statistical analysis for feature selection

The statsmodels and tableone Python libraries were used for statistical analyses [21,22]. Propensity score matching was used to identify candidate features with statistically significant differences between the outcome and control cohorts [23]. Univariate analyses were performed to determine the adjusted odds ratios (ORs) and marginal effects of all predictors by controlling for gestational age and birth weight, which can significantly influence the extubation success rate of preterm infants. For baseline characteristics, continuous predictors at the time of extubation were compared using the Student’s t-test, and categorical variables were compared using the chi-square test.

3. Results

3.1. Study population

In the SNUBH database, 678 infants met the inclusion criteria. Of these, 481 (71%) were included in the development cohort, and 197 (29%) were included in the internal validation cohort (Fig. 1, Table 1, and eTable 3). In the development cohort, the mean (standard deviation [SD]) GA was 28.5 (2.2) weeks, and birth weight was 1,117 (358) g; 402 patients (84%) were successfully extubated. In the internal validation cohort, the mean (SD) GA was 28.8 (2.2) weeks, and birth weight was 1,179 (375) g; 163 (83%) of the patients were extubated successfully (Tables 1 and 2 and eTable 3).

Table 1
Baseline characteristics for extubation success and failure groups (development cohort).

| Patient characteristics | Extubation success group (n = 402) | Extubation failure group (n = 79) | P-value |
|---|---|---|---|
| Gestational age, mean (SD), weeks | 28.8 (2.1) | 27.1 (2.1) | <0.001 |
| Birth weight, mean (SD), g | 1160.5 (352.2) | 897.0 (342.2) | <0.001 |
| Gender, male | 213 (53.0) | 52 (65.8) | 0.048 |
| PMA at extubation (weeks) | 30.6 (2.5) | 29.4 (2.9) | <0.001 |
| Ventilation variables | | | |
| FiO2 | 0.24 (0.05) | 0.28 (0.08) | <0.001 |
| PEEP (cm H2O) | 5.0 (0.7) | 5.1 (0.6) | 0.370 |
| MAP (cm H2O) | 7.7 (1.4) | 8.2 (1.6) | 0.233 |
| Frequency (rpm) | 24.6 (9.9) | 25.5 (10.5) | 0.486 |
| Oxygen saturation (%) | 97.2 (3.9) | 96.0 (4.5) | 0.025 |
| Ventilation variables post extubation | | | |
| N-CPAP | 322 (80.1) | 63 (79.7) | |
| NIPPV | 24 (6.0) | 9 (11.4) | |
| HFNC | 22 (5.5) | 7 (8.9) | |
| FiO2 | 0.26 (0.07) | 0.33 (0.10) | |
| PEEP (cm H2O) | 5.2 (0.8) | 5.4 (0.8) | |
| MAP (cm H2O) | 8.3 (1.6) | 9.6 (2.5) | |
| Frequency (rpm) | 35.5 (12.4) | 25.5 (13.0) | |

P-values listed for the post-extubation block, in order: <0.001, 0.077, 0.089, 0.495 (four values for seven rows; their row assignment cannot be recovered from the source layout).

Abbreviations: HFNC, high-flow nasal cannula; MAP, mean airway pressure; N-CPAP, nasal continuous positive airway pressure; NIPPV, nasal intermittent positive pressure ventilation; PEEP, positive end-expiratory pressure; PMA, postmenstrual age.

No statistically significant differences were observed in demographic characteristics between the development and internal validation cohorts, except for birth weight (P = 0.049) (Table 2). However, the ventilation settings measured at the time closest to extubation, FiO2 (0.25 [0.06] vs. 0.24 [0.04], P = 0.035), PEEP (4.99 [0.66] vs. 5.29 [0.72], P < 0.001), and frequency (24.76 [10.0] vs. 34.34 [5.43], P < 0.001), showed significant differences between the development and internal validation cohorts. In contrast, the SpO2 and RSS of the two cohorts were not significantly different.
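As a worked check of the FiO2 comparison reported above, a pooled-variance (Student's) t statistic can be computed from the published means, SDs, and cohort sizes. The sketch below is an illustration, not the authors' analysis code: the function name is hypothetical, and the P-value uses a large-sample normal approximation to the t distribution, so it can only approximate the reported P = 0.035, which was derived from unrounded patient-level data.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic from summary statistics (equal-variance form)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # Two-sided P-value under a normal approximation (reasonable for large df).
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


# FiO2: development cohort 0.25 (0.06), n = 481; internal validation 0.24 (0.04), n = 197.
t, df, p = pooled_t_from_summary(0.25, 0.06, 481, 0.24, 0.04, 197)
print(f"t = {t:.2f}, df = {df}, approximate two-sided P = {p:.3f}")
```

The statistic comes out near 2.15 with 676 degrees of freedom and an approximate P-value of about 0.03, in line with the reported significance of this difference.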
From the MIMIC-III database, 802 patients were included in the external validation cohort. The mean (SD) GA was 29.2 (2.3) weeks, and birth weight was 1,343 (429) g; 668 (83%) patients were extubated successfully.

3.2. Predictors of extubation success

Based on multivariate analysis, we generated candidate predictors and selected predictors of extubation success (eTables 5 and 6). We also demonstrated the contribution and interpretability of each predictor using SHAP and PDP. The probability contribution of each predictor and the mutual dependency between predictors were calculated using PDP (Fig. 2). In the PDP analysis, we found a positive correlation between the probability of extubation success and both the mean and the sequential difference mean of heart rate within the 12-h observation window before extubation. The SF ratio positively correlated with extubation success, showing a small variation within the observation window and a high mean within the same period. In contrast, heart rate showed high variability, but the probability of extubation success increased as it approached a stationary state. These results suggest that both the variation and average of the predictors, which were the major determinants in previous studies, played important roles in determining extubation success in our study [1,3,27].

The probability of successful extubation decreased rapidly when the SF ratio was < 319.29. Furthermore, FiO2 had a high SHAP value (0.105), but the PDP showed no discriminative power at an FiO2 of 0.4 or less. The contributions of the other predictors (PEEP, BT, and DBP) did not exceed 0.1, indicating their negligible contributions. The sequential mean difference of the SF ratio had a high coefficient value, indicating that the probability of extubation success decreased when the sequential mean difference of the SF ratio increased (Fig. 2 A, B). Heart rate variability was associated with successful extubation (Fig. 2 C, D, and E).

3.3. Model performance

We created 331 feature sets and 993 predictive models to develop and validate our extubation readiness model. Table 3 compares the models, including the baseline model [2] and NExt-Predictor. NExt-Predictor demonstrated high discriminating ability in both internal validation (AUROC 0.892; 95% CI, 0.890–0.895) and external validation (AUROC 0.766; 95% CI, 0.765–0.768). The AUROC of NExt-Predictor exceeded that of the baseline models, with no overlap of the 95% CIs. In the external validation, the performance of NExt-Predictor did not degrade significantly relative to the AUROC in the internal validation (Fig. 3 and Table 3). In contrast, the AUROC of the baseline models did not exceed 0.700.

The AUROCs of the uninformed and informed guessers were both approximately 0.500, and their accuracy did not exceed the performance of our model. The decision curve analysis demonstrated that NExt-Predictor provided more benefit than did the clinician’s decision (gray line) and other models (eFig. 2 in the Supplementary Material). The calibration belt results for NExt-Predictor demonstrated that internal and external calibration were reasonable, as the p-values were higher than 0.05 in all cohorts (eFig. 2).

4. Discussion

In this study, we proposed an extubation readiness prediction model (NExt-Predictor) that can be easily and safely applied in clinical practice. Furthermore, we performed external validation to determine whether NExt-Predictor was applicable to most clinical settings. The AUROC and decision curve of NExt-Predictor demonstrated high discrimination power in both validation cohorts. Moreover, when extubation success was defined as no reintubation within 7 days rather than 72 h, our model performed without degradation (AUROC, 0.885 in the internal validation cohort and 0.784 in the external validation cohort).
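Decision curve analysis compares strategies by their net benefit at a chosen probability threshold. The following is a minimal sketch, assuming the standard definition NB = TP/n − (FP/n) · pt/(1 − pt), which the paper does not spell out, and treating a predicted extubation success as the positive call. The function name is hypothetical, the cohort counts (197 infants, 163 successes) are taken from the internal validation cohort, and the model counts are invented purely for illustration.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit at probability threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


# Internal validation cohort from the text: 197 infants, 163 successful extubations.
n, successes = 197, 163
failures = n - successes

# "Treat all": every infant is extubated, so every success is a true positive
# and every failure is a false positive.
nb_treat_all = net_benefit(successes, failures, n, threshold=0.5)

# Hypothetical model output (illustrative counts, not taken from the paper).
nb_model = net_benefit(true_pos=150, false_pos=10, n=n, threshold=0.5)

print(f"treat all: {nb_treat_all:.3f}, hypothetical model: {nb_model:.3f}")
```

At a threshold of 0.5 the weight pt/(1 − pt) equals 1, so the "treat all" net benefit is simply (163 − 34)/197, about 0.655; a model adds value only if it removes enough false positives to offset the true positives it gives up.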
Since the extubation success rate of the two institutions was already high, we additionally used decision curve analysis to confirm the extent to which the predictive model could improve clinical decisions. We demonstrated that the net benefit was high at a threshold of 0.5, although the gray line representing the actual clinical decisions of the two institutions already showed a high net benefit. In previous studies, this type of graph has indicated that the benefit of predictive models over “treat all” is only partial [28,29]. However, “treat all” in this study represented the actual clinician’s decision, not the disease’s prevalence, which means that our model demonstrated considerable performance compared to clinical decision-making and could contribute to increasing extubation success.

Table 2
Comparison of the demographic characteristics in the development, internal validation, and external validation cohorts.

| Patient characteristics | Development cohort (n = 481) | Internal validation cohort (n = 197) | P-value | External validation cohort (n = 802) | P-value |
|---|---|---|---|---|---|
| Gestational age, mean (SD), weeks | 28.5 (2.2) | 28.8 (2.2) | 0.058 | 29.2 (2.3) | <0.001* |
| Birth weight, mean (SD), g | 1117.2 (358.2) | 1179.0 (374.6) | 0.049 | 1342.7 (429.0) | <0.001* |
| Gender, male | 265 (55.1) | 109 (55.3) | 1.000 | 366 (45.6) | 1.000 |
| PMA at extubation, mean (SD), weeks | 30.4 (2.6) | 30.9 (2.2) | 0.010 | 30.9 (1.8) | <0.001* |
| Ventilation variables | | | | | |
| FiO2, mean (SD) | 0.25 (0.06) | 0.24 (0.04) | 0.035 | 0.22 (0.04) | <0.001* |
| PEEP, mean (SD), cm H2O | 4.99 (0.66) | 5.29 (0.72) | <0.001 | 5.24 (0.44) | <0.001* |
| MAP, mean (SD), cm H2O | 7.83 (1.44) | 8.18 (1.39) | 0.050 | 5.70 (1.26) | <0.001* |
| Frequency, mean (SD), rpm | 24.76 (10.00) | 34.34 (5.43) | <0.001 | 49.20 (13.34) | <0.001* |
| Oxygen saturation, mean (SD), % | 97.03 (3.98) | 97.39 (3.28) | 0.214 | 96.82 (2.82) | 0.060* |
| Physiological variables | | | | | |
| Measured weight, mean (SD), g | 1271.5 (381.2) | 1322.5 (348.7) | 0.095 | 1403.7 (381.1) | <0.001* |
| Heart rate, mean (SD), bpm | 152.10 (15.28) | 153.56 (14.94) | 0.228 | 152.06 (16.22) | 0.425* |
| Respiratory rate, mean (SD), rpm | 49.87 (14.41) | 50.12 (13.26) | 0.819 | 44.78 (16.48) | <0.001* |
| Systolic blood pressure, mean (SD), mmHg | 62.08 (11.45) | 66.54 (10.89) | <0.001 | 65.65 (9.20) | <0.001* |
| Diastolic blood pressure, mean (SD), mmHg | 37.15 (9.88) | 39.15 (8.98) | 0.008 | 38.22 (7.76) | 0.010* |
| Mean blood pressure, mean (SD), mmHg | 46.62 (9.66) | 48.44 (8.56) | 0.021 | 48.10 (7.59) | 0.010* |
| Body temperature, mean (SD), °C | 36.93 (0.28) | 36.99 (0.29) | 0.015 | 36.18 (0.32) | <0.001* |

* One-way analysis of variance (ANOVA) test.

Fig. 2. PDP and PDP interactive plots for the SF ratio and heart rate. (A) Mean of the SpO2/FiO2 ratio. (B) Sequential difference mean of the SpO2/FiO2 ratio. (C) Dispersion of the heart rate, (D) mean of the heart rate, and (E) trend of the heart rate.

In the NICU, the decision to extubate preterm infants is almost entirely based on physician judgment, resulting in substantial variations in extubation practices and frequent failures. Although it varies from study to study, only 60% to 73% of extremely low birth weight infants are known to be successfully extubated [29]. Preterm infants in whom extubation failed were exposed to additional risks, including respiratory deterioration and alterations in cerebral blood flow and oxygenation. Extubation failure and subsequent reintubation are associated with an increased duration of MV by 10–12 days [4,7,30]. Furthermore, prolonged MV increases the risk of BPD and neurodevelopmental impairment [30–32]. However, in a small subset of preterm infants, reintubation itself is associated with an increased risk of BPD or death independent of MV duration [4]. Taken together, knowledge of the optimal timing of extubation is crucial for improving the short- and long-term outcomes of preterm infants.

Although tools have been developed to predict extubation readiness in preterm infants, reliable methods are lacking in clinical practice.
SBTs have gained attention as objective extubation readiness tests because they are easy to perform and do not require any special equipment [1,4]. However, the number of studies on preterm populations is limited. A meta-analysis of these studies [27] concluded that preterm infants should be extubated directly from low-ventilation settings without a trial of ET-CPAP. Some studies have reported that the SBT has low specificity in predicting extubation success [9,14,16]. After SBTs were implemented in clinical practice, the extubation failure rate was not significantly altered, which raised doubts about their efficacy [15,27]. The major issue with SBT evaluation is its subjective interpretation, leading to limited reproducibility [1,33,34]. Shalish et al. [14] recently demonstrated that 57% of infants exhibited signs of clinical instability during SBTs, which did not improve extubation prediction. Therefore, there is a need to develop a new predictor that complements subjective judgments without invasive interventions.

In our study, commonly used ventilation setting parameters (PEEP, FiO2, MAP, and RSS) did not differentiate between extubation success and failure. Individual physician preferences influence these parameters, which have shown conflicting results in other studies [4,15]. In contrast, the SF ratio emerged as a robust predictor of extubation success in our study. It had a high SHAP value and discrimination performance, reflecting the patient’s oxygenation capacity. Lower heart rate variability was also associated with a higher likelihood of extubation failure, consistent with previous studies on cardiovascular variability and extubation readiness [16].

While signal variability analysis has been used to predict sepsis and unexpected mortality in the NICU [35], its application in predicting other clinical outcomes in preterm infants is limited due to challenges in distinguishing between normal physiological and pathological variations.
Some studies have explored the use of heart or respiratory rate variability based on SBTs to predict extubation failure [36,37]. In our study, heart rate variability in the 12 h before extubation served as a predictor of extubation failure without the need for SBT. The SF ratio can be easily calculated noninvasively using SpO2 and FiO2 and has already proven to be a reliable proxy for the PaO2/FiO2 ratio and a good predictor of noninvasive ventilation failure in children with acute respiratory distress syndrome [38–40]. Our findings highlight the critical role of slight differences in pulmonary oxygenation capacity for successful extubation in preterm infants.

Table 3
Performance of NExt-Predictor and the comparison model for predicting extubation readiness.

| Classifier | ACC | AUROC (95% CI) | PRC | SENS | SPEC | PPV | NPV |
|---|---|---|---|---|---|---|---|
| Development cohort | | | | | | | |
| NExt-Predictor, LR | 0.851 | 0.783 (0.780–0.795) | 0.951 | 0.630 | 0.800 | 0.945 | 0.286 |
| NExt-Predictor, XGB | 0.855 | 0.827 (0.825–0.829) | 0.962 | 0.812 | 0.675 | 0.932 | 0.397 |
| NExt-Predictor, GBM | 0.874 | 0.869 (0.867–0.871) | 0.971 | 0.734 | 0.800 | 0.952 | 0.356 |
| NExt-Predictor, RF | 0.870 | 0.883 (0.881–0.884) | 0.976 | 0.837 | 0.775 | 0.953 | 0.466 |
| NExt-Predictor, SGD | 0.851 | 0.780 (0.778–0.782) | 0.949 | 0.635 | 0.775 | 0.939 | 0.281 |
| NExt-Predictor, DT | 0.864 | 0.778 (0.774–0.779) | 0.926 | 0.741 | 0.650 | 0.920 | 0.315 |
| NExt-Predictor, CNB | 0.841 | 0.718 (0.716–0.721) | 0.924 | 0.580 | 0.775 | 0.934 | 0.253 |
| Baseline, Gupta et al.'s model [2] | 0.811 | 0.581 | 0.850 | 0.610 | 0.584 | 0.887 | 0.218 |
| Baseline, recalibrated LR* | 0.841 | 0.691 (0.688–0.694) | 0.915 | 0.771 | 0.512 | 0.895 | 0.293 |
| Internal validation cohort | | | | | | | |
| NExt-Predictor, LR | 0.819 | 0.892 (0.890–0.895) | 0.954 | 0.770 | 0.872 | 0.957 | 0.506 |
| NExt-Predictor, XGB | 0.828 | 0.815 (0.812–0.718) | 0.929 | 0.897 | 0.574 | 0.886 | 0.600 |
| NExt-Predictor, GBM | 0.801 | 0.799 (0.796–0.803) | 0.918 | 0.851 | 0.617 | 0.892 | 0.527 |
| NExt-Predictor, RF | 0.828 | 0.836 (0.833–0.839) | 0.939 | 0.914 | 0.511 | 0.874 | 0.615 |
| NExt-Predictor, SGD | 0.787 | 0.818 (0.815–0.821) | 0.930 | 0.782 | 0.702 | 0.907 | 0.465 |
| NExt-Predictor, DT | 0.805 | 0.766 (0.763–0.770) | 0.895 | 0.681 | 0.907 | 0.907 | 0.533 |
| NExt-Predictor, CNB | 0.782 | 0.799 (0.796–0.803) | 0.913 | 0.730 | 0.745 | 0.914 | 0.427 |
| Baseline, Gupta et al.'s model [2] | 0.795 | 0.677 | 0.836 | 0.697 | 0.651 | 0.876 | 0.378 |
| Baseline, recalibrated LR* | 0.779 | 0.767 (0.762–0.770) | 0.892 | 0.862 | 0.500 | 0.862 | 0.500 |
| External validation cohort | | | | | | | |
| NExt-Predictor, LR | 0.713 | 0.766 (0.765–0.768) | 0.882 | 0.734 | 0.685 | 0.848 | 0.518 |
| NExt-Predictor, XGB | 0.714 | 0.683 (0.681–0.684) | 0.823 | 0.868 | 0.346 | 0.761 | 0.524 |
| NExt-Predictor, GBM | 0.705 | 0.653 (0.652–0.655) | 0.798 | 0.782 | 0.475 | 0.781 | 0.477 |
| NExt-Predictor, RF | 0.715 | 0.720 (0.718–0.721) | 0.848 | 0.864 | 0.360 | 0.764 | 0.526 |
| NExt-Predictor, SGD | 0.703 | 0.705 (0.703–0.707) | 0.836 | 0.611 | 0.720 | 0.839 | 0.436 |
| NExt-Predictor, DT | 0.709 | 0.641 (0.640–0.742) | 0.784 | 0.876 | 0.311 | 0.753 | 0.511 |
| NExt-Predictor, CNB | 0.712 | 0.712 (0.710–0.713) | 0.823 | 0.797 | 0.517 | 0.798 | 0.516 |
| Baseline, Gupta et al.'s model [2] | 0.693 | 0.499 | 0.816 | 0.709 | 0.555 | 0.791 | 0.445 |
| Baseline, recalibrated LR* | 0.699 | 0.668 (0.666–0.669) | 0.824 | 0.754 | 0.466 | 0.771 | 0.443 |

Abbreviations: RF, random forest classifier; SGD, stochastic gradient descent classifier; XGB, extreme gradient boosting; GBM, gradient boosting machine; DT, decision tree classifier; CNB, complement naïve Bayesian; LR, logistic regression; NA, not available; * classifier with the predictors of Gupta et al.

Fig. 3. Area under the receiver operating characteristic curve of NExt-Predictor with logistic regression model performance: (A) development, (B) internal validation, and (C) external validation cohorts.

This study has several strengths. Firstly, we identified time-series domain predictors that can be obtained in real time at the bedside. NExt-Predictor demonstrated high performance with an AUROC of 0.892 in internal validation. Secondly, we performed external validation using data from the MIMIC-III database to mitigate selection bias and demonstrate the generalizability of our NExt-Predictor.
Thirdly, the NExt-Predictor does not rely on ventilator settings, allowing for uniform predictive performance across different centers and among various physicians with different ventilator-weaning strategies and settings for extubation decisions.

However, this study also had limitations. Firstly, the data from the MIMIC-III database were collected between 2001 and 2008 and may be considered outdated. Secondly, the small sample size of most extubation readiness studies often underestimates the performance of state-of-the-art classifiers like deep learning. Hence, a multicenter study with a larger population is needed to develop a more robust model for predicting extubation readiness with enhanced performance and broader applicability. Thirdly, even with external validation, prospective clinical studies are required to confirm the clinical efficacy of the NExt-Predictor.

5. Conclusion

Although a few studies have been conducted on assessing extubation readiness in preterm infants, there remains no widely accepted assessment tool for extubation readiness in this population. In this study, we developed and evaluated an extubation readiness prediction model for preterm infants using EHRs and vital sign databases, which can assist physicians in determining the optimal timing of extubation for preterm infants in clinical practice without requiring a specific procedure or specialized device.

Investigation, Project administration, Resources, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

Funding

This study was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HI18C0022).

Author contributions

WS analyzed the patient data and developed a prediction model. YHJ and CWC designed this study. JC participated in the data analysis, and HB curated the data and events. WS, YHJ, CWC, and SY were major contributors to the writing of the manuscript. SY and CWC supervised the study. All the authors have read and approved the final version of the manuscript.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijmedinf.2023.105192.

References

[1] H. Al-Mandari, W. Shalish, E. Dempsey, M. Keszler, P.G. Davis, G. Sant’Anna, International survey on periextubation practices in extremely preterm infants, Arch. Dis. Child. Fetal Neonatal Ed. 100 (2015) F428, https://doi.org/10.1136/archdischild-2015-308549.
[2] D. Gupta, R.C. Greenberg, A. Sharma, G. Natarajan, M. Cotton, R. Thomas, S. Chawla, A predictive model for extubation readiness in extremely preterm infants, Am. J. Perinatol. 39 (2019) 1663–1669, https://doi.org/10.1038/s41372-019-0475-x.
[3] W. Shalish, L. Kanbar, M. Keszler, S. Chawla, L. Kovacs, S. Roa, B.A. Panaitescu, A. Laliberte, D. Precup, K. Brown, R.E. Kearney, G.M. Sant’Anna, Patterns of reintubation in extremely preterm infants: a longitudinal cohort study, Pediatr. Res. 83 (2018) 969–975, https://doi.org/10.1038/pr.2017.330.
[4] W. Shalish, M. Keszler, P.G. Davis, G.M. Sant’Anna, Decision to extubate extremely preterm infants: art, science or gamble? Arch. Dis. Child. Fetal Neonatal Ed. 107 (2022) 105–112, https://doi.org/10.1136/archdischild-2020-321282.
[5] B.J. Manley, L.W. Doyle, L.S. Owen, P.G. Davis, Extubating extremely preterm infants: predictors of success and outcomes following failure, J. Pediatr. 173 (2016) 45–49, https://doi.org/10.1016/j.jpeds.2016.02.016.
[6] S. Chawla, G. Natarajan, S. Shankaran, B. Carper, L.P. Brian, M. Keszler, W.A. Carlo, et al., Markers of successful extubation in extremely preterm infants, and morbidity after failed extubation, J. Pediatr.
189 (2017) 113–119.e2, https://doi. org/10.1016/j.jpeds.2017.04.050. [7] W. Shalish, L. Kanbar, L. Kovacs, S. Chawla, M. Keszler, S. Roa, S. Panaitescu, The impact of time interval between extubation and reintubation on death or bronchopulmonary dysplasia in extremely preterm infants, J. Pediatr. 205 (2019) 70–76.e2, https://doi.org/10.1016/j.jpeds.2018.09.062. [8] A. Mikhno, C.M. Ennett, Prediction of extubation failure for neonates with respiratory distress syndrome using the MIMIC- II clinical database, Conf, Proc. IEEE Eng. Med. Biol. Soc. (2012) 5094–5097, https://doi.org/10.1109/ EMBC.2012.6347139. [9] S. Chawla, G. Natarajan, M. Gelmini, S.N.J. Kazzi, Role of spontaneous breathing trial in predicting successful extubation in premature infants, Pediatr. Pulmonol. 48 (2013) 443–448, https://doi.org/10.1002/ppul.22623. [10] W. Shalish, L.J. Kanbar, S. Roa, C.A. Robles-Rubio, L. Kovacs, S. Cawla, M. Keszler, et al., Prediction of extubation readiness in extremely preterm infants by the automated analysis of cardiorespiratory behavior: study protocol, BMC Pediatr. 17 (2017) 167, https://doi.org/10.1186/s12887-017-0911-z. [11] P. Gourdeau, L. Kanbar, W. Shalish, G. Saint’Anna, R. Kearney, D. Precup, Feature selection and oversampling in analsis of clinical data for extubation readiness in extreme preterm infants, Conf, Proc. IEEE Eng. Med. Biol. Soc. (2015) 4427–4430, https://doi.org/10.1109/EMBC.2015.7319377. [12] M. Beltempo, T. Isayama, M. Vento, K. Lui, S. Kusuda, L. Lehtonen, G. Sjörs, et al., Respiratory management of extremely preterm infants: an international survey, Neonatology 114 (2018) 28–36, https://doi.org/10.1159/000487987. 6. Summary Table What is already known • Early extubation in preterm infants has advantages, but there is currently no consensus on the most effective guideline. • The use of SBT has gained popularity as a method of determining extubation readiness. 
• Limited research has explored the predictive value of physiological signals in determining extubation readiness. What this paper adds • The SF ratio and its variability were identified as predictors that could be used to quantitatively evaluate extubation readiness. • The prediction model could assist clinicians in determining extubation readiness without a specific procedure or a specialized device. • NExt-predictor demonstrated high performance in both internal and external validation. CRediT authorship contribution statement Wongeun Song: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft. Young Hwa Jung: Conceptualization, Formal analysis, Method­ ology, Validation, Writing – original draft. Jihoon Cho: Data curation. Hyunyoung Baek: Data curation. Chang Won Choi: Conceptualiza­ tion, Funding acquisition, Investigation, Resources, Supervision, Writing – review & editing. Sooyoung Yoo: Conceptualization, 7 W. Song et al. International Journal of Medical Informatics 178 (2023) 105192 Arch. Dis. Child Fetal Neonatal Ed 104 (2019) F89–F97, https://doi.org/10.1136/ archdischild-2017-313878. [28] M.S. Pepe, K.F. Kerr, G. Longton, Z. Wang, Testing for improvement in prediction model performance, Stat. Med. 32 (2013) 1467–1482, https://doi.org/10.1002/ sim.5727. [29] B. Van Calster, et al., Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators, Eur. Urol. 74 (2018) 796–804, https://doi.org/10.1016/j. eururo.2018.08.038. [30] B. Guy, M.E. Dye, L. Richards, S.O. Guthrie, L.D. Hatch, Association of time of day and extubation success in very low birthweight infants: a multicenter cohort study, Am. J. Perinatol. 41 (2021) 2532–2536, https://doi.org/10.1038/s41372-02101168-6. [31] E.A. Jensen, S.B. DeMauro, M. Kornhauser, Z.H. Aghai, J.S. Greenspan, K.C. 
Dysart, Effects of multiple ventilation courses and duration of mechanical ventilation on respiratory outcomes in extremely low-birth-weight infants, JAMA Pediatr. 169 (2015) 1011–1017, https://doi.org/10.1001/jamapediatrics.2015.2401. [32] M.C. Walsh, B.H. Morris, L.A. Wrage, B.R. Vohr, W.K. Poole, J.E. Tyson, L. L. Wright, et al., Extremely low birthweight neonates with protracted ventilation: mortality and 18-month neurodevelopmental outcomes, J. Pediatr. 146 (2005) P798–P804, https://doi.org/10.1016/j.jpeds.2005.01.047. [33] R.J.S. Vliegenthart, A.H. van Kaam, C.S.H. Aarnoudse-Moens, A.G. van Wassenaer, W. Onland, Duration of mechanical ventilation and neurodevelopment in preterm infants, Arch Dis Child Fetal Neonatal Ed 104 (2019) F631–F635, https://doi.org/ 10.1136/archdischild-2018-315993. [34] H. Zein, A. Baratloo, A. Negida, S. Safari, Ventilator weaning and spontaneous breathing trials; an educational review, Emerg (Tehran) 4 (2016) 65–71, https:// doi.org/10.22037/AAEM.V4I2.222. [35] S. Godard, C. Henry, P. Westgaard, N. Scales, S.M. Brown, K. Burns, S. Mehta, et al., Practice variation in spontaneous breathing trial performance and reporting, Can. Respir. J. (2016), https://doi.org/10.1155/2016/9848942. [36] B.A. Sullivan, C. McClure, J. Hicks, D.E. Lake, J.R. Moorman, K.D. Fairchild, Early heart rate characteristics predict death and morbidities in preterm infants, J. Pediatr. 174 (2016) 57–62, https://doi.org/10.1016/j.jpeds.2016.03.042. [37] J. Kaczmarek, S. Chawla, C. Marchica, M. Dwaihy, L. Grundy, G.M. Sant’Anna, Heart rate variability and extubation readiness in extremely preterm infants, Neonatology 104 (2013) 42–48, https://doi.org/10.1159/000347101. [38] J. Kaczmarek, C.O. Kamlin, C.J. Morley, P.G. Davis, G.M. Sant’anna, Variability of respiratory parameters and extubation readiness in ventilated neonates, Arch. Dis. Child Fetal Neonatal Ed 98 (2013) F70–F73, https://doi.org/10.1136/ fetalneonatal-2011-301340. [39] R.G. Khemani, N.R. 
Patel, R.D. Bart, C.J.L. Newth, Comparison of the pulse oximetric saturation/fraction of inspired oxygen ratio and the Pao2/Fraction of inspired oxygen ratio in children, Chest 135 (2009) 662–668, https://doi.org/ 10.1378/chest.08-2239. [40] M. Pons-Odena, D. Palanca, V. Modesto, E. Estaban, D. González-Lamuño, R. Carreras, A. Palomeque, SpO2/FiO2 as a predictor of noninvasive ventilation failure in children with hypoxemic respiratory insufficiency, J. Pediatr. Intensive Care. 02 (2013) 111–119, https://doi.org/10.3233/PIC-13059. [13] A.M. Nakato, D.D.F.C. Ribeiro, A.C. Simão, R.P. Da Silva, P. Nohama, Impact of spontaneous breathing trials in cardiorespiratory stability of preterm infants, Respir. Care 66 (2021) 286–291, https://doi.org/10.4187/respcare.07955. [14] W. Shalish, L. Kanbar, L. Kovacs, S. Chawla, M. Keszler, S. Rao, S. Latremouille, et al., Assessment of extubation readiness using spontaneous breathing trials in extremely preterm neonates, JAMA Pediatr. 174 (2020) 178–185, https://doi.org/ 10.1001/jamapediatrics.2019.4868. [15] R.F. Teixeira, A.C.A. Carvalho, R.D. de Araujo, F.C.S. Veloso, S.B. Kassar, A.M. C. Medeiros, Spontaneous breathing trials in preterm infants: systematic review and meta-analysis, Respir. Care 66 (2021) 129–137, https://doi.org/10.4187/ respcare.07928. [16] J. Kaczmarek, C.O.F. Kamlin, C.J. Morley, P.G. Davis, G.M. Sant’Anna, Variability of respiratory parameters and extubation readiness in ventilated neonates, Arch Dis Child Fetal Neonatal Ed 98 (2013) F70–F73, https://doi.org/10.1136/ fetalneonatal-2011-301340. [17] A.E.W. Johnson, T.J. Pollard, L. Shen, L.H. Lehman, M. Feng, M. Ghassemi, B. Moody, et al., MIMIC-III, a freely accessible critical care database, Sci. Data 3 (2016), 160035, https://doi.org/10.1038/sdata.2016.35. [18] K.G.M. Moons, D.G. Altman, J.B. Reitsma, J.P.A. Ioannidis, P. Macaskill, E. W. Steyerberg, A.J. 
Vickers, et al., Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration, Ann. Intern. Med. 162 (2015) W1–W73, https://doi.org/10.7326/ M14-0698. [19] T.W. Rice, A.P. Wheeler, G.R. Bernard, D.L. Hayden, D.A. Schoenfeld, L.B. Ware, Comparison of the Spo2/Fio2 ratio and the Pao2/Fio2 ratio in patients with acute lung injury or ARDS, Chest 132 (2007) 410–417, https://doi.org/10.1378/ chest.07-0617. [20] O. Roca, J. Messika, B. Caralt, M. García-de-Acilu, B. Sztrymf, J.D. Ricard, J. R. Masclans, Predicting success of high-flow nasal cannula in pneumonia patients with hypoxemic respiratory failure: the utility of the ROX index, J. Crit. Care 35 (2016) 200–205, https://doi.org/10.1016/j.jcrc.2016.05.022. [21] S. Seabold, J. Perktold, Statsmodels: Econometric and Statistical Modeling with Python. 9th Python in Science Conference (2010) 57-61.https://doi.org/ 10.25080/Majora-92bf1922-011. [22] T.J. Pollard, A.E.W. Johnson, J.D. Raffa, R.G. Mark, Tableone: an open source Python package for producing summary statistics for research papers, JAMIA Open. 1 (2018) 26–31, https://doi.org/10.1093/jamiaopen/ooy012. [23] P.R. Rosenbaum, D.B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika 70 (1983) 41–55, https://doi. org/10.1093/biomet/70.1.41. [24] T. Chen, C. Guestrin, XGBoost: A Scalable Tree Boosting System. 22nd ACM SIGKDD International Conference on Knowledge discovery and data mining. (2016) 785-794. [25] S.M. Lundberg, S.I. Lee, A unified approach to interpreting model predictions, Adv. Neural. Inf. Process. Syst. 30 (2017). [26] S. Finazzi, D. Poole, D. Luciani, P.E. Cogo, G. Bertolini, Calibration belt for qualityof-care assessment based on dichotomous outcomes, PLoS One 6 (2011) e16110, https://doi.org/10.1371/journal.pone.0016110. [27] W. Shalish, S. Latremouille, J. Papenburg, G.M. 
Sant’Anna, Predictors of extubation readiness in preterm infants: a systematic review and meta-analysis, 8