KSCE Journal of Civil Engineering (2024) 28(1):363-374 pISSN 1226-7988, eISSN 1976-3808 www.springer.com/12205 DOI 10.1007/s12205-023-2457-y Transportation Engineering A Hybrid Framework Combining LSTM NN and BNN for Short-term Traffic Flow Prediction and Uncertainty Quantification Yinpu Wang a , Siping Ke a , Chengchuan An a , Zhenbo Lu a , and Jingxin Xia a a Intelligent Transportation System Research Center, Southeast University, Nanjing 211189, China ARTICLE HISTORY ABSTRACT Received 14 January 2023 Accepted 25 August 2023 Published Online 18 October 2023 Short-term traffic flow prediction plays a critical role in Intelligent Transportation System (ITS), and has attracted continuous attention. Previous studies have focused on improving the prediction accuracy of mean traffic flow. Due to the dynamics and propagation of traffic system, reliable traffic control and induction measures have been considered to be dependent on prediction intervals of short-term traffic flows. The current parametric models used to quantify uncertainty in traffic flow prediction cannot well capture the nonlinear patterns of traffic flow series, and may not apply to situations without long-term continuous observations. This paper proposes a hybrid framework combining long short-term memory neural network (LSTM NN) and Bayesian neural network (BNN) for real-time traffic flow prediction and uncertainty quantification based on sequence data. Caltrans Performance Measurement System (PeMS) traffic flow data for 6 freeways in Sacramento city is aggregated at 15-min intervals to evaluate the proposed model. Compared to the SARIMA-GARCH model, the proposed LSTM-BNN model outperforms in predicting both the mean and interval of the traffic flow. Especially, the experiments show that the LSTM-BNN model is superior during the daytime and under non-seasonal traffic conditions. The proposed LSTM-BNN model can be utilized in ITS for making reliable management decisions. 
KEYWORDS Short-term traffic flow prediction; Uncertainty quantification; Long short-term memory neural network; Bayesian neural network; Intelligent transportation system

CORRESPONDENCE Jingxin Xia xiajingxin@seu.edu.cn ⓒ 2024 Korean Society of Civil Engineers

1. Introduction

Short-term traffic flow is a critical parameter of the Intelligent Transportation System (ITS) (Luo et al., 2018). Using the predicted near-term traffic flow as input, active traffic management and control measures, including traffic signal timing optimization and freeway ramp control, can be developed to alleviate traffic congestion and air pollution. In both research and application domains, the prediction of short-term traffic flow has drawn extensive attention. Traffic management that relies solely on the mean of the traffic flow may not be reliable due to the dynamics and propagation of traffic flow (Long et al., 2008; Treiber and Kesting, 2013). In this regard, quantifying the uncertainty of traffic flow prediction remains a vital yet challenging task in ITS development.

To date, studies on quantifying uncertainty in short-term traffic flow prediction are still limited. In contrast to conventional parametric models, the Bayesian neural network (BNN) has shown its superiority in enhancing the accuracy and reliability of prediction through its capability to measure predictive uncertainty. Given the predictive uncertainty, more reliable applications can be achieved in the ITS domain: 1) traffic managers can know the best and worst consequences of the decisions they will make; 2) travelers can make travel choices according to the upper and lower bounds of traffic conditions.

This study establishes a hybrid framework combining long short-term memory neural network (LSTM NN) and BNN for traffic flow prediction and uncertainty quantification. The contributions of this study are summarized below:

1.
A hybrid framework combining LSTM and BNN is designed for short-term traffic flow prediction and uncertainty quantification. The LSTM-BNN framework does not require building complex statistical models, and can accurately predict both the mean and interval of traffic flow in real time based on short sequences.
2. This is the first neural network-based study to measure uncertainty in traffic flow prediction. The proposed framework can capture the nonlinear uncertainty patterns of traffic flow series. This work highlights the importance of uncertainty quantification in short-term traffic flow prediction and demonstrates the potential of BNN in addressing transportation problems.

The remainder of the paper is organized as follows. Section 2 reviews the related literature on traffic flow prediction. Section 3 provides the details of the hybrid framework, which is composed of a mean prediction module and an interval prediction module. In Section 4, the design of the case study is described and the performance is evaluated. Finally, the conclusions and future work are discussed in Section 5.

2. Literature Review

There are two primary approaches to predicting traffic flow, i.e., parametric and nonparametric models. Time series methods and Kalman filtering methods predominate among the parametric models. Ahmed and Cook (1979) used the autoregressive integrated moving average (ARIMA) (0,1,3) model to forecast traffic flow on freeways in the Los Angeles, Detroit, and Minneapolis surveillance systems. Hamed et al. (1995) found that the ARIMA (0,1,1) model adequately forecasted 1-min traffic volume and verified the model's performance during the morning peak period on 5 urban arterials in Amman, Jordan.
Williams (1999) and Williams and Hoel (2003) justified the application of time series methods to traffic flow forecasting by considering the Wold decomposition theorem, and a seasonal ARIMA (SARIMA) model was presented based on 15-min traffic flow data on a London motorway. In the model, weakly smooth transitions were generated with weekly seasonal differentials, and weekly patterns were set as the key seasonal influences. For the SARIMA model, Williams and Hoel (2003) and Smith et al. (2002) found that the parameters (1,0,1)(0,1,1)672 performed best using 15-min aggregated traffic flow data. To better forecast real-time traffic flow online, the Kalman filtering algorithm is widely adopted. Okutani and Stephanedes (1984) built two Kalman filtering models to forecast weekly and daily traffic flows. Based on vehicle speed data from Texas, USA, Ye et al. (2006) examined the unbiased Kalman filter prediction method and found that it can meet the accuracy requirements of real-time traffic flow prediction.

Apart from mean traffic flow prediction, a few scholars have explored the use of parametric methods for uncertainty quantification in traffic flow forecasting. Xia et al. (2013) presented a vector autoregressive (VAR) plus multivariate generalized autoregressive conditional heteroscedasticity (MGARCH) method for short-term traffic flow forecasting. The presented MGARCH model produces reliable time-varying confidence intervals for traffic flow. Guo et al. (2014) designed an adaptive Kalman filter for the SARIMA-GARCH model to generate real-time traffic flow level and associated interval predictions. The proposed method shows better performance when the real-time traffic flow is highly unstable.
Parametric models might have two limitations: 1) they are applicable to linear systems, so their accuracy still could be improved; 2) a complete long sequence is required to predict near-term traffic flow, e.g., SARIMA(1,0,1)(0,1,1)672 model needs more than a full week of observation to forecast traffic flow for the next 15 minutes. Traffic flow sequences are often inherently nonlinear and have missing values. Therefore, some studies have explored the use of nonparametric models for predicting traffic flow. In terms of nonparametric models, Clark et al. (1993) conducted a forecasting model utilizing a back-propagation (BP) neural network. Park et al. (1998) developed a radial basis function (RBF) neural network to predict traffic volume and concluded that RBF neural network outperformed the BP neural network and consumed less computational time. Due to the capability of capturing the temporal trends of sequence data, recurrent neural network (RNN) has recently been increasingly chosen to predict traffic flow (Tedjopurnomo et al., 2020). Amongst RNN-based methods, LSTM NN is by far the most popular one because it overcomes the shortcomings of conventional RNN in modeling long-term dependencies of sequence data (Hochreiter and Schmidhuber, 1997). Shao and Soong (2016) explored applying LSTM to short-term traffic flow prediction and pointed out that LSTM can learn more abstract representations in the sequence of non-linear traffic flow. Jia et al. (2017) utilized the LSTM model to forecast urban traffic flow taking into account rainfall factor. The model yielded better prediction performance based on rainfall intensity data and arterial traffic flow in Beijing, China. Some researchers treated the evolution of traffic flow as a spatial-temporal process and combined LSTM with other neural networks to improve prediction accuracy. Wu and Tan (2016) and Du et al. 
(2017) each designed a hybrid neural network model that combined LSTM NN and convolutional neural networks (CNN) for traffic flow forecasting. The accuracy and effectiveness of their models were demonstrated experimentally against other traditional neural network models. To exploit attention mechanisms, which can select the information most critical to the current task from all inputs (Vaswani et al., 2017), Tang et al. (2021) presented an attention-based LSTM learning structure with a genetic algorithm to forecast entrance-level traffic volume. Experimental results showed that the attention mechanism can diminish the influence of cumulative errors generated from long traffic flow sequences and further improve forecast accuracy and stability.

Although neural network-based models are performing increasingly well in traffic flow mean prediction, to the best of our knowledge, they have not dealt with uncertainty quantification in traffic flow forecasting.

3. Methodology

3.1 Overall Framework

Figure 1 describes the proposed hybrid framework for short-term traffic flow prediction and uncertainty quantification. The hybrid framework consists of two key modules. The first is a mean prediction module based on LSTM NN, which is well suited to making a mean prediction based on time series data (Hochreiter and Schmidhuber, 1997). The second is an interval prediction module built on BNN, which can measure uncertainty by combining the neural network with Bayesian inference (Blundell et al., 2015; Kendall and Gal, 2017; Zhu and Laptev, 2017). There is a consensus that a large number of Bayesian layers is largely redundant in accounting for uncertainty (Zeng et al., 2018; Jospin et al., 2022). Therefore, this study seeks to quantify the uncertainty in traffic flow prediction using only a few Bayesian fully connected layers.

Fig. 1. The Overall Framework
These two modules take advantage of the strengths of LSTM NN and BNN respectively. The traffic flow data needs to be preprocessed before being fed into the two modules. The traffic flow data is first aggregated according to the length of the time period. The inputs of the proposed model are the traffic flows collected from periods before the current moment, and the output is the traffic flow for the upcoming time step. The historical traffic flow data is used to train the neural networks, and future values are predicted with the trained modules. The outputs of these two modules constitute the predicted traffic flow. The structures and training algorithms of these two modules are described below.

3.2 Traffic Flow Mean Prediction Using LSTM NN

3.2.1 The Structure of the LSTM NN

LSTM NN, as a variant of RNN, is a powerful deep learning method for handling sequence data. It can capture long-term temporal trends of traffic flow sequence data and has been popular in the field of traffic flow prediction for nearly a decade (Tedjopurnomo et al., 2020). To improve the convergence speed and the accuracy of the LSTM NN model, the traffic flow data is standardized before training.

Fig. 2. Mean Prediction Module Using LSTM NN

As shown in Fig. 2, the traffic flow mean prediction module includes several LSTM NN layers. The square nodes represent the LSTM neurons, and the circular nodes represent the input data and hidden output states of the LSTM NN layer. The input data consists of a sequence of traffic flows for the t periods before the future period t + 1. The number of LSTM NN hidden layers and the number of LSTM neurons in each layer are determined by the characteristics of the actual traffic flow. The grey circular node represents the output of the LSTM model, i.e., the mean of the traffic flow for the period t + 1.
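The preprocessing described above (standardize the series, then pair the flows of t consecutive periods with the flow of period t + 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `standardize` and `make_windows` and the sample counts are hypothetical, and the choice of population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(series):
    """Return z-scored values plus the (mean, std) needed to undo the scaling."""
    mu = mean(series)
    sigma = pstdev(series)  # population std; an assumption, the paper does not specify
    if sigma == 0:
        raise ValueError("series is constant; cannot standardize")
    return [(v - mu) / sigma for v in series], (mu, sigma)


def make_windows(series, t):
    """Pair each run of t consecutive flows with the flow of the next period."""
    inputs, targets = [], []
    for start in range(len(series) - t):
        inputs.append(series[start:start + t])
        targets.append(series[start + t])
    return inputs, targets


# Made-up 15-min counts, used only to show the shapes involved.
flows = [120, 135, 150, 160, 155, 140]
X, y = make_windows(flows, t=3)
scaled, (flow_mean, flow_std) = standardize(flows)
```

With six observations and t = 3, the sketch yields three input windows and three targets. In practice the scaling statistics would usually be computed on the training data only and reused for the test data; that is a common convention rather than something the text states.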
A fundamental element of the LSTM NN is the memory cell consisting of three gates, i.e., a forget gate $f_{t+1}$, an input gate $i_{t+1}$, and an output gate $o_{t+1}$ (Gers and Schmidhuber, 2000). The three gates are used to determine and update the cell state $C_{t+1}$. The relevant definitions are given as

$$f_{t+1} = \mathrm{sigmoid}\left(W_f d_{t+1} + W_f h_t + b_f\right), \tag{1}$$

$$i_{t+1} = \mathrm{sigmoid}\left(W_i d_{t+1} + W_i h_t + b_i\right), \tag{2}$$

$$o_{t+1} = \mathrm{sigmoid}\left(W_o d_{t+1} + W_o h_t + b_o\right), \tag{3}$$

$$\tilde{C}_{t+1} = \tanh\left(W_c d_{t+1} + W_c h_t + b_c\right), \tag{4}$$

$$C_{t+1} = f_{t+1} \odot C_t + i_{t+1} \odot \tilde{C}_{t+1}, \tag{5}$$

$$h_{t+1} = o_{t+1} \odot \tanh\left(C_{t+1}\right), \tag{6}$$

where $d_{t+1}$ is the input data at time interval t + 1, i.e., the traffic flow sequence for the t periods; $h_t$ is the hidden state of the memory cell at time interval t; $\tilde{C}_{t+1}$ is the candidate cell state; $\odot$ denotes elementwise multiplication; $W_f$, $W_i$, $W_o$, $W_c$ are the weight matrices; and $b_f$, $b_i$, $b_o$, $b_c$ are the bias vectors.

3.2.2 Backpropagation for Training LSTM NN

LSTM NN can be regarded as a probabilistic model $P(y \mid x, w)$. Given an input x, an LSTM NN uses a set of parameters or weights w to assign a probability to each possible output y. For the purpose of regression in traffic flow prediction, $P(y \mid x, w)$ is a Gaussian distribution, which corresponds to a squared loss. The weights w, based on maximum likelihood estimation (MLE), can be calculated as

$$w^{\mathrm{MLE}} = \arg\max_{w} \log P(D \mid w) = \arg\min_{w} \left(-\log P(D \mid w)\right), \tag{7}$$

where $-\log P(D \mid w)$ is the loss of the LSTM NN. Backpropagation is chosen for training the LSTM NN, where $\log P(D \mid w)$ is assumed to be differentiable in w. The gradient with respect to w is calculated as

$$\Delta_w = \frac{\partial \left(-\log P(D \mid w)\right)}{\partial w}. \tag{8}$$

The LSTM NN is trained by the Adam optimizer (Kingma and Ba, 2014). The optimizer is a stochastic optimization method with high computational efficiency, simple implementation, low memory requirements, and invariance to diagonal rescaling of the gradients (Kingma and Ba, 2014).

Fig. 3. Interval Prediction Module Using BNN
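To make the memory-cell update of Eqs. (1)-(6) concrete, the sketch below runs one forward step in plain Python. It is an illustration under assumptions rather than the paper's code: the names `lstm_step` and `affine` are hypothetical, and the sketch uses separate input matrices `W` and recurrent matrices `U` per gate (the usual LSTM convention), whereas the text writes a single symbol per gate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, d, h):
    """Compute W d + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * x for w, x in zip(W_row, d))
        + sum(u * x for u, x in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(d, h, C, params):
    """One memory-cell update; params maps gate name -> (W, U, b)."""
    f = [sigmoid(z) for z in affine(*params["f"], d, h)]          # forget gate, Eq. (1)
    i = [sigmoid(z) for z in affine(*params["i"], d, h)]          # input gate, Eq. (2)
    o = [sigmoid(z) for z in affine(*params["o"], d, h)]          # output gate, Eq. (3)
    C_tilde = [math.tanh(z) for z in affine(*params["c"], d, h)]  # candidate state, Eq. (4)
    C_next = [fk * ck + ik * gk
              for fk, ck, ik, gk in zip(f, C, i, C_tilde)]        # cell state, Eq. (5)
    h_next = [ok * math.tanh(ck) for ok, ck in zip(o, C_next)]    # hidden state, Eq. (6)
    return h_next, C_next


# Toy setup: input size 1, hidden size 2, all weights zero (illustrative only).
zero_gate = ([[0.0], [0.0]], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
toy_params = {name: zero_gate for name in ("f", "i", "o", "c")}
h_next, C_next = lstm_step(d=[0.3], h=[0.0, 0.0], C=[1.0, -1.0], params=toy_params)
```

With all-zero weights every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved. This makes the role of the forget gate in Eq. (5) easy to see before trained weights are introduced.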
The weights w, based on the Adam optimizer, are optimized to minimize the prediction loss as

$$w \leftarrow w - \alpha_1 \Delta_w, \tag{9}$$

where $\alpha_1$ is the learning rate of the LSTM NN.

3.3 Traffic Flow Interval Prediction Using BNN

3.3.1 The Structure of the BNN

BNN, as a variant of the standard neural network, combines the neural network with Bayesian inference. Since its weights, biases, and outputs are treated as random variables, a BNN can be regarded as an ensemble of multiple neural networks. Before training the BNN model, the traffic flow data is standardized for the same purpose as in the mean prediction module.

As shown in Fig. 3, the traffic flow interval prediction module includes several fully connected BNN layers. The circular nodes represent the input data and hidden output states of the BNN layer. The input data is the traffic flow sequence for the t periods before the future period t + 1. The hidden layers are fully connected BNN layers with nonlinear activation functions. The number of layers and the number of neurons in each hidden layer also need to be set on the basis of the characteristics of the actual traffic flow. The grey square node represents the output of the BNN model, i.e., the interval of the traffic flow for the future period t + 1.

3.3.2 Bayesian by Backpropagation for Training BNN

BNN seeks to find the posterior distribution $P(w \mid D)$ of the weights given the training traffic flow data. Queries about future traffic flow data are answered by taking expectations under this distribution. Given an observation $\hat{x}$, the predictive distribution of the future traffic flow $\hat{y}$ can be written as

$$P(\hat{y} \mid \hat{x}) = E_{P(w \mid D)}\left[P(\hat{y} \mid \hat{x}, w)\right], \tag{10}$$

where D is the training traffic flow data and w is the weights. In Eq. (10), the expectation $E_{P(w \mid D)}\left[P(\hat{y} \mid \hat{x}, w)\right]$ is intractable for neural networks trained on real traffic flow data.
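A common way to work around the intractable expectation in Eq. (10) is to average over weights drawn from an approximate posterior. The sketch below illustrates the idea on a toy linear model, assuming a Gaussian posterior whose standard deviation is parameterized as log(1 + exp(rho)). The function names, the toy model, and the use of empirical percentiles to form a 95% interval are all assumptions for illustration; the text does not specify how the interval is read off the BNN output.

```python
import math
import random


def softplus(rho):
    """sigma = log(1 + exp(rho)), which keeps the standard deviation positive."""
    return math.log1p(math.exp(rho))


def sample_weights(mu, rho, rng):
    """Draw w = mu + softplus(rho) * eps with eps ~ N(0, 1), elementwise."""
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


def predictive_samples(x, mu, rho, n_samples, rng):
    """Forward a toy linear model y = w . x once per sampled weight vector."""
    outputs = []
    for _ in range(n_samples):
        w = sample_weights(mu, rho, rng)
        outputs.append(sum(wk * xk for wk, xk in zip(w, x)))
    return outputs


def mean_and_interval(samples, level=0.95):
    """Sample mean plus an empirical central interval taken from sorted draws."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo_idx = int(math.floor(tail * (len(ordered) - 1)))
    hi_idx = int(math.ceil((1.0 - tail) * (len(ordered) - 1)))
    return sum(ordered) / len(ordered), ordered[lo_idx], ordered[hi_idx]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
x_hat = [0.2, -0.1, 0.4]         # hypothetical standardized input
mu = [0.5, 0.3, 0.8]             # hypothetical posterior means
rho = [-3.0, -3.0, -3.0]         # hypothetical posterior rho parameters
draws = predictive_samples(x_hat, mu, rho, n_samples=2000, rng=rng)
mean_pred, lower, upper = mean_and_interval(draws)
```

Sampling weights this way captures only the uncertainty carried by the weights. A full model would also pass each sampled weight set through the nonlinear layers and map the standardized output back to vehicles per 15 min.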
To approximate Bayesian processes in neural networks, three methods are commonly applied: the Monte Carlo dropout (MC dropout) method, the Markov chain Monte Carlo (MCMC) method, and the Bayesian by backpropagation method. Bayesian by backpropagation, one of the variational inference methods, is adopted in this study for the following reasons: 1) the MC dropout method might not fully capture the uncertainty associated with the model predictions and is only applicable to models with dropout layer(s) (Chan et al., 2020); 2) the MCMC method requires storing a very large number of samples and is suitable only for small and medium-sized models (Blei et al., 2017); 3) the Bayesian by backpropagation method fits any parametric distribution as posterior and is applicable to large-scale models (Blundell et al., 2015).

The method finds the parameters $\theta$ of a distribution $q(w \mid \theta)$ that best approximates the true posterior distribution $P(w \mid D)$. The approximation is achieved by minimizing the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) between the two distributions. Combined with Bayes' theorem, the parameters can be calculated as

$$\begin{aligned}
\theta^{*} &= \arg\min_{\theta} \mathrm{KL}\left[q(w \mid \theta) \,\|\, P(w \mid D)\right] \\
&= \arg\min_{\theta} \int q(w \mid \theta) \log \frac{q(w \mid \theta)}{P(w \mid D)} \, dw \\
&= \arg\min_{\theta} \int q(w \mid \theta) \log \frac{q(w \mid \theta)}{P(w) P(D \mid w)} \, dw \\
&= \arg\min_{\theta} \mathrm{KL}\left[q(w \mid \theta) \,\|\, P(w)\right] - E_{q(w \mid \theta)}\left[\log P(D \mid w)\right].
\end{aligned} \tag{11}$$

The loss function of BNN is further simplified as follows

$$F(D, \theta) = \mathrm{KL}\left[q(w \mid \theta) \,\|\, P(w)\right] - E_{q(w \mid \theta)}\left[\log P(D \mid w)\right]. \tag{12}$$

Equation (12) is a sum of a data-dependent part and a prior-dependent part, which are referred to as the likelihood cost and the complexity cost respectively (Blundell et al., 2015). Because evaluating this cost exactly is computationally expensive, Monte Carlo sampling is adopted, and the prediction loss of Eq. (12) can be approximated as

$$F(D, \theta) \approx \sum_{i=1}^{n} \left[\log q\left(w^{(i)} \mid \theta\right) - \log P\left(w^{(i)}\right) - \log P\left(D \mid w^{(i)}\right)\right], \tag{13}$$

where $w^{(i)}$ is the ith Monte Carlo sample from the variational posterior $q\left(w^{(i)} \mid \theta\right)$. For $w \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma$ is the standard deviation, Monte Carlo sampling directly from $N(\mu, \sigma^2)$ makes $\mu$ and $\sigma$ non-differentiable. To solve this problem, a reparametrization trick (Kingma and Welling, 2019) is used so that backpropagation can still be applied. The standard deviation is parameterized as $\sigma = \log(1 + \exp(\rho))$, so $\sigma$ is non-negative. $\theta = (\mu, \rho)$ are the variational posterior parameters, and the posterior sample of the weights w is

$$w = t(\theta, \epsilon) = \mu + \sigma \odot \epsilon = \mu + \log(1 + \exp(\rho)) \odot \epsilon, \tag{14}$$

where $\epsilon$ is a random variable. The gradients of the mean $\mu$ and the standard deviation parameter $\rho$ are therefore calculated as

$$\Delta_\mu = \frac{\partial f(w, \theta)}{\partial w} + \frac{\partial f(w, \theta)}{\partial \mu}, \tag{15}$$

$$\Delta_\rho = \frac{\partial f(w, \theta)}{\partial w} \frac{\epsilon}{1 + \exp(-\rho)} + \frac{\partial f(w, \theta)}{\partial \rho}. \tag{16}$$

In this module, the Adam optimizer is also chosen for training BNN with large-scale data and parameters. The variational posterior parameters $\theta = (\mu, \rho)$ are optimized as

$$\mu \leftarrow \mu - \alpha_2 \Delta_\mu, \tag{17}$$

$$\rho \leftarrow \rho - \alpha_2 \Delta_\rho, \tag{18}$$

where $\alpha_2$ is the learning rate of the BNN.

4. Case Study

To assess the effectiveness of the proposed LSTM-BNN model, the Caltrans Performance Measurement System (PeMS) dataset collected on freeways is used for the case study. The SARIMA(1,0,1)(0,1,1)672-GARCH(1,1) model, a state-of-the-art method for traffic flow prediction and uncertainty quantification, is selected as the benchmark. For comparison purposes, the prediction intervals for both models are computed at the 95% confidence level.

4.1 Data Description

The Caltrans PeMS dataset is widely used in traffic parameter prediction tasks (Xu et al., 2013; Guo et al., 2019; Li et al., 2019; Yao et al., 2022). The data is collected every 30 seconds from over 15,000 individual vehicle detector stations (VDSs) located throughout California. Sacramento, as the capital of California, benefits from a well-developed freeway network that facilitates efficient travel within the city and connects Sacramento to other regions. The traffic flow data for these freeways is readily available and well-documented in Caltrans PeMS. As shown in Table 1, we use traffic flow data from 6 VDSs on 6 freeways in Sacramento. The specific locations of these VDSs on the freeways are presented in Fig. 4. These VDSs are chosen from major freeways to ensure representative coverage of the city. The prediction results obtained from our study have the potential to assist Sacramento traffic authorities in active traffic management and control.

Table 1. The Detailed Information for Selected VDSs in Sacramento City

| Number | VDS | Freeway | Location name | Lanes |
|---|---|---|---|---|
| 1 | 314909 | I5-N | WB Florin Rd | 4 |
| 2 | 318626 | I5-S | Seamas Ave | 4 |
| 3 | 318282 | US50-E | 25th St | 4 |
| 4 | 312220 | US50-W | NB Howe Ave | 4 |
| 5 | 312694 | SR51-N | 51NB at J ST | 4 |
| 6 | 312857 | SR51-S | 51SB at Elvas Underpass | 3 |

Fig. 4. The Locations of the Selected VDSs

Fig. 5. The Traffic Flow Distributions for 6 Selected VDSs

The study data was collected from September 1st, 2018 to November 30th, 2018. The raw data was aggregated at 15-min intervals. Fig. 5 displays the traffic flow distribution for the selected VDSs, with median flows ranging from approximately 750 to 1000, 15th percentile flows ranging from approximately 150 to 400, and 85th percentile flows ranging from approximately 1050 to 1400. The traffic flow distributions differ across the 6 VDSs, which allows the model's validity to be verified under varied conditions.

The first two months of data were used to train the models and the last month of data was used for evaluation. The training data was transformed to generate the training set and validation set, and the test data was transformed to generate the test set. The data transformation is conducted according to the period number t of the input data. In this case study, t ranged from 1 to 20. The optimal t for each module is determined in Section 4.3.

4.2 Case Study Design

Traffic flow prediction is commonly formulated as a regression problem in the literature (Lv et al., 2015; Yu et al., 2018; Pavlyuk, 2019; Razail et al., 2021).
Thus, four regression measures are chosen to evaluate the performance of the baseline and the proposed model. Mean absolute error (MAE) and mean absolute percentage error (MAPE) are used to determine the accuracy of the mean of the predicted traffic flow. Mean interval width (MIW) and kickoff percentage (KP) are used to quantify the uncertainty of the predicted traffic flow intervals. The four performance measures are defined as

$$\mathrm{MAE} = \frac{1}{N} \sum_{i \in T} \left|\hat{f}_i - f_i\right|, \tag{19}$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i \in T} \frac{\left|\hat{f}_i - f_i\right|}{f_i} \times 100\%, \tag{20}$$

$$\mathrm{MIW} = \frac{1}{N} \sum_{i \in T} \left(\hat{f}_i^{\,upper} - \hat{f}_i^{\,lower}\right), \tag{21}$$

$$\mathrm{KP} = \frac{KN}{N} \times 100\%, \tag{22}$$

where N is the number of overall prediction samples and KN is the number of kickoffs, with a kickoff indicating that the real flow lies outside of the prediction interval. In period i, $f_i$ is the actual 15-min traffic flow, $\hat{f}_i$ is the mean predicted 15-min traffic flow, and $\hat{f}_i^{\,upper}$ and $\hat{f}_i^{\,lower}$ are the upper and lower bounds of the prediction interval. Smaller values are better for all four measures.

The performance of the LSTM-BNN model and the SARIMA-GARCH model is compared by VDS on all performance metrics. To show the detailed performance of the two models, all metrics are also computed by time of day. The daytime is defined as 6:00 am to 7:00 pm, and the remainder of the day is considered nighttime. In addition, we define non-seasonal traffic conditions as those in which the traffic flow at the current period changes by more than 15% compared to the flow during the same period of the previous day. The model performance under non-seasonal traffic conditions is also evaluated.

4.3 Hyperparameter Tuning of the LSTM-BNN Model

To enhance the model performance, hyperparameter tuning was performed on the two LSTM-BNN modules. Both the mean prediction module and the interval prediction module contain hyperparameters such as the length of the input layer, the number of hidden layers, the number of memory cells in each layer, etc. Fig.
6 shows the main experiments of the hyperparameter tuning. For the LSTM NN of the mean prediction module, MAE is selected as the tuning measure. The traffic flows for the past 10 periods are suitable as the input layer for learning the traffic flow sequence pattern. The experiments show that one LSTM layer of 10 memory cells is optimal. A batch size of 500 and a learning rate of 0.001 are appropriate for training the network. For the BNN of the interval prediction module, KP is chosen as the tuning measure based on a fixed MIW. The optimal length of the input layer is 10. One BNN layer of 30 memory cells is sufficient to obtain good performance. A batch size of 1000 and an initial learning rate of 0.01 are suitable for training the network. The values of the key hyperparameters of the LSTM-BNN model are summarized in Table 2.

Fig. 6. Hyperparameter Tuning of the LSTM-BNN Model: (a) LSTM-The Length of the Input Layer, (b) LSTM-The Number of LSTM Layers, (c) LSTM-The Number of Memory Cells in Each Layer, (d) LSTM-Batch Size, (e) LSTM-Learning Rate, (f) BNN-The Length of the Input Layer, (g) BNN-The Number of BNN Layers, (h) BNN-The Number of Memory Cells in Each Layer, (i) BNN-Batch Size, (j) BNN-Learning Rate

4.4 Performance Evaluation

4.4.1 Accuracy Evaluation

Figure 7 presents the overall accuracy of traffic flow prediction for the LSTM-BNN model and the SARIMA-GARCH model, including the MAE and MAPE measures. For all 6 VDSs, the MAE of the LSTM-BNN model is 47.8 veh/15 min, and the MAE of the SARIMA-GARCH model is 58.7 veh/15 min. The MAPE of the LSTM-BNN model is 8.9%, and the MAPE of the SARIMA-GARCH model is 10.6%. For each VDS, the MAEs and MAPEs of the LSTM-BNN model are smaller than those of the SARIMA-GARCH model. Overall, the LSTM-BNN model tends to perform better than the SARIMA-GARCH model in predicting the mean 15-min traffic flow.
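The four measures of Eqs. (19)-(22) that underlie these comparisons can be computed as in the following sketch. The function names are hypothetical. The sample values are the first three LSTM-BNN rows of Table 3, used only to exercise the formulas, so the resulting numbers are not the paper's overall results.

```python
def mae(actual, predicted):
    """Mean absolute error, Eq. (19)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent, Eq. (20)."""
    return 100.0 * sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def miw(lower, upper):
    """Mean interval width, Eq. (21)."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def kp(actual, lower, upper):
    """Kickoff percentage, Eq. (22): share of real flows outside the interval."""
    kickoffs = sum(1 for a, l, u in zip(actual, lower, upper) if a < l or a > u)
    return 100.0 * kickoffs / len(actual)


# Three LSTM-BNN examples taken from Table 3 (veh/15 min).
real_flow = [1034, 762, 1480]
predicted_flow = [1374, 788, 1455]
lower_bound = [1310, 688, 1255]
upper_bound = [1616, 965, 1720]

sample_mae = mae(real_flow, predicted_flow)
sample_mape = mape(real_flow, predicted_flow)
sample_miw = miw(lower_bound, upper_bound)
sample_kp = kp(real_flow, lower_bound, upper_bound)
```

MIW and KP should be read together: an interval can always be widened until no kickoffs remain, which is why the tuning in Section 4.3 compares KP at a fixed MIW.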
The detailed accuracy of 15-min traffic flow prediction by time of day is shown in Fig. 8. For most times of the day, both the MAEs and MAPEs of the LSTM-BNN model are smaller than those of the SARIMA-GARCH model. During the daytime, the accuracy improvement of the LSTM-BNN model over the SARIMA-GARCH model is more pronounced. This may be because the training data contains more high-traffic scenes during the daytime than low-traffic scenes during the nighttime, so the LSTM-BNN model is more likely to learn the traffic flow sequence patterns of the daytime.

Table 2. Hyperparameters of the LSTM-BNN Model

| Module | Hyperparameters | Values |
|---|---|---|
| Mean prediction (LSTM NN) | The length of the input layer | 10 |
| | The number of LSTM layers | 1 |
| | The number of memory cells in each layer | 10 |
| | Batch size | 500 |
| | Learning rate | 0.001 |
| Interval prediction (BNN) | The length of the input layer | 10 |
| | The number of BNN layers | 1 |
| | The number of memory cells in each layer | 30 |
| | Batch size | 1000 |
| | Learning rate, step_size, gamma | 0.01, 5000, 0.5 |

4.4.2 Uncertainty Quantification

Figure 9 displays the overall uncertainty of traffic flow prediction for the LSTM-BNN model and the SARIMA-GARCH model, including the MIW and KP measures. For all 6 VDSs, the MIW of the LSTM-BNN model is 310.4 veh/15 min, and the MIW of the SARIMA-GARCH model is 314.5 veh/15 min. For each VDS, the LSTM-BNN model predicts a narrower MIW than the SARIMA-GARCH model. For all 6 VDSs, the KP of the LSTM-BNN model is 5.1%, and the KP of the SARIMA-GARCH model is 5.4%. For each VDS, the KPs of the LSTM-BNN model are not larger than those of the SARIMA-GARCH model. Together, these results suggest that LSTM-BNN performs better in predicting the interval of the traffic flow.

The uncertainty of 15-min traffic flow prediction by time of day is presented in Fig. 10.
The MIWs of the LSTM-BNN model, ranging approximately from 250 veh/15 min to 400 veh/15 min, are more stable than those of the SARIMA-GARCH model. During the nighttime, when traffic flow is low, the LSTM-BNN model predicts wider MIWs and correspondingly smaller KPs. During the daytime, when traffic flow is heavy, the KPs of the LSTM-BNN model are closer to those of the benchmark, while the MIWs of the LSTM-BNN model are much narrower. Consistent with the conclusions of the accuracy evaluation, the LSTM-BNN model performs better during the daytime with high traffic flow.

Fig. 7. Accuracy Comparison of Traffic Flow Prediction by VDS: (a) MAE, (b) MAPE

Fig. 8. Accuracy Comparison of Traffic Flow Prediction by Time of Day: (a) MAE, (b) MAPE

Fig. 9. Uncertainty Comparison of Traffic Flow Prediction by VDS: (a) MIW, (b) KP

4.4.3 Extended Evaluation under Non-Seasonal Traffic Conditions

To further evaluate the efficacy of the proposed LSTM-BNN model, Table 3 lists several typical examples of traffic flow prediction and uncertainty quantification under non-seasonal traffic conditions. In this work, non-seasonal traffic conditions are defined as those in which the traffic flow at the current period changes by more than 15% compared to the flow in the same period of the previous day. The table shows that the absolute errors (AEs) and absolute percentage errors (APEs) of the LSTM-BNN model are smaller than those of the benchmark under all listed non-seasonal traffic conditions. Meanwhile, the LSTM-BNN model predicts narrower interval widths (IWs) than the SARIMA-GARCH model. In the last column, "Yes" and "No" indicate whether the real flow lies in the prediction interval of the model. Across the 12 non-seasonal traffic conditions, 8 real flows lie in the prediction intervals of the LSTM-BNN model and 5 lie in those of the SARIMA-GARCH model. The superiority of the LSTM-BNN model is mostly identified in conditions with increased traffic flow.
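The 15% rule used to select non-seasonal conditions can be sketched as below. The names `change_ratio` and `is_non_seasonal` are hypothetical. The lag of 96 periods follows from 15-min aggregation (4 periods per hour over 24 hours), and measuring the change relative to the previous day's flow is an assumption about how the change ratio is computed.

```python
PERIODS_PER_DAY = 96  # 24 h x 4 fifteen-minute periods


def change_ratio(flows, idx, lag=PERIODS_PER_DAY):
    """Relative change of the flow at idx versus the same period one day earlier."""
    if idx < lag:
        raise IndexError("no observation available one day earlier")
    previous = flows[idx - lag]
    return (flows[idx] - previous) / previous


def is_non_seasonal(flows, idx, threshold=0.15, lag=PERIODS_PER_DAY):
    """Flag a period whose flow moved by more than the threshold in either direction."""
    return abs(change_ratio(flows, idx, lag)) > threshold


# Made-up series: one flat day, then two periods of the next day.
flows = [1000.0] * PERIODS_PER_DAY + [1200.0, 1100.0]
first_flag = is_non_seasonal(flows, PERIODS_PER_DAY)       # 20% change
second_flag = is_non_seasonal(flows, PERIODS_PER_DAY + 1)  # 10% change
```

Taking the absolute value treats drops and surges alike, which matches the mix of negative and positive change ratios listed in Table 3.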
Overall, the findings reveal that the proposed model outperforms the benchmark under non-seasonal traffic conditions. A possible reason is that the LSTM-BNN model can capture the nonlinear relationship in the traffic flow series, whereas the SARIMA-GARCH model relies more on the seasonal patterns of the traffic flow sequences.

Fig. 10. Uncertainty Comparison of Traffic Flow Prediction by Time of Day: (a) MIW, (b) KP

Table 3. Comparison of Traffic Flow Prediction under Non-Seasonal Traffic Conditions

| VDS | Real flow (veh/15 min) | Change ratio (%) | Model | Predicted flow (veh/15 min) | Lower bound (veh/15 min) | Upper bound (veh/15 min) | AE (veh/15 min) | APE (%) | IW (veh/15 min) | Real flow in the prediction interval? |
|---|---|---|---|---|---|---|---|---|---|---|
| 314909 | 1034 | -34% | LSTM-BNN | 1374 | 1310 | 1616 | 340 | 33% | 306 | No |
| | | | SARIMA-GARCH | 1511 | 1273 | 1750 | 477 | 46% | 477 | No |
| 314909 | 762 | -28% | LSTM-BNN | 788 | 688 | 965 | 26 | 3% | 277 | Yes |
| | | | SARIMA-GARCH | 844 | 339 | 1349 | 82 | 11% | 1010 | Yes |
| 318626 | 1480 | -20% | LSTM-BNN | 1455 | 1255 | 1720 | 25 | 2% | 465 | Yes |
| | | | SARIMA-GARCH | 1653 | 1338 | 1968 | 173 | 12% | 630 | Yes |
| 318626 | 1415 | -22% | LSTM-BNN | 1689 | 1499 | 1872 | 274 | 19% | 373 | No |
| | | | SARIMA-GARCH | 1709 | 1513 | 1905 | 294 | 21% | 392 | No |
| 318282 | 614 | -49% | LSTM-BNN | 988 | 738 | 1178 | 374 | 61% | 440 | No |
| | | | SARIMA-GARCH | 1098 | 812 | 1383 | 484 | 79% | 571 | No |
| 318282 | 711 | -37% | LSTM-BNN | 767 | 543 | 984 | 56 | 8% | 441 | Yes |
| | | | SARIMA-GARCH | 990 | 412 | 1568 | 279 | 39% | 1156 | Yes |
| 312220 | 1030 | -20% | LSTM-BNN | 1145 | 875 | 1370 | 115 | 11% | 495 | Yes |
| | | | SARIMA-GARCH | 1501 | 739 | 2264 | 471 | 45.7% | 1525 | Yes |
| 312220 | 1093 | -17% | LSTM-BNN | 1279 | 1079 | 1503 | 186 | 17% | 424 | Yes |
| | | | SARIMA-GARCH | 1640 | 1278 | 2001 | 547 | 50% | 723 | No |
| 312694 | 754 | -32% | LSTM-BNN | 1078 | 912 | 1219 | 324 | 43% | 307 | No |
| | | | SARIMA-GARCH | 1134 | 864 | 1403 | 380 | 50% | 539 | No |
| 312694 | 1079 | 41% | LSTM-BNN | 965 | 850 | 1122 | 114 | 11% | 272 | Yes |
| | | | SARIMA-GARCH | 811 | 604 | 1020 | 268 | 25% | 416 | No |
| 312857 | 1092 | 23% | LSTM-BNN | 1077 | 1008 | 1226 | 15 | 1% | 218 | Yes |
| | | | SARIMA-GARCH | 937 | 799 | 1075 | 155 | 14% | 276 | No |
| 312857 | 858 | -20% | LSTM-BNN | 956 | 768 | 1048 | 98 | 11% | 262 | Yes |
| | | | SARIMA-GARCH | 1025 | 814 | 1237 | 167 | 19% | 423 | Yes |

5.
Conclusions Short-term traffic flow prediction is extensively studied for effectively serving many intelligent traffic management applications. However, only a limited number of parametric studies focus on measuring uncertainty in traffic flow prediction. Over the past decade, artificial intelligence techniques have become a successful means of solving many transportation problems. This paper proposes an LSTM-BNN framework for short-term traffic flow prediction and uncertainty quantification. Caltrans PeMS traffic flow data for 6 freeways in Sacramento city is aggregated at 15min intervals to evaluate the LSTM-BNN model. The SARIMAGARCH model is used as the benchmark. MAE, MAPE, MIW, and KP are chosen as the performance measures. As for accuracy evaluation, the MAE of the LSTM-BNN model is 47.8 veh/ 15 min and the MAPE of the LSTM-BNN model is 8.9%. As for uncertainty quantification, the MIW of the LSTM-BNN model is 310.4 veh/15 min and the KP of the LSTM-BNN model is 5.1%. Experimental results present that the LSTM-BNN model outperforms the SARIMA-GARCH model in both the 15-min mean and interval of traffic flow prediction. Primarily, the LSTMBNN model is superior during the daytime and under non-seasonal traffic conditions. In reality, the proposed LSTM-BNN model can be utilized by ITS for making reliable management decisions. Future research would be of interest to explore combining spatial-temporal neural network and BNN to measure networklevel uncertainty in traffic flow forecasting. Additionally, considering the dynamics and propagation of the traffic system, it is meaningful to investigate using BNN to address other traffic uncertainty problems. Acknowledgments This study was funded by the National Natural Science Foundation of China (No. 71971060). The authors want to thank the anonymous reviewers for their useful comments and suggestions to improve the quality of this paper. 
ORCID

Yinpu Wang https://orcid.org/0000-0002-0280-6978
Siping Ke https://orcid.org/0000-0003-1599-2359
Chengchuan An https://orcid.org/0000-0002-5254-8751
Zhenbo Lu https://orcid.org/0000-0001-5887-872X
Jingxin Xia https://orcid.org/0000-0003-2298-3303