An Adaptive-Margin Support Vector Regression for Short-Term Traffic Flow Forecast Dali Wei1, Hongchao Liu2,* 1 2 Civil and Environmental Department, Texas Tech University, 10th and Akron, Lubbock, TX 79409 Civil and Environmental Department, Texas Tech University, 10th and Akron, Lubbock, TX 79409 Email: hongchao.liu@ttu.edu; Fax: 1- (806)7424168; Telephone: 1-(806)7422801 *Corresponding 1 author An Adaptive-Margin Support Vector Regression for ShortTerm Traffic Flow Forecast ABSTRACT The day-to-day volatility of traffic series provides valuable information for accurately tracking the complex characteristics of short-term traffic such as stochastic noise and non-linearity. Recently, Support Vector Regression (SVR) has been applied for shortterm traffic forecasting. However, standard SVR adopts a global and fixed ε-margin, which not only fails to tolerate the day-to-day traffic variation, but also requires a blind and time-consuming searching procedure to obtain a suitable value for ε. In this work, on the ground of stochastic modeling of day-to-day traffic variation, we propose an adaptive SVR short-term traffic forecasting model. The time-varying deviation of the day-to-day traffic variation, described in a bi-level formula, is integrated into SVR as heuristic information to construct an adaptive ε-margin, in which both local and normalized factors are considered. Comparative experiments using filed traffic data indicate that the proposed model consistently outperforms the standard SVR with an improved computational efficiency. KEY WORDS: Short-term traffic forecasting, Support Vector Regression, ε-margin, day-to-day traffic variation 1. Introduction Short-term traffic forecasting plays an important role in proactive traffic control and operation. Usually, the predicting period required is less than 15 minutes. In such a short interval, traffic volume and velocity exhibit very complex characteristics such as stochastic noise and non-linearity (Jiang & Adeli, 2005; L. Li, Lin, & Liu, 2006; Wei, Chen, & Chen, 2010). A number of forecasting methods have been proposed regarding short-term traffic as a typical time series, such as the Auto Regression Integrated Moving Average (ARIMA) model (Williams, Durvasula, & Brown, 1998; Williams & Hoel, 1999, 2003), the NonParameter Regression model (Davis & Nihan, 1991) and the Neural Network (NN) models (Park, Messer, & Urbanik II, 1998; Smith, Williams, & Keith Oswald, 2002; Xie & Zhang, 2006). For example, Williams and Hoel (1999) applied traditional parametric Box-Jenkins time series models to the dynamic system of single point traffic flow forecasting, which addressed most of the parametric model concerns for traffic condition data by establishing a theoretical foundation for using seasonal ARIMA forecast models. The experiments of Smith et al. (2002) further indicated that the performances of the seasonal ARIMA are better than the nearest neighbor nonparametric regression model, where the Naive forecast model is applied as a benchmark model. Recently, Support Vector Regression (SVR), as a universal learning model, has drawn increasing attentions and successfully been applied in short-term traffic forecasting. 2 Various empirical studies indicate that SVR consistently outperforms traditional forecasting models like ES and ARIMA (Castro-Neto, Jeong, Jeong, & Han, 2009; F. Chen, Wei, & Tang, 2010; Wei et al., 2010; Y. Zhang & Xie, 2008). SVR using the π-insensitive loss function is in the form of (Vapnik, 1995): 0 ππ |π¦ − π(π₯, π)| < π πΏπ (π¦, π(π₯, π)) = οΏ½ (1) |π¦ − π(π₯, π)| − π ππ‘βπππ€ππ π where π(π₯, π) is the predicting function, πΏπ is the loss function, π¦ is the observed value and π is the margin parameter. As shown in Figure 1, if the data sample is inside the πtube (the tube between the two dash lines), SVR regards it as having no loss or error corresponding to the estimation of π(π₯, π). Only the data samples outside the tube are taken into account in the optimization. The π-insensitive loss function would yield better generalization than other loss functions like Huber’s loss function, which not only measures the empirical risk (training error), but also controls the generalization ability and the tolerance of noise, while conferring the advantage of sparsity in the solution (Smola, Schölkopf, & Müller, 1998). Figure 1: Illustration of π insensitive function in SVR However, in the context of short-term traffic forecasting, using a global and fixed πmargin for all time period within a day lacks the flexibility to capture and tolerate the time-varying variation in daily traffic. SVR, similar to other supervised learning models, learns from historical traffic data (training samples) to gain the knowledge in the form of equation systems, and then predicts the future traffic states. Although showing a similar pattern, day-to-day traffic exhibits a prominent variation due to underlying stochastic travelling behavior. This has been well verified by a number of empirical traffic data analyses (Hellinga & Abdy, 2008; Levinson, Sullivan, & Bryson, 2006; J. Q. Li, 2011; Rakha & Van Aerde, 1995; Yin, 2008; L. Zhang, Yin, & Lou, 2010). Hence, to attain a satisfying predicting accuracy, SVR not only needs to learn the seasonal and cyclical patterns, but also needs to adequately tolerate and track this day-to-day variation. Specifically, this variation is time-dependent—different time periods exhibit different variation characteristics. Using a fixed π-margin in SVR obviously cannot adapt to this time-varying volatility. Furthermore, it is always a tricky task to determine a suitable value of π in the fixedmargin SVR. Without prior experience, all optional values for π have to be tried and compared using a K-fold cross-validation method, which is a time-consuming and blind searching procedure. We believe that integrating an adaptive π-margin in SVR to utilize the day-to-day traffic variation as heuristic information, will not only lead to a better 3 forecasting performance, but also avoid the computation complexity for determine a suitable global margin. This forms the basic motivation of this work. In this paper, an adaptive-margin SVR model for short-term traffic forecasting is proposed on the ground of a stochastic modeling of traffic variation. We establish a bilevel stochastic formula to describe the traffic variation, which combines the timevarying characteristic into the day-to-day traffic variation. The obtained time-varying deviation then is utilized to construct an adaptive π-margin of SVR, in which both local and normalized factors are considered. Four traffic datasets from Hefei, China and Seattle, USA are used in this paper for training our model and validate the performance. Our new model is compared with the fixed π margin SVR, as well as the seasonal ARIMA, NN and Naive forecast. The rest of this paper is organized as follows. In section 2, we briefly introduce the SVR, and then propose the stochastic model for the day-to-day traffic variance and the adaptive π -margin in SVR. Section 3 presents comparative experiments to evaluate the performance of our model. Section 4 concludes the paper. 2. Model formulation The section briefly introduces some fundamentals of ε -SVR, and then presents the stochastic modeling of the day-to-day traffic variation and describes how to integrate this variation into the adaptive margin of SVR. 2.1 Support Vector Regression Support Vector Machine (SVM) is a universal learning method based on the statistical learning theory developed by Vapnik (1995). SVM embodies the structural risk minimization (SRM) principle to minimize an upper bound on the expected risk, which is opposed to empirical risk minimization (ERM) that aims to minimize the error on the training data. This feature equips SVM with a greater generalization performance. SVR is an application of SVM for function estimation problem. Given a set of training data: (π₯π , π¦π ), π = 1,2,3 … , π, π₯π ∈ βπ , π¦π ∈ β (2) a linear function regression function can be stated asοΌ π(π₯, π) = ππ π(π₯) + π (3) in high-dimensional feature space β±, where ω is a vector in β±. Function Ο(x) maps the input x into a vector in β±. The quality of estimation is measured by the loss function of L(y, f(x, ω)). SVR uses a new type of loss function, namely an β°-intensive loss function in Equation (1) proposed by Vapnik. The empirical risk is then: 1 π πππ (π) = ∑ππ=1 πΏβ° (π¦π , π(π₯, π)) (4) π SVR performs linear regression in the high-dimension feature space using the β°-intensive loss function and, at the same time, tries to reduce model complexity by minimizing βπβ2 . This is described by introducing slack variables, ππ , ππ∗ π = 1, … , π, to 4 measure the deviation of training samples outside the β° -tube. Thus, SVR is then formulated as: π 1 2 min βπβ + πΆ οΏ½(ππ + ππ∗ ) 2 π=1 π¦π − π(π₯π , π) ≤ β° + ππ∗ π . π‘. οΏ½ π(π₯π , π) − π¦π ≤ β° + ππ ππ , ππ∗ ≥ 0, π = 1, … , π 1 (5) The βπβ2 in Equation (5) is called the regularization term. πΆ is the regularized fixed for 2 determining the trade-off between the empirical risk and the regularization term. Increasing the value of πΆ will result in the relative importance of the empirical risk with respect to the regularization term to grow. We can view the goal defined by Equation (6) as: 1 π πππ (π) = min βπβ2 + π πππ (π) (6) 2 The ERM principle only considers the term π πππ (π), which is the risk on the given dataset. When dealing with few data in very high-dimensional spaces, this may lead to over-fitting and thus poor generalization properties. Hence the SVR method adds a 1 capacity control term— βπβ2 , which leads to the regularized risk function. 2 To solve the optimization problem of Equation (5), the objective function can be transformed to a dual problem, and its solution is given byοΌ ππ π£ (πΌπ − πΌπ∗ )πΎ(π₯π , π₯) π(π₯) = ∑π=1 π . π‘. 0 ≤ πΌπ ≤ πΆ, 0 ≤ πΌπ∗ ≤ πΆ (7) where ππ π£ is the number of support vectors and πΎ(π₯π , π₯) is the kernel function. So with Equation (6), the one-step SVR forecasting model is illustrated in Figure 2. The input variables of this model are πΉ(π‘ − π) ,…, πΉ(π‘), where πΉ(π‘) denotes the traffic volume at time period π‘. The one-step model prediction is πΉ(π‘ + 1). Volume t F(t-k) F(t-k+i) F(t) F(t+1) SVR Figure 2: Illustration of SVR short-term traffic forecasting model As stated in section 1, the π-insensitive loss function not only measures the empirical risk (training error), but also controls the generalization ability and the tolerance of noise. It 5 is well known that a suitable value of π should depend on the input variation level π ∝ π, as well as the number of training samples. In the context of short-term traffic forecasting, the traffic variation exhibits a time-varying stochastic characteristic, which demands an adaptive π-margin to accommodate the varying level of variation. Next, we will develop a time-varying model for describing day-to-day traffic variation and then present the adaptive π-margin in SVR. 2.2 Modeling Stochastic Variation in Day-to-Day Traffic Although showing a typical seasonal and cyclical pattern, daily traffic also exhibits a prominent day-to-day variation, which has been verified by a number of empirical traffic data analyses. Levinson et al. (2006) analyzed weekday traffic data from 22 continuous traffic counting stations in the city of Milwaukee and computed the coefficient of variation (COV) of peak hour traffic volume, which ranged from 0.048 to 0.155 with a mean of 0.089. Yin (2008) collected traffic data during 9–11AM for an intersection of Gainesville, Florida and showed that the traffic flow varies significantly within that time interval. Hellinga and Abdy (2008) investigated the observed traffic data in Ontario, Canada and found the variation of day-to-day hour volume can be best described by the normal distribution. They also analyzed the daily traffic volume for a 12 month period at city of Toronto and concluded that the hourly volumes follow a normal distribution (Abdy & Hellinga, 2008). Rakha and Van Aerde (1995) investigated traffic volumes at Orlando, Florida for 22 days and validated that 5-min traffic counts for weekday traffic is normally distributed with a confidence level of 95%. Based on these empirical findings, we formulate the traffic volume π(π‘, π) at time period π‘ of day π as a random distribution with a mean of ππ (π‘) and variance of π 2 (π‘). So we have the day-to-day variation in the form of: βπ(π‘) = [π(π‘, π) − π(π‘, π)] ∼ π·οΏ½0, π 2 (π‘)οΏ½ (8) 2 (π‘) 2 (π‘) where π = 2π∗ with the assumption that the distribution of the traffic volumes at the same time period π‘ for day π and day π are independent. π· is the random distribution of the day-to-day traffic volume variations. For determining the variance π(π‘) in Equation (8), we introduce a bimodal formula as Equation (9), which is able to describe the time-varying characteristic of the day-to-day traffic variation. π(π‘) = π0 οΏ½ππ π‘−π −οΏ½ π 1 οΏ½ 1 2 + (1 − π)π π‘−π −οΏ½ π 2 οΏ½ 2 2 οΏ½ (9) Equation (9) is derived from the empirical variation data shown in Figure 3, which plots the day-to-day variation from five-minute traffic counts on five consecutive weekdays collected in Huangshan Avenue in Hefei, China. Three features can be easily observed from the figure. First, the variation is obviously time-varying within a day and shows different magnitude of deviations; secondly, the variation is symmetric with the zero axis; last but not least, two peak variations can be identified, representing the morning and afternoon peak hours. With well-calibrated parameters, these features can be characterized by the proposed Equation (9). 6 Day to day traffic volume variation (Normalized) vehicle per 5 minutes 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 1440 1080 720 Time (minutes) 360 0 Figure 3: Day-to-day traffic variation of the traffic dataset from Hefei, China Using Equation (8) together with Equation (9), the day-to-day traffic variation is then modeled as: βπ(π‘) ∼ π‘−π 1οΏ½ −οΏ½ π· οΏ½0, π0 οΏ½ππ π1 2 + (1 − π‘−π 2 2οΏ½ −οΏ½ π)π π2 οΏ½οΏ½ (10) In Equation (10), at the day-to-day level, the traffic variation at certain time π‘ is subject to a random distribution with a mean of zero. Meanwhile within a day, the variances of the at different time periods π‘ are associated by a bimodal function to represent the timevarying characteristics observed from field data. To find the parameters of the distribution, the Max-likelihood Estimation (MLE) method is applied to identify the deviation π(π‘) in day-to-day traffic, which requires the maximization of the log-likelihood function: π π 1 ln βοΏ½π 2 (π‘)οΏ½ = − ln 2π − lnπ 2 (π‘) − 2(π‘) ∑ππ=1 Δ(t)2π (11) 2 2 2π where π is the number of variation observations and β is the day-to-day traffic variation. Taking derivatives with respect to π 2 (π‘) and solving the resulting system of first order conditions yields the maximum likelihood estimations: 1 οΏ½(π‘))2 ] ποΏ½ 2 (π‘) = ∑ππ=1[(Δπ (π‘) − Δ (12) π ποΏ½ 2 (π‘) is then associated in a compact function form of Equation (8) by the Non-Linear Least Square (NLLS) method, which finds the optimal parameter set of {π, π1 , π2 , π1 , π2 , π0 } to minimize: Ψ = οΏ½ οΏ½π0 = π‘−π1 2 −οΏ½ οΏ½ οΏ½ππ π1 + (1 − π‘−π −οΏ½ π 1 οΏ½ 1 ∑πΎ οΏ½ππ οΏ½(π 0 π=1 2 π‘−π2 2 −οΏ½ οΏ½ π)π π2 οΏ½ + (1 − π‘−π 2 − ποΏ½(π‘)οΏ½ ππ‘ 2οΏ½ −οΏ½ π)π π2 2 οΏ½ − ποΏ½(π))2 οΏ½ (13) where πΎ is the number of observations within a day. A widely-accepted computation method, the Gauss-Newton method is applied to solve the NLLS of Equation (13). 7 2.3 Adaptive margin On the ground of stochastic modeling the time-varying day-to-day traffic variation, this section describes how to integrate this variation to construct an adaptive π-margin. It is well known that π should depend on the data variation (Cherkassky & Ma, 2004; Smola et al., 1998), and therefore we proposed a new time-varying π-margin as: π(π‘) = οΏ½οΏ½ π1 π(π(π‘)) οΏ½) οΏ½οΏ½οΏ½οΏ½οΏ½ + ποΏ½οΏ½οΏ½ 2 π(π πππππ π£πππππ‘πππ π‘πππ π . π‘. π1 + π2 = 1 ππππππππ§ππ π‘πππ (13) The π1 π(π(π‘)) term describes the local variation, while π2 π(ποΏ½) contains the normalized variation information. The parameters of π1 and π2 provide a tradeoff between local and global information. The proposed time-varying form of π-margin includes the fixed margin as a special case if π1 = 0. When π2 = 0, the margin then solely relies on local variation. As stated in section 2.1, π is also dependent on the number of training samples π. We adopt an empirical selection proposed by Cherkassky and Ma (2004) as: ln π π(π) = 3ποΏ½ π (14) Combining Equation (13) and Equation (14), the proposed adaptive margin is ln π π(π‘) = 3(π1 π(π‘) + π2 ποΏ½)οΏ½ (15) π With this new margin, the SVR optimization problem is then re-formulated as: π 1 min βπβ2 + πΆ οΏ½(ππ + ππ∗ ) 2 π=1 π . π‘. β§π¦π − π(π₯π , π) ≤ π(π1 π(π‘) + π2 ποΏ½)οΏ½ln π + ππ∗ π βͺ ln π β¨ π(π₯π , π) − π¦π ≤ π(π1 π(π‘) + π2 ποΏ½)οΏ½ βͺ ππ , ππ∗ ≥ 0, π = 1, … , π β© π + ππ 3. Experiments and Discussion This section presents comparative experiments to validate our adaptive-margin SVR using four different field data. The performance is compared with fixed-margin SVR determined by the method of cross-validation, as well as the seasonal ARIMA model, RBF-NN model and the Naive forecast method. 3.1 Experimental settings 8 (16) 3.1.1 Data Normalization and Preprocessing Four field datasets from Hefei, China and Seattle, USA are used in this experiment. The first dataset is collected from Huangshan Avenue in Hefei, China. It contains traffic counts for five consecutive weekdays from Sep. 15th to Sep. 19th 2008 in normal traffic conditions without incidents and adverse weather. The second dataset contains traffic counts of eight weeks from March 26th to May 20th, 2007 on I-90 (north bound of station cabinet ES-855D), obtained from the Traffic Data Acquisition and Distribution (TDAD) database maintained by an ITS research group at the University of Washington. The third dataset also contains eight weeks of traffic counts from Mar 19th to May 13th on SR-18 (east bound of station cabinet ES-394D). The fourth dataset contains eight weeks of traffic counts from April 2nd to May 26th on SR525 (south bound of station cabinet ES-766D). The traffic counts are aggregated into 5 minutes interval and normalized into [0,1] by: π¦ππππ = π¦−π¦πππ (17) π¦πππ₯ −π¦πππ where π¦πππ₯ and π¦πππ are the maximum and minimum traffic counts in the dataset, respectively. Since the data series contains obvious outliers with zero or extremely large traffic volume. The statistics of the four datasets are summarized in Table 1. Table 1: Descriptive Statistics of experimental datasets Collection Periods Series length Mean (vehicle/5minutes) Hefei Dataset th th Sep. 15 to Sep. 19 2008 th th Standard Deviation 1440 21.6671 16.0877 ES-855D Mar. 26 to May 20 2007 16128 204.1465 141.5130 ES-394D Mar. 19th to May 13th 2007 16128 54..4217 36.1245 16128 70.1395 46.3563 ES-766D nd th Apr. 2 to May 26 2007 3.1.2 Auto-Correlation The input dimension is important for establishing the prediction model. A suitable input dimension is usually determined by the statistical autocorrelation function (ACF), which measures the correlation between data points in a time series separated by different time lags (Jiang & Adeli, 2005). The smallest time lag which makes ACF equal to zero was used to determine the input dimension. Figure 4 shows the ACF for the four datasets. For the Hefei dataset (Fig. 4a), the ACF curve first intersects with zero axes at a time lag of 6 hours, and thus the input dimension is then selected to be 72. For the other three datasets from Seattle, the ACF is plotted in Figure 5(b)-(d) and the input dimension is selected to be 60. 9 Besides π, other parameters including the kernel parameter π and the regularization parameter πΆ are determined by cross-validation. The SVR program is implemented based on the SVM-KM toolbox (Canu, Grandvalet, Guigue, & Rakotomamonjy, 2005). 1 1 0.8 0.6 0.5 0.2 ACF ACF 0.4 0 -0.2 0 -0.4 -0.6 -0.8 0 8 16 32 24 Hour 40 -0.5 0 48 16 32 48 Hour (a) (b) 1 1 0.8 0.6 0.5 ACF ACF 0.4 0.2 0 0 -0.2 -0.4 -0.6 0 -0.5 32 16 48 16 0 Hour 32 48 Hour (c) (d) Figure 4: ACF of the traffic series data (a) dataset from Hefei; (b) ES766; (c) ES394; (d) ES855 3.2 Variation Calibration Result As described in section 2, the variation for day-to-day traffic is modeled by Equation (9) and a combination method of MLE and NLLS is adopted to calibrate the parameters. The results are summarized in Table 2. Table 2: Parameter sets in variation equation (9) Hefei Dataset ES-855D ES-766D ES-394D ππ π ππ (π − π) 540.0 ππ 948.0 ππ 114.4 579.5 0.0749 0.1309 ππ 0.1019 356.7 1008.5 183.1 403.1 0.0608 0.08010 334.9 912.5 171.5 411 0.0851 0.0448 0.05345 296.8 971.0 198.0 ππ 394.3 The time-varying deviations of variations are then plotted in Figure 6. We can see that for the tested datasets the deviations of the afternoon peak is slightly smaller than the morning peak, and the descending branch is also smoother compared with the morning peak. For the datasets from Seattle, the deviation of the morning peak is obviously larger than the afternoon peak. The traffic counts from ES-855 shows relatively larger 10 deviations than other datasets, which is easy to understand and consistent with the deviation of the four time series listed in Table 1. 0.12 Heifei dataset ES-394D dataset ES-855D dataset ES-766D dataset 0.1 σ(t) 0.08 0.06 0.04 0.02 0 0 360 720 Time (minutes) 1080 1440 Figure 5: Time-varying deviation in day-to-day traffic 3.3 Performance Evaluation and Discussion This section presents the comparative results of our adaptive SVR with fixed π on all the four datasets first, then three forecasting models including ARIMA, RBF-NN, and Naïve forecast methods are implemented on the three large datasets from Seattle. 3.3.1 Performance evaluation with Fix-margin SVR Three error indicators are used in this experiment, including Mean Absolute Error (MAE), Mean Sum of Squares of Errors (MSE), and Normalized Square Root of the Mean Sum of Squares of Errors (NSRMSE). The NSRMSE measures the MSE in a normalized way regarding the level of the deviation in the data itself. 1 (18) ππ΄πΈ = ∑π οΏ½π − π¦π | π=1|π¦ π 1 πππΈ = ∑π οΏ½π − π¦π )2 π=1(π¦ π 1 πππ πππΈ = οΏ½π1 π ∑π οΏ½ π −π¦π )2 π=1(π¦ ∑π οΏ½)2 π=1(π¦π −π¦ (19) (20) The adaptive-margin SVR model is compared with the fixed π-SVR first, where the parameter of π is usually determined by cross-validation. In K-fold cross-validation, the original sample is randomly partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K−1 subsamples are used as the training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. Although cross-validation is widely adopted to determine a suitable fixed margin π in SVR, it is a time-consuming and blind searching procedure with a computational complexity of π(πΎπ ), where π is the range of the parameter (F. Chen, Wei, & Tang, 2011). In contrast, the adaptive-margin SVR avoids this blind searching procedure and 11 improved the computational efficiency through establishing an adaptive margin, which is heuristically determined by the day-to-day traffic variations. For the dataset collected from Hefei, traffic counts on Monday are used as the training samples, and the validation dataset contains traffic volumes from Tuesday to Friday. Results with different combination of π1 and π2 are listed in Table 3. The predicted and observed traffic counts on Tuesday are plotted in Figure 7. The NRSSME is plotted in Figure 6, from which we can see that the predicting errors are descending for the consecutive four weekdays with an increase in the proportion of local variation. For nearly all the four of the predicting weekdays, the lowest predicting errors are obtained when π1 = 1 and π2 = 0, which indicates utilizing local variation information without normalized or global information can achieve a better performance. It is also interesting to note that since the traffic on Friday shows a different pattern from typical weekdays, the prediction errors also tend to be slightly larger. Table 4 gives the forecasting errors between fixed and adaptive margin SVRs. It can be seen that compared with the time-consuming K-fold cross-validation procedure in the fixed-margin SVR, the proposed adaptive-margin SVR not only improves the computational efficiency by heuristically determining an adaptive margin, but also consistently outperforms the fixed-margin SVR on the forecasting accuracy. NRSMSE for adaptive-margin NRSMSE for fixed-margin 0.56 0.52 0.558 NRSMSE 0.515 0.556 0.554 0.51 0.552 0.55 0.548 -1 0.585 0.505 Wednesday Tuesday -0.6 -0.2 0.2 0.6 1 0.58 0.5 -1 0.59 -0.6 -0.2 0.2 0.6 1 0.585 0.575 0.58 0.57 0.575 Thursday -1 -0.6 -0.2 0.2 λ 1-λ 2 0.6 1 Friday -1 -0.5 0 λ 1-λ 2 0.5 1 Figure 6: Predicting results with different combination of λ1 and λ2 Table 3: Predicting errors for adaptive margin SVR for the dataset in Hefei Tuesday ππ − ππ MAE NSRMSE -1 -0.6 -0.2 0.2 0.6 1 5.828 0.5581 5.834 0.5586 5.807 0.5575 5.773 0.5539 5.773 0.5517 5.766 0.5492 12 Wednesday Thursday Friday MSE MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE 79.92 5.168 0.5182 68.57 5.828 0.5816 82.07 6.208 0.5834 83.29 80.06 5.114 0.5146 67.62 5.807 0.5791 81.37 6.181 0.5809 82.58 79.75 5.086 0.5132 67.25 5.773 0.5775 80.92 6.140 0.5784 82.12 78.72 5.052 0.5090 66.15 5.760 0.5748 80.16 6.079 0.5750 81.35 78.10 5.032 0.5051 65.14 5.692 0.5705 78.97 6.038 0.5731 80.14 77.39 5.005 0.5013 64.17 5.610 0.5662 77.78 6.032 0.5733 78.94 Table 4: Predicting errors for fixed-margin SVR and adaptive for the dataset in Hefei πΊ margin Tuesday Wednesday Thursday Friday Adaptive margin SVR Fixed margin SVR 5.766 0.5492 77.39 5.005 0.5013 64.17 0.0825 0.5662 77.78 5.610 0.5834 78.94 5.8072 0. 5575 79.74752 5.1068 0.5146 64.27 5.746 0.5787 82.30311 6.154 0.5895 85.56835 MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE 1 Predicted Value Observed Value Traffic volume per 5 minutes (Normalizd) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 360 720 1080 Time (minutes) Figure 7: Illustration of predicted and observed traffic counts on Tuesday for the dataset collected in Hefei Similar results are obtained for the three datasets from Seattle as listed in Table 5. The traffic counts of the first five weekdays for the collection periods are used as the training samples. The model is then evaluated with the traffic counts of the last four weeks. Figure 8 plots the NSRMSE errors for different combination of π1 and π2 , from which we can see that the best performances are also obtained for all datasets when the local variations are fully utilized. The errors listed in Table 6 also indicate that the predicting accuracy with the adaptive margin consistently outperforms the fixed margin SVR. 13 NSRMSE 0.2 0.195 ES394 0.19 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 0.8 1 NSRMSE 0.22 0.215 0.21 NSRMSE 0.205 -1 0.13 ES766 -0.8 0.125 0.12 0.115 0.11 NSRMSE with adpative margin NSRMSE with fixed-margin ES855 0.105 -1 -0.8 -0.6 -0.4 0 -0.2 λ 1-λ 2 0.2 0.4 0.6 Figure 8: Predicting results with different combination of λ1 and λ2 Table 5: Predicting errors for adaptive margin SVR for the dataset in TDAD ES394 ES766 ES855 ππ − ππ MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE -1 -0.6 -0.2 0.2 0.6 1 5.434 0.1981 56.67 7.532 0.2132 112.6 13.83 0.1244 356.89 5.333 0.1955 55.19 7.477 0.2104 109.7 13.56 0.1214 340.0 5.120 0.1953 55.08 7.421 0.2093 108.6 12.34 0.1203 310.9 5.148 0.1922 53.34 7.425 0.2076 106.8 12.77 0.1161 333.7 5.015 0.1902 52.24 7.400 0.2062 105.4 12.57 0.1149 304.5 5.013 0.1901 52.18 7.370 0.2059 105.1 12.63 0.1153 306.6 Table 6: Predicting errors for fixed-margin SVR and adaptive for the dataset in Hefei ES394 ES766 ES855 πΊ margin Adaptive margin SVR Fixed margin SVR 5.013 0.1901 52.1819 7.370 0.2059 105.1 12.63 0.1153 306.6 5.412 0.1966 55.0757 7.457 0.2149 109.7 13.73 0.1247 331.6 MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE 14 3.3.2 Model comparisons Above experiments demonstrated that the adaptive margin SVR model consistently outperformed the fixed margin SVR forecast. Next, comparisons with three forecasting models including seasonal ARIMA, Radius Basis Function NN (RBF-NN) and Naive forecast are conducted. For the structure of ARIMA, in Williams et al. (1998), Williams and Hoel (2003) and follow-on research (Smith et al., 2002), the seasonal ARIMA (1, 0, 1) (0, 1, 1)s consistently emerged as the preferred structure. For the data with 5 minutes resolution, the seasonal cycle length π for a week is 2016, thus the recursive prediction equation is written as: yοΏ½ t+1 = yt−2015 + ∅1 (yt − yt−2016 ) − θ1 (yt − yοΏ½ t ) − Θ1 (yt−2015 − yοΏ½ t−2015 ) +θ1 Θ1 (yt−2016 − yοΏ½ t−2016 ) (21) The parameters in Eq. (21) are then estimated using the Econometrics toolbox of Matlabtm(2012a) with the calibration dataset of the first four weeks traffic count (8064 observations) as listed in table 5. With the calibrated parameters, the model is then evaluated by iteratively forecasting the traffic counts of the second four weeks. The errors are calculated regarding the weekday traffic. Table 7: ARIMA parameters ES394 0.9533 0.6127 0.4950 ES766 0.9280 π½π 0.7492 0.5191 ES855 0.9510 0.4863 0.5027 Dataset ∅π π£π The architecture of the RBF-NN is determined by the package of DTREG (Sherrod, 2003), which applies the evolutionary approach developed by S. Chen, Hong, Harris, and Sharkey (2004) to determine the optimal parameter set. Same as the adaptive SVR model, traffic counts of the first week are used as the training samples. The third comparison model as suggested in Smith et al. (2002) is the Naive forecast in the form of: yt yοΏ½ t+1 = yt+1, hist (22) yt, hist where yt, hist is the historical average of the traffic flow rate at the time-of-day and dayof-week associated with time interval t. Forecasting models that could not be shown to consistently produce more accurate forecast than this naive method would be of little value. The performances of above comparing models, as well as the adaptive SVR model, are listed in Table 7. It can be seen that the Naive forecast shows the largest errors for all three datasets, and the adaptive margin SVR model consistently outperforms other comparing models. In terms of NSRMSE, the adaptive-margin SVR model improves the forecasting accuracy by 6%, 7% and 2% with ES394, ES766 and ES855 datasets, respectively. Furthermore, SVR model only used one-week traffic data as the training 15 samples, while the Naive and ARIMA models both utilized traffic counts of four consecutive weeks. Table 7: Prediction errors for different models Models Adaptive margin SVR Naive Forecast RBF-NN ARIMA (1,0,1)(0,1,1)1026 ES394 MAE MSE NSRMSE 5.013 52.1819 0.1901 6.964 95.90 0.2564 5.420 55.58 0.1962 ES766 MAE MSE NSRMSE 7.357 105.0 0.2059 9.554 182.8 0.2722 7.655 110.8 0.2114 7.97 126.9 0.2243 MAE MSE NSRMSE 12.53 284.9 0.1113 15.00 439.3 0.1382 13.03 319.6 0.1177 14.85 428.8 0.1365 ES855 5.92 69.97 0.2159 4. Conclusion and future work Accurately forecasting short-term traffic is a challenging task due to its complex characteristics. This paper proposed an adaptive-margin SVR with integration of day-today traffic variation, which can adequately tolerate and track the time-varying volatility. The adaptive-margin SVR avoids this blind searching procedure and improves the computational efficiency through establishing an adaptive and time-varying π margin, which is heuristically determined by the day-to-day traffic variations. Experimental results with field data indicate that the adaptive π with only the local variation can achieve better performance compared with combinations of the global variation. The adaptive-margin model consistently outperforms the fixed π margin SVR with improved computational efficiency. The impact of the training samples is also discussed, demonstrating that a complete set of daily and weekly traffic counts plays a fundamental role for SVR learning a seasonal traffic pattern. We only investigated the π margin of the SVR model in this paper. However, other parameters in SVR such as the regulation parameter πΆ also may affect the learning ability and forecasting accuracy. Combinations of heuristically determined parameter set including both the π margin and regulation parameter are expected to further improve the performance of the forecasting model. And now this is being investigated by the authors. Reference Abdy, Z. R., & Hellinga, B. R. (2008). Use of Microsimulation to Model Day-to-Day Variability of Intersection Performance. Transportation Research Record: Journal of the Transportation Research Board, 2088(-1), 18-25. Canu, S., Grandvalet, Y., Guigue, V., & Rakotomamonjy, A. (2005). Svm and kernel methods matlab toolbox. Perception Systemes et Information, INSA de Rouen, Rouen, France, 2, 2. Castro-Neto, M., Jeong, Y. S., Jeong, M. K., & Han, L. D. (2009). Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions. Expert Systems with Applications, 36(3), 6164-6173. 16 Chen, F., Wei, D., & Tang, Y. (2010). Wavelet analysis based sparse LS-SVR for time series data. Paper presented at the Bio-Inspired Computing: Theories and Applications (BIC-TA), 2010 IEEE Fifth International Conference on. Chen, F., Wei, D., & Tang, Y. (2011). Virtual Ion Selective Electrode for Online Measurement of Nutrient Solution Components. Sensors Journal, IEEE, 11(2), 462-468. Chen, S., Hong, X., Harris, C. J., & Sharkey, P. M. (2004). Sparse modeling using orthogonal forward regression with PRESS statistic and regularization. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 34(2), 898-911. Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17(1), 113-126. doi: 10.1016/s0893-6080(03)00169-2 Davis, G. A., & Nihan, N. L. (1991). Nonparametric regression and short-term freeway traffic forecasting. Journal of Transportation Engineering, 117(2), 178-188. Hellinga, B., & Abdy, Z. (2008). Signalized intersection analysis and design: Implications of dayto-day variability in peak-hour volumes on delay. Journal of Transportation Engineering, 134(7), 307-318. Jiang, X., & Adeli, H. (2005). Dynamic wavelet neural network model for traffic flow forecasting. Journal of Transportation Engineering, 131(10), 771-779. Levinson, H. S., Sullivan, D., & Bryson, R. W. (2006). Effects of Urban Traffic Volume Variations on Service Levels. Paper presented at the Transportation Research Board 85th Annual Meeting. Li, J. Q. (2011). Discretization modeling, integer programming formulations and dynamic programming algorithms for robust traffic signal timing. Transportation Research Part C: Emerging Technologies, 19(4), 708-719. Li, L., Lin, W., & Liu, H. (2006). Type-2 fuzzy logic approach for short-term traffic forecasting. Paper presented at the Intelligent Transport Systems, IEE Proceedings. Park, B., Messer, C. J., & Urbanik II, T. (1998). Short-term freeway traffic volume forecasting using radial basis function neural network. Transportation Research Record: Journal of the Transportation Research Board, 1651(-1), 39-47. Rakha, H., & Van Aerde, M. (1995). Statistical analysis of day-to-day variations in real-time traffic flow data. Transportation research record, 26-34. Sherrod, P. H. (2003). DTREG predictive modeling software. Software available at http://www. dtreg. com. Smith, B. L., Williams, B. M., & Keith Oswald, R. (2002). Comparison of parametric and nonparametric models for traffic flow forecasting. Transportation Research Part C: Emerging Technologies, 10(4), 303-321. Smola, A. J., Schölkopf, B., & Müller, K. R. (1998). The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637-649. Vapnik, V. (1995). The nature of statistical learning theory: Springer-Verlag New York, Inc. Wei, D., Chen, F., & Chen, C. (2010). Least-Square Wavelet-Support Vector Machine for Shortterm Traffic Flow Forecast. The Mediterranean Journal of Measurement and Control, 6(2), 39-45. Williams, B. M., Durvasula, P. K., & Brown, D. E. (1998). Urban freeway traffic flow prediction: application of seasonal autoregressive integrated moving average and exponential smoothing models. Transportation Research Record: Journal of the Transportation Research Board, 1644(-1), 132-141. Williams, B. M., & Hoel, L. A. (1999). Modeling and forecasting vehicular traffic flow as a seasonal stochastic time series process. Williams, B. M., & Hoel, L. A. (2003). Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering, 129(6), 664-672. 17 Xie, Y., & Zhang, Y. (2006). A wavelet network model for short-term traffic volume forecasting. Journal of Intelligent Transportation Systems, 10(3), 141-150. Yin, Y. (2008). Robust optimal traffic signal timing. Transportation Research Part B: Methodological, 42(10), 911-924. Zhang, L., Yin, Y., & Lou, Y. (2010). Robust Signal Timing for Arterials under Day-to-Day Demand Variations. Transportation Research Record: Journal of the Transportation Research Board, 2192(-1), 156-166. Zhang, Y., & Xie, Y. (2008). Forecasting of short-term freeway volume with v-support vector machines. Transportation Research Record: Journal of the Transportation Research Board, 2024(-1), 92-99. 18