An Adaptive-Margin Support Vector Regression for Short-Term Traffic Flow Forecast Dali Wei

advertisement
An Adaptive-Margin Support Vector Regression for Short-Term
Traffic Flow Forecast
Dali Wei1, Hongchao Liu2,*
1
2
Civil and Environmental Department, Texas Tech University, 10th and Akron, Lubbock, TX 79409
Civil and Environmental Department, Texas Tech University, 10th and Akron, Lubbock, TX 79409
Email: hongchao.liu@ttu.edu; Fax: 1- (806)7424168; Telephone: 1-(806)7422801
*Corresponding
1
author
An Adaptive-Margin Support Vector Regression for ShortTerm Traffic Flow Forecast
ABSTRACT
The day-to-day volatility of traffic series provides valuable information for accurately
tracking the complex characteristics of short-term traffic such as stochastic noise and
non-linearity. Recently, Support Vector Regression (SVR) has been applied for shortterm traffic forecasting. However, standard SVR adopts a global and fixed ε-margin,
which not only fails to tolerate the day-to-day traffic variation, but also requires a blind
and time-consuming searching procedure to obtain a suitable value for ε. In this work, on
the ground of stochastic modeling of day-to-day traffic variation, we propose an adaptive
SVR short-term traffic forecasting model. The time-varying deviation of the day-to-day
traffic variation, described in a bi-level formula, is integrated into SVR as heuristic
information to construct an adaptive ε-margin, in which both local and normalized factors
are considered. Comparative experiments using filed traffic data indicate that the
proposed model consistently outperforms the standard SVR with an improved
computational efficiency.
KEY WORDS: Short-term traffic forecasting, Support Vector Regression, ε-margin,
day-to-day traffic variation
1. Introduction
Short-term traffic forecasting plays an important role in proactive traffic control and
operation. Usually, the predicting period required is less than 15 minutes. In such a short
interval, traffic volume and velocity exhibit very complex characteristics such as
stochastic noise and non-linearity (Jiang & Adeli, 2005; L. Li, Lin, & Liu, 2006; Wei,
Chen, & Chen, 2010).
A number of forecasting methods have been proposed regarding short-term traffic as a
typical time series, such as the Auto Regression Integrated Moving Average (ARIMA)
model (Williams, Durvasula, & Brown, 1998; Williams & Hoel, 1999, 2003), the NonParameter Regression model (Davis & Nihan, 1991) and the Neural Network (NN)
models (Park, Messer, & Urbanik II, 1998; Smith, Williams, & Keith Oswald, 2002; Xie
& Zhang, 2006). For example, Williams and Hoel (1999) applied traditional parametric
Box-Jenkins time series models to the dynamic system of single point traffic flow
forecasting, which addressed most of the parametric model concerns for traffic condition
data by establishing a theoretical foundation for using seasonal ARIMA forecast models.
The experiments of Smith et al. (2002) further indicated that the performances of the
seasonal ARIMA are better than the nearest neighbor nonparametric regression model,
where the Naive forecast model is applied as a benchmark model.
Recently, Support Vector Regression (SVR), as a universal learning model, has drawn
increasing attentions and successfully been applied in short-term traffic forecasting.
2
Various empirical studies indicate that SVR consistently outperforms traditional
forecasting models like ES and ARIMA (Castro-Neto, Jeong, Jeong, & Han, 2009; F.
Chen, Wei, & Tang, 2010; Wei et al., 2010; Y. Zhang & Xie, 2008).
SVR using the πœ€-insensitive loss function is in the form of (Vapnik, 1995):
0
𝑖𝑓 |𝑦 − 𝑓(π‘₯, πœ”)| < πœ€
πΏπœ‰ (𝑦, 𝑓(π‘₯, πœ”)) = οΏ½
(1)
|𝑦 − 𝑓(π‘₯, πœ”)| − πœ€
π‘œπ‘‘β„Žπ‘’π‘Ÿπ‘€π‘–π‘ π‘’
where 𝑓(π‘₯, πœ”) is the predicting function, πΏπœ‰ is the loss function, 𝑦 is the observed value
and πœ€ is the margin parameter. As shown in Figure 1, if the data sample is inside the πœ€tube (the tube between the two dash lines), SVR regards it as having no loss or error
corresponding to the estimation of 𝑓(π‘₯, πœ”). Only the data samples outside the tube are
taken into account in the optimization. The πœ€-insensitive loss function would yield better
generalization than other loss functions like Huber’s loss function, which not only
measures the empirical risk (training error), but also controls the generalization ability
and the tolerance of noise, while conferring the advantage of sparsity in the solution
(Smola, Schölkopf, & Müller, 1998).
Figure 1: Illustration of πœ€ insensitive function in SVR
However, in the context of short-term traffic forecasting, using a global and fixed πœ€margin for all time period within a day lacks the flexibility to capture and tolerate the
time-varying variation in daily traffic. SVR, similar to other supervised learning models,
learns from historical traffic data (training samples) to gain the knowledge in the form of
equation systems, and then predicts the future traffic states. Although showing a similar
pattern, day-to-day traffic exhibits a prominent variation due to underlying stochastic
travelling behavior. This has been well verified by a number of empirical traffic data
analyses (Hellinga & Abdy, 2008; Levinson, Sullivan, & Bryson, 2006; J. Q. Li, 2011;
Rakha & Van Aerde, 1995; Yin, 2008; L. Zhang, Yin, & Lou, 2010). Hence, to attain a
satisfying predicting accuracy, SVR not only needs to learn the seasonal and cyclical
patterns, but also needs to adequately tolerate and track this day-to-day variation.
Specifically, this variation is time-dependent—different time periods exhibit different
variation characteristics. Using a fixed πœ€-margin in SVR obviously cannot adapt to this
time-varying volatility.
Furthermore, it is always a tricky task to determine a suitable value of πœ€ in the fixedmargin SVR. Without prior experience, all optional values for πœ€ have to be tried and
compared using a K-fold cross-validation method, which is a time-consuming and blind
searching procedure. We believe that integrating an adaptive πœ€-margin in SVR to utilize
the day-to-day traffic variation as heuristic information, will not only lead to a better
3
forecasting performance, but also avoid the computation complexity for determine a
suitable global margin. This forms the basic motivation of this work.
In this paper, an adaptive-margin SVR model for short-term traffic forecasting is
proposed on the ground of a stochastic modeling of traffic variation. We establish a bilevel stochastic formula to describe the traffic variation, which combines the timevarying characteristic into the day-to-day traffic variation. The obtained time-varying
deviation then is utilized to construct an adaptive πœ€-margin of SVR, in which both local
and normalized factors are considered.
Four traffic datasets from Hefei, China and Seattle, USA are used in this paper for
training our model and validate the performance. Our new model is compared with the
fixed πœ€ margin SVR, as well as the seasonal ARIMA, NN and Naive forecast.
The rest of this paper is organized as follows. In section 2, we briefly introduce the SVR,
and then propose the stochastic model for the day-to-day traffic variance and the adaptive
πœ€ -margin in SVR. Section 3 presents comparative experiments to evaluate the
performance of our model. Section 4 concludes the paper.
2. Model formulation
The section briefly introduces some fundamentals of ε -SVR, and then presents the
stochastic modeling of the day-to-day traffic variation and describes how to integrate this
variation into the adaptive margin of SVR.
2.1 Support Vector Regression
Support Vector Machine (SVM) is a universal learning method based on the statistical
learning theory developed by Vapnik (1995). SVM embodies the structural risk
minimization (SRM) principle to minimize an upper bound on the expected risk, which is
opposed to empirical risk minimization (ERM) that aims to minimize the error on the
training data. This feature equips SVM with a greater generalization performance.
SVR is an application of SVM for function estimation problem. Given a set of training
data:
(π‘₯𝑖 , 𝑦𝑖 ), 𝑖 = 1,2,3 … , 𝑙, π‘₯𝑖 ∈ ℛ𝑑 , 𝑦𝑖 ∈ β„›
(2)
a linear function regression function can be stated as:
𝑓(π‘₯, πœ”) = πœ”π‘‡ πœ™(π‘₯) + 𝑏
(3)
in high-dimensional feature space β„±, where ω is a vector in β„±. Function Ο•(x) maps the
input x into a vector in β„±. The quality of estimation is measured by the loss function of
L(y, f(x, ω)). SVR uses a new type of loss function, namely an β„°-intensive loss function
in Equation (1) proposed by Vapnik. The empirical risk is then:
1
π‘…π‘’π‘šπ‘ (πœ”) = ∑𝑛𝑖=1 𝐿ℰ (𝑦𝑖 , 𝑓(π‘₯, πœ”))
(4)
𝑛
SVR performs linear regression in the high-dimension feature space using the β„°-intensive
loss function and, at the same time, tries to reduce model complexity by
minimizing β€–πœ”β€–2 . This is described by introducing slack variables, πœ‰π‘– , πœ‰π‘–∗ 𝑖 = 1, … , 𝑛, to
4
measure the deviation of training samples outside the β„° -tube. Thus, SVR is then
formulated as:
𝑛
1
2
min β€–πœ”β€– + 𝐢 οΏ½(πœ‰π‘– + πœ‰π‘–∗ )
2
𝑖=1
𝑦𝑖 − 𝑓(π‘₯𝑖 , πœ”) ≤ β„° + πœ‰π‘–∗
𝑠. 𝑑. οΏ½ 𝑓(π‘₯𝑖 , πœ”) − 𝑦𝑖 ≤ β„° + πœ‰π‘–
πœ‰π‘– , πœ‰π‘–∗ ≥ 0, 𝑖 = 1, … , 𝑛
1
(5)
The β€–πœ”β€–2 in Equation (5) is called the regularization term. 𝐢 is the regularized fixed for
2
determining the trade-off between the empirical risk and the regularization term.
Increasing the value of 𝐢 will result in the relative importance of the empirical risk with
respect to the regularization term to grow. We can view the goal defined by Equation (6)
as:
1
π‘…π‘Ÿπ‘’π‘” (πœ”) = min β€–πœ”β€–2 + π‘…π‘’π‘šπ‘ (πœ”)
(6)
2
The ERM principle only considers the term π‘…π‘’π‘šπ‘ (πœ”), which is the risk on the given
dataset. When dealing with few data in very high-dimensional spaces, this may lead to
over-fitting and thus poor generalization properties. Hence the SVR method adds a
1
capacity control term— β€–πœ”β€–2 , which leads to the regularized risk function.
2
To solve the optimization problem of Equation (5), the objective function can be
transformed to a dual problem, and its solution is given by:
𝑛𝑠𝑣
(𝛼𝑖 − 𝛼𝑖∗ )𝐾(π‘₯𝑖 , π‘₯)
𝑓(π‘₯) = ∑𝑖=1
𝑠. 𝑑. 0 ≤ 𝛼𝑖 ≤ 𝐢, 0 ≤ 𝛼𝑖∗ ≤ 𝐢
(7)
where 𝑛𝑠𝑣 is the number of support vectors and 𝐾(π‘₯𝑖 , π‘₯) is the kernel function. So with
Equation (6), the one-step SVR forecasting model is illustrated in Figure 2. The input
variables of this model are 𝐹(𝑑 − π‘˜) ,…, 𝐹(𝑑), where 𝐹(𝑑) denotes the traffic volume at
time period 𝑑. The one-step model prediction is 𝐹(𝑑 + 1).
Volume
t
F(t-k)
F(t-k+i)
F(t)
F(t+1)
SVR
Figure 2: Illustration of SVR short-term traffic forecasting model
As stated in section 1, the πœ€-insensitive loss function not only measures the empirical risk
(training error), but also controls the generalization ability and the tolerance of noise. It
5
is well known that a suitable value of πœ€ should depend on the input variation level πœ€ ∝ 𝜎,
as well as the number of training samples. In the context of short-term traffic forecasting,
the traffic variation exhibits a time-varying stochastic characteristic, which demands an
adaptive πœ€-margin to accommodate the varying level of variation. Next, we will develop
a time-varying model for describing day-to-day traffic variation and then present the
adaptive πœ€-margin in SVR.
2.2 Modeling Stochastic Variation in Day-to-Day Traffic
Although showing a typical seasonal and cyclical pattern, daily traffic also exhibits a
prominent day-to-day variation, which has been verified by a number of empirical traffic
data analyses.
Levinson et al. (2006) analyzed weekday traffic data from 22 continuous traffic counting
stations in the city of Milwaukee and computed the coefficient of variation (COV) of
peak hour traffic volume, which ranged from 0.048 to 0.155 with a mean of 0.089. Yin
(2008) collected traffic data during 9–11AM for an intersection of Gainesville, Florida
and showed that the traffic flow varies significantly within that time interval. Hellinga
and Abdy (2008) investigated the observed traffic data in Ontario, Canada and found the
variation of day-to-day hour volume can be best described by the normal distribution.
They also analyzed the daily traffic volume for a 12 month period at city of Toronto and
concluded that the hourly volumes follow a normal distribution (Abdy & Hellinga, 2008).
Rakha and Van Aerde (1995) investigated traffic volumes at Orlando, Florida for 22 days
and validated that 5-min traffic counts for weekday traffic is normally distributed with a
confidence level of 95%. Based on these empirical findings, we formulate the traffic
volume 𝑓(𝑑, 𝑖) at time period 𝑑 of day 𝑖 as a random distribution with a mean of 𝑓𝑒 (𝑑) and
variance of 𝜎 2 (𝑑). So we have the day-to-day variation in the form of:
βˆ†π‘“(𝑑) = [𝑓(𝑑, 𝑖) − 𝑓(𝑑, 𝑗)] ∼ 𝐷�0, 𝜎 2 (𝑑)οΏ½
(8)
2 (𝑑)
2 (𝑑)
where 𝜎
= 2𝜎∗
with the assumption that the distribution of the traffic volumes at
the same time period 𝑑 for day 𝑖 and day 𝑗 are independent. 𝐷 is the random distribution
of the day-to-day traffic volume variations. For determining the variance 𝜎(𝑑) in
Equation (8), we introduce a bimodal formula as Equation (9), which is able to describe
the time-varying characteristic of the day-to-day traffic variation.
𝜎(𝑑) = 𝜎0 �𝑝𝑒
𝑑−πœ‡
−οΏ½ 𝜎 1 οΏ½
1
2
+ (1 − 𝑝)𝑒
𝑑−πœ‡
−οΏ½ 𝜎 2 οΏ½
2
2
οΏ½
(9)
Equation (9) is derived from the empirical variation data shown in Figure 3, which plots
the day-to-day variation from five-minute traffic counts on five consecutive weekdays
collected in Huangshan Avenue in Hefei, China. Three features can be easily observed
from the figure. First, the variation is obviously time-varying within a day and shows
different magnitude of deviations; secondly, the variation is symmetric with the zero axis;
last but not least, two peak variations can be identified, representing the morning and
afternoon peak hours. With well-calibrated parameters, these features can be
characterized by the proposed Equation (9).
6
Day to day traffic volume variation (Normalized)
vehicle per 5 minutes
0.5
0.4
0.3
0.2
0.1
0
-0.1
-0.2
-0.3
-0.4
-0.5
1440
1080
720
Time (minutes)
360
0
Figure 3: Day-to-day traffic variation of the traffic dataset from Hefei, China
Using Equation (8) together with Equation (9), the day-to-day traffic variation is then
modeled as:
βˆ†π‘“(𝑑) ∼
𝑑−πœ‡
1οΏ½
−οΏ½
𝐷 οΏ½0, 𝜎0 �𝑝𝑒 𝜎1
2
+ (1 −
𝑑−πœ‡
2
2οΏ½
−οΏ½
𝑝)𝑒 𝜎2
οΏ½οΏ½
(10)
In Equation (10), at the day-to-day level, the traffic variation at certain time 𝑑 is subject to
a random distribution with a mean of zero. Meanwhile within a day, the variances of the
at different time periods 𝑑 are associated by a bimodal function to represent the timevarying characteristics observed from field data.
To find the parameters of the distribution, the Max-likelihood Estimation (MLE) method
is applied to identify the deviation 𝜎(𝑑) in day-to-day traffic, which requires the
maximization of the log-likelihood function:
𝑛
𝑛
1
ln β„’οΏ½πœŽ 2 (𝑑)οΏ½ = − ln 2πœ‹ − ln𝜎 2 (𝑑) − 2(𝑑) ∑𝑛𝑖=1 Δ(t)2𝑖
(11)
2
2
2𝜎
where 𝑛 is the number of variation observations and βˆ† is the day-to-day traffic variation.
Taking derivatives with respect to 𝜎 2 (𝑑) and solving the resulting system of first order
conditions yields the maximum likelihood estimations:
1
οΏ½(𝑑))2 ]
𝜎� 2 (𝑑) = ∑𝑛𝑖=1[(Δ𝑖 (𝑑) − Δ
(12)
𝑛
𝜎� 2 (𝑑) is then associated in a compact function form of Equation (8) by the Non-Linear
Least Square (NLLS) method, which finds the optimal parameter set of
{𝑝, πœ‡1 , πœ‡2 , 𝜎1 , 𝜎2 , 𝜎0 } to minimize:
Ψ = οΏ½ �𝜎0
=
𝑑−πœ‡1 2
−οΏ½
οΏ½
�𝑝𝑒 𝜎1
+ (1 −
𝑑−πœ‡
−οΏ½ 𝜎 1 οΏ½
1
∑𝐾
�𝑝𝑒
�(𝜎
0
π‘˜=1
2
𝑑−πœ‡2 2
−οΏ½
οΏ½
𝑝)𝑒 𝜎2 οΏ½
+ (1 −
𝑑−πœ‡
2
− 𝜎�(𝑑)οΏ½ 𝑑𝑑
2οΏ½
−οΏ½
𝑝)𝑒 𝜎2
2
οΏ½ − 𝜎�(π‘˜))2 οΏ½
(13)
where 𝐾 is the number of observations within a day. A widely-accepted computation
method, the Gauss-Newton method is applied to solve the NLLS of Equation (13).
7
2.3
Adaptive margin
On the ground of stochastic modeling the time-varying day-to-day traffic variation, this
section describes how to integrate this variation to construct an adaptive πœ€-margin. It is
well known that πœ€ should depend on the data variation (Cherkassky & Ma, 2004; Smola
et al., 1998), and therefore we proposed a new time-varying πœ€-margin as:
πœ€(𝑑) = οΏ½οΏ½
πœ†1 𝑔(𝜎(𝑑))
οΏ½)
οΏ½οΏ½οΏ½οΏ½οΏ½ + πœ†οΏ½οΏ½οΏ½
2 𝑔(𝜎
π‘™π‘œπ‘π‘Žπ‘™
π‘£π‘Žπ‘Ÿπ‘–π‘Žπ‘‘π‘–π‘œπ‘›
π‘‘π‘’π‘Ÿπ‘š
𝑠. 𝑑. πœ†1 + πœ†2 = 1
π‘›π‘œπ‘Ÿπ‘šπ‘Žπ‘™π‘–π‘§π‘’π‘‘
π‘‘π‘’π‘Ÿπ‘š
(13)
The πœ†1 𝑔(𝜎(𝑑)) term describes the local variation, while πœ†2 𝑔(𝜎�) contains the normalized
variation information. The parameters of πœ†1 and πœ†2 provide a tradeoff between local and
global information. The proposed time-varying form of πœ€-margin includes the fixed
margin as a special case if πœ†1 = 0. When πœ†2 = 0, the margin then solely relies on local
variation.
As stated in section 2.1, πœ€ is also dependent on the number of training samples 𝑛. We
adopt an empirical selection proposed by Cherkassky and Ma (2004) as:
ln 𝑛
𝑔(𝜎) = 3𝜎�
𝑛
(14)
Combining Equation (13) and Equation (14), the proposed adaptive margin is
ln 𝑛
πœ€(𝑑) = 3(πœ†1 𝜎(𝑑) + πœ†2 𝜎�)οΏ½
(15)
𝑛
With this new margin, the SVR optimization problem is then re-formulated as:
𝑛
1
min β€–πœ”β€–2 + 𝐢 οΏ½(πœ‰π‘– + πœ‰π‘–∗ )
2
𝑖=1
𝑠. 𝑑.
βŽ§π‘¦π‘– − 𝑓(π‘₯𝑖 , πœ”) ≤ 𝜏(πœ†1 𝜎(𝑑) + πœ†2 𝜎�)οΏ½ln 𝑛 + πœ‰π‘–∗
𝑛
βŽͺ
ln 𝑛
⎨ 𝑓(π‘₯𝑖 , πœ”) − 𝑦𝑖 ≤ 𝜏(πœ†1 𝜎(𝑑) + πœ†2 𝜎�)οΏ½
βŽͺ
πœ‰π‘– , πœ‰π‘–∗ ≥ 0, 𝑖 = 1, … , 𝑛
⎩
𝑛
+ πœ‰π‘–
3. Experiments and Discussion
This section presents comparative experiments to validate our adaptive-margin SVR
using four different field data. The performance is compared with fixed-margin SVR
determined by the method of cross-validation, as well as the seasonal ARIMA model,
RBF-NN model and the Naive forecast method.
3.1 Experimental settings
8
(16)
3.1.1 Data Normalization and Preprocessing
Four field datasets from Hefei, China and Seattle, USA are used in this experiment. The
first dataset is collected from Huangshan Avenue in Hefei, China. It contains traffic
counts for five consecutive weekdays from Sep. 15th to Sep. 19th 2008 in normal traffic
conditions without incidents and adverse weather.
The second dataset contains traffic counts of eight weeks from March 26th to May 20th,
2007 on I-90 (north bound of station cabinet ES-855D), obtained from the Traffic Data
Acquisition and Distribution (TDAD) database maintained by an ITS research group at
the University of Washington. The third dataset also contains eight weeks of traffic
counts from Mar 19th to May 13th on SR-18 (east bound of station cabinet ES-394D). The
fourth dataset contains eight weeks of traffic counts from April 2nd to May 26th on SR525 (south bound of station cabinet ES-766D).
The traffic counts are aggregated into 5 minutes interval and normalized into [0,1] by:
π‘¦π‘›π‘œπ‘Ÿπ‘š =
𝑦−π‘¦π‘šπ‘–π‘›
(17)
π‘¦π‘šπ‘Žπ‘₯ −π‘¦π‘šπ‘–π‘›
where π‘¦π‘šπ‘Žπ‘₯ and π‘¦π‘šπ‘–π‘› are the maximum and minimum traffic counts in the dataset,
respectively. Since the data series contains obvious outliers with zero or extremely large
traffic volume. The statistics of the four datasets are summarized in Table 1.
Table 1: Descriptive Statistics of experimental datasets
Collection Periods
Series length
Mean
(vehicle/5minutes)
Hefei Dataset
th
th
Sep. 15 to Sep. 19 2008
th
th
Standard
Deviation
1440
21.6671
16.0877
ES-855D
Mar. 26 to May 20 2007
16128
204.1465
141.5130
ES-394D
Mar. 19th to May 13th 2007
16128
54..4217
36.1245
16128
70.1395
46.3563
ES-766D
nd
th
Apr. 2 to May 26 2007
3.1.2 Auto-Correlation
The input dimension is important for establishing the prediction model. A suitable input
dimension is usually determined by the statistical autocorrelation function (ACF), which
measures the correlation between data points in a time series separated by different time
lags (Jiang & Adeli, 2005). The smallest time lag which makes ACF equal to zero was
used to determine the input dimension.
Figure 4 shows the ACF for the four datasets. For the Hefei dataset (Fig. 4a), the ACF
curve first intersects with zero axes at a time lag of 6 hours, and thus the input dimension
is then selected to be 72. For the other three datasets from Seattle, the ACF is plotted in
Figure 5(b)-(d) and the input dimension is selected to be 60.
9
Besides πœ€, other parameters including the kernel parameter πœ† and the regularization
parameter 𝐢 are determined by cross-validation. The SVR program is implemented based
on the SVM-KM toolbox (Canu, Grandvalet, Guigue, & Rakotomamonjy, 2005).
1
1
0.8
0.6
0.5
0.2
ACF
ACF
0.4
0
-0.2
0
-0.4
-0.6
-0.8
0
8
16
32
24
Hour
40
-0.5
0
48
16
32
48
Hour
(a)
(b)
1
1
0.8
0.6
0.5
ACF
ACF
0.4
0.2
0
0
-0.2
-0.4
-0.6
0
-0.5
32
16
48
16
0
Hour
32
48
Hour
(c)
(d)
Figure 4: ACF of the traffic series data (a) dataset from Hefei; (b) ES766; (c) ES394; (d) ES855
3.2 Variation Calibration Result
As described in section 2, the variation for day-to-day traffic is modeled by Equation (9)
and a combination method of MLE and NLLS is adopted to calibrate the parameters. The
results are summarized in Table 2.
Table 2: Parameter sets in variation equation (9)
Hefei Dataset
ES-855D
ES-766D
ES-394D
𝝈𝟎 𝒑
𝝈𝟎 (𝟏 − 𝒑)
540.0
𝝁𝟐
948.0
𝝈𝟏
114.4
579.5
0.0749
0.1309
𝝁𝟏
0.1019
356.7
1008.5
183.1
403.1
0.0608
0.08010
334.9
912.5
171.5
411
0.0851
0.0448
0.05345
296.8
971.0
198.0
𝝈𝟐
394.3
The time-varying deviations of variations are then plotted in Figure 6. We can see that for
the tested datasets the deviations of the afternoon peak is slightly smaller than the
morning peak, and the descending branch is also smoother compared with the morning
peak. For the datasets from Seattle, the deviation of the morning peak is obviously larger
than the afternoon peak. The traffic counts from ES-855 shows relatively larger
10
deviations than other datasets, which is easy to understand and consistent with the
deviation of the four time series listed in Table 1.
0.12
Heifei dataset
ES-394D dataset
ES-855D dataset
ES-766D dataset
0.1
σ(t)
0.08
0.06
0.04
0.02
0
0
360
720
Time (minutes)
1080
1440
Figure 5: Time-varying deviation in day-to-day traffic
3.3 Performance Evaluation and Discussion
This section presents the comparative results of our adaptive SVR with fixed πœ€ on all the
four datasets first, then three forecasting models including ARIMA, RBF-NN, and Naïve
forecast methods are implemented on the three large datasets from Seattle.
3.3.1 Performance evaluation with Fix-margin SVR
Three error indicators are used in this experiment, including Mean Absolute Error (MAE),
Mean Sum of Squares of Errors (MSE), and Normalized Square Root of the Mean Sum
of Squares of Errors (NSRMSE). The NSRMSE measures the MSE in a normalized way
regarding the level of the deviation in the data itself.
1
(18)
𝑀𝐴𝐸 = ∑𝑁
�𝑖 − 𝑦𝑖 |
𝑖=1|𝑦
𝑁
1
𝑀𝑆𝐸 = ∑𝑁
�𝑖 − 𝑦𝑖 )2
𝑖=1(𝑦
𝑁
1
𝑁𝑆𝑅𝑀𝑆𝐸 = �𝑁1
𝑁
∑𝑁
οΏ½ 𝑖 −𝑦𝑖 )2
𝑖=1(𝑦
∑𝑁
οΏ½)2
𝑖=1(𝑦𝑖 −𝑦
(19)
(20)
The adaptive-margin SVR model is compared with the fixed πœ€-SVR first, where the
parameter of πœ€ is usually determined by cross-validation. In K-fold cross-validation, the
original sample is randomly partitioned into K subsamples. Of the K subsamples, a single
subsample is retained as the validation data for testing the model, and the remaining K−1
subsamples are used as the training data. The cross-validation process is then repeated K
times (the folds), with each of the K subsamples used exactly once as the validation data.
Although cross-validation is widely adopted to determine a suitable fixed margin πœ€ in
SVR, it is a time-consuming and blind searching procedure with a computational
complexity of 𝑂(𝐾𝑅), where 𝑅 is the range of the parameter (F. Chen, Wei, & Tang,
2011). In contrast, the adaptive-margin SVR avoids this blind searching procedure and
11
improved the computational efficiency through establishing an adaptive margin, which is
heuristically determined by the day-to-day traffic variations.
For the dataset collected from Hefei, traffic counts on Monday are used as the training
samples, and the validation dataset contains traffic volumes from Tuesday to Friday.
Results with different combination of πœ†1 and πœ†2 are listed in Table 3. The predicted and
observed traffic counts on Tuesday are plotted in Figure 7. The NRSSME is plotted in
Figure 6, from which we can see that the predicting errors are descending for the
consecutive four weekdays with an increase in the proportion of local variation. For
nearly all the four of the predicting weekdays, the lowest predicting errors are obtained
when πœ†1 = 1 and πœ†2 = 0, which indicates utilizing local variation information without
normalized or global information can achieve a better performance. It is also interesting
to note that since the traffic on Friday shows a different pattern from typical weekdays,
the prediction errors also tend to be slightly larger.
Table 4 gives the forecasting errors between fixed and adaptive margin SVRs. It can be
seen that compared with the time-consuming K-fold cross-validation procedure in the
fixed-margin SVR, the proposed adaptive-margin SVR not only improves the
computational efficiency by heuristically determining an adaptive margin, but also
consistently outperforms the fixed-margin SVR on the forecasting accuracy.
NRSMSE for adaptive-margin
NRSMSE for fixed-margin
0.56
0.52
0.558
NRSMSE
0.515
0.556
0.554
0.51
0.552
0.55
0.548
-1
0.585
0.505 Wednesday
Tuesday
-0.6
-0.2
0.2
0.6
1
0.58
0.5
-1
0.59
-0.6
-0.2
0.2
0.6
1
0.585
0.575
0.58
0.57
0.575
Thursday
-1
-0.6
-0.2
0.2
λ 1-λ 2
0.6
1
Friday
-1
-0.5
0
λ 1-λ 2
0.5
1
Figure 6: Predicting results with different combination of λ1 and λ2
Table 3: Predicting errors for adaptive margin SVR for the dataset in Hefei
Tuesday
π€πŸ − π€πŸ
MAE
NSRMSE
-1
-0.6
-0.2
0.2
0.6
1
5.828
0.5581
5.834
0.5586
5.807
0.5575
5.773
0.5539
5.773
0.5517
5.766
0.5492
12
Wednesday
Thursday
Friday
MSE
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
79.92
5.168
0.5182
68.57
5.828
0.5816
82.07
6.208
0.5834
83.29
80.06
5.114
0.5146
67.62
5.807
0.5791
81.37
6.181
0.5809
82.58
79.75
5.086
0.5132
67.25
5.773
0.5775
80.92
6.140
0.5784
82.12
78.72
5.052
0.5090
66.15
5.760
0.5748
80.16
6.079
0.5750
81.35
78.10
5.032
0.5051
65.14
5.692
0.5705
78.97
6.038
0.5731
80.14
77.39
5.005
0.5013
64.17
5.610
0.5662
77.78
6.032
0.5733
78.94
Table 4: Predicting errors for fixed-margin SVR and adaptive for the dataset in Hefei
𝜺 margin
Tuesday
Wednesday
Thursday
Friday
Adaptive
margin SVR
Fixed
margin SVR
5.766
0.5492
77.39
5.005
0.5013
64.17
0.0825
0.5662
77.78
5.610
0.5834
78.94
5.8072
0. 5575
79.74752
5.1068
0.5146
64.27
5.746
0.5787
82.30311
6.154
0.5895
85.56835
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
1
Predicted Value
Observed Value
Traffic volume per 5 minutes (Normalizd)
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
360
720
1080
Time (minutes)
Figure 7: Illustration of predicted and observed traffic counts on Tuesday for the dataset collected in Hefei
Similar results are obtained for the three datasets from Seattle as listed in Table 5. The
traffic counts of the first five weekdays for the collection periods are used as the training
samples. The model is then evaluated with the traffic counts of the last four weeks.
Figure 8 plots the NSRMSE errors for different combination of πœ†1 and πœ†2 , from which we
can see that the best performances are also obtained for all datasets when the local
variations are fully utilized. The errors listed in Table 6 also indicate that the predicting
accuracy with the adaptive margin consistently outperforms the fixed margin SVR.
13
NSRMSE
0.2
0.195
ES394
0.19
-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
0.8
1
NSRMSE
0.22
0.215
0.21
NSRMSE
0.205
-1
0.13
ES766
-0.8
0.125
0.12
0.115
0.11
NSRMSE with adpative margin
NSRMSE with fixed-margin
ES855
0.105
-1
-0.8
-0.6
-0.4
0
-0.2
λ 1-λ 2
0.2
0.4
0.6
Figure 8: Predicting results with different combination of λ1 and λ2
Table 5: Predicting errors for adaptive margin SVR for the dataset in TDAD
ES394
ES766
ES855
π€πŸ − π€πŸ
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
-1
-0.6
-0.2
0.2
0.6
1
5.434
0.1981
56.67
7.532
0.2132
112.6
13.83
0.1244
356.89
5.333
0.1955
55.19
7.477
0.2104
109.7
13.56
0.1214
340.0
5.120
0.1953
55.08
7.421
0.2093
108.6
12.34
0.1203
310.9
5.148
0.1922
53.34
7.425
0.2076
106.8
12.77
0.1161
333.7
5.015
0.1902
52.24
7.400
0.2062
105.4
12.57
0.1149
304.5
5.013
0.1901
52.18
7.370
0.2059
105.1
12.63
0.1153
306.6
Table 6: Predicting errors for fixed-margin SVR and adaptive for the dataset in Hefei
ES394
ES766
ES855
𝜺 margin
Adaptive
margin SVR
Fixed
margin SVR
5.013
0.1901
52.1819
7.370
0.2059
105.1
12.63
0.1153
306.6
5.412
0.1966
55.0757
7.457
0.2149
109.7
13.73
0.1247
331.6
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
MAE
NSRMSE
MSE
14
3.3.2 Model comparisons
Above experiments demonstrated that the adaptive margin SVR model consistently
outperformed the fixed margin SVR forecast. Next, comparisons with three forecasting
models including seasonal ARIMA, Radius Basis Function NN (RBF-NN) and Naive
forecast are conducted.
For the structure of ARIMA, in Williams et al. (1998), Williams and Hoel (2003) and
follow-on research (Smith et al., 2002), the seasonal ARIMA (1, 0, 1) (0, 1, 1)s
consistently emerged as the preferred structure. For the data with 5 minutes resolution,
the seasonal cycle length 𝑆 for a week is 2016, thus the recursive prediction equation is
written as:
yοΏ½ t+1 = yt−2015 + ∅1 (yt − yt−2016 ) − θ1 (yt − yοΏ½ t ) − Θ1 (yt−2015 − yοΏ½ t−2015 )
+θ1 Θ1 (yt−2016 − yοΏ½ t−2016 )
(21)
The parameters in Eq. (21) are then estimated using the Econometrics toolbox of
Matlabtm(2012a) with the calibration dataset of the first four weeks traffic count (8064
observations) as listed in table 5. With the calibrated parameters, the model is then
evaluated by iteratively forecasting the traffic counts of the second four weeks. The errors
are calculated regarding the weekday traffic.
Table 7: ARIMA parameters
ES394
0.9533
0.6127
0.4950
ES766
0.9280
𝜽𝟏
0.7492
0.5191
ES855
0.9510
0.4863
0.5027
Dataset
∅𝟏
𝜣𝟏
The architecture of the RBF-NN is determined by the package of DTREG (Sherrod,
2003), which applies the evolutionary approach developed by S. Chen, Hong, Harris, and
Sharkey (2004) to determine the optimal parameter set. Same as the adaptive SVR model,
traffic counts of the first week are used as the training samples. The third comparison
model as suggested in Smith et al. (2002) is the Naive forecast in the form of:
yt
yοΏ½ t+1 =
yt+1, hist
(22)
yt, hist
where yt, hist is the historical average of the traffic flow rate at the time-of-day and dayof-week associated with time interval t. Forecasting models that could not be shown to
consistently produce more accurate forecast than this naive method would be of little
value.
The performances of above comparing models, as well as the adaptive SVR model, are
listed in Table 7. It can be seen that the Naive forecast shows the largest errors for all
three datasets, and the adaptive margin SVR model consistently outperforms other
comparing models. In terms of NSRMSE, the adaptive-margin SVR model improves the
forecasting accuracy by 6%, 7% and 2% with ES394, ES766 and ES855 datasets,
respectively. Furthermore, SVR model only used one-week traffic data as the training
15
samples, while the Naive and ARIMA models both utilized traffic counts of four
consecutive weeks.
Table 7: Prediction errors for different models
Models
Adaptive margin
SVR
Naive
Forecast
RBF-NN
ARIMA
(1,0,1)(0,1,1)1026
ES394
MAE
MSE
NSRMSE
5.013
52.1819
0.1901
6.964
95.90
0.2564
5.420
55.58
0.1962
ES766
MAE
MSE
NSRMSE
7.357
105.0
0.2059
9.554
182.8
0.2722
7.655
110.8
0.2114
7.97
126.9
0.2243
MAE
MSE
NSRMSE
12.53
284.9
0.1113
15.00
439.3
0.1382
13.03
319.6
0.1177
14.85
428.8
0.1365
ES855
5.92
69.97
0.2159
4. Conclusion and future work
Accurately forecasting short-term traffic is a challenging task due to its complex
characteristics. This paper proposed an adaptive-margin SVR with integration of day-today traffic variation, which can adequately tolerate and track the time-varying volatility.
The adaptive-margin SVR avoids this blind searching procedure and improves the
computational efficiency through establishing an adaptive and time-varying πœ€ margin,
which is heuristically determined by the day-to-day traffic variations.
Experimental results with field data indicate that the adaptive πœ€ with only the local
variation can achieve better performance compared with combinations of the global
variation. The adaptive-margin model consistently outperforms the fixed πœ€ margin SVR
with improved computational efficiency. The impact of the training samples is also
discussed, demonstrating that a complete set of daily and weekly traffic counts plays a
fundamental role for SVR learning a seasonal traffic pattern.
We only investigated the πœ€ margin of the SVR model in this paper. However, other
parameters in SVR such as the regulation parameter 𝐢 also may affect the learning ability
and forecasting accuracy. Combinations of heuristically determined parameter set
including both the πœ€ margin and regulation parameter are expected to further improve the
performance of the forecasting model. And now this is being investigated by the authors.
Reference
Abdy, Z. R., & Hellinga, B. R. (2008). Use of Microsimulation to Model Day-to-Day Variability
of Intersection Performance. Transportation Research Record: Journal of the
Transportation Research Board, 2088(-1), 18-25.
Canu, S., Grandvalet, Y., Guigue, V., & Rakotomamonjy, A. (2005). Svm and kernel methods
matlab toolbox. Perception Systemes et Information, INSA de Rouen, Rouen, France, 2, 2.
Castro-Neto, M., Jeong, Y. S., Jeong, M. K., & Han, L. D. (2009). Online-SVR for short-term
traffic flow prediction under typical and atypical traffic conditions. Expert Systems with
Applications, 36(3), 6164-6173.
16
Chen, F., Wei, D., & Tang, Y. (2010). Wavelet analysis based sparse LS-SVR for time series data.
Paper presented at the Bio-Inspired Computing: Theories and Applications (BIC-TA),
2010 IEEE Fifth International Conference on.
Chen, F., Wei, D., & Tang, Y. (2011). Virtual Ion Selective Electrode for Online Measurement of
Nutrient Solution Components. Sensors Journal, IEEE, 11(2), 462-468.
Chen, S., Hong, X., Harris, C. J., & Sharkey, P. M. (2004). Sparse modeling using orthogonal
forward regression with PRESS statistic and regularization. Systems, Man, and
Cybernetics, Part B: Cybernetics, IEEE Transactions on, 34(2), 898-911.
Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for
SVM regression. Neural Networks, 17(1), 113-126. doi: 10.1016/s0893-6080(03)00169-2
Davis, G. A., & Nihan, N. L. (1991). Nonparametric regression and short-term freeway traffic
forecasting. Journal of Transportation Engineering, 117(2), 178-188.
Hellinga, B., & Abdy, Z. (2008). Signalized intersection analysis and design: Implications of dayto-day variability in peak-hour volumes on delay. Journal of Transportation Engineering,
134(7), 307-318.
Jiang, X., & Adeli, H. (2005). Dynamic wavelet neural network model for traffic flow forecasting.
Journal of Transportation Engineering, 131(10), 771-779.
Levinson, H. S., Sullivan, D., & Bryson, R. W. (2006). Effects of Urban Traffic Volume
Variations on Service Levels. Paper presented at the Transportation Research Board 85th
Annual Meeting.
Li, J. Q. (2011). Discretization modeling, integer programming formulations and dynamic
programming algorithms for robust traffic signal timing. Transportation Research Part C:
Emerging Technologies, 19(4), 708-719.
Li, L., Lin, W., & Liu, H. (2006). Type-2 fuzzy logic approach for short-term traffic forecasting.
Paper presented at the Intelligent Transport Systems, IEE Proceedings.
Park, B., Messer, C. J., & Urbanik II, T. (1998). Short-term freeway traffic volume forecasting
using radial basis function neural network. Transportation Research Record: Journal of
the Transportation Research Board, 1651(-1), 39-47.
Rakha, H., & Van Aerde, M. (1995). Statistical analysis of day-to-day variations in real-time
traffic flow data. Transportation research record, 26-34.
Sherrod, P. H. (2003). DTREG predictive modeling software. Software available at http://www.
dtreg. com.
Smith, B. L., Williams, B. M., & Keith Oswald, R. (2002). Comparison of parametric and
nonparametric models for traffic flow forecasting. Transportation Research Part C:
Emerging Technologies, 10(4), 303-321.
Smola, A. J., Schölkopf, B., & Müller, K. R. (1998). The connection between regularization
operators and support vector kernels. Neural Networks, 11(4), 637-649.
Vapnik, V. (1995). The nature of statistical learning theory: Springer-Verlag New York, Inc.
Wei, D., Chen, F., & Chen, C. (2010). Least-Square Wavelet-Support Vector Machine for Shortterm Traffic Flow Forecast. The Mediterranean Journal of Measurement and Control,
6(2), 39-45.
Williams, B. M., Durvasula, P. K., & Brown, D. E. (1998). Urban freeway traffic flow prediction:
application of seasonal autoregressive integrated moving average and exponential
smoothing models. Transportation Research Record: Journal of the Transportation
Research Board, 1644(-1), 132-141.
Williams, B. M., & Hoel, L. A. (1999). Modeling and forecasting vehicular traffic flow as a
seasonal stochastic time series process.
Williams, B. M., & Hoel, L. A. (2003). Modeling and forecasting vehicular traffic flow as a
seasonal ARIMA process: Theoretical basis and empirical results. Journal of
Transportation Engineering, 129(6), 664-672.
17
Xie, Y., & Zhang, Y. (2006). A wavelet network model for short-term traffic volume forecasting.
Journal of Intelligent Transportation Systems, 10(3), 141-150.
Yin, Y. (2008). Robust optimal traffic signal timing. Transportation Research Part B:
Methodological, 42(10), 911-924.
Zhang, L., Yin, Y., & Lou, Y. (2010). Robust Signal Timing for Arterials under Day-to-Day
Demand Variations. Transportation Research Record: Journal of the Transportation
Research Board, 2192(-1), 156-166.
Zhang, Y., & Xie, Y. (2008). Forecasting of short-term freeway volume with v-support vector
machines. Transportation Research Record: Journal of the Transportation Research
Board, 2024(-1), 92-99.
18
Download