6. model evaluation and validation

advertisement
6. MODEL EVALUATION AND VALIDATION
This chapter presents results on synthesizing Origin-Destination trip tables based on
information of link volumes using different test and real networks. For this purpose,
the popular Gur Network (Gur et al., 1980) and real data pertaining to the Pulaski
Network are employed. The focus in the reported computations is on the basic linear
programming model addressed in this thesis.
Although the band-width theory
developed in this thesis is not explicitly considered, the results exhibit general trends
for such LP based models, and could be used as a first-step leading into further
research.
6.1 Measures of Closeness Between O-D Trip Tables
In the context of O-D estimation problems, the following two criteria are usually used
to measure the quality of test results: measures based on the replication of link
volumes, and measures based on the closeness of the solution trip table to the target
table. Although alternative measures may be defined and used to quantify the merit
of different test results, the following measures are most popularly employed.
To compare the closeness of modeled volumes to the observed volumes, two
measures, the percentage root mean square error (%RMSE) and the percentage
mean absolute error (%MAE), are defined as follows.
% RMSE =
∑ (V
a∈ Av
a
assign
a
− Vobs
)2
n
∗
100 ∗ n
,
∑Vobsa
a∈ Av
6. Model Evaluation and Validation
% MAE =
∑ |V
a∈ Av
a
assign
∑V
a
− Vobs
|
a
obs
∗ 100 .
a∈ Av
87
a
a
Here V assign
and Vobs
are the assigned and observed volumes on link a , respectively,
n is the total number of links that have available link volumes, and Av represents the
set of links with available volumes.
To measure the closeness between a computed trip table and a target table, the
following statistics are frequently used.
% RMSE =
∑ (t
ij
− tij∗ ) 2
∗
v
% MAE =
nOD
∑| t − t
∑t
ij
∗
ij
φ = ∑ max(1, t ) ln
∗
ij
∗
ij
|
100 ∗ nOD
,
∑Vij∗
∗ 100 ,
max(1, tij∗ )
max(1, tij )
.
Here, tij∗ is the number of trips for O-D interchange (i, j ) , as specified by some prior
trip table (which might be a previous table, or are obtained through survey, a believed
to be “true”, or even quite arbitrary). tij is the estimated or modeled number of trips
for O-D interchange (i, j ) , and nOD is the number of feasible O-D interchanges.
Since the above statistics are measures of error in estimation, the smaller the values
of these measures, the closer is the modeled trip table to the target table. Ideally,
values of zero for each of these statistics would mean that the estimated table is the
same as the target table. Normally, the desirability of having low or zero values of
these statistics depends on how close the target table is to the (unknown) true table.
6. Model Evaluation and Validation
88
6.2 Tests on a Sample Network
The original version of the Linear Programming model required that traffic count data
on all the network links be given as input (Sherali et al. 1994a). It has been tested
(and evaluated) on several different networks, including a small sample hypothetical
corridor network as shown in Figure 6.1.
The network shown in Figure 6.1, often referred to as the Gur Network (Gur et al.
1980) in the literature, possesses many properties including multiple user-equilibrium
solutions, and is frequently used as a test network for different theoretical models.
The LP model, as well as two other popular O-D models, the Maximum Entropy
model (Bromage 1988, Fisk 1988, Beagon 1990) and the LINKOD model (Han et al.
1983), were tested on the Gur network using different target tables. Their outputs
were compared with respect to the correct trip table and observed volumes. The test
results are summarized in Table 6.1.
The “no-prior information” table simply uses uniform values, and is not indicative of a
desired solution.
The “relatively small errors” table uses a perturbation of the
“correct” table to reflect an approximate knowledge of reality. The “correct” trip table
is the true known solution for this hypothetical network. The results in Table 6.1
clearly display the advantage with respect to solution quality of using LP model.
6. Model Evaluation and Validation
89
1
4
7
9
11
2
8
10
12
3
6
5
=Centroid
Link
Index
Link
Observed
Impedance
Observed
Volume
1
(4,9)
10
2400
2
(5,10)
10
2000
3
(6,5)
40
100
4
(6,7)
10
5000
5
(6,8)
10
500
6
(7,1)
10
500
7
(7,9)
20
4500
8
(8.10)
20
500
9
(9,4)
10
2000
10
(9,10)
10
1500
11
(9,11)
20
4900
12
(10,5)
10
1600
13
(10,9)
10
1500
14
(10,12)
20
900
15
(11,2)
20
4800
16
(11,12)
10
300
17
(12,3)
20
1000
18
(12,11)
10
200
Figure 6.1: Gur Network and its Characteristics (Source: Gur et al., 1980).
6. Model Evaluation and Validation
90
Table 6.1. Test results of the different models on the Gur network
(Source: Sherali et al. 1994a)
Target Table
No-Prior-Information
Relatively Small Errors
Correct
LP
Max.
Entropy
LINKOD
LP
Max.
Entropy
LINKOD
LP
Max.
Entropy
LINKOD
Test Models
Equilibrium
Assigned Versus
RMSE
0.74
10.33
9.62
2.54
11.11
11.20
2.49
6.74
12.90
Observed Volumes
MAE
0.43
6.58
7.82
1.31
8.54
8.75
1.64
4.78
8.17
Closeness of
RMSE
872.57
764.46
795.47
118.83
127.37
101.92
0.0
6.86
26.72
Estimated Trip
MAE
555.91
662.91
601.27
77.09
82.45
67.00
0.0
5.00
17.82
Table to the Target
φ
11606.6
9028.81
7573.97
889.93
969.10
769.83
0.0
55.28
201.94
The sample network was also chosen as a test network to evaluate the performance
of an enhanced LP model and the Maximum Entropy model for a research project
sponsored by the Virginia Transportation Research Council (VTRC) (Sivanandan,
1996), for which the author was a graduate research assistant. By choosing different
target tables and different percentages of available volumes, a total of 78 test cases
were constructed on this network solved using both models. The structure of these
test cases is similar to the structure of the cases for the Pulaski network discussed in
the next section, and the reader is referred to the original results (Sivanandan et al.,
1996) for details on this study.
6.3 Tests on Real Networks
6.3.1
Networks and Models
The Linear Programming based models have also been tested on several real world
networks. For example, the original version of the LP model was tested on a portion
of real transportation network in Northern Virginia adjacent to Washington DC
(Sherali et al., 1994a).
6. Model Evaluation and Validation
91
The original version of the LP model had a restriction of requiring that counts are
available on all the links in the network. This condition is hard to realize for realistic
reasonably-sized networks. During the course of research, the Linear Programming
model was enhanced to overcome the restriction of requiring that traffic count data on
all the network links be given as input, and was adjusted to estimate O-D tables using
a partial set of volume data as discussed in Section 3.5. This enhanced model was
tested and evaluated in a research project supported by the Virginia Transportation
Research Council (Sivanandan et al., 1996).
To evaluate the performance of the LP model, a competitive program, The Highway
Emulator (THE), was used for comparison purpose. The Highway Emulator is a
program based on the Maximum Entropy model, and has been applied to several
practical cases. Both LP and THE models were tested on several real networks.
These test networks include the Purdue Network. a real network modeled from the
village network of Purdue University, West Lafayette, Indiana, and the Pulaski
Network, a network abstracted from the street network of Pulaski town, southwest
Virginia. A total of 148 cases were tested for both models in this project.
6.3.2
Pulaski Network
Among all the test cases, the study of the Pulaski network was interesting and
important from a validation viewpoint, because a surveyed trip table was available
through the Virginia Department of Transportation (VDOT). Located in the central
area of Pulaski, southwestern Virginia, Pulaski town had a population of around
10,000 in 1990. The network, as defined by VDOT, consists of 21 internal zones and
11 external stations.
These internal zones have been divided according to the
density of population and the activity centers in and around the area. The original
street map and the zonal division were provided by VDOT (Figure 6.1). The network
was reduced by the Center for Transportation Research of Virginia Tech to eliminate
redundancies and other information not necessary for test purposes, and symbolized
as in Figure 6.2. It consists of 32 zones, 57 intersection nodes, and 230 links.
6. Model Evaluation and Validation
92
In order to validate the O-D models with real data, VDOT conducted an O-D survey,
and established a daily trip table and a peak hour trip table.
For the detailed
information about the data collection and trip table survey, the reader is referred to
the related reports in Sivanandan et al. (1996), and the Center for Survey Research
(1994).
6.3.3
Different Test Cases
It must be noted that volume data was not available for 55 of the 230 links. These
links were mostly centroid connectors. Since these connectors are an abstraction of
several minor streets in the region, adequate measurements could not be performed.
This introduces another realistic element in the evaluation of the models.
There were total of 30 test cases performed on the Pulaski network for both models,
15 daily cases and 15 peak hour cases, using different combinations of available
information in the form of link volumes and prior trip tables. Three kinds of target
tables were used: “structural table”, “no-prior information” table, and “small error” trip
table. A structural target table is one for which the least amount of information is
provided in the target. All that is input to the model in a structural table are 0/1 cell
values, 1 signifying that the O-D interchange represented by that cell is a feasible
interchange, and 0 indicating an infeasible interchange. The “no-prior information”
target table is one that has uniform cell values, representing an average value of a
prior trip table. A “small error” target table represents a situation where an old and
not-so-outdated table for the region is available as a target. For the Pulaski network,
the small error target table was obtained by adding some random errors to the
surveyed table of VDOT.
6. Model Evaluation and Validation
93
Figure 6.2
6. Model Evaluation and Validation
Street map and zones of Pulaski Town.
94
Figure 6.3
6. Model Evaluation and Validation
Pulaski Network.
95
The output of each model on each test case was judged on its ability to match the
“correct table” as closely as possible, and at the same time, its capability to replicate
the observed volumes. For the Pulaski network, the surveyed table provided by
VDOT served as the “correct trip table” for evaluation purposes.
The closeness
measure was quantified via % RMSE , % MAE and φ , for both volume replication and
trip table matching.
6.3.4
Sensitivity Analysis
The statistical values obtained for the different test cases for the Pulaski network are
summarized in Table 6.2 for the peak-hour case, and in Table 6.3 for the 24-hour
case.
For the Pulaski network, there were 50 of the 230 links having no volume information.
In other words, the maximum percentage of links having available volume is about
75%. By eliminating the volume information on some unimportant links (which have
less traffic counts), two other cases were also constructed, having 60% and 50% of
links with known traffic volumes.
Furthermore, three different categories of
prior/target/seed tables were used in this test.
Similar to the case of the Gur network discussed in the previous section, the
structural table has 1/0 values for its cells to indicate merely if that trip interchange is
feasible of not. The no-prior-information trip table has a uniform value for all feasible
exchanges. This value was equal to 33 for the 24-hour case, which represents the
average surveyed trip exchange based on total number of trips factored by a value of
0.75. The factor of 0.75 was arbitrarily chosen, based on the assumption that the
total number of trips for a past period will be in the range, say, 70%-90% of current
total trips. Thus, this value was used in order to emulate past conditions. Similarly, a
value of 3 was used for the cells of the no-prior-information target table for the peak
hour case. This value was obtained using a factor of 0.8.
6. Model Evaluation and Validation
96
Table 6.2.
R
st
t
ul
es
Target
Te
Measure
% Vol
50%
%RMSE(TT) 60%
75%
50%
$MAE(TT)
60%
75%
50%
PHI(TT)
60%
75%
50%
%RMSE(VOL) 60%
75%
50%
%MAE(VOL) 60%
75%
Comparison using the Pulaski Network
Case: Peak Hour
Structural
No-Prior Inform
0-1
0-3
THE
LP
t
ul
es
$MAE(TT)
PHI(TT)
%RMSE(VOL)
%MAE(VOL)
Target
R
st
Te
%RMSE(T T)
60% Available
80% Available 100% Available
THE
THE
LP
LP
LP
THE
LP
566.24
565.91
555.38
475.63
525.7
464.2
525.7
604.8
525.7
395.43
619.1
606.05
622.75
569.16
594.85
585.4
594.85
499.51
594.85
416.51
640.47
615.28
641.7
484.88
619.04
452.52
619.04
515.84
619.04
438.57
213.94
172.92
217.82
162.58
174.84
140.06
174.84
124.83
174.84
117.57
219
196.98
222.78
164.1
185.2
155.53
185.2
136.54
185.2
125.56
230.1
179.41
233.81
166.91
197.34
144.58
197.36
149.66
197.34
112.16
7705
9668
7251
7333
5219
4310
5219
4125
5219
4183
8748
10548
8516
8126
6469
4850
6469
4398
6469
4574
8975
10732
8786
7746
6980
4942
6980
5291
6980
4893
16.63
20.84
15.25
22.73
18.07
18.75
18.07
19.55
18.07
21.62
18.84
12.78
18.38
16.69
26.95
16.57
26.95
12.65
26.95
14.28
24.49
17.64
24.26
20.31
24.76
19.95
24.76
16.34
24.76
16.68
8.97
9.24
8.11
11.73
10.09
8.24
10.09
8.08
10.09
8.6
10.56
6.42
10.08
9.23
15.13
6.04
15.13
5.58
15.13
6.74
14.73
10.97
14.42
11.6
14.12
8.49
14.12
8.06
14.12
8.63
Table 6.3.
Measure
THE
Small Error Target
Comparison on Puluski Network
Case: 24-Hour
Structural
No-Prior Inform
0-1
0-33
Sm all Error Target
60% Available
80% Available
100% Available
% Vol
THE
LP
THE
LP
THE
LP
THE
LP
THE
LP
50%
60%
75%
50%
60%
75%
50%
60%
75%
50%
60%
75%
50%
60%
75%
451.06
637.13
443.45
456.25
494.56
450.59
525.7
604.8
525.7
395.43
389.81
562.67
411.77
543.74
408.38
439.19
594.85
499.51
594.85
416.51
455.04
496.8
456.95
409.48
451.86
401.92
619.04
515.84
619.04
438.87
188.78
198.7
185.4
148.13
158.26
125.85
159.9
97.02
158.9
85.16
164
198.09
174.83
157.58
148.15
119
146.63
96.34
146.63
92.59
178.13
183.48
183.14
141.76
157.54
120.72
157.14
92.74
157.14
83.45
115573
191136
91855
77633
70353
50121
70944
31116
70944
27964
112558
187967
87795
84877
65407
41710
62296
28758
62296
26520
117450
197857
94730
81051
76042
49008
75659
35795
75659
36063
13.74
15.34
12.02
15.94
13.33
18
13.16
18.16
13.16
17.24
15.93
4.04
15.13
7.77
18.11
8.02
18.41
10.99
19.41
14.32
16.84
11.77
15.79
10.95
15.98
13.04
15.91
12.87
15.91
12.5
8.04
4.46
6.61
5.26
13.33
5.9
7.7
5.5
7.5
5.43
10.05
1.78
9.09
2.95
16.11
5.13
12.15
3.58
12.15
4.94
10.82
4.75
10.19
5.61
15.98
6.01
10.3
4.92
10.3
4
6. Model Evaluation and Validation
97
The “small error” trip tables were relatively close to the surveyed trip table and were
created as follows. Let C ij be the ij th cell value of the surveyed table, Ψ be the
mean ratio of the target table cell value to the surveyed table value, say Ψ = 0.8 . Let
β ij be a random distributed cell error bounded in an interval, say (-0.2, 0.2). Then
the ij th cell value Pij of the target table is defined as Pij = C ij ( Ψ + β ij ) . By randomly
selecting certain entries in the obtained table, three different target tables, which
represent different availabilities of previous information, were created for this
category.
The trends of these statistics as a function of the different target tables and the
different percentages of assumed available volumes could be visually compared.
These synthesized test results are graphically shown in Figures 6.4 through 6.8 for
the peak-hour case, and in Figures 6.9 through 6.13 for 24-hour case.
The case study of the Pulaski network is believed to be credible since the data
collection here was specially designed for the purpose of evaluating O-D models, and
a trip table was established through a conventional O-D survey for comparing the
model results. For this network, both daily (24-hour) and peak hour tables were
studied.
As expected, for both models, the trip table error statistics have high values for the
structural target table case. For this case, THE came out superior to LP in terms of
closeness of modeled tables to the surveyed table. When different versions of the
small error table are provided as target, both the models are seen to produce tables
that are closer to the surveyed table.
6. Model Evaluation and Validation
98
%RMSE(TT)
700
600
500
%RMSE(TT)
400
Small Error
300
200
100% 100
% Avail
Vol
No-Prior Inform
80%
Structural
0
60%
THE
0-3
50%
THE
0-1
THE
75%
THE
THE
Figure 6.4.
LP
LP
LP
LP
Target Trip Table
LP
Trip Table Comparisons (Modeled verse VDOT Surveyed)
Network: Pulaski Case: Peak Hour Measure: %RMSE(TT)
%MAE(TT)
250
200
Small Error
100
%MAE(TT)
150
50
100%
% Avail
Vol
No-Prior Inform
Structural
80%
0
60%
THE
0-3
50%
THE
0-1
THE
75%
THE
THE
Figure 6.5.
LP
LP
LP
LP
Target Trip Table
LP
Trip Table Comparisons (Modeled verse VDOT Surveyed)
Network: Pulaski Case: Peak Hour Measure: %MAE(TT)
6. Model Evaluation and Validation
99
%PHI(TT)
12000
10000
%PHI(TT)
8000
6000
Small Error
4000
100%
% Avail
Vol
No-Prior Inform
Structural
80%
0
60%
THE
0-3
50%
THE
75%
THE
THE
LP
LP
LP
THE
0-1
2000
LP
Target Trip Table
LP
Figure 6.6.
Trip Table Comparisons (Modeled verse VDOT Surveyed)
Network: Pulaski Case: Peak Hour Measure: PHI(TT)
%RMSE(VOL)
30
25
15
Small Error
10
100%
% Avail
Vol
No-Prior Inform
Structural
80%
THE
0-3
THE
0-1
THE
75%
THE
THE
LP
5
0
60%
50%
%RMSE(VOL)
20
LP
LP
LP
Target Trip Table
LP
Figure 6.7.
Volume Comparison (Modeled verse Observed)
Network: Pulaski Case: Peak Hour Measure: %RMSE(VOL)
6. Model Evaluation and Validation
100
%MAE(VOL)
30
25
15
Small Error
%MAE(VOL)
20
10
100%
% Avail
Vol
No-Prior Inform
Structural
80%
0
60%
THE
0-3
50%
THE
0-1
THE
75%
THE
THE
LP
5
LP
LP
LP
Target Trip Table
LP
Figure 6.8.
Volume Comparison (Modeled verse Observed)
Network: Pulaski Case: Peak Hour Measure: %MAE(VOL)
%RMSE(TT)
700
600
500
Small Error
300
%RMSE(TT)
400
200
100%
% Avail
Vol
No-Prior Inform
Structural
80%
0
60%
THE
0-33
50%
THE
0-1
THE
75%
THE
THE
LP
100
LP
LP
LP
Target Trip Table
LP
Figure 6.9.
Trip Table Comparisons (Modeled verse VDOT Surveyed)
Network: Pulaski Case: 24 Hour Measure: %RMSE(TT)
6. Model Evaluation and Validation
101
%MAE(TT)
200
180
160
140
100
Small Error
80
%MAE(TT)
120
60
40
100%
% Avail
Vol
No-Prior Inform
Structural
80%
60%
THE
0-33
50%
THE
0-1
THE
LP
LP
LP
LP
THE
75%
20
0
Target Trip Table
LP
THE
Figure 6.10. Trip Table Comparisons (Modeled verse VDOT Surveyed)
Network: Pulaski Case: 24 Hour
Measure: %MAE(TT)
%PHI(TT)
200000
180000
160000
140000
Small Error
80000
%PHI(TT)
120000
100000
60000
40000
100%
% Avail
Vol
No-Prior Inform
Structural
80%
0
60%
THE
0-33
50%
THE
0-1
THE
75%
THE
THE
LP
20000
LP
LP
LP
Target Trip Table
LP
Figure 6.11. Trip Table Comparisons (Modeled verse VDOT Surveyed)
Network: Pulaski Case: 24 Hour
6. Model Evaluation and Validation
Measure: %PHI(TT)
102
%RMSE(VOL)
20
18
16
12
10
Small Error
8
%RMSE(VOL)
14
6
4
100%
% Avail
Vol
No-Prior Inform
Structural
80%
0
60%
THE
0-33
50%
THE
0-1
THE
75%
THE
THE
LP
2
LP
LP
LP
Target Trip Table
LP
Figure 6.12. Volume Comparison (Modeled verse Observed)
Network: Pulaski Case: 24 Hour
Measure: %RMSE(VOL)
%MAE(VOL)
20
18
16
12
10
Small Error
8
%MAE(VOL)
14
6
4
100%
% Avail
Vol
No-Prior Inform
Structural
80%
0
60%
THE
0-33
50%
THE
0-1
THE
75%
THE
THE
LP
2
LP
LP
LP
Target Trip Table
LP
Figure 6.13. Volume Comparison (Modeled verse Observed)
Network: Pulaski Case: 24 Hour
6. Model Evaluation and Validation
Measure: %MAE(VOL)
103
Both the LP and THE models show mixed trends in performance with an increase in
available link flow information. This may be attributed to the fact that the link volumes
may
not
be
consistent
with
the
surveyed
O-D
inconsistencies/errors in observed link volume data.
flows,
or
to
possible
In general, the linear
programming model has lower values for the different statistics, except for the
structural target case. Also, the variation of the link volume replication error for the
LP and THE models, as measured by %RMSE(VOL) and %MAE(VOL), are depicted
in Figures 6.6 and 6.7 for the peak hour case, and in Figures 6.11 and 6.12 for the
daily case. The LP model obtains lower values for this statistics in every test case.
6.3.5
Comparison of Results
Most of the test results presented above favor the LP model, except for the structural
target cases. This conclusion is based on that the VDOT surveyed table represents
the “correct” or “true” trip table for the region. On the other hand, it should be pointed
out that the surveyed table itself was established via a sampling process, and
inconsistencies or errors in these tables and link volume data cannot be completely
ruled out. This was further confirmed by indications from VDOT, and through some
preliminary checks conducted by the study team (Sivanandan 1996).
6.4 Enhancing the Performance of the Models
As the results indicate, the quality of the modeled trip table for both the LP and THE
models are highly dependent upon the target table.
For this reason, some
socioeconomic variables and other classical methods were employed to obtain a
reasonable target (seed) table.
Both models were tested again on the Pulaski
network by using this new target (Sivanandan 1998).
The test results are
summarized in a project report of the Virginia Transportation Research Council.
While the performances of both models were enhanced by a more representative
target table, their relative performances were about the same as before.
6. Model Evaluation and Validation
104
Download