6. MODEL EVALUATION AND VALIDATION

This chapter presents results on synthesizing Origin-Destination (O-D) trip tables from link volume information, using both test and real networks. For this purpose, the popular Gur Network (Gur et al., 1980) and real data pertaining to the Pulaski Network are employed. The reported computations focus on the basic linear programming (LP) model addressed in this thesis. Although the band-width theory developed in this thesis is not explicitly considered, the results exhibit general trends for such LP-based models, and could be used as a first step leading into further research.

6.1 Measures of Closeness Between O-D Trip Tables

In the context of O-D estimation problems, two criteria are usually used to measure the quality of test results: measures based on the replication of link volumes, and measures based on the closeness of the solution trip table to the target table. Although alternative measures may be defined and used to quantify the merit of different test results, the following measures are the most commonly employed.

To compare the closeness of modeled volumes to the observed volumes, two measures, the percentage root mean square error (%RMSE) and the percentage mean absolute error (%MAE), are defined as follows.

\[
\%\mathrm{RMSE} = \sqrt{\frac{\sum_{a \in A_v} \left( V^{a}_{assign} - V^{a}_{obs} \right)^2}{n}} \cdot \frac{100 \, n}{\sum_{a \in A_v} V^{a}_{obs}},
\]

\[
\%\mathrm{MAE} = \frac{\sum_{a \in A_v} \left| V^{a}_{assign} - V^{a}_{obs} \right|}{\sum_{a \in A_v} V^{a}_{obs}} \cdot 100 .
\]

Here $V^{a}_{assign}$ and $V^{a}_{obs}$ are the assigned and observed volumes on link $a$, respectively, $n$ is the total number of links that have available link volumes, and $A_v$ represents the set of links with available volumes.

To measure the closeness between a computed trip table and a target table, the following statistics are frequently used.

\[
\%\mathrm{RMSE} = \sqrt{\frac{\sum_{i,j} \left( t_{ij} - t^{*}_{ij} \right)^2}{n_{OD}}} \cdot \frac{100 \, n_{OD}}{\sum_{i,j} t^{*}_{ij}},
\]

\[
\%\mathrm{MAE} = \frac{\sum_{i,j} \left| t_{ij} - t^{*}_{ij} \right|}{\sum_{i,j} t^{*}_{ij}} \cdot 100,
\]

\[
\phi = \sum_{i,j} \max(1, t^{*}_{ij}) \left| \ln \frac{\max(1, t^{*}_{ij})}{\max(1, t_{ij})} \right| .
\]
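As a rough illustration of how the statistics displayed above might be computed, the following Python sketch evaluates them for flat lists of values (link volumes, or trip table cells listed over the feasible interchanges). The function names pct_rmse, pct_mae, and phi, and the sample numbers, are hypothetical and do not come from the thesis. Taking the absolute value of the log ratio in phi, so that the statistic is a nonnegative error measure, is an assumption about the intended form. The trip table symbols are defined in the paragraph that follows.

```python
import math


def pct_rmse(modeled, reference):
    """Percentage root mean square error of modeled vs. reference values."""
    n = len(reference)
    squared_error = sum((m - r) ** 2 for m, r in zip(modeled, reference))
    # sqrt(sum of squares / n), scaled by 100 * n / (sum of reference values)
    return math.sqrt(squared_error / n) * 100.0 * n / sum(reference)


def pct_mae(modeled, reference):
    """Percentage mean absolute error of modeled vs. reference values."""
    absolute_error = sum(abs(m - r) for m, r in zip(modeled, reference))
    return absolute_error * 100.0 / sum(reference)


def phi(modeled, target):
    """Phi statistic between a modeled trip table and a target table.

    Cells are floored at 1 so that empty cells do not break the logarithm.
    The absolute value is an assumption of this sketch.
    """
    total = 0.0
    for t, t_star in zip(modeled, target):
        floored_target = max(1.0, t_star)
        floored_model = max(1.0, t)
        total += floored_target * abs(math.log(floored_target / floored_model))
    return total


# Hypothetical observed and assigned volumes on two links (not study data).
observed = [100.0, 100.0]
assigned = [110.0, 90.0]
volume_rmse = pct_rmse(assigned, observed)
volume_mae = pct_mae(assigned, observed)
```

The same two percentage functions serve for both volume replication and trip table matching; only the inputs change.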
Here, $t^{*}_{ij}$ is the number of trips for O-D interchange $(i, j)$ as specified by some prior trip table (which might be a previous table, a table obtained through a survey, a table believed to be “true”, or even a quite arbitrary one), $t_{ij}$ is the estimated or modeled number of trips for O-D interchange $(i, j)$, and $n_{OD}$ is the number of feasible O-D interchanges. Since the above statistics are measures of error in estimation, the smaller their values, the closer the modeled trip table is to the target table. Ideally, a value of zero for each of these statistics would mean that the estimated table is the same as the target table. In practice, the desirability of having low or zero values of these statistics depends on how close the target table is to the (unknown) true table.

6.2 Tests on a Sample Network

The original version of the LP model required that traffic count data on all the network links be given as input (Sherali et al. 1994a). It has been tested and evaluated on several different networks, including the small hypothetical corridor network shown in Figure 6.1. This network, often referred to in the literature as the Gur Network (Gur et al. 1980), possesses many properties, including multiple user-equilibrium solutions, and is frequently used as a test network for different theoretical models.

The LP model, as well as two other popular O-D models, the Maximum Entropy model (Bromage 1988, Fisk 1988, Beagon 1990) and the LINKOD model (Han et al. 1983), were tested on the Gur network using different target tables. Their outputs were compared with respect to the correct trip table and the observed volumes. The test results are summarized in Table 6.1. The “no-prior information” table simply uses uniform values, and is not indicative of a desired solution. The “relatively small errors” table uses a perturbation of the “correct” table to reflect an approximate knowledge of reality.
The “correct” trip table is the true known solution for this hypothetical network. The results in Table 6.1 clearly display the advantage, with respect to solution quality, of using the LP model.

[Network diagram: nodes 1 through 12, with a legend symbol marking the centroids.]

Link Index  Link     Observed Impedance  Observed Volume
1           (4,9)    10                  2400
2           (5,10)   10                  2000
3           (6,5)    40                  100
4           (6,7)    10                  5000
5           (6,8)    10                  500
6           (7,1)    10                  500
7           (7,9)    20                  4500
8           (8,10)   20                  500
9           (9,4)    10                  2000
10          (9,10)   10                  1500
11          (9,11)   20                  4900
12          (10,5)   10                  1600
13          (10,9)   10                  1500
14          (10,12)  20                  900
15          (11,2)   20                  4800
16          (11,12)  10                  300
17          (12,3)   20                  1000
18          (12,11)  10                  200

Figure 6.1: Gur Network and its Characteristics (Source: Gur et al., 1980).

Table 6.1. Test results of the different models on the Gur network (Source: Sherali et al. 1994a)

Target Table  No-Prior-Information              Relatively Small Errors           Correct
Test Model    LP        Max. Entropy  LINKOD    LP        Max. Entropy  LINKOD    LP        Max. Entropy  LINKOD

Equilibrium Assigned Versus Observed Volumes
  RMSE        0.74      10.33         9.62      2.54      11.11         11.20     2.49      6.74          12.90
  MAE         0.43      6.58          7.82      1.31      8.54          8.75      1.64      4.78          8.17

Closeness of Estimated Trip Table to the Target
  RMSE        872.57    764.46        795.47    118.83    127.37        101.92    0.0       6.86          26.72
  MAE         555.91    662.91        601.27    77.09     82.45         67.00     0.0       5.00          17.82
  φ           11606.6   9028.81       7573.97   889.93    969.10        769.83    0.0       55.28         201.94

The sample network was also chosen as a test network to evaluate the performance of an enhanced LP model and the Maximum Entropy model for a research project sponsored by the Virginia Transportation Research Council (VTRC) (Sivanandan, 1996), for which the author was a graduate research assistant. By choosing different target tables and different percentages of available volumes, a total of 78 test cases were constructed on this network and solved using both models.
The structure of these test cases is similar to that of the cases for the Pulaski network discussed in the next section, and the reader is referred to the original results (Sivanandan et al., 1996) for details on this study.

6.3 Tests on Real Networks

6.3.1 Networks and Models

The linear programming based models have also been tested on several real-world networks. For example, the original version of the LP model was tested on a portion of a real transportation network in Northern Virginia adjacent to Washington DC (Sherali et al., 1994a).

The original version of the LP model required that counts be available on all the links in the network, a condition that is hard to satisfy for realistic, reasonably sized networks. During the course of this research, the LP model was enhanced to remove this restriction, so that it can estimate O-D tables using a partial set of volume data, as discussed in Section 3.5. This enhanced model was tested and evaluated in a research project supported by the Virginia Transportation Research Council (Sivanandan et al., 1996).

To evaluate the performance of the LP model, a competing program, The Highway Emulator (THE), was used for comparison purposes. The Highway Emulator is a program based on the Maximum Entropy model, and has been applied to several practical cases. Both the LP and THE models were tested on several real networks. These test networks include the Purdue Network, a real network modeled from the village network of Purdue University, West Lafayette, Indiana, and the Pulaski Network, a network abstracted from the street network of Pulaski town, southwest Virginia. A total of 148 cases were tested for both models in this project.
6.3.2 Pulaski Network

Among all the test cases, the study of the Pulaski network was particularly interesting and important from a validation viewpoint, because a surveyed trip table was available through the Virginia Department of Transportation (VDOT). Located in the central area of Pulaski, southwestern Virginia, Pulaski town had a population of around 10,000 in 1990. The network, as defined by VDOT, consists of 21 internal zones and 11 external stations. The internal zones were divided according to the density of population and the activity centers in and around the area. The original street map and the zonal division were provided by VDOT (Figure 6.2). The network was reduced by the Center for Transportation Research of Virginia Tech to eliminate redundancies and other information not necessary for test purposes, and is represented symbolically in Figure 6.3. It consists of 32 zones, 57 intersection nodes, and 230 links.

In order to validate the O-D models with real data, VDOT conducted an O-D survey, and established a daily trip table and a peak hour trip table. For detailed information about the data collection and the trip table survey, the reader is referred to the related reports in Sivanandan et al. (1996) and the Center for Survey Research (1994).

6.3.3 Different Test Cases

It must be noted that volume data was not available for 55 of the 230 links. These links were mostly centroid connectors. Since these connectors are an abstraction of several minor streets in the region, adequate measurements could not be performed. This introduces another realistic element in the evaluation of the models.

A total of 30 test cases were performed on the Pulaski network for both models, 15 daily cases and 15 peak hour cases, using different combinations of available information in the form of link volumes and prior trip tables. Three kinds of target tables were used: “structural table”, “no-prior information” table, and “small error” trip table.
A structural target table is one for which the least amount of information is provided in the target. All that is input to the model in a structural table are 0/1 cell values, with 1 signifying that the O-D interchange represented by that cell is feasible, and 0 indicating an infeasible interchange. The “no-prior information” target table is one that has uniform cell values, representing an average value of a prior trip table. A “small error” target table represents a situation where an old but not too outdated table for the region is available as a target. For the Pulaski network, the small error target table was obtained by adding some random errors to the surveyed table of VDOT.

Figure 6.2. Street map and zones of Pulaski Town.

Figure 6.3. Pulaski Network.

The output of each model on each test case was judged on its ability to match the “correct table” as closely as possible and, at the same time, on its capability to replicate the observed volumes. For the Pulaski network, the surveyed table provided by VDOT served as the “correct trip table” for evaluation purposes. Closeness was quantified via %RMSE and %MAE for volume replication, and via %RMSE, %MAE and φ for trip table matching.

6.3.4 Sensitivity Analysis

The statistical values obtained for the different test cases for the Pulaski network are summarized in Table 6.2 for the peak-hour case, and in Table 6.3 for the 24-hour case. For the Pulaski network, 50 of the 230 links had no volume information. In other words, the maximum percentage of links with available volumes is about 75%. By eliminating the volume information on some unimportant links (those with lower traffic counts), two other cases were also constructed, having 60% and 50% of links with known traffic volumes. Furthermore, three different categories of prior/target/seed tables were used in this test.
Similar to the case of the Gur network discussed in the previous section, the structural table has 1/0 values for its cells to indicate merely whether that trip interchange is feasible or not. The no-prior-information trip table has a uniform value for all feasible exchanges. This value was equal to 33 for the 24-hour case, which represents the average surveyed trip exchange based on the total number of trips factored by a value of 0.75. The factor of 0.75 was arbitrarily chosen, based on the assumption that the total number of trips for a past period will be in the range of, say, 70%-90% of the current total trips. Thus, this value was used in order to emulate past conditions. Similarly, a value of 3 was used for the cells of the no-prior-information target table for the peak hour case. This value was obtained using a factor of 0.8.

Table 6.2. Test Results Comparison using the Pulaski Network. Case: Peak Hour

Target Table        Structural        No-Prior Inform   Small Error       Small Error       Small Error
                    0-1               0-3               60% Available     80% Available     100% Available
Measure      % Vol  THE      LP       THE      LP       THE      LP       THE      LP       THE      LP
%RMSE(TT)    50%    566.24   565.91   555.38   475.63   525.7    464.2    525.7    604.8    525.7    395.43
             60%    619.1    606.05   622.75   569.16   594.85   585.4    594.85   499.51   594.85   416.51
             75%    640.47   615.28   641.7    484.88   619.04   452.52   619.04   515.84   619.04   438.57
%MAE(TT)     50%    213.94   172.92   217.82   162.58   174.84   140.06   174.84   124.83   174.84   117.57
             60%    219      196.98   222.78   164.1    185.2    155.53   185.2    136.54   185.2    125.56
             75%    230.1    179.41   233.81   166.91   197.34   144.58   197.36   149.66   197.34   112.16
PHI(TT)      50%    7705     9668     7251     7333     5219     4310     5219     4125     5219     4183
             60%    8748     10548    8516     8126     6469     4850     6469     4398     6469     4574
             75%    8975     10732    8786     7746     6980     4942     6980     5291     6980     4893
%RMSE(VOL)   50%    16.63    20.84    15.25    22.73    18.07    18.75    18.07    19.55    18.07    21.62
             60%    18.84    12.78    18.38    16.69    26.95    16.57    26.95    12.65    26.95    14.28
             75%    24.49    17.64    24.26    20.31    24.76    19.95    24.76    16.34    24.76    16.68
%MAE(VOL)    50%    8.97     9.24     8.11     11.73    10.09    8.24     10.09    8.08     10.09    8.6
             60%    10.56    6.42     10.08    9.23     15.13    6.04     15.13    5.58     15.13    6.74
             75%    14.73    10.97    14.42    11.6     14.12    8.49     14.12    8.06     14.12    8.63

Table 6.3. Comparison on Pulaski Network. Case: 24-Hour

Target Table        Structural        No-Prior Inform   Small Error       Small Error       Small Error
                    0-1               0-33              60% Available     80% Available     100% Available
Measure      % Vol  THE      LP       THE      LP       THE      LP       THE      LP       THE      LP
%RMSE(TT)    50%    451.06   637.13   443.45   456.25   494.56   450.59   525.7    604.8    525.7    395.43
             60%    389.81   562.67   411.77   543.74   408.38   439.19   594.85   499.51   594.85   416.51
             75%    455.04   496.8    456.95   409.48   451.86   401.92   619.04   515.84   619.04   438.87
%MAE(TT)     50%    188.78   198.7    185.4    148.13   158.26   125.85   159.9    97.02    158.9    85.16
             60%    164      198.09   174.83   157.58   148.15   119      146.63   96.34    146.63   92.59
             75%    178.13   183.48   183.14   141.76   157.54   120.72   157.14   92.74    157.14   83.45
PHI(TT)      50%    115573   191136   91855    77633    70353    50121    70944    31116    70944    27964
             60%    112558   187967   87795    84877    65407    41710    62296    28758    62296    26520
             75%    117450   197857   94730    81051    76042    49008    75659    35795    75659    36063
%RMSE(VOL)   50%    13.74    15.34    12.02    15.94    13.33    18       13.16    18.16    13.16    17.24
             60%    15.93    4.04     15.13    7.77     18.11    8.02     18.41    10.99    19.41    14.32
             75%    16.84    11.77    15.79    10.95    15.98    13.04    15.91    12.87    15.91    12.5
%MAE(VOL)    50%    8.04     4.46     6.61     5.26     13.33    5.9      7.7      5.5      7.5      5.43
             60%    10.05    1.78     9.09     2.95     16.11    5.13     12.15    3.58     12.15    4.94
             75%    10.82    4.75     10.19    5.61     15.98    6.01     10.3     4.92     10.3     4

The “small error” trip tables were relatively close to the surveyed trip table and were created as follows. Let $C_{ij}$ be the $ij$-th cell value of the surveyed table, and let $\Psi$ be the mean ratio of the target table cell value to the surveyed table value, say $\Psi = 0.8$. Let $\beta_{ij}$ be a randomly distributed cell error bounded in an interval, say (-0.2, 0.2). Then the $ij$-th cell value $P_{ij}$ of the target table is defined as

\[
P_{ij} = C_{ij} \left( \Psi + \beta_{ij} \right) .
\]
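The perturbation just defined can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not specify the distribution of the cell error, so a uniform draw over the interval is assumed here, and the function name small_error_target, its parameters, and the 2x2 sample table are hypothetical.

```python
import random


def small_error_target(surveyed, psi=0.8, bound=0.2, seed=0):
    """Return a table P with P[i][j] = C[i][j] * (psi + beta_ij).

    beta_ij is drawn uniformly between -bound and +bound. The uniform
    distribution is an assumption of this sketch, not a detail from the text.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        [cell * (psi + rng.uniform(-bound, bound)) for cell in row]
        for row in surveyed
    ]


# Hypothetical 2x2 surveyed table (not data from the study).
surveyed = [[0.0, 40.0], [25.0, 0.0]]
target = small_error_target(surveyed)
```

With Ψ = 0.8 and a bound of 0.2, every nonzero target cell falls between 60% and 100% of the corresponding surveyed value, and empty cells stay empty.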
By randomly selecting certain entries in the obtained table, three different target tables, which represent different availabilities of previous information (60%, 80%, and 100% of the entries available, as labeled in Tables 6.2 and 6.3), were created for this category.

The trends of these statistics as a function of the different target tables and the different percentages of assumed available volumes can be compared visually. These synthesized test results are shown graphically in Figures 6.4 through 6.8 for the peak-hour case, and in Figures 6.9 through 6.13 for the 24-hour case.

The case study of the Pulaski network is believed to be credible, since the data collection here was specially designed for the purpose of evaluating O-D models, and a trip table was established through a conventional O-D survey for comparing the model results. For this network, both daily (24-hour) and peak hour tables were studied.

As expected, for both models, the trip table error statistics have high values for the structural target table case. For this case, THE came out superior to LP in terms of closeness of modeled tables to the surveyed table. When different versions of the small error table are provided as the target, both models are seen to produce tables that are closer to the surveyed table.

[Figures 6.4 through 6.13: each chart plots one measure for the THE and LP models against the percentage of available volumes (50%, 60%, 75%) and the target trip table (Structural 0-1; No-Prior Inform 0-3 for the peak hour case or 0-33 for the 24-hour case; Small Error with 60%, 80%, and 100% availability).]

Figure 6.4. Trip Table Comparisons (Modeled versus VDOT Surveyed). Network: Pulaski. Case: Peak Hour. Measure: %RMSE(TT)
Figure 6.5. Trip Table Comparisons (Modeled versus VDOT Surveyed). Network: Pulaski. Case: Peak Hour. Measure: %MAE(TT)
Figure 6.6. Trip Table Comparisons (Modeled versus VDOT Surveyed). Network: Pulaski. Case: Peak Hour. Measure: PHI(TT)
Figure 6.7. Volume Comparison (Modeled versus Observed). Network: Pulaski. Case: Peak Hour. Measure: %RMSE(VOL)
Figure 6.8. Volume Comparison (Modeled versus Observed). Network: Pulaski. Case: Peak Hour. Measure: %MAE(VOL)
Figure 6.9. Trip Table Comparisons (Modeled versus VDOT Surveyed). Network: Pulaski. Case: 24 Hour. Measure: %RMSE(TT)
Figure 6.10. Trip Table Comparisons (Modeled versus VDOT Surveyed). Network: Pulaski. Case: 24 Hour. Measure: %MAE(TT)
Figure 6.11. Trip Table Comparisons (Modeled versus VDOT Surveyed). Network: Pulaski. Case: 24 Hour. Measure: PHI(TT)
Figure 6.12. Volume Comparison (Modeled versus Observed). Network: Pulaski. Case: 24 Hour. Measure: %RMSE(VOL)
Figure 6.13. Volume Comparison (Modeled versus Observed). Network: Pulaski. Case: 24 Hour. Measure: %MAE(VOL)

Both the LP and THE models show mixed trends in performance with an increase in available link flow information. This may be attributed to the fact that the link volumes may not be consistent with the surveyed O-D flows, or to possible inconsistencies/errors in the observed link volume data. In general, the linear programming model has lower values for the different statistics, except for the structural target case.

Also, the variation of the link volume replication error for the LP and THE models, as measured by %RMSE(VOL) and %MAE(VOL), is depicted in Figures 6.7 and 6.8 for the peak hour case, and in Figures 6.12 and 6.13 for the daily case. The LP model obtains lower values for these statistics in most test cases; in Tables 6.2 and 6.3, the exceptions occur at the 50% volume availability level.

6.3.5 Comparison of Results

Most of the test results presented above favor the LP model, except for the structural target cases. This conclusion rests on the assumption that the VDOT surveyed table represents the “correct” or “true” trip table for the region. On the other hand, it should be pointed out that the surveyed table itself was established via a sampling process, and inconsistencies or errors in these tables and in the link volume data cannot be completely ruled out. This was further confirmed by indications from VDOT, and through some preliminary checks conducted by the study team (Sivanandan 1996).
6.4 Enhancing the Performance of the Models

As the results indicate, the quality of the modeled trip table for both the LP and THE models is highly dependent upon the target table. For this reason, some socioeconomic variables and other classical methods were employed to obtain a reasonable target (seed) table. Both models were tested again on the Pulaski network using this new target (Sivanandan 1998). The test results are summarized in a project report of the Virginia Transportation Research Council. While the performance of both models was enhanced by a more representative target table, their relative performance remained about the same as before.