Route Duration & Variation Estimator
by Matthew A. Halsmer
Fall 2003 - ECE 539

Executive Summary

The development of a driving time and variation estimator based on the number of stoplights, the number of stop signs, and the total miles for a given route was approached with several neural network types. The data consisted of 12 data vectors. Each vector summarized 4 logged drives of one route so that an average and a standard deviation could be established. Of these 12 vectors, 7 were used to train the models, 2 were used to compare the models against each other, and the final 3 were used as test vectors against the alternate estimation methods. The alternate estimation methods were 2 experienced drivers and the Yahoo Maps drive time estimator.

The neural network models evaluated were the radial basis network, the multi-layer perceptron, and the single perceptron. Of these, the single perceptron worked best for both the average and the standard deviation of drive time, with a slight caveat for the standard deviation. It was also the model type that agreed most closely with the intuitive nature of the physical problem. For the standard deviation, the squared variation terms were therefore linearly combined in the single perceptron to produce the route's squared variation.

From a practical perspective, this model suggests that for in-town driving each mile, with no stop signs and no stoplights, has a time cost of 1 minute and 15 seconds. The time cost for each stoplight and each stop sign is approximately 26 and 20 seconds respectively. Additionally, there is a bias term for every trip of approximately 16 seconds. The results for the variation were not as easily explained, because some of the resulting terms were negative. For instance, the weight representing the squared standard deviation of each mile came out negative.
If improvement is desired, it is suggested that more data vectors be collected, and that more data points be logged for each data vector, in order to reduce the noise resulting from the limited data used in this study.

Introduction

The time it takes to get from point A to point B can be difficult to estimate, especially when considering traffic lights, stop signs, and other traffic. Current solutions range from the simple guess based on the neural network of the brain to the automated time estimates generated by Yahoo Maps and MapQuest; however, even these can be unreliable. Many of these estimators are at their best under highway driving conditions, where average speed can be easily estimated and so can drive time based on overall highway miles. It is perceived that they do not estimate in-town driving well because of the unpredictability of stop signs and stoplights.

The proposal is to apply a neural network to the problem of in-town driving to estimate not only the nominal driving time but also the standard deviation of the required drive time. The proposed inputs are:

x1 = # of stoplights
x2 = # of stop signs
x3 = # of miles

The developed neural network will be evaluated against the two most popular estimators: "the experienced driver" and "Yahoo Maps".

Work Performed: Data Collection

The first step in this assignment was to collect some data. This involved laying out and driving twelve different routes around town. The routes were chosen to mix up the inputs so that a wide variety of distances, numbers of stoplights, and numbers of stop signs were covered. This helps ensure that the neural network can separate the effects of the individual elements. Each route was then driven four times while logging the time it took to complete the route. Table 1 shows a summary of the data collected. For a list of all of the raw data, see Appendix A.
Table 1: Summary of Collected Data from Driving Log

          Inputs                               Outputs
Route #   # of lights   # of stops   Miles     Avg [min]   Std [min]
1         13            1            4.5       13.25       0.54
2         5             13           10.25     20.10       0.41
3         11            6            4.9       11.35       0.60
4         3             0            0.8       2.42        0.53
5         3             1            3.2       6.39        0.26
6         5             3            6.1       11.40       0.37
7         9             7            4.8       12.42       0.91
8         3             7            5.6       11.62       0.30
9         3             2            1.2       4.15        0.38
10        12            3            4.55      11.53       0.38
11        2             0            2.15      3.78        0.46
12        10            2            5.6       11.72       0.69

The data will be split so that the first 9 data sets are used to develop the models, while the last 3 data sets are used as the test set for comparing accuracy against the alternate methods.

To collect the comparison data set from the human neural network, two subjects were recruited. The subjects were informed of the route inputs and asked to estimate the route times for four different drives, to give a feel for how much variation they expected. The results are found in Table 2 below.

Table 2: Human Neural Network Estimates (in minutes)

Route Number          Friend 1   Friend 2
10             avg    17.00      15.98
               std    0.83       1.75
11             avg    5.00       5.52
               std    0.50       0.64
12             avg    19.00      18.71
               std    1.50       1.13

The final piece of data needed was the time estimate for these three routes from Yahoo Driving Directions. Two things should be said about these estimates. First, Yahoo does not give time variation estimates. Second, the resolution of its estimates is 1 minute, so in the worst case of rounding an estimate could be off by just under 0.5 minutes and still be correct. This is not too bad if the point A to point B route driven is the same one Yahoo selects. However, route #12 had to include an intermediate point in order to force Yahoo to follow the same path. With each leg rounded to the nearest minute, the total could be off by as much as 59 seconds and still be what Yahoo expected. The predictions are found in Table 3 below.
Table 3: Yahoo Estimates (in minutes)

Route Number   Yahoo
10             10
11             4
12             11

Work Performed: Model Development

The models in this section are developed from the first 7 data sets and then compared among themselves using a test set made up of data sets 8 and 9. This ensures that the final three data sets are seen for the first time when the chosen models are compared against the other methods. For a quick comparison between models, the metric used is the sum of squared errors averaged over the N test routes:

$$E_{test} = \frac{1}{N}\sum_{i=1}^{N}\left(d_i - \hat{y}_i\right)^2 \quad [\text{min}^2]$$

where d_i is the logged value and ŷ_i is the model estimate for route i.

Part 1: Estimating Average Driving Time

Model 1: RBN Type I

The first thought was to build a radial basis network, as it is often termed the "universal approximator". The first attempt was a type I radial basis network with a Gaussian kernel function. Table 3 shows the results of this network.

Table 3: RBN Type I Results

                Route Number   d       ŷ
Training Sets   1              13.25   13.25
                2              20.1    20.1
                3              11.35   11.35
                4              2.42    2.42
                5              6.39    6.39
                6              11.4    11.4
                7              12.42   12.42
Test Sets       8              11.62   0.0002
                9              4.15    0.6686

With the parameter alpha set to zero, the training set shows no errors, as expected. Unfortunately, the same cannot be said for the test set. For this case:

Etest = 73.6 min²

Obviously some improvement is required.

Model 2: RBN Type II

A type II radial basis network with a Gaussian kernel function was tried next. Table 4 shows how Etest varied with the number of radial basis functions used.

Table 4: RBN Type II Results

# of cluster centers   Etest
2                      69.69
3                      69.69
4                      69.69
5                      72.81
6                      73.14
7                      73.6

Again, some improvement is required.

Model 3: MLP 3-3-1 Model

The next thought was to build a multi-layer perceptron to estimate the driving time. A single hidden layer with 3 hidden neurons was used. The hidden layer neurons used a hyperbolic tangent activation function, while the output neuron used a linear activation function.
Using standard back propagation with a learning rate parameter of 0.001, the momentum parameter set to zero, and an epoch size of 3, the system converged to a reasonable solution.

Table 5: 3-3-1 MLP Training Results

                d       ŷ
Training Sets   13.25   12.22
                20.10   13.66
                11.35   12.99
                2.42    2.80
                6.39    6.24
                11.40   11.46
                12.42   12.93
Test Set        11.62   12.52
                4.15    6.45

This shows substantial improvement over the last two models! For this model Etest = 3.06 min².

A few other MLPs were also tried, with varying degrees of success, as shown in Table 6 below.

Table 6: MLP Results

Node Configuration   Etest
3-3-1                3.06
3-3-3-1              9.89
3-3-3-3-1            13.72
3-4-1                1.26
3-5-1                4.06
3-8-1                11.75

The best result so far is the MLP with a configuration of 3-4-1 (1 hidden layer with 4 nodes). The weights for this case are shown below in Table 7.

Table 7: Weights of 3-4-1 MLP

w{1-2}    0.28   -0.10   -0.18   -0.02
         -0.47    0.10    0.20    0.01
          0.26   -0.04   -0.09   -0.17
          0.43   -0.06   -0.04   -0.28

w{2-3}    3.00   -2.83    2.83   -2.93   -2.91

Model 4: Single Linear Perceptron Model

The single linear perceptron probably would have been the first choice had the problem been approached properly from a physical perspective. When thinking about the problem, one could say that the terms should generally be additive. For instance, every mile traveled at 45 mph takes 1 minute and 20 seconds. Then there are probably 10 to 30 seconds on average lost at each stoplight and stop sign. With this rationale the hopes are fairly high.

The same tuning parameters were used here as in the MLP model training listed above. The results are shown in Table 8 below. For this case Etest = 0.29 min². Substantial improvement has been made! This is also the model that makes the most intuitive sense.
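As an illustration of the training described for the single linear perceptron, the following is a minimal Python sketch, not the code used in this project. It fits the four weights (bias, stoplight, stop sign, mile) to routes 1 through 7 of Table 1 with a plain delta-rule update, using the stated learning rate of 0.001 and no momentum. The batch update over all seven routes, the zero initial weights, the number of epochs, and all function names are assumptions made for illustration, so the resulting weights need not match the reported ones.

```python
# Routes 1-7 from Table 1: (stoplights, stop signs, miles, average minutes).
ROUTES = [
    (13, 1, 4.5, 13.25),
    (5, 13, 10.25, 20.10),
    (11, 6, 4.9, 11.35),
    (3, 0, 0.8, 2.42),
    (3, 1, 3.2, 6.39),
    (5, 3, 6.1, 11.40),
    (9, 7, 4.8, 12.42),
]


def predict(weights, lights, stops, miles):
    """Linear perceptron output: bias plus a weighted sum of the inputs."""
    bias, w_light, w_stop, w_mile = weights
    return bias + w_light * lights + w_stop * stops + w_mile * miles


def mean_squared_error(weights, routes=ROUTES):
    """Average squared error in min^2 over the given routes."""
    errors = [d - predict(weights, l, s, m) for l, s, m, d in routes]
    return sum(e * e for e in errors) / len(errors)


def train(learning_rate=0.001, epochs=20000, routes=ROUTES):
    """Batch delta rule from zero initial weights (assumed settings)."""
    weights = [0.0, 0.0, 0.0, 0.0]
    n = len(routes)
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0, 0.0]
        for lights, stops, miles, target in routes:
            error = target - predict(weights, lights, stops, miles)
            # The constant input 1.0 drives the bias weight.
            for j, x_j in enumerate((1.0, lights, stops, miles)):
                gradient[j] += error * x_j / n
        weights = [w + learning_rate * g for w, g in zip(weights, gradient)]
    return weights


trained = train()
print("weights [min]:", [round(w, 3) for w in trained])
print("training error [min^2]:", round(mean_squared_error(trained), 3))
```

With these inputs a learning rate of 0.001 is small enough for the batch update to reduce the training error at every epoch, at the price of slow convergence of the bias term.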
Table 8: Single Linear Perceptron Training Results

               d       ŷ
Training Set   13.25   11.86
               20.10   19.73
               11.35   13.21
               2.42    2.56
               6.39    5.92
               11.40   11.10
               12.42   12.56
Test Set       11.62   10.98
               4.15    3.74

Table 9: Single Linear Perceptron Weights

             [min]   [sec]
wbias        0.26    15.7
wstoplight   0.43    25.9
wstopsign    0.34    20.4
wmile        1.26    75.4

Table 9 shows the resulting weights for this case. It is interesting to note that the parameter wmile corresponds approximately to driving at 45 mph. Additionally, 20 seconds can be attributed to each stop sign and 26 seconds to each stoplight. These times are not just the time spent stopped at the intersection; they also include the time lost decelerating into and accelerating out of it. The interesting term is the bias. From a physical perspective it can be argued that this is a result of the first acceleration and the last deceleration. The value comes out to only 16 seconds, which seems about right.

Part 2: Estimating Standard Deviation of Driving Time

Model 5: RBN Type I for Variation

A radial basis network was again tried first on this output parameter. Using a Gaussian kernel function, a type I RBN was developed. Table 10 shows the results of this network.

Table 10: RBN Type I Results for Variation

                Route Number   d      ŷ
Training Sets   1              0.54   0.54
                2              0.41   0.41
                3              0.6    0.6
                4              0.53   0.53
                5              0.26   0.26
                6              0.37   0.37
                7              0.91   0.91
Test Sets       8              0.3    0
                9              0.38   0.0724

The test set for this case has a test error of:

Etest = 0.0923 min²

Because this is the first model in this category, it is hard to judge from Etest alone whether this is good or not, but the predicted values of ŷ make it obvious that it is not very good.

Model 6: RBN Type II for Variation

A type II radial basis network with a Gaussian kernel function was tried next.
Table 11 shows how Etest varied with the number of radial basis functions used.

Table 11: RBN Type II Results for Variation

# of cluster centers   Etest
2                      0.0448
3                      0.0796
4                      0.0796
5                      0.0766
6                      0.0885
7                      0.0923

It appears that 2 cluster centers is somewhat better. Table 12 shows the estimates for this case.

Table 12: RBN Type II Results for Variation

                Route Number   d      ŷ
Training Sets   1              0.54   0.18
                2              0.41   0.41
                3              0.60   0.64
                4              0.53   0.30
                5              0.26   0.70
                6              0.37   0.39
                7              0.91   0.46
Test Sets       8              0.30   0.08
                9              0.38   0.34

This time one of the test sets is reasonably approximated, but the other is not. There is still room for improvement.

Model 7: MLP Models for Variation

The multi-layer perceptron worked better than the radial basis network for estimating the average drive time, so it was tried for the problem of finding the variation. Table 13 shows the structures that were tried, with varying degrees of success.

Table 13: MLP Results

Node Configuration   Etrain   Etest
3-3-1                0.0428   0.0057
3-4-1                0.0960   0.0146
3-6-1                0.634    0.0103
3-3-3-1              0.0444   0.0107
3-3-3-3-1            0.0726   0.0018

The first configuration listed, 3-3-1, does the best on the training data. Its results are shown in Table 14 below. The 3-3-3-3-1 model, on the other hand, does the best job of estimating the test set. Its results are shown in Table 15 below. This is a little disturbing, as the 3-3-3-3-1 network is not actually estimating the data from the inputs. It simply outputs 0.33 for every route, and that value happens to lie very close to the two test samples (0.30 and 0.38). This is not a good thing, and it reinforces the need in this case to look not only at Etest but also at Etrain.
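The point about reading Etest together with Etrain can be checked with a short Python sketch. It scores a hypothetical predictor that ignores its inputs and always outputs 0.33, using the standard deviations of routes 1 to 7 (training) and 8 to 9 (test) from Table 1 and the averaged squared error defined under Model Development. The function and variable names are illustrative and are not taken from the original work.

```python
def mean_squared_error(targets, predictions):
    """Averaged squared error, in min^2, between targets and predictions."""
    squared = [(d - y) ** 2 for d, y in zip(targets, predictions)]
    return sum(squared) / len(squared)


# Standard deviations [min] from Table 1.
train_std = [0.54, 0.41, 0.60, 0.53, 0.26, 0.37, 0.91]  # routes 1-7
test_std = [0.30, 0.38]                                  # routes 8-9

# Hypothetical predictor that returns the same value for every route.
CONSTANT = 0.33

e_train = mean_squared_error(train_std, [CONSTANT] * len(train_std))
e_test = mean_squared_error(test_std, [CONSTANT] * len(test_std))

print(f"Etrain = {e_train:.4f} min^2")  # about 0.0723
print(f"Etest  = {e_test:.4f} min^2")   # about 0.0017
```

The constant predictor scores about 0.0017 min² on the two test routes but about 0.072 min² on the training routes, close to the 0.0018 and 0.0726 listed for the 3-3-3-3-1 network in Table 13. A low Etest on two samples is therefore weak evidence on its own.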
Table 14: MLP 3-3-1 Results for Variation

                Route Number   d      ŷ
Training Sets   1              0.54   0.40
                2              0.41   0.45
                3              0.60   0.48
                4              0.53   0.26
                5              0.26   0.28
                6              0.37   0.34
                7              0.91   0.48
Tune Set        8              0.30   0.39
                9              0.38   0.32

Table 15: MLP 3-3-3-3-1 Results for Variation

                Route Number   d      ŷ
Training Sets   1              0.54   0.33
                2              0.41   0.33
                3              0.60   0.33
                4              0.53   0.33
                5              0.26   0.33
                6              0.37   0.33
                7              0.91   0.33
Tune Set        8              0.30   0.33
                9              0.38   0.33

Overall, the 3-3-1 MLP showed significant improvement over the radial basis network; however, it still does not do a great job.

Model 8: Single Linear Perceptron Model

The single linear perceptron worked very well for the average drive time, so it is revisited here. It also makes some physical sense, in that each stoplight, stop sign, and mile could individually add a bit of variation to the total for the drive. Table 16 shows the results of this model. For this model:

Etrain = 0.0369 min²
Etest = 0.0115 min²

This was the best fit of the training data among the perceptron models so far; however, it did not do a very good job with the testing set.

Table 16: Single Perceptron Results for Variation

                Route Number   d      ŷ
Training Sets   1              0.54   0.75
                2              0.41   0.46
                3              0.60   0.69
                4              0.53   0.21
                5              0.26   0.23
                6              0.37   0.36
                7              0.91   0.60
Tune Set        8              0.30   0.29
                9              0.38   0.23

Model 9: Single Linear Perceptron Model with Squared Output

When thinking about the physical nature of the model and how each element adds to the overall trip's variation, a subtle flaw in the approach was noticed. In statistics the variation is not directly cumulative. Instead, when the terms are in series, the overall variation is the square root of the sum of the individual squared variation terms. From the author's mechanical perspective, this is like a stack-up of mechanical parts. If five parts are stacked on top of each other, the overall variation in height is:

$$\sigma_{h,total} = \sqrt{\sum_{i=1}^{5} \sigma_{h_i}^{2}}$$

Likewise for our case, the variation could be expected to be:

$$\sigma_{total} = \sqrt{N_{light}\,\sigma_{light}^{2} + N_{stop}\,\sigma_{stop}^{2} + N_{mile}\,\sigma_{mile}^{2}}$$

Now that the hopes have been set high...
The model can now be investigated. The simplest way to do this was to supply the variation to the network as the square of the standard deviation. It is converted back out of that format only when the sum of squared error is calculated and when the output vector is returned to the user. This makes the equation appear as follows:

$$\hat{y}^{2} = \sigma_{total}^{2} = x_1 w_1 + x_2 w_2 + x_3 w_3 + w_b$$

$$\sigma_{total}^{2} = x_1\,\sigma_{light}^{2} + x_2\,\sigma_{stop}^{2} + x_3\,\sigma_{mile}^{2} + \sigma_{bias}^{2}$$

Table 17 shows the results of this training.

Table 17: Single Perceptron Results for Variation with Squared Input

                Route Number   d      ŷ
Training Sets   1              0.54   0.72
                2              0.41   0.53
                3              0.60   0.72
                4              0.53   0.34
                5              0.26   0.30
                6              0.37   0.41
                7              0.91   0.67
Tune Set        8              0.30   0.40
                9              0.38   0.38

As hoped, this is better than any model has done so far. For this case:

Etrain = 0.0229 min²
Etest = 0.0049 min²

Table 18: Single Linear Perceptron Weights

             [min²]    [sec²]
wbias        -0.0098   -35.3
wstoplight   0.0458    164.9
wstopsign    0.0189    68.0
wmile        -0.018    -64.8

Physically it is difficult to explain how the variation from a mile could be negative; this is presumably a bit of noise due to the limited data. For the case of variation, this is the preferred model and the one used in the final evaluation against the experienced driver.

Results

Now that the models have been developed and tested with sample data sets, the final comparison can be done.

Part 1: Average Driving Time

Table 19 below shows the final results of the single linear perceptron in comparison with the other methods. The perceptron does the best, with Yahoo a close runner-up.

Table 19: Final Results

Route Number      d [min]   Single Perceptron   Friend 1   Friend 2   Yahoo
10                11.53     12.18               17.00      15.98      10
11                3.78      3.83                5.00       5.52       4
12                11.72     12.29               19.00      18.71      11
Overall Etest =             0.249               28.136     23.890     0.969

Part 2: Standard Deviation of Driving Time

Table 20 below shows the final results of the trained neural network in comparison with the other methods.
It appears that the variation is quite difficult to estimate, not only for the neural network but also for the experienced driver. The network did do better, but not by quite the same factor as in the average drive time case.

Table 20: Final Results for Variation

Route Number      d [min]   Single Perceptron (squared)   Friend 1   Friend 2
10                0.38      0.72                          0.83       1.75
11                0.46      0.21                          0.50       0.64
12                0.69      0.62                          1.50       1.13
Overall Etest =             0.0608                        0.288      0.702

Discussion

Overall, this was a fairly successful project. At the outset there were a number of concerns about things that could affect the outputs. In reality, everything from weather conditions, time of day, direction of the turn at an intersection, and mood of the driver to, quite possibly, the length of time since the driver's last speeding ticket could play into the results. In this experiment all routes were driven by the same driver, who appears to drive a little faster than his friends. This was a constant that quite possibly added to the success of the experiment. Additionally, all routes were driven in similar traffic and weather conditions.

An important observation is that both models that worked best were models that could, to some extent, be explained and understood from a physical or mathematical perspective. The models that failed were the ones at which the data was blindly thrown without consideration for its physical explanation.

From a practical standpoint it is clear that stop signs and stoplights are significant sources of delay. For example, this data shows that each mile costs only 1 minute and 15 seconds. If one were fortunate enough to have a 5-mile drive from home to work with no stops, it could be completed in as little as 6 minutes and 15 seconds. However, if there are 10 stop signs on the route, add 20 seconds for each. That is 3 minutes and 20 seconds of slowing down, stopping, and accelerating for what could have been a 6-minute trip.
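The arithmetic above, and the single-perceptron columns of Tables 19 and 20, can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the weights are the rounded values printed in Tables 9 and 18, so results can differ from the tables in the last digit; clamping a negative predicted variance to zero is an added safeguard that is not part of the original model; and the function names are illustrative.

```python
import math

# Rounded weights copied from Table 9 [min] and Table 18 [min^2].
TIME_WEIGHTS = {"bias": 0.26, "light": 0.43, "stop": 0.34, "mile": 1.26}
VARIANCE_WEIGHTS = {"bias": -0.0098, "light": 0.0458,
                    "stop": 0.0189, "mile": -0.018}


def weighted_sum(weights, lights, stops, miles):
    """Output of a single linear perceptron with a bias term."""
    return (weights["bias"] + weights["light"] * lights
            + weights["stop"] * stops + weights["mile"] * miles)


def predict_minutes(lights, stops, miles):
    """Estimated average drive time in minutes."""
    return weighted_sum(TIME_WEIGHTS, lights, stops, miles)


def predict_std(lights, stops, miles):
    """Estimated standard deviation in minutes (Model 9, squared output)."""
    variance = weighted_sum(VARIANCE_WEIGHTS, lights, stops, miles)
    # Assumption: a negative variance is clamped to zero before the root.
    return math.sqrt(max(0.0, variance))


# The 5-mile commute from the discussion, without and with 10 stop signs.
clear_run = predict_minutes(0, 0, 5)        # about 6.56 min
with_stops = predict_minutes(0, 10, 5)      # about 9.96 min
print(f"extra time for 10 stop signs: {with_stops - clear_run:.2f} min")

# Test routes 10-12 from Table 1: (stoplights, stop signs, miles).
for route, inputs in {10: (12, 3, 4.55), 11: (2, 0, 2.15),
                      12: (10, 2, 5.6)}.items():
    print(route, round(predict_minutes(*inputs), 2),
          round(predict_std(*inputs), 2))
```

With the Table 9 weights and the bias included, the stop-free 5-mile drive comes to about 6.6 minutes and the 10 stop signs add about 3.4 minutes, in line with the rounded figures quoted in the text.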
One of the most surprising observations was that a stoplight is worth only about 13 seconds of standard deviation (the square root of its 0.0458 min² weight in Table 18).

Conclusion

This project gave a good deal of insight into the application of neural networks to real-world problems and the difficulties that can arise in doing so. It was also insightful to quantify effects that an individual might only have a gut feel for, such as the difference in time cost between a stoplight and a stop sign.