Uploaded by Michael Karras

Reinforcement Learning for Solving the Vehicle Routing Problem

This paper introduces a reinforcement learning method that tackles the vehicle routing problem
with policy gradient methods. The proposed approach lowers the computational cost by deploying
a recurrent architecture that, once trained, can be applied to new problem instances without
re-training, whereas classical methods must solve each instance from scratch. In addition, the
architecture scales to larger problem instances and can be applied to many classes of discrete
optimization problems. The take-home message is that reinforcement learning algorithms have the
potential to solve traditional discrete optimization problems and to be effective when deployed
in practice.
Related Work:
The related work section broadly describes sequence-to-sequence architectures and neural
combinatorial optimization. For the former, the authors highlight the use of encoders and
decoders, which they later reference because their own model employs a decoder RNN. For the
latter, the authors introduce policy networks, which are used to solve combinatorial
optimization problems, and argue that their approach is a simplified version of that
architecture. Overall, the work is properly situated with respect to previous publications and
builds on them with novel ideas.
Significance and Originality:
The ideas presented are novel with respect to prior works. In particular, the authors propose
removing the encoder RNN from the previously proposed pointer networks, both to eliminate the
dependence on input order and to improve efficiency. The authors also investigate stochastic
variants of the vehicle routing problem, which are not commonly addressed by classical methods.
Thus, the architecture is unique to the problem at hand and encourages more research in that
direction.
The authors attach appendices with detailed algorithmic explanations of REINFORCE and other
VRP baselines. The authors also describe the attention mechanism and apply it to the problem
at hand. There is no explicit theoretical justification that the algorithm performs optimally on
the VRP; the approach rests on the assumption that reinforcement learning algorithms can perform
well in a Markovian setting, which is a reasonable assumption to make for the given problem.
Hence, the ideas presented are sound.
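To make the attention step mentioned above more concrete, the following is a minimal sketch of a masked softmax over customer nodes. It is an illustration only, not the authors' exact formulation: the function name `masked_pointer_probs`, the feasibility rule (a customer is selectable when it has outstanding demand that fits in the remaining load), and all numeric values are assumptions made for this example.

```python
import math

def masked_pointer_probs(scores, demands, remaining_load):
    """Softmax over attention scores with infeasible customers masked out.

    scores         : raw attention logits, one per customer (hypothetical values)
    demands        : outstanding demand of each customer
    remaining_load : capacity still available on the vehicle
    """
    # Assumed rule: a customer is feasible if it still has demand and that demand fits.
    feasible = [0 < d <= remaining_load for d in demands]
    if not any(feasible):
        # Nothing can be served; a full model would send the vehicle back to the depot.
        return [0.0] * len(scores)
    # Subtract the largest feasible logit for numerical stability.
    top = max(s for s, ok in zip(scores, feasible) if ok)
    weights = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, feasible)]
    total = sum(weights)
    return [w / total for w in weights]

# Customer 0 demands more than the load, customer 1 has no demand left,
# so all probability mass goes to customer 2.
probs = masked_pointer_probs(scores=[2.0, 1.0, 0.5], demands=[4, 0, 3], remaining_load=3)
```

Masking before normalization means the policy can never sample an infeasible customer, which is one common way to keep a learned routing policy within capacity constraints.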
The authors evaluate their work empirically by running the algorithm on multiple benchmarks with
various numbers of customer nodes (10, 20, 50, 100) and vehicle capacities. The baselines
include the Clarke-Wright savings heuristic, the Sweep heuristic, and Google’s optimization
tools. For the smaller numbers of customer nodes (10, 20), the metric used is the distance from
the optimal solution, which is computed by mixed integer programming; this metric is termed
the “optimality gap”.
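As a small illustration of the optimality gap, the sketch below assumes the gap is expressed relative to the optimal tour length; the review does not say whether the paper reports it in absolute or relative terms. The function name and the tour lengths are hypothetical.

```python
def optimality_gap(tour_length, optimal_length):
    """Relative excess of a tour's length over the optimal length (assumed definition)."""
    if optimal_length <= 0:
        raise ValueError("optimal_length must be positive")
    return (tour_length - optimal_length) / optimal_length

# Illustrative numbers only: a tour of length 4.84 against an optimum of 4.40
# is about 10% longer than optimal.
gap = optimality_gap(tour_length=4.84, optimal_length=4.40)
```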
For experiments with a larger number of customer nodes, the methods are compared pairwise on a
selected set of samples, reporting the percentage of samples on which one method outperforms the
other.
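The pairwise comparison can be sketched as a simple win percentage over matched instances. This is a minimal illustration that assumes "outperforms" means a strictly lower route cost on the same instance; the function name and the cost values are made up for the example and are not results from the paper.

```python
def win_percentage(costs_a, costs_b):
    """Percentage of instances on which method A has strictly lower cost than method B."""
    if len(costs_a) != len(costs_b) or not costs_a:
        raise ValueError("need two non-empty cost lists of equal length")
    wins = sum(a < b for a, b in zip(costs_a, costs_b))
    return 100.0 * wins / len(costs_a)

# Hypothetical route costs for two methods on the same four instances.
costs_method_a = [10.2, 11.5, 9.8, 12.0]
costs_method_b = [10.9, 11.1, 10.4, 12.0]
pct = win_percentage(costs_method_a, costs_method_b)  # A wins on instances 0 and 2
```

Because ties count for neither side under this assumption, the two directions of the comparison need not sum to 100%.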
Moreover, the authors compare the outlined methods on the time taken to produce a solution,
normalized by the number of customers. This comparison is relevant because many systems that
tackle the vehicle routing problem need to do so quickly, in real time. The analysis shows that
this ratio stays constant for the proposed method as the number of customers grows, whereas it
grows linearly for the other methods, suggesting that the proposed method is scalable. The
authors do not investigate problems with more than 110 nodes. A suggestion to the authors would
be to compare the number of parameters in each model deployed. The authors report a total
training time of 13.5 hours. The authors also present, in the appendix, a stochastic version of
the problem that is closer to the real world.
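The per-customer time comparison discussed above can be illustrated with a short sketch. The timings below are invented to show the two shapes being contrasted and are not measurements from the paper: a method whose total time grows linearly has a flat time-per-customer ratio, while a method whose total time grows quadratically has a ratio that grows linearly.

```python
def time_per_customer(timings):
    """Map each problem size n to the solution time divided by n."""
    return {n: t / n for n, t in timings.items()}

# Hypothetical total solution times in seconds, keyed by number of customers.
linear_total = {10: 0.1, 20: 0.2, 50: 0.5, 100: 1.0}      # total time proportional to n
quadratic_total = {10: 0.1, 20: 0.4, 50: 2.5, 100: 10.0}  # total time proportional to n^2

flat_ratio = time_per_customer(linear_total)        # about 0.01 for every size
growing_ratio = time_per_customer(quadratic_total)  # rises from about 0.01 to about 0.1
```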
Although the concepts are carefully explained, it would be beneficial if the authors introduced
the Vehicle Routing Problem in earlier sections as opposed to doing so in Section 4. Moreover,
adding more justification for why RNNs may be a good candidate solution may prove helpful to
readers. The writing is clear and concise.
[1] Nazari, M., Oroojlooy, A., Snyder, L., & Takáč, M. (2018). Reinforcement learning for solving
the vehicle routing problem. Advances in Neural Information Processing Systems, 31.