Pickup and delivery problems with late penalties can be used to model a wide range of practical situations in transportation and logistics. However, the restrictions on the service sequences of multiple vehicles and the non-linearity introduced by the late penalties make this problem time-consuming to solve. To overcome this difficulty, we propose a novel reinforcement learning framework inspired by the transformer architecture that generates tours instantly after offline training. The framework, trained with the policy gradient method, consists of an encoder that extracts the coupling relationships among pickup and delivery customers and a decoder with a multi-vehicle attention network that allocates suitable orders to each vehicle. Validated on the Sioux Falls network, the proposed method improves solution quality by 2.4%–8.0% compared with Google OR-Tools and several heuristic algorithms. Notably, on the 100-customer case the baselines require dozens of minutes to reach inferior results, whereas the well-trained model based on our method can be deployed to provide a high-quality solution within seconds. Furthermore, the proposed model generalizes well across scenarios of varying problem scale, and the obtained results are robust to fluctuations in travel time.
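The abstract does not give implementation details, but the described structure (a transformer encoder over pickup–delivery node features, a per-vehicle attention decoder, and policy-gradient training) resembles standard attention-based routing models. The sketch below is a hypothetical, simplified illustration; the module names, dimensions, masking scheme, and greedy-rollout baseline are assumptions, not taken from the paper.

```python
# Hypothetical sketch of an attention-based encoder/decoder for multi-vehicle
# pickup-and-delivery routing, trained with REINFORCE (policy gradient).
# All names, dimensions, and the masking scheme are illustrative assumptions.
import torch
import torch.nn as nn

class PDPEncoder(nn.Module):
    """Embeds pickup/delivery node features and applies transformer self-attention."""
    def __init__(self, node_dim=4, d_model=128, n_heads=8, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(node_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, nodes):                    # nodes: (batch, n_nodes, node_dim)
        return self.encoder(self.embed(nodes))   # (batch, n_nodes, d_model)

class MultiVehicleDecoder(nn.Module):
    """Scores feasible customers for the selected vehicle via attention."""
    def __init__(self, d_model=128):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)  # query built from the vehicle state
        self.key = nn.Linear(d_model, d_model)

    def forward(self, node_emb, vehicle_state, mask):
        # vehicle_state: (batch, d_model) summary of the current vehicle's context
        q = self.query(vehicle_state).unsqueeze(1)               # (batch, 1, d_model)
        scores = (q @ self.key(node_emb).transpose(1, 2)).squeeze(1)
        scores = scores / node_emb.size(-1) ** 0.5
        scores = scores.masked_fill(mask, float('-inf'))         # hide infeasible nodes
        return torch.softmax(scores, dim=-1)                     # policy over next visits

def reinforce_step(log_probs, costs, baseline_costs, optimizer):
    """One policy-gradient update using an advantage-weighted log-likelihood loss."""
    advantage = (costs - baseline_costs).detach()  # e.g. greedy-rollout baseline
    loss = (advantage * log_probs.sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In such a setup the decoder would be called step by step, with the mask updated so that a delivery node only becomes selectable after its paired pickup has been visited by the same vehicle; this is one common way to enforce pickup–delivery precedence, though the paper's exact mechanism is not specified in the abstract.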