This page is mainly based on the survey by Yan et al.
Urban logistics is a broad topic, since different logistics companies and organizations may each have their own problem statements. As the e-commerce market has grown year by year, the importance of Last Mile Delivery (LMD) has increased accordingly. LMD can be broadly classified into two types: delivery problems and pickup & delivery problems. Their efficiency has been studied academically as the Vehicle Routing Problem (VRP) and the Dynamic Pickup & Delivery Problem (DPDP), respectively.
Connection to RL
In the summary by Yan et al., the complete LMD process can be divided into three decision levels (from highest to lowest): order acceptance/rejection, job assignment, and path construction. In an RL setting, a typical approach is to model the system as a semi-Markov chain in which a transition occurs whenever an order arrives. Depending on the nature of the problem, some studies incorporate only one or two decision levels. For example, some applications require that all orders be accepted, so the acceptance/rejection decision is irrelevant. Some studies allow couriers to pick up multiple items subject to their capacities, while others assume that orders have to be fulfilled one at a time.
At the highest level of decisions, logistics companies or organizations can decide to accept or reject customer orders. In some problems, acceptance/rejection is the only decision suggested by RL; other decisions are made by optimization algorithms.
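As a minimal sketch of how an accept/reject decision could be learned, the snippet below uses tabular Q-learning with an epsilon-greedy policy. It is an illustration only and does not reproduce any of the surveyed methods: the state encoding (free vehicle slots, order value bucket), the reward of 1.0, and the hyperparameters are all assumptions.

```python
import random
from collections import defaultdict

REJECT, ACCEPT = 0, 1
ACTIONS = (REJECT, ACCEPT)

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice between rejecting and accepting the order."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Ties are broken in favor of accepting the order.
    return ACCEPT if q[(state, ACCEPT)] >= q[(state, REJECT)] else REJECT

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step for the accept/reject decision."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])

# Hypothetical state: (free vehicle slots, order value bucket).
q = defaultdict(float)
rng = random.Random(0)
state, next_state = (2, "high"), (1, "low")

# Accepting the order earned an assumed reward of 1.0.
q_update(q, state, ACCEPT, reward=1.0, next_state=next_state)
greedy = choose_action(q, state, epsilon=0.0, rng=rng)
```

In a setup like the one described above, the remaining decisions (assignment and routing) would be handled by a separate optimization routine, and its outcome would feed back into the reward.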
At the intermediate level, once an order is confirmed, the next decision is to assign the order to a vehicle. One popular research topic is virtual bidding. In the related problems, each vehicle is modeled as an agent that determines a virtual bidding price, taking into account information about the requests, the price, and inputs from other agents. The objective of bidding is to maximize the overall order fulfillment rate and reduce travel costs.
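The bidding mechanism can be sketched as follows: every vehicle agent submits a bid for the incoming order, and the order goes to the highest bidder. In the literature the bid comes from a learned policy; here `placeholder_bid` is a hypothetical stand-in that simply favors nearby vehicles with spare capacity, and the `Vehicle` fields are assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Vehicle:
    name: str
    position: tuple      # (x, y) coordinates, assumed planar
    free_capacity: int   # number of additional orders the vehicle can carry

def placeholder_bid(vehicle, pickup):
    """Stand-in for a learned bidding policy: closer vehicles bid higher."""
    if vehicle.free_capacity <= 0:
        return -math.inf  # a full vehicle cannot win the order
    return -math.dist(vehicle.position, pickup)

def assign_by_bidding(vehicles, pickup, bid_fn=placeholder_bid):
    """Collect one bid per vehicle and return (winner name, all bids)."""
    bids = {v.name: bid_fn(v, pickup) for v in vehicles}
    winner = max(bids, key=bids.get)
    if bids[winner] == -math.inf:
        return None, bids  # no vehicle is able to serve the order
    return winner, bids

fleet = [
    Vehicle("A", (0.0, 0.0), free_capacity=1),
    Vehicle("B", (5.0, 5.0), free_capacity=2),
    Vehicle("C", (1.0, 1.0), free_capacity=0),  # closest, but full
]
winner, bids = assign_by_bidding(fleet, pickup=(1.0, 1.0))
```

Passing a different `bid_fn` is where a trained agent would plug in; the auction step itself stays unchanged.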
At the lowest level of decisions, after assigning orders to vehicles, the routes of the vehicles have to be constructed. On an urban motorway, traffic conditions usually exhibit diurnal patterns and uncertain outcomes. Most of the existing studies rely on the least expected time or risk-based methods to handle stochasticity.
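A least-expected-time route can be illustrated with a small sketch: each road segment carries a discrete travel-time distribution, the expectation is taken per segment, and Dijkstra's algorithm is run on the expected values. This assumes time-independent, additive travel times, so it ignores the diurnal patterns mentioned above; the toy network and its probabilities are made up for the example.

```python
import heapq

def expected_time(outcomes):
    """Expected travel time of a segment given (probability, minutes) pairs."""
    return sum(p * t for p, t in outcomes)

def least_expected_time_path(graph, source, target):
    """Dijkstra on expected segment times. Returns (path, expected total)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, outcomes in graph.get(u, {}).items():
            nd = d + expected_time(outcomes)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

# Toy network: node -> {neighbor: [(probability, minutes), ...]}
graph = {
    "depot": {"A": [(0.5, 4.0), (0.5, 8.0)], "B": [(1.0, 5.0)]},
    "A": {"customer": [(1.0, 2.0)]},
    "B": {"customer": [(0.5, 2.0), (0.5, 6.0)]},
}
path, total = least_expected_time_path(graph, "depot", "customer")
```

A risk-based variant would replace `expected_time` with a risk-sensitive segment cost, although such costs are not always additive along a path.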
This part briefly introduces practical applications of LMD.
Dynamic Pickup & Delivery Problem
Ding et al. design a crowdsourcing delivery system based on public transport, considering the practical factors of time constraints, multi-hop delivery, and profits. To incorporate these factors, they build a reinforcement learning model that learns the optimal order dispatching strategies from massive passenger data and package data. The order dispatching problem is formulated as a sequential decision-making problem for package routing, i.e., selecting the next station for each package.
Li et al. propose a data-driven approach, the Spatial-Temporal Aided Double Deep Graph Network (ST-DDGN), to solve industry-scale DPDP. In their method, delivery demands are first forecast using a spatial-temporal prediction method, which guides the neural network to perceive the spatial-temporal distribution of delivery demand when dispatching vehicles. In addition, the relationships among individuals such as vehicles are modeled through a graph-based value function. ST-DDGN combines attention-based graph embedding with Double DQN (DDQN).
Ma et al. design an upper-level agent that dynamically partitions the DPDP into a series of sub-problems of different scales, in order to optimize vehicle routes towards globally better solutions. A lower-level agent is designed to solve each sub-problem efficiently by combining the strengths of classical operations research-based methods with reinforcement learning-based policies.
Zou et al. propose a Double Deep Q-Network (Double DQN) based reinforcement learning framework that gradually tests and learns the order dispatching policy by interacting with an O2O simulation model developed with SUMO. Preliminary experimental results on real order data demonstrate the effectiveness and efficiency of the proposed Double DQN based order dispatcher. Different state encoding schemes are also designed and tested to improve the dispatcher's performance.
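For readers unfamiliar with Double DQN, the sketch below shows only how its learning target differs from the vanilla DQN target: the online network selects the next action and the target network evaluates it. It does not attempt to reproduce ST-DDGN's graph embedding or demand forecasting, and the Q-values are invented numbers for illustration.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Vanilla DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    # The online network picks the action...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ...and the target network supplies its value.
    return reward + gamma * q_target_next[best_action]

# Invented Q-values for three candidate dispatch actions in the next state.
q_online_next = [0.2, 0.9, 0.5]
q_target_next = [1.0, 0.5, 2.0]

y_vanilla = dqn_target(1.0, False, q_target_next, gamma=0.5)
y_double = double_dqn_target(1.0, False, q_online_next, q_target_next, gamma=0.5)
```

Decoupling action selection from evaluation is what reduces the overestimation bias of the plain max operator; in this toy case the Double DQN target is lower than the vanilla one.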
Multi Vehicle Routing Problem with Pickups and Deliveries
Kang et al. consider complex and practical cases, such as multiple delivery vehicles, just-in-time (JIT) pickup and delivery, minimum fuel consumption, and maximum profitability. For these cases, they propose a learning-based logistics planning and scheduling (LLPS) algorithm that jointly controls the admission of order requests and schedules the routes of multiple vehicles. For admission control, they use reinforcement learning (RL) with function approximation by an artificial neural network (ANN). They also use a continuous-variable feedback control algorithm to schedule routes that minimize both the JIT penalty and fuel consumption. Computational experiments show that LLPS outperforms other similar approaches by 32% on average in terms of the average reward earned from each delivery order. In addition, LLPS is even more advantageous when the rate of order arrivals is high and the number of vehicles that transport parcels is low.
Ahamed et al. investigate the problem of assigning shipping requests to ad hoc couriers in the context of crowdsourced urban delivery and propose a new deep reinforcement learning (DRL)-based approach to tackle this assignment problem. They train a deep Q-network (DQN) with two salient features, experience replay and a target network, which enhance the efficiency, convergence, and stability of DRL training. The results show the benefits of heuristics-guided action choice and rule-interposing in DRL training, and the superiority of the proposed approach over existing heuristics in solution quality, time, and scalability.
Kavuk et al. propose a real-life application of deep reinforcement learning to address the order dispatching problem of a Turkish ultra-fast delivery company, Getir. They use deep Q-networks to learn the actions of warehouses, i.e., accepting or canceling an order, directly from state dimensions. They design the networks with two different rewards. They conduct empirical analyses using real-life data provided by Getir to generate training samples and to assess the models' performance during a selected 30-day period with a total of 9880 orders. The results indicate that the proposed models are able to generate policies that outperform the rule-based heuristic employed in practice.
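The two DQN ingredients named above, experience replay and a target network, can be sketched without any deep learning library. The code below is a generic illustration rather than the implementation of Ahamed et al.: parameters are plain dictionaries standing in for network weights, and the buffer capacity and transition contents are arbitrary.

```python
import copy
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement breaks the temporal correlation
        # between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def sync_target(online_params, target_params):
    """Hard update: overwrite the target parameters with a copy of the online ones."""
    target_params.clear()
    target_params.update(copy.deepcopy(online_params))

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)

online = {"w": [1.0, 2.0]}   # placeholder for online network weights
target = {"w": [0.0, 0.0]}   # placeholder for target network weights
sync_target(online, target)
online["w"][0] = 9.0         # later online updates must not leak into the target
```

In a training loop, `sync_target` would typically be called every fixed number of gradient steps, so that the bootstrapped targets change slowly while the online network is updated on sampled batches.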
- Yan, Y., Chow, A. H. F., Ho, C. P., et al. Reinforcement Learning for Logistics and Supply Chain Management: Methodologies, State of the Art, and Future Opportunities. Transportation Research Part E: Logistics and Transportation Review 162, 102712 (2022).
- Ding, Y., Guo, B., Zheng, L., et al. A City-Wide Crowdsourcing Delivery System with Reinforcement Learning. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5(3), 1–22 (2021).
- Li, X., Luo, W., Yuan, M., Wang, J., Lu, J., Wang, J., Lu, J. & Zeng, J. Learning to Optimize Industry-Scale Dynamic Pickup and Delivery Problems. arXiv preprint arXiv:2105.12899 (2021).
- Ma, Y., Hao, X., Hao, J., Lu, J., Liu, X., Xialiang, T., Yuan, M., Li, Z., Tang, J. & Meng, Z. A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems. Thirty-Fifth Conference on Neural Information Processing Systems (2021).
- Zou, G., Tang, J., Yilmaz, L. & Kong, X. Online Food Ordering Delivery Strategies based on Deep Reinforcement Learning. Applied Intelligence, 1–13 (2021).
- Kang, Y., Lee, S. & Do Chung, B. Learning-based Logistics Planning and Scheduling for Crowdsourced Parcel Delivery. Computers & Industrial Engineering 132, 271–279 (2019).
- Ahamed, T., Zou, B., Farazi, N. & Tulabandhula, T. Deep Reinforcement Learning for Crowdsourced Urban Delivery: System States Characterization, Heuristics-guided Action Choice, and Rule-Interposing Integration. arXiv preprint arXiv:2011.14430 (2020).
- Kavuk, E. M., Tosun, A., Cevik, M., Bozanta, A., Sonuç, S. B., Tutuncu, M., Kosucu, B. & Basar, A. Order Dispatching for An Ultra-Fast Delivery Service via Deep Reinforcement Learning. Applied Intelligence, 1–26 (2021).