Repositioning in Ridesharing


Contributors: Zixuan Xu, Jingbang Chen, Qi Luo.
This page is mainly based on the survey by Qin, Zhiwei, Hongtu Zhu, and Jieping Ye[1].

Problem statement

The repositioning module guides idle vehicles to specific locations in anticipation of fulfilling more requests in the future. Single-vehicle repositioning may be referred to as taxi routing or passenger seeking in the literature. Compared with street-hailed taxis, the matching radius of a mobile rideshare request is considerably longer, sometimes more than a mile. System-level vehicle repositioning, also known as driver dispatching, vehicle rebalancing/reallocation, or fleet management, aims to rebalance the global supply-demand (SD) distributions by proactively dispatching idle vehicles to different locations. Repositioning and matching are similar in that both relocate a vehicle to a different place. In theory, one can treat repositioning as matching a vehicle to a virtual trip request whose destination is that of the reposition action, so that matching and repositioning can be solved in a single problem instance. In practice, however, the two problems are typically solved separately, because they are separate system modules on most ridesharing platforms, with different review intervals and objective metrics, among other details.
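
The following is a minimal sketch of the "virtual trip request" view mentioned above: each candidate reposition destination is wrapped as a request whose origin and destination coincide, and matching and repositioning are then solved as one assignment problem. The `score` function, locations, and data structures are hypothetical placeholders, not any platform's actual formulation.

```python
# Sketch only: fold repositioning into matching by adding "virtual requests".
# score(v, o, d) is a hypothetical expected-value estimate, not a real API.
import numpy as np
from scipy.optimize import linear_sum_assignment

def unified_dispatch(vehicles, requests, reposition_cells, score):
    """vehicles: vehicle locations; requests: list of (origin, destination) pairs;
    reposition_cells: candidate idle destinations; score(v, o, d): expected value."""
    # A reposition to cell c is treated as a virtual request with origin == destination == c.
    jobs = requests + [(c, c) for c in reposition_cells]
    value = np.array([[score(v, o, d) for (o, d) in jobs] for v in vehicles])
    rows, cols = linear_sum_assignment(-value)           # maximize total value
    return [
        (v_idx, jobs[j_idx], "match" if j_idx < len(requests) else "reposition")
        for v_idx, j_idx in zip(rows, cols)
    ]
```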

Connection to RL methods

Taxi Routing problem

Vehicle repositioning from a single-driver perspective (i.e., taxi routing) has a relatively long history of research, since taxi service existed long before the emergence of rideshare platforms. Likewise, research on RL-based approaches for this problem also appeared earlier than that on system-level vehicle repositioning. For the taxi routing problem, each driver is naturally an agent, and the objective thus focuses on optimizing individual reward. Common reward definitions include trip fare[2], net profit (income minus operational cost)[3], idle cruising distance[4], and the ratio of trip mileage to idle cruising mileage[5]. The type of actions available to an agent depends on the physical abstraction adopted. A simpler and more common way of representing the spatial world is a grid system, square or hexagonal[6][7][3][5][8][9][10][11]. In this setting, the action space is the set of neighboring cells (often including the current cell).
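
To make the single-driver formulation concrete, the sketch below shows one step of a grid-based taxi routing MDP in which the action space is the current cell plus its neighbors and the reward is a net profit (fare minus cruising cost). The pickup-probability and fare tables are placeholder inputs, not values from the cited papers.

```python
# Minimal single-driver MDP step on a square grid (a hexagonal grid would only
# change the NEIGHBOURS list).  p_pickup and fare are hypothetical lookup tables.
import random

NEIGHBOURS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # stay + 4 neighboring cells

def step(state, action, p_pickup, fare, cruise_cost=1.0):
    """state: (x, y, t); action: index into NEIGHBOURS; returns (next_state, reward)."""
    x, y, t = state
    dx, dy = NEIGHBOURS[action]
    nx, ny = x + dx, y + dy
    reward = -cruise_cost if (dx, dy) != (0, 0) else 0.0   # idle cruising cost
    if random.random() < p_pickup.get((nx, ny), 0.0):      # a request appears in the new cell
        reward += fare.get((nx, ny), 0.0)                   # net profit = fare - cruising cost
    return (nx, ny, t + 1), reward
```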

System-level Vehicle Repositioning

The problem formulation most relevant to the ridesharing service provider is system-level vehicle repositioning. Similar to order matching, the ridesharing platform reviews the vehicles' states at fixed time intervals, which are significantly longer than those for order matching. Idle vehicles that meet certain conditions, e.g., having been idle for a sufficiently long time and not being in the process of an existing reposition task, are sent reposition recommendations, which specify the desired destinations and the associated time windows. The motivation is to explicitly modify the current distribution of available vehicles so that collectively they are better positioned to fulfill more requests more efficiently in the future. The agent can be either the platform or a vehicle, the latter of which calls for a MARL approach. All the works in this formulation include global SD information (each vehicle's and request's status, or SD distributions) in the state of the MDP, and a vehicle agent additionally has its own spatiotemporal status in the state. The rewards are mostly the same as in the taxi routing case, except that Mao et al.[12] consider the monetized passenger waiting time. The actions are all based on grid or taxi-zone systems.
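
The state described above can be pictured with the schematic data structures below; the field names are hypothetical and only illustrate that a vehicle agent observes the global SD information plus its own spatiotemporal status.

```python
# Schematic state for system-level repositioning (illustrative field names only).
from dataclasses import dataclass
from typing import Dict

Zone = int

@dataclass
class GlobalSDState:
    time: int
    idle_supply: Dict[Zone, int]   # idle vehicles per zone
    open_demand: Dict[Zone, int]   # outstanding requests per zone

@dataclass
class VehicleAgentState:
    shared: GlobalSDState          # global SD information seen by every agent
    zone: Zone                     # the vehicle's current zone
    idle_since: int                # idle duration, used to decide reposition eligibility
```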

Main results

System-agent Approaches

The system-agent RL formulation has only been studied very recently, in view of the intractability of the joint action space of all the vehicles. To tackle this scalability challenge, Feng et al.[13] decompose the system action into a sequence of atomic actions corresponding to passenger-vehicle matches and vehicle repositions. The MDP encloses a 'sequential decision process' in which all feasible atomic actions are executed to represent one system action, and the MDP advances its state upon completion of the system action. They develop a PPO algorithm for the augmented MDP to determine the sequence of atomic actions. The system policy of Mao et al.[12] produces a reposition plan that specifies the number of vehicles to relocate from zone i to zone j, so that the action space is independent of the number of agents (at the expense of additional work at execution). The agent network, trained by a batch AC method, outputs a value for each OD pair, which after normalization gives the percentage of vehicles to move from each zone to each feasible destination.
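
A minimal sketch of how per-OD outputs can be normalized into a reposition plan, in the spirit of the system policy described above, is shown below; the softmax normalization and rounding are assumptions for illustration, not necessarily the exact scheme of the cited work.

```python
# Turn raw OD scores into a reposition plan whose size is independent of the number
# of agents: normalize over destinations per origin zone, then scale by idle counts.
import numpy as np

def reposition_plan(scores: np.ndarray, idle_per_zone: np.ndarray) -> np.ndarray:
    """scores: (n_zones, n_zones) raw values for each OD pair;
    idle_per_zone: (n_zones,) idle-vehicle counts; returns integer OD move counts."""
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))   # softmax per origin zone
    fractions = exp / exp.sum(axis=1, keepdims=True)
    return np.floor(fractions * idle_per_zone[:, None]).astype(int)

# Example with 4 zones and random placeholder scores:
plan = reposition_plan(np.random.randn(4, 4), np.array([10, 3, 0, 7]))
```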

Vehicle-agent Approaches

The vehicle-agent approaches have to address the coordination issue among the agents. Representative methods are listed below.

Lin et al. (2018)[14]: develop contextual DQN and AC methods, in which coordination is achieved by masking the action space based on the state context and splitting the reward accrued in a grid cell among the multiple agents within the same cell.
Oda & Joe-Wong (2018)[15]: treat the global state in grid form as an image input and develop an independent DQN method.
Liu et al. (2020)[16]: construct the zone structure by clustering a road-connectivity graph. A single vehicle agent is trained with contextual deep RL and generates sequential actions for the vehicles.
Zhang et al. (2020)[17]: train a single DQN agent for all agents, but with a global KL distance between the SD distributions, similar to (Zhou et al. 2019). The DQN agent is put in tandem with QRewriter, another agent with a Q-table value function that converts the output of the DQN into an improved action.
Shou & Di (2020b)[18]: approach the MARL problem with bilevel optimization: the bottom level is a mean-field AC method (Li et al. 2019) with the reward function coming from a platform reward-design mechanism, which is tuned by the top-level Bayesian optimization.
Chaudhari et al. (2020a): coordinate the agents through a central module, where a vehicle agent executes a mix of independent and coordinated actions. The central module determines the need for coordination based on SD gaps, and explicit coordination is achieved by solving an assignment problem to move vehicles from excess zones to deficit zones (see the sketch after this list).
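
As referenced in the last entry above, explicit coordination via SD gaps can be sketched as an assignment problem that relocates idle vehicles from excess zones to deficit zones; the gap representation and distance cost below are assumptions for illustration, not the cited paper's exact formulation.

```python
# Sketch: move idle vehicles from zones with excess supply to zones with a deficit
# by solving a minimum-cost assignment.  distance(i, j) is a hypothetical travel cost.
import numpy as np
from scipy.optimize import linear_sum_assignment

def coordinate(gaps, distance):
    """gaps: zone -> supply minus demand (positive = excess, negative = deficit)."""
    sources = [z for z, g in gaps.items() for _ in range(max(g, 0))]    # one slot per surplus vehicle
    sinks = [z for z, g in gaps.items() for _ in range(max(-g, 0))]     # one slot per missing vehicle
    if not sources or not sinks:
        return []
    cost = np.array([[distance(s, t) for t in sinks] for s in sources])
    rows, cols = linear_sum_assignment(cost)        # minimize total relocation distance
    return [(sources[r], sinks[c]) for r, c in zip(rows, cols)]

# Example: zone 0 has 2 extra vehicles, zone 2 is short by 1 (placeholder data).
moves = coordinate({0: 2, 1: 0, 2: -1}, lambda i, j: abs(i - j))
```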

Joint Matching and Repositioning Optimization

For joint matching and repositioning optimization, one major challenge is the heterogeneous review cadence: matching and reposition decisions are typically made asynchronously in practice. To address this issue, Tang et al.[19] allow the two modules to operate independently but share the same spatiotemporal state value function, which is updated online. If the two decisions are formulated within the same problem, the action space can be masked depending on the state[20].
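
A minimal sketch of the shared-value-function idea, under simplified assumptions: both modules score candidate decisions with the same spatiotemporal table V(zone, time), which is updated online with a TD-style rule. The scoring formulas and argument names here are illustrative, not the cited framework's exact definitions.

```python
# Shared spatiotemporal value table used by both matching and repositioning (sketch).
from collections import defaultdict

GAMMA = 0.99
V = defaultdict(float)                      # (zone, time) -> estimated long-run value

def td_update(origin, t0, dest, t1, reward, lr=0.05):
    """Online update from a completed trip or reposition (origin at t0 -> dest at t1)."""
    target = reward + GAMMA ** (t1 - t0) * V[(dest, t1)]
    V[(origin, t0)] += lr * (target - V[(origin, t0)])

def matching_score(zone, t, fare, dest, trip_time):
    # Fare earned plus the discounted value of the drop-off state, minus the value given up.
    return fare + GAMMA ** trip_time * V[(dest, t + trip_time)] - V[(zone, t)]

def reposition_score(zone, t, dest, travel_time):
    # No fare; only the discounted value of reaching the destination matters.
    return GAMMA ** travel_time * V[(dest, t + travel_time)] - V[(zone, t)]
```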

References

  1. Qin, Z., Zhu, H., & Ye, J. (2021). Reinforcement Learning for Ridesharing: An Extended Survey. arXiv e-prints, arXiv-2105.
  2. Rong, H., Zhou, X., Yang, C., Shafiq, Z. & Liu, A. (2016), The rich and the poor: A markov decision process approach to optimizing taxi driver revenue efficiency, in ‘Proceedings of the 25th ACM International on Conference on Information and Knowledge Management’, pp. 2329–2334.
  3. Verma, T., Varakantham, P., Kraus, S. & Lau, H. C. (2017), Augmenting decisions of taxi drivers through reinforcement learning for improving revenues, in ‘Twenty-Seventh International Conference on Automated Planning and Scheduling’.
  4. Garg, N. & Ranu, S. (2018), Route recommendations for idle taxi drivers: Find me the shortest route to a customer!, in ‘Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining’, pp. 1425–1434.
  5. Gao, Y., Jiang, D. & Xu, Y. (2018), ‘Optimize taxi driving strategies based on reinforcement learning’, International Journal of Geographical Information Science 32(8), 1677–1696.
  6. Han, M., Senellart, P., Bressan, S. & Wu, H. (2016), Routing an autonomous taxi with reinforcement learning, in ‘Proceedings of the 25th ACM International on Conference on Information and Knowledge Management’, pp. 2421–2424.
  7. Wen, J., Zhao, J. & Jaillet, P. (2017), Rebalancing shared mobility-on-demand systems: A reinforcement learning approach, in ‘2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC)’, IEEE, pp. 220–225.
  8. Lin, K., Zhao, R., Xu, Z. & Zhou, J. (2018), Efficient large-scale fleet management via multi-agent deep reinforcement learning, in ‘Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining’, pp. 1774–1783.
  9. Rong, H., Zhou, X., Yang, C., Shafiq, Z. & Liu, A. (2016), The rich and the poor: A markov decision process approach to optimizing taxi driver revenue efficiency, in ‘Proceedings of the 25th ACM International on Conference on Information and Knowledge Management’, pp. 2329–2334.
  10. Jiao, Y., Tang, X., Qin, Z. T., Li, S., Zhang, F., Zhu, H. & Ye, J. (2021), ‘Real-world ride-hailing vehicle repositioning using deep reinforcement learning’, Transportation Research Part C: Emerging Technologies 130, 103289.
  11. Shou, Z., Di, X., Ye, J., Zhu, H., Zhang, H. & Hampshire, R. (2020), ‘Optimal passenger-seeking policies on e-hailing platforms using markov decision process and imitation learning’, Transportation Research Part C: Emerging Technologies 111, 91–113.
  12. Mao, C., Liu, Y. & Shen, Z.-J. M. (2020), ‘Dispatch of autonomous vehicles for taxi services: A deep reinforcement learning approach’, Transportation Research Part C: Emerging Technologies 115, 102626.
  13. Feng, J., Gluzman, M. & Dai, J. (2020), ‘Scalable deep reinforcement learning for ride-hailing’, IEEE Control Systems Letters.
  14. Lin, K., Zhao, R., Xu, Z. & Zhou, J. (2018), Efficient large-scale fleet management via multi-agent deep reinforcement learning, in ‘Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining’, pp. 1774–1783.
  15. Oda, T. & Joe-Wong, C. (2018), Movi: A model-free approach to dynamic fleet management, in ‘IEEE INFOCOM 2018-IEEE Conference on Computer Communications’, IEEE, pp. 2708–2716.
  16. Liu, Z., Li, J. & Wu, K. (2020), ‘Context-aware taxi dispatching at city-scale using deep reinforcement learning’, IEEE Transactions on Intelligent Transportation Systems.
  17. Zhang, W., Wang, Q., Li, J. & Xu, C. (2020), ‘Dynamic fleet management with rewriting deep reinforcement learning’, IEEE Access 8, 143333–143341.
  18. Shou, Z. & Di, X. (2020b), ‘Reward design for driver repositioning using multi-agent reinforcement learning’, Transportation Research Part C: Emerging Technologies 119, 102738.
  19. Tang, X., Zhang, F., Qin, Z., Wang, Y., Shi, D., Song, B., Tong, Y., Zhu, H. & Ye, J. (2021), ‘Value function is all you need: A unified learning framework for ride hailing platforms’, arXiv e-prints pp. arXiv–2105.
  20. Holler, J., Vuorio, R., Qin, Z., Tang, X., Jiao, Y., Jin, T., Singh, S., Wang, C. & Ye, J. (2019), Deep reinforcement learning for multi-driver vehicle dispatching and repositioning problem, in J. Wang, K. Shim & X. Wu, eds, ‘2019 IEEE International Conference on Data Mining (ICDM)’, Institute of Electrical and Electronics Engineers, Washington, DC, pp. 1090–1095.