Pricing in ridesharing

From RL for transport research

Contributors: Zixuan Xu, Jingbang Chen, Qi Luo.
This page is mainly based on the survey by Qin, Zhiwei, Hongtu Zhu, and Jieping Ye[1].

Problem statement

The pricing module offers a quote, which the passenger either accepts or rejects. Since the trip fare is both the price the passenger pays and the major factor in the driver's income, pricing decisions influence both demand and supply distributions through the price sensitivities of users; surge pricing during peak hours is one example. The pricing problem in the ridesharing literature is in most cases dynamic pricing, which adjusts trip prices in real time in view of changing demand and supply. The pricing module sits upstream of the other modules and is a macro-level lever for achieving supply-demand (SD) balance. This page follows the common setting where driver pay is closely associated with (approximately proportional to) the trip fare, so that pricing has a dual effect on demand and supply.

Connection to RL methods

RL-based approaches have been developed for dynamic pricing in one-sided retail markets[2]. The ridesharing marketplace differs in its two-sided nature and spatiotemporal dimensions: pricing is also a lever to change the supply (driver) distribution if price changes are broadcast to the drivers. Both demand and supply respond to price through elasticity functions; Chen et al.[3] describe examples of such functions for both sides in their simulation environment.
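As a rough illustration of what demand and supply elasticity functions in a simulation environment might look like, the sketch below uses a constant-elasticity form. The functional form, the elasticity values, the base rates, and all function names are assumptions made for illustration; they are not the functions used by Chen et al.[3].

```python
def demand_rate(base_demand, price_multiplier, elasticity=-1.2):
    """Expected requests per period; falls as price rises (assumed elasticity < 0)."""
    return base_demand * price_multiplier ** elasticity


def supply_rate(base_supply, wage_multiplier, elasticity=0.8):
    """Expected available drivers per period; rises with pay (assumed elasticity > 0)."""
    return base_supply * wage_multiplier ** elasticity


def sd_gap(price_multiplier, base_demand=100.0, base_supply=80.0):
    """Demand minus supply when driver pay is proportional to the trip fare.

    Under that proportionality assumption the wage multiplier equals the
    price multiplier, so a single price lever moves both sides at once.
    """
    demand = demand_rate(base_demand, price_multiplier)
    supply = supply_rate(base_supply, price_multiplier)
    return demand - supply
```

With these assumed numbers, raising the multiplier above 1.0 shrinks demand and grows supply, so the SD gap narrows and eventually changes sign. This is the dual effect of pricing described in the problem statement.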

The challenges in dynamic pricing for ridesharing lie in both its exogeneity and endogeneity. Dynamic pricing on trip inquiries changes the subsequent distribution of submitted requests through passenger price elasticity. The request distribution, in turn, influences the future supply distribution as drivers fulfill those requests. On the other hand, trip fares influence the demand for ridesharing services at given locations, and these changes affect the pool of waiting passengers, which further affects the passengers' expected waiting times. Waiting times then feed back into demand, either through cancellation of current requests or through the conversion of future trip inquiries. Because of its close ties to SD distributions, dynamic pricing is often jointly optimized with order matching or vehicle repositioning.

The complex interaction between pricing and SD makes it hard to develop explicit mathematical models that adapt well to dynamic and stochastic environments. RL is a promising direction for addressing these challenges because it treats endogeneity and exogeneity as part of the environment dynamics.

Main results

RL for Dynamic Pricing

Early RL works consider only one of the two defining factors of the ridesharing market: its two-sidedness and its spatiotemporal dimensions. Wu et al.[4] consider a simplified ridesharing environment that captures the two-sidedness of the market but not the spatiotemporal dimensions. The state of the MDP is the current price plus SD information. The action is to set a price, and the reward is the generated profit. A Q-learning agent is trained in a simple simulator, and an empirical advantage in total profit is demonstrated against heuristic approaches.
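The MDP described above (state = current price plus SD information, action = set a price, reward = profit) can be sketched with tabular Q-learning. This is a minimal sketch under stated assumptions, not the simulator or agent of Wu et al.[4]: the price grid `PRICES`, the toy market in `step` (demand reacts to the new price, supply reacts to the previous price), the 20% commission, and all hyperparameters are hypothetical.

```python
import random

PRICES = [0.8, 1.0, 1.2, 1.5, 2.0]  # hypothetical price multipliers (the actions)


def step(prev_price, price, rng):
    """Toy one-period market (assumed dynamics). Returns (profit, sd_bucket)."""
    demand = 100.0 * price ** -1.5 * rng.uniform(0.9, 1.1)      # reacts to new price
    supply = 60.0 * prev_price ** 0.5 * rng.uniform(0.9, 1.1)   # lags one period
    trips = min(demand, supply)
    profit = 0.2 * price * trips              # assumed 20% platform commission
    sd_bucket = 0 if demand > supply else 1   # 0: undersupplied, 1: oversupplied
    return profit, sd_bucket


def train(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over states (current price index, SD bucket)."""
    rng = random.Random(seed)
    n = len(PRICES)
    q = {(p, b): [0.0] * n for p in range(n) for b in (0, 1)}
    for _ in range(episodes):
        state = (1, 0)  # start at multiplier 1.0, undersupplied
        for _ in range(horizon):
            if rng.random() < epsilon:        # epsilon-greedy exploration
                action = rng.randrange(n)
            else:
                action = max(range(n), key=lambda a: q[state][a])
            profit, bucket = step(PRICES[state[0]], PRICES[action], rng)
            next_state = (action, bucket)
            # Standard Q-learning update toward the bootstrapped target.
            target = profit + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_price(q, state):
    """Price multiplier with the highest learned action value in `state`."""
    return PRICES[max(range(len(PRICES)), key=lambda a: q[state][a])]
```

Calling `greedy_price(train(), (1, 0))` returns the multiplier the learned table prefers at the start state. Because supply lags the price by one period in this toy market, the current price is a meaningful part of the state rather than a redundant feature.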

More recent works leverage the spatiotemporal nature of the pricing actions and take into account the spatiotemporal long-term values of the pricing decisions. Table 1 summarizes these works on RL for dynamic pricing in ridesharing.

Table 1: RL for dynamic pricing in ridesharing.

| Paper | Agent | State | Action | Reward | Algorithm | Environment |
|---|---|---|---|---|---|---|
| Chen, Jiao, Qin, Tang, Li, An, Zhu & Ye (2019)[5] | global decision-maker | features of the trip request | discretized price change percentage | profit | contextual bandits with action values partly computed by CVNet | ride-hailing simulator with pricing module and passenger elasticity model |
| Turan et al. (2020)[6] | global decision-maker for pricing and EV charging | electricity price in each zone, passenger queue length for each OD pair, number of vehicles in each zone and their energy levels | price for each OD pair, reposition/charging for each vehicle | trip revenue - penalty for queues - operational cost for charging and reposition | PPO | simulator |
| Song et al. (2020)[7] | global decision-maker | location, time | price for spatiotemporal grid cells | trip price minus penalty for driver waiting | Q-learning | case study: ride-hailing simulation of Seoul |
| Mazumdar et al. (2017)[8] | passenger | price multiplier, time, whether a ride has completed | wait, take current ride | trip price to pay | risk-sensitive inverse RL | historical data |
| Chen et al. (2021)[3] | global decision-maker | number of open requests, vacant vehicles, and occupied vehicles in each grid cell at time t, and demand at time t-1 | joint actions of price (per-km for excess mileage) and wage (per-km rate) for each grid cell | profit: revenue minus wage | PPO | simulation based on Hangzhou data from DiDi; modeling of both supply and demand elasticity |
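Two of the reward definitions in Table 1 can be written out as simple functions to make their structure concrete. This is only a sketch: the function names, argument shapes, and the flat per-km treatment are simplifying assumptions, not the exact formulations of Chen et al.[3] or Turan et al.[6].

```python
def profit_reward(trip_distances_km, price_per_km, wage_per_km):
    """Revenue minus wage over served trips (structure of the Chen et al. (2021) row).

    Assumes a flat per-km price and wage; the cited work prices excess mileage.
    """
    revenue = sum(d * price_per_km for d in trip_distances_km)
    wage = sum(d * wage_per_km for d in trip_distances_km)
    return revenue - wage


def queue_penalized_reward(trip_revenue, queue_lengths, queue_penalty, operating_cost):
    """Trip revenue minus queue penalty minus operating cost (Turan et al. (2020) row).

    `queue_penalty` is an assumed per-waiting-passenger cost; `operating_cost`
    lumps charging and repositioning costs into one number.
    """
    return trip_revenue - queue_penalty * sum(queue_lengths) - operating_cost
```

For example, `profit_reward([10, 5], 2.0, 1.5)` gives 30.0 in revenue minus 22.5 in wages, and `queue_penalized_reward(100.0, [3, 2], 2.0, 15.0)` subtracts a queue penalty of 10.0 and an operating cost of 15.0 from the revenue.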

Supply Elasticity

As discussed above, under the setting where driver pay is associated with the trip fare, the dynamic pricing policy also affects supply elasticity, i.e., drivers' decisions on participation in a given marketplace, working hours, and in some cases, the probability of accepting a given assignment, depending on the rules of the particular ridesharing platform[9][10][11]. Although not yet widely considered in RL approaches to dynamic pricing, supply elasticity is system state information that has significant implications for the sequence of pricing decisions. For the closely related topic of driver incentives design, Shang et al.[12][13] adopt a learning-based approach to construct a generative model of driver behavior with respect to the incentives policy and subsequently train an RL agent to optimize the incentives design for system-level metrics. This example may shed some light on how RL can help improve pricing policies in view of supply-side effects.


  1. Qin, Z., Zhu, H., & Ye, J. (2021). Reinforcement Learning for Ridesharing: An Extended Survey. arXiv e-prints, arXiv-2105.
  2. Raju, C., Narahari, Y. & Ravikumar, K. (2003), Reinforcement learning applications in dynamic pricing of retail markets, in ‘IEEE International Conference on E-Commerce, 2003. CEC 2003.’, IEEE, pp. 339–346.
  3. Chen, C., Yao, F., Mo, D., Zhu, J. & Chen, X. M. (2021), ‘Spatial-temporal pricing for ride-sourcing platform with reinforcement learning’, Transportation Research Part C: Emerging Technologies 130, 103272.
  4. Wu, T., Joseph, A. D. & Russell, S. J. (2016), ‘Automated pricing agents in the on-demand economy’, University of California at Berkeley: Berkeley, CA, USA .
  5. Chen, H., Jiao, Y., Qin, Z., Tang, X., Li, H., An, B., Zhu, H. & Ye, J. (2019), InBEDE: Integrating contextual bandit with TD learning for joint pricing and dispatch of ride-hailing platforms, in ‘2019 IEEE International Conference on Data Mining (ICDM)’, IEEE, pp. 61–70.
  6. Turan, B., Pedarsani, R. & Alizadeh, M. (2020), ‘Dynamic pricing and fleet management for electric autonomous mobility on demand systems’, Transportation Research Part C: Emerging Technologies 121, 102829.
  7. Song, J., Cho, Y. J., Kang, M. H. & Hwang, K. Y. (2020), ‘An application of reinforced learning-based dynamic pricing for improvement of ridesharing platform service in Seoul’, Electronics 9(11), 1818.
  8. Mazumdar, E., Ratliff, L. J., Fiez, T. & Sastry, S. S. (2017), Gradient-based inverse risk-sensitive reinforcement learning, in ‘2017 IEEE 56th Annual Conference on Decision and Control (CDC)’, IEEE, pp. 5796–5801.
  9. Chen, M. K. & Sheldon, M. (2016), ‘Dynamic pricing in a labor market: Surge pricing and flexible work on the uber platform.’, Ec 16, 455.
  10. Sun, H., Wang, H. & Wan, Z. (2019), ‘Model and analysis of labor supply for ride-sharing platforms in the presence of sample self-selection and endogeneity’, Transportation Research Part B: Methodological 125, 76–93.
  11. Angrist, J. D., Caldwell, S. & Hall, J. V. (2021), ‘Uber versus taxi: A driver’s eye view’, American Economic Journal: Applied Economics 13(3), 272–308.
  12. Shang, W., Yu, Y., Li, Q., Qin, Z., Meng, Y. & Ye, J. (2019), Environment reconstruction with hidden confounders for reinforcement learning based recommendation, in ‘Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining’, pp. 566–576.
  13. Shang, W., Li, Q., Qin, Z., Yu, Y., Meng, Y. & Ye, J. (2021), ‘Partially observable environment estimation with uplift inference for reinforcement learning based recommendation’, Machine Learning pp. 1–38.