Electric vehicle sharing

From RL for transport research

Contributors: Zheng Li and Qi Luo.

This page is mainly based on the work of Bogyrbayeva et al. [1] and Shi et al. [2].

Problem Statement

The advent of Electric Vehicles (EVs) and car-sharing services provides a sustainable option for moving people and goods across dense urban areas. Car-sharing services built on EVs have the potential to increase resource utilization and offer the urban population a unique opportunity in the form of EV sharing systems. In EV sharing systems, such as those operated by car2go and WeShare, customers no longer need to own a vehicle and can conveniently pick up and drop off any EV, on demand, at the parking lots of designated service areas. However, several critical operational challenges must be overcome to bring this on-demand service into the mainstream.

System rebalancing

Before the start of the day, an operating company needs to relocate EVs to the locations that best match demand in order to establish a supply-demand balance in the system. Furthermore, to provide a given level of service, EVs need to be charged before customers can use them. This raises two major issues. First, demand is sparse across the service-area network, so finding the ideal locations to which EVs should be relocated is not trivial. Second, an efficient routing plan is needed for dropping off drivers to pick up the EVs, having them take the EVs to charging stations, and then picking the drivers up again from their respective locations. Without efficient solutions to these complex and costly operational challenges, the sustainable existence of EV sharing systems is uncertain.

Ride hailing

Operating a community- or city-owned EV fleet to provide ride-hailing services can deliver low-cost, low-emission mobility to local residents. The key difference between dispatching conventional gasoline vehicles and dispatching EVs lies in the relatively frequent recharging of EV batteries, which must be considered for three reasons. First, most EVs currently on the market have much shorter ranges than their gasoline counterparts. For example, the Nissan Leaf has a range of 151 to 226 miles, so it would run out of power after only a few hours of continuous operation. Second, recharging an EV is still significantly slower than refueling a gasoline vehicle: charging a Nissan Leaf from empty to full takes around one hour even with a 50 kW fast charger. Third, even as EV ranges continue to grow, modeling the recharging process will remain important for many applications. For instance, the recharging of an EV fleet can be coordinated with smart grid control systems to provide frequency regulation services, bringing additional benefits to both the ride-hailing service provider and the power grid.

Connection to RL method

System rebalancing

Focusing on the shuttle routing decision problem, Bogyrbayeva et al. (2022) [1] propose a reinforcement learning approach for routing the shuttles, while the EV relocation decisions themselves are made by a rule-based approach. A fleet of shuttles with drivers leaves a depot and visits nodes in the network to relocate EVs from supplier nodes to demander nodes. The shuttles must return to the depot after the demand at all demander nodes has been fulfilled and all drivers have been picked up. The sequential decisions of a central controller routing shuttles under uncertain demand (the locations of drivers) are formulated as a finite-horizon Markov decision process (MDP), in which the future dynamics of the system depend only on the current state. The state contains each node's location, the relative distance, the number of EVs, the number of drivers, the charging levels of the EVs, and indicators for the expected transitions. An action is the next node to be visited by each shuttle. All shuttles share a common immediate reward R.
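To make these MDP ingredients concrete, the following Python sketch encodes a per-node observation (location, relative distance, EV and driver counts, charge level, and an indicator) and a step in which each shuttle's action is its next node and all shuttles receive one shared reward. Everything here is an illustrative assumption rather than the formulation in [1]: the names `Node`, `node_features`, `step`, and `is_terminal` are hypothetical, the reward is assumed to be the negative total travel distance, a single demand flag stands in for the paper's transition indicators, and the rule-based EV relocation itself is omitted.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    x: float
    y: float
    n_evs: int = 0      # EVs parked at this node
    n_drivers: int = 0  # drivers waiting here to be picked up
    demand: int = 0     # EVs still required here (> 0 only at demander nodes)
    charge_levels: list = field(default_factory=list)  # state of charge of each parked EV, in [0, 1]

def node_features(node, shuttle_xy):
    """Observation for one node as seen from a shuttle located at shuttle_xy."""
    dist = math.dist((node.x, node.y), shuttle_xy)  # relative distance
    mean_soc = sum(node.charge_levels) / len(node.charge_levels) if node.charge_levels else 0.0
    needs_ev = 1.0 if node.demand > 0 else 0.0  # hypothetical stand-in for the transition indicators
    return [node.x, node.y, dist, node.n_evs, node.n_drivers, mean_soc, needs_ev]

def step(nodes, shuttle_pos, actions):
    """Move shuttle s to node actions[s]; return the single reward shared by all shuttles."""
    total_travel = 0.0
    for s, target in enumerate(actions):
        dest = (nodes[target].x, nodes[target].y)
        total_travel += math.dist(shuttle_pos[s], dest)
        shuttle_pos[s] = dest  # update the shuttle's position in place
    return -total_travel  # assumed reward: negative total travel distance

def is_terminal(nodes, shuttle_pos, depot=0):
    """Episode ends when demand is met, no driver is left behind, and every shuttle is at the depot."""
    depot_xy = (nodes[depot].x, nodes[depot].y)
    no_work_left = all(n.demand == 0 and n.n_drivers == 0 for n in nodes)
    return no_work_left and all(p == depot_xy for p in shuttle_pos)
```

For example, a shuttle at the depot `(0.0, 0.0)` that chooses a node at `(3.0, 4.0)` receives a shared reward of `-5.0`, and the episode can only terminate once it has returned to the depot.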

Ride hailing

Shi et al. (2020) [2] develop an EV fleet operating algorithm for ride-hailing services that minimizes the total customer waiting time, electricity consumption, and vehicle operational costs. Specifically, given a fleet of EVs with random initial locations and remaining battery levels, the task is to make sequential decisions that dispatch the fleet to serve an initially unknown set of customer trip requests within the operating time horizon. This fleet operating problem is formulated as a Markov decision process (MDP). The state contains the remaining battery level, the location, and the time length of each EV. The action set consists of three actions: pass, charge, and assign.
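The per-EV state and the three actions can be sketched in Python as below. This is a minimal illustration under assumptions, not the model of [2]: locations are hypothetical zone indices, battery is a fraction of capacity, the `time` field is one possible reading of the "time length" component, and the reserve threshold, trip energy, and charging rate are invented parameters. `EVState`, `feasible_actions`, and `transition` are hypothetical names.

```python
from dataclasses import dataclass, replace
from enum import Enum

class Action(Enum):
    PASS = "pass"      # remain idle for this decision step
    CHARGE = "charge"  # recharge the battery
    ASSIGN = "assign"  # serve a customer trip request

@dataclass(frozen=True)
class EVState:
    location: int   # zone index (hypothetical spatial discretization)
    battery: float  # remaining battery level as a fraction of capacity
    time: int       # decision step (hypothetical reading of the "time length" component)

def feasible_actions(ev, trip_energy=None, reserve=0.1):
    """PASS and CHARGE are always allowed; ASSIGN only if a request exists
    and serving it leaves at least `reserve` battery."""
    actions = [Action.PASS, Action.CHARGE]
    if trip_energy is not None and ev.battery - trip_energy >= reserve:
        actions.append(Action.ASSIGN)
    return actions

def transition(ev, action, dest=None, trip_energy=0.0, charge_per_step=0.5):
    """Hypothetical one-step dynamics; every action advances time by one step."""
    if action is Action.ASSIGN:
        return replace(ev, location=dest, battery=ev.battery - trip_energy, time=ev.time + 1)
    if action is Action.CHARGE:
        return replace(ev, battery=min(1.0, ev.battery + charge_per_step), time=ev.time + 1)
    return replace(ev, time=ev.time + 1)  # PASS: stay put, keep the battery level
```

Masking ASSIGN when the battery is too low is one simple way to keep a learned dispatcher from stranding vehicles; the actual feasibility rules in [2] may differ.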

Main results

Bogyrbayeva et al. (2022) [1] adopt a policy gradient method to learn the complex routing policies of the shuttles directly. Their method takes the actor-critic form, which consists of two separate networks: an actor and a critic. The critic estimates the value of a given state, and the actor's parameters are updated in the direction of policy improvement indicated by that estimate. They train an agent to route a single shuttle and a central controller to route multiple shuttles in an urban network, using a simulator of the EV sharing system environment. The simulator handles EV relocations through rule-based decisions, while sequence-to-sequence models generate the routing policies. An overview of the model is shown in Figure 1.
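[1] trains sequence-to-sequence actor and critic networks on full routing episodes. The Python sketch below deliberately reduces this to an assumed single-decision setting: three candidate next nodes with hypothetical rewards (negative travel times), a softmax actor over logits, and a scalar critic used as a baseline. It only illustrates how the critic's estimate turns a reward into an advantage that steers the actor's policy-gradient update; it is not the training procedure of [1], and `actor_critic_one_step` is a hypothetical name.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())  # shift for numerical stability
    return z / z.sum()

def actor_critic_one_step(rewards, steps=2000, lr_actor=0.1, lr_critic=0.1, seed=0):
    """Single-state actor-critic.

    Actor: softmax policy over candidate next nodes, parameterized by logits `theta`.
    Critic: scalar estimate `v` of the state's value, used as a baseline.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(rewards))
    v = 0.0
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(len(rewards), p=pi)  # sample an action from the current policy
        advantage = rewards[a] - v          # how much better than the critic expected
        v += lr_critic * advantage          # critic: move toward the observed return
        grad_log_pi = -pi                   # d log pi(a) / d theta = onehot(a) - pi
        grad_log_pi[a] += 1.0
        theta += lr_actor * advantage * grad_log_pi  # actor: policy-gradient ascent step
    return softmax(theta), v
```

With hypothetical rewards `[-5.0, -2.0, -8.0]`, the returned policy should concentrate on index 1, the shortest option, and the critic's `v` should drift toward that option's reward.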


Figure 1. An overview of the reinforcement learning model [1].

Shi et al. (2020) [2] propose a reinforcement learning framework with decentralized learning and centralized decision making to solve this MDP. The overall framework is illustrated in Figure 2. In the learning process, EVs are treated as individual agents that share a common state-value function. The parameters of this shared state-value function approximator are trained and updated on the pooled experiences of individual EVs interacting with the environment. In the decision-making process, the EV fleet operating problem is solved in a centralized manner using the state-value estimates from the learning process.


Figure 2. Overall reinforcement learning framework [2].
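The split between decentralized learning and centralized decision making in [2] can be sketched in Python as follows. This is an illustrative simplification under stated assumptions, not the algorithm of [2]: the shared value function is a table `V` over hypothetical discrete states rather than a function approximator, learning is plain TD(0) with assumed `GAMMA` and `ALPHA`, charging options are omitted, and the centralized step is a greedy matching of EVs to requests ranked by their gain over passing. The names `td_update` and `dispatch` are hypothetical, and the centralized solver used in [2] may differ.

```python
from collections import defaultdict

GAMMA = 0.9  # hypothetical discount factor
ALPHA = 0.5  # hypothetical learning rate

def td_update(V, experiences):
    """Decentralized learning: transitions (s, r, s_next) collected by ALL EVs
    update one shared state-value table V via TD(0)."""
    for s, r, s_next in experiences:
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])

def dispatch(V, ev_states, options):
    """Centralized decision making, greedy simplification.

    options[ev][request] = (reward, next_state) if `ev` serves `request`.
    An EV left without a request passes: zero reward, same state.
    """
    def gain(ev, reward, s_next):
        pass_value = GAMMA * V[ev_states[ev]]  # value of passing instead
        return reward + GAMMA * V[s_next] - pass_value

    candidates = sorted(
        ((gain(ev, r, s_next), ev, req)
         for ev, reqs in options.items()
         for req, (r, s_next) in reqs.items()),
        reverse=True,  # best (EV, request) pairs first
    )
    assignment, taken = {}, set()
    for g, ev, req in candidates:
        # each EV serves at most one request, each request gets at most one EV,
        # and only pairs that beat passing are accepted
        if g > 0 and ev not in assignment and req not in taken:
            assignment[ev] = req
            taken.add(req)
    return assignment
```

Because every EV writes into the same `V`, experience gathered by one vehicle immediately informs the dispatch decisions for all others, which is the point of sharing the value function.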


  [1] A. Bogyrbayeva, S. Jang, A. Shah, Y. J. Jang, and C. Kwon, "A Reinforcement Learning Approach for Rebalancing Electric Vehicle Sharing Systems," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 8704–8714, 2022.
  [2] J. Shi, Y. Gao, W. Wang, N. Yu, and P. A. Ioannou, "Operating Electric Vehicle Fleet for Ride-Hailing Services with Reinforcement Learning," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 11, pp. 4822–4834, 2020.