Contributors: Zheng Li and Qi Luo.
Electric Vehicles (EVs) are likely to replace vehicles powered by internal combustion engines in the near future owing to their high efficiency and low emissions. This trend has encouraged many governments around the world to announce policies aimed at replacing internal-combustion automobiles with EVs. However, several critical drawbacks of EVs still require solutions. First, the battery in an EV occupies more space than the fuel tank in a gasoline or diesel vehicle, and its limited capacity implies shorter travel distances between recharges. Moreover, charging a battery takes longer than refuelling a conventional vehicle, which is a major problem for drivers, who must therefore spend more time at charging stations.
The Online Electric Vehicle (OLEV) was introduced to overcome these problems. In this wireless charging system, power is drawn from underground power cables, so a large battery is unnecessary: the motor receives power instantaneously from the embedded cables, and the resulting lighter weight also improves the vehicle's energy efficiency. Furthermore, drivers do not need to spend time charging their vehicles, because the installed power cables transmit energy continuously and the vehicle is thus charged while it is being driven.
As shown in Figure 1, the OLEV itself appears very similar to a conventional electric vehicle, but it differs significantly in that a power-receiving pickup module is part of the on-board equipment, in addition to a motor and a battery. The power-receiving unit is attached to the bottom of the vehicle and picks up the transmitted power from the power cable; a regulator then supplies a constant voltage to the battery. The power cable is installed beneath the surface of the road on which the vehicle operates. The power-supply infrastructure is composed of an inverter and a power cable installed in segments.
Figure 1. Overall layout of the wireless charging system.
Connection to RL method
Conventional studies on the optimization of wireless charging systems have several limitations. First, most studies are simulated under a static traffic environment, which evaluates the EV's travel distance on a designated driving cycle. Such simulations cannot reflect a real traffic environment, because actual traffic flow changes over time. To evaluate EV performance precisely, simulation under a dynamic traffic environment built from real-time traffic data is crucial. In a dynamic traffic environment, however, the computational complexity of an exact mixed-integer-programming (MIP) algorithm increases greatly as the number of constraints escalates. Moreover, the MIP-based exact algorithm must be modified whenever the traffic environment changes, which makes it extremely inefficient for finding optimal solutions in dynamic traffic environments. To optimize wireless charging systems efficiently in a dynamic traffic environment, the reinforcement-learning approach is therefore an attractive alternative to conventional methods.
Lee et al. (2019) propose a precise model of a wireless charging electric bus system based on a Markov decision process (MDP). The state is the average State of Charge (SoC) of the bus fleet, calculated from the battery capacity, pickup capacity, and number of installed power cables. The action set A is formulated by selecting three main variables: the battery capacity, the pickup capacity, and the location of a power cable with a given pickup capacity at each time interval. The total cost of the wireless charging system is defined as the reward function.
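The MDP elements described above can be sketched in code. This is a minimal, illustrative formulation, not the authors' implementation: the capacity values, the toy SoC formula, and the unit prices in the cost-based reward are all assumptions chosen only to show how the state, action set, and reward fit together.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical design spaces (values are illustrative, not from the paper).
BATTERY_CAPACITIES = [20.0, 40.0, 60.0]   # kWh
PICKUP_CAPACITIES = [50.0, 100.0]         # kW
CABLE_SLOTS = range(4)                    # candidate road segments for a cable

@dataclass(frozen=True)
class Action:
    """One design decision: battery size, pickup size, and cable placement."""
    battery_kwh: float
    pickup_kw: float
    cable_slot: int

def action_set():
    """The action set A: the cross product of the three decision variables."""
    return [Action(b, p, s)
            for b, p, s in product(BATTERY_CAPACITIES, PICKUP_CAPACITIES, CABLE_SLOTS)]

def average_soc(battery_kwh, pickup_kw, n_cables):
    """Toy stand-in for the fleet's average SoC (the state): more pickup power
    and more installed cables raise SoC; a larger battery dilutes the same charge."""
    charged = pickup_kw * n_cables * 0.1   # assumed energy picked up per cycle
    return min(1.0, charged / battery_kwh)

def reward(battery_kwh, pickup_kw, n_cables):
    """Reward = negative total system cost (illustrative unit prices)."""
    return -(battery_kwh * 500 + pickup_kw * 300 + n_cables * 1000)
```

The key design point mirrored here is that the reward is purely cost-based, so maximizing reward means minimizing the total system cost.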
Lee et al. (2021) propose a model of a wireless charging electric tram system operating over a large tram network, together with an energy cost minimization algorithm based on decentralized multi-agent reinforcement learning with feedback. The state is composed of the battery capacity, the pickup capacity, and each electric tram's SoC level. A reward function that considers both system cost and operational stability is established.
Single-agent reinforcement learning
Lee et al. (2019) introduce an optimization algorithm based on reinforcement learning to find the optimal battery capacity, pickup capacity, and number of power-cable installations. Q-learning was adopted to maximize the action value as the state changes from the current to the next state according to the action taken by the agent. As a result, for a given action, the Q value is updated based on the immediate and future rewards obtained from the environment. The immediate reward reflects the change in state brought about by the agent's action, while the future reward is associated with the future environment resulting from that action. The agent's ultimate goal is to update Q so as to obtain the maximum reward.
Figure 2. Architecture of the reinforcement-learning algorithm.
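The Q-value update described above can be sketched as standard one-step Q-learning. This is a generic sketch, not the paper's code: the learning rate, discount factor, and epsilon-greedy exploration scheme are assumed hyperparameters.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration

Q = defaultdict(float)  # Q[(state, action)] -> estimated action value

def update_q(state, action, reward, next_state, next_actions):
    """One-step Q-learning: blend the immediate reward with the discounted
    best future value obtainable from the next state."""
    best_future = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_future - Q[(state, action)])

def choose_action(state, actions):
    """Epsilon-greedy policy: mostly exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

The `reward + GAMMA * best_future` term is exactly the combination of immediate and future rewards described in the text, with the discount factor weighting how much the future environment matters.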
Multi-agent reinforcement learning
Lee et al. (2021) introduce an energy cost minimization algorithm based on decentralized multi-agent reinforcement learning with a feedback network to minimize the total cost of the wireless charging electric tram network, as shown in Figure 3. Each agent uses the one-step Q-learning algorithm to learn its decision policy. For a given action, the value of Q is updated based on the immediate and future rewards obtained from the environment. The agents then attempt to maximize their expected sums of discounted rewards in the form of a general-sum game, for which at least one Nash equilibrium is guaranteed to exist.
Figure 3. Architecture of the decentralized reinforcement-learning algorithm with feedback network.
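The decentralized scheme can be sketched as independent Q-learners that coordinate only through a shared feedback signal folded into each agent's reward. This is a hypothetical simplification of the feedback mechanism, assuming the network feedback arrives as a scalar penalty added to each tram's local reward; the class and parameter names are illustrative.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

class TramAgent:
    """One tram with its own Q-table; no agent sees the others' tables.
    Coordination happens only through the shared feedback signal."""
    def __init__(self, name):
        self.name = name
        self.Q = defaultdict(float)

    def act(self, state, actions, epsilon=0.1):
        """Epsilon-greedy action selection over this agent's own Q-table."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: self.Q[(state, a)])

    def learn(self, state, action, local_reward, feedback, next_state, next_actions):
        """One-step Q-learning update; the network feedback (e.g., a penalty
        for unstable operation) is added to the local reward before updating."""
        r = local_reward + feedback
        best_future = max(self.Q[(next_state, a)] for a in next_actions)
        self.Q[(state, action)] += ALPHA * (r + GAMMA * best_future - self.Q[(state, action)])
```

Because each agent updates only its own table while responding to a common feedback signal, the interaction takes the form of a general-sum game in which every tram pursues its own discounted-reward sum.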
- H. Lee, D. Ji, and D. H. Cho, “Optimal design of wireless charging electric bus system based on reinforcement learning,” Energies, vol. 12, no. 7, pp. 1–20, 2019.
- H. Lee and D. H. Cho, “Energy Cost Minimization Based on Decentralized Reinforcement Learning with Feedback for Stable Operation of Wireless Charging Electric Tram Network,” IEEE Syst. J., vol. 15, no. 1, pp. 586–597, 2021.