Static traffic assignment


Contributors: Zheng Li and Qi Luo.

This page is mainly based on Zhou et al. [1], Ramos et al. [2], and Shou et al. [3].

Problem Statement

In recent years, wireless communication, on-board computation facilities, and advanced sensor techniques have been integrated into transportation systems. These technologies establish information exchange in vehicle-to-vehicle and vehicle-to-infrastructure networks, and enable real-time traffic information to be collected, processed, and disseminated among travelers, road infrastructure, and traffic management centers. Accordingly, a type of well-connected and information-rich transportation system, the connected vehicle system, is under rapid development and is expected to be fully implemented in the near future. With the deployment of these technologies, information-aided route guidance systems and traffic assignment methods are being developed to help travelers make better route decisions.

Even though connected vehicle systems have great potential to route travelers intelligently, researchers have recognized that letting each traveler independently choose the shortest path based on uniformly shared real-time traffic information is beneficial only when informed travelers are a minority and their route choices do not significantly affect traffic flows. In that case, informed travelers can take advantage of the real-time information to find shorter paths that uninformed travelers may not be able to recognize. However, once informed travelers become the majority, their route choices affect traffic flows significantly, and uniform real-time information provision may even worsen congestion if travelers still selfishly and independently choose their own shortest paths. For example, many travelers sharing the same information are likely to choose the same link because it is not crowded when they make their route choices, and the link then becomes highly congested by the time they arrive. The connected vehicle system, however, also enables distributed but coordinated traffic applications that improve the mobility, safety, and environmental friendliness of transportation systems.

Connection to RL method

The route choice problem concerns how rational drivers behave when choosing routes between their origins and destinations to minimize their travel costs. In order to accomplish this goal, drivers must adapt their choices to account for changing traffic conditions. Such scenarios are naturally modelled as multi-agent systems. Multiagent reinforcement learning (RL) captures the idea of self-interested agents interacting in a shared environment to improve their outcomes. In the basic, single-agent RL setting, the agent must learn by trial-and-error how to behave in an environment in order to maximize its utility. However, in multiagent RL settings, multiple agents share a common environment, and thus must adapt their behavior to each other. In other words, the learning objective of each agent becomes a moving target.
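The moving-target effect described above can be illustrated with a minimal sketch of independent, stateless Q-learners that share two parallel routes. Everything specific here is an assumption made for illustration and is not taken from the cited papers: the two-route network, the linear travel-time functions, the function name `simulate`, and the values of the learning rate and exploration rate.

```python
import random


def simulate(n_agents=100, n_episodes=500, alpha=0.1, epsilon=0.05, seed=0):
    """Independent Q-learning on two parallel routes (hypothetical network)."""
    rng = random.Random(seed)
    # Assumed linear travel times: time = free_flow + slope * flow.
    free_flow = [10.0, 15.0]
    slope = [0.2, 0.1]
    # One stateless Q-table per agent: estimated reward of each route.
    q = [[0.0, 0.0] for _ in range(n_agents)]
    flows = [0, 0]
    times = list(free_flow)
    for _ in range(n_episodes):
        # Every agent picks a route with an epsilon-greedy rule.
        choices = []
        for i in range(n_agents):
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[i][0] >= q[i][1] else 1
            choices.append(action)
        # The travel time of a route depends on how many agents chose it,
        # so each agent's reward depends on everyone else's choices.
        flows = [choices.count(0), choices.count(1)]
        times = [free_flow[r] + slope[r] * flows[r] for r in range(2)]
        # Each agent updates only the route it used (reward = -travel time).
        for i, action in enumerate(choices):
            reward = -times[action]
            q[i][action] += alpha * (reward - q[i][action])
    return flows, times


flows, times = simulate()
print("final flows:", flows, "final travel times:", times)
```

With these assumed cost functions the user equilibrium is an even 50/50 split, where both routes take 20 time units. How closely the learners approach it depends on the learning rate and the exploration rate, since every agent is learning against a population that keeps changing.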

Main results

Grunitzki et al. (2014) [4]: two reinforcement learning schemes, IQ-learning and DQ-learning, for solving the route choice problem are presented and compared. The former uses an individual reward function and aims at finding a policy that maximizes each agent’s own utility; the latter shapes the agents’ rewards with a difference rewards function and aims at finding routes that maximize the system’s utility.
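To make the contrast between the two reward signals concrete, the sketch below compares an individual reward with a difference reward in its commonly used form D_i = G(z) - G(z without agent i), where G is the system utility. The two-route network, the linear travel-time functions, and all function names are assumptions for illustration. The sketch also uses whole routes as actions for brevity, which is not necessarily the exact formulation in [4].

```python
def travel_time(route, flow):
    """Assumed linear travel time on a hypothetical two-route network."""
    free_flow = {"A": 10.0, "B": 15.0}
    slope = {"A": 0.2, "B": 0.1}
    return free_flow[route] + slope[route] * flow


def system_utility(flows):
    """G(z): negative total travel time summed over all travelers."""
    return -sum(n * travel_time(r, n) for r, n in flows.items())


def individual_reward(route, flows):
    """Reward seen by a self-interested agent: its own negative travel time."""
    return -travel_time(route, flows[route])


def difference_reward(route, flows):
    """D_i = G(z) - G(z without agent i): the agent's marginal system impact."""
    without_me = dict(flows)
    without_me[route] -= 1
    return system_utility(flows) - system_utility(without_me)


flows = {"A": 60, "B": 40}
print(individual_reward("A", flows))   # own travel time only
print(difference_reward("A", flows))   # also counts delay imposed on others
```

In this example the agent on route A experiences a travel time of 22, but its difference reward is about -33.8, because its presence also adds 0.2 time units to each of the other 59 travelers on the route.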

Stefanello et al. (2016) [5]: each agent aims to travel from her origin node to her destination node, and her action space is the set of k shortest paths from her origin to her destination.
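An action set of this kind can be built as in the following sketch, which enumerates all simple paths on a small hypothetical graph and keeps the k cheapest. The graph, the link costs, and the function name `k_shortest_routes` are assumptions for illustration. Exhaustive enumeration is only practical on small networks; a dedicated k-shortest-paths algorithm such as Yen's would normally be used on realistic ones.

```python
def k_shortest_routes(graph, origin, destination, k):
    """Return the k cheapest simple paths as (cost, path) pairs.

    graph maps each node to a dict {successor: link cost}.
    """
    routes = []

    def extend(node, path, cost):
        if node == destination:
            routes.append((cost, path))
            return
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no repeated nodes)
                extend(nxt, path + [nxt], cost + weight)

    extend(origin, [origin], 0.0)
    routes.sort(key=lambda item: item[0])
    return routes[:k]


# Hypothetical network with free-flow link costs.
graph = {
    "O": {"A": 2.0, "B": 4.0},
    "A": {"B": 1.0, "D": 7.0},
    "B": {"D": 3.0},
    "D": {},
}

# Each returned path is one action available to an agent travelling O -> D.
actions = k_shortest_routes(graph, "O", "D", k=2)
print(actions)
```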

Ramos et al. (2018) [2]: the route choice behavior is modeled as a multiagent reinforcement learning scheme based on the action regret.
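As a rough illustration of a regret-based signal, the sketch below tracks the running average reward of each route and defines the regret of a route as the gap between the best average observed so far and that route's own average. This estimator, the class name `RegretTracker`, and the sample rewards are assumptions made for illustration; the estimator used in [2] may differ in its details.

```python
class RegretTracker:
    """Running-average regret estimate per action (illustrative assumption)."""

    def __init__(self, n_actions):
        self.sums = [0.0] * n_actions
        self.counts = [0] * n_actions

    def observe(self, action, reward):
        """Record the reward (e.g. negative travel time) of a taken action."""
        self.sums[action] += reward
        self.counts[action] += 1

    def mean(self, action):
        """Average reward of an action, or None if it was never tried."""
        if self.counts[action] == 0:
            return None
        return self.sums[action] / self.counts[action]

    def action_regret(self, action):
        """Best observed average reward minus this action's average reward."""
        own = self.mean(action)
        if own is None:
            return 0.0
        known = [self.mean(a) for a in range(len(self.sums))
                 if self.counts[a] > 0]
        return max(known) - own


tracker = RegretTracker(n_actions=2)
tracker.observe(0, -20.0)
tracker.observe(0, -24.0)
tracker.observe(1, -18.0)
print(tracker.action_regret(0), tracker.action_regret(1))
```

An agent could then use the negative of this estimate as its learning reward, so that minimizing regret and maximizing reward coincide.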

Zhou et al. (2020) [1]: the Bush-Mosteller (B-M) reinforcement learning (RL) scheme is introduced to model the route choice behavior of travelers in traffic networks, who seek the routes that minimize their individual travel time. The optimal route choice strategy is characterized by the Nash equilibrium of the congestion game.
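The flavor of a Bush-Mosteller style update can be sketched as follows. Instead of value estimates, each traveler keeps a probability vector over routes and shifts probability toward the route just used, in proportion to a normalized reward in [0, 1]. This is a generic linear reward-inaction form and not necessarily the exact update in [1]; the function name, the learning rate `lam`, and the mapping from travel time to reward are assumptions for illustration.

```python
def bush_mosteller_update(probs, chosen, reward, lam=0.5):
    """Shift choice probability toward the chosen route.

    probs  : current route-choice probabilities (sums to 1)
    chosen : index of the route just travelled
    reward : normalized reward in [0, 1] (higher means shorter travel time)
    lam    : learning rate in (0, 1]
    """
    updated = []
    for j, p in enumerate(probs):
        if j == chosen:
            updated.append(p + lam * reward * (1.0 - p))
        else:
            updated.append(p - lam * reward * p)
    return updated


def normalized_reward(travel_time, worst_time):
    """Assumed mapping from travel time to [0, 1]; not taken from [1]."""
    return max(0.0, 1.0 - travel_time / worst_time)


probs = [0.5, 0.5]
r = normalized_reward(travel_time=10.0, worst_time=50.0)
probs = bush_mosteller_update(probs, chosen=0, reward=r)
print(probs)
```

The update keeps the probabilities summing to one, and a reward of zero leaves them unchanged, which is why this form is called reward-inaction.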

Table 1

| Paper | Action set | Reward | Algorithm |
|---|---|---|---|
| Grunitzki et al. (2014) [4] | Outbound links from the nodes | Negative travel time | Individual & difference Q-learning |
| Stefanello et al. (2016) [5] | k shortest routes from origin to destination | Negative travel time | Independent tabular Q-learning |
| Ramos et al. (2018) [2] | k shortest routes from origin to destination | App-based regret (anticipated disutility) | Independent tabular Q-learning |
| Zhou et al. (2020) [1] | Feasible routes from origin to destination | Negative travel time | Bush-Mosteller RL scheme |


  1. B. Zhou, Q. Song, Z. Zhao, and T. Liu, “A reinforcement learning scheme for the equilibrium of the in-vehicle route choice problem based on congestion game,” Appl. Math. Comput., vol. 371, 2020.
  2. G. de O. Ramos, A. L. C. Bazzan, and B. C. da Silva, “Analysing the impact of travel information for minimising the regret of route choice,” Transp. Res. Part C Emerg. Technol., vol. 88, pp. 257–271, 2018.
  3. Z. Shou, X. Chen, Y. Fu, and X. Di, “Multi-agent reinforcement learning for Markov routing games: A new modeling paradigm for dynamic traffic assignment,” Transp. Res. Part C Emerg. Technol., vol. 137, p. 103560, 2022.
  4. R. Grunitzki, G. de O. Ramos, and A. L. C. Bazzan, “Individual versus difference rewards on reinforcement learning for route choice,” Proc. 2014 Brazilian Conf. Intell. Syst. (BRACIS 2014), pp. 253–258, 2014.
  5. F. Stefanello, B. C. da Silva, and A. L. C. Bazzan, “Using topological statistics to bias and accelerate route choice: Preliminary findings in synthetic and real-world road networks,” CEUR Workshop Proc., vol. 1678, 2016.