Dynamic traffic assignment
Contributors: Zheng Li and Qi Luo.
This page is mainly based on Bazzan  and Shou .
The traffic assignment, which seeks to reproduce the way drivers select their routes, assumes that each driver is aware of a number of routes to travel from an origin to a destination, that it performs some experimentation, and that it selects rationally the route with the highest utility. This is the basis for many approaches that, in an iterative way, vary the combination of route choices in order to find one that maximizes the utility. This perspective is therefore a centralized, aggregate one. In reality, though, drivers may perform en-route experimentation, i.e., they deviate from the originally planned route.
Connection to RL method
While traveling from an origin to a destination, one needs to select a sequence of road segments. Thus, en-route path choice is inherently a sequential decision-making process, in which at each junction one decides what the next link to take. Markov decision processes (MDP) and reinforcement learning (RL) have become popular tools to model such a sequential process, in which travelers are rational agents to optimize a prescribed objective function (or reward). In these studies, every agent needs to decide which outbound link to choose whenever the agent arrives at a node, until the agent reaches some terminal node. This en-route decision-making process captures the adaptive behavior of real drivers.HK123456
Single agent Q-learning approach
Mao and Shen (2018) : update the state action values (Q function) based on the information gathered by travelling in the network. And then infer from the Q function to get the best routing policy.
Unfortunately, single-agent RL fails to capture the competition among adaptive agents. Paper Methods State Action set Reward Algorithm Mao and Shen (2018) (time, node, congestion status) Successor nodes Negative travel time Tabular Q-learning
Multi & independent agents RL
Grunitzki et al. (2014) : apply multiagent reinforcement learning to simulate the effects of the agents’ route choice based on two re- ward functions, and apply them using the reinforcement learning (RL) algorithm Q-learning.
Bazzan and Grunitzki (2016) : individual drivers are considered as active and autonomous agents, which, build these trips by experimentation during the actual trip. Agents learn their routes by deciding, at each node, how to continue their trips to each one's destination, in a way to minimize their travel times.
These studies assume that every agent is an independent learner who has no information of other agents. Such assumption suffers from two issues. First, it is impractical to assume that agents select routes independently without accounting for interactions with other agents and congestion effects on transportation networks. Second, theoretical convergence guarantee for Q-learning does not hold, because the environment is no longer Markovian and stationary.
|Grunitzki et al. (2014) ||(node)||Outbound links||Negative travel time (Individual and systematic)||Independent tabular Q-learning|
|Bazzan and Grunitzki (2016) ||(node)||Outbound links||Negative travel time||Independent tabular Q-learning Tabular|
Multi & dependent agents RL
Shou et al. (2022) : develop a Markov routing game (MRG) in which each agent learns and updates her own en-route path choice policy while interacting with others in transportation networks. Formulate MRG as multi-agent reinforcement learning (MARL) and devise a mean field multi-agent deep Q learning (MF-MA-DQL) approach that captures the competition among agents.
|Shou et al. (2022) ||(time, node)||Outbound links||Negative travel cost||Mean-field deep Q-learning|
- ↑ 1.0 1.1 1.2 1.3 Bazzan, A.L.C., Grunitzki, R., 2016. A multiagent reinforcement learning approach to en-route trip building. In: 2016 International Joint Conference on Neural Networks (IJCNN). pp. 5288–5295, ISSN: 2161-4407.
- ↑ 2.0 2.1 2.2 Shou, Z., Chen, X., Fu, Y., Di, X., 2022. Multi-agent reinforcement learning for Markov routing games: A new modeling paradigm for dynamic traffic assignment. Transp. Res. Part C Emerg. Technol. 137, 103560.
- ↑ 3.0 3.1 Grunitzki, R., Ramos, G.d.O., Bazzan, A.L.C., 2014. Individual versus difference rewards on reinforcement learning for route choice. In: 2014 Brazilian Conference on Intelligent Systems. pp. 253–258.