Travel behaviour modeling

Contributors: Zhicheng Jin and Qi Luo.

This page is mainly based on the studies by Yiru Liu [1] and Anderson Rocha Tavares [2].

Problem statement

Day-to-day route choice is a repetitive decision process. Each traveler is an agent with the ability to learn and make decisions. Because of bounded rationality and environmental uncertainty, travelers follow a stochastic route choice rule. When no external information is provided, travelers do not know each other's route choices or travel times. Expected travel time (ET) and perceived travel time (PT) are formed from each traveler's own experience. After each trip, travelers evaluate their choice by comparing ET with PT and then repeatedly update the probabilities with which the paths are chosen.
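
As an illustration of this day-to-day learning loop, the minimal Python sketch below simulates a single traveler who chooses among three hypothetical routes with a logit rule over expected travel times and then nudges ET toward the perceived travel time after each trip. The update rule, congestion term, and parameter values are illustrative assumptions, not the cited authors' exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_routes = 3
eta = 0.3                                    # learning rate for updating ET (assumed)
theta = 0.5                                  # logit scale; smaller -> more exploration (assumed)
free_flow = np.array([20.0, 25.0, 30.0])     # hypothetical free-flow travel times (minutes)
ET = free_flow.copy()                        # traveler's expected travel time for each route

def choice_probs(ET, theta):
    """Stochastic route choice rule: logit over negative expected travel times."""
    u = -theta * ET
    e = np.exp(u - u.max())
    return e / e.sum()

for day in range(50):
    p = choice_probs(ET, theta)
    r = rng.choice(n_routes, p=p)            # today's route choice
    # Perceived travel time PT: experienced time (toy congestion term) plus perception noise
    PT = free_flow[r] * (1.0 + 0.4 * p[r]) + rng.normal(0.0, 2.0)
    # The traveler compares PT with ET and updates the experience for the chosen route
    ET[r] += eta * (PT - ET[r])

print("expected travel times after 50 days:", np.round(ET, 1))
print("route choice probabilities:", np.round(choice_probs(ET, theta), 2))
```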

From the perspective of theoretical modeling, a departure time choice model is indispensable for depicting how travelers choose when to travel. The literature on departure time choice behavior modeling can broadly be divided into two categories: random-utility Discrete Choice Models (DCM) and machine learning models [3]. Existing random-utility DCM research mainly includes the mixed logit model [4], the latent class choice model [5], and related specifications. A shortcoming of DCMs is that they assume data collection is independent of travelers' behavior [6]. That is, when travelers interact with the environment (i.e., the transportation system), the system feedback can change their choice behavior, yet such changes cannot be captured by DCMs. Besides, some DCMs, such as the multinomial logit model, rely on the strong assumption of independence of irrelevant alternatives (IIA), which may not hold for departure time choice behavior. In addition, using Dynamic Discrete Choice Models (DDCMs) to depict travelers' choice behavior has become prevalent in recent years. The DDCM was first proposed by Rust [7]; it is a model-based approach that requires solving Bellman's equation to estimate a set of parameters. DDCMs mainly address the challenges of defining and computing the expected maximum utility from a node in the graph to the nodes representing the alternatives, and of deriving the choice probabilities from it. Given the difficulty of these problems, DDCMs have mostly been adopted to model route choice behavior: road network structures are well defined, which makes such models simpler to solve and estimate [8]. However, departure time choice models do not have standardized structures, and the choices are highly interdependent. Furthermore, the complexity of the information that may influence departure time choice makes establishing an accurate numerical model difficult. Accordingly, it is desirable to adopt a model-free approach that can condense all of this information into its trained parameters.
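
To make the expected-maximum-utility computation concrete, the sketch below evaluates the value-function (logsum) recursion on a tiny hypothetical acyclic network, assuming i.i.d. Gumbel error terms as in recursive-logit-style DDCMs. The network, link utilities, and scale parameter are illustrative assumptions and are not taken from the cited papers.

```python
import math
from functools import lru_cache

# Hypothetical acyclic network: node -> list of (next node, deterministic link utility)
links = {
    "O": [("A", -2.0), ("B", -3.0)],
    "A": [("D", -4.0)],
    "B": [("D", -2.5)],
    "D": [],                 # destination
}
mu = 1.0                     # scale of the i.i.d. Gumbel error terms (assumed)

@lru_cache(maxsize=None)
def V(node):
    """Expected maximum utility from `node` to the destination (the logsum recursion)."""
    if not links[node]:
        return 0.0
    return (1.0 / mu) * math.log(
        sum(math.exp(mu * (v + V(nxt))) for nxt, v in links[node])
    )

def link_choice_probs(node):
    """Choice probabilities over outgoing links, given downstream expected maximum utilities."""
    vals = [mu * (v + V(nxt)) for nxt, v in links[node]]
    m = max(vals)
    e = [math.exp(x - m) for x in vals]
    return {nxt: ei / sum(e) for (nxt, _), ei in zip(links[node], e)}

print("V(O) =", round(V("O"), 3))
print("P(next link | O) =", link_choice_probs("O"))
```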

Connection to RL methods

RL has predominantly been used for route choice modelling [2][9][10]. As mentioned earlier, route choice involves a sequence of decisions leading to the destination rather than a single decision, which suits the RL framework very well.

Beyond route choice, RL has mostly been used for activity schedule modelling. One of the earlier attempts was by Charypar and Nagel [11], who used Q-learning to generate activity plans that include the activity type and its temporal attributes. Vanhulsel et al. [12] used Q-learning to simulate activity sequences, approximating the Q-values with CART decision trees; this improvement not only yields better solutions but also speeds up estimation. Yang et al. [13] use Q-learning to model individuals' activity-travel patterns, including destination choice and the timing of activities and travel. After each time step, the environment is updated to incorporate the effect of each individual's decision, so the interaction between agents is captured. By simulating the model, aggregate traffic characteristics are obtained and compared with survey data. Taking this approach further, Yang et al. [13] use RL to evaluate the effect of staggered working hours on activity schedules, and hence on travel demand, highlighting the suitability of RL for policy evaluation. A drawback of Q-learning is that the number of Q-values to be learnt and stored grows exponentially as the number of states or actions increases. Furthermore, when small changes occur in the environment, Q-learning models need to be retrained from scratch [12].
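
For concreteness, here is a tabular Q-learning sketch for a toy departure time problem (state = current time slot, actions = wait one slot or depart now), with an assumed congestion profile and schedule-delay penalty; none of these modeling choices come from the cited studies. It also makes the drawback above visible: the Q-table holds one entry per state-action pair and would grow quickly with a richer state description.

```python
import numpy as np

rng = np.random.default_rng(1)

n_slots = 12                 # discretized time-of-day slots (toy assumption)
preferred = 8                # preferred arrival slot (toy assumption)
# Assumed congestion profile with a peak around slot 6
congestion = 1.0 + 2.0 * np.exp(-0.5 * ((np.arange(n_slots) - 6) / 2.0) ** 2)

Q = np.zeros((n_slots, 2))   # one Q-value per (state, action): the table grows with both
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(slot, action):
    """Toy environment: action 0 waits one slot at a small cost; action 1 departs and ends
    the episode with a reward penalizing congestion and schedule delay."""
    if action == 0 and slot < n_slots - 1:
        return slot + 1, -0.1, False
    travel = congestion[slot] + rng.normal(0.0, 0.1)
    delay = abs((slot + 1) - preferred)
    return slot, -(travel + 0.5 * delay), True

for episode in range(5000):
    slot, done = 0, False
    while not done:
        a = rng.integers(2) if rng.random() < eps else int(Q[slot].argmax())
        nxt, r, done = step(slot, a)
        target = r if done else r + gamma * Q[nxt].max()
        Q[slot, a] += alpha * (target - Q[slot, a])
        slot = nxt

# Follow the greedy policy: keep waiting while "wait" has the higher Q-value
slot = 0
while slot < n_slots - 1 and Q[slot, 0] > Q[slot, 1]:
    slot += 1
print("learned departure slot:", slot)
```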

As noted above, previous studies of departure time choice behavior modeling can generally be divided into random-utility discrete choice models and machine learning models. In Liu et al. [1], a laboratory virtual experiment is conducted as an inexpensive way to collect real travelers' departure time choice data, and an Inverse Reinforcement Learning (IRL) method is proposed to capture travelers' preferences over departure times. In IRL, the weights of the reward function are learned from the observed data to describe departure time choice behavior, thereby rationalizing the strategies of real travelers [1].
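
The following is a minimal sketch of the idea behind IRL with a linear reward, assuming a maximum-entropy (softmax) formulation over a single departure time decision: the weights are adjusted until the feature expectations of the induced policy match those of the observed choices. The feature matrix, the "true" weights used to simulate data, and the learning rate are illustrative assumptions; the formulation in [1] may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical feature matrix: one row per candidate departure time,
# columns = [expected travel time (min), schedule delay (slots), incentive offered]
phi = np.array([
    [30.0, 3.0, 0.0],
    [45.0, 1.0, 0.0],
    [25.0, 5.0, 2.0],
    [35.0, 0.0, 1.0],
])
true_w = np.array([-0.10, -0.30, 0.80])   # "true" preference weights, used only to simulate data

def policy(w):
    """Softmax (maximum-entropy) departure time policy induced by the reward r(t) = w . phi(t)."""
    v = phi @ w
    e = np.exp(v - v.max())
    return e / e.sum()

# Simulate observed departure time choices of many travelers
obs = rng.choice(len(phi), size=2000, p=policy(true_w))
empirical_features = phi[obs].mean(axis=0)

# IRL as feature matching: adjust w until the expected features under the induced
# policy match the empirical feature expectations of the observed choices.
w = np.zeros(3)
for _ in range(3000):
    expected_features = policy(w) @ phi
    w += 0.01 * (empirical_features - expected_features)   # gradient of the log-likelihood

print("recovered reward weights:", np.round(w, 2))
print("true weights used for simulation:", true_w)
```

The recovered weights can then be read directly as the relative (dis)utility of travel time, schedule delay, and incentives, which is the interpretability advantage emphasized in the next section.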

Main results

In this study [1], an Inverse Reinforcement Learning (IRL) model is established to investigate individual departure time choice under the influence of incentives and to evaluate alternative incentive schemes for shifting departure times. Compared with Discrete Choice Models (DCM), the IRL model can account for the interaction between the agent and the environment and can fully exploit the information contained in the data. Moreover, the learnt weights of the reward function in IRL provide a trustworthy explanation of travelers' choice behavior, in contrast with the black box conveyed by other machine learning methods. Because the parameters are adjusted automatically during the interactive learning process, the model also avoids the subjectivity of setting parameters manually.

FIGURE 1. Structure of data collection, RL modeling and IRL modeling.

References

  1. Liu, Y., Li, Y., Qin, G., Tian, Y., & Sun, J. (2022). Understanding the behavioral effect of incentives on departure time choice using inverse reinforcement learning. Travel Behaviour and Society, 29, 113-124.
  2. Tavares, A. R., & Bazzan, A. L. (2012). Reinforcement learning for route choice in an abstract traffic scenario. Paper presented at the VI Workshop-Escola de Sistemas de Agentes, seus Ambientes e Aplicações (WESAAC).
  3. Zhu, Z., Chen, X., Xiong, C., & Zhang, L. (2018). A mixed Bayesian network for two-dimensional decision modeling of departure time and mode choice. Transportation, 45, 1499-1522.
  4. Bwambale, A., Choudhury, C. F., & Hess, S. (2019). Modelling departure time choice using mobile phone data. Transportation Research Part A: Policy and Practice, 130, 424-439.
  5. Wang, Y., Wang, Y., & Choudhury, C. (2020). Modelling heterogeneity in behavioral response to peak-avoidance policy utilizing naturalistic data of Beijing subway travelers. Transportation Research Part F: Traffic Psychology and Behaviour, 73, 92-106.
  6. Feng, J., Huang, S., & Chen, C. (2020). Modeling user interaction with app-based reward system: A graphical model approach integrated with max-margin learning. Transportation Research Part C: Emerging Technologies, 120, 102814.
  7. Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55, 999-1033.
  8. Mai, T., Frejinger, E., Fosgerau, M., & Bastin, F. (2017). A dynamic programming approach for quickly estimating large network-based MEV models. Transportation Research Part B: Methodological, 98, 179-197.
  9. Wei, F., Ma, S., & Jia, N. (2014). A day-to-day route choice model based on reinforcement learning. Mathematical Problems in Engineering, 2014.
  10. Zhang, Z., & Xu, J.-M. (2005). A dynamic route guidance arithmetic based on reinforcement learning. Paper presented at the 2005 International Conference on Machine Learning and Cybernetics.
  11. Charypar, D., & Nagel, K. (2005). Generating complete all-day activity plans with genetic algorithms. Transportation, 32(4), 369-397.
  12. Vanhulsel, M., Janssens, D., Wets, G., & Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach. Expert Systems with Applications, 36(4), 8032-8039.
  13. Yang, M., Yang, Y., Wang, W., Ding, H., & Chen, J. (2014). Multiagent-based simulation of temporal-spatial characteristics of activity-travel patterns using interactive reinforcement learning. Mathematical Problems in Engineering, 2014.