Connected and Automated Vehicles


Contributors: Zhiyun Deng and Qi Luo.

This page is mainly based on the surveys by Haydari et al. [1], Aradi [2], and Kiran et al. [3].


Connected and automated vehicles (CAVs), equipped with advanced on-board sensors, controllers, actuators, and other devices, can perceive environmental information and make intelligent driving decisions [4]. With the support of Dedicated Short-Range Communication (DSRC) and cellular networks, CAVs can communicate with each other (Vehicle-to-Vehicle, V2V), with road infrastructure (Vehicle-to-Infrastructure, V2I), and even with non-motorized road users such as pedestrians (Vehicle-to-Everything, V2X). At a sufficiently high level of automation, CAVs can drive themselves without human input, relying only on their on-board computer control systems. Hence, CAVs are seen as the future of transportation, with the potential to improve safety by reducing and mitigating traffic accidents. With the advancement of CAVs, researchers have become interested in the benefits of vehicle platooning at traffic facilities, e.g., speed harmonization on highway segments [5], cooperative merging at highway on-ramps [6], cooperative eco-driving at signalized intersections [7], and automated coordination at unsignalized intersections [8].

In a fully autonomous traffic stream, all vehicles are CAVs whose motion is deterministic and can be controlled by a centralized decision-maker. Hence, the coordination problem of CAVs can be solved by classical optimization-based approaches (e.g., mathematical programming [5], evolutionary algorithms [9]). In mixed traffic, however, the motion of Human-Driven Vehicles (HDVs) is highly random [10], which motivates researchers to develop decentralized control algorithms. Recently, Reinforcement Learning (RL) has been regarded as a promising solution to the autonomous driving problem.

Connection to Reinforcement Learning (RL) methods

In general, an Autonomous Driving (AD) system consists of several standard blocks, as shown in Fig. 1. Automated vehicles are required to build an intermediate-level representation of the environment state (e.g., a bird's-eye view map of all obstacles and agents) from multiple sensors, which is then used by the decision-making system to produce a driving policy. Most perception-level tasks can now be accomplished with high precision thanks to Deep Learning (DL) architectures. However, the AD system still includes multiple tasks to which classical supervised learning methods are no longer applicable. Specifically, for tasks that can be formulated as a sequential decision process (e.g., choosing the optimal driving speed, optimal trajectory, or optimal lane-changing decision), Reinforcement Learning (RL) is a promising solution in the domains of driving policy, predictive perception, path and motion planning, and low-level controller design.

Figure 1: Standard components in a modern Autonomous Driving (AD) system [3].

Although deep neural networks make it possible to develop an end-to-end solution in which the AD system operates like a human driver (i.e., its inputs are the travel destination, knowledge of the road network, and various sensor information, and its outputs are the direct vehicle control commands), such a scheme is complicated to realize in practice because it must handle all layers of the driving task. Moreover, such an end-to-end system behaves like a black box, which raises design and validation problems. Hence, some researchers decompose the original task into at least four sub-tasks [11], as shown in Fig. 2, and focus on solving these sub-tasks of the AD problem. In detail, the route planning layer defines the waypoints of the journey based on the map of the road network, possibly using real-time traffic data [12]. In the behavioral layer, the agent decides on the short-term policy by taking into account the local road topology, traffic rules, and the perceived state of other traffic participants [13]. Given a finite set of available actions for the driving context, this layer is usually realized as a finite state machine whose states are basic strategies (e.g., car following, lane changing) with well-defined transitions between them based on changes in the environment. To carry out the strategy defined by the behavioral layer, the motion planning layer needs to design a feasible trajectory consisting of the expected speed, yaw, and position states of the vehicle over a short horizon [14]. Naturally, the vehicle dynamics have to be considered at this level; hence classic exact solutions of motion planning are impractical, since they usually assume holonomic dynamics. At the lowest level, the local feedback control is responsible for minimizing the deviation from the prescribed path or trajectory.

Figure 2: Decomposed layers of the Autonomous Driving (AD) task [2].

Vehicle Modeling

Modeling the movement of the ego-vehicle is a crucial part of the training process because it involves a trade-off between model accuracy and computational cost. Since RL techniques use a massive number of episodes to determine an optimal policy, the step time of the environment, which depends heavily on the evaluation time of the vehicle dynamics model, profoundly affects training time.

Therefore, during environment design, one must choose among models of increasing fidelity: from the simplest kinematic model, through the 2 Degree of Freedom (2DoF) lateral dynamics model [15], to more complex models with a larger number of parameters and complicated tire models (e.g., 3DoF and 9DoF). Although a simplified kinematic model can behave significantly differently from an actual vehicle [16], its accuracy is still suitable for many control situations [15]. Traffic and surrounding vehicles, by contrast, are often represented by cellular automata models [17] or car-following models such as the Intelligent Driver Model (IDM) [18].
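As a minimal sketch of the two modelling levels discussed above, the following Python snippet advances an ego vehicle with a kinematic bicycle model of the form described in [15] and computes the acceleration of a surrounding vehicle with the IDM [18]. The class name, function names, time step, axle distances, and IDM parameter values are all illustrative assumptions, not values taken from the cited papers.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float    # position along the global x-axis [m]
    y: float    # position along the global y-axis [m]
    yaw: float  # heading angle [rad]
    v: float    # speed [m/s]


def bicycle_step(s, accel, steer, dt=0.1, lf=1.2, lr=1.6):
    """One explicit-Euler step of the kinematic bicycle model.

    lf and lr are the distances from the centre of gravity to the front
    and rear axles; the default values are assumptions for illustration.
    """
    # Slip angle at the centre of gravity for a front-steered vehicle
    beta = math.atan(lr / (lf + lr) * math.tan(steer))
    return VehicleState(
        x=s.x + s.v * math.cos(s.yaw + beta) * dt,
        y=s.y + s.v * math.sin(s.yaw + beta) * dt,
        yaw=s.yaw + s.v / lr * math.sin(beta) * dt,
        v=max(0.0, s.v + accel * dt),  # no reversing in this sketch
    )


def idm_accel(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=1.5, s0=2.0, delta=4.0):
    """IDM acceleration of a follower.

    v   : follower speed [m/s]
    gap : bumper-to-bumper distance to the leader [m], must be positive
    dv  : approach rate, follower speed minus leader speed [m/s]
    The remaining arguments are assumed parameter values.
    """
    # Desired gap; clipping the dynamic part at zero is a common variant
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    ego = VehicleState(x=0.0, y=0.0, yaw=0.0, v=10.0)
    ego = bicycle_step(ego, accel=0.5, steer=0.02)
    print(ego, idm_accel(v=20.0, gap=30.0, dv=2.0))
```

Both functions cost only a handful of floating-point operations per step, which is why such simple models are attractive when an environment has to be stepped many times during training.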

Deep RL Setting

The schematic of the RL process is depicted in Fig. 3. An agent, controlled by an algorithm, observes the system state at each time step, takes an action based on its current policy, and receives a reward from the environment, which then transitions to the next state. After every interaction, the RL agent updates its knowledge about the environment. Although either the vehicle or the traffic light can be defined as the agent, the vehicle agent is a convenient modelling choice because its state, action, and reward are straightforward to define. In other words, each CAV can be treated as an agent, since it is able to sense the environment, communicate with others, and make intelligent decisions. In the following sections, we discuss the deep RL configurations (i.e., state, action, and reward definitions and the neural network structure) together with the traffic simulators used in the literature.

Figure 3: Reinforcement learning control loop [1].
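The control loop can be sketched in a few lines of Python. To keep the example self-contained, it uses a hypothetical one-dimensional car-following task (`ToyFollowEnv`) and tabular Q-learning in place of a deep network; the environment, its reward, and all hyperparameters are assumptions made purely for illustration.

```python
import random


class ToyFollowEnv:
    """Hypothetical 1-D task: hold a target gap to a leader at constant speed."""

    ACTIONS = (-1.0, 0.0, 1.0)  # ego acceleration choices [m/s^2]

    def __init__(self, dt=1.0, target_gap=20.0, horizon=50):
        self.dt, self.target_gap, self.horizon = dt, target_gap, horizon

    def reset(self):
        self.gap, self.rel_speed, self.t = 30.0, 0.0, 0
        return self._observe()

    def _observe(self):
        # Coarse discretisation so that a table can index the state
        return (int(self.gap // 5), int(round(self.rel_speed)))

    def step(self, action_index):
        accel = self.ACTIONS[action_index]
        self.rel_speed -= accel * self.dt   # rel_speed = leader speed - ego speed
        self.gap += self.rel_speed * self.dt
        self.t += 1
        crashed = self.gap <= 0.0
        reward = -100.0 if crashed else -abs(self.gap - self.target_gap) / self.target_gap
        done = crashed or self.t >= self.horizon
        return self._observe(), reward, done


def train(episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = ToyFollowEnv()
    q = {}        # maps a state to a list of action values
    returns = []  # undiscounted return of each episode
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            values = q.setdefault(state, [0.0] * len(env.ACTIONS))
            # Epsilon-greedy action selection from the current policy
            if rng.random() < epsilon:
                action = rng.randrange(len(env.ACTIONS))
            else:
                action = max(range(len(values)), key=values.__getitem__)
            # The environment returns the next state and a reward
            next_state, reward, done = env.step(action)
            next_values = q.setdefault(next_state, [0.0] * len(env.ACTIONS))
            # Q-learning update: the agent revises its knowledge after each interaction
            target = reward + (0.0 if done else gamma * max(next_values))
            values[action] += alpha * (target - values[action])
            state, total = next_state, total + reward
        returns.append(total)
    return q, returns


if __name__ == "__main__":
    q_table, episode_returns = train()
    print(len(q_table), episode_returns[-1])
```

In a deep RL setting, the table `q` would be replaced by a neural network that maps the observed state to action values or directly to actions, while the observe-act-reward-update loop keeps the same shape.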

State Spaces

While humans use their senses (e.g., sight and hearing) to drive, AVs use sensors (e.g., cameras and radars), as shown in Fig. 4. Hence, the quality of the sensors plays a key role in building successful AVs: even the best perception algorithms can perform poorly if the data collected from the corresponding sensors are not reliable. A comprehensive review of the state and action representations used in autonomous driving research is provided in [19]. In a nutshell, commonly used state space features for an autonomous vehicle include the position, heading, and velocity of the ego-vehicle, as well as those of other obstacles within its sensor range. Moreover, a Cartesian or polar occupancy grid around the ego-vehicle is frequently employed to avoid variations in the dimension of the state space. This is further augmented with lane information such as the lane number (ego-lane or others), path curvature, the past and future trajectory of the ego-vehicle, longitudinal information such as Time-to-Collision (TTC), and finally scene information such as traffic laws and signal locations.

Figure 4: Sensors used in Autonomous Vehicles [20].
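As a rough illustration of how the features listed above might be assembled into a fixed-length state vector, the sketch below combines ego-vehicle kinematics, the relative state of a leading vehicle, lane information, and a capped TTC. The dictionary layout, the feature ordering, and the 10-second TTC cap are assumptions made for this example only.

```python
import math


def time_to_collision(gap, ego_speed, lead_speed):
    """TTC in seconds with respect to a leader in the same lane.

    Returns infinity when the ego vehicle is not closing the gap.
    """
    closing_speed = ego_speed - lead_speed
    if closing_speed <= 0.0:
        return math.inf
    return gap / closing_speed


def build_state(ego, lead, lane_index, curvature, ttc_cap=10.0):
    """Assemble a flat feature vector.

    ego and lead are dicts with keys "x", "y", "heading", "speed"
    (a hypothetical layout). The TTC is capped so that the feature
    stays finite and bounded.
    """
    dx, dy = lead["x"] - ego["x"], lead["y"] - ego["y"]
    gap = math.hypot(dx, dy)
    ttc = min(time_to_collision(gap, ego["speed"], lead["speed"]), ttc_cap)
    return [
        ego["x"], ego["y"], ego["heading"], ego["speed"],  # ego kinematics
        dx, dy, lead["speed"] - ego["speed"],              # relative state of the leader
        float(lane_index), curvature,                      # lane information
        ttc,                                               # longitudinal risk feature
    ]


if __name__ == "__main__":
    ego = {"x": 0.0, "y": 0.0, "heading": 0.0, "speed": 15.0}
    lead = {"x": 20.0, "y": 0.0, "heading": 0.0, "speed": 10.0}
    print(build_state(ego, lead, lane_index=1, curvature=0.0))
```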

In addition, using raw sensor data such as camera images, LiDAR, or radar provides finer contextual information, while using condensed, abstracted data reduces the complexity of the state space. In between, a mid-level representation such as a 2D bird's-eye view (BEV) is sensor-agnostic but still close to the spatial organization of the scene. Fig. 5 illustrates a top-down view showing an occupancy grid, past and projected trajectories, and semantic information about the scene such as the position of traffic lights. This intermediate format retains the spatial layout of roads, whereas graph-based representations would not.

Figure 5: Bird's-Eye View (BEV) 2D representation of a driving scenario. Left: an occupancy grid. Right: the combination of semantic information with past and projected trajectories [3].
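A Cartesian occupancy grid of the kind shown on the left of Fig. 5 can be sketched as follows. The grid size, cell size, and indexing convention (row index grows with distance ahead of the ego vehicle, column index grows towards its left) are assumptions for this example; obstacles are reduced to points for simplicity.

```python
import math


def occupancy_grid(ego_x, ego_y, ego_yaw, obstacles, size=8, cell=2.0):
    """Return a size x size grid of 0/1 values centred on the ego vehicle.

    obstacles is a list of (x, y) points in the global frame. The output
    has a fixed shape no matter how many obstacles are present.
    """
    grid = [[0] * size for _ in range(size)]
    half = size * cell / 2.0
    cos_yaw, sin_yaw = math.cos(ego_yaw), math.sin(ego_yaw)
    for ox, oy in obstacles:
        dx, dy = ox - ego_x, oy - ego_y
        # Rotate the offset into the ego frame (forward, left)
        forward = dx * cos_yaw + dy * sin_yaw
        left = -dx * sin_yaw + dy * cos_yaw
        # Ignore obstacles outside the grid extent
        if -half <= forward < half and -half <= left < half:
            row = int((forward + half) // cell)
            col = int((left + half) // cell)
            grid[row][col] = 1
    return grid


if __name__ == "__main__":
    for row in occupancy_grid(0.0, 0.0, 0.0, [(3.0, 0.0), (-5.0, 2.5), (100.0, 0.0)]):
        print(row)
```

Because the grid shape is fixed, the state dimension does not change with the number of surrounding vehicles, which is the property that motivates this representation.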

Action Spaces

A vehicle policy must control a number of different actuators, and the choice of action space depends strongly on the vehicle model and on the task defined for the reinforcement learning problem in each study. Continuous-valued actuators for vehicle control include steering angle, throttle, and brake, while other actuators such as gear changes are discrete. To reduce complexity and allow the application of DRL algorithms that work only with discrete action spaces (e.g., DQN), an action space may be discretised uniformly by dividing the range of continuous actuators such as steering angle, throttle, and brake into equal-sized bins. Discretisation in log-space has also been suggested, since many of the steering angles selected in practice are close to the centre [21].
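The two discretisation schemes can be compared with a short sketch. The helper names, the geometric spacing used for the log-space variant, and the `min_angle` parameter are assumptions chosen for illustration and are not taken from [21].

```python
def uniform_bins(low, high, n):
    """n equally spaced action values covering [low, high] (n >= 2)."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def log_spaced_steering(max_angle, n_per_side, min_angle=0.01):
    """Symmetric steering values that are denser near the centre.

    The positive side is spaced geometrically between min_angle and
    max_angle (n_per_side >= 2); the negative side mirrors it and a
    zero-steering action is added in the middle.
    """
    ratio = (max_angle / min_angle) ** (1.0 / (n_per_side - 1))
    positive = [min_angle * ratio ** i for i in range(n_per_side)]
    return [-a for a in reversed(positive)] + [0.0] + positive


def nearest_action(value, actions):
    """Index of the discrete action closest to a continuous command."""
    return min(range(len(actions)), key=lambda i: abs(actions[i] - value))


if __name__ == "__main__":
    throttle_actions = uniform_bins(0.0, 1.0, 5)
    steering_actions = log_spaced_steering(max_angle=0.5, n_per_side=4)
    print(throttle_actions)
    print(steering_actions)
    print(nearest_action(0.03, steering_actions))
```

With the same number of actions, the log-spaced set offers finer resolution for small steering corrections at the price of coarser steps near the steering limits.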

However, discretisation has disadvantages: it can lead to jerky or unstable trajectories if the step values between actions are too large. Furthermore, when selecting the number of bins for an actuator, there is a trade-off between having enough discrete steps to allow for smooth control and not having so many steps that action selection becomes prohibitively expensive to evaluate. As an alternative to discretisation, continuous actuator values may be handled by DRL algorithms that learn a policy directly (e.g., DDPG). The options framework for temporal abstraction [22] may also be employed to simplify action selection: agents select options instead of low-level actions, where each option represents a sub-policy that can extend a primitive action over multiple time steps. Additionally, some papers combine the control and behavioral layers by separating longitudinal and lateral tasks, so that longitudinal acceleration is a direct command while lane changing is a strategic decision, as in [23].


During training, the agent tries to fulfill a task that generally consists of more than one step. One attempt at the task is called an episode. An episode ends if one of the following conditions is met: (1) the agent successfully fulfills the task; (2) the episode reaches a predefined number of steps; (3) a terminating condition arises. The first two cases are trivial and depend on the design of the actual problem. Terminating conditions are typically situations in which the agent reaches a state from which the task can no longer be fulfilled, or in which the agent makes an unacceptable mistake. For example, vehicle planning agents usually use terminating conditions such as a collision with other participants or obstacles, or leaving the track or lane, since these two inevitably end the episode. There are also milder approaches in which the episode terminates with failure before an accident occurs, for example when the tangent angle to the track becomes too high or the vehicle gets too close to other participants. These pre-accident terminations speed up training by bringing the information of failure forward in time, though their design requires caution [24].
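The termination logic described above could be organised as in the following sketch. The enum members, the argument names, and the threshold values (lane half-width, safe gap, maximum heading error) are hypothetical and would need tuning for a specific task.

```python
from enum import Enum


class Termination(Enum):
    RUNNING = "running"
    SUCCESS = "success"              # case (1): task fulfilled
    TIME_LIMIT = "time_limit"        # case (2): step budget exhausted
    COLLISION = "collision"          # case (3): unacceptable mistake
    OFF_LANE = "off_lane"            # case (3): left the track or lane
    EARLY_FAILURE = "early_failure"  # case (3): stopped before an accident


def check_termination(step, max_steps, reached_goal, min_gap, lateral_offset,
                      heading_error, lane_half_width=1.75, safe_gap=2.0,
                      max_heading_error=0.5):
    """Classify the current step; all thresholds are assumed values.

    min_gap        : distance to the closest other participant [m]
    lateral_offset : signed distance from the lane centre [m]
    heading_error  : angle between vehicle heading and track tangent [rad]
    """
    if min_gap <= 0.0:
        return Termination.COLLISION
    if abs(lateral_offset) > lane_half_width:
        return Termination.OFF_LANE
    if reached_goal:
        return Termination.SUCCESS
    # Pre-accident termination: brings the failure signal forward in time
    if min_gap < safe_gap or abs(heading_error) > max_heading_error:
        return Termination.EARLY_FAILURE
    if step >= max_steps:
        return Termination.TIME_LIMIT
    return Termination.RUNNING


if __name__ == "__main__":
    print(check_termination(step=12, max_steps=500, reached_goal=False,
                            min_gap=1.2, lateral_offset=0.3, heading_error=0.05))
```

Failure checks are evaluated before the success and time-limit checks so that a collision on the final step is still recorded as a failure.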

The reward evaluates the quality of the choices the agent made during the episode and provides the feedback used to improve the policy. Designing reward functions for DRL agents for autonomous driving is still very much an open question. Examples of criteria for AD tasks include: distance travelled towards a destination [25], speed of the ego vehicle [25], [26], [27], keeping the ego vehicle at a standstill [28], collisions with other road users or scene objects [25], [26], infractions on sidewalks [25], keeping in lane, maintaining comfort and stability while avoiding extreme acceleration, braking, or steering [27], [28], and following traffic rules [26].
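One common way to combine such criteria is a weighted sum of per-step terms. The sketch below is only an illustration of that idea: the choice of terms, their functional forms, and the default weights are assumptions and do not reproduce the reward of any cited paper.

```python
def driving_reward(progress, speed, target_speed, collided, off_lane,
                   accel, steer_rate, weights=None):
    """Weighted-sum reward for one time step.

    progress   : distance travelled towards the destination this step [m]
    speed      : current ego speed [m/s]
    collided   : True if a collision occurred this step
    off_lane   : True if the vehicle left its lane this step
    accel      : longitudinal acceleration [m/s^2], penalised for comfort
    steer_rate : steering rate [rad/s], penalised for comfort
    weights    : optional dict overriding the assumed default weights
    """
    w = {"progress": 1.0, "speed": 0.5, "collision": 100.0,
         "off_lane": 10.0, "comfort": 0.1}
    if weights:
        w.update(weights)
    # Penalise relative deviation from the target speed
    speed_term = -abs(speed - target_speed) / target_speed
    # Penalise harsh acceleration, braking, and steering
    comfort_term = -(accel ** 2 + steer_rate ** 2)
    return (w["progress"] * progress
            + w["speed"] * speed_term
            + w["comfort"] * comfort_term
            - w["collision"] * float(collided)
            - w["off_lane"] * float(off_lane))


if __name__ == "__main__":
    print(driving_reward(progress=1.2, speed=12.0, target_speed=15.0,
                         collided=False, off_lane=False,
                         accel=0.8, steer_rate=0.05))
```

The relative size of the weights determines which behaviour dominates; for instance, a collision weight that is too small relative to the progress weight may encourage risky driving.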

Simulation Environments

Reinforcement learning requires an environment from which state-action pairs can be collected while modelling the dynamics of the vehicle and its surroundings, as well as the stochasticity in the movement of the environment and the actions of the agent. For modelling the traffic environment, the most popular choice is SUMO (Simulation of Urban MObility), a microscopic, inter- and multi-modal, space-continuous and time-discrete traffic flow simulation platform [42]. It can convert networks from other traffic simulators such as VISUM, Vissim, or MATSim, and it also reads other standard digital road network formats such as OpenStreetMap or OpenDRIVE. It also provides interfaces to several environments, such as Python, MATLAB, .NET, and C++. In addition, various other simulators are actively used for training and validating reinforcement learning algorithms, as shown in Table 1.

Table 1: Simulators for RL Applications
Simulator | Description
CARLA | Urban simulator; camera and LiDAR streams with depth and semantic segmentation; location information
TORCS | Racing simulator; camera stream and agent positions; used for testing vehicle control policies
Gazebo (ROS) | Multi-robot physics simulator employed for path planning and vehicle control in complex 2D and 3D maps
SUMO | City-scale traffic modelling; used together with motion planning simulators
DeepDrive | Driving simulator based on Unreal Engine, providing a multi-camera (eight) stream with depth
Flow | Multi-agent traffic control simulator built on top of SUMO
SMARTS | Simulation environment used to create realistic and diverse interactions that enable deeper and broader research on multi-agent interaction
MetaDrive | Driving simulation platform developed to support research on generalizable reinforcement learning algorithms for machine autonomy
OpenCDA | Generalized framework and tool for developing and testing Cooperative Driving Automation systems, developed by the UCLA Mobility Lab


  1. Haydari, A., & Yilmaz, Y. (2020). Deep reinforcement learning for intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems.
  2. Aradi, S. (2020). Survey of deep reinforcement learning for motion planning of autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems.
  3. Kiran, B. R., Sobh, I., Talpaert, V., et al. (2021). Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems.
  4. Wang, Y., Cai, P., & Lu, G. (2020). Cooperative autonomous traffic organization method for connected automated vehicles in multi-intersection road networks. Transportation Research Part C: Emerging Technologies, 111, 458-476.
  5. Li, X., Ghiasi, A., Xu, Z., & Qu, X. (2018). A piecewise trajectory optimization model for connected automated vehicles: Exact optimization algorithm and queue propagation analysis. Transportation Research Part B: Methodological, 118, 429-456.
  6. Xu, H., Feng, S., Zhang, Y., & Li, L. (2019). A grouping-based cooperative driving strategy for CAVs merging problems. IEEE Transactions on Vehicular Technology, 68(6), 6125-6136.
  7. Wang, Z., Wu, G., & Barth, M. J. (2019). Cooperative eco-driving at signalized intersections in a partially connected and automated vehicle environment. IEEE Transactions on Intelligent Transportation Systems, 21(5), 2029-2038.
  8. Qian, B., Zhou, H., Lyu, F., Li, J., Ma, T., & Hou, F. (2019). Toward collision-free and efficient coordination for automated vehicles at unsignalized intersection. IEEE Internet of Things Journal, 6(6), 10408-10420.
  9. Deng, Z., Fan, J., Shi, Y., & Shen, W. (2022). A Coevolutionary Algorithm for Cooperative Platoon Formation of Connected and Automated Vehicles. IEEE Transactions on Vehicular Technology.
  10. Shi, H., Zhou, Y., Wu, K., Wang, X., Lin, Y., & Ran, B. (2021). Connected automated vehicle cooperative control with a deep reinforcement learning approach in a mixed traffic environment. Transportation Research Part C: Emerging Technologies, 133, 103421.
  11. Paden, B., Čáp, M., Yong, S. Z., Yershov, D., & Frazzoli, E. (2016). A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent Vehicles, 1(1), 33-55.
  12. Bast, H., Delling, D., Goldberg, A., Müller-Hannemann, M., Pajor, T., Sanders, P., ... & Werneck, R. F. (2016). Route planning in transportation networks. In Algorithm engineering (pp. 19-80). Springer, Cham.
  13. Dou, Y., Yan, F., & Feng, D. (2016, July). Lane changing prediction at highway lane drops using support vector machine and artificial neural network classifiers. In 2016 IEEE international conference on advanced intelligent mechatronics (AIM) (pp. 901-906). IEEE.
  14. Saxena, D. M., Bae, S., Nakhaei, A., Fujimura, K., & Likhachev, M. (2020, May). Driving in dense traffic with model-free reinforcement learning. In 2020 IEEE International Conference on Robotics and Automation (ICRA) (pp. 5385-5392). IEEE.
  15. Kong, J., Pfeiffer, M., Schildbach, G., & Borrelli, F. (2015, June). Kinematic and dynamic vehicle models for autonomous driving control design. In 2015 IEEE intelligent vehicles symposium (IV) (pp. 1094-1099). IEEE.
  16. Polack, P., Altché, F., d'Andréa-Novel, B., & de La Fortelle, A. (2017, June). The kinematic bicycle model: A consistent model for planning feasible trajectories for autonomous vehicles?. In 2017 IEEE intelligent vehicles symposium (IV) (pp. 812-818). IEEE.
  17. You, C., Lu, J., Filev, D., & Tsiotras, P. (2019). Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robotics and Autonomous Systems, 114, 1-18.
  18. Treiber, M., Hennecke, A., & Helbing, D. (2000). Congested traffic states in empirical observations and microscopic simulations. Physical Review E, 62(2), 1805.
  19. Leurent, E. (2018). A survey of state-action representations for autonomous driving.
  20. Elallid, B. B., Benamar, N., Hafid, A. S., Rachidi, T., & Mrani, N. (2022). A Comprehensive Survey on the Application of Deep and Reinforcement Learning Approaches in Autonomous Driving. Journal of King Saud University-Computer and Information Sciences.
  21. Xu, H., Gao, Y., Yu, F., & Darrell, T. (2017). End-to-end learning of driving models from large-scale video datasets. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2174-2182).
  22. Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial intelligence, 112(1-2), 181-211.
  23. Nageshrao, S., Tseng, H. E., & Filev, D. (2019, October). Autonomous highway driving using deep reinforcement learning. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) (pp. 2326-2331). IEEE.
  24. Alizadeh, A., Moghadam, M., Bicer, Y., Ure, N. K., Yavas, U., & Kurtulus, C. (2019, October). Automated lane change decision making using deep reinforcement learning in dynamic and uncertain highway environment. In 2019 IEEE intelligent transportation systems conference (ITSC) (pp. 1399-1404). IEEE.
  25. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017, October). CARLA: An open urban driving simulator. In Conference on robot learning (pp. 1-16). PMLR.
  26. Li, C., & Czarnecki, K. (2018). Urban driving with multi-objective deep reinforcement learning. arXiv preprint arXiv:1811.08586.
  27. Kardell, S., & Kuosku, M. (2017). Autonomous vehicle control via deep reinforcement learning (Master's thesis).
  28. Chen, J., Yuan, B., & Tomizuka, M. (2019, October). Model-free deep reinforcement learning for urban autonomous driving. In 2019 IEEE intelligent transportation systems conference (ITSC) (pp. 2765-2771). IEEE.