
Contributors: Zhiyun Deng and Qi Luo.

This page is mainly based on the survey by Khan et al. [1] and the articles by Arabi et al. [2] and Qu et al. [3].

Problem Statement


Intelligent Transportation Systems (ITS) combine emerging communication, computer, and system technologies to deliver intelligent road traffic services and optimize decision making within the transportation infrastructure. The advancement of Connected and Autonomous Vehicles (CAVs), which generate dynamic data through wireless communications, enables ITS to improve their efficiency, especially in Traffic Signal Control (TSC). It is noteworthy that the ubiquitous communications and computing capabilities of CAVs and the ITS infrastructure are critical to the operation of Level 3–5 CAVs. However, this pervasive connectivity broadens the scope of criminal activity in both physical and cyberspace, generating new security concerns in the form of cyber-attacks.

Potential CAV hackers might be 1) motivated by personal or financial objectives, 2) technology professionals looking for vulnerabilities, or 3) hacktivists. In addition, cyber-attacks can be classified into two types: safety-related and non-safety-related attacks [4]. The former are directly related to life-threatening situations such as car accidents, while the latter are more concerned with privacy, efficiency, and fairness. Cyber-attacks on vehicles have already been launched successfully, e.g., leaking sensitive information from a Tesla Autonomous Vehicle (AV) [5], triggering an incorrect GPS location in a Google car [6], and deactivating a BMW car diagnostic system [7]. Hence, there is a great deal of uncertainty about whether the advantages of CAVs can be delivered, raising concerns that autonomous vehicle technology will underperform its promises, particularly with respect to safety and security.

A system dynamics model for cybersecurity assessment is provided in [1]. It integrates concepts covering several facets: 1) the CAV communication framework, 2) secured physical access, 3) human factors, 4) CAV penetration, 5) regulatory laws and policy framework, and 6) trust across the industry (OEMs).

Figure 1: Potential drivers for a cyber-safe CAV-based ITS [1].

Security of Traffic Signal Control Systems

In general, an ITS involves six main components: vehicles, roadway reporting, traffic flow controls, payment applications, management systems, and communication applications [8]. Data analysis in TSC systems relies heavily on real-time traffic data flows. This reliance increases security risk, because manipulated data can degrade the security and safety of the whole system.

The following are some of the major attack scenarios on TSC systems [2]:

(1) Data and time spoofing attack: CAVs usually broadcast basic messages that include their location and speed to other vehicles and the road infrastructure. Hence, an attacker can simply install a location spoofing application to misreport the true location and fake real-time trajectory data [9]. Similarly, in a time spoofing attack, an attacker vehicle attempts to alter its reported arrival time to create chaos within the entire ITS.

(2) Replay attack: An attacker can capture data, e.g., a GPS signal, at one location and resend the same data at a different location at a later time, providing the controllers with fake sensor information that leads to suboptimal decisions.

(3) Denial of Service (DoS) attack: DoS attacks can target the traffic light controller to set the lights to an undesired state, e.g., all lights to red [10], causing traffic jams.

(4) Man-in-the-Middle (MITM) attack: An attacker intercepts the communication between two legitimate parties to eavesdrop secretly, steal personal information, or alter and resend the message traveling between the two parties [11]. Adversaries can launch MITM attacks through various interactions such as V2V and V2I communication.

(5) Sybil attack: Generally, each node in a wireless communication network has a distinct identity. However, a malicious node can exploit this property by sending messages using multiple fake identities [12]. In that case, a dishonest vehicle can exploit multiple fake identities to increase the number of vehicles waiting at a traffic intersection.
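
As a toy illustration of the Sybil scenario in item (5), the following Python sketch shows how fake identities can inflate the reported vehicle count on one approach. The longest-queue controller, the message format (vehicle ID, approach), and all names are hypothetical and are not taken from [2] or [12]; a real TSC system would use a far richer control policy.

```python
from collections import Counter


def count_vehicles(messages):
    """Count distinct vehicle IDs reported on each approach."""
    seen_ids = set()
    counts = Counter()
    for vehicle_id, approach in messages:
        if vehicle_id not in seen_ids:  # each identity is counted once
            seen_ids.add(vehicle_id)
            counts[approach] += 1
    return counts


def longest_queue_phase(counts):
    """Hypothetical naive controller: serve the approach reporting most vehicles."""
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(counts), key=lambda approach: counts[approach])


# Three honest vehicles wait on the north approach, one on the east approach.
honest_messages = [("v1", "north"), ("v2", "north"), ("v3", "north"), ("v4", "east")]
# A single dishonest vehicle on the east approach broadcasts five fake identities.
sybil_messages = [(f"fake{i}", "east") for i in range(5)]

phase_honest = longest_queue_phase(count_vehicles(honest_messages))
phase_attacked = longest_queue_phase(count_vehicles(honest_messages + sybil_messages))
print(phase_honest, phase_attacked)
```

Because the controller cannot tell fake identities from real ones, deduplicating by ID does not help: every Sybil identity is distinct, so the east approach appears to hold six vehicles and wins the green phase.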

Connection to Deep Reinforcement Learning (DRL) methods

In existing DRL-based Adaptive Traffic Signal Control Systems (ATCS), the agent (i.e., the decision-maker) collects traffic state information from nearby vehicles and then produces optimal actions (e.g., switching phases) accordingly. The DRL model thus fully “trusts” that vehicles are sending true information, which makes the ATCS vulnerable to adversarial attacks with falsified information. Hence, as shown in Fig. 2, attackers (e.g., a group of colluding vehicles marked in red) can cooperatively send falsified information to “cheat” a DRL-based ATCS in order to reduce their total travel time.

Figure 2: Attacking ATCS with colluding vehicles [3].

Hence, the attack can be formulated as a reinforcement learning problem, as shown in Fig. 3, where a single attack agent attempts to inject fake vehicles at a four-way, three-lane intersection to sabotage the decisions of the ATCS. It is assumed that the traffic signal agent has been trained to set its signal phase to optimize traffic flow, while the underlying control policy is unknown to all vehicles and to the attacker.

Figure 3: Sybil attack scenarios based on RL [2].

State Space

To add perturbations to the traffic system, the attack agent requires access to intersection information, including the number of vehicles on each road as well as the traffic signal state and timing.
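
A minimal sketch of how such an observation could be encoded as a flat feature vector is given here. The approach names, the two-phase signal plan, and the class and field names are all assumptions made for illustration; neither [2] nor [3] is claimed to use this exact encoding.

```python
from dataclasses import dataclass
from typing import Dict, List

APPROACHES = ["north", "east", "south", "west"]  # assumed four-way intersection
PHASES = ["NS_green", "EW_green"]                # assumed two-phase signal plan


@dataclass
class IntersectionObservation:
    """Hypothetical container for what the attack agent observes."""
    vehicle_counts: Dict[str, int]  # number of vehicles on each road
    current_phase: str              # traffic signal state
    phase_elapsed_s: float          # signal timing: seconds in current phase

    def to_vector(self) -> List[float]:
        """Flatten counts, a one-hot phase encoding, and timing into one list."""
        counts = [float(self.vehicle_counts.get(a, 0)) for a in APPROACHES]
        phase_one_hot = [1.0 if p == self.current_phase else 0.0 for p in PHASES]
        return counts + phase_one_hot + [self.phase_elapsed_s]


obs = IntersectionObservation({"north": 3, "east": 1}, "NS_green", 12.0)
state = obs.to_vector()
print(state)
```

The resulting vector has one entry per approach, one per phase, and one for the elapsed time, so its length stays fixed regardless of how many vehicles are present.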

Action Space

After receiving an observation, the agent can send falsified data indicating the presence of more than one vehicle in order to attract more “attention” from the traffic signal in charge. For example, in [2] an action corresponds to the injection of a number of Sybil vehicles into the network. In [3], the action space of each agent is defined as the set of possible numbers that the agent can report at each time step.
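
The following sketch illustrates a discrete action space of this kind, where each action is a number of fake vehicles added to the count reported for one approach. The upper bound MAX_SYBIL, the function name, and the dictionary-based count format are hypothetical choices for illustration and do not come from [2] or [3].

```python
MAX_SYBIL = 4  # assumed upper bound on injected vehicles; not taken from [2]
ACTIONS = list(range(MAX_SYBIL + 1))  # action k means "inject k fake vehicles"


def apply_injection(reported_counts, approach, action):
    """Return a perturbed copy of the reported counts; the input is not modified."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action}")
    perturbed = dict(reported_counts)
    perturbed[approach] = perturbed.get(approach, 0) + action
    return perturbed


true_counts = {"north": 3, "east": 1}
perturbed_counts = apply_injection(true_counts, "east", 3)
print(perturbed_counts)
```

Action 0 corresponds to not attacking at that time step, which lets a learned policy choose when an injection is worthwhile.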


Reward Function

The design of the reward function depends on the purpose of the attacker. For example, the Sybil attack in [2] intends to create high-delay traffic flow, so it is reasonable to reward the agent when the vehicles' travel time increases. By contrast, in [3] the reward is defined as the negative average waiting time over all running vehicles, with the goal of minimizing the total travel time of all colluding vehicles. Therefore, when agents maximize their reward, their waiting time is reduced.
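
The two reward designs can be contrasted in a short sketch. The second function follows the stated definition (negative average waiting time); the first is only one plausible way to reward increased travel time, and its exact form, like all function and variable names here, is an assumption rather than the formulation used in [2].

```python
def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def sybil_attack_reward(travel_times_before, travel_times_after):
    """Assumed form: positive when the mean travel time has increased."""
    return mean(travel_times_after) - mean(travel_times_before)


def collusion_reward(waiting_times):
    """Negative average waiting time over the given vehicles."""
    return -mean(waiting_times)


# Mean travel time rises from 15 s to 25 s, so the Sybil attacker is rewarded.
r_sybil = sybil_attack_reward([10.0, 20.0], [20.0, 30.0])
# Vehicles wait 4 s and 6 s, so the reward is minus their 5 s average.
r_collusion = collusion_reward([4.0, 6.0])
print(r_sybil, r_collusion)
```

The signs are opposite by design: the first attacker benefits from delay imposed on traffic, while the colluding vehicles benefit from shorter waits.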

Insights and Suggestions

These research outcomes could help improve the reliability and robustness of ATCS and better protect smart mobility systems. The results in [3] demonstrate that vehicles can effectively reduce their waiting time at intersections by forming collusion groups and attacking DRL-based ATCS with falsified information. The implications are significant, as plenty of resources are being invested in developing connected traffic environments, and the scenario assumed in [3] may come true in the near future.

To prevent such collusion from happening in the real world, the following suggestions are provided in [3]: 1) A strict certification mechanism for connected vehicles is critical to address this problem fundamentally, although other issues such as privacy concerns may make this process challenging. 2) Updating the ATCS policy frequently should help prevent attacks. For an ATCS powered by a DRL model with an elaborate design (e.g., MA2C), colluding vehicles need a certain amount of time to learn the environment and launch successful attacks (i.e., 1000 episodes of training may take at least a year in real life). 3) From an algorithmic perspective, equipping ATCS with real-time anomaly detection mechanisms may enable immediate recognition of attacks so that effective countermeasures can be taken in time. Moreover, robust DRL, a popular research area, can mitigate the effects of falsified information.
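
One very simple form of the anomaly detection mentioned in suggestion 3) is a plausibility check that compares vehicle counts reported by vehicles with counts from an independent source. The sketch assumes that such an independent source exists (for example, roadside detectors) and uses a hypothetical fixed tolerance; neither assumption comes from [3], and practical detectors would likely be statistical or learned.

```python
def flag_anomalies(reported_counts, detected_counts, tolerance=2):
    """Return approaches where reported vehicles exceed detected ones by more than tolerance."""
    flagged = []
    for approach, reported in reported_counts.items():
        detected = detected_counts.get(approach, 0)  # missing reading counts as zero
        if reported - detected > tolerance:
            flagged.append(approach)
    return sorted(flagged)


# Counts claimed by vehicle messages versus counts from the assumed detectors.
reported = {"north": 3, "east": 6}
detected = {"north": 3, "east": 1}
suspicious = flag_anomalies(reported, detected)
print(suspicious)
```

A check like this only catches over-reporting on a single approach; it is meant to show the idea of cross-validating reported data, not to serve as a complete defense.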


  1. Khan, S. K., Shiwakoti, N., & Stasinopoulos, P. (2022). A conceptual system dynamics model for cybersecurity assessment of connected and autonomous vehicles. Accident Analysis & Prevention, 165, 106515.
  2. Arabi, N. S., Halabi, T., & Zulkernine, M. (2021, July). Reinforcement Learning-driven Attack on Road Traffic Signal Controllers. In 2021 IEEE International Conference on Cyber Security and Resilience (CSR) (pp. 218-225). IEEE.
  3. Qu, A., Tang, Y., & Ma, W. (2021). Attacking Deep Reinforcement Learning-Based Traffic Signal Control Systems with Colluding Vehicles. arXiv preprint arXiv:2111.02845.
  4. Wang, P., Wu, X., & He, X. (2020). Modeling and analyzing cyberattack effects on connected automated vehicular platoons. Transportation research part C: emerging technologies, 115, 102625.
  5. Prevost, S., & Kettani, H. (2019, October). On data privacy in modern personal vehicles. In Proceedings of the 4th International Conference on Big Data and Internet of Things (pp. 1-4).
  6. Hirz, M., & Walzel, B. (2018). Sensor and object recognition technologies for self-driving cars. Computer-aided design and applications, 15(4), 501-508.
  7. Tan, H., Choi, D., Kim, P., Pan, S., & Chung, I. (2017). Comments on “dual authentication and key management techniques for secure data transmission in vehicular ad hoc networks”. IEEE Transactions on Intelligent Transportation Systems, 19(7), 2149-2151.
  8. Huq, N., Vosseler, R., & Swimmer, M. (2017). Cyberattacks against intelligent transportation systems. TrendLabs Research Paper.
  9. Vinayaga-Sureshkanth, N., Wijewickrama, R., Maiti, A., & Jadliwala, M. (2020, March). Security and privacy challenges in upcoming intelligent urban micromobility transportation systems. In Proceedings of the Second ACM Workshop on Automotive and Aerial Vehicle Security (pp. 31-35).
  10. Li, Z., Jin, D., Hannon, C., Shahidehpour, M., & Wang, J. (2016). Assessing and mitigating cybersecurity risks of traffic light systems in smart cities. IET Cyber‐Physical Systems: Theory & Applications, 1(1), 60-69.
  11. Al-Kahtani, M. S. (2012, December). Survey on security attacks in vehicular ad hoc networks (VANETs). In 2012 6th international conference on signal processing and communication systems (pp. 1-9). IEEE.
  12. Douceur, J. R. (2002, March). The sybil attack. In International workshop on peer-to-peer systems (pp. 251-260). Springer, Berlin, Heidelberg.