Navigating in Unmanned Aerial Vehicle

From RL for transport research

Contributors: Zixuan Xu, Jingbang Chen, Qi Luo.
This page is mainly based on the survey by Fadi AlMahamid and Katarina Grolinger[1].

Problem statement

An Unmanned Aerial Vehicle (UAV) is an aircraft without a human pilot, commonly known as a drone. Autonomous UAVs have been receiving increasing interest due to their diverse applications, such as delivering packages to customers, responding to traffic collisions to attend to injured people with medical needs, tracking military targets, and assisting with search and rescue operations. Typically, UAVs are equipped with cameras, among other sensors, that collect information from the surrounding environment, enabling them to navigate that environment autonomously. UAV navigation training is typically conducted in a virtual 3D environment because UAVs have limited computation resources and power supply, and replacing UAV parts after crashes can be expensive.

Connection to RL

Different Reinforcement Learning (RL) algorithms are used to train UAVs to navigate the environment autonomously. RL can solve various problems in which the agent acts as a human expert in the domain. The agent interacts with the environment by processing the environment's state, responding with an action, and receiving a reward. UAV cameras and sensors capture information from the environment for the state representation. The agent processes the captured state and outputs an action that determines the direction of the UAV's movement or controls the propellers' thrust, as illustrated in Fig. 1.

Figure 1: UAV training using a deep reinforcement learning agent[1].
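The state, action, reward cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyCorridorEnv`, `run_episode`, and `always_forward` are hypothetical names that do not come from the survey, and the one-dimensional corridor stands in for the 3D simulators used in practice.

```python
class ToyCorridorEnv:
    """Hypothetical 1-D corridor: the UAV starts in cell 0 and must reach the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # state: index of the current cell

    def step(self, action):
        # action is -1 (move back) or +1 (move forward); the position is clamped
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode of the state -> action -> reward loop; return the total reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent maps the state to an action
        state, reward, done = env.step(action)  # environment returns new state and reward
        total_reward += reward
        if done:
            break
    return total_reward


def always_forward(state):
    """Trivial fixed policy used only to exercise the loop (no learning)."""
    return 1


episode_return = run_episode(ToyCorridorEnv(length=5), always_forward)
```

A learning algorithm would replace `always_forward` with a policy that is updated from the observed rewards; the loop itself stays the same.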

The RL agent design for UAV navigation depicted in Fig. 2 shows the different UAV input devices used to capture the state processed by the RL agent. The agent produces action values that can be either the movement values of the UAV or the waypoint values to which the UAV needs to relocate. Once the agent executes the action in the environment, it receives the new state and the reward generated for the performed action. The reward function is designed to generate the reward according to the intended objective, using various information from the environment. The agent design ('Agent' box in the figure) is influenced by the RL algorithm, since the agent's components and inner workings vary from one algorithm to another.

Figure 2: RL agent design for the UAV navigation task[1].
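As an illustration of a reward function tied to an objective, the sketch below rewards progress toward a goal point, penalizes collisions, and ends the episode in either case. The function name `navigation_reward`, its parameters, and all numeric values are assumptions chosen for this example; the survey does not prescribe a specific reward.

```python
import math


def navigation_reward(prev_pos, new_pos, goal, collided,
                      goal_radius=1.0, step_penalty=0.01):
    """Return (reward, done) for one transition of a hypothetical navigation task.

    prev_pos, new_pos, goal: (x, y, z) tuples in the same units.
    collided: True if the simulator reported a collision on this step.
    """
    if collided:
        return -1.0, True  # crash: negative reward, episode ends

    prev_dist = math.dist(prev_pos, goal)
    new_dist = math.dist(new_pos, goal)

    if new_dist <= goal_radius:
        return 1.0, True  # close enough to the goal: positive reward, episode ends

    # Otherwise reward the distance gained toward the goal, minus a small
    # per-step cost that discourages hovering in place.
    progress = prev_dist - new_dist
    return progress - step_penalty, False
```

For instance, moving from `(0, 0, 0)` to `(1, 0, 0)` with the goal at `(10, 0, 0)` gains one unit of distance, so the reward is the progress minus the step penalty and the episode continues.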

Main results

Most of the research focuses on two UAV navigation objectives: (1) obstacle avoidance using various UAV sensor devices such as cameras and LiDARs, and (2) path planning to find the optimal or shortest route.

Obstacle avoidance

Avoiding obstacles is essential for a UAV to navigate any environment. It can be achieved by estimating the distance to objects in the environment using devices such as front-facing cameras or distance sensors. The output of these devices reflects the states that the UAV passes through over time. It serves as the input to the RL agent, which selects actions that move the UAV in different directions to avoid obstacles, and it plays a significant role in the neural network (NN) architecture. The NN architecture of the RL agent depends on (1) the input type, (2) the number of inputs, and (3) the algorithm used. For example, processing RGB or depth-map images with the DQN algorithm requires a Convolutional Neural Network (CNN) followed by fully connected layers, since CNNs are well suited to processing images. In contrast, event-based images are processed using Spiking Neural Networks (SNNs), which are designed to handle spatiotemporal data and identify spatiotemporal patterns (Salvatore et al., 2020)[2].
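To make the link between input type and architecture concrete, the sketch below computes how many features a stack of convolutional layers hands to the first fully connected layer for a given image size. The layer settings (32, 64, and 64 filters with kernels 8, 4, 3 and strides 4, 2, 1 on an 84x84 input) are an assumption borrowed from the commonly used Atari DQN layout, not taken from the survey; a UAV camera pipeline may use different values.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial size of one dimension after a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(height, width, conv_layers):
    """Number of values passed from the last conv layer to the first dense layer.

    conv_layers: list of (out_channels, kernel, stride) tuples, applied in order.
    """
    channels = 0
    for out_channels, kernel, stride in conv_layers:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
        channels = out_channels
    return channels * height * width


# Assumed layer settings (Atari-style DQN), used here only as an example.
ATARI_STYLE_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

features = flattened_features(84, 84, ATARI_STYLE_LAYERS)
print(features)  # 84 -> 20 -> 9 -> 7 per side, so 64 * 7 * 7 = 3136
```

Changing the input resolution or stacking several frames changes this number, which is one way the input type and number of inputs shape the network.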

Path planning

Autonomous UAVs must have a well-defined objective before executing a flying mission. Typically, the goal is to fly from a start point to a destination point, as with delivery drones. However, the goal can also be more sophisticated, such as performing surveillance by hovering over a geographical area or participating in search and rescue operations to find a missing person. Autonomous UAV navigation requires path planning to find the best UAV path that achieves the flying objective while avoiding obstacles. The optimal path is not always the shortest path or a straight line between two points; instead, the UAV aims to find a safe path while considering its limited power and the flying mission. Path planning can be divided into two main types:

  • Global Path Planning: concerned with planning the path from the start point to the destination point in an attempt to select the optimal path.
  • Local Path Planning: concerned with planning the locally optimal waypoints in an attempt to avoid static and dynamic obstacles while considering the final destination.

Path planning can be solved using different techniques and algorithms; the focus here is on RL techniques for global and local path planning, where the RL agent receives information from the environment and outputs the optimal waypoints according to the reward function. RL techniques can be classified according to how they use the environment's local information: (1) map-based navigation and (2) mapless navigation.
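A minimal sketch of map-based global path planning with RL is tabular Q-learning on a small grid whose obstacle cells are known in advance. Everything in it is an assumption made for illustration: the 4x4 map, the obstacle positions, the reward values, the hyperparameters, and the function names `step`, `train`, and `greedy_path` are hypothetical and are not taken from the survey, which covers deep RL methods on far richer inputs.

```python
import random

ROWS, COLS = 4, 4
START, GOAL = (0, 0), (3, 3)
OBSTACLES = {(1, 1), (1, 2), (2, 1)}          # assumed known map
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply a move on the map and return (next_state, reward, done)."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < ROWS and 0 <= col < COLS) or (row, col) in OBSTACLES:
        return state, -5.0, False  # blocked: stay in place and pay a penalty
    if (row, col) == GOAL:
        return (row, col), 10.0, True
    return (row, col), -1.0, False  # ordinary move: small cost per step


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from START; return the visited cells."""
    path, state = [START], START
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
print(path)
```

The returned cells play the role of waypoints. In a mapless setting the agent would not have `OBSTACLES` available and would have to act on sensor readings instead.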


  1. AlMahamid, F., & Grolinger, K. (2022). Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement Learning: A Systematic Review. Engineering Applications of Artificial Intelligence, 115, 105321.
  2. Salvatore, N., Mian, S., Abidi, C., & George, A. D. (2020). A neuro-inspired approach to intelligent collision avoidance and navigation. In: IEEE Digital Avionics Systems Conference, pp. 1–9.