Adaptive QoS Routing Algorithm for UAV Networks Based on Reinforcement Learning

Introduction

Unmanned Aerial Vehicle (UAV) networks have gained significant attention due to their flexibility, rapid deployment, and self-organizing capabilities. These networks are widely used in both military and civilian applications, including surveillance, disaster management, and communication relay. However, ensuring reliable communication in UAV networks remains challenging due to their highly dynamic nature. UAVs move at speeds ranging from 30 to 460 km/h, leading to frequent topology changes, intermittent connectivity, and increased latency and packet loss. Traditional routing protocols, originally designed for static or moderately mobile networks, struggle to adapt to these conditions, particularly in scenarios requiring strict Quality of Service (QoS) guarantees.

To address these challenges, this paper proposes an adaptive QoS routing algorithm for UAV networks based on Q-learning, referred to as QBQR (Q-learning Based QoS-aware Routing). Unlike conventional routing protocols that rely on fixed metrics such as hop count, QBQR dynamically adjusts routing decisions by considering real-time network conditions, including link delay and packet loss rate. Additionally, the algorithm incorporates node mobility information to enhance routing stability. Simulation results demonstrate that QBQR outperforms existing routing protocols in terms of end-to-end delay, packet delivery rate, and routing overhead.

Challenges in UAV Network Routing

UAV networks exhibit unique characteristics that complicate routing protocol design. First, the high mobility of UAVs leads to rapid and unpredictable topology changes. Unlike ground-based mobile ad hoc networks (MANETs) or vehicular ad hoc networks (VANETs), UAVs operate in three-dimensional space, further increasing the complexity of link maintenance. Second, UAV networks often operate in harsh environments, particularly in military applications, where interference and node failures are common. These factors contribute to increased latency, packet loss, and control overhead.

Existing routing protocols for UAV networks can be broadly categorized into proactive, reactive, and hybrid approaches. Proactive protocols, such as Optimized Link State Routing (OLSR), maintain up-to-date routing tables but suffer from high control overhead. Reactive protocols, such as Ad hoc On-Demand Distance Vector (AODV), reduce overhead by discovering routes on demand but introduce additional latency during route establishment. Hybrid protocols attempt to balance these trade-offs but still struggle to adapt to the extreme dynamics of UAV networks.

Reinforcement Learning for UAV Routing

Reinforcement learning (RL) offers a promising solution to these challenges by enabling UAVs to learn optimal routing strategies through interaction with the network environment. Q-learning, a model-free RL algorithm, is particularly well-suited for this task. In Q-learning, each UAV (acting as an agent) maintains a Q-table that stores the expected cumulative reward (or penalty) for taking specific actions (e.g., forwarding packets to a neighbor) in different states (e.g., current node and destination). The agent updates its Q-values based on feedback from the environment, gradually converging to an optimal routing policy.
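
For concreteness, the sketch below (Python, not taken from the paper) shows what such a Q-table and a greedy next-hop lookup might look like. The node identifiers and stored values are hypothetical, and the Q-values are treated as costs to minimize, matching the penalty-based convention QBQR adopts later.

    # Illustrative Q-table for a UAV forwarding agent:
    # state = (current_node, destination), action = candidate next-hop neighbor.
    # Node names and values are hypothetical.
    q_table = {
        (("uav_3", "uav_9"), "uav_5"): 4.2,  # cost of forwarding toward uav_9 via uav_5
        (("uav_3", "uav_9"), "uav_7"): 2.8,  # lower value -> preferred next hop
    }

    def best_next_hop(q_table, current_node, destination, neighbors):
        """Pick the neighbor with the lowest stored cost for this destination."""
        state = (current_node, destination)
        return min(neighbors, key=lambda n: q_table.get((state, n), 0.0))

    print(best_next_hop(q_table, "uav_3", "uav_9", ["uav_5", "uav_7"]))  # -> uav_7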

Previous studies have explored Q-learning-based routing in UAV networks. For instance, QMR (Q-learning based Multi-objective Routing) optimizes both delay and energy consumption, while DeepCQ+ integrates deep reinforcement learning to improve scalability. However, these approaches often fail to fully account for the distinct characteristics of QoS metrics. Delay is an additive metric, meaning it accumulates along the path, whereas packet loss is multiplicative, as the total packet delivery probability is the product of individual link success rates. QBQR addresses this limitation by designing specialized penalty functions for each metric, ensuring more accurate path selection.
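
The difference between the two metric types can be made concrete with a small example: along a path, per-link delays are summed, whereas per-link delivery probabilities are multiplied, and only their logarithms add. The link values below are hypothetical.

    import math

    # Hypothetical per-link metrics along a three-hop path.
    link_delays = [12.0, 8.0, 15.0]     # milliseconds
    link_delivery = [0.98, 0.95, 0.90]  # per-link delivery probabilities

    path_delay = sum(link_delays)             # additive: 35.0 ms
    path_delivery = math.prod(link_delivery)  # multiplicative: ~0.838
    path_log_delivery = sum(math.log(p) for p in link_delivery)  # additive in log space

    assert abs(math.exp(path_log_delivery) - path_delivery) < 1e-12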

QBQR Algorithm Design

Neighbor Discovery and Maintenance

QBQR employs a hybrid approach for neighbor discovery, combining periodic HELLO messages with data-driven updates. Each UAV broadcasts HELLO messages at fixed intervals, containing its current position and velocity. Neighboring nodes use this information to maintain an up-to-date neighbor table. To minimize control overhead, QBQR supplements HELLO messages with data packet acknowledgments (ACKs). If a node fails to receive ACKs from a neighbor after multiple data transmissions, it removes that neighbor from its table, ensuring rapid detection of link failures.
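
A minimal sketch of this bookkeeping, assuming a dictionary-based neighbor table and an illustrative threshold of three consecutive missed ACKs, might look as follows; the field names and threshold are not specified in the paper.

    import time

    MAX_MISSED_ACKS = 3  # assumed threshold for declaring a link broken

    class NeighborTable:
        """Simplified neighbor bookkeeping combining HELLO updates and ACK feedback."""
        def __init__(self):
            self.neighbors = {}  # node_id -> {position, velocity, missed_acks, last_seen}

        def on_hello(self, node_id, position, velocity):
            # Periodic HELLO carries the sender's current position and velocity.
            self.neighbors[node_id] = {
                "position": position, "velocity": velocity,
                "missed_acks": 0, "last_seen": time.time(),
            }

        def on_ack(self, node_id):
            # A received data ACK refreshes the entry without extra control traffic.
            if node_id in self.neighbors:
                self.neighbors[node_id]["missed_acks"] = 0
                self.neighbors[node_id]["last_seen"] = time.time()

        def on_ack_timeout(self, node_id):
            # Repeated ACK loss is treated as a link failure and the neighbor is removed.
            entry = self.neighbors.get(node_id)
            if entry is None:
                return
            entry["missed_acks"] += 1
            if entry["missed_acks"] >= MAX_MISSED_ACKS:
                del self.neighbors[node_id]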

Link State Estimation

Accurate estimation of link quality is critical for QoS-aware routing. QBQR measures two key metrics:

  1. Link Delay: When Node A sends a data packet to Node B, Node B records the reception time and includes it in the ACK. Node A then calculates the round-trip delay as the difference between the time the packet was sent and the time the ACK was received. To reduce measurement noise, QBQR uses a moving average of recent delay samples.
  2. Packet Loss Rate: Each node monitors the number of packets sent and received within a sliding time window. The packet loss rate is computed as the ratio of lost packets to total transmitted packets. Nodes exchange this information via ACKs, enabling dynamic adaptation to link conditions. A simplified sketch of both estimators follows this list.
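
A simplified version of both estimators, assuming illustrative window sizes of 10 delay samples and 50 transmission outcomes, could be structured as follows.

    from collections import deque

    class LinkEstimator:
        """Per-link delay and loss estimation; window sizes are assumed values."""
        def __init__(self, delay_window=10, loss_window=50):
            self.delay_samples = deque(maxlen=delay_window)  # recent delay samples (ms)
            self.tx_outcomes = deque(maxlen=loss_window)     # True if ACKed, False if lost

        def record_delay(self, send_time, ack_time):
            self.delay_samples.append(ack_time - send_time)

        def record_transmission(self, acked):
            self.tx_outcomes.append(acked)

        def average_delay(self):
            # Moving average over recent samples to smooth measurement noise.
            if not self.delay_samples:
                return None
            return sum(self.delay_samples) / len(self.delay_samples)

        def loss_rate(self):
            # Ratio of lost packets to packets transmitted within the sliding window.
            if not self.tx_outcomes:
                return 0.0
            lost = sum(1 for acked in self.tx_outcomes if not acked)
            return lost / len(self.tx_outcomes)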

Q-Learning Framework

QBQR models the routing process as a Markov Decision Process (MDP), where each UAV acts as an agent. The state is defined by the current node and the destination, while actions correspond to selecting a neighbor for packet forwarding. The Q-value represents the cumulative penalty associated with each action, with lower values indicating better paths.

The penalty function combines normalized delay and packet loss metrics:

  • Delay Penalty: The measured delay is normalized by the maximum allowable delay, ensuring scalability across different network conditions.
  • Packet Loss Penalty: The logarithm of the packet delivery ratio is used to account for the multiplicative nature of packet loss (see the combined-penalty sketch after this list).
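
A possible form of the combined link penalty, assuming a normalization bound of 200 ms and equal weights for the two terms (neither value is given in this summary), is sketched below; the negative logarithm keeps the loss term additive along a path while delivery probabilities multiply.

    import math

    MAX_DELAY_MS = 200.0         # assumed maximum allowable delay used for normalization
    W_DELAY, W_LOSS = 0.5, 0.5   # assumed weights; the paper may tune these differently

    def link_penalty(measured_delay_ms, delivery_ratio):
        """Combined penalty for one link: lower is better."""
        delay_penalty = min(measured_delay_ms / MAX_DELAY_MS, 1.0)
        # Negative log of the delivery ratio; clamped to avoid log(0).
        loss_penalty = -math.log(max(delivery_ratio, 1e-6))
        return W_DELAY * delay_penalty + W_LOSS * loss_penalty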

The Q-value update rule ensures that UAVs continuously refine their routing decisions based on real-time feedback.
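
Assuming the standard tabular Q-learning update with the link penalty as the immediate cost and a minimum over the next hop's candidate actions (the learning rate and discount factor below are illustrative), the refinement step could be written as:

    ALPHA = 0.1   # assumed learning rate
    GAMMA = 0.9   # assumed discount factor

    def update_q_value(q_table, state, action, penalty, next_state, next_actions):
        """Penalty-minimizing Q-update: lower Q-values indicate better paths."""
        best_next = min((q_table.get((next_state, a), 0.0) for a in next_actions),
                        default=0.0)
        target = penalty + GAMMA * best_next
        current = q_table.get((state, action), 0.0)
        q_table[(state, action)] = current + ALPHA * (target - current)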

Adaptive Routing Strategy

To balance exploration (discovering new paths) and exploitation (using known optimal paths), QBQR introduces a dynamic ε-greedy strategy. Unlike traditional ε-greedy, where ε is fixed, QBQR adjusts ε based on network conditions. Initially, when Q-values are uncertain, ε is set high to encourage exploration. As the algorithm converges, ε decreases to prioritize exploitation. This adaptive approach accelerates convergence while maintaining robustness to topology changes.
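
One plausible realization of this dynamic ε-greedy scheme, with assumed bounds, decay factor, and a reset after a detected topology change (the paper's exact adjustment rule may differ), is sketched below.

    import random

    EPS_MAX, EPS_MIN = 0.9, 0.05   # assumed bounds on the exploration rate
    DECAY = 0.995                  # assumed per-decision decay factor

    def choose_next_hop(q_table, state, neighbors, epsilon):
        """Dynamic epsilon-greedy: explore a random neighbor with probability epsilon."""
        if random.random() < epsilon:
            return random.choice(neighbors)  # exploration
        return min(neighbors, key=lambda n: q_table.get((state, n), 0.0))  # exploitation

    def decay_epsilon(epsilon, topology_changed):
        """Decay epsilon as Q-values converge; raise it again after a topology change."""
        if topology_changed:
            return EPS_MAX
        return max(EPS_MIN, epsilon * DECAY)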

Additionally, QBQR incorporates node mobility into routing decisions. Each UAV estimates the current position of its neighbors using their last known location and velocity. If a neighbor is predicted to be out of communication range, the UAV excludes it from the candidate set, reducing unnecessary packet drops.
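
A simple dead-reckoning filter consistent with this description, reusing the neighbor-table fields assumed earlier and an illustrative 1,000 m communication range, might look like:

    import math

    COMM_RANGE_M = 1000.0  # assumed communication range in metres

    def predict_position(last_position, velocity, elapsed_s):
        """Linear dead-reckoning from the last reported position and velocity."""
        return tuple(p + v * elapsed_s for p, v in zip(last_position, velocity))

    def reachable_neighbors(my_position, neighbor_table, now):
        """Keep only neighbors predicted to still be within communication range."""
        candidates = []
        for node_id, entry in neighbor_table.items():
            predicted = predict_position(entry["position"], entry["velocity"],
                                         now - entry["last_seen"])
            if math.dist(my_position, predicted) <= COMM_RANGE_M:
                candidates.append(node_id)
        return candidates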

Performance Evaluation

Simulation Setup

QBQR was evaluated using the NS-3 network simulator under various scenarios, including different node densities, mobility patterns, and interference levels. Key performance metrics included:

  1. End-to-End Delay: The time taken for a packet to travel from source to destination.
  2. Packet Delivery Rate: The percentage of successfully delivered packets.
  3. Routing Overhead: The ratio of control traffic to data traffic.

Comparative Analysis

QBQR was compared against OLSR, AODV, and DeepCQ+. The results demonstrated significant improvements:

  1. Lower End-to-End Delay: QBQR reduced average delay by up to 50.7% compared to OLSR and 35.7% compared to AODV. The dynamic penalty function enabled UAVs to consistently select low-latency paths.
  2. Higher Packet Delivery Rate: Under interference and high mobility, QBQR achieved up to 67.9% higher delivery rates than AODV and 13.7% higher than DeepCQ+. The combination of link loss awareness and mobility prediction minimized packet drops.
  3. Moderate Routing Overhead: While QBQR introduced slightly more overhead than AODV, it remained significantly lower than OLSR. The hybrid neighbor discovery mechanism effectively reduced control traffic.

Robustness to Mobility

QBQR’s performance was further validated under varying mobility speeds. As node velocity increased, traditional protocols experienced sharp declines in delivery rates due to frequent link breaks. In contrast, QBQR maintained stable performance by rapidly adapting to topology changes. The adaptive ε-greedy strategy ensured that UAVs could quickly discover new routes while avoiding excessive exploration overhead.

Conclusion

The QBQR algorithm represents a significant advancement in UAV network routing by leveraging reinforcement learning to address the unique challenges of high mobility and stringent QoS requirements. By integrating real-time link state estimation, adaptive exploration strategies, and mobility awareness, QBQR achieves superior performance in terms of delay, packet delivery, and scalability. Future work could explore extensions to multi-objective optimization, including energy efficiency and load balancing, to further enhance UAV network resilience.

DOI: 10.19734/j.issn.1001-3695.2024.08.0318
