Multiple UAV Formation Control Based on MIX-MAPPO Algorithm
Introduction
Unmanned Aerial Vehicles (UAVs) have gained significant attention due to their versatility across applications. A single UAV, however, often struggles with complex multi-task scenarios, which has driven interest in multi-UAV formations. Through self-organization, UAV swarms can collaborate autonomously, exchanging information cooperatively to improve overall performance. This capability lets them efficiently execute tasks such as search and rescue, surveillance, and tracking.
Traditional formation control methods for UAVs include leader-follower approaches, virtual structure methods, consensus theory, and inverse control techniques. While these methods provide precise control, they face scalability challenges as the number of UAVs increases. To address these limitations, bio-inspired self-organizing swarm control methods have been explored, utilizing principles such as separation, cohesion, and velocity alignment to maintain coordinated movement.
Reinforcement learning (RL) has emerged as a promising approach for UAV swarm control, simplifying complex system modeling by training agents to learn optimal policies. However, traditional multi-agent RL algorithms struggle with convergence when applied to large-scale UAV formations. This paper introduces a novel approach—MIX-MAPPO—that integrates the Menger sponge fractal structure with multi-agent proximal policy optimization (MAPPO) and attention mechanisms to enhance formation control.
UAV Formation Modeling
Motion Model
To simplify the problem, each UAV is abstracted as a point mass in a Cartesian coordinate system. The motion of each UAV is described by its position and velocity, with control inputs applied to adjust these flight states so that the swarm maintains coordinated movement while avoiding collisions.
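Since the UAVs are modeled as point masses, a minimal discrete-time sketch of such a motion model might look like the following; the state layout, time step, and speed cap are illustrative assumptions rather than values from the paper.

```python
import numpy as np

DT = 0.1      # integration time step (s); assumed, not from the paper
V_MAX = 5.0   # speed cap (m/s); assumed

def step(position: np.ndarray, velocity: np.ndarray, accel_cmd: np.ndarray):
    """Advance one UAV's point-mass state by one time step.

    position, velocity, accel_cmd are 2-vectors (x, y) in a Cartesian
    frame. The control input is an acceleration command, and speed is
    clipped so the UAV respects a maximum-velocity constraint.
    """
    velocity = velocity + accel_cmd * DT
    speed = np.linalg.norm(velocity)
    if speed > V_MAX:
        velocity = velocity / speed * V_MAX   # saturate speed
    position = position + velocity * DT
    return position, velocity
```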
Formation Design Using Menger Sponge Fractal
The Menger sponge fractal structure is employed for UAV formation due to its self-similarity properties, where smaller structures resemble the larger whole. This characteristic simplifies formation construction and enhances scalability.
A primary formation consists of five UAVs: one leader at the center and four followers arranged symmetrically around it. This configuration ensures the leader’s protection while maintaining efficient communication within the group. The Laplacian matrix defines the communication topology, ensuring connectivity among UAVs.
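To illustrate how a Laplacian matrix encodes this topology, the sketch below assumes a star graph in which each follower communicates only with the central leader; the paper defines the topology via a Laplacian, but this particular edge set is an assumption for illustration.

```python
import numpy as np

def laplacian(adjacency: np.ndarray) -> np.ndarray:
    """Graph Laplacian L = D - A, where D is the degree matrix."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

# Assumed star topology for a primary formation: node 0 is the leader,
# nodes 1-4 are followers that each talk only to the leader.
A = np.zeros((5, 5))
A[0, 1:] = A[1:, 0] = 1          # leader <-> each follower

L = laplacian(A)
# Connectivity check: the graph is connected iff the second-smallest
# eigenvalue of L (the algebraic connectivity) is strictly positive.
eigvals = np.sort(np.linalg.eigvalsh(L))
assert eigvals[1] > 1e-9
```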
By leveraging the self-similarity of the Menger sponge, higher-level formations can be constructed by grouping primary formations. For instance, four primary formations can combine into a secondary formation, maintaining both internal and inter-group communication. This hierarchical approach enables large-scale UAV swarms to operate efficiently under centralized and distributed control strategies.
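The self-similar construction can be sketched as follows. The slot layout (leader at the center, followers at the corners of a square) and the scale factor are assumptions for illustration; only the recursive reuse of the primary pattern reflects the Menger-sponge idea described above.

```python
import numpy as np

def primary_offsets(d: float) -> np.ndarray:
    """Assumed planar slots for one primary formation: leader at the
    center, four followers at the corners of a square of half-width d."""
    return np.array([[0, 0], [d, d], [d, -d], [-d, d], [-d, -d]], float)

def secondary_offsets(d: float, scale: float = 3.0) -> np.ndarray:
    """Build a secondary formation by repeating the primary pattern.

    The four primary-formation centers are themselves placed on the same
    square pattern, scaled up -- mirroring the Menger sponge's
    self-similarity, where each level reuses the shape of the level below.
    """
    centers = primary_offsets(scale * d)[1:]       # 4 group centers
    return np.vstack([primary_offsets(d) + c for c in centers])

slots = secondary_offsets(d=2.0)   # 4 groups x 5 UAVs = 20 slots
print(slots.shape)                 # (20, 2)
```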
MIX-MAPPO Algorithm for Formation Control
Challenges with Traditional MAPPO
MAPPO, a multi-agent RL algorithm, suffers from slow convergence and scalability issues as the number of UAVs increases: the joint state-action space grows with the UAV count, inflating the networks' input dimensionality and making training inefficient.
Improvements in MIX-MAPPO
To overcome these limitations, MIX-MAPPO incorporates:
- Grouping Mechanism – UAVs are divided into sub-formations based on the Menger sponge structure, reducing input dimensionality.
- Attention Mechanism – Followers use weighted attention to focus on relevant UAVs within their sub-formation, improving information aggregation.
- Hybrid Critic Networks – Leaders and followers employ separate critic networks. Leaders use simplified PPO-based critics, while followers integrate attention-based critics for efficient learning.
This architecture reduces computational complexity, accelerates convergence, and enhances adaptability in dynamic environments.
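As a rough illustration of the follower-side attention, the sketch below implements generic scaled dot-product attention over the observations of the other UAVs in a sub-formation. The embedding size, single-head design, and class name are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class NeighborAttention(nn.Module):
    """Weighted aggregation of neighbor observations for a follower.

    The follower's own embedding forms the query; embeddings of the other
    UAVs in its sub-formation form the keys and values. The softmax
    weights let the follower focus on the most relevant neighbors.
    """
    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.q = nn.Linear(obs_dim, embed_dim)
        self.k = nn.Linear(obs_dim, embed_dim)
        self.v = nn.Linear(obs_dim, embed_dim)
        self.scale = embed_dim ** 0.5

    def forward(self, own_obs: torch.Tensor, neighbor_obs: torch.Tensor):
        # own_obs: (obs_dim,); neighbor_obs: (n_neighbors, obs_dim)
        q = self.q(own_obs)                                  # (embed_dim,)
        k = self.k(neighbor_obs)                             # (n, embed_dim)
        v = self.v(neighbor_obs)                             # (n, embed_dim)
        weights = torch.softmax(k @ q / self.scale, dim=0)   # (n,)
        return weights @ v          # attended summary, (embed_dim,)
```

Because each follower attends only within its sub-formation, the input size of this module stays fixed as the swarm scales, which is what lets the grouping mechanism cap the critic's complexity.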
Reward Function Design
The reward function is critical for training UAVs to achieve desired behaviors:
- Leader Reward – Encourages rapid movement toward target positions.
- Follower Reward – Promotes alignment with the leader while maintaining formation shape.
- Collision Avoidance – Penalizes UAVs for entering unsafe proximity.
- Environmental Reward – Simulates real-world disturbances by attracting UAVs toward a reference point.
These rewards ensure stable formation assembly, collision-free navigation, and adaptability to environmental factors.
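A sketch of how these four terms might be composed is shown below; all gains, the safe distance, and the disturbance model are illustrative assumptions, since the paper's exact coefficients are not reproduced here.

```python
import numpy as np

D_SAFE = 1.0   # assumed minimum safe separation (m)

def leader_reward(pos, target, k=1.0):
    # Denser reward the closer the leader is to its target point.
    return -k * np.linalg.norm(target - pos)

def follower_reward(pos, leader_pos, slot_offset, k=1.0):
    # Penalize deviation from the follower's assigned slot relative to
    # the leader, which preserves the formation shape.
    return -k * np.linalg.norm((leader_pos + slot_offset) - pos)

def collision_penalty(pos, other_positions, penalty=10.0):
    # Flat penalty for every pairwise distance below the safe separation.
    dists = np.linalg.norm(other_positions - pos, axis=1)
    return -penalty * np.sum(dists < D_SAFE)

def environment_reward(pos, ref_point, k=0.1):
    # Mild attraction toward a reference point, standing in for the
    # environmental disturbance described above.
    return -k * np.linalg.norm(ref_point - pos)
```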
Experimental Results
Performance Comparison
MIX-MAPPO was evaluated against DDPG, PPO, MADDPG, and MAPPO in simulated environments. Key findings include:
- Faster Convergence – MIX-MAPPO achieved higher reward values and stable formations more quickly than other algorithms.
- Improved Training Efficiency – The grouping mechanism reduced training time by minimizing redundant computations.
- Higher Formation Completion Rate – MIX-MAPPO achieved a 97% success rate in assembling UAVs into target formations, outperforming other methods.
Motion Capture Validation
Real-world experiments demonstrated that MIX-MAPPO-trained UAVs successfully transitioned from random positions to stable formations, maintaining cohesion during movement. Followers consistently adjusted their positions relative to the leader, validating the algorithm’s effectiveness in practical scenarios.
Conclusion
This paper presents MIX-MAPPO, an advanced multi-agent reinforcement learning algorithm for UAV formation control. By integrating the Menger sponge fractal structure, attention mechanisms, and hybrid critic networks, MIX-MAPPO addresses scalability and convergence challenges in large-scale UAV swarms. Experimental results confirm its superiority over existing methods in terms of training efficiency, formation stability, and real-world applicability.
Future work will explore 3D formation control and further optimizations in grouping strategies to enhance swarm coordination in complex environments.
doi.org/10.19734/j.issn.1001-3695.2024.07.0207