Electronics Optics & Control, Volume 31, Issue 1, 51 (2024)

A Multi-UAV Cooperative Exploration Method Based on Improved Multi-Agent PPO

AN Cheng'an and ZHOU Sida

Using multiple UAVs to explore unknown environments can improve the robustness and efficiency of exploration tasks. Unlike heuristic methods, multi-agent deep reinforcement learning removes the need to hand-craft rules: the UAVs act as agents that learn more effective "rules" on their own by interacting with the environment. A multi-threaded simulation environment is built to support cooperative training of multiple UAVs. A shared Multi-Agent Proximal Policy Optimization method based on a Long Short-Term Memory network (LSTM-MAPPO) is proposed to suit this multi-threaded environment, and global boundary information is added to the cooperative LSTM-MAPPO method to enlarge the area explored in each episode. The numerical experiment results show that: 1) compared with the existing Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method, the proposed method converges stably in the later stages of training under continuous actions; 2) compared with the existing LSTM-MAPPO method, its final reward stays stably above 5000; and 3) on three different simulation maps, the trained network achieves stable exploration of more than 70% of the area during testing.
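The paper itself provides no code; as a rough illustration of the ingredients the abstract names, the following is a minimal PyTorch sketch of a parameter-shared LSTM actor-critic with a continuous (Gaussian) policy head, plus the standard PPO clipped surrogate objective that MAPPO-style methods build on. All class names, dimensions, and the example step are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SharedLSTMActorCritic(nn.Module):
    """Hypothetical parameter-shared LSTM actor-critic, in the spirit of LSTM-MAPPO.

    All UAV agents run the same weights; the LSTM carries each agent's own
    recurrent memory across time steps, and the Gaussian policy head produces
    continuous actions (e.g. heading and speed commands).
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)          # policy mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value = nn.Linear(hidden, 1)             # critic head

    def forward(self, obs, memory=None):
        # obs: (batch, time, obs_dim); memory: per-agent (h, c) LSTM state
        x = self.encoder(obs)
        x, memory = self.lstm(x, memory)
        dist = torch.distributions.Normal(self.mu(x), self.log_std.exp())
        return dist, self.value(x).squeeze(-1), memory


def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (returned as a loss)."""
    ratio = (new_logp - old_logp).exp()
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantage, clipped * advantage).mean()


# Hypothetical rollout step for one agent:
net = SharedLSTMActorCritic(obs_dim=16, act_dim=2)
obs = torch.zeros(1, 1, 16)      # (batch=1, time=1, obs_dim)
dist, value, memory = net(obs)
action = dist.sample()           # continuous UAV action
```

Because every agent shares one set of weights, experience from all UAVs (and all simulation threads) updates the same policy, which is what allows the multi-threaded environment described above to accelerate cooperative training.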

Citation: AN Cheng'an, ZHOU Sida. A Multi-UAV Cooperative Exploration Method Based on Improved Multi-Agent PPO[J]. Electronics Optics & Control, 2024, 31(1): 51.

    Paper Information

    Received: Feb. 14, 2023

Published Online: May 22, 2024

DOI: 10.3969/j.issn.1671-637x.2024.01.008
