Computer Science > Multiagent Systems
[Submitted on 27 Jun 2021 (v1), revised 2 Aug 2021 (this version, v6), latest version 8 Jun 2023 (v14)]
Title: Policy Regularization with Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
Abstract: Multi-Agent Reinforcement Learning (MARL) has seen revolutionary breakthroughs with its successful application to multi-agent cooperative tasks such as robot swarm control, autonomous vehicle coordination, and computer games. Recent works have applied Proximal Policy Optimization (PPO) to multi-agent tasks, an approach called Multi-agent PPO (MAPPO). However, previous literature shows that vanilla MAPPO with a shared value function may not perform as well as Independent PPO (IPPO) or finetuned QMIX. MAPPO-agent-specific (MAPPO-AS) therefore improves on the performance of vanilla MAPPO and IPPO by using artificial agent-specific features. In addition, no existing literature gives a theoretical analysis of the working mechanism of MAPPO. In this paper, we first theoretically generalize single-agent PPO to vanilla MAPPO, which shows that vanilla MAPPO is approximately equivalent to optimizing a multi-agent joint policy with the original PPO. Second, we find that vanilla MAPPO faces the problem of "The Policies Overfitting in Multi-agent Cooperation" (POMAC), because the agents learn their policies from the sampled centralized advantage values. POMAC may then update the policies of some agents in a suboptimal direction and prevent the agents from exploring better trajectories. To solve the POMAC problem, we propose novel policy regularization methods, namely Noisy-MAPPO and Advantage-Noisy-MAPPO, which smooth out the advantage values with noise. The experimental results show that the average performance of Noisy-MAPPO is better than that of finetuned QMIX and MAPPO-AS, and much better than that of vanilla MAPPO. We open-source the code at this https URL.
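To make the idea of smoothing advantage values with noise more concrete, the following is a minimal sketch only. The abstract does not specify how the noise is generated or combined, so the function name `noisy_advantages`, the mixing weight `alpha`, the Gaussian noise with scale `sigma`, and the per-agent sampling are all assumptions made for illustration. They should not be read as the authors' implementation.

```python
import random
from typing import List


def noisy_advantages(
    shared_advantages: List[float],
    n_agents: int,
    alpha: float = 0.05,
    sigma: float = 1.0,
    seed: int = 0,
) -> List[List[float]]:
    """Return one perturbed copy of the shared advantage batch per agent.

    Hypothetical mixing rule (an assumption, not taken from the paper):
        A_i[t] = (1 - alpha) * A[t] + alpha * eps_i[t],  eps_i[t] ~ N(0, sigma^2)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)  # local generator so results are reproducible
    per_agent: List[List[float]] = []
    for _ in range(n_agents):
        # Each agent draws its own noise, so agents no longer see
        # exactly the same sampled advantage values.
        per_agent.append(
            [(1.0 - alpha) * a + alpha * rng.gauss(0.0, sigma)
             for a in shared_advantages]
        )
    return per_agent


# Usage: a centralized advantage batch shared by three agents.
shared = [0.5, -1.2, 2.0, 0.1]
adv = noisy_advantages(shared, n_agents=3, alpha=0.05)
```

With `alpha = 0` every agent receives the unmodified shared advantages, which corresponds to the vanilla setting described above. Larger values of `alpha` trade signal for regularization, and a suitable value would have to be tuned per task.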
Submission history
From: Jian Hu
[v1] Sun, 27 Jun 2021 22:50:35 UTC (732 KB)
[v2] Thu, 1 Jul 2021 06:02:59 UTC (733 KB)
[v3] Tue, 13 Jul 2021 12:24:26 UTC (761 KB)
[v4] Mon, 19 Jul 2021 13:10:15 UTC (759 KB)
[v5] Thu, 22 Jul 2021 13:32:39 UTC (768 KB)
[v6] Mon, 2 Aug 2021 14:01:18 UTC (799 KB)
[v7] Sun, 22 Aug 2021 23:48:07 UTC (1,716 KB)
[v8] Wed, 1 Sep 2021 23:56:31 UTC (1,715 KB)
[v9] Wed, 8 Sep 2021 21:12:49 UTC (1,715 KB)
[v10] Mon, 13 Sep 2021 00:46:27 UTC (1,712 KB)
[v11] Fri, 1 Oct 2021 00:51:10 UTC (1,713 KB)
[v12] Sat, 23 Oct 2021 11:21:34 UTC (1,714 KB)
[v13] Thu, 11 Nov 2021 09:35:41 UTC (1,713 KB)
[v14] Thu, 8 Jun 2023 07:44:23 UTC (1,714 KB)