Optimistic Multi-Agent Policy Gradient for Cooperative Tasks

Zhao, Wenshuai; Zhao, Yi; Li, Zhiyuan; Kannala, Juho; Pajarinen, Joni

arXiv:2311.01953v1 (cs)

[Submitted on 3 Nov 2023 (this version), latest version 14 Oct 2025 (v3)]

Abstract: Relative overgeneralization (RO) occurs in cooperative multi-agent learning tasks when agents converge towards a suboptimal joint policy due to overfitting to suboptimal behavior of other agents. In early work, optimism has been shown to mitigate the RO problem when using tabular Q-learning. However, with function approximation optimism can amplify overestimation and thus fail on complex tasks. On the other hand, recent deep multi-agent policy gradient (MAPG) methods have succeeded in many complex tasks but may fail with severe RO. We propose a general, yet simple, framework to enable optimistic updates in MAPG methods and alleviate the RO problem. Specifically, we employ a Leaky ReLU function where a single hyperparameter selects the degree of optimism to reshape the advantages when updating the policy. Intuitively, our method remains optimistic toward individual actions with lower returns which are potentially caused by other agents' sub-optimal behavior during learning. The optimism prevents the individual agents from quickly converging to a local optimum. We also provide a formal analysis from an operator view to understand the proposed advantage transformation. In extensive evaluations on diverse sets of tasks, including illustrative matrix games, complex Multi-agent MuJoCo and Overcooked benchmarks, the proposed method (code can be found at this https URL) outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest.
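
The advantage reshaping described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. The function names, the parameter name negative_slope, and the convention that it lies in [0, 1] (1.0 leaving advantages unchanged, 0.0 discarding all negative advantages) are assumptions made here for illustration, and the surrogate loss is a plain policy-gradient form rather than any specific MAPG objective.

```python
def reshape_advantages(advantages, negative_slope):
    """Apply a Leaky ReLU to a list of advantage estimates.

    negative_slope is a hypothetical name for the single optimism
    hyperparameter (assumed range [0, 1]): positive advantages pass
    through unchanged, negative ones are scaled down by negative_slope.
    """
    if not 0.0 <= negative_slope <= 1.0:
        raise ValueError("negative_slope must lie in [0, 1]")
    return [a if a >= 0.0 else negative_slope * a for a in advantages]


def surrogate_loss(log_probs, advantages, negative_slope):
    """Plain policy-gradient surrogate computed on reshaped advantages.

    log_probs are log-probabilities of the actions one agent took.
    This generic form is an assumption, not the paper's exact loss.
    """
    shaped = reshape_advantages(advantages, negative_slope)
    return -sum(lp * a for lp, a in zip(log_probs, shaped)) / len(shaped)


if __name__ == "__main__":
    advantages = [2.0, -4.0, 0.0]
    # Smaller slopes shrink the penalty attached to low-return actions.
    for slope in (1.0, 0.25):
        print(slope, reshape_advantages(advantages, slope))
```

Under these assumptions, a smaller slope weakens the push away from actions whose low returns may stem from other agents' exploratory behavior, which is the intuition the abstract gives for avoiding early convergence to a local optimum.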

Comments:	16 pages, 9 figures
Subjects:	Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Cite as:	arXiv:2311.01953 [cs.LG]
	(or arXiv:2311.01953v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.01953

Submission history

From: Wenshuai Zhao
[v1] Fri, 3 Nov 2023 14:47:54 UTC (3,890 KB)
[v2] Sat, 25 May 2024 11:34:47 UTC (3,969 KB)
[v3] Tue, 14 Oct 2025 13:41:06 UTC (664 KB)
