The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

Yu, Chao; Velu, Akash; Vinitsky, Eugene; Wang, Yu; Bayen, Alexandre; Wu, Yi

arXiv:2103.01955v2 (cs)

[Submitted on 2 Mar 2021 (v1), revised 5 Jul 2021 (this version, v2), latest version 4 Nov 2022 (v4)]

Abstract: Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that on-policy methods are significantly less sample efficient than their off-policy counterparts in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO which is specialized for multi-agent settings. Using a 1-GPU desktop, we show that MAPPO achieves surprisingly strong performance in three popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves strong results while exhibiting comparable sample efficiency. Finally, through ablation studies, we present the implementation and algorithmic factors which are most influential to MAPPO's practical performance.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as:	arXiv:2103.01955 [cs.LG]
	(or arXiv:2103.01955v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.01955

Submission history

From: Akash Velu
[v1] Tue, 2 Mar 2021 18:59:56 UTC (16,035 KB)
[v2] Mon, 5 Jul 2021 23:45:06 UTC (19,815 KB)
[v3] Thu, 21 Jul 2022 06:57:33 UTC (23,508 KB)
[v4] Fri, 4 Nov 2022 06:16:11 UTC (33,520 KB)
