Options as responses: Grounding behavioural hierarchies in multi-agent RL

Vezhnevets, Alexander Sasha; Wu, Yuhuai; Leblond, Remi; Leibo, Joel Z.

arXiv:1906.01470v2 (cs)

[Submitted on 4 Jun 2019 (v1), revised 6 Jun 2019 (this version, v2), latest version 10 Jul 2020 (v3)]

Abstract: We propose a novel hierarchical agent architecture for multi-agent reinforcement learning with concealed information. The hierarchy is grounded in the concealed information about other players, which resolves the "chicken or the egg" nature of option discovery. We factorise the value function over a latent representation of the concealed information and then re-use this latent space to factorise the policy into options. Low-level policies (options) are trained to respond to particular states of other agents, grouped by the latent representation, while the top level (meta-policy) learns to infer the latent representation from its own observation and thereby to select the right option. This grounding facilitates credit assignment across the levels of the hierarchy. We show that this helps generalisation (performance against a held-out set of pre-trained competitors while training in self- or population-play) and the resolution of social dilemmas in self-play.
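
The two factorisations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name LatentOptionAgent, the linear layers, the random weights, and the choice to act from the mixture over options are all assumptions made for illustration, and no training loop is shown. The sketch only shows a value estimate weighted by a latent distribution inferred from concealed information, and a policy weighted by a latent distribution inferred from the agent's own observation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product: weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class LatentOptionAgent:
    """Hypothetical agent with a shared discrete latent space of size n_options."""

    def __init__(self, obs_dim, concealed_dim, n_options, n_actions, seed=0):
        rng = random.Random(seed)
        # Meta-policy: own observation -> distribution over latent states.
        self.meta_w = random_matrix(rng, n_options, obs_dim)
        # Inference head: concealed information -> distribution over latent
        # states (assumed to be available during training only).
        self.infer_w = random_matrix(rng, n_options, concealed_dim)
        # One low-level policy (option) and one value head per latent state.
        self.option_w = [random_matrix(rng, n_actions, obs_dim)
                         for _ in range(n_options)]
        self.value_w = random_matrix(rng, n_options, obs_dim)

    def meta_policy(self, obs):
        return softmax(linear(self.meta_w, obs))

    def latent_from_concealed(self, concealed):
        return softmax(linear(self.infer_w, concealed))

    def option_policy(self, z, obs):
        return softmax(linear(self.option_w[z], obs))

    def policy(self, obs):
        """Action distribution: options mixed by the meta-policy's latent estimate."""
        p_z = self.meta_policy(obs)
        mix = [0.0] * len(self.option_w[0])
        for z, pz in enumerate(p_z):
            for a, pa in enumerate(self.option_policy(z, obs)):
                mix[a] += pz * pa
        return mix

    def value(self, obs, concealed):
        """Value estimate: per-latent values weighted by the concealed-info latent."""
        q_z = self.latent_from_concealed(concealed)
        per_latent = linear(self.value_w, obs)
        return sum(q * v for q, v in zip(q_z, per_latent))


agent = LatentOptionAgent(obs_dim=4, concealed_dim=3, n_options=2, n_actions=5)
obs = [0.1, -0.3, 0.5, 0.2]
concealed = [1.0, 0.0, -1.0]
action_probs = agent.policy(obs)
state_value = agent.value(obs, concealed)
```

In this sketch the meta-policy never sees the concealed input, so acting requires only the agent's own observation, while the value estimate can use the concealed information. How the two latent distributions are brought into agreement during training is left out here, since the abstract does not specify it.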

Comments:	First two authors contributed equally
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1906.01470 [cs.LG]
	(or arXiv:1906.01470v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.01470

Submission history

From: Alexander Vezhnevets
[v1] Tue, 4 Jun 2019 14:18:47 UTC (206 KB)
[v2] Thu, 6 Jun 2019 15:10:59 UTC (206 KB)
[v3] Fri, 10 Jul 2020 13:31:16 UTC (654 KB)
