Decentralized Q-Learning in Zero-sum Markov Games

Sayin, Muhammed O.; Zhang, Kaiqing; Leslie, David S.; Basar, Tamer; Ozdaglar, Asuman


arXiv:2106.02748 (cs)

[Submitted on 4 Jun 2021 (v1), last revised 12 Dec 2021 (this version, v2)]


Abstract: We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponent's actions or payoffs, possibly being even oblivious to the presence of the opponent, nor be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature of learning in games. In this paper, we develop a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy; when both agents adopt the learning dynamics, they converge to the Nash equilibrium of the game. The key challenge in this decentralized setting is the non-stationarity of the environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policies simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics where each agent updates her local Q-function and value function estimates concurrently, with the latter happening at a slower timescale.
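The two-timescale idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' exact dynamics: the softmax action rule, the `temperature` parameter, the step-size schedules `alpha` and `beta`, and the names `LocalAgent` and `run_matching_pennies` are all assumptions made for illustration. What the sketch preserves from the abstract is that each agent uses only its own action and payoff, and that the value estimate moves on a slower timescale than the local Q estimate.

```python
import math
import random


class LocalAgent:
    """One agent's local estimates; it sees only its own action and payoff."""

    def __init__(self, n_states, n_actions, gamma=0.9, temperature=0.5, seed=0):
        self.gamma = gamma
        self.temperature = temperature  # assumed softmax temperature
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # local Q estimate
        self.v = [0.0] * n_states                               # value estimate
        self.visits = [0] * n_states
        self.rng = random.Random(seed)

    def policy(self, state):
        """Softmax over the local Q-values (an assumed action rule)."""
        row = self.q[state]
        top = max(row)  # subtract the max for numerical stability
        weights = [math.exp((x - top) / self.temperature) for x in row]
        total = sum(weights)
        return [w / total for w in weights]

    def act(self, state):
        probs = self.policy(state)
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, payoff, next_state):
        self.visits[state] += 1
        k = self.visits[state]
        alpha = 1.0 / k ** 0.6  # fast step size (assumed schedule)
        beta = 1.0 / k          # slow step size, so beta / alpha -> 0
        # Fast timescale: move the local Q estimate toward the one-step target.
        target = payoff + self.gamma * self.v[next_state]
        self.q[state][action] += alpha * (target - self.q[state][action])
        # Slow timescale: move the value estimate toward the policy-weighted Q.
        probs = self.policy(state)
        avg_q = sum(p * x for p, x in zip(probs, self.q[state]))
        self.v[state] += beta * (avg_q - self.v[state])


def run_matching_pennies(steps=2000):
    """Two independent learners in a single-state zero-sum game (toy example)."""
    a1 = LocalAgent(1, 2, seed=1)
    a2 = LocalAgent(1, 2, seed=2)
    for _ in range(steps):
        x, y = a1.act(0), a2.act(0)
        r = 1.0 if x == y else -1.0
        a1.update(0, x, r, 0)    # agent 1 never sees y
        a2.update(0, y, -r, 0)   # agent 2 never sees x or the sign convention
    return a1, a2
```

Neither agent is passed the opponent's action or payoff, which mirrors the radically uncoupled setting described above. The sketch makes no convergence claim of its own; the guarantees stated in the abstract apply to the dynamics defined in the paper.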

Comments:	To appear at NeurIPS 2021. Strengthened the results in Theorem 1 and Corollary 1
Subjects:	Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Dynamical Systems (math.DS)
Cite as:	arXiv:2106.02748 [cs.GT]
	(or arXiv:2106.02748v2 [cs.GT] for this version)
	https://doi.org/10.48550/arXiv.2106.02748

Submission history

From: Muhammed Omer Sayin
[v1] Fri, 4 Jun 2021 22:42:56 UTC (348 KB)
[v2] Sun, 12 Dec 2021 07:53:32 UTC (343 KB)
