CASA-B: A Unified Framework of Model-Free Reinforcement Learning

Xiao, Changnan; Shi, Haosen; Fan, Jiajun; Deng, Shihong

Computer Science > Machine Learning

arXiv:2105.03923v1 (cs)

[Submitted on 9 May 2021 (this version), latest version 25 Feb 2023 (v5)]

Abstract: Building on recent breakthroughs in reinforcement learning, this paper introduces CASA-B (Critic AS an Actor with Bandits Vote Algorithm), a unified framework for model-free reinforcement learning. CASA-B is an actor-critic framework that estimates the state value, the state-action value, and the policy. An expectation-correct Doubly Robust Trace is introduced to learn the state value and the state-action value, with guaranteed convergence properties. We prove that CASA-B integrates a consistent path for policy evaluation and policy improvement. Policy evaluation is equivalent to a compensational policy improvement, which alleviates function approximation error. It is also equivalent to an entropy-regularized policy improvement, which prevents the policy from collapsing to a suboptimal solution. Building on this design, we find that the entropies of the behavior policies and of the target policy are disentangled. Based on this observation, we propose a progressive closed-form entropy control mechanism that explicitly keeps the behavior policies' entropy within an arbitrary range. Our experiments show that CASA-B is highly sample-efficient and achieves state-of-the-art results on the Arcade Learning Environment (ALE). At the 200M training scale, our mean Human Normalized Score is 6456.63% and our median Human Normalized Score is 477.17%.
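The Human Normalized Score (HNS) reported above is conventionally defined on ALE per game as HNS = (agent score - random score) / (human score - random score), expressed as a percentage. Scores are then aggregated across games by mean and median. The following is a minimal sketch under that standard definition. The function names and the per-game scores are hypothetical placeholders for illustration, not data from the paper. It also shows why both aggregates are usually reported: one game with a very large HNS can dominate the mean, while the median stays robust.

```python
from statistics import mean, median


def human_normalized_score(agent: float, random_score: float, human_score: float) -> float:
    """Per-game HNS in percent: 0% matches random play, 100% matches the human baseline."""
    return 100.0 * (agent - random_score) / (human_score - random_score)


def summarize_hns(scores: dict[str, tuple[float, float, float]]) -> tuple[float, float]:
    """Return (mean HNS, median HNS) for a mapping game -> (agent, random, human)."""
    hns = [human_normalized_score(a, r, h) for a, r, h in scores.values()]
    return mean(hns), median(hns)


# Hypothetical scores for illustration only (NOT results from the CASA-B paper).
example = {
    "GameA": (1200.0, 200.0, 700.0),     # (agent, random, human) -> 200% HNS
    "GameB": (50.0, 0.0, 100.0),         # -> 50% HNS
    "GameC": (41000.0, 1000.0, 5000.0),  # -> 1000% HNS, an outlier that inflates the mean
}

if __name__ == "__main__":
    m, med = summarize_hns(example)
    print(f"mean HNS = {m:.2f}%, median HNS = {med:.2f}%")  # mean HNS = 416.67%, median HNS = 200.00%
```

Here the mean (416.67%) is more than double the median (200.00%) because of a single outlier game. This is the same kind of gap that separates the mean and median figures in the abstract.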

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2105.03923 [cs.LG]
	(or arXiv:2105.03923v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2105.03923

Submission history

From: Shihong Deng
[v1] Sun, 9 May 2021 12:45:13 UTC (5,337 KB)
[v2] Wed, 26 May 2021 12:50:54 UTC (6,795 KB)
[v3] Thu, 27 May 2021 16:28:16 UTC (6,816 KB)
[v4] Sat, 11 Jun 2022 07:14:47 UTC (4,669 KB)
[v5] Sat, 25 Feb 2023 12:36:54 UTC (23,237 KB)