Improving Actor-Critic Training with Steerable Action-Value Approximation Errors

Tasdighi, Bahareh; Werge, Nicklas; Wu, Yi-Shan; Kandemir, Melih

arXiv:2406.03890 (cs)

[Submitted on 6 Jun 2024 (v1), last revised 20 Aug 2025 (this version, v2)]

Abstract: Off-policy actor-critic algorithms have shown strong potential in deep reinforcement learning for continuous control tasks. Their success primarily comes from leveraging pessimistic state-action value function updates, which reduce function approximation errors and stabilize learning. However, excessive pessimism can limit exploration, preventing the agent from effectively refining its policies. Conversely, optimism can encourage exploration but may lead to high-risk behaviors and unstable learning if not carefully managed. To address this trade-off, we propose Utility Soft Actor-Critic (USAC), a novel framework that allows independent, interpretable control of pessimism and optimism for both the actor and the critic. USAC dynamically adapts its exploration strategy based on the uncertainty of critics using a utility function, enabling a task-specific balance between optimism and pessimism. This approach goes beyond binary choices of pessimism or optimism, making the method both theoretically meaningful and practically feasible. Experiments across a variety of continuous control tasks show that adjusting the degree of pessimism or optimism significantly impacts performance. When configured appropriately, USAC consistently outperforms state-of-the-art algorithms, demonstrating its practical utility and feasibility.
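
The idea of steering pessimism and optimism separately for the actor and the critic can be illustrated with a small sketch. This is not the utility function defined in the paper, which the abstract does not specify. It assumes a stand-in utility of the form "ensemble mean plus lam times ensemble spread", and the names `utility`, `lam_critic`, and `lam_actor`, as well as the numbers, are hypothetical. With two critics, lam = -1 reduces to the familiar minimum of the two estimates and lam = +1 to their maximum, so values in between interpolate between a fully pessimistic and a fully optimistic estimate.

```python
from statistics import mean, pstdev


def utility(q_values, lam):
    """Hypothetical stand-in utility: ensemble mean plus lam times spread.

    lam < 0 yields a pessimistic estimate, lam > 0 an optimistic one,
    and lam = 0 recovers the plain mean of the critic estimates.
    """
    return mean(q_values) + lam * pstdev(q_values)


# Two critic estimates for the same state-action pair (made-up numbers).
q_estimates = [1.0, 3.0]

# Assumed, independently chosen knobs for the critic and the actor.
lam_critic = -1.0  # pessimistic value used in the critic target
lam_actor = 0.5    # mildly optimistic value used in the actor objective

critic_target_value = utility(q_estimates, lam_critic)
actor_objective_value = utility(q_estimates, lam_actor)

print(critic_target_value, actor_objective_value)
```

The spread term plays the role of the critics' uncertainty: when the estimates agree, the utility collapses to the shared value regardless of lam, and the two knobs only matter where the critics disagree.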

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2406.03890 [cs.LG]
	(or arXiv:2406.03890v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.03890

Submission history

From: Bahareh Tasdighi
[v1] Thu, 6 Jun 2024 09:26:02 UTC (535 KB)
[v2] Wed, 20 Aug 2025 07:56:10 UTC (189 KB)