Showing 1–6 of 6 results for author: Tasdighi, B

  1. arXiv:2507.23501  [pdf, ps, other]

    cs.LG stat.ML

    Directional Ensemble Aggregation for Actor-Critics

    Authors: Nicklas Werge, Yi-Shan Wu, Bahareh Tasdighi, Melih Kandemir

    Abstract: Off-policy reinforcement learning in continuous control tasks depends critically on accurate $Q$-value estimates. Conservative aggregation over ensembles, such as taking the minimum, is commonly used to mitigate overestimation bias. However, these static rules are coarse, discard valuable information from the ensemble, and cannot adapt to task-specific needs or different learning regimes. We propo…

    Submitted 31 July, 2025; originally announced July 2025.

  2. arXiv:2507.03487  [pdf, ps, other]

    cs.LG

    ObjectRL: An Object-Oriented Reinforcement Learning Codebase

    Authors: Gulcin Baykal, Abdullah Akgül, Manuel Haussmann, Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir

    Abstract: ObjectRL is an open-source Python codebase for deep reinforcement learning (RL), designed for research-oriented prototyping with minimal programming effort. Unlike existing codebases, ObjectRL is built on Object-Oriented Programming (OOP) principles, providing a clear structure that simplifies the implementation, modification, and evaluation of new algorithms. ObjectRL lowers the entry barrier for…

    Submitted 4 July, 2025; originally announced July 2025.

  3. arXiv:2505.19682  [pdf, other]

    cs.LG

    Deep Actor-Critics with Tight Risk Certificates

    Authors: Bahareh Tasdighi, Manuel Haussmann, Yi-Shan Wu, Andres R. Masegosa, Melih Kandemir

    Abstract: After a period of research, deep actor-critic algorithms have reached a level where they influence our everyday lives. They serve as the driving force behind the continual improvement of large language models through user-collected feedback. However, their deployment in physical systems is not yet widely adopted, mainly because no validation scheme exists that quantifies their risk of malfunction. We dem…

    Submitted 26 May, 2025; originally announced May 2025.

  4. arXiv:2406.03890  [pdf, ps, other]

    cs.LG stat.ML

    Improving Actor-Critic Training with Steerable Action-Value Approximation Errors

    Authors: Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir

    Abstract: Off-policy actor-critic algorithms have shown strong potential in deep reinforcement learning for continuous control tasks. Their success primarily comes from leveraging pessimistic state-action value function updates, which reduce function approximation errors and stabilize learning. However, excessive pessimism can limit exploration, preventing the agent from effectively refining its policies. C…

    Submitted 20 August, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

  5. arXiv:2402.03055  [pdf, ps, other]

    cs.LG

    Deep Exploration with PAC-Bayes

    Authors: Bahareh Tasdighi, Manuel Haussmann, Nicklas Werge, Yi-Shan Wu, Melih Kandemir

    Abstract: Reinforcement learning (RL) for continuous control under delayed rewards is an under-explored problem despite its significance in real-world applications. Many complex skills are based on intermediate ones as prerequisites. For instance, a humanoid locomotor must learn how to stand before it can learn to walk. To cope with delayed reward, an agent must perform deep exploration. However, existing d…

    Submitted 20 August, 2025; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ECAI camera-ready version; fixed acknowledgements; fixed github reference

  6. arXiv:2301.12776  [pdf, other]

    cs.LG stat.ML

    PAC-Bayesian Soft Actor-Critic Learning

    Authors: Bahareh Tasdighi, Abdullah Akgül, Manuel Haussmann, Kenny Kazimirzak Brink, Melih Kandemir

    Abstract: Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused mainly by the destructive effect of the approximation errors of the critic on the actor. We tackle this bottleneck by employing an existing Probably Approximat…

    Submitted 10 June, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: 19 pages, 2 figures