Showing 1–15 of 15 results for author: Akrour, R

  1. arXiv:2506.13862

    cs.LG cs.AI

    StaQ it! Growing neural networks for Policy Mirror Descent

    Authors: Alena Shilova, Alex Davey, Brahim Driss, Riad Akrour

    Abstract: In Reinforcement Learning (RL), regularization has emerged as a popular tool both in theory and practice, typically based either on an entropy bonus or a Kullback-Leibler divergence that constrains successive policies. In practice, these approaches have been shown to improve exploration, robustness and stability, giving rise to popular Deep RL algorithms such as SAC and TRPO. Policy Mirror Descent…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 44 pages, 12 figures

  2. arXiv:2506.13741

    cs.AI cs.LG

    PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning

    Authors: Brahim Driss, Alex Davey, Riad Akrour

    Abstract: Preference-based reinforcement learning (PbRL) has emerged as a promising approach for learning behaviors from human feedback without predefined reward functions. However, current PbRL methods face a critical challenge in effectively exploring the preference space, often converging prematurely to suboptimal policies that satisfy only a narrow subset of human preferences. In this work, we identify…

    Submitted 16 June, 2025; originally announced June 2025.

  3. arXiv:2503.08322

    cs.LG cs.AI

    Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs

    Authors: Hector Kohler, Quentin Delfosse, Waris Radji, Riad Akrour, Philippe Preux

    Abstract: There exist applications of reinforcement learning, such as medicine, where policies need to be "interpretable" by humans. User studies have shown that some policy classes might be more interpretable than others. However, it is costly to conduct human studies of policy interpretability. Furthermore, there is no clear definition of policy interpretability, i.e., no clear metrics for interpretability an…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 12 pages of main text, under review

  4. arXiv:2407.04864

    cs.LG

    Augmented Bayesian Policy Search

    Authors: Mahdi Kallel, Debabrota Basu, Riad Akrour, Carlo D'Eramo

    Abstract: Deterministic policies are often preferred over stochastic ones when implemented on physical systems. They can prevent erratic and harmful behaviors while being easier to implement and interpret. However, in practice, exploration is largely performed by stochastic policies. First-order Bayesian Optimization (BO) methods offer a principled way of performing exploration using deterministic policies.…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to the International Conference on Learning Representations (ICLR) 2024

  5. arXiv:2405.14956

    cs.AI cs.LG

    Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning

    Authors: Hector Kohler, Quentin Delfosse, Riad Akrour, Kristian Kersting, Philippe Preux

    Abstract: Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, and the trust necessary for real-world deployment. So far, solutions learning interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tR…

    Submitted 23 May, 2024; originally announced May 2024.

  6. arXiv:2309.13365

    cs.LG cs.AI

    Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs

    Authors: Hector Kohler, Riad Akrour, Philippe Preux

    Abstract: Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been pr…

    Submitted 21 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: To be included in another submission. arXiv admin note: text overlap with arXiv:2304.05839

  7. Breiman meets Bellman: Non-Greedy Decision Trees with MDPs

    Authors: Hector Kohler, Riad Akrour, Philippe Preux

    Abstract: In supervised learning, decision trees are valued for their interpretability and performance. While greedy decision tree algorithms like CART remain widely used due to their computational efficiency, they often produce sub-optimal solutions with respect to a regularized training loss. Conversely, optimal decision tree methods can find better solutions but are computationally intensive and typicall…

    Submitted 1 June, 2025; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: ACM SIGKDD 2025, 12 pages

  8. arXiv:2304.05839

    cs.LG cs.AI

    Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning

    Authors: Hector Kohler, Riad Akrour, Philippe Preux

    Abstract: Interpretability of AI models allows for user safety checks to build trust in these models. In particular, decision trees (DTs) provide a global view of the learned model and clearly outline the role of the features that are critical to classifying a given data point. However, interpretability is hindered if the DT is too large. To learn compact trees, a Reinforcement Learning (RL) framework has been rece…

    Submitted 11 April, 2023; originally announced April 2023.

  9. arXiv:2210.08503

    cs.LG

    Entropy Regularized Reinforcement Learning with Cascading Networks

    Authors: Riccardo Della Vecchia, Alena Shilova, Philippe Preux, Riad Akrour

    Abstract: Deep Reinforcement Learning (Deep RL) has had incredible achievements on high dimensional problems, yet its learning process remains unstable even on the simplest tasks. Deep RL uses neural networks as function approximators. These neural models are largely inspired by developments in the (un)supervised machine learning community. Compared to these learning frameworks, one of the major difficultie…

    Submitted 16 October, 2022; originally announced October 2022.

  10. arXiv:2011.07016

    cs.LG math.OC stat.ML

    Convex Optimization with an Interpolation-based Projection and its Application to Deep Learning

    Authors: Riad Akrour, Asma Atamna, Jan Peters

    Abstract: Convex optimizers have found many applications as differentiable layers within deep neural architectures. One application of these convex layers is to project points into a convex set. However, both forward and backward passes of these convex layers are significantly more expensive to compute than those of a typical neural network. We investigate in this paper whether an inexact, but cheaper proje…

    Submitted 13 November, 2020; originally announced November 2020.

  11. arXiv:2006.05911

    cs.LG stat.ML

    Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts

    Authors: Riad Akrour, Davide Tateo, Jan Peters

    Abstract: Reinforcement learning (RL) has demonstrated its ability to solve high dimensional tasks by leveraging non-linear function approximators. However, these successes are mostly achieved by 'black-box' policies in simulated domains. When deploying RL to the real world, several concerns regarding the use of a 'black-box' policy might be raised. In order to make the learned policies more transparent, we…

    Submitted 18 November, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

  12. arXiv:2001.10972

    stat.ML cs.LG

    An Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions

    Authors: Samuele Tosatto, Riad Akrour, Jan Peters

    Abstract: The Nadaraya-Watson kernel estimator is among the most popular nonparametric regression techniques thanks to its simplicity. Its asymptotic bias was studied by Rosenblatt in 1969 and has been reported in a number of related works. However, Rosenblatt's analysis is only valid for infinitesimal bandwidth. In contrast, we propose in this paper an upper bound of the bias which holds for fini…

    Submitted 30 January, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

  13. arXiv:1902.02823

    cs.LG stat.ML

    Compatible Natural Gradient Policy Search

    Authors: Joni Pajarinen, Hong Linh Thai, Riad Akrour, Jan Peters, Gerhard Neumann

    Abstract: Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value func…

    Submitted 7 February, 2019; originally announced February 2019.

  14. arXiv:1606.09197

    cs.LG cs.RO

    Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

    Authors: Riad Akrour, Abbas Abdolmaleki, Hany Abdulsamad, Jan Peters, Gerhard Neumann

    Abstract: Many of the recent trajectory optimization algorithms alternate between linear approximation of the system dynamics around the mean trajectory and conservative policy update. One way of constraining the policy change is by bounding the Kullback-Leibler (KL) divergence between successive policies. These approaches already demonstrated great experimental success in challenging problems such as end-t…

    Submitted 2 July, 2018; v1 submitted 29 June, 2016; originally announced June 2016.

  15. arXiv:1208.0984

    cs.LG

    APRIL: Active Preference-learning based Reinforcement Learning

    Authors: Riad Akrour, Marc Schoenauer, Michèle Sebag

    Abstract: This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both standard RL and inverse reinforcement learning. Even with limited expertise, the human expert is still often able to emit preferences and rank the agent demon…

    Submitted 5 August, 2012; originally announced August 2012.

    Journal ref: ECML PKDD 2012 7524 (2012) 116-131