Showing 1–10 of 10 results for author: Aouali, I

  1. arXiv:2406.03434  [pdf, other]

    cs.LG cs.AI stat.ML

    Unified PAC-Bayesian Study of Pessimism for Offline Policy Learning with Regularized Importance Sampling

    Authors: Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba

    Abstract: Off-policy learning (OPL) often involves minimizing a risk estimator based on importance weighting to correct bias from the logging policy used to collect data. However, this method can produce an estimator with a high variance. A common solution is to regularize the importance weights and learn the policy by minimizing an estimator with penalties derived from generalization bounds specific to the…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at UAI 2024

  2. arXiv:2405.14335  [pdf, other]

    stat.ML cs.LG

    Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning

    Authors: Otmane Sakhi, Imad Aouali, Pierre Alquier, Nicolas Chopin

    Abstract: This work investigates the offline formulation of the contextual bandit problem, where the goal is to leverage past interactions collected under a behavior policy to evaluate, select, and learn new, potentially better-performing, policies. Motivated by critical applications, we move beyond point estimators. Instead, we adopt the principle of pessimism where we construct upper bounds that assess a…

    Submitted 30 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: NeurIPS '24 Spotlight

  3. arXiv:2402.14664  [pdf, other]

    cs.LG cs.AI stat.ML

    Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

    Authors: Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba

    Abstract: In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic…

    Submitted 8 April, 2025; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted at AISTATS 2025

  4. arXiv:2402.10028  [pdf, other]

    cs.LG cs.AI stat.ML

    Diffusion Models Meet Contextual Bandits with Large Action Spaces

    Authors: Imad Aouali

    Abstract: Efficient exploration is a key challenge in contextual bandits due to the large size of their action space, where uninformed exploration can result in computational and statistical inefficiencies. Fortunately, the rewards of actions are often correlated and this can be leveraged to explore them efficiently. In this work, we capture such correlations using pre-trained diffusion models; upon which w…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 26 pages, 5 figures

  5. arXiv:2402.05878  [pdf, other]

    stat.ML cs.LG

    Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits

    Authors: Nicolas Nguyen, Imad Aouali, András György, Claire Vernade

    Abstract: We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations based on the prior information and the structure of the environment. We provide theoretical bounds on its performance across diverse models, including the first prior-dependent upper bounds for linear and hierarchical BAI. Our key contribution is in…

    Submitted 24 April, 2025; v1 submitted 8 February, 2024; originally announced February 2024.

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS 2025)

  6. arXiv:2305.15877  [pdf, other]

    cs.LG cs.AI stat.ML

    Exponential Smoothing for Off-Policy Learning

    Authors: Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba

    Abstract: Off-policy learning (OPL) aims at finding improved policies from logged bandit data, often by minimizing the inverse propensity scoring (IPS) estimator of the risk. In this work, we investigate a smooth regularization for IPS, for which we derive a two-sided PAC-Bayes generalization bound. The bound is tractable, scalable, interpretable and provides learning certificates. In particular, it is also…

    Submitted 5 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: ICML 2023 (Oral and Poster)

  7. arXiv:2209.08642  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation

    Authors: Imad Aouali, Amine Benhalloum, Martin Bompaire, Benjamin Heymann, Olivier Jeunen, David Rohde, Otmane Sakhi, Flavian Vasile

    Abstract: Both in academic and industry-based research, online evaluation methods are seen as the golden standard for interactive applications like recommendation systems. Naturally, the reason for this is that we can directly measure utility metrics that rely on interventions, being the recommendations that are being shown to users. Nevertheless, online evaluation methods are costly for a number of reasons…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: Accepted at the ACM RecSys 2021 Workshop on Simulation Methods for Recommender Systems

  8. arXiv:2208.06263  [pdf, other]

    cs.IR cs.LG stat.ML

    Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation

    Authors: Imad Aouali, Achraf Ait Sidi Hammou, Otmane Sakhi, David Rohde, Flavian Vasile

    Abstract: We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining the reward, whether the user successful…

    Submitted 5 July, 2024; v1 submitted 10 August, 2022; originally announced August 2022.

  9. arXiv:2205.15124  [pdf, other]

    cs.LG stat.ML

    Mixed-Effect Thompson Sampling

    Authors: Imad Aouali, Branislav Kveton, Sumeet Katariya

    Abstract: A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. To explore efficiently using this structure, we…

    Submitted 5 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  10. arXiv:2107.12455  [pdf, other]

    cs.LG

    Combining Reward and Rank Signals for Slate Recommendation

    Authors: Imad Aouali, Sergey Ivanov, Mike Gartrell, David Rohde, Flavian Vasile, Victor Zaytsev, Diego Legrand

    Abstract: We consider the problem of slate recommendation, where the recommender system presents a user with a collection or slate composed of K recommended items at once. If the user finds the recommended items appealing then the user may click and the recommender system receives some feedback. Two pieces of information are available to the recommender system: was the slate clicked? (the reward), and if th…

    Submitted 29 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: KDD '21 Workshop on Bayesian Causal Inference for Real World Interactive Systems, August 14th-15th, 2021, Singapore