Skip to main content

Showing 1–3 of 3 results for author: Scheid, A

Searching in archive stat. Search in all archives.
.
  1. arXiv:2410.17055  [pdf, other

    cs.LG stat.ML

    Optimal Design for Reward Modeling in RLHF

    Authors: Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, Michal Valko

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a popular approach to align language models (LMs) with human preferences. This method involves collecting a large dataset of human pairwise preferences across various text generations and using it to infer (implicitly or explicitly) a reward model. Numerous methods have been proposed to learn the reward model and align a LM with it. Howe… ▽ More

    Submitted 23 October, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

  2. arXiv:2406.19824  [pdf, other

    cs.GT stat.ML

    Learning to Mitigate Externalities: the Coase Theorem with Hindsight Rationality

    Authors: Antoine Scheid, Aymeric Capitaine, Etienne Boursier, Eric Moulines, Michael I Jordan, Alain Durmus

    Abstract: In economic theory, the concept of externality refers to any indirect effect resulting from an interaction between players that affects the social welfare. Most of the models within which externality has been studied assume that agents have perfect knowledge of their environment and preferences. This is a major hindrance to the practical implementation of many proposed solutions. To address this i… ▽ More

    Submitted 28 January, 2025; v1 submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2403.03811  [pdf, other

    stat.ML cs.GT cs.LG

    Incentivized Learning in Principal-Agent Bandit Games

    Authors: Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, El Mahdi El Mhamdi, Eric Moulines, Michael I. Jordan, Alain Durmus

    Abstract: This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent. The principal and the agent have misaligned objectives and the choice of action is only left to the agent. However, the principal can influence the agent's decisions by offering incentives which add up to his rewards. The principal aims to iteratively learn an i… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.