Showing 1–4 of 4 results for author: Bloesch, M

Searching in archive stat.
  1. arXiv:2410.04166 [pdf, other]

    cs.LG stat.ML

    Learning from negative feedback, or positive feedback or both

    Authors: Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari, Jost Tobias Springenberg, Tim Hertweck, Rishabh Joshi, Junhyuk Oh, Michael Bloesch, Thomas Lampe, Nicolas Heess, Jonas Buchli, Martin Riedmiller

    Abstract: Existing preference optimization methods often assume scenarios where paired preference feedback (preferred/positive vs. dis-preferred/negative examples) is available. This requirement limits their applicability in scenarios where only unpaired feedback--for example, either positive or negative--is available. To address this, we introduce a novel approach that decouples learning from positive and…

    Submitted 7 March, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

  2. arXiv:2409.01369 [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Imitating Language via Scalable Inverse Reinforcement Learning

    Authors: Markus Wulfmeier, Michael Bloesch, Nino Vieillard, Arun Ahuja, Jorg Bornschein, Sandy Huang, Artem Sokolov, Matt Barnes, Guillaume Desjardins, Alex Bewley, Sarah Maria Elisabeth Bechtle, Jost Tobias Springenberg, Nikola Momchev, Olivier Bachem, Matthieu Geist, Martin Riedmiller

    Abstract: The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum likelihood estimation (MLE) for next token prediction led to its role as predominant paradigm. However, the broader field of imitation learning can mo…

    Submitted 9 December, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Published at NeurIPS 2024

  3. arXiv:2008.12228 [pdf, other]

    cs.RO cs.AI cs.LG stat.ML

    Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

    Authors: Roland Hafner, Tim Hertweck, Philipp Klöppner, Michael Bloesch, Michael Neunert, Markus Wulfmeier, Saran Tunyasuvunakool, Nicolas Heess, Martin Riedmiller

    Abstract: Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to the fact that they can represent a general class of methods that allow to learn a solution with a reasonably set reward and minimal prior knowledge, even in situations where it is difficult or expensive for a human expert. For RL to tr…

    Submitted 6 August, 2020; originally announced August 2020.

  4. arXiv:2005.07541 [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Simple Sensor Intentions for Exploration

    Authors: Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

    Abstract: Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reduce the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic sy…

    Submitted 15 May, 2020; originally announced May 2020.