Skip to main content

Showing 1–7 of 7 results for author: Shahriari, B

Searching in archive stat. Search in all archives.
.
  1. arXiv:2503.17338  [pdf, other

    cs.AI cs.LG stat.ML

    Capturing Individual Human Preferences with Reward Features

    Authors: André Barreto, Vincent Dumoulin, Yiran Mao, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle

    Abstract: Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the obse… ▽ More

    Submitted 21 March, 2025; originally announced March 2025.

  2. arXiv:2410.04166  [pdf, other

    cs.LG stat.ML

    Learning from negative feedback, or positive feedback or both

    Authors: Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari, Jost Tobias Springenberg, Tim Hertweck, Rishabh Joshi, Junhyuk Oh, Michael Bloesch, Thomas Lampe, Nicolas Heess, Jonas Buchli, Martin Riedmiller

    Abstract: Existing preference optimization methods often assume scenarios where paired preference feedback (preferred/positive vs. dis-preferred/negative examples) is available. This requirement limits their applicability in scenarios where only unpaired feedback--for example, either positive or negative--is available. To address this, we introduce a novel approach that decouples learning from positive and… ▽ More

    Submitted 7 March, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

  3. arXiv:2006.15134  [pdf, other

    cs.LG cs.AI stat.ML

    Critic Regularized Regression

    Authors: Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

    Abstract: Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learnin… ▽ More

    Submitted 22 September, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: 24 pages; presented at NeurIPS 2020

  4. arXiv:1508.03666  [pdf, other

    stat.ML

    Unbounded Bayesian Optimization via Regularization

    Authors: Bobak Shahriari, Alexandre Bouchard-Côté, Nando de Freitas

    Abstract: Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning. Currently, the established Bayesian optimization practice requires a user-defined bounding box which is assumed to contain the optimizer. However, when little is known about the probed objective function, it can be difficult to prescribe such bounds. In this work we modify… ▽ More

    Submitted 14 August, 2015; originally announced August 2015.

    Comments: 9 pages, 4 figures

  5. arXiv:1410.7172  [pdf, other

    cs.LG math.OC stat.ML

    Heteroscedastic Treed Bayesian Optimisation

    Authors: John-Alexander M. Assael, Ziyu Wang, Bobak Shahriari, Nando de Freitas

    Abstract: Optimising black-box functions is important in many disciplines, such as tuning machine learning models, robotics, finance and mining exploration. Bayesian optimisation is a state-of-the-art technique for the global optimisation of black-box functions which are expensive to evaluate. At the core of this approach is a Gaussian process prior that captures our belief about the distribution over funct… ▽ More

    Submitted 4 March, 2015; v1 submitted 27 October, 2014; originally announced October 2014.

  6. arXiv:1406.4625  [pdf, other

    stat.ML cs.LG

    An Entropy Search Portfolio for Bayesian Optimization

    Authors: Bobak Shahriari, Ziyu Wang, Matthew W. Hoffman, Alexandre Bouchard-Côté, Nando de Freitas

    Abstract: Bayesian optimization is a sample-efficient method for black-box global optimization. How- ever, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection… ▽ More

    Submitted 4 March, 2015; v1 submitted 18 June, 2014; originally announced June 2014.

    Comments: 10 pages, 5 figures

  7. arXiv:1303.6746  [pdf, other

    stat.ML cs.LG

    Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

    Authors: Matthew W. Hoffman, Bobak Shahriari, Nando de Freitas

    Abstract: We address the problem of finding the maximizer of a nonlinear smooth function, that can only be evaluated point-wise, subject to constraints on the number of permitted function evaluations. This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature. We introduce a Bayesian approach for this problem and show that it empirically outperforms both the exis… ▽ More

    Submitted 11 November, 2013; v1 submitted 27 March, 2013; originally announced March 2013.