-
A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences
Authors:
Miguel González-Duque,
Richard Michael,
Simon Bartels,
Yevgen Zainchkovskyy,
Søren Hauberg,
Wouter Boomsma
Abstract:
Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveal…
▽ More
Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveals highly heterogeneous experimental set-ups across methods and technical barriers for the replicability and application of published algorithms to real-world tasks. To address these issues, we develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions representing real-world application domains in chemistry and biology. These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries (poli and poli-baselines), allowing practitioners to readily incorporate new optimization objectives or discrete optimizers. Project website: https://machinelearninglifescience.github.io/hdbo_benchmark
△ Less
Submitted 4 November, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
A Continuous Relaxation for Discrete Bayesian Optimization
Authors:
Richard Michael,
Simon Bartels,
Miguel González-Duque,
Yevgen Zainchkovskyy,
Jes Frellsen,
Søren Hauberg,
Wouter Boomsma
Abstract:
To optimize efficiently over discrete data and with only few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization can be computationally tractable. We consider in particular the optimization domain where very few observations and strict budgets exist; motivated by optimizing prot…
▽ More
To optimize efficiently over discrete data and with only few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization can be computationally tractable. We consider in particular the optimization domain where very few observations and strict budgets exist; motivated by optimizing protein sequences for expensive to evaluate bio-chemical properties. The advantages of our approach are two-fold: the problem is treated in the continuous setting, and available prior knowledge over sequences can be incorporated directly. More specifically, we utilize available and learned distributions over the problem domain for a weighting of the Hellinger distance which yields a covariance function. We show that the resulting acquisition function can be optimized with both continuous or discrete optimization algorithms and empirically assess our method on two bio-chemical sequence optimization tasks.
△ Less
Submitted 26 April, 2024;
originally announced April 2024.
-
Kernel-Matrix Determinant Estimates from stopped Cholesky Decomposition
Authors:
Simon Bartels,
Wouter Boomsma,
Jes Frellsen,
Damien Garreau
Abstract:
Algorithms involving Gaussian processes or determinantal point processes typically require computing the determinant of a kernel matrix. Frequently, the latter is computed from the Cholesky decomposition, an algorithm of cubic complexity in the size of the matrix. We show that, under mild assumptions, it is possible to estimate the determinant from only a sub-matrix, with probabilistic guarantee o…
▽ More
Algorithms involving Gaussian processes or determinantal point processes typically require computing the determinant of a kernel matrix. Frequently, the latter is computed from the Cholesky decomposition, an algorithm of cubic complexity in the size of the matrix. We show that, under mild assumptions, it is possible to estimate the determinant from only a sub-matrix, with probabilistic guarantee on the relative error. We present an augmentation of the Cholesky decomposition that stops under certain conditions before processing the whole matrix. Experiments demonstrate that this can save a considerable amount of time while having an overhead of less than $5\%$ when not stopping early. More generally, we present a probabilistic stopping strategy for the approximation of a sum of known length where addends are revealed sequentially. We do not assume independence between addends, only that they are bounded from below and decrease in conditional expectation.
△ Less
Submitted 22 July, 2021;
originally announced July 2021.
-
Conjugate Gradients for Kernel Machines
Authors:
Simon Bartels,
Philipp Hennig
Abstract:
Regularized least-squares (kernel-ridge / Gaussian process) regression is a fundamental algorithm of statistics and machine learning. Because generic algorithms for the exact solution have cubic complexity in the number of datapoints, large datasets require to resort to approximations. In this work, the computation of the least-squares prediction is itself treated as a probabilistic inference prob…
▽ More
Regularized least-squares (kernel-ridge / Gaussian process) regression is a fundamental algorithm of statistics and machine learning. Because generic algorithms for the exact solution have cubic complexity in the number of datapoints, large datasets require to resort to approximations. In this work, the computation of the least-squares prediction is itself treated as a probabilistic inference problem. We propose a structured Gaussian regression model on the kernel function that uses projections of the kernel matrix to obtain a low-rank approximation of the kernel and the matrix. A central result is an enhanced way to use the method of conjugate gradients for the specific setting of least-squares regression as encountered in machine learning. Our method improves the approximation of the kernel ridge regressor / Gaussian process posterior mean over vanilla conjugate gradients and, allows computation of the posterior variance and the log marginal likelihood (evidence) without further overhead.
△ Less
Submitted 14 November, 2019;
originally announced November 2019.
-
Probabilistic Linear Solvers: A Unifying View
Authors:
Simon Bartels,
Jon Cockayne,
Ilse C. F. Ipsen,
Philipp Hennig
Abstract:
Several recent works have developed a new, probabilistic interpretation for numerical algorithms solving linear systems in which the solution is inferred in a Bayesian framework, either directly or by inferring the unknown action of the matrix inverse. These approaches have typically focused on replicating the behavior of the conjugate gradient method as a prototypical iterative method. In this wo…
▽ More
Several recent works have developed a new, probabilistic interpretation for numerical algorithms solving linear systems in which the solution is inferred in a Bayesian framework, either directly or by inferring the unknown action of the matrix inverse. These approaches have typically focused on replicating the behavior of the conjugate gradient method as a prototypical iterative method. In this work surprisingly general conditions for equivalence of these disparate methods are presented. We also describe connections between probabilistic linear solvers and projection methods for linear systems, providing a probabilistic interpretation of a far more general class of iterative methods. In particular, this provides such an interpretation of the generalised minimum residual method. A probabilistic view of preconditioning is also introduced. These developments unify the literature on probabilistic linear solvers, and provide foundational connections to the literature on iterative solvers for linear systems.
△ Less
Submitted 17 October, 2018; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
Authors:
Aaron Klein,
Stefan Falkner,
Simon Bartels,
Philipp Hennig,
Frank Hutter
Abstract:
Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a gen…
▽ More
Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed Fabolas, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that Fabolas often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.
△ Less
Submitted 7 March, 2017; v1 submitted 23 May, 2016;
originally announced May 2016.