-
Taking the GP Out of the Loop
Authors:
David Sweet,
Siddhant anand Jadhav
Abstract:
Bayesian optimization (BO) has traditionally solved black box problems where evaluation is expensive and, therefore, design-evaluation pairs (i.e., observations) are few. Recently, there has been growing interest in applying BO to problems where evaluation is cheaper and, thus, observations are more plentiful. An impediment to scaling BO to many observations, $N$, is the $O(N^3)$ scaling of a naïve query of the Gaussian process (GP) surrogate. Modern implementations reduce this to $O(N^2)$, but the GP remains a bottleneck. We propose Epistemic Nearest Neighbors (ENN), a surrogate that estimates function values and epistemic uncertainty from $K$ nearest-neighbor observations. ENN has $O(N)$ query time and omits hyperparameter fitting, leaving uncertainty uncalibrated. To accommodate the lack of calibration, we employ an acquisition method based on Pareto-optimal tradeoffs between predicted value and uncertainty. Our proposed method, TuRBO-ENN, replaces the GP surrogate in TuRBO with ENN and its Thompson sampling acquisition method with our Pareto-based alternative. We demonstrate numerically that TuRBO-ENN can reduce the time to generate proposals by one to two orders of magnitude compared to TuRBO and scales to thousands of observations.
Submitted 15 June, 2025;
originally announced June 2025.
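The Pareto-optimal tradeoff between predicted value and uncertainty mentioned in the abstract above can be illustrated with a generic non-dominated filter. This is a minimal sketch, not the TuRBO-ENN acquisition itself: the function name `pareto_front` and the inputs `mu` (predicted values) and `sigma` (uncertainty estimates) are hypothetical, and both quantities are assumed to be "larger is better". Because only the ordering of `sigma` matters here, the filter does not require the uncertainty to be calibrated.

```python
import numpy as np

def pareto_front(mu, sigma):
    """Return indices of candidates that no other candidate dominates.

    A candidate is dominated when another one has predicted value and
    uncertainty that are both at least as large, with one strictly larger.
    Exact duplicates are reported once.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Sort by mu descending; break ties by sigma descending.
    # np.lexsort treats the LAST key as the primary key.
    order = np.lexsort((-sigma, -mu))
    front = []
    best_sigma = -np.inf
    for i in order:
        # Every candidate seen so far has mu >= mu[i], so candidate i
        # survives only if it offers strictly more uncertainty.
        if sigma[i] > best_sigma:
            front.append(int(i))
            best_sigma = sigma[i]
    return front

# Hypothetical candidate scores, for illustration only.
mu = [1.0, 0.8, 0.5, 0.4]
sigma = [0.1, 0.3, 0.2, 0.9]
front = pareto_front(mu, sigma)
```

In this example the third candidate is dropped because the second has both a higher predicted value and a higher uncertainty; the remaining three each represent a different value/uncertainty tradeoff.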
-
Fast, Precise Thompson Sampling for Bayesian Optimization
Authors:
David Sweet
Abstract:
Thompson sampling (TS) has optimal regret and excellent empirical performance in multi-armed bandit problems. Yet, in Bayesian optimization, TS underperforms popular acquisition functions (e.g., EI, UCB). TS samples arms according to the probability that they are optimal. A recent algorithm, P-Star Sampler (PSS), performs such a sampling via Hit-and-Run. We present an improved version, Stagger Thompson Sampler (STS). STS locates the maximizer more precisely than TS does while using less computation time. We demonstrate that STS outperforms TS, PSS, and other acquisition methods in numerical experiments optimizing several test functions across a broad range of dimensions. Additionally, since PSS was originally presented not as a standalone acquisition method but as an input to a batching algorithm called Minimal Terminal Variance (MTV), we also demonstrate that STS matches PSS performance when used as the input to MTV.
Submitted 29 November, 2024; v1 submitted 25 November, 2024;
originally announced November 2024.
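The statement in the abstract above that TS samples arms according to the probability that they are optimal can be made concrete in the classic bandit setting. The sketch below is standard Beta-Bernoulli Thompson sampling, not STS or PSS; the function name `thompson_select`, the uniform Beta(1, 1) prior, and the arm counts are assumptions chosen for illustration.

```python
import numpy as np

def thompson_select(successes, failures, rng):
    """Pick one arm by Thompson sampling with a Beta(1, 1) prior.

    Draw one sample of each arm's success rate from its posterior and
    return the arm whose sample is largest. Over repeated calls, each arm
    is chosen with the posterior probability that it is the best arm.
    """
    successes = np.asarray(successes, dtype=float)
    failures = np.asarray(failures, dtype=float)
    samples = rng.beta(successes + 1.0, failures + 1.0)
    return int(np.argmax(samples))

# Hypothetical history: arm 0 has paid off far more often than arm 1.
rng = np.random.default_rng(0)
picks = [thompson_select([90, 10], [10, 90], rng) for _ in range(1000)]
share_arm0 = picks.count(0) / len(picks)
```

With evidence this lopsided, nearly every draw selects arm 0; with closer counts the selection frequencies would spread across arms in proportion to how plausible it is that each one is optimal.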
-
Optimal Initialization of Batch Bayesian Optimization
Authors:
Jiuge Ren,
David Sweet
Abstract:
Field experiments and computer simulations are effective but time-consuming methods of measuring the quality of engineered systems at different settings. To reduce the total time required, experimenters may employ Bayesian optimization, which is parsimonious with measurements, and take measurements of multiple settings simultaneously, in a batch. In practice, experimenters use very few batches, so it is imperative that each batch be as informative as possible. Typically, the initial batch in a Batch Bayesian Optimization (BBO) is constructed from a quasi-random sample of setting values. We propose a batch-design acquisition function, Minimal Terminal Variance (MTV), that designs a batch by optimization rather than random sampling. MTV adapts a design criterion from Design of Experiments, called I-Optimality, which minimizes the variance of the post-evaluation estimates of quality, integrated over the entire space of settings. MTV weights the integral by the probability that a setting is optimal, making it able to design not only an initial batch but all subsequent batches as well. Applicability to both initialization and subsequent batches is novel among acquisition functions. Numerical experiments on test functions and simulators show that MTV compares favorably to other BBO methods.
Submitted 27 April, 2024;
originally announced April 2024.
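The weighted integrated-variance idea in the abstract above can be sketched numerically. This is a rough illustration under stated assumptions, not the MTV implementation: it assumes a zero-mean GP with a unit-variance RBF kernel, approximates the integral by a weighted average over a grid, and takes the weights as given by the caller (in MTV they would be the probability that a setting is optimal). The names `rbf` and `weighted_integrated_variance`, the length scale, and the noise level are all hypothetical. The sketch relies on the fact that a GP's posterior variance depends only on where measurements are taken, not on their outcomes, which is what allows a batch to be scored before it is evaluated.

```python
import numpy as np

def rbf(a, b, length_scale=0.2):
    """Unit-variance RBF kernel matrix between row-vector sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def weighted_integrated_variance(batch, grid, weights, noise=1e-4):
    """Weighted average of the GP posterior variance over grid points.

    batch:   (m, d) settings that would be measured together
    grid:    (n, d) points standing in for the space of settings
    weights: (n,)   nonnegative importance of each grid point
    """
    K = rbf(batch, batch) + noise * np.eye(len(batch))
    k = rbf(grid, batch)
    # Posterior variance: prior variance (1.0) minus the explained part.
    explained = np.einsum("ij,ji->i", k, np.linalg.solve(K, k.T))
    var = np.maximum(1.0 - explained, 0.0)
    return float(np.sum(weights * var) / np.sum(weights))

# Hypothetical 1-D comparison of two candidate batches of three settings.
grid = np.linspace(0.0, 1.0, 101)[:, None]
uniform = np.ones(len(grid))
spread = np.array([[0.1], [0.5], [0.9]])
clustered = np.array([[0.49], [0.50], [0.51]])
score_spread = weighted_integrated_variance(spread, grid, uniform)
score_clustered = weighted_integrated_variance(clustered, grid, uniform)
```

Under uniform weights the spread-out batch leaves less variance behind than the clustered one. Concentrating the weights on a promising region would shift the preferred batch toward that region, which is the intuition behind using one criterion for both initial and later batches.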