-
A Millimeter-Wave Photometric Camera for Long-Range Imaging Through Optical Obscurants Using Kinetic Inductance Detectors
Authors:
Jack Sayers,
Daniel Cunnane,
Sage Crystian,
Peter K. Day,
Fabien Defrance,
Byeong Ho Eom,
Jonathan Greenfield,
Matthew Hollister,
Bradley R. Johnson,
Henry G. LeDuc,
Philip Mauskopf,
Nia McNichols,
Cody Roberson,
Marcus C. Runyan,
Adhitya B. Sriram,
Sage Stanton,
Ryan C. Stephenson,
Liam C. Walters,
Eric Weeks
Abstract:
Passive imaging through optical obscurants is a promising application for mm-wave sensing. We have thus developed the Superconducting Kinetic Inductance Passive Radiometer (SKIPR), a 150 GHz polarization-sensitive photometric camera optimized for terrestrial imaging using a focal plane array with 3,840 kinetic inductance detectors (KIDs). We present a full description of the instrument design, with a particular emphasis on the cryogenic system based on a Gifford-McMahon cryocooler with a two-stage Adiabatic Demagnetization Refrigerator and a dedicated 1.59 m crossed Dragone telescope with an altitude/azimuth mount. We include a detailed lab-based characterization of the KIDs, which results in a determination of their superconducting resonator parameters and optical properties. We also present in situ measurements from the telescope, including point-spread functions and noise characterization. In sum, we find that SKIPR performs as expected, providing diffraction-limited imaging with detector noise performance set by the random arrivals of photons from the ambient background. There is minimal variation in detector characteristics over the full SKIPR focal plane array, and the overall detector yield is 92 per cent.
Submitted 20 February, 2025;
originally announced February 2025.
-
Concept Bottleneck Language Models For protein design
Authors:
Aya Abdelsalam Ismail,
Tuomas Oikarinen,
Amy Wang,
Julius Adebayo,
Samuel Stanton,
Taylor Joren,
Joseph Kleinhenz,
Allen Goodman,
Héctor Corrada Bravo,
Kyunghyun Cho,
Nathan C. Frey
Abstract:
We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can intervene on concept values to precisely control the properties of generated proteins, achieving a 3 times larger change in desired concept values compared to baselines. ii) Interpretability: A linear mapping between concept values and predicted tokens allows transparent analysis of the model's decision-making process. iii) Debugging: This transparency facilitates easy debugging of trained models. Our models achieve pre-training perplexity and downstream task performance comparable to traditional masked protein language models, demonstrating that interpretability does not compromise performance. While adaptable to any language model, we focus on masked protein language models due to their importance in drug discovery and the ability to validate our model's capabilities through real-world experiments and expert knowledge. We scale our CB-pLM from 24 million to 3 billion parameters, making them the largest Concept Bottleneck Models trained and the first capable of generative language modeling.
Submitted 11 December, 2024; v1 submitted 9 November, 2024;
originally announced November 2024.
-
Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks
Authors:
Angelica Chen,
Samuel D. Stanton,
Frances Ding,
Robert G. Alberstein,
Andrew M. Watkins,
Richard Bonneau,
Vladimir Gligorijević,
Kyunghyun Cho,
Nathan C. Frey
Abstract:
Although large language models (LLMs) have shown promise in biomolecule optimization problems, they incur heavy computational costs and struggle to satisfy precise constraints. On the other hand, specialized solvers like LaMBO-2 offer efficiency and fine-grained control but require more domain expertise. Comparing these approaches is challenging due to expensive laboratory validation and inadequate synthetic benchmarks. We address this by introducing Ehrlich functions, a synthetic test suite that captures the geometric structure of biophysical sequence optimization problems. With prompting alone, off-the-shelf LLMs struggle to optimize Ehrlich functions. In response, we propose LLOME (Language Model Optimization with Margin Expectation), a bilevel optimization routine for online black-box optimization. When combined with a novel preference learning loss, we find LLOME can not only learn to solve some Ehrlich functions, but can even outperform LaMBO-2 on moderately difficult Ehrlich variants. However, LLOME is comparable to LaMBO-2 on very easy or difficult variants, exhibits some likelihood-reward miscalibration, and struggles without explicit rewards. Our results indicate LLMs can provide significant benefits in some cases, but specialized solvers are still competitive and incur less overhead.
Submitted 2 April, 2025; v1 submitted 29 October, 2024;
originally announced October 2024.
-
Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms
Authors:
Samuel Stanton,
Robert Alberstein,
Nathan Frey,
Andrew Watkins,
Kyunghyun Cho
Abstract:
There is a growing body of work seeking to replicate the success of machine learning (ML) on domains like computer vision (CV) and natural language processing (NLP) to applications involving biophysical data. One of the key ingredients of prior successes in CV and NLP was the broad acceptance of difficult benchmarks that distilled key subproblems into approachable tasks that any junior researcher could investigate, but good benchmarks for biophysical domains are rare. This scarcity is partially due to a narrow focus on benchmarks which simulate biophysical data; we propose instead to carefully abstract biophysical problems into simpler ones with key geometric similarities. In particular we propose a new class of closed-form test functions for biophysical sequence optimization, which we call Ehrlich functions. We provide empirical results demonstrating these functions are interesting objects of study and can be non-trivial to solve with a standard genetic optimization baseline.
Submitted 28 June, 2024;
originally announced July 2024.
-
Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)
Authors:
Drew Prinster,
Samuel Stanton,
Anqi Liu,
Suchi Saria
Abstract:
As artificial intelligence (AI) / machine learning (ML) gain widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when such systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction is a promising approach to uncertainty and risk quantification, but prior variants' validity guarantees have assumed some form of "quasi-exchangeability" on the data distribution, thereby excluding many types of sequential shifts. In this paper we prove that conformal prediction can theoretically be extended to any joint data distribution, not just exchangeable or quasi-exchangeable ones. Although the most general case is exceedingly impractical to compute, for concrete practical applications we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of AI/ML-agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
Submitted 5 June, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Protein Design with Guided Discrete Diffusion
Authors:
Nate Gruver,
Samuel Stanton,
Nathan C. Frey,
Tim G. J. Rudner,
Isidro Hotzel,
Julien Lafrance-Vanasse,
Arvind Rajpal,
Kyunghyun Cho,
Andrew Gordon Wilson
Abstract:
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments.
Submitted 12 December, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
GAUCHE: A Library for Gaussian Processes in Chemistry
Authors:
Ryan-Rhys Griffiths,
Leo Klarner,
Henry B. Moss,
Aditya Ravuri,
Sang Truong,
Samuel Stanton,
Gary Tom,
Bojana Rankovic,
Yuanqi Du,
Arian Jamasb,
Aryan Deshwal,
Julius Schwartz,
Austin Tripp,
Gregory Kell,
Simon Frieder,
Anthony Bourached,
Alex Chan,
Jacob Moss,
Chengzhi Guo,
Johannes Durholt,
Saudamini Chaurasia,
Felix Strieth-Kalthoff,
Alpha A. Lee,
Bingqing Cheng,
Alán Aspuru-Guzik
, et al. (2 additional authors not shown)
Abstract:
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche
Submitted 21 February, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Bayesian Optimization with Conformal Prediction Sets
Authors:
Samuel Stanton,
Wesley Maddox,
Andrew Gordon Wilson
Abstract:
Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e. objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncertainty about query outcomes. In practice, subjectively implausible outcomes can occur regularly for two reasons: 1) model misspecification and 2) covariate shift. Conformal prediction is an uncertainty quantification method with coverage guarantees even for misspecified models and a simple mechanism to correct for covariate shift. We propose conformal Bayesian optimization, which directs queries towards regions of search space where the model predictions have guaranteed validity, and investigate its behavior on a suite of black-box optimization tasks and tabular ranking tasks. In many cases we find that query coverage can be significantly improved without harming sample-efficiency.
Submitted 12 December, 2023; v1 submitted 22 October, 2022;
originally announced October 2022.
-
PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design
Authors:
Ji Won Park,
Samuel Stanton,
Saeed Saremi,
Andrew Watkins,
Henri Dwyer,
Vladimir Gligorijevic,
Richard Bonneau,
Stephen Ra,
Kyunghyun Cho
Abstract:
Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with a hierarchical dependency structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified partial ordering in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of the traditional multi-objective BO to impose this desired ordering on the objectives, e.g. expression → affinity. We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, toy numerical problem, and a real-world antibody design task.
Submitted 8 October, 2022;
originally announced October 2022.
-
Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
Authors:
Samuel Stanton,
Wesley Maddox,
Nate Gruver,
Phillip Maffettone,
Emily Delaney,
Peyton Greenside,
Andrew Gordon Wilson
Abstract:
Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.
Submitted 12 July, 2022; v1 submitted 23 March, 2022;
originally announced March 2022.
-
Deconstructing the Inductive Biases of Hamiltonian Neural Networks
Authors:
Nate Gruver,
Marc Finzi,
Samuel Stanton,
Andrew Gordon Wilson
Abstract:
Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases. These models, however, are challenging to apply to many real world systems, such as those that don't conserve energy or contain contacts, a common setting for robotics and reinforcement learning. In this paper, we examine the inductive biases that make physics-inspired models successful in practice. We show that, contrary to conventional wisdom, the improved generalization of HNNs is the result of modeling acceleration directly and avoiding artificial complexity from the coordinate system, rather than symplectic structure or energy conservation. We show that by relaxing the inductive biases of these models, we can match or exceed performance on energy-conserving systems while dramatically improving performance on practical, non-conservative systems. We extend this approach to constructing transition models for common Mujoco environments, showing that our model can appropriately balance inductive biases with the flexibility required for model-based control.
Submitted 11 February, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Conditioning Sparse Variational Gaussian Processes for Online Decision-making
Authors:
Wesley J. Maddox,
Samuel Stanton,
Andrew Gordon Wilson
Abstract:
With a principled representation of uncertainty and closed form posterior updates, Gaussian processes (GPs) are a natural choice for online decision making. However, Gaussian processes typically require at least $\mathcal{O}(n^2)$ computations for $n$ training points, limiting their general applicability. Stochastic variational Gaussian processes (SVGPs) can provide scalable inference for a dataset of fixed size, but are difficult to efficiently condition on new data. We propose online variational conditioning (OVC), a procedure for efficiently conditioning SVGPs in an online setting that does not require re-training through the evidence lower bound with the addition of new data. OVC enables the pairing of SVGPs with advanced look-ahead acquisition functions for black-box optimization, even with non-Gaussian likelihoods. We show OVC provides compelling performance in a range of applications including active learning of malaria incidence, and reinforcement learning on MuJoCo simulated robotic control tasks.
Submitted 28 October, 2021;
originally announced October 2021.
-
Does Knowledge Distillation Really Work?
Authors:
Samuel Stanton,
Pavel Izmailov,
Polina Kirichenko,
Alexander A. Alemi,
Andrew Gordon Wilson
Abstract:
Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not typically work as it is commonly understood: there often remains a surprisingly large discrepancy between the predictive distributions of the teacher and the student, even in cases when the student has the capacity to perfectly match the teacher. We identify difficulties in optimization as a key reason for why the student is unable to match the teacher. We also show how the details of the dataset used for distillation play a role in how closely the student matches the teacher -- and that more closely matching the teacher paradoxically does not always lead to better student generalization.
Submitted 6 December, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Kernel Interpolation for Scalable Online Gaussian Processes
Authors:
Samuel Stanton,
Wesley J. Maddox,
Ian Delbridge,
Andrew Gordon Wilson
Abstract:
Gaussian processes (GPs) provide a gold standard for performance in online settings, such as sample-efficient control and black box optimization, where we need to update a posterior distribution as we acquire data in a sequential fashion. However, updating a GP posterior to accommodate even a single new observation after having observed $n$ points incurs at least $O(n)$ computations in the exact setting. We show how to use structured kernel interpolation to efficiently recycle computations for constant-time $O(1)$ online updates with respect to the number of points $n$, while retaining exact inference. We demonstrate the promise of our approach in a range of online regression and classification settings, Bayesian optimization, and active sampling to reduce error in malaria incidence forecasting. Code is available at https://github.com/wjmaddox/online_gp.
Submitted 1 March, 2021;
originally announced March 2021.
-
Ultra Short Period Planets in K2 III: Neighbors are Common with 13 New Multi-Planet Systems and 10 Newly Validated Planets in Campaigns 0-8, 10
Authors:
Elisabeth R. Adams,
Brian Jackson,
Samantha Johnson,
David R. Ciardi,
William D. Cochran,
Michael Endl,
Mark E. Everett,
Elise Furlan,
Steve B. Howell,
Prasanna Jayanthi,
Phillip J. MacQueen,
Rachel A. Matson,
Ciera Partyka-Worley,
Joshua Schlieder,
Nicholas J. Scott,
Sevio M. Stanton,
Carl Ziegler
Abstract:
Using the EVEREST photometry pipeline, we have identified 74 candidate ultra-short-period planets (orbital period P<1 d) in the first half of the K2 data (Campaigns 0-8 and 10). Of these, 33 candidates have not previously been reported. A systematic search for additional transiting planets found 13 new multi-planet systems, doubling the number known and representing a third (32%) of USPs. We also identified 30 companions, which have periods from 1.4 to 31 days (median 5.5 d). A third (36 of 104) of the candidate USPs and companions have been statistically validated or confirmed, 10 for the first time, including 7 USPs. Almost all candidates, and all validated planets, are small (radii Rp<=3 R_E) with a median radius of R_p=1.1 R_E; the validated and confirmed candidates have radii between 0.4 R_E and 2.4 R_E and periods from P=0.18 to 0.96 d. The lack of candidate (a) ultra-hot-Jupiters (R_p>10 R_E) and (b) short-period desert (3<=Rp<=10 R_E) planets suggests that both populations are rare, although our survey may have missed some of the very deepest transits. These results also provide strong evidence that we have not reached a lower limit on the distribution of planetary radius values for planets at close proximity to a star, and suggest that additional improvements in photometry techniques would yield yet more ultra-short-period planets. The large fraction of USPs in known multi-planet systems supports origins models that involve dynamical interactions with exterior planets coupled to tidal decay of the USP orbits.
Submitted 19 May, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
On the model-based stochastic value gradient for continuous reinforcement learning
Authors:
Brandon Amos,
Samuel Stanton,
Denis Yarats,
Andrew Gordon Wilson
Abstract:
For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents. While model-based agents are conceptually appealing, their policies tend to lag behind those of model-free agents in terms of final reward, especially in non-trivial environments. In response, researchers have proposed model-based agents with increasingly complex components, from ensembles of probabilistic dynamics models, to heuristics for mitigating model error. In a reversal of this trend, we show that simple model-based agents can be derived from existing ideas that not only match, but outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward. We find that a model-free soft value estimate for policy evaluation and a model-based stochastic value gradient for policy improvement is an effective combination, achieving state-of-the-art results on a high-dimensional humanoid control task, which most model-based agents are unable to solve. Our findings suggest that model-based policy evaluation deserves closer attention.
Submitted 27 May, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
Authors:
Marc Finzi,
Samuel Stanton,
Pavel Izmailov,
Andrew Gordon Wilson
Abstract:
The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems. While translation equivariance provides a powerful inductive bias for images, we often additionally desire equivariance to other transformations, such as rotations, especially for non-image data. We propose a general method to construct a convolutional layer that is equivariant to transformations from any specified Lie group with a surjective exponential map. Incorporating equivariance to a new group requires implementing only the group exponential and logarithm maps, enabling rapid prototyping. Showcasing the simplicity and generality of our method, we apply the same model architecture to images, ball-and-stick molecular data, and Hamiltonian dynamical systems. For Hamiltonian systems, the equivariance of our models is especially impactful, leading to exact conservation of linear and angular momentum.
Submitted 24 September, 2020; v1 submitted 25 February, 2020;
originally announced February 2020.