Showing 1–6 of 6 results for author: Brookes, D H

Searching in archive cs.

  1. arXiv:2411.10548  [pdf, ps, other]

    cs.LG q-bio.BM

    BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

    Authors: Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef, et al. (68 additional authors not shown)

    Abstract: Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational bio…

    Submitted 12 June, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

  2. arXiv:2311.05363  [pdf, other]

    cs.LG q-bio.QM

    Beyond the training set: an intuitive method for detecting distribution shift in model-based optimization

    Authors: Farhan Damani, David H Brookes, Theodore Sternlieb, Cameron Webster, Stephen Malina, Rishi Jajoo, Kathy Lin, Sam Sinai

    Abstract: Model-based optimization (MBO) is increasingly applied to design problems in science and engineering. A common scenario involves using a fixed training set to train models, with the goal of designing new samples that outperform those present in the training data. A major challenge in this setting is distribution shift, where the distributions of training and design samples are different. While som…

    Submitted 9 November, 2023; originally announced November 2023.

  3. arXiv:2305.03136  [pdf, other]

    q-bio.PE cs.LG

    Contrastive losses as generalized models of global epistasis

    Authors: David H. Brookes, Jakub Otwinowski, Sam Sinai

    Abstract: Fitness functions map large combinatorial spaces of biological sequences to properties of interest. Inferring these multimodal functions from experimental data is a central task in modern protein engineering. Global epistasis models are an effective and physically-grounded class of models for estimating fitness functions from observed data. These models assume that a sparse latent function is tran…

    Submitted 15 October, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

  4. arXiv:1905.10474  [pdf, ps, other]

    cs.LG stat.ML

    A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

    Authors: David H. Brookes, Akosua Busia, Clara Fannjiang, Kevin Murphy, Jennifer Listgarten

    Abstract: We show that a large class of Estimation of Distribution Algorithms, including, but not limited to, Covariance Matrix Adaption, can be written as a Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to rea…

    Submitted 10 June, 2022; v1 submitted 24 May, 2019; originally announced May 2019.

  5. arXiv:1901.10060  [pdf, other]

    cs.LG stat.ML

    Conditioning by adaptive sampling for robust design

    Authors: David H. Brookes, Hahnbeom Park, Jennifer Listgarten

    Abstract: We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from input (e.g., protein sequences) design…

    Submitted 11 May, 2021; v1 submitted 28 January, 2019; originally announced January 2019.

  6. arXiv:1810.03714  [pdf, other]

    cs.LG q-bio.QM stat.ML

    Design by adaptive sampling

    Authors: David H. Brookes, Jennifer Listgarten

    Abstract: We present a probabilistic modeling framework and adaptive sampling algorithm wherein unsupervised generative models are combined with black box predictive models to tackle the problem of input design. In input design, one is given one or more stochastic "oracle" predictive functions, each of which maps from the input design space (e.g. DNA sequences or images) to a distribution over a property of…

    Submitted 10 February, 2020; v1 submitted 8 October, 2018; originally announced October 2018.