-
BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery
Authors:
Peter St. John,
Dejun Lin,
Polina Binder,
Malcolm Greaves,
Vega Shah,
John St. John,
Adrian Lange,
Patrick Hsu,
Rajesh Illango,
Arvind Ramanathan,
Anima Anandkumar,
David H Brookes,
Akosua Busia,
Abhishaike Mahajan,
Stephen Malina,
Neha Prasad,
Sam Sinai,
Lindsay Edwards,
Thomas Gaudelet,
Cristian Regep,
Martin Steinegger,
Burkhard Rost,
Alexander Brace,
Kyle Hippe,
Luca Naef, et al. (63 additional authors not shown)
Abstract:
Artificial intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) trained on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows individual components, such as data loaders, to be integrated into existing workflows, and it is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, the BioNeMo Framework trains a three-billion-parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
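The headline figure (over one trillion tokens in 4.2 days on 256 A100s) can be turned into throughput numbers with simple arithmetic. The sketch below assumes exactly 1e12 tokens, so its results are lower bounds; the function name is hypothetical and not part of the BioNeMo API.

```python
# Back-of-the-envelope throughput implied by the abstract's headline numbers.
# Assumption: "over one trillion tokens" is treated as exactly 1e12, so the
# results are lower bounds on the actual throughput.
SECONDS_PER_DAY = 24 * 60 * 60


def training_throughput(tokens: float, days: float, num_gpus: int) -> dict:
    """Return aggregate and per-GPU token throughput plus total GPU-hours."""
    seconds = days * SECONDS_PER_DAY
    aggregate = tokens / seconds       # tokens per second across all GPUs
    per_gpu = aggregate / num_gpus     # tokens per second on each GPU
    gpu_hours = num_gpus * days * 24   # total accelerator time consumed
    return {
        "tokens_per_s": aggregate,
        "tokens_per_s_per_gpu": per_gpu,
        "gpu_hours": gpu_hours,
    }


stats = training_throughput(tokens=1e12, days=4.2, num_gpus=256)
for name, value in stats.items():
    print(f"{name}: {value:,.0f}")
```

Under that assumption, the run sustains roughly 2.76 million tokens per second in aggregate, or about 10,800 tokens per second per GPU, for a total of about 25,800 GPU-hours.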
Submitted 9 June, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
Season combinatorial intervention predictions with Salt & Peper
Authors:
Thomas Gaudelet,
Alice Del Vecchio,
Eli M Carrami,
Juliana Cudini,
Chantriolnt-Andreas Kapourani,
Caroline Uhler,
Lindsay Edwards
Abstract:
Interventions play a pivotal role in the study of complex biological systems. In drug discovery, genetic interventions (such as CRISPR base editing) have become central to both identifying potential therapeutic targets and understanding a drug's mechanism of action. With the advancement of CRISPR and the proliferation of genome-scale analyses such as transcriptomics, a new challenge is to navigate the vast combinatorial space of concurrent genetic interventions. Addressing this, our work concentrates on estimating the effects of pairwise genetic combinations on the cellular transcriptome. We introduce two novel contributions: Salt, a biologically-inspired baseline that posits the mostly additive nature of combination effects, and Peper, a deep learning model that extends Salt's additive assumption to achieve unprecedented accuracy. Our comprehensive comparison against existing state-of-the-art methods, grounded in diverse metrics, and our out-of-distribution analysis highlight the limitations of current models in realistic settings. This analysis underscores the necessity for improved modelling techniques and data acquisition strategies, paving the way for more effective exploration of genetic intervention effects.
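The additive assumption behind Salt can be illustrated with a minimal sketch. The abstract does not give Salt's exact formulation, so the following is a hypothetical version: it assumes each profile is a mean log-expression vector over the same genes and predicts the combined profile as control plus the sum of the two single-perturbation shifts. The function name is illustrative only.

```python
import numpy as np


def additive_combination_baseline(control, effect_a, effect_b):
    """Predict the expression profile under the pairwise perturbation A + B.

    Hypothetical formulation (not taken from the paper): profiles are mean
    log-expression vectors over the same genes, and the combined shift from
    control is the sum of the two single-perturbation shifts.
    """
    control = np.asarray(control, dtype=float)
    delta_a = np.asarray(effect_a, dtype=float) - control  # shift caused by A alone
    delta_b = np.asarray(effect_b, dtype=float) - control  # shift caused by B alone
    return control + delta_a + delta_b


ctrl = np.array([1.0, 2.0, 3.0])
a = np.array([1.5, 2.0, 2.0])
b = np.array([1.0, 3.0, 2.5])
pred = additive_combination_baseline(ctrl, a, b)
print(pred)  # [1.5 3.  1.5]
```

Any gap between such a baseline and the measured combined profile corresponds to non-additive interaction, which is the part a more expressive model extending the additive assumption would need to capture.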
Submitted 25 April, 2024;
originally announced April 2024.
-
Analysis of EEG data using complex geometric structurization
Authors:
Eddy Kwessi,
Lloyd Edwards
Abstract:
Electroencephalogram (EEG) is a common tool for studying brain activity. The data are typically obtained by placing electrodes on the surface of the scalp and recording the oscillations of the currents passing through them. These oscillations can lead to different interpretations depending on the subject's health condition, the experiment carried out, the sensitivity of the instruments used, human manipulation, etc. The data obtained over time can be treated as a time series. There is evidence in the literature that epilepsy EEG data may be chaotic. In either case, embedding theory from dynamical systems suggests that a time series from a complex system can be used to reconstruct its phase space under suitable conditions. In this paper, we propose an analysis of epilepsy EEG time series based on a novel approach dubbed complex geometric structurization. Complex geometric structurization stems from the construction of strange attractors using embedding theory from dynamical systems. The complex geometric structures themselves are obtained with a tool from shape analysis, namely $\alpha$-shapes. Initial analyses provide a proof of concept in that these complex structures capture the expected changes in the brain lobes under consideration. Further, a deeper analysis suggests that these complex structures can be used as biomarkers for seizure changes.
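The phase-space reconstruction step from embedding theory can be sketched as a standard time-delay (Takens-style) embedding, which turns a scalar series into a point cloud. The embedding dimension `dim` and delay `tau` below are hypothetical values (the abstract does not state the ones used), the noisy sine wave is a synthetic stand-in for one EEG channel, and the subsequent $\alpha$-shape construction is not shown.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding of a scalar series.

    Row t of the result is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau]).
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for the requested dim and tau")
    # Each column is the series shifted by a further multiple of tau.
    return np.column_stack([x[i * tau : i * tau + n_points] for i in range(dim)])


# Synthetic stand-in for one EEG channel: a noisy sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
rng = np.random.default_rng(0)
signal = np.sin(t) + 0.05 * rng.standard_normal(t.size)
points = delay_embed(signal, dim=3, tau=25)
print(points.shape)  # (1950, 3)
```

In practice, `tau` and `dim` are usually chosen with heuristics such as average mutual information and false nearest neighbours; the resulting 3-D point cloud is the kind of input an $\alpha$-shape routine would then take.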
Submitted 17 February, 2021;
originally announced February 2021.